solr - an R client for Apache Solr




Scott Chamberlain (@recology_ / @ropensci)

UC Berkeley / rOpenSci

Supported by: Alfred P. Sloan Foundation

License: CC-BY 4.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.

https://creativecommons.org/licenses/by/4.0/


<ropensci>






A perfect combination



     

Data is increasingly on the web



API: Application Programming Interface





  



Reproducibly plug data from the web into research workflows



</ropensci>



<solr>



The Solr R client:



github.com/ropensci/solr

Solr in R



  • Use cases for Solr in R

  • solr R client - Search

  • solr R client - Server management

Solr v5


We're developing against Solr v5


So some things may not work with older versions

Use case: Data exploration


R has all the tools you'll need for data manipulation, visualization, and statistics


Access to vast amounts of data via Solr makes this a powerful combination

Use case: Data exploration


The data.frame is the most common data structure in R, and the easiest to work with


 

Use case: Data exploration


In the solr R client, a data.frame is the default output from search


easy downstream use for:

  • visualization
  • statistics
  • modelling
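
To make the downstream-use point concrete, here is a minimal base-R sketch. It does not call the solr package: the `docs` list is made-up data shaped like the documents in a parsed Solr JSON response, and `docs_to_df()` is a hypothetical helper, not part of the client.

```r
# Made-up records, shaped like parsed Solr documents (illustrative only)
docs <- list(
  list(id = "a1", journal = "Journal A", views = 120),
  list(id = "a2", journal = "Journal B", views = 300),
  list(id = "a3", journal = "Journal A", views = 80)
)

# Hypothetical helper: turn each record into a one-row data.frame and bind them
docs_to_df <- function(docs) {
  rows <- lapply(docs, function(d) as.data.frame(d, stringsAsFactors = FALSE))
  do.call(rbind, rows)
}

df <- docs_to_df(docs)

# Downstream use: mean views per journal, straight from the data.frame
views_by_journal <- aggregate(views ~ journal, data = df, FUN = mean)
```

Once the results are in a data.frame, the same object can go directly into plotting or modelling functions.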

Use case: Easy R client libraries


Many public web APIs use Solr



Building an R client for them is easy with the solr R client

Use case: Easy R client libraries


Examples:


Use case: Solr Server Management


You probably don't want to do all server management in R, but e.g.,



  • create/delete a collection/core

  • add/delete/update documents from files, and R objects



These are, or will be, easy in the solr R client

First, let's connect - solr_connect()


You can also set:

  • error verbosity
  • whether URLs are printed
  • a proxy

Additional search functions


  • solr_mlt() - more like this search

  • solr_group() - group search

  • solr_highlight() - highlight search

  • solr_stats() - stats search
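
As a rough illustration of what these search types look like at the HTTP level, the base-R sketch below builds request URLs using standard Solr query parameters (`mlt`, `group`, `hl`, `stats`). It is not the package's implementation: `build_solr_url()` is a hypothetical helper, and the server address, core name, and field names are assumptions.

```r
# Hypothetical helper: build a Solr /select URL from a named list of parameters
build_solr_url <- function(base, params) {
  pairs <- vapply(names(params), function(nm) {
    value <- utils::URLencode(as.character(params[[nm]]), reserved = TRUE)
    paste0(nm, "=", value)
  }, character(1))
  paste0(base, "?", paste(pairs, collapse = "&"))
}

# Assumed local server and core name; replace with your own
base <- "http://localhost:8983/solr/mycore/select"

# More like this: find documents similar to one document, compared on a field
mlt_url <- build_solr_url(base, list(q = "id:1", mlt = "true", mlt.fl = "title", wt = "json"))

# Grouping: collapse results by the values of a field
group_url <- build_solr_url(base, list(q = "*:*", group = "true", group.field = "journal", wt = "json"))

# Highlighting: return matching snippets from a field
hl_url <- build_solr_url(base, list(q = "title:ecology", hl = "true", hl.fl = "title", wt = "json"))

# Stats: summary statistics for a numeric field, with no documents returned
stats_url <- build_solr_url(base, list(q = "*:*", stats = "true", stats.field = "views", rows = 0, wt = "json"))
```

The client functions listed above save you from assembling these parameters by hand.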

Server management

Server management functions

  • core_*() - manage cores

  • collection_*() - manage collections

  • add_*() - add documents from R objects

  • solr_get() - get documents by id

  • update_*() - add documents from files

  • delete_*() - delete documents

  • config_*() - set/unset config params

Three update_*() functions:


  • update_json()
  • update_xml()
  • update_csv()



- Input: files

- JSON and XML versions can include add & delete for specific documents
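
A minimal sketch of such an input file, using Solr's JSON update command format with one add and one delete in the same request. The document ids and fields are made up, and the file is only written and read back here; the exact arguments of update_json() are not shown in these slides, so check the package documentation before passing the file to it.

```r
# Made-up ids and fields, for illustration only
update_body <- paste0(
  '{\n',
  '  "add": {"doc": {"id": "doc-1", "title": "A new document"}},\n',
  '  "delete": {"id": "doc-2"}\n',
  '}'
)

# Write the update commands to a temporary JSON file
json_file <- tempfile(fileext = ".json")
writeLines(update_body, json_file)

# Read the file back to inspect what would be sent to the server
file_lines <- readLines(json_file)
```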

In the works...


  • Inspect configuration

  • Write configuration

  • Compatibility with older Solr versions

  • Support spatial search

  • Plugin handler (if possible)

In closing...


Would love your feedback: kick the tires and let me know what could be better




solr R client on GitHub: github.com/ropensci/solr



</solr>

let's talk


 

I'll be around tomorrow if you want to meet


rOpenSci on the web: ropensci.org



rOpenSci discussion forum: discuss.ropensci.org



This talk on the web: recology.info/talks/sfbaysolr







Made w/ reveal.js


Icons by: FontAwesome v4.4.0