solr - an R client for Apache Solr
Scott Chamberlain (@recology_ / @ropensci)
UC Berkeley / rOpenSci
License: CC-BY 4.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.
https://creativecommons.org/licenses/by/4.0/
A perfect combination
Data is increasingly on the web
API: Application Programming Interface
Reproducibly plug data from the web into research workflows
Solr in R
Use cases for Solr in R
solr R client - Search
solr R client - Server management
Solr v5
We're developing against Solr v5
So some things may not work with older versions
Use case: Data exploration
R has all the tools you'll need for data manipulation, visualization, and statistics
Access to large volumes of data via Solr makes this a powerful combination
Use case: Data exploration
The data.frame is the most common data structure in R, and the easiest to work with
Use case: Data exploration
In the solr R client, a data.frame is the default output from search
This makes for easy downstream use in:
- visualization
- statistics
- modelling
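A minimal sketch of that downstream use, in base R only. The data.frame below is made up for illustration (the column names and values are assumptions, not real search output); it only stands in for the kind of table a search returns.

```r
# Made-up stand-in for search output: one row per document.
# Column names and values are hypothetical, for illustration only.
res <- data.frame(
  id      = c("doc1", "doc2", "doc3", "doc4"),
  journal = c("A", "A", "B", "B"),
  views   = c(120, 80, 300, 100),
  stringsAsFactors = FALSE
)

# statistics: mean views per journal
means <- aggregate(views ~ journal, data = res, FUN = mean)

# modelling: linear model of views as a function of journal
fit <- lm(views ~ journal, data = res)

# visualization: bar plot of the per-journal means
barplot(means$views, names.arg = means$journal, ylab = "mean views")
```

Because the result is an ordinary data.frame, none of these steps need anything Solr-specific.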
Use case: Easy R client libraries
Many public web APIs use Solr
Writing an R client for them is easy with the solr R client
Use case: Easy R client libraries
Examples:
Use case: Solr Server Management
You probably don't want to do all server management in R, but tasks such as:
- creating/deleting a collection or core
- adding/deleting/updating documents from files and R objects
are, or will be, easy in the solr R client
First, let's connect - solr_connect()
You can also set:
- error verbosity
- whether URLs are printed
- whether a proxy is used
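A rough sketch of connecting and then searching, assuming the development version of the package. The argument names `errors` and `verbose`, the `solr_search()` call and its arguments, and the PLOS search URL are assumptions for illustration and may differ in your installed version.

```r
library("solr")

# Register the Solr endpoint once; later calls reuse it.
# URL: the public PLOS search API, used here only as an example endpoint.
# `errors` and `verbose` are assumed argument names for the error
# verbosity and URL printing options listed above.
solr_connect("http://api.plos.org/search", errors = "simple", verbose = TRUE)

# A basic search; the result should come back as a data.frame.
# Field names (`id`, `journal`) are specific to the assumed endpoint.
res <- solr_search(q = "ecology", fl = "id,journal", rows = 5)
class(res)
head(res)
```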
Additional search functions
solr_mlt() - more like this search
solr_group() - group search
solr_highlight() - highlight search
solr_stats() - stats search
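A hedged sketch of how these functions might be called. It assumes each function passes standard Solr parameters (`group.field`, `hl.fl`, `stats.field`, `mlt.fl`) through as R arguments, and that the endpoint exposes the field names used; both are assumptions, so check the package help pages before relying on them.

```r
library("solr")

# Assumed example endpoint (public PLOS search API).
solr_connect("http://api.plos.org/search")

# more like this: documents similar to the matches, compared on `title`
solr_mlt(q = 'title:"ecology"', mlt.fl = "title", fl = "id,title", rows = 5)

# group search: up to 3 documents per journal
solr_group(q = "ecology", group.field = "journal", group.limit = 3,
           fl = "id,score")

# highlight search: matching snippets from the `abstract` field
solr_highlight(q = "alcohol", hl.fl = "abstract", rows = 2)

# stats search: summary statistics for a numeric field
solr_stats(q = "ecology", stats.field = "counter_total_all")
```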
Server management functions
core_*() - manage cores
collection_*() - manage collections
add_*() - add documents from R objects
solr_get() - get documents by id
update_*() - add documents from files
delete_*() - delete documents
config_*() - set/unset config params
Three update_*() functions:
update_json()
update_xml()
update_csv()
- Input: files
- JSON and XML versions can include add & delete for specific documents
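A minimal sketch of a JSON update file that mixes an add and a delete, using Solr's JSON update command format. It assumes a Solr server running locally and a collection named `books` (both hypothetical), and that `update_json()` takes a file path followed by the collection name; treat the exact signature as an assumption.

```r
library("solr")

# Assumes a local Solr server on the default port.
solr_connect("http://localhost:8983")

# One file, two commands: add document "doc-1", delete document "doc-2".
# The ids and the title are made up for illustration.
json <- '{
  "add":    {"doc": {"id": "doc-1", "title": "A first document"}},
  "delete": {"id": "doc-2"}
}'
path <- tempfile(fileext = ".json")
writeLines(json, path)

# Send the file to the hypothetical "books" collection.
update_json(path, name = "books")
```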
In the works...
- Inspect configuration
- Write configuration
- Compatibility with older Solr versions
- Support spatial search
- Plugin handler (if possible)
In closing...
Would love your feedback
kick the tires
let me know what could be better
let's talk
I'll be around tomorrow if you want to meet