The Taxonomic and Biodiversity Software Stack in R


Scott Chamberlain (@sckottie/@ropensci)

UC Berkeley - rOpenSci
rOpenSci

Helmsley Foundation

R intro


  • growing in popularity

  • widely used by biologists

  • similar to e.g., Ruby, Python, Julia

  • > 10K packages

a pitch for R


making biodiversity and taxonomy software in R:


  • R covers all parts of the digital research workflow

  • R software means targeting users

  • meets scientists/etc. where they are, which:

    • facilitates reproducible science

    • facilitates open science

rOpenSci

rOpenSci?


  • non-profit

  • make open-source software for R

  • large community of developers/users

  • target science use cases

  • we often play the role of communicating btw. scientists and data providers



the R taxonomy landscape

R packages

* not rOpenSci


R taxonomy task view:  

taxize

taxize


  • access to more than 20 data sources

  • consistent interfaces for tasks:
    • search
    • taxonomic ids
    • hierarchy
    • up/down-stream
    • children
    • synonyms
    • name resolution
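
A minimal sketch of what "consistent interfaces" looks like in practice, assuming the taxize package is installed and the data source is reachable. The wrapper name `taxize_overview` is hypothetical (not part of taxize); the functions it calls (`get_ids`, `classification`, `children`, `synonyms`) are taxize exports, but which data sources each one supports varies, and some calls may prompt interactively when a name has several matches.

```r
# Hypothetical helper: run several taxize tasks for one name against one source.
# Every call below needs network access to the chosen data source.
taxize_overview <- function(name, db = "itis") {
  if (!requireNamespace("taxize", quietly = TRUE)) {
    stop("package 'taxize' is required for this sketch")
  }
  list(
    ids       = taxize::get_ids(name, db = db),        # taxonomic ids
    hierarchy = taxize::classification(name, db = db), # full classification
    children  = taxize::children(name, db = db),       # direct children
    synonyms  = taxize::synonyms(name, db = db)        # known synonyms
  )
}

# Example call (not run here because it queries a remote service):
# res <- taxize_overview("Puma concolor", db = "itis")
# res$hierarchy
```

The point of the sketch is that the same `name` and `db` arguments are passed to each task, so switching data source is a one-argument change where the source supports that task.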

taxize citations


> 50 citations; some examples:

representative taxize use case

Leung et al. (2017). A quantitative-PCR based method to estimate ranavirus viral load following normalisation by reference to an ultraconserved vertebrate target. J. Virological Methods. 

"retrieve classification hierarchy from the Integrated Taxonomic Information System"

taxa

taxa


  • taxonomic classes for R: taxon IDs, names, ranks, data sources

  • taxonomic classes w/ & w/o data

  • manipulate taxonomic classes: grow, prune, combine, calculate, etc.

  • manipulate taxonomic classes while maintaining link to data

taxa as foundation software


idea is to integrate taxa in other software, e.g.:


Thoughts on taxa?


Would love to know what people think.

pegax

PEG - Parsing Expression Grammar


  • define rules (e.g., capture any digit)

  • combine rules to form a grammar

  • apply grammar to strings (e.g., taxonomic names)
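
The three steps above can be sketched in base R. This is a toy illustration only: it is not the pegax API and not the gnparser grammar, and every name in it (`rx`, `seq_of`, `optional`, `sci_name`, `parse_name`) is made up for the example. It handles only names of the form "Genus epithet", optionally followed by one author and a four-digit year.

```r
# A rule is a function(input, pos) that returns list(pos, value) on success
# or NULL on failure. Values are named character vectors.

# 1. Define rules: a terminal rule matches a regex at the current position.
rx <- function(pattern, name = NULL) {
  anchored <- paste0("^(?:", pattern, ")")
  function(input, pos) {
    rest <- substr(input, pos, nchar(input))
    m <- regexpr(anchored, rest, perl = TRUE)
    if (m == -1) return(NULL)
    len <- attr(m, "match.length")
    value <- if (is.null(name)) character(0) else
      stats::setNames(substr(rest, 1, len), name)
    list(pos = pos + len, value = value)
  }
}

# 2. Combine rules: a sequence succeeds only if every rule matches in order.
seq_of <- function(...) {
  rules <- list(...)
  function(input, pos) {
    values <- character(0)
    for (rule in rules) {
      res <- rule(input, pos)
      if (is.null(res)) return(NULL)
      pos <- res$pos
      values <- c(values, res$value)
    }
    list(pos = pos, value = values)
  }
}

# An optional rule never fails; on no match it consumes nothing.
optional <- function(rule) {
  function(input, pos) {
    res <- rule(input, pos)
    if (is.null(res)) list(pos = pos, value = character(0)) else res
  }
}

genus   <- rx("[A-Z][a-z]+", "genus")
epithet <- rx("[a-z]+", "epithet")
author  <- rx("[A-Z][A-Za-z.]*", "author")
year    <- rx("[0-9]{4}", "year")   # four digits
ws      <- rx("\\s+")
comma   <- rx(",\\s*")

sci_name <- seq_of(
  genus, ws, epithet,
  optional(seq_of(ws, author, optional(seq_of(comma, year))))
)

# 3. Apply the grammar to a string; the whole string must be consumed.
parse_name <- function(x) {
  res <- sci_name(x, 1L)
  if (is.null(res) || res$pos <= nchar(x)) return(NULL)
  res$value
}

parse_name("Homo sapiens Linnaeus, 1758")
parse_name("Puma concolor")
```

Real taxonomic names need far more rules (hybrids, infraspecific ranks, multiple authors, parentheses), which is why a dedicated parser is useful.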

gnparser exists

tdwg17 talk


thanks Dmitry


but ...


R and Java don't play nice

R and C++ do play nice


I'm learning from gnparser to implement R/C++ parser

pegax status


not done yet, but


for now, and going forward, can instead use the Global Names Resolver as a web service via taxize


taxize::gnr_resolve
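
A minimal sketch of calling the resolver from R, assuming taxize is installed and the Global Names service is reachable. `gnr_resolve` is the taxize function named above; depending on the installed taxize version it may be deprecated or replaced. The helpers `clean_names` and `resolve_names` are hypothetical, added only to show tidying input before sending it to a web service.

```r
# Hypothetical helper: trim whitespace, drop empty and duplicate entries.
clean_names <- function(x) {
  x <- trimws(as.character(x))
  unique(x[!is.na(x) & nzchar(x)])
}

# Hypothetical wrapper around taxize::gnr_resolve (needs network access).
resolve_names <- function(x) {
  if (!requireNamespace("taxize", quietly = TRUE)) {
    stop("package 'taxize' is required for this sketch")
  }
  taxize::gnr_resolve(clean_names(x))
}

# Example call (not run here because it queries a remote service):
# resolve_names(c("Helianthus annus", " Homo sapians ", ""))
```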



the R biodiversity landscape

R packages

* not rOpenSci

GBIF


pre-print in PeerJ: 10.7287/peerj.preprints.3304v1


R: rgbif

Python: pygbif

Ruby: gbifrb


R/Python more for researchers,
Python/Ruby more for web dev

rgbif


> 35 citations; some examples:

representative rgbif use case

Ludt et al. (2017). A quantitative and statistical biological comparison of three semi-enclosed seas: the Red Sea, the Persian (Arabian) Gulf, and the Gulf of California. Marine Biodiversity 

"... species occurrence data were gathered for major marine phyla ... from geo-referenced specimen data on [GBIF] ... using a polygon search area in ... rgbif"

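A rough sketch of the kind of polygon search described in that quote, assuming rgbif is installed and GBIF is reachable. The helper names `bbox_to_wkt` and `occurrences_in_area` are hypothetical, and the bounding box in the example is an illustrative rectangle, not the search area used by Ludt et al. `occ_search` and its `geometry`, `hasCoordinate` and `limit` arguments are part of rgbif; the vertices are listed counter-clockwise on the assumption that GBIF expects that winding order.

```r
# Hypothetical helper: WKT polygon for a bounding box, vertices listed
# counter-clockwise and the ring closed by repeating the first vertex.
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
  stopifnot(xmin < xmax, ymin < ymax)
  sprintf(
    "POLYGON((%s %s,%s %s,%s %s,%s %s,%s %s))",
    xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin
  )
}

# Hypothetical wrapper: georeferenced occurrence records inside a WKT polygon.
occurrences_in_area <- function(wkt, limit = 50) {
  if (!requireNamespace("rgbif", quietly = TRUE)) {
    stop("package 'rgbif' is required for this sketch")
  }
  res <- rgbif::occ_search(geometry = wkt, hasCoordinate = TRUE, limit = limit)
  res$data
}

# Illustrative rectangle in longitude/latitude (values chosen for the example)
area <- bbox_to_wkt(32, 12, 44, 30)
area

# Example call (not run here because it queries GBIF):
# occ <- occurrences_in_area(area, limit = 20)
```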

let us know:

what are we missing?

what problems do you have?


scotttalks.info/tdwg17



Made w/: reveal.js v3.2.0


Styling: Bootstrap v3.3.5


Icons: FontAwesome v4.4.0

FIN