The Taxonomic and Biodiversity Software Stack in R

Scott Chamberlain (@sckottie/@ropensci)

UC Berkeley - rOpenSci

Helmsley Foundation

R intro

  • growing in popularity

  • very widespread with biologists

  • similar to e.g., Ruby, Python, Julia

  • > 10K packages

a pitch for R

making biodiversity and taxonomy software in R:

  • R covers all parts of the digital research workflow

  • R software means targeting users

  • meets scientists/etc. where they are, which:

    • facilitates reproducible science

    • facilitates open science

rOpenSci

  • non-profit

  • make open-source software for R

  • large community of developers/users

  • target science use cases

  • we often play the role of communicating between scientists and data providers

the R taxonomy landscape

R packages

* not rOpenSci

R taxonomy task view:

taxize

  • access to more than 20 data sources

  • consistent interfaces for tasks:
    • search
    • taxonomic ids
    • hierarchy
    • up/down-stream
    • children
    • synonyms
    • name resolution

taxize citations

more than 50 citations, for example:

representative taxize use case

Leung et al. (2017). A quantitative-PCR based method to estimate ranavirus viral load following normalisation by reference to an ultraconserved vertebrate target. J. Virological Methods. 

"retrieve classification hierarchy from the Integrated Taxonomic Information System"

taxa

  • taxonomic classes for R: taxon IDs, names, ranks, data sources

  • taxonomic classes with and without associated data

  • manipulate taxonomic classes: grow, prune, combine, calculate, etc.

  • manipulate taxonomic classes while maintaining link to data
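The idea of pruning a taxonomy while keeping the link to data can be sketched in a few lines. This is only an illustration in Python, not the taxa package's API: the names `Taxon`, `Taxonomy`, `add`, `subtree`, and `prune`, and the placeholder IDs and database name, are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taxon:
    """One taxon: name, rank, and an identifier within a data source."""
    name: str
    rank: str
    taxon_id: str   # placeholder identifier, not a real database ID
    database: str   # placeholder data source name


@dataclass
class Taxonomy:
    """Hypothetical container: a parent map plus records linked to each taxon."""
    parents: dict = field(default_factory=dict)  # Taxon -> parent Taxon or None
    data: dict = field(default_factory=dict)     # Taxon -> list of linked records

    def add(self, taxon, parent=None, records=()):
        """Grow the taxonomy by one taxon, optionally with linked records."""
        self.parents[taxon] = parent
        self.data[taxon] = list(records)

    def subtree(self, root):
        """Return `root` plus all of its descendants."""
        found = {root}
        changed = True
        while changed:
            changed = False
            for taxon, parent in self.parents.items():
                if parent in found and taxon not in found:
                    found.add(taxon)
                    changed = True
        return found

    def prune(self, root):
        """Drop `root`, its descendants, and their linked records together."""
        for taxon in self.subtree(root):
            del self.parents[taxon]
            del self.data[taxon]


mammalia = Taxon("Mammalia", "class", "id-1", "example-db")
homo = Taxon("Homo", "genus", "id-2", "example-db")
sapiens = Taxon("Homo sapiens", "species", "id-3", "example-db")
puma = Taxon("Puma concolor", "species", "id-4", "example-db")

tax = Taxonomy()
tax.add(mammalia)
tax.add(homo, parent=mammalia)
tax.add(sapiens, parent=homo, records=[{"count": 12}])
tax.add(puma, parent=mammalia, records=[{"count": 3}])

# Pruning the genus removes its species and the records attached to them.
tax.prune(homo)
remaining = sorted(t.name for t in tax.parents)
```

Because records are keyed by taxon, any operation on the tree can update the data in the same step, which is the property the last bullet describes.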

taxa as foundation software

idea is to integrate taxa in other software, e.g.:

Thoughts on taxa?

Would love to know what people think.

pegax

PEG - Parsing Expression Grammar

  • define rules (e.g., capture any digit)

  • combine rules to form a grammar

  • apply grammar to strings (e.g., taxonomic names)
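The three steps above can be sketched with a handful of parser combinators. This is a minimal illustration in Python, not the grammar used by gnparser or pegax: the helper names (`regex`, `seq`, `choice`, `optional`, `parse_name`) and the toy rules for simple binomial names are assumptions made for the example.

```python
import re

# A rule is a function (text, pos) -> (value, new_pos) on success, None on failure.


def regex(pattern):
    """Terminal rule: match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def rule(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return rule


def seq(*rules):
    """Sequence: every rule must match, one after another."""
    def rule(text, pos):
        values = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return rule


def choice(*rules):
    """Ordered choice: the first alternative that matches wins."""
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule


def optional(r):
    """Optional rule: succeed without consuming input if `r` fails."""
    def rule(text, pos):
        result = r(text, pos)
        return result if result is not None else (None, pos)
    return rule


# 1. Define rules (the year rule is the "capture any digit" kind of rule).
space = regex(r" +")
genus = regex(r"[A-Z][a-z]+")
epithet = regex(r"[a-z]+(?:-[a-z]+)?")
year = regex(r"\d{4}")
author = choice(regex(r"L\."), regex(r"[A-Z][a-z]+"))

# 2. Combine rules to form a grammar.
authorship = seq(space, author, optional(seq(regex(r", *"), year)))
name = seq(genus, space, epithet, optional(authorship))


# 3. Apply the grammar to a string.
def parse_name(text):
    """Parse a simple binomial name; return a dict, or None if it does not match."""
    result = name(text, 0)
    if result is None or result[1] != len(text):
        return None
    (g, _, e, auth), _ = result
    parsed = {"genus": g, "epithet": e, "author": None, "year": None}
    if auth is not None:
        _, a, yr = auth
        parsed["author"] = a
        if yr is not None:
            parsed["year"] = yr[1]
    return parsed


linnaeus = parse_name("Homo sapiens Linnaeus, 1758")
bare = parse_name("Puma concolor")
```

Real scientific names need far more rules (hybrids, infraspecific ranks, multiple authors), which is why an existing grammar such as gnparser's is a useful starting point.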

gnparser exists

tdwg17 talk

thanks, Dmitry

but ...

R and Java don't play nice

R and C++ do play nice

I'm learning from gnparser to implement an R/C++ parser

pegax status

not done yet, but

in the meantime, and into the future, you can use the Global Names Resolver as a web service via taxize


the R biodiversity landscape

R packages

* not rOpenSci


pre-print in PeerJ: 10.7287/peerj.preprints.3304v1

R: rgbif

Python: pygbif

Ruby: gbifrb

R/Python more for researchers,
Python/Ruby more for web dev


rgbif: more than 35 citations, for example:

representative rgbif use case

Ludt et al. (2017). A quantitative and statistical biological comparison of three semi-enclosed seas: the Red Sea, the Persian (Arabian) Gulf, and the Gulf of California. Marine Biodiversity 

"... species occurrence data were gathered for major marine phyla ... from geo-referenced specimen data on [GBIF] ... using a polygon search area in ... rgbif"

let us know:

what are we missing?

what problems do you have?
