The Taxonomic and Biodiversity Software Stack in R
Scott Chamberlain (@sckottie/@ropensci)
UC Berkeley - rOpenSci
R intro
growing in popularity
very widespread with biologists
similar to e.g., Ruby, Python, Julia
> 10K packages
a pitch for R
making biodiversity and taxonomy software in R:
R has all parts of digital research workflow
R software means targeting users
meets scientists/etc. where they are, which:
facilitates reproducible science
facilitates open science
rOpenSci?
non-profit
make open-source software for R
large community of developers/users
target science use cases
we often play the role of communicating between scientists and data providers
the R taxonomy landscape
R packages
* not rOpenSci
R taxonomy task view:
taxize
access to more than 20 data sources
consistent interfaces for tasks:
- search
- taxonomic ids
- hierarchy
- up/down-stream
- children
- synonyms
- name resolution
taxize citations
> 50 citations, some examples
representative taxize use case
Leung et al. (2017). A quantitative-PCR based method to estimate ranavirus viral load following normalisation by reference to an ultraconserved vertebrate target. J. Virological Methods.
"retrieve classification hierarchy from the Integrated Taxonomic Information System"
taxa
taxonomic classes for R: taxon IDs, names, ranks, data sources
taxonomic classes w/ & w/o data
manipulate taxonomic classes: grow, prune, combine, calculate, etc.
manipulate taxonomic classes while maintaining link to data
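A minimal sketch of the last point, pruning a classification while the attached data follows along. This is plain Python for illustration only, not the taxa package's API: the `Taxon` and `Taxonomy` classes, the `prune` and `lineage` methods, the taxon IDs, and the counts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Taxon:
    """One node in a classification (hypothetical class, not the taxa API)."""
    taxon_id: str
    name: str
    rank: str
    parent_id: Optional[str] = None


@dataclass
class Taxonomy:
    """Taxa keyed by ID, plus data rows linked to taxa through 'taxon_id'."""
    taxa: dict
    data: list = field(default_factory=list)

    def ancestors(self, taxon_id):
        """IDs from the taxon itself up to the root."""
        ids = []
        current = self.taxa.get(taxon_id)
        while current is not None:
            ids.append(current.taxon_id)
            current = self.taxa.get(current.parent_id)
        return ids

    def lineage(self, taxon_id):
        """Names from the root down to the taxon."""
        return [self.taxa[i].name for i in reversed(self.ancestors(taxon_id))]

    def prune(self, root_id):
        """Keep only root_id and its descendants, and only their data rows."""
        keep = {tid for tid in self.taxa if root_id in self.ancestors(tid)}
        return Taxonomy(
            taxa={tid: self.taxa[tid] for tid in keep},
            data=[row for row in self.data if row["taxon_id"] in keep],
        )


# Made-up IDs and counts, for illustration only.
tx = Taxonomy(
    taxa={
        "mammalia": Taxon("mammalia", "Mammalia", "class"),
        "primates": Taxon("primates", "Primates", "order", "mammalia"),
        "homo_sapiens": Taxon("homo_sapiens", "Homo sapiens", "species", "primates"),
        "carnivora": Taxon("carnivora", "Carnivora", "order", "mammalia"),
        "felis_catus": Taxon("felis_catus", "Felis catus", "species", "carnivora"),
    },
    data=[
        {"taxon_id": "homo_sapiens", "count": 3},
        {"taxon_id": "felis_catus", "count": 7},
    ],
)

primates_only = tx.prune("primates")
```

Pruning to `"primates"` drops the Carnivora branch and its data row in one step, so the data never has to be re-matched to names afterwards.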
taxa as foundation software
idea is to integrate taxa in other software, e.g.:
Thoughts on taxa?
Would love to know what people think.
PEG - Parsing Expression Grammar
define rules (e.g., capture any digit)
combine rules to form a grammar
apply grammar to strings (e.g., taxonomic names)
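The three steps above can be sketched with a few hand-written parsing combinators. This is an illustrative toy in Python, not how pegax or gnparser is implemented: the helper names (`regex`, `seq`, `choice`, `optional`, `parse_name`) are hypothetical, and the grammar only covers a simple "Genus epithet Author, Year" pattern.

```python
import re


def regex(pattern):
    """Rule: match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def rule(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return rule


def seq(*rules):
    """Rule: every sub-rule must match, one after the other."""
    def rule(text, pos):
        values = []
        for r in rules:
            result = r(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return rule


def choice(*rules):
    """Rule: ordered choice, the first sub-rule that matches wins."""
    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None
    return rule


def optional(r):
    """Rule: match r if possible, otherwise succeed without consuming input."""
    def rule(text, pos):
        result = r(text, pos)
        return result if result is not None else (None, pos)
    return rule


# 1. define rules
genus = regex(r"[A-Z][a-z]+")
epithet = regex(r"[a-z]+")
space = regex(r" +")
author = regex(r"[A-Z][A-Za-z.]*")
year = regex(r"[0-9]{4}")          # "capture any digit", four times
separator = choice(regex(r", *"), space)

# 2. combine rules to form a grammar
authorship = seq(space, author, optional(seq(separator, year)))
name = seq(genus, space, epithet, optional(authorship))


# 3. apply the grammar to strings
def parse_name(text):
    """Return the parsed parts as a dict, or None if the whole string does not match."""
    result = name(text, 0)
    if result is None or result[1] != len(text):
        return None
    genus_part, _, epithet_part, auth = result[0]
    parsed = {"genus": genus_part, "epithet": epithet_part,
              "author": None, "year": None}
    if auth is not None:
        parsed["author"] = auth[1]
        if auth[2] is not None:
            parsed["year"] = auth[2][1]
    return parsed
```

For example, `parse_name("Homo sapiens Linnaeus, 1758")` splits the string into genus, epithet, author, and year, while `parse_name("homo sapiens")` returns `None` because the genus rule requires a capital letter.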
thanks Dmitry
but ...
R and Java don't play nice
R and C++ do play nice
I'm learning from gnparser to implement an R/C++ parser
pegax status
not done yet, but
in the meantime, and going forward, can use the Global Names Resolver as a web service via taxize
taxize::gnr_resolve
the R biodiversity landscape
R packages
* not rOpenSci
finch - parse GBIF bulk data
EML - read and create EML
rbison - USGS's BISON
rebird - eBird (see also auk)
rinat - iNaturalist
ALA4R - Atlas of Living Australia (*)
robis - OBIS (*)
spocc - one stop shop (of all above)
scrubr - occ. data cleaning
...
dismo - lots of biodiv. analysis pkgs
GBIF
R/Python more for researchers,
Python/Ruby more for web dev
rgbif
> 35 citations, some examples
representative rgbif use case
Ludt et al. (2017). A quantitative and statistical biological comparison of three semi-enclosed seas: the Red Sea, the Persian (Arabian) Gulf, and the Gulf of California. Marine Biodiversity.
"... species occurrence data were gathered for major marine phyla ... from geo-referenced specimen data on [GBIF] ... using a polygon search area in ... rgbif"
let us know:
what are we missing?
what problems do you have?