rOpenSci & R packages for Biodiversity Analysis


Scott Chamberlain (@sckottie/@ropensci)

UC Berkeley / rOpenSci

Helmsley Foundation

LICENSE: CC-BY 4.0




open science


Open science as a Lego set



open science may be hard to do


but - you can work on different components


and - individual components are worth learning

Open Data


make your data open


funders/journals often require this anyway


future self will thank you

Versioning: code/data/text



failure proofs your work


experiment freely!

Do all work programmatically



from geeksaresexy.net/2012/01/05/geeks-vs-non-geeks-picture


Key to reproducibility


The most important person who wants to reproduce your work is you!



you and yourself

- one week from now

- two months from now

- & so on

An example to shoot for


BAAD blog post 

important scientific programming languages



R language

  • used widely in biology, psychology, medicine, etc.

  • rapidly growing user base, companies surrounding it

  • includes all tools for open science workflow

  • though there is still work to be done ...

Open science ecosystem



rOpenSci
ropensci.org  

rOpenSci does:



           

rOpenSci Staff

ropensci.org/about/#staff
  • 4 full time

  • now including a community manager!

  • leadership team

  • advisory board

Community stats



  • ~ 250 code contributors

  • large number of bug reports/feature requests

  • ~ 364 GitHub repositories

  • ~ 30,000 commits

  • ~ 123 published R packages

What data do you use in your research?



walkthrough ...

the research workflow



Data acquisition    

data manipulation/analysis/viz    

writing    

publish


rOpenSci Tools

https://ropensci.org/packages

We make data-driven stories easier to tell

here are some stories ...

use case 1

McGee, M. D., Borstein, S. R., Neches, R. Y., Buescher, H. H., Seehausen, O., & Wainwright, P. C. (2015). A pharyngeal jaw evolutionary innovation facilitated extinction in Lake Victoria cichlids. Science, 350(6264), 1077–1079 


use case 2

Turner et al. (2015). Adaptive plasticity and niche expansion in an invasive thistle. Ecology and Evolution 

use case 3

Butterfield et al. (2016). Prestoration: using species in restoration that will persist now and into the future. Restoration Ecology. 

rOpenSci Biodiversity Tools

Taxonomy

  • taxize - Taxonomic toolbelt

  • taxizesoap - Taxonomic toolbelt (SOAP)

  • ritis - ITIS client (avail. in taxize)

  • taxizedb - Access to SQL dumps

  • wikitaxa - Taxonomy from Wiki-pedia/-species/-commons

  • worrms - WoRMS client (avail. in taxize)

  • natserv - NatureServe client (avail. in taxize)

  • taxa - Taxonomic classes to be used by other pkgs (coming soon)

Taxonomic IDs


always try to move from:


  • taxonomic name -- to

  • taxonomic ID -- to

  • whatever other data
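
A minimal sketch of the name-to-ID-to-data path above, assuming the taxize package is installed and the ITIS web service is reachable. The helper name `name_to_classification` is hypothetical; `get_tsn()` and `classification()` are taxize functions.

```r
library(taxize)

# Hypothetical helper: taxonomic name -> ITIS TSN -> classification
name_to_classification <- function(name) {
  # Step 1: taxonomic name -> taxonomic ID.
  # rows = 1 keeps the first match instead of prompting interactively
  # when several names match.
  id <- get_tsn(name, rows = 1)

  # Step 2: taxonomic ID -> other data, here the full classification
  classification(id)
}
```

Calling `name_to_classification("Quercus douglasii")` should return a list holding one data frame of names, ranks and IDs, provided ITIS recognizes the name.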

coming soon ...

Species occurrence data

  • spocc - One client to rule them all

  • rgbif - GBIF data (avail. in spocc)

  • AntWeb - AntWeb ant data (avail. in spocc)

  • ecoengine - Berkeley Ecoengine client (avail. in spocc)

  • rinat - iNaturalist client (avail. in spocc)

  • rbison - USGS BISON client (avail. in spocc)

  • rebird - eBird data (avail. in spocc)

  • rvertnet - VertNet data (avail. in spocc)

  • rfishbase - Fishbase.org data

  • rAvis - Proyecto AVIS data
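
A rough sketch of what "one client" looks like in practice, assuming the spocc package is installed and the chosen data source is online. The helper `fetch_occurrences` and its defaults are hypothetical; `occ()` and `occ2df()` are spocc functions.

```r
library(spocc)

# Hypothetical helper: query one or more occurrence sources through spocc
# and return a single combined data frame.
fetch_occurrences <- function(species, sources = "gbif", limit = 10) {
  # occ() sends the same query to every source named in `from`
  res <- occ(query = species, from = sources, limit = limit)

  # occ2df() flattens the per-source results into one data frame
  occ2df(res)
}
```

For example, `fetch_occurrences("Accipiter striatus")` would ask GBIF for up to 10 records; which sources are available depends on the installed spocc version.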

Species occurrence data: other

Biodiversity related data

Occurrence data cleaning

how do you clean your data?



Geospatial

Geospatial: conversion between data/spatial data formats - geojsonio


  • geojson_list - convert to GeoJSON as R list

  • geojson_json - convert to GeoJSON as JSON

  • geojson_read/geojson_write - read/write GeoJSON



from most R object types + many spatial data formats
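
A small sketch of the three conversions listed above, assuming the geojsonio package is installed. The `pts` data frame and the file name are made up for illustration.

```r
library(geojsonio)

# Hypothetical example data: two points with latitude/longitude columns
pts <- data.frame(
  name = c("site_a", "site_b"),
  lat  = c(37.87, 45.52),
  long = c(-122.27, -122.68)
)

# Data frame -> GeoJSON held as a nested R list
as_list <- geojson_list(pts, lat = "lat", lon = "long")

# Data frame -> GeoJSON as a JSON string
as_json <- geojson_json(pts, lat = "lat", lon = "long")

# Write GeoJSON to a temporary file, then read it back in
path <- file.path(tempdir(), "sites.geojson")
geojson_write(pts, lat = "lat", lon = "long", file = path)
back <- geojson_read(path)
```

Naming the `lat`/`lon` columns explicitly avoids relying on geojsonio guessing them from the column names.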

Climate data

  • rnoaa - Lots of NOAA data

  • ccafs - Climate Change, Agriculture, and Food Security (CCAFS) General Circulation Models (GCM) data

  • camsRad - CAMS Radiation service

  • clifro - New Zealand's National Climate Database CliFlo

  • GSODR - NOAA Global Summary Daily Weather Data

  • riem - ASOS data via the Iowa Environmental Mesonet

  • ropenaq - OpenAQ API client

  • rWBclimate - World Bank climate data

Phylogenies

What's too hard?



talk to us


what would you like to see?

what open data is too hard to get?

discussion forum: discuss.ropensci.org

submit a package/review a package: github.com/ropensci/onboarding


scotttalks.info/ossps2



Made w/: reveal.js v3.2.0


Some Styling: Bootstrap v3.3.5


Icons by: FontAwesome v4.4.0