Open Science / Research w/ R featuring rOpenSci
Scott Chamberlain (@sckottie/@ropensci)
UC Berkeley / rOpenSci
Open science as a lego set
open science may be hard to do
but - you can work on different components
and - individual components are worth learning
Open Data
(at least within your organization)
funders/journals often require this anyway
future self will thank you
Versioning: code/data/text
failure proofs your work
experiment freely!
makes collaboration easier
Do all work programmatically
Key to reproducibility
The most important person who wants to reproduce your work is you!
you and yourself
- one week from now
- two months from now
- & so on
important (higher level) scientific programming languages
R language
used widely in biology, psychology, medicine, etc.
rapidly growing user base, companies surrounding it
includes all tools for open science workflow
though work to be done ...
Open science ecosystem
the research workflow
Data acquisition
data manipulation/analysis/viz
writing
publish
but, software sustainability is hard
each panel is a package, each dot a person
rOpenSci software used in
research
within companies
fun side projects
journalism
and more
here are some of the academic research uses
... usually found in methods section of papers
Taxonomic IDs
always try to move from:
taxonomic name -- to
taxonomic ID -- to
whatever other data
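A minimal sketch of the name-to-ID-to-data pattern above, in base R. The lookup tables, the `resolve_ids()` helper, the ID strings, and the `growth_form` column are all made up for illustration; in practice the IDs would come from a taxonomic service.

```r
# Illustrative lookup tables. The species names are real, but the IDs and
# trait values are made-up placeholders, not records from any database.
name_to_id <- data.frame(
  name = c("Quercus robur", "Pinus contorta"),
  taxon_id = c("id-001", "id-002"),
  stringsAsFactors = FALSE
)
id_to_data <- data.frame(
  taxon_id = c("id-001", "id-002"),
  growth_form = c("tree", "tree"),
  stringsAsFactors = FALSE
)

# Step 1: resolve names to IDs; names with no match get NA.
resolve_ids <- function(names, lookup = name_to_id) {
  lookup$taxon_id[match(names, lookup$name)]
}

# Step 2: use the ID, not the name, as the join key for other data.
query <- data.frame(
  name = c("Pinus contorta", "Quercus robur", "Not a taxon"),
  stringsAsFactors = FALSE
)
query$taxon_id <- resolve_ids(query$name)
result <- merge(query, id_to_data, by = "taxon_id", all.x = TRUE)
```

Keeping unmatched names as `NA` rather than dropping them makes it obvious which inputs failed to resolve.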
Genomic Data Retrieval - biomartr Interfaces to:
Geospatial: conversion between data/spatial data formats - geojsonio
geojson_list - convert to GeoJSON as R list
geojson_json - convert to GeoJSON as JSON
geojson_read/geojson_write - read/write GeoJSON
from most R object types + many spatial data formats
geojson workflow
we're trying for a GeoJSON workflow in R, w/o heavy dependencies like GDAL/GEOS - get in touch if you have any interest
NOAA climate data - rnoaa
NCDC API
Severe weather data
Sea ice data
NOAA buoy data
Tornadoes
HOMR - Historical Observing Metadata Repository
Storm data
GHCND FTP data
Global Ensemble Forecast System (GEFS) data
Extended Reconstructed Sea Surface Temperature (ERSST) data
Argo buoys data
NOAA CO-OPS - tides and currents data
NOAA Climate Prediction Center (CPC)
Africa Rainfall Climatology version 2
Wrapping web APIs:
High level concepts
Each pkg is a snowflake: every web API is different
Try to cater to both beginners and power users
Fail fast and fail well: APIs may not do it for you
Pass on curl options! empower your users to:
investigate http request problems
set proxy options (IT often blocks certain sites/ports)
and more
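A rough sketch of "fail fast" and "pass on curl options" in one wrapper function. Everything here is hypothetical: `fake_http_get()` stands in for a real HTTP client and only echoes what it would send, and `get_record()`, its parameters, and the URL are invented for illustration.

```r
# Hypothetical transport function standing in for a real HTTP client;
# it only echoes what it would send, so the sketch runs offline.
fake_http_get <- function(url, query = list(), opts = list()) {
  list(url = url, query = query, opts = opts)
}

# Hypothetical wrapper around one API route.
get_record <- function(id, limit = 10, ...) {
  # Fail fast: check inputs before any request is made.
  if (!is.numeric(id) || length(id) != 1 || is.na(id)) {
    stop("`id` must be a single non-missing number", call. = FALSE)
  }
  if (!is.numeric(limit) || length(limit) != 1 || is.na(limit) || limit < 1) {
    stop("`limit` must be a single positive number", call. = FALSE)
  }
  # Pass on curl options: anything in `...` goes to the HTTP layer untouched.
  fake_http_get(
    url = "https://example.org/api/record",
    query = list(id = id, limit = limit),
    opts = list(...)
  )
}

# A user debugging a request or sitting behind a proxy can set options
# without the package author having anticipated them.
req <- get_record(42, verbose = TRUE, proxy = "http://proxy.example.org:8080")

# Bad input is rejected locally with a clear message.
bad <- tryCatch(get_record("forty-two"), error = function(e) conditionMessage(e))
```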
Example pkg wrapping web API
ritis: client for ITIS taxonomic data
ritis: notes/thoughts
imports: solrium, crul, jsonlite, data.table, tibble
package API: fxns for REST API and Solr API
a possible downside of this package: a lot of functions
return tibbles from all functions
but raw JSON/XML output for those that want it
Solr queries handled by solrium package
Combining many sources into one package
Many into one considerations
Is it really a good idea?
Inputs:
What parameters can be unified across sources?
Allow users to fiddle with source-specific options
Fail consistently across sources if possible
Outputs:
What, if any, outputs can be combined?
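One way the input side could look, as a sketch only. The two source functions, their parameter names, and the `hasCoordinate` option are hypothetical; they return the query they would send so the example runs offline.

```r
# Hypothetical per-source functions, each with its own parameter names.
source_a <- function(scientificName, rows = 10, ...) {
  list(source = "a", scientificName = scientificName, rows = rows, extra = list(...))
}
source_b <- function(q, per_page = 10, ...) {
  list(source = "b", q = q, per_page = per_page, extra = list(...))
}

search_all <- function(query, limit = 10, from = c("a", "b"),
                       a_opts = list(), b_opts = list()) {
  # Fail consistently: the same checks apply whichever source is used.
  if (!is.character(query) || length(query) != 1) {
    stop("`query` must be a single string", call. = FALSE)
  }
  from <- match.arg(from, several.ok = TRUE)
  out <- list()
  # Unified parameters (query, limit) are mapped to each source's own names;
  # source-specific options are passed through untouched.
  if ("a" %in% from) {
    out$a <- do.call(source_a, c(list(scientificName = query, rows = limit), a_opts))
  }
  if ("b" %in% from) {
    out$b <- do.call(source_b, c(list(q = query, per_page = limit), b_opts))
  }
  out
}

res <- search_all("Accipiter striatus", limit = 5,
                  a_opts = list(hasCoordinate = TRUE))
```

Separate `a_opts`/`b_opts` lists keep the unified signature small while still letting users reach options that only one source understands.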
Many into one e.g.: spocc
All 10 sources share common input: taxonomic names
Pagination is similar-ish across sources (requires some source-specific variable mapping)
Geospatial search: accept WKT and bounding boxes, then map to what each source requires
Most can toggle whether to return records that have coordinates or not
Outputs: combine the minimum set of similar fields
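A base R sketch of "combine the minimum set of similar fields". This is not how spocc itself is implemented; the two data frames are made-up records from hypothetical sources, and `combine_common()` is an illustrative helper.

```r
# Made-up records from two hypothetical sources with different column sets.
src_a <- data.frame(
  name = "Accipiter striatus", longitude = -122.3, latitude = 37.9,
  basisOfRecord = "observation", stringsAsFactors = FALSE
)
src_b <- data.frame(
  name = "Accipiter striatus", longitude = -121.7, latitude = 38.5,
  quality_grade = "research", stringsAsFactors = FALSE
)

combine_common <- function(results) {
  # Fields present in every source's output.
  common <- Reduce(intersect, lapply(results, names))
  # Keep only those fields, tag each row with its source, and stack.
  stacked <- Map(function(df, src) {
    sub <- df[, common, drop = FALSE]
    sub$source <- src
    sub
  }, results, names(results))
  out <- do.call(rbind, stacked)
  rownames(out) <- NULL
  out
}

combined <- combine_common(list(a = src_a, b = src_b))
```

Source-specific columns are dropped from the combined table, so the full per-source output should remain available separately for users who need it.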
rOpenSci Software Review
Editors determine whether a package is a fit or not
Editors assign reviewers
Reviewers have ~ 3 weeks
Reviewers and maintainer go back and forth refining pkg
After approval, pkg moved to rOpenSci
A number of e.g.'s of pkgs from government agencies (including Canada)
rOpenSci Software Review
Completely open source tools
Free to run
All reviews/conversations in the open
Reviews are/can be linked to code changes
Paired with journal submission: JOSS and MEE
rOpenSci Onboarding
not sure?
pre-submission inquiry!
Bioconductor Does Open Review too!
talk to us
what would you like to see?
what open data is too hard to get?