A data toolkit for open science
Scott Chamberlain
@recology_
License: CC-BY 3.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.
Science needs to be more...
Why Open?
To increase the pace of science
Most research publicly funded
Why Reproducible?
for yourself!
for others (if reproducible, more confidence in results)
Why Reproducible?
Sort of a moral obligation as a scientist, right?
And to avoid this ->
- Reinhart-Rogoff (Excel)
- Chronic fatigue syndrome-XMRV
Or maybe you just want science to be easier
What tools do we need? What's missing?
Not these!
These!
There's a learning curve, but...
Workflows side by side
| | Our workflow now | What it could be |
|---|---|---|
| Tools | Browser, Excel, SAS, SigmaPlot, Word, EndNote | |
| Cost | $$$$$$$ | 0 |
| Open? | Nope | Yes! |
| Reproducible? | Nope | Yep |
Data is increasingly on the web
Reproducibly plug data from the web into your science
API: Application Programming Interface - APIs are the highways connecting providers to users.
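To make the "highway" idea concrete, here is a minimal sketch of what a request to a web API looks like from R. The helper `build_query_url()` is hypothetical (it is not part of any package), and the GBIF occurrence search endpoint is used only as an illustration. The actual download is left commented out so the sketch runs offline; it assumes the `jsonlite` package if you uncomment it.

```r
# Hypothetical helper: build a query URL from a base URL and a named list
# of parameters, using base R only.
build_query_url <- function(base_url, params) {
  # URL-encode each value so spaces and punctuation are safe in a query string
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_query_url(
  "https://api.gbif.org/v1/occurrence/search",
  list(scientificName = "Danaus plexippus", limit = 5)
)
print(url)

# With network access, the JSON response could then be parsed, for example:
# res <- jsonlite::fromJSON(url)
# head(res$results)
```

Packages such as rgbif wrap this kind of URL building and parsing so that you only deal with R functions and data frames.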
Data Science
Connecting scientists to open data on the web
Data acquisition, data manipulation, data analysis, data visualization - combined with tools for writing in R, including knitr and Markdown/LaTeX - make an open science workflow
DATA ACQUISITION, data manipulation, data analysis, data visualization - combined with tools for writing in R, including knitr and Markdown/LaTeX - make an open science workflow
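The four steps can be sketched in a single script. This is only an illustration in base R: the built-in `airquality` dataset stands in for data that would normally come from a web API, and the model and plot are arbitrary choices made for the example.

```r
# 1. Acquisition: a built-in dataset stands in for data pulled from the web
dat <- datasets::airquality

# 2. Manipulation: keep the columns we need and drop incomplete rows
keep <- c("Ozone", "Temp", "Month")
clean <- dat[complete.cases(dat[, c("Ozone", "Temp")]), keep]

# 3. Analysis: simple linear model of ozone against temperature
fit <- lm(Ozone ~ Temp, data = clean)
slope <- unname(coef(fit)["Temp"])

# 4. Visualization: write a scatter plot with the fitted line to a file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(clean$Temp, clean$Ozone,
     xlab = "Temperature (degrees F)", ylab = "Ozone (ppb)")
abline(fit)
dev.off()
```

Because every step lives in code, rerunning the script regenerates the cleaned data, the model, and the figure.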
Hold up - Why would I want to get data programmatically?
Science is just easier
You can reproduce your work
Others can reproduce your work
## Public Library of Science full text - rplos
```r
library(rplos)
plot_throughtime(list("reproducible science"), 500)
```
![](code/figure/unnamed-chunk-1.png)
## Mapping biodiversity data - rgbif
```r
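library(rgbif)
# Look up the GBIF backbone key for the monarch butterfly
key <- name_backbone(name='Danaus plexippus', kingdom='animals')$speciesKey
# Fetch up to 300 occurrence records for that taxon
out <- occ_search(taxonKey=key, limit=300, return='data')
# Plot the occurrences on a map
gbifmap(out)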
```
![](code/figure/rgbif.png)
## Projected climate data - rWBclimate
```r
library(rWBclimate)
library(ggplot2)
# Historical annual temperatures for four countries (ISO3 codes)
country_dat <- get_historical_temp(c("USA", "MEX", "CAN", "BLZ"), "year")
ggplot(country_dat, aes(x = year, y = data, group = locator)) +
  theme_bw(base_size = 18) +
  geom_point() +
  geom_path() +
  labs(y = "Average annual temperature", x = "Year") +
  stat_smooth(se = FALSE, colour = "black") +
  facet_wrap(~locator, scales = "free")
```
![](code/figure/rWBclimate.png)
## Unified species occurrence data - spocc
```r
library(spocc); library(rCharts)
spnames <- c('Accipiter striatus', 'Setophaga caerulescens',
             'Spinus tristis')
# Query GBIF and BISON for all three species in one call
out <- occ(query=spnames, from=c('gbif','bison'),
           gbifopts=list(georeferenced=TRUE))
head(out$gbif$data)
```
```
## name key longitude latitude prov
## 1 Accipiter striatus Vieillot, 1808 773408845 -97.28 32.876 gbif
## 2 Accipiter striatus Vieillot, 1808 768992325 -76.10 4.724 gbif
## 3 Accipiter striatus Vieillot, 1808 773414146 -122.27 37.771 gbif
## 4 Accipiter striatus Vieillot, 1808 773440541 -98.00 32.800 gbif
## 5 Accipiter striatus Vieillot, 1808 773423188 -76.54 38.688 gbif
## 6 Accipiter striatus Vieillot, 1808 773432602 -122.78 38.613 gbif
```
Visualize data interactively with GitHub
Visualize data interactively with CartoDB
knitr Markdown or LaTeX
executable papers
Xie Y (2012). knitr: A general-purpose package for dynamic report generation in R.
The final paper, compiled w/ knitr from text+code
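A minimal sketch of the text+code idea, assuming the knitr package is installed. The tiny document below is made up for illustration: it is written to a temporary `.Rmd` file and compiled to Markdown, so the number in the output is computed at compile time rather than pasted in by hand.

```r
library(knitr)

# Build the three-backtick chunk delimiter programmatically
fence <- strrep("`", 3)

# A tiny source document mixing prose and an R code chunk
rmd <- c(
  "# A tiny executable paper",
  "",
  "The mean of our measurements is computed when the document is compiled:",
  "",
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)

input <- tempfile(fileext = ".Rmd")
output <- tempfile(fileext = ".md")
writeLines(rmd, input)

# Run the chunk and weave its result into the Markdown output
knit(input, output = output, quiet = TRUE)
compiled <- readLines(output)
cat(compiled, sep = "\n")
```

Change the values in `x`, recompile, and the reported mean updates with them.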
Data acquisition, data manipulation, data analysis, data visualization - combined with tools for writing in R, including knitr and Markdown/LaTeX - make an open science workflow