A data toolkit for open science

Scott Chamberlain

@recology_


License: CC-BY 3.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.

Science needs to be more...



    

 Why Open?



    To increase the pace of science

    Most research publicly funded

 Why Open?

Sharing data increases citations

Piwowar et al.
 Why Reproducible?



    for yourself!

    for others (if reproducible, more confidence in results)

 Why Reproducible?

Sort of a moral obligation as a scientist, right?

And to avoid this ->

      

Reinhart-Rogoff

      

Chronic fatigue syndrome-XMRV


Or maybe you just want science to be easier

What tools do we need? What's missing?

A reproducible workflow

Not these!

These!

There's a learning curve, but...


Workflows side by side

  

Our workflow now

  • Browser
  • Excel
  • SAS
  • SigmaPlot
  • Word
  • EndNote
  

What it could be

  
  

Cost = $$$$$$$

Open? = Nope

Reproducible? = Nope

  

Cost = 0

Open? = Yes!

Reproducible? = Yep

A perfect marriage



     

Data is increasingly on the web

Reproducibly plug data from the web into your science

API: Application Programming Interface - APIs are the highways connecting providers to users.
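In practice, "using an API" from R often comes down to building a URL with query parameters and parsing the response. The sketch below is illustrative only: `build_query_url()` and `fetch_json()` are hypothetical helpers (not part of any rOpenSci package), and the GBIF endpoint and its `scientificName` and `limit` parameters are assumed here as an example of a public data API.

```r
# Build a request URL from a base URL and a named list of query parameters.
# (Hypothetical helper for illustration; base R only.)
build_query_url <- function(base_url, params) {
  encoded <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Assumed example endpoint: the GBIF occurrence search API.
url <- build_query_url(
  "https://api.gbif.org/v1/occurrence/search",
  list(scientificName = "Danaus plexippus", limit = 5)
)
print(url)

# Fetching is one more step. It needs a network connection and the
# jsonlite package, so it is defined here but not called.
fetch_json <- function(url) {
  jsonlite::fromJSON(url)
}
```

Because the request is written down as code, anyone can rerun it and get the data the same way, which is what the rOpenSci packages wrap up behind friendlier functions.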

Data -> Science

Connecting scientists to open data on the web





Data acquisition, data manipulation, data analysis, and data visualization, combined with tools for writing in R (including knitr and Markdown/LaTeX), make an open science workflow



DATA ACQUISITION, data manipulation, data analysis, and data visualization, combined with tools for writing in R (including knitr and Markdown/LaTeX), make an open science workflow

rOpenSci packages

at ropensci.org/packages

        Data

        Literature

        Hybrid

Hold up - Why would I want to get data programmatically?

Science is just easier




You can reproduce your work




Others can reproduce your work

## Public Library of Science full text - rplos

```r
library(rplos)

# Plot how often the search term turns up in PLOS articles through time
plot_throughtime(list("reproducible science"), 500)
```

![](code/figure/unnamed-chunk-1.png)

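## Mapping biodiversity data - rgbif

```r
library(rgbif)

# Look up the monarch butterfly in the GBIF backbone taxonomy and keep its species key
key <- name_backbone(name = 'Danaus plexippus', kingdom = 'animals')$speciesKey

# Fetch up to 300 occurrence records for that key, then map them
out <- occ_search(taxonKey = key, limit = 300, return = 'data')
gbifmap(out)
```

![](code/figure/rgbif.png)
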
## Projected climate data - rWBclimate

```r
library(rWBclimate)
library(ggplot2)

# Historical annual temperatures for four countries (ISO3 codes)
country_dat <- get_historical_temp(c("USA", "MEX", "CAN", "BLZ"), "year")

# One panel per country, each with its own axis scales
ggplot(country_dat, aes(x = year, y = data, group = locator)) +
  theme_bw(base_size = 18) +
  geom_point() +
  geom_path() +
  labs(y = "Average annual temperature", x = "Year") +
  stat_smooth(se = FALSE, colour = "black") +
  facet_wrap(~locator, scales = "free")
```

![](code/figure/rWBclimate.png)

## Unified species occurrence data - spocc

```r
library(spocc)
library(rCharts)

# Query two data sources at once for three bird species;
# ask GBIF for georeferenced records only
spnames <- c('Accipiter striatus', 'Setophaga caerulescens', 'Spinus tristis')
out <- occ(query = spnames, from = c('gbif', 'bison'),
           gbifopts = list(georeferenced = TRUE))
head(out$gbif$data)
```

```
##                                name       key longitude latitude prov
## 1 Accipiter striatus Vieillot, 1808 773408845    -97.28   32.876 gbif
## 2 Accipiter striatus Vieillot, 1808 768992325    -76.10    4.724 gbif
## 3 Accipiter striatus Vieillot, 1808 773414146   -122.27   37.771 gbif
## 4 Accipiter striatus Vieillot, 1808 773440541    -98.00   32.800 gbif
## 5 Accipiter striatus Vieillot, 1808 773423188    -76.54   38.688 gbif
## 6 Accipiter striatus Vieillot, 1808 773432602   -122.78   38.613 gbif
```

Various plotting options

              
           

Visualize data interactively with GitHub


Visualize data interactively with CartoDB






this is powerful





R writing

knitr Markdown or LaTeX
executable papers

Xie Y (2012). knitr: A general-purpose package for dynamic report generation in R.
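To make "executable paper" concrete, here is a minimal sketch of what knitr does: it runs the code chunks in a text document and weaves the results back into the prose. The tiny document below is made up for illustration, and it is passed in through `knit()`'s `text` argument only so the example fits in one block; a real paper would live in its own `.Rmd` file.

```r
library(knitr)

# Three backticks, built programmatically so this example nests cleanly
fence <- strrep("`", 3)

# A tiny R Markdown document: one sentence of prose plus one R chunk
doc <- c(
  "The mean of our three measurements is:",
  "",
  paste0(fence, "{r}"),
  "mean(c(2, 4, 9))",
  fence
)

# knit() executes the chunk and returns Markdown with the result included
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

Change the numbers in the chunk, knit again, and the reported result updates with them, so the text and the analysis cannot drift apart.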

An example

Piwowar paper on GitHub at https://github.com/hpiwowar/citation11k

The final paper, compiled w/ knitr from text+code





Data acquisition, data manipulation, data analysis, and data visualization, combined with tools for writing in R (including knitr and Markdown/LaTeX), make an open science workflow

rOpenSci on the web: http://ropensci.org/


This talk on the interwebs: http://recology.info/talks/montreal


Montreal Software Carpentry 2-day Bootcamp

Made w/ reveal.js

Icons by: FontAwesome v4.0.3