rOpenSci

Cultivating open and reproducible science


Scott Chamberlain (@recology_)

UC Berkeley

recology.info/talks/uga

Supported by: Alfred P. Sloan Foundation

These data are hard to get

We need...



Cultural barriers



    Lack of incentives (carrots)

    Lack of pressure (sticks)

    Getting scooped

    Takes too much time!

Instructions for preparation of the Biographical Sketch have been revised to rename the "Publications" section to "Products" and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights.




Issuance of a new NSF Proposal & Award Policies and Procedures Guide (October 4th)

The Data Policy states that the ‘minimal dataset’ consists “of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. This does not mean that authors must submit all data collected as part of the research, but that they must provide the data that are relevant to the specific analysis presented in the paper.”


PLOS Editorial and Publishing Policies

 Why Open?




Most research publicly funded



 Why Open?



To increase the pace of science



 Why Open?

Sharing data increases citations

Piwowar et al. 2007

Openness facilitates reproducibility

 Why Reproducible?



For yourself!



 Why Reproducible?


To avoid mistakes



      

Reinhart-Rogoff (Excel spreadsheet error)

Chronic fatigue syndrome-XMRV

What's needed?



 A reproducible workflow

Not these!

These!

There's a learning curve, but...


Workflows side by side

  

Our workflow now

  • Browser
  • Excel
  • SAS
  • SigmaPlot
  • Word
  • Endnote

  Cost = $$$$$$$

  Open? = Nope

  Reproducible? = Nope

What it could be

  Cost = 0

  Open? = Yes!

  Reproducible? = Yep

A perfect marriage



     

Data is increasingly on the web



API: Application Programming Interface





  



Reproducibly plug data from the web into your science
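
To make "plugging data from the web" concrete, here is a minimal sketch in Python of what a scripted API request looks like. It is not code from the talk or from any rOpenSci package (those are written in R). The endpoint and parameter names assume GBIF's public occurrence search API, the helper names `build_query_url` and `extract_coordinates` are hypothetical, and the sample response is made up so the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: GBIF's public occurrence search (not named in the talk).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_query_url(base_url, **params):
    """Return base_url plus a query string, with parameters sorted by name
    so the same request always produces the same URL."""
    return base_url + "?" + urlencode(sorted(params.items()))


def extract_coordinates(response_text):
    """Pull (latitude, longitude) pairs out of a JSON response body,
    skipping records that lack coordinates."""
    records = json.loads(response_text).get("results", [])
    return [
        (r["decimalLatitude"], r["decimalLongitude"])
        for r in records
        if "decimalLatitude" in r and "decimalLongitude" in r
    ]


# Made-up response for illustration only; a real one has many more fields.
SAMPLE_RESPONSE = json.dumps({
    "results": [
        {"species": "Danaus plexippus",
         "decimalLatitude": 33.95, "decimalLongitude": -83.37},
        {"species": "Danaus plexippus"},  # no coordinates, so it is skipped
    ]
})

url = build_query_url(GBIF_SEARCH, scientificName="Danaus plexippus", limit=2)
coords = extract_coordinates(SAMPLE_RESPONSE)
print(url)
print(coords)
```

Because the query lives in a script rather than in a series of browser clicks, anyone can rerun it and see exactly which request produced the data.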






How rOpenSci got started




Formed from an ad hoc conversation over Twitter. Now a worldwide community of researchers.

http://ropensci.org/community




Data acquisition    

data manipulation/analysis/viz    

writing    

publish





rOpenSci packages

ropensci.org/packages

 Data


 

 Literature


 

 Altmetrics


 

 Publishing


Code demos



    The diversity of projects

    How easy it is


    Image examples at scale




 Data

rgbif

rentrez - an R client for ENTREZ


David Winter

taxize
all things taxonomy



Ben Marwick, Ed, Scott Chamberlain, Karthik Ram

and many more...

Unified species occurrence data - spocc

Various plotting options

              
           

Visualize data interactively with GitHub


Visualize data interactively with CartoDB






this is kinda cool




 Literature




 Altmetrics

article-level metrics

There are a lot of altmetrics out there

Canonical altmetrics document

Altmetrics data are often open






 Publishing

EML (Ecological Metadata Language)


EML provides a common structure for data, to better enable ecologists to document, share, and interpret ecological data.



The EML standard enables data integration at the machine level (with little or no human intervention).


Read more about EML

Publishing outlets



  • Figshare

  • Zenodo

  • DataONE

  • Various journals

  • More...




Data acquisition    

data manipulation/analysis/viz    

writing    

publish

Community building

Why community building?


The rOpenSci Community

http://ropensci.org/community

Community stats



  • 71 contributors

  • 148 GitHub repositories

  • > 9,000 commits over ~3 years

  • a few pkgs with ~900 commits

  • ~66 R packages

Domains



  • Ecology



  • History

  • Archeology

  • More...

Training







rOpenSci Ambassador Program



Give talks


Code demos

rOpenSci brings to the table:



    Bridge between scientists and data providers

    Visibility

    Consistency

    Quality

    Conversation/community

    Stability

In closing...


This is all about making science better


rOpenSci is just one vehicle with which to help


License: CC-BY 3.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.


rOpenSci on the web: http://ropensci.org/



This talk on the web: http://recology.info/talks/uga







Made w/ reveal.js


Icons by: FontAwesome v4.2.0