Cultivating open and reproducible science
Scott Chamberlain (@recology_)
UC Berkeley
recology.info/talks/uga
These data are hard to get
Cultural barriers
Lack of incentives (carrots)
Lack of pressure (sticks)
Getting scooped
Takes too much time!
The Data Policy states that the ‘minimal dataset’ consists “of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. This does not mean that authors must submit all data collected as part of the research, but that they must provide the data that are relevant to the specific analysis presented in the paper.”
PLOS Editorial and Publishing Policies
Why Open?
Most research is publicly funded
To increase the pace of science
Openness facilitates reproducibility
Why Reproducible?
For yourself!
To avoid mistakes
- Reinhart-Rogoff (Excel)
- Chronic fatigue syndrome-XMRV
Not these!
These!
There's a learning curve, but...
Workflows side by side
Our workflow now
- Browser
- Excel
- SAS
- SigmaPlot
- Word
- Endnote
Cost = $$$$$$$
Open? = Nope
Reproducible? = Nope
What it could be
Cost = 0
Open? = Yes!
Reproducible? = Yep
Data is increasingly on the web
API: Application Programming Interface
Reproducibly plug data from the web into your science
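A minimal sketch of what "reproducibly plugging data from the web" can look like, assuming the NCBI E-utilities search endpoint as the example API. Python is used here only for illustration (the tools in this talk are R packages), and the helper names `build_query_url` and `fetch_json` and the search term are hypothetical. The point is that the request lives in a script, so anyone can rerun exactly the same query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint, used here only as an example API.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query_url(base_url, params):
    """Return a deterministic URL: the same parameters always give the same string."""
    # Sorting the parameters keeps the URL stable regardless of dict ordering.
    return base_url + "?" + urlencode(sorted(params.items()))


def fetch_json(url, timeout=30):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The query is written down in code rather than clicked through in a browser.
# The search term is a made-up example.
query = {"db": "pubmed", "term": "open science", "retmode": "json", "retmax": 5}
url = build_query_url(ESEARCH, query)
print(url)

# To hit the live service, call: fetch_json(url)
```

Keeping the URL construction separate from the download makes the query itself easy to record alongside the results, even when the live service is unavailable.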
Data acquisition
Data manipulation/analysis/viz
Writing
Publish
rOpenSci packages
- Data
- Literature
- Altmetrics
- Publishing
Code demos
The diversity of projects
How easy it is
Image examples at scale
rentrez - an R client for ENTREZ
taxize
all things taxonomy
and many more...
Unified species occurrence data - spocc
Visualize data interactively with GitHub
Visualize data interactively with CartoDB
Altmetrics data are often open
EML
EML (Ecological Metadata Language) provides a common structure for data, to better enable ecologists to document, share, and interpret ecological data.
The EML standard enables data integration at the machine level (with little or no human intervention).
Read more about EML
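To illustrate what "integration at the machine level" means, here is a rough sketch that reads a dataset title and creator out of a heavily simplified, EML-like fragment using only the Python standard library. The fragment is not validated against the EML schema, and the dataset title, creator name, `packageId`, and the `summarize` helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# A simplified, EML-like fragment. All values are hypothetical.
EML_FRAGMENT = """\
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example.1.1" system="example">
  <dataset>
    <title>Hypothetical plant survey</title>
    <creator>
      <individualName>
        <givenName>Jane</givenName>
        <surName>Doe</surName>
      </individualName>
    </creator>
  </dataset>
</eml:eml>
"""


def summarize(eml_text):
    """Extract the dataset title and creator names from an EML-like document."""
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")
    title = dataset.findtext("title")
    creators = [
        " ".join(filter(None, [name.findtext("givenName"), name.findtext("surName")]))
        for name in dataset.findall("creator/individualName")
    ]
    return {"title": title, "creators": creators}


summary = summarize(EML_FRAGMENT)
print(summary)
```

Because the metadata sits in predictable elements, the same few lines work on any document that follows the same structure, with no human reading required.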
Publishing outlets
Figshare
Zenodo
DataONE
Various journals
More...
Domains
Ecology
History
Archeology
More...
rOpenSci Ambassador Program
Give talks
Code demos
rOpenSci brings to the table:
Bridge between scientists and data providers
Visibility
Consistency
Quality
Conversation/community
Stability
In closing...
This is all about making science better
rOpenSci is just one vehicle with which to help
License: CC-BY 3.0 - You are free to copy, share, adapt, or remix, photograph, film, or broadcast, blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components.