Open and reproducible science with R


Scott Chamberlain

UC Berkeley / rOpenSci

Helmsley Foundation

LICENSE: CC-BY 4.0




open science/research


open science is badly needed

Retractions


science should be reproducible!


but doing it for real is another issue

100 psychology studies

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 

Emergent findings




open data can make a new finding possible

Cultural barriers



    Lack of incentives (carrots)

    Lack of pressure (sticks)

    Getting scooped

    Takes too much time!

Open science as a lego set



open science may be hard to do


but - you can work on different components


and - individual components are useful on their own

you don't need to do it all at once

Open science components

Open Data


make your data open


funders/journals often require this anyway


future self will thank you

Open science components

Open Data: Venues


  • Include data with publications
  • Data specific repositories
  • Code sharing sites: e.g., GitHub
  • so-called Institutional Repositories (IRs)

Open science components

Open Access


make your papers open


funders often require this anyway


talk to your librarians!

Open science components

Open Access: Preprints


Preprints increasingly allowed by publishers


more and more preprint outlets

SSRN*, SocArXiv, PsyArXiv


talk to your librarians!

*: think twice maybe

Open science components

Open Access: Green OA


Allowed to put up your "author's copy" on your website, etc.


the internet will surface it

Open science components

Versioning



Including basically all research components:

  • Code
  • Data
  • Metadata
  • Text: manuscripts

Open science components

Why use Versioning?


  • failure-proofs your work
  • allows you to experiment freely!

git and R help 
Open science components

Versioning: Git
Resources


Open science components

Do all work programmatically



from geeksaresexy.net/2012/01/05/geeks-vs-non-geeks-picture
Open science components

Do all work programmatically


Key to reproducibility:


The most important person who wants to reproduce your work is you!

Open science components

Do all work programmatically



you and yourself

- one week from now

- two months from now

- & so on

Open science components

Do all work programmatically



allows others to:

- contribute to your work

- check your work

- build on top of your work

scientific programming languages

are:
the canvas on which to do science

important scientific programming languages





R language


R homepage  

  • used widely in biology, psychology, medicine, etc.

  • rapidly growing user base, companies surrounding it

  • includes all the tools for an open science workflow

  • salaries for R skills are up there (1, 2)

Open/Rep. Science w/ R

What's the most important thing about R with respect to open/reproducible science?


R itself -> you're programming!

Tools


Workflows


  • A script (i.e., a .R file)

  • Script + Text = Markdown/LaTeX (e.g., journal article)

  • Any files + Dropbox

  • Any files + versioning (git)

  • Any files + versioning (git) + Pandoc

What to aim for


  • Do as much as possible in code

  • Version control all products

  • Combine text and code together

  • Share/open up your work
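
A minimal sketch of what "do as much as possible in code" can look like in a single script. The simulated data, the output location (`out_dir`), and the file names are assumptions for illustration, not part of the talk's demo.

```r
# Fix the random seed so the simulated data are identical on every run
set.seed(42)

# Simulated measurements stand in for real data (made-up values)
dat <- data.frame(
  group = rep(c("a", "b"), each = 10),
  value = rnorm(20)
)

# Compute summaries in code rather than by hand in a spreadsheet
group_means <- aggregate(value ~ group, data = dat, FUN = mean)

# Write results and session details to plain-text files,
# which are easy to put under version control and share
out_dir <- tempdir()  # hypothetical output location
write.csv(group_means, file.path(out_dir, "group_means.csv"), row.names = FALSE)
writeLines(capture.output(sessionInfo()), file.path(out_dir, "session_info.txt"))
```

Rerunning the script regenerates every output from scratch, so nothing depends on manual steps.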

workflow demo

Open science ecosystem



rOpenSci

ropensci.org  

rOpenSci does:



           

rOpenSci staff


ropensci.org/about/#staff

  • 4 full time

  • now including a community manager!

  • leadership team

  • advisory board

the research workflow



Data acquisition    

data manipulation/analysis/viz    

writing    

publish

rOpenSci makes data-driven stories easier to tell

here are some stories ...

use case 1

Lovelace, R., Goodman, A., Aldred, R., Berkoff, N., Abbas, A., & Woodcock, J. (2015). The Propensity to Cycle Tool: An open source online system for sustainable transport planning. arXiv preprint 

http://pct.bike 

stplanr R package 

use case 2

Serfass, D. G., & Sherman, R. A. (2015). Situations in 140 Characters: Assessing Real-World Situations on Twitter. PLoS ONE 



ropensci/gender 

use case 3: OKMaps

openknowledgemaps.org 

Wrap Up


  • Open science is essential

  • Open science tools are useful on their own

  • rOpenSci: one of the tool makers


  • Challenges going forward

    • Largely cultural - will slowly change

Wrap Up


  • rOpenSci is a community project

  • Let us know what you need

  • Help us make better tools

Questions?


scotttalks.info/uofo



Made w/: reveal.js v3.2.0


Some Styling: Bootstrap v3.3.5


Icons by: FontAwesome v4.4.0

rOpenSci Tools




Data Publication | Data Access | Literature | Altmetrics | Scalable & Reproducible Computing | Databases | Data Visualization | Image Processing | Data Tools | Taxonomy | HTTP tools | Geospatial | Data Analysis

rOpenSci Literature Tools


Public Library of Science


using rplos, we can access metadata and full text for any PLOS article


install rplos like this:

install.packages("rplos")


example demo

Exercise


  1. Create a .Rmd file
  2. Use rplos to get full text for 1000 articles
    - calculate the number of authors per article
    - make a simple plot of authors per article
  3. Render the .Rmd to .html
  4. Send the .Rmd version to your partner via email
  5. Render the .Rmd file you received
  6. Does your .html look the same?
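
One possible sketch of the counting and plotting steps, runnable offline. `count_authors` is a hypothetical helper (not part of rplos), and the `articles` data frame holds made-up records standing in for search results. It assumes authors arrive as one semicolon-separated string per article; check the rplos documentation for the actual shape of the output.

```r
# Hypothetical helper: count authors in a semicolon-separated author field
count_authors <- function(author_field) {
  pieces <- strsplit(author_field, ";", fixed = TRUE)
  # count the non-empty names in each article's author list
  vapply(pieces, function(x) sum(nzchar(trimws(x))), integer(1))
}

# Made-up stand-in records so the sketch runs without network access.
# With rplos installed, something like
#   rplos::searchplos(q = "*:*", fl = c("id", "author"), limit = 1000)$data
# may give a similar table (an assumption to verify against the docs).
articles <- data.frame(
  id = c("doi-a", "doi-b", "doi-c"),
  author = c("A. One; B. Two", "C. Three", "D. Four; E. Five; F. Six"),
  stringsAsFactors = FALSE
)

articles$n_authors <- count_authors(articles$author)

# Simple plot: how many articles have each author count
barplot(table(articles$n_authors),
        xlab = "authors per article", ylab = "number of articles")
```

Placed in a chunk of the .Rmd file, this should render the same plot on your partner's machine as on yours, which is the point of the exercise.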

rOpenSci Tools




Data Publication | Data Access | Literature | Altmetrics | Scalable & Reproducible Computing | Databases | Data Visualization | Image Processing | Data Tools | Taxonomy | HTTP tools | Geospatial | Data Analysis

rOpenSci Geospatial Tools


using openadds, get addresses for Lane County


using leaflet, visualize the locations on a map

Exercise


  1. Create a .Rmd file
  2. Use openadds to get a data.frame of addresses, then leaflet to visualize the map
  3. Render the .Rmd to .html
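
A rough sketch of the mapping step. The `addresses` data frame below is a made-up placeholder with approximate coordinates near Eugene, Oregon, not real openadds output; the column names `lon` and `lat` are assumptions. The leaflet calls are skipped if the package is not installed.

```r
# Placeholder address table (made-up rows, not openadds output)
addresses <- data.frame(
  address = c("Hypothetical address 1", "Hypothetical address 2"),
  lon = c(-123.09, -123.02),
  lat = c(44.05, 44.06),
  stringsAsFactors = FALSE
)

m <- NULL
if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(addresses)   # start a map widget from the data
  m <- leaflet::addTiles(m)          # add the default OpenStreetMap tiles
  m <- leaflet::addMarkers(m, lng = ~lon, lat = ~lat, popup = ~address)
}
# Print `m` (or leave it as the last line of an .Rmd chunk) to show the map
```

In the exercise, replace the placeholder table with the data.frame you get from openadds, adjusting the column names to match.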