Open science and R


Scott Chamberlain (@sckottie/@ropensci)

UC Berkeley / rOpenSci

Helmsley Foundation

LICENSE: CC-BY 4.0




open science


open science is badly needed

Retractions


science should be reproducible!


but doing it for real is another matter

100 psychology studies

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 

Emergent findings




open data can make a new finding possible

Open science as a lego set


open science may be hard to do


but - you can work on different components


and - individual components are useful on their own

Open Data


make your data open


funders and journals often require this anyway


your future self will thank you

Open Access


make your papers open


funders often require this anyway


talk to your librarians!

Versioning: code/data/text


failure-proofs your work


experiment freely!


git and R help 

Do all work programmatically



from geeksaresexy.net/2012/01/05/geeks-vs-non-geeks-picture


Key to reproducibility


The most important person who wants to reproduce your work is you!


you and yourself

- one week from now

- two months from now

- & so on
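
A minimal sketch of what "programmatic" can look like, assuming a small made-up dataset (the `site`/`length_mm` columns and all function names are hypothetical, not from any real study). It is written in Python here; the same pattern applies to an R script. Every step from raw data to result lives in code, so rerunning it next week or in two months gives the same answer.

```python
import csv
import io
import statistics

# Hypothetical raw measurements; in practice this would be a data file
# kept under version control alongside the script.
RAW_CSV = """site,length_mm
A,12.1
A,11.8
B,
B,14.3
B,13.9
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with a missing measurement and convert lengths to float."""
    return [
        {"site": r["site"], "length_mm": float(r["length_mm"])}
        for r in rows
        if r["length_mm"].strip()
    ]


def mean_by_site(rows):
    """Return the mean length per site, rounded to 2 decimals."""
    groups = {}
    for r in rows:
        groups.setdefault(r["site"], []).append(r["length_mm"])
    return {site: round(statistics.mean(vals), 2) for site, vals in groups.items()}


def run_analysis(text=RAW_CSV):
    """Run the whole pipeline: load, clean, summarise."""
    return mean_by_site(clean(load_rows(text)))


if __name__ == "__main__":
    print(run_analysis())
```

No step is done by hand in a spreadsheet, so the cleaning rule (drop rows with a missing length) is documented by the code itself.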

Wellcome Trust



N=583 (N=259 ESRC)


Wellcome Trust: Open Access


The OA part of open science is held back by impact factors



“As much as I love the idea, my long term career prospects currently depend on obtaining high impact papers, so fully Open Access journals have to be of comparable merit.”

Wellcome Trust: Open Data


"The majority of respondents make datasets available as open access (80%), 19% make data available upon request via an application procedure, 10% restrict access to immediate collaborators and 9% restrict access to registered users."


No!!!

Wellcome Trust: Open Code


"only 12% ... indicated they had a bad experience when sharing code ... BUT the majority of ESRC-funded respondents did not recognise any personal benefits from code sharing activities"

scientific programming languages





are the canvas on which to do science

important scientific programming languages





Jupyter Notebooks


reproducing a Jupyter notebook


something similar in R: R Markdown


rmarkdown.rstudio.com 

R language


R homepage  

  • used widely in biology, psychology, medicine, etc.

  • rapidly growing user base, companies surrounding it

  • includes all tools for open science workflow

  • though there is still work to be done ...

Open science ecosystem


rOpenSci

ropensci.org  

rOpenSci does:



           

rOpenSci staff


ropensci.org/about/#staff

  • 4 full time

  • now including a community manager!

  • leadership team

  • advisory board

rOpenSci stats



  • ~ 250 code contributors

  • ~ 343 GitHub repositories

  • ~ 30,000 commits

  • ~ 117 published R packages

the research workflow



Data acquisition    

data manipulation/analysis/viz    

writing    

publish

We make data-driven stories easier to tell

here are some stories ...

use case 1

McGee, et al. (2015). A pharyngeal jaw evolutionary innovation facilitated extinction in Lake Victoria cichlids. Science 



fishbase.org 

use case 2

Serfass, D. G., & Sherman, R. A. (2015). Situations in 140 Characters: Assessing Real-World Situations on Twitter. PLoS ONE 



ropensci/gender 

use case 3: OKMaps

openknowledgemaps.org 

use case 4: mining gene ontology labels

goldi R package 

using our R package pdftools

use case 5: plant pathogens explained by taxonomic similarity

Bufford, et al. (2016). Taxonomic similarity, more than contact opportunity, explains novel plant-pathogen associations between native and alien taxa. New Phytologist 




taxonomic data cleaning with our R package taxize

Wrap Up


  • Open science is essential

  • Open science tools are useful on their own

  • rOpenSci: one of the tool makers


  • Challenges going forward

    • Largely cultural - will slowly change

Wrap Up


  • rOpenSci is a community project

  • Let us know what you need

  • Help us make better tools


scotttalks.info/ossps



Made w/: reveal.js v3.2.0


Some Styling: Bootstrap v3.3.5


Icons by: FontAwesome v4.4.0