Software & best practices to facilitate open science


Scott Chamberlain (@sckottie/@ropensci)

UC Berkeley / rOpenSci

Helmsley Foundation

LICENSE: CC-BY 4.0




software tools


Software tools are important

Neil Chue Hong. 2014. We are the 92% 

  • nearly all researchers use software
  • majority say research impossible without software
  • software is widespread even in less traditionally technical fields
  • software use is moving towards open source
  • most are self-trained coders
  • most researchers DO NOT stay in academia

Software is not cited

Howison, J., & Bullard, J. (2015). How is software visible in the scientific literature. Technical report, Univ. of Texas. 

  • 37% of mentions involve formal citations

  • 28% provide any version information

  • 20% of the software mentioned is inaccessible in any form

  • 20% is available as source code form with permission to modify

Incentives are for:

no citations needed

  • Papers

  • Grants

  • More papers

  • And more papers

  • maybe teaching



so, software is essentially not even on the list

how do we change incentives?



are there lessons from history?

Resistance to change


  • Some academics tend towards non-open tools (because that's what they know?)

  • Convincing PIs is difficult for the same reason, and because they're busy

  • Reach out to graduate students -> they can change from below

culture clash: open source software vs. academic culture


Despite the importance of software in the life sciences, many researchers still tend to regard the labor involved in producing software as a service or support activity, which though instrumental to the main goal of producing research claims, does not constitute a research contribution in itself


Levin, N., & Leonelli, S. (2016). How Does One “Open” Science? Questions of Value in Biological Research. Science, Technology & Human Values, 0162243916672071. 

Gov't Funders: All the new are belong to us!


  • Funders want new findings - that is, new software

  • Few ways to fund infrastructure sustainably

  • Universities don't want to pay salaries for software engineers

Evidence of above: vibrant non-profit culture surrounding academia funded by private foundations

There are hopeful signs though ...

Para-academic guild


rOpenSci ImpactStory jupyter

open source R/Python taking over




[^1]: O'Reilly Data Science Salary Survey
[^2]: Muenchen, R. A. (2012). The popularity of data analysis software. URL http://r4stats.com/popularity.

We're taking a look at reproducibility
... it can be low

Estimating the reproducibility of psychological science. 

Open science can be done!
but still hard ...

The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database. 

... lessons learned:

  • Script everything / use source everything

  • Establish a data-processing pipeline

  • Version control (git) / code sharing site

  • Embrace openness
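
As a rough illustration of the "script everything" and "data-processing pipeline" lessons above, here is a minimal Python sketch. It is not the BAAD code: the dataset, column names, and function names are all hypothetical, chosen only to show raw data flowing through small scripted steps with no manual editing.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data standing in for one contributed dataset.
RAW_CSV = """species,height_m
Acacia,12.5
Acacia,
Eucalyptus,30.0
Eucalyptus,34.0
"""


def load(text):
    """Step 1: read the raw rows exactly as contributed."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: drop rows with a missing height and convert types."""
    return [
        {"species": r["species"], "height_m": float(r["height_m"])}
        for r in rows
        if r["height_m"].strip()
    ]


def summarise(rows):
    """Step 3: compute the mean height per species."""
    by_species = {}
    for r in rows:
        by_species.setdefault(r["species"], []).append(r["height_m"])
    return {species: mean(heights) for species, heights in by_species.items()}


def run_pipeline(text):
    """Run every step in order, so the result can be regenerated from raw data."""
    return summarise(clean(load(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because each step is a plain function kept under version control, anyone can rerun the whole chain from the raw file and see exactly which rows were dropped and why.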


source code: dfalster/baad

Open science ecosystem


open-science-ecosystem

rOpenSci

rOpenSci does:



           

rOpenSci origin



formed from ad-hoc conversations over blogs/Twitter

driven by the need to make research easier/more reproducible

now a worldwide community

rOpenSci Staff

ropensci.org/about/#staff
  • 4 full time

  • now including a community manager!

  • leadership team

  • advisory board

rOpenSci Community

https://ropensci.org/community
ropensci community

Community stats



  • 250 code contributors

  • 343 GitHub repositories

  • 30,000 commits

  • a few pkgs with >1,000 commits

  • 113 published R packages

the research workflow




Data acquisition    

data manipulation/analysis/viz    

writing    

publish





rOpenSci Tools

https://ropensci.org/packages

use case 1

McGee, M. D., Borstein, S. R., Neches, R. Y., Buescher, H. H., Seehausen, O., & Wainwright, P. C. (2015). A pharyngeal jaw evolutionary innovation facilitated extinction in Lake Victoria cichlids. Science, 350(6264), 1077–1079 


use case 2

Serfass, D. G., & Sherman, R. A. (2015). Situations in 140 Characters: Assessing Real-World Situations on Twitter. PLoS ONE, 10(11), e0143051 


use case 3: OKMaps

okmaps

Software Best Practices

  • continuous integration

  • unit testing

  • consistent style (within reason)

  • thorough documentation (can always be better though)

  • DRY code


  • teach these to community
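
A minimal Python sketch of two of the practices above, unit testing and DRY code. The function names and the temperature example are hypothetical and are not taken from any rOpenSci package; the same idea applies in R with a testing package.

```python
import unittest


def to_celsius(fahrenheit):
    """Single shared conversion, so the formula lives in one place (DRY)."""
    return (fahrenheit - 32) * 5 / 9


def convert_all(readings):
    """Reuse to_celsius instead of repeating the formula inline."""
    return [to_celsius(f) for f in readings]


class TestConversion(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(to_celsius(32), 0.0)

    def test_boiling_point(self):
        self.assertAlmostEqual(to_celsius(212), 100.0)

    def test_convert_all_keeps_length(self):
        self.assertEqual(len(convert_all([32, 212])), 2)


if __name__ == "__main__":
    unittest.main()
```

A continuous integration service would typically run a test suite like this on every push, so a regression shows up before users hit it.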

best practices for many small pieces?


  • a lot of existing practices are built around large projects

  • hard to keep track of all the projects

  • bigger chance of reinventing wheels

  • many small projects make it easy for people to get involved

how to make a resilient contributor base?


  • be really nice!!!

  • > 1 contributor

  • direct access to dedicated contribs

  • contributor diversity

  • turn users into contribs

Make software work easier: incentives


  • make software count for tenure/grad school

  • more funding opportunities

  • more jobs making software in academia

Make software work easier: remove barriers


  • make the path from academia to industry easy

  • provide software training

rOpenSci Software Review



  • Completely open source tools

  • Free to run

  • All reviews/conversations in the open

  • Reviews are/can be linked to code changes

  • Paired with submission to a journal - JOSS

Journal of Open Source Software


Software Review: Rising Tide Raises All Boats



People really like software review



what it looks like


ropensci/onboarding#43

incentives: getting credit for reviews

opencage rOpenSci package


author: Maëlle Salmon

Software Review: Spread it Around


we're experimenting with how to package up the workflow and make it easy to deploy

let us know if you have any feedback on the process

similar groups



rOpenGov
ropengov
Algorithms for Computational Social Science and Digital Humanities

Wrap Up

  • Software is important for (open) science

  • But software is not appreciated

  • Para-academics/rOpenSci model for long-term software sustainability

Wrap Up: rOpenSci

  • rOpenSci: software best practices, community based

  • rOpenSci: software review

  • Moving forward

    • rOpenSci: expand to more disciplines

    • use rOpenSci model in other disciplines


scotttalks.info/phos16



Made w/: reveal.js v3.2.0


Some Styling: Bootstrap v3.3.5


Icons by: FontAwesome v4.4.0