Working with Scholarly Text w/ rOpenSci Tools
Scott Chamberlain (@sckottie)
UC Berkeley / rOpenSci
What kinds of questions can we ask?
Does the number of authors per article increase through time?
Do p-values differ, on average, by journal impact factor?
How does the length of methods sections change through time?
How does the use of the word ___ vary through time?
How does code sharing vary by journal/discipline/etc.?
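As a rough illustration of the word-use question above, here is a minimal sketch in plain Python rather than any rOpenSci package. It assumes the full text has already been retrieved as (year, text) pairs. The function name `term_share_by_year` and the toy corpus are hypothetical, made up purely for illustration.

```python
import re
from collections import defaultdict


def term_share_by_year(articles, term):
    """Return {year: fraction of word tokens equal to `term`}.

    `articles` is an iterable of (year, text) pairs; this input shape
    is an assumption for the sketch, not a format defined by the talk.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    target = term.lower()
    for year, text in articles:
        # Crude tokenizer: lowercase runs of letters, digits, apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for tok in tokens if tok == target)
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Made-up toy corpus, not real article text.
articles = [
    (2010, "We fit a linear model to the data"),
    (2010, "The model was fit by hand"),
    (2018, "Bayesian model and model code are shared online"),
]

shares = term_share_by_year(articles, "model")
for year, share in shares.items():
    print(year, round(share, 3))
```

Normalizing by total tokens per year, rather than reporting raw counts, keeps the trend from simply tracking how much text is available for each year.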
scholarly text data flow
tabulizer example
how open will publishers be moving forward?
full text
metadata, including references
Open Citations!
OCC: Open Citations Corpus
As of March 12, 2018, the OCC has ingested the references from 302,758 citing bibliographic resources
and contains information about 12.8 million citation links to 6.5 million cited resources.