class: center

# Phylogeny Based Biodiversity Data Queries

.center[
## Scott Chamberlain
### UC Berkeley - rOpenSci

<img src="img/icon_lettering_color.svg" alt="ropensci" width="600" />
]

---
class: middle

.center[
<img src="img/helmsley.png" alt="helmsley foundation" width="1000" />

## [scotttalks.info/tdwg18](https://scotttalks.info/tdwg18)

<br>

## LICENSE: [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/)
]

---

# What is rOpenSci?

## * non-profit
## * make open-source software for R
## * software review
## * community of developers/users
## * target science use cases
## * staff communicating between scientists and data providers

---
background-image: url("img/phylogeny.png")
class: bottom

<!-- via [ggtree tutorial](https://www.bioconductor.org/packages/release/bioc/vignettes/ggtree/inst/doc/treeVisualization.html) -->

<p style="background-color: white; width: 250px; font-size: 30px;">via <a href="https://www.bioconductor.org/packages/release/bioc/vignettes/ggtree/inst/doc/treeVisualization.html">ggtree tutorial</a></p>

---
background-image: url("img/biodiv.png")
class: bottom

<p style="background-color: white; width: 250px; font-size: 30px;">via <a href="http://api.gbif.org/v2/map/debug/">GBIF map API demos</a></p>

---
background-image: url("img/lit.png")
class: bottom

<p style="background-color: #EB45F2; width: 600px; font-size: 30px;">lots of papers combining phylogenies and biodiversity</p>

---
class: center

# How do we quickly link the two?

<br><br>

## Ideally we want:

<br><br>

## users to be able to ask A LOT of questions VERY QUICKLY

---

# phylodiv - an R package

## source code: [ropensci/phylodiv](https://github.com/ropensci/phylodiv)

---

# phylodiv - an R package

<br>

## builds on top of ...

## - [taxizedb][] for taxonomy data
## - [rgbif][] for biodiversity data
## - [rgbif][]/[raster][]/[ggplot2][]/[ggtree][]/[patchwork][]/etc. for mapping

---

# High level workflow

```
read phylogeny
gather higher taxonomy
phylogenetic query
gather biodiversity data
visualization
...
history
```

---

# High level workflow

```
read phylogeny            -> pd_read()
gather higher taxonomy    -> pd_taxa()
phylogenetic query        -> pd_query()
gather biodiversity data  -> pd_biodiv()
visualization             -> pd_viz()
...
history                   -> pd_meta()
```

---

# High level workflow

```
read phylogeny            -> pd_read()
gather higher taxonomy    -> pd_taxa()
phylogenetic query        -> pd_query()
gather biodiversity data  -> pd_biodiv()
visualization             -> pd_viz()
...
history                   -> pd_meta()
```

<br>

rapid iteration between phylogenetic queries and visualization

<img src="img/iterate.png" alt="iterate" width="200" />

---
class: center, middle

<span style="font-size: 100px">LIVE DEMO</span>

---
background-image: url("img/sad.png")
class: center, top

# this will be easy, right?

---

# hard parts

<!-- especially when you combine the various ways one can ask questions w/ the various options for data sources and vis tools -->

<br>

## Taxonomic names
## Queries on trees
## Biodiversity data
## Visualization

<!-- but before we get into that, let's try it -->

---

# Taxonomy

### * web or local data access (local is faster, but requires more setup 😬)
### * can we label nodes programmatically?
  * just talked to Emily yesterday 😮 ¿annotate via Open Tree of Life?

<!-- ### * does it even make sense to collect higher taxonomic names for tips and query based on these higher names? -->

### * higher taxonomic names for taxa w/o names?
### * leverage [taxa][] internally? (see [ropensci/taxa#184](https://github.com/ropensci/taxa/issues/184) for discussion)

---

# Queries

### * how do we make queries on trees drop-dead simple __AND__ flexible?

  * all tips within node A
  * node A vs. node B
  * node A vs. node B and node C
  * node A of tree 1 vs. node C of tree 2
  * node D of each of 100 trees (bla! how do you viz. that?)
  * _Helianthus annuus_ vs. _Helianthus angustifolius_
  * all names starting with _Helianthus_
  * compare subtrees that match shape X

---

# Queries

### formula syntax?

```r
A ~ .
A ~ B
A ~ B + C
t1.A ~ t2.C
D ~ t*
```

### function calls?

```r
descendants(A)
children(A) ~ descendants(Y)
tip(A) ~ tip(B)
tips(Helianthus*)
tip(A, tree = 1) ~ tip(W, tree = 78)
```

---

# Biodiversity

<!-- ### * multiple use cases, some fast, some slow -->
<!-- ### * what happens when query contains extinct taxa? -->

### * many levels of data
  - total counts
  - total counts faceted by variable (e.g., country)
  - complete occurrence data
  - visually summarized data (rasters)

### * strike a balance between bringing as much data as possible to bear on the problem & speed
### * make dealing with web requests easier: rate limits, caching, etc.
### * consistent queries across data sources: [spocc][] has already done the hard work - may integrate

---

# Maps

<!-- ### * some options very fast, some very slow -->

### * how best to do visualizations for single trees vs. many trees?
### * so many customizations possible
  - can only give a very simplified subset + hopefully allow customizations on top (easy-ish with `ggplot2`/`ggtree`)

<img src="img/facet.png" width="250"> <img src="img/count.png" width="175"> <img src="img/raster.png" width="175">

---

# What do you want to see?

### * is <span style="color: blue">phylodiv</span> a bad or good idea?
### * what are your main pain points in linking phylogenies to biodiversity data?
### * do you really need a GUI? or is a programmatic interface okay? is it worth it if you get greater reproducibility?
### * data sources: which ones do you trust? not everyone is likely to trust the same source 😏

---

# Future work

--

<img src="img/fixit.gif" width="1000">

---

# Future work

## * focus on ease of use first, performance later
## * make more taxonomic databases work locally
## * make queries easy
## * can [phyloreferencing][] help at all here?
## * maybe break up into a few pkgs

---
class: middle

.center[
## [scotttalks.info/tdwg18](https://scotttalks.info/tdwg18)

## Made w/: [xaringan](https://github.com/yihui/xaringan)
]

[rgbif]: https://github.com/ropensci/rgbif
[taxizedb]: https://github.com/ropensci/taxizedb
[taxa]: https://github.com/ropensci/taxa
[ggplot2]: https://github.com/tidyverse/ggplot2
[patchwork]: https://github.com/thomasp85/patchwork
[ggtree]: https://www.bioconductor.org/packages/release/bioc/html/ggtree.html
[raster]: https://cran.r-project.org/package=raster
[phyloreferencing]: https://www.phyloref.org/
[spocc]: https://github.com/ropensci/spocc
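
---

# Extra: prototyping the formula syntax

This is a minimal base-R sketch of how the formula-style queries from the Queries slide could be parsed and resolved to tips. It is not phylodiv's API. It makes three assumptions:

* the tree is a hand-built parent/child edge table, laid out like `ape`'s `phylo$edge`;
* clade names such as `A` are attached to internal nodes;
* `tips_within()`, `node_for()` and `parse_query()` are hypothetical helpers.

The toy tree and its tip names are illustrative only.

```r
# Hypothetical sketch, not phylodiv's API: a toy tree stored as a
# parent/child edge table, the same layout ape uses for phylo$edge.
# Tips are numbered 1-4, internal nodes 5-7 (5 is the root).
edges <- rbind(
  c(5, 6), c(5, 7),   # root splits into clade A (node 6) and clade B (node 7)
  c(6, 1), c(6, 2),   # clade A holds tips 1 and 2
  c(7, 3), c(7, 4)    # clade B holds tips 3 and 4
)
tip_labels  <- c("Helianthus annuus", "Helianthus angustifolius",
                 "Quercus alba", "Quercus rubra")
node_labels <- c("5" = "root", "6" = "A", "7" = "B")

# All tip labels descended from a node (a tip returns its own label)
tips_within <- function(node) {
  kids <- edges[edges[, 1] == node, 2]
  if (length(kids) == 0) return(tip_labels[node])
  unlist(lapply(kids, tips_within))
}

# Look up the node number carrying a clade name such as "A"
node_for <- function(label) {
  as.integer(names(node_labels)[node_labels == label])
}

# Split a query formula: left side = focal clade(s), right side = comparison clade(s)
parse_query <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  list(focal = all.vars(f[[2]]), versus = all.vars(f[[3]]))
}

# "node A vs. node B": resolve both sides of A ~ B to their tips
q <- parse_query(A ~ B)
clades <- c(q$focal, q$versus)
result <- lapply(setNames(clades, clades), function(lbl) tips_within(node_for(lbl)))
str(result)
```

R formulas arrive already parsed. `f[[2]]` is the left-hand side, and `all.vars()` pulls the names out of either side, so `A ~ B + C` needs no custom parser. Wildcard forms such as `D ~ t*` are not valid R expressions, so they would need a separate mechanism, for example string patterns.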