
Phylogeny Based Biodiversity Data Queries

Scott Chamberlain

UC Berkeley - rOpenSci

ropensci

1 / 24

What is rOpenSci?

* non-profit

* make open-source software for R

* software review

* community developers/users

* target science use cases

* staff communicating between scientists and data providers

3 / 24

via ggtree tutorial

4 / 24

lots of papers combining phylogenies and biodiversity

6 / 24

How do we quickly link the two?

Ideally, we want users to be able to ask A LOT of questions VERY QUICKLY

7 / 24

phylodiv - an R package

source code: ropensci/phylodiv

8 / 24

phylodiv - an R package


builds on top of ...

- taxizedb for taxonomy data

- rgbif for biodiversity data

- rgbif/raster/ggplot2/ggtree/patchwork/etc. for mapping

9 / 24

High level workflow

read phylogeny -> pd_read()
gather higher taxonomy -> pd_taxa()
phylogenetic query -> pd_query()
gather biodiversity data -> pd_biodiv()
visualization -> pd_viz()
... history -> pd_meta()


rapid iteration between phylogenetic queries and visualization

iterate
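phylodiv itself is an R package, and these slides don't say how pd_meta() records its history. As a language-neutral illustration only, here is a minimal Python sketch, assuming each step returns a new, immutable session that appends its own name and arguments to a history log. `Session`, `step`, `read_tree` and `run_query` are hypothetical names, not phylodiv functions, and the step bodies are stand-ins.

```python
from dataclasses import dataclass, field, replace
from functools import wraps
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Session:
    """Hypothetical immutable state passed from one workflow step to the next."""
    data: Dict[str, Any] = field(default_factory=dict)
    history: Tuple[Tuple[str, Dict[str, Any]], ...] = ()

def step(name: str):
    """Decorator: run a step on a copy of the data and log (name, kwargs) in the history."""
    def wrap(fn: Callable[..., Dict[str, Any]]):
        @wraps(fn)
        def inner(session: Session, **kwargs: Any) -> Session:
            new_data = fn(dict(session.data), **kwargs)  # shallow copy leaves the old session intact
            return replace(session, data=new_data,
                           history=session.history + ((name, dict(kwargs)),))
        return inner
    return wrap

@step("read")
def read_tree(data: Dict[str, Any], path: str) -> Dict[str, Any]:
    data["tree_path"] = path  # stand-in for actually parsing a tree file
    return data

@step("query")
def run_query(data: Dict[str, Any], formula: str) -> Dict[str, Any]:
    data["query"] = formula  # stand-in for evaluating a phylogenetic query
    return data

# Iterate: branch two different queries off the same session.
base = read_tree(Session(), path="tree.nwk")
first = run_query(base, formula="A ~ B")
second = run_query(base, formula="A ~ B + C")
print(second.history)
```

Because every step returns a new session, trying another query never clobbers the previous one, and the accumulated history is the kind of record something like pd_meta() could expose for reproducibility.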

12 / 24

LIVE DEMO

13 / 24

this will be easy, right?

14 / 24

hard parts


Taxonomic names

Queries on trees

Biodiversity data

Visualization

15 / 24

Taxonomy

* web or local data access (local faster, but requires more setup 😬)

* can we label nodes programmatically?

  • just talked to Emily yesterday 😮 annotate via OpenTree of Life?

* can we add higher taxonomic names to taxa without names?

* leverage the taxa package internally? (see ropensci/taxa#184 for discussion)
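One possible way to label an internal node automatically is to take the lowest taxon that every descendant tip shares in its classification. The Python sketch below assumes each tip already has a classification (ordered from highest to lowest rank, with the same ranks at the same positions), for example pulled from a local taxonomy database. It is not how phylodiv or taxizedb does it, and `shared_label` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# A classification: ordered (rank, name) pairs, highest rank first.
Classification = List[Tuple[str, str]]

def shared_label(classifications: List[Classification]) -> Optional[Tuple[str, str]]:
    """Return the lowest (rank, name) pair common to all classifications, or None."""
    label = None
    # zip() walks the ranks in parallel and stops at the shortest classification
    for level in zip(*classifications):
        if all(pair == level[0] for pair in level):
            label = level[0]
        else:
            break  # stop at the first rank where the tips disagree
    return label

annuus = [("family", "Asteraceae"), ("genus", "Helianthus"), ("species", "Helianthus annuus")]
angustifolius = [("family", "Asteraceae"), ("genus", "Helianthus"), ("species", "Helianthus angustifolius")]
bellis = [("family", "Asteraceae"), ("genus", "Bellis"), ("species", "Bellis perennis")]

print(shared_label([annuus, angustifolius]))  # ('genus', 'Helianthus')
print(shared_label([annuus, bellis]))         # ('family', 'Asteraceae')
```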

16 / 24

Queries

* how do we make queries on trees drop-dead simple AND flexible?

  • all tips within node A
  • node A vs. node B
  • node A vs. node B and node C
  • node A of tree 1 vs. node C of tree 2
  • node D of each of 100 trees (bla! how do you visualize that?)
  • Helianthus annuus vs. Helianthus angustifolius
  • all names starting with Helianthus
  • compare subtrees that match shape X
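To make the first few query types concrete, here is a minimal Python sketch over a toy tree: "all tips within node A", "node A vs. node B", and "all names starting with Helianthus" (as a shell-style wildcard). `Node`, `find`, `tips`, `tips_matching` and `compare` are hypothetical names chosen for illustration, not phylodiv's API, and a real tree would come from a parser rather than being built by hand.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

@dataclass
class Node:
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def find(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for the first node carrying the given label."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

def tips(node: Node) -> List[str]:
    """Labels of all tips below node, left to right (a tip returns itself)."""
    if not node.children:
        return [node.name] if node.name else []
    return [label for child in node.children for label in tips(child)]

def tips_matching(node: Node, pattern: str) -> List[str]:
    """Tip labels matching a case-sensitive shell-style wildcard such as 'Helianthus*'."""
    return [t for t in tips(node) if fnmatchcase(t, pattern)]

def compare(tree: Node, *labels: str) -> Dict[str, List[str]]:
    """'node A vs. node B': map each requested node label to the tips it contains."""
    groups: Dict[str, List[str]] = {}
    for label in labels:
        node = find(tree, label)
        if node is None:
            raise KeyError(f"no node labelled {label!r}")
        groups[label] = tips(node)
    return groups

# Toy tree: ((Helianthus annuus, Helianthus angustifolius)A, (Bellis perennis, Aster alpinus)B)root
tree = Node("root", [
    Node("A", [Node("Helianthus annuus"), Node("Helianthus angustifolius")]),
    Node("B", [Node("Bellis perennis"), Node("Aster alpinus")]),
])

print(tips(find(tree, "A")))
print(tips_matching(tree, "Helianthus*"))
print(compare(tree, "A", "B"))
```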
17 / 24

Queries

formula syntax?

A ~ .
A ~ B
A ~ B + C
t1.A ~ t2.C
D ~ t*

function calls?

descendants(A)
children(A) ~ descendants(Y)
tip(A) ~ tip(B)
tips(Helianthus*)
tip(A, tree = 1) ~ tip(W, tree = 78)
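Whichever syntax wins, the first job is turning a string into a structured query. A minimal Python sketch of parsing the formula style, assuming `~` separates the two sides, `+` joins terms, and a `.` inside a term separates a tree id from a node label (so `t1.A` means node A of tree t1). What a bare `.` or a wildcard like `t*` should match is left to whatever evaluates the query; `Term`, `Query` and `parse_query` are hypothetical names.

```python
from typing import List, NamedTuple, Optional

class Term(NamedTuple):
    tree: Optional[str]  # "t1" in "t1.A"; None means "the current tree"
    node: str            # a node label, "." or a wildcard; meaning is up to the evaluator

class Query(NamedTuple):
    lhs: List[Term]
    rhs: List[Term]

def parse_term(text: str) -> Term:
    text = text.strip()
    if not text:
        raise ValueError("empty term in formula")
    if text != "." and "." in text:
        tree, node = text.split(".", 1)  # assumes node labels themselves contain no "."
        if not node:
            raise ValueError(f"missing node label in {text!r}")
        return Term(tree or None, node)
    return Term(None, text)

def parse_query(formula: str) -> Query:
    """Parse e.g. 'A ~ B + C' or 't1.A ~ t2.C' into left- and right-hand terms."""
    if formula.count("~") != 1:
        raise ValueError("formula must contain exactly one '~'")
    lhs, rhs = formula.split("~")
    return Query(lhs=[parse_term(t) for t in lhs.split("+")],
                 rhs=[parse_term(t) for t in rhs.split("+")])

for f in ["A ~ .", "A ~ B", "A ~ B + C", "t1.A ~ t2.C", "D ~ t*"]:
    print(f, "->", parse_query(f))
```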
18 / 24

Biodiversity

* many levels of data

  • total counts
  • total counts faceted by variable (e.g., country)
  • complete occurrence data
  • visually summarized data (rasters)

* strike a balance between bringing as much data as possible to bear on the problem and speed

* make web requests easier to deal with: rate limits, caching, etc.

* consistent queries across data sources: spocc has already done the hard work, so we may integrate it
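Caching and rate limiting can both be wrapped around any fetch function. The sketch below, in Python for illustration, assumes a single-process, in-memory cache and a fixed minimum interval between network calls. `CachedRateLimitedClient` is a hypothetical name, the fetch function is a stand-in for a real request (e.g. an occurrence count), and the interval is a placeholder, not any data provider's documented limit.

```python
import time
from typing import Any, Callable, Dict, Optional

class CachedRateLimitedClient:
    """Hypothetical wrapper: memoize responses and space out real network calls."""

    def __init__(self, fetch: Callable[[str], Any], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._fetch = fetch
        self._min_interval = min_interval  # placeholder seconds between calls
        self._clock = clock                # injectable so the spacing logic can be tested
        self._sleep = sleep
        self._cache: Dict[str, Any] = {}
        self._last_call: Optional[float] = None
        self.network_calls = 0

    def get(self, query: str) -> Any:
        if query in self._cache:
            return self._cache[query]  # cache hit: no request and no waiting
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)  # respect the minimum spacing between requests
        self._last_call = self._clock()
        result = self._fetch(query)  # an exception here propagates and nothing is cached
        self.network_calls += 1
        self._cache[query] = result
        return result

def fake_count(name: str) -> int:
    """Stand-in for a real request, e.g. an occurrence count for a species name."""
    return len(name)

client = CachedRateLimitedClient(fake_count, min_interval=0.5)
for name in ["Helianthus annuus", "Helianthus annuus", "Bellis perennis"]:
    client.get(name)
print(client.network_calls)  # 2: the repeated name came from the cache
```

A persistent on-disk cache would survive between R sessions, which matters more for reproducible analyses; this in-memory version only shows the control flow.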

19 / 24

Maps

* how best to visualize a single tree vs. many trees

* so many customizations are possible: we can only offer a simplified subset, hopefully while still allowing customization on top (easy-ish with ggplot2/ggtree)

20 / 24

What do you want to see?

* is phylodiv a bad or good idea?

* what are your main pain points in linking phylogenies to biodiversity data?

* do you really need a GUI? or is a programmatic interface okay? is it worth it if you get greater reproducibility?

* data sources: which ones do you trust? not everyone is likely to trust the same source 😏

21 / 24

Future work

* focus on ease of use first, performance later

* make more taxonomic databases work locally

* make queries easy

* can phyloreferencing help at all here?

* maybe break up into a few packages

23 / 24