Taxonomic 'data' exchange as expression and synthesis of phylogenetic claims. Jonathan A. Rees, National Evolutionary Synthesis Center. iEvoBio, 25 June 2014

DESCRIPTION

'Slides' from a 5-minute presentation at iEvoBio 2014.


Page 1

Taxonomic 'data' exchange as expression and synthesis of phylogenetic claims

Jonathan A. Rees

National Evolutionary Synthesis Center

iEvoBio, 25 June 2014

Page 2

Synergy

CoL IRMNG NCBI GBIF EOL Union4 Treebase OpenTree...

Finding inconsistencies = good, but hard

Collecting information is useful

Page 3

'Data' – BAH!

'data' 'information' 'representation' 'format' 'nomenclature' - how bland. Distracting.

Claims, not data. Consequential.

Page 4

Terminology

Taxon: a set determined by a membership rule. ['taxon concept']

Character based

Descent based

Conspecificity based

Taxonomy: a collection of taxa that form a hierarchy.

Some taxonomies are phylogenetic (all clades).

Page 5

Taxonomies are collections of claims

X
  A
  B
  C

X includes A, B, and C

A, B, C are mutually disjoint

X, A, B, and C are clades - if phylogenetic.
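The reading of a hierarchy as a collection of claims can be sketched in a few lines of Python. This is only an illustration: the nested-dict layout, the tuple encoding of claims, and the function name `claims_from_tree` are assumptions made here, not something the slides specify.

```python
def claims_from_tree(name, children, phylogenetic=False):
    """Yield the claims implied by one node and, recursively, by its subtree.

    `children` maps each child name to that child's own dict of children
    (an empty dict for a tip). The encoding is hypothetical.
    """
    if phylogenetic:
        yield ("clade", name)
    for child in children:
        yield ("includes", name, child)
    if len(children) > 1:
        # Siblings in a hierarchy are claimed to be mutually disjoint.
        yield ("disjoint",) + tuple(children)
    for child, grandchildren in children.items():
        yield from claims_from_tree(child, grandchildren, phylogenetic)


# The four-taxon example from this slide.
claims = list(claims_from_tree("X", {"A": {}, "B": {}, "C": {}}, phylogenetic=True))
for claim in claims:
    print(claim)
```

For the small example this produces three inclusion claims, one disjointness claim, and four clade claims.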

Page 6

The important claims are about biology

X includes Y

X1, X2, X3, … are mutually disjoint

X is a clade

X is a species

Page 7

We have to designate taxa somehow, when we express a claim

Many taxon names are polysemous

To be clear, always say 'in the sense of' some static document (article or database snapshot)

X = Mammalia sensu http://dx.doi.org/10.1126/science.1211028

If used multiple ways in some document, give further qualification

Claims about taxa
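One way to make 'name plus sensu' concrete is a small immutable record, sketched below. The class name `Sensu` and its field names are assumptions for illustration. The two source strings are the ones that appear in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensu:
    """A taxon designator: a name as used in one static document."""
    name: str
    source: str          # article DOI or database snapshot label
    qualifier: str = ""  # further qualification if the source uses the name several ways

    def __str__(self):
        extra = f" ({self.qualifier})" if self.qualifier else ""
        return f"{self.name} sensu {self.source}{extra}"


mammalia_article = Sensu("Mammalia", "http://dx.doi.org/10.1126/science.1211028")
mammalia_ncbi = Sensu("Mammalia", "NCBI.20140515")

# Same name, different designators: whether they corefer is a separate claim.
print(mammalia_article == mammalia_ncbi)
print(mammalia_ncbi)
```

Because the record is frozen and hashable, designators can be used directly as dictionary keys or set members when collecting claims.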

Page 8

Reasoning with claims

X includes Y and Y includes Z → X includes Z

X includes Y → X and Y are not disjoint

X and Y are clades → one includes the other, or they are disjoint
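The three rules above can be exercised with a minimal sketch, assuming inclusion claims are stored as (parent, child) pairs and disjointness claims as pairs. The function names are hypothetical, and the taxon names in the usage lines are just convenient labels.

```python
from itertools import combinations


def includes_closure(includes):
    """Rule 1: the transitive closure of a set of (parent, child) pairs."""
    closure = set(includes)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2} - closure
        if not new:
            return closure
        closure |= new


def contradictions(includes, disjoint_pairs):
    """Rule 2: pairs claimed disjoint although one includes the other."""
    closure = includes_closure(includes)
    return {(a, b) for (a, b) in disjoint_pairs
            if (a, b) in closure or (b, a) in closure}


def unresolved_clade_pairs(clades, includes, disjoint_pairs):
    """Rule 3: clade pairs for which neither inclusion nor disjointness is known yet."""
    closure = includes_closure(includes)
    disjoint = {frozenset(p) for p in disjoint_pairs}
    return [(a, b) for a, b in combinations(sorted(clades), 2)
            if (a, b) not in closure and (b, a) not in closure
            and frozenset((a, b)) not in disjoint]


includes = {("Mammalia", "Primates"), ("Primates", "Homo")}
print(("Mammalia", "Homo") in includes_closure(includes))
print(contradictions(includes, {("Homo", "Mammalia")}))
print(unresolved_clade_pairs({"Mammalia", "Primates", "Rodentia"},
                             includes, {("Primates", "Rodentia")}))
```

The last call reports the pair for which the stated claims do not yet say which branch of rule 3 applies.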

Page 9

Two ways to be wrong

Wrong about designation

Wrong about science

Page 10

'Alignment' = estimating coreference

Alignment claims:

X = Y (X and Y are the same taxon)

Mammalia sensu http://dx.doi.org/10.1126/science.1211028 = Mammalia sensu NCBI.20140515

Heuristics based on properties and relations (including names...)

Manual 'curation' if necessary
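As a rough illustration of a property-based heuristic, the sketch below compares the sets of member names recorded for two same-named taxa from different sources. Everything in it is an assumption: the record layout, the Jaccard score, the 0.5 threshold, and the placeholder names and sources. The slides do not say which heuristics are actually used.

```python
def jaccard(a, b):
    """Overlap of two collections as |intersection| / |union| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def propose_alignment(rec_x, rec_y, accept=0.5):
    """Suggest 'same', 'notSame', or 'curate' for two taxon records."""
    if rec_x["name"] != rec_y["name"]:
        return "curate"  # this sketch only handles the shared-name case
    if not rec_x["members"] or not rec_y["members"]:
        return "curate"  # nothing to compare, leave it to a person
    score = jaccard(rec_x["members"], rec_y["members"])
    if score >= accept:
        return "same"
    if score == 0.0:
        return "notSame"
    return "curate"


# Placeholder records: names, sources, and members are invented for illustration.
x = {"name": "Aus", "source": "S1", "members": {"m1", "m2", "m3"}}
y = {"name": "Aus", "source": "S2", "members": {"m1", "m2", "m4"}}
z = {"name": "Aus", "source": "S3", "members": {"m7", "m8"}}

print(propose_alignment(x, y))
print(propose_alignment(x, z))
```

The output of such a heuristic is itself only a proposed alignment claim, which manual curation can confirm or override.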

Page 11

Incertae sedis

Confusing.

X is incertae sedis in A means

(1) A includes X

(2) it's not known which of A's non-incertae-sedis 'children' X belongs to, if any

(2) is not a claim about biology.

Logical content = (1).
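A minimal sketch of how this might play out when turning a parent and its children into claims. It rests on one reading of point (2), labeled here as an assumption: since X may still belong to one of its siblings, no disjointness is asserted between X and them. The function name and record layout are hypothetical.

```python
def claims_for_parent(parent, children):
    """Translate a parent and its children into claims.

    `children` maps each child name to True if it is flagged incertae sedis.
    The flag contributes no claim of its own; it only withholds disjointness.
    """
    claims = [("includes", parent, child) for child in children]
    placed = [child for child, flagged in children.items() if not flagged]
    if len(placed) > 1:
        claims.append(("disjoint",) + tuple(placed))
    return claims


claims = claims_for_parent("A", {"B": False, "C": False, "X": True})
for claim in claims:
    print(claim)
```

Here X appears only in an inclusion claim, matching 'Logical content = (1)'.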

Page 12

'Data exchange'

Taxonomies - NP

Page 13

Exchanging 'corrections'

'Rozella belongs in Fungi.'

'Rhodophyceae is the same as Rhodophyta.'

'SILVA's Morganella isn't the same as Index Fungorum's Morganella.'

'Anolis isn't a clade unless Norops is merged into it.'

Page 14

Interpreting advice

“Rozella is in Fungi.”

Rozella sensu SILVA115 and Fungi sensu SILVA115 belong to a clade disjoint from the other SILVA115 children of Nucletmycea.

How about let's apply the label 'Fungi' to such a clade and not to Fungi sensu SILVA115.

Page 15

Notation not so important, but for example -

includes(X, Y)

disjoint(A, B, C, …)

clade(X)

node(X, A, B, C, …) - abbreviation

species(X)

same(X, Y) notSame(X, Y)

sensu('Name', source)

+ nomenclatural claims
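To show that the notation is easy to handle mechanically, here is a small sketch that renders tuple-encoded claims in this surface syntax and expands the node(...) abbreviation. The slide does not spell out what node(...) abbreviates, so the expansion into inclusion and disjointness claims is an assumption, as are the tuple encoding and the function names.

```python
def render(term):
    """Render a claim or designator tuple in the notation of this slide."""
    if isinstance(term, str):
        return term
    head, *args = term
    if head == "sensu":
        name, source = args
        return f"sensu('{name}', {source})"
    return f"{head}({', '.join(render(a) for a in args)})"


def expand_node(x, *children):
    """Assumed expansion of node(X, A, B, C, ...) into primitive claims."""
    claims = [("includes", x, child) for child in children]
    if len(children) > 1:
        claims.append(("disjoint", *children))
    return claims


for claim in expand_node("X", "A", "B", "C"):
    print(render(claim))

alignment = ("same",
             ("sensu", "Mammalia", "http://dx.doi.org/10.1126/science.1211028"),
             ("sensu", "Mammalia", "NCBI.20140515"))
print(render(alignment))
```

Keeping claims as plain tuples means the same structures can be stored, compared, and fed to whatever reasoning is applied, while the rendered form stays close to what a person would write.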

Page 16

On and on

Synthesis

Identifier stability

Alignment details

Compare 'macrotaxonomy' and 'microtaxonomy'

Defense of scruffy

Compare Rod's github proposal

Philosophy of language

Page 17

Separate science from nomenclature.

Use logic to do science.

Always use names with sensu.

Use heuristics to prevent paralysis.

Don't 'represent data' – express claims!

https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Expressing-phylogenetic-claims

Bottom line

Page 18

Ack

Nico Franz, David Thau, Rod Page

Open Tree: Karen Cranston, Stephen Smith, Mark Holder, and legions of others

Gerald Jay Sussman

Jonathan A. Rees 2014

Copyright waived CC0 1.0