Taxonomic 'data' exchange as
expression and synthesis of
phylogenetic claims
Jonathan A. Rees
National Evolutionary Synthesis Center
iEvoBio, 25 June 2014
Synergy
CoL IRMNG NCBI GBIF EOL Union4 Treebase OpenTree...
Finding inconsistencies = good but hard
Collecting information is useful
'Data' – BAH!
'data' 'information' 'representation' 'format' 'nomenclature' - how bland. Distracting.
Claims, not data. Consequential.
Terminology
Taxon: a set determined by a membership rule. ['taxon concept']
Character based
Descent based
Conspecificity based
Taxonomy: a collection of taxa that form a hierarchy.
Some taxonomies are phylogenetic (all clades).
Taxonomies are collections of claims
X
A
B
C
X includes A, B, and C
A, B, C are mutually disjoint
X, A, B, and C are clades - if phylogenetic.
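The node above can be unpacked mechanically into its separate claims. A minimal sketch, assuming claims are plain tuples and reading 'mutually disjoint' as pairwise disjoint; the function name `node_claims` is made up for illustration.

```python
from itertools import combinations

def node_claims(parent, children, phylogenetic=False):
    """Expand one parent/children node into individual claims (tuples)."""
    # The parent includes each child.
    claims = [("includes", parent, child) for child in children]
    # Siblings are pairwise disjoint.
    claims += [("disjoint", a, b) for a, b in combinations(children, 2)]
    # In a phylogenetic taxonomy every taxon is also claimed to be a clade.
    if phylogenetic:
        claims += [("clade", taxon) for taxon in [parent, *children]]
    return claims

claims = node_claims("X", ["A", "B", "C"], phylogenetic=True)
for claim in claims:
    print(claim)
```

One small node already carries ten claims: three inclusions, three disjointness claims, and four clade claims.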
The important claims are about biology
X includes Y
X1, X2, X3, … are mutually disjoint
X is a clade
X is a species
We have to designate taxa somehow, when we express a claim
Many taxon names are polysemous
To be clear, always say 'in the sense of' some static document (article or database snapshot)
X = Mammalia sensu http://dx.doi.org/10.1126/science.1211028
If used multiple ways in some document, give further qualification
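One way to make 'name plus sensu' concrete is a small immutable record. This is only a sketch: the class name `Sensu` and the optional `qualifier` field (for a name used more than one way within a single source) are assumptions, not an established format. The source label NCBI.20140515 stands for a dated database snapshot.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Sensu:
    """A taxon designator: a name read in the sense of one static source."""
    name: str
    source: str                      # article DOI or database snapshot label
    qualifier: Optional[str] = None  # hypothetical: further qualification

    def __str__(self):
        extra = f" ({self.qualifier})" if self.qualifier else ""
        return f"{self.name} sensu {self.source}{extra}"

a = Sensu("Mammalia", "http://dx.doi.org/10.1126/science.1211028")
b = Sensu("Mammalia", "NCBI.20140515")
print(a)
print(a == b)  # same name, different sources: distinct designators
```

Whether the two designators pick out the same taxon is a separate claim that has to be made explicitly; equality of records only compares name, source, and qualifier.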
Claims about taxa
Reasoning with claims
X includes Y and Y includes Z → X includes Z
X includes Y → X and Y are not disjoint
X and Y are clades → one includes the other, or they are disjoint
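The first two rules are enough to catch some inconsistencies automatically. A rough sketch, assuming taxa are non-empty and claims are given as pairs of labels; the function names are made up for illustration. The third rule is a disjunction and is not handled here.

```python
def includes_closure(includes):
    """Rule 1: X includes Y and Y includes Z -> X includes Z."""
    closure = set(includes)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def conflicts(includes, disjoint):
    """Rule 2: X includes Y -> X and Y are not disjoint.

    Returns every derived inclusion that contradicts a disjointness claim.
    """
    disjoint_pairs = {frozenset(pair) for pair in disjoint}
    return sorted(
        (x, y)
        for (x, y) in includes_closure(includes)
        if frozenset((x, y)) in disjoint_pairs
    )

includes = {("X", "Y"), ("Y", "Z")}
disjoint = {("X", "Z")}
print(includes_closure(includes))
print(conflicts(includes, disjoint))
```

Here the claim that X and Z are disjoint conflicts with the derived claim that X includes Z, so at least one of the input claims has to be wrong.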
Two ways to be wrong
Wrong about designation
Wrong about science
'Alignment' = estimating coreference
Alignment claims:
X = Y (X and Y are the same taxon)
Mammalia sensu http://dx.doi.org/10.1126/science.1211028 = Mammalia sensu NCBI.20140515
Heuristics based on properties and relations (including names...)
Manual 'curation' if necessary
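A toy version of such a heuristic: propose X = Y when two sources use the same name and the taxa share at least one child name; send same-name pairs with no shared children to manual curation. This is an illustrative sketch, not the method of any particular project, and the child lists below are made-up sample data rather than actual database contents.

```python
def align(tax_a, tax_b, min_shared=1):
    """Estimate coreference between two taxonomies.

    tax_a, tax_b: dict mapping taxon name -> set of child names.
    Returns (names proposed as the same taxon, names needing curation).
    """
    same, needs_curation = [], []
    for name in sorted(tax_a.keys() & tax_b.keys()):
        shared = tax_a[name] & tax_b[name]
        if len(shared) >= min_shared:
            same.append(name)            # name and relations agree
        else:
            needs_curation.append(name)  # same name, no supporting overlap
    return same, needs_curation

# Made-up sample data for illustration only.
source_1 = {"Mammalia": {"Theria", "Monotremata"},
            "Morganella": {"Morganella morganii"}}
source_2 = {"Mammalia": {"Theria", "Prototheria"},
            "Morganella": {"Morganella pyriformis"}}

same, needs_curation = align(source_1, source_2)
print(same)
print(needs_curation)
```

The output of a heuristic like this is itself a set of claims (X = Y), which can be wrong in the 'designation' sense and corrected later.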
Incertae sedis
Confusing.
X is incertae sedis in A means
(1) A includes X
(2) it's not known which of A's non-incertae-sedis 'children' X belongs to, if any
(2) is not a claim about biology.
Logical content = (1).
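In code, this means an incertae sedis flag on a placement contributes nothing beyond the inclusion itself. A minimal sketch, assuming placements arrive as (parent, child, flag) rows; the row layout and the function name are hypothetical.

```python
def logical_content(rows):
    """rows: iterable of (parent, child, is_incertae_sedis) placements."""
    claims = set()
    for parent, child, _is_incertae_sedis in rows:
        # Part (1): the parent includes the child.
        # Part (2) is about the state of knowledge, so it adds no claim.
        claims.add(("includes", parent, child))
    return claims

rows = [("A", "X", True), ("A", "B", False)]
claims = logical_content(rows)
print(sorted(claims))
```

Note that X gets no disjointness claim against A's other children from this sketch, since whether X belongs to one of them is exactly what is unknown.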
'Data exchange'
Taxonomies - NP
Exchanging 'corrections'
'Rozella belongs in Fungi.'
'Rhodophyceae is the same as Rhodophyta.'
'SILVA's Morganella isn't the same as Index Fungorum's Morganella.'
'Anolis isn't a clade unless Norops is merged into it.'
Interpreting advice
“Rozella is in Fungi.”
Rozella sensu SILVA115 and Fungi sensu SILVA115 belong to a clade disjoint from the other SILVA115 children of Nucletmycea.
How about we apply the label 'Fungi' to such a clade, and not to Fungi sensu SILVA115?
Notation not so important, but for example -
includes(X, Y)
disjoint(A, B, C, …)
clade(X)
node(X, A, B, C, …) - abbreviation
species(X)
same(X, Y) notSame(X, Y)
sensu('Name', source)
+ nomenclatural claims
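Since the notation is not so important, a few lines of code are enough to emit it. The sketch below renders claims as text in the style above; the helper names are made up, and the source labels SILVA115 and IndexFungorum are placeholder snapshot labels chosen for illustration.

```python
def sensu(name, source):
    """Render a designator: a name in the sense of one source."""
    return f"sensu('{name}', {source})"

def claim(predicate, *args):
    """Render one claim, e.g. includes(X, Y)."""
    return f"{predicate}({', '.join(args)})"

# 'SILVA's Morganella isn't the same as Index Fungorum's Morganella.'
correction = claim(
    "notSame",
    sensu("Morganella", "SILVA115"),
    sensu("Morganella", "IndexFungorum"),
)
print(correction)
print(claim("disjoint", "A", "B", "C"))
```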
On and on
Synthesis
Identifier stability
Alignment details
Compare 'macrotaxonomy' and 'microtaxonomy'
Defense of scruffy
Compare Rod's github proposal
Philosophy of language
Separate science from nomenclature.
Use logic to do science.
Always use names with sensu.
Use heuristics to prevent paralysis.
Don't 'represent data' – express claims!
https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Expressing-phylogenetic-claims
Bottom line
Ack
Nico Franz, David Thau, Rod Page
Open Tree: Karen Cranston, Stephen Smith, Mark Holder, and legions of others
Gerald Jay Sussman
Jonathan A. Rees 2014
Copyright waived CC0 1.0