Update on the Euler/X project at http://www.entsoc.org/entomology2014; see also: http://taxonbytes.org/prior-work-on-concept-taxonomy-2013/
Aligning insect phylogenies:
Perelleschus and other cases
Nico M. Franz 1,2
Arizona State University
http://taxonbytes.org/
1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC):
Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher
2 Systematics, Evolution and Biodiversity Section, Ten Minute Papers
Annual Meeting of the Entomological Society of America
November 18, 2014 - Portland, Oregon
On-line @ http://www.slideshare.net/taxonbytes/franz-2014-esa-aligning-insect-phylogenies-perelleschus-and-other-cases-41654235
Research motivation: 1
How can we represent, and reason over,
taxonomic concept provenance,
based on varying input classifications
and differentially sampled phylogenies?
1 This presentation concentrates on the "how?"; the "why?" is addressed in the References listed at the end.
Taxonomic concept: 1
The circumscription of a perceived
(or, more accurately, hypothesized)
taxonomic group, as advocated by
a particular author and source.
Definitional preliminaries, 1
1 Not the same as species concepts, which are theories about what species are, and/or how they are recognized.
Provenance: 1
Information describing the origin, derivation,
history, custody, or context of an entity (etc.).
Provenance establishes the authenticity, integrity
and trustworthiness of information about entities.
Definitional preliminaries, 2
1 See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
Alignment ("merge"):
A comprehensive, logically consistent, and
(where possible) well-specified reconciliation
of shared and unique Euler regions that result from
integrating two or more taxonomic concept
hierarchies ("trees") with RCC-5 articulations.1
Definitional preliminaries, 3
1 RCC-5 = Region Connection Calculus, five set-theoretic relationships: congruence (==), proper inclusion (<), inverse proper inclusion (>), overlap (><), and exclusion (|).
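Read ostensively, each concept can be modeled as its extension (the set of its members), and the five RCC-5 relations then reduce to set comparisons. The following is a minimal Python sketch under that assumption. The function name `rcc5` is ours, and Euler/X itself reasons over logic constraints rather than explicit member lists.

```python
from typing import AbstractSet


def rcc5(a: AbstractSet[str], b: AbstractSet[str]) -> str:
    """RCC-5 relation of extension a to extension b (both must be non-empty)."""
    if not a or not b:
        raise ValueError("this sketch only relates non-empty extensions")
    if a == b:
        return "=="  # congruence
    if a < b:
        return "<"   # a is properly included in b
    if a > b:
        return ">"   # a properly includes b
    if a & b:
        return "><"  # partial overlap
    return "|"       # exclusion: no shared members


print(rcc5({"x", "y"}, {"x", "y", "z"}))  # <
print(rcc5({"x", "y"}, {"y", "z"}))       # ><
```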
Input for provenance reasoning: Perelleschus use case, 1936−2013
Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013)
[Figure: female habitus; labium; maxilla]
• Habitus, mouthparts
One might call the string "Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013)" a Taxonomic Concept Label.
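A Taxonomic Concept Label can be split into the name, the name's authorship, and the "sec." source. The sketch below is hypothetical. `ConceptLabel` and `parse_concept_label` are illustrative names, and the splitting heuristic (the name ends at the first capitalized word after the genus) is an assumption that fails for lower-case author particles such as "de" or "van".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptLabel:
    name: str         # the taxonomic name (uninomial, binomial, ...)
    name_author: str  # authorship of the name (may be empty)
    source: str       # the "sec." source whose circumscription is meant


def parse_concept_label(label: str) -> ConceptLabel:
    """Split a label of the assumed form '<name> <author> sec. <source>'."""
    head, sep, source = label.partition(" sec. ")
    if not sep:
        raise ValueError(f"no 'sec.' reference in label: {label!r}")
    words = head.split()
    if not words:
        raise ValueError(f"no name before 'sec.': {label!r}")
    # Heuristic (assumption): the name is the first word plus any following
    # lower-case words (epithets); authorship starts at the next word.
    i = 1
    while i < len(words) and words[i][:1].islower():
        i += 1
    return ConceptLabel(" ".join(words[:i]), " ".join(words[i:]), source.strip())


label = "Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013)"
print(parse_concept_label(label))
```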
Synapomorphy (genus-level): Spermatheca
with an acute, sclerotized appendix at
insertion of the collum (character 17:1).
Synapomorphy (subclade-level):
Aedeagus with endophallic
sclerites extending in apical
half of aedeagus (character
11:1).
• Male & female terminalia, showing putative synapomorphies (characters "11" and "17")
Phylogeny: Perelleschus sec. Franz & Cardona-Duque (2013)
[Figure labels: aedeagal synapomorphy; spermathecal synapomorphy]
Perelleschus concept history:
• 6 classifications,
• 54 taxonomic concepts,
• 75 concept-to-concept RCC-5 articulations;
Suitable for provenance reasoning. 1
1 Franz et al. 2014. Reasoning over taxonomic change: Exploring alignments for the Perelleschus use case. PLoS ONE.
1936: 1st species-level concept.
1954: Genus named, + 2 species.
1986: Validation of generic name.
2001: Revision & phylogeny, + 6 / - 1 species.
2006: Exemplar cladistic analysis; 3 species.
2013: Revision & phylogeny, + 2 species.
Focal alignments (today)
• 1986 versus 2001
• Classification / Phylogeny
• 2001 versus 2006
• Phylogeny / Exemplar Analysis
• 2001 versus 2013 (appended)
• Phylogeny / Extended Phylogeny
"A toolkit for consistently aligning
sets of hierarchically arranged entities
under (relaxable) logic constraints,
and using RCC-5 articulations."
Introducing the Euler/X software toolkit (Open Source)
Desktop tool @ https://bitbucket.org/eulerx
Euler server @ http://euler.asu.edu
Euler/X toolkit − Please ask me (later) about a live demonstration!
Euler/X uses Answer Set Programming.
The reasoner asks, and solves, the question:
"Which possible worlds can be generated
that satisfy (i.e., are consistent with)
a given set of input constraints?" 1
1 Input constraints:
• T1 − taxonomy 1
• T2 − taxonomy 2
• A − user-asserted articulations
• C − additional 'tree' constraints
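The question the reasoner answers can be illustrated without ASP. The sketch below swaps in brute-force enumeration over candidate Euler regions, so it is only practical for toy inputs. Its assumptions are not necessarily Euler/X's defaults. C means every leaf concept is non-empty, sibling concepts are disjoint, and each parent is exactly the union of its children. Concept names must be unique across T1 and T2 (e.g., prefixed with the year). Each articulation in A lists a set of allowed RCC-5 relations, so disjunctions are possible. All function names are hypothetical.

```python
from itertools import product


def rcc5(a, b):
    """RCC-5 relation of extension a to extension b (non-empty frozensets)."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


def concepts(tree):
    """Every concept named in a parent -> children dict."""
    return set(tree) | {kid for kids in tree.values() for kid in kids}


def leaves_under(tree, concept):
    """Leaf concepts at or below `concept`."""
    kids = tree.get(concept, [])
    if not kids:
        return {concept}
    return set().union(*(leaves_under(tree, kid) for kid in kids))


def possible_worlds(t1, t2, articulations):
    """Brute-force stand-in for the ASP reasoner: one MIR table per possible world."""
    c1, c2 = concepts(t1), concepts(t2)
    l1 = sorted(c for c in c1 if not t1.get(c))  # T1 leaves
    l2 = sorted(c for c in c2 if not t2.get(c))  # T2 leaves
    # Candidate Euler regions: (T1 leaf or None) x (T2 leaf or None), minus (None, None).
    atoms = [(x, y) for x in l1 + [None] for y in l2 + [None] if (x, y) != (None, None)]
    under1 = {c: leaves_under(t1, c) for c in c1}
    under2 = {c: leaves_under(t2, c) for c in c2}
    tables = set()
    for mask in product((False, True), repeat=len(atoms)):
        world = [atom for atom, on in zip(atoms, mask) if on]
        # C (assumed): every leaf concept is non-empty.
        if not all(any(x == leaf for x, _ in world) for leaf in l1):
            continue
        if not all(any(y == leaf for _, y in world) for leaf in l2):
            continue
        ext = {c: frozenset(a for a in world if a[0] in under1[c]) for c in c1}
        ext.update({c: frozenset(a for a in world if a[1] in under2[c]) for c in c2})
        # A: each articulation (p, allowed_relations, q) must hold in this world.
        if all(rcc5(ext[p], ext[q]) in allowed for p, allowed, q in articulations):
            tables.add(tuple(sorted(((p, q), rcc5(ext[p], ext[q])) for p in c1 for q in c2)))
    return [dict(t) for t in sorted(tables)]


t1 = {"1.A": ["1.a1", "1.a2"]}
t2 = {"2.B": ["2.b1", "2.b2"]}
worlds = possible_worlds(t1, t2, [("1.A", {"=="}, "2.B"), ("1.a1", {"=="}, "2.b1")])
print(len(worlds))  # 1
```

With both articulations exactly one possible world remains, i.e., a single, well-specified alignment; dropping the leaf-level articulation leaves several.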
Alignment 1 - Perelleschus sec. WOB (1986) versus sec. FOB (2001)
T1: Perelleschus sec. 1986
• Traditional classification
• 1 genus-level concept
• 3 species-level concepts
T2: Perelleschus sec. 2001
• Phylogenetic revision
• 2 genus-level concepts
• 7 clade-level concepts
• 9 species-level concepts
Format for alignment input file (constraints: T1, T2, A, C)
[Figure: annotated input file; callouts: Year, Source; Parent concept, Child concepts (T1, T2); T2-to-T1 articulations (as provided by the user)]
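To make the slide's format concrete, here is a parser for a deliberately simplified, hypothetical line syntax carrying the same information: year and source per taxonomy, parent/child lines, and T2-to-T1 articulations. It is not the actual Euler/X input syntax. The keywords `taxonomy` and `articulation` and the one-relation-per-line restriction are assumptions of this sketch.

```python
RELATIONS = {"==", "<", ">", "><", "|"}


def parse_alignment_input(text):
    """Parse the hypothetical line format into (taxonomies, articulations).

    taxonomies: {year: {"source": str, "children": {parent: [children]}}}
    articulations: [(t2_concept, relation, t1_concept)]
    """
    taxonomies, articulations, current = {}, [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        head, *rest = line.split()
        if head == "taxonomy":              # taxonomy <year> <source>
            current = rest[0]
            taxonomies[current] = {"source": " ".join(rest[1:]), "children": {}}
        elif head == "articulation":        # articulation <T2 concept> <rel> <T1 concept>
            left, rel, right = rest
            if rel not in RELATIONS:
                raise ValueError(f"unknown RCC-5 relation: {rel!r}")
            articulations.append((left, rel, right))
        elif current is None:
            raise ValueError("concept line appears before any 'taxonomy' header")
        else:                               # <parent> <child> <child> ...
            taxonomies[current]["children"][f"{current}.{head}"] = [
                f"{current}.{kid}" for kid in rest
            ]
    return taxonomies, articulations


EXAMPLE = """
taxonomy 1986 WOB
PER a b c
taxonomy 2001 FOB
PER X c
X a b
articulation 2001.a == 1986.a
"""

taxonomies, articulations = parse_alignment_input(EXAMPLE)
print(articulations)  # [('2001.a', '==', '1986.a')]
```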
Input visualization
Six user-asserted input articulations (pink lines) are sufficient to yield a single,
well-specified alignment.1
1 Actually, three (species-level) articulations are sufficient to achieve this for the 2001/1986 alignment.
Alignment (merge) visualization
Reasoner infers 66 additional, logically implied articulations (MIR).1
2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation
is explained in the merge taxonomy.
1 MIR = Maximally Informative Relations (among paired concepts of T1, T2).
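If MIR assigns exactly one RCC-5 relation to every pairing of a T1 concept with a T2 concept (our reading of "among paired concepts"), the slide's count can be checked against the concept totals listed above for 1986 and 2001:

```latex
\begin{aligned}
|T_1| &= 1 + 3 = 4, \qquad |T_2| = 2 + 7 + 9 = 18,\\
|T_1| \times |T_2| &= 4 \times 18 = 72 \text{ concept pairs},\\
72 &= 6 \text{ (user-asserted)} + 66 \text{ (inferred)}.
\end{aligned}
```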
3 congruent 2001/1986 species-level concepts.
6 species-level concepts unique sec. FOB (2001).
6 clade-level concepts unique to FOB (2001).
2001.PER & 2001.PHY in overlap with 1986.PER.
We can 'zoom in' on the overlap
and resolve the resulting subregions
in the "merge concept view".
Merge concept view (in part)
"2001.PER and 1986.PER share a region (2001.PER * 1986.PER) constituted (at lower
levels) by 2001/1986.P_rectirostris; this latter region is the one entailed in
1986.PER and excluded from 2001.PHY (1986.PER\2001.PHY)."
"2001.PHYsubcin/1986.Psubcin participates differentially in 2001.PHY and
1986.PER, but not in 2001.PER (or any of its children)."
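The region notation used in the merge concept view maps onto plain set operations: `*` is intersection and `\` is difference. Below is a minimal sketch with toy extensions. Only `P_rectirostris` and `Psubcin` come from the slides; the other members and all memberships are hypothetical and are not the actual 1986/2001 data.

```python
def merge_regions(name_a, a, name_b, b):
    """Resolve two overlapping extensions into their three Euler regions."""
    return {
        f"{name_a}*{name_b}": a & b,   # shared region
        f"{name_a}\\{name_b}": a - b,  # region only in the first concept
        f"{name_b}\\{name_a}": b - a,  # region only in the second concept
    }


# Toy extensions (hypothetical): sp_x, sp_y, sp_z are invented placeholders.
per_1986 = {"P_rectirostris", "Psubcin", "sp_x"}
per_2001 = {"P_rectirostris", "sp_y"}
phy_2001 = {"Psubcin", "sp_x", "sp_z"}

regions = merge_regions("2001.PER", per_2001, "1986.PER", per_1986)
print(regions["2001.PER*1986.PER"])  # {'P_rectirostris'}
print(per_1986 - phy_2001)           # {'P_rectirostris'}
```

In this toy setup the shared region 2001.PER * 1986.PER coincides with 1986.PER\2001.PHY, mirroring the quoted reading, and Psubcin falls in 1986.PER and 2001.PHY but not in 2001.PER.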
Alignment 2 - Perelleschus sec. FOB (2001) versus sec. F (2006)
T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total
T2: Perelleschus sec. 2006
• Exemplar analysis
• 2 ingroup species concepts
• 1 outgroup concept
• 7 concepts total
Logic representation challenge:
Perelleschus sec. 2001 & 2006 concepts
have incongruent sets of subordinate members,
yet each concept has congruent synapomorphies.
Ostensive alignment: the congruence among higher-level
concepts is assessed in relation to their entailed members.
Ostension: giving meaning through an act of pointing out.
Definitional preliminaries, 4 1
1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
Intensional alignment: the congruence among higher-
level concepts is assessed in relation to their properties.
Intension: giving meaning through the specification of properties.
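The two readings can disagree on the same pair of concepts. In the sketch below, the ostensive reading yields `<` while a simplified intensional reading yields `==`. The member lists are hypothetical; character 17:1 is the genus-level spermathecal synapomorphy named earlier. Treating "identical synapomorphy sets" as the intensional criterion is an assumption made here for brevity.

```python
def rcc5(a, b):
    """RCC-5 relation of extension a to extension b (non-empty sets)."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


def ostensive(c1, c2):
    """Ostensive reading: only the member (child) concepts count."""
    return rcc5(c1["members"], c2["members"])


def intensional(c1, c2):
    """Simplified intensional reading: identical diagnostic synapomorphies are
    read as congruence; other cases are left undetermined ("?")."""
    return "==" if c1["synapomorphies"] == c2["synapomorphies"] else "?"


# Hypothetical members; both concepts share the genus-level character 17:1.
per_2001 = {"members": frozenset({"sp1", "sp2", "sp3", "sp4", "sp5"}),
            "synapomorphies": frozenset({"17:1"})}
per_2006 = {"members": frozenset({"sp1", "sp2"}),
            "synapomorphies": frozenset({"17:1"})}

print(ostensive(per_2006, per_2001))    # <
print(intensional(per_2006, per_2001))  # ==
```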
Ostensive alignment – members are all that counts
Challenge 1: Ostensive alignment
Solution: 11 ingroup concept articulations
are coded ostensively (as <, ><, or |)
to represent the non-congruence
among their child concepts
Result: 2006.PER < 2001.PER
2006.PER | 2001.[5 species concepts]
etc.
[Figure: input constraints, ostensive alignment 2001 & 2006; labels: 5 x |, 2 x ><]
Intensional alignment – representation of congruent synapomorphies
Challenge 2: Intensional alignment
[Figure: input constraints, intensional alignment 2001 & 2006; characters "11" and "17" marked]
Solution: An Implied Child (_IC) concept is
added to the undersampled (2006)
clade concept, and the five "missing"
species-level concepts are included
within this Implied Child.
11 ingroup concept articulations are
coded intensionally (as == or >)
to reflect the congruent synapomorphies
(chars. 11, 17) of 2001 & 2006.
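The Implied Child step amounts to a small tree edit: the undersampled clade gets an extra child holding the species-level concepts it did not sample, after which the ostensive extents coincide. Here is a sketch on hypothetical toy trees. `add_implied_child` is an illustrative name; only the `_IC` suffix comes from the slide.

```python
def leaves_under(tree, concept):
    """Leaf concepts at or below `concept` in a parent -> children dict."""
    kids = tree.get(concept, [])
    if not kids:
        return {concept}
    return set().union(*(leaves_under(tree, kid) for kid in kids))


def add_implied_child(tree, clade, missing):
    """Return a copy of `tree` in which `clade` gains an Implied Child concept
    holding the `missing` species-level concepts; the input is not modified."""
    new = {parent: list(kids) for parent, kids in tree.items()}
    implied = f"{clade}_IC"
    new.setdefault(clade, []).append(implied)
    new[implied] = sorted(missing)
    return new


# Hypothetical toy trees: the 2006 clade samples only two of four species.
t_2001 = {"PER": ["CL1", "d"], "CL1": ["a", "b", "c"]}
t_2006 = {"PER": ["a", "b"]}

missing = leaves_under(t_2001, "PER") - leaves_under(t_2006, "PER")
t_2006_ic = add_implied_child(t_2006, "PER", missing)
print(t_2006_ic)  # {'PER': ['a', 'b', 'PER_IC'], 'PER_IC': ['c', 'd']}
```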
Result: The genus- and ingroup clade-level
concepts are inferred as congruent:
2006.PER == 2001.PER
2006.PcarPeve == 2001.PcarPsul
etc.
Review – representing ostensive versus intensional alignments
Ostensive alignment
2001.PER includes more
species-level concepts
than 2006.PER [>].
Intensional alignment
2006.PER reconfirms the
synapomorphies inferred
in 2001.PER [==].
Is this approach scalable?
Quite possibly yes.
Use case: Alternative phylogenetic schemes of higher-level weevils
T1: Curculionoidea sec. Kuschel (1995)
• Cladistic analysis
• 41 concepts
T2: Curculionoidea sec. Marvaldi &
Morrone (2000)
• Cladistic analysis
• 25 concepts
Alignment: Curculionoidea sec. K (1995) versus sec. MM (2000)
Initial visual impression: Lots of green rectangles, yellow octagons, and overlap (><).
Much taxonomic concept incongruence.
Use case: Dwarf lemurs sec. 1993 & 2005 1
Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben.
Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ
1 Franz et al. 2014. Taxonomic provenance: Two influential primate classifications logically aligned. (in preparation)
The 2nd & 3rd Editions of the Mammal Species of the World
Primates sec. Groves (1993)
317 taxonomic concepts,
233 at the species level.
Primates sec. Groves (2005)
483 taxonomic concepts,
376 at the species level.
1993 versus 2005: Δ = 143 species-level concepts
Alignment of Primates sec. Groves 1993 / 2005
Primates: 800 concepts
402 articulations
153,111 MIR
~380x information gain!
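These figures are mutually consistent if MIR is read as one RCC-5 relation for every pairing of a 1993 concept with a 2005 concept, and the information gain as MIR per asserted articulation (our reading):

```latex
\begin{aligned}
317 + 483 &= 800 \text{ concepts},\\
317 \times 483 &= 153{,}111 \text{ MIR (one relation per concept pair)},\\
153{,}111 / 402 &\approx 380.9 \quad (\text{the } {\sim}380\times \text{ gain}),\\
376 - 233 &= 143 = \Delta \text{ species-level concepts}.
\end{aligned}
```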
[Figure labels: Strepsirrhini, Haplorrhini, and Catarrhini sec. MSW3]
Taxonomic provenance: quantifying name/meaning dissociation
[Figure categories: "Reliable names" versus "Unreliable names"]
'Dissociation' means that either non-identical names are paired with congruent concepts,
or identical names are paired with incongruent concepts.
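Given an MIR table, the two kinds of dissociation can be counted directly. This is a sketch under stated assumptions: concept identifiers have the form `<year>.<name>` (as in `2001.PER`), congruence means the relation is `==`, and pairs with both different names and non-congruent meanings are not counted. The function names and the toy table are hypothetical.

```python
from collections import Counter


def name_of(concept):
    """Strip the year prefix, assuming identifiers of the form '<year>.<name>'."""
    return concept.split(".", 1)[1]


def classify(c1, c2, relation):
    same_name = name_of(c1) == name_of(c2)
    congruent = relation == "=="
    if same_name and congruent:
        return "reliable"
    if same_name != congruent:
        return "unreliable"  # exactly one of name or meaning differs
    return None              # different names, different meanings: not counted


def dissociation_counts(mir):
    """mir maps (T1 concept, T2 concept) -> RCC-5 relation."""
    labels = (classify(a, b, rel) for (a, b), rel in mir.items())
    return Counter(label for label in labels if label is not None)


toy_mir = {
    ("1993.A", "2005.A"): "==",  # same name, same meaning
    ("1993.B", "2005.C"): "==",  # different names, same meaning
    ("1993.D", "2005.D"): "><",  # same name, different meaning
    ("1993.E", "2005.F"): "|",   # different names, different meanings
}
print(dissociation_counts(toy_mir))  # Counter({'unreliable': 2, 'reliable': 1})
```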
So, given an input set of [T1, T2, A, C], one gains:
(1) Logical consistency in the alignment;
(2) Intended degree of alignment resolution;
(3) Additional, logically implied articulations;
(4) Visualizations of taxonomic provenance;
(5) Quantifications of name/meaning relations.
In summary (1) − What this approach can provide:
• Compatibility with contemporary Linnaean nomenclature (and PhyloCode too);
• Integration of many-to-many name/circumscription relationships across taxonomies;
• Reconciliation of traditional classifications with fully bifurcated phylogenies;
• Representation of monotypic concept lineages with congruent taxonomic extensions;
• Accounting for insufficiently specified higher-level entities:
• Undersampled outgroup entities;
• Differentially sampled ingroup entities;
• Resolution of taxonomically overlapping entities and merge concepts;
• Differentiation of ostensive versus intensional readings of concept articulations;
• Representation of topologically localized resolution versus ambiguity in alignments.
• Next critical step(s): accessible, scalable, usable, integrated web instance of Euler/X
In summary (2) − Representation and reasoning abilities
In summary (3) − Take-home message
We can explain (much of)
taxonomy's legacy to computers (e.g.)
for superior name/meaning resolution.
Well, then, should we?
And at what cost?
And, in the near future...?
A future beyond concept-to-concept alignments
Reasoning over the provenance / identity of:
• Taxonomic concepts;
• Concept-associated traits;
• Vouchered specimens.
Acknowledgments
• Euler/X team: Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers
& Bertram Ludäscher.
• Juliana Cardona-Duque, Charles O'Brien (Perelleschus), Naomi Pier (primates) &
Alan Weakley (Magnolia).
• taxonbytes lab members: Andrew Johnston & Guanyang Zhang.
• NSF DEB–1155984, DBI–1342595 (Franz); IIS–118088, DBI–1147273
(Ludäscher).
• Information @ http://taxonbytes.org/tag/concept-taxonomy/
• Euler/X code @ https://bitbucket.org/eulerx
• Euler server @ http://euler.asu.edu
https://sols.asu.edu/
Franz Lab: http://taxonbytes.org/
Select references on concept taxonomy and the Euler/X toolkit
• Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity research and taxonomy. In: The New Taxonomy; pp. 63–86.
• Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20.
• Franz & Thau. 2010. Biological taxonomy and ontology development: Scope and limitations. Biodiversity Informatics 7: 45–66.
• Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP 2013 – 22nd International Workshop on Functional and (Constraint) Logic Programming.
• Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box reasoning. Lecture Notes in Computer Science 8620: 127–141.
• Franz et al. 2014. Names are not good enough: Reasoning over taxonomic change in the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability – Special Issue on Semantics for Biodiversity. (in press)
• Franz et al. 2014. Reasoning over taxonomic change: Exploring alignments for the Perelleschus use case. PLoS ONE. (in press)
• Franz et al. 2015. Taxonomic provenance: Two influential primate classifications logically aligned. (in preparation)
Miscellaneous appended slides
T1 = Taxonomy 1
T2 = Taxonomy 2
A = Input articulations
[==, >, <, ><, |]
C = Taxonomic constraints
User/reasoner interaction: achieving well-specified alignments
Articulations are asserted
by taxonomic experts.
MIR = Maximally Informative Relations
[==, >, <, ><, |] for each concept pair
Euler/X toolkit − Desktop version downloadable on Bitbucket
Alan Weakley 2014 (UNC Herbarium) - Magnolia concept evolution
R32 lattice of RCC-5 articulations (lighter color = less certainty)
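One reading of "R32" (our assumption, not stated on the slide) is the set of all 2^5 = 32 subsets of the five RCC-5 base relations. Each subset is read as a disjunction, the subsets are ordered by inclusion, and larger subsets encode less certainty. Here is a short enumeration under that reading:

```python
from itertools import combinations

BASE = ("==", "<", ">", "><", "|")  # the five RCC-5 base relations


def r32():
    """Every subset of BASE (2**5 = 32), each read as a disjunction of relations."""
    return [frozenset(combo)
            for size in range(len(BASE) + 1)
            for combo in combinations(BASE, size)]


def covers(lower, upper):
    """True if `upper` sits directly above `lower` in the inclusion lattice."""
    return lower < upper and len(upper) == len(lower) + 1


lattice = r32()
sizes = {k: sum(1 for s in lattice if len(s) == k) for k in range(len(BASE) + 1)}
print(len(lattice))  # 32
print(sizes)         # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
```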
The other piece in the puzzle: Concept-to-voucher identifications
Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf
Alignment 3 - Perelleschus sec. FOB (2001) versus sec. FCD (2013)
Ostensive alignment
10 overlapping articulations
Species-level congruence
'Cascading' clade concepts
Intensional alignment
Congruent synapomorphies reconfirmed across subclades, with minor low-level concept additions