Taxonomic databases: The SEEK and VegBank experience. R.K. Peet, The University of North Carolina; Ecological Society of America Vegetation Panel; The SEEK development team. Biodiversity informatics depends on accurate and precise taxonomy.
Taxonomic databases: The SEEK and VegBank experience
R.K. Peet
The University of North Carolina
Ecological Society of America Vegetation Panel
The SEEK development team
• Accurate identification and labelling of organisms is a critical part of collecting, recording and reporting biological data.
• Increasingly, research in biodiversity and ecology is based on the integration (and re-use) of multiple datasets.
Biodiversity informatics depends on accurate and precise taxonomy
• What was a minor annoyance for a few tens of records becomes intractable when looking at a million records.
• Some data types, such as organism identifications, are inherently more complex to define with the consequence that few standards have been adopted.
Biodiversity data structure
[Diagram. Databases: taxonomic database, observation database, occurrence database, observation type database. Entities: observation/collection event, specimen or object, bio-taxon, locality, observation or community type.]
VegBank
• The ESA Vegetation Panel is developing VegBank as a public archive for vegetation plot observations (http://vegbank.org).
• VegBank is expected to function for vegetation plot data in a manner analogous to GenBank.
• Primary data will be deposited for reference, novel synthesis, and reanalysis.
• The database architecture is generalizable to most types of species co-occurrence data.
www.vegbank.org
What is SEEK?
Science Environment for Ecological Knowledge
Multidisciplinary project to create:
• Scientific-workflow system (Kepler)
– Design, reuse, and execute scientific analyses
• Distributed data network (EcoGrid)
– Environmental, ecological, and systematics data
• KR & Semantic Mediation
– Discover, integrate, and compose hard-to-relate data and services via ontologies
• Taxonomic concept services
– Resolve taxon ambiguities
Collaborators (the SEEK team)
• NCEAS, UNM, SDSC/UCSD, U Kansas
• Vermont, Napier, ASU, UNC
SEEK High-Level Approach
[Diagram components]
• Ecological data set providers: Ecological Data Sets, described with Ecological Metadata Language (EML) containing the collector's taxonomic concept(s) and taxon coverage; EML repository
• Taxonomic concept providers: Concept Provider 1 (e.g. Fishbase), Concept Provider 2 (e.g. ITIS), Concept Provider 3 (e.g. Prometheus); Taxonomy transfer schema (TML); Name/Concept Repository
• Semantic Mediation System: receives the user's taxonomic concept + quality measure; concept matching/expansion/… yields weighted concepts; returns a list of Data Sets
Taxonomic database challenge: Standardizing organisms and communities
The problem: Integration of data potentially representing different times, places, investigators and taxonomic standards.
The traditional solution: A standard list of organisms / communities.
Standard lists are available for taxa
Representative examples for higher plants in North America / US
• USDA Plants http://plants.usda.gov
• ITIS http://www.itis.usda.gov
• NatureServe http://www.natureserve.org
• BONAP
• Flora North America
These are intended to be checklists wherein the taxa recognized perfectly partition all plants. The lists can be dynamic.
Abies lasiocarpa
Abies bifolia
Abies lasiocarpa
sec. Little
sec. USDA PLANTS
sec. Flora North America
Three concepts of subalpine fir
Splitting one species into two illustrates the ambiguity often associated with scientific names.
USDA Plants & ITIS
Abies lasiocarpa
var. lasiocarpa
var. arizonica
One concept of Abies lasiocarpa
Flora North America
Abies lasiocarpa
Abies bifolia
A narrow concept of Abies lasiocarpa
Partnership with USDA Plants to provide plant concepts for data integration
Andropogon virginicus complex in the Carolinas
9 elemental units; 17 base concepts
Standardized taxon lists fail to allow dataset integration
The reasons include:
• Taxonomic concepts are not defined (just lists),
• Relationships among concepts are not defined,
• The user cannot reconstruct the database as viewed at an arbitrary time in the past,
• Multiple party perspectives on taxonomic concepts and names cannot be supported or reconciled.
Taxonomic theory
[Diagram: Name, Reference, Concept]
A taxon concept represents a unique combination of a name and a reference.
Report as: name sec. reference.
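The name-plus-reference definition can be made concrete with a minimal sketch. The class name `TaxonConcept` and its `label` method are illustrative assumptions, not part of any SEEK or VegBank schema; the two references are ones named elsewhere in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """Hypothetical model: one name as circumscribed in one reference."""
    name: str       # scientific name, e.g. "Abies lasiocarpa"
    reference: str  # the work that supplies the circumscription

    def label(self) -> str:
        # "sec." (secundum) reads as "according to".
        return f"{self.name} sec. {self.reference}"


fna = TaxonConcept("Abies lasiocarpa", "Flora North America 1997")
usda = TaxonConcept("Abies lasiocarpa", "USDA PLANTS")

print(fna.label())
# Same name, different reference: two distinct concepts.
print(fna == usda)  # False
```

Because equality depends on both fields, two datasets that share a name but cite different references are never silently merged.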
[Diagram: Name, Concept, Usage]
A usage represents an association of a concept with a name.
• The name used in defining the concept need not be the same name used in your work.
e.g. Carya alba = Carya tomentosa sec. Gleason & Cronquist 1991.
• Usage can be used to apply multiple name systems to a concept
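As a rough sketch of usage, the mapping below attaches names from different name systems to a single concept, following the Carya example above. The function names and the name-system key `"my_checklist"` are hypothetical, and the fallback to the defining name is an assumption made for illustration.

```python
# A concept is keyed here by its (name, reference) pair.
concept = ("Carya tomentosa", "Gleason & Cronquist 1991")

# usages[concept][name_system] -> name applied to that concept
usages = {}


def add_usage(concept, name_system, name):
    """Record that `name_system` calls this concept `name`."""
    usages.setdefault(concept, {})[name_system] = name


def name_for(concept, name_system):
    """Name a given name system applies to the concept.

    Falls back to the name used in defining the concept (an assumption).
    """
    return usages.get(concept, {}).get(name_system, concept[0])


add_usage(concept, "my_checklist", "Carya alba")

print(name_for(concept, "my_checklist"))  # Carya alba
print(name_for(concept, "other_system"))  # Carya tomentosa
```

The concept stays fixed while the names applied to it vary, which is what lets several name systems coexist.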
Relationships among concepts allow comparisons and conversions
• Congruent, equal (=)
• Includes (>)
• Included in (<)
• Overlaps (><)
• Disjunct (|)
• and others …
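One way to read these symbols is to treat each concept as a set of elemental units and compare the sets. This is a simplification for illustration only, not the TCS data model; the unit identifiers and concept names below are invented.

```python
def relationship(a: frozenset, b: frozenset) -> str:
    """Return the symbol relating concept a to concept b."""
    if a == b:
        return "="   # congruent
    if a > b:
        return ">"   # a includes b (proper superset)
    if a < b:
        return "<"   # a is included in b (proper subset)
    if a & b:
        return "><"  # partial overlap
    return "|"       # disjunct


# Hypothetical concepts built from three hypothetical elemental units.
broad = frozenset({"u1", "u2", "u3"})
narrow = frozenset({"u1"})
remainder = frozenset({"u2", "u3"})
straddle = frozenset({"u1", "u2"})

print(relationship(broad, narrow))        # >
print(relationship(narrow, broad))        # <
print(relationship(narrow, remainder))    # |
print(relationship(straddle, remainder))  # ><
```

In practice the relationships are asserted by taxonomists rather than computed, since elemental units are rarely enumerated; the set view only shows why the five symbols are mutually exclusive.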
High-elevation fir trees of western US
[Figure: distribution across AZ, NM, CO, WY, MT, AB, eBC, wBC, WA, OR, comparing the USDA & ITIS treatment (Abies lasiocarpa, with var. arizonica and var. lasiocarpa) with the Flora North America treatment (Abies bifolia and Abies lasiocarpa)]
A. lasiocarpa sec. USDA > A. lasiocarpa sec. FNA
A. lasiocarpa sec. USDA > A. bifolia sec. FNA
A. lasiocarpa v. lasiocarpa sec. USDA > A. lasiocarpa sec. FNA
A. lasiocarpa v. lasiocarpa sec. USDA >< A. bifolia sec. FNA
A. lasiocarpa v. arizonica sec. USDA < A. bifolia sec. FNA
Party Perspective
The Party Perspective on a Concept includes:
• Status – Standard, Nonstandard, Undetermined
• Correlation with other concepts – Equal, Greater, Lesser, Overlap, Undetermined.
• Start & Stop dates.
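The start and stop dates are what allow a party's view to be reconstructed as of any past date. The sketch below assumes a simple record layout; the class, the function, the party name, and all dates are invented for illustration, and returning "Undetermined" when no record applies is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartyPerspective:
    party: str
    concept: str                 # concept label, "name sec. reference"
    status: str                  # "Standard", "Nonstandard", or "Undetermined"
    start: date
    stop: Optional[date] = None  # None means still in force

    def in_force(self, on: date) -> bool:
        # Half-open interval: start <= on < stop
        return self.start <= on and (self.stop is None or on < self.stop)


def status_on(perspectives, party, concept, on):
    """Status a party assigned to a concept on a given date."""
    for p in perspectives:
        if p.party == party and p.concept == concept and p.in_force(on):
            return p.status
    return "Undetermined"


label = "Abies lasiocarpa sec. Flora North America 1997"
# Old records are closed with a stop date, never deleted.
history = [
    PartyPerspective("Party A", label, "Standard", date(2000, 1, 1), date(2004, 1, 1)),
    PartyPerspective("Party A", label, "Nonstandard", date(2004, 1, 1)),
]

print(status_on(history, "Party A", label, date(2002, 6, 1)))  # Standard
print(status_on(history, "Party A", label, date(2005, 1, 1)))  # Nonstandard
```

Closing a record instead of overwriting it keeps every earlier view recoverable.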
Intended functionality
• Organisms are labeled by reference to concept (name-reference combination),
• Party perspectives on concepts and names can be dynamic, but remain perfectly archived,
• User can select which party perspective to follow, and at which date,
• Different name systems are supported,
• Enhanced stability in recognized concepts by separating name assignment and rank from concept.
When reporting the identity of organisms in publications, data, or on specimens, provide the full scientific name of each kind of organism and the reference that provided the taxonomic concept.
e.g., Abies lasiocarpa sec. Flora North America 1997.
Best practice: Report taxa by reference to concepts.
• Reference high-quality sources for taxon concepts such as a major compendium that provides its own defined concepts, or a source that references the concepts of others.
• Avoid checklists as they typically lack true taxonomic descriptions or circumscriptions.
Best practice: Choose high-quality concepts
SEEK & GBIF are working to provide standards for concept data
• Several data models incorporate taxon concepts. The IOPI, VegBank, and Taxonomer models are optimized for different uses.
• SEEK, GBIF, and TDWG developed TCS, which was adopted by TDWG in August 2005 and is being implemented by GBIF and SEEK.
• A name in a publication could be either a concept or an identification.
• An annotation is an identification.
• Identifications should include linkage to at least one concept, but need not be limited to a single concept.
Concepts and identifications are distinct.
Documenting identifications
Relationships added for identification
= Indicates identification
~ (or aff.) Indicates similarity
≡ Indicates identity, or defined as
Example of complex identification
< Potentilla sec. Cronquist 1991 +
~ Potentilla simplex sec. Cronquist 1991 +
~ Potentilla canadensis sec. Cronquist 1991
Fuzzy logic qualification
1 = Absolutely wrong
2 = Understandable but wrong
3 = Reasonable or acceptable
4 = Good answer
5 = Absolutely correct
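A complex identification like the Potentilla example can be sketched as a list of links, each pairing a relationship symbol with a concept. The class names are hypothetical, and attaching the 1 to 5 fuzzy level to an individual link is an assumption about how the qualification would be stored, not something the slides specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FIT_LEVELS = {
    1: "Absolutely wrong",
    2: "Understandable but wrong",
    3: "Reasonable or acceptable",
    4: "Good answer",
    5: "Absolutely correct",
}


@dataclass(frozen=True)
class ConceptLink:
    relation: str              # "=", "~", "<", ...
    concept: str               # concept label, "name sec. reference"
    fit: Optional[int] = None  # optional fuzzy level (assumed placement)

    def __post_init__(self):
        if self.fit is not None and self.fit not in FIT_LEVELS:
            raise ValueError(f"fit must be one of {sorted(FIT_LEVELS)}")


@dataclass
class Identification:
    links: List[ConceptLink] = field(default_factory=list)

    def render(self) -> str:
        return " + ".join(f"{link.relation} {link.concept}" for link in self.links)


ident = Identification([
    ConceptLink("<", "Potentilla sec. Cronquist 1991"),
    ConceptLink("~", "Potentilla simplex sec. Cronquist 1991"),
    ConceptLink("~", "Potentilla canadensis sec. Cronquist 1991"),
])
print(ident.render())
```

The identification stays separate from the concepts it points to, so a later annotation can add or revise links without touching any concept definition.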
Biodiversity informatics depends on standards and connectivity
• Names (Linnean Core)
• Taxonomic concepts (TCS)
• Publications (Alexandrian core, etc)
• Observations (proposed TDWG standard)
• Identifications (proposed EML extension)
• GUIDs (under development by GBIF)
Tools to develop and map concepts
• Taxonomists need mapping and visualization tools for relating concepts of various authors. SEEK is building prototypes for review and possible adoption.
• Aggregators need tools for mapping relationships among concepts.
• Users need tools for entering legacy concepts. Several are in development.
Concept mapper
Demonstration Projects
Concept relationships of Southeastern US plants treated in different floras.
Based on > 50,000 mapped concepts
Step 1: Adoption of minimum standards and best practices by high-quality journals, funding agencies, and professional organizations.
Distributed information systems - and the way ahead
Publishers, curators and data managers need to tag taxon interpretations with concepts
• Precedent exists in the tagging of literature citations and GenBank accessions
• Presses are linking scientific names in many e-journals to ITIS (e.g. Evolution, Ecology)
Step 2: Creation, availability, and maintenance of databases that document core sets of taxonomic concepts and the relationships of these concepts to each other.
The way ahead
True concept-based checklists
• Equivalent of ITIS but with concept documentation and including how other concepts map onto the concepts accepted by the party.
• Several are operative or in development including EuroMed, IOPI-GPC, Biotics, VegBank. Concept documentation planned for ITIS/USDA.
Registration system and standard identifiers for names, references, and concepts
• Essential for data exchange
• GBIF is hosting a set of international workshops to design the GUID infrastructure.
Step 3: Development and provision of tools to facilitate mark-up of data and manuscripts with taxonomic concepts
Step 4: Demonstration projects
The way ahead