Upload
natalie-watkins
View
214
Download
1
Embed Size (px)
Citation preview
Philosophy
Structure has to be as simple as possible But not any simpler!!
Alternatives to represent classification and hierarchy
Alternatives to represent synonymy
Hierarchy: flat table
Every rank in the hierarchy is represented by a field in the table
Simplest solution Easy to create Easy to query
Hierarchy: flat table
Problems not normalised!
Not a real problem if a quick-and-dirty solution is all that is needed
Difficult to maintain hierarchy in the long term
‘Standard’ problems with non-normalised database Possible conflicting information, inefficient storage…
Cfr MASDEA; too simple
Hierarchy: normalised tables
Every rank is represented by a separate table
Not very difficult to write a query to regenerate flat table
Every taxon can have additional information Extra fields with description…
Hierarchy: normalised tables
Avantages Easy to maintain and query Normailised, possible to add information at any level
of the hierarchy Drawbacks
Ranks are hard-wired on the structure of the database New rank would require change of the structure of the
database And probably of the user interface, web interface…
Number of tables Lot of functionality duplicated
Taxonomic reality
Ranks used depend on the taxonomic group Botany: mainly infra-specific; zoology: mainly
on higher levels Many of the ranks are only sparsely used
Needs for a more flexible system Much of the functionality is the same
across all ranks ‘parent’, synonymy Authority, description…
‘Open Hierarchy’
Possible to define new ranks without having to rewrite the structure of the database
All taxonomic names are stored in a field in a single table; other fields indicate parent and rank
Many-to-one relation: a single parent, several descendants Include ID of parent in the record of the
descendant
Open Hierarchy
Avantages Completely normalised Flexible
Drawbacks Difficult to query classification
Queries of the type ‘all species of the Echinodermata’… Solution:
‘Calculated field’ Programmatical (loop in computer language) Recursive query
Synonymy
Every taxon can have several synonyms; in principle, only one valid name for any synonym Many-to-one relation: one valid name,
many synonymous names Include ID of the valid name in the record
of the synonymous nameOther fields for the type of synonymy…
Calculated field: ‘stored path’
Calculate a field, as a concatenation of id of parent, parent of parent…
E.g. x5x45x65x 5: Animalia 45: Arthropoda 65: Crustacea Stored path of all taxa belonging to
Crustacea start with x5x45x65x
Query the Stored Path
Get all species from Echinodermata:
select * from obis.tnameswhere storedpath~(select '^'||
storedpath||id||'x' from obis.tnameswhere tname='Echinodermata')::textand rank_id=220
Recursive query
All taxa belonging to given taxon:
with recursive includedtaxa(id, tname) as (select id, tname from obis.tnames where tname='Semelidae'
unionselect tnames.id, tnames.tname from obis.tnames inner join includedtaxa on tnames.parent_id=includedtaxa.id
) select * from includedtaxa order by tname
The other way
Finding parent of given rank of a species
with recursive parenttaxa(id, parent_id, tname) as (select id, parent_id, tname from obis.tnames where tname='Abra alba'
unionselect tnames.id, tnames.parent_id, tnames.tname from obis.tnames inner join parenttaxa on parenttaxa.parent_id=tnames.idand tnames.rank_id>=140
) select * from parenttaxa order by tname
Rest of the taxonomic model
Ranks should be in a separate table Information on the level of the rank
can be added Possibility of extra quality control
Rank of a parent as compared to rank of descendants
Rank of siblings should be same
Documentation
Documenting sources of information Add sources/references
‘Audit trail’: source of the information in the database
Taxonomic information: reference of the original description
Type of the source: expert, database, publication
Date and person responsible for the last revision of the record
Sources
Many-to-many relation Every source can contain information on
several taxa A single taxon can be documented in several
sources
Necessitates an extra table to represent the relationship Divide one many-to-many in to one-to-many
relationships
Add distribution
Localities from where a taxon has been reported
Many-to-many relation One locality has several taxa One taxon is found on several localities
Relation must be qualified Source! Validity of the observation