29
Representing taxonomy MarBEF-IODE workshop Oostende, 19-23 March 2007

Representing taxonomy MarBEF-IODE workshop Oostende, 19-23 March 2007

Embed Size (px)

Citation preview

Representing taxonomy

MarBEF-IODE workshopOostende, 19-23 March 2007

Philosophy

Structure has to be as simple as possible But not any simpler!!

Alternatives to represent classification and hierarchy

Alternatives to represent synonymy

Hierarchy: flat table

Every rank in the hierarchy is represented by a field in the table

Simplest solution Easy to create Easy to query

Hierarchy: flat table

Hierarchy: flat table

Problems not normalised!

Not a real problem if a quick-and-dirty solution is all that is needed

Difficult to maintain hierarchy in the long term

‘Standard’ problems with non-normalised database Possible conflicting information, inefficient storage…

Cfr MASDEA; too simple

Hierarchy: normalised tables

Every rank is represented by a separate table

Not very difficult to write a query to regenerate flat table

Every taxon can have additional information Extra fields with description…

Hierarchy: cascading tables

Hierarchy: normalised tables

Avantages Easy to maintain and query Normailised, possible to add information at any level

of the hierarchy Drawbacks

Ranks are hard-wired on the structure of the database New rank would require change of the structure of the

database And probably of the user interface, web interface…

Number of tables Lot of functionality duplicated

Taxonomic reality

Ranks used depend on the taxonomic group Botany: mainly infra-specific; zoology: mainly

on higher levels Many of the ranks are only sparsely used

Needs for a more flexible system Much of the functionality is the same

across all ranks ‘parent’, synonymy Authority, description…

‘Open Hierarchy’

Possible to define new ranks without having to rewrite the structure of the database

All taxonomic names are stored in a field in a single table; other fields indicate parent and rank

Many-to-one relation: a single parent, several descendants Include ID of parent in the record of the

descendant

Open Hierarchy

Avantages Completely normalised Flexible

Drawbacks Difficult to query classification

Queries of the type ‘all species of the Echinodermata’… Solution:

‘Calculated field’ Programmatical (loop in computer language) Recursive query

Synonymy

Every taxon can have several synonyms; in principle, only one valid name for any synonym Many-to-one relation: one valid name,

many synonymous names Include ID of the valid name in the record

of the synonymous nameOther fields for the type of synonymy…

Implementation in OBIS(PostgreSQL)

Calculated field: ‘stored path’

Calculate a field, as a concatenation of id of parent, parent of parent…

E.g. x5x45x65x 5: Animalia 45: Arthropoda 65: Crustacea Stored path of all taxa belonging to

Crustacea start with x5x45x65x

Query the Stored Path

Get all species from Echinodermata:

select * from obis.tnameswhere storedpath~(select '^'||

storedpath||id||'x' from obis.tnameswhere tname='Echinodermata')::textand rank_id=220

Recursive query

All taxa belonging to given taxon:

with recursive includedtaxa(id, tname) as (select id, tname from obis.tnames where tname='Semelidae'

unionselect tnames.id, tnames.tname from obis.tnames inner join includedtaxa on tnames.parent_id=includedtaxa.id

) select * from includedtaxa order by tname

The other way

Finding parent of given rank of a species

with recursive parenttaxa(id, parent_id, tname) as (select id, parent_id, tname from obis.tnames where tname='Abra alba'

unionselect tnames.id, tnames.parent_id, tnames.tname from obis.tnames inner join parenttaxa on parenttaxa.parent_id=tnames.idand tnames.rank_id>=140

) select * from parenttaxa order by tname

Rest of the taxonomic model

Ranks should be in a separate table Information on the level of the rank

can be added Possibility of extra quality control

Rank of a parent as compared to rank of descendants

Rank of siblings should be same

Documentation

Documenting sources of information Add sources/references

‘Audit trail’: source of the information in the database

Taxonomic information: reference of the original description

Type of the source: expert, database, publication

Date and person responsible for the last revision of the record

Sources

Many-to-many relation Every source can contain information on

several taxa A single taxon can be documented in several

sources

Necessitates an extra table to represent the relationship Divide one many-to-many in to one-to-many

relationships

Add distribution

Localities from where a taxon has been reported

Many-to-many relation One locality has several taxa One taxon is found on several localities

Relation must be qualified Source! Validity of the observation