Vocabulary Workshop, RAL, February 25, 2009. Controlled Vocabularies: What, Why, How?
NERC DataGrid
Controlled Vocabularies: What, Why, How?
Vocabulary Workshop, RAL, February 25, 2009
Metadata
Love it or hate it, automated data handling isn’t possible without metadata.
For automated data handling to be possible across distributed data sources, metadata standards are required.
Standardised metadata comprises fields that represent real-world entities such as location, time and phenomena.
These fields need to be populated.
Plaintext may be used. This makes population easy, but the result is next to useless for automated handling.
Some real examples:
  A wide variety of chemical and biological parameters
  Amplitude de l'echo retrodiffuse
  Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota
  MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH
Plaintext should be confined to abstracts
Controlled Vocabularies
It is much better to use concepts labelled with universally agreed terms that have universally agreed meanings.
A collection of concepts designed to populate a given metadata field may be called a controlled vocabulary
Controlled vocabularies:
  Ensure consistent spellings
  Ensure consistent syntax
Well-managed controlled vocabularies:
  Prevent metadata misunderstandings
  Maintain a static relationship between metadata fields and the real world
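As a minimal sketch of what populating a field from a controlled vocabulary looks like in practice, the Python below checks a value against a small vocabulary before it is accepted. The vocabulary contents and the names `PARAMETER_VOCAB`, `is_valid` and `validate_field` are hypothetical, chosen only for illustration.

```python
# Hypothetical controlled vocabulary for a "parameter" metadata field.
# The codes and labels are illustrative, not taken from a real vocabulary server.
PARAMETER_VOCAB = {
    "TEMP": "Temperature of the water column",
    "PSAL": "Salinity of the water column",
    "CPHL": "Chlorophyll pigment concentrations in the water column",
}


def is_valid(value, vocabulary):
    """Return True only if the value is exactly one of the agreed terms."""
    return value in vocabulary


def validate_field(value, vocabulary):
    """Return the agreed label for a term, or raise if the term is unknown."""
    if not is_valid(value, vocabulary):
        raise ValueError(f"{value!r} is not a term in the controlled vocabulary")
    return vocabulary[value]


print(validate_field("TEMP", PARAMETER_VOCAB))
print(is_valid("temperature (deg C)", PARAMETER_VOCAB))
```

Free-text variants such as "temperature (deg C)" are rejected rather than silently stored, which is what keeps spelling and syntax consistent.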
Thesauri
Concepts within a controlled vocabulary may be semantically connected using simple relationships:
  Blue broader colour
  Colour narrower blue
  Colour related pigmentation
Concepts from different controlled vocabularies describing the same type of thing may be semantically connected using simple mapping relationships:
  Bacillariophyceae exactMatch diatoms
  IPTS68 temperature closeMatch ITS90 temperature
  Nutrients in rivers relatedMatch nitrate in water bodies
  Salinity broadMatch physical oceanography
  Physical oceanography narrowMatch salinity
The results may be termed thesauri.
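The relationships listed above can be sketched as simple (subject, predicate, object) tuples. The example below is an illustration only: it assumes that broader and narrower are inverses and that related is symmetric, and the helper names `with_inverses` and `lookup` are hypothetical.

```python
# Thesaurus relationships from the slide, stored as (subject, predicate, object).
RELATIONS = [
    ("blue", "broader", "colour"),
    ("colour", "related", "pigmentation"),
]

# Assumption: broader/narrower are inverses and related is symmetric, matching
# how the slide pairs "Blue broader colour" with "Colour narrower blue".
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


def with_inverses(relations):
    """Return the relations plus the inverse statement implied by each one."""
    closed = set(relations)
    for subject, predicate, obj in relations:
        closed.add((obj, INVERSE[predicate], subject))
    return closed


def lookup(relations, subject, predicate):
    """Return, sorted, every object linked to the subject by the predicate."""
    return sorted(o for s, p, o in relations if s == subject and p == predicate)


thesaurus = with_inverses(RELATIONS)
print(lookup(thesaurus, "colour", "narrower"))
print(lookup(thesaurus, "pigmentation", "related"))
```

Only one direction of each relationship needs to be recorded by hand; the other follows from the inverse table.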
Ontologies
But what if the controlled vocabularies describe different types of thing?
We can relate them by increasing the semantic richness of the relationships
For example:
We could have a controlled vocabulary of instruments
We could also have a controlled vocabulary of parameters
We can link these up using relationships such as:
  Thermosalinograph measures salinity
  Fluorometer measures chlorophyll
  Air temperature measuredBy psychrometer
The result may be termed an ontology
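A minimal sketch of such an ontology, assuming two small vocabularies and a single "measures" relationship between them. The last slide example is stored in its inverse reading (psychrometer measures air temperature), and the function names are hypothetical.

```python
# Two controlled vocabularies describing different types of thing.
INSTRUMENTS = {"thermosalinograph", "fluorometer", "psychrometer"}
PARAMETERS = {"salinity", "chlorophyll", "air temperature"}

# (instrument, parameter) pairs, read as "instrument measures parameter".
# Assumption: "Air temperature measuredBy psychrometer" is stored as its inverse.
MEASURES = [
    ("thermosalinograph", "salinity"),
    ("fluorometer", "chlorophyll"),
    ("psychrometer", "air temperature"),
]


def check_links(links):
    """Raise if a link uses a term that is missing from its vocabulary."""
    for instrument, parameter in links:
        if instrument not in INSTRUMENTS:
            raise ValueError(f"unknown instrument: {instrument!r}")
        if parameter not in PARAMETERS:
            raise ValueError(f"unknown parameter: {parameter!r}")


def instruments_for(parameter):
    """Answer 'which instruments measure this parameter?'."""
    return sorted(i for i, p in MEASURES if p == parameter)


def parameters_for(instrument):
    """Answer 'what does this instrument measure?'."""
    return sorted(p for i, p in MEASURES if i == instrument)


check_links(MEASURES)
print(instruments_for("air temperature"))
print(parameters_for("fluorometer"))
```

Checking each end of a link against its own vocabulary is what distinguishes this from free-text annotation: both sides of every relationship are controlled terms.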
Ontology relationships are:
  Semantically rich
  Potentially abundant
Software agents need some understanding of these relationships to exploit the knowledge encoded in the ontology.
This is achieved through rules, which are relationships that describe relationships.
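To illustrate a rule, the sketch below states that "measures" is the inverse of "measuredBy" and lets a small inference loop derive the statements that follow. The rule name "inverseOf" is borrowed from OWL's owl:inverseOf; the function `infer` and the data layout are assumptions made for this example, not a real reasoner.

```python
# Facts are (subject, predicate, object) triples about concepts.
FACTS = {
    ("thermosalinograph", "measures", "salinity"),
    ("air temperature", "measuredBy", "psychrometer"),
}

# Rules are triples whose subject and object are themselves predicates.
RULES = {("measures", "inverseOf", "measuredBy")}


def infer(facts, rules):
    """Apply inverseOf rules repeatedly until no new facts appear."""
    inverse = {}
    for first, rule, second in rules:
        if rule == "inverseOf":
            inverse[first] = second
            inverse[second] = first
    known = set(facts)
    while True:
        new = {(o, inverse[p], s) for s, p, o in known if p in inverse} - known
        if not new:
            return known
        known |= new


for triple in sorted(infer(FACTS, RULES)):
    print(triple)
```

A software agent that knows only this one rule can answer "what measures air temperature?" even though that statement was never entered directly.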
Knowledge Representation
Relationships between concepts may be expressed using the Resource Description Framework (RDF)
RDF is a W3C standard, commonly encoded as XML, with ‘triples’ as its basic building block
Each triple has a subject, a predicate and an object. For example:
  Colour related pigmentation
  Thermosalinograph measures salinity
Familiar?
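The triples above can be written out in N-Triples, a line-based W3C text serialisation of RDF, with a few lines of Python. This is a sketch only: the namespace `http://example.org/vocab/` and the helpers `uri` and `to_ntriples` are hypothetical, and a real system would use the URIs published by the vocabulary owner.

```python
# Hypothetical namespace, used only so that each name becomes a URI.
BASE = "http://example.org/vocab/"

TRIPLES = [
    ("Colour", "related", "pigmentation"),
    ("Thermosalinograph", "measures", "salinity"),
]


def uri(name):
    """Turn a plain name into a URI reference in N-Triples syntax."""
    return "<" + BASE + name.replace(" ", "_") + ">"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```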
Controlled vocabularies (concept collections) and thesauri may be represented using the Simple Knowledge Organization System (SKOS)
W3C standard vocabulary based on RDF
Jointly developed by STFC and Manchester University Computer Science
2008 version is the one to use
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>
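A record like the one above can be read with Python's standard xml.etree.ElementTree module. The sketch below embeds a shortened copy of the same document and pulls out the concept URI, its preferred label and its broadMatch target; the variable names are illustrative, and a dedicated RDF library would be a reasonable alternative for anything larger.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SKOS record shown on the slide.
SKOS_XML = """<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

root = ET.fromstring(SKOS_XML)
concept = root.find("skos:Concept", NS)

# Namespaced attributes are addressed as "{namespace-uri}local-name".
concept_uri = concept.get("{%s}about" % NS["rdf"])
pref_label = concept.findtext("skos:prefLabel", namespaces=NS)
broad_match = concept.find("skos:broadMatch", NS).get("{%s}resource" % NS["rdf"])

# The same information expressed as subject, predicate, object triples.
triples = [
    (concept_uri, "skos:prefLabel", pref_label),
    (concept_uri, "skos:broadMatch", broad_match),
]
for triple in triples:
    print(triple)
```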
Ontologies may be represented using the Web Ontology Language (OWL)
W3C standard built on RDF
Example OWL document: http://mida.ucc.ie/ont/20080124/theme.owl
Alternative simple text encodings are available, such as Open Biomedical Ontologies (OBO)
OBO is used for the NERC-related EnvO ontology
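Because OBO is plain text made of "[Term]" stanzas with "tag: value" lines, it can be read with very little code. The sketch below is a simplified reader, not a complete OBO parser, and the identifiers in `OBO_TEXT` (the `EX:` prefix and both terms) are hypothetical rather than taken from EnvO.

```python
# Hypothetical OBO fragment; the EX: identifiers are invented for illustration.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: EX:0000001
name: water

[Term]
id: EX:0000002
name: sea water
is_a: EX:0000001 ! water
"""


def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; any other stanza type is skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, _, value = line.partition(": ")
            # Text after " ! " is a human-readable comment, so drop it.
            current.setdefault(tag, []).append(value.split(" ! ")[0])
    return terms


for term in parse_obo_terms(OBO_TEXT):
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```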
Knowledge Management Tools
RDF
Tools abound – see for example http://planetrdf.com/guide/
Jena is one of the better known
SKOS
See the SKOS Tool Shed http://esw.w3.org/topic/SkosDev/ToolShed
Note this includes a Protégé plugin
OWL
Protégé with the appropriate plugin is the most widely used
There are commercial alternatives such as TopBraid Composer
MMI (http://marinemetadata.org) has developed a vocabulary to OWL converter (voc2OWL)
OBO
OBO files are plain text, so text tools work
OWL and SKOS converters are available
Mapping
MMI have developed a mapping tool (VINE) to build maps from two OWL files
Visualisation
Concept maps are useful
CmapTools is very good
FreeMind (open source)