NERC DataGrid

Controlled Vocabularies: What, Why, How?

Vocabulary Workshop, RAL, February 25, 2009




Metadata

Love it or hate it: without metadata, automated data handling isn’t possible

For automated data handling to be possible across distributed data sources, metadata standards are required

Standardised metadata comprises fields that represent real-world entities such as location, time, phenomena, etc.


Metadata

These fields need to be populated

Plain text may be used. This makes population easy, but the result is next to useless.

Some real examples:

  • A wide variety of chemical and biological parameters
  • Amplitude de l'echo retrodiffuse (French for "amplitude of the backscattered echo")
  • Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota
  • MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH

Plain text should be confined to abstracts


Controlled Vocabularies

It is much better to use concepts labelled with universally agreed terms that have universally agreed meanings

A collection of concepts designed to populate a given metadata field may be called a controlled vocabulary

Controlled vocabularies:

  • Ensure consistent spellings
  • Ensure consistent syntax

Well-managed controlled vocabularies:

  • Prevent metadata misunderstandings
  • Maintain a static relationship between metadata fields and the real world
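The following minimal Python sketch makes the contrast with plain text concrete. It uses a controlled vocabulary to validate a metadata field: only agreed keys are accepted, and each key carries an agreed label and meaning. The concept keys, labels, definitions and the `validate_field` helper are hypothetical illustrations and are not taken from any NERC vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    key: str          # stable identifier stored in metadata records
    pref_label: str   # agreed term shown to people
    definition: str   # agreed meaning


# Hypothetical vocabulary for a "parameter" metadata field (illustrative only).
PARAMETER_VOCAB = {
    c.key: c
    for c in (
        Concept("TEMP", "Temperature of the water column",
                "Temperature of sea water at the point of measurement"),
        Concept("PSAL", "Practical salinity of the water column",
                "Salinity expressed on the Practical Salinity Scale"),
        Concept("CPHL", "Chlorophyll concentration in the water column",
                "Concentration of chlorophyll pigments in sea water"),
    )
}


def validate_field(value: str, vocab: dict) -> Concept:
    """Accept a metadata value only if it is a key in the controlled vocabulary."""
    try:
        return vocab[value]
    except KeyError:
        raise ValueError(f"{value!r} is not a term in the controlled vocabulary") from None


print(validate_field("PSAL", PARAMETER_VOCAB).pref_label)
# A free-text value is rejected rather than silently stored:
try:
    validate_field("salinity (approx.)", PARAMETER_VOCAB)
except ValueError as err:
    print(err)
```

The record stores the key, not the label. This keeps spelling and syntax consistent, and the label or definition can be looked up whenever a person needs it.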


Thesauri

Concepts within a controlled vocabulary may be semantically connected using simple relationships:

  • Blue broader colour
  • Colour narrower blue
  • Colour related pigmentation

Concepts from different controlled vocabularies describing the same type of thing may be semantically connected using simple mapping relationships:

  • Bacillariophyceae exactMatch diatoms
  • IPTS68 temperature closeMatch ITS90 temperature
  • Nutrients in rivers relatedMatch nitrate in water bodies
  • Salinity broadMatch physical oceanography
  • Physical oceanography narrowMatch salinity

The results may be termed thesauri
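Each thesaurus relationship either has a defined inverse (broader/narrower, broadMatch/narrowMatch) or is symmetric (related, exactMatch, closeMatch, relatedMatch), so only one direction needs to be stated. The minimal Python sketch below stores the slide's examples as (subject, relationship, object) tuples and adds the implied reverse statements. The `INVERSES` table and `add_implied` function are written only for this illustration. In SKOS itself, these inverse and symmetric properties are declared by the vocabulary.

```python
# Each relationship mapped to the relationship implied in the reverse direction.
# Symmetric relationships map to themselves; the table mirrors SKOS semantics.
INVERSES = {
    "broader": "narrower",
    "narrower": "broader",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
    "related": "related",
    "exactMatch": "exactMatch",
    "closeMatch": "closeMatch",
    "relatedMatch": "relatedMatch",
}


def add_implied(statements):
    """Return the statements plus the reverse-direction statement each one implies."""
    closed = set(statements)
    for subject, relation, obj in statements:
        inverse = INVERSES.get(relation)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


stated = [
    ("blue", "broader", "colour"),
    ("colour", "related", "pigmentation"),
    ("Bacillariophyceae", "exactMatch", "diatoms"),
    ("salinity", "broadMatch", "physical oceanography"),
]
for triple in sorted(add_implied(stated)):
    print(*triple)
```

Stating "blue broader colour" is therefore enough for a program to also answer "what is narrower than colour?".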


Ontologies

But what if the controlled vocabularies describe different types of thing?

We can relate them by increasing the semantic richness of the relationships

For example:

We could have a controlled vocabulary of instruments

We could also have a controlled vocabulary of parameters


Ontologies

We can link these up using relationships such as:

  • Thermosalinograph measures salinity
  • Fluorometer measures chlorophyll
  • Air temperature measuredBy psychrometer

The result may be termed an ontology
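An ontology links concepts from vocabularies of different types. The minimal Python sketch below assumes each concept is tagged with the vocabulary it belongs to (instruments or parameters). It checks that every `measures` or `measuredBy` link joins the right two vocabularies, then answers "which instruments measure X?". The vocabulary sets and helper names are hypothetical. The query follows only `measures` statements, so it misses the psychrometer, whose link is stated with `measuredBy`. Closing that kind of gap is what rules are needed for.

```python
# Hypothetical vocabularies: which concepts are instruments and which are parameters.
INSTRUMENTS = {"thermosalinograph", "fluorometer", "psychrometer"}
PARAMETERS = {"salinity", "chlorophyll", "air temperature"}

# The slide's statements linking the two vocabularies.
ONTOLOGY = [
    ("thermosalinograph", "measures", "salinity"),
    ("fluorometer", "measures", "chlorophyll"),
    ("air temperature", "measuredBy", "psychrometer"),
]

# Expected (subject vocabulary, object vocabulary) for each relationship.
SIGNATURES = {
    "measures": (INSTRUMENTS, PARAMETERS),
    "measuredBy": (PARAMETERS, INSTRUMENTS),
}


def check_links(statements):
    """Raise if a relationship joins concepts from the wrong vocabularies."""
    for subject, relation, obj in statements:
        domain, range_ = SIGNATURES[relation]
        if subject not in domain or obj not in range_:
            raise ValueError(f"bad link: {subject} {relation} {obj}")


def instruments_measuring(parameter, statements):
    """Answer 'which instruments measure X?' by following 'measures' only."""
    return sorted(s for s, rel, o in statements if rel == "measures" and o == parameter)


check_links(ONTOLOGY)
print(instruments_measuring("salinity", ONTOLOGY))
print(instruments_measuring("air temperature", ONTOLOGY))  # empty: measuredBy is not understood
```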


Ontologies

Ontology relationships are:

  • Semantically rich
  • Potentially abundant

Software agents need some understanding of the relationships to exploit the knowledge encoded in the ontology

This is achieved through relationships that describe other relationships, called rules
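A rule such as "measuredBy is the inverse of measures" lets a software agent derive "psychrometer measures air temperature" from the stated "air temperature measuredBy psychrometer". The Python sketch below represents rules as plain functions that look at one statement at a time. It applies them repeatedly until no new statements appear (naive forward chaining). It illustrates the idea only and is not the rule language of any particular ontology tool. In OWL, this specific rule would be declared with `owl:inverseOf`. The `inverse_rule` and `forward_chain` names are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Triple = tuple[str, str, str]          # (subject, relationship, object)
Rule = Callable[[Triple], Iterable[Triple]]


def inverse_rule(p: str, q: str) -> Rule:
    """Hypothetical rule: relationship p is the inverse of relationship q."""
    def rule(t: Triple) -> Iterator[Triple]:
        subject, relation, obj = t
        if relation == p:
            yield (obj, q, subject)
        elif relation == q:
            yield (obj, p, subject)
    return rule


def forward_chain(statements: Iterable[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule to newly derived statements until nothing new appears."""
    known = set(statements)
    frontier = set(known)
    while frontier:
        derived = {
            new
            for t in frontier
            for rule in rules
            for new in rule(t)
            if new not in known
        }
        known |= derived
        frontier = derived
    return known


facts = [
    ("thermosalinograph", "measures", "salinity"),
    ("air temperature", "measuredBy", "psychrometer"),
]
for triple in sorted(forward_chain(facts, [inverse_rule("measures", "measuredBy")])):
    print(*triple)
```

Once the rule has run, a query that follows only `measures` will also find the psychrometer.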


Knowledge Representation

Relationships between concepts may be expressed using Resource Description Framework (RDF)

W3C standard with ‘triples’ as its basic building block, commonly encoded in XML (RDF/XML)

Each triple has a subject, a predicate and an object. For example:

  • Colour related pigmentation
  • Thermosalinograph measures salinity

Familiar?
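The "Familiar?" point is that the thesaurus and ontology statements on the earlier slides are already triples. Below is a minimal in-memory Python store with wildcard matching, where `None` means "anything". It shows how an agent might query such statements. The `TripleStore` class is a toy written for illustration. Real RDF stores such as Jena identify every term with a URI and answer queries in SPARQL.

```python
from typing import Optional


class TripleStore:
    """Toy store of (subject, predicate, object) triples with wildcard matching."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[tuple[str, str, str]]:
        """Return matching triples, treating None as a wildcard, in sorted order."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("colour", "related", "pigmentation")
store.add("thermosalinograph", "measures", "salinity")
print(store.match(predicate="measures"))
print(store.match(subject="colour"))
```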


Knowledge Representation

Controlled vocabularies (concept collections) and thesauri may be represented using the Simple Knowledge Organization System (SKOS)

W3C standard XML schema based on RDF

Jointly developed by STFC and Manchester University Computer Science

2008 version is the one to use
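The minimal Python sketch below uses the standard library's `xml.etree.ElementTree` to serialise a single concept as SKOS in RDF/XML. The RDF and SKOS namespace URIs are the real ones. The `skos_concept` helper, the `http://example.org/vocab/...` URIs and the labels are hypothetical. A production vocabulary would normally be written with an RDF library rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Make ElementTree emit the conventional rdf: and skos: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_concept(uri, pref_label, broader=()):
    """Build an rdf:RDF document holding one skos:Concept (hypothetical helper)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = pref_label
    for target in broader:
        # skos:broader points at the URI of the more general concept.
        ET.SubElement(concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": target})
    return ET.tostring(root, encoding="unicode")


xml_text = skos_concept("http://example.org/vocab/blue", "blue",
                        ["http://example.org/vocab/colour"])
print(xml_text)
```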


Knowledge Representation

```xml
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>
```
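To show how a software agent might read a record like the one above, the sketch below uses Python's standard `xml.etree.ElementTree`. It extracts the concept URI, the preferred and alternative labels and the broadMatch target. The XML string reproduces the record from the slide, and the element names come from it. The `summarise_concept` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SKOS_RECORD = """<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <skos:Concept rdf:about="http://vocab.ndg.nerc.ac.uk/term/P011/116/TEMPS901">
    <skos:externalID>SDN:P011:116:TEMPS901</skos:externalID>
    <skos:prefLabel>Temperature (ITS-90) of the water column by CTD or STD</skos:prefLabel>
    <skos:altLabel>CTDTmp90</skos:altLabel>
    <skos:definition>Unavailable</skos:definition>
    <dc:date>2009-02-09T10:45:32.262+0000</dc:date>
    <skos:broadMatch rdf:resource="http://vocab.ndg.nerc.ac.uk/term/P021/37/TEMP" />
  </skos:Concept>
</rdf:RDF>"""


def summarise_concept(xml_text):
    """Pull the key fields out of a one-concept SKOS RDF/XML document."""
    root = ET.fromstring(xml_text)
    concept = root.find("skos:Concept", namespaces=NS)
    broad = concept.find("skos:broadMatch", namespaces=NS)
    return {
        "uri": concept.get(f"{{{NS['rdf']}}}about"),
        "prefLabel": concept.findtext("skos:prefLabel", namespaces=NS),
        "altLabel": concept.findtext("skos:altLabel", namespaces=NS),
        "broadMatch": broad.get(f"{{{NS['rdf']}}}resource") if broad is not None else None,
    }


for field, value in summarise_concept(SKOS_RECORD).items():
    print(f"{field}: {value}")
```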


Knowledge Representation

Ontologies may be represented using Web Ontology Language (OWL)

W3C standard XML schema based on RDF

Example OWL document: http://mida.ucc.ie/ont/20080124/theme.owl

Simpler text-based alternatives are available, such as the Open Biomedical Ontologies (OBO) format

OBO is used for the NERC-related EnvO ontology
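OBO's appeal is that it is line-oriented text: a stanza header such as `[Term]` is followed by `tag: value` lines. The minimal Python parsing sketch below handles only `[Term]` stanzas, single-line tags and trailing `! comment` text. A real OBO parser must cover much more of the format (escapes, qualifiers, other stanza types). The term IDs and names in the sample are hypothetical and are not taken from EnvO.

```python
def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Collect the tag/value pairs of each [Term] stanza (simplified sketch)."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Start a new record for [Term]; ignore other stanza types.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header lines before the first stanza are skipped
        tag, _, value = line.partition(":")
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: water body
is_a: EX:0000000 ! environmental feature

[Term]
id: EX:0000002
name: river
is_a: EX:0000001 ! water body
"""

for term in parse_obo_terms(SAMPLE):
    print(term["id"][0], term["name"][0], "is_a", term["is_a"][0])
```

Because the format is plain text, the same file can also be inspected with ordinary text tools.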


Knowledge Management Tools

RDF

Tools abound – see for example http://planetrdf.com/guide/

Jena is one of the best known

SKOS

See the SKOS Tool Shed http://esw.w3.org/topic/SkosDev/ToolShed

Note that this includes a Protégé plugin


Knowledge Management Tools

OWL

Protégé with the appropriate plugin is the most widely used

There are commercial alternatives such as TopBraid Composer

MMI (http://marinemetadata.org) has developed a vocabulary to OWL converter (voc2OWL)

OBO

Plain text, so standard text tools work

OWL and SKOS converters are available


Knowledge Management Tools

Mapping

MMI have developed a mapping tool (VINE) to build maps from two OWL files
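The minimal Python sketch below is not how VINE works. It only illustrates the kind of output a mapping exercise produces: candidate exactMatch links between two vocabularies, proposed wherever their labels agree after normalisation and left for a human to confirm. The vocabularies, keys and normalisation rule are hypothetical. Label comparison cannot find a mapping such as "Bacillariophyceae exactMatch diatoms", which is why expert judgement remains essential.

```python
import re


def normalise(label: str) -> str:
    """Lower-case a label and collapse punctuation and spaces (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def candidate_exact_matches(vocab_a: dict[str, str], vocab_b: dict[str, str]):
    """Propose (key_a, 'exactMatch', key_b) wherever normalised labels coincide."""
    index: dict[str, list[str]] = {}
    for key, label in vocab_b.items():
        index.setdefault(normalise(label), []).append(key)
    return sorted(
        (key_a, "exactMatch", key_b)
        for key_a, label in vocab_a.items()
        for key_b in index.get(normalise(label), [])
    )


vocab_a = {"A1": "Sea Temperature", "A2": "Salinity"}
vocab_b = {"B7": "sea-temperature", "B9": "Chlorophyll"}
print(candidate_exact_matches(vocab_a, vocab_b))
```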

Visualisation

Concept maps are useful

CmapTools is very good

FreeMind (open source)