
Technologies and practices for maintaining and publishing earth science vocabularies

Simon J D Cox, Jonathan Yu, Megan Williams, Fabrizio Giabardo, Dominic Lowe

16 April 2015

LAND AND WATER FLAGSHIP

Are these the same?

Publishing earth science vocabularies | Cox, Yu, Williams, Giabardo, Lowe

“nitrogen”

“dissolved nitrogen”

“Total nitrogen, water, filtered, milligrams per liter”

“Concentration of nitrogen (total) per unit volume of the water body [dissolved plus reactive particulate phase] by oxidation and colorimetric autoanalysis”

“Concentration of nitrogen (total) per unit mass of the water body [dissolved plus reactive particulate <GF/F phase] by filtration and high temperature Pt catalytic oxidation”

“Concentration (moles or mass) of total nitrogen (i.e. nitrogen in all chemical forms) in suspended particulate material per unit volume of the water column.”

“Concentration of nitrogen (total) {'PON'} per unit volume of the water body [particulate 2-10um phase] by filtration, acidification and elemental analysis”

“Dissolved total and organic nitrogen concentrations in the water column”

Why are vocabularies important?

O&M domain specialization

[Figure: UML class diagram of OM_Observation, annotated with the vocabulary or register that supplies each association]

OM_Observation attributes: phenomenonTime, resultTime, validTime [0..1], resultQuality [0..*], parameter [0..*]

• +observedProperty [1]: GFI_PropertyType (observed property: Parameter dictionary)
• +procedure [1]: OM_Process, with +generatedObservation [0..*] (procedure: Register of sensors, processes & algorithms)
• +featureOfInterest [1]: GFI_Feature / GFI_DomainFeature, with +propertyValueProvider [0..*] (feature of interest: Feature-type catalogue, Feature service)
• +result (Range): Any (result format: GML, SWE, netCDF, JSON, SQLite...)

RDF Data Cube 101 - Slices and observations

[Figure: Cube, Slice and Observation]

• Dimensions d1, d2, d3, d4, d5, d6, d7
• Measures m1, m2, …
• Attributes a1, a2, …

A linked sensor data cube, Lefort, 5th Intl. SSN workshop, 2012

W3C Data Cube ontology

• Each axis or variable specified as a skos:Concept
• Values of coded properties selected from a skos:ConceptScheme
• Homogeneous observations, common structure definition

The RDF Data Cube Vocabulary, Cyganiak & Reynolds, W3C Recommendation 2014
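The two ideas on this slide (coded values drawn from a concept scheme, and slices that fix some dimensions) can be sketched in a few lines of plain Python. This is only an illustration, not the RDF Data Cube vocabulary itself: the dimension names, the scheme members and the numeric values are all hypothetical.

```python
# Hypothetical concept scheme: the only values allowed for the coded
# "property" dimension (stands in for a skos:ConceptScheme).
PROPERTY_SCHEME = {"nitrogen-total", "nitrogen-dissolved"}

# Hypothetical observations sharing one structure: three dimensions
# (site, property, year) and one measure (value). Values are made up.
OBSERVATIONS = [
    {"site": "S1", "property": "nitrogen-total", "year": 2014, "value": 0.42},
    {"site": "S1", "property": "nitrogen-dissolved", "year": 2014, "value": 0.30},
    {"site": "S2", "property": "nitrogen-total", "year": 2014, "value": 0.55},
]


def invalid_codes(observations, scheme):
    """Return observations whose coded property is not in the scheme."""
    return [o for o in observations if o["property"] not in scheme]


def slice_cube(observations, **fixed):
    """Return the observations that match every fixed dimension value."""
    return [
        o for o in observations
        if all(o[dim] == value for dim, value in fixed.items())
    ]


if __name__ == "__main__":
    print(invalid_codes(OBSERVATIONS, PROPERTY_SCHEME))
    print(slice_cube(OBSERVATIONS, site="S1"))
```

Fixing `site="S1"` leaves a slice that still varies along the remaining dimensions; fixing all three dimensions selects a single observation.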

What is available?

AGU

Thomson-Reuters

ANZSRC

ICS

GSSP

GCMD

Standard ontology of chemicals

Vocabulary formalization

Formalization: RDF – SKOS for basic vocabularies

chem:sodium
    a skos:Concept ;
    rdfs:label "sodium"^^xsd:string ;
    skos:broader chem:alkali ;
    skos:exactMatch <http://dbpedia.org/resource/Sodium> ;
    skos:inScheme skos:chemicals ;
    skos:prefLabel "nátrium"@hu , "sodio"@it , "sodium"@fr , "sodium"@en .
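As a rough illustration of why the concept, not the term, is the unit of identity, the sodium example can be mirrored as a small Python structure in which one identifier carries several language-tagged labels. The dictionary layout and the `pref_label` helper are assumptions made for this sketch; they are not part of SKOS or of any particular library.

```python
# The chem:sodium concept from the Turtle above, as a plain dictionary.
SODIUM = {
    "uri": "chem:sodium",
    "prefLabel": {"hu": "nátrium", "it": "sodio", "fr": "sodium", "en": "sodium"},
    "broader": ["chem:alkali"],
    "exactMatch": ["http://dbpedia.org/resource/Sodium"],
}


def pref_label(concept, languages=("en",)):
    """Return the first preferred label available in the requested languages.

    Falls back to the concept identifier when no label matches, so the
    caller always gets something that identifies the concept.
    """
    for lang in languages:
        if lang in concept["prefLabel"]:
            return concept["prefLabel"][lang]
    return concept["uri"]


if __name__ == "__main__":
    print(pref_label(SODIUM, ("de", "hu")))
    print(pref_label(SODIUM, ("de",)))
```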

RDFS

Semantic web dead long live semantic web | Simon Cox18 |

[Figure: RDFS schema relating a geologic timescale model to SKOS]

• Classes: GeochronEra, TemporalReferenceSystem, skos:Concept, skos:ConceptScheme
• Properties: component, member, skos:hasTopConcept, skos:narrower
• Relations shown: subClassOf, subPropertyOf, domain, range

Inferencing

• Entailments and reasoning

• What does this combination of axioms imply?

• Is there anything unexpected?

[Figure: asserted and inferred statements for a stratigraphic chart]

• Individuals: StratigraphicChart, Phanerozoic, Cenozoic, Neogene
• Classes: TemporalReferenceSystem, GeochronEra, ConceptScheme, Concept
• Edges: type, component, member, hasTopConcept, narrower, narrowerTransitive, broaderTransitive
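One of the entailments named on this slide can be sketched without a reasoner. In SKOS, every skos:narrower statement also implies skos:narrowerTransitive, which is transitive and has skos:broaderTransitive as its inverse. The sketch below assumes that the asserted narrower edges run Phanerozoic, Cenozoic, Neogene (read off the diagram labels); the function names are hypothetical.

```python
# Asserted skos:narrower edges, assumed from the diagram labels.
NARROWER = {
    "Phanerozoic": {"Cenozoic"},
    "Cenozoic": {"Neogene"},
}


def narrower_transitive(asserted):
    """Return the transitive closure of the asserted narrower relation."""
    closure = {parent: set(children) for parent, children in asserted.items()}
    changed = True
    while changed:
        changed = False
        for children in closure.values():
            for child in list(children):
                # Anything narrower than a child is also narrower than the parent.
                extra = closure.get(child, set()) - children
                if extra:
                    children |= extra
                    changed = True
    return closure


def broader_transitive(closure):
    """Invert the closure to obtain the broaderTransitive relation."""
    inverse = {}
    for parent, descendants in closure.items():
        for descendant in descendants:
            inverse.setdefault(descendant, set()).add(parent)
    return inverse


if __name__ == "__main__":
    closure = narrower_transitive(NARROWER)
    print(closure)
    print(broader_transitive(closure))
```

Comparing the closure with what the experts expect is one simple way to answer the slide's question "Is there anything unexpected?".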

Formalization and encoding process

Create order within existing Excel spreadsheets

Every layout is different

Formalization and encoding process

RDF123

Every mapping is different

Formalization and encoding process

Turtle, in text editor …
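The spreadsheet-to-Turtle step can be approximated with a short script. This is a minimal sketch, not the RDF123 tool and not the authors' mapping: it assumes a tidy table with hypothetical columns `id`, `label` and `broader`, and a placeholder namespace `http://example.org/chem/`. Real spreadsheets vary, so the column handling would differ for every source.

```python
import csv
import io

# Hypothetical tidy export of a spreadsheet: one row per term.
SOURCE = """id,label,broader
alkali,alkali metal,
sodium,sodium,alkali
"""


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_turtle(text, prefix="chem", base="http://example.org/chem/"):
    """Turn CSV rows into SKOS concepts serialized as Turtle."""
    lines = [
        f"@prefix {prefix}: <{base}> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
    ]
    for row in csv.DictReader(io.StringIO(text)):
        label = escape(row["label"])
        statements = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        if row["broader"]:
            parent = row["broader"]
            statements.append(f"skos:broader {prefix}:{parent}")
        subject = row["id"]
        lines.append(f"{prefix}:{subject}")
        lines.append("    " + " ;\n    ".join(statements) + " .")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_turtle(SOURCE))
```

The output still needs review in a text editor, which is where the people and judgement come in.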

People + judgement

Vocabulary distribution

Delivery

• Physical documents, PDF
• Tables on web pages
• Bespoke XML documents
• RDF documents, OWL documents
• Web services
• RESTful web resources, Linked data

Vocabulary services | Cox & Yu

Publish as linked data

URI = web-scale foreign-key
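The foreign-key analogy can be made concrete with a small sketch: two independently maintained catalogues tag their records with the same concept URIs, so they can be joined without any shared database. The catalogues and record ids below are hypothetical; the two keyword URIs are the re-based GCMD identifiers that appear elsewhere in this deck.

```python
ABRASION = "http://registry.it.csiro.au/def/kwa/gcmd/ABRASION"
ABLATION = "http://registry.it.csiro.au/def/kwa/gcmd/ABLATION"

# Two hypothetical catalogues, maintained separately, that both use
# concept URIs (rather than free-text terms) as keywords.
CATALOGUE_A = [
    {"id": "a1", "keyword": ABRASION},
    {"id": "a2", "keyword": ABLATION},
]
CATALOGUE_B = [
    {"id": "b7", "keyword": ABRASION},
]


def join_on_uri(*catalogues):
    """Index record ids from all catalogues by the concept URI they cite."""
    index = {}
    for catalogue in catalogues:
        for record in catalogue:
            index.setdefault(record["keyword"], []).append(record["id"])
    return index


if __name__ == "__main__":
    for uri, record_ids in join_on_uri(CATALOGUE_A, CATALOGUE_B).items():
        print(uri, record_ids)
```

The join works only because both catalogues cite exactly the same URI string, which is why stable identifiers matter.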

Linked vocabularies can be shared and re-used

Status and lifecycle

Governance issues, design flaws

Governance issues

What is the best way to re-use existing content already published as linked data?

Do we fix it for them? Do we re-claim it?

Vocabulary deployment and governance | Cox32 |

Modeling flaws

GCMD science keywords

• Same textual definition, same label

• Different parent, different URI

– are they the same concept?
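A check for this kind of flaw can be sketched by grouping concepts on label and definition and reporting every group that holds more than one URI. The records below are hypothetical stand-ins (placeholder `example.org` URIs and parent names), not actual GCMD content, and a match only flags candidates for an expert to review.

```python
from collections import defaultdict

DEFINITION = (
    "Mechanical scraping of a rock surface by friction "
    "between rocks and moving particles."
)

# Hypothetical records: same label and definition, different parent and URI.
CONCEPTS = [
    {"uri": "http://example.org/concept/1", "label": "ABRASION",
     "definition": DEFINITION, "parent": "parent A"},
    {"uri": "http://example.org/concept/2", "label": "ABRASION",
     "definition": DEFINITION, "parent": "parent B"},
    {"uri": "http://example.org/concept/3", "label": "ABLATION",
     "definition": "Placeholder definition.", "parent": "parent A"},
]


def find_candidate_duplicates(concepts):
    """Group URIs by (label, definition) and keep groups with several URIs."""
    groups = defaultdict(list)
    for concept in concepts:
        key = (concept["label"].casefold(), concept["definition"].strip())
        groups[key].append(concept["uri"])
    return {key: uris for key, uris in groups.items() if len(uris) > 1}


if __name__ == "__main__":
    for (label, _), uris in find_candidate_duplicates(CONCEPTS).items():
        print(label, uris)
```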

Re-base the URI?

<http://registry.it.csiro.au/def/kwa/gcmd/ABRASION>
    a skos:Concept ;
    rdfs:label "ABRASION" ;
    dct:description "Mechanical scraping of a rock surface by friction between rocks and moving particles."@en ;
    owl:sameAs
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/8f57f4b0-5177-4362-81e8-ced75d37d1aa> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/fd29bf77-df38-4b80-8148-8184fa41d843> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/efacd4f6-59ea-4019-8265-8cc81ecc99c0> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/f6e19e2e-555a-4d40-9833-c7513d92c813> ;
    skos:prefLabel "ABRASION"@en .

Versioning flaws

NASA SWEET

http://sweet.jpl.nasa.gov/1.1/time.owl#PLIOCENE

http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pliocene

http://sweet.jpl.nasa.gov/2.1/reprTimeGeologicPeriod.owl#Pliocene

http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pliocene

http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pliocene

• Same label, and same place in hierarchy

• Different URI

- are they the same concept?
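The pattern in the SWEET identifiers can be made explicit by splitting each URI into version, module and term, then grouping on the term while ignoring case. This is a sketch using only the standard library; the `describe` and `group_by_term` helpers are hypothetical names, and a shared term name is a hint that the URIs may denote the same concept, not proof.

```python
from urllib.parse import urlsplit

URIS = [
    "http://sweet.jpl.nasa.gov/1.1/time.owl#PLIOCENE",
    "http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pliocene",
    "http://sweet.jpl.nasa.gov/2.1/reprTimeGeologicPeriod.owl#Pliocene",
    "http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pliocene",
    "http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pliocene",
]


def describe(uri):
    """Split a versioned ontology URI into version, module and term."""
    parts = urlsplit(uri)
    version, module = parts.path.strip("/").split("/", 1)
    return {"version": version, "module": module, "term": parts.fragment}


def group_by_term(uris):
    """Map each case-folded term to the versions in which it appears."""
    groups = {}
    for uri in uris:
        info = describe(uri)
        groups.setdefault(info["term"].casefold(), []).append(info["version"])
    return groups


if __name__ == "__main__":
    print(group_by_term(URIS))
```

Because the version number and module name are baked into each URI, every release mints a new identifier for what looks like the same concept.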

Governance issues

Who is the expert? - Wikipedia??

Collection sub-set?

<http://registry.it.csiro.au/def/kwa/gcmd/GCMD-keywords-subset_newnames>
    a skos:Collection ;
    rdfs:label "Subset of GCMD keywords - re-based"^^xsd:string ;
    skos:member
        <http://registry.it.csiro.au/def/kwa/gcmd/ABLATION> ,
        <http://registry.it.csiro.au/def/kwa/gcmd/ABRASION> ,
        <http://registry.it.csiro.au/def/kwa/gcmd/ABLATION-ZONES-ACCUMULATION-ZONES> .

- Or -

<http://registry.it.csiro.au/def/kwa/gcmd/GCMD-keywords-subset>
    a skos:Collection ;
    rdfs:label "Subset of GCMD keywords"^^xsd:string ;
    skos:member
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/8f57f4b0-5177-4362-81e8-ced75d37d1aa> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/95fbaefd-1afe-4887-a1ba-fc338a8109bb> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/99db4dca-4d07-48fd-8ba3-393532d04aa6> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/a994a6f6-cfcd-45d2-95a4-0f8455a9454d> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/efacd4f6-59ea-4019-8265-8cc81ecc99c0> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/fd29bf77-df38-4b80-8148-8184fa41d843> ,
        <http://gcmdservices.gsfc.nasa.gov/kms/concept/f6e19e2e-555a-4d40-9833-c7513d92c813> .

More complex constraints?

OWL classes vs instances

cgi-lith-instance:carbonate_rich_mudstone
    a skos:Concept ;
    rdfs:label "carbonate-rich mudstone" ;
    skos:broader cgi-lith-instance:rock_material ;
    CGI_Lith:ConsolDegree CGI_Lith:consolidated ;
    CGI_Lith:Constituents CGI_Lith:carbonateBearing ;
    CGI_Lith:GeneticCateg CGI_Lith:sedimentary ;
    CGI_Lith:GrainSize CGI_Lith:mud_size ;
    CGI_Lith:ParticleType CGI_Lith:grain .

Summary and conclusions

[Figure: vocabulary publication pipeline]

• Source vocabulary (csv, html, txt)
• Formalized vocabulary (skos/rdf)
• Database (triple-store)
• Vocab service: LDR API, SPARQL, SISSVoc

Summary

• Term vocabularies can be formalized in RDF (SKOS, OWL) and published as linked data

• Much content available, but needs converting (‘lifting’) to semantic technologies

• Excel, RDF123, Text editor, SKOS, LDR and SISSVoc are our enablers (but people are essential)

Applications and published vocabularies

• GeoSciML vocabularies
  – http://def.seegrid.csiro.au/sissvoc/cgi201211/collection
  – http://resource.geosciml.org/classifier/ics/ischart/
• Environmental observations vocabularies
  – http://environment.data.gov.au/def/
  – http://registry.it.csiro.au/environment/def
• Bioregional assessments glossary
  – http://registry.it.csiro.au/test1/ba-glossary
• Agriculture definitions
  – http://registry.it.csiro.au/agriculture/def
• Australian Government definitions - AGIFT 2014, ANZSRC 2008 …
  – http://registry.it.csiro.au/agldwg/def
• CSIRO Keyword aggregator …
  – Coming soon

LAND AND WATER FLAGSHIP

Thank you

Environmental Informatics Infrastructure

Simon J D Cox
Research Scientist
t +61 3 9252 6342
e [email protected]
people.csiro.au/C/S/Simon-Cox

Jonathan Yu
Research Engineer
t +61 3 9252 6440
e [email protected]
people.csiro.au/C/S/Jonathan-Yu

SISSVoc UI & API for vocabulary query

Simple Knowledge Organization System (SKOS): a W3C Standard

Focus on the concept rather than the term

• Web/Linked data principle: Concept is identified by a URI
• Concept is annotated with text labels (i.e. the traditional ‘term’)
• Structured using hierarchical relations within a vocabulary
  – broader, narrower
• Matching relations between vocabularies
  – broadMatch, closeMatch, exactMatch

O&M

[Figure: UML class diagram of OM_Observation]

OM_Observation attributes: phenomenonTime, resultTime, validTime [0..1], resultQuality [0..*], parameter [0..*]

• +observedProperty [1]: GF_PropertyType
• +featureOfInterest [1]: GFI_Feature
• +procedure [1]: OM_Process
• +result: Any

ISO 19156:2011 Geographic Information – Observations and measurements – ed. S Cox

Governance

Clear roles:

• Content is determined by the experts

• Formalization may uncover inconsistencies

• History and status must be visible

• No deletions! - retirement or supersession
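The "no deletions" rule can be sketched as a register whose entries change status but are never removed, so that old identifiers keep resolving and their history stays visible. This is a minimal illustration, not the behaviour of LDR or SISSVoc: the status names ("valid", "retired", "superseded") and the class and method names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    uri: str
    label: str
    status: str = "valid"  # hypothetical status names
    superseded_by: Optional[str] = None
    history: list = field(default_factory=lambda: ["valid"])


class Register:
    """Append-only register: entries change status but are never removed."""

    def __init__(self):
        self._entries = {}

    def add(self, uri, label):
        if uri in self._entries:
            raise ValueError(f"{uri} is already registered")
        self._entries[uri] = Entry(uri, label)
        return self._entries[uri]

    def retire(self, uri):
        self._set_status(uri, "retired")

    def supersede(self, old_uri, new_uri):
        if new_uri not in self._entries:
            raise KeyError(new_uri)
        self._set_status(old_uri, "superseded")
        self._entries[old_uri].superseded_by = new_uri

    def get(self, uri):
        return self._entries[uri]

    def _set_status(self, uri, status):
        entry = self._entries[uri]
        entry.status = status
        entry.history.append(status)


if __name__ == "__main__":
    register = Register()
    register.add("ex:old", "old term")
    register.add("ex:new", "new term")
    register.supersede("ex:old", "ex:new")
    print(register.get("ex:old"))
```

The class deliberately has no delete method: a superseded entry still resolves and points to its replacement.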
