Canberra Semantic Web Meetup. Initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Tools are also being created by statistical organizations to support the publication of dimensional data conforming to the Data Cube specification, now in Last Call at W3C. The meeting will be an opportunity to hear about two Semantic Web and Linked Data initiatives for statistical data that are driven by the Australian Government. The Bureau of Meteorology and CSIRO have recently released a Linked Data version of the ACORN-SAT historical climate data at http://lab.environment.data.gov.au, and the ABS has released the Census data modelled in the Data Cube vocabulary, which is part of a challenge the ABS is organising in the context of the SemStats Workshop (http://www.datalift.org/en/event/semstats2013/challenge) at the International Semantic Web Conference (ISWC) in Sydney (http://iswc2013.semanticweb.org). Come along to hear about these two projects, the challenges encountered and the solutions developed.
Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au
Canberra Semantic Web meetup
CSIRO COMPUTATIONAL INFORMATICS
Laurent Lefort, Armin Haller
Using the Data Cube vocabulary for Publishing Environmental Linked Data on lab.environment.data.gov.au | Laurent Lefort
Outline
• ACORN-SAT Dataset
• Building the Data Cube
• Enriching ACORN-SAT Linked Data with Metadata
• Published ACORN-SAT Linked Data
The ACORN-SAT dataset
• Released by the Australian Bureau of Meteorology (23 March 2012)
• Available at http://www.bom.gov.au/climate/change/acorn-sat/
• 112 stations in total, 60 of them with data from 1910 to 2011
• Homogenised (adjusted) daily temperatures
• Tabular format (1 file per time series/station)
“Catalogue websites do not unlock the full potential of the collected data and metadata”
Richard Cyganiak
Limitations of ACORN-SAT in Tabular files
• Metadata fields are not documented
• Querying across the catalog is difficult
• Exploring the catalog through different facets (geographical/statistical/tabular) is not possible
• Bulk processing of the dataset or parts of it is not possible
• Social annotations are not possible
• Integrating the dataset with other datasets is difficult
ACORN-SAT as Linked Data
Linked Data is a shift from publishing data in human-readable HTML documents to machine-readable documents.
Linked Data Principles:
1. Use URIs as identifiers for Things: http://sws.geonames.org/2172517
2. Make them actionable → http://www.geonames.org/2172517/canberra.html
3. Return information following standards → http://sws.geonames.org/2172517/about.rdf
4. Link to other information objects: <rdfs:seeAlso rdf:resource="http://dbpedia.org/resource/Canberra"/>
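As a minimal sketch of principles 2 and 3, a client can ask for a machine-readable representation of a Thing's URI through HTTP content negotiation. The snippet only builds the request and does not send it. Whether the GeoNames server honours the Accept header is an assumption, and `rdf_request` is a hypothetical helper name.

```python
from urllib.request import Request

# URI taken from the slide; it identifies Canberra in GeoNames.
THING_URI = "http://sws.geonames.org/2172517"


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    # Assumption: the server supports content negotiation on this URI.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request(THING_URI)
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.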
ACORN-SAT as Linked Data
RDF Data Cube: a method to organise linked data in slices
• A vocabulary published by the W3C Government Linked Data (GLD) Working Group (Working Draft)
• Also the method used to publish statistics data and environmental data in Europe, e.g. for Bathing Water Quality in the UK: http://www.epimorphics.com/web/projects/bathing-water-quality
Advantages
• Allows multiple views on the same data (similar to OLAP)
• Generic approach which supports the links to domain-specific definitions
Useable:
• In any browser via Linked Data API (HTML output)
• In JavaScript via Linked Data API (JSON output)
• In R via SPARQL
RDF Data Cube 101 - Slices and observations
[Diagram: a Cube divided into Slices and Observations, with dimensions d1 to d7, measures m1, m2, … and attributes a1, a2, …]
RDF Data Cube 101 – Dataset, Slice, Observation
[Diagram: Cube and Slice. The classes qb:DataSet, qb:Slice and qb:Observation are related by the properties qb:slice, qb:subSlice, qb:observation, qb:dataSet and void:subset]
RDF Data Cube 101 – Data Structure Definitions (DSDs)
RDF Data Cube model compatible with SDMX: http://sdmx.org/wp-content/uploads/2012/11/SDMX-Guidelines-for-the-Design-of-Data-Structure-Definitions.pdf
5 basic steps
1. Define the prefixes to be used
2. Publish your schema
  • Define the dimension(s) – used to identify the observations (e.g. time, region), what the observation applies to
  • Define the measure(s) – the phenomenon being observed
  • Define the attribute(s) – unit of measure
  • Define the DSD (attach components)
3. Publish your data
  • Define the Dataset (attach DSD)
  • Define Observations – the actual data
4. Include Slices (views) on your data
  • Define SliceKey(s) – the fixed dimensions
  • Define the DSD (attach SliceKey(s))
  • Define the Dataset (attach Slices to be defined)
  • Define Slices and Observations
5. Select appropriate URIs
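Steps 2 and 3 above can be sketched as a small script that prints Turtle. This is an illustration under stated assumptions, not the ACORN-SAT schema: the `ex:` namespace, the property names (`ex:station`, `ex:date`, `ex:maxTemperature`, `ex:unit`) and the sample values are all hypothetical. Only the `qb:` terms come from the Data Cube vocabulary.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/def/"  # hypothetical namespace, not from the slides


def schema_turtle() -> str:
    """Step 2: dimensions, a measure, an attribute, and the DSD that attaches them."""
    return "\n".join([
        f"@prefix qb: <{QB}> .",
        f"@prefix ex: <{EX}> .",
        "ex:station a qb:DimensionProperty .",
        "ex:date a qb:DimensionProperty .",
        "ex:maxTemperature a qb:MeasureProperty .",
        "ex:unit a qb:AttributeProperty .",
        "ex:dsd a qb:DataStructureDefinition ;",
        "    qb:component [ qb:dimension ex:station ] ,",
        "                 [ qb:dimension ex:date ] ,",
        "                 [ qb:measure ex:maxTemperature ] ,",
        "                 [ qb:attribute ex:unit ] .",
    ])


def data_turtle(station: str, date: str, value: float) -> str:
    """Step 3: a dataset attached to the DSD, plus one observation."""
    return "\n".join([
        "ex:dataset a qb:DataSet ; qb:structure ex:dsd .",
        f"ex:obs-{station}-{date} a qb:Observation ;",
        "    qb:dataSet ex:dataset ;",
        f'    ex:station "{station}" ;',
        f'    ex:date "{date}"^^<http://www.w3.org/2001/XMLSchema#date> ;',
        f"    ex:maxTemperature {value} ;",
        '    ex:unit "Cel" .',  # hypothetical unit label
    ])


# Placeholder values, chosen only to show the shape of the output.
document = schema_turtle() + "\n" + data_turtle("000000", "2000-01-01", 20.0)
```

Slices (step 4) would follow the same pattern, with a `qb:SliceKey` listing the fixed dimensions and `qb:Slice` resources pointing at their observations through `qb:observation`.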
1. Prefixes
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX interval: <http://reference.data.gov.uk/def/intervals/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
PREFIX acorn-sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
PREFIX acorn-series: <http://lab.environment.data.gov.au/def/acorn/time-series/>
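The prefix declarations above can also be kept in one lookup table so that scripts expand compact names consistently and generate the same SPARQL header every time. A minimal sketch follows; `expand` and `sparql_header` are hypothetical helper names, and the namespace URIs are the ones listed on the slide.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "qb": "http://purl.org/linked-data/cube#",
    "interval": "http://reference.data.gov.uk/def/intervals/",
    "gn": "http://www.geonames.org/ontology#",
    "ssn": "http://purl.oclc.org/NET/ssnx/ssn#",
    "acorn-sat": "http://lab.environment.data.gov.au/def/acorn/sat/",
    "acorn-series": "http://lab.environment.data.gov.au/def/acorn/time-series/",
}


def expand(compact_name: str) -> str:
    """Turn a prefixed name such as 'qb:Observation' into a full URI."""
    prefix, sep, local = compact_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {compact_name!r}")
    return PREFIXES[prefix] + local


def sparql_header() -> str:
    """Render the table as PREFIX lines for the top of a SPARQL query."""
    return "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
```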
2. Define the schema
[Figure: schema listing in which each component is labelled as a Dimension (4), a Measure (3) or an Attribute (5)]
3. Define the Observations
4. Define the slices
[Diagram: nested slices (1) ACORN-SAT Series/System (station), (2) Year, (3) Month, then Day, down to the Observation]
Observation
• MinTemperature
• MaxTemperature
• Rainfall
• Booleans for missing data
Current Data Cube structure (and URI/API logic)
• Stations/time series
  • Year
    • Month
• All linking to observations
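The station, year and month levels listed above map naturally onto nested URIs. The sketch below assumes the path pattern visible in the slice URIs quoted in the SPARQL example of this deck (`…/slice/station/<id>/year/<yyyy>/month/<mm>`); the intermediate year-level URI is inferred from that pattern, and the function names are hypothetical.

```python
BASE = "http://lab.environment.data.gov.au/data/acorn/climate/slice"


def station_slice(station: str) -> str:
    """Level 1: all observations for one ACORN-SAT series/station."""
    return f"{BASE}/station/{station}"


def year_slice(station: str, year: int) -> str:
    """Level 2: one year within a station slice (inferred pattern)."""
    return f"{station_slice(station)}/year/{year}"


def month_slice(station: str, year: int, month: int) -> str:
    """Level 3: one month within a year slice, zero-padded to two digits."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{year_slice(station, year)}/month/{month:02d}"
```

Keeping the station identifier as a string preserves its leading zero, which an integer would drop.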
Define the DSD
5. Select appropriate URIs
(extra) Statistics at slice level
To port to DDI-RDF Discovery
Station metadata
• Data describing the deployment history
  • Available in ACORN-SAT station catalogue (pdf)
  • Not available in tabular format distribution
  • ACORN-SAT composite stations – composed of one or several BoM stations
  • BoM (Bureau of Meteorology) stations – composed of one or several stations sharing the same code
  • Textual description of significant events
• Data describing the detailed conditions of observations
  • Sensors
  • Deployment Intervals
… using Semantic Sensor Network (SSN) ontology
• SSN-XG report http://www.w3.org/2005/Incubator/ssn/XGR-ssn/
• SSN Ontology http://purl.oclc.org/NET/ssnx/ssn
SSN: deployed systems and observations
[Diagram: SSN ontology modules (Skeleton, Device, Deployment, PlatformSite, System) with the classes ssn:System, ssn:Platform, ssn:Deployment, ssn:DeploymentRelatedProcess, ssn:Device, ssn:Sensor, ssn:SensingDevice, ssn:Property and ssn:Observation, linked by onPlatform, hasSubsystem, hasDeployment, deploymentProcessPart, deployedSystem, deployedOnPlatform, attachedSystem, inDeployment, observes, observedBy and observedProperty]
Example (Darwin)
Time series – Weather stations – Sites – (Sensors)
• Darwin Post Office 014016 (1910-1942)
• Darwin Airport 014015 (1941-2007 & 2001-now): 2 sites, 1 km apart, same code used
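The Darwin example shows why deployment intervals are needed alongside station codes: one time series spans two stations, and one code covers two sites with overlapping periods. A minimal sketch of that structure follows. The class and function names are hypothetical, the years come from the slide, and treating each interval as inclusive at year granularity is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deployment:
    station: str
    code: str
    start: int
    end: Optional[int]  # None means the deployment is still running ("now")

    def active_in(self, year: int) -> bool:
        # Assumption: intervals are inclusive and tracked per year.
        return self.start <= year and (self.end is None or year <= self.end)


# The Darwin time series as listed on the slide.
DARWIN = [
    Deployment("Darwin Post Office", "014016", 1910, 1942),
    Deployment("Darwin Airport", "014015", 1941, 2007),
    Deployment("Darwin Airport", "014015", 2001, None),
]


def active(year: int) -> List[Deployment]:
    """Return every deployment contributing observations in the given year."""
    return [d for d in DARWIN if d.active_in(year)]
```

For a year such as 2005, `active` returns two deployments that share the code 014015, so the code alone cannot tell the two airport sites apart.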
Deployment phases in Darwin
Multiple Views on Data – Mashups
• Display the station locations and their average temperature readings on a map
  • http://lab.environment.data.gov.au/mashup/drilldown
• Select a date range for climate readings for a given location
  • http://lab.environment.data.gov.au/mashup
Multiple Views on Data – ELDA Linked Data API
[Screenshot: ELDA Linked Data API view, annotated with the properties ssn:hasSubSystem, ssn:hasDeployment, ssn:deploymentProcessPart and ssn:observedBy]
Multiple Views on Data – SPARQL
PREFIX cube: <http://purl.org/linked-data/cube#>
PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat/>

SELECT ?x (MAX(?max) AS ?MaxEver)
WHERE {
  <http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071> cube:subSlice ?y .
  ?y cube:subSlice ?x .
  ?x sat:month ?z .
  ?x cube:observation ?obs .
  ?obs sat:maxTemperature ?max .
  FILTER regex(str(?z), "07")
}
GROUP BY ?x
ORDER BY DESC(?MaxEver)
LIMIT 1

RESULT:
http://lab.environment.data.gov.au/data/acorn/climate/slice/station/086071/year/1975/month/07 23.3
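To run a query like the one above from a script, the SPARQL Protocol allows it to be sent as the `query` parameter of an HTTP GET. The sketch below only builds the URL and does not send it. The endpoint address is a placeholder rather than the real service, the helper names are hypothetical, and the query is written with SPARQL 1.1 aggregate syntax (`GROUP BY` with a parenthesised `AS`).

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder; not the real endpoint
SLICE_BASE = "http://lab.environment.data.gov.au/data/acorn/climate/slice/station/"

# STATION_SLICE is a marker replaced by build_query().
QUERY_TEMPLATE = """PREFIX cube: <http://purl.org/linked-data/cube#>
PREFIX sat: <http://lab.environment.data.gov.au/def/acorn/sat/>
SELECT ?x (MAX(?max) AS ?MaxEver)
WHERE {
  <STATION_SLICE> cube:subSlice ?y .
  ?y cube:subSlice ?x .
  ?x sat:month ?z .
  ?x cube:observation ?obs .
  ?obs sat:maxTemperature ?max .
  FILTER regex(str(?z), "07")
}
GROUP BY ?x
ORDER BY DESC(?MaxEver)
LIMIT 1"""


def build_query(station: str) -> str:
    """Fill in the station slice URI for the hottest-July query."""
    return QUERY_TEMPLATE.replace("STATION_SLICE", SLICE_BASE + station)


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, build_query("086071"))
```

Using a plain marker and `str.replace` avoids having to escape the braces of the SPARQL `WHERE` block, which an f-string would require.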
Wrap up
• Experimental version of ACORN-SAT data
  • Available at http://lab.environment.data.gov.au/
  • Developed for the Australian Bureau of Meteorology (BoM) by CSIRO in cooperation with the Australian Government Information Management Office (AGIMO)
  • Temperature (homogenised) plus Rainfall (not homogenised)
• First version presented at Australian GovHack Day
  • Alternative to tabular data
• Latest version uploaded to the LOD cloud
  • http://thedatahub.org/dataset/acorn-sat
• Linked data (and well-managed URIs) to build the bridges between the different agencies
  • The current linked data pilot covers one agency (BoM) and one server, but applies solutions and schemes already in place in multi-agency and multi-service-provider contexts (e.g. UK)
• Thanks to AGIMO for helping us to set up http://lab.environment.data.gov.au/
Use It! http://michaelhalls.net/planforsun/index.php
Australian Government Linked Data Working Group (AGLDWG)
• Ad-hoc group established August 2012
  – BoM, OSP, CSIRO, AGIMO, DRALGAS, NAA, GA, ABS
• Terms of reference
  – Develop technical guidelines and best practice on the use of ‘linked-data’ by AG agencies
  – Inform the development of data.gov.au as a platform for publishing Commonwealth PSI
  – Promote the benefits and encourage adoption of ‘linked-data’ for publishing Commonwealth PSI
  – Where appropriate, undertake specific activities and coordinate projects in pursuit of these objectives
• Seeking formal endorsement
Conclusions
• Approach is applicable to all climate time series
• Opportunities to link to other datasets (Australia, World)
  • Geo-features (e.g. GeoNames - done) for weather station sites, districts
  • Other climate data, e.g. regional and world climate data archives, cyclone tracks (not yet available as linked data)
  • Other environmental data (not yet available as linked data)
ISWC 2013
• The 12th International Semantic Web Conference and the 1st Australasian Semantic Web Conference, 21-25 October 2013, Sydney, Australia
  • http://iswc2013.semanticweb.org/
  • https://twitter.com/iswc2013
• First International Workshop on Semantic Statistics (SemStats 2013)
• SemStats 2013 Challenge
  • Call for Papers: http://datalift.org/en/event/semstats2013/challenge-cfp
  • Data: http://datalift.org/en/event/semstats2013/challenge
CSIRO Computational Informatics
Laurent Lefort
Ontologist
t +61 2 9123 4567
e [email protected]
csiro.au
Thank you
Images credits
• Blair Trewin: The ACORN-SAT station at Butlers Gorge in central Tasmania (surfacetemperatures.blogspot.com.au)
• Nathanael Boehm
More information
• Laurent Lefort, Josh Bobruk, Armin Haller, Kerry Taylor and Andrew Woolf. A Linked Sensor Data Cube for a 100 Year Homogenised Daily Temperature Dataset. Proc. SSN 2012.