Presentation on the Data Cube vocabulary, and its uses, given at the Semantic Technologies Business Conference in London.
Linked data hypercubes
Dave Reynolds, Epimorphics Ltd
Linked Data - great for describing “things”
data
e.g. Schools in England and Wales
data → model
• ontology development: classifications, phase of education, location, contact, reporting, class sizes, etc.
• URI scheme
• reference data to link to: admin geography, LLSC, charity ...
data → model → publish
• convert to RDF in a triple store
• entity URIs as linked data
• SPARQL endpoint
• Linked data API
data → model → publish → use
But what about ... data
Government budget analysis
local authority spend with suppliers
regional demographic trends
performance metrics
air quality measurements
energy consumption
Publishing tabular data as linked data
why?
how?
does it work?
Benefits
• data slices and values become addressable: annotate, explain, qualify values; provenance for values; trace back for derived reports
• integrate, compare, slice across datasets: common terms for dimensions and units; common identifiers for values (regions, departments ...)
• link to non-tabular data
• put the data in context
Data cube vocabulary
• collaborative development, sponsored by data.gov.uk
• simple, flexible vocabulary
• mirrors core information models from SDMX (Statistical Data and Metadata eXchange) and DDI (Data Documentation Initiative)
• extension to SCOVO vocabulary
Data cube model
A set of observations, indexed by dimensions, describing measures, interpreted according to attributes.
• dimensions: e.g. time, e.g. region
• measure(s): population = 32,567
• attributes: unit of measure = count, status = preliminary, ...
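As a rough illustration of the model above, the following Python sketch stores each observation as a record of dimension values, measure values and attributes, and indexes the set by its dimension values. The class, the helper functions, and the region and time values are assumptions made for illustration; only the population figure and the two attributes come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One cell of the cube: where it sits, what was measured, how to read it."""
    dimensions: tuple       # sorted (dimension, value) pairs locating the cell
    measures: tuple         # sorted (measure, value) pairs
    attributes: tuple = ()  # sorted (attribute, value) pairs qualifying the measures


def make_observation(dimensions, measures, attributes=None):
    """Build an Observation from plain dictionaries."""
    return Observation(
        dimensions=tuple(sorted(dimensions.items())),
        measures=tuple(sorted(measures.items())),
        attributes=tuple(sorted((attributes or {}).items())),
    )


def index_by_dimensions(observations):
    """Map each full set of dimension values to its single observation."""
    index = {}
    for obs in observations:
        if obs.dimensions in index:
            raise ValueError(f"duplicate observation for {obs.dimensions}")
        index[obs.dimensions] = obs
    return index


# Hypothetical region and time values; the measure and attributes follow the slide.
obs = make_observation(
    dimensions={"region": "example-region", "time": "2010"},
    measures={"population": 32567},
    attributes={"unit of measure": "count", "status": "preliminary"},
)
cube = index_by_dimensions([obs])
cell = cube[(("region", "example-region"), ("time", "2010"))]
print(dict(cell.measures)["population"])
```

Keying the index on the complete set of dimension values reflects the idea that dimensions identify an observation, while attributes only qualify how its measures are read.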
Data cube vocabulary: 1. Top level
DataSet: provenance and metadata; structure
[Diagram: classes qb:DataSet, qb:Slice, qb:Observation, qb:SliceKey and qb:DataStructureDefinition, connected by the properties qb:structure, qb:component, qb:slice, qb:subSlice, qb:observation, qb:dataset, qb:sliceKey and qb:sliceStructure; each observation carries dimension values, measure value(s) and attribute values]
Observation: measured values at dimensions, with attributes; direct link to DataSet
Slice: optional grouping by fixing dimensions; guide to presentation; allows for abbreviated data
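The idea of a slice as "grouping by fixing dimensions" can be sketched in a few lines of Python. This is only an illustration over plain dictionaries, not how the vocabulary itself is processed: the function name and the sample rows are assumptions, and apart from the 76.7 figure the values are made-up placeholders.

```python
def slice_cube(observations, fixed):
    """Select observations matching every fixed dimension value.

    The fixed dimensions are dropped from each result, mirroring how a
    slice lets the remaining data be abbreviated.
    """
    selected = []
    for obs in observations:
        if all(obs.get(dim) == value for dim, value in fixed.items()):
            selected.append({k: v for k, v in obs.items() if k not in fixed})
    return selected


# Placeholder rows: only the first lifeExpectancy value appears in the slides.
observations = [
    {"refArea": "newport", "refPeriod": "2004", "sex": "M", "lifeExpectancy": 76.7},
    {"refArea": "newport", "refPeriod": "2004", "sex": "F", "lifeExpectancy": 80.0},
    {"refArea": "cardiff", "refPeriod": "2004", "sex": "M", "lifeExpectancy": 78.0},
]

male_2004 = slice_cube(observations, {"refPeriod": "2004", "sex": "M"})
for row in male_2004:
    print(row)
```

Fixing period and sex leaves a one-dimensional series over areas, which is the kind of grouping a presentation tool could render as a single table column or chart line.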
Data cube vocabulary: 2. Data Structure Definition
• explicit definition of cube structure, inline in the data
• enables validation, visualization, discovery, abbreviation
• still open world
[Diagram: qb:DataSet links via qb:structure to qb:DataStructureDefinition, which links via qb:component to qb:ComponentSpecification; a component specification names its component with qb:dimension, qb:measure or qb:attribute, and may carry qb:componentRequired, qb:componentAttachment and qb:order]
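To make the validation point concrete, here is a minimal Python sketch that checks an observation against a declared structure. The dictionary layout, the `required` flag and the function name are assumptions for illustration; real validation would work over the RDF itself. Extra properties are deliberately ignored, in keeping with the open-world note above.

```python
# Hypothetical structure definition: component name -> kind and whether required.
DSD = {
    "refArea": {"kind": "dimension", "required": True},
    "refPeriod": {"kind": "dimension", "required": True},
    "sex": {"kind": "dimension", "required": True},
    "lifeExpectancy": {"kind": "measure", "required": True},
    "unitMeasure": {"kind": "attribute", "required": False},
}


def validate(observation, dsd):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for name, spec in dsd.items():
        if spec["required"] and name not in observation:
            problems.append(f"missing {spec['kind']}: {name}")
    # Open world: properties the structure does not mention are not errors.
    return problems


complete = {
    "refArea": "newport",
    "refPeriod": "2004",
    "sex": "M",
    "lifeExpectancy": 76.7,
    "note": "extra property, not declared in the structure",
}
incomplete = {"refArea": "newport", "lifeExpectancy": 76.7}

print(validate(complete, DSD))
print(validate(incomplete, DSD))
```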
Data cube vocabulary: 3. Coding
• values numeric or symbolic
• explicit link to coding scheme
• allows for hierarchical codes
• SDMX coding schemes and role markers available
[Diagram: classes qb:ComponentProperty, qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty, skos:ConceptScheme, skos:Concept, sdmx:CodeList, sdmx:Concept and sdmx:ConceptRole; properties qb:codeList, qb:concept and qb:measureType; concept roles sdmx:FrequencyRole, sdmx:CountRole, sdmx:EntityRole, sdmx:TimeRole, sdmx:MeasureTypeRole, sdmx:NonObsTimeRole, sdmx:IdentityRole and sdmx:PrimaryMeasureRole]
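A hierarchical code list can be pictured as codes that each point to a broader code. The Python sketch below is an assumption-laden illustration of that idea only: the codes, the parent links and the function names are hypothetical, and an actual code list would be expressed as a concept scheme in RDF rather than a dictionary.

```python
# Hypothetical code list: code -> broader (parent) code, None for a top-level code.
CODE_LIST = {
    "uk": None,
    "wales": "uk",
    "newport": "wales",
    "cardiff": "wales",
}


def is_valid_code(code, code_list):
    """A dimension value is acceptable only if it is drawn from the code list."""
    return code in code_list


def broader_codes(code, code_list):
    """Walk up the hierarchy from a code to the top (assumes no cycles)."""
    chain = []
    parent = code_list[code]
    while parent is not None:
        chain.append(parent)
        parent = code_list[parent]
    return chain


print(is_valid_code("newport", CODE_LIST))
print(broader_codes("newport", CODE_LIST))
```

The explicit link from a dimension to its code list is what allows a check like `is_valid_code`, and the hierarchy is what allows values to be rolled up to a broader level.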
Example

eg:dsd-le a qb:DataStructureDefinition;
    # The dimensions
    qb:component [qb:dimension eg:refArea;         qb:order 1];
    qb:component [qb:dimension eg:refPeriod;       qb:order 2];
    qb:component [qb:dimension sdmx-dimension:sex; qb:order 3];
    # The measure(s)
    qb:component [qb:measure eg:lifeExpectancy];
    # The attributes
    qb:component [qb:attribute sdmx-attribute:unitMeasure;
                  qb:componentAttachment qb:DataSet;] .

eg:dataset-le1 a qb:DataSet;
    rdfs:label "Life expectancy"@en;
    rdfs:comment "Life expectancy in Welsh Unitary authorities"@en;
    qb:structure eg:dsd-le ;
    sdmx-attribute:unitMeasure <http://dbpedia.org/resource/Year> .

eg:o1 a qb:Observation;
    qb:dataset eg:dataset-le1 ;
    eg:refArea admingeo:newport_00pr ;
    eg:refPeriod <http://reference.data.gov.uk/id/year/2004> ;
    sdmx-dimension:sex sdmx-code:sex-M ;
    eg:lifeExpectancy 76.7 .
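One way to picture the step from a table row to an observation like `eg:o1` is a small serializer. The Python sketch below is an assumption, not the tooling used for the slides: the function name is hypothetical, values must already be valid Turtle terms (prefixed names, `<IRI>`s or literals), and no escaping or prefix declarations are handled.

```python
def observation_to_turtle(obs_id, dataset, dimension_values, measure_values):
    """Render one observation as Turtle text.

    All arguments are expected to be ready-made Turtle terms; this sketch
    does no escaping and does not emit @prefix declarations.
    """
    lines = [f"{obs_id} a qb:Observation;", f"    qb:dataset {dataset};"]
    for prop, value in {**dimension_values, **measure_values}.items():
        lines.append(f"    {prop} {value};")
    # Close the statement: swap the final ';' for ' .'
    lines[-1] = lines[-1][:-1] + " ."
    return "\n".join(lines)


turtle = observation_to_turtle(
    "eg:o1",
    "eg:dataset-le1",
    {
        "eg:refArea": "admingeo:newport_00pr",
        "eg:refPeriod": "<http://reference.data.gov.uk/id/year/2004>",
        "sdmx-dimension:sex": "sdmx-code:sex-M",
    },
    {"eg:lifeExpectancy": "76.7"},
)
print(turtle)
```

For anything beyond a sketch, an RDF library would be a safer choice than string formatting, since it takes care of escaping and namespace handling.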
Case study: Local government payments
UK local authorities publish data on all spending above £500
linked data version to enable comparison
data
data → model
cube structure
• measure: amount net of recoverable VAT
• attributes: currency
• dimensions: time, payer, payee, expenditure code item
package as an ontology
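With payments modelled as a cube, comparison across authorities amounts to summing the measure along a chosen dimension. The following Python sketch shows that roll-up on made-up rows; the payer and payee names, the amounts and the function name are all hypothetical, and only the choice of dimensions and measure follows the cube structure above.

```python
from collections import defaultdict
from decimal import Decimal


def roll_up(payments, dimension, measure="amount"):
    """Sum the measure over all payments sharing a value of one dimension."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for payment in payments:
        totals[payment[dimension]] += payment[measure]
    return dict(totals)


# Made-up payments; amounts are net of recoverable VAT, in a single currency.
payments = [
    {"payer": "council-a", "payee": "supplier-x", "time": "2011-04", "amount": Decimal("750.00")},
    {"payer": "council-a", "payee": "supplier-y", "time": "2011-04", "amount": Decimal("1200.50")},
    {"payer": "council-b", "payee": "supplier-x", "time": "2011-05", "amount": Decimal("999.50")},
]

by_payee = roll_up(payments, "payee")
by_payer = roll_up(payments, "payer")
print(by_payee)
print(by_payer)
```

Summing like this is only meaningful when the currency attribute is the same for every row, which is one reason the cube records it explicitly.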
data → model → publish
• LD API
• visualizations
• API structure mirrors cube dimensional structure
data → model → publish → use
Case study: Environmental monitoring
data
Environment Agency bathing water quality monitoring
• samples
• assay
• compliance assessment
data → model
• measures: total coliform count, enterovirus count, ...; sample classification
• dimensions: sampling point, sampling week, sampling year
• attributes: abnormal weather
data → model → publish
• LD API
• visualizations
• API structure mirrors cube dimensional structure
data → model → publish → use
Data Cube: Summary
• foundational approach to publishing multi-dimensional data as linked data
• enables addressing: annotate, explain, provenance, context
• enables integration: slice, dice and compare across sets; puts data in context
• explicit declarative structure => validation, discovery, automation: web APIs, visualizations, exploration tools
Acknowledgements
• John Sheridan (The National Archives): for sponsoring the development of data cube
• Richard Cyganiak, Jeni Tennison: co-developers of the data cube vocabulary
• Paul Davidson: instigator of the Payments ontology
• Stuart Williams, Ian Dickinson: developers of the bathing water use case
Photos: dullhunk @ flickr, Martin Pettitt @ flickr, kikasso @ flickr, Tax_Rebate @ flickr