Towards Easy Matching Between Statistical Linked Data:
Dimension Patterns
Hideto Sato and Wen Wen
2013/10/22 1
First International Workshop on Semantic Statistics (SemStats 2013)
22 October 2013, Sydney
Introduction
• For matching statistical data from different sources, upper concepts and schema-level links are important.
• Three problems:
  (1) Only a small number of upper concepts are available.
  (2) Certain patterns of dimension description prevent some schema-level links.
  (3) Usage of external codes is hard to find at the schema level.
• This paper focuses on (2) and (3), and proposes patterns of dimension description to improve them.
Trial Matching
• Italian Immigration Statistics ⇒ the numbers of immigrants to Italy by birth country by year
• World Bank Statistics ⇒ the total population by country by year
• Integrated Statistics ⇒ percentage of immigrants to Italy by country by year
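As a minimal sketch of the integration step, the snippet below joins two observation tables on (country, year) and derives a percentage. It assumes that "percentage" means immigrants from a country divided by that country's total population; the function name, the country codes, and all figures are hypothetical placeholders, not values from either source.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (country code, year)


def integrate(immigrants: Dict[Key, int], population: Dict[Key, int]) -> Dict[Key, float]:
    """Join the two tables on their shared (country, year) keys.

    Assumed definition: percentage = 100 * immigrants / total population.
    Keys present in only one table are dropped, as in an inner join.
    """
    result: Dict[Key, float] = {}
    for key in immigrants.keys() & population.keys():
        total = population[key]
        if total > 0:  # skip rows that would divide by zero
            result[key] = 100.0 * immigrants[key] / total
    return result


# Hypothetical toy figures, for illustration only.
immigrants_to_italy = {("AL", 2010): 500, ("AL", 2011): 600, ("XX", 2010): 10}
total_population = {("AL", 2010): 100_000, ("AL", 2011): 120_000}

percentages = integrate(immigrants_to_italy, total_population)
```

The join only works if both sources identify countries and years with comparable codes, which is the matching problem the rest of the slides address.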
[Figure: RDF graph of the two datasets used in the trial matching, Italian Immigration Statistics and World Bank Statistics, arranged in five layers: DataSet, DataStructureDefinition, DimensionProperty, Code Class, Code.
Italian Immigration Statistics resources: istat:dataset-DCIS_POPSTRCIT (a qb:DataSet), istat:dsd-DCIS_POPSTRCIT, istat:dimension-paesi, istat:code-range-paesi, istat:code-paesi, istat:code-paesi-al.
World Bank Statistics resources: dataset:world-development-indicators (a qb:DataSet), d-indicators:structure, classification:country, classification:country/AL.
Other resources shown: sdmx-dimension:refArea or sdmx-dimension:visArea, country-dimension, sdmx-code:Area, country-code-class, http://sws.geonames.org/ontology#Feature/, http://sws.geonames.org/783754/.
Edge labels: rdf:type, qb:structure, qb:component, qb:dimension, qb:codeList, skos:hasTopConcept, skos:exactMatch, owl:sameAs, rdfs:range, rdfs:subPropertyOf, rdfs:subClassOf.]
(1) What role does the dimension play?
• place of residence • place of birth
(2) What type of code does the dimension use?
• Countries • Domestic Administrative Areas • River Basins, and so on.
(3) What common codes are available?
• Geonames • DBPedia
preferably at the schema level
Matching Data from Different Sources
The following questions are important for each dimension. As for an area dimension:
For Dimension Properties: What role does the dimension play?
• Place of Birth
• Place of Residence
For Code Class (Range of Dimension): What type of code does the dimension use?
• Countries
• Domestic Administrative Areas
• River Basins
For Code Values: What common codes are available?
• Geonames
• DBPedia
These questions relate to two topics: Upper Concepts and Schema-Level Description.
QB and Upper Concepts
QB: The RDF Data Cube Vocabulary
QB provides a bridge to upper concepts by referring to the SDMX-RDF vocabulary.
Upper Concepts and SDMX-RDF
Upper concept                      Upper resource in SDMX-RDF
Dimension Property
  Place of Birth                   sdmx-dimension:visArea
  Place of Residence               sdmx-dimension:refArea
Code Class (Range of Dimension)
  Area                             sdmx-code:Area
  Country                          (not defined)
  Domestic Area                    (not defined)
  River Basin                      (not defined)
(sdmx-dimension:visArea has been removed in the current version of SDMX-RDF.)
Dimension Description in QB
[Figure: dimension description in QB, with resources grouped as Local and Upper.
Local: eg:refArea (dimension property), eg:UnitaryAuthority (code class), eg:areaCodeList (code list), eg:cardiff_00pt (code).
Upper: sdmx-dimension:refArea (abstract dimension property), sdmx-code:Area (abstract code class).
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:subPropertyOf, rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, rdfs:subClassOf.]
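The dimension description in QB can be written down as a handful of triples. The following is a minimal sketch in plain Python, without an RDF library. The resource names come from the slide, but the edge directions follow the usual reading of QB and RDFS and are an assumption where the slide text does not show them. The DSD name eg:dsd and the helper functions are hypothetical, and the qb:component step is omitted, as on the slide.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
TRIPLES = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),  # eg:dsd is a hypothetical DSD name
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:refArea", "qb:codeList", "eg:areaCodeList"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:areaCodeList", "skos:hasTopConcept", "eg:cardiff_00pt"),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe_dimension(triples, dimension):
    """Collect the schema-level links of one dimension property."""
    code_classes = objects(triples, dimension, "rdfs:range")
    upper_classes = sorted(
        {u for c in code_classes for u in objects(triples, c, "rdfs:subClassOf")}
    )
    return {
        "upper_properties": objects(triples, dimension, "rdfs:subPropertyOf"),
        "code_classes": code_classes,
        "upper_code_classes": upper_classes,
        "code_lists": objects(triples, dimension, "qb:codeList"),
    }


summary = describe_dimension(TRIPLES, "eg:refArea")
```

Every answer in the summary is obtained from schema-level triples alone, without inspecting individual observations.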
Anti-Patterns
• Two anti-patterns prevent describing schema-level links properly.
  – Direct use of an abstract upper resource
  – Direct use of an external code class
Anti-Pattern: Direct Use of an Upper Resource
[Figure: resources grouped as Local and Upper.
Local: eg:UnitaryAuthority (code class), eg:areaCodeList (code list), eg:cardiff_00pt (code).
Upper: sdmx-dimension:refArea (abstract dimension property), sdmx-code:Area (abstract code class).
No local dimension property appears. One link is marked "?".
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, rdfs:subClassOf.]
The Pattern for Using a Local Code Class
[Figure: resources grouped as Local and Upper.
Local: eg:refArea (dimension property), eg:UnitaryAuthority (code class), eg:areaCodeList (code list), eg:cardiff_00pt (code).
Upper: sdmx-dimension:refArea (abstract dimension property), sdmx-code:Area (abstract code class).
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:subPropertyOf, rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, rdfs:subClassOf.]
Anti-Pattern: Direct Use of an External Code Class
[Figure: resources grouped as Local, Upper, and External.
Local: eg:refArea (dimension property), eg:areaCodeList (code list).
Upper: sdmx-dimension:refArea (abstract dimension property), sdmx-code:Area (abstract code class).
External: <http://www.geonames.org/ontology#Feature> (code class), <http://sws.geonames.org/2653822/> (code).
No local code class appears. One link is marked "?".
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:subPropertyOf, rdfs:range, qb:codeList, qb:hierarchyRoot, rdf:type, rdfs:subClassOf.]
The Pattern for Using an External Code Class
[Figure: resources grouped as Local, Upper, and External.
Local: eg:refArea (dimension property), eg:UnitaryAuthority (code class adapter), eg:areaCodeList (code list).
Upper: sdmx-dimension:refArea (abstract dimension property), sdmx-code:Area (abstract code class).
External: <http://www.geonames.org/ontology#Feature> (code class), <http://sws.geonames.org/2653822/> (code).
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:subPropertyOf, rdfs:range, qb:codeList, qb:hierarchyRoot, rdf:type, rdfs:subClassOf, owl:equivalentClass.]
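The two anti-patterns can be checked mechanically over schema-level triples. The sketch below is one possible reading, not the authors' procedure. It assumes that resources can be sorted into local, upper, and external by prefix, and that a Data Structure Definition points at its dimensions with qb:dimension. The prefix lists, the function names, and the DSD name eg:dsd are hypothetical.

```python
UPPER_PREFIXES = ("sdmx-dimension:", "sdmx-code:")  # assumed upper namespaces
LOCAL_PREFIX = "eg:"  # assumed local namespace

GEONAMES_FEATURE = "http://www.geonames.org/ontology#Feature"


def classify(resource):
    """Sort a resource into 'upper', 'local', or 'external' by its prefix."""
    if resource.startswith(UPPER_PREFIXES):
        return "upper"
    if resource.startswith(LOCAL_PREFIX):
        return "local"
    return "external"


def find_anti_patterns(triples):
    """Return (dimension, finding) pairs for the two anti-patterns."""
    findings = []
    for dim in sorted({o for _, p, o in triples if p == "qb:dimension"}):
        # Anti-pattern 1: the DSD uses an upper dimension property directly.
        if classify(dim) == "upper":
            findings.append((dim, "direct use of an upper resource"))
        # Anti-pattern 2: the dimension's range is an external code class.
        for rng in sorted(o for s, p, o in triples if s == dim and p == "rdfs:range"):
            if classify(rng) == "external":
                findings.append((dim, "direct use of an external code class"))
    return findings


upper_direct = {("eg:dsd", "qb:dimension", "sdmx-dimension:refArea")}
external_direct = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", GEONAMES_FEATURE),
}
with_adapter = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:UnitaryAuthority", "owl:equivalentClass", GEONAMES_FEATURE),
}
```

The adapter variant passes because the local class carries both the rdfs:subClassOf link to the upper class and the owl:equivalentClass link to the external class, so neither link has to be asserted on a resource the publisher does not control.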
Alternate Code Class
When using both local and external code classes, it is difficult to find whether an external code class is employed or not.
We need a schema-level description for an alternate code class.
Using Local and External Code Classes
[Figure: resources grouped as Local and External.
Local: eg:refArea (dimension property), eg:UnitaryAuthority (code class), eg:areaCodeList (code list), eg:cardiff_00pt (code).
External: <http://www.geonames.org/ontology#Feature> (code class), <http://sws.geonames.org/2653822/> (code).
One link is marked "?".
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, skos:exactMatch | owl:sameAs.]
Proposal of an Additional Link (ext:altClass)
[Figure: resources grouped as Local and External.
Local: eg:refArea (dimension property), eg:UnitaryAuthority (code class), eg:areaCodeList (code list), eg:cardiff_00pt (code).
External: <http://www.geonames.org/ontology#Feature> (code class), <http://sws.geonames.org/2653822/> (code).
Edge labels: qb:dimension (from the Data Structure Definition), rdfs:range, qb:codeList, skos:hasTopConcept | qb:hierarchyRoot, rdf:type, skos:exactMatch | owl:sameAs, ext:altClass.]
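With an ext:altClass link in place, a consumer could discover the external code class from the schema alone. The sketch below is an illustration only. The extracted slide text does not show which resource carries the link, so the sketch assumes it connects the local code class (the rdfs:range of the dimension) to the external code class. The function name is hypothetical.

```python
GEONAMES_FEATURE = "http://www.geonames.org/ontology#Feature"

SCHEMA = {
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:refArea", "qb:codeList", "eg:areaCodeList"),
    # Assumed placement of the proposed link: local code class -> external code class.
    ("eg:UnitaryAuthority", "ext:altClass", GEONAMES_FEATURE),
}


def alternate_code_classes(triples, dimension):
    """Follow rdfs:range from the dimension, then ext:altClass from each range."""
    ranges = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    return sorted(o for s, p, o in triples if s in ranges and p == "ext:altClass")


alternates = alternate_code_classes(SCHEMA, "eg:refArea")
```

Without the link, the same information is only available by scanning individual codes for skos:exactMatch or owl:sameAs statements, which is the difficulty described above.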
From Our Survey
                                        Area Dimension   Time Dimension
Direct Use of an Upper Resource         3/12             3/12
Direct Use of an External Code Class    2/12             8/12
Use of Alternate Code Classes           10/12            1/12
The counts are DSDs (Data Structure Definitions) found in the endpoints listed at http://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.
Conclusion
• We introduced dimension patterns for describing schema-level links, including references to upper resources and alternate class links.
• These will bring out QB's descriptive power to its full extent.
• However, only a few upper resources are available now. Therefore, the part of the patterns concerning upper concepts is preparatory for the future.
• We think that enriching upper resources suitable for statistical data is an urgent task.