Data Cube Vocabulary Workshop, 26th May 2015, Luxembourg. Developing a Browser for Linked Data Cubes. Evangelos Kalampokis

A Browser for Linked Data Cubes

Page 1: A Browser for Linked Data Cubes

Data Cube Vocabulary Workshop
26th May 2015, Luxembourg

Developing a Browser for Linked Data Cubes
Evangelos Kalampokis

Page 2: A Browser for Linked Data Cubes

The OpenCube Browser

The OpenCube Browser enables users to explore an RDF Data Cube by presenting its values in a table.
It is currently based on the Information Workbench open source platform.
It requires the URI of a cube as input.

Eurostat Workshop, 26th May 2015

Page 3: A Browser for Linked Data Cubes

The OpenCube Browser

It presents a two-dimensional slice of an RDF cube in a table.
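A minimal sketch of what presenting a two-dimensional slice involves, assuming the observations have already been fetched into plain Python dictionaries. The data, the dimension names and the function `slice_2d` are hypothetical and are not taken from the OpenCube Browser itself.

```python
from collections import defaultdict

# Hypothetical observations: dimension values plus a single measure "value".
observations = [
    {"time": "2006", "geo": "BE", "sex": "F", "value": 10},
    {"time": "2006", "geo": "BE", "sex": "M", "value": 12},
    {"time": "2007", "geo": "BE", "sex": "F", "value": 11},
    {"time": "2006", "geo": "LU", "sex": "F", "value": 3},
    {"time": "2007", "geo": "LU", "sex": "F", "value": 4},
]

def slice_2d(obs, row_dim, col_dim, fixed, measure="value"):
    """Return {row_value: {col_value: measure}} for observations matching `fixed`."""
    table = defaultdict(dict)
    for o in obs:
        # Keep only observations whose non-displayed dimensions have the fixed values.
        if all(o[d] == v for d, v in fixed.items()):
            table[o[row_dim]][o[col_dim]] = o[measure]
    return {row: dict(cols) for row, cols in table.items()}

# Rows = geo, columns = time, with the remaining dimension fixed to sex = "F".
table = slice_2d(observations, row_dim="geo", col_dim="time", fixed={"sex": "F"})
```

Swapping `row_dim` and `col_dim`, or changing the entries of `fixed`, corresponds to changing the axes and the fixed dimension values in the table.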

Page 4: A Browser for Linked Data Cubes

The OpenCube Browser

The user can change the two axes of the table (for cubes with more than two dimensions).

Page 5: A Browser for Linked Data Cubes

The OpenCube Browser

The user can change the fixed values of the dimensions that are not included in the table.

Page 6: A Browser for Linked Data Cubes

The OpenCube Browser

Change the language of the values

Page 7: A Browser for Linked Data Cubes

Evaluation of the OpenCube Browser

Feedback was received from employees of the Flemish Government.
  It should be understood in the context of a department considering alternatives to an existing proprietary solution.
Overall, users were satisfied.
Main criticisms by some users:
  Usefulness: “We don’t see added value compared to other tools”
  Ease of use: the user interface is not clear to an average user
  Performance (response time) should be better

Page 8: A Browser for Linked Data Cubes

OpenCube OLAP Browser

The OpenCube OLAP Browser enhances the previous version.
The OpenCube OLAP Browser offers:
  A new, clear and user-friendly interface (similar to an OLAP browser)
  Browsing of multiple integrated RDF cubes

http://83.212.122.81:8888/resource/OpenCubeOLAPBrowser

Page 9: A Browser for Linked Data Cubes

Address ease-of-use related comments

We start with an empty canvas.
The user can add dimensions and measures.

Page 10: A Browser for Linked Data Cubes

Address usefulness related comments

Linked Data as a competitive advantage over existing tools.
Enabling the integration of RDF data cubes across the Web.
One can start with an initial RDF cube, identify compatible cubes from other sources and browse the new expanded cube.

Page 11: A Browser for Linked Data Cubes

The vision of exploiting Expanded Linked Data Cubes

The OpenCube OLAP Browser is a proof of concept of this vision.

Page 12: A Browser for Linked Data Cubes

LATC’s Eurostat Example

Main economic variables (2006)
Dimensions:
  timePeriod (2006)
  Geopolitical entity
  Economical indicator for structural business statistics
  Classification of economic activities
Measures: ObsValue

Main economic variables (2007)
Dimensions:
  timePeriod (2007)
  Geopolitical entity
  Economical indicator for structural business statistics
  Classification of economic activities
Measures: ObsValue

Page 13: A Browser for Linked Data Cubes

Emphasizing Cube Integration

Discover linked data cubes that are compatible to join.

Establish typed links between cubes that are compatible to join.

Create expanded cubes by increasing the size of one of the sets that define a cube.

Page 14: A Browser for Linked Data Cubes

Tools and Functionalities

Some functionalities of the OpenCube OLAP Browser require the output of functionalities performed by other tools of the OpenCube Toolkit.

OpenCube OLAP Browser
  B1: Present 2D slice of a cube
  B2: Add/remove Measures
  B3: Support multiple languages
  B4: Add/remove Dimensions
  B5: Roll-up/drill-down
  B6: Integrated view of cubes
Aggregator
  A1: Compute aggregations across dimension
  A2: Compute aggregations across hierarchy
Compatibility Explorer
  CE1: Identify compatible cubes

(Diagram: "requires" arrows link OpenCube OLAP Browser functionalities to the Aggregator and Compatibility Explorer functionalities.)

Page 15: A Browser for Linked Data Cubes

B1: Present a two-dimensional slice of a cube

Page 16: A Browser for Linked Data Cubes

B2: Add/remove measures

Adding one more measure to be presented

Page 17: A Browser for Linked Data Cubes

B3: Support multiple languages

Change the language of the available data

Page 18: A Browser for Linked Data Cubes

B4: Add/remove dimensions

Adding one more dimension to be presented

Page 19: A Browser for Linked Data Cubes

A1: Compute aggregations across dimension

Functionality B4 requires the use of the OpenCube Aggregator tool in order to pre-compute aggregations across all dimensions.

The Aggregator creates 2^n - 1 sub-cubes from a cube of n dimensions. We define this set of cubes as an Aggregation Set.

Example for a cube with the dimensions Time, Geo and Sex:
  Three dimensions: Time-Geo-Sex
  Two dimensions: Time-Geo, Time-Sex, Geo-Sex
  One dimension: Time, Geo, Sex
  No dimensions: Total
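The count of sub-cubes can be checked with a short sketch that enumerates every proper subset of the dimensions and aggregates the measure for each. The use of `sum` as aggregation function, the sample data and the helper names are assumptions made for illustration; the Aggregator's actual implementation is not described in the slides.

```python
from itertools import combinations

def aggregation_set(dimensions):
    """All proper subsets of the dimensions, from n-1 dimensions down to none."""
    n = len(dimensions)
    return [frozenset(c)
            for k in range(n - 1, -1, -1)
            for c in combinations(dimensions, k)]

def aggregate(obs, keep, measure="value"):
    """Sum the measure over every dimension that is not listed in `keep`."""
    keep = sorted(keep)
    totals = {}
    for o in obs:
        key = tuple(o[d] for d in keep)
        totals[key] = totals.get(key, 0) + o[measure]
    return totals

# Hypothetical three-dimensional cube.
obs = [
    {"time": "2006", "geo": "BE", "sex": "F", "value": 10},
    {"time": "2006", "geo": "BE", "sex": "M", "value": 12},
    {"time": "2007", "geo": "BE", "sex": "F", "value": 11},
]

subsets = aggregation_set(["time", "geo", "sex"])   # 3 + 3 + 1 = 7 sub-cubes
sub_cubes = {s: aggregate(obs, s) for s in subsets}
```

For three dimensions the sketch yields seven sub-cubes, which together with the original cube form the eight nodes of the lattice above.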

Page 20: A Browser for Linked Data Cubes

B4: Add/remove dimension

If a user adds or removes a dimension, the OpenCube OLAP Browser picks up and presents a new cube from the Aggregation Set based on the selected dimensions.

To improve performance, we establish typed links between the pre-computed cubes and the Aggregation Set.
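Picking a cube from the Aggregation Set can be pictured as a lookup keyed by the selected dimensions. The sketch below assumes an in-memory index; the cube URIs under `example.org` and the function `cube_for` are hypothetical, and in the Browser the equivalent information would come from the typed links in the RDF data.

```python
# Hypothetical index: set of selected dimensions -> URI of the pre-computed cube.
aggregation_index = {
    frozenset({"time", "geo", "sex"}): "http://example.org/cube/time-geo-sex",
    frozenset({"time", "geo"}): "http://example.org/cube/time-geo",
    frozenset({"time"}): "http://example.org/cube/time",
    frozenset(): "http://example.org/cube/total",
}

def cube_for(selected_dimensions):
    """Return the pre-computed cube whose dimensions match the selection exactly."""
    return aggregation_index[frozenset(selected_dimensions)]

selected = {"time", "geo", "sex"}
selected.discard("sex")        # the user removes the Sex dimension
current = cube_for(selected)   # no aggregation is computed at browsing time
```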

Page 21: A Browser for Linked Data Cubes

Challenges related to the Aggregator

Selecting the aggregation function
  This can be done based on the unit of measure
Meaningless aggregations
  User intervention is probably required
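One way to read the first challenge is as a lookup from unit of measure to aggregation function, with a fallback to the user when no sensible function is known. The mapping below is purely hypothetical and only illustrates the idea.

```python
from statistics import mean

# Hypothetical mapping from unit of measure to an aggregation function.
AGGREGATION_BY_UNIT = {
    "persons": sum,    # counts can be added up
    "euro": sum,
    "percent": mean,   # an unweighted mean is itself only an approximation
}

def choose_aggregation(unit):
    """Return an aggregation function for the unit, or None to ask the user."""
    return AGGREGATION_BY_UNIT.get(unit)

agg = choose_aggregation("persons")
total = agg([10, 12, 11]) if agg else None
unknown = choose_aggregation("index-2010=100")   # no rule: user decides
```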

Page 22: A Browser for Linked Data Cubes

B5: Roll-up/Drill-down

This functionality is under development.
It requires pre-computing aggregations across a hierarchy using the OpenCube Aggregator tool.

Page 23: A Browser for Linked Data Cubes

A2: Compute aggregations across hierarchy

It enriches an existing cube with new observations by using a hierarchy.

(Diagram: a cube with the dimensions Time, Geo and Sex whose Geo values are city1, city2, city3 and city4, plus a hierarchy of country1, region1, region2 and city1 to city4, gives a cube whose Geo dimension holds city1, city2, city3, city4, region1, region2 and country1.)
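The enrichment can be sketched as follows: every city-level observation contributes to each of its ancestors in the hierarchy, and the resulting totals are added as new observations. Which city belongs to which region is not stated in the slide, so the `parent` mapping is an assumption, as are the use of `sum` and the sample values.

```python
# Assumed hierarchy: the assignment of cities to regions is not given in the slide.
parent = {
    "city1": "region1", "city2": "region1",
    "city3": "region2", "city4": "region2",
    "region1": "country1", "region2": "country1",
}

def ancestors(value):
    """Yield every ancestor of a value by following the parent links upwards."""
    while value in parent:
        value = parent[value]
        yield value

def enrich(obs, dim="geo", measure="value"):
    """Return the original observations plus summed ones for each ancestor."""
    totals = {}
    for o in obs:
        # The other dimension values identify which observations belong together.
        others = tuple(sorted((k, v) for k, v in o.items() if k not in (dim, measure)))
        for a in ancestors(o[dim]):
            totals[(a, others)] = totals.get((a, others), 0) + o[measure]
    new = [{dim: a, **dict(others), measure: v} for (a, others), v in totals.items()]
    return obs + new

city_obs = [
    {"time": "2006", "geo": "city1", "value": 1},
    {"time": "2006", "geo": "city2", "value": 2},
    {"time": "2006", "geo": "city3", "value": 3},
    {"time": "2006", "geo": "city4", "value": 4},
]
enriched = enrich(city_obs)
by_geo = {o["geo"]: o["value"] for o in enriched}
```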

Page 24: A Browser for Linked Data Cubes

B6: Integrated view of multiple cubes (1/2)

The user selects a cube and an operation:
  Add new measure
  Add new value to dimension
The tool presents all the available compatible cubes.
The user selects one of the cubes.

Page 25: A Browser for Linked Data Cubes

B6: Integrated view of multiple cubes (2/2)

The OpenCube OLAP Browser presents an integrated view of the two RDF data cubes.

(Figure annotation: added values)

Page 26: A Browser for Linked Data Cubes

CE1: Identify compatible cubes

The OpenCube Compatibility Explorer pre-identifies cubes that are compatible to merge and establishes typed links between them.
Two types of compatibility:
  "Add new measure" compatible
  "Add new value to dimension" compatible

Page 27: A Browser for Linked Data Cubes

Challenges related to compatibility (1/2)

Identification of the same dimension
  If the dimensions use code lists, the code list URI is used to determine equality.
  If no code lists exist, the dimension URI is used to determine equality.
Identification of equal measures
  Two measures are considered equal if they have the same URI.
  The measure obsValue does not explain what is actually measured in the cube, so equality cannot be determined.
Identify and make available the key reference datasets that connect different statistical datasets, e.g. concerning geography, physical assets and areas of government policy.
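The two equality rules can be written down as a small sketch. The cube descriptions are hypothetical dictionaries under `example.org`, and the helper names are not taken from the Compatibility Explorer; only the rules themselves (code list URI first, dimension URI otherwise, measure URI for measures) come from the slide.

```python
def dimension_key(dim):
    """Identify a dimension by its code list URI if present, else by its own URI."""
    return dim.get("codelist") or dim["uri"]

def same_dimensions(cube_a, cube_b):
    """True if both cubes have the same set of dimension keys."""
    keys_a = {dimension_key(d) for d in cube_a["dimensions"]}
    keys_b = {dimension_key(d) for d in cube_b["dimensions"]}
    return keys_a == keys_b

def shared_measures(cube_a, cube_b):
    """Measures are equal only when their URIs are identical."""
    return set(cube_a["measures"]) & set(cube_b["measures"])

# Hypothetical cubes: different geo dimension URIs, but the same code list.
cube_2006 = {
    "dimensions": [
        {"uri": "http://example.org/dim/geo", "codelist": "http://example.org/codelist/geo"},
        {"uri": "http://example.org/dim/activity"},
    ],
    "measures": ["http://example.org/measure/turnover"],
}
cube_2007 = {
    "dimensions": [
        {"uri": "http://example.org/other/refArea", "codelist": "http://example.org/codelist/geo"},
        {"uri": "http://example.org/dim/activity"},
    ],
    "measures": ["http://example.org/measure/turnover"],
}

compatible_dimensions = same_dimensions(cube_2006, cube_2007)
common_measures = shared_measures(cube_2006, cube_2007)
```

A generic measure URI such as obsValue would also pass the `shared_measures` check, which is exactly the problem the slide raises: identical URIs do not guarantee that the same thing is measured.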

Page 28: A Browser for Linked Data Cubes

Challenges related to compatibility (2/2)

The approach of expanding data cubes requires small cubes, i.e. cubes that describe few measures.
...but how should a cube be modelled? For example:
  LATC’s Eurostat: more than 5000 cubes with few measures per cube
  Irish Census 2011: 682 cubes with one measure per cube
  Digital Agenda: only 2 cubes with more than 100 measures per cube
We need a common understanding of how to conceptually model a cube.

Page 29: A Browser for Linked Data Cubes

Data level Operations per Functionality

Browser functionality             Data level operations
B1: Present 2D slice of a cube    • Identify single cube measure (D1), multiple cube measures (D2)
                                  • Identify cube dimensions (D3)
                                  • Identify cube attributes (D4)
                                  • Identify dimension values (D5)
B2: Add/remove measures           D3
B3: Multilinguality               Data available in multiple languages (D6)
B4: Add/remove dimensions         D1, D2
B5: Roll-up/drill-down            Definition of a hierarchy for hierarchical data (D7)
B6: Integrated view of cubes      D1, D2, D3, D4, D5

Page 30: A Browser for Linked Data Cubes

Data level Operations per Functionality

Aggregator functionality                      Data level operations
A1: Compute aggregations across dimension     Identify the unit of cube’s single measure (D8), multiple measures (D9)
A2: Compute aggregations across hierarchy     D7, D8, D9

Compatibility Explorer functionality          Data level operations
CE1: Identify compatible cubes                D1, D2, D3, D4, D5, D7

Page 31: A Browser for Linked Data Cubes

D1: Identify single cube measure

According to the QB vocabulary, the qb:DataStructureDefinition must have a qb:MeasureProperty that defines the measure.
The Browser follows this approach.
However, other approaches are used:
  LATC’s Eurostat dataset defines sdmx-measure:obsValue as a qb:DimensionProperty
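A sketch of measure identification over a list of (subject, predicate, object) triples. The first step follows the QB approach; the optional fallback that accepts sdmx-measure:obsValue typed as a dimension is an assumption added here to show how the LATC-style modelling could be tolerated, and is not described as Browser behaviour. The `example.org` URIs are hypothetical.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"

def find_measures(triples, tolerate_obs_value=True):
    """Return properties typed qb:MeasureProperty; optionally accept a mistyped obsValue."""
    measures = {s for s, p, o in triples
                if p == RDF_TYPE and o == QB + "MeasureProperty"}
    if not measures and tolerate_obs_value:
        # Assumed fallback: obsValue declared as a qb:DimensionProperty.
        measures = {s for s, p, o in triples
                    if p == RDF_TYPE and o == QB + "DimensionProperty" and s == OBS_VALUE}
    return measures

standard = [
    ("http://example.org/measure/turnover", RDF_TYPE, QB + "MeasureProperty"),
    ("http://example.org/dim/geo", RDF_TYPE, QB + "DimensionProperty"),
]
latc_style = [
    (OBS_VALUE, RDF_TYPE, QB + "DimensionProperty"),
    ("http://example.org/dim/geo", RDF_TYPE, QB + "DimensionProperty"),
]

standard_measures = find_measures(standard)
latc_measures = find_measures(latc_style)
strict_measures = find_measures(latc_style, tolerate_obs_value=False)
```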

Page 32: A Browser for Linked Data Cubes

D2: Identify multiple measures

The QB vocabulary offers two options:
  Multi-measure Observations
    Define multiple qb:MeasureProperty components, one for each measure, attached to the qb:DataStructureDefinition
    Use all qb:MeasureProperty components at each observation
  Measure Dimension
    Define multiple qb:MeasureProperty components, one for each measure, attached to the qb:DataStructureDefinition
    Define a special qb:DimensionProperty named qb:measureType
    At each observation use one qb:MeasureProperty. The dimension qb:measureType defines the qb:MeasureProperty used at the specific observation
The Browser follows the Multi-measure Observations approach.
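The difference between the two options can be seen by converting one form into the other. The sketch below regroups Measure Dimension observations into multi-measure observations; observations are plain dictionaries and the key `measureType` stands in for qb:measureType. The data and the function name are hypothetical.

```python
def to_multi_measure(observations, dimensions, measure_type="measureType"):
    """Group Measure Dimension observations that share all other dimension values."""
    merged = {}
    for o in observations:
        key = tuple(o[d] for d in dimensions)
        row = merged.setdefault(key, dict(zip(dimensions, key)))
        measure = o[measure_type]   # which measure this observation carries
        row[measure] = o[measure]
    return list(merged.values())

# Measure Dimension form: one measure per observation.
measure_dimension_obs = [
    {"time": "2006", "geo": "BE", "measureType": "turnover", "turnover": 100},
    {"time": "2006", "geo": "BE", "measureType": "employees", "employees": 7},
    {"time": "2007", "geo": "BE", "measureType": "turnover", "turnover": 110},
]

# Multi-measure form: all measures of a cell in one observation.
multi = to_multi_measure(measure_dimension_obs, ["time", "geo"])
```

The second resulting observation has no `employees` value, which is the situation in which a multi-measure observation becomes incomplete.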

Page 33: A Browser for Linked Data Cubes

D2: Identify multiple measures (Challenges)

Open Data Communities uses the Measure Dimension approach.
The Irish 2011 Census dataset defines only one measure per cube.
However, other approaches are followed:
  LATC’s Eurostat uses a qb:DimensionProperty to encode multiple measures
  The same holds for Digital Agenda
Other challenges:
  In multi-measure observations, a missing value in one measure will invalidate the entire observation.

Page 34: A Browser for Linked Data Cubes

D3: Identify cube dimensions

The QB vocabulary defines that the qb:DataStructureDefinition must have a qb:DimensionProperty for every dimension.
The Browser assumes that qb:DimensionProperty defines ONLY dimensions of a cube.
Other approaches:
  In LATC’s Eurostat dataset, (a) attributes, e.g. sdmx-dimension:freq and property:unit, and (b) sdmx-measure:obsValue are declared as qb:DimensionProperty
  Digital Agenda defines the breakdown dimension, which is actually a “super-dimension” in which one can add all the values of dimensions other than time and geography.

Page 35: A Browser for Linked Data Cubes

D5: Identify dimension values

According to the QB vocabulary, a qb:DimensionProperty is connected to a skos:ConceptScheme with the values of the dimension.
The Browser gets the URIs of the dimension values from the observations and the labels from the Concept Scheme.
Some other approaches:
  Although the Irish CSO does not connect a qb:DimensionProperty with a qb:codeList property, it gets the dimension values from a Concept Scheme.
Other challenges:
  How to encode the order of dimension values
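A sketch of the two-step lookup: value URIs are collected from the observations, and labels are then read from the concept scheme. Observations are dictionaries, the concept scheme is a list of triples, and all `example.org` URIs and helper names are hypothetical. Falling back to the URI when no label exists is an assumption of this sketch.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

def dimension_values(observations, dimension):
    """Collect the distinct value URIs used for a dimension, in first-seen order."""
    return list(dict.fromkeys(o[dimension] for o in observations if dimension in o))

def labels_for(values, scheme_triples):
    """Look up skos:prefLabel for each value; fall back to the URI itself."""
    labels = {s: o for s, p, o in scheme_triples if p == SKOS_PREF_LABEL}
    return {v: labels.get(v, v) for v in values}

obs = [
    {"geo": "http://example.org/geo/BE"},
    {"geo": "http://example.org/geo/LU"},
    {"geo": "http://example.org/geo/BE"},
]
scheme = [("http://example.org/geo/BE", SKOS_PREF_LABEL, "Belgium")]

values = dimension_values(obs, "geo")
labels = labels_for(values, scheme)
```

Note that first-seen order is arbitrary with respect to the meaning of the values, which relates to the open question of how to encode their order.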

Page 36: A Browser for Linked Data Cubes

D6: Data available in multiple languages

The names of a qb:ComponentProperty can be:
  Directly defined, either as rdfs:label or skos:prefLabel
  Defined through a skos:Concept connected via qb:concept to the qb:ComponentProperty
The Browser uses both ways to get the names of the values.
Other approaches:
  LATC’s Eurostat uses rdfs:label attached to the qb:ComponentProperty
  Open Data Communities uses the first approach in some cubes and the second in others
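Supporting both naming options for a requested language can be sketched as a lookup that tries direct labels first and then the concept linked through qb:concept. Literals are modelled as (text, language) tuples, the `example.org` URIs are hypothetical, and the order of the two attempts is an assumption.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
QB_CONCEPT = "http://purl.org/linked-data/cube#concept"

def label_of(resource, triples, lang):
    """Return a label in `lang`, trying direct labels first, then the linked concept."""
    def direct(subject):
        for s, p, o in triples:
            if s == subject and p in (RDFS_LABEL, SKOS_PREF_LABEL) and o[1] == lang:
                return o[0]
        return None

    found = direct(resource)
    if found is None:
        # Second option: follow qb:concept and read the labels of the skos:Concept.
        for s, p, o in triples:
            if s == resource and p == QB_CONCEPT:
                found = direct(o)
                if found is not None:
                    break
    return found

GEO = "http://example.org/dim/geo"
SEX = "http://example.org/dim/sex"
SEX_CONCEPT = "http://example.org/concept/sex"
triples = [
    (GEO, RDFS_LABEL, ("Geopolitical entity", "en")),
    (GEO, RDFS_LABEL, ("Geopolitische Einheit", "de")),
    (SEX, QB_CONCEPT, SEX_CONCEPT),
    (SEX_CONCEPT, SKOS_PREF_LABEL, ("Sex", "en")),
    (SEX_CONCEPT, SKOS_PREF_LABEL, ("Geschlecht", "de")),
]

geo_de = label_of(GEO, triples, "de")
sex_de = label_of(SEX, triples, "de")
sex_fr = label_of(SEX, triples, "fr")   # no French label available
```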

Page 37: A Browser for Linked Data Cubes

D7: Define a hierarchy

According to the RDF Data Cube Vocabulary:
  Use qb:HierarchicalCodeList
Use both SKOS and XKOS vocabularies:
  SKOS defines the hierarchy (skos:broader, skos:inScheme, skos:member etc.)
  XKOS defines the classification levels (xkos:ClassificationLevel, xkos:numberOfLevels, xkos:depth etc.)
The Browser assumes that:
  The levels of the hierarchies used by the cube are defined using the xkos:ClassificationLevel concept.
  Each member of a level is defined as a skos:Concept, which is related to the xkos:ClassificationLevel by the skos:member property.

Page 38: A Browser for Linked Data Cubes

Challenges regarding hierarchies

Other practices:
  LATC’s Eurostat does not define hierarchies, although hierarchical data exist in the same cube, e.g. city (Riga), region (South West Wales), country (Greece), set of countries (EU28) (aei_ps_alt.ttl)
  The Irish Census does not define hierarchies. It has data about 12 geographical levels, and it defines a different code list and a different cube per geo level.
  Open Data Communities re-uses URIs from the Spatial Relations Ontology defined by Ordnance Survey and thus reuses the hierarchy.
  The Flemish Government’s dataset defines hierarchies based on SKOS and XKOS.
Other challenges:
  The XKOS semantics of xkos:isPartOf and xkos:generalises seem wrong

Page 39: A Browser for Linked Data Cubes

D8: Identify the unit of cube’s single measure

Unit attached to the dataset:
  Declare that the sdmx-attribute:unitMeasure can be attached (qb:componentAttachment) to the qb:DataSet
  Use the sdmx-attribute:unitMeasure on the qb:DataSet to define the measure’s unit
In case multiple units are used for the same measure, e.g. kilos and grams for the measure weight:
  Declare that the sdmx-attribute:unitMeasure can be attached (qb:componentAttachment) to the qb:Observation
  Use the sdmx-attribute:unitMeasure at each observation to define the measure’s unit
The Browser does not support multiple units for the same measure.
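A sketch of unit resolution under the two attachment levels: the dataset-level unit is preferred, otherwise the observation-level units are inspected, and mixed units are rejected in line with the stated limitation. The dictionary representation and the function `measure_unit` are hypothetical.

```python
UNIT = "sdmx-attribute:unitMeasure"

def measure_unit(dataset, observations):
    """Return the single unit of the measure, or raise if several units are mixed."""
    if UNIT in dataset:                                    # attached to the qb:DataSet
        return dataset[UNIT]
    units = {o[UNIT] for o in observations if UNIT in o}   # attached per observation
    if len(units) == 1:
        return units.pop()
    raise ValueError(f"expected exactly one unit, found {sorted(units)}")

dataset_level = measure_unit({UNIT: "kg"}, [])
observation_level = measure_unit({}, [{UNIT: "kg"}, {UNIT: "kg"}])
```

Calling `measure_unit({}, [{UNIT: "kg"}, {UNIT: "g"}])` raises a `ValueError`, mirroring the unsupported case of kilos and grams for the same measure.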

Page 40: A Browser for Linked Data Cubes

Challenges regarding unit of measure

Which measurement units to use; is there an easy to use universal standard?

Page 41: A Browser for Linked Data Cubes

D9: Identify the unit of cube’s multiple measures

Challenges:
  “Note that one limitation of the multi-measure approach is that it is not possible to attach an attribute to a single observed value.”
  If one splits the observations and has one measure per observation, then again this seems to contradict the vocabulary: “It is also possible to attach attributes to a qb:MeasureProperty in which case the attribute is intended to apply only to that property and not to the observations in which that property occurs.”

Page 42: A Browser for Linked Data Cubes

Identify the geospatial dimension of the cube

The Browser assumes that the geospatial dimension is sdmx:refArea or a subproperty of sdmx:refArea.
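The subproperty test can be sketched as a walk along rdfs:subPropertyOf links. The sketch assumes that sdmx:refArea refers to sdmx-dimension:refArea and that the subproperty links have been loaded into a dictionary; the `example.org` properties are hypothetical.

```python
# Assumption: "sdmx:refArea" in the slide denotes sdmx-dimension:refArea.
REF_AREA = "http://purl.org/linked-data/sdmx/2009/dimension#refArea"

def is_geo_dimension(prop, sub_property_of):
    """True if `prop` is refArea or reaches it through rdfs:subPropertyOf links."""
    seen = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        if current == REF_AREA:
            return True
        if current not in seen:          # guard against cycles in the data
            seen.add(current)
            stack.extend(sub_property_of.get(current, []))
    return False

# Hypothetical rdfs:subPropertyOf links: property -> list of super-properties.
sub_property_of = {
    "http://example.org/dim/municipality": ["http://example.org/dim/area"],
    "http://example.org/dim/area": [REF_AREA],
}

direct = is_geo_dimension(REF_AREA, sub_property_of)
indirect = is_geo_dimension("http://example.org/dim/municipality", sub_property_of)
other = is_geo_dimension("http://example.org/dim/sex", sub_property_of)
```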

Page 43: A Browser for Linked Data Cubes

Identify the time dimension of the cube

Time can be represented either as a URI or as a literal.
The Browser takes both into account.
Approaches in existing datasets:
  LATC’s Eurostat uses dcterms:date in the DSD and sdmx-dimension:timePeriod in the data. No URIs, only literals.
  The Irish Census data does not have a time dimension
Challenges:
  Which is better: atomic values or identifiers, e.g. for a year?
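Handling both representations can be sketched as a function that normalises a time value to a year, whether it arrives as a literal or as a URI. The heuristic (take the last path segment of a URI and look for a four-digit number) and the sample URIs are assumptions; how the Browser actually handles the two cases is not described.

```python
import re

def year_of(value):
    """Extract a four-digit year from a literal or from the last segment of a URI."""
    if value.startswith("http"):
        text = value.rstrip("/").rsplit("/", 1)[-1]   # last path segment of the URI
    else:
        text = value
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", text)
    return int(match.group(1)) if match else None

from_literal = year_of("2006")
from_date_literal = year_of("2006-01-01")
from_uri = year_of("http://reference.data.gov.uk/id/year/2007")
no_year = year_of("http://example.org/time/unknown")
```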

Page 44: A Browser for Linked Data Cubes
