RML.io Generating High Quality Linked Open Data from Open or Not Data Anastasia Dimou Data Science Lab, Ghent University - iMinds [email protected] @natadimou

Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Page 1: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RML.io Generating High Quality

Linked Open Data from Open or Not Data

Anastasia Dimou Data Science Lab, Ghent University - iMinds

[email protected] @natadimou

Page 2: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

What is the Semantic Web?

Page 3: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

The Semantic Web is the extension of the World Wide Web

Page 4: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Are you the owner of your data, OR is it the application that hosts your data?

Page 5: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

The Semantic Web is the extension of the World Wide Web

enables sharing content beyond the boundaries of applications & websites

Page 6: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

The Web for humans, thanks to HTML, is understandable & constant BUT is the Web for machines too?

Page 7: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

The Semantic Web is the extension of the World Wide Web

enables sharing content beyond the boundaries of applications & websites

allows machines to understand the meaning of hyperlinked information

Page 8: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Semantic Web enabled applications rely on data represented as Linked Data

Page 9: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

What is Linked (Open) Data?

Page 10: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Linked (Open) Data

a standardized way of expressing the relationships between data

semantically annotates the data with different vocabularies or ontologies

describes domain-level knowledge

understandable by humans & machines

Page 11: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

How is Linked Data published?

Page 12: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Linked (Open) Data published in the form of RDF datasets

Page 13: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Resource Description Framework (RDF) is the prevalent data model for describing Linked (Open) Data

driven by unique identifiers (URIs)

allows establishing a shared meaning

subject --predicate--> object
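As a minimal sketch of the subject / predicate / object model, a triple can be held as a small record and written out as one N-Triples line. The `Triple` class and the helper names are made up for this illustration, and the `http://ex.com/` namespace is borrowed from the later slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; every field is an already serialized RDF term."""
    subject: str
    predicate: str
    obj: str


def iri(value: str) -> str:
    # IRIs are written between angle brackets in N-Triples
    return f"<{value}>"


def literal(value: str) -> str:
    # Plain literals are quoted; backslashes and quotes must be escaped
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(t: Triple) -> str:
    # One statement per line, terminated by " ."
    return f"{t.subject} {t.predicate} {t.obj} ."


EX = "http://ex.com/"  # example namespace, an assumption of this sketch
triple = Triple(iri(EX + "1"), iri(EX + "works"), iri(EX + "DSLab"))
line = to_ntriples(triple)
print(line)
```

Because every term is an identifier rather than a local value, two datasets that use the same IRI are talking about the same thing, which is what makes a shared meaning possible.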

Page 14: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

How is Linked Data derived

from (semi-)structured data?

Page 15: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

How is Linked Data derived

from (semi-)structured data?

id  firstname  surname   lab    city
1   Anastasia  Dimou     DSLab  Ghent
2   Ruben      Verborgh  DSLab  Ghent
3   Erik       Mannens   DSLab  Ghent

Page 16: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Person 1 --works--> Data Science Lab
Person 1 --> “Anastasia Dimou”

Person 2 --works--> Data Science Lab
Person 2 --> “Ruben Verborgh”

Person 3 --works--> Data Science Lab
Person 3 --> “Erik Mannens”

Data Science Lab --located--> Ghent

Page 17: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Assign unique identifiers (URIs)

Person {id} → http://ex.com/{id}

{lab} → http://ex.com/{lab}

“{firstname} {surname}”

Page 18: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Annotate data relationships with ontologies

http://ex.com/{id}

http://ex.com/{lab}

“{firstname} {surname}”

Page 19: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

ex:1 ex:works ex:DSLab
ex:1 → “Anastasia Dimou”

ex:2 ex:works ex:DSLab
ex:2 → “Ruben Verborgh”

ex:3 ex:works ex:DSLab
ex:3 → “Erik Mannens”

ex:DSLab ex:located ex:Ghent

Page 20: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

sets of triples of a dataset have repetitive patterns

ex:{id} ex:works ex:{lab}
ex:{id} → “{firstname} {surname}”
ex:{lab} ex:located ex:{city}

Page 21: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

sets of triples of a dataset have repetitive patterns

ex:{id} ex:works ex:{lab}
ex:{id} → “{firstname} {surname}”
ex:{lab} ex:located ex:{city}

RDF dataset generation tools base their implementation on repeatedly applying those patterns to the input data
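A rough sketch of what "repeatedly applying patterns" can look like: each pattern is a subject template, a predicate, and an object template, and every row of the table is pushed through every pattern. The rows come from the earlier table; the `ex:name` predicate is an assumption, since the slides leave the name edge unlabeled, and this is not how any particular tool is implemented.

```python
EX = "http://ex.com/"

# Rows of the example table (column names follow the {surname} templates)
ROWS = [
    {"id": "1", "firstname": "Anastasia", "surname": "Dimou", "lab": "DSLab", "city": "Ghent"},
    {"id": "2", "firstname": "Ruben", "surname": "Verborgh", "lab": "DSLab", "city": "Ghent"},
    {"id": "3", "firstname": "Erik", "surname": "Mannens", "lab": "DSLab", "city": "Ghent"},
]

# (subject template, predicate IRI, object template, object is an IRI?)
PATTERNS = [
    (EX + "{id}", EX + "works", EX + "{lab}", True),
    (EX + "{id}", EX + "name", "{firstname} {surname}", False),  # ex:name is hypothetical
    (EX + "{lab}", EX + "located", EX + "{city}", True),
]


def generate(rows, patterns):
    """Apply every pattern to every row; a set removes repeated triples."""
    triples = set()
    for row in rows:
        for subject_tpl, predicate, object_tpl, is_iri in patterns:
            subject = f"<{subject_tpl.format(**row)}>"
            value = object_tpl.format(**row)
            obj = f"<{value}>" if is_iri else f'"{value}"'
            triples.add((subject, f"<{predicate}>", obj))
    return triples


triples = generate(ROWS, PATTERNS)
for s, p, o in sorted(triples):
    print(s, p, o, ".")
```

The lab's location is produced once per row but stored once, because all three researchers share the same lab IRI.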

Page 22: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

What are the different Linked Data Generation approaches?

Page 23: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Linked Data generation approaches

case-specific solutions OR format and source specific

Page 24: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

[Diagram] The data owner / publisher defines R2RML mappings; an R2RML processor generates RDF from the DB. CSV, JSON and XML sources are each turned into RDF separately.

Page 25: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RDF Terms (focusing on IRIs) are…

generated independently: disregarding their possible prior definitions

manually replicated: by reconstructing the same URIs (if possible)

manually aligned afterwards: links with other datasets are defined after the RDF terms are published

Page 26: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Why not a uniform approach?

Page 27: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Uniform and declarative RDF generation

from heterogeneous data sources

[Diagram] The data owner / publisher defines mappings; a single processor generates RDF from DB, CSV, JSON, XML and RDF sources.

Page 28: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RDF Mapping Language (RML)

generic scalable mapping language

for generating and interlinking

RDF data from heterogeneous resources

in an integrable and interoperable fashion

superset of the W3C standardized

R2RML mapping language

http://rml.io

Page 29: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Uniform and declarative RDF generation

from heterogeneous data sources

[Diagram] The data owner / publisher defines RML mappings; a single processor generates RDF from DB, CSV, JSON, XML and RDF sources.

Page 30: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 31: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate RDF data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 32: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RML describes how to generate RDF from structured data

<#TriplesMap>
subject → Subject Map
predicate → Predicate Map
object → Object Map

Page 33: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RDF Mapping Language (RML)

<#ResearcherMap>
Subject Map: rr:template “http://ex.com/{id}”
Object Map: rr:template “{firstname} {surname}” rr:termType rr:Literal
Object Map: rr:template “http://ex.com/{lab}”

<#LabMap>
Subject Map: rr:template “http://ex.com/{lab}”
Predicate Map: rr:constant ex:located
Object Map: rr:template “http://ex.com/{city}”

Page 34: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Extraction Module Mapping Module

RML Processor

Page 35: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 36: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Triples Map

RDF Mapping Language (RML)

Predicate Object Map

Subject Map

Predicate Map

Object Map

Page 37: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RML describes rules to map any structured data to RDF

RML supports any data independently of

which structure and format they have
where they originally reside
how they are accessed & retrieved

Page 38: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Specifying data
which data form a data input
how to reference data input extracts

Accessing & Retrieving data
data input from original source(s)

Page 40: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Triples Map

RDF Mapping Language (RML)

Predicate Object Map

Subject Map

Predicate Map

Object Map

Logical Source

Page 41: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Support data in Heterogeneous Structures and Formats

tabular-structured: tables in DBs or CSV files …

hierarchical-structured: JSON or XML …

(semi-)structured: HTML …

… … …

Page 42: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

tabular-structured data

id  firstname  surname   lab
1   Anastasia  Dimou     DSLab
2   Ruben      Verborgh  DSLab
3   Erik       Mannens   DSLab

<#ResearcherMap>
Subject Map: rr:template “http://ex.com/{id}”
Object Map: rr:template “{firstname} {surname}” rr:termType rr:Literal
Object Map: rr:template “http://ex.com/{lab}”

Page 43: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

hierarchical-structured data

<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location>
      <city>Ghent</city>
    </location>
  </lab>
  <lab> …. </lab>
  …
</labs>

<#LabMap>
Subject Map: rr:template “http://ex.com/{/labs/lab/short}”
Predicate Map: rr:constant ex:located
Object Map: rr:template “http://ex.com/{/labs/lab/location/city}”

Page 44: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Triples Map

RDF Mapping Language (RML)

Predicate Object Map

Subject Map

Predicate Map

Object Map

Logical Source

Reference Formulation

Page 45: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location>
      <city>Ghent</city>
    </location>
  </lab>
  <lab> …. </lab>
  …
</labs>

<#Lab Logical Source>
Reference Formulation: ql:XPath

<#LabMap>
Subject Map: rr:template “http://ex.com/{/labs/lab/short}”
Predicate Map: rr:constant ex:located
Object Map: rr:template “http://ex.com/{/labs/lab/location/city}”

Page 46: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Triples Map

RDF Mapping Language (RML)

Predicate Object Map

Subject Map

Predicate Map

Object Map

Logical Source

Reference Formulation

iterator

Page 47: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location>
      <city>Ghent</city>
    </location>
  </lab>
  <lab> …. </lab>
  …
</labs>

<#Lab Logical Source>
Reference Formulation: ql:XPath
iterator: “/labs/lab”

<#LabMap>
Subject Map: rr:template “http://ex.com/{/labs/lab/short}”
Predicate Map: rr:constant ex:located
Object Map: rr:template “http://ex.com/{/labs/lab/location/city}”
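To make the role of the iterator concrete, here is a small sketch that walks over each `<lab>` element and fills the templates once per iteration. It uses Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset, and it assumes the references inside the templates are resolved relative to the iterated element (so `{short}` rather than the absolute paths drawn on the slide). The second `<lab>` entry is invented sample data, since the slide elides it.

```python
import re
import xml.etree.ElementTree as ET

XML = """<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location><city>Ghent</city></location>
  </lab>
  <lab>
    <!-- hypothetical second entry, added only for this sketch -->
    <short>DSLab</short>
    <title>Data Science Lab</title>
    <location><city>Ghent</city></location>
  </lab>
</labs>"""

ITERATOR = "./lab"  # the root element is <labs>, so this matches /labs/lab
SUBJECT_TEMPLATE = "http://ex.com/{short}"
PREDICATE = "http://ex.com/located"
OBJECT_TEMPLATE = "http://ex.com/{location/city}"


def expand(template, node):
    """Replace each {path} with the text found at that path below node."""
    return re.sub(r"\{([^}]+)\}", lambda m: node.findtext(m.group(1)), template)


root = ET.fromstring(XML)
triples = [
    (expand(SUBJECT_TEMPLATE, lab), PREDICATE, expand(OBJECT_TEMPLATE, lab))
    for lab in root.findall(ITERATOR)
]
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Without an iterator there is no notion of "one record" in hierarchical data; the iterator plays the part that a row plays in a table.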

Page 48: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Specifying data
which data form a data input
how to reference data input extracts

Accessing & Retrieving data
data input from original source(s)

Page 49: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

[Diagram: RML Processor] Input data (×3) + Map doc → Mapping module → Output RDF

Page 50: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

[Diagram: RML Processor] Data source (×3) → Access interface (×3) → Retrieval module (Source description) → Input data (×3) → Mapping module (Map doc) → Output RDF

Page 51: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Support different Locations and Access Interfaces

Local File(s)

Database connectivity: D2RQ

Web source(s) (Web API/service): DCAT, CSVW, Hydra, VoID (Dataset)

RDF source(s): VoID (Endpoint), SPARQL-SD

Page 52: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Triples Map

RDF Mapping Language (RML)

Predicate Object Map

Subject Map

Predicate Map

Object Map

Logical Source

Reference Formulation

iterator

Source

Page 53: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

[Diagram: RML Processor]

Sources and access interfaces: file.xml; Web API (DCAT); Data repo; Web API (Hydra); Database (JDBC, D2RQ); Triple store (SPARQL)

Retrieval module (Source description) → XML data, JSON data, tabular data → Mapping module (Map doc) → Output RDF

Page 54: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 55: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

<http://example.com/Giddeon_Massie> a dbo:Event ; dbo:birthDate "1981-08-27"^^xsd:gYear .

<http://example.com/Brick_Bronsky> a dbo:Event ; dbo:birthDate "1964"^^xsd:gYear .

<http://example.com/Steve_Meilinger> a dbo:Event ; dbo:birthDate "1930-12-12"^^xsd:gYear .

<http://example.com/Chuck_Bednarik> a dbo:Event ; dbo:birthDate "1925-05-01"^^xsd:gYear .

<http://example.com/Matt_McBride> a dbo:Event ; dbo:birthDate "1985-05-23"^^xsd:gYear .

Page 56: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

dbo:birthDate range xsd:date
dbo:birthDate domain dbo:Person

<http://example.com/Chuck_Bednarik> a dbo:Event ; dbo:birthDate "1925-05-01"^^xsd:gYear .

Page 57: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Violations

Most frequent violations are related to how vocabularies or ontologies are applied to the data

dbo:birthDate range xsd:date
dbo:birthDate domain dbo:Person

<http://example.com/Chuck_Bednarik> a dbo:Event ; dbo:birthDate "1925-05-01"^^xsd:gYear .
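The two violations in this example can be detected mechanically. The following is only a sketch in plain Python, not RDFUnit (which works with SPARQL patterns): the schema dictionary, the tuple layout and the function name are assumptions made for illustration, and no subclass reasoning is attempted.

```python
RDF_TYPE = "rdf:type"

# Schema axioms from the slide: domain and range of dbo:birthDate
SCHEMA = {
    "dbo:birthDate": {"domain": "dbo:Person", "range": "xsd:date"},
}

# Data triples as (subject, predicate, object, datatype or None)
DATA = [
    ("ex:Chuck_Bednarik", RDF_TYPE, "dbo:Event", None),
    ("ex:Chuck_Bednarik", "dbo:birthDate", "1925-05-01", "xsd:gYear"),
]


def find_violations(data, schema):
    """Check every triple against the domain and range of its predicate."""
    # Collect the declared classes of each subject first
    types = {}
    for s, p, o, _ in data:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    violations = []
    for s, p, o, datatype in data:
        rule = schema.get(p)
        if rule is None:
            continue  # no axioms known for this predicate
        if rule["domain"] not in types.get(s, set()):
            violations.append((s, p, "domain", rule["domain"]))
        if datatype != rule["range"]:
            violations.append((s, p, "range", rule["range"]))
    return violations


violations = find_violations(DATA, SCHEMA)
for subject, predicate, kind, expected in violations:
    print(f"{subject} {predicate}: {kind} violation, expected {expected}")
```

Note that this check has to visit every triple of the published dataset, so the same mistake is reported once per affected resource.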

Page 58: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RDF DQA with RDFUnit

test-driven data-debugging framework

based on SPARQL-patterns

<http://example.com/Chuck_Bednarik> a dbo:Event ; dbo:birthDate "1925-05-01"^^xsd:gYear .

http://rdfunit.aksw.org

Page 59: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

DQA: Dataset Quality Assessment

Adjustments to the dataset

are applied manually, but rarely

are not applied at the root of the violation (hard to identify)

are overwritten if a new version of the original data is mapped & published

[Diagram] DQA → violations

Page 60: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Instead of applying Quality Assessment to the already published RDF dataset

as part of data consumption

Apply Quality Assessment to the Mappings that generate the RDF dataset

Page 61: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

MQA: Mapping Quality Assessment

discover violations before they are even generated

specify the origin of the violation

easily apply structural adjustments to the mappings

Page 62: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

sets of triples of a dataset have repetitive patterns

<http://example.com/{Name}_{Surname}> a dbo:Event ; dbo:birthDate “Birth”^^xsd:gYear .

Mapping languages formalize patterns into rules to generate the RDF dataset from the original data

Page 63: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

MQA with RDFUnit over RML

<http://example.com/Chuck_Bednarik> a dbo:Person ; dbo:birthDate "1925-05-01"^^xsd:date .

DEL: <#ObjectMap> rr:datatype xsd:gYear
ADD: <#ObjectMap> rr:datatype xsd:date
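The same check can be run against the mapping rule instead of the generated triples. The sketch below represents one mapping as a plain dictionary, which is a simplification invented for this example and not RML syntax or the RDFUnit API; it shows that a single rule is inspected once, and that fixing the rule removes the violation for every triple it would generate.

```python
# Schema axioms for the predicate used in the mapping
SCHEMA = {
    "dbo:birthDate": {"domain": "dbo:Person", "range": "xsd:date"},
}

# Hypothetical, simplified view of one mapping rule
MAPPING = {
    "subject_class": "dbo:Event",
    "predicate_object_maps": [
        {"predicate": "dbo:birthDate", "datatype": "xsd:gYear"},
    ],
}


def assess_mapping(mapping, schema):
    """Report domain and range violations at the level of the mapping rule."""
    violations = []
    for pom in mapping["predicate_object_maps"]:
        rule = schema.get(pom["predicate"])
        if rule is None:
            continue  # nothing known about this predicate
        if mapping["subject_class"] != rule["domain"]:
            violations.append(("domain", pom["predicate"], rule["domain"]))
        if pom.get("datatype") != rule["range"]:
            violations.append(("range", pom["predicate"], rule["range"]))
    return violations


before = assess_mapping(MAPPING, SCHEMA)

# Apply the adjustments from the slide: correct class and datatype
FIXED = {
    "subject_class": "dbo:Person",
    "predicate_object_maps": [
        {"predicate": "dbo:birthDate", "datatype": "xsd:date"},
    ],
}
after = assess_mapping(FIXED, SCHEMA)

print("before:", before)
print("after:", after)
```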

Page 64: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

MDQA: Uniform Mapping & Dataset Quality Assessment

[Diagram] data + map doc → Mapping Processor → MDQA → violations

Page 65: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Dataset Vs Mapping Quality Assessment

              Dataset Quality Assessment    Mapping Quality Assessment
              size      time                size      time
DBpedia EN    62M       16h                 115K      11s
DBpedia NL    21M       1.5h                53K       6s
DBpedia all                                 511K      32s

* http://mappings.dbpedia.org/validation

Live update of DBpedia Mapping Quality Assessment results every night!

Page 66: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 67: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Metadata

manually defined by data publishers (person-agents), rather than produced by applications (software-agents)

Page 68: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Consider mapping rules to automatically generate self-descriptive provenance and other metadata

Page 69: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

W3C standardized Metadata

PROV: provenance information

VoID: expressing RDF dataset metadata (general metadata, structural metadata, links between datasets)

DCAT: describing datasets in data catalogs
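As an illustration of metadata produced by software rather than by hand, the sketch below emits a few PROV-O statements about one generation run. The PROV terms used (`prov:Activity`, `prov:Entity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:generatedAtTime`) are part of the W3C vocabulary; the function, the IRIs and the timestamp are hypothetical and only stand in for what a processor could record.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def provenance_triples(dataset, mapping, source, activity, generated_at):
    """Describe one generation run as N-Triples lines using PROV-O terms."""
    stamp = generated_at.isoformat()
    return [
        f"<{activity}> <{RDF_TYPE}> <{PROV}Activity> .",
        f"<{activity}> <{PROV}used> <{mapping}> .",
        f"<{activity}> <{PROV}used> <{source}> .",
        f"<{dataset}> <{RDF_TYPE}> <{PROV}Entity> .",
        f"<{dataset}> <{PROV}wasGeneratedBy> <{activity}> .",
        f"<{dataset}> <{PROV}wasDerivedFrom> <{source}> .",
        f'<{dataset}> <{PROV}generatedAtTime> "{stamp}"^^<{XSD_DATETIME}> .',
    ]


# Hypothetical IRIs and timestamp, for illustration only
lines = provenance_triples(
    dataset="http://ex.com/dataset/researchers",
    mapping="http://ex.com/mapping/researchers",
    source="http://ex.com/source/researchers.csv",
    activity="http://ex.com/activity/run-1",
    generated_at=datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print("\n".join(lines))
```

Since the processor already knows which mapping and which source it used, emitting these statements adds no manual work for the data publisher.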

Page 70: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 71: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Semantic Web experts Vs. Data specialists

Modeling Domain Knowledge as Linked (Open) Data is not straightforward for Data Specialists

Data context is not straightforward for Semantic Web experts

Page 72: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Semantic Web experts Vs. Data specialists

Data Specialists should be able to specify the mappings, modify and extend them at any time

Page 73: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Approaches for Editing Mappings

Page 77: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RML Editor

http://rml.io/RMLeditor

Page 78: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Defining Mappings to generate Linked Data
Retrieving Input Data
Assessing Quality
Generating Metadata
Editing Mappings

Page 79: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

The five stars of the Linked Open Data scheme should not be approached as a set of consecutive steps

Page 80: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Well-considered policy regarding mapping and interlinking of data in the context of a certain knowledge domain

Page 81: Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

RML.io Generating High Quality

Linked Open Data from Open or Not Data

Anastasia Dimou Data Science Lab, Ghent University - iMinds

[email protected] @natadimou