Machine-Interpretable Dataset and Service Descriptions

for Heterogeneous Data Access & Retrieval

Anastasia Dimou, Ruben Verborgh, Miel Vander Sande, Erik Mannens, Rik Van de Walle

[email protected] @natadimou

Ghent University – iMinds – Multimedia Lab

http://RML.io

Semantic Web enabled applications rely on data represented as Linked Open Data

Linked Open Data describe domain-level knowledge that is understandable by both humans and machines

Resource Description Framework (RDF) is the prevalent data model for describing Linked Open Data

subject predicate object

Resource Description Framework (RDF)

ex:1 ex:works ex:MMLab

“Anastasia Dimou”

ex:2 ex:works ex:MMLab

“Ruben Verborgh”

ex:3 ex:works ex:MMLab

“Miel Vander Sande”

ex:MMLab ex:located ex:Ghent

sets of triples of a dataset have repetitive patterns

ex:{id}

ex:{lab}

“{firstname} {surname}”

ex:{lab} ex:located ex:{city}

triple-oriented mapping languages formalize patterns into rules to map data to RDF
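The idea of turning repetitive triple patterns into rules can be sketched in a few lines of Python. This is only an illustration, not RML itself: the `fill` and `apply_patterns` helpers and the record layout are hypothetical, and the `ex:works` predicate is assumed from the example triples shown earlier in the deck.

```python
import re

# Triple patterns from the slide. The ex:works predicate is an assumption
# taken from the earlier example triples.
PATTERNS = [
    ("ex:{id}", "ex:works", "ex:{lab}"),
    ("ex:{lab}", "ex:located", "ex:{city}"),
]

def fill(template, record):
    """Replace every {name} placeholder with the record's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(record[m.group(1)]), template)

def apply_patterns(records, patterns=PATTERNS):
    """Yield one (subject, predicate, object) triple per record and pattern."""
    for record in records:
        for s, p, o in patterns:
            yield (fill(s, record), p, fill(o, record))

records = [
    {"id": 1, "lab": "MMLab", "city": "Ghent"},
    {"id": 2, "lab": "MMLab", "city": "Ghent"},
]

# A set removes the ex:located triple that both records produce.
triples = sorted(set(apply_patterns(records)))
```

Both records yield the same `ex:located` triple, which is why the result is collected into a set before sorting.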

RDF Mapping Language (RML): map any data to RDF

uniform, integrable, interoperable, extensible

extends the W3C-recommended R2RML

http://RML.io

A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Proceedings of the 7th Workshop on Linked Data on the Web (LDOW2014), 2014.

RML describes rules to map any structured data to RDF

RML supports any data independently of

which structure and format they have

where they originally reside

how they are accessed & retrieved

data access and retrieval is performed manually and remains hard-coded

Mapping: any data to RDF with RML

Specifying: which data form a data input; how to reference data input extracts

Accessing & Retrieving: data input from original source(s)

RDF Mapping Language (RML)

@prefix rr: <http://www.w3.org/ns/r2rml#> .

<#LabMap> (Triples Map)
Subject Map: rr:template “http://ex.com/{lab}”
Predicate Map: rr:constant ex:located
Object Map: rr:template “http://ex.com/{city}”

<#ResearcherMap> (Triples Map)
Subject Map: rr:template “http://ex.com/{id}”
Object Map: rr:template “http://ex.com/{lab}”
Object Map: rr:template “{firstname} {surname}” rr:termType rr:Literal

Mapping: data to RDF with RML

Specifying: which data form a data input; how to reference data input extracts

Accessing & Retrieving: data input from original source(s)

RDF Mapping Language (RML)

Triples Map
  Logical Source
  Subject Map
  Predicate Object Map
    Predicate Map
    Object Map

Support data in Heterogeneous Structures and Formats

tabular-structured: tables in DBs or CSV files …

hierarchical-structured: JSON or XML …

(semi-)structured: HTML …

support tabular-structured data

id  firstname  surname       lab
1   Anastasia  Dimou         MMLab
2   Ruben      Verborgh      MMLab
3   Miel       Vander Sande  MMLab

<#ResearcherMap>
rr:template “http://ex.com/{id}”
rr:template “http://ex.com/{lab}”
rr:template “{firstname} {surname}” rr:termType rr:Literal
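To make the tabular case concrete, the following Python sketch applies the three `<#ResearcherMap>` templates to the table, written out as CSV. It is a minimal illustration under stated assumptions, not an RML processor: the `expand` and `map_rows` helpers and the `"iri"`/`"literal"` labels are hypothetical names introduced here.

```python
import csv
import io
import re

# The table from the slide, written as CSV for illustration.
CSV_DATA = """id,firstname,surname,lab
1,Anastasia,Dimou,MMLab
2,Ruben,Verborgh,MMLab
3,Miel,Vander Sande,MMLab
"""

# Templates of <#ResearcherMap>; the "iri"/"literal" labels are this
# sketch's own stand-in for rr:termType.
SUBJECT = "http://ex.com/{id}"
OBJECTS = [
    ("iri", "http://ex.com/{lab}"),
    ("literal", "{firstname} {surname}"),
]

def expand(template, row):
    """Fill {column} placeholders with values from one CSV row."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)

def map_rows(text):
    """Return (subject, object_kind, object_value) per row and object template."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = expand(SUBJECT, row)
        for kind, template in OBJECTS:
            out.append((subject, kind, expand(template, row)))
    return out

terms = map_rows(CSV_DATA)
```

Each row is one iteration, and column names act as the references inside the templates.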

support hierarchical-structured data

<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location>
      <city>Ghent</city>
    </location>
  </lab>
  <lab> …. </lab>
  …
</labs>

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rr:template “http://ex.com/{/labs/lab/location/city}”

How to reference data extracts?

RDF Mapping Language (RML)

Triples Map
  Logical Source
    Reference Formulation
  Subject Map
  Predicate Object Map
    Predicate Map
    Object Map

<#Lab Logical Source>
Reference Formulation: ql:XPath

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rr:template “http://ex.com/{/labs/lab/location/city}”

How to iterate over the data?

RDF Mapping Language (RML)

Triples Map
  Logical Source
    Reference Formulation
    iterator
  Subject Map
  Predicate Object Map
    Predicate Map
    Object Map

<#Lab Logical Source>
Reference Formulation: ql:XPath
iterator: “/labs/lab”

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rr:template “http://ex.com/{/labs/lab/location/city}”
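The role of the iterator can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions rather than how an RML processor is implemented: it evaluates the iterator “/labs/lab” over a one-lab version of the XML document shown earlier and, as an assumption of this sketch, resolves the references `short` and `location/city` relative to each iterated `<lab>` node. The helper names `iterate` and `map_labs` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location><city>Ghent</city></location>
  </lab>
</labs>"""

def iterate(xml_text, iterator):
    """Return the nodes selected by an absolute iterator such as "/labs/lab"."""
    # ElementTree only evaluates relative paths, so the document root is
    # wrapped in a synthetic parent and the iterator is made relative to it.
    wrapper = ET.Element("document")
    wrapper.append(ET.fromstring(xml_text))
    return wrapper.findall("." + iterator)

def map_labs(xml_text):
    """Build one ex:located triple per <lab> element."""
    triples = []
    for lab in iterate(xml_text, "/labs/lab"):
        # References are resolved relative to the current <lab> node.
        subject = "http://ex.com/" + lab.findtext("short")
        obj = "http://ex.com/" + lab.findtext("location/city")
        triples.append((subject, "ex:located", obj))
    return triples

lab_triples = map_labs(XML)
```

Without an iterator, a hierarchical source has no natural notion of a "row"; the iterator supplies it by naming the nodes the rules are applied to one at a time.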

Mapping: data to RDF with RML

Specifying: which data form a data source; how to reference data extracts

Accessing & Retrieving: data from their original sources

[Figure: RML Processor architecture, shown in three builds. Labels: Data source, Access interface, Source description, Retrieval module, Input data, Map doc, Mapping module, Output RDF.]

Where does this data originally come from?

Dataset and Service Vocabularies advertising in machine-interpretable fashion how to access the underlying data

can also be used in combination with RML to retrieve the data input to be mapped from its original source

Support different Locations and Access Interfaces

Local File(s)

Database connectivity: D2RQ

Web source(s) (Web API/service): DCAT, CSVW, Hydra, VOiD (Dataset)

RDF source(s): VOiD (Endpoint), SPARQL-SD
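One way to picture how a retrieval module could use such descriptions is a simple dispatch on the class of the source description. The Python sketch below is purely illustrative: the `choose_access` function, the dictionary layout, and the mechanism labels on the right-hand side are assumptions of this sketch, not part of RML or of any of the vocabularies listed above.

```python
# Hypothetical mapping from the class of a source description to the access
# mechanism a retrieval module could use. The right-hand labels are this
# sketch's own names.
ACCESS_BY_DESCRIPTION = {
    "dcat:Dataset": "http-download",
    "dcat:Distribution": "http-download",
    "hydra:IriTemplate": "web-api-request",
    "d2rq:Database": "jdbc-query",
    "sd:Service": "sparql-query",
}

def choose_access(source):
    """Pick an access mechanism for a source given as a path or a description."""
    if isinstance(source, str):
        # A plain literal such as "file.xml" is treated as a local file.
        return "local-file"
    kind = source.get("type")
    if kind not in ACCESS_BY_DESCRIPTION:
        raise ValueError(f"unsupported source description: {kind!r}")
    return ACCESS_BY_DESCRIPTION[kind]
```

For example, `choose_access("file.xml")` selects local file access, while `choose_access({"type": "d2rq:Database"})` selects a database query. The mapping rules themselves stay the same whichever branch is taken.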

RDF Mapping Language (RML)

Triples Map
  Logical Source
    Source
    Reference Formulation
    iterator
  Subject Map
  Predicate Object Map
    Predicate Map
    Object Map

<#Lab Logical Source>
Source: _:Source
Reference Formulation: ql:XPath
iterator: “/labs/lab”

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rr:template “http://ex.com/{/labs/lab/location/city}”

Where does this data originally come from?

Support Local File(s)

[Figure: RML Processor reading a local file. Labels: file.xml, XML data, Retrieval module, Map doc, Mapping module, Output RDF.]

<#Lab Logical Source>
Source: “file.xml”
Reference Formulation: ql:XPath
iterator: “/labs/lab”

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rr:template “http://ex.com/{/labs/lab/location/city}”

Support file(s) published on the Web

[Figure: RML Processor retrieving a file published on the Web. Labels: file.xml, Web API, DCAT, XML data, Source description, Retrieval module, Map doc, Mapping module, Output RDF.]

Support dataset on the Web (DCAT)

<#Lab Logical Source>
Source: _:Source
Reference Formulation: ql:XPath
iterator: “/labs/lab”

_:Source a dcat:Dataset
dcat:distribution [ a dcat:Distribution ]
dcat:downloadURL <http://ex.com/file.xml>

Support data derived from a Web API

[Figure: RML Processor retrieving data from a Web API. Labels: file.xml, Web API, DCAT, Data repo, Hydra, XML data, JSON data, Source description, Retrieval module, Map doc, Mapping module, Output RDF.]

Support data from a Web API (Hydra)

<#Lab Logical Source>
Source: _:Source
Reference Formulation: ql:XPath
iterator: “/labs/lab”

_:Source a hydra:IriTemplate
hydra:template “http://ex.com/lab?name={labName}”
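Before a Web API described with a `hydra:IriTemplate` can be called, the template variables have to be filled in. The Python sketch below shows simple `{variable}` substitution with percent-encoding. It is a simplified stand-in for full IRI template expansion, and the `expand_iri_template` helper and the sample values are hypothetical.

```python
import re
from urllib.parse import quote

def expand_iri_template(template, values):
    """Fill {variable} slots, percent-encoding each value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        # safe="" also encodes "/" so a value cannot alter the URL structure.
        return quote(str(values[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

url = expand_iri_template(
    "http://ex.com/lab?name={labName}",
    {"labName": "MMLab"},
)
```

A retrieval module could then issue an HTTP GET against the resulting URL and hand the response body to the mapping module as its data input.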

Support tabular-structured data

[Figure: RML Processor retrieving data from a database. Labels: file.xml, Web API, DCAT, Data repo, Hydra, Database, JDBC, D2RQ, XML data, JSON data, tabular data, Source description, Retrieval module, Map doc, Mapping module, Output RDF.]

id  firstname  surname       lab
1   Anastasia  Dimou         MMLab
2   Ruben      Verborgh      MMLab
3   Miel       Vander Sande  MMLab

<#ResearcherMap>
rr:template “http://ex.com/{id}”
rr:template “http://ex.com/{lab}”
rr:template “{firstname} {surname}” rr:termType rr:Literal

<#DB Logical Source>
Source: _:Source
rr:SQL2008
“…”
“SELECT …”

_:Source a d2rq:Database
“…”
“…”
“…”
“…”

[Figure: RML Processor with all supported sources. Labels: file.xml, Web API, DCAT, Data repo, Hydra, Database, JDBC, D2RQ, Triple store, SPARQL, XML data, JSON data, tabular data, Source description, Retrieval module, Map doc, Mapping module, Output RDF.]

object defined in existing RDF source(s)

ex:{lab} ex:located ex:{city}

ex:{lab} ex:located dbpedia:{city}

<#Lab Logical Source>
Source: _:Source
Reference Formulation: ql:XPath
iterator: “/labs/lab”

<#LabMap>
rr:template “http://ex.com/{/labs/lab/short}”
rr:constant ex:located
rml:reference “{/…/city}” rr:termType rr:IRI

<#Dbpedia Logical Source>
Source: DBpedia
Reference Formulation: ql:XPath
iterator: “/…/result”
“SELECT …”

<#DBpediaMap>
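When a SPARQL endpoint serves as a data source, the query results can themselves be treated as XML input and iterated result by result. The following Python sketch parses a hand-written sample in the SPARQL Query Results XML Format. The variable name `city`, the sample IRI, and the `bound_uris` helper are assumptions made for illustration; the actual query and iterator on the slide are elided and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response; the variable name and IRI are illustrative.
RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="city"/></head>
  <results>
    <result>
      <binding name="city"><uri>http://dbpedia.org/resource/Ghent</uri></binding>
    </result>
  </results>
</sparql>"""

def bound_uris(xml_text, variable):
    """Collect the IRIs bound to `variable`, one per <result> element."""
    root = ET.fromstring(xml_text)
    uris = []
    for result in root.iterfind(".//sr:result", NS):
        for binding in result.iterfind("sr:binding", NS):
            if binding.get("name") == variable:
                uri = binding.find("sr:uri", NS)
                if uri is not None:
                    uris.append(uri.text)
    return uris

cities = bound_uris(RESULTS, "city")
```

Each `<result>` element plays the same role as a `<lab>` element in the XML example: it is one iteration over which the mapping rules are applied.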

RML Editor (http://RML.io/RMLeditor)

Mapping: any data to RDF with RML

Specifying: which data form a data input; how to reference data input extracts

Accessing & Retrieving: data input from original source(s)

Data access, retrieval and mapping descriptions are machine-interpretable

Granular, robust solution based on RML, which further automates and facilitates the generation of RDF representations

RML.io

Questions?

Anastasia Dimou @natadimou