Machine-Interpretable Dataset and Service Descriptions
for Heterogeneous Data Access & Retrieval
Anastasia Dimou, Ruben Verborgh, Miel Vander Sande, Erik Mannens, Rik Van de Walle
@natadimou
Ghent University – iMinds – Multimedia Lab
http://RML.io
Example triples:
ex:1 ex:works ex:MMLab
ex:2 ex:works ex:MMLab
ex:3 ex:works ex:MMLab
ex:MMLab ex:located ex:Ghent
Each of ex:1, ex:2 and ex:3 also carries a name literal: “Anastasia Dimou”, “Ruben Verborgh” and “Miel Vander Sande”, respectively.
Sets of triples of a dataset have repetitive patterns:
ex:{id} ex:works ex:{lab}
ex:{id} with the name literal “{firstname} {surname}”
ex:{lab} ex:located ex:{city}
Triple-oriented mapping languages formalize these patterns into rules that map data to RDF.
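The step from repeated patterns to rules can be illustrated with a minimal Python sketch. It is not RML itself: it only fills the {placeholder} patterns from one record. The record values come from the example triples; the ex:name predicate and the `expand` helper are assumptions made for illustration, since the slides do not name the predicate of the name literal.

```python
import re

# Hypothetical input record; field names mirror the placeholders on the slides.
record = {"id": "1", "firstname": "Anastasia", "surname": "Dimou",
          "lab": "MMLab", "city": "Ghent"}

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

def expand(template, values):
    """Replace every {name} in the template with values[name]."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)

# Each rule is a (subject, predicate, object) pattern.
# ex:name is an assumed predicate; the slides do not show it.
rules = [
    ("ex:{id}", "ex:works", "ex:{lab}"),
    ("ex:{id}", "ex:name", '"{firstname} {surname}"'),
    ("ex:{lab}", "ex:located", "ex:{city}"),
]

# Apply every rule to the record to obtain concrete triples.
triples = [tuple(expand(part, record) for part in rule) for rule in rules]
for s, p, o in triples:
    print(s, p, o, ".")
```

Running the same rules over every record of a dataset yields the repetitive triple sets shown in the example.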
RDF Mapping Language (RML): map any data to RDF
uniform, integrable, interoperable, extensible
extends the W3C-recommended R2RML
http://RML.io
A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Proceedings of the 7th Workshop on Linked Data on the Web (LDOW2014), 2014.
RML describes rules to map any structured data to RDF
RML supports any data, independently of:
which structure and format they have
where they originally reside
how they are accessed & retrieved
Mapping data: any data to RDF with RML
Specifying data: which data form a data input; how to reference data input extracts
Accessing & Retrieving data: data input from original source(s)
RDF Mapping Language (RML)
@prefix rr: <http://www.w3.org/ns/r2rml#>
<#LabMap>
subject: rr:template “http://ex.com/{lab}”
predicate: rr:constant ex:located
object: rr:template “http://ex.com/{city}”
<#ResearcherMap>
subject: rr:template “http://ex.com/{id}”
object: rr:template “http://ex.com/{lab}”
object: rr:template “{firstname} {surname}” rr:termType rr:Literal
RDF Mapping Language (RML)
Triples Map: Logical Source, Subject Map, Predicate Object Map (Predicate Map, Object Map)
Support data in Heterogeneous Structures and Formats
tabular-structured: tables in DBs or CSV files …
hierarchical-structured: JSON or XML …
(semi-)structured: HTML …
…
support tabular-structured data
id  firstname  surname       lab
1   Anastasia  Dimou         MMLab
2   Ruben      Verborgh      MMLab
3   Miel       Vander Sande  MMLab
<#ResearcherMap>
subject: rr:template “http://ex.com/{id}”
object: rr:template “http://ex.com/{lab}”
object: rr:template “{firstname} {surname}” rr:termType rr:Literal
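What a processor does with the tabular input can be sketched in plain Python: every row is one iteration, and the column names fill the templates of <#ResearcherMap>. This is an illustrative sketch, not the RML Processor. The table is inlined as CSV text, ex:works is taken from the example triples, and ex:name and the function name `map_researchers` are assumptions.

```python
import csv
import io

# The table from the slide, inlined as CSV text so the sketch is self-contained.
CSV_TEXT = """id,firstname,surname,lab
1,Anastasia,Dimou,MMLab
2,Ruben,Verborgh,MMLab
3,Miel,Vander Sande,MMLab
"""

def map_researchers(csv_text):
    """Yield two (subject, predicate, object) triples per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names act as the references inside the templates.
        subject = "<http://ex.com/{id}>".format(**row)
        yield (subject, "ex:works", "<http://ex.com/{lab}>".format(**row))
        # ex:name is an assumed predicate; the slide shows only the template.
        yield (subject, "ex:name", '"{firstname} {surname}"'.format(**row))

triples = list(map_researchers(CSV_TEXT))
for triple in triples:
    print(*triple, ".")
```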
support hierarchical-structured data
<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location>
      <city>Ghent</city>
    </location>
  </lab>
  <lab> …. </lab>
  …
</labs>
<#LabMap>
subject: rr:template “http://ex.com/{/labs/lab/short}”
predicate: rr:constant ex:located
object: rr:template “http://ex.com/{/labs/lab/location/city}”
How to reference data extracts?
RDF Mapping Language (RML)
Triples Map: Logical Source (Reference Formulation), Subject Map, Predicate Object Map (Predicate Map, Object Map)
<#LabMap> takes its input from <#Lab Logical Source>
<#Lab Logical Source>
reference formulation: ql:XPath (the references in the templates, such as {/labs/lab/short}, are XPath expressions over the labs XML document)
How to iterate over the data?
RDF Mapping Language (RML)
Triples Map: Logical Source (Reference Formulation, iterator), Subject Map, Predicate Object Map (Predicate Map, Object Map)
<#Lab Logical Source>
reference formulation: ql:XPath
iterator: “/labs/lab”
<#LabMap> is applied once per <lab> element selected by the iterator.
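The role of the iterator can be sketched with Python's standard XML parser: the iterator selects the <lab> elements, and the references are evaluated for each selected element. This is a sketch under assumptions, not the RML Processor's implementation. In particular, it evaluates the references relative to each iterated <lab> element, whereas the slide writes them as full paths, and the function name `map_labs` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The labs document from the slide, reduced to its one fully shown <lab>.
XML_TEXT = """
<labs>
  <lab>
    <short>MMLab</short>
    <title>Multimedia Lab</title>
    <location><city>Ghent</city></location>
  </lab>
</labs>
"""

def map_labs(xml_text):
    """Return one ex:located triple per <lab> element."""
    root = ET.fromstring(xml_text.strip())  # root element is <labs>
    triples = []
    # Iterator "/labs/lab": one iteration per <lab> child of the root.
    for lab in root.findall("lab"):
        # References, evaluated relative to the current <lab> (assumption).
        short = lab.findtext("short")
        city = lab.findtext("location/city")
        triples.append((f"<http://ex.com/{short}>",
                        "ex:located",
                        f"<http://ex.com/{city}>"))
    return triples

for triple in map_labs(XML_TEXT):
    print(*triple, ".")
```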
[Figure: RML Processor architecture. Components: Data sources, each behind an Access interface; a Retrieval module with Source descriptions; Input data; a Mapping module with the Map doc; Output RDF.]
Where does this data originally come from?
Support different Locations and Access Interfaces
Local File(s)
Database connectivity: D2RQ
Web source(s) (Web API/service): DCAT, CSVW, Hydra, VOiD (Dataset)
RDF source(s): VOiD (Endpoint), SPARQL-SD
Dataset and Service Vocabularies advertise, in a machine-interpretable fashion, how to access the underlying data. They can also be used in combination with RML to retrieve the data input to be mapped from its original source.
RDF Mapping Language (RML)
Triples Map: Logical Source (Source, Reference Formulation, iterator), Subject Map, Predicate Object Map (Predicate Map, Object Map)
<#Lab Logical Source>
source: _:Source
reference formulation: ql:XPath
iterator: “/labs/lab”
Support Local File(s)
[Figure: RML Processor (Retrieval module, Mapping module, Map doc) taking XML data from file.xml and producing Output RDF.]
<#Lab Logical Source>
source: “file.xml”
reference formulation: ql:XPath
iterator: “/labs/lab”
Support file(s) published on the Web
[Figure: RML Processor (Retrieval module with Source description, Mapping module with Map doc) taking XML data from file.xml through a Web API described with DCAT and producing Output RDF.]
Support dataset on the Web (DCAT)
<#Lab Logical Source>
source: _:Source
reference formulation: ql:XPath
iterator: “/labs/lab”
_:Source a dcat:Dataset
dcat:distribution: a dcat:Distribution with dcat:downloadURL <http://ex.com/file.xml>
Support data derived from a Web API
[Figure: the RML Processor diagram extended with a Data repo reached through a Web API described with Hydra, supplying JSON data next to the XML data.]
Support data from a Web API (Hydra)
<#Lab Logical Source>
source: _:Source
reference formulation: ql:XPath
iterator: “/labs/lab”
_:Source a hydra:IriTemplate
hydra:template: “http://ex.com/lab?name={labName}”
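Before a retrieval step can call such a Web API, the IRI template has to be turned into a concrete request URL. The following is a minimal sketch of that expansion. It handles only plain {name} placeholders, not the full URI Template syntax, and the helper name `expand_iri_template` and the variable values are assumptions for illustration.

```python
import re
from urllib.parse import quote

def expand_iri_template(template, variables):
    """Fill {name} placeholders with percent-encoded values.

    Only simple placeholders are handled; operators of the full
    URI Template syntax are out of scope for this sketch.
    """
    def fill(match):
        return quote(str(variables[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", fill, template)

TEMPLATE = "http://ex.com/lab?name={labName}"

# "MMLab" and "Multimedia Lab" are assumed example values for labName.
url = expand_iri_template(TEMPLATE, {"labName": "MMLab"})
encoded_url = expand_iri_template(TEMPLATE, {"labName": "Multimedia Lab"})
print(url)
print(encoded_url)
```

Percent-encoding the value keeps the resulting URL valid when a variable contains spaces or reserved characters.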
Support tabular-structured data
[Figure: the RML Processor diagram extended with a Data base reached through JDBC and described with D2RQ, supplying tabular data next to the XML and JSON data.]
<#DB Logical Source>
source: _:Source
rr:SQL2008
query: “SELECT …”
“…”
_:Source a d2rq:Database, with its property values elided on the slide (“…”, “…”, “…”)
<#ResearcherMap>
subject: rr:template “http://ex.com/{id}”
object: rr:template “http://ex.com/{lab}”
object: rr:template “{firstname} {surname}” rr:termType rr:Literal
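The database case follows the same pattern as the other tabular inputs: a query produces rows, and the column names fill the templates. The sketch below uses an in-memory SQLite database as a stand-in for the JDBC connection that a d2rq:Database description would advertise. The table name `researchers`, the full query text and the ex:name predicate are assumptions, since the slide elides the query and the connection details.

```python
import sqlite3

# In-memory SQLite stands in for the JDBC database (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.execute(
    "CREATE TABLE researchers (id INTEGER, firstname TEXT, surname TEXT, lab TEXT)"
)
conn.executemany(
    "INSERT INTO researchers VALUES (?, ?, ?, ?)",
    [(1, "Anastasia", "Dimou", "MMLab"),
     (2, "Ruben", "Verborgh", "MMLab"),
     (3, "Miel", "Vander Sande", "MMLab")],
)

# Hypothetical query; the slide only shows "SELECT ...".
QUERY = "SELECT id, firstname, surname, lab FROM researchers ORDER BY id"

triples = []
for row in conn.execute(QUERY):
    subject = f"<http://ex.com/{row['id']}>"
    triples.append((subject, "ex:works", f"<http://ex.com/{row['lab']}>"))
    # ex:name is an assumed predicate; the slide shows only the template.
    triples.append((subject, "ex:name", f'"{row["firstname"]} {row["surname"]}"'))
conn.close()

for triple in triples:
    print(*triple, ".")
```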
Object defined in existing RDF source(s)
[Figure: the RML Processor diagram extended with a Triple store reached through SPARQL.]
ex:{lab} ex:located ex:{city}
ex:{lab} ex:located dbpedia:{city}
<#Lab Logical Source>
source: _:Source
reference formulation: ql:XPath
iterator: “/labs/lab”
<#LabMap>
subject: rr:template “http://ex.com/{/labs/lab/short}”
predicate: rr:constant ex:located
object: rml:reference “{/…/city}” rr:termType rr:IRI
<#Dbpedia Logical Source>
source: DBpedia
query: “SELECT …”
reference formulation: ql:XPath
iterator: “/…/result”
<#DBpediaMap>
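Because a SPARQL endpoint can return its results as XML, the same XPath-style iteration used for files also applies to them. The sketch below iterates over the result elements of a hand-written response in the SPARQL Query Results XML Format and uses the bound IRI as the object of ex:located. The response content, the variable name `city` and the fixed subject are assumptions for illustration; no endpoint is contacted and the elided query is not reconstructed.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written example response (assumption); a real one would come from DBpedia.
RESULTS_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="city"/></head>
  <results>
    <result>
      <binding name="city"><uri>http://dbpedia.org/resource/Ghent</uri></binding>
    </result>
  </results>
</sparql>"""

def city_iris(results_xml):
    """Return the IRI bound to ?city in every result of the response."""
    root = ET.fromstring(results_xml)
    iris = []
    # Iterate over each <result>, as an iterator ending in /result would.
    for result in root.findall("sr:results/sr:result", NS):
        uri = result.findtext("sr:binding[@name='city']/sr:uri", namespaces=NS)
        if uri:
            iris.append(uri)
    return iris

# The subject is fixed to the MMLab example for brevity.
triples = [("<http://ex.com/MMLab>", "ex:located", f"<{iri}>")
           for iri in city_iris(RESULTS_XML)]
for triple in triples:
    print(*triple, ".")
```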
Data access, retrieval and mapping descriptions are machine-interpretable.
Granular, robust solution based on RML, which further automates and facilitates the generation of RDF representations.