RDB2RDF mapping with D2RQ and D2R Server Richard Cyganiak Presentation to W3C RDB2RDF WG, 10 Nov 2009

Page 1: d2rq Rdb2rdf Wg Slides

RDB2RDF mapping with D2RQ and D2R Server

Richard Cyganiak

Presentation to W3C RDB2RDF WG, 10 Nov 2009

Page 2: d2rq Rdb2rdf Wg Slides

Topics

1. The D2RQ project

2. The D2RQ mapping language

3. Requirements for RDB2RDF

2

Page 3: d2rq Rdb2rdf Wg Slides

1. The project

3

Page 4: d2rq Rdb2rdf Wg Slides

D2RQ

• DB-to-RDF mapper written in Java

• In: any JDBC database

• Out: SPARQL, Linked Data, ETL, Jena API

• GPL, popular, easy to get started

• Axiom: We never modify the database

4

Page 5: d2rq Rdb2rdf Wg Slides

The project

• Started 2004 (roots: 2002) by Chris Bizer at FU Berlin; later me at FU and HP Labs; today Christian Becker, Andy Langegger

• 250+ downloads/month, 8700+ total

• mailing list at ~20 msgs/month, 1000+ total

• In LOD cloud, LinkedMDB, LODD, TopBraid Composer

5

Page 6: d2rq Rdb2rdf Wg Slides

D2R Server + SPARQL introduced

6

Page 7: d2rq Rdb2rdf Wg Slides

Architecture

[Architecture diagram. Labels: Non-RDF Database; D2RQ Mapping File; D2RQ Engine; D2R Server; Jena/Sesame; RDF dump; Local Java Application; Triple Store; HTTP; SPARQL (SPARQL Clients); RDF (Linked Data Clients); HTML (HTML Browsers)]

7

Page 8: d2rq Rdb2rdf Wg Slides

Architecture (2)

• maps DB to virtual RDF graph

• easy to offer arbitrary interfaces to the RDF graph

• most requested: SPARQL and RDF dumps
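To make "virtual RDF graph" concrete, here is a minimal Python sketch (not D2RQ code, which is Java). It assumes a hypothetical `User` table in an in-memory SQLite database and a made-up base URI; triples are computed from rows whenever the graph is read, and nothing is stored as RDF.

```python
import sqlite3
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the relational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE User (ID INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO User (ID, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def virtual_triples(conn: sqlite3.Connection, base: str) -> Iterator[Triple]:
    """Yield triples derived from the current rows; the database is only read, never modified."""
    for user_id, name in conn.execute("SELECT ID, name FROM User ORDER BY ID"):
        subject = f"{base}people/{user_id}"
        yield (subject, "foaf:nick", name)
```

Because the triples are derived on each read, a row added to the database shows up in the graph on the next call without any export step.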

8

Page 9: d2rq Rdb2rdf Wg Slides

2. Mapping language

9

Page 10: d2rq Rdb2rdf Wg Slides

Mapping language

• N3 based syntax

• Very flexible

• Language is not trivial, wish we had a GUI

• Usual workflow: auto-generate mapping from DB schema, then customize
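The "auto-generate, then customize" workflow can be imitated in a few lines of Python. This is only a sketch of the idea, not the D2RQ generator: it introspects a SQLite schema and emits one class map per table and one property bridge per column. The `map:database` and `vocab:` names and the URI layout are assumptions made for the illustration.

```python
import sqlite3
from typing import List


def generate_mapping(conn: sqlite3.Connection) -> str:
    """Emit a default D2RQ-style mapping as text (hypothetical naming scheme)."""
    lines: List[str] = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        key_cols = sorted((c for c in columns if c[5] > 0), key=lambda c: c[5])
        if not key_cols:
            continue  # sketch only: tables without a primary key are skipped
        key_part = "/".join(f"@@{table}.{c[1]}@@" for c in key_cols)
        lines += [
            f"map:{table} a d2rq:ClassMap;",
            "    d2rq:dataStorage map:database;",
            f'    d2rq:uriPattern "{table}/{key_part}";',
            f"    d2rq:class vocab:{table} .",
            "",
        ]
        for col in columns:
            name = col[1]
            lines += [
                f"map:{table}_{name} a d2rq:PropertyBridge;",
                f"    d2rq:belongsToClassMap map:{table};",
                f"    d2rq:property vocab:{table}_{name};",
                f'    d2rq:column "{table}.{name}" .',
                "",
            ]
    return "\n".join(lines)
```

The generated text is a starting point; customizing means replacing the generic `vocab:` terms with real vocabulary such as FOAF and adding conditions or joins by hand.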

10

Page 11: d2rq Rdb2rdf Wg Slides

Flexible mappings!

• Properties of one class from multiple tables

• Several classes in the same table

• Value translations, SQL expressions

• Arbitrary joins and SQL conditions

11

Page 12: d2rq Rdb2rdf Wg Slides

To SQL or not to SQL?

• Users want to deal with complexity by using their SQL knowledge

• They want to write arbitrary SQL queries

• We don’t want to parse SQL (painful, DB differences)

• We force users to decompose their query into small fragments
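Why small fragments help can be shown with a short Python sketch (an illustration, not the D2RQ engine): when the mapping supplies columns, joins, and conditions separately, a mapper can assemble a full query by concatenation and treat each condition as opaque text. The `Fragments` class and `build_select` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragments:
    """SQL pieces as a mapping author would supply them."""
    columns: List[str]                                    # e.g. "User.ID"
    joins: List[str] = field(default_factory=list)        # e.g. "User.ID = Photo.UserID"
    conditions: List[str] = field(default_factory=list)   # e.g. "User.deleted=0"


def build_select(f: Fragments) -> str:
    """Assemble one SELECT statement; conditions are passed through unparsed."""
    refs = list(f.columns)
    for join in f.joins:
        # A join is restricted to the form "Table.col = Table.col".
        left, right = (side.strip() for side in join.split("=", 1))
        refs += [left, right]
    tables: List[str] = []
    for ref in refs:
        table = ref.split(".", 1)[0]
        if table not in tables:
            tables.append(table)
    where = f.joins + [f"({c})" for c in f.conditions]
    sql = f"SELECT {', '.join(f.columns)} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql
```

Only the very restricted `Table.column` and join forms are inspected; everything else is handed to the database as written, which is the point of avoiding a full SQL parser.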

12

Page 13: d2rq Rdb2rdf Wg Slides

Mapping process

1. Define DB connection

2. Define your entities

3. Add properties to entities

4. Link entities together

5. Advanced stuff: conditions, joins, value translations

13

Page 14: d2rq Rdb2rdf Wg Slides

1. Define DB connection

14

Page 15: d2rq Rdb2rdf Wg Slides

map:MyDatabase a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password" .

15

Page 16: d2rq Rdb2rdf Wg Slides

2. Define your entities

16

Page 17: d2rq Rdb2rdf Wg Slides

(SQL fragments in red)

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@" .

17

Page 18: d2rq Rdb2rdf Wg Slides

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0" .
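How a pattern such as `@@User.ID@@` turns a row into a URI can be sketched in Python. This is an illustration under assumptions, not D2RQ's implementation: the function names are hypothetical, the base URI `http://example.org/` stands in for the elided one on the slide, and percent-encoding of the inserted value is assumed.

```python
import re
from typing import Dict, List
from urllib.parse import quote

# Matches placeholders of the form @@Table.column@@.
PLACEHOLDER = re.compile(r"@@([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)@@")


def pattern_columns(pattern: str) -> List[str]:
    """List the Table.column references a pattern needs from the database."""
    return [f"{table}.{column}" for table, column in PLACEHOLDER.findall(pattern)]


def expand_pattern(pattern: str, row: Dict[str, object]) -> str:
    """Fill each placeholder with the row's value, percent-encoded for use in a URI."""
    def substitute(match: "re.Match[str]") -> str:
        key = f"{match.group(1)}.{match.group(2)}"
        return quote(str(row[key]), safe="")
    return PLACEHOLDER.sub(substitute, pattern)
```

For example, `expand_pattern("http://example.org/people/@@User.ID@@", {"User.ID": 42})` gives `http://example.org/people/42`. A condition like `User.deleted=0` would simply restrict which rows reach this step.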

18

Page 19: d2rq Rdb2rdf Wg Slides

map:People a d2rq:ClassMap;
    d2rq:bNodeIdColumns "User.ID";
    d2rq:condition "User.deleted=0" .

19

Page 20: d2rq Rdb2rdf Wg Slides

3. Add properties to entities

20

Page 21: d2rq Rdb2rdf Wg Slides

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0";
    d2rq:class foaf:Person .

(SQL fragments in red, RDFS/OWL vocabulary in blue)

21

Page 22: d2rq Rdb2rdf Wg Slides

map:People a d2rq:ClassMap .

map:name a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:nick;
    d2rq:column "User.name" .

map:mbox a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:mbox;
    d2rq:uriPattern "mailto:@@User.email@@" .

22

(Slide annotations: s, p, o label the subject, predicate, and object parts of each property bridge.)

Page 23: d2rq Rdb2rdf Wg Slides

map:mbox_sha1 a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:mbox_sha1sum;
    d2rq:sqlExpression "SHA1(CONCAT('mailto:', User.email))" .
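For readers without a MySQL instance at hand, the value computed by this SQL expression can be reproduced in Python. The sketch assumes the email is encoded as UTF-8 and that the result is wanted as a lowercase hex string; `mbox_sha1sum` is a hypothetical helper name.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Hex SHA-1 digest of the mailto: URI, the value used for foaf:mbox_sha1sum."""
    mailto_uri = "mailto:" + email  # same concatenation as CONCAT('mailto:', User.email)
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()
```

The result is always 40 hexadecimal characters, so the original address is not exposed in the RDF output.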

23

Page 24: d2rq Rdb2rdf Wg Slides

4. Link your entities

24

Page 25: d2rq Rdb2rdf Wg Slides

map:Photos a d2rq:ClassMap;
    d2rq:uriPattern "http://.../photo/@@Photo.ID@@";
    d2rq:class foaf:Image .

25

Page 26: d2rq Rdb2rdf Wg Slides

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:uriPattern "http://.../photo/@@Photo.UserID@@" .

26

(Photo.UserID is a foreign key to User.ID)

Page 27: d2rq Rdb2rdf Wg Slides

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos .

Better, less repetition

27

Page 28: d2rq Rdb2rdf Wg Slides

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos .

(also d2rq:alias for self-joins)

Better, less repetition
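What the join-based bridge amounts to can be sketched in Python with SQLite. This is an illustration only: the table contents and the base URI `http://example.org/` are made up, and the query is written by hand rather than generated from the mapping. Subjects follow the people pattern and objects follow the photo pattern, connected through the join condition.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str]


def make_photo_db() -> sqlite3.Connection:
    """Hypothetical User and Photo tables; Photo.UserID is a foreign key to User.ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE User (ID INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE Photo (ID INTEGER PRIMARY KEY, UserID INTEGER)")
    conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO Photo VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    return conn


def made_triples(conn: sqlite3.Connection, base: str) -> List[Triple]:
    """One foaf:made triple per (user, photo) pair that satisfies the join."""
    sql = (
        "SELECT User.ID, Photo.ID FROM User, Photo "
        "WHERE User.ID = Photo.UserID ORDER BY User.ID, Photo.ID"
    )
    return [
        (f"{base}people/{user_id}", "foaf:made", f"{base}photo/{photo_id}")
        for user_id, photo_id in conn.execute(sql)
    ]
```

The object URI is built once, by the photo class map's pattern, so the bridge does not need to restate it.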

28

Page 29: d2rq Rdb2rdf Wg Slides

Mapping file overview

[Diagram of an example mapping file; the grouping below is reconstructed from the diagram's labels]

map:Database

map:Paper_ClassMap
    :dataStorage map:Database
    :uriPattern "/docs/@@Paper.ID@@"
    :class dcmi:Text

map:title_PropertyBridge (:belongsToClassMap map:Paper_ClassMap)
    :property dc:title
    :column "Paper.title"

map:abstract_PropertyBridge (:belongsToClassMap map:Paper_ClassMap)
    :property dc:description
    :column "Paper.abstract"

map:author_PropertyBridge (:belongsToClassMap map:Paper_ClassMap)
    :property dc:creator
    :join "Paper.author=Author.ID"
    :refersToClassMap map:Author_ClassMap

map:weblink_PropertyBridge (:belongsToClassMap map:Paper_ClassMap)
    :property owl:sameAs
    :uriColumn "Paper.weblink"

map:Author_ClassMap
    :dataStorage map:Database
    :uriPattern "/people/@@Author.ID@@"
    :class foaf:Person

map:name_PropertyBridge (:belongsToClassMap map:Author_ClassMap)
    :property foaf:name
    :pattern "@@Author.first@@ @@Author.last@@"

map:email_PropertyBridge (:belongsToClassMap map:Author_ClassMap)
    :property foaf:mbox
    :uriPattern "mailto:@@Author.email@@"

29

Page 30: d2rq Rdb2rdf Wg Slides

3. RDB2RDF Requirements

30

Page 31: d2rq Rdb2rdf Wg Slides

Syntax?

• Turtle, XML, SPARQL-like, SQL-like?

• Should be human-writable

• Would like to avoid parsing SQL

• “SQL Query + RDF template” vs. “RDF Graph + SQL fragment”

31

Page 32: d2rq Rdb2rdf Wg Slides

Expressivity?

• Arbitrary SQL for value transforms and conditions

• Dynamic properties

• Char-separated lists within values

• Transformation tables (for type codes)
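Two of these requirements are easy to picture with a small Python sketch. It is an illustration under assumptions, not part of any mapping language: the separator, the type codes, and the `http://example.org/vocab/` URIs are all made up. A char-separated cell yields one triple per item, and a translation table turns a type code into a class URI.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical translation table from database type codes to class URIs.
TYPE_CODES: Dict[str, str] = {
    "J": "http://example.org/vocab/JournalArticle",
    "C": "http://example.org/vocab/ConferencePaper",
}


def split_list(cell: str, separator: str = ";") -> List[str]:
    """Split a char-separated cell into trimmed, non-empty items."""
    return [item.strip() for item in cell.split(separator) if item.strip()]


def triples_for_paper(subject: str, keywords_cell: str, type_code: str) -> Iterator[Triple]:
    """One dc:subject triple per keyword, plus an rdf:type triple if the code is known."""
    for keyword in split_list(keywords_cell):
        yield (subject, "dc:subject", keyword)
    rdf_type = TYPE_CODES.get(type_code)
    if rdf_type is not None:  # unknown codes produce no triple
        yield (subject, "rdf:type", rdf_type)
```

Both cases produce a number of triples that differs from the number of columns read, which is what makes them hard to express with a plain column-to-property mapping.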

32

Page 33: d2rq Rdb2rdf Wg Slides

DB compatibility?

• Syntax rules for table/column names (escaping, case sensitivity)

• Datatypes

• Extension functions

• “AS”, “LIMIT”, “CONCAT”
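The kind of per-database variation listed above can be sketched in Python. The helper names are hypothetical and only three dialects are covered: MySQL quotes identifiers with backticks, PostgreSQL uses the standard double quotes, and SQL Server uses square brackets; string concatenation and row limits differ in a similar way.

```python
from typing import Dict, List, Tuple

# Opening and closing identifier quote characters per dialect.
QUOTES: Dict[str, Tuple[str, str]] = {
    "mysql": ("`", "`"),
    "postgresql": ('"', '"'),
    "sqlserver": ("[", "]"),
}


def quote_identifier(name: str, dialect: str) -> str:
    """Quote a table or column name; an embedded closing quote is doubled."""
    open_q, close_q = QUOTES[dialect]
    return f"{open_q}{name.replace(close_q, close_q * 2)}{close_q}"


def concat(parts: List[str], dialect: str) -> str:
    """String concatenation: CONCAT() on MySQL, || on PostgreSQL, + on SQL Server."""
    if dialect == "mysql":
        return f"CONCAT({', '.join(parts)})"
    operator = " + " if dialect == "sqlserver" else " || "
    return operator.join(parts)


def limit(select_sql: str, n: int, dialect: str) -> str:
    """Row limit: trailing LIMIT on MySQL/PostgreSQL, SELECT TOP on SQL Server."""
    if dialect == "sqlserver":
        if not select_sql.startswith("SELECT "):
            raise ValueError("sketch only handles statements starting with SELECT")
        return f"SELECT TOP {n} " + select_sql[len("SELECT "):]
    return f"{select_sql} LIMIT {n}"
```

A mapping language that embeds SQL fragments inherits all of these differences, so either the fragments are database-specific or the language has to abstract over them.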

33

Page 34: d2rq Rdb2rdf Wg Slides

Links

• D2RQ homepage: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/

• D2RQ manual & language spec: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/

• Mailing list: [email protected]

34