Improve the way you create, manage and distribute information

www.innodata-isogen.com

INNOVATION INSPIRATION

Relational database integration with RDF/OWL

Bob DuCharme

December 7, 2006

XML 2006

About me

• Senior Consultant, Innodata Isogen

• weblog: http://www.snee.com/bobdc.blog

• other writing: See http://www.snee.com/bob

What is an RDF/OWL ontology?

• Ontology: “Computational formalization of a subject matter” (Bijan Parsia et al)

• Describe metadata about resource classes and their relationships

• Web Ontology Language (OWL): a W3C update of DAML+OIL

• Good fit with Knowledge Representation and other AI work

• Ontologies vs. traditional schemas

“Ontologies for the sake of ontologies”

• If metadata is data about data, what data is your metadata about?

• Field of Dreams attitude of many ontology developers

RDF in one slide

• A data model, not a syntax.

• Three-part statement called a triple:

(Subject, Predicate, Object)

• For example: (urn:isbn:0553213113, http://purl.org/dc/elements/1.1/creator, "Herman Melville")

• Great for loosely structured data, but…
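A minimal Python sketch of the triple model, assuming plain strings for URIs and literals. The `Triple` type, the `match` helper, and the second `urn:example:` statement are illustrative names and data, not part of the presentation.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = [
    # The statement from the slide.
    Triple("urn:isbn:0553213113", DC_CREATOR, "Herman Melville"),
    # A made-up second statement, only here to give the pattern something to filter out.
    Triple("urn:example:book2", DC_CREATOR, "Example Author"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

melville_books = match(triples, predicate=DC_CREATOR, obj="Herman Melville")
```

Leaving a position as `None` plays roughly the role of a variable such as `?s` in a SPARQL triple pattern.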

RDBMS integration with RDF/OWL

• This presentation: background + demo

• Paper accompanying presentation:

Use Cases

• Two address book databases that use different field names for the same data (e.g. workState, businessState)

• Find useful queries across the two that are easier in SPARQL than in SQL, thanks to RDF/OWL:

  • Who works in NY state?

  • List any phone numbers (home, mobile, business, etc.) that I have for Alfred Adams.

  • Find all info for Bobby Fischer at 2304 Eighth Lane, even if the other database lists him as Robert L. Fischer of 2304 8th Ln.

Basic Steps

• Generate data

• Load into MySQL

• Let D2RQ (RDBMS/RDF interface server) know about those databases

• Get a dump of representative RDF data

• Create ontology for that data

• Issue ontology-aware SPARQL queries against that data

Generate Data

• Fill out every field in a Eudora address book entry, export to CSV, see what’s there

• Repeat for Outlook

• Write a Python script to generate data, e.g.

"Miguel","[email protected]","Miguel Porter","Miguel","Porter","1462 Oak St.","Kitchener","TN","US","67117-2620","(364) 769-1070","(431) 985-7923","(850) 998-7790","http://www.radioshack.com/Miguel","RadioShack","","2109 Green Ave.","Boston","MP","US","48379-6760","(824) 959-5268","(354) 384-8517","(992) 963-9772","http://www.radioshack.com", "[email protected]","(748) 965-6871","","Here is a sample note.\n\nThat was two carriage returns."
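The generator script itself is not shown on the slide. The following is a rough sketch of what such a script could look like, assuming a shortened 11-field layout. The value lists, the `example.com` addresses, and the function names are all made up for illustration.

```python
import csv
import io
import random

# Illustrative value pools; a real script would use much longer lists.
FIRST_NAMES = ["Miguel", "Alfred", "Jill", "Sarah"]
LAST_NAMES = ["Porter", "Adams", "Jones", "Richardson"]
STREETS = ["Oak St.", "Green Ave.", "Maple Lane", "Eighth Lane"]
CITIES = ["Kitchener", "Boston", "El Paso", "New York"]
STATES = ["NY", "TN", "NE"]

def fake_phone(rng):
    return "({:03d}) {:03d}-{:04d}".format(
        rng.randint(100, 999), rng.randint(100, 999), rng.randint(0, 9999))

def fake_zip(rng):
    return "{:05d}-{:04d}".format(rng.randint(0, 99999), rng.randint(0, 9999))

def fake_entry(rng):
    """Build one row: nickname, email, full name, first, last, address,
    city, state, country, zip, phone (a subset of the real export)."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return [
        first,
        "{}.{}@example.com".format(first.lower(), last.lower()),
        first + " " + last,
        first,
        last,
        "{} {}".format(rng.randint(1000, 9999), rng.choice(STREETS)),
        rng.choice(CITIES),
        rng.choice(STATES),
        "US",
        fake_zip(rng),
        fake_phone(rng),
    ]

def generate_csv(count, seed=0):
    """Return `count` fully quoted CSV rows as one string."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    for _ in range(count):
        writer.writerow(fake_entry(rng))
    return out.getvalue()
```

Random picks can repeat a first and last name pair, so a real script would need to de-duplicate before loading into a table keyed on those two fields.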

Load into MySQL

CREATE DATABASE eudora;
USE eudora;

CREATE TABLE entries (
  nickname  VARCHAR(20),
  email1    VARCHAR(50),
  fullName  VARCHAR(30),
  firstName VARCHAR(15),
  lastName  VARCHAR(20),
  address   VARCHAR(60),
  # etc.
  PRIMARY KEY (lastName, firstName)
);
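The slide shows only the table definition, not the load step. As a sketch of loading CSV rows into such a table, the following uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The six-column subset and the `load_entries` name are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Only the columns visible on the slide; the real table has more ("# etc.").
COLUMNS = ["nickname", "email1", "fullName", "firstName", "lastName", "address"]

def load_entries(csv_text):
    """Create an in-memory entries table and insert one row per CSV line."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE entries ("
        " nickname VARCHAR(20), email1 VARCHAR(50), fullName VARCHAR(30),"
        " firstName VARCHAR(15), lastName VARCHAR(20), address VARCHAR(60),"
        " PRIMARY KEY (lastName, firstName))"
    )
    # Keep only the first six fields of each non-empty row.
    rows = [row[:len(COLUMNS)] for row in csv.reader(io.StringIO(csv_text)) if row]
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

With MySQL itself, the same rows would go in through a MySQL client library or a bulk-load statement; the parameterized insert pattern is the part that carries over.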

Tell D2RQ about databases

• Generate mapping files (command lines split):

  • generate-mapping -o eudoraMapping.ttl -u root -p mypw
      jdbc:mysql://localhost/eudora

  • generate-mapping -o outlookMapping.ttl -u root -p if27
      jdbc:mysql://localhost/outlook

• Combine two mapping files

• Start server with combined mapping file:

  • d2r-server comboMapping.ttl

Get some data to use for ontology creation

• SPARQL query:

CONSTRUCT { ?s ?p ?o }
WHERE { ?s ?p ?o }

• URL version: http://localhost:2020/sparql?query=CONSTRUCT+%7B+%3Fs+%3Fp+%3Fo+%7D+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D
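The URL version is the query text form-encoded into the `query` parameter. A small sketch of producing it with Python's standard library, where the `sparql_url` helper name is an assumption:

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:2020/sparql"  # the D2R server endpoint from the slide
DUMP_QUERY = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"

def sparql_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the 'query' parameter."""
    # urlencode turns spaces into '+' and percent-escapes '{', '}' and '?'.
    return endpoint + "?" + urlencode({"query": query})

dump_url = sparql_url(DUMP_QUERY)
```

Fetching `dump_url` from a running server would return every triple the mapping exposes, which is why it works as a full dump.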

rdfcat.xsl

• XSLT 1.0 stylesheet to create a single RDF file from a source file like this:

<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="myfile1.rdf"/>
  <xi:include href="myfile2.rdf"/>
  <xi:include href="myfile3.rdf"/>
</rdfcat>

List of files to concatenate together (rdfcat.rdf)

<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="http://localhost:2020/sparql?query=CONSTRUCT+%7B+%3Fs+%3Fp+%3Fo+%7D+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D"/>
  <!--xi:include href="properties.owl"/-->
</rdfcat>

• Short XSLT stylesheet reads listed resources, concatenates them together. Now we have RDF of sample data.
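The stylesheet itself is not reproduced here. As an illustration of the same concatenation idea in Python rather than XSLT, the sketch below merges the children of several `rdf:RDF` roots under one new root. The `concat_rdf` name is an assumption, and the sketch ignores `xml:base` and relative URIs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ROOT = "{%s}RDF" % RDF_NS
ET.register_namespace("rdf", RDF_NS)  # serialize with the usual rdf: prefix

def concat_rdf(documents):
    """Merge RDF/XML documents (given as strings) into one rdf:RDF document."""
    merged = ET.Element(RDF_ROOT)
    for text in documents:
        root = ET.fromstring(text)
        if root.tag != RDF_ROOT:
            raise ValueError("expected an rdf:RDF root element")
        # Move every top-level description into the merged document.
        merged.extend(list(root))
    return ET.tostring(merged, encoding="unicode")
```

This approach relies on RDF/XML letting node descriptions from different sources sit side by side under a single `rdf:RDF` element.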

Generate ontology

• Tell SWOOP to load an ontology… then just load a regular RDF file!

• Save it right away, see what you have.

• Add That Value:

  • Define more relationships between properties with Swoop

• Save it

• Look at the resulting ontology

New ontology rules

• Define equivalent fields in the two databases

• Declare “phone” property, name its subproperties (home, mobile, cell, work, business, fax…)

• email as an inverse functional property
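To make the effect of the first two rules concrete, here is a toy Python sketch of the entailments a reasoner draws from subproperty and equivalent-property declarations. It is not how Pellet works internally. The specific property pairings and the `ex:alfred` subject are assumptions, since the contents of properties.owl are not shown.

```python
def infer(triples, sub_property_of, equivalent):
    """Return the input triples plus those entailed by the property rules."""
    # Map each property to the properties its statements also imply.
    implies = {}
    for child, parent in sub_property_of:
        implies.setdefault(child, set()).add(parent)
    for a, b in equivalent:
        # Equivalence behaves like a subproperty link in both directions.
        implies.setdefault(a, set()).add(b)
        implies.setdefault(b, set()).add(a)

    inferred = set(triples)
    frontier = list(inferred)
    while frontier:
        s, p, o = frontier.pop()
        for implied in implies.get(p, ()):
            new = (s, implied, o)
            if new not in inferred:
                inferred.add(new)
                frontier.append(new)
    return inferred

# Assumed rules: workPhone is a kind of phone, and the two databases'
# work/business phone fields mean the same thing.
subs = [("eudora:entries_workPhone", "eudora:phone")]
equiv = [("eudora:entries_workPhone", "outlook:entries_businessPhone")]
data = {("ex:alfred", "outlook:entries_businessPhone", "(768) 629-3639")}
result = infer(data, subs, equiv)
```

A single stored Outlook business phone ends up also readable as a Eudora work phone and as a generic phone, which is what lets one query on the general property find numbers from both databases.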

Separate new rules into separate file

<rdfcat xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="http://localhost:2020/sparql?query=CONSTRUCT+%7B+%3Fs+%3Fp+%3Fo+%7D+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D"/>
  <xi:include href="properties.owl"/>
</rdfcat>

Issue Queries

• Who works in NY state?

• List any phone numbers (home, mobile, business, etc.) that I have for Alfred Adams.

• Find all info for Bobby Fischer at 2304 Eighth Lane, even if other database lists him as Robert L. Fischer of 2304 8th Ln.

• Sample running of pellet query (split onto two lines):

pellet -if file:///dat/xml/rdf/databaseint/sampleout.rdf -ifmt RDF/XML -qf atest1.spq

Who works in NY state?

PREFIX e: <http://localhost:2020/resource/eudora/>
PREFIX o: <http://localhost:2020/resource/outlook/>

SELECT * WHERE { ?s e:entries_workState "NY" }

--------------------------------------------------------------
Query Results (9 answers):
s
================
jill:Jones
sarah:Richardson
victor:Hernandez
elaine:Sanchez
annie:Butler
rodney:Jones
jesus:Wells
curtis:Barnes
crystal:Martin

Alfred Adams’ phone numbers

PREFIX e: <http://localhost:2020/resource/entries/>
PREFIX eud: <http://localhost:2020/resource/eudora/>

SELECT ?phoneType ?phone WHERE {
  ?s ?phoneType ?phone.
  ?s e:phone ?phone.
  ?s eud:entries_lastName "Adams".
  ?s eud:entries_firstName "Alfred".
}

-------------------------------------------------------
Query Results (13 answers):
phoneType                     | phone
================================================
outlook:entries_businessPhone | "(768) 629-3639"
eudora:entries_workPhone      | "(768) 629-3639"
eudora:entries_workFax        | "(865) 937-1192"
eudora:entries_workMobile     | "(262) 851-6276"
eudora:entries_otherPhone     | "(840) 290-6143"
eudora:entries_mobile         | "(257) 372-7719"
et cetera…
outlook:entries_mobilePhone   | "(257) 372-7719"

Bobby Fischer info

SELECT * WHERE { <http://localhost:2020/resource/entries/Bobby/Fisher> ?p ?o }

--------------------------------------------------------------------------------
Query Results (41 answers):
p                               | o
===============================================================================
eudora:entries_mobile           | "(989) 402-5141"
eudora:entries_workWebAddress   | "http://www.atmosenergy.com"
outlook:entries_lastName        | "Fisher"
eudora:entries_firstName        | "Bobby"
eudora:entries_state            | "NE"
eudora:entries_zip              | "29565-9670"
outlook:entries_businessPhone   | "(167) 559-3177"
eudora:entries_lastName         | "Fisher"
eudora:entries_workCity         | "El Paso"
eudora:phone                    | "(974) 270-6457"
# et cetera...
eudora:entries_country          | "US"
eudora:entries_otherPhone       | "(974) 270-6457"
outlook:entries_mobilePhone     | "(974) 270-6457"
outlook:entries_homePhone       | "(254) 133-8460"
eudora:entries_workMobile       | "(602) 997-9361"
eudora:entries_workAddress      | "3839 Maple Lane"
eudora:entries_workOrganization | "Atmos Energy"
eudora:entries_email1           | "[email protected]"
eudora:entries_fullName         | "Bobby Fisher"
eudora:entries_workTitle        | ""
outlook:entries_businessState   | "NE"
outlook:entries_firstName       | "Bobby"
eudora:entries_city             | "New York"
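One plausible reading of why a single subject returns both eudora: and outlook: properties is the inverse functional email rule: two records with the same email value are treated as the same individual. The Python sketch below illustrates that idea only. The `ex:email` property, the subject names, the street field name, and the address values are hypothetical.

```python
from collections import defaultdict

def same_individuals(triples, inverse_functional):
    """Group subjects that share a value for an inverse functional property."""
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p == inverse_functional:
            by_value[o].add(s)
    # Only values shared by two or more subjects tell us anything new.
    return [subjects for subjects in by_value.values() if len(subjects) > 1]

def describe(triples, subjects):
    """Collect every distinct (predicate, object) pair known for the subjects."""
    return sorted({(p, o) for s, p, o in triples if s in subjects})

# Hypothetical records for the same person in the two address books.
data = [
    ("eudora:Bobby/Fischer", "ex:email", "bf@example.com"),
    ("eudora:Bobby/Fischer", "eudora:entries_address", "2304 Eighth Lane"),
    ("outlook:Robert/Fischer", "ex:email", "bf@example.com"),
    ("outlook:Robert/Fischer", "outlook:entries_businessStreet", "2304 8th Ln."),
]
groups = same_individuals(data, "ex:email")
merged_info = describe(data, groups[0])
```

The sketch groups on one shared value at a time. A full reasoner also chains such identities transitively and applies them together with the other ontology rules.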

Caveats

• Querying disk file of full dump

• Scalable?
