Semantic Web Query Processing with Relational Databases Artem Chebotko [email protected] Department of Computer Science Wayne State University


Page 1: Semantic Web Query Processing  with Relational Databases

Semantic Web Query Processing with Relational Databases

Artem Chebotko [email protected]

Department of Computer Science, Wayne State University

Page 2: Semantic Web Query Processing  with Relational Databases

1/23/2007 2

Outline

The Semantic Web

RDF

SPARQL

Relational Storage of RDF data

SPARQL-to-SQL Translation

Relational Nested Optional Join

Page 3: Semantic Web Query Processing  with Relational Databases

1/23/2007 3

Page 4: Semantic Web Query Processing  with Relational Databases

1/23/2007 4

My Web page as seen by a Human

Page 5: Semantic Web Query Processing  with Relational Databases

1/23/2007 5

My Web page as seen by a Computer

Page 6: Semantic Web Query Processing  with Relational Databases

1/23/2007 6

My Web page with Semantics

<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">

<foaf:name>Artem Chebotko</foaf:name>

<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />

<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />

<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>

</foaf:Person>

Page 7: Semantic Web Query Processing  with Relational Databases

1/23/2007 7

The Semantic Web

A Web of data (vs. a Web of documents)

… machine-processable/readable data

Framework for integration and combination of data from various sources

Data reuse across application, organization, and community boundaries

Page 8: Semantic Web Query Processing  with Relational Databases

1/23/2007 8

The Semantic Web “Stack”

Page 9: Semantic Web Query Processing  with Relational Databases

1/23/2007 9

RDF

RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them. Anything can be a resource (e.g., a person, a file, etc.).

RDF provides a data model and a syntax

<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">

<foaf:name>Artem Chebotko</foaf:name>

<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />

<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />

<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>

</foaf:Person>

Page 10: Semantic Web Query Processing  with Relational Databases

1/23/2007 10

RDF Model

An RDF statement is a triple that consists of a subject, a predicate, and an object.

foaf="http://xmlns.com/foaf/0.1/"

<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">

<foaf:name>Artem Chebotko</foaf:name>

<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />

<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />

<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>

</foaf:Person>
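The triple view of this FOAF description can be sketched in Python. This is only an illustration: the tuple layout and the names `FOAF`, `ME`, and `triples` are assumptions made here, and the `rdf:type` triple implied by the `foaf:Person` element is left out.

```python
# Each RDF statement is a (subject, predicate, object) triple.
FOAF = "http://xmlns.com/foaf/0.1/"          # namespace bound to the foaf prefix
ME = "http://www.cs.wayne.edu/~artem/ID"     # the resource being described

triples = [
    (ME, FOAF + "name", "Artem Chebotko"),
    (ME, FOAF + "homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, FOAF + "img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ME, FOAF + "workplaceHomepage", "http://www.cs.wayne.edu"),
]

# Print one statement per line: subject, predicate, object.
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

All four statements share the same subject, which is why they appear as child elements of a single `foaf:Person` element in the RDF/XML.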

Page 11: Semantic Web Query Processing  with Relational Databases

1/23/2007 11

RDF Model

RDF’s graph model: each statement is an edge in a graph, labeled with the predicate and directed from the subject node to the object node.

[Figure: RDF graph. The node http://www.cs.wayne.edu/~artem/ID has four outgoing edges: foaf:name to the literal "Artem Chebotko", foaf:homepage to http://www.cs.wayne.edu/~artem, foaf:img to http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg, and foaf:workplaceHomepage to http://www.cs.wayne.edu]

Page 12: Semantic Web Query Processing  with Relational Databases

1/23/2007 12

SPARQL

SPARQL is an RDF query language

Graph pattern matching: basic graph patterns, optional graph patterns, etc.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?url FROM <my-foaf-data.rdf>

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }

Query 1: Find the homepage URL of Artem Chebotko

Result 1: ?url is bound to the value “http://www.cs.wayne.edu/~artem”

?url

http://www.cs.wayne.edu/~artem
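How a basic graph pattern is matched against the data can be sketched as follows. This is a naive in-memory matcher written for illustration, not how a SPARQL engine is implemented; the function names `match_pattern` and `match_bgp` are hypothetical, and any term starting with `?` is assumed to be a variable.

```python
ME = "http://www.cs.wayne.edu/~artem/ID"
triples = [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ME, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if result.setdefault(term, value) != value:
                return None                           # conflicts with an earlier binding
        elif term != value:                           # constant must match exactly
            return None
    return result

def match_bgp(patterns, data):
    """Return every binding that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions

query1 = [
    ("?someone", "foaf:name", "Artem Chebotko"),
    ("?someone", "foaf:homepage", "?url"),
]
urls = [b["?url"] for b in match_bgp(query1, triples)]
print(urls)  # ['http://www.cs.wayne.edu/~artem']
```

The shared variable `?someone` forces both patterns to match triples with the same subject, which is what ties the name to the homepage.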

Page 13: Semantic Web Query Processing  with Relational Databases

1/23/2007 13

SPARQL

Query 2: Find both the homepage and weblog of Artem Chebotko

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?url ?log FROM <my-foaf-data.rdf>

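WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

?someone foaf:weblog ?log . }

Result 2: no solution is returned. The data contains no foaf:weblog triple, so the whole pattern fails to match and neither ?url nor ?log is bound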

?url ?log

Page 14: Semantic Web Query Processing  with Relational Databases

1/23/2007 14

SPARQL

Query 3: Find (1) the homepage of Artem Chebotko and

(2) his weblog if this information is available

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?url ?log FROM <my-foaf-data.rdf>

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .}

}

Result 3: ?url is bound to “http://www.cs.wayne.edu/~artem” and ?log is unbound

?url ?log

http://www.cs.wayne.edu/~artem
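The effect of OPTIONAL in Query 3 can be sketched as a left join over sets of variable bindings. This is a simplified illustration rather than the full SPARQL semantics; `compatible` and `left_join` are hypothetical helper names, and the weblog URL in the second call is invented for the example.

```python
ME = "http://www.cs.wayne.edu/~artem/ID"

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def left_join(required, optional):
    """Extend each required solution where possible; otherwise keep it unchanged."""
    out = []
    for r in required:
        extended = [{**r, **o} for o in optional if compatible(r, o)]
        out.extend(extended or [r])
    return out

# Solutions of the required part of Query 3.
required = [{"?someone": ME, "?url": "http://www.cs.wayne.edu/~artem"}]

# The data has no foaf:weblog triple, so the OPTIONAL part has no solutions.
result = left_join(required, [])
print(result)                      # ?url is bound, ?log is absent (unbound)

# Hypothetical case: a weblog triple exists for the same person.
with_log = left_join(required, [{"?someone": ME, "?log": "http://example.org/weblog"}])
print(with_log[0]["?log"])
```

Unlike Query 2, the required solution survives even when the optional part contributes nothing.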

Page 15: Semantic Web Query Processing  with Relational Databases

1/23/2007 15

SPARQL

Basic semantics of OPTIONAL patterns

The evaluation of an OPTIONAL clause is not obligated to succeed. If it fails, no value is returned for the variables in the SELECT clause that it leaves unbound.

Semantics of shared variables

In general, shared variables must be bound to the same values. A variable can be shared among subjects, among predicates, among objects, and across these positions.

More complicated semantics follows …

Page 16: Semantic Web Query Processing  with Relational Databases

1/23/2007 16

SPARQL

Semantics of parallel OPTIONAL patterns

While the failure of the evaluation of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause, the success of the evaluation of an OPTIONAL clause obligates the same variables in the following parallel OPTIONAL clauses to be bound to the same values.

Page 17: Semantic Web Query Processing  with Relational Databases

1/23/2007 17

SPARQL

Query 4: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?url ?log ?work FROM <my-foaf-data.rdf>

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log . }

OPTIONAL { ?someone foaf:workplaceHomepage ?work . }

}

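Result 4: ?url is bound to “http://www.cs.wayne.edu/~artem”, ?log is unbound, and ?work is bound to “http://www.cs.wayne.edu”

?url | ?log | ?work

http://www.cs.wayne.edu/~artem | (unbound) | http://www.cs.wayne.edu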

What if …

OPTIONAL { ?someone foaf:workplaceHomepage ?log .}

Page 18: Semantic Web Query Processing  with Relational Databases

1/23/2007 18

SPARQL

Semantics of nested OPTIONAL patterns

Before an OPTIONAL clause is evaluated, all containing basic graph patterns or OPTIONAL clauses must have succeeded.

Page 19: Semantic Web Query Processing  with Relational Databases

1/23/2007 19

SPARQL

Query 5: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available and the weblog is available

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?url ?log ?work FROM <my-foaf-data.rdf>

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?someone foaf:workplaceHomepage ?work . }

}

}

Result 5: ?url is bound to “http://www.cs.wayne.edu/~artem”; ?log and ?work are unbound, because the nested OPTIONAL is not evaluated once the weblog pattern fails

?url ?log ?work

http://www.cs.wayne.edu/~artem

Page 20: Semantic Web Query Processing  with Relational Databases

1/23/2007 20

Relational Storage of RDF data

Increasing amount of RDF data on the Web highlights the need for its efficient and effective management.

Using relational database technology as a basis for storing and querying RDF data is a reasonable choice as this technology is well understood and known to have good performance.

Page 21: Semantic Web Query Processing  with Relational Databases

1/23/2007 21

Relational Storage of RDF data

The simplest one: table Triples

More complicated (and more efficient) storage schemas are possible

subject | predicate | object

http://www.cs.wayne.edu/~artem/ID | foaf:name | Artem Chebotko

http://www.cs.wayne.edu/~artem/ID | foaf:homepage | http://www.cs.wayne.edu/~artem

http://www.cs.wayne.edu/~artem/ID | foaf:img | http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg

http://www.cs.wayne.edu/~artem/ID | foaf:workplaceHomepage | http://www.cs.wayne.edu
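A minimal sketch of this single-table schema, using Python's built-in `sqlite3` module. SQLite is an assumption made here for convenience; the slides report experiments with Oracle and an in-memory relational database. The `TEXT` column types are also an assumption.

```python
import sqlite3

# In-memory database holding the three-column Triples table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")

me = "http://www.cs.wayne.edu/~artem/ID"
rows = [
    (me, "foaf:name", "Artem Chebotko"),
    (me, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (me, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (me, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM Triples").fetchone()[0]
print(count)  # 4
```

Every RDF statement becomes one row, so any RDF graph fits the schema, but every triple pattern in a query turns into another self-join of the same table.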

Page 22: Semantic Web Query Processing  with Relational Databases

1/23/2007 22

SPARQL-to-SQL Translation

Problem: Relational databases “know” SQL, but not SPARQL

Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database

Algorithm BGPtoSQL translates a SPARQL basic graph pattern into its SQL equivalent

Algorithm SPARQLtoSQL translates SPARQL queries with arbitrarily complex optional graph patterns

Page 23: Semantic Web Query Processing  with Relational Databases

1/23/2007 23

BGPtoSQL

Basic idea:

Step 1: Assign a unique table alias to every triple pattern (e.g., t1 and t2), and construct the FROM clause to contain all the table aliases

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }

FROM Triples t1, Triples t2

Page 24: Semantic Web Query Processing  with Relational Databases

1/23/2007 24

BGPtoSQL

Step 2: Construct the SELECT clause to contain every relational attribute that corresponds to a distinct variable

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }

SELECT t1.subject AS someone, t2.object AS url

FROM Triples t1, Triples t2

Page 25: Semantic Web Query Processing  with Relational Databases

1/23/2007 25

BGPtoSQL

Step 3: Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }

SELECT t1.subject AS someone, t2.object AS url

FROM Triples t1, Triples t2

WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' AND

t2.predicate = 'foaf:homepage'

Page 26: Semantic Web Query Processing  with Relational Databases

1/23/2007 26

BGPtoSQL

Step 4: Create an inverted list for variables and finish the WHERE clause: attributes that correspond to a shared variable must have the same value

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }

SELECT t1.subject AS someone, t2.object AS url

FROM Triples t1, Triples t2

WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' AND

t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject

Inverted list:

?someone: t1.subject, t2.subject

?url: t2.object
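The four steps can be sketched in Python as follows. This is an illustrative reading of the slides, not the authors' BGPtoSQL implementation: the function name `bgp_to_sql`, the tuple encoding of triple patterns, and the use of SQLite are assumptions, and the pattern is assumed to contain at least one variable.

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object")

def bgp_to_sql(patterns):
    """Translate a list of (s, p, o) triple patterns into one SQL query string."""
    aliases = [f"t{i}" for i in range(1, len(patterns) + 1)]   # Step 1: one alias per pattern
    where, inverted = [], {}
    for alias, pattern in zip(aliases, patterns):
        for column, term in zip(COLUMNS, pattern):
            attr = f"{alias}.{column}"
            if term.startswith("?"):                            # variable: record its attribute
                inverted.setdefault(term[1:], []).append(attr)  # Step 4: inverted list
            else:                                               # Step 3: restrict to the constant
                where.append(f"{attr} = '" + term.replace("'", "''") + "'")
    select = []
    for var, attrs in inverted.items():
        select.append(f"{attrs[0]} AS {var}")                   # Step 2: one column per variable
        where += [f"{attrs[0]} = {a}" for a in attrs[1:]]       # Step 4: shared variables agree
    sql = "SELECT " + ", ".join(select)
    sql += " FROM " + ", ".join(f"Triples {a}" for a in aliases)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql

bgp = [("?someone", "foaf:name", "Artem Chebotko"),
       ("?someone", "foaf:homepage", "?url")]
sql = bgp_to_sql(bgp)
print(sql)

# Run the generated query against a small Triples table.
me = "http://www.cs.wayne.edu/~artem/ID"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    (me, "foaf:name", "Artem Chebotko"),
    (me, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
])
rows = conn.execute(sql).fetchall()
print(rows)
```

For the example pattern, the generated string is the same query that the slides build step by step, and it returns a single row pairing the subject with the homepage URL.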

Page 27: Semantic Web Query Processing  with Relational Databases

1/23/2007 27

SPARQLtoSQL

Step 1: Translate every basic graph pattern (BGP) to SQL with BGPtoSQL. Here the four BGPs, in order of appearance, yield q1, q2, q3, and q4

SELECT ?url ?log ?topic

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?url foaf:topic ?topic . }

}

OPTIONAL { ?someone <http://www.example.org/blog> ?log . }

}

Page 28: Semantic Web Query Processing  with Relational Databases

1/23/2007 28

SPARQLtoSQL

Step 2: Join the ‘relations’ (q1, q2, q3, q4) with LEFT OUTER JOIN, in the order in which their corresponding graph patterns appear in the query

SELECT ?url ?log ?topic

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?url foaf:topic ?topic . }

}

OPTIONAL { ?someone <http://www.example.org/blog> ?log . }

}

Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log

FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)

Page 29: Semantic Web Query Processing  with Relational Databases

1/23/2007 29

SPARQLtoSQL

SELECT ?url ?log ?topic

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?url foaf:topic ?topic . }

}

OPTIONAL { ?someone <http://www.example.org/blog> ?log . }

}

Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic

FROM (Q) r11 LEFT OUTER JOIN (q3) r22 ON (

r11.url = r22.url AND r11.log IS NOT NULL)

Page 30: Semantic Web Query Processing  with Relational Databases

1/23/2007 30

SPARQLtoSQL

SELECT ?url ?log ?topic

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?url foaf:topic ?topic . }

}

OPTIONAL { ?someone <http://www.example.org/blog> ?log . }

}

Q = SELECT r111.someone AS someone, r111.url AS url,

COALESCE(r111.log,r222.log) AS log, r111.topic AS topic

FROM (Q) r111 LEFT OUTER JOIN (q4) r222 ON (

r111.someone = r222.someone

AND (r111.log = r222.log OR r111.log IS NULL) )

Page 31: Semantic Web Query Processing  with Relational Databases

1/23/2007 31

SPARQLtoSQL

Step 3: Project only required attributes (variables)

SELECT ?url ?log ?topic

WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .

OPTIONAL { ?someone foaf:weblog ?log .

OPTIONAL { ?url foaf:topic ?topic . }

}

OPTIONAL { ?someone <http://www.example.org/blog> ?log . }

}

SELECT r.url AS url, r.log AS log, r.topic AS topic

FROM (Q) r

Page 32: Semantic Web Query Processing  with Relational Databases

1/23/2007 32

SPARQLtoSQL

Almost complete query (q1, q2, q3, and q4 still need to be replaced by their SQL):

SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN (q3) r22 ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN (q4) r222 ON (
    r111.someone = r222.someone
    AND (r111.log = r222.log OR r111.log IS NULL) )
) r
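To see the translated query run, the sketch below substitutes hand-written SQL for q1 through q4 and executes the result in SQLite. The four subqueries are an assumption about what BGPtoSQL would produce for each basic graph pattern, SQLite stands in for the database used in the slides, and the blog URL inserted at the end is invented for the example.

```python
import sqlite3

me = "http://www.cs.wayne.edu/~artem/ID"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    (me, "foaf:name", "Artem Chebotko"),
    (me, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
])

# Assumed translations of the four basic graph patterns.
q1 = ("SELECT t1.subject AS someone, t2.object AS url FROM Triples t1, Triples t2 "
      "WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' "
      "AND t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject")
q2 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:weblog'")
q3 = ("SELECT t1.subject AS url, t1.object AS topic FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:topic'")
q4 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'http://www.example.org/blog'")

full_query = f"""
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM ({q1}) r1 LEFT OUTER JOIN ({q2}) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN ({q3}) r22 ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN ({q4}) r222 ON (
    r111.someone = r222.someone
    AND (r111.log = r222.log OR r111.log IS NULL) )
) r
"""

# No weblog, topic, or blog triples: only ?url is bound.
before = conn.execute(full_query).fetchall()
print(before)

# Hypothetical blog triple: the parallel OPTIONAL now binds ?log via COALESCE.
conn.execute("INSERT INTO Triples VALUES (?, ?, ?)",
             (me, "http://www.example.org/blog", "http://example.org/artem-blog"))
after = conn.execute(full_query).fetchall()
print(after)
```

The `r11.log IS NOT NULL` condition keeps the nested OPTIONAL from matching when its parent failed, while `COALESCE` together with `r111.log IS NULL` lets the parallel OPTIONAL supply `?log` when the first one did not.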

Page 33: Semantic Web Query Processing  with Relational Databases

1/23/2007 33

Experimental Study

Dataset: WordNet, 700,000+ triples

The translation algorithms are very efficient and scalable. For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than 0.001 sec., regardless of the clause tree layout.

The evaluation of most sample queries in Oracle proved unsatisfactory (on the order of seconds), mainly because of the simple relational schema.

Note that this does not imply that the algorithms are impractical. SPARQLtoSQL does not depend directly on a particular database schema as long as a BGPtoSQL stub for that schema is provided, which we believe is a reasonable expectation of existing RDF storage systems.

Page 34: Semantic Web Query Processing  with Relational Databases

1/23/2007 34

Experimental Study

The evaluation of sample queries in the in-memory relational database showed much better results.

In these experiments, we were able to try different implementations of the left outer join based on nested-loops, sort-merge, and simple hash methods.

Page 35: Semantic Web Query Processing  with Relational Databases

1/23/2007 35

Relational Nested Optional Join

Page 36: Semantic Web Query Processing  with Relational Databases

1/23/2007 36

New Example

[Figure: RDF graph with a schema layer (classes Professor and GradStudent, properties hasAdvisor and hasCoadvisor) and an instance layer (Tom, Jerry, Jeff, and Natalia, connected by rdf:type, hasAdvisor, and hasCoadvisor edges)]

Page 37: Semantic Web Query Processing  with Relational Databases

1/23/2007 37

New Example

Retrieve:

(1) every graduate student in the RDF graph;

(2) the student's advisor if this information is available;

(3) the student's coadvisor if this information is available and if the student's advisor has been successfully retrieved in the previous step.

In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor for a student who has no advisor.

Page 38: Semantic Web Query Processing  with Relational Databases

1/23/2007 38

Motivation: Computation Waste with LOJ

[Figure: evaluation plan with left outer joins (LOJ)]

R1 (stu): (Jerry), (Natalia)

R2 (stu, adv): (Jerry, Tom)

R3 (stu, coadv): (Jerry, Jeff), (Natalia, Tom)

R4 = R1 LOJ R2 on R1.stu = R2.stu; R4 (stu, adv): (Jerry, Tom), (Natalia, NULL)

Rres = R4 LOJ R3 on R4.stu = R3.stu AND R4.adv IS NOT NULL; Rres (stu, adv, coadv): (Jerry, Tom, Jeff), (Natalia, NULL, NULL)
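The plan can be reproduced with ordinary left outer joins. The sketch below uses SQLite through Python's `sqlite3` module, which is an assumption made here; the table names R1, R2, and R3 follow the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R1 (stu TEXT);
CREATE TABLE R2 (stu TEXT, adv TEXT);
CREATE TABLE R3 (stu TEXT, coadv TEXT);
INSERT INTO R1 VALUES ('Jerry'), ('Natalia');
INSERT INTO R2 VALUES ('Jerry', 'Tom');
INSERT INTO R3 VALUES ('Jerry', 'Jeff'), ('Natalia', 'Tom');
""")

# R4 = R1 LOJ R2, then Rres = R4 LOJ R3 with the extra NOT NULL check.
rows = conn.execute("""
SELECT R4.stu, R4.adv, R3.coadv
FROM (SELECT R1.stu AS stu, R2.adv AS adv
      FROM R1 LEFT OUTER JOIN R2 ON R1.stu = R2.stu) R4
LEFT OUTER JOIN R3 ON R4.stu = R3.stu AND R4.adv IS NOT NULL
ORDER BY R4.stu
""").fetchall()
print(rows)  # [('Jerry', 'Tom', 'Jeff'), ('Natalia', None, None)]
```

Natalia has a coadvisor in R3 but no advisor, so the `IS NOT NULL` check is what keeps her coadvisor out of the result. The second join still has to consider her row against R3 even though it can never match, which is the waste the slide title refers to.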

Page 39: Semantic Web Query Processing  with Relational Databases

1/23/2007 39

Nested Optional Join

A novel relational operator to translate nested optional patterns

An alternative to the left outer join

Joins twin relations (base relation + optional relation)

A base relation: tuples that have a potential to satisfy a join condition if used in a nested optional join.

An optional relation: tuples that are guaranteed to fail a join condition if used in a nested optional join.

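[Figure: the nested optional join of twin relations (R^b, R^o) and (S^b, S^o) yields a twin relation (Q^b, Q^o). For tuples r and s, the join condition r(a) = s(b) is tested: if true, the joined tuple rs is output to Q^b; if false, the NULL-padded tuple rn is output to Q^o. Tuples of R^o are NULL-padded to rn without testing the condition.]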

Page 40: Semantic Web Query Processing  with Relational Databases

1/23/2007 40

SPARQL-to-SQL Translation with NOJ

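[Figure: evaluation plan with nested optional joins (NOJ) over twin relations]

(R1^b, R1^o): R1^b (stu) = (Jerry), (Natalia); R1^o (stu) is empty

(R2^b, R2^o): R2^b (stu, adv) = (Jerry, Tom); R2^o (stu, adv) is empty

(R3^b, R3^o): R3^b (stu, coadv) = (Jerry, Jeff), (Natalia, Tom); R3^o (stu, coadv) is empty

(R4^b, R4^o) = (R1^b, R1^o) NOJ (R2^b, R2^o) on (R1^b, R1^o).stu = (R2^b, R2^o).stu; R4^b (stu, adv) = (Jerry, Tom); R4^o (stu, adv) = (Natalia, NULL)

(Rres^b, Rres^o) = (R4^b, R4^o) NOJ (R3^b, R3^o) on (R4^b, R4^o).stu = (R3^b, R3^o).stu; Rres^b (stu, adv, coadv) = (Jerry, Tom, Jeff); Rres^o (stu, adv, coadv) = (Natalia, NULL, NULL)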

Page 41: Semantic Web Query Processing  with Relational Databases

1/23/2007 41

Nested Optional Join

NOJ vs. LOJ

The NOJ processes the tuples that are guaranteed to be NULL-padded very efficiently, in linear time

The NOJ does not require the NOT NULL check to return correct results

NOJ algorithms

Nested-loops NOJ algorithm (NL-NOJ)

Sort-merge NOJ algorithm (SM-NOJ)

Simple hash NOJ algorithm (SH-NOJ)
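The idea behind a nested-loops NOJ can be sketched in Python. This is a rough illustration based only on the definitions in these slides, not the authors' NL-NOJ algorithm: the function name, the dictionary representation of tuples, and the choice to pass only the base part of the right operand (its optional part is empty throughout the slides' example) are all assumptions.

```python
def nested_optional_join(r_base, r_opt, s_base, r_key, s_key, s_extra):
    """Join the twin relation (r_base, r_opt) with the base tuples s_base.

    Returns (q_base, q_opt). Matched tuples go to q_base. Base tuples without a
    match, and every tuple already in r_opt, are NULL-padded and go to q_opt.
    """
    pad = {col: None for col in s_extra}
    q_base, q_opt = [], []
    for r in r_base:                                  # nested loops over base tuples only
        matches = [s for s in s_base if r[r_key] == s[s_key]]
        if matches:
            q_base += [{**r, **{c: s[c] for c in s_extra}} for s in matches]
        else:
            q_opt.append({**r, **pad})
    q_opt += [{**r, **pad} for r in r_opt]            # linear pass, join condition never tested
    return q_base, q_opt

r1_b = [{"stu": "Jerry"}, {"stu": "Natalia"}]
r2_b = [{"stu": "Jerry", "adv": "Tom"}]
r3_b = [{"stu": "Jerry", "coadv": "Jeff"}, {"stu": "Natalia", "coadv": "Tom"}]

r4_b, r4_o = nested_optional_join(r1_b, [], r2_b, "stu", "stu", ["adv"])
res_b, res_o = nested_optional_join(r4_b, r4_o, r3_b, "stu", "stu", ["coadv"])
print(res_b)  # [{'stu': 'Jerry', 'adv': 'Tom', 'coadv': 'Jeff'}]
print(res_o)  # [{'stu': 'Natalia', 'adv': None, 'coadv': None}]
```

Because Natalia already sits in the optional relation after the first join, the second join pads her tuple without comparing it to anything, and no `IS NOT NULL` condition is needed to keep her coadvisor out of the result.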

Page 42: Semantic Web Query Processing  with Relational Databases

1/23/2007 42

Nested Optional Join

Queries with joins with low selectivity factors (<0.0002)

Page 43: Semantic Web Query Processing  with Relational Databases

1/23/2007 43

Nested Optional Join

For in-memory evaluation, by join selectivity factor (JSF):

JSF <= 0.005: SH-NOJ

0.005 < JSF < 0.8: SM-NOJ

JSF >= 0.8: NL-NOJ
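These thresholds amount to a simple selection rule, sketched below. The function name `choose_noj_algorithm` is hypothetical, and the thresholds are copied from this slide, so they apply only to the in-memory setting it describes.

```python
def choose_noj_algorithm(jsf: float) -> str:
    """Pick a NOJ algorithm from the join selectivity factor (JSF)."""
    if jsf <= 0.005:
        return "SH-NOJ"   # simple hash for very selective joins
    if jsf >= 0.8:
        return "NL-NOJ"   # nested loops when most tuple pairs join
    return "SM-NOJ"       # sort-merge in between

print(choose_noj_algorithm(0.0002))  # SH-NOJ
```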

Page 44: Semantic Web Query Processing  with Relational Databases

1/23/2007 44

Possible Future Work

Extending our work to support other SPARQL constructs, such as UNION, FILTER, etc.

Adding intelligence to our SPARQL-to-SQL translation to support the nested optional join.

Investigating possible optimizations of parallel optional graph patterns.

Defining the relational algebra for SPARQL with the support of nested and parallel optional joins.

… and more

Page 45: Semantic Web Query Processing  with Relational Databases

1/23/2007 45

References

Artem Chebotko, Mustafa Atay, Shiyong Lu and Farshad Fotouhi. "Extending Relational Databases with a Nested Optional Join for Efficient Semantic Web Query Processing". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, November 2006.

Artem Chebotko, Shiyong Lu, Hasan M. Jamil and Farshad Fotouhi. "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, May 2006.

Page 46: Semantic Web Query Processing  with Relational Databases

1/23/2007 46

Acknowledgements

Dr. Shiyong Lu, Dr. Farshad Fotouhi, Dr. Hasan Jamil, Dr. Mustafa Atay,

Oracle DBA Shwetal Joshi

Questions?

Thank you!