Ontop at Work

Mariano Rodríguez-Muro (1), Roman Kontchakov (2), Michael Zakharyaschev (2)

(1) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
(2) Department of Computer Science and Information Systems, Birkbeck, University of London, U.K.

May 22nd, 2013

OXFORD'13 Optimising OWL 2 QL query rewriting

DESCRIPTION

OXFORD 2013: presentation on the query rewriting approach taken in Ontop/Quest, which separates reasoning with respect to hierarchies and existential constants using mapping transformation techniques and a specialised query rewriting algorithm.

OBDA: What is it?

Loosely speaking: using ontologies to access data.

[Figure: components of an OBDA setting: user query, ontology (TBox), mappings, OBDA system, (virtual) ABox, RDBMS data source.]

Our focus is on OWL 2 QL ontologies, since they are tailored to handle very large amounts of data by means of query rewriting techniques.

Query Answering by Query Rewriting

Objective: given a query Q over the ontology T, derive a query Q′ over the database D that preserves the semantics of T.

Consider a TBox T:

Movie ≡ ∃title, Movie ⊑ ∃year, Movie ≡ ∃cast, ∃cast⁻ ⊑ Person,
Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person, Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.

Example

The database D: two DB relations title[m, t, y] and castinfo[p, m, r].

The mapping M (logical form, think R2RML):

Movie(m) ← title(m, t, y),
title(m, t) ← title(m, t, y),
year(m, y) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r),
Person(p) ← castinfo(p, m, r),
Actor(p) ← castinfo(p, m, "c1"),
· · ·
Editor(p) ← castinfo(p, m, "c6").
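
The mapping can be read as a recipe for producing the (virtual) ABox from the rows of D. The following is a minimal Python sketch of that reading, not Ontop code: the rows are invented for illustration, `virtual_abox` is a hypothetical helper, and only the role codes "c1" and "c6" named on the slide are covered.

```python
# Toy instance of the two relations (rows are invented for illustration).
title = [("m1", "Alien", 1979)]                       # title[m, t, y]
castinfo = [("p1", "m1", "c1"), ("p2", "m1", "c6")]   # castinfo[p, m, r]

# Only the two role codes that the slide spells out.
ROLE_CONCEPTS = {"c1": "Actor", "c6": "Editor"}

def virtual_abox(title_rows, castinfo_rows):
    """Apply the mapping M rule by rule and return the set of ABox facts."""
    facts = set()
    for m, t, y in title_rows:
        facts.add(("Movie", m))
        facts.add(("title", m, t))
        facts.add(("year", m, y))
    for p, m, r in castinfo_rows:
        facts.add(("cast", m, p))
        facts.add(("Person", p))
        if r in ROLE_CONCEPTS:
            facts.add((ROLE_CONCEPTS[r], p))
    return facts

abox = virtual_abox(title, castinfo)
for fact in sorted(abox, key=str):
    print(fact)
```

An OBDA system never materialises this set; the sketch only makes explicit which facts the mapping stands for.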

The classic OBDA architecture

[Figure: the CQ q and the ontology T are combined by rewriting into an FO query q′; q′ and the mapping are combined by unfolding into SQL; the data D and the mapping give the ABox A by ABox virtualisation.]

Stages in the classic OBDA approach:

1. Rewriting w.r.t. T,
2. Unfolding w.r.t. M,
3. Execution over D.

Unfolding and mappings are ignored in most of the OBDA literature.

Example: Rewriting

Given the query Q

q(x) ← Person(x)

rewriting w.r.t. T gives

q(x) ← Person(x)
q(x) ← cast(z, x)
q(x) ← Actor(x)
· · ·
q(x) ← Editor(x)
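
For an atomic query such as q(x) ← Person(x), the rewriting adds one disjunct per inclusion of T with Person on the right-hand side. A minimal Python sketch under that assumption follows; `SUBSUMED_BY` and `rewrite_atomic` are hypothetical names, the table lists only the inclusions for Person, and chains of inclusions are not followed.

```python
# Left-hand sides of the inclusions of T whose right-hand side is Person.
# ("exists_inv", "cast") stands for the range concept of cast.
SUBSUMED_BY = {
    "Person": ["Actor", "Actress", "Producer", "Director", "Writer", "Editor",
               ("exists_inv", "cast")],
}

def rewrite_atomic(concept):
    """Return the bodies of the UCQ rewriting of q(x) <- concept(x)."""
    bodies = [f"{concept}(x)"]
    for sub in SUBSUMED_BY.get(concept, []):
        if isinstance(sub, tuple):
            kind, role = sub
            # Range of the role: x is the second argument; domain: the first.
            bodies.append(f"{role}(z, x)" if kind == "exists_inv" else f"{role}(x, z)")
        else:
            bodies.append(f"{sub}(x)")
    return bodies

rewriting = rewrite_atomic("Person")
for body in rewriting:
    print("q(x) <-", body)
```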

Example: Unfolding

Given the query Q

q(x) ← Person(x)

unfolding its rewriting w.r.t. M gives

q(x1) ← castinfo(x1, m, r)
q(x2) ← castinfo(x2, m, r)
q(x3) ← castinfo(x3, m, "c1")
· · ·
q(x8) ← castinfo(x8, m, "c6")
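
Unfolding replaces every ontology atom by its definition in the mapping, which in SQL terms yields one SELECT per disjunct joined by UNION. The sketch below illustrates this for four of the eight disjuncts; it assumes the columns of castinfo are literally named p, m and r, and `MAPPING` and `unfold` are hypothetical names rather than Ontop's API.

```python
# Mapping M restricted to atoms that occur in the rewriting of Person(x).
# Each entry is (answer column, relation, optional filter), using the
# column names p, m, r of castinfo[p, m, r].
MAPPING = {
    "Person(x)": ("p", "castinfo", None),
    "cast(z, x)": ("p", "castinfo", None),
    "Actor(x)": ("p", "castinfo", "r = 'c1'"),
    "Editor(x)": ("p", "castinfo", "r = 'c6'"),
}

def unfold(ucq):
    """Replace every disjunct of the UCQ by its SQL definition and UNION them."""
    selects = []
    for atom in ucq:
        column, relation, condition = MAPPING[atom]
        sql = f"SELECT {column} AS x FROM {relation}"
        if condition:
            sql += f" WHERE {condition}"
        selects.append(sql)
    return "\nUNION\n".join(selects)

sql = unfold(["Person(x)", "cast(z, x)", "Actor(x)", "Editor(x)"])
print(sql)
```

The first two SELECTs come out identical and the filtered ones are subsumed by them, which is the redundancy discussed on the following slides.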

Issues

The issues with these rewritings are:

– Large size (n1 ∗ … ∗ n2)
– Largely redundant (w.r.t. query containment)

In the literature we find two solutions. The first is to encode the rewriting as a Datalog program. For example, given the query

q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)

we generate the rewriting

q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
Person(x) ← cast(m, x)
Person(x) ← Actor(x)
· · ·
Person(x) ← Editor(x)

Issues

First solution from the literature: encoding the rewriting as a Datalog program.

But the resulting query still needs to be unfolded into an SQL query. There are two choices here:

– Generate SQL queries with nested UNIONs, which is very bad for performance.
– Expand the program into a UCQ, which takes us back to square one.
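
To see why expanding into a UCQ leads back to the size problem, the Datalog program for the example query can be expanded by choosing one alternative per atom. A minimal Python sketch, where `PERSON_ALTERNATIVES` and `expand` are hypothetical names and "_" marks a variable that occurs nowhere else:

```python
from itertools import product

# Bodies of the Datalog rules for Person (plus Person itself),
# written as templates over the argument of the atom.
PERSON_ALTERNATIVES = [
    "Person({0})", "cast(_, {0})", "Actor({0})", "Actress({0})",
    "Producer({0})", "Director({0})", "Writer({0})", "Editor({0})",
]

def expand(query, alternatives):
    """Expand a Datalog-encoded rewriting into the bodies of a UCQ."""
    choices = []
    for predicate, args in query:
        templates = alternatives.get(predicate)
        if templates is None:
            # No rules for this predicate: the atom stays as it is.
            choices.append([predicate + "(" + ", ".join(args) + ")"])
        else:
            choices.append([t.format(*args) for t in templates])
    return [list(body) for body in product(*choices)]

query = [("Person", ["x"]), ("Person", ["y"]),
         ("cast", ["m", "x"]), ("cast", ["m", "z"])]
ucq = expand(query, {"Person": PERSON_ALTERNATIVES})
print(len(ucq))  # 64 = 8 * 8 * 1 * 1
```

Each atom contributes a factor equal to its number of alternatives, so the size of the UCQ is a product over the atoms of the query.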

Issues (cont.)

Second solution from the literature: using query containment to clean the output. For example, to detect that

q(x1) ← castinfo(x1, m, r)
q(x2) ← castinfo(x2, m, r)
q(x3) ← castinfo(x3, m, "c1")
· · ·
q(x8) ← castinfo(x8, m, "c6")

can be simplified to

q(x1) ← castinfo(x1, m, r)
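
Containment of conjunctive queries can be decided by searching for a homomorphism from the containing query into the contained one. The following is a minimal backtracking sketch in Python, not the procedure used by any particular system; the encoding is an assumption (a query is a pair of head variables and body atoms, and a term starting with a double quote is a constant).

```python
def is_var(term):
    """Terms written with a leading double quote are constants."""
    return not term.startswith('"')

def contained_in(q1, q2):
    """True if every answer to the CQ q1 is also an answer to the CQ q2."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    def extend(h, src, dst):
        # Try to extend the homomorphism h so that src is mapped onto dst.
        h = dict(h)
        for s, d in zip(src, dst):
            if is_var(s):
                if h.setdefault(s, d) != d:
                    return None
            elif s != d:
                return None
        return h

    def search(h, atoms):
        if not atoms:
            return True
        (pred, args), rest = atoms[0], atoms[1:]
        for pred1, args1 in body1:
            if pred1 == pred and len(args1) == len(args):
                h2 = extend(h, args, args1)
                if h2 is not None and search(h2, rest):
                    return True
        return False

    h0 = extend({}, head2, head1)
    return h0 is not None and search(h0, body2)

general = (("x1",), [("castinfo", ("x1", "m", "r"))])
actor = (("x3",), [("castinfo", ("x3", "m", '"c1"'))])
print(contained_in(actor, general))   # True: the "c1" disjunct is redundant
print(contained_in(general, actor))   # False
```

The search is exponential in the worst case, which is one reason why cleaning large rewritings this way is costly.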

Issues (cont.)

Using query containment to clean the output has its own problems:

– Query containment is an extremely expensive operation.
– We are working with large sets of queries.

Roots of the problem

There are three main reasons for large CQ rewritings and unfoldings:

(E) Sub-queries of q with existentially quantified variables can be folded in many different ways to match the canonical model (existential trees), e.g., for

Person ⊑ ∃hasFather.Person

and the query

q(x) ← hasFather(x, y), hasFather(y, z)

(H) The concepts and roles for atoms in q can have many sub-concepts and sub-roles according to T.

(M) The mapping M can have multiple definitions of the ontology terms.

Most of the proposed rewriting techniques try to tame (E).

More about (E)

(E):

– is incurable in theory,
– is independent of (H) and (M).

However:

– Rewriting algorithms deal with (E) and (H) at the same time.
– Real-world Qs and Ts generate few queries when (E) is dealt with in isolation.
– Even artificially constructed Qs and Ts become simple.

The most serious issues in query rewriting are (H) and (M).

In Ontop we deal with (H) and (M) separately from (E). We do it through T-mappings and TreeWitness rewritings.

Dealing with (H) and (M): T-Mappings

A T-mapping MT is a transformation of M that enforces all (H) entailments (H-completeness). Formally,

if M ⊨ A(c) and T ⊨ A ⊑ B, then MT ⊨ B(c).

T-mapping example 1 (domain/range)

Consider two DB relations title[m, t, y] and castinfo[p, m, r] and an ontology MO describing the film domain as follows:

Movie ≡ ∃cast

Let M be the following mapping:

Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r).

Since the ontology entails ∃cast ⊑ Movie, the T-mapping adds the rule

Movie(m) ← castinfo(p, m, r).

T-Mappings: Example 2

T-mapping example 2 (hierarchies)

Consider a TBox T:

Actor ⊑ Person, Actress ⊑ Person, Producer ⊑ Person, Director ⊑ Person, Writer ⊑ Person, Editor ⊑ Person.

The mapping M:

Actor(p) ← castinfo(p, m, "c1"),
· · ·
Editor(p) ← castinfo(p, m, "c6").

T-Mappings: Example 2 (cont.)

For the TBox T and the mapping M of example 2, the T-mapping additionally contains one rule for Person per sub-concept:

Person(p) ← castinfo(p, m, "c1"),
· · ·
Person(p) ← castinfo(p, m, "c6").
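
Both examples follow the same pattern: every rule for a sub-concept (or for the domain or range of a role) is copied to the super-concept until nothing new is added. A minimal Python sketch of this saturation, assuming a hypothetical encoding of rules as (predicate, head variables, body) and of inclusions as (sub, super); it covers only the two role codes named on the slides and is not the construction implemented in Ontop:

```python
# Concept inclusions entailed by T, as (sub, super).
# ("exists", R) is the domain of role R, ("exists_inv", R) its range.
INCLUSIONS = [
    ("Actor", "Person"),
    ("Editor", "Person"),
    (("exists", "cast"), "Movie"),
    (("exists_inv", "cast"), "Person"),
]

# Mapping M: (predicate, head variables, body).
M = [
    ("Movie", ("m",), "title(m, t, y)"),
    ("cast", ("m", "p"), "castinfo(p, m, r)"),
    ("Actor", ("p",), 'castinfo(p, m, "c1")'),
    ("Editor", ("p",), 'castinfo(p, m, "c6")'),
]

def t_mapping(mapping, inclusions):
    """Saturate the mapping with the inclusions until a fixpoint is reached."""
    rules = set(mapping)
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for pred, args, body in list(rules):
                if sub == pred and len(args) == 1:
                    new = (sup, args, body)
                elif sub == ("exists", pred) and len(args) == 2:
                    new = (sup, (args[0],), body)
                elif sub == ("exists_inv", pred) and len(args) == 2:
                    new = (sup, (args[1],), body)
                else:
                    continue
                if new not in rules:
                    rules.add(new)
                    changed = True
    return rules

for rule in sorted(t_mapping(M, INCLUSIONS)):
    print(rule)
```

The result contains Movie(m) ← castinfo(p, m, r) from example 1 and the Person rules from example 2, before any of the optimisations on the following slides are applied.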

Optimising T-mappings

T-mappings allow us to deal with hierarchical reasoning (H) at the level of the unfolding. At this point, we can exploit

– DB dependencies and
– SQL expressivity

to reduce, and often avoid, the exponential growth coming from (H) and (M).

Optimising with Dependencies

A first optimisation is query containment (w.r.t. dependencies).

Example. Consider the previous example. Since T ⊨ ∃cast ⊑ Movie, the T-mapping contains:

Movie(m) ← title(m, t, y),
Movie(m) ← castinfo(p, m, r).

The latter rule is redundant since IMDb contains the foreign key

castinfo(p, m, r) ⇝ title(m, t, y),

that is, every movie m occurring in castinfo also occurs in title.

This step is crucial to reduce the growth due to inferences related to domain and range.
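
The foreign-key argument can be turned into a simple pruning step. The sketch below is a deliberately restricted illustration in Python: it assumes every rule projects a single column of a single relation without filters, that foreign keys are acyclic, and that the columns are literally named as in title[m, t, y] and castinfo[p, m, r]; `prune` is a hypothetical helper.

```python
# Foreign keys as (relation, column) -> (referenced relation, column).
FOREIGN_KEYS = {("castinfo", "m"): ("title", "m")}

# T-mapping rules for Movie as (concept, relation, projected column).
RULES = [
    ("Movie", "title", "m"),
    ("Movie", "castinfo", "m"),
]

def prune(rules, foreign_keys):
    """Drop rules whose answers are covered by another rule via a foreign key."""
    kept = []
    for concept, relation, column in rules:
        target = foreign_keys.get((relation, column))
        # Redundant if the referenced column already defines the same concept.
        if target is not None and (concept, *target) in rules:
            continue
        kept.append((concept, relation, column))
    return kept

print(prune(RULES, FOREIGN_KEYS))
```

A full implementation would use conjunctive query containment under dependencies instead of this syntactic check.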

Optimising with SQL expressivity

Observation. The only means for perfect reformulations to deal with (H) is through disjunction (UNION). DBMSs are not good at planning UNIONs.

However, at the level of the unfolding and mappings, we have full SQL expressivity (e.g., disjunction (OR), inequalities, etc.).

Objective: given a T-mapping, define mapping transformations that entail the same ABox using fewer mappings, while ensuring that the encoding used is efficient during execution.

Optimising with SQL expressivity

Use OR and inequalities to re-express mappings for hierarchies and discriminant columns.

Dealing with discriminant columns

For example, the T-mapping for IMDb and MO contains six rules for Person, one per sub-concept:

Person(p) ← castinfo(p, m, "c1")
· · ·
Person(p) ← castinfo(p, m, "c6")

These can be reduced to a single rule:

Person(p) ← castinfo(p, m, r), (r = "c1") ∨ · · · ∨ (r = "c6").
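
Merging rules over a discriminant column amounts to grouping them by everything except the constant and emitting one disjunctive filter. A minimal Python sketch, assuming a hypothetical rule encoding (concept, relation, key column, discriminant column, value) and column names p and r taken from castinfo[p, m, r]; only the two role codes named on the slide are used:

```python
from collections import defaultdict

# T-mapping rules for Person over the discriminant column r.
RULES = [
    ("Person", "castinfo", "p", "r", "c1"),
    ("Person", "castinfo", "p", "r", "c6"),
]

def merge_discriminants(rules):
    """Merge rules that differ only in the discriminant value into one SQL query."""
    groups = defaultdict(list)
    for concept, relation, key, column, value in rules:
        groups[(concept, relation, key, column)].append(value)
    merged = []
    for (concept, relation, key, column), values in groups.items():
        condition = " OR ".join(f"{column} = '{v}'" for v in values)
        merged.append((concept, f"SELECT {key} FROM {relation} WHERE {condition}"))
    return merged

for concept, sql in merge_discriminants(RULES):
    print(concept, "<-", sql)
```

One query with a disjunctive filter replaces a UNION of one SELECT per role code, which is the form that the observation about DBMS planning of UNIONs argues against.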

The architecture of Ontop

[Figure: the Ontop architecture. Labels: CQ q, ontology T, UCQ q_tw, T-mapping, mapping M, dependencies Σ, SQL, data D, ABox A, H-complete ABox A. Steps: tw-rewriting, unfolding, ABox virtualisation, ABox completion, completion, SQO.]

Highlights: (H) and (M) are dealt with by T-mappings, the rewriting is done for H-complete ABoxes, and SQO is used extensively over the unfolding.

Other Optimisations in Ontop

We also apply other important optimisations during system setup and at query time. The most important are:

Equivalence simplification: simplify the ontology vocabulary w.r.t. equivalence (keep one representative of each equivalence class).

Semantic Query Optimisation: optimise each generated query individually (see the Semantic Query Optimisation slides).

Emptiness indexes: keep track of empty predicates.

Results

A summary of the results we have observed using this architecture:

– Mappings per class/property are few.
– Query rewritings are small.
– SQL queries generated like this often correspond to what a human expert would have written.
– Execution of SPARQL queries with entailments is fast, often much faster than in triple stores.

Query rewriting can be done efficiently.

Benchmarks

[Figure: results for queries R1–R5, Q1–Q6 and V7–V10 on OWLIM, Stardog and Ontop; logarithmic axis from 0.1 to 1,000,000.]

Benchmark: LUBMex, 200 universities (30M triples). Systems: OWLIM (forward chaining), Stardog (rewriting), Ontop/DB2.

Ontop/DB2

– returns immediately for 5/15 queries,
– is faster than the rest in 12/15 queries.

Summary

Results so far:

– Efficiently dealt with the exponential growth from (H) and (M).
– Use of dependencies and CQC/SQO to minimise and optimise mapping rules.
– We exploit SQL expressivity to transform mappings and minimise their number.

OWL 2 QL query answering with query rewriting is efficient, and materialisation is not required.

Ontop is available as a SPARQL endpoint, as an OWLAPI and Sesame library, and as a Protégé 4 plugin, with many more features (SPARQL, R2RML). It is permanently under development but stable enough to be used seriously in many projects, including Optique.

Current work is applying these techniques to more expressive settings, e.g., OWL + Rules, OWL 2 EL, OWL 2 RL, through a hybrid approach.

Semantic Query Optimisation

Consider the query

q(t, y) ← Movie(m), title(m, t), year(m, y), (y > 2010)

By straightforwardly applying the unfolding to q_tw and the T-mapping M above, we obtain the query

q′_tw(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),

which requires two (potentially) expensive join operations. However, by using the primary key m of title we obtain:

q″_tw(t, y) ← title(m, t, y), (y > 2010).
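
The primary-key step can be sketched as a self-join elimination: atoms over the same relation that agree on the key are merged, and their remaining arguments are unified. The Python below is a minimal single-pass illustration that assumes all arguments are variables and leaves the filter (y > 2010) aside; `eliminate_self_joins` is a hypothetical helper, not Ontop's implementation.

```python
def eliminate_self_joins(body, key_position, answer_vars):
    """Merge atoms over the same relation that share a primary-key variable."""
    representative = {}  # variable -> variable it has been unified with

    def find(var):
        while var in representative:
            var = representative[var]
        return var

    def unify(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        if a in answer_vars:      # keep answer variables as representatives
            representative[b] = a
        else:
            representative[a] = b

    first_atom = {}  # (relation, key variable) -> arguments of the first atom seen
    for relation, args in body:
        key = (relation, find(args[key_position[relation]]))
        if key in first_atom:
            # Same key value means same row: unify position by position.
            for a, b in zip(first_atom[key], args):
                unify(a, b)
        else:
            first_atom[key] = args
    return [(relation, tuple(find(a) for a in args))
            for (relation, _), args in first_atom.items()]

body = [
    ("title", ("m", "t0", "y0")),
    ("title", ("m", "t", "y1")),
    ("title", ("m", "t2", "y")),
]
print(eliminate_self_joins(body, {"title": 0}, {"t", "y"}))
```

On the unfolded query above, the three title atoms collapse into the single atom title(m, t, y).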

Semantic Query Optimisation

Semantic Query Optimisation (SQO) is a field of DB theory focused on the optimisation of queries w.r.t. dependencies.

Semantic Query Optimisation in DB and OBDA

– While some SQO techniques reached industrial RDBMSs, SQO never had a strong impact on the database community.
– In OBDA, in contrast, SQL queries are generated automatically, and so SQO is the only tool to reach optimal queries.

In practice, an OBDA system must implement at least SQO w.r.t. primary keys and foreign keys to deal with the disparities between the RDF and relational models.

Why does it work?

DBs are created through standard practices that generate features that are the focus of the previous optimisations.

Starting from a rich conceptual schema, we encode it in a relational schema by:

– amalgamating N-to-1 and 1-to-1 attributes of an entity into a single n-ary relation with a primary key identifying the entity (e.g., title with title and year),
– using foreign keys over attribute columns when a column refers to the entity (e.g., name and castinfo),
– using type-discriminant columns to encode hierarchical information (e.g., castinfo).

As this process is universal, the T-mappings created for the resulting databases are dramatically simplified by the Ontop optimisations.