OXFORD'13 Optimising OWL 2 QL query rewriting


Ontop at Work

Mariano Rodríguez-Muro (1), Roman Kontchakov (2), Michael Zakharyaschev (2)

(1) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
(2) Department of Computer Science and Information Systems, Birkbeck, University of London, U.K.

May 22nd, 2013

...

OBDA: What is it?

Loosely speaking:

Using ontologies to access data.

[Figure: an OBDA system. Components: user query, ontology (TBox), mappings, (virtual) ABox, RDBMS data source.]

Our focus is on OWL 2 QL ontologies, since they are tailored to handle very large amounts of data by means of query rewriting techniques.

...

Query Answering by Query rewriting

Objective:

Given a query Q over the ontology T, derive a query Q′ over the database D that preserves the semantics of T.

Consider a TBox T:

Movie ≡ ∃title,  Movie ⊑ ∃year,
Movie ≡ ∃cast,  ∃cast⁻ ⊑ Person,
Actor ⊑ Person,  Actress ⊑ Person,
Producer ⊑ Person,  Director ⊑ Person,
Writer ⊑ Person,  Editor ⊑ Person.


...

Example

The database D: two DB relations, title[m, t, y] and castinfo[p, m, r].

The mapping M (logical form, think R2RML):

Movie(m) ← title(m, t, y),
title(m, t) ← title(m, t, y),
year(m, y) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r),
Person(p) ← castinfo(p, m, r),
Actor(p) ← castinfo(p, m, "c1"),
· · ·
Editor(p) ← castinfo(p, m, "c6").
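To make the role of the mapping concrete, the following minimal Python sketch applies the rules of M to a toy database and collects the facts of the virtual ABox. The rows and the function name `virtual_abox` are made up for illustration; only the rules whose role codes are given on the slide ("c1" for Actor, "c6" for Editor) are modelled.

```python
# Toy instance of the two relations; the rows are invented for illustration.
title = [(1, "Alien", 1979), (2, "Heat", 1995)]   # title[m, t, y]
castinfo = [(10, 1, "c1"), (11, 2, "c6")]         # castinfo[p, m, r]


def virtual_abox(title_rows, castinfo_rows):
    """Apply each mapping rule of M and collect the resulting ABox facts."""
    facts = set()
    for m, t, y in title_rows:
        facts.add(("Movie", m))          # Movie(m) <- title(m, t, y)
        facts.add(("title", m, t))       # title(m, t) <- title(m, t, y)
        facts.add(("year", m, y))        # year(m, y) <- title(m, t, y)
    for p, m, r in castinfo_rows:
        facts.add(("cast", m, p))        # cast(m, p) <- castinfo(p, m, r)
        facts.add(("Person", p))         # Person(p) <- castinfo(p, m, r)
        if r == "c1":
            facts.add(("Actor", p))      # Actor(p) <- castinfo(p, m, "c1")
        if r == "c6":
            facts.add(("Editor", p))     # Editor(p) <- castinfo(p, m, "c6")
    return facts


abox = virtual_abox(title, castinfo)
```

In an OBDA system this ABox is never materialised; the sketch only shows which facts the mapping stands for.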


...

The classic OBDA architecture

[Figure: the classic OBDA pipeline. Components: CQ q, ontology T, FO query q′, mapping, SQL, data D, ABox A. Steps: rewriting, unfolding, ABox virtualisation.]

Stages in the classic OBDA approach:

- Rewriting w.r.t. T,
- Unfolding w.r.t. M,
- Execution over D.

Unfolding and mappings are ignored in most of the OBDA literature.


...

Example: Rewriting

Given the query Q

q(x) ← Person(x)

the rewriting is

q(x) ← Person(x)
q(x) ← cast(z, x)
q(x) ← Actor(x)
. . .
q(x) ← Editor(x)
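As a rough illustration of where these eight CQs come from, here is a minimal Python sketch that rewrites a single concept atom using the sub-concepts and the range axiom of the example TBox. The dictionaries and the function `rewrite_atomic` are hypothetical names, and the sketch assumes a flat, one-level hierarchy; it is not the rewriting algorithm of any particular system.

```python
# Sub-concepts of Person named in the example TBox T.
SUBCLASSES = {
    "Person": ["Actor", "Actress", "Producer", "Director", "Writer", "Editor"],
}
# Roles whose range is included in the concept (from the axiom "exists cast^- subClassOf Person").
RANGE_OF = {"Person": ["cast"]}


def rewrite_atomic(concept, var="x"):
    """Return one CQ (as a string) per way of deriving concept(var) from T."""
    bodies = [f"{concept}({var})"]
    bodies += [f"{role}(z, {var})" for role in RANGE_OF.get(concept, [])]
    bodies += [f"{sub}({var})" for sub in SUBCLASSES.get(concept, [])]
    return [f"q({var}) <- {body}" for body in bodies]


rewriting = rewrite_atomic("Person")
for cq in rewriting:
    print(cq)
```

For a query with several atoms, each atom contributes its own list of alternatives, and a UCQ rewriting has to combine them, which is the source of the size problem discussed later.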


...

Example: Unfolding

Given the query Q

q(x) ← Person(x)

unfolding its rewriting w.r.t. M gives

q(x1) ← castinfo(x1, m, r)
q(x2) ← castinfo(x2, m, r)
q(x3) ← castinfo(x3, m, "c1")
. . .
q(x8) ← castinfo(x8, m, "c6")
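The unfolding step can be sketched in Python with the standard `sqlite3` module: each ontology predicate is replaced by the SQL definition given by the mapping, and the results are combined with UNION. The table contents, the dictionary `MAPPING_SQL` and the function `unfold` are assumptions made for illustration, and only the four predicates whose mapping rules are spelled out on the slides are covered.

```python
import sqlite3

# SQL definition of the answer variable x for each ontology predicate,
# read off the mapping M (castinfo[p, m, r]).
MAPPING_SQL = {
    "Person": "SELECT p AS x FROM castinfo",
    "cast": "SELECT p AS x FROM castinfo",  # x is the second argument of cast
    "Actor": "SELECT p AS x FROM castinfo WHERE r = 'c1'",
    "Editor": "SELECT p AS x FROM castinfo WHERE r = 'c6'",
}


def unfold(predicates):
    """Replace each predicate by its SQL definition and take the UNION."""
    return "\nUNION\n".join(MAPPING_SQL[pred] for pred in predicates)


sql = unfold(["Person", "cast", "Actor", "Editor"])

# Run the unfolded query over a toy database (rows invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castinfo (p INTEGER, m INTEGER, r TEXT)")
conn.executemany(
    "INSERT INTO castinfo VALUES (?, ?, ?)",
    [(10, 1, "c1"), (11, 2, "c6"), (12, 2, "c3")],
)
answers = sorted(row[0] for row in conn.execute(sql))
first_branch_only = sorted(row[0] for row in conn.execute(MAPPING_SQL["Person"]))
```

On this toy data the first UNION branch alone already returns every answer, which is the redundancy the following slides are concerned with.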


...

Issues

The issues with these rewritings are:

- Large size (n1 ∗ ... ∗ n2)
- Largely redundant (w.r.t. query containment)

In the literature we find two solutions:

- Encoding the rewriting as a Datalog program. For example, given the query:

q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)

we generate the rewriting:

q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
Person(x) ← cast(m, x)
Person(x) ← Actor(x)
. . .
Person(x) ← Editor(x)


But:

The query still needs to be unfolded into an SQL query. There are two choices here:

- Generate SQL queries with nested UNIONs. Very bad for performance.
- Expand into a UCQ. Back to square one.


...

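Issues (cont.)

The issues with these rewritings are:

- Large size (n1 ∗ ... ∗ n2)

The second solution found in the literature:

- Using query containment to clean the output. For example, to detect that the unfolding

q(x1) ← castinfo(x1, m, r)
. . .
q(x8) ← castinfo(x8, m, "c6")

can be simplified to

q(x1) ← castinfo(x1, m, r)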
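A containment check between conjunctive queries can be sketched as a search for a homomorphism: q1 is contained in q2 when the atoms of q2 can be mapped into the atoms of q1 while sending head to head. The Python below is a minimal sketch under stated assumptions: queries are (head, body) pairs, terms are strings, and a term written with surrounding double quotes is a constant. The names `contained_in` and `prune` are hypothetical. The backtracking search is exponential in the worst case, which matches the cost concern raised on the slides.

```python
def is_const(term):
    """Terms written with surrounding double quotes are constants."""
    return term.startswith('"')


def bind(h, src, dst):
    """Extend homomorphism h with src -> dst, or return None on a clash."""
    if is_const(src):
        return h if src == dst else None
    if h.get(src, dst) != dst:
        return None
    return {**h, src: dst}


def contained_in(q1, q2):
    """True if q1 is contained in q2 (a homomorphism maps q2 into q1)."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    def search(i, h):
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 != pred or len(args1) != len(args):
                continue
            h2 = h
            for s, d in zip(args, args1):
                h2 = bind(h2, s, d)
                if h2 is None:
                    break
            if h2 is not None and search(i + 1, h2):
                return True
        return False

    h0 = {}
    for s, d in zip(head2, head1):
        h0 = bind(h0, s, d)
        if h0 is None:
            return False
    return search(0, h0)


def prune(ucq):
    """Drop every CQ that is contained in another CQ of the union."""
    kept = []
    for q in ucq:
        if any(contained_in(q, k) for k in kept):
            continue
        kept = [k for k in kept if not contained_in(k, q)] + [q]
    return kept


q_general = (("x",), [("castinfo", ("x", "m", "r"))])
q_actor = (("x",), [("castinfo", ("x", "m", '"c1"'))])
q_editor = (("x",), [("castinfo", ("x", "m", '"c6"'))])
pruned = prune([q_actor, q_general, q_editor])
```

Here `pruned` keeps only the general query over castinfo, since both queries with a role constant are contained in it.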


But:

- Query containment is an extremely expensive operation.
- We are working with large sets of queries.


...

Roots of the problem

There are 3 main reasons for large CQ rewritings and unfoldings:

(E) Sub-queries of q with existentially quantified variables can be folded in many different ways to match the canonical model (existential trees), e.g.,

Person ⊑ ∃hasFather.Person

and the query

q(x) ← hasFather(x, y), hasFather(y, z)

(H) The concepts and roles for atoms in q can have many sub-concepts and sub-roles according to T.

(M) The mapping M can have multiple definitions of the ontology terms.

Most of the proposed rewriting techniques try to tame (E).


...

More about (E)

About (E):

- it is in theory incurable,
- it is independent of (H) and (M).

However:

- Rewriting algorithms deal with (E) and (H) at the same time.
- Real-world Qs and Ts generate few queries when dealing with (E) in isolation.
- Even artificially constructed Qs and Ts become simple.

The strongest issues in query rewriting are (H) and (M).

In Ontop we deal with (H) and (M) separately from (E). We do it through T-mappings and TreeWitness rewritings.


...

Dealing with (H) and (M): T-Mappings

A T-mapping MT is a transformation of M that enforces all (H) entailments (H-completeness). Formally,

if M ⊨ A(c) and T ⊨ A ⊑ B, then MT ⊨ B(c).

T-mapping example 1 (domain/range):

Consider two DB relations title[m, t, y] and castinfo[p, m, r] and an ontology MO describing the film domain as follows:

Movie ≡ ∃cast

Let M be the following mapping:

Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r).


The T-mapping MT consists of the rules of M together with the rule entailed by ∃cast ⊑ Movie:

Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r),
Movie(m) ← castinfo(p, m, r).
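The completion of a mapping into a T-mapping can be sketched as a fixpoint computation: whenever a rule defines A and T entails A ⊑ B, a rule for B with the same body is added. The Python below is a minimal sketch, not Ontop's implementation; the rule encoding, the `("exists", role, position)` convention for ∃role and ∃role⁻, and the function name `t_mapping` are assumptions made for illustration, and role inclusions are left out.

```python
# Mapping rules as (head predicate, head arguments, body) triples.
M = [
    ("Movie", ("m",), "title(m, t, y)"),
    ("cast", ("m", "p"), "castinfo(p, m, r)"),
]

# Inclusions (sub, sup): sub is a concept name, or ("exists", role, position)
# with position 0 for the domain of the role and 1 for its range.
INCLUSIONS = [(("exists", "cast", 0), "Movie")]


def t_mapping(mapping, inclusions):
    """Saturate the mapping with the rules entailed by the inclusions."""
    rules = list(mapping)
    changed = True
    while changed:
        changed = False
        for head, args, body in list(rules):
            for sub, sup in inclusions:
                if isinstance(sub, tuple):
                    _, role, pos = sub
                    if head != role:
                        continue
                    new_rule = (sup, (args[pos],), body)
                else:
                    if head != sub:
                        continue
                    new_rule = (sup, args, body)
                if new_rule not in rules:
                    rules.append(new_rule)
                    changed = True
    return rules


completed = t_mapping(M, INCLUSIONS)
```

The same function covers plain hierarchies: an inclusion such as `("Actor", "Person")` turns every rule for Actor into an additional rule for Person.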


...

T-Mappings: Example 2

T-mapping example 2 (hierarchies):

Consider a TBox T:

Actor ⊑ Person,  Actress ⊑ Person,
Producer ⊑ Person,  Director ⊑ Person,
Writer ⊑ Person,  Editor ⊑ Person.

The mapping M:

Actor(p) ← castinfo(p, m, "c1"),
· · ·
Editor(p) ← castinfo(p, m, "c6").


The T-mapping MT additionally contains one rule for Person per sub-concept:

Person(p) ← castinfo(p, m, "c1"),
· · ·
Person(p) ← castinfo(p, m, "c6").


...

Optimising T-mappings

T-mappings allow us to deal with hierarchical reasoning (H) at the level of the unfolding. At this point, we can exploit

- DB dependencies and
- SQL expressivity

to reduce, and often eliminate, the exponential growth coming from (H) and (M).


...

Optimising with Dependencies

A first optimisation is query containment (w.r.t. dependencies).

Example:

Consider the previous example. Since T ⊨ ∃cast ⊑ Movie, the T-mapping contains:

Movie(m) ← title(m, t, y),
Movie(m) ← castinfo(p, m, r).

The latter rule is redundant, since IMDb contains the foreign key

castinfo(p, m, r) ⇝ title(m, t, y)

(every movie m occurring in castinfo also occurs in title).

This step is crucial to reduce the growth due to inferences related to domain and range.
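The redundancy test for this example can be sketched in a few lines of Python. The sketch assumes that the foreign key runs from castinfo.m to title.m, that rules have no selection conditions, and that foreign keys are acyclic; the rule encoding and the function name `drop_fk_redundant` are hypothetical.

```python
# Foreign keys as ((relation, column), (referenced relation, referenced column)).
# Assumption: every m in castinfo also occurs in title.
FOREIGN_KEYS = {(("castinfo", "m"), ("title", "m"))}

# T-mapping rules for a unary predicate:
# (predicate, relation, column that supplies the individual).
rules = [("Movie", "title", "m"), ("Movie", "castinfo", "m")]


def drop_fk_redundant(mapping_rules, fks):
    """Drop a rule when a foreign key makes another rule for the same predicate cover it."""
    kept = []
    for pred, rel, col in mapping_rules:
        covered = any(
            other_pred == pred and ((rel, col), (other_rel, other_col)) in fks
            for other_pred, other_rel, other_col in mapping_rules
        )
        if not covered:
            kept.append((pred, rel, col))
    return kept


optimised = drop_fk_redundant(rules, FOREIGN_KEYS)
```

Only the rule over title survives, because every individual produced by the castinfo rule is already produced by the title rule.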


...

Optimising with SQL expressivity

Observation. The only means for perfect reformulations to deal with (H) is through disjunction (UNION). DBMSs are not good at planning UNIONs.

However, at the level of the unfolding and mappings, we have full SQL expressivity (e.g., disjunction (OR), inequalities, etc.).

Objective:

Given a T-mapping, define mapping transformations that entail the same ABox using fewer mappings, while ensuring that the encoding used is efficient during execution.



Use OR and inequalities to re-express mappings for hierarchies and discriminant columns.

Dealing with discriminant columns:

For example, the T-mapping for IMDb and MO contains six rules for Person, one for each sub-concept:

Person(p) ← castinfo(p, m, "c1"),
· · ·
Person(p) ← castinfo(p, m, "c6").

These can be reduced to a single rule:

Person(p) ← castinfo(p, m, r), (r = "c1") ∨ · · · ∨ (r = "c6").
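The effect of this transformation on the generated SQL can be checked with a small Python sketch using `sqlite3`. The intermediate role codes "c2" to "c5" are assumed placeholders (the slides only show "c1" and "c6"), the table rows are invented, and the function names are hypothetical.

```python
import sqlite3

# Discriminant values for the six sub-concepts of Person;
# "c2".."c5" are assumed placeholders, only "c1" and "c6" appear on the slides.
ROLE_CODES = ["c1", "c2", "c3", "c4", "c5", "c6"]


def as_union(codes):
    """One SELECT per rule, combined with UNION (the six-rule encoding)."""
    return "\nUNION\n".join(
        f"SELECT p FROM castinfo WHERE r = '{code}'" for code in codes
    )


def as_single_rule(codes):
    """A single SELECT with a disjunction over the discriminant column."""
    condition = " OR ".join(f"r = '{code}'" for code in codes)
    return f"SELECT DISTINCT p FROM castinfo WHERE {condition}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castinfo (p INTEGER, m INTEGER, r TEXT)")
conn.executemany(
    "INSERT INTO castinfo VALUES (?, ?, ?)",
    [(10, 1, "c1"), (10, 2, "c4"), (11, 2, "c6"), (12, 3, "c9")],
)

union_rows = sorted(conn.execute(as_union(ROLE_CODES)).fetchall())
single_rows = sorted(conn.execute(as_single_rule(ROLE_CODES)).fetchall())
```

Both encodings return the same individuals on this data, but the single rule scans castinfo once instead of planning a six-way UNION.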


...

The architecture of Ontop

[Figure: the Ontop pipeline. Components: CQ q, ontology T, UCQ qtw, T-mapping, mapping M, dependencies Σ, SQL, data D, ABox A, H-complete ABox A. Steps: tw-rewriting, unfolding, completion, ABox virtualisation, ABox completion, SQO.]

Highlights: (H) and (M) are dealt with by T-mappings, rewriting is done for H-complete ABoxes, and SQO is used extensively over the unfolding.


...

Other Optimisations in Ontop

We also apply other important optimisations during system setup and at query time. The most important are:

- Equivalence simplification: simplify the ontology vocabulary w.r.t. equivalence (keep one representative of each equivalence class).
- Semantic Query Optimisation: optimise each generated query individually (see the slides on Semantic Query Optimisation).
- Emptiness indexes: keep track of empty predicates.


...

Results

A summary of the results we have observed using this architecture:

- Mappings per class/property are few.
- Query rewritings are small.
- SQL queries generated like this often correspond to what a human expert would have generated.
- Query execution of SPARQL with entailments is fast, often much faster than in triple stores.

Query rewriting can be done efficiently.


...

Benchmarks

[Figure: chart with a logarithmic vertical axis from 0.1 to 1000000, comparing OWLIM, Stardog and Ontop on queries R1-R5, Q1-Q6 and V7-V10.]

Benchmark: LUBMex, 200 Unis (30M triples). Systems: OWLIM (forward chaining), Stardog (rewriting), Ontop/DB2.

Ontop/DB2

- returns immediately for 5/15 queries,
- is faster than the rest in 12/15 queries.

...

Summary

Results so far:

- Efficiently dealt with exponential growth from (H) and (M).
- Use of dependencies and CQC/SQO to minimise and optimise mapping rules.
- We exploit SQL expressivity to transform mappings and minimise their number.

OWL 2 QL query answering with query rewriting is efficient, and materialisation is not required.

Ontop is available as a SPARQL end-point, as an OWLAPI and Sesame library, and as a Protege 4 plugin. Many more features (SPARQL, R2RML). Permanently under development, but stable enough to be used seriously in many projects, incl. Optique.

Current work is applying these techniques to more expressive settings, e.g., OWL + Rules, OWL 2 EL, OWL 2 RL, through a hybrid approach.


...

Semantic Query Optimisation

Consider the query

q(t, y) ← Movie(m), title(m, t), year(m, y), (y > 2010)

By straightforwardly applying the unfolding to qtw and the T-mapping M above, we obtain the query

q′tw(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),

which requires two (potentially) expensive join operations. However, by using the primary key m of title we obtain:

q′′tw(t, y) ← title(m, t, y), (y > 2010).
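The primary-key step can be sketched in Python as a merge of atoms over the same relation that agree on the key argument: since the key determines the whole tuple, the remaining arguments are unified position by position. This is a minimal sketch under stated assumptions (all terms are variables, the key is a single argument, and no two distinct answer variables end up unified); the function name `eliminate_self_joins` is hypothetical and the filter (y > 2010) is left untouched.

```python
def eliminate_self_joins(atoms, answer_vars, key_pos=0):
    """Merge atoms of one relation that agree on the primary-key argument."""
    subst = {}

    def resolve(var):
        # Follow substitutions until a representative variable is reached.
        while var in subst:
            var = subst[var]
        return var

    merged = {}  # (relation, key term) -> argument list of the kept atom
    for rel, args in atoms:
        args = [resolve(a) for a in args]
        key = (rel, args[key_pos])
        if key not in merged:
            merged[key] = args
            continue
        for a, b in zip(merged[key], args):
            a, b = resolve(a), resolve(b)
            if a == b:
                continue
            # Prefer answer variables as representatives so the head stays valid.
            if a in answer_vars:
                subst[b] = a
            else:
                subst[a] = b

    return [
        (rel, tuple(resolve(a) for a in args))
        for (rel, _), args in merged.items()
    ]


unfolded = [
    ("title", ("m", "t0", "y0")),
    ("title", ("m", "t", "y1")),
    ("title", ("m", "t2", "y")),
]
optimised_body = eliminate_self_joins(unfolded, answer_vars={"t", "y"})
```

On the three title atoms of the unfolded query, the result is the single atom title(m, t, y), so both self-joins disappear.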

...

Semantic Query Optimisation

Semantic Query Optimisation (SQO) is a field of DB theory focused on the optimisation of queries w.r.t. dependencies.

Semantic Query Optimisation in DB and OBDA:

- While some SQO techniques reached industrial RDBMSs, SQO never had a strong impact on the database community.
- In OBDA, in contrast, SQL queries are generated automatically, and so SQO is the only tool to reach optimal queries.

In practice, an OBDA system must implement at least SQO w.r.t. primary keys and foreign keys to deal with the disparities between RDF and the relational model.

...

Why does it work?

DBs are created through standard practices that generate features that are the focus of the previous optimisations. Starting from a rich conceptual schema, we encode it in a relational schema by:

- amalgamating N-to-1 and 1-to-1 attributes of an entity into a single n-ary relation with a primary key identifying the entity (e.g., title with title and year),
- using foreign keys over attribute columns when a column refers to the entity (e.g., name and castinfo),
- using type-discriminant columns to encode hierarchical information (e.g., castinfo).

As this process is universal, the T-mappings created for the resulting databases are dramatically simplified by the Ontop optimisations.

Technology
