Managing Completeness of Web Data

Fariz Darari

PhD Supervisor: Werner Nutt

Supported by the project MAGIC, funded by the province of Bolzano

Fariz Darari (unibz) Managing Completeness of Web Data Oct 20, 2015 1 / 38

About Us

Research Group

Sorted by distance to Werner’s office :)

Bozen-Bolzano

Motivation

Completeness statements are already there

However . . .

Completeness statements are available, but only in natural language

Unclear what data completeness & query completeness mean

No techniques to check whether data completeness entails query completeness

Solution Ideas

Completeness statements are available, but only in natural language

Solution: RDF-ize completeness statements

Unclear what data completeness & query completeness mean

Solution: Formalize data completeness & query completeness

No techniques to check whether data completeness entails query completeness

Solution: Develop techniques to check whether data completeness entails query completeness

Solutions

Background: RDF

$G_{\mathit{rd}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}),\ (\mathit{resDogs}, \mathit{act}, \mathit{tarantino}) \}$

Background: SPARQL

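SELECT

$Q^{s}_{\mathit{dir}} = (\{ ?m \},\ \{ (?m, \mathit{dir}, \mathit{tarantino}) \})$

ASK

$Q^{a}_{\mathit{dir}} = (\{ \},\ \{ (?m, \mathit{dir}, \mathit{tarantino}) \})$

CONSTRUCT

$Q^{c}_{\mathit{dir}} = (\{ (?m, \mathit{dir}, \mathit{tarantino}) \},\ \{ (?m, \mathit{dir}, \mathit{tarantino}) \})$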
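The three query forms can be tried out on the Reservoir Dogs graph with a small Python sketch. It is not a SPARQL engine: triples are plain tuples of strings, terms starting with "?" are treated as variables, and the helper names (match_bgp, select, ask, construct) are hypothetical ones chosen for this illustration.

```python
def is_var(term):
    """Terms starting with '?' are treated as variables (a convention of this sketch)."""
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every variable mapping that sends all triple patterns into the graph."""
    mappings = [{}]
    for triple_pattern in pattern:
        extended = []
        for mapping in mappings:
            for triple in graph:
                candidate = dict(mapping)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        mappings = extended
    return mappings


def select(variables, pattern, graph):
    """SELECT: project each mapping onto the requested variables."""
    return [tuple(m[v] for v in variables) for m in match_bgp(pattern, graph)]


def ask(pattern, graph):
    """ASK: is there at least one mapping?"""
    return len(match_bgp(pattern, graph)) > 0


def construct(template, pattern, graph):
    """CONSTRUCT: instantiate the template with every mapping of the pattern."""
    return {
        tuple(m.get(term, term) for term in triple_pattern)
        for m in match_bgp(pattern, graph)
        for triple_pattern in template
    }


G_rd = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
P_dir = [("?m", "dir", "tarantino")]

print(select(["?m"], P_dir, G_rd))     # [('resDogs',)]
print(ask(P_dir, G_rd))                # True
print(construct(P_dir, P_dir, G_rd))   # {('resDogs', 'dir', 'tarantino')}
```

The ASK form is the SELECT form with an empty set of variables, which is why ask only needs to test whether any mapping exists.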

Story: Incomplete Data Source

An incomplete data source of Reservoir Dogs, $\mathcal{G}_{\mathit{dbp}} = (G^{a}_{\mathit{dbp}}, G^{i}_{\mathit{dbp}})$:

$G^{a}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}) \}$

$G^{i}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}),\ (\mathit{resDogs}, \mathit{act}, \mathit{tarantino}) \}$

Story: Completeness Statement

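$G^{a}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}) \}$

$G^{i}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}),\ (\mathit{resDogs}, \mathit{act}, \mathit{tarantino}) \}$

From $(G^{a}_{\mathit{dbp}}, G^{i}_{\mathit{dbp}})$, we can say that DBpedia is complete for movies directed by Tarantino:

$C_{\mathit{dir}} = \mathit{Compl}((?m, \mathit{dir}, \mathit{tarantino}) \mid \emptyset)$

However, it is not complete for actors in movies directed by Tarantino:

$C_{\mathit{act}} = \mathit{Compl}((?m, \mathit{act}, ?a) \mid (?m, \mathit{dir}, \mathit{tarantino}))$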

Story: Query Completeness

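$G^{a}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}) \}$

$G^{i}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}),\ (\mathit{resDogs}, \mathit{act}, \mathit{tarantino}) \}$

Consequently, when we ask for all movies directed by Tarantino over DBpedia:

$Q_{\mathit{dir}} = (\{ ?m \},\ \{ (?m, \mathit{dir}, \mathit{tarantino}) \})$

the query completeness $\mathit{Compl}(Q_{\mathit{dir}})$ is obtained.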

Story: Query Completeness

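$G^{a}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}) \}$

$G^{i}_{\mathit{dbp}} = \{ (\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}),\ (\mathit{resDogs}, \mathit{act}, \mathit{tarantino}) \}$

However, if we ask for all movies directed by and starring Tarantino:

$Q_{\mathit{dir+act}} = (\{ ?m \},\ \{ (?m, \mathit{dir}, \mathit{tarantino}),\ (?m, \mathit{act}, \mathit{tarantino}) \})$

the query completeness $\mathit{Compl}(Q_{\mathit{dir+act}})$ is not obtained.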

Incomplete Data Source

Definition (Incomplete Data Source)

An incomplete data source is a pair of two graphs $\mathcal{G} = (G^{a}, G^{i})$, where $G^{a} \subseteq G^{i}$.

We call $G^{a}$ the available graph and $G^{i}$ the ideal graph.

Completeness Statement

Definition (Completeness Statement)

Let $P_1$ be a non-empty BGP and $P_2$ a BGP. A completeness statement is defined as

$\mathit{Compl}(P_1 \mid P_2)$

where we call $P_1$ the pattern and $P_2$ the condition of the statement.

Satisfaction of Completeness Statements

To a statement $C = \mathit{Compl}(P_1 \mid P_2)$, we associate the CONSTRUCT query

$Q_C = (P_1,\ P_1 \cup P_2)$.

Then, we say: $C$ is satisfied by an incomplete data source $\mathcal{G} = (G^{a}, G^{i})$, written $\mathcal{G} \models C$, if

$[\![Q_C]\!]_{G^{i}} \subseteq G^{a}$.
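A minimal Python sketch of this satisfaction check, applied to the Reservoir Dogs story: the CONSTRUCT query (P1, P1 ∪ P2) is evaluated over the ideal graph and the result is tested for containment in the available graph. Triples are tuples of strings, "?" marks a variable, and the function names are hypothetical choices for this illustration.

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every variable mapping that sends all triple patterns into the graph."""
    mappings = [{}]
    for triple_pattern in pattern:
        extended = []
        for mapping in mappings:
            for triple in graph:
                candidate = dict(mapping)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        mappings = extended
    return mappings


def construct(template, pattern, graph):
    """Instantiate the template with every mapping of the pattern."""
    return {
        tuple(m.get(term, term) for term in tp)
        for m in match_bgp(pattern, graph)
        for tp in template
    }


def satisfies_statement(pattern, condition, available, ideal):
    """Check whether the source (available, ideal) satisfies Compl(pattern | condition)."""
    # Q_C = (P1, P1 u P2), evaluated over the ideal graph.
    required = construct(pattern, list(pattern) + list(condition), ideal)
    # Everything the statement talks about must already be available.
    return required <= available


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

print(satisfies_statement(*C_dir, G_a, G_i))  # True
print(satisfies_statement(*C_act, G_a, G_i))  # False
```

The second check fails because evaluating the query of the actor statement over the ideal graph yields (resDogs, act, tarantino), which is missing from the available graph.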

Completeness Statements in RDF

$C_{\mathit{act}} = \mathit{Compl}((?m, \mathit{act}, ?a) \mid (?m, \mathit{dir}, \mathit{tarantino}))$

lv:dataset a void:Dataset ;
    c:hasComplStmt lv:csAct .

lv:csAct
    c:hasPattern [
        c:subject [ c:varName "m" ] ;
        c:predicate s:actor ;
        c:object [ c:varName "a" ]
    ] ;
    c:hasCondition [
        c:subject [ c:varName "m" ] ;
        c:predicate s:director ;
        c:object lmdb:Quentin_Tarantino
    ] .

Query Completeness

Definition (Query Completeness)

Let $Q$ be a query. We write

$\mathit{Compl}(Q)$

to say that $Q$ is complete.

An incomplete data source $\mathcal{G} = (G^{a}, G^{i})$ satisfies $\mathit{Compl}(Q)$, written $\mathcal{G} \models \mathit{Compl}(Q)$, if

$[\![Q]\!]_{G^{i}} = [\![Q]\!]_{G^{a}}$.
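The definition can be checked directly on the two story queries with a short Python sketch. Answers are collected in a Counter so that duplicates would count; treating answers as a multiset is an assumption of this sketch, as are the helper names.

```python
from collections import Counter


def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every variable mapping that sends all triple patterns into the graph."""
    mappings = [{}]
    for triple_pattern in pattern:
        extended = []
        for mapping in mappings:
            for triple in graph:
                candidate = dict(mapping)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        mappings = extended
    return mappings


def answers(variables, pattern, graph):
    """Evaluate the query (variables, pattern) over the graph as a multiset of tuples."""
    return Counter(tuple(m[v] for v in variables) for m in match_bgp(pattern, graph))


def is_query_complete(variables, pattern, available, ideal):
    """The source satisfies Compl(Q) if Q returns the same answers over both graphs."""
    return answers(variables, pattern, ideal) == answers(variables, pattern, available)


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

P_dir = [("?m", "dir", "tarantino")]
P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]

print(is_query_complete(["?m"], P_dir, G_a, G_i))      # True
print(is_query_complete(["?m"], P_dir_act, G_a, G_i))  # False
```

Note that this check needs the ideal graph, which is not accessible in practice; that is the reason for reasoning over completeness statements instead.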

Completeness Entailment

Problem Definition (Completeness Entailment)

Let $\mathcal{C}$ be a set of completeness statements and $Q$ a query.

We say that $\mathcal{C}$ entails the completeness of $Q$, written

$\mathcal{C} \models \mathit{Compl}(Q)$,

if any incomplete data source satisfying $\mathcal{C}$ also satisfies $\mathit{Compl}(Q)$.

Intuition: Completeness Entailment

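Consider the set $\mathcal{C}_{\mathit{dir,act}} = \{ C_{\mathit{dir}}, C_{\mathit{act}} \}$ of completeness statements and the query $Q_{\mathit{dir+act}} = (\{ ?m \}, P_{\mathit{dir+act}})$, where

$P_{\mathit{dir+act}} = \{ (?m, \mathit{dir}, \mathit{tarantino}),\ (?m, \mathit{act}, \mathit{tarantino}) \}$.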

Intuition: Completeness Entailment

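Consider the set $\mathcal{C}_{\mathit{dir,act}} = \{ C_{\mathit{dir}}, C_{\mathit{act}} \}$ of completeness statements and the query $Q_{\mathit{dir+act}} = (\{ ?m \}, P_{\mathit{dir+act}})$.

Freezing the variable $?m$ to the constant $m$ turns the query pattern into the graph

$\tilde{P}_{\mathit{dir+act}} = \{ (m, \mathit{dir}, \mathit{tarantino}),\ (m, \mathit{act}, \mathit{tarantino}) \}$

Therefore,

$[\![Q_{C_{\mathit{dir}}}]\!]_{\tilde{P}_{\mathit{dir+act}}} \cup [\![Q_{C_{\mathit{act}}}]\!]_{\tilde{P}_{\mathit{dir+act}}} = \{ (m, \mathit{dir}, \mathit{tarantino}),\ (m, \mathit{act}, \mathit{tarantino}) \} = \tilde{P}_{\mathit{dir+act}}$.

Thus, $\mathcal{C}_{\mathit{dir,act}} \models \mathit{Compl}(Q_{\mathit{dir+act}})$.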

Prototypical Graph

$\tilde{P}_{\mathit{dir+act}} = \{ (m, \mathit{dir}, \mathit{tarantino}),\ (m, \mathit{act}, \mathit{tarantino}) \}$

Definition (Prototypical Graph)

Let $Q = (W, P)$ be a query. The freeze mapping $\tilde{\mathit{id}}$ is defined as a mapping from each variable $?v$ in $P$ to a new IRI $v$.

Instantiating the graph pattern $P$ with $\tilde{\mathit{id}}$ yields the graph

$\tilde{P} := \tilde{\mathit{id}}\, P$,

which we call the prototypical graph of $Q$.
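Freezing is simple to sketch in Python. The version below drops the leading "?" from each variable to obtain its constant, which assumes that no existing term in the pattern already uses that name; a real implementation would have to mint IRIs that are guaranteed to be fresh. The function name freeze is a hypothetical choice.

```python
def is_var(term):
    """Terms starting with '?' are treated as variables (a convention of this sketch)."""
    return term.startswith("?")


def freeze(pattern):
    """Map each variable ?v to the constant v and return the resulting graph."""
    return {
        tuple(term[1:] if is_var(term) else term for term in triple_pattern)
        for triple_pattern in pattern
    }


P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]

# Both occurrences of ?m are mapped to the same constant m.
print(freeze(P_dir_act) == {("m", "dir", "tarantino"), ("m", "act", "tarantino")})  # True
```

The result is an ordinary graph without variables, so queries can be evaluated over it like over any other graph.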

Transfer Operator

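$[\![Q_{C_{\mathit{dir}}}]\!]_{\tilde{P}_{\mathit{dir+act}}} \cup [\![Q_{C_{\mathit{act}}}]\!]_{\tilde{P}_{\mathit{dir+act}}}$

Definition (Transfer Operator)

For any set $\mathcal{C}$ of completeness statements and a graph $G$, we define the transfer operator $T_{\mathcal{C}}$ that computes the union of the evaluation over $G$ of all CONSTRUCT queries of the statements in $\mathcal{C}$:

$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} [\![Q_C]\!]_{G}$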
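As a rough Python sketch, the transfer operator is a loop over the statements that collects the triples produced by each associated CONSTRUCT query. Statements are represented as (pattern, condition) pairs of triple-pattern lists; this representation and the function names are assumptions of the sketch.

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every variable mapping that sends all triple patterns into the graph."""
    mappings = [{}]
    for triple_pattern in pattern:
        extended = []
        for mapping in mappings:
            for triple in graph:
                candidate = dict(mapping)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        mappings = extended
    return mappings


def transfer(statements, graph):
    """Union of the results of the CONSTRUCT queries (P1, P1 u P2) over the graph."""
    result = set()
    for pattern, condition in statements:
        for m in match_bgp(list(pattern) + list(condition), graph):
            # Only the pattern part P1 is copied into the result.
            result |= {tuple(m.get(t, t) for t in tp) for tp in pattern}
    return result


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

# The frozen pattern of the query for movies directed by and starring Tarantino.
frozen = {("m", "dir", "tarantino"), ("m", "act", "tarantino")}

print(transfer([C_dir], frozen) == {("m", "dir", "tarantino")})  # True
print(transfer([C_dir, C_act], frozen) == frozen)                # True
```

Because each query only returns instantiated pattern triples that were matched in the graph, the result is always a subset of the input graph.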

Completeness Entailment Theorem

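$\tilde{P}_{\mathit{dir+act}} = T_{\mathcal{C}_{\mathit{dir,act}}}(\tilde{P}_{\mathit{dir+act}})$

Theorem (Completeness of Basic Queries)

Let $\mathcal{C}$ be a set of completeness statements and $Q = (W, P)$ a basic query. Then,

$\mathcal{C} \models \mathit{Compl}(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.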
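The theorem turns entailment checking into a fixpoint test, which the following Python sketch implements for basic queries: freeze the query pattern, apply the transfer operator, and compare. It is an illustration under simplifying assumptions (string triples, "?" for variables, frozen constants obtained by dropping the "?"), and the function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every variable mapping that sends all triple patterns into the graph."""
    mappings = [{}]
    for triple_pattern in pattern:
        extended = []
        for mapping in mappings:
            for triple in graph:
                candidate = dict(mapping)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        break
                else:
                    extended.append(candidate)
        mappings = extended
    return mappings


def freeze(pattern):
    """Prototypical graph: replace each variable ?v by the constant v."""
    return {tuple(t[1:] if is_var(t) else t for t in tp) for tp in pattern}


def transfer(statements, graph):
    """Union of the CONSTRUCT queries of all statements, evaluated over the graph."""
    result = set()
    for pattern, condition in statements:
        for m in match_bgp(list(pattern) + list(condition), graph):
            result |= {tuple(m.get(t, t) for t in tp) for tp in pattern}
    return result


def entails_completeness(statements, query_pattern):
    """Fixpoint test: the frozen pattern must be reproduced by the transfer operator."""
    frozen = freeze(query_pattern)
    return transfer(statements, frozen) == frozen


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]

print(entails_completeness([C_dir, C_act], P_dir_act))  # True
print(entails_completeness([C_dir], P_dir_act))         # False
```

With only the director statement, the frozen triple (m, act, tarantino) is not reproduced, so completeness of the query is not entailed.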

Query Class: DISTINCT Queries

Give us all Oscar-winning things:

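$Q_{\mathit{awd}} = (W_{\mathit{awd}}, P_{\mathit{awd}})^{d} = (\{ ?m \},\ \{ (?m, \mathit{award}, \mathit{oscar}),\ (?m, \mathit{award}, ?aw) \})^{d}$

Complete for all Oscar-winning things:

$C_{\mathit{os}} = \mathit{Compl}((?m, \mathit{award}, \mathit{oscar}) \mid \emptyset)$

Does $\{ C_{\mathit{os}} \} \models \mathit{Compl}(Q_{\mathit{awd}})$ hold?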

Query Class: OPT Queries

Give us all movies, and their awards, if any:

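$Q_{\mathit{maw}} = (\{ ?m, ?aw \},\ ((?m, a, \mathit{Movie})\ \mathrm{OPT}\ (?m, \mathit{award}, ?aw)))$

Complete for all movies and their awards:

$C_{\mathit{aw}} = \mathit{Compl}((?m, a, \mathit{Movie}),\ (?m, \mathit{award}, ?aw) \mid \emptyset)$

Does $\{ C_{\mathit{aw}} \} \models \mathit{Compl}(Q_{\mathit{maw}})$ hold?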

Query Class: Queries under RDFS Semantics

Give us all films:

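$Q_{\mathit{film}} = (\{ ?m \},\ \{ (?m, a, \mathit{Film}) \})$

Complete for all movies:

$C_{\mathit{movie}} = \mathit{Compl}((?m, a, \mathit{Movie}) \mid \emptyset)$

Films are the same as movies:

$S_{\mathit{fm}} = \{ (\mathit{Film}, \mathit{subclass}, \mathit{Movie}),\ (\mathit{Movie}, \mathit{subclass}, \mathit{Film}) \}$

Does $\{ C_{\mathit{movie}} \} \models \mathit{Compl}(Q_{\mathit{film}})$ hold with respect to $S_{\mathit{fm}}$?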

Federated Completeness Statements

Timestamped Completeness Statements

Conclusions

Completeness statements can now be represented in RDF

We know how completeness statements can entail query completeness in different query classes and different settings of completeness statements

Future Work

Completeness statements for queries with negation

Completeness statements as session annotations for RDF streams

Statistical completeness reasoning

Publications

Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.

Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC Posters and Demos 2014.

Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC Posters and Demos 2014.

Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC Posters and Demos 2015.

The latest results (timestamped statements and efficient completeness reasoning with 1 million statements) have been submitted to a journal.

$\mathit{Compl}((\mathit{myDaSePresentation}, \mathit{slide}, ?s) \mid \emptyset)$

Thank You!
