Linked Justifications: Provenance Aware Data Integration on Linked Data Li Ding Tetherless World Constellation Rensselaer Polytechnic Institute Nov 2, 2009


Page 1: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Linked Justifications: Provenance Aware Data Integration on Linked Data

Li Ding
Tetherless World Constellation
Rensselaer Polytechnic Institute
Nov 2, 2009

Page 2: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Linked Data

• Data on the Web
  – Use RDF
  – Use dereferenceable HTTP URIs

• Linked by typed links
  – rdfs:seeAlso
  – owl:sameAs
  – ...

• Many datasets

Page 3: Linked Justifications:  Provenance Aware Data Integration on Linked Data

A Simple Linked Data Example

Li Ding

Ying Ding Katy Bӧrner

RPI Troy, NY

Page 4: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Motivation

• A justification shows why someone properly holds a belief

• Justifications are important
  – Daily life, e.g. government budget, résumé
  – Intelligent systems, e.g. GPS routing

• It would be nice to reuse justifications
  – Chained justifications: organic eggs
  – Alternative justifications: creation of humans

Page 5: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Challenges and Solutions

• Challenges: reusing distributed, isolated, and heterogeneous justifications

• Solutions
  – Make it linked data
    • Use a general-purpose, simple structure
    • Support extensible semantic annotation
    • Use RDF with dereferenceable URIs
    • Make it linked
  – Support interesting computations

Page 6: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Puzzle: “Who killed Aunt Agatha?”

(1) Someone who lives in Dreadsbury Mansion killed Aunt Agatha.
(2) Agatha, the butler, and Charles live in Dreadsbury Mansion, and are the only people who live therein.
(3) A killer always hates his victim, and is never richer than his victim.
(4) Charles hates no one that Aunt Agatha hates.
(5) Agatha hates everyone except the butler.
(6) The butler hates everyone not richer than Aunt Agatha.
(7) The butler hates everyone Agatha hates.
(8) No one hates everyone.
(9) Agatha is not the butler.

Page 7: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Linked Justifications

Page 8: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Intuition: 1+1 2

[Figure: diagram with nodes A, B1, and B2.]

Page 9: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Roadmap for Linked Justification

• Put linked justifications on the Web
  – Choose the TPTP dataset
  – Model justifications (TPTP proofs) using a hypergraph
  – Publish justifications in PML
  – Link justifications using owl:sameAs

• Consume linked justifications
  – Visualize
  – Validate
  – Improve

Page 10: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Encoding Linked Justification

[Figure: justification j1 drawn as (a) a directed hypergraph and (b) a directed bipartite graph over statements A, B, C, D, E and steps s1 to s6. Legend: vertex, hyperarc, output, input.]

English interpretation
1. A, B, C, D, E are statements.
2. s1 to s6 are steps in justification j1.
3. A was derived by s1 from B, C, D.
4. B was derived by s2 from E.
5. B was also derived by s3 from C, D.
6. D, C, E were derived by s4, s5, s6 respectively.
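The encoding on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the `Hyperarc` type, the `to_bipartite` helper, and the choice to point both `output` and `input` edges from the step to the statement are assumptions made here. The step-to-statement assignment follows the English interpretation above.

```python
from typing import NamedTuple


class Hyperarc(NamedTuple):
    """One justification step: derives `output` from all of `inputs`."""
    name: str
    output: str
    inputs: frozenset


# Justification j1 as read from the English interpretation.
# s4, s5, s6 have no antecedents: they derive D, C, E directly.
J1 = [
    Hyperarc("s1", "A", frozenset({"B", "C", "D"})),
    Hyperarc("s2", "B", frozenset({"E"})),
    Hyperarc("s3", "B", frozenset({"C", "D"})),
    Hyperarc("s4", "D", frozenset()),
    Hyperarc("s5", "C", frozenset()),
    Hyperarc("s6", "E", frozenset()),
]


def to_bipartite(arcs):
    """Flatten hyperarcs into (step, label, statement) edges.

    Every step becomes a node of its own, so the hypergraph turns into
    an ordinary directed graph with two kinds of nodes.
    """
    edges = []
    for arc in arcs:
        edges.append((arc.name, "output", arc.output))
        for statement in sorted(arc.inputs):
            edges.append((arc.name, "input", statement))
    return edges


edges = to_bipartite(J1)
for edge in edges:
    print(edge)
```

The bipartite form is convenient because each edge is a plain triple, which is close to how the same structure would be written down in RDF.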

Page 11: Linked Justifications:  Provenance Aware Data Integration on Linked Data
Page 12: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Example Linked justification

Page 13: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Self-Improve

Page 14: Linked Justifications:  Provenance Aware Data Integration on Linked Data
Page 15: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Improve
• Fewer steps
• New formula
• Hybrid

Page 16: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Some statistics

Page 17: Linked Justifications:  Provenance Aware Data Integration on Linked Data
Page 18: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: named graphs G(Freebase:fairfax_county), G(Freebase:Virginia), G(dbpedia:Fairfax_County%2C_Virginia), G(dbpedia:Virginia), and G(dbpedia:Fairfax_County_Board_of_Supervisors), with edges labeled address and reference, and nodes #Fairfax_County1, #Fairfax_County2, #Fairfax_County3, #Virginia1, #Virginia2, and #George Mason.]


Page 20: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: directed hypergraph over vertices A, B, C, D, E with hyperarcs s1 to s6.]

Page 21: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Hyper-graph syntax

English interpretation
1. A, B, C, D, E are statements.
2. s1 to s6 are steps in justification j1.
3. A was derived by s1 from B, C, D.
4. B was derived by s2 from E.
5. B was alternatively derived by s3 from C, D.
6. E, C, D were directly derived by s4, s5, s6 respectively.
7. s4 to s6 are terminal.

[Figure: directed hypergraph representation of j1 over vertices A, B, C, D, E with hyperarcs s1 to s6. Legend: vertex, hyperarc, AND, OR.]

Page 22: Linked Justifications:  Provenance Aware Data Integration on Linked Data

General Problem Context

• Justifications (or proofs) generated by different reasoners may derive semantically equivalent intermediate or final conclusions; therefore,
  – We can combine existing justifications into an AND-OR graph (encoded as a hypergraph)
  – We can search the AND-OR graph for a “better” solution graph, which is a combination of justification fragments

[Figure: j1 + j2 + j3 = j4 => j5, drawn as hypergraphs over A, B, C, D, E with steps s1 to s9. Legend: vertex, hyperarc, is conclusion of, has antecedent.]

• j1: A is derived from B, C, D; B, C, D are asserted
• j2: B is derived from E; E is asserted
• j3: B is derived from C, D; C, D are asserted
• j4 (combine): linked justifications rooted at A; P4 is created by linking p1, p2 and p3
• j5 (search): A is derived from B, C, D; C, D are asserted
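The "combine" step can be sketched as a union of steps after equivalent formulas have been mapped to one canonical name. The sketch below is an assumption-laden illustration, not the talk's code: `combine`, `alternatives`, the prefixed local names such as `j2:B`, and the `same_as` dictionary (standing in for owl:sameAs links) are all hypothetical, and the assignment of s4 to s9 to asserted statements is read off the slide captions.

```python
def combine(justifications, same_as=None):
    """Union the steps of several justifications into one AND-OR hypergraph.

    justifications: {source name: [(step, output, inputs), ...]}
    same_as: {local formula name: canonical name}
    Returns {step: (output, frozenset of inputs, source name)}.
    """
    same_as = same_as or {}

    def canon(formula):
        # Formulas without a mapping are already canonical.
        return same_as.get(formula, formula)

    combined = {}
    for source, steps in justifications.items():
        for step, output, inputs in steps:
            combined[step] = (
                canon(output),
                frozenset(canon(i) for i in inputs),
                source,
            )
    return combined


def alternatives(combined, formula):
    """Steps that conclude `formula`: the OR branches at that vertex."""
    return sorted(s for s, (out, _, _) in combined.items() if out == formula)


justifications = {
    "j1": [("s1", "A", ["B", "C", "D"]),
           ("s4", "B", []), ("s5", "C", []), ("s6", "D", [])],
    "j2": [("s2", "j2:B", ["j2:E"]), ("s7", "j2:E", [])],
    "j3": [("s3", "j3:B", ["j3:C", "j3:D"]),
           ("s8", "j3:C", []), ("s9", "j3:D", [])],
}
same_as = {"j2:B": "B", "j2:E": "E", "j3:B": "B", "j3:C": "C", "j3:D": "D"}

j4 = combine(justifications, same_as)
print(alternatives(j4, "B"))
```

After combining, B has three alternative derivations (asserted in j1, derived from E in j2, derived from C and D in j3), which is what makes a later search over the combined graph worthwhile.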


Page 24: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Directed Hypergraph Formalism

• A justification is encoded by an annotated directed hypergraph H(V, A, C):
  – V = {v1, v2, …, vn}, the set of vertices; a vertex denotes a unique formula
  – A = {a1, a2, …, am}, the set of hyperarcs; a hyperarc denotes a step in a justification
  – C: context data
    • Source: a hyperarc may come from multiple sources
    • Weight: each hyperarc has a weight for optimization purposes

• Notations
  – Hyperarc ai ∈ A(H)
    • output(ai) ⊆ V(H), formulas derived as conclusions, OR?
    • input(ai) ⊆ V(H), formulas used as antecedents, AND
  – Vertex vi ∈ V(H)
    • Inlink(vi) ⊆ A(H), hyperarcs having vi as tail
    • Outlink(vi) ⊆ A(H), hyperarcs having vi as head
  – Hypergraph H
    • A(H) = {ai | ai ∈ H}
    • V(H) = {vi | vi ∈ H}
    • Output(H) = ∪ output(ai), where ai ∈ A(H)
    • Input(H) = ∪ input(ai), where ai ∈ A(H)
    • Roots(H) = Output(H) – Input(H)
  – Hyperpath p = {v1, a1, v2, a2, …, vn}, a path in the hypergraph
    • vi ∈ input(ai)
    • vi+1 ∈ output(ai)
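To make the notation concrete, here is a minimal Python sketch of Output(H), Input(H), Roots(H), Inlink and Outlink on the running example j1. The dictionary layout and function names are assumptions of this sketch. It also assumes the usual directed-hypergraph convention that the tail of a hyperarc is its input set and the head is its output set, and it uses the assignment in which s4, s5, s6 derive E, C, D.

```python
# Each hyperarc name maps to (output set, input set).
H = {
    "s1": ({"A"}, {"B", "C", "D"}),
    "s2": ({"B"}, {"E"}),
    "s3": ({"B"}, {"C", "D"}),
    "s4": ({"E"}, set()),
    "s5": ({"C"}, set()),
    "s6": ({"D"}, set()),
}


def output_of(h):
    """Output(H): union of output(a) over all hyperarcs a."""
    return set().union(*(out for out, _ in h.values()))


def input_of(h):
    """Input(H): union of input(a) over all hyperarcs a."""
    return set().union(*(inp for _, inp in h.values()))


def roots(h):
    """Roots(H) = Output(H) - Input(H): conclusions nothing else depends on."""
    return output_of(h) - input_of(h)


def inlink(h, v):
    """Hyperarcs having v in their tail (assumed here to mean: v is an input)."""
    return sorted(a for a, (_, inp) in h.items() if v in inp)


def outlink(h, v):
    """Hyperarcs having v in their head (assumed here to mean: v is an output)."""
    return sorted(a for a, (out, _) in h.items() if v in out)


print(roots(H), inlink(H, "C"), outlink(H, "B"))
```

On this example the only root is A, C is an antecedent of s1 and s3, and B is concluded by both s2 and s3.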

Page 25: Linked Justifications:  Provenance Aware Data Integration on Linked Data

More Definitions

• A hyperpath p is cyclic iff p ends at its starting vertex, i.e. p = {v1, …, vn, an, v1}

• A hypergraph H(V, A, C) is
  – concise iff no two steps derive the same statement, i.e. output(ai) ∩ output(aj) = ∅ for all ai, aj ∈ A, i ≠ j
  – complete iff every statement has a justification, i.e. Input(H) ⊆ Output(H)
  – acyclic iff H has no cyclic hyperpath

• A solution graph Hs(V’, A’, C’) of a hypergraph H w.r.t. vertex v is
  – a subgraph of H, i.e. A’ ⊆ A
  – rooted at vertex v, i.e. Roots(Hs) = {v}
  – concise
  – complete
  – acyclic

• Weighted directed hypergraph
  – Each hyperarc has a numeric weight, weight(ai)
  – The weight of a directed hypergraph is weight(H) = Σ weight(ai), ai ∈ A
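The three properties and the weight can be checked mechanically. The following Python sketch is one possible reading of the definitions, with assumed names and an assumed dictionary layout (hyperarc name to a pair of output set and input set). For acyclicity it relies on the hyperpath definition: a cyclic hyperpath exists exactly when the vertex graph with an edge from each output to each input of the same hyperarc has a cycle.

```python
from itertools import combinations


def is_concise(h):
    """No two hyperarcs share an output."""
    return all(not (h[a][0] & h[b][0]) for a, b in combinations(h, 2))


def is_complete(h):
    """Every input of some hyperarc is the output of some hyperarc."""
    outputs = set().union(*(out for out, _ in h.values()))
    inputs = set().union(*(inp for _, inp in h.values()))
    return inputs <= outputs


def is_acyclic(h):
    """Depth-first search for a cycle in the output -> input vertex graph."""
    deps = {}
    for out, inp in h.values():
        for v in out:
            deps.setdefault(v, set()).update(inp)
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        if state.get(v) == 1:
            return False  # reached a vertex already on the path: cycle
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(u) for u in deps.get(v, ()))
        state[v] = 2
        return ok

    return all(visit(v) for v in list(deps))


def weight(h, weights):
    """weight(H): sum of the weights of its hyperarcs."""
    return sum(weights[a] for a in h)


H = {
    "s1": ({"A"}, {"B", "C", "D"}),
    "s2": ({"B"}, {"E"}),
    "s3": ({"B"}, {"C", "D"}),
    "s4": ({"E"}, set()),
    "s5": ({"C"}, set()),
    "s6": ({"D"}, set()),
}
sub = {a: H[a] for a in ("s1", "s3", "s5", "s6")}
unit = {a: 1 for a in H}
print(is_concise(H), is_concise(sub), weight(sub, unit))
```

The full graph is complete and acyclic but not concise, because s2 and s3 both conclude B; dropping s2 and s4 leaves a concise subgraph of weight 4.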

Page 26: Linked Justifications:  Provenance Aware Data Integration on Linked Data

The “Search” Problem

• Given a weighted directed hypergraph H(V, A, C) and a starting vertex v, find the optimal solution graph H’(V’, A’, C’) rooted at v.
  – Optimal: minimal weight

• Discussion
  – The search space is huge and could be exponential
  – Similar to AO* search, which assumes a tree instead of a DAG
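As a baseline for the problem statement, the search can be written as exhaustive enumeration of hyperarc subsets. This is a deliberately naive sketch, exponential in the number of hyperarcs and not the algorithm used in the talk; the names `is_solution` and `search` and the single-output representation (step name to a pair of output and input set) are assumptions made here.

```python
from itertools import combinations


def is_solution(sub, root):
    """Check that `sub` is concise, complete, rooted at `root`, and acyclic."""
    outputs = [out for out, _ in sub.values()]
    if len(set(outputs)) != len(outputs):        # concise
        return False
    inputs = set().union(*(inp for _, inp in sub.values()))
    if not inputs <= set(outputs):               # complete
        return False
    if set(outputs) - inputs != {root}:          # rooted at root
        return False
    # Acyclic: with one step per statement, every statement must be
    # derivable bottom-up starting from the steps without antecedents.
    derived, changed = set(), True
    while changed:
        changed = False
        for out, inp in sub.values():
            if out not in derived and inp <= derived:
                derived.add(out)
                changed = True
    return derived == set(outputs)


def search(h, weights, root):
    """Return (weight, steps) of a minimal-weight solution graph, or None."""
    best = None
    names = sorted(h)
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            sub = {a: h[a] for a in chosen}
            if is_solution(sub, root):
                w = sum(weights[a] for a in chosen)
                if best is None or w < best[0]:
                    best = (w, chosen)
    return best


H = {
    "s1": ("A", frozenset({"B", "C", "D"})),
    "s2": ("B", frozenset({"E"})),
    "s3": ("B", frozenset({"C", "D"})),
    "s4": ("E", frozenset()),
    "s5": ("C", frozenset()),
    "s6": ("D", frozenset()),
}
best = search(H, {a: 1 for a in H}, "A")
print(best)
```

With unit weights the running example has exactly two solution graphs rooted at A, and the enumeration returns the one that reuses C and D for both A and B.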

Page 27: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Example 1: AO* Search does not work

Find the minimal (weight) solution graph

[Figure: three copies of the hypergraph over A, B, C, D, E with hyperarcs s1 to s6. j0 carries weight 1 on every hyperarc; j1 is annotated with costs 2, 3 and 5; j2 is annotated with costs 2, 3 and 4.]

• j0 is the input
• j1 is the AO* search result
• j2 is the optimal result
• Assign each hyperarc weight 1
• AO* does not consider shared hyperarcs
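The gap between the two results can be reproduced with a simplified tree-style cost recursion. This is not a full AO* implementation, only a sketch of the tree assumption it relies on: each antecedent is costed independently, so steps shared between branches are counted more than once. The names `step_cost`, `best_step`, and `selected_steps` are hypothetical, and s4, s5, s6 are assumed to derive E, C, D.

```python
# Step name -> (output, inputs); every step has weight 1.
ARCS = {
    "s1": ("A", ("B", "C", "D")),
    "s2": ("B", ("E",)),
    "s3": ("B", ("C", "D")),
    "s4": ("E", ()),
    "s5": ("C", ()),
    "s6": ("D", ()),
}


def best_step(v):
    """Pick the step concluding v that has the lowest tree cost."""
    candidates = [s for s, (out, _) in ARCS.items() if out == v]
    return min(candidates, key=step_cost)


def step_cost(s):
    """Tree cost: 1 for the step plus the tree cost of each antecedent."""
    return 1 + sum(step_cost(best_step(u)) for u in ARCS[s][1])


def selected_steps(v):
    """Steps actually used when following the tree-cost choices from v."""
    s = best_step(v)
    steps = {s}
    for u in ARCS[s][1]:
        steps |= selected_steps(u)
    return steps


tree_choice = selected_steps("A")
shared_choice = {"s1", "s3", "s5", "s6"}
print(step_cost("s2"), step_cost("s3"), sorted(tree_choice), len(shared_choice))
```

The recursion prefers s2 (cost 2) over s3 (cost 3) for B and ends up with five steps, while choosing s3 reuses the derivations of C and D that s1 needs anyway and yields only four.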

Page 28: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Example2: Combine & Improve Proof

Page 29: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Architecture

[Figure: architecture diagram. Data: Proofs (tptp), J1 (pml2), J2 (pml2), Mappings (owl), J_ALL (pml2), H(A,X,C) (Graph), H_OPT(A,X,C) (Graph), J_OPT (pml2), statistics. Operations: translate, combine, map, search, hg2pml, visualize, diff.]

Page 30: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Backup

Page 31: Linked Justifications:  Provenance Aware Data Integration on Linked Data

RDF graph syntax

[Figure: RDF graph of justification j1 with statements A to E and steps s1 to s6; edges labeled input, output, partOf, and weight (values 0 and 1).]

Page 32: Linked Justifications:  Provenance Aware Data Integration on Linked Data
Page 33: Linked Justifications:  Provenance Aware Data Integration on Linked Data

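[Figure: two justifications of C. In one, A and A → B yield B by Modus Ponens, then B and B → C yield C by Modus Ponens. In the other, A and A → C yield C by Modus Ponens.]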

Page 34: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: URIs Freebase:fairfax_county, dbpedia:Fairfax_County%2C_Virginia, geonames:4758041, rdfabout:fairfax_county, Freebase:Virginia, dbpedia:Virginia, geonames:6254928, and dbpedia:Fairfax_County_Board_of_Supervisors, with edges labeled address and same.]

Page 35: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: named graphs G(Freebase:fairfax_county), G(Freebase:Virginia), G(dbpedia:Fairfax_County%2C_Virginia), G(dbpedia:Virginia), and G(dbpedia:Fairfax_County_Board_of_Supervisors), together with the URIs Freebase:fairfax_county, Freebase:Virginia, dbpedia:Fairfax_County%2C_Virginia, dbpedia:Virginia, and dbpedia:Fairfax_County_Board_of_Supervisors; edges labeled address and reference.]

Page 36: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: named graphs G(Freebase:fairfax_county), G(Freebase:Virginia), G(dbpedia:Fairfax_County%2C_Virginia), G(dbpedia:Virginia), and G(dbpedia:Fairfax_County_Board_of_Supervisors), with edges labeled address and reference, and nodes #Fairfax_County, #Virginia, and #George Mason.]

Page 37: Linked Justifications:  Provenance Aware Data Integration on Linked Data
Page 39: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: graphs g1, g2, g3 and URIs uri2, uri3, with edges labeled parse, address, and same.]

Page 40: Linked Justifications:  Provenance Aware Data Integration on Linked Data

[Figure: graphs g1, g2, g3 with edges labeled address.]

Page 41: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Hypergraph Notation

[Figure: (a) directed hypergraph and (b) directed bipartite graph over vertices A, B, C, D, E with hyperarcs s1, s2, s3. Legend: vertex, hyperarc, output, input.]

Page 42: Linked Justifications:  Provenance Aware Data Integration on Linked Data

Hypergraph Notation

[Figure: (a) directed hypergraph and (b) directed bipartite graph over vertices A, B, C, D, E with hyperarcs s1 to s6. Legend: vertex, hyperarc, output, input.]