URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases
Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald
bornOn(Jeff, 09/22/42)
gradFrom(Jeff, Columbia)
hasAdvisor(Jeff, Arthur)
hasAdvisor(Surajit, Jeff)
knownFor(Jeff, Theory)
type(Jeff, Author)[0.9]
author(Jeff, Drag_Book)[0.8]
author(Jeff,Cind_Book)[0.6]
worksAt(Jeff, Bell_Labs)[0.7]
type(Jeff, CEO)[0.4]
Information Extraction
YAGO/DBpedia et al.
New fact candidates
>120 M facts for YAGO2 (mostly from Wikipedia infoboxes)
Hundreds of millions of additional facts from Wikipedia text
Outline
Motivation & Problem Setting
  URDF running example: people graduating from universities
Efficient MAP Inference
  MaxSAT solving with soft & hard constraints
Grounding
  Deductive grounding of soft rules (SLD resolution)
  Iterative grounding of hard rules (closure)
MaxSAT Algorithm
  MaxSAT algorithm in 3 steps
Experiments & Future Work
URDF: Uncertain RDF Data Model
Extensional Layer (information extraction & integration)
  High-confidence facts: existing knowledge base (“ground truth”)
  New fact candidates: extracted facts with confidence values
  Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivalentProperty)
  Large “Uncertain Database” of RDF facts
Intensional Layer (query-time inference)
  Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
  Hard rules: consistency constraints (more general FOL rules)
  Propositional & probabilistic consistency reasoning
Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
People may live in more than one place
  livesIn(x,y) ∧ marriedTo(x,z) → livesIn(z,y) [0.8]
  livesIn(x,y) ∧ hasChild(x,z) → livesIn(z,y) [0.5]
People are not born in different places/on different dates
  bornIn(x,y) ∧ bornIn(x,z) → y=z
People are not married to more than one person (at the same time, in most countries?)
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)
Rule-based (deductive) reasoning: Datalog, RDF/S, OWL2-RL, etc.
FOL constraints (in particular mutex): Datalog with constraints, X-tuples in probabilistic databases, owl:FunctionalProperty, etc.
URDF Running Example
[Figure: RDF graph of the knowledge base over the entities Jeff, Surajit, David, Stanford and Princeton and the classes Computer Scientist and University.]
KB: RDF Base Facts
  Edge labels: type[1.0], worksAt[0.9], hasAdvisor[0.8], hasAdvisor[0.7], graduatedFrom[0.6], graduatedFrom[0.7], graduatedFrom[0.9]
Derived Facts
  gradFrom(Surajit, Stanford)
  gradFrom(David, Stanford)
  Derived edges are labelled graduatedFrom[?], i.e., their confidence is not yet known.
First-Order Rules
  hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
Basic Types of Inference
Maximum-A-Posteriori (MAP) Inference
  Find the most likely assignment to the query variables y given evidence x.
  Compute: arg max_y P(y | x) (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)
Marginal/Success Probabilities
  Probability that query y is true in a random world given evidence x.
  Compute: P(y | x) = ∑ P(w | x), summed over all possible worlds w in which y holds (#P-hard for propositional formulas)
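The difference between the two inference types can be illustrated by enumerating possible worlds over two facts from the running example. This is only a sketch under stated assumptions: confidence values are treated as independent fact probabilities, the evidence x is taken to be the mutex constraint on graduatedFrom, and the names `CONF`, `worlds`, `map_world` and `marginal` are hypothetical helpers, not part of URDF.

```python
from itertools import product

# Assumption: confidences are treated as independent fact probabilities.
CONF = {
    "graduatedFrom(Surajit,Stanford)": 0.7,
    "graduatedFrom(Surajit,Princeton)": 0.6,
}
FACTS = list(CONF)


def worlds():
    """Return (world, P(world | x)) for every world consistent with the evidence x."""
    consistent = []
    for values in product([True, False], repeat=len(FACTS)):
        world = dict(zip(FACTS, values))
        if all(world.values()):
            # Evidence x: the two facts are mutually exclusive, so drop this world.
            continue
        prob = 1.0
        for fact, holds in world.items():
            prob *= CONF[fact] if holds else 1.0 - CONF[fact]
        consistent.append((world, prob))
    total = sum(prob for _, prob in consistent)  # normalise over consistent worlds
    return [(world, prob / total) for world, prob in consistent]


def map_world():
    """MAP inference: the single most likely world."""
    return max(worlds(), key=lambda pair: pair[1])[0]


def marginal(fact):
    """Marginal probability: total mass of the worlds in which `fact` holds."""
    return sum(prob for world, prob in worlds() if world[fact])
```

MAP inference returns one world, whereas the marginal aggregates over all worlds containing the fact; the enumeration is exponential in the number of facts, which is why it is only feasible for toy inputs.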
General Route: Grounding & MaxSAT Solving
Query: graduatedFrom(x, y)
CNF (conjunction of weighted clauses):
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   1000
  (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))   1000
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
  worksAt(Jeff, Stanford)   0.9
  hasAdvisor(Surajit, Jeff)   0.8
  hasAdvisor(David, Jeff)   0.7
  graduatedFrom(Surajit, Princeton)   0.6
  graduatedFrom(Surajit, Stanford)   0.7
  graduatedFrom(David, Princeton)   0.9
1) Grounding
   – Consider only facts (and rules) which are relevant for answering the query
2) Propositional formula in CNF, consisting of
   – Grounded hard & soft rules
   – Uncertain base facts
3) Propositional reasoning
   – Find truth assignment to facts such that the total weight of the satisfied clauses is maximized
MAP inference: compute “most likely” possible world
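Step 3 can be made concrete with an exhaustive search over all truth assignments of the grounded CNF. This is a minimal sketch, not the URDF solver: it treats the two mutex clauses as strict constraints rather than as clauses of weight 1000, and the names `SOFT`, `HARD`, `satisfied` and `map_world` are hypothetical.

```python
from itertools import product

# Soft clauses as (literals, weight); a literal is (fact, polarity).
SOFT = [
    ([("hasAdvisor(Surajit,Jeff)", False), ("worksAt(Jeff,Stanford)", False),
      ("graduatedFrom(Surajit,Stanford)", True)], 0.4),
    ([("hasAdvisor(David,Jeff)", False), ("worksAt(Jeff,Stanford)", False),
      ("graduatedFrom(David,Stanford)", True)], 0.4),
    ([("worksAt(Jeff,Stanford)", True)], 0.9),
    ([("hasAdvisor(Surajit,Jeff)", True)], 0.8),
    ([("hasAdvisor(David,Jeff)", True)], 0.7),
    ([("graduatedFrom(Surajit,Princeton)", True)], 0.6),
    ([("graduatedFrom(Surajit,Stanford)", True)], 0.7),
    ([("graduatedFrom(David,Princeton)", True)], 0.9),
]
# Hard mutex constraints: at most one fact of each pair may be true.
HARD = [
    ("graduatedFrom(Surajit,Stanford)", "graduatedFrom(Surajit,Princeton)"),
    ("graduatedFrom(David,Stanford)", "graduatedFrom(David,Princeton)"),
]
FACTS = sorted({fact for literals, _ in SOFT for fact, _ in literals})


def satisfied(literals, world):
    """A clause holds if at least one literal agrees with the world."""
    return any(world[fact] == positive for fact, positive in literals)


def map_world():
    """Return the hard-consistent world with maximal satisfied soft weight."""
    best_world, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(FACTS)):
        world = dict(zip(FACTS, values))
        if any(world[a] and world[b] for a, b in HARD):
            continue  # violates a hard rule
        weight = sum(w for literals, w in SOFT if satisfied(literals, world))
        if weight > best_weight:
            best_world, best_weight = world, weight
    return best_world, best_weight
```

With seven facts this visits only 128 assignments; the exponential growth in the number of facts is what motivates an approximation algorithm for larger groundings.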
Why are high weights for hard rules not enough?
Consider the following CNF (for A, B > 0, A >> B):
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   A
  graduatedFrom(Surajit, Princeton)   0
  graduatedFrom(Surajit, Stanford)   B
The optimal solution has weight A+B.
The next-best solution has weight A+0.
Hence the ratio of the optimal over the approximate solution is (A+B)/A.
In general, any (1+ε)-approximation algorithm, with ε > 0, may set graduatedFrom(Surajit, Princeton) to true, as (A+B)/A → 1 for A → ∞.
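A quick numeric check of this limit argument, as a hedged sketch: the function names and the sample values of A, B and ε are illustrative assumptions only.

```python
def approximation_ratio(a: float, b: float) -> float:
    """Ratio of the optimal weight (A + B) to the next-best weight (A + 0)."""
    return (a + b) / a


def next_best_is_acceptable(a: float, b: float, eps: float) -> bool:
    """True if the next-best solution is within a (1 + eps) factor of the optimum."""
    return approximation_ratio(a, b) <= 1.0 + eps


if __name__ == "__main__":
    # The larger the hard-rule weight A, the closer the ratio gets to 1.
    for a in (10.0, 1_000.0, 1_000_000.0):
        print(a, approximation_ratio(a, 1.0), next_best_is_acceptable(a, 1.0, 0.01))
```

Raising A therefore makes the guarantee weaker with respect to the soft weights: for a large enough A, the assignment that ignores B already meets any fixed (1+ε) bound.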
URDF: MaxSAT Solving with Soft & Hard Rules
Find: arg max_y P(y | x). Resolves to a variant of MaxSAT for propositional formulas.
Special case: Horn clauses as soft rules & mutex constraints as hard rules

S: Mutex constraints
  { graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
  { graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
  worksAt(Jeff, Stanford)   0.9
  hasAdvisor(Surajit, Jeff)   0.8
  hasAdvisor(David, Jeff)   0.7
  graduatedFrom(Surajit, Princeton)   0.6
  graduatedFrom(Surajit, Stanford)   0.7
  graduatedFrom(David, Princeton)   0.9

MaxSAT algorithm:
  Compute W_0 = ∑_{clauses C} w(C) · P(C is satisfied);
  For each hard constraint S_t {
    For each fact f in S_t {
      Compute W_t^{f+} = ∑_{clauses C} w(C) · P(C is sat. | f = true);
    }
    Compute W_t^{S-} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false);
    Choose the truth assignment to f in S_t that maximizes W_t^{f+}, W_t^{S-};
    Remove satisfied clauses C;
    t++;
  }

• Runtime: O(|S||C|)
• Approximation guarantee of 1/2
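The loop above can be sketched in Python on the running example. This is an illustrative reading of the pseudocode, not the authors' implementation, and it rests on several assumptions: clause probabilities are computed as if the literals of a clause were independent, facts outside any mutex constraint are treated as singleton sets, the initial probabilities are taken from the p_i column of the worked example, and the "remove satisfied clauses" optimisation is omitted. All identifiers are hypothetical.

```python
from math import prod

S_ST, S_PR = "graduatedFrom(Surajit,Stanford)", "graduatedFrom(Surajit,Princeton)"
D_ST, D_PR = "graduatedFrom(David,Stanford)", "graduatedFrom(David,Princeton)"
WORKS = "worksAt(Jeff,Stanford)"
ADV_S, ADV_D = "hasAdvisor(Surajit,Jeff)", "hasAdvisor(David,Jeff)"

# Weighted clauses C as (literals, weight); a literal is (fact, polarity).
CLAUSES = [
    ([(ADV_S, False), (WORKS, False), (S_ST, True)], 0.4),
    ([(ADV_D, False), (WORKS, False), (D_ST, True)], 0.4),
    ([(WORKS, True)], 0.9), ([(ADV_S, True)], 0.8), ([(ADV_D, True)], 0.7),
    ([(S_PR, True)], 0.6), ([(S_ST, True)], 0.7), ([(D_PR, True)], 0.9),
]
# Hard constraints S_t; unconstrained facts become singleton sets (assumption).
MUTEX_SETS = [[S_ST, S_PR], [D_ST, D_PR], [WORKS], [ADV_S], [ADV_D]]
# Initial probabilities p_i as tabulated in the worked example.
INITIAL_P = {S_ST: 1.0, S_PR: 0.0, D_ST: 0.0, D_PR: 1.0,
             WORKS: 1.0, ADV_S: 1.0, ADV_D: 1.0}


def clause_prob(literals, p):
    """P(clause is satisfied) = 1 - product of P(literal is false)."""
    return 1.0 - prod((1.0 - p[f]) if positive else p[f] for f, positive in literals)


def potential(p):
    """W = sum over all clauses of w(C) * P(C is satisfied)."""
    return sum(w * clause_prob(literals, p) for literals, w in CLAUSES)


def greedy_maxsat(initial_p):
    """Fix one mutex set at a time to the choice with the highest potential."""
    p = dict(initial_p)
    for mutex in MUTEX_SETS:
        best_choice, best_w = None, float("-inf")
        for choice in [None] + mutex:  # None: every fact in the set is false
            trial = {**p, **{f: float(f == choice) for f in mutex}}
            w = potential(trial)
            if w > best_w:
                best_choice, best_w = choice, w
        p.update({f: float(f == best_choice) for f in mutex})
    return {f: value == 1.0 for f, value in p.items()}, potential(p)
```

Calling `greedy_maxsat(INITIAL_P)` returns a truth assignment together with the total weight of the clauses it satisfies. Intermediate potentials depend on the assumed initial probabilities, so they should not be expected to reproduce every number shown in the worked example.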
Deductive Grounding Algorithm (SLD Resolution/Datalog)
Query: graduatedFrom(Surajit, y)
First-Order Rules
  hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
Base Facts
  graduatedFrom(Surajit, Princeton) [0.7]
  graduatedFrom(Surajit, Stanford) [0.6]
  graduatedFrom(David, Princeton) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]
  hasAdvisor(David, Jeff) [0.7]
  worksAt(Jeff, Stanford) [0.9]
  type(Princeton, University) [1.0]
  type(Stanford, University) [1.0]
  type(Jeff, Computer_Scientist) [1.0]
  type(Surajit, Computer_Scientist) [1.0]
  type(David, Computer_Scientist) [1.0]
[Figure: SLD resolution tree for the query with two answers: graduatedFrom(Surajit, Princeton), and graduatedFrom(Surajit, Stanford) derived from the conjunction (/\) of hasAdvisor(Surajit, Jeff) and worksAt(Jeff, Stanford).]
Grounded Rules
hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) → gradFrom(Surajit, Stanford)
¬gradFrom(Surajit, Stanford) ∨ ¬gradFrom(Surajit, Princeton)
Dependency Graph of a Query
SLD grounding always starts from a query literal and first proceeds through the soft deduction rules.
Grounding is also iterated over the hard rules in a top-down fashion by using the literals in each hard rule as new subqueries.
Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog.
Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
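The top-down grounding described above can be sketched as a small backward-chaining procedure over the running example. It is a simplification under stated assumptions: recursion is bounded by a maximum depth instead of the tabling mentioned above, only binary predicates are used, and all names (`solve`, `solve_body`, `competitor_set`, the `?`-prefixed variables) are hypothetical.

```python
BASE_FACTS = [
    ("graduatedFrom", "Surajit", "Princeton"),
    ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"),
    ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"),
    ("worksAt", "Jeff", "Stanford"),
]
# Soft rule as (head, body); terms starting with "?" are variables.
RULES = [
    (("graduatedFrom", "?x", "?z"),
     [("hasAdvisor", "?x", "?a"), ("worksAt", "?a", "?z")]),
]


def is_var(term):
    return term.startswith("?")


def substitute(atom, binding):
    return (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals the ground `fact`, else None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    result = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        p = result.get(p, p)
        if is_var(p):
            result[p] = f
        elif p != f:
            return None
    return result


def bind_head(head, goal):
    """Bind head variables to the constants of `goal`; goal variables stay open."""
    if head[0] != goal[0] or len(head) != len(goal):
        return None
    binding = {}
    for h, g in zip(head[1:], goal[1:]):
        if is_var(g):
            continue
        if is_var(h):
            if binding.setdefault(h, g) != g:
                return None
        elif h != g:
            return None
    return binding


def solve(goal, depth, grounded):
    """Yield ground facts matching `goal`; record grounded rules in `grounded`."""
    for fact in BASE_FACTS:
        if match(goal, fact, {}) is not None:
            yield fact
    if depth == 0:
        return  # depth bound replaces tabling in this sketch
    for head, body in RULES:
        binding = bind_head(head, goal)
        if binding is None:
            continue
        for full in solve_body(body, binding, depth, grounded):
            ground_head = substitute(head, full)
            grounded.add((tuple(substitute(a, full) for a in body), ground_head))
            yield ground_head


def solve_body(body, binding, depth, grounded):
    """Resolve the body atoms left to right, yielding complete bindings."""
    if not body:
        yield binding
        return
    subgoal = substitute(body[0], binding)
    for fact in solve(subgoal, depth - 1, grounded):
        extended = match(subgoal, fact, binding)
        if extended is not None:
            yield from solve_body(body[1:], extended, depth, grounded)


def competitor_set(fact, depth, grounded):
    """Ground the functional hard rule: all facts sharing predicate and subject."""
    return set(solve((fact[0], fact[1], "?other"), depth, grounded))
```

For example, `set(solve(("graduatedFrom", "Surajit", "?y"), 2, grounded))` collects the answers for the query, fills `grounded` with the instantiated soft rule, and `competitor_set` then issues the subquery that grounds the mutex constraint for a given answer.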
Weighted MaxSAT Algorithm
General idea: Compute a potential function W_t while iterating over all hard rules S_t; in each S_t, set the fact f ∈ S_t that maximizes W_t (or none of them) to true, and set all other facts in S_t to false.
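At iteration 0, we have W_0 = ∑_{clauses C} w(C) · P(C is satisfied).
At any intermediate iteration t, we compare W_t^{f+} = ∑_{clauses C} w(C) · P(C is sat. | f = true), for each fact f ∈ S_t, with W_t^{S-} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false).
At the final iteration t_max, all facts are assigned either true or false.
W_{t_max} is equal to the total weight of all clauses that are satisfied.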
Step 1
S: Mutex constraints
  { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
  { gradFrom(David, Stanford), gradFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford))   0.4
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford))   0.4
  worksAt(Jeff, Stanford)   0.9
  hasAdvisor(Surajit, Jeff)   0.8
  hasAdvisor(David, Jeff)   0.7
  gradFrom(Surajit, Princeton)   0.6
  gradFrom(Surajit, Stanford)   0.7
  gradFrom(David, Princeton)   0.9
Weights w(f_i) and probabilities p_i
  Fact f_i                        w(f_i)   p_i
  gradFrom(Surajit, Stanford)     0.7      1.0
  gradFrom(Surajit, Princeton)    0.6      0.0
  gradFrom(David, Stanford)       0.0      0.0
  gradFrom(David, Princeton)      0.9      1.0
  worksAt(Jeff, Stanford)         0.9      1.0
  hasAdvisor(Surajit, Jeff)       0.8      1.0
  hasAdvisor(David, Jeff)         0.7      1.0
Step 2
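C1: ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)
P(C1) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
P(C1) is one minus the product of the probabilities that each literal is false. Probability that each literal is satisfied:
  ¬hasAdvisor(Surajit, Jeff): single partition, negated: 1 - p_i
  ¬worksAt(Jeff, Stanford): single partition, negated: 1 - p_i
  gradFrom(Surajit, Stanford): single partition, positive: p_i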
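The computation for C1 follows a general pattern that can be written as a formula. This is a sketch under the assumption, suggested by the "single partition" annotations, that every literal of a clause C lies in a different mutex partition, so the literals can be treated as independent; the symbols C^{+} and C^{-} for the sets of positive and negated facts of C are introduced here for illustration.

```latex
\[
P(C \text{ is satisfied})
  \;=\; 1 \;-\; \prod_{f_i \in C^{+}} \bigl(1 - p_i\bigr)
        \prod_{f_j \in C^{-}} \bigl(1 - (1 - p_j)\bigr)
\]
```

For C1, the two negated facts and the positive fact all have probability 1, which gives 1 - (1-(1-1))(1-(1-1))(1-1) = 1 as above.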
C2: ¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford)
P(C1 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
P(C2 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
...
W0 = 0.4 + 0.9 + 0.8 + 0.7 + 0.6 + 0.7 + 0.9 = 5.0
Step 3
P(C1 is satisfied | f1 = true) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
P(C1 is satisfied | f2 = true) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
...
W1 = 0.4 + 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.8
W2 = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4
Weights w(f_i), probabilities p_i, truth values
  Fact f_i                        w(f_i)   p_i   Truth value
  gradFrom(Surajit, Stanford)     0.7      1.0   true
  gradFrom(Surajit, Princeton)    0.6      0.0   false
  gradFrom(David, Stanford)       0.0      0.0   false
  gradFrom(David, Princeton)      0.9      1.0   true
  worksAt(Jeff, Stanford)         0.9      1.0   true
  hasAdvisor(Surajit, Jeff)       0.8      1.0   true
  hasAdvisor(David, Jeff)         0.7      1.0   true
Experiments – Setup
YAGO Knowledge Base
  2 million entities, 20 million facts
Soft Rules
  16 soft rules (hand-crafted deduction rules with weights)
Hard Rules
  5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
Queries
  10 conjunctive SPARQL queries
Markov Logic as Competitor (based on MCMC)
  MAP inference: Alchemy employs a form of MaxWalkSAT
  MC-SAT: Iterative MaxSAT & Gibbs sampling
YAGO Knowledge Base: URDF vs. Markov Logic
URDF: SLD grounding & MaxSAT solving
  |C|: number of ground literals in soft rules
  |S|: number of ground literals in hard rules
URDF vs. Markov Logic (MAP inference & MC-SAT)
• First run: ground each query against the rules (SLD grounding + MaxSAT solving) & report sum of runtimes
• Asymptotic runtime checks: synthetic soft rule expansions
Recursive Rules & LUBM Benchmark
42 inductively learned (partly recursive) rules over 20 million facts in YAGO
URDF grounding with different maximum SLD levels
URDF (SLD grounding + MaxSAT) vs. Jena (only grounding) over the LUBM benchmark
  SF-1: 103,397 triples
  SF-5: 646,128 triples
  SF-10: 1,316,993 triples
Current & Future Topics...
Temporal consistency reasoning
  Soft/hard rules with temporal predicates
  Soft deduction rules: deduce confidence distribution of derived facts
Learning soft rules & consistency constraints
  Explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
More solving/sampling
  Linear-time constrained & weighted MaxSAT solver
  Improved Gibbs sampling with soft & hard rules
Scale-out
  Distributed grounding via message passing
Updates/versioning for (linked) RDF data
  Non-monotonic answers for rules with negation!
Online Demo!
urdf.mpi-inf.mpg.de