URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases

Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald

YAGO/DBpedia et al.: >120 M facts for YAGO2 (mostly from Wikipedia infoboxes)

bornOn(Jeff, 09/22/42)
gradFrom(Jeff, Columbia)
hasAdvisor(Jeff, Arthur)
hasAdvisor(Surajit, Jeff)
knownFor(Jeff, Theory)

Information Extraction: new fact candidates, hundreds of millions of additional facts from Wikipedia text

type(Jeff, Author) [0.9]
author(Jeff, Drag_Book) [0.8]
author(Jeff, Cind_Book) [0.6]
worksAt(Jeff, Bell_Labs) [0.7]
type(Jeff, CEO) [0.4]

Outline

• Motivation & Problem Setting
  – URDF running example: people graduating from universities
• Efficient MAP Inference
  – MaxSAT solving with soft & hard constraints
• Grounding
  – Deductive grounding of soft rules (SLD resolution)
  – Iterative grounding of hard rules (closure)
• MaxSAT Algorithm
  – MaxSAT algorithm in 3 steps
• Experiments & Future Work

URDF: Uncertain RDF Data Model

Extensional Layer (information extraction & integration)
• High-confidence facts: existing knowledge base (“ground truth”)
• New fact candidates: extracted facts with confidence values
• Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivalentProperty)
• Result: a large “Uncertain Database” of RDF facts

Intensional Layer (query-time inference)
• Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
• Hard rules: consistency constraints (more general FOL rules)
• Propositional & probabilistic consistency reasoning
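The extensional layer can be pictured as a map from RDF facts to confidence values. The sketch below is only an illustration: the `Fact` class, the `merge` helper, and the choice to give existing knowledge-base facts confidence 1.0 are assumptions, not part of URDF itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One RDF-style fact: predicate(subject, obj)."""
    predicate: str
    subject: str
    obj: str


# Existing knowledge base: treated as "ground truth" (assumed confidence 1.0).
kb = {
    Fact("hasAdvisor", "Surajit", "Jeff"): 1.0,
    Fact("gradFrom", "Jeff", "Columbia"): 1.0,
}

# New fact candidates from information extraction, with confidence values.
candidates = {
    Fact("type", "Jeff", "Author"): 0.9,
    Fact("worksAt", "Jeff", "Bell_Labs"): 0.7,
    Fact("type", "Jeff", "CEO"): 0.4,
}


def merge(kb_facts, candidate_facts):
    """Build one uncertain fact store; KB facts override candidates."""
    merged = dict(candidate_facts)
    merged.update(kb_facts)
    return merged


uncertain_db = merge(kb, candidates)
```

Letting the knowledge base override candidates is one possible integration policy; other policies (for example, keeping the maximum confidence) would fit the same structure.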


Soft Rules vs. Hard Rules

(Soft) Deduction Rules
• People may live in more than one place
  livesIn(x,y) ∧ marriedTo(x,z) → livesIn(z,y) [0.8]
  livesIn(x,y) ∧ hasChild(x,z) → livesIn(z,y) [0.5]

(Hard) Consistency Constraints
• People are not born in different places/on different dates
  bornIn(x,y) ∧ bornIn(x,z) → y=z
• People are not married to more than one person (at the same time, in most countries?)
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)



• Rule-based (deductive) reasoning: Datalog, RDF/S, OWL2-RL, etc.
• FOL constraints (in particular mutex): Datalog with constraints, X-tuples in probabilistic databases, owl:FunctionalProperty, etc.

URDF Running Example

KB: RDF Base Facts
[Figure: RDF graph over the entities Jeff, Surajit, and David (each of type Computer Scientist [1.0]) and Stanford and Princeton (each of type University [1.0]), with edges worksAt [0.9], hasAdvisor [0.8], hasAdvisor [0.7], and graduatedFrom [0.6], [0.7], [0.9].]

Derived Facts
gradFrom(Surajit, Stanford) [?]
gradFrom(David, Stanford) [?]

First-Order Rules
hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
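To make the two first-order rules concrete, the following sketch applies the soft rule to the base facts and then groups graduatedFrom facts by subject, which yields the sets of facts that the hard rule makes mutually exclusive. It uses forward application purely for illustration (URDF grounds top-down at query time), and the helper names `apply_advisor_rule` and `mutex_sets` are hypothetical.

```python
base_facts = {
    ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"),
    ("worksAt", "Jeff", "Stanford"),
    ("graduatedFrom", "Surajit", "Princeton"),
    ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"),
}


def apply_advisor_rule(facts):
    """hasAdvisor(x,y) AND worksAt(y,z) -> graduatedFrom(x,z)."""
    derived = set()
    for pred1, x, y in facts:
        if pred1 != "hasAdvisor":
            continue
        for pred2, y2, z in facts:
            if pred2 == "worksAt" and y2 == y:
                derived.add(("graduatedFrom", x, z))
    return derived


def mutex_sets(facts, predicate="graduatedFrom"):
    """Group objects of a functional predicate by subject; keep real conflicts."""
    groups = {}
    for pred, subj, obj in facts:
        if pred == predicate:
            groups.setdefault(subj, set()).add(obj)
    return {subj: objs for subj, objs in groups.items() if len(objs) > 1}


derived_facts = apply_advisor_rule(base_facts)
conflicts = mutex_sets(base_facts | derived_facts)
```

Here `derived_facts` contains the two derived graduatedFrom facts, and `conflicts` holds one mutex set each for Surajit and David.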


Basic Types of Inference

Maximum-A-Posteriori (MAP) Inference
• Find the most likely assignment to query variables y under a given evidence x.
• Compute: arg max_y P(y | x) (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)

Marginal/Success Probabilities
• Probability that query y is true in a random world under a given evidence x.
• Compute: ∑_y P(y | x) (#P-hard for propositional formulas)
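The difference between the two inference types shows up already on two facts. The sketch below enumerates all possible worlds by brute force, treats the mutex constraint as evidence, and reports both the MAP world and a marginal. The probabilities 0.7 and 0.6 and the independence of the two facts are assumptions made only for this illustration.

```python
from itertools import product

STANFORD = "gradFrom(Surajit, Stanford)"
PRINCETON = "gradFrom(Surajit, Princeton)"

# Hypothetical, independent fact probabilities (assumption for illustration).
probs = {STANFORD: 0.7, PRINCETON: 0.6}


def world_prob(world):
    """Probability of one truth assignment under independence."""
    result = 1.0
    for fact, value in world.items():
        result *= probs[fact] if value else 1.0 - probs[fact]
    return result


def consistent(world):
    """Evidence: a person did not graduate from both universities."""
    return not (world[STANFORD] and world[PRINCETON])


worlds = [dict(zip(probs, bits)) for bits in product([True, False], repeat=len(probs))]
evidence_worlds = [w for w in worlds if consistent(w)]
normalizer = sum(world_prob(w) for w in evidence_worlds)

# MAP inference: the single most likely consistent world.
map_world = max(evidence_worlds, key=world_prob)

# Marginal: total conditional probability of the worlds where the query fact holds.
marginal_stanford = sum(world_prob(w) for w in evidence_worlds if w[STANFORD]) / normalizer
```

Enumeration is exponential in the number of facts, which is why the slides turn to MaxSAT approximation for MAP inference.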


General Route: Grounding & MaxSAT Solving

Query: graduatedFrom(x, y)

CNF (clause, weight)
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   1000
∧ (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))   1000
∧ (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
∧ worksAt(Jeff, Stanford)   0.9
∧ hasAdvisor(Surajit, Jeff)   0.8
∧ hasAdvisor(David, Jeff)   0.7
∧ graduatedFrom(Surajit, Princeton)   0.6
∧ graduatedFrom(Surajit, Stanford)   0.7
∧ graduatedFrom(David, Princeton)   0.9

1) Grounding
   – Consider only facts (and rules) which are relevant for answering the query
2) Propositional formula in CNF, consisting of
   – Grounded hard & soft rules
   – Uncertain base facts
3) Propositional Reasoning
   – Find truth assignment to facts such that the total weight of the satisfied clauses is maximized

MAP inference: compute “most likely” possible world
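For a formula this small, step 3 can be checked by exhaustive search. The sketch below encodes the weighted CNF of the running example, with each clause as a list of (fact, is_positive) literals, and tries all 2^7 truth assignments. It is a brute-force baseline for illustration, not the URDF solver; the short variable names are hypothetical.

```python
from itertools import product

SS, SP = "gradFrom(Surajit, Stanford)", "gradFrom(Surajit, Princeton)"
DS, DP = "gradFrom(David, Stanford)", "gradFrom(David, Princeton)"
WA = "worksAt(Jeff, Stanford)"
HS, HD = "hasAdvisor(Surajit, Jeff)", "hasAdvisor(David, Jeff)"

# (literals, weight); a literal is (fact, True) if positive, (fact, False) if negated.
clauses = [
    ([(SS, False), (SP, False)], 1000.0),          # mutex for Surajit
    ([(DS, False), (DP, False)], 1000.0),          # mutex for David
    ([(HS, False), (WA, False), (SS, True)], 0.4),  # grounded soft rule
    ([(HD, False), (WA, False), (DS, True)], 0.4),  # grounded soft rule
    ([(WA, True)], 0.9),
    ([(HS, True)], 0.8),
    ([(HD, True)], 0.7),
    ([(SP, True)], 0.6),
    ([(SS, True)], 0.7),
    ([(DP, True)], 0.9),
]

facts = sorted({fact for literals, _ in clauses for fact, _ in literals})


def satisfied(literals, world):
    """A clause holds if at least one literal matches the world."""
    return any(world[fact] == positive for fact, positive in literals)


def total_weight(world):
    """Sum of the weights of all satisfied clauses."""
    return sum(w for literals, w in clauses if satisfied(literals, world))


all_worlds = (dict(zip(facts, bits)) for bits in product([False, True], repeat=len(facts)))
best_world = max(all_worlds, key=total_weight)
best_weight = total_weight(best_world)
```

The search keeps gradFrom(Surajit, Stanford) and gradFrom(David, Princeton) and rejects the other two graduatedFrom facts, for a soft-clause weight of 4.4 on top of the two satisfied hard clauses.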

Why are high weights for hard rules not enough?

Consider the following CNF (for A, B > 0, A >> B):

  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   A
∧ graduatedFrom(Surajit, Princeton)   0
∧ graduatedFrom(Surajit, Stanford)   B

• The optimal solution has weight A+B.
• The next-best solution has weight A+0.
• Hence the ratio of the optimal over the approximate solution is (A+B)/A.

In general, any (1+ε) approximation algorithm, with ε > 0, may set graduatedFrom(Surajit, Princeton) to true, as (A+B)/A → 1 for A → ∞.
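The argument can be checked numerically. The sketch below computes the ratio (A+B)/A for growing A and, for a given ε, the hard-rule weight from which the suboptimal solution already falls within the (1+ε) guarantee. The function names and the sample values for B and ε are hypothetical.

```python
def approximation_ratio(a, b):
    """Ratio of the optimal weight A+B over the next-best weight A."""
    return (a + b) / a


def hard_weight_threshold(b, eps):
    """Smallest A with (A+B)/A <= 1+eps, i.e. A >= B/eps."""
    return b / eps


B = 0.7     # hypothetical soft weight
EPS = 0.01  # hypothetical approximation slack

ratios = {a: approximation_ratio(a, B) for a in (10.0, 1000.0, 100000.0)}
threshold = hard_weight_threshold(B, EPS)
```

With these sample values, any hard weight A of about 70 or more lets a (1+0.01)-approximation return the wrong fact, so raising A makes the problem worse rather than better.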

URDF: MaxSAT Solving with Soft & Hard Rules

Find: arg max_y P(y | x), which resolves to a variant of MaxSAT for propositional formulas.

Special case: Horn clauses as soft rules & mutex constraints as hard rules

S: Mutex-constraints
{ graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
{ graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }

C: Weighted Horn clauses (CNF)
(¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
(¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
worksAt(Jeff, Stanford)   0.9
hasAdvisor(Surajit, Jeff)   0.8
hasAdvisor(David, Jeff)   0.7
graduatedFrom(Surajit, Princeton)   0.6
graduatedFrom(Surajit, Stanford)   0.7
graduatedFrom(David, Princeton)   0.9

MaxSAT Alg.

Compute W_0 = ∑_{clauses C} w(C) · P(C is satisfied);
For each hard constraint S_t {
    For each fact f in S_t {
        Compute W_t^{f+} = ∑_{clauses C} w(C) · P(C is sat. | f = true);
    }
    Compute W_t^{S-} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false);
    Choose truth assignment to f in S_t that maximizes W_t^{f+}, W_t^{S-};
    Remove satisfied clauses C;
    t++;
}

• Runtime: O(|S||C|)
• Approximation guarantee of 1/2
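The greedy loop above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it recomputes the potential from scratch instead of removing satisfied clauses, and the initial probabilities (1.0 for facts outside any mutex set, 0.5 for facts inside one) are assumptions that the slides do not specify.

```python
def clause_sat_prob(literals, p):
    """P(clause is satisfied), assuming independent facts."""
    unsat = 1.0
    for fact, positive in literals:
        lit_true = p[fact] if positive else 1.0 - p[fact]
        unsat *= 1.0 - lit_true
    return 1.0 - unsat


def potential(clauses, p):
    """W = sum over clauses of w(C) * P(C is satisfied)."""
    return sum(w * clause_sat_prob(literals, p) for literals, w in clauses)


def greedy_maxsat(mutex_sets, clauses, p):
    """For each mutex set, fix one fact (or none) to true, whichever maximizes W."""
    p = dict(p)
    for s_t in mutex_sets:
        best_w, best_p = None, None
        for chosen in list(s_t) + [None]:  # None means: all facts in S_t false
            trial = dict(p)
            for fact in s_t:
                trial[fact] = 1.0 if fact == chosen else 0.0
            w = potential(clauses, trial)
            if best_w is None or w > best_w:
                best_w, best_p = w, trial
        p = best_p
    return p


SS, SP = "gradFrom(Surajit, Stanford)", "gradFrom(Surajit, Princeton)"
DS, DP = "gradFrom(David, Stanford)", "gradFrom(David, Princeton)"
WA = "worksAt(Jeff, Stanford)"
HS, HD = "hasAdvisor(Surajit, Jeff)", "hasAdvisor(David, Jeff)"

clauses = [
    ([(HS, False), (WA, False), (SS, True)], 0.4),
    ([(HD, False), (WA, False), (DS, True)], 0.4),
    ([(WA, True)], 0.9), ([(HS, True)], 0.8), ([(HD, True)], 0.7),
    ([(SP, True)], 0.6), ([(SS, True)], 0.7), ([(DP, True)], 0.9),
]
mutex_sets = [[SS, SP], [DS, DP]]

# Assumed starting probabilities (not given in the slides).
initial_p = {WA: 1.0, HS: 1.0, HD: 1.0, SS: 0.5, SP: 0.5, DS: 0.5, DP: 0.5}

assignment = greedy_maxsat(mutex_sets, clauses, initial_p)
final_weight = potential(clauses, assignment)
```

Under these assumptions the loop selects gradFrom(Surajit, Stanford) and gradFrom(David, Princeton) and ends with a total satisfied weight of 4.4. Each iteration scans all clauses once per candidate, in line with the O(|S||C|) runtime stated above.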

Deductive Grounding Algorithm (SLD Resolution/Datalog)

Query: graduatedFrom(Surajit, y)

First-Order Rules
hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z

Base Facts
graduatedFrom(Surajit, Princeton) [0.7]
graduatedFrom(Surajit, Stanford) [0.6]
graduatedFrom(David, Princeton) [0.9]
hasAdvisor(Surajit, Jeff) [0.8]
hasAdvisor(David, Jeff) [0.7]
worksAt(Jeff, Stanford) [0.9]
type(Princeton, University) [1.0]
type(Stanford, University) [1.0]
type(Jeff, Computer_Scientist) [1.0]
type(Surajit, Computer_Scientist) [1.0]
type(David, Computer_Scientist) [1.0]

[Figure: SLD resolution tree for the query, with nodes graduatedFrom(Surajit, Princeton), graduatedFrom(Surajit, Stanford), and the conjunction hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford).]

Grounded Rules
hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) → gradFrom(Surajit, Stanford)
¬gradFrom(Surajit, Stanford) ∨ ¬gradFrom(Surajit, Princeton)
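Top-down grounding of the soft rule can be sketched as a small backward chainer. The code below is a minimal SLD-style resolver over tuples, assuming variables are strings that start with "?"; it bounds the number of rule applications with a depth limit instead of tabling, and all function names are hypothetical.

```python
from itertools import count

FACTS = [
    ("graduatedFrom", "Surajit", "Princeton"),
    ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"),
    ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"),
    ("worksAt", "Jeff", "Stanford"),
]

# (head, body): hasAdvisor(x,y) AND worksAt(y,z) -> graduatedFrom(x,z)
RULES = [(("graduatedFrom", "?x", "?z"),
          [("hasAdvisor", "?x", "?y"), ("worksAt", "?y", "?z")])]

_fresh = count()


def is_var(term):
    return term.startswith("?")


def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Unify two atoms; return an extended substitution or None."""
    if len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


def rename(rule):
    """Give the rule fresh variable names for each application."""
    n = next(_fresh)
    head, body = rule

    def fresh_atom(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)

    return fresh_atom(head), [fresh_atom(a) for a in body]


def solve(goals, subst, depth):
    """Resolve goals left to right against facts first, then rule heads."""
    if not goals:
        yield subst
        return
    goal, rest = goals[0], goals[1:]
    for fact in FACTS:
        s = unify(goal, fact, subst)
        if s is not None:
            yield from solve(rest, s, depth)
    if depth > 0:
        for rule in RULES:
            head, body = rename(rule)
            s = unify(goal, head, subst)
            if s is not None:
                yield from solve(body + rest, s, depth - 1)


query = ("graduatedFrom", "Surajit", "?y")
answers = {walk("?y", s) for s in solve([query], {}, depth=2)}
```

The query yields Princeton (base fact) and Stanford (both as a base fact and through the advisor rule), matching the grounded rules listed above.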

Dependency Graph of a Query

• SLD grounding always starts from a query literal and first proceeds through the soft deduction rules.
• Grounding is also iterated over the hard rules in a top-down fashion, using the literals in each hard rule as new subqueries.
• Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog.
• Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
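The closure condition can be illustrated with a worklist over ground atoms. The sketch below is a strong simplification of the procedure described above: it works on already grounded rules, and a visited set stands in for tabling, so that cyclic dependencies (such as the two members of a mutex set pointing at each other) are expanded only once. The function and parameter names are hypothetical.

```python
from collections import deque


def dependency_closure(query_atoms, rule_bodies, mutex_sets):
    """Collect all ground atoms reachable from the query via soft and hard rules."""
    seen = set(query_atoms)
    queue = deque(query_atoms)
    while queue:
        atom = queue.popleft()
        neighbours = set()
        # Soft rules: the body atoms of every grounded rule deriving this atom.
        for body in rule_bodies.get(atom, []):
            neighbours.update(body)
        # Hard rules: every atom that shares a mutex set with this atom.
        for mutex in mutex_sets:
            if atom in mutex:
                neighbours.update(mutex)
        for other in neighbours - seen:  # already seen atoms are not re-expanded
            seen.add(other)
            queue.append(other)
    return seen


SS, SP = "gradFrom(Surajit, Stanford)", "gradFrom(Surajit, Princeton)"
HS, WA = "hasAdvisor(Surajit, Jeff)", "worksAt(Jeff, Stanford)"

rule_bodies = {SS: [[HS, WA]]}  # grounded soft rule: HS AND WA -> SS
mutex_sets = [{SS, SP}]         # grounded hard rule

closure = dependency_closure([SP], rule_bodies, mutex_sets)
```

Starting from gradFrom(Surajit, Princeton) alone, the hard rule pulls in the competing Stanford fact, whose soft rule in turn pulls in the advisor and employment facts; the loop stops once nothing new is added.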


Weighted MaxSAT Algorithm

General idea: Compute a potential function W_t that iterates over all hard rules S_t, and set the fact f ∈ S_t that maximizes W_t (or none of them) to true; set all other facts in S_t to false.

• At iteration 0, we have W_0 = ∑_{clauses C} w(C) · P(C is satisfied).
• At any intermediate iteration t, we compare W_t^{f+} = ∑_{clauses C} w(C) · P(C is sat. | f = true) for each fact f ∈ S_t with W_t^{S-} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false).
• At the final iteration t_max, all facts are assigned either true or false.
• W_{t_max} is equal to the total weight of all clauses that are satisfied.

Step 1

Weights w(fi) and probabilities pi

S: Mutex-constraints
{ gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
{ gradFrom(David, Stanford), gradFrom(David, Princeton) }

C: Weighted Horn clauses (CNF)
(¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford))   0.4
(¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford))   0.4
worksAt(Jeff, Stanford)   0.9
hasAdvisor(Surajit, Jeff)   0.8
hasAdvisor(David, Jeff)   0.7
gradFrom(Surajit, Princeton)   0.6
gradFrom(Surajit, Stanford)   0.7
gradFrom(David, Princeton)   0.9

Fact fi                         w(fi)   pi
gradFrom(Surajit, Stanford)     0.7     1.0
gradFrom(Surajit, Princeton)    0.6     0.0
gradFrom(David, Stanford)       0.0     0.0
gradFrom(David, Princeton)      0.9     1.0
worksAt(Jeff, Stanford)         0.9     1.0
hasAdvisor(Surajit, Jeff)       0.8     1.0
hasAdvisor(David, Jeff)         0.7     1.0



Step 2

Weights w(fi) and probabilities pi

C1: ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)

P(C1 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
• First factor, ¬hasAdvisor(Surajit, Jeff): single partition, negated: 1 - pi
• Second factor, ¬worksAt(Jeff, Stanford): single partition, negated: 1 - pi
• Third factor, gradFrom(Surajit, Stanford): single partition, positive: pi

P(C2 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
...

W0 = 0.4 + 0.9 + 0.8 + 0.7 + 0.6 + 0.7 + 0.9 = 5.0
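The clause probabilities in Step 2 follow one pattern: a clause is unsatisfied only if every literal is false, so P(C is satisfied) is one minus the product of the per-literal failure probabilities. The sketch below reproduces the two values above from the pi column of the Step 1 table; it assumes independent facts, and the function name is hypothetical.

```python
def clause_sat_prob(literals, p):
    """1 - product over literals of P(literal is false)."""
    unsat = 1.0
    for fact, positive in literals:
        # A positive literal is true with probability pi, a negated one with 1 - pi.
        lit_true = p[fact] if positive else 1.0 - p[fact]
        unsat *= 1.0 - lit_true
    return 1.0 - unsat


# Probabilities pi taken from the Step 1 table.
p = {
    "hasAdvisor(Surajit, Jeff)": 1.0,
    "hasAdvisor(David, Jeff)": 1.0,
    "worksAt(Jeff, Stanford)": 1.0,
    "gradFrom(Surajit, Stanford)": 1.0,
    "gradFrom(David, Stanford)": 0.0,
}

c1 = [("hasAdvisor(Surajit, Jeff)", False),
      ("worksAt(Jeff, Stanford)", False),
      ("gradFrom(Surajit, Stanford)", True)]
c2 = [("hasAdvisor(David, Jeff)", False),
      ("worksAt(Jeff, Stanford)", False),
      ("gradFrom(David, Stanford)", True)]

p_c1 = clause_sat_prob(c1, p)
p_c2 = clause_sat_prob(c2, p)
```

With these inputs `p_c1` evaluates to 1 and `p_c2` to 0, as on the slide.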

Step 3

Weights w(fi), probabilities pi, truth values

P(C1 is satisfied | f1=true) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
P(C1 is satisfied | f2=true) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
...

W1 = 0.4 + 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.8
W2 = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4

Fact fi                         truth value
gradFrom(Surajit, Stanford)     true
gradFrom(Surajit, Princeton)    false
gradFrom(David, Stanford)       false
gradFrom(David, Princeton)      true
worksAt(Jeff, Stanford)         true
hasAdvisor(Surajit, Jeff)       true
hasAdvisor(David, Jeff)         true

Experiments – Setup

• YAGO Knowledge Base
  – 2 million entities, 20 million facts
• Soft Rules
  – 16 soft rules (hand-crafted deduction rules with weights)
• Hard Rules
  – 5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
• Queries
  – 10 conjunctive SPARQL queries
• Markov Logic as Competitor (based on MCMC)
  – MAP inference: Alchemy employs a form of MaxWalkSAT
  – MC-SAT: Iterative MaxSAT & Gibbs sampling

YAGO Knowledge Base: URDF vs. Markov Logic

URDF: SLD grounding & MaxSAT solving
• |C|: number of ground literals in soft rules
• |S|: number of ground literals in hard rules

URDF vs. Markov Logic (MAP inference & MC-SAT)
• First run: ground each query against the rules (SLD grounding + MaxSAT solving) & report sum of runtimes
• Asymptotic runtime checks: synthetic soft rule expansions

Recursive Rules & LUBM Benchmark

• 42 inductively learned (partly recursive) rules over 20 million facts in YAGO
• URDF grounding with different maximum SLD levels
• URDF (SLD grounding + MaxSAT) vs. Jena (only grounding) over the LUBM benchmark
  – SF-1: 103,397 triples
  – SF-5: 646,128 triples
  – SF-10: 1,316,993 triples

Current & Future Topics...

• Temporal consistency reasoning
  – Soft/hard rules with temporal predicates
  – Soft deduction rules: deduce confidence distribution of derived facts
• Learning soft rules & consistency constraints
  – Explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
• More solving/sampling
  – Linear-time constrained & weighted MaxSAT solver
  – Improved Gibbs sampling with soft & hard rules
• Scale-out
  – Distributed grounding via message passing
• Updates/versioning for (linked) RDF data
  – Non-monotonic answers for rules with negation!

Online Demo!

urdf.mpi-inf.mpg.de



http://infao5501.ag5.mpi-sb.mpg.de:8080/urdf/UViz.html
