Page 1:

Learning and Reasoning for AI

Luc De Raedt [email protected]

Page 2:

Roadmap

• Prob. Programming - Modeling

• Inference

• Learning

• Dynamics

• KBMC & Markov Logic

• DeepProbLog

• From StarAI to NeSy

... with some detours on the way

Page 3:

Part V: KBMC, Markov Logic

3

Page 4:

A key question in AI: Dealing with uncertainty

Reasoning with relational data

Learning

Statistical relational learning & Probabilistic programming, ...

?
• logic • databases • programming • ...

• probability theory • graphical models • ...

• parameters • structure

4

so far

Page 5:

A key question in AI: Dealing with uncertainty

Reasoning with relational data

Learning

Statistical relational learning & Probabilistic programming, ...

?
• logic • databases • programming • ...

• probability theory • graphical models • ...

• parameters • structure

5

next

Page 6:

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Flexible and Compact Relational Model for Predicting Grades

“Program” Abstraction:
▪ S, C: logical variables representing students and courses
▪ the set of individuals of a type is called a population
▪ Int(S), Grade(S, C), D(C) are parametrized random variables

Grounding:
• for every student s, there is a random variable Int(s)
• for every course c, there is a random variable D(c)
• for every (s, c) pair, there is a random variable Grade(s,c)
• all instances share the same structure and parameters

Page 7:

7

ProbLog by example: Grading

Shows relational structure

grounded model: replace variables by constants

Works for any number of students / classes (for 1000 students and 100 classes, you get 101,100 random variables); still only a few parameters

With SRL / PP:
• build and learn compact models
• transfer from one set of individuals to other sets
• reason also about exchangeability
• build even more complex models
• incorporate background knowledge
(a small ProbLog sketch follows)
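To make the grounding concrete, here is a minimal ProbLog-style sketch of such a grading model (the predicates and all probabilities are illustrative assumptions, not the lecture's actual model); every (student, course) pair gets its own ground random variables while sharing the same handful of parameters:

% illustrative parameters only
0.4::int(S) :- student(S).                 % Int(S): the student is intelligent
0.3::diff(C) :- course(C).                 % D(C): the course is difficult
0.9::good_grade(S,C) :- takes(S,C), int(S), \+ diff(C).
0.5::good_grade(S,C) :- takes(S,C), int(S), diff(C).
0.2::good_grade(S,C) :- takes(S,C), \+ int(S).

student(anna). student(bob). course(ai).
takes(anna,ai). takes(bob,ai).

query(good_grade(anna,ai)).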

Page 8:

Lots of proposals in the literature, e.g.
• relational Markov networks (RMNs) [Taskar et al 2002]

• Markov logic networks (MLNs) [Richardson & Domingos 2006]

• probabilistic soft logic (PSL) [Broecheler et al 2010]

• FACTORIE [McCallum et al 2009]

• Bayesian logic programs (BLPs) [Kersting & De Raedt 2001]

• relational Bayesian networks (RBNs) [Jaeger 2002]

• logical Bayesian networks (LBNs) [Fierens et al 2005]

• probabilistic relational models (PRMs) [Koller & Pfeffer 1998]

• Bayesian logic (BLOG) [Milch et al 2005]

• CLP(BN) [Santos Costa et al 2008]

• and many more ...

8

Page 9:

Probabilistic Relational Models (PRMs)

9

[Figure: PRM class diagram — a Person with attributes M-chromosome, P-chromosome and Bloodtype; the Person's chromosomes depend on those of the (Father) and (Mother) Person, with a CPD table attached. [Getoor, Koller, Pfeffer]]

Page 10:

Probabilistic Relational Models (PRMs)

10

[Figure: the same PRM class diagram as on the previous slide. [Getoor, Koller, Pfeffer]]

View (random variables and relations):
bt(Person)=BT. pc(Person)=PC. mc(Person)=MC.
father(Father,Person). mother(Mother,Person).

Dependencies (CPDs associated with):
bt(Person)=BT | pc(Person)=PC, mc(Person)=MC.
pc(Person)=PC | pc_father(Father)=PCf, mc_father(Father)=MCf.
pc_father(Person)=PCf | father(Father,Person), pc(Father)=PC. ...

Page 11:

Probabilistic Relational Models (PRMs) Bayesian Logic Programs (BLPs)

11

Extension:
father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).

Intension:
bt(Person)=BT | pc(Person)=PC, mc(Person)=MC.
pc(Person)=PC | pc_father(Person)=PCf, mc_father(Person)=MCf.
mc(Person)=MC | pc_mother(Person)=PCm, mc_mother(Person)=MCm.
pc_father(Person)=PCf | father(Father,Person), pc(Father)=PC. ...

[Figure: the induced ground Bayesian network — one mc, pc and bt node (random variable + state) for each of rex, ann, fred, brian, utta, doro and henry.]

Page 12:

Answering Queries

12

[Figure: the ground Bayesian network over the mc, pc and bt nodes of rex, ann, fred, brian, utta, doro and henry.]

P(bt(ann)) ?

Support Network

Page 13:

Answering Queries

13

[Figure: the same ground Bayesian network, with the support network for the query highlighted.]

P(bt(ann), bt(fred)) ?

P(bt(ann) | bt(fred)) = P(bt(ann), bt(fred)) / P(bt(fred))

Bayes' rule

Page 14:

Combining Rules

• A student reads two books

• Typical combining rules: noisy-or, noisy-max,

• ...

14

A combining rule combines the CPDs P(A|B) and P(A|C) into a single CPD P(A|B,C)

prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
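In ProbLog this noisy-or combination falls out of the distribution semantics, because every ground instance of a probabilistic clause acts as an independent cause. A minimal sketch (the 0.8 probability and the constants are illustrative assumptions):

0.8::prepared(S,T) :- read(S,B), discusses(B,T).

read(ann,bn_book). read(ann,lp_book).
discusses(bn_book,srl). discusses(lp_book,srl).

% two independent causes: P(prepared(ann,srl)) = 1 - (1-0.8)*(1-0.8) = 0.96
query(prepared(ann,srl)).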

Page 15:

Knowledge Based Model Construction

Extension + Intension => Probabilistic Model

Advantages

same intension used for multiple extensions

parameters are being shared / tied together

unification is essential

• learning becomes feasible

• max. likelihood parameter estimation & structure learning

15

Page 16:

Bayesian Logic Programs

16

% apriori nodes
nat(0).
% aposteriori nodes
nat(s(X)) | nat(X).

nat(0) → nat(s(0)) → nat(s(s(0))) → ...   (a Markov chain, MC)

% apriori nodes
state(0).
% aposteriori nodes
state(s(Time)) | state(Time).
output(Time) | state(Time).

state(0) → state(s(0)) → ..., each state emitting output(Time)   (an HMM)

% apriori nodes
n1(0).
% aposteriori nodes
n1(s(TimeSlice)) | n2(TimeSlice).
n2(TimeSlice) | n1(TimeSlice).
n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).

n1, n2, n3 replicated over time slices 0, s(0), ...   (a DBN)

pure Prolog

Prolog and Bayesian Nets as Special Case

Page 17:

Learning BLPs

RVs + States = (partial) Herbrand interpretation
Probabilistic learning from interpretations

Family(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a

Family(2): bt(cecily)=ab, pc(henry)=a, mc(fred)=?, bt(kim)=a, pc(bob)=b

Family(3): pc(rex)=b, bt(doro)=a, bt(brian)=?

Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...

17

Page 18:

Parameter Estimation

data +

bt(Person,BT) | pc(Person,PC), mc(Person,MC).
pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
mc(Person,MC) | pc_mother(Person,PCm), mc_mother(Person,MCm).

yields the CPD parameters

18

Page 19:

Parameter Estimation

data +

bt(Person,BT) | pc(Person,PC), mc(Person,MC).
pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
mc(Person,MC) | pc_mother(Person,PCm), mc_mother(Person,MCm).

yields

Parameter tying

19

Page 20:

Expectation Maximization

EM algorithm: iterate until convergence

Given: logic program L, initial parameters θ0, current model (M, θk)

Expectation: expected counts of a clause

  ec(head, body) = Σ_{DataCase DC} Σ_{Ground Instance GI} P( head(GI), body(GI) | DC )
  ec(body)       = Σ_{DataCase DC} Σ_{Ground Instance GI} P( body(GI) | DC )

Maximization: update parameters (ML, MAP)

20
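A hedged restatement of the resulting maximum-likelihood M-step, using the expected counts above (the ec(·) shorthand and this notation are ours, not the slides'):

\[
\theta^{k+1}_{\text{head}\mid\text{body}} \;=\; \frac{\sum_{DC}\sum_{GI} P\big(\text{head}(GI), \text{body}(GI) \mid DC\big)}{\sum_{DC}\sum_{GI} P\big(\text{body}(GI) \mid DC\big)}
\]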

Page 21:

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Markov Logic: Intuition

▪ Undirected graphical model
▪ A logical KB is a set of hard constraints on the set of possible worlds
▪ Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
▪ Give each formula a weight (higher weight ⇒ stronger constraint)

P(world) ∝ exp( Σ weights of formulas it satisfies )

Page 22:

A possible worlds view

[Figure: the four possible worlds over the ground atoms Friends(Anna,Bob) / ¬Friends(Anna,Bob) and Happy(Bob) / ¬Happy(Bob)]

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Say we have two domain elements Anna and Bob as well as two predicates Friends and Happy

slides by Pedro Domingos

Page 23:

A possible worlds view

[Figure: the four possible worlds over Friends(Anna,Bob) and Happy(Bob); the formula ¬Friends(Anna,Bob) ∨ Happy(Bob) rules out the world that violates it]

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Logical formulas such as ¬Friends(Anna,Bob) ∨ Happy(Bob) exclude possible worlds

slides by Pedro Domingos

Page 24:

A possible worlds view

[Figure: the four possible worlds over Friends(Anna,Bob) and Happy(Bob), annotated with potentials 1, 1, 1 and 0.75]

Φ(¬Friends(Anna,Bob) ∨ Happy(Bob)) = 1
Φ(Friends(Anna,Bob) ∧ ¬Happy(Bob)) = 0.75

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

four times as likely that rule holds

slides by Pedro Domingos

Page 25:

A possible worlds view

[Figure: the same four possible worlds with potentials 1, 1, 1 and 0.75]

w(Φ(¬Friends(Anna,Bob) ∨ Happy(Bob))) = log(1/0.75) ≈ 0.29

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Or as log-linear model this is:

This can also be viewed as building a graphical model

Page 26:

[Figure: ground Markov network with nodes Cancer(A), Smokes(A), Smokes(B), Cancer(B)]

Suppose we have two constants: Anna (A) and Bob (B)

slides by Pedro Domingos

Markov Logic

Page 27:

[Figure: ground Markov network with nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]

Suppose we have two constants: Anna (A) and Bob (B)

slides by Pedro Domingos

Markov Logic

Page 28:

[Figure: the same ground Markov network over Smokes, Cancer and Friends atoms for A and B]

Suppose we have two constants: Anna (A) and Bob (B)

slides by Pedro Domingos

Markov Logic

Page 29:

[Figure: the same ground Markov network over Smokes, Cancer and Friends atoms for A and B]

Suppose we have two constants: Anna (A) and Bob (B)

slides by Pedro Domingos

Markov Logic

Page 30:

Markov Logic

30

[Figure: factor graph with variable nodes C(A), S(A), C(B), S(B), F(A,A), F(A,B), F(B,A), F(B,B) and factor nodes F1(A), F1(B), F2(A,A), F2(A,B), F2(B,A), F2(B,B)]

represented as a factor graph

P(Interpretation) ∝ ∏_{i,θ} F_i(X,Y)θ = ∏_{i,θ} exp( w_i · 𝕀(Interpretation ⊨ F_i(X,Y)θ) )

Page 31:

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Markov Logic

▪ A Markov Logic Network (MLN) is a set of pairs (F, w) where
  ▪ F is a formula in first-order logic
  ▪ w is a real number

▪ An MLN defines a Markov network with
  ▪ one node for each grounding of each predicate in the MLN
  ▪ one feature for each grounding of each formula F in the MLN, with the corresponding weight w

▪ Probability of a world x:

  P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

  where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x

Page 32:

Possible Worlds

A vocabulary

Possible worlds = logical interpretations

[Figure: the possible worlds over the ground atoms Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice)]

Slides adapted from Guy Van den Broeck

Page 33:

Possible Worlds

A logical theory

∀x,y: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

Models = interpretations that satisfy the theory

[Figure: the possible worlds over Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice), with the models of the theory highlighted]

Slides adapted from Guy Van den Broeck

Page 34:

A logical theory

∀x,y: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

First-Order Model Counting

First-order model count ~ #SAT

[Figure: the possible worlds over Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice); the models of the theory are counted]

Slides Guy Van den Broeck
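A hedged worked count for this example (restricting attention to the four atoms shown in the figure, i.e. assuming no reflexive Friends atoms): if Alice and Bob both smoke, or neither smokes, the formula holds for every assignment to the two Friends atoms (2 × 4 = 8 models); if exactly one of them smokes, the Friends atom from the smoker to the non-smoker must be false while the other is free (2 × 2 = 4 models); so the first-order model count is 8 + 4 = 12 out of the 16 interpretations.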

Page 35:

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Markov Logic

▪ A Markov Logic Network (MLN) is a set of pairs (F, w) where
  ▪ F is a formula in first-order logic
  ▪ w is a real number

▪ An MLN defines a Markov network with
  ▪ one node for each grounding of each predicate in the MLN
  ▪ one feature for each grounding of each formula F in the MLN, with the corresponding weight w

▪ Probability of a world x:

  P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

  where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x

Page 36:

Markov Logic

1.5  ∀x,y: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

[Figure: the possible worlds over Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice), each annotated with its weight]

Slides adapted from Guy Van den Broeck

counting only substitutions for which X ≠ Y: X=Alice, Y=Bob and X=Bob, Y=Alice

world weights shown: (1/Z)·exp(1.5 × 2), (1/Z)·exp(1.5 × 2), (1/Z)·exp(1.5 × 1)

A Markov Logic theory

Page 37:

Markov Logic

1.5  ∀x,y: Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)

[Figure: the possible worlds, each annotated with its weight]

Slides adapted from Guy Van den Broeck

world weights shown: (1/Z)·exp(1.5 × 2), (1/Z)·exp(1.5 × 2), (1/Z)·exp(1.5 × 1)

A Markov Logic theory

Z: the partition function

Page 38:

A logical theory and a weight function for predicates

Weighted First-Order Model Counting

Weighted first-order model count: Σ over the models of the product of the weights of their literals

[Figure: the possible worlds over Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice)]

Weight function: Smokes → 1, ¬Smokes → 2, Friends → 4, ¬Friends → 1

Related to ProbLog Inference !
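Continuing the hedged worked count from the model-counting slide (12 models, same no-reflexive-Friends assumption) with this weight function: both smoke contributes 1·1·(4+1)·(4+1) = 25, neither smokes contributes 2·2·(4+1)·(4+1) = 100, and each exactly-one-smoker case contributes 1·2·1·(4+1) = 10, so the weighted first-order model count is 25 + 100 + 10 + 10 = 145.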

Page 39:

Parameter Learning

39

∂/∂w_i log P_w(x) = n_i(x) − E_w[ n_i(x) ]

n_i(x): no. of times clause i is true in the data
E_w[n_i(x)]: expected no. of times clause i is true according to the MLN

Has been used for generative learning (Pseudolikelihood); Many variations (also discriminative); applications in networks, NLP, bioinformatics, …

Page 40:

Applications

▪ Natural language processing, Collective Classification, Social Networks, Activity Recognition, …

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Page 41:

Information Extraction

Parag Singla and Pedro Domingos, “Memory-Efficient Inference in Relational Domains” (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Page 42:

Segmentation

Parag Singla and Pedro Domingos, “Memory-Efficient Inference in Relational Domains” (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.

Author | Title | Venue

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Page 43:

Entity Resolution

Parag Singla and Pedro Domingos, “Memory-Efficient Inference in Relational Domains” (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Page 44:

Entity Resolution

Parag Singla and Pedro Domingos, “Memory-Efficient Inference in Relational Domains” (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, “Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Page 45:

Roadmap

• Prob. Programming - Modeling

• Inference

• Learning

• Dynamics

• KBMC & Markov Logic

• DeepProbLog

• From StarAI to NeSy

... with some detours on the way

Page 46:

Part VI: DeepProbLog

46

Page 47:

Learning

THREE DIFFERENT PARADIGMS FOR LEARNING

PROBABILITY · LOGIC · NEURAL

Page 48:

Integrate Deep Learning and (Probabilistic) Logics ?

48

[Figure: the alarm Bayesian network (earthquake, burglary, alarm, hears_alarm, calls) alongside a CLEVR-style scene with the question “Are there an equal number of large things and metal spheres?”]

Deep Learning

Logic

?

Cf. the Visual Genome and CLEVR datasets

Neural-symbolic learning and reasoning: A survey and interpretation. [Besold et al.]

Page 49:

NeSy state-of-the-art

• The integration of perception and reasoning is still an open problem.

• Main idea: inject/encode logic into neural networks (and let the NN do the rest)

• Encoding logic in the weights of neural networks

• Learning embeddings for logical entities

• Logical constraints as a regularizer during training

• Templating neural networks

• Building neural networks from functional programs

• Building neural networks from backwards proving

• Differentiable neural computers / program interpreters

Page 50:

State-of-the-art

• Encoding logic in the weights of neural networks

• Logic Tensor Networks (Serafini et al.)

• A Semantic Loss Function for Deep Learning with Symbolic Knowledge (Xu et al.)

• Ontology Reasoning with Deep Neural Networks (Hohenecker et al.)

• Semantic Based Regularization (Diligenti et al.)

50

Page 51:

State-of-the-art

• Templates for neural networks (a kind of Knowledge Base Model Construction)

• Lifted Relational Neural Networks (Šourek et al.)

• Neural Theorem Prover (Rocktäschel et al.)

• Neural Module Networks (Andreas et al.)

51

Page 52:

State-of-the-art

• Differentiable neural computers / program interpreters

• Differentiable Neural Computer (Graves et al.)

• Neural Programmer-Interpreters (Reed et al.)

• Differentiable Forth Interpreter (Bošnjak et al.)

52

Page 53:

DeepProbLog

Idea: inject neural networks into logic by extending an existing PLP language

DeepProbLog = ProbLog + neural predicate

The neural predicate makes neural networks a first-class citizen

53

Related work                              | DeepProbLog
Logic is made less expressive             | Full expressivity is retained
Logic is pushed into the neural network   | Clean separation
Fuzzy logic                               | Probabilistic logic
Language semantics unclear                | Clear semantics

NeurIPS 2018

Page 54:

Neural predicate

• Neural networks have uncertainty in their predictions

• A normalized output can be interpreted as a probability distribution

• Neural predicate models the output as probabilistic facts

• No changes needed in the probabilistic host language

54

Neural network

Page 55:

DTAI research group

The neural predicate

The outputs of the neural network are probabilistic facts in DeepProbLog

Example:

nn(mnist_net, [X], Y, [0 ... 9] ) :: digit(X,Y).

Instantiated into a (neural) Annotated Disjunction:

0.04::digit( ,0) ; 0.35::digit( ,1) ; ... ; 0.53::digit( ,7) ; ... ; 0.014::digit( ,9).

Page 56:

DTAI research group

DeepProbLog exemplified: MNIST addition

Task: Classify pairs of MNIST digits with their sum

Benefit of DeepProbLog:

• Encode addition in logic

• Separate addition from digit classification

[Figure: example MNIST image pairs with sums 8, 4 and 11]

nn(mnist_net, [X], Y, [0 ... 9] ) :: digit(X,Y).

addition(X,Y,Z) :- digit(X,N1), digit(Y,N2), Z is N1+N2.

Examples: addition( , ,8), addition( , ,4), addition( , ,11), …

Page 57:

DTAI research group

DeepProbLog exemplified: MNIST addition

Task: Classify pairs of MNIST digits with their sum

Benefit of DeepProbLog:

• Encode addition in logic

• Separate addition from digit classification

[Figure: example MNIST image pairs with sums 8, 4 and 11]

nn(mnist_net, [X], Y, [0 ... 9] ) :: digit(X,Y).

addition(X,Y,Z) :- digit(X,N1), digit(Y,N2), Z is N1+N2.

addition( , ,8) :- digit( ,N1), digit( ,N2), 8 is N1 + N2.

Examples: addition( , ,8), addition( , ,4), addition( , ,11), …

Page 58:

Example

Learn to classify the sum of pairs of MNIST digits

Individual digits are not labeled!

E.g. ( , , 8)

Could be done by a CNN: classify the concatenation of both images into 19 classes

However:

58

+ = ?

Page 59:

MNIST Addition

• Pairs of MNIST images, labeled with their sum

• Baseline: CNN

• Classifies concatenation of both images into classes 0 ...18

• DeepProbLog:

• CNN that classifies images into 0 … 9

• Two lines of DeepProbLog code

59

Page 60:

Multi-digit addition with MNIST images

Result

60

number([], Result, Result).
number([H|T], Acc, Result) :-
    digit(H, Nr),
    Acc2 is Nr + 10*Acc,
    number(T, Acc2, Result).
number(X, Y) :- number(X, 0, Y).

multiaddition(X, Y, Z) :-
    number(X, X2),
    number(Y, Y2),
    Z is X2 + Y2.
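A brief worked reading of number/3, assuming the digit/2 neural predicate classifies the two list elements as 3 and 5 (hypothetical images): number([I1,I2], 0, N) first computes Acc2 = 3 + 10·0 = 3, then Acc2 = 5 + 10·3 = 35, so N = 35; multiaddition/3 then adds two such numbers with Z is X2 + Y2.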

Page 61:

Example

Learn to classify the sum of pairs of MNIST digits

Individual digits are not labeled!

E.g. ( , , 8)

Could be done by a CNN: classify the concatenation of both images into 19 classes

However:

61

+ = ?

Page 62:

(Deep)ProbLog : Inference

Page 63:

Inference / Reasoning

• Most of the work in PP and StarAI is on inference

• It is hard (complexity wise)

• Many inference methods

• exact, approximate, sampling and lifted …

• Inference is the key to learning

63

Page 64:

ProbLog Inference

Answering a query in a ProbLog program happens in four steps:
1. Grounding the program w.r.t. the query
2. Rewrite the ground logic program into a propositional logic formula
3. Compile the formula into an arithmetic circuit
4. Evaluate the arithmetic circuit

0.1 :: burglary.     0.5 :: hears_alarm(mary).
0.2 :: earthquake.   0.4 :: hears_alarm(john).

alarm :- earthquake.
alarm :- burglary.

calls(X) :- alarm, hears_alarm(X).

Query: ?- P(calls(mary))

Page 65:

ProbLog Inference

Answering a query in a ProbLog program happens in four steps:
1. Grounding the program w.r.t. the query (only the relevant part!)
2. Rewrite the ground logic program into a propositional logic formula
3. Compile the formula into an arithmetic circuit
4. Evaluate the arithmetic circuit

0.1 :: burglary.     0.5 :: hears_alarm(mary).
0.2 :: earthquake.   0.4 :: hears_alarm(john).

alarm :- earthquake.
alarm :- burglary.

calls(mary) :- alarm, hears_alarm(mary).
calls(john) :- alarm, hears_alarm(john).

Query: ?- P(calls(mary))

Page 66:

ProbLog Inference

Answering a query in a ProbLog program happens in four steps:
1. Grounding the program w.r.t. the query
2. Rewrite the ground logic program into a propositional logic formula
3. Compile the formula into an arithmetic circuit
4. Evaluate the arithmetic circuit

0.1 :: burglary.     0.5 :: hears_alarm(mary).
0.2 :: earthquake.   0.4 :: hears_alarm(john).

alarm :- earthquake.
alarm :- burglary.

calls(mary) :- alarm, hears_alarm(mary).
calls(john) :- alarm, hears_alarm(john).

calls(mary)  ↔  hears_alarm(mary) ∧ (burglary ∨ earthquake)

Page 67:

ProbLog Inference

Answering a query in a ProbLog program happens in four steps:
1. Grounding the program w.r.t. the query
2. Rewrite the ground logic program into a propositional logic formula
3. Compile the formula into an arithmetic circuit (knowledge compilation)
4. Evaluate the arithmetic circuit

calls(mary)  ↔  hears_alarm(mary) ∧ (burglary ∨ earthquake)

[Figure: the compiled arithmetic circuit — leaves earthquake (0.2), ¬earthquake (0.8), burglary (0.1), hears_alarm(mary) (0.5); AND(¬earthquake, burglary) = 0.08, AND(earthquake, hears_alarm(mary)) = 0.1, AND(0.08, hears_alarm(mary)) = 0.04, and OR = 0.14 at the root calls(mary)]

Page 68:

ProbLog Inference

Answering a query in a ProbLog program happens in four steps:
1. Grounding the program w.r.t. the query
2. Rewrite the ground logic program into a propositional logic formula
3. Compile the formula into an arithmetic circuit (knowledge compilation)
4. Evaluate the arithmetic circuit

calls(mary)  ↔  hears_alarm(mary) ∧ (burglary ∨ earthquake)

[Figure: the same arithmetic circuit, now evaluated bottom-up — leaves earthquake (0.2), ¬earthquake (0.8), burglary (0.1), hears_alarm(mary) (0.5); intermediate values 0.08, 0.1 and 0.04; root value 0.14]
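Reading the leaf probabilities off the circuit, the evaluation step on this example amounts to: P(calls(mary)) = 0.5 × (0.2 + 0.8 × 0.1) = 0.5 × 0.28 = 0.14.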

Page 69:

DTAI research group

Optimization

PLP usually considers the inference setting

DeepProbLog focuses on optimization:
• we have a set of tuples (q, p)
• q is a query and p its desired success probability

Page 70:

DeepProbLog

• We use algebraic ProbLog (aProbLog) with the gradient semiring

• What is aProbLog?

• a version of ProbLog where the probability semiring is replaced by an arbitrary semiring structure

• labels on facts are elements of the semiring

• cf. the different semi-rings for the WMC, #SAT, …

70

Page 71:

more examples

71

[Table: example semirings and their label functions]

Page 72:

DTAI research group

Implementing DeepProbLog

1. Evaluating the neural networks:
• Instantiate the neural annotated disjunction
• Happens during grounding
• ProbLog already had support for external functions

2. Perform backpropagation in the neural networks:
• No direct loss for neural networks
• Loss defined on the logic level
• Derive gradient in logic
• Start backpropagation with derived gradient

Page 73:

DTAI research group

Deriving the gradient

• The outputs of the neural network are probabilistic facts

• Probabilistic facts are leaves in the AC

• The AC is a differentiable structure

• We can derive it in the forward pass along with the probability

aProbLog + gradient semiring

Page 74:

DTAI research group

Gradient semiring

t(0.2) :: earthquake.

t(0.1) :: burglary.

0.5 :: hears_alarm.

alarm :- earthquake.

alarm :- burglary.

calls :- alarm, hears_alarm.
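For intuition, a sketch of how the gradient semiring evaluates such a program (this is the standard dual-number construction; the notation below is ours, not taken from the slides): every fact is labeled with a pair (p, ∇p) of its probability and its gradient with respect to the learnable parameters t(·), e.g. earthquake ↦ (0.2, [1, 0]) and burglary ↦ (0.1, [0, 1]), and the arithmetic circuit is evaluated with

(a, ∇a) ⊕ (b, ∇b) = (a + b, ∇a + ∇b)
(a, ∇a) ⊗ (b, ∇b) = (a · b, a · ∇b + b · ∇a)

so a single forward pass yields both P(calls) and its gradient with respect to the parameters.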

Page 75:

DTAI research group

The DeepProbLog pipeline

Page 76:

EXPERIMENTS

76

Page 77:

Program Induction

• Approach similar to that of ‘Programming with a Differentiable Forth Interpreter’ [1] (∂4)

• Partially defined Forth program with slots / holes

• Slots are filled by neural network (encoder / decoder)

• Fully differentiable interpreter: NNs are trained with input / output examples

• DeepProbLog program with switches

• Switches are controlled by neural networks

77

[1]: Matko Bosnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel: Programming with a Differentiable Forth Interpreter. ICML 2017: 547-556

Logic

Neural

Page 78:

● Sorting
  ○ Sort lists of numbers using bubble sort
  ○ Hole: swap or don't swap when comparing two numbers

● Addition
  ○ Add two numbers and a carry
  ○ Hole: what is the resulting digit and carry on each step
  ○ (Note: not MNIST digits, but actual numbers)

● Word Algebra Problems
  ○ E.g. “Ann has 8 apples. She buys 4 more. She distributes them equally among her 3 kids. How many apples does each child receive?”
  ○ Hole: sequence of permuting, swapping and performing operations on the three numbers

[1]: Matko Bosnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel: Programming with a Differentiable Forth Interpreter. ICML 2017: 547-556

Tasks [1]

Page 79:

hole(X,Y,X,Y) :- swap(X,Y,0).
hole(X,Y,Y,X) :- swap(X,Y,1).

bubble([X],[],X).
bubble([H1,H2|T],[X1|T1],X) :-
    hole(H1,H2,X1,X2),
    bubble([X2|T],T1,X).

bubblesort([],L,L).
bubblesort(L,L3,Sorted) :-
    bubble(L,L2,X),
    bubblesort(L2,[X|L3],Sorted).

sort(L,L2) :- bubblesort(L,[],L2).

Holes defined by neural predicate

Bubble sort implementation

Example DeepProbLog solution
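In the document's nn/4 notation, the hole would be filled by a neural predicate along the lines of the following sketch (the network name swap_net is an assumption, not taken from the slides):

% hypothetical network name; the network looks at the two numbers and decides whether to swap
nn(swap_net, [X,Y], Z, [0,1]) :: swap(X,Y,Z).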

Page 80:

Result

80

Page 81:

Noisy Addition

nn(classifier, [X], Y, [0 .. 9]) :: digit(X,Y).
t(0.2) :: noisy.
1/19 :: uniform(X,Y,0) ; ... ; 1/19 :: uniform(X,Y,18).

addition(X,Y,Z) :- noisy, uniform(X,Y,Z).
addition(X,Y,Z) :- \+noisy, digit(X,N1), digit(Y,N2), Z is N1+N2.

(a) The DeepProbLog program.

nn(classifier,[a],0) :: digit(a,0); nn(classifier,[a],1) :: digit(a,1).
nn(classifier,[b],0) :: digit(b,0); nn(classifier,[b],1) :: digit(b,1).
t(0.2) :: noisy.
1/19 :: uniform(a,b,1).

addition(a,b,1) :- noisy, uniform(a,b,1).
addition(a,b,1) :- \+noisy, digit(a,0), digit(b,1).
addition(a,b,1) :- \+noisy, digit(a,1), digit(b,0).

(b) The ground DeepProbLog program.

(c) The AC for query addition(a,b,1).

Figure 4: Parameter learning in DeepProbLog. (Example 5)

Page 82:

Noisy Addition

(the same DeepProbLog program, ground program, and arithmetic circuit as on the previous slide)

Page 83:

Noisy Addition

[Figure: the arithmetic circuit for the query addition(a,b,1) evaluated in the gradient semiring. Each leaf carries a (probability, gradient) pair, e.g. noisy: (0.2, [1, 0, 0, ...]), ¬noisy: (0.8, [-1, 0, 0, ...]), digit(a,0): (0.8, ...), digit(a,1): (0.1, ...), digit(b,0): (0.2, ...), digit(b,1): (0.6, ...), uniform(a,b,1): (0.053, [0, ..., 0]). The ⊗ and ⊕ nodes propagate these pairs, giving the root the value (0.411, [-0.447, 0.48, 0.16, ..., 0.08, 0.64, ...]), i.e. p together with ∂p/∂p_noisy, ∂p/∂p_digit(a,0), ..., ∂p/∂p_digit(b,9).]

Page 84:

Noisy Addition

Table 3: accuracy on the test set for T4.

Fraction of noise:              0.0    0.2    0.4    0.6    0.8    1.0
Baseline                        93.46  87.85  82.49  52.67   8.79  5.87
DeepProbLog                     97.20  95.78  94.50  92.90  46.42  0.88
DeepProbLog w/ explicit noise   96.64  95.96  95.58  94.12  73.22  2.92
Learned fraction of noise       0.000  0.212  0.415  0.618  0.803  0.985

The model with explicit noise is noise tolerant, even retaining an accuracy of 73.2% with 80% noisy labels. As shown in the last row, it also learns the fraction of noisy labels in the data, i.e. it recognizes which examples have noisy labels.

Page 85:

Addition of images only

• Examples of the form addition( , , )

• What will happen?

Page 86:

Addition of images only

• Examples of the form addition( , , )

• What will happen?

• the usual loss function will map all images onto 0 (0 + 0 = 0)

• we can compensate for this by adding a regularisation term based on maximum entropy

Page 87:

Simplified Poker

• dealing with uncertainty
• ignore suits and use just A, J, Q and K
• two players, two cards, and one community card
• train the neural network to recognize the four cards
• reason probabilistically about the non-observed card
• learn the distribution of the unlabeled community card

(combines Probability, Logic, and Neural)

Listing 6: Forth sorting sketch (T6)

nn(m_swap, [X,Y]) :: swap(X,Y).

hole(X,Y,X,Y) :- \+swap(X,Y).
hole(X,Y,Y,X) :- swap(X,Y).

bubble([X],[],X).
bubble([H1,H2|T],[X1|T1],X) :-
    hole(H1,H2,X1,X2),
    bubble([X2|T],T1,X).

bubblesort([],L,L).
bubblesort(L,L3,Sorted) :-
    bubble(L,L2,X),
    bubblesort(L2,[X|L3],Sorted).

forth_sort(L,L2) :- bubblesort(L,[],L2).

Figure A.10: Examples of cards used as input for the Poker without perturbations (T9) experiment.

In the poker program, a single neural predicate rank/2 takes the image of a card as input and classifies it as either a jack, queen, king or ace. An AD with learnable parameters represents the distribution of the unseen community card (house_rank/1). The hand/2 predicate takes a list of 3 cards and unifies its second argument with any of the valid hands these cards contain: high card, pair, three of a kind, low straight (jack, queen, king) and high straight (queen, king, ace).

Distribution   Jack            Queen           King            Ace
Actual         0.2             0.4             0.15            0.25
Learned        0.203 ± 0.002   0.396 ± 0.002   0.155 ± 0.003   0.246 ± 0.002

Table 8: The results for the Poker experiment (T9).

The input consists of 4 images showing the cards dealt to the two players; for simplicity only jack, queen, king and ace are used, and suits are not considered. Every example is labeled with the chance that the game is won, lost or ended in a draw, e.g.

0.8 :: poker([Q, Q, A, K], loss)

We expect DeepProbLog to:
• train the neural network to recognize the four cards
• reason probabilistically about the non-observed card
• learn the distribution of the unlabeled community card

To make DeepProbLog converge more reliably, some examples carry additional supervision: in 10% of the examples the community card is also specified, i.e.

poker([Q, Q, A, K], A, loss).

This showcases one of the strengths of DeepProbLog: it can use examples with different levels of observability. The loss function used in this experiment is the MSE between the predicted and target probabilities.

Results. The experiment was run 10 times; 4 runs did not converge on the correct solution. The average learned parameters for the remaining 6 runs are shown in Table 8: DeepProbLog correctly learns the probabilistic parameters, and in these 6 runs the neural network also learns to classify all card types with 100% accuracy. The other runs did not converge because some classes were permuted (queens predicted as aces and vice versa) or multiple classes mapped onto the same one (queens and kings both predicted as kings).

in 6/10 experiments

Page 88:

Challenges

• The data needs to provide a signal (cf. Addition of images only, and Poker …); cf. curriculum learning + regularization

• Scaling up:

• still using the exact inference of ProbLog

• circuits can be very large

• we were working on approximate inference

Page 89:

Further Reading

• One book

• Three websites to start

• http://probmods.org/ Probabilistic Models of Cognition — Church

• http://dtai.cs.kuleuven.be/problog/ — check also [DR & Kimmig, MLJ 15]

• http://alchemy.cs.washington.edu/ —Markov Logic, check also [Domingos & Lowd] Markov Logic, Morgan Claypool.

Page 90:

Thanks!

http://dtai.cs.kuleuven.be/problog

Maurice Bruynooghe, Bart Demoen, Anton Dries, Daan Fierens, Jason Filippou, Bernd Gutmann, Manfred Jaeger, Gerda Janssens, Kristian Kersting, Angelika Kimmig, Theofrastos Mantadelis, Wannes Meert, Bogdan Moldovan, Siegfried Nijssen, Davide Nitti, Joris Renkens, Kate Revoredo, Ricardo Rocha, Vitor Santos Costa, Dimitar Shterionov, Ingo Thon, Hannu Toivonen, Guy Van den Broeck, Mathias Verbeke, Jonas Vlasselaer

90

Thanks !

Page 91:

• PRISM http://sato-www.cs.titech.ac.jp/prism/

• ProbLog2 http://dtai.cs.kuleuven.be/problog/

• Yap Prolog http://www.dcc.fc.up.pt/~vsc/Yap/ includes

• ProbLog1

• cplint https://sites.google.com/a/unife.it/ml/cplint

• CLP(BN)

• LP2

• PITA in XSB Prolog http://xsb.sourceforge.net/

• AILog2 http://artint.info/code/ailog/ailog2.html 

• SLPs http://stoics.org.uk/~nicos/sware/pepl

• contdist http://www.cs.sunysb.edu/~cram/contdist/

• DC https://code.google.com/p/distributional-clauses

• WFOMC http://dtai.cs.kuleuven.be/ml/systems/wfomc

PLP Systems

91