Probabilistic Circuits: Representations, Inference, Learning, Applications. Antonio Vergari (University of California, Los Angeles), YooJung Choi (University of California, Los Angeles), Robert Peharz (TU Eindhoven), Guy Van den Broeck (University of California, Los Angeles). February 7th, 2020, AAAI 2020, New York City.

Slides: starai.cs.ucla.edu/slides/AAAI20.pdf


Page 1:

Probabilistic Circuits
Representations, Inference, Learning, Applications

Antonio Vergari, University of California, Los Angeles
YooJung Choi, University of California, Los Angeles
Robert Peharz, TU Eindhoven
Guy Van den Broeck, University of California, Los Angeles

February 7th, 2020 - AAAI 2020 - New York City

Pages 2-6: The Alphabet Soup of probabilistic models

[Figure: a cloud of model families: Fully factorized, Naive Bayes, And-Or Graphs, PDGs, Trees, PSDDs, CNets, LTMs, SPNs, NADEs, Thin Junction Trees, ACs, MADEs, MAFs, VAEs, Polytrees, FVSBNs, TACs, IAFs, NAFs, RAEs, Mixtures, BNs, NICE, FGs, GANs, RealNVP, MNs]

Captions across the builds:
The Alphabet Soup of probabilistic models
Intractable and tractable models
tractability is a spectrum
Expressive models without compromises
a unifying framework for tractable models

Pages 7-10: Outline

Why tractable inference? or expressiveness vs. tractability

Probabilistic circuits: a unified framework for tractable probabilistic modeling

Learning circuits: learning their structure and parameters from data

Representations and theory: tracing the boundaries of tractability and connections to other formalisms

Page 11:

Why tractable inference? or the inherent trade-off of tractability vs. expressiveness

Pages 12-20: Why probabilistic inference?

q1: What is the probability that today is a Monday and there is a traffic jam on 5th Avenue?

q2: Which day is most likely to have a traffic jam on my route to work?

⇒ fitting a predictive model!
⇒ answering probabilistic queries on a probabilistic model of the world m: q1(m) = ? q2(m) = ?

X = Day, Time, JamStr1, JamStr2, . . . , JamStrN

q1(m) = pm(Day = Mon, Jam5th = 1) ⇒ marginals

q2(m) = argmax_d pm(Day = d ∧ ⋁_{i∈route} JamStr_i) ⇒ marginals + MAP + logical events

© fineartamerica.com

Pages 21-23: Tractable Probabilistic Inference

A class of queries Q is tractable on a family of probabilistic models M iff for any query q ∈ Q and model m ∈ M, exactly computing q(m) runs in time O(poly(|m|)).

⇒ often poly will in fact be linear!

⇒ Note: if M and Q are compact in the number of random variables X, that is, |m|, |q| ∈ O(poly(|X|)), then query time is O(poly(|X|)).

Pages 24-28: Why exact inference? or "What about approximate inference?"

1. No need for approximations when we can be exact ⇒ do we lose some expressiveness?

2. We can do exact inference in approximate models [Dechter et al. 2002; Choi et al. 2010; Lowd et al. 2010; Sontag et al. 2011; Friedman et al. 2018]

3. Approximations shall come with guarantees ⇒ sometimes they do, e.g., [Dechter et al. 2007]

4. Approximate inference (even with guarantees) can mislead learners [Kulesza et al. 2007] ⇒ chaining approximations is flying with a blindfold on

5. Approximations can be intractable as well [Dagum et al. 1993; Roth 1996]

Page 29: Stay tuned for...

Next: 1. What are classes of queries?
2. Are my favorite models tractable?
3. Are tractable models expressive?

After: We introduce probabilistic circuits as a unified framework for tractable probabilistic modeling

Pages 30-32: Complete evidence (EVI)

q3: What is the probability that today is a Monday at 12.00 and there is a traffic jam only on 5th Avenue?

X = Day, Time, Jam5th, JamStr2, . . . , JamStrN

q3(m) = pm(X = Mon, 12.00, 1, 0, . . . , 0)

…fundamental in maximum likelihood learning:

θ_m^MLE = argmax_θ ∏_{x∈D} pm(x; θ)

© fineartamerica.com
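To make the EVI-to-learning link concrete, here is a minimal sketch (assumed, not from the slides) using a product-of-Bernoullis model: each EVI query is one likelihood term, and maximum likelihood maximizes their product (sum of logs) over the data.

```python
import numpy as np

def log_likelihood(theta, data):
    # sum of EVI queries: sum_x log p(x; theta) for a product of Bernoullis
    eps = 1e-12
    return float(np.sum(data * np.log(theta + eps) + (1 - data) * np.log(1 - theta + eps)))

# toy dataset D over 3 binary variables (assumed values)
D = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])

# for this fully factorized model the MLE has a closed form: per-variable means
theta_mle = D.mean(axis=0)
print(theta_mle, log_likelihood(theta_mle, D))
```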

Pages 33-34: Generative Adversarial Networks

min_θ max_ϕ E_{x∼p_data(x)}[log D_ϕ(x)] + E_{z∼p(z)}[log(1 − D_ϕ(G_θ(z)))]

no explicit likelihood!
⇒ adversarial training instead of MLE ⇒ no tractable EVI

good sample quality ⇒ but lots of samples needed for MC
unstable training ⇒ mode collapse

Goodfellow et al., "Generative adversarial nets", 2014

Page 35: [Figure: tractable bands. Queries Q: EVI. Models M: GANs]

Pages 36-37: Variational Autoencoders

pθ(x) = ∫ pθ(x | z) p(z) dz

an explicit likelihood model!

log pθ(x) ≥ E_{z∼qϕ(z|x)}[log pθ(x | z)] − KL(qϕ(z | x) || p(z))

… but computing log pθ(x) is intractable
⇒ an infinite and uncountable mixture ⇒ no tractable EVI

we need to optimize the ELBO… ⇒ which is "tricky" [Alemi et al. 2017; Dai et al. 2019; Ghosh et al. 2019]

Rezende et al., "Stochastic backprop. and approximate inference in deep generative models", 2014
Kingma et al., "Auto-Encoding Variational Bayes", 2014

Page 38: [Figure: tractable bands. Queries Q: EVI. Models M: GANs, VAEs]

Page 39: Autoregressive models

pθ(x) = ∏_i pθ(x_i | x_1, x_2, . . . , x_{i−1})

an explicit likelihood!
…as a product of factors ⇒ tractable EVI!

many neural variants: NADE [Larochelle et al. 2011], MADE [Germain et al. 2015], PixelCNN [Salimans et al. 2017], PixelRNN [Oord et al. 2016]

[Figure: autoregressive dependency graph over X1, X2, X3, X4]
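A small sketch (assumed, with logistic conditionals rather than the neural variants above) of the autoregressive factorization: the EVI query is the product of per-variable conditionals, each conditioned on the prefix of previously seen variables.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def autoregressive_log_likelihood(x, weights, biases):
    # log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1})
    logp = 0.0
    for i in range(len(x)):
        p_i = sigmoid(weights[i][:i] @ x[:i] + biases[i])  # conditional on the prefix
        logp += np.log(p_i if x[i] == 1 else 1.0 - p_i)
    return logp

rng = np.random.default_rng(0)
n = 4
weights = [rng.normal(size=n) for _ in range(n)]   # toy parameters (assumed)
biases = rng.normal(size=n)
print(autoregressive_log_likelihood(np.array([1, 0, 1, 1]), weights, biases))
```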

Pages 40-43: Marginal queries (MAR)

q1: What is the probability that today is a Monday and there is a traffic jam on 5th Avenue?

q1(m) = pm(Day = Mon, Jam5th = 1)

General: pm(e) = ∫ pm(e, H) dH, where E ⊂ X, H = X \ E

and if you can answer MAR queries, then you can also do conditional queries (CON):

pm(q | e) = pm(q, e) / pm(e)

© fineartamerica.com
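As a toy illustration (assumed, not from the slides), MAR and CON on a brute-force joint table: a marginal sums out the hidden variables H, and a conditional is a ratio of two marginals. This enumeration is exponential in |X|, which is exactly the cost that tractable models avoid.

```python
import itertools
import numpy as np

# toy joint p(Day, Jam5th, JamStr2) as a table (assumed random values)
rng = np.random.default_rng(1)
states = list(itertools.product(["Mon", "Tue"], [0, 1], [0, 1]))
joint = dict(zip(states, rng.dirichlet(np.ones(len(states)))))

def marginal(evidence):
    # p(e): sum over all joint states consistent with the evidence {position: value}
    return sum(p for s, p in joint.items() if all(s[i] == v for i, v in evidence.items()))

p_qe = marginal({0: "Mon", 1: 1})   # MAR: p(Day=Mon, Jam5th=1)
p_e = marginal({1: 1})              # MAR: p(Jam5th=1)
print(p_qe, p_qe / p_e)             # CON: p(Day=Mon | Jam5th=1)
```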

Page 44: Tractable MAR: scene understanding

Fast and exact marginalization over unseen or "do not care" parts in the scene.

Stelzner et al., "Faster Attend-Infer-Repeat with Tractable Probabilistic Models", 2019
Kossen et al., "Structured Object-Aware Physics Prediction for Video Modeling and Planning", 2019

Pages 45-46: Autoregressive models

pθ(x) = ∏_i pθ(x_i | x_1, x_2, . . . , x_{i−1})

an explicit likelihood!
…as a product of factors ⇒ tractable EVI!

… but we need to fix a variable ordering
⇒ only some MAR queries are tractable, those consistent with that one ordering

[Figure: autoregressive dependency graph over X1, X2, X3, X4]

Pages 47-48: Normalizing flows

p_X(x) = p_Z(f⁻¹(x)) |det(∂f⁻¹/∂x)|

an explicit likelihood ⇒ tractable EVI!
… as long as computing the determinant of the Jacobian is cheap

MAR is generally intractable ⇒ unless f is a "trivial" bijection

[Figure: an invertible map f between latent space Z and data space X]
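A quick sketch (assumed) of the change-of-variables rule for an elementwise affine bijection f(z) = a·z + b: the density at x is the base density at f⁻¹(x) times |det(∂f⁻¹/∂x)|, which here is just 1/|a| per dimension.

```python
import numpy as np

a = np.array([2.0, 0.5])   # scale of the affine flow (assumed)
b = np.array([1.0, -1.0])  # shift of the affine flow (assumed)

def flow_log_density(x):
    z = (x - b) / a                                      # f^{-1}(x)
    log_base = -0.5 * np.sum(z**2 + np.log(2 * np.pi))   # standard normal base p_Z
    log_det_jac = -np.sum(np.log(np.abs(a)))             # log |det d f^{-1} / dx|
    return log_base + log_det_jac

print(flow_log_density(np.array([0.3, 0.7])))
```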

Page 49: [Figure: tractable bands. Queries Q: EVI, MAR, CON. Models M: GANs, VAEs, Flows]

Page 50: Probabilistic Graphical Models (PGMs)

Declarative semantics: a clean separation of modeling assumptions from inference

Nodes: random variables
Edges: dependencies
+ Inference: conditioning [Darwiche 2001; Sang et al. 2005], elimination [Zhang et al. 1994; Dechter 1998], message passing [Yedidia et al. 2001; Dechter et al. 2002; Choi et al. 2010; Sontag et al. 2011]

[Figure: a graphical model over X1, . . . , X5]

Page 51: Complexity of MAR on PGMs

Exact complexity: computing MAR and CON is #P-complete ⇒ [Cooper 1990; Roth 1996]

Approximation complexity: computing MAR and CON approximately within a relative error of 2^(n^(1−ϵ)) for any fixed ϵ is NP-hard ⇒ [Dagum et al. 1993; Roth 1996]

Page 52: Why? Treewidth!

Treewidth:
Informally, how tree-like is the graphical model m?
Formally, the minimum width of any tree-decomposition of m.

Fixed-parameter tractable: MAR and CON on a graphical model m with treewidth w take time O(|X| · 2^w), which is linear for fixed width w [Dechter 1998; Koller et al. 2009].
⇒ what about bounding the treewidth by design?

Page 53: Low-treewidth PGMs

[Figure: example structures over X1, . . . , X5]
Trees [Meilă et al. 2000], Polytrees [Dasgupta 1999], Thin Junction Trees [Bach et al. 2001]

If treewidth is bounded (e.g., ≈ 20), exact MAR and CON inference is possible in practice

Page 54: Tree distributions

A tree-structured BN [Meilă et al. 2000] where each Xi ∈ X has at most one parent Pa_Xi.

p(X) = ∏_{i=1}^n p(x_i | Pa_{X_i})

[Figure: a tree-structured BN over X1, . . . , X5]

Exact querying: EVI, MAR, CON tasks are linear for trees: O(|X|)
Exact learning from d examples takes O(|X|² · d) with the classical Chow-Liu algorithm¹

¹ Chow et al., "Approximating discrete probability distributions with dependence trees", 1968
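A compact sketch of the Chow-Liu step mentioned above (assumed implementation, binary data): estimate all pairwise mutual informations in O(|X|² · d) and keep a maximum spanning tree over them.

```python
import numpy as np
from itertools import combinations

def chow_liu_edges(D):
    """D: (d, n) binary data matrix. Returns edges of a maximum spanning tree under mutual information."""
    d, n = D.shape
    def mi(i, j):
        total = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p_ab = np.mean((D[:, i] == a) & (D[:, j] == b)) + 1e-9
                p_a = np.mean(D[:, i] == a) + 1e-9
                p_b = np.mean(D[:, j] == b) + 1e-9
                total += p_ab * np.log(p_ab / (p_a * p_b))
        return total
    # Kruskal-style maximum spanning tree over the MI weights
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for u, v in sorted(combinations(range(n), 2), key=lambda e: -mi(*e)):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree

D = (np.random.default_rng(0).random((500, 4)) < 0.5).astype(int)
print(chow_liu_edges(D))
```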

Page 55: [Figure: tractable bands. Queries Q: EVI, MAR, CON. Models M: GANs, VAEs, Flows, Trees]

Page 56: What do we lose?

Expressiveness: ability to represent rich and complex classes of distributions

Bounded-treewidth PGMs lose the ability to represent all possible distributions …

Cohen et al., "On the expressive power of deep learning: A tensor analysis", 2016
Martens et al., "On the Expressive Efficiency of Sum Product Networks", 2014

Pages 57-58: Mixtures

Mixtures as a convex combination of k (simpler) probabilistic models

[Figure: a two-component mixture density p(X1)]

p(X) = w1 · p1(X) + w2 · p2(X)

EVI, MAR, CON queries scale linearly in k

Equivalently:
p(X) = p(Z = 1) · p1(X | Z = 1) + p(Z = 2) · p2(X | Z = 2)

Mixtures are marginalizing a categorical latent variable Z with k values ⇒ increased expressiveness
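A small sketch (assumed) of a two-component Gaussian mixture: an EVI query is a weighted sum of the component densities, so the cost of EVI (and likewise MAR and CON) grows linearly with the number of components k.

```python
import numpy as np

weights = np.array([0.2, 0.8])   # mixture weights w1, w2 (assumed)
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 2.0])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x):
    # one pass over the k components: p(x) = sum_i w_i p_i(x)
    return float(np.sum(weights * gaussian_pdf(x, means, stds)))

print(mixture_pdf(0.5))
```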

Pages 59-60: Expressiveness and efficiency

Expressiveness: ability to represent rich and effective classes of functions
⇒ mixtures of Gaussians can approximate any distribution!

Expressive efficiency (succinctness): ability to represent rich and effective classes of functions compactly
⇒ but how many components does a Gaussian mixture need?

Cohen et al., "On the expressive power of deep learning: A tensor analysis", 2016
Martens et al., "On the Expressive Efficiency of Sum Product Networks", 2014

Pages 61-68: How expressively efficient are mixtures?

⇒ stack mixtures like in deep generative models

Page 69: [Figure: tractable bands. Queries Q: EVI, MAR, CON. Models M: GANs, VAEs, Flows, Trees, Mixtures]

Pages 70-73: Maximum A Posteriori (MAP), aka Most Probable Explanation (MPE)

q5: Which combination of roads is most likely to be jammed on Monday at 9am?

q5(m) = argmax_j pm(j1, j2, . . . | Day = Mon, Time = 9)

General: argmax_q pm(q | e), where Q ∪ E = X

…intractable for latent variable models!

max_q pm(q | e) = max_q ∑_z pm(q, z | e) ≠ ∑_z max_q pm(q, z | e)

© fineartamerica.com

Page 74: MAP inference: image inpainting

[Figure 7.3: examples of face image reconstructions where the left half is covered, comparing the original, the covered input, and MEAN, BACK-ORIG, SUM and BACK-MPE completions]

Predicting arbitrary patches given a single model, without the need of retraining.

Poon et al., "Sum-Product Networks: a New Deep Architecture", 2011
Sguerra et al., "Image classification using sum-product networks for autonomous flight of micro aerial vehicles", 2016

Page 75: [Figure: tractable bands. Queries Q: EVI, MAR, CON, MAP. Models M: GANs, VAEs, Flows, Trees, Mixtures]

Pages 76-79: Marginal MAP (MMAP), aka Bayesian Network MAP

q6: Which combination of roads is most likely to be jammed on Monday at 9am?

q6(m) = argmax_j pm(j1, j2, . . . | Time = 9)

General: argmax_q pm(q | e) = argmax_q ∑_h pm(q, h | e), where Q ∪ H ∪ E = X

⇒ NP^PP-complete [Park et al. 2006]
⇒ NP-hard for trees [Campos 2011]
⇒ NP-hard even for Naive Bayes [ibid.]

© fineartamerica.com

Page 80: [Figure: tractable bands. Queries Q: EVI, MAR, CON, MAP, MMAP. Models M: GANs, VAEs, Flows, Trees, Mixtures]

Pages 81-85: Advanced queries

q2: Which day is most likely to have a traffic jam on my route to work?

q2(m) = argmax_d pm(Day = d ∧ ⋁_{i∈route} JamStr_i)
⇒ marginals + MAP + logical events

q7: What is the probability of seeing more traffic jams in Uptown than Midtown?
⇒ counts + group comparison

and more:
expected classification agreement [Oztok et al. 2016; Choi et al. 2017, 2018]
expected predictions [Khosravi et al. 2019b]

© fineartamerica.com

Bekker et al., "Tractable Learning for Complex Probability Queries", 2015

Page 86: [Figure: tractable bands. Queries Q: EVI, MAR, CON, MAP, MMAP, ADV. Models M: GANs, VAEs, Flows, Trees, Mixtures]

Page 87: [Figure: tractable bands. Queries Q: EVI, MAR, CON, MAP, MMAP, ADV. Models M: GANs, VAEs, Flows, Trees, Mixtures, plus a "?" row asking which model family covers all query classes]

Page 88: Fully factorized models

A completely disconnected graph. Example: Product of Bernoullis (PoBs)

p(x) = ∏_{i=1}^n p(x_i)

Complete evidence, marginals and MAP, MMAP inference is linear!
⇒ but definitely not expressive…

[Figure: a fully disconnected graph over X1, . . . , X5]
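A tiny sketch (assumed parameters) of a Product of Bernoullis: EVI multiplies one factor per variable, marginalizing a variable simply drops its factor, and MAP picks each variable's most likely value independently, so all of these queries are linear in |X|.

```python
import numpy as np

theta = np.array([0.9, 0.2, 0.6, 0.5, 0.1])   # p(X_i = 1), assumed values

def evi(x):
    # complete evidence: product of per-variable factors
    return float(np.prod(np.where(x == 1, theta, 1 - theta)))

def mar(evidence):
    # evidence: {index: value}; unobserved variables marginalize to 1 and vanish
    return float(np.prod([theta[i] if v == 1 else 1 - theta[i] for i, v in evidence.items()]))

def map_state():
    # MAP: argmax independently per variable
    return (theta >= 0.5).astype(int)

print(evi(np.array([1, 0, 1, 1, 0])), mar({0: 1, 2: 1}), map_state())
```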

Page 89: [Figure: tractable bands. Queries Q: EVI, MAR, CON, MAP, MMAP, ADV. Models M: GANs, VAEs, Flows, Trees, Mixtures, Factorized]

Pages 90-93: Expressiveness vs. tractability

[Figure: models arranged along two axes, from less to more expressive efficient and from smaller to larger tractable bands; BNs, NFs, NADEs, MNs, VAEs and GANs sit on the expressive side, while fully factorized models, NB, Trees, Polytrees, TJTs, LTMs and Mixtures sit on the tractable side, with an X marking the sweet spot]

Expressive models are not very tractable…
…and tractable ones are not very expressive…
…probabilistic circuits are at the "sweet spot"

Page 94:

Probabilistic Circuits

Pages 95-97: Probabilistic circuits

A probabilistic circuit C over variables X is a computational graph encoding a (possibly unnormalized) probability distribution p(X)

⇒ operational semantics!

⇒ by constraining the graph we can make inference tractable…

Page 98: Stay tuned for...

Next: 1. What are the building blocks of probabilistic circuits? ⇒ How to build a tractable computational graph?
2. For which queries are probabilistic circuits tractable? ⇒ tractable classes induced by structural properties

After: How can probabilistic circuits be learned?

Pages 99-102: Distributions as computational graphs

Base case: a single node encoding a distribution
⇒ e.g., a Gaussian PDF for a continuous random variable
⇒ e.g., indicators for X or ¬X for a Boolean random variable

Simple distributions are tractable "black boxes" for:
EVI: output p(x) (density or mass)
MAR: output 1 (normalized) or Z (unnormalized)
MAP: output the mode

E.g., evaluating a Gaussian leaf over X at the input x = 1.3 outputs the density .33

Pages 103-106: Factorizations as product nodes (divide and conquer complexity)

p(X1, X2, X3) = p(X1) · p(X2) · p(X3)

⇒ e.g., modeling a multivariate Gaussian with a diagonal covariance matrix…
⇒ …with a product node over univariate Gaussian distributions

⇒ feedforward evaluation: if the leaves over X1, X2, X3 output 0.8, 0.5 and 0.9, the product node outputs .36

Pages 107-109: Mixtures as sum nodes (enhance expressiveness)

p(X) = w1 · p1(X) + w2 · p2(X)

⇒ e.g., modeling a mixture of Gaussians…
⇒ …as a weighted sum node over Gaussian input distributions

p(x) = 0.2 · p1(x) + 0.8 · p2(x)
⇒ feedforward evaluation: with leaf outputs 0.2 and 0.5, the sum node outputs .44

⇒ by stacking them we increase expressive efficiency

Pages 110-114: A grammar for tractable models (recursive semantics of probabilistic circuits)

[Figures: circuits are built recursively: a single input distribution over X1; a weighted sum over two distributions on X1; a product over X1 and X2; a weighted sum of products over X1 and X2; and a deeper circuit alternating sums and products over X1, X2, X3, X4]
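A minimal sketch (assumed, not the tutorial's code) of this recursive grammar in Python: leaves are input distributions, and sum and product nodes compose them into a computational graph that is evaluated feedforward for EVI.

```python
import numpy as np

class Gaussian:                     # input distribution (leaf) over one variable
    def __init__(self, var, mu, sigma):
        self.var, self.mu, self.sigma = var, mu, sigma
    def value(self, x):
        z = (x[self.var] - self.mu) / self.sigma
        return np.exp(-0.5 * z * z) / (self.sigma * np.sqrt(2 * np.pi))

class Product:                      # factorization node
    def __init__(self, children):
        self.children = children
    def value(self, x):
        return float(np.prod([c.value(x) for c in self.children]))

class Sum:                          # mixture node
    def __init__(self, weights, children):
        self.weights, self.children = weights, children
    def value(self, x):
        return float(np.dot(self.weights, [c.value(x) for c in self.children]))

# a small circuit over X1, X2: a weighted sum of two factorizations (assumed parameters)
circuit = Sum([0.3, 0.7], [
    Product([Gaussian("X1", -1.0, 1.0), Gaussian("X2", 0.0, 1.0)]),
    Product([Gaussian("X1", 2.0, 0.5), Gaussian("X2", 1.0, 2.0)]),
])
print(circuit.value({"X1": 0.4, "X2": 0.9}))   # EVI by feedforward evaluation
```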

Page 115: Probabilistic circuits are not PGMs!

They are probabilistic and graphical, however …

            PGMs                              Circuits
Nodes:      random variables                  units of computation
Edges:      dependencies                      order of execution
Inference:  conditioning, elimination,        feedforward pass, backward pass
            message passing

⇒ they are computational graphs, more like neural networks

Pages 116-117: Just sums, products and distributions?

[Figure: an arbitrary circuit over X1, . . . , X6 composing many sum and product nodes]

just arbitrarily compose them like a neural network!
⇒ structural constraints needed for tractability

Page 118:

Which structural constraints to ensure tractability?

Page 119: Decomposability

A product node is decomposable if its children depend on disjoint sets of variables
⇒ just like in factorization!

[Figure: a decomposable product over X1, X2, X3 vs. a non-decomposable product whose children share X1]

Darwiche et al., "A knowledge compilation map", 2002

Page 120: Smoothness, aka completeness

A sum node is smooth if its children depend on the same variable sets
⇒ otherwise it is not accounting for some variables

[Figure: a smooth sum whose children are both over X1 vs. a non-smooth sum whose children are over X1 and X2]

⇒ smoothness can be easily enforced [Shih et al. 2019]

Darwiche et al., "A knowledge compilation map", 2002
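A short sketch (assumed toy encoding of circuits as nested tuples) of how these two structural properties can be checked: compute each node's scope, then require disjoint scopes at products (decomposability) and identical scopes at sums (smoothness).

```python
# toy encoding (assumed): ("leaf", var), ("prod", children), ("sum", weights, children)
def scope(node):
    if node[0] == "leaf":
        return {node[1]}
    return set().union(*(scope(c) for c in node[-1]))

def is_decomposable(node):
    if node[0] == "leaf":
        return True
    ok = all(is_decomposable(c) for c in node[-1])
    if node[0] == "prod":   # children must cover disjoint sets of variables
        scopes = [scope(c) for c in node[-1]]
        ok &= sum(len(s) for s in scopes) == len(set().union(*scopes))
    return ok

def is_smooth(node):
    if node[0] == "leaf":
        return True
    ok = all(is_smooth(c) for c in node[-1])
    if node[0] == "sum":    # children must depend on the same set of variables
        ok &= len({frozenset(scope(c)) for c in node[-1]}) == 1
    return ok

good = ("sum", [0.5, 0.5], [("prod", [("leaf", "X1"), ("leaf", "X2")]),
                            ("prod", [("leaf", "X1"), ("leaf", "X2")])])
bad = ("prod", [("leaf", "X1"), ("leaf", "X1")])   # children share X1
print(is_decomposable(good), is_smooth(good), is_decomposable(bad))
```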

Page 121: Smoothness + decomposability = tractable MAR

Computing arbitrary integrations (or summations) ⇒ linear in circuit size!

E.g., suppose we want to compute the normalization constant Z = ∫ p(x) dx

Page 122: Smoothness + decomposability = tractable MAR

If p(x) = ∑_i w_i p_i(x) (smoothness):

∫ p(x) dx = ∫ ∑_i w_i p_i(x) dx = ∑_i w_i ∫ p_i(x) dx

⇒ integrals are "pushed down" to children

Page 123: Smoothness + decomposability = tractable MAR

If p(x, y, z) = p(x) p(y) p(z) (decomposability):

∫∫∫ p(x, y, z) dx dy dz = ∫∫∫ p(x) p(y) p(z) dx dy dz = ∫ p(x) dx · ∫ p(y) dy · ∫ p(z) dz

⇒ integrals decompose into easier ones

Pages 124-126: Smoothness + decomposability = tractable MAR

Forward pass evaluation for MAR ⇒ linear in circuit size!

E.g., to compute p(x2, x4):

leaves over X1 and X3 output Z_i = ∫ p(x_i) dx_i ⇒ for normalized leaf distributions: 1.0

leaves over X2 and X4 output EVI values

feedforward evaluation (bottom-up) then propagates these values through products and weighted sums

[Figure: the example circuit with leaf outputs 1.0, .61, 1.0, .83, 1.0, .58, 1.0, .77 and internal outputs .58, .77, .35, .64, .49]
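A sketch (assumed encoding, Bernoulli leaves) of this forward pass: leaves of marginalized variables output 1, evidence leaves output their EVI value, and products and weighted sums are evaluated bottom-up in a single linear pass.

```python
import numpy as np

# toy encoding (assumed): ("leaf", var, p) with p = Pr(var = 1), ("prod", children), ("sum", weights, children)
def forward(node, evidence):
    kind = node[0]
    if kind == "leaf":
        _, var, p = node
        if var not in evidence:        # marginalized variable: its distribution integrates to 1
            return 1.0
        return p if evidence[var] == 1 else 1.0 - p
    if kind == "prod":
        return float(np.prod([forward(c, evidence) for c in node[1]]))
    weights, children = node[1], node[2]
    return float(np.dot(weights, [forward(c, evidence) for c in children]))

circuit = ("sum", [0.4, 0.6], [
    ("prod", [("leaf", "X1", 0.8), ("leaf", "X2", 0.3)]),
    ("prod", [("leaf", "X1", 0.1), ("leaf", "X2", 0.9)]),
])
print(forward(circuit, {"X1": 1, "X2": 0}))   # EVI: p(x1=1, x2=0)
print(forward(circuit, {"X2": 0}))            # MAR: p(x2=0), X1 summed out
```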

Page 127: Smoothness + decomposability = tractable CON

Analogously, for arbitrary conditional queries:

p(q | e) = p(q, e) / p(e)

1. evaluate p(q, e) ⇒ one feedforward pass
2. evaluate p(e) ⇒ another feedforward pass

⇒ …still linear in circuit size!

Page 128: Tractable MAR: Robotics

Pixels for scenes and abstractions for maps decompose along circuit structures.

Fast and exact marginalization over unseen or "do not care" scene and map parts for hierarchical planning of robot executions

Pronobis et al., "Learning Deep Generative Spatial Models for Mobile Robots", 2016
Pronobis et al., "Deep spatial affordance hierarchy: Spatial knowledge representation for planning in large-scale environments", 2017
Zheng et al., "Learning graph-structured sum-product networks for probabilistic semantic maps", 2018

Pages 129-130: Smoothness + decomposability = tractable MAP?

We cannot decompose a MAP query argmax_q p(q | e) bottom-up, since at a sum node we are marginalizing out a latent variable:

argmax_q ∑_i w_i p_i(q, e) = argmax_q ∑_z p(q, z, e) ≠ ∑_z argmax_q p(q, z, e)

⇒ MAP for latent variable models is intractable [Conaty et al. 2017]
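A two-line numeric check (assumed toy numbers) of why the max cannot be pushed inside the sum over the latent variable: the maximizing q can differ across values of z, so swapping the operators changes the answer.

```python
# toy distribution p(q, z) over binary q and latent z (assumed numbers)
p = {(0, 0): 0.10, (0, 1): 0.45, (1, 0): 0.30, (1, 1): 0.15}

map_value = max(sum(p[(q, z)] for z in (0, 1)) for q in (0, 1))   # max_q sum_z p(q, z)
swapped = sum(max(p[(q, z)] for q in (0, 1)) for z in (0, 1))     # sum_z max_q p(q, z)
print(map_value, swapped)   # 0.55 vs 0.75: swapping overestimates the MAP value
```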

Page 131: Determinism, aka selectivity

A sum node is deterministic if the output of only one child is nonzero for any input
⇒ e.g., if its children's distributions have disjoint support

[Figure: a deterministic sum whose children are products gated by X1 ≤ θ and X1 > θ, vs. a non-deterministic sum whose children both cover X1, X2]

Page 132: Probabilistic Representations Circuits Inference Learning …starai.cs.ucla.edu/slides/AAAI20.pdf · 2020. 2. 19. · Probabilistic Circuits Representations Inference Learning Applications

Determinism + decomposability = tractable MAP

Computing a maximization with arbitrary evidence e ⇒ linear in circuit size!

E.g., suppose we want to compute: max_q p(q | e)



If p(q, e) = ∑_i w_i p_i(q, e) = max_i w_i p_i(q, e) (deterministic sum node):

max_q p(q, e) = max_q ∑_i w_i p_i(q, e)
            = max_q max_i w_i p_i(q, e)
            = max_i max_q w_i p_i(q, e)

⇒ only one non-zero child term, thus the sum is a max



If p(q, e) = p(q_x, e_x, q_y, e_y) = p(q_x, e_x) p(q_y, e_y) (decomposable product node):

max_q p(q | e) ∝ max_q p(q, e)
            = max_{q_x, q_y} p(q_x, e_x, q_y, e_y)
            = max_{q_x} p(q_x, e_x) · max_{q_y} p(q_y, e_y)

⇒ the two optimizations are solved independently



Evaluating the circuit twice, bottom-up and top-down ⇒ still linear in circuit size!

E.g., for argmax_{x1,x3} p(x1, x3 | x2, x4):

1. turn sum nodes into max nodes and distributions into max distributions

2. evaluate p(x2, x4) bottom-up

3. retrieve the max activations top-down

4. compute the MAP states for X1 and X3 at the leaves

[Figure: the circuit with its sum nodes replaced by max nodes, evaluated bottom-up; leaf outputs .66 (X1), .61 (X2), .2 (X1), .83 (X2), 1.2 (X3), .58 (X4), 1.8 (X3), .77 (X4); internal node outputs .58, .77, .35, .64; root value .49]

71/158

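A minimal Python sketch of the four steps above, on an assumed toy circuit (not the slides' running example): sum nodes act as max nodes on the way up and remember their maximizing child's assignment, products merge the assignments of their children thanks to decomposability, and leaves of unobserved variables return their own maximizer. For non-deterministic circuits this bottom-up max only gives an approximation of MAP.

def map_state(node, evidence):
    """Return (max value, maximizing assignment) for the sub-circuit rooted at node."""
    kind = node[0]
    if kind == "leaf":                                  # ("leaf", var, pmf)
        _, var, pmf = node
        if var in evidence:
            return pmf[evidence[var]], {var: evidence[var]}
        best = max(pmf, key=pmf.get)                    # maximizer distribution at the leaf
        return pmf[best], {var: best}
    if kind == "product":                               # decomposability: maximize each child independently
        value, assignment = 1.0, {}
        for child in node[1]:
            v, a = map_state(child, evidence)
            value *= v
            assignment.update(a)
        return value, assignment
    best_value, best_assignment = -1.0, None            # sum node acting as a max node
    for w, child in zip(node[1], node[2]):
        v, a = map_state(child, evidence)
        if w * v > best_value:
            best_value, best_assignment = w * v, a
    return best_value, best_assignment

bern = lambda var, p: ("leaf", var, {1: p, 0: 1 - p})
circuit = ("sum", [0.4, 0.6],
           [("product", [bern("X1", 0.9), bern("X2", 0.2)]),
            ("product", [bern("X1", 0.1), bern("X2", 0.7)])])
print(map_state(circuit, {"X2": 1}))                    # argmax_x1 p(x1 | x2 = 1)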

MAP inference: image segmentation

Semantic segmentation is MAP inference over the joint pixel and label space.

Even approximate MAP for non-deterministic circuits (SPNs) delivers good performance.

Rathke et al., “Locally adaptive probabilistic models for global segmentation of pathological OCT scans”, 2017
Yuan et al., “Modeling spatial layout for scene image understanding via a novel multiscale sum-product network”, 2016
Friesen et al., “Submodular Sum-product Networks for Scene Understanding”, 2016

72/158

Determinism + decomposability = tractable MMAP

Analogously, can we also answer a MMAP (marginal MAP) query?

argmax_q ∑_z p(q, z | e)

73/158


Determinism + decomposability = tractable MMAP

We cannot decompose a MMAP query!

argmax_q ∑_z p(q, z | e)

we still have latent variables to marginalize…

⇒ The final part of this tutorial will talk more about advanced queries and their tractability properties.

74/158


[Figure: models placed on two axes, from less to more expressive efficient and from fewer to more tractable queries. GANs, VAEs, NFs, MNs, NADEs and BNs sit on the expressive but less tractable side; fully factorized models, NB, Trees, Polytrees, TJT, LTM and Mixtures sit on the tractable but less expressive side; a “?” marks the empty region in between]

where are probabilistic circuits?

75/158

[Figure: the same expressiveness-vs-tractability plot with probabilistic circuits (PCs) added, spanning the region between the fully tractable and the highly expressive models]

tractability vs expressive efficiency

76/158

[Figure: the same plot with concrete circuit families (SPNs, PSDDs, CNets, ACs, AoGs) in place of the generic PCs label, occupying the region between the most tractable and the most expressive models]

tractability vs expressive efficiency

77/158

How expressive are probabilistic circuits?

Measuring average test set log-likelihood on 20 density estimation benchmarks

Comparing against intractable models:

Bayesian networks (BN) [Chickering 2002] with sophisticated context-specific CPDs

MADEs [Germain et al. 2015]

VAEs [Kingma et al. 2014] (IWAE ELBO [Burda et al. 2015])

Gens et al., “Learning the Structure of Sum-Product Networks”, 2013
Peharz et al., “Random sum-product networks: A simple but effective approach to probabilistic deep learning”, 2019

78/158

How expressive are probabilistic circuits?
density estimation benchmarks

dataset    best circuit   BN       MADE     VAE      dataset   best circuit   BN        MADE      VAE
nltcs      -5.99          -6.02    -6.04    -5.99    dna       -79.88         -80.65    -82.77    -94.56
msnbc      -6.04          -6.04    -6.06    -6.09    kosarek   -10.52         -10.83    -         -10.64
kdd        -2.12          -2.19    -2.07    -2.12    msweb     -9.62          -9.70     -9.59     -9.73
plants     -11.84         -12.65   -12.32   -12.34   book      -33.82         -36.41    -33.95    -33.19
audio      -39.39         -40.50   -38.95   -38.67   movie     -50.34         -54.37    -48.7     -47.43
jester     -51.29         -51.07   -52.23   -51.54   webkb     -149.20        -157.43   -149.59   -146.9
netflix    -55.71         -57.02   -55.16   -54.73   cr52      -81.87         -87.56    -82.80    -81.33
accidents  -26.89         -26.32   -26.42   -29.11   c20ng     -151.02        -158.95   -153.18   -146.9
retail     -10.72         -10.87   -10.81   -10.83   bbc       -229.21        -257.86   -242.40   -240.94
pumsb*     -22.15         -21.72   -22.3    -25.16   ad        -14.00         -18.35    -13.65    -18.81

79/158


Hybrid intractable + tractable EVI

VAEs as intractable input distributions, orchestrated by a circuit on top

⇒ decomposing a joint ELBO: better lower bounds than a single VAE
⇒ more expressively efficient and less data hungry

Tan et al., “Hierarchical Decompositional Mixtures of Variational Autoencoders”, 2019 80/158


Learning Probabilistic Circuits


Learning probabilistic circuits

A probabilistic circuit C over variables X is a computational graph encoding a (possibly unnormalized) probability distribution p(X) parameterized by Ω


Learning a circuit C from data D can therefore involve learning the graph (structure) and/or its parameters

82/158


Learning probabilistic circuits

                Parameters    Structure
Generative          ?             ?
Discriminative      ?             ?

83/158


Stay tuned for...

Next: 1. How to learn circuit parameters?
⇒ convex optimization, EM, SGD, Bayesian learning, …

2. How to learn the structure of circuits?
⇒ local search, random structures, ensembles, …

After: How are circuits related to other tractable models?

84/158


Learning probabilistic circuits

Probabilistic circuits are (peculiar) neural networks… just backprop with SGD!


…end of Learning section!


wait but…

SGD is slow to converge… can we do better?

How to learn normalized weights?

Can we exploit structural properties somehow?

85/158

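A minimal sketch of the "just backprop with SGD" view, under assumptions not in the slides: the PC is a two-component mixture of factorized Bernoullis written as a log-space computational graph in PyTorch, and the sum weights stay normalized because they are parameterized through a softmax over unconstrained logits (one simple answer to the normalized-weights question above).

import torch

torch.manual_seed(0)
w_logits = torch.zeros(2, requires_grad=True)             # sum-node weights, pre-softmax
leaf_logits = torch.zeros(2, 2, requires_grad=True)       # Bernoulli logits: component k, variable d

def log_p(x):                                              # x: (batch, 2) with entries in {0, 1}
    log_w = torch.log_softmax(w_logits, dim=0)             # normalized sum weights
    theta = torch.sigmoid(leaf_logits)                     # leaf parameters
    log_leaves = x.unsqueeze(1) * torch.log(theta) + (1 - x.unsqueeze(1)) * torch.log(1 - theta)
    log_prod = log_leaves.sum(dim=2)                       # product node = sum of log-leaves
    return torch.logsumexp(log_w + log_prod, dim=1)        # sum node in log-space

data = torch.tensor([[0., 0.], [0., 1.], [1., 1.], [1., 1.]])
opt = torch.optim.SGD([w_logits, leaf_logits], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    loss = -log_p(data).mean()                             # negative log-likelihood
    loss.backward()                                        # backprop through the circuit
    opt.step()
print(loss.item())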

Learning input distributions
As simple as tossing a coin


The simplest PC: a single input distribution p_L with parameters θ
⇒ maximum likelihood (ML) estimation over data D


E.g. Bernoulli with parameter θ

θ_ML = (∑_{x∈D} 1[x = 1] + α) / (|D| + 2α)   ⇒ Laplace smoothing

86/158

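A one-function sketch of the estimator above (hypothetical helper, not from the slides):

def bernoulli_mle(data, alpha=1.0):
    # theta_ML = (number of ones + alpha) / (|D| + 2*alpha), i.e. Laplace smoothing
    ones = sum(1 for x in data if x == 1)
    return (ones + alpha) / (len(data) + 2 * alpha)

print(bernoulli_mle([1, 0, 1, 1]))   # (3 + 1) / (4 + 2) = 0.666...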

Learning input distributions
General case: still simple

Bernoulli, Gaussian, Dirichlet, Poisson, Gamma are exponential families of the form:

p_L(x) = h(x) exp(T(x)^⊤ θ − A(θ))


Where:

A(θ): log-normalizer

h(x): base measure

T(x): sufficient statistics

θ: natural parameters


or ϕ: expectation parameters, in 1:1 correspondence with θ ⇒ θ = θ(ϕ)

87/158


Maximum likelihood estimation is still “counting”:

ϕ_ML = E_D[T(x)] = (1/|D|) ∑_{x∈D} T(x),    θ_ML = θ(ϕ_ML)

87/158

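For instance, a Gaussian leaf can be fit by averaging its sufficient statistics T(x) = (x, x²) over the data and mapping them back to the usual mean and variance; a minimal sketch (assumed helper, not from the slides):

def gaussian_mle(data):
    n = len(data)
    m1 = sum(data) / n                       # E_D[x]
    m2 = sum(v * v for v in data) / n        # E_D[x^2]
    return m1, m2 - m1 * m1                  # mean, variance

print(gaussian_mle([1.0, 2.0, 2.5, 4.0]))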

The simplest “real” PC: a sum node

[Figure: a density p(X1) over X1 plotted on [−10, 10], formed as the weighted combination (weights w1, w2) of two leaf densities over X1]

Recall that sum nodes represent mixture models:

p_S(x) = ∑_{k=1}^{K} w_k p_{L_k}(x)


Recall that sum nodes represent latent variable models:

p_S(x) = ∑_{k=1}^{K} p(Z = k) p(x | Z = k)

88/158


Expectation-Maximization (EM)
Learning latent variable models: the EM recipe

Expectation-maximization = maximum likelihood under missing data.

Given: p(X, Z) where X is observed and Z is missing at random.

θ_new ← argmax_θ E_{p(Z | X; θ_old)} [log p(X, Z; θ)]

89/158


Expectation-Maximization for mixtures

θ_new ← argmax_θ E_{p(Z | X; θ_old)} [log p(X, Z; θ)]

ML if Z was observed:

w_k = ∑_{z∈D} 1[z = k] / |D|        ϕ_k = ∑_{x,z∈D} 1[z = k] T(x) / ∑_{z∈D} 1[z = k]

Z is unobserved, but we have p(Z = k | x) ∝ w_k L_k(x):

w_k^new = ∑_{x∈D} p(Z = k | x) / |D|        ϕ_k^new = ∑_{x∈D} p(Z = k | x) T(x) / ∑_{x∈D} p(Z = k | x)

90/158

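A compact sketch of one EM iteration for a sum node over two factorized Bernoulli children (a 2-component mixture over binary vectors), following the update rules above with T(x) = x; the data, component count and initialization are made up for illustration.

import numpy as np

def em_step(X, w, theta):
    # E-step: responsibilities p(Z = k | x) ∝ w_k * prod_d Bernoulli(x_d; theta[k, d])
    like = np.prod(theta[None] ** X[:, None] * (1 - theta[None]) ** (1 - X[:, None]), axis=2)
    resp = w * like
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted counts, as in the update rules above
    Nk = resp.sum(axis=0)
    return Nk / len(X), (resp.T @ X) / Nk[:, None]

X = np.array([[0, 0], [0, 1], [1, 1], [1, 1]], dtype=float)
w, theta = np.array([0.5, 0.5]), np.array([[0.3, 0.3], [0.7, 0.7]])
for _ in range(25):
    w, theta = em_step(X, w, theta)
print(w, theta)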

Expectation-Maximization for PCs

EM for mixtures well understood.

Mixtures are PCs with 1 sum node.

The general case, PCs with many sum nodes, is similar …

…but a bit more complicated.

91/158


Augmentation
Making Latent Variables Explicit [Peharz et al. 2016]

[Figure sequence: a sum node S is augmented with indicator leaves for its latent variable Z_S, together with a “twin” sum node, so that the latent variable selection becomes explicit in the circuit]

Setting all indicators to 1 ⇒ same computation.
Setting a single indicator to 1 ⇒ switches on the corresponding child.

Have we included Z_S in the model? Yes, but we might have destroyed smoothness…
We can fix this though… This is an example of smoothing.

But what did we mean with this ctx? One can show that the latent variables “above”…
…select either a path to S (ctx = 1), or…
…to its “twin” (ctx = 0), but not both.
Thus, sum weights have sound probabilistic semantics.

Note that when ctx = 0, Z_S becomes independent of X! Thus, w1, w2, w3 can be set arbitrarily. Do we need to store them then? No!

This additional structure is a theoretical tool…
…and doesn't need to be generated in memory.

92/158

Expectation-Maximization
Tractable MAR (smooth, decomposable)

For learning, we need to know for each sum node S:

1. Is S reached? (ctx = ?)

2. Which child does it select? (Z_S = ?)

We can infer it: p(ctx, Z_S | x)

93/158

Expectation-Maximization
Tractable MAR (smooth, decomposable)

w_{i,j}^new ← ∑_{x∈D} p[ctx_i = 1, Z_i = j | x; w_old] / ∑_{x∈D} p[ctx_i = 1 | x; w_old]

We get all the required statistics with a single backprop pass:

p[ctx_i = 1, Z_i = j | x; w_old] = (1 / p(x)) · ∂p(x)/∂S_i(x) · N_j(x) · w_{i,j}^old

⇒ This also works with missing values in x!
⇒ Similar updates for the leaves, when they are in the exponential family.
⇒ Also derivable from a concave-convex procedure (CCCP) [Zhao et al. 2016a]

Darwiche, “A Differential Approach to Inference in Bayesian Networks”, 2003
Peharz et al., “On the Latent Variable Interpretation in Sum-Product Networks”, 2016

94/158
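A small sketch of where these statistics come from, under assumptions not in the slides: for the root sum node of a tiny mixture circuit we have ∂p(x)/∂S_root = 1, so the responsibility of child j is w_j N_j(x) / p(x); the snippet below reads the equivalent quantity ∂p(x)/∂N_j = w_j off a PyTorch backward pass and plugs it into the formula above.

import torch

w = torch.tensor([0.3, 0.7])                        # sum-node weights (old)
theta = torch.tensor([[0.2, 0.8], [0.9, 0.1]])      # Bernoulli leaves: child j, variable d
x = torch.tensor([1.0, 1.0])

children = (theta * x + (1 - theta) * (1 - x)).prod(dim=1)   # product-node outputs N_j(x)
N = children.detach().requires_grad_(True)                    # treat child outputs as inputs
p = (w * N).sum()                                             # root sum node output p(x)
p.backward()                                                   # N.grad[j] = dp/dN_j = w_j
resp = N.grad * N.detach() / p.detach()                        # = w_j * N_j(x) / p(x)
print(resp, resp.sum())                                        # responsibilities, summing to 1

Averaging these responsibilities over the dataset gives exactly the numerator and denominator of the weight update.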

Exact ML
Tractable MAR/MAP (smooth, decomposable, deterministic)

For deterministic circuits, EM simplifies to exact maximum likelihood.

Deterministic circuit ⇒ at most one non-zero sum child (for complete input).

E.g., if the second child of this sum node is the non-zero one, that rules out Z_S ∈ {1, 3} ⇒ Z_S = 2.

Thus, the latent variables are actually observed in deterministic circuits!
They are (deterministic) functions of the observed data.

96/158

Example
Tractable MAR/MAP (smooth, decomposable, deterministic)

For each sum node, we know

1. if it is reached (ctx = 1)

2. which child it selects

⇒ MLE by counting!

97/158

Exact ML
Tractable MAR/MAP (smooth, decomposable, deterministic)

Given a complete dataset D, the maximum-likelihood sum-weights are:

w_{i,j}^ML = ∑_{x∈D} 1[x ⊨ [i ∧ j]] / ∑_{x∈D} 1[x ⊨ [i]]

(the numerator counts samples with ctx_i = 1 and Z_i = j; the denominator counts samples with ctx_i = 1)

⇒ global maximum with a single pass over D
⇒ regularization, e.g. Laplace smoothing, to avoid division by zero
⇒ when data is missing, fall back to EM

Kisa et al., “Probabilistic sentential decision diagrams”, 2014
Peharz et al., “Learning Selective Sum-Product Networks”, 2014

98/158
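A minimal counting sketch (hypothetical helper, not the slides' code) for a single deterministic sum node whose two children have disjoint supports X1 ≤ t and X1 > t, as in the earlier determinism example: with complete data each sample selects exactly one child, so the ML weights are just smoothed relative counts.

def deterministic_sum_mle(x1_data, t, alpha=1.0):
    n_left = sum(1 for x in x1_data if x <= t)         # samples selecting child 1 (X1 <= t)
    n_right = len(x1_data) - n_left                     # samples selecting child 2 (X1 > t)
    n = len(x1_data)
    return ((n_left + alpha) / (n + 2 * alpha),         # Laplace smoothing avoids zero weights
            (n_right + alpha) / (n + 2 * alpha))

print(deterministic_sum_mle([0.1, 0.4, 2.3, 3.1, 0.2], t=1.0))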

Bayesian parameter learning

Formulate a prior p(w, θ) over sum-weights and leaf parameters and perform posterior inference:

p(w,θ|D) ∝ p(w,θ) p(D|w,θ)

Moment matching (oBMM) [Jaini et al. 2016; Rashwan et al. 2016]

Collapsed variational inference algorithm [Zhao et al. 2016b]

Gibbs sampling [Trapp et al. 2019; Vergari et al. 2019]

99/158

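For a single Bernoulli leaf the posterior is available in closed form under a conjugate Beta prior; circuit-wide posteriors over all sum-weights and leaf parameters are what the approximate schemes above target. A minimal sketch (assumed prior and helper, not from the slides):

def beta_posterior(data, a=1.0, b=1.0):
    ones = sum(data)
    return a + ones, b + len(data) - ones        # posterior is Beta(a + #ones, b + #zeros)

print(beta_posterior([1, 0, 1, 1]))              # (4.0, 2.0)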

Learning probabilistic circuits

                Parameters                 Structure
Generative      (see below)                    ?
Discriminative      ?                          ?

Generative parameter learning:
deterministic: closed-form MLE [Kisa et al. 2014a; Peharz et al. 2014]
non-deterministic: EM [Poon et al. 2011; Peharz 2015; Zhao et al. 2016a], SGD [Sharir et al. 2016; Peharz et al. 2019a], Bayesian [Jaini et al. 2016; Rashwan et al. 2016; Zhao et al. 2016b; Trapp et al. 2019; Vergari et al. 2019]

100/158


Image-tailored (handcrafted) structures
“Recursive Image Slicing”

[Figure sequence: an image is recursively sliced into sub-regions; the alternative slicings and their parts give the circuit structure]

⇒ Smooth & Decomposable
⇒ Tractable MAR

Poon et al., “Sum-Product Networks: a New Deep Architecture”, 2011

101/158

Learning the structure from data
“Recursive Data Slicing” (LearnSPN)

Cluster the data rows → sum node

Try to find independent groups of random variables: success → product node

Single variable → leaf

If no independent groups are found: cluster the rows again → sum node

⇒ Continue until no further leaf can be expanded.
⇒ Clustering ratios also deliver (initial) parameters.
⇒ Smooth & Decomposable
⇒ Tractable MAR

Gens et al., “Learning the Structure of Sum-Product Networks”, 2013

102/158
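A heavily simplified LearnSPN-style sketch, with assumptions not in the paper: rows are clustered with a crude 2-means-like assignment to create sum nodes, columns are split by a pairwise-correlation threshold (standing in for a proper independence test) to create product nodes, and single columns become smoothed Bernoulli leaves.

import numpy as np

def learn_spn(data, cols, corr_thresh=0.1, min_rows=8):
    if len(cols) == 1:
        p = (data[:, cols[0]].sum() + 1.0) / (len(data) + 2.0)   # Laplace-smoothed Bernoulli leaf
        return ("leaf", cols[0], p)
    if len(data) < min_rows:                                     # too few rows: fully factorize
        return ("product", [learn_spn(data, [v], corr_thresh, min_rows) for v in cols])
    c = np.nan_to_num(np.corrcoef(data[:, cols].T))              # crude dependence measure
    group = [cols[0]] + [cols[j] for j in range(1, len(cols)) if abs(c[0, j]) >= corr_thresh]
    rest = [v for v in cols if v not in group]
    if rest:                                                     # found (approximately) independent groups
        return ("product", [learn_spn(data, group, corr_thresh, min_rows),
                            learn_spn(data, rest, corr_thresh, min_rows)])
    seeds = data[np.random.choice(len(data), 2, replace=False)]  # crude 2-means-like row clustering
    assign = ((data[:, None, :] - seeds[None]) ** 2).sum(-1).argmin(1)
    if assign.min() == assign.max():                             # clustering collapsed: factorize
        return ("product", [learn_spn(data, [v], corr_thresh, min_rows) for v in cols])
    weights = [float((assign == k).mean()) for k in (0, 1)]      # cluster proportions become sum weights
    children = [learn_spn(data[assign == k], cols, corr_thresh, min_rows) for k in (0, 1)]
    return ("sum", weights, children)

np.random.seed(0)
X = np.vstack([np.random.binomial(1, 0.9, (50, 4)),              # two clearly separated row clusters
               np.random.binomial(1, 0.1, (50, 4))]).astype(float)
print(learn_spn(X, [0, 1, 2, 3]))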

LearnSPN
Variants

ID-SPN [Rooshenas et al. 2014]

LearnSPN-b/T/B [Vergari et al. 2015]

for heterogeneous data [Molina et al. 2018]

using k-means [Butz et al. 2018] or SVD splits [Adel et al. 2015]

learning DAGs [Dennis et al. 2015; Jaini et al. 2018]

approximating independence tests [Di Mauro et al. 2018]

103/158


Structure Learning + MAP (determinism)“Recursive conditioning” — Cutset Networks [Rahman et al. 2014]

104/158


…and so on, recursively conditioning on one variable at a time.


Convert into a PC… The resulting PC is deterministic.
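A toy sketch of recursive conditioning on binary data; real cutset networks pick the conditioning variable with an information-gain heuristic and fit Chow-Liu trees at the leaves, while this sketch splits on the highest-entropy variable and uses fully factorized leaves to stay short.

import numpy as np

def entropy(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def learn_cnet(data, variables, max_depth=3):
    means = data.mean(axis=0)
    if max_depth == 0 or data.shape[0] < 20 or len(variables) == 1:
        return ("leaf", variables, means)          # factorized leaf (stand-in for a Chow-Liu tree)
    j = int(np.argmax(entropy(means)))             # condition on the most uncertain variable
    mask = data[:, j] == 1
    if mask.all() or not mask.any():               # degenerate split -> stop
        return ("leaf", variables, means)
    w = float(mask.mean())                         # OR-node weight p(X_j = 1)
    rest = [i for i in range(data.shape[1]) if i != j]
    rest_vars = [variables[i] for i in rest]
    left = learn_cnet(data[mask][:, rest], rest_vars, max_depth - 1)    # X_j = 1 branch
    right = learn_cnet(data[~mask][:, rest], rest_vars, max_depth - 1)  # X_j = 0 branch
    return ("or", variables[j], w, left, right)

X = (np.random.rand(1000, 5) > 0.5).astype(int)
cnet = learn_cnet(X, variables=list(range(X.shape[1])))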

Cutset Networks (CNets): Variants

Variable selection based on entropy [Rahman et al. 2014]

Can be extended to mixtures of CNets using EM [ibid.]

Structure search over OR-graphs/CL-trees [Di Mauro et al. 2015a]

Boosted CNets [Rahman et al. 2016]

Randomized CNets, Bagging [Di Mauro et al. 2017]

105/158

Structure learning + MAP (determinism): Greedy structure search [Peharz 2014; Lowd et al. 2008; Liang et al. 2017a]

Structure learning as discrete optimization

Typical objective:

O = log L + λ |C|,

where log L is the log-likelihood under ML parameters and |C| is the PC's size (⇔ worst-case inference cost).

Iterate:
1. Start with a simple initial structure.
2. Perform local structure modifications, greedily improving O.
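A schematic hill-climbing loop for this search; candidate_moves, fit_ml_parameters, log_likelihood and size are hypothetical hooks standing in for the model-specific pieces, and a negative λ penalizes circuit size.

def structure_search(data, initial_structure, candidate_moves,
                     fit_ml_parameters, log_likelihood, size,
                     lam=-0.01, max_iters=100):
    # candidate_moves(circuit) yields callables, each producing a locally modified structure
    circuit = fit_ml_parameters(initial_structure, data)
    best_score = log_likelihood(circuit, data) + lam * size(circuit)
    for _ in range(max_iters):
        improved = False
        for move in candidate_moves(circuit):
            candidate = fit_ml_parameters(move(circuit), data)
            score = log_likelihood(candidate, data) + lam * size(candidate)
            if score > best_score:
                circuit, best_score, improved = candidate, score, True
        if not improved:
            break
    return circuit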

106/158


Randomized structure learning

Extremely Randomized CNets (XCNets) [Di Mauro et al. 2017]

Top-down random conditioning.

Learning Chow-Liu trees at the leaves.

Smooth, decomposable, deterministic.

Random Tensorized SPNs (RAT-SPNs) [Peharz et al. 2019a]

Random tree-shaped PCs.

Discriminative+generative parameter learning (SGD/EM + dropout).

Smooth, decomposable.

107/158


Ensembles of probabilistic circuits

Single circuits might not be accurate enough, or may overfit the training data… Solution: ensembles of circuits!

⇒ non-deterministic mixture models: another sum node!

p(X) = Σ_{i=1}^{K} λ_i C_i(X),   with λ_i ≥ 0 and Σ_{i=1}^{K} λ_i = 1

Ensemble weights and components can be learned separately or jointly

EM or structural EM

bagging

boosting
108/158


Bagging

more efficient than EM

mixture coefficients are set equally probable

mixture components can be learned independently on different bootstraps

Adding random subspace projection to bagged networks (like for CNets)

more efficient than bagging

Di Mauro et al., “Learning Accurate Cutset Networks by Exploiting Decomposability”, 2015
Di Mauro et al., “Learning Bayesian Random Cutset Forests”, 2015
109/158
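A minimal sketch of the bagging recipe above: each component is learned independently on a bootstrap resample and the mixture coefficients are set equally probable; learn_circuit is any circuit learner (for instance the LearnSPN sketch earlier), passed in as an assumption rather than a fixed API.

import numpy as np

def learn_bagged_ensemble(data, learn_circuit, n_components=5, seed=0):
    rng = np.random.default_rng(seed)
    components = []
    for _ in range(n_components):
        idx = rng.integers(0, len(data), size=len(data))   # bootstrap resample
        components.append(learn_circuit(data[idx]))
    weight = 1.0 / n_components                            # equal mixture coefficients
    return [(weight, c) for c in components]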


Boosting

Boosting Probabilistic Circuits

BDE: boosting density estimation
sequentially grows the ensemble, adding a weak base learner at each stage
at each boosting step m, find a weak learner c_m and a coefficient η_m maximizing the weighted LL of the new model

f_m = (1 − η_m) f_{m−1} + η_m c_m

GBDE: a kernel-based generalization of BDE — AdaBoost-style algorithm

sequential EM: at each step m, jointly optimize η_m and c_m keeping f_{m−1} fixed

Rahman et al., “Learning Ensembles of Cutset Networks”, 2016 110/158
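A schematic BDE-style boosting loop, not the exact algorithm of Rahman et al. 2016: learn_component and density are hypothetical hooks for a concrete circuit learner and its likelihood evaluation, and instances poorly explained by the current mixture are up-weighted before the next stage.

import numpy as np

def boost_density_estimator(data, learn_component, density, n_stages=5, eta=0.5):
    mixture = []                                   # list of (weight, component)
    weights = np.full(len(data), 1.0 / len(data))  # instance weights
    for _ in range(n_stages):
        c = learn_component(data, weights)
        # f_m = (1 - eta) * f_{m-1} + eta * c_m : rescale old weights, append the new component
        mixture = [(w * (1.0 - eta), comp) for w, comp in mixture]
        mixture.append((eta if mixture else 1.0, c))
        # up-weight instances the current mixture explains poorly
        p = np.array([sum(w * density(comp, x) for w, comp in mixture) for x in data])
        weights = 1.0 / np.clip(p, 1e-12, None)
        weights /= weights.sum()
    return mixture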


Learning probabilistic circuits (Parameters × Structure)

Generative
  Parameters:
    deterministic: closed-form MLE [Kisa et al. 2014a; Peharz et al. 2014]
    non-deterministic: EM [Poon et al. 2011; Peharz 2015; Zhao et al. 2016a], SGD [Sharir et al. 2016; Peharz et al. 2019a], Bayesian [Jaini et al. 2016; Rashwan et al. 2016; Zhao et al. 2016b; Trapp et al. 2019; Vergari et al. 2019]
  Structure:
    greedy: top-down [Gens et al. 2013; Rooshenas et al. 2014; Rahman et al. 2014; Vergari et al. 2015], bottom-up [Peharz et al. 2013], hill climbing [Lowd et al. 2008, 2013; Peharz et al. 2014; Dennis et al. 2015; Liang et al. 2017a]
    random: RAT-SPNs [Peharz et al. 2019a], XCNets [Di Mauro et al. 2017]

Discriminative
  Parameters: ?
  Structure: ?

111/158


EVI inference : density estimation

dataset single models ensembles dataset single models ensembles

nltcs -5.99 [ID-SPN] -5.99 [LearnPSDDs] dna -79.88 [SPGM] -80.07 [SPN-btb]

msnbc -6.04 [Prometheus] -6.04 [LearnPSDDs] kosarek -10.59 [Prometheus] -10.52 [LearnPSDDs]

kdd -2.12 [Prometheus] -2.12 [LearnPSDDs] msweb -9.73 [ID-SPN] -9.62 [XCNets]

plants -12.54 [ID-SPN] -11.84 [XCNets] book -34.14 [ID-SPN] -33.82 [SPN-btb]

audio -39.77 [BNP-SPN] -39.39 [XCNets] movie -51.49 [Prometheus] -50.34 [XCNets]

jester -52.42 [BNP-SPN] -51.29 [LearnPSDDs] webkb -151.84 [ID-SPN] -149.20 [XCNets]

netflix -56.36 [ID-SPN] -55.71 [LearnPSDDs] cr52 -83.35 [ID-SPN] -81.87 [XCNets]

accidents -26.89 [SPGM] -29.10 [XCNets] c20ng -151.47 [ID-SPN] -151.02 [XCNets]

retail -10.85 [ID-SPN] -10.72 [LearnPSDDs] bbc -248.5 [Prometheus] -229.21 [XCNets]

pumbs* -22.15 [SPGM] -22.67 [SPN-btb] ad -15.40 [CNetXD] -14.00 [XCNets]

112/158


Learning probabilistic circuits (Parameters × Structure)

Generative
  Parameters:
    deterministic: closed-form MLE [Kisa et al. 2014a; Peharz et al. 2014]
    non-deterministic: EM [Poon et al. 2011; Peharz 2015; Zhao et al. 2016a], SGD [Sharir et al. 2016; Peharz et al. 2019a], Bayesian [Jaini et al. 2016; Rashwan et al. 2016; Zhao et al. 2016b; Trapp et al. 2019; Vergari et al. 2019]
  Structure:
    greedy: top-down [Gens et al. 2013; Rooshenas et al. 2014; Rahman et al. 2014; Vergari et al. 2015], bottom-up [Peharz et al. 2013], hill climbing [Lowd et al. 2008, 2013; Peharz et al. 2014; Dennis et al. 2015; Liang et al. 2017a]
    random: RAT-SPNs [Peharz et al. 2019a], XCNets [Di Mauro et al. 2017]

Discriminative
  Parameters:
    deterministic: convex-opt MLE [Liang et al. 2019]
    non-deterministic: EM [Rashwan et al. 2018], SGD [Gens et al. 2012; Sharir et al. 2016; Peharz et al. 2019a]
  Structure:
    greedy: top-down [Shao et al. 2019], hill climbing [Rooshenas et al. 2016]

113/158


Representations and theory


Stay tuned for...

Next:
1. How are probabilistic circuits related to logical ones? ⇒ a historical perspective
2. How can classical tractable models be turned into a circuit? ⇒ compiling low-treewidth PGMs
3. How do PCs in the literature relate and differ? ⇒ SPNs, ACs, CNets, PSDDs

After: More advanced query classes and structural properties!
115/158


Tractability to other semi-rings

Tractable probabilistic inference exploits efficient summation for decomposable functions in the probability commutative semiring:

(ℝ, +, ×, 0, 1)

Analogously, efficient computations can be done in other semi-rings:

(S, ⊕, ⊗, 0_⊕, 1_⊗)  ⇒ algebraic model counting [Kimmig et al. 2017], semi-ring programming [Belle et al. 2016]

Historically, very well studied for Boolean functions:

(𝔹 = {0, 1}, ∨, ∧, 0, 1)  ⇒ logical circuits!
116/158


Logical circuits

[Figure: three example logical circuit diagrams]

s/d-D/NNFs [Darwiche et al. 2002a]

O/BDDs [Bryant 1986]

SDDs [Darwiche 2011]

Logical circuits are compact representations for Boolean functions…
117/158

Logical circuits: structural properties

…and like probabilistic circuits, one can define structural properties: (structured) decomposability, smoothness, and determinism, allowing for tractable computations

Darwiche et al., “A knowledge compilation map”, 2002 118/158

Logical circuits: a knowledge compilation map

…inducing a hierarchy of tractable logical circuit families

Darwiche et al., “A knowledge compilation map”, 2002 119/158

Logical circuits: connection to probabilistic circuits through WMC

A task called weighted model counting (WMC)

WMC(Δ, w) = Σ_{x ⊨ Δ} ∏_{l ∈ x} w(l)

Probabilistic inference by WMC:
1. Encode the probabilistic model as a WMC formula Δ
2. Compile Δ into a logical circuit (e.g. d-DNNF, OBDD, SDD, etc.)
3. Tractable MAR/CON by tractable WMC on the circuit
4. Answer complex queries tractably by enforcing more structural properties

120/158
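As a tiny concrete illustration of the sum above, here is a brute-force WMC computation for a made-up formula Δ = (A ∨ B) ∧ (¬A ∨ C) with made-up literal weights; a compiled d-DNNF/OBDD/SDD would compute the same value without enumerating assignments.

from itertools import product

w = {('A', True): 0.3, ('A', False): 0.7,
     ('B', True): 0.6, ('B', False): 0.4,
     ('C', True): 0.5, ('C', False): 0.5}

def delta(a, b, c):
    # the formula (A or B) and (not A or C)
    return (a or b) and ((not a) or c)

wmc = 0.0
for a, b, c in product([True, False], repeat=3):
    if delta(a, b, c):                        # x |= Δ
        wmc += w[('A', a)] * w[('B', b)] * w[('C', c)]   # product of literal weights
print(wmc)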

Logical circuits: connection to probabilistic circuits through WMC

The resulting compiled WMC circuit is equivalent to a probabilistic circuit ⇒ parameter variables → edge parameters

[Figure: the compiled circuit of the WMC encoding (with indicator variables λ and parameters θ) and the equivalent probabilistic circuit]
121/158

From BN trees to circuits, via compilation

[Figure sequence: compiling a tree-shaped BN over D, C, A, B into a circuit]
Bottom-up compilation: starting from the leaves…
…compile a leaf CPT (e.g. p(A | C = 0), p(A | C = 1))…
…for all leaves…
…and recurse over the parents (e.g. p(C | D = 0))…
…while reusing previously compiled nodes (e.g. p(C | D = 1))…
…finally compile the root CPT p(D).
122/158
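A minimal sketch of what this bottom-up compilation produces for the example BN D → C → {A, B} over binary variables, using nested Python tuples as circuit nodes; the CPT numbers are illustrative placeholders, not the ones from the figure.

def bernoulli(var, p_true):
    # compile a single CPT column p(var | parent = v) as a deterministic sum of indicators
    return ("sum", [(1.0 - p_true, ("indicator", var, 0)), (p_true, ("indicator", var, 1))])

# compile the leaf CPTs p(A | C) and p(B | C), one sub-circuit per value of C
a_given_c = {0: bernoulli("A", 0.3), 1: bernoulli("A", 0.6)}
b_given_c = {0: bernoulli("B", 0.2), 1: bernoulli("B", 0.9)}

# recurse over the parent C: p(C | D = d) multiplies in the *reused* leaf circuits
c_given_d = {}
for d, p_c1 in [(0, 0.2), (1, 0.9)]:
    c_given_d[d] = ("sum", [
        (1.0 - p_c1, ("product", [("indicator", "C", 0), a_given_c[0], b_given_c[0]])),
        (p_c1,       ("product", [("indicator", "C", 1), a_given_c[1], b_given_c[1]])),
    ])

# finally the root p(D); the full circuit computes p(D, C, A, B) and is
# smooth, decomposable and deterministic
root = ("sum", [(0.5, ("product", [("indicator", "D", 0), c_given_d[0]])),
                (0.5, ("product", [("indicator", "D", 1), c_given_d[1]]))])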


Compilation : probabilistic programming

Chavira et al., “Compiling relational Bayesian networks for exact inference”, 2006
Holtzen et al., “Symbolic Exact Inference for Discrete Probabilistic Programs”, 2019
De Raedt et al.; Riguzzi; Fierens et al.; Vlasselaer et al., “ProbLog: A Probabilistic Prolog and Its Application in Link Discovery.”; “A top down interpreter for LPAD and CP-logic”; “Inference and Learning in Probabilistic Logic Programs using Weighted Boolean Formulas”; “Anytime Inference in Probabilistic Logic Programs with Tp-compilation”, 2007; 2007; 2015; 2015
Olteanu et al.; Van den Broeck et al., “Using OBDDs for efficient query evaluation on probabilistic databases”; Query Processing on Probabilistic Data: A Survey, 2008; 2017
Vlasselaer et al., “Exploiting Local and Repeated Structure in Dynamic Bayesian Networks”, 2016
123/158

Low-treewidth PGMs

Trees, polytrees and thin junction trees can be turned into decomposable, smooth, deterministic circuits.

Therefore they support tractable EVI, MAR/CON, and MAP.

124/158


Arithmetic Circuits (ACs)

ACs [Darwiche 2003] are

decomposable

smooth

deterministic

They support tractable

EVI

MAR/CON

MAP

⇒ parameters are attached to the leaves
⇒ …but can be moved to the sum node edges [Rooshenas et al. 2014]

Lowd et al., “Learning Markov Networks With Arithmetic Circuits”, 2013 125/158


Sum-Product Networks (SPNs)

SPNs [Poon et al. 2011] are decomposable and smooth, but not necessarily deterministic.

They support tractable EVI and MAR/CON (MAP only in the deterministic case).

[Figure: an example SPN over X1…X4 with sum weights]

⇒ deterministic SPNs are also called selective [Peharz et al. 2014]

126/158


Cutset Networks (CNets)

CNets [Rahman et al. 2014] are

decomposable

smooth

deterministic

They support tractable

EVI

MAR/CON

MAP

[Figure: an example cutset network: an OR tree over conditioning variables with weighted branches and tree distributions at the leaves]

Rahman et al., “Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees”, 2014
Di Mauro et al., “Learning Accurate Cutset Networks by Exploiting Decomposability”, 2015
127/158


Probabilistic Sentential Decision Diagrams

PSDDs [Kisa et al. 2014b] are

structured decomposable

smooth

deterministic

They support tractable

EVI

MAR/CON

MAP

Complex queries!

Kisa et al., “Probabilistic sentential decision diagrams”, 2014
Choi et al., “Tractable learning for structured probability spaces: A case study in learning preference distributions”, 2015
Shen et al., “Conditional PSDDs: Modeling and learning with modular knowledge”, 2018
128/158


AndOrGraphs

AndOrGraphs [Dechter et al. 2007] are

structured decomposable

smooth

deterministic

They support tractable

EVI

MAR/CON

MAP

Complex queries!

Dechter et al., “AND/OR search spaces for graphical models”, 2007
Marinescu et al., “Best-first AND/OR search for 0/1 integer programming”, 2007
129/158


Smooth ∨ decomposable ∨ deterministic ∨ structured decomposable PCs?

                                                    smooth  dec.  det.  str. dec.
Arithmetic Circuits (ACs) [Darwiche 2003]             ✓      ✓     ✓      ✗
Sum-Product Networks (SPNs) [Poon et al. 2011]        ✓      ✓     ✗      ✗
Cutset Networks (CNets) [Rahman et al. 2014]          ✓      ✓     ✓      ✗
PSDDs [Kisa et al. 2014b]                             ✓      ✓     ✓      ✓
AndOrGraphs [Dechter et al. 2007]                     ✓      ✓     ✓      ✓

130/158


Structured decomposability

A product node is structured decomposable if it decomposes according to a node in a vtree ⇒ a stronger requirement than decomposability

[Figure: a vtree over X1, X2, X3, a structured decomposable circuit that follows it, and a non structured decomposable circuit that does not]
131/158


Probability of logical events


q8: What is the probability of having a traffic jam on my route to work?

q8(m) = p_m( ∨_{i ∈ route} JamStr_i )

⇒ marginals + logical events

© fineartamerica.com

132/158


Smoothness + structured decomp. = tractable PR

Computing p(α): the probability of an arbitrary logical formula

Multilinear in circuit sizes if the logical circuit:

is smooth, structured decomposable, deterministic

shares the same vtree

[Figure: a probabilistic circuit over X1, X2, X3 and a logical circuit for the event (X1 > 0.6 ∧ ¬X2) ∨ (X1 ≤ 0.3), sharing the same vtree]

133/158

If p(x) = Σ_i w_i p_i(x) and α = ∨_j α_j (p smooth; α smooth and deterministic):

p(α) = Σ_i w_i p_i(∨_j α_j) = Σ_i w_i Σ_j p_i(α_j)

⇒ probabilities are “pushed down” to the children

133/158

If p(x, y) = p(x) p(y) and α = β ∧ γ (structured decomposability, with β over x and γ over y):

p(α) = p(β ∧ γ) = p(β) · p(γ)

⇒ probabilities decompose into simpler ones

133/158

To compute p(α):

compute the probability for each pair of probabilistic and logical circuit nodes over the same vtree node

⇒ cache the values!

feedforward evaluation (bottom-up)

133/158
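A tiny numeric illustration of the two rules above, assuming a made-up two-component factorized mixture and a disjoint (deterministic) OR for α; the pushed-down value matches a brute-force enumeration over all complete assignments.

from itertools import product

components = [  # (weight, {var: p(var = 1)})
    (0.3, {"X1": 0.9, "X2": 0.2}),
    (0.7, {"X1": 0.1, "X2": 0.6}),
]

def p_event(marginals, assignment):
    # probability of a partial assignment under a fully factorized component
    out = 1.0
    for var, val in assignment.items():
        out *= marginals[var] if val else 1.0 - marginals[var]
    return out

alpha = [{"X1": 1}, {"X1": 0, "X2": 0}]   # disjoint disjuncts of α

# push down: p(α) = Σ_i w_i Σ_j p_i(α_j)
pushed = sum(w * sum(p_event(m, a) for a in alpha) for w, m in components)

# brute-force check
brute = 0.0
for x1, x2 in product([0, 1], repeat=2):
    if x1 == 1 or (x1 == 0 and x2 == 0):
        brute += sum(w * p_event(m, {"X1": x1, "X2": x2}) for w, m in components)

print(round(pushed, 6), round(brute, 6))   # the two values agree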


ADV inference : preference learning

Preferences and rankings as logical constraints

Structured decomposable circuits for inference over structured spaces

SOTA on modeling densities over rankings

Choi et al., “Tractable learning for structured probability spaces: A case study in learning preference distributions”, 2015
Shen et al., “A Tractable Probabilistic Model for Subset Selection.”, 2017
134/158


Structured decomposability = tractable…
Symmetric and group queries (exactly-k, odd-number, etc.) [Bekker et al. 2015]

For the “right” vtree

Probability of logical circuit event in probabilistic circuit [Choi et al. 2015b]

Multiply two probabilistic circuits [Shen et al. 2016]

KL Divergence between probabilistic circuits [Liang et al. 2017b]

Same-decision probability [Oztok et al. 2016]

Expected same-decision probability [Choi et al. 2017]

Expected classifier agreement [Choi et al. 2018]

Expected predictions [Khosravi et al. 2019c]

135/158


ADV inference : expected predictions

Reasoning about the output of a classifier or regressor f given a distribution p over the input features

⇒ missing values at test time
⇒ exploratory classifier analysis

E_{x^m ∼ p_θ(x^m | x^o)} [ f^k_φ(x^m, x^o) ]

Closed-form moments for f and p as structured decomposable circuits with the same vtree

Khosravi et al., “On Tractable Computation of Expected Predictions”, 2019 136/158


Stay tuned for...

Next:
1. How precise is the characterization of tractable circuits by structural properties? ⇒ necessary conditions
2. How do structural constraints affect the circuit sizes? ⇒ succinctness analysis

After: Conclusions!

137/158


Smoothness + decomposability = tractable MAR

Recall: Smoothness and decomposability are sufficient conditions for partial evidence evaluation of a circuit to compute marginals.

[Figure: bottom-up evaluation of a smooth and decomposable circuit under partial evidence, with leaves over marginalized variables evaluating to 1.0]
138/158
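A minimal sketch of this evaluation scheme on a toy smooth and decomposable circuit (nested tuples with Bernoulli leaves, an illustrative example rather than the circuit from the figure); leaves over marginalized variables simply return 1.

def evaluate(node, evidence):
    kind = node[0]
    if kind == "leaf":
        _, var, p_true = node
        if var not in evidence:          # marginalized variable -> integrates to 1
            return 1.0
        return p_true if evidence[var] else 1.0 - p_true
    if kind == "product":
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, evidence)
        return result
    if kind == "sum":
        return sum(w * evaluate(child, evidence) for w, child in node[1])

circuit = ("sum", [
    (0.3, ("product", [("leaf", "X1", 0.8), ("leaf", "X2", 0.4)])),
    (0.7, ("product", [("leaf", "X1", 0.2), ("leaf", "X2", 0.9)])),
])
print(evaluate(circuit, {"X1": True}))   # p(X1 = 1), with X2 marginalized out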

Smoothness + decomposability = tractable MAR

Smoothness and decomposability are necessary and sufficient conditions for partial evidence evaluation of a circuit to compute marginals.

Non-smooth node ⇒ a variable is unaccounted for ⇒ missing integrals.

Non-decomposable node ⇒ integral does not decompose.

139/158



Determinism + decomposability = tractable MAP

Recall: Determinism and decomposability are sufficient conditions for maximizer circuit evaluation to compute MAP.

[Figure: the circuit evaluated as a maximizer circuit: sum nodes are replaced by max nodes and evaluated bottom-up]
140/158

Decomposability is not necessary! ⇒ A weaker condition, consistency, suffices.

140/158


Consistency

A product node is consistent if any variable shared between its children appears in a single leaf node

⇒ decomposability implies consistency

[Figure: a consistent product node (the shared variable appears in a single leaf) and an inconsistent one (the shared variable is split across leaves X2 ≤ θ and X2 > θ)]
141/158


Determinism + consistency = tractable MAP


If max_{q_shared} p(q, e) = max_{q_shared} p(q_x, e_x) · max_{q_shared} p(q_y, e_y) (consistent):

max_q p(q, e) = max_{q_x, q_y} p(q_x, e_x, q_y, e_y) = max_{q_x} p(q_x, e_x) · max_{q_y} p(q_y, e_y)

⇒ solving the optimizations independently
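A minimal sketch of the corresponding upward pass, reusing the node conventions of the marginal-evaluation sketch earlier: every sum becomes a max, and free variables are maximized over at the leaves. On a deterministic and consistent circuit this returns max_q p(q, e); the maximizing assignment is recovered by a downward pass along the argmax branches (not shown).

def max_evaluate(node, evidence):
    kind = node[0]
    if kind == "leaf":
        _, var, p_true = node
        if var in evidence:
            return p_true if evidence[var] else 1.0 - p_true
        return max(p_true, 1.0 - p_true)           # maximize over the free variable
    if kind == "product":
        value = 1.0
        for child in node[1]:
            value *= max_evaluate(child, evidence)
        return value
    if kind == "sum":                              # sum node acts as a max node
        return max(w * max_evaluate(child, evidence) for w, child in node[1])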


142/158

Determinism + consistency = tractable MAP

Determinism and consistency are necessary and sufficient conditions for maximizer circuit evaluation to compute MAP.

Non-deterministic node ⇒ cannot maximize correctly without summations.

Inconsistent node ⇒ MAP assignments of children conflict with each other.

143/158



Expressive efficiency of circuits

Tractability is defined w.r.t. the size of the model.

How do structural constraints affect the expressive efficiency (succinctness) of probabilistic circuits?

⇒ Again, connections to logical circuits

144/158


Expressive efficiency of circuits

A family of probabilistic circuits M1 is at least as succinct as M2 iff for every m2 ∈ M2 there exists m1 ∈ M1 that represents the same distribution and |m1| ≤ poly(|m2|).
⇒ denoted M1 ≤ M2
⇒ M1 is strictly more succinct iff M1 ≤ M2 and M1 ≱ M2

145/158

Expressive efficiency of circuits

smooth & decomposable ⇒ tractable MAR        deterministic & consistent ⇒ tractable MAP

Are smooth & decomposable circuits as succinct as deterministic & consistent ones, or vice versa?

Smooth & decomposable circuits are strictly more succinct than deterministic & decomposable ones [Darwiche et al. 2002b].
Smooth & consistent circuits are equally succinct as smooth & decomposable ones [Peharz et al. 2015]*.

Consider the following circuit over Boolean variables: ∏_{i=1}^{r} (Y_i · Z_{i1} + (¬Y_i) · Z_{i2}), with Z_{ij} ∈ X

Size linear in the number of variables
Deterministic and consistent
Its marginal (with no evidence) is the solution to the #P-hard SAT′ problem [Valiant 1979] ⇒ no tractable circuit for marginals!

Now consider the marginal distribution p(X) of a naive Bayes distribution p(X, C):

Linear-size smooth and decomposable circuit
MAP of p(X) solves marginal MAP of p(X, C), which is NP-hard [de Campos 2011] ⇒ no tractable circuit for MAP!

Neither smooth & decomposable nor deterministic & consistent circuits are more succinct than the other!
⇒ Choose the tractable circuit family based on your query
More theoretical questions remain ⇒ “Complete the map”

146/158


Conclusions


Why tractable inference? or expressiveness vs tractability

Probabilistic circuits: a unified framework for tractable probabilistic modeling

Learning circuits: learning their structure and parameters from data

Representations and theory: tracing the boundaries of tractability and connections to other formalisms

148/158


[Figure: for each model family (GANs, VAEs, flows, trees, mixtures, fully factorized), which query classes (EVI, MAR, CON, MAP, MMAP, ADV) it supports tractably]

takeaway #1: tractability is a spectrum
149/158


[Figure: model families (GANs, VAEs, MNs, NADEs, NFs, BNs, SPNs, PSDDs, CNets, ACs, AoGs, mixtures, trees, polytrees, TJTs, LTMs, NB, fully factorized) arranged by expressive efficiency against how many queries they answer tractably]

takeaway #2: you can be both tractable and expressive
150/158



takeaway #3: probabilistic circuits are a foundation for tractable inference and learning

151/158


Challenge #1: hybridizing tractable and intractable models

Hybridize probabilistic inference: tractable models inside intractable loops, and intractable small boxes glued by tractable inference!

152/158


Challenge #2: scaling tractable learning

Learn tractable models on millions of datapoints and thousands of features in tractable time!

153/158


Challenge #3: deep theoretical understanding

Trace a precise picture of the whole tractability spectrum and complete the map of succinctness!

154/158


Challenge #4: advanced and automated reasoning

Move beyond single probabilistic queries towards fully automated reasoning!

155/158


Readings

Probabilistic circuits: Representation and Learning
starai.cs.ucla.edu/papers/LecNoAAAI20.pdf

Foundations of Sum-Product Networks for probabilistic modeling
tinyurl.com/w65po5d

Slides for this tutorial
starai.cs.ucla.edu/slides/AAAI20.pdf

156/158


Code

Juice.jl: advanced logical + probabilistic inference with circuits in Julia
github.com/Juice-jl/ProbabilisticCircuits.jl

SumProductNetworks.jl: SPN routines in Julia
github.com/trappmartin/SumProductNetworks.jl

SPFlow: easy and extensible python library for SPNs
github.com/SPFlow/SPFlow

Libra: several structure learning algorithms in OCaml
libra.cs.uoregon.edu

More refs ⇒ github.com/arranger1044/awesome-spn
157/158


Acknowledgments

We thank Nicola Di Mauro for his help with a previous version of this tutorial we gave at UAI19:

http://web.cs.ucla.edu/~guyvdb/slides/TPMTutorialUAI19.pdf

In the same way, we are grateful to all the people from the logical and probabilistic circuits communities whose insightful discussions, ideas and research helped and inspired us.

(12 pages of references incoming!)

158/158

References I
⊕ Chow, C and C Liu (1968). “Approximating discrete probability distributions with dependence trees”. In: IEEE Transactions on Information Theory 14.3, pp. 462–467.

⊕ Valiant, Leslie G (1979). “The complexity of enumeration and reliability problems”. In: SIAM Journal on Computing 8.3, pp. 410–421.

⊕ Bryant, R (1986). “Graph-based algorithms for boolean manipulation”. In: IEEE Transactions on Computers, pp. 677–691.

⊕ Cooper, Gregory F (1990). “The computational complexity of probabilistic inference using Bayesian belief networks”. In: Artificial intelligence 42.2-3, pp. 393–405.

⊕ Dagum, Paul and Michael Luby (1993). “Approximating probabilistic inference in Bayesian belief networks is NP-hard”. In: Artificial intelligence 60.1, pp. 141–153.

⊕ Zhang, Nevin Lianwen and David Poole (1994). “A simple approach to Bayesian network computations”. In: Proceedings of the Biennial Conference-Canadian Society for Computational Studies of Intelligence, pp. 171–178.

⊕ Roth, Dan (1996). “On the hardness of approximate reasoning”. In: Artificial Intelligence 82.1–2, pp. 273–302.

⊕ Dechter, Rina (1998). “Bucket elimination: A unifying framework for probabilistic inference”. In: Learning in graphical models. Springer, pp. 75–104.

⊕ Dasgupta, Sanjoy (1999). “Learning polytrees”. In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp. 134–141.

⊕ Meilă, Marina and Michael I. Jordan (2000). “Learning with mixtures of trees”. In: Journal of Machine Learning Research 1, pp. 1–48.

⊕ Bach, Francis R. and Michael I. Jordan (2001). “Thin Junction Trees”. In: Advances in Neural Information Processing Systems 14. MIT Press, pp. 569–576.

⊕ Darwiche, Adnan (2001). “Recursive conditioning”. In: Artificial Intelligence 126.1-2, pp. 5–41.

References II
⊕ Yedidia, Jonathan S, William T Freeman, and Yair Weiss (2001). “Generalized belief propagation”. In: Advances in neural information processing systems, pp. 689–695.

⊕ Chickering, Max (2002). “The WinMine Toolkit”. In: Microsoft, Redmond.

⊕ Darwiche, Adnan and Pierre Marquis (2002a). “A knowledge compilation map”. In: Journal of Artificial Intelligence Research 17, pp. 229–264.

⊕ — (2002b). “A knowledge compilation map”. In: Journal of Artificial Intelligence Research 17.1, pp. 229–264.

⊕ Dechter, Rina, Kalev Kask, and Robert Mateescu (2002). “Iterative join-graph propagation”. In: Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp. 128–136.

⊕ Darwiche, Adnan (2003). “A Differential Approach to Inference in Bayesian Networks”. In: J.ACM.

⊕ Sang, Tian, Paul Beame, and Henry A Kautz (2005). “Performing Bayesian inference by weighted model counting”. In: AAAI. Vol. 5, pp. 475–481.

⊕ Chavira, Mark, Adnan Darwiche, and Manfred Jaeger (2006). “Compiling relational Bayesian networks for exact inference”. In: International Journal of Approximate Reasoning 42.1-2,pp. 4–20.

⊕ Park, James D and Adnan Darwiche (2006). “Complexity results and approximation strategies for MAP explanations”. In: Journal of Artificial Intelligence Research 21, pp. 101–133.

⊕ De Raedt, Luc, Angelika Kimmig, and Hannu Toivonen (2007). “ProbLog: A Probabilistic Prolog and Its Application in Link Discovery.”. In: IJCAI. Vol. 7. Hyderabad, pp. 2462–2467.

⊕ Dechter, Rina and Robert Mateescu (2007). “AND/OR search spaces for graphical models”. In: Artificial intelligence 171.2-3, pp. 73–106.

⊕ Kulesza, A. and F. Pereira (2007). “Structured Learning with Approximate Inference”. In: Advances in Neural Information Processing Systems 20. MIT Press, pp. 785–792.

References III
⊕ Marinescu, Radu and Rina Dechter (2007). “Best-first AND/OR search for 0/1 integer programming”. In: International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming. Springer, pp. 171–185.

⊕ Riguzzi, Fabrizio (2007). “A top down interpreter for LPAD and CP-logic”. In: Congress of the Italian Association for Artificial Intelligence. Springer, pp. 109–120.

⊕ Lowd, Daniel and Pedro Domingos (2008). “Learning Arithmetic Circuits”. In: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence. UAI’08. Helsinki, Finland: AUAI Press, pp. 383–392. ISBN: 0-9749039-4-9. URL: http://dl.acm.org/citation.cfm?id=3023476.3023522.

⊕ Olteanu, Dan and Jiewen Huang (2008). “Using OBDDs for efficient query evaluation on probabilistic databases”. In: International Conference on Scalable Uncertainty Management.Springer, pp. 326–340.

⊕ Koller, Daphne and Nir Friedman (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

⊕ Choi, Arthur and Adnan Darwiche (2010). “Relax, compensate and then recover”. In: JSAI International Symposium on Artificial Intelligence. Springer, pp. 167–180.

⊕ Lowd, Daniel and Pedro Domingos (2010). “Approximate inference by compilation to arithmetic circuits”. In: Advances in Neural Information Processing Systems, pp. 1477–1485.

⊕ Campos, Cassio Polpo de (2011). “New complexity results for MAP in Bayesian networks”. In: IJCAI. Vol. 11, pp. 2100–2106.

⊕ Darwiche, Adnan (2011). “SDD: A New Canonical Representation of Propositional Knowledge Bases”. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Volume Two. IJCAI’11. Barcelona, Catalonia, Spain. ISBN: 978-1-57735-514-4.

⊕ de Campos, Cassio P (2011). “New complexity results for MAP in Bayesian networks”. In: IJCAI. Vol. 11, pp. 2100–2106.

⊕ Larochelle, Hugo and Iain Murray (2011). “The Neural Autoregressive Distribution Estimator”. In: International Conference on Artificial Intelligence and Statistics, pp. 29–37.

References IV
⊕ Poon, Hoifung and Pedro Domingos (2011). “Sum-Product Networks: a New Deep Architecture”. In: UAI 2011.

⊕ Sontag, David, Amir Globerson, and Tommi Jaakkola (2011). “Introduction to dual decomposition for inference”. In: Optimization for Machine Learning 1, pp. 219–254.

⊕ Gens, Robert and Pedro Domingos (2012). “Discriminative Learning of Sum-Product Networks”. In: Advances in Neural Information Processing Systems 25, pp. 3239–3247.

⊕ — (2013). “Learning the Structure of Sum-Product Networks”. In: Proceedings of the ICML 2013, pp. 873–880.

⊕ Lowd, Daniel and Amirmohammad Rooshenas (2013). “Learning Markov Networks With Arithmetic Circuits”. In: Proceedings of the 16th International Conference on Artificial Intelligence and Statistics. Vol. 31. JMLR Workshop Proceedings, pp. 406–414.

⊕ Peharz, Robert, Bernhard Geiger, and Franz Pernkopf (2013). “Greedy Part-Wise Learning of Sum-Product Networks”. In: ECML-PKDD 2013.

⊕ Goodfellow, Ian et al. (2014). “Generative adversarial nets”. In: Advances in neural information processing systems, pp. 2672–2680.

⊕ Kingma, Diederik P and Max Welling (2014). “Auto-Encoding Variational Bayes”. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR). 2014.

⊕ Kisa, Doga et al. (2014a). “Probabilistic sentential decision diagrams”. In: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR). Vienna, Austria. URL: http://starai.cs.ucla.edu/papers/KisaKR14.pdf.

⊕ — (2014b). “Probabilistic sentential decision diagrams”. In: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR). Vienna, Austria.

⊕ Martens, James and Venkatesh Medabalimi (2014). “On the Expressive Efficiency of Sum Product Networks”. In: CoRR abs/1411.7717.

References V

⊕ Peharz, Robert, Robert Gens, and Pedro Domingos (2014). “Learning Selective Sum-Product Networks”. In: Workshop on Learning Tractable Probabilistic Models. LTPM.

⊕ Rahman, Tahrima, Prasanna Kothalkar, and Vibhav Gogate (2014). “Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees”. In: Machine Learning and Knowledge Discovery in Databases. Vol. 8725. LNCS. Springer, pp. 630–645.

⊕ Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra (2014). “Stochastic backpropagation and approximate inference in deep generative models”. In: arXiv preprint arXiv:1401.4082.

⊕ Rooshenas, Amirmohammad and Daniel Lowd (2014). “Learning Sum-Product Networks with Direct and Indirect Variable Interactions”. In: Proceedings of ICML 2014.

⊕ Adel, Tameem, David Balduzzi, and Ali Ghodsi (2015). “Learning the Structure of Sum-Product Networks via an SVD-based Algorithm”. In: Uncertainty in Artificial Intelligence.

⊕ Bekker, Jessa et al. (2015). “Tractable Learning for Complex Probability Queries”. In: Advances in Neural Information Processing Systems 28 (NIPS).

⊕ Burda, Yuri, Roger Grosse, and Ruslan Salakhutdinov (2015). “Importance weighted autoencoders”. In: arXiv preprint arXiv:1509.00519.

⊕ Choi, Arthur, Guy Van den Broeck, and Adnan Darwiche (2015a). “Tractable learning for structured probability spaces: A case study in learning preference distributions”. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI).

⊕ Choi, Arthur, Guy Van Den Broeck, and Adnan Darwiche (2015b). “Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions”. In: Proceedings of the 24th International Conference on Artificial Intelligence. IJCAI’15. Buenos Aires, Argentina: AAAI Press, pp. 2861–2868. ISBN: 978-1-57735-738-4. URL: http://dl.acm.org/citation.cfm?id=2832581.2832649.

⊕ Dennis, Aaron and Dan Ventura (2015). “Greedy Structure Search for Sum-product Networks”. In: IJCAI’15. Buenos Aires, Argentina: AAAI Press, pp. 932–938. ISBN: 978-1-57735-738-4.

References VI

⊕ Di Mauro, Nicola, Antonio Vergari, and Floriana Esposito (2015a). “Learning Accurate Cutset Networks by Exploiting Decomposability”. In: Proceedings of AIXIA. Springer, pp. 221–232.

⊕ Di Mauro, Nicola, Antonio Vergari, and Teresa M.A. Basile (2015b). “Learning Bayesian Random Cutset Forests”. In: Proceedings of ISMIS. Springer, pp. 122–132.

⊕ Fierens, Daan et al. (2015). “Inference and Learning in Probabilistic Logic Programs using Weighted Boolean Formulas”. In: Theory and Practice of Logic Programming 15 (03), pp. 358–401. ISSN: 1475-3081. DOI: 10.1017/S1471068414000076. URL: http://starai.cs.ucla.edu/papers/FierensTPLP15.pdf.

⊕ Germain, Mathieu et al. (2015). “MADE: Masked Autoencoder for Distribution Estimation”. In: CoRR abs/1502.03509.

⊕ Peharz, Robert (2015). “Foundations of Sum-Product Networks for Probabilistic Modeling”. PhD thesis. Graz University of Technology, SPSC.

⊕ Peharz, Robert et al. (2015). “On Theoretical Properties of Sum-Product Networks”. In: The Journal of Machine Learning Research.

⊕ Vergari, Antonio, Nicola Di Mauro, and Floriana Esposito (2015). “Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning”. In: ECML-PKDD 2015.

⊕ Vlasselaer, Jonas et al. (2015). “Anytime Inference in Probabilistic Logic Programs with Tp-compilation”. In: Proceedings of 24th International Joint Conference on Artificial Intelligence (IJCAI). URL: http://starai.cs.ucla.edu/papers/VlasselaerIJCAI15.pdf.

⊕ Belle, Vaishak and Luc De Raedt (2016). “Semiring Programming: A Framework for Search, Inference and Learning”. In: arXiv preprint arXiv:1609.06954.

⊕ Cohen, Nadav, Or Sharir, and Amnon Shashua (2016). “On the expressive power of deep learning: A tensor analysis”. In: Conference on Learning Theory, pp. 698–728.

⊕ Friesen, Abram L and Pedro Domingos (2016). “Submodular Sum-product Networks for Scene Understanding”. In:

References VII

⊕ Jaini, Priyank et al. (2016). “Online Algorithms for Sum-Product Networks with Continuous Variables”. In: Probabilistic Graphical Models - Eighth International Conference, PGM 2016, Lugano, Switzerland, September 6-9, 2016. Proceedings, pp. 228–239. URL: http://jmlr.org/proceedings/papers/v52/jaini16.html.

⊕ Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu (2016). “Pixel recurrent neural networks”. In: arXiv preprint arXiv:1601.06759.

⊕ Oztok, Umut, Arthur Choi, and Adnan Darwiche (2016). “Solving PP^PP-complete problems using knowledge compilation”. In: Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning.

⊕ Peharz, Robert et al. (2016). “On the Latent Variable Interpretation in Sum-Product Networks”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence PP, Issue 99. URL:http://arxiv.org/abs/1601.06180.

⊕ Pronobis, A. and R. P. N. Rao (2016). “Learning Deep Generative Spatial Models for Mobile Robots”. In: ArXiv e-prints. arXiv: 1610.02627 [cs.RO].

⊕ Rahman, Tahrima and Vibhav Gogate (2016). “Learning Ensembles of Cutset Networks”. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI’16. Phoenix, Arizona: AAAI Press, pp. 3301–3307. URL: http://dl.acm.org/citation.cfm?id=3016100.3016365.

⊕ Rashwan, Abdullah, Han Zhao, and Pascal Poupart (2016). “Online and Distributed Bayesian Moment Matching for Parameter Learning in Sum-Product Networks”. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 1469–1477.

⊕ Rooshenas, Amirmohammad and Daniel Lowd (2016). “Discriminative Structure Learning of Arithmetic Circuits”. In: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 1506–1514.

⊕ Sguerra, Bruno Massoni and Fabio G Cozman (2016). “Image classification using sum-product networks for autonomous flight of micro aerial vehicles”. In: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, pp. 139–144.

References VIII

⊕ Sharir, Or et al. (2016). “Tractable generative convolutional arithmetic circuits”. In: arXiv preprint arXiv:1610.04167.

⊕ Shen, Yujia, Arthur Choi, and Adnan Darwiche (2016). “Tractable Operations for Arithmetic Circuits of Probabilistic Models”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp. 3936–3944.

⊕ Vlasselaer, Jonas et al. (2016). “Exploiting Local and Repeated Structure in Dynamic Bayesian Networks”. In: Artificial Intelligence 232, pp. 43–53. ISSN: 0004-3702. DOI: 10.1016/j.artint.2015.12.001.

⊕ Yuan, Zehuan et al. (2016). “Modeling spatial layout for scene image understanding via a novel multiscale sum-product network”. In: Expert Systems with Applications 63, pp. 231–240.

⊕ Zhao, Han, Pascal Poupart, and Geoffrey J Gordon (2016a). “A Unified Approach for Learning the Parameters of Sum-Product Networks”. In: Advances in Neural Information Processing Systems 29. Ed. by D. D. Lee et al. Curran Associates, Inc., pp. 433–441.

⊕ Zhao, Han et al. (2016b). “Collapsed Variational Inference for Sum-Product Networks”. In: Proceedings of the 33rd International Conference on Machine Learning. Vol. 48.

⊕ Alemi, Alexander A et al. (2017). “Fixing a broken ELBO”. In: arXiv preprint arXiv:1711.00464.

⊕ Choi, YooJung, Adnan Darwiche, and Guy Van den Broeck (2017). “Optimal feature selection for decision robustness in Bayesian networks”. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI).

⊕ Conaty, Diarmaid, Denis Deratani Mauá, and Cassio Polpo de Campos (2017). “Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks”. In: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence. Ed. by Gal Elidan and Kristian Kersting. AUAI Press, pp. 322–331.

⊕ Di Mauro, Nicola et al. (2017). “Fast and Accurate Density Estimation with Extremely Randomized Cutset Networks”. In: ECML-PKDD 2017.

References IX

⊕ Kimmig, Angelika, Guy Van den Broeck, and Luc De Raedt (2017). “Algebraic model counting”. In: Journal of Applied Logic 22, pp. 46–62.

⊕ Liang, Yitao, Jessa Bekker, and Guy Van den Broeck (2017a). “Learning the structure of probabilistic sentential decision diagrams”. In: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI).

⊕ Liang, Yitao and Guy Van den Broeck (2017b). “Towards Compact Interpretable Models: Shrinking of Learned Probabilistic Sentential Decision Diagrams”. In: IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI). URL: http://starai.cs.ucla.edu/papers/LiangXAI17.pdf.

⊕ Pronobis, Andrzej, Francesco Riccio, and Rajesh PN Rao (2017). “Deep spatial affordance hierarchy: Spatial knowledge representation for planning in large-scale environments”. In: ICAPS 2017 Workshop on Planning and Robotics, Pittsburgh, PA, USA.

⊕ Rathke, Fabian, Mattia Desana, and Christoph Schnörr (2017). “Locally adaptive probabilistic models for global segmentation of pathological OCT scans”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 177–184.

⊕ Salimans, Tim et al. (2017). “PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications”. In: arXiv preprint arXiv:1701.05517.

⊕ Shen, Yujia, Arthur Choi, and Adnan Darwiche (2017). “A Tractable Probabilistic Model for Subset Selection”. In: UAI.

⊕ Van den Broeck, Guy and Dan Suciu (2017). Query Processing on Probabilistic Data: A Survey. Foundations and Trends in Databases. Now Publishers. DOI: 10.1561/1900000052. URL: http://starai.cs.ucla.edu/papers/VdBFTDB17.pdf.

⊕ Butz, Cory J et al. (2018). “An Empirical Study of Methods for SPN Learning and Inference”. In: International Conference on Probabilistic Graphical Models, pp. 49–60.

⊕ Choi, YooJung and Guy Van den Broeck (2018). “On robust trimming of Bayesian network classifiers”. In: arXiv preprint arXiv:1805.11243.

References X

⊕ Di Mauro, Nicola et al. (2018). “Sum-Product Network structure learning by efficient product nodes discovery”. In: Intelligenza Artificiale 12.2, pp. 143–159.

⊕ Friedman, Tal and Guy Van den Broeck (2018). “Approximate Knowledge Compilation by Online Collapsed Importance Sampling”. In: Advances in Neural Information Processing Systems 31 (NeurIPS). URL: http://starai.cs.ucla.edu/papers/FriedmanNeurIPS18.pdf.

⊕ Jaini, Priyank, Amur Ghose, and Pascal Poupart (2018). “Prometheus: Directly Learning Acyclic Directed Graph Structures for Sum-Product Networks”. In: International Conference on Probabilistic Graphical Models, pp. 181–192.

⊕ Molina, Alejandro et al. (2018). “Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains”. In: AAAI.

⊕ Rashwan, Abdullah, Pascal Poupart, and Chen Zhitang (2018). “Discriminative Training of Sum-Product Networks by Extended Baum-Welch”. In: International Conference on Probabilistic Graphical Models, pp. 356–367.

⊕ Shen, Yujia, Arthur Choi, and Adnan Darwiche (2018). “Conditional PSDDs: Modeling and learning with modular knowledge”. In: Thirty-Second AAAI Conference on Artificial Intelligence.

⊕ Zheng, Kaiyu, Andrzej Pronobis, and Rajesh PN Rao (2018). “Learning graph-structured sum-product networks for probabilistic semantic maps”. In: Thirty-Second AAAI Conference on Artificial Intelligence.

⊕ Dai, Bin and David Wipf (2019). “Diagnosing and enhancing VAE models”. In: arXiv preprint arXiv:1903.05789.

⊕ Ghosh, Partha et al. (2019). “From variational to deterministic autoencoders”. In: arXiv preprint arXiv:1903.12436.

⊕ Holtzen, Steven, Todd Millstein, and Guy Van den Broeck (2019). “Symbolic Exact Inference for Discrete Probabilistic Programs”. In: arXiv preprint arXiv:1904.02079.

⊕ Khosravi, Pasha et al. (2019a). “On Tractable Computation of Expected Predictions”. In: Advances in Neural Information Processing Systems, pp. 11167–11178.

References XI

⊕ Khosravi, Pasha et al. (2019b). “What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features”. In: arXiv preprint arXiv:1903.01620.

⊕ Khosravi, Pasha et al. (2019c). “What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features”. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI).

⊕ Kossen, Jannik et al. (2019). “Structured Object-Aware Physics Prediction for Video Modeling and Planning”. In: arXiv preprint arXiv:1910.02425.

⊕ Liang, Yitao and Guy Van den Broeck (2019). “Learning Logistic Circuits”. In: Proceedings of the 33rd Conference on Artificial Intelligence (AAAI).

⊕ Peharz, Robert et al. (2019a). “Random Sum-Product Networks: A Simple and Effective Approach to Probabilistic Deep Learning”. In: Uncertainty in Artificial Intelligence.

⊕ Peharz, Robert et al. (2019b). “Random sum-product networks: A simple but effective approach to probabilistic deep learning”. In: Proceedings of UAI.

⊕ Shao, Xiaoting et al. (2019). “Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures”. In: arXiv preprint arXiv:1905.08550.

⊕ Shih, Andy et al. (2019). “Smoothing Structured Decomposable Circuits”. In: arXiv preprint arXiv:1906.00311.

⊕ Stelzner, Karl, Robert Peharz, and Kristian Kersting (2019). “Faster Attend-Infer-Repeat with Tractable Probabilistic Models”. In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR, pp. 5966–5975. URL: http://proceedings.mlr.press/v97/stelzner19a.html.

⊕ Tan, Ping Liang and Robert Peharz (2019). “Hierarchical Decompositional Mixtures of Variational Autoencoders”. In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR, pp. 6115–6124. URL: http://proceedings.mlr.press/v97/tan19b.html.

References XII

⊕ Trapp, Martin et al. (2019). “Bayesian Learning of Sum-Product Networks”. In: Advances in neural information processing systems (NeurIPS).

⊕ Vergari, Antonio et al. (2019). “Automatic Bayesian density analysis”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33, pp. 5207–5215.