Lorena Álvarez Pérez, Machine Learning Group (MLG), March 5, 2018. Structured Probabilistic Models for Deep Learning. Lecture slides for Chapter 16 of Deep Learning (www.deeplearningbook.org) by Ian Goodfellow.


Page 1

Lorena Álvarez Pérez Machine Learning Group (MLG)

March 5, 2018

Structured Probabilistic

Models for Deep Learning

Lecture slides for Chapter 16 of Deep Learning (www.deeplearningbook.org)

Ian Goodfellow

Deep Learning Book, Chapter 16: Graphical Models. March 5, 2018

Page 2

MLG: Semi-supervised Learning. November 15, 2017

Index  

0. Overview
1. The Challenge of Unstructured Modeling
2. Using Graphs to Describe Model Structure
3. Sampling from Graphical Models
4. Advantages of Structured Modeling
5. Learning about Dependencies
6. Inference and Approximate Inference
7. The Deep Learning Approach to Structured Probabilistic Models

Page 3

Role of structured probabilistic models in deep learning

• For many tasks (beyond classification), a full representation of the probability distribution over the variables is needed
  - e.g., denoising, missing-value imputation, sampling, etc.
• Structured models (also known as probabilistic graphical models, PGMs) provide compact representations
  - Compared with full (unstructured) probability distributions


0. Overview

Page 4

What are structured probabilistic models?

• A way of describing a probability distribution, using a graph to record which variables interact with each other directly
• "Graph" is used here in the sense of graph theory: vertices connected to one another by edges
• Because the structure is described by a graph, these models are called graphical models

In deep learning, different model structures, learning algorithms and inference procedures are used!


Page 5

• The goal of deep learning is to scale machine learning to the kinds of challenges needed to solve artificial intelligence
  - e.g., understanding natural images, audio waveforms representing speech, etc.
• The classification task of machine learning is a limited goal
  - It takes input from a rich high-dimensional distribution and summarizes it with a categorical label
  - It discards most of the input
  - It produces a single output, or a probability distribution over values of that single output

It is possible to ask probabilistic models to do many other tasks!


1. The Challenge of Unstructured Modeling

Page 6

Probabilistic models for other tasks

• These tasks are more expensive than classification
• They require producing many output values
• They require a complete understanding of the structure of the entire input, without ignoring sections of it
• Some such tasks are:
  1) Density estimation
  2) Denoising
  3) Missing value imputation
  4) Sampling


Page 7

Example: Probabilistic modeling of natural images

The task: generate new samples from a distribution p(x)


Page 8

Intractability of rich distributions

• This is a challenging task, both computationally and statistically
• Consider a 32x32x3 binary image
  - There are 2^3072 possible images
• If we have n discrete variables with k possible values each, the naive approach of representing p(x) requires storing a table with k^n values!

This is not feasible!
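The blow-up above is easy to verify directly; a minimal sketch in plain Python (no external libraries, the helper name is mine):

```python
# Size of a naive joint probability table over n discrete variables
# with k values each: one entry per joint configuration, i.e. k**n entries.
def table_size(n, k):
    return k ** n

# A 32x32x3 binary image has 3072 binary variables,
# so the table would need 2**3072 entries: a number with 925 decimal digits.
n_pixels = 32 * 32 * 3
digits = len(str(table_size(n_pixels, 2)))
```

Even at one bit per entry, no conceivable storage holds 2^3072 values, which is the point of the slide.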


Page 9

Intractability of rich distributions

• Memory
• Statistical efficiency
• Runtime: cost of inference
  - If we have p(x) and need to infer p(x1) or p(x2 | x1), this requires summing across the entire table
• Runtime: cost of sampling

The table-based approach models every possible interaction between variables, but usually variables influence each other only indirectly


Page 10

Direct and indirect interaction

• Consider modeling finishing times in a relay race. The team has three runners: Alice, Bob and Carol
• Alice hands the baton to Bob, and Bob hands it to Carol, who finishes the lap
  - Alice's finishing time does not depend on anyone else's; Bob's finishing time depends on Alice's, and Carol's depends on Bob's
  - Carol's finishing time depends only indirectly on Alice's
  - If we already know Bob's finishing time, we cannot better estimate Carol's finishing time by finding out what Alice's finishing time was

2. Using Graphs to Describe Model Structure

2.1 Directed Models

Page 11

Using graphs to describe model structure

• Each node represents a random variable
• Each edge represents a direct interaction
  - These direct interactions imply other, indirect interactions
  - But only the direct interactions need to be represented explicitly
• Graphical models can be broadly divided into two categories:
  1) Models based on directed acyclic graphs
  2) Models based on undirected graphs


Page 12

Directed graphical models

• Also called belief networks or Bayesian networks
• In the relay race example, Bob's finishing time t1 depends on Alice's finishing time t0, and Carol's finishing time t2 depends on Bob's finishing time t1

What does the arrow represent?

[Figure: directed chain t0 → t1 → t2, for Alice, Bob and Carol]

Page 13

Meaning of directed edges

Drawing an arrow from a to b means we define a conditional probability distribution (CPD) over b, with a as one of the variables on the right side of the conditioning bar
- i.e., the distribution over b depends on the value of a

[Figure: a → b, the edge defining p(b | a)]

Page 14

Formal directed graphical model

• A directed acyclic graph G defined on variables x needs:
  - A set of vertices, which represent the random variables in the model
  - A set of local CPDs, p(xi | PaG(xi)), where PaG(xi) denotes the parents of xi in G
• The probability distribution over x is given by

  p(x) = ∏_i p(xi | PaG(xi))

• In the relay race example:

  p(t0, t1, t2) = p(t0) p(t1 | t0) p(t2 | t1)
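The relay-race factorization can be checked numerically. A minimal sketch; the CPD tables below are made up for illustration, with each finishing time discretized into 3 bins:

```python
import itertools

p_t0 = [0.2, 0.5, 0.3]                 # p(t0)
p_t1_given_t0 = [[0.7, 0.2, 0.1],      # p(t1 | t0), rows indexed by t0
                 [0.1, 0.6, 0.3],
                 [0.1, 0.2, 0.7]]
p_t2_given_t1 = [[0.6, 0.3, 0.1],      # p(t2 | t1), rows indexed by t1
                 [0.2, 0.5, 0.3],
                 [0.1, 0.3, 0.6]]

def joint(t0, t1, t2):
    # p(t0, t1, t2) = p(t0) p(t1|t0) p(t2|t1)
    return p_t0[t0] * p_t1_given_t0[t0][t1] * p_t2_given_t1[t1][t2]

# The product of local CPDs is a valid joint distribution: it sums to 1.
total = sum(joint(*xs) for xs in itertools.product(range(3), repeat=3))
```

Because each CPD row sums to 1, the product automatically normalizes; no global partition function is needed, which is the key property of directed models.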

Page 15

Savings achieved by the directed model

• As an example: if t0, t1 and t2 are discrete with 100 values each, a single joint table would require 999,999 values
  - By making tables only for the conditional probabilities, we need only 19,899 values (a reduction by a factor of more than 50)
• The cost of a single table for modeling n discrete variables, each having k values, is O(k^n)
• If m is the maximum number of variables appearing (on either side of the conditioning bar) in a single CPD, the cost of the tables for the directed model is O(k^m)
  - As long as m << n, very dramatic savings are achieved!
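The counts quoted above can be reproduced in a few lines (following the slide's convention that one entry of each table is fixed by normalization):

```python
k = 100

# Full joint table over (t0, t1, t2): k**3 entries, one fixed by normalization.
full = k ** 3 - 1

# Directed factorization p(t0) p(t1|t0) p(t2|t1):
# p(t0) needs k-1 free values; each conditional table needs k*(k-1).
factored = (k - 1) + k * (k - 1) + k * (k - 1)

ratio = full / factored   # reduction factor
```

With k = 100 this gives 999,999 versus 19,899 values, a reduction by a factor of about 50.3.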

Page 16

Undirected models

• Also known as Markov random fields or Markov networks
• They use graphs whose edges are undirected and have no CPDs
  - Directed models work best when influence clearly flows in one direction
  - Undirected models work best when influence has no clear direction, or is best modeled as flowing in both directions

Let's see an example!

2.2 Undirected Models

Page 17

Example: The health undirected model

• Consider a model over three binary variables:
  - Whether or not you are sick (hy)
  - Whether or not your coworker is sick (hc)
  - Whether or not your roommate is sick (hr)
• Assume your coworker and roommate do not know each other, so it is very unlikely that one of them will give a cold to the other (we do not model it)
• There is no clear directionality either, so we use an undirected model

Page 18

Example: The health undirected graph

• You and your roommate may infect each other with a cold
• You and your coworker may do the same
• Assume your roommate and coworker do not know each other

[Figure: undirected chain hc - hy - hr, with node labels "Does your coworker have a cold?", "Do you have a cold?", "Does your roommate have a cold?"]

Page 19

Formal undirected graphical model

• An undirected probabilistic graphical model is defined on a graph G
  - For each clique C in the graph (a subset of nodes all connected to each other), a factor φ(C) (also called a clique potential) measures the affinity of the variables for being in each of their joint states
  - Together, the factors define an unnormalized probability distribution:

  p̃(x) = ∏_{C∈G} φ(C)

Page 20

Example: The graph below (with five cliques) implies that

  p(a, b, c, d, e, f) = (1/Z) φa,b(a, b) φb,c(b, c) φa,d(a, d) φb,e(b, e) φe,f(e, f)

[Figure: undirected graph with nodes a, b, c, d, e, f and edges a-b, b-c, a-d, b-e, e-f]

Page 21

2.3 The Partition Function

• The unnormalized probability distribution p̃(x):
  - is guaranteed to be non-negative
  - is not guaranteed to sum or integrate to 1
• To obtain a valid probability distribution, we must normalize the distribution (a Gibbs distribution):

  p(x) = (1/Z) p̃(x),  where p̃(x) = ∏_{C∈G} φ(C)

  Z = ∫ p̃(x) dx  (a sum, for discrete x)

  Z is known as the partition function

• Obviously:
  - Z is a constant when the φ functions are constants
  - If the φ functions have parameters, then Z is a function of those parameters
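For a small discrete model, Z can be computed by brute-force enumeration. A sketch with one arbitrary clique potential over two binary variables (the numbers are made up):

```python
import itertools

# Unnormalized p~(x) for a tiny model a - b with a single clique {a, b}.
phi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def p_tilde(a, b):
    return phi[(a, b)]

# Z is the sum of p~ over all joint states.
Z = sum(p_tilde(a, b) for a, b in itertools.product((0, 1), repeat=2))

def p(a, b):
    # Normalizing by Z yields a valid (Gibbs) distribution.
    return p_tilde(a, b) / Z

check = sum(p(a, b) for a, b in itertools.product((0, 1), repeat=2))
```

This enumeration has cost k^n, which is exactly why Z becomes intractable for large models, as the next slide notes.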

Page 22

Intractability of Z

• Since Z is an integral or sum over all possible values of x, it is often intractable to compute
• In order to compute a normalized probability in an undirected model:
  - The model structure and the definitions of the φ functions must be conducive to computing Z efficiently
  - In deep learning, Z is usually intractable and we must resort to approximations

Page 23

Differences between directed and undirected models

• Directed models are:
  - Defined directly in terms of probability distributions
• Undirected models are:
  - Defined more loosely, in terms of φ functions that must then be converted into probability distributions
  - The domain of the variables has a dramatic effect on the kind of probability distribution a given set of φ functions corresponds to

Page 24

2.4 Energy-Based Models

• Many interesting theoretical results about undirected graphs depend on the assumption that

  p̃(x) > 0 for all x

• We can enforce this using an energy-based model:

  p̃(x) = exp(-E(x))

  - E(x) is known as the energy function
  - Since exp(z) > 0 for all z, no energy function can result in a probability of zero for any state of x
• Any distribution of the form p̃(x) = exp(-E(x)) is referred to as a Boltzmann distribution

Page 25

• Cliques in the undirected graph correspond to factors of the unnormalized probability function
  - Since exp(a) exp(b) = exp(a + b), different cliques in the undirected graph correspond to different terms of the energy function
  - Exponentiation makes each term of the energy function correspond to the factor for a different clique
• i.e., an energy-based model is a special kind of Markov network
• The graph below (with five cliques) implies that:

  E(a, b, c, d, e, f) = Ea,b(a, b) + Eb,c(b, c) + Ea,d(a, d) + Eb,e(b, e) + Ee,f(e, f)

• The φ functions are obtained by setting each φ to the exponential of the corresponding negative energy, e.g.

  φa,b(a, b) = exp(-Ea,b(a, b))

[Figure: undirected graph with nodes a, b, c, d, e, f and edges a-b, b-c, a-d, b-e, e-f]
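A quick numerical check of this correspondence, using an arbitrary toy energy shared by all five cliques of the example graph (binary variables for brevity; the energy table is made up):

```python
import math, itertools

def E_pair(x, y):
    # One toy pairwise energy, reused for every clique.
    return 0.5 * x + 0.25 * y - x * y

cliques = [("a", "b"), ("b", "c"), ("a", "d"), ("b", "e"), ("e", "f")]
names = ["a", "b", "c", "d", "e", "f"]

def energy(assign):
    # E(a,...,f) is the sum of the per-clique energies.
    return sum(E_pair(assign[u], assign[v]) for u, v in cliques)

def p_tilde(assign):
    return math.exp(-energy(assign))

def phi(u, v, assign):
    # Each factor is the exponential of the negative clique energy.
    return math.exp(-E_pair(assign[u], assign[v]))

# exp(-sum of energies) equals the product of the per-clique factors.
ok = all(
    abs(math.prod(phi(u, v, dict(zip(names, vals))) for u, v in cliques)
        - p_tilde(dict(zip(names, vals)))) < 1e-12
    for vals in itertools.product((0, 1), repeat=6)
)
```

The identity exp(a + b) = exp(a) exp(b) is doing all the work: summing energies over cliques is the same as multiplying clique potentials.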

Page 26

(This slide repeats the energy decomposition and the φ construction from the previous slide.)

Page 27

2.5 Separation and D-separation

Separation in undirected models

• Identifying conditional independences is very simple
  - A conditional independence implied by the graph is called separation
• A set of variables A is separated from a set of variables B given a third set of variables S if the graph structure implies that A is independent of B given S
• If two variables a and b are connected by a path involving only unobserved variables, then they are not separated
  - If no path exists between them, or all paths contain an observed variable, then they are separated

Page 28

Separation in undirected models: Example

• b is shaded to indicate that it is observed
• b blocks the path from a to c, so a and c are separated given b
• There is an active path from a to d, so a and d are not separated given b

[Figure: undirected graph with edges a-b, b-c and a-d; node b is shaded (observed)]
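Separation can be tested mechanically: drop the observed nodes and check whether A and B are still connected. A sketch; the `separated` helper is mine, and the edge list reflects the example graph as far as it can be recovered from the slide:

```python
from collections import deque

def separated(edges, A, B, observed):
    """True if every path from A to B passes through an observed node
    (i.e., A and B are separated given `observed` in an undirected model)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # BFS that never enters observed nodes: paths through them are blocked.
    frontier = deque(a for a in A if a not in observed)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False          # found an unblocked (active) path
        for nxt in adj.get(node, ()):
            if nxt not in observed and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

edges = [("a", "b"), ("b", "c"), ("a", "d")]
r1 = separated(edges, {"a"}, {"c"}, observed={"b"})   # b blocks a-c
r2 = separated(edges, {"a"}, {"d"}, observed={"b"})   # direct edge a-d stays active
```

This matches the slide: a and c are separated given b, while a and d are not.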

Page 29

Separation in directed models

• In the context of directed graphs, these separation concepts are called d-separation
• D-separation is defined in the same way as separation for undirected graphs:
  - A set of variables A is d-separated from a set of variables B given a third set of variables S if the graph structure implies that A is independent of B given S
• Two variables are dependent if there is an active path between them; if not, they are d-separated
• In directed nets, determining whether a path is active is more complicated

Page 30

All active paths (of length 2) in directed models between a and b

[Figure: the canonical cases: a chain through an intermediate variable s (e.g., the relay race); a common cause s of a and b; and the V-structure, or collider, case, which gives rise to the explaining-away effect when s is observed]

Page 31

2.6 Converting between Undirected and Directed Graphs

• No probabilistic model is inherently directed or undirected
  - Some models are most easily described using a directed graph, others using an undirected graph
• Directed and undirected models both have advantages and disadvantages
  - The choice partially depends on which probability distribution we wish to describe
  - Which approach can capture the most independences in the probability distribution, or which uses the fewest edges
• Every probability distribution can be represented by either a directed model or an undirected model
  - Worst case: a "complete graph"

Page 32

"Complete graphs"

• Directed models:
  - Any directed acyclic graph where we impose some ordering on the random variables
  - Each variable has all the variables that precede it in the ordering as its ancestors in the graph
• Undirected models:
  - A graph containing a single clique encompassing all of the variables

Complete graphs are not useful, because they do not imply any independences!

Page 33

Converting a directed model D into an undirected model

• We need to create a new graph U
• Looking at graph D:
  - For every pair of variables x and y, we add an undirected edge connecting x and y to U if there is a directed edge between them, or if x and y are both parents of a third variable z
• The resulting graph U is known as a moralized graph

Page 34

Examples of converting directed models to undirected models

[Figure: directed models and their moralized undirected counterparts]

Page 35

Converting an undirected model to a directed model

• A loop is a sequence of variables connected by undirected edges, with the last variable connected back to the first one in the sequence
• A chord is a connection between any two non-consecutive variables in the sequence defining a loop
• We cannot directly create a directed model if the graph has chordless loops of length four or greater
  - Solution: add edges to triangulate the long loops (the new graph is known as a chordal, or triangulated, graph)
• Finally, it is necessary to assign directions to the edges
  - No directed cycles are allowed!

Page 36

Examples of converting an undirected model to a directed one

[Figure: an undirected model; edges are added to triangulate loops longer than three, since no chordless long loops are allowed; directions are then assigned to the edges, avoiding directed cycles, yielding the directed model]

Page 37

2.7 Factor Graphs

• Factor graphs resolve an ambiguity in the graphical representation of standard undirected models
  - The ambiguity arises because it is not clear whether each clique actually has a corresponding factor whose scope encompasses the entire clique
• A factor graph is a graphical representation of an undirected model that consists of a bipartite undirected graph
  - Some of the nodes are drawn as circles; they correspond to the random variables of the standard undirected model
  - The rest of the nodes are drawn as squares; they correspond to the factors of the unnormalized probability distribution
  - A variable and a factor are connected if the variable is one of the arguments of the factor
  - No factor may be connected to another factor in the graph, nor may a variable be connected to a variable

Page 38

Example of how a factor graph can resolve ambiguity

[Figure: an undirected graph over three variables. Is this three pairwise potentials, or one potential over all three variables? One factor graph for it has a single factor over all three variables; another has three factors, each over only two variables]

Page 39

3. Sampling from Graphical Models

3.1 Directed Models

• In directed graphical models, ancestral sampling can produce samples from the joint distribution represented by the model
• How does ancestral sampling work?
  - Sort the variables into a topological ordering, so that for all i and j, j is greater than i if xi is a parent of xj
  - The variables can then be sampled in this order:
    • First, sample x1 ~ P(x1)
    • Then sample x2 ~ P(x2 | PaG(x2))
    • ...
    • Finally, sample xn ~ P(xn | PaG(xn))
• It does not support every conditional sampling operation
  - Sampling a subset of the variables in a directed graphical model, given some other variables, requires that all the conditioning variables come earlier than the variables to be sampled in the topological ordering
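The steps above can be sketched for the relay-race model; the CPD tables are hypothetical, and the topological order is simply t0, t1, t2:

```python
import random

random.seed(0)

p_t0 = [0.2, 0.5, 0.3]
p_t1 = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.6, 0.3], 2: [0.1, 0.2, 0.7]}
p_t2 = {0: [0.6, 0.3, 0.1], 1: [0.2, 0.5, 0.3], 2: [0.1, 0.3, 0.6]}

def draw(dist):
    return random.choices(range(len(dist)), weights=dist)[0]

def ancestral_sample():
    t0 = draw(p_t0)       # first, a variable with no parents
    t1 = draw(p_t1[t0])   # then each child, given its already-sampled parent
    t2 = draw(p_t2[t1])
    return t0, t1, t2

samples = [ancestral_sample() for _ in range(10000)]
# Sanity check: the empirical marginal of t0 should be close to p(t0).
frac_t0_is_1 = sum(1 for s in samples if s[0] == 1) / len(samples)
```

Each variable is sampled exactly once, conditioned on parents that have already been sampled, which is why the topological ordering is essential.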

Page 40

3.2 Undirected Models

• Ancestral sampling is applicable only to directed models
• We can sample from an undirected model by converting it to a directed model, but
  - this involves solving intractable problems, or
  - introducing so many edges that the resulting directed model becomes intractable
• Gibbs sampling is the conceptually simplest approach for drawing samples from an undirected graph
  - Suppose we have a graphical model over an n-dimensional vector of random variables x
  - We iteratively visit each variable xi and draw a sample conditioned on all the other variables, i.e., from p(xi | x-i)
  - Asymptotically, after many repetitions, the process converges to sampling from the correct distribution
    • It is difficult to determine when the samples have reached a sufficiently accurate approximation of the desired distribution
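A minimal Gibbs sampler for a toy pairwise binary model on a chain x0 - x1 - x2; the coupling weight and potentials are made up, and this only illustrates the visit-and-resample loop, not a practical sampler:

```python
import math, random

random.seed(0)
w = 1.0                       # positive coupling: neighbors prefer to agree
edges = [(0, 1), (1, 2)]

def neighbors(i):
    return [v for u, v in edges if u == i] + [u for u, v in edges if v == i]

def conditional_p1(x, i):
    # p(x_i = 1 | x_-i) for potentials phi(xi, xj) = exp(w * [xi == xj]).
    s0 = sum(w * (0 == x[j]) for j in neighbors(i))
    s1 = sum(w * (1 == x[j]) for j in neighbors(i))
    return math.exp(s1) / (math.exp(s0) + math.exp(s1))

x = [0, 0, 0]
agree = 0
n_sweeps = 5000
for _ in range(n_sweeps):
    for i in range(len(x)):   # visit each variable and resample it
        x[i] = 1 if random.random() < conditional_p1(x, i) else 0
    agree += (x[0] == x[1])

agree_rate = agree / n_sweeps
```

With positive coupling, neighboring variables end up agreeing most of the time; only the long-run fraction is meaningful, early sweeps are still burn-in.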

Page 41

March  5,  2018                                                              41/58  

4. Advantages of Structured Modeling

•  Structured models dramatically reduce the cost of representing probability distributions, as well as of learning and inference
   –  By assuming each node has a tabular distribution given its parents, the memory, sampling, and inference costs become exponential only in the number of variables in the factor with the largest scope
      •  For many interesting models, this is very small
      •  e.g., in RBMs, all factor scopes are of size 2 or 1
   –  Previously, these costs were exponential in the total number of nodes
   –  Statistically, it is much easier to estimate this manageable number of parameters
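A quick back-of-the-envelope check of this cost reduction for n binary variables, comparing a full joint table against a chain-structured model where each node has one parent (the chain structure is a hypothetical example, not from the slides):

```python
n = 10

# Full joint table over n binary variables: one free parameter per joint
# state, minus one for normalization.
full_joint_params = 2**n - 1

# Chain-structured Bayesian network x1 -> x2 -> ... -> xn: one parameter
# for P(x1 = 1), plus two per remaining node (one per parent value).
chain_params = 1 + (n - 1) * 2

print(full_joint_params, chain_params)   # 1023 vs 19
```

The gap widens exponentially with n, which is the statistical advantage the slide refers to.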

5. Learning about Dependencies

•  A good generative model needs to accurately capture the distribution over the visible variables v
   –  The different elements of v are highly dependent on each other
   –  In deep learning, these dependencies are modeled by introducing latent variables h
   –  A good model of v that did not contain any latent variables would need to have:
      •  A very large number of parents per node in a Bayesian network
      •  A very large number of cliques in a Markov network
   Highly costly in both the computational and the statistical sense!

•  When the model is intended to capture dependencies between visible variables with direct connections, it is usually infeasible to connect all variables
   –  The graph must be designed to connect those variables that are tightly coupled and omit edges between the others
   –  Structure learning algorithms address this by performing a greedy search over graph structures
•  Using latent variables instead of adaptive structure avoids the need to perform discrete searches and multiple rounds of training
   –  Use one fixed graph structure
   –  Many latent variables
   –  Dense connections from latent variables to observed variables
   –  The parameters can learn that each latent variable interacts strongly with only a small subset of observed variables

6. Inference and approximate inference

Inference
•  Inference asks questions about how variables are related to each other
   –  e.g., given a set of medical test results, we can ask what disease a patient might have
   –  In a latent variable model, we may want to extract features E[h | v] describing the observed variables v
   –  We often need to solve such problems in order to perform other tasks
      •  e.g., we may need to compute p(h | v) in order to evaluate or maximize p(v)
•  These are inference problems
   –  Predict the value of some variables given other variables
   –  Predict the distribution of some variables given the values of other variables

Intractability of Inference
•  For most interesting deep models, the inference problems are intractable
   –  Even when we use a structured graphical model to simplify them
•  Graph structures allow us to represent complicated high-dimensional distributions with a reasonable number of parameters
   –  But the resulting graphs are not restrictive enough to allow efficient inference
•  Computing the marginal probability is #P-hard
   –  NP problems require determining whether a problem has a solution and, if so, finding it
   –  Problems in #P require counting the number of solutions
•  This motivates the use of approximate inference in deep learning
   –  Usually in the form of variational inference
      •  Approximate the true distribution p(h | v) by another distribution q(h | v) that is as close to the true one as possible
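As a toy illustration of "as close as possible" (the distributions and numbers are made up for this sketch), one can measure the closeness of a tractable q to a target p with the KL divergence that variational inference minimizes:

```python
import math

def kl_bernoulli(q, p):
    """KL(q || p) between two Bernoulli distributions with success probs q, p."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Pretend the true posterior p(h | v) is Bernoulli(0.7), which in a real deep
# model we could not compute, and we fit an approximation q(h | v).
for q_prob in (0.3, 0.5, 0.7):
    print(q_prob, kl_bernoulli(q_prob, 0.7))   # divergence shrinks to 0 at 0.7
```

Variational inference searches over a tractable family of q distributions for the member with the smallest such divergence.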

7. The Deep Learning Approach to Structured Probabilistic Models

•  Deep learning does not involve especially deep graphical models

•  The main differences in structured probabilistic models in deep learning –  Depth –  Proportion of observed to latent variables –  Latent semantics (meaning of a latent variable) –  Connectivity and inference algorithm –  Intractability and approximation

Depth of a graphical model
•  A latent variable h_i is at depth j if the shortest path from h_i to an observed variable is j steps
•  The depth of the model is the greatest depth of any such h_i
•  This kind of depth is different from the depth induced by the computational graph
   •  Many generative models used for deep learning have no latent variables (or only one layer of them), but use deep computational graphs to define the conditional distributions within the model

Proportion of observed/latent variables
•  Deep learning models typically have more latent variables than observed variables
   –  They always make use of distributed representations
•  Even shallow models have a single large layer of latent variables
•  Complicated non-linear interactions between variables are accomplished via indirect connections that flow through multiple latent variables
•  By contrast, traditional graphical models contain mostly variables that are observed (i.e., few latent variables)

Latent variable semantics •  Latent variables are designed differently in deep learning •  In traditional graphical models, they are designed with specific

semantics in mind –  Topic of a document, intelligence of a student, disease causing a

patient’s symptoms, etc.

•  In deep learning, they are not designed to take on any specific semantics ahead of time
   –  The training algorithm is free to invent the concepts it needs to model a dataset
   –  The resulting latent variables are not easy to interpret after the fact

Connectivity
•  Deep graphical models have large groups of units connected to other large groups of units
   –  The interactions between two groups can be described by a single matrix
•  Traditional graphical models have few connections, and the choice of connections for each variable may be individually designed
   –  The design of the model structure is tightly linked to the choice of inference algorithm

Inference
•  Traditional graphical models are usually designed to maintain the tractability of exact inference
   –  When this is too limiting, a popular approximate approach is loopy belief propagation
   –  Both approaches work well only with sparsely connected graphs
•  The models used in deep learning are not sparse
   –  They use either Gibbs sampling or variational inference instead
•  Rather than simplifying the model until exact inference is feasible, we make the model as complex as we need, as long as we can still compute a gradient

The restricted Boltzmann machine (RBM) •  Quintessential example of how graphical models are used for

deep learning •  RBM itself is not a deep model

–  It has a single layer of latent units that may be used to learn a representation for the input

–  RBMs can be used to build many deeper models (Chapter 20)

•  A general Boltzmann machine can have arbitrary connections
•  Restricted Boltzmann machine (RBM): the first layer is called the visible or input layer, and the second is the hidden layer
   –  Bipartite undirected graph
   –  No direct interactions between any two visible units or between any two hidden units (hence “restricted”)
•  Used for dimensionality reduction, classification, or feature learning

RBM Characteristics •  Units are organized into large groups called layers •  Connectivity between layers is described by a matrix •  Connectivity is relatively dense •  The model is designed to allow efficient Gibbs sampling •  Learn latent variables whose semantics are not specified by the

designer

Canonical RBM
•  An energy-based model with binary visible and hidden units
   –  The model is divided into groups of units v and h, and the interaction between them is described by the matrix W:

       E(v, h) = −b^T v − c^T h − v^T W h

     where b, c, and W are unconstrained, real-valued, learnable parameters
•  The restrictions on the RBM structure yield the properties

       p(h | v) = ∏_i p(h_i | v)   and   p(v | h) = ∏_i p(v_i | h)
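The energy function above can be written directly as code; this sketch uses hypothetical sizes (6 visible, 4 hidden units) and small random parameters rather than anything trained:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -b^T v - c^T h - v^T W h for binary unit vectors v, h."""
    return float(-(b @ v) - (c @ h) - v @ W @ h)

rng = np.random.default_rng(0)
n_v, n_h = 6, 4                              # hypothetical layer sizes
W = 0.01 * rng.standard_normal((n_v, n_h))   # visible-hidden interactions
b = np.zeros(n_v)                            # visible biases
c = np.zeros(n_h)                            # hidden biases
v = rng.integers(0, 2, size=n_v).astype(float)
h = rng.integers(0, 2, size=n_h).astype(float)
energy = rbm_energy(v, h, W, b, c)
```

Lower energy corresponds to higher unnormalized probability exp(−E(v, h)); normalizing it requires the intractable partition function, which is why sampling-based methods are used.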

Example: For the binary RBM, we obtain

    P(h_i = 1 | v) = σ(v^T W_{:,i} + c_i)
    P(h_i = 0 | v) = 1 − σ(v^T W_{:,i} + c_i)

–  Together these properties allow for block Gibbs sampling, which alternates between sampling all of h simultaneously and all of v simultaneously
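Block Gibbs sampling then reduces to two alternating vectorized conditional draws; a sketch under the same parameterization, with hypothetical shapes and untrained random weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c):
    # All hidden units are conditionally independent given v,
    # so the whole block h can be drawn in one vectorized step.
    return (rng.random(W.shape[1]) < sigmoid(v @ W + c)).astype(float)

def sample_v_given_h(h, W, b):
    # Symmetrically, all visible units can be drawn at once given h.
    return (rng.random(W.shape[0]) < sigmoid(W @ h + b)).astype(float)

n_v, n_h = 6, 4                              # hypothetical layer sizes
W = 0.01 * rng.standard_normal((n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)

v = rng.integers(0, 2, size=n_v).astype(float)
for _ in range(50):                          # alternate the two blocks
    h = sample_h_given_v(v, W, c)
    v = sample_v_given_h(h, W, b)
```

This two-step alternation is what makes the dense but bipartite RBM connectivity compatible with efficient Gibbs sampling.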

Example: Samples from a trained RBM and its weights (model trained on MNIST data)
•  Each column is a separate Gibbs sampling process
•  Each row represents the output of another 1000 steps of Gibbs sampling (successive samples are highly correlated)
•  The corresponding weight vectors are shown alongside the samples

How to put RBMs into practice?
•  TensorFlow implementation of a restricted Boltzmann machine (RBM) and an autoencoder with layerwise pretraining:
   https://github.com/Cospel/rbm-ae-tf

Thank you very much for your attention!
