
The Sum-Product Algorithm

  • Upload
    landen


Page 1: The Sum-Product Algorithm

The Sum-Product Algorithm

• Use the factor graph framework to derive an algorithm that is applicable to tree-structured graphs

• Focus on the problem of evaluating local marginals

• Assume that the original graph is an undirected tree, a directed tree, or a polytree

• First, convert the original graph into a factor graph so that all of these cases can be handled within the same framework

Page 2: The Sum-Product Algorithm

Goal

• The goal is to exploit the structure of the graph to achieve two things:

(i) To obtain an efficient, exact inference algorithm for finding marginals

(ii) In situations where several marginals are required, to allow computations to be shared efficiently

Page 3: The Sum-Product Algorithm

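The Sum-Product Algorithm

• Suppose that all of the variables are hidden

• By definition, the marginal is obtained by summing the joint distribution over all variables except $x$:

$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$$

where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with $x$ omitted

• Use the factor graph expression for the joint distribution, $p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s)$

• Then interchange the summations and the products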
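The effect of interchanging sums and products can be checked numerically. The sketch below is only an illustration: it assumes a hypothetical three-variable chain with two made-up factor tables `f_a` and `f_b` (neither appears in the slides) and compares the brute-force marginal of the middle variable with the version in which each sum has been pushed inside the product.

```python
import numpy as np

# Hypothetical chain x1 - f_a - x2 - f_b - x3 with 2 states per variable.
# The factor values are made-up numbers used only for illustration.
f_a = np.array([[1.0, 2.0], [3.0, 1.0]])   # f_a[x1, x2]
f_b = np.array([[2.0, 1.0], [1.0, 4.0]])   # f_b[x2, x3]

def marginal_brute_force(f_a, f_b):
    """Build the full joint table, then sum over x1 and x3."""
    joint = f_a[:, :, None] * f_b[None, :, :]   # joint[x1, x2, x3]
    return joint.sum(axis=(0, 2))

def marginal_interchanged(f_a, f_b):
    """Push each sum inside the product: (sum_x1 f_a) * (sum_x3 f_b)."""
    return f_a.sum(axis=0) * f_b.sum(axis=1)

p_brute = marginal_brute_force(f_a, f_b)    # unnormalized p(x2)
p_fast = marginal_interchanged(f_a, f_b)    # same values, fewer operations
```

Both functions return the same unnormalized marginal; the second never builds the full joint table, which is where the efficiency gain comes from on larger trees.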

Page 4: The Sum-Product Algorithm

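The Sum-Product Algorithm

• Consider the following graph

Joint distribution: $p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$

where $\mathrm{ne}(x)$ is the set of factor nodes that are neighbours of $x$, $X_s$ is the set of all variables in the subtree connected to $x$ via the factor node $f_s$, and $F_s(x, X_s)$ is the product of all the factors in the group associated with factor $f_s$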

Page 5: The Sum-Product Algorithm

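The Sum-Product Algorithm

• Substituting the factorized joint distribution into the expression for the marginal $p(x)$ and interchanging the sums and products:

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right]$$

• Introduce a set of functions:

$$\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s)$$

• View $\mu_{f_s \to x}(x)$ as messages from the factor node $f_s$ to the variable node $x$

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$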

Page 6: The Sum-Product Algorithm

Proof

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$

Page 7: The Sum-Product Algorithm

The Sum-Product Algorithm

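• Each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized:

$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$

where $x_1, \ldots, x_M$ denote the variables, other than $x$, that are associated with factor $f_s$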

Page 8: The Sum-Product Algorithm

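The Sum-Product Algorithm

$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right]$$

$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$

$\mu_{f_s \to x}(x)$: the message that goes from a factor node to a variable node

$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm})$: the message that goes from a variable node to a factor node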
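The factor-to-variable message can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the slides: the helper name `factor_to_variable`, the convention that each factor is a dense array with one axis per neighbouring variable, and the example numbers are all assumptions made here.

```python
import numpy as np

def factor_to_variable(f, incoming, target_axis):
    """Message mu_{f -> x} for a factor stored as a dense table.

    f           : array with one axis per variable attached to the factor
    incoming    : dict {axis: message vector} from every variable except the target
    target_axis : axis of f that corresponds to the receiving variable x
    """
    result = f.astype(float)
    # Multiply in the variable-to-factor messages along their own axes.
    for axis, msg in incoming.items():
        shape = [1] * f.ndim
        shape[axis] = msg.shape[0]
        result = result * msg.reshape(shape)
    # Sum out every variable except the target.
    other_axes = tuple(a for a in range(f.ndim) if a != target_axis)
    return result.sum(axis=other_axes)

# Made-up example: f(x, x1) with 2 states for x and 3 states for x1.
f = np.array([[1.0, 2.0, 4.0],
              [0.0, 1.0, 2.0]])
mu_x1_to_f = np.array([1.0, 2.0, 0.5])
message = factor_to_variable(f, {1: mu_x1_to_f}, target_axis=0)
```

For this table the message to $x$ is the row-wise weighted sum of `f` with weights `mu_x1_to_f`.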

Page 9: The Sum-Product Algorithm

Proof

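$$\mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)$$

$$= \sum_{x_1} \cdots \sum_{x_M} \sum_{X_{s1}} \cdots \sum_{X_{sM}} f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$$

$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \left[ \sum_{X_{s1}} G_1(x_1, X_{s1}) \right] \cdots \left[ \sum_{X_{sM}} G_M(x_M, X_{sM}) \right]$$

$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$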

Page 10: The Sum-Product Algorithm

The Sum-Product Algorithm

• Derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization

$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$$

Page 11: The Sum-Product Algorithm

The Sum-Product Algorithm

• Each of these messages can be computed recursively in terms of other messages

• To start the recursion, view the node $x$ as the root of the tree and begin at the leaf nodes

• If a leaf node is a variable node, then the message that it sends along its one and only link is $\mu_{x \to f}(x) = 1$

• If the leaf node is a factor node, the message should take the form $\mu_{f \to x}(x) = f(x)$

Page 12: The Sum-Product Algorithm

The Sum-Product Algorithm

• Start by viewing the variable node $x$ as the root of the factor graph and initiating messages at the leaves

• The message passing steps are then applied until messages have been propagated along every link

• The root node will receive messages from all its neighbours

• The required marginal can be evaluated

$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$

$$p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$
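The whole procedure (leaf initialization, inward message passing, product at the root) can be traced on a small tree. The sketch below assumes a hypothetical factor graph with factors $f_a(x_1, x_2)$, $f_b(x_2, x_3)$ and $f_c(x_2, x_4)$, random factor tables, and $x_3$ as the root; these choices are illustrative and are not taken from the slides. The result is compared with a brute-force marginal.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # number of states per variable (assumed)

# Hypothetical tree: x1 - f_a - x2, x2 - f_b - x3, x2 - f_c - x4
f_a = rng.random((K, K))   # f_a[x1, x2]
f_b = rng.random((K, K))   # f_b[x2, x3]
f_c = rng.random((K, K))   # f_c[x2, x4]

# Leaf variable nodes send the constant message 1.
mu_x1_fa = np.ones(K)
mu_x4_fc = np.ones(K)

# Factor-to-variable messages: multiply by incoming messages, sum out the sender.
mu_fa_x2 = f_a.T @ mu_x1_fa          # sum over x1
mu_fc_x2 = f_c @ mu_x4_fc            # sum over x4

# Variable-to-factor message: product of messages from the other factors.
mu_x2_fb = mu_fa_x2 * mu_fc_x2

# Final message into the root x3, then normalize to get the marginal.
mu_fb_x3 = f_b.T @ mu_x2_fb          # sum over x2
p_x3 = mu_fb_x3 / mu_fb_x3.sum()

# Brute-force check: build the full joint table and sum out x1, x2, x4.
joint = np.einsum('ab,bc,bd->abcd', f_a, f_b, f_c)
p_x3_brute = joint.sum(axis=(0, 1, 3))
p_x3_brute = p_x3_brute / p_x3_brute.sum()
```

Because the factors here are unnormalized, the product of messages at the root is divided by its sum to obtain a proper marginal.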

Page 13: The Sum-Product Algorithm

Example

• Unnormalized joint distribution:

[Figure: factor graph for the example, with the root node and a leaf node labelled]

Page 14: The Sum-Product Algorithm

Example

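$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$

$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$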

Page 15: The Sum-Product Algorithm

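$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$

$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$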

Page 16: The Sum-Product Algorithm

Example

Page 17: The Sum-Product Algorithm

Sum-Product And Max-Sum Algorithm

Sum-product algorithm:

• Take a joint distribution expressed as a factor graph

• Efficiently find marginals over the component variables

Max-sum algorithm:

• Find a setting of the variables that has the largest probability

• Find the value of that probability

• Viewed as an application of dynamic programming

Page 18: The Sum-Product Algorithm

Find the maximal value

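• To find the set of values that jointly have the largest probability, find the vector $\mathbf{x}^{\max}$ that maximizes the joint distribution: $\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$

• However, $\mathbf{x}^{\max}$ is not always the same as the set of individually most probable values $x_i^{\star}$

• The values $x_i^{\star}$ are obtained by running the sum-product algorithm to get the marginal $p(x_i)$ for every variable and then, for each marginal in turn, finding the value that maximizes that marginal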

Page 19: The Sum-Product Algorithm

Example

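Joint distribution $p(x, y)$ over two binary variables, with its marginals:

             x = 0    x = 1    p(y)
  y = 0       0.3      0.4     0.7  (max)
  y = 1       0.3      0.0     0.3
  p(x)        0.6      0.4
             (max)

So the marginals are maximized by $x = 0$ and $y = 0$, which corresponds to a joint probability of 0.3

But the largest joint probability is 0.4, attained at $x = 1$, $y = 0$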
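The numbers in this example can be checked directly. The sketch below assumes the four joint entries form a 2×2 table whose row sums are 0.7 and 0.3 and whose column sums are 0.6 and 0.4; the variable names `x` and `y` and the state labels 0 and 1 are labels chosen here for illustration.

```python
import numpy as np

# joint[y, x]: rows are states of y, columns are states of x (assumed layout).
joint = np.array([[0.3, 0.4],
                  [0.3, 0.0]])

p_x = joint.sum(axis=0)   # marginal over x: column sums
p_y = joint.sum(axis=1)   # marginal over y: row sums

# Maximize each marginal separately.
x_star = int(np.argmax(p_x))
y_star = int(np.argmax(p_y))
joint_at_marginal_maxima = joint[y_star, x_star]

# Maximize the joint distribution directly.
y_max, x_max = np.unravel_index(np.argmax(joint), joint.shape)
joint_maximum = joint.max()
```

Maximizing the marginals one at a time selects a configuration with joint probability 0.3, while the joint maximum is 0.4 at a different configuration.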

Page 20: The Sum-Product Algorithm

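The Max-Sum Algorithm

• Write out the max operator in terms of its components:

$$\max_{\mathbf{x}} p(\mathbf{x}) = \max_{x_1} \cdots \max_{x_M} p(\mathbf{x})$$

where $M$ is the total number of variables

• Substitute for $p(\mathbf{x})$ using the product of factors and use the distributive law of multiplication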
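For non-negative factors the max operator distributes over multiplication, $\max(ab, ac) = a \max(b, c)$ for $a \ge 0$, which is what allows each maximization to be pushed inside the product. The sketch below illustrates this on a hypothetical three-variable chain with random factor tables `f_a` and `f_b`; the chain and the numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain x1 - f_a - x2 - f_b - x3 with non-negative factors.
f_a = rng.random((3, 3))   # f_a[x1, x2]
f_b = rng.random((3, 3))   # f_b[x2, x3]

# Brute force: maximize over the full joint table.
joint = f_a[:, :, None] * f_b[None, :, :]   # joint[x1, x2, x3]
max_brute = joint.max()

# Exchange max and product:
# max_{x2} [ (max_{x1} f_a(x1, x2)) * (max_{x3} f_b(x2, x3)) ]
max_exchanged = (f_a.max(axis=0) * f_b.max(axis=1)).max()
```

The exchanged form touches each factor table once instead of building the full joint table.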

Page 21: The Sum-Product Algorithm

The Max-Sum Algorithm

Page 22: The Sum-Product Algorithm

The Max-Sum Algorithm

The final maximization is performed over the product of all messages arriving at the root node, and gives the maximum value for $p(\mathbf{x})$

This is called the max-product algorithm and is identical to the sum-product algorithm except that summations are replaced by maximizations

Page 23: The Sum-Product Algorithm

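The Max-Sum Algorithm

Products of many small probabilities can lead to numerical underflow problems, so work with the logarithm of the joint distribution

If $a > b$ then $\ln a > \ln b$, so the logarithm is monotonic and can be interchanged with the max operator:

$$\ln\left( \max_{\mathbf{x}} p(\mathbf{x}) \right) = \max_{\mathbf{x}} \ln p(\mathbf{x})$$

Taking the logarithm turns the products into sums, which gives the max-sum algorithm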
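The underflow problem and its fix are easy to reproduce. The sketch below uses made-up probabilities (100 factors of $10^{-5}$) purely for illustration: the direct product underflows to zero in double precision, while the sum of logarithms stays finite.

```python
import math

# Made-up example: 100 small probabilities.
probs = [1e-5] * 100

# Direct product: the true value 1e-500 is below the smallest double, so it becomes 0.0.
product = 1.0
for p in probs:
    product *= p

# Log domain: the product becomes a sum and stays well within range.
log_sum = sum(math.log(p) for p in probs)

# Monotonicity: taking the log does not change which value is the maximum.
values = [0.1, 0.5, 0.2]
log_of_max = math.log(max(values))
max_of_log = max(math.log(v) for v in values)
```

Since the logarithm is monotonic, `log_of_max` and `max_of_log` agree, which is the property that justifies moving from max-product to max-sum.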

Page 24: The Sum-Product Algorithm

The Max-Sum Algorithm

Page 25: The Sum-Product Algorithm

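The Max-Sum Algorithm

• The initial messages sent by the leaf nodes: $\mu_{x \to f}(x) = 0$ and $\mu_{f \to x}(x) = \ln f(x)$

• The maximum probability at the root node: $p^{\max} = \max_x \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]$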

Page 26: The Sum-Product Algorithm

The Max-Sum Algorithm

• Finding the maximum of the joint distribution gives the same result irrespective of which node is chosen as the root

• Evaluating the maximization at the root also gives the most probable value of the root variable

$$p^{\max} = \max_x \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]$$

$$x^{\max} = \arg\max_x \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]$$

Page 27: The Sum-Product Algorithm

The Max-Sum Algorithm

The simple chain with N variables each having K states

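[Figure: chain $x_1, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots, x_N$ with factors $f_{n-1,n}$ and $f_{n,n+1}$ between neighbouring variables]

Take $x_N$ as the root node

In the first phase, propagate messages from the leaf node $x_1$ to the root node using

$$\mu_{x_n \to f_{n,n+1}}(x_n) = \mu_{f_{n-1,n} \to x_n}(x_n)$$

$$\mu_{f_{n-1,n} \to x_n}(x_n) = \max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$$

The initial message: $\mu_{x_1 \to f_{1,2}}(x_1) = 0$

The most probable value for $x_N$ is given by

$$x_N^{\max} = \arg\max_{x_N} \left[ \mu_{f_{N-1,N} \to x_N}(x_N) \right]$$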

Page 28: The Sum-Product Algorithm

The Max-Sum Algorithm

• Need to determine the states of the previous variables that correspond to the same maximizing configuration

• Done by keeping track of which values of the variables gave rise to the maximum state of each variable

Page 29: The Sum-Product Algorithm

The Max-Sum Algorithm

Lattice or trellis diagram

• Not a probabilistic graphical model, because the nodes represent individual states of the variables

[Figure: trellis diagram with one column per variable node and one row per state; labels mark a variable node and the nodes in the second state]

• For each state of a given variable, there is a unique state of the previous variable that maximizes the probability, corresponding to the function $\phi(x_n)$ and indicated by the line connecting the nodes

Page 30: The Sum-Product Algorithm

The Max-Sum Algorithm

• Once we know the most probable value of the final node $x_N$, simply follow the link back to find the most probable state of node $x_{N-1}$, and so on back to the initial node $x_1$

• Using $\phi$ to recover $x_{n-1}^{\max} = \phi(x_n^{\max})$ in this way is known as back-tracking

$$\phi(x_n) = \arg\max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$$
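The forward pass and the back-tracking step on a chain can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `max_sum_chain`, the representation of each pairwise factor as a table of log values, and the random test factors are all choices made here, not part of the slides. The result is checked against exhaustive search.

```python
import itertools
import numpy as np

def max_sum_chain(log_factors):
    """Max-sum on a chain with the last variable as root.

    log_factors[n][i, j] is ln f(x_n = i, x_{n+1} = j) for consecutive variables.
    Returns (maximum log value, list of maximizing states).
    """
    num_states = log_factors[0].shape[0]
    message = np.zeros(num_states)        # initial message from the leaf: ln 1 = 0
    backpointers = []                     # phi for each step of the forward pass
    for log_f in log_factors:
        scores = log_f + message[:, None]             # scores[previous, current]
        backpointers.append(scores.argmax(axis=0))    # best previous state per current state
        message = scores.max(axis=0)
    # Most probable state of the root, then follow the stored links back.
    state = int(message.argmax())
    best = float(message.max())
    states = [state]
    for phi in reversed(backpointers):
        state = int(phi[state])
        states.append(state)
    states.reverse()
    return best, states

# Made-up chain with N = 5 variables and K = 3 states each.
rng = np.random.default_rng(1)
K = 3
log_factors = [np.log(rng.random((K, K)) + 0.1) for _ in range(4)]
best, states = max_sum_chain(log_factors)

# Exhaustive check over all K**N configurations.
N = len(log_factors) + 1
best_brute, states_brute = max(
    (sum(log_factors[n][c[n], c[n + 1]] for n in range(N - 1)), list(c))
    for c in itertools.product(range(K), repeat=N)
)
```

Storing one array of back-pointers per step is what keeps the recovered states consistent with a single maximizing configuration, even when several configurations tie.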

Page 31: The Sum-Product Algorithm

The Max-Sum Algorithm

• Two paths, each of which we shall suppose corresponds to a global maximum

Page 32: The Sum-Product Algorithm

The Max-Sum Algorithm

• If a message is sent from a factor node $f$ to a variable node $x$, a maximization is performed over all other variable nodes that are neighbours of that factor node, using

$$\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \left[ \ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \right]$$

• When performing this maximization, keep a record of which values of the variables gave rise to the maximum

• In the back-tracking step, having found $x^{\max}$, use these stored values to assign consistent maximizing states