Md2 k 04_19_2015

Dependency Parsing by Belief Propagation.

Presenter: Roy Adams

David A. Smith and Jason Eisner

Undirected Graphical Models

[Figure: factor graph over variables Y1 Y2 Y3 Y4 and X1 X2 X3 X4]

Legend:

Variable not observed at test time.

Variable always observed.

Factor: encodes correlations between variables.



Marginal Inference

[Figure: factor graph over variables Y1 Y2 Y3 Y4 and X1 X2 X3 X4]

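Sum-Product Message Passing

Initialize all messages to 1
Until convergence:
    For each variable Yi:
        For each F in Neighborhood(Yi):
            For each v in Dom(Yi):
                $m_{Y_i \to F}(v) = \prod_{F' \in \mathrm{Neighborhood}(Y_i) \setminus \{F\}} m_{F' \to Y_i}(v)$
    For each factor F:
        For each Yi in Neighborhood(F):
            For each v in Dom(Yi):
                $m_{F \to Y_i}(v) = \sum_{y :\, y_i = v} F(y) \prod_{Y_j \in \mathrm{Neighborhood}(F) \setminus \{Y_i\}} m_{Y_j \to F}(y_j)$

Here $m_{Y_i \to F}$ is the message from variable Yi to factor F, $m_{F \to Y_i}$ is the message from F to Yi, and y ranges over joint assignments to the variables in Neighborhood(F).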
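The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function names `sum_product` and `brute_force`, the dictionary-based message storage, the fixed iteration count in place of a convergence test, and the three-variable chain used as input are all hypothetical choices made for the example.

```python
import itertools
import math

def sum_product(domains, factors, iters=20):
    """domains: {var: [values]}; factors: list of (scope tuple, function)."""
    # Initialize all messages to 1 (both directions).
    var_to_fac = {(v, i): {x: 1.0 for x in domains[v]}
                  for i, (scope, _) in enumerate(factors) for v in scope}
    fac_to_var = {(i, v): {x: 1.0 for x in domains[v]}
                  for i, (scope, _) in enumerate(factors) for v in scope}
    for _ in range(iters):  # assumption: fixed sweeps instead of a convergence check
        # Variable -> factor: product of messages from the *other* factors.
        for (v, i) in var_to_fac:
            for x in domains[v]:
                var_to_fac[(v, i)][x] = math.prod(
                    fac_to_var[(j, v)][x]
                    for j, (scope, _) in enumerate(factors)
                    if j != i and v in scope)
        # Factor -> variable: sum over the other variables in the factor's scope.
        for i, (scope, f) in enumerate(factors):
            for v in scope:
                others = [u for u in scope if u != v]
                for x in domains[v]:
                    total = 0.0
                    for combo in itertools.product(*(domains[u] for u in others)):
                        assign = dict(zip(others, combo))
                        assign[v] = x
                        weight = f(*(assign[u] for u in scope))
                        for u in others:
                            weight *= var_to_fac[(u, i)][assign[u]]
                        total += weight
                    fac_to_var[(i, v)][x] = total
    # Beliefs: normalized product of all incoming factor messages.
    marginals = {}
    for v in domains:
        belief = {x: math.prod(fac_to_var[(j, v)][x]
                               for j, (scope, _) in enumerate(factors) if v in scope)
                  for x in domains[v]}
        z = sum(belief.values())
        marginals[v] = {x: b / z for x, b in belief.items()}
    return marginals

def brute_force(domains, factors):
    """Exact marginals by enumerating every joint assignment (for checking)."""
    names = list(domains)
    totals = {v: {x: 0.0 for x in domains[v]} for v in names}
    z = 0.0
    for combo in itertools.product(*(domains[v] for v in names)):
        assign = dict(zip(names, combo))
        w = math.prod(f(*(assign[u] for u in scope)) for scope, f in factors)
        z += w
        for v in names:
            totals[v][assign[v]] += w
    return {v: {x: t / z for x, t in totals[v].items()} for v in names}

# Hypothetical example: a chain Y1 - Y2 - Y3 of binary variables.
domains = {"Y1": [0, 1], "Y2": [0, 1], "Y3": [0, 1]}
factors = [
    (("Y1",), lambda a: 3.0 if a == 1 else 1.0),
    (("Y1", "Y2"), lambda a, b: 2.0 if a == b else 1.0),
    (("Y2", "Y3"), lambda a, b: 2.0 if a == b else 1.0),
]
marginals = sum_product(domains, factors)
exact = brute_force(domains, factors)
```

Because the example graph is a chain (a tree), the message-passing marginals should agree with the enumerated ones up to floating-point error.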

Sum-Product Message Passing

Variable Yi to factor F: “Based on what my other neighbors said, this is my distribution over my values.”

Factor F to variable Yi: “Based on what my other neighbors said, this is my distribution over your values.”


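- If it converges, we get estimates of the marginals from the product of incoming messages: $p(Y_i = v \mid X) \approx \frac{1}{Z_i} \prod_{F \in \mathrm{Neighborhood}(Y_i)} m_{F \to Y_i}(v)$, where $m_{F \to Y_i}$ is the message from factor F to Yi and $Z_i$ normalizes over Dom(Yi).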

- For tree-structured graphs, it is guaranteed to converge and the marginals will be exact.

- For loopy graphs, it may not converge, and if it converges, the marginals may not be correct.

- With some tricks, it tends to work very well in practice.

Questions so far?

High Order Factors

[Figure: a single factor connected to variables Y1, Y2, Y3, …, Yn]

Why do we want them?


They can be used to encode structure. E.g. Y1 through Yn must all take different values.
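As a small illustration of the all-different example, a hard-constraint factor can be written as a function that scores an assignment 1 when the constraint holds and 0 otherwise. The name `all_different` and the three-variable, three-value setup are assumptions made for this sketch.

```python
import itertools

def all_different(*values):
    """Hard-constraint factor: 1.0 if every argument is distinct, else 0.0."""
    return 1.0 if len(set(values)) == len(values) else 0.0

# Hypothetical setup: 3 variables, each taking a value in {0, 1, 2}.
assignments = list(itertools.product(range(3), repeat=3))
allowed = [a for a in assignments if all_different(*a) == 1.0]
```

Of the 27 joint assignments, only the 6 permutations of (0, 1, 2) receive nonzero score, so a single factor over all the variables rules out every assignment that violates the structure.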

Why are they hard?


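Basic message passing takes time exponential in the neighborhood size of the largest factor: for a factor over n variables with k values each, computing one outgoing message by direct summation touches on the order of k^n joint assignments.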
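To make the cost concrete, the sketch below computes one factor-to-variable message by direct enumeration and counts how many times the factor is evaluated. The function name `brute_force_message`, the constant factor, and the choice of k = 3 are assumptions made purely for illustration.

```python
import itertools

def brute_force_message(factor, n, k, target, incoming):
    """Message from an n-variable factor to variable `target`, by enumeration.

    Every variable takes values 0..k-1; incoming[j][x] is the message from
    variable j to the factor. Returns (message, number of factor evaluations).
    """
    others = [j for j in range(n) if j != target]
    message = [0.0] * k
    evaluations = 0
    for assign in itertools.product(range(k), repeat=n):
        weight = factor(assign)
        for j in others:
            weight *= incoming[j][assign[j]]
        message[assign[target]] += weight
        evaluations += 1
    return message, evaluations

k = 3
counts = {}
for n in range(2, 7):
    uniform = [[1.0] * k for _ in range(n)]  # all incoming messages equal to 1
    _, counts[n] = brute_force_message(lambda a: 1.0, n, k, 0, uniform)
```

With k = 3 the evaluation count is 3^n, so each extra variable attached to the factor triples the work for a single message.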

Fixed High Order Factors



1) A fixed factor doesn’t depend on the model parameters, so we don’t need its marginal to calculate gradients in maximum likelihood estimation (MLE).


2) The structure is so constrained that the sum in message passing becomes tractable.
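One way to see point 2 is an "exactly one" constraint over binary variables: the factor is 1 when exactly one variable equals 1 and 0 otherwise. This is an illustrative choice, not necessarily the construction used in the talk. The sketch below computes all outgoing messages in time linear in n and checks them against enumeration; the function names and the input messages are hypothetical, and the fast version assumes every incoming message gives nonzero weight to the value 0.

```python
import itertools
import math

def exactly_one_messages(incoming):
    """incoming[j] = (m_j(0), m_j(1)), assuming m_j(0) > 0 for all j.

    Returns outgoing[i] = (out_i(0), out_i(1)) for every i in O(n) total time.
    """
    prod_zero = math.prod(m0 for m0, _ in incoming)
    ratios = [m1 / m0 for m0, m1 in incoming]
    ratio_sum = sum(ratios)
    outgoing = []
    for (m0, _), r in zip(incoming, ratios):
        rest_all_zero = prod_zero / m0              # every other variable is 0
        out_one = rest_all_zero                     # Y_i = 1 forces the rest to 0
        out_zero = rest_all_zero * (ratio_sum - r)  # exactly one other variable is 1
        outgoing.append((out_zero, out_one))
    return outgoing

def brute_force_messages(incoming):
    """Same messages by enumerating all 2^n assignments (for checking only)."""
    n = len(incoming)
    outgoing = []
    for i in range(n):
        out = [0.0, 0.0]
        for assign in itertools.product([0, 1], repeat=n):
            if sum(assign) != 1:
                continue  # the factor is zero on this assignment
            weight = math.prod(incoming[j][assign[j]] for j in range(n) if j != i)
            out[assign[i]] += weight
        outgoing.append(tuple(out))
    return outgoing

# Hypothetical incoming messages for four binary variables.
incoming = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.7, 0.3)]
fast = exactly_one_messages(incoming)
slow = brute_force_messages(incoming)
```

The two results should match up to floating-point error. The fast version never enumerates joint assignments, because the constraint leaves only n assignments with nonzero weight and their contributions share almost all of their terms.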

Other Structures (Smith and Eisner)

- Parse trees
- Unique labels
- Label ordering
- Segmentation

Questions?
