A Generalization of Forward-backward Algorithm Ai Azuma Yuji Matsumoto Nara Institute of Science and Technology


Page 1: A Generalization of  Forward-backward Algorithm

A Generalization of Forward-backward Algorithm

Ai Azuma, Yuji Matsumoto

Nara Institute of Science and Technology

Page 2: A Generalization of  Forward-backward Algorithm

Forward-backward algorithm

• Allows efficient calculation of sums (e.g. expectation, ...) over all paths in a trellis.

• Plays an important role in sequence modeling
  • HMMs (Hidden Markov Models)
  • CRFs (Conditional Random Fields) [Lafferty et al., 2001]
  • ...

Page 3: A Generalization of  Forward-backward Algorithm

A sequential labeling example: part-of-speech tagging

[Figure: trellis for the sentence “Time flies like an arrow”. Between SOURCE and SINK, each of the five words has one node per candidate tag: noun, verb, prep., and indef. art.]

In CRFs and HMMs, we need to compute the "sum" of the probabilities (or scores) of all paths.

Page 4: A Generalization of  Forward-backward Algorithm

The forward-backward algorithm efficiently computes sums over all paths in the trellis by dynamic programming

Enumerating all paths in the trellis is intractable because the number of paths grows exponentially with the length of the sequence

The forward-backward algorithm recursively computes the sum from source to sink (forward) or from sink to source (backward), keeping intermediate results on each node and arc
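The contrast between enumeration and the forward recursion can be sketched as follows. This is a minimal illustration, not the authors' code: the layered trellis, the function names `brute_force_sum` and `forward_sum`, and the toy weights `node_w` and `arc_w` are all assumptions made for the example.

```python
from itertools import product

def brute_force_sum(node_w, arc_w):
    """Enumerate every label path and add up the products of its weights."""
    T, L = len(node_w), len(node_w[0])
    total = 0.0
    for path in product(range(L), repeat=T):  # L**T paths
        w = node_w[0][path[0]]
        for t in range(1, T):
            w *= arc_w[path[t - 1]][path[t]] * node_w[t][path[t]]
        total += w
    return total

def forward_sum(node_w, arc_w):
    """Same sum by the forward recursion: one pass over nodes and arcs."""
    T, L = len(node_w), len(node_w[0])
    # alpha[b] = total weight of all partial paths ending in label b
    alpha = list(node_w[0])
    for t in range(1, T):
        alpha = [node_w[t][b] * sum(alpha[a] * arc_w[a][b] for a in range(L))
                 for b in range(L)]
    return sum(alpha)

# Hypothetical toy trellis: 4 positions, 3 labels per position.
node_w = [[1.0, 2.0, 0.5], [0.3, 1.5, 1.0], [2.0, 0.1, 1.0], [1.0, 1.0, 3.0]]
arc_w = [[1.0, 0.5, 2.0], [0.2, 1.0, 1.0], [1.5, 1.0, 0.1]]

print(brute_force_sum(node_w, arc_w), forward_sum(node_w, arc_w))
```

Both functions return the same value up to rounding, but the brute-force version visits L**T paths while the forward pass does work proportional to the number of nodes and arcs.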

Page 5: A Generalization of  Forward-backward Algorithm

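The forward-backward algorithm is applicable to

Normalization constant of CRFs:

$Z(x) = \sum_{y \in Y} \prod_{c \in C_y} \exp(\lambda \cdot F(c, x))$

Feature expectation on CRFs:

$E_P[f_k] = \frac{1}{Z(x)} \sum_{y \in Y} \prod_{c \in C_y} \exp(\lambda \cdot F(c, x)) \cdot \sum_{c \in C_y} f_k(c, x)$

E-step for HMMs:

$E_P[t] = \sum_{y \in Y} \prod_{c \in C_y} \psi(c, x) \cdot \sum_{c \in C_y} \delta(c, t)$

t = type of node/node pair
$f_k$ = k-th feature
$C_y$ = set of nodes and arcs (cliques) in path y
Y = set of paths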

Page 6: A Generalization of  Forward-backward Algorithm

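Type of sums computable with forward-backward algorithm:

0th-order moment (normalization constant):

$\sum_{y \in Y} \prod_{c \in C_y} \psi(c)$

1st-order moment:

$\sum_{y \in Y} \prod_{c \in C_y} \psi(c) \cdot \sum_{c \in C_y} f(c)$

$\psi(c)$ = weight of clique c
$C_y$ = set of nodes and arcs (cliques) in path y
Y = set of paths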
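A 1st-order moment can be computed by combining forward and backward variables, as the sketch below suggests. It is only an illustration under assumptions: features are placed on nodes only, the trellis is layered, and the names `first_order_moment`, `brute_force_moment`, `node_w`, `arc_w`, and `feat` are invented for the example.

```python
from itertools import product

def first_order_moment(node_w, arc_w, feat):
    """Sum over paths of (path weight) * (sum of node features on the path)."""
    T, L = len(node_w), len(node_w[0])
    # alpha[t][b]: total weight of path prefixes ending in label b at position t
    alpha = [list(node_w[0])]
    for t in range(1, T):
        alpha.append([node_w[t][b] * sum(alpha[t - 1][a] * arc_w[a][b] for a in range(L))
                      for b in range(L)])
    # beta[t][a]: total weight of path suffixes that leave label a at position t
    beta = [[1.0] * L for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(arc_w[a][b] * node_w[t + 1][b] * beta[t + 1][b] for b in range(L))
                   for a in range(L)]
    # alpha * beta is the total weight of all paths through node (t, b)
    return sum(alpha[t][b] * beta[t][b] * feat[t][b]
               for t in range(T) for b in range(L))

def brute_force_moment(node_w, arc_w, feat):
    """Reference value obtained by enumerating every path."""
    T, L = len(node_w), len(node_w[0])
    total = 0.0
    for path in product(range(L), repeat=T):
        w = node_w[0][path[0]]
        for t in range(1, T):
            w *= arc_w[path[t - 1]][path[t]] * node_w[t][path[t]]
        total += w * sum(feat[t][path[t]] for t in range(T))
    return total

# Hypothetical toy trellis: 3 positions, 2 labels.
node_w = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
arc_w = [[1.0, 0.3], [0.7, 1.2]]
feat = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

print(first_order_moment(node_w, arc_w, feat), brute_force_moment(node_w, arc_w, feat))
```

Dividing this value by the 0th-order moment gives the expected feature count under the path distribution.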

Page 7: A Generalization of  Forward-backward Algorithm

But sometimes we need higher-order multivariate moments...

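$\sum_{y \in Y} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{n_1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{n_K}$

To name a few examples:
• Correlation between features
• Objectives more complex than log-likelihood
• Parameter differentiations of these
• ...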
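As a worked illustration of why correlation between features needs a 2nd-order moment, consider two features. The shorthand $M_{n_1,n_2}$, $F_k(y)$, and $P(y)$ below is introduced only for this example and is not notation from the slides.

```latex
% Shorthand (assumed for this example):
%   M_{n_1,n_2} = \sum_{y \in Y} \prod_{c \in C_y} \psi(c)
%                 \Bigl(\sum_{c \in C_y} f_1(c)\Bigr)^{n_1}
%                 \Bigl(\sum_{c \in C_y} f_2(c)\Bigr)^{n_2},
%   F_k(y) = \sum_{c \in C_y} f_k(c), \qquad
%   P(y) = \frac{1}{M_{0,0}} \prod_{c \in C_y} \psi(c).
\begin{align*}
E_P[F_1]      &= \frac{M_{1,0}}{M_{0,0}}, \qquad
E_P[F_2]       = \frac{M_{0,1}}{M_{0,0}}, \qquad
E_P[F_1 F_2]   = \frac{M_{1,1}}{M_{0,0}}, \\
\operatorname{Cov}_P(F_1, F_2)
              &= E_P[F_1 F_2] - E_P[F_1]\, E_P[F_2]
               = \frac{M_{1,1}}{M_{0,0}} - \frac{M_{1,0}\, M_{0,1}}{M_{0,0}^{2}}.
\end{align*}
```

The term $M_{1,1}$ multiplies two clique sums inside a single path sum, so it is neither a 0th-order nor a 1st-order moment.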

Page 8: A Generalization of  Forward-backward Algorithm

Our goal: To generalize forward-backward algorithm for higher-order multivariate moments!

Page 9: A Generalization of  Forward-backward Algorithm

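Can we derive dynamic programming for this formula?

$\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{n_1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{n_K}$

Answer: Record multiple forward/backward variables for each clique, and combine all the previously calculated values by the binomial theorem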

Page 10: A Generalization of  Forward-backward Algorithm

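[Figure: a node u in the trellis, reached from SOURCE]

$Y_{\mathrm{src},u}$ = set of paths from SOURCE to u

$\alpha_0(u) = \sum_{y \in Y_{\mathrm{src},u}} \prod_{c \in C_y} \psi(c)$

$\alpha_1(u) = \sum_{y \in Y_{\mathrm{src},u}} \prod_{c \in C_y} \psi(c) \cdot \sum_{c \in C_y} f(c)$

⋮

$\alpha_n(u) = \sum_{y \in Y_{\mathrm{src},u}} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f(c) \right)^{n}$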

Page 11: A Generalization of  Forward-backward Algorithm

Of the forward variables on the previous slide, ordinary forward-backward records only the 0th-order variable $\alpha_0(u) = \sum_{y \in Y_{\mathrm{src},u}} \prod_{c \in C_y} \psi(c)$, where $Y_{\mathrm{src},u}$ is the set of paths from SOURCE to u

Page 12: A Generalization of  Forward-backward Algorithm

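[Figure: node v in the trellis together with its direct ancestors, reached from SOURCE]

prev(v) = direct ancestors of v

$\alpha_0(v) = \psi(v) \sum_{x \in \mathrm{prev}(v)} \alpha_0(x)$

$\alpha_1(v) = \psi(v) \sum_{x \in \mathrm{prev}(v)} \alpha_1(x) + \psi(v) f(v) \sum_{x \in \mathrm{prev}(v)} \alpha_0(x)$

⋮

$\alpha_i(v) = \psi(v) \sum_{j=0}^{i} \binom{i}{j} f(v)^{i-j} \sum_{x \in \mathrm{prev}(v)} \alpha_j(x), \quad i = 0, \dots, n$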

Page 13: A Generalization of  Forward-backward Algorithm

The recursions for $\alpha_i(v)$ are derived from the binomial theorem: extending a path that ends at a direct ancestor x by the node v turns its accumulated feature sum s into s + f(v), and

$(s + f(v))^{i} = \sum_{j=0}^{i} \binom{i}{j} f(v)^{i-j} s^{j}$
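The binomial recursion can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation: it assumes weights and features on nodes only, a trellis given as a topologically ordered node list with a `prev` map, and the hypothetical names `generalized_forward`, `brute_force`, `psi`, and `f`. Here `alpha[v][i]` stands for the sum, over all paths from SOURCE to v, of the path weight times the i-th power of the path's feature sum.

```python
from math import comb, prod

def generalized_forward(order, prev, psi, f, n):
    """alpha[v][i] for i = 0..n, computed in one pass over the nodes."""
    alpha = {}
    for v in order:  # nodes in topological order, SOURCE first
        if prev[v]:
            # s[j] pools the j-th order variables of all direct ancestors
            s = [sum(alpha[x][j] for x in prev[v]) for j in range(n + 1)]
        else:
            # SOURCE: one empty prefix with weight 1 and feature sum 0
            s = [1.0] + [0.0] * n
        # binomial theorem: (s + f(v))^i = sum_j C(i, j) f(v)^(i-j) s^j
        alpha[v] = [psi[v] * sum(comb(i, j) * f[v] ** (i - j) * s[j]
                                 for j in range(i + 1))
                    for i in range(n + 1)]
    return alpha

def brute_force(prev, psi, f, v, n):
    """Reference moments obtained by enumerating every SOURCE-to-v path."""
    def paths(u):
        if not prev[u]:
            yield [u]
        for x in prev[u]:
            for p in paths(x):
                yield p + [u]
    out = [0.0] * (n + 1)
    for p in paths(v):
        w = prod(psi[c] for c in p)
        total_f = sum(f[c] for c in p)
        for i in range(n + 1):
            out[i] += w * total_f ** i
    return out

# Hypothetical toy trellis with two positions and two labels each.
order = ["SOURCE", "a1", "a2", "b1", "b2", "SINK"]
prev = {"SOURCE": [], "a1": ["SOURCE"], "a2": ["SOURCE"],
        "b1": ["a1", "a2"], "b2": ["a1", "a2"], "SINK": ["b1", "b2"]}
psi = {"SOURCE": 1.0, "a1": 0.5, "a2": 2.0, "b1": 1.5, "b2": 0.7, "SINK": 1.0}
f = {"SOURCE": 0.0, "a1": 1.0, "a2": 0.0, "b1": 2.0, "b2": 1.0, "SINK": 0.0}

alpha = generalized_forward(order, prev, psi, f, n=3)
print(alpha["SINK"], brute_force(prev, psi, f, "SINK", 3))
```

The values stored at SINK are the moments of order 0 through n; each node combines at most (n+1)(n+2)/2 binomial terms, so the work stays linear in the size of the trellis.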

Page 14: A Generalization of  Forward-backward Algorithm

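[Figure: SINK together with its direct ancestors, reached from SOURCE]

prev(SINK) = direct ancestors of SINK

$\alpha_0(\mathrm{SINK}) = \psi(\mathrm{SINK}) \sum_{x \in \mathrm{prev}(\mathrm{SINK})} \alpha_0(x)$

⋮

$\alpha_i(\mathrm{SINK}) = \psi(\mathrm{SINK}) \sum_{j=0}^{i} \binom{i}{j} f(\mathrm{SINK})^{i-j} \sum_{x \in \mathrm{prev}(\mathrm{SINK})} \alpha_j(x), \quad i = 0, \dots, n$

Desired values: $\alpha_i(\mathrm{SINK})$, $i = 0, \dots, n$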

Page 15: A Generalization of  Forward-backward Algorithm

Summary of Our Ideas

[Figure: nodes u and v between SOURCE and the rest of the trellis; u holds the variables $\alpha_0(u), \alpha_1(u), \dots, \alpha_n(u)$ and v holds $\alpha_0(v), \alpha_1(v), \dots, \alpha_n(v)$]

• Multiple variables for each clique

• Dependency between variables in a step, which is derived from the binomial theorem

Page 16: A Generalization of  Forward-backward Algorithm

For multivariate cases, forward/backward variables have multiple indices

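[Figure: node u holding one variable per multi-index, each paired with the sum it accumulates]

$\alpha_{0,\dots,0}(u)$ : $\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{0} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{0}$

$\alpha_{1,\dots,0}(u)$ : $\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{0}$

⋮

$\alpha_{n_1,\dots,n_K}(u)$ : $\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{n_1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{n_K}$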
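For several features, the same idea can be sketched with tuple-valued indices, applying the binomial theorem once per feature. As before, this is an assumption-laden illustration rather than the authors' code: weights and features sit on nodes only, paths run from SOURCE to the node in question, and `multivariate_forward`, `brute_force`, `feats`, and `ns` are hypothetical names.

```python
from itertools import product
from math import comb, prod

def multivariate_forward(order, prev, psi, feats, ns):
    """alpha[v][(i_1, ..., i_K)] for every 0 <= i_k <= ns[k]."""
    indices = list(product(*(range(n + 1) for n in ns)))
    zero = tuple(0 for _ in ns)
    alpha = {}
    for v in order:  # nodes in topological order, SOURCE first
        if prev[v]:
            s = {j: sum(alpha[x][j] for x in prev[v]) for j in indices}
        else:
            # SOURCE: one empty prefix with weight 1 and all feature sums 0
            s = {j: (1.0 if j == zero else 0.0) for j in indices}
        alpha[v] = {}
        for i in indices:
            total = 0.0
            # one binomial expansion per feature: all j with j_k <= i_k
            for j in product(*(range(ik + 1) for ik in i)):
                coef = prod(comb(ik, jk) * fk[v] ** (ik - jk)
                            for ik, jk, fk in zip(i, j, feats))
                total += coef * s[j]
            alpha[v][i] = psi[v] * total
    return alpha

def brute_force(prev, psi, feats, v, ns):
    """Reference moments obtained by enumerating every SOURCE-to-v path."""
    def paths(u):
        if not prev[u]:
            yield [u]
        for x in prev[u]:
            for p in paths(x):
                yield p + [u]
    out = {i: 0.0 for i in product(*(range(n + 1) for n in ns))}
    for p in paths(v):
        w = prod(psi[c] for c in p)
        sums = [sum(fk[c] for c in p) for fk in feats]
        for i in out:
            out[i] += w * prod(sk ** ik for sk, ik in zip(sums, i))
    return out

# Hypothetical toy trellis and two node features.
order = ["SOURCE", "a1", "a2", "b1", "b2", "SINK"]
prev = {"SOURCE": [], "a1": ["SOURCE"], "a2": ["SOURCE"],
        "b1": ["a1", "a2"], "b2": ["a1", "a2"], "SINK": ["b1", "b2"]}
psi = {"SOURCE": 1.0, "a1": 0.5, "a2": 2.0, "b1": 1.5, "b2": 0.7, "SINK": 1.0}
f1 = {"SOURCE": 0.0, "a1": 1.0, "a2": 0.0, "b1": 2.0, "b2": 1.0, "SINK": 0.0}
f2 = {"SOURCE": 0.0, "a1": 0.5, "a2": 1.0, "b1": 0.0, "b2": 3.0, "SINK": 0.0}

alpha = multivariate_forward(order, prev, psi, [f1, f2], ns=(2, 1))
print(alpha["SINK"][(2, 1)], brute_force(prev, psi, [f1, f2], "SINK", (2, 1))[(2, 1)])
```

The dictionary at SINK holds every mixed moment up to orders `ns` at once, which is what a covariance or a higher-order objective would draw on.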

Page 17: A Generalization of  Forward-backward Algorithm

To calculate the following form

$\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{n_1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{n_K}$

computational cost of the generalized forward-backward is proportional to

$(|V| + |E|) \, (n_1 + 1)^2 (n_2 + 1)^2 \cdots$

Computational cost is only linear in the number of nodes |V| and arcs |E| in the trellis

Page 18: A Generalization of  Forward-backward Algorithm

Merits of the generalized forward-backward algorithm

1. The generalized forward-backward subsumes many existing task-specific algorithms

2. For some tasks, it leads to a solution more efficient than the existing ones

Page 19: A Generalization of  Forward-backward Algorithm

Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms:

Task:

• Parameter diffs. of Hamming-loss for CRFs [Kakade et al., 2002]
• Parameter diffs. of entropy for CRFs [Mann et al., 2007]
• Hessian-vector product for CRFs [Vishwanathan et al., 2006]

[Table column "Sum to compute": one formula per task, each a sum over paths y of $\prod_{c \in C_y} \exp(\lambda \cdot F(c, x))$ multiplied by sums over the cliques $c \in C_y$]

Page 20: A Generalization of  Forward-backward Algorithm

Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms:

All these formulas have a form computable with our proposed method.

Page 21: A Generalization of  Forward-backward Algorithm

The previously proposed algorithms for these tasks are task-specific

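The generalized forward-backward is a task-independent algorithm applicable to formulae of the form

$\sum_{y \in Y_x} \prod_{c \in C_y} \psi(c) \cdot \left( \sum_{c \in C_y} f_1(c) \right)^{n_1} \cdots \left( \sum_{c \in C_y} f_K(c) \right)^{n_K}$

If a problem involves this form, it immediately offers an efficient solution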

Page 22: A Generalization of  Forward-backward Algorithm

Merits of the generalized forward-backward algorithm

1. The generalized forward-backward subsumes many existing task-specific algorithms

2. For some tasks, it leads to a solution more efficient than the existing ones

Page 23: A Generalization of  Forward-backward Algorithm

Merit 2. Efficient optimization procedure with respect to Generalized Expectation Criteria for CRFs [Mann et al., 2008]

[Figure: trellis in which some nodes are labeled as answers]

Algorithm proposed in [Mann et al., 2008]: computational cost is proportional to $L \, (|V| + |E|)$

By a specialization of the generalization: computational cost is proportional to $|V| + |E|$

(L = # of nodes labeled as answers)

Page 24: A Generalization of  Forward-backward Algorithm

Future tasks

• Explore other tasks to which our generalized forward-backward algorithm is applicable

• Extend the generalized forward-backward to trees and general graphs containing cycles

Page 25: A Generalization of  Forward-backward Algorithm

Summary

• We have generalized the forward-backward algorithm to allow for higher-order multivariate moments

• The generalization offers an efficient way to compute complex models of sequences that involve higher-order multivariate moments

• Many existing task-specific algorithms are instances of this generalization

• It leads to a faster algorithm for computing Generalized Expectation Criteria for CRFs

Page 26: A Generalization of  Forward-backward Algorithm


Thank you for your attention!