100
Liang Huang Suphan Fayong Yang Guo Information Sciences Institute University of Southern California NAACL 2012 Montréal June 2012 Structured Perceptron with Inexact Search the man bit the dog DT NN VBD DT NN x y x y=-1 y=+1 x the man bit the dog x y

Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

  • Upload
    others

  • View
    19

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Liang Huang Suphan Fayong Yang Guo

Information Sciences InstituteUniversity of Southern California

NAACL 2012 Montréal June 2012

Structured Perceptron

with Inexact Search

the man bit the dog

DT NN VBD DT NN

x

y

x

y=-1y=+1

xthe man bit the dog x

那 人 咬 了 狗 y

Page 2: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

2

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

binary classification

Page 3: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

2

the man bit the dog

DT NN VBD DT NN

x

y

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

binary classification

structured classification

Page 4: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

2

the man bit the dog

DT NN VBD DT NN

x

y y

update weightsif y ≠ z

w

x zexactinference

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

binary classification

structured classification

Page 5: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

• challenge: search efficiency (exponentially many classes)

• often use dynamic programming (DP)

• but still too slow for repeated use, e.g. parsing is O(n3)

• and can’t use non-local features in DP2

the man bit the dog

DT NN VBD DT NN

x

y y

update weightsif y ≠ z

w

x zexactinference

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

trivial

hard

constant# of classes

exponential # of classes

binary classification

structured classification

Page 6: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

beam searchgreedy search

Page 7: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

• routine use of inexact inference in NLP (e.g. beam search)

3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

beam searchgreedy search

Page 8: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

• routine use of inexact inference in NLP (e.g. beam search)

• how does structured perceptron work with inexact search?

3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

beam searchgreedy search

Page 9: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

• routine use of inexact inference in NLP (e.g. beam search)

• how does structured perceptron work with inexact search?

• so far most structured learning theory assume exact search

3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

beam searchgreedy search

Page 10: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

• routine use of inexact inference in NLP (e.g. beam search)

• how does structured perceptron work with inexact search?

• so far most structured learning theory assume exact search

• would search errors break these learning properties?

3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

beam searchgreedy search

Page 11: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Perceptron w/ Inexact Inference

• routine use of inexact inference in NLP (e.g. beam search)

• how does structured perceptron work with inexact search?

• so far most structured learning theory assume exact search

• would search errors break these learning properties?

• if so how to modify learning to accommodate inexact search?3

the man bit the dog

DT NN VBD DT NN

x

y

x zinexactinference

y

update weightsif y ≠ z

w

does it still work???

beam searchgreedy search

Page 12: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Prior work: Early update (Collins/Roark)

4

x zgreedy or beam

y

early update on

prefixes y’, z’

w

Page 13: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Prior work: Early update (Collins/Roark)

• a partial answer: “early update” (Collins & Roark, 2004)

• a heuristic for perceptron with greedy or beam search

• updates on prefixes rather than full sequences

• works much better than standard update in practice, but...

4

x zgreedy or beam

y

early update on

prefixes y’, z’

w

Page 14: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Prior work: Early update (Collins/Roark)

• a partial answer: “early update” (Collins & Roark, 2004)

• a heuristic for perceptron with greedy or beam search

• updates on prefixes rather than full sequences

• works much better than standard update in practice, but...

• two major problems for early update

• there is no theoretical justification -- why does it work?

• it learns too slowly (due to partial examples); e.g. 40 epochs

4

x zgreedy or beam

y

early update on

prefixes y’, z’

w

Page 15: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Prior work: Early update (Collins/Roark)

• a partial answer: “early update” (Collins & Roark, 2004)

• a heuristic for perceptron with greedy or beam search

• updates on prefixes rather than full sequences

• works much better than standard update in practice, but...

• two major problems for early update

• there is no theoretical justification -- why does it work?

• it learns too slowly (due to partial examples); e.g. 40 epochs

• we’ll solve problems in a much larger framework4

x zgreedy or beam

y

early update on

prefixes y’, z’

w

Page 16: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Our Contributions

• theory: a framework for perceptron w/ inexact search

• explains early update (and others) as a special case

• practice: new update methods within the framework

• converges faster and better than early update

• real impact on state-of-the-art parsing and tagging

• more advantageous when search error is severer

5

x zgreedy or beam

y

early update on

prefixes y’, z’

w

Page 17: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

In this talk...

• Motivations: Structured Learning and Search Efficiency

• Structured Perceptron and Inexact Search

• perceptron does not converge with inexact search

• early update (Collins/Roark ’04) seems to help; but why?

• New Perceptron Framework for Inexact Search

• explains early update as a special case

• convergence theory with arbitrarily inexact search

• new update methods within this framework

• Experiments6

Page 18: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

• simple generalization from binary/multiclass perceptron

• online learning: for each example (x, y) in data

• inference: find the best output z given current weight w

• update weights when if y ≠ z

7

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

Page 19: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

• simple generalization from binary/multiclass perceptron

• online learning: for each example (x, y) in data

• inference: find the best output z given current weight w

• update weights when if y ≠ z

7

the man bit the dog

DT NN VBD DT NN

x

y

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

Page 20: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

• simple generalization from binary/multiclass perceptron

• online learning: for each example (x, y) in data

• inference: find the best output z given current weight w

• update weights when if y ≠ z

7

the man bit the dog

DT NN VBD DT NN

x

y y

update weightsif y ≠ z

w

x zexactinference

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

Page 21: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Structured Perceptron (Collins 02)

• simple generalization from binary/multiclass perceptron

• online learning: for each example (x, y) in data

• inference: find the best output z given current weight w

• update weights when if y ≠ z

7

the man bit the dog

DT NN VBD DT NN

x

y y

update weightsif y ≠ z

w

x zexactinference

x

y=-1y=+1

x

y

update weightsif y ≠ z

w

x zexactinference

trivial

hard

constant classes

exponential classes

Page 22: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Convergence with Exact Search• linear classification: converges iff. data is separable

• structured: converges iff. data separable & search exact

• there is an oracle vector that correctly labels all examples

• one vs the rest (correct label better than all incorrect labels)

• theorem: if separable, then # of updates ≤ R2 / δ2 R: diameter

8

y=+1y=-1

y100

z ≠ y100

x100

x100 x111

x2000

x3012 δ

Rosenblatt => Collins 1957 2002

Page 23: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Convergence with Exact Search• linear classification: converges iff. data is separable

• structured: converges iff. data separable & search exact

• there is an oracle vector that correctly labels all examples

• one vs the rest (correct label better than all incorrect labels)

• theorem: if separable, then # of updates ≤ R2 / δ2 R: diameter

8

y=+1y=-1

y100

z ≠ y100

x100

x100 x111

x2000

x3012

R: diameter

δ

Rosenblatt => Collins 1957 2002

Page 24: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Convergence with Exact Search• linear classification: converges iff. data is separable

• structured: converges iff. data separable & search exact

• there is an oracle vector that correctly labels all examples

• one vs the rest (correct label better than all incorrect labels)

• theorem: if separable, then # of updates ≤ R2 / δ2 R: diameter

8

y=+1y=-1

y100

z ≠ y100

x100

x100 x111

x2000

x3012

R: diameter

δδ

Rosenblatt => Collins 1957 2002

Page 25: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Convergence with Exact Search• linear classification: converges iff. data is separable

• structured: converges iff. data separable & search exact

• there is an oracle vector that correctly labels all examples

• one vs the rest (correct label better than all incorrect labels)

• theorem: if separable, then # of updates ≤ R2 / δ2 R: diameter

8

y=+1y=-1

y100

z ≠ y100

x100

x100 x111

x2000

x3012

R: diameterR: diameter

δδ

Rosenblatt => Collins 1957 2002

Page 26: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

correctlabel

current model

Page 27: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

w(k)

correctlabel

current model

Page 28: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

N

w(k)

correctlabel

current model

Page 29: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

N

w(k)

correctlabel

current model

Page 30: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

V NN

V V

w(k)

correctlabel

current model

Page 31: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

V NN

V V

w(k)

correctlabel

current model

Page 32: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

correctlabel

current model

Page 33: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

correctlabel

updat

e

current model

Page 34: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

correctlabel

updat

e

current model

new model

Page 35: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

No Convergence w/ Greedy Search

9

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

standard perceptron does not converge with greedy search

correctlabel

updat

e

current model

new model

Page 36: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

stop and update at the first mistake

standard perceptron does not converge with greedy search

correctlabel

current model

updat

e

new model

Page 37: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

w(k)

stop and update at the first mistake

standard perceptron does not converge with greedy search

correctlabel

current model

updat

e

new model

Page 38: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

w(k)

stop and update at the first mistake

V

N

standard perceptron does not converge with greedy search

correctlabel

current model

updat

e

new model

Page 39: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

w(k)

stop and update at the first mistake

V

N

standard perceptron does not converge with greedy search

correctlabel

current model

updat

e

new model

Page 40: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

��(x

,

y

,

z)

w(k)

stop and update at the first mistake

V

N

standard perceptron does not converge with greedy search

correctlabel

current model

updat

e

new model

Page 41: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early update (Collins/Roark 2004) to rescue

10

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V N

w(k)

w(k+

1)

��(x

,

y

,

z)w (k+1)

w(k)

stop and update at the first mistake

V

N

standard perceptron does not converge with greedy search

correctlabel

current model

new model

updat

e

new model

Page 42: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Why?

• why does inexact search break convergence property?

• what is required for convergence? exactness?

• why does early update (Collins/Roark 04) work?

• it works well in practice and is now a standard method

• but there has been no theoretical justification

• we answer these Qs by inspecting the convergence proof11

V

V N

N V

N

V V ��(x

,

y

,

z)w (k+1)

w(k)

Page 43: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y correct label

Page 44: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

correct label

curr

ent

mod

el

zexact1-best

Page 45: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

perceptron update:

zexact1-best

Page 46: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

Page 47: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

unit oracle vector u

Page 48: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

δseparation

unit oracle vector u

margin� �

Page 49: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

(by induction)

δseparation

unit oracle vector u

margin� �

Page 50: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 1

12

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

(by induction)

δseparation

unit oracle vector u

margin� �

(part 1: upperbound)

Page 51: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y correct label

Page 52: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

correct label

curr

ent

mod

el

zexact1-best

Page 53: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

perceptron update:

zexact1-best

Page 54: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

Page 55: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

Page 56: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

zexact1-best

Page 57: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

R: max diameter

zexact1-best

diameter R2

Page 58: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

R: max diameter

zexact1-best

diameter R2

Page 59: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

<90˚

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

violation: incorrect label scored higher

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

violation

R: max diameter

zexact1-best

diameter R2

Page 60: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

<90˚

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

violation: incorrect label scored higher

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

violation

by induction:

R: max diameter

zexact1-best

diameter R2

(part 2: upperbound)

Page 61: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

<90˚

Geometry of Convergence Proof pt 2

13

y

update weightsif y ≠ z

wx zexact

inference

y

w(k

)

w(k+1)

violation: incorrect label scored higher

parts 1+2 => update bounds:

correct label

��(x

,

y

,

z)update

curr

ent

mod

el

update

new

m

odel

perceptron update:

violation

by induction:

R: max diameter

zexact1-best

diameter R2

k R2/�2(part 2: upperbound)

Page 62: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

<90˚

Violation is All we need!

14

y

violation: incorrect label scored higher

correct label

��(x

,

y

,

z)update

• exact search is not really required by the proof

• rather, it is only used to ensure violation!

w(k

)

w(k+1)

curr

ent

mod

el

update

new

m

odel

R: max diameter

zexact1-best

the proof only uses 3 facts:

1. separation (margin)2. diameter (always finite)3. violation (but no need for exact)

Page 63: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

<90˚

Violation is All we need!

14

y

violation: incorrect label scored higher

correct label

��(x

,

y

,

z)update

• exact search is not really required by the proof

• rather, it is only used to ensure violation!

w(k

)

w(k+1)

curr

ent

mod

el

update

new

m

odel

R: max diameter

all violations

zexact1-best

the proof only uses 3 facts:

1. separation (margin)2. diameter (always finite)3. violation (but no need for exact)

Page 64: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

y

Violation-Fixing Perceptron• if we guarantee violation, we don’t care about exactness!

• violation is good b/c we can at least fix a mistake

15

y

update weightsif y ≠ z

wx zexact

inferencey

update weightsif y’ ≠ z

wx zfind

violation y’

same mistake bound as before!

standard perceptron violation-fixing perceptron

all violations

Page 65: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

y

Violation-Fixing Perceptron• if we guarantee violation, we don’t care about exactness!

• violation is good b/c we can at least fix a mistake

15

y

update weightsif y ≠ z

wx zexact

inferencey

update weightsif y’ ≠ z

wx zfind

violation y’

same mistake bound as before!

standard perceptron violation-fixing perceptron

all violations

Page 66: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

y

Violation-Fixing Perceptron• if we guarantee violation, we don’t care about exactness!

• violation is good b/c we can at least fix a mistake

15

y

update weightsif y ≠ z

wx zexact

inferencey

update weightsif y’ ≠ z

wx zfind

violation y’

same mistake bound as before!

standard perceptron violation-fixing perceptron

all violations

all po

ssible

upda

tes

Page 67: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

What if can’t guarantee violation

16

currentmodel

Page 68: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

What if can’t guarantee violation• this is why perceptron doesn’t work well w/ inexact search

• because not every update is guaranteed to be a violation

• thus the proof breaks; no convergence guarantee

16

currentmodel

Page 69: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

What if can’t guarantee violation• this is why perceptron doesn’t work well w/ inexact search

• because not every update is guaranteed to be a violation

• thus the proof breaks; no convergence guarantee

• example: beam or greedy search

• the model might prefer the correct label (if exact search)

• but the search prunes it away

• such a non-violation update is “bad”because it doesn’t fix any mistake

• the new model still misguides the search

16

beam

bad

update

currentmodel

Page 70: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

correctlabel

Page 71: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

w(k)

correctlabel

Page 72: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

N

w(k)

correctlabel

Page 73: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

V

V NN

V V

w(k)

correctlabel

Page 74: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

correctlabel

Page 75: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Standard Update: No Guarantee

17

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

standard update doesn’t converge

b/c it doesn’t guarantee violation

correct label scores higher.non-violation: bad update!

correctlabel

Page 76: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: Guarantees Violation

18

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

standard update doesn’t converge

b/c it doesn’t guarantee violation

correctlabel

Page 77: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: Guarantees Violation

18

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

w(k)

standard update doesn’t converge

b/c it doesn’t guarantee violation

correctlabel

Page 78: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: Guarantees Violation

18

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

w(k)

V

N

standard update doesn’t converge

b/c it doesn’t guarantee violation

correctlabel

Page 79: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: Guarantees Violation

18

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

w(k)

V

N

standard update doesn’t converge

b/c it doesn’t guarantee violation

correctlabel

Page 80: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: Guarantees Violation

18

w(k)

��(x,

y

,

z

)

V

V N

N V

N

V V training example time fliesN V

output space{N,V} x {N, V}

N VV

V NN

V V

w(k)

w(k+

1)

��(x

,

y

,

z)w (k+1)

w(k)

early update: incorrect prefix scores higher: a violation!

V

N

standard update doesn’t converge

b/c it doesn’t guarantee violation

correctlabel

Page 81: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: from Greedy to Beam

• beam search is a generalization of greedy (where b=1)

• at each stage we keep top b hypothesis

• widely used: tagging, parsing, translation...

• early update -- when correct label first falls off the beam

• up to this point the incorrect prefix should score higher

• standard update (full update) -- no guarantee!

19

standard update(no guarantee!)

Page 82: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: from Greedy to Beam

• beam search is a generalization of greedy (where b=1)

• at each stage we keep top b hypothesis

• widely used: tagging, parsing, translation...

• early update -- when correct label first falls off the beam

• up to this point the incorrect prefix should score higher

• standard update (full update) -- no guarantee!

19

correct labelfalls off beam

(pruned)

correct

standard update(no guarantee!)

Page 83: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: from Greedy to Beam

• beam search is a generalization of greedy (where b=1)

• at each stage we keep top b hypothesis

• widely used: tagging, parsing, translation...

• early update -- when correct label first falls off the beam

• up to this point the incorrect prefix should score higher

• standard update (full update) -- no guarantee!

19

correct labelfalls off beam

(pruned)

correct

incorrect

standard update(no guarantee!)

Page 84: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update: from Greedy to Beam

• beam search is a generalization of greedy (where b=1)

• at each stage we keep top b hypothesis

• widely used: tagging, parsing, translation...

• early update -- when correct label first falls off the beam

• up to this point the incorrect prefix should score higher

• standard update (full update) -- no guarantee!

19

earl

y up

date

correct labelfalls off beam

(pruned)

correct

incorrect

violation guaranteed: incorrect prefix scores higher up to this point

standard update(no guarantee!)

Page 85: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update as Violation-Fixing

20

beam

correct labelfalls off beam

(pruned)

standard update(bad!)

y

update weightsif y’ ≠ z

w

x zfind

violationy’

prefix violations

y’

z

yea

rly

upda

te

Page 86: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Early Update as Violation-Fixing

20

beam

correct labelfalls off beam

(pruned)

standard update(bad!)

y

update weightsif y’ ≠ z

w

x zfind

violationy’

prefix violations

y’

z

y

also new definition of“beam separability”:

a correct prefix should score higher than

any incorrect prefix of the same length(maybe too strong)

cf. Kulesza and Pereira,2007ea

rly

upda

te

Page 87: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

New Update Methods: max-violation, ...

21

beam

earlystandard

(bad!)

max-violation

latest

• we now established a theory for early update (Collins/Roark)

• but it learns too slowly due to partial updates

• max-violation: use the prefix where violation is maximum

• “worst-mistake” in the search space

• all these update methods are violation-fixing perceptrons

Page 88: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Experiments

the man bit the dog

DT NN VBD DT NN

x

y

the man bit the dog x

y

bit

man

the

dog

the

trigram part-of-speech tagging incremental dependency parsing

local features only,exact search tractable

(proof of concept)

non-local features,exact search intractable

(real impact)

Page 89: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

1) Trigram Part of Speech Tagging

23

• standard update performs terribly with greedy search (b=1)

• because search error is severe at b=1: half updates are bad!

• no real difference beyond b=2: search error becomes rare

% of bad (non-violation)standard updates 53% 10% 1.5% 0.5%

Page 90: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Max-Violation Reduces Training Time

24

• max-violation peaks at b=2, greatly reduced training time

• early update achieves the highest dev/test accuracy

• higher than the best published accuracy (Shen et al ‘07)

• future work: add non-local features to tagging

beam iter time test

standard

early

max-violation

- 6 162m 97.28

4 6 37m 97.35

2 3 26m 97.33

Shen et al (2007)Shen et al (2007)Shen et al (2007)Shen et al (2007) 97.33

Page 91: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

2) Incremental Dependency Parsing• DP incremental dependency parser (Huang and Sagae 2010)

• non-local history-based features rule out exact DP

• we use beam search, and search error is severe

• baseline: early update. extremely slow: 38 iterations

25

Page 92: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Max-violation converges much faster

• early update: 38 iterations, 15.4 hours (92.24)

• max-violation: 10 iterations, 4.6 hours (92.25) 12 iterations, 5.5 hours (92.32)

26

Page 93: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Comparison b/w tagging & parsing• search error is much more severe in parsing than in tagging

• standard update is OK in tagging except greedy search (b=1)

• but performs horribly in parsing even at large beam (b=8)

• because ~50% of standard updates are bad (non-violation)!

27

test

standard

early

max-violation

79.1

92.1

92.2

% of bad standard updates

Page 94: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Comparison b/w tagging & parsing• search error is much more severe in parsing than in tagging

• standard update is OK in tagging except greedy search (b=1)

• but performs horribly in parsing even at large beam (b=8)

• because ~50% of standard updates are bad (non-violation)!

27

test

standard

early

max-violation

79.1

92.1

92.2

% of bad standard updates

take-home message:our methods are more helpful for harder search problems!

Page 95: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Related Work and Discussions

28

Page 96: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Related Work and Discussions• our “violation-fixing” framework include as special cases

• early-update (Collins and Roark, 2004)

• a variant of LaSO (Daume and Marcu, 2005)

• not sure about Searn (Daume et al, 2009)

28

Page 97: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Related Work and Discussions• our “violation-fixing” framework include as special cases

• early-update (Collins and Roark, 2004)

• a variant of LaSO (Daume and Marcu, 2005)

• not sure about Searn (Daume et al, 2009)

• “beam-separability” or “greedy-separability” related to:

• “algorithmic-separability” of (Kulesza and Pereira, 2007)

• but these conditions are too strong to hold in practice

28

Page 98: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Related Work and Discussions• our “violation-fixing” framework include as special cases

• early-update (Collins and Roark, 2004)

• a variant of LaSO (Daume and Marcu, 2005)

• not sure about Searn (Daume et al, 2009)

• “beam-separability” or “greedy-separability” related to:

• “algorithmic-separability” of (Kulesza and Pereira, 2007)

• but these conditions are too strong to hold in practice

• under-generating (beam) vs. over-generating (LP-relax.)

• Kulesza & Pereira and Martins et al (2011): LP-relaxation

• Finley and Joachims (2008): both under and over for SVM28

Page 99: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Conclusions• Structured Learning with Inexact Search is Important

• Two contributions from this work:

• theory: a general violation-fixing perceptron framework

• convergence for inexact search under new defs of separability

• subsumes previous work (early update & LaSO) as special cases

• practice: new update methods within this framework

• “max-violation” learns faster and better than early update

• dramatically reducing training time by 3-5 folds

• improves over state-of-the-art tagging and parsing systems

• our methods are more helpful to harder search problems! :)29

Page 100: Structured Perceptron with Inexact Searchweb.engr.oregonstate.edu/~huanlian/slides/perceptron-inexact-NAACL.pdfStructured Perceptron (Collins 02) •simple generalization from binary/multiclass

Thank you!

Liang apologizes for not able to come due to visa/passport reasons.

Many thanks to Philipp Koehn for presenting it! Danke!

% of bad updatesin standard perceptron

[email protected]

parsing accuracy on held-out