
Page 1

Better k-best Parsing

Liang Huang (Penn)

David Chiang (Maryland)

9th International Workshop on Parsing Technologies (IWPT 2005), Vancouver, B.C., Canada

Page 2

[figure: the NLP pipeline, POS tagging → syntactic parsing → semantic interpretation, with stages connected either by a compact representation (lattice/forest) or by k-best lists]

Motivations

• NLP pipeline
  – the 1-best output is not always optimal for later stages
  – postpone disambiguation to the next phase
  – is the next phase compatible with the current one?
    • if yes: packed representation (forest, lattice)
    • if no: k-best lists

• Discriminative training
  – reranking (Collins, 2000)
  – minimum error rate training (Och, 2003)
  – k-best MIRA / perceptron (McDonald, Crammer and Pereira, 2005)

Page 3

Previous Work

• Collins (2000); Bikel (2004)
  – turn off dynamic programming
  – aggressive pruning
    • tight beam width
    • hard cell limit

• Charniak and Johnson (ACL 2005)
  – multi-pass, coarse-to-fine k-best parsing
  – improvement in f-score: 89.7% → 91.0%

• Jiménez and Marzal (2000)
  – very close to our lazy offline algorithm (but for CKY only)
  – tested on a tiny WSJ grammar (512 rules)

Page 4

Outline

• Formulation
  – directed monotonic hypergraphs

• Algorithms
  – Alg. 0 through Alg. 3

• Experiments
  – k-best parsing on top of the Collins/Bikel parser
  – k-best CKY-based Hiero decoder (Chiang, 2005)

• Conclusion

Page 5

Hypergraph

• A hypergraph is a pair ⟨V, E⟩
  – V is the set of vertices (the items in a derivation)
  – E is the set of hyperedges; each hyperedge connects several vertices (the antecedents of a derivation rule) to one vertex (the consequent of the rule)

Page 6

Packed Forest as Hypergraph

[figure: a parse of “I saw a boy with a telescope”, with vertices such as S, NP, VP, PP and the words as leaves]

logical deduction (Shieber et al., 1995)

hypergraph search (Klein & Manning, 2001)

Page 7

Packed Forest as Hypergraph

[figure: the same sentence with a second derivation packed in, sharing vertices in one forest (the PP attaches either inside the VP or inside the NP)]

logical deduction (Shieber et al., 1995)

hypergraph search (Klein & Manning, 2001)

Page 8

Packed Forest as Hypergraph

[figure: the packed forest annotated as a hypergraph: the items are its vertices and the deductive steps are its hyperedges]

logical deduction (Shieber et al., 1995)

hypergraph search (Klein & Manning, 2001)

weighted deduction (Nederhof, 2003)

weighted hypergraph

Page 9

Weighted Hypergraph

• A weighted hypergraph is a tuple ⟨V, E, t, R⟩
• t – the target vertex (goal item), e.g. t = (S, 1, n)
• R – the weight set, with a total ordering ≤
• every hyperedge e = ⟨T(e), h(e), f⟩, where f is its weight function

[figure: a hyperedge e = ⟨T(e), h(e), f⟩ whose tail vertices carry weights a, b, c and whose head vertex gets f (a, b, c)]  (Nederhof, 2003)
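To make this formulation concrete, here is a minimal Python sketch of the ⟨V, E, t, R⟩ structure. It is not from the slides; the names (Vertex, Hyperedge, Hypergraph) are illustrative, and plain floats stand in for the ordered weight set R.

```python
# Illustrative sketch of a weighted hypergraph <V, E, t, R>; all names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vertex = Tuple[str, int, int]        # an item such as ("S", 1, 7); the goal is t = (S, 1, n)

@dataclass
class Hyperedge:
    tails: Tuple[Vertex, ...]        # T(e): the antecedent items
    head: Vertex                     # h(e): the consequent item
    f: Callable[..., float]          # weight function, one argument per tail

@dataclass
class Hypergraph:
    edges: List[Hyperedge] = field(default_factory=list)
    target: Vertex = ("S", 1, 1)     # t: the goal item

    def incoming(self) -> Dict[Vertex, List[Hyperedge]]:
        """Group hyperedges by their head vertex."""
        index: Dict[Vertex, List[Hyperedge]] = {}
        for e in self.edges:
            index.setdefault(e.head, []).append(e)
        return index
```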

Page 10

Monotonic Weight Functions

• all weight functions must be monotonic in each of their arguments
• this gives the optimal-substructure property needed for dynamic programming

[figure: monotonicity for a hyperedge from B and C to A: if B: b′ ≤ b and C: c, then A: f (b′, c) ≤ f (b, c)]

CKY example: e = ⟨((NP, 1, 2), (VP, 3, 5)), (S, 1, 5), f⟩ with f (b, c) = b · c · Pr(S → NP VP)
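As a worked check of this requirement on the CKY weight function above (our illustration, not from the slides):

```latex
% f(b, c) = b \cdot c \cdot \Pr(S \to NP\ VP); since c and the rule probability are
% nonnegative, changing one antecedent moves the consequent in the same direction:
b' \le b \;\Longrightarrow\;
f(b', c) = b' \cdot c \cdot \Pr(S \to NP\ VP)
        \;\le\; b \cdot c \cdot \Pr(S \to NP\ VP) = f(b, c).
```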

Page 11

in CKY, t = (S, 1, n)

k-best Problem in a Hypergraph

• 1-best problem
  – find the best derivation of the target vertex t
• k-best problem
  – find the top k derivations of the target vertex t
• assumptions
  – acyclic, so that we can use a topological order

Page 12

Outline

• Formulations
• Algorithms
  – Algorithm 0: naïve polynomial
  – Algorithm 1: speeding up multk
  – Algorithm 2: speeding up mergek
  – Algorithm 3: offline lazy algorithm
• Experiments
• Conclusion and Future Work

Page 13

Generic 1-best Viterbi Algorithm

• traverse the hypergraph in topological order
  – for each incoming hyperedge
    • compute the result of the f function along the hyperedge
    • update the 1-best value for the current vertex if possible

[figure: a single hyperedge f1 from tails u: a and w: b to head v: f1(a, b)]

Page 14

Generic 1-best Viterbi Algorithm

[figure: two hyperedges f1 (tails u: a, w: b) and f2 (tails u′: c, w′: d) into v: better( f1(a, b), f2(c, d))]

• traverse the hypergraph in topological order
  – for each incoming hyperedge
    • compute the result of the f function along the hyperedge
    • update the 1-best value for the current vertex if possible

Page 15

Generic 1-best Viterbi Algorithm

[figure: with all incoming hyperedges considered, v: better( better( f1(a, b), f2(c, d)), …)]

… overall time complexity: O(|E|)

CKY + CNF: |E| = O(n³|P|)

• traverse the hypergraph in topological order
  – for each incoming hyperedge
    • compute the result of the f function along the hyperedge
    • update the 1-best value for the current vertex if possible
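A compact Python sketch of this generic traversal (ours, not the authors' code). It assumes the non-leaf vertices are already given in topological order, each hyperedge is a (tails, head, f) tuple, leaves carry their own base weights, and larger weights are better.

```python
# Sketch of the generic 1-best Viterbi algorithm over an acyclic hypergraph.
from typing import Callable, Dict, List, Tuple

Vertex = str
Edge = Tuple[Tuple[Vertex, ...], Vertex, Callable[..., float]]   # (tails, head, f)

def viterbi_1best(order: List[Vertex], edges: List[Edge],
                  leaf_weight: Dict[Vertex, float]) -> Dict[Vertex, float]:
    incoming: Dict[Vertex, List[Edge]] = {}
    for e in edges:
        incoming.setdefault(e[1], []).append(e)

    best: Dict[Vertex, float] = dict(leaf_weight)        # axioms / leaves
    for v in order:                                      # topological order
        for tails, _, f in incoming.get(v, []):
            w = f(*(best[u] for u in tails))             # apply f along the hyperedge
            if v not in best or w > best[v]:             # keep the better value
                best[v] = w
    return best

# tiny usage example: two hyperedges deriving "S"
edges = [(("NP", "VP"), "S", lambda a, b: a * b * 0.9),
         (("NP2", "VP2"), "S", lambda a, b: a * b * 0.8)]
leaves = {"NP": 0.7, "VP": 0.6, "NP2": 0.4, "VP2": 0.5}
print(viterbi_1best(["S"], edges, leaves)["S"])          # ≈ 0.378 (= 0.7 · 0.6 · 0.9)
```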

Page 16

Dynamic Programming: 1957

Dr. Andrew Viterbi

“We knew everything so far in your talk 40 years ago.”

Dr. Richard Bellman

Page 17

k-best Viterbi Algorithm 0: naïve

• straightforward k-best extension:
  – a vector of length k instead of a single value
  – vector components kept sorted
  – now what is f (a, b)?
• k² values – the Cartesian product f (ai, bj)
• we just need the top k out of the k² values
• O(k² log k) (sorting) or O(k²) (selection)

[figure: hyperedge f1 from tails u: a and w: b to head v: multk( f1, a, b)]

multk( f, a, b) = topk { f (ai, bj) }

Page 18

Algorithm 0: naïve

[figure: two hyperedges f1 (tails u: a, w: b) and f2 (tails u′: c, w′: d) into v: mergek(multk( f1, a, b), multk( f2, c, d))]

overall time complexity: O(k²|E|)

• straightforward k-best extension:
  – a vector of length k instead of a single value
  – and how to update?
    • merge the two sorted length-k vectors (2k elements)
    • select the top k elements: O(k)
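A naïve Python sketch of these two steps (ours): multk forms all k² combinations and keeps the top k by sorting, and the update merges two sorted k-lists in O(k). Larger weights are assumed better; the numbers are made up.

```python
# Naive mult_k and merge (Algorithm 0); illustrative sketch, larger weights are better.
from typing import Callable, List

def mult_k(f: Callable[[float, float], float],
           a: List[float], b: List[float], k: int) -> List[float]:
    """Top k of the k^2 Cartesian products f(a_i, b_j), by sorting: O(k^2 log k)."""
    return sorted((f(x, y) for x in a for y in b), reverse=True)[:k]

def merge_k(u: List[float], v: List[float], k: int) -> List[float]:
    """Top k of two sorted (best-first) k-lists, i.e. 2k elements, in O(k)."""
    out: List[float] = []
    i = j = 0
    while len(out) < k and (i < len(u) or j < len(v)):
        if j >= len(v) or (i < len(u) and u[i] >= v[j]):
            out.append(u[i]); i += 1
        else:
            out.append(v[j]); j += 1
    return out

a, b = [0.6, 0.4, 0.3], [0.5, 0.4, 0.3]
ab = mult_k(lambda x, y: x * y, a, b, k=3)       # ≈ [0.30, 0.24, 0.20]
print(merge_k(ab, [0.28, 0.10, 0.05], k=3))      # ≈ [0.30, 0.28, 0.24]
```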

Page 19

Algorithm 1: speedup multk

• only interested in the top k – why enumerate all k²?
• a and b are sorted!
• f is monotonic!
• so …?
• f (a1, b1) must be the 1-best
• the 2nd-best must be either f (a2, b1) or f (a1, b2)
• what about the 3rd-best?

multk ( f, a, b) = topk{ f (ai, bj) }

Page 20

Algorithm 1 (Demo)

[figure: grid of products f (ai, bj) over the sorted vectors a and b; the corner f (a1, b1) = .30 is the 1-best, with .24 and .20 as the frontier candidates]

f (a, b) = ab

Page 21

Algorithm 1 (Demo)

[figure: after extracting .30, the frontier holds .24 and .20; their neighbors .18 and .16 come into view]

f (a, b) = ab

Page 22

Algorithm 1 (Demo)

[figure: the grid a few iterations later; .15 is now exposed on the frontier]

use a priority queue (heap) to store the candidates (the frontier); in each iteration:

1. extract-max from the heap

2. push the two “shoulders” into the heap

k iterations.

O(k log k |E|) overall time
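A Python sketch of this frontier idea for a single hyperedge (ours, not the authors' code): a heap over grid cells, popping the current best and pushing its two "shoulders". The example numbers echo the values visible in the demo grid.

```python
# Algorithm 1 style mult_k: lazily enumerate the top k of { f(a_i, b_j) } with a heap.
# Illustrative sketch; a and b are sorted best-first, f is monotonic, larger is better,
# so values are negated for Python's min-heap.
import heapq
from typing import Callable, List

def mult_k_lazy(f: Callable[[float, float], float],
                a: List[float], b: List[float], k: int) -> List[float]:
    if not a or not b:
        return []
    heap = [(-f(a[0], b[0]), 0, 0)]               # the best corner f(a1, b1)
    seen = {(0, 0)}
    out: List[float] = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)           # extract-max
        out.append(-neg)
        for ni, nj in ((i + 1, j), (i, j + 1)):   # push the two "shoulders"
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-f(a[ni], b[nj]), ni, nj))
    return out

print(mult_k_lazy(lambda x, y: x * y, [0.6, 0.4, 0.3, 0.3], [0.5, 0.4, 0.3, 0.1], k=4))
# ≈ [0.30, 0.24, 0.20, 0.18]
```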

Page 23

Algorithm 2: speedup mergek

• if a vertex has d incoming hyperedges, Algorithm 1 takes time O(d k log k)
  – d multk and d mergek operations
• multk( f, a, b) is just an intermediate result; we are only interested in the result of mergek(multk( f1, a, b), …, multk( fd, x, y))

[figure: vertex v with d incoming hyperedges f1, …, fi, …, fd over tails u: a, w: b, …, p: x, q: y]

Page 24

Algorithm 2 (Demo)

• can we do the mergek and multk simultaneously? same trick – heapsort

[figure: vertex v with d = 3 incoming hyperedges and k = 2; an item-level heap is built over the three grids B1 × C1, B2 × C2, B3 × C3]

Page 25

Algorithm 2 (Demo)

[figure: the three grids with their 1-best corners (.42, .36, .32) in the item-level heap]

starts with an initial heap of the 1-best derivations from each hyperedge

Page 26

Algorithm 2 (Demo)

[figure: the same heap of 1-best corners]

starts with an initial heap of the 1-best derivations from each hyperedge

but we just need the top k among the d 1-best derivations

Page 27

Algorithm 2 (Demo)

[figure: the heap root .42 is selected]

pop the best (.42) and …

Page 28

Algorithm 2 (Demo)

[figure: .42 goes to the output; its two successors in its grid, .24 and .07, enter the heap]

pop the best (.42) and …

push the two shoulders (.07 and .24) as its successors

Page 29

Algorithm 2 (Demo)

[figure: the output holds .42; the new heap root is .36]

improves the O(d k log k) to O(d + k log k)

overall time complexity: O(|E| + |V| k log k)
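A Python sketch of the item-level heap for one vertex with d incoming binary hyperedges (ours): the heap is seeded with each hyperedge's 1-best corner, and every pop pushes that grid's shoulder successors, so multk and mergek are interleaved. The numbers are made up, chosen to give the same .42 / .36 outcome as the demo.

```python
# Algorithm 2 style: k-best values of one vertex with d incoming binary hyperedges,
# using one item-level heap over all the hyperedges' grids.
# Illustrative sketch; each tail list is sorted best-first and larger is better.
import heapq
from typing import Callable, List, Tuple

Edge = Tuple[List[float], List[float], Callable[[float, float], float]]   # (a, b, f)

def kbest_vertex(edges: List[Edge], k: int) -> List[float]:
    heap: list = []                                # entries: (-value, edge index, i, j)
    seen = set()
    for idx, (a, b, f) in enumerate(edges):        # seed: the 1-best of every hyperedge
        if a and b:
            seen.add((idx, 0, 0))
            heapq.heappush(heap, (-f(a[0], b[0]), idx, 0, 0))
    out: List[float] = []
    while heap and len(out) < k:
        neg, idx, i, j = heapq.heappop(heap)
        out.append(-neg)
        a, b, f = edges[idx]
        for ni, nj in ((i + 1, j), (i, j + 1)):    # shoulders within that grid only
            if ni < len(a) and nj < len(b) and (idx, ni, nj) not in seen:
                seen.add((idx, ni, nj))
                heapq.heappush(heap, (-f(a[ni], b[nj]), idx, ni, nj))
    return out

# d = 3 hyperedges, k = 2
prod = lambda x, y: x * y
edges = [([0.7, 0.1], [0.6, 0.4], prod),
         ([0.9, 0.5], [0.4, 0.3], prod),
         ([0.8, 0.1], [0.4, 0.1], prod)]
print(kbest_vertex(edges, k=2))                    # ≈ [0.42, 0.36]
```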

Page 30

Algorithm 3: Offline (lazy)

• from Alg. 0 to Alg. 2:
  – delaying the calculations until needed – lazier
  – larger locality
• even lazier… (one step further)
  – we are interested in the k-best derivations of the final item only!

Page 31

Algorithm 3: Offline (lazy)

• forward phase
  – do a normal 1-best search up to the final item
  – construct the hypergraph (parse forest) along the way
• recursive backward phase (sketched below)
  – ask the final item: what’s your 2nd-best?
  – the final item propagates this question down to the leaves
  – then ask the final item: what’s your 3rd-best?
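A self-contained Python sketch of this lazy backward phase (ours, not the authors' implementation). Each vertex keeps a candidate heap seeded with the best derivation of every incoming hyperedge, and a request for a vertex's j-th best recursively asks its tails for the sub-derivations it needs; for brevity it returns only the k best weights, not the derivations themselves. All names and numbers are illustrative.

```python
# Lazy backward k-best (Algorithm 3 style); illustrative sketch, larger is better.
import heapq
from typing import Callable, Dict, List, Tuple

Vertex = str
Edge = Tuple[Tuple[Vertex, ...], Callable[..., float]]   # (tails, f) of one incoming hyperedge

class LazyKBest:
    def __init__(self, incoming: Dict[Vertex, List[Edge]], leaf_weight: Dict[Vertex, float]):
        self.incoming = incoming                  # head vertex -> its incoming hyperedges
        self.leaf_weight = leaf_weight            # weights of leaves (no incoming hyperedges)
        self.D: Dict[Vertex, List[float]] = {}    # derivation weights found so far, best first
        self.cand: Dict[Vertex, list] = {}        # candidate heap per vertex
        self.seen: Dict[Vertex, set] = {}

    def _weight(self, tails, f, ranks):
        """Weight of the derivation using each tail's ranks[i]-th best, or None if missing."""
        ws = []
        for u, r in zip(tails, ranks):
            kb = self.kbest(u, r + 1)             # lazily ensure u has r+1 derivations
            if len(kb) <= r:
                return None
            ws.append(kb[r])
        return f(*ws)

    def kbest(self, v: Vertex, k: int) -> List[float]:
        if not self.incoming.get(v):              # leaf: exactly one "derivation"
            return [self.leaf_weight[v]][:k]
        if v not in self.cand:                    # seed with each hyperedge's best derivation
            self.D[v], self.cand[v], self.seen[v] = [], [], set()
            for ei, (tails, f) in enumerate(self.incoming[v]):
                ranks = (0,) * len(tails)
                w = self._weight(tails, f, ranks)
                if w is not None:
                    self.seen[v].add((ei, ranks))
                    heapq.heappush(self.cand[v], (-w, ei, ranks))
        while len(self.D[v]) < k and self.cand[v]:
            neg, ei, ranks = heapq.heappop(self.cand[v])
            self.D[v].append(-neg)
            tails, f = self.incoming[v][ei]
            for i in range(len(tails)):           # push the successors ("shoulders")
                nr = ranks[:i] + (ranks[i] + 1,) + ranks[i + 1:]
                if (ei, nr) not in self.seen[v]:
                    w = self._weight(tails, f, nr)
                    if w is not None:
                        self.seen[v].add((ei, nr))
                        heapq.heappush(self.cand[v], (-w, ei, nr))
        return self.D[v][:k]

# tiny usage example, loosely following the demo's forest shape (leaf weights made up)
prod = lambda x, y: x * y
incoming = {"S17": [(("NP13", "VP47"), prod),
                    (("NP12", "VP37"), prod),
                    (("VP15", "NP67"), prod)]}
leaves = {"NP13": 0.7, "VP47": 0.6, "NP12": 0.4, "VP37": 0.5, "VP15": 0.7, "NP67": 0.4}
print(LazyKBest(incoming, leaves).kbest("S17", 2))
# ≈ [0.42, 0.28] (the tails here are leaves, so the 2nd-best comes from another hyperedge)
```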

Page 32

Algorithm 3 demo

[figure: the goal S (1, 7), k = 2, with three incoming hyperedges – NP (1, 3) VP (4, 7), NP (1, 2) VP (3, 7), and VP (1, 5) NP (6, 7) – whose 1-best derivations score .42, .20, and .28; the 1-best of S (1, 7) is .42]

after the “forward” step (1-best parsing):

forest = 1-best derivations from each hyperarc

Page 33

Algorithm 3 demo

now the backward step

what’s your 2nd-best?

Page 34

Algorithm 3 demo

I’m not sure... let me ask my parents…

Page 35

Algorithm 3 demo

what’s your 2nd-best?

Page 36

Algorithm 3 demo

or, equivalently… who’s your successor in this hyperarc?

Page 37

Algorithm 3 demo

well, it must be either … or …

Page 38

Algorithm 3 demo

these are candidates for my 2nd-best

Page 39

Algorithm 3 demo

but wait a minute… did you already know the ?’s?

oops… forgot to ask more questions recursively …

Page 40

Algorithm 3 demo

what’s your 2nd-best?

Page 41

Algorithm 3 demo

recursion goes on to the leaf nodes

Page 42

Algorithm 3 demo

and reports back the numbers…

Page 43

Algorithm 3 demo

push .30 and .21 to the candidate heap (priority queue)

Page 44

Algorithm 3 demo

now I know my 2nd-best

pop the root of the heap (.30)

Page 45

Summary of Algorithms

Algorithm          Time Complexity               Locality
1-best             O(|E|)                        hyperedge
alg. 0: naïve      O(k^a |E|)                    hyperedge (multk)
alg. 1             O(k log k |E|)                hyperedge (multk)
alg. 2             O(|E| + |V| k log k)          item (mergek)
alg. 3: lazy       O(|E| + |D| k log k)          global
generalized J&M    O(|E| + |D| k log (d + k))    global

for CKY: a = 2, |E| = n³|P|, |V| = n²|N|, |D| = O(n)

Page 46

Outline

• Formulations
• Algorithms: Alg. 0 through Alg. 3
• Experiments
  – Collins/Bikel parser
  – CKY-based MT decoder (Chiang, 2005)

• Conclusion

Page 47

Bikel Parser

• based on lexicalized context-free models (Collins, 2003)
• we use it to emulate Collins Model 2
• beam search (pruning on cells), sketched below
  – cell [i, j] contains all items of the form (A, i, j)
  – beam width x
    • prune away items worse than x times the best item in the cell (threshold pruning in MT)
  – cell limit y
    • keep at most the y best items in a cell (histogram pruning in MT)
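A small Python sketch of these two pruning styles applied to one cell (ours, not Bikel's code); the beam width and cell limit values are made up.

```python
# Threshold (beam-width) and histogram (cell-limit) pruning for one chart cell.
# Illustrative sketch; items are (label, score) pairs and larger scores are better.
from typing import List, Tuple

Item = Tuple[str, float]

def prune_cell(items: List[Item], beam_width: float, cell_limit: int) -> List[Item]:
    if not items:
        return []
    ranked = sorted(items, key=lambda it: it[1], reverse=True)
    best = ranked[0][1]
    kept = [it for it in ranked if it[1] >= best * beam_width]   # threshold pruning
    return kept[:cell_limit]                                     # histogram pruning

cell = [("NP", 0.60), ("NP", 0.30), ("ADJP", 0.002), ("FRAG", 0.0001)]
print(prune_cell(cell, beam_width=1e-3, cell_limit=3))
# FRAG falls below 0.60 * 1e-3 and is pruned; the remaining three fit within the cell limit
```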

Page 48

Efficiency

Implemented Algorithms 0, 1, and 3 on top of the Bikel parser.
Average (wall-clock) time per sentence on Section 23:

[chart: average parsing time vs. k for the implemented algorithms; one curve is labeled O(|E| + |D| k log k)]

Page 49

Quality of the k-best lists: Oracle Reranking – Accuracy (F-score)

Page 50

Why are our k-best lists better?

average number of parses for sentences of a given length

as sentences get longer, the number of parses should go up (exponentially)!

[chart legend: Collins; beam width 10⁻³, k = 100; beam width 10⁻⁴, k = 100]

Page 51

MT decoder (Efficiency)

CKY-based Hiero decoder (Chiang, ACL 2005): implemented Algorithms 2 and 3.
Average decoding time (excluding the 1-best phase):

Page 52

Discussions: Hyperpaths vs. Derivations

• hyperpath
  – a minimal sub-hypergraph
  – every vertex has at most one incoming hyperedge
• derivation
  – a tree
  – a vertex can appear more than once
• 1-best: the two always coincide

[figure: a hypergraph, with a hyperpath and a derivation highlighted]

Page 53

Conclusion

• monotonic hypergraph formulation
  – we solved the k-best derivations problem
  – not the k-shortest hyperpaths problem
• k-best algorithms
  – Alg. 0 (naïve) through Alg. 3 (lazy)
• experimental results
  – efficiency
  – accuracy (effective search over a larger space)

Page 54

THE END

Questions?

Comments?

Page 55

Page 56

Discussions (cont’d): Hyperpaths vs. Derivations

(B, i, j), (C, j+1, k) ⇒ (A, i, k)

Earley: not the case, but easy to fix

CKY: always coincide

Page 57

Interesting Properties

• the 1-best derivation is best everywhere (all decisions optimal)
• the 2nd-best is optimal everywhere except one decision
  – and that decision must be 2nd-best
  – and it is the best of all 2nd-best decisions
• so what about the 3rd-best?
• the kth-best is…

(Charniak and Johnson, ACL 2005)

local picture:

[figure: the full grid of products f (ai, bj) from the Algorithm 1 demo]

Page 58

Quality of the k-best lists: Oracle Reranking – Relative Improvement

f : ℝ^|T(e)| → ℝ

Page 59

Future Work

• k-best discriminative supertagging
• implement Alg. 2 and 3 for the Bikel parser, and Alg. 0 and 1 for the MT decoder (David)
  – so both experiments have all four algorithms
• Real reranking (w/ Libin)
• Chinese parsing?
• Formal grammars and hypergraphs
• Case Factor Diagrams and hypergraphs

Page 60

Hypergraph is Everything!

Generic Dynamic Programming

Shared Forest

Weighted Deduction

Hypergraph Searching

branching structures vs. finite-state structures

Page 61

Convergence of Formulations

Shared Forest

Weighted Deduction

Hypergraph

Page 62

on a log scale…

?