Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005) Chris Quirk, Arul Menezes and Colin Cherry


Page 1: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Dependency Treelet Translation: Syntactically Informed Phrasal SMT

(ACL 2005)

Chris Quirk, Arul Menezes

and Colin Cherry

Page 2: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Outline

• Limitations of SMT and previous work

• Modeling and training

• Decoding

• Experiments

• Conclusion

Page 3: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Limitations of string-based phrasal SMT

• It allows only limited phrase reordering.
  – Ex: max jump, max skip

• It cannot express linguistic generalizations:
  – Ex: it cannot express “SOV → SVO”

• Source and target phrases have to be contiguous:
  – Ex: it cannot handle “ne … pas”

Page 4: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Previous work on syntactic SMT: Simultaneous parsing

• Inversion Transduction Grammars (Wu, 1997)
  – Using simplifying assumptions: binary rules X → AB

• Head transducers (Alshawi et al., 2000)
  – Simultaneous induction of src and tgt dependency trees

Page 5: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Previous work on syntactic SMT: parsing + transfer

• Tree-to-string (Yamada and Knight, 2001)
  – Parse the tgt sentence, and convert the tgt tree to a src string

• Path-based transfer model (Lin, 2004)
  – Translate paths in src dependency trees

• LF-level transfer (Menezes and Richardson, 2001)
  – Parse both src and tgt.

Page 6: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Previous work on syntactic SMT: pre- or post-processing

• Post-processing (JHU 2003): re-ranking the n-best list of SMT output using syntactic models.
  – Parse MT output
  – No improvement, even when n=16,000

• Pre-processing (Xia & McCord, 2004; Collins et al., 2005; …):
  – Reorder src sents before SMT
  – Some improvement

Page 7: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Outline

• Limitations of SMT and previous work

• Modeling and training

• Decoding

• Experiments

• Conclusion

Page 8: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

What’s new?

• The unit of translation: a treelet pair.
  – A treelet is an arbitrary connected subgraph (not necessarily a subtree) of a dependency tree.
  – In comparison:
    • Src n-grams: “phrase”-based SMT
    • Paths: (Lin, 2004)
    • Context-free rules: many transfer-based MT systems

• Decoding is more complicated.

Page 9: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Required modules

• Source dependency parser

• Target word segmenter / tokenizer

• Word aligner: GIZA++

Page 10: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Major steps for training

1. Align src and tgt words

2. Parse source side

3. Project dependency trees

4. Extract treelet translation pairs

5. Train an order model

6. Train other models

Page 11: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 1: Word alignment

• Use GIZA++ to get alignments in both directions, and combine the results with heuristics.

• One constraint: for n-to-1 alignments, the n src words have to be adjacent in the src dependency tree.
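The adjacency constraint can be checked mechanically: the n src words of an n-to-1 alignment must form a connected subgraph of the src dependency tree. A minimal Python sketch (the function name and `head`-array tree encoding are my own, not from the paper):

```python
from collections import deque

def connected_in_tree(nodes, head):
    """Check that a set of word positions forms a connected subgraph of a
    dependency tree, where head[i] is the parent of word i (root: -1)."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    # Keep only tree edges (i, head[i]) with both endpoints in `nodes`,
    # then test reachability from an arbitrary start node.
    adj = {n: [] for n in nodes}
    for n in nodes:
        h = head[n]
        if h in nodes:
            adj[n].append(h)
            adj[h].append(n)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes
```

An n-to-1 link whose src side fails this test would be rejected by the heuristics.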

Page 12: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Heuristics used to accept alignments from the union

They do not accept m-to-n alignments.

Page 13: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 2: parsing source side

• It requires a source dependency parser that
  – produces unlabeled, ordered dependency trees, and
  – annotates each src word with a POS tag

• Their system does not allow crossing dependencies:
  – If h(i) = k, then for any j between i and k, h(j) is also between i and k.
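The no-crossing (projectivity) condition above translates directly into a check over all arcs. A Python sketch (encoding is my own, not from the paper):

```python
def is_projective(head):
    """head[i] = parent of word i in surface order (root: -1).
    For every arc (i, k=head[i]), every word j strictly between i and k
    must have its head inside the closed interval [min(i,k), max(i,k)]."""
    n = len(head)
    for i in range(n):
        k = head[i]
        if k == -1:
            continue
        lo, hi = min(i, k), max(i, k)
        for j in range(lo + 1, hi):
            if not (lo <= head[j] <= hi):
                return False
    return True
```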

Page 14: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 3: Projecting dependency trees

• Add links in the tgt dependency tree according to word alignment types:
  – 1-to-1: trivial
  – n-to-1: trivial
  – 1-to-n: use heuristics
  – Unaligned tgt words: use heuristics
  – Unaligned src words: ignore them

Page 15: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

1-to-1 and n-to-1 alignments

[Figure: src words s_k, s_l, s_l' aligned to tgt words t_i, t_j; the dependency links project directly.]

Page 16: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

1-to-n alignment

[Figure: src words a, b, with b aligned 1-to-n to tgt words b1', b2' and a aligned to a'.]

The n tgt words should move as a unit:
  – treat the rightmost one as the head
  – all other words depend on it.
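The rightmost-head heuristic for 1-to-n alignments is small enough to sketch in Python (names and the head-map encoding are my own, not from the paper):

```python
def project_one_to_many(tgt_positions, tgt_head):
    """When one src word aligns to n tgt words, the group should move as a
    unit: treat the rightmost tgt word as the local head and attach the
    remaining tgt words to it.  Mutates the head map; returns the head."""
    head = max(tgt_positions)
    for p in tgt_positions:
        if p != head:
            tgt_head[p] = head
    return head
```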

Page 17: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Unaligned target words

[Figure: tgt words t_i, t_j, t_k, with t_j unaligned.]

Given an unaligned tgt word at position j, find the closest positions (i, k) s.t. j is between i and k and ti depends on tk (or vice versa).

Such (i, k) might not exist. Because no crossing is allowed, if (i, k) exists, it is unique.
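A Python sketch of this search for (i, k) (a simplified reading of the heuristic; names and encoding are my own):

```python
def attach_point(j, aligned_positions, tgt_head):
    """For an unaligned tgt word at position j, look for the closest pair
    (i, k) with i < j < k such that t_i depends on t_k or vice versa.
    Returns (i, k), or None if no such pair exists; with no crossing
    dependencies allowed, the pair is unique when it exists."""
    left = sorted((p for p in aligned_positions if p < j), reverse=True)
    right = sorted(p for p in aligned_positions if p > j)
    # Scan outward from j, nearest neighbors first.
    for i in left:
        for k in right:
            if tgt_head.get(i) == k or tgt_head.get(k) == i:
                return (i, k)
    return None
```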

Page 18: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

An example

startup properties and options

proprietes et options de demarrage

Page 19: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

The reattachment pass to ensure phrasal cohesion

[Figure: the projected tgt tree for “proprietes et options de demarrage”, before and after the reattachment pass that ensures phrasal cohesion.]
Page 20: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Reattachment pass

• “For each node in the wrong order (relative to its siblings), we reattach it to the lowest of its ancestors s.t. it is in the correct place relative to its siblings and parent”.

• Question: how does the reattachment work?
  – In what order are tree nodes checked?
  – Once a node is moved, can it be moved again?
  – How many levels do we have to check to decide where to attach a node?

Page 21: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

An example

[Figure: an example tree whose nodes are numbered 1–15, illustrating the reattachment pass.]

Page 22: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 3: Projecting dependency trees (Recap)

• Before reattachment, the src and tgt dependency trees are almost isomorphic:
  – n-to-1: treat the n src words as one node
  – 1-to-n: treat the n tgt words as one node
  – Unaligned tgt words: attached by heuristics
  – Unaligned src words: ignored

• After reattachment, the two trees can look very different.

Page 23: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 4: Extracting treelet translation pairs

• “We extract all pairs of aligned src and tgt treelets along with word-level alignment linkages, up to a configurable max size.”

• Due to the reattachment step, a src treelet might not align to a tgt treelet.

Page 24: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Extraction algorithm

• Enumerate all possible source treelets.

• Look at the union of the target nodes aligned to source nodes. If it is a treelet, keep the treelet pair.

• Allow treelets with wildcard roots.
  – Ex: doesn’t * → ne * pas

• Max size of treelets: in practice, up to 4 src words.

• Question: how many source treelets are there?
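One way to answer the question empirically: enumerate the treelets rooted at each node. A Python sketch (my own, not the paper's extraction code) that generates each connected subgraph containing a given root exactly once, by only extending the frontier “rightward”:

```python
def enumerate_treelets(root, children, max_size=4):
    """Enumerate the connected subgraphs (treelets) of a dependency tree
    rooted at `root`, up to max_size nodes.  children[n] lists n's
    children; a treelet need not include all children of a node (it is a
    connected subgraph, not necessarily a subtree)."""
    results = []

    def grow(chosen, frontier):
        results.append(frozenset(chosen))
        if len(chosen) == max_size:
            return
        for i, node in enumerate(frontier):
            # Extend with frontier[i]; later choices may only use nodes
            # after it (plus its children), so no subgraph repeats.
            grow(chosen | {node},
                 frontier[i + 1:] + tuple(children.get(node, ())))

    grow(frozenset([root]), tuple(children.get(root, ())))
    return results
```

Running this over every node as root enumerates all source treelets up to the size bound.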

Page 25: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

An example

startup properties and options

proprietes et options de demarrage

Page 26: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 5: training an order model

Page 27: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Another representation

Page 28: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Learning a dependent’s position w.r.t. its head

P(pos(m,t) | S, T):
  S: src dependency tree
  T: unordered tgt dependency tree
  t (a.k.a. “h”): a node in T
  m: a child of t

P(pos(m,t) | S, T)
  ≈ P(pos(m) | S, T)
  ≈ P(pos(m) | lex(m), lex(t), srclex(m), srclex(t), srccat(m), srccat(t), srcpos(m))

Use a decision tree to decide pos(m)

Page 29: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)
Page 30: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

The prob of the order of tgt tree

c(t) is the set of nodes modifying t. (i.e., the children of t in the dependency tree)

Assumption: the position of each child can be modeled independently in terms of head-relative position

Page 31: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

The order model (cont)

P(order(T) | S, T) = ∏_{t ∈ T} P(order(c(t)) | S, T)
                   = ∏_{t ∈ T} ∏_{m ∈ c(t)} P(pos(m,t) | S, T)

Comment: this model is both straightforward and kind of counter-intuitive, since treelets are subgraphs.
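Since each child's head-relative position is modeled independently, the order probability is just a product over all heads and their children. A Python sketch in log space (`pos_log_prob` is a hypothetical stand-in for the decision-tree model; names are my own):

```python
import math

def order_log_prob(children, pos_log_prob):
    """log P(order(T) | S, T) under the independence assumption:
    a sum of per-child position log-probabilities over every head t
    and each of its modifiers m in c(t)."""
    return sum(pos_log_prob(m, t)
               for t, kids in children.items()
               for m in kids)
```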

Page 32: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 6: train other models

P(T | S) = ∏_i p(t_i | s_i), where each (s_i, t_i) is a treelet pair.

Two models:
  – MLE
  – IBM Model 1

It assumes a uniform distribution over all possible decompositions of a tree into treelets.
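The MLE variant of the channel model is just relative frequency over the extracted treelet pairs. A Python sketch (my own encoding; treelets stand in as hashable keys):

```python
from collections import Counter

def mle_treelet_model(pairs):
    """MLE channel model: p(t|s) = count(s, t) / count(s),
    estimated from the extracted (src treelet, tgt treelet) pairs."""
    pair_counts = Counter(pairs)
    src_counts = Counter(s for s, _ in pairs)
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}
```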

Page 33: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Step 6: train other models (cont)

• Target LM: n-gram LM

• Other features:– Target word number: word penalty– The number of “phrases” used.– ….

Page 34: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Treelet vs. string-based SMT

• Similarities:– Use the log-linear framework.– Similar features: LM, word penalty, …

• Differences:– Use treelet TM, instead of string-based TM.– The order model is w.r.t. dependency trees.

Page 35: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Outline

• Limitations of SMT and previous work

• Modeling and training

• Decoding

• Experiments

• Conclusion

Page 36: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Challenges

• Traditional left-to-right decoding approach is inapplicable.

• The need to handle treelets: perhaps discontiguous or overlapping

Page 37: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Ordering strategies

• Exhaustive search

• Greedy ordering

• No ordering

Page 38: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Exhaustive search

• For each input node s, find the set of all treelet pairs that match S and are “rooted” at s.

• Move bottom up through the src dependency tree, computing a list of possible tgt trees for each src subtree.

• When attaching one subtree to another, try all possible permutations of children of root node.

Page 39: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Definitions

Page 40: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Exhaustive decoding algorithm

Page 41: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Greedy ordering

• Too many permutations to consider in exhaustive search.

• In the greedy ordering:
  – Given a fixed pre- and post-modifier count, we choose the best modifier for each position.
Page 42: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Greedy ordering algorithm

Page 43: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Numbers of candidates considered at each node

• c: # of children specified in the treelet pair
• r: # of subtrees that need to be attached

• Exhaustive search: (c+r+1)! / (c+1)!

• Greedy search: (c+r)·r²
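These two counts are easy to compare numerically; a small Python sketch (function names are my own):

```python
from math import factorial

def exhaustive_candidates(c, r):
    """Orderings tried per node by exhaustive search: (c+r+1)!/(c+1)!,
    with c children fixed by the treelet pair and r subtrees to attach."""
    return factorial(c + r + 1) // factorial(c + 1)

def greedy_candidates(c, r):
    """Candidates considered by the greedy strategy: (c+r)*r^2."""
    return (c + r) * r * r
```

For example, with c=1 fixed child and r=3 subtrees to attach, exhaustive search considers 60 orderings against 36 for greedy, and the factorial gap widens quickly as r grows.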

Page 44: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Dynamic Programming

• In string-based SMT, hyps for the same covered src word vector keep:
  – The last two target words in the hyp: for the LM
  List size is O(V²)

• In treelet translation, hyps for the same src subtree keep:
  – The head word: for the order model
  – The first two target words: for the LM
  – The last two target words: for the LM
  List size is O(V⁵)

DP does not allow for great savings because of the context we have to keep.

Page 45: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Duplicate elimination

• To eliminate unnecessary ordering operations, they use a hash table to check whether an unordered T has appeared before.
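A hashable, order-insensitive key for an unordered tree makes this check a plain dictionary lookup. A Python sketch of one such canonical key (my own construction, not the paper's; a Counter preserves the multiplicity of identical subtrees):

```python
from collections import Counter

def unordered_key(node, children, label):
    """Canonical key for an *unordered* tree: the node's label plus a
    frozen multiset of its children's keys, so two trees that differ
    only in child order hash identically."""
    kid_keys = Counter(unordered_key(c, children, label)
                       for c in children.get(node, ()))
    return (label[node], frozenset(kid_keys.items()))
```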

Page 46: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Pruning

• Prune treelet pairs (before the search starts):
  – Keep pairs whose MLE prob > threshold
  – Given a src treelet, keep those whose prob is within a ratio r of the best pair.

• N-best lists:
  – Keep the N best for each node in the src dep tree.
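The two treelet-pair pruning criteria compose simply; a Python sketch (names and data layout are my own):

```python
def prune_pairs(pairs, threshold, ratio):
    """For one src treelet, keep tgt alternatives whose MLE prob exceeds
    an absolute threshold and lies within a factor `ratio` of the best
    surviving pair.  `pairs` is a list of (tgt, prob)."""
    kept = [(t, p) for t, p in pairs if p > threshold]
    if not kept:
        return []
    best = max(p for _, p in kept)
    return [(t, p) for t, p in kept if p * ratio >= best]
```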

Page 47: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Outline

• Limitations of SMT and previous work

• Modeling and training

• Decoding

• Experiments

• Conclusion

Page 48: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Setting

• Eng-Fr corpus of Microsoft technical data

• Eng parser (NLPWIN): rule-based in-house parser.

Page 49: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Main results

Max phrase size = 4

Page 50: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Effect of max phrase size

Page 51: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Effect of training set size

          1K      3K      10K     30K     100K    300K
Pharaoh   17.20   22.51   27.70   33.73   38.83   42.75
Treelet   18.70   25.39   30.96   35.81   40.66   44.32
diff      +1.50   +2.88   +3.26   +2.08   +1.83   +1.57

Page 52: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Effect of ordering strategies

Page 53: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Effect of allowing discontiguous phrases

Page 54: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Effect of optimization

Page 55: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Conclusion

• Modeling:– Treelet translation– Order model based on dependency structure

• Training:– Projecting tgt dependency tree using heuristics– Learn treelet pairs

• Decoding:– Exhaustive search– Greedy ordering

• Results: better performance than SMT, especially for small max phrase sizes.

Page 56: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Advantages

• Over SMT:
  – Src phrases do not have to be contiguous n-grams.
  – It can express linguistic generalizations.

• Over previous transfer-based approaches:
  – Treelets are more expressive than paths or context-free rules.

Page 57: Dependency Treelet Translation: Syntactically Informed Phrasal SMT (ACL 2005)

Discussion

• Projecting tgt dependency tree:– Reattachment: how and why?

• Extracting treelet pairs:– How many subgraphs?

• Order model:

• Decoding: when hyps are extended, updating the score is more complicated.