
Getting the structure right for word alignment: LEAF

Alexander Fraser and Daniel Marcu

Presenter Qin Gao

Problem

IBM Models have a 1-N assumption

Solutions

A sophisticated generative story
Generative estimation of parameters

Additional solution

Decompose the model into components
Semi-supervised training

Result

Significant improvement on BLEU (AR-EN)

Quick summary

The generative story

Source words:
Head words: link to zero or more non-head words (same side)
Non-head words: linked from one head word (same side)
Deleted words: no link on the source side

Target words:
Head words: link to zero or more non-head words (same side)
Non-head words: linked from one head word (same side)
Spurious words: no link on the target side

Minimal translational correspondence
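To make these word roles concrete, here is a minimal sketch of the alignment structure as I understand it from the slides; the class and field names (Role, Word, Alignment, head_links) are my own, not from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum

class Role(Enum):
    HEAD = "head"          # may link to zero or more non-head words on its side
    NON_HEAD = "non-head"  # linked from exactly one head word on its side
    UNLINKED = "unlinked"  # "deleted" (source side) or "spurious" (target side)

@dataclass
class Word:
    token: str
    role: Role
    head: int | None = None  # index of this word's head, for non-head words

@dataclass
class Alignment:
    source: list[Word]
    target: list[Word]
    # Head-to-head links across the two sides: the minimal translational
    # correspondence in this structure.
    head_links: list[tuple[int, int]] = field(default_factory=list)
```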

The generative story

(Each step was illustrated in the slides with a running example: source words A, B, C and target words X, Y, Z, with C(·) marking word classes.)

1a. Condition on the source word
1b. Determine the source word class
2a. Condition on the source word classes
2b. Determine the links between head words and non-head words
3a. Condition on the source head word
3b. Determine the target head word
4a. Condition on the source head word and the cept size
4b. Determine the target cept size
5a. Condition on the existing sentence length
5b. Determine the number of spurious target words
6a. Condition on the target word
6b. Determine the spurious words
7a. Condition on the target head word's class and the source word
7b. Determine the non-head words it links to
8a. Condition on the classes of the source and target head words
8b. Determine the position of the target head word
8c. Condition on the target word class
8d. Determine the positions of the non-head words
9. Fill the vacant positions uniformly
10. The real alignment
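The following toy sampler is my own illustration of the order in which these decisions are made; every distribution in it is a hand-coded placeholder, not the parameterization from the paper.

```python
import random

def word_class(word):
    # Steps 1a/1b: condition on the source word, determine its class
    # (here: a trivial two-class scheme based on word length).
    return "content" if len(word) > 3 else "function"

def sample_target(source, seed=0):
    rng = random.Random(seed)
    classes = [word_class(w) for w in source]

    # Steps 2a/2b: conditioned on the classes, link head words to non-head
    # words; for simplicity every source word is its own head (singleton cepts).
    src_heads = list(range(len(source)))

    target, links = [], []
    for i in src_heads:
        # Steps 3a/3b: conditioned on the source head word, pick a target
        # head word (a toy "translation": just uppercase the source word).
        tgt_head = source[i].upper()
        # Steps 4a/4b: pick the target cept size; size 0 deletes the source word.
        cept_size = rng.choice([0, 1]) if classes[i] == "function" else rng.choice([1, 2])
        if cept_size == 0:
            continue
        # Steps 7a/7b: attach non-head words to the target head word.
        cept = [tgt_head] + [tgt_head.lower()] * (cept_size - 1)
        links.append((i, len(target)))
        target.extend(cept)

    # Steps 5a/5b and 6a/6b: conditioned on the length so far, add spurious words.
    target.extend(["uh"] * rng.randint(0, 1))

    # Steps 8a-8d and 9 would place every word into a position (head words
    # first, non-heads next, spurious words filling vacancies uniformly);
    # this sketch simply keeps generation order.
    return target, links

print(sample_target(["the", "black", "cat"]))
```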

Unsupervised parameter estimation

Bootstrap using HMM alignments in two directions:
Use the intersection to determine head words
Use the 1-N alignment to determine target cepts
Use the M-1 alignment to determine source cepts
Could be infeasible
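A small sketch of this bootstrapping step, assuming the two directional HMM alignments have been normalized to a common (source_idx, target_idx) pair convention; the function and variable names are hypothetical.

```python
def bootstrap_structure(f2e_links, e2f_links):
    """f2e_links: source->target (1-N) HMM alignment, e2f_links: the
    target->source (M-1) one, both as sets of (source_idx, target_idx)."""
    # Intersection of the two directions -> candidate head-word links.
    heads = set(f2e_links) & set(e2f_links)
    # The 1-N direction suggests the target cept of each source head word.
    target_cepts = {i: sorted(j for (i2, j) in f2e_links if i2 == i)
                    for (i, _) in heads}
    # The M-1 direction suggests the source cept of each target head word.
    source_cepts = {j: sorted(i for (i, j2) in e2f_links if j2 == j)
                    for (_, j) in heads}
    return heads, target_cepts, source_cepts

f2e = {(0, 0), (0, 1), (1, 2)}
e2f = {(0, 0), (1, 2), (2, 2)}
print(bootstrap_structure(f2e, e2f))
```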

Training: Similar to model 3/4/5

From an initial alignment (it is not clear how this is obtained), apply one of seven operators to generate new alignments:

move a French non-head word to a new head;
move an English non-head word to a new head;
swap the heads of two French non-head words;
swap the heads of two English non-head words;
swap the English head-word links of two French head words;
link an English word to a French word, making new head words;
unlink an English and a French head word.

All alignments that can be generated by one of these operators are called neighbors of the alignment.

Training (hill climbing; see the sketch below):
If there is a better alignment in the neighborhood, update the current alignment
Continue until no better alignment can be found
Collect counts from the final neighborhood
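A minimal sketch of that loop, assuming a neighbors function that applies the seven operators and a score function giving the model probability (both hypothetical here).

```python
def hill_climb(alignment, neighbors, score):
    """Repeatedly move to the best neighbor until no neighbor improves."""
    while True:
        candidates = list(neighbors(alignment))
        best = max(candidates, key=score, default=alignment)
        if score(best) <= score(alignment):
            # No better alignment in the neighborhood: counts for EM are
            # collected from this final neighborhood, and the search stops.
            return alignment
        alignment = best

# Toy check: "alignments" are integers, neighbors are +/-1, score peaks at 5.
print(hill_climb(0, lambda a: [a - 1, a + 1], lambda a: -(a - 5) ** 2))  # -> 5
```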

Semi-supervised training:
Decompose the components of the large generative formula and treat them as features in a log-linear model, together with other features
Use the EMD (EM-Discriminative) algorithm
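As a rough sketch of how the decomposed components combine in a log-linear model (all names here are illustrative; the discriminative weight update of EMD is only indicated in a comment):

```python
def log_linear_score(features, weights, f_sent, e_sent, alignment):
    """Each decomposed component of the generative formula becomes a feature
    function h(f, e, a), mixed with additional features via learned weights."""
    return sum(w * h(f_sent, e_sent, alignment)
               for h, w in zip(features, weights))

def decode(candidate_alignments, features, weights, f_sent, e_sent):
    # EMD (roughly) alternates an E-style step that realigns under the current
    # weights with a discriminative step that re-tunes the weights against
    # labeled data; this helper only shows the scoring/decoding half.
    return max(candidate_alignments,
               key=lambda a: log_linear_score(features, weights, f_sent, e_sent, a))
```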

Experiment

First, a very weird operation: they fully link the alignments from ALL systems and then compare the performance

Training/Test Set

Experiments

French/English: phrase-based
Arabic/English: hierarchical (Chiang 2005)
Baseline: GIZA++ Model 4, union
Baseline discriminative: only using Model 4 components as features

Conclusion (mine)

The new structural features are useful in discriminative training
No evidence that the generative model is superior to Model 4

Unclear points

Are the F-scores "biased"?
No BLEU score is given for unsupervised LEAF
They used features in addition to the LEAF features; where does the contribution come from?