A Weighted FSM implementation of alignment and translation models
Andres Munoz
David Alvarez-Melis
Courant Institute, NYU
December 17, 2012
1 / 25 Munoz, Alvarez-Melis WFSM Implementation for Translation
1 Background
2 Approach
3 Alignment Model
4 Translation Model
5 Experiments
6 Conclusion
Background
Source language sentences: f = f_1 … f_m. Target language sentences: e = e_1 … e_l.
Task 1: Given pairs (e, f) of sentences, find the most likely alignment into the target language.
Task 2: Given sentences in the source language, generate a translation in the target language.
Noisy channel approach:
p(e|f) ∝ p(f|e) · p(e)
where p(f|e) is the translation model and p(e) is the language model.
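As a toy illustration of the noisy-channel factorization, the sketch below scores a handful of candidate translations for one fixed source sentence. All probability values are invented for the example, not trained scores.

```python
# Toy noisy-channel decoder. All probability values below are invented
# for illustration; they are not trained model scores.
translation_model = {            # p(f | e) for one fixed source sentence f
    "the red house": 0.20,
    "the house red": 0.25,       # source word order scores slightly higher
}
language_model = {               # p(e), estimated from monolingual text
    "the red house": 0.30,
    "the house red": 0.01,       # ungrammatical English is unlikely
}

def decode(candidates):
    """Pick argmax_e p(f|e) * p(e), the noisy-channel decision rule."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])
```

Here the language model rescues the grammatical word order: 0.20 · 0.30 > 0.25 · 0.01.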
Approach
The IBM Models
It is very hard to model p(f_1 … f_m | e_1 … e_l, m) directly. Instead, define p(f_1 … f_m, a_1 … a_m | e_1 … e_l, m), where (a_1, …, a_m) are alignment variables.
a_j = k ⇒ f_j is aligned to e_k.
IBM2 uses the following model:
p(f_1 … f_m, a_1 … a_m | e_1 … e_l, m) = ∏_{i=1}^{m} q(a_i | i, l, m) · t(f_i | e_{a_i})
where q(a_i | i, l, m) are the alignment probabilities and t(f_i | e_{a_i}) are the translation probabilities.
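A direct sketch of this factorization (the helper names are ours, not from the talk):

```python
def ibm2_joint(f, e, a, q, t):
    """IBM2 joint probability: prod over i of q(a_i|i,l,m) * t(f_i|e_{a_i}).

    f, e: source and target sentences as word lists;
    a:    a[i-1] is the (1-based) target position aligned to source position i;
    q, t: callables for the alignment and translation probabilities.
    """
    m, l = len(f), len(e)
    p = 1.0
    for i in range(1, m + 1):
        j = a[i - 1]
        p *= q(j, i, l, m) * t(f[i - 1], e[j - 1])
    return p
```

With a uniform alignment prior and word-translation probabilities of 1, a two-word pair scores (1/2)² = 0.25, which is a quick sanity check on the indexing.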
The translation probabilities t(f|e) are trained using the EM algorithm.
How should we model q(a_i | i, l, m)?
For simplicity, assume the alignment probabilities depend only on relative position (i.e., not on the particular words):
q(a_i | i, l, m) = e^{-α |i − a_i · m/l|}
Note that this penalizes alignments that move words too far from their position in the source language.
Intuition: words tend to remain in the same part of the sentence when translated.
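The distance-based prior can be written directly. The trained value of α is not reported, so α = 0.2 below is only an assumption:

```python
import math

def q(j, i, l, m, alpha=0.2):
    """Unnormalized alignment prior exp(-alpha * |i - j*m/l|): target
    position j, projected onto the source-length scale, should stay close
    to source position i. alpha = 0.2 is an assumed value."""
    return math.exp(-alpha * abs(i - j * m / l))
```

On the diagonal (where j·m/l = i) the weight is 1, and it decays exponentially with displacement.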
Is it possible to represent the IBM model as an FSM?
How can we implement a translation machine (no target sentence provided) based on this model?
How many transducers/automata are needed?
Can these implementations be made time- and space-efficient?
Alignment Model
The FSM ingredients:
Automata E, F encoding e and f, e.g. the linear acceptors
E: 0 →The→ 1 →red→ 2 →house→ 3
F: 0 →La→ 1 →maison→ 2 →rouge→ 3
A flower transducer T with the word translation probabilities t(f_i|e_j): a single state 0 with one self-loop f_i : e_j / t(f_i|e_j) per word pair.
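Such a flower transducer can be written out in the AT&T text format consumed by fsmcompile. The word pairs and probabilities below are toy values, and arc weights are negative log probabilities:

```python
import math

t_probs = {                       # toy t(f | e) values, not trained numbers
    ("la", "the"): 0.7,
    ("maison", "house"): 0.8,
    ("rouge", "red"): 0.9,
}

def flower_arcs(t_probs):
    """AT&T text format: 'src dst input output weight', one self-loop on the
    single state 0 per word pair, followed by a final-state line for 0."""
    lines = [f"0\t0\t{f}\t{e}\t{-math.log(p):.4f}" for (f, e), p in t_probs.items()]
    lines.append("0")             # state 0 is final
    return "\n".join(lines)
```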
An automaton E_P of the same length as f, over permutations of the words of e, with transition weights e^{-α |i − j·m/l|}. For e = "The red house" (l = m = 3):
state 0 → 1: The/0, red/0.200, house/0.400
state 1 → 2: The/0.200, red/0, house/0.200
state 2 → 3: The/0.400, red/0.200, house/0 (state 3 final/0)
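The arcs of this example automaton can be generated mechanically; with α = 0.2 (an assumed value) the sketch reproduces the weights above:

```python
def ep_arcs(e_words, m, alpha=0.2):
    """AT&T-format acceptor arcs for E_P: from state i-1 to i, one arc per
    target word e_j with weight alpha * |i - j*m/l| (the negative log of the
    exponential penalty). alpha = 0.2 is an assumption chosen to match the
    example automaton."""
    l = len(e_words)
    lines = [f"{i - 1}\t{i}\t{w}\t{alpha * abs(i - j * m / l):.3f}"
             for i in range(1, m + 1)
             for j, w in enumerate(e_words, start=1)]
    lines.append(str(m))          # final state
    return "\n".join(lines)
```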
The full alignment-model cascade: EP ◦ T ◦ F. The result (candidate alignments) is projected onto the target side, and a best path (over the tropical semiring) is found.
FSM code:
fsmcompile -s log -i fwords.syms <AutoFr.txt \
  | fsmcompose - translator.fsm \
  | fsmcompose - Align.fsm \
  | fsmconvert -s tropical \
  | fsmbestpath - \
  | fsmproject -2 - >Predicted.fsm
Result:
0 →La:The/3.873→ 1 →maison:house/2.618→ 2 →rouge:red/0.851→ 3 (final/0)
Computing Alignment Error with Transducers
Given the gold alignment a* of each sentence, with each alignment link scored as “Possible” or “Sure”, how do we measure the accuracy of the model?
Compute the Alignment Error Rate, AER (Och and Ney, 2003):
AER(A, P, S) = 1 − (|P ∩ A| + |S ∩ A|) / (|S| + |A|)
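With alignments represented as sets of (source, target) index pairs, AER is a one-liner. This set-based version is not the talk's transducer implementation, but it computes the same quantity:

```python
def aer(A, P, S):
    """Alignment Error Rate (Och and Ney, 2003). A: predicted links,
    P: 'possible' gold links, S: 'sure' gold links (S a subset of P),
    each a set of (source, target) position pairs."""
    return 1.0 - (len(P & A) + len(S & A)) / (len(S) + len(A))
```

A perfect A (every predicted link possible, every sure link recovered) gives AER 0; a fully wrong A gives 1.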
A flower alignment-error transducer K with 0-1 loss: a single state 0 (final/0) with self-loops i:i/0 and i:j/1 for i ≠ j.
The predicted alignments P and gold alignments G are encoded as automata from X = {1, …, m} to Y = {1, …, l}: if f_i is aligned to e_j, the i-th transition is labeled j.
Compose P ◦ K ◦ G, find the best path, project, push the costs, and read off the cost as the number of wrong alignments.
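For intuition, the quantity this cascade extracts, the number of mismatched alignment positions, can also be computed directly (a plain-Python equivalent, not the transducer implementation):

```python
def alignment_errors(predicted, gold):
    """Cost of the best path through P ∘ K ∘ G: each position where the
    predicted target index differs from the gold one crosses an i:j/1 arc
    of K and contributes 1; matching positions cross i:i/0 arcs at cost 0."""
    return sum(1 for p, g in zip(predicted, gold) if p != g)
```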
Translation Model
Problem: given a French sentence, produce the best translation in English.
Use the transducer T to solve this problem.
The search space is too big (French words can translate to several words).
The size of the target sentence is unknown.
The nondeterministic automaton has size > 10^6 before composition with the language model.
Prune the automaton to keep plausible translations.
Compose with a bigram model (faster than a trigram) to get the top n sentences.
Allow permutations of those sentences and rescore using a trigram model.
Find the best path.
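The pruning step above can be sketched as keeping only the k cheapest candidate translations per source word. The data layout here is our assumption, not the talk's actual code:

```python
import heapq

def prune_candidates(options, k):
    """options maps each source word to {target_word: neg_log_cost}.
    Keep only the k lowest-cost targets per source word, shrinking the
    translation automaton before composing with the language model."""
    return {src: dict(heapq.nsmallest(k, cands.items(), key=lambda kv: kv[1]))
            for src, cands in options.items()}
```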
Source sentence:
0 →cette→ 1 →maison→ 2 →est→ 3 →rouge→ 4
Plausible translations after pruning:
0 → 1 (cette): containing/1.423, deplores/1.178, devolving/1.203, Five/1.323, gaggle/1.169, predicament/1.261, rebalanced/1.245, spine/1.326, Ursula/1.167, washing/1.394
1 → 2 (maison): halfway/1.638, timer/1.833, Wright/1.600
2 → 3 (est): east/0.683, is/0.786
3 → 4 (rouge, final/0): red/0.651
Best 5 sentences after composition with the bigram model:
this/5.173 house/8.588 is/4.624 red/13.11 </s>/5.221
this/5.173 home/13.06 is/5.629 red/13.11 </s>/5.221
this/5.173 homes/14.91 is/5.538 red/13.11 </s>/5.221
this/5.173 acquired/15.19 is/6.125 red/13.11 </s>/5.221
this/5.173 house/8.588 adopted/12.78 red/13.31 </s>/5.221
Final translation:
0 →this/8.798→ 1 →house/15.47→ 2 →is/8.558→ 3 →red/25.80→ 4 →</s>/10.77→ 5 (final/0)
i.e., “this house is red”.
Experiments
Dataset: transcribed debates of the Canadian House of Commons, French ↔ English. IBM2 trained with ≈ 50,000 sentence pairs.
Test set: 447 sentences, along with their gold alignments.
AER: 42.1%, precision: 58.93%, recall: 68.72%. About 30 minutes of computation. Not very accurate!
For translation, the typical automaton of possible translations has more than 10^6 transitions.
Conclusion
IBM models are extremely simplistic and by now outdated.
The AER obtained is poor.
Yet, they provide a clear conceptual interpretation of MT.
More recent approaches: Phrase-Based → Syntax-Based → Hierarchical Phrase-Based.
Possible improvements: add a fertility model; allow “null” alignments to appear in the source language.
D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Prentice Hall, 2nd ed., 2008.
K. Knight and Y. Al-Onaizan, Translation with finite-state devices, in Proceedings of the Third Conference of the AMTA, Springer-Verlag, 1998, pp. 421–437.
S. Kumar and W. Byrne, A weighted finite state transducer implementation of the alignment template model for statistical machine translation, in Proceedings of HLT-NAACL, Association for Computational Linguistics, 2003, pp. 63–70.