A Weighted FSM implementation of alignment and translation models
Andres Munoz
David Alvarez-Melis
Courant Institute, NYU
December 17, 2012
1 / 25 Munoz, Alvarez-Melis WFSM Implementation for Translation
1 Background
2 Approach
3 Alignment Model
4 Translation Model
5 Experiments
6 Conclusion
Background
Source language sentences: f = f_1 … f_m. Target language sentences: e = e_1 … e_l.
Task 1: Given pairs (e, f) of sentences, find the most likely alignment into the target language.
Task 2: Given sentences in the source language, generate a translation in the target language.
Noisy channel approach:
p(e|f) ∝ p(f|e) · p(e)
where p(f|e) is the translation model and p(e) is the language model.
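As a toy illustration of the noisy-channel factorization, the sketch below scores a handful of candidate translations for one fixed source sentence. All probability values are invented for the example, not trained scores.

```python
# Toy noisy-channel decoder. All probability values below are invented
# for illustration; they are not trained model scores.
translation_model = {            # p(f | e) for one fixed source sentence f
    "the red house": 0.20,
    "the house red": 0.25,       # source word order scores slightly higher
}
language_model = {               # p(e), estimated from monolingual text
    "the red house": 0.30,
    "the house red": 0.01,       # ungrammatical English is unlikely
}

def decode(candidates):
    """Pick argmax_e p(f|e) * p(e), the noisy-channel decision rule."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])
```

Here the language model rescues the grammatical word order: 0.20 · 0.30 > 0.25 · 0.01.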
Approach
The IBM Models
It is very hard to model p(f_1 … f_m | e_1 … e_l, m) directly. Instead, define p(f_1 … f_m, a_1 … a_m | e_1 … e_l, m), where (a_1, …, a_m) are alignment variables.
a_j = k ⇒ f_j is aligned to e_k.
IBM2 uses the following model:
p(f_1 … f_m, a_1 … a_m | e_1 … e_l, m) = ∏_{i=1}^{m} q(a_i | i, l, m) · t(f_i | e_{a_i})
where q(a_i | i, l, m) are the alignment probabilities and t(f_i | e_{a_i}) are the translation probabilities.
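A direct sketch of this factorization (the helper names are ours, not from the talk):

```python
def ibm2_joint(f, e, a, q, t):
    """IBM2 joint probability: prod over i of q(a_i|i,l,m) * t(f_i|e_{a_i}).

    f, e: source and target sentences as word lists;
    a:    a[i-1] is the (1-based) target position aligned to source position i;
    q, t: callables for the alignment and translation probabilities.
    """
    m, l = len(f), len(e)
    p = 1.0
    for i in range(1, m + 1):
        j = a[i - 1]
        p *= q(j, i, l, m) * t(f[i - 1], e[j - 1])
    return p
```

With a uniform alignment prior and word-translation probabilities of 1, a two-word pair scores (1/2)² = 0.25, which is a quick sanity check on the indexing.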
The translation probabilities t(f|e) are trained using the EM algorithm.
How should we model q(a_i | i, l, m)?
For simplicity, assume the alignment probabilities depend only on relative position (i.e., not on the particular words):
q(a_i | i, l, m) = e^{-α |i − a_i · m/l|}
Note that this penalizes alignments that move words too far from their position in the source language.
Intuition: words tend to remain in the same part of the sentence when translated.
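The distance-based prior can be written directly. The trained value of α is not reported, so α = 0.2 below is only an assumption:

```python
import math

def q(j, i, l, m, alpha=0.2):
    """Unnormalized alignment prior exp(-alpha * |i - j*m/l|): target
    position j, projected onto the source-length scale, should stay close
    to source position i. alpha = 0.2 is an assumed value."""
    return math.exp(-alpha * abs(i - j * m / l))
```

On the diagonal (where j·m/l = i) the weight is 1, and it decays exponentially with displacement.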
Is it possible to represent the IBM model as an FSM?
How can we implement a translation machine (no target sentence provided) based on this model?
How many transducers/automata are needed?
Can these implementations be made time- and space-efficient?
Alignment Model
The FSM ingredients:
Automata E, F encoding e and f, e.g. the linear acceptors
E: 0 →The→ 1 →red→ 2 →house→ 3
F: 0 →La→ 1 →maison→ 2 →rouge→ 3
A flower transducer T with the word translation probabilities t(f_i|e_j): a single state 0 with one self-loop f_i : e_j / t(f_i|e_j) per word pair.
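Such a flower transducer can be written out in the AT&T text format consumed by fsmcompile. The word pairs and probabilities below are toy values, and arc weights are negative log probabilities:

```python
import math

t_probs = {                       # toy t(f | e) values, not trained numbers
    ("la", "the"): 0.7,
    ("maison", "house"): 0.8,
    ("rouge", "red"): 0.9,
}

def flower_arcs(t_probs):
    """AT&T text format: 'src dst input output weight', one self-loop on the
    single state 0 per word pair, followed by a final-state line for 0."""
    lines = [f"0\t0\t{f}\t{e}\t{-math.log(p):.4f}" for (f, e), p in t_probs.items()]
    lines.append("0")             # state 0 is final
    return "\n".join(lines)
```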
An automaton E_P of the same length as f, over permutations of the words of e, with transition weights e^{-α |i − j·m/l|}. For e = "The red house" (l = m = 3):
state 0 → 1: The/0, red/0.200, house/0.400
state 1 → 2: The/0.200, red/0, house/0.200
state 2 → 3: The/0.400, red/0.200, house/0 (state 3 final/0)
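The arcs of this example automaton can be generated mechanically; with α = 0.2 (an assumed value) the sketch reproduces the weights above:

```python
def ep_arcs(e_words, m, alpha=0.2):
    """AT&T-format acceptor arcs for E_P: from state i-1 to i, one arc per
    target word e_j with weight alpha * |i - j*m/l| (the negative log of the
    exponential penalty). alpha = 0.2 is an assumption chosen to match the
    example automaton."""
    l = len(e_words)
    lines = [f"{i - 1}\t{i}\t{w}\t{alpha * abs(i - j * m / l):.3f}"
             for i in range(1, m + 1)
             for j, w in enumerate(e_words, start=1)]
    lines.append(str(m))          # final state
    return "\n".join(lines)
```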
The full alignment-model cascade: EP ◦ T ◦ F. The result (candidate alignments) is projected onto the target side, and a best path (over the tropical semiring) is found.
FSM code:
fsmcompile -s log -i fwords.syms <AutoFr.txt \
  | fsmcompose - translator.fsm \
  | fsmcompose - Align.fsm \
  | fsmconvert -s tropical \
  | fsmbestpath - \
  | fsmproject -2 - >Predicted.fsm
Result:
0 →La:The/3.873→ 1 →maison:house/2.618→ 2 →rouge:red/0.851→ 3 (final/0)
Computing Alignment Error with Transducers
Given the gold alignment a* of each sentence, with each alignment link scored as “Possible” or “Sure”, how do we measure the accuracy of the model?
Compute the Alignment Error Rate, AER (Och and Ney, 2003):
AER(A, P, S) = 1 − (|P ∩ A| + |S ∩ A|) / (|S| + |A|)
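With alignments represented as sets of (source, target) index pairs, AER is a one-liner. This set-based version is not the talk's transducer implementation, but it computes the same quantity:

```python
def aer(A, P, S):
    """Alignment Error Rate (Och and Ney, 2003). A: predicted links,
    P: 'possible' gold links, S: 'sure' gold links (S a subset of P),
    each a set of (source, target) position pairs."""
    return 1.0 - (len(P & A) + len(S & A)) / (len(S) + len(A))
```

A perfect A (every predicted link possible, every sure link recovered) gives AER 0; a fully wrong A gives 1.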
A flower alignment-error transducer K with 0-1 loss: a single state 0 (final/0) with self-loops i:i/0 and i:j/1 for i ≠ j.
The predicted alignments P and gold alignments G are encoded as automata from X = {1, …, m} to Y = {1, …, l}: if f_i is aligned to e_j, the i-th transition is labeled j.
Compose P ◦ K ◦ G, find the best path, project, push the costs, and read off the cost as the number of wrong alignments.
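For intuition, the quantity this cascade extracts, the number of mismatched alignment positions, can also be computed directly (a plain-Python equivalent, not the transducer implementation):

```python
def alignment_errors(predicted, gold):
    """Cost of the best path through P ∘ K ∘ G: each position where the
    predicted target index differs from the gold one crosses an i:j/1 arc
    of K and contributes 1; matching positions cross i:i/0 arcs at cost 0."""
    return sum(1 for p, g in zip(predicted, gold) if p != g)
```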
Translation Model
Problem: given a French sentence, produce the best translation in English.
Use the transducer T to solve this problem.
The search space is too big (French words can translate to several words).
The size of the target sentence is unknown.
The nondeterministic automaton has size > 10^6 before composition with the language model.
Prune the automaton to keep plausible translations.
Compose with a bigram model (faster than a trigram) to get the top n sentences.
Allow permutations of those sentences and rescore using a trigram model.
Find the best path.
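The pruning step above can be sketched as keeping only the k cheapest candidate translations per source word. The data layout here is our assumption, not the talk's actual code:

```python
import heapq

def prune_candidates(options, k):
    """options maps each source word to {target_word: neg_log_cost}.
    Keep only the k lowest-cost targets per source word, shrinking the
    translation automaton before composing with the language model."""
    return {src: dict(heapq.nsmallest(k, cands.items(), key=lambda kv: kv[1]))
            for src, cands in options.items()}
```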
Source sentence:
0 →cette→ 1 →maison→ 2 →est→ 3 →rouge→ 4
Plausible translations after pruning:
0 → 1 (cette): containing/1.423, deplores/1.178, devolving/1.203, Five/1.323, gaggle/1.169, predicament/1.261, rebalanced/1.245, spine/1.326, Ursula/1.167, washing/1.394
1 → 2 (maison): halfway/1.638, timer/1.833, Wright/1.600
2 → 3 (est): east/0.683, is/0.786
3 → 4 (rouge, final/0): red/0.651
Best 5 sentences after composition with the bigram model:
this/5.173 house/8.588 is/4.624 red/13.11 </s>/5.221
this/5.173 home/13.06 is/5.629 red/13.11 </s>/5.221
this/5.173 homes/14.91 is/5.538 red/13.11 </s>/5.221
this/5.173 acquired/15.19 is/6.125 red/13.11 </s>/5.221
this/5.173 house/8.588 adopted/12.78 red/13.31 </s>/5.221
Final translation:
0 →this/8.798→ 1 →house/15.47→ 2 →is/8.558→ 3 →red/25.80→ 4 →</s>/10.77→ 5 (final/0)
i.e., “this house is red”.
Experiments
Dataset: transcribed debates of the Canadian House of Commons, French ↔ English. IBM2 trained with ≈ 50,000 sentence pairs.
Test set: 447 sentences, along with their gold alignments.
AER: 42.1%, precision: 58.93%, recall: 68.72%. About 30 minutes of computation. Not very accurate!
For translation, the typical automaton of possible translations has more than 10^6 transitions.
Conclusion
IBM models are extremely simplistic and by now outdated.
The AER obtained is poor.
Yet, they provide a clear conceptual interpretation of MT.
More recent approaches: Phrase-Based → Syntax-Based → Hierarchical Phrase-Based.
Possible improvements: add a fertility model; allow “null” alignments to appear in the source language.
D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Prentice Hall, 2nd ed., 2008.
K. Knight and Y. Al-Onaizan, Translation with finite-state devices, in Proceedings of the Third Conference of the AMTA, Springer-Verlag, 1998, pp. 421–437.
S. Kumar and W. Byrne, A weighted finite state transducer implementation of the alignment template model for statistical machine translation, in Proceedings of HLT-NAACL, Association for Computational Linguistics, 2003, pp. 63–70.