Factored Models
Word-Based SMT Models
Word-based SMT: Generative Model
Bakom huset hittade polisen en stor mängd narkotika .                     (source)
Bakom huset huset hittade polisen en stor mängd mängd narkotika .         (after fertility)
Behind the house found police a large quantity of narcotics .             (after word translation)
Behind the house police found a large quantity of narcotics .             (after re-ordering)
1. Fertility (and NULL insertion)
2. Word translation
3. Output ordering (re-ordering / distortion)
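A minimal sketch in Python of how the three steps can be traced on this sentence; the translation table is hypothetical and covers only this example, and the length of each entry encodes the fertility of the source word:

# Toy trace of the word-based generative story:
# fertility -> word-by-word translation -> re-ordering (distortion).
translation = {
    "Bakom": ["Behind"], "huset": ["the", "house"], "hittade": ["found"],
    "polisen": ["police"], "en": ["a"], "stor": ["large"],
    "mängd": ["quantity", "of"], "narkotika": ["narcotics"], ".": ["."],
}

source = "Bakom huset hittade polisen en stor mängd narkotika .".split()

# Steps 1+2: each source word produces fertility-many target words.
target = [w for s in source for w in translation[s]]
print(" ".join(target))   # Behind the house found police a large quantity of narcotics .

# Step 3: distortion re-orders the output (here: swap "found" and "police").
i, j = target.index("found"), target.index("police")
target[i], target[j] = target[j], target[i]
print(" ".join(target))   # Behind the house police found a large quantity of narcotics .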
Phrase-Based SMT Models
Phrase-based SMT: Generative Model
Bakom huset hittade polisen en stor mängd narkotika .                     (source)
Behind the house found police a large quantity of narcotics .             (after phrase segmentation + translation)
Behind the house police found a large quantity of narcotics .             (after re-ordering)
1. Phrase segmentation
2. Phrase translation
3. Output ordering

(The document-level decoder Docent also implements a phrase-based SMT model!)
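A corresponding minimal sketch for the phrase-based story, with a hypothetical segmentation and phrase table chosen for this sentence only:

# Toy trace of the phrase-based generative story:
# phrase segmentation -> phrase translation -> re-ordering.
phrase_table = {
    "Bakom huset": "Behind the house",
    "hittade": "found",
    "polisen": "police",
    "en stor mängd": "a large quantity of",
    "narkotika .": "narcotics .",
}

segments = ["Bakom huset", "hittade", "polisen", "en stor mängd", "narkotika ."]  # step 1
translated = [phrase_table[p] for p in segments]                                  # step 2
translated[1], translated[2] = translated[2], translated[1]                       # step 3
print(" ".join(translated))   # Behind the house police found a large quantity of narcotics .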
Problems with These Models
No use of morphology:
• treat inflectional variants (“look”, “looks”, “looked”) as completely different words!
• in learning translation models: knowing how to translate “look” doesn’t help to translate “looks”
Works fine for English (and reasonable amounts of data)
Problems:
• morphologically rich languages
• sparse data sets
• flexible word order
Factored Models
Morphology
• is productive
• well understood
• generalizable patterns
Factored models
• learn translations of base forms
• learn to map morphology
• learn to generate target surface form
Factored Models
Represent words by factors
Factored translation models
• Factored representation of words
[Diagram: each input word and each output word is represented as a vector of factors: word, lemma, part-of-speech, morphology, word class, ...; translation relates input factors to output factors.]

• Goals
– Generalization, e.g. by translating lemmas, not surface forms
– Richer model, e.g. using syntax for reordering, language modeling
Related work
• Back off to representations with richer statistics (lemma, etc.) [Nießen and Ney, 2001; Yang and Kirchhoff, 2006; Talbot and Osborne, 2006]
• Use of additional annotation in pre-processing (POS, syntax trees, etc.) [Collins et al., 2005; Crego et al., 2006]
• Use of additional annotation in re-ranking (morphological features, POS, syntax trees, etc.) [Och et al., 2004; Koehn and Knight, 2005]
→ we pursue an integrated approach
• Use of syntactic tree structure [Wu, 1997; Alshawi et al., 1998; Yamada and Knight, 2001; Melamed, 2004; Menezes and Quirk, 2005; Chiang, 2005; Galley et al., 2006]
→ may be combined with our approach
Factored Models
Represent words by factors? Why?
• combine scores for translating various factors
• back-off to other factors (lemma)
• use various factors for reordering
• better word alignment (?)
Better generalization
• can translate words that we haven’t seen in training
• better statistics for translation options
Richer model (more (linguistic) information)
• PoS, syntactic function, semantic role, ...
Remember Transfer-Based Systems?
[Diagram: transfer-based architecture from source to target language: analysis (create a factored representation), transfer, then generation (generate the surface form from the factored representation).]
Factored Models (Example)
Decomposing translation: example
[Diagram: input factors (word, lemma, part-of-speech, morphology) and output factors (word, lemma, part-of-speech); lemma and part-of-speech are translated separately, and the output surface word is generated from the translated factors.]
• translate lemma and POS separately
• generate surface word forms from translated factors
(analysis step → translation steps → generation step)
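A minimal sketch of this decomposition in Python; the lemma-translation, PoS-mapping, and generation tables are toy examples made up for illustration:

# Translate lemma and PoS separately, then generate the target surface form
# from the translated factors. All tables are hypothetical toys.
lemma_trans = {"hitta": "find", "hus": "house"}            # lemma -> lemma
pos_trans   = {"VB.PRET": "VBD", "NN.DEF.SG": "NN"}        # source PoS/morphology -> target PoS
generate    = {("find", "VBD"): "found",                   # (lemma, PoS) -> surface form
               ("house", "NN"): "house"}

def translate_factored(src_lemma, src_pos):
    tgt_lemma = lemma_trans[src_lemma]       # translation step on the lemma factor
    tgt_pos   = pos_trans[src_pos]           # translation step on the PoS factor
    return generate[(tgt_lemma, tgt_pos)]    # generation step to the surface form

print(translate_factored("hitta", "VB.PRET"))   # -> found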
Factored Models
Integrate with phrase-based SMT
• phrase translations over various factors
• single factors or combined factors
• easy to integrate as additional feature functions
• add probabilistic generation models in global search
• language models over various factors
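As a sketch of how this fits together: in the usual log-linear formulation of phrase-based SMT, the decoder searches for

  \hat{e} = \arg\max_e \sum_i \lambda_i \, h_i(e, f)

and the factored components simply contribute additional feature functions h_i: one phrase-translation score per translation step (e.g. lemma-to-lemma, morphology-to-PoS), one score per generation step (e.g. p(word | lemma, PoS)), and one language model per factor (surface LM, PoS LM), each with its own weight \lambda_i tuned in the usual way.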
Segmentation level
• n-grams for translation of factors
• words for generation
Factored Models
[Diagram: factored SMT architecture from source to target language: analysis creates a factored representation; decoding then translates and generates the target surface form from the factored representation.]
Factored Models
Framework - not a single model
• any kind of word-level factor is possible
• any combination of translation / generation steps
• combination of alternative translation paths
Training
• extract phrase translations from factored bitexts
• compute generation models from factored corpora
Decoding
• many more options need to be considered
• slow decoding
Training Factored Translation Models
• IBM word alignment + symmetrization
• Phrase extraction + scoring with MLE
Training Factored models
• use GIZA++ word alignments again + symmetrization
• phrase extraction + scoring on factors
• (could also use several factors in one step!)
[Word alignment matrix for the sentence pair “natürlich hat john spass am spiel” ↔ “naturally john has fun with the game”, shown both on the surface-word factor and on the POS factor (German: ADV V NNP NN P NN; English: ADV NNP V NN P DET NN).]

⇒ phrase pair on the word factor: natürlich hat john ↔ naturally john has
⇒ the same phrase pair on the POS factor: ADV V NNP ↔ ADV NNP V

→ create phrase tables for each translation factor
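A minimal sketch, assuming phrase pairs have already been extracted per factor from the word-aligned bitext, of the MLE (relative-frequency) scoring that turns the extracted pairs into a phrase table; the counts below are hypothetical:

from collections import Counter

# Hypothetical extracted phrase pairs on the POS factor: (source_phrase, target_phrase).
extracted = [
    ("ADV V NNP", "ADV NNP V"),
    ("ADV V NNP", "ADV NNP V"),
    ("ADV V NNP", "ADV NNP V"),
    ("ADV V NNP", "ADV V NNP"),
    ("NN P NN", "NN P DET NN"),
]

pair_counts = Counter(extracted)
src_counts = Counter(src for src, _ in extracted)

# MLE / relative frequency: p(target | source) = count(source, target) / count(source)
phrase_table = {
    (src, tgt): n / src_counts[src] for (src, tgt), n in pair_counts.items()
}

print(phrase_table[("ADV V NNP", "ADV NNP V")])   # 0.75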
Training Factored Generation Models
Generation steps map target factors to target factors
• typically trained on target side of parallel corpus
• may be trained on additional monolingual data
Example: The/det man/nn sleeps/vbz
• count collection
  - count(the,det)++
  - count(man,nn)++
  - count(sleeps,vbz)++
• evidence for probability distributions (MLE)
  - p(det|the), p(the|det)
  - p(nn|man), p(man|nn)
  - p(vbz|sleeps), p(sleeps|vbz)
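A minimal sketch, assuming the target side of the corpus comes as word/tag tokens like the example above, of how such a generation model could be estimated:

from collections import Counter

# Target-side corpus in word/tag format (here: just the example sentence).
corpus = ["The/det man/nn sleeps/vbz"]

pair_counts = Counter()
word_counts = Counter()
tag_counts = Counter()

# Count collection: one count per (word, tag) occurrence.
for sentence in corpus:
    for token in sentence.split():
        word, tag = token.rsplit("/", 1)
        word = word.lower()
        pair_counts[(word, tag)] += 1
        word_counts[word] += 1
        tag_counts[tag] += 1

# MLE estimates for both generation directions.
def p_tag_given_word(tag, word):
    return pair_counts[(word, tag)] / word_counts[word]

def p_word_given_tag(word, tag):
    return pair_counts[(word, tag)] / tag_counts[tag]

print(p_tag_given_word("nn", "man"))      # 1.0 on this tiny corpus
print(p_word_given_tag("sleeps", "vbz"))  # 1.0 on this tiny corpus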
Factored Models
Use benefits of general phrase-based SMT!
• factored models as alternative paths (or backoff)
Factored models
• Basic phrase-based SMT is very powerful!
• Why generalize if we know the specific translation?
[Diagram: two alternative paths: translate the surface word directly (word → word), or translate lemma and morphology/part-of-speech and generate the surface form from them.]
• prefer surface model for known words
• use morphgen model as back-off
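A minimal sketch of the back-off idea; the decoder actually considers both paths as alternative decoding paths during search, so the hard if/else below only illustrates the preference, and all tables are hypothetical toys:

surface_table = {"hittade": "found"}                     # word -> word
lemma_table   = {"hitta": "find"}                        # lemma -> lemma
generate      = {("find", "past"): "found",              # (lemma, morphology) -> surface
                 ("find", "pres"): "finds"}

def analyse(word):
    # toy analyser: Swedish -ade marks the past tense, lemma ends in -a
    if word.endswith("ade"):
        return word[:-3] + "a", "past"
    return word, "pres"

def translate(word):
    if word in surface_table:                 # known word: use the surface model
        return surface_table[word]
    lemma, morph = analyse(word)              # otherwise back off to morphgen
    return generate[(lemma_table[lemma], morph)]

print(translate("hittade"))   # "found" via the surface path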
Factored Models
Could use generation step to enrich language model
• No factored translation
• Enable PoS LM (better model for word order?!)
Factored models
• could also use factors only to enrich target language
[Diagram: the input surface word is translated into the output surface word plus an output part-of-speech factor.]
→ generalized re-ordering (include POS-LM!)
Factored Models (Results)
Do not always lead to improvements
Factored models: Results & Summary
Some Results (German/English, Koehn):
System              In-domain   Out-of-domain
Baseline            18.19       15.01
With POS LM         19.05       15.03
Morphgen model      14.38       11.65
Both model paths    19.47       15.23
• factors on token level
• flexible SMT framework
• many possible factors & translation/generation steps
• not much success yet ...
Factored Models
Full support in Moses:
• http://www.statmt.org/moses/?n=Moses.FactoredTutorial
Data Format (example):
• 4 source language factors (word|lemma|pos|morph)
• 3 target language factors (word|lemma|pos)
==> factored-corpus/proj-syndicate.de <==
korruption|korruption|nn|nn.fem.cas.sg floriert|florieren|vvfin|vvfin .|.|per|per

==> factored-corpus/proj-syndicate.en <==
corruption|corruption|nn flourishes|flourish|nns .|.|.
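A minimal sketch of how one might produce this pipe-separated format from pre-annotated tokens; the token tuples below are copied from the example line, and in practice they would come from the taggers and lemmatizers run in pre-processing:

# Build a factored-corpus line in Moses' word|lemma|pos[|morph] format
# from annotated tokens.
en_tokens = [
    ("corruption", "corruption", "nn"),
    ("flourishes", "flourish", "nns"),
    (".", ".", "."),
]

def to_factored_line(tokens):
    # join the factors of each token with '|' and the tokens with spaces
    return " ".join("|".join(factors) for factors in tokens)

print(to_factored_line(en_tokens))
# corruption|corruption|nn flourishes|flourish|nns .|.|.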
Factored Models
Training example:
• translate surface words to surface + PoS
• include surface word LM and PoS LM
train-model.perl \
  --root-dir pos \
  --corpus factored-corpus/proj-syndicate.1000 \
  --f de --e en \
  --lm 0:3:factored-corpus/surface.lm:0 \
  --lm 2:3:factored-corpus/pos.lm:0 \
  --translation-factors 0-0,2 \
  --input-factor-max 4
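Reading the flags against the data format above: factor 0 is the surface word and factor 2 the PoS, so --translation-factors 0-0,2 translates the source word into the target word plus PoS, and the two --lm options attach one language model to the surface factor (0) and one to the PoS factor (2).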
Factored Models
Phrase table:
• source word translates to 2 factors (word+PoS)
• 4 scores as usual
frage ||| issue|nn ||| 0.25 0.285714 0.25 0.166667
frage ||| question|nn ||| 0.75 0.625 0.75 0.416667
" ) , ein neuer film ||| "|" a|dt new|jj film|nn ||| 1 0.00403191 1 0.128157
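A minimal sketch showing how such a factored phrase-table line can be split into its fields; the field layout follows the excerpt above, and nothing Moses-specific is assumed:

# Parse one line of the factored phrase table shown above:
# source phrase ||| target phrase (factors joined by '|') ||| scores
line = "frage ||| question|nn ||| 0.75 0.625 0.75 0.416667"

src, tgt, scores = [field.strip() for field in line.split("|||")]
tgt_factors = [token.split("|") for token in tgt.split()]   # [['question', 'nn']]
scores = [float(s) for s in scores.split()]                  # the 4 usual scores

print(src)          # frage
print(tgt_factors)  # [['question', 'nn']]
print(scores)       # [0.75, 0.625, 0.75, 0.416667]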
Factored Models
A more complex example:
• translate lemmas
• generate PoS from target language lemma
• translate morphology into PoS
• generate surface forms from lemma and PoS
train-model.perl \
  --root-dir morphgen \
  --corpus factored-corpus/proj-syndicate.1000 \
  --f de --e en \
  --lm 0:3:factored-corpus/surface.lm:0 \
  --lm 2:3:factored-corpus/pos.lm:0 \
  --translation-factors 1-1+3-2 \
  --generation-factors 1-2+1,2-0 \
  --decoding-steps t0,g0,t1,g1
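Reading the factor indices against the data format above (source: 0=word, 1=lemma, 2=pos, 3=morph; target: 0=word, 1=lemma, 2=pos): --translation-factors 1-1+3-2 translates lemma to lemma and morphology to PoS; --generation-factors 1-2+1,2-0 generates the target PoS from the target lemma and the surface word from lemma plus PoS; --decoding-steps t0,g0,t1,g1 applies these translation (t) and generation (g) steps in that order.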
Factored Models
Multiple decoding paths:
• path 1: same as previous model (t0, g0, t1, g1)
• path 2: translate words to words+PoS (t2)
train-model.perl \
  --corpus factored-corpus/proj-syndicate.1000 \
  --root-dir morphgen-backoff \
  --f de --e en \
  --lm 0:3:factored-corpus/surface.lm:0 \
  --lm 2:3:factored-corpus/pos.lm:0 \
  --translation-factors 1-1+3-2+0-0,2 \
  --generation-factors 1-2+1,2-0 \
  --decoding-steps t0,g0,t1,g1:t2
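The colon in --decoding-steps t0,g0,t1,g1:t2 separates the two alternative decoding paths: the morphgen path from the previous model, and the direct word-to-word+PoS translation (t2 refers to the third mapping, 0-0,2, in --translation-factors).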
Summary on Factored Models
Add word-level factors
• linguistic features
• source and target language factors
Phrase-based SMT with factors
• multiple translation steps with factors
• generation steps
• multiple language models over single factors
Decoding
• fits well in standard log-linear framework
• but is much slower