
Factored Models - Tieteenalat | Humanistinen tiedekunta...Factored Translation Models Syntax-Oriented Statistical Models Example-based MT Factored models: Results & Summary Some Results


Factored Models

Word-Based SMT Models

Word-based SMT: Generative Model

• Source: Bakom huset hittade polisen en stor mängd narkotika .
• After fertility: Bakom huset huset hittade polisen en stor mängd mängd narkotika .
• After word translation: Behind the house found police a large quantity of narcotics .
• After re-ordering: Behind the house police found a large quantity of narcotics .

Steps:
1. Fertility (and NULL insertion)
2. Word translation
3. Output ordering: re-ordering (distortion)
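The three steps of the word-based generative story can be traced on the Swedish example sentence. The following is only a toy sketch: the fertility table, the per-copy lexicon and the swap positions are hand-written assumptions for this one sentence, whereas a real model would learn probability distributions for each step.

```python
# Toy walk-through of the word-based generative story on one sentence.
# All tables below are hand-written assumptions, not learned parameters.

source = "Bakom huset hittade polisen en stor mängd narkotika .".split()

# Step 1: fertility. Each source word is copied as many times as its
# fertility says (default 1).
fertility = {"huset": 2, "mängd": 2}

def apply_fertility(words):
    expanded = []
    for word in words:
        expanded.extend([word] * fertility.get(word, 1))
    return expanded

# Step 2: word translation. The k-th copy of a source word is translated
# by the k-th entry of its list.
lexicon = {
    "Bakom": ["Behind"], "huset": ["the", "house"], "hittade": ["found"],
    "polisen": ["police"], "en": ["a"], "stor": ["large"],
    "mängd": ["quantity", "of"], "narkotika": ["narcotics"], ".": ["."],
}

def translate(words):
    copies_seen = {}
    output = []
    for word in words:
        k = copies_seen.get(word, 0)
        output.append(lexicon[word][k])
        copies_seen[word] = k + 1
    return output

# Step 3: re-ordering (distortion), here a single swap of two positions.
def swap(words, i, j):
    reordered = list(words)
    reordered[i], reordered[j] = reordered[j], reordered[i]
    return reordered

expanded = apply_fertility(source)
glossed = translate(expanded)
final = swap(glossed, 3, 4)  # "found police" becomes "police found"

print(" ".join(expanded))
print(" ".join(glossed))
print(" ".join(final))
```

The three printed lines correspond to the intermediate sentences shown on the slide.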

Phrase-Based SMT Models

Phrase-based SMT: Generative Model

• Source: Bakom huset hittade polisen en stor mängd narkotika .
• After phrase translation: Behind the house found police a large quantity of narcotics .
• After re-ordering: Behind the house police found a large quantity of narcotics .

Steps:
1. Phrase segmentation
2. Phrase translation
3. Output ordering

(The document-level decoder docent also implements a phrase-based SMT model!)

Problems with These Models

No use of morphology:
• treat inflectional variants (“look”, “looks”, “looked”) as completely different words!
• in learning translation models: knowing how to translate “look” doesn’t help to translate “looks”

Works fine for English (and reasonable amounts of data)

Problems:
• morphologically rich languages
• sparse data sets
• flexible word order


Factored Models

Morphology
• is productive
• well understood
• generalizable patterns

Factored models
• learn translations of base forms
• learn to map morphology
• learn to generate target surface form

Factored Models

Represent words by factors

Factored translation models

• Factored representation of words

[Figure: input and output words, each represented as a set of factors: word, lemma, part-of-speech, morphology, word class, ...]

• Goals
  – Generalization, e.g. by translating lemmas, not surface forms
  – Richer model, e.g. using syntax for reordering, language modeling

Koehn, U Edinburgh ESSLLI Summer School Day 5


Related work

• Back off to representations with richer statistics (lemma, etc.) [Nießen and Ney 2001; Yang and Kirchhoff 2006; Talbot and Osborne 2006]

• Use of additional annotation in pre-processing (POS, syntax trees, etc.) [Collins et al. 2005; Crego et al. 2006]

• Use of additional annotation in re-ranking (morphological features, POS, syntax trees, etc.) [Och et al. 2004; Koehn and Knight 2005]

→ we pursue an integrated approach

• Use of syntactic tree structure [Wu 1997; Alshawi et al. 1998; Yamada and Knight 2001; Melamed 2004; Menezes and Quirk 2005; Chiang 2005; Galley et al. 2006]

→ may be combined with our approach


Factored Models

Represent words by factors? Why?
• combine scores for translating various factors
• back-off to other factors (lemma)
• use various factors for reordering
• better word alignment (?)

Better generalization
• can translate words that we haven’t seen in training
• better statistics for translation options

Richer model (more (linguistic) information)
• PoS, syntactic function, semantic role, ...

Remember Transfer-Based Systems?

[Figure: transfer diagram from source language to target language: analysis (create a factored representation), transfer, generation (generate surface form from factored representation)]


Factored Models (Example)

Decomposing translation: example

[Figure: input and output factor columns (word, lemma, part-of-speech, morphology), connected by an analysis step, translation steps, and a generation step]

• translate lemma and POS separately
• generate surface word forms from translated factors

Jörg Tiedemann 5/37

Factored Models

Integrate with phrase-based SMT
• phrase translations over various factors
• single factors or combined factors
• easy to integrate as additional feature functions
• add probabilistic generation models in global search
• language models over various factors

Segmentation level
• n-grams for translation of factors
• words for generation

Factored Models

[Figure: source language to target language: analysis (create a factored representation), followed by translation and generation during decoding (translate and generate surface form from factored representation)]

Factored Models

Framework - not a single model
• any kind of word-level factor is possible
• any combination of translation / generation steps
• combination of alternative translation paths

Training
• extract phrase translations from factored bitexts
• compute generation models from factored corpora

Decoding
• many more options need to be considered
• slow decoding


Training Factored Translation Models

• IBM word alignment + symmetrization
• Phrase extraction + scoring with MLE

Training Factored models

• use GIZA++ word alignments again + symmetrization
• phrase extraction + scoring on factors
• (could also use several factors in one step!)

[Figure: word alignment between "natürlich hat john spass am spiel" and "naturally john has fun with the game"]

⇒ natürlich hat john ||| naturally john has

[Figure: the same alignment over part-of-speech tags, "ADV V NNP NN P NN" and "ADV NNP V NN P DET NN"]

⇒ ADV V NNP ||| ADV NNP V

→ create phrase tables for each translation factor

Training Factored Generation Models

Generation steps map target factors to target factors
• typically trained on target side of parallel corpus
• may be trained on additional monolingual data

Example: The/det man/nn sleeps/vbz
• count collection
  - count(the,det)++
  - count(man,nn)++
  - count(sleeps,vbz)++
• evidence for probability distributions (MLE)
  - p(det|the), p(the|det)
  - p(nn|man), p(man|nn)
  - p(vbz|sleeps), p(sleeps|vbz)
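The count collection and the maximum likelihood estimates for a generation model can be sketched in a few lines. This is a minimal illustration, not the Moses training code: the function name is hypothetical, the word/tag separator is taken from the slide's example, and the second training sentence is an invented addition so that the distributions are not all trivially 1.

```python
from collections import Counter

def train_generation_model(tagged_sentences):
    """Collect word/tag counts and turn them into MLE probabilities.

    Returns two dicts: p(tag | word) keyed by (tag, word) and
    p(word | tag) keyed by (word, tag).
    """
    pair_counts = Counter()
    word_counts = Counter()
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for token in sentence.split():
            word, tag = token.rsplit("/", 1)
            word = word.lower()  # the slide counts "The" as "the"
            pair_counts[(word, tag)] += 1
            word_counts[word] += 1
            tag_counts[tag] += 1
    p_tag_given_word = {
        (tag, word): count / word_counts[word]
        for (word, tag), count in pair_counts.items()
    }
    p_word_given_tag = {
        (word, tag): count / tag_counts[tag]
        for (word, tag), count in pair_counts.items()
    }
    return p_tag_given_word, p_word_given_tag

corpus = [
    "The/det man/nn sleeps/vbz",  # example from the slide
    "The/det dog/nn sleeps/vbz",  # invented second sentence
]
p_tag, p_word = train_generation_model(corpus)

print(p_tag[("det", "the")])  # p(det | the)
print(p_word[("man", "nn")])  # p(man | nn)
```

Because the model only needs target-side tokens with their factors, the same function could be run over additional monolingual data.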

Factored Models

Use benefits of general phrase-based SMT!
• factored models as alternative paths (or backoff)

Factored models
• Basic phrase-based SMT is very powerful!
• Why generalize if we know a specific translation?

[Figure: two alternative paths from input to output: direct word-to-word translation, or translation of lemma and part-of-speech/morphology followed by generation of the word]

• prefer surface model for known words
• use morphgen model as back-off
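The idea of preferring the surface model and falling back to the morphological generation path can be illustrated with a toy lookup. Everything here is an assumption made for illustration: the tables, the three-factor token format word|lemma|morph, and the hard if/else rule. A real decoder would more likely score the options from both paths against each other than apply a strict rule.

```python
# Toy tables, invented for illustration only.
SURFACE = {"haus": "house"}                      # word -> word
LEMMA = {"haus": "house"}                        # lemma -> lemma
MORPH_TO_POS = {"nn.sg": "nn", "nn.pl": "nns"}   # source morphology -> target PoS
GENERATE = {("house", "nn"): "house", ("house", "nns"): "houses"}

def translate_token(token):
    """Translate one word|lemma|morph token and report the path used."""
    word, lemma, morph = token.split("|")
    if word in SURFACE:
        # Known surface form: use the specific translation directly.
        return SURFACE[word], "surface"
    # Unknown surface form: translate the factors, then generate the word.
    target_lemma = LEMMA[lemma]
    target_pos = MORPH_TO_POS[morph]
    return GENERATE[(target_lemma, target_pos)], "morphgen"

print(translate_token("haus|haus|nn.sg"))
print(translate_token("häuser|haus|nn.pl"))
```

The second token has no entry in the surface table, so it can only be translated through its lemma and morphology.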

Factored Models

Could use generation step to enrich language model
• No factored translation
• Enable PoS LM (better model for word order?!)

Factored models
• could also use factors only to enrich target language

[Figure: input word translated to output word, with part-of-speech added on the output side]

→ generalized re-ordering (include POS-LM!)


Factored Models (Results)

Do not always lead to improvements

Factored models: Results & Summary

Some Results (German/English, Koehn):

System             In-domain   Out-of-domain
Baseline           18.19       15.01
With POS LM        19.05       15.03
Morphgen model     14.38       11.65
Both model paths   19.47       15.23

• factors on token level
• flexible SMT framework
• many possible factors & translation/generation steps
• not much success yet ...

Factored Models

Full support in Moses:
• http://www.statmt.org/moses/?n=Moses.FactoredTutorial

Data Format (example):
• 4 source language factors (word|lemma|pos|morph)
• 3 target language factors (word|lemma|pos)

    ==> factored-corpus/proj-syndicate.de <==
    korruption|korruption|nn|nn.fem.cas.sg floriert|florieren|vvfin|vvfin .|.|per|per

    ==> factored-corpus/proj-syndicate.en <==
    corruption|corruption|nn flourishes|flourish|nns .|.|.
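The factored data format is easy to read programmatically: tokens are separated by whitespace and factors by "|". The helper below is a hypothetical sketch (its name and the factor labels passed to it are chosen here, following the factor order given on the slide) and it does not handle tokens whose surface form itself contains "|".

```python
def parse_factored_sentence(line, factor_names):
    """Split a factored sentence into one dict per token."""
    tokens = []
    for token in line.split():
        values = token.split("|")
        if len(values) != len(factor_names):
            raise ValueError(f"expected {len(factor_names)} factors in {token!r}")
        tokens.append(dict(zip(factor_names, values)))
    return tokens

de = parse_factored_sentence(
    "korruption|korruption|nn|nn.fem.cas.sg floriert|florieren|vvfin|vvfin .|.|per|per",
    ["word", "lemma", "pos", "morph"],
)
en = parse_factored_sentence(
    "corruption|corruption|nn flourishes|flourish|nns .|.|.",
    ["word", "lemma", "pos"],
)

print([t["lemma"] for t in de])
print([t["pos"] for t in en])
```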

Factored Models

Training example:

• translate surface words to surface + PoS
• include surface word LM and PoS LM

    train-model.perl \
      --root-dir pos \
      --corpus factored-corpus/proj-syndicate.1000 \
      --f de --e en \
      --lm 0:3:factored-corpus/surface.lm:0 \
      --lm 2:3:factored-corpus/pos.lm:0 \
      --translation-factors 0-0,2 \
      --input-factor-max 4
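The two `--lm` arguments in this command appear to follow a colon-separated pattern of factor index, n-gram order, file name and a trailing type code. The sketch below only unpacks that string; the class and function names are hypothetical, and reading the last field as an LM type is an assumption rather than something the slide states.

```python
from typing import NamedTuple, Optional

class LMSpec(NamedTuple):
    factor: int             # index of the target factor the LM is built over
    order: int              # n-gram order
    path: str               # language model file
    lm_type: Optional[int]  # trailing code, assumed to select the LM type

def parse_lm_spec(spec):
    """Unpack a 'factor:order:file[:type]' string."""
    parts = spec.split(":")
    if len(parts) < 3:
        raise ValueError(f"cannot parse LM spec {spec!r}")
    lm_type = int(parts[3]) if len(parts) > 3 else None
    return LMSpec(int(parts[0]), int(parts[1]), parts[2], lm_type)

surface_lm = parse_lm_spec("0:3:factored-corpus/surface.lm:0")
pos_lm = parse_lm_spec("2:3:factored-corpus/pos.lm:0")
print(surface_lm)
print(pos_lm)
```

Read this way, the command requests a trigram model over target factor 0 (the surface word) and a trigram model over target factor 2 (the PoS tag).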

Factored Models

Phrase table:

• source word translates to 2 factors (word+PoS)
• 4 scores as usual

    frage ||| issue|nn ||| 0.25 0.285714 0.25 0.166667
    frage ||| question|nn ||| 0.75 0.625 0.75 0.416667
    " ) , ein neuer film ||| "|" a|dt new|jj film|nn ||| 1 0.00403191 1 0.128157
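A phrase table entry of this form can be split into its source phrase, its factored target phrase and its scores. The function below is a hypothetical sketch that reads only the first three " ||| " fields and ignores anything after them; it makes no claim about which score is which.

```python
def parse_phrase_table_line(line):
    """Return (source tokens, target tokens as factor tuples, scores)."""
    fields = line.strip().split(" ||| ")
    source, target, scores = fields[0], fields[1], fields[2]
    target_tokens = [tuple(token.split("|")) for token in target.split()]
    return source.split(), target_tokens, [float(s) for s in scores.split()]

entries = [
    parse_phrase_table_line(line)
    for line in [
        'frage ||| issue|nn ||| 0.25 0.285714 0.25 0.166667',
        'frage ||| question|nn ||| 0.75 0.625 0.75 0.416667',
        '" ) , ein neuer film ||| "|" a|dt new|jj film|nn ||| 1 0.00403191 1 0.128157',
    ]
]

for source, target, scores in entries:
    print(source, target, scores)
```

In the two entries for "frage", the first and third scores each sum to 1 across the two translations, which fits the "scoring with MLE" step over the extracted phrase pairs.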


Factored Models

A more complex example:

• translate lemmas
• generate PoS from target language lemma
• translate morphology into PoS
• generate surface forms from lemma and PoS

    train-model.perl \
      --root-dir morphgen \
      --corpus factored-corpus/proj-syndicate.1000 \
      --f de --e en \
      --lm 0:3:factored-corpus/surface.lm:0 \
      --lm 2:3:factored-corpus/pos.lm:0 \
      --translation-factors 1-1+3-2 \
      --generation-factors 1-2+1,2-0 \
      --decoding-steps t0,g0,t1,g1

Factored Models

Multiple decoding paths:

• path 1: same as previous model (t0, g0, t1, g1)
• path 2: translate words to words+PoS (t2)

    train-model.perl \
      --corpus factored-corpus/proj-syndicate.1000 \
      --root-dir morphgen-backoff \
      --f de --e en \
      --lm 0:3:factored-corpus/surface.lm:0 \
      --lm 2:3:factored-corpus/pos.lm:0 \
      --translation-factors 1-1+3-2+0-0,2 \
      --generation-factors 1-2+1,2-0 \
      --decoding-steps t0,g0,t1,g1:t2
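The factor specifications in this command are compact, so it can help to spell them out. The sketch below parses the strings as they appear on the slide, assuming "+" separates steps, "-" separates input from output factors, "," separates factor indices (or steps within a path) and ":" separates decoding paths. The function names are hypothetical, and the factor labels come from the data format example (source: word|lemma|pos|morph, target: word|lemma|pos).

```python
SOURCE = ["word", "lemma", "pos", "morph"]
TARGET = ["word", "lemma", "pos"]

def parse_factor_mappings(spec):
    """'1-1+3-2+0-0,2' -> [([1], [1]), ([3], [2]), ([0], [0, 2])]"""
    mappings = []
    for step in spec.split("+"):
        inputs, outputs = step.split("-")
        mappings.append((
            [int(i) for i in inputs.split(",")],
            [int(i) for i in outputs.split(",")],
        ))
    return mappings

def parse_decoding_steps(spec):
    """'t0,g0,t1,g1:t2' -> [['t0', 'g0', 't1', 'g1'], ['t2']]"""
    return [path.split(",") for path in spec.split(":")]

def describe(step, translation, generation):
    """Render one decoding step such as 't0' or 'g1' with factor names."""
    kind, index = step[0], int(step[1:])
    if kind == "t":  # translation step: source factors -> target factors
        inputs, outputs = translation[index]
        in_names = [SOURCE[i] for i in inputs]
    else:            # generation step: target factors -> target factors
        inputs, outputs = generation[index]
        in_names = [TARGET[i] for i in inputs]
    out_names = [TARGET[i] for i in outputs]
    return f"{step}: {'+'.join(in_names)} -> {'+'.join(out_names)}"

translation = parse_factor_mappings("1-1+3-2+0-0,2")
generation = parse_factor_mappings("1-2+1,2-0")
paths = parse_decoding_steps("t0,g0,t1,g1:t2")

for number, path in enumerate(paths, start=1):
    print(f"path {number}:", "; ".join(describe(s, translation, generation) for s in path))
```

Under these assumptions the first path reads as lemma to lemma, lemma to PoS, morphology to PoS, and lemma plus PoS to word, which matches the step list of the morphgen model, and the second path is the direct word to word+PoS translation.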

Summary on Factored Models

Add word-level factors
• linguistic features
• source and target language factors

Phrase-based SMT with factors
• multiple translation steps with factors
• generation steps
• multiple language models over single factors

Decoding
• fits well in standard log-linear framework
• but is much slower
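The reason factored components fit the log-linear framework is that each translation step, generation step and language model simply contributes one more weighted feature to the hypothesis score. The sketch below shows that combination as a weighted sum of log feature values; the feature names, probabilities and weights are invented for illustration and do not come from any trained system.

```python
import math

def loglinear_score(feature_probs, weights):
    """Weighted sum of log feature values: sum_i lambda_i * log h_i."""
    return sum(weights[name] * math.log(p) for name, p in feature_probs.items())

# Invented weights for five hypothetical feature functions.
weights = {
    "tm_lemma": 1.0,     # lemma translation step
    "tm_morph": 1.0,     # morphology-to-PoS translation step
    "gen_surface": 1.0,  # surface form generation step
    "lm_surface": 0.5,   # language model over surface words
    "lm_pos": 0.5,       # language model over PoS tags
}

# Two invented hypotheses that differ only in their PoS LM probability.
hyp_a = {"tm_lemma": 0.6, "tm_morph": 0.5, "gen_surface": 0.9,
         "lm_surface": 0.02, "lm_pos": 0.30}
hyp_b = {"tm_lemma": 0.6, "tm_morph": 0.5, "gen_surface": 0.9,
         "lm_surface": 0.02, "lm_pos": 0.05}

score_a = loglinear_score(hyp_a, weights)
score_b = loglinear_score(hyp_b, weights)
print(score_a, score_b)
```

With identical translation and generation scores, the hypothesis with the more probable PoS sequence gets the higher total, which is how a PoS language model can influence word order.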