Direct Translation Approaches: Statistical Machine Translation Stephan Vogel, Alicia Tribble Interactive Systems Lab Carnegie Mellon University & University Karlsruhe Speech-to-Speech Translation Workshop ESSLLI 2002, Trento, Italy


Direct Translation Approaches: Statistical Machine Translation

Stephan Vogel, Alicia Tribble

Interactive Systems Lab
Carnegie Mellon University & University Karlsruhe

Speech-to-Speech Translation Workshop
ESSLLI 2002, Trento, Italy


16 July 2002 Speech-to-Speech Translation Workshop, ESSLLI, Trento, Italy 2

Overview

Translation Approaches
Statistical Machine Translation
Translating with Cascaded Transducers
Experiments on Nespole Data


Translation Approaches

Interlingua based
Transfer based
Direct
  Example based
  Statistical


Statistical Machine Translation

Based on Bayes' decision rule:

$$\hat{e} = \arg\max_{e} \, p(e \mid f) = \arg\max_{e} \, p(e)\, p(f \mid e)$$
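A minimal sketch of the decision rule, assuming a tiny set of candidate translations and hand-set probabilities. The tables `LM` and `TM` and the function `best_translation` are hypothetical names used only for illustration; a real system estimates both models from corpora and searches a far larger space.

```python
import math

# Hypothetical toy probabilities, not taken from any trained model.
LM = {"the house": 0.6, "house the": 0.1}            # p(e)
TM = {("das haus", "the house"): 0.5,                # p(f | e)
      ("das haus", "house the"): 0.5}

def best_translation(f, candidates):
    """Return the candidate e that maximizes log p(e) + log p(f | e)."""
    def score(e):
        return math.log(LM[e]) + math.log(TM[(f, e)])
    return max(candidates, key=score)

print(best_translation("das haus", ["the house", "house the"]))
```

Both candidates explain the source equally well under the translation model here, so the language model decides between them.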


Tasks in SMT

Modelling: build statistical models which capture characteristic features of translation equivalences and of the target language

Training: train the translation model on a bilingual corpus and the language model on a monolingual corpus

Decoding: find the best translation for new sentences according to the models


Alignment Example


Translation Models

IBM1: lexical probabilities only
IBM2: lexicon plus absolute position
HMM: lexicon plus relative position
IBM3: plus fertilities
IBM4: inverted relative position alignment
IBM5: non-deficient version of model 4

[Brown et al. 93; Vogel et al. 96]
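The simplest model in this list, IBM1, can be trained with a few lines of EM. The following is a sketch under simplifying assumptions (no NULL word, uniform initialization, a fixed number of iterations); the function name `train_ibm1` and the toy corpus are illustrative and not taken from the slides.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """pairs: list of (source_words, target_words).
    Returns a table t with t[(f, e)] approximating p(f | e)."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))   # uniform start
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: distribute one count for f over all target words
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm1(corpus)
print(t[("haus", "house")], t[("das", "house")])
```

On this toy corpus the lexicon entry for ("haus", "house") ends up with more probability mass than ("das", "house"), because "das" is better explained by "the".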


HMM Alignment Model

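$$
\begin{aligned}
p(f \mid e) &= \sum_{a} p(f_1^J, a_1^J \mid e_1^I) \\
&= \sum_{a} \prod_{j} p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) \\
&= \sum_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j}) \\
&\approx \max_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})
\end{aligned}
$$

The alignment $a_j$ of the current word $f_j$ depends on the alignment $a_{j-1}$ of the previous word $f_{j-1}$.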
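The maximum approximation in the last line can be computed with the Viterbi algorithm. The sketch below assumes two hypothetical probability tables, `lex` for $p(f_j \mid e_i)$ and `jump` for the transition probability as a function of the jump width, a uniform start distribution, and a small floor for unseen events; none of these choices come from the slides.

```python
import math

def viterbi_alignment(src, tgt, lex, jump, floor=1e-10):
    """Return the alignment a_1..a_J (0-based target positions) that
    maximizes prod_j p(a_j | a_{j-1}) * p(f_j | e_{a_j})."""
    I, J = len(tgt), len(src)

    def lp(table, key):
        return math.log(table.get(key, floor))

    # delta[i]: best log score of a partial path ending in target position i
    delta = [lp(lex, (src[0], tgt[i])) - math.log(I) for i in range(I)]
    back = []
    for j in range(1, J):
        new, ptr = [], []
        for i in range(I):
            prev = max(range(I), key=lambda k: delta[k] + lp(jump, i - k))
            new.append(delta[prev] + lp(jump, i - prev)
                       + lp(lex, (src[j], tgt[i])))
            ptr.append(prev)
        delta = new
        back.append(ptr)
    # trace back from the best final state
    a = [max(range(I), key=lambda i: delta[i])]
    for ptr in reversed(back):
        a.append(ptr[a[-1]])
    return a[::-1]

lex = {("das", "the"): 0.9, ("haus", "house"): 0.9}
jump = {-1: 0.2, 0: 0.2, 1: 0.6}
print(viterbi_alignment(["das", "haus"], ["the", "house"], lex, jump))
```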


Phrase Translation

Why?
  To capture context
  Local word reordering

How?
  Train alignment model
  Extract phrase-to-phrase translations from Viterbi path

Notes:
  Often better results when training target to source for extraction of phrase translations
  Phrases are not fully integrated into the alignment model; they are extracted only after training is completed
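One way to read "extract phrase-to-phrase translations from Viterbi path" is sketched below. It uses a common consistency heuristic: a source span is paired with the target span covered by its alignment points, unless a source word outside the span aligns into that target span. This is an illustrative assumption, not necessarily the extraction procedure used by the authors; `extract_phrases` and the example sentence are hypothetical.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """alignment[j] is the index of the target word aligned to source word j."""
    pairs = set()
    J = len(src)
    for j1 in range(J):
        for j2 in range(j1, min(j1 + max_len, J)):
            inside = alignment[j1:j2 + 1]
            i1, i2 = min(inside), max(inside)
            # reject if a source word outside the span aligns into [i1, i2]
            outside = alignment[:j1] + alignment[j2 + 1:]
            if any(i1 <= i <= i2 for i in outside):
                continue
            pairs.add((" ".join(src[j1:j2 + 1]), " ".join(tgt[i1:i2 + 1])))
    return pairs

src = "ich habe ein zimmer reserviert".split()
tgt = "i have booked a room".split()
alignment = [0, 1, 3, 4, 2]
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

The pair ("ein zimmer reserviert", "booked a room") shows how a phrase entry captures local word reordering that a word-by-word lexicon cannot express.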


Translation with Transducers

Transducer:
  Finite state machine
  Reads a sequence of words, writes a sequence of words
  Output vocabulary can be different from input vocabulary

Transducer used in current implementation:
  Tree transducer, i.e. prefix tree over input strings
  Output from final states
  Used to encode lexicon, phrase translations, bilingual word classes and grammars
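A prefix tree with outputs at final states can be sketched as follows. The class name `TreeTransducer` and its methods are hypothetical and only illustrate the data structure described above, not the actual implementation.

```python
class TreeTransducer:
    """Prefix tree over input word sequences; outputs stored at final states."""

    def __init__(self):
        self.children = {}   # input word -> child node
        self.outputs = []    # translations emitted if this node is final

    def add(self, source_words, target_words):
        node = self
        for w in source_words:
            node = node.children.setdefault(w, TreeTransducer())
        node.outputs.append(target_words)

    def matches(self, words, start):
        """Yield (end, output) for every entry that matches words[start:end]."""
        node = self
        for end in range(start, len(words)):
            node = node.children.get(words[end])
            if node is None:
                return
            for out in node.outputs:
                yield end + 1, out

lexicon = TreeTransducer()
lexicon.add(["guten"], ["good"])
lexicon.add(["guten", "morgen"], ["good", "morning"])
print(list(lexicon.matches("guten morgen zusammen".split(), 0)))
```

A single walk down the tree returns all entries that start at a given position, which is convenient when single words and longer phrases share prefixes.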


Cascaded Transducers

Generalization through cascaded transducers: replace words by category labels and have a transducer for each category

[Vogel, Ney 2000]
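The idea of category labels can be illustrated with a two-pass sketch: a category transducer first rewrites words as labels, and a top-level entry then translates the label sequence. The tables `CATEGORIES` and `TOP_LEVEL`, the label `@DAY`, and the simplification that fillers keep their source order are all assumptions made for this example.

```python
# Hypothetical category transducer: source word -> translation
CATEGORIES = {
    "@DAY": {"montag": "monday", "dienstag": "tuesday"},
}
# Hypothetical top-level entries that contain category labels
TOP_LEVEL = {("am", "@DAY"): ("on", "@DAY")}

def translate_with_categories(words):
    labels, fillers = [], []
    for w in words:                       # first pass: words -> labels
        for cat, table in CATEGORIES.items():
            if w in table:
                labels.append(cat)
                fillers.append(table[w])
                break
        else:
            labels.append(w)
    target = TOP_LEVEL.get(tuple(labels))
    if target is None:
        return None
    remaining = iter(fillers)             # second pass: re-insert translations
    return [next(remaining) if t in CATEGORIES else t for t in target]

print(translate_with_categories(["am", "dienstag"]))
print(translate_with_categories(["am", "montag"]))
```

One top-level entry covers every day name in the category, which is the kind of generalization the slide refers to.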


Language Model

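Standard n-gram model:

$$
\begin{aligned}
p(w_1 \dots w_n) &= \prod_{i} p(w_i \mid w_1 \dots w_{i-1}) \\
&\approx \prod_{i} p(w_i \mid w_{i-2}\, w_{i-1}) \quad \text{(trigram)} \\
&\approx \prod_{i} p(w_i \mid w_{i-1}) \quad \text{(bigram)}
\end{aligned}
$$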

Many events not seen -> smoothing required
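A minimal bigram model with add-one (Laplace) smoothing shows why smoothing matters: unseen bigrams still receive non-zero probability. Add-one smoothing is chosen here only because it is short; the slides do not say which smoothing method was used, and `train_bigram` and `log_prob` are illustrative names.

```python
import math
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s + ["</s>"]
        unigrams.update(words[:-1])             # history counts
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(w, prev):
        # add-one smoothing over the predicted vocabulary
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))

    return prob

def log_prob(prob, sentence):
    words = ["<s>"] + sentence + ["</s>"]
    return sum(math.log(prob(w, p)) for p, w in zip(words, words[1:]))

prob = train_bigram([["the", "house"], ["the", "book"]])
print(prob("house", "the"), prob("the", "house"))
```

The bigram ("house", "the") never occurs in the training data, yet `prob("the", "house")` is positive, so a hypothesis containing it is penalized rather than ruled out.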


Decoding Strategies

Sequential construction of target sentence
  Extend partial translation by words which are translations of words in the source sentence
  Language model can be applied immediately
  Mechanism to ensure proper coverage of source sentence required

Left to right over source sentence
  Find translations for sequences of words
  Construct translation lattice
  Apply language model and select best path
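The second strategy can be sketched as a dynamic program over source positions. The example is monotone (no reordering beyond what a phrase entry contains) and uses a bigram language model; `options` is a hypothetical phrase table and `lm_logprob` a hypothetical language model callback, both invented for this illustration.

```python
import math

def decode(src, options, lm_logprob, max_len=3):
    """Monotone left-to-right search over a translation lattice.

    options: maps a tuple of source words to a list of
             (target_words, translation_log_prob) entries.
    lm_logprob(word, prev): bigram language model log probability.
    """
    # best[j][last] = (score, target words) covering src[:j], ending in `last`
    best = [dict() for _ in range(len(src) + 1)]
    best[0]["<s>"] = (0.0, [])
    for j in range(len(src)):
        for last, (score, hyp) in best[j].items():
            for k in range(j + 1, min(j + max_len, len(src)) + 1):
                for tgt, tm in options.get(tuple(src[j:k]), []):
                    s, prev = score + tm, last
                    for w in tgt:               # apply the language model
                        s += lm_logprob(w, prev)
                        prev = w
                    if prev not in best[k] or s > best[k][prev][0]:
                        best[k][prev] = (s, hyp + list(tgt))
    finals = list(best[len(src)].values())
    return max(finals, key=lambda x: x[0])[1] if finals else None

options = {
    ("guten",): [(["good"], math.log(0.9))],
    ("morgen",): [(["tomorrow"], math.log(0.6)), (["morning"], math.log(0.4))],
}

def lm_logprob(word, prev):
    return math.log(0.5 if (prev, word) == ("good", "morning") else 0.05)

print(decode(["guten", "morgen"], options, lm_logprob))
```

The translation model alone prefers "tomorrow" for "morgen"; the language model score on the lattice path tips the decision towards "good morning".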


Translation Graph


Speech Recognition and Translation

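Search for the best string in the target language given the acoustic signal x in the source language:

$$
\begin{aligned}
\hat{e} &= \arg\max_{e} \, p(e)\, p(x \mid e) \\
&= \arg\max_{e} \, p(e) \sum_{f} p(f, x \mid e) \\
&= \arg\max_{e} \, p(e) \sum_{f} p(f \mid e)\, p(x \mid f, e) \\
&= \arg\max_{e} \, p(e) \sum_{f} p(f \mid e)\, p(x \mid f)
\end{aligned}
$$

The last step assumes that the acoustic signal depends only on the source sentence f. No source language model p(f) appears in the result, i.e. recognizer language model not needed!? [Ney, 2001]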


Coupling Recognition and Translation

Sequential: first recognition, then translation
  First-best recognition hypothesis
  N-best list: translate n times
  Word lattice: translate all paths in the lattice, reuse results from partial paths

Integrated: recognition and translation in a combined search
  The subsequential transducer approach uses this

Note: In the Eutrans project the best results were obtained when translating the first-best hypothesis


Example-Based Machine Translation

Re-use translations to create new translations:
  Store bilingual corpus with (partial) alignment
  Find partial matches, i.e. sequences of words in the stored corpus that cover a new sentence
  Extract translation(s) and build translation lattice
  Apply language model to find best path, i.e. best translation
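The matching step can be sketched with a brute-force search for word sequences of the new sentence that also occur in a stored source sentence. The function `find_matches` and the example sentences are hypothetical; a practical system would use an index rather than this quadratic scan.

```python
def find_matches(sentence, corpus, min_len=2):
    """Return (start, end, corpus_index) for every span sentence[start:end]
    of at least min_len words that occurs in a stored source sentence."""
    matches = []
    n = len(sentence)
    for idx, stored in enumerate(corpus):
        for start in range(n):
            for end in range(start + min_len, n + 1):
                span = sentence[start:end]
                if any(stored[k:k + len(span)] == span
                       for k in range(len(stored) - len(span) + 1)):
                    matches.append((start, end, idx))
    return matches

stored_corpus = ["ich möchte ein zimmer".split(),
                 "ein zimmer mit bad".split()]
new_sentence = "ich möchte ein zimmer mit bad".split()
for m in find_matches(new_sentence, stored_corpus):
    print(m)
```

Here the spans (0, 4) from the first stored sentence and (2, 6) from the second together cover the new sentence; their aligned translations would become edges in the translation lattice.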


Nespole Experiments

Application of direct translation techniques to dialogue data collected in Nespole!
Testing the effect of phrase translation
Experiments with additional knowledge sources
  Preexisting: monolingual data for the LM and publicly available lexica
  Engineered: handwritten rules for fixed expressions and knowledge extracted from semantic grammars


Nespole Project Data

CMU database of dialogues in the travel domain
German, English (Italian, French)
Speech recognizer hypotheses and human transcriptions both available
Segmented into SDUs (Speech Dialogue Units)


Nespole Corpus: Training

Language      English    German
Tokens        15572      14992
Vocabulary    1032       1338
Singletons    404        620

3182 parallel SDUs


Nespole Corpus: Testing

              German          Reference A    Reference B
Tokens        437             610            607
Vocabulary    183 (45 OOV)    165            160

70 parallel SDUs


Corpus Challenges: Sentence Length

[Figure: sentence length charts comparing English and German; panels "Training Data" and "Testing Data"]


Evaluation

Human Scoring
  Good, Okay, Bad (cf. Nespole evaluation)
  Collapsed into a "human score" on [0,1]

Bleu Score
  Average of n-gram precisions for n = 1..N, typically N = 3 or 4
  Penalty for short translations to substitute for a recall measure

[Papineni et al. 2001]
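A sentence-level sketch of the Bleu computation, assuming the geometric mean of clipped n-gram precisions and the brevity penalty as described in the cited report. Bleu is normally accumulated over a whole test set and this sketch applies no smoothing, so it is an illustration rather than the scoring tool used for the experiments.

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypothesis, references, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngrams(hypothesis, n)
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        # clip each hypothesis n-gram count by its maximum reference count
        clipped = sum(min(c, max_ref[g]) for g, c in hyp.items())
        total = sum(hyp.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # brevity penalty against the reference closest in length
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(hypothesis)), l))
    if len(hypothesis) > ref_len:
        bp = 1.0
    else:
        bp = math.exp(1 - ref_len / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "i would like a room".split()
print(bleu("i would like a room".split(), [reference]))
print(bleu("i would".split(), [reference], max_n=2))
```

The second call has perfect precision but is much shorter than the reference, so only the brevity penalty lowers its score.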


Phrase Translation

Unequal sentence lengths mean that training can be improved directionally: S → T or T → S
German compounds are better suited to 1-to-many alignments with English multiword phrases, so direction is important

Statistical lexicon alone                             0.1903
Statistical lexicon, phrases from S → T training      0.2350
Statistical lexicon, phrases from bidir. training     0.2654


Language Model

Monolingual text available from Verbmobil
  500,000 words (32x the size of the original English corpus)
Helps to choose among translation hypotheses but will not generate new ones

Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM    0.2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM    0.3172


General-Purpose Lexicon

Statistical lexicon, phrases, and fixed expressions with small LM          0.2654
Adding general-purpose lexicon as a transducer                             0.2522
Using large instead of small LM                                            0.3141
General-purpose lexicon as training data instead of separate transducer    0.3275


Fixed Expression Rules

Transducer rules are human readable and can be added by hand
Fixed expressions for times and dates are re-usable, require less time to build than domain-specific rules, and improve coverage of some semi-idiomatic constructions

Statistical lexicon with small LM                                    0.1893
Statistical lexicon and fixed-expression transducer with small LM    0.1903


Knowledge from Existing Grammars

Could help with domain portability but not language portability
Benefit mostly in additional vocabulary

Statistical lexicon, fixed expressions, phrases, and general lexicon with large LM                  0.3141
Statistical lexicon, fixed expressions, phrases, general lexicon and I-transducer with large LM     0.3172


Comparative Evaluation Results

Input     System    Good    Okay    Bad    Score    Bleu
Text      IF        77      104     227    0.32     0.068
Text      SMT       127     80      205    0.40     0.333
Speech    IF        64      101     243    0.28     0.059
Speech    SMT       95      83      227    0.34     0.262
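The slides do not state how the Good/Okay/Bad judgements were collapsed into the human score. The sketch below tries one plausible weighting (Good = 1, Okay = 0.5, Bad = 0), which is purely an assumption; with it, the computed values fall within 0.01 of the scores reported in the table.

```python
# Assumed weights; the actual collapse formula is not given in the slides.
WEIGHTS = {"good": 1.0, "okay": 0.5, "bad": 0.0}

def human_score(good, okay, bad):
    """Weighted share of judgements, a value in [0, 1]."""
    total = good + okay + bad
    return (WEIGHTS["good"] * good + WEIGHTS["okay"] * okay
            + WEIGHTS["bad"] * bad) / total

# (Good, Okay, Bad, reported score) copied from the results table
results = {
    ("Text", "IF"): (77, 104, 227, 0.32),
    ("Text", "SMT"): (127, 80, 205, 0.40),
    ("Speech", "IF"): (64, 101, 243, 0.28),
    ("Speech", "SMT"): (95, 83, 227, 0.34),
}
for key, (good, okay, bad, reported) in results.items():
    print(key, round(human_score(good, okay, bad), 3), reported)
```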


Selected References

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.

Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.

Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.

Stephan Vogel, Alicia Tribble. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. To appear in ICSLP 2002.

H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD ROM, IEEE Catalog No. 01EX544, December 2001.

Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176 (W0109-022), September 17, 2001.