
Open Source Toolkit for Statistical Machine Translation:

Factored Translation Models and Lattice Decoding

Final Presentation

Philipp Koehn, Marcello Federico, Wade Shen, Nicola Bertoldi, Chris Callison-Burch, Ondrej Bojar, Brooke Cowan, Chris Dyer, Hieu Hoang, Richard Zens, Alexandra Constantin, Evan Herbst, Christine Moran

17 August 2006

Philipp Koehn et al., JHU 2006 WS on MT Final Presentation 17 August 2006


Schedule

• First session: Overview and toolkit development

– Factored models and confusion network decoding (Koehn, Federico)
– Moses toolkit (Hoang, Dyer, Herbst, Callison-Burch, Bertoldi)

• Second session: Experiments

– Experiments in small data settings (Shen, Bojar, Moran, Cowan)
– Factored models for morphologically rich languages (Dyer, Koehn, Cowan, Constantin)
– Confusion network experiments (Zens)


Accomplishments

• Open source toolkit

– advances state-of-the-art of statistical machine translation models
– best performance on European Parliament task
– competitive on IWSLT and TC-Star

• Factored models

– outperform traditional phrase-based models
– framework for a wide range of models
– integrated approach to morphology and syntax

• Confusion networks

– exploit ambiguous input and outperform 1-best
– enable integrated approach to speech translation


Phrase-Based Translation

er geht ja nicht nach hause

he does not go home

• Foreign input is segmented in phrases

– any sequence of words, not necessarily linguistically motivated

• Each phrase is translated into English, phrases are reordered

• Log-linear model: many feature functions $h_i(e, f)$ with weights $\lambda_i$ are combined into an overall score

$\sum_i \lambda_i h_i(e, f)$ → easy to extend
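The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the feature names, values and weights below are invented, not taken from the toolkit.

```python
def loglinear_score(features, weights):
    """Overall score: sum over i of lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values h_i(e, f) for one candidate translation.
features = {"phrase_translation": -1.2, "language_model": -0.5, "distortion": -2.0}
# Hypothetical weights lambda_i.
weights = {"phrase_translation": 1.0, "language_model": 0.5, "distortion": 0.1}

base_score = loglinear_score(features, weights)

# "Easy to extend": a new feature is just one more entry in each dictionary.
extended_features = dict(features, word_penalty=-5.0)
extended_weights = dict(weights, word_penalty=0.1)
extended_score = loglinear_score(extended_features, extended_weights)
```

Adding a feature never changes the scoring function itself, which is the sense in which the model is easy to extend.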


Translation

• Task: translate this sentence from German into English

er geht ja nicht nach hause


Translation step 1

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er → he

• Pick phrase in input, translate


Translation step 2

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er → he; ja nicht → does not

Output so far: he does not

• Pick phrase in input, translate

– it is allowed to pick words out of sequence (reordering)
– phrases may have multiple words: many-to-many translation


Translation step 3

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er → he; ja nicht → does not; geht → go

Output so far: he does not go

• Pick phrase in input, translate


Translation step 4

• Task: translate this sentence from German into English

er geht ja nicht nach hause

er → he; ja nicht → does not; geht → go; nach hause → home

Output: he does not go home

• Pick phrase in input, translate


Translation options

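er geht ja nicht nach hause

Options for single words (separated by semicolons):

er: he; it; , it; , he
geht: is; are; goes; go
ja: yes; is; , of course
nicht: not; do not; does not; is not
nach: after; to; according to; in
hause: house; home; chamber; at home

Options for longer spans: it is; he will be; it goes; he goes; is; are; is after all; does; not; is not; does not; do not; to; following; not after; not to; home; under house; return home; do not; not; is not; are not; is not a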

• Phrase translation tables provide many translation options

• Learned from automatically word-aligned corpora


Translation options

[Figure: translation options for "er geht ja nicht nach hause", same table as on the previous slide]

• The machine translation decoder does not know the right answer

→ Search problem solved by heuristic beam search

Decoding process: precompute translation options

er geht ja nicht nach hause

Decoding process: start with initial hypothesis

er geht ja nicht nach hause

Decoding process: hypothesis expansion

er geht ja nicht nach hause

[Figure: the initial hypothesis is expanded with the translation option "are"]

Decoding process: hypothesis expansion

er geht ja nicht nach hause

[Figure: the initial hypothesis is expanded with the options "are", "it" and "he"]

Decoding process: hypothesis expansion

er geht ja nicht nach hause

[Figure: search graph after further expansion, with hypotheses "are", "it", "he", "goes", "does not", "yes", "go", "to", "home"]

Decoding process: find best path

er geht ja nicht nach hause

[Figure: the same search graph, with the best-scoring path through the hypotheses highlighted]
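The decoding steps on these slides (precompute translation options, start from an empty hypothesis, expand, pick the best complete path) can be sketched as a toy stack decoder. This is a minimal illustration and not the Moses implementation: the phrase table, the scores and the tiny bigram "language model" are all invented, and there is no hypothesis recombination or future cost estimate.

```python
from dataclasses import dataclass

# Invented phrase table: source phrase -> [(translation, log-probability)].
PHRASES = {
    ("er",): [("he", -0.2), ("it", -1.5)],
    ("geht",): [("goes", -0.5), ("go", -0.9)],
    ("ja", "nicht"): [("does not", -0.3)],
    ("ja",): [("yes", -1.0)],
    ("nicht",): [("not", -0.4)],
    ("nach", "hause"): [("home", -0.2)],
    ("nach",): [("to", -0.6)],
    ("hause",): [("house", -0.7)],
}
# Invented stand-in for a language model: these bigrams are cheap, all others costly.
GOOD_BIGRAMS = {("<s>", "he"), ("he", "does"), ("does", "not"),
                ("not", "go"), ("go", "home")}


@dataclass(frozen=True)
class Hypothesis:
    covered: frozenset  # indices of source words translated so far
    output: tuple       # target phrases produced so far
    score: float        # accumulated log score


def bigram_score(prev_word, phrase):
    total = 0.0
    for word in phrase.split():
        total += -0.1 if (prev_word, word) in GOOD_BIGRAMS else -2.0
        prev_word = word
    return total


def translation_options(source, max_len=3):
    """Precompute (start, end, translation, score) for every known source span."""
    options = []
    for start in range(len(source)):
        for end in range(start + 1, min(start + max_len, len(source)) + 1):
            for target, score in PHRASES.get(tuple(source[start:end]), []):
                options.append((start, end, target, score))
    return options


def decode(source, beam_size=5):
    options = translation_options(source)
    # stacks[n] holds hypotheses covering exactly n source words.
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append(Hypothesis(frozenset(), (), 0.0))
    for n in range(len(source)):
        # Beam pruning: keep only the best few hypotheses per stack.
        stacks[n] = sorted(stacks[n], key=lambda h: h.score, reverse=True)[:beam_size]
        for hyp in stacks[n]:
            last_word = hyp.output[-1].split()[-1] if hyp.output else "<s>"
            for start, end, target, score in options:
                span = frozenset(range(start, end))
                if span & hyp.covered:
                    continue  # some of these words are already translated
                new_score = hyp.score + score + bigram_score(last_word, target)
                new = Hypothesis(hyp.covered | span, hyp.output + (target,), new_score)
                stacks[len(new.covered)].append(new)
    final = stacks[len(source)]
    return max(final, key=lambda h: h.score) if final else None


best = decode("er geht ja nicht nach hause".split())
translation = " ".join(best.output)
```

Because spans may be picked in any order, the decoder can place "go" after "does not", as in the translation steps shown earlier; the language model stand-in is what makes that order score best.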


Statistical machine translation today

• Best performing methods based on surface word phrases

– use mappings of short chunks of text (mostly 1-3 words)
– sophisticated methods for phrase extraction and modeling (EM algorithm, generative models, discriminative training)

• Translation solely based on surface forms of words

– no use of explicit syntactic information
– no use of morphological information

• How can we build richer models?


One motivation: morphology

• Current models treat house and houses as completely different words

– training occurrences of house have no effect on learning translation of houses
– if we only see house, we do not know how to translate houses
– rich morphology (German, Arabic, Finnish, Czech, ...) → many word forms

• Better approach combines evidence for house and houses

– analyze surface word forms into lemma and morphology, e.g.: Haus +plural
– translate lemma and morphology separately, e.g.: Haus → house; +pl → +pl
– generate target surface form, e.g.: house +pl → houses


Factored translation models

• Factored representation of words

[Figure: input and output words are each represented by a vector of factors: word, lemma, part-of-speech, morphology, word class, ...]

• Benefits

– generalization, e.g. by translating lemmas, not surface forms
– richer model, e.g. using syntax for reordering, language modeling


Example factored model

• Our example as factored model:

[Figure: input and output each carry the factors word, lemma and morphology]

• Translation process broken up into mapping steps

– translation of lemma
– translation of morphology
– generation of word from lemma, morphology


Expansion of input phrase

• Probabilistic mapping steps

– translation step: lemma → lemma
  haus → house, home, chamber, ...
– translation step: morphology → morphology
  single-noun → single-noun, single-pronoun, plural-noun, ...
– generation step: lemma, morphology → word
  house, single-noun → house
  house, plural-noun → houses

• Still a phrase model

– translation steps may map phrases
  nach hause → home, return home
– generation steps operate on single words
– traditional phrase models are a special case: single-factor models
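As a rough illustration of how the mapping steps combine, the sketch below expands one input word into scored translation options. The tables reuse the entries named on the slide, but all probabilities, the extra generation entries and the `max_options` cut-off are assumptions made for the example.

```python
from itertools import product

# Translation step: lemma -> lemma (probabilities invented).
LEMMA_TABLE = {"haus": [("house", 0.6), ("home", 0.3), ("chamber", 0.1)]}
# Translation step: morphology -> morphology (probabilities invented).
MORPH_TABLE = {
    "single-noun": [("single-noun", 0.7), ("plural-noun", 0.2), ("single-pronoun", 0.1)]
}
# Generation step: (lemma, morphology) -> surface word.
GENERATION = {
    ("house", "single-noun"): "house",
    ("house", "plural-noun"): "houses",
    ("home", "single-noun"): "home",
    ("home", "plural-noun"): "homes",
    ("chamber", "single-noun"): "chamber",
    ("chamber", "plural-noun"): "chambers",
}


def expand(lemma, morphology, max_options=None):
    """Combine both translation steps, then generate surface forms."""
    options = []
    lemma_options = LEMMA_TABLE.get(lemma, [])
    morph_options = MORPH_TABLE.get(morphology, [])
    for (out_lemma, p_lemma), (out_morph, p_morph) in product(lemma_options, morph_options):
        word = GENERATION.get((out_lemma, out_morph))
        if word is not None:  # drop combinations that generate no word
            options.append((word, p_lemma * p_morph))
    options.sort(key=lambda option: option[1], reverse=True)
    return options[:max_options]  # None keeps every option


all_options = expand("haus", "single-noun")
top_two = expand("haus", "single-noun", max_options=2)
```

The product over the two translation tables is what makes the number of expansions grow quickly as more steps are added, so a cut-off like `max_options` is one simple way to keep the option list small.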


Computational complexity of mapping steps

• Number of factored expansions may grow exponentially

• Key insights to reduce complexity for a given input sentence:

– expansions can be pre-computed and stored as translation options
– translation options can be pruned early

• Future work: problems with more complex models need to be addressed

– we had problems using some models with three steps or more
– see student proposals (Hoang, Dyer) for solutions


Spoken Language Translation with Confusion Networks

Marcello Federico, Nicola Bertoldi, Wade Shen, Richard Zens

August 17, 2006

Marcello Federico, ITC-irst Trento Project Summary August 17, 2006


Outline

• Spoken language translation

• Approaches to SLT

• Confusion network decoding

• Computational issues

• Implementation in Moses

• Language model interface

• Other applications of confusion networks


Spoken Language Translation

Translation from speech input is likely more difficult than translation from text input:

• many styles and genres: formal read speech, unplanned speeches, interviews, spontaneous conversations, ...

• less controlled language: relaxed syntax, spontaneous speech phenomena

• automatic speech recognition is prone to errors: possible corruption of syntax and meaning

This work addresses methods to improve performance of spoken language translation by better integrating speech recognition and machine translation models.


Integrating Speech Recognition and Translation

• Correlation between transcription word-error-rate and translation quality:

[Figure: BLEU score (axis from 38.5 to 42.5) plotted against WER of transcriptions (axis from 14 to 21)]

• Better transcriptions may have been considered during ASR decoding but discarded due to lower scores

• Potential for improving translation quality by exploiting more transcriptionhypotheses generated during ASR.


Statistical Spoken Language Translation

• Let o be the spoken input in the foreign language

• Let F(o) be a set of possible transcriptions of o

Goal: find the best English translation through the approximate criterion:

$$e^* = \arg\max_e \Pr(e \mid o) \approx \arg\max_e \max_{f \in F(o)} \Pr(e, f \mid o)$$

Pr(e, f | o) is computed with a log-linear model incorporating:

• acoustic features, i.e. probs that some foreign words are in the input

• linguistic features, i.e. probs of foreign and English sentences

• translation features, i.e. probs of translating foreign phrases into English

• alignment features, i.e. probs for word re-ordering


ASR Word Graph

A very general set of transcriptions F(o) can be represented by a word-graph:

• directly computed from the ASR word lattice (e.g. HTK format, lattice tool)

• provides a good representation of all hypotheses analyzed by the ASR system

• arcs are labeled with words, acoustic and language model probabilities

• paths correspond to transcription hypotheses for which probabilities can be computed

[Figure: example ASR word graph]


Approaches to Spoken Language Translation

The previous statistical framework includes several alternative implementations:

• 1-best translation: translate only the most probable hypothesis in the word graph

– pros: very efficient
– cons: no potential to recover from recognition errors in the 1-best transcription

• N-best translation: translate only the N most probable hypotheses in the word graph

– pros: can exploit more accurate transcriptions in the word graph
– cons: N must be large in order to include good transcriptions, and decoding time increases linearly with N


Approaches to Spoken Language Translation

• Transducer: compose the word graph with a translation FSN and apply a transducer algorithm

– pros: straightforward method that can work on the full word graph
– cons: computationally prohibitive with large-vocabulary tasks and long-range word re-ordering

• Confusion network: translate a suitable approximation of the WG

– pros: can effectively explore all paths in the word graph, with no problems in re-ordering
– cons: can only exploit limited information in the word graph


Confusion Network (Mangu 1999)

A confusion network approximates a word graph with a linear network, s.t.:

• arcs are labeled with words or with the empty word (ε-word)

• arcs are weighted with word posterior probabilities

• paths are a superset of those in the word graph

[Figure: example confusion network]

CNs can be conveniently represented as a sequence of columns of different depths.


Confusion Network Decoding:

Extension of basic phrase-based decoding step:

• cover some not yet covered consecutive columns (span)

• retrieve phrase-translations for all paths inside the columns

• compute translation, distortion and target language models

Example. Coverage set: 01110... Path: cancello d’

Coverage   0            1                1          1             0
Column     era 0.997    cancello 0.995   ε 0.999    di 0.615      imbarco 0.999   ...
           e 0.002      vacanza 0.004    la 0.001   d’ 0.376      bar 0.001
           ε 0.001      ε 0.002                     all’ 0.005
                                                    l’ 0.002
                                                    ε 0.001
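To make the example concrete, the sketch below stores the confusion network as a list of columns and enumerates every path through the covered span (columns 2 to 4, zero-based slice 1:4). The data structure and function name are assumptions made for illustration; only the words and posteriors come from the slide.

```python
from itertools import product

EPS = "ε"  # the empty word

# One list per column; each entry is (word, posterior probability).
columns = [
    [("era", 0.997), ("e", 0.002), (EPS, 0.001)],
    [("cancello", 0.995), ("vacanza", 0.004), (EPS, 0.002)],
    [(EPS, 0.999), ("la", 0.001)],
    [("di", 0.615), ("d'", 0.376), ("all'", 0.005), ("l'", 0.002), (EPS, 0.001)],
    [("imbarco", 0.999), ("bar", 0.001)],
]


def span_phrases(columns, start, end):
    """Map each source phrase spelled by a path through columns[start:end]
    to the total posterior probability of the paths that spell it."""
    phrases = {}
    for path in product(*columns[start:end]):
        words = tuple(word for word, _ in path if word != EPS)  # drop empty words
        prob = 1.0
        for _, p in path:
            prob *= p
        phrases[words] = phrases.get(words, 0.0) + prob
    return phrases


phrases = span_phrases(columns, 1, 4)
path_prob = phrases[("cancello", "d'")]
```

The span has 3 × 2 × 5 = 30 paths, which illustrates why the number of source phrases to look up grows quickly with the span length.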


Confusion Network Decoding

Computational issues:

• Number of paths grows exponentially with span length

• Implies look-up of translations for a huge number of source phrases

• Factored models require considering joint translation over all factors (tuples):

– Cartesian product of all translations of each single factor

Solutions implemented into Moses

• Source entries of the phrase-table are stored with prefix-trees

• Translations of all possible coverage sets are pre-fetched from disk

• Efficiency achieved by incrementally pre-fetching over the span length

• Phrase translations over all factors are extracted independently, then translation tuples are generated and pruned by adding a factor each time

Once translation tuples are generated, usual decoding applies.
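The idea of storing source phrases in a prefix tree and extending the lookup one column at a time can be sketched as follows. This is a simplified illustration, not the Moses data structure: the trie is a nested dictionary, the phrase-table entries are hypothetical, and probabilities are ignored.

```python
END = object()  # marks a trie node where a complete source phrase ends


def build_trie(source_phrases):
    root = {}
    for phrase in source_phrases:
        node = root
        for word in phrase:
            node = node.setdefault(word, {})
        node[END] = True
    return root


def matching_phrases(columns, trie, eps="eps"):
    """Source phrases in the trie spelled by some path through all columns."""
    states = [(trie, ())]  # (current trie node, words read so far)
    for column in columns:
        next_states = []
        for node, words in states:
            for word, _prob in column:
                if word == eps:
                    next_states.append((node, words))  # empty word: stay in place
                elif word in node:
                    next_states.append((node[word], words + (word,)))
                # otherwise no phrase has this prefix, so the path is dropped
        states = next_states
    return sorted({words for node, words in states if END in node})


# Hypothetical phrase-table source entries.
trie = build_trie([("aus", "der", "Zeitung"), ("der", "Zeitung"),
                   ("Haus",), ("das", "Haus")])
columns = [
    [("Haus", 0.1), ("aus", 0.4), ("Aus", 0.4), ("eps", 0.1)],
    [("der", 0.9), ("eps", 0.1)],
    [("Zeitung", 1.0)],
]
found = matching_phrases(columns, trie)
```

Paths whose prefix is absent from the trie are abandoned after the first offending column, so most of the exponentially many paths are never fully enumerated.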


Implementation into Moses

• Input format: CN input can be rather large, so it is better to put one word position per line:

Haus 0.1 aus 0.4 Aus 0.4 eps 0.1
der 0.9 eps 0.1
Zeitung 1.0

Each line represents alternatives with their probabilities.

• Factored confusion networks: alternatives are over the full factor space:

Haus|N 0.1 aus|PREP 0.4 Aus|N 0.4 eps|eps 0.1
der|DET 0.1 der|PREP 0.8 eps|eps 0.1
Zeitung|N 1.0

Notice: confusion network can be projected over single factors.
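A small sketch of reading this format and projecting a factored column onto a single factor is given below. It assumes only the layout shown on the slide (alternating token and probability, factors separated by `|`); the function names are hypothetical and this is not the parser used in Moses.

```python
def parse_cn_line(line):
    """Parse one word position: 'tok prob tok prob ...' -> [(factors, prob)]."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating token/probability pairs")
    return [(tuple(token.split("|")), float(prob))
            for token, prob in zip(tokens[0::2], tokens[1::2])]


def project(column, factor_index):
    """Sum the probabilities of alternatives that agree on one factor."""
    totals = {}
    for factors, prob in column:
        key = factors[factor_index]
        totals[key] = totals.get(key, 0.0) + prob
    return totals


factored = [parse_cn_line(line) for line in (
    "Haus|N 0.1 aus|PREP 0.4 Aus|N 0.4 eps|eps 0.1",
    "der|DET 0.1 der|PREP 0.8 eps|eps 0.1",
    "Zeitung|N 1.0",
)]
surface_second = project(factored[1], 0)  # surface words of the second column
tags_first = project(factored[0], 1)      # part-of-speech tags of the first column
```

Projecting the second column onto the surface factor merges der|DET and der|PREP into a single alternative with probability 0.9, which matches the unfactored example.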


Implementation into Moses

Decoding CN with Factored Models

• at each step of the search process, a portion of the CN is explored, e.g.

...
Haus|N 0.1 aus|PREP 0.4 Aus|N 0.4 eps|eps 0.1
der|DET 0.1 der|PREP 0.8 eps|eps 0.1
Zeitung|N 1.0
...

and translations are looked up for each factor.

Features:

• Efficiency by pre-filtering possible translations for each factor

• Decoding of confusion networks is completely hidden from the decoder.


Other Applications of Confusion Networks

Translation tasks with ambiguous input:

• linguistic annotation for factored models

– avoid hard decisions by linguistic tools, but rather provide alternative annotations with respective scores
– e.g. particularly ambiguous part-of-speech tags

• insertion of punctuation marks missing in the input

– model all possible insertions of punctuation marks in the input

• translation of input similar to that produced by speech recognition

– e.g. OCR output for optical text translation

• ....


Language Model Interface

• Features

– compact binary format for very large language models
– quantization of probabilities (8 bits)
– fast loading of language model from disk
– loading of n-grams on demand

• Comparison with SRI LM Toolkit

– memory: 50% less with large quantized models
– speed: 10% slower in decoding with 3-gram LM

• Recent work and improvements

– speed-up by directly storing log-probs
– addition of cache memory on n-gram internal data structure
– analysis of LM score computations by search algorithm
– caching of probabilities and LM states: the search algorithm requests the same probabilities many times


Requests of N-grams by Decoder

Requests of 3-gram probabilities during decoding of a single sentence. About 1.6M requests involving about 120K 3-grams.
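These figures suggest how much a cache can help: if each of the roughly 120K distinct 3-grams is looked up once and then served from a cache, at most 120K of the 1.6M requests reach the language model. The sketch below works out that bound and shows a minimal caching wrapper; the class, its method names and the placeholder lookup function are assumptions for illustration, not the interface described in this talk.

```python
class CachingLM:
    """Wrap a slow n-gram lookup and remember every answer."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup
        self._cache = {}
        self.requests = 0
        self.misses = 0

    def logprob(self, ngram):
        self.requests += 1
        if ngram not in self._cache:
            self.misses += 1  # only the first request reaches the model
            self._cache[ngram] = self._slow_lookup(ngram)
        return self._cache[ngram]

    def hit_rate(self):
        return 1.0 - self.misses / self.requests if self.requests else 0.0


def toy_lookup(ngram):
    # Placeholder for a real language model query.
    return -float(len(ngram))


lm = CachingLM(toy_lookup)
for ngram in [("he", "does", "not")] * 3 + [("does", "not", "go")]:
    lm.logprob(ngram)

# Upper bound implied by the slide's numbers, assuming an unbounded cache.
slide_hit_rate = 1.0 - 120_000 / 1_600_000
```

With the slide's numbers the bound is a hit rate of 92.5%; a real cache with limited size would achieve less.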


Conclusions

Implementation work

• Efficient on-demand pre-fetching of phrase translations

• Tuning of parameters for confusion network decoding

• Language model interface and pre-fetching of n-grams

Development of state-of-the-art baselines for SLT

• IWSLT BTEC Chinese-English SLT

– submissions to IWSLT 2006 evaluation

• EPPS Spanish-English SLT

– performance comparable with best TC-STAR systems

Achievements

• SLT decoder more efficient than current implementations by IRST and MIT/LL

• works with large-data tasks and large confusion networks

• works with factored confusion networks


Engineering Results

JHUSWS 2006


Aug 17, 2006, JHUSWS 2006

Open software, so what?

• State of the world, June 2006

– "Black box" decoder (Pharaoh) widely used
– 20+ citations in this year's ACL Proceedings alone
– Ubiquitous baseline system

• But... it is difficult to extend

– New features limited to what can be expressed in the existing phrase-table format (source, target, feature vector)
– Many interesting projects require reinventing the wheel just to change one spoke


Software GoalsSoftware Goals

AccessibilityAccessibilityEasy to maintainEasy to maintainFlexibilityFlexibilityEasy for distributed team developmentEasy for distributed team developmentPortabilityPortability

Accessibility

• Easy to read
  – “Nothing should be a black box”
  – Descriptive names
  – Uniform coding style

• Available immediately
  – Source code on Sourceforge.net

• Cross-platform compatibility
  – Windows, Linux, MacOS X, 64 bit OS

void Load(const std::string &fileName, FactorCollection &factorCollection, FactorType factorType, float weight, size_t nGramOrder);

Easy to Maintain

• Modular code
  – Team development
  – Object oriented framework

• Integrated documentation framework
  – Using Doxygen
  – Easy to maintain Wiki documentation on the Web

Documentation

Extensibility

• Open architecture designed for extensibility
  – Architecture matches theoretical descriptions of phrase-based MT models
  – Short ramp-up time for researchers familiar with SMT but not with any particular decoder

• Feature function evaluation decoupled from search algorithms
  – Facilitates experimentation with new classes of feature functions

• Modular design
  – Framework to allow different replacements of all parts of the decoder
    • Multiple implementations of translation tables
    • Language models
    • Different types of models

Case Study: Lexicalized Reordering

• Very successful model, but implementation not possible with a “black box” decoder
• With Moses, anyone with an idea can try it
• Adding support for LR models to moses required code changes in four (relatively logical) locations
  – Feature-function base class (ScoreProducer) extended, logic for feature value computation implemented
  – Enable the model based on configuration
  – Call to evaluate the feature function when extending a hypothesis
  – Add the feature values to n-best list output for tuning algorithms

Regression Testing

• Pharaoh scores used as baseline, which were updated as models changed (for example, hypothesis recombination based on LM state rather than n-gram order)
• Detailed logging enables strict test coverage for all model types
• Regression test suite was run approximately 3000 times during workshop

Accomplishments

• Code contributions from every member of the team
• Performance improvements
  – Day 1: 5.01 sec/sentence avg decoding time
  – Today: 1.43 sec/sentence avg decoding time

Summary

• State of the world, August 2006
  – “White box” multi-factored decoder (Moses) available
  – Drop-in replacement for Pharaoh

• Further experimentation and development anticipated at:
  – Aachen, Charles University, Cornell, Edinburgh, IRST, MIT, Lincoln Labs, UMD… and many more.

Software Goals

• Accessibility
• Easy to maintain
• Flexibility
• Easy for distributed team development
• Portability

Accessibility

• Easy to read
  – “Nothing should be a black box”
  – Descriptive names
  – Uniform coding style

• Available immediately
  – Source code on Sourceforge.net

• Cross-platform compatibility
  – Windows, Linux, MacOS X, 64 bit OS

void Load(const std::string &fileName, FactorCollection &factorCollection, FactorType factorType, float weight, size_t nGramOrder);

Easy to Maintain

• Modular code
  – Team development
  – Object-oriented framework

• Integrated documentation framework
  – Using Doxygen
  – Interactive Wiki documentation on the Web

• Extendable
  – Flexibility
    • Framework to allow different replacements of all parts of the decoder
    • Multiple implementations of translation tables
    • Language models
    • Different types of models
  – Code size
    • 10,000 at beginning of workshop
    • 16,000 now

System tuning

• Log Linear Model

$$e^* = \arg\max_e \Pr(e \mid f) = \arg\max_e p_\lambda(e \mid f) = \arg\max_e \sum_i \lambda_i h_i(e, f) \qquad (1)$$

• real valued feature functions:
  – model specific component of the translation process: fluency, adequacy, reordering, ...
  – statistical models are estimated on specific training data

• feature weights:
  – balance ranges of feature scores
  – weight importance of features
  – tuned through Minimum Error Training (MET)
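The log-linear decision rule in equation (1) can be illustrated with a small sketch. The feature names (`lm`, `tm`, `distortion`), their values, and the explicit candidate list are assumptions made for illustration only; in the decoder the arg max is a search over a much larger hypothesis space.

```python
from typing import Dict, List


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum_i lambda_i * h_i(e, f) for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates: List[dict], weights: Dict[str, float]) -> dict:
    """arg max over an explicit candidate list (a stand-in for the decoder search)."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Hypothetical feature values (log-probabilities) for two candidate translations.
candidates = [
    {"text": "he does not go home",
     "features": {"lm": -4.0, "tm": -3.0, "distortion": -1.0}},
    {"text": "he goes not home",
     "features": {"lm": -7.0, "tm": -2.0, "distortion": 0.0}},
]
weights = {"lm": 0.5, "tm": 0.3, "distortion": 0.2}  # hypothetical lambdas

print(best_candidate(candidates, weights)["text"])
```

Changing the weights changes which candidate wins, which is why the weights have to be tuned.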

Nicola Bertoldi, ITC-irst Minimum Error Training August 17, 2006

Minimum Error Training

• automatic procedure to optimize feature weights

• minimization of translation errors

• development set (f , ref)

• automatic error function Err(e; ref): (100-BLEU) score

$$e^* = e^*(\lambda) = \arg\max_e p_\lambda(e \mid f) \qquad (2)$$

$$\lambda^* = \arg\min_\lambda \mathrm{Err}(e^*(\lambda); \mathit{ref}) \qquad (3)$$

• Err(e) is not mathematically well-behaved (it is not a smooth function of λ) ⇒ no exact solution

• approximate iterative algorithm: gradient descent, downhill simplex

CLSP-WS solution for MET

[Figure: MET architecture with the components Moses, Extractor, Scorer and Optimizer; data flows labeled input, weights, n-best, features, reference, score, 1-best and optimal weights; an inner loop and an outer loop.]

• outer loop:
  – decoding with current lambdas
  – generation of nbest translations
  – addition to previous translations

• inner loop:
  – optimization over n-bests
  – decoder and "random" weights as initial points

• optimizer:
  – iterative optimization on single weights
  – discretization of the r-dimensional space of weights
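The inner loop described above (optimizing one weight at a time over a discretized range, using fixed n-best lists) might look roughly like the following sketch. It is not the workshop implementation: the per-hypothesis error values, the grid, and the number of sweeps are assumptions, and a real system would use a corpus-level score such as 100-BLEU and several starting points.

```python
from typing import List, Tuple

# One hypothesis: (feature vector h(e, f), precomputed error of e against the reference).
Hyp = Tuple[List[float], float]


def pick_best(nbest: List[Hyp], weights: List[float]) -> Hyp:
    """1-best hypothesis of an n-best list under the given weights."""
    return max(nbest, key=lambda hyp: sum(w * h for w, h in zip(weights, hyp[0])))


def corpus_error(nbests: List[List[Hyp]], weights: List[float]) -> float:
    """Summed error of the 1-best hypotheses (simplified stand-in for Err)."""
    return sum(pick_best(nbest, weights)[1] for nbest in nbests)


def optimize(nbests, weights, grid, sweeps=3):
    """Coordinate-wise search: try each grid value for one weight at a time."""
    weights = list(weights)
    best = corpus_error(nbests, weights)
    for _ in range(sweeps):
        for i in range(len(weights)):
            for value in grid:
                trial = weights[:i] + [value] + weights[i + 1:]
                err = corpus_error(nbests, trial)
                if err < best:
                    best, weights = err, trial
    return weights, best


# Hypothetical n-best lists for two sentences with two features each.
nbests = [
    [([-1.0, -5.0], 0.0), ([-2.0, -1.0], 1.0)],
    [([-3.0, -2.0], 1.0), ([-1.0, -4.0], 0.0)],
]
tuned_weights, tuned_error = optimize(nbests, [0.0, 1.0], grid=[0.0, 0.5, 1.0])
print(tuned_weights, tuned_error)
```

In the outer loop, the tuned weights would be handed back to the decoder, and the new n-best lists merged with the previous ones before the next inner-loop run.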

MET vs. size of nbest list

• German-English EuroParl task

• tuning on dev set of 2000 sentences

• evaluation on test set of 2000 sentences

• convergence in 5-6 iterations:
  – good: faster outer loop

• no impact of size of nbest:
  – good: faster inner loop

[Figure: BLEU (y-axis, 18 to 26) vs. iteration (x-axis, 0 to 14) for n-best list sizes 100, 200, 400 and 800.]

MET vs. size of development set

• extraction of 4 subsets: 100, 200, 400, 800 sentences

• larger dev set:
  – more stable result
  – fewer iterations
  – better results

• bad:
  – overfitting
  – large dev set
  – slower outer loop (decoding)

[Figure: BLEU (y-axis, 0 to 30) vs. iteration (x-axis, 0 to 18) for dev sets of 100, 200, 400, 800 and 2000 sentences.]

dev set           iteration   BLEU
100 sentences     18          24.3
200 sentences     15          25.1
400 sentences     16          24.6
800 sentences     14          24.9
2000 sentences     9          25.3

MET vs. optimization algorithm

• task: Spanish-English EPPS, speech input

• dev set of 2643 Confusion Networks, test set of 1073 CNs

• CLSP-WS algorithm vs. downhill simplex (RWTH)

                     iteration   ∆ BLEU (dev)   ∆ BLEU (test)
CLSP-WS algorithm    4           +1.0           +0.4
downhill simplex     7           +2.9           +3.4

• mismatch between internal score of CLSP-WS algorithm and official score

• better performance of the downhill simplex algorithm

• post-workshop investigation

Moses in parallel

• effective R&D cycle:
  – fast experiments

• computing facilities:
  – 6 clusters, 200 machines

• parallelization of translation

• ’split and merge’ technique

• translation time:
  – splitting/merging ≈ constant, negligible
  – access to cluster related to cluster load
  – loading data ≈ constant
  – decoding ∝ input length

[Figure: the Splitter divides the source input into part-1 … part-N; each part is translated by a separate Moses instance on a (remote) cluster of machines, giving translation-1 … translation-N; the Merger joins them into the final translation.]
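The split-and-merge idea can be sketched as follows. This is only an illustration: `fake_decoder` is a hypothetical stand-in for a Moses run on one part, and local threads stand in for cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split(sentences: List[str], n_parts: int) -> List[List[str]]:
    """Cut the input into at most n_parts contiguous chunks, preserving order."""
    size = max(1, -(-len(sentences) // n_parts))  # ceiling division
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def translate_parallel(sentences: List[str],
                       translate_part: Callable[[List[str]], List[str]],
                       n_parts: int) -> List[str]:
    """Split the input, translate the parts concurrently, merge in input order."""
    parts = split(sentences, n_parts)
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        translated = list(pool.map(translate_part, parts))  # map keeps input order
    return [line for part in translated for line in part]   # merge step


def fake_decoder(part: List[str]) -> List[str]:
    """Hypothetical stand-in for decoding one part (here: just upper-casing)."""
    return [sentence.upper() for sentence in part]


print(translate_parallel(["a", "b", "c", "d", "e"], fake_decoder, n_parts=2))
```

Because every job pays the roughly constant data-loading cost, splitting only pays off when each part is long enough for decoding time to dominate.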

Nicola Bertoldi, ITC-irst Moses in parallel August 17, 2006

Moses in parallel

• Spanish-English EuroParl task

• CLSP cluster, 18 machines

• no control of cluster load

Average time (seconds):

                 standard   1 job   5 jobs   10 jobs   20 jobs
10 sentences     6.3        13.1    9.0      9.0       –
100 sentences    5.2        5.6     3.0      1.7       1.7
1000 sentences   6.3        6.5     2.0      1.6       1.1

Decoder Output Analysis

Evan Herbst

8 / 17 / 06

Evan Herbst Decoder Output Analysis 8 / 17 / 06

Measurables

• Difficulty
  – perplexity

• Error
  – WER
  – PWER
  – BLEU
  – confidence intervals

• Significance
  – t-test
  – sign test

Definition: Perplexity

Measure likelihood of corpus given model (e.g. language model)

$$PX = 2^{-\frac{1}{N} \sum_i \log(p_{LM}(w_i))}, \quad w_i \text{ words}$$
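As a worked example of the definition, the sketch below computes perplexity from a list of per-word language model probabilities. It assumes the probabilities have already been obtained from some language model, and it takes the logarithm to base 2 so that it matches the base of the exponent.

```python
import math
from typing import List


def perplexity(word_probs: List[float]) -> float:
    """2 ** (-(1/N) * sum_i log2 p_LM(w_i)) over the N words of a corpus."""
    n = len(word_probs)
    return 2.0 ** (-sum(math.log2(p) for p in word_probs) / n)


# Hypothetical probabilities: every word gets p = 0.25, so the perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that is as uncertain as a uniform choice among k words at every position has perplexity k, which is why lower values indicate text the model finds easier to predict.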

Definition: WER

Word Error Rate: modified edit distance
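A minimal sketch of WER, assuming the usual formulation: word-level Levenshtein distance divided by the reference length. The slide does not say how the edit distance is "modified", so the workshop tool may differ in details.

```python
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,              # delete h
                            curr[j - 1] + 1,          # insert r
                            prev[j - 1] + (h != r)))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hyp: str, ref: str) -> float:
    """Edit distance normalized by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(hyp.split(), ref_words) / len(ref_words)


print(wer("he goes not home", "he does not go home"))
```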

Definition: PWER

Position-independent Word Error Rate: match bags of words
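A sketch of PWER under one common formulation (an assumption, since the slide gives no formula): compare hypothesis and reference as bags of words, and count every word that cannot be matched, normalized by the reference length.

```python
from collections import Counter


def pwer(hyp: str, ref: str) -> float:
    """Position-independent error rate: bag-of-words mismatch / reference length."""
    hyp_words, ref_words = hyp.split(), ref.split()
    # Multiset intersection: how many words match regardless of position.
    overlap = sum((Counter(hyp_words) & Counter(ref_words)).values())
    return (max(len(hyp_words), len(ref_words)) - overlap) / len(ref_words)


print(pwer("home go not does he", "he does not go home"))   # word order is ignored
print(pwer("he goes not home", "he does not go home"))
```

Since PWER ignores word order, comparing it with WER on the same output gives a rough indication of how many errors are due to reordering.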

Definition: BLEU

BiLingual Evaluation Understudy: n-gram precision and length comparison
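The two ingredients named above, n-gram precision and a length comparison, can be sketched as a simplified corpus-level BLEU. The sketch assumes a single reference per sentence, whitespace tokenization and no smoothing; it is not the evaluation script used in the workshop.

```python
import math
from collections import Counter
from typing import List


def ngrams(words: List[str], n: int) -> Counter:
    """Multiset of all n-grams of length n in a word sequence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_counts & r_counts).values())  # clipped counts
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(bleu(["he does not go home"], ["he does not go home"]))
```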

Numbers

Dataset: 2000-sentence Europarl subset

                 pharaoh                                     moses baseline
Linguae →        de-en                 en-de                 de-en                 en-de
BLEU             .2557                 .1775                 .2554                 .1776
WER              .5432                 .6144                 .5428                 .6145
PWER/WER         .865                  .940                  .865                  .947
Lemma BLEU       .2625                 .2170                 .2622                 .2180
N-gram Prec.     .609/.315/.188/.119   .519/.223/.122/.070   .609/.314/.188/.119   .519/.223/.122/.070
Perplexity       40.97                 62.01                 40.94                 61.77
Ref Perplex.     68.81                 125.29                68.81                 125.29

Inferences

• lemmas vs. surface: morphology

• output vs. reference perplexity: fluency

• PWER/WER ratio: reordering; phrase tables

Tool: Comparison

Tool: Alignment

Suffix Arrays for More Statistics (and Less Disk Space!)

Chris Callison-Burch

August 17, 2006

Chris Callison-Burch Suffix Arrays for More Statistics (and Less Disk Space!) August 17, 2006

Phrase Tables in Statistical Machine Translation

• Using longer phrases leads to better translation quality

• Phrase tables can get unwieldy with long phrases

• Problem of large tables is compounded for factored translation models

Phrase Tables in Factored Translation Models

• Translation tables between source and target phrases, and POS tags, stems, morphological markers, etc.

• Plus generation tables

• Want longer sequences for factors with smaller tag sets

• Number of tables depends on number of conditioning variables, and on back-off strategies

• Potentially more tables than all pairwise combinations of factors

Ad Hoc Solutions

• Limit length of phrases

• Only extract phrases for test data

• Make unnecessary independence assumptions

Proposed Solution: Intelligent Data Structure

• Uses less memory than table-based data structures

• Allows us to condition on whatever factors we want and easily back off

• Retrieve translation / generation probabilities for arbitrarily long sequences

• Suffix arrays to index parallel corpus

How Suffix Arrays Work

Corpus, with index of words:

index:   0      1         2   3        4     5      6         7   8    9
corpus:  Spain  declined  to  confirm  that  Spain  declined  to  aid  Morocco

Initialized, unsorted suffix array (suffixes denoted by s[i]):

s[0] = 0   Spain declined to confirm that Spain declined to aid Morocco
s[1] = 1   declined to confirm that Spain declined to aid Morocco
s[2] = 2   to confirm that Spain declined to aid Morocco
s[3] = 3   confirm that Spain declined to aid Morocco
s[4] = 4   that Spain declined to aid Morocco
s[5] = 5   Spain declined to aid Morocco
s[6] = 6   declined to aid Morocco
s[7] = 7   to aid Morocco
s[8] = 8   aid Morocco
s[9] = 9   Morocco

Alphabetically Sorted

Sorted suffix array (suffixes denoted by s[i]):

s[0] = 8   aid Morocco
s[1] = 3   confirm that Spain declined to aid Morocco
s[2] = 6   declined to aid Morocco
s[3] = 1   declined to confirm that Spain declined to aid Morocco
s[4] = 9   Morocco
s[5] = 5   Spain declined to aid Morocco
s[6] = 0   Spain declined to confirm that Spain declined to aid Morocco
s[7] = 4   that Spain declined to aid Morocco
s[8] = 7   to aid Morocco
s[9] = 2   to confirm that Spain declined to aid Morocco
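The construction shown on these two slides can be reproduced with a short sketch. Sorting is done case-insensitively here, which is an assumption made to match the ordering on the slide; a production implementation would compare suffixes in place instead of copying slices.

```python
from typing import List

corpus = "Spain declined to confirm that Spain declined to aid Morocco".split()


def build_suffix_array(words: List[str]) -> List[int]:
    """Return the start positions of all suffixes, sorted alphabetically."""
    return sorted(range(len(words)),
                  key=lambda i: [w.lower() for w in words[i:]])


suffix_array = build_suffix_array(corpus)
print(suffix_array)  # [8, 3, 6, 1, 9, 5, 0, 4, 7, 2]
```

Only one integer per corpus position is stored; the suffixes themselves are never materialized.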

(Reasonably) Fast Find

[Figure: the sorted suffix array 8 3 6 1 9 5 0 4 7 2 from the previous slide. Because the suffixes are in alphabetical order, all occurrences of a phrase form one contiguous block that can be located by binary search.]
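A sketch of the lookup, assuming plain binary search over the sorted suffixes (the slides do not show the actual search code). All suffixes that start with the query phrase are adjacent in the sorted array, so two binary searches find the block's boundaries.

```python
from typing import List

corpus = "Spain declined to confirm that Spain declined to aid Morocco".split()
suffix_array = sorted(range(len(corpus)),
                      key=lambda i: [w.lower() for w in corpus[i:]])


def find(words: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return all corpus positions at which the phrase occurs."""
    phrase = [w.lower() for w in phrase]

    def prefix(k: int) -> List[str]:
        # First len(phrase) words of the k-th sorted suffix.
        start = sa[k]
        return [w.lower() for w in words[start:start + len(phrase)]]

    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix whose prefix >= phrase
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                      # first suffix whose prefix > phrase
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


print(find(corpus, suffix_array, ["Spain", "declined"]))  # [0, 5]
```

The size of the block gives the phrase count directly, which is the basis for computing probabilities on the fly.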

Applied to Factored Translation Models

Factored corpus:

index:  0      1         2   3        4     5      6         7   8    9
words:  Spain  declined  to  confirm  that  Spain  declined  to  aid  Morocco
POS:    NNP    VBD       TO  VB       IN    NNP    VBN       TO  VB   NNP
stems:  spain  declin    to  confirm  that  spain  declin    to  aid  morocco

• Index each factor

• Store word-level alignments

• Calculate probabilities on the fly

Generation Probabilities

p(NNP VBN | Spain declined) = 0.5
p(NNP VBD | Spain declined) = 0.5

[Figure: the factored corpus (words, POS, stems) and the sorted word suffix array 8 3 6 1 9 5 0 4 7 2 shown on the previous slides.]
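The generation probabilities can be reproduced on the example corpus with the sketch below. To keep it short, a linear scan stands in for the suffix array lookup (an assumption made for brevity); the probabilities are relative frequencies of the target factor at the positions where the source phrase occurs.

```python
from collections import Counter
from typing import Dict, List

words = "Spain declined to confirm that Spain declined to aid Morocco".split()
pos = "NNP VBD TO VB IN NNP VBN TO VB NNP".split()


def occurrences(factor: List[str], phrase: List[str]) -> List[int]:
    """Positions where the phrase occurs (linear scan instead of a suffix array)."""
    n = len(phrase)
    return [i for i in range(len(factor) - n + 1) if factor[i:i + n] == phrase]


def conditional(source: List[str], target: List[str],
                phrase: List[str]) -> Dict[str, float]:
    """p(target sequence | source phrase), read off the position-aligned factors."""
    hits = occurrences(source, phrase)
    counts = Counter(" ".join(target[i:i + len(phrase)]) for i in hits)
    return {seq: count / len(hits) for seq, count in counts.items()}


print(conditional(words, pos, ["Spain", "declined"]))  # words -> POS
print(conditional(pos, words, ["NNP"]))                # POS -> words
```

Swapping the roles of the two factors gives the other direction, for example p(Spain | NNP) = 2/3 and p(Morocco | NNP) = 1/3, without storing a separate table for each direction.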

Generation Probabilities

p(Spain | NNP) = 0.66666...
p(Morocco | NNP) = 0.33333...

[Figure: the factored corpus (words, POS, stems) with the sorted suffix array over its POS factor.]

POS:  NNP VBD TO VB IN NNP VBN TO VB NNP

Sorted suffix array (suffixes denoted by s[i]):

s[0] = 4   IN NNP VBN TO VB NNP
s[1] = 9   NNP
s[2] = 0   NNP VBD TO VB IN NNP VBN TO VB NNP
s[3] = 5   NNP VBN TO VB NNP
s[4] = 2   TO VB IN NNP VBN TO VB NNP
s[5] = 7   TO VB NNP
s[6] = 3   VB IN NNP VBN TO VB NNP
s[7] = 8   VB NNP
s[8] = 1   VBD TO VB IN NNP VBN TO VB NNP
s[9] = 6   VBN TO VB NNP

Translation Probabilities

p(L'Espagne a refusé de | Spain declined) = 0.5
p(l'Espagne avait refusé d' | Spain declined) = 0.5

[Figure: the factored corpus and the sorted word suffix array 8 3 6 1 9 5 0 4 7 2, together with a word-alignment grid between the English corpus and its French translation "L' Espagne a refusé de confirmer que l' Espagne avait refusé d' aider le Maroc".]

Translation Probabilities

p(l'Espagne avait refusé d' | Spain declined, NNP VBN) = 1

[Figure: the factored corpus, the sorted POS suffix array 4 9 0 5 2 7 3 8 1 6, and the word-alignment grid between the English corpus and its French translation "L' Espagne a refusé de confirmer que l' Espagne avait refusé d' aider le Maroc".]

Advantages

• Memory reduction
  – Memory = 2 * num factors * corpus + word alignments
  – Significantly less than phrase tables!

• Greater range of statistics
  – Arbitrary number of conditioning variables
  – Allows range of back-off strategies

• Can extract statistics for arbitrarily long sequences

Research to be Undertaken

• Integrate into Moses decoder

• Deal with increased computational complexity

• Change search strategies to incorporate longer factor sequences, of different levels of granularity

• Experiment to test if longer sequences improve translation quality

• Experiment with what variables to condition upon, how to back off

MIT Lincoln + Computer Science AI Labs

8/14/2006

Charles University

Wade Shen, Brooke Cowan, Ondrej Bojar and Christine Moran

Factored Translation Models for Small Data Problems

Experiments with Spanish, Czech and Chinese

Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research

General Motivations
Challenges with Small Data

• Phrase-based MT relies on large data
  – Learn “phrase” co-occurrence within language
  – Learn translation templates/phrases across languages

• Problems for phrase-based MT with small data
  – Word alignment
  – Hard to see enough phrases (coverage)
    • Especially in morphologically rich languages
  – Tend to rely on shorter phrases
    • Increased local agreement problems
    • Increased long-distance coherence problems

Possible Advantages of Factored Models
Generalization over Morphology

• We can model morph. variation and phrase translation separately for better statistics: Translation + Generation

– Spanish Gender

           Masculine               Feminine
Spanish    Él es un jugador rojo   Ella es una jugadora roja
English    he is a red player      she is a red player
Morph      m 3p+sing m m m         f 3p+sing f f f

Base forms: el ser un jugador roj

– Czech Case

           Nominative + Plural   Dative + Plural
Czech      černé kočky           černým kočkám
English    black cats            black cats
Morph      nom+pl nom+pl         dat+pl dat+pl

Base forms: černá kočka

Factors as Type Checking
Long Range Phenomena and Divergence

• Long range dependencies can be modeled with latent factors
  – Spanish: Verb – Subject Number Agreement

Spanish   Mi hija de dos años tiene catarro
Gloss     My daughter of two years has cold
Czech     Nachlazena je moje dvouletá dcera.

Agreement (AGR) in both sentences: Subject: 3p+sing, verb: 3p+sing

• Verb-Argument dependencies

Czech     Napsal zprávu o matčině domu na papír
Gloss     He wrote a message about mother’s house on a paper
          verb selects noun: accusative

Czech     Našel zprávu o matčině domu na papíře
Gloss     He found a message about mother’s house on a paper
          verb selects noun: locative


Phrase-Level Generalization

• Class-based divergences
– Chinese-English resultative constructions
– Similar pattern for a large class of verbs

• Longer distance movement dependencies
– Chinese-English questions

Chinese 你 要 答 破 吗made hit broken doneyou

回Gloss it

English you broke it

Chinese: 你 要 回答 [clause…] 吗
Gloss: you want reply [clause…] y/n-marker
English: would you like to reply to [clause…] ?
Tags: VModal, Pn; Tag: Part (causes reordering)

Verb Specific


Large vs. Small Data: How Generalizations May Affect SMT Performance

• With large data sets these phenomena can be learned
– Language models should get local agreement phenomena with enough data
– Long range agreement/coherence is still problematic
– Generalization may still be better, but errors in analysis can limit the benefit

• Generalization may be advantageous for small data
– For example (Spanish/Czech agreement): can’t learn every noun/adjective/determiner triple
– This is the situation for many real-world problems


Outline

• Motivations

• Experimental Design and Baselines
– Approaches
– Data Sets

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


Data Sets and Baselines

Data Set        Translation Direction   Size                          Baseline w/ different LMs (BLEU, surface)
Full Europarl   English→Spanish         950k LM train, 700k bitext    3g 29.35; 4g 29.57; 5g 29.54
Euromini        English→Spanish         60k LM train, 40k bitext      3g 23.41; 3g (950k) 25.10
Czech WSJ       English→Czech           20k LM train, 20k bitext      3g 25.82 (four references)
IWSLT Chinese   Chinese→English         40k LM train, 40k bitext      4g 19.54 (seven references)


Using Factored Models: Approaches for Small-Data Tasks

• Factored models we tried
– Different levels of linguistic information modeled separately (example: morphology vs. phrasal content)
– Feature “checking” of existing phrasal models with LMs on factors
– Generalized factor-based distortion: phrases are likely to move distance X if the preceding word is tag Y

• Hypothesis: these models allow better utilization of limited training data

Example of feature checking with a POS LM:

        Good                       Bad
Words   I would like some donuts   I would like some big jump
POS     pn mod vb det np           pn mod vb det adj vb
        High likelihood            Low likelihood
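The "feature checking" idea on this slide can be illustrated with a very small language model over POS tags. The sketch below is only an illustration under stated assumptions: the three training tag sequences are invented, add-one smoothing is used for simplicity, and the function names are hypothetical rather than part of the toolkit.

```python
import math
from collections import Counter

def train_bigram_lm(tag_sequences):
    """Collect unigram and bigram counts over tag sequences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        unigrams.update(padded[:-1])                      # history counts
        bigrams.update(zip(padded[:-1], padded[1:]))      # (history, next) counts
    vocab = {t for tags in tag_sequences for t in tags} | {"<s>", "</s>"}
    return unigrams, bigrams, len(vocab)

def log_prob(tags, model):
    """Add-one smoothed bigram log-probability of a tag sequence."""
    unigrams, bigrams, vocab_size = model
    padded = ["<s>"] + list(tags) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Invented toy training data (assumption, not workshop data).
TRAIN = [
    "pn mod vb det np".split(),
    "pn vb det adj np".split(),
    "det np vb det np".split(),
]
MODEL = train_bigram_lm(TRAIN)

good = log_prob("pn mod vb det np".split(), MODEL)      # tags of "I would like some donuts"
bad = log_prob("pn mod vb det adj vb".split(), MODEL)   # tags of "I would like some big jump"
```

In a factored decoder such a score would act as one more feature next to the surface LM, so a hypothesis whose tag sequence is implausible is penalized even when its word n-grams were never observed.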


Different Factored Approaches: Overview of Models Tried

• High Order Language Models

Analysis       Problems Addressed        Model Types
Supervised     Explicit Agreement        LMs over verbs/subject; LMs over nouns/determiners/adjectives
Supervised     Long Distance Coherence   LMs over POS
Unsupervised   Agreement/Coherence       LMs over Word-Classes

• Parallel Translation Models

Analysis       Problem Types             Model Types
Supervised     Explicit Agreement        Parallel Translation Models over Lemmas and Morphology
Unsupervised   Agreement/Coherence       Parallel Translation Models over Word-Classes and Surface


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish
– Morphology and Agreement Features (Brooke)
– Parallel Lemma and Morphology Translation (Wade)
– Scaling to Larger Corpora (Wade)

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


Spanish Experiments: Language Models over Morphological Features

• NDA
– Noun/Determiner/Adjective agreement
– Generate only on N, D and A tags (don’t cares elsewhere)

• VNP
– Verb/Noun/Preposition selection agreement
– Generate on V, N or P

Model: Surface (word) → Generate + Check Latent Factors (nda, vpn)

N/D/A Features
– Gender: masc, fem, common, none
– Number: sing, plural, invariable, none

V/N/P Features
– Number: sing, plural, invariable, none
– Person: 1p, 2p, 3p, none
– Prep-ID: preposition, none


Spanish Experiments: Skipped LMs for Agreement

• Allow NULL factors to be generated
• Increase effective context length to model longer range dependencies

Model: Surface → Generate Latent Factors

Source phrase: …gave the woman
Target phrase with factors:

word   dio   a     la    mujer
nda    X     X     s+f   s+f
vpn    3+s   “a”   X     s
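The difference between counting don't-care positions and skipping them can be made concrete with the slide's example phrase. This is a minimal sketch: the factor values follow one reading of the figure (an assumption), and `factor_stream` and `ngrams` are hypothetical helper names, not toolkit functions.

```python
DONT_CARE = "X"

def factor_stream(tokens, factor, skip=False):
    """Return the sequence of one factor as a factored LM would see it.

    With skip=True the don't-care positions are dropped, so the remaining
    factor values become adjacent and a fixed-order n-gram spans more words.
    """
    values = [tok[factor] for tok in tokens]
    if skip:
        values = [v for v in values if v != DONT_CARE]
    return values

def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Target phrase "dio a la mujer" with nda and vpn factors (one reading of the slide).
PHRASE = [
    {"word": "dio",   "nda": "X",   "vpn": "3+s"},
    {"word": "a",     "nda": "X",   "vpn": "a"},
    {"word": "la",    "nda": "s+f", "vpn": "X"},
    {"word": "mujer", "nda": "s+f", "vpn": "s"},
]

nda_plain = factor_stream(PHRASE, "nda")                 # don't-cares are kept
vpn_skipped = factor_stream(PHRASE, "vpn", skip=True)    # don't-cares are removed
vpn_bigrams = ngrams(vpn_skipped, 2)
```

With skipping, the bigram ("a", "s") links the preposition directly to the noun even though a determiner stands between them on the surface.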


Spanish Agreement LMs: Experimental Results

• No skipping (LM counts don’t-care positions)

Data Set   Baseline   NDA     VPN     Both
EuroMini   23.41      24.47   24.33   24.54

• With skipping

Data Set   Baseline   NDA+Skip   VPN+Skip
EuroMini   23.41      24.03      24.16

• No skipping with all morphological features, with and without POS

Data Set   Baseline   Morph   Morph+POS
EuroMini   23.41      24.66   24.25

• All models beat baseline
– Skipping doesn’t seem to help
– Full morphology is best


Spanish Experiments: Parallel Lemma/Morphology Translation

• Factor surface into lemma and morphology features (Person + Number + Gender + Case)
• Translate both simultaneously
• Re-generate target surface form
• Apply LM on both surface and morphology features

[Figure: Surface → Analysis → parallel translation of Lemma and morphology → Generation → Surface. Example: Me → lemma “I” + 1ps+Acc → lemma “Yo” + 1ps+Acc → Mi]

• Results:

Data Set             Baseline   Lemma
EuroMini + 950k LM   25.10      25.71
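The analysis, parallel translation and generation steps listed on this slide can be sketched as three table lookups. The tables below are toy stand-ins holding only the slide's own example (Me, I, 1ps+Acc, Yo, Mi); all names are hypothetical, and a real system would score many competing options instead of doing one deterministic lookup.

```python
# Toy tables (assumption): each holds only the single example shown on the slide.
ANALYSIS = {"Me": ("I", "1ps+Acc")}               # source surface -> (lemma, morphology)
LEMMA_TABLE = {"I": "Yo"}                         # source lemma -> target lemma
MORPH_TABLE = {"1ps+Acc": "1ps+Acc"}              # source morphology -> target morphology
GENERATION = {("Yo", "1ps+Acc"): "Mi"}            # (target lemma, morphology) -> target surface

def translate_word(surface):
    """Analyze, translate lemma and morphology in parallel, then generate."""
    lemma, morph = ANALYSIS[surface]               # analysis step
    target_lemma = LEMMA_TABLE[lemma]              # translation step 1
    target_morph = MORPH_TABLE[morph]              # translation step 2
    return GENERATION[(target_lemma, target_morph)]  # generation step

result = translate_word("Me")
```

Because lemmas and morphology are translated by separate tables, a lemma seen only in one inflection during training can still be generated in another, as long as the generation table knows the form.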


Scaling Up to Large Training: POS Language Models

• Full Train → Less/No Gain from richer features

[Plot: POS-LM vs. Baseline. x-axis: POS n-gram order (3g to 9g); y-axis: BLEU score (28 to 31); series: Baseline, POS-LM, Full Tags. NOTE: Scale]


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech
– Factored Word Alignment for Limited Data
– Rich Morphology and Tagged LMs
– Putting it Together: Parallel Translation

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Analysis and Conclusions

• Follow-on Research


Factors for Coping with Limited Data: Better Word Alignment for Czech

• Word alignment is difficult when data is limited and morphology is rich
– Data: 20k bitext sentences, large vocabulary
– Contrast set: 20k + 840k (out-of-domain) sentences
– Task: English→Czech

• Two methods to deal with limited data: stem alignment, lemma alignment

• Contrastive behavior for small and large data

Data Set    Word-Word   Stem-Lemma   Stem-Stem
20k Czech   25.17       25.23        25.82

24.99Large Contrast 25.40
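One reason alignment on stems can help with limited data is that it shrinks the vocabulary the aligner has to estimate statistics for. The sketch below illustrates this with the Czech forms used earlier in the talk. The prefix-truncation stemmer and the length of 4 are assumptions made for illustration; the slides do not say how stems were obtained.

```python
import unicodedata

def stem(word, length=4):
    """Crude stemmer (assumption): keep the first `length` characters."""
    return unicodedata.normalize("NFC", word)[:length]

def vocabulary_size(sentences, normalize=lambda w: w):
    """Number of distinct word types after applying `normalize` to each token."""
    return len({normalize(w) for s in sentences for w in s.split()})

CZECH = ["černé kočky", "černým kočkám"]

surface_types = vocabulary_size(CZECH)           # every inflected form is its own type
stem_types = vocabulary_size(CZECH, stem)        # inflections collapse onto shared stems
```

After word alignment on the stemmed text, the alignment links can be carried back to the surface words, since stemming does not change token positions.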


Czeching Rich Morphology with Tags: Tagged Czech Language Models

• Idea: Use morphologically rich POS tag sequences to “czech” target output generation

[Figure: surface “cat” → “kočky”; generation of the tag N+acc; apply LM on the tag]

• POS information configurations (Baseline: 25.82)

Configuration   Contents                                                                      Size        Result
Full Tags       Feature 1, Feature 2, … (15 total)                                            1098 tags   27.04
CNG Tags        Case, Number + Gender on V, P, PP, N, A                                       707 tags    27.45
CNG+VP          CNG features; Person + Tense + Aspect (verbs); Lemma + Case (prepositions)   899 tags    27.62
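Reducing a full tag to a CNG (case, number, gender) tag can be sketched as a projection of a 15-position Czech positional tag. The sketch assumes the Prague Dependency Treebank layout, in which position 1 is the POS, position 3 the gender, position 4 the number and position 5 the case. The set of word classes that keep CNG features and the output format are assumptions for illustration, not the workshop's exact tag set.

```python
# Word classes that keep case/number/gender (assumption): noun, adjective,
# pronoun, verb, preposition in positional-tag notation.
CNG_CLASSES = {"N", "A", "P", "V", "R"}

def cng_tag(positional_tag):
    """Project a 15-position tag onto POS plus case, number and gender."""
    if len(positional_tag) != 15:
        raise ValueError("expected a 15-position tag")
    pos = positional_tag[0]
    gender, number, case = positional_tag[2], positional_tag[3], positional_tag[4]
    if pos not in CNG_CLASSES:
        return pos                       # other classes keep only their POS
    return f"{pos}:c{case}:n{number}:g{gender}"

noun_tag = "NNIP1" + "-" * 5 + "A" + "-" * 4    # a noun tag: inanimate masculine, plural, case 1
prep_tag = "RR--4" + "-" * 10                   # a preposition tag governing case 4
reduced_noun = cng_tag(noun_tag)
reduced_prep = cng_tag(prep_tag)
```

A smaller tag set of this kind gives the tag LM denser statistics than the full tags, which is one possible reading of why the reduced configurations score higher on the slide.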


Comparing with Larger Data Models: Tagged Czech Language Models

• Large vs. small data

System     Data Set                          BLEU    Relative Improvement
Baseline   20k Czech                         25.82   –
Baseline   Large Contrast (20k + 840k OOD)   27.47   –
CNG+VP     20k Czech                         27.62   6.97%
CNG+VP     Large Contrast (20k + 840k OOD)   28.12   2.37%

• Tagged language models improve performance for small data significantly
– Approaches large data performance
• Large task also improves (but much less: 2.37% vs. 6.97%)
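The relative improvements quoted on this slide follow directly from the BLEU scores. A short check, using only the numbers on the slide (the function name is illustrative):

```python
def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline

small = round(relative_improvement(25.82, 27.62), 2)   # 20k Czech, CNG+VP vs. baseline
large = round(relative_improvement(27.47, 28.12), 2)   # Large Contrast, CNG+VP vs. baseline
```

The small-data gain is about three times the large-data gain, and the tagged small-data system (27.62) ends up slightly above the untagged large-data baseline (27.47).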


Parallel Translation Models for Czech

• Motivation: Factored LM models seem to lose number information

[Figure: parallel translation of surface (“him” → “ho”) and of POS Tag + CNG Features (3p+acc → 3p+acc); second variant translates on lemma]

Model                           Result
Surface→Surface + POS→POS+CNG   25.94
Surface→Lemma + POS→POS+CNG     26.43

• Better than baseline, but worse than both CNG & CNG+VP


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models (Christine)
– Lexical Distortion Models
– Factor-based Distortion
– Results

• Models for Sparse Statistics in Chinese

• Analysis and Conclusions

• Follow-on Research


Generalized Distortion Modeling: Introduction to Distortion

• For each phrase pair we learn its likely placement relative to the previous phrase

• Orientations
– Monotone: word alignment point on top left
– Swap: word alignment point on top right
– Discontinuous: not monotone or swap

• Examples
– la casa roja → the red house
– D NN ADJ → D ADJ NN

[Figure: source/target alignment grid illustrating Monotone, Swap and Discontinuous]
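The three orientations can be read off the word alignment grid. The sketch below uses one common word-alignment-based formulation, which is an assumption about the exact definition intended here: a phrase pair is monotone if there is an alignment point diagonally before its start ("top left"), swap if there is one just after its source end on the previous target position ("top right"), and discontinuous otherwise. Indices are 0-based, spans are inclusive, and the sentence start is treated as an aligned point.

```python
def orientation(src_start, src_end, tgt_start, alignment):
    """Classify the orientation of a phrase pair from word alignment points.

    alignment: set of (source_index, target_index) pairs.
    """
    points = set(alignment) | {(-1, -1)}          # sentence start counts as aligned
    if (src_start - 1, tgt_start - 1) in points:
        return "monotone"
    if (src_end + 1, tgt_start - 1) in points:
        return "swap"
    return "discontinuous"

# "la casa roja" -> "the red house": la-the, casa-house, roja-red
ALIGNMENT = {(0, 0), (1, 2), (2, 1)}

o_the = orientation(0, 0, 0, ALIGNMENT)      # la / the
o_red = orientation(2, 2, 1, ALIGNMENT)      # roja / red
o_house = orientation(1, 1, 2, ALIGNMENT)    # casa / house
```

A lexicalized distortion model would collect such orientation events per phrase pair during training and turn the counts into probabilities.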


Factor-based Distortion Models

• A factor-based extension of lexicalized distortion
– Use of more general factors, e.g. POSf-POSe, Lemma-Lemma

• Can model longer range dependencies
– More conditioning variables

• Motivating results
– Hard-coding a few factor-based rules (e.g. swap nouns and adjectives when translating from English to Spanish) led to improvements (Gispert et al., 2006)
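Conditioning orientation on factors instead of words amounts to pooling orientation counts over all phrase pairs that share a factor pair. The following is a minimal sketch of that estimation step. The event list is invented toy data, the additive smoothing constant is an assumption, and `estimate` is a hypothetical name.

```python
from collections import Counter, defaultdict

ORIENTATIONS = ("monotone", "swap", "discontinuous")

def estimate(events, smoothing=0.5):
    """Estimate P(orientation | source factor, target factor) from observed events.

    events: iterable of ((source_factor, target_factor), orientation).
    """
    counts = defaultdict(Counter)
    for key, orient in events:
        counts[key][orient] += 1
    model = {}
    for key, c in counts.items():
        total = sum(c.values()) + smoothing * len(ORIENTATIONS)
        model[key] = {o: (c[o] + smoothing) / total for o in ORIENTATIONS}
    return model

# Invented orientation events keyed by a POSf-POSe pair (assumption).
EVENTS = [
    (("D", "D"), "monotone"),
    (("NN", "NN"), "swap"),
    (("NN", "NN"), "swap"),
    (("NN", "NN"), "monotone"),
    (("ADJ", "ADJ"), "discontinuous"),
]
MODEL = estimate(EVENTS)
noun_dist = MODEL[("NN", "NN")]
```

Because many word pairs map to the same tag pair, each distribution is estimated from far more events than a word-level table would see, which is the appeal on small data.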


Factor-based Distortion: Spanish Experiments

• Lexicalized distortion only

• Factor-based distortion on small data

• Further experiments
– Other factors
– Minimizing model parameters
– Combining different models

Data Set ResultBaseline (No Lexical)

Factored: POS-POS SystemCombined: Lexical + POS-POS

Baseline Lexical

Europarl Lang Pharaoh MosesEn De

Es En

En Es

18.15 18.85

31.06 31.85 31.46 32.37


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


IWSLT Chinese: Experiments with Unsupervised Annotation

• Data: travel-domain sentences, limited vocabulary, short sentences
• Task: text and ASR translation, Chinese→English

• Can we use automatic word classes to learn general sequence constraints?

• First experiment: 2-gram word class LMs of varying orders

Model example (surface: “How much is it?”):

word    总共   多少   钱   ?
class   c1     c22    c3   c55
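Mapping words to automatically induced classes and then taking n-grams of a chosen order over the class sequence can be sketched as follows. The class IDs are the ones that appear in the slide's figure; the word order, the unknown-class symbol and the function names are assumptions for illustration.

```python
# Word-to-class mapping as read from the slide's figure (word order is an assumption).
WORD_CLASSES = {"总共": "c1", "多少": "c22", "钱": "c3", "?": "c55"}

def to_classes(words, unknown="c_unk"):
    """Replace each word by its class ID; unseen words share one unknown class."""
    return [WORD_CLASSES.get(w, unknown) for w in words]

def class_ngrams(words, order):
    """Class n-grams of the given order, the units a class LM would count."""
    classes = to_classes(words)
    return [tuple(classes[i:i + order]) for i in range(len(classes) - order + 1)]

SENTENCE = ["总共", "多少", "钱", "?"]
class_sequence = to_classes(SENTENCE)
trigrams = class_ngrams(SENTENCE, 3)
```

Since the class inventory is much smaller than the word vocabulary, higher n-gram orders remain estimable on a 40k-sentence corpus, which is what the varying-order experiment probes.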


IWSLT Chinese: Alignment Templates for Translation

• Second experiment: extend class-based LM to the translation model

• Bigram word classes for source and target

• Translate alignment templates similar to [Och 98] + surface

• Apply LM to surface and class

[Figure: parallel translation of Word Class and Surface factors, followed by Generation]


IWSLT Chinese: Autoclass Results

[Plot: x-axis: class n-gram order (3g to 9g); y-axis: BLEU score (18 to 22.5); series: Baseline, Class-LM, ClassTrans+LM. NOTE: Scale]

• Class-LM significantly better (p=0.05, ~1.0 BLEU)
• Class-Trans may be limited by synchronous PT constraint
– Start to address here, but not in time for eval


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Coping with Rich Morphological Constraints in Czech

• Conclusions and Follow On Research


Conclusions and Future Work

• Factored approach can help with small data
– Large data tasks may need different factored approaches

• MIT/LL + CSAIL
– Continue experiments with morphology and coherence
– Fully asynchronous factor translation
– Apply techniques to other languages: extend existing LCTL experiments
– Syntax-driven reordering models (Brooke)

• Asynchronous Factors Translation (Hieu)

• Making use of verb sub-categorization information (Ondrej)


Valency-Aware Machine Translation

Project Proposal

Ondrej Bojar

August 17, 2006

Ondrej Bojar Valency-Aware Machine Translation August 17, 2006


Overview

• JHU Workshop motivation and one of the results.

• State-of-the-art MT errors.

• Project goal.

• Motivation: Why Czech.

• Proposed strategy and information sources.

• Summary.

Appendices: References, illustrations and further details on Czech and English


Workshop Motivation

• Statistical machine translation (SMT) into morphologically rich languages is more difficult than from them. See e.g. Koehn (2005).

• One of the workshop goals: examine the utility of factored translation models to translate into morphologically rich languages.

• There was room for improvement:

Regular BLEU, English→Czech: 25%
BLEU of lemmatized MT against lemmatized references: 32%

⇒ Errors in morphology cause large BLEU loss.


One of the Workshop Results

• Significant improvements gained on small data sets: English→Czech, 20k sentences, BLEU 25.82% to 27.62%, or up to 28.12% with additional out-of-domain parallel data.

• Still far below the margin of lemmatized BLEU (35%).

• However, local agreement is already very good:

Microstudy, adjective-noun agreement: 74% correct, 2% mismatch, other: missing noun etc.

⇒ So where are the morphological errors?


Current English→Czech MT Errors

Microstudy of current best MT output (BLEU 28.12%), intuitive metric:

• 15 sentences, 77 verb-modifier pairs in source text examined:

Translation of …   … preserves meaning   … is disrupted   … is missing
Verb               43%                   14%              21%
Modifier           79%                   12%              6%

But: when verb and modifier are both correct, 44% of cases are non-grammatical or meaning-disturbing relations.


Sample Errors

Input: Keep on investing.

MT output: Pokračovalo investování. (grammar correct here!)

Gloss: Continued investing. (Meaning: The investing continued.)

Correct: Pokračujte v investování.

⇒ The language model misled us ⇒ need to include source valency information.

Input: brokerage firms rushed out ads . . .

MT output: brokerské firmy vyběhl reklamy

Gloss: brokerage firms[pl.fem] ran[sg.masc] ads[pl.nom, pl.acc, pl.voc, sg.gen]

Correct option 1: brokerské firmy vyběhly s reklamami[pl.instr]

Correct option 2: brokerské firmy vydaly reklamy[pl.acc]

Target-side data may be rich enough to learn: vyběhnout–s–instr

Not rich enough to learn all morphological and lexical variants: vyběhl–s–reklamou, vyběhla–s–reklamami, vyběhl–s–prohlášením, vyběhli–s–oznámením, . . .


Project Goal

Improve MT output quality by valency information.


Motivation: Why Czech

• Relevant properties: very rich morphological system and relatively free word order.
• Well-established theory on syntax and valency in particular: Sgall, Hajičová, and Panevová (1986), Panevová (1994)
• Data available:
– monolingual and parallel corpora
– manual surface and deep treebanks (parallel forthcoming!)
– manual valency lexicons

Language   Corpus                                           Annotation up to                          Tokens
Cs         PDT 2.0 (Hajič, 2005)                            manual surface and deep syntax            1.5M surf.
Cs         CNC (Kocek, Kopřivová, and Kučera, 2000)         automatic lemmatization and morphology    114M
Cs         Web corpus                                       automatic surface syntax                  100M
Cs↔En      PCEDT 1.0 (Čmejrek, Cuřín, and Havelka, 2003)    automatic surface and deep syntax         500k
Cs↔En      CzEng 0.5                                        automatic surface syntax                  15M


Proposed Strategy

Preliminary experiments at workshop:

• Factored models touching valency explored during workshop perform badly. No gain or a slight loss.

Future:

• Evaluate the causes. Was it just sparse data?

• Check subcategorization using partially lexicalized language models. Morphological LM with verbs lexicalized should capture subcategorization.

• Experiment with syntax-based language models. (Chelba and Jelinek, 1998; Charniak, 2001)

• Map explicit subcategorization information from source to target. Translate lemma+subcat to lemma+subcat and POS to POS, generate surface from this.
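A partially lexicalized language model of the kind proposed here can be trained on a token stream in which verbs keep their lemma and all other words are reduced to their morphological tag. The sketch below only builds that stream. The tag strings are simplified, hypothetical labels rather than real Czech positional tags, and the helper name is illustrative.

```python
def partially_lexicalized(tokens, lexicalized_pos=("V",)):
    """Build the LM input stream: lemma/tag for verbs, tag only for everything else.

    tokens: list of (lemma, tag) pairs, where the first tag character is the POS.
    """
    stream = []
    for lemma, tag in tokens:
        if tag[0] in lexicalized_pos:
            stream.append(f"{lemma}/{tag}")     # verb stays lexicalized
        else:
            stream.append(tag)                  # other words contribute morphology only
    return stream

# "brokerské firmy vyběhly s reklamami" with simplified, hypothetical tags.
SENTENCE = [
    ("brokerský", "A-pl-nom"),
    ("firma", "N-pl-nom"),
    ("vyběhnout", "V-pl"),
    ("s", "R-instr"),
    ("reklama", "N-pl-instr"),
]
stream = partially_lexicalized(SENTENCE)
```

An n-gram model over such a stream can learn that the verb lemma is typically followed by a preposition and noun in the instrumental, without needing to have seen every inflected verb form with every noun.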


Project Will Use these Sources of Information

• Available valency/subcategorization dictionaries. VALLEX for Czech. (∼PropBank for English.)

• Automatically collected subcategorization data. (Korhonen, 2002) and previous, my diss. in prep.

• Word-sense-like algorithms to label verb occurrences with frames. (Bojar, Semecký, and Benešová, 2005), and all WSD community results

Compare with simple approaches:

• More monolingual data for plain n-gram language models may help enough.
• Are valency-based generalizations useful in general/on small data/on out-of-domain data?


Summary

• Factored models help fixing morphology → local dependencies already correct.

• Significant margin for improving verb-modifier agreement.

• English→Czech pair is a good fit for the experiments.

• Improved valency models should improve translation quality: valency theory, data and methods available.


References

Bojar, Ondřej. 2003. Towards Automatic Extraction of Verb Frames. Prague Bulletin of Mathematical Linguistics, 79–80:101–120.

Bojar, Ondřej, Jiří Semecký, and Václava Benešová. 2005. VALEVAL: Testing VALLEX Consistency and Experimenting with Word-Frame Disambiguation. Prague Bulletin of Mathematical Linguistics, 83:5–17.

Charniak, Eugene. 2001. Immediate-head parsing for language models. In Meeting of the Association for Computational Linguistics, pages 116–123.

Chelba, Ciprian and Frederick Jelinek. 1998. Exploiting syntactic structure for language modeling. In Christian Boitet and Pete Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics, pages 225–231, San Francisco, California. Morgan Kaufmann Publishers.

Čmejrek, Martin, Jan Cuřín, and Jiří Havelka. 2003. Czech-English Dependency-based Machine Translation. In EACL 2003 Proceedings of the Conference, pages 83–90. Association for Computational Linguistics, April.

Collins, Michael. 1996. A New Statistical Parser Based on Bigram Lexical Dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 184–191.

Collins, Michael, Jan Hajič, Eric Brill, Lance Ramshaw, and Christoph Tillmann. 1999. A Statistical Parser of Czech. In Proceedings of 37th ACL Conference, pages 505–512, University of Maryland, College Park, USA.

Hajič, Jan. 2005. Complex Corpus Annotation: The Prague Dependency Treebank. In Mária Šimková, editor, Insight into Slovak and Czech Corpus Linguistics, pages 54–73, Bratislava, Slovakia. Veda, vydavateľstvo SAV.

Holan, Tomáš. 2003. K syntaktické analýze českých(!) vět. In MIS 2003. MATFYZPRESS, January 18–25, 2003.

Kocek, Jan, Marie Kopřivová, and Karel Kučera, editors. 2000. Český národní korpus - úvod a příručka uživatele. FF UK - ÚČNK, Praha.

Koehn, Philipp. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of MT Summit X, September.

Korhonen, Anna. 2002. Subcategorization Acquisition. Technical Report UCAM-CL-TR-530, University of Cambridge, Computer Laboratory, Cambridge, UK, February.

Kruijff, Geert-Jan M. 2003. 3-Phase Grammar Learning. In Proceedings of the Workshop on Ideas and Strategies for Multilingual Grammar Development.

Panevová, Jarmila. 1994. Valency Frames and the Meaning of the Sentence. In Ph. L. Luelsdorff, editor, The Prague School of Structural and Functional Linguistics, pages 223–243, Amsterdam-Philadelphia. John Benjamins.

Sgall, Petr, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague, Czech Republic/Dordrecht, Netherlands.


Analysis of Czech

Sentence #36: Zákony udělejte pro lidi (Laws make for people)

Analytic (surface syntactic):

Form       Gloss    Label
Zákony     Laws     OBJ
udělejte   make     PRED
pro        for      AUXP
lidi       people   ADV

Tectogrammatical (deep syntactic):

Node              Gloss             Functor
zákon[Pl]         law[Pl]           PAT
udělat[imp]       make[imp]         PRED
(generated)       you               ACT
člověk[Pl, pro]   person[Pl, for]   BEN

Morphological:

Form       Lemma    Morphological tag
zákony     zákon    NNIP1-----A----
zákony     zákon    NNIP4-----A----
zákony     zákon    NNIP5-----A----
zákony     zákon    NNIP7-----A----
udělejte   udělat   Vi-P---2--A----
udělejte   udělat   Vi-P---3--A---4
pro        pro-1    RR--4----------
lidi       člověk   NNMP1-----A----
lidi       člověk   NNMP4-----A----
lidi       člověk   NNMP5-----A----


Properties of Czech language

                   Czech                                  English
Rich morphology    ≥ 4,000 tags possible, ≥ 2,300 seen    50 used
Word order         free                                   rigid

• rigid global word order phenomena: clitics
• rigid local word order phenomena: coordination, clitics mutual order

Nonprojective sentences   16,920   23.3%
Nonprojective edges       23,691   1.9%

Known parsing results   Czech        English
Edge accuracy           69.2–82.5%   91%
Sentence correctness    15.0–30.9%   43%

Data by (Collins et al., 1999), (Holan, 2003), Zeman (http://ckl.mff.cuni.cz/~zeman//projekty/neproj/index.html) and (Bojar, 2003). Consult (Kruijff, 2003) for measuring word order freeness.


Detailed numbers on Czech

Edge length    1      ≤ 2    ≤ 5
English [%]    74.2   86.3   95.6
Czech [%]      51.8   72.1   90.2
(Data for English by (Collins, 1996). Data for Czech by (Holan, 2003).)

Number of gaps   0      1      2
Sentences [%]    76.9   22.7   0.42
(Data by (Holan, 2003).)

Climbing steps   1      2     3     4     5
Nodes [%]        90.3   8.0   1.3   0.3   0.1
(Data by (Holan, 2003).)


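Analytic vs. Tectogrammatical (2)

Sentence #45: To by se mělo změnit.

Analytic:

Form     Gloss                 Label
To       It                    SB
by       conjunct particle     AUXV
se       reflexive particle    AUXR
mělo     should                PRED
změnit   change                OBJ
.        full stop             AUXK

Tectogrammatical:

Nodes: to (it), mít (should), změnit[conj] (change[conj]), Generic Actor
Functors shown in the figure: PRED, PAT, ACT, PRED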


Asynchronous Factored Translation

Hieu Hoang
University of Edinburgh


Current System

Phrase Table 1
Je vous achète → I am buying you

Phrase Table 2
PRO PRO VB → PRO VB VB PRO

Translating
Je vous achète un chat
PRO PRO VB ART NN


Limitations

Synchronous

Phrase Table 1
Je → I
vous → you
achète → am buying

Phrase Table 2
PRO PRO VB → PRO VB VB PRO

Je vous achète un chat
PRO PRO VB ART NN


Asynchronous Translation

Synchronous

Phrase Table 1
Je → I
vous → you
achète → am buying

Phrase Table 2
PRO PRO VB → PRO VB VB PRO

Je vous achète un chat
PRO PRO VB ART NN


Tiling

Je vous achète un chat

PRO PRO VB ART NN

Current System


Tiling

Current System
Je vous achète un chat
PRO PRO VB ART NN

Future
Je vous achète un chat
PRO PRO VB ART NN


Long Templates

Je vous achète un chat
PRO PRO VB ART NN

Phrase Table 1
Je → I
Vous → You
achète → am buying
un chat → a cat

Phrase Table 2
PRO PRO VB ART NN → PRO VB VB PRO ART NN


Templates

Je vous achète un chat
PRO PRO VB ART NN

Phrase Table 1
Je → I
Vous → You
achète → am buying
un chat → a cat

Phrase Table 2
PRO PRO VB ART NN → PRO VB VB PRO ART NN


Combining information from different factors

ni suo ta da mingzi le ma ?

You said his name, right ?

past

past

You say his name already question

Surface:

Tense:


Challenges

• Computational complexity
• Pruning strategies
• Recombination
• Scoring


Translation of morphologically rich languages with additional linguistic information

Chris Dyer, Philipp Koehn, Chris Callison-Burch, Hieu Hoang

17 August 2006

Dyer, Koehn, Callison-Burch, Hoang Morphologically rich languages 17 August 2006


1

Morphologically rich languages

• Languages differ in their morphological markup
• Examples with increasing complexity:

– Chinese: no marking for number, gender, tense, or aspect
– English: number(2) for nouns, four verb forms
– Spanish: number(2) and gender(2) for adjectives, ...
– German: number(2), gender(3), case(4), definiteness for adjectives, ...
– Arabic: number(3), gender(2), case(3), definiteness, possessors for nouns
– Finnish: prepositions often expressed morphologically

Language   Vocabulary size in Europarl
English    65,887 word forms
Spanish    102,886 word forms
German     195,290 word forms
Finnish    358,345 word forms


2

Impact of morphological complexity

• How much information do we have if we discount inflectional morphology?

• Experiment (systems trained on full 700,000 sentence Europarl corpus):

Method                           devtest       test
surface → surface                18.22 BLEU    18.04 BLEU
surface → surface (lemmatize)    22.27 BLEU    22.15 BLEU
surface → lemma                  22.70 BLEU    22.45 BLEU

• Gain of 4 BLEU points possible, if we can solve morphology


3

Problem: unknown word forms

• Unknown surface word forms (German)

test set        unigrams   bigrams   trigrams
devtest-2006    0.71%      12.00%    40.46%
test-2006       0.69%      12.20%    41.08%

• Unknown lemmas (German)

test set        unigrams   bigrams   trigrams
devtest-2006    0.64%      9.05%     33.93%
test-2006       0.64%      9.14%     34.36%
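
Unknown n-gram rates of this kind can be computed along the following lines. This is a minimal sketch, not the workshop's actual script: the function names are hypothetical, and it assumes the rate is counted over n-gram tokens of the test set (the slides do not say whether tokens or types were counted).

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield all contiguous n-grams of a tokenized sentence."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unknown_rate(train: List[List[str]], test: List[List[str]], n: int) -> float:
    """Fraction of test n-gram tokens that never occur in the training data."""
    seen: Set[Tuple[str, ...]] = set()
    for sentence in train:
        seen.update(ngrams(sentence, n))
    total = 0
    unknown = 0
    for sentence in test:
        for gram in ngrams(sentence, n):
            total += 1
            if gram not in seen:
                unknown += 1
    return unknown / total if total else 0.0
```

Running the same function once on surface forms and once on lemmatized text gives the two kinds of table shown on this slide.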


4

Factored models

• Factored models allow us to address these problems

• Sparse data

– back off to translation of lemmas
– back off to language models with richer statistics

• Agreement and grammatical coherence

– use of factors that enforce agreement within noun phrases
– use of factors that enforce agreement on the clause level


5

Addressing data sparseness with lemmas

[Diagram: input factor: word; output factors: word, lemma]

• Translate surface into lemma

• Generate surface from lemma

• Translate surface into surface

• Language models over surface and lemma
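
The generation step (surface from lemma) can be illustrated with a small relative-frequency table. This is only a sketch under stated assumptions: the function name and the (lemma, surface) input format are hypothetical, and the toolkit's own generation tables may be estimated and stored differently.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def train_generation_table(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate p(surface | lemma) by relative frequency.

    `pairs` holds (lemma, surface) observations, e.g. from lemmatized
    monolingual target-language text.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lemma, surface in pairs:
        counts[lemma][surface] += 1
    table: Dict[str, Dict[str, float]] = {}
    for lemma, surface_counts in counts.items():
        total = sum(surface_counts.values())
        table[lemma] = {s: c / total for s, c in surface_counts.items()}
    return table
```

Because the table only needs target-side text, it can be trained on monolingual data that is much larger than the parallel corpus.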


6

Addressing data sparseness with lemmas, model 2

[Diagram: input factor: word; output factors: word, lemma]

• Translate surface into surface

• Generate lemma from surface

• Language models over surface and lemma


7

Experimental Results

Method                          devtest   test
baseline                        18.22     18.04
hidden lemma (gen only)         18.82     18.69
hidden lemma (gen and trans)    18.41     18.52
best published results          -         18.15

• Better performance than baseline model

• Simpler model has higher performance

– fewer search errors


8

Addressing data sparseness with factored models

[Diagram: input and output factors: lemma, part-of-speech, morphology, word]

• Morphological analysis and generation model

• Pitfalls of this approach

– tag set does not necessarily have sufficient information
– explosive search space on large models


9

Overall grammatical coherence

[Diagram: input factor: word; output factors: word, part-of-speech]

• High order language models over POS

• Motivation: syntactic tags should enforce syntactic sentence structure

• Results: No major impact with 7-gram POS model (BLEU 18.25 vs. 18.22)

• Analysis: local grammatical coherence already fairly good, POS sequence LM model not strong enough to support major restructuring


10

Local agreement (esp. within noun phrases)

[Diagram: input factor: word; output factors: word, part-of-speech, morphology]

• High order language models over POS and morphology

• Motivation

– DET-sgl NOUN-sgl good sequence
– DET-sgl NOUN-plural bad sequence
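
A toy sequence model over combined POS and morphology tags shows why such a language model prefers the agreeing sequence. This is a sketch only: it uses a bigram model with add-one smoothing for brevity (the slides refer to high-order models), and the class name and training data are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


class BigramTagLM:
    """Add-one smoothed bigram model over tag sequences."""

    def __init__(self, sequences: Iterable[List[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.context: Counter = Counter()
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def logprob(self, seq: List[str]) -> float:
        """Log-probability of a tag sequence, including sentence boundaries."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.context[prev] + v))
            for prev, cur in zip(padded, padded[1:])
        )


lm = BigramTagLM([
    ["DET-sgl", "NOUN-sgl"],
    ["DET-sgl", "NOUN-sgl"],
    ["DET-plural", "NOUN-plural"],
])
good = lm.logprob(["DET-sgl", "NOUN-sgl"])
bad = lm.logprob(["DET-sgl", "NOUN-plural"])
```

With this training data, `good` is higher than `bad`, so a decoder that adds the tag LM score as a feature is nudged toward agreeing noun phrases.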


11

Agreement within noun phrases

• Experiment: 7-gram POS, morph LM in addition to 3-gram word LM

• Results

Method            Agreement errors in NP    devtest       test
baseline          15% in NP ≥ 3 words       18.22 BLEU    18.04 BLEU
factored model    4% in NP ≥ 3 words        18.25 BLEU    18.22 BLEU

• Example

– baseline: ... zur zwischenstaatlichen methoden ...
– factored model: ... zu zwischenstaatlichen methoden ...

• Example

– baseline: ... das zweite wichtige änderung ...
– factored model: ... die zweite wichtige änderung ...


12

Subject-verb agreement

• Lexical n-gram language model would prefer

the paintings of the old man is beautiful

“old man is” is a better trigram than “old man are”

• Correct translation

the  paintings   of  the  old  man  are       beautiful
-    SBJ-plural  -   -    -    -    V-plural  -

• Special tag that tracks count of subject and verb

p(-,SBJ-plural,-,-,-,-,V-plural,-) > p(-,SBJ-plural,-,-,-,-,V-singular,-)


13

Experiment on English–German

• Add special features for subject and verb

• Verb

– our morphological analyzer does not provide verb morphology
→ use of surface forms

• Subject

– subject identified with German parser (Amit Dubey’s parser trained on TIGER treebank)
– if pronoun: surface form of pronoun
– if noun phrase: POS and morphological tags of determiner, adjective, and noun


14

Skip language models

• Full language model confused by many non-items:
p(-,SBJ-plural,-,-,-,-,V-plural,-) > p(-,SBJ-plural,-,-,-,-,V-singular,-)

• Skip language models: ignoring irrelevant tags:
p(SBJ-plural,V-plural) > p(SBJ-plural,V-singular)

• Results: experiments have not finished yet; preliminary results are inconclusive
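
The skip idea can be sketched as a preprocessing step that drops the placeholder tags before counting n-grams. This is an illustration under assumptions, not the workshop implementation: the function names are hypothetical, "-" is assumed to be the only irrelevant tag, and probabilities are unsmoothed relative frequencies.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def skip_sequence(tags: Sequence[str], ignore: Tuple[str, ...] = ("-",)) -> List[str]:
    """Drop tags that carry no information for the phenomenon of interest."""
    return [t for t in tags if t not in ignore]


def skip_bigram_probs(
    corpus: Iterable[Sequence[str]], ignore: Tuple[str, ...] = ("-",)
) -> Dict[Tuple[str, str], float]:
    """Relative-frequency bigram probabilities over the skipped sequences."""
    pair_counts: Counter = Counter()
    left_counts: Counter = Counter()
    for tags in corpus:
        kept = skip_sequence(tags, ignore)
        for left, right in zip(kept, kept[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}
```

After skipping, subject and verb tags become adjacent regardless of how many words separated them, so a low-order model is enough to capture their agreement.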


15

Reflection on the data

• Clause elements are translated reasonably well

– now high agreement within noun phrases (with factored model, 4% agreement errors)

• Overall sentence structure muddled

– subject–verb agreement hard to enforce, since which noun phrase is subject is hard to establish
– role (and hence case) of noun phrases often wrong, since relation to verb is unclear

• Similar problems when translating Arabic–English, Chinese–English

– this motivates work on syntax-based machine translation
– one solution: syntactic restructuring models (Brooke’s presentation)
– another solution: clause-level sequence models


16

Clause level sequence models

• Correct sentence with verb

the  paintings  of   the  old  man  are  beautiful
SBJ  SBJ        OBJ  OBJ  OBJ  OBJ  V    ADJ

• Incorrect sentence without verb

the  paintings  of   the  old  man  beautiful
SBJ  SBJ        OBJ  OBJ  OBJ  OBJ  ADJ

• Syntactic role label sequence model is on the steering wheel!

p(SBJ,SBJ,OBJ,OBJ,OBJ,OBJ,V,ADJ) > p(SBJ,SBJ,OBJ,OBJ,OBJ,OBJ,ADJ)

• May be simplified using skip language models to

p(SBJ,OBJ,V,ADJ) > p(SBJ,OBJ,ADJ)
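
One way to read the simplification above is that runs of identical role labels are collapsed into a single label. The sketch below makes that assumption explicit; the function name is hypothetical and the slides do not specify how the reduced sequence is actually derived.

```python
from itertools import groupby
from typing import List, Sequence


def collapse_roles(roles: Sequence[str]) -> List[str]:
    """Collapse each run of identical consecutive role labels to one label."""
    return [label for label, _ in groupby(roles)]


with_verb = collapse_roles(["SBJ", "SBJ", "OBJ", "OBJ", "OBJ", "OBJ", "V", "ADJ"])
without_verb = collapse_roles(["SBJ", "SBJ", "OBJ", "OBJ", "OBJ", "OBJ", "ADJ"])
```

The collapsed sequences are short enough that a sequence model over them can cover the whole clause and penalize the verbless variant.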


17

Another reality check

• One typical error of the current system

wir   haben  daher  nicht  für     diesen  bericht  stimmen
we    have   hence  not    for     this    report   voting
SUBJ  AUX    PART   PART   PP-OBJ  PP-OBJ  PP-OBJ   VINF

• Typical sentences have many particles floating around

– if interested in core sentence structure: ignore them
– if interested in all parts of the clause: include them

• Key lesson: feature engineering

– know your tag sets and morphological features
– be aware of what problem you want to address
– create a factor for this purpose


8/22/2006 JHUSWS 2006 1

Future Research

Back-off models: improving MT through smarter searching and better use of data

Chris Dyer, University of Maryland


Two Goals

• Smarter Search
– Mitigate sparse-data effects in multi-factored models
– Recover from search errors
– Enable well-motivated models for translating into morphologically complex languages

• Back-off models
– Take advantage of single-factored models when it makes sense to do so


Smarter Search: Motivation

• Morphological complexity poses problems for “white-space tokenized” statistical MT
– Beyond data sparseness: conventional models run into search problems for rare surface forms
• Lemmatizing results in considerable German performance gains

                              devtest-2006   test-2006
surface→surface               18.22          18.04
surface→surface, lemmatize    22.27          22.15
surface→lemma                 22.70          22.45


Smarter Search: Motivation

• Single factor models do not generalize. They cannot produce a target form unless seen in the training data.
• Basic generation models allow us to improve translation coverage with (inexpensive) monolingual resources

Translating English→German

                                Generation training data size   # distinct words produceable
Surface only                    n/a                             105,000 distinct words
Lemmas only                     n/a                             85,000 distinct lemmas
Lemmas + bitext Europarl        15 million words                117,000 distinct words
Lemmas + full Europarl          27 million words                122,000 distinct words
Lemmas + 1.2M EP + Wikipedia    113 million words               137,000 distinct words

Net result: 30% increase in forms produceable over a single-factor model
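
The coverage figures can be related to each other with a short calculation. The sketch below is illustrative only: `producible_forms` and `relative_increase` are hypothetical helpers, and the lexicon format (lemma mapped to a set of surface forms observed in monolingual text) is an assumption.

```python
from typing import Dict, Iterable, Set


def producible_forms(lemmas: Iterable[str], lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Surface forms reachable from the lemmas the translation model can output."""
    forms: Set[str] = set()
    for lemma in lemmas:
        forms |= lexicon.get(lemma, set())
    return forms


def relative_increase(before: int, after: int) -> float:
    """Relative growth from `before` to `after`."""
    return (after - before) / before


# Surface-only model versus lemmas + 1.2M EP + Wikipedia generation data.
gain = relative_increase(105_000, 137_000)
```

Here `gain` is about 0.305, which matches the roughly 30% increase stated on the slide.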


Morphological Analysis and Generation Model

4-step model
1. Translate surface to lemma
2. Generate morphology from lemma
3. Translate POS to morphology
4. Generate surface from lemma + morphology

n-gram LM, surface

n-gram LM, lemmata

n-gram LM, morphology


Initial results were disappointing…

• BLEU scores well below baseline (~11)
• Tuning took an entire weekend on a very small tuning set


The Problem

• Search errors
– Aggressive pruning
∗ Each step multiplies number of states in the search space over a single factored model
– Spans must overlap exactly


The Problem: an illustration

Translation options ‘the right approach’:

der richtige Ansatz

dem richtigen Ansatz

den richtigen Ansatz


The Solution

• Back off to shorter spans
– When a dead-end is reached, break up the source span into smaller spans and translate those
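
Backing off to shorter spans can be sketched as a recursive lookup. This is a simplified illustration, not the decoder's actual algorithm: the phrase table is assumed to be a plain dictionary from source word tuples to translation options, `translate_span` is a hypothetical name, and the naive recursion ignores scoring and can be exponential in span length.

```python
from typing import Dict, List, Optional, Sequence, Tuple

PhraseTable = Dict[Tuple[str, ...], List[str]]
Segment = Tuple[Tuple[str, ...], List[str]]


def translate_span(words: Sequence[str], table: PhraseTable) -> Optional[List[Segment]]:
    """Cover `words` with table entries, splitting the span only on a dead end.

    Returns a list of (source segment, translation options) pairs,
    or None if no segmentation is covered by the table.
    """
    key = tuple(words)
    if table.get(key):
        return [(key, table[key])]
    # Dead end for the full span: try split points, longest left part first.
    for split in range(len(words) - 1, 0, -1):
        left = translate_span(words[:split], table)
        if left is None:
            continue
        right = translate_span(words[split:], table)
        if right is not None:
            return left + right
    return None


table: PhraseTable = {
    ("the",): ["der", "die", "das", "dem", "den", "des"],
    ("right", "approach"): ["richtiger Ansatz", "richtigen Ansatzes"],
}
segments = translate_span(["the", "right", "approach"], table)
```

In this toy table the three-word span has no entry, so the lookup falls back to "the" plus "right approach", mirroring the illustration on the following slide.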


The Solution: an illustration

Translation options ‘the’:

der, die, das, dem, den, das, des

Translation options ‘right approach’:

richtiger Ansatz

Ansatz

richtigen Ansatzes


Back-off Models

• Lexicalized surface forms are common
– Because of lexicalization, obscure morphology or root forms often retained
∗ Ex. “be that as it may”
• Translations often approximate, unusual when analyzed in more abstract layers
– If you mistranslate common stock phrases because of rigid analysis and generation processes, fluency suffers


Back-off Models

• Solution
– Try to let a single translation step cover all factors
– Back off to multi-factored model


Back-off Models: Implementation

• “Primary” phrase table
– Standard form
– Contains all factors on target side
∗ Necessary for secondary factor LMs
– May be trained on single factor data with “best guesses” for secondary factors
– May be aggressively filtered, i.e., for >n occurrences, etc.


Back-off Models: Implementation

• Key idea: Back-off weight
– Feature that is associated with choosing a single factored path
– Tuned along with other feature weights
– Function of source phrase length?
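
In a log-linear model, such a back-off weight is simply one more term in the weighted feature sum. The sketch below uses invented feature names and values purely for illustration; none of them are taken from the toolkit, and the weights would in practice come from tuning.

```python
from typing import Dict


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical feature values for two ways of covering the same source span.
single_factored = {"tm_logprob": -2.0, "lm_logprob": -3.0, "single_factored_path": 1.0}
multi_factored = {"tm_logprob": -1.5, "lm_logprob": -3.2, "single_factored_path": 0.0}

# Illustrative weights only.
weights = {"tm_logprob": 1.0, "lm_logprob": 0.5, "single_factored_path": 0.5}

prefers_single = loglinear_score(single_factored, weights) > loglinear_score(
    multi_factored, weights
)
```

With a positive weight on the indicator feature the single-factored path wins in this example; setting that weight to zero lets the multi-factored path win, which is the trade-off that tuning would decide.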


Summary

• Increase performance of multi-factored models
– Recover from search errors
– Recover from data sparseness (make more efficient use of longer underlying phrases)
• Extend the benefits of multi-factor models to target languages where sparse-data and search errors are not generally an issue
– English


Translation with syntax and factors:
Handling global and local dependencies in SMT

Brooke Cowan
MIT CSAIL

August 17, 2006

Brooke Cowan, MIT CSAIL Syntax and factors in SMT August 17, 2006


1

Goals of statistical machine translation

• Linguistically-correct output

– learn correct syntax and morphology in target language
– e.g., noun-phrase agreement, subject-verb agreement, verbs and their arguments

• Meaning-preserving output

– learn mapping between source and target sentence elements
– e.g., identify the subject in the source and ensure it plays the proper role in the target
– can involve a significant amount of reordering


3

Linguistically-correct output

• E.g., in Spanish noun phrases, nouns, determiners, and adjectives are constrained to agree in gender and number

las   políticas   pesqueras   comunitarias
the   policies    fisheries   common
det   noun        adj         adj

FEMININE PLURAL

• Phrasal agreement phenomena are generally local in nature.


5

Meaning-preserving output: free word order

• E.g., when translating from German to English, we want to identify and place the subject, object, and phrasal modifiers in the output

i would like to thank the rapporteur for his report

ich möchte dem berichterstatter für seinen bericht danken

dem berichterstatter möchte ich für seinen bericht danken

für seinen bericht möchte ich dem berichterstatter danken

• Translation involving free-word-order languages or language pairs with very different basic word order can be quite challenging because these phenomena are generally global in nature.


6

A hybrid system

• A syntax-based system

– handle global phenomena in translation
∗ inter-phrasal reordering
∗ verb/argument structure
∗ some long-distance agreement phenomena (e.g., subject/verb agreement)

• A factored phrase-based system

– handle local phenomena in translation
∗ agreement and reorderings


7

Combining the two systems

• Use the syntax-based system to reorder the source-language input

• Feed the output of the syntax-based system into the phrase-based system

German input:

für seinen bericht möchte ich dem berichterstatter danken

Modified German input:

ich WOULD LIKE TO THANK dem berichterstatter für seinen bericht

English output:

i would like to thank the rapporteur for his report

The syntax-based system

• Discriminatively-trained, tree-to-tree translation system (Cowan, Collins, and Kucerova, EMNLP ’06)

• Fully implemented and tested on German-to-English Europarl task

• Model predicts an aligned extended projection (AEP) on the target side

– a syntactic structure encapsulating the argument structure of the main target-side verb, and

– alignment information between the modifiers on the source and target sides

What is an AEP?

German clause:

s
pp-mo [1]: appr zwischen, piat beiden, nn gesetzen
vvfin-hd: bestehen
adv-mo [2]: also
np-sb [3]: adja erhebliche, adja rechtliche, $, ",", adja praktische, kon und, adja wirtschaftliche, nn unterschiede

English AEP = Extended Projection (EP) of the main verb (Frank 2002) + alignment information:

EP: S → NP-A VP, VP → V (are) NP-A
SUBJECT: there
OBJECT: 3
MOD(1): post-object
MOD(2): pre-subject

Integration with Moses

• Factor-based systems handle local phenomena well

• Extensions to Moses

– externally-provided translation options
– constraints on reordering
– n-best lists of AEPs

Modified German input:

[ ich ] [ WOULD LIKE TO THANK ] [ dem berichterstatter ] [ für seinen bericht ]

Research questions

• Factor the translation problem into two parts

– syntax-based system to handle global reorderings and agreements
– factor-based system to handle local reordering and agreements

• Can this approach improve overall translation quality?

– past work in rule-based clause restructuring (e.g., Collins, Koehn, Kucerova, ACL ’05)

• What is the best way to combine these systems?

– hard constraints vs soft constraints
– voting/backoff framework

Part of Speech Information for Alignment

Alexandra Constantin

2006 CLSP Summer Workshop

Bilingual Dictionary

Haus – house, building, home, household

Lexical Translation Probability Distribution

Implicit Alignment

1 2 3 4
Das Haus ist klein.
1 2 3 4
The house is small.

Alignment Function a

1 2 3 4

Klein ist das Haus

The house is small

1 2 3 4

POS Motivation

POS information for infrequent words

Example

IBM Model 1 - Notations

e = target word

f = source word

t(e|f) = probability of translating foreign word f into English word e

f = (f_1, …, f_n) = foreign sentence

e = (e_1,…,e_m) = English sentence

p(e|f) = translation probability

a = alignment function

IBM Model 1
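
For reference, the standard textbook form of IBM Model 1, written in the notation introduced above. The normalization constant ε and the NULL word f_0 (so that a maps each English position to a foreign position in 0..n) belong to the usual formulation and are assumptions here, not taken from the slides.

```latex
p(\mathbf{e}, a \mid \mathbf{f}) = \frac{\epsilon}{(n+1)^{m}} \prod_{j=1}^{m} t\bigl(e_j \mid f_{a(j)}\bigr)
\qquad
p(\mathbf{e} \mid \mathbf{f}) = \sum_{a} p(\mathbf{e}, a \mid \mathbf{f})
  = \frac{\epsilon}{(n+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{n} t\bigl(e_j \mid f_i\bigr)
```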

EM Algorithm

1. Initialize model (typically with uniform distribution)

2. Apply the model to the data (expectation step)

3. Learn the model from the data (maximization step)

4. Iterate steps 2-3 until convergence
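
The four steps above can be sketched for IBM Model 1 as follows. This is a minimal illustration, not the workshop's implementation: the function name `train_ibm_model1` is hypothetical, the NULL word is omitted, and a fixed iteration count stands in for a convergence test.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(e|f) from a list of (foreign_words, english_words) pairs.

    Returns a dict mapping (e, f) to t(e|f) for all co-occurring word pairs.
    Simplifications (assumptions): no NULL word, fixed number of iterations.
    """
    english_vocab = {e for _, es in corpus for e in es}
    # Step 1: initialize t(e|f) uniformly over co-occurring pairs.
    t = {(e, f): 1.0 / len(english_vocab)
         for fs, es in corpus for e in es for f in fs}

    for _ in range(iterations):  # Step 4: iterate steps 2-3
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f being aligned
        # Step 2 (expectation): collect fractional alignment counts.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    p = t[(e, f)] / norm
                    count[(e, f)] += p
                    total[f] += p
        # Step 3 (maximization): re-estimate t(e|f) from the counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t
```

On a toy corpus such as `[("das Haus".split(), "the house".split()), ("das Buch".split(), "the book".split()), ("ein Buch".split(), "a book".split())]`, the probability mass for each foreign word shifts toward the English word it co-occurs with most consistently.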

Expectation Step

Expectation Step – p(e|f)

Expectation Step

Maximization Step

Adding POS Information

Experiments - AER

Compare generated alignments against manual alignments
Manual alignments: probable (P) and sure (S)
Automated alignments: A
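
A minimal sketch of the alignment error rate, assuming the commonly used definition AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|) in which every sure link also counts as probable. The function name and the representation of links as (source position, target position) pairs are illustrative choices, not taken from the slides.

```python
def alignment_error_rate(sure, probable, predicted):
    """Return AER in [0, 1] for sets of alignment links given as (i, j) pairs.

    Assumes at least one sure or predicted link, otherwise the
    denominator is zero.
    """
    sure = set(sure)
    predicted = set(predicted)
    # By convention, sure links are a subset of the probable links.
    probable = set(probable) | sure
    matched = len(predicted & sure) + len(predicted & probable)
    return 1.0 - matched / (len(predicted) + len(sure))
```

Multiplying the result by 100 gives figures on the percent scale used in the results table.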

Results

AER        10k   20k   40k   60k   80k   100k
Baseline   53.7  51.8  49.3  48.6  47.5  47.1
Only POS   76.0  75.4  75.5  75.1  75.3  75.1
+ POS      53.6  51.5  49.6  48.4  47.7  47.3

Future Work

Use alignments to train MT system and compare BLEU scores
Use POS information in more complicated alignment methods
Use other factors

JHU CLSP Summer Workshop 2006
Team Presentation

Experimental Results
for Confusion Network Decoding

Richard Zens, Nicola Bertoldi, Marcello Federico, Wade Shen

Zens, Bertoldi, Federico, Shen: Results for Confusion Net Decoding August 17, 2006

IWSLT Task

• Chinese–English, domain: phrase book entries

• corpus statistics:

                Chinese  English
sentences          40 K
running words     351 K    365 K
vocabulary         11 K     10 K

• confusion network statistics (489 sentences):

                       read speech  spontaneous speech
avg. length               17.2         17.4
avg. / max. depth       2.2 / 92     2.9 / 82
avg. number of paths     10^21        10^32

• no development data for confusion networks

Results for IWSLT

• phrase table provided by MIT/LL

• competitive baseline results

• results:

                      read speech  spontaneous speech
                      BLEU [%]     BLEU [%]
verbatim                    21.4
1-best from lattice   19.0         17.2
1-best from CN        19.0         17.2
full CN               19.3         17.8

• improvements are statistically significant (89% confidence)

Other Ambiguous Input: Punctuation

• Chinese input does not contain punctuation

• illustration:

hello world →

position   1           2         3           4
           hello 1.0   ε 0.9     world 1.0   ! 0.7
                       , 0.1                 . 0.2
                                             ε 0.1

• results for verbatim input:

punctuation input type   BLEU [%]
1-best                   20.8
confusion network        21.0

• competitive performance without tuning → room for improvement
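
The punctuation illustration above can be written down as a small data structure. The sketch below is only an illustration under stated assumptions: the list-of-columns representation and the helper names `num_paths` and `best_path` are hypothetical and do not reflect the Moses input format.

```python
import math

EPS = "ε"  # empty-word label, as in the illustration

# One column per position; each column lists (word, posterior) alternatives.
cn = [
    [("hello", 1.0)],
    [(EPS, 0.9), (",", 0.1)],
    [("world", 1.0)],
    [("!", 0.7), (".", 0.2), (EPS, 0.1)],
]


def num_paths(network):
    """Number of distinct paths: the product of the column depths."""
    return math.prod(len(column) for column in network)


def best_path(network):
    """Pick the most probable alternative per column; drop empty words."""
    words, prob = [], 1.0
    for column in network:
        word, p = max(column, key=lambda alt: alt[1])
        prob *= p
        if word != EPS:
            words.append(word)
    return words, prob
```

Because each column is chosen independently, the 1-best path is found in time linear in the network length, while the number of paths grows multiplicatively with the depth of each column.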

Truecasing

truecasing, i.e. restoring case information in lowercase text

• common approach:

– core MT system produces lowercase output
– truecasing is done as postprocessing step

• application of factored translation models

1. translate lowercase
2. generate truecase output (using a truecase LM)

• results:

             BLEU [%]
two-step     18.9
integrated   17.8

→ somewhat worse performance than dedicated tool

EPPS Task

• EPPS: European Parliament Plenary Sessions

• Spanish-English speech-to-speech translation task

• corpus statistics:

                Spanish  English
sentences          1.2 M
running words      31 M     30 M
vocabulary        140 K     94 K

• confusion network statistics:

                       dev         test
sentences              2 633       1 071
avg. length            10.6        23.6
avg. / max. depth      2.8 / 165   2.7 / 136
avg. number of paths   10^38       10^75

Results for EPPS Task

                 dev              test
                 ASR-WER  BLEU    ASR-WER  BLEU
1-best lattice   19.3     42.2    22.4     37.6
1-best CN        21.7     40.3    23.3     36.7
full CN           7.0     42.4     8.5     38.9

• best result for test in previous work: 37.2 BLEU

• in comparison with previous work on this task, we have

1. a stronger baseline,
2. larger improvements and
3. much more efficient decoding (4x vs. 25x)

note: all figures in percent

Exploration of Confusion Networks

[Figure: average number of paths per sentence (log scale, 0.1 to 10^10) plotted against path length (0 to 14), with three curves: CN total, CN explored, 1-best explored]

JHU CLSP Summer Workshop 2006
Proposal for Follow-up Research

Exploiting Ambiguous Input
in Statistical Machine Translation

Richard Zens

Human Language Technology and Pattern Recognition
Lehrstuhl für Informatik 6

Computer Science Department
RWTH Aachen University, Germany

Zens: Exploiting Ambiguous Input in SMT August 17, 2006

Motivation

• MT often used in a pipeline, i.e. the input to the MT system is the output of another imperfect NLP system, e.g.

– spoken language translation: ASR
– segmentation: Chinese words, Arabic tokens
– named entity recognition / translation

• traditional approach: ignore problem, i.e. translate 1-best

• result of previous work: improvements if ambiguity is taken into account

Previous Approaches

1. confusion network decoding

• advantages: efficiency, reordering is straightforward
• problem: representing alternative segmentations

2. lattice decoding

• advantage: representing alternative segmentations
• problem: reordering

goal:
⇒ exploit advantages of both approaches,
⇒ but avoid weaknesses

Generalized Confusion Networks

• confusion networks:

[Figure: confusion network over nodes 0 1 2 3 4]

• generalization:

[Figure: generalized confusion network over the same nodes 0 1 2 3 4]

– add edges that cover multiple positions
→ representation of alternative segmentations

– do not add nodes
→ retain efficiency, straightforward reordering

Improved Reordering for Lattice Input

• confusion network is approximation of lattice
→ valuable information might be lost
→ potential improvement when using lattices

• so far:

– only very local reordering on lattice:
∗ skip 1 phrase [Zens & Bender+ 05]
∗ switch positions of 2 or 3 phrases [Kumar & Byrne 05]

• idea:

– generalize reordering scheme used for CN to lattice input
→ long range reordering

Goals

• improve robustness to imperfect input

• investigate novel approaches:

– generalized confusion networks
– reordering strategies for lattice input

• perform a systematic comparison in terms of MT quality and computational requirements

• scalability → apply to tasks of different size:
small: IWSLT, medium: EPPS/TC-Star, large: NIST/GALE

Targeted Applications

• spoken language translation:

– output of ASR system
– punctuation insertion / sentence boundary detection
– disfluency detection

• named entity recognition / translation

• Chinese word segmentation

• Arabic tokenization

References

[Kumar & Byrne 05] S. Kumar, W. Byrne: Local phrase reordering models for statistical machine translation. Proc. HLT/EMNLP, pp. 161–168, Vancouver, Canada, October 2005.

[Sadat & Habash 06] F. Sadat, N. Habash: Combination of Preprocessing Schemes for Statistical MT. Proc. COLING/ACL, pp. 1–8, Sydney, Australia, July 2006.

[Xu & Matusov+ 05] J. Xu, E. Matusov, R. Zens, H. Ney: Integrated Chinese Word Segmentation in Statistical Machine Translation. Proc. Int. Workshop on Spoken Language Translation (IWSLT), pp. 141–147, Pittsburgh, PA, October 2005.

[Zens & Bender+ 05] R. Zens, O. Bender, S. Hasan, S. Khadivi, E. Matusov, J. Xu, Y. Zhang, H. Ney: The RWTH Phrase-based Statistical Machine Translation System. Proc. Int. Workshop on Spoken Language Translation (IWSLT), pp. 155–162, Pittsburgh, PA, October 2005.

[Zens & Och+ 02] R. Zens, F.J. Och, H. Ney: Phrase-Based Statistical Machine Translation. Proc. M. Jarke, J. Koehler, G. Lakemeyer, editors, 25th German Conf. on Artificial Intelligence (KI2002), Vol. 2479 of Lecture Notes in Artificial Intelligence (LNAI), pp. 18–32, Aachen, Germany, September 2002. Springer Verlag.
