Bridging the Gap: Machine Translation for Lesser Resourced Languages Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

Page 1: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

Bridging the Gap: Machine Translation for Lesser Resourced Languages

Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

Page 2: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

2

Inupiaq: 100s of Speakers

Quechua: 6 Million Speakers

Mapudungun: 900,000 Speakers

Katrina: 100s of Speakers

Page 3: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

3

Machine Translation (MT)

Source Language

Target Language

Page 4: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

4

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Page 5: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

5

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Morphological Analysis + Syntactic Parsing

Text Generation

Page 6: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

6

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Sentence Planning

Text Generation

Page 7: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

7

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Text Generation

+ High quality

- Expertise intensive development cycle

Page 8: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

8

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Text Generation

+ Short development time

- Requires large bilingual corpus

Page 9: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

9

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Text Generation

Our Approach

Page 10: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

10

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

Transfer: Rule Based MT

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Text Generation

+ High quality

- Expertise intensive development cycle

Page 11: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

11

Machine Translation (MT)

Source Language

Target Language

Interlingua

Morphological Analysis + Syntactic Parsing

Semantic Analysis

Text Generation

Automate the development of deep-analysis MT

+ High quality

- Expertise intensive development cycle

Page 12: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

12

Our Position

Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems

Page 13: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

13

Sub-Problems

1. Morphology Induction

2. Syntax Refinement

Page 14: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

14

Morphology Induction

1. Linguistic Structure

2. Bilingual Informants

Page 16: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

16

Paradigms Organize Morphology

Mapudungun

Loc: pa, pu, Ø
Asp: tu, ka, Ø
Hab: ke, Ø
Mode: pe, Ø
Report: (ü)rke, Ø
Pol / Mood: la, ki, nu, Ø
Tense: a, fu, afu, Ø
Obj Agr: fi, Ø
Subj Agr / Mood: (ü)n, li, chi, yu

Page 17: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

17

Paradigm Discovery in 3 Steps

1. Search out partial paradigms in a network of candidates

2. Cluster overlapping partial paradigms

3. Filter the clusters, keeping the largest clusters most likely to model true paradigms

Each node lists a candidate suffix set, the number of stems that occur with every suffix in the set, and sample stems:

e.er.erá.ido.ieron.ió 28: deb, escog, ofrec, reconoc, vend, ...

e.ido.ieron.ir.irá.ió 28: asist, dirig, exig, ocurr, sufr, ...

e.erá.ido.ieron.ió 28: deb, escog, ...

e.er.ido.ieron.ió 46: deb, parec, recog, ...

e.ido.ieron.irá.ió 28: asist, dirig, ...

e.ido.ieron.ir.ió 39: asist, bat, sal, ...

e.er.erá.ieron.ió 32: deb, padec, romp, ...

e.ido.ieron.ió 86: asist, deb, hund, ...

e.erá.ieron.ió 32: deb, padec, ...

er.ido.ieron.ió 58: ascend, ejerc, recog, ...

ido.ieron.ir.ió 44: interrump, sal, ...

azar.e.ido.ieron.ir.ió 1: sal

A portion of a Spanish paradigm candidate network
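The search step can be illustrated with a small sketch. It is not the authors' algorithm: it exhaustively enumerates suffix sets over a hand-picked word list, whereas the talk describes a search through a network of candidates. The function names, the word list, and the `min_stems` threshold are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def suffix_to_stems(words):
    """Map every candidate suffix to the set of candidate stems it follows."""
    table = defaultdict(set)
    for word in words:
        # Try every split that leaves a non-empty stem and a non-empty suffix.
        for cut in range(1, len(word)):
            table[word[cut:]].add(word[:cut])
    return table


def candidate_paradigms(words, size, min_stems=2):
    """Return suffix sets of the given size shared by at least min_stems stems."""
    table = suffix_to_stems(words)
    found = {}
    for suffixes in combinations(sorted(table), size):
        # Stems that occur with every suffix in this candidate set.
        stems = set.intersection(*(table[s] for s in suffixes))
        if len(stems) >= min_stems:
            found[".".join(suffixes)] = sorted(stems)
    return found


# Hypothetical toy vocabulary: forms of "deber" and "asistir".
WORDS = [
    "debe", "deber", "debido", "debieron", "debió",
    "asiste", "asistir", "asistido", "asistieron", "asistió",
]

if __name__ == "__main__":
    for suffix_set, stems in candidate_paradigms(WORDS, size=4).items():
        print(f"{suffix_set} {len(stems)}: {', '.join(stems)}")
```

On this toy vocabulary the only four-suffix candidate with two stems is e.ido.ieron.ió, the same suffix set that appears as a node in the network above. Exhaustive enumeration grows combinatorially, which is one reason a real system would need a guided search followed by the clustering and filtering steps.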

Page 19: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

19

Morpho Challenge 2007

Unsupervised Morphology Induction Competition

English
• 3rd Place Overall
• Bested the Strong Baseline Morfessor (Creutz, 2006)

German
• 1st Place when Combined with Morfessor

No Mapudungun yet

Agglutinative sequences of suffixes coming soon

Page 20: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

20

Our Machine Translation Architecture

INPUT TEXT

Page 21: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

21

Our Machine Translation Architecture

INPUT TEXT

Morphology Analysis

Morphology Analysis Lexicon

Page 22: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

22

Our Machine Translation Architecture

INPUT TEXT

Morphology Analysis (Morphology Analysis Lexicon)

Machine Translation System (Grammar & Lexicon)

Page 23: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

23

Our Machine Translation Architecture

INPUT TEXT

Morphology Analysis (Morphology Analysis Lexicon)

Machine Translation System (Grammar & Lexicon)

Morphology Generation (Morphology Generation Lexicon)

Page 24: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

24

Our Machine Translation Architecture

INPUT TEXT

Morphology Analysis (Morphology Analysis Lexicon)

Machine Translation System (Grammar & Lexicon)

Morphology Generation (Morphology Generation Lexicon)

OUTPUT TEXT

Page 27: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

27

Sub-Problems

1. Morphology Induction

2. Syntax Refinement

Page 28: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

28

Syntax Refinement

1. Linguistic Structure

2. Bilingual Informants

Page 30: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

30

Linguistic Structure: Syntax

Mapudungun

pelafiñ Maria

Spanish

No vi a María

English

I didn’t see Maria

Page 31: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

31

Linguistic Structure: Syntax

Mapudungun

pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria

Spanish

No vi a María
No vi a María
neg see.1.subj.past.indicative acc Maria

English

I didn’t see Maria
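The segmentation in the Mapudungun gloss can be reproduced with a minimal sketch. The lexicon below holds only the four morphemes from the slide, and the function names are hypothetical; this is not the project's morphological analyzer.

```python
# Toy lexicon taken from the gloss on the slide; a real analyzer needs far more.
ROOTS = {"pe": "see"}
SUFFIXES = {"la": "neg", "fi": "3.obj", "ñ": "1.subj.indicative"}


def _split_suffixes(tail):
    """Recursively split tail into known suffixes, backtracking on failure."""
    if not tail:
        return [], []
    for form, gloss in SUFFIXES.items():
        if tail.startswith(form):
            rest = _split_suffixes(tail[len(form):])
            if rest is not None:
                return [form] + rest[0], [gloss] + rest[1]
    return None


def segment(word):
    """Return (morphemes, glosses) for a root + suffixes word, or None."""
    for root, root_gloss in ROOTS.items():
        if word.startswith(root):
            rest = _split_suffixes(word[len(root):])
            if rest is not None:
                return [root] + rest[0], [root_gloss] + rest[1]
    return None


if __name__ == "__main__":
    print(segment("pelafiñ"))
```

Calling `segment("pelafiñ")` yields the morphemes pe, la, fi, ñ with the glosses shown on the slide; a word containing an unknown suffix returns `None`.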

Page 32: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

32

pe-la-fi-ñ Maria

Tree so far: V (pe)

Page 33: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

33

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la)

VSuff la: negation = +

Page 34: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

34

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG

VSuffG: pass all features up

Page 35: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

35

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi)

VSuff fi: object person = 3

Page 36: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

36

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG

VSuffG: pass all features up from both children

Page 37: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

37

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ)

VSuff ñ: person = 1, number = sg, mood = ind

Page 38: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

38

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG

VSuffG: pass all features up from both children

Page 39: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

39

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, V

V: check that
1) negation = +
2) tense is undefined

Page 40: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

40

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, V, NP, N, N (Maria)

N: person = 3, number = sg, human = +

Page 41: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

41

pe-la-fi-ñ Maria

Tree so far: V (pe), VSuff (la), VSuffG, VSuff (fi), VSuffG, VSuff (ñ), VSuffG, V, NP, N, N (Maria), VP, S

VP: check that NP is human = +; pass features up from V
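The bottom-up feature passing in this parse can be sketched as dictionary merging. The feature values are copied from the slide annotations; the function names and the use of plain dictionaries are assumptions, not the formalism the system actually uses.

```python
def pass_up(*children):
    """Union the children's features into the parent; raise on conflicting values."""
    parent = {}
    for features in children:
        for name, value in features.items():
            if parent.get(name, value) != value:
                raise ValueError(f"conflicting values for {name!r}")
            parent[name] = value
    return parent


# Feature sets copied from the slide annotations.
LA = {"negation": "+"}                                 # VSuff -la
FI = {"object person": 3}                              # VSuff -fi
NY = {"person": 1, "number": "sg", "mood": "ind"}      # VSuff -ñ
MARIA = {"person": 3, "number": "sg", "human": "+"}    # N Maria


def build_verb(suffix_group):
    """V node: succeed only if negation = + and tense is undefined."""
    if suffix_group.get("negation") != "+" or "tense" in suffix_group:
        return None
    return pass_up({"lemma": "pe"}, suffix_group)


def build_vp(verb, noun_phrase):
    """VP node: require a human NP, then pass the verb's features up."""
    if verb is None or noun_phrase.get("human") != "+":
        return None
    return pass_up(verb)


suffix_group = pass_up(pass_up(pass_up(LA), FI), NY)   # three nested VSuffG nodes
verb = build_verb(suffix_group)
vp = build_vp(verb, MARIA)

if __name__ == "__main__":
    print(vp)
```

The two checks in `build_verb` mirror this particular V rule only; other rules in a grammar would test other feature combinations.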

Page 42: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

42

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP

Page 43: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

43

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP

Pass all features to Spanish side

Page 44: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

44

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP

Pass all features down

Page 45: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

45

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP

Pass object features down

Page 46: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

46

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP

Accusative marker on objects is introduced because human = +

Page 47: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

47

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP

VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
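One way to read the VP::VP rule is as a function from the Mapudungun constituents to the Spanish ones. The sketch below is an interpretation rather than the system's rule engine: it treats `=c` as a check that the feature is already set, represents constituents as plain dictionaries, and uses a hypothetical function name.

```python
def transfer_vp(x1_vbar, x2_np):
    """Return the Spanish-side sequence [VBar, "a", NP], or None if a check fails."""
    # ((X2 type) = (*NOT* personal)): the object must not be a personal pronoun.
    if x2_np.get("type") == "personal":
        return None
    # ((X2 human) =c +): read here as "human must already be set to +".
    if x2_np.get("human") != "+":
        return None
    y3_np = dict(x2_np)        # (Y3 = (Y0 object)): the object is carried across
    y1_vbar = dict(x1_vbar)    # (Y1 = Y0), with Y0 = X0 = X1
    # Copy the object's agreement features onto the verb's object marker.
    y1_vbar["objmarker"] = {
        name: y3_np[name] for name in ("person", "number", "gender") if name in y3_np
    }
    return [y1_vbar, "a", y3_np]


if __name__ == "__main__":
    maria = {"lemma": "Maria", "person": 3, "number": "sg", "human": "+"}
    print(transfer_vp({"lemma": "pe"}, maria))
```

For a human object such as Maria the function inserts the marker "a" between the verb and the noun phrase; for a non-human object it returns `None`, so a different rule would have to apply.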

Page 48: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

48

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP, V, “no”

Pass person, number, and mood features to Spanish Verb

Assign tense = past

Page 49: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

49

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP, V, “no”

“no”: introduced because negation = +

Page 50: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

50

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP, V, “no”, ver

Page 51: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

51

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP, V, “no”, ver

ver becomes vi: person = 1, number = sg, mood = indicative, tense = past
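The last generation step, choosing the surface form vi for ver and adding no for negation, can be sketched as a lexicon lookup. The two-entry lexicon, the feature dictionary, and the function names are assumptions for illustration, and the sketch always emits "a", which presumes a human object as in this example.

```python
# Hypothetical fragment of a Spanish generation lexicon (preterite of "ver").
GENERATION_LEXICON = {
    ("ver", 1, "sg", "indicative", "past"): "vi",
    ("ver", 3, "sg", "indicative", "past"): "vio",
}

# Features that reach the Spanish verb in the example on the slides.
VERB_FEATURES = {
    "person": 1, "number": "sg", "mood": "indicative",
    "tense": "past", "negation": "+",
}


def inflect(lemma, features):
    """Look up the inflected form for a lemma and its agreement features."""
    key = (lemma, features["person"], features["number"],
           features["mood"], features["tense"])
    return GENERATION_LEXICON[key]


def realize(lemma, features, object_form):
    """Linearize: optional "no", the inflected verb, the marker "a", the object."""
    words = ["no"] if features.get("negation") == "+" else []
    words += [inflect(lemma, features), "a", object_form]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    print(realize("ver", VERB_FEATURES, "María"))
```

With the features from the slide, `realize("ver", VERB_FEATURES, "María")` produces the sentence No vi a María.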

Page 52: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

52

Transfer to Spanish: Top-Down

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree so far: S, VP, V, “a”, NP, V, “no”, vi, N, N (María)

Pass features over to Spanish side

Page 53: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

53

I didn’t see Maria

Mapudungun tree: S, VP, V (pe), VSuffG, VSuff (la, fi, ñ), NP, N (Maria)

Spanish tree: S, VP, V, “a”, NP, V, “no”, vi, N, N (María)

Page 54: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

54

Syntax Refinement

1. Linguistic Structure

2. Bilingual Informants

Page 55: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

55

Syntax Refinement Architecture

INPUT TEXT

Morphology Analysis (Morphology Analysis Lexicon)

Run-Time MT System (Grammar & Lexicon)

Morphology Generation (Morphology Generation Lexicon)

OUTPUT TEXT

Page 56: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

56

Syntax Refinement Architecture

INPUT TEXT

Morphology Analysis

Run-Time MT System (Grammar & Lexicon)

Morphology Generation

OUTPUT TEXT

Online Translation Correction Tool

Rule Refinement

Page 57: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

57

Syntax Refinement Architecture

INPUT TEXT

Morphology Analysis

Run-Time MT System (Grammar & Lexicon)

Online Translation Correction Tool

Rule Refinement

Page 58: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

58

Syntax Refinement Architecture

INPUT TEXT

Morphology Analysis

Run-Time MT System (Grammar & Lexicon)

Morphology Generation

OUTPUT TEXT

Online Translation Correction Tool

Rule Refinement

Page 59: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

59

Children played a game

Page 60: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

60

Page 61: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

61

Page 62: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

62

The children played a game

Page 63: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

63

Refining the Grammar

Tree nodes: S, PolP, VP, NP, Det (un), N (niños), V (jugaron), N (juego)

Page 64: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

64

Refining the Grammar

Tree nodes: S, PolP, VP, NP, Det (un), N (niños), V (jugaron), N (juego)

Added word: los
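Before a grammar rule can be refined, the informant's correction has to be located in the sentence. A minimal sketch of that first step uses a word-level diff from Python's standard `difflib`; the function name is hypothetical, and the rule refinement itself is not modeled here.

```python
from difflib import SequenceMatcher


def correction_edits(mt_output, corrected):
    """List the word-level edits that turn the MT output into the corrected sentence."""
    source, target = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the operation, its position in the MT output,
            # the words removed, and the words supplied by the informant.
            edits.append((tag, i1, source[i1:i2], target[j1:j2]))
    return edits


if __name__ == "__main__":
    print(correction_edits("niños jugaron un juego", "los niños jugaron un juego"))
```

For the example sentence the only edit is the insertion of los at position 0, which points at the noun phrase rule that produced niños without a determiner.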

Page 66: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

66

Syntax Refinement Summary

• Increases translation quality on unseen data
  – English-Spanish experiments (Font Llitjós et al., 2007, MT Summit)

• Generalizes to a Mapudungun-Spanish machine translation system

Page 67: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

67

Overall Summary

Linguistic Structure and Bilingual Informants help automate the development of deep-analysis machine translation systems: Morphology Induction and Syntax Refinement

Page 68: Bridging the Gap: Machine Translation for             Lesser Resourced Languages

68

Thank You!