67
Machine Translation: Challenges and Approaches

Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

  • Upload
    lyhuong

  • View
    221

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MachineTranslation:ChallengesandApproaches

Page 2: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Announcements• Finalexam,Dec.21st,1;10-4PM

• DanJurafsky,StanfordUniv.,"DoesThisVehicleBelongtoYou?"ProcessingtheLanguageofPolicingforImprovingPolice-CommunityRelaPons,Dec.5th,5pmDavisAuditorium

• RupalPatel,NorthwesternUniv.,Speechrecordings–alifealteringformofbiologicaldonaPon,Dec.4th,11:30AM,DavisAuditorium

Page 3: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MultilingualUsers•  ContentlanguagesforwebsitesPercentageofInternetusersbylanguage

http://en.wikipedia.org/wiki/Global_Internet_usage

Slide from Radev

Page 4: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Afrikaans

Bulgarian

Greek German Igno Kurdish Malayalm

Polish sindhi Tamil

Albanian Catalan English GujaraP Indonesian

Kyrgyz Maltese Portuguese

Sinhala Telugu

Amharic Cebuano Esperanto

HaiPanCreole

Irish Lao Maori Punjabi Slovak Thai

Arabic Chichewa

Estonian Hausa Italian LaPn Marathi Romanian

Sloveian Turkish

Armenian

Chinese Filipino Hawaiian Japanese Latvian Mongolian

Russian Somali Ukranian

Azerbaijani

Corsican Finnish Hebrew Javanese Lithuanian

Myanmar

Samoan Spanish Urdu

Basque CroaPan French Hindi Kannada Luxembourgish

Nepala ScotsGaelic

Sundanese

Uzbek

Belarusian

Czech Frisian Hmong Kazakh Macedonian

Norwegian

Serbian Swahili Veitnamese

Bengali Danish Galician Hungarian

Khmer Malagasy Pashto Sesotho Swedish welsh

Bosnian Dutch Georgian Icelandic Korean Malay Persian Shona Tajik Xhosa

Yiddish Yoruba Zulu

Page 5: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

ThankyouforyouraaenPon!QuesPons?

Page 6: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

• Romancelanguageshandledwell

• Similarlanguagepairshandledwell(e.g.,Spanish,Portuguese)

• Formalgenreshandledbeaer

SPllmanyproblems!

Page 7: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today• MulPlingualChallengesforMT

• MTApproaches•  StaPsPcal• Neuralnet(Thursday)

• MTEvaluaPon

Page 8: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today• MulPlingualChallengesforMT

• MTApproaches•  StaPsPcal• Neuralnet

• MTEvaluaPon

Page 9: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MultilingualChallenges• OrthographicVariaPons

• Ambiguousspelling  • كتب االوالد اشعارا كَتَبَ األوْالدُ اشعَاراً

•  Ambiguouswordboundaries• 

• LexicalAmbiguity• Bankèبنك(financial)vs. ضفة(river)•  Eatèessen(human)vs.fressen(animal)

Slide from Nizar Habash

Page 10: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MultilingualChallengesMorphologicalVariations

• AffixaPonvs.Root+Paaern

write è written كتب è ب وكتمkill è killed قتل è ل وقتمdo è done فعل è ل وفعم

conj

noun

plural article

•  Tokenization And the cars è and the cars

ات سيارالو è w Al SyArAt Et les voitures è et le voitures

Slide from Nizar Habash

Page 11: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

هنا لستI-am-not here

am

I here

I am not here

not

ت لس

هنا

Translation Divergences conflation

Je ne suis pas ici I not am not here

suis

Je ici ne pas

Slide from Nizar Habash

Page 12: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

TranslationDivergencesEnglish John swam across the river quickly Spanish Juan cruzó rapidamente el río nadando

Gloss: John crossed fast the river swimming Arabic سباحةالنهرعبورجوناسرع

Gloss: sped john crossing the-river swimming Chinese 约翰 快速 地 游 过 这 条

Gloss: John quickly (DE) swam cross the (Quantifier) river

Russian Джон быстро переплыл реку Gloss: John quickly cross-swam river

Slide from Nizar Habash

Page 13: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

LanguageDifferences-vocabulary

[Example from Jurafsky and Martin]

Page 14: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

LanguageDifferences-Syntax• Wordorder

•  SVO:English,Mandarin•  VSO:Irish,ClassicalArabic•  SOV:Hindi,Japanese

• Wordorderinphrases(Fr.)•  lamaisonbleue,thebluehouse

• Wordorderinsentences(Jap.)•  Iliketodrinkcoffee•  watashiwakohiionomunogasukidesu•  I-subjcoffee-objdrink-dat-rhemelike

• PreposiPons(Jap.)•  toMariko,Mariko-ni Slide adapted from Radevc

Page 15: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today• MulPlingualChallengesforMT

• MTApproaches•  StaPsPcal• Neuralnet

• MTEvaluaPon

Page 16: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesMTPyramid

S

(Source)

T (Target)

I (Interlingua)

syntax

semantics

phrases phrases

syntax

semantics

Page 17: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

String-to-StringTranslation

S T

I

syntax

semantics

phrases phrases

syntax

semantics

Slide from Radev

Page 18: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesGistingExample

Sobre la base de dichas experiencias se estableció en 1988 una metodología.

Envelope her basis out speak experiences them settle at 1988 one methodology.

On the basis of these experiences, a methodology was arrived at in 1988.

Page 19: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Phrase-BasedTranslation

S T

I

syntax

semantics

phrases phrases

syntax

semantics

Slide from Radev

Page 20: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Tree-to-TreeTranslation

S T

I

syntax

semantics

phrases phrases

syntax

semantics

Slide from Radev

Page 21: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesTransferExample

• TransferLexicon• MapSLstructuretoTLstructure

à

poner

X mantequilla en

Y

:obj :mod :subj

:obj

butter

X Y

:subj :obj

X puso mantequilla en Y X buttered Y Slide from Nizar Habash

Page 22: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Tree-to-StringTranslation

S T

I

syntax

semantics

phrases phrases

syntax

semantics

Slide from Radev

Page 23: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesMTPyramid

S

(Source)

T (Target)

I (Interlingua)

syntax

semantics

phrases phrases

syntax

semantics

Page 24: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesInterlinguaExample:LexicalConceptualStructure

(Dorr, 1993)

Page 25: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesMTPyramid

S

(Source)

T (Target)

I (Interlingua)

syntax

semantics

phrases phrases

syntax

semantics

Interlingual Lexicons

Transfer Lexicons Transfer Lexicons

Dictionaries/Parallel Corpora

Page 26: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today• MulPlingualChallengesforMT

• MTApproaches•  StaPsPcal• Neuralnet

• MTEvaluaPon

Page 27: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

TranslationasDecoding• “OnenaturallywondersiftheproblemoftranslaPoncouldconceivablybetreatedasaproblemincryptography.WhenIlookatanarPcleinRussian,Isay:'ThisisreallywriaeninEnglish,butithasbeencodedinsomestrangesymbols.Iwillnowproceedtodecode.'“• WarrenWeaver,“TranslaPon(1955)”

Slide from Radev

Page 28: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

http://www.ancientegypt.co.uk/writing/rosetta.html

Carved in 196 BC in Egypt Deciphered by Champollion in 1822 Mixture of Egyptian (hieroglyphs and Demotic) and Greek

TheFirstparallelcorpus:TheRosettaStone

Slide from Radev

Page 29: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Europarl:AParallelCorpusforStatisticalMachineTranslation• ProceedingsoftheEuropeanParliament• 21Europeanlanguages

• Romanic(French,Italian,Spanish,Portuguese,Romanian),Germanic(English,Dutch,German,Danish,Swedish),Slavik(Bulgarian,Czech,Polish,Slovak,Slovene),Finni-Ugric(Finnish,Hungarian,Estonian),BalPc(Latvian,Lithuanian),andGreek

• 60millionwords/language• Mustbealignedfirst

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf

Page 30: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf

Page 31: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix
Page 32: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTNoisyChannelModel

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf

Page 33: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTTranslate from French: “une fleur rouge”?

Slide from Radev

p(e) p(f|e) p(e)*p(f|e)

1. a flower red 2. red flower a 3. flower red a 4. a red dog 5. dog cat mouse 6. a red flower

Page 34: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix
Page 35: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTTranslate from French: “une fleur rouge”?

Slide from Radev

p(e) p(f|e) p(e)*p(f|e)

1. a flower red Low 2. red flower a Low 3. flower red a Low 4. a red dog High 5. dog cat mouse Low 6. a red flower High

Page 36: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix
Page 37: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTTranslate from French: “une fleur rouge”?

Slide from Radev

p(e) p(f|e) p(e)*p(f|e)

1. a flower red Low High 2. red flower a Low High 3. flower red a Low High 4. a red dog High Low 5. dog cat mouse Low Low 6. a red flower High High

Page 38: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTTranslate from French: “une fleur rouge”?

Slide from Radev

p(e) p(f|e) p(e)*p(f|e)

1. a flower red Low High Low 2. red flower a Low High Low 3. flower red a Low High Low 4. a red dog High Low Low 5. dog cat mouse Low Low Low 6. a red flower High High High

Page 39: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTAutomaticWordAlignment

•  GIZA++•  AstaPsPcalmachinetranslaPontoolkitusedtotrainwordalignments.•  UsesExpectaPon-MaximizaPonwithvariousconstraintstobootstrapalignments

Slide based on Kevin Knight’s http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Mary

did

not

slap

the

green

witch

Maria no dio una bofetada a la bruja verde

Slide from Nizar Habash

Page 40: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix
Page 41: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StatisticalMTIBMModel(Word-basedModel)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf

Page 42: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

IBM’sEMtrainedmodels(1-5)•  WordtranslaPon•  Localalignment•  FerPliPes•  Class-basedalignment•  Re-orderingAllareseparatemodelstotrain!Model1: ∏

=+==

m

jajm jefp

nceafpeapeafp

1

)|()1(

),|(*)|()|,(

Page 43: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Phrase-BasedStatisticalMT

•  Foreign input segmented in to phrases –  “phrase” is any sequence of words

•  Each phrase is probabilistically translated into English –  P(to the conference | zur Konferenz) –  P(into the meeting | zur Konferenz)

•  Phrases are probabilistically re-ordered See [Koehn et al, 2003] for an intro. This was state-of-the-art before neural MT

Morgen fliege ich nach Kanada zur Konferenz

Tomorrow I will fly to the conference In Canada

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 44: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

WordAlignmentInducedPhrases

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 45: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

WordAlignmentInducedPhrases

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 46: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

WordAlignmentInducedPhrases

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the) (bruja verde, green witch)

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 47: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the) (bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …

Word Alignment Induced Phrases Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 48: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Mary did not slap the green witch

Maria no dió una bofetada a la bruja verde

(Maria, Mary) (no, did not) (slap, dió una bofetada) (la, the) (bruja, witch) (verde, green) (a la, the) (dió una bofetada a, slap the) (Maria no, Mary did not) (no dió una bofetada, did not slap), (dió una bofetada a la, slap the) (bruja verde, green witch) (Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) … (Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)

WordAlignmentInducedPhrasesSlide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 49: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AdvantagesofPhrase-BasedSMT

• Many-to-manymappingscanhandlenon-composiPonalphrases

• LocalcontextisveryusefulfordisambiguaPng• “Interestrate” à…• “Interestin”à…

• Themoredata,thelongerthelearnedphrases•  SomePmeswholesentences

Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt

Page 50: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

StringtoTreeTranslation

(Yamada and Knight 2001)

He adores listening to music

He music to listening adores

He/ha music to listening/no ga adores/desu

Page 51: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Clauserestructuring(Collinsetal.)•  IchwerdeIhnendenReportaushaendigen…damitSiedeneventuelluebernehmentkoennen.

•  Iwillpass_onto_youthereport,so_thatyoucanadoptthatperhaps

•  verbiniPal:thatperhapsadoptcan->adoptthatperhapscan

•  verbsecond:sothatyouadopt…can->sothatyoucanadopt

•  movesubject:sothatcanyouadopt->sothatyoucanadopt

•  parPcles:weacceptthepresidency*ParPcle*->weacceptthepresidency(inGerman,split-prefixphrasalverbsareverycommon,e.g.,“anrufen”->“rufensiebiaenocheinmalan”–callrightbackplease)

Slide from Radev

Page 52: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

SynchronousGrammars• Generateparsetreesinparallelintwolanguagesusingdifferentrules

• E.g.,• NP->ADJN(inEnglish)• NP->NADJ(inSpanish)

• ITG(InversionTransducPonGrammar)[Wu1995]• Don’tallowallpermutaPonsinderivaPons• Only<>and[]areallowed

Slide from Radev

Page 53: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTApproachesPracticalConsiderations

• ResourcesAvailability•  ParsersandGenerators

•  Input/Outputcompatability

•  TranslaPonLexicons•  Word-basedvs.Transfer/Interlingua

•  ParallelCorpora•  Domainofinterest•  Biggerisbeaer

•  TimeAvailability•  StaPsPcaltraining,resourcebuilding

Slide from Nizar Habash

Page 54: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today• MulPlingualChallengesforMT

• MTApproaches•  StaPsPcal• Neuralnet(Thursday)

• MTEvaluaPon

Page 55: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MTEvaluation• Moreartthanscience• WiderangeofMetrics/Techniques

•  interface,…,scalability,…,faithfulness,...space/Pmecomplexity,…etc.

• AutomaPcvs.Human-based• DumbMachinesvs.SlowHumans

Slide from Nizar Habash

Page 56: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

5 contents of original sentence conveyed (might need minor corrections)

4 contents of original sentence conveyed BUT errors in word order

3 contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.

2 contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers

1 contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses

Human-based Evaluation Example Accuracy Criteria

Slide from Nizar Habash

Page 57: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

5 clear meaning, good grammar, terminology and sentence structure

4 clear meaning BUT bad grammar, bad terminology or bad sentence structure

3 meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure

2 meaning unclear BUT inferable

1 meaning absolutely unclear

Human-based Evaluation Example Fluency Criteria

Slide from Nizar Habash

Page 58: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

Today:Crowdsourcing• AmazonMechanicalTurkorCrowdFlower

• CreateaHITforeachsentence

• GetmulPpleworkerstorate

• Pay.01to.10perhit

• CompleteanevaluaPoninhours(vsdays/weeks)

• Ethics?

Page 59: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleBleuMetric(Papinenietal2001)

• Bleu•  BiLingualEvalua@onUnderstudy•  Modifiedn-gramprecisionwithlengthpenalty•  Quick,inexpensiveandlanguageindependent•  CorrelateshighlywithhumanevaluaPon•  BiasagainstsynonymsandinflecPonalvariaPons

Slide from Nizar Habash

Page 60: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleBleuMetric

TestSentence

colorlessgreenideassleepfuriously

Gold Standard References

all dull jade ideas sleep irately drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Slide from Nizar Habash

Page 61: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleBleuMetric

TestSentence

colorlessgreenideassleepfuriously

Gold Standard References

all dull jade ideas sleep irately drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4/5

Slide from Nizar Habash

Page 62: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleBleuMetric

TestSentence

colorlessgreenideassleepfuriouslycolorlessgreenideassleepfuriouslycolorlessgreenideassleepfuriouslycolorlessgreenideassleepfuriously

Gold Standard References

all dull jade ideas sleep irately drab emerald concepts sleep furiously

colorless immature thoughts nap angrily

Unigram precision = 4 / 5 = 0.8 Bigram precision = 2 / 4 = 0.5

Bleu Score = (a1 a2 …an)1/n = (0.8 ╳ 0.5)½ = 0.6325 è 63.25

Slide from Nizar Habash

Page 63: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

BLEUscoresfor110translationsystemstrainedonEuroparl

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf

Page 64: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix
Page 65: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleMETEOR(LavieandAgrawal2007)

•  MetricforEvaluaPonofTranslaPonwithExplicitwordOrdering

•  ExtendedMatchingbetweentranslaPonandreference•  Porterstems,wordNetsynsets

•  UnigramPrecision,Recall,parameterizedF-measure•  ReorderingPenalty•  ParameterscanbetunedtoopPmizecorrelaPonwithhumanjudgments

•  Notbiasedagainst“non-staPsPcal”MTsystems

Slide from Nizar Habash

Page 66: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

MetricsMATRWorkshop• WorkshopinAMTAconference2008

•  AssociaPonforMachineTranslaPonintheAmericas

•  EvaluaPngevaluaPonmetrics• Compared39metrics

•  7baselinesand32newmetrics•  VariousmeasuresofcorrelaPonwithhumanjudgment•  DifferentcondiPons:textgenre,sourcelanguage,numberofreferences,etc.

Slide from Nizar Habash

Page 67: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix

AutomaticEvaluationExampleSEPIA(HabashandElKholy2008)•  AsyntacPcally-awareevaluaPonmetric•  (LiuandGildea,2005;Owczarzaketal.,2007;GiménezandMàrquez,2007)

•  UsesdependencyrepresentaPon•  MICAparser(Nasr&Rambow2006)•  77%ofallstructuralbigramsaresurfacen-gramsofsize2,3,4

•  Includesdependencysurfacespanasafactorinscore•  long-distancedependenciesshouldreceiveagreaterweightthanshortdistancedependencies

•  HigherdegreeofgrammaPcality? 0%5%10%15%20%25%30%35%40%45%50%

1 2 3 4plus