
Word Sense Disambiguation for Machine Translation

Han-Bin Chen, 2010.11.24

Reference Paper

• Cabezas and Resnik. 2005. Using WSD Techniques for Lexical Selection. (Technical report)

• Carpuat and Wu. 2005. Word Sense Disambiguation vs. Statistical Machine Translation. (ACL 2005)

• Carpuat and Wu. 2007. Improving Statistical Machine Translation using Word Sense Disambiguation. (EMNLP 2007)

• Chan et al. 2007. Word Sense Disambiguation Improves Statistical Machine Translation. (ACL 2007)

• Apidianaki. 2009. Data-driven semantic analysis for multilingual WSD. (EACL 2009)

SMT Workflow

[Workflow diagram] A bilingual corpus trains the translation model and reordering model; a monolingual corpus trains the language model; the decoder combines these models to turn source-language input into target-language output.
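For reference, the decoder in such a system typically combines these models in a log-linear fashion (standard formulation, not taken from the slides):

\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

where the feature functions h_m include the translation, reordering, and language model scores, and the weights \lambda_m are tuned on held-out data.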

MT Research Areas

[Workflow diagram, repeated] The same SMT pipeline annotated with research areas: word alignment, translation model, reordering model, language model, decoder, and evaluation metrics.

Translation Model (TM)

• Research in TM
– Phrase extraction
– Phrase filtering
– Phrase augmentation
– Word Sense Disambiguation (WSD)

Traditional WSD

• Target word is a single content word
– Nouns, verbs, adjectives

• Classification task with predefined senses
– WordNet, HowNet

• Modern WSD system (see the sketch below)
– Not limited to local context
– Linguistic information
– Position-sensitive
– Syntactic
– Collocation
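A minimal sketch of such a supervised WSD classifier, assuming scikit-learn is available; the feature names and the toy training data are made up for illustration, and a real system would take its sense labels from an inventory such as WordNet or HowNet.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def extract_features(tokens, i):
    """Context features for the target word tokens[i]:
    bag-of-words context, position-sensitive neighbours, and a local collocation."""
    feats = {"bow_" + w: 1 for w in tokens if w != tokens[i]}   # unordered context
    for offset in (-2, -1, 1, 2):                               # position-sensitive words
        j = i + offset
        if 0 <= j < len(tokens):
            feats["pos%+d_%s" % (offset, tokens[j])] = 1
    if 0 < i < len(tokens) - 1:                                 # collocation around the target
        feats["colloc_%s_%s" % (tokens[i - 1], tokens[i + 1])] = 1
    return feats

# Toy training data: (sentence tokens, index of "briefly", sense label).
train = [
    ("the plane stops briefly at the airport".split(), 3, "short-time"),
    ("he spoke briefly about the plan".split(), 2, "concisely"),
]

clf = make_pipeline(DictVectorizer(), LogisticRegression())
clf.fit([extract_features(t, i) for t, i, _ in train], [s for _, _, s in train])

test = "the train waits briefly in the station".split()
print(clf.predict([extract_features(test, 3)]))   # e.g. ['short-time'], depending on the toy data
```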

• An intuitive application of WSD is SMT

WSD in MT

• Wrong translations from Google Translate
• "what is today's special?" → 什麼是今天的特色? (特色 means "distinctive feature"; the intended sense is today's special dish)

• "I would like to reserve a table for three" → 我想保留一表三 (word-for-word "I would like to retain one chart three"; wrong senses of "reserve" and "table")

• "the plane will briefly stop over in the airport" → 這架飛機將簡要地停留在機場 (簡要地 means "concisely"; the intended sense of "briefly" is "for a short time", 短暫地)

WSD in MT: Early Stage

• Whether a WSD model can help SMT
– An energetically debated question over the past years

• Implicit WSD in SMT
– Local context: phrase table & language model

• Dedicated WSD system
– Wider variety of context features
– Position, sentence-level, document-level features

• WSD should play a role in MT
• Publicly available SMT system
– Pharaoh, by Philipp Koehn (2003~2004)

Small Scale Experiment (1)

• Marine Carpuat and Dekai Wu, 2005
• Chinese-to-English translation task
• Chinese lexical sample task with 20 target words
• Trained with a state-of-the-art WSD system
– 37 training instances per target word (manual annotation)

Small Scale Experiment (2)

• Hard decision (see the sketch below)
– Force the decoder to choose translations from the WSD glosses
– The choice among those glosses is left to the language model
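A toy illustration of the "hard decision" strategy described above; `wsd_model` and `sense_glosses` are hypothetical stand-ins, not the authors' actual interface.

```python
def constrain_translations(phrase_table, sentence, target_idx, wsd_model, sense_glosses):
    """Hard decision: for the WSD target word, keep only those phrase-table
    translations that appear among the glosses of the predicted sense.
    The decoder (in practice the language model) then picks among the survivors."""
    word = sentence[target_idx]
    sense = wsd_model.predict(sentence, target_idx)      # hypothetical WSD call
    allowed = set(sense_glosses[sense])                  # glosses of the predicted sense
    constrained = dict(phrase_table)
    constrained[word] = [(tgt, p) for tgt, p in phrase_table.get(word, [])
                         if tgt in allowed]
    return constrained
```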

• Surprising and frustrating result: WSD did not improve translation quality
– Possible causes: small data, out-of-domain material, hard decision
– Language model effect

Translation Disambiguation (1)

• Clara Cabezas and Philip Resnik, 2005
– Addresses 3 problems of the previous work

• Use the aligned target word directly as the "sense"
– 4 senses for "briefly": { 短暫地, 短時間地, 簡潔地, 簡要地 }
– Trained with a state-of-the-art WSD system
– Handles the "small data" and "out-of-domain" problems

• Soft decision
– Pharaoh XML markup
– The decoder weighs the specified translations together with the regular translation model
– Handles the "hard decision" problem

Translation Disambiguation (2)

• Pharaoh XML markup
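The slide only names the mechanism; the snippet below is an illustrative sketch in the spirit of the Pharaoh/Moses XML input markup (the exact tag and attribute names vary between decoder versions), showing how WSD candidates for one Spanish word could be proposed to the decoder without excluding the regular phrase table.

```python
# Illustrative only: exact Pharaoh/Moses XML tag and attribute names vary by version.
source = "el avión se detendrá brevemente en el aeropuerto"
marked = source.replace(
    "brevemente",
    '<wsd translation="briefly||for a moment" prob="0.7||0.3">brevemente</wsd>'
)
print(marked)   # passed to the decoder, which weighs these options against the phrase table
```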

• Experiment & result
– Spanish-to-English test on the Europarl test set
– WSD: 0.2382, baseline: 0.2356
– The improvement is not statistically significant
– But at least it is not a decrease

Toward Better Integration into SMT

• How to better integrate WSD into SMT?
• Phrase-based sense disambiguation (PSD)
• Key points
– Phrases, not single words
– Integration into the log-linear model: weight tuning

Successful Integration (1)

• Chan et al., 2007
• Chinese-to-English translation
• Sense disambiguation on Chinese phrases
– 1 or 2 consecutive Chinese words
– Extract training examples from the word-aligned corpus (see the sketch below)
• Add WSD features
– Contextual probability of WSD
– Reward probability of WSD
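A simplified sketch of how such training examples might be collected from one word-aligned sentence pair (my own illustration of the idea, not the authors' code):

```python
def extract_wsd_examples(src_tokens, tgt_tokens, alignment, max_len=2):
    """Collect (source phrase, context, aligned translation) instances from one
    word-aligned sentence pair; `alignment` is a set of (src_idx, tgt_idx) pairs."""
    examples = []
    for start in range(len(src_tokens)):
        for length in range(1, max_len + 1):          # 1- or 2-word source phrases
            end = start + length
            if end > len(src_tokens):
                break
            tgt_idx = sorted(j for i, j in alignment if start <= i < end)
            if not tgt_idx:
                continue
            examples.append({
                "phrase": " ".join(src_tokens[start:end]),
                "context": src_tokens,                # WSD context features come from here
                "label": " ".join(tgt_tokens[j] for j in tgt_idx),  # translation acts as the "sense"
            })
    return examples
```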

Successful Integration (2)

• Statistically significant improvement

• Source: 將 無法 取得 更 多 援助 或 其他 讓步 ("will be unable to obtain more aid or other concessions")
• Hiero: will be more aid and other concessions
• Hiero+WSD: will be unable to obtain more aid and other concessions

PSD System (1)

• Marine Carpuat and Dekai Wu, 2007
• A WSD model for every phrase
– Training data extracted during phrase extraction
– WSD probability as a new feature (see the formula below)
• Comments
– Not every phrase needs WSD
– Technical problem (Pharaoh)
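The slides only state that the WSD/PSD probability becomes an additional feature; in the log-linear notation shown earlier, one way to write this (my notation, not the authors') is:

\mathrm{score}(e, f) = \sum_{m} \lambda_m \, h_m(e, f) + \lambda_{\mathrm{psd}} \sum_{k} \log P_{\mathrm{psd}}\!\left(\bar{e}_k \mid \bar{f}_k, \mathrm{context}(\bar{f}_k)\right)

where k ranges over the phrase pairs used in a candidate translation and \lambda_{\mathrm{psd}} is tuned together with the other weights.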

PSD System (2)

• Result: better translation on all test sets

– Test sets: IWSLT 2006 dataset, NIST 2004 test set

PSD System (3)

Recent Issue

• Different translations may have the same sense
– 2 senses for "briefly", rather than 4
– Sense 1: { 短暫地, 短時間地 } ("for a short time")
– Sense 2: { 簡潔地, 簡要地 } ("concisely")

• Automatic sense clustering

Sense Clustering (1)

• Marianna Apidianaki, 2009
• Two translations are semantically related
– if they occur in similar contexts
• Translation unit (TU) as context
– a bilingual sentence pair
• Example: source word "briefly"
• Translations
– { 短暫地, 短時間地, 簡潔地, 簡要地 } = {t1, t2, t3, t4}

Sense Clustering (2)

• "briefly-t1" occurs in context {TU1, TU4, TU25, TU88…}• "briefly-t2" occurs in context {TU5, TU18, TU92, TU126…}• Clustering based on pairwise context similarity

– Apidianaki, 2008
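A toy sketch of the clustering step, assuming each translation's context is reduced to a bag-of-words vector built from the TUs it occurs in, and using a simple single-link agglomerative pass with a hypothetical similarity threshold (Apidianaki's actual method differs in detail):

```python
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two bag-of-words context vectors (dicts)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster_translations(contexts, threshold=0.3):
    """contexts: {translation: bag-of-words dict of the TUs it appears in}.
    Repeatedly merge clusters that contain a sufficiently similar pair (single link)."""
    clusters = [{t} for t in contexts]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(cosine(contexts[a], contexts[b]) >= threshold
                   for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break
    return clusters
```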

Sense Clustering (3)

• Experiment
– English-Greek translation
– 150 ambiguous English nouns

• Evaluation of lexical selection (see the formulas below)
– Strict precision (exact match with the answer word)
– Enriched precision (match with the cluster of the answer word)
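In symbols (my phrasing of the two measures, over N test instances with system choice \hat{t}_i and reference translation t_i):

P_{\mathrm{strict}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\hat{t}_i = t_i\right], \qquad P_{\mathrm{enriched}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\hat{t}_i \in \mathrm{cluster}(t_i)\right]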

• Result

Conclusion

• From WSD to PSD
• However, semantics is also important
• Future work
– Semantic PSD