CS 712 presentation
• SMT and Morphology
• Case markers and Morphology
• Interpolated Back-off for factor based MT
• Simple syntactic and morphological processing

Presented by Samiulla S, Jayaprakash S, Subhash K.
SMT and morphology
Approaches by source/target morphological richness:
• Poor → Poor: phrase based (Koehn et al., 2003), e.g., English-Chinese
• Poor → Rich: factor based, using semantic and morphological factors (Ananthakrishnan et al., 2009) (English-Hindi); (Avramidis and Koehn, 2008) (English-Greek)
• Rich → Poor: phrase based with pre-processing, normalizing the source-side surface word forms (Zollmann et al., 2006) (Arabic-English)
• Rich → Rich: mostly rule-based (transfer/interlingua) (Shilon et al., 2010) (Hebrew-Arabic)
Poor to Rich
Finding the correct morphological features for the output is a difficult task.
Each inflected form is treated as a different word by the phrase based model.
E.g., लड़का and लड़के are treated as entirely different words. Source side semantics and word order can be used
to find the correct morphological structures on the rich target side (Ananthakrishnan et al., 2009; Avramidis and Koehn, 2008).
Rich to Poor
Comparatively easier than poor-to-rich, but the problem of data sparsity remains: an
unseen inflected form of a word will be treated as an OOV by the phrase based model.
E.g., if the word अच्छा has occurred in the training corpus but अच्छे has not, अच्छे will be treated as an OOV.
Source side normalization is needed (Zollmann et al., 2006) (Arabic to English).
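A toy illustration of such source-side normalization (the lemma table and word lists are hypothetical; Zollmann et al. used a real Arabic morphological analyzer):

```python
# Hypothetical lemma lookup for illustration — not the actual
# analyzer used by Zollmann et al. (2006).
LEMMA = {"houses": "house", "played": "play", "playing": "play"}

def normalize(tokens):
    """Map each surface form to its lemma, so unseen inflections
    collapse onto forms the phrase table has already seen."""
    return [LEMMA.get(t, t) for t in tokens]

print(normalize(["he", "played", "near", "houses"]))
# ['he', 'play', 'near', 'house']
```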
Poor to Poor
Phrase based SMT works well if both the languages are morphologically poor.
Rich to Rich
This case is difficult to handle with SMT; mostly interlingua based systems are found
(Shilon et al., 2010) (Hebrew to Arabic).
Case markers and Morphology
Addressing the crux of the fluency problem in English-Hindi SMT

By Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, Pushpak Bhattacharyya

Presented by Jayaprakash S, Subhash K, Samiulla S
Aim
Accurately generating case markers and suffixes for English-Hindi translation.
What entity on the English side encodes the information contained in case markers and suffixes on the Hindi side?
Introduction
The fundamental problems in English-Indian language SMT are:
1. Wide syntactic divergence, which causes word ordering issues in output translations.
2. Richer case marking and suffixes in Indian languages as compared to English.
Being free word order languages, Indian languages suffer badly when morphology and case markers are incorrect.
Motivation
Difference between English and Hindi:
1. English follows SVO, Hindi follows SOV in general.
2. English uses post-modifiers, whereas Hindi uses pre-modifiers.
3. Hindi allows greater freedom in word order, identifying constituents depending on case marking.
4. Hindi is relatively richer in morphology.
Motivation contd...
Problems 1 and 2 are addressed by reordering the English sentence into Hindi order in a preprocessing step.
Here, we focus on solving problems 3 and 4.
Motivation (Case Markers)
Major constituents (subject, object, ...) in English are identified by their positions in the sentence.
In Hindi, constituents can be moved around without changing the core meaning. Case markers and suffixes are used to identify the constituents.
Example:
राम ने रावण को मारा . (ram ne ravan ko mara) [Ram killed Ravan]
रावण को राम ने मारा . (ravan ko ram ne mara) [Ram killed Ravan]
Motivation (Morphology)
Oblique case: लडके पाठशाला गये । लडकों ने शोर मचाया ।
Future tense: लडके पाठशाला जायेंगे ।
Causative form: लडकों ने उन्हें लाया ।
Motivation (Sparsity)
In the English-Hindi case, sparsity is a big problem.
It is very unlikely that all words will appear with all case markers in the training corpus.
Example: if 'लडके' appears in the training data but 'लडकों' does not, 'लडकों' will be treated as an OOV.
Approach
The goal is to carry the semantics and suffix information from the English side to the Hindi side.
The factored model acts as a vehicle for this information across languages.
Factored model:
p(e|f) = (1/Z) exp( Σᵢ λᵢ hᵢ(e, f) )
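A toy instance of this log-linear formulation, to make the normalization constant Z concrete (the feature functions and weights below are invented for illustration):

```python
import math

def loglinear_probs(f, candidates, features, weights):
    """p(e|f) = (1/Z) * exp(sum_i lambda_i * h_i(e, f)) over the candidates."""
    def score(e):
        return math.exp(sum(w * h(e, f) for h, w in zip(features, weights)))
    Z = sum(score(e) for e in candidates)  # partition function
    return {e: score(e) / Z for e in candidates}

# hypothetical features: length agreement and word overlap with the source
features = [
    lambda e, f: 1.0 if len(e.split()) == len(f.split()) else 0.0,
    lambda e, f: float(len(set(e.split()) & set(f.split()))),
]
weights = [0.5, 1.0]
probs = loglinear_probs("das haus", ["the house", "house"], features, weights)
```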
Factors used
Lemma -----> Lemma: boy -----> लडक्
Suffix + Semantics -----> Case Marker / Suffix: -s + subj -----> ए
Lemma + Suffix -----> Surface Form: लडक् + ए -----> लडके
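The three mappings can be walked through with single-entry toy tables (for illustration only; the real model learns probabilistic tables):

```python
# One entry per table, taken from the slide's example.
LEMMA_MAP = {"boy": "लडक्"}                 # Lemma -> Lemma
SUFFIX_SEM_MAP = {("-s", "subj"): "ए"}      # Suffix + Semantics -> Suffix
GENERATE = {("लडक्", "ए"): "लडके"}           # Lemma + Suffix -> Surface Form

def translate_factored(lemma, suffix, semantic_role):
    tgt_lemma = LEMMA_MAP[lemma]
    tgt_suffix = SUFFIX_SEM_MAP[(suffix, semantic_role)]
    return GENERATE[(tgt_lemma, tgt_suffix)]

print(translate_factored("boy", "-s", "subj"))  # लडके
```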
Factored Model
[Diagram: factors of the reordered input (word, lemma, suffix, semantic relation) are mapped to the output factors (lemma, suffix/case marker, word).]
Semantic Relations
The experiments were conducted with two kinds of semantic relations:
1. Relations from the Universal Networking Language (UNL) (44 relations).
2. Grammatical relations produced by the Stanford dependency parser (55 relations).
Stanford Semantic graph
[Dependency graph for "John said that he was hit by Jack": nsubj(said, John), ccomp(said, hit), complm(hit, that), nsubjpass(hit, he), auxpass(hit, was), agent(hit, Jack).]
UNL semantic graph
[UNL graph for "John said that he was hit by Jack": agt(say.@entry.@past, John), obj(say, :01); within scope :01: agt(hit.@entry.@past, Jack), obj(hit, he).]
Experimental setup
The SRILM toolkit was used for building the language model.
Training, tuning, and decoding were done using the Moses toolkit.
The Stanford dependency parser was used for extracting semantic relations.
Hindi suffix separation was done using the stemmer described in (Ananthakrishnan and Rao, 2003).
Evaluation criteria
BLEU: measures the precision of n-grams with respect to the reference translations, with a brevity penalty.
NIST: measures the precision of n-grams. This metric is a variant of BLEU, which was shown to correlate better with human judgments.
Subjective: Human evaluators judged the fluency and adequacy, and counted the number of errors in case markers and morphology.
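A minimal single-reference BLEU sketch of the n-gram precision and brevity penalty just described (no smoothing, unlike real evaluation tools):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```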
Results
MODEL                       BLEU    NIST
Baseline (surface)          24.32   5.85
lemma + suffix              25.16   5.87
lemma + suffix + unl        27.79   6.05
lemma + suffix + stanford   28.21   5.99
Conclusion and future work
A marked improvement is observed in English-Hindi SMT after using the factored model.
The improvement is not only statistically significant but is also verified by subjective evaluation.
Future work: correctly combining smaller parts into a bigger output sentence of good quality, since shorter sentences achieve better accuracy.
References
• Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. 2009. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of ACL-IJCNLP 2009, pages 800–808, Suntec, Singapore. Association for Computational Linguistics.
• Ananthakrishnan, R., and Rao, D. 2003. A Lightweight Stemmer for Hindi. In Workshop on Computational Linguistics for South-Asian Languages, EACL.
• Koehn, P., and Hoang, H. 2007. Factored Translation Models. In Proceedings of EMNLP.
• Marie-Catherine de Marneffe and Christopher Manning. 2008. Stanford Typed Dependencies Manual.
References (contd.)
• Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pages 127–133.
• Andreas Zollmann, Ashish Venugopal, and Stephan Vogel. 2006. Bridging the inflection morphology gap for Arabic statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (NAACL-Short '06), pages 201–204. Association for Computational Linguistics, Stroudsburg, PA, USA.
• Avramidis, E., and Koehn, P. 2008. Enriching Morphologically Poor Languages for Statistical Machine Translation. In Proceedings of ACL-08: HLT.
• Reshef Shilon, Nizar Habash, Alon Lavie, and Shuly Wintner. 2010. Machine translation between Hebrew and Arabic: Needs, challenges and preliminary solutions. In Proceedings of AMTA 2010: The Ninth Conference of the Association for Machine Translation in the Americas, November.
Interpolated Back-off for Factored Translation Models
Philipp Koehn, Barry Haddow
The Tenth Biennial Conference of the Association for Machine Translation in the Americas, 2012

Presented by Samiulla, Jayaprakash, Subhash, CS712, 2013

Plan
1) Phrase based models (limitation)
2) Factor based models (model & limitation)
3) Back-off
4) Interpolated back-off
5) Results
6) Demo
Phrase based models
1. Pure phrase based models treat 'house' and 'houses' as completely different words.
2. 'house' occurring in the training data has no effect on learning the translation of 'houses'.
Solutions
Possible solutions:
1) increase the corpus size so that it covers all morphological variants
2) alter the model so that it learns morphological generalizations
Corpus Size
increasing the corpus size:
5 → avg. number of target words a source word maps to (synonymy and polysemy)
10 → avg. sentence length
1 million → vocabulary size
Parallel sentences required = 5 x 10 x 1 million = 50 million
Morphologically rich languages do not stop with this count !!!
increasing the corpus size (morphologically richer languages)
A verb is inflected by person (a), number (b), gender (c), aspect (d), tense (e), and voice (f).
The approximate number of mappings per word will be m = 2^(a+b+c+d+e+f) x 5.
Nouns are similarly inflected by case.
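The back-of-envelope numbers above, checked in code (the values of a..f below are hypothetical — one binary distinction per feature):

```python
# Slide's estimate for a morphologically poor language pair.
avg_mappings = 5        # avg. target words a source word maps to
sent_len = 10           # avg. sentence length
vocab = 1_000_000       # vocabulary size
pairs_needed = avg_mappings * sent_len * vocab
print(pairs_needed)     # 50000000

# Morphologically rich case: m = 2^(a+b+c+d+e+f) x 5, assuming one
# binary distinction per feature (hypothetical values).
a = b = c = d = e = f = 1
m = 2 ** (a + b + c + d + e + f) * avg_mappings
print(m)                # 320
```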
increasing the corpus size (morph richer): morphological variants
English sentence: "(I) was being destroyed"
அழிந்து கொண்டு இருந்தேன் (Tamil)

azhi + ntu + kontu + iruntaen (transliteration):
  root (destroy) + voice marker (past tense) + tense marker (during) + aspect marker (past progressive) + first person singular
azhi + ntu + kontu + iruntatal (transliteration):
  same analysis, with feminine agreement
azhi + ththu + kontu + iruntatharkal (transliteration):
  root (destroy) + active marker + tense marker + aspect marker
Phrase based models
alter the model
Change the phrase based model so that it can learn inflections/syntax.
FACTOR BASED MODEL
Factored Model - 1/6
1) Allows for decomposition and enrichment of the phrase-based translation model
2) Two translation steps*, one for lemma and one for morphological properties, and a generation step to produce the target surface form.
Factored Model- 2/6
[Diagram: input factors (word, lemma, morphology) are mapped to output factors (word, lemma, morphology).]
Factored Model- 3/6
Log Linear Model:
P(e|f) = exp( Σᵢ₌₁ⁿ λᵢ hᵢ(e, f) )
It includes several components:
LM:          h_LM(e, f) = p_LM(e) = p(e₁) p(e₂|e₁) ... p(e_m|e_{m−1})
Translation: h_trans(e, f) = P(e|f)
The translation step is decomposed into many mapping and generation steps.
Factored Model - 4/6
Introduce latent variables (e_l, e_m) on the English side.
Instead of summing over all derivations, approximate by taking the best one.
P(e_s | f_s) = P(e_s | f_s, f_l, f_m): we know the source side factors (f_s, f_l, f_m) and must find e_s.
Factored Model - 5/6
Decompose the fully-factored model into three mapping steps using the chain rule.
Factored Model - 6/6
With independence assumptions:
1) The mapping steps' probability distributions are estimated from the word-aligned corpus.
2) The generation model is estimated from a monolingual target side corpus.
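A toy sketch of this decomposition with made-up probability tables and a made-up English-German example (not the paper's actual models):

```python
# P(e_s | f) ≈ P(e_l | f_l) * P(e_m | f_m) * P(e_s | e_l, e_m)
# under the independence assumptions above. All numbers are invented.
P_LEMMA = {("house", "haus"): 0.9}            # lemma mapping step
P_MORPH = {("plural", "PL"): 0.8}             # morphology mapping step
P_GEN = {(("haus", "PL"), "häuser"): 0.95}    # generation step (target side)

def p_surface(f_l, f_m, e_l, e_m, e_s):
    return (P_LEMMA.get((f_l, e_l), 0.0)
            * P_MORPH.get((f_m, e_m), 0.0)
            * P_GEN.get(((e_l, e_m), e_s), 0.0))

p = p_surface("house", "plural", "haus", "PL", "häuser")
```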
What if we use a language which is poor in morphology, and we knew that all words in the vocabulary occurred a sufficient number of times in the corpus?
Who will win?
1) Factor based  2) Phrase based
Questions-1/3
Questions-2/3
[Diagram: decomposed factors — the input factors (word, lemma, morphology) are mapped to the output lemma and morphology, which then generate the output surface form.]
Decomposed Factors
Will the independence assumptions in the factored model harm anything, anywhere?
If we use a phrase based model, the phrase table has many translation options for each source phrase.
Is there any way to find out whether the model has learned enough about a particular phrase?
Questions-3/3
Analysis
How big is the portion of rare words in the test set, and do they get translated significantly worse?
Precision by count
Balance between
1) traditional surface form translation models
2) factored models that decompose translation into lemma and morphological feature mapping steps.
Interpolated back-off
Pure phrase based models cause sparse data problems in model estimation, affecting
both the translation model and the language model.
Factor based models
may be harmed by their independence assumptions.
Motivation
Backoff models rely primarily on surface translation model but back off to the decomposed model for unknown word forms.
Interpolated backoff models combine surface and factored translation models, relying more heavily on the surface models for frequent words, and more heavily on the factored models for the rare words.
Motivation
Backoff
1) Primarily relies on the phrase-based model
2) Only for unknown words and phrases is the factored model used to obtain possible translations.
3) We may introduce a third model that relies on synonyms to increase coverage.
Interpolated Back-off
1) The back-off model uses the factored model only for unknown surface forms.
2) It does not change predictions for rare surface forms (seen once or twice); the factored model plays no role for them.
Interpolated Back-off
1) Subtract some of the probability mass from the translations e in the primary distribution p1(e|f) and use it for additional translations from the secondary distribution p2(e|f).
2) Obtain α(e|f) by absolute discounting: subtract a fixed discount D from each count.
Interpolated Back-off
Example (D = 0.5):

f (Eng)   e (Tamil)             Count   P1(e|f)   α(e|f)
House     Veedu (house)         5       0.72      0.64
          Veettai (house-acc)   1       0.14      0.07
          Veettin (of house)    1       0.14      0.07

The remaining mass (1 − 0.64 − 0.07 − 0.07) = 0.22 is the weight given to the factored model P2(e|f).
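The discounting arithmetic can be sketched as follows; the counts and D = 0.5 come from the example above, while the factored distribution p2 is hypothetical:

```python
def interpolated_backoff(counts, p2, D=0.5):
    """Absolute discounting: subtract D from each surface count, and give
    the freed probability mass to the factored distribution p2(e|f)."""
    total = sum(counts.values())
    alpha = {e: (c - D) / total for e, c in counts.items()}  # discounted p1
    mass = 1.0 - sum(alpha.values())  # = D * len(counts) / total
    return {e: alpha.get(e, 0.0) + mass * p2.get(e, 0.0)
            for e in set(counts) | set(p2)}

counts = {"Veedu": 5, "Veettai": 1, "Veettin": 1}
p2 = {"Veedu": 0.5, "Veettai": 0.3, "Veettin": 0.2}  # hypothetical factored model
p = interpolated_backoff(counts, p2)
```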
Experiments
1) Training data: European Parliament proceedings and collected news commentaries
2) Test set: a collection of news stories
3) LM: 5-gram lemma model, 7-gram morphology-sequence model
4) Word alignment: on lemmas instead of surface forms
Results
1. A plain surface phrase-based model that uses only surface forms
2. A joint factored model that translates all factors (surface, lemma, morphology) in one translation step
3. A back-off model from the joint phrase-based model to the decomposed model
4. An interpolated back-off version of model 3
5. A lemma back-off model from the joint phrase-based model to a model that maps from the source lemma into all target factors
6. An interpolated back-off version of model 5
Analysis
1) Quadratmeter (German for "square meter") occurred 3 times and was correctly translated by the interpolated back-off model, but not by plain back-off.
2) The German word Gewalten was translated incorrectly into "violence" by the interpolated back-off model, while the simple back-off model arrived at the right translation, "powers". The word occurred only three times in the corpus, with the acceptable translations "powers", "forces", and "branches", but its singular form Gewalt is very frequent and almost always translates into "violence".
Conclusion
Back-off methods improve the translation of rare words by combining surface word translation with translations obtained from a decomposed factored model.
They yield gains in BLEU and improved translation accuracy.
Demo
Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
Ananthakrishnan Ramanathan, Pushpak Bhattacharyya (Indian Institute of Technology)
Jayprasad Hegde, Ritesh M. Shah, Sasikumar M (CDAC Mumbai)
Road Map
• Motivation
• Proposed solution
• Syntactic processing
• Morphological processing
• System overview
• Experimental evaluation
• Results
• Conclusion
Indian languages differ from English in terms of word order.
Cost of reordering: not-so-good translations. They are also morphologically rich:
the parallel corpus should cover a large number of word forms, yet large amounts of parallel corpora are unavailable.
English → SMT → हिन्दी
How can we achieve a reasonable improvement?
Motivation
Reorder the English sentence as per Hindi syntax, by applying transformation rules to the English parse tree.
Make use of suffixes of Hindi words, using simple suffix separation.
Proposed solution
Syntactic reordering performed as a preprocessing stage
Transformation rule
1. SVO order is converted to SOV.
2. Post-modifiers are converted to pre-modifiers.
Syntactic processing
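The two rules can be illustrated on a toy flat constituent list (a real system applies them to the full parse tree):

```python
def reorder_svo_to_sov(constituents):
    """Rule 1: constituents is a list of (label, text) with labels S, V, O."""
    order = {"S": 0, "O": 1, "V": 2}
    return sorted(constituents, key=lambda c: order[c[0]])

def post_to_premodifier(head, modifier):
    """Rule 2: an English post-modifier becomes a Hindi pre-modifier."""
    return [modifier, head]

sent = [("S", "the boy"), ("V", "ate"), ("O", "an apple")]
print([text for _, text in reorder_svo_to_sov(sent)])
# ['the boy', 'an apple', 'ate']
```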
Example
Syntactic processing
Different morphological forms of a word are not considered as independent entities; this deals with data sparsity and results in alignment of morphs instead of word forms.
Consider that the training corpus contains only one instance of "players":
English: Players should just play.
Hindi: खिलाड़ियों को केवल खेलना चाहिए (khilaadiyom ko keval khelanaa caahie)
Morphological processing
Consider the input sentence, English: "The men came across some players". Expected Hindi translation:
आदमियों को कुछ खिलाड़ी मिले (aadmiyon ko kuch khiladii mile)
If morphology is not used, the system will choose खिलाड़ियों for "players".
Morphological processing
Morphological analyzer for English (Minnen et al., 2001).
Suffix separation program for Hindi (Ananthakrishnan and Rao, 2003):
it extracts the longest possible suffix for each word.
Tools for Morphological information extraction
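Longest-suffix stripping can be sketched in transliteration (the suffix list here is hypothetical; it is not the actual list from Ananthakrishnan and Rao, 2003):

```python
SUFFIXES = ["iyom", "iyaan", "om", "em", "e", "ii"]  # hypothetical list

def split_suffix(word):
    """Strip the longest matching suffix; return (stem, suffix)."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return word[:-len(suf)], suf
    return word, ""

print(split_suffix("khilaadiyom"))  # ('khilaad', 'iyom')
```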
Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M., Simple Syntactic and Morphological ProcessingCan Help English-Hindi Statistical Machine Translation, Proceedings of IJCNLP, 2008
System overview
Corpus
BLEU, mWER and SSER are used to evaluate.

                      #sentences   #words
Training              5000         120153
Development           483          11675
Test                  400          8557
Monolingual (Hindi)   49937        1123966
Experimental evaluation
mWER (multi-reference word error rate): edit distance to the most similar reference translation.
SSER (subjective sentence error rate): based on human judgement, with classes
0 – nonsense, 1 – roughly understandable, 2 – understandable, 3 – good, 4 – perfect.
The lower the SSER, the better the translation.
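mWER can be sketched as below; normalizing by the closest reference's length is an assumption here (evaluation tools differ in the exact normalization term):

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance by dynamic programming."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def mwer(hyp, references):
    """Edit distance to the most similar reference, length-normalized."""
    return min(edit_distance(hyp, r) / len(r) for r in references)
```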
Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney, An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research, International Conference on Language Resources and Evaluation, pages 39–45, 2000.
Evaluation
SSER:
v(s,t) is the score assigned to translation t of sentence s,
n is the number of translation pairs,
K is the number of evaluation classes.
Evaluation
Base line
Phrase based model (Koehn et al., 2003), where f is the source sentence, e a candidate translation, and ê the translation with the highest probability:
ê = argmax_e p(e|f)
which can be rewritten using Bayes' decision rule as
ê = argmax_e p(e) p(f|e)
The translation model p(f|e) is computed using a phrase translation probability distribution.
Base line: Phrase translation table
The parallel corpus is word-aligned, and phrase correspondences are found. Given the set of phrase pairs (f̄, ē), the phrase translation probability is
φ(f̄|ē) = count(f̄, ē) / Σ_f̄′ count(f̄′, ē)
The language model p(e) used is a trigram model with modified Kneser-Ney smoothing.
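The relative-frequency estimate can be sketched from extracted phrase pairs (the Hindi-English pairs below are made up):

```python
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    """phi(f|e) = count(f, e) / sum_f' count(f', e), by relative frequency."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

pairs = [("ghar", "house"), ("ghar", "house"), ("makaan", "house"),
         ("bada ghar", "big house")]
phi = phrase_translation_probs(pairs)
```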
Technique                                       BLEU    mWER    SSER    Roughly understandable   Understandable
baseline (phrase based) (Koehn et al., 2003)    12.10   77.49   91.20   10%                      0%
baseline+syn                                    16.90   69.18   74.40   42%                      12%
baseline+syn+morph                              15.88   70.69   66.40   46%                      28%
Results
Incorporating syntactic and morphological information increases translation quality.
This is useful for English to any Indian language translation.
Conclusion
Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M., Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation, Proceedings of IJCNLP, 2008.
Philipp Koehn, Franz Josef Och, and Daniel Marcu, Statistical Phrase-based Translation, Proceedings of HLT-NAACL, 2003.
Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney, An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research, International Conference on Language Resources and Evaluation, pages 39–45, 2000.
References