CS 712 presentation
• SMT and Morphology
• Case markers and Morphology
• Interpolated Back-off for factor based MT
• Simple syntactic and morphological processing

Presented by Samiulla S, Jayaprakash S, Subhash K.
SMT and morphology
Approaches by source/target morphological richness:
• Poor → Poor: phrase based (Koehn et al., 2003), e.g., English-Chinese
• Poor → Rich: factor based, using semantic and morphological factors (Ananthakrishnan et al., 2009) (English-Hindi); (Avramidis and Koehn, 2008) (English-Greek)
• Rich → Poor: phrase based with pre-processing, normalizing the source-side surface word forms (Zollmann et al., 2006) (Arabic-English)
• Rich → Rich: mostly rule-based (transfer/interlingua) (Shilon et al., 2010) (Hebrew-Arabic)
Poor to Rich
Finding the correct morphological features for the output is a difficult task.
Each inflected form is treated as a different word by the phrase based model.
E.g., लड़का and लड़के are treated as entirely different words. Source side semantics and word order can be used
to find the correct morphological structures on the rich target side (Ananthakrishnan et al., 2009; Avramidis and Koehn, 2008).
Rich to Poor
Comparatively easier than poor-to-rich, but the problem of data sparsity remains: an
unseen inflected form of a word will be treated as an OOV by the phrase based model.
E.g., if the word अच्छा has occurred in the training corpus but अच्छे has not, अच्छे will be treated as an OOV.
Source side normalization is needed (Zollmann et al., 2006) (Arabic to English).
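A toy illustration of such source-side normalization (the lemma table and word lists are hypothetical; Zollmann et al. used a real Arabic morphological analyzer):

```python
# Hypothetical lemma lookup for illustration — not the actual
# analyzer used by Zollmann et al. (2006).
LEMMA = {"houses": "house", "played": "play", "playing": "play"}

def normalize(tokens):
    """Map each surface form to its lemma, so unseen inflections
    collapse onto forms the phrase table has already seen."""
    return [LEMMA.get(t, t) for t in tokens]

print(normalize(["he", "played", "near", "houses"]))
# ['he', 'play', 'near', 'house']
```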
Poor to Poor
Phrase based SMT works well if both the languages are morphologically poor.
Rich to Rich
This case is difficult to handle with SMT; mostly interlingua based systems are found
(Shilon et al., 2010) (Hebrew to Arabic).
Case markers and Morphology
Addressing the crux of the fluency problem in English-Hindi SMT

By Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, Pushpak Bhattacharyya

Presented by Jayaprakash S, Subhash K, Samiulla S
Aim
Accurately generating case markers and suffixes for English-Hindi translation.
What entity on the English side encodes the information contained in case markers and suffixes on the Hindi side?
Introduction
The fundamental problems in English-Indian language SMT are:
1. Wide syntactic divergence, which causes word ordering issues in output translations.
2. Richer case marking and suffixes in Indian languages as compared to English.
Being free word order languages, Indian languages suffer badly when morphology and case markers are incorrect.
Motivation
Difference between English and Hindi:
1. English follows SVO, Hindi follows SOV in general.
2. English uses post-modifiers, whereas Hindi uses pre-modifiers.
3. Hindi allows greater freedom in word order, identifying constituents depending on case marking.
4. Hindi is relatively richer in morphology.
Motivation contd...
Problems 1 and 2 are addressed by reordering the English sentence into Hindi order in a preprocessing step.
Here, we focus on solving problems 3 and 4.
Motivation (Case Markers)
Major constituents (subject, object, ...) in English are identified by their positions in the sentence.
In Hindi, constituents can be moved around without changing the core meaning. Case markers and suffixes are used to identify the constituents.
Example:
राम ने रावण को मारा . (ram ne ravan ko mara) [Ram killed Ravan]
रावण को राम ने मारा . (ravan ko ram ne mara) [Ram killed Ravan]
Motivation (Morphology)
Oblique case: लडके पाठशाला गये । लडकों ने शोर मचाया ।
Future tense: लडके पाठशाला जायेंगे ।
Causative form: लडकों ने उन्हें लाया ।
Motivation (Sparsity)
In the English-Hindi case, sparsity is a big problem.
It is very unlikely that all words will appear with all case markers in the training corpus.
Example: if 'लडके' appears in the training data but 'लडकों' does not, 'लडकों' will be treated as an OOV.
Approach
The goal is to carry the semantics and suffix information from the English side to the Hindi side.
The factored model acts as a vehicle for this information across languages.
Factored model:
p(e|f) = (1/Z) exp( Σᵢ λᵢ hᵢ(e, f) )
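A toy instance of this log-linear formulation, to make the normalization constant Z concrete (the feature functions and weights below are invented for illustration):

```python
import math

def loglinear_probs(f, candidates, features, weights):
    """p(e|f) = (1/Z) * exp(sum_i lambda_i * h_i(e, f)) over the candidates."""
    def score(e):
        return math.exp(sum(w * h(e, f) for h, w in zip(features, weights)))
    Z = sum(score(e) for e in candidates)  # partition function
    return {e: score(e) / Z for e in candidates}

# hypothetical features: length agreement and word overlap with the source
features = [
    lambda e, f: 1.0 if len(e.split()) == len(f.split()) else 0.0,
    lambda e, f: float(len(set(e.split()) & set(f.split()))),
]
weights = [0.5, 1.0]
probs = loglinear_probs("das haus", ["the house", "house"], features, weights)
```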
Factors used
Lemma -----> Lemma: boy -----> लडक्
Suffix + Semantics -----> Case Marker / Suffix: -s + subj -----> ए
Lemma + Suffix -----> Surface Form: लडक् + ए -----> लडके
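The three mappings can be walked through with single-entry toy tables (for illustration only; the real model learns probabilistic tables):

```python
# One entry per table, taken from the slide's example.
LEMMA_MAP = {"boy": "लडक्"}                 # Lemma -> Lemma
SUFFIX_SEM_MAP = {("-s", "subj"): "ए"}      # Suffix + Semantics -> Suffix
GENERATE = {("लडक्", "ए"): "लडके"}           # Lemma + Suffix -> Surface Form

def translate_factored(lemma, suffix, semantic_role):
    tgt_lemma = LEMMA_MAP[lemma]
    tgt_suffix = SUFFIX_SEM_MAP[(suffix, semantic_role)]
    return GENERATE[(tgt_lemma, tgt_suffix)]

print(translate_factored("boy", "-s", "subj"))  # लडके
```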
Factored Model
[Diagram: factors of the reordered input (word, lemma, suffix, semantic relation) are mapped to the output factors (lemma, suffix/case marker, word).]
Semantic Relations
The experiments were conducted with two kinds of semantic relations:
1. Relations from the Universal Networking Language (UNL) (44 relations).
2. Grammatical relations produced by the Stanford dependency parser (55 relations).
Stanford Semantic graph
[Dependency graph for "John said that he was hit by Jack": nsubj(said, John), ccomp(said, hit), complm(hit, that), nsubjpass(hit, he), auxpass(hit, was), agent(hit, Jack).]
UNL semantic graph
[UNL graph for "John said that he was hit by Jack": agt(say.@entry.@past, John), obj(say, :01); within scope :01: agt(hit.@entry.@past, Jack), obj(hit, he).]
Experimental setup
The SRILM toolkit was used for building the language model.
Training, tuning, and decoding were done using the Moses toolkit.
The Stanford dependency parser was used for extracting semantic relations.
Hindi suffix separation was done using the stemmer described in (Ananthakrishnan and Rao, 2003).
Evaluation criteria
BLEU: measures the precision of n-grams with respect to the reference translations, with a brevity penalty.
NIST: measures the precision of n-grams. This metric is a variant of BLEU, which was shown to correlate better with human judgments.
Subjective: Human evaluators judged the fluency and adequacy, and counted the number of errors in case markers and morphology.
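A minimal single-reference BLEU sketch of the n-gram precision and brevity penalty just described (no smoothing, unlike real evaluation tools):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```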
Results
MODEL                       BLEU    NIST
Baseline (surface)          24.32   5.85
lemma + suffix              25.16   5.87
lemma + suffix + unl        27.79   6.05
lemma + suffix + stanford   28.21   5.99
Conclusion and future work
A marked improvement is observed in English-Hindi SMT after using the factored model.
The improvement is not only statistically significant but is also verified by subjective evaluation.
Future work: correctly combining smaller parts into a bigger output sentence of good quality, since shorter sentences achieve better accuracy.
References
• Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. 2009. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of ACL-IJCNLP 2009, pages 800–808, Suntec, Singapore. Association for Computational Linguistics.
• Ananthakrishnan, R., and Rao, D. 2003. A Lightweight Stemmer for Hindi. In Workshop on Computational Linguistics for South-Asian Languages, EACL.
• Koehn, P., and Hoang, H. 2007. Factored Translation Models. In Proceedings of EMNLP.
• Marie-Catherine de Marneffe and Christopher Manning. 2008. Stanford Typed Dependencies Manual.
References (contd.)
• Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pages 127–133.
• Andreas Zollmann, Ashish Venugopal, and Stephan Vogel. 2006. Bridging the inflection morphology gap for Arabic statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (NAACL-Short '06), pages 201–204. Association for Computational Linguistics, Stroudsburg, PA, USA.
• Avramidis, E., and Koehn, P. 2008. Enriching Morphologically Poor Languages for Statistical Machine Translation. In Proceedings of ACL-08: HLT.
• Reshef Shilon, Nizar Habash, Alon Lavie, and Shuly Wintner. 2010. Machine translation between Hebrew and Arabic: Needs, challenges and preliminary solutions. In Proceedings of AMTA 2010: The Ninth Conference of the Association for Machine Translation in the Americas, November.
Interpolated Back-off for Factored Translation Models
Philipp Koehn, Barry Haddow
The Tenth Biennial Conference of the Association for Machine Translation in the Americas, 2012

Presented by Samiulla, Jayaprakash, Subhash, CS712, 2013

Plan
1) Phrase based models (limitation)
2) Factor based models (model & limitation)
3) Back-off
4) Interpolated back-off
5) Results
6) Demo
Phrase based models
1. Pure phrase based models treat 'house' and 'houses' as completely different words.
2. 'house' occurring in the training data has no effect on learning the translation of 'houses'.
Solutions
Possible solutions:
1) increase the corpus size so that it covers all morphological variants
2) alter the model so that it learns morphological generalizations
Corpus Size
increasing the corpus size:
5 → avg. number of target words a source word maps to (synonymy and polysemy)
10 → avg. sentence length
1 million → vocabulary size
Parallel sentences required = 5 x 10 x 1 million = 50 million
Morphologically rich languages do not stop with this count !!!
increasing the corpus size (morphologically richer languages)
A verb is inflected by person (a), number (b), gender (c), aspect (d), tense (e), and voice (f).
The approximate number of mappings per word will be m = 2^(a+b+c+d+e+f) x 5.
Nouns are similarly inflected by case.
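The back-of-envelope numbers above, checked in code (the values of a..f below are hypothetical — one binary distinction per feature):

```python
# Slide's estimate for a morphologically poor language pair.
avg_mappings = 5        # avg. target words a source word maps to
sent_len = 10           # avg. sentence length
vocab = 1_000_000       # vocabulary size
pairs_needed = avg_mappings * sent_len * vocab
print(pairs_needed)     # 50000000

# Morphologically rich case: m = 2^(a+b+c+d+e+f) x 5, assuming one
# binary distinction per feature (hypothetical values).
a = b = c = d = e = f = 1
m = 2 ** (a + b + c + d + e + f) * avg_mappings
print(m)                # 320
```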
increasing the corpus size (morph richer): morphological variants
English sentence: "(I) was being destroyed"
அழிந்து கொண்டு இருந்தேன் (Tamil)

azhi + ntu + kontu + iruntaen (transliteration):
  root (destroy) + voice marker (past tense) + tense marker (during) + aspect marker (past progressive) + first person singular
azhi + ntu + kontu + iruntatal (transliteration):
  same analysis, with feminine agreement
azhi + ththu + kontu + iruntatharkal (transliteration):
  root (destroy) + active marker + tense marker + aspect marker
Phrase based models
alter the model
Change the phrase based model so that it can learn inflections/syntax.
FACTOR BASED MODEL
Factored Model - 1/6
1) Allows for decomposition and enrichment of the phrase-based translation model
2) Two translation steps*, one for lemma and one for morphological properties, and a generation step to produce the target surface form.
Factored Model- 2/6
[Diagram: input factors (word, lemma, morphology) are mapped to output factors (word, lemma, morphology).]
Factored Model- 3/6
Log Linear Model:
P(e|f) = exp( Σᵢ₌₁ⁿ λᵢ hᵢ(e, f) )
It includes several components:
LM:          h_LM(e, f) = p_LM(e) = p(e₁) p(e₂|e₁) ... p(e_m|e_{m−1})
Translation: h_trans(e, f) = P(e|f)
The translation step is decomposed into many mapping and generation steps.
Factored Model - 4/6
Introduce latent variables (e_l, e_m) on the English side.
Instead of summing over all derivations, approximate by taking the best one.
P(e_s | f_s) = P(e_s | f_s, f_l, f_m): we know the source side factors (f_s, f_l, f_m) and must find e_s.
Factored Model - 5/6
Decompose the fully-factored model into three mapping steps using the chain rule.
Factored Model - 6/6
With independence assumptions:
1) The mapping steps' probability distributions are estimated from the word-aligned corpus.
2) The generation model is estimated from a monolingual target side corpus.
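A toy sketch of this decomposition with made-up probability tables and a made-up English-German example (not the paper's actual models):

```python
# P(e_s | f) ≈ P(e_l | f_l) * P(e_m | f_m) * P(e_s | e_l, e_m)
# under the independence assumptions above. All numbers are invented.
P_LEMMA = {("house", "haus"): 0.9}            # lemma mapping step
P_MORPH = {("plural", "PL"): 0.8}             # morphology mapping step
P_GEN = {(("haus", "PL"), "häuser"): 0.95}    # generation step (target side)

def p_surface(f_l, f_m, e_l, e_m, e_s):
    return (P_LEMMA.get((f_l, e_l), 0.0)
            * P_MORPH.get((f_m, e_m), 0.0)
            * P_GEN.get(((e_l, e_m), e_s), 0.0))

p = p_surface("house", "plural", "haus", "PL", "häuser")
```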
What if we use a language which is poor in morphology, and we knew that all words in the vocabulary occurred a sufficient number of times in the corpus?
Who will win?
1) Factor based  2) Phrase based
Questions-1/3
Questions-2/3
[Diagram: decomposed factors — the input factors (word, lemma, morphology) are mapped to the output lemma and morphology, which then generate the output surface form.]
Decomposed Factors
Will the independence assumptions in the factored model harm anything, anywhere?
If we use a phrase based model, the phrase table has many translation options for each source phrase.
Is there any way to find out whether the model has learned enough about a particular phrase?
Questions-3/3
Analysis
How big is the portion of rare words in the test set, and do they get translated significantly worse?
Precision by count
Balance between
1) traditional surface form translation models
2) factored models that decompose translation into lemma and morphological feature mapping steps.
Interpolated back-off
Pure phrase based models cause sparse data problems in model estimation, affecting
both the translation model and the language model.
Factor based models
may be harmed by their independence assumptions.
Motivation
Backoff models rely primarily on surface translation model but back off to the decomposed model for unknown word forms.
Interpolated backoff models combine surface and factored translation models, relying more heavily on the surface models for frequent words, and more heavily on the factored models for the rare words.
Motivation
Backoff
1) Primarily relies on the phrase-based model
2) Only for unknown words and phrases is the factored model used to obtain possible translations.
3) We may introduce a third model that relies on synonyms to increase coverage.
Interpolated Back-off
1) The back-off model uses the factored model only for unknown surface forms.
2) It does not change predictions for rare surface forms (seen once or twice); the factored model plays no role for them.
Interpolated Back-off
1) Subtract some of the probability mass from the translations e in the primary distribution p1(e|f) and use it for additional translations from the secondary distribution p2(e|f).
2) Obtain α(e|f) by absolute discounting: subtract a fixed discount D from each count.
Interpolated Back-off
Example (D = 0.5):

f (Eng)   e (Tamil)             Count   P1(e|f)   α(e|f)
House     Veedu (house)         5       0.72      0.64
          Veettai (house-acc)   1       0.14      0.07
          Veettin (of house)    1       0.14      0.07

The remaining mass (1 − 0.64 − 0.07 − 0.07) = 0.22 is the weight given to the factored model P2(e|f).
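The discounting arithmetic can be sketched as follows; the counts and D = 0.5 come from the example above, while the factored distribution p2 is hypothetical:

```python
def interpolated_backoff(counts, p2, D=0.5):
    """Absolute discounting: subtract D from each surface count, and give
    the freed probability mass to the factored distribution p2(e|f)."""
    total = sum(counts.values())
    alpha = {e: (c - D) / total for e, c in counts.items()}  # discounted p1
    mass = 1.0 - sum(alpha.values())  # = D * len(counts) / total
    return {e: alpha.get(e, 0.0) + mass * p2.get(e, 0.0)
            for e in set(counts) | set(p2)}

counts = {"Veedu": 5, "Veettai": 1, "Veettin": 1}
p2 = {"Veedu": 0.5, "Veettai": 0.3, "Veettin": 0.2}  # hypothetical factored model
p = interpolated_backoff(counts, p2)
```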
Experiments
1) Training data: European Parliament proceedings and collected news commentaries
2) Test set: a collection of news stories
3) LM: 5-gram lemma model, 7-gram morphology-sequence model
4) Word alignment: on lemmas instead of surface forms
Results
1. A plain surface phrase-based model that uses only surface forms
2. A joint factored model that translates all factors (surface, lemma, morphology) in one translation step
3. A back-off model from the joint phrase-based model to the decomposed model
4. An interpolated back-off version of model 3
5. A lemma back-off model from the joint phrase-based model to a model that maps from the source lemma into all target factors
6. An interpolated back-off version of model 5
Analysis
1) Quadratmeter (German for "square meter") occurred 3 times and was correctly translated by the interpolated back-off model, but not by plain back-off.
2) The German word Gewalten was translated incorrectly into "violence" by the interpolated back-off model, while the simple back-off model arrived at the right translation, "powers". The word occurred only three times in the corpus, with the acceptable translations "powers", "forces", and "branches", but its singular form Gewalt is very frequent and almost always translates into "violence".
Conclusion
Back-off methods improve the translation of rare words by combining surface word translation with translations obtained from a decomposed factored model.
They yield gains in BLEU and improved translation accuracy.
Demo
Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
Ananthakrishnan Ramanathan, Pushpak Bhattacharyya (Indian Institute of Technology)
Jayprasad Hegde, Ritesh M. Shah, Sasikumar M (CDAC Mumbai)
Road Map
• Motivation
• Proposed solution
• Syntactic processing
• Morphological processing
• System overview
• Experimental evaluation
• Results
• Conclusion
Indian languages differ from English in terms of word order.
Cost of reordering: not-so-good translations. They are also morphologically rich:
the parallel corpus should cover a large number of word forms, yet large amounts of parallel corpora are unavailable.
English → SMT → हिन्दी
How can we achieve a reasonable improvement?
Motivation
Reorder the English sentence as per Hindi syntax, by applying transformation rules to the English parse tree.
Make use of suffixes of Hindi words, using simple suffix separation.
Proposed solution
Syntactic reordering performed as a preprocessing stage
Transformation rule
1. SVO order is converted to SOV.
2. Post-modifiers are converted to pre-modifiers.
Syntactic processing
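The two rules can be illustrated on a toy flat constituent list (a real system applies them to the full parse tree):

```python
def reorder_svo_to_sov(constituents):
    """Rule 1: constituents is a list of (label, text) with labels S, V, O."""
    order = {"S": 0, "O": 1, "V": 2}
    return sorted(constituents, key=lambda c: order[c[0]])

def post_to_premodifier(head, modifier):
    """Rule 2: an English post-modifier becomes a Hindi pre-modifier."""
    return [modifier, head]

sent = [("S", "the boy"), ("V", "ate"), ("O", "an apple")]
print([text for _, text in reorder_svo_to_sov(sent)])
# ['the boy', 'an apple', 'ate']
```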
Example
Syntactic processing
Different morphological forms of a word are not considered as independent entities; this deals with data sparsity and results in alignment of morphs instead of word forms.
Consider that the training corpus contains only one instance of "players":
English: Players should just play.
Hindi: खिलाड़ियों को केवल खेलना चाहिए (khilaadiyom ko keval khelanaa caahie)
Morphological processing
Consider the input sentence, English: "The men came across some players". Expected Hindi translation:
आदमियों को कुछ खिलाड़ी मिले (aadmiyon ko kuch khiladii mile)
If morphology is not used, the system will choose खिलाड़ियों for "players".
Morphological processing
Morphological analyzer for English (Minnen et al., 2001).
Suffix separation program for Hindi (Ananthakrishnan and Rao, 2003):
it extracts the longest possible suffix for each word.
Tools for Morphological information extraction
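Longest-suffix stripping can be sketched in transliteration (the suffix list here is hypothetical; it is not the actual list from Ananthakrishnan and Rao, 2003):

```python
SUFFIXES = ["iyom", "iyaan", "om", "em", "e", "ii"]  # hypothetical list

def split_suffix(word):
    """Strip the longest matching suffix; return (stem, suffix)."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return word[:-len(suf)], suf
    return word, ""

print(split_suffix("khilaadiyom"))  # ('khilaad', 'iyom')
```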
Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M., Simple Syntactic and Morphological ProcessingCan Help English-Hindi Statistical Machine Translation, Proceedings of IJCNLP, 2008
System overview
Corpus
BLEU, mWER and SSER are used to evaluate.

                      #sentences   #words
Training              5000         120153
Development           483          11675
Test                  400          8557
Monolingual (Hindi)   49937        1123966
Experimental evaluation
mWER (multi-reference word error rate): edit distance to the most similar reference translation.
SSER (subjective sentence error rate): based on human judgement, with classes
0 – nonsense, 1 – roughly understandable, 2 – understandable, 3 – good, 4 – perfect.
The lower the SSER, the better the translation.
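mWER can be sketched as below; normalizing by the closest reference's length is an assumption here (evaluation tools differ in the exact normalization term):

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance by dynamic programming."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def mwer(hyp, references):
    """Edit distance to the most similar reference, length-normalized."""
    return min(edit_distance(hyp, r) / len(r) for r in references)
```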
Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney, An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research, International Conference on Language Resources and Evaluation, pages 39–45, 2000.
Evaluation
SSER:
v(s,t) is the score assigned to translation t of sentence s,
n is the number of translation pairs,
K is the number of evaluation classes.
Evaluation
Base line
Phrase based model (Koehn et al., 2003), where f is the source sentence, e a candidate translation, and ê the translation with the highest probability:
ê = argmax_e p(e|f)
which can be rewritten using Bayes' decision rule as
ê = argmax_e p(e) p(f|e)
The translation model p(f|e) is computed using a phrase translation probability distribution.
Base line: Phrase translation table
The parallel corpus is word-aligned, and phrase correspondences are found. Given the set of phrase pairs (f̄, ē), the phrase translation probability is
φ(f̄|ē) = count(f̄, ē) / Σ_f̄′ count(f̄′, ē)
The language model p(e) used is a trigram model with modified Kneser-Ney smoothing.
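The relative-frequency estimate can be sketched from extracted phrase pairs (the Hindi-English pairs below are made up):

```python
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    """phi(f|e) = count(f, e) / sum_f' count(f', e), by relative frequency."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

pairs = [("ghar", "house"), ("ghar", "house"), ("makaan", "house"),
         ("bada ghar", "big house")]
phi = phrase_translation_probs(pairs)
```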
Technique                                       BLEU    mWER    SSER    Roughly understandable   Understandable
baseline (phrase based) (Koehn et al., 2003)    12.10   77.49   91.20   10%                      0%
baseline+syn                                    16.90   69.18   74.40   42%                      12%
baseline+syn+morph                              15.88   70.69   66.40   46%                      28%
Results
Incorporating syntactic and morphological information increases translation quality.
This is useful for English to any Indian language translation.
Conclusion
Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M., Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation, Proceedings of IJCNLP, 2008.
Philipp Koehn, Franz Josef Och, and Daniel Marcu, Statistical Phrase-based Translation, Proceedings of HLT-NAACL, 2003.
Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney, An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research, International Conference on Language Resources and Evaluation, pages 39–45, 2000.
References