How We Use Moses to Develop Our Multi-lingual Machine Translation Systems?
Chengqing ZONG (宗成庆), Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia, 23 April 2012


DESCRIPTION

In this talk, Chengqing Zong presents work at CASIA on developing statistical machine translation (MT) systems based on the open-source toolkit Moses. In recent years, CASIA has developed several MT systems, including Chinese-to-English, English-to-Chinese, Japanese-to-Chinese, Arabic-to-Chinese, Uigur-to-Chinese and Tibetan-to-Chinese systems. Moses is a basic translation engine in these systems. Chengqing shows how CASIA uses and extends Moses to develop its multilingual MT systems. This presentation is part of the MosesCore project, which encourages the development and use of open-source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission under Grant Number 288487 of the 7th Framework Programme. Latest news on Twitter: #MosesCore


Page 1:

How We Use Moses to Develop Our Multi-lingual Machine Translation Systems?

Chengqing ZONG (宗成庆)
Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
[email protected]

95 Zhongguancun East Road, Haidian District, Beijing 100190
http://www.nlpr.ia.ac.cn/cip/cqzong.htm
Email: [email protected]  Tel: +86-10-6255 4263

Page 2:

Outline

1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling

Page 3:

1. Brief Introduction to Our Work

Our group works on machine translation (MT) research and system development at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA).

- 6 staff members
- 8 Ph.D. candidates, 1 Master's student
- 5 visiting scholars

Page 4:

1.  Brief Introduction to Our Work

Page 5:

1. Brief Introduction to Our Work

Multilingual text-to-text translation system

[Figure: system diagram not recovered; surviving labels: Japanese, Chinese]

Page 6:

1. Brief Introduction to Our Work

- In the spoken language translation (SLT) evaluation organized by IWSLT 2007, our system's Chinese-to-English (CE) clean-text translation ranked best according to the human rankings.

Page 7:

1. Brief Introduction to Our Work

- In IWSLT 2008

[Figure: evaluation results not recovered; surviving label: CASIA]

Page 8:

1. Brief Introduction to Our Work

- In IWSLT 2009

[Figure: evaluation results not recovered; surviving label: CASIA]

Page 9:

1. Brief Introduction to Our Work

[Figure: evaluation results not recovered; surviving label: CASIA]

Page 10:

1. Brief Introduction to Our Work

[Figure: not recovered; surviving label: NLPR]

Page 11:

1. Brief Introduction to Our Work

- In the MT evaluation organized by the China Workshop on Machine Translation (CWMT) 2011 (Sept. 23-24), our system participated in all tasks:
  1. Chinese to English (News domain, progress)
  2. English to Chinese (News domain, progress)
  3. English to Chinese (News domain, current)
  4. English to Chinese (Science domain)
  5. Japanese to Chinese (News domain)
  6. Tibetan to Chinese (Government documents)
  7. Mongolian to Chinese (Daily)
  8. Uigur to Chinese (News domain)
  9. Kazakh to Chinese (News domain)
  10. Kyrgyz to Chinese (News domain)

19 units and 165 systems participated in this evaluation.

Page 12:

1. Brief Introduction to Our Work

According to BLEU scores, our system ranked first in the following 5 tasks:

- English to Chinese (News domain, progress)
- Japanese to Chinese (News domain)
- Tibetan to Chinese (Government documents)
- Mongolian to Chinese (Daily)
- Kyrgyz to Chinese (News domain)

It ranked second in the following 4 tasks:

- Chinese to English (News domain, progress)
- English to Chinese (News domain, current)
- Uigur to Chinese (News domain)
- Kazakh to Chinese (News domain)

Page 13:

Outline

1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling

Page 14:

2. Main Features of Moses

- The basic idea of statistical machine translation (SMT) can be formulated in principle as [equation not recovered]

Now it is usually implemented by a log-linear model:

$$e_{\text{best}} = \arg\max_{e} \; p(f \mid e) \times p_{\text{LM}}(e) \times \omega^{\text{length}(e)}$$

[Equation annotations on the slide: weight, feature]
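
For reference, a log-linear model is commonly written as a weighted sum of feature functions. The form below is the standard textbook formulation, not a reconstruction of the slide's lost rendering; the symbols $\lambda_i$ (weight), $h_i$ (feature) and $n$ (number of features) are notation assumed here.

```latex
e_{\text{best}} = \arg\max_{e} \sum_{i=1}^{n} \lambda_i \, h_i(f, e)
```

Under this notation, choosing $h_1 = \log p(f \mid e)$, $h_2 = \log p_{\text{LM}}(e)$ and $h_3 = \text{length}(e)$, with $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = \log \omega$, gives the logarithm of the product $p(f \mid e) \times p_{\text{LM}}(e) \times \omega^{\text{length}(e)}$, so both forms select the same translation.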

Page 15:

2. Main Features of Moses

Some useful features include:

- Phrase translation probability
- Lexical phrase translation probability
- Inverse phrase translation probability
- Inverse lexical phrase translation probability
- English language model based on n-grams
- English sentence length penalty
- Chinese phrase count penalty
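
To make the role of these seven features concrete, here is a minimal Python sketch of how a log-linear model combines them into a single score. All feature values and weights are invented for illustration (none come from the talk), and `loglinear_score` is a hypothetical helper, not a Moses API. In a real system the weights come from parameter tuning on development data.

```python
import math


def loglinear_score(features, weights):
    """Return the weighted sum of feature values: sum_i weight_i * feature_i."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values for one candidate translation.
# Probabilities enter as log-probabilities; penalties enter as counts.
features = {
    "phrase_translation": math.log(0.30),
    "lexical_translation": math.log(0.20),
    "inverse_phrase_translation": math.log(0.25),
    "inverse_lexical_translation": math.log(0.15),
    "language_model": math.log(0.01),
    "length_penalty": 6.0,   # number of target words
    "phrase_penalty": 3.0,   # number of phrases used
}

# Hypothetical weights; a tuning step would normally choose these.
weights = {
    "phrase_translation": 0.2,
    "lexical_translation": 0.1,
    "inverse_phrase_translation": 0.2,
    "inverse_lexical_translation": 0.1,
    "language_model": 0.5,
    "length_penalty": -0.1,
    "phrase_penalty": -0.2,
}

score = loglinear_score(features, weights)
print(f"model score: {score:.3f}")
```

The decoder compares such scores across candidate translations and keeps the highest-scoring ones.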

Page 16:

2. Main Features of Moses

A phrase-based example:

[Figure: the source sentence 欧洲 部分 地区 遭受 洪水 袭击 is shown with three numbered steps, (1) to (3). Its phrases are translated as "Europe", "parts of" and "hit by floods", giving the output "parts of Europe hit by floods".]
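
The phrase-based example can be sketched in a few lines of Python: segment the source into phrases, translate each phrase, then emit the translated phrases in target order. The phrase table holds only the three phrase pairs visible in the example, and the target order is supplied by hand. Both are simplifying assumptions; a real decoder searches over many segmentations, translation options and orders.

```python
# Toy phrase table with the three phrase pairs from the example.
PHRASE_TABLE = {
    ("欧洲",): "Europe",
    ("部分", "地区"): "parts of",
    ("遭受", "洪水", "袭击"): "hit by floods",
}


def segment(tokens, table):
    """Greedy longest-match segmentation of source tokens into known phrases."""
    max_len = max(len(key) for key in table)
    phrases, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in table:
                phrases.append(candidate)
                i += n
                break
        else:
            raise KeyError(f"no phrase covers {tokens[i]!r}")
    return phrases


def translate(source, table, order):
    """Translate each source phrase and output them in the given target order."""
    phrases = segment(source.split(), table)
    return " ".join(table[phrases[k]] for k in order)


# Target order assumed by hand: second source phrase first, then first, then third.
print(translate("欧洲 部分 地区 遭受 洪水 袭击", PHRASE_TABLE, order=[1, 0, 2]))
```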

Page 17:

2. Main Features of Moses

The Framework:

[Figure: framework diagram not recovered. Surviving labels: Parallel data, Development data, Test data; Moses training, Moses decoder, Moses evaluation; Translation model, Target translation, Good or bad]
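
The evaluation stage of the framework is usually scored with BLEU, the metric cited for the CWMT 2011 results. Below is a simplified Python sketch of corpus-level BLEU. It assumes a single reference per sentence, whitespace tokenization and no smoothing, and it is not the scoring script shipped with Moses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


reference = ["parts of Europe hit by floods"]
print(bleu(["parts of Europe hit by floods"], reference))
print(bleu(["parts of Europe hit by storms"], reference))
```

An identical hypothesis reaches the maximum score, and changing a single word lowers every n-gram precision at once, which is why BLEU works as a quick "good or bad" signal.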

Page 18:

2. Main Features of Moses

- Offers two types of translation models: phrase-based and tree-based
- Supports factored translation models
- Allows decoding of different kinds of input: sentences, confusion networks and word lattices

Page 19:

2. Main Features of Moses

- Supports n-best translation output in addition to the single best translation
  - This is a good conference.
  - This was a great conference.
  - It is a good meeting.
  - ...
- Provides an experiment management system
- Translates fast with good translation quality
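
As a small illustration of n-best output, the Python sketch below keeps the n highest-scoring hypotheses from a scored candidate list. The scores are invented for illustration, the fourth candidate is an invented low-scoring distractor, and `n_best` is a hypothetical helper rather than a Moses function.

```python
import heapq


def n_best(scored_hypotheses, n):
    """Return the n highest-scoring (hypothesis, score) pairs, best first."""
    return heapq.nlargest(n, scored_hypotheses, key=lambda pair: pair[1])


# Hypothetical model scores (log domain, higher is better).
candidates = [
    ("It is a good meeting.", -4.1),
    ("This is a good conference.", -2.3),
    ("This was a great conference.", -3.0),
    ("This is good conference.", -7.5),  # invented distractor
]

for text, score in n_best(candidates, 3):
    print(f"{score:6.2f}  {text}")
```

Downstream steps such as reranking or system combination can then work on this short list instead of only the single best output.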

Page 20:

2. Main Features of Moses

- Balancing speed and quality
  - If we want translation speed, Moses provides many options to accelerate the translation process, such as the beam size and the granularity of translation rules.
  - If we pursue translation quality, Moses also allows us to enlarge the search space so as to have a better chance of finding a better translation.
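
The speed/quality trade-off of the beam size can be seen in a toy Python beam search. The two-step search problem and its scores are invented so that a greedy search (beam size 1) misses the best path; this illustrates pruning in general and is not the Moses decoder.

```python
# Hypothetical scores: score of choosing `token` after `previous` (log domain).
SCORES = {
    (None, "A"): -1.0,
    (None, "B"): -1.5,
    ("A", "X"): -3.0,
    ("B", "X"): -0.5,
}


def beam_search(steps, scores, beam_size):
    """Expand hypotheses step by step, keeping only the `beam_size` best each time."""
    beam = [((), 0.0)]  # (token sequence, accumulated score)
    for options in steps:
        expanded = []
        for sequence, total in beam:
            previous = sequence[-1] if sequence else None
            for token in options:
                expanded.append((sequence + (token,), total + scores[(previous, token)]))
        expanded.sort(key=lambda hyp: hyp[1], reverse=True)
        beam = expanded[:beam_size]  # pruning: a smaller beam is faster but riskier
    return beam[0]


steps = [["A", "B"], ["X"]]
print(beam_search(steps, SCORES, beam_size=1))  # fast, but commits to "A" too early
print(beam_search(steps, SCORES, beam_size=2))  # explores more, finds the better path
```

With beam size 1 the search discards "B" after the first step and ends with a total of -4.0, while beam size 2 keeps both options and reaches -2.0.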

Page 21:

2. Main Features of Moses

It keeps improving ...

- It now includes more and better translation models:
  - Hierarchical Phrase-based Translation Model (HPB)
  - Tree-to-Tree / String-to-Tree Translation Models
- It provides new features, such as faster language modeling, multi-threaded decoding and client-server translation.

Page 22:

2. Main Features of Moses

It allows extension ...

- Moses provides good documentation and a friendly interface
- We can upgrade the components if we need to
- We can develop hybrid translation methods in the framework of Moses

Page 23:

Outline

1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling

Page 24:

3. How We Use Moses?

- Moses facilitates our research work
  - For beginners in SMT
  - For researchers familiar with SMT
  - For engineers building an SMT system

Page 25:

3. How We Use Moses?

For beginners in SMT:
- For most beginners, Moses is the freshest and most vivid tutorial, giving them an intuitive feel for SMT;
- Its detailed guidance is very easy for beginners to follow;
- It provides a preliminary understanding of the modules involved in an SMT system;
- It helps beginners quickly locate the research topics in SMT that interest them.

Page 26:

3. How We Use Moses?

We use Moses as a tutorial tool.

Page 27:

3. How We Use Moses?

For researchers familiar with SMT:
- Moses provides the whole toolkit for building a translation system:
  - data preparation, word alignment, translation rule extraction, parameter tuning, decoding, and evaluation
- We just need to study the sub-models we are interested in, propose new algorithms, and finally verify their effectiveness using Moses
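
To show what the "translation rule extraction" step works on, here is a simplified Python sketch of the widely used consistency heuristic for extracting phrase pairs from a word-aligned sentence pair. It is the textbook baseline and not the new algorithm proposed by our group. The word alignment below is a hand-made assumption, and the sketch omits the usual extension over unaligned words.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    `alignment` is a set of (source index, target index) links. A pair is
    consistent when no word inside it is linked to a word outside it.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject the span if a target word in it links outside the source span.
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for (s, t) in alignment):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


src = "欧洲 部分 地区 遭受 洪水 袭击".split()
tgt = "parts of Europe hit by floods".split()
# Hypothetical alignment links, written as (source index, target index).
alignment = {(0, 2), (1, 0), (2, 1), (3, 4), (4, 5), (5, 3)}

for source_phrase, target_phrase in sorted(extract_phrases(src, tgt, alignment)):
    print(source_phrase, "|||", target_phrase)
```

A new alignment or extraction method can be swapped in at this step, with the rest of the pipeline unchanged, and then judged by the final translation quality.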

Page 28:

3. How We Use Moses?

- For example, we proposed a new algorithm for word alignment and translation rule extraction
- Moses lets us verify the effectiveness of the proposed methods in just a few days, which accelerates our research considerably
- Most importantly for MT researchers, Moses has become a de facto standard baseline against which to test their own models

Page 29:

3. How We Use Moses?

We develop new models to compare with Moses, and we propose new algorithms to implement on the Moses platform.

Page 30:

3. How We Use Moses?

[Figure: diagram relating source language and target language through the levels Phrases, Formalism gram., Syntax, Semantic and Interlingua. Models marked on it: Word-based model, Phrase-based, Hierarchical phrase based, String-to-tree, Tree-to-string, Tree-to-tree, Semantic]

Page 31:

3. How We Use Moses?

For engineers building an SMT system:
- They do not need to care about the principles of how Moses works
- They just need to provide training data, development data, and test data
- Do some pre-processing to clean the data
- Do some post-processing to convert the output

Page 32:

[Figure: system combination diagram not recovered. Surviving labels: Source sentence, Pre-processing; MT engine 1, MT engine 2, ..., MT engine 6; n-best list (one per engine); Merged n-best list, MBR decoder; Word aligning, References for alignment; Merging alignments; Confusion network, Decoder based on C.N, Translation; Moses]
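
The MBR decoder named in this diagram can be illustrated with a minimal Python sketch of minimum Bayes risk selection over a merged n-best list. The posterior probabilities are invented, and the word-overlap similarity is a simple stand-in assumed here; the diagram does not say which gain function our system uses.

```python
from collections import Counter


def overlap(a, b):
    """Word-overlap similarity in [0, 1] (an assumed stand-in gain function)."""
    count_a, count_b = Counter(a.split()), Counter(b.split())
    common = sum((count_a & count_b).values())
    return 2.0 * common / (sum(count_a.values()) + sum(count_b.values()))


def mbr_select(nbest):
    """Pick the hypothesis with the highest expected similarity to all others.

    `nbest` is a list of (hypothesis, posterior probability) pairs.
    """
    def expected_gain(hypothesis):
        return sum(prob * overlap(hypothesis, other) for other, prob in nbest)

    return max((hyp for hyp, _ in nbest), key=expected_gain)


# Hypothetical merged n-best list with invented posteriors.
merged = [
    ("this is a good conference", 0.3),
    ("this was a great conference", 0.4),
    ("it is a good meeting", 0.3),
]
print(mbr_select(merged))
```

In this toy list the second hypothesis has the highest posterior, yet MBR selects the first one because it agrees most with the other candidates. That consensus effect is what makes MBR useful when combining outputs from several engines.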

Page 33:

3. How We Use Moses?

We also use Moses as a tool to evaluate the quality of collected parallel corpora: we can build an MT system on a corpus in two or three days and evaluate its translation quality, and we know how well translation quality reflects corpus quality.

Page 34:

3. How We Use Moses?

For example,

1-1 merkezdiki dölet apparatliri bilen jaylardiki dölet apparatlirining xizmet hoquqi merkezning bir tutash rehberlikide jaylarning teshebbuskarliqi we aktipliqini toluq jari qildurush prinsipi boyiche ayrilidu.

1-2 中央和地方的国家机构职权的划分,遵循在中央的统一领导下,充分发挥地方的主动性、积极性的原则。
(English gloss: The division of functions and powers between central and local state organs follows the principle of giving full play to local initiative and enthusiasm under the unified leadership of the central authorities.)

2-1 madda jungxua xelq jumhuriyitide hemme millet bapbarawer.

2-2 中华人民共和国各民族一律平等。
(English gloss: All ethnic groups in the People's Republic of China are equal.)

...

Page 35:

3. How We Use Moses?

Many systems participating in MT evaluations around the world employ Moses, for example in the NIST, WMT, IWSLT and CWMT evaluations.

Page 36:

3. How We Use Moses?

System   Uses Moses?
DCU      √
DFKI     √
FBK      √
KIT      √
LIG      √
LIMSI
LIUM     √
MIT
MSR
NICT     √
RWTH

7 of the 11 systems in the SLT evaluation of IWSLT 2011 employed Moses!

Page 37:

3. How We Use Moses?

System    Uses Moses?    System    Uses Moses?
DCU       √              HIT       √
NTT       √              IMNU      √
Systran   √              FRDC      √
ICT-CAS   √              BUAA      √
IA-CAS    √              XMU       √
IS-CAS    √              IIM       √
NEU                      NJU
XAUT      √              BJTU      √
ISTIC                    XJU       √
XJIPC     √

16 of the 19 systems in the MT evaluation of CWMT 2011 employed Moses!

Page 38:

Outline

1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling

Page 39:

4. Our Feeling

- Moses is our friend
  - It is a good helper and saves us a lot of labor
  - It is a good mirror that reflects the quality of our MT systems
  - It is a booster for MT research

We love our friend!

Page 40:

4. Our Feeling

- Moses is our competitor
  - As MT researchers, we hope to develop new translation models that surpass Moses
  - Competition helps us make progress

We love our competitor! We love Moses!

Page 41:

Thanks 谢谢!