
Document Context Neural Machine Translation with Memory Networks


Sameen Maruf, Gholamreza Haffari

Faculty of Information Technology

Monash University

July 17, 2017


Overview

1 Introduction

2 Document MT as Structured Prediction

3 Document NMT with MemNets

4 Experiments and Analysis

5 Conclusion

6 References

Introduction

Why document-level machine translation?

Most MT models translate sentences independently

Discourse phenomena are ignored, e.g. pronominal anaphora and lexical consistency, which may involve long-range dependencies

Statistical MT approaches to document MT do not yield significant empirical improvements [Hardmeier and Federico, 2010; Gong et al., 2011; Garcia et al., 2014]

Previous context-NMT models only use local context, and report deteriorated performance when using the target-side context [Jean et al., 2017; Wang et al., 2017; Bawden et al., 2018]

We incorporate global source and target document contexts


Document MT as Structured Prediction

Two types of factors: f_θ(y_t ; x_t, x_{−t}) and g_θ(y_t ; y_{−t})


Training objective: maximise P(y_1, ..., y_{|d|} | x_1, ..., x_{|d|})

⟹ maximise the pseudo-likelihood:

    argmax_θ ∏_{t=1}^{|d|} P_θ(y_t | x_t, y_{−t}, x_{−t})        (1)

where f_θ and g_θ are subsumed in P_θ(y_t | x_t, y_{−t}, x_{−t})
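As a sketch, the pseudo-likelihood objective decomposes into one term per sentence, each conditioned on its source sentence plus all other source and target sentences. The helper below assumes a hypothetical log_prob(y_t, x_t, y_rest, x_rest) scoring function standing in for log P_θ; it is illustrative, not the paper's implementation.

```python
def doc_pseudo_log_likelihood(xs, ys, log_prob):
    """Log pseudo-likelihood of a document: sum over sentences t of
    log P(y_t | x_t, y_{-t}, x_{-t}), where -t means "all but t"."""
    total = 0.0
    for t in range(len(xs)):
        x_rest = xs[:t] + xs[t + 1:]   # x_{-t}: all other source sentences
        y_rest = ys[:t] + ys[t + 1:]   # y_{-t}: all other target sentences
        total += log_prob(ys[t], xs[t], y_rest, x_rest)
    return total
```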


Challenge: at test time, the target document is not given

Coordinate Ascent (i.e., Iterative Decoding)
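A minimal sketch of the coordinate-ascent decoding loop: initialise with context-free translations, then repeatedly re-translate each sentence given the current draft of the rest. The translate(x_t, y_rest, x_rest) function is a hypothetical stand-in for sentence-level decoding with document context.

```python
def iterative_decode(xs, translate, rounds=2):
    """Coordinate ascent over target sentences of one document."""
    # First pass: translate each sentence with no document context.
    ys = [translate(x, [], []) for x in xs]
    # Subsequent passes: condition each sentence on the other drafts.
    for _ in range(rounds):
        for t in range(len(xs)):
            x_rest = xs[:t] + xs[t + 1:]
            y_rest = ys[:t] + ys[t + 1:]
            ys[t] = translate(xs[t], y_rest, x_rest)
    return ys
```

Each re-translation can only increase (or keep) the pseudo-likelihood term it optimises, so a few rounds typically suffice before the drafts stop changing.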

Document NMT with MemNets

The NMT model parameterises P_θ(y_t | x_t, y_{−t}, x_{−t})


Memory-to-Context:

s_{t,j} = GRU(s_{t,j−1}, E_T[y_{t,j−1}], c_{t,j}, c_t^src, c_t^trg)
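An illustrative pure-Python sketch of this decoder update: the GRU input is the previous target embedding concatenated with the attentional context and the two memory context vectors. Sizes, weights, and vector values are toy assumptions, not the paper's configuration.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(s, x, U, W):
    """One GRU update; U maps the hidden state, W maps the (extended) input."""
    z = [sigmoid(a + b) for a, b in zip(matvec(U["z"], s), matvec(W["z"], x))]
    r = [sigmoid(a + b) for a, b in zip(matvec(U["r"], s), matvec(W["r"], x))]
    rs = [ri * si for ri, si in zip(r, s)]
    h = [math.tanh(a + b) for a, b in zip(matvec(U["h"], rs), matvec(W["h"], x))]
    return [(1 - zi) * si + zi * hi for zi, si, hi in zip(z, s, h)]

random.seed(0)
hid, dim = 3, 2            # hidden size; size of each input part (toy values)
inp = 4 * dim              # embedding + attention ctx + c_src + c_trg
U = {k: [[random.uniform(-0.1, 0.1) for _ in range(hid)] for _ in range(hid)] for k in "zrh"}
W = {k: [[random.uniform(-0.1, 0.1) for _ in range(inp)] for _ in range(hid)] for k in "zrh"}

emb_prev = [0.1, -0.2]     # E_T[y_{t,j-1}]
attn_ctx = [0.3, 0.0]      # c_{t,j}, sentence-level attention context
c_src    = [0.2, 0.1]      # source document memory context
c_trg    = [-0.1, 0.4]     # target document memory context

s_prev = [0.0] * hid
# Memory-to-Context: the memories enter through the GRU's input.
s_new = gru_step(s_prev, emb_prev + attn_ctx + c_src + c_trg, U, W)
```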


Memory-to-Output:

y_{t,j} ∼ softmax(W_y · r_{t,j} + W_{ym} · c_t^src + W_{yt} · c_t^trg + b_y)
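A toy sketch of the Memory-to-Output combination: the two memory context vectors contribute additive terms to the pre-softmax logits, one score per vocabulary word. All matrices and vectors here are illustrative stand-ins.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)                              # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def output_distribution(r, c_src, c_trg, Wy, Wym, Wyt, by):
    """logits = Wy·r + Wym·c_src + Wyt·c_trg + by, then softmax."""
    logits = [a + b + c + d for a, b, c, d in
              zip(matvec(Wy, r), matvec(Wym, c_src), matvec(Wyt, c_trg), by)]
    return softmax(logits)

# Toy sizes: vocabulary of 4 words, hidden/context size 2.
Wy  = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.0, 0.1]]
Wym = [[0.2, 0.0], [-0.1, 0.1], [0.3, -0.2], [0.1, 0.1]]
Wyt = [[0.0, 0.4], [0.2, -0.3], [0.1, 0.1], [-0.2, 0.0]]
by  = [0.0, 0.1, -0.1, 0.0]

probs = output_distribution([0.5, -0.5], [0.2, 0.1], [-0.1, 0.3], Wy, Wym, Wyt, by)
```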


Use only source, target, or both external memories

Use the Memory-to-Context or Memory-to-Output architecture to incorporate the different contexts


Experiments and Analysis

Experimental Setup

Training/dev/test corpora statistics:

corpus                 | #docs (H)  | #sents (K) | avg doc len
Fr→En Ted-Talks        | 10/1.2/1.5 | 123/15/19  | 123/128/124
Et→En Europarl v7      | 150/10/18  | 209/14/25  | 14/14/14
De→En News-Commentary  | 49/.9/1.6  | 191/2/3    | 39/23/19

Evaluation Metrics: BLEU, METEOR

Baselines:

Context-free baseline (S-NMT)

Local source context baselines: [Jean et al., 2017] and [Wang et al., 2017]


Memory-to-Context Results


BLEU scores (Memory-to-Context):

             Fr→En   De→En   Et→En
S-NMT        20.85    9.18   20.42
S-NMT+src    21.91   10.20   22.10
S-NMT+trg    21.74    9.97   21.94
S-NMT+both   22.00   10.54   22.32


Memory-to-Output Results


BLEU scores (Memory-to-Output):

             Fr→En   De→En   Et→En
S-NMT        20.85    9.18   20.42
S-NMT+src    21.80    9.98   21.50
S-NMT+trg    21.76   10.04   21.82
S-NMT+both   21.77   10.23   22.20


Main Results


BLEU scores (comparison with context-NMT baselines):

                      Fr→En   De→En   Et→En
[Jean et al., 2017]   21.95   10.26   21.67
[Wang et al., 2017]   21.87   10.14   22.06
S-NMT+src             21.91   10.20   22.10
S-NMT+both            22.00   10.54   22.32


Example translation

Source:              qimonda taidab lissaboni strateegia eesmarke.
Target:              qimonda meets the objectives of the lisbon strategy.
S-NMT:               <UNK> is the objectives of the lisbon strategy.
+Src Mem:            the millennium development goals are fulfilling the millennium goals of the lisbon strategy.
+Trg Mem:            in writing. - (ro) the lisbon strategy is fulfilling the objectives of the lisbon strategy.
+Both Mems:          qimonda fulfils the aims of the lisbon strategy.
[Wang et al., 2017]: <UNK> fulfils the objectives of the lisbon strategy.


Example translation (contd.)

Source:              ... et riigis kehtib endiselt lukasenka diktatuur, mis rikub inim- ning etnilise vahemuse oigusi.
Target:              ... this country is still under the dictatorship of lukashenko, breaching human rights and the rights of ethnic minorities.
S-NMT:               ... the country still remains in a position of lukashenko to violate human rights and ethnic minorities.
+Src Mem:            ... the country still applies to the brutal dictatorship of human and ethnic minority rights.
+Trg Mem:            ... the country still keeps the <UNK> dictatorship that violates human rights and ethnic rights.
+Both Mems:          ... the country still persists in lukashenko's dictatorship that violate human rights and ethnic minority rights.
[Wang et al., 2017]: ... there is still a regime in the country that is violating the rights of human and ethnic minority in the country.

Conclusion

Proposed a model which incorporates the global source and target document contexts

Proposed effective training and decoding methodologies for our model

Future Work: investigate document-context NMT models which incorporate specific discourse-level phenomena


References I

Hardmeier, C. and Federico, M. (2010). Modelling pronominal anaphora in statistical machine translation. International Workshop on Spoken Language Translation.

Gong, Z., Zhang, M. and Zhou, G. (2011). Cache-based document-level statistical machine translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Garcia, E. M., Espana-Bonet, C. and Marquez, L. (2014). Document-level machine translation as a re-translation process. Procesamiento del Lenguaje Natural, 53:103–110.

Jean, S., Lauly, S., Firat, O. and Cho, K. (2017). Does Neural Machine Translation Benefit from Larger Context? arXiv:1704.05135.

Wang, L., Tu, Z., Way, A. and Liu, Q. (2017). Exploiting Cross-Sentence Context for Neural Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Bawden, R., Sennrich, R., Birch, A. and Haddow, B. (2018). Evaluating Discourse Phenomena in Neural Machine Translation. Proceedings of NAACL-HLT 2018.
