Institute of Computational Linguistics
Contrastive Evaluation of Larger-context Neural Machine Translation
Kolloquium Talk 2018
Mathias Müller
Larger-context neural machine translation
Why larger context?
Source: However, the European Central Bank (ECB) took an interest in it in a report on virtual currencies published in October. It describes bitcoin as "the most successful virtual currency," […].
Target: Dennoch hat die Europäische Zentralbank (EZB) in einem im Oktober veröffentlichten Bericht über virtuelle Währungen Interesse hierfür gezeigt. Sie beschreibt Bitcoin als "die virtuelle Währung mit dem größten Erfolg" […].
(example taken from newstest2013.{de,en})
Why larger context?
Source: It describes bitcoin as "the most successful virtual currency".
Target: Es beschreibt den Bitcoin als "die erfolgreichste virtuelle Währung".
How to incorporate larger context?
Open question, preliminary works:
• gated auxiliary context or "warm start" decoder initialization with a document summary (Wang et al., 2017)
• additional encoder and attention network for the previous source sentence (Jean et al., 2017)
• concatenate the previous source sentence, marked with a prefix (Tiedemann and Scherrer, 2017)
• both source and target context (Miculicich Werlen et al., submitted)
• hierarchical attention, among other solutions (Bawden et al., submitted)
Additional encoder and attention network
• built on top of Nematus (Sennrich et al., 2017), which follows standard practice: an encoder-decoder framework with attention (Bahdanau et al., 2014)
• encoder and decoder are gated recurrent units (GRUs), a variant of RNNs
• the decoder is a GRU conditioned on the source sentence; the source representation is produced by the encoder and modulated by attention
• we additionally condition on preceding sentences, with additional encoders and separate attention networks
Recurrent neural networks refresher
RNN variant: gated recurrent unit (GRU)
Figure taken from Chung et al. (2014)
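The figure itself is not reproduced here; as a stand-in, these are the standard GRU equations from Chung et al. (2014), where $x_t$ is the input, $h_t$ the hidden state, $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) \\
\tilde{h}_t &= \tanh\bigl(W x_t + U (r_t \odot h_{t-1})\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$

The update gate $z_t$ decides how much of the previous state to keep, and the reset gate $r_t$ how much of it to use when computing the candidate state $\tilde{h}_t$.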
Conditional gated recurrent unit (cGRU)
Detailed formulas: https://github.com/nyu-dl/dl4mt-tutorial/blob/master/docs/cgru.pdf
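In brief (the linked note has the exact parametrization), the cGRU wraps an attention step between two GRU transition blocks: given the previous target word $y_{j-1}$, the previous decoder state $s_{j-1}$ and the encoder annotations $C$,

$$
\begin{aligned}
s_j' &= \mathrm{GRU}_1(y_{j-1},\, s_{j-1}) \\
c_j  &= \mathrm{ATT}(C,\, s_j') \\
s_j  &= \mathrm{GRU}_2(c_j,\, s_j')
\end{aligned}
$$

so the attention context $c_j$ is recomputed at every decoding step and immediately folded back into the state by the second GRU block.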
Extension of cGRU for n contexts
Detailed formulas: https://github.com/bricksdont/ncgru/blob/master/ct.pdf
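As a sketch only (the linked document is authoritative), one natural way to write the extension is to insert one additional attention step and GRU transition per extra context $C^{(k)}$:

$$
\begin{aligned}
s_j^{(0)} &= \mathrm{GRU}_1(y_{j-1},\, s_{j-1}) \\
c_j^{(k)} &= \mathrm{ATT}_k\bigl(C^{(k)},\, s_j^{(k-1)}\bigr) \qquad k = 1, \dots, n \\
s_j^{(k)} &= \mathrm{GRU}_{k+1}\bigl(c_j^{(k)},\, s_j^{(k-1)}\bigr) \qquad k = 1, \dots, n \\
s_j &= s_j^{(n)}
\end{aligned}
$$

where $C^{(1)}$ are the annotations of the current source sentence and $C^{(2)}, \dots, C^{(n)}$ those of the additional context sentences, each produced by its own encoder and attended to by its own attention network.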
How to incorporate larger context?
• Additional encoder and attention networks for previous context (Jean et al., 2017) in Nematus
• Technically: an extension of deep transition (Pascanu et al., 2013) with additional GRU steps that attend to contexts other than the current source sentence
• Intuitively: while generating the next word, the decoder has access to the previous source or target sentence (a toy sketch of one such decoder step follows this list)
• Multiple encoders share most of the parameters because embedding matrices are tied (Press and Wolf, 2016)
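A toy illustration of such a deep-transition decoder step in plain numpy, with hypothetical parameter names and shapes (this is not Nematus code): one GRU transition, attention over the current source sentence, another GRU transition, attention over the previous sentence, and a final GRU transition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU transition; p holds update (z), reset (r) and candidate (h) weights."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])
    return (1.0 - z) * h + z * h_tilde

def attend(annotations, query, p):
    """Additive attention: weight each encoder annotation and return their weighted sum."""
    scores = np.tanh(annotations @ p["Wa"] + query @ p["Ua"]) @ p["va"]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ annotations

def decoder_step(y_prev, s_prev, src_annotations, ctx_annotations, params):
    """One larger-context decoder step: a deep transition with two attention networks.
    params is a hypothetical dictionary of appropriately shaped weight matrices."""
    s = gru_step(y_prev, s_prev, params["gru1"])            # condition on previous target word
    c_src = attend(src_annotations, s, params["att_src"])   # attend to current source sentence
    s = gru_step(c_src, s, params["gru2"])
    c_ctx = attend(ctx_annotations, s, params["att_ctx"])   # attend to previous (context) sentence
    s = gru_step(c_ctx, s, params["gru3"])
    return s  # fed to the deep output layer to predict the next target word
```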
Actual systems we have trained
• Nematus systems with standard parameters, similar to Edinburgh’s WMT 17 submissions
• English to German (why?)
• Training data from WMT 17

1) Baseline system without additional context
2) + source context: 1 previous source sentence, if any
3) + target context: 1 previous target sentence, if any

(see the sketch below for how such context-augmented training examples can be assembled)
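A minimal sketch (hypothetical helper, not the actual preprocessing pipeline) of assembling context-augmented examples from one document's aligned sentence pairs; the first sentence of a document simply gets an empty context:

```python
def add_context(sentence_pairs):
    """sentence_pairs: list of (source, target) sentence pairs of one document,
    in document order. Returns examples carrying the previous sentence as context."""
    examples = []
    prev_src, prev_trg = "", ""  # empty context for the first sentence of a document
    for src, trg in sentence_pairs:
        examples.append({
            "source": src,
            "target": trg,
            "prev_source": prev_src,  # used by the "+ source context" system
            "prev_target": prev_trg,  # used by the "+ target context" system
        })
        prev_src, prev_trg = src, trg
    return examples
```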
How to evaluate larger-context systems?
• Need: evaluation that focuses on specific linguistic phenomena
• Challenge set for contrastive evaluation
Source: Despite the fact that it is a part of China, Hong Kong determines its currency policy separately.
Target: Hongkong bestimmt, obwohl es zu China gehört, seine Währungspolitik selbst.
Contrastive: Hongkong bestimmt, obwohl er zu China gehört, seine Währungspolitik selbst.
(example taken from newstest2009)
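Scoring works as in Sennrich (2017): the trained model assigns a probability to both the correct target and the contrastive variant, and the test counts how often the correct one wins. A minimal sketch, assuming a hypothetical score(source, target) function that returns the model's log-probability:

```python
def contrastive_accuracy(examples, score):
    """examples: iterable of (source, correct_target, contrastive_target) triples.
    score(source, target): model log-probability of target given source (and context).
    Returns the fraction of examples where the correct variant scores higher."""
    wins = sum(1 for src, good, bad in examples if score(src, good) > score(src, bad))
    return wins / len(examples)
```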
How to evaluate larger-context systems?
• Previous work with manually constructed sets: Guillou and Hardmeier (2016); Isabelle et al. (2017); Bawden et al. (submitted)
• Larger-scale automatic sets: Sennrich (2017); Rios et al. (2017); Burlot and Yvon (2017); ours
Our test set of contrastive examples
• Sources: WMT, CS Corpus, OpenSubtitles
• Good candidates extracted automatically after linguistic processing (parsing, coreference resolution)
• Focused on personal pronouns (see the sketch after this list)
• Roughly 600k examples
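As a toy illustration only (the real extraction relies on the parsing and coreference steps above), a hypothetical helper that turns a correct target sentence into contrastive variants by swapping the translated pronoun:

```python
# German nominative third-person pronouns that can all render English "it",
# depending on the grammatical gender of the antecedent.
PRONOUNS = ["er", "es", "sie"]

def contrastive_variants(target_tokens, pronoun_index, correct_pronoun):
    """Replace the pronoun token at pronoun_index with each wrong alternative."""
    variants = []
    for wrong in PRONOUNS:
        if wrong == correct_pronoun:
            continue
        tokens = list(target_tokens)
        tokens[pronoun_index] = wrong
        variants.append(" ".join(tokens))
    return variants

# In the example from the previous slide, swapping "es" for "er" in
# "Hongkong bestimmt , obwohl es zu China gehört , ..." yields the contrastive variant.
```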
Results: BLEU
System     newstest2015 (dev)   newstest2017 (test)
Baseline   24.80                23.02
C10        22.68                21.47
C11        24.48                22.38
Contrastive scores where EN pronoun is ‘it’
                      Baseline   C10    C11
Overall performance   0.44       0.47   0.64

           Baseline   C10    C11
it : er    0.18       0.27   0.50
it : es    0.84       0.76   0.83
it : sie   0.30       0.39   0.62
Contrastive scores where EN pronoun is ‘it’
                 Baseline   C10    C11
intrasegmental   0.61       0.60   0.67
extrasegmental   0.41       0.45   0.64

distance   Baseline   C10    C11
0          0.61       0.60   0.67
1          0.36       0.43   0.64
2          0.46       0.43   0.58
3          0.53       0.53   0.66
3+         0.67       0.56   0.76
Current activities
Last steps for the contrastive evaluation experiments:
• Publish our resource and work at WMT 18

Ongoing work:
• inductive biases of fully convolutional (Gehring et al., 2017) or self-attention ("transformer") models (Vaswani et al., 2017); collaboration with Edinburgh
• low-resource experiments with Romansh: pretraining transformer models with self-attentional language models (adaptation of Ramachandran et al., 2017)
Thanks!
Code currently here: https://gitlab.cl.uzh.ch/mt/nematus-context2
Bibliography
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
Bawden, Rachel, et al. “Evaluating Discourse Phenomena in Neural Machine Translation”. (Submitted to NAACL 2018)
Burlot, Franck, and François Yvon. "Evaluating the morphological competence of Machine Translation Systems." Proceedings of the Second Conference on Machine Translation. 2017.
Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
Gehring, Jonas, et al. "Convolutional sequence to sequence learning." arXiv preprint arXiv:1705.03122 (2017).
Guillou, Liane, and Christian Hardmeier. "PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation." LREC. 2016.
Isabelle, Pierre, Colin Cherry, and George Foster. "A Challenge Set Approach to Evaluating Machine Translation." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
Jean, Sebastien, et al. "Does Neural Machine Translation Benefit from Larger Context?." arXiv preprint arXiv:1704.05135 (2017).
Miculicich Werlen, Lesly, et al. “Self-Attentive Residual Decoder for Neural Machine Translation.” (Submitted to NAACL 2018)
Pascanu, Razvan, et al. "How to construct deep recurrent neural networks." In Proceedings of the Second International Conference on Learning Representations (ICLR 2014)
Press, Ofir, and Lior Wolf. "Using the Output Embedding to Improve Language Models." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Vol. 2. 2017.
Ramachandran, Prajit, Peter Liu, and Quoc Le. "Unsupervised Pretraining for Sequence to Sequence Learning." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
Rikters, Matīss, Mark Fishel, and Ondřej Bojar. "Visualizing neural machine translation attention and confidence." The Prague Bulletin of Mathematical Linguistics 109.1 (2017): 39-50.
Rios Gonzales, Annette, Laura Mascarell, and Rico Sennrich. "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings." Proceedings of the Second Conference on Machine Translation. 2017.
Sennrich, Rico, et al. "Nematus: a Toolkit for Neural Machine Translation." Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 2017.
Sennrich, Rico. "How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Vol. 2. 2017.
Tiedemann, Jörg, and Yves Scherrer. "Neural Machine Translation with Extended Context." Proceedings of the Third Workshop on Discourse in Machine Translation. 2017.
Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
Wang, Longyue, et al. "Exploiting Cross-Sentence Context for Neural Machine Translation." Proceedings of EMNLP. 2017.
Appendix: Notions of depth in RNNs
• generally three types of depth (Pascanu et al., 2013):
  - stacked layers (each layer individually recurrent)
  - deep transition (units not individually recurrent)
  - deep output (units not individually recurrent)
• in Nematus, the decoder is implemented as a cGRU with deep transition and deep output
• crucially: attention over source sentence vectors C is a deep transition step
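Schematically (a simplified rendering of Pascanu et al., 2013), stacking makes every layer recurrent over time, while deep transition makes the step function from $h_{t-1}$ to $h_t$ itself a multi-step computation:

$$
\begin{aligned}
\text{stacked:} \qquad & h_t^{(1)} = f_1\bigl(x_t,\, h_{t-1}^{(1)}\bigr), \qquad h_t^{(2)} = f_2\bigl(h_t^{(1)},\, h_{t-1}^{(2)}\bigr) \\
\text{deep transition:} \qquad & h_t = f_2\bigl(f_1(x_t,\, h_{t-1})\bigr)
\end{aligned}
$$

In the cGRU sketched earlier, $f_1$ and $f_2$ are the two GRU blocks, and the attention over the source annotations $C$ sits between them as an additional transition step.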