EMNLP 2016 reading: Incorporating Discrete Translation Lexicons into Neural Machine Translation
Authors: Philip Arthur, Graham Neubig, Satoshi Nakamura
Presenter: Sekizawa Yuuki (Komachi lab, M1)
Incorporating Discrete Translation Lexicons into Neural Machine Translation
• NMT often mistranslates low-frequency content words, losing the meaning of the sentence
• Proposed method: encode low-frequency words with lexicon probabilities
  • two ways of combining them: (1) as a bias, (2) by linear interpolation
• Results (En-Ja translation on two corpora, KFTT and BTEC)
  • improvements of 2.0-2.3 BLEU and 0.13-0.44 NIST
  • faster convergence time
Features of NMT
• NMT systems treat each word in the vocabulary as a vector of continuous-valued numbers
  • this shares statistical power between similar words ("dog" and "cat") or contexts ("this is" and "that is")
  • drawback: NMT often mistranslates into words that seem natural in the context but do not reflect the content of the source sentence
• PBMT and other SMT systems rarely make this kind of mistake
  • they base their translations on discrete phrase mappings
  • these ensure that each source word is translated into a target word that has been observed as its translation at least once in the training data
NMT
• Source words F = f_1^{|F|}, target words E = e_1^{|E|}
• Translation probability of the next target word:
  p(e_t | F, e_1^{t-1}) = softmax(W_s η_t + b_s)
  • η_t : fixed-width vector computed by the decoder at step t
  • W_s : weight matrix, b_s : bias vector
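As a reference point, a minimal numpy sketch of this output layer (sizes and variable names are illustrative, not from the authors' implementation):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the target vocabulary
    z = np.exp(x - x.max())
    return z / z.sum()

# Illustrative sizes: target vocabulary V_e = 10, decoder state size d = 4
rng = np.random.default_rng(0)
W_s = rng.normal(size=(10, 4))   # weight matrix
b_s = np.zeros(10)               # bias vector
eta_t = rng.normal(size=4)       # fixed-width decoder state at step t

p = softmax(W_s @ eta_t + b_s)   # p(e_t | F, e_1^{t-1})
assert np.isclose(p.sum(), 1.0)
```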
Integrating Lexicons into NMT
• Lexicon probability of the next target word, using the alignment vector a_t:
  p_lex(e_t | F, a_t) = L_F a_t
  • L_F : V_e × |F| lexical probability matrix for the input sentence
    (column j holds p_lex(e | f_j) for every word e in the target vocabulary V_e)
  • a_t : alignment (attention) probability over the input sentence words
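A sketch of this matrix-vector product with hypothetical toy numbers (the real L_F is filled from the lexicon probabilities constructed later in the talk):

```python
import numpy as np

# Toy lexical matrix for a 3-word input sentence and a 5-word target
# vocabulary: column j holds p_lex(e | f_j), so each column sums to 1.
L_F = np.array([
    [0.7, 0.0, 0.1],
    [0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.2],
    [0.0, 0.0, 0.5],
])
a_t = np.array([0.8, 0.1, 0.1])  # attention over the 3 input words

p_lex = L_F @ a_t                # lexicon probability for each target word
assert np.isclose(p_lex.sum(), 1.0)
```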
Combining the lexicon probability
1. Model bias: add the log lexicon probability to the softmax input
   p_b(e_t | F, e_1^{t-1}) = softmax(W_s η_t + b_s + log(p_lex(e_t | F, a_t) + ε))
   • ε prevents zero probability (here ε = 0.001)
2. Linear interpolation: mix the lexicon and model probabilities
   p_o(e_t | F, e_1^{t-1}) = λ p_lex(e_t | F, a_t) + (1 − λ) p_m(e_t | F, e_1^{t-1})
   • λ : learnable parameter (initialized to 0.5)
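A minimal sketch of both combination methods, assuming `logits = W_s η_t + b_s` and `p_lex` from the previous slide; λ is shown as a fixed value here, though the paper learns it:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def combine_bias(logits, p_lex, eps=1e-3):
    # Method 1: add the log lexicon probability to the softmax input;
    # eps keeps words absent from the lexicon above zero (0.001 here).
    return softmax(logits + np.log(p_lex + eps))

def combine_linear(p_model, p_lex, lam=0.5):
    # Method 2: interpolate the two distributions; lam is learnable
    # in the paper (initialized to 0.5), fixed here for illustration.
    return lam * p_lex + (1.0 - lam) * p_model
```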
Constructing Lexicon Probability
1. Automatic learning
   • estimate p_auto(e | f) from the parallel training data with the EM algorithm
   • E-step: compute the expected counts c(e, f) over all possible alignments
   • M-step: re-estimate the lexicon probability p(e | f) from the normalized counts
2. Manual
   • use each dictionary entry as a translation: p_man(e | f) is uniform over L_f,
     the translation set of source word f
3. Hybrid: combine the automatic and manual lexicons
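A sketch of the manual case only, assuming the dictionary maps each source word f to its translation set L_f and spreading the probability uniformly over that set:

```python
def manual_lexicon_prob(e, f, dictionary):
    # p_man(e | f): uniform over the translation set L_f of source
    # word f; zero if e is not listed as a translation of f.
    translations = dictionary.get(f, set())
    if e in translations:
        return 1.0 / len(translations)
    return 0.0

# Hypothetical entry: "dog" has two listed translations
dictionary = {"dog": {"犬", "イヌ"}}
print(manual_lexicon_prob("犬", "dog", dictionary))  # 0.5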
Experiment
• Datasets: KFTT, BTEC; English to Japanese
  • tokenized, lowercased, sentence length ≤ 50
  • low-frequency words (frequency threshold: 1 for BTEC, 3 for KFTT) are replaced
    with <unk> and translated at test time following Luong et al. (2015)
• Evaluation: BLEU, NIST, and recall of rare words from the references
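A sketch of the frequency-based <unk> replacement on the training side (the helper name and threshold handling are illustrative; the test-time replacement of Luong et al. (2015) is not shown):

```python
from collections import Counter

def replace_rare(corpus, threshold):
    # Replace words whose frequency does not exceed the threshold
    # with <unk> (the talk reports thresholds of 1 for BTEC, 3 for KFTT).
    counts = Counter(w for sent in corpus for w in sent)
    return [[w if counts[w] > threshold else "<unk>" for w in sent]
            for sent in corpus]
```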
Corpus statistics:
Split  Corpus  Sentences  Tokens (En)  Tokens (Ja)
Train  BTEC    464K       3.60M        4.97M
Train  KFTT    377K       7.77M        8.04M
Dev    BTEC    510        3.8K         5.3K
Dev    KFTT    1,160      24.3K        26.8K
Test   BTEC    508        3.8K         5.5K
Test   KFTT    1,169      26.0K        28.4K

Vocabulary sizes:
Corpus  Source  Target
BTEC    17.8k   21.8k
KFTT    48.2k   49.1k

Note: "rare words" for the recall evaluation are words that appear fewer than 8 times in the target training corpus or references.
Experiment
• Compared methods
  • pbmt : Koehn et al. (2003), using Moses
  • hiero (hierarchical PBMT) : Chiang (2007), using Travatar
  • attn : Bahdanau et al. (2015), attentional NMT
  • auto-bias : proposed, automatically learned lexicon
  • hyb-bias : proposed, hybrid lexicon
• Lexicons
  • auto : learned from the training data (separately per corpus) with GIZA++
  • manual : English-Japanese dictionary (Eijiro, 104k entries)
  • hyb : combination of the "auto" and "manual" lexicons
Comparison with related work (results table; † : p < 0.05, * : p < 0.10)
• largest gains over attn: +2.3 BLEU, +0.44 NIST, +30% rare-word recall
Comparison with related work (results table; † : p < 0.05, * : p < 0.10)
• KFTT: compared with SMT, BLEU is higher but NIST is lower
  • traditional SMT systems keep a small advantage in translating low-frequency words
Translation examples
Training curves
• On KFTT (blue: attn, orange: auto-bias, green: hyb-bias)
• Already at the first iteration, the proposed methods' BLEU is higher than attn's
• Time per iteration: 167 minutes (attn) vs. 275 minutes (auto-bias)
  • the overhead comes from computing and applying the lexical probability matrix
Attention matrices
• the proposed method (bias) produces more correct alignments
• lighter color : stronger word attention; red box : correct alignment
Results of the proposed methods (first column: NMT without a lexicon)
• bias: the manual lexicon is less effective, due to its limited coverage of target-domain words
• linear: shows the opposite tendency to bias, and is worse than bias overall, due to the constant interpolation coefficient
Incorporating Discrete Translation Lexicons into Neural Machine Translation
• NMT often mistranslates low-frequency content words
• Proposed method: encode low-frequency words with lexicon probabilities
  • two ways of combining them: (1) as a bias, (2) by linear interpolation
• Improvements of 2.0-2.3 BLEU and 0.13-0.44 NIST, with faster convergence time