Click here to load reader
View
0
Download
0
Embed Size (px)
10.11.19 09:43
Neural Machine Translation2 CMPT 413/825, Fall 2019
Jetic Gū
School of Computing Science
12 Nov. 2019
Overview • Focus: Neural Machine Translation
• Architecture: Encoder-Decoder Neural Network
• Main Story:
• Extension to Seq2Seq: Copy Mechanism
• Extension to Seq2Seq: Ensemble
• Extension to Seq2Seq: BeamSearch
• [Extra] Beyond Seq2Seq: Attention is all you need
• [Extra] Beyond NMT 2
DecoderEncoder
Sequence-to-Sequence
Re vie
wPr(E |F) = ΠtPr(e′ = et |F, e
Attention
Re vie
w
!
RNN RNN RNN
prof likes
prof likes cheese
RNN
cheese
! ! !
RNN
Encoder
RNNRNN RNN RNN RNN RNN
?
RNN
was
RNN
hat
RNN
prof.
RNN
sarkar
RNN
gegessen
RNN
Decoder
P0 Review
Attention
Re vie
w
!
Frau
in
Gelb
mit
einem
Kinderwagen
Mann
mit
blauem
Hemd
und
ein
So ur
ce (G
er m
an )
wo m
an in
sh irt
wi th 1a
st ro
lle r
1a nd 2a
m an in 3a
bl ue
sh irt
Prediction (English)
go ld
P0 Review
Attention • Why does Attention work?
• What is in the context vector?
• What is in ?
• It’s alignment1 !
• learns to refer to useful information in src2
• similar to human attention: we pay attention to whatever is needed
α
Re vie
w
P0 XXXXXXX !
contextt = ∑ i
αt,ihenci
scoret,i = f(henci , h dec t )
αt = softmax(scoreti)
1. CL2015008 [Bahdanau et al.] Neural Machine Translation by Jointly Learning to Align and Translate 2. CL2017342 [Ghader et Monz] What does Attention in Neural Machine Translation Pay Attention to?
P0 Review
Common Problems of NMT
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
Re vie
w
P0 XXXXXXX
P0 Review
OOV
Re vie
w
P0 XXXXXXX
1. https://en.wikipedia.org/wiki/Zipf%27s_law
P0 Review
Zipf’s Law
O cc
ur re
nc e
C ou
nt
1
10
100
1000
10000
100000
1000000
Frequency Rank 1 25000 50000
• rare word are exponentially less frequent than frequent words
• e.g. in HW4, 45%|65% (src|tgt) of the unique words occur once
Common Problems of NMT
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
• Under translation
• Crucial information are left untranslated; premature generation
Re vie
w
P0 XXXXXXX
P0 Review
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
• Under translation
• Crucial information are left untranslated; premature generation
Under-trans
Re vie
w
P0 XXXXXXX
P0 Review
!
RNN
nobody
RNN
expects
expects
RNN
the
the
RNN
spanish
spanish
RNN
inquisition
inquisition
RNN
!
RNN
nobody
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
• Under translation
• Crucial information are left untranslated; premature generation
Under-trans
Re vie
w
P0 XXXXXXX
P0 Review
!
RNN
nobody
RNN
expects
expects
RNN
the
the
RNN
spanish
spanish
RNN
inquisition
inquisition
RNN
!
RNN
nobody
'inquisition': 0.49 : 0.51
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
• Under translation
• Crucial information are left untranslated; premature generation
Under-trans
Re vie
w
P0 XXXXXXX
P0 Review
!
RNN
nobody
RNN
expects
expects
RNN
the
the
RNN
spanish
spanish
RNN
inquisition
inquisition
RNN
!
RNN
nobody
'inquisition': 0.49 : 0.51
Common Problems of NMT
Re vie
w
P0 XXXXXXX
P0 Review
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Frequent word are translated correctly, rare words are not
• In any corpus, word frequencies are exponentially unbalanced
• Under translation
• Crucial information are left untranslated; premature generation
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Under translation
Common Problems of NMT
Re vie
w
P0 XXXXXXX
P0 Review
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Copy Mechanisms
• Char-level Encoder (oops for logogram, e.g. Chinese)
• Under translation
• Ensemble
• Beam search
• Coverage models
• Out-of-Vocabulary (OOV) Problem; Rare word problem
• Under translation
Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
prof
liebt
kebab
1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
prof
liebt
kebab
αt,1 = 0.1
αt,2 = 0.2
αt,3 = 0.7
1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
prof
liebt
kebab
αt,1 = 0.1
αt,2 = 0.2
αt,3 = 0.7
kebab
1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
!
RNN RNN RNN
prof loves
prof loves cheese
! !
RNN
prof
liebt
kebab
αt,1 = 0.1
αt,2 = 0.2
αt,3 = 0.7
kebab
• Sees at step
• looks at attention weight
• replace with the source word
t
αt
fargmaxiαt,i
1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
*Pointer-Generator Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
prof
liebt
kebab
αt,1 = 0.1
αt,2 = 0.2
αt,3 = 0.7
kebab
*Pointer-Generator Copy Mechanism
Co nc
ep t
P0 XXXXXXX
P1 Copy
1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words
RNN
!
pg
!
RNN RNN RNN
prof loves
prof loves cheese
RNN
cheese
! ! !
RNN
prof
liebt
kebab
αt,1 = 0.1
αt,2 = 0.2
αt,3 = 0.7
kebab
• binary classifier switch • use decoder output • use copy mechanism
*Pointer-Generator Copy Me