#03-2019-1012 NMT Lecture2 - GitHub Pages (PDF, 2020-11-01)

Neural Machine Translation 2 CMPT 413/825, Fall 2019

    Jetic Gū

    School of Computing Science

    12 Nov. 2019

  • Overview

    • Focus: Neural Machine Translation

    • Architecture: Encoder-Decoder Neural Network

    • Main Story:

    • Extension to Seq2Seq: Copy Mechanism

    • Extension to Seq2Seq: Ensemble

    • Extension to Seq2Seq: BeamSearch

    • [Extra] Beyond Seq2Seq: Attention is all you need

    • [Extra] Beyond NMT

  • Encoder-Decoder

    Sequence-to-Sequence (Review)

    Pr(E | F) = Π_t Pr(e_t | F, e_{<t})
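The factorization above says the probability of a target sentence is the product of per-step conditionals from the decoder. A minimal sketch with toy numbers (the helper name and probabilities are illustrative, not from the slides):

```python
# Sketch of the seq2seq factorization Pr(E|F) = prod_t Pr(e_t | F, e_{<t}):
# the sentence probability is the product of per-step conditional probabilities.
import math

def sentence_logprob(step_probs):
    """step_probs[t] = Pr(e_t | F, e_{<t}) as produced by a decoder at step t."""
    # sum of log-probabilities == log of the product (numerically stabler)
    return sum(math.log(p) for p in step_probs)

# toy per-step probabilities for a 3-word target sentence
probs = [0.9, 0.5, 0.8]
print(math.exp(sentence_logprob(probs)))  # ≈ 0.36
```

Working in log space, as here, is the standard trick to avoid underflow when sentences get long.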

  • Attention

    [Figure: encoder-decoder RNN with attention — the encoder reads the German source "was hat prof. sarkar gegessen" ("what has prof. sarkar eaten"), the decoder generates the English output "prof likes cheese" token by token, and at each decoding step a "?" marks the attention query into the encoder states]

    P0 Review

  • Attention

    [Figure: attention alignment heatmap between the German source "Frau in Gelb mit einem Kinderwagen … Mann mit blauem Hemd und ein …" ("woman in yellow with a stroller … man with blue shirt and a …") and the English prediction "woman in … shirt with … stroller … and … man in … blue shirt", with gold alignments marked]

    P0 Review

  • Attention

    • Why does Attention work?

    • What is in the context vector?

    • What is in α?

    • It’s alignment1!

    • α learns to refer to useful information in the source2

    • similar to human attention: we pay attention to whatever is needed

    context_t = Σ_i α_{t,i} h_i^enc

    score_{t,i} = f(h_i^enc, h_t^dec)

    α_t = softmax(score_{t,i})

    1. CL2015008 [Bahdanau et al.] Neural Machine Translation by Jointly Learning to Align and Translate
    2. CL2017342 [Ghader and Monz] What does Attention in Neural Machine Translation Pay Attention to?

    P0 Review
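The three attention equations can be sketched directly in NumPy. This is a minimal illustration, assuming a dot-product scoring function f (the slides leave f abstract):

```python
# Minimal attention sketch: score -> softmax -> context, following the
# slide equations. f is assumed to be a dot product (an assumption, not
# the slides' specific choice).
import numpy as np

def attention_context(h_enc, h_dec_t):
    """h_enc: (src_len, d) encoder states; h_dec_t: (d,) decoder state."""
    # score_{t,i} = f(h_i^enc, h_t^dec); here f(a, b) = a . b
    scores = h_enc @ h_dec_t                    # (src_len,)
    # alpha_t = softmax(score_{t,i}), shifted by max for numerical stability
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()
    # context_t = sum_i alpha_{t,i} * h_i^enc
    context = alphas @ h_enc                    # (d,)
    return alphas, context

h_enc = np.random.randn(3, 4)   # 3 source positions, hidden size 4
alphas, context = attention_context(h_enc, np.random.randn(4))
print(alphas)                   # weights over the 3 source positions
```

The weights α_t are a probability distribution over source positions, which is why they can be read as a soft alignment.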

  • Common Problems of NMT

    • Out-of-Vocabulary (OOV) problem; rare word problem

    • Frequent words are translated correctly, rare words are not

    • In any corpus, word frequencies are exponentially unbalanced

    P0 Review

  • OOV

    [Figure: Zipf's Law1 — occurrence count (log scale, 1 to 1,000,000) against frequency rank (1 to 50,000)]

    • rare words are exponentially less frequent than frequent words

    • e.g. in HW4, 45%|65% (src|tgt) of the unique words occur once

    1. https://en.wikipedia.org/wiki/Zipf%27s_law

    P0 Review
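The singleton statistic above is easy to reproduce on any corpus. A toy sketch (the corpus here is illustrative, not the HW4 data):

```python
# Count word frequencies in a toy corpus and measure how many unique
# words occur exactly once -- the long Zipfian tail that causes OOVs.
from collections import Counter

corpus = ("the cat sat on the mat . the dog saw a cat . "
          "nobody expects the spanish inquisition .").split()
freq = Counter(corpus)
singletons = [w for w, c in freq.items() if c == 1]
print(len(singletons), len(freq))  # 10 of 13 unique words occur once
```

Even in this tiny example most unique word types are singletons; real corpora show the same shape at scale.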

  • Common Problems of NMT

    • Out-of-Vocabulary (OOV) problem; rare word problem

    • Frequent words are translated correctly, rare words are not

    • In any corpus, word frequencies are exponentially unbalanced

    • Under-translation

    • Crucial information is left untranslated; premature generation

    P0 Review

  • Common Problems of NMT (Under-trans)

    • Out-of-Vocabulary (OOV) problem; rare word problem

    • Frequent words are translated correctly, rare words are not

    • In any corpus, word frequencies are exponentially unbalanced

    • Under-translation

    • Crucial information is left untranslated; premature generation

    [Figure (build): decoder RNN generating "nobody expects the spanish inquisition" but stopping early — at the final step the model assigns 'inquisition': 0.49 against 0.51 for the competing (end-of-sentence) token, so generation ends prematurely]

    P0 Review

  • Common Problems of NMT

    • Out-of-Vocabulary (OOV) problem; rare word problem

    • Frequent words are translated correctly, rare words are not

    • In any corpus, word frequencies are exponentially unbalanced

    • Under-translation

    • Crucial information is left untranslated; premature generation

    P0 Review

  • Common Problems of NMT

    • Out-of-Vocabulary (OOV) problem; rare word problem

    • Copy mechanisms

    • Char-level encoder (problematic for logograms, e.g. Chinese)

    • Under-translation

    • Ensemble

    • Beam search

    • Coverage models

    P0 Review

  • Copy Mechanism

    Concept — P1 Copy

    [Figure (build): encoder-decoder RNN with attention. Source (German): "prof liebt kebab"; the decoder's vocabulary output is "prof loves cheese" — the rare word "kebab" is out-of-vocabulary, so a frequent word takes its place. The attention weights at that step, α_{t,1} = 0.1, α_{t,2} = 0.2, α_{t,3} = 0.7, point to "kebab", which the copy mechanism substitutes into the output.]

    1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation

  • Copy Mechanism

    Concept — P1 Copy

    [Figure: same example — source "prof liebt kebab", attention weights α_{t,1} = 0.1, α_{t,2} = 0.2, α_{t,3} = 0.7, copied word "kebab"]

    • Sees ⟨unk⟩ at step t

    • looks at attention weight α_t

    • replaces ⟨unk⟩ with the source word f_{argmax_i α_{t,i}}

    1. CL2015015 [Luong et al.] Addressing the Rare Word Problem in Neural Machine Translation
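The three steps above amount to a post-hoc replacement pass over the decoder output. A minimal sketch (toy data; the function name is hypothetical):

```python
# Sketch of the copy step: whenever the decoder emits "<unk>", replace it
# with the source word that received the highest attention weight at
# that step, i.e. f_{argmax_i alpha_{t,i}}.
def copy_replace(src_words, tgt_words, attn):
    """attn[t][i]: attention weight on source word i at target step t."""
    out = []
    for t, w in enumerate(tgt_words):
        if w == "<unk>":
            # most-attended source position at step t
            i = max(range(len(src_words)), key=lambda j: attn[t][j])
            w = src_words[i]
        out.append(w)
    return out

src = ["prof", "liebt", "kebab"]
tgt = ["prof", "loves", "<unk>"]
attn = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
print(copy_replace(src, tgt, attn))  # ['prof', 'loves', 'kebab']
```

Note this copies the source word verbatim; in practice a dictionary lookup can translate the copied word instead, as in Luong et al.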

  • *Pointer-Generator Copy Mechanism

    Concept — P1 Copy

    [Figure: same example with an additional gate p_g computed from the decoder state — source "prof liebt kebab", attention weights α_{t,1} = 0.1, α_{t,2} = 0.2, α_{t,3} = 0.7, copied word "kebab"]

    • binary classifier switch p_g:

    • use decoder output, or

    • use copy mechanism

    1. CL2016032 [Gulcehre et al.] Pointing the Unknown Words
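The switch above can be sketched as follows. This is a simplified illustration, assuming p_g is already given as a scalar (in the actual model it is predicted from the decoder state) and using a hard threshold rather than a learned soft mixture:

```python
# Sketch of the pointer-generator switch: a binary gate p_g chooses
# between generating from the vocabulary and copying via attention.
# Toy data and names are illustrative, not the paper's exact model.
def pointer_generator_step(p_g, vocab_probs, attn, src_words):
    """Return the emitted word: generate if p_g >= 0.5, else copy."""
    if p_g >= 0.5:
        # generate: most probable vocabulary word
        return max(vocab_probs, key=vocab_probs.get)
    # copy: most-attended source word
    i = max(range(len(src_words)), key=lambda j: attn[j])
    return src_words[i]

vocab_probs = {"cheese": 0.4, "<unk>": 0.6}
src = ["prof", "liebt", "kebab"]
print(pointer_generator_step(0.2, vocab_probs, [0.1, 0.2, 0.7], src))
```

Unlike the simple post-hoc copy, the gate lets the model decide per step whether pointing into the source is more reliable than its own vocabulary distribution.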
