TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 11: Recurrent and Convolutional Neural Networks in NLP

Source: kgimpel/teaching/31190/lectures/11.pdf


Page 1: TTIC 31190: Natural Language Processing

Kevin Gimpel, Winter 2016
Lecture 11: Recurrent and Convolutional Neural Networks in NLP

Page 2: Announcements

• Assignment 3 assigned yesterday, due Feb. 29
• project proposal due Tuesday, Feb. 16
• midterm on Thursday, Feb. 18

Page 3: Roadmap

• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications

Page 4: 2-transformation (1-layer) network

• we'll call this a "2-transformation" neural network, or a "1-layer" neural network
• the input vector, the score vector (the vector of label scores), and one hidden vector (the "hidden layer") are shown in the figure

Page 5: 1-layer neural network for sentiment classification

Page 6: Softmax

Use the softmax function to convert scores into probabilities.
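As a concrete sketch (pure Python, with made-up label scores), softmax exponentiates each score and normalizes so the results sum to 1:

```python
import math

def softmax(scores):
    # subtract the max before exponentiating, for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical label scores, e.g. for (positive, negative)
probs = softmax([2.0, 0.5])
```

The outputs are nonnegative and sum to 1, so they can be read directly as label probabilities.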

Page 7: Neural Networks for Twitter Part-of-Speech Tagging

Example tweet: "ikr smh he asked fir yo last name so he can add u on fb lololol"
(the figure shows each word with its POS tag: intj, pronoun, verb, prep, det, adj, noun, proper noun, ...)

adj = adjective, prep = preposition, intj = interjection

• in Assignment 3, you'll build a neural network classifier to predict a word's POS tag based on its context

Page 8: Neural Networks for Twitter Part-of-Speech Tagging

Example tweet: "ikr smh he asked fir yo last name so he can ..." (tagged as before)

• e.g., predict the tag of "yo" given its context
• what should the input x be?
  – it has to be independent of the label
  – it has to be a fixed-length vector

Page 9: Neural Networks for Twitter Part-of-Speech Tagging

• e.g., predict the tag of "yo" given its context
• what should the input x be?

(figure: x includes the word vector for "yo")

Page 10: Neural Networks for Twitter Part-of-Speech Tagging

• e.g., predict the tag of "yo" given its context
• what should the input x be?

(figure: x includes the word vectors for "yo" and "fir")

Page 11: Neural Networks for Twitter Part-of-Speech Tagging

• when using word vectors as part of the input, we can also treat them as more parameters to be learned!
• this is called "updating" or "fine-tuning" the vectors (since they are initialized using something like word2vec)

Page 12: Neural Networks for Twitter Part-of-Speech Tagging

• let's use the center word + two words to the right: the vectors for "yo", "last", and "name"
• if "name" is to the right of "yo", then "yo" is probably a form of "your"
• but our x above uses separate dimensions for each position!
  – i.e., "name" is two words to the right
  – what if "name" is one word to the right?

Page 13: Features and Filters

• we could use a feature that returns 1 if "name" is to the right of the center word, but that does not use the word's embedding
• how do we include a feature like "a word similar to 'name' appears somewhere to the right of the center word"?
• rather than always specifying a relative position and embedding, we want to add filters that look for words like "name" anywhere in the window (or sentence!)

Page 14: Filters

• for now, think of a filter as a vector in the word vector space
• the filter matches a particular region of the space
• "match" = "has a high dot product with"
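A tiny sketch of "match = high dot product", using made-up 3-dimensional vectors (all numbers here are hypothetical):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# a filter is just another vector in the word vector space
filter_vec = [1.0, 0.0, 1.0]

vec_name = [0.9, 0.1, 0.8]    # points in a similar direction to the filter
vec_the  = [-0.5, 0.7, -0.6]  # points in a dissimilar direction

match_name = dot(filter_vec, vec_name)
match_the  = dot(filter_vec, vec_the)
```

The filter "fires" on words in its region of the space (high dot product) and stays quiet elsewhere.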

Page 15: Convolution

• convolutional neural networks use a bunch of such filters
• each filter is matched against (has its dot product computed with) each word in the entire context window or sentence
• e.g., a single filter is a vector of the same length as the word vectors
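A minimal sketch of this matching step, with hypothetical 2-dimensional vectors for the window "yo last name": the filter's dot product is computed at every position, and the list of results has one entry per word.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def convolve(filter_vec, word_vectors):
    # match the filter against every word position; the resulting
    # list of dot products has one entry per position
    return [dot(filter_vec, w) for w in word_vectors]

# hypothetical 2-dim word vectors for "yo", "last", "name"
window = [[0.2, 0.9], [0.7, 0.3], [0.8, 0.1]]
feature_map = convolve([1.0, -1.0], window)
```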

Page 16: Convolution

(figure: the filter is matched against the vectors for "yo", "last", and "name")

Page 17: Convolution

(figure: the same filter at the next position)

Page 18: Convolution

(figure: the same filter at the next position)

Page 19: Convolution

(figure: the filter matched against the vectors for "yo", "last", and "name")

The resulting vector is the "feature map"; it has an entry for each word position in the context window/sentence.

Page 20: Pooling

The feature map has an entry for each word position in the context window/sentence. How do we convert it into a fixed-length vector? Use pooling:

• max-pooling: returns the maximum value in the feature map
• average pooling: returns the average of the values in the feature map

Page 21: Pooling

• max-pooling: returns the maximum value in the feature map
• average pooling: returns the average of the values in the feature map

Then this single filter produces a single feature value (the output of some kind of pooling). In practice, we use many filters of many different lengths (e.g., matching n-grams rather than single words).
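A sketch of both pooling choices over a hypothetical feature map, plus the "many filters" point: each filter contributes one pooled value, so k filters give a k-dimensional fixed-length vector.

```python
def max_pool(feature_map):
    return max(feature_map)

def avg_pool(feature_map):
    return sum(feature_map) / len(feature_map)

# hypothetical responses of one filter at each word position
fm = [-0.7, 0.4, 0.7]
pooled_max = max_pool(fm)
pooled_avg = avg_pool(fm)

# with several filters, pooling each feature map yields a fixed-length vector
feature_maps = [fm, [0.1, -0.2, 0.05]]
feature_vector = [max_pool(m) for m in feature_maps]
```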

Page 22: Convolutional Neural Networks

• convolutional neural networks (convnets or CNNs) use filters that are "convolved with" (matched against all positions of) the input
• informally, think of convolution as "perform the same operation everywhere on the input in some systematic order"
• a "convolutional layer" is a set of filters that are convolved with the input vector (whether x or a hidden vector)
• it could be followed by more convolutional layers, or by some type of pooling
• often used in NLP to convert a sentence into a feature vector

Page 23: Recurrent Neural Networks

Input is a sequence: "not too bad"

Page 24: Recurrent Neural Networks

Input is a sequence; a "hidden vector" is computed at each position.

Page 25: Recurrent Neural Networks

(figure: the "hidden vector" is carried forward across positions)
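The slides draw the recurrence rather than writing it out; one common concrete form (an Elman-style RNN, an assumption here, with the bias omitted) is h_t = tanh(W x_t + U h_{t-1}):

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_final_hidden(inputs, W, U, h0):
    # h_t = tanh(W x_t + U h_{t-1}); the final h summarizes the whole sequence
    h = h0
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W, x), matvec(U, h))]
        h = [math.tanh(z) for z in pre]
    return h

# toy 2-dim run over a 3-word input like "not too bad" (made-up vectors)
W = [[0.5, 0.0], [0.0, 0.5]]
U = [[0.1, 0.0], [0.0, 0.1]]
h = rnn_final_hidden([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W, U, [0.0, 0.0])
```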

Page 26: Disclaimer

• these diagrams are often useful for helping us understand and communicate neural network architectures
• but they rarely have any sort of formal semantics (unlike graphical models)
• they are more like cartoons

Page 27: Long Short-Term Memory RNNs (gateless)

(figure: the "memory cell")

Page 28: Long Short-Term Memory RNNs (gateless)

Page 29: Long Short-Term Memory RNNs (gateless)
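The exact gateless update isn't printed on the slide; one standard way to write it (an assumption here) adds an additive "memory cell" to the RNN: the cell accumulates a candidate update, and the hidden value is a squashed read-out of the cell.

```python
import math

def gateless_lstm_step(x, h_prev, c_prev, w, u):
    # scalar (1-dim) sketch: c_t = c_{t-1} + tanh(w*x + u*h_{t-1}); h_t = tanh(c_t)
    g = math.tanh(w * x + u * h_prev)  # candidate update
    c = c_prev + g                     # the memory cell just accumulates
    h = math.tanh(c)
    return h, c

h, c = 0.0, 0.0
for x in (1.0, 1.0):
    h, c = gateless_lstm_step(x, h, c, w=0.5, u=0.1)
```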

Page 30: Long Short-Term Memory RNNs (gateless)

Experiment: text classification
• Stanford Sentiment Treebank
• binary classification (positive/negative)
• 25-dim word vectors
• 50-dim cell/hidden vectors
• classification layer on the final hidden vector
• AdaGrad, 10 epochs, mini-batch size 10
• early stopping on the dev set

accuracy: 80.6

Page 31: Output Gates

Page 32: Output Gates

Page 33: Output Gates

Page 34: Output Gates

This is pointwise multiplication! The gate is a vector.

Page 35: Output Gates

This is pointwise multiplication! The gate is a vector.

Page 36: Output Gates

(figure: the gate's connection to the memory cell uses a diagonal matrix; the logistic sigmoid makes each gate value range from 0 to 1)
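Putting the slide's pieces together in a scalar, per-dimension sketch (all weights made up; the diagonal-matrix connection to the cell acts pointwise, so per dimension it is just a scalar multiply): o = sigmoid(w*x + u*h_prev + d*c), and the hidden value is the gated read-out o * tanh(c).

```python
import math

def sigmoid(z):
    # logistic sigmoid: output ranges from 0 to 1
    return 1.0 / (1.0 + math.exp(-z))

def output_gated_hidden(x, h_prev, c, w, u, d):
    # d is one diagonal entry of the diagonal matrix on the cell
    o = sigmoid(w * x + u * h_prev + d * c)  # output gate, in (0, 1)
    return o * math.tanh(c), o               # h = o (pointwise *) tanh(c)

h, o = output_gated_hidden(x=1.0, h_prev=0.2, c=0.5, w=0.3, u=0.1, d=0.2)
```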

Page 37: Output Gates

accuracy:
gateless 80.6
output gates 81.9

Page 38: Output Gates

accuracy:
gateless 80.6
output gates 81.9

What's being learned? (demo)

Page 39: Input Gates

Page 40: Input Gates

Again, this is pointwise multiplication.

Page 41: Input Gates

Page 42: Input Gates

(figure: diagonal matrix)
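A scalar sketch of the input gate's effect (all weights made up, diagonal/peephole term omitted): the gate is a sigmoid in (0, 1) that pointwise-multiplies the candidate update before it enters the cell, so a strongly negative gate pre-activation nearly blocks the write.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cell_with_input_gate(x, h_prev, c_prev, w, u, wi, ui):
    g = math.tanh(w * x + u * h_prev)   # candidate update
    i = sigmoid(wi * x + ui * h_prev)   # input gate, in (0, 1)
    return c_prev + i * g               # only the gated part is written

c_open   = cell_with_input_gate(1.0, 0.0, 0.0, w=0.5, u=0.1, wi=5.0, ui=0.0)
c_closed = cell_with_input_gate(1.0, 0.0, 0.0, w=0.5, u=0.1, wi=-5.0, ui=0.0)
```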

Page 43: Output Gates vs. Input Gates

(figure highlighting the difference between the two)

Page 44: Input Gates

accuracy:
gateless 80.6
output gates 81.9
input gates 84.4

Page 45: Input and Output Gates

accuracy:
gateless 80.6
output gates 81.9
input gates 84.4
input & output gates 84.6

Page 46: Forget Gates

Page 47: Forget Gates

Page 48: Forget Gates
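A scalar sketch of the forget gate (weights made up): it pointwise-multiplies the previous cell value, deciding how much of the old memory survives before the new candidate is added.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cell_with_forget_gate(x, h_prev, c_prev, w, u, wf, uf):
    g = math.tanh(w * x + u * h_prev)   # candidate update
    f = sigmoid(wf * x + uf * h_prev)   # forget gate, in (0, 1)
    return f * c_prev + g               # old memory is scaled by the gate

c_keep = cell_with_forget_gate(1.0, 0.0, 2.0, w=0.5, u=0.1, wf=5.0, uf=0.0)
c_wipe = cell_with_forget_gate(1.0, 0.0, 2.0, w=0.5, u=0.1, wf=-5.0, uf=0.0)
```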

Page 49: Forget Gates

accuracy:
gateless 80.6
output gates 81.9
input gates 84.4
forget gates 82.1

Page 50: All Gates
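Combining the three gates into one scalar sketch of a full LSTM step (weights made up; the diagonal/peephole terms from earlier slides are omitted for brevity):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev)  # input gate: what gets written
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev)  # forget gate: what survives
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev)  # output gate: what is revealed
    g = math.tanh(p["w"] * x + p["u"] * h_prev)  # candidate update
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

p = {"wi": 1.0, "ui": 0.1, "wf": 1.0, "uf": 0.1,
     "wo": 1.0, "uo": 0.1, "w": 0.5, "u": 0.1}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 1.0):
    h, c = lstm_step(x, h, c, p)
```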

Page 51: All Gates

accuracy:
gateless 80.6
output gates 81.9
input gates 84.4
input & output gates 84.6
forget gates 82.1
input & forget gates 84.1
forget & output gates 82.6
input, forget, output gates 85.3

Page 52: Backward & Bidirectional LSTMs

bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, and feed them to the softmax
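A sketch of the shallow bidirectional recipe: run one recurrence left-to-right and another right-to-left, then concatenate the two final hidden vectors. The toy "step" below is a stand-in for a real LSTM step.

```python
def final_hidden(inputs, step, h0):
    h = h0
    for x in inputs:
        h = step(x, h)
    return h

def bidirectional_features(inputs, step_fwd, step_bwd, h0):
    h_f = final_hidden(inputs, step_fwd, h0)
    h_b = final_hidden(list(reversed(inputs)), step_bwd, h0)
    return h_f + h_b  # list concatenation = concatenating the two vectors

# toy 1-dim "recurrence": average the new input into the hidden value
step = lambda x, h: [0.5 * (h[0] + x[0])]
feats = bidirectional_features([[1.0], [0.0], [1.0]], step, step, [0.0])
```

The concatenated vector would then be fed to the softmax classification layer.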

Page 53: Backward & Bidirectional LSTMs

bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, and feed them to the softmax

accuracy (forward / backward):
gateless 80.6 / 80.3
output gates 81.9 / 83.7
input gates 84.4 / 82.9
forget gates 82.1 / 83.4
input, forget, output gates 85.3 / 85.9

Page 54: Backward & Bidirectional LSTMs

bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate the final two hidden vectors, and feed them to the softmax

accuracy (forward / backward / bidirectional):
gateless 80.6 / 80.3 / 81.5
output gates 81.9 / 83.7 / 82.6
input gates 84.4 / 82.9 / 83.9
forget gates 82.1 / 83.4 / 83.1
input, forget, output gates 85.3 / 85.9 / 85.1

Page 55: LSTM

Page 56: Deep LSTM (2-layer)

(figure: layer 1 and layer 2)
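A sketch of the 2-layer idea: layer 1 produces a hidden vector at every position, and layer 2 reads that sequence as its input (the toy step again stands in for an LSTM step):

```python
def run_layer(step, inputs, h0):
    # return the hidden vector at every position, not just the last one
    hs, h = [], h0
    for x in inputs:
        h = step(x, h)
        hs.append(h)
    return hs

def deep_rnn(inputs, steps, h0):
    seq = inputs
    for step in steps:           # each layer consumes the previous layer's output
        seq = run_layer(step, seq, h0)
    return seq[-1]               # final hidden vector of the top layer

step = lambda x, h: [0.5 * (x[0] + h[0])]
h_top = deep_rnn([[1.0], [1.0]], [step, step], [0.0])
```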

Page 57: Deep LSTM (2-layer)

accuracy:
gateless, shallow (50) 80.6
gateless, deep (30, 30) 80.8
input/forget/output gates, shallow (50) 85.3
input/forget/output gates, deep (30, 30) ~85

Page 58: Deep Bidirectional LSTMs

concatenate the hidden vectors of the forward & backward LSTMs, and connect each entry to the forward and backward hidden vectors in the next layer

Page 59:

(logistic) sigmoid: σ(x) = 1 / (1 + exp(-x))