Deep neural networks

Slides: wcohen/10-601/deep-2.pdf


Page 1:

Deep neural networks

Page 2:

Outline

•  What's new in ANNs in the last 5-10 years?
   –  Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation
         –  reverse-mode automatic differentiation
         –  "Generalized backprop"
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   –  Convolutional networks (CNNs)
   –  Long term/short term memory networks (LSTM)
   –  Word2vec and embeddings
•  What are the hot research topics for deep learning?

Page 3:

Page 4:

Recap: multilayer networks

•  Classifier is a multilayer network of logistic units
•  Each unit takes some inputs and produces one output using a logistic classifier
•  Output of one unit can be the input of another

[Figure: an input layer (x1, x2, and a constant 1), a hidden layer computing v1 = S(wᵀx) and v2 = S(wᵀx), and an output layer computing z1 = S(wᵀv); weights w0,1, w1,1, w2,1, w0,2, w1,2, w2,2 connect inputs to hidden units, and w1, w2 connect the hidden units to the output.]

Page 5:

Recap: Learning a multilayer network

•  Define a loss which is squared error over a network of logistic units:

      J_X,y(w) = Σ_i (y_i − ŷ_i)²

   where ŷ_i is the network output
•  Minimize the loss with gradient descent

Page 6:

Recap: weight updates for multilayer ANN

For nodes k in the output layer:
      δ_k ≡ (t_k − a_k) · a_k (1 − a_k)

For nodes j in the hidden layer:
      δ_j ≡ (Σ_k δ_k w_kj) · a_j (1 − a_j)

For all weights (a gradient step with learning rate ε; with δ defined as above the step adds ε δ a):
      w_kj ← w_kj + ε δ_k a_j
      w_ji ← w_ji + ε δ_j a_i

"Propagate errors backward": BACKPROP

Can carry this recursion out further if you have multiple hidden layers

How can we generalize BackProp to other ANNs?
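As a concrete sketch, here is one such update in numpy for a single training example. This is an illustration of the recursion above, not code from the lecture; the names sigmoid, backprop_step, W_hid, and W_out are mine.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hid, W_out, eps=0.1):
    # Forward: hidden activations a_j, then output activations a_k
    a_hid = sigmoid(W_hid @ x)          # shape (H,)
    a_out = sigmoid(W_out @ a_hid)      # shape (K,)
    # Output layer: delta_k = (t_k - a_k) a_k (1 - a_k)
    delta_out = (t - a_out) * a_out * (1 - a_out)
    # Hidden layer: delta_j = (sum_k delta_k w_kj) a_j (1 - a_j)
    delta_hid = (W_out.T @ delta_out) * a_hid * (1 - a_hid)
    # Updates: w_kj += eps * delta_k * a_j, w_ji += eps * delta_j * x_i
    W_out += eps * np.outer(delta_out, a_hid)
    W_hid += eps * np.outer(delta_hid, x)
    return W_hid, W_out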

Page 7:

Generalizing backprop

•  Starting point: a function of n variables
•  Step 1: code your function as a series of assignments (a "Wengert list": inputs, assignments, and a return)
•  Step 2: backpropagate by going through the list in reverse order, starting with df/df = 1 …
•  … and using the chain rule: each backward line multiplies the derivative of one assignment by a value computed in a previous step

[Figure: a worked example showing the inputs, the forward Wengert list (Step 1), and the backward sweep (Step 2); a toy version follows.]

https://justindomke.wordpress.com/
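A minimal Python sketch of the idea (my own toy example, not from the slides): the forward pass is a Wengert list for f(x1, x2) = (x1·x2 + x2)², and the backward pass walks the list in reverse, applying the chain rule at each assignment.

def forward(x1, x2):
    a = x1 * x2              # assignment 1
    b = a + x2               # assignment 2
    f = b * b                # assignment 3
    return f, (a, b)

def backward(x1, x2):
    f, (a, b) = forward(x1, x2)
    df_df = 1.0                         # start with df/df = 1
    df_db = df_df * 2 * b               # f = b * b
    df_da = df_db * 1.0                 # b = a + x2
    df_dx2 = df_db * 1.0 + df_da * x1   # x2 appears in two assignments
    df_dx1 = df_da * x2                 # a = x1 * x2
    return df_dx1, df_dx2

print(backward(2.0, 3.0))   # (54.0, 54.0): 2b*x2 and 2b*(x1+1) with b = 9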

Page 8:

Recap: logistic regression with SGD

Let X be a matrix with k examples (one per row). Let w_i be the input weights for the i-th hidden unit. Then Z = XW is the output (pre-sigmoid) for all m units, for all k examples.

[Figure: W has one column of weights per hidden unit, w1 … wm; X has one example per row, x1 … xk; XW is then the k×m matrix whose rows run from (x1·w1, x1·w2, …, x1·wm) to (xk·w1, …, xk·wm).]

There are lots of chances to do this in parallel … with parallel matrix multiplication (sketched below).
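A minimal numpy illustration of that point; the dimensions k, d, m are made up for the example.

import numpy as np

k, d, m = 4, 3, 5            # k examples, d features, m hidden units
X = np.random.randn(k, d)    # one example per row
W = np.random.randn(d, m)    # one column of input weights per hidden unit

Z = X @ W                    # entry (i, j) is x_i . w_j: all k*m pre-sigmoid
                             # outputs in one matrix multiplication
A = 1 / (1 + np.exp(-Z))     # element-wise sigmoid of every unit at once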

Page 9:

Example: 2-layer neural network

Step 1: forward

Inputs: X, W1, B1, W2, B2
Z1a = mul(X, W1)      // matrix mult
Z1b = add*(Z1a, B1)   // add bias vec
A1 = tanh(Z1b)        // element-wise
Z2a = mul(A1, W2)
Z2b = add*(Z2a, B2)
A2 = tanh(Z2b)        // element-wise
P = softMax(A2)       // vec to vec
C = crossEntY(P)      // cost function

Shapes (target Y; N examples; K outputs; D features, H hidden units):

X is N*D, W1 is D*H, B1 is 1*H, W2 is H*K, …
Z1a, Z1b, and A1 are N*H
Z2a, Z2b, A2, and P are N*K
C is a scalar
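A minimal numpy rendering of this forward list (a sketch under the shape assumptions above; the max-shifted softmax, a standard numerical-stability detail, is mine):

import numpy as np

def forward(X, W1, B1, W2, B2):
    Z1a = X @ W1                    # mul: N*H
    Z1b = Z1a + B1                  # add*: 1*H bias row broadcast over N rows
    A1 = np.tanh(Z1b)               # element-wise
    Z2a = A1 @ W2                   # N*K
    Z2b = Z2a + B2
    A2 = np.tanh(Z2b)
    E = np.exp(A2 - A2.max(axis=1, keepdims=True))   # row-wise softMax,
    P = E / E.sum(axis=1, keepdims=True)             # shifted for stability
    return Z1a, Z1b, A1, Z2a, Z2b, A2, P

def cross_ent(P, Y):
    # crossEntY: mean cross-entropy against one-hot targets Y (N*K)
    return -np.sum(Y * np.log(P)) / Y.shape[0]       # the scalar C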

Page 10:

Example: 2-layer neural network

Step 1: forward

Inputs: X, W1, B1, W2, B2
Z1a = mul(X, W1)      // matrix mult
Z1b = add*(Z1a, B1)   // add bias vec
A1 = tanh(Z1b)        // element-wise
Z2a = mul(A1, W2)
Z2b = add*(Z2a, B2)
A2 = tanh(Z2b)        // element-wise
P = softMax(A2)       // vec to vec
C = crossEntY(P)      // cost function

Step 2: backprop

dC/dC = 1
dC/dP = dC/dC * dcrossEntY/dP
dC/dA2 = dC/dP * dsoftMax/dA2
dC/dZ2b = dC/dA2 * dtanh/dZ2b
dC/dZ2a = dC/dZ2b * dadd*/dZ2a
dC/dB2 = dC/dZ2b * dadd*/dB2
dC/dW2 = dC/dZ2a * dmul/dW2
dC/dA1 = dC/dZ2a * dmul/dA1
…

Target Y; N rows; K outputs; D features, H hidden units.

Shape check on the first steps: dC/dP is N*K; row by row, the softMax Jacobian is K*K, so (N*K) * (K*K) leaves dC/dA2 at N*K; the layer-1 gradients further back are N*H.

With C = −(1/N) Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ], the first backward form is, element-wise:

      dC/dp_i = (1/N) · (p_i − y_i) / (p_i (1 − p_i))
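A numpy sketch of the reverse sweep (my rendering, using the forward() sketch above; it folds the dsoftMax and dcrossEntY steps into the standard combined form (P − Y)/N rather than building the K*K Jacobian explicitly):

import numpy as np

def backward(X, Y, W1, W2, cache):
    # cache holds the values saved by the forward() sketch above
    Z1a, Z1b, A1, Z2a, Z2b, A2, P = cache
    N = X.shape[0]
    dA2 = (P - Y) / N               # dsoftMax and dcrossEntY combined
    dZ2b = dA2 * (1 - A2 ** 2)      # dtanh: 1 - tanh(Z2b)^2
    dZ2a = dZ2b                     # dadd* w.r.t. Z2a is the identity
    dB2 = dZ2b.sum(axis=0, keepdims=True)  # dadd* w.r.t. B2: bias saw all N rows
    dW2 = A1.T @ dZ2a               # dmul w.r.t. W2
    dA1 = dZ2a @ W2.T               # dmul w.r.t. A1
    dZ1b = dA1 * (1 - A1 ** 2)
    dZ1a = dZ1b
    dB1 = dZ1b.sum(axis=0, keepdims=True)
    dW1 = X.T @ dZ1a
    return dW1, dB1, dW2, dB2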

Page 11:

Example: 2-layer neural network

(Same forward list and backward sweep as on the previous page.)


Need a backward form for each matrix operation used in forward

Page 12:

Example: 2-layer neural network

(Same forward list and backward sweep as above.)

Need a backward form for each matrix operation used in forward, with respect to each argument

Page 13:

Example: 2-layer neural network

(Same forward list as above.)

Need a backward form for each matrix operation used in forward, with respect to each argument

An autodiff package usually includes:
•  A collection of matrix-oriented operations (mul, add*, …)
•  For each operation:
   –  a forward implementation
   –  a backward implementation for each argument
•  It's still a little awkward to program with a list of assignments, so….

Page 14:

Generalizing backprop

•  Starting point: a function of n variables
•  Step 1: code your function as a series of assignments (a Wengert list of inputs, assignments, and a return)

https://justindomke.wordpress.com/

•  Better plan: overload your matrix operators so that when you use them in-line they build an expression graph (a toy sketch follows)
•  Convert the expression graph to a Wengert list when necessary
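A toy sketch of operator overloading, for scalars (my own illustration; real packages do this for matrices and compile the graph): each overloaded operator records its parents and the local derivatives, and backward() replays the graph in reverse with the chain rule.

class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, list(parents), 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

x, y = Var(2.0), Var(3.0)
f = x * y + y            # overloaded operators build the graph in-line
f.backward()             # reverse sweep applies the chain rule
print(x.grad, y.grad)    # 3.0 3.0  (df/dx = y, df/dy = x + 1)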

Page 15:

Example: Theano

[Code screenshot; begins "… # generate some training data".]
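The screenshots did not survive extraction. Judging from the fragments that did (the comment above and the variable p_1 two pages on), the slides show something very close to the classic Theano logistic-regression tutorial example, which looks like this:

import numpy as np
import theano
import theano.tensor as T

rng = np.random
N, feats = 400, 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))  # training data

x = T.dmatrix("x")
y = T.dvector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))      # P(target = 1 | x)
prediction = p_1 > 0.5
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)
cost = xent.mean() + 0.01 * (w ** 2).sum()   # cross-entropy plus L2 penalty
gw, gb = T.grad(cost, [w, b])                # symbolic differentiation

train = theano.function(
    inputs=[x, y], outputs=[prediction, xent],
    updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))

for _ in range(100):
    pred, err = train(D[0], D[1])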

Page 16:

Example: Theano

[Code screenshot, continued.]

Page 17:

[Code screenshot, continued; the symbolic expression p_1 is visible.]

Page 18:

Example: 2-layer neural network

(Same forward list as above.)

An autodiff package usually includes:
•  A collection of matrix-oriented operations (mul, add*, …)
•  For each operation:
   –  a forward implementation
   –  a backward implementation for each argument
•  A way of composing operations into expressions (often using operator overloading) which evaluate to expression trees
•  Expression simplification/compilation

Page 19:

Some tools that use autodiff

•  Theano:
   –  Univ Montreal system, Python-based; first version 2007, now v0.8; integrated with GPUs
   –  Many libraries build over Theano (Lasagne, Keras, Blocks, …)
•  Torch:
   –  Collobert et al.; used heavily at Facebook; based on Lua. Similar performance-wise to Theano
•  TensorFlow:
   –  Google system; possibly not well-tuned for the GPU systems used by academics
   –  Also supported by Keras
•  Autograd:
   –  New system which is very Pythonic; doesn't support GPUs (yet)
•  …

Page 20:

Outline

•  What's new in ANNs in the last 5-10 years?
   –  Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation ✔
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   –  Convolutional networks (CNNs)
   –  Long term/short term memory networks (LSTM)
   –  Word2vec and embeddings
•  What are the hot research topics for deep learning?

Page 21:

Page 22:

WHAT'S A CONVOLUTIONAL NEURAL NETWORK?

Page 23:

Model of vision in animals

Page 24:

Vision with ANNs

Page 25:

What's a convolution? (1-D)

https://en.wikipedia.org/wiki/Convolution

Page 26:


Page 27:

What's a convolution?

http://matlabtricks.com/post-5/3x3-convolution-kernels-with-online-demo

Page 28:


Page 29:


Page 30:


Page 31:


Page 32:


Page 33:

What's a convolution?
•  Basic idea:
   –  Pick a 3x3 matrix F of weights
   –  Slide this over an image and compute the "inner product" (similarity) of F and the corresponding field of the image, and replace the pixel in the center of the field with the output of the inner product operation
•  Key point:
   –  Different convolutions extract different types of low-level "features" from an image
   –  All that we need to vary to generate these different features is the weights of F (see the sketch below)
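A minimal numpy sketch of that basic idea (my own illustration; like most CNN libraries, it slides F without flipping it, i.e. cross-correlation):

import numpy as np

def convolve(image, F):
    # Slide the 3x3 weight matrix F over the image; each interior pixel
    # becomes the inner product of F with the field centered there.
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))          # no padding: border shrinks
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i+3, j:j+3] * F)
    return out

# Varying the weights of F extracts different low-level features, e.g.:
edges = convolve(np.random.rand(8, 8),
                 np.array([[-1., -1., -1.],
                           [-1.,  8., -1.],
                           [-1., -1., -1.]]))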

Page 34:

How do we convolve an image with an ANN?

Note that the parameters in the matrix defining the convolution are tied across all the places where it is used

Page 35:

How do we do many convolutions of an image with an ANN?

Page 36:

Example: 6 convolutions of a digit

http://scs.ryerson.ca/~aharley/vis/conv/

Page 37:

CNNs typically alternate convolutions, non-linearity, and then downsampling

Downsampling is usually averaging or (more common in recent CNNs) max-pooling
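A minimal numpy sketch of max-pooling over non-overlapping 2x2 blocks (my own illustration):

import numpy as np

def max_pool(A, size=2):
    # Downsample a feature map by taking the max over each
    # non-overlapping size-by-size block.
    h, w = A.shape
    h2, w2 = h // size, w // size
    A = A[:h2 * size, :w2 * size]          # crop to whole blocks
    return A.reshape(h2, size, w2, size).max(axis=(1, 3))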

Page 38:

Why do max-pooling?
•  Saves space
•  Reduces overfitting?
•  Because I'm going to add more convolutions after it!
   –  Allows the short-range convolutions to extend over larger subfields of the images
      •  So we can spot larger objects
      •  E.g., a long horizontal line, or a corner, or …

Page 39:

Another CNN visualization

https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html

Page 40:

Page 41:

Page 42:

Why do max-pooling? (continued)
•  Saves space
•  Reduces overfitting?
•  Because I'm going to add more convolutions after it!
   –  Allows the short-range convolutions to extend over larger subfields of the images
      •  So we can spot larger objects
      •  E.g., a long horizontal line, or a corner, or …
•  At some point the feature maps start to get very sparse and blobby: they are indicators of some semantic property, not a recognizable transformation of the image
•  Then just use them as features in a "normal" ANN

Page 43:

Page 44:

Page 45:


Page 46:

Alternating convolution and downsampling

[Figure: visualizations from 5 layers up: for each neuron, the subfield in a large dataset that gives the strongest output.]

Page 47:

Outline

•  What's new in ANNs in the last 5-10 years?
   –  Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation ✔
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   –  Convolutional networks (CNNs) ✔
   –  Long term/short term memory networks (LSTM)
   –  Word2vec and embeddings
•  What are the hot research topics for deep learning?

Page 48:

RECURRENT NEURAL NETWORKS

Page 49:

Motivation: what about sequence prediction?

What can I do when input size and output size vary?

Page 50:

Motivation: what about sequence prediction?

[Figure: a recurrent unit A maps input x to output h, feeding state back into itself.]

Page 51:

Architecture for an RNN

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: the unrolled network maps a sequence of inputs, bracketed by start-of-sequence and end-of-sequence markers, to a sequence of outputs; some information is passed from one subunit to the next.]

Page 52:

Architecture for a 1980's RNN

Problem with this: it’s extremely deep and very hard to train

Page 53:

Architecture for an LSTM

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Long term-short term model

[Figure: the LSTM cell. σ units output values in [0,1]; tanh units output values in [-1,+1]. The cell state carries "bits of memory"; gates decide what to forget, what to insert, and how to combine the result with the transformed x_t.]

Page 54:

Walkthrough

[Figure: the forget gate decides what part of memory to "forget": zero means forget this bit.]

Page 55:

Walkthrough

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: the input gate decides what bits to insert into the next state, and the candidate layer decides what content to store into the next state.]

Page 56:

Walkthrough

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: the next memory cell content is a mixture of the not-forgotten part of the previous cell and the insertion.]

Page 57:

Walkthrough

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: the output gate decides what part of the cell to output; tanh maps the memory bits to the [-1,+1] range.]

Page 58:

Architecture for an LSTM

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: the full cell, labeling the forget gate f_t, input gate i_t, output gate o_t, and previous cell state C_{t-1}; equations (1), (2), (3) mark the gate, cell-update, and output computations.]

Page 59:

Implementing an LSTM

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

[Figure: equations (1)-(3), evaluated for t = 1, …, T.]
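A minimal numpy sketch of one step of that recurrence (my own rendering of the standard equations from the colah post; parameter names are mine):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    # Each gate sees the concatenation [h_{t-1}, x_t]
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(Wf @ z + bf)          # forget gate: what memory to zero
    i = sigmoid(Wi @ z + bi)          # input gate: what bits to insert
    C_tilde = np.tanh(Wc @ z + bc)    # candidate content to store
    C = f * C_prev + i * C_tilde      # mix kept memory and insertion
    o = sigmoid(Wo @ z + bo)          # output gate: what part to emit
    h = o * np.tanh(C)                # squash memory to [-1,+1]
    return h, C                       # run for t = 1, ..., T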

Page 60:

SOME FUN LSTM EXAMPLES

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 61:

LSTMs can be used for other sequence tasks

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

[Figure: architectures for sequence classification, translation, named entity recognition, and image captioning.]

Page 62:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Test time (a sampling-loop sketch follows):
•  pick a seed character sequence
•  generate the next character
•  then the next
•  then the next …
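That loop, as a Python sketch (model, step, initial_state, and vocab are hypothetical names for whatever the trained RNN exposes):

import numpy as np

def generate(model, seed, n_chars, vocab):
    # Feed the seed sequence to set up the recurrent state
    h = model.initial_state()
    for c in seed:
        probs, h = model.step(vocab.index(c), h)
    out = list(seed)
    # Repeatedly sample the next character and feed it back in
    for _ in range(n_chars):
        c = np.random.choice(len(vocab), p=probs)
        out.append(vocab[c])
        probs, h = model.step(c, h)
    return "".join(out)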

Page 63:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 64:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Yoav Goldberg: order-10 unsmoothed character n-grams

Page 65:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 66:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

LaTeX “almost compiles”

Page 67:

Character-level language model

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 68:

Outline

•  What's new in ANNs in the last 5-10 years?
   –  Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation ✔
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   –  Convolutional networks (CNNs) ✔
   –  Long term/short term memory networks (LSTM)
   –  Word2vec and embeddings
•  What are the hot research topics for deep learning?

Page 69:

WORD2VEC AND WORD EMBEDDINGS

Page 70:

Basic idea behind skip-gram embeddings

[Figure: from an input word w(t) in a document, construct a hidden layer that "encodes" that word, so that the hidden layer will predict likely nearby words w(t−K), …, w(t+K); the final step of this prediction is a softmax over lots of outputs.]

Page 71:

Basic idea behind skip-gram embeddings

Training data: positive examples are pairs of words w(t), w(t+j) that co-occur; negative examples are sampled pairs of words w(t), w(t+j) that don't co-occur.

You want to train over a very large corpus (100M+ words) with hundreds of dimensions or more. (A sketch of one update follows.)
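A sketch of one skip-gram-with-negative-sampling update in numpy (an illustration of the idea above, not Mikolov's code; all names are mine):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sgns_step(center, context, negatives, W_in, W_out, lr=0.025):
    # W_in rows are the embeddings being learned; (center, context)
    # co-occurs, the word ids in `negatives` were sampled as non-co-occurring
    v = W_in[center]                       # hidden-layer "encoding"
    dv = np.zeros_like(v)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[w]
        g = lr * (label - sigmoid(u @ v))  # push the score toward the label
        dv += g * u
        W_out[w] += g * v
    W_in[center] += dv
    return W_in, W_out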

Page 72:

Results from word2vec

https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/index.html

Page 73:

Results from word2vec

https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/index.html

Page 74:

Results from word2vec

https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/index.html

Page 75:

Outline

•  What's new in ANNs in the last 5-10 years?
   –  Deeper networks, more data, and faster training
      •  Scalability and use of GPUs ✔
      •  Symbolic differentiation ✔
      •  Some subtle changes to cost function, architectures, optimization methods ✔
•  What types of ANNs are most successful and why?
   –  Convolutional networks (CNNs) ✔
   –  Long term/short term memory networks (LSTM)
   –  Word2vec and embeddings
•  What are the hot research topics for deep learning?

Page 76:

Some current hot topics

•  Multi-task learning
   –  Does it help to learn to predict many things at once? E.g., POS tags and NER tags in a word sequence?
   –  Similar to word2vec learning to produce all context words
•  Extensions of LSTMs that model memory more generally
   –  E.g., for question answering about a story

Page 77:

Some current hot topics

•  Optimization methods (>> SGD)
•  Neural models that include "attention"
   –  Ability to "explain" a decision

Page 78:

Examples of attention

https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

Page 79:

Examples of attention

https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

Basic idea: similarly to the way an LSTM chooses what to “forget” and “insert” into memory, allow a network to choose a path to focus on in the visual field

Page 80:

Some current hot topics

•  Knowledge-base embedding: extending word2vec to embed large databases of facts about the world into a low-dimensional space
   –  TransE, TransR, …
•  "NLP from scratch": sequence labeling and other NLP tasks with a minimal amount of feature engineering, using only networks and character- or word-level embeddings

Page 81:

Some current hot topics

•  Computer vision: complex tasks like generating a natural-language caption from an image, or understanding a video clip
•  Machine translation
   –  English to Spanish, …
•  Using neural networks to perform tasks
   –  Driving a car
   –  Playing games (like Go or …)