Lecture 10: Recurrent Neural Networks - Stanford...

Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - May 4, 2017

Lecture 10: Recurrent Neural Networks


Administrative

A1 grades will go out soon

A2 is due today (11:59pm)

Midterm is in-class on Tuesday! We will send out details on where to go soon.

Extra Credit: Train Game

More details on Piazza by early next week

Last Time: CNN Architectures

[Figure: AlexNet, VGG16, VGG19, GoogLeNet, ResNet (residual block: identity shortcut, output F(x) + x), DenseNet (dense blocks), and FractalNet architectures. Figures copyright Kaiming He, 2016, and Larsson et al., 2017. Reproduced with permission.]

AlexNet and VGG have tons of parameters in the fully connected layers.

AlexNet: ~62M parameters

- FC6: 256x6x6 -> 4096: 38M params
- FC7: 4096 -> 4096: 17M params
- FC8: 4096 -> 1000: 4M params

~59M params in FC layers!
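The parameter counts above can be reproduced with a few lines of arithmetic. This sketch counts weights only and ignores biases (which add just 4096 + 4096 + 1000), so the totals round to the slide's figures:

```python
# Weight counts for AlexNet's fully connected layers (biases ignored).
def fc_weights(n_in, n_out):
    """Number of weights in a dense layer mapping n_in inputs to n_out outputs."""
    return n_in * n_out

fc6 = fc_weights(256 * 6 * 6, 4096)  # flattened 256x6x6 conv output -> 4096
fc7 = fc_weights(4096, 4096)
fc8 = fc_weights(4096, 1000)

for name, n in [("FC6", fc6), ("FC7", fc7), ("FC8", fc8)]:
    print(f"{name}: {n / 1e6:.1f}M")               # FC6: 37.7M, FC7: 16.8M, FC8: 4.1M
print(f"FC total: {(fc6 + fc7 + fc8) / 1e6:.1f}M")  # FC total: 58.6M
```

FC6 alone holds well over half of AlexNet's ~62M parameters, because it connects every one of the 9216 flattened conv activations to every one of 4096 outputs.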

Today: Recurrent Neural Networks

Vanilla Neural Networks


Recurrent Neural Networks: Process Sequences

e.g. Image Captioning: image -> sequence of words

e.g. Sentiment Classification: sequence of words -> sentiment

e.g. Machine Translation: seq of words -> seq of words

e.g. Video classification on frame level

Sequential Processing of Non-Sequence Data

Ba, Mnih, and Kavukcuoglu, “Multiple Object Recognition with Visual Attention”, ICLR 2015.

Gregor et al, “DRAW: A Recurrent Neural Network For Image Generation”, ICML 2015.

Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Classify images by taking a series of “glimpses”


Generate images one piece at a time!

Recurrent Neural Network

We usually want to predict an output vector y at some time steps.

We can process a sequence of vectors x by applying a recurrence formula at every time step:

$h_t = f_W(h_{t-1}, x_t)$

where $h_t$ is the new state, $h_{t-1}$ the old state, $x_t$ the input vector at that time step, and $f_W$ some function with parameters W.

Notice: the same function and the same set of parameters are used at every time step.

(Vanilla) Recurrent Neural Network

The state consists of a single “hidden” vector h:

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$

$y_t = W_{hy} h_t$

RNN: Computational Graph

[Figure: unrolled computational graph h0 -> fW -> h1 -> fW -> h2 -> fW -> h3 -> ..., with an input x_t entering each step]

Re-use the same weight matrix at every time-step
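A minimal NumPy sketch of the vanilla RNN, assuming $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$ and $y_t = W_{hy} h_t$ with biases omitted. The sizes and random initialization are illustrative only; the point is that the loop calls the same function with the same three weight matrices at every time step:

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why):
    """One vanilla RNN time step (biases omitted)."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)   # new state from old state and input
    y = Why @ h                           # output at this time step
    return h, y

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3   # illustrative sizes
Wxh = rng.normal(scale=0.1, size=(hidden_size, input_size))
Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Why = rng.normal(scale=0.1, size=(output_size, hidden_size))

h = np.zeros(hidden_size)                 # h0
xs = rng.normal(size=(5, input_size))     # a sequence of 5 input vectors
for x in xs:                              # same weights reused at every step
    h, y = rnn_step(x, h, Wxh, Whh, Why)
print(h.shape, y.shape)  # (8,) (3,)
```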

RNN: Computational Graph: Many to Many

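[Figure: each time step produces an output y1, y2, y3, ... with its own loss L1, L2, L3, ..., LT; the total loss L is the sum of the per-step losses]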
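In the many-to-many setting each output y_t gets its own loss L_t, and the total loss is their sum. The sketch below assumes a softmax cross-entropy loss at every step and a vanilla tanh RNN with biases omitted; `targets[t]` is a hypothetical integer label for step t:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def many_to_many_loss(xs, targets, h0, Wxh, Whh, Why):
    """Unroll a vanilla RNN over xs and return (total loss, per-step losses)."""
    h, losses = h0, []
    for x, target in zip(xs, targets):
        h = np.tanh(Whh @ h + Wxh @ x)
        p = softmax(Why @ h)               # distribution over output classes
        losses.append(-np.log(p[target]))  # L_t: cross-entropy at step t
    return sum(losses), losses             # L = L_1 + ... + L_T

rng = np.random.default_rng(0)
xs = rng.normal(size=(4, 3))               # T = 4 inputs of size 3
targets = [0, 2, 1, 2]                     # one label per time step
Wxh = rng.normal(scale=0.1, size=(5, 3))
Whh = rng.normal(scale=0.1, size=(5, 5))
Why = rng.normal(scale=0.1, size=(3, 5))
L, per_step = many_to_many_loss(xs, targets, np.zeros(5), Wxh, Whh, Why)
print(L, per_step)
```

Because L is a sum, its gradient with respect to the shared weights is the sum of the gradients contributed by each time step.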

RNN: Computational Graph: Many to One

RNN: Computational Graph: One to Many


Sequence to Sequence: Many-to-one + one-to-many

Many to one: Encode input sequence in a single vector


One to many: Produce output sequence from single input vector
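A minimal sketch of the encoder-decoder idea, assuming two separate vanilla RNNs with biases omitted. The encoder (many to one) keeps only its final hidden state; the decoder (one to many) starts from that vector and feeds each output back in as its next input. A real translation model would sample a word and feed back its embedding; feeding back the raw output vector here is a simplification, and all weight names are hypothetical:

```python
import numpy as np

def encode(xs, Wxh, Whh):
    """Many to one: summarize the whole input sequence in the final hidden state."""
    h = np.zeros(Whh.shape[0])
    for x in xs:
        h = np.tanh(Whh @ h + Wxh @ x)
    return h

def decode(h, n_steps, Wyh, Whh_dec, Why):
    """One to many: unroll a second RNN from the single vector h."""
    y = np.zeros(Why.shape[0])        # stand-in for a <START> input
    ys = []
    for _ in range(n_steps):
        h = np.tanh(Whh_dec @ h + Wyh @ y)
        y = Why @ h                   # output at this step, fed back next step
        ys.append(y)
    return ys

rng = np.random.default_rng(0)
d_in, hidden, d_out = 4, 8, 3
xs = rng.normal(size=(6, d_in))       # input sequence of length 6
summary = encode(xs, rng.normal(scale=0.1, size=(hidden, d_in)),
                 rng.normal(scale=0.1, size=(hidden, hidden)))
ys = decode(summary, 4,               # output sequence of length 4
            rng.normal(scale=0.1, size=(hidden, d_out)),
            rng.normal(scale=0.1, size=(hidden, hidden)),
            rng.normal(scale=0.1, size=(d_out, hidden)))
print(summary.shape, len(ys), ys[0].shape)  # (8,) 4 (3,)
```

The input and output lengths (6 and 4) are independent, which is what makes this split useful for tasks like translation.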

Example: Character-level Language Model

Vocabulary: [h, e, l, o]

Example training sequence: “hello”
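Training pairs for this example come from matching each character with the one that follows it. A small sketch, assuming one-hot input vectors over the four-character vocabulary:

```python
vocab = ["h", "e", "l", "o"]
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a one-hot list over the vocabulary."""
    v = [0] * len(vocab)
    v[char_to_ix[ch]] = 1
    return v

seq = "hello"
inputs = seq[:-1]    # "hell": characters fed in at each time step
targets = seq[1:]    # "ello": the next character the model should predict
for x, y in zip(inputs, targets):
    print(x, one_hot(x), "->", y)
# h [1, 0, 0, 0] -> e
# e [0, 1, 0, 0] -> l
# l [0, 0, 1, 0] -> l
# l [0, 0, 1, 0] -> o
```

The input “l” appears twice with different targets (“l”, then “o”), so the right prediction depends on the hidden state carrying context, not just on the current input.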

Example: Character-level Language Model Sampling

At test time, sample characters one at a time and feed each one back to the model.

[Figure: at each step a softmax over [h, e, l, o] is sampled; the sampled characters “e”, “l”, “l”, “o” are fed back in as the next inputs]
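A sketch of test-time sampling, assuming a vanilla tanh RNN with untrained random weights (so the output is gibberish). Drawing from the softmax distribution, rather than always taking the most likely character, is what lets repeated runs produce different text:

```python
import numpy as np

vocab = ["h", "e", "l", "o"]

def sample(seed_ix, n, h, Wxh, Whh, Why, rng):
    """Sample n characters, feeding each sampled character back as the next input."""
    x = np.zeros(len(vocab))
    x[seed_ix] = 1                               # one-hot seed character
    out = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h)
        z = Why @ h
        p = np.exp(z - z.max())
        p /= p.sum()                             # softmax over the vocabulary
        ix = rng.choice(len(vocab), p=p)         # sample instead of taking argmax
        x = np.zeros(len(vocab))
        x[ix] = 1                                # feed the sample back in
        out.append(vocab[ix])
    return "".join(out)

rng = np.random.default_rng(0)
hidden = 16
Wxh = rng.normal(scale=0.5, size=(hidden, len(vocab)))
Whh = rng.normal(scale=0.5, size=(hidden, hidden))
Why = rng.normal(scale=0.5, size=(len(vocab), hidden))
print(sample(0, 10, np.zeros(hidden), Wxh, Whh, Why, rng))  # untrained: random-looking
```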

Backpropagation through time

Forward through entire sequence to compute loss, then backward through entire sequence to compute gradient

Truncated Backpropagation through time

Run forward and backward through chunks of the sequence instead of whole sequence

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
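The sketch below is in the spirit of min-char-rnn.py but is not a copy of it. `loss_and_grads` runs forward through one chunk and then backward through the entire chunk (backpropagation through time within the chunk); `train_truncated_bptt` walks over the data in chunks of `seq_length`, carrying the hidden state forward across chunks while gradients stop at each chunk boundary. Plain SGD, the learning rate, the toy text, and the sizes are illustrative assumptions:

```python
import numpy as np

def loss_and_grads(inputs, targets, hprev, Wxh, Whh, Why):
    """Forward through a chunk, then backpropagate through the whole chunk."""
    V = Wxh.shape[1]
    xs, hs, ps = {}, {-1: hprev}, {}
    loss = 0.0
    for t, (ix, tix) in enumerate(zip(inputs, targets)):     # forward pass
        xs[t] = np.zeros(V)
        xs[t][ix] = 1                                        # one-hot input
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1])
        z = Why @ hs[t]
        ps[t] = np.exp(z - z.max())
        ps[t] /= ps[t].sum()                                 # softmax
        loss -= np.log(ps[t][tix])                           # cross-entropy
    dWxh, dWhh, dWhy = (np.zeros_like(W) for W in (Wxh, Whh, Why))
    dh_next = np.zeros_like(hprev)
    for t in reversed(range(len(inputs))):                   # backward pass
        dz = ps[t].copy()
        dz[targets[t]] -= 1                 # gradient of softmax + cross-entropy
        dWhy += np.outer(dz, hs[t])
        dh = Why.T @ dz + dh_next           # from this step's output and from step t+1
        draw = (1 - hs[t] ** 2) * dh        # back through tanh
        dWxh += np.outer(draw, xs[t])
        dWhh += np.outer(draw, hs[t - 1])
        dh_next = Whh.T @ draw              # multiply by Whh^T to reach h_{t-1}
    return loss, (dWxh, dWhh, dWhy), hs[len(inputs) - 1]

def train_truncated_bptt(data, Wxh, Whh, Why, seq_length=8, lr=0.1):
    """One pass over data (a list of indices) with truncated BPTT; updates W in place."""
    h = np.zeros(Whh.shape[0])
    losses = []
    for start in range(0, len(data) - 1, seq_length):
        targets = data[start + 1:start + seq_length + 1]
        inputs = data[start:start + len(targets)]
        # h carries forward across chunks, but no gradient flows past this chunk.
        loss, grads, h = loss_and_grads(inputs, targets, h, Wxh, Whh, Why)
        for W, dW in zip((Wxh, Whh, Why), grads):
            W -= lr * dW                    # plain SGD, in place
        losses.append(loss / len(targets))
    return losses

text = "hello " * 20
vocab = sorted(set(text))
data = [vocab.index(ch) for ch in text]
rng = np.random.default_rng(0)
H, V = 16, len(vocab)
Wxh = rng.normal(scale=0.1, size=(H, V))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(V, H))
for _ in range(30):
    losses = train_truncated_bptt(data, Wxh, Whh, Why)
print(losses[0], losses[-1])
```

Setting `seq_length` to the full data length would turn this into ordinary (untruncated) backpropagation through time, at the cost of memory proportional to the whole sequence.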

min-char-rnn.py gist: 112 lines of Python

(https://gist.github.com/karpathy/d4dee566867f8291f086)

[Figure: text sampled from the model at first, and again each time it is trained more]

The Stacks Project: open source algebraic geometry textbook

LaTeX source: http://stacks.math.columbia.edu/ (the Stacks Project is licensed under the GNU Free Documentation License)

Generated C code

Searching for interpretable cells

Karpathy, Johnson, and Fei-Fei: Visualizing and Understanding Recurrent Networks, ICLR Workshop 2016

Figures copyright Karpathy, Johnson, and Fei-Fei, 2015; reproduced with permission

quote detection cell

line length tracking cell

if statement cell

quote/comment cell

code depth cell

Explain Images with Multimodal Recurrent Neural Networks, Mao et al.

Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei

Show and Tell: A Neural Image Caption Generator, Vinyals et al.

Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.

Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

Image Captioning

Figure from Karpathy et al, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015; figure copyright IEEE, 2015. Reproduced for educational purposes.

[Figure: a test image is fed through a Convolutional Neural Network; its features v condition the RNN, whose first input is x0 = <START>]

This image is CC0 public domain

before: h = tanh(Wxh * x + Whh * h)

now: h = tanh(Wxh * x + Whh * h + Wih * v), where v is the image feature vector from the CNN

At each step, sample a word from the output distribution and feed it back in as the next input; sampling the <END> token => finish.
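A sketch of caption sampling, assuming the modified recurrence h = tanh(Wxh * x + Whh * h + Wih * v) with the image feature v added at every step. The tiny vocabulary, the embedding table `embed`, the random untrained weights, and the `max_len` cap (a safety net in case <END> is never sampled) are all illustrative assumptions:

```python
import numpy as np

vocab = ["<START>", "<END>", "a", "cat", "on", "suitcase"]
START, END = 0, 1

def sample_caption(v, embed, Wxh, Whh, Wih, Why, rng, max_len=16):
    """Sample words conditioned on image feature v until <END> is drawn."""
    h = np.zeros(Whh.shape[0])
    ix, words = START, []                       # x0 = <START>
    for _ in range(max_len):
        x = embed[ix]                           # embedding of the previous word
        h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)
        z = Why @ h
        p = np.exp(z - z.max())
        p /= p.sum()                            # distribution over the vocabulary
        ix = rng.choice(len(vocab), p=p)        # sample the next word
        if ix == END:                           # sampled <END> token => finish
            break
        words.append(vocab[ix])
    return words

rng = np.random.default_rng(0)
E, H, D = 8, 16, 32                             # embedding, hidden, image-feature sizes
embed = rng.normal(size=(len(vocab), E))
v = rng.normal(size=D)                          # stands in for the CNN's image feature
Wxh, Whh = rng.normal(scale=0.3, size=(H, E)), rng.normal(scale=0.3, size=(H, H))
Wih, Why = rng.normal(scale=0.3, size=(H, D)), rng.normal(scale=0.3, size=(len(vocab), H))
print(" ".join(sample_caption(v, embed, Wxh, Whh, Wih, Why, rng)))
```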

Image Captioning: Example Results

A cat sitting on a suitcase on the floor

A cat is sitting on a tree branch

A dog is running in the grass with a frisbee

A white teddy bear sitting in the grass

Two people walking on the beach with surfboards

Two giraffes standing in a grassy field

A man riding a dirt bike on a dirt track

A tennis player in action on the court

Captions generated using neuraltalk2. All images are CC0 Public domain: cat suitcase, cat tree, dog, bear, surfers, tennis, giraffe, motorcycle

Image Captioning: Failure Cases

A woman is holding a cat in her hand

A woman standing on a beach holding a surfboard

A person holding a computer mouse on a desk

A bird is perched on a tree branch

A man in a baseball uniform throwing a ball

Captions generated using neuraltalk2. All images are CC0 Public domain: fur coat, handstand, spider web, baseball

Image Captioning with Attention

Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

Figure copyright Kelvin Xu, Jimmy Lei Ba, Jamie Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, 2015. Reproduced with permission.

RNN focuses its attention at a different spatial location when generating each word

Image: H x W x 3 -> CNN -> Features: L x D (a D-dimensional feature vector at each of L spatial locations)

- The RNN hidden state produces a distribution over the L locations.
- Weighted combination of features: z1 (weighted features: D).
- The RNN takes z1 and the first word y1, and produces a distribution over the vocabulary along with a new distribution over the L locations.
- That distribution gives the next weighted features z2, which are fed in with the next word y2, and so on.

Soft attention

Hard attention
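A sketch of one soft-attention step: score each of the L locations, normalize the scores into a distribution, and take the weighted combination of the L x D features. Xu et al. compute the scores with a small MLP; the bilinear score `features @ (W_att @ h)` here is a simplification, and `W_att` is a hypothetical parameter. Hard attention would instead pick a single location (for example by sampling from the distribution), which makes the step non-differentiable:

```python
import numpy as np

def soft_attention(features, h, W_att):
    """Return (distribution over L locations, weighted features z of size D)."""
    scores = features @ (W_att @ h)     # one score per location, shape (L,)
    a = np.exp(scores - scores.max())
    a /= a.sum()                        # attention distribution over L locations
    z = a @ features                    # weighted combination of features, shape (D,)
    return a, z

rng = np.random.default_rng(0)
L, D, H = 196, 512, 64                  # illustrative, e.g. a 14x14 grid of 512-dim features
features = rng.normal(size=(L, D))
h = rng.normal(size=H)
W_att = rng.normal(scale=0.01, size=(D, H))
a, z = soft_attention(features, h, W_att)
print(a.shape, round(a.sum(), 6), z.shape)   # (196,) 1.0 (512,)
```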

Visual Question Answering

Agrawal et al, “VQA: Visual Question Answering”, ICCV 2015

Zhu et al, “Visual 7W: Grounded Question Answering in Images”, CVPR 2016

Figures from Zhu et al, copyright IEEE 2016. Reproduced for educational purposes.

Visual Question Answering: RNNs with Attention

Multilayer RNNs

Vanilla RNN Gradient Flow

Bengio et al, “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994

Pascanu et al, “On the difficulty of training recurrent neural networks”, ICML 2013

Backpropagation from $h_t$ to $h_{t-1}$ multiplies by W (actually $W_{hh}^T$).

[Figure: unrolled vanilla RNN with hidden states h0, h1, h2, h3, h4 and inputs x1, x2, x3, x4]

Computing the gradient of h0 involves many factors of W (and repeated tanh):

Largest singular value > 1: Exploding gradients

Largest singular value < 1: Vanishing gradients
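A small numerical illustration, assuming the tanh derivative is dropped (its values are at most 1, so it could only shrink the gradient further). Scaling a random orthogonal matrix by s gives a W whose singular values all equal s, so 50 backward steps scale the gradient norm by exactly s^50:

```python
import numpy as np

def backprop_norms(W, steps):
    """Gradient norm after each of `steps` multiplications by W^T (tanh ignored)."""
    g = np.ones(W.shape[0])
    norms = []
    for _ in range(steps):
        g = W.T @ g
        norms.append(np.linalg.norm(g))
    return norms

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
start = np.linalg.norm(np.ones(8))
for s in (1.1, 0.9):                           # s * Q has every singular value equal to s
    print(s, backprop_norms(s * Q, 50)[-1] / start)
# 1.1 -> about 117.4 (exploding); 0.9 -> about 0.0052 (vanishing)
```

For a general W the growth or decay is governed by the largest singular value, so even values only slightly away from 1 compound quickly over long sequences.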

Exploding gradients: gradient clipping, i.e. scale the gradient if its norm is too big.

Vanishing gradients: change the RNN architecture.
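A minimal sketch of norm-based gradient clipping, assuming all parameter gradients are rescaled together by their combined L2 norm (the `max_norm` threshold is a tuning choice):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

clipped = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)
print(clipped[0])  # [0.6 0.8]
```

Rescaling keeps the gradient's direction and only shortens it. Clipping each entry to a fixed range (e.g. with `np.clip`) is a cruder alternative that can change the direction.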

Long Short Term Memory (LSTM)

Hochreiter and Schmidhuber, “Long Short Term Memory”, Neural Computation 1997

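Vanilla RNN vs. LSTM

Vanilla RNN: $h_t = \tanh\left(W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix}\right)$

LSTM: stack the vector from before ($h_{t-1}$) and the vector from below ($x_t$), multiply by W (4h x 2h, when $x_t$ has the same size h as the hidden state), and split the resulting 4h vector into four gates of size h:

$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix}$

$c_t = f \odot c_{t-1} + i \odot g$

$h_t = o \odot \tanh(c_t)$

where σ is the sigmoid and ☉ is elementwise multiplication.

f: Forget gate, whether to erase cell

i: Input gate, whether to write to cell

g: Gate gate (?), how much to write to cell

o: Output gate, how much to reveal cell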
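A minimal NumPy sketch of one LSTM step, assuming the gate order i, f, o, g within the stacked 4h pre-activation and omitting biases for brevity. Here W has shape (4h, h + d) for an input of size d; when d = h this is the 4h x 2h matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step: returns the new hidden state h and cell state c."""
    n = h_prev.shape[0]                      # hidden size h
    a = W @ np.concatenate([h_prev, x])      # stacked pre-activations, shape (4h,)
    i = sigmoid(a[0:n])                      # input gate: whether to write to cell
    f = sigmoid(a[n:2 * n])                  # forget gate: whether to erase cell
    o = sigmoid(a[2 * n:3 * n])              # output gate: how much to reveal cell
    g = np.tanh(a[3 * n:4 * n])              # "gate gate": how much to write to cell
    c = f * c_prev + i * g                   # additive cell update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
hsz, d = 4, 3
W = rng.normal(scale=0.1, size=(4 * hsz, hsz + d))
h, c = np.zeros(hsz), np.zeros(hsz)
for x in rng.normal(size=(5, d)):            # same W at every time step
    h, c = lstm_step(x, h, c, W)
print(h.shape, c.shape)  # (4,) (4,)
```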

Long Short Term Memory (LSTM): Gradient Flow [Hochreiter et al., 1997]

Backpropagation from $c_t$ to $c_{t-1}$ involves only elementwise multiplication by f, with no matrix multiply by W.

[Figure: chain of cell states c0 -> c1 -> c2 -> c3, shown next to a ResNet (7x7 conv, 64 / 2; 3x3 conv, 64; 3x3 conv, 128 / 2; 3x3 conv, 128; ...; FC 1000)]

Uninterrupted gradient flow!

Similar to ResNet!
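A sketch of why the cell-state path is well behaved, assuming the forget gates f_t and the write terms i_t ⊙ g_t are held fixed (in a full LSTM they also depend on h_{t-1}, which adds other gradient paths). Along this direct path, backpropagating from c_t to c_{t-1} is just an elementwise multiply by f_t:

```python
import numpy as np

def cell_chain(c0, fs, writes):
    """Forward along the cell path: c_t = f_t * c_{t-1} + (i_t * g_t)."""
    c = c0
    for f, w in zip(fs, writes):
        c = f * c + w
    return c

def cell_path_grad(fs, upstream):
    """dL/dc_0 given dL/dc_T = upstream: one elementwise multiply by f_t per step."""
    grad = upstream
    for f in reversed(fs):
        grad = grad * f
    return grad

rng = np.random.default_rng(0)
T, n = 50, 4
fs = [np.full(n, 0.99) for _ in range(T)]      # forget gates close to 1
writes = [rng.normal(scale=0.1, size=n) for _ in range(T)]
print(cell_path_grad(fs, np.ones(n)))          # each entry 0.99**50, about 0.605
```

With forget gates near 1, the gradient survives 50 steps largely intact, much like the identity shortcut in a residual block.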

In between: Highway Networks

Srivastava et al, “Highway Networks”, ICML DL Workshop 2015

Other RNN Variants

[LSTM: A Search Space Odyssey, Greff et al., 2015]

[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]

GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014]
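The GRU also relies on an additive update of its state. A sketch of one step, assuming the formulation of Cho et al. (2014) with biases omitted; note that some references and libraries swap the roles of z and 1 - z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step following Cho et al. (2014), biases omitted."""
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return z * h_prev + (1 - z) * h_tilde           # interpolate old and new state

rng = np.random.default_rng(0)
d, n = 3, 4
Ws = [rng.normal(scale=0.1, size=(n, d)) for _ in range(3)]   # Wz, Wr, Wh
Us = [rng.normal(scale=0.1, size=(n, n)) for _ in range(3)]   # Uz, Ur, Uh
h = np.zeros(n)
for x in rng.normal(size=(5, d)):
    h = gru_step(x, h, Ws[0], Us[0], Ws[1], Us[1], Ws[2], Us[2])
print(h.shape)  # (4,)
```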

Summary
- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don’t work very well
- Common to use LSTM or GRU: their additive interactions improve gradient flow
- Backward flow of gradients in RNN can explode or vanish. Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed.

Next time: Midterm!

Then Detection and Segmentation
