
INF 5400 Machine learning for image classification

Lecture 11: Recurrent neural networks

March 27, 2019

Tollef Jahren


Outline

• Overview (Feed forward and convolution neural networks)

• Vanilla Recurrent Neural Network (RNN)

• Input-output structure of RNNs

• Training Recurrent Neural Networks

• Simple examples

• Advanced models

• Advanced examples


About today

Goals:

• Understand how to build a recurrent neural network

• Understand why long-range dependencies are a challenge for recurrent neural networks


Readings

• Video:

– cs231n: Lecture 10 | Recurrent Neural Networks

• Read:

– The Unreasonable Effectiveness of Recurrent Neural Networks

• Optional:

– Video: cs224d L8: Recurrent Neural Networks and Language Models

– Video: cs224d L9: Machine Translation and Advanced Recurrent LSTMs and GRUs

Progress: Overview (Feed forward and convolution neural networks)


Overview

• What have we learnt so far?
– Fully connected neural networks
– Convolutional neural networks
• What worked well with these?
– Fully connected neural networks work well when the input data is structured
– Convolutional neural networks work well on images
• What does not work well?
– Processing data of unknown length

Progress: Vanilla Recurrent Neural Network (RNN)


Recurrent Neural Network (RNN)

• Takes a new input
• Manipulates the state
• Reuses weights
• Gives a new output


Recurrent Neural Network (RNN)

• Unrolled view

• x_t: the input vector
• h_t: the hidden state of the RNN
• y_t: the output vector


Recurrent Neural Network (RNN)

• Recurrence formula: h_t = f_W(h_{t-1}, x_t)
– Input vector: x_t
– Hidden state vector: h_t

Note: We use the same function and parameters for every time step.


(Vanilla) Recurrent Neural Network

• Input vector: x_t
• Hidden state vector: h_t
• Output vector: y_t
• Weight matrices: W_hh, W_hx, W_hy

General form:
h_t = f_W(h_{t-1}, x_t)

Vanilla RNN:
h_t = tanh(W_hh h_{t-1} + W_hx x_t + b)
y_t = W_hy h_t
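These equations translate directly into a few lines of NumPy. A minimal sketch, assuming small example dimensions (4-dim input, 8-dim state, 4-dim output) and an arbitrary weight scale, neither of which is from the slides:

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_hx, W_hy, b):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_hx x_t + b), y_t = W_hy h_t."""
    h = np.tanh(W_hh @ h_prev + W_hx @ x + b)
    y = W_hy @ h
    return h, y

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(8, 8))    # hidden-to-hidden
W_hx = rng.normal(scale=0.1, size=(8, 4))    # input-to-hidden
W_hy = rng.normal(scale=0.1, size=(4, 8))    # hidden-to-output
b = np.zeros(8)

h = np.zeros(8)                              # initial hidden state h_0
for x in rng.normal(size=(5, 4)):            # a sequence of 5 input vectors
    h, y = rnn_step(h, x, W_hh, W_hx, W_hy, b)   # the same weights are reused every step
```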

RNN: Computational Graph

(Figure series: the recurrence unrolled over time, one step per slide; the same weights are reused at every time step, and each step produces a hidden state h_t and an output y_t.)
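In code, the unrolled graph is just a loop. A sketch, using the same vanilla-RNN weights as above, that caches every hidden state, since the backward pass will need them:

```python
import numpy as np

def rnn_forward(xs, h0, W_hh, W_hx, W_hy, b):
    """Unroll the recurrence over a whole sequence, reusing the same weights
    at every step and caching all hidden states for backpropagation."""
    h, hs, ys = h0, [h0], []
    for x in xs:
        h = np.tanh(W_hh @ h + W_hx @ x + b)
        hs.append(h)          # the graph (and memory use) grows with sequence length
        ys.append(W_hy @ h)
    return hs, ys
```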


Example: Language model

• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary: [h, e, l, o]
• Encoding: One-hot


RNN: Predicting the next character

• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary: [h, e, l, o]
• Encoding: One-hot
• Model:
h_t = tanh(W_hh h_{t-1} + W_hx x_t)


RNN: Predicting the next character

• Task: Predicting the next character
• Training sequence: “hello”
• Vocabulary: [h, e, l, o]
• Encoding: One-hot
• Model:
h_t = tanh(W_hh h_{t-1} + W_hx x_t + b)
y_t = W_hy h_t
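A sketch of how the “hello” training data can be encoded. The vocabulary indexing is the obvious one; the helper names are illustrative:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    """Encode a character as a one-hot vector over the 4-character vocabulary."""
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

# For "hello": each input character should predict the next one
inputs  = [one_hot(c) for c in "hell"]        # h, e, l, l
targets = [char_to_ix[c] for c in "ello"]     # e, l, l, o  (as class indices)
```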

RNN: Predicting the next character

• Vocabulary: [h, e, l, o]
• At test time, we can sample from the model to synthesize new text.
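A sketch of the sampling loop, reusing the vocab/one-hot setup from the previous snippet. Feeding the sampled character back in as the next input is the usual scheme, and the softmax over the scores y_t is an assumption:

```python
def sample(h, seed_ix, n, W_hh, W_hx, W_hy, b):
    """Generate n characters: sample from the softmax over y_t, then feed the
    sampled character back in as the next input x_{t+1}."""
    x = np.zeros(len(vocab)); x[seed_ix] = 1.0     # `vocab` from the encoding sketch above
    chars = []
    for _ in range(n):
        h = np.tanh(W_hh @ h + W_hx @ x + b)
        scores = W_hy @ h
        p = np.exp(scores - scores.max()); p /= p.sum()   # softmax (shifted for stability)
        ix = np.random.default_rng().choice(len(vocab), p=p)
        x = np.zeros(len(vocab)); x[ix] = 1.0
        chars.append(vocab[ix])
    return "".join(chars)
```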

Progress: Input-output structure of RNNs


Input-output structure of RNNs

• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
• Many-to-many (encoder–decoder)


RNN: One-to-one

Normal feed forward neural network



RNN: One-to-many

Task:
• Image Captioning (Image → Sequence of words)


RNN: Many-to-one


Tasks:

• Sentiment classification

• Video classification


RNN: Many-to-many


Task:

• Video classification on frame level

RNN: Many-to-many (encoder–decoder)

Task:
• Machine Translation (English → French)

Progress: Training Recurrent Neural Networks


Training Recurrent Neural Networks

• The challenge in training a recurrent neural network is to preserve long-range dependencies.
• Vanilla recurrent neural network:
– h_t = f(W_hh h_{t-1} + W_hx x_t + b)
– f = relu() can easily cause exploding values and/or gradients
– f = tanh() can easily cause vanishing gradients, making it difficult to remember more than ~7 steps
• Finite memory available


Exploding or vanishing gradients

• tanh() solves the exploding value problem
• tanh() does NOT solve the exploding gradient problem; think of a scalar input and a scalar hidden state:

h_t = tanh(W_hh h_{t-1} + W_hx x_t + b)

∂h_t/∂h_{t-1} = (1 − tanh²(W_hh h_{t-1} + W_hx x_t + b)) · W_hh

The gradient can explode/vanish exponentially in time (steps):
• If |W_hh| < 1, vanishing gradients
• If |W_hh| > 1, exploding gradients
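A quick numerical illustration of the scalar case. The fixed pre-activation value 0.1 is arbitrary; only the repeated multiplication by W_hh matters:

```python
import numpy as np

# Each backprop step multiplies the gradient by (1 - tanh^2(a_t)) * W_hh,
# so over T steps the gradient scales roughly like W_hh ** T.
for W_hh in (0.5, 1.5):
    grad = 1.0
    for _ in range(20):
        grad *= (1 - np.tanh(0.1) ** 2) * W_hh
    print(f"W_hh = {W_hh}: gradient after 20 steps ~ {grad:.2e}")
# W_hh = 0.5: ~7.8e-07 (vanishes); W_hh = 1.5: ~2.7e+03 (explodes)
```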


Exploding or vanishing gradients

• Conditions for vanishing and exploding gradients for the weight matrix W_hh:
– If the largest singular value > 1: exploding gradients
– If the largest singular value < 1: vanishing gradients

“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013


Gradient clipping

• To avoid exploding gradients, a simple trick is to set a threshold on the norm of the gradient matrix ∂h_t/∂h_{t-1}.
• Error surface of a single-hidden-unit RNN
• High-curvature walls
• Solid lines: standard gradient descent trajectories
• Dashed lines: gradients rescaled to a fixed size

“On the difficulty of training Recurrent Neural Networks”, Pascanu et al., 2013
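A sketch of norm clipping as it is commonly implemented; the threshold 5.0 is an arbitrary example value:

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """If the gradient norm exceeds the threshold, rescale the gradient to the
    threshold: the direction is kept, only the step size is limited."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```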


Vanishing gradients – “less of a problem”

• In contrast to feed-forward networks, RNNs will not stop learning in spite of vanishing gradients.
• The network gets “fresh” inputs each step, so the weights will be updated.
• The challenge is learning long-range dependencies. This can be improved using more advanced architectures.
• The output at time step t is mostly affected by the states close to time t.


Initialization of the weights

• Potential initialization schemes for the transition matrices, W_hh:
– Activation (tanh): Var[W_l] = 2 / (n_l + n_{l+1})
– Activation (tanh): Var[W_l] = 1 / n_l
– Activation (relu): Orthogonal matrix
– Activation (relu): Identity matrix
• Initial state, h_0:
– Initialize to zeros or random values
– Note: We can learn the initial state by backpropagating into it.
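A sketch of these schemes in NumPy; the layer widths are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_l, n_next = 8, 8   # layer widths (example values)

# Glorot-style variance for tanh: Var[W] = 2 / (n_l + n_{l+1})
W_glorot = rng.normal(scale=np.sqrt(2.0 / (n_l + n_next)), size=(n_next, n_l))

# Simpler variant: Var[W] = 1 / n_l
W_simple = rng.normal(scale=np.sqrt(1.0 / n_l), size=(n_next, n_l))

# Orthogonal matrix for relu (all singular values exactly 1)
W_orth, _ = np.linalg.qr(rng.normal(size=(n_l, n_l)))

# Identity matrix for relu
W_id = np.eye(n_l)

h0 = np.zeros(n_l)   # initial state: zeros (random, or learned by backprop, also work)
```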


Some initialization schemes can help RNNs

• Identity initialization of the state transition matrix, W_hh:
– Initially no state change can be a good start
– Avoids exploding/vanishing gradients
– May make it possible to use ReLU
– (Task in the figure:) the output should be the sum of the two values which have a “mask” value of 1

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units


Backpropagation through time (BPTT)

• Training of a recurrent neural network is done using backpropagation and gradient descent algorithms
– To calculate gradients, you have to keep your inputs in memory until you get the backpropagating gradient
– Then what if you are reading a book of 100 million characters?

Truncated Backpropagation through time

• The solution is that you stop at some point and update the weights as if you were done.

Truncated Backpropagation through time

• Advantages:
– Reduced memory requirements
– Faster parameter updates
• Disadvantage:
– Not able to capture dependencies longer than the truncation length
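A sketch of what this looks like for the vanilla RNN: backpropagation runs only within each chunk, while the hidden state (but not the gradient) is carried across chunk boundaries. The dimensions, learning rate, data, and chunk length are all illustrative:

```python
import numpy as np

def loss_and_grads(xs, ts, h0, W_hh, W_hx, W_hy, b):
    """Forward + backward through ONE chunk (softmax cross-entropy per step)."""
    hs, ps, loss = {-1: h0}, {}, 0.0
    for t in range(len(xs)):                          # forward through the chunk
        hs[t] = np.tanh(W_hh @ hs[t - 1] + W_hx @ xs[t] + b)
        e = np.exp(W_hy @ hs[t]); ps[t] = e / e.sum()
        loss -= np.log(ps[t][ts[t]])
    dW_hh, dW_hx, dW_hy, db = map(np.zeros_like, (W_hh, W_hx, W_hy, b))
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):                # backward, same chunk only
        dy = ps[t].copy(); dy[ts[t]] -= 1
        dW_hy += np.outer(dy, hs[t])
        da = (1 - hs[t] ** 2) * (W_hy.T @ dy + dh_next)   # back through tanh
        db += da; dW_hx += np.outer(da, xs[t]); dW_hh += np.outer(da, hs[t - 1])
        dh_next = W_hh.T @ da
    return loss, (dW_hh, dW_hx, dW_hy, db), hs[len(xs) - 1]

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(8, 8)); W_hx = rng.normal(scale=0.1, size=(8, 4))
W_hy = rng.normal(scale=0.1, size=(4, 8)); b = np.zeros(8)
data_x = [np.eye(4)[rng.integers(4)] for _ in range(100)]   # random one-hot stream
data_t = [int(rng.integers(4)) for _ in range(100)]

h, k = np.zeros(8), 25                                # truncation length k = 25
for i in range(0, len(data_x), k):
    loss, grads, h = loss_and_grads(data_x[i:i + k], data_t[i:i + k], h,
                                    W_hh, W_hx, W_hy, b)
    for p, g in zip((W_hh, W_hx, W_hy, b), grads):    # update as if the chunk
        p -= 0.1 * g                                  # were the whole sequence
```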


What effect do you get from stopping the gradients?

• You are not guaranteed to remember interactions longer than the number of steps.
• Say a text calls a person Anna and then much later refers to her as “she”.
• A model with a lot of steps could directly learn to remember Anna.
• RNNs can at test time remember longer than the number of steps:
– You may have learned from earlier cases where “name” and “he/she” were much closer
– The model knows that it should save the gender of the name


A fun fix! Synthetic gradients

• Train a network to predict the gradients you would get if all steps were connected
• You can learn that if you see Anna in the state, you should expect gradients “asking for Anna” later
• This is of course not perfect, since you don’t have all the coming information

Decoupled Neural Interfaces using Synthetic Gradients

Progress: Simple examples


Inputting Shakespeare

• Input a lot of text
• Try to predict the next character
• Learns spelling, quotes and punctuation

https://gist.github.com/karpathy/d4dee566867f8291f086


Inputting LaTeX

Inputting C code (Linux kernel)

Interpreting cell activations

• Semantic meaning of the hidden state vector

Visualizing and Understanding Recurrent Networks

Progress: Advanced models


Gated Recurrent Unit (GRU)

• A more advanced recurrent unit
• Uses gates to control information flow
• Used to improve long-term dependencies

Gated Recurrent Unit (GRU)

Vanilla RNN:
• Cell state: h_t = tanh(W_hx x_t + W_hh h_{t-1} + b)

GRU:
• Update gate: Γ_u = σ(W_u x_t + U_u h_{t-1} + b_u)
• Reset gate: Γ_r = σ(W_r x_t + U_r h_{t-1} + b_r)
• Candidate cell state: h̃_t = tanh(W x_t + U(Γ_r ∘ h_{t-1}) + b)
• Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t

The GRU has the ability to add to and remove from the state, not only “transform” the state.
With Γ_r as ones and Γ_u as zeros, the GRU reduces to the vanilla RNN.


Gated Recurrent Unit (GRU)

• Update gate: Γ_u = σ(W_u x_t + U_u h_{t-1} + b_u)
• Reset gate: Γ_r = σ(W_r x_t + U_r h_{t-1} + b_r)
• Candidate cell state: h̃_t = tanh(W x_t + U(Γ_r ∘ h_{t-1}) + b)
• Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t

Summary:
• The update gate, Γ_u, can easily be close to 1 due to the sigmoid activation function. Copying the previous hidden state will prevent vanishing gradients.
• With Γ_u = 0 and Γ_r = 0, the GRU unit can forget the past.
• In a standard RNN, we are transforming the hidden state regardless of how useful the input is.
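A sketch of one GRU step following these gate equations; the parameter shapes and values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x, p):
    """One GRU step: the gates decide how much of the old state to keep."""
    g_u = sigmoid(p["W_u"] @ x + p["U_u"] @ h_prev + p["b_u"])       # update gate
    g_r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])       # reset gate
    h_cand = np.tanh(p["W"] @ x + p["U"] @ (g_r * h_prev) + p["b"])  # candidate state
    return g_u * h_prev + (1 - g_u) * h_cand                         # interpolate old/new

n_in, n_h = 4, 8
rng = np.random.default_rng(0)
shapes = {"W_u": (n_h, n_in), "U_u": (n_h, n_h), "b_u": (n_h,),
          "W_r": (n_h, n_in), "U_r": (n_h, n_h), "b_r": (n_h,),
          "W":   (n_h, n_in), "U":   (n_h, n_h), "b":   (n_h,)}
p = {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}
h = gru_step(np.zeros(n_h), np.ones(n_in), p)
```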

Long Short Term Memory (LSTM) (for reference only)

GRU:
• Update gate: Γ_u = σ(W_u x_t + U_u h_{t-1} + b_u)
• Reset gate: Γ_r = σ(W_r x_t + U_r h_{t-1} + b_r)
• Candidate cell state: h̃_t = tanh(W x_t + U(Γ_r ∘ h_{t-1}) + b)
• Final cell state: h_t = Γ_u ∘ h_{t-1} + (1 − Γ_u) ∘ h̃_t

LSTM:
• Forget gate: Γ_f = σ(W_f x_t + U_f h_{t-1} + b_f)
• Input gate: Γ_i = σ(W_i x_t + U_i h_{t-1} + b_i)
• Output gate: Γ_o = σ(W_o x_t + U_o h_{t-1} + b_o)
• Potential cell memory: c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
• Final cell memory: c_t = Γ_f ∘ c_{t-1} + Γ_i ∘ c̃_t
• Final cell state: h_t = Γ_o ∘ tanh(c_t)

LSTM training:
• One can initialize the biases of the forget gate such that the values in the forget gate are close to one.


Multi-layer Recurrent Neural Networks

• Multi-layer RNNs can be used to enhance model complexity
• As with feed-forward neural networks, stacking layers creates higher-level feature representations
• Normally 2 or 3 layers deep, not as deep as conv nets
• More complex relationships in time
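A sketch of one time step through a stack of vanilla layers; the 2-layer setup and dimensions are example choices:

```python
import numpy as np

def multilayer_rnn_step(hs_prev, x, layers):
    """One time step through a stack of vanilla RNN layers: the hidden state of
    layer l at time t is the input of layer l+1 at the same time step."""
    inp, hs = x, []
    for (W_hh, W_hx, b), h_prev in zip(layers, hs_prev):
        h = np.tanh(W_hh @ h_prev + W_hx @ inp + b)
        hs.append(h)
        inp = h                       # feed the state upward to the next layer
    return hs

rng = np.random.default_rng(0)
n_in, n_h = 4, 8
layer = lambda n: (rng.normal(scale=0.1, size=(n_h, n_h)),
                   rng.normal(scale=0.1, size=(n_h, n)), np.zeros(n_h))
layers = [layer(n_in), layer(n_h)]                # a 2-layer stack
hs = multilayer_rnn_step([np.zeros(n_h)] * 2, np.ones(n_in), layers)
```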


Bidirectional recurrent neural network

• The blocks can be vanilla, LSTM or GRU recurrent units
• Real time vs. post-processing

Forward state: h_t^(f) = f(W_hx^(f) x_t + W_hh^(f) h_{t-1}^(f))
Backward state: h_t^(b) = f(W_hx^(b) x_t + W_hh^(b) h_{t+1}^(b))
Combined state: h_t = [h_t^(f), h_t^(b)]
Output: y_t = g(W_hy h_t + b)
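A sketch of these equations with two vanilla passes; all shapes are illustrative, and note that the backward pass makes this a post-processing (not real-time) model:

```python
import numpy as np

def birnn(xs, fwd, bwd, W_hy, b_y, n_h):
    """Bidirectional RNN: one vanilla pass left-to-right and one right-to-left;
    each output is computed from the two concatenated states."""
    def run(params, seq):
        W_hx, W_hh, b = params
        h, states = np.zeros(n_h), []
        for x in seq:
            h = np.tanh(W_hx @ x + W_hh @ h + b)
            states.append(h)
        return states
    h_f = run(fwd, xs)                    # h_t^(f) sees x_1 .. x_t
    h_b = run(bwd, xs[::-1])[::-1]        # h_t^(b) sees x_t .. x_T
    return [W_hy @ np.concatenate([f, bk]) + b_y for f, bk in zip(h_f, h_b)]

rng = np.random.default_rng(0)
n_in, n_h, n_out = 4, 8, 2
mk = lambda *s: rng.normal(scale=0.1, size=s)
fwd = (mk(n_h, n_in), mk(n_h, n_h), np.zeros(n_h))    # separate weights per direction
bwd = (mk(n_h, n_in), mk(n_h, n_h), np.zeros(n_h))
ys = birnn([mk(n_in) for _ in range(5)], fwd, bwd,
           mk(n_out, 2 * n_h), np.zeros(n_out), n_h)
```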


Bidirectional recurrent neural network

• Example:
– Task: Is the word part of a person’s name?
– Text 1: He said, “Teddy bears are on sale!”
– Text 2: He said, “Teddy Roosevelt was a great President!”

Progress: Advanced examples


Image Captioning

• Combining a text RNN with an image CNN
• Training the RNN to interpret the top features from the CNN

Deep Visual-Semantic Alignments for Generating Image Descriptions
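One common way to wire the two together is to project the CNN features into the RNN's initial state. A hedged sketch; the dimensions and the projection-into-h0 scheme are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=512)                # stand-in for the CNN's top feature vector
W_img = rng.normal(scale=0.01, size=(128, 512))    # learned projection into the RNN state
h0 = np.tanh(W_img @ cnn_features)                 # image-conditioned initial hidden state
# From h0, the caption is generated one-to-many, e.g. with a sampling loop like
# the character-level sample() above, word by word until an end token.
```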



Image Captioning: Results


NeuralTalk2


Image Captioning: Failures


NeuralTalk2


Generating images

• Input the pixels above and to the left to predict a new pixel.
• Fill in holes.


Machine translation

• Example:
– German to English: “Echt dicke Kiste” to “Awesome sauce”
• Tokenize each word/character, e.g. as a one-hot vector
• <EOS> (end-of-sequence) token
• Can use cross-entropy loss for every timestep
• Character model vs. vocabulary model:
– Pros, character model:
• Much smaller class size (input/output vector length)
• You don’t need to worry about unknown words
– Pros, vocabulary model:
• Shorter sequences, easier to capture long-term dependencies
• Less computationally expensive to train


Machine translation

• RNN: Many-to-many (encoder–decoder)
• Tips which may help:
– Different W for the encoder and the decoder
– Concatenate the last hidden state of the encoder to all hidden states in the decoder
– In the decoder, pass in y_n as x_{n+1}
– Reverse the order of the source sequence:
• ABC → XY
• CBA → XY
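A sketch of the encoder–decoder structure with separate weights and the y_n → x_{n+1} feedback; all shapes and values are illustrative:

```python
import numpy as np

def encoder_decoder(src, tgt_len, enc, dec, W_hy, n_h):
    """Encode the whole source sequence into a final state, then decode step by
    step, feeding each output back in as the next decoder input."""
    W_ex, W_eh, b_e = enc                    # encoder weights
    h = np.zeros(n_h)
    for x in src:
        h = np.tanh(W_ex @ x + W_eh @ h + b_e)
    W_dx, W_dh, b_d = dec                    # different W for the decoder
    y, ys = np.zeros(W_hy.shape[0]), []
    for _ in range(tgt_len):                 # pass y_n in as x_{n+1}
        h = np.tanh(W_dx @ y + W_dh @ h + b_d)
        y = W_hy @ h
        ys.append(y)
    return ys

rng = np.random.default_rng(0)
n_in, n_h, n_out = 5, 8, 6
mk = lambda *s: rng.normal(scale=0.1, size=s)
ys = encoder_decoder([mk(n_in) for _ in range(3)], 4,
                     (mk(n_h, n_in), mk(n_h, n_h), np.zeros(n_h)),
                     (mk(n_h, n_out), mk(n_h, n_h), np.zeros(n_h)),
                     mk(n_out, n_h), n_h)
```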


CNN + RNN

• With an RNN on top of CNN features, you capture a larger time horizon

Long-term Recurrent Convolutional Networks for Visual Recognition and Description


A network can be both convolutional and recurrent

• By simply changing the matrix multiplications to convolutions

Learning Video Object Segmentation with Visual Memory
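A single-channel sketch of that idea; the kernel sizes and the use of scipy's convolve2d are illustrative choices:

```python
import numpy as np
from scipy.signal import convolve2d

def conv_rnn_step(h_prev, x, K_hh, K_hx, b):
    """Vanilla RNN step with the matrix multiplications replaced by 2D
    convolutions: the hidden state is now a feature map instead of a vector."""
    return np.tanh(convolve2d(h_prev, K_hh, mode="same")
                   + convolve2d(x, K_hx, mode="same") + b)

rng = np.random.default_rng(0)
h = np.zeros((16, 16))                   # spatial hidden state
K_hh = np.full((3, 3), 0.05)             # recurrent kernel
K_hx = np.full((3, 3), 0.05)             # input kernel
for x in rng.normal(size=(5, 16, 16)):   # a short sequence of 16x16 frames
    h = conv_rnn_step(h, x, K_hh, K_hx, b=0.0)
```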