Long Short-Term Memory Network, Hien Van Nguyen, University of Houston, 11/6/2017


Page 1: Long Short-Term Memory Network (WordPress.com, 2017-11-07)

Long Short-Term Memory Network

Hien Van Nguyen

University of Houston

11/6/2017

Page 2:

Why recurrent networks?

• Sequential input: the next state depends on the previous state

• Generalizes to inputs of variable length

• Considers one small chunk at a time, so the model needs fewer parameters

11/7/2017 Machine Learning 2

Page 3:

What is a sequence?


Source: https://uvadlc.github.io/lectures/lecture8.pdf

Page 4:

One-hot vector

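As a small illustration (my own, not from the slides), a one-hot encoding maps each symbol in a vocabulary to a vector that is all zeros except for a single 1:

```python
import numpy as np

def one_hot(index, vocab_size):
    """Return a vector of zeros with a 1 at the given index."""
    v = np.zeros(vocab_size)
    v[index] = 1.0
    return v

# Example: encode the word "cat" in a 4-word vocabulary
vocab = {"the": 0, "cat": 1, "sat": 2, "down": 3}
print(one_hot(vocab["cat"], len(vocab)))  # [0. 1. 0. 0.]
```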

Page 5:

Recurrent networks


Unroll through time
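Unrolling can be sketched as a plain loop (my own sketch; the weight names W, U, b are assumptions, not the slides' notation):

```python
import numpy as np

def rnn_forward(xs, h0, W, U, b):
    """Unroll a vanilla RNN through time: h_t = tanh(W @ h_{t-1} + U @ x_t + b)."""
    h = h0
    states = []
    for x in xs:  # one step per element of the input sequence
        h = np.tanh(W @ h + U @ x + b)
        states.append(h)
    return states

rng = np.random.default_rng(0)
H, D, T = 4, 3, 5  # hidden size, input size, sequence length
W, U, b = rng.normal(size=(H, H)), rng.normal(size=(H, D)), np.zeros(H)
xs = [rng.normal(size=D) for _ in range(T)]
states = rnn_forward(xs, np.zeros(H), W, U, b)
print(len(states))  # one hidden state per timestep
```

Note that the same W, U, b are reused at every step, which is why the model handles variable-length input with a fixed number of parameters.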

Page 6:

Recurrent networks


Unroll through time

Page 7:

Simple recurrent network

• Linear activation

• Gradient:

• 𝑇 is the number of timesteps considered

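For the linear-activation case, the unrolled state is h_T = W^T h_0, so the gradient with respect to the initial state is the T-th matrix power of W. A quick check (my own sketch) of how its norm behaves:

```python
import numpy as np

def state_jacobian(W, T):
    """For the linear recurrence h_t = W @ h_{t-1}, d h_T / d h_0 = W^T (matrix power)."""
    return np.linalg.matrix_power(W, T)

W = np.array([[0.5, 0.0],
              [0.0, 0.5]])  # eigenvalues 0.5 < 1
J = state_jacobian(W, 20)
print(np.linalg.norm(J))  # tiny: the gradient vanishes as T grows
```

With eigenvalues above 1 the same power blows up instead, which is the exploding-gradient side of the problem.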

Page 8:

Problem of Vanishing/Exploding Gradient

• Review of chain rule

• Apply chain rule:


How a change in V at step k affects the loss at step t

On the difficulty of training recurrent networks https://arxiv.org/pdf/1211.5063.pdf
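A numeric illustration (my own sketch) of the chain-rule product for a tanh RNN: each step contributes a Jacobian diag(1 − h_t²) · W, and with small weights the norm of the accumulated product shrinks as we backpropagate further into the past:

```python
import numpy as np

rng = np.random.default_rng(1)
H, T = 8, 50
W = rng.normal(scale=0.1, size=(H, H))  # deliberately small weights

# Forward pass: h_t = tanh(W @ h_{t-1})
hs = [rng.normal(size=H)]
for _ in range(T):
    hs.append(np.tanh(W @ hs[-1]))

# Chain rule: d h_T / d h_k is a product of per-step Jacobians
grad = np.eye(H)
norms = []
for t in range(T, 0, -1):
    J_t = np.diag(1 - hs[t] ** 2) @ W  # Jacobian of step t
    grad = grad @ J_t
    norms.append(np.linalg.norm(grad))

print(norms[0], norms[-1])  # the norm shrinks the further back we go
```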

Page 9:

Problem of Vanishing/Exploding Gradient

• Recall that:

• Using chain rule:


Page 10:

Problem of Vanishing/Exploding Gradient


Page 11:

Long Short-Term Memory Networks (LSTM)

• Idea: don't multiply. Multiplication == vanishing gradients.

Instead of multiplying the previous hidden state by a matrix to get the new state,

we add something to the old hidden state to get the new state (called the "cell" rather than the "hidden state" in LSTM terminology, explained next).

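The contrast can be sketched as follows (my own simplified code; in a real LSTM the added candidate is gated and computed from the hidden state, as the next slides show):

```python
import numpy as np

def rnn_step(h, x, W, U):
    """Vanilla RNN: the new state comes from multiplying the old one by W."""
    return np.tanh(W @ h + U @ x)

def additive_step(c, x, W, U):
    """LSTM-style idea: add a candidate to the old cell state."""
    return c + np.tanh(W @ c + U @ x)  # gradient flows through the identity path

c = np.array([1.0, -2.0])
x = np.array([0.5])
# With zero weights the candidate is zero and the state passes through unchanged:
print(additive_step(c, x, np.zeros((2, 2)), np.zeros((2, 1))))  # [ 1. -2.]
```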

Page 12:

Long Short-Term Memory Networks (LSTM)

• Intuition:
  • Not everything is useful to remember
  • Not every input is useful to take in
  • It is not necessary to produce an output at every step

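These three intuitions map onto the LSTM's forget, input, and output gates, each a sigmoid with values in (0, 1). A hedged numeric sketch (gate values chosen by hand purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each intuition corresponds to a gate:
forget = sigmoid(np.array([4.0, -4.0]))   # keep the 1st memory, drop the 2nd
inp    = sigmoid(np.array([-4.0, 4.0]))   # ignore the 1st input, take the 2nd
out    = sigmoid(np.array([-4.0, -4.0]))  # emit almost nothing this step

c_old = np.array([1.0, 1.0])   # old cell state
cand  = np.array([0.5, 0.5])   # candidate values from the current input
c_new = forget * c_old + inp * cand  # gated additive update
h     = out * np.tanh(c_new)         # gated output
print(c_new, h)
```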

Page 13:

Long Short-Term Memory Networks (LSTM)

• Comparison of vanilla RNN and LSTM


Vanilla RNN

LSTM

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
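For reference, the updates being compared, in the standard formulation (the symbols are the conventional ones, not necessarily the slide's):

```latex
% Vanilla RNN: one multiplicative update of the hidden state
h_t = \tanh(W_h h_{t-1} + W_x x_t + b)

% LSTM: gated, additive update of the cell state
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)         % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)         % input gate
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)  % candidate values
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)         % output gate
h_t = o_t \odot \tanh(c_t)
```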

Page 14:

Long Short-Term Memory Networks (LSTM)

• Comparison of vanilla RNN and LSTM


Vanilla RNN

LSTM

Page 15:

LSTM-Step by Step


Page 16:

LSTM-Step by Step


Page 17:

LSTM-Step by Step


Page 18:

LSTM-Step by Step


Page 19:

LSTM-Step by Step

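The step-by-step pieces above can be collected into one cell function. This is a sketch of the standard LSTM equations; the weight names and the concatenation of [h_prev, x] are my own conventions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    """One LSTM step over the concatenated vector [h_prev, x]."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)         # forget gate: what to keep from c_prev
    i = sigmoid(Wi @ z + bi)         # input gate: what to take from the candidate
    c_tilde = np.tanh(Wc @ z + bc)   # candidate cell values
    c = f * c_prev + i * c_tilde     # additive cell update
    o = sigmoid(Wo @ z + bo)         # output gate: what to expose
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3  # hidden size, input size
mk = lambda: rng.normal(size=(H, H + D))
Wf, Wi, Wc, Wo = mk(), mk(), mk(), mk()
bf = bi = bc = bo = np.zeros(H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                 Wf, Wi, Wc, Wo, bf, bi, bc, bo)
print(h.shape, c.shape)  # (4,) (4,)
```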

Page 20:

Long Short-Term Memory Networks (LSTM)

• Comparison of vanilla RNN and LSTM


Vanilla RNN

LSTM

Page 21:

LSTM-Gradient Flow


Learning sequence representation: https://d-nb.info/1082034037/34

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 22:

LSTM-Gradient Flow

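The usual informal argument, sketched numerically (my own example): along the cell path c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t, the Jacobian ∂c_t/∂c_{t-1} is diag(f_t), ignoring the gates' own dependence on the state. So the backpropagated factor per step is the forget gate itself, not a fixed weight matrix:

```python
# Gradient along the cell path is a product of forget-gate values.
T = 100
f_near_one = 0.99  # forget gate close to 1: remember long-term
f_small = 0.5      # repeated small factors, as in a vanilla RNN

grad_lstm = f_near_one ** T   # survives over 100 steps
grad_rnn_like = f_small ** T  # vanishes over 100 steps
print(grad_lstm, grad_rnn_like)
```

When the network learns to keep the forget gate near 1 for relevant memories, the gradient along the cell "highway" neither vanishes nor explodes.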

Page 23:

Applications – Machine Translation


Source: https://uvadlc.github.io/lectures/lecture8.pdf

Page 24:

Applications – Machine Translation


Google Pixel Buds

Page 25:

Applications – Image Captioning


Page 26:

Applications – Question Answering


Page 27:

Applications – Visual Question Answering


Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 28:

Applications – Visual Question Answering
