Lecture 7 | 2015. 08. 02 | Do Hoerin
Modeling sequences: A brief overview (Lecture 7a)
Getting targets when modeling sequences
• Teach the model by trying to predict the next term in the input sequence.
• This blurs the distinction between supervised and unsupervised learning.
Memoryless models

Autoregressive Models
[Diagram: input(t-2), input(t-1) → input(t); the next input is predicted from an average of the previous individual or vector values]

Feed-forward Neural Nets
[Diagram: input(t-2), input(t-1) → hidden → input(t); adds one more layer of non-linear hidden units]
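To make the memoryless idea concrete, here is a minimal sketch (my own illustration, not from the lecture) of a linear autoregressive predictor that estimates the next value from the previous two; the toy sequence and least-squares fit are assumptions for illustration.

```python
import numpy as np

# Minimal linear autoregressive model: predict input(t) from input(t-1)
# and input(t-2). Weights are fit by least squares on a toy sequence.
seq = np.sin(0.3 * np.arange(100))            # toy 1-D sequence
X = np.stack([seq[1:-1], seq[:-2]], axis=1)   # [input(t-1), input(t-2)]
y = seq[2:]                                   # target: input(t)

w, *_ = np.linalg.lstsq(X, y, rcond=None)     # learn the two tap weights
pred = X @ w
print("weights:", w, "MSE:", np.mean((pred - y) ** 2))
```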
Beyond Memoryless Models
• With hidden state, we get a more interesting kind of model: it can store information for a long time.
• If the dynamics is noisy and the way it generates outputs from its hidden state is noisy, we can never know its exact hidden state.
• The best we can do is to infer a probability distribution over the space of hidden state vectors.
• This inference is only tractable for two types of hidden state model.
Linear Dynamical Systems
[Diagram: driving input → hidden → output at each time step, unrolled over time]
• Real-valued hidden states
• Linear dynamics with Gaussian noise
• Driving inputs
• To predict the next output, we need to infer the hidden state.
• A linearly transformed Gaussian is still Gaussian, so the distribution over hidden states can be computed exactly using Kalman filtering (a minimal sketch follows).
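Here is a minimal 1-D Kalman-filter sketch of that inference; all scalars (dynamics coefficient, noise variances) are made up for illustration and are not the lecture's notation.

```python
import numpy as np

# 1-D Kalman filter: hidden state evolves as x_t = a*x_{t-1} + noise,
# and we observe y_t = x_t + noise. Because every step is a linear map
# plus Gaussian noise, the posterior over the hidden state stays
# Gaussian, so we only need to track its mean and variance.
a, q, r = 0.9, 0.1, 0.5          # dynamics, process noise var, obs noise var
mean, var = 0.0, 1.0             # Gaussian belief over the hidden state

rng = np.random.default_rng(0)
x = 0.0
for t in range(20):
    x = a * x + rng.normal(0, q ** 0.5)   # true (unobserved) state
    y = x + rng.normal(0, r ** 0.5)       # noisy observation
    # Predict: a linear map of a Gaussian is Gaussian.
    mean, var = a * mean, a * a * var + q
    # Update: condition the Gaussian belief on the observation.
    k = var / (var + r)                   # Kalman gain
    mean, var = mean + k * (y - mean), (1 - k) * var
print("final belief: mean=%.3f var=%.3f" % (mean, var))
```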
Hidden Markov Model
[Diagram: hidden states A, B, and C with stochastic transitions between them, emitting an output at each time step]
• HMMs have a discrete one-of-N hidden state.
• Transitions between states are stochastic.
• To predict the next output, we need to infer the probability distribution over hidden states (a sketch of this inference follows).
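That distribution over hidden states is maintained by the standard forward algorithm; below is a minimal sketch with a made-up 3-state model (the transition and emission tables are purely illustrative).

```python
import numpy as np

# Forward algorithm: maintain p(state | observations so far) for a
# 3-state HMM. All probabilities below are made up for illustration.
T = np.array([[0.7, 0.2, 0.1],    # T[i, j] = p(next state j | state i)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
E = np.array([[0.9, 0.1],         # E[i, o] = p(observation o | state i)
              [0.5, 0.5],
              [0.1, 0.9]])
belief = np.array([1/3, 1/3, 1/3])  # uniform prior over the N=3 states

for obs in [0, 0, 1, 1]:            # a toy observation sequence
    belief = belief @ T             # stochastic transition between states
    belief = belief * E[:, obs]     # weight by how well each state fits
    belief /= belief.sum()          # renormalize to a distribution
    print(belief)
```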
Limitations of HMMs
• Consider what happens when an HMM generates data: with N hidden states, the state can remember only log N bits about what it has generated so far.
• Consider generating the second half of an utterance given the first half: syntax, semantics, intonation, accent, rate, volume, and the speaker's characteristics must all fit together.
• If the first half of an utterance carries 100 bits of information, the HMM would need 2^100 states.
• RNNs combine two properties:
  • Distributed hidden state that can store lots of information about the past.
  • Non-linear dynamics to update the hidden state.
• Unlike an HMM's transitions, these dynamics are deterministic (not stochastic). A minimal sketch of such an update follows.
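Here is a minimal sketch of what that deterministic, non-linear update looks like; the shapes and the tanh non-linearity are my assumptions, not fixed by the lecture.

```python
import numpy as np

# One step of a vanilla RNN: the hidden state is a real-valued vector
# (distributed, so N units give far more than N distinguishable states),
# updated deterministically by a non-linear function of the previous
# state and the current input.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W_xh = rng.normal(0, 0.1, (n_in, n_hid))   # input -> hidden weights
W_hh = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (recurrence)
b = np.zeros(n_hid)

def step(h_prev, x):
    # Deterministic non-linear dynamics: no sampling anywhere.
    return np.tanh(x @ W_xh + h_prev @ W_hh + b)

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):       # a toy input sequence
    h = step(h, x)
print(h)
```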
Recurrent Neural Network
[Diagram: input → hidden → output at each time step, with hidden-to-hidden connections carrying state forward in time]
Recurrent Neural Network
• What kinds of behavior can an RNN exhibit?
  • Oscillate
  • Settle to point attractors
  • Behave chaotically
• The computational power of RNNs also makes them very hard to train (discussed in Lecture 7d).
Training RNNs with backpropagation (Lecture 7b)
• An RNN unrolled in time is just a layered feed-forward net with shared weights.
• The training algorithm works in the time domain:
  • The forward pass builds up a stack of the activities of all the units at each time step.
  • The backward pass peels activities off the stack to compute the error derivatives at each time step.
  • After the backward pass, we add together the derivatives at all the different times for each weight (a minimal sketch follows the diagram below).
Training RNNs with backpropagation
[Diagram: the network unrolled over time; the same weights w1, w2, w3, w4 are reused at every time step]
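Below is a minimal sketch of that stack-based procedure for the recurrent weight matrix only; the tanh units, squared error at the final step, and all sizes are my assumptions for illustration.

```python
import numpy as np

# Backpropagation through time for the shared recurrent weights W_hh.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W_xh = rng.normal(0, 0.5, (n_in, n_hid))
W_hh = rng.normal(0, 0.5, (n_hid, n_hid))
xs = rng.normal(size=(6, n_in))           # toy input sequence
target = rng.normal(size=n_hid)           # toy target for the final state

# Forward pass: push the activities at every time step onto a stack.
h, stack = np.zeros(n_hid), []
for x in xs:
    h_prev = h
    h = np.tanh(x @ W_xh + h_prev @ W_hh)
    stack.append((h_prev, h))

# Backward pass: peel activities off the stack, and ADD together the
# derivatives from all the different time steps for the shared weights.
dW_hh = np.zeros_like(W_hh)
dh = 2 * (h - target)                     # dLoss/dh at the final step
for h_prev, h_t in reversed(stack):
    dpre = dh * (1 - h_t ** 2)            # back through tanh
    dW_hh += np.outer(h_prev, dpre)       # same weights at every step
    dh = dpre @ W_hh.T                    # pass gradient one step earlier
print("gradient norm:", np.linalg.norm(dW_hh))
```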
An irritating extra issue
• We need to specify the initial state of all the units.
• It is better to treat the initial states as learned parameters, learning them in the same way as we learn the weights:
  • Start off with an initial random guess for the initial states.
  • At the end of each training sequence, backpropagate through time all the way to the initial states to get the gradient of the error function with respect to each initial state.
  • Adjust the initial states by following the negative gradient (a minimal sketch follows).
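Here is a minimal sketch of that recipe on a tiny linear RNN with no inputs; the network, learning rate, and loss are all made up for illustration.

```python
import numpy as np

# Treat the initial state h0 as a parameter: backpropagate through time
# all the way to h0, then follow the negative gradient, just like weights.
rng = np.random.default_rng(1)
n = 4
W = rng.normal(0, 0.3, (n, n))
h0 = rng.normal(size=n)                   # initial random guess for h0
target = rng.normal(size=n)
lr = 0.1

for epoch in range(100):
    h = h0
    for _ in range(3):                    # forward: run 3 steps, no input
        h = h @ W
    dh = 2 * (h - target)                 # dLoss/dh at the final step
    for _ in range(3):                    # backward: all the way to h0
        dh = dh @ W.T
    h0 -= lr * dh                         # follow the negative gradient
print("final loss:", np.sum((h0 @ W @ W @ W - target) ** 2))
```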
Providing input to recurrent networks
• We can specify the initial states of all the units.
• We can specify the initial states of a subset of the units.
• We can specify the states of the same subset of units at every time step: the natural way to model most sequential data.
[Diagram: the unrolled network with weights w1, w2, w3, w4 shared across time steps; inputs can enter at these different points]
Teaching signals for recurrent networks
• We can specify desired final activities of all the units.
• We can specify desired activities of all the units for the last few time steps.
• We can specify the desired activity of a subset of the units.
[Diagram: the unrolled network with shared weights w1, w2, w3, w4; targets can be attached at these different points]
A toy example of training an RNN (Lecture 7c)
The algorithm for binary addition
[Diagram: a finite state automaton for binary addition with four states — "no carry, print 1", "carry, print 1", "no carry, print 0", "carry, print 0" — whose transitions are labeled by the input digit pairs 00, 11, and 10/01]
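The automaton is easy to state in code. Here is a minimal sketch (the representation is mine, not the lecture's) that adds two bit strings exactly as the diagram describes, consuming one column of digits per step while the carry plays the role of the state.

```python
# Finite state automaton for binary addition: the state is the carry,
# and at each step the machine reads one column of two digits and
# prints one output digit (least significant column first).
def fsa_add(a_bits, b_bits):
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):      # one column per time step
        total = a + b + carry
        out.append(total % 2)             # the digit the state "prints"
        carry = total // 2                # "carry" vs "no carry" state
    return out

# 13 + 27 = 40 -> [0, 0, 0, 1, 0, 1] (all bit strings LSB first)
print(fsa_add([1, 0, 1, 1, 0, 0], [1, 1, 0, 1, 1, 0]))
```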
• The network has 2 input units and 1 output unit.
• The desired output at each time step is the output for the column that was provided as input two time steps ago:
  • It takes one time step to update the hidden units based on the two input digits.
  • It takes another time step for the hidden units to cause the output.
A recurrent net for binary addition
[Diagram: two input digit streams (00110100 and 01001101) enter the "in" units, pass through 3 fully interconnected hidden units, and produce the output stream 10000001 over time]
What the network learns
• It learns four distinct patterns of activity for the 3 hidden units; the patterns correspond to the nodes in the finite state automaton.
• The automaton is restricted to be in exactly one state at each time; the hidden units are restricted to have exactly one vector of activity at each time.
• With N hidden neurons, however, the network has 2^N possible binary activity vectors.
Why it is difficult to train an RNN (Lecture 7d)
The backward pass is linear
• In the forward pass, squashing functions prevent the activity vectors from exploding.
• The backward pass is completely linear: once the forward pass is done, the slope of the tangent (the blue line in the lecture's figure) at each unit is fixed.
• In an RNN trained on long sequences, the back-propagated gradient can therefore easily explode or vanish (a numerical sketch follows).
• So RNNs have difficulty dealing with long-range dependencies.
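The effect is easy to demonstrate numerically; in this small sketch (the weight scales are made up) the gradient is repeatedly multiplied by the same fixed linear map, so its norm shrinks or grows exponentially with the number of time steps.

```python
import numpy as np

# In the backward pass the gradient is repeatedly multiplied by the same
# (fixed-slope) Jacobian, so over T steps its norm behaves like c**T:
# it vanishes if the effective gain is < 1 and explodes if it is > 1.
rng = np.random.default_rng(0)
n = 10
for scale in [0.5, 1.5]:                   # illustrative weight scales
    W = rng.normal(0, scale / np.sqrt(n), (n, n))
    for steps in [1, 10, 50]:
        g = np.ones(n)
        for _ in range(steps):
            g = g @ W.T                    # one linear backward step
        print("scale %.1f, %2d steps: |grad| = %.3e"
              % (scale, steps, np.linalg.norm(g)))
```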
Why the back-propagated gradient blows up
• If we start a trajectory within an attractor, small changes in where we start make no difference to where we end up.
• But if we start almost exactly on the boundary, tiny changes can make a huge difference.
Four effective ways to learn an RNN
• Long Short-Term Memory (discussed in Lecture 7e): designed to remember values for a long time.
• Hessian-Free Optimization: use a fancy optimizer that can detect directions with a tiny gradient but even smaller curvature.
• Echo State Networks
• Good initialization with momentum
Long Short-Term Memory (LSTM) (Lecture 7e)
Long Short-Term Memory
• Design a memory cell using logistic and linear units with multiplicative interactions.
• Information gets into the cell whenever its write gate is on.
• Information stays in the cell as long as its keep gate is on.
• Information can be read from the cell by turning on its read gate.
A minimal sketch of one such memory-cell step follows.
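This sketch implements one cell step using the write/keep/read gates described above (standard LSTM notation calls these the input, forget, and output gates); the weight shapes and omitted biases are my simplifying assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One step of an LSTM memory cell. The logistic gates multiply signals:
#   write (input) gate  -> lets new information into the cell,
#   keep (forget) gate  -> lets the stored value persist,
#   read (output) gate  -> lets the cell's value out to the hidden state.
def lstm_step(x, h, c, W):
    z = np.concatenate([x, h])
    write = sigmoid(z @ W["w"])        # write gate
    keep = sigmoid(z @ W["k"])         # keep gate
    read = sigmoid(z @ W["r"])         # read gate
    cand = np.tanh(z @ W["c"])         # candidate value to store
    c = keep * c + write * cand        # linear unit: persists while kept
    h = read * np.tanh(c)              # read the (squashed) cell contents
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W = {k: rng.normal(0, 0.3, (n_in + n_hid, n_hid)) for k in "wkrc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):   # a toy input sequence
    h, c = lstm_step(x, h, c, W)
print(h)
```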
Reading Cursive Handwriting
• Input: a sequence of (x, y, p) coordinates of the pen tip, where p indicates whether the pen is up or down.
• Output: a sequence of characters.
• Graves & Schmidhuber (2009) showed that RNNs with LSTM are currently the best systems for reading cursive writing.
• Demo: online handwriting recognition by an RNN with Long Short-Term Memory, from Alex Graves: https://www.youtube.com/watch?v=-yX1SYeDHbg