Deconstructing Deep Learning
@markawest
A Practical-ish Introduction to Data Science, Part 2
Who Am I?
• Java Developer and Architect.
• Currently managing a team of Data Scientists at Bouvet Oslo.
• Leader of javaBin (Norwegian JUG).
@markawest
Agenda
• Deep Learning 101
• Convolutional Neural Networks
• Recurrent Neural Networks
@markawest
Deep Learning 101
@markawest
Deep Learning
Biologically inspired machine learning with multi-layered Artificial Neural Networks.
Sources: Neuroscape Neuroscience Center, UCSF; becominghuman.ai/@venkateshtata9
«Vanilla» Artificial Neural Network
• Multi Layer Perceptron.
• Feed-forward.
• Densely Connected.
• Weighted connections.
@markawest
Artificial Neural Network Structure
• INPUT LAYER: Entry point for incoming data.
• HIDDEN LAYER: Identifies features from input data and correlates these to the correct output.
• OUTPUT LAYER: Delivers the end result from the ANN.
@markawest
Deep Artificial Neural Network
INPUT LAYER → MULTIPLE HIDDEN LAYERS → OUTPUT LAYER
@markawest
Artificial Neural Network Node
[Diagram: INPUTS i0 and i1, each paired with a WEIGHT (w0, w1), feed a NODE that computes the Sum of Weighted Inputs and passes it through an Activation Function to produce the OUTPUT o0]
• It is not unusual for a single node to receive inputs numbering in the hundreds of thousands. This is especially true for densely connected ANNs, where each node receives inputs from all nodes in the previous layer.
• Each input is paired with an adjustable weight. Weights act to amplify or dampen specific inputs, and are adjusted during the training phase to improve the accuracy of the ANN.
• Weighted Inputs are summed up and then passed through an Activation Function. The Activation Function decides what strength of signal (if any) should be sent onwards.
• The result from the Activation Function is sent onwards to the next layer in the ANN. In a densely connected ANN the output from this node would be sent to all nodes in the next layer.
@markawest
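As a rough sketch of what a single node computes (illustrative NumPy, not code from the talk; the sigmoid activation and all values are arbitrary assumptions):

```python
import numpy as np

def sigmoid(x):
    # One possible Activation Function: squashes the signal into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.5, 0.8])    # i0, i1
weights = np.array([0.4, -0.2])  # w0, w1 (adjusted during training)

weighted_sum = np.dot(inputs, weights)  # sum of weighted inputs
o0 = sigmoid(weighted_sum)              # signal sent on to the next layer
print(o0)
```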
Activation Functions
[Plot: Sigmoid, Tanh and ReLU curves, mapping Input values to Output values]
• Convert input signals into output signals via simple calculations.
• Many different types with different behaviour and different pros and cons.
@markawest
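For illustration (not from the talk), the three functions plotted above are one-liners in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output in (0, 1)

def tanh(x):
    return np.tanh(x)                 # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # negative inputs become 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, relu):
    print(fn.__name__, fn(x))
```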
Training an Artificial Neural Network
[Diagram: an image of a cow is sent through the ANN, which outputs 0.15, i.e. «NOT A COW»]
1. Generate Prediction: Training Data is sent through the ANN to generate predictions. PREDICTED: NOT A COW (0.15). EXPECTED: COW (1.00).
2. Calculate Loss: A «Loss Function» calculates the difference between Predicted and Expected values.
3. Update Weights: Weights are updated via «back-propagation» to reduce the loss.
@markawest
Updating Weights via Gradient Descent
[Plot: Loss as a function of Weight, with a START point descending a curve containing both a LOCAL MINIMUM and the GLOBAL MINIMUM]
• Goal: Increment Weight values to find those that give the lowest Loss values.
• Challenges:
  • No map.
  • Avoiding Local Minima.
  • Tuning the Learning Rate.
@markawest
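A toy sketch of the idea (the one-weight loss function and learning rate are invented for illustration): repeatedly step the weight against the gradient of the loss.

```python
# Minimise loss(w) = (w - 3)**2 with gradient descent.
def loss_gradient(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)**2

w = 0.0              # START: an arbitrary initial weight
learning_rate = 0.1  # too high risks overshooting minima; too low is slow

for step in range(50):
    w -= learning_rate * loss_gradient(w)  # move downhill on the loss curve

print(w)  # approaches the global minimum at w = 3.0
```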
Deep Learning 101 Summary
• An Artificial Neural Network (ANN) is a set of interconnected Nodes.
• Nodes are organised into Layers (Input, Hidden and Output).
• Deep Learning refers to ANNs with multiple Hidden Layers.
• Inputs to Nodes are amplified or dampened by Weights.
• Weights are tuned via Gradient Descent to optimise the ANN's predictions.
@markawest
Convolutional Neural Networks
@markawest
How computers «see» images
Each pixel is stored as a numeric intensity value, e.g. a 4 x 4 patch:

0.59 1.0  0.9  0.87
0.25 0.29 0.42 0.46
0.94 0.76 0.89 0.96
0.86 0.93 0.88 0.85

• Greyscale (28 x 28 x 1): 28 pixels x 28 pixels, 1 Channel.
• RGB/Colour (28 x 28 x 3): 28 pixels x 28 pixels, 3 Channels.
• Extra «channels» increase complexity.
@markawest
Example «Vanilla» ANN Configuration
• INPUT LAYER: a 640 x 640 x 3 image gives 1 228 800 input Nodes.
• HIDDEN LAYER: 1 048 Nodes.
• WEIGHTED CONNECTIONS: 1 228 800 x 1 048 = 1 287 782 400 weights between the Input and Hidden Layers.
@markawest
Another Approach: Convolutional Filters

Vertical Edge Detection:    Horizontal Edge Detection:
-1  0 +1                    +1 +2 +1
-2  0 +2                     0  0  0
-1  0 +1                    -1 -2 -1

• Detect and amplify specific features in images.
• Reusable across images.
• Are weights that are learned and optimised via back-propagation during training.
• A CNN will have many convolutional filters, where each one learns to detect a specific feature.
@markawest
Applying Convolutional Filters
The Conv. Filter (3 x 3 Pixels, 1 x 1 Stride) is slid across the Original Image; at each position the overlapping values are multiplied and summed to give one cell of the Feature Map.

Original Image:   Conv. Filter:   Feature Map:
0 0 0 0 0 0       -1  0 +1        3 0 0 -3
0 1 1 1 1 0       -2  0 +2        4 0 0 -4
0 1 1 1 1 0       -1  0 +1        4 0 0 -4
0 1 1 1 1 0                       3 0 0 -3
0 1 1 1 1 0
0 0 0 0 0 0
@markawest
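A small NumPy sketch of this sliding-window operation (illustrative, not the talk's code); it reproduces the Feature Map above:

```python
import numpy as np

# Original 6 x 6 image: a block of 1s surrounded by 0s.
image = np.zeros((6, 6))
image[1:5, 1:5] = 1.0

# Vertical edge detection filter from the slides.
conv_filter = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])

def apply_filter(img, k, stride=1):
    kh, kw = k.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            # Multiply the overlapping values and sum them.
            patch = img[y * stride:y * stride + kh, x * stride:x * stride + kw]
            out[y, x] = (patch * k).sum()
    return out

print(apply_filter(image, conv_filter))  # the 4 x 4 Feature Map above
```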
Applying ReLU Activation Function
ReLU sets every negative value in the Feature Map to 0:

Feature Map:   New Feature Map:
3 0 0 -3       3 0 0 0
4 0 0 -4       4 0 0 0
4 0 0 -4       4 0 0 0
3 0 0 -3       3 0 0 0
@markawest
Dimensionality Reduction via Pooling
The Pool (2 x 2 Pixels, 2 x 2 Stride) reduces each 2 x 2 block of the Feature Map to a single value:

Feature Map:   Option 1: Max Pooling   Option 2: Average Pooling   Option 3: Sum Pooling
3 0 0 0        4 0                     1.75 0                      7 0
4 0 0 0        4 0                     1.75 0                      7 0
4 0 0 0
3 0 0 0
@markawest
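All three pooling options are the same sliding window with a different reducer; a minimal sketch (names are illustrative, not from the talk):

```python
import numpy as np

feature_map = np.array([[3, 0, 0, 0],
                        [4, 0, 0, 0],
                        [4, 0, 0, 0],
                        [3, 0, 0, 0]], dtype=float)

def pool(fm, size=2, stride=2, reducer=np.max):
    out_h = (fm.shape[0] - size) // stride + 1
    out_w = (fm.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            block = fm[y * stride:y * stride + size, x * stride:x * stride + size]
            out[y, x] = reducer(block)  # max, mean or sum of the block
    return out

print(pool(feature_map, reducer=np.max))   # [[4. 0.] [4. 0.]]
print(pool(feature_map, reducer=np.mean))  # [[1.75 0.] [1.75 0.]]
print(pool(feature_map, reducer=np.sum))   # [[7. 0.] [7. 0.]]
```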
Feature Identification and Dimensionality Reduction
Feature Map Processing:

Original Image:   1. Post Conv Filter:   2. Post ReLU:   3. Post Max Pooling:
0 0 0 0 0 0       3 0 0 -3               3 0 0 0         4 0
0 1 1 1 1 0       4 0 0 -4               4 0 0 0         4 0
0 1 1 1 1 0       4 0 0 -4               4 0 0 0
0 1 1 1 1 0       3 0 0 -3               3 0 0 0
0 1 1 1 1 0
0 0 0 0 0 0
@markawest
Use Case: Fashion MNIST
• 70 000 labelled images of clothing.
• Each image: 28 x 28 greyscale pixels.
• Distributed evenly across 10 classes.
• Goal: Train a CNN that can identify which class a given image belongs to.
@markawest
CNN Implementation
@markawest
https://github.com/markwest1972/CNN-Example-Google-Colaboratory
CNN Architecture Overview
INPUT → CONV → MAXPOOLING → DROPOUT → FLATTEN → DENSE → DENSE (OUTPUT)
• INPUT: Entry point for the Input Image (28 x 28 x 1).
• CONV: Learn a set of Convolutional Filters that generate Feature Maps for each Input Image.
• MAXPOOLING: Dimensionality Reduction of the Feature Maps.
• DROPOUT: Randomly deactivate nodes during training to fight overfitting.
• FLATTEN: Convert data from 3D to 1D for processing by Dense Layers.
• DENSE: Correlate Feature Maps to a given Class.
• DENSE (OUTPUT): Report probabilities for the 10 target Classes.
@markawest
CONV
• Input: 28 x 28 x 1 array of Pixel Values.
• Applies 32 Convolutional Filters to the Input Image.
• Result: a 3D array containing 32 Feature Maps.
@markawest
MAXPOOLING
• 50% reduction in size of Feature Maps via Max Pooling.
• Input: 32 Feature Maps of 26 x 26 pixels. Output: 32 Feature Maps of 13 x 13 pixels.
@markawest
Training and Testing the CNN
• Train the CNN with the Training Data, updating weights via Back Propagation every 256 records.
• Repeat 10 times (also known as epochs).
• Test the CNN with the Test Data and score based on accuracy (match between actual class vs. predicted class).
Fashion MNIST Dataset (70 000): Training Data (60 000) and Test Data (10 000).
@markawest
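The linked notebook has the real implementation; as a hedged Keras sketch consistent with the slides (the 64-unit Dense layer and the dropout rate are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Fashion MNIST: 60 000 training and 10 000 test images of 28 x 28 pixels.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # INPUT: 28 x 28 x 1 pixels
    layers.Conv2D(32, (3, 3), activation="relu"),  # CONV: 32 filters -> 26 x 26 maps
    layers.MaxPooling2D((2, 2)),                   # MAXPOOLING: 26 x 26 -> 13 x 13
    layers.Dropout(0.25),                          # DROPOUT: rate is an assumption
    layers.Flatten(),                              # FLATTEN: 3D -> 1D
    layers.Dense(64, activation="relu"),           # DENSE: size is an assumption
    layers.Dense(10, activation="softmax"),        # DENSE: 10 class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=256, epochs=10)  # 256 records per update
model.evaluate(x_test, y_test)                          # accuracy on the Test Data
```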
CNN Test Results
• Precision: Of all predictions of a given class, the percentage that were correct.
• Recall: Of all actual members of a class, the percentage that were correctly identified.
• F1-score: Harmonic mean of recall and precision.
• Accuracy: Percentage of correct predictions overall.
@markawest
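For reference (an assumption, not shown in the talk), scikit-learn's classification_report prints all four of these metrics in one call:

```python
from sklearn.metrics import classification_report

# Hypothetical actual vs. predicted labels for a handful of test images.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Per-class precision, recall and f1-score, plus overall accuracy.
print(classification_report(y_true, y_pred))
```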
Some examples of incorrectly classified images
@markawest
CNN Summary
• CNNs are spatially aware and are therefore particularly efficient for image processing.
• CNNs use learnable and reusable Convolutional Filters to identify features in images.
• CNNs require Dense Layers to map features to classes.
• Pooling is an optional step to reduce the dimensionality of input data.
• Understanding the internal representations of ANNs can give insight into how they can be improved.
@markawest
Recurrent Neural Networks
@markawest
Which way is the arrow moving?
@markawest
Did this person enjoy the movie?
«If you have never read the book you might like this movie. If you have read the book (as I have) then you will hate it.»
@markawest
A question of time!
«Vanilla» ANNs (such as MLPs) are unable to handle sequential input!
[Diagram: a feed-forward ANN with INPUT, HIDDEN and OUTPUT layers]
@markawest
Recurrent Neural Networks to the Rescue!
• Input Layer cycles through sequential inputs (i.e. words in a sentence) in the given order.
• Hidden Layer state is maintained between sequential inputs via a loop, also known as the RNN's «working» memory.
• Output Layer gives the final prediction after all sequential inputs have been processed.
@markawest
Unrolling a Recurrent Neural Network
[Diagram: an RNN (INPUT, HIDDEN and OUTPUT layers) processing inputs i[1-4] and producing output o4, unrolled into four identical ANNs sharing weights and hidden state, with inputs i[1]..i[4] and outputs o1..o4]
@markawest
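A bare-bones NumPy sketch of the unrolled loop (all sizes and weights are invented for illustration): one set of weights processes every input, and the hidden state carries the «working» memory between steps.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

# One set of weights, shared by every unrolled step.
W_xh = rng.normal(size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (the loop)
W_hy = rng.normal(size=(1, hidden_size))            # hidden -> output

sequence = [rng.normal(size=input_size) for _ in range(4)]  # i[1]..i[4]
h = np.zeros(hidden_size)  # hidden state, empty at the start

for x in sequence:
    # Each step mixes the current input with the previous hidden state.
    h = np.tanh(W_xh @ x + W_hh @ h)

o4 = W_hy @ h  # final prediction once the whole sequence is processed
print(o4)
```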
The Limitations of «Short-Term» Memory
[Diagram: each word's influence on the RNN output, for the sentence “the trailers were the best part of the whole movie.” The earliest words («the trailers») have the least influence by the time the final output is produced.]
@markawest
Long Short-Term Memory (LSTM) Networks
[Diagram: the unrolled RNN from the previous slide (identical ANNs sharing weights and hidden state), where each step now passes on both «Short Term Memory» and «Long Term Memory»]
@markawest
An LSTM Node
[Diagram: INPUT (n) is concatenated with SHORT TERM (n-1); the FORGET GATE decides what to drop from LONG TERM (n-1), the INPUT GATE decides what to add to produce LONG TERM (n), and the OUTPUT GATE produces OUTPUT (n) and SHORT TERM (n), which are passed on towards INPUT (n+1)]
@markawest
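The textbook LSTM cell equations map directly onto the gates in the diagram; a compact NumPy sketch (sizes arbitrary, biases omitted for brevity, not the talk's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_input, n_hidden = 4, 8
n_concat = n_hidden + n_input  # short-term state concatenated with the input

# One weight matrix per gate: forget, input, candidate and output.
Wf, Wi, Wc, Wo = (rng.normal(size=(n_hidden, n_concat)) for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])        # concat(SHORT TERM (n-1), INPUT (n))
    f = sigmoid(Wf @ z)                    # FORGET GATE: what to drop from long-term memory
    i = sigmoid(Wi @ z)                    # INPUT GATE: what to write to long-term memory
    c = f * c_prev + i * np.tanh(Wc @ z)   # LONG TERM (n)
    o = sigmoid(Wo @ z)                    # OUTPUT GATE: what to expose
    h = o * np.tanh(c)                     # SHORT TERM (n) / OUTPUT (n)
    return h, c

h = c = np.zeros(n_hidden)
for x in [rng.normal(size=n_input) for _ in range(4)]:
    h, c = lstm_step(x, h, c)
print(h)
```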
Use Case: IMDB Reviews
• 50 000 IMDB Reviews, each one classified as a "positive" or "negative" review.
• Reviews encoded as a sequence of integers, with all punctuation removed.
• Each integer represents that word's overall frequency in the dataset.
• Goal: Train an LSTM that can accurately classify a movie review.
IMDB Reviews Dataset (50 000): Training Data (25 000; 12 500 Positive, 12 500 Negative) and Test Data (25 000; 12 500 Positive, 12 500 Negative).
@markawest
Example Positive Review
ENCODED:
1, 13, 296, 4, 20, 11, 6, 4435, 5, 13, 66, 447, 12, 4, 177, 9, 321, 5, 4, 114, 9, 518, 427, 642, 160, 2468, 7, 4, 20, 9, 407, 4, 228, 63, 2363, 80, 30, 626, 515, 13, 386, 12, 8, 316, 37, 1232, 4, 698, 1285, 5, 262, 8, 32, 5247, 140, 5, 67, 45, 87

DECODED:
<START> i watched the movie in a preview and i really loved it the cast is excellent and the plot is sometimes absolutely hilarious another highlight of the movie is definitely the music which hopefully will be released soon i recommend it to everyone who likes the british humour and especially to all musicians go and see it's great
@markawest
Pre-Processing Dataset
• Reduced dataset vocabulary to the most popular 10 000 words.
• Padded / Truncated each review so that all are 500 words long.

Example of a padded review:
<PAD> <PAD> <PAD> … (repeated until the review reaches 500 words) … <PAD> <START> i saw this at the <UNKNOWN> film festival it was awful every clichéd violent rich boy fantasy was on display you just knew how it was going to end especially with all the shots of the <UNKNOWN> wife and the rape of the first girl br br the worst part was the q a with the director writer and writer producer they tried to come across as <UNKNOWN> but you could tell they're the types that get off on violence i bet anything they frequent <UNKNOWN> and do drugs br br don't waste your time i had to keep my boyfriend from walking out of it
@markawest
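Keras provides both the dataset and the padding helper, so this pre-processing is a couple of calls (a sketch consistent with the slides):

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Keep only the 10 000 most frequent words; rarer words become <UNKNOWN>.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10_000)

# Pad short reviews with <PAD> (0) and truncate long ones, so all are 500 words.
x_train = pad_sequences(x_train, maxlen=500)
x_test = pad_sequences(x_test, maxlen=500)

print(x_train.shape)  # (25000, 500)
```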
Word Embedding for NLP
• Plots words in a multi-dimensional space.
• Different dimensions capture different linguistic relationships between words.
• The closer the words, the stronger the relationship.
• Pre-trained models available (GloVe, fastText, Word2Vec).
Source: www.shanelynn.ie
@markawest
LSTM Implementation
@markawest
https://github.com/markwest1972/LSTM-Example-Google-Colaboratory
LSTM Architecture Overview
INPUT → EMBEDDING → DROPOUT → LSTM → DROPOUT → DENSE (OUTPUT)
• INPUT: Accept a list of 500 integers, each representing a word of a truncated / padded movie review.
• EMBEDDING: Learns a 32 dimensional Word Embedding for all words in the training set.
• DROPOUT: Randomly deactivate nodes during training to fight overfitting.
• LSTM: Learn to differentiate between positive and negative reviews based on word order and word meaning.
• DROPOUT: Randomly deactivate nodes during training to fight overfitting.
• DENSE (OUTPUT): Use a Sigmoid Activation Function to return a value between 0 (negative) and 1 (positive).
@markawest
EMBEDDING
Learn a 32 dimensional Word Embedding for all 10 000 unique words in the training set. Table coordinates are updated via Back Propagation.

Words x Dimensions:
            1     2     3     4    5    ...  32
1          -0.1   1.1   7.3   2.3  4.5  ...  4.9
2           1.0   9.8   2.2  -4.3  6.7  ...  8.7
3           9.6   7.7  -6.9   1.3  0.3  ... -0.9
...         ...   ...   ...   ...  ...  ...  ...
10 000      0    -1.1   3.3   9.9  6    ... -1.0
@markawest
LSTM
Learn to differentiate between positive and negative reviews based on word order and word meaning. Each integer in the encoded review (1, 13, 296, 4, 20, 11, 6, 4435, 5, 13, 66, 447, 12, ...) is looked up in the Word Embedding table (10 000 words x 32 dimensions) before being processed in sequence.
@markawest
Training and Testing the LSTM
• Train the LSTM with the Training Data, updating weights via Back Propagation every 256 records.
• Repeat 3 times (also known as epochs).
• Test the LSTM with the Test Data and score based on accuracy (match between actual class vs. predicted class).
IMDB Reviews Dataset (50 000): Training Data (25 000; 12 500 Positive, 12 500 Negative) and Test Data (25 000; 12 500 Positive, 12 500 Negative).
@markawest
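Again, the linked notebook is authoritative; a hedged Keras sketch of the architecture above (the LSTM size and dropout rates are assumptions, the rest follows the slides):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(500,)),             # INPUT: 500 integers per review
    layers.Embedding(10_000, 32),           # EMBEDDING: 10 000 words x 32 dimensions
    layers.Dropout(0.25),                   # DROPOUT: rate is an assumption
    layers.LSTM(100),                       # LSTM: 100 units is an assumption
    layers.Dropout(0.25),                   # DROPOUT: rate is an assumption
    layers.Dense(1, activation="sigmoid"),  # DENSE: 0 (negative) .. 1 (positive)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# With x_train / y_train from the pre-processing sketch earlier:
# model.fit(x_train, y_train, batch_size=256, epochs=3)
# model.evaluate(x_test, y_test)
```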
LSTM Test Results
• Precision: Of all predictions of a given class, the percentage that were correct.
• Recall: Of all actual members of a class, the percentage that were correctly identified.
• F1-score: Harmonic mean of recall and precision.
• Accuracy: Percentage of correct predictions overall.
@markawest
Example of an incorrectly classified review
«<START> i was very disappointed when this show was canceled although i can not vote i live on the island of i sat down to see the show on <UNKNOWN> and was very surprised that it didn't aired the next day i read on the internet that it was canceled br br it's true not everyone was as much talented as the other but there were very talented people singing br br i find it very sad for them br br that they worked so hard and there dreams came <UNKNOWN> down br br its a pity br br»
Predicted: Negative, Actual: Positive
@markawest
RNN/LSTM Summary
• RNNs handle sequential data by persisting the hidden state via short-term or «working» memory.
• LSTMs extend RNNs by adding long-term memory.
• Word Embedding is a powerful technique for modelling semantic relationships between words.
• Pre-processing of data is vital for securing good results.
• Precision, Recall, F1-score and Accuracy are different metrics for measuring model performance.
@markawest
Further Reading (Blog articles and code)
BLOG SERIES
1. https://www.bouvet.no/bouvet-deler/an-introduction-to-deep-learning
2. https://www.bouvet.no/bouvet-deler/understanding-convolutional-neural-networks-part-1
3. https://www.bouvet.no/bouvet-deler/understanding-convolutional-neural-networks-part-2
4. https://www.bouvet.no/bouvet-deler/explaining-recurrent-neural-networks
CNN & RNN IMPLEMENTATIONS
• https://github.com/markwest1972/CNN-Example-Google-Colaboratory
• https://github.com/markwest1972/LSTM-Example-Google-Colaboratory
@markawest
Thanks for listening!
@markawest