© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services
[email protected], 2017
Deep Learning: Basics and Intuition
Deductive Logic
How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?
https://en.wikiquote.org/wiki/Sherlock_Holmes
Deductive Logic
P | Q | P ∧ Q | P ∨ Q | P ∴ Q
T | T |   T   |   T   |   T
T | F |   F   |   T   |   F
F | T |   F   |   T   |   T
F | F |   F   |   F   |   T
• 𝑃 = 𝑇 ∧ 𝑄 = 𝑇 ∴ 𝑃 ∧ 𝑄 = 𝑇
• 𝑃 ∧ 𝑄 ∴ 𝑃 → 𝑄; ∼𝑃 ∴ 𝑃 → 𝑄
• Modus ponens:
  P → Q
  P
  _____
  ∴ Q
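Truth tables like the one above can be checked mechanically. A minimal Python sketch (the function name `truth_table` is my own) that enumerates every assignment of P and Q and tabulates the connectives:

```python
from itertools import product

def truth_table():
    """Return rows (P, Q, P∧Q, P∨Q, P→Q) for every truth assignment."""
    return [(P, Q, P and Q, P or Q, (not P) or Q)
            for P, Q in product([True, False], repeat=2)]

for row in truth_table():
    print(row)
```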
Complexities of Deductive Logic
Predicates (n) | # of rows in truth table (2^n)
1              | 2
2              | 4
3              | 8
…              | …
10             | 1,024
20             | 1,048,576
30             | 1,073,741,824
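The row counts follow directly from the fact that each additional predicate doubles the number of assignments, i.e. 2^n rows for n predicates. A quick sketch (the function name is my own):

```python
def truth_table_rows(n):
    """A truth table over n predicates needs 2**n rows: each new predicate doubles it."""
    return 2 ** n

for n in (1, 2, 3, 10, 20, 30):
    print(n, truth_table_rows(n))
```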
[Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ F
• Useless for dealing with partial information.
• Useless for dealing with non-deterministic inference.
• We humans perform very poorly when evaluating complex logical statements intuitively.
http://bit.ly/2rwtb0o
Search Trees and Graphs
http://aima.cs.berkeley.edu/
Neurobiology
Perceptron
f(xᵢ, wᵢ) = Φ(b + Σᵢ wᵢ · xᵢ)
Φ(x) = 1 if x ≥ 0.5; 0 if x < 0.5
Example of Perceptron
I1 = I2 = B1 = 1:    O1 = (1 × 1) + (1 × 1) + (−1.5 × 1) = 0.5  ∴ Φ(O1) = 1
I2 = 0; I1 = B1 = 1: O1 = (1 × 1) + (1 × 0) + (−1.5 × 1) = −0.5 ∴ Φ(O1) = 0
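The two evaluations above are exactly an AND gate. A small Python sketch of the perceptron with the slide's weights (1, 1), bias weight −1.5, and threshold activation (the function names are my own):

```python
def step(x):
    """Threshold activation Φ from the slides: 1 if x >= 0.5, else 0."""
    return 1 if x >= 0.5 else 0

def perceptron(inputs, weights, bias_weight):
    """f(x, w) = Φ(b + Σ_i w_i · x_i), with the bias input fixed at 1."""
    return step(bias_weight + sum(w * x for w, x in zip(weights, inputs)))

# The slide's unit: weights (1, 1) and bias weight -1.5 implement AND.
for i1, i2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(i1, i2, "->", perceptron((i1, i2), (1, 1), -1.5))
```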
Linearity and Non-Linear Solutions
P | Q | P ∧ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   F
[Plot: AND outputs in the P-Q plane: a single straight line separates the one true point from the three false points.]
P | Q | P ∧ Q | P ⊕ Q
T | T |   T   |   F
T | F |   F   |   T
F | T |   F   |   T
F | F |   F   |   F
[Plots: AND is linearly separable, but no single straight line can separate the true XOR points from the false ones.]
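A single perceptron cannot compute XOR, but a two-layer network can: one hidden unit computes OR, another computes NAND, and the output unit ANDs them. A hand-wired sketch (the weights here are illustrative choices, not from the slides):

```python
def step(x):
    """Threshold activation: 1 if x >= 0.5, else 0."""
    return 1 if x >= 0.5 else 0

def unit(inputs, weights, bias):
    return step(bias + sum(w * x for w, x in zip(weights, inputs)))

def xor(p, q):
    h_or = unit((p, q), (1, 1), 0.0)            # hidden unit 1: P OR Q
    h_nand = unit((p, q), (-1, -1), 2.0)        # hidden unit 2: NOT (P AND Q)
    return unit((h_or, h_nand), (1, 1), -1.5)   # output: AND of the hidden units

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, "->", xor(p, q))
```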
• A feedforward neural network is a biologically inspired classification algorithm.
• It consists of a (possibly large) number of simple, neuron-like processing units, organized in layers.
• Every unit in a layer is connected to all the units in the previous layer.
• Each connection has a weight; the weights encode the knowledge of the network.
• Data enters at the input layer and arrives at the output after passing through the hidden layers.
• There is no feedback between the layers.
Multi-Layer Perceptron
Fully-Connected Multi-Layer Feed-Forward Neural Network Topology
Activation Function (Φ)
• A well-trained ANN has weights that amplify the signal and dampen the noise.
• A bigger weight signifies a tighter correlation between a signal and the network's outcome.
• For any learning algorithm that uses weights, learning is the process of re-adjusting the weights and biases.
• Backpropagation is a popular training algorithm: it distributes the blame for the error among the contributing weights.
Training an ANN
Error Surface
https://commons.wikimedia.org/wiki/File:Error_surface_of_a_linear_neuron_with_two_input_weights.png
Gradient Descent
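Gradient descent walks down the error surface by repeatedly stepping against the gradient. A one-dimensional sketch on the toy error surface E(w) = (w − 3)², which has its minimum at w = 3 (the surface and hyperparameters are illustrative assumptions):

```python
def descend(w=0.0, learning_rate=0.1, steps=100):
    """Minimize E(w) = (w - 3)**2 by stepping against the gradient dE/dw = 2*(w - 3)."""
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient   # move down the steepest slope
    return w

print(descend())  # approaches the minimum at w = 3
```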
Learning Parameters
• Learning Rate: How far to move down in the direction of the steepest gradient.
• Online Learning: Weights are updated after each training sample. Often slow to learn.
• Batch Learning: Weights are updated after the whole training set. Often makes optimization hard.
• Mini-Batch: A combination of both: the training set is broken into smaller batches and the weights are updated after each mini-batch.
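The three schedules differ only in how much data is seen between weight updates. Mini-batching can be sketched like this (the helper name `minibatches` is my own):

```python
import random

def minibatches(data, batch_size):
    """Shuffle the training set and split it into mini-batches.

    batch_size = 1 corresponds to online learning, batch_size = len(data)
    to full-batch learning; values in between give mini-batch learning.
    """
    data = list(data)
    random.shuffle(data)
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

samples = list(range(10))
for batch in minibatches(samples, batch_size=4):
    print(batch)  # a weight update would happen once per batch here
```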
SGD Visualization
http://sebastianruder.com/optimizing-gradient-descent/
Overview of Learning Rate
Data Encoding
Source: Alec Radford
https://www.tensorflow.org/get_started/mnist/beginners
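The linked MNIST tutorial encodes the digit labels as one-hot vectors. A minimal sketch of that encoding:

```python
def one_hot(label, num_classes=10):
    """Encode a digit label as a one-hot vector: zeros except a 1 at the label's index."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

print(one_hot(3))  # 1.0 in position 3, zeros elsewhere
```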
Apache MXNet
Why Apache MXNet?
Most Open: Accepted into the Apache Incubator
Best on AWS: Optimized for deep learning on AWS (integration with AWS)
Amazon AI: Scaling With MXNet
[Charts: throughput vs. number of GPUs for Inception v3, ResNet, and AlexNet against the ideal linear speedup: 91% scaling efficiency at 16 GPUs, 88% at 256 GPUs.]
Building a Fully Connected Network in Apache MXNet
Training a Fully Connected Network in Apache MXNet
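The MXNet notebook demoed next builds and trains such a network on MNIST. As a library-free illustration of what that training loop does, here is a NumPy sketch of a one-hidden-layer fully connected network trained with backpropagation; the layer sizes, synthetic data, and hyperparameters are illustrative stand-ins, not the tutorial's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-ins for MNIST's 784 inputs / 128 hidden units / 10 classes.
n_in, n_hidden, n_out = 4, 16, 3

# Synthetic, linearly generated labels so the task is learnable.
X = rng.normal(size=(90, n_in))
y = (X @ rng.normal(size=(n_in, n_out))).argmax(axis=1)

W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)       # fully connected layer + ReLU
    return h, softmax(h @ W2 + b2)         # fully connected layer + softmax

lr = 0.5
for epoch in range(500):
    h, p = forward(X)
    # Gradient of the cross-entropy loss at the softmax output.
    d_out = p.copy()
    d_out[np.arange(len(y)), y] -= 1.0
    d_out /= len(y)
    # Backpropagation: distribute the blame for the error to each layer.
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (h > 0)         # gradient through the ReLU
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)
    # Gradient-descent update of all weights and biases.
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad

accuracy = (forward(X)[1].argmax(axis=1) == y).mean()
print("training accuracy:", accuracy)
```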
Demo Time
http://localhost:9999/notebooks/mxnet-notebooks/python/tutorials/mnist.ipynb
References
• How ANN predicts Stock Market Indices? by Vidya Sagar Reddy Gopala, Deepak Ramu
• Deep Learning, O'Reilly Media Inc.; ISBN: 9781491914250; by Josh Patterson and Adam Gibson
• Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks, CreateSpace Independent Publishing Platform; ISBN: 1505714346; by Jeff Heaton
• Coursera: Neural Networks for Machine Learning by Geoffrey Hinton, University of Toronto
• https://www.researchgate.net/figure/267214531_fig1_Figure-1-RNN-unfolded-in-time-Adapted-from-Sutskever-2013-with-permission
• www.yann-lecun.com
• http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
• https://github.com/cyrusmvahid/mxnet-notebooks/tree/master/python
Thank you!
Constantin Gonzalez
[email protected]