Neural Networks
• AI summer is here!
• In the last year NNs have
  – Continued state-of-the-art advancements in image and speech recognition
  – Beaten a human player in Go
  – Provided some quantification of “art”
How does your brain work?
• 100,000,000,000 neurons
• 10,000 dendritic inputs per neuron
• 1 electrical output
How to learn the weights?
• If we know what the output should look like, we can compute the error and update the weights to minimize it
  – Optimization problem, typically solved with gradient descent
[Figure: network output compared against the correct output to produce an error signal]
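In symbols (a standard textbook formulation, not taken verbatim from these slides), with target $y$, prediction $\hat{y}$, and learning rate $\eta$:

```latex
E = \tfrac{1}{2}\,(y - \hat{y})^{2},
\qquad
w \leftarrow w - \eta \,\frac{\partial E}{\partial w}
```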
Gradient descent
• Given a cost function
  – MSE
  – Cross-entropy
  – etc.
• Can take a step in the opposite direction of the cost gradient by computing its derivative w.r.t. the weights
• Scale by the learning rate (tiny step)
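A minimal runnable sketch of the idea (the linear model, data, and all names here are illustrative assumptions, not from the slides): fit the weights of a linear model by repeatedly stepping opposite the gradient of a squared-error cost.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy inputs (assumed for illustration)
true_w = np.array([2.0, -1.0, 0.5])    # weights we hope to recover
y = X @ true_w                         # targets

w = np.zeros(3)                        # initial weights
lr = 0.1                               # learning rate: the "tiny step"
for _ in range(200):
    err = X @ w - y                    # prediction error
    grad = X.T @ err / len(y)          # gradient of 1/2 * MSE w.r.t. w
    w -= lr * grad                     # step opposite the gradient
print(w)                               # close to true_w after training
```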
A brief history of neural networks: The Perceptron
~1960: “The perceptron”, a universal function approximator
AND truth table:

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1
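As a concrete illustration (the weight and bias values below are hand-picked assumptions, not learned and not from the slides), a single perceptron with a step activation can compute AND because one line separates the positive case from the rest:

```python
import numpy as np

w = np.array([1.0, 1.0])   # one weight per input (hand-picked)
b = -1.5                   # bias acting as a threshold

def perceptron(x):
    return int(np.dot(w, x) + b > 0)   # step activation

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x)))  # -> 0, 0, 0, 1 (the AND table)
```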
A brief history of neural networks: The Perceptron
~1960: “The perceptron”, a universal function approximator
XOR truth table:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

…but only if the function is linearly separable (XOR is not)
A brief history of neural networks: Next ~30 years
• Neural network research halts (AI winter)
• Meanwhile…
  – Support Vector Machine (SVM) invented, solves non-linear problems
• Shift toward separation of feature representation and classification
  – Handcraft the best features, train the SVM (or the current state-of-the-art) to do the classification
• Eventually, the multi-layer perceptron generalization is realized and solves non-linear problems (see the sketch below)
  – Nobody cares…
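A sketch of that generalization (weights hand-wired for illustration; nothing here is from the slides): one hidden layer computing OR and AND lets a second layer express XOR, which no single perceptron can.

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)         # perceptron-style activation

W1 = np.array([[1.0, 1.0],             # hidden unit 1: OR(x1, x2)
               [1.0, 1.0]])            # hidden unit 2: AND(x1, x2)
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])             # output: OR and not AND
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)    # hidden layer
    y = int(W2 @ h + b2 > 0)           # output layer
    print(x, y)                        # -> 0, 1, 1, 0 (the XOR table)
```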
h"ps://www.youtube.com/watch?v=3liCbRZPrZA
• Discovering good features is hard!
  – Requires a lot of domain knowledge
  – State of the art in computer vision was the culmination of years of collaboration between computer vision scientists, neuroscientists, etc.
• Neural networks automatically learn features (weights) from examples based on the task
  – Each neuron is a “feature detector” that activates in proportion to how well its input matches its weights (see the sketch below)
  – Deep learning: shift back from hand-crafted features to features learned from the task
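A toy sketch of the “feature detector” intuition (all values assumed for illustration): a unit's pre-activation w·x is largest when the input pattern lines up with its weight pattern, so matching inputs drive the strongest response.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)            # the unit's (pretend-learned) feature
relu = lambda z: np.maximum(z, 0.0)

matching = w.copy()                # input identical to the feature
unrelated = rng.normal(size=16)    # input with no particular structure

print(relu(w @ matching))          # strong activation
print(relu(w @ unrelated))         # typically much weaker
```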
General learning methods for robust feature representation and classification
[Figure: deep network diagram with layers Hidden 1, Hidden 2, Hidden 3]
A brief history of neural networks: Deep learning bandwagon
• Handful of researchers still toiling away on neural networks with little-to-no recognition
  – 2012: one grad student studying how to implement neural networks on GPUs submits the first “deep learning” architecture to an image recognition challenge, wins by a landslide
  – 2013: almost every submission is a deep neural network executed on a GPU (continuing trend)
AlexNet: first deep neural network
• 8 layers
• 650,000 “neurons” (units)
• 60,000,000 learned parameters
• 630,000,000 connections
• Uses the same basic algorithm as the multi-layer perceptron to learn weights
• Finally caught on because
  – Can do it “fast” (~1 week in 2012) thanks to GPU-based computation
  – Actually works, and with less overfitting, due to tricks and massive amounts of data
[Figure: 96 11x11 pixel filter weights learned from ImageNet by AlexNet]
[Figure: handcrafted textons]
[Figure: classifications of unseen images]
Neural Networks in 2016
• Variety of libraries that specify inputs as tensor minibatches and automatically compute gradients (see the sketch below)
  – Tensorflow
  – Theano (Keras/Lasagne)
  – Torch
• Libraries also available for common neural network layer types
  – Convolutional, activation, pooling, dropout, RNN, etc.
• Almost too easy
  – Mind the danger zone!
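For a flavor of how little code these libraries require (a minimal sketch using Keras's Sequential API; the layer sizes are arbitrary assumptions and exact argument names vary across Keras versions):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

# Define the model layer by layer; the library builds the compute graph.
model = Sequential([
    Dense(128, input_shape=(784,)),  # fully connected layer
    Activation('relu'),              # nonlinearity
    Dropout(0.5),                    # regularization
    Dense(10),                       # one output per class
    Activation('softmax'),
])

# Compiling wires up the loss and its gradients automatically; no
# hand-derived derivatives anywhere.
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=32)  # minibatch gradient descent
```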
Data science due diligence
“Neural Networks sound awesome and will solve all our problems!”
• Significant investment in resources: GPU (TPU?) cluster, ramp-up on niche/rapidly-evolving tools
• Long feedback loop for architecture improvement: typically launch many jobs and terminate bad models (see above)
• Need a lot of high-dimensional data with variability (millions of unique observations and/or heavy data augmentation): delicate balance between increased predictive power and overfitting
• Hard to debug when not working: millions of reasons (literally) a model can be wrong, few ways it can be right. “Black magic”
• Deep nonlinear models suffer from interpretability issues: black-box models (although this is an active research area)
Thanks
Manuel Ruder, Alexey Dosovitskiy, Thomas Brox (2016). Artistic style transfer for videos. http://arxiv.org/abs/1604.08610
https://www.youtube.com/watch?v=Khuj4ASldmU
“I am comfortable with the SciPy stack and want to understand more”
A Neural Network in 11 lines of Python: http://iamtrask.github.io/2015/07/12/basic-python-network/
“I am comfortable with ML libraries and want to build a model”
MNIST
• Keras: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
• Tensorflow: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html
Variational Autoencoders (also using MNIST)
• Keras: http://blog.keras.io/building-autoencoders-in-keras.html
• Tensorflow: https://jmetzen.github.io/2015-11-27/vae.html