Neural Networks
• AI summer is here!
• In the last year NNs have
  – Continued state-of-the-art advancements in image and speech recognition
  – Beaten a human player in Go
  – Provided some quantification of “art”
How does your brain work?
• 100,000,000,000 neurons
• 10,000 dendritic inputs per neuron
• 1 electrical output
How to learn the weights?
• If we know what the output should look like, we can compute the error and update the weights to minimize it
  – Optimization problem, typically solved with gradient descent
[Figure: network output compared against the correct output to produce an error signal]
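In symbols (a standard textbook formulation, not taken verbatim from these slides), with target $y$, prediction $\hat{y}$, and learning rate $\eta$:

```latex
E = \tfrac{1}{2}\,(y - \hat{y})^{2},
\qquad
w \leftarrow w - \eta \,\frac{\partial E}{\partial w}
```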
Gradient descent
• Given a cost function
  – MSE
  – Cross-entropy
  – etc.
• Can take a step in the opposite direction of the cost gradient by computing its derivative w.r.t. the weights
• Scale by the learning rate (tiny step)
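A minimal runnable sketch of the idea (the linear model, data, and all names here are illustrative assumptions, not from the slides): fit the weights of a linear model by repeatedly stepping opposite the gradient of a squared-error cost.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy inputs (assumed for illustration)
true_w = np.array([2.0, -1.0, 0.5])    # weights we hope to recover
y = X @ true_w                         # targets

w = np.zeros(3)                        # initial weights
lr = 0.1                               # learning rate: the "tiny step"
for _ in range(200):
    err = X @ w - y                    # prediction error
    grad = X.T @ err / len(y)          # gradient of 1/2 * MSE w.r.t. w
    w -= lr * grad                     # step opposite the gradient
print(w)                               # close to true_w after training
```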
A brief history of neural networks: The Perceptron
~1960: “The perceptron”, a universal function approximator
AND truth table:

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1
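As a concrete illustration (the weight and bias values below are hand-picked assumptions, not learned and not from the slides), a single perceptron with a step activation can compute AND because one line separates the positive case from the rest:

```python
import numpy as np

w = np.array([1.0, 1.0])   # one weight per input (hand-picked)
b = -1.5                   # bias acting as a threshold

def perceptron(x):
    return int(np.dot(w, x) + b > 0)   # step activation

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x)))  # -> 0, 0, 0, 1 (the AND table)
```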
A brief history of neural networks: The Perceptron
~1960: “The perceptron”, a universal function approximator
XOR truth table:

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

…but only if the function is linearly separable (XOR is not)
A brief history of neural networks: Next ~30 years
• Neural network research halts (AI winter)
• Meanwhile…
  – Support Vector Machine (SVM) invented, solves non-linear problems
• Shift toward separation of feature representation and classification
  – Handcraft the best features, train the SVM (or the current state-of-the-art) to do the classification
• Eventually, the multi-layer perceptron generalization is realized and solves non-linear problems (see the sketch below)
  – Nobody cares…
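A sketch of that generalization (weights hand-wired for illustration; nothing here is from the slides): one hidden layer computing OR and AND lets a second layer express XOR, which no single perceptron can.

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)         # perceptron-style activation

W1 = np.array([[1.0, 1.0],             # hidden unit 1: OR(x1, x2)
               [1.0, 1.0]])            # hidden unit 2: AND(x1, x2)
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -2.0])             # output: OR and not AND
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)    # hidden layer
    y = int(W2 @ h + b2 > 0)           # output layer
    print(x, y)                        # -> 0, 1, 1, 0 (the XOR table)
```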
h"ps://www.youtube.com/watch?v=3liCbRZPrZA
• Discovering good features is hard!
  – Requires a lot of domain knowledge
  – State of the art in computer vision was the culmination of years of collaboration between computer vision scientists, neuroscientists, etc.
• Neural networks automatically learn features (weights) from examples based on the task
  – Each neuron is a “feature detector” that activates in proportion to how well its input matches its weights (see the sketch below)
  – Deep learning: shift back from hand-crafted features to features learned from the task
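A toy sketch of the “feature detector” intuition (all values assumed for illustration): a unit's pre-activation w·x is largest when the input pattern lines up with its weight pattern, so matching inputs drive the strongest response.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)            # the unit's (pretend-learned) feature
relu = lambda z: np.maximum(z, 0.0)

matching = w.copy()                # input identical to the feature
unrelated = rng.normal(size=16)    # input with no particular structure

print(relu(w @ matching))          # strong activation
print(relu(w @ unrelated))         # typically much weaker
```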
General learning methods for robust feature representation and classification
[Figure: deep network diagram with layers Hidden 1, Hidden 2, Hidden 3]
A brief history of neural networks: Deep learning bandwagon
• Handful of researchers still toiling away on neural networks with little-to-no recognition
  – 2012: one grad student studying how to implement neural networks on GPUs submits the first “deep learning” architecture to an image recognition challenge, wins by a landslide
  – 2013: almost every submission is a deep neural network executed on a GPU (continuing trend)
AlexNet: first deep neural network
• 8 layers
• 650,000 “neurons” (units)
• 60,000,000 learned parameters
• 630,000,000 connections
• Uses the same basic algorithm as the multi-layer perceptron to learn weights
• Finally caught on because
  – Can do it “fast” (~1 week in 2012) thanks to GPU-based computation
  – Actually works, and with less overfitting, due to tricks and massive amounts of data
[Figure: 96 11x11 pixel filter weights learned from ImageNet by AlexNet]
[Figure: handcrafted textons]
[Figure: classifications of unseen images]
Neural Networks in 2016
• Variety of libraries that specify inputs as tensor minibatches and automatically compute gradients (see the sketch below)
  – Tensorflow
  – Theano (Keras/Lasagne)
  – Torch
• Libraries also available for common neural network layer types
  – Convolutional, activation, pooling, dropout, RNN, etc.
• Almost too easy
  – Mind the danger zone!
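For a flavor of how little code these libraries require (a minimal sketch using Keras's Sequential API; the layer sizes are arbitrary assumptions and exact argument names vary across Keras versions):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

# Define the model layer by layer; the library builds the compute graph.
model = Sequential([
    Dense(128, input_shape=(784,)),  # fully connected layer
    Activation('relu'),              # nonlinearity
    Dropout(0.5),                    # regularization
    Dense(10),                       # one output per class
    Activation('softmax'),
])

# Compiling wires up the loss and its gradients automatically; no
# hand-derived derivatives anywhere.
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=32)  # minibatch gradient descent
```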
Data science due diligence
“Neural Networks sound awesome and will solve all our problems!”
• Significant investment in resources: GPU (TPU?) cluster, ramp-up on niche/rapidly-evolving tools
• Long feedback loop for architecture improvement: typically launch many jobs and terminate bad models (see above)
• Need a lot of high-dimensional data with variability (millions of unique observations and/or heavy data augmentation): delicate balance between increased predictive power and overfitting
• Hard to debug when not working: millions of reasons (literally) a model can be wrong, few ways it can be right. “Black magic”
• Deep nonlinear models suffer from interpretability issues: black-box models (although this is an active research area)
Thanks
Manuel Ruder, Alexey Dosovitskiy, Thomas Brox (2016). Artistic style transfer for videos. http://arxiv.org/abs/1604.08610
https://www.youtube.com/watch?v=Khuj4ASldmU
“I am comfortable with the SciPy stack and want to understand more”
A Neural Network in 11 lines of Python: http://iamtrask.github.io/2015/07/12/basic-python-network/
“I am comfortable with ML libraries and want to build a model”
MNIST
• Keras: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
• Tensorflow: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html
Variational Autoencoders (also using MNIST)
• Keras: http://blog.keras.io/building-autoencoders-in-keras.html
• Tensorflow: https://jmetzen.github.io/2015-11-27/vae.html