
Intro to Neural Networks

Dean Wyatte, Boulder Data Science
@drwyatte
June 9, 2016

Neural Networks

• AI summer is here!
• In the last year NNs have
  – Continued state-of-the-art advancements in image and speech recognition
  – Beaten a human player in Go
  – Provided some quantification of "art"

About me

How does your brain work?

• 100,000,000,000 neurons
• 10,000 dendritic inputs per neuron
• 1 electrical output

One simple abstraction

(Figure: dendritic input → synaptic weights → soma → axonal output)
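A minimal sketch (mine, not from the talk) of this abstraction as a single artificial neuron in NumPy, assuming a sigmoid nonlinearity at the soma:

```python
import numpy as np

def neuron(x, w, b=0.0):
    # dendritic inputs x are scaled by synaptic weights w and summed
    # at the soma (plus a bias b); the axonal output is a nonlinear
    # function of that sum (here, a sigmoid "firing rate")
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical example with 3 dendritic inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron(x, w))
```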

Digression into regression

• Linear regression
• Logistic regression
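As an illustrative sketch (my framing, not a slide): both regressions are the same weighted-sum computation as the neuron above, and logistic regression simply passes that sum through a sigmoid, which is why a single sigmoid neuron is logistic regression.

```python
import numpy as np

def linear_regression_predict(x, w, b):
    # prediction is just the weighted sum of inputs
    return np.dot(w, x) + b

def logistic_regression_predict(x, w, b):
    # same weighted sum, squashed to (0, 1) by a sigmoid
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))
```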

How to learn the weights?

• If we know what the output should look like, we can compute the error and update the weights to minimize it
  – Optimization problem, typically solved with gradient descent

Error = Correct output - Output

Gradient descent

• Given a cost function
  – MSE
  – Cross-entropy
  – etc.
• Can take a step in the opposite direction of the cost gradient by computing its derivative w.r.t. the weights
• Scale by a learning rate (tiny step)
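A minimal sketch of one gradient-descent step for a single sigmoid unit with an MSE cost; the function names, toy data, and learning rate are my assumptions, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_step(X, y, w, b, lr=0.1):
    # one gradient descent step for a single sigmoid unit with an MSE cost
    output = sigmoid(X @ w + b)            # current predictions, shape (n,)
    error = output - y                     # output minus correct output
    # chain rule: d(MSE)/dz = (output - y) * sigmoid'(z)
    grad_z = error * output * (1.0 - output)
    grad_w = X.T @ grad_z / len(y)         # average gradient w.r.t. weights
    grad_b = grad_z.mean()                 # average gradient w.r.t. bias
    # take a tiny step in the opposite direction of the gradient
    return w - lr * grad_w, b - lr * grad_b

# hypothetical toy data: 4 examples, 2 features (the AND function)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])
w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b = gradient_descent_step(X, y, w, b)
print(sigmoid(X @ w + b))  # predictions gradually move toward [0, 0, 0, 1]
```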

A brief history of neural networks: The Perceptron

~1960: "The perceptron", a universal function approximator

AND
x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1
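A small sketch of the classic perceptron learning rule fit to the AND table above (the zero initialization, learning rate, and epoch count are assumptions of mine):

```python
import numpy as np

# AND truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                          # a few passes over the data
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0    # threshold activation
        # perceptron rule: nudge weights toward the correct output
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```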

A brief history of neural networks: The Perceptron

~1960: "The perceptron", a universal function approximator
…but only if the function is linearly separable

XOR
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

A brief history of neural networks: The Perceptron

• Neural network research halts (AI winter)
• Meanwhile…
  – Support Vector Machine (SVM) invented, solves non-linear problems
• Shift toward separation of feature representation and classification
  – Handcraft the best features, train the SVM (or current state of the art) to do the classification
• Eventually, the multi-layer perceptron generalization is realized and solves non-linear problems (see the XOR sketch below)
  – Nobody cares…
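To make the multi-layer point concrete, here is a sketch (weights hand-picked by me, not learned) of a two-layer perceptron that computes XOR by combining an OR detector and an AND detector in a hidden layer:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def mlp_xor(x):
    # hidden unit 1 fires for OR(x1, x2); hidden unit 2 fires for AND(x1, x2);
    # the output fires for "OR but not AND", i.e. XOR
    W1 = np.array([[1, 1],      # OR detector
                   [1, 1]])     # AND detector
    b1 = np.array([-0.5, -1.5])
    h = step(W1 @ x + b1)
    w2 = np.array([1, -1])      # OR minus AND
    b2 = -0.5
    return step(w2 @ h + b2)

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, mlp_xor(np.array(x)))   # 0, 1, 1, 0
```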

A brief history of neural networks: Next ~30 years

https://www.youtube.com/watch?v=3liCbRZPrZA

Handcrafted artisanal features

• Discovering good features is hard!
  – Requires a lot of domain knowledge
  – The state of the art in computer vision was the culmination of years of collaboration between computer vision scientists, neuroscientists, etc.
• Neural networks automatically learn features (weights) from examples based on the task
  – Each neuron is a "feature detector" that activates proportionately to how well its input matches its weights
  – Deep learning: shift back from hand-crafted features to features learned from the task

General learning methods for robust feature representation and classification (figure: deep network with hidden layers 1, 2, and 3)

A brief history of neural networks: Deep learning bandwagon

• Handful of researchers still toiling away on neural networks with little-to-no recognition
  – 2012: one grad student studying how to implement neural networks on GPUs submits the first "deep learning" architecture to an image recognition challenge and wins by a landslide
  – 2013: almost every submission is a deep neural network executed on a GPU (continuing trend)

First deep neural network: AlexNet

• 8 layers
• 650,000 "neurons" (units)
• 60,000,000 learned parameters
• 630,000,000 connections
• Uses the same basic algorithm as the multi-layer perceptron to learn weights
• Finally caught on because
  – it can be done "fast" (~1 week in 2012) thanks to GPU-based computation
  – it actually works, with less overfitting, thanks to tricks and massive amounts of data

AlexNet: 96 11x11-pixel filter weights learned from ImageNet

Handcrafted textons

Unseen image classifications

Neural Networks in 2016

• Variety of libraries that specify inputs as tensor minibatches and automatically compute gradients
  – TensorFlow
  – Theano (Keras/Lasagne)
  – Torch
• Libraries also available for common neural network layer types
  – Convolutional, activation, pooling, dropout, RNN, etc. (see the sketch below)
• Almost too easy
  – Mind the danger zone!
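For illustration, a minimal sketch of how those layer types stack, written against the present-day tf.keras API rather than the 2016-era libraries named above; the layer sizes are arbitrary choices of mine:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(28, 28, 1)),   # convolutional + activation
    layers.MaxPooling2D((2, 2)),              # pooling
    layers.Dropout(0.25),                     # dropout
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # classifier output
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```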

Data science due diligence

"Neural networks sound awesome and will solve all our problems!"

• Significant investment in resources: GPU (TPU?) cluster, ramp-up on niche, rapidly evolving tools
• Long feedback loop for architecture improvement: typically launch many jobs and terminate bad models (see above)
• Need a lot of high-dimensional data with variability (millions of unique observations and/or heavy data augmentation); delicate balance of increased predictive power vs. overfitting
• Hard to debug when not working: millions of reasons (literally) a model can be wrong, few ways it can be right. "Black magic"
• Deep nonlinear models suffer from interpretability issues: black-box model (although active research here)

Thanks  

Manuel Ruder, Alexey Dosovitskiy, Thomas Brox (2016). Artistic style transfer for videos. http://arxiv.org/abs/1604.08610

https://www.youtube.com/watch?v=Khuj4ASldmU

Resources

"This is cool, but I don't (want to) code"
http://playground.tensorflow.org

"I am comfortable with the SciPy stack and want to understand more"
A Neural Network in 11 lines of Python
http://iamtrask.github.io/2015/07/12/basic-python-network/

"I am comfortable with ML libraries and want to build a model"
MNIST
• Keras
  https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
• Tensorflow
  https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html

Variational Autoencoders (also using MNIST)
• Keras
  http://blog.keras.io/building-autoencoders-in-keras.html
• Tensorflow
  https://jmetzen.github.io/2015-11-27/vae.html