ICSD Summer School 2016: Data Science - Week 4
Gabriella Contardo
LIP6, University Pierre et Marie Curie, Paris, France
August 8, 2016
G. Contardo, Roscoff, ICSD Summer School 2016 - Data Science and Machine Learning


Outline of the week

Course 1: Reminders of the learning paradigm, neural networks / multi-layer perceptron.
Course 2: Deep learning: Convolutional Neural Networks.
Course 3: Tips on deep learning. Unsupervised learning: clustering (K-Means), EM.
Course 4: Unsupervised learning: PCA, matrix factorization and recommender systems.
Course 5: Unsupervised learning with (deep) neural networks: auto-encoders, RNN. Word embeddings.


References

Online course material for today. Thanks to:

– Patrick Gallinari - Professor at UPMC - course "Apprentissage Statistique" (Statistical Learning)
– Fei-Fei Li's course at Stanford: CS231n: Convolutional Neural Networks for Visual Recognition (lectures: introduction to neural nets, backpropagation)

Feel free to read their supplementary notes, available on the website: http://cs231n.stanford.edu/syllabus.html

Also interesting (generally speaking):
– Machine Learning course by Andrew Ng on Coursera
– Lectures by Nando de Freitas (Oxford), videos available


Outline of the day

– Reminders and definitions about the learning problem(s)
– Brief history of machine learning (ML) / neural networks
– Perceptron → multi-layer perceptron (= neural network)
– What is inside?
– How do I learn this?


Learning with examples

Three main « components »:
– Data {z1, ..., zN}
– Machine or model Fθ
– Criterion C (for learning and evaluation)

Goal: extract information from the data that is relevant for the task we study and for other data of the same type.

Use: inference on new data (= examples).

Different learning families: supervised, unsupervised, semi-supervised, reinforcement.

Slides from AS course of P. Gallinari


Examples of some learning tasks/problems

Speech / handwriting recognition
Data: (signal, (transcription))
Goal: recognize the signal
Criterion: number of words correctly recognized

Autonomous car driving
Data: (road images, (steering wheel commands)), e.g. S. Thrun, DARPA Challenge + Google car
Goal: keep on the road
Criterion: distance driven

Textual information
Data: (text + request, (relevant information)), text corpus
Goal: return information matching the request
Criterion: precision and recall


Examples of some learning tasks/problems

User modeling
Data: (user activity logs)
Goal: model / analyze the user's behavior

Examples: customer targeting, personalized ads, recommender systems (e.g. movies), personal assistants (e.g. Google Now, Cortana)

Criterion: ? Evaluation: ? Example: Google Now

Google Now keeps track of searches, calendar events, locations, and travel patterns. It then synthesizes all that info and alerts you—either through notifications in the menu bar or cards on the search screen—of transit alerts for your commute, box scores for your favorite sports team, nearby watering holes, and more. You can assume it will someday suggest a lot more.


Examples of some learning tasks/problems

More complex:
– Translation (side note: novel approaches are promising)
– Scene (visual) or text understanding: extract some « meaning »
– Meta-learning, transfer learning, « learning to learn »
– Discovery (« curiosity »), e.g. using databases or the web

Data: information representation? Goal?? Criterion?? Evaluation??


Data : diversity


4 « types » of learning problems

(Machine) learning provides tools to tackle generic problems that cut across a wide variety of applications (finance, advertising, computer vision, ...).

4 families of learning: supervised, unsupervised, semi-supervised, reinforcement.

Each family handles a particular set of generic problems. Example of such a set:

Supervised: classification, regression, ranking


Supervised Learning


Training data: a set of pairs (input, expected output)
Goal: learn to associate each input with its output

We expect the model to generalize well, i.e. to make good predictions on examples (inputs) outside of the dataset used for learning, as long as they come from the same (or a similar) data source.

Use: classification, regression, ranking


Unsupervised Learning

Training data: only inputs, no desired output

Goal: extract/detect patterns in the data, learn some structure in the data (similarities, underlying factors linking the data, ...)

Use: density estimation, clustering, latent factors, feature learning, generative models…


Semi-supervised learning

Training data: a few labeled examples, a lot of unlabeled ones (= no output)

Goal: extract information (e.g. patterns, cf. unsupervised learning) from the unlabeled examples to help labeling; learn jointly from the two sets of examples.

Use: huge datasets where labels are costly


Reinforcement Learning

Training data: an environment with states (inputs), actions that move the system from one state to another, and a qualitative desired output

Paradigm: learning by exploring the environment, guided by a reward (e.g. for a good prediction); trade-off between exploration and exploitation

Use: guidance, sequential decision making, robotics, games with 2 (or more) players…

Example: backgammon (TD-Gammon, Tesauro 1992), trained on 1.5 M games by playing against itself. More recently: DeepMind's Atari agents, AlphaGo (mixing deep learning and reinforcement learning)


Brief history of neural networks

43: McCulloch & Pitts: artificial neuron, "A logical calculus of the ideas immanent in nervous activity"

40 – 45: Wiener (USA), Kolmogorov (USSR), Turing (UK)

48 – 50: von Neumann: cellular automata
49: Hebb's rule (neuroscience / computer science): adaptation of neurons during learning


Brief history of neural networks

55 – 60: Rosenblatt: Perceptron; Widrow-Hoff: Adaline
69: Minsky / AI winter
70 – 80: (auto)associative memory, ART, SOM ...
80 – 85: non-linear networks: Hopfield networks, Boltzmann machine, multi-layer perceptron
AI winter 2
2006 – ...: deep neural networks, restricted Boltzmann machines, …, representation learning


Ok now back to business

We still don't know what a neural network is… How do I build one? What is inside? How does it work?
– « Intuition » with playground.tensorflow.org
– Forward pass and stuff

How do I learn it?
– Gradient descent! Keep calm and backpropagate

Food for thought


Playground.tensorflow.org Perceptron :
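As a minimal sketch of what a single perceptron computes, the code below scores an input with w·x + b and outputs its sign. The training loop uses Rosenblatt's classic update rule (on a misclassified example, add y·x to w and y to b), which these slides do not spell out; the labels in {−1, +1}, the toy AND dataset and the function names are illustrative assumptions.

```python
def predict(w, b, x):
    """Perceptron output: sign of the linear score w.x + b (a score of 0 maps to +1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def train_perceptron(data, n_features, epochs=20):
    """Rosenblatt's rule: on each misclassified (x, y), move w toward y * x."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


# Linearly separable toy data: logical AND with labels in {-1, +1}
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(data, n_features=2)
print([predict(w, b, x) for x, _ in data])  # [-1, -1, -1, 1]
```

On linearly separable data like this, the rule stops making mistakes after a finite number of updates; on non-separable data (e.g. XOR) it never settles, which is one motivation for the multi-layer perceptron.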


A neural net : MultiLayerPerceptron


A neural net : MultiLayerPerceptron, still a linear prediction…
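Why is the prediction still linear? Without a non-linear activation, stacking two affine layers gives another affine map: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below checks this numerically in plain Python; the layer sizes and weight values are arbitrary, hypothetical choices.

```python
def matvec(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def matmul(A, B):
    """(A B)[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output, NO activation function
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.05]


def two_layers(x):
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)


# The equivalent single affine layer
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)


def one_layer(x):
    return add(matvec(W, x), b)


for x in ([1.0, -1.0], [0.3, 2.0], [-4.0, 0.5]):
    print(two_layers(x), one_layer(x))  # identical up to floating-point rounding
```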


A neural net : MultiLayerPerceptron Kernel trick


A neural net : MultiLayerPerceptron Kernel trick After a few learning steps…
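The idea can be illustrated by feeding non-linear features of the input (such as x1² and x2²) to an otherwise linear model, an explicit feature map in the spirit of the kernel trick. As a sketch, assuming a toy "two circles" dataset (inner disc labelled −1, outer ring +1), the classes cannot be split by a line in (x1, x2), but after the mapping φ(x) = (x1², x2²) the linear rule x1² + x2² − 1.5² ≥ 0 separates them. The dataset, radii and threshold are illustrative choices, not taken from the slide.

```python
import math
import random


def phi(x):
    """Explicit non-linear feature map: (x1, x2) -> (x1^2, x2^2)."""
    return [x[0] ** 2, x[1] ** 2]


def linear_classifier(w, b, features):
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if score >= 0 else -1


def make_circles(n, seed=0):
    """Toy data: radius in [0, 1] -> label -1, radius in [2, 3] -> label +1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        radius = rng.uniform(0.0, 1.0) if label == -1 else rng.uniform(2.0, 3.0)
        angle = rng.uniform(0.0, 2 * math.pi)
        data.append(([radius * math.cos(angle), radius * math.sin(angle)], label))
    return data


data = make_circles(200)
# Linear in the mapped space: x1^2 + x2^2 - 2.25 >= 0  <=>  radius >= 1.5
w, b = [1.0, 1.0], -(1.5 ** 2)
accuracy = sum(linear_classifier(w, b, phi(x)) == y for x, y in data) / len(data)
print(accuracy)
```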


A neural net : MultiLayerPerceptron Non-linear activation function


A neural net : MultiLayerPerceptron Non-linear activation function After a few learning steps


A neural net : MultiLayerPerceptron

Other activation functions besides tanh and sigmoid:

Rectified linear units (ReLU)
$g(x) = \max(0,\, b + w \cdot x)$
Rectifier units can output exactly 0, which yields sparse representations.

Maxout
$g(x) = \max_i \,(b_i + w_i \cdot x)$
Generalizes the rectifier unit: there are multiple weight vectors for each unit.

Softmax
Used for classification with a 1-out-of-p coding (p classes). Ensures that the predicted outputs sum to 1:
$g(x)_i = \dfrac{\exp(b_i + w_i \cdot x)}{\sum_{j=1}^{p} \exp(b_j + w_j \cdot x)}$
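The activation functions above can be sketched in a few lines of plain Python. The helper names (`relu_unit`, `maxout_unit`, `softmax`) are illustrative; the softmax subtracts the largest pre-activation before exponentiating, a common numerical-stability choice that does not change the result.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))  # tanh is available as math.tanh


def relu_unit(w, b, x):
    """Rectifier unit: g(x) = max(0, b + w.x)."""
    return max(0.0, b + dot(w, x))


def maxout_unit(ws, bs, x):
    """Maxout unit: g(x) = max_i (b_i + w_i.x), one weight vector w_i per piece."""
    return max(b + dot(w, x) for w, b in zip(ws, bs))


def softmax(a):
    """Softmax over pre-activations a_1..a_p: positive outputs that sum to 1."""
    m = max(a)  # shift by the max to avoid overflow; the ratios are unchanged
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, -1.0]
print(relu_unit([0.5, 2.0], 0.1, x))                        # 0.1 + 0.5 - 2.0 < 0 -> 0.0
print(maxout_unit([[0.5, 2.0], [1.0, 0.0]], [0.1, 0.0], x))  # max(-1.4, 1.0) -> 1.0
print(softmax([2.0, 1.0, 0.1]))
```

A maxout unit with two pieces, one of which has zero weights and zero bias, computes exactly the rectifier max(0, b + w·x), which is the sense in which maxout generalizes it.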


Learning neural networks

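Learning: find the weights W that best map inputs to their expected outputs, i.e. minimize the cost function E(W, {X, Y}).

How? Gradient descent! But how? Backpropagation of the gradient!

NB: the gradient is the vector of partial derivatives of the cost with respect to each weight: $\nabla_W E = \left(\frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\right)$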
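A common way to sanity-check a hand-derived gradient is to compare it with a central finite-difference approximation, ∂E/∂w_j ≈ (E(w + h e_j) − E(w − h e_j)) / 2h. The sketch below assumes a toy squared-error cost for a linear model; the data, weights and step h are made up for illustration.

```python
def cost(w, X, Y):
    """E(w) = sum_i (w.x_i - y_i)^2 for a linear model (toy example)."""
    return sum((sum(wj * xj for wj, xj in zip(w, x)) - y) ** 2 for x, y in zip(X, Y))


def analytic_grad(w, X, Y):
    """dE/dw_j = sum_i 2 (w.x_i - y_i) x_ij."""
    grad = [0.0] * len(w)
    for x, y in zip(X, Y):
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj
    return grad


def numerical_grad(f, w, h=1e-6):
    """Central differences: (f(w + h e_j) - f(w - h e_j)) / (2h) for each coordinate j."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += h
        w_minus = list(w)
        w_minus[j] -= h
        grad.append((f(w_plus) - f(w_minus)) / (2 * h))
    return grad


X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
Y = [1.0, 0.0, -2.0]
w = [0.2, -0.4]
print(analytic_grad(w, X, Y))
print(numerical_grad(lambda v: cost(v, X, Y), w))  # should agree to several decimals
```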


Reminder – Gradient algorithm

Goal: optimize a cost function E(w) with parameters w.

Principle:
– Initialize w
– Iterate until convergence: $w(t+1) = w(t) + \varepsilon(t)\, D(t)$

The descent direction D is computed from local information on the cost function E(w), e.g. a first- or second-order approximation.

Example (batch), with training dataset D = {(x1, y1), ..., (xN, yN)}:
– Initialize w0
– Iterate $w(t+1) = w(t) - \varepsilon\, \nabla_w E(w(t))$

where $E = \sum_{i=1}^{N} c(x_i)$ and c(x_i) is the cost on example i.
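A minimal sketch of the batch algorithm above, assuming a toy one-dimensional linear model y ≈ a·x + b with per-example cost c = (a·x_i + b − y_i)². The data (points on the line y = 2x + 1), the step size ε = 0.01 and the fixed number of iterations (instead of a convergence test) are illustrative choices.

```python
def batch_gradient_descent(data, epsilon=0.01, n_iter=2000):
    """Batch GD for y ~ a*x + b with E(a, b) = sum_i (a*x_i + b - y_i)^2."""
    a, b = 0.0, 0.0  # w_0
    for _ in range(n_iter):
        # Gradient of E summed over ALL N training examples (batch version)
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data)
        grad_b = sum(2 * (a * x + b - y) for x, y in data)
        # w(t+1) = w(t) - epsilon * grad E(w(t))
        a -= epsilon * grad_a
        b -= epsilon * grad_b
    return a, b


data = [(x, 2 * x + 1) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
print(batch_gradient_descent(data))  # close to (2.0, 1.0)
```

The step size matters: with too large an ε the iterates oscillate and diverge, which is why ε(t) is often decreased over time.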

 


Forward

(credit image N.Baskiotis course « ARF » UPMC)


Learning Algorithm

– Feed forward an input (or several) and compute the predicted output.
– Compute the error from the cost function E, the desired output and the predicted output.
– Compute the gradients of all weights (see next slide).
– Update the weights according to the gradient.
– Repeat until a stopping criterion is met.


Backpropagation

Goal of the gradient: update the weights in the « right » direction (to lower the error). Which weight is responsible for the error, and by what « amount »?

Backpropagation principle: the gradient of a layer's weights is computed using the « deltas » from the next layer (chain rule).
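Written out for a fully connected layer, the chain rule gives the following backpropagation equations. The notation is introduced here for illustration: z_j is the output of unit j in the previous layer, a_h = Σ_j w_jh z_j is the pre-activation of unit h, z_h = g(a_h) its output, δ_h = ∂E/∂a_h its delta, and k runs over the units of the next layer (with a_k = Σ_h w_hk z_h).

```latex
\frac{\partial E}{\partial w_{jh}}
  = \frac{\partial E}{\partial a_h}\,\frac{\partial a_h}{\partial w_{jh}}
  = \delta_h\, z_j,
\qquad
\delta_h
  = \sum_k \frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial z_h}\,\frac{\partial z_h}{\partial a_h}
  = g'(a_h) \sum_k w_{hk}\, \delta_k
```

Each layer therefore only needs the deltas of the next layer, so all gradients are obtained in a single sweep from the output back to the input.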


Neural Networks - Backpropagation

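[Figure: backpropagation through layer k, with inputs in_{k−1}^1, …, in_{k−1}^m and outputs out_k^1, …, out_k^n. Each output out_k^i receives its $\delta_{next}$; the parameters of layer k are updated using $\frac{\partial out}{\partial param}\,\delta_{next}$, and each input in_{k−1}^j receives $\frac{\partial out}{\partial in_{k-1}^j}\,\delta_{next}$.]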

What does it look like if we compute backpropagation on the previously seen network, with an example $x^i = [1, -1]$, $y^i = -1$ and a mean square error?
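The slide's network and weights are not recoverable here, so the sketch below assumes a hypothetical 2-2-1 network with tanh units, no biases, made-up weights and the squared error E = (ŷ − y)², and runs one backward pass on the example x = [1, −1], y = −1. It follows the deltas from the output back to the first layer, then takes one gradient step.

```python
import math

# Hypothetical weights (the slide's network is not reproduced here):
# 2 inputs -> 2 tanh hidden units -> 1 tanh output, biases omitted for brevity.
W1 = [[0.5, -0.3],   # W1[j][h]: weight from input j to hidden unit h
      [0.2, 0.8]]
W2 = [0.7, -0.6]     # W2[h]: weight from hidden unit h to the output


def forward(x, W1, W2):
    a1 = [sum(x[j] * W1[j][h] for j in range(2)) for h in range(2)]
    z1 = [math.tanh(a) for a in a1]
    y_hat = math.tanh(sum(W2[h] * z1[h] for h in range(2)))
    return z1, y_hat


def backward(x, y, W1, W2):
    """Gradients of E = (y_hat - y)^2 w.r.t. W1 and W2, computed with deltas."""
    z1, y_hat = forward(x, W1, W2)
    delta_out = 2 * (y_hat - y) * (1 - y_hat ** 2)  # dE/d(output pre-activation); tanh' = 1 - tanh^2
    grad_W2 = [delta_out * z1[h] for h in range(2)]
    # Each hidden unit receives delta_out through its outgoing weight, times tanh'
    delta_hidden = [(1 - z1[h] ** 2) * W2[h] * delta_out for h in range(2)]
    grad_W1 = [[delta_hidden[h] * x[j] for h in range(2)] for j in range(2)]
    return grad_W1, grad_W2


def loss(x, y, W1, W2):
    return (forward(x, W1, W2)[1] - y) ** 2


x, y = [1.0, -1.0], -1.0
grad_W1, grad_W2 = backward(x, y, W1, W2)
print("grad W1:", grad_W1)
print("grad W2:", grad_W2)

# One gradient-descent step (epsilon = 0.1) should lower the error on this example
eps = 0.1
W1_new = [[W1[j][h] - eps * grad_W1[j][h] for h in range(2)] for j in range(2)]
W2_new = [W2[h] - eps * grad_W2[h] for h in range(2)]
print(loss(x, y, W1, W2), "->", loss(x, y, W1_new, W2_new))
```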


Neural Networks - Backpropagation

Modularity: you do not need to compute all the gradients yourself. Build the network as layers of modules (OOP), distinguishing activation-function modules from parametric modules (weight layers). A module should have (at least) three functions:

– Forward: given an input, predict the output.
– Backward: given an input and $\delta_{next}$ (the gradient backpropagated from the next module), return the gradient to backpropagate into the previous module: $\delta_{to\,prev} = \frac{\partial out}{\partial in}\,\delta_{next}$.
– Update: given the input and $\delta_{next}$, update the parameters if needed (not necessary for e.g. a "tanh" module, which has no parameters) using the gradient $\frac{\partial out}{\partial params}\,\delta_{next}$.

Rely on matrix computations.
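A minimal sketch of this modular design, assuming two module types (`Linear` and `Tanh`) with the forward / backward / update interface described above and a squared-error loss. Plain Python lists stand in for the matrix computations a real library would use; the class names, the initialization range and the XOR toy problem are illustrative assumptions, not any particular library's API.

```python
import math
import random


class Linear:
    """Parametric module: out = W.in + b, with W of shape (n_out, n_in)."""

    def __init__(self, n_in, n_out, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, inp):
        return [sum(w * v for w, v in zip(row, inp)) + bk for row, bk in zip(self.W, self.b)]

    def backward(self, inp, delta_next):
        # delta_to_prev[j] = sum_k (d out_k / d in_j) * delta_next[k] = sum_k W[k][j] * delta_next[k]
        return [sum(self.W[k][j] * delta_next[k] for k in range(len(self.W)))
                for j in range(len(inp))]

    def update(self, inp, delta_next, lr):
        # d out_k / d W[k][j] = in_j and d out_k / d b[k] = 1
        for k, d in enumerate(delta_next):
            for j, v in enumerate(inp):
                self.W[k][j] -= lr * d * v
            self.b[k] -= lr * d


class Tanh:
    """Activation module: out_j = tanh(in_j); it has no parameters to update."""

    def forward(self, inp):
        return [math.tanh(v) for v in inp]

    def backward(self, inp, delta_next):
        # d tanh(v) / dv = 1 - tanh(v)^2
        return [(1 - math.tanh(v) ** 2) * d for v, d in zip(inp, delta_next)]

    def update(self, inp, delta_next, lr):
        pass


def train_step(modules, x, y, lr):
    """One forward/backward/update pass on one example; returns the loss before the update."""
    inputs, out = [], x
    for m in modules:  # forward: remember each module's input for the backward pass
        inputs.append(out)
        out = m.forward(out)
    loss = sum((o - t) ** 2 for o, t in zip(out, y))
    delta = [2 * (o - t) for o, t in zip(out, y)]  # gradient of the squared error w.r.t. out
    for m, inp in zip(reversed(modules), reversed(inputs)):
        delta_prev = m.backward(inp, delta)  # uses this module's weights BEFORE its update
        m.update(inp, delta, lr)
        delta = delta_prev
    return loss


rng = random.Random(0)
net = [Linear(2, 4, rng), Tanh(), Linear(4, 1, rng), Tanh()]
xor = [([-1.0, -1.0], [-1.0]), ([-1.0, 1.0], [1.0]), ([1.0, -1.0], [1.0]), ([1.0, 1.0], [-1.0])]
for epoch in range(2000):
    epoch_loss = sum(train_step(net, x, y, lr=0.05) for x, y in xor)
print(epoch_loss)
```

The loop follows the learning algorithm from the earlier slide: forward pass, error, backward pass, update, repeated over the examples. Whether this small network actually fits XOR depends on the random initialization and the learning rate.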


Brain neural cell / Neural network cell

(credit images: Karpathy's course CS231n, Stanford)


Criterion – Loss functions

Different cost functions can be used, depending on the problem or the model:
– $L_{MSE}$: regression (but often used in classification)
– Classification: hinge, logistic; these approximate the classification error
– Also: cross-entropy / log-likelihood

Notations: d = desired output, y = predicted output. For classification, d ∈ {−1, 1}.

Figure from Bishop 2006. Abscissa: z = y·d (desired × predicted output).
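As functions of the margin z = y·d, the losses above can be sketched as follows (function names are illustrative). For d ∈ {−1, 1}, (y − d)² = (1 − yd)², which is why the squared error can also be plotted against z; the logistic loss log(1 + e^(−z)) is the negative log-likelihood (cross-entropy) of a sigmoid output, written for labels in {−1, +1}.

```python
import math


def squared_loss(z):
    """(y - d)^2 rewritten with z = y*d: equals (1 - z)^2 when d is -1 or +1."""
    return (1.0 - z) ** 2


def hinge_loss(z):
    return max(0.0, 1.0 - z)


def logistic_loss(z):
    """-log(sigmoid(z)) = log(1 + exp(-z))."""
    return math.log(1.0 + math.exp(-z))


def zero_one_loss(z):
    """The classification error the other losses approximate: 1 when y and d disagree in sign."""
    return 1.0 if z <= 0 else 0.0


for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(z, squared_loss(z), hinge_loss(z), round(logistic_loss(z), 3), zero_one_loss(z))
```

Note that the squared loss also penalizes confidently correct predictions (z > 1), whereas the hinge and logistic losses do not.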


Misc.

Batch / mini-batch / stochastic gradient descent:

→ Batch: compute the gradient on all examples at each iteration. Good when the loss is convex.

→ Stochastic: compute it on one randomly picked example at each iteration. The added variance can help get out of local minima.

→ Mini-batch: compute it on a small subset of examples.
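The three variants differ only in how many examples each gradient is computed on. As a sketch, assuming a toy linear model y ≈ a·x + b fitted to noise-free points on y = 2x + 1, a single function covers all three: `batch_size=1` is stochastic gradient descent, `batch_size=len(data)` is batch gradient descent, and anything in between is mini-batch. Gradients are averaged over the mini-batch; the step size, epoch count and seed are illustrative choices.

```python
import random


def minibatch_sgd(data, batch_size, epsilon=0.05, n_epochs=500, seed=0):
    """Fit y ~ a*x + b; batch_size=1 gives SGD, batch_size=len(data) gives batch GD."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    data = list(data)
    for _ in range(n_epochs):
        rng.shuffle(data)  # visit the examples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of (a*x + b - y)^2 over this mini-batch only
            grad_a = sum(2 * (a * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (a * x + b - y) for x, y in batch) / len(batch)
            a -= epsilon * grad_a
            b -= epsilon * grad_b
    return a, b


data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
for size in (1, 5, len(data)):
    print(size, minibatch_sgd(data, batch_size=size))
```

With the same number of epochs, smaller batches perform more (noisier) updates per pass over the data; on real, noisy data a constant step size makes SGD hover around the minimum rather than converge exactly.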

Back to playground : train loss vs test loss

→ Generalization / Overfitting

→ Regularization
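One common form of regularization (an assumption here, since the slides only show figures) is an L2 penalty, also called weight decay: minimize E(w) + λ‖w‖², whose gradient adds 2λw to ∇E. The sketch below, on a made-up one-dimensional dataset with a bias-free model, shows the learned weight shrinking as λ grows and checks it against the closed-form minimizer a = Σ x_i y_i / (Σ x_i² + λ).

```python
def fit_l2(data, lam, epsilon=0.01, n_iter=5000):
    """GD on E_reg(a) = sum_i (a*x_i - y_i)^2 + lam * a^2 (1-D model, no bias)."""
    a = 0.0
    for _ in range(n_iter):
        grad = sum(2 * (a * x - y) * x for x, y in data) + 2 * lam * a  # data term + weight decay
        a -= epsilon * grad
    return a


# Made-up, slightly noisy points around y = 2x
data = [(-1.0, -2.1), (-0.5, -0.9), (0.5, 1.1), (1.0, 1.9)]
for lam in (0.0, 1.0, 10.0):
    closed_form = sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + lam)
    print(lam, fit_l2(data, lam), closed_form)
```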


Overfitting


Regularization


Regularization


Neural Networks Architectures

(credit image Karpathy's course CS231n, Stanford)


Neural Networks Architectures

(credit image Karpathy's course CS231n, Stanford)


Neural Networks Architectures

(credit image Karpathy's course CS231n, Stanford)


Neural Networks

Questions ?

How to implement them? Several (recent) libraries provide GPU implementations:

– Torch (lua)

– Theano (python)

– Caffe (python/C++)

– TensorFlow (python)

– ….


Exercises

Consider a network with one hidden layer. We denote by $f_w(x)$ the output of the network for input x (w are the parameters of the network). An input is $x^i = \{x^i_j\}_{j=1,\dots,d}$, with label $y^i$, and the training dataset is $D = \{(x^i, y^i)\}_{i=1,\dots,N}$. The weights from the input to the hidden layer are $w^0 = \{w_{jh}\}_{j=1,\dots,d,\; h=1,\dots,H}$, and the weights from the hidden layer to the output layer are $w^1 = \{w_{hk}\}_{h=1,\dots,H,\; k=1,\dots,K}$. The activation functions of the two layers are $g_1$ and $g_2$.


Exercises

Questions:
– How many neurons are there in the network? How many outputs? Draw the network. Explain why there can be more than one output.
– Write the output $f_w(x)$ in terms of the components of x and w.
– Write the (mean square) cost w.r.t. the training set D. Write its theoretical formulation (using the expected value).


Exercises

Write the output $f_w(x)$ in terms of the components of x and w:
$f_w(x)_k = g_2\left(\sum_{h} w_{hk}\, g_1\left(\sum_{j} w_{jh}\, x_j\right)\right)$

Write the (mean square) cost w.r.t. the training set D, and its theoretical formulation (using the expected value):
$R(f_w) = \mathbb{E}_{x,y}\left[(f_w(x) - y)^2\right]$, estimated on D by $\frac{1}{N}\sum_{i=1}^{N} \left(f_w(x^i) - y^i\right)^2$
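A direct transcription of these answers into code, as a sketch: `f_w` follows f_w(x)_k = g2(Σ_h w_hk g1(Σ_j w_jh x_j)) with the weights stored as nested lists `w0[j][h]` and `w1[h][k]`, and `empirical_risk` computes the mean squared error over D (summed over the K outputs). The choice g1 = tanh, g2 = identity, the missing biases and the toy weights are assumptions for illustration.

```python
import math


def f_w(x, w0, w1, g1=math.tanh, g2=lambda a: a):
    """f_w(x)_k = g2(sum_h w1[h][k] * g1(sum_j w0[j][h] * x[j])), no biases."""
    d, H = len(w0), len(w0[0])
    K = len(w1[0])
    hidden = [g1(sum(w0[j][h] * x[j] for j in range(d))) for h in range(H)]
    return [g2(sum(w1[h][k] * hidden[h] for h in range(H))) for k in range(K)]


def empirical_risk(D, w0, w1):
    """(1/N) sum_i ||f_w(x^i) - y^i||^2: the sample estimate of E[(f_w(x) - y)^2]."""
    total = 0.0
    for x, y in D:
        out = f_w(x, w0, w1)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(D)


# Hypothetical sizes d = 2, H = 3, K = 2 with made-up weights
w0 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.4]]
w1 = [[1.0, -0.5],
      [0.2, 0.7],
      [-0.3, 0.4]]
D = [([1.0, -1.0], [-1.0, 0.5]), ([0.0, 2.0], [0.5, 0.0])]
print(f_w([1.0, -1.0], w0, w1))
print(empirical_risk(D, w0, w1))
```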
