
An introduction to Neural Networks and Deep Learning

Talk given at the Department of Mathematics of the University of Bologna

February 20, 2018

Andrea Asperti

DISI - Department of Informatics: Science and Engineering, University of Bologna

Mura Anteo Zamboni 7, 40127, Bologna, ITALY
[email protected]


A branch of Machine Learning

What is Machine Learning?

There are problems that are difficult to address with traditional programming techniques:

• classify a document according to some criteria (e.g. spam detection, sentiment analysis, ...)
• compute the probability that a credit card transaction is fraudulent
• recognize an object in some image (possibly from an unusual viewpoint, in new lighting conditions, in a cluttered scene)
• ...

Typically the result is a weighted combination of a large number of parameters, each one contributing to the solution to a small degree.


The Machine Learning approach

Suppose we have a set of input-output pairs (a training set)

$\{\langle x_i, y_i \rangle\}$

the problem consists in guessing the map $x_i \mapsto y_i$

The M.L. approach:

• describe the problem with a model depending on some parameters Θ (i.e. choose a parametric class of functions)
• define a loss function to compare the results of the model with the expected (experimental) values
• optimize (fit) the parameters Θ to reduce the loss to a minimum (a minimal sketch follows)
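A minimal Python sketch of the three steps, assuming a linear model and a mean squared error loss (the data, the model class, and the use of scipy's generic minimizer are illustrative choices, not prescribed by the slides):

import numpy as np
from scipy.optimize import minimize

# training set {<x_i, y_i>}, generated by y ≈ 1 + 2x
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.1, 2.9, 5.2, 6.8])

# 1. the model: a parametric class of functions, here linear in theta
def model(theta, x):
    return theta[0] + theta[1] * x

# 2. the loss: compare model outputs with the expected values
def loss(theta):
    return np.mean((model(theta, xs) - ys) ** 2)

# 3. fit: optimize theta to reduce the loss to a minimum
theta = minimize(loss, x0=np.zeros(2)).x
print(theta)   # close to [1, 2]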


Why Learning?

Machine Learning problems are in fact optimization problems! So why talk about learning?

The point is that the solution to the optimization problem is not given in an analytical form (often there is no closed-form solution).

So, we use iterative techniques (typically, gradient descent) to progressively approximate the result.

This form of iteration over data can be understood as a form of progressive learning of the objective function, based on the experience of past observations.


Using gradients

The objective is to minimize some loss function over (fixed) training samples, e.g.

$\Theta(w) = \sum_i E(o(w, x_i), y_i)$

by suitably adjusting the parameters w.

See how Θ changes according to small perturbations Δ(w) of the parameters w: this is the gradient

$\nabla_w \Theta = \left[ \frac{\partial \Theta}{\partial w_1}, \ldots, \frac{\partial \Theta}{\partial w_n} \right]$

of Θ w.r.t. w.

The gradient is a vector pointing in the direction of steepest ascent.


Gradient descent

Goal: minimize some loss function Θ(w) by suitably adjusting the parameters.

We can reach a minimal configuration for Θ(w) by iteratively taking small steps in the direction opposite to the gradient (gradient descent).

This is a general technique.

Warning: not guaranteed to work:

• it may end up in local minima

• it may get lost on plateaus
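As a concrete sketch, the update rule is w ← w − η ∇_w Θ(w) for a small learning rate η; below it is applied to the quadratic loss of the previous slide, with an illustrative dataset and a hand-computed gradient:

import numpy as np

# Theta(w) = sum_i (o(w, x_i) - y_i)^2 for the linear model o(w, x) = w[0] + w[1]*x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])               # generated by y = 1 + 2x

def gradient(w):
    err = w[0] + w[1] * x - y                    # residuals o(w, x_i) - y_i
    return np.array([2 * err.sum(), 2 * (err * x).sum()])

w = np.zeros(2)                                  # initial parameters
eta = 0.01                                       # learning rate (step size)
for _ in range(2000):
    w -= eta * gradient(w)                       # small step opposite to the gradient
print(w)                                         # converges to ≈ [1, 2]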


Next arguments

A bit of taxonomy


Different types of Learning Tasks

• supervised learning: inputs + outputs (labels)
  - classification
  - regression

• unsupervised learning: just inputs
  - clustering
  - component analysis
  - autoencoding

• reinforcement learning: actions and rewards
  - learning long-term gains
  - planning



Classification vs. Regression

Two forms of supervised learning: $\{\langle x_i, y_i \rangle\}$

classification: given a new input, predict its class (“Probably a cat!”); y is discrete, e.g. y ∈ {•, +}

regression: given a new input, predict the expected value; y is (conceptually) continuous


Many different techniques

• Different ways to define the models:
  - decision trees
  - linear models
  - neural networks
  - ...

• Different error (loss) functions:
  - mean squared error
  - logistic loss
  - cross entropy
  - cosine distance
  - maximum margin
  - ...

[Figures: a decision tree for the classic “play tennis” example (Outlook: Sunny/Overcast/Rain; Humidity: High/Normal; Wind: Strong/Weak; leaves Yes/No) beside a neural net; mean squared error beside maximum margin.]


Next argument

Neural Networks


Neural Network

A network of (artificial) neurons

Artificial neuron

Each neuron takes multiple inputs and produces a single output (that can be passed as input to many other neurons).


The artificial neuron

[Figure: an artificial neuron. Inputs x_1 ... x_n enter with weights w_1 ... w_n, together with a bias b (a weight on a constant input +1); a node Σ computes the weighted sum, and an activation function produces the output.]

The purpose of the activation function is to introduce a thresholding mechanism (similar to the axon hillock of cortical neurons).
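As a sketch, a single neuron reduces to a dot product, a bias, and an activation (the numbers below are illustrative):

import numpy as np

# One artificial neuron: output = activation(w · x + b)
def neuron(x, w, b, activation=lambda s: 1.0 if s > 0 else 0.0):
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])     # inputs x_1 ... x_n
w = np.array([0.8, 0.2, -0.5])     # weights w_1 ... w_n
print(neuron(x, w, b=0.3))          # fires (1.0) only if w·x + b > 0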


Different activation functions

The activation function is responsible for threshold triggering.

• threshold: 1 if x > 0, else 0
• logistic function: $\frac{1}{1+e^{-x}}$
• hyperbolic tangent: $\frac{e^x - e^{-x}}{e^x + e^{-x}}$
• rectified linear (ReLU): x if x > 0, else 0
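The same four functions, written as vectorized Python definitions for reference:

import numpy as np

def threshold(x):
    return (x > 0).astype(float)       # step: 1 if x > 0 else 0

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))    # output in (0, 1)

def tanh(x):
    return np.tanh(x)                  # (e^x - e^-x)/(e^x + e^-x), output in (-1, 1)

def relu(x):
    return np.maximum(x, 0.0)          # x if x > 0 else 0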


A comparison with the cortical neuron


Next argument

Networks typology/topology


Layers

A neural network is a collection of artificial neurons connected together. Neurons are usually organized in layers.

If there is more than one hidden layer the network is deep; otherwise it is called a shallow network.

Andrea Asperti Universita di Bologna - DISI: Dipartimento di Informatica: Scienza e Ingegneria 20

Page 21: An introduction to Neural Networks and Deep Learningasperti/SLIDES/neural.pdf · An introduction to Neural Networks and Deep Learning Talk given at the Department of Mathematics of

Feed-forward networks

If the network is acyclic, it is called a feed-forward network. Feed-forward networks are (at present) the most common type of network in practical applications.

Important: composing linear transformations makes no sense, since we still get a linear transformation. What is the source of non-linearity in Neural Networks?

The activation function


Dense networks

The most typical feed-forward network is a dense network, where each neuron at layer k − 1 is connected to each neuron at layer k.

The network is defined by a matrix of parameters (weights) $W_k$ for each layer (plus biases).

The matrix $W_k$ has dimension $L_k \times L_{k+1}$, where $L_k$ is the number of neurons at layer k.
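As a sketch, the forward pass of a dense network is just one matrix product (plus bias and activation) per layer; the layer sizes below are illustrative:

import numpy as np

rng = np.random.default_rng(0)
L = [4, 5, 3]                                    # neurons per layer
W = [rng.normal(size=(L[k], L[k + 1])) for k in range(len(L) - 1)]
b = [np.zeros(L[k + 1]) for k in range(len(L) - 1)]

def forward(x):
    for Wk, bk in zip(W, b):
        x = np.maximum(x @ Wk + bk, 0.0)         # dense layer followed by ReLU
    return x

print(forward(rng.normal(size=4)))               # a 3-dimensional output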


Parameters and hyper-parameters

The weights $W_k$ are the parameters of the model: they are learned during the training phase.

The number of layers and the number of neurons per layer are hyper-parameters: they are chosen by the user and fixed before training starts.

Other important hyper-parameters govern training, such as the learning rate, the batch size, the number of epochs, and many others.


Convolutional networks

Convolutional networks are used for inputs with a topological structure: signal sequences (e.g. sound), or images.

They repeatedly apply a (small) uniform linear transformation, called a kernel, shifting it over the whole input image.


Example

[Figure: the kernel [−1 0 1] applied to an image highlights vertical edges; its transpose [−1 0 1]^T highlights horizontal edges.]
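A sketch of shifting the kernel [−1 0 1] over an image with scipy (the tiny image below is illustrative):

import numpy as np
from scipy.signal import convolve2d

image = np.zeros((5, 5))
image[:, 2:] = 1.0                           # left half dark, right half bright

kx = np.array([[-1, 0, 1]])                  # horizontal derivative kernel
edges = convolve2d(image, kx, mode='same')   # strong response along the vertical edge
print(edges)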


Computing features

Many interesting kernels (filters) are known from Image Processing:

• first- and second-order derivatives, image gradients

• Sobel, Prewitt, ...

In Neural Networks, kernels are learned by training.

Since kernels are small and weights are shared, training is relatively fast.


Recurrent Networks

In a recurrent network you may have cycles:

• the dynamics is very complex: it is not even clear that it stabilizes

• difficult to train

• biologically more realistic

Restricted models:

• Long Short-Term Memory models (LSTM)

• Gated Recurrent Units (GRU)


LSTM and GRU

LSTMs are useful to model sequences:

• equivalent to very deep nets with one hidden layer per time slice (net unrolling)

• weights are shared between different time slices

• they can keep information for a long time in an internal state
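A minimal, hypothetical Keras sketch of an LSTM sequence classifier (sequence length, feature count, and layer sizes are illustrative):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(100, 8)))    # 100 time slices, 8 features each;
                                             # the internal state carries information across slices
model.add(Dense(1, activation='sigmoid'))    # binary prediction from the final state
model.compile(loss='binary_crossentropy', optimizer='adam')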


Symmetrically connected networks

Similar to recurrent networks, but connections between units are symmetrical (they have the same weight in both directions).

They have stable configurations corresponding to local minima of a suitable energy function.

Hopfield nets: symmetrically connected nets without hidden units

Boltzmann machines: symmetrically connected nets with hidden units:

• more powerful models than Hopfield nets

• less powerful than general recurrent networks

• have a nice and simple learning algorithm


What a real network looks like

VGG 16 (Simonyan and Zisserman): 92.7% top-5 accuracy on ImageNet.

Picture by Davi Frossard: VGG in TensorFlow


How do we implement a neural net?

Neural nets look complicated.

How do we implement them?

There exist suitable languages:

• Theano, University of Montreal

• TensorFlow, Google Brain

• Caffe, Berkeley Vision

• Keras, F. Chollet

• PyTorch, Facebook

• ...


VGG 16 in Keras

From GitHub

def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1), input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))
    ...

The whole model is defined in 50 lines of code.


But what about training?

So complex ...

fit(x, y, batch_size=32, epochs=10)

• x: input, an array of data (hence, typically, an array of arrays)

• y: labels, an array of target categories

• batch_size: integer, number of samples per gradient update

• epochs: integer, the number of epochs (passes over the data) to train the model
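Putting the pieces together, a typical (hypothetical) training call for the VGG_16 model above, assuming x_train and y_train have already been loaded:

model = VGG_16()
model.compile(loss='categorical_crossentropy', optimizer='sgd')   # choose loss and optimizer
model.fit(x_train, y_train, batch_size=32, epochs=10)             # x_train: images, y_train: one-hot labels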


Next arguments

Features and deep features


Features

Any individual measurable property of data useful for the solution of a specific task is called a feature.

Examples:

• Emergency C-section: age, first pregnancy, anemia, fetal malpresentation, previous premature birth, anomalous ultrasound, ...

• Weather: humidity, pressure, temperature, wind, rain, snow, ...

• Expected lifetime: age, health, annual income, kind of work, ...


Derived (inner) features

New interesting features may be derived as a combination of input features.

Suppose for instance that we want to model some phenomenon with a cubic function

$f(x) = ax^3 + bx^2 + cx + d$

We can use x as input, or ...

we can precompute $x$, $x^2$ and $x^3$, reducing the problem to a linear model!
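A sketch of this trick with numpy: precomputing the powers of x turns cubic fitting into linear least squares (the data below are synthetic, generated from known coefficients):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 2*x**3 - x**2 + 3*x + 1 + rng.normal(0, 0.1, 50)     # noisy cubic

X = np.stack([x**3, x**2, x, np.ones_like(x)], axis=1)   # derived features
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)           # fit a linear model
print(coeffs)                                             # ≈ [2, -1, 3, 1]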


Traditional Image Processing

In order to process an image we start computing interesting derived features on the image:

- first-order derivatives
- second-order derivatives
- difference of Gaussians
- Laplacian
- ...

[Figure: original image, Gaussian blur 25, Gaussian blur 10, and their Gaussian difference.]

Then we use these derived features to get the desired output.


Deep learning, in a deeper sense

Discovering good features is a complex task.

Why not delegate this task to the machine, and learn the features too?

Deep learning exploits a hierarchical organization of the learning model, allowing complex features to be computed in terms of simpler ones, through non-linear transformations.


AI, machine learning, deep learning

• Knowledge-based systems: take an expert, ask him how he solves a problem, and try to mimic his approach by means of logical rules

• Traditional Machine Learning: take an expert, ask him which features of the data are relevant to solve a given problem, and let the machine learn the mapping

• Deep Learning: get rid of the expert


Relations between research areas

[Figure: a nested Venn diagram. Deep learning (example: MLPs, autoencoders) is contained in representation learning, which is contained in machine learning (example: logistic regression), which is contained in artificial intelligence (example: knowledge bases).]

Picture taken from “Deep Learning” by I. Goodfellow, Y. Bengio and A. Courville, MIT Press.


Components trained to learn

[Figure: flow charts from input to output comparing four approaches. Rule-based systems use a hand-designed program; classic machine learning uses hand-designed features and a learned mapping from features; representation learning learns the features as well as the mapping; deep learning learns simple features, then more complex features, then the mapping. Shaded boxes mark the components trained to learn.]

Picture taken from “Deep Learning” by I. Goodfellow, Y. Bengio and A. Courville, MIT Press.


Next arguments

Some successful applications

• MNIST and ImageNet
• Speech Recognition
• Lip reading
• Text generation
• Deep dreams and Inceptionism
• Mimicking style
• Robot navigation
• Game simulation


MNIST

Modified National Institute of Standards and Technology database

• grayscale images of handwritten digits, 28 × 28 pixels each (each digit fits in a 20 × 20 box)

• 60,000 training images and 10,000 test images
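A hypothetical end-to-end sketch in Keras (the framework used earlier), training a shallow dense classifier on MNIST; all sizes and choices are illustrative:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0        # scale pixels to [0, 1]
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))                 # 28x28 image -> vector of 784
model.add(Dense(128, activation='relu'))                 # a single hidden layer (shallow net)
model.add(Dense(10, activation='softmax'))               # one output per digit class
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10)
print(model.evaluate(x_test, y_test))                    # [loss, accuracy] on the test set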


MNIST

A comparison of different techniques

Classifier                      Error rate (%)
Linear classifier               7.6
K-Nearest Neighbors             0.52
SVM                             0.56
Shallow neural network          1.6
Deep neural network             0.35
Convolutional neural network    0.21

See LeCun’s page on the MNIST database for more data.


ImageNet

ImageNet (@Stanford Vision Lab)

• high-resolution color images covering 22K object classes

• over 15 million labeled images from the web


ImageNet competition

Annual competition of image classification (since 2010).

• 1.2 million images (1,000 categories)

• make five guesses about the image label, ordered by confidence


ImageNet samples


ImageNet results


Speech recognition

Several stages (similar to optical character recognition):

• Segmentation: convert the sound wave into a sequence of vectors of acoustic coefficients. Typical sampling: every 10 milliseconds.

• The acoustic model: use adjacent vectors of acoustic coefficients to associate probabilities with phonemes.

• Decoding: find the sequence of phonemes that best fits the acoustic data and a model of expected sentences.

Deep neural networks, pioneered by George Dahl and Abdel-rahman Mohamed, are replacing previous machine learning methods.


Speech recognition

Major companies are investing a lot of money in speech recognition: Amazon (with Intel), Google, Microsoft, ...

Achieving Human Parity in Conversational Speech Recognition. Speech & Dialog research group at Microsoft, 2016.

G. Zweig (project manager) attributes the accomplishment to the systematic use of the latest neural network technology in all aspects of the system.


Lip reading

Google’s DeepMind AI can lip-read TV shows better than a professional


Text Generation

See Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent Neural Networks”.

Examples of fake algebraic documents automatically generated by an RNN.


Deep dreams

Visit the Deep Dream Generator.


Mimicking style

“A Neural Algorithm of Artistic Style”, L.A. Gatys, A.S. Ecker, M. Bethge

Similar to inceptionism, but with “style” (texture) instead of content.


More examples


More examples


Mimicking style: a different approach

Image-to-image translation with Cycle Generative Adversarial Networks


Robot navigation

Quadcopter Navigation in the Forest using Deep Neural Networks

Robotics and Perception Group, University of Zurich, Switzerland & Institute for Artificial Intelligence (IDSIA), Lugano, Switzerland

Based on Imitation Learning


Atari Games and Q-learning

Google DeepMind’s system playing Atari games (2013)

Recently extended with Imagination-Augmented Agents (2017)

video

Based on:

• deep neural networks

• an innovative reinforcement learning technique called Q-learning


Atari Games and Q-learning

The same network architecture was applied to all games.

Inputs are screen frames.

Works well for reactive games, not for planning.
