
Page 1

Deep learning and neural networks

B. Mehlig, Department of Physics, University of Gothenburg, Sweden
FFR135/FIM720 Artificial Neural Networks, Chalmers/Gothenburg University, 7.5 credits

Thanks to E. Werner, H. Linander & O. Balabanov.

Page 2

Neurons in the cerebral cortex

Neurons in the cerebral cortex (the outer layer of the cerebrum, the largest and best developed part of the human brain).

brainmaps.org

Page 3

Neurons in the cerebral cortex

Neurons in the cerebral cortex (the outer layer of the cerebrum, the largest and best developed part of the human brain).

brainmaps.org

Figure labels: dendrites (input), cell body (process), axon (output).

Page 4

Neuron anatomy and activity

Schematic drawing of a neuron: dendrites (input), cell body (process), axon (output). Scale bar: 10 µm. Total length of dendrites up to … cm.

Output of a neuron: spike train (active/inactive as a function of time). Spike train in an electrosensory pyramidal neuron of the fish Eigenmannia: Gabbiani & Metzner, J. Exp. Biol. 202 (1999) 1267.

Page 5

McCulloch-Pitts neuron

Simple model for a neuron. Signal processing: weighted sum of inputs, followed by an activation function.

Incoming signals $x_j$, $j = 1, \ldots, N$; weights (synaptic couplings) $w_{i1}, w_{i2}, \ldots, w_{iN}$; threshold $\theta_i$; neuron number $i$; output $O_i$:

$$O_i = g\Big(\sum_{j=1}^{N} w_{ij}\,x_j - \theta_i\Big) = g\big(w_{i1}x_1 + \ldots + w_{iN}x_N - \theta_i\big)$$

Here $b_i = \sum_{j=1}^{N} w_{ij}x_j - \theta_i$ is the local field, and $g(b_i)$ is the activation function.

McCulloch & Pitts, Bull. Math. Biophys. 5 (1943) 115
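A minimal numerical sketch of this update rule (not from the slides; the weight vector, input, and threshold below are made-up values for illustration):

```python
import numpy as np

def mcculloch_pitts(w, x, theta, g=np.sign):
    """Output of a McCulloch-Pitts neuron: O = g(sum_j w_j x_j - theta)."""
    b = np.dot(w, x) - theta      # local field b
    return g(b)

# Example with two inputs and the signum activation function
w = np.array([1.0, 1.0])          # hypothetical weights
x = np.array([0.0, 1.0])          # hypothetical input pattern
print(mcculloch_pitts(w, x, theta=1.5))   # local field -0.5, output -1.0
```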

Page 6

Activation function

Signal processing of the McCulloch-Pitts neuron: weighted sum of inputs (the local field $b_i$), passed through the activation function $g(b_i)$:

$$O_i = g\Big(\sum_{j=1}^{N} w_{ij}\,x_j - \theta_i\Big) = g\big(w_{i1}x_1 + \ldots + w_{iN}x_N - \theta_i\big)$$

Common choices for the activation function:
- step function $\theta_H(b)$ (values 0 and 1)
- signum function $\mathrm{sgn}(b)$ (values $-1$ and $1$)
- sigmoid function $g(b) = \dfrac{1}{e^{-b} + 1}$
- ReLU function $g(b) = \max(0, b)$
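The four activation functions written out as a short sketch (not from the slides):

```python
import numpy as np

def step(b):                      # Heaviside step function theta_H(b): 0 or 1
    return np.where(b >= 0, 1.0, 0.0)

def signum(b):                    # signum function: -1 or +1 (value at b = 0 chosen as +1)
    return np.where(b >= 0, 1.0, -1.0)

def sigmoid(b):                   # sigmoid function 1 / (exp(-b) + 1)
    return 1.0 / (np.exp(-b) + 1.0)

def relu(b):                      # ReLU function max(0, b)
    return np.maximum(0.0, b)

b = np.linspace(-2, 2, 5)
print(step(b), signum(b), sigmoid(b), relu(b), sep="\n")
```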

Page 7

Highly simplified idealisation of a neuron

Take the activation function to be the signum function $\mathrm{sgn}(b_i)$. The output of the neuron can then assume only two values, $\pm 1$ ($+1$ active, $-1$ inactive), representing the level of activity of the neuron.

This is an idealisation:
- real neurons fire spike trains
- switching between active and inactive can be delayed (time delay)
- real neurons may perform non-linear summation of inputs
- real neurons are subject to noise (stochastic dynamics)

Page 8

Neural networks

Connect neurons into networks that can perform computing tasks: for example object identification, video analysis, speech recognition, classification, data clustering, ...

Examples of network layouts (see the Figure): simple perceptrons, two-layer perceptron, recurrent network, Boltzmann machine.

The term perceptron was coined in 1958.

Rosenblatt, Psychological Review 65 (1958) 386

Page 9

A classification task

Input patterns $\boldsymbol{x}^{(\mu)}$. The index $\mu = 1, \ldots, p$ labels the different patterns.

Each pattern has two components, $x_1^{(\mu)}$ and $x_2^{(\mu)}$.

Arrange the components into a vector, $\boldsymbol{x}^{(\mu)} = \begin{bmatrix} x_1^{(\mu)} \\ x_2^{(\mu)} \end{bmatrix}$. The pattern $\boldsymbol{x}^{(8)}$ is shown in the Figure.

The patterns fall into two classes: one on the left, and one on the right.

Page 10

A classification task

Input patterns $\boldsymbol{x}^{(\mu)}$. The index $\mu = 1, \ldots, p$ labels the different patterns.

Each pattern has two components, $x_1^{(\mu)}$ and $x_2^{(\mu)}$.

Arrange the components into a vector, $\boldsymbol{x}^{(\mu)} = \begin{bmatrix} x_1^{(\mu)} \\ x_2^{(\mu)} \end{bmatrix}$. The pattern $\boldsymbol{x}^{(8)}$ is shown in the Figure.

The patterns fall into two classes: one on the left, and one on the right.

Draw a red line (decision boundary) to distinguish the two types of patterns.

Page 11

A classification task

Input patterns $\boldsymbol{x}^{(\mu)}$. The index $\mu = 1, \ldots, p$ labels the different patterns.

Each pattern has two components, $x_1^{(\mu)}$ and $x_2^{(\mu)}$.

Arrange the components into a vector, $\boldsymbol{x}^{(\mu)} = \begin{bmatrix} x_1^{(\mu)} \\ x_2^{(\mu)} \end{bmatrix}$. The pattern $\boldsymbol{x}^{(8)}$ is shown in the Figure.

The patterns fall into two classes: one on the left, and one on the right.

Draw a red line (decision boundary) to distinguish the two types of patterns.

Aim: train a neural network to compute the decision boundary. To do this, define target values: $t^{(\mu)} = 1$ for one class, and $t^{(\mu)} = -1$ for the other.

Training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$.

Page 12

Simple perceptron

Simple perceptron: one neuron with two input terminals $x_1$ and $x_2$. Activation function $\mathrm{sgn}(b)$. No threshold, $\theta = 0$. Output $O = \mathrm{sgn}(w_1 x_1 + w_2 x_2) = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x})$.

Vectors: $\boldsymbol{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = |\boldsymbol{w}| \begin{bmatrix} \cos\beta \\ \sin\beta \end{bmatrix}$ and $\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = |\boldsymbol{x}| \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$, where $|\boldsymbol{w}|$ denotes the length of the vector $\boldsymbol{w}$.

Scalar product between the vectors:

$$\boldsymbol{w} \cdot \boldsymbol{x} = w_1 x_1 + w_2 x_2 = |\boldsymbol{w}|\,|\boldsymbol{x}|\,(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = |\boldsymbol{w}|\,|\boldsymbol{x}|\,\cos(\alpha - \beta)$$

So $\boldsymbol{w} \cdot \boldsymbol{x} = |\boldsymbol{w}|\,|\boldsymbol{x}|\,\cos\varphi$, where $\varphi$ is the angle between $\boldsymbol{w}$ and $\boldsymbol{x}$.

Aim: adjust the weights so that the network outputs the correct target value for all patterns:

$$O^{(\mu)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(\mu)}) = t^{(\mu)} \quad \text{for } \mu = 1, \ldots, p$$

Page 13

Geometrical solution

Simple perceptron: one neuron with two input terminals $x_1$ and $x_2$. Activation function $\mathrm{sgn}(b)$. No threshold, $\theta = 0$. Output $O = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x})$ with $\boldsymbol{w} \cdot \boldsymbol{x} = |\boldsymbol{w}|\,|\boldsymbol{x}|\,\cos\varphi$, where $\varphi$ is the angle between $\boldsymbol{w}$ and $\boldsymbol{x}$.

Aim: adjust the weights so that the network outputs the correct target values for all patterns: $O^{(\mu)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(\mu)}) = t^{(\mu)}$ for $\mu = 1, \ldots, p$.

Solution: define the decision boundary by $\boldsymbol{w} \cdot \boldsymbol{x} = 0$, so that $\boldsymbol{w} \perp$ decision boundary.

Check: $\boldsymbol{w} \cdot \boldsymbol{x}^{(8)} > 0$, so $O^{(8)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(8)}) = 1$. Correct, since $t^{(8)} = 1$.
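A small numerical illustration of this geometry (a sketch, not from the slides; the weight vector and pattern are made up):

```python
import numpy as np

w = np.array([1.0, -1.0])            # hypothetical weight vector
x8 = np.array([2.0, 0.5])            # hypothetical pattern x^(8)

# Classification: O = sgn(w . x); here w . x8 = 1.5 > 0, so O^(8) = 1
print(np.sign(np.dot(w, x8)))

# The decision boundary w . x = 0 is a line through the origin, perpendicular to w:
x_on_boundary = np.array([1.0, 1.0])
print(np.dot(w, x_on_boundary))      # 0.0
```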

Page 14

Hebb's rule

Now the pattern $\boldsymbol{x}^{(8)}$ is on the wrong side of the red line, so $O^{(8)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(8)}) \neq t^{(8)}$.

Move the red line so that $O^{(8)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(8)}) = t^{(8)}$, by rotating the weight vector $\boldsymbol{w}$.

Page 15

Hebb's rule

Now the pattern $\boldsymbol{x}^{(8)}$ is on the wrong side of the red line, so $O^{(8)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(8)}) \neq t^{(8)}$.

Move the red line so that $O^{(8)} = t^{(8)}$, by rotating the weight vector $\boldsymbol{w}$:

$\boldsymbol{w}' = \boldsymbol{w} + \eta\,\boldsymbol{x}^{(8)}$ (small parameter $\eta > 0$), so that $O^{(8)} = \mathrm{sgn}(\boldsymbol{w}' \cdot \boldsymbol{x}^{(8)}) = t^{(8)}$.

Page 16

Hebb's rule

Now the pattern $\boldsymbol{x}^{(4)}$ is on the wrong side of the red line, so $O^{(4)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(4)}) \neq t^{(4)}$.

Move the red line so that $O^{(4)} = t^{(4)}$, by rotating the weight vector $\boldsymbol{w}$:

$\boldsymbol{w}' = \boldsymbol{w} - \eta\,\boldsymbol{x}^{(4)}$ (small parameter $\eta > 0$), so that $O^{(4)} = \mathrm{sgn}(\boldsymbol{w}' \cdot \boldsymbol{x}^{(4)}) = t^{(4)}$.

Page 17

Hebb's rule

Now the pattern $\boldsymbol{x}^{(4)}$ is on the wrong side of the red line, so $O^{(4)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(4)}) \neq t^{(4)}$.

Move the red line so that $O^{(4)} = t^{(4)}$, by rotating the weight vector $\boldsymbol{w}$:

$\boldsymbol{w}' = \boldsymbol{w} - \eta\,\boldsymbol{x}^{(4)}$ (small parameter $\eta > 0$), so that $O^{(4)} = \mathrm{sgn}(\boldsymbol{w}' \cdot \boldsymbol{x}^{(4)}) = t^{(4)}$.

Page 18

Hebb's rule

Now the pattern $\boldsymbol{x}^{(4)}$ is on the wrong side of the red line, so $O^{(4)} = \mathrm{sgn}(\boldsymbol{w} \cdot \boldsymbol{x}^{(4)}) \neq t^{(4)}$.

Move the red line so that $O^{(4)} = t^{(4)}$, by rotating the weight vector: $\boldsymbol{w}' = \boldsymbol{w} - \eta\,\boldsymbol{x}^{(4)}$ (small parameter $\eta > 0$).

Note the minus sign:

$\boldsymbol{w}' = \boldsymbol{w} + \eta\,\boldsymbol{x}^{(8)}$, where $t^{(8)} = 1$,
$\boldsymbol{w}' = \boldsymbol{w} - \eta\,\boldsymbol{x}^{(4)}$, where $t^{(4)} = -1$.

Learning rule (Hebb's rule): $\boldsymbol{w}' = \boldsymbol{w} + \delta\boldsymbol{w}$ with $\delta\boldsymbol{w} = \eta\, t^{(\mu)}\,\boldsymbol{x}^{(\mu)}$.

Apply the learning rule many times until the problem is solved.

Page 19

Example - AND function

Logical AND:

x1  x2   t
 0   0  -1
 1   0  -1
 0   1  -1
 1   1   1

In the Figure, $w_1 = 1$ and $w_2 = 1$. The threshold $\theta$ determines the intersection of the decision boundary with the $x_2$-axis.

Condition for the decision boundary: $\boldsymbol{w} \cdot \boldsymbol{x} - \theta = 0$.

Line equation for the decision boundary: $x_2 = -\dfrac{w_1}{w_2}\,x_1 + \dfrac{\theta}{w_2}$.
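A quick check of this construction (a sketch, not from the slides; the threshold value 1.5 is an assumption that places the decision boundary $x_2 = -x_1 + 1.5$ between the pattern $(1,1)$ and the other three):

```python
import numpy as np

# Logical AND with a single neuron O = sgn(w . x - theta),
# using w1 = w2 = 1 as in the Figure and an assumed threshold theta = 1.5.
w, theta = np.array([1.0, 1.0]), 1.5
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, int(np.sign(np.dot(w, x) - theta)))
# output: -1, -1, -1, 1  (matches the AND targets)
```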

Page 20

Example - XOR function

Logical XOR:

x1  x2   t
 0   0  -1
 1   0   1
 0   1   1
 1   1  -1

Page 21

Example - XOR function

Logical XOR (truth table as above).

This problem is not linearly separable, because we cannot separate the patterns with $t = 1$ from those with $t = -1$ by a single red line.

Page 22

Example - XOR function

Logical XOR (truth table as above).

This problem is not linearly separable, because we cannot separate the patterns with $t = 1$ from those with $t = -1$ by a single red line.

Solution: use two red lines.

Page 23

Example - XOR function

Add a layer of hidden neurons $V_1$ and $V_2$ (neither input nor output), with all hidden weights equal to 1.

Two hidden neurons; each one defines one red line. Together they define three regions: I, II, and III.

We need a third neuron (with weights $W_1$ and $W_2$) to associate region II with $t = 1$, and regions I and III with $t = -1$. With $W_1 = 1$, $W_2 = -1$ and threshold 1:

$$O = \mathrm{sgn}(V_1 - V_2 - 1)$$
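A sketch of this two-layer construction (not from the slides: the hidden thresholds 0.5 and 1.5 are assumed values chosen so that one red line lies below and one above the patterns $(1,0)$ and $(0,1)$; the output neuron is exactly the one given above):

```python
def sgn(b):
    return 1 if b >= 0 else -1

def xor_net(x1, x2):
    # Hidden layer: two neurons, all hidden weights equal to 1.
    V1 = sgn(x1 + x2 - 0.5)   # assumed threshold 0.5 (first red line)
    V2 = sgn(x1 + x2 - 1.5)   # assumed threshold 1.5 (second red line)
    # Output neuron as on the slide: O = sgn(V1 - V2 - 1)
    return sgn(V1 - V2 - 1)

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, xor_net(*x))     # -1, 1, 1, -1: logical XOR with targets +/-1
```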

Page 24

Non-(linearly) separable problems

Solve problems that are not linearly separable with a hidden layer of neurons.

Four hidden neurons, one for each red line segment. Move the red lines into the correct configuration by repeatedly using Hebb's rule until the problem is solved (a fifth neuron assigns the regions and solves the classification problem).

Page 25

Generalisation

Train the network on a training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary.

Once all red lines are in the right place (all weights determined), apply the network to a new data set. If the training set was reliable, then the network has learnt to classify the new data: it has learnt to generalise.

Page 26

Training

Train the network on a training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary.

Once all red lines are in the right place (all weights determined), apply the network to a new data set. If the training set was reliable, then the network has learnt to classify the new data: it has learnt to generalise.

Page 27

Training

Train the network on a training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary.

Usually a small number of errors is acceptable. It is often not meaningful to try to fine-tune the decision boundary very precisely (input patterns are subject to noise).

Page 28

Overfitting

Train the network on a training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary.

Here: used 15 hidden neurons to fit the decision boundary very precisely (the Figure shows the training data set).

But often the inputs are affected by noise, which can be very different in different data sets.

Page 29

Overfitting

Train the network on a training set $(\boldsymbol{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary.

Here: used 15 hidden neurons to fit the decision boundary very precisely (the Figure shows the test data set).

But often the inputs are affected by noise, which can be very different in different data sets.

Here the network just learnt the noise in the training set (overfitting). Overfitting is a huge problem, because networks usually have many parameters (weights and thresholds).

Page 30

Multilayer perceptrons

All examples so far had a two-dimensional input space and one-dimensional outputs, so that we could draw the problem and understand its geometrical nature.

In reality, problems are often high-dimensional. For example image recognition: input dimension $N = \#\text{pixels} \times 3$. Output dimension: the number of classes to be recognised (car, lorry, road sign, person, bicycle, ...).

Multilayer perceptron: a network with many hidden layers (numbered $\ldots, \ell-1, \ell, \ell+1, \ldots$).

Problems: (i) overfitting is severe if there are many weights; (ii) nets with many layers learn slowly.

Questions: (i) how many hidden layers are necessary? (ii) how many hidden layers are efficient?

Page 31

How many hidden layers?

All Boolean functions with $N$ inputs can be trained/learned with a single hidden layer.

Proof by construction. Requires $2^N$ neurons in the hidden layer.

For large $N$, this architecture is not practical, because the number of neurons increases exponentially with $N$.

Example: the parity function ($O = 1$ if an odd number of inputs equal 1, otherwise $O = 0$). A more efficient layout builds the network out of XOR units, using several hidden layers; this requires only of order $N$ units.

But such deep networks are in general hard to train.

Page 32

Deep learning

Deep learning: perceptrons with large numbers of layers (more than two). Neural nets recognise objects in images with very high accuracy.

Better than humans?

Neural nets have been around since the 1980s. It was always thought that they are very difficult to train. Why does it suddenly work so well?

(Figure: classification error of neural nets, compared with an estimate of the human error.)

Page 33

Image classification interface for humans

cs.stanford.edu/people/karpathy/ilsvrc

Page 34

Better training sets

To obtain a training set, one must collect and manually annotate data. Expensive. Challenges: (i) efficient data collection and annotation; (ii) large variety in the collected data is required.

Example: ImageNet (publicly available data set): more than $10^7$ images, classified into $10^3$ classes.

image-net.org

Page 35

Manual annotation of training data

xkcd.com/1897

Page 36

reCAPTCHA

Two goals: (i) protect websites from bots; (ii) users who click correctly provide correct annotations.

github.com/google/recaptcha

Page 37

Better layouts: convolutional networks

Convolutional network: each neuron in the first hidden layer is connected to a small region of the inputs, here $3 \times 3$ pixels.

Slide the region over the input image. Use the same weights for all hidden neurons. Update:

$$V_{mn} = g\Big(\sum_{k=1}^{3}\sum_{l=1}^{3} w_{kl}\, x_{m+k-1,\,n+l-1}\Big)$$

- detects the same local feature everywhere in the input image (edge, corner, ...): a feature map
- the form of the sum is called a convolution
- less overfitting, because there are fewer weights
- use several feature maps to detect different features in the input image

(In the Figure: a $10 \times 10$ input image gives an $8 \times 8$ feature map $V_{mn}$; four feature maps give a hidden layer of size $8 \times 8 \times 4$.)
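A direct-loop sketch of this update (not from the slides; the input image and kernel below are random placeholders, and tanh is an arbitrary choice for the activation function $g$):

```python
import numpy as np

def feature_map(x, w, g=np.tanh):
    """V_mn = g( sum_{k,l} w_kl * x_{m+k-1, n+l-1} ), with a shared 3x3 kernel w."""
    K, L = w.shape
    M, N = x.shape[0] - K + 1, x.shape[1] - L + 1   # 10x10 input, 3x3 kernel -> 8x8 map
    V = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            V[m, n] = g(np.sum(w * x[m:m + K, n:n + L]))
    return V

x = np.random.rand(10, 10)        # hypothetical input image
w = 0.1 * np.random.randn(3, 3)   # hypothetical shared weights (one feature map)
print(feature_map(x, w).shape)    # (8, 8)
```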

Page 38

Convolutional networks

- max-pooling: take the maximum of $V_{mn}$ over a small region ($2 \times 2$) to reduce the number of parameters
- add fully connected layers to learn more abstract features

(In the Figure: $10 \times 10$ inputs, $8 \times 8 \times 4$ hidden feature maps, max-pooling to $4 \times 4 \times 4$, then fully connected layers and outputs.)

Krizhevsky, Sutskever & Hinton (2012)

But this is not new! Patterns detected by a convolutional net in 1986: Rumelhart, Hinton & Williams (1986)
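A short sketch of the max-pooling step (not from the slides):

```python
import numpy as np

def max_pool_2x2(V):
    """Replace each 2x2 block of the feature map V by its maximum (8x8 -> 4x4)."""
    H, W = V.shape                                   # assumes H and W are even
    return V.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

V = np.random.rand(8, 8)                             # hypothetical 8x8 feature map
print(max_pool_2x2(V).shape)                         # (4, 4)
```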

Page 39

Object detection

Object detection from video frames using Yolo-9000.

Film recorded with an iPhone on my way home from work.

Tiny number of weights (so that the algorithm runs on my laptop).

Pretrained on the Pascal VOC data set.

github.com/philipperemy/yolo-9000

host.robots.ox.ac.uk/pascal/VOC/

Page 40

Recognising NIST digits

The National Institute of Standards and Technology (NIST) in the USA has a database of hand-written digits.

A convolutional network trained on the MNIST data set (60 000 digits) classifies the digits in the corresponding test set with high accuracy: only 23 out of 10 000 wrong.

But the convolutional net fails on our own digits!

Convolutional nets excel at learning patterns in a given input distribution very precisely. But they may fail if the distribution is slightly distorted.

http://yann.lecun.com/exdb/mnist/

Ciresan, Meier & Schmidhuber, arxiv:1202.2745

O. Balabanov (2018)

Page 41

Further problems

Convolutional nets do not understand what they see in the same way as humans do.

A convolutional net misclassifies a slightly distorted image with high confidence: the original image is correctly classified as a car, while the slightly distorted image is classified as an ostrich (Szegedy et al., arXiv:1312.6199). The nearest decision boundary is very close. Not intuitive, but possible in a very high-dimensional input space.

A convolutional net misclassifies noise with certainty: a noise image is classified as a leopard with 99.6% confidence (Khurshudov, blog, 2015; Nguyen et al., arXiv:1412.1897). The translation-invariant nature of the convolutional layout makes it difficult to learn global features.

Page 42

Conclusions

Artificial neural networks were already studied in the 1980s.

In the last few years: a deep-learning revolution, due to
- better training sets
- better hardware (GPUs, dedicated chips)
- improved algorithms for object detection
Applications: Google, Facebook, Tesla, Zenuity, ...

Perspectives:
- close connections to statistical physics (spin glasses, random magnets)
- related methods in mathematical statistics (big-data analysis)
- unsupervised learning (clustering and familiarity of input data)

But convolutional nets do not understand what they learn in the way humans do.

Literature: lecture notes for Artificial Neural Networks, FFR135 (Chalmers) / FIM720 (GU).

B. Mehlig, physics.gu.se/~frtbm

Page 43

OpenTA: Stellan Östlund & Hampus Linander