
Deep learning and neural networks


B. Mehlig, Department of Physics, University of Gothenburg, Sweden
FFR135/FIM720 Artificial Neural Networks, Chalmers/Gothenburg University, 7.5 credits

Thanks to E. Werner, H. Linander & O. Balabanov.

Neurons in the cerebral cortex

Neurons in the cerebral cortex (the outer layer of the cerebrum, the largest and best-developed part of the human brain).

[Figure: micrographs of cortical neurons, brainmaps.org]


Neuron anatomy and activity

[Figure: schematic drawing of a neuron, with dendrites (input), cell body (process), and axon (output); scale bar 10 µm; total length of dendrites up to … cm]

Output of a neuron: spike train (active/inactive as a function of time).
Spike train in an electrosensory pyramidal neuron in fish (Eigenmannia). Gabbiani & Metzner, J. Exp. Biol. 202 (1999) 1267

McCulloch-Pitts neuron

Simple model for a neuron. Signal processing: weighted sum of the inputs, followed by an activation function $g$.

Incoming signals $x_j$, $j = 1, \ldots, N$; weights (synaptic couplings) $w_{ij}$; threshold $\theta_i$; $i$ is the neuron number. Output of neuron $i$:

$O_i = g(b_i)$, with local field $b_i = \sum_{j=1}^{N} w_{ij} x_j - \theta_i$,

so that

$O_i = g\Big(\sum_{j=1}^{N} w_{ij} x_j - \theta_i\Big) = g(w_{i1} x_1 + \ldots + w_{iN} x_N - \theta_i)$.

McCulloch & Pitts, Bull. Math. Biophys. 5 (1943) 115

Activation function

Signal processing of the McCulloch-Pitts neuron: weighted sum of the inputs $x_j$ with weights $w_{ij}$ and threshold $\theta_i$, passed through an activation function $g$: output $O_i = g(b_i)$ with local field $b_i = \sum_{j=1}^{N} w_{ij} x_j - \theta_i$.

Common choices for the activation function $g(b)$:

- step function $\theta_H(b)$ (0 for $b < 0$, 1 for $b \ge 0$)
- signum function $\mathrm{sgn}(b)$ (values $-1$ and $+1$)
- sigmoid function $g(b) = \dfrac{1}{1 + e^{-b}}$
- ReLU function $g(b) = \max(0, b)$
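For reference, the four activation functions listed above can be written in a few lines of NumPy (a sketch; the convention sgn(0) = +1 is my choice):

```python
import numpy as np

def step(b):                          # Heaviside step: 0 for b < 0, 1 for b >= 0
    return np.where(b >= 0, 1.0, 0.0)

def signum(b):                        # signum function, values -1 and +1 (sgn(0) taken as +1)
    return np.where(b >= 0, 1.0, -1.0)

def sigmoid(b):                       # sigmoid g(b) = 1 / (1 + exp(-b))
    return 1.0 / (1.0 + np.exp(-b))

def relu(b):                          # ReLU g(b) = max(0, b)
    return np.maximum(0.0, b)

b = np.linspace(-3.0, 3.0, 7)
for g in (step, signum, sigmoid, relu):
    print(g.__name__, g(b))
```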

Highly simplified idealisation of a neuron

Take the activation function to be the signum function $\mathrm{sgn}(b_i)$. The output of neuron $i$ can then assume only two values, $\pm 1$, representing the level of activity of the neuron ($+1$ active, $-1$ inactive).

This is an idealisation:

- real neurons fire spike trains
- switching active ↔ inactive can be delayed (time delay)
- real neurons may perform non-linear summation of inputs
- real neurons are subject to noise (stochastic dynamics)

Neural networks

Connect neurons into networks that can perform computing tasks: for example object identification, video analysis, speech recognition, classification, data clustering, ...

Examples of layouts: simple perceptrons, two-layer perceptron, recurrent network, Boltzmann machine.

The term perceptron was coined in 1958.
Rosenblatt, Psychological Review 65 (1958) 386

A classification task

Input patterns $\mathbf{x}^{(\mu)}$. The index $\mu = 1, \ldots, p$ labels the different patterns. Each pattern has two components, $x_1^{(\mu)}$ and $x_2^{(\mu)}$. Arrange the components into a vector,

$\mathbf{x}^{(\mu)} = \begin{bmatrix} x_1^{(\mu)} \\ x_2^{(\mu)} \end{bmatrix}$.

The pattern $\mathbf{x}^{(8)}$ is shown in the Figure. The patterns fall into two classes: one on the left and one on the right.

Draw a red line (a decision boundary) to distinguish the two types of patterns.

Aim: train a neural network to compute the decision boundary. To do this, define target values: $t^{(\mu)} = 1$ for the one class and $t^{(\mu)} = -1$ for the other.

Training set: $(\mathbf{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$.

Simple perceptron

Simple perceptron: one neuron with two input terminals $x_1$ and $x_2$. Activation function $\mathrm{sgn}(b)$. No threshold, $\theta = 0$. Output:

$O = \mathrm{sgn}(w_1 x_1 + w_2 x_2) = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$.

Vectors: $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = |\mathbf{w}| \begin{bmatrix} \cos\beta \\ \sin\beta \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = |\mathbf{x}| \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$, where $|\mathbf{w}|$ denotes the length of the vector $\mathbf{w}$.

Scalar product between the vectors:

$\mathbf{w} \cdot \mathbf{x} = w_1 x_1 + w_2 x_2 = |\mathbf{w}|\,|\mathbf{x}| (\cos\alpha \cos\beta + \sin\alpha \sin\beta) = |\mathbf{w}|\,|\mathbf{x}| \cos(\alpha - \beta)$.

So $\mathbf{w} \cdot \mathbf{x} = |\mathbf{w}|\,|\mathbf{x}| \cos\varphi$, where $\varphi$ is the angle between $\mathbf{x}$ and $\mathbf{w}$.

Aim: adjust the weights so that the network outputs the correct target value for all patterns:

$O^{(\mu)} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^{(\mu)}) = t^{(\mu)}$ for $\mu = 1, \ldots, p$.
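A small sketch (the weight vector and the pattern below are arbitrary example values) of the perceptron output $O = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$ and of the identity $\mathbf{w} \cdot \mathbf{x} = |\mathbf{w}|\,|\mathbf{x}| \cos\varphi$:

```python
import numpy as np

w = np.array([1.0, 0.5])                         # weight vector
x = np.array([0.8, -0.3])                        # input pattern

O = np.sign(np.dot(w, x))                        # perceptron output, no threshold (theta = 0)

# Angle phi between w and x, and check of w . x = |w| |x| cos(phi)
phi = np.arccos(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x)))
print(O, np.dot(w, x), np.linalg.norm(w) * np.linalg.norm(x) * np.cos(phi))
```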

Geometrical solution

Simple perceptron: one neuron with two input terminals $x_1$ and $x_2$. Activation function $\mathrm{sgn}(b)$. No threshold, $\theta = 0$. Output $O = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \cdot \mathbf{x} = |\mathbf{w}|\,|\mathbf{x}| \cos\varphi$, where $\varphi$ is the angle between $\mathbf{x}$ and $\mathbf{w}$.

Aim: adjust the weights so that the network outputs the correct target values for all patterns: $O^{(\mu)} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^{(\mu)}) = t^{(\mu)}$ for $\mu = 1, \ldots, p$.

Solution: define the decision boundary by $\mathbf{w} \cdot \mathbf{x} = 0$, so that $\mathbf{w} \perp$ decision boundary.

Check: $\mathbf{w} \cdot \mathbf{x}^{(8)} > 0$, so $O^{(8)} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^{(8)}) = 1$. Correct, since $t^{(8)} = 1$.

(Legend in the Figures: the two classes are $t^{(\mu)} = 1$ and $t^{(\mu)} = -1$.)

Hebb’s rule

Now the pattern $\mathbf{x}^{(8)}$ (with $t^{(8)} = 1$) is on the wrong side of the red line, so $O^{(8)} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^{(8)}) \neq t^{(8)}$.

Move the red line so that $O^{(8)} = \mathrm{sgn}(\mathbf{w}' \cdot \mathbf{x}^{(8)}) = t^{(8)}$, by rotating the weight vector $\mathbf{w}$:

$\mathbf{w}' = \mathbf{w} + \eta\,\mathbf{x}^{(8)}$ (small parameter $\eta > 0$).

If instead the pattern $\mathbf{x}^{(4)}$ (with $t^{(4)} = -1$) is on the wrong side of the red line, $O^{(4)} = \mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}^{(4)}) \neq t^{(4)}$, rotate the weight vector the other way:

$\mathbf{w}' = \mathbf{w} - \eta\,\mathbf{x}^{(4)}$, so that $O^{(4)} = \mathrm{sgn}(\mathbf{w}' \cdot \mathbf{x}^{(4)}) = t^{(4)}$.

Note the minus sign: $\mathbf{w}' = \mathbf{w} + \eta\,\mathbf{x}^{(8)}$ where $t^{(8)} = 1$, but $\mathbf{w}' = \mathbf{w} - \eta\,\mathbf{x}^{(4)}$ where $t^{(4)} = -1$.

Learning rule (Hebb’s rule): $\mathbf{w}' = \mathbf{w} + \delta\mathbf{w}$ with $\delta\mathbf{w} = \eta\, t^{(\mu)} \mathbf{x}^{(\mu)}$.

Apply the learning rule many times, until the problem is solved.
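A minimal training loop implementing the learning rule $\delta\mathbf{w} = \eta\, t^{(\mu)} \mathbf{x}^{(\mu)}$. The update is applied to misclassified patterns, as in the example above; the toy data set is my own (linearly separable, with a decision boundary through the origin since $\theta = 0$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: p linearly separable patterns with targets t = +/-1.
p = 20
x = rng.uniform(-1.0, 1.0, size=(p, 2))
t = np.where(x[:, 0] + x[:, 1] > 0, 1.0, -1.0)

w = rng.normal(size=2)                    # initial weight vector
eta = 0.1                                 # small learning rate eta > 0

for epoch in range(100):
    errors = 0
    for mu in range(p):
        if np.sign(np.dot(w, x[mu])) != t[mu]:   # pattern on the wrong side of the red line
            w = w + eta * t[mu] * x[mu]          # Hebb's rule: dw = eta * t * x
            errors += 1
    if errors == 0:                       # all patterns classified correctly
        break

print("weights:", w, " epochs used:", epoch + 1)
```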

Example - AND function

Logical AND:

x_1  x_2   t
 0    0   -1
 1    0   -1
 0    1   -1
 1    1    1

The threshold $\theta$ determines the intersection of the decision boundary with the $x_2$-axis.

Condition for the decision boundary: $\mathbf{w} \cdot \mathbf{x} - \theta = 0$.

Line equation for the decision boundary: $x_2 = -\dfrac{w_1}{w_2}\, x_1 + \dfrac{\theta}{w_2}$.

For example, $w_1 = 1$ and $w_2 = 1$ (with a suitable threshold $\theta$) solve the problem.
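To check the AND construction numerically: the weights $w_1 = w_2 = 1$ are as above, while the threshold value $\theta = 1.5$ is my own choice (any value between 1 and 2 places the decision boundary correctly):

```python
import numpy as np

w = np.array([1.0, 1.0])                  # w1 = w2 = 1
theta = 1.5                               # assumed threshold, 1 < theta < 2

patterns = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
targets = np.array([-1, -1, -1, 1])

for x, t in zip(patterns, targets):
    O = int(np.sign(np.dot(w, x) - theta))    # O = sgn(w . x - theta)
    print(x, "->", O, " target:", t)
```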

Example - XOR function

Logical XOR:

x_1  x_2   t
 0    0   -1
 1    0    1
 0    1    1
 1    1   -1

This problem is not linearly separable, because we cannot separate the patterns with $t^{(\mu)} = 1$ from those with $t^{(\mu)} = -1$ by a single red line.

Solution: use two red lines.

Example - XOR function

Introduce a layer of hidden neurons $V_1$ and $V_2$ (neither input nor output neurons), with all hidden weights equal to 1.

Two hidden neurons, each one defines one red line. Together they define three regions: I, II, and III.

We need a third neuron (with weights $W_1$ and $W_2$ and a threshold) to associate region II with $t = 1$, and regions I and III with $t = -1$:

$O = \mathrm{sgn}(V_1 - V_2 - 1)$.
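A sketch of the two-layer construction. The output neuron $O = \mathrm{sgn}(V_1 - V_2 - 1)$ and the hidden weights equal to 1 follow the slide; the hidden thresholds $1/2$ and $3/2$ are my assumption for where the two red lines are placed:

```python
import numpy as np

def sgn(b):
    return np.where(b >= 0, 1.0, -1.0)    # sgn(0) taken as +1

def xor_net(x1, x2):
    # Hidden layer: two neurons, each defining one red line (weights 1, assumed thresholds).
    V1 = sgn(x1 + x2 - 0.5)
    V2 = sgn(x1 + x2 - 1.5)
    # Output neuron: weights +1 and -1, threshold 1 (as on the slide).
    return sgn(V1 - V2 - 1.0)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), "->", int(xor_net(x1, x2)))   # -1, 1, 1, -1
```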

Non-(linearly) separable problems

Solve problems that are not linearly separable with a hidden layer of neurons.

Four hidden neurons, one for each red line-segment. Move the red lines into the correct configuration by repeatedly using Hebb’s rule until the problem is solved (a fifth neuron assigns the regions and solves the classification problem).

Generalisation

Train the network on a training set $(\mathbf{x}^{(\mu)}, t^{(\mu)})$, $\mu = 1, \ldots, p$: move the red lines into the correct configuration by repeatedly applying Hebb’s rule to adjust all weights. Usually many iterations are necessary.

Once all red lines are in the right place (all weights determined), apply the network to a new data set. If the training set was reliable, then the network has learnt to classify the new data: it has learnt to generalise.


Training

The network is trained as above, by repeatedly applying Hebb’s rule to adjust all weights until the training set is classified correctly.

Usually a small number of errors is acceptable. It is often not meaningful to try to fine-tune the decision boundary very precisely, because the input patterns are subject to noise.

Overfitting

Here, 15 hidden neurons were used to fit the decision boundary of the training data set very precisely. But often the inputs are affected by noise, which can be very different in different data sets: compare the training data set with a test data set.

Here the network has just learnt the noise in the training set (overfitting). Overfitting is a huge problem, because networks usually have many parameters (weights and thresholds).

Multilayer perceptrons

All examples so far had a two-dimensional input space and one-dimensional outputs, so that we could draw the problem and understand its geometrical nature.

In reality, problems are often high-dimensional. For example image recognition: input dimension $N = \#\text{pixels} \times 3$. Output dimension: the number of classes to be recognised (car, lorry, road sign, person, bicycle, ...).

Multilayer perceptron: a network with many hidden layers (layers number $\ell - 1$, $\ell$, $\ell + 1$, ...).

Problems: (i) overfitting is severe if there are many weights, (ii) nets with many layers learn slowly.

Questions: (i) how many hidden layers are necessary? (ii) how many hidden layers are efficient?

How many hidden layers?

All Boolean functions with $N$ inputs can be trained/learned with a single hidden layer.

Proof by construction; requires $2^N$ neurons in the hidden layer. For large $N$, this architecture is not practical, because the number of neurons increases exponentially with $N$.

Example: the parity function ($O = 1$ if an odd number of inputs equal 1, otherwise $O = 0$). A more efficient layout: build the network out of XOR units, with several hidden layers. This requires only of order $N$ units. But such deep networks are in general hard to train.

Deep learning

Deep learning: perceptrons with large numbers of layers (more than two). Neural nets recognise objects in images with very high accuracy. Better than humans?

Neural nets have been around since the 1980s. It was always thought that they are very difficult to train. Why does it suddenly work so well?

[Figure: image-classification error of neural nets over time, compared with an estimate of the human error rate.]

Image classification interface for humans

cs.stanford.edu/people/karpathy/ilsvrc

Better training sets

To obtain a training set one must collect and manually annotate data. This is expensive. Challenges: (i) efficient data collection and annotation, (ii) a large variety in the collected data is required.

Example: ImageNet (a publicly available data set): more than $10^7$ images, classified into $10^3$ classes.

image-net.org

Manual annotation of training data


xkcd.com/1897

reCAPTCHA

Two goals: (i) protect websites from bots, (ii) users who click correctly provide correct annotations.

github.com/google/recaptcha

Better layouts: convolutional networks

Convolutional network. Each neuron in the first hidden layer is connected to a small region of the inputs, here $3 \times 3$ pixels.

Slide the region over the input image and use the same weights for all hidden neurons. Update:

$V_{mn} = g\Big( \sum_{k=1}^{3} \sum_{l=1}^{3} w_{kl}\, x_{m+k-1,\, n+l-1} \Big)$

- detects the same local feature everywhere in the input image (edge, corner, ...): a feature map
- the form of the sum is called a convolution
- less overfitting, because there are fewer weights
- use several feature maps to detect different features in the input image

[Figure: a $3 \times 3$ region slid over a $10 \times 10$ input image; each position gives one hidden neuron $V_{11}, V_{12}, \ldots$ With four feature maps, the $10 \times 10$ inputs map to an $8 \times 8 \times 4$ hidden layer.]
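The update formula can be written directly in NumPy. A sketch with a $10 \times 10$ input and a $3 \times 3$ kernel as in the slide (the image, the weights, and the choice of $g$ are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 10))     # 10 x 10 input image
w = rng.normal(size=(3, 3))       # 3 x 3 kernel; the same weights for all hidden neurons
g = np.tanh                       # some activation function g

# V_mn = g( sum_{k=1..3} sum_{l=1..3} w_kl * x_{m+k-1, n+l-1} )
V = np.zeros((8, 8))              # one 8 x 8 feature map
for m in range(8):
    for n in range(8):
        V[m, n] = g(np.sum(w * x[m:m + 3, n:n + 3]))

print(V.shape)                    # (8, 8)
```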

Convolutional networks

- max-pooling: take the maximum of $V_{mn}$ over a small region ($2 \times 2$) to reduce the number of parameters
- add fully connected layers to learn more abstract features

[Figure: $10 \times 10$ inputs → $8 \times 8 \times 4$ hidden neurons (convolution) → $4 \times 4 \times 4$ (max-pooling) → fully connected layers → outputs.]

Krizhevsky, Sutskever & Hinton (2012)

But this is not new! Patterns detected by a convolutional net in 1986: Rumelhart, Hinton & Williams (1986)
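A $2 \times 2$ max-pooling step in NumPy (a sketch; it reduces an $8 \times 8$ feature map to $4 \times 4$, as in the layout above):

```python
import numpy as np

def max_pool_2x2(V):
    """Maximum of V over non-overlapping 2 x 2 regions."""
    m, n = V.shape
    return V.reshape(m // 2, 2, n // 2, 2).max(axis=(1, 3))

V = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for an 8 x 8 feature map
print(max_pool_2x2(V).shape)                   # (4, 4)
```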

Object detection

Object detection from video frames using Yolo-9000. Film recorded with an iPhone on my way home from work. Tiny number of weights (so that the algorithm runs on my laptop). Pretrained on the Pascal VOC data set.

github.com/philipperemy/yolo-9000
host.robots.ox.ac.uk/pascal/VOC/

Recognising NIST digits

The National Institute of Standards and Technology (NIST) in the USA has a database of hand-written digits.

A convolutional network trained on the MNIST data set (60 000 digits) classifies the digits in the corresponding test set with high accuracy: only 23 out of 10 000 wrong. But the convolutional net fails on our own digits!

Convolutional nets excel at learning patterns in a given input distribution very precisely, but they may fail if the distribution is slightly distorted.

http://yann.lecun.com/exdb/mnist/
Ciresan, Meier & Schmidhuber, arxiv:1202.2745
O. Balabanov (2018)
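As an illustration only (this is not the architecture of the cited work), a minimal PyTorch sketch of a small convolutional network for $28 \times 28$ MNIST digits:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),   # 1 input channel -> 4 feature maps, 3 x 3 kernels
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 2 x 2 max-pooling: 26 x 26 -> 13 x 13
    nn.Flatten(),
    nn.Linear(4 * 13 * 13, 10),       # fully connected layer -> 10 digit classes
)

x = torch.randn(8, 1, 28, 28)         # a batch of 8 dummy digit images
print(model(x).shape)                 # torch.Size([8, 10])
```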

Further problems

Convolutional nets do not understand what they see in the way humans do.

A convolutional net misclassifies a slightly distorted image with high confidence: an original image is correctly classified as a car, but the slightly distorted image is classified as an ostrich. The nearest decision boundary is very close. Not intuitive, but possible in a very high-dimensional input space.
Szegedy et al., arxiv:1312.6199

A convolutional net misclassifies noise with certainty: noise classified as a leopard with 99.6% confidence. The translation-invariant nature of the convolutional layout makes it difficult to learn global features.
Khurshudov, blog (2015); Nguyen et al., arxiv:1412.1897

Conclusions

Artificial neural networks were already studied in the 1980s.

In the last few years there has been a deep-learning revolution, due to
- better training sets
- better hardware (GPUs, dedicated chips)
- improved algorithms for object detection
Applications: Google, Facebook, Tesla, Zenuity, ...

Perspectives:
- close connections to statistical physics (spin glasses, random magnets)
- related methods in mathematical statistics (big-data analysis)
- unsupervised learning (clustering and familiarity of input data)

But convolutional nets do not understand what they learn in the way humans do.

Literature: lecture notes, Artificial Neural Networks FFR135 (Chalmers) / FIM720 (GU).
B. Mehlig, physics.gu.se/~frtbm

OpenTA: Stellan Östlund & Hampus Linander