Part 8: Neural networks
EEEN60178: Digital Image Engineering Hujun Yin
Cross-section of the retina, drawn by Santiago Ramón y Cajal (1852-1934).
Part 8: Neural networks
• Perception & cognition: retina & visual pathway; artificial intelligence
• Neurons & models
• Learning paradigms & network architectures
• Multilayer perceptrons
• Self-organising networks
• Learning & generalisation
• Genetic algorithm
8.1 Perception & cognition
• Perception & cognition
[Image: a tree, as an example of a perceived object]
8.1 Perception & cognition
• Evolution
“Evolution is fascinating to watch. To me it is the most
interesting when one can observe the evolution of a single man.” Shana Alexander
8.1 Perception & cognition
• Evolution of eyes
Photosensory cell → receptor pigment → cup eye → pinhole eye → single-chambered (lens) eye
Compound eyes have good temporal resolving power: human eyes need about 0.05 second to identify objects, while compound eyes need only 0.01 second.
8.1 Perception & cognition
• Evolution of eyes
[Figure: Limulus (horseshoe crab): head, abdomen, tail and compound eyes composed of ommatidia]
Hartline et al. (1960s) verified receptive fields and lateral inhibition in Limulus eyes.
8.1 Perception & cognition
• Eyes
[Figure: lateral inhibition between neighbouring excitatory (+) receptors]
Lateral inhibition serves to extract object boundaries, giving rise to the Mach band effect.
8.1 Perception & cognition
• A system point of view
[Figure: the human visual system (HVS) viewed as a linear system. An input intensity profile (intensity vs spatial position) has its input spectrum (magnitude vs spatial frequency) filtered by the HVS modulation transfer function (MTF, relative sensitivity vs spatial frequency), giving the output spectrum and the output intensity profile.]
8.1 Perception & cognition
• Human eye
Cones: 6-7 million
Red (64%), Green (32%), Blue (2%)
Rods: ~120 million
Dynamic range (brightest to darkest): 20 billion times (i.e. 206 dB)
Cameras: 8-16 bits
8.1 Perception & cognition
• Retina
8.1 Perception & cognition
• Visual pathway
There are about 100 million photosensitive cells in the human retina, but only about 1 million optic nerve fibres connecting the retina to the cortex.
8.1 Perception & cognition
• Visual cortex
• Many regions have predetermined responses to visual stimuli.
• Sensory experience from the external world can influence how the brain wires itself up after birth.
Visual areas: V1, V2; V3 (orientation); V4 (colour); V5 (motion); V6 (depth)
8.1 Perception & cognition
• Cortex
8.1 Perception & cognition
• Perception
Einstein and ??
- from Kelso (1997)
8.1 Perception & cognition
• Perception
Rubin’s vase
Kanizsa triangle
8.1 Perception & cognition
• Cognition
Cognition is the result of a dynamic pattern-forming process in the brain, in which brain activities and behaviours appear to self-organise into coherent states or patterns. Self-organisation thus provides a paradigm for behaviour and cognition, as well as for the structure and function of the nervous system. - from Kelso (1997)
8.1 Perception & cognition
• Cognition
The human brain has trillions of cells, large enough [as claimed] to hold all the events we have ever experienced, so why can’t we remember all of them in detail? Pinker believes that our brains are compelled to organise: without categories, mental life would be chaos. Categorisation allows inference about objects having the same or similar properties. - from Pinker (1997)
8.1 Perception & cognition
• Cognition
[Diagram: Psychology, Behaviourism and Neurobiology leading to Cognitive Science, which comprises Computationism and Connectionism]
8.1 Perception & cognition
• Artificial system
Cognition is a complex information-processing process (Marr 1982).
8.1 Perception & cognition
• Computing history
Ancient China: the abacus
Blaise Pascal, 1642: Pascaline (adding machine)
Joseph Jacquard, 1804: punch-card controlled loom
Charles Babbage, 1823: Difference Engine
Augusta Ada (Lovelace), 1836: programming
Herman Hollerith, 1880: data processing devices (IBM)
UK team, 1942: Colossus
USA team, 1946: ENIAC
UK teams, 1949: EDSAC and Manchester Mark I
8.1 Perception & cognition
• Artificial intelligence
Alan Turing: Turing Machine,
Universal Turing Machine
Turing Test (Can Machines Think?)
Top-down: Representation/Symbols
Reasoning/Rules/Grammars
Learning/Logical Operations
Turing test
"Can machines think?" Alan Turing posed this question in his seminal 1950 paper, Computing Machinery and Intelligence, published in Mind.
Singularity?
8.2 Neurons
• Models of neural networks
(Artificial) Neural Networks Approach (bottom-up approach)
° NNs started from findings and discoveries in neurobiology and neuroscience. It has gradually been realised that human brains bear little resemblance to von Neumann type computers.
° The human brain has about 100 billion interconnected neurons. Each is a cell that uses biochemical reactions to receive, process and transmit stimuli. Each neuron is a simple processing unit: it receives stimuli (input) through its dendrites and synapses from roughly 1000 neighbouring neurons (their axons), accumulates these stimuli in the soma, and fires through its axon to its output neurons if the aggregated input exceeds a threshold (at the hillock). Networks of these cells form the basis of information processing.
8.2 Neurons
• Biological neurons
The brain has ~100 billion interconnected neurons, each connecting to ~1000 neighbouring neurons.
8.2 Neurons
• Biological neurons
The brain has 60-100 trillion synapses.
[Figure: a synapse, showing the action potential, neurotransmitter release, protein molecules, and ion channels for sodium, potassium and chloride]
The axon’s action potential opens ion channels, admitting Ca2+, which leads to release of neurotransmitter; this causes ion-conducting channels on the postsynaptic membrane to open. Synapses can have either an excitatory (depolarising) or an inhibitory (hyperpolarising) effect on postsynaptic neurons.
8.2 Neurons
• Biological neurons
8.2 Neurons
• Models of neurons
[Figure: model of a neuron. Input signals $x_1, x_2, \ldots, x_n$ are multiplied by synaptic weights $w_1, w_2, \ldots, w_n$, summed together with a bias $b$ by an adder, and passed through an activation function $\varphi(\cdot)$ to give the output.]

$$y = \varphi(\mathbf{x}^T\mathbf{w} + b)$$

[Plots: typical activation functions $\varphi(v)$, e.g. hard limit and sigmoid, ranging between 0 and 1]

Perceptron
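The model above is easy to express in code. Below is a minimal Python sketch (an illustration added here, not part of the original slides) of a single model neuron: a weighted sum plus bias, passed through an activation function. The function name and the AND-gate weights are choices made for this example.

```python
# A minimal sketch of the neuron model above: weighted sum plus bias,
# passed through an activation function (hard limit or sigmoid).
import numpy as np

def neuron(x, w, b, activation="step"):
    """Compute y = phi(x.w + b) for a single model neuron."""
    v = np.dot(x, w) + b                      # adder: induced local field
    if activation == "step":
        return 1.0 if v >= 0 else 0.0         # hard-limit activation
    return 1.0 / (1.0 + np.exp(-v))           # logistic sigmoid activation

# Example: a neuron with hand-set weights acting as a logical AND gate.
w = np.array([1.0, 1.0])
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", neuron(np.array(x, dtype=float), w, b))
```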
8.2 Neurons
• Models of neural networks
Multilayer Perceptron (MLP)
[Figure: multilayer perceptron with input layer $x_1, \ldots, x_n$, hidden layer $v_1, \ldots, v_{n_h}$, output layer $y_1, y_2, y_3$, and weights between successive layers]
8.2 Neurons
• Artificial neural networks: a history
McCulloch and Pitts 1943: Linear Threshold Gate (LTG)
Wiener 1948: “Cybernetics”
Hebb 1949: “Organisation of Behaviour”
Rosenblatt 1958: Perceptron
Minsky and Papert 1969: “Perceptrons”
Werbos 1974: Back-propagation algorithm
Hopfield 1982: Hopfield Recurrent Network
Kohonen 1982: Self-Organising Map
Rumelhart et al. 1986: Multilayer Perceptron
Hopfield net
Step 1: Assign connection weights
$$t_{ij} = \begin{cases} \sum_{s=0}^{M-1} x_i^s x_j^s, & i \neq j \\ 0, & i = j \end{cases} \qquad 0 \le i, j \le N-1$$
where $t_{ij}$ is the connection weight from node $i$ to node $j$, and $x_i^s$ (which can be +1 or -1) is element $i$ of the exemplar for class $s$.

Step 2: Initialise with unknown input pattern
$$\mu_i(0) = x_i, \qquad 0 \le i \le N-1$$
where $\mu_i(t)$ is the output of node $i$ at time $t$ and $x_i$ is element $i$ of the input pattern.

Step 3: Iterate until convergence
$$\mu_j(t+1) = f\left[\sum_{i=0}^{N-1} t_{ij}\,\mu_i(t)\right], \qquad 0 \le j \le N-1$$
where $f$ is an activation function such as the hard limit. The process is repeated until the node outputs remain unchanged with further iterations. The node outputs then represent the exemplar pattern that best matches the input.

Step 4: Repeat by going to Step 2
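As an illustration of the four steps above, here is a minimal Python sketch of a Hopfield net with bipolar exemplars. The helper names and the 8-element example patterns are assumptions made for this sketch.

```python
# A minimal sketch of the Hopfield procedure above, assuming bipolar (+1/-1)
# exemplars and repeated synchronous update passes.
import numpy as np

def hopfield_weights(exemplars):
    """Step 1: t_ij = sum_s x_i^s x_j^s for i != j, t_ii = 0."""
    X = np.asarray(exemplars, dtype=float)     # shape (M, N)
    T = X.T @ X
    np.fill_diagonal(T, 0.0)
    return T

def hopfield_recall(T, x, max_iters=100):
    """Steps 2-3: start from the unknown pattern, iterate until stable."""
    mu = np.asarray(x, dtype=float)
    for _ in range(max_iters):
        nxt = np.where(T @ mu >= 0, 1.0, -1.0)  # hard-limit activation f
        if np.array_equal(nxt, mu):             # outputs unchanged: converged
            return nxt
        mu = nxt
    return mu

# Example: store two 8-element exemplars, recall from a corrupted version.
a = np.array([1, -1, 1, -1, 1, -1, 1, -1])
b = np.array([1, 1, 1, 1, -1, -1, -1, -1])
T = hopfield_weights([a, b])
noisy = a.copy(); noisy[0] = -1                 # flip one element
print(hopfield_recall(T, noisy))                # recovers exemplar a
```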
8.3 Learning paradigms
• Learning
The process which leads to the modification of behaviour _Oxford Dictionary
“Some growth process or metabolic change takes place” _ Donald Hebb
Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. _Simon Haykin
8.3 Learning paradigms
• Learning
Learning, in neural network terms, is the process of modification of connection weights.
• Supervised Learning
–Error-correction learning
• Reinforcement learning
• Unsupervised Learning
– Hebbian learning
– Competitive learning
– Self-organisation
Supervised (error-correction): minimise $\|d - y(\mathbf{x}, \mathbf{w})\|^2$; $\Delta\mathbf{w} = \eta\, e(d, y, \mathbf{x})$
Reinforcement: $\Delta\mathbf{w} \propto$ reward; reward $= \max_a \{r(s, a)\}$
Hebbian: $\Delta\mathbf{w} = \eta\, \mathbf{x}\, y$
Competitive: $\Delta\mathbf{w}_i = f(\mathbf{x}, y, i)$ if neuron $i$ wins, 0 otherwise
Self-organisation: $\Delta\mathbf{w}_i = \eta\, \mathbf{x}\, y_i$ if neuron $i$ is close to the winner, 0 otherwise
Global order can arise from local interactions (Turing 1952)
8.4 Network architectures
• Feedforward
[Figure: feed-forward network with input layer $x_1, \ldots, x_n$, hidden layer $v_1, \ldots, v_{n_h}$, output layer $y_1, y_2, y_3$, and weights between successive layers]
• Feed-forward networks
– Perceptron and multilayer perceptron
– Radial basis function
– Support vector machine
• Recurrent networks
– Hopfield networks
– Boltzmann machine
[Figure: recurrent network with inputs $x_1, \ldots, x_n$, units $v_1, \ldots, v_{n_h}$, outputs $y_1, \ldots, y_n$, and feedback connections through the weights]
8.5 Feedforward networks
• Perceptron learning
Least mean square method:
$$y(t) = \mathbf{x}^T(t)\mathbf{w}(t) = \sum_{k=0}^{n} x_k(t)\, w_k(t)$$

Cost function: $J = E\left[\frac{1}{2}e^2(t)\right]$, approximated by the instantaneous value $J(t) = \frac{1}{2}e^2(t)$, with error $e(t) = d(t) - y(t)$ (without thresholding or activation function).

$$\frac{\partial J(t)}{\partial \mathbf{w}} = e(t)\frac{\partial e(t)}{\partial \mathbf{w}} = -e(t)\,\mathbf{x}(t)$$

$$\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\frac{\partial J(t)}{\partial \mathbf{w}} = \mathbf{w}(t) + \eta\, e(t)\,\mathbf{x}(t) = \mathbf{w}(t) + \eta\,[d(t) - \mathbf{x}^T(t)\mathbf{w}(t)]\,\mathbf{x}(t)$$

[Figure: perceptron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $b$ and hard-limit output $y = \mathrm{sign}(\mathbf{x}^T\mathbf{w} + b) \in \{+1, -1\}$]
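The update rule derived above can be exercised in a few lines of Python. The following sketch is illustrative only: the toy data, the 'true' weight vector and the learning rate are arbitrary choices, and the bias is folded into the weight vector as a constant first input.

```python
# A minimal sketch of the LMS rule derived above:
# w(t+1) = w(t) + eta * (d - x'w) * x, with the bias folded in as w_0.
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: labels from a chosen weight vector.
X = rng.uniform(-1, 1, size=(200, 2))
X = np.hstack([np.ones((200, 1)), X])          # prepend constant 1 for the bias
d = np.sign(X @ np.array([-0.2, 1.0, 1.0]))    # desired outputs (+1/-1)

w = np.zeros(3)
eta = 0.05
for epoch in range(50):
    for x_t, d_t in zip(X, d):
        e = d_t - x_t @ w                      # error without activation (LMS)
        w += eta * e * x_t                     # gradient-descent weight update

print("learned weights:", w)
print("training accuracy:", np.mean(np.sign(X @ w) == d))
```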
8.5 Feedforward networks
• Perceptron limitations
[Figure: perceptron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $b$ and hard-limit output $y = \mathrm{sign}(\mathbf{x}^T\mathbf{w} + b) \in \{+1, -1\}$]
• Linear separation only
• Sensitive to initial states
• Local minima
[Figure: a cost function over $x$, with a starting point whose descent leads to a local minimum rather than the global minimum]
8.5 Feedforward networks
• Multilayer perceptron: back-propagation algorithm

Step 1: Initialisation. Set all weights and node biases to small random numbers.

Step 2: Presentation of training examples. Present input $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$ and desired output $\mathbf{d} = [d_1, d_2, \ldots, d_m]^T$.

Step 3: Forward computation. Calculate the outputs of the hidden and output layers,
$$v_j = \varphi\left(\sum_{i=1}^{n} w_{ij}^h x_i + b_j^h\right), \qquad y_k = \varphi\left(\sum_{j=1}^{n_h} w_{jk}^o v_j + b_k^o\right)$$

Step 4: Backward computation and weight update. Compute the error terms (for sigmoid units),
$$\delta_k^o = e_k\,\varphi' = e_k\, y_k (1 - y_k) = (d_k - y_k)\, y_k (1 - y_k)$$
$$\delta_j^h = \varphi' \sum_{k=1}^{m} \delta_k^o\, w_{jk}^o = v_j (1 - v_j) \sum_{k=1}^{m} \delta_k^o\, w_{jk}^o$$
and update the weights,
$$w_{jk}^o \leftarrow w_{jk}^o + \eta\, \delta_k^o\, v_j, \qquad w_{ij}^h \leftarrow w_{ij}^h + \eta\, \delta_j^h\, x_i$$

Step 5: Iteration. Repeat steps 3 and 4, and stop when the total error reaches the required level.
[Figure: multilayer perceptron with input layer $x_1, \ldots, x_n$, hidden layer $v_1, \ldots, v_{n_h}$, output layer $y_1, y_2, y_3$, and weights between successive layers]
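A minimal Python sketch of these five steps follows, assuming logistic-sigmoid units in both layers (so the derivative terms $y(1-y)$ and $v(1-v)$ above apply) and the XOR problem as training data. The layer sizes, learning rate and epoch count are illustrative choices.

```python
# A minimal sketch of the back-propagation steps above for one hidden layer.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# XOR: the classic problem a single-layer perceptron cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

# Step 1: initialise weights and biases to small random numbers.
Wh = rng.normal(0, 0.5, (2, 4)); bh = np.zeros(4)    # input -> hidden
Wo = rng.normal(0, 0.5, (4, 1)); bo = np.zeros(1)    # hidden -> output
eta = 0.5

for epoch in range(10000):
    # Steps 2-3: present the examples and compute the forward pass.
    V = sigmoid(X @ Wh + bh)                   # hidden outputs v_j
    Y = sigmoid(V @ Wo + bo)                   # network outputs y_k
    # Step 4: backward computation of error terms and weight updates.
    delta_o = (D - Y) * Y * (1 - Y)            # output-layer deltas
    delta_h = (delta_o @ Wo.T) * V * (1 - V)   # hidden-layer deltas
    Wo += eta * V.T @ delta_o; bo += eta * delta_o.sum(axis=0)
    Wh += eta * X.T @ delta_h; bh += eta * delta_h.sum(axis=0)

print(Y.round(2).ravel())                      # should approach [0, 1, 1, 0]
```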
8.5 Feedforward networks
• Radial basis functions
Properties:
• Hidden neurons are Gaussian
• Output neurons are linear, similar to a single-layer perceptron
• Easy to train: hidden nodes can be chosen first by clustering; the output layer is then trained as a single-layer perceptron
• Fast to converge
[Figure: RBF network with input layer $x_1, \ldots, x_n$, Gaussian hidden layer $v_1, \ldots, v_{n_h}$ (input weights fixed to 1), and linear output layer $y_1, y_2, y_3$]

Gaussian hidden unit:
$$\varphi_k(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{m}_k\|^2}{2\sigma_k^2}\right)$$
8.5 Feedforward networks
• Radial basis functions
The RBF model rests on two fundamental theories:

The Universal Approximation Theory: any function can be approximated with arbitrary precision by a weighted sum of a set of non-constant, bounded and monotone-increasing continuous functions,
$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{M} w_i\, \varphi_i(\mathbf{x})$$

Cover’s Separability Theory: a complex pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space.
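The two-stage training described above (clustering for the hidden layer, then linear training of the output layer) might look as follows in Python. This is an illustrative sketch: the k-means routine, the width sigma and the toy sine-regression data are all choices made here, and the output weights are obtained in closed form by least squares rather than by iterative perceptron training.

```python
# A minimal RBF-network sketch: Gaussian hidden units with centres chosen
# by k-means clustering, then a linear output layer solved by least squares.
import numpy as np

rng = np.random.default_rng(2)

def kmeans(X, k, iters=20):
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def design_matrix(X, centres, sigma):
    d2 = ((X[:, None] - centres[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))      # Gaussian hidden-unit outputs

# Toy regression: noisy samples of a sine curve.
X = rng.uniform(0, 2 * np.pi, (100, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=100)

centres = kmeans(X, k=10)
Phi = design_matrix(X, centres, sigma=0.5)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # train linear output layer

print("training RMSE:", np.sqrt(np.mean((Phi @ w - y) ** 2)))
```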
8.6 Self-organising networks
• Hebbian learning
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A’s efficiency as one of the cells firing B, is increased.
In mathematical terms:
$$\Delta\mathbf{w} = \eta\, \mathbf{x}\, y$$

Oja’s rule (Hebbian learning with weight normalisation):
$$w_i(t+1) = \frac{w_i(t) + \eta\, y(t)\, x_i(t)}{\left\{\sum_{j=1}^{n} [w_j(t) + \eta\, y(t)\, x_j(t)]^2\right\}^{1/2}} \approx w_i(t) + \eta\, y(t)\,[x_i(t) - y(t)\, w_i(t)]$$
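A short Python sketch of the approximate form of Oja's rule above; the data distribution, learning rate and seed are arbitrary choices. The weight vector should converge towards the first principal direction of the inputs, which is the known behaviour of Oja's rule.

```python
# A minimal sketch of Oja's rule: a Hebbian term minus a decay term,
# converging to the first principal direction of the input data.
import numpy as np

rng = np.random.default_rng(3)

# Zero-mean 2-D data stretched along the direction (1, 1).
A = np.array([[2.0, 1.5], [1.5, 2.0]])
X = rng.normal(size=(5000, 2)) @ A

w = rng.normal(size=2)
eta = 0.01
for x in X:
    y = x @ w                     # neuron output
    w += eta * y * (x - y * w)    # Oja's rule: Hebbian term minus decay

w /= np.linalg.norm(w)
print("learned direction:", w)    # approx +-(0.707, 0.707), the top eigenvector
```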
8.6 Self-organising networks
• Hebbian learning
Train a perceptron for an AND function, $y = b + \sum_i x_i w_i$, using the Hebbian rule $\Delta w_i = x_i y$ (learning rate 1):

Input (x1, x2, 1)   Output y   Changes (Δw1, Δw2, Δb)   Weights (w1, w2, b)
(initial)                                               (0, 0, 0)
 1,  1, 1            1          1,  1,  1               (1, 1, 1)
 1, -1, 1           -1         -1,  1, -1               (0, 2, 0)
-1,  1, 1           -1          1, -1, -1               (1, 1, -1)
-1, -1, 1           -1          1,  1, -1               (2, 2, -2)

[Figure: the resulting perceptron and its decision boundary $x_1 + x_2 = 1$, separating the positive example (1, 1) from the three negative examples]
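The table above can be reproduced directly. The sketch below (illustrative Python) applies the Hebbian update once per training pair, prints the running weights, and then verifies the learned boundary.

```python
# A minimal sketch reproducing the Hebbian training table above for the
# bipolar AND function: after one pass, w = (2, 2), b = -2.
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
d = np.array([1, -1, -1, -1], dtype=float)    # bipolar AND targets

w = np.zeros(2); b = 0.0
for x, y in zip(X, d):
    w += x * y                                # Hebbian update: dw = x * y
    b += y                                    # bias input is the constant 1
    print(f"x={x}, y={y:+.0f} -> w={w}, b={b:+.0f}")

# Check the learned boundary x1 + x2 = 1 classifies AND correctly.
print(np.sign(X @ w + b))                     # [ 1, -1, -1, -1]
```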
8.6 Self-organising networks
• Von der Malsburg and Willshaw model (1973,1976)
Activity of postsynaptic cell $i$, with short-range excitatory and long-range inhibitory lateral connections:
$$\frac{\partial y_i(t)}{\partial t} = -c_i\, y_i(t) + \sum_j w_{ij}(t)\, x_j(t) + \sum_{k'} e_{ik'}\, y_{k'}^*(t) - \sum_{k''} b_{ik''}\, y_{k''}^*(t)$$

Output after thresholding:
$$y_j^*(t) = \begin{cases} y_j(t), & \text{if } y_j(t) \ge \theta \\ 0, & \text{otherwise} \end{cases}$$

Hebbian weight update, subject to normalisation ($\sum_j w_{ij}$ kept constant):
$$\frac{\partial w_{ij}(t)}{\partial t} = \eta\, x_j(t)\, y_i^*(t)$$
8.6 Self-organising networks
• Kohonen's SOM
Input vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$.

Winner selection:
$$\|\mathbf{x}(t) - \mathbf{w}_v(t)\| = \min_j \|\mathbf{x}(t) - \mathbf{w}_j(t)\|$$

Bubble neighbourhood output:
$$y_j(t+1) = \begin{cases} 1, & \text{if neuron } j \text{ is inside the bubble} \\ 0, & \text{otherwise} \end{cases}$$

Weight adaptation:
$$\frac{d w_{ij}(t)}{dt} = \begin{cases} \alpha(t)\,[x_i(t) - w_{ij}(t)], & \text{if } j \in N_v(t) \\ 0, & \text{if } j \notin N_v(t) \end{cases}$$

or, in discrete form with a neighbourhood function $h_{vj}$:
$$\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \alpha(t)\, h_{vj}\, [\mathbf{x}(t) - \mathbf{w}_j(t)]$$
8.6 Self-organising networks
• SOM algorithm
• At each time $t$, present an input $\mathbf{x}(t)$ and select the winner:
$$v = \arg\min_c \|\mathbf{x}(t) - \mathbf{w}_c(t)\|$$
• Update the weights of the winner and its neighbours:
$$\mathbf{w}_k(t+1) = \mathbf{w}_k(t) + \alpha(t)\, h(v, k, t)\, [\mathbf{x}(t) - \mathbf{w}_k(t)]$$
• Repeat until the map converges.

Typical neighbourhood function:
$$h(v, k, t) = \exp\left[-\frac{\|\mathbf{r}_v - \mathbf{r}_k\|^2}{2\sigma^2(t)}\right]$$
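Putting the three steps together, a compact Python sketch of SOM training on a 10x10 map follows. The linear decay schedules for alpha(t) and sigma(t) and the unit-square training data are illustrative choices, not prescribed by the slides.

```python
# A minimal sketch of the SOM algorithm above: winner selection by minimum
# distance, then a Gaussian-neighbourhood update shrinking over time.
import numpy as np

rng = np.random.default_rng(4)

# A 10x10 map of 2-D weight vectors, trained on points from the unit square.
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
W = rng.uniform(size=(100, 2))                 # weight vector per map node
X = rng.uniform(size=(5000, 2))                # training inputs

T = len(X)
for t, x in enumerate(X):
    alpha = 0.5 * (1 - t / T)                  # decaying learning rate
    sigma = 3.0 * (1 - t / T) + 0.5            # decaying neighbourhood width
    v = np.argmin(((W - x) ** 2).sum(axis=1))  # winner: nearest weight vector
    d2 = ((grid - grid[v]) ** 2).sum(axis=1)   # map distance to the winner
    h = np.exp(-d2 / (2 * sigma ** 2))         # Gaussian neighbourhood function
    W += alpha * h[:, None] * (x - W)          # update winner and neighbours

print("weight range:", W.min(axis=0), W.max(axis=0))  # spread over the square
```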
8.6 Self-organising networks
• SOM clustering
[Figure: SOM of animals on a 10x10 grid. Birds (dove, hen, duck, goose, owl, hawk, eagle), predators (fox, dog, wolf, cat, tiger, lion) and grazers (horse, zebra, cow) occupy distinct, topologically ordered regions of the map.]
8.6 Self-organising networks
• SOM: data visualisation/management
8.6 Self-organising networks
• SOM
Quantisation & topology
Topologically “ordered” map
“Error tolerant” coding: HVQ (Luttrell, NC 1994)
“Minimum wiring” (Mitchison, NC 1995; Durbin & Mitchison, Nature 1990)
8.6 Self-organising networks
• SOM extensions
° HSOM (Miikkulainen 1990), DISLEX (1990, 1997)
° PSOM (Ritter 1993), Hyperbolic SOM (1999), H2SOM
° Temporal Kohonen Map (Chappell & Taylor 1993)
° Neural Gas (Martinetz et al. 1991), Growing Grid (Fritzke 1995)
° ASSOM (Kohonen 1997)
° Recurrent SOM (Koskela 1997), Recursive SOM (Voegtlin 2001)
° SOAR (Lampinen & Oja 1989), SOMAR (Ni & Yin 2007)
° Bayesian SOM & SOMN (Yin & Allinson 1995, 1997; Utsugi 1997)
° GTM (Bishop et al. 1998)
° GHSOM (Merkl et al. 2000), TOC (TreeSOM) (Freeman & Yin 2004)
° PicSOM (Laaksonen, Oja, et al. 2000)
° ViSOM (Yin 2001, 2002), gViSOM (Yin 2007)
8.7 Learning & generalisation
• Learning (fitting) & generalisation
[Figure: two networks fitted to the same training data (X vs Y): a smooth fit, and a wiggly fit that passes through every point]

On one hand, we want as high a classification rate (or as low an error) as possible on the training data set.
On the other hand, we would also like the trained network to perform well on unseen (testing) data.
Which is better?
8.7 Learning & generalisation
• Bias & variance dilemma (finite data set D)
Average (squared) error:
$$\frac{1}{2}\int [y(x) - d(x)]^2\, p(x)\, dx$$

Decompose the error by adding and subtracting $E\{y(x)\}$:
$$[y(x) - d(x)]^2 = [y(x) - E\{y(x)\} + E\{y(x)\} - d(x)]^2$$
$$= [E\{y(x)\} - d(x)]^2 + [y(x) - E\{y(x)\}]^2 + 2\,[y(x) - E\{y(x)\}][E\{y(x)\} - d(x)]$$

Taking the expectation over data sets $D$ (the cross term vanishes):
$$E_D\{[y(x) - d(x)]^2\} = \underbrace{[E_D\{y(x)\} - d(x)]^2}_{\text{(bias)}^2} + \underbrace{E_D\{[y(x) - E_D\{y(x)\}]^2\}}_{\text{(variance)}}$$
A close fitter/model yields small bias but large variance, while a loose fitter (or a wild guess) gives small variance but large bias.
8.7 Learning & generalisation
• Bias & variance dilemma (finite data set D)
g(x): generating curve; dots: sample data (* courtesy of http://www.aiaccess.net/)

For $h_1(x)$, a rigid fit (e.g. a straight line): the variance
$$E_D\{[y(x) - E_D\{y(x)\}]^2\} \approx 0$$
since $E_D\{y(x)\} \approx h_1(x)$ whatever the sample, but the bias $[E_D\{y(x)\} - d(x)]^2$ is large because $h_1(x) \ne g(x)$. Low variance, but large bias (error)!

For $h_2(x)$, a close fit through the sample points: the bias
$$[E_D\{y(x)\} - d(x)]^2 \approx 0$$
since on average $E_D\{y(x)\} \approx g(x)$, but the variance is large. That is, different samples give different $h_2(x)$, thus large variance! Or, if for Sample 2 one still uses the $h_2(x)$ obtained from Sample 1, the errors are large. Low bias but large variance! Poor generalisation!
8.7 Learning & generalisation
• Bias & variance dilemma (finite data set D)

[Figure: polynomial fits of order 0, 1, 3 and 9 to noisy samples (dots) of a sin(2πx) generating curve. The 3rd-order polynomial is the best representation; the 9th-order fit passes through every sample and over-fits.]
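The polynomial example above is easy to reproduce numerically. This Python sketch (illustrative; the noise level and sample size are assumptions made here) fits polynomials of order 0, 1, 3 and 9 and compares training and test errors, exposing the over-fitting of the 9th-order fit.

```python
# A minimal sketch of the over-fitting illustration above: polynomial fits of
# increasing order to noisy samples of a sin(2*pi*x) generating curve.
import numpy as np

rng = np.random.default_rng(5)

x = rng.uniform(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=10)   # noisy samples
x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)

for order in (0, 1, 3, 9):
    coeffs = np.polyfit(x, y, order)           # least-squares polynomial fit
    train_err = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    test_err = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_true) ** 2))
    print(f"order {order}: train RMSE {train_err:.3f}, test RMSE {test_err:.3f}")
# The 9th-order fit drives the training error to ~0 but has the worst test error.
```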
8.8 Genetic algorithms
• Historical point of view
- Evolutionary computation:
A computational and optimisation methodology based on genetic and evolutionary mechanisms, proposed by John Holland in the 1960s.
8.8 Genetic algorithms
• Genetic view
The human body consists of thousands of millions of cells. Each cell’s nucleus contains 46 chromosomes: 23 from each parent, making 23 chromosome pairs in each cell of the offspring.
The chromosomes consist of DNA (deoxyribonucleic acid), a double helix made of sugar (deoxyribose), phosphate and four nitrogen bases: A (adenine), T (thymine), C (cytosine) and G (guanine). The bases A and T, and the bases C and G, always pair with each other.
Our DNA consists of 70-100 thousand genes. A gene is a segment of DNA that codes for the production of a specific protein by spelling out the sequence of amino acids composing that protein. It is also the genes that give every person their particular looks. The genes we inherit from our parents decide which sex we will be, the colour of our skin and eyes, and whether we will be tall or short; and if a gene in our body is defective, we can become ill.
8.8 Genetic algorithms
• Genetic view
23 paired chromosomes of the human (male):
22 pairs of autosomes
Sex chromosomes: 1 X and 1 Y (male), or 1 pair of X (female)
A trait (allele) occupies a locus on a chromosome.
8.8 Genetic algorithms
• Genetic algorithm (GA)
Population: a set of possible solutions (chromosomes).
Initial population: can be randomly chosen.
Fitness: a fitness measure for a solution (chromosome) needs to be defined.
Encoding: each solution (chromosome) is encoded, usually as a binary sequence (a number of binary bits).
8.8 Genetic algorithms
• GA operations
Selection: select a pair of chromosomes from the population, with probabilities in proportion to their fitness.
Crossover: randomly choose a locus and exchange the bits after that locus between the two chosen chromosomes to create two child chromosomes. For example, if the chosen parent strings are 11000100 and 01111110 and the locus is chosen at the third bit, then the child strings are 11011110 and 01100100.
Mutation: randomly (with a small probability) flip some of the bits in the chromosomes. For example, the string 00011000 might be mutated in its fifth position to yield 00010000.
8.8 Genetic algorithms
• Selection: Roulette wheel
8.8 Genetic algorithms
• Crossover (one-point, after the fourth bit):
11001011 + 11011111 = 11001111
• Mutation (bits flipped at random positions):
11001111 => 10001011
8.8 Genetic algorithms
• Example
Find the maximum point of
f(y) = y + |sin(32y)|, 0 ≤ y < π
Using 16-bit encoding for solutions in [0, π)
Population size: 200
Initial population: randomly chosen from the search space
Fitness: f(y)
Crossover rate: 0.5
Mutation rate: 0.005
run matlab code: GA1.m
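Alongside the MATLAB script GA1.m referenced above (not reproduced here), an equivalent Python sketch of the example might look as follows. The generation count is an arbitrary choice, and the search interval [0, π) follows the reconstruction above.

```python
# A minimal GA sketch for the example above: 16-bit encoding of y in [0, pi),
# roulette-wheel selection, one-point crossover and bit-flip mutation,
# maximising f(y) = y + |sin(32y)|.
import numpy as np

rng = np.random.default_rng(6)

BITS, POP, GENS = 16, 200, 100
P_CROSS, P_MUT = 0.5, 0.005

def decode(pop):
    """Map each 16-bit row to a real number y in [0, pi)."""
    weights = 2.0 ** np.arange(BITS - 1, -1, -1)
    return (pop @ weights) / 2 ** BITS * np.pi

def fitness(y):
    return y + np.abs(np.sin(32 * y))

pop = rng.integers(0, 2, size=(POP, BITS))     # random initial population
for gen in range(GENS):
    f = fitness(decode(pop))
    # Roulette-wheel selection: sample parents proportionally to fitness.
    probs = f / f.sum()
    parents = pop[rng.choice(POP, size=POP, p=probs)]
    # One-point crossover on consecutive pairs, with probability P_CROSS.
    for i in range(0, POP - 1, 2):
        if rng.random() < P_CROSS:
            cut = rng.integers(1, BITS)
            parents[i, cut:], parents[i + 1, cut:] = (
                parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # Mutation: flip each bit with probability P_MUT.
    flip = rng.random(parents.shape) < P_MUT
    pop = np.where(flip, 1 - parents, parents)

y = decode(pop)
best = np.argmax(fitness(y))
print(f"best y = {y[best]:.4f}, f(y) = {fitness(y[best]):.4f}")
```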
Part 8: Neural networks (Summary)
• Introduction & cognition
• Artificial intelligence
• Retina & visual pathway
• Neurons & models
• Multilayer perceptrons
• Self-organising networks
• Learning & generalisation
• Genetic algorithms