The Centre for Technology enabled Teaching & Learning, N Y S S, India
DTEL (Department for Technology Enhanced Learning)
Teaching Innovation - Entrepreneurial - Global

DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION ENGINEERING
VII SEMESTER
FE2: SOFT COMPUTING (ET411)

UNIT NO. 2: NEURAL NETWORK

UNIT 1 - SYLLABUS
1. Introduction to neural networks, learning methods
2. Perceptron training algorithm, single layer perceptron
3. Multilayer perceptron
4. Neural network architectures
5. ADALINE, MADALINE

CHAPTER 1 - SPECIFIC OBJECTIVES / COURSE OUTCOMES
The student will be able to:
1. Identify and describe learning rules.
2. Apply supervised neural networks to pattern classification.

Techniques in soft computing
• Neural Networks
• Fuzzy Logic
• Genetic Algorithm
• Hybrid Systems

Definition of NN
According to Haykin (1994):
• A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
  – Knowledge is acquired by the network through a learning process.
  – Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.

What is a Neural Network?
• A complex biological NN is a highly interconnected set of neurons (on the order of 10^10 in the brain) that facilitates our reading, breathing, and so on.
• Each neuron
  – is a rich assembly of tissue and chemistry
  – has the complexity (if not the speed) of a microprocessor
• How NNs operate
  – neural functions are stored in the neurons and their connections
  – learning is the establishment of new connections and the modification of existing ones

Biological Neuron
• The brain is a highly complex, nonlinear, parallel information-processing system.
• It performs tasks like pattern recognition, perception, and motor control many times faster than the fastest digital computers.
• The purpose of a neuron is to transmit information:
  – it accepts many inputs, which are all added up in some way
  – if enough active inputs are received at once, the neuron is activated and fires; if not, it remains in its inactive state

Structure of Neuron
• Body (soma) – contains the nucleus, which holds the chromosomes
• Dendrites
• Axon
• Synapse – a narrow gap that couples the axon with the dendrite of another cell
  – there is no direct linkage across the junction; the coupling is chemical
  – information is passed from one neuron to another through synapses
[Figure from Elements of Artificial Neural Networks, by K. Mehrotra (MIT Press/CogNet)]

Operation of a biological neuron
• Signals are transmitted between neurons by electrical pulses (action potentials, or spikes) traveling along the axon.
• When the potential at the synapse is raised sufficiently by the action potential, the synapse releases chemicals called neurotransmitters.
  – It may take the arrival of more than one action potential before the synapse is triggered.

ARTIFICIAL NEURAL NET
• An information-processing system.
• Neurons process the information.
• The signals are transmitted by means of connection links.
• The links possess an associated weight.
• The output signal is obtained by applying an activation function to the net input.

ARTIFICIAL NEURAL NET
[Figure: input neurons X1 and X2 connected to output neuron Y through weights W1 and W2]
The figure shows a simple artificial neural net with two input neurons (X1, X2) and one output neuron (Y). The interconnection weights are given by W1 and W2.
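
To make the picture concrete, here is a minimal Python sketch of this two-input net (illustrative only; the weight and threshold values are assumptions, not from the slides):

# Minimal sketch of the two-input net above; the values are illustrative assumptions.
def net_input(x1, x2, w1, w2):
    # Net input is the weighted sum of the inputs.
    return x1 * w1 + x2 * w2

def step(y_in, theta):
    # Hard-limit activation: fire (1) once the net input reaches the threshold.
    return 1 if y_in >= theta else 0

y = step(net_input(1, 1, w1=0.6, w2=0.6), theta=1.0)  # -> 1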

Association of biological net with artificial net
[Figure: a biological neuron (dendrites, cell body, axon) aligned with an artificial neuron (summation, threshold, output)]

Biological Neuron    Artificial Neuron
Cell                 Neuron
Dendrites            Weights or interconnections
Soma                 Net input
Axon                 Output

Processing of an artificial net
The neuron is the basic information-processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights W1, W2, ..., Wm.
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1..m} Wj Xj
3. An activation function for limiting the amplitude of the neuron output:
   y = φ(u + b)
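
The three parts map directly onto a few lines of code; the sketch below assumes a generic activation φ passed in as a function:

# Sketch of the generic neuron above: adder plus activation, with bias b.
def neuron(x, w, b, phi):
    u = sum(wj * xj for wj, xj in zip(w, x))  # adder: u = sum_j Wj * Xj
    return phi(u + b)                          # activation limits the output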

SALIENT FEATURES OF ANN
• Adaptive learning
• Self-organization
• Real-time operation
• Massive parallelism
• Learning and generalizing ability

BIAS OF AN ARTIFICIAL NEURON
The bias value is added to the weighted sum Σ wi xi so that the decision boundary can be shifted away from the origin:
   Y_in = Σ wi xi + b, where b is the bias
[Figure: the parallel lines x1 - x2 = 1, x1 - x2 = 0, and x1 - x2 = -1 in the (x1, x2) plane, showing how the bias shifts the boundary]

BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
• Network Architecture (connections between neurons)
• Setting the Weights (training)
• Activation Function

[Figure from Elements of Artificial Neural Networks, by K. Mehrotra (MIT Press/CogNet)]

LAYER PROPERTIES
• Input Layer: each input unit may be designated by an attribute value possessed by the instance.
• Hidden Layer: not directly observable; provides nonlinearities for the network.
• Output Layer: encodes possible values.

MULTI LAYER ARTIFICIAL NEURAL NET
• INPUT: records without the class attribute, with normalized attribute values.
• INPUT VECTOR: X = {x1, x2, ..., xn}, where n is the number of (non-class) attributes.
• INPUT LAYER: there are as many nodes as non-class attributes, i.e. as the length of the input vector.
• HIDDEN LAYER: the number of nodes in the hidden layer and the number of hidden layers depend on the implementation.

TRAINING PROCESS
• Supervised Training - provide the network with a series of sample inputs and compare the output with the expected responses.
• Unsupervised Training - the most similar input vectors are assigned to the same output unit.
• Reinforcement Training - the right answer is not provided, but an indication of whether the answer is 'right' or 'wrong' is.

ACTIVATION FUNCTION
The activation level may be discrete or continuous.

Hard-limit functions (discrete):
• Binary activation function
• Bipolar activation function
• Identity function

Sigmoidal activation functions (continuous):
• Binary sigmoidal activation function
• Bipolar sigmoidal activation function

ACTIVATION FUNCTION
Activation functions: (A) Identity, (B) Binary step, (C) Bipolar step, (D) Binary sigmoidal, (E) Bipolar sigmoidal, (F) Ramp.
[Figure from Principles of Soft Computing, by S. N. Sivanandam & S. N. Deepa]

Activation function
• Binary step:
  φ(y_in) = 1 if y_in > 0; 0 if y_in ≤ 0
• Bipolar step:
  φ(y_in) = 1 if y_in > 0; -1 if y_in ≤ 0
• Binary sigmoidal:
  φ(y_in) = 1 / (1 + e^(-α·y_in))
• Bipolar sigmoidal:
  φ(y_in) = (1 - e^(-α·y_in)) / (1 + e^(-α·y_in))
• Ramp:
  φ(y_in) = 1 if y_in > 1; y_in if 0 ≤ y_in ≤ 1; 0 if y_in < 0
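
For reference, a direct Python transcription of the five functions above (α is the sigmoid steepness; defaulting it to 1.0 is an assumption):

import math

def binary_step(y_in):
    return 1 if y_in > 0 else 0

def bipolar_step(y_in):
    return 1 if y_in > 0 else -1

def binary_sigmoid(y_in, alpha=1.0):
    return 1.0 / (1.0 + math.exp(-alpha * y_in))

def bipolar_sigmoid(y_in, alpha=1.0):
    e = math.exp(-alpha * y_in)
    return (1.0 - e) / (1.0 + e)

def ramp(y_in):
    # 1 above 1, identity on [0, 1], 0 below 0
    return max(0.0, min(1.0, y_in))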

PROBLEM SOLVING
1. Select a suitable NN model based on the nature of the problem.
2. Construct a NN according to the characteristics of the application domain.
3. Train the neural network with the learning procedure of the selected model.
4. Use the trained network to make inferences or solve problems.

NEURAL NETWORKS
• A neural network learns by adjusting the weights so as to correctly classify the training data and hence, after the testing phase, to classify unknown data.
• A neural network needs a long time for training.
• A neural network has a high tolerance to noisy and incomplete data.

Operation of a neural net
[Figure: input vector x = (x0, x1, ..., xn) enters through weight vector w = (w0j, w1j, ..., wnj); the weighted sum Σ feeds the activation function f, which produces the output y]

McCULLOCH-PITTS NEURON
• Neurons are randomly connected.
• Each neuron has a fixed threshold.
• It takes one 'time step' to pass a signal over one connection link.
• The firing state (activation) is binary (1 = firing, 0 = not firing).
• MP neurons are most widely used for logic functions.
• One inhibitory neuron connects to all other neurons; it functions to regulate network activity (prevents too many firings).
• Positive weights excite a neuron; negative weights inhibit it.

McCULLOCH-PITTS NEURON
[Figure: excitatory inputs X1, ..., Xn with weight +w and inhibitory inputs Xn+1, ..., Xn+m with weight -p feed the neuron Y; p > 0, -p inhibits, +w excites]

Activation function:
  f(y_in) = 1 if y_in ≥ θ; 0 if y_in < θ
where θ is the threshold.

McCULLOCH-PITTS NEURON
• AND logic using an MP neuron

x1  x2  y
 0   0  0
 0   1  0
 1   0  0
 1   1  1

[Figure: inputs X1 and X2, each with weight 1, feed Y; threshold θ = 2]
y_in = x1*1 + x2*1 = x1 + x2
y = f(y_in) = 1 if y_in ≥ θ, i.e. y_in ≥ 2
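
A one-function Python sketch of this MP AND neuron, checked against the truth table above:

# MP neuron computing AND: weights 1 and 1, threshold theta = 2.
def mp_and(x1, x2, theta=2):
    y_in = x1 * 1 + x2 * 1           # weighted sum of the excitatory inputs
    return 1 if y_in >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_and(x1, x2))  # reproduces the truth table above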

Training Algorithms for Single Layer NN
• Hebb - most fundamental
• Perceptron Learning Algorithm
• Delta Rule

Hebb Network
• Donald Hebb stated in 1949 that, in the brain, learning is performed by a change in the synaptic gap. Hebb explained it:
• "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

Hebb Network
• Hebb Learning (bipolar data, +1 or -1; we need training data (s:t) with training vector s and target t)
  1. Initialize the weights to 0: wi = 0.
  2. For each training vector and target pair si : ti (i = 1..n):
     – Set the activations of the input neurons: xi = si.
     – Set the activation of the output neuron: y = ti.
     – Adjust the weights: wi(new) = wi(old) + xi·y.
     – Adjust the bias: b(new) = b(old) + y.
[Figure: inputs X1, X2 and a bias unit (fixed input 1) connected to output Y through weights w1, w2 and bias b]
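
The worked AND example on the next slides can be reproduced with a few lines of Python (a sketch of the rule above, using the same bipolar data):

# Hebb learning on the bipolar AND data of the following slides.
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w1, w2, b = 0, 0, 0                        # initialize weights and bias to zero
for (x1, x2), t in samples:
    w1 += x1 * t                           # delta w1 = x1 * y, with y set to t
    w2 += x2 * t                           # delta w2 = x2 * y
    b  += t                                # delta b = y
    print((x1, x2), t, "->", (w1, w2, b))
# The printed rows match the slides: (1,1,1), (0,2,0), (1,1,-1), (2,2,-2).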

Hebb Learning Example (AND Logic Gate):

Input x1  Input x2  Bias b  Target y
    1         1        1        1
    1        -1        1       -1
   -1         1        1       -1
   -1        -1        1       -1

Initialize the weights to zero; calculate the change in weights and bias.
Recall: wi(new) = wi(old) + xi·y and b(new) = b(old) + y
So, define: ∆w1 = x1·y, ∆w2 = x2·y, ∆b = y

First training pair:
x1  x2  x0  y | ∆w1  ∆w2  ∆b | new w1  w2  b
 1   1   1  1 |  1    1    1 |    1     1  1

Since the initial weights are 0, wi(new) = wi(old) + xi·y reduces to wi(new) = xi·y.

CURRENT DECISION BOUNDARY
y = b + Σi xi·wi = 0 (recall that zero is the boundary)
0 = b + x1·w1 + x2·w2
Solve for x2: x2 = -(w1/w2)·x1 - b/w2
With the current weights: x2 = -x1 - 1
(x1 = -1 gives x2 = 0; x1 = 0 gives x2 = -1)
[Plot: the boundary line with the + pattern on one side and the three - patterns on the other]

Next training pair
Using: wi(new) = wi(old) + xi·y and b(new) = b(old) + y
And: ∆w1 = x1·y, ∆w2 = x2·y, ∆b = y

x1  x2  x0   y | ∆w1  ∆w2  ∆b | new w1  w2  b
 1  -1   1  -1 |  -1   1   -1 |    0     2  0

Since the previous weights are no longer 0, the full update wi(new) = wi(old) + xi·y applies.

CURRENT DECISION BOUNDARY
x2 = -(w1/w2)·x1 - b/w2
With the current weights: x2 = 0
[Plot: the horizontal boundary x2 = 0 with the + pattern on one side and the - patterns on the other]

Next training pair
Using: wi(new) = wi(old) + xi·y and b(new) = b(old) + y
And: ∆w1 = x1·y, ∆w2 = x2·y, ∆b = y

x1  x2  x0   y | ∆w1  ∆w2  ∆b | new w1  w2  b
-1   1   1  -1 |   1   -1  -1 |    1     1  -1

CURRENT DECISION BOUNDARY
x2 = -(w1/w2)·x1 - b/w2
With the current weights: x2 = -x1 + 1
[Plot: the boundary now separates the + pattern from the three - patterns]
The boundary is now in the correct position, but there is one more training pair to process.

FINAL DECISION BOUNDARY
x2 = -(w1/w2)·x1 - b/w2
With the final weights: x2 = -x1 + 1
[Plot: the final boundary separates the + pattern from the three - patterns]

Observations:
• Weights only change for active input neurons (xi ≠ 0).
• Hebb learning will not always find correct weights even if they exist.

Last training pair
Using: wi(new) = wi(old) + xi·y and b(new) = b(old) + y
And: ∆w1 = x1·y, ∆w2 = x2·y, ∆b = y

x1  x2  x0   y | ∆w1  ∆w2  ∆b | new w1  w2  b
-1  -1   1  -1 |   1    1  -1 |    2     2  -2

LINEAR SEPARABILITY
Linear separability is the concept wherein the separation of the input space into regions is based on whether the network response is positive or negative.
Consider a network having a positive response in the first quadrant and a negative response in all other quadrants (the AND function), with either binary or bipolar data; the decision line is drawn separating the positive response region from the negative response region.

• Decision region/boundary: for n = 2, b ≠ 0, θ = 0, the equation
    b + x1·w1 + x2·w2 = 0,  or  x2 = -(w1/w2)·x1 - b/w2,
  is a line, called the decision boundary, which partitions the plane into two decision regions.
• If a point/pattern (x1, x2) is in the positive region, then b + x1·w1 + x2·w2 ≥ 0 and the output is 1 (the pattern belongs to class one); otherwise b + x1·w1 + x2·w2 < 0 and the output is -1 (class two).
• n = 2, b = 0, θ ≠ 0 would result in a similar partition.

• If n = 3 (three input units), the decision boundary is a two-dimensional plane in a three-dimensional space.
• In general, a decision boundary is an (n-1)-dimensional hyperplane in an n-dimensional space, which partitions the space into two decision regions.
• This simple network can thus classify a given pattern into one of the two classes, provided one of these two classes is entirely in one decision region (one side of the decision boundary) and the other class is in the other region.
• The decision boundary is determined completely by the weights W and the bias b (or threshold θ).

LINEAR SEPARABILITY PROBLEM
• If two classes of patterns can be separated by a decision boundary, represented by the linear equation b + Σi xi·wi = 0, then they are said to be linearly separable, and the simple network can correctly classify any pattern.
• The decision boundary (i.e. W, b, or θ) of linearly separable classes can be determined either by some learning procedure or by solving linear equation systems based on representative patterns of each class.
• If such a decision boundary does not exist, then the two classes are said to be linearly inseparable.
• Linearly inseparable problems cannot be solved by the simple network; a more sophisticated architecture is needed.

• Examples of linearly separable classes

- Logical AND function (bipolar patterns):
  x1  x2   y        decision boundary:
  -1  -1  -1        w1 = 1, w2 = 1, b = -1, θ = 0
  -1   1  -1        -1 + x1 + x2 = 0
   1  -1  -1
   1   1   1
  [Plot: x marks class I (y = 1), o marks class II (y = -1); the line separates the single x from the three o's]

- Logical OR function (bipolar patterns):
  x1  x2   y        decision boundary:
  -1  -1  -1        w1 = 1, w2 = 1, b = 1, θ = 0
  -1   1   1        1 + x1 + x2 = 0
   1  -1   1
   1   1   1
  [Plot: the line separates the three x's from the single o]

• Examples of linearly inseparable classes

- Logical XOR (exclusive OR) function (bipolar patterns):
  x1  x2   y
  -1  -1  -1
  -1   1   1
   1  -1   1
   1   1  -1
  [Plot: x marks class I (y = 1), o marks class II (y = -1); no line separates them]

No line can separate these two classes, as can be seen from the fact that the following linear inequality system has no solution:
  (1)  b - w1 - w2 < 0
  (2)  b - w1 + w2 ≥ 0
  (3)  b + w1 - w2 ≥ 0
  (4)  b + w1 + w2 < 0
because (1) + (4) gives b < 0 while (2) + (3) gives b ≥ 0, which is a contradiction.

• XOR can be solved by a more complex network with hidden units.
[Figure: inputs x1, x2 feed hidden units z1, z2 through weights 2 and -2 (hidden threshold θ = 1); z1, z2 feed the output Y through weights 2 and 2 (output threshold θ = 0)]

  (x1, x2)     (z1, z2)     y
  (-1, -1)     (-1, -1)    -1
  (-1,  1)     (-1,  1)     1
  ( 1, -1)     ( 1, -1)     1
  ( 1,  1)     (-1, -1)    -1
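
A runnable sketch of this 2-2-1 net follows; the assignment of the ±2 weights to particular connections is one consistent choice (the slide's figure is ambiguous on which edge carries which sign), with bipolar step activations:

# XOR with two hidden units: weights +/-2 into z1, z2 (theta = 1),
# weights 2, 2 into Y (theta = 0), bipolar step activations.
def bipolar_step(y_in, theta):
    return 1 if y_in >= theta else -1

def xor_net(x1, x2):
    z1 = bipolar_step(2 * x1 - 2 * x2, theta=1)    # hidden unit z1
    z2 = bipolar_step(-2 * x1 + 2 * x2, theta=1)   # hidden unit z2
    return bipolar_step(2 * z1 + 2 * z2, theta=0)  # output unit Y

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, "->", xor_net(x1, x2))  # mixed-sign inputs give 1, else -1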

Perceptrons
• By Rosenblatt (1962)
  – For modeling visual perception (the retina).
  – Three layers of units: Sensory (S), Association (A), and Response (R).
  – Learning occurs only on the weights from A units to R units (the weights from S units to A units are fixed).
  – A single R unit receives inputs from n A units (the same architecture as our simple network).
  – For a given training sample s:t, change the weights only if the computed output y is different from the target output t (thus the learning is error-driven).

Perceptron Network
Supervised learning (perceptron learning rule): training and test data sets; in the training set, both input and target are specified.
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn (plus bias weight w0) feed a summation unit Σ and the activation f, producing output o]

  f(y_in) = 1 if y_in > θ; 0 if -θ ≤ y_in ≤ θ; -1 if y_in < -θ
where y_in = Σ_{i=0..n} wi·xi.

PERCEPTRON LEARNING
  w_new = w_old + ∆w, where ∆w = η (t - o) xi
and
  t = target value (known),
  o = perceptron output (calculated),
  η = a small constant (e.g. 0.1), the learning rate,
  xi = input sample.

• If the output is correct (t = o), the weights wi are not changed.
• If the output is incorrect (t ≠ o), the weights wi are changed such that the output of the perceptron for the new weights is closer to t.
• The algorithm converges to the correct classification if
  – the training data is linearly separable, and
  – η is sufficiently small.
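
As an illustration, the sketch below trains a perceptron with this exact update on the bipolar AND data; η = 0.1 and the sign-at-zero convention are assumptions:

# Perceptron learning (delta w_i = eta * (t - o) * x_i) on bipolar AND data.
def predict(x, w, b):
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if y_in >= 0 else -1

samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, eta = [0.0, 0.0], 0.0, 0.1

for epoch in range(10):                       # a few passes over the data
    for x, t in samples:
        o = predict(x, w, b)
        if o != t:                            # weights change only on error
            for i in range(len(w)):
                w[i] += eta * (t - o) * x[i]
            b += eta * (t - o)                # bias treated as weight on input 1

print(w, b, [predict(x, w, b) for x, _ in samples])  # classifies all four correctly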

Perceptron learning rules

Sr. No. | Condition                                              | Action
1       | The perceptron classifies the input pattern correctly  | No change in the current set of weights
        | (y_out = t)                                            | w0, w1, ..., wm
2       | The perceptron misclassifies the input pattern         | Increase each wi by ∆wi, where ∆wi is
        | negatively (y_out = -1 but target = +1)                | proportional to xi, for all i = 0, 1, ..., m
3       | The perceptron misclassifies the input pattern         | Decrease each wi by ∆wi, where ∆wi is
        | positively (y_out = +1 but target = -1)                | proportional to xi, for all i = 0, 1, ..., m

LEARNING ALGORITHM
• Target value, T: when training a network, we present it not only with the input but also with the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the training value will be 1.
• Output, O: the output value from the neuron.
• Ij: the inputs presented to the neuron.
• Wj: the weight from input neuron Ij to the output neuron.
• LR: the learning rate. This dictates how quickly the network converges. It is set by experimentation, typically 0.1.

TRAINING ALGORITHM
• Adjust the neural network weights to map inputs to outputs.
• Use a set of sample patterns where the desired output (given the inputs presented) is known.
• The purpose is to learn to recognize features which are common to good and bad exemplars.

MULTILAYER PERCEPTRON
[Figure: input signals enter the input layer and pass through adjustable weights to the output layer, which produces the output values]
Figure from Principles of Soft Computing, by S. N. Sivanandam & S. N. Deepa

LAYERS IN NEURAL NETWORK
• The input layer:
  – Introduces the input values into the network.
  – No activation function or other processing.
• The hidden layer(s):
  – Perform classification of features.
  – Two hidden layers are sufficient to solve any problem.
  – More features may mean that more layers are better.
• The output layer:
  – Functionally just like the hidden layers.
  – Outputs are passed on to the world outside the neural network.

ADALINE
• By Widrow and Hoff (1960).
• ADAptive LINear NEuron, for signal processing.
• The same architecture as our simple network.
• Learning method: the delta rule (another error-driven method), also called the Widrow-Hoff learning rule.
• The delta is t - y_in, NOT t - y, because y = f(y_in) is not differentiable.
• Learning algorithm: the same as perceptron learning, except in Step 5:
    b := b + α · (t - y_in)
    wi := wi + α · xi · (t - y_in)
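
A sketch of this update loop on the bipolar AND data (α = 0.1 and the epoch count are assumptions):

# ADALINE / delta-rule training: updates use the raw net input y_in, not f(y_in).
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, alpha = [0.0, 0.0], 0.0, 0.1

for epoch in range(50):
    for x, t in samples:
        y_in = b + sum(wi * xi for wi, xi in zip(w, x))
        err = t - y_in                     # the delta: t - y_in
        b += alpha * err
        for i in range(len(w)):
            w[i] += alpha * x[i] * err

print(w, b)  # tends toward the least-mean-square fit for this data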

• Derivation of the delta rule
• Error over all P samples (mean square error):
    E = (1/P) Σ_{p=1..P} (t(p) - y_in(p))²
• E is a function of W = {w1, ..., wn}.
• Learning takes a gradient descent approach to reduce E by modifying W. The gradient of E is
    ∇E = (∂E/∂w1, ..., ∂E/∂wn)
  and the weight change is taken proportional to the negative gradient:
    ∆wi ∝ -∂E/∂wi
• Since
    ∂E/∂wi = (2/P) Σ_{p=1..P} (t(p) - y_in(p)) · ∂(-y_in(p))/∂wi
           = -(2/P) Σ_{p=1..P} (t(p) - y_in(p)) · xi(p)
  it follows that
    ∆wi ∝ (2/P) Σ_{p=1..P} (t(p) - y_in(p)) · xi(p)

Recommended Textbooks
• Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis and Applications, S. Rajasekaran & G. A. Vijayalakshmi Pai
• Elements of Artificial Neural Networks, K. Mehrotra, C. K. Mohan & S. Ranka, MIT Press (CogNet)
• Principles of Soft Computing, S. N. Sivanandam & S. N. Deepa
• Fuzzy Sets and Fuzzy Logic, George Klir & Bo Yuan, PHI

Thank You