8/14/2019 Learning Law in Neural Networks.doc
1/19
Learning
In a neural network, learning is the ability to adapt to the environment and to improve performance through experience.
Learning implies that the processing element somehow changes its input/output behaviour in response to the environment. For example, if the processing element originally gives an output of +1 in response to a particular input pattern, it might give an output of -1 to that same input pattern after learning takes place. The processing element has somehow changed its mind about what the correct response to that input should be. What does the processing element do to make this change?
The output is computed as a result of a transfer function of the weighted input. The net input for this simple case is computed by multiplying the value of each individual input by its corresponding weight, or equivalently, taking the dot product of the input and weight vectors. The processing element then takes this input value and applies the transfer function to it to compute the resulting output.
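As an illustration, this forward computation can be sketched in Python; the simple +1/-1 threshold transfer function below is an assumed choice, not the only possibility:

```python
def processing_element(inputs, weights):
    # Net input: the dot product of the input and weight vectors.
    net = sum(x * w for x, w in zip(inputs, weights))
    # Transfer function: a simple threshold (an assumed choice)
    # mapping the net input to +1 or -1.
    return 1 if net > 0 else -1
```

After learning has changed the weights, the same input pattern can fall on the other side of the threshold, which is exactly the change of response described above.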
activation function - A function by which the new output of the basic unit is derived from a combination of the net inputs and the current state of the unit (the total input).
axon - The part of a nerve cell through which impulses travel away from the cell body; the
electrically active parts of a nerve-cell.
back-propagation - A learning algorithm for a multilayer network in which the weights are
modified via the propagation of an error signal "backward" from the outputs to the inputs.
connection - A pathway between processing elements, either positive or negative, that links the processing elements into a network.
dendrite - The branched part of a nerve cell that carries impulses toward the cell body; the electrically passive parts of a nerve cell.
learning - The phase in a neural network when new data is introduced into the network, causing the weights on the processing elements to be adjusted.
neuron - The structural and functional unit of the nervous system, consisting of the nerve cell
body and all its processes, including an axon and one or more dendrons.
perceptron- A large class of simple neuron-like networks with only an input layer and an output
layer. Developed in 1957 by Frank Rosenblatt, this class of neural network had no hidden layer.
summation function - A function that combines the various input activations into a single activation.
synapse - The point of contact between adjacent neurons where nerve impulses are transmitted from one to another.
threshold - A minimum level of excitation energy.
training - A process whereby a network learns to associate an input pattern with the correct
answer.
weight - The strength of an input connection expressed by a real number. Processing elements receive input via interconnects. Each interconnect has a weight attached to it. The sum of the weighted inputs makes up a value that updates the processing element. The output value of a processing element is described by a level of excitation that causes interconnects to be either on (i.e. excitatory output) or off (i.e. inhibitory output).
Architecture of neural networks
Feed-forward networks
Feed-forward ANNs (figure 1) allow signals to travel one way only; from input to
output. There is no feedback (loops) i.e. the output of any layer does not affect that
same layer. Feed-forward ANNs tend to be straight forward networks that associate
inputs with outputs. They are extensively used in pattern recognition. This type of
organisation is also referred to as bottom-up or top-down.
Feedback networks
Feedback networks (figure 1) can have signals travelling in both directions by
introducing loops in the network. Feedback networks are very powerful and can get
extremely complicated. Feedback networks are dynamic; their 'state' is changing
continuously until they reach an equilibrium point. They remain at the equilibrium
point until the input changes and a new equilibrium needs to be found. Feedback
architectures are also referred to as interactive or recurrent, although the latter term is
often used to denote feedback connections in single-layer organisations.
Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.
Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps).
Competitive learning is usually implemented with neural networks that contain a hidden layer which is commonly known as the competitive layer.
Every competitive neuron i is described by a vector of weights w_i and calculates the similarity measure between the input data x and its weight vector w_i.
For every input vector, the competitive neurons compete with each other to see which one of them is the most similar to that particular input vector. The winner neuron m sets its output y_m = 1 and all the other competitive neurons set their output y_i = 0, i ≠ m.
Usually, in order to measure similarity, the inverse of the Euclidean distance ||x - w_i|| between the input vector x and the weight vector w_i is used.
Example
Here is a simple competitive learning algorithm to find three clusters within some input data.
1. (Set-up.) Let a set of sensors all feed into three different nodes, so that every node is connected to
every sensor. Let the weights that each node gives to its sensors be set randomly between 0.0 and 1.0.
Let the output of each node be the sum of all its sensors, each sensor's signal strength being multiplied
by its weight.
2. When the net is shown an input, the node with the highest output is deemed the winner. The input is
classified as being within the cluster corresponding to that node.
3. The winner updates each of its weights, moving weight from the connections that gave it weaker
signals to the connections that gave it stronger signals.
Thus, as more data are received, each node converges on the centre of the cluster that it has come to represent and activates more strongly for inputs in this cluster and more weakly for inputs in other clusters.
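The three numbered steps can be sketched in Python. The function name, learning rate and the update rule w ← w + lr·(x − w) are assumptions; the update approximates the "move weight toward the stronger signals" step with the standard competitive learning rule:

```python
import random

def competitive_learning(data, n_nodes=3, lr=0.1, epochs=10, seed=0):
    """Sketch of the three-step algorithm above; the name, learning
    rate and exact update rule are assumptions for illustration."""
    rng = random.Random(seed)
    dim = len(data[0])
    # 1. Set-up: random weights between 0.0 and 1.0 for every node/sensor pair.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            # 2. The node with the highest output (weighted sum) wins.
            winner = max(range(n_nodes),
                         key=lambda i: sum(w * s for w, s in zip(weights[i], x)))
            # 3. The winner shifts its weights toward the input, so the
            #    stronger signals end up with the larger weights.
            weights[winner] = [w + lr * (s - w)
                               for w, s in zip(weights[winner], x)]
    return weights
```

Run repeatedly over the data, each winning node's weight vector drifts toward the centre of the cluster it responds to, as described above.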
LEARNING TYPES
Supervised or Active Learning - learning with an external teacher or a supervisor who presents a training set to the network.
Unsupervised or Self-Organized Learning - does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
Unsupervised learning can be used in real-time.
Reinforcement Learning - the output is evaluated with the help of a feedback signal: if the output matches the desired response, the reward is +1; otherwise it is 0 or -1.
Unsupervised Learning Types:-
Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. ART networks consist of an input layer and an output layer.
Adaptive Resonance Theory (ART) networks perform completely unsupervised learning.
Carpenter and Grossberg (1987)
On-line clustering algorithm
Recurrent ANN
Competitive output layer
Data clustering applications
Stability-plasticity dilemma:
Stability: system behaviour doesn't change after irrelevant events
Plasticity: system adapts its behaviour according to significant events
Dilemma: how to achieve stability without rigidity and plasticity without chaos?
Ongoing learning capability
Preservation of learned knowledge
ART Architecture
Bottom-up weights bij
Top-down weights tij: store class template
Input nodes: vigilance test, input normalisation
Output nodes: forward matching
Long-term memory: ANN weights
Short-term memory: ANN activation pattern
ART Types
ART1: Unsupervised Clustering of binary input vectors.
ART2: Unsupervised Clustering of real-valued input vectors.
ART3: Incorporates "chemical transmitters" to control the search process in a hierarchical ART structure.
ARTMAP: Supervised version of ART that can learn arbitrary mappings of binary patterns.
Fuzzy ART: Synthesis of ART and fuzzy logic.
Fuzzy ARTMAP: Supervised fuzzy ART.
dART and dARTMAP: Distributed code representations in the F2 layer (extension of the winner-take-all approach).
Gaussian ARTMAP
ART 1 - Binary Adaptive Resonance Theory
ART 1 is the simplest variety of ART networks, accepting only binary inputs.
The basic structure of an ART1 neural network involves:
an input processing field (called the F1 layer), which consists of two parts:
an input portion (F1(a))
an interface portion (F1(b))
the cluster units (the F2 layer)
a mechanism to control the degree of similarity of patterns placed on the same cluster
a reset mechanism
weighted bottom-up connections between the F1 and F2 layers
weighted top-down connections between the F2 and F1 layers
Reset Module
Fixed connection weights
Implements the vigilance test
Excitatory connection from F1(b)
Inhibitory connection from F1(a)
Output of reset module inhibitory to output layer
Disables firing output node if match with pattern is not close enough
Duration of reset signal lasts until pattern is present
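As a sketch, the vigilance test for binary patterns is often written as a match ratio compared against a vigilance parameter ρ; the names below and the exact form of the test are assumptions:

```python
def vigilance_test(x, template, rho=0.7):
    """Sketch of the ART1 vigilance test (names and the match-ratio
    form are assumptions). x and template are binary lists; rho is
    the vigilance parameter. A failed test triggers the reset signal,
    disabling the firing output node as described above."""
    active = sum(x)
    if active == 0:
        return True
    # Fraction of active input bits that the stored class template also has set.
    match = sum(1 for xi, ti in zip(x, template) if xi == 1 and ti == 1)
    return match / active >= rho
```

A higher ρ demands a closer match before an input is accepted into an existing cluster, which is how ART trades plasticity against stability.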
Gain module
Fixed connection weights
Controls activation cycle of input layer
Excitatory connection from input lines
Inhibitory connection from output layer
Output of gain module excitatory to input layer
2/3 rule for input layer
ART1 example: character recognition
ART 2 - Analog Adaptive Resonance Theory
ART 2 extends network capabilities to support continuous inputs.
Unsupervised Clustering for :
Real-valued input vectors
Binary input vectors that are noisy
Includes a combination of normalization and noise suppression
Fast Learning
Weights reach equilibrium in each learning trial
Have some of the same characteristics as the weights found by ART1
More appropriate for data in which the primary information is contained in the pattern of
components that are small or large
Slow Learning
Only one weight update iteration performed on each learning trial
Needs more epochs than fast learning
More appropriate for data in which the relative size of the nonzero components is
important
ART 2-A - Streamlined Adaptive Resonance Theory
ART 2-A is a streamlined form of ART-2 with a drastically accelerated runtime, and with qualitative results only rarely inferior to the full ART-2 implementation.
Applications ART
Natural language processing
Document clustering
Document retrieval
Automatic query
Image segmentation
Character recognition
Data mining
Data set partitioning
Detection of emerging clusters
Fuzzy partitioning
Condition-action association
Hebbian learning
In 1949, Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb's Law. Hebb's Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.
Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of
that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of
that connection is decreased.
Hebb's Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.
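The two rules can be written as a single update for bipolar (+1/-1) activations, Δw_ij = α · x_i · y_j: synchronous activation gives a positive product (weight increased), asynchronous activation a negative one (weight decreased). A minimal sketch, with α an assumed learning-rate parameter:

```python
def hebbian_update(w, x_i, y_j, alpha=0.1):
    # Hebb's rule with bipolar (+1/-1) activations: the weight grows
    # when neurons i and j fire synchronously and shrinks when they
    # fire asynchronously. alpha is an assumed learning rate.
    return w + alpha * x_i * y_j

w = hebbian_update(0.0, +1, +1)   # synchronous activation: weight increases
w = hebbian_update(w, +1, -1)     # asynchronous activation: weight decreases
```

Note that the update uses only the activities of the two connected neurons, which is the local, teacher-free character of Hebbian learning described above.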
[Figure: neuron i connected to neuron j, with input signals entering on the left and the output signal leaving on the right]
Competitive learning
In competitive learning, neurons compete among themselves to be activated.
While in Hebbian learning, several output neurons can be activated simultaneously, in
competitive learning, only a single output neuron is active at any time.
The output neuron that wins the competition is called the winner-takes-all neuron.
The basic idea of competitive learning was introduced in the early 1970s.
In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called
self-organising feature maps. These maps are based on competitive learning.
SOM
What is a self-organising feature map?
Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and
hundreds of billions of synapses. The cortex includes areas that are responsible for different human
activities (motor, visual, auditory, somatosensory, etc.), and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.
The Kohonen network
The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a one- or two-dimensional output or Kohonen layer.
Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.
The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This is achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.
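Putting the pieces together, Kohonen training can be sketched for a one-dimensional output layer. All names and parameters below are assumptions, and a Gaussian neighbourhood stands in for the Mexican hat function:

```python
import math
import random

def train_som(data, n_out=5, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    """Sketch of 1-D Kohonen training; names and parameters are
    assumptions for illustration."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per output neuron, initialised randomly.
    w = [[rng.random() for _ in range(dim)] for _ in range(n_out)]
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        radius = max(radius0 * (1 - t / epochs), 0.5)  # shrinking neighbourhood
        for x in data:
            # The winner is the output neuron closest to the input vector.
            winner = min(range(n_out),
                         key=lambda i: sum((a - b) ** 2
                                           for a, b in zip(x, w[i])))
            for i in range(n_out):
                # Excitation falls off with lattice distance from the winner
                # (a Gaussian standing in for the Mexican hat function).
                h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                w[i] = [b + lr * h * (a - b) for a, b in zip(x, w[i])]
    return w
```

The shrinking radius reproduces the behaviour described above: training starts with a large winner's neighbourhood and gradually narrows it until updates are almost winner-takes-all.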
Supervised Learning Types
Hopfield Network:-
[Figure: a single-layer feedback network with input-layer nodes x1 and x2 and output-layer nodes y1, y2 and y3; input signals enter on the left and output signals leave on the right]
PERCEPTRON
In machine learning, the perceptron is an algorithm for the supervised classification of an input into one of two possible outputs. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector describing a given input. The learning algorithm for perceptrons is an online algorithm, in that it processes elements in the training set one at a time.
The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.
The perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value):
f(x) = 1 if w · x + b > 0, and f(x) = 0 otherwise,
where w is a vector of real-valued weights, w · x is the dot product (which here computes a weighted sum), and b is the 'bias', a constant term that does not depend on any input value.
The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance, in the case of a binary classification problem. If b is negative, then the weighted combination of inputs must produce a positive value greater than |b| in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the decision boundary. The perceptron learning algorithm does not terminate if the learning set is not linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron's inability to solve problems with linearly non-separable vectors is the Boolean exclusive-or problem. The solution spaces of decision boundaries for all binary functions and learning behaviours are studied in the reference.
In the context of artificial neural networks, a perceptron is an artificial neuron using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network.
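As a sketch of the perceptron learning rule described above (function and variable names are assumptions), trained here on the linearly separable Boolean AND problem:

```python
def perceptron_train(samples, dim, lr=1.0, epochs=20):
    """Sketch of perceptron learning (names and parameters assumed).
    samples: (x, target) pairs with target in {0, 1}."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Heaviside step activation applied to w . x + b.
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y
            # Weights change only when the example is misclassified.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so the algorithm terminates.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = perceptron_train(data, dim=2)
```

Running the same loop on exclusive-or would never settle, which is the non-termination on linearly non-separable data noted above.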
ADALINE (Adaptive Linear Neuron, or later Adaptive Linear Element) is an early single-layer neural network and the name of the physical device that implemented this network. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts neuron. It consists of a weight, a bias and a summation function.
The difference between Adaline and the standard (McCulloch-Pitts) perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function's output is used for adjusting the weights.
There also exists an extension known as Madaline.
Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
x is the input vector
w is the weight vector
n is the number of inputs
θ is some constant
y is the output
then we find that the output is y = Σ_{j=1}^{n} x_j w_j + θ. If we further assume that θ = 0, then the output reduces to the dot product of x and w.
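The output formula, and the Adaline-style weight adjustment on the net output (rather than on a thresholded activation), can be sketched as follows; the function names and the learning rate are assumptions:

```python
def adaline_output(x, w, theta=0.0):
    # y = sum_j x_j * w_j + theta; with theta = 0 this is just the
    # dot product of x and w.
    return sum(xj * wj for xj, wj in zip(x, w)) + theta

def adaline_update(x, w, target, lr=0.1, theta=0.0):
    # Delta (LMS) rule sketch: the error is taken on the net output
    # itself, not on a thresholded activation, which is the Adaline
    # vs. standard-perceptron difference described above.
    y = adaline_output(x, w, theta)
    return [wj + lr * (target - y) * xj for xj, wj in zip(x, w)]
```

Because the error is linear in the weights, repeated updates perform a gradient-descent-style reduction of the squared output error.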