8/14/2019 Learning Law in Neural Networks.doc
1/19
Learning
In a neural network, learning is the ability to adapt to the environment and to improve performance through experience.
Learning implies that the processing element somehow changes its input/output behaviour in response to the environment. For example, if the processing element originally gives an output of +1 in response to a particular input pattern, it might give an output of -1 to that same input pattern after learning takes place. The processing element has somehow changed its mind about what the correct response to that input should be. What does the processing element do to make this change?
The output is computed as a result of a transfer function of the weighted input. The net input for this simple case is computed by multiplying the value of each individual input by its corresponding weight, or equivalently, taking the dot product of the input and weight vectors. The processing element then takes this input value and applies the transfer function to it to compute the resulting output.
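As an illustration, this forward computation can be sketched in Python; the simple +1/-1 threshold transfer function below is an assumed choice, not the only possibility:

```python
def processing_element(inputs, weights):
    # Net input: the dot product of the input and weight vectors.
    net = sum(x * w for x, w in zip(inputs, weights))
    # Transfer function: a simple threshold (an assumed choice)
    # mapping the net input to +1 or -1.
    return 1 if net > 0 else -1
```

After learning has changed the weights, the same input pattern can fall on the other side of the threshold, which is exactly the change of response described above.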
activation function - A function by which the new output of the basic unit is derived from a combination of the net inputs and the current state of the unit (the total input).
axon - The part of a nerve cell through which impulses travel away from the cell body; the
electrically active parts of a nerve-cell.
back-propagation - A learning algorithm for a multilayer network in which the weights are
modified via the propagation of an error signal "backward" from the outputs to the inputs.
connection - A pathway between processing elements, either positive or negative, that links the processing elements into a network.
dendrite - The branched part of a nerve cell that carries impulses toward the cell body; the electrically passive parts of a nerve cell.
learning - The phase in a neural network when new data is introduced into the network, causing the weights on the processing elements to be adjusted.
neuron - The structural and functional unit of the nervous system, consisting of the nerve cell
body and all its processes, including an axon and one or more dendrons.
perceptron- A large class of simple neuron-like networks with only an input layer and an output
layer. Developed in 1957 by Frank Rosenblatt, this class of neural network had no hidden layer.
summation function - A function that combines the various input activations into a single activation.
synapse - The point of contact between adjacent neurons where nerve impulses are transmitted from one to another.
threshold - A minimum level of excitation energy.
training - A process whereby a network learns to associate an input pattern with the correct
answer.
weight - The strength of an input connection expressed by a real number. Processing elements receive input via interconnects. Each interconnect has a weight attached to it. The sum of the weighted inputs makes up a value that updates the processing element. The output value of a processing element is described by a level of excitation that causes interconnects to be either on (i.e. excitatory output) or off (i.e. inhibitory output).
Architecture of neural networks
Feed-forward networks
Feed-forward ANNs (figure 1) allow signals to travel one way only; from input to
output. There is no feedback (loops) i.e. the output of any layer does not affect that
same layer. Feed-forward ANNs tend to be straight forward networks that associate
inputs with outputs. They are extensively used in pattern recognition. This type of
organisation is also referred to as bottom-up or top-down.
Feedback networks
Feedback networks (figure 1) can have signals travelling in both directions by
introducing loops in the network. Feedback networks are very powerful and can get
extremely complicated. Feedback networks are dynamic; their 'state' is changing
continuously until they reach an equilibrium point. They remain at the equilibrium
point until the input changes and a new equilibrium needs to be found. Feedback
architectures are also referred to as interactive or recurrent, although the latter term is
often used to denote feedback connections in single-layer organisations.
Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data.
Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps).
Competitive learning is usually implemented with neural networks that contain a hidden layer which is commonly known as the competitive layer.
Every competitive neuron i is described by a vector of weights w_i and calculates the similarity measure between the input data x and its weight vector w_i.
For every input vector, the competitive neurons compete with each other to see which one of them is the most similar to that particular input vector. The winner neuron m sets its output y_m = 1 and all the other competitive neurons set their output y_i = 0, i ≠ m.
Usually, in order to measure similarity, the inverse of the Euclidean distance ||x - w_i|| between the input vector x and the weight vector w_i is used.
Example
Here is a simple competitive learning algorithm to find three clusters within some input data.
1. (Set-up.) Let a set of sensors all feed into three different nodes, so that every node is connected to
every sensor. Let the weights that each node gives to its sensors be set randomly between 0.0 and 1.0.
Let the output of each node be the sum of all its sensors, each sensor's signal strength being multiplied
by its weight.
2. When the net is shown an input, the node with the highest output is deemed the winner. The input is
classified as being within the cluster corresponding to that node.
3. The winner updates each of its weights, moving weight from the connections that gave it weaker
signals to the connections that gave it stronger signals.
Thus, as more data are received, each node converges on the centre of the cluster that it has come to represent and activates more strongly for inputs in this cluster and more weakly for inputs in other clusters.
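The three numbered steps can be sketched in Python. The function name, learning rate and the update rule w ← w + lr·(x − w) are assumptions; the update approximates the "move weight toward the stronger signals" step with the standard competitive learning rule:

```python
import random

def competitive_learning(data, n_nodes=3, lr=0.1, epochs=10, seed=0):
    """Sketch of the three-step algorithm above; the name, learning
    rate and exact update rule are assumptions for illustration."""
    rng = random.Random(seed)
    dim = len(data[0])
    # 1. Set-up: random weights between 0.0 and 1.0 for every node/sensor pair.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            # 2. The node with the highest output (weighted sum) wins.
            winner = max(range(n_nodes),
                         key=lambda i: sum(w * s for w, s in zip(weights[i], x)))
            # 3. The winner shifts its weights toward the input, so the
            #    stronger signals end up with the larger weights.
            weights[winner] = [w + lr * (s - w)
                               for w, s in zip(weights[winner], x)]
    return weights
```

Run repeatedly over the data, each winning node's weight vector drifts toward the centre of the cluster it responds to, as described above.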
LEARNING TYPES
Supervised or Active Learning - learning with an external teacher or a supervisor who presents a training set to the network.
Unsupervised or Self-Organized Learning - does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
Unsupervised learning can be used in real-time.
Reinforcement Learning - the output is evaluated with the help of a feedback signal: if the output matches the desired response, the reward is +1; otherwise it is 0 or -1.
Unsupervised Learning Types:-
Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. ART networks consist of an input layer and an output layer.
Adaptive Resonance Theory (ART) networks perform completely unsupervised learning.
Carpenter and Grossberg (1987)
On-line clustering algorithm
Recurrent ANN
Competitive output layer
Data clustering applications
Stability-plasticity dilemma:
Stability: system behaviour doesn't change after irrelevant events
Plasticity: system adapts its behaviour according to significant events
Dilemma: how to achieve stability without rigidity and plasticity without chaos?
Ongoing learning capability
Preservation of learned knowledge
ART Architecture
Bottom-up weights bij
Top-down weights tij: store class template
Input nodes: vigilance test, input normalisation
Output nodes: forward matching
Long-term memory: ANN weights
Short-term memory: ANN activation pattern
ART Types
ART1: Unsupervised Clustering of binary input vectors.
ART2: Unsupervised Clustering of real-valued input vectors.
ART3: Incorporates "chemical transmitters" to control the search process in a hierarchical ART structure.
ARTMAP: Supervised version of ART that can learn arbitrary mappings of binary patterns.
Fuzzy ART: Synthesis of ART and fuzzy logic.
Fuzzy ARTMAP: Supervised fuzzy ART.
dART and dARTMAP: Distributed code representations in the F2 layer (extension of the winner-take-all approach).
Gaussian ARTMAP
ART 1 - Binary Adaptive Resonance Theory
ART 1 is the simplest variety of ART networks, accepting only binary inputs.
The basic structure of an ART1 neural network involves:
an input processing field (called the F1 layer), which consists of two parts:
an input portion (F1(a))
an interface portion (F1(b))
the cluster units (the F2 layer)
a mechanism to control the degree of similarity of patterns placed on the same cluster
a reset mechanism
weighted bottom-up connections between the F1 and F2 layers
weighted top-down connections between the F2 and F1 layers
Reset Module
Fixed connection weights
Implements the vigilance test
Excitatory connection from F1(b)
Inhibitory connection from F1(a)
Output of reset module inhibitory to output layer
Disables firing output node if match with pattern is not close enough
Duration of reset signal lasts until pattern is present
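As a sketch, the vigilance test for binary patterns is often written as a match ratio compared against a vigilance parameter ρ; the names below and the exact form of the test are assumptions:

```python
def vigilance_test(x, template, rho=0.7):
    """Sketch of the ART1 vigilance test (names and the match-ratio
    form are assumptions). x and template are binary lists; rho is
    the vigilance parameter. A failed test triggers the reset signal,
    disabling the firing output node as described above."""
    active = sum(x)
    if active == 0:
        return True
    # Fraction of active input bits that the stored class template also has set.
    match = sum(1 for xi, ti in zip(x, template) if xi == 1 and ti == 1)
    return match / active >= rho
```

A higher ρ demands a closer match before an input is accepted into an existing cluster, which is how ART trades plasticity against stability.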
Gain module
Fixed connection weights
Controls activation cycle of input layer
Excitatory connection from input lines
Inhibitory connection from output layer
Output of gain module excitatory to input layer
2/3 rule for input layer
ART1 example: character recognition
ART 2 - Analog Adaptive Resonance Theory
ART 2 extends network capabilities to support continuous inputs.
Unsupervised Clustering for :
Real-valued input vectors
Binary input vectors that are noisy
Includes a combination of normalization and noise suppression
Fast Learning
Weights reach equilibrium in each learning trial
Have some of the same characteristics as the weights found by ART1
More appropriate for data in which the primary information is contained in the pattern of
components that are small or large
Slow Learning
Only one weight update iteration performed on each learning trial
Needs more epochs than fast learning
More appropriate for data in which the relative size of the nonzero components is
important
ART 2-A - Streamlined Adaptive Resonance Theory
ART 2-A is a streamlined form of ART-2 with a drastically accelerated runtime, and with qualitative results only rarely inferior to the full ART-2 implementation.
Applications ART
Natural language processing
Document clustering
Document retrieval
Automatic query
Image segmentation
Character recognition
Data mining
Data set partitioning
Detection of emerging clusters
Fuzzy partitioning
Condition-action association
Hebbian learning
In 1949, Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb's Law. Hebb's Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.
Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of
that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of
that connection is decreased.
Hebb's Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.
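The two rules can be written as a single update for bipolar (+1/-1) activations, Δw_ij = α · x_i · y_j: synchronous activation gives a positive product (weight increased), asynchronous activation a negative one (weight decreased). A minimal sketch, with α an assumed learning-rate parameter:

```python
def hebbian_update(w, x_i, y_j, alpha=0.1):
    # Hebb's rule with bipolar (+1/-1) activations: the weight grows
    # when neurons i and j fire synchronously and shrinks when they
    # fire asynchronously. alpha is an assumed learning rate.
    return w + alpha * x_i * y_j

w = hebbian_update(0.0, +1, +1)   # synchronous activation: weight increases
w = hebbian_update(w, +1, -1)     # asynchronous activation: weight decreases
```

Note that the update uses only the activities of the two connected neurons, which is the local, teacher-free character of Hebbian learning described above.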
[Figure: neuron i connected to neuron j, with input signals entering on the left and the output signal leaving on the right]
Competitive learning
In competitive learning, neurons compete among themselves to be activated.
While in Hebbian learning, several output neurons can be activated simultaneously, in
competitive learning, only a single output neuron is active at any time.
The output neuron that wins the competition is called the winner-takes-all neuron.
The basic idea of competitive learning was introduced in the early 1970s.
In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called
self-organising feature maps. These maps are based on competitive learning.
SOM
What is a self-organising feature map?
Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and
hundreds of billions of synapses. The cortex includes areas that are responsible for different human
activities (motor, visual, auditory, somatosensory, etc.), and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.
The Kohonen network
The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a one- or two-dimensional output or Kohonen layer.
Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.
The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This is achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.
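Putting the pieces together, Kohonen training can be sketched for a one-dimensional output layer. All names and parameters below are assumptions, and a Gaussian neighbourhood stands in for the Mexican hat function:

```python
import math
import random

def train_som(data, n_out=5, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    """Sketch of 1-D Kohonen training; names and parameters are
    assumptions for illustration."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per output neuron, initialised randomly.
    w = [[rng.random() for _ in range(dim)] for _ in range(n_out)]
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        radius = max(radius0 * (1 - t / epochs), 0.5)  # shrinking neighbourhood
        for x in data:
            # The winner is the output neuron closest to the input vector.
            winner = min(range(n_out),
                         key=lambda i: sum((a - b) ** 2
                                           for a, b in zip(x, w[i])))
            for i in range(n_out):
                # Excitation falls off with lattice distance from the winner
                # (a Gaussian standing in for the Mexican hat function).
                h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                w[i] = [b + lr * h * (a - b) for a, b in zip(x, w[i])]
    return w
```

The shrinking radius reproduces the behaviour described above: training starts with a large winner's neighbourhood and gradually narrows it until updates are almost winner-takes-all.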
Supervised Learning Types
Hopfield Network:-
[Figure: a single-layer feedback network with input-layer nodes x1 and x2 and output-layer nodes y1, y2 and y3; input signals enter on the left and output signals leave on the right]
PERCEPTRON
In machine learning, the perceptron is an algorithm for the supervised classification of an input into one of two possible outputs. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector describing a given input. The learning algorithm for perceptrons is an online algorithm, in that it processes elements in the training set one at a time.
The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.
The perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value):
f(x) = 1 if w · x + b > 0, and f(x) = 0 otherwise,
where w is a vector of real-valued weights, w · x is the dot product (which here computes a weighted sum), and b is the 'bias', a constant term that does not depend on any input value.
The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance, in the case of a binary classification problem. If b is negative, then the weighted combination of inputs must produce a positive value greater than |b| in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the decision boundary. The perceptron learning algorithm does not terminate if the learning set is not linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron's inability to solve problems with linearly non-separable vectors is the Boolean exclusive-or problem. The solution spaces of decision boundaries for all binary functions and learning behaviours are studied in the reference.
In the context of artificial neural networks, a perceptron is an artificial neuron using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network.
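As a sketch of the perceptron learning rule described above (function and variable names are assumptions), trained here on the linearly separable Boolean AND problem:

```python
def perceptron_train(samples, dim, lr=1.0, epochs=20):
    """Sketch of perceptron learning (names and parameters assumed).
    samples: (x, target) pairs with target in {0, 1}."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Heaviside step activation applied to w . x + b.
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - y
            # Weights change only when the example is misclassified.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND is linearly separable, so the algorithm terminates.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = perceptron_train(data, dim=2)
```

Running the same loop on exclusive-or would never settle, which is the non-termination on linearly non-separable data noted above.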
ADALINE (Adaptive Linear Neuron, or later Adaptive Linear Element) is an early single-layer neural network and the name of the physical device that implemented this network. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts neuron. It consists of a weight, a bias and a summation function.
The difference between Adaline and the standard (McCulloch-Pitts) perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function's output is used for adjusting the weights.
There also exists an extension known as Madaline.
Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
x is the input vector
w is the weight vector
n is the number of inputs
θ is some constant
y is the output
then we find that the output is y = Σ_{j=1}^{n} x_j w_j + θ. If we further assume that θ = 0, then the output reduces to the dot product of x and w.
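The output formula, and the Adaline-style weight adjustment on the net output (rather than on a thresholded activation), can be sketched as follows; the function names and the learning rate are assumptions:

```python
def adaline_output(x, w, theta=0.0):
    # y = sum_j x_j * w_j + theta; with theta = 0 this is just the
    # dot product of x and w.
    return sum(xj * wj for xj, wj in zip(x, w)) + theta

def adaline_update(x, w, target, lr=0.1, theta=0.0):
    # Delta (LMS) rule sketch: the error is taken on the net output
    # itself, not on a thresholded activation, which is the Adaline
    # vs. standard-perceptron difference described above.
    y = adaline_output(x, w, theta)
    return [wj + lr * (target - y) * xj for xj, wj in zip(x, w)]
```

Because the error is linear in the weights, repeated updates perform a gradient-descent-style reduction of the squared output error.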