Artificial Neural Networks : An Introduction G.Anuradha


Page 1: Artificial Neural Networks : An Introduction

Artificial Neural Networks : An Introduction

G.Anuradha

Page 2: Artificial Neural Networks : An Introduction

Learning Objectives

• Reasons to study neural computation

• Comparison between biological neuron and artificial neuron

• Basic models of ANN

• Different types of connections of NN, Learning and activation function

• Basic fundamental neuron model-McCulloch-Pitts neuron and Hebb network

Page 3: Artificial Neural Networks : An Introduction

Reasons to study neural computation

• To understand how the brain actually works
  – Computer simulations are used for this purpose

• To understand the style of parallel computation inspired by neurons and their adaptive connections
  – Different from sequential computation

• To solve practical problems by using novel learning algorithms inspired by the brain

Page 4: Artificial Neural Networks : An Introduction

Biological Neural Network

Page 5: Artificial Neural Networks : An Introduction

Neuron and a sample of pulse train

Page 6: Artificial Neural Networks : An Introduction

How does the brain work?

• Each neuron receives inputs from other neurons
  – Neurons use spikes to communicate

• The effect of each input line on the neuron is controlled by a synaptic weight
  – Weights can be positive or negative

• Synaptic weights adapt so that the whole network learns to perform useful computations
  – Recognizing objects, understanding language, making plans, controlling the body

• There are about 10^11 neurons, each with about 10^4 weights.

Page 7: Artificial Neural Networks : An Introduction

Modularity and brain

• Different bits of the cortex do different things
• Local damage to the brain has specific effects
• Early brain damage makes functions relocate
• Cortex gives rapid parallel computation plus flexibility
• Conventional computers require very fast central processors for long sequential computations

Page 8: Artificial Neural Networks : An Introduction

Information flow in nervous system

Page 9: Artificial Neural Networks : An Introduction

ANN

• ANNs possess a large number of processing elements called nodes/neurons which operate in parallel.

• Neurons are connected to one another by connection links.

• Each link is associated with a weight which contains information about the input signal.

• Each neuron has an internal state of its own which is a function of the inputs that the neuron receives - its activation level.

Page 10: Artificial Neural Networks : An Introduction

Comparison between brain and computer (ANN)

• Speed: Brain - few ms per operation, but massively parallel processing; ANN - few ns per operation

• Size and complexity: Brain - 10^11 neurons & 10^15 interconnections; ANN - depends on the designer

• Storage capacity: Brain - stores information in its interconnections (synapses), no loss of memory; ANN - stores in contiguous memory locations, loss of memory may happen sometimes

• Tolerance: Brain - has fault tolerance; ANN - no fault tolerance, information gets disrupted when interconnections are disconnected

• Control mechanism: Brain - complicated, involves chemicals in the biological neuron; ANN - simpler

Page 11: Artificial Neural Networks : An Introduction

Artificial Neural Networks

[Figure: a two-input neuron. Inputs X1, X2 carry signals x1, x2 over weighted links w1, w2 into neuron Y, which emits output y.]

Net input:  y_in = x1·w1 + x2·w2

Output:     y = f(y_in)
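The two equations above can be sketched in code. This is a minimal illustrative neuron, not the full model; the binary step activation is an assumed example of f.

```python
# Minimal sketch of the two-input neuron: y_in = sum of x_i * w_i, y = f(y_in).
def neuron_output(x, w, f):
    """Compute y = f(y_in) for input vector x and weight vector w."""
    y_in = sum(xi * wi for xi, wi in zip(x, w))
    return f(y_in)

step = lambda z: 1 if z >= 0 else 0   # an assumed example activation f
print(neuron_output([1, 0], [0.5, -0.3], step))  # y_in = 0.5 -> 1
```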

Page 12: Artificial Neural Networks : An Introduction

McCulloch-Pitts Neuron Model

Page 13: Artificial Neural Networks : An Introduction

McCulloch-Pitts neurons for the AND and OR models

Page 14: Artificial Neural Networks : An Introduction

McCulloch-Pitts neuron for the NOT model

Page 15: Artificial Neural Networks : An Introduction

Advantages and Disadvantages of McCulloch Pitt model

• Advantages
  – Simplistic
  – Substantial computing power

• Disadvantages
  – Weights and thresholds are fixed
  – Not very flexible

Page 16: Artificial Neural Networks : An Introduction

Features of McCulloch-Pitts model

• Allows binary 0,1 states only

• Operates under a discrete-time assumption

• Weights and the neurons’ thresholds are fixed in the model, and there is no interaction among network neurons

• Just a primitive model
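The features above can be made concrete with a sketch. The original figures for the AND, OR and NOT models are lost, so the weight/threshold choices below are the standard ones (an assumption, not taken from the slides): fixed weights, a fixed threshold, and binary 0/1 states only.

```python
# McCulloch-Pitts neuron: fire (output 1) iff the weighted sum of the
# binary inputs reaches the fixed threshold.
def mp_neuron(inputs, weights, threshold):
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

# Standard fixed choices realizing the logic gates (assumed values):
AND = lambda x1, x2: mp_neuron([x1, x2], [1, 1], threshold=2)
OR  = lambda x1, x2: mp_neuron([x1, x2], [1, 1], threshold=1)
NOT = lambda x:      mp_neuron([x],      [-1],   threshold=0)

print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
print([OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
```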

Page 17: Artificial Neural Networks : An Introduction

General symbol of neuron consisting of processing node and

synaptic connections

Page 18: Artificial Neural Networks : An Introduction

Neuron Modeling for ANN

f is referred to as the activation function; its domain is the set of net activation values.

net is the scalar product of the weight and input vectors.

The neuron, as a processing node, performs the operation of summation of its weighted inputs.

Page 19: Artificial Neural Networks : An Introduction

Binary threshold neurons

• There are two equivalent ways to write the equations for a binary threshold neuron:

  Using a threshold θ:
    z = Σ_i xi·wi
    y = 1 if z ≥ θ, 0 otherwise

  Using a bias b = -θ:
    z = b + Σ_i xi·wi
    y = 1 if z ≥ 0, 0 otherwise

Page 20: Artificial Neural Networks : An Introduction

Sigmoid neurons

• These give a real-valued output that is a smooth and bounded function of their total input.
  – Typically they use the logistic function:  y = 1 / (1 + e^(-z))
  – They have nice derivatives which make learning easy

[Figure: logistic curve, y rising smoothly from 0 to 1 with y = 0.5 at z = 0.]
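The "nice derivative" mentioned above is y′ = y·(1 − y), which follows from the logistic formula. A short sketch:

```python
import math

# Logistic (sigmoid) unit: smooth, bounded output in (0, 1).
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Its derivative has the convenient closed form y * (1 - y).
def logistic_derivative(z):
    y = logistic(z)
    return y * (1.0 - y)

print(logistic(0))             # 0.5 at z = 0, matching the plot
print(logistic_derivative(0))  # 0.25, the maximum slope
```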

Page 21: Artificial Neural Networks : An Introduction

Activation function

• Bipolar binary and unipolar binary are called hard-limiting activation functions, used in discrete neuron models.

• Unipolar continuous and bipolar continuous are called soft-limiting activation functions; they have sigmoidal characteristics.

Page 22: Artificial Neural Networks : An Introduction

Activation functions

Bipolar continuous and bipolar binary functions

Page 23: Artificial Neural Networks : An Introduction

Activation functions

Unipolar continuous and unipolar binary functions

Page 24: Artificial Neural Networks : An Introduction

Common models of neurons

Binary perceptrons

Continuous perceptrons

Page 25: Artificial Neural Networks : An Introduction

Quiz

• Which of the following tasks are neural networks good at?
  – Recognizing fragments of words in a pre-processed sound wave.
  – Recognizing badly written characters.
  – Storing lists of names and birth dates.
  – Logical reasoning.

Neural networks are good at finding statistical regularities that allow them to recognize patterns. They are not good at flawlessly applying symbolic rules or storing exact numbers.

Page 26: Artificial Neural Networks : An Introduction

Basic models of ANN

Basic Models of ANN

Interconnections Learning rules Activation function

Page 27: Artificial Neural Networks : An Introduction

Classification based on interconnections

Page 28: Artificial Neural Networks : An Introduction

Feed-forward neural networks

• These are the commonest type of neural network in practical applications.
  – The first layer is the input and the last layer is the output.
  – If there is more than one hidden layer, we call them “deep” neural networks.

• They compute a series of transformations that change the similarities between cases.
  – The activities of the neurons in each layer are a non-linear function of the activities in the layer below.

[Figure: a stack of layers - input units at the bottom, hidden units in the middle, output units at the top.]
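One feed-forward pass through such a stack can be sketched as follows. The layer sizes, weights and the logistic non-linearity are illustrative assumptions, not values from the slides:

```python
import math

# One layer: each unit's activity is a non-linear function (here logistic)
# of a weighted sum of the activities in the layer below.
def layer(inputs, weights, biases):
    return [1 / (1 + math.exp(-(b + sum(x * w for x, w in zip(inputs, row)))))
            for row, b in zip(weights, biases)]

x = [0.5, -0.2]                                       # input units
h = layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])   # hidden units
y = layer(h, [[1.0, 1.0]], [-0.5])                    # output units
print(len(h), len(y))  # 2 1
```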

Page 29: Artificial Neural Networks : An Introduction

Feedforward Network

• Its input and output vectors are x = (x1, ..., xn) and o = (o1, ..., om) respectively.

• Weight wij connects the i-th neuron with the j-th input. The activation rule of the i-th neuron is

  o_i = f(Σ_j wij·xj),  i = 1, ..., m

EXAMPLE

Page 30: Artificial Neural Networks : An Introduction

Multilayer feed forward network

Can be used to solve complicated problems

Page 31: Artificial Neural Networks : An Introduction

Feedback network

When outputs are directed back as inputs to nodes in the same or a preceding layer, the result is a feedback network.

Page 32: Artificial Neural Networks : An Introduction

Lateral feedback

If the output of a processing element is directed back as input to processing elements in the same layer, it is called lateral feedback.

Page 33: Artificial Neural Networks : An Introduction

Recurrent networks

• These have directed cycles in their connection graph.
  – That means you can sometimes get back to where you started by following the arrows.

• They can have complicated dynamics and this can make them very difficult to train.
  – There is a lot of interest at present in finding efficient ways of training recurrent nets.

• They are more biologically realistic.

Recurrent nets with multiple hidden layers are just a special case that has some of the hidden-to-hidden connections missing.

Page 34: Artificial Neural Networks : An Introduction

Recurrent neural networks for modeling sequences

• Recurrent neural networks are a very natural way to model sequential data:
  – They are equivalent to very deep nets with one hidden layer per time slice.
  – Except that they use the same weights at every time slice and they get input at every time slice.

• They have the ability to remember information in their hidden state for a long time.
  – But it’s very hard to train them to use this potential.

[Figure: the network unrolled in time - at each time slice an input unit feeds a hidden unit, hidden units connect forward through time, and each hidden unit feeds an output unit.]

Page 35: Artificial Neural Networks : An Introduction

An example of what recurrent neural nets can now do (to whet your interest!)

• Ilya Sutskever (2011) trained a special type of recurrent neural net to predict the next character in a sequence.

• After training for a long time on a string of half a billion characters from English Wikipedia, he got it to generate new text.
  – It generates by predicting the probability distribution for the next character and then sampling a character from this distribution.

Page 36: Artificial Neural Networks : An Introduction

Symmetrically connected networks

• These are like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
  – John Hopfield (and others) realized that symmetric networks are much easier to analyze than recurrent networks.
  – They are also more restricted in what they can do, because they obey an energy function.
    • For example, they cannot model cycles.

• Symmetrically connected nets without hidden units are called “Hopfield nets”.

Page 37: Artificial Neural Networks : An Introduction

Symmetrically connected networks with hidden units

• These are called “Boltzmann machines”.
  – They are much more powerful models than Hopfield nets.
  – They are less powerful than recurrent neural networks.
  – They have a beautifully simple learning algorithm.

Page 38: Artificial Neural Networks : An Introduction

Basic models of ANN

Basic Models of ANN

Interconnections Learning rules Activation function

Page 39: Artificial Neural Networks : An Introduction

Learning

• It’s the process by which a NN adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of the desired response.

• Two kinds of learning:
  – Parameter learning: connection weights are updated
  – Structure learning: change in the network structure

Page 40: Artificial Neural Networks : An Introduction

Training

• The process of modifying the weights in the connections between network layers with the objective of achieving the expected output is called training a network.

• This is achieved through:
  – Supervised learning
  – Unsupervised learning
  – Reinforcement learning

Page 41: Artificial Neural Networks : An Introduction

Classification of learning

• Supervised learning
  – Learn to predict an output when given an input vector.

• Unsupervised learning
  – Discover a good internal representation of the input.

• Reinforcement learning
  – Learn to select an action to maximize payoff.

Page 42: Artificial Neural Networks : An Introduction

Supervised Learning

• Child learns from a teacher

• Each input vector requires a corresponding target vector.

• Training pair=[input vector, target vector]

[Figure: input X feeds a neural network with weights W, producing actual output Y; an error-signal generator compares Y with the desired output D and feeds error signals (D - Y) back to the network.]

Page 43: Artificial Neural Networks : An Introduction

Two types of supervised learning

• Each training case consists of an input vector x and a target output t.

• Regression: the target output is a real number or a whole vector of real numbers.
  – The price of a stock in 6 months’ time.
  – The temperature at noon tomorrow.

• Classification: the target output is a class label.
  – The simplest case is a choice between 1 and 0.
  – We can also have multiple alternative labels.

Page 44: Artificial Neural Networks : An Introduction

Unsupervised Learning

• How a fish or tadpole learns

• All similar input patterns are grouped together as clusters.

• If a matching input pattern is not found, a new cluster is formed

• One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning.

• It provides a compact, low-dimensional representation of the input.

Page 45: Artificial Neural Networks : An Introduction

Self-organizing

• In unsupervised learning there is no feedback

• The network must discover patterns, regularities, and features in the input data on its own

• While doing so the network might change in parameters

• This process is called self-organizing

Page 46: Artificial Neural Networks : An Introduction

Reinforcement Learning

[Figure: input X feeds a neural network with weights W, producing actual output Y; an error-signal generator receives a reinforcement signal R and sends error signals back to the network.]

Page 47: Artificial Neural Networks : An Introduction

When is reinforcement learning used?

• When less information is available about the target output values (critic information)

• Learning based on this critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal

• Feedback in this case is only evaluative, not instructive

Page 48: Artificial Neural Networks : An Introduction

Basic models of ANN

Basic Models of ANN

Interconnections Learning rules Activation function

Page 49: Artificial Neural Networks : An Introduction

Activation Function

1. Identity function: f(x) = x for all x

2. Binary step function:
   f(x) = 1 if x ≥ θ, 0 if x < θ

3. Bipolar step function:
   f(x) = 1 if x ≥ θ, -1 if x < θ

4. Sigmoidal functions: continuous functions

5. Ramp function:
   f(x) = 1 if x > 1
        = x if 0 ≤ x ≤ 1
        = 0 if x < 0
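The five functions above can be sketched directly in code (θ defaults to 0 here, an assumed convention; the sigmoid shown is the logistic function as one example of a sigmoidal function):

```python
import math

def identity(x):              return x                      # 1. identity
def binary_step(x, theta=0):  return 1 if x >= theta else 0 # 2. binary step
def bipolar_step(x, theta=0): return 1 if x >= theta else -1  # 3. bipolar step
def sigmoid(x):               return 1 / (1 + math.exp(-x))   # 4. logistic sigmoid

def ramp(x):                                                  # 5. ramp
    if x > 1:  return 1
    if x < 0:  return 0
    return x

print(ramp(0.3), ramp(2), ramp(-1))  # 0.3 1 0
```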

Page 50: Artificial Neural Networks : An Introduction

Some learning algorithms we will learn are:

• Supervised:
  – Adaline, Madaline
  – Perceptron
  – Back-propagation
  – Multilayer perceptrons
  – Radial basis function networks

• Unsupervised:
  – Competitive learning
  – Kohonen self-organizing map
  – Learning vector quantization
  – Hebbian learning

Page 51: Artificial Neural Networks : An Introduction

Neural processing

• Recall: the processing phase for a NN; its objective is to retrieve the information. It is the process of computing o for a given x.

• Basic forms of neural information processing:
  – Autoassociation
  – Heteroassociation
  – Classification

Page 52: Artificial Neural Networks : An Introduction

Neural processing-Autoassociation

• Set of patterns can be stored in the network

• If a pattern similar to a member of the stored set is presented, an association with the closest stored pattern is made

Page 53: Artificial Neural Networks : An Introduction

Neural Processing- Heteroassociation

• Associations between pairs of patterns are stored

• Distorted input pattern may cause correct heteroassociation at the output

Page 54: Artificial Neural Networks : An Introduction

Neural processing-Classification

• Set of input patterns is divided into a number of classes or categories

• In response to an input pattern from the set, the classifier is supposed to recall the information regarding class membership of the input pattern.

Page 55: Artificial Neural Networks : An Introduction

Important terminologies of ANNs

• Weights

• Bias

• Threshold

• Learning rate

• Momentum factor

• Vigilance parameter

• Notations used in ANN

Page 56: Artificial Neural Networks : An Introduction

Weights

• Each neuron is connected to every other neuron by means of directed links

• Links are associated with weights

• Weights contain information about the input signal and are represented as a matrix

• The weight matrix is also called the connection matrix

Page 57: Artificial Neural Networks : An Introduction

Weight matrix

    [ w1ᵀ ]   [ w11  w12  w13  ...  w1m ]
    [ w2ᵀ ]   [ w21  w22  w23  ...  w2m ]
W = [  .  ] = [  .    .    .         .  ]
    [  .  ]   [  .    .    .         .  ]
    [ wnᵀ ]   [ wn1  wn2  wn3  ...  wnm ]

Page 58: Artificial Neural Networks : An Introduction

Weights contd…

• wij is the weight from processing element “i” (source node) to processing element “j” (destination node)

[Figure: inputs X1, ..., Xi, ..., Xn connect to neuron Yj through weights w1j, ..., wij, ..., wnj; a bias bj enters through a unit input.]

With x0 = 1 and w0j = bj, the net input to neuron j is

  y_inj = Σ_{i=0..n} xi·wij
        = x0·w0j + x1·w1j + x2·w2j + ... + xn·wnj
        = bj + Σ_{i=1..n} xi·wij
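The net-input formula above, with the bias folded in as x0 = 1, w0j = bj, can be sketched as (the numeric values are illustrative assumptions):

```python
# Net input to neuron j: y_inj = bj + sum over i of xi * wij.
def net_input(x, w_j, b_j):
    return b_j + sum(xi * wij for xi, wij in zip(x, w_j))

x   = [0.2, 0.4]     # inputs x1..xn arriving at neuron j
w_j = [0.5, -1.0]    # weights w1j..wnj
b_j = 0.3            # bias bj (equivalently w0j with x0 = 1)
print(net_input(x, w_j, b_j))  # 0.3 + 0.1 - 0.4, i.e. approximately 0
```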

Page 59: Artificial Neural Networks : An Introduction

Activation Functions

• Used to calculate the output response of a neuron.

• The sum of the weighted input signals is applied to an activation function to obtain the response.

• Activation functions can be linear or non-linear.

• Already dealt with:
  – Identity function
  – Single/binary step function
  – Discrete/continuous sigmoidal function

Page 60: Artificial Neural Networks : An Introduction

Bias

• Bias acts like another weight. It’s included by adding a component x0 = 1 to the input vector X.

• X = (1, X1, X2, ..., Xi, ..., Xn)

• Bias is of two types:
  – Positive bias: increases the net input
  – Negative bias: decreases the net input

Page 61: Artificial Neural Networks : An Introduction

Why Bias is required?

• The relationship between input and output is given by the equation of a straight line, y = mx + c

[Figure: input X passes through a node to output Y; the bias C acts as the intercept in y = mx + C.]

Page 62: Artificial Neural Networks : An Introduction

Threshold

• A set value based upon which the final output of the network is calculated

• Used in the activation function

• The activation function using a threshold θ can be defined as

  f(net) = 1 if net ≥ θ, -1 if net < θ

Page 63: Artificial Neural Networks : An Introduction

Learning rate

• Denoted by α.

• Used to control the amount of weight adjustment at each step of training

• Learning rate ranging from 0 to 1 determines the rate of learning in each time step

Page 64: Artificial Neural Networks : An Introduction

Other terminologies

• Momentum factor:
  – Used to speed convergence; the momentum factor is added to the weight-updation process.

• Vigilance parameter:
  – Denoted by ρ
  – Used to control the degree of similarity required for patterns to be assigned to the same cluster

Page 65: Artificial Neural Networks : An Introduction

Neural Network Learning rules

c – learning constant

Page 66: Artificial Neural Networks : An Introduction

Hebbian Learning Rule

• The learning signal is equal to the neuron’s output

FEED FORWARD UNSUPERVISED LEARNING

Page 67: Artificial Neural Networks : An Introduction

Features of Hebbian Learning

• Feedforward unsupervised learning

• “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

• If oi·xj is positive the result is an increase in the weight; otherwise the weight decreases

Page 68: Artificial Neural Networks : An Introduction
Page 69: Artificial Neural Networks : An Introduction

Perceptron Learning rule

• The learning signal is the difference between the desired and the actual neuron response
• Learning is supervised

Page 70: Artificial Neural Networks : An Introduction

Example

Page 71: Artificial Neural Networks : An Introduction

Quiz

• Suppose we have a 2D input x = (0.5, -0.5) connected to a neuron with weights w = (2, -1) and bias b = 0.5. Furthermore, the target for x is t = 0. In this case we use a binary threshold neuron for the output, so that

  y = 1 if xᵀw + b ≥ 0, and 0 otherwise

What will be the weights and bias after 1 iteration of the perceptron learning algorithm?

  (a) w = (1.5, -0.5), b = -1.5
  (b) w = (1.5, -0.5), b = -0.5
  (c) w = (2.5, -1.5), b = 0.5
  (d) w = (-1.5, 0.5), b = 1.5
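The quiz can be worked in code. This sketch assumes the standard perceptron update with learning rate 1: w ← w + (t − y)·x and b ← b + (t − y).

```python
# One iteration of the perceptron learning rule on a binary threshold unit.
def perceptron_step(w, b, x, t):
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
    err = t - y                      # learning signal: desired minus actual
    return [wi + err * xi for wi, xi in zip(w, x)], b + err

w, b = perceptron_step([2, -1], 0.5, [0.5, -0.5], t=0)
print(w, b)  # [1.5, -0.5] -0.5
```

Here xᵀw + b = 1 + 0.5 + 0.5 = 2 ≥ 0, so y = 1 while t = 0; the error of −1 shifts the weights and bias down, giving answer (b).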

Page 72: Artificial Neural Networks : An Introduction

Delta Learning Rule

• Only valid for continuous activation functions
• Used in supervised training mode
• The learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training patterns

Page 73: Artificial Neural Networks : An Introduction

Delta Learning Rule Contd.

Learning rule is derived from the condition of least squared error.

Calculating the gradient vector with respect to wi

Minimization of error requires the weight changes to be in the negative gradient direction
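The negative-gradient step above can be sketched for a single neuron. The update Δwi = c·(d − o)·f′(net)·xi is the standard delta-rule form; the logistic activation (so f′(net) = o·(1 − o)) and the numeric values are illustrative assumptions.

```python
import math

# One delta-rule update for a neuron with a continuous (logistic) activation.
def delta_rule_step(w, x, d, c=0.1):
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1 / (1 + math.exp(-net))       # o = f(net)
    grad = (d - o) * o * (1 - o)       # (d - o) * f'(net): the "delta"
    return [wi + c * grad * xi for wi, xi in zip(w, x)]

w_new = delta_rule_step([0.5, -0.5], [1.0, 1.0], d=1.0)
print(w_new)  # both weights nudged upward, toward the target d = 1
```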

Page 74: Artificial Neural Networks : An Introduction

Widrow-Hoff learning Rule

• Also called the least mean square (LMS) learning rule
• Introduced by Widrow (1962); used in supervised learning
• Independent of the activation function
• Special case of the delta learning rule wherein the activation function is the identity function, i.e. f(net) = net
• Minimizes the squared error between the desired output value di and neti

Page 75: Artificial Neural Networks : An Introduction

Winner-Take-All learning rules

Page 76: Artificial Neural Networks : An Introduction

Winner-Take-All Learning rule Contd…

• Can be explained for a layer of neurons
• An example of competitive learning, used for unsupervised network training
• Learning is based on the premise that one of the neurons in the layer has the maximum response to the input x
• This neuron is declared the winner, and its weight vector is the one that gets adjusted

Page 77: Artificial Neural Networks : An Introduction
Page 78: Artificial Neural Networks : An Introduction

Summary of learning rules

Page 79: Artificial Neural Networks : An Introduction

Linear Separability

• Separation of the input space into regions is based on whether the network response is positive or negative

• The line of separation is called the linearly separable line.

• Examples:
  – The AND and OR functions are linearly separable
  – The EXOR function is linearly inseparable
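The separable/inseparable distinction can be demonstrated with a single threshold unit. The AND weights below are one valid assumed choice; for XOR, no weight/bias choice works, and the attempt below (an OR-like separating line) necessarily misclassifies a pattern.

```python
# A single threshold unit defines one separating line in the input plane.
def unit(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND is linearly separable: w = (1, 1), b = -1.5 classifies all 4 patterns.
and_ok = all(unit(x, [1, 1], -1.5) == (x[0] & x[1]) for x in inputs)

# XOR is not: this (or any other) single line fails on at least one pattern.
xor_ok = all(unit(x, [1, 1], -0.5) == (x[0] ^ x[1]) for x in inputs)

print(and_ok, xor_ok)  # True False
```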

Page 80: Artificial Neural Networks : An Introduction

Hebb Network

• The Hebb learning rule is the simplest one
• Learning in the brain is performed by changes in the synaptic gap
• When an axon of cell A is near enough to excite cell B and repeatedly keeps firing it, some growth process takes place in one or both cells
• According to the Hebb rule, the weight vector increases proportionately to the product of the input and the learning signal:

  wi(new) = wi(old) + xi·y

Page 81: Artificial Neural Networks : An Introduction

Flow chart of the Hebb training algorithm

1. Start
2. Initialize weights (and bias)
3. For each training pair s : t
   a. Activate the input: xi = si
   b. Activate the output: y = t
   c. Weight update: wi(new) = wi(old) + xi·y
   d. Bias update: b(new) = b(old) + y
4. Stop when all training pairs have been processed
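The Hebb training steps above can be sketched as code. The bipolar AND training set is a standard illustration (an assumption here, not taken from the slides), with weights and bias initialized to zero.

```python
# One pass of Hebb training over a set of (input, target) pairs.
def hebb_train(samples):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for s, t in samples:                              # for each pair s : t
        x, y = s, t                                   # activate input and output
        w = [wi + xi * y for wi, xi in zip(w, x)]     # wi(new) = wi(old) + xi*y
        b = b + y                                     # b(new) = b(old) + y
    return w, b

# Bipolar AND patterns: inputs and targets in {-1, +1}.
AND = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = hebb_train(AND)
print(w, b)  # [2.0, 2.0] -2.0
```

With these learned values, net = 2·x1 + 2·x2 − 2 is positive only for x = (1, 1), so the net realizes the AND function.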