Perceptron


Page 1:

Perceptron

Page 2:

Inner product (scalar)
Perceptron
Perceptron learning rule
XOR problem
Linearly separable patterns
Gradient descent
Stochastic approximation to gradient descent
LMS
Adaline

Page 3:

Inner-product

A measure of the projection of one vector onto another

$$\text{net} = \langle \vec{w}, \vec{x} \rangle = \|\vec{w}\| \cdot \|\vec{x}\| \cdot \cos(\theta)$$

$$\text{net} = \sum_{i=1}^{n} w_i x_i$$
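A minimal numerical sketch (plain Python; the vectors are illustrative) showing that the two expressions for net agree:

```python
import math

def dot(w, x):
    """Inner product: net = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def norm(v):
    return math.sqrt(dot(v, v))

w = [2.0, 1.0]
x = [1.0, 3.0]

net = dot(w, x)                            # 2*1 + 1*3 = 5
cos_theta = net / (norm(w) * norm(x))      # cosine of the angle between w and x
assert abs(net - norm(w) * norm(x) * cos_theta) < 1e-12
print(net, cos_theta)
```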

Page 4:

Example


Page 5:

Activation function

$$o = f(\text{net}) = f\!\left(\sum_{i=1}^{n} w_i x_i\right)$$


$$f(x) := \operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Page 6:


Sigmoid function:

$$f(x) := \sigma(x) = \frac{1}{1 + e^{-ax}}$$

Threshold function:

$$f(x) := \varphi(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Piecewise-linear function:

$$f(x) := \varphi(x) = \begin{cases} 1 & \text{if } x \ge 0.5 \\ x & \text{if } -0.5 < x < 0.5 \\ 0 & \text{if } x \le -0.5 \end{cases}$$
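The activation functions above can be sketched directly; a minimal illustration (plain Python, the slope parameter a and the test values are arbitrary):

```python
import math

def sgn(x):
    """Signum: +1 if x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def sigmoid(x, a=1.0):
    """Sigmoid with slope parameter a: 1 / (1 + e^(-a*x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def threshold(x):
    """Threshold function: 1 if x >= 0, 0 otherwise."""
    return 1 if x >= 0 else 0

def piecewise_linear(x):
    """1 above 0.5, 0 below -0.5, and x itself in between, as defined above."""
    if x >= 0.5:
        return 1.0
    if x <= -0.5:
        return 0.0
    return x

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, sgn(v), round(sigmoid(v), 3), threshold(v), piecewise_linear(v))
```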

Page 7:

Graphical representation


Page 8:

Similarity to real neurons...


Page 9:

$10^{10}$ neurons, $10^{4}$–$10^{5}$ connections per neuron.

Page 10:

Perceptron: Linear threshold unit (LTU)

(Diagram: inputs x1, …, xn with weights w1, …, wn, a fixed bias input x0 = 1 with weight w0, and output o.)

$$\text{net} = \sum_{i=0}^{n} w_i x_i$$

$$o = \operatorname{sgn}(\text{net}) = \begin{cases} 1 & \text{if net} \ge 0 \\ -1 & \text{if net} < 0 \end{cases}$$

McCulloch-Pitts model of a neuron
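As a small sketch of the LTU computation (plain Python; the example weights are hypothetical and simply implement a logical AND on {0, 1} inputs):

```python
def ltu_output(weights, inputs):
    """Linear threshold unit: prepend the fixed bias input x0 = 1,
    compute net = sum_i w_i * x_i, and threshold with sgn."""
    x = [1.0] + list(inputs)            # x0 = 1 carries the bias weight w0
    net = sum(wi * xi for wi, xi in zip(weights, x))
    return 1 if net >= 0 else -1

# weights = [w0, w1, w2]; this unit fires only when x1 + x2 >= 1.5
print(ltu_output([-1.5, 1.0, 1.0], [1, 1]))   # 1
print(ltu_output([-1.5, 1.0, 1.0], [1, 0]))   # -1
```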

Page 11:

The goal of a perceptron is to correctly classify the set of patterns D = {x1, x2, …, xm} into one of the classes C1 and C2.

The output for class C1 is o = 1 and for C2 it is o = -1.

For n = 2 the decision boundary w0 + w1 x1 + w2 x2 = 0 is a line in the (x1, x2) plane.

Page 12:

Linearly separable patterns


$$o = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_i x_i\right)$$

$$\sum_{i=0}^{n} w_i x_i > 0 \;\text{ for } C_1, \qquad \sum_{i=0}^{n} w_i x_i \le 0 \;\text{ for } C_2$$

Page 13:

Perceptron learning rule: consider linearly separable problems. How do we find appropriate weights? Check whether the output o for a pattern matches the desired value d of its class.

$$w_{\text{new}} = w_{\text{old}} + \Delta w, \qquad \Delta w = \eta\,(d - o)\,x$$

η is called the learning rate, 0 < η ≤ 1.
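A minimal training-loop sketch of this rule (plain Python; the dataset, epoch limit, and function names are illustrative):

```python
def sgn(x):
    return 1 if x >= 0 else -1

def train_perceptron(data, n_inputs, eta=0.1, epochs=100):
    """data: list of (inputs, d) with d in {+1, -1}.
    Applies w <- w + eta * (d - o) * x until an epoch makes no mistakes."""
    w = [0.0] * (n_inputs + 1)                       # w[0] is the bias weight
    for _ in range(epochs):
        errors = 0
        for inputs, d in data:
            x = [1.0] + list(inputs)                 # x0 = 1
            o = sgn(sum(wi * xi for wi, xi in zip(w, x)))
            if o != d:
                errors += 1
                for i in range(len(w)):
                    w[i] += eta * (d - o) * x[i]     # perceptron learning rule
        if errors == 0:
            break
    return w

# Linearly separable example: logical OR with targets in {+1, -1}
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(data, n_inputs=2))
```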

Page 14:

In supervised learning the network has its output compared with known correct answers; supervised learning is learning with a teacher.

(d-o) plays the role of the error signal

Page 15:

Perceptron convergence: the algorithm converges to the correct classification if the training data are linearly separable and η is sufficiently small.

When assigning a value to η we must keep in mind two conflicting requirements:

Averaging of past inputs to provide stable weight estimates, which requires a small η.

Fast adaptation with respect to real changes in the underlying distribution of the process responsible for generating the input vectors x, which requires a large η.

Page 16:

Several nodes


$$o_1 = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_{1i} x_i\right), \qquad o_2 = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_{2i} x_i\right)$$

Page 17:

$$o_j = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_{ji} x_i\right)$$
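With several output nodes the weights form a matrix and each row produces one output; a brief sketch (plain Python, the example weights are hypothetical):

```python
def sgn(x):
    return 1 if x >= 0 else -1

def layer_outputs(W, inputs):
    """W[j][i] is the weight from input i to output node j (column 0 is the bias weight).
    Returns o_j = sgn(sum_i W[j][i] * x_i) for every output node j."""
    x = [1.0] + list(inputs)                         # x0 = 1
    return [sgn(sum(wji * xi for wji, xi in zip(row, x))) for row in W]

W = [[-0.5, 1.0, 0.0],    # node 1 mainly driven by x1
     [-0.5, 0.0, 1.0]]    # node 2 mainly driven by x2
print(layer_outputs(W, [1, 0]))   # [1, -1]
```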

Page 18:

Constructions


Page 19:

Frank Rosenblatt

1928–1971


Page 20:


Page 21:


Page 22:


Page 23:

Rosenblatt's bitter rival and professional nemesis was Marvin Minsky of MIT

Minsky despised Rosenblatt, hated the concept of the perceptron, and wrote several polemics against him

For years Minsky crusaded against Rosenblatt on a very nasty and personal level, including contacting every group who funded Rosenblatt's research to denounce him as a charlatan, hoping to ruin Rosenblatt professionally and to cut off all funding for his research in neural nets

Page 24:

XOR problem and Perceptron

By Minsky and Papert (Perceptrons, 1969)


Page 25:


Page 26:

Gradient descent: to understand it, consider a simpler linear unit, where

$$o = \sum_{i=0}^{n} w_i x_i$$

Let's learn the wi that minimize the squared error over the training set D = {(x1, t1), (x2, t2), …, (xd, td), …, (xm, tm)} (t stands for target).
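The error measure itself is not spelled out on this slide; the standard choice, consistent with the update rule derived below, is the squared error over the training set:

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

where $t_d$ is the target and $o_d$ the linear unit's output for training example $d$.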

Page 27:

The error for different hypotheses, plotted as a function of w0 and w1 (2 dimensions).


Page 28:

We want to move the weight vector in the direction that decreases E:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \vec{w} \leftarrow \vec{w} + \Delta\vec{w}$$

Page 29:

The gradient
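The slide's figure is unavailable; assuming the usual definitions, the gradient collects the partial derivatives of E and the training rule steps against it:

$$\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]$$

$$\Delta \vec{w} = -\eta\, \nabla E(\vec{w}), \qquad \Delta w_i = -\eta\, \frac{\partial E}{\partial w_i}$$

The minus sign moves the weight vector in the direction of steepest decrease of E.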

Page 30:

Differentiating E

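The differentiation shown in the missing figure follows the standard computation; assuming the squared error E defined earlier and $o_d = \vec{w} \cdot \vec{x}_d$:

$$\frac{\partial E}{\partial w_i}
= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2
= \sum_{d \in D} (t_d - o_d)\, \frac{\partial}{\partial w_i}\,(t_d - \vec{w} \cdot \vec{x}_d)
= -\sum_{d \in D} (t_d - o_d)\, x_{id}$$

so that $\Delta w_i = -\eta\, \partial E / \partial w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$, which is the update rule on the next slide.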

Page 31:

Update rule for gradient descent:

$$\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$$

Page 32:

Gradient-Descent(training_examples, η)

Each training example is a pair of the form ⟨(x1, …, xn), t⟩, where (x1, …, xn) is the vector of input values and t is the target output value; η is the learning rate (e.g. 0.1).

Initialize each wi to some small random value.
Until the termination condition is met, do:
  Initialize each Δwi to zero.
  For each ⟨(x1, …, xn), t⟩ in training_examples, do:
    Input the instance (x1, …, xn) to the linear unit and compute the output o.
    For each linear unit weight wi, do: Δwi ← Δwi + η (t − o) xi
  For each linear unit weight wi, do: wi ← wi + Δwi

This batch update implements $\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$.
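A runnable sketch of the procedure above (plain Python; the example data, the fixed-epoch termination condition, and the function name are illustrative):

```python
import random

def gradient_descent(training_examples, eta=0.1, epochs=1000):
    """Batch gradient descent for a linear unit.
    training_examples: list of (x, t) where x is a list of input values."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]   # small random weights, w[0] = bias
    for _ in range(epochs):                                   # termination: fixed number of epochs
        delta_w = [0.0] * (n + 1)
        for x, t in training_examples:
            xb = [1.0] + list(x)                              # x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xb))         # linear unit output
            for i in range(n + 1):
                delta_w[i] += eta * (t - o) * xb[i]           # accumulate over all examples
        for i in range(n + 1):
            w[i] += delta_w[i]                                # batch update
    return w

# Fit the linear target t = 1 + 2*x1
examples = [([x / 10.0], 1.0 + 2.0 * x / 10.0) for x in range(11)]
print(gradient_descent(examples, eta=0.05, epochs=2000))      # weights approach [1.0, 2.0]
```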

Page 33:


Page 34:

Stochastic Approximation to gradient descent

The gradient descent training rule updates the weights after summing over all the training examples in D.

Stochastic gradient approximates gradient descent by updating the weights incrementally: the error is calculated for each example. This is known as the delta rule or LMS (least mean square) weight update, or the Adaline rule; it is used for adaptive filters (Widrow and Hoff, 1960).

$$\Delta w_i = \eta\,(t - o)\,x_i$$
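The incremental version updates after every single example rather than accumulating over D; a sketch in the same style as the batch version above (illustrative data and parameters):

```python
import random

def lms(training_examples, eta=0.05, epochs=500):
    """Delta rule / LMS: apply w_i <- w_i + eta * (t - o) * x_i after each example."""
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in training_examples:
            xb = [1.0] + list(x)                      # x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xb))
            for i in range(n + 1):
                w[i] += eta * (t - o) * xb[i]         # incremental (stochastic) update
    return w

examples = [([x / 10.0], 1.0 + 2.0 * x / 10.0) for x in range(11)]
print(lms(examples))                                  # weights approach [1.0, 2.0]
```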

Page 35:

LMS estimate of the weight vector: there is no steepest descent and no well-defined trajectory in weight space; instead, a random trajectory (stochastic gradient descent). It converges only asymptotically toward the minimum error, but can approximate gradient descent arbitrarily closely if η is made small enough.

Page 36:

Summary: the perceptron training rule is guaranteed to succeed if the training examples are linearly separable and the learning rate is sufficiently small.

The linear unit training rule, using gradient descent or LMS, is guaranteed to converge to the hypothesis with minimum squared error, given a sufficiently small learning rate, even when the training data contain noise and even when the training data are not separable by H.

Page 37:

Inner product (scalar)
Perceptron
Perceptron learning rule
XOR problem
Linearly separable patterns
Gradient descent
Stochastic approximation to gradient descent
LMS
Adaline

Page 38:

XOR? Multi-layer networks

input layer

hidden layer

output layer
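XOR is not linearly separable, but a hidden layer of two threshold units makes it separable for the output unit; a sketch with hand-picked (not learned) weights:

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    """Two hidden threshold units feeding one output unit.
    h1 computes x1 OR x2, h2 computes x1 AND x2; the output fires for h1 AND NOT h2."""
    h1 = step(x1 + x2 - 0.5)          # OR
    h2 = step(x1 + x2 - 1.5)          # AND
    return step(h1 - 2 * h2 - 0.5)    # OR and not AND == XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))    # prints the XOR truth table
```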