THE PERCEPTRON The McCulloch-Pitts Neuronangom.myweb.cs.uwindsor.ca/teaching/cs574/le2.pdf · THE...

THE PERCEPTRON

The McCulloch-Pitts Neuron

• The first mathematical model of a neuron [WarrenMcCulloch and Walter Pitts, 1943]

• Binary activation: fires (1) or not fires (0)

• Excitatory inputs: the a’s, andInhibitory inputs: the b’s

• Unit weights and fixed threshold θ

• Absolute inhibition

ct+1 =

1 If∑n

i=0 ai,t ≥ θ and b1,t = · · · = bm,t = 0

0 Otherwise

Computing with McCulloch-Pitts Neurons

1b 0 NOT

Any task or phenomenon that can be represented as

a logic function can be modelled by a network of

MP-neurons

• {OR, AND, NOT} is functionally complete

• Any Boolean function can be implemented using

OR, AND and NOT

• Canonical forms: CSOP or CPOS forms

• MP-neurons ⇔ Finite State Automata2

Limitation of MP-neurons and Solution

• Problems with MP-neurons

– Weights and thresholds are analytically determined.

Cannot learn

– Very difficult to minimize size of a network

– What about non-discrete and/or non-binary tasks?

• Perceptron solution [Rosenblatt, 1958]

– Weights and thresholds can be determined ana-

lytically or by a learning algorithm

– Continuous, bipolar and multiple-valued versions

– Efficient minimization heuristics exist

nwΣ θ

Perceptron

x0 = 1

θw0= −

• Architecture

– Input: ~x = (x0 = 1, x1, . . . , xn)

– Weight: ~w = (w0 = −θ, w1, . . . , wn), θ = bias

– Net input: y = ~w~x =∑n

i=0 wixi

– Output f(~x) = g(~w~x) =

{0 If ~w~x < 01 If ~w~x ≥ 0

• Pattern classification

• Supervised learning

• Error-correction learning

Perceptron Analysis

• Perceptron’s decision boundary

w1x1 + · · ·+ wnxn = θ

w0x0 + w1x1 + · · ·+ wnxn = 0

Separating hyperplane

Hyperplane direction

• All points

– below the hyperplane have value 0

– on the hyperplane have the same value

– above the hyperplane have value 1

Perceptron Analysis

(continued)

• Linear Separability

– A problem (or task or set of examples) is lin-

early separable if there exists a hyperplane w0x0+

w1x1+ · · ·+wnxn = 0 that separates the examples

into two distinct classes

– Perceptron can only learn (compute) tasks that

are linearly separable.

– The weight vector ~w of the perceptron correspond

to the coefficients of the separating line

• Non-Linear Separability

– Limitations of the perceptron: many real-world

problems are highly non-linear

– Simpe Boolean functions:

∗ XOR, EQUALITY, . . . etc.

∗ Linear, parity, symmetric or . . . functions

Perceptron Learning Rule

• Test problem

– Let the set of training examples be

[~x1 = (1,2), d1 = 1]

[~x2 = (−1,2), d2 = 0]

[~x3 = (0,−1), d3 = 0]

– The bias (or threshold) be b = 0

– The initial weight vector be ~w = (1,0.8)

We want to obtain a learning algorithm that finds a weight vector~w which will correctly classify (separate) the examples.

(continued)

• First input ~x1 is misclassified with positive error. Whatto do?

• Idea: move hyperplane to separating position

• Solution:

– Move ~w closer to ~x1: add ~x1 to ~w.

∗ ~w = ~w + ~x1

– First rule: positive error rule

If d = 1 and a = 0 then ~wnew = ~wold + ~x

(continued)

• Second input ~x2 is misclassified with negative error

• Solution:

– Move ~w away from ~x2: substract ~x2 from ~w.

∗ ~w = ~w − ~x2

– Second rule: negative error rule

If d = 0 and a = 1 then ~wnew = ~wold − ~x

(continued)

• Third input ~x3 is misclassified with negative error

• Move ~w away from to ~x3: ~w = ~w − ~x3

• The perceptron will correctly classify inputs ~x1, ~x2, ~x3

if presented to it again. There will be no errors

• Third rule: no error rule

If d = a then ~wnew = ~wold

(continued)

• Unified learning rule

~wnew = ~wold + δ~x = ~wold + (d− a)~x

• With learning rate η

~wnew = ~wold + ηδ~x = ~wold + η(d− a)~x

• Choice of learning rate η

– Too large: learning oscillates

– Too small: very slow learning

– 0 < η ≤ 1. Popular choices:

∗ η = 0.5

∗ η = 1

– Variable learning rate η = |~w~x||~x2|

– Adaptive learning rate

– . . . etc.

Perceptron Learning Algorithm

Initialization: ~w0 = ~0;

t = 0;

Repeat

t = t + 1;

Error = 0;

For each training example [~x, d~x] do

net = ~w · ~x;a~x = g(net);

δ~x = d~x − a~x;

Error = Error + |δ~x|;~wt+1 = ~wt + η · δ~x · ~x;{

or equivalently,

For 0 ≤ i ≤ n

wi,t+1 = wi,t + η · δ~x · xi;

}Until Error = 0;

Save last weight vector;

• Perceptron convergence theorem: [M. Minsky and

S. Papert, 1969] The perceptron learning algorithm

terminates if and only if the task is linearly separable

• Cannot learn non-linearly separable functions

Perceptron Learning Algorithm

(continued)

• Termination criteria

– Assured for small enough η and l.s. functions

– For non-l.s. functions: halt when number of mis-

classifications is minimal

• Problem representation

– Non-numeric inputs: encode into numeric form

– Multiple-class problem:

∗ Use single-layer network

∗ Each output node corresponds to one class

∗ A u-neuron network can classify inputs into 2u

classes

• Variations of perceptron

– Bipolar vs. binary encodings

– Threshold vs. signum functions

Pocket Algorithm

• Robust classification for linearly non-separable prob-lems?

• Find ~w such that such that the number of misclassi-fications is as small as possible.

Initialization: ~w0 = PerceptronLearning;

Error ~w0= number of misclassifications of ~w0;

Pocket = ~w0;

t = 0;Repeat

t = t + 1;~wt = PerceptronLearning;

If Error ~wt< Error ~wt−1

Pocket = ~wt;

Until t = MaxIterations;

Best weight so far is stored in Pocket;

• Initial weight in PerceptronLearning should be random

• Presentation of training examples inPerceptronLearning should be random

• Slow but robust learning for non-separable tasks

Adaline

nwΣ f

x0 = 1

• Architecture

– Input: ~x = (x0 = 1, x1, . . . , xn)

– Weight: ~w = (w0 = −θ, w1, . . . , wn), θ = bias

– Net input: y = ~w~x =∑n

i=0 wixi

– Output f(~x) = g(~w~x) = ~w~x

• Pattern classification

• Supervised learning

• Error-correction learning

Adaline Analysis

• Adaline’s decision boundary

w0x0 + w1x1 + · · ·+ wnxn = 0

Separating hyperplane

Hyperplane direction

• The Adaline

– has a decision boundary like the perceptron

– can be used to classify objects into two categories

– has same limitation as the perceptron

Adaline Learning Principle

• Data fitting (or linear regression)

– Set of measurements: {(x, dx)}

– Find w and b such that

dx ≈ wx + b

or more specifically,

di = wxi + b + εi = yi + εi

∗ εi = instantaneous error

∗ yi = linearly fitted value

∗ w = line slope, b = d-axis intercept (or bias)

0 2 4 6 8 10 12

(continued)

• Best fit problem: find the best choice of (~w, b) such

that the fitted line passes closest to all points

• Solution: Least squares

– Minimize sum of squared errors (SSE) or mean of

squared errors (MSE)

– Error ε~x = d~x − d̃~x where d̃~x = ~w~x + b

– MSE:

ε2~xi

0 2 4 6 8 10 12

(continued)

• The minimum MSE, called the least mean square

(LMS) can be obtained analytically:

δ ~w= 0

δb= 0

and solve for ~w and b

• Pattern classification can be interpreted as a linear

• LMS is difficult to obtain for larger dimensions (com-

plex formula) and larger data sets

• Adaline:

– Learns by minimizing the MSE

– Not sensitive to noise

– Powerful and robust learning

Adaline Learning Algorithm

• Gradient descent

– A learning example: [~x, d~x]

– Actual output: net~x =

– Desired output: d~x

– Squared error: E~x = (d~x − net~x)2

– Gradient of E~x:

∇E~x =δE~x

δ ~w= (

δw0,δE~x

δw1, . . . ,

– E~x is minimal if and only if ∇E~x = 0

– Negative gradient of E~x:

−∇E~x

gives direction of steepest descent to the mini-

– Gradient descent:

∆~w = −η∇E~x = −δE~x

(continued)

• Widrow-Hoff delta rule

δwi= 2(d~x − net~x)

δ(−net~x)

δ ~wi

= (d~x − net~x)δ(−∑n

j=0 wjxj)

δ ~wi

= −(d~x − net~x)xi

• ⇒ Learning rule:

~wnew = ~wold + η(d~x − net~x)~x

(continued)

Initialization: ~w0 = ~0;

t = 0;

Repeat

t = t + 1;

For each training example [~x, d~x] do

net~x = ~w · ~x;a~x = g(net~x) = net~x;

δ~x = d~x − a~x;

~wt+1 = ~wt + η · δ~x · ~x;{

or equivalently,

For 0 ≤ i ≤ n

wi,t+1 = wi,t + η · δ~x · xi;

}Until MSE(~w) is minimal;

Save last weight vector;

• Can be used for function approximation task as well

THE PERCEPTRON The McCulloch-Pitts Neuronangom.myweb.cs.uwindsor.ca/teaching/cs574/le2.pdf · THE...

Documents

Machine Learning: Connectionist McCulloch-Pitts Neuron Perceptrons Multilayer Networks Support Vector Machines Feedback Networks Hopfield Networks

Redes Neurais: A Terceira Geração · Gerações de Redes Neurais Primeira Geração Neurônios com saída binária Exemplo: neurônio MCP (McCulloch & Pitts, 1943) Segunda Geração

Sesión 009. “Del cerebro al mercado: Redes Neuronales ...umh1480.edu.umh.es/wp-content/uploads/sites/44/... · McCulloch y Pitts en términos de un modelo computacional de actividad

2.1 2.1 Breve reseña histórica de la Neurocomputación a)Los comienzos (década de los cuarenta) El trabajo de Warren McCulloch y Walter Pitts publicado

History of AI Image source. Early excitement 1940s McCulloch & Pitts neurons; Hebb’s learning rule 1950 Turing’s “Computing Machinery and Intelligence”

Introduction - CAS– General theories of learning, vision, conditioning – No specific mathematical models of neuron operation • 1940s: Hebb, McCulloch and Pitts – Mechanism

INFORMATION Gefter (4).pdf98 INFORMATION | ARTIFICIAL INTELLIGENCE ESTATE OF FRANCIS BELLO / SCIENCE SOURCE PITTS HAD FOUND in McCulloch everything he had needed—acceptance, friendship,

Modelos neuronales de tercera - FMR - Iniciofemexrobotica.org/eir2015/wp-content/uploads/2015/01/Modelos... · Utiliza las neuronas McCulloch-Pitts como unidad computacional. En torno

UNIVERSITY POLYTECHNIC OF MADRID FACULTY …oa.upm.es/42680/1/KAMAL_ALRAWI.pdf · UNIVERSITY POLYTECHNIC OF MADRID FACULTY OF COMPUTER SCIENCE ... 1943 when McCulloch and Pitts built

TESIS DE GRADO - bibliotecadigital.umsa.bo:8080bibliotecadigital.umsa.bo:8080/rddu/bitstream/123456789/5336/1/T... · Modelo de neurona McCulloch-Pitts ..... 17 Figura 2.4. Tipología

El Perceptrón 1958 - El psicólogo Frank Ronsenblant desarrolló un modelo simple de neurona basado en el modelo de McCulloch y Pitts que utilizaba una

The Perceptron. Prehistory W.S. McCulloch & W. Pitts (1943). A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics,

Utilización de Redes Neuronales Recurrentes para el ... · Ilustración 5: Neurona original de Mcculloch-Pitts. 11. Los pesos asociados a las neuronas pueden actualizarse de muchas

Interconnect Acceleration for publication for Machine ... · Deep Learning is not new. o. 1943: “A logical calculus of the ideas immanent in nervous activity”, McCulloch & Pitts

Topik...2019/05/02 · Topik • Arsitektur JST • Fungsi aktivasi • McCulloch-Pitts neuron • Decision boundary • Bias Arsitektur JST • JST umumnya divisualisasikan dalam

1 Lecture 5 Neural Control. 2 History Early stages 1943 McCulloch-Pitts: neuron, origins 1948 Wiener: cybernatics 1949 Hebb: learning rule 1958

Modelos neuronales de tercera - FMR - Inicio · Utiliza las neuronas McCulloch-Pitts como unidad computacional. En torno a ellas existen muchos modelos de redes neuronales como los

Simple Neural Nets for Pattern Classification: McCulloch-Pittsjkalita/work/cs587/2014/02ThresholdLogic.pdf · Simple Neural Nets for Pattern Classification: McCulloch-Pitts ... designers

Redes Neuronales - elo.utfsm.clelo328/PDI21_RedesNeuronales.pdf · Las c élulas operan en lapsos discretos. Una red neuronal de c élulas de McCulloch -Pitts tiene la ... el tama

“NETtalk en español” - 148.206.53.84148.206.53.84/tesiuami/UAMI12946.pdf · Por otro lado, en el desarrollo del modelo de una neurona artificial, McCulloch y Pitts combinaron