Chapter4_Part3: Differential Hebbian Learning & Differential Competitive Learning

Tutor: Prof. Gao    Reporter: WangYing

2006.10.30

Review

Signal Hebbian learning law:

$\dot m_{ij} = -m_{ij} + S_i S_j$

Competitive learning law:

$\dot m_{ij} = S_j \left[ S_i - m_{ij} \right]$


Part I: Differential Hebbian Learning

Learning law:

$\dot m_{ij} = -m_{ij} + \dot S_i \dot S_j + S_i S_j$

Its simpler version:

$\dot m_{ij} = -m_{ij} + \dot S_i \dot S_j$

Hebbian correlations promote spurious causal associations among concurrently active units. Differential correlations estimate the concurrent, and presumably causal, variation among active units.
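For concreteness, here is a minimal discrete-time sketch of the two laws above, assuming a forward-Euler step of size `dt`; the helper names and example signal values are illustrative, not part of the original formulation.

```python
def diff_hebb_step(m, S_i, S_j, dS_i, dS_j, dt=0.01):
    """One Euler step of the full law  dm/dt = -m + dS_i*dS_j + S_i*S_j."""
    return m + dt * (-m + dS_i * dS_j + S_i * S_j)

def diff_hebb_simple_step(m, dS_i, dS_j, dt=0.01):
    """One Euler step of the simpler law  dm/dt = -m + dS_i*dS_j."""
    return m + dt * (-m + dS_i * dS_j)

# Example: concurrently rising signals reinforce the synapse.
m = 0.0
m = diff_hebb_step(m, S_i=0.8, S_j=0.7, dS_i=0.2, dS_j=0.3)
print(m)   # about 0.0062
```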


Differential Hebbian Learning

- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning


Fuzzy Cognitive Maps (FCMs)

FCMs are fuzzy signed directed graphs with feedback. They model the world as a collection of classes and causal relations between classes.

The directed edge $e_{ij}$ from causal concept $C_i$ to concept $C_j$ measures how much $C_i$ causes $C_j$.

Example: $C_i$: sales of computers; $C_j$: profits.


Fuzzy Cognitive Map of South African Politics

[FCM diagram: nine concept nodes C1-C9 linked by signed causal edges. The labeled concepts include foreign investment, mining, employment of blacks, white racial radicalism, job reservation laws, black tribal unity, strength of apartheid government, and National Party support.]


Causal Connection Matrix

E =
      C1  C2  C3  C4  C5  C6  C7  C8  C9
C1     0   1   1   0   0   0   0   1   1
C2     0   0   1   0   0   0   0   1   0
C3     0   0   0   1   0   1   0   1   1
C4     0   0   0   0   0   1   1   0   1
C5     0   1   1   0   0   1   1   0   0
C6     0   0   0   1   0   0   1   1   0
C7     0   0   0   0   1   0   0   1   0
C8     0   0   0   0   0   0   1   0   0
C9     0   0   0   0   1   0   0   1   0

(Rows index the causal concept C_i, columns the effect concept C_j; entry e_ij is the causal edge value.)


TAM recall process

We start with the foreign-investment policy:

$C_1 = (1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0)$

Then

$C_1 E = (0\ 1\ 1\ 0\ 0\ 0\ 0\ 1\ 1) \rightarrow (1\ 1\ 1\ 0\ 0\ 0\ 0\ 1\ 1) = C_2$

The arrow indicates the threshold operation with, say, 1/2 as the threshold value. So zero causal input produces zero causal output. $C_2$ contains $C_1 = 1$ because we are testing the foreign-investment policy option. Next

$C_2 E = (0\ 1\ 2\ 1\ 1\ 1\ 1\ 4\ 1) \rightarrow (1\ 1\ 1\ 1\ 0\ 0\ 0\ 1\ 1) = C_3$

Next

$C_3 E = (0\ 1\ 2\ 1\ 1\ 0\ 0\ 4\ 1) \rightarrow (1\ 1\ 1\ 1\ 0\ 0\ 0\ 1\ 1) = C_3$

So $C_3$ is a fixed point of the FCM dynamical system.
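Since FCM inference reduces to repeated vector-matrix multiplication and thresholding, the recall procedure above can be sketched in a few lines of Python. This is a minimal sketch, assuming a threshold of 1/2 and clamping the tested policy concept on at each step; the matrix values are the ones listed above.

```python
import numpy as np

# Causal connection matrix E from the slide (rows = cause, columns = effect).
E = np.array([
    [0, 1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 0],
])

def fcm_recall(E, clamp=0, threshold=0.5, max_steps=20):
    """Iterate C <- threshold(C @ E), keeping the tested concept clamped on,
    until the state vector stops changing (a fixed point) or the step limit."""
    C = np.zeros(E.shape[0], dtype=int)
    C[clamp] = 1
    for _ in range(max_steps):
        nxt = (C @ E > threshold).astype(int)
        nxt[clamp] = 1            # we are testing this policy option
        if np.array_equal(nxt, C):
            return C              # fixed point reached
        C = nxt
    return C

print(fcm_recall(E, clamp=0))     # recall triggered by the foreign-investment policy
```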


Strengths and weaknesses of FCM

Advantages

Experts can: 1. represent factual and evaluative concepts in an interactive framework; 2. quickly draw FCM pictures or respond to questionnaires; 3. consent or dissent to the local causal structure and perhaps the global equilibrations.

The FCM knowledge representation and inferencing structure reduces to simple vector-matrix operations, favors integrated-circuit implementation, and allows extension to neural, statistical, or dynamical-systems techniques.

Disadvantages

An FCM equally encodes the expert's knowledge or ignorance, wisdom or prejudice. Since different experts differ in how they assign causal strengths to edges, and in which concepts they deem causally relevant, the FCM seems merely to encode its designer's biases, and may not even encode them accurately.


Combination of FCMs

We combine arbitrary FCM connection matrices $E_1, \dots, E_k$ by adding augmented FCM matrices $F_1, \dots, F_k$. We add the $F_i$ pointwise to yield the combined FCM matrix $F$:

$F = \sum_i F_i$

Some experts may be more credible than others. We can weight each expert with a nonnegative credibility weight $w_i$ by multiplicatively weighting the expert's augmented FCM matrix:

$F = \sum_i w_i F_i$

Adding FCM matrices represents a simple form of causal learning.
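A minimal sketch of this combination step, assuming each expert's matrix has already been zero-padded ("augmented") to a common concept set; the matrices and credibility weights below are illustrative values only.

```python
import numpy as np

def combine_fcms(augmented_mats, weights=None):
    """Pointwise (optionally credibility-weighted) sum of augmented FCM matrices."""
    if weights is None:
        weights = np.ones(len(augmented_mats))
    return sum(w * F for w, F in zip(weights, augmented_mats))

# Two toy 3-concept experts and their credibility weights.
F1 = np.array([[0, 1, 0], [0, 0, 1], [-1, 0, 0]])
F2 = np.array([[0, 1, 1], [0, 0, 0], [0, -1, 0]])
F = combine_fcms([F1, F2], weights=[1.0, 0.5])
print(F)
```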


Differential Hebbian Learning

- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning


Adaptive Causal Inference

We infer causality between variables when we observe concomitant variation or lagged variation between them. If B changes when A changes, we suspect a causal relationship. The more correlated the changes, the more we suspect a causal relationship, or, more accurately, the stronger the causal relationship we suspect.

Time derivatives measure changes. Products of derivatives correlate changes. This leads to the simplest differential Hebbian learning law for FCM edges:

$\dot e_{ij} = -e_{ij} + \dot C_i \dot C_j$


Adaptive Causal Inference

The passive decay term $-e_{ij}$ forces zero causality between unchanging concepts.

The concomitant-variation term $\dot C_i \dot C_j$ indicates causal increase or decrease according to joint concept movement. If $C_i$ and $C_j$ both increase or both decrease, the product of derivatives is positive; if one increases while the other decreases, it is negative.

The concomitant-variation term provides a simple causal "arrow of time", as the sketch below illustrates.
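Below is a small illustrative sketch of this edge-learning rule, assuming two toy concept time series in which $C_j$ tracks (or opposes) $C_i$ with a short lag; the signals, step size, and Euler discretization are assumptions made only for illustration.

```python
import numpy as np

def learn_edge(C_i, C_j, dt=0.01):
    """Euler-integrate  de/dt = -e + dC_i * dC_j  over two concept time series."""
    dC_i, dC_j = np.gradient(C_i, dt), np.gradient(C_j, dt)
    e = 0.0
    for a, b in zip(dC_i, dC_j):
        e += dt * (-e + a * b)
    return e

t = np.arange(0.0, 20.0, 0.01)
C_i = 0.5 + 0.4 * np.sin(t)               # driving concept
C_j_pos = 0.5 + 0.4 * np.sin(t - 0.2)     # moves with C_i    -> positive edge
C_j_neg = 0.5 - 0.4 * np.sin(t - 0.2)     # moves against C_i -> negative edge

print(learn_edge(C_i, C_j_pos))   # > 0
print(learn_edge(C_i, C_j_neg))   # < 0
```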


Differential Hebbian Learning

- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning


Klopf's Drive-Reinforcement Model

Harry Klopf independently proposed the following discrete variant of differential Hebbian learning:

$\Delta m_{ij}(t) = \Delta S_j\!\left( y_j(t) \right) \sum_{k=1}^{T} c_k \left| m_{ij}(t-k) \right| \Delta S_i\!\left( x_i(t-k) \right)$

where the synaptic difference $\Delta m_{ij}(t)$ updates the current synaptic efficacy $m_{ij}(t)$ in the first-order difference equation

$m_{ij}(t+1) = m_{ij}(t) + \Delta m_{ij}(t)$
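A rough sketch of one drive-reinforcement update under these equations, assuming a short lag window T, made-up lag coefficients c_k, and toy signal and synaptic histories; the names and values are illustrative only.

```python
import numpy as np

def klopf_delta(m_hist, S_i_hist, dS_j_now, c, T):
    """Drive-reinforcement increment:
    delta_m(t) = dS_j(t) * sum_{k=1..T} c_k * |m(t-k)| * dS_i(t-k),
    where m_hist[-1] = m(t) and S_i_hist[-1] = S_i(t)."""
    dS_i = np.diff(S_i_hist)           # dS_i[-1] = S_i(t) - S_i(t-1)
    total = 0.0
    for k in range(1, T + 1):
        total += c[k - 1] * abs(m_hist[-(k + 1)]) * dS_i[-(k + 1)]
    return dS_j_now * total

# Toy histories: rising presynaptic signal, recent postsynaptic increase.
S_i_hist = np.array([0.1, 0.2, 0.4, 0.7, 0.9])   # S_i(t-4) ... S_i(t)
m_hist   = np.array([0.5, 0.5, 0.6, 0.6, 0.7])   # m_ij(t-4) ... m_ij(t)
c        = [0.5, 0.3, 0.2]                       # lag weights c_1 .. c_T
dS_j_now = 0.2                                   # postsynaptic difference at t

delta = klopf_delta(m_hist, S_i_hist, dS_j_now, c, T=3)
m_next = m_hist[-1] + delta                      # m_ij(t+1) = m_ij(t) + delta_m(t)
print(delta, m_next)
```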


Klopf's Drive-Reinforcement Model

The term drive reinforcement arises from variables and their velocities. Klopf defines a neuronal drive as the weighted signal $m_{ij} S_i$ and a neuronal reinforcer as the weighted difference $m_{ij} \dot S_i$.

A differentiable version of the drive-reinforcement model takes the form:

$\dot m_{ij} = -m_{ij} + \left| m_{ij} \right| \dot S_i \dot S_j$

The synaptic magnitude $\left| m_{ij} \right|$ amplifies the synapse's plasticity. In particular, suppose the ij-th synapse is excitatory: $m_{ij} > 0$. Then we can derive:

$\dot m_{ij} = m_{ij} \left[ \dot S_i \dot S_j - 1 \right]$

Implicitly a passive decay coefficient scales the $-m_{ij}$ term. The coefficient will usually be much smaller than unity to prevent rapid forgetting.


Klopf's Drive-Reinforcement Model

Drive-reinforcement synapses can rapidly encode neuronal signal information. Moreover, signal velocities or directions tend to be more robust and more noise tolerant.

Unfortunately, drive-reinforcement synapses tend to zero as they equilibrate, and they equilibrate exponentially quickly. This holds for both excitatory and inhibitory synapses.


Klopf's Drive-Reinforcement Model

The equilibrium condition $\dot m_{ij} = 0$ implies that

$m_{ij} = \left| m_{ij} \right| \dot S_i \dot S_j$

or $m_{ij} = 0$ in general. This would hold equally in a signal Hebbian model if we replaced the signal product $S_i S_j$ with the magnitude-weighted product $\left| m_{ij} \right| S_i S_j$.

Klopf apparently overcomes this tendency in his simulations by forbidding zero synaptic values: $\left| m_{ij}(t) \right| \ge 0.1$.


Klopf's Drive-Reinforcement Model

The simple differential Hebbian learning law

$\dot m_{ij} = -m_{ij} + \dot S_i \dot S_j$

equilibrates to $m_{ij} = \dot S_i \dot S_j$. More generally the differential Hebbian law learns an exponentially weighted average of sampled concomitant variations, since it has the solution

$m_{ij}(t) = m_{ij}(0)\, e^{-t} + \int_0^t \dot S_i(s)\, \dot S_j(s)\, e^{s-t}\, ds$

in direct analogy to the signal-Hebbian integral equation.
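As a quick numerical sanity check of this closed-form solution, the sketch below compares a forward-Euler integration of the ODE with the exponentially weighted integral, using made-up signal velocities; all values are illustrative.

```python
import numpy as np

dt = 0.001
t = np.arange(0.0, 5.0, dt)
dS_i = np.cos(t)                 # toy signal velocities
dS_j = np.cos(t - 0.3)

# Forward-Euler integration of  dm/dt = -m + dS_i * dS_j,  with m(0) = 0.
m = 0.0
for a, b in zip(dS_i, dS_j):
    m += dt * (-m + a * b)

# Closed-form solution  m(t) = m(0) e^{-t} + integral_0^t dS_i(s) dS_j(s) e^{s-t} ds.
m_closed = np.sum(dS_i * dS_j * np.exp(t - t[-1])) * dt

print(m, m_closed)               # the two values agree closely
```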


Differential Hebbian Learning

- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning


Concomitant Variation as Statistical Covariance

The very term concomitant variation resembles the term covariance. In differential Hebbian learning we interpreted variation as time change, and concomitance as conjunction or product. Alternatively we can interpret variation spatially as a statistical variance or covariance.

Sejnowski has cast synaptic modification as a mean-squared optimization problem and derived a covariance-based solution. After some simplifications the optimal solution takes the form of the covariance learning law

$\dot m_{ij} = -m_{ij} + \mathrm{Cov}\left[ S_i(x_i),\, S_j(y_j) \right]$


Concomitant Variation as Statistical Covariance

Since $\mathrm{Cov}(x, z) = E[xz] - m_x m_z$, we can derive

$\dot m_{ij} = -m_{ij} + E_{xy}[S_i S_j] - E_x[S_i]\, E_y[S_j]$

The stochastic-approximation approach estimates the unknown expectation $E_{xy}[S_i S_j]$ with the observed realization product $S_i S_j$. So we estimate a random process with its observed time samples:

$\dot m_{ij} = -m_{ij} + S_i S_j - E_x[S_i]\, E_y[S_j]$


Concomitant Variation as Statistical Covariance

Suppose instead that we estimate the unknown joint-expectation term

$E_{xy}\!\left[ \left( S_i - E_x[S_i] \right)\left( S_j - E_y[S_j] \right) \right]$

as the observed time samples in the integrand:

$\mathrm{Cov}[S_i, S_j] \approx \left[ S_i - E_x[S_i] \right] \left[ S_j - E_y[S_j] \right]$

This leads to the new covariance learning law

$\dot m_{ij} = -m_{ij} + \left[ S_i - E_x[S_i] \right] \left[ S_j - E_y[S_j] \right]$

How should a synapse estimate the unknown averages $E_x[S_i(t)]$ and $E_y[S_j(t)]$ at each time t? (One simple running-average possibility is sketched below.)
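One illustrative answer, an assumption for this sketch rather than a prescription from the model, is to maintain exponentially weighted running averages of each signal and substitute them into the covariance law above:

```python
import numpy as np

def covariance_learning(S_i_series, S_j_series, dt=0.01, avg_rate=0.1):
    """Euler-integrate  dm/dt = -m + (S_i - E_hat[S_i]) * (S_j - E_hat[S_j]),
    where E_hat[.] is an exponentially weighted running average."""
    m, Ei, Ej = 0.0, S_i_series[0], S_j_series[0]
    for Si, Sj in zip(S_i_series, S_j_series):
        Ei += avg_rate * (Si - Ei)          # running estimate of E_x[S_i]
        Ej += avg_rate * (Sj - Ej)          # running estimate of E_y[S_j]
        m += dt * (-m + (Si - Ei) * (Sj - Ej))
    return m

t = np.arange(0.0, 20.0, 0.01)
S_i = 0.5 + 0.3 * np.sin(t)
S_j = 0.5 + 0.3 * np.sin(t - 0.2)           # co-varying signals
print(covariance_learning(S_i, S_j))        # positive, reflecting positive covariance
```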


Concomitant Variation as Statistical Covariance

We can lag slightly the stochastic-approximation estimate in time to make a martingale assumption. A martingale assumption estimates the immediate future as the present, or the present as the immediate past:

$E_x\!\left[ S_i(t) \right] \approx S_i(s), \qquad s < t$

for some time instant s arbitrarily close to t. The assumption increases in accuracy as s approaches t. Substituting this estimate into the covariance law yields a differential-Hebbian-like law built from lagged signal differences:

$\dot m_{ij}(t) \approx -m_{ij}(t) + \left[ S_i(t) - S_i(t-1) \right] \left[ S_j(t) - S_j(t-1) \right]$


Concomitant Variation as Statistical Covariance

This approximation assumes that the signal processes are well-behaved: continuous, have finite variance, and are at least approximately wide-sense stationary.

In an approximate sense when time averages resemble ensemble averages, differential Hebbian learning and covariance learning coincide.


Differential Hebbian Learning

- Fuzzy Cognitive Maps (FCMs)
- Adaptive Causal Inference
- Klopf's Drive-Reinforcement Model
- Concomitant Variation as Statistical Covariance
- Pulse-Coded Differential Hebbian Learning


Pulse-Coded Differential Hebbian Learning

The velocity-difference property for pulse-coded signal functions:

$\dot S_i(t) = x_i(t) - S_i(t), \qquad \dot S_j(t) = y_j(t) - S_j(t)$

The pulse-coded differential Hebbian law replaces the signal velocities in the usual differential Hebbian law

$\dot m_{ij} = -m_{ij} + \dot S_i \dot S_j + n_{ij}$

with these two differences:

$\dot m_{ij} = -m_{ij} + \left[ x_i - S_i \right]\left[ y_j - S_j \right] + n_{ij} = -m_{ij} + S_i S_j + x_i y_j - x_i S_j - y_j S_i + n_{ij}$

When no pulses are present ($x_i = y_j = 0$), the pulse-coded differential Hebbian law reduces to the random-signal Hebbian law.
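A minimal simulation sketch of this law, assuming binary pulse trains driven by a shared random source and the exponential pulse-frequency estimates implied by the velocity-difference property; the pulse probabilities, step size, and omitted noise term n_ij are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, steps = 0.01, 5000

m, S_i, S_j = 0.0, 0.0, 0.0
for _ in range(steps):
    drive = rng.random() < 0.3                   # shared drive makes the pulses correlated
    x_i = float(drive or rng.random() < 0.05)    # presynaptic pulse (0 or 1)
    y_j = float(drive or rng.random() < 0.05)    # postsynaptic pulse (0 or 1)
    # Velocity-difference property: dS/dt = pulse - S (exponential pulse-frequency estimate).
    S_i += dt * (x_i - S_i)
    S_j += dt * (y_j - S_j)
    # Pulse-coded differential Hebbian law (zero-mean noise n_ij omitted):
    m += dt * (-m + (x_i - S_i) * (y_j - S_j))

print(m)   # positive for these positively correlated pulse trains
```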


Pulse-Coded Differential Hebbian Learning

Replace the binary pulse functions with bipolar pulse functions, and suppose the pulses and the expected pulse frequencies are pairwise independent. Then the average behavior reduces to

$E[\dot m_{ij}] = -E[m_{ij}] + E[S_i]\, E[S_j]$

the ensemble-averaged random-signal Hebbian learning law or, equivalently, the classical deterministic-signal Hebbian learning law.


Pulse-Coded Differential Hebbian Learning

In the language of estimation theory, both random-signal Hebbian learning and random pulse-coded differential Hebbian learning provide unbiased estimators of signal Hebbian learning.

The pulse frequencies $S_i$ and $S_j$ can be interpreted ergodically (time averages equaling space averages) as ensemble averages:

$S_i(t) \approx E\left[ x_i(t) \mid x_i(s),\ 0 \le s < t \right]$


Pulse-Coded Differential Hebbian Learning

Substituting these martingale assumptions into the pulse-coded differential Hebbian law gives

$\dot m_{ij} = -m_{ij} + \left[ x_i - E[x_i(t) \mid x_i(s)] \right] \left[ y_j - E[y_j(t) \mid y_j(s)] \right] + n_{ij}$

This suggests that random pulse-coded differential Hebbian learning provides a real-time stochastic approximation to covariance learning:

$\dot m_{ij} = -m_{ij} + \mathrm{Cov}[S_i, S_j] + n_{ij}$

This shows again how differential Hebbian learning and covariance learning coincide when appropriate time averages resemble ensemble averages.


Part II: Differential Competitive Learning

Learning law:

$\dot m_{ij} = \dot S_j(y_j) \left[ S_i(x_i) - m_{ij} \right] + n_{ij}$

Learn only if change! The signal velocity $\dot S_j$ is a local reinforcement mechanism. Its sign indicates whether the jth neuron is winning or losing, and its magnitude measures by how much.


Differential Competitive Learning

If the velocity-difference property replaces the competitive signal velocity $\dot S_j$, then the pulse-coded differential competitive learning law is just the difference of non-differential competitive laws:

$\dot m_{ij} = \left[ y_j - S_j \right] \left[ S_i - m_{ij} \right] + n_{ij} = y_j \left[ S_i - m_{ij} \right] - S_j \left[ S_i - m_{ij} \right] + n_{ij}$

[Diagram: in the competition field $F_Y$, the jth neuron wins when $y_j(t) = 1$, so $\dot S_j > 0$ and $m_{ij}$ moves toward $S_i$; it loses when $y_j(t) = 0$, so $\dot S_j \le 0$ and $m_{ij}$ moves away.]


Competitive signal velocity & supervised reinforcement function

Both use a sign change to punish misclassification.

Both tend to rapidly estimate unknown pattern-class centroids.

The unsupervised signal velocity $\dot S_j$ does not depend on unknown class memberships; it estimates this information with instantaneous win-rate information.

Even though it uses less information, DCL performs comparably to SCL (supervised competitive learning)!


Computation of postsynaptic signal velocity

Velocity-difference property: the nonlinear derivative $\dot S_j$ reduces to the locally available difference

$\dot S_j(t) = y_j(t) - S_j(t)$

Since $0 \le S_j \le 1$, this difference lies between $-1$ and $0$ when $y_j = 0$, and between $0$ and $1$ when $y_j = 1$; in particular it vanishes when $y_j = 0$ and $S_j = 0$. So the signal velocity at time $t$ is estimated by the mere presence or absence of the postsynaptic pulse $y_j(t)$.

The pulse-based difference $y_j - S_j$ suits high-speed sensory environments, where stimulus patterns shift constantly; slower, stabler pattern environments permit smoother estimates of $\dot S_j$.


The differential-competitive synaptic conjecture then states:

The synapse can physically detect the presence or absence of the postsynaptic pulse $y_j$ as a change in the postsynaptic neuron's polarization.

The synapse can clearly detect the presynaptic pulse train $x_i(t)$, and thus the pulse train's pulse count in the most recent 30 milliseconds or so.

[Diagram: an incoming pulse train $x_i(t)$ arrives at the synapse, which electrochemically detects the postsynaptic pulse.]


Behavior patterns involved in animal learning

Klopf and Gluck suggest that input signal velocities provide pattern information for this.

[Diagram comparing learning laws: classical signal Hebbian learning and pulse-coded differential Hebbian learning (the ordinary and "microscope" views) process signals to store, recognize, and recall patterns; with pulse-coded differential competitive learning, noisy synaptic vectors can locally estimate pattern centroids in real time without supervision.]


Differential Competitive Learning as Delta Modulation

The discrete differential competitive learning law

$m_j(k+1) = m_j(k) + \Delta S_j\!\left( y_j(k) \right) \left[ x_k - m_j(k) \right]$

represents a neural version of adaptive delta modulation. In communication theory, delta-modulation systems transmit consecutive sampled amplitude differences instead of the sampled amplitude values themselves. A delta-modulation system may transmit only $\pm 1$ signals, indicating local increase or decrease in the underlying sampled waveform.


Differential Competitive Learning as Delta Modulation

The signal difference $\Delta S_j$ can be approximated as the signum of the activation difference:

$\Delta S_j\!\left( y_j(k) \right) \approx \mathrm{sgn}\left[ y_j(k+1) - y_j(k) \right]$

The signum operator sgn(.) behaves as a modified threshold function:

$\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$

It fixes the step size of the delta modulation, and a variable step size results in adaptive delta modulation, as the sketch below illustrates.
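The sketch below puts the pieces together on toy data, assuming a nearest-prototype winner rule, a fixed delta-modulation step size, and sgn of the winner's activation change as the reinforcement; all names, data, and parameters are illustrative, not the original simulation setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgn(x):
    """Signum threshold: +1, 0, or -1."""
    return (x > 0) - (x < 0)

# Two pattern classes and two competing neurons with random synaptic vectors m_j.
centroids = np.array([[1.0, 1.0], [-1.0, -1.0]])
M = rng.normal(scale=0.5, size=(2, 2))           # rows are synaptic vectors m_j
y_prev = np.zeros(2)                             # previous activations y_j(k-1)
step = 0.05                                      # fixed delta-modulation step size

for k in range(2000):
    x = centroids[k % 2] + rng.normal(scale=0.3, size=2)   # noisy sample x_k
    y = -np.linalg.norm(x - M, axis=1)           # activations: closer prototype -> larger y_j
    j = int(np.argmax(y))                        # winning neuron
    dS = sgn(y[j] - y_prev[j])                   # reinforcement: sgn of activation change
    M[j] += step * dS * (x - M[j])               # m_j(k+1) = m_j(k) + dS * [x_k - m_j(k)] (scaled)
    y_prev = y

print(M)        # compare with the class centroids (1, 1) and (-1, -1)
```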


Consecutive differences, more informative than consecutive samples

We define the statistical correlation between random variables x and z as

$\rho(x, z) = \dfrac{\mathrm{Cov}(x, z)}{\sigma_x \sigma_z}$

It takes values in the bipolar interval $[-1, 1]$; x and z are positively correlated if $\rho(x, z) > 0$, and negatively correlated if $\rho(x, z) < 0$.

Let $d_k$ denote the pulse difference

$d_k = y_j(k+1) - y_j(k)$

Suppose the wide-sense-stationary random sequence $\{ y_j(k) \}$ is zero mean, and each term has the same finite variance $\sigma^2$. Then

$\rho\left[ y_j(k+1),\, y_j(k) \right] = \dfrac{E\left[ y_j(k+1)\, y_j(k) \right]}{E\left[ y_j^2(k) \right]}$


Consecutive differences, more informative than consecutive samples

The random sequence $\{ d_k \}$ also has zero mean. The above properties simplify the variance of $d_k$ to

$V[d_k] = E\left[ y_j^2(k+1) \right] + E\left[ y_j^2(k) \right] - 2 E\left[ y_j(k+1)\, y_j(k) \right] = 2\sigma^2 \left( 1 - \rho\left[ y_j(k+1), y_j(k) \right] \right)$

If consecutive samples are highly positively correlated, $\rho\left[ y_j(k+1), y_j(k) \right] > 1/2$, the differences $d_k$ have less variance than the samples $y_j(k)$, as the sketch below checks numerically.

In the pulse-coded case, when the jth neuron wins, it emits a dense pulse train. This winning pulse frequency may be sufficiently high to satisfy the correlation property.
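A quick numerical check of the $2\sigma^2(1-\rho)$ identity, using an assumed AR(1) sequence as the correlated, approximately wide-sense-stationary samples; the coefficient and sample count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# AR(1) sequence y(k+1) = a*y(k) + noise: zero mean, lag-one correlation near a.
a, n = 0.9, 200_000
noise = rng.normal(scale=np.sqrt(1 - a**2), size=n)   # keeps the variance near 1
y = np.zeros(n)
for k in range(1, n):
    y[k] = a * y[k - 1] + noise[k]

d = np.diff(y)                       # consecutive differences d_k = y(k+1) - y(k)
rho = np.corrcoef(y[:-1], y[1:])[0, 1]

print(np.var(y))                     # sample variance sigma^2 (about 1.0)
print(np.var(d))                     # difference variance, smaller since rho > 1/2
print(2 * np.var(y) * (1 - rho))     # matches the 2*sigma^2*(1 - rho) identity
```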

Thanks for your attention!