
Page 1:

Universal Learning Models

Janusz A. Starzyk

Computational Intelligence

Based on a course taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika

Page 2:

Task learning

We want to combine Hebbian learning, error-correction learning, hidden units, and biologically justified models.

Hebbian networks model states of the world, but not perception-action mappings.

Error correction can learn such mappings. Unfortunately, the delta rule works only for output units, not hidden units, because it must be given a target.

Backpropagation of errors can train hidden units, but there is no good biological justification for this method…

The idea of backpropagation is simple, but the detailed algorithm requires many calculations.

Main idea: we look for the minimum of an error function measuring the difference between the desired behavior and the behavior realized by the network.

Page 3:

Error function

E(w), the error function, depends on all parameters w of the network and is the sum of the errors E(X;w) over all patterns X. o_k(X;w) is the value produced on output k of the network for pattern X; t_k(X) is the value desired on output k for pattern X.

For one pattern X and one parameter w:

$$E(X; w) = \left[\,t(X) - o(X; w)\,\right]^2$$

An error value of E = 0 is not always attainable; the network may not have enough parameters to learn the desired behavior, so we can only aim for the smallest error.

At the minimum of the error E(X;w) with respect to parameter w, the derivative dE(X;w)/dw = 0.

For many parameters we have all the derivatives dE/dw_i, i.e. the gradient.
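A minimal Python sketch (assuming NumPy) of this squared-error function and its gradient for a single linear output unit; gradient descent drives the gradient toward 0. The names and data here are illustrative, not from the course software:

```python
import numpy as np

def error(w, X, t):
    """Squared error E(X; w) for a linear unit o(X; w) = X @ w."""
    o = X @ w
    return np.sum((t - o) ** 2)

def error_gradient(w, X, t):
    """Analytic gradient: dE/dw_i = -2 * sum over patterns of (t - o) * x_i."""
    o = X @ w
    return -2.0 * X.T @ (t - o)

# Illustrative data: a linearly unsolvable mapping, so E = 0 is unattainable
# and descent can only reach the smallest attainable error.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 1.0, 0.0])
w = np.zeros(2)
for _ in range(200):                 # simple gradient descent
    w -= 0.1 * error_gradient(w, X, t)
print(error(w, X, t))                # approaches the minimum, where dE/dw_i = 0
```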

Page 4:

Error propagation

The delta rule minimizes the error for a single neuron, e.g. an output neuron k receiving signals s_i:

$$\Delta w_{ik} = \epsilon\,(t_k - o_k)\,s_i$$

What signals should we take for the hidden neurons? First we feed the signals into the network, computing the activations and the output signals of the hidden neurons $h_j$ through all layers, up to the outputs $o_k$ (forward step). We compute the errors $\delta_k = (t_k - o_k)$ and the corrections for the output neurons, $\Delta w_{jk} = \epsilon\,\delta_k\,h_j$. The error for the hidden neurons is $\delta_j = h_j(1 - h_j)\sum_k w_{jk}\,\delta_k$ (backward step)

(backpropagation of error). The correction is strongest for undecided units, with activations near 0.5, where the factor h_j(1 - h_j) is largest.
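A minimal Python sketch (NumPy) of one forward and backward step for a single hidden layer of sigmoid units, following the formulas above; the slide's δ_k omits the output derivative, and the sketch does the same. Variable names are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W_in, W_out, eps=0.1):
    """One forward/backward pass of error backpropagation."""
    # Forward step: input -> hidden -> output.
    h = sigmoid(x @ W_in)                      # hidden signals h_j
    o = sigmoid(h @ W_out)                     # output signals o_k

    # Backward step: output errors, then hidden errors.
    delta_k = t - o                            # delta_k = (t_k - o_k)
    delta_j = h * (1 - h) * (W_out @ delta_k)  # strongest near h = 0.5

    # Weight corrections.
    W_out += eps * np.outer(h, delta_k)        # dw_jk = eps * delta_k * h_j
    W_in += eps * np.outer(x, delta_j)         # dw_ij = eps * delta_j * x_i
    return W_in, W_out
```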

Page 5:

GeneRec

Although most models used in psychology train multilayer perceptrons with variants of backpropagation (in this way one can learn any function), the idea of transmitting error information backwards has no biological justification.

GeneRec (Generalized Recirculation, O'Reilly 1996): bidirectional signal propagation, allowing asymmetric weights $w_{kj} \neq w_{jk}$.

First, the minus phase: the network's response to the input activation $x^-$ gives the output $y^-$; then the desired result $y^+$ is observed and propagated back toward the input, giving $x^+$. The change in weights requires information about the signals from both phases.

Page 6:

GeneRec – learning

The learning rule agrees with the delta rule:

$$\Delta w_{ij} = \epsilon\,\left(y_j^+ - y_j^-\right)x_i^-$$

In comparison with backpropagation, the difference of signals $[y^+ - y^-]$ replaces the aggregate error: (the difference of signals) ≈ (the difference of activations) × (the derivative of the activation function), so it is still a gradient rule.

For bias weights the input is fixed at $x_i = 1$, so:

$$\Delta \theta_j = \epsilon\,\left(y_j^+ - y_j^-\right)$$

Bidirectional information transfer is almost simultaneous; it accounts for the formation of attractor states, constraint satisfaction, and pattern completion.

The P300 wave, which appears about 300 ms after stimulation, reflects expectations resulting from external activation. Errors are the result of activity in the whole network; we get slightly better results by taking the average $[x^+ + x^-]/2$ and preserving weight symmetry:

$$\Delta w_{ij} = \epsilon\,\left(x_i^+ y_j^+ - x_i^- y_j^-\right)$$

This is the CHL rule (Contrastive Hebbian Learning).
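A minimal Python sketch of both weight updates, given activations recorded in the minus and plus phases; this is a schematic illustration of the rules above, not the actual simulator code:

```python
import numpy as np

def generec_update(W, x_minus, y_minus, y_plus, eps=0.1):
    """GeneRec rule: dw_ij = eps * (y_j+ - y_j-) * x_i-."""
    return W + eps * np.outer(x_minus, y_plus - y_minus)

def chl_update(W, x_minus, y_minus, x_plus, y_plus, eps=0.1):
    """CHL rule: dw_ij = eps * (x_i+ y_j+ - x_i- y_j-); preserves symmetry."""
    return W + eps * (np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus))
```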

Page 7:

Two phases

Where does the error used to correct synaptic connections come from?

The layer on the right side = the middle layer at time t+1; e.g.: a) word pronunciation: external correction of the action; b) external expectations and someone else's pronunciation; c) expecting the results of an action and then observing them; d) reconstruction (expectation of the input).

Page 8:

GeneRec properties

Hebbian learning creates a model of the world by remembering correlations, but it is not capable of learning task execution.

Hidden layers allow the problem to be transformed, and error correction permits learning difficult tasks: the relationships between inputs and outputs.

The combination of Hebbian learning – correlations ⟨x y⟩ – and error-based learning can learn everything in a biologically plausible manner: CHL leads to symmetry, approximate symmetry suffices, and connections are generally bidirectional. Err = CHL in the table.

No Ca²⁺ = no learning; a little Ca²⁺ = LTD; a lot of Ca²⁺ = LTP. LTD – unfulfilled expectations: only the minus phase occurs, with no reinforcement from the plus phase.


Page 9:

Combination of Hebb + errors

                  Advantages                 Disadvantages
Hebb (local)      autonomous, reliable       narrow, greedy
Error (remote)    purposeful, cooperative    interdependent, lazy

It's good to combine Hebbian learning and CHL error correction.

CHL is like socialism: it tries to correct the errors of the whole and limits the motivation of individual units; collective responsibility, low effectiveness, planned activity.

Hebbian learning is like capitalism: based on greed, local interests, and individualism; effective activity, but no monitoring of the whole.

Page 10:

Combination of Hebb + errors

It's good to combine Hebbian learning and CHL error correction.

Correlations and errors:

$$\Delta_{\text{hebb}} = y_j^+\left(x_i^+ - w_{ij}\right), \qquad \Delta_{\text{err}} = x_i^+ y_j^+ - x_i^- y_j^-$$

Combination:

$$\Delta w_{ij} = \epsilon\left[\,c_{\text{hebb}}\,\Delta_{\text{hebb}} + \left(1 - c_{\text{hebb}}\right)\Delta_{\text{err}}\,\right]$$
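A minimal Python sketch of this weighted combination, reusing the CHL form for the error term and a CPCA-style Hebbian term; the shape of the rule follows the course description, but treat the details (parameter names, `c_hebb` value) as illustrative assumptions:

```python
import numpy as np

def combined_update(W, x_minus, y_minus, x_plus, y_plus,
                    eps=0.1, c_hebb=0.1):
    """Mix of Hebbian and error-driven (CHL) weight changes."""
    # CPCA-style Hebbian term: y_j+ * (x_i+ - w_ij), pulls weights toward inputs.
    d_hebb = y_plus[None, :] * (x_plus[:, None] - W)
    # CHL error term: x_i+ y_j+ - x_i- y_j-.
    d_err = np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus)
    return W + eps * (c_hebb * d_hebb + (1 - c_hebb) * d_err)
```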

Additionally, inhibition within layers is necessary: it creates economical internal representations; units compete with each other, and only the best remain and specialize; it makes self-organized learning possible.

Page 11:

Simulation of a difficult problem

Project Genrec.proj.gz, Chapt. 5.9; 3 hidden units. Learning is stopped after 5 epochs without error.

Errors during learning show substantial fluctuations: networks with recurrence are sensitive to small changes in the weights and explore different solutions. Compare with learning easy and difficult tasks using only Hebbian learning.

Page 12:

Inhibitory competition as a constraint

Inhibition:

Leads to sparse distributed representations (many possible representations, of which only some are useful in a concrete situation)

Competition and specialization: survival of the best adapted

Self-organized learning

Often more important than Hebbian learning

Inhibition was also used in the mixture-of-experts framework: gating units, subject to WTA (winner-take-all) competition, control the outputs of the experts. A sketch of this kind of competition follows below.
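A minimal Python sketch of k-winners-take-all style inhibition, which keeps only the k most active units; this is a simplification of the kWTA idea for illustration, not the Leabra implementation:

```python
import numpy as np

def kwta(activations, k):
    """Keep the k strongest activations; inhibit (zero out) the rest."""
    act = np.asarray(activations, dtype=float)
    if k >= act.size:
        return act
    threshold = np.sort(act)[-k]                 # activation of the k-th winner
    return np.where(act >= threshold, act, 0.0)

print(kwta([0.1, 0.9, 0.4, 0.7, 0.2], k=2))      # -> [0.  0.9 0.  0.7 0. ]
```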

Page 13:

Comparison of weight change in learning

View of hidden layer weights in Hebbian learning

The neurons' weights become tuned to particular inputs.

View of hidden layer weights in error correction learning

The weights seem fairly random when compared with Hebbian learning

Page 14:

Comparison of weight change in learning

Charts comparing a) training errors and b) the number of cycles, as functions of the number of training epochs, for three learning methods: Hebbian (Pure Hebb), error correction (Pure Err), and the combination (Hebb & Err), which attained the best results.


Page 15:

Full Leabra model

Inhibition within layers, Hebbian learning + error correction for weights between layers.

Six principles of intelligent system construction:

1. Biological realism
2. Distributed representations
3. Inhibitory competition
4. Bidirectional activation propagation
5. Error-driven learning
6. Hebbian learning

Page 16:

Generalization

How do we deal with things we have never seen before?

We face them every time we enter a classroom, at every meeting, in every sentence we hear, etc.

We constantly encounter new situations, and we generalize to them reasonably well.

How do we do this?


Page 17:

Good representations

Internal distributed representations. New concepts are combinations of existing properties.

Hebbian learning and inhibition-based competition constrain error correction so that good representations are created.

Page 18:

Generalization in attractor networks

The GeneRec rule by itself doesn't lead to good generalization. Simulations: model_and_task.proj.gz, Chapt. 6.

The hebb parameter controls how much CHL and how much Hebbian learning is used.

Pure_err uses only CHL; examine the − and + phases.

Compare internal representations for different types of learning.

Page 19:

Deep networks

To learn difficult problems, many transformations are necessary, strongly changing the representation of the problem.

Error signals become weak and learning is difficult.

We must add constraints and self-organized learning.

Analogy: balancing several connected sticks is difficult, but adding self-organized learning between the segments simplifies this significantly – like adding a gyroscope to each element.

Page 20:

Sequential learning

Besides recognition of objects and relationships and task execution, sequential learning is important, e.g. the order of words in sentences:

The dog bit the man. The man bit the dog.

The child lifted up the toy.

I drove through the intersection because the car on the right was just approaching.

The meaning of words, gestures, and behaviors depends on the sequence, on the context.

Time plays a fundamental role: the consequences of the appearance of pattern X may become visible only after a delay; e.g. the consequences of the positions of pieces during a game become evident only after several moves.

Network models react immediately – how do brains do this?

Page 21:

Family tree

Example simulation: family_trees.proj.gz, Chapt. 6.4.1.

What is still missing? Temporal and sequential relationships!

Page 22:

Sequential learning

Cluster plots showing the representations of hidden layer neurons a) before learning and b) after learning with the combined Hebbian and error-correction method.

The trained network has two branches corresponding to the two families.

Page 23:

Sequential learning

Categories of temporal relationships:

Sequences with a given structure
Relationships delayed in time
Continuous trajectories

Context is represented in the frontal lobes of the cortex; it should affect the hidden layer. We need recurrent networks which can hold on to context information for a period of time: the Simple Recurrent Network (SRN), or Elman network, in which the context layer is a copy of the hidden layer from the previous step (see the sketch below).
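A minimal Python sketch of one SRN step: the hidden activations are copied into a context layer that feeds back into the hidden layer on the next step; the names and the tanh nonlinearity are illustrative assumptions:

```python
import numpy as np

def srn_step(x, context, W_in, W_ctx, W_out):
    """One step of a simple recurrent (Elman) network."""
    # The hidden layer sees the current input and the previous context.
    h = np.tanh(x @ W_in + context @ W_ctx)
    o = h @ W_out
    new_context = h.copy()        # context layer = copy of the hidden layer
    return o, new_context
```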

Page 24:

Sequential learning

Biological justification for context representation: the frontal lobes of the cortex.

They are responsible for planning and performing temporally extended activities. People with damaged frontal lobes have trouble performing a sequence of actions, even though they have no problem with the individual steps of the activity.

The frontal lobes are responsible for temporal representations; for example, words such as “fly” or “pole” acquire meaning based on the context. Context is a function of previously acquired information.

People with schizophrenia can use the context immediately preceding an ambiguous word, but not context from a previous sentence.

Context representations not only lead to sequential behavior but are also necessary for understanding sequentially presented information such as speech.

Page 25:

Examples of sequential learning

Can we discover the rules by which sequences are created? Examples:

BTXSE, BPVPSE, BTSXXTVVE, BPTVPSE

A machine with transitions between states produces these sequences:

Are these sequences acceptable?

BTXXTTVVE, TSXSE, VVSXE, BSSXSE

As studies have shown, people learn more quickly to recognize strings of letters produced according to a specific pattern, even if they don't know the rules being used.
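A minimal Python sketch of a finite-state machine that generates sequences of this kind; the transition table below is a made-up example in the style of such grammars, not the exact grammar behind the strings above:

```python
import random

# Hypothetical transition table: state -> list of (letter, next_state).
TRANSITIONS = {
    0: [("B", 1)],
    1: [("T", 2), ("P", 3)],
    2: [("X", 4), ("S", 2)],
    3: [("V", 4), ("T", 3)],
    4: [("E", None)],              # E ends the sequence
}

def generate():
    """Random walk through the machine, emitting one letter per transition."""
    state, letters = 0, []
    while state is not None:
        letter, state = random.choice(TRANSITIONS[state])
        letters.append(letter)
    return "".join(letters)

print(generate())                  # e.g. "BTSXE" or "BPTVE"
```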

Page 26:

Network realization

The network randomly chooses one of two possible states.

Hidden/context neurons learn to recognize the machine's states, not just the labels.

Behavior modeling: the same observations but different internal states => different decisions and next states.

Project fsa.proj.gz, Chapt. 6.6.3.

Page 27:

Temporal delay and reinforcement

The reward (reinforcement) often arrives with a delay, e.g. when learning a game or behavioral strategies.

Idea: we have to foresee sufficiently early which events lead to a reward. This is done by the temporal-differences algorithm (Temporal Differences, TD; Sutton). Where does reward come from in the brain?

The midbrain dopaminergic system modulates the activity of the basal ganglia (BG) through the substantia nigra (SN), and of the frontal cortex through the ventral tegmental area (VTA). It is a rather complicated system whose actions are related to evaluating stimuli and actions from the point of view of value and reward.

Page 28:

Temporal delay and reinforcement

The ventral tegmental area (VTA) is part of the reward system.

VTA neurons deliver the neurotransmitter dopamine (DA) to the frontal lobes and the basal ganglia, modulating learning in these areas responsible for planning and action.

More advanced regions of the brain are responsible for producing this global learning signal.

Studies of patients with damage in the VTA area indicate its role in predicting reward and punishment.

Page 29:

Anticipation of reward and result

Anticipation of reward, and the reaction to the decision (Knutson et al., 2001).

Page 30:

Basal ganglia (BG)

VTA neurons first learn to react to the reward, and then to predict the appearance of the reward ahead of time.

Page 31:

Formulation sketch – the TD algorithm

We need to define a value function, the sum over all future rewards, with rewards further away in time counting less:

$$V(t) = \sum_{k=0}^{\infty} \gamma^{k}\, r(t+k), \qquad 0 \le \gamma < 1$$

The adaptive critic (AC) learns to estimate the value function V(t). At every point in time, the AC tries to predict the value of future rewards.

This can be done recursively:

$$V(t) = r(t) + \gamma\, V(t+1)$$

Error of the predicted reward:

$$\delta(t) = \left[\,r(t) + \gamma\,\hat V(t+1)\,\right] - \hat V(t)$$

The network tries to reduce this error. The algorithm's name, TD (temporal difference), refers to this error being a difference of value estimates across a period of time.
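A minimal Python sketch of tabular TD(0) value estimation along these lines; gamma, alpha, the time-indexed value table, and the episode data are illustrative assumptions, with the reward timing borrowed from the conditioning simulation on the next slides:

```python
import numpy as np

def td0_episode(V, rewards, gamma=0.9, alpha=0.1):
    """Update value estimates V(t) from one episode's rewards."""
    T = len(rewards)
    for t in range(T):
        v_next = V[t + 1] if t + 1 < T else 0.0     # V = 0 past episode end
        delta = rewards[t] + gamma * v_next - V[t]  # TD error delta(t)
        V[t] += alpha * delta                       # reduce the error
    return V

# CS at t=2, reward (US) at t=16, as in the conditioning simulation.
rewards = np.zeros(17)
rewards[16] = 1.0
V = np.zeros(17)
for _ in range(100):
    V = td0_episode(V, rewards)
print(np.round(V, 2))   # the estimated value rises in advance of the reward
```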

Page 32:

Network implementation

Prediction of activity and error.

Conditioned stimulus CS at t=2; unconditioned stimulus (reward) US at t=16. Project rl_cond.proj.gz.

Initially there is a large error at time t=16, because the reward r(16) is unexpected.

Adaptive critic AC

Page 33:

Two-phase implementation

(Phase +) computes the expected size of the reward at time t+1 (the value of r).

(Phase −) in step t−k predicts t−k+1, and at the end r(tk).

The value V(t+1) from phase + is carried over to the value V(t) in phase −:

Learning propagates backwards in time, affecting the value of the previous step.

CS at t=2, US at t=16.

$$\hat V(t) \leftarrow \hat V(t+1)$$

Page 34:

Two-phase implementation

The system learns that the stimulus (a tone) predicts the reward.

The input uses the CSC (Complete Serial Compound) representation, with a unique element for each stimulus at each point in time.

This is not a very realistic model of classical conditioning.

Chapt. 6.7.3, project rl_cond.proj.gz.