Closed Loop Temporal Sequence Learning: Learning to act in response to sequences of sensor events


Overview of different methods

[Taxonomy diagram of learning methods:

EVALUATIVE FEEDBACK (rewards) — example-based Machine Learning:
• Dynamic Programming (Bellman equation) → REINFORCEMENT LEARNING: δ-rule, Monte Carlo control, Q-Learning, SARSA, TD(λ) (often λ = 0), TD(1), TD(0), eligibility traces
• Classical Conditioning: Rescorla/Wagner, neuronal TD models ("Critic"), neuronal TD formalism, Actor/Critic (technical & basal ganglia), neuronal reward systems (basal ganglia)
→ Anticipatory control of actions and prediction of values

NON-EVALUATIVE FEEDBACK (correlations) — correlation-based UNSUPERVISED LEARNING:
• Hebb rule, differential Hebb rule ("slow"), differential Hebb rule ("fast"), supervised learning
• Synaptic Plasticity: STDP models (biophysical & network), ISO-Learning, ISO model of STDP, ISO-Control, correlation-based control (non-evaluative)
• Biophysics of synaptic plasticity: dopamine, glutamate, STDP, LTP (LTD = anti)
→ Correlation of signals

"You are here!" marks the correlation-based (ISO) branch.]

Natural Temporal Sequences in life and in "control" situations

Real life:
• Heat radiation predicts pain when touching a hot surface.
• The sound of prey may precede its smell, which will precede its taste.

Control:
• Force precedes position change.
• An electrical (disturbance) pulse may precede a change in a controlled plant.

Psychological experiment:
• Pavlovian and/or operant conditioning: the bell precedes the food.

Early: "Bell" → Late: "Food"

What we had so far!

The (ISO) learning rule: dρᵢ/dt = μ uᵢ(t) v′(t)

[Diagram: Pavlovian temporal sequence (Pavlov, 1927). Sensor 2 carries the conditioned input ("Bell"), which precedes "Food"; the output is salivation.]

This is an open-loop system!

Closed loop: [Diagram: an adaptable neuron is embedded in the environment — sensing on the way in, behaving on the way out.]

How can we assure behavioral & learning convergence?

This is achieved by starting with a stable reflex-like action and learning to supersede it by an anticipatory action.

Robot Application

Remove before being hit!

Early: "Vision" → Late: "Bump"

The Basic Control Structure: schematic diagram of a pure reflex loop

[Block diagram: set-point X0 and disturbances enter the controller, which sends control signals to the controlled system; feedback closes the loop. Compare to an electronic closed-loop controller!]

Reflex only: this structure assures initial (behavioral) stability ("homeostasis"). Think of a thermostat!

This is a closed-loop system before learning.
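To make the thermostat analogy concrete, here is a minimal Python sketch of such a pure reflex loop (plant dynamics and all constants are invented for illustration): the controller can only react AFTER a disturbance has already moved the controlled variable away from the set-point.

```python
# Minimal sketch of a pure reflex loop (thermostat-style proportional
# feedback; plant dynamics and all constants are invented).
def simulate_reflex_loop(steps=200, setpoint=45.0, gain=0.5):
    temperature = setpoint
    peak = 0.0
    for t in range(steps):
        disturbance = 5.0 if 50 <= t < 60 else 0.0      # heat pulse
        control = gain * (setpoint - temperature)       # reflex reaction
        temperature += 0.1 * (disturbance + control)    # toy plant
        peak = max(peak, abs(temperature - setpoint))
    return peak

# The loop is stable (homeostasis), but it reacts only after the
# disturbance has already driven the temperature off the set-point.
print(f"peak deviation: {simulate_reflex_loop():.2f} deg C")
```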

Robot Application: let's look at the reflex first.

Initially built-in behavior: a retraction reaction whenever an obstacle is touched (bump → retraction reflex).

Learning goal: correlate the vision signals with the touch signals and navigate without collisions.

This is an open-loop system → a closed-loop system before learning → a closed-loop system during learning: Closed Loop Temporal Sequence (TS) Learning.

T represents the temporal delay between vision and bump. Let's see how learning continues.

Closed Loop TS Learning

Anatomically this loop still exists, but ideally it should never be active again!

This is the system after learning. What has the learning done?

Elimination of the late reflex input corresponds mathematically to the condition X0 = 0. This was the major condition of convergence for the ISO/ICO rules (the other one was T = 0).

What is interesting is that the learning-induced behavior self-generates this condition. This is non-trivial, as it assures that the synaptic weight AND the behavior stabilize at exactly the same moment in time.
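The following is a minimal closed-loop sketch of this self-termination property (the plant, the signals, and all constants are invented, not taken from the robot): the early input x1 ("vision") precedes the reflex input x0 ("bump") by T steps, and the anticipatory action ρ1·u1 weakens the bump. Once the bump is fully avoided, x0 = 0 and hence u0′ = 0, so the ICO update dρ1/dt = μ u1 u0′ stops by itself — weight and behavior stabilize at the same moment.

```python
# Minimal closed-loop ICO sketch (hypothetical dynamics and parameters).
mu, T = 2.0, 5            # learning rate and vision->bump delay (invented)
rho1 = 0.0                # plastic weight of the early ("vision") input
for trial in range(20):
    u1, u0_prev, bump = 0.0, 0.0, 0.0
    for t in range(40):
        x1 = 1.0 if t == 10 else 0.0             # early sensor event
        u1 = 0.8 * u1 + x1                       # low-pass filtered input
        if t == 10 + T:                          # moment of potential bump
            bump = max(0.0, 1.0 - rho1 * u1)     # anticipation weakens it
        x0 = bump if t == 10 + T else 0.0        # late reflex input
        du0 = x0 - u0_prev                       # crude derivative of u0
        u0_prev = x0
        rho1 += mu * u1 * max(du0, 0.0)          # ICO: drho1 = mu*u1*u0'
    print(f"trial {trial:2d}: rho1 = {rho1:.3f}, residual bump = {bump:.3f}")
```

The printed residual bump shrinks towards 0 exactly as ρ1 freezes: the condition X0 = 0 is generated by the behavior itself.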


This is the system after learning. What has happened in engineering terms?

The inner pathway has now become a pure feed-forward path.

Formally (calculations in the Laplace domain; delta-pulse disturbance D):

X_0 = P_0 [V + D e^{-sT}]

X_1 = P_1 (D + P_{01} X_0 H_0) / (1 - P_1 P_{01} H V)

X_0 = e^{-sT} D

Demanding that the reflex input vanishes gives

H V = -P_1^{-1} e^{-sT} / (1 - P_{01} e^{-sT})    (a pure phase shift)

The learner has learned the inverse transfer function of the world and can therefore compensate the disturbance at the summation node!

[Block diagram: the disturbance D enters through P_1; the learned path contributes -e^{-sT} D against the delayed disturbance e^{-sT} D, so the summation node outputs 0.]

The outer path has, through learning, built a forward model of the inner path.

This is some kind of "holy grail" for an engineer, called: "Model-Free Feed-Forward Compensation".

For example: if you want to keep the temperature in an outdoor container constant, you can employ a thermostat.

You may, however, also first measure how the sun warms the liquid in the container, relating outside temperature to inside temperature (a calibration curve, "Eichkurve").

From this curve you can extract a control signal for cooling the liquid and react BEFORE the liquid inside starts warming up.

As you react BEFORE, this is called Feed-Forward Compensation (FFC). As you have used a curve, you have performed Model-Based FFC. Our robot learns the model, hence we do Model-Free FFC.
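A small sketch of the difference (toy heat dynamics; every number is invented): the feedback branch is the thermostat, while the feed-forward branch uses a calibration gain relating the outside disturbance to the required cooling, acting before the inside temperature has moved.

```python
# Sketch contrasting feedback-only control with feed-forward
# compensation (toy heat dynamics; all constants are invented).
def run(use_ffc, steps=300, setpoint=45.0, k_fb=0.3, k_ffc=0.08):
    inside, worst = setpoint, 0.0
    for t in range(steps):
        sun = 10.0 if 100 <= t < 200 else 0.0    # measurable disturbance
        cooling = k_fb * (setpoint - inside)     # thermostat (feedback)
        if use_ffc:
            cooling -= k_ffc * sun               # calibration-curve term
        inside += 0.05 * sun + 0.6 * cooling     # toy plant
        worst = max(worst, abs(inside - setpoint))
    return worst

print("feedback only:", round(run(False), 2))  # clear excursion
print("with FFC     :", round(run(True), 2))   # mostly cancelled
```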

Example of ISO instability as compared to ICO stability!

Remember the auto-correlation instability of ISO?

ISO: dρ₁/dt = μ u₁ v′
ICO: dρ₁/dt = μ u₁ u₀′
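A side-by-side sketch of the two rules on a low-pass filtered pulse pair (open loop; μ, the filter constant, and the pulse times are invented). The crucial difference: the ISO increment contains the output v, which itself depends on ρ1, so the update feeds back on itself through the auto-correlation term u1·u1′; the ICO increment depends only on the cross-correlation of u1 with the reflex input u0.

```python
import numpy as np

# Side-by-side sketch of the ISO and ICO updates on a pulse pair
# (open loop; mu, the filter constant, and pulse times are invented).
def learn(rule, mu=0.1, trials=100, T=5):
    rho0, rho1 = 1.0, 0.0
    for _ in range(trials):
        u1, u0 = np.zeros(60), np.zeros(60)
        for t in range(1, 60):
            u1[t] = 0.9 * u1[t-1] + (1.0 if t == 10 else 0.0)
            u0[t] = 0.9 * u0[t-1] + (1.0 if t == 10 + T else 0.0)
        v = rho0 * u0 + rho1 * u1            # output of the neuron
        dv = np.diff(v, prepend=0.0)
        du0 = np.diff(u0, prepend=0.0)
        if rule == "ISO":
            rho1 += mu * np.sum(u1 * dv)     # contains a u1*u1' self-term
        else:
            rho1 += mu * np.sum(u1 * du0)    # pure u1-u0 cross-correlation
    return rho1

# ISO's increment depends on rho1 itself, so the weight grows
# exponentially (auto-correlation instability); ICO's increment is
# rho1-independent and, in the closed loop, vanishes once u0 = 0.
print("ISO:", learn("ISO"))
print("ICO:", learn("ICO"))
```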

Temperature compensation before & after ICO learning

[Figure: temperature [°C] (41-51) vs. time (up to 17000 s); setpoint 45 °C; disturbances: a 40 s heat pulse and added cold water; control parameters shown alongside.]

Old ISO-Learning

[Figure: weights (-0.2 to 0.1) and temperature (42-50 °C) vs. time (0-800 s). Parameters: setpoint 45; pulse 12 s; learning rate 7.5e-7; gain 150; 15 filters; water flow 750 ml in 60 s; water temperature 17 °C; control as in "ico gut08". The control run shows that the setpoint is extremely hard to reach; learning also helps a lot here.]

New ICO-Learning

[Figure: weights and temperature vs. time for two runs. Run 1: setpoint 60; pulse 10 s; learning rate 4e-8; gain 250; water flow 400 ml/min; water temperature 17 °C; full compensation. Run 2: setpoint 70; pulse 20 s; learning rate 4e-7; gain 250; water flow 750 ml in 60 s; water temperature 17 °C; ONE-SHOT learning with over-compensation.]

Statistical analysis:

• Conventional differential Hebbian learning (ISO): one-shot learning in some instances; highly unstable.
• Input correlation learning (ICO): one-shot learning common; stable in a wide domain; stable learning possible with an about 10-100-fold higher learning rate.

Two more applications: RunBot (with different convergence conditions!)

Walking is a very difficult problem as such and requires a good, multi-layered control structure, on top of which learning can then take place.

An excursion into Multi-Layered Neural Network Control

Some statements:

• Animals are behaving agents. They receive inputs from MANY sensors and have to control the reaction of MANY muscles.

• This is a formidable problem, which our brain solves through many control layers.

• Attempts to achieve this with robots have so far not been very successful, as network control is somewhat fuzzy and unpredictable.

Adaptive Control during Walking: Three Loops

[Diagram: central control (brain) — terrain control → spinal cord (oscillations) — step control → motor neurons & sensors (reflex generation) — muscle length, ground contact → muscles & skeleton (biomechanics).]

Passive Walking Properties

Muscles & skeleton (biomechanics): self-stabilization by passive biomechanical properties (pendulum dynamics).

[Figure: panels (A)-(D), phase-plane plots of the passive walking motion.]

RunBot's Network of the Lowest Loop: Leg Control (Reflexive Control, cf. Cruse)

Motor neurons & sensors (reflex generation) acting on muscles & skeleton (biomechanics); feedback via muscle length and ground contact.

[Circuit diagram: stretch receptors AL/AR for the anterior angle of the left/right hip and ground-contact sensor neurons GL/GR drive extensor and flexor sensor neurons (ES, FS), which drive extensor and flexor motor neurons (EM, FM) of the left and right hip and knee motors through excitatory and inhibitory synapses (weights roughly between -30 and +15).]
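As a flavor of the building blocks of this network, here is a sketch of one antagonistic sensor-to-motor-neuron pair (a sigmoid rate neuron; the weights and thresholds here are invented, not the ones from the diagram): ground contact excites the extensor and inhibits the flexor, so each leg switches reflexively between stance and swing.

```python
import math

# Minimal sketch of one antagonistic reflex pair (numbers invented).
def rate(x):                           # sigmoid rate function
    return 1.0 / (1.0 + math.exp(-x))

def reflex_pair(ground_contact):
    g = 1.0 if ground_contact else 0.0
    extensor = rate(10.0 * g - 5.0)    # excited by ground contact
    flexor   = rate(-10.0 * g + 5.0)   # inhibited by ground contact
    return extensor, flexor

print(reflex_pair(True))    # stance: extensor active
print(reflex_pair(False))   # swing:  flexor active
```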

Instantaneous Parameter Switching

[Figure: (A) walking snapshots (1)-(13); (B) joint angles over time (1-8 s).]

Body (UBC) Control of RunBot

[Circuit: the body motor M is driven by an extensor/flexor neuron pair (E, F) with weights -1 and 1; UBC = upper body component.]

Long-loop reflex of one of the upper loops (central control, brain: terrain control)

Before or during a fall: leaning forward of trunk and arms.

Long-loop reflex of the uppermost loop: forward-leaning UBC vs. backward-leaning UBC.

The Leg Control as Target of the Learning

[The same leg-control circuit as above (AL/AR, GL/GR, ES/FS, EM/FM, hip and knee motors with excitatory and inhibitory synapses); learning now modifies synapses acting on this reflexive network.]

Learning in RunBot

[Circuit: learner neurons L1-L6 receive the infrared signal (IR) and the sensor signal AS, and project through changeable synapses onto target neurons of the leg control (left/right hips and knees: S, E, F neurons) and of the body control (UBC). Legend: changeable synapse; excitatory synapse; inhibitory synapse; inhibitory synapse (shunting, divisive); fixed weights ±1.]

[Diagram: a single learner neuron L6. The early input u1 (IR signal) enters through the plastic weight ρ6; the late input u0 (AS signal) enters through the fixed weight ρ60 = 1. The weight changes as dρ6/dt = μ u1 d(u0)/dt.]

Differential Hebbian learning related to spike-timing dependent plasticity.
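A sketch of a single such learner synapse following the diagram (signal shapes and constants are invented): the IR signal sees the slope a few steps before the acceleration signal registers it, and ρ6 grows with the product of the filtered early input and the derivative of the late input.

```python
# Sketch of a single RunBot learner synapse (illustrative signals).
# Early input u1: filtered infrared (IR) signal, seeing the slope first.
# Late input u0:  filtered acceleration signal (AS), felt on the slope;
# its fixed weight is rho60 = 1. Differential Hebbian update:
#     drho6/dt = mu * u1 * du0/dt

mu, rho6 = 0.05, 0.0
u1 = u0 = u0_prev = 0.0
for t in range(100):
    ir = 1.0 if 20 <= t < 25 else 0.0       # slope seen early
    accel = 1.0 if 30 <= t < 35 else 0.0    # slope felt later
    u1 = 0.9 * u1 + ir                      # low-pass filters
    u0 = 0.9 * u0 + accel
    rho6 += mu * u1 * (u0 - u0_prev)        # differential Hebbian step
    u0_prev = u0
print(f"learned weight rho6 = {rho6:.3f}")
```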

RunBot: Learning to climb a slope

Spektrum der Wissenschaft, Sept. 06

Human walking versus machine walking

ASIMO by Honda

Change of Gait when Walking Upwards

[Figure: RunBot's joints. Panels (A)-(D): left/right hip joint angles (60-150 deg) and left/right knee joint angles (80-200 deg) vs. time; gait diagram (stance/swing for left and right leg) across lower floor, ramp, and upper floor. Before learning RunBot falls at the ramp; after learning it manages the climb.]

[Figure: panels (E)-(H). Accelerometer sensor signal, infrared sensor signal, and upper-body component (deg) vs. time (4-28 s); the learning-relevant weights ρ1-ρ6 grow and then stabilize. At first the reaction comes too late; after learning it comes early enough.]

X0 = 0: the weights stabilize together with the behaviour.

Abbreviations: AL (AR): stretch receptor for the anterior angle of the left (right) hip; GL (GR): sensor neuron for ground contact of the left (right) foot; EI (FI): extensor (flexor) reflex interneuron; EM (FM): extensor (flexor) reflex motor neuron; ES (FS): extensor (flexor) reflex sensor neuron.

A different Learning Control Circuitry

[Circuit: the same hip/knee network (AL, AR, GL, GR, ES, FS, EM, FM), now with learning-control neurons L and interneurons I inserted before the motor neurons M. The early input x1 enters with plastic weight ρ1; the late reflex input x0 enters with a fixed weight; V is the learner output. Legend: sensor neuron or receptor; learning-control neuron L; interneuron I; motor neuron; inhibitory synapse (subtractive); inhibitory synapse (shunting, divisive); excitatory synapse.]

Motor Neuron Signal Structure (spike rate): Stability of STDP

[Diagram: the motor neuron (ES or EM) receives x0 through filter h with fixed weight ρ0 and x1 through filter h with plastic weight ρ1, producing output V; the weights change according to an STDP-like rule. Learning reverses the pulse sequence of x0 and x1 between left and right leg, which keeps the STDP-like learning stable.]


The actual experiment. Note: here our convergence condition is T = 0 (oscillatory!) and NOT, as usual, X0 = 0.

Speed Change by Learning

[Figure: panels (A)-(D). Speed (cm/s), gain, threshold (deg), and the weight ρ1 vs. time t (s). Note the oscillations of the weight: T ≈ 0!]

Stable learning is possible with an about 10-100-fold higher learning rate.

ISO: dρ₁/dt = μ u₁ v′
ICO: dρ₁/dt = μ u₁ u₀′

Driving School: learn to follow a track and develop receptive fields (RFs) in a closed-loop behavioral context.

ISO learning for temporal sequence learning (Porr and Wörgötter, 2003). Differential Hebbian learning rule:

dρᵢ/dt = μ uᵢ(t) v′(t)    (filtered input × derivative of the output)

Central features:
• Filtering of all inputs with low-pass filters (membrane property)
• Reproduces the asymmetric weight-change curve

The same differential Hebbian rule can be used for STDP, with T the interval between the paired inputs.
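To see how the low-pass filter ("membrane property") yields the asymmetric weight-change curve, here is a sketch that emulates an STDP pairing experiment with the rule above (filter constant, pulse times, and μ are invented): it computes the total weight change as a function of the pairing interval T.

```python
import numpy as np

# Sketch: asymmetric weight-change curve from the differential Hebbian
# rule with low-pass filtered inputs (all constants invented).
def lowpass(x, a=0.9):
    u = np.zeros_like(x)
    for t in range(1, len(x)):
        u[t] = a * u[t-1] + x[t]
    return u

def weight_change(T, n=200, mu=1.0):
    x1, x0 = np.zeros(n), np.zeros(n)
    x1[100] = 1.0                    # "presynaptic" pulse (early input)
    x0[100 + T] = 1.0                # "postsynaptic" pulse, shifted by T
    u1, u0 = lowpass(x1), lowpass(x0)
    v = u0                           # output dominated by the x0 pathway
    dv = np.diff(v, prepend=0.0)
    return mu * np.sum(u1 * dv)      # total of drho1/dt = mu * u1 * v'

for T in (-20, -10, -2, 2, 10, 20):
    print(f"T = {T:+3d}: total weight change = {weight_change(T):+.3f}")
```

Positive intervals (pre before post) give a positive total change, negative intervals a negative one: the asymmetric STDP-like curve.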

Transfer of a computational neuroscience approach to a technical domain: modelling STDP (plasticity) and using this approach (in a simplified way) to control behavioral learning.

An ecological approach to some aspects of theoretical neuroscience.

This is a closed-loop system before learning. Adding a secondary loop makes it a closed-loop system during learning: the delay constitutes the temporal sequence of sensor events!

Pavlov in "real life"!

[Block diagram: set-point and disturbances enter the controller, which sends control signals to the controlled system; feedback closes the loop. X0 is the late (reflex) input, X1 the early (anticipatory) input.]

What has happened to the system during learning?

The primary reflex reaction has effectively been eliminated and replaced by an anticipatory action.
