Closed Loop Temporal Sequence Learning: Learning to act in response to sequences of sensor events


Page 1: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Closed Loop Temporal Sequence Learning

Learning to act in response to sequences of sensor events

Page 2: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Overview of the different methods — "You are here!"

[Overview diagram spanning Machine Learning, Classical Conditioning and Synaptic Plasticity:
• Evaluative feedback (rewards), example-based — REINFORCEMENT LEARNING: Dynamic Programming (Bellman Eq.), Monte Carlo Control, Q-Learning, SARSA, TD(λ) (often λ = 0), TD(1), TD(0), eligibility traces, Actor/Critic (technical & basal ganglia), neuronal TD models ("Critic"), neuronal TD formalism, neuronal reward systems (basal ganglia). Goal: anticipatory control of actions and prediction of values.
• Non-evaluative feedback (correlations), correlation-based — UN-SUPERVISED LEARNING: δ-rule (supervised learning), Rescorla/Wagner, Hebb rule, differential Hebb rule ("fast" and "slow"), ISO-Learning, ISO model of STDP, ISO-Control, correlation-based control (non-evaluative), STDP models (biophysical & network), LTP (LTD = anti), biophysics of synaptic plasticity (dopamine, glutamate). Goal: correlation of signals.]

Page 3: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Natural temporal sequences in life and in "control" situations

Real life:
• Heat radiation predicts pain when touching a hot surface.
• The sound of prey may precede its smell, which in turn precedes its taste.

Control:
• Force precedes position change.
• An electrical (disturbance) pulse may precede a change in a controlled plant.

Psychological experiment:
• Pavlovian and/or operant conditioning: the bell precedes the food.

Page 4: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

What we had so far:

Early input: "Bell" — late input: "Food".

Differential Hebbian learning rule:

dωi/dt = μ ui(t) v′(t)
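A minimal sketch of one discretized update of this rule for the bell/food pairing; the numbers (μ, the residual bell trace, the fixed food weight) are invented for the example and are not taken from the slides:

def diff_hebb_step(w_bell, u_bell, v_now, v_prev, mu=0.1):
    # One discretized step of dω/dt = μ u(t) v'(t) for the plastic "bell" weight,
    # with v'(t) approximated by v(t) - v(t-1).
    return w_bell + mu * u_bell * (v_now - v_prev)

w_bell, w_food = 0.0, 1.0        # the "food" (reflex) weight is kept fixed here
u_bell = 0.4                     # filtered bell trace, still active when the food arrives
v_prev = w_bell * u_bell + w_food * 0.0   # output just before the food
v_now  = w_bell * u_bell + w_food * 1.0   # output once the food input has arrived
w_bell = diff_hebb_step(w_bell, u_bell, v_now, v_prev)
print(w_bell)                    # > 0: the early "bell" pathway has been strengthened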

Page 5: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

This is an open-loop system!

Temporal sequence (Pavlov, 1927): Bell (conditioned input, Sensor 2) → Food → Salivation.

Page 6: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Closed loop: an adaptable neuron embedded in its environment (Env.) — sensing and behaving.

Page 7: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

How can we assure behavioral and learning convergence?

This is achieved by starting with a stable reflex-like action and learning to supersede it by an anticipatory action.

Remove before being hit!

Page 8: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Robot application

Early input: "Vision" — late input: "Bump".

Page 9: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Reflex only (compare to an electronic closed-loop controller!)

[Diagram: controller and controlled system with control signals, feedback, disturbances and set-point X0.]

This structure assures initial (behavioral) stability ("homeostasis").

Think of a thermostat!

Page 10: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

This is a closed-loop system before learning.

The basic control structure: schematic diagram of a pure reflex loop (bump → retraction reflex).

Page 11: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Robot application

Learning goal: correlate the vision signals with the touch signals and navigate without collisions.

Initially built-in behavior: retraction reaction whenever an obstacle is touched.

Page 12: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Let's look at the reflex first.

Page 13: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Closed Loop TS Learning

This is an open-loop system.
This is a closed-loop system before learning.
This is a closed-loop system during learning.

The T represents the temporal delay between vision and bump.

Page 14: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Let's see how learning continues.

Page 15: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Closed Loop TS Learning

This is the system after learning.

Anatomically this loop still exists, but ideally it should never be active again!

Page 16: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

What has the learning done?

Elimination of the late reflex input corresponds mathematically to the condition:

X0 = 0

This was the major convergence condition for the ISO/ICO rules (the other one was T = 0).

What is interesting here is that the learning-induced behavior self-generates this condition. This is non-trivial, as it assures that the synaptic weight AND the behavior stabilize at exactly the same moment in time.
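A minimal simulation sketch of this point, with invented parameters (learning rate, delay, reflex threshold): the weight grows only while the reflex ("bump") still occurs, and freezes in the very trial in which the anticipatory reaction becomes strong enough to keep X0 at zero.

import numpy as np

def lowpass(x, tau=10.0):
    # first-order low-pass filter (leaky integrator) applied to a raw sensor trace
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

mu = 100.0        # learning rate (invented for the sketch)
T = 20            # delay (in steps) between early "vision" and late "bump"
threshold = 0.25  # anticipatory strength needed to avoid the bump (invented)
rho1 = 0.0        # plastic weight of the early pathway
steps = 200

for trial in range(10):
    x1 = np.zeros(steps); x1[50] = 1.0          # early signal ("vision")
    x0 = np.zeros(steps)
    bumped = rho1 < threshold                   # reflex fires only while the learned
    if bumped:                                  # anticipatory reaction is too weak
        x0[50 + T] = 1.0                        # late reflex signal ("bump")
    u1, u0 = lowpass(x1), lowpass(x0)
    rho1 += mu * np.sum(u1[1:] * np.diff(u0))   # ICO: d rho1/dt = mu * u1 * u0'
    print(f"trial {trial:2d}  rho1 = {rho1:.3f}  bump = {bumped}")

As soon as the bump stays away, u0' is identically zero and the weight stops changing: weight and behavior stabilize together.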

Page 17: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

This is the system after learning.

What has happened in engineering terms?

The inner pathway has now become a pure feed-forward path.

Page 18: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Formally (calculations in the Laplace domain, for a delta-pulse disturbance D):

X0 = P0 [V + D e^(-sT)]

X1 = P1 (D + P01 X0 H0) / (1 - P1 P01 HV)

X0 = e^(-sT) D

HV = -P1^(-1) e^(-sT) / (1 - P01 e^(-sT))

Page 19: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

HV = -P1^(-1) e^(-sT) / (1 - P01 e^(-sT))

Phase shift: the learner has learned the inverse transfer function of the world and can therefore compensate the disturbance at the summation node!

[Diagram: the disturbance D reaches the summation node once directly (delayed, e^(-sT)) and once through the world P1 and the learned pathway (-e^(-sT)); the two contributions cancel to 0.]
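To make the compensation argument explicit, here is a minimal sketch of the condition at the summation node, assuming P0 = 1 and, in the first step, neglecting the cross-path P01 and the reflex branch H0 (keeping them only adds the denominator of the slide's full expression):

% The delta-pulse disturbance D reaches the reflex input X0 once with delay T and
% once through the world P1 and the learned pathway H_V; both must cancel:
\[
  X_0 \;=\; e^{-sT} D \;+\; H_V\, P_1\, D \;\overset{!}{=}\; 0
  \quad\Longrightarrow\quad
  H_V \;=\; -\,P_1^{-1}\, e^{-sT}.
\]
\[
  \text{Keeping the } P_{01}\text{-loop gives the full expression:}\qquad
  H_V \;=\; \frac{-\,P_1^{-1}\, e^{-sT}}{\,1 - P_{01}\, e^{-sT}\,}.
\]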

Page 20: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

The outer path has, through learning, built a forward model of the inner path.

This is some kind of "holy grail" for an engineer, called "Model-Free Feed-Forward Compensation".

For example: if you want to keep the temperature in an outdoor container constant, you can employ a thermostat.

You may, however, also first measure how the sun warms the liquid in the container, relating outside temperature to inside temperature (a calibration curve, "Eichkurve").

From this curve you can extract a control signal for cooling the liquid and react BEFORE the liquid inside starts warming up.

As you react BEFORE, this is called Feed-Forward Compensation (FFC).

As you have used a calibration curve, you have performed Model-Based FFC.

Our robot learns the model, hence we do Model-Free FFC.
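As an illustration of the difference between pure feedback and feed-forward compensation, here is a minimal simulation sketch; the toy plant model, gains and disturbance profile are invented for the example and are not the container experiment from the slide:

# Pure feedback reacts only AFTER the inside temperature has deviated; adding a
# feed-forward term that acts on the measured disturbance cancels it beforehand.
dt, steps, setpoint = 0.1, 600, 20.0
k_fb = 2.0                                       # thermostat (feedback) gain, invented
k_ff = 1.0                                       # feed-forward gain (inverse of the disturbance path)
for use_ff in (False, True):
    T_in, worst = setpoint, 0.0
    for t in range(steps):
        d = 5.0 if 200 <= t < 400 else 0.0       # "sun" heating pulse
        u = -k_fb * (T_in - setpoint)            # feedback: react to the error
        if use_ff:
            u += -k_ff * d                       # feed-forward: react to the cause
        T_in += dt * (d + u - 0.1 * (T_in - setpoint))   # toy plant dynamics
        worst = max(worst, abs(T_in - setpoint))
    label = "with" if use_ff else "without"
    print(f"{label} feed-forward: worst deviation = {worst:.2f} deg")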

Page 21: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Example of ISO instability as compared to ICO stability!

Remember the auto-correlation instability of ISO?

ISO:  dρ1/dt = μ u1 v′
ICO:  dρ1/dt = μ u1 u0′
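A minimal sketch of the auto-correlation problem, assuming simple first-order low-pass filters as a stand-in for the filter bank and invented parameter values: if only the early input fires and the reflex stays silent, the ISO weight still drifts (its own output enters v′), while the ICO weight stays put.

import numpy as np

def lowpass(x, tau=10.0):
    # first-order low-pass ("membrane") filter, a stand-in for the filter bank
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

# Repeated trials in which ONLY the early input fires; the reflex input stays silent,
# so ideally no weight should change at all.
steps = 300
x1 = np.zeros(steps); x1[50] = 1.0
x0 = np.zeros(steps)                          # reflex input never active
u1, u0 = lowpass(x1), lowpass(x0)
rho0, mu = 1.0, 1.0                           # invented values

for rule in ("ISO", "ICO"):
    rho1 = 0.5
    for trial in range(200):
        v = rho0 * u0 + rho1 * u1             # learner output
        target = v if rule == "ISO" else u0   # ISO correlates u1 with v', ICO with u0'
        rho1 += mu * np.sum(u1[1:] * np.diff(target))
    print(f"{rule}: rho1 after 200 reflex-free trials = {rho1:.3f}")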

Page 22: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Temperature compensation before & after ICO learning

[Figure: temperature trace relative to the setpoint level; disturbance: heat pulse; time scale 40 sec.]

Page 23: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Old ISO learning

[Figure: control parameters; temperature [°C] (axis 41-51), setpoint 45 °C, cold water added as disturbance; time scale 17000 sec.]

Page 24: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

New ICO learning

[Figure: weight (-0.2 to 0.1) and temperature (42-50 °C) over 200-800 sec; control run shown.]

Setpoint 45, pulse 12 sec, learning rate 7.5E-07, gain 150, n filters 15, flow of water: 750 ml in 60 sec, water temperature 17 deg. Control same as in icogut08. The control run shows that the setpoint is extremely hard to reach; learning helps a lot here as well.

Page 25: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

[Figure: two further ICO runs, weights and temperature traces over time (up to ~3500 s and ~2500 s); control runs shown for comparison. Panel annotations: full compensation; one-shot learning; over-compensation.]

Run 1: setpoint 60, pulse 10 sec, learning rate 4E-08, gain 250, flow of water 400 ml/min, water temperature 17 deg.

Run 2: setpoint 70, pulse 20 sec, learning rate 4E-07, gain 250, flow of water 750 ml in 60 sec, water temperature 17 deg. Comment: ONE-SHOT learning.

Page 26: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Statistical analysis: conventional differential Hebbian learning

One-shot learning (some instances); highly unstable.

Page 27: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Statistical analysis: input correlation learning

Stable in a wide domain; one-shot learning (common).

Page 28: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Stable learning is possible with an about 10-100-fold higher learning rate.

Page 29: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Two more applications: RunBot — with different convergence conditions!

Walking as such is a very difficult problem and requires a good, multi-layered control structure; only on top of this can learning take place.

Page 30: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

An excursion into multi-layered neural network control

Some statements:

• Animals are behaving agents. They receive inputs from MANY sensors and have to control the reactions of MANY muscles.

• This is a terrific problem, which our brain solves through many control layers.

• Attempts to achieve this with robots have so far not been very successful, as network control is somewhat fuzzy and unpredictable.

Page 31: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Adaptive control during walking: three loops

[Diagram of the control hierarchy:
Central control (brain) — terrain control
Spinal cord (oscillations) — step control
Motor neurons & sensors (reflex generation) — muscle length, ground contact
Muscles & skeleton (biomechanics)]

Page 32: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Muscles & skeleton (biomechanics): self-stabilization by passive biomechanical properties (pendulum).

Page 33: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Passive walking properties

[Figure: passive walking properties, panels (A)-(D).]

Page 34: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

RunBot's network of the lowest loop

Motor neurons & sensors (reflex generation) — muscle length — muscles & skeleton (biomechanics)

[Network diagram: ground-contact sensors GL, GR and hip stretch receptors AL, AR feed extensor/flexor sensor neurons (ES, FS), which drive extensor/flexor motor neurons (EM, FM) for the left/right hip and knee motors.]

Page 35: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Leg control of RunBot: reflexive control (cf. Cruse)

[Network diagram: for each leg, sensor neurons/receptors drive the hip and knee motor neurons through excitatory and inhibitory synapses (weights such as ±1, ±10, ±15, -30).]

Page 36: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Instantaneous parameter switching

[Figure: (A) gait frames (1)-(13); (B) traces over time, 1-8 s.]

Page 37: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Body (UBC) control of RunBot

[Diagram: the body motor (upper body component, UBC) is driven by an extensor/flexor neuron pair with weights -1 and 1.]

Page 38: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Long-loop reflex of one of the upper loops

Central control (brain): terrain control.

Before or during a fall: leaning forward of the trunk and arms.

Page 39: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Long-loop reflex of the uppermost loop

Central control (brain): terrain control.

Forward-leaning UBC versus backward-leaning UBC.

Page 40: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

The leg control as target of the learning

[Same leg-control network diagram as on page 35.]

Page 41: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Learning in RunBot

[Diagram: learner neurons L1-L6 receive the AS and IR (infrared) sensor signals; each learner has a plastic ("changeable") synapse ρ (e.g. ρ6 for L6, driven by AS) and a fixed reflex synapse (ρ60 = 1, driven by IR). The learners project through excitatory, (subtractive) inhibitory and shunting (divisive) inhibitory synapses onto the target neurons of the leg control (left/right hips and knees, E/F) and of the body control (UBC). The weight change is computed as a d/dt correlation of u1 with u0.]

Differential Hebbian learning related to spike-timing-dependent plasticity.
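A minimal sketch of such a learner neuron's online update, with illustrative signal names (u1 for the filtered AS input, u0 for the filtered IR reflex input) and an invented learning rate:

def ico_step(rho, u1, u0, u0_prev, mu=1e-4):
    # One online update of a plastic weight (e.g. rho6 of learner L6): the change is
    # the product of the filtered early signal u1 (here standing for the AS input) and
    # the temporal derivative of the filtered reflex signal u0 (the IR input).
    # mu and the signal names are illustrative, not RunBot's actual values.
    return rho + mu * u1 * (u0 - u0_prev)

def learner_output(rho, u1, u0, rho_reflex=1.0):
    # The learner drives its target neuron with both pathways; the reflex weight
    # (rho60 = 1 in the diagram) stays fixed, only rho is plastic.
    return rho * u1 + rho_reflex * u0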

Page 42: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

RunBot: Learning to climb a slope

Spektrum der Wissenschaft, Sept. 06

Page 43: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Human walking versus machine walking

ASIMO by Honda

Page 44: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

RunBot's joints

[Figure: panels A-D — left/right hip joint angles (deg, ~60-150) and left/right knee joint angles (deg, ~80-200) recorded on the lower floor, the ramp and the upper floor; annotations "falls" versus "manages" on the ramp.]

Page 45: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Change of gait when walking upwards

[Figure: stance/swing phases of the left and right leg over time, phases (1)-(3).]

Page 46: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Learning-relevant parameters

[Figure: accelerometer sensor signal, infrared sensor signal and upper-body component (deg) over time (4-28 s), together with the weights ρ1-ρ6 (panels E-H); annotations "too late" versus "early enough".]

X0 = 0: the weights stabilize together with the behaviour.

Page 47: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

A different learning control circuitry

[Network diagram: learning-control neurons L with interneurons I and motor neurons M; the early input X1 arrives through a plastic weight ρ1, the reflex input X0 through a fixed pathway; excitatory, (subtractive) inhibitory and shunting (divisive) inhibitory synapses; GL, GR, AL, AR and the ES/FS, EM/FM neurons drive the left/right hip and knee motors.]

AL (AR): stretch receptor for the anterior angle of the left (right) hip; GL (GR): sensor neuron for ground contact of the left (right) foot; EI (FI): extensor (flexor) reflex interneuron; EM (FM): extensor (flexor) reflex motor neuron; ES (FS): extensor (flexor) reflex sensor neuron.

Page 48: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Stability of STDP

Motor neuron signal structure (spike rate):

[Figure: pulse trains x0 and x1 of the left and right leg (ES or EM), with weights ρ0, ρ1 and filters h feeding the output V; learning reverses the pulse sequence (STDP).]


Page 50: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

The actual experiment. Note: here our convergence condition has been T = 0 (oscillatory!) and NOT, as usual, X0 = 0.

Page 51: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Speed change by learning

[Figure: panels A-D — speed (cm/s), gain, threshold (deg) and the weight ρ1 over time t [s].]

Note the oscillations! T ≈ 0!

Page 52: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events
Page 53: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events
Page 54: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Stable learning is possible with an about 10-100-fold higher learning rate.

ISO:  dρ1/dt = μ u1 v′
ICO:  dρ1/dt = μ u1 u0′

Page 55: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Driving school: learn to follow a track and develop receptive fields (RFs) in a closed-loop behavioral context.

Page 56: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

ISO learning for temporal sequence learning (Porr and Wörgötter, 2003):

dωi/dt = μ ui(t) v′(t)    (ui: filtered input, v′: derivative of the output)

Central features:
• Filtering of all inputs with low-pass filters (membrane property)
• Reproduces the asymmetric weight-change curve (as a function of the interval T between the inputs)

Differential Hebbian learning used for STDP
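A minimal sketch of how this rule produces an asymmetric, STDP-like weight-change curve, assuming first-order low-pass filters and invented parameters; the plastic weight starts at zero so the curve reflects the cross-correlation term:

import numpy as np

def lowpass(x, tau=10.0):
    # first-order low-pass ("membrane") filter
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

def weight_change(T, steps=400, mu=1.0, rho0=1.0, rho1=0.0):
    # Total change of the plastic weight rho1 for a single pulse pair separated by T steps.
    # T > 0: the plastic input x1 leads the fixed-weight input x0; T < 0: it lags.
    x1 = np.zeros(steps); x1[150] = 1.0
    x0 = np.zeros(steps); x0[150 + T] = 1.0
    u1, u0 = lowpass(x1), lowpass(x0)
    v = rho0 * u0 + rho1 * u1                  # output
    return mu * np.sum(u1[1:] * np.diff(v))    # ISO rule integrated over the trial

for T in (-30, -10, 10, 30):
    print(f"T = {T:+3d}  ->  d rho1 = {weight_change(T):+.5f}")

The change is positive when the plastic input leads (T > 0) and negative when it lags (T < 0), i.e. the asymmetric curve mentioned above.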

Page 57: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Differential Hebbian learning rule:

dωi/dt = μ ui(t) v′(t)

Transfer of a computational neuroscience approach to a technical domain: modelling STDP (plasticity) and using this approach (in a simplified way) to control behavioral learning.

An ecological approach to some aspects of theoretical neuroscience.

Page 58: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Adding a secondary loop

This is a closed-loop system before learning.
This is a closed-loop system during learning.

This delay constitutes the temporal sequence of sensor events!

Pavlov in "real life"!

Page 59: Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

What has happened to the system during learning?

[Diagram: controller and controlled system with control signals, feedback, disturbances and set-point; inputs X0 (late) and X1 (early).]

The primary reflex reaction has effectively been eliminated and replaced by an anticipatory action.