Computational Neuromodulation

Peter Dayan
Gatsby Computational Neuroscience Unit, University College London

Nathaniel Daw, Sham Kakade, Read Montague, John O’Doherty, Wolfram Schultz, Ben Seymour, Terry Sejnowski, Angela Yu

2

5. Diseases of the Will

• Contemplators
• Bibliophiles and Polyglots
• Megalomaniacs
• Instrument addicts
• Misfits
• Theorists

3

There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature.

As might be expected, disappointments plague the theorist…

Theorists

4

Computation and the Brain

• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem

• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis

5

Conditioning

• Ethology

• Psychology
– classical/operant conditioning

• Computation
– dynamic programming
– Kalman filtering

• Algorithm
– TD/delta rules

• Neurobiology

neuromodulators;

amygdala; OFC; nucleus accumbens; dorsal striatum

prediction: of important events

control: in the light of those predictions

policy evaluation

policy improvement

6

Dopamine

[Figure: dopamine cell responses in three conditions: no prediction; prediction, reward; prediction, no reward. Event markers R and L. (Schultz et al)]

• drug addiction, self-stimulation

• effect of antagonists

• effect on vigour

• link to action

• `scalar’ signal

7

Prediction, but What Sort?

• Sutton: predict sum future reward

$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$

TD error:

$$\delta(t) = r(t) + V(t+1) - V(t)$$
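A minimal sketch of how the TD error above behaves over repeated trials, assuming a tabular value per time step, one cue and one reward per trial, and a cue that arrives unpredictably (so pre-cue values are held at zero). The function name `td_trials` and all parameter values are illustrative choices, not taken from the slides.

```python
import numpy as np

def td_trials(n_trials=200, n_steps=10, cue_t=2, reward_t=7, alpha=0.2):
    """Tabular TD learning over the time steps of a repeated trial.

    Returns the learned values V and the TD error delta[trial, t].
    """
    r = np.zeros(n_steps)
    r[reward_t] = 1.0                      # one reward per trial
    V = np.zeros(n_steps + 1)              # V[n_steps] is the end of the trial, fixed at 0
    delta = np.zeros((n_trials, n_steps))
    for trial in range(n_trials):
        for t in range(n_steps):
            # delta(t) = r(t) + V(t+1) - V(t)
            delta[trial, t] = r[t] + V[t + 1] - V[t]
            if t >= cue_t:
                # Assumption: only post-cue steps carry a prediction,
                # because the cue itself cannot be anticipated.
                V[t] += alpha * delta[trial, t]
    return V, delta

V, delta = td_trials()
```

On the first trial the error sits at the reward; after learning it has moved to the step at which the cue appears (index `cue_t - 1` in this indexing), and the error at the reward is close to zero.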

8

Rewards rather than Punishments



[Figure: model traces, including V(t), with event markers R and L]

dopamine cells in VTA/SNc (Schultz et al)

9

Prediction, but What Sort?

• Sutton: predict sum future reward

$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$

• Watkins: policy evaluation

$$V(x) = E_{a \sim \pi(x)}\Big[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \Big]$$
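Policy evaluation of this kind can be sketched as a fixed-point iteration on V. The array layout (`r[x, a]`, `P[a, x, y]`, `pi[x, a]`), the function name `evaluate_policy` and the three-state example are assumptions made for illustration; the sum is undiscounted, so the example includes an absorbing state with zero reward.

```python
import numpy as np

def evaluate_policy(r, P, pi, n_sweeps=50):
    """Iterate V(x) = E_{a ~ pi(x)}[ r(x,a) + sum_y P_xy(a) V(y) ].

    r[x, a]    expected immediate reward
    P[a, x, y] probability of moving from x to y under action a
    pi[x, a]   probability of choosing a in x
    """
    V = np.zeros(r.shape[0])
    for _ in range(n_sweeps):
        # Q[x, a] = r(x, a) + sum_y P_xy(a) V(y)
        Q = r + np.einsum("axy,y->xa", P, V)
        V = (pi * Q).sum(axis=1)
    return V

# Hypothetical chain: state 2 is absorbing; action 0 steps forward,
# action 1 jumps straight to the absorbing state.
r = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
P = np.zeros((2, 3, 3))
P[0, 0, 1] = P[0, 1, 2] = P[0, 2, 2] = 1.0
P[1, :, 2] = 1.0
pi = np.full((3, 2), 0.5)            # choose either action with equal probability
V = evaluate_policy(r, P, pi)
```

Under the equal-probability policy, state 1 is worth 0.5 (half the time the reward of 1 is collected) and state 0 is worth 0.25.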

10

Policy Improvement

• Sutton: define $\pi(x;M)$, do R-M on:

$$E_{a \sim \pi(x;M)}\Big[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \Big] - V(x)$$

uses the same TD error $\delta(t)$

• Watkins: value iteration with $Q(x,a)$:

$$Q^*(x,a) = r(x,a) + \sum_y P_{xy}(a) \max_b Q^*(y,b)$$

$$\delta_Q(t) = r(t) + \max_b Q(t+1,b) - Q(t,a)$$
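The Q-value error on this slide can be turned into a tabular learning rule. The sketch below assumes deterministic transitions, purely random action selection, and a small hypothetical three-state task; `q_learning`, `next_state`, `reward` and the parameter values are illustrative names, not from the slides.

```python
import numpy as np

def q_learning(next_state, reward, terminal, n_episodes=500, alpha=0.5, seed=0):
    """Tabular Q-learning driven by r + max_b Q(y, b) - Q(x, a)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = reward.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        x = 0                                   # every episode starts in state 0
        while x != terminal:
            a = rng.integers(n_actions)         # random exploration (assumption)
            y = next_state[x, a]
            delta = reward[x, a] + Q[y].max() - Q[x, a]
            Q[x, a] += alpha * delta
            x = y
    return Q

# Hypothetical task: action 0 steps 0 -> 1 -> 2, action 1 jumps to state 2;
# the only reward is for taking action 0 in state 1.
next_state = np.array([[1, 2], [2, 2], [2, 2]])
reward = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
Q = q_learning(next_state, reward, terminal=2)
```

Because the update takes the maximum over next actions, the learned values approach those of the best policy even though behaviour here is random.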

11

Active Issues

• exploration/exploitation
• model-based (PFC)/cached (striatal) methods
• motivational influences
• vigour
• hierarchical control (PFC)
• hyperbolic discounting, Pavlovian misbehavior and ‘the will’
• representational learning
• appetitive/aversive opponency
• links with behavioural economics

12




• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs

13

Uncertainty

Computational functions of uncertainty:

• weaken top-down influence over sensory processing
• promote learning about the relevant representations

We focus on two different kinds of uncertainties:

• expected uncertainty from known variability or ignorance (ACh)
• unexpected uncertainty due to gross mismatch between prediction and observation (NE)

14

Norepinephrine

• vigilance

• reversals

• modulates plasticity? exploration?

• scalar

15

Aston-Jones: Target Detection

detect and react to a rare target amongst common distractors

• elevated tonic activity for reversal
• activated by rare target (and reverses)
• not reward/stimulus related? more response related?

16

Vigilance Task

• variable time in start
• η controls confusability
• one single run
• cumulative is clearer
• exact inference
• effect of 80% prior

18

Phasic NE

• onset response from timing uncertainty (SET)

• growth as P(target)/0.2 rises

• act when P(target)=0.95

• stop if P(target)=0.01

• arbitrarily set NE=0 after 5 timesteps (small prob of reflexive action)
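The thresholds listed above can be illustrated with a simplified sequential-inference sketch. It assumes a stimulus that is present from the first step, binary observations that favour the true stimulus with probability η, and a prior P(target) = 0.2; the variable start time and the reset of NE after 5 timesteps are not modelled. The name `run_trial` and the observation model are assumptions, not the model used for the slides.

```python
import numpy as np

def run_trial(is_target, eta=0.675, prior=0.2, act_at=0.95, stop_at=0.01,
              max_steps=200, seed=0):
    """Accumulate evidence for 'target' and return (outcome, steps, ne_trace)."""
    rng = np.random.default_rng(seed)
    p = prior                                   # current P(target | observations)
    ne_trace = []
    for t in range(max_steps):
        # Each observation points to the true stimulus with probability eta.
        says_target = rng.random() < (eta if is_target else 1.0 - eta)
        like_target = eta if says_target else 1.0 - eta
        like_distractor = 1.0 - eta if says_target else eta
        p = like_target * p / (like_target * p + like_distractor * (1.0 - p))
        ne_trace.append(p / prior)              # phasic signal grows as P(target)/0.2
        if p >= act_at:
            return "respond", t + 1, ne_trace
        if p <= stop_at:
            return "withhold", t + 1, ne_trace
    return "undecided", max_steps, ne_trace

outcome, steps, ne_trace = run_trial(is_target=True)
```

Lowering `eta` toward 0.5 makes the observations more confusable, so the posterior takes longer to reach either threshold.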

19

Four Types of Trial

[Figure: four panels, one per trial type, labelled 19%, 1.5%, 1% and 77%]

fall is rather arbitrary

20

Response Locking

slightly flatters the model – since no further response variability

21

Interrupts/Resets (SB)

LC

PFC/ACC

22

Active Issues

• approximate inference strategy
• interaction with expected uncertainty (ACh)
• other representations of uncertainty
• finer gradations of ignorance

23




• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs

24

Computational Neuromodulation

• general: excitability, signal/noise ratios

• specific: prediction errors, uncertainty signals

25

Learning and Inference

• Learning: predict; control

∆ weight ∝ (learning rate) × (error) × (stimulus)

– dopamine: phasic prediction error for future reward

– serotonin: phasic prediction error for future punishment

– acetylcholine: expected uncertainty boosts learning

– norepinephrine: unexpected uncertainty boosts learning
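The weight-change rule above can be written as a one-line update. This is a sketch assuming a linear prediction `w @ stimulus`; the function name `delta_rule_update` is illustrative, and how a neuromodulator sets `learning_rate` is left open here.

```python
import numpy as np

def delta_rule_update(w, stimulus, outcome, learning_rate):
    """Return updated weights and the error for one trial.

    change in weight = (learning rate) x (error) x (stimulus)
    """
    error = outcome - w @ stimulus              # prediction error for this trial
    return w + learning_rate * error * stimulus, error

w = np.zeros(2)
stimulus = np.array([1.0, 0.0])                 # only the first stimulus is present
w_slow, err = delta_rule_update(w, stimulus, outcome=1.0, learning_rate=0.1)
w_fast, _ = delta_rule_update(w, stimulus, outcome=1.0, learning_rate=0.3)
```

With the same error and stimulus, the larger learning rate produces a proportionally larger weight change, which is the sense in which an uncertainty signal that raises the learning rate boosts learning.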

26

Learning and Inference

[Figure: model schematic with nodes z, x and y. Labels: context; top-down processing; cortical processing; bottom-up processing; sensory inputs; prediction, learning, ...; ACh (expected uncertainty); NE (unexpected uncertainty)]

27

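[Figure: task design. Cue transitions lead to High Pain or Low Pain, with transition probabilities 0.8, 1.0 and 0.2]

Temporal Difference Prediction Error

predict sum future pain:

$$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$$

∆ weight ∝ (learning rate) × (error) × (stimulus)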

28

[Figure: task design (High Pain, Low Pain; transition probabilities 0.8, 1.0 and 0.2) with traces labelled Prediction error and Value]

29

TD model

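experimental sequence:

A – B – HIGH
C – D – LOW
C – B – HIGH
A – B – HIGH
A – D – LOW
C – D – LOW
A – B – HIGH
A – B – HIGH
C – D – LOW
C – B – HIGH
…

[Figure: the experimental sequence feeds the TD model, giving a prediction error, and the MR scanner, giving brain responses; a "?" asks how the two relate]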

Ben Seymour; John O’Doherty
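A sketch of how a TD model could turn the trial sequence on this slide into a prediction-error regressor. It assumes one value per cue, pain magnitudes of 1.0 (HIGH) and 0.0 (LOW), and a learning rate of 0.5; these numbers and the name `td_pain` are illustrative assumptions, not the parameters of the study.

```python
def td_pain(trials, pain, alpha=0.5):
    """Return cue values and per-trial TD errors (at second cue, at outcome)."""
    V = {cue: 0.0 for cue in "ABCD"}
    errors = []
    for first, second, outcome in trials:
        d_cue = V[second] - V[first]            # no pain is delivered at the second cue
        V[first] += alpha * d_cue
        d_out = pain[outcome] - V[second]       # the trial ends, so the next value is 0
        V[second] += alpha * d_out
        errors.append((d_cue, d_out))
    return V, errors

# The ten trials listed on the slide.
sequence = [("A", "B", "HIGH"), ("C", "D", "LOW"), ("C", "B", "HIGH"),
            ("A", "B", "HIGH"), ("A", "D", "LOW"), ("C", "D", "LOW"),
            ("A", "B", "HIGH"), ("A", "B", "HIGH"), ("C", "D", "LOW"),
            ("C", "B", "HIGH")]
pain = {"HIGH": 1.0, "LOW": 0.0}                # assumed magnitudes
V, errors = td_pain(sequence, pain)
```

On the third trial (C – B – HIGH), cue B already predicts pain, so an error appears at the second cue as well as at the outcome.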


30

TD prediction error: ventral striatum

[Image: brain slice at Z=-4, right side marked R]

31

Temporal Difference Values

right anterior insula; dorsal raphe?

32

Rewards rather than Punishments



[Figure: model traces, including V(t), with event markers R and L]

dopamine cells in VTA/SNc (Schultz et al)

33

TD Prediction Errors

• computation: dynamic programming and optimal control

• algorithm: ongoing error in predictions of the future

• implementation:
– dopamine: phasic prediction error for reward; tonic punishment
– serotonin: phasic prediction error for punishment; tonic reward

• evident in VTA; striatum; raphe?

• next: action; motivation; addiction; misbehavior

35

Task Difficulty

• set η=0.65 rather than 0.675
• information accumulates over a longer period
• hits more affected than correct rejections (CRs)
• timing not quite right

36

Intra-trial Uncertainty

• phasic NE as unexpected state change within a model

• relative to prior probability; against default

• interrupts (resets) ongoing processing

• tie to ADHD?

• close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)

• close to behavioural switching (PR) – but not DA

• farther from optimal inference (EB)

• phasic ACh: aspects of known variability within a state?

37

Where Next

• dopamine
– tonic release and vigour
– appetitive misbehaviour and hyperbolic discounting
– actions and habits
– psychosis

• serotonin
– aversive misbehaviour and psychiatry

• norepinephrine
– stress, depression and beyond

38

Experimental Data

ACh & NE have similar physiological effects:

• suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
• enhance thalamocortical transmission (e.g. Gil et al, 1997)
• boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)

ACh & NE have distinct behavioral effects:

• ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
• NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)

39

Model Schematics

[Figure: model schematic with nodes z, x and y. Labels: context; top-down processing; cortical processing; bottom-up processing; sensory inputs; prediction, learning, ...; ACh (expected uncertainty); NE (unexpected uncertainty)]

40

Attention

attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint

Example 1: Posner’s Task

[Figure: graphical model linking cue, stimulus location and sensory input, shown for a high validity and a low validity cue; trial timeline cue, target, response with intervals 0.2-0.5s, 0.1s, 0.1s and 0.15s (Phillips, McAlonan, Robb, & Brown, 2000)]

generalize to the case that cue identity changes with no notice

41

Formal Framework

cues $c_1, c_2, c_3, c_4$: vestibular, visual, ...

target $S$: stimulus location, exit direction...

ACh: variability in quality of relevant cue

NE: variability in identity of relevant cue

Sensory Information

avoid representing full uncertainty

[Equations: approximate posterior quantities of the form $P^*(\cdot \mid D_t)$; not legible in this copy]

42

Simulation Results: Posner’s Task

$$\mathrm{VE} \propto (1 - \mathrm{NE})(1 - \mathrm{ACh})$$

[Diagram: cues $c_1, c_2, c_3$ and target $S$]

vary cue validity → vary ACh

fix relevant cue → low NE

[Figure: validity effect in the model as ACh is increased (100, 120, 140 % normal level) and decreased (100, 80, 60 % normal level); validity effect in the data against nicotine and scopolamine concentration (Phillips, McAlonan, Robb, & Brown, 2000)]
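The relation on this slide between the validity effect and the two neuromodulators, read here as VE ∝ (1 − NE)(1 − ACh), can be explored with a few lines of code. The baseline ACh level of 0.5, the NE level of 0.1 and the function name `validity_effect` are hypothetical values chosen only to show the direction of the effect.

```python
def validity_effect(ach, ne, scale=1.0):
    """Validity effect under VE = scale * (1 - NE) * (1 - ACh)."""
    return scale * (1.0 - ne) * (1.0 - ach)

baseline_ach = 0.5                      # hypothetical "normal level"
low_ne = 0.1                            # the relevant cue is fixed, so NE stays low
percent_levels = (60, 80, 100, 120, 140)
ve_by_level = [validity_effect(baseline_ach * pct / 100, low_ne)
               for pct in percent_levels]
```

Under these assumptions the validity effect shrinks as ACh rises above its normal level and grows as ACh falls below it.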

43

Maze Task

example 2: attentional shift

[Figure: maze with reward, cue 1 and cue 2, in two conditions: cue 1 relevant, cue 2 irrelevant; cue 1 irrelevant, cue 2 relevant (Devauges & Sara, 1990)]

no issue of validity

44

Simulation Results: Maze Navigation

[Diagram: cues $c_1, c_2, c_3$ and target $S$]

fix cue validity → no explicit manipulation of ACh

change relevant cue → NE

[Figure: % rats reaching criterion against no. days after shift from spatial to visual task, for experimental data and model data (Devauges & Sara, 1990)]

45

Simulation Results: Full Model

true & estimated relevant stimuli

neuromodulation in action

trials

validity effect (VE)

46

Simulated Psychopharmacology

50% NE

50% ACh/NE

ACh compensation

NE can nearly catch up

47

Summary

• single framework for understanding ACh, NE and some aspects of attention

• ACh/NE as expected/unexpected uncertainty signals

• experimental psychopharmacological data replicated by model simulations

• implications from complex interactions between ACh & NE

• predictions at the cellular, systems, and behavioral levels

• activity vs weight vs neuromodulatory vs population representations of uncertainty
