Computational Neuromodulation
Peter Dayan Gatsby Computational Neuroscience Unit
University College London
Nathaniel Daw Sham Kakade Read Montague
John O’Doherty Wolfram Schultz Ben Seymour
Terry Sejnowski Angela Yu
2
5. Diseases of the Will
• Contemplators
• Bibliophiles and Polyglots
• Megalomaniacs
• Instrument addicts
• Misfits
• Theorists
3
There are highly cultivated, wonderfully endowed minds whose wills suffer from a particular form of lethargy. Its undeniable symptoms include a facility for exposition, a creative and restless imagination, an aversion to the laboratory, and an indomitable dislike for concrete science and seemingly unimportant data… When faced with a difficult problem, they feel an irresistible urge to formulate a theory rather than question nature.
As might be expected, disappointments plague the theorist…
Theorists
4
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
5
Conditioning
• Ethology
• Psychology
– classical/operant conditioning
• Computation
– dynamic programming
– Kalman filtering
• Algorithm
– TD/delta rules
• Neurobiology
– neuromodulators; amygdala; OFC; nucleus accumbens; dorsal striatum
prediction: of important events
control: in the light of those predictions
policy evaluation
policy improvement
6
Dopamine
[Figure: dopamine cell activity in three conditions (no prediction; prediction, reward; prediction, no reward), with events marked L and R. Schultz et al]
• drug addiction, self-stimulation
• effect of antagonists
• effect on vigour
• link to action
• ‘scalar’ signal
7
Prediction, but What Sort?
• Sutton: predict sum future reward
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
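The recursion for the summed future reward and its TD error can be sketched in a few lines of Python. This is a minimal tabular illustration, not the model used in the talk: the five-step trial, the single reward at the final step, the learning rate `alpha` and the function name `td_learn` are all assumptions made for the example.

```python
def td_learn(rewards, n_trials=200, alpha=0.1):
    """Tabular TD learning over the time steps of a repeated trial.

    rewards[t] is the (hypothetical) reward delivered at step t.
    Returns the learned values V and the TD errors from the last trial.
    """
    T = len(rewards)
    V = [0.0] * (T + 1)      # V[T] is the value after the trial ends, fixed at 0
    deltas = [0.0] * T
    for _ in range(n_trials):
        for t in range(T):
            # TD error: delta(t) = r(t) + V(t+1) - V(t)
            deltas[t] = rewards[t] + V[t + 1] - V[t]
            # move V(t) a fraction alpha of the way towards r(t) + V(t+1)
            V[t] += alpha * deltas[t]
    return V, deltas


rewards = [0.0, 0.0, 0.0, 0.0, 1.0]   # assumed: reward only at the last step
V, deltas = td_learn(rewards)
print([round(v, 2) for v in V])
print([round(d, 3) for d in deltas])
```

With enough trials every step before the reward comes to predict the full summed reward, and the within-trial TD errors shrink towards zero.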
8
Rewards rather than Punishments
[Figure: dopamine cells in VTA/SNc in three conditions (no prediction; prediction, reward; prediction, no reward), with labels V(t), L and R. Schultz et al]
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
9
Prediction, but What Sort?
• Sutton: predict sum future reward
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
• Watkins: policy evaluation
$V(x) = E_{a \sim \pi(x)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right]$
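Policy evaluation in the Watkins form, averaging over the policy's actions and the resulting next states, can be illustrated with a small sweep-based solver. This is only a sketch under stated assumptions: the two-state episodic problem, the dictionary layout and the name `evaluate_policy` are hypothetical and chosen for readability.

```python
def evaluate_policy(states, actions, policy, reward, trans, n_sweeps=100):
    """Iterate V(x) <- E_{a ~ policy(x)}[ r(x,a) + sum_y P_xy(a) V(y) ].

    policy[x][a] : probability of choosing action a in state x
    reward[x][a] : expected immediate reward r(x, a)
    trans[x][a]  : dict mapping next state y to P_xy(a); any y not listed
                   in `states` (here 'end') is terminal with value 0
    """
    V = {x: 0.0 for x in states}
    for _ in range(n_sweeps):
        for x in states:
            V[x] = sum(
                policy[x][a]
                * (reward[x][a]
                   + sum(p * V.get(y, 0.0) for y, p in trans[x][a].items()))
                for a in actions
            )
    return V


# Hypothetical two-state episodic problem with a uniformly random policy
states = ["x1", "x2"]
actions = ["L", "R"]
policy = {"x1": {"L": 0.5, "R": 0.5}, "x2": {"L": 0.5, "R": 0.5}}
reward = {"x1": {"L": 0.0, "R": 0.0}, "x2": {"L": 1.0, "R": 0.0}}
trans = {
    "x1": {"L": {"x2": 1.0}, "R": {"end": 1.0}},
    "x2": {"L": {"end": 1.0}, "R": {"end": 1.0}},
}
V = evaluate_policy(states, actions, policy, reward, trans)
print(V)
```

Because the example is undiscounted, it relies on every path reaching the terminal state; a continuing task would need a discount factor, which the slide does not show.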
10
Policy Improvement
• Sutton: define $\pi(x;M)$, do R-M on:
$E_{a \sim \pi(x;M)}\left[ r(x,a) + \sum_y P_{xy}(a)\, V(y) \right] - V(x)$
uses the same TD error $\delta(t)$
• Watkins: value iteration with $Q(x,a)$
$Q^*(x,a) = r(x,a) + \sum_y P_{xy}(a) \max_b Q^*(y,b)$
$\delta_Q(t) = r(t) + \max_b Q(t+1,b) - Q(t,a)$
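The Watkins route, value iteration on state-action values together with the matching TD error, can be sketched as follows. The tiny episodic problem, the data layout and the helper names (`best_value`, `q_value_iteration`, `q_td_error`) are assumptions for illustration, not taken from the talk.

```python
def best_value(Q, y, states, actions):
    """max_b Q(y, b); states not listed in `states` are terminal, worth 0."""
    if y not in states:
        return 0.0
    return max(Q[(y, b)] for b in actions)


def q_value_iteration(states, actions, reward, trans, n_sweeps=100):
    """Iterate Q(x,a) <- r(x,a) + sum_y P_xy(a) max_b Q(y,b)."""
    Q = {(x, a): 0.0 for x in states for a in actions}
    for _ in range(n_sweeps):
        for x in states:
            for a in actions:
                Q[(x, a)] = reward[x][a] + sum(
                    p * best_value(Q, y, states, actions)
                    for y, p in trans[x][a].items()
                )
    return Q


def q_td_error(Q, x, a, r, y, states, actions):
    """delta_Q = r + max_b Q(y, b) - Q(x, a) for one observed transition."""
    return r + best_value(Q, y, states, actions) - Q[(x, a)]


# Hypothetical two-state episodic problem
states = ["x1", "x2"]
actions = ["L", "R"]
reward = {"x1": {"L": 0.0, "R": 0.0}, "x2": {"L": 1.0, "R": 0.0}}
trans = {
    "x1": {"L": {"x2": 1.0}, "R": {"end": 1.0}},
    "x2": {"L": {"end": 1.0}, "R": {"end": 1.0}},
}
Q = q_value_iteration(states, actions, reward, trans)
print(Q)
print(q_td_error(Q, "x1", "L", 0.0, "x2", states, actions))
```

Once the values have converged, the TD error on a deterministic transition is zero, which is the fixed-point condition expressed by the equation for $Q^*$.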
11
Active Issues
• exploration/exploitation
• model-based (PFC)/cached (striatal) methods
• motivational influences
• vigour
• hierarchical control (PFC)
• hyperbolic discounting, Pavlovian misbehavior and ‘the will’
• representational learning
• appetitive/aversive opponency
• links with behavioural economics
12
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs
13
Uncertainty
Computational functions of uncertainty:
• weaken top-down influence over sensory processing
• promote learning about the relevant representations
We focus on two different kinds of uncertainties:
• ACh: expected uncertainty from known variability or ignorance
• NE: unexpected uncertainty due to gross mismatch between prediction and observation
14
Norepinephrine
• vigilance
• reversals
• modulates plasticity? exploration?
• scalar
15
Aston-Jones: Target Detection
detect and react to a rare target amongst common distractors
• elevated tonic activity for reversal
• activated by rare target (and reverses)
• not reward/stimulus related? more response related?
16
Vigilance Task
• variable time in start
• η controls confusability
• one single run
• cumulative is clearer
• exact inference
• effect of 80% prior
18
Phasic NE
• onset response from timing uncertainty (SET)
• growth as P(target)/0.2 rises
• act when P(target)=0.95
• stop if P(target)=0.01
• arbitrarily set NE=0 after 5 timesteps (small prob of reflexive action)
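The thresholds on P(target) can be made concrete with a small sequential-inference sketch. The observation model is an assumption, since the slides do not spell it out: at each timestep a binary sample reports the true stimulus with probability η and the wrong one otherwise. The prior of 0.2 and the thresholds 0.95 and 0.01 are taken from the slides; η = 0.675 is the value mentioned under Task Difficulty; the function name `run_trial`, the step cap and the seeded generator are hypothetical. The returned trace is the ratio P(target)/0.2, and the reset of NE after 5 timesteps is not modelled.

```python
import random


def run_trial(is_target, eta=0.675, prior=0.2, act_at=0.95, stop_at=0.01,
              max_steps=200, rng=None):
    """Accumulate evidence about whether the stimulus is the rare target.

    Returns (outcome, number of steps, trace of P(target)/prior).
    """
    rng = rng or random.Random(0)
    p = prior
    trace = [p / prior]
    for step in range(1, max_steps + 1):
        # assumed observation model: the sample is correct with probability eta
        says_target = (rng.random() < eta) == is_target
        like_target = eta if says_target else 1.0 - eta
        like_distractor = 1.0 - eta if says_target else eta
        # Bayes rule for P(target | observations so far)
        p = like_target * p / (like_target * p + like_distractor * (1.0 - p))
        trace.append(p / prior)
        if p >= act_at:
            return "act", step, trace
        if p <= stop_at:
            return "stop", step, trace
    return "undecided", max_steps, trace


outcome, steps, trace = run_trial(is_target=True, rng=random.Random(1))
print(outcome, steps, [round(x, 2) for x in trace])
```

Lowering `eta` towards 0.5 makes each sample less informative, so evidence accumulates over more timesteps before either threshold is reached.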
19
Four Types of Trial
[Figure: four types of trial, with labelled proportions 19%, 1.5%, 1% and 77%]
fall is rather arbitrary
20
Response Locking
slightly flatters the model, since there is no further response variability
21
Interrupts/Resets (SB)
LC
PFC/ACC
22
Active Issues
• approximate inference strategy
• interaction with expected uncertainty (ACh)
• other representations of uncertainty
• finer gradations of ignorance
23
Computation and the Brain
• statistical computations
– representation from density estimation (Terry)
– combining uncertain information over space, time, modalities for sensory/memory inference
– learning as a hierarchical Bayesian problem
– learning as a filtering problem
• control theoretic computations
– optimising rewards, punishments
– homeostasis/allostasis
– exploration/exploitation trade-offs
24
Computational Neuromodulation
• general: excitability, signal/noise ratios
• specific: prediction errors, uncertainty signals
25
Learning and Inference
• Learning: predict; control
∆ weight ∝ (learning rate) × (error) × (stimulus)
– dopamine: phasic prediction error for future reward
– serotonin: phasic prediction error for future punishment
– acetylcholine: expected uncertainty boosts learning
– norepinephrine: unexpected uncertainty boosts learning
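The learning rule, a weight change given by learning rate times error times stimulus, can be written as a short delta-rule sketch. How uncertainty would set the learning rate is not specified on the slide, so the two rates compared below are simply chosen by hand; the function name `delta_rule_update` and the reversal scenario are likewise assumptions for illustration.

```python
def delta_rule_update(weights, stimulus, outcome, learning_rate):
    """One delta-rule step: w <- w + learning_rate * error * stimulus."""
    prediction = sum(w * s for w, s in zip(weights, stimulus))
    error = outcome - prediction
    new_weights = [w + learning_rate * error * s
                   for w, s in zip(weights, stimulus)]
    return new_weights, error


def trials_to_relearn(learning_rate, tolerance=0.1):
    """Count trials until the prediction tracks an outcome that flips from 1 to 0."""
    weights = [1.0]              # the stimulus initially predicts outcome 1 perfectly
    stimulus = [1.0]
    n = 0
    while abs(weights[0]) > tolerance:
        weights, _ = delta_rule_update(weights, stimulus, 0.0, learning_rate)
        n += 1
    return n


print(trials_to_relearn(0.1))   # low learning rate
print(trials_to_relearn(0.5))   # higher learning rate, as if boosted by uncertainty
```

A larger learning rate lets the prediction follow the changed outcome in fewer trials, which is the sense in which a signal that raises the learning rate boosts learning.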
26
Learning and Inference
[Figure: schematic with nodes z, x and y and the labels: context; sensory inputs; cortical processing; top-down processing; bottom-up processing; ACh, expected uncertainty; NE, unexpected uncertainty; prediction, learning, ...]
27
Temporal Difference Prediction Error
[Figure: cue diagram ending in High Pain or Low Pain, with edge labels 0.8, 1.0 and 0.2]
predict sum future pain:
$V(t) = \sum_{s \ge t} r(s) = r(t) + \sum_{s \ge t+1} r(s) = r(t) + V(t+1)$
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
∆ weight ∝ (learning rate) × (error) × (stimulus)
28
Temporal Difference Prediction Error
[Figure: cue diagram ending in High Pain or Low Pain, with edge labels 0.8, 1.0 and 0.2, and traces labelled Value and Prediction error]
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
29
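experimental sequence: A – B – HIGH; C – D – LOW; C – B – HIGH; A – B – HIGH; A – D – LOW; C – D – LOW; A – B – HIGH; A – B – HIGH; C – D – LOW; C – B – HIGH; ...
[Figure: the experimental sequence drives both the TD model, which yields a prediction error, and the subject in the MR scanner, which yields brain responses; a "?" links the two]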
Ben Seymour; John O’Doherty
Temporal Difference Prediction Error
30
TD prediction error:
ventral striatum
Z=-4 R
31
Temporal Difference Values
right anterior insula; dorsal raphe?
32
Rewards rather than Punishments
[Figure: dopamine cells in VTA/SNc in three conditions (no prediction; prediction, reward; prediction, no reward), with labels V(t), L and R. Schultz et al]
TD error: $\delta(t) = r(t) + V(t+1) - V(t)$
33
TD Prediction Errors
• computation: dynamic programming and optimal control
• algorithm: ongoing error in predictions of the future
• implementation:
– dopamine: phasic prediction error for reward; tonic punishment
– serotonin: phasic prediction error for punishment; tonic reward
• evident in VTA; striatum; raphe?
• next: action; motivation; addiction; misbehavior
35
Task Difficulty
• set η=0.65 rather than 0.675
• information accumulates over a longer period
• hits more affected than cr’s
• timing not quite right
36
Intra-trial Uncertainty
• phasic NE as unexpected state change within a model
• relative to prior probability; against default
• interrupts (resets) ongoing processing
• tie to ADHD?
• close to alerting (AJ) – but not necessarily tied to behavioral output (onset rise)
• close to behavioural switching (PR) – but not DA
• farther from optimal inference (EB)
• phasic ACh: aspects of known variability within a state?
37
Where Next
• dopamine
– tonic release and vigour
– appetitive misbehaviour and hyperbolic discounting
– actions and habits
– psychosis
• serotonin
– aversive misbehaviour and psychiatry
• norepinephrine
– stress, depression and beyond
38
Experimental Data
ACh & NE have distinct behavioral effects:
• ACh boosts learning to stimuli with uncertain consequences (e.g. Bucci, Holland, & Gallagher, 1998)
• NE boosts learning upon encountering global changes in the environment (e.g. Devauges & Sara, 1990)
ACh & NE have similar physiological effects:
• suppress recurrent & feedback processing (e.g. Kimura et al, 1995; Kobayashi et al, 2000)
• enhance thalamocortical transmission (e.g. Gil et al, 1997)
• boost experience-dependent plasticity (e.g. Bear & Singer, 1986; Kilgard & Merzenich, 1998)
39
Model Schematics
[Figure: schematic with nodes z, x and y and the labels: context; sensory inputs; cortical processing; top-down processing; bottom-up processing; ACh, expected uncertainty; NE, unexpected uncertainty; prediction, learning, ...]
40
Attention
attentional selection for (statistically) optimal processing, above and beyond the traditional view of resource constraint
Example 1: Posner’s Task
[Figure: trial timeline cue, target, response, with timing labels 0.2-0.5s, 0.1s, 0.1s and 0.15s; two diagrams relating cue, stimulus location and sensory input, one for high validity and one for low validity. (Phillips, McAlonan, Robb, & Brown, 2000)]
generalize to the case that cue identity changes with no notice
41
Formal Framework
cues $c_1, c_2, c_3, c_4$: vestibular, visual, ...
target $S$: stimulus location, exit direction...
ACh: variability in quality of relevant cue
NE: variability in identity of relevant cue
Sensory Information
avoid representing full uncertainty
[Equations garbled in extraction: two quantities of the form 1 − (·)_t, with starred terms defined from probabilities P(· | D_t)]
42
Simulation Results: Posner’s Task
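[Figure: validity effect in Posner’s task. Experimental data: validity effect against concentration of nicotine and of scopolamine (Phillips, McAlonan, Robb, & Brown, 2000). Model: validity effect against ACh level, increased (100, 120, 140 % normal level) and decreased (100, 80, 60 % normal level).]
Model settings (target $S$; cues $c_1, c_2, c_3$):
• vary cue validity → vary ACh
• fix relevant cue → low NE
$\mathrm{VE} \propto (1 - \mathrm{NE})(1 - \mathrm{ACh})$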
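Reading the slide's relation for the validity effect as VE ∝ (1 − NE)(1 − ACh), the qualitative pattern of the simulation can be sketched numerically. The baseline ACh level, the low NE level, the treatment of both as numbers between 0 and 1, and the function name `validity_effect` are assumptions for illustration only; the percentages follow the axis ticks of the model panels.

```python
def validity_effect(ne, ach, scale=1.0):
    """VE = scale * (1 - NE) * (1 - ACh), with NE and ACh assumed to lie in [0, 1]."""
    return scale * (1.0 - ne) * (1.0 - ach)


baseline_ach = 0.5   # hypothetical "normal" ACh level
low_ne = 0.1         # hypothetical low NE level (relevant cue held fixed)

for percent in (60, 80, 100, 120, 140):
    ach = baseline_ach * percent / 100.0
    print(percent, round(validity_effect(low_ne, ach), 3))
```

Under this reading, raising ACh above its normal level shrinks the validity effect and lowering it enlarges the effect, with NE held low throughout.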
43
Maze Task
example 2: attentional shift (Devauges & Sara, 1990)
[Figure: maze with a reward and two cues. In one phase cue 1 is relevant and cue 2 irrelevant; in the other cue 1 is irrelevant and cue 2 relevant.]
no issue of validity
44
Simulation Results: Maze Navigation
Model settings (target $S$; cues $c_1, c_2, c_3$):
• fix cue validity → no explicit manipulation of ACh
• change relevant cue → NE
[Figure: % rats reaching criterion against no. days after shift from spatial to visual task. Two panels: experimental data (Devauges & Sara, 1990) and model data.]
45
Simulation Results: Full Model
[Figure: panels over trials: true & estimated relevant stimuli; neuromodulation in action; validity effect (VE)]
46
Simulated Psychopharmacology
50% NE
50% ACh/NE
ACh compensation
NE can nearly catch up
47
Summary
• single framework for understanding ACh, NE and some aspects of attention
• ACh/NE as expected/unexpected uncertainty signals
• experimental psychopharmacological data replicated by model simulations
• implications from complex interactions between ACh & NE
• predictions at the cellular, systems, and behavioral levels
• activity vs weight vs neuromodulatory vs population representations of uncertainty