Reward-related Neural Circuitry
Julie Fiez, Ph.D.
Departments of Psychology & Neuroscience
Acknowledgements
Karin Cox, Mauricio Delgado, Corrine Durisko, Mary Conway, Kate Fissell, Chris May,
Alison Moed, Susan Ravizza,
Elizabeth Tricomi, Steve Wilson,
Bruce McCandliss, James McClelland,
Athanassio Protopapas, Michael Sayette, Andy Stegner
Dopamine Plays a Crucial Role in Reward-Related Processing
Dopamine neurons respond to unexpected rewards.
Animals will work for delivery of drugs that stimulate dopaminergic signalling.
Schultz et al. (1997). Science, 275:1593-1599
Dopamine neurons project into distinct fronto-striatal-thalamic loops
[Diagram: PFC, orbitofrontal cortex, dorsal striatum (caudate/putamen), ventral striatum (nucleus accumbens), thalamus, SNpc, VTA]
Is Dopamine a “Pleasure” Signal?
“Liking” vs. “Wanting”
Cannon & Bseikri (2004). Physiol & Behav, 81:741-748.
Does Dopamine Support the Development of Associations That Yield Increased Reward?
Even simple behaviors have multiple opportunities for “habit” formation:
Stimulus-outcome: consequences (feedback) may alter the value of a neutral stimulus
Response-outcome: consequences may alter motor (and cognitive) activity
Stimulus-response-outcome: consequences may alter the relationship between a stimulus & a response
Stimulus-response: after learning, behavior may no longer be governed by outcomes
light -> lever press -> food delivery
stim -> response -> outcome
The Dopamine Signal May be Ideal to Support Such Reinforcement Learning
Schultz, Dayan & Montague (1997). Science, 275:1593-1599
Egelman et al. (1998). J Cogn Neurosci, 10:623-30.
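The idea that the dopamine signal suits reinforcement learning is often expressed as a temporal-difference (TD) prediction error. The following is a minimal sketch under stated assumptions, not the model from the cited papers: a fixed three-step episode (cue, delay, reward), tabular values, and illustrative learning-rate and discount parameters (`alpha`, `gamma`). It shows a positive error for an unexpected reward, almost no error once the reward is predicted, and a negative error when a predicted reward is omitted.

```python
def td_errors(values, rewards, gamma=1.0):
    """Prediction error at each step: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    deltas = []
    for t, r in enumerate(rewards):
        # The value after the final step of the episode is taken to be zero.
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(r + gamma * v_next - values[t])
    return deltas


def train(values, rewards, alpha=0.1, gamma=1.0, episodes=200):
    """Repeat the same episode, nudging each value by alpha * delta_t."""
    values = list(values)
    for _ in range(episodes):
        for t, delta in enumerate(td_errors(values, rewards, gamma)):
            values[t] += alpha * delta
    return values


# Hypothetical episode: cue -> delay -> reward delivered at the last step.
rewarded = [0.0, 0.0, 1.0]

naive = td_errors([0.0, 0.0, 0.0], rewarded)       # reward is unexpected
learned = train([0.0, 0.0, 0.0], rewarded)         # values after learning
predicted = td_errors(learned, rewarded)           # reward is now expected
omitted = td_errors(learned, [0.0, 0.0, 0.0])      # expected reward withheld
```

In this toy setting the error at the reward step is large before learning, shrinks toward zero after learning, and turns negative when the reward is withheld.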
Do ventral & dorsal striatum support different aspects of reinforcement learning?
Ito et al. (2002). J Neurosci, 22:6247-6253
Training:
- initial Pavlovian training
  CS+: light paired with drug delivery
  CS-: clicks presented non-contingently
- 2nd order conditioning
  each lever press leads to light (CS+) delivery
  10 lever presses earn drug delivery
  drug delivered after a fixed (20 min) interval
[Diagram: PFC, orbitofrontal cortex, dorsal striatum (caudate/putamen), ventral striatum (nucleus accumbens), thalamus, SNpc, VTA]
(e.g., Elliott et al., 2004; O’Doherty et al., 2004; Robbins et al., 1992)
Emerging Issues for fMRI
• What striatal response properties are observed in humans?
• Are there dissociations between ventral vs. dorsal activity that converge with the animal literature?
• What insight might such dissociations provide into the nature of human reward-related processing?
Do striatal regions respond to the unpredictable delivery of reinforcers?
Schultz et al. (1997). Science, 275:1593-1599
Yes, especially at or near the nucleus accumbens:
Berns et al. (2001). J Neurosci, 21:2793-2798
Do striatal regions respond to delivery of unexpected monetary outcomes?
No significant differences between reward, punishment, and neutral trials were observed.
Left Nucleus Accumbens (x, y, z = -12, 8, 8)
[Plot: mean intensity value (axis 1992 to 2008) across time periods T1-T7 for punish, neutral, and reward trials]
How might we reconcile these findings?
• The study by Berns & colleagues involved the delivery of a primary reinforcer.
• The oddball study made use of an abstract, unconditioned cue (red or green arrow) to indicate gain or loss of a secondary reinforcer (delivered later).
• Will delivery of an unexpected, conditioned cue activate the ventral striatum?
Schultz et al. (1997). Science, 275:1593-1599
Unexpected delivery of conditioned cues
Run 1
Run 2
…Runs separated by approximately 23 minutes
Notepad Golf ball
Tape (neutral) Cigarette
• Male heavy smokers (at least 20 cigarettes/day)
• Participants abstained from smoking for 8 hours
• Compliance assessed by expired CO
• Three neutral and one conditioned cue exposure
[Plot: percent change (axis -0.5 to 4) over time (0 to 73.5 s) for NAcc and caudate responses to cigarette (cig) and neutral (neu) cues]
Interim Summary
Consistent with prior neurophysiological findings, the ventral striatum responds to the unexpected delivery of primary reinforcers and conditioned cues.
These findings support claims that the ventral striatum plays an integral role in reward-related signaling under normal conditions, and that it may contribute to pathological states such as addiction.
What about the dorsal striatum?
Reward-responsive dopamine neurons also project to the dorsal striatum.
The dorsal striatum has typically been observed to respond weakly in paradigms that drive the ventral striatum.
However, robust reward-related differences have been found in the dorsal striatum using other paradigms.
[Diagram: PFC, orbitofrontal cortex, dorsal striatum, ventral striatum, thalamus, SNpc, VTA]
The Card Guessing Task
[Figure: trial events and scanning sequence for a reward trial. Trial events: choice period (card shown as "?"), card outcome, post-outcome period. Scanning sequence: Scans 1-5 over a temporal sequence of 0, 6, 9, 12, and 15 seconds. Card outcomes indicated monetary gain or monetary loss.]
Robust dorsal striatal activity is found during the card guessing task
Which aspects of the task account for activation?
• Unlike the ventral striatum, delivery of reinforcer or conditioned cue is not sufficient to activate dorsal striatum.
• Activation during guessing task shows such delivery is not necessary.
• Is it the mere need for an instrumental response?
[Figure: oddball task vs. guessing task]
• Or must there be a real or perceived contingency between the response & the outcome?
Blue circle = single keypress
Yellow circle = choose a keypress
The dorsal striatum is sensitive to perceived response-outcome contingency.
[Figures: no-choice trial vs. choice trial]
Involvement in response-outcome signaling may apply to complex situations.
[Plot: caudate activity, early trials vs. late trials]
Do the contributions of the dorsal striatum extend to “cold” cognition?
[Plot: % heard as “lake” (0 to 100) for speech tokens at equal intermediate levels between a natural "rake" token and a natural "lake" token; native English speakers vs. a native Japanese speaker]
The Development of Speech Categories May Be Self-Organizing
When one neuron A participates in firing another neuron B, the strength of the effect of A on the firing of B is increased.
- paraphrased from Hebb, 1949
Or, put more simply:
Neurons that fire together, wire together.
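Hebb's rule, as paraphrased above, can be sketched as a simple weight update. This is an illustrative toy, not a model taken from the slides: the linear response of neuron B and the learning rate `rate` are assumptions. Because B's activity depends on the connection being strengthened, each co-activation makes the next one stronger, which is the self-reinforcing property the self-organization argument relies on.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Increase the A->B weight in proportion to the joint activity of A and B."""
    return weight + rate * pre * post


def run_self_organizing(weight, inputs, rate=0.1):
    """Apply the Hebbian update over a sequence of activity levels for neuron A.

    B's activity is driven by A through the weight, so the connection
    grows only on steps where A is active, and grows faster as it strengthens.
    """
    history = [weight]
    for pre in inputs:
        post = weight * pre  # assumed linear response of neuron B
        weight = hebbian_update(weight, pre, post, rate)
        history.append(weight)
    return history


# Neuron A is active on steps 1, 2, and 4, and silent on step 3.
history = run_self_organizing(0.5, [1, 1, 0, 1])
```

The weight is unchanged on the step where A is silent and increases on every step where A and B are active together.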
Once perceptual categories have been formed, can they be “reshaped”?
Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:
• Exaggerating the differences between sounds could overcome barrier.
• Learning should not require explicit feedback.
[Figure: load-road series (L to R, scale 0 to 100) showing the stimuli used in fixed training and the initial stimuli used in adaptive training]
An Empirical Test of the Theory
[Plots: pretest vs. posttest curves (y axis 0 to 100; x axis 0.0 to 1.0 between the [l] anchor and the [r] anchor) for the fixed training condition and the adaptive training condition]
Is the model complete?
Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:
• Exaggerating the differences between sounds could overcome barrier.
• Learning should not require explicit feedback.
• But what if feedback is given?
(McCandliss et al., 2002)
Effects of Training Without Feedback
Effects of Training With Feedback
With feedback, both the adaptive and fixed techniques are effective.
Could the differences in learning reflect the engagement of the dorsal striatum?
• Hypothesis:
  – In a motivated learner, performance feedback may be rewarding (correct response) or non-rewarding (incorrect response).
  – Outcomes may engage striatal reinforcement learning mechanisms.
  – Perceptual representations and associated responses that lead to “rewarding” outcomes are strengthened.
• Test by having Japanese subjects perform the /r/ vs. /l/ task with and without feedback.
• Compare activation in perceptual identification task to activation in the guessing task.
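One hypothetical way to express the hypothesis above is a reward-gated update of stimulus-response associations. This is a sketch for illustration only, not the procedure or model used in the study: the number of continuum steps, the starting strengths, the tie-breaking rule, and the learning rate `rate` are all assumptions.

```python
def train_with_feedback(trials, n_steps=5, rate=0.2):
    """Strengthen or weaken stimulus-response associations based on feedback.

    trials: list of (stimulus_index, correct_response) pairs, where the
    response is "l" or "r" and the index is a position on the continuum.
    """
    # strength[s][r]: association between stimulus s and response r.
    strength = [{"l": 0.5, "r": 0.5} for _ in range(n_steps)]
    for stim, correct in trials:
        # Pick the currently stronger response; ties go to "l".
        choice = "r" if strength[stim]["r"] > strength[stim]["l"] else "l"
        # Feedback is treated as rewarding (1) or non-rewarding (0).
        reward = 1.0 if choice == correct else 0.0
        # Move the chosen association toward the outcome it produced.
        strength[stim][choice] += rate * (reward - strength[stim][choice])
    return strength


# Hypothetical session: the two ends of the continuum, five trials each.
trials = [(0, "l"), (4, "r")] * 5
strength = train_with_feedback(trials)
```

In this toy version, a response that produces rewarding feedback is strengthened and one that produces non-rewarding feedback is weakened, so the two ends of the continuum come to favor different responses.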
A comparison across tasks.
[Figure: trial timing for the guessing task and the categorization task. Feedback trial: 2.5 s, 500 ms, 500 ms, 11.5 s. No-feedback trial: 2.5 s, 500 ms, 500 ms, 11.5 s. The categorization task used “fixed” stimuli (0.2, 0.6 along the continuum).]
Increased Caudate Activation During Feedback Training
The striatum is more active in the feedback as compared to the no-feedback condition.
Performance Feedback Acts Like Gambling Reward/Punishment
The activation is similar in location and pattern to that observed with the guessing task.
Temporal cortex may be affected by top-down outcome signals.
Can we see pre vs. post training differences?
No explicit task: Subjects listen passively to stimuli
An “oddball” response is presented every 16-24 ms
[Figure: stream of "load" and "road" tokens; responses examined in time bins 1-5 after oddball onset]
Use fMRI to determine which areas of the brain respond to the oddball stimulus.
If the sounds are perceived as the same, there should be no response to the oddballs.
Examine the Neural Response to Native vs. Non-native Phoneme Contrast
Pre-test Categorization Curves
[Plot: proportion of [r] or [n] responses (0 to 1) by stimulus index (0 to 1) for the road-load and mode-node series]
• Subjects: native Japanese speakers (n=9)
Before training, auditory regions responded most to the native oddballs.
Left posterior superior temporal gyrus (x, y, z = 58, -34, 12)
[Bar plot: percent change from baseline (axis 0 to 0.1), pre road-load vs. pre mode-node; asterisks mark bars in the original]
Right posterior superior temporal gyrus (x, y, z = -60, -22, 4)
[Bar plot: percent change from baseline (axis 0 to 0.1), pre road-load vs. pre mode-node; asterisks mark bars in the original, one labeled (0.14)]
After training, the largest responses were to the non-native oddballs.
Right posterior superior temporal gyrus (x, y, z = -60, -22, 4)
[Bar plot: percent change from baseline (axis 0 to 0.1), post road-load vs. post mode-node; an asterisk marks a bar in the original]
Left posterior superior temporal gyrus (x, y, z = 58, -34, 12)
[Bar plot: percent change from baseline (axis 0 to 0.1), post road-load vs. post mode-node; an asterisk marks a bar in the original]
Implications for Perceptual Organization
• The organization of perceptual categories may be mediated by both Hebbian-based and reinforcement-based learning mechanisms.
• During development, both mechanisms may come into play.
Goldstein et al., PNAS, 100:8030-8035.
Adaptive input:
Kuhl, Nature Reviews Neuroscience, 5:831-843.
[Plot: proportion of canonical syllables across test periods (10 min): baseline, social response, extinction]
Rewarding outcome:
Feedback may invoke learning that cuts across both implicit & explicit memory tasks.
Implications for Normal Development
The striatum appears to be part of a reinforcement learning system. This system may use rewarding outcomes (broadly construed) to shape:
- perceptual representations of environmental stimuli
- affective (motivational) responses evoked by stimuli & associated contexts
- overt (motor) & covert (?) responses elicited by stimuli
- episodic memory associations or retrieval processes
Dysfunction/abnormal input into this system may result in developmental disorders.
- OCD
- susceptibility to drug abuse and drug addiction:
- stress during early developmental periods
Conclusions
Ventral striatum is responsive to the mere presentation of primary reinforcers and conditioned cues; thus, the ventral striatum may play an important role in representing the incentive value of stimuli.
Dorsal striatum is sensitive to whether there is a perceived contingency between a response and an outcome; thus, dorsal striatum may contribute to selecting and shaping behavior by associating actions with their outcomes.
The dorsal striatum and prefrontal cortex may work together to provide substantial cognitive control over representations of incentive value induced by stimulus events.
The dorsal striatal response is multi-faceted.
The choice period shows a sensitivity to motivational state:
[Figure: trial sequence (cue, choice period with a "High or Low" guess, outcome, feedback) shown for periods of low vs. high incentive and for low vs. large reward trials; positive feedback of $4.00 or $0.00]
The outcome period shows a sensitivity to outcome value:
Left Caudate Nucleus (x, y, z = -8, 8, 5)
[Plot: time periods T1 and T2 (axis 0.00 to 0.16), high incentive vs. low incentive]
Caudate neurons show selective activation for trials in which the monkey’s movement will be rewarded
[Figure: rewarded movement vs. unrewarded movement trials; event markers: instruction, trigger, and reward or sound]
(Schultz, Tremblay, and Hollerman, 2000)
Modulation of cue-induced craving
Run 1
Run 2
…Runs separated by approximately 23 minutes
Notepad Golf ball
Tape (neutral) Cigarette
All participants refrained from smoking for 8 hours
10 participants expected to smoke midway through scanning session
10 participants did not expect to smoke
Expectancy modulates the cue-induced response:
• affects measures of self-reported craving
• affects facial expressions evoked in response to a conditioned cue
• affects performance on tasks requiring executive control
The dorsal striatum may act in concert with prefrontal regions.
[Diagram: PFC, orbitofrontal cortex, dorsal striatum, ventral striatum, thalamus, SNpc, VTA]
Leon & Shadlen (1999). Neuron, 24:415-425.
Expectancy modulates prefrontal activity
[Bar plots: percent change from neutral, YES vs. NO groups, for left and right ventrolateral PFC and left and right dorsolateral PFC]
The dorsal striatum is sensitive to perceived response-outcome contingency.
Contingency condition
Instrumental condition
Blue circle = single keypress
Yellow circle = choose a keypress
[Plots: no-choice trials and choice trials; signal (axis 1996 to 2004) across time periods T1-T10 for reward vs. punishment trials]
Behavioral Results
After the imaging study, subjects completed extended training.
With presentation of fixed (non-adaptive) stimuli, robust learning occurred only with feedback.