Reward-related Neural Circuitry


Julie Fiez, Ph.D.

Departments of Psychology & Neuroscience

Acknowledgements

Karin Cox, Mauricio Delgado, Corrine Durisko, Mary Conway, Kate Fissell, Chris May, Alison Moed, Susan Ravizza, Elizabeth Tricomi, Steve Wilson

Bruce McCandliss, James McClelland, Athanassio Protopapas, Michael Sayette, Andy Stegner

Dopamine Plays a Crucial Role in Reward-Related Processing

Dopamine neurons respond to unexpected rewards.

Animals will work for delivery of drugs that stimulate dopaminergic signalling.

Schultz et al. (1997). Science, 275:1593-1599

Dopamine neurons project into distinct fronto-striatal-thalamic loops

Diagram labels: PFC, Orbitofrontal; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); Thalamus; SNpc, VTA

Is Dopamine a “Pleasure” Signal?

“Liking” vs. “Wanting”


Cannon & Bseikri (2004). Physiol & Behav, 81:741-748.

Does Dopamine Support the Development of Associations That Yield Increased Reward?

Even simple behaviors have multiple opportunities for “habit” formation:

Stimulus-outcome: consequences (feedback) may alter the value of a neutral stimulus

Response-outcome: consequences may alter motor (and cognitive) activity

Stimulus-response-outcome: consequences may alter the relationship between a stimulus & a response

Stimulus-response: after learning, behavior may no longer be governed by outcomes

Example: light -> lever press -> food delivery (stim -> response -> outcome)

The Dopamine Signal May be Ideal to Support Such Reinforcement Learning

Schultz, Dayan & Montague (1997). Science, 275:1593-1599


Egelman et al. (1998). J Cogn Neurosci, 10:623-30.
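The idea that a dopamine-like signal could support reinforcement learning is often illustrated with a temporal-difference (TD) prediction error. The sketch below is only an illustration under stated assumptions, not the model from the cited papers: the trial layout (a cue at step 2, a reward at step 6), the learning rate, and the function names `run_trial` and `simulate` are all hypothetical. It shows the error moving from the time of reward to the time of the cue as learning proceeds, and turning negative when an expected reward is omitted.

```python
def run_trial(values, cue_step, reward_step, alpha=0.2, gamma=1.0,
              reward_delivered=True, learn=True):
    """Run one trial and return the TD error at every time step.

    values[t] is the predicted future reward at step t; the last entry
    is a terminal state whose value stays at zero.
    """
    n_steps = len(values) - 1
    deltas = []
    for t in range(n_steps):
        reward = 1.0 if (reward_delivered and t == reward_step) else 0.0
        # Prediction error: what happened minus what was predicted.
        delta = reward + gamma * values[t + 1] - values[t]
        deltas.append(delta)
        # Assumption: cue timing is unpredictable, so pre-cue steps
        # never acquire value and are left at zero.
        if learn and t >= cue_step:
            values[t] += alpha * delta
    return deltas


def simulate(n_trials=200, n_steps=8, cue_step=2, reward_step=6):
    """Return TD errors for the first trial, the last trial, and a probe
    trial on which the expected reward is withheld."""
    values = [0.0] * (n_steps + 1)
    first = run_trial(values, cue_step, reward_step)
    last = first
    for _ in range(n_trials - 1):
        last = run_trial(values, cue_step, reward_step)
    omitted = run_trial(values, cue_step, reward_step,
                        reward_delivered=False, learn=False)
    return first, last, omitted


if __name__ == "__main__":
    first, last, omitted = simulate()
    # Index 6 is the reward step; index 1 is the transition into the cue.
    print("first trial  :", [round(d, 2) for d in first])
    print("last trial   :", [round(d, 2) for d in last])
    print("reward omitted:", [round(d, 2) for d in omitted])
```

In this sketch the error on the first trial appears only at the reward step; after training it appears at cue onset instead, and withholding the reward yields a negative error at the expected reward time.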

Do ventral & dorsal striatum support different aspects of reinforcement learning?

Ito et al. (2002). J Neurosci, 22:6247-6253

Training:

- Initial Pavlovian training: CS+: light paired with drug delivery; CS-: clicks presented non-contingently

- 2nd-order conditioning: each lever press leads to light (CS+) delivery; 10 lever presses earn drug delivery; drug delivered after a fixed (20 min) interval

Diagram labels: PFC, Orbitofrontal; Dorsal Striatum (Caudate/Putamen); Ventral Striatum (Nucleus Accumbens); Thalamus; SNpc, VTA

(e.g., Elliott et al., 2004; O’Doherty et al., 2004; Robbins et al., 1992)

Emerging Issues for fMRI

• What striatal response properties are observed in humans?

• Are there dissociations between ventral vs. dorsal activity that converge with the animal literature?

• What insight might such dissociations provide into the nature of human reward-related processing?

Do striatal regions respond to the unpredictable delivery of reinforcers?

Schultz et al. (1997). Science, 275:1593-1599

Yes, especially at or near the nucleus accumbens:

Berns et al. (2001). J Neurosci, 21:2793-2798

Do striatal regions respond to delivery of unexpected monetary outcomes?

No significant differences between reward, punishment, and neutral trials were observed.

Figure: Left Nucleus Accumbens (x, y, z = -12, 8, 8). Mean intensity value by time period (T1-T7) for punish, neutral, and reward trials.

How might we reconcile these findings?

• The study by Berns & colleagues involved the delivery of a primary reinforcer.

• The oddball study made use of an abstract, unconditioned cue (red or green arrow) to indicate gain or loss of a secondary reinforcer (delivered later).

• Will delivery of an unexpected, conditioned cue activate the ventral striatum?

Schultz et al. (1997). Science, 275:1593-1599

Unexpected delivery of conditioned cues

Design: Run 1, Run 2, … (runs separated by approximately 23 minutes)

Cues: Notepad, Golf ball, Tape (neutral), Cigarette

• Male heavy smokers (at least 20 cigarettes/day)

• Participants abstained from smoking for 8 hours

• Compliance assessed by expired CO

• Three neutral and one conditioned cue exposure

Figure: Percent change over time (0 to 73.5 s) in the nucleus accumbens (NAcc) and caudate (Caud) for cigarette (cig) and neutral (neu) cues.

Interim Summary

Consistent with prior neurophysiological findings, the ventral striatum responds to the unexpected delivery of primary reinforcers and conditioned cues.

These findings support claims that the ventral striatum plays an integral role in reward-related signaling under normal conditions, and that it may contribute to pathological states such as addiction.

What about the dorsal striatum?

Reward-responsive dopamine neurons also project to the dorsal striatum.

The dorsal striatum has typically been observed to respond weakly in paradigms that drive the ventral striatum.

However, robust reward-related differences have been found in the dorsal striatum using other paradigms.

Diagram labels: PFC, Orbitofrontal; Dorsal Striatum; Ventral Striatum; Thalamus; SNpc, VTA

The Card Guessing Task

Figure: scanning sequence and trial events for a reward trial. Trial events: Choice Period, Card Outcome, Post-Outcome Period. Temporal sequence: marks at 0, 6, 9, 12, and 15 seconds, spanning Scans 1-5. Outcome symbols indicated monetary gain or monetary loss.

Robust dorsal striatal activity is found during the card guessing task

Which aspects of the task account for activation?

• Unlike the ventral striatum, delivery of reinforcer or conditioned cue is not sufficient to activate dorsal striatum.

• Activation during guessing task shows such delivery is not necessary.

• Is it the mere need for an instrumental response?

Figure panels: Oddball task, Guessing task

• Or must there be a real or perceived contingency between the response & the outcome?

Blue circle = single keypress

Yellow circle = choose a keypress

The dorsal striatum is sensitive to perceived response-outcome contingency.

Figure panels: No choice trial, Choice trial

Involvement in response-outcome signaling may apply to complex situations.

Figure: Caudate Activity, Early Trials vs. Late Trials

Do the contributions of the dorsal striatum extend to “cold” cognition?

Figure: % heard as “lake” (0-100) by speech token, from the natural “lake” token through equal intermediate levels to the natural “rake” token, for native English speakers and a native Japanese speaker.

The Development of Speech Categories May Be Self-Organizing

When one neuron A participates in firing another neuron B, the strength of the effect of A on the firing of B is increased.

- paraphrased from Hebb, 1949

Or, put more simply:

Neurons that fire together, wire together.
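Hebb's principle is commonly written as a weight change proportional to the product of presynaptic and postsynaptic activity. The following is a minimal sketch under that assumption; the linear unit, the learning rate, and the function names `hebbian_update` and `train` are hypothetical choices for illustration, not a model taken from the talk.

```python
def hebbian_update(weights, pre, post, learning_rate=0.1):
    """One Hebbian step: each weight grows by learning_rate * pre_i * post."""
    return [w + learning_rate * x * post for w, x in zip(weights, pre)]


def train(patterns, weights, learning_rate=0.1):
    """Present input patterns in order and return the final weights."""
    for pre in patterns:
        # Postsynaptic activity is the weighted sum of the inputs.
        post = sum(w * x for w, x in zip(weights, pre))
        weights = hebbian_update(weights, pre, post, learning_rate)
    return weights


if __name__ == "__main__":
    # Input A is active on every presentation; input B is always silent.
    final = train([[1, 0]] * 5, [0.5, 0.5])
    print(final)  # first weight has grown, second is unchanged
```

Because the update needs no teacher or feedback signal, an input that already drives the unit keeps strengthening its own connection, which is one way to picture a self-organizing and self-reinforcing category.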

Once perceptual categories have been formed, can they be “reshaped”?

Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:

• Exaggerating the differences between sounds could overcome barrier.

• Learning should not require explicit feedback.
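One way to picture training that begins with exaggerated differences is an adaptive rule that narrows the contrast as the listener succeeds. The sketch below is a hypothetical one-up/one-down staircase; the step size, the floor, the 0-100 scale, and the names `next_separation` and `run_adaptive_block` are assumptions for illustration and are not the procedure used in the studies cited in this talk.

```python
def next_separation(separation, correct, step=10, floor=20, ceiling=100):
    """Narrow the contrast after a correct response, widen it after an error.

    separation is the distance between the two stimuli, expressed as a
    percentage of the full continuum (100 = the most exaggerated pair).
    """
    separation += -step if correct else step
    return max(floor, min(ceiling, separation))


def run_adaptive_block(responses, start=100):
    """Return the separation presented on each trial.

    responses is a list of booleans: True for a correct identification.
    """
    separation = start
    used = []
    for correct in responses:
        used.append(separation)
        separation = next_separation(separation, correct)
    return used


if __name__ == "__main__":
    print(run_adaptive_block([True, True, False, True]))  # [100, 90, 80, 90]
```

Note that this rule uses the listener's accuracy only to choose the next stimulus pair; it does not tell the listener whether a response was right.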

Figure: Load-Road series from L to R (0-100), marking the stimuli used in Fixed Training and in Adaptive Training (initial stimuli).

An Empirical Test of the Theory

Figure: Pretest and posttest curves (0-100) along the continuum (0.0-1.0) between the [l] anchor and the [r] anchor, shown for the Fixed Training Condition and the Adaptive Training Condition.


Is the model complete?

Difficulties caused by a self-reinforcing tendency to hear two speech sounds as the same, thus:

• Exaggerating the differences between sounds could overcome barrier.

• Learning should not require explicit feedback.

• But what if feedback is given?


(McCandliss et al., 2002)

Effects of Training Without Feedback

Effects of Training With Feedback

With feedback, both the adaptive and fixed techniques are effective.

Could the differences in learning reflect the engagement of the dorsal striatum?

• Hypothesis:

– In a motivated learner, performance feedback may be rewarding (correct response) or non-rewarding (incorrect response).

– Outcomes may engage striatal reinforcement learning mechanisms.

– Perceptual representations and associated responses that lead to “rewarding” outcomes are strengthened.

• Test by having Japanese subjects perform the /r/ vs. /l/ task with and without feedback.

• Compare activation in perceptual identification task to activation in the guessing task.
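The hypothesis that "rewarding" feedback strengthens the responses that produced it can be sketched as a simple outcome-driven update. This is an illustrative toy, not the analysis or model used in the study: the two-stimulus task, the exploration rate, the learning rate, and the names `choose` and `train_with_feedback` are all assumptions. Feedback is coded as 1 for a correct response and 0 for an incorrect one, and each stimulus-response strength moves toward the feedback it earned.

```python
import random


def choose(strengths, rng, epsilon=0.1):
    """Pick a response: usually the stronger one, sometimes a random one."""
    if rng.random() < epsilon or strengths["r"] == strengths["l"]:
        return rng.choice(["r", "l"])
    return max(strengths, key=strengths.get)


def train_with_feedback(n_trials=500, alpha=0.1, seed=0):
    """Learn stimulus-response strengths from correct/incorrect feedback."""
    rng = random.Random(seed)
    # q[stimulus][response]: expected feedback for giving that response.
    q = {"r": {"r": 0.0, "l": 0.0}, "l": {"r": 0.0, "l": 0.0}}
    for _ in range(n_trials):
        stimulus = rng.choice(["r", "l"])
        response = choose(q[stimulus], rng)
        feedback = 1.0 if response == stimulus else 0.0
        # Only the response that was made is updated, by the difference
        # between the feedback received and the feedback expected.
        q[stimulus][response] += alpha * (feedback - q[stimulus][response])
    return q


if __name__ == "__main__":
    for stimulus, strengths in train_with_feedback().items():
        print(stimulus, {k: round(v, 2) for k, v in strengths.items()})
```

Without the feedback term there is nothing in this rule to separate the two responses, which is the contrast the with-feedback and without-feedback conditions are designed to probe.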

A comparison across tasks.

Figure: trial timelines for the Guessing Task and the Categorization Task, each with a feedback trial and a no-feedback trial (2.5 s, 500 ms, 500 ms, 11.5 s). The categorization task used “fixed” stimuli (0.2, 0.6 along continuum).

Increased Caudate Activation During Feedback Training

The striatum is more active in the feedback as compared to the no-feedback condition.

Performance Feedback Acts Like Gambling Reward/Punishment

The activation is similar in location and pattern to that observed with the guessing task.

Temporal cortex may be affected by top-down outcome signals.

Can we see pre vs. post training differences?

No explicit task: Subjects listen passively to stimuli

An “oddball” response is presented every 16-24 ms

Figure: example stimulus streams of “load” and “road” tokens; responses are shown for time bins 1-5 after oddball onset.

Use fMRI to determine which areas of the brain respond to the oddball stimulus.

If the sounds are perceived as the same, there should be no response to the oddballs.

Examine the Neural Response to Native vs. Non-native Phoneme Contrast

Figure: Pre-test Categorization Curves. Proportion of [r] or [n] responses by stimulus index (0-1) for the road-load and mode-node continua.

• Subjects: native Japanese speakers (n=9)

Before training, auditory regions responded most to the native oddballs.

Figure: Percent change from baseline (0-0.1) for pre-training road-load and mode-node oddballs in the left posterior superior temporal gyrus (x, y, z = 58, -34, 12) and the right posterior superior temporal gyrus (x, y, z = -60, -22, 4). Asterisks mark bars in the original figure; one is annotated (0.14).

After training, the largest responses were to the non-native oddballs.

Figure: Percent change from baseline (0-0.1) for post-training road-load and mode-node oddballs in the right posterior superior temporal gyrus (x, y, z = -60, -22, 4) and the left posterior superior temporal gyrus (x, y, z = 58, -34, 12). Asterisks mark bars in the original figure.

Implications for Perceptual Organization

• The organization of perceptual categories may be mediated by both Hebbian-based and reinforcement-based learning mechanisms.

• During development, both mechanisms may come into play.

Goldstein et al. (2003). PNAS, 100:8030-8035.

Adaptive input:


Kuhl (2004). Nature Reviews Neuroscience, 5:831-843.

Figure: Proportion of canonical syllables across test periods (10 min each): Baseline, Social response, Extinction.

Rewarding outcome:

Feedback may invoke learning that cuts across both implicit & explicit memory tasks.

Implications for Normal Development

The striatum appears to be part of a reinforcement learning system. This system may use rewarding outcomes (broadly construed) to shape:

- perceptual representations of environmental stimuli

- affective (motivational) responses evoked by stimuli & associated contexts

- overt (motor) & covert (?) responses elicited by stimuli

- episodic memory associations or retrieval processes

Dysfunction/abnormal input into this system may result in developmental disorders.

- OCD

- susceptibility to drug abuse and drug addiction:

- stress during early developmental periods


Conclusions

Ventral striatum is responsive to the mere presentation of primary reinforcers and conditioned cues; thus, the ventral striatum may play an important role in representing the incentive value of stimuli.

Dorsal striatum is sensitive to whether there is a perceived contingency between a response and an outcome; thus, dorsal striatum may contribute to selecting and shaping behavior by associating actions with their outcomes.

The dorsal striatum and prefrontal cortex may work together to provide substantial cognitive control over representations of incentive value induced by stimulus events.

The dorsal striatal response is multi-faceted.

The choice period shows a sensitivity to motivational state:

Figure: trial structure (Cue, Choice-Period, Outcome, Feedback) for a “High or Low” card guess, with positive feedback of $4.00 or $0.00, shown for periods of low incentive and periods of high incentive (low reward trial vs. large reward trial).

The outcome period shows a sensitivity to outcome value:

Figure: Left Caudate Nucleus (x, y, z = -8, 8, 5). Activity (0.00-0.16) at time periods T1 and T2 for High Incentive and Low Incentive conditions.

Caudate neurons show selective activation for trials in which the monkey’s movement will be rewarded

Figure panels: rewarded movement and unrewarded movement; trial events: instruction, trigger, then reward or sound.

(Schultz, Tremblay, and Hollerman, 2000)

Modulation of cue-induced craving

Design: Run 1, Run 2, … (runs separated by approximately 23 minutes)

Cues: Notepad, Golf ball, Tape (neutral), Cigarette

All participants refrained from smoking for 8 hours

10 participants expected to smoke midway through scanning session

10 participants did not expect to smoke

Expectancy modulates the cue-induced response:

• affects measures of self-reported craving

• affects facial expressions evoked in response to a conditioned cue

• affects performance on tasks requiring executive control

The dorsal striatum may act in concert with prefrontal regions.

Diagram labels: PFC, Orbitofrontal; Dorsal Striatum; Ventral Striatum; Thalamus; SNpc, VTA

Leon & Shadlen (1999). Neuron, 24:415-425.

Expectancy modulates prefrontal activity

Figure: Percent change from neutral in left and right ventrolateral PFC and in left and right dorsolateral PFC, for the YES and NO groups.

The dorsal striatum is sensitive to perceived response-outcome contingency.

Contingency condition

Instrumental condition

Blue circle = single keypress

Yellow circle = choose a keypress

Figure: signal by time period (T1-T10) for reward and punishment trials, shown separately for No-Choice Trials and Choice Trials.

Behavioral Results

After the imaging study, subjects completed extended training.

With presentation of fixed (non-adaptive) stimuli, robust learning occurred only with feedback.
