Show Me the Money!
Dmitry Kit
Outline
Overview Reinforcement Learning Other Topics Conclusions
Learning Models
Hebbian Learning
Strengthens the relationship between neurons that exhibit similar activity patterns and/or are in close proximity
Might explain topological features of the brain: population coding, basis function learning, area allocation to different functions
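The Hebbian idea of strengthening connections between neurons with similar activity can be sketched in a few lines. This is only an illustration, assuming the simplest product rule (weight change proportional to pre- times post-synaptic activity); the function name, the learning rate, and the activity values are hypothetical and do not come from the slides.

```python
def hebbian_update(weight, pre_activity, post_activity, learning_rate=0.01):
    """Return the new weight after one Hebbian step.

    Assumption: plain product rule, delta_w = learning_rate * pre * post.
    The weight grows only when both neurons are active together.
    """
    return weight + learning_rate * pre_activity * post_activity


def train(weight, activity_pairs, learning_rate=0.01):
    """Apply the Hebbian step over a sequence of (pre, post) activity pairs."""
    for pre, post in activity_pairs:
        weight = hebbian_update(weight, pre, post, learning_rate)
    return weight


# Neurons that fire together strengthen their connection ...
correlated = train(0.0, [(1.0, 1.0)] * 10, learning_rate=0.1)
# ... while a connection whose partner stays silent is left unchanged.
uncorrelated = train(0.0, [(1.0, 0.0)] * 10, learning_rate=0.1)
```

Note that this bare rule only ever increases weights for positive activity; practical models usually add some normalization or decay, which is left out here.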
Reinforcement Learning (RL)
Strengthens the relationship between choices that are causally connected in obtaining some reward
RL Framework and the Brain
Reward signal representation: Dopamine
Local action selection structures: Lateral Intra-parietal Area (LIP), Supplementary Eye Field (SEF), Frontal Eye Field (FEF)
Global mechanism for action selection: Basal Ganglia
Outline
Overview Reinforcement Learning
Reward Signal (Dopamine)
Decision Variables (SEF, LIP, Other)
Global Mechanism for Choice (Basal Ganglia)
Other Topics Conclusions
Reward Signal (Dopamine)
Located in Substantia Nigra Pars Compacta (SNc)
Modulates neurons in many different regions
Tonic low frequency activity
Sends only the error signal: the difference between expected and actual rewards
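The error signal between expected and actual reward can be sketched as a simple prediction-error update. This is a minimal illustration, assuming a Rescorla-Wagner style rule in which the expectation moves a fraction of the way toward each received reward; the function names and the learning rate are hypothetical and are not taken from the slides.

```python
def prediction_error(expected_reward, actual_reward):
    """Signed difference between the reward received and the reward expected."""
    return actual_reward - expected_reward


def update_expectation(expected_reward, actual_reward, learning_rate=0.1):
    """Move the expectation a fraction of the way toward the actual reward."""
    delta = prediction_error(expected_reward, actual_reward)
    return expected_reward + learning_rate * delta


def simulate(rewards, learning_rate=0.1, initial_expectation=0.0):
    """Return the per-trial error signals and the final expectation."""
    expectation = initial_expectation
    errors = []
    for reward in rewards:
        # The error is computed before the expectation is updated.
        errors.append(prediction_error(expectation, reward))
        expectation = update_expectation(expectation, reward, learning_rate)
    return errors, expectation


# A reward delivered on every trial: the error starts large and shrinks
# toward zero as the reward becomes fully expected.
errors, expectation = simulate([1.0] * 50, learning_rate=0.5)
```

In this sketch an unexpected reward gives a positive error, a fully predicted reward gives an error near zero, and an omitted reward that was expected gives a negative error.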
RL and Dopamine
Decision Variables in LIP
Contains neurons that code for expected gain and for relative rewards between different actions
This activity was observed before the choices were actually presented and before movement was made, suggesting that these neurons were used to decide on the appropriate action
LIP Neuron Activity
Expectation of:
High reward produced high firing frequency (black line)
Low reward produced low firing frequency (gray line)
The firing rate was correlated with gain expectation early in the trial
Overall Neural Activity in LIP
A large portion of examined neurons showed significant activity related to gain expectation, outcome probability, and estimated value
This activity was mostly exhibited in the early part of the trial
These neurons were also modulated by the actual movement
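The decision variables named on the LIP slides (gain expectation, outcome probability, relative reward between actions) can be made concrete with a small sketch. The slides give no formulas, so this assumes the textbook definitions: expected gain is outcome probability times reward magnitude, and relative value is each action's share of the total expected gain. All names and numbers below are hypothetical.

```python
def expected_gain(probability, magnitude):
    """Expected gain of one action: outcome probability times reward magnitude."""
    return probability * magnitude


def relative_values(gains):
    """Express each action's expected gain as a fraction of the total."""
    total = sum(gains.values())
    if total == 0:
        # No action is expected to pay off, so no action is favoured.
        return {action: 0.0 for action in gains}
    return {action: gain / total for action, gain in gains.items()}


# Two possible eye movements with equal reward size but different odds.
gains = {
    "left": expected_gain(probability=0.8, magnitude=1.0),
    "right": expected_gain(probability=0.2, magnitude=1.0),
}
relative = relative_values(gains)
```

Under these assumptions a neuron coding relative value for the "left" movement would be driven more strongly than one coding for "right", even before either target is chosen.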
Neural Features of SEF
Three types of neurons found:
Active upon failure to perform task: not responsible for executing actions; not related to spatial stimuli; possible error signal coding
Active upon success: not a response to visual stimuli; not responsible for motor control; related to some internal coding of performance
Active before and during the delivery of reinforcement: possibly interconnected to other regions of the brain; seem to code expected reward versus actual reward received
Function of SEF
Monitoring and controlling:
Perception and production systems during decision making
Error correction
Production of responses that are not well-learned
Overcoming habitual responses
Evidence:
Neurons do not generate eye movements
Monitor performance and reward
Reward Coding in Other Structures
Neurons in orbitofrontal cortex:
Show selectivity to the type of physical reward (solid, liquid, etc.)
Distinguish between rewards and punishers
Some neurons in amygdala respond to the magnitude of reward
Local Choices
Multiple areas in charge of decision making: Frontal Eye Field (FEF), LIP, Supplementary Eye Field (SEF), etc.
Might have different goals
Need a global mechanism to arbitrate between these different goals
Physiology (Basal Ganglia)
Located at the base of the cerebrum Consists of:
Caudate Nucleus (CD) and putamen (PUT) (collectively called striatum) Input from cerebral cortex and part of the thalamus
Globus pallidus External Segment (GPe) Internal Segment (GPi)
Subthalamic Nucleus (STN) Receives direct input from cerebral cortex
Substantia Nigra: Pars reticulata (SNr), Pars compacta (SNc)
Output Stations (GPi and SNr) To thalamus and brain stem motor areas
Anatomical Locations (BG)
BG: Function
Controls:
Thalamocortical networks: mainly involved in hand or arm movements
Brain stem motor networks:
Superior colliculus: eye-head orienting
Pedunculopontine nucleus: locomotion
Periaqueductal gray: vocalization, autonomic responses
BG-SC connection
Exists in many lower mammals
Method of control:
CD inhibits neural activity in SNr
SNr projects inhibitory connections to SC
Inhibition is the main method of control
Appropriate action is selected by inhibiting all except the desired action
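Selection by inhibiting everything except the desired action can be sketched as a gate with a tonic brake. This is a toy model under stated assumptions: each candidate action is one channel, SNr holds every SC channel under a fixed tonic inhibition, and CD activity on a channel subtracts from that inhibition. The function names, the tonic level of 1.0, and the rectified-subtraction arithmetic are hypothetical choices for illustration, not measurements.

```python
def snr_activity(cd_activity, tonic_level=1.0):
    """SNr fires tonically; CD input on a channel inhibits (reduces) it."""
    return [max(0.0, tonic_level - cd) for cd in cd_activity]


def sc_output(cortical_drive, snr):
    """SC activity is the excitatory drive minus SNr inhibition, floored at 0."""
    return [max(0.0, drive - inhibition)
            for drive, inhibition in zip(cortical_drive, snr)]


def select_action(cortical_drive, cd_activity, tonic_level=1.0):
    """Return the indices of released channels and the SC output per channel."""
    snr = snr_activity(cd_activity, tonic_level)
    output = sc_output(cortical_drive, snr)
    released = [index for index, value in enumerate(output) if value > 0.0]
    return released, output


# Three candidate saccades all receive excitatory drive, but CD releases
# only the second one from SNr inhibition.
released, output = select_action(
    cortical_drive=[0.8, 0.9, 0.7],
    cd_activity=[0.0, 1.0, 0.0],
)
```

With no CD activity at all, every channel stays suppressed in this sketch, which matches the gating (rather than enhancement) reading of the circuit.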
Neural Properties of BG
Contains memory-guided neurons
Contains neurons that code expectation of task-specific events
SNr:
Only affected by planned movements
Response fields of neurons are the same as those of the neurons they connect to in SC
Circuit Diagram
Coordinated Activity Model
Use GPe to select just the activity you need (Focus)
Use STN to inhibit a planned future activity (Sequencing)
Might be an incorrect model if we emphasize the direct cortical input to the STN: direct control over movement suppression
Learning of Sequential Procedures
Frontoparietal association cortices and anterior part of the basal ganglia learn new sequences (uses visuospatial coordinates)
Motor-premotor cortices and the mid-posterior part of the basal ganglia exploit learned sequences (uses motor coordinates)
BG and Decision Making
Ventral striatum receives input from neocortical areas (cognition) and limbic (emotional) areas
Speed of saccades is related to emotional or motivational state
As with SEF and LIP many BG neurons respond to the expectation of reward
Uses dopaminergic neurons to: Modulate selectivity of individual neurons Modulate response magnitude of individual neurons
Circuit Diagram Revisited
Consequences of BG Disorders
Involuntary movement
Random movement
Visually guided saccades: shorter saccades
Problems with coordinated movements
Response deficits in memory-guided saccades
Trouble holding fixation, especially if STN is damaged
Inability to learn sequential procedures
Lack of motivation to perform actions
Why Disinhibition?
Possibly an evolutionary by-product
Need a gating mechanism not an enhancement mechanism
Outline
Overview Reinforcement Learning Other Topics Attention Vs. Reward Credit Assignment Problem Conclusions
Attention or Reward?
Attention is a more global concept than reward
Defined as the study of vigilance, selective processing of stimuli, and control systems for complex behavior
Attention can modulate neurons before the onset of stimuli, just like reward expectation neurons
Attention is dependent on task difficulty
How does one distinguish between a reward expectation signal and attention to a particular stimulus at a single neuron?
Some studies of attention might have been looking at the same neural signal as those studying reward
Provide better definitions for reward and attention
Attention might be defined in terms of rewards
The Credit Assignment Problem
What chain of actions resulted in reward?
Which of the actions to the right got you your steak?
Action/order 1st 2nd 3rd 4th 5th
Take a car
Take a bike
Open Door
Sit down
Shift in seat
Point at Menu Item
Tap the table
Solution
Start from the point of receiving the reward
Recently shown that in rats the hippocampus replays their daily experiences backwards
Back-propagate the reward at a discounted rate
In monkeys, the neurons coding for reward expectation decreased their activity when the delay between the cue and reward was increased
Converges to optimal policy if:
Any action has some positive probability of being chosen
Infinite time
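Starting at the reward and passing it backwards at a discounted rate can be sketched as follows. This is a minimal illustration, assuming a single episode of distinct actions, a fixed discount factor of 0.9, and an ordering of the steak-example actions that is invented for the demonstration (the slide's table does not give one). The epsilon-greedy chooser is one common way to keep every action at a positive probability of being chosen; it is an assumption here, not something the slides prescribe.

```python
import random


def backpropagate_reward(actions, reward, discount=0.9):
    """Assign credit to each action, starting from the reward and moving back.

    The last action gets the full reward; each earlier action gets the
    credit of its successor multiplied by the discount factor.
    Assumes every action appears only once in the sequence.
    """
    credit = {}
    value = reward
    for action in reversed(actions):
        credit[action] = value
        value *= discount
    return credit


def epsilon_greedy(values, epsilon=0.1, rng=random):
    """Pick the best-valued action, but explore at random with prob. epsilon."""
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return max(values, key=values.get)


# Hypothetical ordering of the actions that preceded the steak.
actions = ["Take a car", "Open Door", "Sit down", "Point at Menu Item"]
credit = backpropagate_reward(actions, reward=1.0, discount=0.9)
```

In this sketch the action closest to the reward receives the most credit, and credit fades with distance from the reward, which parallels the reported drop in reward-expectation activity as the cue-to-reward delay grows.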
Infinite Time
Humans do not need an optimal policy
The inherent randomness in our environment, behaviors, and tasks makes it unlikely that a set of truly unrelated actions will coincide frequently
Conclusion
Choose a set of actions (e.g., SEF, LIP, etc.)
Execute a subset of actions that do not violate physical limitations (Basal Ganglia)
Compare the final result against the expected result (Dopamine)
Try to do the task again
Almost every task that we execute uses the eyes to locate the target of interest and therefore it is not surprising that the eye is closely related to the current task.
Might be a huge oversimplification: “Correlation does not imply causation”
More experiments are needed to show these relationships
The End
Thank You