Reinforcement learning and human behavior
Hanan Shteingart and Yonatan Loewenstein
MTAT.03.292 Seminar in Computational Neuroscience
Zurab Bzhalava
Introduction
• Operant Learning
• The dominant computational approach to modeling operant learning is model-free RL
• Human behavior is far more complex than model-free RL alone can capture
• Remaining Challenges
Reinforcement Learning
RL: A class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment
Goal: Learn a policy to maximize some measure of long-term reward
Markov Decision Process
• A (finite) set of states S
• A (finite) set of actions A
• Transition model: T(s, a, s') = P(s' | s, a)
• Reward function: R(s)
• γ is a discount factor, γ ∈ [0, 1]
• Policy π
• Optimal policy π*
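One standard way to find an optimal policy for a finite MDP of this kind is value iteration. The sketch below is illustrative only: the two-state, two-action MDP and all of its numbers are invented for the example, and the function name `value_iteration` is not from the slides. It assumes γ < 1 so that the loop converges.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a finite MDP.

    T[s, a, s2] = P(s2 | s, a); R[s] = reward for being in state s.
    Assumes gamma < 1 so the repeated backup is a contraction.
    """
    V = np.zeros(T.shape[0])
    while True:
        # Q[s, a] = R(s) + gamma * sum over s2 of T(s, a, s2) * V(s2)
        Q = R[:, None] + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical MDP: action 1 tends to lead to state 1, action 0 to state 0.
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0 and 1
    [[0.8, 0.2], [0.1, 0.9]],  # transitions from state 1 under actions 0 and 1
])
R = np.array([0.0, 1.0])       # only state 1 is rewarding

V, policy = value_iteration(T, R)
print(V, policy)
```

In this toy problem the greedy policy picks action 1 in both states, because that action makes the rewarding state more likely.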
Markov Decision Process
Bellman equation:
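Assuming the state-based reward R(s), the transition model T(s, a, s') and the discount factor γ defined for the MDP, a standard form of the Bellman optimality equation and the policy it induces is:

```latex
V^{*}(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s')

\pi^{*}(s) = \arg\max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s')
```

The value of a state is its immediate reward plus the discounted value of the best achievable expected successor state.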
Biological Algorithms
• Behavioral control
• Evaluate the world quickly
• Choose appropriate behavior based on those valuations
Midbrain dopamine neurons
• Central role in guiding our behavior and thoughts
• Valuation of our world
– Value of money
– Other human beings
• Major role in decision-making
• Reward-dependent learning
• Malfunction in mental illness
• Related to Parkinson's disease
• Schizophrenia
Reinforcement signals define an agent's goals
1. organism is in state X and receives reward information;
2. organism queries stored value of state X;
3. organism updates stored value of state X based on current reward information;
4. organism selects action based on stored policy;
5. organism transitions to state Y and receives reward information.
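The five steps can be sketched as a temporal-difference, TD(0), learning loop. This is a minimal illustration, not the slides' own algorithm: the function `td0`, the two-state world and the parameter values are assumptions, and the update in step 3 is applied once the next state Y is known, since the TD(0) rule uses the stored value of Y.

```python
def td0(transition, reward, policy, start, n_steps=5000, alpha=0.05, gamma=0.9):
    """Learn state values with TD(0) while following a fixed policy."""
    V = {}                                      # stored values, default 0
    state = start
    for _ in range(n_steps):
        r = reward(state)                       # step 1: reward information in state X
        v_x = V.get(state, 0.0)                 # step 2: query stored value of X
        action = policy(state)                  # step 4: select action from stored policy
        next_state = transition(state, action)  # step 5: transition to state Y
        # step 3: update the stored value of X from the reward and the value of Y
        delta = r + gamma * V.get(next_state, 0.0) - v_x
        V[state] = v_x + alpha * delta
        state = next_state
    return V

# Hypothetical world: states A and B alternate, and only B is rewarding.
V = td0(
    transition=lambda s, a: "B" if s == "A" else "A",
    reward=lambda s: 1.0 if s == "B" else 0.0,
    policy=lambda s: "go",
    start="A",
)
print(V)
```

In this deterministic example the learned values approach V(B) = 1 / (1 - γ²) and V(A) = γ · V(B).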
The reward-prediction error hypothesis
Difference between the experienced and predicted “reward” of an event
• Neurons of the ventral tegmental area
• Phasic activity changes encode a 'prediction error about summed future reward'
Prediction-error signal encoded in dopamine neuron firing.
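In temporal-difference models, a prediction error about summed future reward is usually written as follows. The symbols are the conventional ones and are an assumption here: r_t is the reward at time t, V is the learned value function, γ is the discount factor and α is a learning rate.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

δ is positive when the outcome is better than predicted, zero when it is fully predicted, and negative when a predicted reward is omitted.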
Value binding
Human reward responses
Model-based RL vs Model-free RL
• goal-directed vs habitual behaviors
• Implemented by two anatomically distinct systems (subject of debate)
• Some findings suggest:
– Medial striatum is more engaged during planning
– Lateral striatum is more engaged during choices in extensively trained tasks
Model-based RL vs Model-free RL
(b) Model-free RL
(c) Model-based RL
Human subjects exhibited a mixture of both effects.
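The contrast can be illustrated with a toy devaluation example. A model-free learner caches action values from experienced rewards, so it keeps its old preference after the reward changes. A model-based learner replans from its world model and switches at once. Everything in this sketch is hypothetical: the two-state world, the function names and the mixing weight `w`.

```python
import numpy as np

def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning step: adjust the cached value Q[s, a] in place."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def model_based_values(T, R, gamma=0.9, n_sweeps=200):
    """Plan with a world model: T[s, a, s2] = P(s2 | s, a), R[s] = reward in s."""
    V = np.zeros(T.shape[0])
    for _ in range(n_sweeps):
        Q = R[:, None] + gamma * (T @ V)
        V = Q.max(axis=1)
    return Q

def mixed_values(Q_mf, Q_mb, w=0.5):
    """Weighted mixture of both systems; w is an assumed free parameter."""
    return w * Q_mb + (1.0 - w) * Q_mf

# Hypothetical world: taking action a always leads to state a.
T = np.zeros((2, 2, 2))
for s in range(2):
    for a in range(2):
        T[s, a, a] = 1.0

R_old = np.array([0.0, 1.0])   # before devaluation: state 1 is rewarding
R_new = np.array([1.0, 0.0])   # after devaluation: state 0 is rewarding

# The model-free learner is trained only under the old rewards.
Q_mf = np.zeros((2, 2))
for _ in range(200):
    for s in range(2):
        for a in range(2):
            model_free_update(Q_mf, s, a, R_old[s], s_next=a)

Q_mb_old = model_based_values(T, R_old)
Q_mb_new = model_based_values(T, R_new)   # replanned right after the change
Q_mix = mixed_values(Q_mf, Q_mb_new)

print("model-free choice: ", Q_mf.argmax(axis=1))
print("model-based choice:", Q_mb_new.argmax(axis=1))
```

Here the cached model-free values still favor action 1 after the reward change, while the replanned model-based values favor action 0.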
Challenges in relating human behavior to RL algorithms
• Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff
• Tremendous heterogeneity in reports on human operant learning
• Probability matching or not
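A short worked example shows why probability matching is notable. Assume a hypothetical two-option task in which exactly one option pays on each trial and option A pays with probability p. Matching means choosing A with probability p, and maximizing means always choosing the better option. The function name and p = 0.7 are illustrative.

```python
def expected_accuracy(p_reward, p_choose):
    """Probability of picking the paying option.

    p_reward: probability that option A is the paying option on a trial.
    p_choose: probability that the subject chooses option A.
    """
    return p_reward * p_choose + (1 - p_reward) * (1 - p_choose)

p = 0.7
matching = expected_accuracy(p, p)       # choose A on 70% of trials
maximizing = expected_accuracy(p, 1.0)   # always choose A

print(round(matching, 2), round(maximizing, 2))
```

With p = 0.7, matching pays on about 58% of trials and maximizing on 70%, so matching is suboptimal under these assumptions.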
Heterogeneity in world model
Learning the world model
Reference List:
• Hanan Shteingart and Yonatan Loewenstein. Reinforcement learning and human behavior.
• Bradley B. Doll, Dylan A. Simon and Nathaniel D. Daw. The ubiquity of model-based reinforcement learning.
• P. Read Montague, Steven E. Hyman and Jonathan D. Cohen. Computational roles for dopamine in behavioral control.