Cognitive Modeling

Aakash Hingu (IIIrd Year)
Amit Kumar Swami (IVth Year)
Himanshu Singh (IIIrd Year)
Saikiran Boga (IIIrd Year)
Saiteja Sirikonda (IIIrd Year)
Sudanshu Gaur (IIIrd Year)

Date: 19th May 2013
Motivation
• A cognitive model is an approximation of cognitive processes (predominantly human) for the purposes of comprehension and prediction.
• We try to imitate the workings of the human brain and implement them through a mathematical model.
• In the coming slides, we explain the following models:
  – Reinforcement Learning with Inertia (IRL)
  – Normalized Reinforcement Learning with Inertia
  – The BOSS Model
Task

• 60 problems
• Each problem was played by 20 participants
• Each problem was played for 100 trials by each participant
• Each trial involves a choice between a risky and a safe option
  – Example:
    • Risky: -$32 with p = 0.1; $0 otherwise
    • Safe: -$3 with p = 1
Dataset

Column 1: The proportion of risky choices, averaged over all participants and time periods, for each problem
Column 2: High outcome of the risky option
Column 3: Probability of the high outcome on the risky option
Column 4: Low outcome of the risky option
Column 5: Medium outcome of the safe option
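As a reference for the model sketches on the later slides, here is a minimal loading sketch for these five columns; the file name and the comma-delimited layout are assumptions, not part of the original slides.

import numpy as np

# Hypothetical file name; one row per problem, five columns as above.
data = np.loadtxt("tpt_problems.csv", delimiter=",")

prop_risky = data[:, 0]  # Column 1: observed proportion of risky choices
high       = data[:, 1]  # Column 2: high outcome of the risky option
ph         = data[:, 2]  # Column 3: probability of the high outcome
low        = data[:, 3]  # Column 4: low outcome of the risky option
med        = data[:, 4]  # Column 5: medium outcome of the safe option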
Models

Inertia Reinforcement Learning (IRL):

• It is a basic reinforcement learning method to which we add inertia.
• That is, the previous choice tends to be sustained, while the model keeps "learning" from outcomes with the weight provided.
• We therefore call it the Inertia Reinforcement Learning (IRL) model; a minimal simulation sketch follows below.
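The sketch below simulates one participant on one problem. It assumes the inertia rule repeats the previous choice with probability pinert and that option values are updated as a weighted average with weight w; the exact choice rule of our implementation is not shown on the slides, so a simple value comparison stands in for it.

import numpy as np

def simulate_irl(high, ph, low, med, pinert=0.656, w=0.545,
                 n_trials=100, rng=None):
    """One participant, one problem, under the IRL model (sketch).

    Returns the proportion of risky choices over n_trials.
    """
    rng = rng or np.random.default_rng()
    # Initial values: expected value of each option (an assumption).
    v_risky = high * ph + low * (1 - ph)
    v_safe = med
    last = None
    risky_count = 0
    for t in range(n_trials):
        if last is not None and rng.random() < pinert:
            choice = last  # inertia: repeat the previous choice
        else:
            choice = "risky" if v_risky > v_safe else "safe"
        if choice == "risky":
            payoff = high if rng.random() < ph else low
            v_risky = (1 - w) * v_risky + w * payoff  # learn with weight w
            risky_count += 1
        else:
            v_safe = (1 - w) * v_safe + w * med
        last = choice
    return risky_count / n_trials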
Result:
• MSD for the best TPT model: 0.0094 (MSD is defined in the sketch below)

*NOTE: We optimized the parameter values (pinert and the weight w) with a global optimization tool.

• pinert: 0.656
• w (weight): 0.545
• MSD for the IRL model: 0.00768
• The IRL model's MSD is better than the best TPT model's MSD by a factor of about 1.22.
• The reason is that our model learns through the inertia we have applied to the most recent favourable outcomes.
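Throughout these slides, MSD is the mean squared deviation between a model's predicted proportion of risky choices and the observed proportion (Column 1 of the dataset), taken over the 60 problems. A minimal sketch:

import numpy as np

def msd(predicted, observed):
    """Mean squared deviation between per-problem predictions and data."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((predicted - observed) ** 2))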
Reinforcement Learning With Normalization
The probability of selecting the risky prospect at trial t is given by

P_t(risky) = exp(μ * WV_t(risky) / D_t) / [exp(μ * WV_t(risky) / D_t) + exp(μ * WV_t(safe) / D_t)]

where
• WV_t(k) is the weighted value of action k at trial t,
• μ is a free payoff sensitivity parameter, and
• D_t is a measure of experienced payoff variability.
Reinforcement Learning With Normalization (Cont.)
If strategy k was selected at trial t, its weighted value at trial t+1 is a weighted average of WV_t(k) and v_t, the payoff obtained at t:

WV_{t+1}(k) = (1 - ω) * WV_t(k) + ω * v_t

The parameter 0 < ω < 1 captures the weight given to recent outcomes. The initial value is

WV_1(k) = 0.5 * (high * ph + low * (1 - ph)) + 0.5 * med

The payoff variability term D_t is the weighted average of the absolute difference between the payoffs obtained at trials t and t-1:

D_{t+1} = (1 - ω) * D_t + ω * |v_t - v_{t-1}|

where v_0 is assumed to equal A(1), and D_1 is assumed to equal μ.
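A minimal simulation sketch under the equations above. The fitted values from the next slide are used as defaults, v_0 is approximated by the initial weighted value, and the clipping of the logit argument is our own numerical safeguard.

import numpy as np

def simulate_nrl(high, ph, low, med, mu=0.59, omega=0.395,
                 n_trials=100, rng=None):
    """One participant, one problem, under the normalized RL model (sketch)."""
    rng = rng or np.random.default_rng()
    wv1 = 0.5 * (high * ph + low * (1 - ph)) + 0.5 * med
    wv = {"risky": wv1, "safe": wv1}
    d = mu          # D_1 is assumed to equal mu (as on the slide)
    v_prev = wv1    # stand-in for v_0 = A(1)
    risky_count = 0
    for t in range(n_trials):
        # Logit choice rule, normalized by the payoff variability D_t.
        z = np.clip(mu * (wv["risky"] - wv["safe"]) / d, -50, 50)
        p_risky = 1.0 / (1.0 + np.exp(-z))
        choice = "risky" if rng.random() < p_risky else "safe"
        if choice == "risky":
            v = high if rng.random() < ph else low
            risky_count += 1
        else:
            v = med
        # Update the chosen option's weighted value, then D_t.
        wv[choice] = (1 - omega) * wv[choice] + omega * v
        d = (1 - omega) * d + omega * abs(v - v_prev)
        v_prev = v
    return risky_count / n_trials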
Normalized Reinforcement Learning with Inertia

• Inertia is embedded into the NRL model.
• Here inertia is treated as a free parameter and optimized with a genetic algorithm (GA); a sketch of the inertia rule follows below.
• Optimized value of inertia: 0.537
• Optimized value of ω: 0.395
• Optimized value of the payoff sensitivity λ: 0.59
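A sketch of how the inertia rule can wrap the NRL choice, assuming the same repeat-with-probability rule as in the IRL model (the function name is ours):

def choose_with_inertia(p_risky, last_choice, pinert, rng):
    """With probability pinert, repeat the previous choice;
    otherwise fall back to the underlying NRL choice rule.

    rng is e.g. numpy.random.default_rng().
    """
    if last_choice is not None and rng.random() < pinert:
        return last_choice
    return "risky" if rng.random() < p_risky else "safe"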
Result:
The resulting MSD is 0.0278.
Models (Cont.)
2. BOSS Model
Basic Idea:
Tactically, when a person gambles, he sets aside a certain amount of money that he can afford to lose. He takes more and more risk as he gains more money, and tends to choose safer options as he loses.
BOSS Model (Cont.)

Algorithm:
For each trial:
    If it is the first trial, take a random choice
    Else if the cumulative outcome so far >= compare, take a random choice
    Else, take the safe choice
    If the choice is risky, add an outcome drawn randomly between low and high to the cumulative outcome
    Else, add the medium outcome to the cumulative outcome
BOSS Model (Cont.)

where compare is decided based on the medium outcome:

compare = max_num_loss * med   if med < 0
        = 0                    otherwise

max_num_loss is the only free parameter here; its optimized value, found using a GA on the given dataset, is 46. A runnable sketch of the model follows below.
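The sketch below follows the algorithm on the previous slide; the names are ours, and the "random choice" is assumed to be an even 50/50 pick between risky and safe.

import numpy as np

def simulate_boss(high, ph, low, med, max_num_loss=46,
                  n_trials=100, rng=None):
    """One participant, one problem, under the BOSS model (sketch)."""
    rng = rng or np.random.default_rng()
    # Loss budget: a negative threshold when the safe outcome is a loss.
    compare = max_num_loss * med if med < 0 else 0.0
    total = 0.0
    risky_count = 0
    for t in range(n_trials):
        if t == 0 or total >= compare:
            choice = "risky" if rng.random() < 0.5 else "safe"
        else:
            choice = "safe"  # losses exceeded the budget: play safe
        if choice == "risky":
            total += high if rng.random() < ph else low
            risky_count += 1
        else:
            total += med
    return risky_count / n_trials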
Result:
MSD = 0.0238