The Computing Brain:
Decision-Making as a Case Study
Prof. Angela Yu
Department of Cognitive Science
Understanding the Brain/Mind
Behavior ? Neurobiology
Cognitive Neuroscience
A powerful analogy: the computing brain
Brain ≈ Computer
But the analogy is imperfect (≉):
Brain << Supercomputer
Brain >> Supercomputer
Marr’s “Three Levels of Analysis”
David Marr (1969): Brain = Information Processor
• computational: goals of computation (why things work the way they do)
• algorithmic: representation of input/output (how one is transformed into the other)
• implementational: how the system is physically realized (neural representation and dynamics)
NEGOTIATING SPEED-ACCURACY TRADEOFF IN SEQUENTIAL IDENTIFICATION UNDER A STOCHASTIC DEADLINE
the deadline Θ, or by the successful registry of the subject’s decision, whichever occurs earlier;
“∧” denotes the minimum of the two arguments on either side of it. Then, by the strong law of large
numbers, the long-run average reward per unit time equals ER/ET with probability one. Therefore,
the maximum reward rate problem is equivalent to solving the stochastic optimization problem
$$V := \sup_{(\tau,\mu)} \frac{\mathbb{E}\left[\mathbf{1}_{\{\tau+T_0<\Theta\}} \sum_{j=1}^{m} r_j \mathbf{1}_{\{\mu=j,\,M=j\}}\right]}{\mathbb{E}\left[(\tau+T_0)\wedge\Theta\right]},$$
for which we will show that an optimal solution always exists and describe how to calculate the
supremum and an admissible decision rule (τ, µ) which attains the supremum.
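For intuition, the reward rate ER/ET can be estimated by simulation under any fixed decision rule. The sketch below is not from the paper: it assumes two hypotheses, unit rewards, Gaussian observations, a hand-picked threshold policy on the log-likelihood ratio, and a fixed rather than stochastic deadline, purely as a toy illustration.

```python
import random

def simulate_trial(rng, drift=0.5, sigma=1.0, thresh=2.0,
                   deadline=20.0, t0=1.0):
    """One trial: accumulate the log-likelihood ratio (LLR) until it
    crosses +/-thresh or the deadline passes; reward 1 if the choice is
    correct and registered (after non-decision time t0) before the deadline."""
    m = rng.choice([1, -1])                 # true hypothesis
    llr, t = 0.0, 0.0
    while abs(llr) < thresh and t < deadline:
        x = rng.gauss(m * drift, sigma)     # noisy observation
        llr += 2 * drift * x / sigma**2     # LLR increment for N(+/-drift, sigma^2)
        t += 1.0
    choice = 1 if llr >= 0 else -1
    reward = 1.0 if (t + t0 < deadline and choice == m) else 0.0
    duration = min(t + t0, deadline)        # (tau + T0) ^ Theta
    return reward, duration

def reward_rate(n=5000, seed=0, **kw):
    """Monte Carlo estimate of E[R] / E[T] over n i.i.d. trials."""
    rng = random.Random(seed)
    trials = [simulate_trial(rng, **kw) for _ in range(n)]
    return sum(r for r, _ in trials) / sum(d for _, d in trials)
```

By the strong law of large numbers this ratio converges to the long-run average reward per unit time of the chosen (generally suboptimal) policy; the optimization problem above asks for the supremum over all policies.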
An important theoretical question is whether and how Bayes-risk minimization and reward-rate
maximization are related to each other. In this work, we demonstrate that reward rate maximization
for this class of problems is formally equivalent to solving the family (W (c))c>0 of Bayes-risk
minimization problems,
$$W(c) := \inf_{(\tau,\mu)} \mathbb{E}\left[c\left((\tau+T_0)\wedge\Theta\right) + \mathbf{1}_{\{\tau+T_0<\Theta\}} \sum_{i\neq j} r_j \mathbf{1}_{\{\mu=i,\,M=j\}} + \mathbf{1}_{\{\tau+T_0\geq\Theta\}} \sum_{j=1}^{m} r_j \mathbf{1}_{\{M=j\}}\right],$$
indexed by the unit sampling (observation or time) cost c > 0, thus rendering the reward-rate
maximization problem amenable to a large array of existing analytical and computational tools in
stochastic control theory. In particular, we show that the maximum reward rate V is the unique
unit sampling cost c > 0 which makes the minimum Bayes risk W (c) equal to the maximal expected
reward $\sum_{j=1}^{m} r_j \mathbb{P}(M=j)$ under the prior distribution. Moreover,
$$c \gtrless V \quad\text{if and only if}\quad \inf_{(\tau,\mu)} \mathbb{E}\left[c\left((\tau+T_0)\wedge\Theta\right) - \mathbf{1}_{\{\tau+T_0<\Theta\}} \sum_{j=1}^{m} r_j \mathbf{1}_{\{\mu=j,\,M=j\}}\right] \gtrless 0;$$
namely, the maximum reward rate V is the unique unit sampling cost c for which expected total
observation cost $\mathbb{E}\left[c\left((\tau^*+T_0)\wedge\Theta\right)\right]$ and expected terminal reward $\mathbb{E}\left[\mathbf{1}_{\{\tau^*+T_0<\Theta\}} \sum_{j=1}^{m} r_j \mathbf{1}_{\{\mu^*=j,\,M=j\}}\right]$
break even under any optimal decision rule (τ∗, µ∗).
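This break-even characterization can be checked numerically in a stripped-down special case. The sketch below makes assumptions not in the paper: two hypotheses, unit rewards, no deadline (Θ = ∞), Gaussian observations, decision rules parameterized by a single LLR threshold θ, and Wald's classical approximations (ignoring boundary overshoot) for the error rate and expected decision time of the symmetric SPRT.

```python
import math

def wald_stats(theta, drift=0.5, sigma=1.0, t0=1.0):
    """Wald approximations for a symmetric SPRT with LLR boundaries +/-theta:
    expected terminal reward (unit reward for a correct choice) and expected
    total duration E[tau + T0], for Gaussian observations N(+/-drift, sigma^2)."""
    err = 1.0 / (1.0 + math.exp(theta))
    k = 2 * drift**2 / sigma**2               # mean LLR drift per unit time
    dur = theta * (1 - 2 * err) / k + t0      # E[tau] plus non-decision time T0
    return 1.0 - err, dur

THETAS = [0.02 * i for i in range(501)]       # candidate LLR thresholds 0..10

def reward_rate_V():
    """Maximum over thresholds of E[reward] / E[duration]."""
    return max(r / d for r, d in map(wald_stats, THETAS))

def bayes_gap(c):
    """Infimum over thresholds of c * E[duration] - E[terminal reward]."""
    return min(c * d - r for r, d in map(wald_stats, THETAS))

def break_even_cost(lo=1e-4, hi=2.0, iters=60):
    """Bisection: bayes_gap is increasing in c and crosses zero at c = V."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bayes_gap(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

On the same threshold grid, the root of the Bayes "gap" found by bisection coincides with the maximum reward rate, matching the break-even statement above.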
In Section 2, we characterize the Bayes-risk minimization solution to the multi-hypothesis sequential
identification problems W(c), c > 0, under a stochastic deadline. This treatment extends
our previous work on Bayes risk minimization in sequential testing of multiple hypotheses [4] and
of binary hypotheses under a stochastic deadline [10], in which there are penalties associated with
breaching a stochastic deadline in addition to typical observation and misidentification costs. In
Section 3, we characterize the formal relationship between reward-rate maximization and Bayes-risk
minimization, and leverage it to obtain a numerical procedure for optimizing reward rate.
Significantly, we will show that the optimal policy for reward rate maximization depends on the
initial belief state, unlike for Bayes risk minimization—this is because the former identifies with a
different setting of the latter depending on the initial state. This dependence on initial belief state
shows explicitly that the reward-rate-maximizing policy cannot satisfy any iterative, Markovian
form of Bellman’s dynamic programming equation [1]. Finally, in Section 4, we demonstrate how
the procedure can be applied to solve a numerical example involving binary hypotheses.
Marr’s “Three Levels of Analysis”
computational
algorithmic
implementational
our approach
behavior
biology
Example 1: Ordering food
Deliberation Delay
... optimizes choice ... eating alone
How do We Make Decisions?
Speed-Accuracy Trade-off
Example 2: Stop or go at traffic light
• Sensory uncertainty
✤ how far away is the intersection?
✤ is that a yellow light or a yellow street lamp?
• Action uncertainty
✤ can the car stop in time? (rental car, rain)
• Prior information
✤ duration of yellow light, P(cop), $ ticket
• Relative costs
✤ cops/tickets versus temporal delay
How do We Make Decisions?
Speed vs. Accuracy
• Slow response ⇒ greater accuracy but higher time cost
• What is the optimal tradeoff?
• Are humans/animals optimal?
• How does the brain implement the computations?
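One toy way to pose "optimal tradeoff" for a binary choice is to minimize Pr(error) + c·RT over decision thresholds. The sketch below assumes Gaussian evidence and uses Wald's standard approximations for the sequential probability ratio test (error rate and mean decision time as functions of an LLR threshold); the parameter values are illustrative, not from the slides.

```python
import math

def sprt_stats(theta, drift=0.5, sigma=1.0):
    """Wald approximations for a symmetric SPRT with LLR boundaries +/-theta
    and Gaussian observations N(+/-drift, sigma^2) per unit time."""
    err = 1.0 / (1.0 + math.exp(theta))   # Pr(error)
    k = 2 * drift**2 / sigma**2           # mean LLR drift under the true hypothesis
    mean_rt = theta * (1 - 2 * err) / k   # Wald's identity: E[T] * k = E[LLR at stop]
    return err, mean_rt

def optimal_threshold(c, thetas=None):
    """Grid search for the threshold minimizing Pr(error) + c * E[RT]."""
    thetas = thetas or [0.1 * i for i in range(1, 101)]
    def cost(th):
        err, rt = sprt_stats(th)
        return err + c * rt
    return min(thetas, key=cost)
```

A slower response (larger threshold) buys accuracy at a time cost of c per unit; as c grows, the optimal threshold drops, trading accuracy for speed.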
Fundamental Trade-off
Random Dot Coherent Motion Paradigm
Easy (30% coherence) vs. Difficult (5% coherence)
Left or Right?
Monkey Decision-Making
Random dot coherent motion paradigm
Behavioral Data
Accuracy vs. Coherence; <RT> vs. Coherence (Roitman & Shadlen, 2002)
Easier ⇒ more accurate, faster
Neural Data
Saccade generation system
(Britten, Shadlen, Newsome, & Movshon, 1993)
[MT response over time for preferred vs. anti-preferred motion, at 26% coherence (hard) and 100% coherence (easy)]
MT tuning function: response vs. direction (°) (Britten & Newsome, 1998)
MT: sustained response that depends on coherence and direction
Saccade generation system
Neural Data
LIP: ramping response
(Roitman & Shadlen, 2002)
LIP response reflects sensory information
• ramping slope depends on coherence (difficulty)
LIP response reflects perceptual decision
• 0% stimulus ⇒ different response depending on choice
• neural activation reaches identical peak before saccade
Saccade generation system
Neural Data
Marr’s “Three Levels of Analysis”
computational
algorithmic
implementational
our approach
behavior
biology
Decision-Making: Computations
Random dot coherent motion paradigm
Repeated decision: Left or right? Stop now or collect more data?
+: more accurate
-: more time
[At each time step (50 ms, 100 ms, 150 ms): choose L, choose R, or wait for more data]
Decision-Making: Computations
Luckily, Mathematicians Solved the Problem
Wald & Wolfowitz (1948): hypothesis 1 (left) vs. hypothesis 2 (right)
Optimal policy
• cost function: Pr(error) + c·RT
• accumulate evidence over time: Pr(left) versus Pr(right)
• stop if evidence/confidence exceeds “left” or “right” boundary
Decision-Making: Computations
Wald & Wolfowitz (1948): hypothesis 1 (left) vs. hypothesis 2 (right)
Optimal policy
• cost function: Pr(error) + c·RT
• accumulate evidence over time: Pr(left) versus Pr(right)
• stop if total evidence exceeds “left” or “right” boundary
[Evidence trajectories (trial 1, trial 2) accumulating between the “left” and “right” boundaries]
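The accumulate-to-bound policy can be simulated directly. This is a minimal sketch, not the slides' fitted model: Gaussian evidence samples whose mean is proportional to motion strength are summed until the total crosses a fixed bound, yielding a choice and a decision time; lower "coherence" (drift) should produce slower and less accurate decisions.

```python
import random

def accumulate_trial(rng, drift, bound=3.0, sigma=1.0):
    """Sum noisy evidence until the running total hits +bound ('right')
    or -bound ('left'); returns (correct?, number of samples taken)."""
    m = rng.choice([1, -1])          # true motion direction
    x, t = 0.0, 0
    while abs(x) < bound:
        x += rng.gauss(m * drift, sigma)
        t += 1
    return ((1 if x > 0 else -1) == m), t

def batch(drift, n=5000, seed=0):
    """Accuracy and mean decision time over n simulated trials."""
    rng = random.Random(seed)
    trials = [accumulate_trial(rng, drift) for _ in range(n)]
    acc = sum(c for c, _ in trials) / n
    mean_rt = sum(t for _, t in trials) / n
    return acc, mean_rt
```

Comparing, say, batch(0.6) (easy) against batch(0.2) (hard) should reproduce the qualitative behavioral pattern: harder ⇒ slower and more error-prone.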
Decision-Making: Algorithm
(Roitman & Shadlen, 2002)
Decision-Making: Implementation
MT: motion processing
Direction (°) (Britten & Newsome, 1998)
coherence
direction
LIP: evidence accumulation
Model ⇔ Behavior
Model: harder ⇒
• slower
• more errors
(boundary)
Accuracy vs. Coherence <RT> vs. Coherence
easy vs. hard
Model ⇔ Neurobiology
Model LIP Neural Response
Putting it All Together
behavior
biology
Evidence
Extensions
• Multiple (3+) choices (Yu & Dayan, 2005)
• Temporal uncertainty about stimulus onset (Yu, 2007)
• Decision deadline (Frazier & Yu, 2008)
• Learning about direction bias (L/R) over trials
(Yu, Dayan, & Cohen, 2009)
• Minimizing Pr(error) + c·RT versus
maximizing reward rate = Pr(correct)/RT
(Dayanik & Yu, in press)