
The Computing Brain:

Decision-Making as a Case Study

Prof. Angela Yu

Department of Cognitive Science

ajyu@ucsd.edu

Understanding the Brain/Mind

Behavior ←?→ Neurobiology

Cognitive Neuroscience

A powerful analogy: the computing brain

Brain ≈ Computer? ✗

• in some respects, Brain << Supercomputer
• in others, Brain >> Supercomputer

Marr’s “Three Levels of Analysis”

David Marr (1969): Brain = Information Processor

• computational: goals of computation; why things work the way they do
• algorithmic: representation of input/output; how one is transformed into the other
• implementational: how the system is physically realized; neural representation and dynamics

NEGOTIATING SPEED-ACCURACY TRADEOFF IN SEQUENTIAL IDENTIFICATION UNDER A STOCHASTIC DEADLINE

the deadline Θ, or by the successful registry of the subject’s decision, whichever occurs earlier; “∧” denotes the minimum of the two arguments on either side of it. Then by the strong law of large numbers the long-run average reward per unit time equals ER/ET with probability one. Therefore, the maximum reward rate problem is equivalent to solving the stochastic optimization problem

V := sup_{(τ,µ)} E[ 1{τ+T0 < Θ} Σ_{j=1}^m r_j 1{µ=j, M=j} ] / E[ (τ+T0) ∧ Θ ],

for which we will show that an optimal solution always exists and describe how to calculate the

supremum and an admissible decision rule (τ, µ) which attains the supremum.
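For intuition, the reward rate ER/ET of any one fixed (not necessarily optimal) policy can be estimated by Monte Carlo simulation. A toy sketch, assuming Gaussian evidence, a symmetric log-likelihood-ratio threshold rule, an exponential deadline Θ, unit rewards, and hypothetical parameter values throughout:

```python
import random, math

def simulate_reward_rate(n_trials=20000, drift=0.5, sigma=1.0, dt=0.1,
                         threshold=2.0, t0=0.3, deadline_rate=0.2, seed=0):
    """Monte Carlo estimate of the reward rate E[R]/E[T] for a fixed
    log-likelihood-ratio threshold policy under an exponential deadline.
    All parameter values are illustrative, not from the paper."""
    rng = random.Random(seed)
    total_reward, total_time = 0.0, 0.0
    for _ in range(n_trials):
        m = rng.choice([+1, -1])                # true hypothesis M
        theta = rng.expovariate(deadline_rate)  # stochastic deadline Θ
        llr, t = 0.0, 0.0
        while abs(llr) < threshold:
            x = m * drift * dt + rng.gauss(0.0, sigma * math.sqrt(dt))
            llr += 2 * drift * x / sigma**2     # LLR increment for ±drift Gaussians
            t += dt
        decision = +1 if llr > 0 else -1
        finish = t + t0                         # τ + T0
        if finish < theta and decision == m:
            total_reward += 1.0                 # r_j = 1 for a correct, timely answer
        total_time += min(finish, theta)        # (τ + T0) ∧ Θ
    return total_reward / total_time
```

Sweeping `threshold` in such a simulation traces out the reward rate as a function of the policy, whose supremum is the V above.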

An important theoretical question is whether and how Bayes-risk minimization and reward-rate

maximization are related to each other. In this work, we demonstrate that reward rate maximization

for this class of problems is formally equivalent to solving the family (W(c))_{c>0} of Bayes-risk

minimization problems,

W(c) := inf_{(τ,µ)} E[ c((τ+T0) ∧ Θ) + 1{τ+T0 < Θ} Σ_{i≠j} r_j 1{µ=i, M=j} + 1{τ+T0 ≥ Θ} Σ_{j=1}^m r_j 1{M=j} ],

indexed by the unit sampling (observation or time) cost c > 0, thus rendering the reward-rate

maximization problem amenable to a large array of existing analytical and computational tools in

stochastic control theory. In particular, we show that the maximum reward rate V is the unique

unit sampling cost c > 0 which makes the minimum Bayes risk W (c) equal to the maximal expected

reward Σ_{j=1}^m r_j P(M = j) under the prior distribution. Moreover,

c ⋛ V if and only if inf_{(τ,µ)} E[ c((τ+T0) ∧ Θ) − 1{τ+T0 < Θ} Σ_{j=1}^m r_j 1{µ=j, M=j} ] ⋛ 0;

namely, the maximum reward rate V is the unique unit sampling cost c for which the expected total observation cost E[c((τ* + T0) ∧ Θ)] and the expected terminal reward E[1{τ* + T0 < Θ} Σ_{j=1}^m r_j 1{µ* = j, M = j}] break even under any optimal decision rule (τ*, µ*).
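This break-even characterization suggests computing V by root-finding on c. A minimal sketch, where `bayes_gap(c)` is a hypothetical stand-in for a solver of the c-indexed Bayes problem; the toy gap function below, with E[T] = 2 and E[R] = 1, is purely illustrative:

```python
def reward_rate_by_bisection(bayes_gap, lo=1e-6, hi=10.0, tol=1e-8):
    """Find the unique c with bayes_gap(c) = 0 by bisection.
    bayes_gap(c) plays the role of
      inf_{(τ,µ)} E[ c((τ+T0) ∧ Θ) − 1{τ+T0 < Θ} Σ_j r_j 1{µ=j, M=j} ],
    which is nondecreasing in c; its root is the maximum reward rate V."""
    assert bayes_gap(lo) < 0 < bayes_gap(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bayes_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy stand-in: gap(c) = c·E[T] − E[R] with E[T] = 2, E[R] = 1,
# whose root is the reward rate V = E[R]/E[T] = 0.5.
toy_gap = lambda c: 2.0 * c - 1.0
```

In the real procedure each evaluation of `bayes_gap(c)` requires solving one Bayes-risk minimization problem W(c), for which standard stochastic-control tools apply.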

In Section 2, we characterize the Bayes-risk minimization solution to the multi-hypothesis sequential identification problems W(c), c > 0, under a stochastic deadline. This treatment extends

our previous work on Bayes risk minimization in sequential testing of multiple hypotheses [4] and

of binary hypotheses under a stochastic deadline [10], in which there are penalties associated with

breaching a stochastic deadline in addition to typical observation and misidentification costs. In

Section 3, we characterize the formal relationship between reward-rate maximization and Bayes-risk minimization, and leverage it to obtain a numerical procedure for optimizing reward rate.

Significantly, we will show that the optimal policy for reward rate maximization depends on the

initial belief state, unlike for Bayes risk minimization—this is because the former identifies with a

different setting of the latter depending on the initial state. This dependence on initial belief state

shows explicitly that the reward-rate-maximizing policy cannot satisfy any iterative, Markovian

form of Bellman’s dynamic programming equation [1]. Finally, in Section 4, we demonstrate how

the procedure can be applied to solve a numerical example involving binary hypotheses.

Marr’s “Three Levels of Analysis”

computational / algorithmic

implementational

our approach

behavior

biology

Example 1: Ordering food

Deliberation Delay

... optimizes choice ... eating alone

How do We Make Decisions?

Speed-Accuracy Trade-off

Example 2: Stop or go at a traffic light

• Sensory uncertainty
  ✤ how far away is the intersection?
  ✤ is that a yellow light or a yellow street lamp?
• Action uncertainty
  ✤ can the car stop in time? (rental car, rain)
• Prior information
  ✤ duration of yellow light, P(cop), $ ticket
• Relative costs
  ✤ cops/tickets versus temporal delay

How do We Make Decisions?

Speed vs. Accuracy

• Slow response ⇒ greater accuracy but higher time cost

• What is the optimal tradeoff?

• Are humans/animals optimal?

• How does the brain implement the computations?
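One way to make the trade-off concrete is to simulate a bounded evidence accumulator at two boundary heights. A toy sketch, not a fitted model; the drift, time step, and threshold values are all hypothetical:

```python
import random, math

def speed_accuracy(threshold, n_trials=5000, drift=0.2, dt=0.01, seed=1):
    """Simulate a drift-diffusion decision: accumulate noisy evidence until
    |x| crosses ±threshold; return (accuracy, mean RT). Raising the threshold
    buys accuracy at the price of longer response times."""
    rng = random.Random(seed)
    correct, rt_sum = 0, 0.0
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + rng.gauss(0.0, math.sqrt(dt))
            t += dt
        correct += x > 0          # true direction is +, so x > 0 is correct
        rt_sum += t
    return correct / n_trials, rt_sum / n_trials
```

Comparing a low threshold (fast, error-prone) with a high one (slow, accurate) reproduces the qualitative trade-off in the bullets above.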

Fundamental Trade-off

Random Dot Coherent Motion Paradigm

Easy: 30% coherence  vs.  Difficult: 5% coherence

Left or Right?

Monkey Decision-Making

Random dot coherent motion paradigm

Behavioral Data

Accuracy vs. Coherence; <RT> vs. Coherence (Roitman & Shadlen, 2002)

Easier ⇒ more accurate, faster

Neural Data

Saccade generation system

(Britten, Shadlen, Newsome, & Movshon, 1993)

[Figure: MT responses over time to preferred vs. anti-preferred motion, at 26% (hard) and 100% (easy) coherence]

[Figure: MT tuning function, response vs. direction (°) (Britten & Newsome, 1998)]

MT: sustained response, tuned to direction and scaled by coherence

Saccade generation system

Neural Data

LIP: ramping response (Roitman & Shadlen, 2002)

LIP response reflects sensory information

• ramping slope depends on coherence (difficulty)

LIP response reflects perceptual decision

• 0% stimulus ⇒ different response depending on choice

• neural activation reaches identical peak before saccade
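The coherence-dependent ramping described above can be caricatured with a simple accumulator whose drift scales with coherence. A cartoon sketch of the qualitative effect, with hypothetical gain and noise parameters, not a model of LIP firing rates:

```python
import random, math

def mean_ramp(coherence, n_steps=100, n_trials=500, dt=0.01, gain=5.0, seed=2):
    """Mean trajectory of an evidence accumulator whose drift scales with
    motion coherence: the ramping slope grows with coherence, as in the
    LIP data, so easy (high-coherence) trials reach threshold sooner."""
    rng = random.Random(seed)
    traj = [0.0] * n_steps
    for _ in range(n_trials):
        x = 0.0
        for i in range(n_steps):
            x += gain * coherence * dt + rng.gauss(0.0, math.sqrt(dt))
            traj[i] += x / n_trials   # accumulate the trial-averaged trajectory
    return traj
```

Plotting `mean_ramp(0.30)` against `mean_ramp(0.05)` shows a steeper ramp for the easier stimulus, mirroring the coherence dependence of the LIP slope.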

Saccade generation system

Neural Data

Marr’s “Three Levels of Analysis”

computational / algorithmic

implementational

our approach

behavior

biology

Decision-Making: Computations

Random dot coherent motion paradigm

Repeated decision: Left or right? Stop now or collect more data?

+: more accurate

-: more time

[Diagram: evidence at 50 ms, 100 ms, 150 ms; at each point, choose L or R, or wait for more data]

Decision-Making: Computations

Luckily, Mathematicians Solved the Problem

Wald & Wolfowitz (1948): hypothesis 1 (left) vs. hypothesis 2 (right)

Optimal policy

• cost function: Pr(error) + c·RT
• accumulate evidence over time: Pr(left) versus Pr(right)
• stop if evidence/confidence exceeds “left” or “right” boundary
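The Wald & Wolfowitz policy can be sketched in a few lines. A minimal illustration of the sequential probability ratio test, not their exact procedure; the boundary value and the Bernoulli likelihoods in the test case are hypothetical:

```python
import math

def sprt(observations, p_left, p_right, boundary=0.9):
    """Sequential probability ratio test for 'left' vs. 'right':
    accumulate the log-likelihood ratio one sample at a time and stop
    as soon as the posterior of either hypothesis exceeds `boundary`."""
    log_lr = 0.0                      # log Pr(data|left) / Pr(data|right)
    for n, x in enumerate(observations, start=1):
        log_lr += math.log(p_left(x) / p_right(x))
        post_left = 1.0 / (1.0 + math.exp(-log_lr))   # posterior, flat prior
        if post_left >= boundary:
            return "left", n
        if post_left <= 1.0 - boundary:
            return "right", n
    return "undecided", len(observations)
```

For example, with Bernoulli likelihoods favoring “left” (p_left puts 0.7 on x = 1, p_right puts 0.3), a run of 1s drives the posterior across the 0.9 boundary after a few samples and the test stops early, which is exactly the "stop when confident" rule in the bullets above.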


[Diagram: two example evidence trajectories (trial 1, trial 2) terminating at the “left” or “right” boundary]

Decision-Making: Algorithm

(Roitman & Shadlen, 2002)

Decision-Making: Implementation

MT: motion processing; tuning over direction (°), scaled by coherence (Britten & Newsome, 1998)

LIP: evidence accumulation

Model ⇔ Behavior

Model: harder ⇒
• slower
• more errors

Accuracy vs. Coherence; <RT> vs. Coherence (easy vs. hard)

Model ⇔ Neurobiology

Model LIP Neural Response

Putting it All Together

behavior

biology

Evidence

Extensions

• Multiple (3+) choices (Yu & Dayan, 2005)

• Temporal uncertainty about stimulus onset (Yu, 2007)

• Decision deadline (Frazier & Yu, 2008)

• Learning about direction bias (L/R) over trials (Yu, Dayan, & Cohen, 2009)

• Minimizing Pr(error) + c·RT versus maximizing reward rate = Pr(correct)/RT (Dayanik & Yu, in press)
