
Deciding Under Probabilistic Uncertainty. Russell and Norvig: Sect. 17.1-3, Chap. 17. CS121 – Winter 2003



Page 1:

Deciding Under Probabilistic Uncertainty

Russell and Norvig: Sect. 17.1-3, Chap. 17
CS121 – Winter 2003

Page 2:

Non-deterministic vs. Probabilistic Uncertainty

Non-deterministic model: an action's possible outcomes are {a, b, c}, with no probabilities attached to them. The best decision is the one that is best for the worst case (~ adversarial search).

Probabilistic model: the possible outcomes are a, b, c with probabilities pa, pb, pc. The best decision is the one that maximizes the expected utility value.

Page 3:

One State/One Action Example

[Figure: action A1 taken in state S0 leads to states S1, S2, S3 with probabilities 0.2, 0.7, 0.1; the utilities of those states are 100, 50, 70.]

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
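In code, this expected utility is a one-line probability-weighted sum. A minimal Python sketch using the numbers above (the state names are just labels):

```python
# Expected utility of taking action A1 in S0:
# each successor state contributes P(successor) * U(successor).
outcomes = {"S1": (0.2, 100), "S2": (0.7, 50), "S3": (0.1, 70)}

u_s0 = sum(p * u for p, u in outcomes.values())
print(u_s0)  # 62.0
```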

Page 4:

One State/Two Actions Example

[Figure: as before, A1 leads from S0 to S1, S2, S3 (utilities 100, 50, 70) with probabilities 0.2, 0.7, 0.1; a second action A2 leads to S2 with probability 0.2 and to a new state S4, of utility 80, with probability 0.8.]

• U1(S0) = 62
• U2(S0) = 74
• U(S0) = max{U1(S0), U2(S0)} = 74

Page 5:

Introducing Action Costs

[Figure: the same two actions, now with costs: -5 for A1 and -25 for A2.]

• U1(S0) = 62 – 5 = 57
• U2(S0) = 74 – 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57
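The sketch extends naturally to several actions with costs; the agent picks the action whose cost-adjusted expected utility is largest (numbers taken from the example above):

```python
# For each action: (cost, [(probability, successor utility), ...]).
actions = {
    "A1": (5,  [(0.2, 100), (0.7, 50), (0.1, 70)]),
    "A2": (25, [(0.2, 50), (0.8, 80)]),
}

def action_utility(cost, outcomes):
    return sum(p * u for p, u in outcomes) - cost

best = max(actions, key=lambda a: action_utility(*actions[a]))
print(best, action_utility(*actions[best]))  # A1 57.0
```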

Page 6:

Example: Finding Juliet

[Figure: a map with three locations: Charles’ office, Juliet’s office, and the conference room, with travel times of 10 min, 10 min, and 5 min between them.]

Page 7:

Example: Finding Juliet

States are:
 S0: Romeo in Charles’ office
 S1: Romeo in Juliet’s office and Juliet here
 S2: Romeo in Juliet’s office and Juliet not here
 S3: Romeo in conference room and Juliet here
 S4: Romeo in conference room and Juliet not here

Actions are:
 GJO (go to Juliet’s office)
 GCR (go to conference room)

Page 8:

Utility Computation

[Figure: the decision tree rooted at state 0 (Charles’ office). Actions GJO and GCR branch to states 1-4; the edge costs shown are -10, -10, -10, -15, and -10 minutes, and the reward of each node is 0. The office map with its 10/5/10 min travel times is repeated for reference.]

Page 9:

n-Step Decision Process

• There is a single initial state
• States reached after i steps are all different from those reached after j ≠ i steps
• Each state i has a reward R(i)
• Each state reached after n steps is terminal
• The goal is to maximize the sum of rewards

[Figure: a tree of states layered by step 0, 1, 2, 3.]

Page 10:

n-Step Decision Process

Utility of state i:
 U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

• Two possible actions from i: a1 and a2
• a1 leads to k11 with probability P11 or k12 with probability P12, where P11 = P(k11 | a1.i) and P12 = P(k12 | a1.i)
• a2 leads to k21 with probability P21 or k22 with probability P22
• U(i) = R(i) + max {P11 U(k11) + P12 U(k12), P21 U(k21) + P22 U(k22)}

Page 11:

n-Step Decision Process

For j = n-1, n-2, …, 0 do:
 For every state si attained after step j:
  Compute the utility of si
  Label that state with the corresponding best action

Utility of state i:
 U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

Best choice of action at state i (the optimal policy):
 π*(i) = arg max_a Σ_k P(k | a.i) U(k)
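The backward pass is easy to express recursively on a tree of states. A sketch in Python, with a small hypothetical two-step tree standing in for the figure (the rewards, probabilities, and state names are made up for illustration):

```python
# Each state has a reward R; non-terminal states map each action to
# a list of (probability, successor) pairs.
R = {"i": 0, "k11": 100, "k12": 50, "k21": 50, "k22": 80}
succ = {"i": {"a1": [(0.2, "k11"), (0.8, "k12")],
              "a2": [(0.5, "k21"), (0.5, "k22")]}}

def U(s):
    """U(s) = R(s) + max_a sum_k P(k | a.s) U(k); a terminal state's utility is its reward."""
    if s not in succ:
        return R[s]
    return R[s] + max(sum(p * U(k) for p, k in outs) for outs in succ[s].values())

def pi_star(s):
    """Best action at s: arg max_a sum_k P(k | a.s) U(k)."""
    return max(succ[s], key=lambda a: sum(p * U(k) for p, k in succ[s][a]))

print(U("i"), pi_star("i"))  # 65.0 a2
```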

Page 12:

n-Step Decision Process …

Utility of state i:
 U(i) = max_a (Σ_k P(k | a.i) U(k) – Ca)

Best choice of action at state i:
 π*(i) = arg max_a (Σ_k P(k | a.i) U(k) – Ca)

… with costs on actions instead of rewards on states

Page 13:

[Figure: the Finding Juliet decision tree, nodes 0-4 with actions GJO and GCR, laid out over steps 0, 1, 2, 3.]

Page 14:

[Figure: the Finding Juliet tree again, at a later stage of the computation.]

Page 15:

Target Tracking

[Figure: a grid environment containing a robot and a target.]

• The robot must keep the target in view
• The target’s trajectory is not known in advance
• The environment may or may not be known


Page 17:

States Are Indexed by Time

• State = (robot-position, target-position, time)
• Action = (stop, up, down, right, left)
• Outcome of an action = 5 possible states, each with probability 0.2

Applying right in state ([i,j], [u,v], t) leads to one of:
• ([i+1,j], [u,v], t+1)
• ([i+1,j], [u-1,v], t+1)
• ([i+1,j], [u+1,v], t+1)
• ([i+1,j], [u,v-1], t+1)
• ([i+1,j], [u,v+1], t+1)

Each state has 25 successors
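A sketch of this transition model in Python (successors is a hypothetical helper; the state encoding follows the slide):

```python
# The robot's own move is deterministic; the target then moves by one of
# five offsets (stop, up, down, right, left), each with probability 0.2.
MOVES = {"stop": (0, 0), "up": (0, 1), "down": (0, -1),
         "right": (1, 0), "left": (-1, 0)}

def successors(state, action):
    (i, j), (u, v), t = state  # robot position, target position, time
    di, dj = MOVES[action]
    robot = (i + di, j + dj)
    return [(0.2, (robot, (u + tu, v + tv), t + 1))
            for tu, tv in MOVES.values()]

s = ((3, 2), (5, 5), 0)
# 5 robot actions x 5 target moves = 25 successors in total:
print(sum(len(successors(s, a)) for a in MOVES))  # 25
```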

Page 18:

h-Step Planning Process

Planning horizon h

“Terminal” states:
• States where the target is not visible
• States at depth h

Reward function:
• Target visible: +1
• Target not visible: 0
Maximizing the sum of rewards ~ maximizing escape time

With discounting: R(state) = γ^t, where 0 < γ < 1

Page 19:

h-Step Planning Process

Planning horizon h

The planner computes the optimal policy over this tree of states, but only the first step of the policy is executed.

Then everything is repeated again … (sliding horizon)

Page 20:

h-Step Planning Process

Planning horizon h

Page 21:

h-Step Planning Process

Planning horizon h

h is chosen so that the optimal policy over the tree can be computed within one increment of time

Page 22:

Example With No Planner

Page 23:

Example With Planner

Page 24:

Other Example

Page 25:

h-Step Planning Process

Planning horizon h

The optimal policy over this tree is not the optimal policy that would have been computed if a prior model of the environment had been available, along with an arbitrarily fast computer

Page 26:

[Figure: the Finding Juliet decision tree again, nodes 0-4 with actions GJO and GCR over steps 0, 1, 2, 3.]

Page 27:

Simple Robot Navigation Problem

• In each state, the possible actions are U, D, R, and L

Page 28:

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
 • With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)

Page 29:

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
 • With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)
 • With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move)

Page 30:

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
 • With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)
 • With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move)
 • With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, then it does not move)

Page 31:

Markov Property

The transition properties depend only on the current state, not on previous history (how that state was reached)

Page 32:

Sequence of Actions

• Planned sequence of actions: (U, R)
• U is executed

[Figure: the 4x3 grid (columns 1-4, rows 1-3) in which the sequence is executed.]


Page 34:

Sequence of Actions

• Planned sequence of actions: (U, R)
• U is executed
• R is executed


Page 36:

Sequence of Actions

• Planned sequence of actions: (U, R)

[Figure: the grid with the initial state [3,2] marked.]

Page 37:

Sequence of Actions

• Planned sequence of actions: (U, R)
• U is executed

[Figure: from [3,2], U leads to [3,3], [4,2], or [3,2].]

Page 38:

Histories

• Planned sequence of actions: (U, R)
• U has been executed
• R is executed
• There are 9 possible sequences of states – called histories – and 6 possible final states for the robot!

[Figure: the two-step tree: from [3,2], U leads to [3,2], [3,3], or [4,2]; R then leads to the final states [3,1], [3,2], [3,3], [4,1], [4,2], [4,3].]

Page 39:

Probability of Reaching the Goal

P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) x P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2])

Note the importance of the Markov property in this derivation.

• P([3,3] | U.[3,2]) = 0.8
• P([4,2] | U.[3,2]) = 0.1
• P([4,3] | R.[3,3]) = 0.8
• P([4,3] | R.[4,2]) = 0.1
• P([4,3] | (U,R).[3,2]) = 0.65
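A quick numerical check of this derivation (a sketch; only the transition probabilities quoted above are encoded):

```python
# P(state after U from [3,2]) and, for each of those states,
# P([4,3] after R) -- only the entries used in the derivation.
p_after_U = {"[3,3]": 0.8, "[4,2]": 0.1, "[3,2]": 0.1}
p_goal_after_R = {"[3,3]": 0.8, "[4,2]": 0.1}  # from [3,2], R cannot reach [4,3]

p = sum(p_after_U[s] * p_goal_after_R.get(s, 0.0) for s in p_after_U)
print(round(p, 2))  # 0.65
```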

Page 40:

Utility Function

• The robot needs to recharge its batteries
• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape
• [4,3] and [4,2] are terminal states
• Reward of a terminal state: +1 or -1
• Reward of a non-terminal state: -1/25
• Utility of a history: sum of rewards of traversed states
• Goal: maximize the utility of the history

[Figure: the 4x3 grid with terminal rewards +1 at [4,3] and -1 at [4,2].]

Page 41:


Histories are potentially unbounded and the same state can be reached many times

Page 42:

Utility of an Action Sequence

• Consider the action sequence (U, R) from [3,2]

Page 43:

Utility of an Action Sequence

• Consider the action sequence (U, R) from [3,2]
• A run produces one among 7 possible histories, each with some probability

[Figure: the tree of histories from [3,2] under (U, R), ending in [3,1], [3,2], [3,3], [4,1], [4,2], [4,3].]

Page 44:

Utility of an Action Sequence

• Consider the action sequence (U, R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories:

 U = Σ_h U_h P(h)

Page 45:

Optimal Action Sequence

• Consider the action sequence (U, R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories
• The optimal sequence is the one with maximal utility

Page 46:

Optimal Action Sequence

• Consider the action sequence (U, R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories
• The optimal sequence is the one with maximal utility
• But is the optimal action sequence what we want to compute?

NO!! Except if sequence is executed blindly (open-loop strategy)

Page 47:

Reactive Agent Algorithm

Repeat:
 s ← sensed (observable) state
 If s is terminal then exit
 a ← choose action (given s)
 Perform a

Page 48:

Utility of state i:
 U(i) = max_a (Σ_k P(k | a.i) U(k) – Ca)

Best choice of action at state i:
 π*(i) = arg max_a (Σ_k P(k | a.i) U(k) – Ca)

Page 49:

Policy (Reactive/Closed-Loop Strategy)

• A policy is a complete mapping from states to actions

[Figure: the 4x3 grid with an action marked in each non-terminal square; terminals +1 at [4,3] and -1 at [4,2].]

Page 50:

Reactive Agent Algorithm

Repeat:
 s ← sensed state
 If s is terminal then exit
 a ← π(s)
 Perform a
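This loop is a few lines of Python (sense, is_terminal, and perform are hypothetical stand-ins for the robot's sensing and actuation interface):

```python
def run_policy(policy, sense, is_terminal, perform):
    """Reactive agent: sense the state, look up the policy's action, act, repeat."""
    while True:
        s = sense()
        if is_terminal(s):
            return s
        perform(policy[s])
```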

Page 51:

Optimal Policy

• A policy is a complete mapping from states to actions
• The optimal policy π* is the one that always yields a history (ending at a terminal state) with maximal expected utility

Makes sense because of the Markov property

Note that [3,2] is a “dangerous” state that the optimal policy tries to avoid

Page 52:

Optimal Policy

• A policy is a complete mapping from states to actions
• The optimal policy π* is the one that always yields a history with maximal expected utility

This problem is called a Markov Decision Problem (MDP)

How to compute π*?

Page 53:

Utility of state i:
 U(i) = max_a (Σ_k P(k | a.i) U(k) – Ca)

Best choice of action at state i:
 π*(i) = arg max_a (Σ_k P(k | a.i) U(k) – Ca)

Page 54:

The trick used in target tracking (indexing states by time) can be applied …

… but would yield a large tree + sub-optimal policy

Page 55:

First-Step Analysis

Simulate one step:
 U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
 π*(i) = arg max_a Σ_k P(k | a.i) U(k)

(Principle of Max Expected Utility)

Page 56:

What is the Difference?

π*(i) = arg max_a Σ_k P(k | a.i) U(k)

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

Page 57:

Value Iteration

Initialize the utility of each non-terminal state si to U0(i) = 0
For t = 0, 1, 2, …, do:
 Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)
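Below is a runnable sketch of value iteration on this navigation problem. It assumes the transition model from the earlier slides (0.8 in the intended direction, 0.1 for each perpendicular slip), R(i) = -1/25 for non-terminal states, and a blocked square at [2,2]; the blocked square is not stated on these slides but is standard in Russell and Norvig's 4x3 world and matches the utilities on the next slide:

```python
# 4x3 grid, columns x = 1..4, rows y = 1..3; [2,2] assumed blocked.
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}
MOVES = {"U": (0, 1), "D": (0, -1), "R": (1, 0), "L": (-1, 0)}
PERP = {"U": "LR", "D": "LR", "R": "UD", "L": "UD"}
REWARD = -1.0 / 25.0  # reward of a non-terminal state

def step(s, m):
    n = (s[0] + MOVES[m][0], s[1] + MOVES[m][1])
    return n if n in STATES else s  # bumping into the border or [2,2]: stay put

def outcomes(s, a):
    l, r = PERP[a]
    return [(0.8, step(s, a)), (0.1, step(s, l)), (0.1, step(s, r))]

U = {s: TERMINAL.get(s, 0.0) for s in STATES}
for t in range(50):
    U = {s: U[s] if s in TERMINAL else
            REWARD + max(sum(p * U[k] for p, k in outcomes(s, a)) for a in MOVES)
         for s in STATES}
print(round(U[(3, 1)], 3))  # ~0.611, matching the convergence plot on the next slide
```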

Page 58:

Value Iteration

Initialize the utility of each non-terminal state si to U0(i) = 0
For t = 0, 1, 2, …, do:
 Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)

[Figure: Ut([3,1]) plotted against t = 0 … 30, converging to 0.611; the grid of converged utilities: [1,1] = 0.705, [1,2] = 0.762, [1,3] = 0.812, [2,1] = 0.655, [2,3] = 0.868, [3,1] = 0.611, [3,2] = 0.660, [3,3] = 0.918, [4,1] = 0.388, with terminals [4,2] = -1 and [4,3] = +1.]

Note the importance of terminal states and connectivity of the state-transition graph

Not very different from indexing states by time

Page 59:

Policy Iteration

Pick a policy π at random

Page 60:

Policy Iteration

Pick a policy π at random
Repeat:
 Compute the utility of each state for π:
  Uπ(i) = R(i) + Σ_k P(k | π(i).i) Uπ(k)
  (a set of linear equations, often a sparse system)

Page 61:

Policy Iteration

Pick a policy π at random
Repeat:
 Compute the utility of each state for π:
  Uπ(i) = R(i) + Σ_k P(k | π(i).i) Uπ(k)
 Compute the policy π’ given these utilities:
  π’(i) = arg max_a Σ_k P(k | a.i) U(k)
 If π’ = π then return π
 else π ← π’
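A matching sketch of policy iteration on the same grid world (same assumptions as the value-iteration sketch above; for brevity, the linear system is solved here by repeated sweeps of the evaluation equation rather than by a sparse linear solver):

```python
import random

STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}
MOVES = {"U": (0, 1), "D": (0, -1), "R": (1, 0), "L": (-1, 0)}
PERP = {"U": "LR", "D": "LR", "R": "UD", "L": "UD"}
REWARD = -1.0 / 25.0

def outcomes(s, a):
    def step(m):
        n = (s[0] + MOVES[m][0], s[1] + MOVES[m][1])
        return n if n in STATES else s
    l, r = PERP[a]
    return [(0.8, step(a)), (0.1, step(l)), (0.1, step(r))]

def evaluate(pi, sweeps=100):
    """U_pi(i) = R(i) + sum_k P(k | pi(i).i) U_pi(k), solved by iteration."""
    U = {s: TERMINAL.get(s, 0.0) for s in STATES}
    for _ in range(sweeps):
        U = {s: U[s] if s in TERMINAL else
                REWARD + sum(p * U[k] for p, k in outcomes(s, pi[s]))
             for s in STATES}
    return U

pi = {s: random.choice("UDRL") for s in STATES if s not in TERMINAL}
while True:
    U = evaluate(pi)
    improved = {s: max(MOVES, key=lambda a, s=s: sum(p * U[k] for p, k in outcomes(s, a)))
                for s in pi}
    if improved == pi:
        break
    pi = improved
print(pi[(3, 1)])  # best action at [3,1] under the optimal policy
```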

Page 62:

Application of First-Step Analysis: Computing the Probability of Folding pfold of a Protein

[Figure: HIV integrase shown between the unfolded state and the folded state; from an intermediate conformation it reaches the folded state with probability pfold and the unfolded state with probability 1 - pfold.]

Page 63:

Computation Through Simulation

10K to 30K independent simulations

Page 64:

Capture the stochastic nature of molecular motion by a probabilistic roadmap

[Figure: roadmap nodes vi and vj connected by an edge with transition probability Pij.]

Page 65:

Edge Probabilities

Follow the Metropolis criterion:

 Pij = (1/Ni) exp(-ΔEij / kB T) if ΔEij > 0
 Pij = 1/Ni otherwise

(ΔEij is the energy difference between vj and vi, and Ni is the number of neighbors of vi)

Self-transition probability:

 Pii = 1 – Σ_{j≠i} Pij

[Figure: node vi with self-transition probability Pii and edge probability Pij to vj.]

Page 66:

First-Step Analysis

[Figure: node i with self-transition probability Pii and transition probabilities Pij, Pik, Pil, Pim to its neighbors j, k, l, m. F: folded set; U: unfolded set.]

Let fi = pfold(i). After one step:
 fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
with fi = 1 for nodes in F and fi = 0 for nodes in U

• One linear equation per node; the solution gives pfold for all nodes
• No explicit simulation run
• All pathways are taken into account
• Sparse linear system
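A toy version of this linear system in Python (the three-node roadmap and its probabilities are invented for illustration; a real roadmap would use a sparse solver such as scipy.sparse.linalg.spsolve):

```python
import numpy as np

# Nodes 0, 1, 2 sit between the unfolded set U (f = 0) and folded set F (f = 1).
# P[i][j]: transition probability among the interior nodes; the leftover
# probability mass in each row goes to the absorbing sets U or F.
P = np.array([[0.5, 0.3, 0.0],    # node 0 also steps into U with prob 0.2
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.5]])   # node 2 also steps into F with prob 0.2
to_F = np.array([0.0, 0.0, 0.2])  # probability of stepping directly into F

# First-step analysis: f = P f + to_F, i.e. (I - P) f = to_F.
f = np.linalg.solve(np.eye(3) - P, to_F)
print(f)  # pfold for nodes 0, 1, 2
```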

Page 67:

Partially Observable Markov Decision Problem

• Uncertainty in sensing: a sensing operation returns multiple states, with a probability distribution

Page 68:

Example: Target Tracking

There is uncertainty in the robot's and target's positions, and this uncertainty grows with further motion.

There is a risk that the target escapes behind the corner, requiring the robot to move appropriately.

But there is a positioning landmark nearby. Should the robot try to reduce its position uncertainty?

Page 69:

Summary

• Probabilistic uncertainty
• Utility function
• Optimal policy
• Maximal expected utility
• Value iteration
• Policy iteration