
Decision Making Under Uncertainty


Page 1: Decision Making Under Uncertainty

Decision Making Under Uncertainty

Russell and Norvig: ch 16, 17

CMSC 671 – Fall 2005

material from Lise Getoor, Jean-Claude Latombe, and Daphne Koller

Page 2: Decision Making Under Uncertainty

Decision Making Under Uncertainty

Many environments have multiple possible outcomes
Some of these outcomes may be good; others may be bad
Some may be very likely; others unlikely

What’s a poor agent to do??

Page 3: Decision Making Under Uncertainty

Non-Deterministic vs. Probabilistic Uncertainty

Non-deterministic model: the possible outcomes of a decision form a set {a, b, c}; choose the decision that is best for the worst case (~ adversarial search).

Probabilistic model: the outcomes carry probabilities, {a(p_a), b(p_b), c(p_c)}; choose the decision that maximizes expected utility value.

Page 4: Decision Making Under Uncertainty

Expected Utility

Random variable X with n values x1, …, xn and distribution (p1, …, pn)
E.g.: X is the state reached after doing an action A under uncertainty
Function U of X
E.g., U is the utility of a state
The expected utility of A is EU[A] = Σ_{i=1,…,n} P(x_i | A) U(x_i)
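A minimal Python sketch of this definition; the probabilities and utilities used below are those of the one-state/one-action example on the next slide:

```python
# Sketch of EU[A] = sum_i P(x_i | A) * U(x_i).

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for the states action A can reach."""
    return sum(p * u for p, u in outcomes)

print(round(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]), 1))   # 62.0
```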

Page 5: Decision Making Under Uncertainty

Action A1 from state s0 leads to states s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70 respectively.

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

One State/One Action Example

Page 6: Decision Making Under Uncertainty

As before, action A1 from s0 leads to s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70. A second action A2 is also available: with probability 0.8 it leads to a new state s4 with utility 80, and with probability 0.2 to one of the earlier states.

• U1(S0) = 62
• U2(S0) = 74
• U(S0) = max{U1(S0), U2(S0)} = 74

One State/Two Actions Example

Page 7: Decision Making Under Uncertainty

Same setup as the previous slide, but now each action has a cost: −5 for A1 and −25 for A2.

• U1(S0) = 62 − 5 = 57
• U2(S0) = 74 − 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57

Introducing Action Costs
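A small Python sketch of the two-action comparison above. A1's branches are exactly as on the slides; for A2, the utility of its 0.2 branch (50) is inferred from the slide's result U2(S0) = 74, so treat that pairing as an assumption about the figure:

```python
# Sketch: expected utility of each action, minus its cost; pick the maximum.

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

actions = {
    "A1": {"outcomes": [(0.2, 100), (0.7, 50), (0.1, 70)], "cost": 5},
    "A2": {"outcomes": [(0.2, 50), (0.8, 80)],             "cost": 25},
}

scores = {name: expected_utility(a["outcomes"]) - a["cost"] for name, a in actions.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 1))   # A1 57.0, matching U(S0) = max{57, 49} on the slide
```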

Page 8: Decision Making Under Uncertainty

MEU Principle

A rational agent should choose the action that maximizes the agent's expected utility
This is the basis of the field of decision theory
The MEU principle provides a normative criterion for rational choice of action

Page 9: Decision Making Under Uncertainty

Not quite…

Must have a complete model of:
 Actions
 Utilities
 States

Even if you have a complete model, decision making will be computationally intractable
In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality)
Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before

Page 10: Decision Making Under Uncertainty

We’ll look at

Decision-Theoretic Planning
 Simple decision making (ch. 16)
 Sequential decision making (ch. 17)

Page 11: Decision Making Under Uncertainty

Axioms of Utility Theory

Orderability: (A > B) ∨ (A < B) ∨ (A ~ B)

Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)

Continuity: A > B > C ⇒ ∃p [p, A; 1−p, C] ~ B

Substitutability: A ~ B ⇒ [p, A; 1−p, C] ~ [p, B; 1−p, C]

Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1−p, B] ≽ [q, A; 1−q, B])

Decomposability: [p, A; 1−p, [q, B; 1−q, C]] ~ [p, A; (1−p)q, B; (1−p)(1−q), C]

Page 12: Decision Making Under Uncertainty

Money Versus Utility

Money ≠ Utility
More money is better, but not always in a linear relationship to the amount of money

Expected Monetary Value (EMV)
Risk-averse: U(L) < U(S_EMV(L))
Risk-seeking: U(L) > U(S_EMV(L))
Risk-neutral: U(L) = U(S_EMV(L))
where S_EMV(L) is the sure thing whose payoff equals the expected monetary value of lottery L

Page 13: Decision Making Under Uncertainty

Value Function

Provides a ranking of alternatives, but not a meaningful metric scale
Also known as an “ordinal utility function”

Remember the expectiminimax example:
 Sometimes, only relative judgments (value functions) are necessary
 At other times, absolute judgments (utility functions) are required

Page 14: Decision Making Under Uncertainty

Multiattribute Utility Theory

A given state may have multiple utilities
 ...because of multiple evaluation criteria
 ...because of multiple agents (interested parties) with different utility functions

We will talk about this more later in the semester, when we discuss multi-agent systems and game theory

Page 15: Decision Making Under Uncertainty

Decision Networks

Extend BNs to handle actions and utilities
Also called influence diagrams
Use BN inference methods to solve
Perform Value of Information calculations

Page 16: Decision Making Under Uncertainty

Decision Networks cont.

Chance nodes: random variables, as in BNs
Decision nodes: actions that the decision maker can take
Utility/value nodes: the utility of the outcome state

Page 17: Decision Making Under Uncertainty

R&N example

Page 18: Decision Making Under Uncertainty

Umbrella Network

Nodes: weather → forecast; the decision take/don’t take → have umbrella; happiness (utility) depends on have umbrella and weather.

P(rain) = 0.4

f      w        P(f|w)
sunny  rain     0.3
rainy  rain     0.7
sunny  no rain  0.8
rainy  no rain  0.2

U(have, rain)   = −25
U(have, ~rain)  = 0
U(~have, rain)  = −100
U(~have, ~rain) = 100

P(have | take) = 1.0
P(~have | ~take) = 1.0

Page 19: Decision Making Under Uncertainty

Evaluating Decision Networks

Set the evidence variables for the current state
For each possible value of the decision node:
 Set the decision node to that value
 Calculate the posterior probability of the parent nodes of the utility node, using BN inference
 Calculate the resulting utility for the action
Return the action with the highest utility
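A sketch of this evaluation loop, instantiated on the umbrella network of the previous slide. The "BN inference" step reduces here to Bayes' rule on the weather/forecast pair; a real decision network would call a general-purpose BN inference routine. Since P(have | take) = 1 and P(~have | ~take) = 1, the decision is treated directly as having the umbrella or not:

```python
# Sketch: evaluate each decision value, compute expected utility, return the best.

P_RAIN = 0.4
P_FORECAST_GIVEN_W = {               # P(forecast | weather)
    ("sunny", "rain"): 0.3, ("rainy", "rain"): 0.7,
    ("sunny", "no rain"): 0.8, ("rainy", "no rain"): 0.2,
}
UTILITY = {                          # U(have umbrella, weather)
    (True, "rain"): -25, (True, "no rain"): 0,
    (False, "rain"): -100, (False, "no rain"): 100,
}

def p_rain_given_forecast(forecast=None):
    """Posterior P(rain | forecast); with no forecast observed, the prior."""
    if forecast is None:
        return P_RAIN
    num = P_FORECAST_GIVEN_W[(forecast, "rain")] * P_RAIN
    den = num + P_FORECAST_GIVEN_W[(forecast, "no rain")] * (1 - P_RAIN)
    return num / den

def best_decision(forecast=None):
    """Return (take_umbrella, expected_utility) for the utility-maximizing decision."""
    p_rain = p_rain_given_forecast(forecast)
    eu = {take: p_rain * UTILITY[(take, "rain")]
                + (1 - p_rain) * UTILITY[(take, "no rain")]
          for take in (True, False)}
    best = max(eu, key=eu.get)
    return best, eu[best]

decision, eu = best_decision()
print(decision, round(eu, 2))        # False 20.0: with no forecast, leave the umbrella
```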

Page 20: Decision Making Under Uncertainty

Decision Making: Umbrella Network

Same umbrella network as before: weather → forecast; take/don’t take → have umbrella; happiness depends on have umbrella and weather, with the same probabilities and utilities.

Should I take my umbrella??

Page 21: Decision Making Under Uncertainty

Value of Information (VOI)

Suppose an agent’s current knowledge is E. The value of the current best action α is

 EU(α | E) = max_A Σ_i U(Result_i(A)) P(Result_i(A) | E, Do(A))

The value of the new best action α_E′ (after new evidence E′ is obtained) is

 EU(α_E′ | E, E′) = max_A Σ_i U(Result_i(A)) P(Result_i(A) | E, E′, Do(A))

The value of information for E′ is therefore

 VOI(E′) = ( Σ_k P(e_k | E) EU(α_{e_k} | E, e_k) ) − EU(α | E)

Page 22: Decision Making Under Uncertainty

Value of Information: Umbrella Network

Same umbrella network and parameters as on the previous slides.

What is the value of knowing the weather forecast?
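A sketch of the answer, reusing P_RAIN, P_FORECAST_GIVEN_W, and best_decision() from the earlier umbrella sketch; with the numbers above the value of the forecast comes out to 9:

```python
# Sketch of VOI(forecast): expected value of the best decision after seeing
# each possible forecast, minus the value of the best decision without it.

def p_forecast(forecast):
    """Marginal P(forecast), summing the weather out."""
    return (P_FORECAST_GIVEN_W[(forecast, "rain")] * P_RAIN
            + P_FORECAST_GIVEN_W[(forecast, "no rain")] * (1 - P_RAIN))

def value_of_information():
    _, eu_now = best_decision()                     # EU(alpha | E)
    eu_with_forecast = sum(p_forecast(f) * best_decision(f)[1]
                           for f in ("sunny", "rainy"))
    return eu_with_forecast - eu_now

print(round(value_of_information(), 2))             # 9.0 with the numbers above
```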

Page 23: Decision Making Under Uncertainty

Sequential Decision Making

Finite Horizon
Infinite Horizon

Page 24: Decision Making Under Uncertainty

Simple Robot Navigation Problem

• In each state, the possible actions are U, D, R, and L

Page 25: Decision Making Under Uncertainty

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):

• With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)

Page 26: Decision Making Under Uncertainty

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):

• With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)
• With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move)

Page 27: Decision Making Under Uncertainty

Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):

• With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)
• With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move)
• With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, then it does not move)

Page 28: Decision Making Under Uncertainty

Markov Property

The transition probabilities depend only on the current state, not on previous history (how that state was reached)

Page 29: Decision Making Under Uncertainty

Sequence of Actions

• Planned sequence of actions: (U, R)

Start state: [3,2] (in the 4×3 grid)

Page 30: Decision Making Under Uncertainty

Sequence of Actions

• Planned sequence of actions: (U, R)• U is executed

From [3,2], the possible states after U are [3,2], [3,3], and [4,2].

Page 31: Decision Making Under Uncertainty

Histories

• Planned sequence of actions: (U, R)• U has been executed• R is executed

• There are 9 possible sequences of states – called histories – and 6 possible final states for the robot!

After U: [3,2], [3,3], [4,2]. After R, the possible final states are [3,1], [3,2], [3,3], [4,1], [4,2], [4,3].

Page 32: Decision Making Under Uncertainty

Probability of Reaching the Goal

•P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) x P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2])


Note importance of Markov property in this derivation

• P([3,3] | U.[3,2]) = 0.8
• P([4,2] | U.[3,2]) = 0.1
• P([4,3] | R.[3,3]) = 0.8
• P([4,3] | R.[4,2]) = 0.1

•P([4,3] | (U,R).[3,2]) = 0.65
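The same two-step computation as a small Python check. The [3,2] branch after U contributes nothing because, under the transition model, R from [3,2] cannot reach [4,3] in one step, which is why the sum above omits it:

```python
# Sketch: P([4,3] | (U,R).[3,2]) = sum over intermediate states s of
# P(s | U.[3,2]) * P([4,3] | R.s), using the probabilities from this slide.

p_after_U = {"[3,3]": 0.8, "[4,2]": 0.1, "[3,2]": 0.1}       # P(s | U.[3,2])
p_goal_after_R = {"[3,3]": 0.8, "[4,2]": 0.1, "[3,2]": 0.0}  # P([4,3] | R.s)

p_goal = sum(p_after_U[s] * p_goal_after_R[s] for s in p_after_U)
print(round(p_goal, 2))   # 0.65
```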

Page 33: Decision Making Under Uncertainty

Utility Function

• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape

[4×3 grid figure: +1 at [4,3], −1 at [4,2]]

Page 34: Decision Making Under Uncertainty

Utility Function

• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape
• The robot needs to recharge its batteries


Page 35: Decision Making Under Uncertainty

Utility Function

• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape
• The robot needs to recharge its batteries
• [4,3] or [4,2] are terminal states


Page 36: Decision Making Under Uncertainty

Utility of a History

• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape
• The robot needs to recharge its batteries
• [4,3] or [4,2] are terminal states
• The utility of a history is defined by the utility of the last state (+1 or −1) minus n/25, where n is the number of moves
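As a tiny sketch, the history utility defined in the last bullet:

```python
# Sketch: utility of a history = utility of the final state (+1 or -1) minus n/25,
# where n is the number of moves taken.

def history_utility(history, terminal_reward):
    """history: the list of states visited, ending in a terminal state."""
    n_moves = len(history) - 1
    return terminal_reward - n_moves / 25

print(round(history_utility(["[3,2]", "[3,3]", "[4,3]"], +1), 2))   # 0.92
```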


Page 37: Decision Making Under Uncertainty

Utility of an Action Sequence


• Consider the action sequence (U,R) from [3,2]


Page 38: Decision Making Under Uncertainty

Utility of an Action Sequence


• Consider the action sequence (U,R) from [3,2]
• A run produces one among 7 possible histories, each with some probability


Page 39: Decision Making Under Uncertainty

Utility of an Action Sequence


• Consider the action sequence (U,R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories:

 U = Σ_h U_h P(h)


Page 40: Decision Making Under Uncertainty

Optimal Action Sequence


• Consider the action sequence (U,R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories
• The optimal sequence is the one with maximal utility


Page 41: Decision Making Under Uncertainty

Optimal Action Sequence


• Consider the action sequence (U,R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories
• The optimal sequence is the one with maximal utility
• But is the optimal action sequence what we want to compute?


only if the sequence is executed blindly!

Page 42: Decision Making Under Uncertainty

Accessible or observable state

Repeat:
 s ← sensed state
 If s is terminal then exit
 a ← choose action (given s)
 Perform a

Reactive Agent Algorithm

Page 43: Decision Making Under Uncertainty

Policy (Reactive/Closed-Loop Strategy)

• A policy is a complete mapping from states to actions


Page 44: Decision Making Under Uncertainty

Repeat:
 s ← sensed state
 If s is terminal then exit
 a ← π(s)
 Perform a

Reactive Agent Algorithm
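A minimal Python sketch of this loop; sense(), is_terminal(), and execute() are placeholders standing in for the robot's sensing and actuation:

```python
# Sketch of the reactive agent: sense the state, stop at a terminal state,
# otherwise execute the action the policy pi prescribes for that state.

def run_policy(pi, sense, is_terminal, execute):
    while True:
        s = sense()
        if is_terminal(s):
            return s
        execute(pi[s])    # a <- pi(s); perform a
```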

Page 45: Decision Making Under Uncertainty

Optimal Policy


• A policy is a complete mapping from states to actions
• The optimal policy π* is the one that always yields a history (ending at a terminal state) with maximal expected utility


Makes sense because of Markov property

Note that [3,2] is a “dangerous” state that the optimal policy tries to avoid

Page 46: Decision Making Under Uncertainty

Optimal Policy


• A policy is a complete mapping from states to actions
• The optimal policy π* is the one that always yields a history with maximal expected utility


This problem is called a Markov Decision Problem (MDP)

How to compute π*?

Page 47: Decision Making Under Uncertainty

Additive Utility

History H = (s0,s1,…,sn)

The utility of H is additive iff:
 U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)
where R(i) is the reward received in state s_i

Page 48: Decision Making Under Uncertainty

Additive Utility

History H = (s0,s1,…,sn)

The utility of H is additive iff:
 U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)

Robot navigation example:
 R(n) = +1 if sn = [4,3]
 R(n) = −1 if sn = [4,2]
 R(i) = −1/25 for i = 0, …, n−1

Page 49: Decision Making Under Uncertainty

Principle of Max Expected Utility

History H = (s0,s1,…,sn)

Utility of H: U(s0, s1, …, sn) = Σ_i R(i)

First-step analysis

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

π*(i) = arg max_a Σ_k P(k | a.i) U(k)
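A minimal sketch of these two formulas. P is assumed to be a dictionary mapping each (state, action) pair to a list of (next_state, probability) pairs; R and U are dictionaries of rewards and current utility estimates (these names are illustrative, not from the slides):

```python
# Sketch: one Bellman backup for a state, and the greedy action it induces.

def backup(s, R, U, P, actions):
    """U(i) = R(i) + max_a sum_k P(k | a.i) U(k)"""
    return R[s] + max(sum(p * U[k] for k, p in P[(s, a)]) for a in actions)

def greedy_action(s, U, P, actions):
    """pi*(i) = argmax_a sum_k P(k | a.i) U(k)"""
    return max(actions, key=lambda a: sum(p * U[k] for k, p in P[(s, a)]))
```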


Page 50: Decision Making Under Uncertainty

Value Iteration

Initialize the utility of each non-terminal state si to U0(i) = 0

For t = 0, 1, 2, …, do:
 Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)

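A sketch of value iteration on this 4×3 world, assuming the standard layout these slides use (obstacle at [2,2], terminals [4,3] = +1 and [4,2] = −1, step reward −1/25, and the 0.8/0.1/0.1 transition model from the earlier slides):

```python
# Sketch: value iteration on the 4x3 grid world. States are (column, row).

GRID = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != (2, 2)]
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
STEP_REWARD = -1 / 25
MOVES = {"U": (0, 1), "D": (0, -1), "R": (1, 0), "L": (-1, 0)}
PERP = {"U": ("L", "R"), "D": ("L", "R"), "L": ("U", "D"), "R": ("U", "D")}

def step(s, move):
    """Deterministic effect of a move; bumping a wall or the obstacle leaves s unchanged."""
    nxt = (s[0] + MOVES[move][0], s[1] + MOVES[move][1])
    return nxt if nxt in GRID else s

def transitions(s, a):
    """(next_state, probability) pairs: 0.8 intended move, 0.1 each perpendicular."""
    side1, side2 = PERP[a]
    return [(step(s, a), 0.8), (step(s, side1), 0.1), (step(s, side2), 0.1)]

def value_iteration(n_sweeps=50):
    U = {s: TERMINALS.get(s, 0.0) for s in GRID}      # U0 = 0 for non-terminal states
    for _ in range(n_sweeps):
        new_U = dict(U)
        for s in GRID:
            if s in TERMINALS:
                continue                              # terminal utilities stay at +1 / -1
            new_U[s] = STEP_REWARD + max(
                sum(p * U[k] for k, p in transitions(s, a)) for a in MOVES)
        U = new_U
    return U

U = value_iteration()
print(round(U[(3, 1)], 3))   # about 0.611, matching the plot and grid on the next slide
```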

Page 51: Decision Making Under Uncertainty

Value Iteration

Initialize the utility of each non-terminal state si to U0(i) = 0

For t = 0, 1, 2, …, do:
 Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)

[Plot: Ut([3,1]) rises from 0 and levels off at about 0.611 as t runs from 0 to 30]

Converged utilities on the 4×3 grid:
 row 3: 0.812  0.868  0.918  +1
 row 2: 0.762  (obstacle)  0.660  −1
 row 1: 0.705  0.655  0.611  0.388

Note the importance of terminal states and connectivity of the state-transition graph

Page 52: Decision Making Under Uncertainty

Policy Iteration

Pick a policy π at random

Page 53: Decision Making Under Uncertainty

Policy Iteration

Pick a policy π at random
Repeat:
 Compute the utility of each state for π:
  Ut+1(i) ← R(i) + Σ_k P(k | π(i).i) Ut(k)

Page 54: Decision Making Under Uncertainty

Policy Iteration

Pick a policy π at random
Repeat:
 Compute the utility of each state for π:
  Ut+1(i) ← R(i) + Σ_k P(k | π(i).i) Ut(k)
 Compute the policy π′ given these utilities:
  π′(i) = arg max_a Σ_k P(k | a.i) U(k)

Page 55: Decision Making Under Uncertainty

Policy Iteration

Pick a policy π at random
Repeat:
 Compute the utility of each state for π:
  Ut+1(i) ← R(i) + Σ_k P(k | π(i).i) Ut(k)
 Compute the policy π′ given these utilities:
  π′(i) = arg max_a Σ_k P(k | a.i) U(k)
 If π′ = π then return π

Or solve the set of linear equations:
 U(i) = R(i) + Σ_k P(k | π(i).i) U(k)
(often a sparse system)
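A sketch of policy iteration, reusing GRID, MOVES, TERMINALS, STEP_REWARD, and transitions() from the value-iteration sketch above. Policy evaluation is done here by a fixed number of sweeps of the simplified update rather than by solving the linear system exactly:

```python
# Sketch: alternate simplified policy evaluation and greedy policy improvement.
import random

def policy_iteration(eval_sweeps=30, max_rounds=100):
    pi = {s: random.choice(list(MOVES)) for s in GRID if s not in TERMINALS}
    U = {s: TERMINALS.get(s, 0.0) for s in GRID}
    for _ in range(max_rounds):
        for _ in range(eval_sweeps):                  # evaluation of the current policy
            for s in pi:
                U[s] = STEP_REWARD + sum(p * U[k] for k, p in transitions(s, pi[s]))
        new_pi = {s: max(MOVES, key=lambda a: sum(p * U[k] for k, p in transitions(s, a)))
                  for s in pi}                        # greedy improvement
        if new_pi == pi:
            break
        pi = new_pi
    return pi, U

pi_star, U = policy_iteration()
print(pi_star[(3, 1)])   # typically 'L' here: head left, away from the -1 square
```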

Page 56: Decision Making Under Uncertainty

Infinite Horizon


What if the robot lives forever?

In many problems, e.g., the robot navigation example, histories are potentially unbounded and the same state can be reached many times

One trick: use discounting to make the infinite-horizon problem mathematically tractable
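A sketch of the discounted variant: only the backup changes, with a discount factor 0 ≤ γ < 1 multiplying the future term so that infinite-horizon utilities stay bounded (again reusing the definitions from the value-iteration sketch above):

```python
# Sketch: value iteration with a discount factor gamma applied to future utility.

def discounted_value_iteration(gamma=0.9, n_sweeps=200):
    U = {s: TERMINALS.get(s, 0.0) for s in GRID}
    for _ in range(n_sweeps):
        for s in GRID:
            if s in TERMINALS:
                continue
            U[s] = STEP_REWARD + gamma * max(          # U(i) <- R(i) + gamma * max_a E[U]
                sum(p * U[k] for k, p in transitions(s, a)) for a in MOVES)
    return U
```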

Page 57: Decision Making Under Uncertainty

Example: Tracking a Target


• The robot must keep the target in view
• The target’s trajectory is not known in advance
• The environment may or may not be known

An optimal policy cannot be computed ahead of time:
- The environment might be unknown
- The environment may only be partially observable
- The target may not wait

A policy must be computed “on-the-fly”

Page 58: Decision Making Under Uncertainty

POMDP (Partially Observable Markov Decision Problem)

• A sensing operation returns multiple states, with a probability distribution

• Choosing the action that maximizes the expected utility of this state distribution, assuming “state utilities” computed as above, is not good enough, and actually does not make sense (is not rational)

Page 59: Decision Making Under Uncertainty

Example: Target Tracking

There is uncertainty in the robot’s and target’s positions; this uncertainty grows with further motion

There is a risk that the target may escape behind the corner, requiring the robot to move appropriately

But there is a positioning landmark nearby. Should the robot try to reduce its position uncertainty?

Page 60: Decision Making Under Uncertainty

Summary

Decision making under uncertainty
 Utility function
 Optimal policy
 Maximal expected utility
 Value iteration
 Policy iteration