
Page 1: Probabilistic Robotics


Probabilistic Robotics

Planning and Control:

Markov Decision Processes

Page 2: Probabilistic Robotics


Problem Classes

• Deterministic vs. stochastic actions

• Full vs. partial observability

Page 3: Probabilistic Robotics


Deterministic, Fully Observable

Page 4: Probabilistic Robotics


Stochastic, Fully Observable

Page 5: Probabilistic Robotics


Stochastic, Partially Observable

Page 6: Probabilistic Robotics


Markov Decision Process (MDP)

[Figure: a five-state MDP (s1–s5) drawn as a transition graph; edges carry transition probabilities (0.7/0.3, 0.9/0.1, 0.3/0.3/0.4, 0.99/0.01, 0.8/0.2) and states carry rewards r = -10, 20, 0, 1, 0.]

Page 7: Probabilistic Robotics


Markov Decision Process (MDP)

• Given (encoded concretely in the sketch below):
  • States x
  • Actions u
  • Transition probabilities p(x'|u,x)
  • Reward / payoff function r(x,u)

• Wanted:
  • Policy π(x) that maximizes the future expected reward
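
As a concrete illustration of these ingredients, a finite MDP can be encoded with plain dictionaries. The sketch below is hypothetical (a two-state, two-action toy problem invented for this example, not the five-state diagram above) and shows one possible Python encoding of p(x'|u,x) and r(x,u).

    # Hypothetical two-state, two-action MDP (illustrative only).
    # P[x][u][x2] stores the transition probability p(x'|u,x).
    P = {
        "x1": {"stay": {"x1": 0.9, "x2": 0.1},
               "go":   {"x1": 0.2, "x2": 0.8}},
        "x2": {"stay": {"x1": 0.1, "x2": 0.9},
               "go":   {"x1": 0.7, "x2": 0.3}},
    }
    # R[x][u] stores the reward / payoff r(x,u).
    R = {
        "x1": {"stay": 0.0, "go": -1.0},
        "x2": {"stay": 1.0, "go":  0.0},
    }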

Page 8: Probabilistic Robotics


Rewards and Policies

• Policy (general case):
  $\pi:\; z_{1:t}, u_{1:t-1} \rightarrow u_t$

• Policy (fully observable case):
  $\pi:\; x_t \rightarrow u_t$

• Expected cumulative payoff (evaluated numerically below):
  $R_T = E\left[ \sum_{\tau=1}^{T} \gamma^{\tau} r_{t+\tau} \right]$

• T = 1: greedy policy
• T > 1: finite-horizon case, typically no discount
• T = ∞: infinite-horizon case; the reward is finite if the discount factor γ < 1
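
For one sampled reward sequence the payoff above is just a weighted sum, so it can be checked numerically. The numbers below are made up for illustration.

    # Discounted payoff of one hypothetical reward sequence r_{t+1}..r_{t+T}.
    gamma = 0.9
    rewards = [0.0, 1.0, 0.0, 20.0]
    R_T = sum(gamma**tau * r for tau, r in enumerate(rewards, start=1))
    print(R_T)  # 0.81 * 1 + 0.6561 * 20 = 13.932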

Page 9: Probabilistic Robotics


Policies contd.

• Expected cumulative payoff of a policy:
  $R_T^{\pi}(x_t) = E\left[ \sum_{\tau=1}^{T} \gamma^{\tau} r_{t+\tau} \,\middle|\, u_{t+\tau} = \pi(z_{1:t+\tau-1}, u_{1:t+\tau-1}) \right]$

• Optimal policy:
  $\pi^* = \operatorname{argmax}_{\pi} R_T^{\pi}(x_t)$

• 1-step optimal policy:
  $\pi_1(x) = \operatorname{argmax}_{u}\, r(x,u)$

• Value function of the 1-step optimal policy:
  $V_1(x) = \gamma \max_{u}\, r(x,u)$

Page 10: Probabilistic Robotics


2-step Policies

• Optimal policy:
  $\pi_2(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int V_1(x')\, p(x' \mid u, x)\, dx' \right]$

• Value function:
  $V_2(x) = \gamma \max_{u} \left[ r(x,u) + \int V_1(x')\, p(x' \mid u, x)\, dx' \right]$

Page 11: Probabilistic Robotics


T-step Policies

• Optimal policy:
  $\pi_T(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int V_{T-1}(x')\, p(x' \mid u, x)\, dx' \right]$

• Value function:
  $V_T(x) = \gamma \max_{u} \left[ r(x,u) + \int V_{T-1}(x')\, p(x' \mid u, x)\, dx' \right]$

Page 12: Probabilistic Robotics


Infinite Horizon

• The optimal policy satisfies the Bellman equation (a scalar special case is worked out below):
  $V(x) = \gamma \max_{u} \left[ r(x,u) + \int V(x')\, p(x' \mid u, x)\, dx' \right]$

• The fixed point of this equation is the value function of the optimal policy.

• Satisfying the Bellman equation is a necessary and sufficient condition for the optimal value function.
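
For intuition about the fixed point, consider a degenerate one-state, one-action MDP with constant reward r (a made-up example, not from the slides). The Bellman equation then collapses to a scalar equation with a closed-form solution:

$$V = \gamma\,(r + V) \quad\Longrightarrow\quad V = \frac{\gamma\, r}{1 - \gamma},$$

which is finite exactly when the discount satisfies γ < 1, matching the infinite-horizon condition on the earlier slide.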

Page 13: Probabilistic Robotics


Value Iteration

• for all x do
  $\hat{V}(x) \leftarrow r_{\min}$
• endfor

• repeat until convergence
  • for all x do
    $\hat{V}(x) \leftarrow \gamma \max_{u} \left[ r(x,u) + \int \hat{V}(x')\, p(x' \mid u, x)\, dx' \right]$
  • endfor
• endrepeat

• Afterwards, extract the policy greedily from the converged value function (a runnable sketch follows below):
  $\pi(x) = \operatorname{argmax}_{u} \left[ r(x,u) + \int \hat{V}(x')\, p(x' \mid u, x)\, dx' \right]$
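
The sketch below runs this loop on a finite MDP, reusing the hypothetical two-state toy problem from the earlier sketch (repeated here so the snippet is self-contained). The integral becomes a sum over successor states, and convergence is declared when the largest update drops below a tolerance.

    # Value iteration on the hypothetical two-state MDP (sums replace integrals).
    GAMMA = 0.9
    P = {"x1": {"stay": {"x1": 0.9, "x2": 0.1}, "go": {"x1": 0.2, "x2": 0.8}},
         "x2": {"stay": {"x1": 0.1, "x2": 0.9}, "go": {"x1": 0.7, "x2": 0.3}}}
    R = {"x1": {"stay": 0.0, "go": -1.0},
         "x2": {"stay": 1.0, "go": 0.0}}

    def backup(V, x, u):
        # One Bellman backup: r(x,u) + sum over x' of V(x') p(x'|u,x).
        return R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items())

    r_min = min(r for row in R.values() for r in row.values())
    V = {x: r_min for x in P}                  # V_hat(x) <- r_min

    while True:                                # repeat until convergence
        V_new = {x: GAMMA * max(backup(V, x, u) for u in P[x]) for x in P}
        if max(abs(V_new[x] - V[x]) for x in P) < 1e-6:
            break
        V = V_new

    # Greedy policy extraction from the converged value function.
    policy = {x: max(P[x], key=lambda u: backup(V, x, u)) for x in P}
    print(V, policy)

Iterating over P[x] enumerates the actions available in x, so the same loop also handles state-dependent action sets.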

Page 14: Probabilistic Robotics


Value Iteration for Motion Planning

Page 15: Probabilistic Robotics


Value Function and Policy Iteration

• The optimal policy is often reached long before the value function converges.

• Policy iteration computes a new policy from the current value function, then a new value function for that policy.

• This process often converges to the optimal policy faster than value iteration; a code sketch follows below.
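
A matching sketch of policy iteration on the same hypothetical toy MDP: the inner loop evaluates the current policy by repeated sweeps, the outer loop improves it greedily, and the process stops once the policy is stable.

    # Policy iteration on the same hypothetical two-state MDP.
    GAMMA = 0.9
    P = {"x1": {"stay": {"x1": 0.9, "x2": 0.1}, "go": {"x1": 0.2, "x2": 0.8}},
         "x2": {"stay": {"x1": 0.1, "x2": 0.9}, "go": {"x1": 0.7, "x2": 0.3}}}
    R = {"x1": {"stay": 0.0, "go": -1.0},
         "x2": {"stay": 1.0, "go": 0.0}}

    def q(V, x, u):
        # r(x,u) + sum over x' of V(x') p(x'|u,x).
        return R[x][u] + sum(p * V[x2] for x2, p in P[x][u].items())

    policy = {x: "stay" for x in P}            # arbitrary initial policy
    V = {x: 0.0 for x in P}

    while True:
        # Policy evaluation: sweep until V^pi stabilizes.
        while True:
            V_new = {x: GAMMA * q(V, x, policy[x]) for x in P}
            if max(abs(V_new[x] - V[x]) for x in P) < 1e-6:
                break
            V = V_new
        # Policy improvement: act greedily with respect to V^pi.
        new_policy = {x: max(P[x], key=lambda u: q(V, x, u)) for x in P}
        if new_policy == policy:               # stable policy -> done
            break
        policy = new_policy

    print(policy, V)

Only the policy, not the value function, needs to stabilize here, which is why the outer loop typically terminates after few iterations.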