Upload
rory
View
35
Download
0
Embed Size (px)
DESCRIPTION
Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty. Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004. Robots in unstructured environments. A vision for robotic-assisted health-care. Providing - PowerPoint PPT Presentation
Citation preview
Tractable Planning for Real-World Robotics:
The promises and challenges of dealing with uncertainty
Joelle PineauRobotics Institute
Carnegie Mellon University
Stanford University
May 3, 2004
Joelle Pineau Tractable Planning for Real-World Robotics 2
Robots in unstructured environments
Joelle Pineau Tractable Planning for Real-World Robotics 3
A vision for robotic-assisted health-care
Moving thingsaround
Moving thingsaround Supporting
inter-personalcommunication
Supportinginter-personal
communication
Calling for helpin emergencies
Calling for helpin emergencies
Monitoring Rx adherence
& safety
Monitoring Rx adherence
& safety
Providinginformation Providing
information
Reminding to eat, drink, & take meds
Reminding to eat, drink, & take meds
Providing physical
assistance
Providing physical
assistance
Joelle Pineau Tractable Planning for Real-World Robotics 4
What is hard about planning for real-world robotics?
• Generating a plan that has high expected utility.
• Switching between different representations of the world.
• Resolving conflicts between interfering jobs.
• Accomplishing jobs in changing, partly unknown environments.
• Handling percepts which are incomplete, ambiguous, outdated,
incorrect.
* Highlights from a proposed robot grand challenge
Joelle Pineau Tractable Planning for Real-World Robotics 5
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Joelle Pineau Tractable Planning for Real-World Robotics 6
Why use a POMDP?
• POMDPs provide a rich framework for sequential decision-making, which can model:
– Effect uncertainty
– State uncertainty
– Varying rewards across actions and goals
Joelle Pineau Tractable Planning for Real-World Robotics 7
Robotics:
• Robust mobile robot navigation[Simmons & Koenig, 1995; + many more]
• Autonomous helicopter control[Bagnell & Schneider, 2001; Ng et al., 2003]
• Machine vision[Bandera et al., 1996; Darrell & Pentland, 1996]
• High-level robot control[Pineau et al., 2003]
• Robust dialogue management[Roy, Pineau & Thrun, 2000; Peak & Horvitz, 2000]
POMDP applications in last decade
Others:
• Machine maintenance[Puterman., 1994]
• Network troubleshooting[Thiebeaux et al., 1996]
• Circuits testing [correspondence, 2004]
• Preference elicitation[Boutilier, 2002]
• Medical diagnosis[Hauskrecht, 1997]
Joelle Pineau Tractable Planning for Real-World Robotics 8
POMDP model
POMDP is n-tuple { S, A, Z, T, O, R }:
What goes on: st-1 st
at-1 at
S = state setA = action setZ = observation set
What we see: zt-1 zt
Joelle Pineau Tractable Planning for Real-World Robotics 9
POMDP model
POMDP is n-tuple { S, A, Z, T, O, R }:
What goes on: st-1 st
at-1 at
T: Pr(s’|s,a) = state-to-state transition probabilitiesS = state setA = action setZ = observation set
What we see: zt-1 zt
Joelle Pineau Tractable Planning for Real-World Robotics 10
POMDP model
POMDP is n-tuple { S, A, Z, T, O, R }:
What goes on: st-1 st
at-1 at
T: Pr(s’|s,a) = state-to-state transition probabilitiesO: Pr(z|s,a) = observation generation probabilities
S = state setA = action setZ = observation set
What we see: zt-1 zt
Joelle Pineau Tractable Planning for Real-World Robotics 11
POMDP model
POMDP is n-tuple { S, A, Z, T, O, R }:
What goes on: st-1 st
at-1 at
T: Pr(s’|s,a) = state-to-state transition probabilitiesO: Pr(z|s,a) = observation generation probabilitiesR(s,a) = reward function
S = state setA = action setZ = observation set
What we see: zt-1 zt
rt-1 rt
Joelle Pineau Tractable Planning for Real-World Robotics 12
POMDP model
POMDP is n-tuple { S, A, Z, T, O, R }:
What goes on: st-1 st
at-1 atWhat we see: zt-1 zt
What we infer: bt-1 bt
rt-1 rt
T: Pr(s’|s,a) = state-to-state transition probabilitiesO: Pr(z|s,a) = observation generation probabilitiesR(s,a) = reward function
S = state setA = action setZ = observation set
Joelle Pineau Tractable Planning for Real-World Robotics 13
Examples of robot beliefs
robot particles
Uniform belief Bi-modal belief
Joelle Pineau Tractable Planning for Real-World Robotics 14
Understanding the belief state
• A belief is a probability distribution over states
Where Dim(B) = |S|-1
– E.g. Let S={s1, s2}
P(s1)
0
1
Joelle Pineau Tractable Planning for Real-World Robotics 15
Understanding the belief state
• A belief is a probability distribution over states
Where Dim(B) = |S|-1
– E.g. Let S={s1, s2, s3}
P(s1)
P(s2)
0
1
1
Joelle Pineau Tractable Planning for Real-World Robotics 16
Understanding the belief state
• A belief is a probability distribution over states
Where Dim(B) = |S|-1
– E.g. Let S={s1, s2, s3 , s4}
P(s1)
P(s2)
0
1
1
P(s3)
Joelle Pineau Tractable Planning for Real-World Robotics 17
POMDP solving
Objective: Find the sequence of actions that maximizes the expected sum of rewards.
Bb
AabVbabTabRbV
'
)'()',,(),(max)(
Valuefunction
Immediatereward
Futurereward
Joelle Pineau Tractable Planning for Real-World Robotics 18
• Represent V(b) as the upper surface of a set of vectors.– Each vector is a piece of the control policy (= action sequences).
– Dim(vector) = number of states.
• Modify / add vectors to update value fn (i.e. refine policy).
POMDP value function
P(s1)
V(b)
b
2 states
Joelle Pineau Tractable Planning for Real-World Robotics 19
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
V0(b)
b
Plan length #vectors 0 1
P(break-in)
Joelle Pineau Tractable Planning for Real-World Robotics 20
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
P(break-in)
V1(b)
b
Plan length # vectors 0 1 1 3
Call-911
Investigate
Go-to-bed
Joelle Pineau Tractable Planning for Real-World Robotics 21
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
V2(b)
b
Plan length # vectors 0 1 1 3 2 27
P(break-in)
Call-911
Investigate
Go-to-bed
Joelle Pineau Tractable Planning for Real-World Robotics 22
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
V3(b)
b
Plan length # vectors 0 1 1 3 2 27 3 2187
P(break-in)
Call-911
Investigate
Go-to-bed
Joelle Pineau Tractable Planning for Real-World Robotics 23
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
Plan length # vectors 0 1 1 3 2 27 3 2187 4 14,348,907V4(b)
b
P(break-in)
Call-911
Investigate
Go-to-bed
Joelle Pineau Tractable Planning for Real-World Robotics 24
The curse of history
)A( Z1 nn O
Policy size grows exponentially with the
planning horizon:
Where Γ = policy sizen = planning horizonA = # actionsZ = # observations
Joelle Pineau Tractable Planning for Real-World Robotics 25
How many vectors for this problem?
104 (navigation) x 103 (dialogue) states1000+ observations100+ actions
Joelle Pineau Tractable Planning for Real-World Robotics 26
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Joelle Pineau Tractable Planning for Real-World Robotics 27
Exact solving assumes all beliefs are equally likely
robot particles
Uniform belief Bi-modal belief N-modal belief
INSIGHT: No sequence of actions and observations canproduce this N-modal belief.
Joelle Pineau Tractable Planning for Real-World Robotics 28
A new algorithm: Point-based value iteration
P(s1)
V(b)
b1 b0 b2
Approach:
Select a small set of belief points
Plan for those belief points only
Joelle Pineau Tractable Planning for Real-World Robotics 29
A new algorithm: Point-based value iteration
P(s1)
V(b)
b1 b0 b2a,z a,z
Approach:
Select a small set of belief points Use well-separated, reachable beliefs
Plan for those belief points only
Joelle Pineau Tractable Planning for Real-World Robotics 30
A new algorithm: Point-based value iteration
P(s1)
V(b)
b1 b0 b2a,z a,z
Approach:
Select a small set of belief points Use well-separated, reachable beliefs
Plan for those belief points only Learn value and its gradient
Joelle Pineau Tractable Planning for Real-World Robotics 31
A new algorithm: Point-based value iteration
Approach:
Select a small set of belief points Use well-separated, reachable beliefs
Plan for those belief points only Learn value and its gradient
Pick action that maximizes value: bbV
max)(
P(s1)
V(b)
bb1 b0 b2
Joelle Pineau Tractable Planning for Real-World Robotics 32
The anytime PBVI algorithm
• Alternate between:
1. Growing the set of belief point
2. Planning for those belief points
• Terminate when you run out of time or have a good policy.
Joelle Pineau Tractable Planning for Real-World Robotics 33
Complexity of value update
Exact Update PBVI
Time:Projection S2 A Z n S2 A Z n
Sum S A nZ S A Z n B
Size: (# vectors) A nZ B
where: S = # states n = # vectors at iteration n A = # actions B = # belief points
Z = # observations
Joelle Pineau Tractable Planning for Real-World Robotics 34
Theoretical properties of PBVI
Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by:
P(s1)
V(b)
b1 b0 b2
Where is the set of reachable beliefsB is the set of all beliefs
1'2minmax* ||'||minmax
)1(
)(|||| bb
RRVV Bbbn
Bn
Err Err
Joelle Pineau Tractable Planning for Real-World Robotics 35
The anytime PBVI algorithm
• Alternate between:
1. Growing the set of belief point
2. Planning for those belief points
• Terminate when you run out of time or have a good policy.
Joelle Pineau Tractable Planning for Real-World Robotics 36
PBVI’s belief selection heuristic
1. Leverage insight from policy search methods:
– Focus on reachable beliefs.
P(s1)
b ba1,z2ba2,z2ba2,z1ba1,z1
a2,z2 a1,z2
a2,z1
a1,z1
Joelle Pineau Tractable Planning for Real-World Robotics 37
PBVI’s belief selection heuristic
1. Leverage insight from policy search methods:
– Focus on reachable beliefs.
2. Focus on high probability beliefs:
– Consider all actions, but stochastic observation choice.
P(s1)
b ba1,z2ba2,z1
a1,z2
a2,z1
ba2,z2ba1,z1
Joelle Pineau Tractable Planning for Real-World Robotics 38
PBVI’s belief selection heuristic
1. Leverage insight from policy search methods:
– Focus on reachable beliefs.
2. Focus on high probability beliefs:
– Consider all actions, but stochastic observation choice.
3. Use the error bound on point-based value updates:
– Select well-separated beliefs, rather than near-by beliefs.
P(s1)
b ba1,z2ba2,z1
a1,z2
a2,z1
Joelle Pineau Tractable Planning for Real-World Robotics 39
Classes of value function approximations
1. No belief [Littman et al., 1995]
3. Compressed belief[Poupart&Boutilier, 2002;
Roy&Gordon, 2002]
x1
x0
x2
2. Grid over belief[Lovejoy, 1991; Brafman 1997;
Hauskrecht, 2000; Zhou&Hansen, 2001]
4. Sample belief points[Poon, 2001; Pineau et al, 2003]
Joelle Pineau Tractable Planning for Real-World Robotics 40
Performance on well-known POMDPs
Maze1
0.20
0.94
0.00
2.30
2.25
REWARD
Maze2
0.11
-
0.07
0.35
0.34
Maze3
0.26
-
0.11
0.53
0.53
Maze1
0.19
-
24hrs
12166
3448
TIME
Maze2
1.44
-
24hrs
27898
360
Maze3
0.51
-
24hrs
450
288
Maze1
-
174
-
660
470
# Belief points
Maze2
-
337
-
1840
95
Maze3
-
-
-
300
86
Method
No belief[Littman&al., 1995]
Grid[Brafman., 1997]
Compressed[Poupart&al., 2003]
Sample[Poon, 2001]
PBVI[Pineau&al., 2003]
Maze1:36 states
Maze2:92 states
Maze3: 60 states
Joelle Pineau Tractable Planning for Real-World Robotics 41
PBVI in the Nursebot domain
Objective: Find the patient!
State space = RobotPosition PatientPosition
Observation space = RobotPosition + PatientFound
Action space = {North, South, East, West, Declare}
Joelle Pineau Tractable Planning for Real-World Robotics 42
PBVI performance on find-the-patient domain
Patient found 17% of trials
Patient found 90% of trialsNo Belief PBVI
No Belief
PBVI
Joelle Pineau Tractable Planning for Real-World Robotics 43
Validation of PBVI’s belief expansion heuristic
Greedy
PBVI
No BeliefRandom
Find-the-patient domain870 states, 5 actions, 30 observations
Joelle Pineau Tractable Planning for Real-World Robotics 44
Policy assuming full observability
You find some:(25%)
You loose some:(75%)
Joelle Pineau Tractable Planning for Real-World Robotics 45
PBVI Policy with 3141 belief points
You find some:(81%)
You loose some:(19%)
Joelle Pineau Tractable Planning for Real-World Robotics 46
PBVI Policy with 643 belief points
You find some:(22%)
You loose some:(78%)
Joelle Pineau Tractable Planning for Real-World Robotics 47
Highlights of the PBVI algorithm
• Algorithmic:
– New belief sampling algorithm.
– Efficient heuristic for belief point selection.
– Anytime performance.
• Experimental:
– Outperforms previous value approximation algorithms on known problems.
– Solves new larger problem (1 order of magnitude increase in problem size).
• Theoretical:
– Bounded approximation error.
[ Pineau, Gordon & Thrun, IJCAI 2003. Pineau, Gordon & Thrun, NIPS 2003. ]
Joelle Pineau Tractable Planning for Real-World Robotics 48
Back to the grand challenge
How can we go from 103 states to real-world problems?
Joelle Pineau Tractable Planning for Real-World Robotics 49
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Joelle Pineau Tractable Planning for Real-World Robotics 50
Navigation
Structured POMDPs
Many real-world decision-making problems exhibit structure inherent to the problem domain.
Cognitive support Social interaction
High-level controller
Move AskWhere
Left Right Forward Backward
Joelle Pineau Tractable Planning for Real-World Robotics 51
Structured POMDP approaches
Factored models[Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]
– Idea: Represent state space with multi-valued state features.
– Insight: Independencies between state features can be leveraged to
overcome the curse of dimensionality.
Hierarchical POMDPs[Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol &
Mahadevan, 2000; Pineau & Thrun, 2000]
– Idea: Exploit domain knowledge to divide one POMDP into many
smaller ones.
– Insight: Smaller action sets further help overcome the curse of history.
Joelle Pineau Tractable Planning for Real-World Robotics 52
A hierarchy of POMDPs
Act
ExamineHealth Navigate
MoveVerifyFluids
ClarifyGoal
North South East West
VerifyMeds
subtask
abstract action
primitive action
Joelle Pineau Tractable Planning for Real-World Robotics 53
PolCA+: Planning with a hierarchy of POMDPs
Navigate
Move ClarifyGoal
South East WestNorth
AMove = {N,S,E,W}
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
Step 1: Select the action set
Joelle Pineau Tractable Planning for Real-World Robotics 54
PolCA+: Planning with a hierarchy of POMDPs
Navigate
Move ClarifyGoal
South East WestNorth
AMove = {N,S,E,W}
SMove = {s1,s2}
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
Step 1: Select the action set
Step 2: Minimize the state set
Joelle Pineau Tractable Planning for Real-World Robotics 55
PolCA+: Planning with a hierarchy of POMDPs
Navigate
Move ClarifyGoal
South East WestNorth
AMove = {N,S,E,W}
SMove = {s1,s2}
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
PARAMETERS
{bh,Th,Oh,Rh}
PARAMETERS
{bh,Th,Oh,Rh}
Step 1: Select the action set
Step 2: Minimize the state set
Step 3: Choose parameters
Joelle Pineau Tractable Planning for Real-World Robotics 56
PolCA+: Planning with a hierarchy of POMDPs
Navigate
Move ClarifyGoal
South East WestNorth
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
STATE FEATURESX-positionY-position
X-goalY-goal
HealthStatus
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
ACTIONSNorthSouthEastWest
ClarifyGoalVerifyFluidsVerifyMeds
PLAN
h
PLAN
h
PARAMETERS
{bh,Th,Oh,Rh}
PARAMETERS
{bh,Th,Oh,Rh}
Step 1: Select the action set
Step 2: Minimize the state set
Step 3: Choose parameters
Step 4: Plan task h
AMove = {N,S,E,W}
SMove = {s1,s2}
Joelle Pineau Tractable Planning for Real-World Robotics 57
PolCA+ in the Nursebot domain
• Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments.
Joelle Pineau Tractable Planning for Real-World Robotics 58
Performance measure
-2000
2000
6000
10000
14000
0 400 800 1200
Time Steps
Cum
ulat
ive
Rew
ard
PolCA+
PolCA
QMDP
Hierarchy + Belief
Execution Steps
Hierarchy + Belief
Hierarchy + BeliefPolCA+
Joelle Pineau Tractable Planning for Real-World Robotics 59
Comparing user performance
0.1 0.10.18
POMDP PolicyNo Belief Policy
Joelle Pineau Tractable Planning for Real-World Robotics 60
Visit to the nursing home
Joelle Pineau Tractable Planning for Real-World Robotics 61
Highlights of the PolCA+ algorithm
• Algorithmic:– New hierarchical approach for POMDP framework.
– POMDP-specific state and observation abstraction methods.
• Experimental:– First instance of POMDP-based high-level robot controller.
– Novel application of POMDPs to robust dialogue management.
• Theoretical:– For special case (fully observable), guarantees recursive optimality.
[ Pineau, Gordon & Thrun, UAI 2003. Pineau et al., RAS 2003. Roy, Pineau & Thrun, ACL 2001]
Joelle Pineau Tractable Planning for Real-World Robotics 62
Future work
PBVI:
• How can we handle domains with multi-valued state features?
• Can we leverage dimensionality reduction?
• Can we find better ways to pick belief points?
PolCA+:
• Can we automatically learn hierarchies?
• How can we learn (or do without) pseudo-reward functions?
Joelle Pineau Tractable Planning for Real-World Robotics 63
Questions?
Project information:www.cs.cmu.edu/~nursebot
Navigation software:www.cs.cmu.edu/~carmen
Papers and more:www.cs.cmu.edu/~jpineau
Collaborators: Geoffrey Gordon, Judith Matthews, Michael Montemerlo,Martha Pollack, Nicholas Roy, Sebastian Thrunh
Joelle Pineau Tractable Planning for Real-World Robotics 64
Two types of uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Joelle Pineau Tractable Planning for Real-World Robotics 65
Example: Effect uncertainty
Startposition
Distribution over possiblenext-step positions
Startposition
Distribution over possiblenext-step positions
Motion action
Joelle Pineau Tractable Planning for Real-World Robotics 66
Two types of uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Joelle Pineau Tractable Planning for Real-World Robotics 67
Example: State uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Model → Inaccurate parameterization of the environment
Agent → Unknown behaviour of other agents
Startposition
Distribution over possiblenext-step positions
Joelle Pineau Tractable Planning for Real-World Robotics 68
Validation of PBVI’s belief expansion heuristic
No Belief
Hallway domain60 states, 5 actions, 20 observations
Joelle Pineau Tractable Planning for Real-World Robotics 69
0
500
1000
1500
2000
2500
3000
3500
4000
4500
NoAbs PolCA PolCA+
# S
tate
ssubInform
subMove
subContact
subRest
subAssist
subRemind
act
State space reduction
No hierarchy PolCA+
Joelle Pineau Tractable Planning for Real-World Robotics 70
Future directions
• Improving POMDP planning
– sparser belief space sampling, ordered value updating, dimensionality reduction, continuous / hybrid domains
• Addressing two more types of uncertainty:1. Effect2. State3. Model4. Agent
• Exploring new applications