Upload
christian-shaw
View
218
Download
0
Tags:
Embed Size (px)
Citation preview
CS B659: INTELLIGENT ROBOTICSPlanning Under Uncertainty
Perception(e.g., filtering,
SLAM)
Prior knowledge (often probabilistic)
Sensors
Decision-making(control laws, optimization,
planning)
Offline learning(e.g., calibration)
Actuators
Actions
DEALING WITH UNCERTAINTY
Sensing uncertainty Localization error Noisy maps Misclassified objects
Motion uncertainty Noisy natural processes Odometry drift Imprecise actuators Uncontrolled agents (treat as state)
MOTION UNCERTAINTY
g
obstacle
obstacle
obstacle
s
DEALING WITH MOTION UNCERTAINTY
Hierarchical strategies Use generated trajectory as input into a low-level
feedback controller Reactive strategies
Approach #1: re-optimize when the state has diverged from the planned path (online optimization)
Approach #2: precompute optimal controls over state space, and just read off the new value from the perturbed state (offline optimization)
Proactive strategies Explicitly consider future uncertainty Heuristics (e.g., grow obstacles, penalize nearness) Markov Decision Processes Online and offline approaches
PROACTIVE STRATEGIES TO HANDLE MOTION UNCERTAINTY
g
obstacle
obstacle
obstacle
s
DYNAMIC COLLISION AVOIDANCE ASSUMING WORST-CASE BEHAVIORS
Optimizing safety in real-time under worst case behavior model
T=1
T=2
TTPF=2.6
Robot
MARKOV DECISION PROCESS APPROACHES
Alterovitz et al 2007
DEALING WITH SENSING UNCERTAINTY
Reactive heuristics often work well Optimistic costs Assume most-likely state Penalize uncertainty
Proactive strategies Explicitly consider future uncertainty Active sensing Reward actions that yield information gain Partially Observable Markov Decision Processes
(POMDPs)
10
Assuming no obstacles in the unknown region and taking the shortest path to the goal
11
Assuming no obstacles in the unknown region and taking the shortest path to the goal
12
Assuming no obstacles in the unknown region and taking the shortest path to the goal
Works well for navigation because the space of all maps is too huge, and certainty is monotonically nondecreasing
13
Assuming no obstacles in the unknown region and taking the shortest path to the goal
Works well for navigation because the space of all maps is too huge, and certainty is monotonically nondecreasing
14
What if the sensor was directed (e.g., a camera)?
15
What if the sensor was directed (e.g., a camera)?
16
What if the sensor was directed (e.g., a camera)?
17
What if the sensor was directed (e.g., a camera)?
18
What if the sensor was directed (e.g., a camera)?
19
What if the sensor was directed (e.g., a camera)?
20
What if the sensor was directed (e.g., a camera)?
21
What if the sensor was directed (e.g., a camera)?
22
What if the sensor was directed (e.g., a camera)?
23
What if the sensor was directed (e.g., a camera)?
24
What if the sensor was directed (e.g., a camera)?
At this point, it would have made sense to turn a bit more to see more of the unknown map
ANOTHER EXAMPLE
? Locked?
Key?Key?
ACTIVE DISCOVERY OF HIDDEN INTENT
No motion
Perpendicular motion
MAIN APPROACHES TO PARTIAL OBSERVABILITY
Ignore and react Doesn’t know what it doesn’t know
Model and react Knows what it doesn’t know
Model and predict Knows what it doesn’t know AND what it
will/won’t know in the future
• Better decisions• More components in
implementation• Harder
computationally
UNCERTAINTY MODELS
Model #1: Nondeterministic uncertainty f(x,u) -> a set of possible successors
Model #2: Probabilistic uncertainty P(x’|x,u): a probability distribution over
successors x’, given state x, control u Markov assumption
NONDETERMINISTIC UNCERTAINTY : REASONING WITH SETS
x’ = x + e, [-a,a]
t=0
t=1-a a
t=2-2a 2a
Belief State: x(t) [-ta,ta]
UNCERTAINTY WITH SENSING
Plan = policy (mapping from states to actions)
Policy achieves goal for every possible sensor result in belief state
Move
Sense1 2
3 4
Observations should be chosen wisely to keep branching factor low
Outcomes
Special case: fully observable state
TARGET TRACKING
targetrobot
The robot must keep a target in its field of view
The robot has a prior map of the obstacles
But it does not know the target’s trajectory in advance
TARGET-TRACKING EXAMPLE Time is discretized into small
steps of unit duration
At each time step, each of the two agents moves by at most one increment along a single axis
The two moves are simultaneous
The robot senses the new position of the target at each step
The target is not influenced by the robot (non-adversarial, non-cooperative target)
targetrobot
TIME-STAMPED STATES (NO CYCLES POSSIBLE)
State = (robot-position, target-position, time) In each state, the robot can execute 5 possible actions :
{stop, up, down, right, left} Each action has 5 possible outcomes (one for each possible action
of the target), with some probability distribution[Potential collisions are ignored for simplifying the presentation]
([i,j], [u,v], t)
• ([i+1,j], [u,v], t+1)• ([i+1,j], [u-1,v], t+1)• ([i+1,j], [u+1,v], t+1)• ([i+1,j], [u,v-1], t+1)• ([i+1,j], [u,v+1], t+1)
right
REWARDS AND COSTSThe robot must keep seeing the target as long as possible
Each state where it does not see the target is terminal
The reward collected in every non-terminal state is 1; it is 0 in each terminal state[ The sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target]
No cost for moving vs. not moving
EXPANDING THE STATE/ACTION TREE
...
horizon 1 horizon h
ASSIGNING REWARDS
...
horizon 1 horizon h
Terminal states: states where the target is not visible
Rewards: 1 in non-terminal states; 0 in others
But how to estimate the utility of a leaf at horizon h?
ESTIMATING THE UTILITY OF A LEAF
...
horizon 1 horizon h
Compute the shortest distance d for the target to escape the robot’s current field of view
If the maximal velocity v of the target is known, estimate the utility of the state to d/v [conservative estimate]
targetrobot
d
SELECTING THE NEXT ACTION
...
horizon 1 horizon h
Compute the optimal policy over the state/action tree using estimated utilities at leaf nodes
Execute only the first step of this policy
Repeat everything again at t+1… (sliding horizon)
PURE VISUAL SERVOING
Computing and Using a Policy
NEXT WEEK
More algorithm details
FINAL INFORMATION
Final presentations 20 minutes + 10 minutes questions
Final reports due 5/3 Must include a technical report
Introduction Background Methods Results Conclusion
May include auxiliary website with figures / examples / implementation details
I will not review code