Partial Satisfaction Planning: Representations and Solving Methods
Dissertation Defense
J. Benton ([email protected])
Committee: Subbarao Kambhampati, Chitta Baral, Minh B. Do, David E. Smith, Pat Langley
Classical vs. Partial Satisfaction Planning (PSP)

Classical Planning
• Initial state
• Set of goals
• Actions
Find a plan that achieves all goals (prefer plans with fewer actions).

Partial Satisfaction Planning
• Initial state
• Goals with differing utilities
• Goals have utility/cost interactions
• Utilities may be deadline-dependent
• Actions with differing costs
Find a plan with highest net benefit (cumulative utility minus cumulative cost); the best plan may not achieve all the goals.
Partial Satisfaction / Over-Subscription Planning

Traditional planning problems: find the shortest (lowest-cost) plan that satisfies all the given goals.
PSP: find the highest-utility plan given the resource constraints. Goals have utilities and actions have costs.

PSP arises naturally in many real-world planning scenarios:
• Mars rovers attempting to maximize scientific return, given resource constraints
• UAVs attempting to maximize reconnaissance returns, given fuel and other constraints
• Logistics problems with resource constraints arising for a variety of reasons

Why not simply achieve everything? Constraints on the agent's resources; conflicting goals with complex inter-dependencies between goal utilities; deadlines. [IJCAI 2005; IJCAI 2007; ICAPS 2007; AIJ 2009; IROS 2009; ICAPS 2012]
The Scalability Bottleneck

Before: 6-10 action plans in minutes. In the last dozen years: 100-action plans in seconds. We have figured out how to scale plan synthesis, up to realistic encodings of (some of) the Munich airport. The primary revolution in planning has been search control methods for scaling plan synthesis.
[Figure: planning problem classes arranged on two axes. Optimization metrics: any (feasible) plan, shortest plan, cheapest plan, highest net benefit. System dynamics: classical, temporal, metric, metric-temporal, non-deterministic, partially observable, stochastic. Traditional planning sits at the weaker end of the optimization-metric axis; PSP targets highest net benefit.]
Agenda
• In proposal: a quick history of partial satisfaction planning; PSP and utility dependencies [IPC 2006; IJCAI 2007; ICAPS 2007]; study of compilation methods [AIJ 2009]
• Completed proposed work: time-dependent goals [ICAPS 2012, best student paper award]
An Abbreviated Timeline of PSP
1964 – Herbert Simon – "On the Concept of Organizational Goals"
1967 – Herbert Simon – "Motivational and Emotional Controls of Cognition"
1990 – Feldman & Sproull – "Decision Theory: The Hungry Monkey"
1993 – Haddawy & Hanks – "Utility Models … for Planners"
2003 – David Smith – "Mystery Talk" at the Planning Summer School
2004 – David Smith – Choosing objectives for over-subscription planning
2004 – van den Briel et al. – Effective methods for PSP
2005 – Benton et al. – Metric preferences
2006 – PDDL3 / International Planning Competition – many planners, another preference language (YochanPS: distinguished performance award)
2007 – Benton et al. / Do, Benton et al. – goal utility dependencies and reasoning with them
2008 – Yoon, Benton & Kambhampati – Stage search for PSP
2009 – Benton, Do & Kambhampati – analysis of SapaPS; compiling PDDL3 to PSP / cost planning
2010 – Benton & Baier, Kambhampati – AAAI tutorial on PSP / preference planning
2010 – Talamadupula, Benton et al. – using PSP in open-world planning
2012 – Burns, Benton et al. – anticipatory on-line planning
2012 – Benton et al. – temporal planning with time-dependent continuous costs (best student paper award)
Net Benefit

Soft goals with rewards: r(Have(Soil)) = 25, r(Have(Rock)) = 50, r(Have(Image)) = 30
Actions with costs: c(Move(α,β)) = 10, c(Sample(Rock,β)) = 20
Objective function: find the plan P that maximizes r(P) − c(P). Not all goals can be achieved, due to costs and mutexes. [Smith, 2004; van den Briel et al. 2004]
[Figure: rover map over locations α, β, γ.]

As an extension from planning: the General Additive Independence model. Goal cost dependencies come from the plan; goal utility dependencies come from the user.
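As an illustrative sketch (not part of the defense itself), the net-benefit objective on the rover example can be computed directly; the rewards and action costs are the slide's numbers, while the candidate plan below is a hypothetical action sequence.

```python
# Net benefit of a plan: total reward of achieved soft goals minus total action
# cost. Rewards and costs follow the rover example; the plan is hypothetical.
reward = {"Have(Soil)": 25, "Have(Rock)": 50, "Have(Image)": 30}

def net_benefit(achieved_goals, action_costs):
    """r(P) - c(P): cumulative utility minus cumulative cost."""
    return sum(reward[g] for g in achieved_goals) - sum(action_costs)

# A plan that moves from alpha to beta and samples rock there:
# c(Move(alpha,beta)) = 10, c(Sample(Rock,beta)) = 20, achieving Have(Rock).
print(net_benefit({"Have(Rock)"}, [10, 20]))  # 50 - 30 = 20
```

A planner searching this space compares such values across candidate plans, including the empty plan with net benefit 0.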
Utility over sets of dependent goals: the General Additive Independence model [Bacchus & Grove 1995] sums local utility functions over goal subsets:

U(S) = Σ_{G_k ⊆ S} f(G_k)

Example: f(g1) = 15, f(g2) = 15, f({g1, g2}) = 20, so U({g1, g2}) = 15 + 15 + 20 = 50.
[Do, Benton, van den Briel & Kambhampati IJCAI 2007; Benton, van den Briel & Kambhampati ICAPS 2007]
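The GAI sum above can be sketched in a few lines; the local utilities are the slide's example numbers.

```python
# General Additive Independence (GAI) model [Bacchus & Grove 1995]: the utility
# of a goal set S is the sum of local utilities f(G_k) over every dependency
# set G_k contained in S. Numbers follow the slide's g1/g2 example.
f = {
    frozenset({"g1"}): 15,
    frozenset({"g2"}): 15,
    frozenset({"g1", "g2"}): 20,  # extra reward for achieving both together
}

def gai_utility(S):
    """U(S) = sum of f(G_k) over every dependency G_k with G_k a subset of S."""
    return sum(u for gk, u in f.items() if gk <= set(S))

print(gai_utility({"g1"}))        # 15
print(gai_utility({"g1", "g2"}))  # 15 + 15 + 20 = 50
```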
The PSP Dilemma

Choosing which goals to pursue is itself hard: it is impractical to find plans for all 2^n goal combinations (with 3 goals, 2^3 = 8 subsets; with 6 goals, 2^6 = 64).
[Figure: rover maps over locations α, β, γ illustrating the goal combinations.]
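The blowup is easy to see by enumeration; this toy sketch only counts subsets (the goal names are placeholders), whereas the naive approach would additionally solve one planning problem per subset.

```python
from itertools import combinations

# The "PSP dilemma" in miniature: naively, the best goal subset could be found
# by solving one planning problem per subset -- 2^n of them. Merely listing
# the subsets already shows the exponential growth.
def all_subsets(goals):
    return [set(c) for r in range(len(goals) + 1)
            for c in combinations(goals, r)]

print(len(all_subsets(["g1", "g2", "g3"])))                  # 2^3 = 8
print(len(all_subsets(["g1", "g2", "g3", "g4", "g5", "g6"])))  # 2^6 = 64
```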
Handling Goal Utility Dependencies

• As an optimization problem: encode the planning problem as an integer program (IP). The resulting planner extends the objective function over van den Briel's G1SC encoding.
• As a heuristic search problem: modify a heuristic search planner. This extends state-of-the-art heuristic search methods, changes the search methodology, and includes a suite of heuristics using integer and linear programming.
Heuristic Goal Selection

Step 1: Estimate the lowest-cost relaxed plan P+ achieving all goals.
Step 2: Build cost dependencies between the goals in P+.
Step 3: Find the optimal relaxed plan P+ using the goal utilities.
[Benton, Do & Kambhampati AIJ 2009; Do, Benton, van den Briel & Kambhampati IJCAI 2007]
Heuristic Goal Selection Process: No Utility Dependencies

[Figure sequence: a relaxed planning graph for the rover domain, with drive and sample actions annotated by cost. Propagating costs through the graph estimates the cheapest way to achieve each goal; each goal's reward is then compared with its estimated cost: 25 − 20 = 5, 30 − 55 = −25, 50 − 45 = 5, giving h = −15 if all goals are kept. Removing the goal with negative net benefit leaves 25 − 20 = 5 and 50 − 45 = 5, so h = 10.]

Heuristic from SapaPS. [Do & Kambhampati JAIR 2002; Benton, Do & Kambhampati AIJ 2009]
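A toy sketch of the pruning idea behind the SapaPS relaxed-plan heuristic, under a simplifying assumption: each goal is charged an independent cost share, whereas real relaxed plans share actions across goals. The rewards and cost estimates mirror the worked example above.

```python
# Toy sketch of SapaPS-style goal pruning: estimate each goal's achievement
# cost from the relaxed plan, then drop goals whose reward does not cover
# their cost. The per-goal cost split is a simplification for illustration.
rewards = {"have(soil)": 25, "have(image)": 30, "have(rock)": 50}
est_costs = {"have(soil)": 20, "have(image)": 55, "have(rock)": 45}

def prune_and_score(rewards, est_costs):
    """Keep goals with non-negative net benefit; sum their nets as h."""
    kept = {g for g in rewards if rewards[g] - est_costs[g] >= 0}
    h = sum(rewards[g] - est_costs[g] for g in kept)
    return kept, h

kept, h = prune_and_score(rewards, est_costs)
print(kept, h)  # have(image) dropped (30 - 55 = -25); h = 5 + 5 = 10
```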
Goal Selection with Dependencies: SPUDS (SapaPS with Utility DependencieS)

Step 1: Estimate the lowest-cost relaxed plan P+ achieving all goals.
Step 2: Build cost dependencies between the goals in P+.
Step 3: Find the optimal relaxed plan P+ using the goal utilities.

The heuristic h_relax^GAI uses an IP formulation to maximize net benefit, encoding the relaxed plan and the goal utility dependencies. This encodes the previous pruning approach as an IP, now including goal utility dependencies.
[Figure: the same rover relaxed plan as before; the IP selects the goal subset with maximum net benefit.]
[Do, Benton, van den Briel & Kambhampati IJCAI 2007]
BBOP-LP

[Figure: domain transition graphs (DTGs) over locations loc1 and loc2. DTG-Truck1: values 1 and 2 connected by Drive(l1,l2) and Drive(l2,l1). DTG-Package1: values 1, 2, and T (in the truck), connected by Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2).]

The heuristic h_LP^GAI models the problem as a network flow over multi-valued variables (capturing mutexes). It relaxes the action order and solves the LP relaxation, generating an admissible heuristic. Each state keeps the same model; only the initial flow is updated per state.
[Benton, van den Briel & Kambhampati ICAPS 2007]
Heuristic as an Integer Program

Constraints of this heuristic (over 0/1 variables; M is a large constant):
1. If an action executes, then so do all of its effects and prevail conditions:
   action(a) = Σ_{effects of a in v} effect(a,v,e) + Σ_{prevails of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be re-added to be a value at the end:
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved:
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
4. A goal utility dependency is achieved iff all of its goals are achieved:
   goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) − (|G_k| − 1)
   goaldep(k) ≤ endvalue(v,f)  ∀ f in dependency k
[Benton, van den Briel & Kambhampati ICAPS 2007]
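Constraint 4 is the standard linearization of a conjunction; a brute-force check (illustrative, not from the slides' solver) confirms that at any feasible 0/1 point it forces goaldep to equal the AND of its goals' end values.

```python
from itertools import product

# Sanity check of constraint 4's linearization of
# "goaldep(k) = AND of the endvalues of the goals in G_k":
#   goaldep >= sum(endvalues) - (|Gk| - 1)   and   goaldep <= each endvalue.
def feasible_goaldep_values(endvalues):
    """All 0/1 values of goaldep satisfying both constraints."""
    n = len(endvalues)
    return [y for y in (0, 1)
            if y >= sum(endvalues) - (n - 1) and all(y <= e for e in endvalues)]

for endvalues in product((0, 1), repeat=3):
    assert feasible_goaldep_values(list(endvalues)) == [int(all(endvalues))]
print("linearization forces goaldep = AND(endvalues)")
```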
Relaxed Plan Lookahead

[Figure: a search tree over rover states such as {at(α), Soil} and {at(β), Soil, Rock}. From each evaluated state, the actions of its relaxed plan (e.g., Move(α,β), Sample(Rock,β)) are applied in sequence as "lookahead actions," reaching deeper states without expanding every intermediate node.]
[similar to Vidal 2004] [Benton, van den Briel & Kambhampati ICAPS 2007]
Results: h_LP^GAI

[Charts: solution quality on Rovers, Satellite, and Zenotravel (higher is better); the heuristic found the optimal solution in 15 instances.]
[Benton, van den Briel & Kambhampati ICAPS 2007]
Stage Search for PSP

Adopts the Stage algorithm [Boyan & Moore 2000], originally used for optimization problems. It combines a search strategy with restarts, where the restart points come from a value function learned during previous search. Stage originally used hand-crafted features; we derive features automatically. [Yoon, Benton, Kambhampati ICAPS 2008]
• O-Search: A* search; use its search tree to learn a new value function V.
• S-Search: hill-climbing search; using V, find a state S from which to restart O-Search.
[Results chart: Rovers domain.]
Compilation

[Diagram: PDDL3-SP, the planning competition's "simple preferences" language, compiles into PSP net benefit [Benton, Do & Kambhampati 2006, 2009] and into cost-based planning [Keyder & Geffner 2007, 2009; Benton, Do & Kambhampati 2009], both of which directly use AI planning methods. PSP net benefit in turn compiles into bounded-length optimal formulations: integer programming [van den Briel, et al. 2004], weighted MaxSAT [Russell & Holden 2010], and Markov decision processes [van den Briel, et al. 2004].]
Also: full PDDL3 to metric planning for symbolic breadth-first search [Edelkamp 2006].
PDDL3-SP to PSP / Cost-Based Planning

Soft goals in PDDL3-SP (minimizes violation cost):

(:goal (preference P0A (stored goods1 level1)))
(:metric minimize (+ (* 5 (is-violated P0A))))

Compilation to PSP net benefit (maximizes net benefit); actions that delete the goal also delete the "has preference" fact:

(:action p0a
  :parameters ()
  :precondition (and (stored goods1 level1))
  :effect (and (hasPref-p0a)))
(:goal ((hasPref-p0a) 5.0))

Compilation to cost-based planning:

(:action p0a-0
  :parameters ()
  :cost 0.0
  :precondition (and (stored goods1 level1))
  :effect (and (hasPref-p0a)))
(:action p0a-1
  :parameters ()
  :cost 5.0
  :precondition (and (not (stored goods1 level1)))
  :effect (and (hasPref-p0a)))
(:goal (hasPref-p0a))

There is a 1-to-1 mapping between optimal solutions that achieve the "has preference" goal once.
[Benton, Do & Kambhampati 2006, 2009]
Temporal Planning

[Figure: temporal planning problems on two axes. System dynamics: temporally simple vs. temporally expressive. Optimization metrics: any feasible plan, shortest makespan, deadlines with discrete costs, deadlines with continuous costs. PSP here targets the continuous-cost end of that spectrum.]
[Benton, Coles and Coles ICAPS 2012; best student paper award]
The Dilemma of the Perishable Food: Continuous Case

Apples last ~20 days, oranges ~15 days, blueberries ~10 days.
[Figure: cost as a function of goal achievement time, zero before a soft deadline and rising to a maximum cost at a hard deadline. A delivery map over locations α, β, γ with legs of 3 to 7 days, and goals to deliver apples, oranges, and blueberries.]
Makespan ≠ Plan Utility

[Figure: two plans for the perishable-food problem. Visiting α, β, γ gives makespan 15 with time-on-shelf 13 + 0 + 0 = 13; visiting β, γ, α gives makespan 16 but time-on-shelf 4 + 6 + 4 = 14. The plan with the shorter makespan is not the plan with the higher utility.]
Solving for the Continuous Case

Two ways of handling continuous costs: model the continuous costs directly, or compile them into discretized cost functions (PDDL3 preferences).
[Benton, Coles and Coles ICAPS 2012]
Handling Continuous Costs

Model passing time as a PDDL+ process and add a "collect cost" action per goal. Its precondition is the original goal fact (e.g., at(apples, α)); its effect adds a new goal fact, collected_at(apples, α), with conditional effects that charge cost according to the goal achievement time t_g:
• t_g < d: cost 0
• d ≤ t_g < d + c: cost f(t,g), the rising part of the curve
• t_g ≥ d + c: cost(g), the maximum cost
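The three-case cost above can be sketched as a function of achievement time; the linear ramp between the two deadlines is an assumed shape, and the blueberry numbers are illustrative.

```python
# Time-dependent goal cost: zero before the soft deadline d, a rising penalty
# between d and d+c, and the full cost(g) after the hard deadline d+c.
# The linear ramp is an assumption; any monotone f(t,g) fits the scheme.
def goal_cost(t, d, c, max_cost):
    """Cost charged if the goal is achieved at time t."""
    if t < d:                    # before the soft deadline: no penalty
        return 0.0
    if t < d + c:                # between deadlines: linear ramp (assumption)
        return max_cost * (t - d) / c
    return max_cost              # at or after the hard deadline: maximum cost

# Blueberries: soft deadline day 10, full spoilage cost 100.0 by day 15.
print(goal_cost(8, 10, 5, 100.0))     # 0.0
print(goal_cost(12.5, 10, 5, 100.0))  # 50.0
print(goal_cost(20, 10, 5, 100.0))    # 100.0
```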
“Anytime” Search Procedure

• Enforced hill-climbing search finds an incumbent solution P.
• Restart with best-first branch-and-bound: prune using cost(P), with an admissible heuristic for the pruning.
[Benton, Coles and Coles ICAPS 2012]
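A minimal sketch of the branch-and-bound phase on an abstract search space; the toy graph and zero heuristic are made up, and for brevity the search starts with an infinite bound rather than an incumbent from hill-climbing.

```python
import heapq

# Best-first branch-and-bound: prune any node whose admissible lower bound
# meets or exceeds the best solution cost found so far, and keep tightening
# that bound whenever a cheaper solution is reached.
def branch_and_bound(start, successors, is_goal, h, incumbent_cost):
    best = incumbent_cost
    frontier = [(h(start), 0.0, 0, start)]   # (f, g, tiebreak, state)
    counter = 1
    while frontier:
        fval, g, _, state = heapq.heappop(frontier)
        if fval >= best:          # prune: cannot beat the incumbent
            continue
        if is_goal(state):
            best = g              # tighter incumbent; keep searching
            continue
        for nxt, cost in successors(state):
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost, counter, nxt))
            counter += 1
    return best

# Toy graph: A -> B (5) -> G (5), and A -> G (12); h = 0 is trivially admissible.
succ = {"A": [("B", 5.0), ("G", 12.0)], "B": [("G", 5.0)], "G": []}
best = branch_and_bound("A", lambda s: succ[s], lambda s: s == "G",
                        lambda s: 0.0, incumbent_cost=float("inf"))
print(best)  # 10.0
```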
Compile to Discretized Cost

[Figure: the continuous cost curve f(t,g), rising from 0 at deadline d to cost(g) at d + c, approximated by a step function.]
Discretized Compilation

[Figure: the curve is covered by three step functions f1(t,g), f2(t,g), f3(t,g) with deadlines d1, d2, d3, each jumping to a share of cost(g).]
Final Discretized Compilation

fd(t,g) = f1(t,g) + f2(t,g) + f3(t,g)

[Figure: the summed step function fd(t,g) climbs from 0 at d1 to cost(g) at d3, tracking f(t,g).] What's the best granularity?
[Benton, Coles and Coles ICAPS 2012]
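The summed-step construction can be sketched directly; the deadlines and cost shares below are illustrative, not the slides' values.

```python
# Discretized compilation: each f_i jumps to a share of cost(g) at its own
# deadline d_i, so fd(t,g) = f1 + f2 + f3 staircases up to the full cost(g).
# In the compilation, each step corresponds to one PDDL3 preference.
def step(t, deadline, amount):
    return amount if t >= deadline else 0.0

def fd(t, steps):
    """Discretized cost: sum of step functions."""
    return sum(step(t, d, amt) for d, amt in steps)

# Illustrative: cost(g) = 100.0, reached in three steps at d1=10, d2=12, d3=14.
steps = [(10, 30.0), (12, 30.0), (14, 40.0)]
print(fd(9, steps))   # 0.0
print(fd(11, steps))  # 30.0
print(fd(13, steps))  # 60.0
print(fd(15, steps))  # 100.0
```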
The Discretization (Dis)advantage

[Figure: two plans land on the same step of fd(t,g).] Under the discretization we can prune one of them if the other is found first, and with the admissible heuristic we can do this early enough to reduce the search effort.
The Discretization (Dis)advantage, continued

[Figure: the true continuous cost function f(t,g) over the same interval.] But pruning on the discretized steps will miss a better plan that the real cost function would distinguish.
Continuous vs. Discretization: The Contenders

• Continuous advantage: more accurate solutions; represents the actual cost functions.
• Discretized advantage: "faster" search; looks for bigger jumps in quality.
Continuous + Discrete-Mimicking Pruning

Combine the two: keep the continuous representation (more accurate solutions, actual cost functions) and add a tiered search that mimics discrete pruning ("faster" search, looking for bigger jumps in quality).
Tiered Approach

Sequential pruning bounds: we heuristically prune relative to the cost sol of the best plan found so far. Given an incumbent s1 with Cost(s1) = sol = 128, search first prunes states whose admissible bound is ≥ sol − sol/2, then ≥ sol − sol/4, then ≥ sol − sol/8, then ≥ sol − sol/16, and finally ≥ sol. Early tiers demand large jumps in quality, as the discretization does; the final tier is standard branch-and-bound pruning, so no better solution is ultimately missed.
[Figure sequence: the pruning threshold descending the continuous cost curve toward the incumbent's value.]
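The schedule of tier bounds can be sketched as a generator; the number of tiers before falling back to plain branch-and-bound is a parameter (four here, matching the slides' sequence for sol = 128).

```python
# Tiered pruning schedule: with incumbent cost sol, first prune anything not
# at least sol/2 better than the incumbent, then sol/4 better, and so on,
# ending with ordinary branch-and-bound pruning at the incumbent cost itself.
def tiered_thresholds(sol, tiers=4):
    """Yield pruning bounds sol - sol/2, sol - sol/4, ..., then sol."""
    for i in range(1, tiers + 1):
        yield sol - sol / (2 ** i)
    yield sol

print(list(tiered_thresholds(128)))  # [64.0, 96.0, 112.0, 120.0, 128]
```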
Summary

Partial satisfaction planning is ubiquitous, foregrounds plan quality, and is present in many applications; its challenges span both modeling and solving. This work extended state-of-the-art methods to handle PSP problems with goal utility dependencies and PSP problems involving soft deadlines.
Other Work

In looking at PSP:
• Anytime search: minimizing time between solutions [Thayer, Benton & Helmert SoCS 2012; best student paper]
• Online anticipatory planning [Burns, Benton, Ruml, Do & Yoon ICAPS 2012]
• Planning for human-robot teaming [Talamadupula, Benton, et al. TIST 2010]
• G-value plateaus: a challenge for planning [Benton, et al. ICAPS 2010]
• Cost-based satisficing search considered harmful [Cushing, Benton & Kambhampati SoCS 2010]
Ongoing Work in PSP

• More complex time-dependent costs (e.g., non-monotonic costs, time windows, goal-achievement-based cost functions)
• Multi-objective (e.g., multiple-resource) plan quality measures
References

F. Bacchus, A. Grove. Graphical Models for Preference and Utility. In UAI 1995.
J. Benton, M. Do, S. Kambhampati. Over-subscription Planning with Metric Goals. In IJCAI 2005.
J. Benton, M. Do, S. Kambhampati. Anytime Heuristic Search for Partial Satisfaction Planning. Artificial Intelligence Journal, 173:562-592, April 2009.
J. Benton, M. van den Briel, S. Kambhampati. A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial Satisfaction Planning. In ICAPS 2007.
J. Benton, J. Baier, S. Kambhampati. Tutorial on Preferences and Partial Satisfaction in Planning. AAAI 2010.
J. Benton, A. J. Coles, A. I. Coles. Temporal Planning with Preferences and Time-Dependent Continuous Costs. In ICAPS 2012.
J. Boyan, A. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research, 1:77-112, 2000.
M. Do, S. Kambhampati. Planning Graph-based Heuristics for Cost-sensitive Temporal Planning. In AIPS 2002.
M. Do, J. Benton, M. van den Briel, S. Kambhampati. Planning with Goal Utility Dependencies. In IJCAI 2007.
M. Do, T. Zimmerman, S. Kambhampati. Tutorial on Over-subscription Planning and Scheduling. AAAI 2007.
S. Edelkamp, P. Kissmann. Optimal Symbolic Planning with Action Costs and Preferences. In IJCAI 2009.
E. Keyder, H. Geffner. Soft Goals Can Be Compiled Away. Journal of Artificial Intelligence Research, 36:547-556, September 2009.
W. Ruml, M. Do, M. Fromherz. On-line Planning and Scheduling for High-speed Manufacturing. In ICAPS 2005.
R. Russell, S. Holden. Handling Goal Utility Dependencies in a Satisfiability Framework. In ICAPS 2010.
R. Sanchez, S. Kambhampati. Planning Graph Heuristics for Selecting Objectives in Over-subscription Planning Problems. In ICAPS 2005.
H. Simon. On the Concept of Organizational Goal. Administrative Science Quarterly, 9:1-22, June 1964.
H. Simon. Motivational and Emotional Controls of Cognition. Psychological Review, 74:29-39, 1967.
D. Smith. Choosing Objectives in Over-subscription Planning. In ICAPS 2004.
D. Smith. "Mystery Talk". PLANET Planning Summer School 2003.
K. Talamadupula, J. Benton, P. Schermerhorn, M. Scheutz, S. Kambhampati. Integrating a Closed World Planner with an Open-World Robot. In AAAI 2010.
M. van den Briel, R. Sanchez, M. Do, S. Kambhampati. Effective Approaches for Partial Satisfaction (Over-subscription) Planning. In AAAI 2004.
M. van den Briel, T. Vossen, S. Kambhampati. Reviving Integer Programming Approaches for AI Planning: A Branch-and-Cut Framework. In ICAPS 2005.
V. Vidal. A Lookahead Strategy for Heuristic Search Planning. In ICAPS 2004.
S. Yoon, J. Benton, S. Kambhampati. An Online Learning Method for Improving Over-subscription Planning. In ICAPS 2008.