Partial Satisfaction Planning: Representations and Solving Methods
Dissertation Defense
J. Benton ([email protected])
Committee: Subbarao Kambhampati, Chitta Baral, Minh B. Do, David E. Smith, Pat Langley
Classical vs. Partial Satisfaction Planning (PSP)

Classical Planning
• Initial state
• Set of goals
• Actions
Find a plan that achieves all goals (prefer plans with fewer actions).

Partial Satisfaction Planning
• Initial state
• Goals with differing utilities
• Goals have utility/cost interactions
• Utilities may be deadline-dependent
• Actions with differing costs
Find a plan with highest net benefit (cumulative utility minus cumulative cost); the best plan may not achieve all the goals.
Partial Satisfaction / Over-Subscription Planning

Traditional planning problems: find the shortest (lowest-cost) plan that satisfies all the given goals.
PSP: find the highest-utility plan given the resource constraints. Goals have utilities and actions have costs.

PSP arises naturally in many real-world planning scenarios:
• Mars rovers attempting to maximize scientific return, given resource constraints
• UAVs attempting to maximize reconnaissance returns, given fuel and other constraints
• Logistics problems with resource constraints arising for a variety of reasons

Why not simply achieve everything? Constraints on the agent's resources; conflicting goals with complex inter-dependencies between goal utilities; deadlines. [IJCAI 2005; IJCAI 2007; ICAPS 2007; AIJ 2009; IROS 2009; ICAPS 2012]
The Scalability Bottleneck

Before: 6-10 action plans in minutes. In the last dozen years: 100-action plans in seconds. We have figured out how to scale plan synthesis, up to realistic encodings of (some of) the Munich airport. The primary revolution in planning has been search control methods for scaling plan synthesis.
[Figure: planning problem classes arranged on two axes. Optimization metrics: any (feasible) plan, shortest plan, cheapest plan, highest net benefit. System dynamics: classical, temporal, metric, metric-temporal, non-deterministic, partially observable, stochastic. Traditional planning sits at the weaker end of the optimization-metric axis; PSP targets highest net benefit.]
Agenda
• In proposal: a quick history of partial satisfaction planning; PSP and utility dependencies [IPC 2006; IJCAI 2007; ICAPS 2007]; study of compilation methods [AIJ 2009]
• Completed proposed work: time-dependent goals [ICAPS 2012, best student paper award]
An Abbreviated Timeline of PSP
1964 – Herbert Simon – "On the Concept of Organizational Goals"
1967 – Herbert Simon – "Motivational and Emotional Controls of Cognition"
1990 – Feldman & Sproull – "Decision Theory: The Hungry Monkey"
1993 – Haddawy & Hanks – "Utility Models … for Planners"
2003 – David Smith – "Mystery Talk" at the Planning Summer School
2004 – David Smith – Choosing objectives for over-subscription planning
2004 – van den Briel et al. – Effective methods for PSP
2005 – Benton et al. – Metric preferences
2006 – PDDL3 / International Planning Competition – many planners, another preference language (YochanPS: distinguished performance award)
2007 – Benton et al. / Do, Benton et al. – goal utility dependencies and reasoning with them
2008 – Yoon, Benton & Kambhampati – Stage search for PSP
2009 – Benton, Do & Kambhampati – analysis of SapaPS; compiling PDDL3 to PSP / cost planning
2010 – Benton & Baier, Kambhampati – AAAI tutorial on PSP / preference planning
2010 – Talamadupula, Benton et al. – using PSP in open-world planning
2012 – Burns, Benton et al. – anticipatory on-line planning
2012 – Benton et al. – temporal planning with time-dependent continuous costs (best student paper award)
Net Benefit

Soft goals with rewards: r(Have(Soil)) = 25, r(Have(Rock)) = 50, r(Have(Image)) = 30
Actions with costs: c(Move(α,β)) = 10, c(Sample(Rock,β)) = 20
Objective function: find the plan P that maximizes r(P) − c(P). Not all goals can be achieved, due to costs and mutexes. [Smith, 2004; van den Briel et al. 2004]
[Figure: rover map over locations α, β, γ.]

As an extension from planning: the General Additive Independence model. Goal cost dependencies come from the plan; goal utility dependencies come from the user.
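As an illustrative sketch (not part of the defense itself), the net-benefit objective on the rover example can be computed directly; the rewards and action costs are the slide's numbers, while the candidate plan below is a hypothetical action sequence.

```python
# Net benefit of a plan: total reward of achieved soft goals minus total action
# cost. Rewards and costs follow the rover example; the plan is hypothetical.
reward = {"Have(Soil)": 25, "Have(Rock)": 50, "Have(Image)": 30}

def net_benefit(achieved_goals, action_costs):
    """r(P) - c(P): cumulative utility minus cumulative cost."""
    return sum(reward[g] for g in achieved_goals) - sum(action_costs)

# A plan that moves from alpha to beta and samples rock there:
# c(Move(alpha,beta)) = 10, c(Sample(Rock,beta)) = 20, achieving Have(Rock).
print(net_benefit({"Have(Rock)"}, [10, 20]))  # 50 - 30 = 20
```

A planner searching this space compares such values across candidate plans, including the empty plan with net benefit 0.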
Utility over sets of dependent goals: the General Additive Independence model [Bacchus & Grove 1995] sums local utility functions over goal subsets:

U(S) = Σ_{G_k ⊆ S} f(G_k)

Example: f(g1) = 15, f(g2) = 15, f({g1, g2}) = 20, so U({g1, g2}) = 15 + 15 + 20 = 50.
[Do, Benton, van den Briel & Kambhampati IJCAI 2007; Benton, van den Briel & Kambhampati ICAPS 2007]
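The GAI sum above can be sketched in a few lines; the local utilities are the slide's example numbers.

```python
# General Additive Independence (GAI) model [Bacchus & Grove 1995]: the utility
# of a goal set S is the sum of local utilities f(G_k) over every dependency
# set G_k contained in S. Numbers follow the slide's g1/g2 example.
f = {
    frozenset({"g1"}): 15,
    frozenset({"g2"}): 15,
    frozenset({"g1", "g2"}): 20,  # extra reward for achieving both together
}

def gai_utility(S):
    """U(S) = sum of f(G_k) over every dependency G_k with G_k a subset of S."""
    return sum(u for gk, u in f.items() if gk <= set(S))

print(gai_utility({"g1"}))        # 15
print(gai_utility({"g1", "g2"}))  # 15 + 15 + 20 = 50
```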
The PSP Dilemma

Choosing which goals to pursue is itself hard: it is impractical to find plans for all 2^n goal combinations (with 3 goals, 2^3 = 8 subsets; with 6 goals, 2^6 = 64).
[Figure: rover maps over locations α, β, γ illustrating the goal combinations.]
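The blowup is easy to see by enumeration; this toy sketch only counts subsets (the goal names are placeholders), whereas the naive approach would additionally solve one planning problem per subset.

```python
from itertools import combinations

# The "PSP dilemma" in miniature: naively, the best goal subset could be found
# by solving one planning problem per subset -- 2^n of them. Merely listing
# the subsets already shows the exponential growth.
def all_subsets(goals):
    return [set(c) for r in range(len(goals) + 1)
            for c in combinations(goals, r)]

print(len(all_subsets(["g1", "g2", "g3"])))                  # 2^3 = 8
print(len(all_subsets(["g1", "g2", "g3", "g4", "g5", "g6"])))  # 2^6 = 64
```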
Handling Goal Utility Dependencies

• As an optimization problem: encode the planning problem as an integer program (IP). The resulting planner extends the objective function over van den Briel's G1SC encoding.
• As a heuristic search problem: modify a heuristic search planner. This extends state-of-the-art heuristic search methods, changes the search methodology, and includes a suite of heuristics using integer and linear programming.
Heuristic Goal Selection

Step 1: Estimate the lowest-cost relaxed plan P+ achieving all goals.
Step 2: Build cost dependencies between the goals in P+.
Step 3: Find the optimal relaxed plan P+ using the goal utilities.
[Benton, Do & Kambhampati AIJ 2009; Do, Benton, van den Briel & Kambhampati IJCAI 2007]
Heuristic Goal Selection Process: No Utility Dependencies

[Figure sequence: a relaxed planning graph for the rover domain, with drive and sample actions annotated by cost. Propagating costs through the graph estimates the cheapest way to achieve each goal; each goal's reward is then compared with its estimated cost: 25 − 20 = 5, 30 − 55 = −25, 50 − 45 = 5, giving h = −15 if all goals are kept. Removing the goal with negative net benefit leaves 25 − 20 = 5 and 50 − 45 = 5, so h = 10.]

Heuristic from SapaPS. [Do & Kambhampati JAIR 2002; Benton, Do & Kambhampati AIJ 2009]
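A toy sketch of the pruning idea behind the SapaPS relaxed-plan heuristic, under a simplifying assumption: each goal is charged an independent cost share, whereas real relaxed plans share actions across goals. The rewards and cost estimates mirror the worked example above.

```python
# Toy sketch of SapaPS-style goal pruning: estimate each goal's achievement
# cost from the relaxed plan, then drop goals whose reward does not cover
# their cost. The per-goal cost split is a simplification for illustration.
rewards = {"have(soil)": 25, "have(image)": 30, "have(rock)": 50}
est_costs = {"have(soil)": 20, "have(image)": 55, "have(rock)": 45}

def prune_and_score(rewards, est_costs):
    """Keep goals with non-negative net benefit; sum their nets as h."""
    kept = {g for g in rewards if rewards[g] - est_costs[g] >= 0}
    h = sum(rewards[g] - est_costs[g] for g in kept)
    return kept, h

kept, h = prune_and_score(rewards, est_costs)
print(kept, h)  # have(image) dropped (30 - 55 = -25); h = 5 + 5 = 10
```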
Goal Selection with Dependencies: SPUDS (SapaPS with Utility DependencieS)

Step 1: Estimate the lowest-cost relaxed plan P+ achieving all goals.
Step 2: Build cost dependencies between the goals in P+.
Step 3: Find the optimal relaxed plan P+ using the goal utilities.

The heuristic h_relax^GAI uses an IP formulation to maximize net benefit, encoding the relaxed plan and the goal utility dependencies. This encodes the previous pruning approach as an IP, now including goal utility dependencies.
[Figure: the same rover relaxed plan as before; the IP selects the goal subset with maximum net benefit.]
[Do, Benton, van den Briel & Kambhampati IJCAI 2007]
BBOP-LP

[Figure: domain transition graphs (DTGs) over locations loc1 and loc2. DTG-Truck1: values 1 and 2 connected by Drive(l1,l2) and Drive(l2,l1). DTG-Package1: values 1, 2, and T (in the truck), connected by Load(p1,t1,l1), Load(p1,t1,l2), Unload(p1,t1,l1), Unload(p1,t1,l2).]

The heuristic h_LP^GAI models the problem as a network flow over multi-valued variables (capturing mutexes). It relaxes the action order and solves the LP relaxation, generating an admissible heuristic. Each state keeps the same model; only the initial flow is updated per state.
[Benton, van den Briel & Kambhampati ICAPS 2007]
Heuristic as an Integer Program

Constraints of this heuristic (over 0/1 variables; M is a large constant):
1. If an action executes, then so do all of its effects and prevail conditions:
   action(a) = Σ_{effects of a in v} effect(a,v,e) + Σ_{prevails of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be re-added to be a value at the end:
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved:
   1{f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
4. A goal utility dependency is achieved iff all of its goals are achieved:
   goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) − (|G_k| − 1)
   goaldep(k) ≤ endvalue(v,f)  ∀ f in dependency k
[Benton, van den Briel & Kambhampati ICAPS 2007]
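Constraint 4 is the standard linearization of a conjunction; a brute-force check (illustrative, not from the slides' solver) confirms that at any feasible 0/1 point it forces goaldep to equal the AND of its goals' end values.

```python
from itertools import product

# Sanity check of constraint 4's linearization of
# "goaldep(k) = AND of the endvalues of the goals in G_k":
#   goaldep >= sum(endvalues) - (|Gk| - 1)   and   goaldep <= each endvalue.
def feasible_goaldep_values(endvalues):
    """All 0/1 values of goaldep satisfying both constraints."""
    n = len(endvalues)
    return [y for y in (0, 1)
            if y >= sum(endvalues) - (n - 1) and all(y <= e for e in endvalues)]

for endvalues in product((0, 1), repeat=3):
    assert feasible_goaldep_values(list(endvalues)) == [int(all(endvalues))]
print("linearization forces goaldep = AND(endvalues)")
```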
Relaxed Plan Lookahead

[Figure: a search tree over rover states such as {at(α), Soil} and {at(β), Soil, Rock}. From each evaluated state, the actions of its relaxed plan (e.g., Move(α,β), Sample(Rock,β)) are applied in sequence as "lookahead actions," reaching deeper states without expanding every intermediate node.]
[similar to Vidal 2004] [Benton, van den Briel & Kambhampati ICAPS 2007]
Results: h_LP^GAI

[Charts: solution quality on Rovers, Satellite, and Zenotravel (higher is better); the heuristic found the optimal solution in 15 instances.]
[Benton, van den Briel & Kambhampati ICAPS 2007]
Stage Search for PSP

Adopts the Stage algorithm [Boyan & Moore 2000], originally used for optimization problems. It combines a search strategy with restarts, where the restart points come from a value function learned during previous search. Stage originally used hand-crafted features; we derive features automatically. [Yoon, Benton, Kambhampati ICAPS 2008]
• O-Search: A* search; use its search tree to learn a new value function V.
• S-Search: hill-climbing search; using V, find a state S from which to restart O-Search.
[Results chart: Rovers domain.]
Compilation

[Diagram: PDDL3-SP, the planning competition's "simple preferences" language, compiles into PSP net benefit [Benton, Do & Kambhampati 2006, 2009] and into cost-based planning [Keyder & Geffner 2007, 2009; Benton, Do & Kambhampati 2009], both of which directly use AI planning methods. PSP net benefit in turn compiles into bounded-length optimal formulations: integer programming [van den Briel, et al. 2004], weighted MaxSAT [Russell & Holden 2010], and Markov decision processes [van den Briel, et al. 2004].]
Also: full PDDL3 to metric planning for symbolic breadth-first search [Edelkamp 2006].
PDDL3-SP to PSP / Cost-Based Planning

Soft goals in PDDL3-SP (minimizes violation cost):

(:goal (preference P0A (stored goods1 level1)))
(:metric minimize (+ (* 5 (is-violated P0A))))

Compilation to PSP net benefit (maximizes net benefit); actions that delete the goal also delete the "has preference" fact:

(:action p0a
  :parameters ()
  :precondition (and (stored goods1 level1))
  :effect (and (hasPref-p0a)))
(:goal ((hasPref-p0a) 5.0))

Compilation to cost-based planning:

(:action p0a-0
  :parameters ()
  :cost 0.0
  :precondition (and (stored goods1 level1))
  :effect (and (hasPref-p0a)))
(:action p0a-1
  :parameters ()
  :cost 5.0
  :precondition (and (not (stored goods1 level1)))
  :effect (and (hasPref-p0a)))
(:goal (hasPref-p0a))

There is a 1-to-1 mapping between optimal solutions that achieve the "has preference" goal once.
[Benton, Do & Kambhampati 2006, 2009]
Temporal Planning

[Figure: temporal planning problems on two axes. System dynamics: temporally simple vs. temporally expressive. Optimization metrics: any feasible plan, shortest makespan, deadlines with discrete costs, deadlines with continuous costs. PSP here targets the continuous-cost end of that spectrum.]
[Benton, Coles and Coles ICAPS 2012; best student paper award]
The Dilemma of the Perishable Food: Continuous Case

Apples last ~20 days, oranges ~15 days, blueberries ~10 days.
[Figure: cost as a function of goal achievement time, zero before a soft deadline and rising to a maximum cost at a hard deadline. A delivery map over locations α, β, γ with legs of 3 to 7 days, and goals to deliver apples, oranges, and blueberries.]
Makespan ≠ Plan Utility

[Figure: two plans for the perishable-food problem. Visiting α, β, γ gives makespan 15 with time-on-shelf 13 + 0 + 0 = 13; visiting β, γ, α gives makespan 16 but time-on-shelf 4 + 6 + 4 = 14. The plan with the shorter makespan is not the plan with the higher utility.]
Solving for the Continuous Case

Two ways of handling continuous costs: model the continuous costs directly, or compile them into discretized cost functions (PDDL3 preferences).
[Benton, Coles and Coles ICAPS 2012]
Handling Continuous Costs

Model passing time as a PDDL+ process and add a "collect cost" action per goal. Its precondition is the original goal fact (e.g., at(apples, α)); its effect adds a new goal fact, collected_at(apples, α), with conditional effects that charge cost according to the goal achievement time t_g:
• t_g < d: cost 0
• d ≤ t_g < d + c: cost f(t,g), the rising part of the curve
• t_g ≥ d + c: cost(g), the maximum cost
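The three-case cost above can be sketched as a function of achievement time; the linear ramp between the two deadlines is an assumed shape, and the blueberry numbers are illustrative.

```python
# Time-dependent goal cost: zero before the soft deadline d, a rising penalty
# between d and d+c, and the full cost(g) after the hard deadline d+c.
# The linear ramp is an assumption; any monotone f(t,g) fits the scheme.
def goal_cost(t, d, c, max_cost):
    """Cost charged if the goal is achieved at time t."""
    if t < d:                    # before the soft deadline: no penalty
        return 0.0
    if t < d + c:                # between deadlines: linear ramp (assumption)
        return max_cost * (t - d) / c
    return max_cost              # at or after the hard deadline: maximum cost

# Blueberries: soft deadline day 10, full spoilage cost 100.0 by day 15.
print(goal_cost(8, 10, 5, 100.0))     # 0.0
print(goal_cost(12.5, 10, 5, 100.0))  # 50.0
print(goal_cost(20, 10, 5, 100.0))    # 100.0
```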
“Anytime” Search Procedure

• Enforced hill-climbing search finds an incumbent solution P.
• Restart with best-first branch-and-bound: prune using cost(P), with an admissible heuristic for the pruning.
[Benton, Coles and Coles ICAPS 2012]
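A minimal sketch of the branch-and-bound phase on an abstract search space; the toy graph and zero heuristic are made up, and for brevity the search starts with an infinite bound rather than an incumbent from hill-climbing.

```python
import heapq

# Best-first branch-and-bound: prune any node whose admissible lower bound
# meets or exceeds the best solution cost found so far, and keep tightening
# that bound whenever a cheaper solution is reached.
def branch_and_bound(start, successors, is_goal, h, incumbent_cost):
    best = incumbent_cost
    frontier = [(h(start), 0.0, 0, start)]   # (f, g, tiebreak, state)
    counter = 1
    while frontier:
        fval, g, _, state = heapq.heappop(frontier)
        if fval >= best:          # prune: cannot beat the incumbent
            continue
        if is_goal(state):
            best = g              # tighter incumbent; keep searching
            continue
        for nxt, cost in successors(state):
            heapq.heappush(frontier, (g + cost + h(nxt), g + cost, counter, nxt))
            counter += 1
    return best

# Toy graph: A -> B (5) -> G (5), and A -> G (12); h = 0 is trivially admissible.
succ = {"A": [("B", 5.0), ("G", 12.0)], "B": [("G", 5.0)], "G": []}
best = branch_and_bound("A", lambda s: succ[s], lambda s: s == "G",
                        lambda s: 0.0, incumbent_cost=float("inf"))
print(best)  # 10.0
```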
Compile to Discretized Cost

[Figure: the continuous cost curve f(t,g), rising from 0 at deadline d to cost(g) at d + c, approximated by a step function.]
Discretized Compilation

[Figure: the curve is covered by three step functions f1(t,g), f2(t,g), f3(t,g) with deadlines d1, d2, d3, each jumping to a share of cost(g).]
Final Discretized Compilation

fd(t,g) = f1(t,g) + f2(t,g) + f3(t,g)

[Figure: the summed step function fd(t,g) climbs from 0 at d1 to cost(g) at d3, tracking f(t,g).] What's the best granularity?
[Benton, Coles and Coles ICAPS 2012]
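The summed-step construction can be sketched directly; the deadlines and cost shares below are illustrative, not the slides' values.

```python
# Discretized compilation: each f_i jumps to a share of cost(g) at its own
# deadline d_i, so fd(t,g) = f1 + f2 + f3 staircases up to the full cost(g).
# In the compilation, each step corresponds to one PDDL3 preference.
def step(t, deadline, amount):
    return amount if t >= deadline else 0.0

def fd(t, steps):
    """Discretized cost: sum of step functions."""
    return sum(step(t, d, amt) for d, amt in steps)

# Illustrative: cost(g) = 100.0, reached in three steps at d1=10, d2=12, d3=14.
steps = [(10, 30.0), (12, 30.0), (14, 40.0)]
print(fd(9, steps))   # 0.0
print(fd(11, steps))  # 30.0
print(fd(13, steps))  # 60.0
print(fd(15, steps))  # 100.0
```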
The Discretization (Dis)advantage

[Figure: two plans land on the same step of fd(t,g).] Under the discretization we can prune one of them if the other is found first, and with the admissible heuristic we can do this early enough to reduce the search effort.
The Discretization (Dis)advantage, continued

[Figure: the true continuous cost function f(t,g) over the same interval.] But pruning on the discretized steps will miss a better plan that the real cost function would distinguish.
Continuous vs. Discretization: The Contenders

• Continuous advantage: more accurate solutions; represents the actual cost functions.
• Discretized advantage: "faster" search; looks for bigger jumps in quality.
Continuous + Discrete-Mimicking Pruning

Combine the two: keep the continuous representation (more accurate solutions, actual cost functions) and add a tiered search that mimics discrete pruning ("faster" search, looking for bigger jumps in quality).
Tiered Approach

Sequential pruning bounds: we heuristically prune relative to the cost sol of the best plan found so far. Given an incumbent s1 with Cost(s1) = sol = 128, search first prunes states whose admissible bound is ≥ sol − sol/2, then ≥ sol − sol/4, then ≥ sol − sol/8, then ≥ sol − sol/16, and finally ≥ sol. Early tiers demand large jumps in quality, as the discretization does; the final tier is standard branch-and-bound pruning, so no better solution is ultimately missed.
[Figure sequence: the pruning threshold descending the continuous cost curve toward the incumbent's value.]
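The schedule of tier bounds can be sketched as a generator; the number of tiers before falling back to plain branch-and-bound is a parameter (four here, matching the slides' sequence for sol = 128).

```python
# Tiered pruning schedule: with incumbent cost sol, first prune anything not
# at least sol/2 better than the incumbent, then sol/4 better, and so on,
# ending with ordinary branch-and-bound pruning at the incumbent cost itself.
def tiered_thresholds(sol, tiers=4):
    """Yield pruning bounds sol - sol/2, sol - sol/4, ..., then sol."""
    for i in range(1, tiers + 1):
        yield sol - sol / (2 ** i)
    yield sol

print(list(tiered_thresholds(128)))  # [64.0, 96.0, 112.0, 120.0, 128]
```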
Summary

Partial satisfaction planning is ubiquitous, foregrounds plan quality, and is present in many applications; its challenges span both modeling and solving. This work extended state-of-the-art methods to handle PSP problems with goal utility dependencies and PSP problems involving soft deadlines.
Other Work

In looking at PSP:
• Anytime search: minimizing time between solutions [Thayer, Benton & Helmert SoCS 2012; best student paper]
• Online anticipatory planning [Burns, Benton, Ruml, Do & Yoon ICAPS 2012]
• Planning for human-robot teaming [Talamadupula, Benton, et al. TIST 2010]
• G-value plateaus: a challenge for planning [Benton, et al. ICAPS 2010]
• Cost-based satisficing search considered harmful [Cushing, Benton & Kambhampati SoCS 2010]
Ongoing Work in PSP

• More complex time-dependent costs (e.g., non-monotonic costs, time windows, goal-achievement-based cost functions)
• Multi-objective (e.g., multiple-resource) plan quality measures
References

F. Bacchus, A. Grove. Graphical Models for Preference and Utility. In UAI 1995.
J. Benton, M. Do, S. Kambhampati. Over-subscription Planning with Metric Goals. In IJCAI 2005.
J. Benton, M. Do, S. Kambhampati. Anytime Heuristic Search for Partial Satisfaction Planning. Artificial Intelligence Journal, 173:562-592, April 2009.
J. Benton, M. van den Briel, S. Kambhampati. A Hybrid Linear Programming and Relaxed Plan Heuristic for Partial Satisfaction Planning. In ICAPS 2007.
J. Benton, J. Baier, S. Kambhampati. Tutorial on Preferences and Partial Satisfaction in Planning. AAAI 2010.
J. Benton, A. J. Coles, A. I. Coles. Temporal Planning with Preferences and Time-Dependent Continuous Costs. In ICAPS 2012.
J. Boyan, A. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research, 1:77-112, 2000.
M. Do, S. Kambhampati. Planning Graph-based Heuristics for Cost-sensitive Temporal Planning. In AIPS 2002.
M. Do, J. Benton, M. van den Briel, S. Kambhampati. Planning with Goal Utility Dependencies. In IJCAI 2007.
M. Do, T. Zimmerman, S. Kambhampati. Tutorial on Over-subscription Planning and Scheduling. AAAI 2007.
S. Edelkamp, P. Kissmann. Optimal Symbolic Planning with Action Costs and Preferences. In IJCAI 2009.
E. Keyder, H. Geffner. Soft Goals Can Be Compiled Away. Journal of Artificial Intelligence Research, 36:547-556, September 2009.
W. Ruml, M. Do, M. Fromherz. On-line Planning and Scheduling for High-speed Manufacturing. In ICAPS 2005.
R. Russell, S. Holden. Handling Goal Utility Dependencies in a Satisfiability Framework. In ICAPS 2010.
R. Sanchez, S. Kambhampati. Planning Graph Heuristics for Selecting Objectives in Over-subscription Planning Problems. In ICAPS 2005.
H. Simon. On the Concept of Organizational Goal. Administrative Science Quarterly, 9:1-22, June 1964.
H. Simon. Motivational and Emotional Controls of Cognition. Psychological Review, 74:29-39, 1967.
D. Smith. Choosing Objectives in Over-subscription Planning. In ICAPS 2004.
D. Smith. "Mystery Talk". PLANET Planning Summer School 2003.
K. Talamadupula, J. Benton, P. Schermerhorn, M. Scheutz, S. Kambhampati. Integrating a Closed World Planner with an Open-World Robot. In AAAI 2010.
M. van den Briel, R. Sanchez, M. Do, S. Kambhampati. Effective Approaches for Partial Satisfaction (Over-subscription) Planning. In AAAI 2004.
M. van den Briel, T. Vossen, S. Kambhampati. Reviving Integer Programming Approaches for AI Planning: A Branch-and-Cut Framework. In ICAPS 2005.
V. Vidal. A Lookahead Strategy for Heuristic Search Planning. In ICAPS 2004.
S. Yoon, J. Benton, S. Kambhampati. An Online Learning Method for Improving Over-subscription Planning. In ICAPS 2008.