Speeding Up the Calculation of Heuristics for Heuristic Search-Based Planning

Yaxin Liu, Sven Koenig and David Furcy
College of Computing
Georgia Institute of Technology
Atlanta, GA 30312-0280
{yxliu, skoenig, dfurcy}@cc.gatech.edu
Abstract
Heuristic search-based planners, such as HSP 2.0, solve STRIPS-style planning problems efficiently but spend about eighty percent of their planning time on calculating the heuristic values. In this paper, we systematically evaluate alternative methods for calculating the heuristic values for HSP 2.0 and demonstrate that the resulting planning times differ substantially. HSP 2.0 calculates each heuristic value by solving a relaxed planning problem with a dynamic programming method similar to value iteration. We identify two different approaches for speeding up the calculation of heuristic values, namely to order the value updates and to reuse information from the calculation of previous heuristic values. We then show how these two approaches can be combined, resulting in our PINCH method. PINCH outperforms both of the other approaches individually as well as the methods used by HSP 1.0 and HSP 2.0 for most of the large planning problems tested. In fact, it speeds up the planning time of HSP 2.0 by up to eighty percent in several domains and, in general, the amount of savings grows with the size of the domains, allowing HSP 2.0 to solve larger planning problems than was possible before in the same amount of time and without changing its overall operation.
Introduction

Heuristic search-based planners were introduced by (McDermott 1996) and (Bonet, Loerincs, & Geffner 1997) and are now very popular. Several of them entered the second planning competition at AIPS-2000, including HSP 2.0 (Bonet & Geffner 2001a), FF (Hoffmann & Nebel 2001a), GRT (Refanidis & Vlahavas 2001), and AltAlt (Nguyen, Kambhampati, & Nigenda 2002). Heuristic search-based planners perform a heuristic forward or backward search in the space of world states to find a path from the start state to a goal state. In this paper, we study HSP 2.0, a prominent heuristic search-based planner that won one of four honorable mentions for overall exceptional performance at the AIPS-2000 planning competition. It was one of the first planners that demonstrated how one can obtain informed heuristic values for STRIPS-style planning problems to make planning tractable. In its default configuration, it uses weighted A* searches with inadmissible heuristic values to perform forward searches in the space of world
Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
states. However, the calculation of heuristic values is time-consuming since HSP 2.0 calculates the heuristic value of each state that it encounters during the search by solving a relaxed planning problem. Consequently, its calculation of heuristic values comprises about 80 percent of its planning time (Bonet & Geffner 2001b). This suggests that one might be able to speed up its planning time by speeding up its calculation of heuristic values. Some heuristic search-based planners, for example, remove irrelevant operators before calculating the heuristic values (Hoffmann & Nebel 2001b), and other planners cache information obtained in a preprocessing phase to simplify the calculation of all heuristic values for a given planning problem (Bonet & Geffner 1999; Refanidis & Vlahavas 2001; Edelkamp 2001; Nguyen, Kambhampati, & Nigenda 2002).

Different from these approaches, we speed up HSP 2.0 without changing its heuristic values or overall operation. In this paper, we study whether different methods for calculating the heuristic values for HSP 2.0 result in different planning times. We systematically evaluate a large number of different methods and demonstrate that, indeed, the resulting planning times differ substantially. HSP 2.0 calculates each heuristic value by solving a relaxed planning problem with a dynamic programming method similar to value iteration. We identify two different approaches for speeding up the calculation of heuristic values, namely to order the value updates and to reuse information from the calculation of previous heuristic values by performing incremental calculations. Each of these approaches can be implemented individually in rather straightforward ways. The question arises, then, how to combine them and whether this is beneficial. Since the approaches cannot be combined easily, we develop a new method. The PINCH (Prioritized, INCremental Heuristics calculation) method exploits the fact that the relaxed planning problems that are used to determine the heuristic values for two different states are similar if the states are similar. Thus, we use a method from algorithm theory to avoid the parts of the plan-construction process that are identical to the previous ones and order the remaining value updates in an efficient way. We demonstrate that PINCH outperforms both of the other approaches individually as well as the methods used by HSP 1.0 and HSP 2.0 for most of the large planning problems tested. In fact, it speeds up the planning time of HSP 2.0 by up to eighty
                     Non-Incremental Computations    Incremental Computations
Unordered Updates    VI, HSP2                        IVI
Ordered Updates      GBF, HSP1, GD                   PINCH

Table 1: Classification of Heuristic-Calculation Methods
percent in several domains and, in general, the amount of savings grows with the size of the domains, allowing HSP 2.0 to solve larger planning problems in the same amount of time than was possible before and without changing its overall operation.
Heuristic Search-Based Planning: HSP 2.0
In this section, we describe how HSP 2.0 operates. We first describe the planning problems that it solves and then how it calculates its heuristic values. We follow the notation and description from (Bonet & Geffner 2001b).
The Problem
HSP 2.0 solves STRIPS-style planning problems with ground operators. Such a STRIPS-style planning problem consists of a set of propositions P that are used to describe the states and operators, a set of ground operators O, the start state I ⊆ P, and the partially specified goal G ⊆ P. Each operator o ∈ O has a precondition list Prec(o) ⊆ P, an add list Add(o) ⊆ P, and a delete list Del(o) ⊆ P. The STRIPS planning problem induces a graph search problem that consists of a set of states (vertices) 2^P, a start state I, a set of goal states {s ⊆ P | G ⊆ s}, and, for each state s ⊆ P, a set of actions (directed edges) A(s) = {o ∈ O | Prec(o) ⊆ s}, where action o ∈ A(s) transitions from state s to state (s \ Del(o)) ∪ Add(o) with cost one. All operator sequences (paths) from the start state to any goal state in the graph (plans) are solutions of the STRIPS-style planning problem. The shorter the path, the higher the quality of the solution.
The Method and its Heuristics

In its default configuration, HSP 2.0 performs a forward search in the space of world states using weighted A* (Pearl 1985) with inadmissible heuristic values. It calculates the heuristic value of a given state by solving a relaxed version of the planning problem, where it recursively approximates (by ignoring all delete lists) the cost of achieving each goal proposition individually from the given state and then combines the estimates to obtain the heuristic value of the given state. In the following, we explain the calculation of heuristic values in detail. We use g_s(p) to denote the approximate cost of achieving proposition p from state s, and g_s(o) to denote the approximate cost of achieving the preconditions of operator o from state s. HSP 2.0 defines these quantities recursively, for all p ∈ P and o ∈ O (the minimum of an empty set is defined to be infinity and an empty sum is defined to be zero):

    g_s(p) = 0                               if p ∈ s
    g_s(p) = min_{o ∈ O(p)} [1 + g_s(o)]     otherwise        (1)

    g_s(o) = Σ_{p ∈ Prec(o)} g_s(p)                           (2)

where O(p) = {o ∈ O | p ∈ Add(o)} denotes the set of operators that add proposition p.
procedure Main()
forever do
    set s to the state whose heuristic value needs to get computed next
    for each p ∈ P do set x_p := 0 if p ∈ s, else x_p := ∞
    for each o ∈ O do set x_o := ∞
    repeat
        for each o ∈ O do set x_o := Σ_{p ∈ Prec(o)} x_p
        for each p ∈ P \ s do set x_p := min_{o ∈ O(p)} [1 + x_o]
    until the values of all x_p and x_o remain unchanged during an iteration
    /* use h_add(s) = Σ_{p ∈ G} x_p */

Figure 1: VI
Then, the heuristic value h_add(s) of state s can be calculated as h_add(s) = Σ_{p ∈ G} g_s(p). This allows HSP 2.0 to solve large planning problems, although it is not guaranteed to find shortest paths.
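The relaxation and the summation over the goal propositions fit in a few lines. The following Python sketch (our own encoding of a relaxed problem as (precondition set, add set) pairs, not HSP code) computes h_add by sweeping until the values settle:

```python
INF = float("inf")

def h_add(state, operators, goal):
    """VI-style sketch of Equations 1 and 2 (function name and the operator
    encoding are ours, not HSP's). operators is a list of (precondition_set,
    add_set) pairs; the relaxation ignores delete lists entirely."""
    props = set().union(goal, state, *(pre | add for pre, add in operators))
    g = {p: (0 if p in state else INF) for p in props}   # g_s(p)
    changed = True
    while changed:                      # sweep until no value changes
        changed = False
        for pre, add in operators:
            g_o = sum(g[p] for p in pre)                 # Equation 2
            for p in add:
                if p not in state and 1 + g_o < g[p]:    # Equation 1
                    g[p] = 1 + g_o
                    changed = True
    return sum(g[p] for p in goal)      # h_add(s) = sum over goal propositions
```

For instance, with operators {a} → {b} and {a, b} → {c}, the goal {c} gets the value 2 from the state {a}.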
Calculation of Heuristics
In the next sections, we describe different methods that solve Equations 1 and 2. These methods are summarized in Table 1. To describe them, we use a variable p if its values are guaranteed to be propositions, a variable o if its values are guaranteed to be operators, and a variable q if its values can be either propositions or operators. We also use the variables x_p (a proposition variable) for g_s(p) and x_o (an operator variable) for g_s(o). In principle, the value of variable x_p satisfies x_p = g_s(p) after termination and the value of variable x_o satisfies x_o = g_s(o) after termination, except for some methods that terminate immediately after the variables x_p for p ∈ G have their correct values because this already allows them to calculate the heuristic value.
VI and HSP2: Simple Calculation of Heuristics
Figure 1 shows a simple dynamic programming method that solves the equations using a form of value iteration. This VI (Value Iteration) method initializes the variables x_p to zero if p ∈ s, and initializes all other variables to infinity. It then repeatedly sweeps over all variables and updates them according to Equations 1 and 2, until no value changes any longer.

HSP 2.0 uses a variant of VI that eliminates the variables x_o by combining Equations 1 and 2. Thus, this HSP2 method performs only the following operation as part of its repeat-until loop:

    for each p ∈ P \ s do set x_p := min_{o ∈ O(p)} [1 + Σ_{p' ∈ Prec(o)} x_{p'}]

HSP2 needs more time than VI for each sweep but reduces the number of sweeps. For example, it needs only 5.0 sweeps on average in the Logistics domain, while VI needs 7.9 sweeps.
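The combined update can be sketched the same way as VI, keeping only proposition values at the price of recomputing the precondition sums inside the minimization on every sweep (a sketch with our own (precondition set, add set) operator encoding, not HSP code):

```python
INF = float("inf")

def h_add_hsp2(state, operators, goal):
    """Sketch of the HSP2 variant (function name and signature are ours):
    Equations 1 and 2 are combined, so only proposition values are stored and
    each sweep recomputes the operator sums inline."""
    props = set().union(goal, state, *(pre | add for pre, add in operators))
    g = {p: (0 if p in state else INF) for p in props}
    changed = True
    while changed:
        changed = False
        for p in props:
            if p in state:
                continue
            # combined update: min over achievers of 1 + sum of precondition values
            new = min((1 + sum(g[q] for q in pre)
                       for pre, add in operators if p in add), default=INF)
            if new != g[p]:
                g[p] = new
                changed = True
    return sum(g[p] for p in goal)
```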
Speeding Up the Calculation of Heuristics
We investigate two orthogonal approaches for speeding up HSP 2.0's calculation of heuristic values, namely to order the value updates (ordered updates) and to reuse information from the calculation of previous heuristic values (incremental computations), as shown in Table 1. We then describe how to combine them.
procedure Main()
forever do
    set s to the state whose heuristic value needs to get computed next
    for each p ∈ P do set x_p := 0 if p ∈ s, else x_p := ∞
    for each o ∈ O do set x_o := ∞
    repeat
        for each o ∈ O do
            set x := Σ_{p ∈ Prec(o)} x_p
            if x ≠ x_o then
                set x_o := x
                for each p ∈ Add(o) do
                    set x_p := min(x_p, 1 + x_o)
    until the values of all x_o remain unchanged during an iteration
    /* use h_add(s) = Σ_{p ∈ G} x_p */

Figure 2: GBF
IVI: Reusing Results from Previous Searches
When HSP 2.0 calculates a heuristic value h_add(s), it solves equations in g_s(p) and g_s(o). When HSP 2.0 then calculates the heuristic value h_add(s') of some other state s', it solves equations in g_{s'}(p) and g_{s'}(o). If g_s(p) = g_{s'}(p) for some p and g_s(o) = g_{s'}(o) for some o, one could just cache these values during the calculation of h_add(s) and then reuse them during the calculation of h_add(s'). Unfortunately, it is nontrivial to determine which values remain unchanged. We explain later, in the context of our PINCH method, how this can be done. For now, we exploit the fact that the values g_s(p) and g_{s'}(p) for the same p and the values g_s(o) and g_{s'}(o) for the same o are often similar if s and s' are similar. Since HSP 2.0 calculates the heuristic values of all children of a state in a row when it expands the state, it often calculates the heuristic values of similar states in succession. This fact can be exploited by changing VI so that it initializes x_p to zero if p ∈ s but does not re-initialize the other variables, resulting in the IVI (Incremental Value Iteration) method. IVI repeatedly sweeps over all variables until no value changes any longer. If the number of sweeps becomes larger than a given threshold, then IVI terminates the sweeps and simply calls VI to determine the values. IVI needs fewer sweeps than VI if HSP 2.0 calculates the heuristics of similar states in succession because then the values of the corresponding variables tend to be similar as well. For example, IVI needs only 6.2 sweeps on average in the Logistics domain, while VI needs 7.9 sweeps.
GBF, HSP1 and GD: Ordering the Value Updates
VI and IVI sweep over all variables in an arbitrary order. Their number of sweeps can be reduced by ordering the variables appropriately, similarly to the way values are ordered in the Prioritized Sweeping method used for reinforcement learning (Moore & Atkeson 1993).

Figure 2 shows the GBF (Generalized Bellman-Ford) method, which is similar to the variant of the Bellman-Ford method in (Cormen, Leiserson, & Rivest 1990). It orders the value updates of the variables x_p. It sweeps over all variables x_o and updates their values according to Equation 2. If the update changes the value of variable x_o, GBF iterates over the variables x_p for the propositions in the add list of operator o and updates their values according to Equation 1. Thus, GBF updates the values of the variables x_p only if their values might have changed.
procedure Main()
forever do
    set s to the state whose heuristic value needs to get computed next
    for each p ∈ P do set x_p := 0 if p ∈ s, else x_p := ∞
    for each o ∈ O do set x_o := ∞
    set L := s
    while L ≠ ∅ do
        set O' := the set of operators with no preconditions or with a precondition in L
        for each o ∈ O' do set x_o := Σ_{p ∈ Prec(o)} x_p
        set L' := ∅
        for each o ∈ O' do
            for each p ∈ Add(o) do
                set x := 1 + x_o
                if x < x_p then
                    set x_p := x
                    set L' := L' ∪ {p}
        set L := L'
    /* use h_add(s) = Σ_{p ∈ G} x_p */

Figure 3: HSP1
Figure 3 shows the method used by HSP 1.0. The HSP1 method orders the value updates of the variables x_p and x_o, similar to the variant of the Bellman-Ford method in (Bertsekas 2001). It uses an unordered list to remember the propositions whose variables changed their values. It sweeps over all variables x_o that have these propositions as preconditions and updates their values according to Equation 2. After each update of the value of a variable x_o, HSP1 iterates over the variables x_p for the propositions in the add list of operator o and updates their values according to Equation 1. The unordered list is then updated to contain all propositions whose variables changed their values in this step. Thus, HSP1 updates the values of the variables x_p and x_o only if their values might have changed.

Finally, the GD (Generalized Dijkstra) method (Knuth 1977) is a generalization of Dijkstra's graph search method (Dijkstra 1959) and uses a priority queue to sweep over the variables x_p and x_o in the order of increasing values. Similar to HSP1, its priority queue contains only those variables x_p and x_o whose values might have changed. To make it even more efficient, we terminate it immediately once the values of all x_p for p ∈ G are known to be correct. We do not give pseudocode for GD because it is a special case of the PINCH method, which we discuss next.¹
PINCH: The Best of Both Worlds
The two approaches for speeding up the calculation of heuristic values discussed above are orthogonal. We now describe our main contribution in this paper, namely how to combine them. This is complicated by the fact that the approaches themselves cannot be combined easily because the methods that order the value updates exploit the property that the values of the variables cannot increase during each computation of a heuristic value. Unfortunately, this property no longer holds when reusing the values of the variables from the calculation of previous heuristic values. We

¹PINCH from Figure 5 reduces to GD if one empties its priority queue and reinitializes the variables x_p and x_o before SolveEquations() is called again, as well as modifies the while loop of SolveEquations() to terminate immediately once the values of all x_p for p ∈ G are known to be correct.
thus need to develop a new method based on a method from algorithm theory. The resulting PINCH (Prioritized, INCremental Heuristics calculation) method solves Equations 1 and 2 by calculating only those values g_s(p) and g_s(o) that are different from the corresponding values g_{s'}(p) and g_{s'}(o) from the calculation of previous heuristic values. PINCH also orders the variables so that it updates the value of each variable at most twice. We will demonstrate in the experimental section that PINCH outperforms the other methods in many large planning domains and is not worse in most other large planning domains.
DynamicSWSF-FP
In this section, we describe DynamicSWSF-FP (Ramalingam & Reps 1996), the method from algorithm theory that we extend to speed up the calculation of the heuristic values for HSP 2.0. We follow the presentation in (Ramalingam & Reps 1996). A function g(x_1, ..., x_n) over the non-negative reals extended with infinity is called a strict weakly superior function (in short: swsf) if, for every j ∈ {1, ..., n}, it is monotone non-decreasing in variable x_j and satisfies: g(x_1, ..., x_n) ≤ x_j implies g(x_1, ..., x_n) = g(x_1, ..., x_{j-1}, ∞, x_{j+1}, ..., x_n). The swsf fixed point (in short: swsf-fp) problem is to compute the unique fixed point of n equations, namely the equations x_i = g_i(x_1, ..., x_n), in the n variables x_1, ..., x_n, where the g_i are swsf for i = 1, ..., n. The dynamic swsf-fp problem is to maintain the unique fixed point of the swsf equations after some or all of the functions g_i have been replaced by other swsf's. DynamicSWSF-FP solves the dynamic swsf-fp problem efficiently by recalculating only the values of variables that change, rather than the values of all variables. The authors of DynamicSWSF-FP have proved its correctness, completeness, and other properties and applied it to grammar problems and shortest-path problems (Ramalingam & Reps 1996).
Terminology and Variables

We use the following terminology to explain how to use DynamicSWSF-FP to calculate the heuristic values for HSP 2.0. We use the variables rhs_q to keep track of the current values of the right-hand sides g_q(x_1, ..., x_n) of the equations. It always holds that rhs_q = g_q(x_1, ..., x_n) (Invariant 1). If x_q = rhs_q then q is called consistent. Otherwise it is called inconsistent. If q is inconsistent then either x_q < rhs_q, in which case we call q underconsistent, or x_q > rhs_q, in which case we call q overconsistent. We can therefore compare x_q and rhs_q to check whether q is overconsistent or underconsistent. We maintain a priority queue that always contains exactly the inconsistent q (to be precise: it stores q rather than x_q) with priorities min(x_q, rhs_q) (Invariant 2).
Transforming the Equations

We transform Equations 1 and 2 as follows to ensure that they specify a swsf-fp problem, for all p ∈ P and o ∈ O:

    x_p = 0                               if p ∈ s
    x_p = min_{o ∈ O(p)} [1 + x_o]        otherwise        (3)

    x_o = 1 + Σ_{p ∈ Prec(o)} x_p                          (4)
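The effect of the transformation can be checked numerically: solving both systems of equations by naive iteration on a toy relaxed problem shows that the transformed proposition values come out exactly twice the original ones, so the heuristic value is recoverable from them. The sketch below uses our own (precondition set, add set) operator encoding:

```python
INF = float("inf")

def fixed_point(state, operators, transformed):
    """Toy check of the transformation (function name is ours): naively iterate
    Equations 1-2 (transformed=False) or Equations 3-4 (transformed=True) to
    their fixed point and return the proposition values."""
    props = set().union(state, *(pre | add for pre, add in operators))
    g_p = {p: (0 if p in state else INF) for p in props}
    g_o = [INF] * len(operators)
    changed = True
    while changed:
        changed = False
        for i, (pre, _add) in enumerate(operators):
            new = sum(g_p[p] for p in pre) + (1 if transformed else 0)
            changed |= new != g_o[i]
            g_o[i] = new
        for p in props:
            if p in state:
                continue
            new = min((1 + g_o[i] for i, (_pre, add) in enumerate(operators)
                       if p in add), default=INF)
            changed |= new != g_p[p]
            g_p[p] = new
    return g_p
```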
procedure AdjustVariable(q)
    if q is a proposition variable x_p then
        if p ∈ s then set rhs_p := 0
        else set rhs_p := min_{o ∈ O(p)} [1 + x_o]
    else /* q is an operator variable x_o */
        set rhs_o := 1 + Σ_{p ∈ Prec(o)} x_p
    if q is in the priority queue then delete it
    if x_q ≠ rhs_q then insert q into the priority queue with priority min(x_q, rhs_q)

procedure SolveEquations()
    while the priority queue is not empty do
        delete the element with the smallest priority from the priority queue and assign it to q
        if x_q > rhs_q then /* q is overconsistent */
            set x_q := rhs_q
            if q is a proposition variable x_p then
                for each o such that p ∈ Prec(o) do AdjustVariable(x_o)
            else /* q is an operator variable x_o */
                for each p with p ∈ Add(o) do AdjustVariable(x_p)
        else /* q is underconsistent */
            set x_q := ∞
            AdjustVariable(q)
            if q is a proposition variable x_p then
                for each o such that p ∈ Prec(o) do AdjustVariable(x_o)
            else /* q is an operator variable x_o */
                for each p with p ∈ Add(o) do AdjustVariable(x_p)

procedure Main()
    empty the priority queue
    set s to the state whose heuristic value needs to get computed
    for each variable q do set x_q := ∞
    for each variable q do AdjustVariable(q)
    forever do
        SolveEquations()
        /* use h_add(s) = ½ Σ_{p ∈ G} x_p */
        set s' := s and set s to the state whose heuristic value needs to get computed next
        for each p ∈ (s' \ s) ∪ (s \ s') do AdjustVariable(x_p)

Figure 4: PINCH
The only difference is in the calculation of x_o. The transformed equations specify a swsf-fp problem since 0, min_{o ∈ O(p)} [1 + x_o], and 1 + Σ_{p ∈ Prec(o)} x_p are all swsf in the x_p and x_o for all p ∈ P and o ∈ O. This means that the transformed equations can be solved with DynamicSWSF-FP. They can be used to calculate h_add(s) since it is easy to show that x_p = 2 g_s(p) and thus h_add(s) = ½ Σ_{p ∈ G} x_p.
Calculation of Heuristics with DynamicSWSF-FP
We apply DynamicSWSF-FP to the problem of calculating the heuristic values of HSP 2.0, which reduces to solving the swsf fixed point problem defined by Equations 3 and 4. The solution is obtained by SolveEquations() shown in Figure 4. In the algorithm, each x_p is a variable that contains the value of the corresponding proposition, and each x_o is a variable that contains the value of the corresponding operator. PINCH calls AdjustVariable() for each variable q to ensure that Invariants 1 and 2 hold before it calls SolveEquations() for the first time. It needs to call AdjustVariable() only for those q whose function g_q has changed before it calls SolveEquations() again. The invariants will automatically continue to hold for all other q. If the state whose heuristic value needs to get computed changes from s' to s, then this changes only those functions that correspond to the right-hand side of Equation 3 for which p ∈ (s' \ s) ∪ (s \ s'), in other words, where p is no longer part of the state (and the corresponding variable thus is no longer clamped to zero) or just became part of it (and the corresponding variable thus just became clamped to zero). SolveEquations() then operates as follows. The x_q solve Equations 3 and 4 if they are all consistent. Thus, SolveEquations() adjusts the values of the inconsistent x_q. It always removes the q with the smallest priority from the priority queue.
procedure AdjustVariable(q)
    if x_q ≠ rhs_q then
        if q is not in the priority queue then insert it with priority min(x_q, rhs_q)
        else change the priority of q in the priority queue to min(x_q, rhs_q)
    else if q is in the priority queue then delete it

procedure SolveEquations()
    while the priority queue is not empty do
        assign the element with the smallest priority in the priority queue to q
        if q is a proposition variable x_p then
            if x_p > rhs_p then /* overconsistent */
                delete p from the priority queue
                set x_old := x_p
                set x_p := rhs_p
                for each o such that p ∈ Prec(o) do
                    if x_old ≠ ∞ then set rhs_o := rhs_o − x_old + x_p
                    else set rhs_o := 1 + Σ_{p' ∈ Prec(o)} x_{p'}
                    AdjustVariable(x_o)
            else /* underconsistent */
                set x_p := ∞
                AdjustVariable(x_p)
                for each o such that p ∈ Prec(o) do
                    set rhs_o := ∞
                    AdjustVariable(x_o)
        else /* q is an operator variable x_o */
            if x_o > rhs_o then /* overconsistent */
                delete o from the priority queue
                set x_o := rhs_o
                for each p with p ∈ Add(o) do
                    if 1 + x_o < rhs_p then
                        set rhs_p := 1 + x_o
                        AdjustVariable(x_p)
            else /* underconsistent */
                set x_old := x_o
                set x_o := ∞
                AdjustVariable(x_o)
                for each p with p ∈ Add(o) do
                    if rhs_p = 1 + x_old then
                        set rhs_p := min_{o' ∈ O(p)} [1 + x_{o'}]
                        AdjustVariable(x_p)

procedure Main()
    empty the priority queue
    set s to the state whose heuristic value needs to get computed
    for each variable q do set x_q := ∞ and rhs_q := ∞
    for each o ∈ O with Prec(o) = ∅ do set rhs_o := 1
    for each p ∈ s do set rhs_p := 0
    for each variable q do AdjustVariable(q)
    forever do
        SolveEquations()
        /* use h_add(s) = ½ Σ_{p ∈ G} x_p */
        set s' := s and set s to the state whose heuristic value needs to get computed next
        for each p ∈ s \ s' do
            set rhs_p := 0
            AdjustVariable(x_p)
        for each p ∈ s' \ s do
            set rhs_p := min_{o ∈ O(p)} [1 + x_o]
            AdjustVariable(x_p)

Figure 5: Optimized PINCH
If q is overconsistent, then SolveEquations() sets x_q to the value of rhs_q. This makes q consistent. Otherwise, q is underconsistent and SolveEquations() sets x_q to infinity. This makes q either consistent or overconsistent. In the latter case, it remains in the priority queue. Whether q was underconsistent or overconsistent, its value got changed. SolveEquations() then calls AdjustVariable() to maintain Invariants 1 and 2. Once the priority queue is empty, SolveEquations() terminates since all x_q are consistent and thus solve Equations 3 and 4. One can prove that it changes the value of each x_q at most twice, namely at most once when it is underconsistent and at most once when it is overconsistent, and thus terminates in finite time (Ramalingam & Reps 1996).
Algorithmic Optimizations
PINCH can be optimized further. Its main inefficiency is that it often iterates over a large number of propositions or operators. Consider, for example, the case where x_o has the smallest priority during an iteration of the while-loop in SolveEquations() and x_o > rhs_o. At some point in time, SolveEquations() then executes the following loop:

    for each p with p ∈ Add(o) do AdjustVariable(x_p)

The for-loop iterates over all propositions that satisfy its condition. For each of them, the call AdjustVariable(x_p) executes

    set rhs_p := min_{o' ∈ O(p)} [1 + x_{o'}]

The calculation of rhs_p therefore iterates over all operators that contain p in their add list. However, this iteration can be avoided. Since x_o > rhs_o according to our assumption, SolveEquations() sets the value of x_o to rhs_o and thus decreases it. All other values remain the same. Thus, rhs_p cannot increase and one can recalculate it faster as follows:

    set rhs_p := min(rhs_p, 1 + x_o)

Figure 5 shows PINCH after this and other optimizations. In the experimental section, we use this method rather than the unoptimized one to reduce the planning time. This reduces the planning time, for example, by 20 percent in the Logistics domain and up to 90 percent in the Freecell domain.
Summary of Methods
Table 1 summarizes the methods that we have discussed and classifies them according to whether they order the value updates (ordered updates) and whether they reuse information from the calculation of previous heuristic values (incremental computations).

Both VI and IVI perform full sweeps over all variables and thus perform unordered updates. IVI reuses the values of variables from the computation of the previous heuristic value and is an incremental version of VI. We included HSP2, although it is very similar to VI and belongs to the same class, because it is the method used by HSP 2.0. Its only difference from VI is that it eliminates all operator variables, which simplifies the code.

GBF, HSP1, and GD order the value updates. They are listed from left to right in order of increasing number of binary ordering constraints between value updates. GBF performs full sweeps over all operator variables interleaved with partial sweeps over those proposition variables whose values might have changed because they depend on operator variables whose values have just changed. HSP1, the method used by HSP 1.0, alternates partial sweeps over operator variables and partial sweeps over proposition variables. Finally, both GD and PINCH order the value updates completely and thus do not perform any sweeps. This enables GD to update the value of each variable only once and PINCH to update the value of each variable only twice. PINCH reuses the values of variables from the computation of the previous heuristic value and is an incremental version of GD.
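The combination in the last cell of Table 1 can be sketched compactly. The following Python class is our own simplified reconstruction of the prioritized, incremental computation (the class name, the lazy-deletion heap, and the full recomputation of right-hand sides in `_compute_rhs` are our simplifications; the optimized PINCH of Figure 5 additionally updates the right-hand sides incrementally). It keeps a val/rhs pair per proposition and operator variable, queues only the inconsistent ones with priority min(val, rhs), and reprocesses them in order across successive heuristic computations:

```python
import heapq
from itertools import count

INF = float("inf")

class IncrementalHadd:
    """Simplified sketch of a prioritized, incremental solver for the
    transformed equations. Variables keep their values across heuristic
    computations; only inconsistent ones (val != rhs) are reprocessed."""

    def __init__(self, operators, goal):
        self.ops, self.goal = operators, goal
        self.props = set().union(goal, *(pre | add for pre, add in operators))
        self.achievers = {p: [i for i, (pre, add) in enumerate(operators) if p in add]
                          for p in self.props}
        self.consumers = {p: [i for i, (pre, add) in enumerate(operators) if p in pre]
                          for p in self.props}
        self.state = set()
        # proposition variables are keyed by name, operator variables by index
        self.val = {q: INF for q in list(self.props) + list(range(len(operators)))}
        self.rhs = dict(self.val)
        self.heap, self.tick = [], count()
        for q in self.val:                       # establish Invariants 1 and 2
            self.rhs[q] = self._compute_rhs(q)
            self._adjust(q)

    def _compute_rhs(self, q):
        if isinstance(q, int):                   # operator variable, Equation 4
            return 1 + sum(self.val[p] for p in self.ops[q][0])
        if q in self.state:                      # proposition variable, Equation 3
            return 0
        return min((1 + self.val[o] for o in self.achievers[q]), default=INF)

    def _adjust(self, q):
        if self.val[q] != self.rhs[q]:           # inconsistent: (re)queue q
            heapq.heappush(self.heap,
                           (min(self.val[q], self.rhs[q]), next(self.tick), q))

    def _solve(self):
        while self.heap:
            key, _, q = heapq.heappop(self.heap)
            if self.val[q] == self.rhs[q] or key != min(self.val[q], self.rhs[q]):
                continue                         # stale queue entry, skip it
            if self.val[q] > self.rhs[q]:        # overconsistent: lower the value
                self.val[q] = self.rhs[q]
            else:                                # underconsistent: raise to infinity
                self.val[q] = INF
                self._adjust(q)
            # propagate to the variables whose right-hand side mentions q
            succ = self.ops[q][1] if isinstance(q, int) else self.consumers[q]
            for w in succ:
                self.rhs[w] = self._compute_rhs(w)
                self._adjust(w)

    def h_add(self, state):
        changed = self.state ^ set(state)        # only these rhs values changed
        self.state = set(state)
        for p in changed & self.props:
            self.rhs[p] = self._compute_rhs(p)
            self._adjust(p)
        self._solve()
        total = sum(self.val[p] for p in self.goal)
        return total / 2                         # transformed values are doubled
```

Successive calls to `h_add` on similar states then touch only the variables whose values actually change, which is the source of the savings reported below.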
[Figure 6 consists of nine plots, one per domain (Blocks World, Blocks World: Random Instances, Elevator, Freecell, Logistics, Schedule, Gripper, Mprime, and Mystery). Each plot shows the relative speedup in planning time over HSP2, in percent, as a function of the problem size (#P + #O) for PINCH, GBF, HSP1, GD, IVI, and VI.]
Figure 6: Relative Speedup in Planning Time over HSP2
The relative speedup in planning time of PINCH over HSP2 tends to increase with the problem size. For example, PINCH outperforms HSP2 by over 80 percent for large problems in several domains. According to theoretical results given in (Ramalingam & Reps 1996), the complexity of PINCH depends on the number of operator and proposition variables whose values change from the calculation of one heuristic value to the next. Since some methods do not use operator variables, we only counted the number of proposition variables whose values changed. Table 2 lists the resulting dissimilarity measure for the Logistics domain. In this domain, #CV/#P is a good predictor of the relative speedup in planning time of PINCH over HSP2, with a negative correlation coefficient of -0.96. More generally, the dissimilarity measure is a good predictor of the relative speedup in planning time of PINCH over HSP2 in all domains, with negative correlation coefficients ranging from -0.96 to -0.36. This insight can be used to explain the relatively poor scaling behavior of PINCH in the Freecell domain. In all other domains, the dissimilarity measure is negatively correlated with the problem size, with negative correlation coefficients ranging from -0.92 to -0.57. In the Freecell domain, however, the dissimilarity measure is positively correlated with the problem size, with a positive correlation coefficient of 0.68. The Freecell domain represents a solitaire game. As the problem size increases, the number of cards increases while the numbers of columns and free cells remain constant. This results in additional constraints on the values of both proposition and operator variables and thus in a larger number of value changes from one calculation of the heuristic value to the next. This increases the dissimilarity measure and thus decreases the relative speedup of PINCH over HSP2. Finally, there are problems that HSP 2.0 with PINCH could solve within the 10-minute time limit that neither the standard HSP 2.0 distribution nor HSP 2.0 with GBF could solve. The Logistics domain is such a domain.
Second, GBF is at least as fast as HSP1, and HSP1 is at least as fast as GD. Thus, the stricter the ordering of the variable updates among the three methods with ordered updates and non-incremental computations, the smaller their relative speedup over HSP2. This can be explained with the increasing overhead that results from the need to maintain the ordering constraints with increasingly complex data structures. GBF does not use any additional data structures, HSP1 uses an unordered list, and GD uses a priority queue. (We implemented the priority queue in turn as a binary heap, a Fibonacci heap, and a multi-level bucket to ensure that this conclusion is independent of its implementation.) Since PINCH without value reuse reduces to GD, our experiments demonstrate that it is value reuse that allows PINCH to counteract the overhead associated with the priority queue and makes the planning time of PINCH so competitive.
Conclusions
In this paper, we systematically evaluated methods for calculating the heuristic values for HSP 2.0 with the h_add heuristic and demonstrated that the resulting planning times differ substantially. We identified two different approaches for speeding up the calculation of the heuristic values, namely to order the value updates and to reuse information from the calculation of previous heuristic values. We then showed how these two approaches can be combined, resulting in our PINCH (Prioritized, INCremental Heuristics calculation) method. PINCH outperforms both of the other approaches individually as well as the methods used by HSP 1.0 and HSP 2.0 for most of the large planning problems tested. In fact, it speeds up the planning time of HSP 2.0 by up to eighty percent in several domains and, in general, the amount of savings grows with the size of the domains. This is an important property since, if a method has a high relative speedup for small problems, it wins by fractions of a second, which is insignificant in practice. However, if a method has a high relative speedup for large problems, it wins by minutes. Thus, PINCH allows HSP 2.0 to solve larger planning problems than was possible before in the same amount of time and without changing its operation. We also have preliminary results that show that PINCH speeds up HSP 2.0 with the h_max heuristic (Bonet & Geffner 2001b) by over 20 percent and HSP 2.0 with the h^2 heuristic (Haslum & Geffner 2000) by over 80 percent for small blocks-world instances. We are currently working on demonstrating similar savings for other heuristic search-based planners. For example, PINCH also applies in principle to the first stage of FF (Hoffmann & Nebel 2001a), where FF builds the planning graph.
Acknowledgments
We thank Blai Bonet and Hector Geffner for making the code of HSP 1.0 and HSP 2.0 available to us and for answering our questions. We also thank Sylvie Thiebaux and John Slaney for making their blocks-world planning task generator available to us. The Intelligent Decision-Making Group is partly supported by NSF awards to Sven Koenig under contracts IIS-9984827, IIS-0098807, and ITR/AP-0113881, as well as an IBM faculty partnership award. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, companies or the U.S. government.
References

Bertsekas, D. 2001. Dynamic Programming and Optimal Control. Athena Scientific, 2nd edition.

Bonet, B., and Geffner, H. 1999. Planning as heuristic search: New results. In Proceedings of the 5th European Conference on Planning, 360-372.

Bonet, B., and Geffner, H. 2001a. Heuristic search planner 2.0. Artificial Intelligence Magazine 22(3):77-80.

Bonet, B., and Geffner, H. 2001b. Planning as heuristic search. Artificial Intelligence - Special Issue on Heuristic Search 129(1):5-33.

Bonet, B.; Loerincs, G.; and Geffner, H. 1997. A robust and fast action selection mechanism. In Proceedings of the National Conference on Artificial Intelligence, 714-719.

Cormen, T.; Leiserson, C.; and Rivest, R. 1990. Introduction to Algorithms. MIT Press.

Dijkstra, E. 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1:269-271.

Edelkamp, S. 2001. Planning with pattern databases. In Proceedings of the 6th European Conference on Planning, 13-24.

Haslum, P., and Geffner, H. 2000. Admissible heuristics for optimal planning. In Proceedings of the International Conference on Artificial Intelligence Planning and Scheduling, 70-82.

Hoffmann, J., and Nebel, B. 2001a. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research 14:253-302.

Hoffmann, J., and Nebel, B. 2001b. RIFO revisited: Detecting relaxed irrelevance. In Proceedings of the 6th European Conference on Planning, 325-336.

Knuth, D. 1977. A generalization of Dijkstra's algorithm. Information Processing Letters 6(1):1-5.

McDermott, D. 1996. A heuristic estimator for means-ends analysis in planning. In Proceedings of the International Conference on Artificial Intelligence Planning and Scheduling, 142-149.

Moore, A., and Atkeson, C. 1993. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13(1):103-130.

Nguyen, X.; Kambhampati, S.; and Nigenda, R. S. 2002. Planning graph as the basis for deriving heuristics for plan synthesis by state space and CSP search. Artificial Intelligence 135(1-2):73-123.

Pearl, J. 1985. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley.

Ramalingam, G., and Reps, T. 1996. An incremental algorithm for a generalization of the shortest-path problem. Journal of Algorithms 21:267-305.

Refanidis, I., and Vlahavas, I. 2001. The GRT planning system: Backward heuristic construction in forward state-space planning. Journal of Artificial Intelligence Research 15:115-161.