
Variable Dependency in Local Search: Prevention is Better than Cure

Steve Prestwich

Overview

LS often scales up better than BT, but has the reputation of being inferior to BT on structured SAT instances: it does badly on SAT solver competition industrial benchmarks

does LS need a boost (cf clause learning)? hybridising LS with propagation or explicitly handling variable dependencies helps, but BT is still unbeaten on these problems

improving LS on structured problems would have many practical applications, perhaps solving larger instances of real-world applications than is currently possible

but first we must understand the cause of its poor performance

1

conjecture: current modelling practices are (unintentionally) biased in favour of BT:

• in SAT modelling we often eliminate symmetry, but symmetry appears harmless or even helpful to LS

• SAT encodings of constraints sometimes improve consistency reasoning of UP, but UP is not used in most LS algorithms (and the ladder structure in some of these encodings harms LS)

• dependent variables may be introduced when aiming for compactness (eg Tseitin)

2

more specific conjecture: the model feature to blame for LS’s poor performance is often dependent variables:

• dependencies are known to slow down LS [Kautz, McAllester & Selman], especially in long chains [Prestwich; Wei & Selman]

I test these conjectures by remodelling 2 problems whose large instances have long resisted solution by local search: parity learning & Towers of Hanoi as STRIPS planning

I devise new encodings with reduced variable dependency (and higher solution densities) and boost LS performance by several orders of magnitude in both cases: 32-bit & 6-disk instances are solved for the first time using a standard SAT local search algorithm (RSAPS)

3

parity learning

this is a well-known SAT benchmark

given vectors xi = (xi1, . . . , xin) (i = 1 . . . m) with each xij ∈ {0,1}, a vector y = (y1, . . . , ym) and an error tolerance integer k

find a vector a = (a1, . . . , an) s.t. |{i : parity(a · xi) ≠ yi}| ≤ k

to make hard instances set m = 2n and k = 7n/8

n is the number of bits
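as a specification check only (not a SAT encoding), a minimal Python sketch of the acceptance condition; the names disagreements, a, xs, y are hypothetical:

    # count how many sample parities a candidate vector a gets wrong;
    # a solves the instance iff disagreements(a, xs, y) <= k
    def disagreements(a, xs, y):
        # parity(a . xi) is the sum mod 2 of the bits of a selected by xi
        return sum(
            (sum(ai & xij for ai, xij in zip(a, xi)) % 2) != yi
            for xi, yi in zip(xs, y)
        )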

4

hard for both BT & LS, especially 32-bit instances:

• only quite recently solved by BT [Bailleux & Boufkhad; Baumgartner & Massacci; Li; Warners & van Maaren]

• only in IJCAI’07 solved by LS (specially-designed algorithm) [Pham, Thornton & Sattar]

I show that, after reformulation, an off-the-shelf LS (RSAPS) solves them in similar time

5

“standard encoding” (SATLIB parX-Y-c): 3 families of clauses:

• calculate parities of a · xi

• compute disagreements in parities

• encode a cardinality constraint to limit disagreements

(n is a power of 2 so cardinality is easy)

6

my encoding: variables Ai contain the solution, Pj denote parities

each scalar product a · xj has parity Pj:

Pj ≡ ⊕_{i∈τj} Ai   where τj = {i | xij = T}

≤ k of the m literals are true:

LE(k, π1, . . . , πm)   where πj is P̄j if yj = T and Pj if yj = F

new SAT-encodings of cardinality and parity with very short dependency chains: better for LS

7

cardinality constraint LE(k, π1, . . . , πm) says that ≤ k literals are T

I use a bitwise encoding from a CP’06 workshop paper that worked well on clique problems (but not yet tested against standard cardinality encodings)

first consider k = 1 (AMO): define variables bk (k = 1 . . . ⌈log2 m⌉) and add clauses

π̄i ∨ bk [or b̄k]

if bit k of the binary representation of i − 1 is 1 [or 0], where k = 1 . . . ⌈log2 m⌉

(O(log m) new variables and O(m log m) binary clauses)
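a minimal sketch of this bitwise AMO encoding; assumptions: literals are non-zero DIMACS-style integers, new_var() supplies fresh variable numbers, and 0-based indices play the role of i − 1:

    from math import ceil, log2

    def bitwise_amo(lits, new_var):
        # at most one of lits is true, using O(log m) bit variables
        nbits = max(1, ceil(log2(len(lits))))
        bits = [new_var() for _ in range(nbits)]
        clauses = []
        for i, lit in enumerate(lits):
            for k in range(nbits):
                # if lit is true, bit k of its index must hold:
                # clause (~lit v b_k) if the bit is 1, (~lit v ~b_k) if it is 0
                b = bits[k] if (i >> k) & 1 else -bits[k]
                clauses.append([-lit, b])
        return clauses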

8

now k > 1: suppose we have k bins, and define xij = T if πi is placed in bin j

every true πi is in a bin:   πi → ∨j xij

≤ 1 πi may be placed in each bin:   AMOi(xij) for each bin j, using the bitwise encoding

highly symmetric (πi can be permuted among bins) but symmetry and LS go well together
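continuing the sketch above (same assumptions; x[i][j] reads “πi is placed in bin j”):

    def le_k(lits, k, new_var):
        # <= k of lits are true: k bins, each holding at most one true literal
        m = len(lits)
        x = [[new_var() for _ in range(k)] for _ in range(m)]
        clauses = []
        for i, lit in enumerate(lits):
            # every true literal is in some bin: lit -> x[i][0] v ... v x[i][k-1]
            clauses.append([-lit] + x[i])
        for j in range(k):
            # at most one literal per bin, via the bitwise AMO sketch above
            clauses += bitwise_amo([x[i][j] for i in range(m)], new_var)
        return clauses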

9

parity constraints

(i) we can SAT-encode ⊕_{i=1..p} Pi ≡ k by enumeration: exponential encoding

(ii) decompose via new variables:

P1 ⊕ z1 ≡ k,   P2 ⊕ z2 ≡ z1,   . . . ,   Pp−1 ⊕ zp−1 ≡ zp−2,   Pp ≡ zp−1

and use the exponential encoding for the binary & ternary constraints: linear encoding (similar to the SATLIB encodings)

drawback: long chain of variable dependencies

10

(iii) bisect the constraint, solve 2 subproblems, and merge the results by a ternary constraint: the bisection encoding replaces a chain of length p by a tree of depth log p

(iv) decompose ⊕_{i=1..p} Pi ≡ k into

⊕_{i=1..α} Pi ≡ k1,   ⊕_{i=α+1..2α} Pi ≡ k2,   . . . ,   ⊕_{i=p−α+1..p} Pi ≡ kβ   and   ⊕_{i=1..β} ki ≡ k

where β = ⌈p/α⌉ and the tree branching factor α satisfies 1 < α < p; tree depth 2

exponentially encode the remaining parity constraints: shallow encoding
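a minimal sketch of the shallow encoding; assumptions: lits are positive DIMACS variable numbers, k is a constant parity bit, new_var() as before, and xor_clauses is the exponential encoding by enumeration:

    from itertools import product

    def xor_clauses(vs, parity):
        # exponential encoding of v1 ⊕ ... ⊕ vn ≡ parity (parity in {0,1}):
        # one clause ruling out each assignment of the wrong parity
        clauses = []
        for bits in product([0, 1], repeat=len(vs)):
            if sum(bits) % 2 != parity:
                clauses.append([v if b == 0 else -v for v, b in zip(vs, bits)])
        return clauses

    def shallow_parity(lits, k, alpha, new_var):
        # split into groups of size alpha, one fresh variable ki per group,
        # then a single top-level XOR over the ki: a tree of depth 2
        groups = [lits[i:i + alpha] for i in range(0, len(lits), alpha)]
        ks, clauses = [], []
        for g in groups:
            ki = new_var()
            ks.append(ki)
            clauses += xor_clauses(g + [ki], 0)   # ki ≡ ⊕ g
        clauses += xor_clauses(ks, k)             # ⊕ ki ≡ k
        return clauses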

11

results

bisection and linear encodings are very similar in flips and time to the SATLIB encodings (see paper): are trees as harmful as chains?

best results (β = 10, median secs to find a solution):

n    linear  shallow
8    0.00    0.00
12   0.02    0.03
16   11      0.47
20   408     11
24   —       272
28   —       3,640
32   —       49,633

12

extrapolation: 2 years for n=32 with the linear encoding

[Pham, Thornton & Sattar] results similar in flips & time (but unspecified machine)

best BT results are better: improve LS by similar preprocessing?

13

Towers of Hanoi

SAT-based STRIPS planning achieves very good results in competitions

ToH-as-STRIPS up to 6 discs has been solved by BT, but only up to 4 discs by LS (hardness increases rapidly)

perhaps because it has dependency chains and only 1 solution [Selman]

I design a new encoding that eliminates both these features

14

standard STRIPS as SAT

set an upper bound on the number of discrete times (the plan length)

define variables for (i) the state after each action, and (ii) the actions at each time

define clauses for (i) linearity (≤ 1 action at any time), (ii) actions imply their preconditions and effects, (iii) frame axioms (explanatory version: list the reasons a fluent can change at each time), (iv) the initial and goal states
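a hedged Python sketch of these clause families; the only assumptions are a lookup var(obj, t) mapping a fluent or action plus a time to a variable number, and actions carrying pre/add/dele fluent lists:

    def plan_clauses(fluents, actions, T, var):
        clauses = []
        for t in range(T):
            acts = [(a, var(a, t)) for a in actions]
            # (i) linearity: at most one action per time step (pairwise exclusion)
            for i in range(len(acts)):
                for j in range(i + 1, len(acts)):
                    clauses.append([-acts[i][1], -acts[j][1]])
            # (ii) actions imply their preconditions and effects
            for a, av in acts:
                clauses += [[-av, var(p, t)] for p in a.pre]
                clauses += [[-av, var(p, t + 1)] for p in a.add]
                clauses += [[-av, -var(p, t + 1)] for p in a.dele]
            # (iii) explanatory frame axioms: a fluent changes only if some
            # action at this time explains the change
            for f in fluents:
                adders = [av for a, av in acts if f in a.add]
                deleters = [av for a, av in acts if f in a.dele]
                clauses.append([var(f, t), -var(f, t + 1)] + adders)
                clauses.append([-var(f, t), var(f, t + 1)] + deleters)
        # (iv) initial and goal states are added as unit clauses (omitted here)
        return clauses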

a series of improvements for LS...

15

ToH domain knowledge

ToH is usually STRIPS-modelled like Blocks World: model which peg or disc each disc is on, and which pegs and discs are “clear”

instead we can just model which peg each disc is on: the order is implied (no large discs on small ones)

this gives a more compact STRIPS model: no “clear” predicate, no discs “on” other discs, fewer actions

16

superparallelism

an important technique in planning: parallel plans allow more than one action at a given time

plan lengths (and SAT models) can be shorter

it also increases the solution density of the SAT problem: a linear plan often corresponds to exponentially many parallel plans

there’s no parallelism in ToH but we can create some by removing some exclusion axioms, eg...

17

• allow disk 1 to move from peg 1 to peg 2, and disk 2 to move from peg 3 to peg 2, at the same time

• allow disk 1 to move from peg 1 to peg 2, and disk 2 to move from peg 2 to peg 3, at the same time

these parallel plans can be transformed to linear ones in polynomial time: superparallelism (adds parallelism beyond any that is naturally present in the model)

drawback: can’t insist on optimal plans

18

long-range dependencies

frame axioms create dependency chains

there seems to be no way to avoid these chains, as they are a property of the problem itself and not of the encoding

but we can break up the chain structure by using the method of [Wei & Selman]: add implied clauses to create long-range dependencies between times further apart than 1 unit

I use a generalisation of the explanatory frame axioms (GEF axioms) to time differences ≥ 1

adding all GEF axioms increases space complexity, but we can add a randomly-chosen subset of them

luckily Wei & Selman showed that adding a relatively small number was optimal, and I found the same (5%)
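a hedged sketch of adding a random subset of GEF axioms, reusing the assumed var() and pre/add/dele conventions from the planning sketch above; the 5% rate and the helper name are illustrative:

    import random

    def gef_clauses(fluents, actions, T, var, rate=0.05, seed=0):
        rng = random.Random(seed)
        clauses = []
        for f in fluents:
            for t in range(T):
                for t2 in range(t + 2, T + 1):   # time differences of 2 or more
                    if rng.random() >= rate:     # keep only a small random subset
                        continue
                    adders = [var(a, s) for s in range(t, t2)
                              for a in actions if f in a.add]
                    deleters = [var(a, s) for s in range(t, t2)
                                for a in actions if f in a.dele]
                    # if f becomes true between t and t2, some adding action occurred
                    clauses.append([var(f, t), -var(f, t2)] + adders)
                    # if f becomes false between t and t2, some deleting action occurred
                    clauses.append([-var(f, t), var(f, t2)] + deleters)
        return clauses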

19

implied clauses

I add exclusion axioms corresponding to two disks making the same move

(this can never occur because the larger disk’s preconditions are unsatisfied if the smaller one is on the same peg, so these clauses are redundant)

20

results

execution time (seconds):

discs  standard  compact  parallel  GEF
3      0.096     0.0058   0.0010    0.0010
4      —         5.8      0.0093    0.017
5      —         —        1.8       0.30
6      —         —        —         980

“—”: > 10^9 flips

each technique greatly improves performance (>5 orders of magnitude in flips for 4 discs)

first SAT LS results for 5 and 6 disks, comparable to the best BT results (though at the cost of reducing plan quality by superparallelism)

further improvements are possible by operator splitting (which reduces the space complexity of SAT-encoded planning problems) and by preprocessing with unit propagation and subsumption

21

results

also added GEF axioms to the standard model, but could not solve 4 disks

more results: AdaptNovelty+ and VW faster on compact than standard, even faster with superparallelism; AdaptNovelty+ faster with GEF axioms, VW hardly affected (apart from the overhead of maintaining the additional clauses); ZChaff, SATZ & SATO all improved by the compact encoding; ZChaff faster with superparallelism, SATZ and SATO slower; SATO faster with GEF axioms, ZChaff and SATZ slower

in other words: the compact encoding helps all algorithms, while the other techniques mostly help LS but are erratic on BT (modelling for LS is distinct from modelling for BT)

22

summary

LS on hard structured problems can be hugely boosted by reformulation

reducing dependency chains and increasing solution density seem to be key techniques when modelling for LS (but not for BT) [Minton, Johnston, Philips & Laird]

different from the aims of modelling for BT: symmetry elimination and consistency of unit propagation

modelling for LS is distinct from modelling for BT, and worth studying

23

aside: shouldn’t increased solution density also help BT?

not necessarily: structured SAT problems may contain clusters of solutions, and Minton et al.’s nonsystematic search hypothesis is that LS benefits more than BT from high solution density

this is because LS is largely immune to clustering while BT may start from a point far from any cluster

24

possible applications:

• the parity constraint shallow encoding may be useful for cryptanalysis

• the cardinality constraint encoding has many potential applications, but is not yet tested against known encodings

• superparallelism can be applied to STRIPS models of other planning problems

• GEF axioms can be added to SAT-based planning systems

• BMC has a similar structure to planning and contains parity constraints, so it may also benefit

25

conclusion

LS can be hugely boosted on some structured problems by remodelling

no need for complex new algorithms or analysis?

could combine the 2 approaches

26