Overview
LS (local search) often scales up better than BT (backtracking), but has the reputation of being inferior to BT on structured SAT instances: it does badly on SAT solver competition industrial benchmarks
does LS need a boost (cf. clause learning)? hybridising LS with propagation or explicitly handling variable dependencies helps, but BT is still unbeaten on these problems
improving LS on structured problems would have many practical applications, perhaps solving larger instances of real-world applications than is currently possible
but first we must understand the cause of its poor performance
1
conjecture: current modelling practices are (unintentionally) biased in favour of BT:
• in SAT modelling we often eliminate sym-
metry, but symmetry appears harmless or
even helpful to LS
• SAT encodings of constraints sometimes
improve consistency reasoning of UP, but
UP is not used in most LS algorithms (and
the ladder structure in some of these en-
codings harms LS)
• dependent variables may be introduced when
aiming for compactness (eg Tseitin)
2
more specific conjecture: the model feature to blame for LS’s poor performance is often dependent variables:
• dependencies are known to slow down LS
[Kautz, McAllester & Selman], especially
in long chains [Prestwich; Wei & Selman]
I test these conjectures by remodelling 2 problems whose large instances have long resisted solution by local search: parity learning & Towers of Hanoi as STRIPS planning
I devise new encodings with reduced variable dependency (and higher solution densities) that boost LS performance by several orders of magnitude in both cases: 32-bit & 6-disk instances are solved for the first time using a standard SAT local search algorithm (RSAPS)
3
parity learning
this is a well-known SAT benchmark
given vectors $x_i = (x_{i1}, \ldots, x_{in})$ ($i = 1 \ldots m$) with each $x_{ij} \in \{0,1\}$, a vector $y = (y_1, \ldots, y_m)$ and an error tolerance integer $k$
find a vector $a = (a_1, \ldots, a_n)$ s.t. $|\{i : \mathrm{parity}(a \cdot x_i) \neq y_i\}| \leq k$
to make hard instances set $m = 2n$ and $k = 7n/8$
n is the number of bits
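a minimal sketch of the problem statement above, assuming bit vectors are stored as Python tuples of 0/1; the function names (parity, disagreements, solve_brute_force) are illustrative and not from the talk, and the exhaustive search is only feasible for toy values of n:

```python
from itertools import product

def parity(a, x):
    """Parity (0 or 1) of the scalar product a . x over bit vectors."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def disagreements(a, xs, y):
    """Number of samples i whose parity(a . x_i) differs from y_i."""
    return sum(parity(a, x) != yi for x, yi in zip(xs, y))

def solve_brute_force(xs, y, k):
    """Return some vector a with at most k disagreements, or None."""
    n = len(xs[0])
    for a in product((0, 1), repeat=n):
        if disagreements(a, xs, y) <= k:
            return a
    return None

# toy instance: m = 4 samples, n = 3 bits, one corrupted label
xs = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (1, 1, 1)]
secret = (1, 0, 1)
y = [parity(secret, x) for x in xs]
y[3] ^= 1                                  # flip the last label
found = solve_brute_force(xs, y, k=1)
```

the SAT encodings on the following slides replace this exhaustive loop with clauses over the bits of a.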
4
hard for both BT & LS, especially 32-bit in-
stances:
• only quite recently solved by BT [Bailleux
& Boufkhad; Baumgartner & Massacci; Li;
Warners & van Maaren]
• only in IJCAI’07 solved by LS (specially-
designed algorithm) [Pham, Thornton &
Sattar]
I show that, after reformulation, an off-the-
shelf LS (RSAPS) solves them in similar time
5
“standard encoding” (SATLIB parX-Y-c): 3
families of clauses:
• calculate parities of a · xi
• compute disagreements in parities
• encode a cardinality constraint to limit dis-
agreements
(n is a power of 2 so cardinality is easy)
6
my encoding: variables $A_i$ contain the solution, $P_j$ denote parities
each scalar product $a \cdot x_j$ has parity $P_j$:
$P_j \equiv \bigoplus_{i \in \tau_j} A_i \qquad (\tau_j = \{ i \mid x_{ij} = T \})$
$\leq k$ of $m$ literals are true:
$LE(k, \pi_1, \ldots, \pi_m)$
($\pi_j$ is $\bar{P}_j$ if $y_j = T$ and $P_j$ if $y_j = F$)
new SAT-encodings of cardinality and parity
with very short dependency chains: better for
LS
7
cardinality constraint LE(k, π1, . . . , πm) says that
≤ k literals are T
I use a bitwise encoding from a CP’06 work-
shop paper that worked well on clique problems
(but not yet tested against standard cardinality
encodings)
first consider k = 1 (AMO): define variables $b_k$ ($k = 1 \ldots \lceil \log_2 m \rceil$) and add clauses
$\bar{\pi}_i \vee b_k$ [or $\bar{\pi}_i \vee \bar{b}_k$]
if bit $k$ of the binary representation of $i - 1$ is 1 [or 0]
($O(\log m)$ new variables and $O(m \log m)$ binary clauses)
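the bitwise at-most-one encoding above can be sketched as follows, assuming DIMACS-style integer literals (a negative integer is a negated variable); amo_bitwise and satisfiable are hypothetical helper names, and the brute-force checker is only there to confirm the clauses on a small case:

```python
from itertools import product

def amo_bitwise(lits, first_aux):
    """Binary clauses forcing at most one literal in `lits` to be true.

    The auxiliary variables first_aux, first_aux + 1, ... spell out, in
    binary, the index of the one literal that is allowed to be true."""
    m = len(lits)
    nbits = max(1, (m - 1).bit_length())       # ceil(log2 m), at least 1
    clauses = []
    for i, lit in enumerate(lits):             # i is "i - 1" on the slide
        for k in range(nbits):
            b = first_aux + k
            clauses.append([-lit, b if (i >> k) & 1 else -b])
    return clauses, nbits

def satisfiable(clauses, fixed, free):
    """True if some assignment to the `free` variables satisfies every clause."""
    for bits in product((False, True), repeat=len(free)):
        val = {**fixed, **dict(zip(free, bits))}
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

pis = [1, 2, 3, 4, 5]                          # m = 5 literals
clauses, nbits = amo_bitwise(pis, first_aux=6)
aux = list(range(6, 6 + nbits))
```

if two literals were true, their indices would differ in some bit k, and the two clauses for that bit would demand both b_k and its negation.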
8
now k > 1: suppose we have k bins, define $x_{ij} = T$ if $\pi_i$ is placed in bin $j$
every true $\pi_i$ is in a bin:
$\pi_i \rightarrow \bigvee_j x_{ij}$
at most one $\pi_i$ may be placed in each bin:
$AMO_i(x_{ij})$
using the bitwise encoding
highly symmetric (the $\pi_i$ can be permuted among bins) but symmetry and LS go well together
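a sketch of the bin construction above, under the same DIMACS-style assumptions; the variable numbering and the names le_bins, amo_bitwise and satisfiable are illustrative choices, not taken from the talk:

```python
from itertools import product

def amo_bitwise(lits, first_aux):
    """Bitwise at-most-one over `lits`; returns (clauses, number of aux bits)."""
    nbits = max(1, (len(lits) - 1).bit_length())
    clauses = [[-lit, (first_aux + k) if (i >> k) & 1 else -(first_aux + k)]
               for i, lit in enumerate(lits) for k in range(nbits)]
    return clauses, nbits

def le_bins(m, k):
    """Encode 'at most k of the variables 1..m are true' with k bins."""
    x = lambda i, j: m + i * k + j + 1         # x_ij: item i sits in bin j
    next_var = m + m * k + 1
    # every true pi_i is placed in some bin
    clauses = [[-(i + 1)] + [x(i, j) for j in range(k)] for i in range(m)]
    # each bin holds at most one item
    for j in range(k):
        amo, nbits = amo_bitwise([x(i, j) for i in range(m)], next_var)
        clauses += amo
        next_var += nbits
    return clauses, next_var - 1               # clauses, highest variable used

def satisfiable(clauses, fixed, n_vars):
    """Brute-force check over all variables not already fixed."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product((False, True), repeat=len(free)):
        val = {**fixed, **dict(zip(free, bits))}
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

clauses, n_vars = le_bins(m=4, k=2)
```

with more than k true literals, the pigeonhole principle puts two of them in one bin, which the per-bin at-most-one constraint forbids; with k or fewer, each true literal can take its own bin.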
9
parity constraints
(i) can SAT-encode $\bigoplus_{i=1}^{p} P_i = k$ by enumeration: exponential encoding
(ii) decompose via new variables:
$P_1 \oplus z_1 \equiv k, \quad P_2 \oplus z_2 \equiv z_1, \quad \ldots, \quad P_{p-1} \oplus z_{p-1} \equiv z_{p-2}, \quad P_p \equiv z_{p-1}$
and use exponential encoding for binary & ternary constraints: linear encoding (similar to SATLIB encodings)
drawback: long chain of variable dependencies
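a sketch of encodings (i) and (ii), assuming DIMACS-style integer literals and p ≥ 2; xor_clauses, linear_parity and satisfiable are hypothetical names introduced for illustration:

```python
from itertools import product

def xor_clauses(vs, rhs):
    """Exponential encoding: clauses forcing XOR of variables `vs` to equal rhs."""
    clauses = []
    for bits in product((0, 1), repeat=len(vs)):
        if sum(bits) % 2 != rhs:
            # rule out this wrong-parity assignment
            clauses.append([-v if b else v for v, b in zip(vs, bits)])
    return clauses

def linear_parity(p_vars, k, first_aux):
    """Chain encoding of XOR(p_vars) == k with auxiliaries z_1 .. z_{p-1}."""
    p = len(p_vars)
    z = [first_aux + i for i in range(p - 1)]           # z[0] is z_1
    clauses = xor_clauses([p_vars[0], z[0]], k)         # P_1 xor z_1 == k
    for i in range(1, p - 1):                           # P_{i+1} xor z_{i+1} == z_i
        clauses += xor_clauses([p_vars[i], z[i], z[i - 1]], 0)
    clauses += xor_clauses([p_vars[-1], z[-1]], 0)      # P_p == z_{p-1}
    return clauses, z

def satisfiable(clauses, fixed, free):
    """True if some assignment to the `free` variables satisfies every clause."""
    for bits in product((False, True), repeat=len(free)):
        val = {**fixed, **dict(zip(free, bits))}
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

P = [1, 2, 3, 4, 5]
clauses, z = linear_parity(P, k=1, first_aux=6)
```

each z_i is determined by z_{i-1} and P_i, so flipping P_1 forces a flip of every z in turn: this is the dependency chain of length p that the slide names as the drawback.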
10
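(iii) bisect constraint, solve 2 subproblems, merge results by a ternary constraint: bisection encoding replaces chain of length p by a tree of depth log p
(iv) decompose $\bigoplus_{i=1}^{p} P_i = k$ into
$\bigoplus_{i=1}^{\alpha} P_i \equiv k_1, \quad \bigoplus_{i=\alpha+1}^{2\alpha} P_i \equiv k_2, \quad \ldots, \quad \bigoplus_{i=p-\alpha+1}^{p} P_i \equiv k_\beta \quad \text{and} \quad \bigoplus_{i=1}^{\beta} k_i \equiv k$
where $\beta = \lceil p/\alpha \rceil$ and tree branching factor $\alpha$ satisfies $1 < \alpha < p$; tree depth 2
exponentially encode remaining parity constraints: shallow encoding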
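a sketch of the shallow encoding (iv), assuming DIMACS-style integer literals; shallow_parity, xor_clauses and satisfiable are illustrative names, and the last group is simply allowed to be shorter when α does not divide p:

```python
from itertools import product

def xor_clauses(vs, rhs):
    """Exponential encoding: clauses forcing XOR of variables `vs` to equal rhs."""
    return [[-v if b else v for v, b in zip(vs, bits)]
            for bits in product((0, 1), repeat=len(vs))
            if sum(bits) % 2 != rhs]

def shallow_parity(p_vars, k, alpha, first_aux):
    """Depth-2 tree encoding of XOR(p_vars) == k with group size alpha."""
    groups = [p_vars[i:i + alpha] for i in range(0, len(p_vars), alpha)]
    ks = [first_aux + g for g in range(len(groups))]    # k_1 .. k_beta
    clauses = []
    for group, kg in zip(groups, ks):
        clauses += xor_clauses(group + [kg], 0)         # XOR(group) == k_g
    clauses += xor_clauses(ks, k)                       # XOR(k_1..k_beta) == k
    return clauses, ks

def satisfiable(clauses, fixed, free):
    """True if some assignment to the `free` variables satisfies every clause."""
    for bits in product((False, True), repeat=len(free)):
        val = {**fixed, **dict(zip(free, bits))}
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

P = [1, 2, 3, 4, 5, 6, 7]
clauses, ks = shallow_parity(P, k=1, alpha=3, first_aux=8)
```

every auxiliary k_g depends directly on the P variables of its own group, so no dependency chain is longer than two steps; the price is that the enumerated clauses grow exponentially in α and β.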
11
results
bisection and linear encoding very similar in
flips and time to SATLIB encodings (see pa-
per): are trees as harmful as chains?
best results (β = 10, median secs to find a solution):
n     linear   shallow
8     0.00     0.00
12    0.02     0.03
16    11       0.47
20    408      11
24    —        272
28    —        3,640
32    —        49,633
12
extrapolation: 2 years for n=32 with linear
encoding
[Pham, Thornton & Sattar] results similar in
flips & time (but unspecified machine)
best BT results are better: improve LS by sim-
ilar preprocessing?
13
Towers of Hanoi
SAT-based STRIPS planning achieves very good
results in competitions
ToH-as-STRIPS up to 6 discs has been solved
by BT, but only up to 4 discs by LS (hardness
increases rapidly)
perhaps because it has dependency chains and
only 1 solution [Selman]
I design a new encoding that eliminates both
these features
14
standard STRIPS as SAT
set an upper bound on discrete times for the
plan length
define variables for (i) state after last action,
and (ii) actions at each time
define clauses for (i) linearity (≤ 1 action at
any time), (ii) actions imply preconditions and
effects, (iii) frame axioms (explanatory version:
list reasons a fluent changes at each time), (iv)
describe initial and goal states
a series of improvements for LS...
15
ToH domain knowledge
ToH is usually STRIPS-modelled like Blocks
World: model which peg or disc each disc is
on, and which pegs and discs are “clear”
instead we can just model which peg each disc
is on: the order is implied (no large discs on
small ones)
gives a more compact STRIPS model: no “clear”
predicate, no discs “on” other discs, fewer ac-
tions
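a small explicit-state sketch of the compact model, assuming a state is a tuple giving the peg of each disc (disc 0 is the smallest); this is not the SAT encoding itself, only an illustration that stacking order and "clear" are implied by disc sizes; legal_moves and shortest_plan are hypothetical names:

```python
from collections import deque

PEGS = (0, 1, 2)

def legal_moves(state):
    """Yield (disc, src, dst) for every legal move; state[d] is the peg of disc d."""
    top = {}
    for disc, peg in enumerate(state):         # discs visited smallest first,
        top.setdefault(peg, disc)              # so the first one seen is on top
    for src, disc in top.items():
        for dst in PEGS:
            # legal if the destination is empty or its top disc is larger
            if dst != src and top.get(dst, len(state)) > disc:
                yield disc, src, dst

def shortest_plan(n):
    """Breadth-first search from all discs on peg 0 to all discs on peg 2."""
    start, goal = (0,) * n, (2,) * n
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                state, move = parent[state]
                plan.append(move)
            return plan[::-1]
        for disc, src, dst in legal_moves(state):
            nxt = state[:disc] + (dst,) + state[disc + 1:]
            if nxt not in parent:
                parent[nxt] = (state, (disc, src, dst))
                queue.append(nxt)
    return None

plan = shortest_plan(3)
```

the only fluents needed are "disc d is on peg p", and the only actions are "move disc d from peg p to peg q", which is why the compact STRIPS model has fewer predicates and actions.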
16
superparallelism
an important technique in planning: parallel
plans allowing more than one action at a given
time
plan lengths (and SAT models) can be shorter
also increases the solution density of the SAT
problem: a linear plan often corresponds to
exponentially many parallel plans
there’s no parallelism in ToH but we can create
some by removing some exclusion axioms, eg...
17
• allow disk 1 to move from peg 1 to peg 2,
and disk 2 to move from peg 3 to peg 2,
at the same time
• allow disk 1 to move from peg 1 to peg 2,
and disk 2 to move from peg 2 to peg 3,
at the same time
these parallel plans can be transformed to lin-
ear ones in polynomial time: superparallelism
(adds parallelism beyond any that is naturally
present in the model)
drawback: can’t insist on optimal plans
18
long-range dependencies
frame axioms create dependency chains
seems to be no way to avoid these chains, as
they are a property of the problem itself and
not the encoding
but we can break up the chain structure by
using the method of [Wei & Selman]: add im-
plied clauses to cause long-range dependencies
between times further apart than 1 unit
I use a generalisation of explanatory frame ax-
ioms (GEF axioms) to time differences ≥ 1
adding all GEF axioms increases space com-
plexity, but we can add a randomly-chosen sub-
set of them
luckily Wei & Selman showed that adding a rel-
atively small number was optimal, and I found
the same (5%)
19
implied clauses
I add exclusion axioms corresponding to two
disks making the same move
(can never occur because the larger disk’s pre-
conditions are unsatisfied if the smaller one is
on the same peg, so these clauses are redun-
dant)
20
results
execution time (seconds)
D    standard   compact   parallel   GEF
3    0.096      0.0058    0.0010     0.0010
4    —          5.8       0.0093     0.017
5    —          —         1.8        0.30
6    —          —         —          980
“—”: > 10^9 flips
each technique greatly improves performance
(>5 orders of magnitude in flips for 4 discs)
first SAT LS results for 5 and 6 disks, compara-
ble to the best BT results (though at the cost
of reducing plan quality by superparallelism)
further improvements possible by operator split-
ting (which reduces the space complexity of
SAT-encoded planning problems) and prepro-
cessing by unit propagation and subsumption
21
results
also added GEF axioms to standard model but
could not solve 4 disks
more results:
• AdaptNovelty+ and VW are faster on the compact than on the standard encoding, and faster still with superparallelism
• AdaptNovelty+ is faster with GEF axioms; VW is hardly affected (apart from the overhead of maintaining the additional clauses)
• ZChaff, SATZ & SATO are all improved by the compact encoding
• ZChaff is faster with superparallelism; SATZ and SATO are slower
• SATO is faster with GEF axioms; ZChaff and SATZ are slower
in other words: compact encoding helps all al-
gorithms, other techniques mostly help LS but
are erratic on BT (modelling for LS is distinct
from modelling for BT)
22
summary
LS on hard structured problems can be hugely
boosted by reformulation
reducing dependency chains and increasing so-
lution density seem to be key techniques when
modelling for LS — but not BT [Minton, John-
ston, Philips & Laird]
different from aims of modelling for BT: sym-
metry elimination and consistency of unit prop-
agation
modelling for LS is distinct from modelling for
BT, and worth studying
23
aside: shouldn’t increased solution density also help BT?
not necessarily: structured SAT problems may
contain clusters of solutions, and Minton et
al.’s nonsystematic search hypothesis is that
LS benefits more than BT from high solution
density
this is because LS is largely immune to cluster-
ing while BT may start from a point far from
any cluster
24
possible applications:
• parity constraint shallow encoding may be
useful for cryptanalysis
• cardinality constraint encoding has many
potential applications, but not yet tested
against known encodings
• superparallelism can be applied to STRIPS
models of other planning problems
• GEF axioms can be added to SAT-based
planning systems
• BMC has a similar structure to planning
and contains parity constraints, so it may
also benefit
25