Empirical investigations of local search on random KSAT for K = 3,4,5,6

Page 1: Empirical investigations of local search on random KSAT for K = 3,4,5,6

March 4, 2008 Erik Aurell, KTH Computational Biology 1

KTH/CSC

Empirical investigations of local search on random KSAT for K = 3,4,5,6...

CDInfos0803 Program

Kavli Institute for Theoretical Physics China

Erik Aurell

KTH Royal Institute of Technology

Stockholm, Sweden

Page 2: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Circumspect descent prevails in solving combinatorial optimization problems

Mikko Alava, John Ardelius, E.A., Petteri Kaski, Supriya Krishnamurthy, Pekka Orponen, Sakari Seitz, arXiv:0711.4902 (Nov 30, 2007)

Earlier work by E.A., Scott Kirkpatrick and Uri Gordon (2004), Alava, Orponen and Seitz (2005), Ardelius and E.A. (2006), Ardelius, E.A. and Krishnamurthy (2007)... and many others

Page 3: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Why did we get into this?

Page 4: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Let me give three reasons

Page 5: Empirical investigations of local search on random KSAT for K = 3,4,5,6

It is a fundamental and practically important problem... which I learnt about working for the Swedish railways

E.A. and J. Ekman, Capacity of single rail yards [in Swedish], Swedish Railway Authority Technical reports (2002)

Page 6: Empirical investigations of local search on random KSAT for K = 3,4,5,6

They have potential, under-used applications in systems biology

As an example I will describe a consulting work we did for Global Genomics, a now defunct Swedish biotech company. They claimed to have a new method to measure global gene expression. Many of their ideas were in fact from S. Brenner and K. Livak, PNAS 86 (1989), 8902-06, and K. Kato, Nucleic Acids Res. 23 (1995), 3685-3690.

Page 7: Empirical investigations of local search on random KSAT for K = 3,4,5,6

The problem is that using only one Type IIS restriction enzyme, there is not enough information in the data to determine which genes were expressed (many genes could have given rise to a given peak).

Kato (1995) tried using several enzymes of the same type sequentially. Problem: loss of accuracy, complicated.

Global Genomics AB’s invention was to use several enzymes in parallel.

Page 8: Empirical investigations of local search on random KSAT for K = 3,4,5,6

[Figure: two bipartite graphs linking a gene database (gene 1, gene 2, gene 3) to observations (peaks labelled 100, 30, 70). Left panel: all possible matchings. Right panel: an optimal matching, with expression levels attached to the matched genes.]

The Global Genomics invention led to an optimal matching problem

A. Ameur, E.A., M. Carlsson, J. Orzechowski Westholm, “Global gene expression analysis by combinatorial optimization”, In Silico Biology 4 (0020) (2004)

Matching the observations to a gene database gives a bipartite graph, where a link between a gene g and an observation o represents the fact that o could be an observation of g.

The best matching can be represented as a subgraph of the graph above + expression levels.
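The bipartite graph of candidate links can be sketched in a few lines. This is only an illustration: the slide does not say how "o could be an observation of g" is decided, so the tolerance test, the function name candidate_graph and the toy numbers below are all assumptions.

```python
from collections import defaultdict


def candidate_graph(predicted, observed, tol):
    """Link a gene to observation index o when one of the gene's predicted
    peak positions lies within `tol` of the observed position.
    (The tolerance criterion is an assumption made for this sketch.)"""
    links = defaultdict(list)
    for gene, peaks in predicted.items():
        for o, position in enumerate(observed):
            if any(abs(position - p) <= tol for p in peaks):
                links[gene].append(o)
    return dict(links)


# Hypothetical toy data: predicted peak positions per gene, observed peaks.
predicted = {"gene 1": [100.0, 30.0], "gene 2": [70.0], "gene 3": [70.5, 30.2]}
observed = [100.1, 30.0, 70.2]
graph = candidate_graph(predicted, observed, tol=0.5)
print(graph)
```

An optimal matching would then be a subgraph of this structure together with one expression level per selected gene; the optimization criterion itself is not given on the slide and is not modelled here.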

Page 9: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Testing using the FANTOM database of mouse cDNA (RIKEN)

For in silico testing we used the FANTOM database of full-length mouse cDNA, available at genome.gsc.riken.go.jp

We used an early 2003 version of 60 770 RIKEN full-length clones, partitioned into 33 409 groups representing different genes.

This second list can be taken as a proxy for all genes in mouse.

Principle of in silico tests:

1. Select a fraction of genes

2. Generate random exp. levels

3. Generate random peak and length perturbations

4. Run the algorithm

5. Compare

Page 10: Empirical investigations of local search on random KSAT for K = 3,4,5,6

both methods solve the optimization according to the given criteria when the perturbation parameters are small enough

the methods are comparable at low or moderate fraction of genes expressed

local search is superior at high fraction of genes expressed

Ameur et al (2004)

Page 11: Empirical investigations of local search on random KSAT for K = 3,4,5,6

In theory, combinatorial optimization and constraint satisfiability give rise to many of the computationally hardest problems

Page 12: Empirical investigations of local search on random KSAT for K = 3,4,5,6

In practice, combinatorial optimization and constraint satisfaction problems are routinely solved by complete methods (branch-and-bound), by local search heuristics, by mixed integer programming, etc.

Page 13: Empirical investigations of local search on random KSAT for K = 3,4,5,6

How is this possible?

Following many others, we will look at a simple model

Page 14: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Random K-satisfiability problems

Let there be N Boolean variables, and 2N literals

Let there be M logical propositions (clauses)

$P_a = L_1^a \vee L_2^a \vee \dots \vee L_k^a$

Can all M clauses be satisfied simultaneously?

$P = P_1 \wedge P_2 \wedge \dots \wedge P_M$

A clause expresses that one out of $2^k$ possible configurations of k variables is forbidden. Clauses are picked randomly (with replacement) from all possible k-tuples of variables.
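A minimal sketch of this ensemble, under stated assumptions: clauses are stored as tuples of signed integers (a DIMACS-like convention chosen here, not taken from the slides), each clause uses k distinct variables, and each literal is negated with probability 1/2. The names random_ksat and num_unsat are hypothetical.

```python
import random


def random_ksat(n_vars, n_clauses, k, rng):
    """Return n_clauses random clauses over variables 1..n_vars.
    Literal +i means variable i is true, -i means it is false."""
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)  # k distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def num_unsat(clauses, assignment):
    """Energy: number of clauses with no true literal.
    assignment[i] is the Boolean value of variable i (index 0 is unused)."""
    return sum(
        not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


rng = random.Random(0)
n, alpha, k = 100, 4.2, 3
clauses = random_ksat(n, round(alpha * n), k, rng)
assignment = [None] + [rng.random() < 0.5 for _ in range(n)]
print(len(clauses), "clauses,", num_unsat(clauses, assignment), "unsatisfied at a random start")
```

The quantity returned by num_unsat is what a local search tries to bring down to zero.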

Page 15: Empirical investigations of local search on random KSAT for K = 3,4,5,6

The 4.3 Point

[Figure: probability of satisfiability and number of DP calls versus the ratio of clauses to variables, for 20, 40 and 50 variables, with the 50% sat point marked. Mitchell, Selman, and Levesque 1991]

KSAT is characterized by the number of clauses per variable, α = M/N

There is a phase transition from almost surely SAT to almost surely UNSAT

Algorithms take longest time (on the average) close to the phase boundary

Mitchell, Selman, Levesque (AAAI-92); Kirkpatrick, Selman, Science 264:1297 (1994)

Several simple algorithms take a.s. linear time for α small enough

Page 16: Empirical investigations of local search on random KSAT for K = 3,4,5,6

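A statistical physics prediction, now about a decade old, for 3SAT and other constraint satisfaction problems: a clustering transition

[Figure: phase diagram along the axis α = M/N. SAT with one state below α_d = 3.92; SAT with many states between α_d = 3.92 and α_cr = 4.27; UNSAT, with many states and no solutions, above α_cr = 4.27.]

3SAT threshold values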

Page 17: Empirical investigations of local search on random KSAT for K = 3,4,5,6

The Mezard, Palassini and Rivoire 2005 prediction for 3COL

Obtained by the entropic cavity method, computing within a 1RSB scenario the number of states with a given number of solutions

one green state

many green states, but most solutions in one or a few big states

Page 18: Empirical investigations of local search on random KSAT for K = 3,4,5,6

The latest clustering predictions for KSAT, K > 3 are in F. Krzakała, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová, "Gibbs states and the set of solutions of random constraint satisfaction problems", PNAS 2007 Jun 19;104(25):10318-23.

single cluster

many small clusters but most solutions in a few of them

many clusters and solutions are found in a large set of all about equal size

Page 19: Empirical investigations of local search on random KSAT for K = 3,4,5,6

many clusters and solutions are found in a large set of all about equal size

most clusters disappear, and again most solutions are found in a small number of them

The cluster condensation transition in F. Krzakała et al (2007)

Page 20: Empirical investigations of local search on random KSAT for K = 3,4,5,6

So does clustering in fact pose a problem to simple local search?

Are the known features of the static landscape relevant to dynamics?

Page 21: Empirical investigations of local search on random KSAT for K = 3,4,5,6

A landscape that could be difficult for local search

[Figure, courtesy Sui Huang: a landscape with a global minimum, local minima, and another local minimum]

Page 22: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Papadimitriou invented a stochastic local search algorithm for SAT problems in 1991, today often referred to as RandomWalksat:

Pick an unsatisfied clause

Pick a variable in that clause, flip it, loop

Not quite like an equilibrium physics process in detailed balance, because only variables in unsatisfied clauses are updated

Solves 3SAT in linear time on average up to α about 2.7
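The RandomWalksat loop described on this slide can be sketched as follows. It is a minimal illustration, not a tuned solver: clauses are assumed to be tuples of signed integers (+i for variable i true, -i for false), the unsatisfied clauses are found by a full rescan at every step, and the function name and max_flips cutoff are hypothetical.

```python
import random


def random_walksat(clauses, n_vars, max_flips, rng):
    """Random-walk local search; returns a satisfying assignment or None."""
    assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]  # O(M) rescan, kept simple
        if not unsat:
            return assign
        clause = rng.choice(unsat)          # pick an unsatisfied clause
        var = abs(rng.choice(clause))       # pick a variable in that clause
        assign[var] = not assign[var]       # flip it, loop
    return assign if all(satisfied(c) for c in clauses) else None


# Tiny instance whose only solution is x1 = x2 = True.
toy = [(1, 2), (-1, 2), (1, -2)]
solution = random_walksat(toy, n_vars=2, max_flips=10_000, rng=random.Random(1))
print(solution)
```

Only variables in currently unsatisfied clauses are ever touched, which is the focusing property mentioned on the slide.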

Page 23: Empirical investigations of local search on random KSAT for K = 3,4,5,6

A benchmark algorithm is Cohen-Kautz-Selman walksat

www.cs.washington.edu/homes/kautz/walksat

Pick an unsatisfied clause

Compute for each variable in the clause the breakclause

If any variable has breakclause zero, flip it, loop

With probability p, flip variable with least breakclause, loop

Else, with probability 1-p, flip random variable in clause, loop

breakclause is the number of other, presently satisfied, clauses that would be broken if the variable is flipped

Solves 3SAT in linear time on average up to α about 4.15, using default parameters from the public repository (Aurell, Gordon, Kirkpatrick (2004))
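The walksat steps as listed on this slide can be sketched as below. This is not the public walksat code: it follows the slide's wording (greedy move with probability p, random move otherwise), assumes clauses are tuples of signed integers with distinct variables, and recomputes everything from scratch at each step. All names are hypothetical.

```python
import random


def walksat(clauses, n_vars, p, max_flips, rng):
    """Sketch of the walksat loop as described on the slide."""
    assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]
    occurs = {v: [] for v in range(1, n_vars + 1)}  # clauses containing each variable
    for c in clauses:
        for lit in c:
            occurs[abs(lit)].append(c)

    def n_true(clause):
        return sum(assign[abs(lit)] == (lit > 0) for lit in clause)

    def breakcount(var):
        # Clauses satisfied only by var's literal would break if var is flipped.
        return sum(
            1 for c in occurs[var]
            if n_true(c) == 1
            and any(abs(lit) == var and assign[var] == (lit > 0) for lit in c)
        )

    for _ in range(max_flips):
        unsat = [c for c in clauses if n_true(c) == 0]
        if not unsat:
            return assign
        variables = [abs(lit) for lit in rng.choice(unsat)]
        breaks = {v: breakcount(v) for v in variables}
        free = [v for v in variables if breaks[v] == 0]
        if free:
            var = rng.choice(free)                 # a flip that breaks nothing
        elif rng.random() < p:
            var = min(variables, key=breaks.get)   # least breakclause
        else:
            var = rng.choice(variables)            # random variable in the clause
        assign[var] = not assign[var]
    return assign if all(n_true(c) > 0 for c in clauses) else None


toy = [(1, 2), (-1, 2), (1, -2)]  # only solution: x1 = x2 = True
solution = walksat(toy, n_vars=2, p=0.5, max_flips=1000, rng=random.Random(2))
print(solution)
```

A production implementation would maintain the breakclause counts incrementally instead of recomputing them, which is what makes each flip cheap.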

Page 24: Empirical investigations of local search on random KSAT for K = 3,4,5,6

We have worked with the Focused Metropolis Search (FMS) algorithm, and ASAT, an alternative version

ASAT: if you have a solution, output and stop

Pick an unsatisfied clause

Pick randomly a variable in the clause

If flipping that variable decreases the energy, do so

If not, flip the variable with probability p

Loop

Also not in detailed balance (also tries only unsat clauses)

Parameter p has to be optimized. The optimal value depends on the problem class, e.g. about 0.2 for 3SAT
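The ASAT steps listed on this slide can be sketched as follows. The sketch follows the slide's wording literally (accept if the energy decreases, otherwise accept with probability p), assumes clauses are tuples of signed integers, and recomputes the energy from scratch after each trial flip, which a real implementation would avoid. The function name and max_flips cutoff are hypothetical.

```python
import random


def asat(clauses, n_vars, p, max_flips, rng):
    """Sketch of ASAT; returns a satisfying assignment or None."""
    assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]

    def unsat_clauses():
        return [c for c in clauses
                if not any(assign[abs(lit)] == (lit > 0) for lit in c)]

    unsat = unsat_clauses()
    for _ in range(max_flips):
        if not unsat:                        # solution found: output and stop
            return assign
        var = abs(rng.choice(rng.choice(unsat)))
        assign[var] = not assign[var]        # trial flip
        trial = unsat_clauses()
        if len(trial) < len(unsat) or rng.random() < p:
            unsat = trial                    # accept the flip
        else:
            assign[var] = not assign[var]    # reject: undo the flip
    return assign if not unsat else None


toy = [(1, 2), (-1, 2), (1, -2)]  # only solution: x1 = x2 = True
solution = asat(toy, n_vars=2, p=0.2, max_flips=10_000, rng=random.Random(3))
print(solution)
```

The value p = 0.2 is the one quoted on the slide for 3SAT; for other problem classes it would have to be tuned.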

Page 25: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Algorithm 1. ChainSAT

S = random assignment of values to the variables
chaining = FALSE
while S is not a solution do
    if not chaining then
        C = a clause not satisfied by S selected uniformly at random
        V = a variable in C selected uniformly at random
    end if
    ΔE = change in the number of unsatisfied clauses if V is flipped in S
    if ΔE = 0 then
        flip V in S
    else if ΔE < 0 then
        with probability p1
            flip V in S
        end with
    end if
    chaining = FALSE
    if ΔE > 0 then
        with probability 1 - p2
            C = a clause that is satisfied only by V selected uniformly at random
            X = a variable in C other than V selected uniformly at random
            V = X
            chaining = TRUE
        end with
    end if
end while

We have a new algorithm ChainSAT which by design never goes up in energy
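A direct transcription of Algorithm 1 into Python might look like the sketch below. Assumptions made here: clauses are tuples of signed integers with at least two distinct variables each, ΔE is recomputed from scratch at every step, and a max_steps cutoff is added so the loop always terminates. The parameter values in the usage lines are arbitrary illustrations, not the values used in the paper.

```python
import random


def chainsat(clauses, n_vars, p1, p2, max_steps, rng):
    """Sketch of ChainSAT; the energy never increases along the run."""
    assign = [None] + [rng.random() < 0.5 for _ in range(n_vars)]
    occurs = {v: [] for v in range(1, n_vars + 1)}
    for c in clauses:
        for lit in c:
            occurs[abs(lit)].append(c)

    def n_true(clause):
        return sum(assign[abs(lit)] == (lit > 0) for lit in clause)

    def satisfied_only_by(var):
        return [c for c in occurs[var]
                if n_true(c) == 1
                and any(abs(lit) == var and assign[var] == (lit > 0) for lit in c)]

    chaining, var = False, None
    for _ in range(max_steps):
        unsat = [c for c in clauses if n_true(c) == 0]
        if not unsat:
            return assign
        if not chaining:
            var = abs(rng.choice(rng.choice(unsat)))
        broken = satisfied_only_by(var)                     # would become unsatisfied
        made = sum(1 for c in occurs[var] if n_true(c) == 0)  # would become satisfied
        delta = len(broken) - made
        if delta == 0:
            assign[var] = not assign[var]
        elif delta < 0 and rng.random() < p1:
            assign[var] = not assign[var]
        chaining = False
        if delta > 0 and rng.random() < 1 - p2:
            clause = rng.choice(broken)                     # satisfied only by var
            var = rng.choice([abs(lit) for lit in clause if abs(lit) != var])
            chaining = True
    return assign if all(n_true(c) > 0 for c in clauses) else None


toy = [(1, 2), (-1, 2), (1, -2)]  # only solution: x1 = x2 = True
solution = chainsat(toy, n_vars=2, p1=0.5, p2=0.1, max_steps=10_000, rng=random.Random(4))
print(solution)
```

A flip is only ever made when ΔE is zero or negative; an uphill candidate is never flipped, and instead the search chains to another variable of a clause that the candidate alone satisfies.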

Page 26: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Solution course of a good local search (ASAT at 4.2)

Page 27: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Runtimes for ASAT on 3SAT at α=4.21

Ardelius and E.A. (2006)

Page 28: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Runtimes for ASAT on 3SAT at α=4.25

Ardelius and E.A. (2006)

Page 29: Empirical investigations of local search on random KSAT for K = 3,4,5,6

FMS on 4SAT at α=9.6

Page 30: Empirical investigations of local search on random KSAT for K = 3,4,5,6

ChainSAT on 4SAT, 5SAT, 6SAT

Page 31: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Do we know how local search fails on hard CSPs?

The first guess would be that local search fails if solutions have little slackness, which is expressed by Parisi whitening

Page 32: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Page 33: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Several proposed clustering transitions do not stop circumspect descent

Not even an algorithm which would be trapped in a potential well of any depth

The reason why local search eventually fails is unknown

Page 34: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Clustering has been rigorously proven for KSAT with K greater than 8

For K less than 8 there are cavity method predictions

How do the numerics compare to these?

Page 35: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Solve a 3SAT instance L times with a stochastic local search (ASAT)

Compute the overlaps between these L solutions

See how that quantity changes with α

[Figure panels: average overlap; variance of the overlap]

Ardelius, E.A. and Krishnamurthy (2007)
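The overlap statistics described on this slide can be sketched as below. One assumption is made loudly: the overlap of two solutions is taken here to be the fraction of variables on which they agree, a convention chosen for this sketch (the slides do not define the normalization). The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def overlap(a, b):
    """Fraction of variables on which two assignments agree (assumed convention)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def overlap_stats(solutions):
    """Average and variance of the pairwise overlaps among L solutions."""
    values = [overlap(a, b) for a, b in combinations(solutions, 2)]
    return mean(values), pvariance(values)


# Three hypothetical solutions of a 4-variable instance.
solutions = [
    [True, True, False, False],
    [True, True, False, True],
    [True, False, True, True],
]
avg, var = overlap_stats(solutions)
print(avg, var)
```

In the experiment on the slide the list of solutions would come from L independent runs of a stochastic local search on the same instance, repeated for each α.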

Page 36: Empirical investigations of local search on random KSAT for K = 3,4,5,6

The rank-ordered plots of the overlaps in a chain of instances with increasing number of clauses display a transition around α = 4.25

Ardelius, E.A. and Krishnamurthy (2007)

α ranges from 3.5 to 4.3

N is 2000

for α = 4.3 repeat until solvable instance found

for α <= 4.3 repeat until ASAT finds many solutions on the instance

Page 37: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Generate many chains of instances, check for the α at which all solutions found have an overlap of at least 80%

Ardelius, E.A. and Krishnamurthy (2007)

N is 100, 200, 400, 1000, 2000

Number of chains at each N is 110

If a chain does not reach the 80% threshold, repeat

Threshold is between 4.25 and 4.27, could in fact coincide with SAT/UNSAT for 3SAT

This is not in contradiction with the theoretical predictions of Krzakala et al (2007), who do not address 3SAT

Page 38: Empirical investigations of local search on random KSAT for K = 3,4,5,6

FMS diffusion 4SAT different α

Page 39: Empirical investigations of local search on random KSAT for K = 3,4,5,6

FMS diffusion 4SAT α=9.6

Page 40: Empirical investigations of local search on random KSAT for K = 3,4,5,6

FMS diffusion 4SAT different N

Page 41: Empirical investigations of local search on random KSAT for K = 3,4,5,6

As far as numerics can tell, if there are clusters beyond the clustering transitions in 4SAT, they are not separated by overlap

Page 42: Empirical investigations of local search on random KSAT for K = 3,4,5,6

How does local search compare to more sophisticated (and specialized) methods that we will hear about at this school?

(here I have to go to PDF)

Page 43: Empirical investigations of local search on random KSAT for K = 3,4,5,6

A question to the experts:

Which is (or are) the good metrics to compare runtimes?

Wall-clock time? Some intrinsic count?

Page 44: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Conclusions

Local heuristics (walksat, Focused Metropolis Search, Focused Record-to-Record Travel, ASAT, ChainSAT) are effective on hard random 3SAT, 4SAT… problems

This is true even if the heuristic by design can never get out of a potential well, of any depth (ChainSAT). Traps in the landscape do not stop these algorithms.

There seems to be a “clustering condensation” transition in 3SAT very close to the SAT/UNSAT transition.

If there is a clustering transition in 4SAT, these clusters do not seem to be separated in overlap (in contrast to K equal to 8 and greater)

Page 45: Empirical investigations of local search on random KSAT for K = 3,4,5,6

Thanks to

John Ardelius

Supriya Krishnamurthy

Mikko Alava

Petteri Kaski

Pekka Orponen

Sakari Seitz

Page 46: Empirical investigations of local search on random KSAT for K = 3,4,5,6

N is 1000, α is 4.2

[Figure panels: energy as function of time; distance to target]

Is the search trapped in “potential wells” of metastable states?

ASAT linear regime, solution in 1000 sweeps

Page 47: Empirical investigations of local search on random KSAT for K = 3,4,5,6

N is 1000, α is 4.3

[Figure panels: energy as function of time; distance to target]

Is the search trapped in “potential wells” of metastable states?

ASAT nonlinear regime, no barrier seen

Page 48: Empirical investigations of local search on random KSAT for K = 3,4,5,6

N is 1000, α is 4.1

[Figure panels: energy as function of time; distance to target]

Is the search trapped in “potential wells” of metastable states?

ASAT linear regime, solution in 20 sweeps