1
Efficient Stochastic Local Search for MPE Solving
Frank Hutter
The University of British Columbia (UBC), Vancouver, Canada
Joint work with Holger Hoos (UBC) and Thomas Stützle (Darmstadt University of Technology, Germany)
2
SLS: general algorithmic framework for solving combinatorial problems
3
MPE in graphical models: many applications
4
Outline
Most probable explanation (MPE) problem
  Problem definition
  Previous work
SLS algorithms for MPE
  Illustration
  Previous SLS algorithms
  Guided Local Search (GLS) in detail
From Guided Local Search to GLS+
  Modifications
  Performance gains
Comparison to state-of-the-art
5
MPE - problem definition (in most general representation: factor graphs)
Given a factor graph:
  Discrete variables X = {X1, ..., Xn}
  Factors Φ = {φ1, ..., φm} over subsets of X
  A factor φi over variables Vi ⊆ X assigns a non-negative number to every complete instantiation vi of Vi
Find: a complete instantiation {x1, ..., xn} maximizing ∏_{i=1}^{m} φi[x1, ..., xn]
NP-hard (simple reduction from SAT)
Also known as Max-product or Maximum a posteriori (MAP)
[Figure: factor graph with variables X1, ..., X6 connected to factors φ1, ..., φ8, a table for φ8 over X4, X5, X6, and an example instantiation (2, 1, 0, 0, 1, 0)]
6
Previous approaches for solving MPE
Variable elimination / junction tree
  Exponential in the graphical model's induced width
Approximation with loopy belief propagation and its generalizations [Yedidia, Freeman, Weiss '02]
Approximation with Mini-Buckets (MB) [Dechter & Rish '97] → also gives lower & upper bounds
Search algorithms
  Local search
  Branch and Bound with various MB heuristics [Dechter's group, '99-'05]
  UAI '03: B&B with MB heuristic shown to be state-of-the-art
7
Motivation for our work
B&B clearly outperforms the best SLS algorithm so far, even on random problem instances [Marinescu, Kask, Dechter, UAI '03]
MPE is closely related to weighted Max-SAT [Park '02]
For Max-SAT, SLS is state-of-the-art (at the very least for random problems)
Why is SLS not state-of-the-art for MPE?
  Additional problem structure inside the factors
  But for completely random problems?
SLS algorithms should be much better than they currently are
We took the best SLS algorithm so far (GLS) and improved it
8
Outline
Most probable explanation (MPE) problem
  Problem definition
  Previous work
SLS algorithms for MPE
  Illustration
  Previous SLS algorithms
  Guided Local Search (GLS) in detail
From Guided Local Search to GLS+
  Modifications
  Performance gains
Comparison to state-of-the-art
9
SLS for MPE - illustration
φ1(X1): 0 → 0; 1 → 21.2; 2 → 0.1
φ2(X1, X2): (0,0) → 21; (0,1) → 0.7; (1,0) → 0; (1,1) → 1; (2,0) → 0.9; (2,1) → 0.2
φ3(X1, X3): (0,0) → 1.1; (0,1) → 23; (1,0) → 0; (1,1) → 0.7; (2,0) → 2.7; (2,1) → 42
φ4(X3): 0 → 0.9; 1 → 0.1
φ5(X2, X3, X4): (0,0,0) → 10; (0,0,1) → 0.9; (0,1,0) → 0; (0,1,1) → 100; (1,0,0) → 33.2; (1,0,1) → 0; (1,1,0) → 23.2; (1,1,1) → 13.7
Instantiation: (X1, X2, X3, X4) = (2, 1, 0, 0)
∏_{i=1}^{m} φi[2,1,0,0] = 0.1 * 0.2 * 2.7 * 0.9 * 33.2
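The product on this slide can be computed directly from the factor tables. A minimal sketch in Python (the tables and the instantiation are the ones shown on the slide; the dictionary encoding is my own):

```python
# Factor tables from the illustration, keyed by the variables they range over.
factors = {
    ("X1",): {(0,): 0.0, (1,): 21.2, (2,): 0.1},
    ("X1", "X2"): {(0, 0): 21.0, (0, 1): 0.7, (1, 0): 0.0,
                   (1, 1): 1.0, (2, 0): 0.9, (2, 1): 0.2},
    ("X1", "X3"): {(0, 0): 1.1, (0, 1): 23.0, (1, 0): 0.0,
                   (1, 1): 0.7, (2, 0): 2.7, (2, 1): 42.0},
    ("X3",): {(0,): 0.9, (1,): 0.1},
    ("X2", "X3", "X4"): {(0, 0, 0): 10.0, (0, 0, 1): 0.9, (0, 1, 0): 0.0,
                         (0, 1, 1): 100.0, (1, 0, 0): 33.2, (1, 0, 1): 0.0,
                         (1, 1, 0): 23.2, (1, 1, 1): 13.7},
}

def objective(x):
    """Product over all factors of the entry selected by instantiation x."""
    prod = 1.0
    for scope, table in factors.items():
        prod *= table[tuple(x[v] for v in scope)]
    return prod

x = {"X1": 2, "X2": 1, "X3": 0, "X4": 0}
print(objective(x))  # 0.1 * 0.2 * 2.7 * 0.9 * 33.2 ≈ 1.61352
```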
10
SLS for MPE - illustration (after flipping X2: 1 → 0)
[Same factors φ1, ..., φ5 as on the previous slide]
Instantiation: (X1, X2, X3, X4) = (2, 0, 0, 0)
∏_{i=1}^{m} φi[2,0,0,0] = ∏_{i=1}^{m} φi[2,1,0,0] * 0.9/0.2 * 10/33.2
Only the factors containing X2 (here φ2 and φ5) change their value, so the objective can be updated incrementally.
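The incremental update on this slide, as a sketch: after a single-variable flip, only the factors whose scope contains that variable contribute ratios, so a full recomputation is unnecessary. The factor tables are the ones from the illustration; the function names are mine:

```python
# Factor tables from the illustration, keyed by the variables they range over.
factors = {
    ("X1",): {(0,): 0.0, (1,): 21.2, (2,): 0.1},
    ("X1", "X2"): {(0, 0): 21.0, (0, 1): 0.7, (1, 0): 0.0,
                   (1, 1): 1.0, (2, 0): 0.9, (2, 1): 0.2},
    ("X1", "X3"): {(0, 0): 1.1, (0, 1): 23.0, (1, 0): 0.0,
                   (1, 1): 0.7, (2, 0): 2.7, (2, 1): 42.0},
    ("X3",): {(0,): 0.9, (1,): 0.1},
    ("X2", "X3", "X4"): {(0, 0, 0): 10.0, (0, 0, 1): 0.9, (0, 1, 0): 0.0,
                         (0, 1, 1): 100.0, (1, 0, 0): 33.2, (1, 0, 1): 0.0,
                         (1, 1, 0): 23.2, (1, 1, 1): 13.7},
}

def objective(x):
    """Full recomputation: product of all selected factor entries."""
    prod = 1.0
    for scope, table in factors.items():
        prod *= table[tuple(x[v] for v in scope)]
    return prod

def flip_value(x, var, new_value, old_objective):
    """Incremental update: rescale the old objective by the ratio of each
    factor containing var (valid as long as no old entry is zero)."""
    ratio = 1.0
    for scope, table in factors.items():
        if var in scope:
            old_entry = table[tuple(x[v] for v in scope)]
            new_entry = table[tuple(new_value if v == var else x[v] for v in scope)]
            ratio *= new_entry / old_entry
    return old_objective * ratio

x = {"X1": 2, "X2": 1, "X3": 0, "X4": 0}
old = objective(x)                 # 0.1 * 0.2 * 2.7 * 0.9 * 33.2
new = flip_value(x, "X2", 0, old)  # old * (0.9 / 0.2) * (10 / 33.2)
x["X2"] = 0
assert abs(new - objective(x)) < 1e-9  # matches a full recomputation
```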
11
Previous SLS algorithms for MPE
Iterative Conditional Modes [Besag '86]
  Just greedy hill climbing
Stochastic Simulation
  Sampling algorithm, very poor for optimization
Greedy + Stochastic Simulation [Kask & Dechter '99]
  Outperforms the above & simulated annealing by orders of magnitude
Guided Local Search (GLS) [Park '02] (Iterated Local Search (ILS) [Hutter '04])
  Outperforms Greedy + Stochastic Simulation by orders of magnitude
12
Guided Local Search (GLS) [Voudouris 1997]
Subclass of Dynamic Local Search [Hoos & Stützle, 2004]:
Iteratively:
  1) Local search → local optimum
  2) Modify evaluation function
In local optima: penalize some solution features
  Solution features for MPE are partial assignments
  Evaluation fct. = objective fct. − sum of respective penalties
  Penalty update rule experimentally designed
Performs very well across many problem classes
13
GLS for MPE [Park 2002]
Initialize penalties to 0
Evaluation function: objective function − sum of penalties of current instantiation:
  ∏_{i=1}^{m} φi[x1,...,xn] − Σ_{i=1}^{p} λi[x1,...,xn]   (λi: penalties)
In local optimum:
  Choose partial instantiations (according to GLS update rule)
  Increment their penalty by 1
Every N local optima:
  Smooth all penalties by multiplying them with ρ < 1
  Important to eventually optimize the original objective function
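The loop above can be sketched on the toy factors from the illustration slides. This is a simplified sketch, not Park's implementation: every factor entry selected by a local optimum is penalized (Park's utility-based feature selection is omitted), moves are deterministic best-improvement, and the parameter values are arbitrary:

```python
# Toy factor tables and domains from the illustration slides.
factors = {
    ("X1",): {(0,): 0.0, (1,): 21.2, (2,): 0.1},
    ("X1", "X2"): {(0, 0): 21.0, (0, 1): 0.7, (1, 0): 0.0,
                   (1, 1): 1.0, (2, 0): 0.9, (2, 1): 0.2},
    ("X1", "X3"): {(0, 0): 1.1, (0, 1): 23.0, (1, 0): 0.0,
                   (1, 1): 0.7, (2, 0): 2.7, (2, 1): 42.0},
    ("X3",): {(0,): 0.9, (1,): 0.1},
    ("X2", "X3", "X4"): {(0, 0, 0): 10.0, (0, 0, 1): 0.9, (0, 1, 0): 0.0,
                         (0, 1, 1): 100.0, (1, 0, 0): 33.2, (1, 0, 1): 0.0,
                         (1, 1, 0): 23.2, (1, 1, 1): 13.7},
}
domains = {"X1": (0, 1, 2), "X2": (0, 1), "X3": (0, 1), "X4": (0, 1)}

def objective(x):
    prod = 1.0
    for scope, table in factors.items():
        prod *= table[tuple(x[v] for v in scope)]
    return prod

def evaluation(x, penalties):
    """GLS evaluation: objective minus the penalties of the selected entries."""
    pen = sum(penalties[scope, tuple(x[v] for v in scope)] for scope in factors)
    return objective(x) - pen

def gls(start, steps=200, n_smooth=5, rho=0.99):
    x = dict(start)
    penalties = {(scope, entry): 0.0
                 for scope, table in factors.items() for entry in table}
    best_x, best_val = dict(x), objective(x)
    num_local_optima = 0
    for _ in range(steps):
        # Deterministic best-improvement step on the evaluation function.
        best_move, best_eval = None, evaluation(x, penalties)
        for var, dom in domains.items():
            for val in dom:
                if val != x[var]:
                    y = dict(x, **{var: val})
                    e = evaluation(y, penalties)
                    if e > best_eval:
                        best_move, best_eval = (var, val), e
        if best_move is None:
            # Local optimum: penalize the factor entries of the current state.
            for scope in factors:
                penalties[scope, tuple(x[v] for v in scope)] += 1.0
            num_local_optima += 1
            if num_local_optima % n_smooth == 0:
                for key in penalties:  # smoothing: re-focus on the objective
                    penalties[key] *= rho
        else:
            x[best_move[0]] = best_move[1]
        if objective(x) > best_val:    # incumbent tracks the true objective
            best_x, best_val = dict(x), objective(x)
    return best_x, best_val

best_x, best_val = gls({"X1": 2, "X2": 1, "X3": 1, "X4": 1})
# The global optimum of this toy model is (X1,X2,X3,X4) = (2,0,1,1), value 37.8.
```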
14
Outline
Most probable explanation (MPE) problem
  Problem definition
  Previous work
SLS algorithms for MPE
  Illustration
  Previous SLS algorithms
  Guided Local Search (GLS) in detail
From Guided Local Search to GLS+
  Modifications
  Performance gains
Comparison to state-of-the-art
15
GLS → GLS+: Overview of modified components
Modified evaluation function
  Pay more attention to the actual objective function
Improved caching of evaluation function
  Straightforward adaptation from SAT caching schemes
Tuning of smoothing parameter ρ
  Over two orders of magnitude improvement!
Initialization with Mini-Buckets instead of random
  Was shown to perform better by [Kask & Dechter, 1999]
16
GLS → GLS+ (1): Modified evaluation function
GLS:
  ∏_{i=1}^{m} φi[x1,...,xn] − Σ_{i=1}^{p} λi[x1,...,xn]
  Product of entries minus sum of penalties
  ≈ zero minus sum of penalties → almost neglecting the objective function
GLS+:
  Σ_{i=1}^{m} log(φi[x1,...,xn]) − Σ_{i=1}^{p} λi[x1,...,xn]
  Use logarithmic objective function
  Very simple, but much better results
  Penalties are now just new temporary factors that decay over time!
  Could be improved by dynamic weighting of the penalties
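A small numeric illustration of the "zero minus sum of penalties" point (the entry values and the factor count are made up for the demo): with a few hundred factors, the product of entries underflows to zero in double precision, so the GLS evaluation can no longer distinguish assignments, while the sum of logs still ranks them correctly.

```python
import math

# Hypothetical factor entries selected by two assignments in a 500-factor model.
entries_a = [0.01] * 500
entries_b = [0.02] * 500   # b is better in every single factor

prod_a = math.prod(entries_a)
prod_b = math.prod(entries_b)
print(prod_a, prod_b)      # both underflow to 0.0: the GLS evaluation is blind

log_a = sum(math.log(e) for e in entries_a)
log_b = sum(math.log(e) for e in entries_b)
print(log_a < log_b)       # True: the GLS+ evaluation still prefers b
```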
17
GLS → GLS+ (1): Modified evaluation function
Much faster in early stages of the search
Speedups of about 1 order of magnitude
[Figure: solution quality over time for GLS and GLS+]
18
GLS → GLS+ (2): Speedups by caching
Time complexity for a single best-improvement step:
  Previously best caching: Θ(|V| × |D_V| × δ_V)
  Improved caching: Θ(|V_improving| × |D_V|)
[Figure: caching speedups]
19
GLS → GLS+ (3): Tuning the smoothing factor ρ
[Park '02] stated GLS to have ``no parameters´´
Changing ρ from Park's setting 0.8 to 0.99:
  Sometimes from unsolvable to milliseconds
  Effect increases for large instances
[Figure: solution quality over time for ρ = 0.99, ρ = 0.999, and ρ = 1]
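Why ρ matters so much can be seen with a little arithmetic (the round count of 50 is illustrative): a penalty is scaled by ρ at every smoothing round, so after k rounds a fraction ρ^k of it remains. With Park's ρ = 0.8, penalties are essentially forgotten after a few dozen smoothings; with ρ = 0.99 they persist far longer and keep diversifying the search.

```python
# Fraction of a penalty remaining after k smoothing rounds, for two settings of rho.
def remaining(rho, k):
    return rho ** k

print(remaining(0.8, 50))   # ~1.4e-05: the penalty is effectively gone
print(remaining(0.99, 50))  # ~0.6: the penalty is still in force
```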
20
GLS → GLS+ (4): Initialization with Mini-Buckets
Sometimes a bit worse, sometimes much better
Particularly helps for some structured instances
21
Outline
Most probable explanation (MPE) problem
  Problem definition
  Previous work
SLS algorithms for MPE
  Illustration
  Previous SLS algorithms
  Guided Local Search (GLS) in detail
From Guided Local Search to GLS+
  Modifications
  Performance gains
Comparison to state-of-the-art
22
Comparison based on [Marinescu, Kask, Dechter, UAI '03]
Branch & Bound with MB heuristic was state-of-the-art for MPE, even for random instances!
  Scales better than original GLS with
    number of variables
    domain size
  Both as anytime algorithm and in terms of time needed to find the optimum
On the same problem instances, we show that our new GLS+ scales better than their implementation with
  number of variables
  domain size
  density
  induced width
23
Benchmark instances
Randomly generated Bayes nets
  Graph structure: completely random / grid networks
  Controlled number of variables & domain size
Random networks with controlled induced width
Bayesian networks from the Bayes net repository
24
Original GLS vs. B&B with MB heuristic:
relative solution quality after 100 seconds for random grid networks of size N×N
[Figure: results for small, medium, and large networks]
25
GLS+ vs. GLS and B&B with MB heuristic:
relative solution quality after 100 seconds for random grid networks of size N×N
[Figure: results for small, medium, and large networks]
26
GLS+ vs. B&B with MB heuristic:
solution time with increasing domain size on random networks
[Figure: results for small, medium, and large domain sizes]
27
Solution times with increasing induced width on random networks
[Figure: solution times for s-BBMB, d-BBMB, original GLS, and GLS+]
28
Results for Bayes net repository
GLS+ shows overall best performance
  Only algorithm to solve Link network (in 1 second!)
  Problems for Barley and especially Diabetes
Preprocessing with partial variable elimination helps a lot
  Can reduce #(variables) dramatically
29
Conclusions
SLS algorithms are competitive for MPE solving
  Scale very well, especially with induced width
  But they need careful design, analysis & parameter tuning
SLS and Machine Learning (ML) people should talk
  SLS can perform very well for some traditional ML problems
Our C source code is online
  Please use it
  There's also a Matlab interface
30
Extensions in progress
Real problem domains
  MRFs for stereo vision
  CRFs for sketch recognition
Domain-dependent extensions
  Hierarchical SLS for problems in computer vision
Automated parameter tuning
  Use Machine Learning to predict runtime for different settings of algorithm parameters
  Use parameter setting with lowest predicted runtime
31
The End
Thanks to
  Holger Hoos & Thomas Stützle
  Radu Marinescu for their B&B code
  You for your attention