


When Does Dependency Modelling Help? Using a Randomized Landscape Generator to Compare Algorithms in Terms of Problem Structure

Rachael Morgan and Marcus Gallagher

School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia

[email protected], [email protected]

Abstract. In this paper we extend a previously proposed randomized landscape generator in combination with a comparative experimental methodology to study the behaviour of continuous metaheuristic optimization algorithms. In particular, we generate landscapes with parameterised, linear ridge structure and perform pairwise comparisons of algorithms to gain insight into what kind of problems are easy and difficult for one algorithm instance relative to another. We apply this methodology to investigate the specific issue of explicit dependency modelling in simple continuous Estimation of Distribution Algorithms. Experimental results reveal specific examples of landscapes (with certain identifiable features) where dependency modelling is useful, harmful or has little impact on average algorithm performance. The results are related to some previous intuition about the behaviour of these algorithms, but at the same time lead to new insights into the relationship between dependency modelling in EDAs and the structure of the problem landscape. The overall methodology is quite general and could be used to examine specific features of other algorithms.

1 Introduction

An important research direction in evolutionary and metaheuristic optimization is to improve our understanding of the relationship between algorithms and the optimization problems that they are applied to. In a general sense, an algorithm can be expected to perform well if the assumptions that it makes, either explicit or implicit, are well-matched to the properties of the search landscape or solution space of a given problem or set of problems.

While it is possible to carry out theoretical investigations of performance and behaviour (e.g. by assuming that the problem has a known analytical form), it is also useful to take a systematic and rigorous approach to the experimental analysis of algorithms. To this end, randomised problem or landscape generators have some favourable properties which can be used to gain insights into the behaviour of metaheuristic optimizers with respect to some underlying properties of the problem instances generated [1].

In this paper we extend a previously proposed randomised landscape generator in combination with a methodology inspired by [2]. We do pairwise performance comparisons of algorithms on 2D test problems with linear ridge structure to gain insight into problem difficulty for different algorithm instances. We then use this approach to investigate the specific issue of explicit dependency modelling in the Estimation of Multivariate Normal Algorithm (EMNA) compared to the Univariate Marginal Distribution Algorithm (UMDAc), which does not model variable dependencies. The overall methodology is quite general and can be used to examine experimentally the specific features of other algorithms.

R. Schaefer et al. (Eds.): PPSN XI, Part I, LNCS 6238, pp. 94–103, 2010.
© Springer-Verlag Berlin Heidelberg 2010

An outline of the paper is as follows. Sec. 2 gives an overview of the previous work that provides a basis for the methodology in this paper. The extension of the landscape generator to incorporate linear ridge structure and some illustrative experiments are presented in Sec. 3. In Sec. 4 the extended generator and methodology are used to study the relationship between dependencies in problem variables and the modelling in UMDAc and EMNA. Sec. 5 concludes the paper.

2 Using a Landscape Generator to Actively Study the Relationship between Problems and Algorithms

Experimental research in metaheuristics is receiving increasing attention in the literature as a means of evaluating and comparing the performance of newly proposed and existing algorithms. This includes the development of large-scale competitions and associated sets of benchmark test problems (e.g. at recent Genetic and Evolutionary Computation Conferences (GECCO) and the Congress on Evolutionary Computation (CEC)). Several different types of test problems have been used for the evaluation of algorithms, including constructed analytical functions, real-world problem instances or simplified versions of real-world problems, and problem/landscape generators [3, 4, 5]. Different problem types have their own characteristics; however, it is usually the case that complementary insights into algorithm behaviour result from conducting larger experimental studies using a variety of different problem types.

Max-Set of Gaussians (MSG) [3] is a randomised landscape generator that specifies test problems as the maximum of a set of weighted Gaussian functions. By specifying the number of Gaussians and the mean and covariance parameters for each component, a variety of test landscape instances can be generated. The topological properties of the landscapes are intuitively related to (and vary smoothly with) the parameters of the generator.
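As a concrete illustration (a minimal sketch, not the authors' generator code: the component placement, covariance sampling and height cap shown here are simplified assumptions), an MSG-style landscape can be evaluated as the maximum over a set of weighted Gaussian components:

```python
import numpy as np

def make_msg_landscape(n_components, dim=2, local_max=0.5, rng=None):
    """Random landscape: the max over Gaussian bumps, one scaled to height 1.0."""
    rng = np.random.default_rng() if rng is None else rng
    means = rng.uniform(-1.0, 1.0, size=(n_components, dim))
    covs = []
    for _ in range(n_components):
        a = rng.uniform(-0.5, 0.5, size=(dim, dim))
        covs.append(a @ a.T + 0.1 * np.eye(dim))  # random SPD covariance
    # the first component is the global optimum; local optima are capped at local_max
    heights = np.concatenate(([1.0], rng.uniform(0.0, local_max, n_components - 1)))

    def f(x):
        # each bump is an unnormalised Gaussian scaled so its peak equals its height
        vals = [h * np.exp(-0.5 * (x - m) @ np.linalg.solve(c, x - m))
                for m, c, h in zip(means, covs, heights)]
        return max(vals)

    return f

f = make_msg_landscape(10, rng=np.random.default_rng(0))
print(f(np.zeros(2)))  # some value in (0, 1]
```

Because every component's peak value equals its height and local heights are capped below the global height, the global optimum value is known by construction, which is what makes such generators convenient for performance measurement.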

Langdon and Poli use Genetic Programming (GP) to evolve landscapes for the evaluation and comparison of metaheuristics [2]. Individuals in the GP are candidate landscapes, represented and evolved as 2D polynomial functions. The fitness function for the GP is the performance difference between two specified algorithms that are run on a landscape. Consequently, landscapes found by the GP are optimization problems where one of the algorithms significantly outperforms the other. The results show that considerable new insights can be gained into the behaviour of the algorithms tested and their parameter settings. The methodology is generally applicable to compare other metaheuristic optimization algorithms.

An interesting possibility is to combine the advantages of a randomised landscape generator with an active search for landscapes that maximise the performance difference between algorithms. This approach allows greater control over the types of landscapes generated through the parameterisation of the MSG generator, compared to using a GP to evolve arbitrary polynomial functions. Experiments can be conducted while systematically and incrementally varying the landscape parameters. If a parameterisation is found that produces a significant performance difference between two algorithms, a large number of problem instances can be generated with known topological features for analysis and further experimentation.

3 Extending the Landscape Generator to Incorporate Ridge Structure

In this paper we consider 2D continuous optimization problems:

max f(x), x = (x1, x2) ∈ ℝ², f : ℝ² → ℝ

A symmetric boundary constraint is implemented such that x ∈ [−1, 1]² by rejecting any search points generated by an algorithm that lie outside the feasible region.
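That boundary handling can be sketched as a simple rejection loop (illustrative only; `propose` stands in for whichever sampling operator an algorithm uses, and resampling until feasible is one possible interpretation of "rejecting" infeasible points):

```python
import numpy as np

def sample_feasible(propose, lower=-1.0, upper=1.0, max_tries=1000):
    """Repeatedly draw candidates and reject any outside [lower, upper]^2."""
    for _ in range(max_tries):
        x = propose()
        if np.all(x >= lower) and np.all(x <= upper):
            return x
    raise RuntimeError("no feasible point generated")

rng = np.random.default_rng(1)
x = sample_feasible(lambda: rng.normal(0.0, 1.5, size=2))
print(x)  # a point inside [-1, 1]^2
```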

3.1 Constructing Linear Ridges in Randomized Landscapes

Many real-world optimization problems are defined over variables with significant dependency relationships. This suggests objective fitness function landscapes with correlation structure or ridges in their contours. In [3], parameterisations of the MSG generator are described that generate localised dependencies, with peaks either uniformly distributed around the space or in a "big valley" structure. However, these rarely lead to global dependency structure and cannot be controlled directly from the generator parameters. We propose an extension to the MSG generator that incorporates linear ridge structure as follows. In two dimensions, a line through the search space is given by

ax1 + bx2 + c = 0 (1)

Our aim is to generate linear ridges positioned randomly in the search space, with a random angle to the coordinate axes. A ridge can be formed by positioning a number of Gaussian components such that their means are distributed along a line. This can be done by generating two points along the bounds of the search space, where each point is along a different boundary. The two points are then used to solve Eqn. 1 for a, b and c. Then, the mean points of the n Gaussians are determined by generating values of x1 from U[−1, 1] and using Eqn. 1 to find the respective values of x2.

The orientation (rotation angle) of each Gaussian component on the ridge is determined via its covariance structure. Firstly, an orientation angle is randomly generated. This would impose a homogeneous structure on the ridge, with every local peak at the same orientation (between completely aligned with, or orthogonal to, the linear ridge). To add further variety, the orientation of each component is subsequently adjusted by a small amount of noise. Examples of ridge landscapes resulting from this method are shown in Fig. 1.
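The mean-placement step can be sketched as follows (hypothetical helper names; the real generator presumably also handles near-vertical lines and means that fall outside the feasible region, which this sketch does not):

```python
import numpy as np

def boundary_point(side, rng):
    """A random point on one edge of the square [-1, 1]^2."""
    t = rng.uniform(-1.0, 1.0)
    return {"left": (-1.0, t), "right": (1.0, t),
            "bottom": (t, -1.0), "top": (t, 1.0)}[side]

def ridge_means(n, rng):
    # two points on different boundaries define the ridge line
    s1, s2 = rng.choice(["left", "right", "bottom", "top"], size=2, replace=False)
    (p1x, p1y), (p2x, p2y) = boundary_point(s1, rng), boundary_point(s2, rng)
    # line a*x1 + b*x2 + c = 0 through the two points (Eqn. 1)
    a, b = p2y - p1y, p1x - p2x
    c = -(a * p1x + b * p1y)
    # sample x1 ~ U[-1, 1] and solve for x2 (assumes b != 0, i.e. line not vertical)
    x1 = rng.uniform(-1.0, 1.0, size=n)
    x2 = -(a * x1 + c) / b
    return np.column_stack([x1, x2])

means = ridge_means(5, np.random.default_rng(2))
print(means)  # five collinear component means
```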


Fig. 1. Example ridge-structured landscape instances from the extended MSG generator

3.2 Illustrative Experiments

To examine the effect of ridge structure on algorithm performance, we compared the Direct algorithm [6], UMDAc (see Sec. 4) and an implementation of Simulated Annealing on random (i.e. with component mean values uniformly distributed in the feasible search space), big valley and ridge structured landscapes over a varying number of components. For each number of components, we ran these algorithms on 30 problem instances, with 30 random restarts per instance. Fig. 2(a) shows the average performance difference between Direct and UMDAc over restarts for all problem instances. We see that Direct and UMDAc perform quite similarly on big valley, but quite differently on random landscapes. On ridge landscapes, the difference is somewhere in between. Performance is also not strongly related to the number of Gaussian components, except perhaps when the number of components equals 1. In this case, the difference is consistently very small for big valley landscapes, since the global peak will be biased towards the centre of the search space. This is not true for random and ridge landscapes.

Fig. 2(b) shows the performance difference between Direct and Simulated Annealing. There is some concentration of points close to zero performance difference, that is, trials where the two algorithms performed almost identically (e.g. both found the global optimum). However, a larger fraction of the results is distributed around a performance difference value of approximately 0.6. Not surprisingly, this is strongly related to the structure of the generated landscapes. The generator includes specification of a threshold between the maximum height of local optima (for these results 0.5) and the height of the global optimum (1.0). This threshold will appear in the results of many different algorithms for these landscapes, as most algorithms tend to converge to either the global or a local optimum.

4 Comparing EMNA and UMDAc in Terms of Landscape Dependency Structure

[Figure 2: two columns of three scatter plots; x-axis "Number of Gaussian Components" (0–50), y-axis "Fitness Difference" (0–1). (a) Direct vs UMDAc; (b) Direct vs Simulated Annealing.]

Fig. 2. Performance comparison results with varying numbers of components in the landscapes generated. Top: random landscapes; Middle: ridge landscapes; Bottom: "big valley" landscapes.

EDAs are a class of metaheuristic optimization algorithms that build and use a probabilistic model to direct the search process [7, 8]. For continuous problems, the most commonly used model is a Gaussian or Normal distribution with specified covariance structure. The continuous Univariate Marginal Distribution Algorithm (UMDAc) uses a diagonal covariance matrix, corresponding to a factorised product of univariate Normal distributions. The Estimation of Multivariate Normal (EMNA) algorithm uses a full covariance matrix corresponding to an unrestricted multivariate Normal distribution [7].
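The modelling difference between the two algorithms amounts to a single line in a generation update. A minimal sketch (maximum-likelihood fit with truncation selection, interpreting the selection threshold as the fraction of the population retained, and omitting any variance repair that actual implementations may use):

```python
import numpy as np

def eda_generation(pop, fitness, full_cov, tau=0.8, rng=None):
    """One EDA generation: select the best tau-fraction, fit a Gaussian, resample.

    full_cov=True  -> EMNA-style full covariance (models dependencies)
    full_cov=False -> UMDAc-style diagonal covariance (independent variables)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(pop)
    selected = pop[np.argsort(fitness)[::-1][: int(tau * n)]]  # maximisation
    mu = selected.mean(axis=0)
    cov = np.cov(selected, rowvar=False)
    if not full_cov:
        cov = np.diag(np.diag(cov))  # discard off-diagonal dependency terms
    return rng.multivariate_normal(mu, cov, size=n)

rng = np.random.default_rng(3)
pop = rng.uniform(-1.0, 1.0, size=(50, 2))
fitness = -(pop ** 2).sum(axis=1)  # toy objective with its peak at the origin
new_pop = eda_generation(pop, fitness, full_cov=False, rng=rng)
print(new_pop.shape)  # (50, 2)
```

Setting `full_cov=True` turns the same loop into an EMNA-style update; everything else about the two algorithms compared here is identical.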

One of the major issues that has been explored across EDA research, and has motivated the work in Sec. 3, has been the incorporation of dependency modelling in the probabilistic model of the algorithms. The general assumption is that many real-world optimization problems are defined over variables that have unknown dependency relationships between them. Therefore, a model that has the ability to capture and exploit dependencies between problem variables can be expected to provide good performance on such problems. This argument has been experimentally verified several times in the context of developing new algorithms for both continuous and binary problems. If such a model works well for a given optimization problem, it suggests that there are features present in the fitness landscape that the model is able to fit well, but there are few reported studies that specifically analyse the relationship between landscape properties and dependency modelling in EDAs.

4.1 Experimental Results on Ridge Landscapes

In this section we use the landscape generator described above to evaluate and compare the performance of UMDAc and EMNA. Our assumption is that linear ridges on the landscape result from a very simple and direct dependency relationship between x1 and x2. A set of experiments was carried out as follows. The rotation angle of components in the landscapes (see Sec. 3) was varied between 0 and 45 degrees in increments of 1 degree, with random noise of ±5 degrees. At each angle, 30 randomised landscapes were generated and 30 trials of each algorithm were conducted on each landscape. Each algorithm used a population size of 50, a selection threshold of 0.8, and was run for 50 generations.

Fig. 3 shows the mean fitness difference between UMDAc and EMNA in terms of best fitness values found on each landscape instance. Counter to our assumption and intuition, the results show no obvious trend between the angle of rotation and the fitness difference of the two algorithms. More surprisingly, UMDAc tends to outperform EMNA regardless of the rotation angle of the ridge, with fitness differences concentrated between 0 and 0.1 and skewed in favour of UMDAc (positive values). Our expectation was that the full covariance model of EMNA would be more capable of capturing the dependencies within the ridge-structured landscapes.

[Figure 3: scatter plot; x-axis "Angle of Rotation (Degrees)" (0–45), y-axis "Mean Fitness Difference" (−0.5 to 0.5).]

Fig. 3. Fitness difference between UMDAc and EMNA with varying rotation angle between the ridge and the coordinate axes in the generated landscapes. Fitness difference overall is not correlated with the angle of the ridge. Points tend to be distributed with fitness difference above 0, indicating that UMDAc actually outperforms EMNA on average across these landscapes.

[Figure 4: nine contour plots over x1, x2 ∈ [−1, 1], with Δf = 0.463, 0.401, 0.390, 0.337, 0.330, 0.313, 0.308, 0.290, 0.274.]

Fig. 4. Example landscape instances where UMDAc outperforms EMNA

One benefit of using a landscape generator is that it is possible to analyse results on specific landscape instances. Each data point in Fig. 3 represents a performance difference between the two algorithms averaged over 30 trials. To further investigate the above results, we selected three samples of the landscape instances from Fig. 3, corresponding to the maximum, minimum and approximately zero fitness differences.

[Figure 5: nine contour plots over x1, x2 ∈ [−1, 1], with Δf = 0.427, 0.362, 0.342, 0.329, 0.294, 0.291, 0.262, 0.258, 0.244.]

Fig. 5. Example landscape instances where EMNA outperforms UMDAc

Fig. 4 shows 9 example landscape instances where UMDAc significantly outperforms EMNA. Ridges in these landscapes tend to be axis-aligned, but exhibit significant variation in height along the top of the ridge. Global peaks are relatively narrow compared to other peaks in the landscape and are strongly positioned toward the boundaries of the search space.

In contrast, Fig. 5 shows landscape instances where EMNA significantly outperforms UMDAc. Ridges in these landscapes are more diagonal than those in Fig. 4, reflecting stronger correlation between x1 and x2. The global peaks in Fig. 5 tend to be highly elliptical, as well as being narrow and located near the boundary of the search space.

Fig. 6 shows landscape instances where the performance of EMNA and UMDAc is almost identical. Global peaks within these landscapes are much more regular than those in Figs. 4 and 5 in the sense that they have wider variance, are less elliptical and are positioned closer to the centre of the search space. Most of the landscapes in Fig. 6 have ridges that are closely aligned to the coordinate axes.

[Figure 6: nine contour plots over x1, x2 ∈ [−1, 1], with Δf = 0.0005, 0.0005, 0.0033, 0.0043, 0.0046, 0.0046, 0.0047, 0.0086, 0.0113.]

Fig. 6. Example landscape instances where UMDAc and EMNA perform almost equally

It is clear from Figs. 4–6 that the performance difference between the algorithms is strongly influenced by identifiable features of the landscape. The main topological features that we have observed above are the regularity, position and orientation of the ridge, as well as the position and orientation of the global peak relative to the ridge and how elliptical it is. It seems likely that a number of factors are responsible for the performance differences observed in the above experiments. The summary of all results in Fig. 3 focuses on a single factor (i.e. orientation) in isolation, but no trend is observed because of the variability in the generated landscape instances contributed by other factors. When these factors are identified and controlled or constrained, clearer performance difference trends may be seen. From Fig. 5, when EMNA outperforms UMDAc, the landscape does tend to have a diagonal ridge, in agreement with our initial assumption. But this is in combination with a global peak that is relatively small, located close to the search space boundary and highly elliptical in a direction orthogonal to the ridge itself. In contrast, the landscapes where UMDAc outperforms EMNA (Fig. 4) also tend to have a small global peak located close to the boundary, but the ridges are closely axis-aligned. Furthermore, the global peak is less elliptical, and the ridge itself is often closer to the boundary.


4.2 Discussion and Related Work

The results above can be related to what is already known about the behaviour of UMDAc and EMNA. It has been shown that the modelling in EMNA does not lead to efficient progress on a linear correlated slope function: the variance of the model extends orthogonally to the direction of increasing fitness (i.e. in the worst possible direction) [9, 10]. This is equally true of UMDAc if the contours of the slope are axis-aligned, but since the UMDAc model cannot completely capture a linear dependence, it should outperform EMNA on a correlated slope. Our general observation that UMDAc tends to outperform EMNA on ridge landscapes (Fig. 3) supports this reasoning.
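That effect is easy to visualise with a maximum-likelihood fit: on a correlated linear slope, truncation selection keeps a band of points along the contours, so the fitted Gaussian's direction of largest variance is orthogonal to the direction of increasing fitness. A small sketch (sample sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
pop = rng.uniform(-1.0, 1.0, size=(2000, 2))
fitness = pop.sum(axis=1)                    # correlated slope f(x) = x1 + x2
selected = pop[np.argsort(fitness)[-400:]]   # truncation selection: top 20%

cov = np.cov(selected, rowvar=False)         # maximum-likelihood full covariance
eigvals, eigvecs = np.linalg.eigh(cov)
principal = eigvecs[:, -1]                   # direction of largest model variance

grad = np.array([1.0, 1.0]) / np.sqrt(2.0)   # direction of increasing fitness
print(abs(principal @ grad))                 # near 0: variance lies along the contours
```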

For univariate or factorisable problems (i.e. uncorrelated variables), UMDAc is also known to converge prematurely on any monotonic or flat function [11, 12, 13, 14], while convergence is fast on a unimodal function when the model is "close enough" to the optimum. In the landscapes generated above, a ridge on the search space boundary is globally similar to a monotonic slope (in the direction orthogonal to the ridge), while a ridge through the centre of the search space is more like a unimodal function along most directions through the search space. In the above experiments we have observed the tendency of both algorithms to become stuck on the side of a ridge close to the boundary (such as Fig. 1(b)). However, in practice the behaviour of the algorithms is affected by the interaction of several different factors: ridge location and rotation, global peak size, orientation (with respect to the ridge direction) and eccentricity (see the discussion above). Additional experiments are required to study the interactions of these factors relative to algorithm performance.
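The premature-convergence result for monotonic functions can likewise be reproduced in a few lines: under truncation selection on the 1D slope f(x) = x, the maximum-likelihood standard deviation shrinks by a roughly constant factor each generation, so the mean advances only a bounded distance before the search stalls (parameters here are illustrative, not those of [11, 12]):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 0.0, 1.0
for generation in range(50):
    pop = rng.normal(mu, sigma, size=100)
    selected = np.sort(pop)[-30:]  # truncation selection on f(x) = x: keep top 30%
    mu, sigma = selected.mean(), selected.std()

print(mu, sigma)  # sigma collapses towards 0 while mu stops after a finite distance
```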

5 Conclusions

This paper proposes a general experimental methodology, combining a randomised landscape generator with an active search for problems that differentiate between algorithms. More specifically, an existing generator was extended to incorporate global ridge dependency structure. This was used to investigate the relationship between problem variables and dependency modelling in simple continuous EDAs. The results give insight into the relationship between algorithm performance and landscape structure. They also indicate that, even for parameterised benchmark functions, the experimental behaviour of metaheuristics is a highly complex function of the properties of the problem landscape.

Although the methodology presented is general, the experiments described above were limited to 2D problems and the algorithmic parameter settings used. Examining higher-dimensional problems and integrating the variation of algorithm parameters are avenues for future work. A longer-term goal of this type of exploratory work is to be able to more precisely categorise or quantify the relationship between landscape structure and algorithm behaviour.


References

[1] Rardin, R.L., Uzsoy, R.: Experimental evaluation of heuristic optimization algorithms: A tutorial. Journal of Heuristics 7, 261–304 (2001)

[2] Langdon, W., Poli, R.: Evolving problems to learn about particle swarm optimizers and other search algorithms. IEEE Transactions on Evolutionary Computation 11(5), 561–578 (2007)

[3] Gallagher, M., Yuan, B.: A general-purpose, tunable landscape generator. IEEE Transactions on Evolutionary Computation 10(5), 590–603 (2006)

[4] Gaviano, M., Kvasov, D., Lera, D., Sergeyev, Y.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Transactions on Mathematical Software (TOMS) 29(4), 469–480 (2003)

[5] MacNish, C.: Towards unbiased benchmarking of evolutionary and hybrid algorithms for real-valued optimisation. Connection Science 19(4), 361–385 (2007)

[6] Jones, D., Perttunen, C., Stuckman, B.: Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications 79(1), 157–181 (1993)

[7] Larranaga, P., Lozano, J.A. (eds.): Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer, Dordrecht (2002)

[8] Pelikan, M., Goldberg, D.E., Lobo, F.: A survey of optimization by building and using probabilistic models. Computational Optimization and Applications 21(1), 5–20 (2002)

[9] Hansen, N.: The CMA evolution strategy: a comparing review. In: Towards a New Evolutionary Computation, pp. 75–102 (2006)

[10] Bosman, P., Grahl, J., Thierens, D.: Enhancing the performance of maximum-likelihood Gaussian EDAs using anticipated mean shift. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 133–143. Springer, Heidelberg (2008)

[11] Gonzalez, C., Lozano, J.A., Larranaga, P.: Mathematical modelling of UMDAc algorithm with tournament selection. Behaviour on linear and quadratic functions. International Journal of Approximate Reasoning (2002)

[12] Grahl, J., Minner, S., Rothlauf, F.: Behaviour of UMDAc with truncation selection on monotonous functions. In: Proc. Congress on Evolutionary Computation (CEC 2005), pp. 2553–2559. IEEE, Los Alamitos (2005)

[13] Yuan, B., Gallagher, M.: A mathematical modelling technique for the analysis of the dynamics of a simple continuous EDA. In: Congress on Evolutionary Computation (CEC), pp. 5734–5740. IEEE, Los Alamitos (2006)

[14] Yuan, B., Gallagher, M.: Convergence analysis of UMDAc with finite populations: a case study on flat landscapes. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 477–482. ACM, New York (2009)