Landscapes, operators and heuristic search

© J.C. Baltzer AG, Science Publishers


Colin R. Reeves

School of Mathematical and Information Sciences, Coventry University, Coventry CV1 5FB, UK

Email: [email protected]

Heuristic search methods have been increasingly applied to combinatorial optimization problems. While a specific problem defines a unique search space, different “landscapes” are created by the different heuristic search operators used to search it. In this paper, a simple example will be used to illustrate the fact that the landscape structure changes with the operator; indeed, it often depends even on the way the operators are applied. Recent attention has focused on trying to better understand the nature of these “landscapes”. Recent work by Boese et al. [2] has shown that instances of the TSP are often characterised by a “big valley” structure in the case of a 2-opt exchange operator, and a particular distance metric. In this paper, their work is developed by investigating the question of how landscapes change under different search operators in the case of the n/m/P/C_max flowshop problem. Six operators and four distance metrics are defined, and the resulting landscapes examined. The work is further extended by proposing a statistical randomisation test to provide a numerical assessment of the landscape. Other conclusions relate to the existence of ultrametricity, and to the usefulness or otherwise of hybrid neighbourhood operators.

Keywords: heuristics, flowshop sequencing

AMS subject classification: Primary 65B15, 65B99; Secondary 65C05

1. Introduction

The metaphor of a landscape is commonly used to aid the understanding of heuristic search methods for solving a combinatorial optimization problem (COP). We can define such problems as follows: we have a discrete search space X, and a function

$$f : X \to \mathbb{R}.$$

The general problem is to find

$$x^* = \arg\min_{x \in X} f,$$

where x is a vector of decision variables and f is the objective function. The vector x* is a global optimum; along with the idea of a landscape is the idea that there are many local optima or false peaks in the objective function, which may have the unfortunate effect of trapping a search algorithm before it can locate the global optimum. The landscape metaphor is a helpful one in some senses, but it can also be dangerous.

Annals of Operations Research 86(1999)473–490

It is tempting to identify the landscape with the search space X: to treat them as if the labels “landscape” and “search space” apply to the same objects. However, the landscape concept only really makes sense in the context of an associated neighbourhood structure, without which the related ideas of local optima have no meaning. A formal treatment of neighbourhood structures is given elsewhere (Höhn and Reeves [4,5]); here, we motivate the ideas by an intuitive approach based on a simple example.

1.1. An example

In practice, a neighbourhood structure is generated by the application of an operator which transforms a given vector x into a new vector x′. For example, if the solution is represented by a binary vector (as is often the case for genetic algorithms, for instance), a simple neighbourhood might consist of all vectors obtainable by complementing one of the bits.

Consider the problem of maximizing a simple function,

$$f(x) = x^3 - 60x^2 + 900x + 100,$$

where the solution x is represented by a vector of length 5, of 0’s and 1’s. By decoding this binary vector as an integer x in the range [0, 31], it is possible to evaluate f. In terms of x there is a single maximum at x = 10, and the “landscape” is a smooth continuous unimodal function in that range; however, the discrete optimization problem obtained by the binary coding turns out to have 4 optima when a “single bit complement” operator (SBC) is used – i.e., a new vector x′ is obtained from x by complementing a single bit. The neighbours of (0 0 0 0 0), for example, would be (1 0 0 0 0), (0 1 0 0 0), (0 0 1 0 0), (0 0 0 1 0) and (0 0 0 0 1). If a “steepest ascent” strategy is used (i.e., from a given vector the best neighbour is identified before a move is made), the local optima and their basins of attraction are as shown in table 1.

On the other hand, if a “next ascent” strategy is used (where the next change which leads uphill is accepted without ascertaining if a better one exists), the basins of attraction are as shown in table 2. In the case of next ascent, the order of searching the vector also affects the landscape. In table 2, the order is “forward” (left-to-right), but if the search is made in the reverse direction (right-to-left), the basins of attraction are different, as shown in table 3.

However, the single-bit complement operator is not the only mechanism for generating neighbours. An alternative neighbourhood could be defined as follows: for k = 1,…,5, complement only bits k,…,5. Thus, the neighbours of (0 0 0 0 0), for example, would be (1 1 1 1 1), (0 1 1 1 1), (0 0 1 1 1), (0 0 0 1 1) and (0 0 0 0 1). This creates a very different landscape. In fact, there is now only a single global optimum (0 1 0 1 0); every vector is in its basin of attraction.
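The counts quoted in this example are small enough to check by enumeration. The following Python sketch is illustrative only and is not taken from the paper: all function names are hypothetical, bit 1 is assumed to be the most significant bit of the decoded integer, and a point is treated as a local optimum when no neighbour is strictly better. It lists the local optima under SBC and under the suffix-complement operator just described, together with the basin sizes under steepest ascent.

```python
from collections import Counter

N_BITS = 5  # vector length used in the example


def f(x):
    """Objective of the example: f(x) = x^3 - 60x^2 + 900x + 100."""
    return x**3 - 60 * x**2 + 900 * x + 100


def sbc_neighbours(x):
    """Single bit complement: flip each bit in turn, leftmost bit first."""
    return [x ^ (1 << k) for k in reversed(range(N_BITS))]


def cx_neighbours(x):
    """For k = 1..5, complement bits k..5, i.e. a suffix of the vector."""
    return [x ^ ((1 << width) - 1) for width in range(N_BITS, 0, -1)]


def local_optima(neighbours):
    """Points that no neighbour strictly improves (maximisation)."""
    return [x for x in range(2**N_BITS)
            if all(f(y) <= f(x) for y in neighbours(x))]


def steepest_ascent(x, neighbours):
    """Repeatedly move to the best neighbour until none is better."""
    while True:
        best = max(neighbours(x), key=f)
        if f(best) <= f(x):
            return x
        x = best


sbc_optima = local_optima(sbc_neighbours)
cx_optima = local_optima(cx_neighbours)
basin_sizes = Counter(steepest_ascent(x, sbc_neighbours)
                      for x in range(2**N_BITS))

print("SBC optima:", [format(x, "05b") for x in sbc_optima])
print("Suffix-complement optima:", [format(x, "05b") for x in cx_optima])
print("Steepest-ascent basin sizes:", dict(basin_sizes))
```

Under these assumptions the sketch reports four SBC optima (the integers 7, 10, 12 and 16) and a single optimum (10) for the second operator, in agreement with the text. The next-ascent variants are not sketched here because the text does not say whether the scan restarts from the first bit after each accepted move.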


Table 1

Local optima and basins of attraction for steepest ascent.

Local optimum 0 1 0 1 0 (4100)
Basin: 0 0 0 0 0, 0 0 0 0 1, 0 0 0 1 0, 0 0 0 1 1, 0 0 1 0 1, 0 1 0 0 0, 0 1 0 0 1, 0 1 0 1 0, 0 1 0 1 1, 0 1 1 0 1, 0 1 1 1 0, 0 1 1 1 1, 1 0 1 0 1, 1 1 0 0 0, 1 1 0 0 1, 1 1 0 1 0, 1 1 0 1 1, 1 1 1 0 1, 1 1 1 1 0, 1 1 1 1 1

Local optimum 0 1 1 0 0 (3988)
Basin: 0 0 1 0 0, 0 1 1 0 0, 1 1 1 0 0

Local optimum 0 0 1 1 1 (3803)
Basin: 0 0 1 1 0, 0 0 1 1 1, 1 0 1 1 0, 1 0 1 1 1

Local optimum 1 0 0 0 0 (3236)
Basin: 1 0 0 0 0, 1 0 0 0 1, 1 0 0 1 0, 1 0 0 1 1, 1 0 1 0 0

Table 2

Local optima and basins of attraction for next ascent (forward search) using SBC operator.

Local optimum 0 1 0 1 0 (4100)
Basin: 0 0 1 0 1, 0 0 1 1 0, 0 1 0 0 1, 0 1 0 1 0, 0 1 0 1 1, 0 1 1 0 1, 0 1 1 1 0, 1 0 1 0 1, 1 0 1 1 0, 1 1 0 0 1, 1 1 0 1 0, 1 1 0 1 1, 1 1 1 0 1, 1 1 1 1 0

Local optimum 0 1 1 0 0 (3988)
Basin: 0 0 1 0 0, 0 1 0 0 0, 0 1 1 0 0, 1 0 1 0 0, 1 1 0 0 0, 1 1 1 0 0

Local optimum 0 0 1 1 1 (3803)
Basin: 0 0 1 1 1, 0 1 1 1 1, 1 0 1 1 1, 1 1 1 1 1

Local optimum 1 0 0 0 0 (3236)
Basin: 0 0 0 0 0, 0 0 0 0 1, 0 0 0 1 0, 0 0 0 1 1, 1 0 0 0 0, 1 0 0 0 1, 1 0 0 1 0, 1 0 0 1 1


Table 3

Local optima and basins of attraction for next ascent (reverse search) using SBC operator.

Local optimum 0 1 0 1 0 (4100)
Basin: 0 1 0 0 0, 0 1 0 0 1, 0 1 0 1 0, 0 1 0 1 1

Local optimum 0 1 1 0 0 (3988)
Basin: 0 1 1 0 0, 0 1 1 0 1, 0 1 1 1 0, 0 1 1 1 1

Local optimum 0 0 1 1 1 (3803)
Basin: 0 0 0 0 0, 0 0 0 0 1, 0 0 0 1 0, 0 0 0 1 1, 0 0 1 0 0, 0 0 1 0 1, 0 0 1 1 0, 0 0 1 1 1

Local optimum 1 0 0 0 0 (3236)
Basin: 1 0 0 0 0, 1 0 0 0 1, 1 0 0 1 0, 1 0 0 1 1, 1 0 1 0 0, 1 0 1 0 1, 1 0 1 1 0, 1 0 1 1 1, 1 1 0 0 0, 1 1 0 0 1, 1 1 0 1 0, 1 1 0 1 1, 1 1 1 0 0, 1 1 1 0 1, 1 1 1 1 0, 1 1 1 1 1

There are two interesting facts about this operator. Firstly, it is in fact closely related to the one-point crossover operator frequently used in genetic algorithms. (For that reason, it has been called [4] the complementary crossover or CX operator.) Secondly, if the 32 vectors in the search space are re-coded using a Gray code, it is easy to show that the neighbours of a point in Gray-coded space under SBC are identical to those in the original binary-coded space under CX. This is an example of an isomorphism of landscapes. More details of the mathematical background to these and similar phenomena can be found in [4,5].
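The Gray-code correspondence can be checked by enumeration over the 32 points. The sketch below is not from the paper: it assumes the standard reflected binary Gray code (the text does not say which Gray code is meant), and the helper names are hypothetical.

```python
N_BITS = 5


def gray_encode(b):
    """Reflected binary Gray code of the integer b."""
    return b ^ (b >> 1)


def gray_decode(g):
    """Inverse of gray_encode: XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def sbc_in_gray_space(x):
    """Encode x, flip one Gray bit, then decode back to the binary coding."""
    g = gray_encode(x)
    return {gray_decode(g ^ (1 << k)) for k in range(N_BITS)}


def cx_in_binary_space(x):
    """Complement a suffix of the binary vector (bits k..5 for k = 1..5)."""
    return {x ^ ((1 << width) - 1) for width in range(1, N_BITS + 1)}


same = all(sbc_in_gray_space(x) == cx_in_binary_space(x)
           for x in range(2**N_BITS))
print("Neighbourhoods coincide for all 32 points:", same)
```

The reason the two sets agree is that flipping a single Gray bit changes that bit and every less significant bit of the decoded binary value, which is exactly a suffix complement.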

1.2. Landscape topology

The above example is not of sufficiently general interest to be pursued here, but it does illustrate very neatly some of the problems associated with trying to understand the nature of a landscape in a COP. Unfortunately, in many instances of a COP, it is not easy to construct the landscape in terms of its underlying representation as a metric space. Rather, a landscape is induced by the particular operator which is used to define the neighbourhoods, from which it may not be easy explicitly to derive the representation. Thus, it is of interest to consider alternative ways of characterizing the nature of the induced landscape.


More generally, the question of what a landscape “looks like” is of some relevance to the way in which it should be searched by a heuristic technique. Recently, Boese et al. [2] have suggested that, in the cases of the travelling salesman problem (TSP) and graph bisection, local optima tend to be relatively close to each other (in terms of a plausible metric), and to the global optimum (where it was known). Thus, for at least some problem landscapes (and possibly many), there is a “big valley” structure, where local optima occur in clusters. If this is indeed the case, it would support the idea of generating new start points for search from a previous local optimum rather than from a random point in the search space – good candidate solutions are usually to be found “fairly close” to other good solutions. This in turn provides the motivation for some of the “perturbation” methods [6,11,15] which currently appear to be among the best available for such COPs as the TSP. It should also be noted that other methods, such as genetic algorithms, and some versions of tabu search (for example, those that use “path relinking” [3]) also implicitly rely on the existence of such a structure.

However, it is not known whether this phenomenon is general, or whether it is specific to the cases previously examined. In extending it to other COPs, some interesting questions arise as to the way in which “closeness” can be measured, and as to how the significance of the observed behaviour can be assessed. In this paper, the well-known n/m/P/C_max flowshop sequencing problem is studied in an attempt to shed further light on this question.

2. Flowshop sequencing

The permutation flowshop sequencing problem n/m/P/C_max is one in which n jobs have to be processed (in the same order) on m machines. The object is to find the permutation of jobs that will minimise the makespan, i.e. the time at which the last job is completed on machine m. It is defined as follows:

Suppose the processing time is p(i, j) for job i on machine j, and the job permutation is denoted by π = {π1, π2, …, πn}. Then the job completion times C(πi, j) are as follows:

$$
\begin{aligned}
C(\pi_1, 1) &= p(\pi_1, 1), \\
C(\pi_i, 1) &= C(\pi_{i-1}, 1) + p(\pi_i, 1) \quad \text{for } i = 2, \dots, n, \\
C(\pi_1, j) &= C(\pi_1, j-1) + p(\pi_1, j) \quad \text{for } j = 2, \dots, m, \\
C(\pi_i, j) &= \max\{C(\pi_{i-1}, j),\, C(\pi_i, j-1)\} + p(\pi_i, j) \quad \text{for } i = 2, \dots, n;\ j = 2, \dots, m, \\
C_{\max} &= C(\pi_n, m).
\end{aligned}
$$

In order to find good solutions to instances of this problem, there are many possible neighbourhood search (NS) operators, each generating a different landscape, and each needing a suitable metric for measuring the distance between solutions. In principle, the distance between two solutions π and π′ on a landscape should be measured by the minimal number of applications of the operator which would convert π into π′. However, in general, no polynomial algorithm is known for calculating this number.
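The completion-time recurrence that defines the makespan translates directly into a short routine. The following Python sketch is an illustration under stated assumptions rather than code from the paper: jobs and machines are indexed from 0, `p[i][j]` holds the processing time of job i on machine j, and the 3-job, 2-machine instance is made up for the example.

```python
from itertools import permutations


def makespan(p, perm):
    """C_max of the job order `perm`, with p[i][j] = time of job i on machine j."""
    m = len(p[0])
    prev = [0] * m  # completion times C(pi_{i-1}, .) of the previous job
    for job in perm:
        cur = [0] * m
        for j in range(m):
            machine_free = prev[j]               # C(pi_{i-1}, j), 0 for the first job
            job_ready = cur[j - 1] if j else 0   # C(pi_i, j-1), 0 on the first machine
            cur[j] = max(machine_free, job_ready) + p[job][j]
        prev = cur
    return prev[-1]  # completion of the last job on the last machine


# Hypothetical instance, not one of Taillard's benchmarks.
p = [[3, 2], [1, 4], [2, 1]]
print(makespan(p, (0, 1, 2)))

# Exhaustive search is only feasible for tiny n; it is shown here as a check.
best = min(permutations(range(len(p))), key=lambda perm: makespan(p, perm))
print(best, makespan(p, best))
```

Using zero as the completion time "before" the first job and the first machine lets a single max expression cover all four cases of the recurrence.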

There are a number of surrogate distance metrics that could be used in such cases. In this paper, four metrics have been used:

The adjacency metric is found simply by counting the number of times, n_adj, a pair i, j of jobs is adjacent in both π and π′. The distance is then measured by n − n_adj − 1. This is the uni-directional version; we may relax the adjacency requirement to cover the case of either i, j or j, i being present, thus obtaining a similar bi-directional adjacency metric.

The precedence metric tries to be a little more sophisticated. Rather than simply looking at adjacencies, the number of times job j is preceded by job i in both π and π′ is counted. To get a “distance”, this quantity is subtracted from n(n − 1)/2.

The position-based metric takes the principle one stage further, by comparing the actual positions in the sequence of job j in each of π and π′. For a sequence π, we can define the “inverse” permutation σ, where the position of job πi is given by σ_{πi} = i. The position-based metric is then just

$$\sum_{j=1}^{n} |\sigma_j - \sigma'_j|.$$

We thus have four metrics to use as surrogates for the “true” metric, which of course depends on the particular operator being used. Boese et al. [2] used an adjacency-based metric, which seems reasonable in the context of the TSP, where it is relative order that counts, rather than absolute position in the sequence. Which one is the “best” approximation to the true one in the general case is difficult to judge, but we might expect that for the flowshop problem, the order in which they have been listed above is in increasing order of their effectiveness at discriminating between different local optima. While adjacency is an important property in solving the TSP, as Boese et al. [2] have argued, it is unlikely to be so relevant in this case, where the actual position of a job in the sequence is much more likely to be important.
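One possible reading of the four metric definitions is sketched below in Python. The function names and the 0-based job labels are assumptions made for illustration; sequences are taken to be permutations of the same set of jobs.

```python
from itertools import combinations


def adjacency_distance(a, b, directed=True):
    """n - n_adj - 1, where n_adj counts adjacent pairs common to a and b.

    directed=True is the uni-directional version (i immediately before j);
    directed=False also accepts the pair in the opposite order.
    """
    def pairs(seq):
        return {(x, y) if directed else frozenset((x, y))
                for x, y in zip(seq, seq[1:])}
    n_adj = len(pairs(a) & pairs(b))
    return len(a) - n_adj - 1


def precedence_distance(a, b):
    """n(n-1)/2 minus the number of pairs (i, j) with i before j in both."""
    n = len(a)
    pos_b = {job: k for k, job in enumerate(b)}
    # combinations(a, 2) yields each pair (i, j) with i before j in a
    n_prec = sum(pos_b[i] < pos_b[j] for i, j in combinations(a, 2))
    return n * (n - 1) // 2 - n_prec


def position_distance(a, b):
    """Sum over jobs of |position in a - position in b|."""
    pos_a = {job: k for k, job in enumerate(a)}
    pos_b = {job: k for k, job in enumerate(b)}
    return sum(abs(pos_a[job] - pos_b[job]) for job in a)


a, b = (0, 1, 2, 3), (3, 2, 1, 0)
print(adjacency_distance(a, b), adjacency_distance(a, b, directed=False),
      precedence_distance(a, b), position_distance(a, b))
```

For a sequence and its reversal, the bi-directional adjacency distance is zero while the other three distances are large, which illustrates how differently the metrics can separate the same pair of solutions.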

2.1. NS operators for sequencing

Several operators have been proposed in previous work on permutation or sequencing problems. Here, we investigate six ways in which one solution can be transformed into another. All of these, with the possible exception of inversion, have frequently been used in applications to flowshop sequencing.

Adjacent pairwise exchange (APEX) is perhaps the simplest: the positions of two adjacent jobs {πi, πi+1} are exchanged.

Inversion (INV) is the selection of a sub-vector of π, say πr,…, πs, and reversing its order.

Exchange (EX) is an obvious generalization of APEX: all pairs of jobs are eligible for exchange of positions, not just adjacent ones.

Forward shift (FSH) takes a job πi from its current position and inserts it after job πj (where j > i).

Backward shift (BSH) is like FSH, but with the sense reversed: job πi is removed from its current position and inserted before job πj (where j < i).

Double shift (DSH) is the combination of FSH and BSH.

In the case of APEX, each sequence has (n − 1) neighbours; for DSH each sequence has n(n − 1) neighbours. In the other cases, there are n(n − 1)/2 neighbours of a given sequence. We will refer to this as the size of the operator.
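The six operators and their sizes can be illustrated by generating every move from a given sequence. This Python sketch is a hypothetical rendering of the verbal definitions, not the author's implementation; sequences are tuples, positions are 0-based, and each function returns one result per move.

```python
def apex(seq):
    """Adjacent pairwise exchange: swap positions i and i+1."""
    return [seq[:i] + (seq[i + 1], seq[i]) + seq[i + 2:]
            for i in range(len(seq) - 1)]


def ex(seq):
    """Exchange: swap the jobs in any two positions i < j."""
    out = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            s = list(seq)
            s[i], s[j] = s[j], s[i]
            out.append(tuple(s))
    return out


def inv(seq):
    """Inversion: reverse the sub-vector from position r to position s."""
    return [seq[:r] + seq[r:s + 1][::-1] + seq[s + 1:]
            for r in range(len(seq)) for s in range(r + 1, len(seq))]


def fsh(seq):
    """Forward shift: the job at i is re-inserted just after the job at j > i."""
    return [seq[:i] + seq[i + 1:j + 1] + (seq[i],) + seq[j + 1:]
            for i in range(len(seq)) for j in range(i + 1, len(seq))]


def bsh(seq):
    """Backward shift: the job at i is re-inserted just before the job at j < i."""
    return [seq[:j] + (seq[i],) + seq[j:i] + seq[i + 1:]
            for i in range(len(seq)) for j in range(i)]


def dsh(seq):
    """Double shift: all forward and all backward shifts."""
    return fsh(seq) + bsh(seq)


s = (0, 1, 2, 3, 4)
for name, op in [("APEX", apex), ("EX", ex), ("INV", inv),
                 ("FSH", fsh), ("BSH", bsh), ("DSH", dsh)]:
    print(name, len(op(s)))
```

The lengths printed are counts of moves and match the sizes quoted in the text. For DSH, a forward shift by one position and the corresponding backward shift produce the same sequence, so the number of distinct neighbours is smaller than the number of moves.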

There are some interesting relationships among these operators. It is clear that APEX is the weakest, in that any sequence which is optimal with respect to any of the other operators is also optimal with respect to APEX. In that sense, we could say that APEX is subsumed by the others. Similarly, DSH subsumes FSH and BSH. However, neither INV nor EX is subsumed by any other operator, although it is probable that on the average DSH would produce better solutions, given that its neighbourhood is twice as large.

Apart from these obvious facts, any investigation of the relative efficacy of these operators in a particular case will necessarily be empirical. The type of questions that are interesting are: whether one operator of the same size tends to outperform another; how often a local optimum with respect to one operator is also locally optimal with respect to another; and what sort of “landscape” is induced by each operator.

3. Landscape analysis

In order to try to better understand what the landscape of such problems looks like, each operator was applied 50 times from different random initial vectors to a number of problem instances of various sizes. This resulted in 50 distinct local optima in each case – the fact that no more than 50 initial start points were needed is a point which we shall return to at a later stage.

3.1. Distance correlations

Apart from measuring distances of local optima from a given point in terms of the four metrics introduced above, it is also possible to compare the local optima in terms of their “distances” as measured by their objective function values. If the results of Boese et al. [2] carry over to the flowshop problem, we would expect to find that these distances are in some sense correlated with each other. This can be examined in any particular case by plotting a graph of local optima relative to the global optimum in terms of one of the distance metrics against distance in terms of objective function values. Figure 1 provides an example of such a graph for 50 local optima generated by repeated restarts of a next descent procedure using the first of the 20-job, 5-machine set of problem instances generated by Taillard [13]. These problem instances were chosen for investigation partly because they have become well-known cases of flowshop sequencing, but also because the global optima are known (see footnote 1) for most, if not all, of the smaller instances, thus providing a fixed reference point in each case.

The first graph appears to confirm that a relationship such as that found by Boese et al. [2] for the TSP also exists in this case of the flowshop sequencing problem – local optima seem to be “close” to each other, which would motivate the development of adaptive techniques similar to those proposed in [2]. In contrast, such a relationship looks far less likely for the second case (figure 2), where the metric used is much less effective at discriminating between local optima.

However, it is clearly impossible to plot such a graph for every operator, metric and problem instance. What is needed in order to assess any particular case is a simple measure of the relationship between the local optima. We would like to know whether such a relationship exists or not, and if so, how significant it is.

1) The author is indebted to Vaessens (personal communication) for making his results on global optima for these instances available.

Figure 1. 50 local optima plotted in terms of their distances from a global optimum (x-axis), against their relative objective function values (y-axis). The operator used was forward shift, the strategy was next ascent, each repeated from 50 different initial random start points, which led to 50 distinct local optima. In this case, the metric used is position-based.


Figure 2. A second example, where the metric used is uni-directional adjacency-based.

At first sight, it might appear that a simple measure would be the correlation coefficient computed from the corresponding entries in the two distance matrices found from the distance metric and the objective function values, respectively. Such a value can of course easily be calculated, but interpreting its significance would be difficult, as the sample clearly does not consist of independent random samples (for example, if local optima A and B are close in terms of their objective function values, and B is also close to C, then so are A and C). In these circumstances, we cannot carry out a standard hypothesis test of the correlation coefficient. Fortunately, an alternative strategy is available. This type of problem has been studied in other branches of science, for example in psychology and biology [10], where it has been approached by using a randomization test [9]. This is carried out by repeatedly permuting the labels of the items in one distance matrix, and re-calculating the correlation coefficient. If this procedure is repeated many (e.g. 1000) times, the fraction of replications in which a correlation coefficient is found that is more extreme than the originally calculated value can be used as an estimated significance level, and the relevance of the value found can be assessed.
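The randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the author's program: it assumes the Pearson correlation over the upper triangles of two symmetric distance matrices, and it reads "more extreme" as "at least as large as the observed value" (a one-sided test). The function names and the small example data are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def upper(d, order):
    """Upper-triangle entries of d after relabelling the items by `order`."""
    n = len(order)
    return [d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def randomization_test(d1, d2, reps=1000, seed=0):
    """Return (observed correlation, estimated one-sided significance level)."""
    rng = random.Random(seed)
    labels = list(range(len(d1)))
    x = upper(d1, labels)
    r_obs = pearson(x, upper(d2, labels))
    hits = 0
    for _ in range(reps):
        perm = labels[:]
        rng.shuffle(perm)  # permute the item labels of one matrix only
        if pearson(x, upper(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / reps


# Hypothetical data: d2 is an exact multiple of d1, so the correlation is 1.
pts = [0, 1, 3, 7, 12, 20, 33, 54]
d1 = [[abs(a - b) for b in pts] for a in pts]
d2 = [[2 * v for v in row] for row in d1]
r, p = randomization_test(d1, d2, reps=200)
print(round(r, 3), p)
```

Rows and columns of the second matrix are permuted together, which preserves the dependence structure within that matrix while breaking any association between the two.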

For example, in the two cases described in the above scatter plots, the calculated correlation coefficients were 0.545 and 0.120. While the second of these does not look very impressive, in fact both are statistically significant at the 0.1% level, on the basis of 1000 replications in a randomization test.

A randomization test was carried out for each operator, for each metric and for all the 20/5, 20/10, 20/20, 50/5 and 50/10 sets of problem instances defined by Taillard [13]. For all of these cases, the global optimum is now known. The results are displayed in table 4.


Table 4

Significance of correlations: each cell records the number of P-values (out of 10) that were significant at the 1% level on a randomization test.

                    Metric
Operator    Adj-1    Adj-2    Prec    Posn

20/5 problem instances

APEX        1        10       10      10
EX          0        0        8       9
INV         0        8        9       10
FSH         0        8        9       10
BSH         0        6        7       8
DSH         0        0        2       7










It is clear from this table that for the precedence- and position-based metrics, there is nearly always a strong relationship between the distances of the solutions from each other and their corresponding objective function values. This is less clear in the case of the adjacency-based metrics. These results are in line with what was expected, as discussed above. There also appear to be some interesting interactions between metric and operator: the local optima on the APEX landscape, in the case of the larger problems, seem to be much less well correlated with the inter-optima distances for the more sophisticated metrics.

Nevertheless, overall there appear to be significant correlations between local optima generated by these operators, whatever metric is used to provide the meaning of “distance”. Thus, the argument put forward by Boese et al. [2] for the existence of “big valleys” in the TSP seems also to be valid in the context of flowshop sequencing.

3.2. Ultrametricity

Another issue is the existence or otherwise of an ultrametric relationship between local optima. Mathematically, an ultrametric is a distance measure d(·,·) that satisfies the usual requirements for a metric, except that the triangle inequality

$$d(x, z) \le d(x, y) + d(y, z)$$

is replaced by the stronger condition

$$d(x, z) \le \max(d(x, y), d(y, z)).$$

(An ultrametric can thus be thought of as one where every “triangle” between three points is either equilateral or isosceles with an acute included angle.) This has been suggested by a number of authors [1,8] for some combinatorial optimization problems, although evidence for it seems inconclusive. As Baldi and Baum [1] point out, were such a relationship to exist for a given problem class, it would imply a hierarchical relationship between local optima – a structure which could be exploited in devising an algorithm for the solution of such problems. The number of sets of three local optima for which ultrametricity is the case can be computed fairly easily for a given instance.
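Counting ultrametric triples is a direct application of the strong inequality. The Python sketch below is an illustration, not the author's code. It assumes a symmetric distance matrix and counts a triple as ultrametric when its two largest pairwise distances are equal, which is equivalent to the strong inequality holding for every labelling of the three points. The tolerance parameter and the two small matrices are hypothetical.

```python
from itertools import combinations


def ultrametric_fraction(d, tol=1e-12):
    """Fraction of point triples whose pairwise distances are ultrametric."""
    triples = list(combinations(range(len(d)), 3))
    count = 0
    for i, j, k in triples:
        _, mid, largest = sorted((d[i][j], d[j][k], d[i][k]))
        # The strong inequality holds for all three labellings exactly
        # when the two largest sides of the triangle are equal.
        if largest - mid <= tol:
            count += 1
    return count / len(triples)


# Two clusters {0, 1} and {2, 3}: every triangle is isosceles with a short base.
clustered = [[0, 1, 2, 2],
             [1, 0, 2, 2],
             [2, 2, 0, 1],
             [2, 2, 1, 0]]

# Four equally spaced points on a line: no triangle is ultrametric.
line = [[abs(a - b) for b in range(4)] for a in range(4)]

print(100 * ultrametric_fraction(clustered), 100 * ultrametric_fraction(line))
```

Multiplying the fraction by 100 gives a percentage of ultrametric “triangles” for one set of local optima.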

These calculations were carried out for all the 20/5, 20/10, 20/20, 50/5 and 50/10 sets of problem instances defined by Taillard [13]. For all of these cases, the global optimum is now known. The results are displayed in table 5.

Ultrametricity seems to be an unlikely phenomenon under the more sophisticated metrics – those which we expect to be more representative of the underlying landscape in the case of flowshop sequencing. Even for the adjacency-based metrics and the APEX operator, there are never more than 50% of ultrametric triangles. The tentative evidence adduced in its favour in earlier work on the TSP may perhaps be attributed to the fact that the operator used was fairly weak (2-opt), while the metric used was adjacency-based.


Table 5

Ultrametricity: each cell records the average percentage of ultrametric “triangles” of local optima.

                    Metric
Operator    Adj-1    Adj-2    Prec    Posn

APEX        46.3     32.6     2.89    4.51
EX          38.9     29.3     3.71    5.27
INV         42.3     30.3     3.27    4.64
FSH         39.2     30.0     3.39    5.05
BSH         37.6     29.0     3.68    5.39
DSH         34.3     27.3     4.12    5.81










3.3. Dependencies between operators

As has been observed in the introduction, a local optimum on one landscape (i.e. with respect to one operator) is not necessarily a local optimum on another landscape. In some cases, where a weaker operator is subsumed by a stronger, it is obvious that a local optimum with respect to the stronger cannot be improved by the operator which it subsumes. Thus the local optima of the DSH landscape are all local optima on the APEX, FSH and BSH landscapes.

What is interesting from an empirical viewpoint is how often a local optimum on one landscape is capable of being improved on another. Thus, in the experiments reported above, each time a local optimum was reached on one landscape, it was used as a start point on one of the other landscapes in an attempt to improve it. Statistics were computed for each of the 50 local optima generated, and the results are summarized in tables 6 and 7.

There are a few points of note in these tables: firstly, the surprising fact that some (albeit very few) of the APEX-optima could not be improved by the other operators, although the average improvement was of the order of 10–20%. Taken together, this suggests that some of the local optima in the other landscapes are not particularly high-quality. Although the same size as EX, FSH and BSH, INV appears to be inferior to them. It is also interesting that overall, EX improves FSH and BSH more than the converse. The dominance of the DSH operator is as expected, although even here, it is possible for EX, in particular, to find an improved solution.

4. Multiple global optima

A possible objection to the methods and results obtained above is that all distances are measured relative to a single global optimum. However, it would be reasonable to assume that instances of the flowshop sequencing problem might give rise to multiple global optima. In fact, in some cases it is quite easy to generate alternative globally optimal solutions, especially in the case of the makespan objective, by applying a neighbourhood operator to a known global optimum and looking for an improvement of zero in the objective function. However, such solutions are necessarily very close to the first global optimum, and it seems safe to assume that for alternative global optima generated in this way, the conclusions reached above are likely to hold.

However, it is possible that some global optima might be relatively distant from each other. If this were so, the relative positions of the local optima generated by the above operators might also be affected. In an attempt to explore this question further, we adapted a program from Yamada’s work on the job-shop problem [14] and used it to generate 2548 distinct global optima to the first 20-job, 5-machine problem of Taillard’s benchmark set [13]. There are almost certainly many more than this – these were generated in one overnight run on a Sun SparcStation. On examining these solutions in greater detail, it was clear that many of them were extremely close to each other, and to the original global optimum used in section 3, in terms of all four metrics. Thus the conclusions reached above regarding the 50 local optima were intact.

Table 6

Dependencies between local optima; each cell of the table records the average number of trials (out of 50) that method B was able to improve method A for Taillard’s problem instances.

                                Method B
Method A    APEX    EX      INV     FSH     BSH     DSH

APEX        0       49.2    47.8    48.2    49.4    49.8
EX          0       0       3.0     17.7    15.9    29.3
INV         0       22.2    0       23.3    28.8    38.9
FSH         0       28.1    17.0    0       31.6    31.6
BSH         0       24.5    16.5    30.8    0       30.8
DSH         0       9       5.2     0       0       0

APEX        0       49.8    47.7    49.9    49.7    50.0
EX          0       0       5.7     27.2    31.5    40.9
INV         0       31.3    0       33.9    40.8    45.1
FSH         0       29.3    13.6    0       37.7    37.7
BSH         0       28.6    17.2    38.6    0       38.6
DSH         0       12.7    6.7     0       0       0

APEX        0       49.9    49.7    49.9    50.0    50.0
EX          0       0       4.2     17.1    21.8    33.5
INV         0       31.3    0       29.2    32.1    42.3
FSH         0       34.2    19.6    0       37.5    37.5
BSH         0       32.5    22.1    38.8    0       38.8
DSH         0       7       2.7     0       0       0

Table 7

Dependencies between local optima; each cell of the table records the average percentage improvement of method B over method A for Taillard’s problem instances.

                                Method B
Method A    APEX    EX      INV     FSH     BSH     DSH

Nevertheless, there were still some global optima that were further apart, at least in terms of the precedence- and position-based metrics. (The adjacency-based metrics, as earlier, did not discriminate very effectively between different solutions, and were ignored.) After some effort, we were able to identify a “most widely separated set” S of global optima from the complete set of 2548. The average distance from the original global optimum of the 50 local optima generated by DSH was first calculated (D) for each metric. We then applied the criterion that the global optima belonging to the set S should be at least as far from each other as D (for each metric). Unfortunately, there were no members of S in this case! Clearly, all of these 2548 global optima were much closer to each other than they were to the average of the 50 local optima.

We found it necessary to relax this criterion to 0.4D before a set of any size was found. With this value, set S had 13 members. We then measured the distances of each of these global optima from each of the 50 local optima using both the precedence- and position-based metrics. It was immediately apparent that, while the actual distances from the different global optima varied by roughly ±10%, those local optima that were close to one global optimum were likely to be close to all the others. Conversely, those local optima that were far from one global optimum were far from them all. In fact, there appeared to be considerable agreement between the implied rankings of the local optima relative to the different global optima at all distances.

Whether this apparent agreement was significant was tested using Kendall’s coefficient of concordance W [7]. For every operator, and for both precedence- and position-based metrics, the P-value was zero to at least 8 places of decimals; in other words, the hypothesis of no agreement between the different rankings was always decisively rejected.
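For reference, the coefficient of concordance can be computed from the rank sums. The Python sketch below uses the classical formula without a correction for ties, W = 12S / (m²(n³ − n)), where m is the number of rankings, n the number of ranked items and S the sum of squared deviations of the rank sums from their mean. It is a hedged illustration only: the paper does not say how ties were handled, the function name is hypothetical, and the significance test is not reproduced here.

```python
def kendalls_w(rankings):
    """Kendall's W for m rankings (ranks 1..n, no ties) of the same n items."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean = m * (n + 1) / 2  # expected rank sum under no agreement
    s = sum((t - mean) ** 2 for t in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four items.
agree = [[1, 2, 3, 4]] * 3                 # three identical rankings
oppose = [[1, 2, 3, 4], [4, 3, 2, 1]]      # two exactly opposed rankings
print(kendalls_w(agree), kendalls_w(oppose))
```

In the setting of the text, each of the 13 global optima in S would supply one ranking of the 50 local optima by distance, so m = 13 and n = 50. A value of W near 1 indicates that the rankings largely agree.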

The implication of this analysis is clear: conclusions regarding distances from one global optimum for this instance can safely be maintained for other global optima, even for ones that are relatively quite far apart. There is also an interesting insight here into the nature of “landscape” in this instance, which points to possible dangers in our use of the metaphor. In our experience of a three-dimensional world, the picture conjured up by the term “big valley” is one where a global optimum is surrounded by local optima of progressively increasing average quality as they approach that global optimum.

However, from the analysis of this case, the members of one set of quite widely-separated points (the global optima) all bear very nearly the same relationship to all the members of another set (the local optima). It is probable, therefore, that the regions inhabited by the two sets of points are actually quite sharply divided. Perhaps the “valley” in which the local optima lie does not lead inexorably towards the set of global optima, but comes up against a “barrier” that separates the two groups.

This shows an inherent danger in applying the term “landscape” as a description of what happens in combinatorial optimization. The idea of a “big valley” is a helpful metaphor, but we should not imagine it is like a valley in three dimensions.

There is need for some caution here. This analysis relates to just one problem instance, and it cannot necessarily be extended to all instances. However, when the same approach was attempted for some of Taillard’s other benchmarks, it proved impossible to generate more than one global optimum other than those that were merely trivial modifications of the global optimum used in section 3. In view of the computational effort already expended, we decided to abandon the attempt to verify the findings on another problem instance. Possibly, Taillard’s first 20/5 problem is atypical both in the number of global optima it possesses, and in the relatively large distances between them. At least we can say that the analysis of this one problem instance has shown no reason to doubt the conclusions reached earlier.

5. Conclusion

We have outlined an approach to the analysis of landscapes induced by somespecific operators in the context of flowshop sequencing. We have discussed somesuitable metrics for representing inter-sequence distances, and shown how the signifi-cance of distance correlations can be measured by a randomization test. (How wellthese metrics approximate the “true” distances between sequences is currently thesubject of further experimentation. Preliminary results [12] suggest that both theprecedence- and postion-based metrics are very good approximations.) Using thesemetrics, we have observed that a “big valley” structure does appear to exist for suchproblems as investigated here, which suggests the motivation for perturbation ap-proaches is well-founded, and provides a sound explanation for the good performanceof these and other methods.

We also considered the question of whether these results might relate only to the one specific global optimum used in each case. The analysis of a case where multiple global optima were common found no reason that they should not be generalized. This analysis also showed that there is a need for caution in too readily transferring the aspects of familiar three-dimensional landscapes to those generated by neighbourhood search methods in combinatorial optimization.

Finally, we return to the point that was made earlier – we have now seen that in some sense the local optima appear to form a “cluster” in the overall search space, yet 50 distinct re-starts still led to 50 distinct local optima. Clearly, therefore, although the local optima form a restricted subset, it is still a very large one. How many local optima there are is difficult to measure, but it is probably a function that is exponential in the problem size. Thus, to explore a given number of local optima from random re-starts is likely to take much longer than to use perturbation methods or adaptive re-starts. This gives further reason for investigating such methods.
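The contrast between random re-starts and perturbation methods can be sketched as follows. This is only an illustrative outline under stated assumptions: the pairwise-exchange neighbourhood, the first-improvement descent, the "kick" of two random exchanges and the rule of accepting a trial that is no worse are choices made for the sketch, and the processing times in the demonstration are random numbers rather than one of Taillard’s benchmarks.

```python
import random


def makespan(seq, p):
    """Makespan of a permutation flowshop; p[job][machine] is a processing time."""
    m = len(p[0])
    finish = [0] * m  # completion time of the latest job on each machine
    for job in seq:
        finish[0] += p[job][0]
        for k in range(1, m):
            finish[k] = max(finish[k], finish[k - 1]) + p[job][k]
    return finish[-1]


def descend(seq, p):
    """First-improvement descent under pairwise exchange (assumed operator)."""
    seq = list(seq)
    best = makespan(seq, p)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq) - 1):
            for j in range(i + 1, len(seq)):
                seq[i], seq[j] = seq[j], seq[i]
                cost = makespan(seq, p)
                if cost < best:
                    best, improved = cost, True
                else:
                    seq[i], seq[j] = seq[j], seq[i]  # undo the exchange
    return seq, best


def random_restarts(p, n_starts, rng):
    """Descend from independent random sequences; return the distinct optima."""
    found = set()
    for _ in range(n_starts):
        seq = list(range(len(p)))
        rng.shuffle(seq)
        optimum, _ = descend(seq, p)
        found.add(tuple(optimum))
    return found


def perturbation_search(p, n_kicks, rng, kick_size=2):
    """Repeatedly perturb the current local optimum and descend again."""
    n = len(p)
    seq = list(range(n))
    rng.shuffle(seq)
    current, cost = descend(seq, p)
    for _ in range(n_kicks):
        trial = list(current)
        for _ in range(kick_size):  # the "kick": a few random exchanges
            i, j = rng.sample(range(n), 2)
            trial[i], trial[j] = trial[j], trial[i]
        trial, trial_cost = descend(trial, p)
        if trial_cost <= cost:  # keep the trial if it is no worse
            current, cost = trial, trial_cost
    return current, cost


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 10-job, 5-machine instance with random processing times.
    times = [[rng.randint(1, 99) for _ in range(5)] for _ in range(10)]
    optima = random_restarts(times, 20, rng)
    print(len(optima), "distinct local optima from 20 random re-starts")
    print("perturbation search makespan:", perturbation_search(times, 20, rng)[1])
```

Because each kick starts from an existing local optimum, the descent that follows it usually has little work to do, whereas every random re-start must descend from an arbitrary sequence; this is one way of seeing why the perturbation approach may be the cheaper way to sample local optima.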


Acknowledgement

The comments of the anonymous referees are gratefully acknowledged.

References

[1] P. Baldi and E.B. Baum, Caging and exhibiting ultrametric structures, in: Neural Networks for Computing, ed. J. Denker, American Institute of Physics, 1986.

[2] K.D. Boese, A.B. Kahng and S. Muddu, A new adaptive multi-start technique for combinatorial global optimizations, Operations Research Letters 16(1994)101–113.

[3] F. Glover and M. Laguna, Tabu search, in: Modern Heuristic Techniques for Combinatorial Problems, ed. C.R. Reeves, Blackwell Scientific, Oxford, 1993, chapter 3. (Recently re-issued (1995) by McGraw-Hill, London.)

[4] C. Höhn and C.R. Reeves, The crossover landscape for the onemax problem, in: Proceedings of the 2nd Nordic Workshop on Genetic Algorithms and their Applications, ed. J. Alander, University of Vaasa Press, Vaasa, Finland, 1996, pp. 27–43.

[5] C. Höhn and C.R. Reeves, Are long path problems hard for genetic algorithms?, in: Parallel Problem-Solving from Nature – PPSN IV, eds. H.-M. Voigt, W. Ebeling, I. Rechenberg and H.-P. Schwefel, Springer, Berlin, 1996, pp. 134–143.

[6] D.S. Johnson, Local optimization and the traveling salesman problem, in: Automata, Languages and Programming, Lecture Notes in Computer Science 443, eds. G. Goos and J. Hartmanis, Springer, Berlin, 1990, pp. 446–461.

[7] M.G. Kendall, Rank Correlation Methods, Charles Griffin, London, 1962.

[8] S. Kirkpatrick and G. Toulouse, Configuration space analysis of travelling salesman problems, J. Physique 46(1985)1277–1292.

[9] B.F.J. Manly, Randomization and Monte Carlo Methods in Biology, Chapman and Hall, London, 1991.

[10] N. Mantel, The detection of disease clustering and a generalized regression approach, Cancer Research 27(1967)209–220.

[11] O. Martin, S.W. Otto and E.W. Felten, Large step Markov chains for the TSP incorporating local search heuristics, Operations Research Letters 11(1992)219–224.

[12] C.R. Reeves and T. Yamada, Distance measures in the permutation flowshop problem, Technical Report, School of Mathematical and Information Sciences, Coventry University, UK, 1997.

[13] E. Taillard, Benchmarks for basic scheduling problems, European Journal of OR 64(1993)278–285.

[14] T. Yamada and R. Nakano, Scheduling by genetic local search with multi-step crossover, in: Parallel Problem-Solving from Nature – PPSN IV, eds. H.-M. Voigt, W. Ebeling, I. Rechenberg and H.-P. Schwefel, Springer, Berlin, 1996, pp. 960–969.

[15] G. Zweig, An effective tour construction and improvement procedure for the traveling salesman problem, Operations Research 43(1995)1049–1057.

