Bulletin of the Seismological Society of America, 89, 4, pp. 978-988, August 1999

Multimodal Function Optimization with a Niching Genetic Algorithm:

A Seismological Example

by Keith D. Koper,* Michael E. Wysession, and Douglas A. Wiens

Abstract We present a variant of a traditional genetic algorithm, known as a niching genetic algorithm (NGA), which is effective at multimodal function optimization. Such an algorithm is useful for geophysical inverse problems that contain more than one distinct solution. We illustrate the utility of an NGA via a multimodal seismological inverse problem: the inversion of teleseismic body waves for the source parameters of the Mw 7.2 Kuril Islands event of 7 February 1996. We assume the source to be a pure double-couple event and so parametrize our models in terms of strike, dip, and slip, guaranteeing that two global minima exist, one of which represents the fault plane and the other the auxiliary plane. We use ray theory to compute the fundamental P and SH synthetic seismograms for a given source-receiver geometry; the synthetics for an arbitrary fault orientation are produced by taking linear combinations of these fundamentals, yielding a computationally fast forward problem. The NGA is successful at determining that two major solutions exist and at maintaining the solutions in a steady state. Several inferior solutions representing local minima of the objective function are found as well. The two best focal solutions we find for the Kuril Islands event are very nearly conjugate planes and are consistent with the focal planes reported by the Harvard CMT project. The solutions indicate thrust movement on a moderately dipping fault, a source typical of the convergent margin near the Kuril Islands.

Introduction

A common goal of optimization problems is finding the global minimum of a multidimensional objective function. In geophysics this is realized through inverse problems where the objective function to be minimized is often a norm describing the error between a specific data set and model (Menke, 1989). In cases such as seismic tomography, where model parameters number in the hundreds or thousands, iterative matrix methods are often used (Aki, 1993). In cases where models can be described by a much smaller number of parameters, and the calculation of the forward problem is computationally fast, global search methods such as simulated annealing (e.g., Bina, 1998) and genetic algorithms (e.g., Stoffa and Sen, 1991) have been effective. These methods are not, however, designed for finding multiple, distinct solutions to multimodal optimization problems. Various solutions can be found by repeatedly using such methods with different starting points, since these methods tend to be biased by the initial conditions (Vasco et al., 1996), but distinct solutions are not explicitly solved for. An efficient alternative method for explicitly producing distinct solutions to a given optimization problem is the use of a genetic algorithm (GA) variant known as a niching genetic algorithm (NGA) (e.g., Holland, 1975; Goldberg, 1989; Mahfoud, 1995).

*Present address: Department of Geosciences, University of Arizona, Tucson, Arizona 85721.

Niching genetic algorithms provide a formal mechanism for finding several solutions to a multimodal optimization problem. In the case of an objective function that has more than one global optimum, or several interesting local minima, an NGA can in theory find the regions in model space where the minima occur, and maintain the solutions indefinitely. The number of minima does not need to be known a priori, and in fact an NGA can be used to determine the number of distinct solutions empirically, given a user-defined criterion for model similarity. We apply an NGA to two example problems: optimization of an analytic, 1D, five-modal function on the unit interval, and the inversion of teleseismic body waves for the focal parameters of the Mw 7.2 Kuril Islands event of 7 February 1996. For the latter problem we assume a double-couple source and parametrize the models in terms of strike, dip, and slip, ensuring that the objective function has two broad, global minima representing the fault plane and the auxiliary plane. The fundamental bimodal nature of this problem is well known, so it provides an excellent venue for the seismological illustration of an NGA.

Niching Genetic Algorithms

Although GAs were originally developed to study evolutionary phenomena, their utility as optimization tools is widely recognized (Goldberg, 1989). GAs proceed stochastically by the repeated application of biologically inspired operators on a randomly generated population of candidate models. A wide variety of operators and model parameterizations have been implemented, but the most common operators are crossover and mutation, which use the binary model representation (genotype), and selection, which operates on the decimal representation of the models (phenotype). With time the operators gradually improve the quality of the population of models, quantified by objective function evaluations, while avoiding the lure of local minima. Although GAs are not guaranteed to find the global minimum of an arbitrary objective function, they are explicitly designed to avoid local minima. Nevertheless the possibility of using variations of a genetic algorithm (NGAs) to find distinct solutions of a multimodal optimization problem (local minima of the objective function) has been recognized ever since genetic algorithms have been applied to optimization problems in general (Holland, 1975; Goldberg, 1989). A thorough review of the subject is given by Mahfoud (1995), and here we only briefly comment on the background of niching genetic algorithms.

Most previous work with NGAs has been carried out using either (1) a crowding scheme (e.g., De Jong, 1975) or (2) a sharing scheme (e.g., Goldberg and Richardson, 1987). In the first case, individual models in a given population are compared with a random subpopulation of m models, where m is known as the crowding factor and is of the order 2 to 3. The similarity of two models is defined on a bit by bit basis using the genotypic representation of the models. The model in this subpopulation that is most similar to the original model is deleted from the population. The replacement, or weeding out, of similar models promotes diversity in the population as a whole and allows different species of models to evolve from the main population, with each species inhabiting a niche representing a local minimum of the objective function. In the second case, that of sharing, again the goal is to induce a population of models to evolve into separate demes representing local minima of the objective function. Such implicit speciation is accomplished by scaling (reducing) the fitnesses of similar models. All the models in the population have their fitnesses artificially reduced in proportion to their similarity to the rest of the population. The exact proportion of reductions is defined by a sharing function. Thus several models within the population that are very similar to one another will have their fitnesses reduced to a much greater extent than the more distinct models in the population, and so will tend to be eliminated during the natural course of a GA run. This mechanism inhibits the convergence of the model population as a whole (due to genetic drift) and, like the crowding scheme, allows for distinct solutions to a multimodal optimization problem to be found.
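The sharing scheme described above can be illustrated with a minimal sketch. It assumes the commonly used triangular sharing function and a fitness that is to be maximized; the names `sharing` and `shared_fitness` and the parameters `sigma_share` and `alpha` are illustrative choices, not values taken from this paper.

```python
def sharing(distance, sigma_share=0.1, alpha=1.0):
    """Triangular sharing function (assumed form): 1 at zero distance,
    falling to 0 at and beyond the niche radius sigma_share."""
    if distance >= sigma_share:
        return 0.0
    return 1.0 - (distance / sigma_share) ** alpha


def shared_fitness(models, fitness, sigma_share=0.1, alpha=1.0):
    """Divide each raw fitness by its niche count, i.e. the sum of
    sharing values between that model and every model in the population."""
    shared = []
    for xi, fi in zip(models, fitness):
        niche_count = sum(
            sharing(abs(xi - xj), sigma_share, alpha) for xj in models
        )
        shared.append(fi / niche_count)  # niche_count >= 1 (self term)
    return shared


# Hypothetical 1D population: three crowded models and one isolated model,
# all with the same raw fitness.
models = [0.10, 0.11, 0.12, 0.50]
raw = [1.0, 1.0, 1.0, 1.0]
print(shared_fitness(models, raw))
```

In this toy population the three clustered models end up with a lower shared fitness than the isolated one, which is the pressure that keeps a single niche from absorbing the whole population.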

The NGA used in this work differs from those described above in that the number of subpopulations, or demes, of models is explicitly defined at the beginning of the run. Our goal is not to have a single population of models evolve into separate demes, but to have several artificially separated subpopulations migrate to the different optimal areas in the model space, that is, the regions representing the local minima of the objective function. This is an extreme example of the application of mating restrictions to genetic algorithm search (Goldberg, 1989). We do this to increase the temporal stability of the niches. In a standard NGA it is possible that demes appear and disappear owing to the stochastic nature of GAs and the finite size of the population; however, with the demes explicitly separated we can use intrademe elitist selection to maintain the integrity of the niches.

Although the number of demes is decided upon without explicit knowledge about the objective function and the number of local minima that exist, we can use our NGA to determine the number of local minima empirically. This is accomplished by individually analyzing the behavior of the demes for different scales of similarity. We illustrate this in a later section via the seismological example problem of teleseismic waveform inversion.

The NGA used in this work is described as follows. Initially we construct n demes of models. Each deme has the same number of members and is run as a typical GA, with the same probabilities of mutation and crossover, but with one important distinction. The first deme is allowed to run independently of the others and behaves as a regular GA. The second deme is run similarly except that after every generation the similarity of each member of this deme is calculated with respect to the best model from the first deme (the alpha solution). If this similarity is greater than a problem-specific criterion, that model in the second deme is given an artificially high cost, causing it to be eliminated in the next generation. We use the cost of the worst model in the first deme as the penalty cost, although any sufficiently high value will work. The third deme is run equivalently to the second deme; however, in addition to competing against the alpha solution, each model is compared with the best model in the second deme (the beta solution). Again, if this similarity is greater than a given criterion, that model is given the cost of the worst model in the first deme and is effectively weeded out of the deme by the normal selection operator of a GA. This process is continued sequentially, with the nth deme competing against the top models from each of the previous n - 1 demes.
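The sequential competition between demes can be sketched as follows. This is only the per-generation evaluation step, under stated assumptions: models are single numbers on the unit interval, similarity is expressed as a distance below a threshold `r_c`, and the function names are hypothetical. Selection, crossover, and mutation are omitted.

```python
def distance_1d(x, y, lower=0.0, upper=1.0):
    """Normalized separation of two 1D models (assumed metric)."""
    return abs(x - y) / (upper - lower)


def apply_niche_penalty(models, costs, leaders, r_c, penalty_cost,
                        distance=distance_1d):
    """Replace the cost of every model that lies closer than r_c to any
    leader of a lower-order deme with the penalty cost."""
    penalized = []
    for model, cost in zip(models, costs):
        too_close = any(distance(model, leader) < r_c for leader in leaders)
        penalized.append(penalty_cost if too_close else cost)
    return penalized


def evaluate_demes(demes, cost_fn, r_c):
    """Evaluate demes in order; deme k competes against the best models
    (leaders) of demes 0..k-1. Returns the leaders and the final costs."""
    leaders, all_costs, penalty_cost = [], [], None
    for models in demes:
        costs = [cost_fn(m) for m in models]
        if penalty_cost is None:
            # Worst cost in the first deme serves as the penalty, as in the text.
            penalty_cost = max(costs)
        else:
            costs = apply_niche_penalty(models, costs, leaders, r_c,
                                        penalty_cost)
        best = min(range(len(models)), key=costs.__getitem__)
        leaders.append(models[best])
        all_costs.append(costs)
    return leaders, all_costs


# Hypothetical demes and a toy cost with its minimum at x = 0.1.
demes = [[0.1, 0.9], [0.12, 0.3], [0.11, 0.31, 0.5]]
leaders, costs = evaluate_demes(demes, lambda x: (x - 0.1) ** 2, r_c=0.05)
print(leaders)
```

With a threshold of zero no model is ever penalized, so every deme behaves as an independent simple GA.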

The first deme runs unimpeded and tends toward the global minimum, while the higher-order demes are progressively more constrained since they must compete against progressively more models. This competition forces the demes to unique areas of the model space (niches). Given enough time, the highest-quality solution appears in the first deme, and the lead models in the other demes are ranked in order of decreasing quality from demes 2 through n. Once the demes have aligned in this manner, the population as a whole has reached a steady state. Because we use intrademe elitist selection and a fixed number of demes, after a niche is found it is inhabited indefinitely.

Since the character of a given objective function may not be known a priori, it is possible that more demes will be defined than there are local minima. In this case the extra demes are not able to inhabit a specific niche in the model space. We refer to such demes as frustrated demes, since they perform no constructive searching of the model space. The behavior of a frustrated deme is characterized by two properties: (1) the lead model in the deme shows little or no improvement over time, remaining at a relatively high cost value, and (2) the models in the deme do not cluster in a specific area of model space through time, but rather evolve chaotically from generation to generation. The chaotic evolution of the model distribution in a frustrated deme is caused by the entire deme essentially being reinitialized after every generation, owing to the fact that all the niches in the model space are occupied by lower-order demes. For the purposes of this work, we define a frustrated deme to be one in which every member is penalized for a majority of the generations of the run. Thus the number of nonfrustrated demes (NFDs) is related to the number of local minima existing at a specific level of resolution.
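The working definition of a frustrated deme lends itself to a short sketch. It assumes that, for each generation, a list of booleans records which members of a deme were penalized; this bookkeeping format and the function names are hypothetical, and "a majority" is read here as strictly more than half of the generations.

```python
def is_frustrated(penalized_history):
    """penalized_history[g][k] is True if member k of the deme was
    penalized in generation g. A deme is frustrated when every member
    is penalized in more than half of the generations."""
    fully_penalized = sum(1 for generation in penalized_history
                          if all(generation))
    return fully_penalized > len(penalized_history) / 2


def count_nonfrustrated(histories):
    """Number of nonfrustrated demes (NFDs) among the given histories."""
    return sum(1 for history in histories if not is_frustrated(history))


# Hypothetical 4-generation histories for two demes of two models each.
deme_a = [[True, True], [True, True], [True, False], [True, True]]
deme_b = [[True, True], [False, True], [True, True], [False, False]]
print(is_frustrated(deme_a), is_frustrated(deme_b))
print(count_nonfrustrated([deme_a, deme_b]))
```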

The most critical practical aspect of using an NGA is defining and quantifying the similarity of two models. A function must be constructed that takes as input two sets of model parameters (vectors) and outputs a scalar value related to the similarity of the models. The simplest approach is to define the distance, D, of two models, x and y, as the arithmetic average of the normalized separation of the model parameters,

$$D(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{|x_i - y_i|}{b_i - a_i}, \qquad (1)$$

where n is the number of model parameters, $x_i$ is the ith parameter of model x, $y_i$ is the ith parameter of model y, $b_i$ is the upper bound on the ith model parameter, and $a_i$ is the lower bound on the ith model parameter. Thus the distance varies from 0, for two identical models, to 1 for two models at opposite ends of the search boundary. The distance metric is problem specific and must be customized to the particular model parametrization, but a metric such as ours, which compares the decimal representation of two models rather than the binary representation (i.e., the Hamming distance), will produce more favorable results (Mahfoud, 1995).
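As a minimal sketch, equation (1) translates directly into a few lines of code. The function name `model_distance` and the argument layout (parallel sequences of parameters and bounds) are illustrative assumptions.

```python
def model_distance(x, y, lower, upper):
    """Arithmetic average of the normalized parameter separations,
    following equation (1). All four arguments are equal-length sequences."""
    n = len(x)
    return sum(
        abs(xi - yi) / (b - a)
        for xi, yi, a, b in zip(x, y, lower, upper)
    ) / n


# Two-parameter example on the unit square (hypothetical values).
lower, upper = [0.0, 0.0], [1.0, 1.0]
print(model_distance([0.0, 0.0], [0.5, 1.0], lower, upper))  # 0.75
```

Identical models give a distance of 0, and models at opposite corners of the search bounds give a distance of 1.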

The final step is defining a distance representing the critical level of similarity between two models; we refer to this as Rc, the critical radius of separation. When comparing the models in each deme to the alpha, beta, etc., models, it is the value of Rc that determines whether a given model is eliminated from the deme or given the opportunity to propagate into the next generation. In conjunction with our distance metric, using a low value of Rc causes the demes to inhabit a similar area of model space (in theory the region around the global minimum) simultaneously and indefinitely, while using a high value for Rc causes all of the demes except for the first to become frustrated. Thus the value of Rc determines the scale with which the error landscape is examined for minima, and small wavelength topography on the surface of the objective function (resulting from errors and approximations in objective function evaluations) can be ignored by choosing a large enough value for Rc. On the other hand, the fine structure of a rough, broad minimum can be investigated by choosing a suitably small value for Rc.

NGA Example 1: 1D Analytic Test Problem

We first test our NGA via an analytic, 1D optimization problem. This problem is commonly used as a benchmark in evaluating NGAs and is referred to as M2 by Mahfoud (1995). The problem is stated as

minimize $F(x)$ for $0 \le x \le 1.0$, where

$$F(x) = 1 - \sin^6(5\pi x)\,\exp\left[-2\ln 2\left(\frac{x - 0.1}{0.8}\right)^2\right]. \qquad (2)$$

It contains a global minimum at x = 0.1, and there are four other local minima, gradually increasing in value, which occur in intervals of 0.2.
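A short sketch of the test function makes the layout of the minima concrete. It assumes the standard form of M2 recast as a minimization problem, with the exponential envelope centered on x = 0.1, which is consistent with the global minimum stated above; the function name `f_m2` is hypothetical.

```python
import math


def f_m2(x):
    """M2 test function written for minimization (assumed form):
    1 - sin^6(5*pi*x) * exp(-2 ln 2 * ((x - 0.1) / 0.8)^2)."""
    envelope = math.exp(-2.0 * math.log(2.0) * ((x - 0.1) / 0.8) ** 2)
    return 1.0 - math.sin(5.0 * math.pi * x) ** 6 * envelope


# Evaluate at the nominal positions of the five minima.
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x = {x:.1f}  F(x) = {f_m2(x):.3f}")
```

The cost is zero at x = 0.1 and rises steadily at the four remaining nominal minima, reaching 0.75 at x = 0.9.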

This function is one of the simpler benchmarks because of its low dimension and analytic nature, and so its solution is at most a minimum requirement to be met by a candidate NGA. We use it here only to test the viability of our NGA and not to compare the efficiency of our NGA with respect to previous approaches. Just as no particular type of GA, or optimization technique in general, is superior across the set of all possible objective functions (Wolpert and Macready, 1995), it is likely that no particular NGA representation is superior for all multimodal problems.

Table 1. NGA Control Parameters for 1D Test Problem

Parameter                              Value
Lower search bound for x               0.0
Upper search bound for x               1.0
Search increment for x                 6.1 × 10^-5
Number of distinct values for x        32,768
Length of binary string                15
Number of demes                        5
Number of models per deme              20
Number of elitist models per deme      1
Number of generations                  50
Probability of crossover               0.9
Probability of mutation                0.13

[Figure 1 plot: panels (a)-(h); horizontal axis: Generation (0-50); legend: Deme #1, Deme #2, Deme #3, Deme #4, Deme #5.]

Figure 1. Results of the niching genetic algorithm for the 1D example problem. We show the time evolution of the best model from each deme for Rc values of (a) 0.0, (b) 0.01, (c) 0.05, (d) 0.10, (e) 0.25, (f) 0.50, (g) 0.75, and (h) 1.0. With small Rc values the demes act as simple genetic algorithms, and thus all the populations converge to the global minimum (a and b). With moderate values of Rc (c and d) the demes separate to the local minima, with each deme inhabiting its own niche in model space. As the value of Rc gets larger, more demes become frustrated and cannot search the model space; the niche(s) of the best deme(s) become so large that the other demes are effectively banished from the model space.

The control parameters for the run are listed in Table 1. We use five demes with the goal of each deme inhabiting one of the five local minima. Ultimately, the demes should fill the niches sequentially, with the first deme near the global minimum (x = 0.1) and the fifth deme near the most inferior local minimum (x = 0.9). The values for the other control parameters (i.e., Pc, Pm, the number of models in each deme, the number of generations, and the number of elitist models in each deme) are typical and consistent with successful values in the geophysical and optimization literature.

The time evolution of the five demes is presented in Figure 1 for a variety of Rc values. The best models from each deme (the alpha, beta, gamma, delta, and epsilon models) are shown as a function of generation. For the lowest value of Rc, 0.0, the five demes all quickly inhabit the global minimum (Fig. 1a). Since there is effectively no artificial separation of the demes, this run is equivalent to doing five separate, simple GA runs. This illustrates the point that if a simple GA were repeatedly applied to this problem, the existence and location of the four inferior local minima would go undetected, no matter how many different starting populations were tried. As the value of Rc is increased to 0.01-0.10 (Fig. 1b-d) the demes progressively become more separated and inhabit all five local minima (niches). As expected, they are arranged sequentially (demes 1-5) from the global minimum to the most inferior local minimum. As the value of Rc is further increased (Fig. 1e-h), in the range of 0.25-1.00, more demes become frustrated. For example, with an Rc of 0.25 (Fig. 1e) only three NFDs exist. Although the alpha model inhabits the global minimum, the beta model does not inhabit the second-best local minimum but rather the third-best minimum, even though it belongs to a nonfrustrated deme. The two lowest minima are closer together than the value of Rc, and so the NGA cannot find the second-best minimum. Likewise the gamma model is located in the worst local minimum because it must maintain separation from the beta model. Thus the value of Rc determines the scale at which the model space is searched. In the extreme, low-resolution case, Rc = 1.0 (Fig. 1h), only one NFD exists.

[Figure 2 plot: panels (a)-(d); horizontal axis: x (0-1); legend: Deme #1, Deme #2, Deme #3, Deme #4, Deme #5.]

Figure 2. Time evolution of the distribution of models for the 1D example problem, with an Rc of 0.05. We show the five best models of each deme at generations (a) 1, (b) 10, (c) 30, and (d) 50. Initially the models are randomly distributed (a), but as the number of generations increases (b and c), the demes separate to the local minima. As shown in (c) and (d), the demes are in a steady state, with the best models remaining in the local minima indefinitely. The demes are also ordered sequentially, with the first deme inhabiting the global minimum, the second deme inhabiting the second-best minimum, and so on.

We have found that for this test problem, Rc values of 0.05-0.10 are effective at inducing the demes to inhabit the five local minima. Figure 2 shows the time evolution of the five demes with an Rc value in this range (0.10). The five best models from each deme are shown at generations 1, 10, 30, and 50. Initially, the models are scattered throughout the model space, but by generation 10 each of the demes has begun to colonize the appropriate niche in model space. As shown by the distribution of models at generations 30 and 50, the demes stably inhabit the niches once they have ordered themselves. It is this niche stability that we emphasize in our particular NGA via the explicit separation of the demes.

It is, however, not always the case that the demes will be sequentially ordered early in a run. As Figure 2 shows, the demes are ordered by generation 10, but earlier disarray is indicated by the spikes in the evolution of the beta and gamma models in Figure 1d. Since we use elitist selection in each of the demes, these curves should monotonically decrease with time; however, there are sharp increases in cost for the gamma model after the first generation, and for the beta model after the fifth generation. These spikes in cost represent the invasion of a deme's niche by a lower-order deme. The gamma model originally represents the second-best local minimum, but is bounced out of that niche because the beta solution also initially inhabits this region. Likewise at generation 5, the beta solution is bounced out of the global minimum when the first deme finds this optimal area of model space. After generation 8 the demes are ordered with respect to niche quality, and so they are stable indefinitely.

NGA Example 2: Teleseismic Waveform Inversion

We next apply our NGA to a seismological problem: the inversion of teleseismic body waves for the source parameters of an earthquake. Although this problem can be solved more generally using linear inverse theory (e.g., Nabelek, 1984) via the moment tensor formalism (Aki and Richards, 1980), we choose to parametrize our models using strike (0° ≤ θ ≤ 360°), dip (0° ≤ δ ≤ 90°), and slip (0° ≤ λ ≤ 360°), so that only pure double-couple solutions are selected. Besides making the problem nonlinear, this parameterization leads to the existence of two global minima, one representing the fault plane and the other the auxiliary plane. Although this ambiguity can be eliminated by restricting the range of the slip vector (Zhao and Helmberger, 1994) while still producing pure double-couple solutions, we leave the parameterization unrestricted so that the objective function retains two global minima and is thus ideal for the illustration of an NGA.

As additional parameters we include the focal depth of the event and the duration of the source time function, yielding a five-dimensional model space. Previous authors have shown that grid search methods are applicable for source inversions with a low number of parameters (e.g., Langston, 1981; Walter, 1993; Zhao and Helmberger, 1994); however, it has also been shown that (1) a GA is computationally more efficient than a grid search for a low-dimension source inversion (Kobayashi and Nakanishi, 1994) and (2) GAs are capable of solving higher-dimensional source parameter inversions in which grid searches would be infeasible (Zhou et al., 1995). Cases such as the latter may arise if the near-source velocity structure is included as an unknown.

Table 2. NGA Control Parameters for Focal Mechanism Search

Parameter                              Value
Number of distinct models              8.59 × 10^9
Length of binary string                33
Number of demes                        100
Number of models per deme              20
Number of elitist models per deme      1
Number of generations                  50
Probability of crossover               0.9
Probability of mutation                0.06

Table 3. Model Parameter Bounds for Focal Plane Search

Parameter            Lower Bound   Upper Bound   Increment   Number of Values
Strike (°)           0             360           1.41        256
Dip (°)              0             90            1.43        64
Slip (°)             0             360           1.41        256
Depth (km)           16            60            0.70        64
Source time (sec)    2             25            0.74        32
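The discretization in Table 3 can be sketched as a decoder from a 33-bit string to the five model parameters. The bounds and bit counts come from Table 3; the ordering of the fields within the string, the use of plain (non-Gray) binary coding, and all names are assumptions made for illustration.

```python
# (name, lower bound, upper bound, number of bits); 2**bits values each.
# The field order within the binary string is an assumption.
FIELDS = [
    ("strike", 0.0, 360.0, 8),
    ("dip", 0.0, 90.0, 6),
    ("slip", 0.0, 360.0, 8),
    ("depth_km", 16.0, 60.0, 6),
    ("source_time_s", 2.0, 25.0, 5),
]


def increment(lower, upper, nbits):
    """Spacing between adjacent parameter values."""
    return (upper - lower) / (2 ** nbits - 1)


def decode(bits):
    """Map a string of '0'/'1' characters onto the five parameters."""
    if len(bits) != sum(field[3] for field in FIELDS):
        raise ValueError("expected a 33-character binary string")
    model, pos = {}, 0
    for name, lower, upper, nbits in FIELDS:
        level = int(bits[pos:pos + nbits], 2)  # plain binary, not Gray code
        model[name] = lower + level * increment(lower, upper, nbits)
        pos += nbits
    return model


print(decode("0" * 33))
print(decode("1" * 33))
```

The total string length is 8 + 6 + 8 + 6 + 5 = 33 bits, giving 2^33 (about 8.59 × 10^9) distinct models, and the spacing (upper - lower) / (2^bits - 1) reproduces the increments listed in Table 3.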

Data Processing for the Mw 7.2 Kuril Event of 1996

We choose to invert waveforms from the Mw 7.2 Kuril Islands event that occurred on 7 February 1996. We use this particular earthquake because it meets the following three criteria: (1) it was large enough to produce teleseismic arrivals at stations with a variety of distances and azimuths, (2) it was large enough that a Harvard CMT solution exists (Dziewonski et al., 1997), with which we can compare our solutions, and (3) it was small enough that its source time function can be described relatively simply. We gathered waveforms recorded at 24 stations, containing 23 P waves and 12 SH waves, from the IRIS Data Management Center (DMC). The stations are in the distance range of 30.6°-89.3° with a wide variety of azimuths. We tapered the data, removed the instrument responses, and applied a bandpass filter in the 0.01 to 0.13-Hz range. We next trimmed the displacement waveforms so that each starts 30 sec before the theoretical P (SH) arrival time and has a duration of 100 sec. Finally, we resampled the data to have time intervals that are equivalent to the synthetics (1 sec) and normalized the displacements so that the value of the scalar moment is not relevant in the forward calculation.

[Figure 3 plot: horizontal axis: Rc (0-0.9); legend: Data; Fit 1: log(NFD) = -2.33 log(Rc); Fit 2: log(NFD) = -2.49 log(Rc) - 0.20; Fit 3: log(NFD) = -2.12 log(Rc).]

Figure 3. The increasing speciation of the models as Rc → 0. The circles indicate the number of nonfrustrated demes (NFDs) for a given Rc value. The data are related in the form: number of NFDs = $R_c^{-N}$, where N is an unknown scaling parameter. To estimate the value of N we fit curves to the data in three ways. First, we perform a 1D gradient search using the above equation (fit 1). Second, we use analytic expressions to calculate a linear fit in log10-log10 space (fit 2). Third, we use an analytic expression to calculate a linear fit in log10-log10 space with the intercept constrained to be 0 (fit 3). Fits 2 and 3 differ from fit 1 in that equal weights are given to all the points; fit 1 tends to emphasize the fitting of only the lowest Rc points. Fit 2 differs from fit 3 in that a nonzero intercept is allowed, adding a second parameter to be found. This curve provides a better means of interpolating between the observed values, though it is inconsistent with the above functional form. Nevertheless, in all three cases we find 2.0 < N < 2.5. Thus there is a dramatic increase in the number of available niches once Rc drops below approximately 0.3.

[Figure 4 plot: panels (a)-(d); horizontal axis: Generation (0-50); legend: Deme #1, Deme #2, Deme #3, Deme #4.]

Figure 4. Results of the niching genetic algorithm for the inversion of teleseismic body waves for the source parameters of the Mw 7.2 Kuril Islands event of 1996. We show the time evolution of the best model from each of the four lowest-order demes for a series of runs with Rc values of (a) 0.1, (b) 0.2, (c) 0.3, and (d) 0.4. As was the case with the 1D analytic optimization problem (Fig. 1), with low values of Rc the demes do not effectively separate (a), but as Rc is increased the demes successfully inhabit specific niches in the model space (b and c).

Computation of Synthetic Body Waves

We use ray theory to compute the synthetic body waves for the Kuril Islands event. For a given earthquake depth, station distance, and station azimuth, we compute the three fundamental P wave and two fundamental SH wave synthetics. These synthetics correspond to a vertical strike-slip mechanism (P and SH), a vertical dip-slip mechanism (P and SH), and a fault plane dipping at 45° while seen at an azimuth of 45° (P). It is then possible to construct synthetics for an arbitrary focal mechanism by taking the appropriate linear combination of the fundamentals and convolving it with a source time function (Kroeger, 1987). This approach has been successfully tested against standard references (e.g., Langston and Helmberger, 1975) and used in several previous studies (e.g., Wiens and Petroy, 1990; Wetzel et al., 1993).

We hold the near-receiver velocity structures constant with values of 6.0 km/sec for P-wave velocity (α), 3.4 km/sec for S-wave velocity (β), and 2.6 g/cm³ for the density (ρ), and we use a simple three-layer model to represent the near-source velocity structure: a 5-km-thick water layer (α = 1.5 km/sec, β = 0, ρ = 1.0 g/cm³), a 10-km-thick oceanic crustal layer (α = 6.0 km/sec, β = 3.5 km/sec, ρ = 2.6 g/cm³), and a half-space representing the upper mantle (α = 8.0 km/sec, β = 4.5 km/sec, ρ = 3.3 g/cm³). We use a t* value of 1.0 for the P waves and 4.0 for the S waves to calculate the attenuation of the body waves. The source time function is represented by a half-sine wave. The focal depth of this event is given as 49 km in the Harvard CMT catalog, and so we make the assumption that the focal depth occurs in the half-space portion of our near-source velocity model (depth > 16 km). Because the ray parameter varies only slightly with depth in a given layer, we let the variation in depth be represented as a time shift in generating the initially upgoing and downgoing parts of the waveform. Thus each model is represented by 5 parameters: strike, dip, slip, focal depth, and duration of the source time function. The values for the strike, the dip, and the slip yield the coefficients for the linear combination of the fundamental seismograms, the depth value causes a time shift in the upgoing and the downgoing waveforms, and the value of the source time function defines the length of the half-sine wave that is convolved with the synthetic.

[Figure 5 plot: histogram panels (a) Dip, (b) Slip, (c) Strike, (d) Depth (km), (e) Source Time (s); bars labeled Deme #1, Deme #2, Deme #3, Deme #4.]

Figure 5. The final distribution of models from the four lowest-order demes for the waveform inversion problem (Rc of 0.3). We show histograms of the model parameters of these 80 models for (a) dip, (b) slip, (c) strike, (d) depth, and (e) source time. Although the demes inhabit specific regions of model space, as shown by the large histogram spikes, the demes have not converged to specific points, as shown by the range of parameters that exist in the final generation. Both interdeme and intrademe diversity are maintained throughout the run.
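The role of the source time parameter can be illustrated with a small sketch that builds a half-sine pulse of a given duration and convolves it with a synthetic trace. The sampling interval of 1 sec matches the synthetics described earlier; the unit-area normalization, the midpoint sampling of the pulse, and the function names are assumptions, and the linear combination of fundamentals and the depth time shift are not shown.

```python
import math


def half_sine(duration, dt=1.0):
    """Half-sine source time function sampled at interval dt and scaled
    to unit area (the scaling is an assumption)."""
    n = max(int(round(duration / dt)), 1)
    pulse = [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]
    area = sum(pulse) * dt
    return [p / area for p in pulse]


def convolve(signal, kernel, dt=1.0):
    """Discrete approximation of the convolution integral."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k * dt
    return out


# Hypothetical impulsive synthetic convolved with a 4-sec half-sine pulse.
synthetic = [0.0, 1.0, 0.0, 0.0]
stf = half_sine(4.0)
print(convolve(synthetic, stf))
```

A longer duration spreads the same pulse area over more samples, which broadens and lowers the resulting waveform.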

To evaluate the quality of a given set of model parameters we cross-correlate each synthetic waveform with the appropriate observed waveform. The shape difference between the two is calculated by summing the squares of the difference in displacement on a point-by-point basis. This error, summed over all the waveforms, is used as the cost for each model. We do not make allowances for data quality; thus all the waveforms are given equal weights in the objective function. Once the fundamental synthetics are calculated, it is computationally quick to evaluate a set of model parameters. Because the forward problem can be done so quickly, this inversion can be accomplished by a global search method such as a genetic algorithm or even a simple grid search (e.g., Wiens and Petroy, 1990; Wetzel et al., 1993).
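As a rough sketch of the cost just described, the function below sums the squared point-by-point differences over all waveforms with equal weights. The name `waveform_cost` is hypothetical, and the sketch assumes the synthetic and observed traces are already aligned and sampled identically; any alignment by cross-correlation or amplitude normalization used in the actual inversion is not shown.

```python
import numpy as np

def waveform_cost(synthetics, observed):
    """Sum of squared point-by-point differences, equal weight for every waveform."""
    total = 0.0
    for syn, obs in zip(synthetics, observed):
        syn = np.asarray(syn, dtype=float)
        obs = np.asarray(obs, dtype=float)
        if syn.shape != obs.shape:
            raise ValueError("synthetic and observed traces must have equal length")
        total += float(np.sum((syn - obs) ** 2))
    return total

# Hypothetical two-station example.
cost = waveform_cost([[1.0, 2.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]])
```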

Distance Metric for Waveform Inversion

Because we are searching in angular coordinates, we modify the distance metric presented in equation (1) to make use of the normal to the fault plane,

$$\mathbf{n} = \begin{pmatrix} -\sin\delta\,\sin\theta \\ -\sin\delta\,\cos\theta \\ \cos\delta \end{pmatrix}, \tag{3}$$

and the slip vector,

$$\mathbf{d} = \begin{pmatrix} \cos\lambda\,\cos\theta + \sin\lambda\,\cos\delta\,\sin\theta \\ -\cos\lambda\,\sin\theta + \sin\lambda\,\cos\delta\,\cos\theta \\ \sin\lambda\,\sin\delta \end{pmatrix}, \tag{4}$$

when evaluating the distance of two models; here θ is the strike, δ is the dip, and λ is the slip. The focal mechanism distance, D_fm, is then given by

$$D_{fm}(x, y) = 1 - (x_n \cdot y_n + x_d \cdot y_d + 2)/4, \tag{5}$$

while the distances for the other two model parameters, D_depth and D_stime, are given by the expression in equation (1). The total distance, D(x, y), of two models is defined by a weighted average of the constituent distances:

$$D(x, y) = (3 D_{fm}(x, y) + D_{depth}(x, y) + D_{stime}(x, y))/5. \tag{6}$$

This metric discriminates efficiently between source models while retaining the property of the general metric presented in equation (1): it varies in value from 0 for two identical models to 1 for two grossly different models.

[Figure 6 panel labels:
Alpha (cost = 0.18): dip = 59° (34°), slip = 75° (113°), strike = 42° (249°), depth = 48.9 km, source time = 14.4 s
Beta (cost = 0.19): dip = 34° (56°), slip = 54° (94°), strike = 209° (36°), depth = 26.9 km, source time = 20.6 s
CMT (cost = 0.26): dip = 64° (28°), slip = 78° (113°), strike = 30° (235°), depth = 48.7 km, source time = 19.8 s
Gamma (cost = 0.32): slip = 135° (50°), strike = 306° (69°), depth = 52.4 km, source time = 8.2 s
Delta (cost = 0.52): slip = 293° (213°), strike = 195° (315°), depth = 57.9 km, source time = 11.5 s]

Figure 6. The final results of the waveform inversion with an Rc of 0.3. The four lead models, alpha, beta, gamma, and delta, are shown along with the CMT solution in a lower-hemisphere focal projection of P-wave motion. The white circles represent the station locations for which P-wave data were available.
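The distance metric of equations (3) through (6) can be sketched in a few lines. Several parts of this sketch are assumptions rather than details given here: equation (1) is not reproduced in this section, so the scalar distance is taken to be the absolute difference normalized by the parameter range; the bounds in `BOUNDS` are placeholders rather than the values of Table 3; and the dictionary keys and function names are hypothetical.

```python
import numpy as np

def fault_normal(strike, dip):
    """Unit normal to the fault plane, equation (3); angles in degrees."""
    th, de = np.radians([strike, dip])
    return np.array([-np.sin(de) * np.sin(th), -np.sin(de) * np.cos(th), np.cos(de)])

def slip_vector(strike, dip, slip):
    """Unit slip vector, equation (4); angles in degrees."""
    th, de, la = np.radians([strike, dip, slip])
    return np.array([
        np.cos(la) * np.cos(th) + np.sin(la) * np.cos(de) * np.sin(th),
        -np.cos(la) * np.sin(th) + np.sin(la) * np.cos(de) * np.cos(th),
        np.sin(la) * np.sin(de),
    ])

def d_fm(x, y):
    """Focal mechanism distance, equation (5)."""
    xn, yn = fault_normal(x["strike"], x["dip"]), fault_normal(y["strike"], y["dip"])
    xd = slip_vector(x["strike"], x["dip"], x["slip"])
    yd = slip_vector(y["strike"], y["dip"], y["slip"])
    return 1.0 - (xn @ yn + xd @ yd + 2.0) / 4.0

def d_scalar(a, b, lo, hi):
    """ASSUMED stand-in for equation (1): range-normalized absolute difference."""
    return abs(a - b) / (hi - lo)

# Placeholder bounds (not Table 3): depth in km, source time in s.
BOUNDS = {"depth": (16.0, 56.0), "stime": (2.0, 22.0)}

def total_distance(x, y, bounds=BOUNDS):
    """Weighted average of the constituent distances, equation (6)."""
    d_depth = d_scalar(x["depth"], y["depth"], *bounds["depth"])
    d_stime = d_scalar(x["stime"], y["stime"], *bounds["stime"])
    return (3.0 * d_fm(x, y) + d_depth + d_stime) / 5.0

# Alpha model from Figure 6, and the same model described by its second nodal
# plane (the parenthesized Figure 6 values; this reading is an assumption).
alpha = {"strike": 42.0, "dip": 59.0, "slip": 75.0, "depth": 48.9, "stime": 14.4}
alpha_aux = {"strike": 249.0, "dip": 34.0, "slip": 113.0, "depth": 48.9, "stime": 14.4}
```

For an exactly conjugate pair the normal and slip vectors exchange roles, so both dot products in equation (5) vanish and D_fm is 0.5; evaluating `d_fm(alpha, alpha_aux)` gives a value close to that, up to the rounding of the tabulated angles.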

Results

We apply the NGA to the waveform inversion problem using the control parameters presented in Table 2 and the model parameter bounds shown in Table 3. Although we know a priori that the objective function contains two global minima, it is not clear that other local minima do not exist with fits similar to the global minima. We can investigate the nature of the objective function at different scales by determining the number of nonfrustrated demes (NFDs) for various values of Rc. The number of NFDs represents the number of available niches for a particular Rc value; this value can be interpreted as the maximum number of local minima at a given scale. The actual number of minima may be much smaller, since as Rc → 0 it becomes possible for the niches of differing demes to inhabit the same error well.

We define a deme as frustrated if its entire population is reinitialized for a majority of the generations. Figure 3 shows the number of NFDs for a series of Rc values. For Rc = 0.1 none of the 100 demes are frustrated; we find that the minimum Rc value for which we can calculate the number of NFDs (without increasing the number of demes) is 0.15. The NFD versus Rc relation has two fundamental properties: as Rc → 0 the number of NFDs → ∞, and as Rc → 1 the number of NFDs → 1. This suggests a relation of the form number of NFDs = Rc^(-N), where N is an unknown scaling factor. We fit the data to a curve of this form in three ways and find that 2.0 < N < 2.5.
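One simple way to estimate the exponent N is a least-squares fit in log-log space, where the power law becomes a straight line through the origin: log(NFD) = -N log(Rc). The sketch below is only an illustration of that idea. The three fitting procedures used for Figure 3 are not described here, the function name is hypothetical, and the data in the example are synthetic values generated from the power law itself, not the measured NFD counts.

```python
import numpy as np

def fit_nfd_exponent(rc, nfd):
    """Least-squares N for nfd = rc**(-N), fitted through the origin in log-log space."""
    x = np.log(np.asarray(rc, dtype=float))   # negative for 0 < rc < 1
    y = np.log(np.asarray(nfd, dtype=float))
    # Minimizing sum (y + N x)^2 over N gives N = -(x . y) / (x . x).
    return float(-(x @ y) / (x @ x))

# Synthetic check (NOT data from Figure 3): generate counts with N = 2.25.
rc_values = np.array([0.15, 0.2, 0.3, 0.4, 0.5])
synthetic_nfd = rc_values ** (-2.25)
estimated_n = fit_nfd_exponent(rc_values, synthetic_nfd)
```

Forcing the line through the origin builds in the limit that one niche remains as Rc approaches 1; an unconstrained fit with a free intercept would be a different modeling choice.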

Since the large number of niches for small Rc values is not necessarily indicative of local minima, we restrict our attention to the behavior of the four lowest-order demes. The time evolution of the alpha, beta, gamma, and delta models is shown for a variety of Rc values in Figure 4. With the lowest value, Rc = 0.1, the four demes show no separation, indicating that more than one deme is inhabiting the same error well. As Rc is increased to values of 0.2-0.4 (Fig. 4b-d), deme separation occurs. Nevertheless, the alpha and beta models inhabit almost equivalent niches over this broad range of Rc values, as shown by the cost value, implying that two large-scale global minima exist.

Other than the two global minima (niches 1 & 2), there appear to be at least four additional significant local minima of the objective function: two with a cost of ~0.3 (niches 3 & 4) and two with a cost of ~0.5 (niches 5 & 6). Niches 1 and 2 are inhabited by the alpha and beta solutions for each of the four runs, niche 3 is inhabited by the gamma model for Rc = 0.2 and Rc = 0.3, niche 4 is inhabited by the delta model for Rc = 0.2, niche 5 is inhabited by the delta model for Rc = 0.3, and niche 6 is inhabited by the gamma model for Rc = 0.4. Niches 3 and 4 (5 and 6) can be distinguished by examining the model parameter values, since the cost values of these niches are almost identical. The pairing of the niches is a direct consequence of the bimodal nature of the source mechanism inversion.

Having found that Rc values of 0.2-0.3 produce an interesting separation of the demes, we take a closer look at the time evolution of the four lowest-order demes for Rc = 0.3. Initially, the parameters are scattered about the model space and none of the demes show a preference for any niche. This is expected since the initial populations of models are randomly generated. During the middle of the run (generation 25) a clear separation between the demes has occurred (Fig. 4c), and the distribution of parameters indicates that the demes have begun to colonize niches; thus the top models in each deme show a high degree of similarity. By the end of the run the first two demes, representing the fault plane and auxiliary plane solutions to the problem, have remained in the same niche they inhabited at generation 25 and are stable indefinitely. The two higher-order demes have found local minima of the objective function and so have also inhabited the same area in model space over time. The final distribution of model parameters for the four lowest-order demes is shown in Figure 5.

The final alpha, beta, gamma, and delta solutions are presented in a lower-hemisphere focal projection of P-wave motion in Figure 6, with the waveform fits of the alpha model presented in Figure 7 (P waves) and Figure 8 (SH waves). The cost of the alpha solution is 0.18, while the cost of the beta solution is 0.19; they are very nearly conjugate solutions, as expected, and are similar to the CMT solution. They are not exactly conjugate because of (1) the discretization introduced when using a binary-encoded GA, (2) the fact that, in general, GAs find only near-optimal solutions, and (3) the trade-offs of the slip vector and fault plane parameters with the two additional model parameters, focal depth and source duration.

Figure 7. The synthetic waveforms for the alpha solution compared with the observed data. These displacement seismograms are shown in 100-sec lengths for all the P-wave data used in the inversion. The distances are listed on the left, and the station azimuths are listed on the right. Both numbers are in units of degrees.

Figure 8. The synthetic waveforms for the alpha solution compared with the observed data. These displacement seismograms are shown in 100-sec lengths for all the SH-wave data used in the inversion. The distances are listed on the left, and the station azimuths are listed on the right. Both numbers are in units of degrees.

Conclusions

We have presented a mechanism for generating distinct solutions to multimodal optimization problems via an NGA. This method does not require the number of local minima to be known a priori and can be used to deduce the number of local minima at various spatial scales in the model space. The NGA is significantly different from a simple GA, which is not explicitly designed to find and, especially, maintain the local minima of an objective function. Our NGA is specifically designed to enhance the temporal stability of the niches.

The utility of our NGA is illustrated by the inversion of teleseismic body waves for the source parameters of the Mw 7.2 Kuril Islands earthquake of February 1996. The fundamental bimodal nature of the objective function is well understood; thus this problem makes an excellent example for the illustration of an NGA. Our NGA is successful at determining that two global minima exist over a range of scales, representing the fault and auxiliary planes, and that several inferior local minima exist as well. Although the inversion of source parameters, as we have parameterized the problem, can be accomplished by a simple grid search, the NGA approach is more efficient computationally; more importantly, an NGA can be applied to higher-dimensional source inversions, such as when the near-source velocity structure or the details of the source time function are included as model parameters. In such cases a grid search is infeasible, and an NGA provides the means for a complete investigation of the minima of the objective function.

References

Aki, K. (1993). Overview, in Seismic Tomography Theory and Practice, H. M. Iyer and K. Hirahara (Editors), Chapman & Hall, London, 1-6.

Aki, K., and P. G. Richards (1980). Quantitative Seismology: Theory and Methods, W. H. Freeman, San Francisco.

Bina, C. R. (1998). Free energy minimization by simulated annealing with application to lithospheric slabs and mantle plumes, Pure Appl. Geophys. 151, 605-618.

Dejong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems, Ph.D. Thesis, University of Michigan, Ann Arbor.

Dziewonski, A. M., G. Ekstrom, and M. P. Salganik (1997). Centroid-moment tensor solutions for January-March 1996, Phys. Earth Planet. Inter. 102, 1-9.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Massachusetts.

Goldberg, D. E., and J. Richardson (1987). Genetic algorithms with sharing for multimodal function optimization, in Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, J. J. Grefenstette (Editor), Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey, 41-49.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor.

Kobayashi, R., and I. Nakanishi (1994). Application of genetic algorithms to focal mechanism determination, Geophys. Res. Lett. 21, 729-732.

Kroeger, G. C. (1987). Synthesis and Analysis of Teleseismic Body Wave Seismograms, Ph.D. Thesis, Stanford University, Palo Alto, California.

Langston, C. A. (1981). Source inversion of seismic waveforms: the Koyna, India, earthquake of 13 September 1967, Bull. Seism. Soc. Am. 71, 1-24.

Langston, C. A., and D. V. Helmberger (1975). A procedure for modelling shallow dislocation sources, Geophys. J. R. Astr. Soc. 42, 117-130.

Mahfoud, S. W. (1995). Niching Methods for Genetic Algorithms, Ph.D. Thesis, University of Illinois, Champaign, Illinois.

Menke, W. (1989). Geophysical Data Analysis: Discrete Inverse Theory, Revised Edition, Academic Press, San Diego.

Nabelek, J. L. (1984). Determination of earthquake source parameters from inversion of body waves, Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.

Stein, S., and M. E. Wysession (1998). Introduction to Seismology, Earthquakes, and Earth Structure, Blackwell, in preparation.

Stoffa, P. L., and M. K. Sen (1991). Nonlinear multiparameter optimization using genetic algorithms: inversion of plane-wave seismograms, Geophysics 56, 1794-1810.

Vasco, D. W., J. E. Peterson Jr., and E. L. Majer (1996). Nonuniqueness in travel time tomography: ensemble inference and cluster analysis, Geophysics 61, 1209-1227.

Walter, W. (1993). Source parameters of the June 29, 1992 Little Skull Mountain earthquake from complete regional waveforms at a single station, Geophys. Res. Lett. 20, 403-406.

Wetzel, L. R., D. A. Wiens, and M. C. Kleinrock (1993). Evidence from earthquakes for bookshelf faulting at large non-transform ridge offsets, Nature 362, 235-237.

Wiens, D. A., and D. E. Petroy (1990). The largest recorded earthquake swarm: intraplate faulting near the Southwest Indian Ridge, J. Geophys. Res. 95, 4735-4750.

Wolpert, D. H., and W. G. Macready (1995). No free lunch theorems for search, Technical Report SFI-TR-95-02-010, Santa Fe Institute.

Zhao, L. S., and D. V. Helmberger (1994). Source estimation from broad- band regional seismograms, Bull. Seism. Soc. Am. 84, 91-104.

Zhou, R., F. Tajima, and P. L. Stoffa (1995). Earthquake source parameter determination using genetic algorithms, Geophys. Res. Lett. 22, 517-520.

Department of Earth and Planetary Sciences, Campus Box 1169, Washington University, One Brookings Dr., St. Louis, Missouri 63130

Manuscript received 24 October 1998.
