
A New, Fast Parallel Simulated Annealing Algorithm for Reservoir Characterization




Society of Petroleum Engineers

SPE 26419

A New, Fast Parallel Simulated Annealing Algorithm for Reservoir Characterization
Ahmed Ouenes, New Mexico Petroleum Recovery Research Center, and Naji Saad, Mobil E&P Technical Center

SPE Members

Copyright 1993, Society of Petroleum Engineers, Inc.

This paper was prepared for presentation at the 68th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers held in Houston, Texas, 3-6 October 1993.

This paper was selected for presentation by an SPE Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Papers presented at SPE meetings are subject to publication review by Editorial Committees of the Society of Petroleum Engineers. Permission to copy is restricted to an abstract of not more than 300 words. Illustrations may not be copied. The abstract should contain conspicuous acknowledgment of where and by whom the paper is presented. Write Librarian, SPE, P.O. Box 833836, Richardson, TX 75083-3836, USA. Telex, 163245 SPEUT.

ABSTRACT

This paper presents a new parallel simulated annealing algorithm for computationally intensive problems. The new algorithm enables us to reduce the overall time required to solve reservoir engineering problems by using the simulated annealing method (SAM). A simple geostatistical optimization problem (variogram matching) applied to two fields is used for illustration purposes. The reduction of computation time starts by optimizing the sequential simulated annealing algorithm. This task is achieved by efficient coding and an appropriate choice of topology. Three different topologies are used, and their effects on the overall run time and the quality of the generated image are discussed. After optimizing the sequential algorithm, the problem of a high rejection rate at low annealing temperature is solved by using parallelization. The new algorithm runs many sequential algorithms concurrently in an optimal manner. The number of concurrent algorithms is adjusted throughout the optimization to increase the acceptance rate while making optimal use of CPU time. The new algorithm was implemented on a CRAY Y-MP with 4 processors. A 50,400 (280x180) gridblock field was used to test the parallel optimization method. The overall run (clock) time was reduced by approximately the number of concurrent calls of the sequential algorithm.

INTRODUCTION

During the last three years, the interest in global optimization methods in the oil and gas industry has increased. Farmer1 introduced the simulated annealing method (SAM) in geostatistical optimization. This global optimization

References and illustrations at end of paper

method, developed simultaneously by Kirkpatrick et al.2 at IBM, Siarry3 in Paris, and Černý4 at Bratislava, has been applied successfully in more than twenty industrial applications. The resulting computer codes have been used as mature engineering tools and allowed an unexpected gain of productivity in various industrial processes. However, the application of simulated annealing to geostatistical optimization has yet to reach this maturity.

This new technique is now added to the increasing number of stochastic models developed in recent years in the hope of providing some additional capabilities that will improve the current status of reservoir description. Deutsch and Journel5 added this technique to the GSLIB library, which includes more than twenty stochastic modeling tools. Ouenes et al.6 and Ghori et al.7 compared SAM to other geostatistical methods, not only by looking at the generated images, but by using 3-D reservoir simulation and tracer tests7. From these comparisons based on fluid flow, they found that SAM was probably better than the smooth kriging but not necessarily better than the other stochastic models tested for that particular problem. However, one fact was certain: SAM was slower than other stochastic models. From the early use of SAM, the major complaint has been the computation time. One solution to this problem is the use of parallel algorithms and computers. Sen et al.8 proposed genetic algorithms instead of SAM because of the higher parallelization potential of genetic algorithms. However, the computation time problem requires global consideration, and all the factors contributing to the high run time require careful investigation.

Actually, the computation time is not at all a problem for the application of simulated annealing to geostatistical optimization. Ouenes et al.9-12 and Deutsch and Journel5 used simulated annealing to match experimental9-12 and model5 variograms. The advantage of choosing an objective function based on variograms is the considerable reduction of


computation time because of the updating procedure. In other words, the new simulated variogram is obtained from the old one with some simple operations, as described by Deutsch and Journel5. Moreover, with the vectorization of this updating procedure, the cost of computing the objective function becomes insignificant compared to the computation time required in other problems.
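The updating idea can be sketched as follows for a 1-D field and a single-value change; the function names and the 1-D simplification are our illustration, not code from the paper. The point is that only the pairs involving the changed cell need to be revisited:

```python
import numpy as np

def variogram(field, max_lag):
    """Experimental semivariogram of a 1-D field for lags 1..max_lag."""
    return np.array([
        0.5 * np.mean((field[:-h] - field[h:]) ** 2) for h in range(1, max_lag + 1)
    ])

def updated_variogram(field, gam, i, new_val, max_lag):
    """Update gam after the proposed change field[i] -> new_val.

    Only the two pairs involving cell i at each lag change, so the cost is
    O(max_lag) instead of O(n * max_lag) for a full recomputation.
    """
    n = len(field)
    gam = gam.copy()
    for h in range(1, max_lag + 1):
        npairs = n - h
        for j in (i - h, i + h):          # the two partners of cell i at lag h
            if 0 <= j < n:
                old = 0.5 * (field[i] - field[j]) ** 2
                new = 0.5 * (new_val - field[j]) ** 2
                gam[h - 1] += (new - old) / npairs
    return gam
```

The same bookkeeping extends to directional variograms on a 2-D grid, and the inner loop over lags is the part that vectorizes well on a machine such as a CRAY.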

The most important problems in oil and gas reservoirs are those which involve fluid flow, which is the major difference between a mine and an oil or gas reservoir. Unfortunately, extensive use of traditional geostatistical methods has not dramatically improved reservoir forecasting13. This major characteristic (fluid flow) of oil and gas reservoirs was not taken into account when assessing reservoir properties. Recently, other authors14,15 combined geostatistical methods and some localized flow performances by using simulated annealing. The obtained realizations are conditioned to well tests and honor the effective absolute permeability around the well14. However, the simulated annealing is set in a geostatistical framework where the only focus is the permeability distribution. Simulated annealing as a global optimization method has a wide range of applicability in the oil and gas industry, and the real potential of this algorithm, and of other global optimization methods, is not fully used.

The most promising applications of SAM in reservoir engineering are for inverse modeling problems. In such problems, the major focus is on the dynamic data, such as production or pressure history, which is the most important information in a reservoir. For many decades, various deterministic methods have been used for inverse modeling and automatic history matching. None of the algorithms developed has yet found widespread use. On the other hand, the use of SAM in inverse problems opens new horizons. The first inverse problem solved by SAM was at core scale16, where relative permeability and dynamic capillary pressure curves were estimated simultaneously from laboratory corefloods. The second and third applications were at reservoir scale, where the upscaling problem is avoided. By using a simplified reservoir model, permeability and porosity distributions, as well as other reservoir engineering parameters, were estimated by matching pressure history17 in gas reservoirs. In the third application, the previous automatic history matching algorithm was extended to oil reservoirs18 and infill drilling problems.

The application of SAM to inverse modeling led to useful engineering tools routinely used by industry16-17, where engineers are freed from the time-consuming history matching procedure. Because of these practical benefits, our major use of and interest in global optimization and SAM is for inverse modeling, which implicitly includes geostatistical optimization. For such applications, a reservoir simulator is run at each trial solution, which makes the cost of computing the objective function a real concern. At the same time, the geostatistical optimization problem is also solved at each iteration. Therefore, the computation time for the geostatistical optimization component must be reduced as well. Our objective is to use all possible means to reduce the overall computation time: the hunt for unnecessary computation, and the optimal use of current and future parallel computers, has begun. In this paper, we address the reduction of computation for the geostatistical component.

The first concern is to reduce the computation time required for the objective function. The ideal solution for this problem is to update the objective function rather than recompute it at each iteration. However, this ideal solution is not easy to implement in inverse modeling, where a system of partial differential equations needs to be solved. The second concern is to reduce the number of iterations required to reach a certain low value of the selected objective function. There are two major ways to reduce the overall number of iterations, which is proportional to the execution time. The first way is to use appropriate topologies to transform the current state of the estimation parameters to obtain a new state to be tested. The second way is to use parallel computers, so that many trial solutions can be tested concurrently and at least one of them will be successful. These two concepts will be illustrated with the application of SAM to geostatistical optimization.

SAM TOPOLOGY

In the past three years, at least fifteen papers have been produced on the application of SAM to geostatistical optimization. However, many aspects of the algorithm were not addressed. In geostatistical optimization, a simple and peculiar application of SAM, the algorithm is used to generate a given realization of a geological or petrophysical parameter, for example rock permeability (or its logarithm), by minimizing the objective function J.

J = \sum_{\alpha} w_{\alpha} \sum_{h_{\alpha}} \left[ \gamma_s(h_{\alpha}) - \gamma_e(h_{\alpha}) \right]^2 \qquad (1)

where \gamma_s(h_{\alpha}) and \gamma_e(h_{\alpha}) are the simulated and experimental values of the variogram at the lag distance h_{\alpha} in the specific direction \alpha, and w_{\alpha} is a weight factor. The constraints imposed on the optimization problem are the known locations where field data are available. The algorithm starts by generating an initial random image which satisfies all the statistics of the available field data. This image is transformed during each iteration by using a given topology that gives the rules to change a given image. Once the new image is generated, the new objective function is computed and the change in the objective function, \Delta J, is deduced for the Metropolis19 acceptance rule. The rule consists of keeping the obtained configuration if \Delta J < 0 and continuing the process. If \Delta J > 0, the new configuration is kept with a probability equal to \exp(-\Delta J/\theta), where \theta is the annealing temperature. The algorithm starts with an initial temperature, \theta_0, which is reduced after a given number of iterations according to the simple rule:
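The Metropolis acceptance rule described above can be sketched in a few lines; this is a generic illustration, not the paper's code, and the `uniform` parameter is ours, added so the random draw can be controlled:

```python
import math
import random

def metropolis_accept(delta_j, theta, uniform=random.random):
    """Metropolis19 rule: always keep an image that lowers the objective
    function; keep a worse image with probability exp(-delta_j / theta),
    where theta is the current annealing temperature."""
    if delta_j <= 0:
        return True
    return uniform() < math.exp(-delta_j / theta)
```

At high theta almost any deterioration passes; as theta falls, exp(-delta_j / theta) collapses toward zero and most uphill moves are rejected, which is exactly the high-rejection-rate regime addressed later by parallelization.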

\theta = \lambda \times \theta_0 \qquad (2)

where \lambda is the reduction factor. Both the initial temperature, \theta_0, and \lambda can be chosen automatically; additionally, \lambda can be adjusted during the optimization procedure20. In a geostatistical optimization problem where the objective function varies smoothly, a constant reduction factor such as \lambda = 0.9 can still allow convergence. However, in inverse modeling, where the objective function may have chaotic variations17, it is better to adjust the reduction factor \lambda according to the behavior of the objective function.

From the early use of SAM in various industrial applications, we have noticed that the rate of convergence, or the total number of iterations required to achieve a certain low value of the objective function, depends on the topology used. This important point was rarely reported in the literature and usually was considered as "know-how" of the user.


However, some authors21,22 discussed this aspect briefly in their publications. In the application of SAM to geostatistical optimization, this issue is crucial since it is directly linked to the computation time. In this section, the effect of various topologies on the computation time and total number of iterations is discussed.

There are various ways to perform an elementary transformation on a given image. One simple topology is to select two points (1 pair) randomly in the image and swap their values. The same idea can be extended to 2 or more pairs. Another topology, commonly used in Markov fields, is based on neighborhood points: a point is selected randomly in the field, and the swapping is performed on 4 or more of its close neighbors. However, as mentioned earlier, the application of SAM to geostatistical optimization is peculiar. The reason is that the cost of computing the new variogram depends on the number of points used in the topology. If the cost of updating the variogram after a 1-pair swap is 2 computing units, then the cost for a 2-pair swap (random or neighbors) will be 4 computing units. An objective function whose cost depends on the number of parameters disturbed is rarely found in other applications. When swapping the values from one location to another, the Markovian formalism guarantees that the mathematical convergence of SAM is respected. However, because of the computation time constraint and the peculiarity of this problem, Sen et al.8 reduced the computation to its minimum by disturbing only one value at each iteration. Therefore, the cost of updating the variogram is reduced to 1 computing unit. However, this transformation does not respect the Markovian formalism, and the mathematical convergence needs to be justified. Additionally, the moments of the distribution are not necessarily preserved. We are limiting this paper to numerical experimentation, as is very often the case with SAM, and use a practical example to illustrate the difference between 3 topologies.
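The elementary transformations discussed above can be sketched as perturbation functions; this is our illustration, with conditioning points (cells fixed to field data) deliberately omitted for brevity:

```python
import random

def perturb_1pt(image, cdf_values, rng):
    """1pt topology (after Sen et al.8): replace one randomly chosen cell
    with a draw from the training-image cdf; cheapest to update, but the
    moments of the distribution are only approximately preserved."""
    i = rng.randrange(len(image))
    undo = [(i, image[i])]                # record old value in case of rejection
    image[i] = rng.choice(cdf_values)
    return undo

def perturb_swap(image, n_pairs, rng):
    """2pts (n_pairs=1) or 4pts (n_pairs=2) topology: swap randomly chosen
    pairs of cells; the histogram of the image is preserved exactly."""
    undo = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(image)), 2)
        undo += [(i, image[i]), (j, image[j])]
        image[i], image[j] = image[j], image[i]
    return undo
```

A full implementation would exclude conditioned cells from the candidate indices and charge 1, 2, or 4 computing units per variogram update, matching the cost accounting in the text.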

For the first case, a training image of 2500 gridblocks (50 x 50) shown in Fig. 4a and its variogram (Fig. 5) is used for generating simulated images by using three different topologies. The horizontal and vertical experimental variograms are matched, and 100 randomly distributed conditioning points are used as a constraint for generating the simulated images. The stopping criterion of the algorithm is the same for all the topologies. Automatic matching is stopped when the error on both horizontal and vertical variograms for all the lags considered is less than 0.01, which means that the simulated variogram values match the experimental ones up to the second digit (Fig. 5b). Notice that we are matching experimental variograms rather than a model variogram. Because SAM includes matching the given variogram, lag by lag, it is not necessary to use model variograms of the experimental data, and the experimental variogram itself can be used for the matching. In this new framework, the more valid lag distances (lags that have a significant number of pairs) the user includes in the matching, the better the generated image. For the considered example, all the lags are used for the matching.
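The lag-by-lag stopping criterion described above amounts to a simple tolerance check; a minimal sketch (function name is ours):

```python
import numpy as np

def variograms_matched(gamma_sim, gamma_exp, tol=0.01):
    """Stopping criterion: every matched lag (in every direction) must
    agree with the experimental variogram value to within tol."""
    gamma_sim = np.asarray(gamma_sim, dtype=float)
    gamma_exp = np.asarray(gamma_exp, dtype=float)
    return bool(np.all(np.abs(gamma_sim - gamma_exp) < tol))
```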

The first topology, called the 1pt topology, consists of randomly choosing one point in the image and replacing its current value with a new value drawn randomly from the cdf of the training image. After 99,430 iterations, convergence with this topology was achieved for the realization shown in Fig. 4b. The major problem with this topology is the statistics of the final realization. Because the initial statistics of the random image are disturbed when drawing from the cdf, the average value and the variance of the simulated image are slightly different from the values of the training image. This will also be true for the higher moments of the distribution. However, this difference does not exceed 1% if an appropriate cdf is used for this example. To minimize this difference, the number of points in the cdf needs to be close to the number of gridblocks in the training image. In other words, it is not recommended to use a limited cdf and interpolation. Another problem with this topology is the risk of substantial changes in the simulated images caused during the optimization process. As mentioned earlier, this topology allows the best run time since it uses only one point in the updating procedure. The total run time for this example will be considered as the base line (see Table 1).

The second topology, called the 2pts topology, consists of swapping two values chosen randomly in the image. Convergence with this topology was obtained after 139,780 iterations, and the simulated image is shown in Fig. 4c. The final realization satisfies all the statistical attributes of the training image, including the shape of the cdf. Finally, the third topology, called the 4pts topology, consists of swapping 2 pairs at the same iteration. With this topology the algorithm needed 758,931 iterations to converge and produce the image shown in Fig. 4d. Before discussing these results, it is important to emphasize that these numbers of iterations are given for one realization. Another run will use another sequence of random numbers and will lead to different numbers of iterations as well as different images. However, the difference in the number of iterations will rarely exceed 10%, and the main features of the images will be represented in a similar way.

When comparing the total computation time of these three examples (see Table 1), as expected the 1pt topology is the best, followed by the 2pts topology, which is 2 times slower, and finally the 4pts topology, which is 18 times slower than the 1pt topology. Based on the computation time, the 1pt topology is the fastest, but the obtained image has slightly different statistics from the training image statistics (see Fig. 3). Although all the generated images (Fig. 4b, 4c, and 4d) seem to contain the main features of the training image (Fig. 4a), it is better to compare the produced realizations in a quantitative manner. A simple way to achieve this goal is to use the correlation coefficient, CC, which ranges from -1 to 1.

CC = \frac{\sum_i (Z_i - \bar{Z})(Y_i - \bar{Y})}{\sqrt{\sum_i (Z_i - \bar{Z})^2} \, \sqrt{\sum_i (Y_i - \bar{Y})^2}} \qquad (3)

where Z_i is the value of the stochastic variable in gridblock i, and Y_i is the value of the training image at the same location i. \bar{Z} and \bar{Y} are the average values of Z_i and Y_i over all the N gridblocks i. When CC is close to 1, the generated image is completely correlated to the training image, or in other words, contains all its features. At CC = 0, there is no correlation between the generated image and the training image. Based on the correlation coefficient, the best image is the one that has the highest CC (see Table 1). In this example, the best image is the one generated with the 2pts topology. Based on both criteria used, the 4pts topology is not recommended for this example. However, for other fields


with different sizes and different variogram structures, the performances of these topologies may be reversed, and a 4pts topology (2 random pairs or 4 neighbors) may give a better computation time than a 2pts topology by simply requiring 2 to 4 times fewer iterations23. It seems that when heterogeneities are very localized, a higher number of perturbation points in the topology will lead to fewer iterations. However, when the variations are more widely spaced, the 2pts topology will result in fewer iterations.
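The correlation coefficient of Eq. 3 used in these comparisons is straightforward to compute; a NumPy sketch (our illustration, the paper gives only the formula):

```python
import numpy as np

def correlation_coefficient(z, y):
    """Eq. 3: correlation between a generated image z and the training
    image y, both given gridblock by gridblock (any common shape)."""
    z = np.asarray(z, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    dz, dy = z - z.mean(), y - y.mean()
    return float((dz * dy).sum() / np.sqrt((dz ** 2).sum() * (dy ** 2).sum()))
```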

Finally, this example shows clearly how the overall run time can change drastically when using different topologies. But there is no general rule for choosing the best topology; that choice depends on the field in consideration. For other applications of SAM, the same problem arises, and considerable savings in run time can be obtained if appropriate topologies are used. Once the appropriate topology is found, the sequential algorithm can be further optimized by appropriate coding. For example, the update procedure used for the variogram can be highly vectorized when using a CRAY. These reductions in computation time are designed for a sequential simulated annealing algorithm, which still has a major problem when an image that decreases the objective function is not found. At low annealing temperature, the probability of accepting images which increase the objective function becomes very low. Therefore, many trial images are rejected, and the time spent computing the objective function is wasted. This problem can be solved by using parallel simulated annealing.

PARALLELIZATION OF SAM

Before addressing the parallelization of SAM, it is important to remind the reader that other global optimization methods exist and are currently used in various applications. A brief description and the parallelization potential of each method can be found in Ouenes20. As mentioned earlier, the genetic algorithm is better suited for parallelization than simulated annealing. However, before considering parallelization, it is important to reduce the computation time by considering the topology aspect and the cost of computing the objective function. Because of the possibility of only updating the variogram when using SAM, it may be unnecessary to consider genetic algorithms for geostatistical optimization. The reason for this is that implementing the update procedure for computing the new variogram is not practically possible with a genetic algorithm. Although the number of generations required to obtain a good image with a genetic algorithm is considerably less than the number of iterations required by SAM, the computation time with a genetic algorithm is still 3 to 5 times more than the computation time with SAM. Therefore, in geostatistical optimization we will focus on SAM in this paper.

The idea of a parallel simulated annealing algorithm may seem to contradict the philosophy of the method, which is undoubtedly sequential. Different types of alternatives are proposed in the literature, and Azencott's book24 is the complete reference in this area. Many parallel simulated annealing algorithms have been developed since the mid-1980s. One approach, proposed by Aarts et al.25, consists of running the algorithm in parallel on NP Markov chains. Each Markov chain of constant length LC is assigned to a given microprocessor. In addition, each Markov chain is divided into sub-chains of length LC/NP. The first processor starts

on the first chain and performs the first LC/NP iterations (the first sub-chain); from these computations, the algorithm computes the temperature for the second Markov chain. The second processor uses this temperature, and the optimal configuration obtained by the first processor, to start the second Markov chain. During this time, the first processor starts to work on the second sub-chain of the first chain. This procedure is extended to all the processors, each of which updates both temperature and optimal configuration according to the computation performed by the processor before it. The computation time for such an algorithm is

TC_{par} = TC_e \times LC \times \frac{NC + NP - 1}{NP} \qquad (4)

where TC_e is the time required for one elementary perturbation, and NC is the total number of Markov chains used during the computation. We remind the reader that the computation time for the sequential algorithm is

TC_{seq} = TC_e \times LC \times NC \qquad (5)

which shows that the computation time is divided by a factor NP (if NP is small compared to NC). The main disadvantage of such an algorithm is that the convergence is not guaranteed, and using it may lead to a waste of time instead of a reduction in computation time.
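Eqs. 4 and 5 are easy to check numerically; the sketch below (our illustration) confirms that the pipelined scheme approaches a speedup of NP when NP is small compared to NC:

```python
def tc_seq(tc_e, lc, nc):
    """Eq. 5: cost of the sequential algorithm."""
    return tc_e * lc * nc

def tc_par(tc_e, lc, nc, n_proc):
    """Eq. 4: cost of the pipelined scheme of Aarts et al.25 on n_proc
    processors; tends to tc_seq / n_proc when n_proc << nc."""
    return tc_e * lc * (nc + n_proc - 1) / n_proc
```

For instance, with TC_e = 1, LC = 100, NC = 400 chains and NP = 4 processors, the speedup TC_seq / TC_par is 1600/403, just under the ideal factor of 4.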

Another approach to parallelization was proposed by Kravitz and Rutenbar26, where parallel computations are performed on the same Markov chain. The idea is based on the fact that at low temperature, many trials are rejected. Therefore, instead of trying different configurations in a sequential manner, it is possible to try them in parallel. Roussel-Ragot27,28 proposed the following strategy to implement such an algorithm in a way that guarantees that the result obtained will be the same as the sequential one.

There are NP processors which are used to perform one or more elementary perturbations, and the Metropolis acceptance rule is applied after each transformation. The NP processors are used at the same time during the same process, and there are two modes:

1. At high temperature, each processor performs only one elementary perturbation. After all the NP processors are done, one of the optimal configurations obtained by the processors is chosen randomly and used to update all the processors.

2. At low temperature, the NP processors may not provide a new optimal configuration during one process. Therefore, many processes are required to find an acceptable transformation. Once one processor finds this new solution, all the processors are updated.

In contrast to the previous parallel algorithm, this approach converges exactly like the sequential algorithm. Finally, a third parallel algorithm, proposed by Casotto et al.29 and tested on a placement problem, gave comparable results to those obtained by a sequential algorithm. The new algorithm presented in this paper uses the Roussel-Ragot28 approach with major improvements in the use of the parallel processes and the annealing schedule.

Our objective in using a parallel simulated annealing algorithm is to reduce the overall computation time for a given problem. However, we want this time reduction at minimum cost in terms of real dollars. When using a parallel computer, the CPU time increases with the number of parallel or concurrent runs of a specific sequential algorithm. For example, if the cost of running one sequential algorithm is 1 unit


of time, or $1, the cost of running two of the same algorithm in parallel will be 2 units of time, or $2. Therefore, the goal in developing a parallel algorithm is to increase the number of parallel runs only when needed.

The parallel algorithm will be composed of a master and parallel slaves. The master, or optimization manager, is responsible for the following tasks:

1. determining the number of necessary concurrent calls of the sequential algorithm (the optimal number of slaves), and
2. managing and adjusting the annealing schedule.

The number of concurrent calls required by the master depends on the acceptance rate, AR, at the previous temperature level:

AR = \frac{LS}{LC} \qquad (6)

where LS is the total number of successful iterations obtained during the previous temperature level, which has a constant Markov chain length LC (the maximum number of iterations allowed for a given temperature level). The general trend of AR for a sequential algorithm is shown in Fig. 2. Notice two major points:

1. Half of the iterations required to converge are performed with an acceptance rate less than 0.1.

2. The average acceptance rate decreases, but AR has a chaotic behavior in which increases and decreases alternate.
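Eq. 6 and the assignment of slaves per temperature mode can be expressed as a short sketch. Only the first boundary (AR = 0.7) comes from the text; the remaining cut-offs in `MODE_BOUNDS` are illustrative placeholders, since the paper does not list all six ranges.

```python
def acceptance_rate(ls, lc):
    """Eq. 6: AR = LS / LC, the fraction of successful iterations
    in the previous constant-length Markov chain."""
    return ls / lc

# Hypothetical mode table: only the 0.7 boundary is taken from the
# text; the remaining cut-offs are illustrative.
MODE_BOUNDS = [0.7, 0.5, 0.3, 0.2, 0.1]  # AR below a bound -> next mode

def n_slaves(ar):
    """Number of concurrent sequential SAM calls for a given AR:
    one slave in the highest-temperature mode, and one more slave
    each time AR drops into a lower mode."""
    return 1 + sum(ar < b for b in MODE_BOUNDS)
```

With this table, an acceptance rate of 0.8 keeps a single slave, while a rate that has collapsed below 0.1 requests six concurrent calls.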

The new parallel algorithm makes optimal use of the variation of the acceptance rate. In contrast to Roussel-Ragot, where only two or three temperature modes are used, we use more than six modes corresponding to different ranges of acceptance rate. Moreover, the number of processors used is not the same for all the modes. For example, the highest temperature mode is defined by AR = [1.0, 0.7]. In other words, when the acceptance rate is in this range, the master will use only one sequential SAM (1 slave), and unnecessary parallel runs are avoided to save CPU time. When the acceptance rate drops below AR = 0.7, another temperature mode starts, in which the master requires two parallel or concurrent calls of the sequential SAM (2 slaves). If both sequential calls are successful (the two images are accepted by the Metropolis rule), then the master chooses one of them randomly. If only one of the concurrent calls satisfies the Metropolis rule, then the master selects this image as the winner and updates all its parameters. The new image is then passed to the two parallel sequential algorithms for another trial. The number of parallel calls is adjusted throughout the optimization according to the acceptance rate. If the acceptance rate drops into a lower temperature mode, the master adds one more parallel call. However, if the acceptance rate increases to a higher temperature mode, the master reduces the number of parallel calls. In this manner, the CPU time and the cost of running a parallel machine stay in a reasonable range. At the end of each Markov chain, the master uses the lowest value and the average value of the objective function provided by all the winner slaves to reduce the temperature, as described in Ouenes20. This new algorithm is illustrated with a second geostatistical optimization problem in which a 50,400 (280x180) gridblock field is used (Fig. 8). The experimental variogram of this field, shown in Fig. 6, will be matched by using a 2 pts topology and 720 conditioning points. The parallel SAM is tested on a CRAY Y-MP.
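One master step of the winner-selection scheme just described might look like the following sketch (illustrative Python; `run_slave` is a hypothetical stand-in for one sequential SAM call that returns its accepted image, or None when every trial failed the Metropolis rule):

```python
import random

def master_step(image, run_slave, n_slaves, rng):
    """Run n_slaves concurrent calls of the sequential SAM on the same
    image.  The master collects the successful slaves, picks one winner
    at random, and broadcasts it to all slaves; if no slave succeeds,
    the current image is kept and the step is retried."""
    winners = [r for r in (run_slave(image, rng) for _ in range(n_slaves))
               if r is not None]
    if winners:
        return rng.choice(winners), True   # broadcast the winner
    return image, False                    # all slaves failed; retry
```

In the real algorithm the master would also recompute the slave count from the latest acceptance rate before each chain, so the fan-out grows only when rejections become frequent.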

CRAY Y-MP IMPLEMENTATION

The parallel architecture of the CRAY Y-MP is convenient because with a CRAY Y-MP we can use two or more parallel slaves and only one processor. The main program represents the master, and the sequential SAMs to be called in parallel are simply subroutines seq1, seq2, etc., of the main program. The slaves can be called in parallel by the CMIC$ directives used in FORTRAN:

CMIC$ parallel
CMIC$ case
      call seq1
CMIC$ case
      call seq2
CMIC$ end case
CMIC$ end parallel

With these simple directives the program will run seq1 and seq2 in parallel. The subroutines will not run in parallel on two processors with the CMIC$ directive alone. However, another directive can force seq1 and seq2 to run in parallel on two processors. This is not necessarily a more efficient way of parallelization because the overhead time increases.
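For readers without access to Cray autotasking, the same fork-and-join pattern expressed by the parallel-case construct above can be sketched with Python's standard library. This is an analogy only; `seq1` and `seq2` are stand-ins for the sequential SAM subroutines.

```python
from concurrent.futures import ThreadPoolExecutor

def seq1():
    # stand-in for the first sequential SAM subroutine
    return "seq1 done"

def seq2():
    # stand-in for the second sequential SAM subroutine
    return "seq2 done"

def run_cases():
    """Fork-and-join: each 'case' is submitted concurrently, and the
    join at the end of the parallel region waits for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(seq1)
        f2 = pool.submit(seq2)
        return f1.result(), f2.result()
```

As with the Cray construct, whether the two cases truly run on separate processors is up to the runtime; the program is correct either way, and forcing separate processors is not always faster once overhead is counted.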

The maximum number of slaves allowed during the optimization and its effect on the convergence was tested on the 50,400 gridblock field. First, the algorithm was run without any parallelization, and convergence was achieved after N1 = 1,487,012 iterations. When using at most two slaves, or two parallel calls of the sequential algorithm, the total number of iterations dropped to N2 = 761,729 iterations, which represents a speed-up factor SUF = 0.51. The speed-up factor, SUF, is defined as the number of iterations when using the parallel SAM divided by the number of iterations obtained with the sequential SAM. When the maximum number of slaves is increased to 3, the speed-up factor drops to SUF = 0.41. The speed-up factor on the total computation time is very close to the one obtained based on the total number of iterations. However, the CPU time increases by a factor close to the inverse of the speed-up factor. The reduction of the total number of iterations reaches an asymptotic behavior after a maximum of 6 slaves, where the speed-up factor does not exceed SUF = 0.38 for this example. When using the parallel SAM, the acceptance rate increases significantly and the need to use more concurrent calls, because of a low value of AR, almost vanishes. Therefore, little effect on the number of iterations is expected when doubling the number of concurrent calls from 3 to 6. The main reason is that the acceptance rate, AR, never drops below AR = 0.1 when using 3 concurrent calls.
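The speed-up factors quoted above follow directly from the iteration counts; a minimal check using the numbers reported in the text:

```python
def speed_up_factor(n_parallel, n_sequential):
    """SUF = (iterations with parallel SAM) / (iterations with
    sequential SAM); a smaller value means faster convergence."""
    return n_parallel / n_sequential

n1 = 1_487_012   # sequential run (from the text)
n2 = 761_729     # at most two slaves (from the text)
n6 = 562_934     # at most six slaves (from Fig. 7)
```

Note that SUF measures convergence in iterations, not cost: the CPU time billed grows roughly by the inverse of SUF because the extra slaves burn processor time even when their trials are rejected.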

In addition to the parallelization of the optimization method, each sequential subroutine was highly vectorized in this application.

CONCLUSIONS

Based on the results presented in this paper, the following conclusions can be drawn.

1. The topology has a considerable effect on the number of iterations and computation time required for convergence.


2. The 1 pt topology has the advantage of being the fastest for the example considered in this paper. However, the statistics of the obtained realization are slightly different from the original statistics of the training image. The degree of variation in the statistics depends on the problem and cannot be known a priori.

3. The quality of the images, measured by the correlation coefficient, is different for the three topologies considered. The 2 pts topology provided the best image for the example considered.

4. A new parallel simulated annealing algorithm was proposed that makes optimal use of the parallel CPUs.

5. A substantial reduction in computation time was obtained when using the developed parallel simulated annealing algorithm on a CRAY Y-MP.

NOMENCLATURE

AR = acceptance rate
h = lag distance
J = objective function (dimensionless)
LC = Markov chain length
LS = number of successful iterations during one temperature level
N = total number of gridblocks
NC = total number of Markov chains during the optimization
NP = total number of processors
TC = computation time
Y = stochastic variable for training image
z = stochastic variable for simulated image
α = anisotropic direction
γ = variogram value
λ = reduction factor for θ
θ = annealing temperature (dimensionless)

Superscripts
e = experimental
s = simulated

Subscripts
e = elementary perturbation
i = index for gridblock i
par = parallel
seq = sequential
o = initial

ACKNOWLEDGMENTS

The authors wish to thank Patrick Siarry (Ecole Centrale de Paris) for his valuable comments and suggestions. Appreciation is extended to Akhtarul Hassan (University of New Mexico) for his help in the parallelization work. The authors would like to thank CRAY Research Inc. for their support and for providing machine time. Ahmed Ouenes would like to thank Mobil R&D, Gaz de France DETN, Marathon Oil Company, the New Mexico Petroleum Recovery Research Center, and New Mexico Tech's Petroleum Engineering Department for their active support. The authors would like to thank Scott Richardson for the graphics, and Steve Whitlach and K. Stanley for reviewing the paper.

REFERENCES

1. Farmer, C.L.: "Numerical Rocks," The Mathematics of Oil Recovery, P.R. King (ed.), Clarendon Press, Oxford (1992), 437.

2. Kirkpatrick, S., Gelatt Jr., C.D., and Vecchi, M.P.: "Optimization by Simulated Annealing," Science (1983) 220, 671.

3. Siarry, P. and Dreyfus, G.: "An Application of Physical Methods to the Computer Aided Design of Electronic Circuits," Journal de Physique, Lettres (1984) 45, L-39.

4. Cerny, V.: "A Thermodynamical Approach to the Traveling Salesman Problem," J. of Optimization Theory and Applications (1985) 45, 41.

5. Deutsch, C. and Journel, A.: GSLIB: Geostatistical Software Library, Oxford University Press, New York (1992).

6. Ouenes, A., Meunier, G., Pelcé, V., and Lhote, I.: "Enhancing Gas Reservoir Characterization by Simulated Annealing Method (SAM)," paper SPE 25023 presented at the 1992 European Petroleum Conference, Cannes, France, Nov. 16-18.

7. Ghori, S., Ouenes, A., Pope, G., Sepehrnoori, K., and Heller, J.: "The Effect of Four Geostatistical Methods on Reservoir Description and Flow Mechanism," paper SPE 24755 presented at the 1992 SPE Annual Technical Conference and Exhibition, Washington, D.C., Oct. 4-7.

8. Sen, M., Datta Gupta, A., Stoffa, P., Lake, L., and Pope, G.: "Stochastic Reservoir Modeling Using Simulated Annealing and Genetic Algorithm," paper SPE 24754 presented at the 1992 SPE Annual Technical Conference and Exhibition, Washington, D.C., Oct. 4-7.

9. Ouenes, A., Bahralolom, I., Gutjahr, A., and Lee, R.: "A New Method for Predicting Field Permeability Distribution," Proc., 1992 Mediterranean Petroleum Conference and Exhibition, Tripoli, Libya, Jan. 19-22, 468-477.

10. Ouenes, A., Bahralolom, I., Gutjahr, A., and Lee, R.: "Application of Simulated Annealing Method (SAM) to Laboratory and Field Anisotropic Porosity," Proc., 1992 Lerkendal Petroleum Engineering Workshop, Norwegian Institute of Technology, Trondheim, Norway, Feb. 5-6, 107-118.

11. Ouenes, A., Meunier, G., and Moegen, H. de: "Application of Simulated Annealing Method (SAM) to Gas Storage Reservoir Characterization," paper 96e presented at the 1992 Annual AIChE National Spring Meeting, New Orleans, March 29-April 3.

12. Ouenes, A., Bahralolom, I., Gutjahr, A., and Lee, R.: "Conditioning Permeability Fields by Simulated Annealing," Proc., Third European Conference on the Mathematics of Oil Recovery, Delft, Netherlands, June 17-19, 41-50.

13. Haldorsen, H. and Damsleth, E.: "Challenges in Reservoir Characterization," The American Association of Petroleum Geologists Bulletin (1993) 77, No. 4, 541-551.

14. Deutsch, C.: Annealing Techniques Applied to Reservoir Modeling and the Integration of Geological and Engineering (Well Test) Data, PhD dissertation, Stanford University, Stanford, CA (1992).

15. Hird, K.B. and Kelkar, M.: "Conditional Simulation for Reservoir Description Using Spatial and Well Performance Constraints," paper SPE 24750 presented at the 1992 SPE Annual Technical Conference and Exhibition, Washington, D.C., Oct. 4-7.


16. Ouenes, A., Fasanino, G., and Lee, R.: "Simulated Annealing for Interpreting Gas/Water Laboratory Coreflood," paper SPE 24870 presented at the 1992 SPE Annual Technical Conference and Exhibition, Washington, D.C., Oct. 4-7.

17. Ouenes, A., Bréfort, B., Meunier, G., and Dupéré, S.: "A New Algorithm for Automatic History Matching: Application of Simulated Annealing Method (SAM) to Reservoir Inverse Modeling," unsolicited paper SPE 26297 (1993).

18. Sultan, J.A., Ouenes, A., and Weiss, W.: "Reservoir Description by Inverse Modeling: Application to EVGSAU Field," paper SPE 26478 to be presented at the 1993 SPE Annual Technical Conference and Exhibition, Houston, Oct. 4-6.

19. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E.: "Equations of State Calculations by Fast Computing Machines," J. Chemical Physics (1953) 21, 1087.

20. Ouenes, A.: Application of Simulated Annealing to Reservoir Characterization and Petrophysics Inverse Problems, PhD dissertation, New Mexico Tech, Socorro, NM (1992).

21. Lundy, M. and Mees, A.: "Convergence of an Annealing Algorithm," Mathematical Programming (1986) 34, 111.

22. Otten, R.H.J.M. and van Ginneken, L.P.P.P.: The Annealing Algorithm, Kluwer Academic Publishers, Dordrecht (1989).

23. Ouenes, A. and Gutjahr, A.: "Conditioning Random Fields by Simulated Annealing Method (SAM) Using Multiple Swapping," accepted for publication in Mathematical Geology.

24. Azencott, R.: Simulated Annealing: Parallelization Techniques, John Wiley, New York (1992).

25. Aarts, E., de Bont, F., Habers, J., and van Laarhoven, P.: "Parallel Implementations of the Statistical Cooling Algorithm," Integration 4, 209-238.

26. Kravitz, S. and Rutenbar, R.: "Placement by Simulated Annealing on a Multiprocessor," IEEE Trans. CAD (1987) 6, No. 4, 534-549.

27. Roussel-Ragot, P., Siarry, P., and Dreyfus, G.: "La Méthode du Recuit Simulé: Principe et Parallélisation," paper presented at the second national colloquium for electronic circuit design, Grenoble, France (1986).

28. Roussel-Ragot, P. and Dreyfus, G.: "Parallel Annealing by Multiple Trials: An Experimental Study on a Transputer Network," Simulated Annealing: Parallelization Techniques, R. Azencott (ed.), John Wiley, New York (1992).

29. Casotto, A., Romeo, F., and Sangiovanni-Vincentelli, A.: "A Parallel Simulated Annealing Algorithm for the Placement of Macro Cells," IEEE Trans. CAD (1987) 6, 838-847.

Table 1: Topology Comparison

Topo  | Iterations | Run Time | CC
1 pt  |     99,430 |      1   | 0.351
2 pts |    139,780 |      2   | 0.427
4 pts |    758,931 |   18.7   | 0.324


Fig. 1: Rate of convergence for field 1 (1 pt, 2 pts, and 4 pts topologies vs. iterations)

Fig. 2: Acceptance rate, AR, for field 1 (AR vs. iterations)

Fig. 3: Histogram of 1 pt topo realization (frequency vs. log permeability; training image and 1 pt topo)


Fig. 4: Grayscale maps for field 1 (a. training image, b. 1 pt topo, c. 2 pts topo, d. 4 pts topo)


Fig. 5: Matching experimental variograms for field 1 (a. initial variogram, b. optimal variogram; experimental and simulated variogram value vs. lag distance)

Fig. 6: Matching experimental variograms for field 2 (a. initial variogram, b. optimal variogram; experimental and simulated variogram value vs. lag distance)


Fig. 7: Effect of CRAY parallel SAM on the rate of convergence (objective function vs. iterations; SUF = Ni/Ns; sequential: Ns = 1,487,012, SUF = 1; 2 parallel: SUF = 0.51; 3 parallel: SUF = 0.41; 6 parallel: N6 = 562,934, SUF = 0.38)

Fig. 8: Grayscale map of the 50,400 gridblock field 2
