
Using Multiple Surrogates for Metamodeling



1

Using Multiple Surrogates for Metamodeling
Raphael T. Haftka and Felipe A. C. Viana
University of Florida

1
This lecture is mostly based on the conference presentation below:

F.A.C. Viana and R.T. Haftka, "Using multiple surrogates for metamodeling," 7th ASMO-UK/ISSMO International Conference on Engineering Design Optimization, Bath, UK, July 7-8, 2008.

Its main message is that many surrogates are available for approximating expensive computer simulations and that you should not commit yourself to one.

More details are available in the following journal papers.

Viana, F.A.C., Haftka, R.T., and Steffen, V. (2009), "Multiple surrogates: how cross-validation errors can help us to obtain the best predictor," Structural and Multidisciplinary Optimization, Vol. 39(4), 439-457.

Glaz, B., Goel, T., Liu, L., Haftka, R.T., and Friedmann, P.P. (2009), "Multiple-surrogate approach to helicopter rotor blade vibration reduction," AIAA Journal, Vol. 47(1), 271-282.

2
KEY QUESTIONS

Surrogates are only approximations and as such they incur errors. This raises questions that will be discussed in this lecture:
Is it possible to choose the best surrogate for a given problem?
What are the advantages of using multiple surrogates in optimization?

2
The figure shows contours of a well-known test function and how it is approximated by two different surrogates. Surrogates typically replace expensive computer simulations, but as this example shows, they are only approximations. Furthermore, we have the choice of many kinds of surrogates.

So the questions discussed in this lecture are whether we can choose the best surrogate for a given problem, and whether, if we use surrogates for optimization, we can benefit from using several simultaneously.

3
WHY SO MANY SURROGATES?
Different statistical models:
Response surface assumes a known function and noise in the data.
Kriging assumes exact data and a random function.

Different basis functions (polynomials, radial basis functions).

Different loss functions (mostly SVR):
RMS is not sacred (L1 has some advantages).

Popular surrogates: polynomial response surfaces (PRS), kriging models (KRG), and support vector regression (SVR). Each has multiple flavors.

3
There are half a dozen common types of surrogates, with the most popular being polynomial response surfaces (PRS), kriging (KRG), and support vector regression (SVR). Each of these types has multiple flavors. For example, in PRS one can choose polynomials of different degrees, such as linear or quadratic.

The different types of surrogates are based on different assumptions on the data and the true function. For example, response surface fits typically assume that we know the functional form of the true function, but the data is contaminated by noise. In contrast, the common variants of kriging assume that the data is accurate, but all we know about the function is that it can be modeled by a Gaussian process.

Then there is the question of how we compare the goodness of two alternative fits. The common measure (loss function) is the root mean square (RMS) difference between the data and the fit. However, if most of the data is very accurate but there are one or two very inaccurate points (outliers), an L1 loss function (the average of the absolute errors) is less influenced by the outliers. In the figure, we have data that comes from a linear function, except that at x=2 there is a large error. A quadratic response surface (PRS2) will be affected strongly by this outlier, while kriging may end up with a strange shape trying to fit it exactly. On the other hand, a polynomial fit that minimizes the L1 measure will completely ignore the errant data. This loss function is often used with SVR.

4
WE USE ROOT MEAN SQUARE ERROR
Root Mean Square Error (RMSE) in a design domain with volume V:
$$RMSE = \sqrt{\frac{1}{V}\int_V e^2(\mathbf{x})\,d\mathbf{x}}$$
where $e(\mathbf{x}) = y(\mathbf{x}) - \hat{y}(\mathbf{x})$ is the error in the prediction $\hat{y}(\mathbf{x})$ of the surrogate model.
Compute the RMSE by Monte Carlo integration at a large number $p_{test}$ of test points:
$$RMSE \simeq \sqrt{\frac{1}{p_{test}}\sum_{i=1}^{p_{test}} e_i^2}$$
Used only for assessing accuracy on test problems.
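A minimal Python sketch of the Monte Carlo RMSE estimate above; the test function true_f, the fitted model with a predict method, and the box bounds lb and ub are assumptions for illustration, not part of the original study.

```python
# Sketch: Monte Carlo estimate of a surrogate's RMSE over the design box.
import numpy as np

def mc_rmse(model, true_f, lb, ub, n=10_000, seed=0):
    """Approximate sqrt((1/V) * integral of e^2 dx) with random test points."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n, lb.size))   # uniform points in the box
    e = true_f(X) - model.predict(X)             # errors at the test points
    return np.sqrt(np.mean(e ** 2))
```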

4

5
DOE FOR FIT AND DOE FOR INTEGRATION

Left: example of a design of experiments (DOE) used for fitting functions of 2 variables.
Right: two of the five highly dense DOEs used for RMSE estimation. The RMSE is the average of the values obtained with the five DOEs.

5
The process of testing the performance of a surrogate for a given test function starts by generating a set of points where the function will be evaluated. This set of points is called a design of experiments (DOE). The figure on the left shows the normally sparse DOE that is used for fitting. For the two-dimensional examples used in this lecture, we will often use 12 data points, as shown in the left figure.
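A minimal sketch of generating both kinds of DOE with Latin hypercube sampling, assuming scipy's qmc module; the point counts follow the lecture's 2-variable examples, and the dense-DOE size is a placeholder.

```python
# Sketch: sparse fitting DOE and dense test DOEs via Latin hypercube sampling.
# Assumes scipy >= 1.7 (scipy.stats.qmc).
from scipy.stats import qmc

X_fit = qmc.LatinHypercube(d=2, seed=0).random(12)   # 12-point fitting DOE
# Five dense DOEs for RMSE estimation; the RMSE is later averaged over them.
X_tests = [qmc.LatinHypercube(d=2, seed=s).random(2000) for s in range(5)]
```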

On the other hand, for checking the goodness of the fit, we will use highly dense DOEs, and we will compensate for their randomness by averaging the integrals from five different DOEs. Two of them are shown in the figure on the right, in red and black.

6
HOW WE GENERATE A LARGE SET OF SURROGATES
A set of 24 basic surrogates is generated by varying the modeling technique and the associated parameters.

6
To illustrate the impact of choosing a surrogate, it was important to have a large group that included representatives of all the popular types of surrogates. Of the 24 selected for that purpose, the largest number were SVR surrogates, because the SVR package permitted great flexibility in choosing the kernels (i.e., the basis functions) and the loss function. The parameter C controls the balance between the desire to have a fit with low values of the loss function and the desire to have a flat surrogate (which usually translates to a less complex one). The epsilon parameter specifies the size of errors that are considered to be noise and do not contribute to the loss function.

Kriging also had 6 different surrogates, because it offers different trends (the polynomial regression part) and different correlation functions, which control how the function values at different points are correlated based on their distance from one another.
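The study used a MATLAB toolbox, but a comparable family of surrogates can be sketched with scikit-learn; all class names and parameter values below are assumptions for the sketch, not the toolbox's settings.

```python
# Sketch: a family of surrogates built by varying model type and parameters.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

surrogates = {}
# Polynomial response surfaces (PRS) of different degrees
for degree in (1, 2):
    surrogates[f"prs{degree}"] = make_pipeline(
        PolynomialFeatures(degree), LinearRegression())
# SVR with different kernels and penalty/epsilon settings
for kernel in ("rbf", "poly"):
    for C, eps in ((1e2, 0.01), (1e4, 0.001)):
        surrogates[f"svr_{kernel}_C{C:g}"] = SVR(kernel=kernel, C=C, epsilon=eps)
# Kriging (Gaussian process) with different correlation functions
for name, k in (("gauss", RBF()), ("matern", Matern(nu=1.5))):
    surrogates[f"krg_{name}"] = GaussianProcessRegressor(kernel=k, normalize_y=True)
```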

All of these surrogates were available from a MATLAB toolbox created by Viana.

7
NO BEST SURROGATE EVEN FOR GIVEN FUNCTION
Branin-Hoo function (100 DOEs)

(Figure panels: results with 12 points and with 20 points.)
For 11 test problems, 12 surrogates were the best at least 10 times. Every problem had at least 2 surrogates that worked the best at least 10 times.

7
This slide illustrates the fact that usually there is no single surrogate that is the best even for a given problem. This is illustrated by performing the fit for each problem with 100 different designs of experiments (DOEs). These were generated using a DOE approach called Latin hypercube sampling (LHS, see the lecture on space-filling DOEs), which has a random element in it but tends to spread points evenly, as shown in the left figure in Slide 5.

The results in this slide are for the Branin-Hoo test function, defined by the equation and shown in the figure on the left. The top figure on the right shows the surrogates that gave the smallest RMS error when 12 points were used for fitting. It is seen that four different SVR surrogates did best for 93 out of 100 DOEs. However, when the number of data points was changed from 12 to 20, kriging was best 95 percent of the time (the even-numbered kriging surrogates use a Gaussian correlation function; 2 uses a constant trend, 4 a linear polynomial, and 6 a quadratic polynomial).

Altogether, 11 test functions were used, ranging in dimension between 2 and 12. For each one, the second-best surrogate was best for at least 10 out of the 100 DOEs, and for the set of problems as a whole, 12 of the 24 surrogates were best at least 10 times (out of 1100 cases).

These results illustrate that it is risky to choose surrogates a priori for a problem, in spite of the large number of papers that advocate the advantages of one type of surrogate or another.
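A sketch of the kind of counting experiment behind these results, reusing the surrogates dictionary from the earlier sketch; test_function and the point counts are placeholders, not the study's exact setup.

```python
# Sketch: count how often each surrogate is most accurate over repeated DOEs.
import numpy as np
from scipy.stats import qmc

def best_counts(surrogates, test_function, d=2, n_pts=12, n_doe=100):
    X_test = qmc.LatinHypercube(d=d, seed=1).random(2000)
    y_test = test_function(X_test)
    counts = {name: 0 for name in surrogates}
    for s in range(n_doe):
        X = qmc.LatinHypercube(d=d, seed=100 + s).random(n_pts)
        y = test_function(X)
        rmse = {name: np.sqrt(np.mean((m.fit(X, y).predict(X_test) - y_test) ** 2))
                for name, m in surrogates.items()}
        counts[min(rmse, key=rmse.get)] += 1   # credit this DOE's best surrogate
    return counts
```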

8
TEST PROBLEMS
Other analytical test functions

8
These are the other test functions; their equations are given in the paper. The box plots shown here merely convey the distribution of points in the design box.

The number of points used for fitting was chosen to be twice the number of coefficients of a quadratic polynomial. A quadratic polynomial in n variables has (n+1)(n+2)/2 coefficients, so in 12-dimensional space it has 13x14/2 = 91 coefficients, and for the Dixon-Price function 182 points were therefore used in the fitting DOEs. On the other hand, the accuracy of Monte Carlo integration is only weakly influenced by the dimensionality, so the number of points used for integration rose very slowly with the dimension.

9
AIRFOIL APPLICATION
AIRCRAFT TAKEOFF PERFORMANCE: requires airfoil lift, drag, and pitching moment coefficients.
11 design variables: 10 variables describing the airfoil and one for the angle of attack.

450 simulations: 156 points for fitting and 294 points for RMSE computation. Points selected randomly for 100 DOEs.

9
One problem was selected to be an expensive simulation problem where we cannot afford to calculate the RMSE with many thousands of integration points. However, the number of test points, 294, is still almost double the 156 fitting points. Also, the fact that we repeat the process with 100 different subsets of the 450 simulations reduces the chance that the results would be misleading.

10
CROSS-VALIDATION ERRORS
One data point is ignored and the surrogate is fitted to the other p-1 points.

Repeat for each data point to obtain the vector of PRESS errors, $\mathbf{e}_{XV}$.

For large p, a k-fold strategy is used instead, leaving out a subset of the points each time.
With the PRESS vector, estimate the RMSE as:
$$PRESS_{RMS} = \sqrt{\frac{1}{p}\,\mathbf{e}_{XV}^{T}\mathbf{e}_{XV}}$$

We can now compare surrogates on the basis of their PRESS error.

10
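A minimal leave-one-out sketch of the PRESS computation, assuming scikit-learn-style estimators as in the earlier sketches.

```python
# Sketch: leave-one-out PRESS_RMS for one surrogate.
import numpy as np
from sklearn.base import clone

def press_rms(model, X, y):
    """sqrt of the mean squared leave-one-out (cross-validation) error."""
    p = len(y)
    e_xv = np.empty(p)
    for i in range(p):
        mask = np.arange(p) != i                 # leave point i out
        m = clone(model).fit(X[mask], y[mask])   # refit on the remaining p-1
        e_xv[i] = y[i] - m.predict(X[i:i + 1])[0]
    return np.sqrt(np.mean(e_xv ** 2))

# Rank all surrogates by PRESS and pick the apparent best, e.g.:
# ranked = sorted(surrogates, key=lambda k: press_rms(surrogates[k], X, y))
```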

11
CORRELATION IMPROVES WITH NUMBER OF POINTS
CONCLUSION: with enough points (even sparse), PRESS_RMS can be used to choose a good surrogate.

Mean value of the correlation between PRESS_RMS and RMSE (over 100 experiments).
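For one DOE, the correlation plotted here can be computed as below; the two length-24 vectors are assumptions standing in for values from the press_rms and mc_rmse sketches above.

```python
# Sketch: Pearson correlation between PRESS_RMS and true RMSE across surrogates.
import numpy as np

def press_rmse_correlation(press_values, rmse_values):
    """press_values, rmse_values: one entry per surrogate (24-vectors here)."""
    return np.corrcoef(press_values, rmse_values)[0, 1]
```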

11
Using the cross-validation (or PRESS) error to choose a surrogate would make sense if there is strong correlation between the PRESS error and the RMSE. The figure shows the correlation between the vectors of 24 PRESS and RMS errors obtained for the 24 surrogates, averaged over the 100 DOEs. We see that the correlation is good except for small numbers of points (12 or 20). This indicates that the PRESS error can help us identify the more accurate surrogates among the 24 we included.

12
FREQUENCY OF BestRMSE vs. BestPRESS
For large numbers of points, the best 3 surrogates according to both RMSE (in blue) and PRESS_RMS (in red) tend to be the same.

12
In terms of identifying the most accurate surrogates, the figures compare the number of times surrogates were found to be most accurate on the basis of PRESS errors and RMS errors. It is seen that for small numbers of points the correspondence is not very good, but for large numbers of points the PRESS error is almost sure to select a surrogate that is in the top 3 (out of 24) in terms of the true error (RMSE).

13
COMPUTATIONAL COST
Wall (wait) time on an Intel Core2 T5500 1.66GHz, 2GB of RAM laptop, running MATLAB 7.0 under Windows XP.

13
Calculating PRESS errors for all surrogates when the number of points is large can be expensive. Here the computational costs of fitting surrogates and calculating the PRESS errors are compared. The calculation of PRESS errors for all 24 surrogates for the Rosenbrock function took 11 hours on the computer used in 2008.

11 hours may appear high, but for expensive simulations this is often less than the cost of a single simulation, and we need 110 of them. As computers grow faster, simulations always expand in complexity and continue to demand hours and days of computer time. So the cost of the PRESS error calculation will shrink in comparison to the cost of simulations.

14
PASSIVE HELICOPTER VIBRATION REDUCTION
In helicopters, the dominant source of vibrations is the rotor (Nb/rev).

In the passive approach:
Objective function: a suitable combination of the Nb/rev hub loads.
Constraints: stability margin, frequency placement, autorotation, side constraints.
Design variables: cross-sectional dimensions, mass and stiffness distributions along the span, pretwist, and geometrical parameters which define advanced-geometry tips.
The aerodynamic environment is expensive to model.
Glaz, B., Goel, T., Liu, L., Haftka, R.T., and Friedmann, P.P. (2009), "Multiple-surrogate approach to helicopter rotor blade vibration reduction," AIAA Journal, Vol. 47(1), 271-282.

14
An example that illustrates the advantage of using multiple surrogates is taken from a paper by Glaz et al. (see below). It concerns the passive reduction of helicopter blade vibration by changing geometry and mass distributions. The aerodynamic environment is expensive to model, so vibration simulation is expensive and can benefit from using a surrogate.

Glaz, B., Goel, T., Liu, L., Haftka, R.T., and Friedmann, P.P. (2009), "Multiple-surrogate approach to helicopter rotor blade vibration reduction," AIAA Journal, Vol. 47(1), 271-282.

15
OPTIMIZATION PROBLEM
Objective function to be minimized: weighted sum of the 4/rev oscillatory hub shear resultant and the 4/rev oscillatory hub moment resultant.
17 design variables: t1, t2, t3, and mns.
The three thicknesses are defined at the 0%, 25%, 50%, 75%, and 100% blade stations.
Non-structural mass is defined at the 68% and 100% stations.

15
The objective function weights the force resultants and the moment resultants on the blade. There are 15 thickness design variables (the three thicknesses shown in the figure) at five locations on the blade, varying linearly in between. Two more variables are the added non-structural masses at two locations on the blade.

16
EXPENSIVE STRESS CONSTRAINT
Assuming isotropy, the von Mises criterion is used to determine if the blade yields, with a factor of safety.

Constraint is enforced at a set of discrete points.

Calculation of blade stresses is as expensive as a vibration objective function evaluation since a forward flight simulation is needed.

A surrogate is used for this constraint.

16

17
Weighted (PWS) Surrogate Construction
Optimal Latin hypercubes (OLH) used to create surrogates.
Out of a 300 pt. OLH, 283 had converged trim solutions (53 hours).
Out of a 500 pt. OLH, 484 had converged trim solutions (82 hours).
Each simulation took 8 hours, with 40-50 run in parallel.
Fitting plus PRESS took 7-10 minutes for 283 points, 30-40 minutes for 484.
Weights in the table are inversely proportional to the PRESS error.

Weight coefficient | Sample size | F4X  | F4Y  | F4Z  | M4X  | M4Y  | M4Z  | J    | Stress constraint
w_poly             | 283         | 0.41 | 0.40 | 0.32 | 0.37 | 0.31 | 0.29 | 0.33 | 0.35
w_krg              | 283         | 0.48 | 0.47 | 0.46 | 0.46 | 0.45 | 0.41 | 0.46 | 0.44
w_RBNN             | 283         | 0.12 | 0.13 | 0.22 | 0.17 | 0.25 | 0.30 | 0.21 | 0.21
w_poly             | 484         | 0.42 | 0.44 | 0.36 | 0.38 | 0.34 | 0.33 | 0.38 | 0.40
w_krg              | 484         | 0.45 | 0.42 | 0.43 | 0.45 | 0.41 | 0.38 | 0.42 | 0.42
w_RBNN             | 484         | 0.13 | 0.14 | 0.22 | 0.18 | 0.26 | 0.29 | 0.20 | 0.18

17
1st bullet: A quadratic polynomial in 17 variables has 18x19/2 = 171 coefficients. Two Latin hypercube samples were considered: a 300-point OLH, of which 283 were used for surrogate fitting, and a 500-point OLH, of which 484 were used for fitting. This illustrates the fact that often simulations fail, and one of the advantages of Latin hypercube designs is that they remain reasonably space-filling even when some points are lost.

Using multiple processors in parallel, 53 hours were required to generate the 283 fitting points, and 82 hours were required for the 484 points.

The table was generated for a study on using a weighted-sum surrogate, with higher weights for the more accurate surrogates. For each sample size, the three weights in each column of the table sum to one.

For the 8 functions which were approximated in this study (6 underlying responses, the overall response J, and the stress constraint), kriging was generally the most accurate and had the highest weight.
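A minimal sketch of PRESS-based weighting under the stated rule (weights inversely proportional to the PRESS error); this is an illustration consistent with the slide, not necessarily the exact scheme used by Glaz et al.

```python
# Sketch: weighted-sum (PWS) surrogate with PRESS-based weights.
import numpy as np

def press_weights(press):
    """Weights inversely proportional to each surrogate's PRESS error."""
    inv = 1.0 / np.asarray(press, float)
    return inv / inv.sum()          # normalize so the weights sum to one

def pws_predict(models, weights, X):
    """Weighted-sum prediction of already-fitted surrogates."""
    return sum(w * m.predict(X) for w, m in zip(weights, models))
```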

18
Errors at 197 test points
283: kriging has the lowest error.
484: polynomials are lowest in some instances.
(Figure: average errors for the 283- and 484-point sample sets.)

18
The results on this slide show the average errors in the surrogates at 197 test points for the 6 force and moment resultants and the stress constraint surrogate.

For the 283-point sample set, the kriging surrogate has the lowest average error among the individual approximation methods for each response, while for some responses with the 484-point sample set, the polynomials had the lowest average errors. For instance, in the case of F4X, polynomials are better than kriging. So the choice of the "best" surrogate in terms of approximating over the entire design space depends on the sample size for the responses considered in this study, which is one of the pitfalls of attempting to identify the best surrogate.

19
Optimization Results
Among the individual approximation methods:
Polynomials resulted in the best design with 283 sample points and the worst design with 484 sample points.
RBNN led to the best design with 484 sample points.
Each optimization required 2-4 hours with about 200,000 function evaluations.
None of the surrogates led to the same design.

Vibration reduction (relative to the MBB BO-105):

Surrogate | Sample size | Vibration reduction
Poly.     | 283         | 64%
KRG       | 283         | 54%
RBNN      | 283         | 57%
Poly.     | 484         | 45%
KRG       | 484         | 56%
RBNN      | 484         | 68%

19
The results on this slide correspond to optimization of the surrogate objective function generated by fitting the overall response. Results for optimization of the objective function built from the surrogates of the underlying responses can be found in the paper.

Each surrogate required 2-4 hours to optimize, using 200,000 function evaluations with a genetic algorithm from iSIGHT. Since each surrogate could be optimized independently, the optimizations were conducted in parallel. Thus only 4 hours were required to optimize all the surrogates, which is small compared to the 53 hours needed to generate the 283 fitting points and the 82 hours for the 484 points.
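A hedged sketch of this step: the study optimized with a genetic algorithm from iSIGHT; scipy's differential evolution is a stand-in below, and the bounds and the reuse of the earlier surrogates dictionary are assumptions for illustration.

```python
# Sketch: optimize each fitted surrogate independently; the true simulation is
# then run only at the surrogate optima, and the best candidate is kept.
from scipy.optimize import differential_evolution

def optimize_all(surrogates, bounds):
    """Return one candidate design per surrogate (assumed already fitted)."""
    candidates = {}
    for name, model in surrogates.items():
        res = differential_evolution(
            lambda x, m=model: float(m.predict(x.reshape(1, -1))[0]),
            bounds, seed=0)
        candidates[name] = res.x
    return candidates

# e.g. bounds = [(0.0, 1.0)] * 17 for 17 normalized design variables (assumed)
```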

First bullet, 1st sub-bullet: This illustrates a problem with optimizing only a single surrogate. If one were to perform the optimization with 484 sample points using only polynomials, because polynomials were the best with 283 sample points, this would result in the worst design.

2nd sub-bullet: Notice that RBNN led to the best design among the individual surrogates for 484 points, yet it is the least accurate. So, if one neglected to optimize the RBNN surrogate because it is the least accurate, a good design would have been missed. On the other hand, for the relatively low additional cost of optimizing multiple surrogates, such issues are overcome.

20
CONCLUDING REMARKS
The most accurate surrogate for a given function depends on the design of experiments and the point density.
The cross-validation error identifies accurate surrogates well, especially as the number of points in the DOE increases.
The cost of fitting multiple surrogates and calculating cross-validation errors is low enough to use now for most expensive simulation problems.
Optimizing with several surrogates adds little to the overall cost, and the best design may be obtained by a less accurate surrogate.

20