Optimization of car body under constraints of noise, vibration, and harshness (NVH), and crash

Industrial applications and design case studies. Struct Multidisc Optim 22, 295–306. Springer-Verlag 2001


J. Sobieszczanski-Sobieski, S. Kodiyalam* and R.Y. Yang

Abstract A car body structure was optimized for minimum weight under the constraints of noise, vibration, and harshness (NVH), and a crash event, using up to 254 concurrently operating processors. The crash analysis alone, if executed on a single processor and repeated the number of times this optimization required, would have taken 257 days of elapsed computing time. Parallel processing compressed the elapsed time to one day, demonstrating how a multiprocessor machine may be useful in solving engineering tasks that heretofore were regarded as intractable. The optimization procedure transformed an initially infeasible structure into one with reduced weight and all constraints satisfied. The experience gained in the reported application indicates that it is important to tailor the solution method to the characteristics of the multiprocessor computer architecture and to understand the data handling options offered by that architecture. Another conclusion drawn from this case is that coarse-grained parallelism, whereby an existing code is replicated over an array of processors, should be regarded as an effective way of utilizing multiprocessor machines, immediately available in the interim before solutions are redeveloped from the ground up specifically for that class of machines.

Key words Multidisciplinary Design Optimization (MDO), High-Performance Computing (HPC), crashworthiness, approximations

Received May 15, 2000

J. Sobieszczanski-Sobieski (1), S. Kodiyalam (2) and R.Y. Yang (3)

(1) NASA Langley Research Center, MS 139, Hampton, VA, USA; e-mail: [email protected]
(2) SGI (Silicon Graphics, Inc.), HPC Applications Engineering & Market Development, MS 405, Mountain View, CA 94043, USA; e-mail: [email protected]
(3) Ford Motor Company, Vehicle Safety R&D Department, Dearborn, MI 48121, USA; e-mail: [email protected]

* formerly with Engineous Software Inc.

1 Introduction

Occasionally, problems arise in structural mechanics that are so computing intensive that they acquire a reputation of intractability. Optimization of a car structure for minimum weight under the constraints imposed by the regulations pertaining to passenger protection in a crash is in this category. Analysis of a car body structure for the dynamics and nonlinearities of a crash event may, typically, require elapsed computer time measured in weeks for one particular design if executed on a state-of-the-art single processor. Consequently, optimization in which the above analysis would have to be repeated many times has hardly ever been attempted.

The increasing availability of computers with many processors begins to enable what was impossible before, as demonstrated by Chargin and Miura (1999), Stander (1999), Schramm et al. (1999), and Yang et al. (1994). This paper reports on another instance of a previously intractable optimization, in which a car body structure was optimized for minimum weight under the constraints of crash and those of noise, vibration, and harshness (NVH).

The car body structure was represented by a finite element model of 390 000 elastic degrees of freedom (Sect. 3). Its NVH analysis was performed by the MSC/NASTRAN code (MSC/NASTRAN Guide 1998), while the crash analysis (a single scenario) was carried out by the RADIOSS Crash code (RADIOSS user manual 1998). In the optimization procedure, parallel computing was applied in the execution of the RADIOSS code in two ways. The code was internally parallelized using 12 processors on an SGI Origin 2000 computer. The computer contained 256 processors; therefore, it was possible to perform 21 simultaneous executions, each execution employing 12 processors internally in RADIOSS. Analysis organized in the above manner (often referred to as coarse-grained parallelization) was used to construct a Response Surface for the crash behaviour in the space of the design variables.

The MSC/NASTRAN dynamic analysis and the sensitivity analysis available in that code yielded the NVH behaviour variables and their derivatives with respect to 30 design variables. Based on that data, an analytical approximation to the NVH behaviour as a function of the design variables was constructed and used in the optimization procedure (Sect. 3.2) alongside the Response Surface representing the crash behaviour in a separate design space of 20 variables. A search algorithm (Spellucci 1998) was directed to draw information from these two approximations. The mix of two dissimilar approximations derived from completely different computer codes was a novel element in the optimization procedure. Therefore, the procedure has been named the Optimization by a Mix of Dissimilar Analyses and Approximations (OMDAA).

The optimization procedure converged in three cycles. In each cycle the NVH data, including the derivatives, were refreshed and the crash Response Surface was enriched with the analysis data generated in the previous cycle. One cycle was completed in a day, while it would have consumed 257 days if it were executed on a single processor. The optimization took the design from the initial “best guess” state, in which some of the constraints were violated, to a state in which all the constraint violations were removed and yet the structural weight was reduced by 15 kg (Sect. 4).

As pointed out in the Concluding Remarks, coarse-grained parallelism of the type used herein offers an effective way in which to exploit concurrently operating processors. Although the number of processors is limited by the particulars of the application at hand, the approach has the distinct advantage of accommodating existing (legacy) codes unchanged.

2 NVH and crash of the car structure: the model and the problem

In this section, we introduce the elements of the overall optimization problem. They are the Noise, Vibration, and Harshness behaviour, and the Crash event. Each of the two elements involves finite element modelling and design variables. There are also design variables common to both elements.

Fig. 1 NVH car body model

2.1 Noise, vibration, and harshness (NVH)

Noise, Vibration, and Harshness (NVH) is one of the most important attributes in car product development. A vehicle with good NVH often results in much higher customer satisfaction. In the car product development process, different NVH models are used for different purposes so that NVH quality is high and cost is at a minimum. A car body called Body-In-Prime (BIP) is used for this study. The BIP is a trimmed body without the closures (door, hood, deck lid), other sub-systems (steering column, fuel tank, and seats) and trim items (carpeting, battery, etc.). A trimmed body structure may be thought of as a vehicle without the suspension and powertrain sub-systems. The BIP can also be thought of as the “Body-In-White” with glass. The BIP plays an important role in determining the dynamic characteristics of the vehicle.

The BIP normal modes, static bending and static torsion analyses were conducted using MSC/NASTRAN. The full-scale NVH finite element model is shown in Fig. 1. The total number of shell elements is close to 68 000 and the total number of nodes is about 69 000. The normal modes were calculated under the free-free condition. The static bending analysis was conducted with the front (yz and z) and rear (xz and xyz) shock towers constrained, while for the static torsion the rear shock tower supports (xz and xyz) and a mid point of the lower radiator support (z) were constrained. The bending stiffness calculated using a load applied at the front rocker locations was 3676 N/mm, while the torsion stiffness calculated using a torque applied at the front shock tower locations was 9092 N-m/deg. The free-free normal mode analysis shows the overall torsion mode at 26.7 Hz and the overall bending mode at 38.9 Hz.

The total number of design variables for the NVH model is 19: 10 for backlite glass and sheet metal thicknesses, and 9 for the stiffness of the connection between the backlite glass and the structure, as shown in Fig. 1. The thickness design variables cover the floor panels, jacking/towing on quarter panel, backlite glass, shotgun and radiator support. The initial values of the thickness and stiffness design variables, together with their lower and upper bounds, are given in Tables 1 and 2, respectively. The torsion frequency for the BIP free-free normal mode is to increase by 10% to 29.32 Hz. The upper bounds for static torsion and static bending displacements are chosen as 3.3 mm and 1.1 mm, i.e. a 10% improvement from the initial design.

Table 1 Thickness design variables for NVH

No.  Design variable                  Initial design  Lower bound  Upper bound
1    Rear floor panel                 0.76            0.5          1.0
2    Rear floor cross member          1.4             0.8          2.0
3    Front floor pan                  0.76            0.5          1.0
4    Front floor inner                1.07            0.5          1.5
5    Jacking/towing on quarter panel  0.8             0.5          1.5
6    Quarter panel                    0.8             0.5          1.5
7    Back light glass                 3.8             2.6          5.0
8    Rear tyre cover                  0.75            0.5          1.0
9    Shotgun                          1.22            0.9          1.5
10   Radiator support                 0.76            0.5          1.0

Table 2 Stiffness design variables for NVH

No.  Design variable              Initial design  Lower bound  Upper bound
1    Top edge (x-component)       1073.3          750          1395
2    Top edge (y-component)       366.9           256          478
3    Top edge (z-component)       2733.6          1912         3554
4    Bottom edge (x-component)    1424.5          1000         1850
5    Bottom edge (y-component)    487             340          630
6    Bottom edge (z-component)    3628.3          2540         5090
7    Side edges (x-component)     1521            1065         1977
8    Side edges (y-component)     520             365          675
9    Side edges (z-component)     3874            2710         5035

Fig. 2 Roof crush finite element model

2.2 Roof crush

Vehicle roof crush is a federally mandated requirement intended to enhance passenger protection during a roll-over event. The test procedure is defined in the Federal Motor Vehicle Safety Standards (FMVSS 216). The finite element roof crush model for this study is converted from an NVH model. The crash finite element model is shown in Fig. 2 and a summary of its characteristics is given in Table 3. The explicit finite element dynamic software RADIOSS was used for the crush simulation. Some unnecessary parts in the NVH model are deleted and some missing parts are added in the roof crush model, e.g. very detailed side doors were added and the glasses are refined. The total number of elements for roof crush is about 120 000.

A 72 inch by 30 inch rectangular ram is added to perform the roof crush as specified by FMVSS 216. The longitudinal axis of the ram (see Fig. 2) is at a forward angle (side view) of 5 degrees below the horizontal, and is parallel to the vertical plane through the vehicle's longitudinal centreline. The lateral axis is at a lateral outboard angle, in the front view projection, of 25 degrees below the horizontal. The lower surface is tangent to the surface of the vehicle, and the initial contact point is on the longitudinal centreline of the lower surface of the ram and 10 inches from the forward-most point of the centreline. In the RADIOSS simulation, the ram normal speed was set to 7.5 MPH. As described in FMVSS 216, the force generated by vehicle resistance must be greater than 5000 lb (22 240 N) or 1.5 times the vehicle weight, whichever is less, through 5 inches of ram displacement. In this study, the roof crush resistant force was set to be 5400 lb. The door thicknesses and material yield stresses are chosen as the design variables, as shown in Tables 4 and 5, respectively.

Table 3 Numerical characteristics of the crash model

Characteristic                                              Value
No. of nodal points                                         128826
No. of boundary conditions                                  6085
Analysis type: 0 = 3D, 1 = AXYSYM, 2 = plane strain         0
No. of 2D solid elements                                    0
No. of 3D solid elements                                    0
No. of 3D shell elements (4-nodes)                          124868
No. of 3D truss elements                                    0
No. of property sets                                        286
No. of 3D beam elements                                     2
No. of 3D spring elements                                   2484
No. of 3D shell elements (3-nodes)                          0

Table 4 Door thickness design variables for roof crush

No.  Design variable     Initial design  Lower bound  Upper bound
1    Front door          0.7             0.4          1.0
2    Front door inner    0.7             0.4          1.0
3    Rear door           1.0             0.7          1.3

Table 5 Yield stress design variables

No.  Design variable       Initial design  Lower bound  Upper bound
1    A-pillar 1            MS              MS           HSS
2    A-pillar 2            MS              MS           HSS
3    A-pillar 3            MS              MS           HSS
4    B-pillar 1            MS              MS           HSS
5    B-pillar 2            MS              MS           HSS
6    Front door inner 1    MS              MS           HSS
7    Front door inner 2    MS              MS           HSS

MS: mild steel, HSS: high strength steel

2.3 Common design variables

The common design variables for the NVH and roof crush problems are the windshield, A-pillar, B-pillar, C-pillar and roof thicknesses, as shown in Table 6.

Table 6 Common design variables

No.  Design variable             Initial design  Lower bound  Upper bound
1    Windshield                  3.8             2.6          5.0
2    Roof panel                  0.7             0.4          1.0
3    Roof rail                   0.8             0.6          1.2
4    Roof cross member (front)   0.8             0.6          1.2
5    Roof cross member (rear)    0.7             0.5          1.1
6    A-pillar 1                  0.8             0.5          1.1
7    B-pillar 1                  0.8             0.4          1.0
8    B-pillar 2                  0.8             0.5          1.1
9    B-pillar 3                  1.35            0.9          1.5
10   C-pillar                    0.8             0.5          1.1

2.4 Optimization problem statement

The optimization problem is stated in the following form, which specifies the minimum weight as the common objective and includes constraints from both the NVH and Crash subproblems:

given a set of design variables $X$,

$$
\begin{aligned}
&\text{find: } X \\
&\text{minimize: } F[X, Y(X)] \\
&\text{satisfy: } G_j[X, Y(X)], \ \text{bounds on } X .
\end{aligned}
\tag{1}
$$

In the problem defined by (1), Y(X) represents the behaviour (state) variables of NVH and Crash, F represents the design objective function and Gj represents the design constraints.

The NVH-crashworthiness optimization problem can be specifically stated as a multidisciplinary problem involving the “disciplines” of NVH and Crash:

given a set of system (Z) and local (X) design variables,
find: ∆X and ∆Z
minimize: weight of the car body structure
satisfy: static torsion displacement > −3.3 mm
         static bending displacement < 1.1 mm
         frequency (Mode 3): 26.65 < ω3 < 29.32 Hz
         crash force at interface 2 (normal) > 24 kN over the crush distance < 5 in.
         bounds on the design variables, X and Z.

In this optimization task, the NVH discipline has 30 local design variables while the crash discipline has 20 local design variables. A subset of these design variables (Z = 10) is common to both the NVH and crash disciplines. The design variables are primarily sizing (thickness) variables and spring stiffnesses.
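The constraint set above can be written down as a small feasibility check. The following is only an illustrative sketch: the dictionary keys and the function name are hypothetical, the bounds are treated as inclusive (an assumption), and the sample responses are the initial-design values listed in Table 7.

```python
# Constraint limits copied from the problem statement; the helper name and
# dictionary keys are illustrative, not taken from the paper's software.
LIMITS = {
    "static_torsion_mm": (-3.3, None),   # displacement > -3.3 mm
    "static_bending_mm": (None, 1.1),    # displacement < 1.1 mm
    "mode3_hz": (26.65, 29.32),          # 26.65 < omega_3 < 29.32 Hz
    "crash_force_kn": (24.0, None),      # normal force > 24 kN
}


def violated_constraints(responses, limits=LIMITS):
    """Return the names of responses that fall outside their limits.

    Bounds are treated as inclusive (an assumption), so a response sitting
    exactly on a limit counts as feasible.
    """
    violated = []
    for name, (lower, upper) in limits.items():
        value = responses[name]
        if lower is not None and value < lower:
            violated.append(name)
        elif upper is not None and value > upper:
            violated.append(name)
    return violated


# Initial-design responses as listed in Table 7.
initial = {
    "static_torsion_mm": -3.68,
    "static_bending_mm": -0.97,
    "mode3_hz": 26.65,
    "crash_force_kn": 34.69,
}
print(violated_constraints(initial))
```

Under these assumptions only the static torsion displacement is flagged for the initial design, which agrees with the statement that the starting point is infeasible in the NVH discipline.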

3 Analysis tools and optimization procedure

In Sect. 3 we report on the principal tools used for the analysis and the optimization procedure of a car body for minimum weight under the constraints of NVH and crashworthiness. We also include information on the computer on which the above was implemented.

3.1 Analysis tools

NVH analysis and sensitivity computations were performed using the Solution Sequence 200 in MSC/NASTRAN v70.5 (MSC/NASTRAN Guide 1998). Since MSC/NASTRAN is a widely used tool, the reader is referred to the reference for additional information on its capabilities and usage.


Crashworthiness analysis was performed using the RADIOSS CRASH (Version 4.1b for SGI) code from MECALOG in France (RADIOSS user manual 1998). RADIOSS is an explicit finite element analysis software for performing dynamic, non-linear structural analysis involving large strains. The main features of RADIOSS include:

– an explicit time integration scheme,
– a finite element library including 8 node solid, 3 and 4 node shell, beam and truss, and various types of spring elements,
– contact computation including efficient adaptive algorithms,
– various kinematic conditions including boundary conditions, rigid bodies, fixed velocities, rigid walls, rivets,
– over 35 different material laws for typical metals, composites, foams, glasses, honeycombs, plastics, and rubber. User laws can be implemented.

The global algorithm used in RADIOSS can be out-lined as follows.

Start at T = 0, where T represents time.
While T < Tfinal:
    Compute contact forces
    Compute elements: strain, stress, internal forces
    Integrate V = ∫ γ dt
    Compute kinematic constraints
    Integrate X = ∫ V dt
    Compute new time step dt
    T = T + dt
End while
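To make the structure of the loop above concrete, here is a minimal sketch of an explicit update for a single mass on a linear spring. It is not RADIOSS code: the contact and kinematic-constraint steps are omitted, the time step is held fixed rather than recomputed, and the function name and parameters are hypothetical.

```python
import math


def explicit_oscillator(mass, stiffness, x0, v0, dt, t_final):
    """Integrate m*x'' = -k*x with an explicit scheme and a fixed step."""
    t, x, v = 0.0, x0, v0
    while t < t_final:
        force = -stiffness * x   # internal force from the current state
        accel = force / mass     # acceleration (gamma in the outline)
        v += accel * dt          # integrate V = integral of gamma dt
        x += v * dt              # integrate X = integral of V dt
        t += dt                  # fixed step; the outline recomputes dt here
    return x, v


# One full period of a unit oscillator should return close to the start.
x_end, v_end = explicit_oscillator(1.0, 1.0, 1.0, 0.0, 1e-3, 2.0 * math.pi)
print(x_end, v_end)
```

The order of operations (forces, then velocity, then position, then time) mirrors the outline; in a production explicit code the step size is limited by stability, which is why the outline recomputes dt in every pass.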

Since crash simulation is an extremely CPU intensive application, the RADIOSS code implements three main parallel programming paradigms: Shared Memory, Message Passing and Data Parallel. The SGI Origin 2000 computer used in this study uses the Shared Memory paradigm. For the car body roof crush analysis reported in this paper, the internal parallelization in RADIOSS is optimal for n = 12, n being the number of processors. This was determined experimentally by repeating the same analysis with different numbers of processors, n = 1, 8, 12, 14, and 16.

3.2 Optimization by a mix of dissimilar analysis and approximations (OMDAA)

The solution procedure is a piecewise approximation based optimization method. Its novelty lies in the use of approximate models derived from different codes and different mathematical approaches:

– NVH: MSC/NASTRAN FEA and the MSC/NASTRAN (Solution Sequence 200) internal semi-analytical sensitivity are used to construct a linear approximation model.
– Crash: RADIOSS, explicit FEA, with internal parallelization. Multiple RADIOSS analyses are executed concurrently to generate points to which a polynomial response surface is fitted to create another approximate model.

This OMDAA procedure is similar to one presented by Golovidov et al. (1999) for simplified analysis models. A flow chart of the OMDAA solution procedure is shown in Fig. 3.

First the NVH constraints and their sensitivities with respect to the design variables are evaluated using MSC/NASTRAN. The crash constraints are evaluated using the code RADIOSS. The next step is the construction of the approximation models for the NVH and Crash responses.

The conservative Taylor series approximation (Starnes and Haftka 1979) is used for approximation of the NVH (MSC/NASTRAN) responses. The conservative approximation is given by

$$
a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x_i - x_{ik})\,\phi_i(x_i, x_{ik}),
$$

where

$$
\phi_i(x_i, x_{ik}) =
\begin{cases}
1, & \text{if } x_{ik}\, g_i(x_k) \ge 0, \\
x_{ik}/x_i, & \text{otherwise.}
\end{cases}
$$

In the above formulae, $a_k(x)$ is the approximate function at step $k$, $f(x)$ is the exact function, and $g_i$ is the partial derivative of $f$ with respect to the $i$-th design variable. The crash constraints are evaluated using the code RADIOSS on the Origin 2000 operating on 256 processors simultaneously to generate data for a polynomial response surface in the design variable domain. An “adaptable” response surface model (RSM) implementation (Golovidov et al. 1999) is used for approximation of the crash (RADIOSS) responses. In the adaptable RSM approach, a linear approximation model is constructed initially using (m+1) design points, where m is the number of model inputs (m crash design variables).
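A minimal sketch of how the conservative approximation could be evaluated is given below. The function and argument names are hypothetical, and the test function is an arbitrary example chosen so that the result can be checked by hand; none of it comes from the MSC/NASTRAN implementation.

```python
def conservative_approximation(f_k, grad_k, x_k, x):
    """Evaluate a_k(x) about the expansion point x_k.

    f_k    : exact function value f(x_k)
    grad_k : partial derivatives g_i(x_k)
    x_k, x : expansion point and evaluation point (each x_i must be non-zero)
    """
    value = f_k
    for g_i, x_ik, x_i in zip(grad_k, x_k, x):
        # phi_i = 1 gives a direct (linear) term, x_ik / x_i a reciprocal term.
        phi_i = 1.0 if x_ik * g_i >= 0.0 else x_ik / x_i
        value += g_i * (x_i - x_ik) * phi_i
    return value


# f(x) = 1/x1 + 2*x2 expanded about (1, 1): f = 3, gradient = (-1, 2).
approx = conservative_approximation(3.0, [-1.0, 2.0], [1.0, 1.0], [2.0, 3.0])
print(approx)  # the exact value f(2, 3) is 1/2 + 6 = 6.5
```

For this particular function the approximation is exact, because the term with a negative derivative is expanded in the reciprocal variable and the term with a positive derivative is expanded linearly.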

The NVH constraints and their derivatives, combined with the response surface for the crash constraints, form an approximation to the system analysis (surrogate analysis) that enables a cycle of multidisciplinary optimization within move limits. The numerical optimization algorithm used for solving the optimization problem is the DONLP Sequential Quadratic Programming implementation described by Spellucci (1998). In the inner loop of Fig. 3, the NVH sensitivities are recomputed to update the NVH approximation model while keeping the Crash response surface model constant. After the best design is found using these approximation models within the specified move limits, the design is analyzed using RADIOSS and the Crash response surface model is updated as part of the outer loop. Each cycle of the outer loop involves one RADIOSS analysis of the optimal design point and therefore one additional point for the response surface model, enabling a gradual increase in the order of the response surface and the extension of the response surface in the direction of the search. Alternatively, if during an outer loop the design variables have moved to their limits, additional RADIOSS analyses could be performed concurrently to update the response surface model in the new region of interest.

Fig. 3 OMDAA procedure flow chart
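The initial linear response surface needs m+1 analysis points for m design variables. The sketch below shows one simple way such a surface could be built; the sampling plan (a base point plus one perturbed point per variable) and all names are assumptions made for illustration, and the stand-in response replaces what would in practice be an expensive crash analysis.

```python
def fit_linear_surface(base, step, response):
    """Fit y ~ c0 + sum(b_i * x_i) through m+1 points.

    The sampling plan (base point plus one perturbed point per variable)
    is an assumption made for this sketch.
    """
    y0 = response(base)
    slopes = []
    for i in range(len(base)):
        point = list(base)
        point[i] += step
        slopes.append((response(point) - y0) / step)
    intercept = y0 - sum(b * x for b, x in zip(slopes, base))
    return intercept, slopes


def predict(intercept, slopes, x):
    """Evaluate the fitted linear surface at x."""
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


# Stand-in for an expensive crash analysis (hypothetical linear response).
def toy_response(x):
    return 3.0 + 2.0 * x[0] - 1.0 * x[1]


c0, b = fit_linear_surface([1.0, 1.0], 0.5, toy_response)
print(predict(c0, b, [2.0, 3.0]))
```

Each of the m+1 response evaluations is independent of the others, which is what allows them to be run concurrently; every further analysis point from the outer loop can then be used to refine the fit.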

The outer loop is repeated for successive cycles until the system convergence conditions are met. The convergence condition within the inner loop corresponds to satisfaction of the Kuhn-Tucker conditions, while for the outer loop it represents a point of diminishing returns in the system objective as well as the feasibility of all design constraints.

4 Results and their discussion

Results to be reported in this section are categorized into the car body structure data before and after the optimization, the computational performance of the optimization procedure, and extrapolation to future applications. The latter includes recommendations for incorporation of the procedure in the actual design process.

4.1 The car body structure data before and after the optimization

The results in terms of the design variable and behaviour function values are documented in Table 7. The accuracy of the approximation models is tabulated in Table 8 and the final design deformed vehicle shape is shown in Fig. 4.

As seen from Table 7, the initial design is an infeasible design with NVH discipline Static Torsion constraint violations of over 10%. The final design is a feasible design with a weight reduction of 15 kg relative to the initial design.

Table 7 shows that the optimization procedure was very judicious in its choices: for instance, variable #33, the A-pillar yield stress parameter, has been increased by 67%, while variable #28, C-pillar dimension, has been reduced by 33%.

Fig. 4 Deformed shape after roof impact corresponding to final design

Table 7 Design variables: initial and optimized variables

No.  Attribute name                 Initial design  Cycle 1 (N = 3)  Cycle 2 (N = 2)  Lower bound  Upper bound

NVH design variables
1    Rear floor panel               0.76            1.0              1.0              0.5          1.0
2    Rear floor cross member        1.4             2.0              2.0              0.8          2.0
3    Front floor pan                0.76            0.5              0.5              0.5          1.0
4    Front floor inner              1.07            1.0923           1.1926           0.5          1.5
5    Jacking/towing                 0.8             1.5              1.5              0.5          1.5
6    Quarter panel                  0.8             0.8876           0.8876           0.5          1.5
7    Back light glass               3.8             2.6              2.6              2.6          5.0
8    Rear tyre cover                0.75            1.0              1.0              0.5          1.0
9    Shotgun                        1.22            1.3681           1.2643           0.9          1.5
10   Radiator support               0.76            0.5              0.5              0.5          1.0
11   Top edge (x-component)         1073.3          1070.1           1193.41          750.0        1395.0
12   Top edge (y-component)         366.9           477.95           478.0            256.0        478.0
13   Top edge (z-component)         2733.6          2733.37          2734.24          1912.0       3554.0
14   Bottom edge (x-component)      1424.5          1417.75          1438.51          1000.0       1850.0
15   Bottom edge (y-component)      487.0           484.41           629.97           340.0        630.0
16   Bottom edge (z-component)      3628.3          3627.95          3632.52          2540.0       5090.0
17   Side edges (x-component)       1521.0          1518.30          1914.19          1065.0       1977.0
18   Side edges (y-component)       520.0           513.96           675.0            365.0        675.0
19   Side edges (z-component)       3871.0          3873.39          3886.59          2710.0       5035.0

Common design variables to NVH and Crash
20   Windshield                     3.8             2.6              2.6              2.6          5.0
21   Roof panel                     0.7             0.4              0.4              0.4          1.0
22   Roof rail                      0.8             0.6              0.6              0.6          1.2
23   Roof cross member front        0.8             0.6              0.6              0.6          1.2
24   Roof cross member rear         0.7             0.5              0.5              0.5          1.1
25   A-pillar                       0.8             1.0998           1.0971           0.5          1.1
26   B-pillar 1                     0.8             0.7944           0.7788           0.4          1.0
27   B-pillar 2                     0.8             0.5              0.5              0.5          1.1
28   B-pillar 3                     1.35            0.9              0.9              0.9          1.5
29   C-pillar                       0.8             0.5              0.5              0.5          1.1

Crash design variables
30   Front door thickness           0.7             0.4              0.4              0.4          1.0
31   Front door inner t             0.7             0.4              0.4              0.4          1.0
32   Rear door thickness            1.0             0.7              0.7              0.7          1.3
33   A-pillar 1                     0.207           0.345            0.192            0.192        0.345
34   A-pillar 2                     0.207           0.345            0.345            0.192        0.345
35   A-pillar 3                     0.207           0.345            0.192            0.192        0.345
36   B-pillar 1                     0.207           0.345            0.345            0.192        0.345
37   B-pillar 2                     0.345           0.192            0.345            0.192        0.345
38   Front door inner 1             0.207           0.192            0.192            0.192        0.345
39   Front door inner 2             0.207           0.345            0.345            0.192        0.345

NVH and Crash outputs: problem objective and constraint responses
1    NVH weight (kg)                282.44          282.70           282.53           objective    minimize
2    Crash weight (kg)              1255.65         1240.3           1240.2           objective    minimize
3    Mode 3 (Hz)                    26.65           29.32            29.32            26.65        29.32
4    Static torsion Z               3.76 (violated) 3.29             3.29             none         3.3
     displacement (mm)
5    Static torsion (mm)            −3.68           −3.31            −3.31            −3.3         none
6    Static bending (mm)            −0.97           −0.97            −0.935           none         1.1
7    Crash: NF-normal reaction      34.69           28.82            29.43            24.0         none
     at I/F 2 (kN)
8    Internal energy                3015.79         2331.7           2400.97          none         none

Table 8 NVH and crash approximation model errors (after cycle 2)

Response name              Actual     Approximate  % Error between actual and approximate values
Weight (kg)                1522.73    1522.69      0.0
Mode 3 frequency (Hz)      29.32      29.32        0.0
Static torsion (mm)        3.29       3.30         0.3
Static bending (mm)        −0.935     −0.895       4.3
Crash normal force (kN)    29.43      30.57        3.9
Internal energy            2400.9     2617.9       9.0

Convergence of optimization based on approximations, such as OMDAA, depends on the accuracy of the predictions the approximations make in regard to the behaviour data requested by the optimizer. The accuracy can be assessed by comparing the behaviour predicted by the analysis performed at the outset of a new cycle with the predictions generated by the approximations in the previous cycle. Table 8 displays such comparisons by showing the data obtained by analysis, labelled Actual, and those based on the polynomial crash response surface model, labelled Approximate. The errors are quite small, attesting to the effectiveness of the approximations used in this application.

4.2 Computational performance of the optimization procedure

The optimization procedure performance data presented here include the wall clock (elapsed) times and CPU times as functions of the number of processors and of the choice of the common data placement option.

Figure 5 shows the wall time required for a single RADIOSS analysis using a varying number of processors. Figure 5 also shows a comparison of two different common data placement procedures within the allotted memories for each analysis. The wall clock and CPU time statistics for N processors, where N is 1, 12, 14 or 16, are shown below for the First Touch common data handling option.

Fig. 5 Wall time (hrs.) for a single RADIOSS analysis with variable number of processors

CPUs  Wall time (hrs)  CPU time (hrs)
1     454              454
12    38:05:06         454:04:25
14    41:05:19         571:51:57
16    42:43:47         679:32:35

Benchmark analyses were also performed to determine the optimal number of processors and to choose a common data placement procedure from the two options available. The Origin 2000 computer architecture pairs up the CPUs and assigns a common fast memory unit to each pair. Thus, when an analysis solution is performed using 12 CPUs, 6 different local memory units are engaged. If there is a certain amount of data common to all the processors, these common data may either all be stored on one of those local memory units, or they may be replicated and stored on each local memory unit. The first option is called First Touch, the second is named Round Robin. Both were experimented with and the results are shown below.

CPUs  Data placement  Wall time (hrs)  CPU time (hrs)
12    FIRST_TOUCH     38:05:06         454:04:25
12    ROUND_ROBIN     24:26:21         291:29:36
14    FIRST_TOUCH     41:05:19         571:51:57
14    ROUND_ROBIN     23:18:19         323:59:03

In general, the advantage of First Touch is storage efficiency, because each datum is stored only once, but Round Robin is clearly faster in these benchmarks and should be used whenever storage space is not a constraint. On the basis of the experimentation illustrated by Fig. 5, the number of processors to be used concurrently in a single RADIOSS execution was set at 12, and ROUND ROBIN was selected for the data placement. The shortest time corresponds to 14 processors. However, the total number of processors available in the machine was 256 and there were 20 design variables for which an initially linear Response Surface was to be created. Generation of such a Response Surface required 21 points. Execution of 21 simultaneous RADIOSS analyses, each using 14 processors, would have required 294 processors, exceeding the machine capacity. Hence, the number of processors internal to a RADIOSS execution was reduced to 12, so that the total number of processors employed in 21 concurrent RADIOSS executions was 252, which stayed within the machine capacity at a very small penalty in the elapsed time for an individual RADIOSS execution.
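The processor budgeting described above amounts to a short piece of integer arithmetic, sketched here with a hypothetical helper function.

```python
def processors_per_analysis(total_processors, design_variables):
    """Largest per-analysis processor count that lets all runs start at once."""
    runs = design_variables + 1          # points for a linear response surface
    per_run = total_processors // runs   # integer share per concurrent run
    return runs, per_run, runs * per_run


runs, per_run, used = processors_per_analysis(256, 20)
print(runs, per_run, used)  # 21 12 252
```

With 256 processors and 21 concurrent analyses, 12 processors per analysis is the largest allocation that fits; 14 per analysis would require 294.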

4.3 Generalization from the particular application

The particular application reported herein generated experience that might be of interest to future users of multiprocessor machines. Therefore, this section is an attempt to generalize from the above experience.

Three general observations are in order. Firstly, internal parallelization of the RADIOSS code was an important factor in reducing the total elapsed time of the optimization procedure. Without the internal parallelization the total elapsed time would have been about 19 times longer, which would probably have rendered the case intractable.
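The factor of about 19 can be checked against the benchmark timings of Sect. 4.2. The sketch below assumes that the 454-hour single-processor run is the appropriate baseline for both data placement options; the helper name is hypothetical.

```python
def hours(hms):
    """Convert an 'hh:mm:ss' wall-clock string to hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60.0 + s / 3600.0


single_cpu_hours = 454.0              # 1 CPU wall time
round_robin_12 = hours("24:26:21")    # 12 CPUs, ROUND_ROBIN
first_touch_12 = hours("38:05:06")    # 12 CPUs, FIRST_TOUCH

speedup_rr = single_cpu_hours / round_robin_12
speedup_ft = single_cpu_hours / first_touch_12
print(round(speedup_rr, 1), round(speedup_ft, 1))
```

Under this assumption the Round Robin run on 12 processors is roughly 18.6 times faster than the single-processor run, against roughly 11.9 times for First Touch.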

Secondly, the internal parallelization of RADIOSS per se did not make the optimization possible. It was the choice of a Response Surface approximation technique that enabled coarse-grained parallel computing, which compressed the elapsed time to generate a Response Surface to a practically acceptable length of 1 day and made optimization possible. It should be noted that RADIOSS needed no reprogramming to be utilized in such a coarse-grained parallel computing mode.

Thirdly, for effective utilization of a multiprocessor computer the user has to become acquainted with the machine architecture and the related operational characteristics to an extent far greater than was customary when operating conventional, single processor machines. This point was illustrated in the foregoing by the considerations that led to the choice of the number of processors to be operated internally and externally to RADIOSS and to the selection of the option for the common data handling. It is also clear that the choice of the overall optimization procedure, OMDAA in the case at hand, was critical to the successful application and was strongly influenced by the machine architecture.

The above three observations drawn from the particular application at hand may be generalized by considering that any computation implemented on a multiprocessor machine will have a part that can be distributed and a part that cannot. Suppose that the total elapsed time, $T$, of the computation at hand when executed on a single processor machine consists of the sum

$$
T = T_p + T_n , \tag{2}
$$

where $T_p$ is the sum of the elapsed times of the parts which can be distributed (parallelized), and $T_n$ is the analogous sum for the nondistributable parts.

Let us now assume that the above computation is executed on $N$ concurrently operating processors on a machine whose total number of processors is $N_P \ge N$. We will also recognize that even though the processors operate independently, they may need to communicate with each other as required by the solution algorithm. The communication time is, usually, a function of $N$, $T_c = T_c(N)$. Then, the total elapsed time to solution, $T_{Te}$, is

$$
T_{Te} = T_p/N + T_n + T_c(N) . \tag{3}
$$

A plot of $T_{Te}$ and its components $T_p/N$, $T_n$, and $T_c$, assuming linearity of the latter, is shown in Fig. 6. The plots indicate the importance of $T_n$ and $T_c$ as limitations on the $N$ that can be used effectively.

Fig. 6 Total elapsed time, TTe, and its components

Comparing $T_{Te}$ to $T$, we have a few metrics of obvious interest.

The first metric is the ratio $T_{Te}/T$, denoted $R_t$, which measures the time saved by using a multiprocessor computer relative to $T$. It can be expressed as

$$
R_t = \frac{T_{Te}}{T} = \frac{1}{N}\,\frac{1 + N T_n/T_p + N T_c/T_p}{1 + T_n/T_p} . \tag{4}
$$

The inverse of $R_t$ measures the speed-up

$$
S = \frac{1}{R_t} = \frac{N\,(1 + T_n/T_p)}{1 + N T_n/T_p + N T_c/T_p} . \tag{5}
$$

The total time saved, $T_s$, is

$$
T_s = T - T_{Te} = T_p\,(1 - 1/N - T_c/T_p) . \tag{6}
$$

Finally, the ratio $S/N$, denoted $E$, which measures the efficiency of utilization of a set of $N$ processors, becomes

$$
E = \frac{S}{N} = \frac{1 + T_n/T_p}{1 + N T_n/T_p + N T_c/T_p} . \tag{7}
$$

The metrics $T_{Te}$ and $T_s$ are of primary interest to the machine user, while the metrics $S$ and $E$ matter most for the machine operator.

The limits of S and E are instructive. Assume L and sto stand for large and small numbers, then

$$\lim_{N \to L} S = \frac{1 + T_n/T_p}{T_n/T_p + T_c/T_p} , \tag{8}$$

$$\lim_{\substack{N \to L \\ T_n/T_p \to s \\ T_c/T_p \to s}} S = \frac{T_p}{T_n + T_c} , \tag{9}$$

$$\lim_{N \to L} E = \frac{1 + T_n/T_p}{N \left( T_n/T_p + T_c/T_p \right)} \to 0 . \tag{10}$$
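The step from (5) to the limits may be spelled out as follows. This is a worked sketch added for clarity, under the same assumption that Tc does not depend on N: the numerator and denominator of (5) are divided by N, after which the term 1/N vanishes for large N.

```latex
\begin{align*}
S &= \frac{N\,(1 + T_n/T_p)}{1 + N\,T_n/T_p + N\,T_c/T_p}
   = \frac{1 + T_n/T_p}{1/N + T_n/T_p + T_c/T_p} , \\
\lim_{N \to \infty} S &= \frac{1 + T_n/T_p}{T_n/T_p + T_c/T_p}
   \approx \frac{T_p}{T_n + T_c}
   \quad \text{when } T_n/T_p \ll 1 .
\end{align*}
```

For instance, with Tn/Tp = Tc/Tp = 0.01 the limiting speed-up is 1.01/0.02 = 50.5, no matter how many processors are added.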

To illustrate the orders of magnitude, suppose that Tn/Tp = 0.01 and Tc/Tp = 0.01. Then, for N = 100 and N = 1000000, we have

        N = 100       N = 1000000
TTe     0.03 Tp       0.020001 Tp
Rt      0.0297        0.0198
S       33.7          50.5
Ts      0.98 Tp       0.99 Tp
E       0.337         0.0000505

It is obvious that we need to keep Tn/Tp and Tc/Tp as low as possible in order to achieve high S and E, and low Rt. It is apparent that an increase of N by four orders of magnitude produces very little improvement in terms of Rt, S, and Ts in the presence of even quite small Tn/Tp and Tc/Tp. These parameters significantly depress S and E, and they severely limit the number of processors that can be effectively used. Indeed, (7) suggests that for Tn/Tp = 0.01 there is little incentive to increase N beyond 100, even if Tc/Tp = 0. This does not negate the positive fact that for N = 100 the time saved, Ts, nearly equals Tp. Obviously, if T is very long, that metric becomes very important to the user. One should note that in the above computation Tc/Tp is assumed constant. Should it increase proportionally to N, as would be likely if global communication were maintained, TTe would have increased to the order of 100Tp ≫ T for N = 100. The multiprocessor machine advantage would evaporate! This example clearly shows that one must be extremely wary of introducing processor-to-processor communication when implementing an algorithm on a multiprocessor machine.
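The effect of communication time that grows with N can be illustrated numerically. The sketch below is an illustration only: it assumes Tc/Tp = c N with a hypothetical coefficient c = 0.01 per processor (a value not taken from the study), and the function names are likewise assumptions. Under that assumption TTe/Tp = 1/N + Tn/Tp + c N has a minimum at N = 1/sqrt(c), beyond which adding processors lengthens the elapsed time.

```python
import math


def elapsed_over_tp(n: int, tn_ratio: float, tc_per_proc: float) -> float:
    """TTe/Tp from (3), assuming Tc/Tp = tc_per_proc * n (linear growth)."""
    return 1.0 / n + tn_ratio + tc_per_proc * n


def best_processor_count(tc_per_proc: float) -> float:
    """Continuous N that minimizes 1/N + tc_per_proc * N.

    Setting the derivative -1/N**2 + tc_per_proc to zero gives
    N = sqrt(1 / tc_per_proc).
    """
    return math.sqrt(1.0 / tc_per_proc)


if __name__ == "__main__":
    tn_ratio, tc_per_proc = 0.01, 0.01   # illustrative values only
    single = 1.0 + tn_ratio              # T/Tp on one processor
    for n in (1, 10, 100, 1000):
        tte = elapsed_over_tp(n, tn_ratio, tc_per_proc)
        note = "slower than one processor" if tte > single else ""
        print(f"N={n:5d}  TTe/Tp={tte:8.3f}  {note}")
    print("minimum near N =", best_processor_count(tc_per_proc))
```

With these illustrative values the elapsed time at N = 100 already exceeds the single-processor time T, which is the loss of advantage described above.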

To conclude, let us return now to the NVH/crash optimization at hand. The question that occurs naturally is whether it would have been desirable to use more than 254 processors in that application. To answer the question, consider that one OMDAA cycle included the NVH analysis, the crash analysis replicated 21 times, and one execution of an optimizer coupled with linear extrapolations of the NVH and crash behaviours. The NVH analysis elapsed time was about 6 hours (including the sensitivity analysis), the elapsed time of the crash analysis of a single design point using internally parallelized RADIOSS was, as indicated before, about 24 hours, and the optimizer operation took only about 15 minutes. With the NVH analysis carried out alongside the batch of the crash analyses, the total elapsed time of one cycle equals 24 hours and 15 minutes, totally dominated by the crash analysis time. That time cannot be compressed any further because the use of 12 processors exhausts the RADIOSS potential for internal parallelization. Thus, the number of analyses needed to construct a linear response surface for 20 design variables sets the number of simultaneous analyses at 21. That number multiplied by 12 gives 252, which accounts for all but two of the 254 processors used.
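The arithmetic of one cycle can be checked with a few lines. The sketch below uses only the approximate figures quoted above; the constant and function names are illustrative assumptions, and the sequential case is a hypothetical comparison in which the same jobs are run one after another on the same 12 processors.

```python
NVH_HOURS = 6.0          # NVH analysis, including sensitivity analysis
CRASH_HOURS = 24.0       # one RADIOSS crash analysis on 12 processors
OPTIMIZER_HOURS = 0.25   # optimizer with linear extrapolations
N_CRASH_RUNS = 21        # design points for the linear response surface
PROCS_PER_CRASH = 12     # internal parallelization limit of RADIOSS


def cycle_hours_concurrent() -> float:
    """NVH and all crash runs proceed side by side; the optimizer follows."""
    return max(NVH_HOURS, CRASH_HOURS) + OPTIMIZER_HOURS


def cycle_hours_sequential() -> float:
    """Hypothetical comparison: the same jobs executed one after another."""
    return NVH_HOURS + N_CRASH_RUNS * CRASH_HOURS + OPTIMIZER_HOURS


def crash_processors() -> int:
    """Processors occupied by the simultaneous crash analyses."""
    return N_CRASH_RUNS * PROCS_PER_CRASH


if __name__ == "__main__":
    print("concurrent cycle [h]:", cycle_hours_concurrent())
    print("sequential cycle [h]:", cycle_hours_sequential())
    print("crash processors:    ", crash_processors())
```

The concurrent cycle is bounded below by the longest single job, which is why adding processors beyond those that the crash analyses can occupy does not shorten it.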

The optimizer's short elapsed time gave little incentive to parallelize in that compartment; hence one must conclude that the total elapsed time was made as short as it could be in this application and there was no potential left for using more processors. A larger number could only be useful if there were more crash scenarios to be analysed, the number of design variables were greater, the order of the response surface were increased, or if the solution algorithm in RADIOSS were replaced with a new one, intrinsically parallelizable to a larger number of processors.
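How the usable processor count grows with the factors just listed can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the number of simultaneous analyses is taken equal to the number of coefficients of a full polynomial response surface, each crash scenario is given its own surface, and all analyses run at once on 12 processors each. The function names are hypothetical.

```python
import math


def rs_points(n_vars: int, order: int) -> int:
    """Coefficients of a full polynomial of the given order in n_vars variables.

    Assumed here to be the minimum number of analyses needed to fit it.
    """
    return math.comb(n_vars + order, order)


def processors_needed(n_vars: int, order: int, n_scenarios: int,
                      procs_per_analysis: int = 12) -> int:
    """Processors that can be kept busy if every analysis runs at once."""
    return rs_points(n_vars, order) * n_scenarios * procs_per_analysis


if __name__ == "__main__":
    print(processors_needed(20, 1, 1))   # linear surface, one scenario
    print(processors_needed(20, 1, 3))   # linear surface, three scenarios
    print(processors_needed(20, 2, 1))   # quadratic surface, one scenario
```

For 20 design variables and a linear surface the count of analyses is 21, in agreement with the figure quoted for this application.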

This vividly shows that the characteristics of the problem and the attributes of the solution tool set the number of processors that can be used in parallel. To increase that number radically, one needs to change the solution method. In the extreme, one needs to invent a new, intrinsically parallel solution paradigm for the problem at hand.

4.4 Extrapolation to future applications

The study reported herein included only a single crash scenario. To produce a car design conforming to the safety regulations, a larger number of crash scenarios needs to be incorporated. Accommodation of the additional crash scenarios would be very straightforward within the OMDAA organization, simply by generating additional Response Surfaces. To do so without extending the total elapsed time would require an increase of the number of processors available in the machine by a factor equal to the number of additional crash scenarios. It appears that the resulting total would still be well within the multiprocessor computer capacities expected in the very near future. Conversely, if the number of processors were to remain constant, an equivalent increase of the elapsed time would result.

To incorporate optimization such as the one reported herein in the actual design process would require close scrutiny of the results after each OMDAA cycle and a free exercise of human judgment and intervention in the process. That intervention might include adding and removing design variables and constraints, and making changes to the mathematical model as the non-linear crash process unfolds. Changes to the design itself may also be called for, as suggested by physical insight into the process gained by examination of the intermediate results. For example, changes of the cross-section from one type to another, say from a closed tube to an open channel, may be in order depending on whether the critical constraints are those of strength or stiffness. It is by such a symbiosis of the engineer's judgment with the computational algorithm that superior designs can best be achieved.

The OMDAA approach is compatible with the approach suggested above because it is open to human intervention at the completion of each cycle (the Outer Loop in Fig. 3).

5 Concluding remarks

An optimization of a car body structure for minimum weight under the noise, vibration, harshness (NVH) and crash constraints was accomplished, compressing to one day the 257 days of computing that would have been required if the problem solution were attempted on a conventional single-processor machine. This was achieved by using 254 concurrently operating processors on an SGI Origin 2000 computer. Essential to the success was the choice of the Response Surface (RS) approximation to model the crash response by simultaneous analysis of many points in the design space and having the optimizer refer to a surface fitted to these points in lieu of the analysis. The use of the RS approximation enabled a coarse-grained parallelism whereby the crash analysis code was replicated over many processors without reprogramming.

Generalizing the above experience, one may observe that a requirement that makes the use of multiprocessor computers distinctly different from that of a conventional, single-processor machine is the necessity to harmonize the problem characteristics, the solution procedure choice, and the particular machine architecture. Such harmonization is the prerequisite to an effective and efficient solution.

This approach demonstrated the usefulness of the coarse-grained mode of utilization of multiprocessor computers in applications heretofore regarded as intractable. The principal advantages of that approach are its immediate availability, because it utilizes existing codes with little or no reprogramming, and its nearly linear scalability with the magnitude of the problem. The approach also has limitations. One of them is the maximum number of processors that may be simultaneously engaged. That number depends on the characteristics of the problem at hand. The other is the necessity of each processor having enough fast memory to contain the problem.

To overcome the above limitations, one would have to go beyond the coarse-grained parallelism of an existing code replication and develop new codes from the ground up, tailoring them to the multiprocessor machine architecture and possibly inventing new solution paradigms. Conversely, one may anticipate that computer architecture may be developed to fit a class of applications. Until that happens, however, the coarse-grained parallelism of the type reported on herein is an option that will be found useful in many applications.

The optimization reported herein is an iterative process involving highly nonlinear analysis. It is, therefore, imperative to allow human judgment and intervention into the process to combine the engineer's creativity with the computing speed of a machine.

Acknowledgements The study was performed by collaboration of the Computational AeroSciences Team of the High Performance Computing and Communication Program, Engineous Software Development, Inc, and the Ford Motor Company Research Laboratories. The authors acknowledge the support of Ms. Mary Hultquist of the NASA Ames NAS Computing facility for the assistance and the dedicated time on the SGI Origin 2000 (Steger) machine. The authors also acknowledge the support from Ford engineers including Mr. C.H. Tho, Mr. L. Gu, and Mr. M. Jayasuriya for providing the models and consultation.

References

Chargin, M; Miura, H. 1999: Computer Aided Engineering forImproved Vehicle Crashworthiness. Poster paper atOptimiza-tion in Industry II (held in Banff, Canada, June 6–11)

Golovidov, O.; Kodiyalam, S.; Marineau, P. 1999: A flexible,object-based implementation of approximation models in anMDO framework. Des. Opt. 1, 388–404

MSC/NASTRAN design sensitivity and optimization guide(solution sequence 200). Los Angeles, CA: The MacNeal-Schwendler Corporation

RADIOSS user manual , Version 4.1. France: MECALOGSARL

Schramm, U.; Schneider, D.; Thomas, H. 1999: Structuraloptimization in occupant safety and crash analysis. Proc. Op-ticon99 (held in Newport Beach, CA, October 14–15)

Spellucci, P. 1998: DONLP2 users guide. Technical Univer-sity at Darmstadt, Department of Mathematics, Darmstadt,Germany

Stander, N.: Crashworthiness technology using responsesurface methodology and massively parallel programming.Poster paper at Optimization in Industry II (held in Banff,Canada, June 6–11)

Starnes, J.H.; Haftka, R.T. 1979: Preliminary design of com-posite wings for buckling, strength and displacement con-straints. J. Aircraft 16, 564–570

Yang, R.J.; Tseng, L.; Nagy, L.; Cheng, J. 1994: Feasibilitystudy of crash optimization. ASME , 69, 549–556
