Statistical Meta-Modeling for Complex System Simulations

Preview:

Citation preview

• Statistical meta modeling of computer experiments• Kriging as the main tool: two examples of computer experiments

- two levels of accuracy- with qualitative/quantitative factors

• Alternatives to kriging (to avoid numerical instability)• Some novel design issues

C. F. Jeff WuSchool of Industrial and Systems Engineering

Georgia Institute of Technology

Statistical Meta-Modeling for Complex System Simulations:

Kriging, Alternatives and Design

Why computer experiments?

Study dangerous or infeasible physical experiments, such as ammunition detonation.

1

Some examples

2

Statistical Meta-Modeling of Computer Experiments

Robustness, optimization

Surrogate model(Kriging)

Noise simulations, error propagation

More FEA runs

Computer modeling (finite-element simulation)

Physical experimentor observations

3

Kriging as an interpolator

• Kriging is an interpolation method:• Originated in geostatistics; proposed for computer experiments by Sacks, Welch et al.

.,)(ˆ pii Rxyxy ∈=

4

Kriging Modeling

• Model assumption: Gaussian process.

– mean, linear model where

– correlation function, e.g., Gaussian Correlation Function,

– variance.

5

Best Linear Unbiased Predictor

• Best Linear Unbiased Predictor (BLUP):

– is the generalized least squares estimation, =

– .– = vector of correlation between

prediction points and observed points.

– interpolating property.,)(ˆ ii yxy =

6

Kriging Modeling: EstimationKriging Modeling: Estimation

M i Lik lih d E i i• Maximum Likelihood Estimation:

))}.log(det()ˆlog({min 2

0Rn

./)ˆ()ˆ(ˆ

))}g( ()g({1'2

0

nFyRFy

where

• Numerical Stability: R is ill-conditioned, especially for large nd/ k ( i t di i ) (P d W 2010b)and/or k (=input dimension) (Peng and Wu, 2010b).

• Computational Complexity: matrix inversion O(n3); manyoptimization iterations are needed; in each iterationoptimization iterations are needed; in each iterationmatrix inversion is computed.

7

Examples of Kriging Models

• Computer experiments with two levels of accuracy

• Computer experiments with quantitative and qualitative factors

• Simulation codes with calibration parameters

8

Example: Designing Cellular Heat Exchangers for an Electronic Cooling Application

9

Heat Transfer AnalysisHE: Detailed Computer Simulation – Finite Element

Analysis (FEA) Method

• Using the computational fluid dynamic solver FLUENT

• Problem domain is divided into thousands or millions of elements.

• Each run requires hours to daysto complete.

FLUENT

10

Heat Transfer AnalysisLE: Approximate Computer Simulation —Finite

Difference Method

• The finite difference techniqueis a numerical technique forsolving 2- or 3-D steady state heat transfer problems.

• Temperature distribution approximated via numerical solution of 3D heat transfer equations using forward or central difference methods.

• Each run takes minutes to complete.

• Less accurate than FEA.

Finite Difference

11

High-accuracy and Low-accuracy Experiments

• Paradigm shift: single experiment→ multiple experiments withdifferent

levels of accuracy.

• A generic pair:high-accuracy experiment(HE) andlow-accuracyexperiment (LE).

• HE is moreaccuratebut moreexpensivethan LE.

• Examples:

– physical experiment vs. computer simulation

– detailed computer simulation vs. approximate computer simulation

• 1. Modeling:

How to model and analyze data from HE and LE?

2. Experimental design:

How to plan HE and LE?

12

Frequentist Approach in Qian et al. (2006, ASME)Related to the autoregressive model in Kennedy and O’Hagan (2000)

• x = (x1, . . . ,xk): design variables.Dl andDh: sets of design points (xi) for LE and HE withDh ⊂ Dl .yh andyl : outputs from HE and LE.

• Base Surrogate Model:yl = βl0 +∑ j βl j xi + εl (xi), xi ∈ Dl ,εl ∼ GP(0,σ2

l ,φl ).

• Adjustment Model: yh(xi) = ρ(xi)yl (xi)+δ(xi), xi ∈ Dh,

– scale adjustment:ρ(xi) = ρ0 +∑ j ρ jxi j ;

– location adjustment: δ ∼ GP(δ0,σ2δ,φδ).

• Fitting:

Base surrogate model:yl

Adjustment model:ρ andδ

=⇒ Final surrogate model:yh = ρyl + δ.

• Desirable property:yh interpolatesyh(xi),xi ∈ Dh if HE is deterministic.

13

Bayesian Approach in Qian and Wu (2008,

Technometrics)

• UseflexibleBayesian hierarchicalGaussianprocess model (BHGP).

• Provide moreflexible adjustment. Can accommodateparameteruncertainty and measurement error of HE.

• BHGP:

– Base Surrogate Model: yl = βl0 +∑ j βl j xi + εl (xi), xi ∈ Dl ,

εl ∼ GP(0,σ2l ,φl ).

– Flexible Adjustment Model:

yh(xi) = ρ(xi)yl (xi)+δ(xi)+ ε(xi), xi ∈ Dh,

scale adjustmentρ ∼ GP(ρ0,σ2ρ,φρ),

location adjustmentδ ∼ GP(δ0,σ2δ,φδ),

ε(x) ∼ N(0,σ2ε). ε = 0⇔ no measurement error.

14

Model Parameters and Priors for the BHGP Model

• Model parameters

– Mean parametersθ1 = (βl ,ρ0,δ0).

– Variance parametersθ2 = (σ2l ,σ

2ρ,σ2

δ,σ2ε).

– Correlationparametersθ3 = (φl ,φρ,φδ);

complexityincreaseswith the no. of input factors (∵dim = 3k).

• Priors

– Normal forθ1.

– Inverse-gamma forθ2.

– Gamma forθ3.

15

Computational Algorithm

1: Fitting Correlation Parameters θ3

Solve astochasticprogram

maxφρ,φδ

L2 = Ep(τ1,τ2) f (τ1,τ2)

by the Sample Average Approximation (SAA) algorithm.

2: Markov Chain Monte Carlo (MCMC) Sampling from p(θ1,θ2|yl ,yh,θ3)

• Rationale: Conditional distributions forθ1,θ2 are in closed form, but not for

θ3 givenθ1, θ2.

• It is not afully Bayesian procedure (to save computation forθ3).

16

Posterior density ofρ0, δ0, σ2ρ, σ2

δ

0.6 0.8 1.0 1.2 1.4

01

23

4

(a)

ρ0

De

nsi

ty

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0

0.5

1.0

1.5

2.0

2.5

(b)

σρ2

De

nsi

ty

−1 0 1 2 3 4

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(c)

δ0

De

nsi

ty

0 1 2 3 4 5

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

(d)

σδ2

De

nsi

ty

17

Location and Scale Change

• Symmetric location changeδ0 with center at 0.85.

• Scaleadjustmentρ0 has multi modes, which may be explained bymultiplelaws.

18

Calibration of Computer ModelsCalibration of Computer Models

C lib ti f fitti t d l t b d• Calibration = process of fitting a computer model to observed data by adjusting input parameters of the model (Kennedy and O’Hagan, 2001).

• Computer model: ,• Physical model:

= calibration parameters = model inadequacy = adjustment

),( xyc

,)(),( xxyp )(x – = calibration parameters, = model inadequacy, = adjustment term.

• Example 1-Nuclear Accident: Use data on reactor fire to calibrate a plume model of radioactive deposition Source term

)(x

calibrate a plume model of radioactive deposition. Source term and deposition velocity = calibration parameters.

• Example 2-Methane Combustion: a computer model of ignition d l f h b i C lib idelay of methane combustion. Calibration parameters = seven reaction rates.

19

adminNUS
Typewritten Text
adminNUS
Typewritten Text

GP with quanti/quali factors: Data Center Thermal Distribution

20

Configuration Variables for the Data Center Example

• Five quanti factors: rack temperature rise, rack power, diffuser angle, diffuser

flow rate, ceilingheight.

• Threequali factors: diffuser location, hot-air return-vent location, power

allocation.

21

GP Models with Quantitative and Qualitative Factors

(Qian, H. Wu, C. F. J. Wu 2008,Technometrics)

• Input factors:w = (x,z); I quantitative factors:x = (x1, . . . ,xI ); J qualitative

factors:z = (z1, . . . ,zJ); response value:y(w).

• Build asingleGP model for bothx andz. Borrow strengths from all the

observations:

y(w) = ∑m

βm fm(w)+ ε(w).

• How to specify fm? Use regression modeling involvingx andz.

• How to specifyε (especially its correlation structure)?

• Related work: Bayesian hierarchical GP models (Han et al., 2009).

• Corresponding designs:Sliced space-filling designs(Qian and Wu, 2009,

Biometrika).

22

Construction of Correlation Functions for ε(w)

• Consider one quali factorz1 with m1 levels 1, . . . ,m1.

Foru = 1, . . . ,m1, εu(x) = ε(x,u).

• Idea: envision a mean-zero multivariate process(ε1(x), . . . ,εm1(x)′ = Aη(x).

• A: anm1×m1 matrix withunit row vectors.

• Elements ofη(x): m1 independent processes with a common varianceσ2.

• For two input valuesw1 = (x1,z11) andw2 = (x2,z12),

cor(ε(w1),ε(w2)) = τz11,z12Kφ(x1,x2)

– Kφ(x1,x2): correlation betweenx1 andx2.

– T1 = AA t : anm1×m1 correlationmatrix for z1 (i.e.,positive definitematrix with unit diagonal elements).

23

Alternatives to Kriging

1. Tweaking of kriging:– covariance matrix tapering (Kaufman et al., 2008),– rank reduction (Cressie-Johannesson, 2008),– regularized krigng (Peng-Wu, 2010a)

2. Adding penalty to likelihood (Li-Sudjianto, 2005): another approach to regularization.

24

Alternatives to Kriging (cont )Alternatives to Kriging (cont.)

3. Completely different ideas:

- Radial basis interpolation functions (Buhmann 2004)- Radial basis interpolation functions (Buhmann, 2004)

- Regression-based inverse distance weighting (Joseph and Kang, 2009): a fast interpolator

O l b i h d (OBSM) (Ch W d W- Overcomplete basis surrogate method (OBSM) (Chen, Wang and Wu, 2010): an approximator, not interpolator

25

Inverse Distance Weighting (IDW)

• Inverse Distance Weighting (Shepard, 1968):

– Simple computation but poor prediction.

26

Regression-Based Inverse Distance Weighting (RIDW)

• Add regression part to IDW (Joseph and Kang, 2009):

– = mean part; can be linear, nonlinear, nonparametric.

– (faster convergence than IDW)

27

Comparisons Between RIDW and Kriging

28

Design of Experiments with HE and LE

• Key to efficiently allocate resources and acquire information from HE and LE.

• x = (x1, . . . ,xk): design variables in[0,1]k.

Dl : set of design points (xi) for LE. Dh: set of design points (xi) for HE.

• Three principles for constructingDl andDh:

Economy: The size ofDh is less than the size ofDl .

Nested relationship: Dh is a subset ofDl .

Space-filling: Points inDh andDl achieve uniformity in low dimensions.

• How to constructmultiple experimentswith respect tomultiplerequirements?

• New issue in design of experiments: traditional methods deal almost

exclusively with experiments with one level of accuracy.

29

Nested Space-Filling Designs

Proposed in Qian, Tang and Wu (2009),StatisticaSinica, and furtherdeveloped in

Qian, Ai and Wu (2009),Annals of Statistics, Qian (2009),Biometrika.

Ideas:

1. Use aspecialorthogonal array to construct an OA-based Latin hypercube

design forDl .

2. ChooseDh to be acarefullyselected subset ofDl with two-dimensional

balance.

Need to use simple Galois field theory.

30

Nested Orthogonal Array

Let A be anOA(N1,k,s1, t). SupposeA has asubarray withN2 rows, denoted by

A1, and there is a projectionφ that collapse thes1 levels ofA into s2 levels. Further

supposeA1 is anOA(N2,k,s2, t) if the s1 levels of its entries are collapsed tos2

levels according toφ. Then an array with this structure is called anested

orthogonal array.

31

Construction of Nested Space-Filling Design

• The nested orthogonal arrayA2 ⊂ A1 does not automatically generatenested

space-fillingdesigns if thes1 levels are arbitrarily labeled.

• In constructing OA-based Latin hypercubeD1 usingA1, thes1 levels ofA1,

represented by the polynomials of degreeu1−1 or lower, have to be first

labeled as 1, . . . ,s1.

• The subsetD2 of D1 corresponding toA2 may not have good space-filling

properties.

• Care should be taken inlabelingthe levels to ensure thatD2 achieves

stratificationin two dimensions.

• Thes1 levels ofA1 must be labeled in such a way that the group of levels that

are mapped to the same level should form a consecutive subset of 1, . . . ,s1.

32

An Example of Nested Space-Filling Design

• Five factorsx = (x1,x2,x3,x4,x5) taking values inthe unit hypercube[0,1]5.

• Use the orthogonal array in Table 1 to construct an OA-based Latin hypercube

D1 for x1-x5.

• Choose a subarrayD2 of D1 consisting of the 16 points inD1 corresponding to

runs 1-4, 9-12, 17-20, 25-28 of the array in Table 1.

33

The Underlying Nested Orthogonal Array

Run # x1 x2 x3 x4 x5 Run # x1 x2 x3 x4 x5

1 1 1 1 1 1 33 8 1 8 8 8

2 1 3 3 5 7 34 8 3 6 4 2

3 1 5 5 8 4 35 8 5 4 1 5

4 1 7 7 4 6 36 8 7 2 5 3

5 1 8 8 7 2 37 8 8 1 2 7

6 1 6 6 3 8 38 8 6 3 6 1

7 1 4 4 2 3 39 8 4 5 7 6

8 1 2 2 6 5 40 8 2 7 3 4

9 3 1 3 3 3 41 6 1 6 6 6

10 3 3 1 7 5 42 6 3 8 2 4

11 3 5 7 6 2 43 6 5 2 3 7

12 3 7 5 2 8 44 6 7 4 7 1

13 3 8 6 5 4 45 6 8 3 4 5

14 3 6 8 1 6 46 6 6 1 8 3

15 3 4 2 4 1 47 6 4 7 5 8

16 3 2 4 8 7 48 6 2 5 1 2

34

Run # x1 x2 x3 x4 x5 Run # x1 x2 x3 x4 x5

17 5 1 5 5 5 49 4 1 4 4 4

18 5 3 7 1 3 50 4 3 2 8 6

19 5 5 1 4 8 51 4 5 8 5 1

20 5 7 3 8 2 52 4 7 6 1 7

21 5 8 4 3 6 53 4 8 5 6 3

22 5 6 2 7 4 54 4 6 7 2 5

23 5 4 8 6 7 55 4 4 1 3 2

24 5 2 6 2 1 56 4 2 3 7 8

25 7 1 7 7 7 57 2 1 2 2 2

26 7 3 5 3 1 58 2 3 4 6 8

27 7 5 3 2 6 59 2 5 6 7 3

28 7 7 1 6 4 60 2 7 8 3 5

29 7 8 2 1 8 61 2 8 7 8 1

30 7 6 4 5 2 62 2 6 5 4 7

31 7 4 6 8 5 63 2 4 3 1 4

32 7 2 8 4 3 64 2 2 1 5 6

Table 1. An OA(64,5,8,2) that contains an OA(16,5,4,2), i.e., the submatrix consisting of runs1-4, 9-12, 17-20, 25-28 becomes an OA(16,5,4,2) after some suitable level collapsing

35

2-D Projection of D1

x1

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

x2

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

x3

36

2-D Projection of D2

x1

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

x2

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

x3

37

iConclusions

• Computer simulations have become popular in studying complex systems.

• Statistics-based meta (surrogate, approximate) models are useful companions to simulations.

• A rich class of problems: modeling, estimation, prediction, calibration, computations, design.

• Kriging is popular for building meta models but can have problems with numerical instability.

• Effective alternatives to kriging have emerged in recent work. Cross-disciplinary in nature.

38

Step 1: Fitting Correlation Parametersθ3

• Obtainposterior modeθ3 = (φl , φρ, φδ) by solving

(P) : maxφl ,φρ,φδ

p(θ3|yh,yl ).

• (P) is equivalent to two separableproblems:

– (P1) : a non-linear program forφl ,

– (P2) : a problem forφρ andφδ. Its objective function involves an integral.

(P2) ⇔ astochastic program (P′2) : max

φρ,φδL2 = Ep(τ1,τ2) f (τ1,τ2).

• SolveP′2 by theSample Average Approximation (SAA) algorithm :

1. Generate random samples(τs1,τ

s2) from p(τ1,τ2),s= 1,· · · ,S.

2. Solve the approximate problem:

(φρ, φδ) = argmaxφρ,φδ

[L2 =1S

S

∑s=1

f (τs1,τ

s2)].

39

Step 2: Markov Chain Monte Carlo (MCMC)

Sampling from p(θ1,θ2|yl ,yh,θ3)

• Somefull conditional distributions for θ1,θ2 are not regular:

– p(βl |yl ,yh,θ3,βl ) ∼ Normal,

– p(ρ0|yl ,yh,θ3,ρ0) ∼ Normal,

– p(δ0|yl ,yh,θ3,δ0) ∼ Normal,

– p(σ2l |yl ,yh,θ3,σ2

l ) ∼ IG,

– p(σ2ρ|yl ,yh,θ3,σ2

ρ) ∼ IG,

– p(τ1,τ2|yl ,yh,τ1,τ2) ∝ 1

ταδ+ 3

21

αε+12

exp{−1τ1

(γδσ2

ρ+

(δ0−uδ)2

2vδσ2ρ

)− γετ2σ2

ρ} 1

|M|12

·exp{−(yh−ρ0yl1 −δ01n1)

tM−1(yh−ρ0yl1 −δ01n1)

2σ2ρ

}

– τ1 = σ2δ/σ2

ρ,τ2 = σ2δ/σ2

ε

• Use theMetropolis-within-Gibbs algorithm.

40

The General Case

• ConsiderJ qualitative factorsz1, . . . ,zJ with zj havingmj levels 1, . . . ,mj .

• For two input valuesw1 = (x1,z11) andw2 = (x2,z12),

cor(ε(w1),ε(w2)) =

[J

∏j=1

τ j,zj1,zj2

]exp

{−

I

∑i=1

φi(xi1 −xi2)2

}.

• T j : anmj ×mj correlationmatrix for zj (i.e., apositivedefinite matrix with

unit diagonal elements).

• This correlation function has a product form.

• Assume the elements ofT j ≥ 0 for deterministic computer experiments.

41

Some Restrictive Forms of Tj

Consider two input valuesw1 = (x1,z1) andw2 = (x2,z2) with responsesy(w1)

andy(w2). Recallthatτ j,zj1,zj2 is the correlation betweenzj1 andzj2.

1. Isotropy correlation function:τ j,zj1,zj2 = exp{−θ j I [zj1 6= zj2]}.

Forw1 andw2,

cor(ε(w1),ε(w2)) = exp

{−

I

∑i=1

φi(xi1 −xi2)2−

J

∑j=1

θ j I [zj1 6= zj2]

}

Euclideandistance forxi ; 0-1 distance forzj .

2. Multiplicative correlation function:

τr,s = exp{−θr,s} = exp{−(θr +θs)I [r 6= s]}. (1)

3. Group correlation functions.

4. Correlation functions for ordinal qualitative factors.

42

Estimation

• Modelparameters:

– mean parametersβ = (β1, . . . ,βp).

– variance parameterσ2.

– correlation parametersphi = (φ1, . . . ,φI )t , andT = {T1, . . . ,TJ}.

• The estimation iterates between

Regression fitting: Given ˆphi andT, estimateβ andσ2.

Simple!

Correlation fitting: Givenβ andσ2, let U = (u1, . . . ,un) with

ui = [yi − βtf(wi)]/σ and then fit a GP with mean zero and variance one to

the transformed dataU.

43

Recommended