Page 1

Reduction of Model Complexity and the Treatment of Discrete Inputs in Computer Model Emulation

Curtis B. Storlie
Los Alamos National Laboratory
E-mail: [email protected]

Page 2

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 3

Motivating Example

◮ Computational Model from Yucca Mountain Certification

◮ 150 input variables (several of which are discrete in nature), dozens of time-dependent responses

◮ Response Variable (for this illustration)
    ◮ ESIC239C.10K: Cumulative release of Ic (i.e., Glass) Colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.

◮ The model is very expensive to run; we have a Latin Hypercube sample of size n = 300 where the model is evaluated.

◮ How to perform sensitivity/uncertainty analysis?

Page 4

Computer Model Emulation

◮ An emulator is a simpler model that mimics a larger physical model. Evaluations of an emulator are much faster.

◮ Nonparametric Regression
    ◮ We have n observations from the model

        y_i = f(x_i) + \varepsilon_i, \quad i = 1, \ldots, n

      where x_i = (x_{i1}, ..., x_{ip}) and f is the physical model.
    ◮ Usually some weak assumptions are made about f (e.g., f belongs to a "smooth" class of functions).

◮ Methods of Estimation: Orthogonal Series/Wavelets, Kernel Smoothing/Local Regression, Penalization Methods (Smoothing Splines, Gaussian Processes), Machine Learning/Algorithmic Approaches.

◮ With a limited number of model evaluations and a high number of inputs, we need to reduce emulator complexity.
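
To make the emulation idea concrete, here is a minimal sketch of fitting a Gaussian process emulator with scikit-learn. The "simulator", sample size, and design below are made up for illustration; they stand in for an expensive computer model and its (ideally space-filling) input design.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rng = np.random.default_rng(0)

def simulator(x):
    # Stand-in for an expensive computer model with p = 3 inputs (x[:, 2] is inert).
    return np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2

n, p = 50, 3
X = rng.uniform(size=(n, p))      # crude random design; a Latin Hypercube would be better
y = simulator(X)                  # the n (expensive) model evaluations

# Fit the emulator: a GP with an anisotropic squared-exponential covariance.
gp = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=[0.2] * p),
    normalize_y=True,
)
gp.fit(X, y)

# Cheap predictions (with uncertainty) anywhere in the input space.
X_new = rng.uniform(size=(1000, p))
y_hat, y_sd = gp.predict(X_new, return_std=True)
print("max abs emulator error:", np.max(np.abs(y_hat - simulator(X_new))))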

Page 5

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 6

Variable Selection in Regression Models

◮ Focus for now on variable selection for the linear model

    y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon

◮ Stepwise/best subsets type model fitting
    ◮ Can produce unstable estimates

◮ More recently:
    ◮ Continuous shrinkage using an L1 penalty (LASSO), Tibshirani 1996
    ◮ Stochastic Search Variable Selection (SSVS), George & McCulloch 1993, 1997

Page 7

Shrinkage, aka Penalized Regression

Ridge Regression: Find the β_j's minimizing

    \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

Note: All the x's must first be standardized.

◮ Improved MSE Estimation via bias-variance trade-off.

◮ Ridge Regression is equivalent to minimizing

    \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 < t^2

for some t(λ).
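
A minimal numerical sketch of ridge regression via its closed-form solution, on simulated data (made up here purely for illustration); the x's are standardized first, as noted above, and the n·λ scaling matches the 1/n loss.

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.r_[2.0, -1.5, np.zeros(p - 2)]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Standardize the columns of X and center y so no explicit intercept is needed.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

lam = 1.0
# Minimizing (1/n)||y - X b||^2 + lam * ||b||_2^2 gives b = (X'X + n*lam*I)^{-1} X'y.
beta_hat = np.linalg.solve(Xs.T @ Xs + n * lam * np.eye(p), Xs.T @ yc)
print(np.round(beta_hat, 3))   # every coefficient is shrunk toward zero, none exactly zero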

Page 8

Shrinkage, aka Penalized Regression

LASSO: Find the β_j's minimizing

    \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \right)^2 + \lambda \sum_{j=1}^{p} \left( \beta_j^2 \right)^{1/2}

This is equivalent to minimizing

    \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| < t

for some t(λ).
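
For comparison, a hedged sketch of the LASSO on the same kind of simulated data using scikit-learn's coordinate-descent solver (scikit-learn minimizes (1/(2n))||y − Xb||² + α||b||₁, i.e., the same criterion up to its own scaling of λ); note that some coefficients come out exactly zero, which is the variable-selection effect.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.r_[2.0, -1.5, np.zeros(p - 2)]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Standardize X, as required before applying an L1 penalty.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

lasso = Lasso(alpha=0.1).fit(Xs, y)      # alpha plays the role of lambda
print(np.round(lasso.coef_, 3))          # several coefficients are exactly 0
print("selected inputs:", np.nonzero(lasso.coef_)[0])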

Page 9

Geometry of Ridge Regression and the LASSO

Page 10

Stochastic Search Variable Selection (SSVS)

◮ Linear Regression: y = β_0 + ∑_{j=1}^{p} β_j x_j + ε, where:
    ◮ β_j = γ_j α_j
    ◮ γ_j ∼ Bern(π_j)
    ◮ α_j ∼ N(0, τ_j²)

◮ (γ_1, ..., γ_p)′ is the "model", and is treated as an unknown random variable.

◮ The prior probability that x_j is included in the model is P(β_j ≠ 0) = π_j.

◮ Inference is based on the posterior probability that x_j is included in the model, P(β_j ≠ 0 | y).

◮ It is common to determine the "best" model as the one that includes the variables that have P(β_j ≠ 0 | y) > 0.5.
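
To make this concrete, the sketch below computes P(β_j ≠ 0 | y) under exactly this prior (β_j = γ_j α_j, γ_j ∼ Bern(π), α_j ∼ N(0, τ²)), but with σ² treated as known and, since p is small here, by enumerating all 2^p models exactly rather than running the Gibbs-sampling stochastic search that SSVS uses in practice; the data and hyperparameter values are made up for illustration.

import itertools
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n, p, sigma, tau, pi = 60, 6, 1.0, 2.0, 0.5
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=sigma, size=n)

Xc = X - X.mean(axis=0)     # center to absorb the intercept
yc = y - y.mean()

log_post = {}
for gamma in itertools.product([0, 1], repeat=p):
    idx = [j for j in range(p) if gamma[j] == 1]
    # Marginalizing over alpha: y | gamma ~ N(0, sigma^2 I + tau^2 X_A X_A').
    cov = sigma ** 2 * np.eye(n)
    if idx:
        cov += tau ** 2 * Xc[:, idx] @ Xc[:, idx].T
    loglik = multivariate_normal.logpdf(yc, mean=np.zeros(n), cov=cov)
    logprior = sum(np.log(pi) if g else np.log(1 - pi) for g in gamma)
    log_post[gamma] = loglik + logprior

# Normalize over models and compute posterior inclusion probabilities P(beta_j != 0 | y).
lp = np.array(list(log_post.values()))
w = np.exp(lp - lp.max())
w /= w.sum()
gammas = np.array(list(log_post.keys()))
incl = (w[:, None] * gammas).sum(axis=0)
print(np.round(incl, 3))    # near 1 for the two active inputs, small for the rest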

Page 11

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 12

Functional ANOVA Decomposition

◮ Any function f(x) can be decomposed into main effects and interactions,

    f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + \cdots,

  where μ_0 is the mean, the f_j are the main effects, the f_{j,k} are the two-way interactions, and (· · ·) are the higher order interactions.

◮ The functional components (f_j, f_{j,k}, · · ·) are an orthogonal decomposition of the space, which implies the constraints

    \int_0^1 f_j(x_j) \, dx_j = 0 \;\; \text{for all } j, \qquad \int_0^1 f_{j,k}(x_j, x_k) \, dx_j = 0 \;\; \text{for all } j, k,

  and similar relations for higher order interactions. This ensures identifiability of the functional components.
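
These components have a concrete interpretation: with independent Unif(0, 1) inputs, μ_0 = E[f(X)] and f_j(x_j) = E[f(X) | X_j = x_j] − μ_0. The sketch below estimates a main effect by Monte Carlo for a made-up test function and checks the zero-integral constraint numerically.

import numpy as np

rng = np.random.default_rng(4)

def f(x):
    # Made-up test function of 3 inputs: an interaction plus one pure main effect.
    return np.sin(2 * np.pi * x[:, 0]) * x[:, 1] + x[:, 2] ** 2

p = 3
X = rng.uniform(size=(200_000, p))
mu0 = f(X).mean()    # overall mean mu_0 = E[f(X)]

def main_effect(j, xj, N=50_000):
    """Estimate f_j(xj) = E[f(X) | X_j = xj] - mu_0 by Monte Carlo."""
    Z = rng.uniform(size=(N, p))
    Z[:, j] = xj
    return f(Z).mean() - mu0

grid = np.linspace(0, 1, 21)
f3 = np.array([main_effect(2, v) for v in grid])    # main effect of the third input
print("integral of f_3 over [0,1] (should be ~0):", np.trapz(f3, grid))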

Page 13

Functional ANOVA Decomposition

◮ A convenient way to treat the high order interactions is to let

    f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + f_R(x)

  where f_R is a high-order interaction (catch-all) remainder.

◮ In general we can say the function f(x) lies in some space F,

    \mathcal{F} = \{1\} \oplus \bigoplus_{j=1}^{q} \mathcal{F}_j     (1)

  where {1}, F_1, ..., F_q is an orthogonal decomposition of the space. For the example above, we would have f_1 ∈ F_1, ..., f_p ∈ F_p, f_{1,2} ∈ F_{p+1}, ....

◮ Continuity assumptions on f, such as the number of continuous derivatives, can be built in through the choice of the F_j.

Page 14

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 15

The General Smoothing Spline

◮ The L-spline estimate f̂ is given by the minimizer of

    \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \sum_{j=1}^{q} \| P^j f \|_{\mathcal{F}}^2,

  over f ∈ F. P^j f is the orthogonal projection of f onto F_j, j = 1, ..., q.

◮ For the additive model with each component function in S² = {g : g, g′ absolutely continuous and g′′ ∈ L₂[0, 1]}, f̂ is given by the minimizer of

    \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \sum_{j=1}^{p} \left\{ [f_j(1) - f_j(0)]^2 + \int_0^1 [f_j''(x_j)]^2 \, dx_j \right\}

◮ The solution can be obtained conveniently with tools from reproducing kernel Hilbert space theory (see Wahba 1990).
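
A hedged sketch of that RKHS machinery in the single-input case: assuming the textbook construction of the second-order Sobolev reproducing kernel from scaled Bernoulli polynomials (the slide does not specify a kernel, so this particular choice is my assumption), the representer theorem gives f̂(x) = d_0 + d_1 k_1(x) + Σ_i c_i R_1(x, x_i), and the coefficients solve one small linear system (Wahba 1990).

import numpy as np

def k1(x): return x - 0.5
def k2(x): return 0.5 * (k1(x) ** 2 - 1.0 / 12.0)
def k4(x): return (k1(x) ** 4 - 0.5 * k1(x) ** 2 + 7.0 / 240.0) / 24.0

def R1(x, y):
    # Reproducing kernel of the penalized (smooth) subspace on [0, 1].
    return k2(x) * k2(y) - k4(np.abs(x - y))

rng = np.random.default_rng(5)
n, lam = 50, 1e-4
x = np.sort(rng.uniform(size=n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

K = R1(x[:, None], x[None, :])               # n x n Gram matrix
T = np.column_stack([np.ones(n), k1(x)])     # unpenalized (null-space) basis: {1, k1}

# Representer theorem: the minimizer solves
#   (K + n*lam*I) c + T d = y   and   T' c = 0.
A = np.block([[K + n * lam * np.eye(n), T],
              [T.T, np.zeros((2, 2))]])
sol = np.linalg.solve(A, np.r_[y, 0.0, 0.0])
c, d = sol[:n], sol[n:]

xg = np.linspace(0, 1, 201)
f_hat = np.column_stack([np.ones_like(xg), k1(xg)]) @ d + R1(xg[:, None], x[None, :]) @ c
print("max error vs. the true curve:", np.max(np.abs(f_hat - np.sin(2 * np.pi * xg))))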

Page 16

Adaptive COmponent Selection and Smoothing Operator

◮ LASSO is to Ridge Regression as ACOSSO is to the Smoothing Spline. Find the minimizer over f ∈ F of

    \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \sum_{j=1}^{q} w_j \| P^j f \|_{\mathcal{F}}.

◮ For the additive model, the minimization becomes

    \frac{1}{n} \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \sum_{j=1}^{p} w_j \left\{ [f_j(1) - f_j(0)]^2 + \int_0^1 [f_j''(x_j)]^2 \, dx_j \right\}^{1/2}

◮ This estimator sets some of the functional components (the f_j's) equal to exactly zero (i.e., x_j is removed from the model).

◮ We want w_j to allow prominent functional components to enjoy the benefit of a smaller penalty. Use a weight based on the L₂ norm of an initial estimate f̃:

    w_j = \| \tilde{f}_j \|_{L_2}^{-\gamma} = \left( \int_0^1 \big( \tilde{f}_j(x_j) \big)^2 \, dx_j \right)^{-\gamma/2}
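
A small sketch of computing those adaptive weights, assuming you already have initial component estimates f̃_j (the callables below are made-up stand-ins for, say, the components of a pilot smoothing-spline fit); each L₂ norm is obtained by numerical integration over [0, 1].

import numpy as np

# Stand-ins for initial component estimates f~_j from a pilot additive fit.
f_tilde = [
    lambda x: np.sin(2 * np.pi * x),     # a prominent component
    lambda x: 0.05 * (x - 0.5),          # a weak component
    lambda x: np.zeros_like(x),          # an (essentially) null component
]

gamma = 2.0
grid = np.linspace(0, 1, 1001)

def acosso_weight(fj, gamma=gamma, eps=1e-8):
    """w_j = ||f~_j||_{L2}^(-gamma), with a small eps guarding the null-component case."""
    l2 = np.sqrt(np.trapz(fj(grid) ** 2, grid))
    return (l2 + eps) ** (-gamma)

w = [acosso_weight(fj) for fj in f_tilde]
print(w)    # small weight for the strong component, very large for the weak/null ones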

Page 17

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 18

Bayesian Smoothing Spline ANOVA (BSS-ANOVA)

◮ Assume

    f(x) = \mu_0 + \sum_{j=1}^{p} f_j(x_j) + \sum_{j<k} f_{j,k}(x_j, x_k) + f_R(x)

◮ Model the mean as μ_0 ∼ N(0, τ_0²).

◮ Model the f_j ∼ GP(0, τ_j² K₁), f_{j,k} ∼ GP(0, τ_{j,k}² K₂), and f_R ∼ GP(0, τ_R² K_R).

◮ The covariance functions K₁, K₂, K_R are such that the functions μ_0, f_j, f_{j,k}, f_R obey the Functional ANOVA constraints almost surely. They can also be chosen for a desired level of continuity.

◮ Lastly, apply SSVS to the variance parameters τ_j², τ_{j,k}², j, k = 1, 2, ..., p, and τ_R to accomplish variable selection.
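
As a hedged illustration of a K₁ with these properties, the sketch below uses the Bernoulli-polynomial covariance K₁(x, x′) = B₁(x)B₁(x′) + B₂(x)B₂(x′) − B₄(|x − x′|)/24, a standard main-effect kernel in the smoothing-spline ANOVA literature (the exact kernel used in BSS-ANOVA may differ); every draw from GP(0, K₁) integrates to (numerically) zero over [0, 1], which is the main-effect ANOVA constraint.

import numpy as np

def B1(x): return x - 0.5
def B2(x): return x ** 2 - x + 1.0 / 6.0
def B4(x): return x ** 4 - 2 * x ** 3 + x ** 2 - 1.0 / 30.0

def K1(x, y):
    # Main-effect covariance built from Bernoulli polynomials.
    return B1(x) * B1(y) + B2(x) * B2(y) - B4(np.abs(x - y)) / 24.0

rng = np.random.default_rng(6)
grid = np.linspace(0, 1, 401)
K = K1(grid[:, None], grid[None, :])

# Draw a few main-effect functions f_j ~ GP(0, tau_j^2 K1) on the grid.
tau = 1.0
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(grid)))   # small jitter for numerical stability
draws = tau * (L @ rng.normal(size=(len(grid), 3)))

for k in range(3):
    print("integral of draw", k, "over [0,1]:", np.trapz(draws[:, k], grid))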

Page 19

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 20

Treating Discrete Inputs

◮ Discrete inputs can be thought of as having a graphical structure.

◮ Two examples where the j-th predictor x_j ∈ {0, 1, 2, 3, 4}:

Page 21

Treating Discrete Inputs

◮ Use the Functional ANOVA framework to allow for these discrete predictors. The restrictions implied on the discrete-input main effect components are ∑_c f_j(c) = 0, and similarly for interactions.

◮ The norm (penalty) used is f′Lf, where L = D − A is the graph Laplacian matrix.

◮ It can be shown that

    f'Lf = \sum_{l<m} A_{l,m} [f(l) - f(m)]^2,

  i.e., the penalty is the sum (weighted by the adjacency) of all of the squared distances between neighboring nodes.

◮ There is also a corresponding covariance function K₁ which enforces the ANOVA constraints for f_j in the BSS-ANOVA framework as well. Something like a harmonic expansion over the graph domain with variance decreasing with frequency.
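
A quick numerical check of that identity on a small made-up graph over the five levels {0, ..., 4}, together with one corresponding covariance obtained, in the spirit of Smola & Kondor (2003), by taking the pseudoinverse of L; its draws automatically satisfy the sum-to-zero constraint.

import numpy as np

# Adjacency matrix for a made-up graph on 5 levels: the path 0-1-2-3-4 plus an edge 0-4.
A = np.zeros((5, 5))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
for l, m in edges:
    A[l, m] = A[m, l] = 1.0

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian

f = np.array([0.3, -1.0, 0.7, 2.0, -0.5])   # an arbitrary function on the nodes

quad = f @ L @ f
pairwise = sum(A[l, m] * (f[l] - f[m]) ** 2 for l, m in edges)
print(quad, pairwise)        # identical: f'Lf = sum over edges of A_lm (f_l - f_m)^2

# One graph covariance in the Smola & Kondor spirit: the pseudoinverse of L.
# Its null space is the constant vector, so GP(0, K) draws satisfy sum_c f_j(c) = 0.
K = np.linalg.pinv(L)
print(np.round(K.sum(axis=1), 10))   # each row sums to ~0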

Page 22

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 23

Simulation Study

◮ x_j ∼ iid Unif{1, 2, 3, 4, 5, 6} for j = 1, ..., 4, and x_j ∼ iid Unif(0, 1) for j = 5, ..., 15.

◮ x_1, ..., x_4 are unordered qualitative factors.

◮ The test function used here is a function of only 3 inputs (2 of which are qualitative). So 12 of the 15 inputs are completely uninformative.

◮ Collect a sample of size n = 100 from y_i = f(x_i) + ε_i, where ε_i ∼ iid N(0, 1), giving SNR ≈ 100:1 for the 2 test cases.

Page 24

Test function

Page 25

Simulation Results

Estimator     Pred MSE       Pred 99%        CDF ISE
ACOSSO        0.28 (0.03)    3.98 (0.60)     0.006 (0.000)
BSS-ANOVA     0.18 (0.01)    3.09 (0.46)     0.006 (0.000)
GP            1.09 (0.06)    18.26 (1.73)    0.010 (0.001)

Pred MSE: Average over the 100 realizations of the Mean Squared Error for prediction of new observations.

Pred 99%: Average over the 100 realizations of the 99th percentile of the squared error for prediction of a new observation.

CDF ISE: Average over the 100 realizations of the integrated squared error from the true CDF curve to the CDF estimated via the emulator.

Page 26

Outline

◮ Reduction of Emulator Complexity

◮ Variable Selection

◮ Functional ANOVA

◮ Emulation using Functional ANOVA and Variable Selection
    ◮ Adaptive Component Selection and Smoothing Operator

◮ Bayesian Smoothing Spline ANOVA Models

◮ Discrete Inputs

◮ Simulation Study

◮ Example from the Yucca Mountain Analysis

◮ Conclusions and Further Work

Page 27

Yucca Mountain Certification

◮ Response Variable (for this illustration)
    ◮ ESIC239C.10K: Cumulative release of Ic (i.e., Glass) Colloid of 239Pu (Plutonium-239) out of the Engineered Barrier System into the Unsaturated Zone at 10,000 years.

◮ Predictor Variables (that appear in plots below)
    ◮ TH.INFIL: Categorical variable describing different scenarios for infiltration and thermal conductivity in the region surrounding the drifts; high relative humidity (∼ 85%).
    ◮ CPUCOLWF: Concentration of irreversibly attached plutonium on glass/waste form colloids when colloids are stable (mol/L).

Page 28

Yucca Mountain Certification

Page 29

Yucca Mountain Certification

Page 30

Yucca Mountain Certification

◮ Below is a Sensitivity Analysis for ESIC239C.10K

◮ Let T_j denote the total variance index for the j-th input (i.e., T_j is the proportion of the total variance of the output that can be attributed to the j-th input and its interactions).

Meta-model: ACOSSO
Model Summary: R² = 0.960, model df = 92

Input        T̂_j      95% CI for T_j      p-value
CPUCOLWF     0.565    (0.473, 0.621)      < 0.01
TH.INFIL     0.424    (0.360, 0.518)      < 0.01
RHMUNO65     0.063    (0.052, 0.126)      < 0.01
FHHISSCS     0.053    (0.041, 0.106)      < 0.01
SEEPUNC      0.020    (0.000, 0.040)      0.10
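
Given a fitted emulator that is cheap to evaluate, an index like T_j can be estimated by brute-force Monte Carlo. Below is a hedged sketch using Jansen's estimator of the total-effect index, T_j = E[Var(Y | X_{−j})] / Var(Y), with a made-up function standing in for the fitted ACOSSO emulator and independent Unif(0, 1) inputs assumed for illustration.

import numpy as np

rng = np.random.default_rng(7)

def emulator(x):
    # Stand-in for a fitted emulator f_hat(x); only the first two inputs matter here.
    return np.sin(2 * np.pi * x[:, 0]) * (1.0 + x[:, 1]) + 0.5 * x[:, 1]

p, N = 5, 100_000
A = rng.uniform(size=(N, p))
B = rng.uniform(size=(N, p))
fA = emulator(A)
varY = fA.var()

T = np.zeros(p)
for j in range(p):
    ABj = A.copy()
    ABj[:, j] = B[:, j]      # resample only input j
    # Jansen's estimator: E[Var(Y | X_{-j})] ~= mean((f(A) - f(AB_j))^2) / 2.
    T[j] = 0.5 * np.mean((fA - emulator(ABj)) ** 2) / varY

print(np.round(T, 3))        # near zero for the three inert inputs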

Page 31

Conclusions and Further Work

◮ Functional ANOVA construction and variable selection can help to increase efficiency in function estimation.

◮ A general treatment of graphical inputs easily allows for ordinal and qualitative inputs as special cases.

◮ When using the Functional ANOVA construction, the main effect and interaction functions are immediately available (i.e., no need to numerically integrate).

◮ The Functional ANOVA construction also lends itself well to allowing for "nonstationarity" in function estimation.

◮ The overall function (which is potentially quite complex) is composed of fairly simple functions (i.e., main effects or 2-way interactions), so the extension is much easier than for a general function of p inputs.

Page 32

References

1. Tibshirani, R. (1996), 'Regression shrinkage and selection via the lasso', Journal of the Royal Statistical Society: Series B.

2. George, E. & McCulloch, R. (1993), 'Variable selection via Gibbs sampling', Journal of the American Statistical Association.

3. Wahba, G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics.

4. Storlie, C., Bondell, H., Reich, B. & Zhang, H. (2009a), 'Surface estimation, variable selection, and the nonparametric oracle property', Statistica Sinica.

5. Reich, B., Storlie, C. & Bondell, H.D. (2009), 'Variable Selection in Bayesian Smoothing Spline ANOVA Models: Application to Deterministic Computer Codes', Technometrics.

6. Smola, A. & Kondor, R. (2003), 'Kernels and Regularization on Graphs', in Learning Theory and Kernel Machines.
