
INFORMS Journal on Computing
Articles in Advance, pp. 1–14
ISSN 1091-9856 (print), ISSN 1526-5528 (online)
http://dx.doi.org/10.1287/ijoc.2013.0548

© 2013 INFORMS

Quantifying Input Uncertainty via Simulation Confidence Intervals

Russell R. Barton
Department of Supply Chain and Information Systems, Smeal College of Business, Pennsylvania State University, University Park, Pennsylvania 16802, [email protected]

Barry L. Nelson, Wei Xie
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60202 {[email protected], [email protected]}

We consider the problem of deriving confidence intervals for the mean response of a system that is represented by a stochastic simulation whose parametric input models have been estimated from "real-world" data. As opposed to standard simulation confidence intervals, we provide confidence intervals that account for uncertainty about the input model parameters; our method is appropriate when enough simulation effort can be expended to make simulation-estimation error relatively small. To achieve this we introduce metamodel-assisted bootstrapping that propagates input variability through to the simulation response via an equation-based model rather than by simulating. We develop a metamodel strategy and associated experiment design method that avoid the need for a low-order approximation to the response and that minimize the impact of intrinsic (simulation) error on confidence-level accuracy. Asymptotic analysis and empirical tests over a wide range of simulation effort show that confidence intervals obtained via metamodel-assisted bootstrapping achieve the desired coverage.

Key words: input modeling; bootstrapping; confidence intervals; metamodeling; stochastic kriging
History: Accepted by Marvin Nakayama, Area Editor for Simulation; received January 2011; revised June 2011, January 2012, October 2012; accepted January 2013. Published online in Articles in Advance.

1. Introduction
One of the most valuable aspects of simulation is its ability to characterize the behavior of arbitrarily complex stochastic systems. However, effective characterization depends on controlling simulation-estimation error, because a stochastic simulation is in every sense a statistical experiment. Years of research on measuring and controlling simulation-estimation error have yielded robust methods that are adequate for many practical simulation problems; one might even be tempted to say that the estimation-error problem has been "solved." Unfortunately, there is a hidden, and often substantial, additional error in simulation experiments that is present even if best practices for modeling, experiment design, and output analysis are employed: input-uncertainty error. In fact, the impact of input-uncertainty error relative to simulation-estimation error can be exacerbated by best practices for controlling estimation error.

Input models are the driving stochastic processes in simulation experiments. They represent uncertainty at a basic level that resists more detailed modeling. Arrival processes in queueing simulations, demand processes in supply chain simulations, and patient occupancy times in hospital simulations are examples of inputs. Input models are often based on a finite sample of observed real-world data, and therefore are subject to error. All simulation software measures simulation-estimation error; no simulation software accounts for input-uncertainty error, and few practitioners are even aware of this problem. Yet, as we demonstrate next, input-uncertainty error can overwhelm simulation-estimation error, placing simulation users at risk of making critical and expensive decisions with unfounded confidence in (what appear to be) highly precise simulation assessments or optimizations.

1.1. An Illustration
The steady-state expected number of customers in an M/M/$\infty$ queue (Poisson arrival process with rate $\lambda$, exponentially distributed service times with mean $\tau$, and an infinite number of servers) is $\mu(\lambda, \tau) = \lambda\tau$. To understand how input uncertainty affects simulation, suppose we do not know this result, nor do we know $\lambda$ or $\tau$, but we want to estimate the mean number of customers in the system in steady state via simulation. Consider the following stylized experiment.

1. Start by observing $m$ real-world customer interarrival times $A_1, A_2, \ldots, A_m$ and estimate $\lambda$ by


$\hat\lambda = 1/\bar A$, where a bar indicates the sample average. Similarly, observe $m$ real-world service times $X_1, X_2, \ldots, X_m$ and estimate $\tau$ by $\hat\tau = \bar X$. In this illustration and throughout the paper we assume parametric input models but with unknown parameters, as is typical in input modeling. In this case the distributions (exponential) are correct, but the parameters are estimated.

2. Conditional on $\hat\lambda$ and $\hat\tau$, simulate $n$ independent replications of the queue, and on each one take a single observation of the number of customers in the system in steady state, $Y_1, Y_2, \ldots, Y_n$. This is a simplified version of what actually happens in simulation, where we record the number in the system for some period of time and average, not just take a single observation.

3. Finally, estimate the steady-state expected number in the queue $\mu(\lambda, \tau)$ by the sample mean $\bar Y$.

For this example the properties of $\bar Y$ can be derived:
$$\mathrm{E}[\bar Y] = \frac{m}{m-1}\,\lambda\tau, \qquad (1)$$
$$\mathrm{Var}[\bar Y] \approx \frac{\lambda\tau}{n} + \frac{2(\lambda\tau)^2}{m}. \qquad (2)$$

These results integrate over both input and simulation uncertainties. The impact of input uncertainty shows up in two ways: First, the estimator $\bar Y$ is biased, as shown in (1). The bias is a function of $m$, the amount of real-world data used to estimate the input models. The variance of the estimator displayed in (2) contains two terms. The first term represents the simulation-estimation error; it decreases as the number of replications $n$ increases. This is the only term that is captured by simulation-based confidence intervals (CIs), and it can be driven to 0 by increasing the number of replications. In fact, in many practical problems it is both feasible and reasonable to make simulation-estimation error negligible. The second term represents the input-uncertainty error. Notice that it may entirely dominate the first term depending upon the amount of real-world data $m$, and it is immune to more simulation effort.
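To make the two error sources tangible, the following Python sketch (not part of the original paper; it assumes illustrative true values $\lambda = 1$ and $\tau = 2$ and uses the fact that the steady-state number of customers in an M/M/$\infty$ system is Poisson with mean $\lambda\tau$) repeats the stylized experiment many times and compares the overall spread of $\bar Y$ with the simulation-only standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, tau = 1.0, 2.0      # assumed true parameters (unknown to the experimenter)
m, n = 50, 10_000        # real-world sample size and simulation replications

def stylized_experiment():
    # Step 1: estimate lambda and tau from m real-world observations.
    lam_hat = 1.0 / rng.exponential(1.0 / lam, m).mean()
    tau_hat = rng.exponential(tau, m).mean()
    # Step 2: conditional on the estimates, the steady-state number in an
    # M/M/infinity system is Poisson(lam_hat * tau_hat); draw n observations.
    y = rng.poisson(lam_hat * tau_hat, n)
    # Step 3: report the sample mean and its simulation-only standard error.
    return y.mean(), y.std(ddof=1) / np.sqrt(n)

results = np.array([stylized_experiment() for _ in range(1000)])
print("mean of Y-bar over repeated real-world samples:", results[:, 0].mean())
print("std of Y-bar (includes input uncertainty):     ", results[:, 0].std(ddof=1))
print("typical simulation-only standard error:        ", results[:, 1].mean())
```

With these settings the observed spread of $\bar Y$ is dominated by the input-uncertainty term $2(\lambda\tau)^2/m$ in (2), which additional replications cannot reduce.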

This is a stylized example, but the situation is actually much worse in practical problems. A realistic simulation may have tens of input models, some estimated from huge stockpiles of data and others from very little. The simulation output is often a highly nonlinear function of these inputs, so assessing input-uncertainty error with anything like the previous mathematical analysis is impossible. Most critically, project success, corporate profitability, and even human lives may depend on decisions that are informed by overly confident characterizations of uncertainty.

1.2. Quantifying Input Uncertainty
The contributions of this paper are to develop a framework to construct confidence intervals for the mean simulation response that account for input uncertainty in parametric input models, and to provide a specific implementation that makes the framework efficient and effective. Our approach is to propagate input uncertainty from the input models to the simulation output using bootstrap resampling of the real-world data and a metamodel that approximates the mean simulation response as a function of the input models. As described in §2, this approach overcomes shortcomings of other methods that have been proposed. The choice of metamodel form in conjunction with an experiment design strategy tailored to this application are also contributions. The result is a framework for quantifying input uncertainty that is rigorously justified, but also practically useful.

In this paper we only consider the case of parametric input models that describe independent and identically distributed (i.i.d.) inputs and input models that are mutually independent. We assume we know the correct distribution families, but not their parameters, and that we have real-world data from which to estimate the parameters. This is a step toward solving the problem in which the families of distributions themselves are also unknown. It is a very important step, however, because the existence of a "true, correct" distribution for real processes is always fiction; any distribution is an approximation, and to account for other possible parameters that could have been chosen for the approximation is significant.

The next section describes other approaches to the input-uncertainty problem; this is followed by a formal description of the problem and development of the metamodel-assisted bootstrapping concept in §3. Implementation is considered in §4. We report results from an empirical evaluation in §5, and conclude the paper in §6.

2. Background
Simulation input uncertainty has been addressed in the literature in essentially two ways: The first is to perform statistical output characterization conditional on the correctness of the input probability models and separately perform sensitivity, uncertainty, or robustness analysis of simulation output to changes in simulation-input distributions. Descriptions of this approach can be found in Kleijnen (1987, 1994) and Law and Kelton (2000). As noted by Schmeiser in his discussion in Barton et al. (2002), this method has broad applicability, and can handle qualitative uncertainty in model form as well as situations where no calibrating real-world data exist. This separate analysis approach has a significant drawback: it provides no statistical characterization of input uncertainty.


The second approach is to combine explicit characterization of the impact of input uncertainty with the measurement of output variability to provide confidence intervals for output parameter values. There have been three primary streams of research in this vein.

One stream of research has developed methods for a priori known families of parametric input distributions, using maximum likelihood estimator properties of the parameter estimates and Taylor series approximations of the expected simulation response, and assuming that the correct parametric families of the input distributions are known. Cheng (1994) and Cheng and Holland (1997) used a delta-method approximation to achieve this, which requires an additional number of simulation runs that is linear in the number of input model parameters. Cheng and Holland (1998, 2004) suggested an alternative method that requires only two runs, but yields more conservative confidence intervals. In many cases, the linear approximation of the impact of input on output that these methods require cannot be supported. Our approach, metamodel-assisted bootstrapping, directly accounts for nonlinearity.

A second stream of research is based on Bayesian methods. The distribution of the expected value of the simulation output is characterized by running simulations at each of many repeated samplings from a posterior distribution for the parametric input model parameters. The posterior distributions are determined by applying Bayes' rule to a prior distribution and observations of real-world data. This Bayesian model averaging (BMA) strategy has been described and refined by Chick (1997, 1999, 2000, 2001), Chick and Ng (2002), Ng and Chick (2001), Zouaoui and Wilson (2001a), and Zouaoui and Wilson (2001b, 2003). These methods require normality and homogeneity assumptions on the impact of input uncertainty that may not be tenable. Zouaoui and Wilson (2004) used a different BMA approach to capture both parameter uncertainty and uncertainty across an a priori determined set of possible parametric families. The assumption of homogeneous variance induced by input variability remains, and the t-type confidence interval they describe also depends on approximate normality because of input uncertainty. Biller and Corlu (2011) extended the Bayesian framework to handle a large number of correlated inputs.

A third stream of research takes a frequentist approach, characterizing the impact of input uncertainty on the simulation output using direct sampling and bootstrap resampling methods. This work includes the papers on nonparametric bootstrap approaches by Barton and Schruben (1993, 2001) and Barton (2007), and a parametric bootstrap approach by Cheng and Holland (1997). These percentile intervals do not require normality or homogeneity because of input uncertainty. Unfortunately, because the bootstrap statistic is the output of a stochastic simulation, it is not a smooth function of the input data; this violates a requirement for asymptotic consistency of a bootstrap confidence interval, which is one of the central issues that metamodel-assisted bootstrapping addresses.

Further, the sampling methods in this third stream and the percentile method in Zouaoui and Wilson (2004) suffer from a failure to distinguish the behavior of output variability and input variability. Schmeiser in his discussion in Barton et al. (2002, p. 363) noted "Any reasonable version [of a method capturing both input-uncertainty error and simulation-estimation error] needs to reflect the fundamental difference that sampling error decreases with additional simulation sampling while additional sampling has no impact on modeling error." This means that confidence intervals based on resampling strategies will have coverage that exceeds the nominal value if the simulation effort for each resample is relatively low. Barton identified this error for nonparametric bootstrap methods in Barton et al. (2002) and proposed simulation effort guidelines in Barton (2007).

There has been related work on allocation-of-effort strategies to minimize the joint impact of input uncertainty and simulation-output uncertainty. Lee and Glynn (2003) developed a framework to estimate the distribution of the conditional expectation of a simulation-output statistic in the presence of model and parameter uncertainty, and presented methods to minimize the mean squared error of an estimator of the expected value. Steckley and Henderson (2003) further developed the framework and described a kernel approach for estimating the density of the conditional expectation depending on a single unknown parameter. Ng and Chick (2001) used the delta method in a Bayesian framework to sequentially choose which real-world data to augment with additional samples to reduce the variance of the simulation output. Freimer and Schruben (2002) described two approaches to determine whether additional real-world data should be collected, given a fixed level of simulation effort. In the first approach, upper and lower confidence limits on model parameters are used as parameter values in a factorial set of simulation experiments. If the parameter effects are statistically significant, then more real-world data must be collected. In the second approach, the real-world data are bootstrap resampled to generate random values of the parameters, and a random-effects model is used to test for significance of the parameter uncertainty.

Henderson (2003) gives a detailed review and critique of these approaches. He found no clearly superior method, and suggested the need for a method


that is transparent, statistically valid, implementable, and efficient. In this paper we describe a metamodel-assisted bootstrapping method and show its use with parametric input models. This approach addresses many of the shortcomings in the prior work: The use of a general-form metamodel can provide a higher fidelity approximation than a first-order Taylor approximation and makes the bootstrap statistic a smooth function of the input data. Further, the use of a metamodel reduces the impact of intrinsic simulation error on the accuracy of the confidence intervals that we construct.

In Barton et al. (2010) we conducted an empirical study comparing metamodel-assisted bootstrapping against the alternatives using two test cases with special structure. The results provided compelling evidence that metamodel-assisted bootstrapping is both effective and superior to competitors, motivating its extension in this paper to more general input-uncertainty problems.

The two test cases in Barton et al. (2010) were M/M/$\infty$ and M/M/1/q queues. We compared metamodel-assisted bootstrapping to the standard conditional confidence interval (i.e., ignoring input uncertainty), direct bootstrapping, Bayesian bootstrapping, and a fully Bayesian analysis using noninformative priors; detailed algorithms are given in Barton et al. (2010). Factors we varied included the quantity of real-world data, simulation budget, and number of bootstrap/posterior resamples.

Not surprisingly, the coverage of the conditional CIs was far below the nominal level and decreased with greater simulation budget. As has been noted in the literature, the direct bootstrap, Bayesian bootstrap, and Bayesian methods could have substantial overcoverage (for instance, we observed 100% coverage in 1,000 trials when the nominal level was 95%). Although overcoverage is certainly better than undercoverage, overcoverage is not harmless as it misleads the analyst into thinking that there is more uncertainty than there really is.

Metamodel-assisted bootstrapping was robust, providing coverage right at the nominal level across all experimental settings. However, it is important to note that because all of the input distributions in these examples were exponential and thus had a single parameter (the mean), metamodeling is relatively straightforward. The open questions remaining from Barton et al. (2010) are what form should the metamodel take, and what is an effective experiment design to fit the metamodel, for general multiparameter distributions? Answers to these questions, along with a proof of validity of the bootstrap approach, are the subject of the remainder of the paper. We will not include the competitors in this paper because they were already shown to be inferior in Barton et al. (2010).

3. A Framework for Confidence Intervals
To account for input uncertainty, we introduce the following representation of a computer simulation.

A simulation is a function $g: \omega \mapsto Y$, where $\omega$ is an elementary outcome from a probability space $(\Omega, P)$; to be concrete, one might think of $\omega$ as a realization of an infinite sequence of i.i.d. uniform $(0,1)$ random numbers (although only a finite subset will be used in the simulation). The mapping $g$ is also a functional of $L$ input distributions, say $\{F_1, F_2, \ldots, F_L\}$. Thus, $g(\omega) = g(\omega \mid F_1, F_2, \ldots, F_L)$. In this paper we assume that the different input processes are independent. We represent the simulation in this way to indicate that different choices are possible for the driving input processes, and we distinguish $L$ distributions because the typical simulation is driven by i.i.d. samples from one or more distinct distributions, and "input modeling" involves characterizing each of these distributions separately. For instance, in a queueing simulation, $F_1$ might be the distribution of the interarrival times of a stationary arrival process, and $F_2$ could be the distribution of service times. In this paper, we only consider parametric input models. The parameters of the $l$th input distribution are denoted by $\mathbf{x}_l$, a $p_l \times 1$ vector. The collection of all of the distribution parameters is denoted $\mathbf{x}^\top = (\mathbf{x}_1^\top, \mathbf{x}_2^\top, \ldots, \mathbf{x}_L^\top)$, facilitating the equivalent but more concise representation $Y = g(\omega \mid \mathbf{x})$. We discuss our choice of parameterization next.

In addition, the simulation output often depends on other parameters describing the structure of the model or the experiment, such as the number of servers, the "feeds and speeds" of the equipment, or the simulation stopping time, but this dependence is omitted from the notation and is not a part of our framework. We assume that the objective is to characterize system behavior given a fixed set of these controllable parameters.

Our interest is in the mean response
$$\mu(\mathbf{x}) = \int g(\omega \mid \mathbf{x})\, dP,$$
and in particular, the mean response at the true, correct parameters $\mathbf{x}^c$, which are not known and must be estimated from real-world data. Assuming the existence of $\mathbf{x}^c$, our goal in this paper is to obtain random bounds $C_L$ and $C_U$ such that
$$\Pr\{\mu(\mathbf{x}^c) \in [C_L, C_U]\} \ge 1 - \alpha. \qquad (3)$$
The approach we take exploits a metamodel that predicts the mean response $\mu(\mathbf{x})$ as a function of the parameters $\mathbf{x}$. Because we will use a spatial correlation metamodel, we need to define "space" so that input distributions that are "close" to each other lead


to similar simulation output. A natural way to measure the distance between two parametric distributions is as a function of the difference between their parameters. For instance, a gamma$(\alpha, \beta)$ distribution has a shape parameter $\alpha$ and scale parameter $\beta$, so the distance between two gamma distributions could be of the form $\theta_1(\alpha_i - \alpha_j)^2 + \theta_2(\beta_i - \beta_j)^2$, where $(\theta_1, \theta_2)$ reflect the importance of differences in each dimension. Clearly when this distance is small, the gamma distributions will yield probabilistically similar inputs. Nevertheless, there are a number of reasons why we do not measure the distance between input distributions as a function of the natural model parameters.

To build metamodels we will run simulation experiments that cover the relevant parameter space. But even two-parameter distributions are not always parameterized by "shape" and "scale." Therefore, if we use the natural parameters then distance between two, say, time-to-failure distributions might be measured differently than between two service-time distributions. More importantly, to fit a metamodel we need to define a relevant space to cover. The feasible parameter space for most parametric distributions is enormous (e.g., $\alpha > 0$ for the gamma) and would require far too many experiments to cover completely. So instead we want an experiment design strategy that adapts to the real-world data we have, and the possible bootstrap resamples it could generate. The natural parameters of many distributions are complex nonlinear functions of the data, making it difficult or costly to deduce the relevant range.

In this paper, we measure the distance between distributions as a function of the distance between the first few moments (or standardized moments) of the distribution. We assume that a $p_l$-parameter distribution is uniquely specified by its first $p_l$ moments, which is true for the distributions that are most often used in stochastic simulation. We are exploiting the fact that when the parametric family of distributions is known and fixed then the values of its moments completely specify it, and when the moments are close, then the distributions will be similar. Thus, the independent variable $\mathbf{x}$ is the concatenation of the first $p_l$ moments of all input distributions $l = 1, 2, \ldots, L$. The dimension of $\mathbf{x}$ is $d = \sum_{l=1}^{L} p_l$.

The output from a single simulation replication can be written as $Y(\mathbf{x}, \omega) = \mu(\mathbf{x}) + \varepsilon(\mathbf{x}, \omega)$, where $\mu(\mathbf{x}) = \int g(\omega \mid \mathbf{x})\, dP$ and $\varepsilon(\mathbf{x}, \omega) = g(\omega \mid \mathbf{x}) - \mu(\mathbf{x})$. From here on we drop the dependence of these random variables on $\omega$ for notational convenience, giving the random output
$$Y(\mathbf{x}) = \mu(\mathbf{x}) + \varepsilon(\mathbf{x}). \qquad (4)$$
When independent replications are obtained (from independent realizations $\omega_j$ of $\omega$) we append a subscript $j$ to indicate the $j$th replication, as in $Y_j(\mathbf{x}) = \mu(\mathbf{x}) + \varepsilon_j(\mathbf{x})$.

For many simulation settings the output is an average of a large number of more basic outputs, so a version of the central limit theorem implies that the distribution of $\varepsilon(\mathbf{x})$ (and consequently $Y(\mathbf{x})$) is approximately Gaussian. Simulation outputs in this category include continuous-time-average statistics such as utilization and discrete-time-average statistics such as average waiting time. If $\mathbf{x}^c$ were known then the desired CI could be obtained from classical statistics using simulated outputs $\{Y_j(\mathbf{x}^c), j = 1, 2, \ldots, n\}$ from a set of $n$ replicated simulation runs.

Of course, $\mathbf{x}^c$ is typically unknown and must be estimated using finite real-world samples, and the number of observations from each of the $L$ input distributions might differ. Let $m_l$ be the number of i.i.d. observations from the $l$th input distribution, and let the observed values be $\mathbf{Z}_{l,m_l} = \{Z_{l,1}, Z_{l,2}, \ldots, Z_{l,m_l}\}$. Let $\mathbf{Z}_m = \{\mathbf{Z}_{l,m_l},\ l = 1, 2, \ldots, L\}$ be the collection of observations from all $L$ input distributions, where $m = (m_1, m_2, \ldots, m_L)$. The $p_l \times 1$ vector of moment estimators for the $l$th process is a function of this data, $\mathbf{X}_{l,m_l} = \mathbf{X}_l(\mathbf{Z}_{l,m_l})$. Combining moment estimators from all processes, we obtain a $d \times 1$ moment vector $\mathbf{X}_m^\top = (\mathbf{X}_{1,m_1}^\top, \ldots, \mathbf{X}_{L,m_L}^\top)$.

Simulation-output analysis is usually based on a particular realization of $\mathbf{Z}_m$, say $\mathbf{z}_m^{(0)}$, giving a corresponding moment (parameter) estimate $\mathbf{x}_m^{(0)}$. Standard simulation-output analysis constructs a confidence interval that is conditioned on $\mathbf{x}_m^{(0)}$; that is, it forms random bounds $C_L$ and $C_U$ such that
$$\Pr\{\mu(\mathbf{x}_m^{(0)}) \in [C_L, C_U] \mid \mathbf{X}_m^{(0)} = \mathbf{x}_m^{(0)}\} \ge 1 - \alpha. \qquad (5)$$
The probability in the conditional CI is with respect to $\omega$; the randomness in $\mathbf{x}_m^{(0)}$ is ignored. Thus, the bounds provide a probability guarantee of covering $\mu(\mathbf{x}_m^{(0)})$ rather than $\mu(\mathbf{x}^c)$. As has been shown in many papers, this can lead to coverage probabilities for $\mu(\mathbf{x}^c)$ that are far from $1 - \alpha$. Our goal is to obtain asymptotically correct confidence intervals for $\mu(\mathbf{x}^c)$ as the amount of real-world data increases, under the assumptions that the real-world input distributions are stable and the model logic is correct. Clearly the real-world distributions must be unchanging for the probability statement (3) to make sense.

Our focus on a CI for $\mu(\mathbf{x}^c)$ avoids the problem of correcting for the bias of $\mu(\mathbf{x}_m^{(0)})$ as an estimator of $\mu(\mathbf{x}^c)$, as illustrated in §1.1; instead we bound the value of $\mu(\mathbf{x}^c)$ with high probability and let this interval account for the bias.

How can we obtain a CI like (3) rather than one like (5)? Suppose, as a thought experiment, that we could obtain i.i.d. observations of size $m$ of the real-world process, $\mathbf{Z}_m^{(0)}, \mathbf{Z}_m^{(1)}, \mathbf{Z}_m^{(2)}, \ldots$, and let $\mathbf{X}_m^{(i)}$ denote the corresponding moment estimator obtained from


the $i$th set. This makes it easy to see that the distribution of the simulation output $Y(\mathbf{X}_m^{(i)})$ has two sources of randomness: the simulation output variability conditional on $\mathbf{X}_m^{(i)}$, and the randomness from the input variability of $\mathbf{Z}_m^{(i)}$ used in constructing $\mathbf{X}_m^{(i)}$. Specifically,
$$Y(\mathbf{X}_m^{(i)}) = \mu(\mathbf{X}_m^{(i)}) + \varepsilon(\mathbf{X}_m^{(i)}) = \mu(\mathbf{x}^c) + [\mu(\mathbf{X}_m^{(i)}) - \mu(\mathbf{x}^c)] + [Y(\mathbf{X}_m^{(i)}) - \mu(\mathbf{X}_m^{(i)})] = \mu(\mathbf{x}^c) + D + V. \qquad (6)$$
Now suppose that the function $\mu(\cdot)$ were known, as well as the distribution of $D$, denoted $F_D$. Then if we observed only one real-world sample $\mathbf{Z}_m^{(0)}$, the random interval $C_U = \mu(\mathbf{X}_m^{(0)}) - F_D^{-1}(\alpha/2)$ and $C_L = \mu(\mathbf{X}_m^{(0)}) - F_D^{-1}(1 - \alpha/2)$ achieves objective (3) because
$$\Pr\{C_L \le \mu(\mathbf{x}^c) \le C_U\} = \Pr\{F_D^{-1}(\alpha/2) \le \mu(\mathbf{X}_m^{(0)}) - \mu(\mathbf{x}^c) \le F_D^{-1}(1 - \alpha/2)\} = 1 - \alpha,$$
because $\mu(\mathbf{X}_m^{(0)}) - \mu(\mathbf{x}^c)$ has distribution $F_D$. Of course, neither $\mu(\cdot)$ nor $F_D$ are known, but this insight suggests a path to a CI:

• Use simulation experiments to build a metamodel $\hat\mu(\cdot)$ to stand in for $\mu(\cdot)$.

• Let bootstrap resampling stand in for repeated observations of the real-world data. More specifically, for the $l$th input process, $l = 1, 2, \ldots, L$, draw $m_l$ random samples with replacement from $\mathbf{Z}_{l,m_l}^{(0)}$; repeat this procedure $B$ times to obtain $B$ bootstrap samples and corresponding moment estimators $\mathbf{X}_m^{(b)}$, $b = 1, 2, \ldots, B$. Approximate the distribution $F_D$ of $\mu(\mathbf{X}_m^{(i)}) - \mu(\mathbf{x}^c)$ by the empirical distribution of $\hat\mu(\mathbf{X}_m^{(b)}) - \hat\mu(\mathbf{X}_m^{(0)})$.

Building a metamodel for the mean simulation response as a function of the input moments is key to our framework, and building an accurate one is essential to making it work. By running a designed experiment in the input moment space we leverage more information about the response surface $\mu(\cdot)$, and the impact of distribution choice on output performance, than we would have from a single simulation at $\mathbf{x}_m^{(0)}$. Fitting a metamodel allows us to spend the simulation budget carefully across a relatively small number of well-chosen design points, providing precise estimates not only at those design points but throughout the mean response surface. As we show empirically, this effectively eliminates the impact of $V$. Using a metamodel $\hat\mu(\cdot)$ also provides the continuity conditions that support the asymptotic validity of bootstrapping, which is that the bootstrap statistic be a smooth function of the input data. Of course, the challenges lie in designing the experiment in input-model space, in the choice of metamodel, and in proving that it works. These are the primary contributions of this paper.

An alternative to creating a metamodel is running separate simulation experiments at each of the bootstrap input moments $\mathbf{X}_m^{(b)}$, $b = 1, 2, \ldots, B$. Because, for accuracy of the bootstrap CI, $B$ is recommended to be at least 1,000, this spreads the simulation budget across a relatively large number of bootstrap experiments, leading to imprecise estimates of the mean response at those points. Metamodeling leverages all of the data from a designed experiment to provide better predictions of the response at each bootstrap point.

4. Metamodel-Assisted Bootstrapping
Next we give an overview of our approach, hinting at the particular (and important) implementation details we describe in the sections that follow. Within this framework other implementation choices are possible, although we will make arguments in favor of our selections.

1. Input: A set of real-world data $\mathbf{z}_m^{(0)}$.

2. Construct metamodel $\hat\mu(\mathbf{x})$: Select an experiment design, run simulations at the design points, and fit a metamodel $\hat\mu(\mathbf{x})$ as an approximation to the true mean response $\mu(\mathbf{x})$ as a function of the input moments $\mathbf{x}$. This requires two choices:

a. Metamodel form. We choose stochastic kriging, which is reviewed in §4.1, because it does not require strong assumptions about the response surface $\mu(\mathbf{x})$, and it is specifically intended to account for output variability in stochastic simulation.

b. Experiment design. The design points $\mathbf{x}_i$, $i = 1, 2, \ldots, k$ fill the space of most likely bootstrap resample moments. To locate this region, we generate a preliminary bootstrap resample from the real-world data $\mathbf{z}_m^{(0)}$ whose sole purpose is to map the relevant design space (see §4.2).

3. Construct confidence interval: Generate independent bootstrap resamples $\mathbf{Z}_m^{(b)} \sim \mathbf{z}_m^{(0)}$ and corresponding sample moments $\mathbf{X}_m^{(b)}$, $b = 1, 2, \ldots, B$, and let the confidence interval be the $\alpha/2$ and $1 - \alpha/2$ sample quantiles of
$$\{\hat\mu(\mathbf{X}_m^{(1)}), \hat\mu(\mathbf{X}_m^{(2)}), \ldots, \hat\mu(\mathbf{X}_m^{(B)})\}.$$
That is, we use a bootstrap percentile interval. The validity of the approach is discussed in §4.3.

4.1. Stochastic Kriging
At the heart of our approach is a metamodel that predicts the mean simulation response as a function of the moments of the input models that drive the simulation. Consider the examples used in Barton et al. (2010): Because the input models are a Poisson arrival process and exponentially distributed service times, the arrival rate $\lambda$ and the mean service time $\tau$ provide complete characterizations. For the M/M/$\infty$


queue the steady-state expected number in queue is $\mu(\lambda, \tau) = \lambda\tau$, and for the M/M/1/q queue it is
$$\mu(\lambda, \tau) = \frac{\lambda\tau}{1 - \lambda\tau} - \frac{(q+1)(\lambda\tau)^{q+1}}{1 - (\lambda\tau)^{q+1}}.$$
Notice that even restricting ourselves to Markovian queueing models, these two response functions are quite different. To be general, we need a metamodeling framework that neither makes strong assumptions about the response surface (such as being a low-order polynomial), nor is tied to particular input distributions (such as exponential). Our proposal is to use stochastic kriging for the metamodel, and input-distribution moments as the independent variables.

Kriging is an interpolation-based method that has been widely used for the design and analysis of deterministic computer experiments (Santner et al. 2003). Differing from computer experiments with deterministic response, the outputs from stochastic simulation include intrinsic output variability, and this variability often changes significantly across the design space. We use the term "intrinsic" uncertainty or variability to refer to the variability inherent in the sampling that generates stochastic simulation output, as opposed to "extrinsic" uncertainty (introduced later), which refers to our lack of knowledge about the mean response surface. To meet the requirements of stochastic simulation, stochastic kriging (SK) was introduced by Ankenman et al. (2010). Whereas kriging metamodels assume that there is no error at the design points, SK appropriately weights the contributions of the design points depending on their precision. In this section we provide enough background on SK to see why we chose it and how we use it.

In SK, Ankenman et al. (2010) represent the simulation output on replication $j$ at design point $\mathbf{x}$ as
$$Y_j(\mathbf{x}) = \mathbf{f}(\mathbf{x})^\top \boldsymbol\beta + W(\mathbf{x}) + \varepsilon_j(\mathbf{x}). \qquad (7)$$
The independent variable $\mathbf{x} = (x_1, x_2, \ldots, x_d)$ is interpreted as a location in space, and the variation in the simulation response is divided into three uncorrelated parts: a trend model $\mathbf{f}(\mathbf{x})^\top \boldsymbol\beta$, extrinsic (response-surface) uncertainty $W(\mathbf{x})$, and intrinsic (simulation-output) uncertainty $\varepsilon_j(\mathbf{x})$.

When we have no process physics to justify a parametric trend model, which is the situation we assume, best practice in kriging and stochastic kriging is to use only a constant trend $\mathbf{f}(\mathbf{x})^\top \boldsymbol\beta = \beta_0$. Including a trend model, if appropriate, would not change our approach.

Uncertainty about the response surface is modeled by a mean 0, second-order stationary Gaussian random field $W$, which accounts for the spatial dependence. Loosely, a Gaussian random field is a function from $\Re^d$ to $\Re$ such that any finite collection $W(\mathbf{x}_1), W(\mathbf{x}_2), \ldots, W(\mathbf{x}_k)$ has a multivariate normal distribution.

A parametric form for the spatial covariance of $W(\mathbf{x})$ is typically assumed:
$$\Sigma(\mathbf{x}, \mathbf{x}') = \mathrm{Cov}[W(\mathbf{x}), W(\mathbf{x}')] = \sigma^2 r(\mathbf{x} - \mathbf{x}'; \boldsymbol\theta),$$
where $r$ is a correlation function that depends on some unknown parameters $\boldsymbol\theta$ and $\sigma^2$ is the variance. Stationarity implies that the response-surface correlation depends only on $\mathbf{x} - \mathbf{x}'$ and not on the actual location of the design points.

Based on the assumed smoothness of the response, different correlation functions can be chosen. The product-form Gaussian correlation function is the most commonly used in practice, as well as the one we use in this paper:
$$r(\mathbf{x} - \mathbf{x}'; \boldsymbol\theta) = \exp\Big\{-\sum_{j=1}^{d} \theta_j (x_j - x_j')^2\Big\}. \qquad (8)$$

To generate a global predictor, we choose $k$ design points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$, and then run $n_i$ replications at the $i$th point. The sample mean of the simulation outputs at $\mathbf{x}_i$ is $\bar Y(\mathbf{x}_i) = \sum_{j=1}^{n_i} Y_j(\mathbf{x}_i)/n_i$, and the vector of sample means at all design points is denoted by $\bar{\mathbf{Y}} = (\bar Y(\mathbf{x}_1), \bar Y(\mathbf{x}_2), \ldots, \bar Y(\mathbf{x}_k))^\top$. The variances of the simulation-output uncertainty $\varepsilon$ at different $\mathbf{x}$'s are typically not equal. Let $\sigma^2(\mathbf{x}) = \mathrm{Var}[\varepsilon(\mathbf{x})]$. Because the outputs at different $\mathbf{x}$'s are independent,¹ the diagonal matrix $\mathbf{C}$ of intrinsic output variances can be written as
$$\mathbf{C} = \mathrm{diag}\big(\sigma^2(\mathbf{x}_1)/n_1, \sigma^2(\mathbf{x}_2)/n_2, \ldots, \sigma^2(\mathbf{x}_k)/n_k\big).$$

Let $\boldsymbol\Sigma$ represent the $k \times k$ covariance matrix of $\Sigma(\mathbf{x}_i, \mathbf{x}_j)$ across all design points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$, and let $\boldsymbol\Sigma(\mathbf{x}, \cdot)$ be the $k \times 1$ covariance vector between prediction point $\mathbf{x}$ and all design points. Then the minimum mean squared error (MSE) linear predictor (Ankenman et al. 2010) is
$$\hat\mu(\mathbf{x}) = \beta_0 + \boldsymbol\Sigma(\mathbf{x}, \cdot)^\top [\boldsymbol\Sigma + \mathbf{C}]^{-1}(\bar{\mathbf{Y}} - \beta_0 \cdot \mathbf{1}_{k \times 1}). \qquad (9)$$
Notice that the prediction at $\mathbf{x}$ is a weighted average of the observations, with the design points closer to the prediction point and with smaller output variance having larger weight.

Because the constant $\beta_0$ and spatial correlation parameters $\sigma^2$ and $\boldsymbol\theta$ are unknown, maximum likelihood estimates are typically used for prediction.

¹ Chen et al. (2012) showed that the use of common random numbers inflates the mean squared error of prediction for stochastic kriging, so we use independent simulations.


SK employs an estimate of $\mathbf{C}$ using the sample variances $S^2(\mathbf{x}_i)$. To facilitate experiment design, SK also fits a model for the intrinsic variance $\sigma^2(\mathbf{x})$ as a constant plus an independent Gaussian random field $V(\mathbf{x})$. See Ankenman et al. (2010) for details.
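To make Equations (8) and (9) concrete, here is a minimal Python sketch of the SK predictor. It is illustrative only: it treats $\beta_0$, $\sigma^2$, and $\boldsymbol\theta$ as known inputs rather than estimating them by maximum likelihood as described above, and the function names are ours rather than those of any particular SK library.

```python
import numpy as np

def gaussian_corr(x1, x2, theta):
    """Product-form Gaussian correlation, Equation (8)."""
    diff = x1 - x2
    return np.exp(-np.sum(theta * diff**2))

def sk_predict(x, design, ybar, s2, n_reps, beta0, sigma2, theta):
    """Stochastic kriging predictor, Equation (9).

    design : (k, d) design points; ybar : (k,) sample means of the outputs;
    s2 : (k,) sample variances of the outputs; n_reps : (k,) replication counts.
    beta0, sigma2, theta are assumed known here (in practice, MLEs).
    """
    k = design.shape[0]
    # k x k extrinsic covariance matrix Sigma and k x 1 vector Sigma(x, .)
    Sigma = sigma2 * np.array([[gaussian_corr(design[i], design[j], theta)
                                for j in range(k)] for i in range(k)])
    Sigma_x = sigma2 * np.array([gaussian_corr(x, design[i], theta)
                                 for i in range(k)])
    # Diagonal matrix C of estimated intrinsic variances at the design points
    C = np.diag(s2 / n_reps)
    weights = np.linalg.solve(Sigma + C, ybar - beta0)
    return beta0 + Sigma_x @ weights
```

Because the intrinsic variance matrix $\mathbf{C}$ enters the inverse, noisier design points automatically receive less weight, which is the property that distinguishes SK from deterministic kriging.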

4.2. Experiment Design
In this section, we describe an experiment design strategy for fitting a stochastic kriging metamodel $\hat\mu(\cdot)$ so that it can accurately estimate the mean simulation response at bootstrap resample moments.

The process of bootstrapping involves three steps: (i) resampling with replacement from the original sample data, (ii) computing the statistic under study from the $b$th set of bootstrap resampled data, and (iii) characterizing the empirical distribution of the $B$ values of the bootstrapped statistic to yield confidence intervals. In our case the resampled data are themselves moments, and the computed statistic is the simulation output. The design space for fitting the metamodel is well defined by considering where the values computed under step (i) fall, and so these bootstrap resample moments themselves could be used for the experiment design. Unfortunately, this approach has two drawbacks. First, if the number of bootstrap resamples ($B$) is not very large, then the randomness of the selection process will not likely result in uniform coverage of the design space. On the other hand, if $B$ is very large, then running the simulation an extremely large number of times will be computationally intractable. One might imagine selecting a small subset of a large number of candidate bootstrap resamples to maximize some uniformity property, such as a maximin design, but this strategy is also computationally intractable when $B$ is large. This has been a motivating factor for computationally efficient design construction strategies such as Latin hypercubes and other orthogonal arrays, and uniform designs. Our strategy is to modify one of these efficient design generation schemes to generate points falling in some regular region where bootstrap resample moments can be expected to fall.

The most common regular region used to define a design space is a hyperbox. For ease of exposition, we first suppose that there is a single input from which we have a sample of real-world data $z_1^{(0)}, z_2^{(0)}, \ldots, z_m^{(0)}$ and a two-parameter input distribution (such as the gamma, lognormal, or Weibull). A bootstrap resample is a sample of size $m$ with replacement from $z_1^{(0)}, z_2^{(0)}, \ldots, z_m^{(0)}$. Let $Z_1^{(b)}, Z_2^{(b)}, \ldots, Z_m^{(b)}$ denote the $b$th bootstrap resample, with corresponding standardized sample moments (mean and standard deviation) of
$$\bar Z_m^{(b)} = \frac{1}{m}\sum_{j=1}^{m} Z_j^{(b)}, \qquad S_m^{(b)} = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m} \big(Z_j^{(b)} - \bar Z_m^{(b)}\big)^2}.$$

The metamodel will be used to predict the mean simulation response given any bootstrap resample moments $(\bar Z_m^{(b)}, S_m^{(b)})$. Thus, we need the metamodel to be accurate over the space of possible $(\bar Z_m^{(b)}, S_m^{(b)})$ pairs that we could obtain from bootstrap resampling the real-world data.

There is no way to specify a tight experiment design, one that covers only the relevant moment space, prior to obtaining the real-world data. Therefore, we need to use the real-world data to guide the experiment design. Given $z_1^{(0)}, z_2^{(0)}, \ldots, z_m^{(0)}$, the extreme moments can be computed: the minimum and maximum sample means are $\min_j z_j^{(0)}$ and $\max_j z_j^{(0)}$, respectively, and the minimum standard deviation is 0 and the maximum (if $m$ is even) is
$$\sqrt{\frac{m}{m-1}\left(\frac{\max_j z_j^{(0)} - \min_j z_j^{(0)}}{2}\right)^{2}}.$$

However, the rectangle defined by these extremes is a poor choice for the design space because the extremes are very unlikely, or impossible in the case of the maximum standard deviation paired with either the minimum or maximum mean. Further, $(\bar Z_m^{(b)}, S_m^{(b)})$ will tend to be correlated, so that the entire rectangle is not equally relevant. These deficiencies become even more severe when there are $L > 1$ input processes if we let the experimental region be the hyperrectangle defined by the extreme moments.

The tilted cloud shape common in scatter plots suggests that an ellipse (or ellipsoid when $d > 2$) will tend to provide a more suitable regular region than a hyperbox. When $B$ is not very large, the enclosing ellipsoid combined with a transformed Latin hypercube design that falls within the ellipsoid is easy to compute and helps ensure uniform coverage. This leads to the following algorithm.

Fit Ellipse.
1. Generate $B_0$ bootstrap resamples from $\mathbf{z}_m^{(0)}$ and compute the corresponding sample moments $\mathbf{X}_m^{(b)}$, $b = 1, 2, \ldots, B_0$.
2. Fit an ellipsoid to the sample moment data in such a way that it contains a large fraction of the pairs, say $q$. Specifically, find the smallest $a^2$ such that
$$\#\big\{b: (\mathbf{X}_m^{(b)} - \mathbf{M})^\top \mathbf{V}^{-1}(\mathbf{X}_m^{(b)} - \mathbf{M}) \le a^2\big\}/B_0 \ge q, \qquad (10)$$
where
$$\mathbf{M} = \sum_{b=1}^{B_0} \mathbf{X}_m^{(b)}/B_0 \quad\text{and}\quad \mathbf{V} = \sum_{b=1}^{B_0} (\mathbf{X}_m^{(b)} - \mathbf{M})(\mathbf{X}_m^{(b)} - \mathbf{M})^\top/(B_0 - 1).$$


3. Generate $B_1$ independent bootstrap resamples from $\mathbf{z}_m^{(0)}$ and compute $\mathbf{X}_m^{(b)}$, $b = B_0 + 1, B_0 + 2, \ldots, B_0 + B_1$. If more than $c$ of these $B_1$ additional pairs are contained in the ellipsoid (10), then accept this ellipsoid as the design space. Otherwise, add these bootstrap resamples to the data set, let $B_0 \leftarrow B_0 + B_1$, and go to step 2.

Here, $B_1$ and $c$ are chosen to obtain a desired type I error and power for a hypothesis test, based on the binomial distribution, that the probability a bootstrap resample moment is contained in the ellipsoid is $\ge q$. In our experiments we set $q = 0.99$, the type I error to 0.005, and the power to 0.95 when the true probability is $q = 0.97$ or less.
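The paper specifies these error and power targets but not a particular search procedure for $B_1$ and $c$; the Python sketch below is one plausible way to find such a pair, treating "reject" as the event that at most $c$ of the $B_1$ additional bootstrap moments fall inside the ellipsoid.

```python
from scipy.stats import binom

def choose_B1_c(q=0.99, p_alt=0.97, type1=0.005, power=0.95, max_B1=20_000):
    """Smallest B1 (and cutoff c) such that rejecting the ellipsoid when at most
    c of B1 extra bootstrap moments fall inside it has type I error <= type1 at
    containment probability q, and power >= power at containment probability p_alt."""
    for B1 in range(100, max_B1 + 1, 100):
        # Largest c with P(#contained <= c | p = q) <= type1 (controls type I error).
        c = int(binom.ppf(type1, B1, q))
        if binom.cdf(c, B1, q) > type1:
            c -= 1
        if c < 0:
            continue
        # Power: probability of rejecting when the containment probability is p_alt.
        if binom.cdf(c, B1, p_alt) >= power:
            return B1, c
    return None

print(choose_B1_c())
```

Because this check involves only bootstrap resampling and counting, not simulation, it is cheap regardless of the $B_1$ it returns.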

The output of this algorithm is an ellipsoid defined by $\mathbf{M}$, $\mathbf{V}$, and $a$,
$$\{\mathbf{x} \in \Re^d: (\mathbf{x} - \mathbf{M})^\top \mathbf{V}^{-1}(\mathbf{x} - \mathbf{M}) \le a^2\}. \qquad (11)$$

To place $k$ design points into this space, we employ an algorithm due to Sun and Farooq (2002), §3.2.1, for generating points uniformly distributed in the desired ellipsoid. The algorithm first generates the polar coordinates of a point uniformly distributed in a hypersphere, then transforms it to Cartesian coordinates, and finally transforms it again to a point uniformly distributed in an ellipsoid. The advantage of this approach is that each element of the initial polar coordinates is independently distributed, allowing them to be generated coordinate by coordinate via their inverse cumulative distribution functions. Therefore, if we start with points $\{\mathbf{u}_j, j = 1, 2, \ldots, k\} \in [0,1]^d$ that are space filling instead of random, then their space-filling property will tend to be preserved in the ellipsoid. We obtain $\{\mathbf{u}_j, j = 1, 2, \ldots, k\}$ by generating many Latin hypercube samples and picking the one that maximizes the minimum distance between points. The output is then an experiment design $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ with all the $\mathbf{x}_i$ falling in the ellipsoid (11). Figure 1 illustrates how this works when there is $L = 1$ input distribution with two parameters.
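The sketch below illustrates this workflow in Python for the two-dimensional case of Figure 1. It is a simplified stand-in rather than the exact construction used in the paper: the ellipsoid quantities $\mathbf{M}$, $\mathbf{V}$, and $a^2$ are obtained from a given array of bootstrap moments with an empirical quantile, and the maximin Latin hypercube is mapped into the ellipse with a simple polar/Cholesky transformation instead of the Sun and Farooq (2002) algorithm. All function names are ours.

```python
import numpy as np
from scipy.stats import qmc

def fit_ellipsoid(X, q=0.99):
    """Given bootstrap resample moments X (B0 x d), return M, V, and an a^2
    such that the ellipsoid of Equation (10) contains roughly a fraction q of the points."""
    M = X.mean(axis=0)
    V = np.cov(X, rowvar=False)                 # sample covariance of the moments
    d2 = np.einsum('bi,ij,bj->b', X - M, np.linalg.inv(V), X - M)
    return M, V, np.quantile(d2, q)

def maximin_lhs(k, d, n_candidates=200, seed=1):
    """Among many Latin hypercube samples, keep the one with the largest
    minimum pairwise distance (a simple maximin criterion)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        u = qmc.LatinHypercube(d=d, seed=int(rng.integers(1 << 31))).random(k)
        dists = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1)
        score = dists[np.triu_indices(k, 1)].min()
        if score > best_score:
            best, best_score = u, score
    return best

def design_in_ellipse(k, M, V, a2, seed=1):
    """Map a maximin LHS on [0,1]^2 into the ellipse
    {x : (x - M)^T V^{-1} (x - M) <= a^2} (two-dimensional case of Figure 1)."""
    u = maximin_lhs(k, 2, seed=seed)
    # Polar map of the unit square onto the unit disk.
    r, angle = np.sqrt(u[:, 0]), 2 * np.pi * u[:, 1]
    disk = np.column_stack((r * np.cos(angle), r * np.sin(angle)))
    # Linear map of the unit disk onto the ellipse via a Cholesky factor of V.
    L = np.linalg.cholesky(V)
    return M + np.sqrt(a2) * disk @ L.T
```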

Note that no simulation experiments are required to generate the experiment design. Of course the bootstrap resamples that are generated to create the ellipsoid could be evaluated directly via simulation runs rather than fitting a metamodel at all. This strategy is inferior to what we propose for three reasons. First, the simulation response is stochastic: two replications with the same input parameters produce different results, so the bootstrap statistic is not a smooth function of the data. Second, when the number of input parameters is not too large, the experiment design that we generate to fit a metamodel will require significantly fewer simulation runs (e.g., tens) than would be needed for direct bootstrapping (e.g., thousands). Third, the intrinsic error effect on the correctness of bootstrap intervals as discussed in Barton (2007) is reduced significantly when it is propagated through the fitted metamodel.

[Figure 1: Experiment design in moment space, where the horizontal axis is the mean and the vertical axis is the standard deviation. A ∗ indicates a bootstrap resample, the dashed line is an ellipse that covers 99% of the data, and the remaining markers are design points.]

4.3. Metamodel-Assisted Bootstrap CI
In this section we present the metamodel-assisted bootstrapping algorithm and provide some theoretical support for its use.

Metamodel-Assisted Bootstrap
1. Given real-world data $\mathbf{z}_m^{(0)}$, obtain experiment design $\mathbf{x}_i$, $i = 1, 2, \ldots, k$ as described in §4.2.
2. For design points $i = 1, 2, \ldots, k$, simulate i.i.d. replications $Y_j(\mathbf{x}_i)$ for $j = 1, 2, \ldots, n_i$, and compute the sample average $\bar Y(\mathbf{x}_i)$ and sample variance $S^2(\mathbf{x}_i)$ of the simulation outputs.
3. Fit a stochastic kriging metamodel $\hat\mu(\mathbf{x})$ using $(\bar Y(\mathbf{x}_i), S^2(\mathbf{x}_i), \mathbf{x}_i)$, $i = 1, 2, \ldots, k$ as described in §4.1 (see Equation (9)).
4. For $b = 1$ to $B$,
(a) generate bootstrap resample $\mathbf{Z}_m^{(b)} \sim \mathbf{z}_m^{(0)}$ and compute sample moments $\mathbf{X}_m^{(b)}$;
(b) set $\hat\mu_b = \hat\mu(\mathbf{X}_m^{(b)})$.
Next $b$
5. Report the bootstrap percentile confidence interval $[\hat\mu_{(\lceil B\alpha/2 \rceil)}, \hat\mu_{(\lceil B(1-\alpha/2) \rceil)}]$, where $\hat\mu_{(1)} \le \hat\mu_{(2)} \le \cdots \le \hat\mu_{(B)}$ are the sorted values.
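The five steps can be wired together compactly; the Python sketch below is illustrative only and is shown for a single input process with scalar observations. The functions run_simulation, compute_moments, and fit_stochastic_kriging are hypothetical stand-ins for the user's simulation model, the moment mapping of §3, and the SK fit of §4.1, and the experiment design is assumed to have been generated as in §4.2.

```python
import numpy as np

def metamodel_assisted_bootstrap(z0, design, run_simulation, compute_moments,
                                 fit_stochastic_kriging, n_reps=10, B=1000,
                                 alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: simulate n_reps i.i.d. replications at each design point.
    ybar, s2 = [], []
    for x in design:
        y = np.array([run_simulation(x, rng) for _ in range(n_reps)])
        ybar.append(y.mean())
        s2.append(y.var(ddof=1))
    # Step 3: fit the stochastic kriging metamodel; returns a predictor mu_hat(x).
    mu_hat = fit_stochastic_kriging(np.asarray(design), np.asarray(ybar),
                                    np.asarray(s2), n_reps)
    # Step 4: bootstrap the real-world data and predict at the resampled moments.
    m = len(z0)
    mu_b = np.empty(B)
    for b in range(B):
        resample = rng.choice(z0, size=m, replace=True)
        mu_b[b] = mu_hat(compute_moments(resample))
    # Step 5: bootstrap percentile confidence interval.
    lo, hi = np.quantile(mu_b, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```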

Consider the algorithm above with $\mu(\cdot)$, the true response function, replacing the simulation-based estimator $\hat\mu(\mathbf{x})$ everywhere in the algorithm. We establish the asymptotic coverage of the bootstrap percentile CI in this case, then discuss the necessity for estimating $\mu(\cdot)$ following the proof.


Theorem 1. Suppose that the following conditions hold:
1. The input distribution $F_l$ is uniquely determined by its first $p_l$ moments, and it has finite first $2p_l$ moments, for $l = 1, 2, \ldots, L$.
2. We have i.i.d. observations $Z_{l,1}^{(0)}, Z_{l,2}^{(0)}, \ldots, Z_{l,m_l}^{(0)}$ from $F_l$, for $l = 1, 2, \ldots, L$. As $m \to \infty$ we have $m/m_l \to 1$, $l = 1, 2, \ldots, L$, where $m$ drives the number of observations to $\infty$ from all input sources.
3. The response surface $\mu(\mathbf{x})$ is finite and continuously differentiable with nonzero gradient in a neighborhood of $\mathbf{x}^c$.
4. If $\mathbf{x}'$ is a moment vector such that $\mu(\mathbf{x}')$ does not exist, then the definition of $\mu(\cdot)$ is extended so that $\mu(\mathbf{x}') = \infty$ or $-\infty$.²
Then the bootstrap CI is consistent, meaning
$$\lim_{m\to\infty}\lim_{B\to\infty} \Pr\big\{\mu(\mathbf{x}^c) \in \big[\mu_{(\lceil B\alpha/2\rceil)}, \mu_{(\lceil B(1-\alpha/2)\rceil)}\big]\big\} = 1 - \alpha. \qquad (12)$$

Proof. We show that the CI satisfies the conditions of Theorem 4.1 of Shao and Tu (1995) for consistency of a bootstrap CI.

The distribution of $\sqrt{m}\,[\eta(X_m^{(b)}) - \eta(X_m^{(0)})]$ is strongly consistent for the distribution of $\sqrt{m}\,[\eta(X_m^{(0)}) - \eta(x_c)]$ as $m \to \infty$ by Theorem 3.1 of Shao and Tu (1995). The distribution of $\sqrt{m}\,[\eta(X_m^{(0)}) - \eta(x_c)]$ is asymptotically normal with mean 0 as $m \to \infty$ by standard delta-method arguments. Together, these results establish that the percentile interval $[\eta_{(\lceil B\alpha/2\rceil)}, \eta_{(\lceil B(1-\alpha/2)\rceil)}]$ satisfies the conditions of Theorem 4.1 of Shao and Tu (1995) and is therefore asymptotically consistent as $m, B \to \infty$; that is, (12) holds. □
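For readers who want the delta-method step spelled out, one hedged reconstruction is the following, where $\Sigma_c$ denotes the asymptotic covariance matrix of the sample-moment vector (a symbol introduced here only for this sketch, not used elsewhere in the paper):
$$\sqrt{m}\,\bigl(X_m^{(0)} - x_c\bigr) \;\Rightarrow\; N(0, \Sigma_c), \qquad m \to \infty,$$
because condition 1 guarantees finite first $2p_l$ moments, so the sample moments satisfy a central limit theorem. Then, because $\eta$ is continuously differentiable with nonzero gradient at $x_c$ (condition 3), the delta method gives
$$\sqrt{m}\,\bigl[\eta(X_m^{(0)}) - \eta(x_c)\bigr] \;\Rightarrow\; N\bigl(0,\ \nabla\eta(x_c)^{\top}\Sigma_c\,\nabla\eta(x_c)\bigr),$$
a nondegenerate normal limit with mean 0.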

The CI in Theorem 1 is $[\eta_{(\lceil B\alpha/2\rceil)}, \eta_{(\lceil B(1-\alpha/2)\rceil)}]$, and the CI from metamodel-assisted bootstrapping is $[\hat{\eta}_{(\lceil B\alpha/2\rceil)}, \hat{\eta}_{(\lceil B(1-\alpha/2)\rceil)}]$. Stated differently, the theorem assumes that $\hat{\eta}(x) = \eta(x)$ and there is no metamodeling error. As a practical matter, what our proof establishes is that metamodel-assisted bootstrapping will be effective when we have a (nearly) global metamodel with a good fit. We chose stochastic kriging because it explicitly accounts for simulation variability through the intrinsic variance matrix $C$ and can provide a good fit without making strong assumptions about the form of the response surface, such as being linear as assumed in a first-order Taylor series approximation. Notice, also, that stochastic kriging with the Gaussian correlation function (8) always satisfies the smoothness property of condition 3. What we have found to date (see the results in §5 and in Barton et al. 2010) is that when the input uncertainty is substantially more significant than the output variability, the metamodel-assisted bootstrap CI attains the correct coverage. This shows that in practice the metamodel need not be exact, but only sufficiently good that it can be treated as exact. Unfortunately, the interaction between the amount of real-world data available and the simulation effort required to obtain a metamodel that is "sufficiently good" is not simple: when $m$ is very large, the metamodel may only need to be accurate over a very small region near the true values of the input-model parameters; thus, the entire experiment design is concentrated within a small ellipsoid. When $m$ is small, the variation in the likely parameter values could be substantial, leading to an ellipsoid that covers a large space of possible parameter values and a complex response surface. For these reasons we next introduce an empirical test for "sufficiently good."

² The canonical example of this is an open queueing system where the arrival rate is greater than the service rate. This extension ensures that bootstrap samples leading to infeasible moments imply responses in an extreme tail.

For stochastic kriging, having an adequate number of design points $k$ seems to be more important than having a substantial number of replications per design point $n_i$. The standard recommendation from the kriging literature of $k \ge 10d$, where $d$ is the dimension of the independent variable $x$, works well and is what we recommend. In the absence of any prior information about the variances of the simulation outputs at the design points ($\sigma^2(x_i)$, $i = 1, 2, \ldots, k$), an equal allocation of $n_i = n_0$ replications per design point is sensible; we recommend $n_0 \ge 10$ simply to ensure a stable estimate of the intrinsic variance matrix $C = \mathrm{diag}[\sigma^2(x_1), \sigma^2(x_2), \ldots, \sigma^2(x_k)]/n_0$.
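As a small illustration of this bookkeeping (the helper names below are ours, not the paper's), the rule of thumb and the equal allocation can be written as:

```python
import numpy as np


def design_budget(d, n0=10):
    """Rule-of-thumb budget: k >= 10*d design points, n0 >= 10 reps each."""
    k = 10 * d
    return k, n0, k * n0            # design points, reps per point, total runs


def intrinsic_variance_matrix(sample_vars, n0):
    """C_hat = diag(S^2(x_1), ..., S^2(x_k)) / n0 from the pilot replications."""
    return np.diag(np.asarray(sample_vars, dtype=float) / n0)


# Example: the four-dimensional metamodel of Section 5 (two moments each for
# the interarrival- and service-time distributions).
k, n0, total = design_budget(d=4)    # -> k = 40, n0 = 10, 400 runs in total
```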

Following these simple guidelines will frequently be sufficient to achieve the desired CI coverage. However, if the intrinsic variance is very large, there might be undercoverage; therefore, we provide the following easy-to-apply empirical test and remedial action if the test fails.

Our CI is based on the metamodel $\hat{\eta}(\cdot)$ formed from $k$ design points with (say) $n_0$ replications at each. Let $\hat{C}$ be the estimate of $C$. Notice that, given a set of parameters, $\hat{C}$ is the only term in the stochastic kriging metamodel (9) that is affected by the number of replications. If we increased the number of replications per design point to $n_0 + \Delta n$, then we would expect the new estimate of $C$ to be approximately $C^+ = (n_0/(n_0 + \Delta n))\hat{C}$. As a test for whether the CI in step 5 is sensitive to additional replications, we add the following steps to the metamodel-assisted bootstrap algorithm:

6. Form an adjusted metamodel $\hat{\eta}^+(\cdot)$ by replacing only $\hat{C}$ with $C^+$ and leaving everything else the same (no new simulation-output data are generated).

7. If
$$\frac{\#\bigl\{\hat{\eta}^+(X_m^{(b)}) \in [\hat{\eta}_{(\lceil B\alpha/2\rceil)},\ \hat{\eta}_{(\lceil B(1-\alpha/2)\rceil)}]\bigr\}}{B} \approx 1 - \alpha, \qquad (13)$$
then accept the CI. Otherwise, actually increase the number of replications per design point to $n_0 \leftarrow n_0 + \Delta n$ and go to step 3.


In words, (13) tests whether the empirical coverage of the bootstrap CI would change if we increased the number of replications per design point. If not, then the current CI is stable with respect to the simulation-output variance. The algorithm is guaranteed to terminate because $\hat{C} \to 0$ with probability 1 as the number of replications goes to $\infty$. We illustrate the contribution of these additional steps in the next section.
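A minimal sketch of the test in steps 6–7 is given below. It assumes the `eta_b` values and the bootstrap moment matrix `X_boot` from the earlier sketch, plus a helper `refit_with_nugget` that rebuilds the metamodel with the per-point intrinsic variances scaled by $n_0/(n_0+\Delta n)$; the helper name and the 0.005 tolerance (taken from the experiment reported in §5) are illustrative assumptions, not the paper's API.

```python
import numpy as np


def ci_is_stable(eta_b, X_boot, refit_with_nugget, intrinsic_var, n0,
                 delta_n=10, alpha=0.05, tol=0.005):
    # Step 5 CI from the current metamodel.
    lo, hi = np.quantile(eta_b, [alpha / 2, 1 - alpha / 2])

    # Step 6: adjusted metamodel -- same outputs, nugget C+ = n0/(n0+dn) * C_hat.
    gp_plus = refit_with_nugget(np.asarray(intrinsic_var) * n0 / (n0 + delta_n))

    # Step 7: fraction of adjusted predictions that still fall inside the CI.
    eta_plus = gp_plus.predict(X_boot)
    coverage = np.mean((eta_plus >= lo) & (eta_plus <= hi))
    return abs(coverage - (1 - alpha)) <= tol    # True -> accept the CI
```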

5. Empirical Evaluation
To evaluate metamodel-assisted bootstrapping, we consider estimating the mean number of customers in a GI/G/1/50 queue where the interarrival times are gamma($\alpha_A, \beta_A$) and the service times are gamma($\alpha_S, \beta_S$). The real-world parameters are $\alpha_A = 2$, $\beta_A = 5/3$, $\alpha_S = 3$, $\beta_S = 1$, implying a traffic intensity of 0.9. Based on a very long simulation, the true mean number of customers in the system in steady state is 4.147. In the experiments we assume that the input distribution families are known to be gamma, but the parameters $\alpha_A$, $\beta_A$, $\alpha_S$, $\beta_S$ are unknown. To imitate obtaining real-world data, we generate samples from the correct gamma distributions.
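The "real world" of this experiment can be mimicked in a few lines. The sketch below assumes the shape/scale parameterization of the gamma distribution, under which the interarrival mean is 2 · 5/3 = 10/3, the service mean is 3 · 1 = 3, and the traffic intensity is 3/(10/3) = 0.9, consistent with the value reported above; the seed and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2013)
alpha_A, beta_A = 2.0, 5.0 / 3.0     # interarrival-time gamma(shape, scale)
alpha_S, beta_S = 3.0, 1.0           # service-time gamma(shape, scale)
m = 500                              # real-world sample size (10, 50, 100, or 500)

interarrivals = rng.gamma(alpha_A, beta_A, size=m)   # "observed" data
services = rng.gamma(alpha_S, beta_S, size=m)
rho = (alpha_S * beta_S) / (alpha_A * beta_A)        # true traffic intensity = 0.9
```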

When we do metamodel-assisted bootstrapping, the metamodel is a function of the means and standard deviations of both the interarrival and service times; that is, $p_A = p_S = 2$ and the metamodel is four-dimensional. We systematically examined the impact of the quantity of real-world data, the total simulation budget, and the allocation of the budget between design points and replications by varying them as follows: we chose real-world sample sizes $m = 10, 50, 100, 500$ (both interarrival and service times have a common sample size); two levels of total computational budget, $n_c = 600$ and $1,200$ replications; and number of design points $k = 20, 40, 60$ (see Figure 2). At each design point, the run length was 1,000 finished customers with a warm-up period of 300 customers. The number of replications at each design point is $n_c/k$, so we did not take advantage of the ability of stochastic kriging to differentially allocate replications based on differences in output variance.

We also report results for the conditional CI, which means fitting the gamma distributions to the real-world data and allocating all $n_c$ replications to simulating the resulting system. The competitors to metamodel-assisted bootstrapping were evaluated in Barton et al. (2010).
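The conditional CI baseline can be sketched as follows. The paper does not spell out the interval form, so this assumes maximum-likelihood gamma fits (location fixed at zero) and a standard t-interval across the $n_c$ replication averages; `simulate_at` is a hypothetical function returning one replication of the queue simulation at the given gamma parameters.

```python
import numpy as np
from scipy import stats


def conditional_ci(interarrivals, services, simulate_at, n_c=600, alpha=0.05):
    # Fit gamma input models to the real-world data (shape, loc, scale).
    aA, _, bA = stats.gamma.fit(interarrivals, floc=0)
    aS, _, bS = stats.gamma.fit(services, floc=0)

    # Spend the whole budget at the fitted parameters and form a t-interval.
    y = np.array([simulate_at(aA, bA, aS, bS) for _ in range(n_c)])
    half = stats.t.ppf(1 - alpha / 2, n_c - 1) * y.std(ddof=1) / np.sqrt(n_c)
    return y.mean() - half, y.mean() + half
```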

Figure 2 Experiments for GI/G/1/50 Queue (replications per design point, $n = n_c/k$)

                  k = 20    k = 40    k = 60
  n_c = 600       n = 30    n = 15    n = 10
  n_c = 1,200     n = 60    n = 30    n = 20

Table 1 displays the coverage, width, and standard deviation of the width of metamodel-assisted bootstrapping CIs and conditional CIs based on 1,000 macroreplications of the entire experiment; each macroreplication obtains an independent sample of real-world data, and the nominal confidence level is 95%. The small width of the conditional CIs suggests that we have estimated the true mean very precisely. However, the coverage of the conditional CI is far from the nominal 95% level, even when the quantity of real-world data is a relatively large $m = 500$ observations. The much wider CI obtained with metamodel-assisted bootstrapping shows that, even with 500 real-world samples, input uncertainty dominates the simulation uncertainty. In other words, the simulation has no practical value if we really want to know what the long-run average number of customers would be in the real world: the CI width is three to 11 times as big as the mean value. Without some evaluation of input uncertainty, the analyst would have no clue how little they actually knew.

Notice that the coverage of the metamodel-assisted bootstrapping CI is very close to the nominal value when $m$ is 50 or greater, and the width of the CI decreases roughly in proportion to $1/\sqrt{m}$. The coverage is closer to 90% when we have only 10 real-world observations, but this is a very small sample from which to estimate the parameters of any distribution. In these experiments the coverage of the metamodel-assisted CI is robust to the experiment design; that is, it works well with a smaller design whose points each receive many replications, or with a larger design with fewer replications per design point.

These results, as well as those in Barton et al. (2010), demonstrate robust coverage provided that there is at least a modest amount of real-world data; even when there is little real-world data ($m = 10$ in our experiments), the coverage is much closer to the nominal level than that of the conditional CI.

The amount of real-world data is often not under our control, but the amount of simulation effort is. Therefore, we created an example in which $n_c$ is initially too small to obtain the desired coverage, to show that our test (steps 6–7) can both detect the lack of coverage and compensate for it.


Table 1 Coverage, Width, and Standard Deviation (Std. Dev.) of Width of Metamodel-Assisted Bootstrapping and Conditional CIs

                                     n_c = 600                  n_c = 1,200
                               k = 20   k = 40   k = 60    k = 20   k = 40   k = 60
m = 10
  M-A bootstrap coverage (%)    88.8     90.9     86.2      89.0     90.2     83.3
  M-A bootstrap width           48.97    47.90    44.09     49.39    48.45    43.64
  Std. dev. bootstrap width     16.39    15.43    18.20     15.82    15.86    18.38
  Conditional CI coverage (%)            1                            0.3
  Conditional CI width                   0.27                         0.19
  Std. dev. conditional width            0.20                         0.10
m = 50
  M-A bootstrap coverage (%)    94.6     95.3     95.0      94.9     96.0     95.4
  M-A bootstrap width           40.40    39.02    37.85     40.36    39.21    38.00
  Std. dev. bootstrap width     15.00    13.72    13.67     14.91    13.87    13.64
  Conditional CI coverage (%)            2                            2
  Conditional CI width                   0.49                         0.32
  Std. dev. conditional width            0.28                         0.13
m = 100
  M-A bootstrap coverage (%)    95.4     94.7     94.8      96.5     95.6     95.5
  M-A bootstrap width           33.14    32.12    31.30     33.22    32.04    32.09
  Std. dev. bootstrap width     14.76    14.54    14.54     14.92    14.28    14.20
  Conditional CI coverage (%)            3                            1
  Conditional CI width                   0.45                         0.34
  Std. dev. conditional width            0.24                         0.13
m = 500
  M-A bootstrap coverage (%)    96.1     96.1     93.8      94.8     95.6     96.2
  M-A bootstrap width           12.27    11.71    11.46     12.62    12.29    11.64
  Std. dev. bootstrap width      8.49     8.00     8.39      8.43     8.32     8.04
  Conditional CI coverage (%)            6                            4
  Conditional CI width                   0.32                         0.23
  Std. dev. conditional width            0.08                         0.03

Note. The conditional CI does not depend on $k$, so a single value is reported for each budget $n_c$.

To do this, we cut the run length of each replication from 1,000 completed customers to 50 completed customers, applied the rule of thumb of $k = 10d = 40$ design points, initially made $n_0 = 10$ replications at each design point, and used the increment $\Delta n = 10$ whenever the test in steps 6–7 indicated that more replications were needed (estimated coverage deviating by no more than 0.005 from 0.95 was required to pass the test). We did this for two levels of real-world data, $m = 100$ and $500$. This gave coverages after the initial $n_0$ replications of only 0.921 and 0.924, respectively. When $m = 100$, implementation of steps 6 and 7 increased the coverage to 0.953 with an average of 14.9 replications per design point at termination. When $m = 500$, the coverage was 0.955 with an average of 18.4 replications per design point at termination. As before, coverage was estimated using 1,000 macroreplications.

6. Conclusions
The empirical results in this paper, and those in Barton et al. (2010), illustrate good performance for metamodel-assisted bootstrapping when the simulation budget is sufficient to give a short conditional CI; in other words, when input uncertainty dominates point-estimator variability. We also provided an empirical test to verify that this is the case. Because simulation replications are typically cheap relative to real-world data, metamodel-assisted bootstrapping addresses the situation most frequently encountered in practice. We do not recommend this method when the simulation budget is so tight that even the conditional CI is relatively wide.

Although our one-stage experiment design worked well in the examples, there will be simulations with more complex response surfaces for which a sequential design will be needed to obtain an adequate metamodel. Rather than evenly distributing the simulation effort throughout the design space, we should sequentially and adaptively add design points and replications to balance the global and local fitting uncertainties. Specifically, we need an accurate fit in subareas of the moment space with responses close to the quantiles that form the bootstrap CI. Sequential design is a strength of stochastic kriging.

An advantage of metamodel-assisted bootstrapping relative to direct bootstrapping is effective use of the simulation budget: metamodel-assisted bootstrapping spends the simulation effort on a designed experiment that covers the input space and leverages information from all design points; this makes bootstrapping not only fast but also robust to simulation-estimation error. Direct bootstrapping spreads the simulation budget across 1,000 or more bootstrap resamples, giving noisy, isolated estimates of the simulation response at each bootstrapped input setting.

However, there will be limits to the number of input models we can handle via a metamodel: 50 input models averaging two parameters each implies a 100-dimensional input space.


Adding a screening step that first determines the most influential input models, and then accounts only for their input uncertainty via the metamodel, is one way to extend the methodology to such situations.

When will our method fail or not be appropriate? Clearly, input distributions without enough finite moments (e.g., Cauchy) are outside the scope of this approach. Also, continuity and differentiability of $\eta(\cdot)$ in a neighborhood of the true parameters $x_c$ matter. Consider, for instance, a multiclass queueing network where one input model is the distribution of customer type. A response surface, such as the mean time in the system, can have nondifferentiable points when a small change in the customer-mix distribution causes the bottleneck queue to shift. Of course, having the true input parameters correspond exactly to such a point is unlikely.

A further limitation of our approach is the use of parametric input models. Of course, there is no true, correct input model for any real-world process, so even techniques that average over a number of possible model families suffer this problem (although perhaps less so). Our proposal is to extend the metamodel-assisted bootstrapping approach to simulations driven by empirical distributions (that is, the real-world data itself).

Acknowledgments
Portions of this paper were previously published in Barton et al. (2010). The authors acknowledge the helpful comments of the associate editor and referees in refining the presentation. This paper is based upon work supported by the National Science Foundation [Grants CMMI-0900354 and CMMI-1068473].

References
Ankenman B, Nelson BL, Staum J (2010) Stochastic kriging for simulation metamodeling. Oper. Res. 58:371–382.
Barton RR (2007) Presenting a more complete characterization of uncertainty: Can it be done? Proc. 2007 INFORMS Simulation Soc. Res. Workshop (INFORMS Simulation Society, Fontainebleau), http://www.informs-sim.org/2007informs-csworkshop/13.pdf.
Barton RR, Schruben LW (1993) Uniform and bootstrap resampling of input distributions. Evans GW, Mollaghasemi M, Biles WE, Russell EC, eds. Proc. 1993 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 503–508.
Barton RR, Schruben LW (2001) Resampling methods for input modeling. Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proc. 2001 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 372–378.
Barton RR, Nelson BL, Xie W (2010) A framework for input uncertainty analysis. Johansson B, Jain S, Montoya-Torres J, Hugan J, Yücesan E, eds. Proc. 2010 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1189–1198.
Barton RR, Cheng RCH, Chick SE, Henderson SG, Law AM, Leemis LM, Schmeiser BW, Schruben LW, Wilson JR (2002) Panel on current issues in simulation input modeling. Yücesan E, Chen C-H, Snowdon JL, Charnes JM, eds. Proc. 2002 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 353–369.
Biller B, Corlu CG (2011) Accounting for parameter uncertainty in large-scale stochastic simulations with correlated inputs. Oper. Res. 59:661–673.
Chen X, Ankenman BE, Nelson BL (2012) The effects of common random numbers on stochastic kriging metamodels. ACM Trans. Modeling Comput. Simulation 22(2):Article 7.
Cheng RCH (1994) Selecting input models. Tew JD, Manivannan S, Sadowski DA, Seila AF, eds. Proc. 1994 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 184–191.
Cheng RCH, Holland W (1997) Sensitivity of computer simulation experiments to errors in input data. J. Statist. Comput. Simulation 57:219–241.
Cheng RCH, Holland W (1998) Two-point methods for assessing variability in simulation output. J. Statist. Comput. Simulation 60:183–205.
Cheng RCH, Holland W (2004) Calculation of confidence intervals for simulation output. ACM Trans. Modeling Comput. Simulation 14:344–362.
Chick SE (1997) Bayesian analysis for simulation input and output. Andradottir S, Healy KJ, Withers DH, Nelson BL, eds. Proc. 1997 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 253–260.
Chick SE (1999) Steps to implement Bayesian input distribution selection. Farrington PA, Nembhard HB, Sturrock DT, Evans GW, eds. Proc. 1999 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 317–324.
Chick SE (2000) Bayesian methods for simulation. Joines JA, Barton RR, Kang K, Fishwick PA, eds. Proc. 2000 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 109–118.
Chick SE (2001) Input distribution selection for simulation experiments: Accounting for input uncertainty. Oper. Res. 49:744–758.
Chick SE, Ng SH (2002) Joint criterion for factor identification and parameter estimation. Yücesan E, Chen C-H, Snowdon JL, Charnes JM, eds. Proc. 2002 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 400–406.
Freimer M, Schruben LW (2002) Collecting data and estimating parameters for input distributions. Yücesan E, Chen C-H, Snowdon JL, Charnes JM, eds. Proc. 2002 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 392–399.
Henderson SG (2003) Input model uncertainty: Why do we care and what should we do about it? Chick SE, Sanchez PJ, Ferrin D, Morrice DJ, eds. Proc. 2003 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 90–100.
Kleijnen JPC (1987) Statistical Tools for Simulation Practitioners (Marcel Dekker Inc., New York).
Kleijnen JPC (1994) Sensitivity analysis versus uncertainty analysis: When to use what? Grasman J, van Straten G, eds. Predictability and Nonlinear Modelling in Natural Sciences and Economics (Kluwer, Dordrecht, the Netherlands), 322–333.
Law AM, Kelton WD (2000) Simulation Modeling and Analysis, 3rd ed. (McGraw-Hill, New York).
Lee S-H, Glynn PW (2003) Computing the distribution function of a conditional expectation via Monte Carlo: Discrete conditioning spaces. ACM Trans. Modeling Comput. Simulation 13:238–258.
Ng SH, Chick SE (2001) Reducing input parameter uncertainty for simulations. Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proc. 2001 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 364–371.
Santner TJ, Williams BJ, Notz WI (2003) The Design and Analysis of Computer Experiments (Springer, New York).
Shao J, Tu D (1995) The Jackknife and Bootstrap (Springer, New York).


Steckley SG, Henderson SG (2003) A kernel approach to estimating the density of a conditional expectation. Chick SE, Sanchez PJ, Ferrin D, Morrice DJ, eds. Proc. 2003 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 383–391.
Sun H, Farooq M (2002) Note on the generation of random points uniformly distributed in hyper-ellipsoids. Proc. Fifth Internat. Conf. Inform. Fusion, 489–496.
Zouaoui F, Wilson JR (2001a) Accounting for input model and parameter uncertainty in simulation. Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proc. 2001 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 290–299.
Zouaoui F, Wilson JR (2001b) Accounting for parameter uncertainty in simulation input modeling. Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proc. 2001 Winter Simulation Conf. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 354–363.
Zouaoui F, Wilson JR (2003) Accounting for parameter uncertainty in simulation input modeling. IIE Trans. 35:781–792.
Zouaoui F, Wilson JR (2004) Accounting for input-model and input-parameter uncertainties in simulation. IIE Trans. 36:1135–1151.
