
    Multi-output separable Gaussian process: Towards an efficient,

    fully Bayesian paradigm for uncertainty quantification

Ilias Bilionis a,b, Nicholas Zabaras a,b,*, Bledar A. Konomi c, Guang Lin c

a Center for Applied Mathematics, 657 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853, USA
b Materials Process Design and Control Laboratory, Sibley School of Mechanical and Aerospace Engineering, 101 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA
c Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MSIN K7-90, Richland, WA 99352, USA

Article info

    Article history:

    Received 29 July 2012

    Accepted 10 January 2013

    Available online 7 February 2013

    Keywords:

    Bayesian

    Gaussian process

    Uncertainty quantification

    Separable covariance function

    Surrogate models

    Stochastic partial differential equations

    Kronecker product

Abstract

    Computer codes simulating physical systems usually have responses that consist of a set of

    distinct outputs (e.g., velocity and pressure) that evolve also in space and time and depend

    on many unknown input parameters (e.g., physical constants, initial/boundary conditions,

    etc.). Furthermore, essential engineering procedures such as uncertainty quantification,

    inverse problems or design are notoriously difficult to carry out mostly due to the limited

    simulations available. The aim of this work is to introduce a fully Bayesian approach for

    treating these problems which accounts for the uncertainty induced by the finite number

    of observations. Our model is built on a multi-dimensional Gaussian process that explicitly

    treats correlations between distinct output variables as well as space and/or time. The

    proper use of a separable covariance function enables us to describe the huge covariance

matrix as a Kronecker product of smaller matrices, leading to efficient algorithms for carrying out inference and predictions. The novelty of this work is the recognition that the

    Gaussian process model defines a posterior probability measure on the function space of

    possible surrogates for the computer code and the derivation of an algorithmic procedure

    that allows us to sample it efficiently. We demonstrate how the scheme can be used in

    uncertainty quantification tasks in order to obtain error bars for the statistics of interest

    that account for the finite number of observations.

© 2013 Elsevier Inc. All rights reserved.

    1. Introduction

It is very common for a research group or a company to spend years developing sophisticated software in order to realistically simulate important physical phenomena. However, carrying out tasks like uncertainty quantification, model calibration or design using the full-fledged model is, in all but the simplest cases, a daunting task, since a single simulation

    might take days or even weeks to complete, even with state-of-the-art modern computing systems. One, then, has to resort

    to computationally inexpensive surrogates of the computer code. The idea is to run the solver on a small, well-selected set of

    inputs and then use these data to learn the response surface. The surrogate surface may be subsequently used to carry out

    any of the computationally intensive engineering tasks.


* Corresponding author at: Materials Process Design and Control Laboratory, Sibley School of Mechanical and Aerospace Engineering, 101 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA. Tel.: +1 607 255 9104.
E-mail address: [email protected] (N. Zabaras).
URL: http://mpdc.mae.cornell.edu/ (N. Zabaras).

Journal of Computational Physics 241 (2013) 212-239


    The engineering community and, in particular, the researchers in uncertainty quantification, have been making extensive

use of surrogates, even though most times it is not explicitly stated. One example is the so-called stochastic collocation (SC) method (see [1] for a classic illustration), in which the response is modeled using a generalized Polynomial Chaos (gPC) basis [2] whose coefficients are approximated via a collocation scheme based on a tensor product rule of one-dimensional Gauss quadrature points. Of course, such approaches scale extremely badly with the number of input dimensions, since the number of required collocation points explodes quite rapidly. A partial remedy can be found in sparse grids (SG) based on the Smolyak algorithm [3], which have a weaker dependence on the dimensionality of the problem (see [4-6] and the adaptive version developed by our group [7]). Despite the rigorous convergence results of all these methods,

    their applicability to the situation of very limited observations is questionable. In that case, a statistical approach seems

    more suitable.

    To the best of our knowledge, the first attempt of the statistics community to build a computer surrogate starts with the

seminal papers of Currin et al. [8] and, independently, Sacks et al. [9], both making use of Gaussian processes. In the same spirit are the subsequent paper by Currin et al. [10] and the work of Welch et al. [11]. One of the first applications to uncertainty quantification can be found in O'Hagan et al. [12] and Oakley and O'Hagan [13]. The problem of model calibration is considered in [14,15]. Refs. [16,17] model non-stationary responses, while [18,15] (in quite different ways) attempt to capture correlations between multiple outputs. Following these trends, we will consider a Bayesian scheme based on Gaussian processes.

Despite the apparent simplicity of the surrogate idea, there are still many hidden obstacles. Firstly, there is the question of choosing the design of inputs on which the full model is to be evaluated. It is generally admitted that a good starting point is a Latin hyper-cube design [19], because of its great coverage properties. However, it is more than obvious that this choice should be influenced by the task in which the surrogate will be used. For example, in uncertainty quantification, it makes sense to bias the design using the probability density of the inputs [17] so that highly probable regions are adequately explored. Furthermore, it also pays off to consider a sequential design that depends on what is already known about the surface. Such a procedure, known as active learning, is particularly suitable for Bayesian surrogates, since one may use the predictive variance as an indicative measure of the informational content of particular points in the input space [20]. Secondly, computer codes solving partial differential equations usually have responses that are multi-output functions of space and/or time. One can hope that explicitly modeling this fact may squeeze more information out of the observations. Finally, it is essential to be able to say something about the epistemic uncertainty induced by the limited number of observations and, in particular, about its effect on the task for which the surrogate is constructed.

In this work, we are mainly concerned with the second and the third points of the last paragraph. Even though we make use of active learning ideas in a completely different context (see Section 2.3), we will be assuming that the observations are simply given to us. Our first goal is the construction of a multi-output Gaussian process model that explicitly treats space and time (Section 2.1). This model, in its full generality, is extremely computationally demanding. In Section 2.2, we carefully develop the so-called separable model, which allows us to express the inference and prediction tasks using Kronecker products of small matrices. This, in turn, results in highly efficient computations. Finally, in Section 2.3, we apply our scheme to uncertainty quantification tasks. Contrary to other approaches, we recognize the fact that the predictive distribution of the Gaussian process conditional on the observations actually defines a probability measure on the function space of possible surrogates. This measure corresponds to the epistemic uncertainty induced by the limited data. Extending the ideas of [13], we develop a procedure that allows us to approximately sample this probability space. Each sample is a kernel approximation of a candidate surrogate for the code. In the same section, we show how we can semi-analytically compute all the statistics of the candidate surrogate up to second order. Higher-order statistics or even probability densities may be calculated quite effortlessly via a Monte Carlo procedure. By repeatedly sampling the posterior surrogate space, we are able to provide error bars for practically anything that depends on it. In Section 3.1, we apply our scheme to a stochastic ordinary differential equation with three distinct outputs and two random variables. The purpose of this example is to demonstrate the validity of our approach in a simple problem. In Section 3.2, we consider the problem of flow through random porous media. There, we model the velocity field and the pressure as a function of space and 50 random variables. In this more challenging problem, we clearly see the advantages of a fully Bayesian approach to uncertainty quantification. Namely, the ability to say something about the statistics of a 50-dimensional stochastic problem with as few as 24 observations is intriguing. Finally, we conclude by noting the limitations of the approach and discuss the many possibilities for extension.

    2. Methodology

We are interested in modeling computer codes returning a multi-output response y ∈ R^q, where q > 0 is the number of distinct outputs, given an input ξ ∈ X_ξ ⊂ R^{k_ξ}, defined over a spatial domain X_s ⊂ R^{k_s} and/or a time interval X_t = [0, T], where k_ξ is the number of inputs to the code, k_s is the spatial dimension (either 1, 2 or 3) and T > 0 is the time horizon. Even though for a given input ξ the code reports the response simultaneously at various spatial and time locations, we will be modeling it as a function f: X_ξ × X_s × X_t → R^q. As an example, you may consider the problem of two-dimensional flow in random porous media. The input variables ξ would represent the permeability field, while for a fixed ξ, f(ξ, x_s, t) would give the velocity components as well as the pressure at the spatial location x_s and time t (here q = 3).


where a(x) = (c(x, x_1; θ), ..., c(x, x_n; θ))^T ∈ R^n. If n > m + q (so that all the distributions involved are proper), it is possible to integrate out both B and Σ, resulting in the predictive distribution of f(·) conditional only on θ. It is a q-variate Student's-t process with n − m degrees of freedom (see [18]):

f(\cdot) \mid \theta, Y \sim \mathcal{T}_q\big( m^*(\cdot),\; c^*(\cdot,\cdot;\theta)\,\hat{\Sigma};\; n - m \big),   (8)

where

m^*(x) = \hat{B}^T h(x) + (Y - H\hat{B})^T A^{-1} a(x),

c^*(x_1, x_2; \theta) = c(x_1, x_2; \theta) - a(x_1)^T A^{-1} a(x_2) + \big[ h(x_1) - H^T A^{-1} a(x_1) \big]^T \big( H^T A^{-1} H \big)^{-1} \big[ h(x_2) - H^T A^{-1} a(x_2) \big],

\hat{B} = \big( H^T A^{-1} H \big)^{-1} H^T A^{-1} Y,

\hat{\Sigma} = \frac{1}{n - m} \big( Y - H\hat{B} \big)^T A^{-1} \big( Y - H\hat{B} \big).
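To make the linear algebra behind these formulas concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) that computes the weight estimate, the scale matrix and the predictive mean for given A, H, Y and user-supplied basis/correlation functions h(x) and a(x):

import numpy as np

def gls_estimates(A, H, Y):
    """B_hat and Sigma_hat as defined below Eq. (8)."""
    Ainv_H = np.linalg.solve(A, H)
    Ainv_Y = np.linalg.solve(A, Y)
    B_hat = np.linalg.solve(H.T @ Ainv_H, H.T @ Ainv_Y)
    R = Y - H @ B_hat
    Sigma_hat = R.T @ np.linalg.solve(A, R) / (A.shape[0] - H.shape[1])
    return B_hat, Sigma_hat

def predictive_mean(x, B_hat, A, H, Y, h, a):
    """m*(x) of Eq. (8); h(x) and a(x) are the basis and cross-correlation vectors."""
    resid = Y - H @ B_hat
    return B_hat.T @ h(x) + resid.T @ np.linalg.solve(A, a(x))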

2.1.4. The posterior distribution of θ

Let us conclude this section by giving the posterior distribution of the hyper-parameters of the correlation function. Using Bayes' theorem to write down the joint posterior for B, Σ and θ conditional on Y (combining Eqs. (3) and (4)) and integrating out B and Σ, we obtain:

p(\theta \mid Y) \propto p(\theta)\, |A|^{-q/2}\, |H^T A^{-1} H|^{-q/2}\, |\hat{\Sigma}|^{-(n-m)/2}.   (9)

    2.2. The separable model

It is apparent that the above mentioned model becomes computationally intractable quite fast, due to the fact that a high-dimensional and dense covariance matrix has to be inverted. An important simplification can be achieved if the spatial and the temporal points at which the output is observed remain fixed, independently of the input ξ, and if we assume that the correlation function is separable, i.e.:

c(x_1, x_2; \theta) := c_\xi(\xi_1, \xi_2; \theta_\xi)\, c_s(x_{s,1}, x_{s,2}; \theta_s)\, c_t(t_1, t_2; \theta_t),   (10)

where c_ξ(·,·; θ_ξ), c_s(·,·; θ_s) and c_t(·,·; θ_t) are the correlation functions of the parameter space, the spatial domain and the time domain, respectively, and θ = (θ_ξ, θ_s, θ_t). We will now show that, under these assumptions, the covariance matrix can be written as the Kronecker product of smaller covariance matrices. Using this observation, it is possible to carry out inference and make predictions without ever forming the full covariance matrix. Finally, we also assume that the hyper-parameters of the various covariance functions are a priori independent:

p(\theta) = p(\theta_\xi)\, p(\theta_s)\, p(\theta_t).   (11)

Remark 1. Another, more general, model for the covariance function is the linear model of coregionalization (LMC) [24-26].

    The more general nature of this covariance function does not necessarily make it more attractive for the applications of

    interest. The introduction of such models is usually associated with higher computational cost which we try to avoid in this

    paper.

2.2.1. Organizing the inputs

Let us consider how the data are collected from a computer code. For a parameter ξ ∈ X_ξ, the computer code returns the (multi-output) response on a given (a priori known) set of n_s spatial points X_s = (x_{s,1}, ..., x_{s,n_s})^T ∈ R^{n_s × k_s}, where k_s = 1, 2 or 3 is the spatial dimension (X_s ⊂ R^{k_s}), at each one of the n_t time steps X_t = (t_1, ..., t_{n_t}) ∈ R^{n_t × 1}. That is, a single choice of the parameter ξ generates a total of n_s n_t training samples. Therefore, the response of the code is a matrix Y_ξ ∈ R^{n_s n_t × q}, which we call the output matrix. The output matrix is assembled as follows:

Y_\xi = \big( y_{\xi,1}^T \cdots y_{\xi,n_s}^T \big)^T,

where each y_{ξ,i} ∈ R^{n_t × q} is the response at the spatial point x_{s,i} at each time step, that is:

y_{\xi,i} = \big( y_{\xi,i,1} \cdots y_{\xi,i,n_t} \big)^T,


where y_{ξ,i,j} ∈ R^q is the response at the spatial point x_{s,i} at time t_j:

y_{\xi,i,j} = \big( y_{\xi,i,j,1} \cdots y_{\xi,i,j,q} \big)^T,

where, of course, y_{ξ,i,j,l} is the l-th output of the response at x_{s,i} at t_j.
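To make the bookkeeping concrete, the following minimal NumPy sketch (names are ours, purely illustrative) assembles the output matrix Y_ξ from a raw array of code outputs indexed by spatial point, time step and output:

import numpy as np

def assemble_output_matrix(raw):
    """raw[i, j, l] = l-th output at spatial point i and time step j.

    Returns Y_xi of shape (ns * nt, q): spatial blocks stacked on top of each
    other, each block holding the nt time steps of one spatial point.
    """
    ns, nt, q = raw.shape
    return raw.reshape(ns * nt, q)

# toy example: 4 spatial points, 3 time steps, 2 outputs
raw = np.random.rand(4, 3, 2)
Y_xi = assemble_output_matrix(raw)
assert Y_xi.shape == (12, 2)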

2.2.2. Separating the covariance matrices

If we take a total of n_ξ samples of the parameters:

X_\xi = \big( \xi_1, \ldots, \xi_{n_\xi} \big)^T \in \mathbb{R}^{n_\xi \times k_\xi},

where k_ξ is the dimension of the input variables ξ (X_ξ ⊂ R^{k_ξ}), we will have a total of

n = n_\xi\, n_s\, n_t

training samples for our model. The covariance matrix A ∈ R^{n × n} can now be written as:

A = A_\xi \otimes A_s \otimes A_t,   (12)

where A_ξ ∈ R^{n_ξ × n_ξ} is the covariance matrix generated by X_ξ and c_ξ(·,·; θ_ξ), A_s ∈ R^{n_s × n_s} is the covariance matrix generated by X_s and c_s(·,·; θ_s), A_t ∈ R^{n_t × n_t} is the covariance matrix generated by X_t and c_t(·,·; θ_t), and ⊗ denotes the Kronecker product.
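As a sanity check of Eq. (12), a short NumPy sketch (ours, with a hypothetical squared-exponential kernel) builds the three small covariance matrices and, purely for illustration, forms the full Kronecker-product covariance; in practice the full matrix is never assembled:

import numpy as np

def sq_exp(X1, X2, ell):
    """Squared-exponential correlation between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 / ell ** 2).sum(-1)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(0)
X_xi, X_s, X_t = rng.random((5, 2)), rng.random((4, 2)), rng.random((3, 1))

A_xi = sq_exp(X_xi, X_xi, 0.7)
A_s = sq_exp(X_s, X_s, 0.3)
A_t = sq_exp(X_t, X_t, 0.5)

# Full covariance on the tensor grid (xi slowest, then space, then time),
# formed here only to check dimensions against Eq. (12).
A_full = np.kron(A_xi, np.kron(A_s, A_t))
assert A_full.shape == (5 * 4 * 3, 5 * 4 * 3)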

2.2.3. Separating the design matrices

Now let us consider the basis functions used in the generalized linear model of Eq. (2). Suppose we wish to use m_t basis functions to capture the time dependence of the mean:

\mathcal{H}_t = \{ h_{t,1}(t), \ldots, h_{t,m_t}(t) \}.

We also choose m_s basis functions to capture the spatial dependence of the mean:

\mathcal{H}_s = \{ h_{s,1}(x_s), \ldots, h_{s,m_s}(x_s) \}.

These can be, for example, the finite element basis of the model or any other suitable functions. Finally, we choose m_ξ basis functions to capture the dependence of the mean on the stochastic parameter:

\mathcal{H}_\xi = \{ h_{\xi,1}(\xi), \ldots, h_{\xi,m_\xi}(\xi) \}.

For example, in an uncertainty quantification setting these could be a gPC basis, as induced by the probability distribution of the ξ's. The global basis functions are formed from the tensor product:

\mathcal{H} = \mathcal{H}_\xi \otimes \mathcal{H}_s \otimes \mathcal{H}_t.

Thus, the total number of basis functions present in the model is:

m = m_\xi\, m_s\, m_t.

In order to have a consistent enumeration, we proceed as follows:

h_1(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,1}(t),
h_2(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,2}(t),
\vdots
h_{m_t}(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,m_t}(t),
h_{m_t + 1}(x) := h_{\xi,1}(\xi)\, h_{s,2}(x_s)\, h_{t,1}(t),
\vdots
h_{m_s m_t (i-1) + m_t (j-1) + l}(x) := h_{\xi,i}(\xi)\, h_{s,j}(x_s)\, h_{t,l}(t),

where, in the last line, i = 1, ..., m_ξ, j = 1, ..., m_s and l = 1, ..., m_t. With this enumeration, the design matrix H defined in Eq. (5) breaks down as:

H = H_\xi \otimes H_s \otimes H_t,   (13)

where H_ξ ∈ R^{n_ξ × m_ξ} is given by

(H_\xi)_{ij} = h_{\xi,j}(\xi_i),

H_s ∈ R^{n_s × m_s} by

(H_s)_{ij} = h_{s,j}(x_{s,i})


and H_t ∈ R^{n_t × m_t} by

(H_t)_{ij} = h_{t,j}(t_i).
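The enumeration above is simply the last-index-fastest flattening of the triple index (i, j, l); a tiny helper (ours, purely illustrative) makes that explicit:

def global_basis_index(i, j, l, m_s, m_t):
    """Map the 1-based triple (i, j, l) to the global index of h_{xi,i} h_{s,j} h_{t,l}."""
    return m_s * m_t * (i - 1) + m_t * (j - 1) + l

# with m_s = 2, m_t = 3: (1, 1, 1) -> 1, (1, 2, 1) -> 4, (2, 1, 1) -> 7
assert global_basis_index(1, 2, 1, 2, 3) == 4
assert global_basis_index(2, 1, 1, 2, 3) == 7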

    2.2.4. Efficient predictions and inference

Given a set of hyper-parameters θ, all the statistics that are required to make predictions or to evaluate the posterior of θ can be calculated efficiently by exploiting the properties of the Kronecker product. Its most important property is that various factorizations (e.g. Cholesky or QR) of a matrix formed by Kronecker products are given by the Kronecker products of the factorizations of the individual matrices [27]. Furthermore, matrix-vector multiplications, as well as solving linear systems when the matrices forming the Kronecker product are triangular, can be carried out without additional memory. Therefore, working consistently with the Cholesky decompositions of the covariance matrices leads to very efficient computations. All the linear algebra details pertaining to efficient computations with Kronecker products are documented in Appendices A and B.
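As a concrete illustration of why the separable structure pays off, the following sketch (ours, not the authors' code) solves A x = b with A = A_ξ ⊗ A_s ⊗ A_t by working only with the small Cholesky factors, never forming the n × n matrix:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kron_cho_solve(factors, b):
    """Solve (A1 kron A2 kron A3) x = b using only the small Cholesky factors.

    factors : list of cho_factor(A_k) results, one per Kronecker factor
    b       : right-hand side of length n1*n2*n3, ordered consistently with the kron
    """
    dims = [f[0].shape[0] for f in factors]
    X = b.reshape(dims)
    for k, f in enumerate(factors):          # apply A_k^{-1} along mode k
        X = np.moveaxis(X, k, 0)
        shape = X.shape
        X = cho_solve(f, X.reshape(shape[0], -1)).reshape(shape)
        X = np.moveaxis(X, 0, k)
    return X.reshape(-1)

# usage: factors = [cho_factor(A_xi), cho_factor(A_s), cho_factor(A_t)]
#        x = kron_cho_solve(factors, b)

The key point is that the cost and memory scale with the sizes of the individual factors, which is exactly what makes the separable model tractable.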

The posterior of the hyper-parameters (see Eq. (9)) can now be sampled efficiently via Gibbs sampling [28], as described in Algorithm 1. Each one of the steps can be carried out using MCMC [29,30]. The prior distributions, as well as the proposal distributions for θ_ξ, θ_s and θ_t, are given below.

Algorithm 1. Sampling the posterior distribution

Require: Observed data X and Y and initial θ = (θ_ξ, θ_s, θ_t).
Ensure: Repeated application ensures that θ = (θ_ξ, θ_s, θ_t) is a sample from Eq. (9).

Sample:

\theta_\xi \sim p(\theta_\xi \mid Y, \theta_s, \theta_t) \propto p(\theta_\xi)\, |A_\xi|^{-qn/(2n_\xi)}\, |H_\xi^T A_\xi^{-1} H_\xi|^{-qm/(2m_\xi)}\, |\hat{\Sigma}|^{-(n-m)/2}.

Sample:

\theta_s \sim p(\theta_s \mid Y, \theta_\xi, \theta_t) \propto p(\theta_s)\, |A_s|^{-qn/(2n_s)}\, |H_s^T A_s^{-1} H_s|^{-qm/(2m_s)}\, |\hat{\Sigma}|^{-(n-m)/2}.

Sample:

\theta_t \sim p(\theta_t \mid Y, \theta_\xi, \theta_s) \propto p(\theta_t)\, |A_t|^{-qn/(2n_t)}\, |H_t^T A_t^{-1} H_t|^{-qm/(2m_t)}\, |\hat{\Sigma}|^{-(n-m)/2}.
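The Gibbs scan of Algorithm 1, with a Metropolis step inside each block, can be sketched as follows (our own illustration; log_cond stands for the unnormalized log conditional densities displayed above and is assumed to be supplied by the user):

import numpy as np

def metropolis_step(block, log_cond, step=0.01, rng=None):
    """One log-normal random-walk update of a positive hyper-parameter block."""
    rng = rng or np.random.default_rng()
    proposal = block * np.exp(step * rng.standard_normal(block.shape))
    # the log-normal proposal is not symmetric in the block itself: add the Jacobian term
    log_ratio = (log_cond(proposal) - log_cond(block)
                 + np.log(proposal).sum() - np.log(block).sum())
    return proposal if np.log(rng.random()) < log_ratio else block

def gibbs_scan(theta, log_cond, rng=None):
    """One pass of Algorithm 1: update theta['xi'], theta['s'], theta['t'] in turn.

    log_cond(name, value, theta) returns the unnormalized log conditional posterior
    of block `name` set to `value`, with the other blocks taken from `theta`.
    """
    for name in ("xi", "s", "t"):
        theta[name] = metropolis_step(theta[name],
                                      lambda v: log_cond(name, v, theta), rng=rng)
    return theta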

2.2.5. Choice of the covariance function

The separable model described in this section requires the specification of three covariance functions, c_ξ(·,·; θ_ξ), c_s(·,·; θ_s) and c_t(·,·; θ_t). In this work, we choose to work with:

c_\xi(\xi_{n_1}, \xi_{n_2}; \theta_\xi) := \exp\Big\{ -\frac{1}{2} \sum_{k=1}^{k_\xi} \frac{(\xi_{n_1,k} - \xi_{n_2,k})^2}{r_{\xi,k}^2} \Big\} + g_\xi\, \delta_{n_1 n_2},

c_s(x_{s,n_1}, x_{s,n_2}; \theta_s) := \exp\Big\{ -\frac{1}{2} \sum_{k=1}^{k_s} \frac{(x_{s,n_1,k} - x_{s,n_2,k})^2}{r_{s,k}^2} \Big\} + g_s\, \delta_{n_1 n_2},

c_t(t_{n_1}, t_{n_2}; \theta_t) := \exp\Big\{ -\frac{1}{2} \frac{(t_{n_1} - t_{n_2})^2}{r_t^2} \Big\} + g_t\, \delta_{n_1 n_2},

with the hyper-parameters completely specified by:

\theta_\xi = (r_\xi, g_\xi), \quad \theta_s = (r_s, g_s) \quad \text{and} \quad \theta_t = (r_t, g_t).

The core part of the covariance functions is based on the Squared Exponential (SE) kernel, with the r_a, a = ξ, s, t, hyper-parameters being interpreted as the length scales of each input dimension. The g_a, a = ξ, s, t, are termed nuggets. The main purpose of the nuggets is to ensure the well-conditioning of the covariance matrices involved in the calculations, and they are expected to be typically small (of the order of 10^{-2}). By looking at the full covariance function of the separable model and ignoring second-order products of the nuggets, we can see that g_ξ + g_s + g_t can be interpreted as the measurement noise. Apart from improving the stability of the computations, one can argue that the presence of the nugget can also lead to better predictive accuracy [31].


The priors of the hyper-parameters r_ξ, g_ξ, r_s, g_s, r_t and g_t should be chosen to represent any prior knowledge about the computer code that might be available. In order to ensure positive support, we make the common choice, for a = ξ, s and t:

p(r_{a,k} \mid c_a) = \mathcal{E}(r_{a,k} \mid c_a),   (14)

p(g_a \mid \zeta_a) = \mathcal{E}(g_a \mid \zeta_a),   (15)

where \mathcal{E}(\cdot \mid \lambda) denotes the probability density of the exponential distribution with parameter λ > 0.

For the proposals required by the MCMC sampling schemes described in the previous section, we use a log-normal random walk for all hyper-parameters (again because they are all positive). The step size of the random walk is selected so that the observed acceptance rate of the MCMC is between 30% and 60%.

The particular values of c_a and ζ_a for a = ξ, s and t are specified in each numerical example.
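As an illustration of Eqs. (14)-(15), drawing a hyper-parameter configuration from these priors takes one line per block; the rate values below are the ones used later in Section 3.1 and are otherwise arbitrary:

import numpy as np

rng = np.random.default_rng(0)

def sample_exponential_prior(k, c=1 / 0.05, zeta=1e6):
    """Draw k length scales r and a nugget g from the priors of Eqs. (14)-(15).

    NumPy's `scale` argument is the mean 1/lambda, so scale = 1/c for E(. | c).
    """
    r = rng.exponential(scale=1.0 / c, size=k)   # mean length scale 0.05
    g = rng.exponential(scale=1.0 / zeta)        # mean nugget 1e-6
    return r, g

r_xi, g_xi = sample_exponential_prior(k=2)       # e.g. the KO-2 problem with k_xi = 2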

    2.3. Application to uncertainty quantification

In uncertainty quantification tasks, one specifies a probability density on the inputs ξ, p(ξ), and tries to quantify the probability measure induced by it on the output field. In this work, we quantify this uncertainty by interrogating the surrogate built using the Gaussian process model introduced in the previous sections (see Eq. (8)). The whole process is complicated by the fact that our model in reality defines a probability measure over the function space of potential surrogates. This probability measure essentially quantifies the lack of information regarding the real response due to the finite number of observations. In a fully Bayesian setting, this probability measure will be reflected as a probability measure on the predicted statistics (e.g. mean, variance, PDFs, etc.). To the best of our knowledge, such ideas were introduced for the first time in the statistics literature in [13], but were largely ignored by the UQ community. Inspired by the above mentioned work, we describe in this section how our model can be used to essentially sample the posterior distribution of the induced statistics. The procedure is conceptually simple and is described in Algorithm 2. The key component of this algorithm is the ability to sample a response surface based on Eq. (8) that can be described analytically via a kernel representation. This is achieved through a generalization of the techniques discussed in [13]. The final component of the algorithm has to do with the evaluation of the statistics of interest induced by this response surface. We will show that our model allows for semi-analytic calculation of all statistics up to second order. Higher-order statistics, or full probability densities, have to be obtained using Monte Carlo techniques on the sampled surrogate surface.

Algorithm 2. Sampling the posterior of the statistics. By repeatedly calling this algorithm, error bars for the desired statistics may be obtained.

Require: Observed data X and Y, and θ_0, a sample from Eq. (9).
Ensure: S is a sample from the statistic of interest.

Sample a new θ_1 following the Gibbs procedure given in Section 2.2.
Sample a response surface using Algorithm 3.
Interrogate the obtained response surface (analytically or via MC) to obtain S.
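In code, the outer loop of Algorithm 2 is just repeated composition of the pieces above; the following skeleton (ours, with the three callables left as assumptions supplied by the user) shows the structure:

def sample_statistic_posterior(theta, gibbs_scan, sample_surface, statistic, n_draws=100):
    """Draw n_draws samples of a statistic S, one per sampled surrogate (Algorithm 2).

    gibbs_scan(theta)      -> new hyper-parameter sample (Algorithm 1)
    sample_surface(theta)  -> callable surrogate xi -> outputs (Algorithm 3)
    statistic(surface)     -> the statistic of interest (mean, variance, PDF, ...)
    """
    draws = []
    for _ in range(n_draws):
        theta = gibbs_scan(theta)          # refresh the hyper-parameters
        surface = sample_surface(theta)    # one candidate surrogate
        draws.append(statistic(surface))   # interrogate it
    return draws

The spread of the returned draws is exactly what produces the error bars reported in Section 3.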

    2.3.1. Sampling a response surface

In order to obtain an analytical representation of the response surface, Ref. [13] suggests selecting a space-filling design of the input variables, using Eq. (8) to sample the outputs, and then augmenting the original data set with the new observations to derive an updated Eq. (8) with reduced variance. The mean of the updated posterior predictive distribution is an analytic function that can be thought of (if its variance is sufficiently small) as a sample from the predictive probability measure. Several problems arise if one follows this approach. To start with, one does not know a priori how many design points are required in order to reduce the predictive variance to a pre-specified tolerance. Furthermore, design points must be well placed and far away from the initial observations in order to avoid numerical instabilities. Finally (and this is particular to our model), including design points in all sets of different inputs (ξ, x_s and t) breaks down the Kronecker product representation of the covariance and design matrices which, in turn, leads to a tremendous computational burden. In order to avoid the latter of these conundrums, we choose to work with the same spatial and time points as the ones included in the original data (namely X_s and X_t). This approximation will ignore only a (hopefully small) part of the epistemic uncertainty due to the finite number of observations. That is, we only choose design points in X_ξ. The former two problems are addressed by employing a sequential strategy in which the ξ's are selected one by one, by maximizing the predictive variance until a specific tolerance is achieved. This approach is guaranteed to produce a space-filling design that is well separated from the original observations. In addition, the only covariance matrix that needs to be updated is the one pertaining to ξ. In Appendix C, we describe how the Cholesky decomposition of the covariance matrix, as well as solutions to the relevant linear systems, can be updated in quadratic time when new design points are added.


Consider θ fixed, and let X_{ξ,d} ∈ R^{(n_ξ + d) × k_ξ} and Y_d ∈ R^{(n_ξ + d) n_s n_t × q} denote the set of ξ's and the corresponding outputs when d design points have been observed. That is,

X_{\xi,d} = \begin{pmatrix} X_\xi \\ \xi_{n_\xi + 1} \\ \vdots \\ \xi_{n_\xi + d} \end{pmatrix}.

For d = 0, we obtain the observed data:

X_{\xi,0} = X_\xi \quad \text{and} \quad Y_0 = Y.

Define B_d ∈ R^{m × q}, H_{ξ,d} ∈ R^{(n_ξ + d) × m_ξ} and A_{ξ,d} ∈ R^{(n_ξ + d) × (n_ξ + d)} to be the weight, design and covariance matrices pertaining to ξ, respectively, when X_{ξ,d} and Y_d have been observed. In order to avoid cluttering the final formulas, let us also define:

a_{\xi,d}(\xi) = \begin{pmatrix} a_\xi(\xi) \\ c_\xi(\xi_{n_\xi + 1}, \xi; \theta_\xi) \\ \vdots \\ c_\xi(\xi_{n_\xi + d}, \xi; \theta_\xi) \end{pmatrix} \in \mathbb{R}^{n_\xi + d},

A_d = A_{\xi,d} \otimes A_s \otimes A_t

and

H_d = H_{\xi,d} \otimes H_s \otimes H_t.

Now, let ξ ∈ X_ξ and Z_ξ ∈ R^{n_s n_t × q} be the output at ξ and all spatial and time points in X_s and X_t. By using Bayes' theorem and Eq. (8), we can show that:

Z_\xi \mid Y_d, \theta \sim \mathcal{T}_{n_s n_t \times q}\big( M_d(\xi),\, C_d(\xi),\, \hat{\Sigma};\; n_d - m \big),   (16)

where n_d = (n_ξ + d) n_s n_t and the mean is given by:

M_d(\xi) = \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) B_d + \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} \big( Y_d - H_d B_d \big)   (17)

and the covariance matrix by:

C_d(\xi) = c_\xi(\xi, \xi; \theta_\xi)\, (A_s \otimes A_t) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)
+ \big[ \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} H_d \big] \big( H_d^T A_d^{-1} H_d \big)^{-1} \big[ \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} H_d \big]^T.   (18)

In order to sample Eq. (16), we need to compute the Cholesky decomposition of C_d(ξ). This is not trivial, since C_d(ξ) does not have any particular structure. In the numerical examples, and in particular for the porous flow problem considered in Section 3.2, this matrix turned out to be extremely ill-conditioned. Even though, theoretically, C_d(ξ) is guaranteed to be symmetric positive definite, numerically it must be treated as positive semi-definite. For this reason, one has to use a low-rank approximation of C_d(ξ) obtained with the pivoted Cholesky factorization [32]. This can be carried out using the LAPACK routine dpstrf. The tolerance we used for this approximation was 10^{-3} for all numerical examples we considered. We found no observable difference between samples obtained with the normal Cholesky factorization and this approach.
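For illustration, a low-rank factor of a numerically semi-definite covariance can also be computed with a few lines of plain NumPy; the routine below is our own stand-in for LAPACK's dpstrf, adequate for drawing samples from Eq. (16):

import numpy as np

def pivoted_cholesky(C, tol=1e-3):
    """Low-rank pivoted Cholesky of a symmetric positive semi-definite matrix.

    Returns L with C ~= L @ L.T; iteration stops once the largest remaining
    diagonal of the residual drops below `tol`.
    """
    n = C.shape[0]
    d = np.diag(C).astype(float).copy()     # diagonal of the current residual
    L = np.zeros((n, n))
    for k in range(n):
        i = int(np.argmax(d))
        if d[i] <= tol:
            return L[:, :k]
        L[:, k] = (C[:, i] - L[:, :k] @ L[i, :k]) / np.sqrt(d[i])
        d -= L[:, k] ** 2
    return L

# usage sketch: a matrix-normal approximation of a draw from Eq. (16) is
#   Z = M_d + L @ rng.standard_normal((L.shape[1], q)) @ np.linalg.cholesky(Sigma_hat).T
# with row covariance L L^T ~ C_d(xi) and column covariance Sigma_hat.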

Finally, let us mention that a scalar quantity that is associated directly with the uncertainty pertaining to ξ is given by:

\sigma_d^2(\xi) = \frac{\operatorname{tr} C_d(\xi)\, \operatorname{tr} \hat{\Sigma}}{n_s n_t q}\, p(\xi).   (19)

This is the total variance of all outputs at all the different spatial and time points (normalized by n_s n_t q) and weighted by the input probability distribution p(ξ). The idea is to sequentially augment the data set by including the design points from a dense subset X_ξ^* of X_ξ that maximize Eq. (19), until a pre-specified tolerance is achieved. At that point, the joint predictive mean given by Eq. (17) may be used as an analytic sample surrogate surface. In general, we would like to evaluate the response surface on a denser spatial design X_s^* ∈ R^{n_s^* × k_s} and/or more time steps X_t^* ∈ R^{n_t^*}. The joint predictive mean at those points is given by:

M_d^*(\xi) = \big( h_\xi^T(\xi) \otimes H_s^* \otimes H_t^* \big) B_d + \big( a_{\xi,d}^T(\xi) \otimes A_s^* \otimes A_t^* \big) A_d^{-1} \big( Y_d - H_d B_d \big),   (20)


where H_s^* ∈ R^{n_s^* × m_s} and H_t^* ∈ R^{n_t^* × m_t} are the design matrices that pertain to the test spatial and time points, respectively, while A_s^* ∈ R^{n_s^* × n_s} and A_t^* ∈ R^{n_t^* × n_t} are the corresponding cross-covariance matrices. We identify Eq. (20) as a sample response surface from the function space of possible surrogates. The complete algorithmic details are given in Algorithm 3.

Algorithm 3. Sample a response surface

Require: Observed data X and Y, θ sampled from Eq. (9), a dense set of design points X_ξ^* = {ξ_1^*, ..., ξ_{n_ξ^*}^*}, the desired final tolerance δ > 0, and dense spatial and time designs X_s^* ∈ R^{n_s^* × k_s} and X_t^* ∈ R^{n_t^*} on which we wish to make predictions.
Ensure: After d ≥ 1 steps, the uncertainty of Eq. (16), as captured by Eq. (19), is less than δ, and M_d^*(ξ) given by Eq. (20) can be used as an analytic representation of the sampled response surface.

Initialize d = 0.
repeat
  Find the next design point:

  \xi_{n_\xi + d + 1} = \arg\max_{\xi \in X_\xi^*} \sigma_d^2(\xi).   (21)

  Sample Z_{ξ_{n_ξ + d + 1}} from Eq. (16).
  Augment the set of observations with the pair (ξ_{n_ξ + d + 1}, Z_{ξ_{n_ξ + d + 1}}).
  d ← d + 1.
until σ_d^2(ξ_{n_ξ + d}) < δ.
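A stripped-down version of the sequential augmentation loop looks as follows (our sketch; predictive_var, sample_output and augment stand for Eqs. (19), (16) and the data-update step, and are assumed to be supplied):

import numpy as np

def sample_response_surface(Xi_star, predictive_var, sample_output, augment, delta=1e-2):
    """Algorithm 3, schematically: add design points at the maximum of Eq. (19)
    until the predictive uncertainty drops below `delta`.

    Xi_star        : dense candidate set of xi design points, shape (N, k_xi)
    predictive_var : xi -> sigma_d^2(xi) for the current augmented data set
    sample_output  : xi -> Z_xi, a draw from Eq. (16) at xi
    augment        : (xi, Z_xi) -> None, appends the pair to the data set
    """
    while True:
        variances = np.array([predictive_var(xi) for xi in Xi_star])
        j = int(np.argmax(variances))
        if variances[j] < delta:
            break
        augment(Xi_star[j], sample_output(Xi_star[j]))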

2.3.2. Analytic first-order and second-order statistics

In applications, we are usually interested in first- and second-order statistics. We can obtain a sample of the mean response by integrating out ξ from Eq. (20):

M_d^* := \int M_d^*(\xi)\, p(\xi)\, d\xi.   (22)

It can be easily shown that:

M_d^* = \big( \varepsilon_h^T \otimes H_s^* \otimes H_t^* \big) B_d + \big( \varepsilon_{a,d}^T \otimes A_s^* \otimes A_t^* \big) A_d^{-1} \big( Y_d - H_d B_d \big),   (23)

where

\varepsilon_h = \int h_\xi(\xi)\, p(\xi)\, d\xi \quad \text{and} \quad \varepsilon_{a,d} = \int a_{\xi,d}(\xi)\, p(\xi)\, d\xi.

Now, let i, j ∈ {1, ..., q} be two arbitrary outputs. The covariance matrix between all possible spatial and time test points is defined by:

C_{ij,d}^* := \int \big( M_{d,i}^*(\xi) - M_{d,i}^* \big) \big( M_{d,j}^*(\xi) - M_{d,j}^* \big)^T p(\xi)\, d\xi,   (24)

where the subscripts i and j select columns of the associated matrices. This matrix contains all second-order statistics of the surrogate. For example, the variance of each output i = 1, ..., q on all spatial and time locations X_s^* and X_t^*, respectively, is given by:

V_{i,d}^* := \operatorname{diag}\big( C_{ii,d}^* \big).   (25)

It can be shown, using tensorial notation, that C_{ij,d}^* may be evaluated by:

C_{ij,d}^* = \big( H_s^* \otimes H_t^* \big) B_d^i\, \nu_{hh}\, \big[ \big( H_s^* \otimes H_t^* \big) B_d^j \big]^T
+ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^i\, \nu_{aa,d}\, \big[ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^j \big]^T
+ \big( H_s^* \otimes H_t^* \big) B_d^i\, \nu_{ha,d}\, \big[ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^j \big]^T
+ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^i\, \nu_{ah,d}\, \big[ \big( H_s^* \otimes H_t^* \big) B_d^j \big]^T,   (26)

where B_d^i ∈ R^{m_s m_t × m_ξ} is such that vec(B_d^i) = B_{d,i} (i.e. the i-th column of B_d), \tilde{Y}_d ∈ R^{n_d × q} is defined by:

\tilde{Y}_d = A_d^{-1} \big( Y_d - H_d B_d \big),

\tilde{Y}_d^i ∈ R^{n_s n_t × (n_ξ + d)} is such that vec(\tilde{Y}_d^i) = \tilde{Y}_{d,i}, and \nu_{hh} ∈ R^{m_ξ × m_ξ} is given by:


\nu_{hh} = \int \big( h_\xi(\xi) - \varepsilon_h \big) \big( h_\xi(\xi) - \varepsilon_h \big)^T p(\xi)\, d\xi,   (27)

\nu_{aa,d} ∈ R^{(n_ξ + d) × (n_ξ + d)} by:

\nu_{aa,d} = \int \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big) \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big)^T p(\xi)\, d\xi,   (28)

\nu_{ha,d} ∈ R^{m_ξ × (n_ξ + d)} by:

\nu_{ha,d} = \int \big( h_\xi(\xi) - \varepsilon_h \big) \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big)^T p(\xi)\, d\xi,   (29)

and \nu_{ah,d} = \nu_{ha,d}^T.
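The integrals in Eqs. (22)-(29) are over p(ξ) only; when they are not available in closed form, they can be estimated once by plain Monte Carlo, as in the sketch below (ours; h_xi and a_xi_d stand for the basis and cross-correlation functions of the text and are assumed to be supplied):

import numpy as np

def moment_matrices(h_xi, a_xi_d, sample_xi, n_mc=10000):
    """Monte Carlo estimates of eps_h, eps_a, nu_hh, nu_aa and nu_ha (Eqs. (22)-(29))."""
    xis = sample_xi(n_mc)                       # draws from p(xi)
    H = np.array([h_xi(x) for x in xis])        # (n_mc, m_xi)
    A = np.array([a_xi_d(x) for x in xis])      # (n_mc, n_xi + d)
    eps_h, eps_a = H.mean(0), A.mean(0)
    Hc, Ac = H - eps_h, A - eps_a
    nu_hh = Hc.T @ Hc / n_mc
    nu_aa = Ac.T @ Ac / n_mc
    nu_ha = Hc.T @ Ac / n_mc
    return eps_h, eps_a, nu_hh, nu_aa, nu_ha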

Fig. 1. KO-2: the thick blue line is the mean of the statistic predicted by our model, while the gray area provides 95% confidence intervals. The first row ((a) and (b)) corresponds to the mean of the response as captured with n_ξ = 100. The second ((c) and (d)) and the last ((e) and (f)) rows show the variance of the response for n_ξ = 100 and n_ξ = 150, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 2. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the PDF of y2(t) for times t = 4, 6, 8, 10. The thick blue line is the mean of the PDF predicted by our model, while the gray area provides 95% confidence intervals. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


3. Numerical examples

3.1. Kraichnan-Orszag three-mode problem

Consider the system of ordinary differential equations [33]:

Fig. 3. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the joint PDF of y2(t) and y3(t) for times t = 4, 6, 8, 10.


dy_1/dt = y_1 y_3,
dy_2/dt = -y_2 y_3,
dy_3/dt = -y_1^2 + y_2^2,

subject to random initial conditions at t = 0. The stochastic initial conditions are defined by:

y_1(0) = 1, \quad y_2(0) = 0.1\,\xi_1, \quad y_3(0) = \xi_2,

where

\xi_i \sim U(-1, 1), \quad i = 1, 2.

Fig. 4. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the predictive variance of the joint PDF of y2(t) and y3(t) for times t = 4, 6, 8, 10.

This dynamical system is particularly interesting because the response has a discontinuity at ξ_1 = 0. The deterministic solver we use is a fourth-order Runge-Kutta method, as implemented in the GNU Scientific Library [34].
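For readers who want to reproduce the setup without GSL, an equivalent solve in Python (our sketch, using SciPy's explicit Runge-Kutta integrator rather than the authors' GSL code) is:

import numpy as np
from scipy.integrate import solve_ivp

def ko_rhs(t, y):
    """Right-hand side of the Kraichnan-Orszag three-mode problem."""
    y1, y2, y3 = y
    return [y1 * y3, -y2 * y3, -y1**2 + y2**2]

def ko_solve(xi, t_eval):
    """Solve the KO system for initial conditions parameterized by xi in [-1, 1]^2."""
    y0 = [1.0, 0.1 * xi[0], xi[1]]
    sol = solve_ivp(ko_rhs, (0.0, 10.0), y0, t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.y.T                            # shape (n_t, 3): three outputs per time step

t_eval = np.linspace(0.0, 10.0, 20)           # 20 equidistant time steps, as in the text
Y = ko_solve(np.array([0.3, -0.7]), t_eval)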

As is apparent, the input variables ξ represent the initial conditions. We will consider the case of two input dimensions, i.e. k_ξ = 2. The output consists of three distinct variables (q = 3) that are functions of time (k_s = 0). For convenience, we choose to work with a constant prior mean by selecting:

h_\xi(\xi) = 1 \quad \text{and} \quad h_t(t) = 1.

That is, m_ξ = 1 and m_t = 1. We fix n_ξ and gather the input data X_ξ ∈ R^{n_ξ × k_ξ} from a Latin hyper-cube design [35]. We solve the system for the time interval [0, 10] and record the response at 20 equidistant time steps, i.e. X_t ∈ R^{n_t} with n_t = 20. Both X_ξ and X_t are scaled to [0, 1]. The priors are specified by selecting:

c_a = 1/0.05 \quad \text{and} \quad \zeta_a = 10^6, \quad \text{for } a = \xi, t.

This means that we a priori assume that the mean of all length scales is 0.05 and the mean of all the nuggets is 10^{-6}. We train our model for n_ξ = 70, 100 and 150 observations by sampling the posterior of θ = (r_ξ, g_ξ, r_t, g_t) given in Eq. (9), following the Gibbs-MCMC procedure described in Algorithm 1. To initialize the Markov chain, we sample the prior (Eqs. (14) and (15)) of the hyper-parameters 100 times and set θ_0 equal to the sample with maximum posterior probability as defined by Eq. (9). The proposals are selected to be log-normal random walks, and the step size (the same for all types of inputs) is set to 0.01. The chain is well mixed after about 500 iterations of the Gibbs scheme.

After the Markov chain has been sufficiently mixed, we are ready to start making predictions. Predictions are made at 50 equidistant time steps in [0, 10], i.e. X_t^* ∈ R^{n_t^*} with n_t^* = 50. Then, we draw 100 samples from the posterior distribution of the statistics of interest, as described in Algorithm 2, with tolerance δ = 10^{-2}. We plot the mean of the statistics as well as 95% error bars (two times the standard deviation of the statistic).

Fig. 5. Porous flow: samples drawn from the posterior of the hyper-parameters ((a) for r_ξ, (b) for r_s and (c) for the nuggets) for the case of 120 observations. It is apparent that the spatial length scales are clearly identified, while the hyper-parameters of the stochastic variables have a much wider posterior. Of course, this is expected given the limited number of observations.


To calculate the mean of a sampled response surface, we use Eq. (23), while for the variance we use the diagonal of C_{ii,d}^*, i = 1, ..., q (Eq. (26)). One- or two-dimensional probability densities for each sampled response surface are evaluated by the following MC procedure: (1) we draw 10,000 samples from p(ξ); (2) we evaluate the sampled response (Eq. (20)) at each one of these ξ's; (3) we use a one- or two-dimensional kernel density estimator [36] to approximate the desired PDF. The predicted means of all the statistics are practically identical to the ones obtained via a Monte Carlo estimate (not shown in the figures, see [17]). The first row of Fig. 1 shows the time evolution of the mean of y1(t) and y3(t) for n_ξ = 100. Notice that the error bars are very tight. The second and third rows of the same figure depict the variance of the same quantities for n_ξ = 100 and n_ξ = 150, respectively. We can see the width of the error bars decreasing as the number of observations is increased. Fig. 2 shows the time evolution of the probability density of y2(t). The four rows correspond to different time instants (specifically t = 4, 6, 8 and 10). The columns correspond to n_ξ = 70, 100 and 150, counting from the left. Fig. 3 shows the time evolution of the joint probability density of y2(t) and y3(t). The four rows correspond to different time instants (specifically t = 4, 6, 8 and 10). The columns correspond to n_ξ = 70, 100 and 150, counting from the left. The variance of the same joint probability density is shown in Fig. 4.
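Step (3) of the procedure above can be reproduced with any standard kernel density estimator; a minimal version using SciPy (our illustration, with a hypothetical stand-in surrogate) is:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

def surrogate_y2_at_t4(xi):
    """Hypothetical stand-in for one sampled surrogate evaluated at t = 4."""
    return np.sin(3 * xi[:, 0]) + 0.1 * xi[:, 1]

xi_samples = rng.uniform(-1.0, 1.0, size=(10_000, 2))   # step (1): draws from p(xi)
y2_samples = surrogate_y2_at_t4(xi_samples)              # step (2): push through Eq. (20)
pdf = gaussian_kde(y2_samples)                            # step (3): 1-D kernel density estimate
grid = np.linspace(-3, 3, 200)
density = pdf(grid)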

    3.2. Flow through porous media

In this example, we study a two-dimensional, single-phase, steady-state flow through a random permeability field. A good review of the mathematical models of flow through porous media can be found in [37]. The spatial domain X_s is chosen to be the unit square [0, 1]^2, representing an idealized oil reservoir. Let us denote with p and u the pressure and the velocity fields of the fluid, respectively. These are connected via Darcy's law:

u = -K \nabla p, \quad \text{in } X_s,   (30)

Fig. 6. Porous flow: mean of u_x. Subfigures (a)-(c) show the mean of the mean of u_x for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the mean of u_x for 120 observations. Finally, (e) shows the MC estimate of the same quantity using 108,000 observations.


where K is the permeability tensor that models the ease with which the liquid flows through the reservoir. Combining Darcy's law with the continuity equation, it is easy to show that the governing PDE for the pressure is:

-\nabla \cdot (K \nabla p) = f, \quad \text{in } X_s,   (31)

where the source term f may be used to model injection/production wells. We use two model square wells: an injection well on the bottom-left corner of X_s and a production well on the top-right corner. The particular mathematical form is as follows:

f(x_s) = r, \quad \text{if } |x_{s,i} - \tfrac{1}{2} w| < \ldots

\ldots where s_G > 0 is its variance. The values we choose for the parameters are m = 0, k = 0.1 and s_G = 1. In order to obtain a finite-dimensional representation of G, we employ the Karhunen-Loeve expansion [39] and truncate it after k_ξ = 50 terms:

G(\omega; x_s) = m + \sum_{k=1}^{k_\xi} w_k\, w_k(x_s),

where w = (w_1, ..., w_{k_ξ}) is a vector of independent, zero-mean and unit-variance Gaussian random variables and w_k(x_s) are the eigenfunctions of the exponential covariance given in Eq. (34) (suitably normalized, of course).

Fig. 8. Porous flow: mean of p. Subfigures (a)-(c) show the mean of the mean of p for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the mean of p for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


In order to guarantee the analytical calculation of the first-order statistics of p and u of Section 2.3, we choose to work with the uniform random variables

\xi_k = \Phi(w_k) \sim U(0, 1), \quad k = 1, \ldots, k_\xi,

where Φ(·) is the cumulative distribution function of the standard normal distribution. Putting it all together, the finite-dimensional stochastic representation of the permeability field is:

K(\xi; x_s) = \exp\Big\{ m + \sum_{k=1}^{k_\xi} \Phi^{-1}(\xi_k)\, w_k(x_s) \Big\}.   (35)

In order to make the notational connection with the rest of the paper obvious, let us define the response of the physical model as

f: X_\xi \times X_s \to \mathbb{R}^q,

where, of course, X_ξ = [0, 1]^{k_ξ}, X_s = [0, 1]^2 and q = 3. That is,

f(\xi; x_s) = \big( p(\xi; x_s),\, u(\xi; x_s) \big),

where p(ξ; x_s) and u(ξ; x_s) are the solution of the boundary value problem defined by Eqs. (30), (31) and (33) at the spatial point x_s for a permeability field given by Eq. (35). Our purpose is to learn this map and also propagate the uncertainty of the stochastic variables through it by using a finite number of simulations.
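A compact way to evaluate Eq. (35) numerically, given precomputed (normalized) KL eigenfunctions tabulated on the mesh, is sketched below (ours; `eigenfunctions` is a hypothetical array of shape (k_xi, n_nodes)):

import numpy as np
from scipy.stats import norm

def permeability(xi, eigenfunctions, m=0.0):
    """Evaluate K(xi; x_s) of Eq. (35) at every mesh node.

    xi             : uniform random inputs in (0, 1), shape (k_xi,)
    eigenfunctions : w_k(x_s) tabulated on the mesh, shape (k_xi, n_nodes)
    """
    w = norm.ppf(xi)                 # Phi^{-1}(xi_k): back to standard normals
    G = m + w @ eigenfunctions       # truncated Karhunen-Loeve sum, shape (n_nodes,)
    return np.exp(G)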

The boundary value problem is solved using the mixed finite element formulation. We use first-order Raviart-Thomas elements for the velocity [40] and zero-order discontinuous elements for the pressure [41].

Fig. 9. Porous flow: variance of u_x. Subfigures (a)-(c) show the mean of the variance of u_x for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of u_x for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


The spatial domain is discretized using a 64 × 64 triangular mesh. The solver was implemented using the Dolfin C++ library [42]. The eigenfunctions of the exponential random field used to model the permeability were calculated via Stokhos, which is part of Trilinos [43].

For each stochastic input ξ, the response is observed on a regular 32 × 32 square spatial grid. Because of the regular nature of the spatial grid, as well as the separable nature of the Squared Exponential correlation function we use, it can easily be shown that the 1024 × 1024 spatial covariance matrix A_s can be written as

A_s = A_{s,1} \otimes A_{s,2},

where A_{s,i}, i = 1, 2, are 32 × 32 covariance matrices pertaining to the horizontal and vertical spatial directions, respectively. Of course, it is vital to make use of this simplification. The data collected this way are used to train a 3-dimensional Gaussian process, which is then used to make predictions on the same spatial grid. We train our model using, in sequence, 24, 64 and 120 observations of the deterministic solver, in which the stochastic inputs are selected from a Latin hyper-cube design. The prior hyper-parameters are set to:

c_\xi = 1/3, \quad c_s = 1/0.01, \quad \zeta_a = 10^2, \quad \text{for } a = \xi, s.

The initial values of the hyper-parameters used to start the Gibbs procedure are chosen to be the means of the priors. For each training set, we sample the posterior of the hyper-parameters 100,000 times (see Fig. 5 for a representative example). Then, we draw 100 sample surrogates as described in Algorithm 3.

Fig. 10. Porous flow: variance of u_y. Subfigures (a)-(c) show the mean of the variance of u_y for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of u_y for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


For each sampled surrogate, we calculate the statistics of interest. Finally, we compute and report the mean and the standard deviation of these statistics. The results are compared to Monte Carlo estimates.

Fig. 6 compares the mean of the mean of u_x to a Monte Carlo estimate using 108,000 observations. Two standard deviations of the mean of u_x for the case of 120 observations are shown in subfigure (d). The same statistic for u_y and p is reported in Figs. 7 and 8, respectively. Fig. 9 compares the mean of the variance of u_x to a Monte Carlo estimate using 108,000 observations. Two standard deviations of the variance of u_x for the case of 120 observations are shown in subfigure (d). The same statistic for u_y and p is reported in Figs. 10 and 11, respectively. We observe, especially for the cases of 24 and 64 observations, that the variance is underestimated. Of course, this is to be expected given the very limited set of data available. Fortunately, the error bars seem to compensate for this under-estimation, with the exception of the case that corresponds to the variance of the pressure p.

Fig. 12 depicts the predicted probability densities of ux(0.5,0.5), along with their error bars, for all available training sets. We see that the tails of the probability density are not estimated correctly. In particular, we observe two different types of potential problems. Firstly, the left-hand side puts too much weight on negative values for ux(0.5,0.5), even though it is quite clear (see subfigure (d)) that ux is always positive at that particular spatial location. However, our prior assumption is that the response is a sample from a Gaussian random field. Hence, negative values for ux(0.5,0.5) are very plausible. The model can correct this belief only by observing an adequate number of data points. It is a fact that all 120 observations of ux near (0.5,0.5) are strictly positive. However, these observations are not enough to change the prior belief that ux(0.5,0.5) might

also take negative values. Notice, though, that as we go from (a) to (c), the trend is gradually corrected. If one wanted to incorporate the fact that a quantity of interest is strictly positive, then it is usually recommended to observe the logarithm of the quantity instead.

Fig. 11. Porous flow: variance of p. Subfigures (a)–(c) show the mean of the variance of p for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of p for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.

Let us now get to the second problem, which has to do with the underestimation of the right tail of the distribution. Let us start by noticing that cases (a) and (b) put enough weight on it. The reason for this is not that there are observations close to this region. It is again a consequence of the Gaussian assumption, just like in the first problem we discussed. However, for the case of 120 observations, we see that the model significantly underestimates the right tail.

Fig. 12. Porous flow: the PDF of ux(0.5,0.5). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Porous flow: the PDF of ux(0.25,0.25). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The reason, of course, is that there is not a single observation in the training set that takes values close to that region. One cannot possibly expect to capture a long tail without observing any events on it. The remedy here is a smarter choice of the observations, along the lines of the active learning techniques that we have investigated in other places [17]. It is needless to say that if the purpose of the practitioner is the investigation of improbable events, then she should favor active learning schemes that have a bias for extreme values. This is clearly beyond the scope of the present work. Finally, Figs. 13–15 show the predicted probability densities of ux(0.25,0.25), p(0.5,0.5) and p(0.25,0.25), respectively. The same comments as for the ux(0.5,0.5) case are also applicable here. The joint probability density of ux(0.5,0.5) and uy(0.5,0.5) is shown in Fig. 16. We observe, again, the underestimation of the top-right long tail of the distribution and the broadening that occurs close to (0,0).

Fig. 14. Porous flow: the PDF of p(0.5,0.5). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 15. Porous flow: the PDF of p(0.25,0.25). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

    4. Conclusions

We developed a multi-output Gaussian process model that explicitly models the linear part of correlations between distinct outputs as well as space and/or time. By exploiting the static nature of the spatial/time inputs as well as the special nature of separable covariance functions, we were able to express all required quantities for inference and predictions in terms of Kronecker products. This led to highly efficient computations both in terms of memory and CPU time. We recognized the fact that the posterior predictive distribution of the Gaussian process defines a probability measure on the function space of possible surrogates, and we described an approximate method that yields kernel-based analytic sample surrogates. The scheme was applied to uncertainty quantification tasks in which we were able to quantify the epistemic uncertainty induced by the limited number of observations.

Fig. 16. Porous flow: the joint PDF of ux(0.5,0.5) and uy(0.5,0.5). The contours show the average joint PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations, respectively. Subfigure (d) plots two standard deviations of the joint PDF for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 10,000 observations.

Despite the successes, we observe certain aspects that require further investigation. Firstly, we noticed a systematic underestimation of the tails of the predicted probability densities. Of course, this is expected in a limited-observations setting. However, we are confident that the model can be considerably improved, without losing efficiency, in several different ways. To start with, in the flow through porous media example, we can see that the assumption of stationarity in space is wrong. It is evident that the velocities vary more close to the wells than they do away from them. The stationary covariance in space forces the model to make a compromise in the spatial length scales. On one hand, regions close to the wells seem smoother than necessary while, on the other hand, regions away from them are more wavy. Hence, we expect that using a non-stationary covariance or a tree-based model will improve the situation significantly [17]. Furthermore, it would be very interesting to see how the results would change if a sequential active learning approach were followed for the collection of the observations. It seems plausible that the most effective way to improve the tails of the distributions would be to select observations with extreme properties. A simple variance-based active learning scheme seems inadequate (of course, here we are talking about the case in which the number of observations is kept very small). The particulars of an alternative are a very interesting research topic.

    Acknowledgements

The research at Cornell was supported by an OSD/AFOSR MURI09 award on uncertainty quantification, the US Department of Energy, Office of Science, Advanced Scientific Computing Research and the Computational Mathematics program of the National Science Foundation (NSF) (award DMS-1214282). The research at Pacific Northwest National Laboratory (PNNL) was supported by the Applied Mathematics program of the US DOE Office of Advanced Scientific Computing Research. PNNL is operated by Battelle for the US Department of Energy under Contract DE-AC05-76RL01830. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Additional computing resources were provided by the NSF through TeraGrid resources provided by NCSA under Grant No. TG-DMS090007.

    Appendix A. Kronecker product properties

A.1. Calculating matrix–vector and matrix–matrix products

A.1.1. Matrix–vector product

Let A ∈ R^{m1×n1}, B ∈ R^{m2×n2} and x ∈ R^{n1 n2}. We wish to calculate

y = (A ⊗ B) x ∈ R^{m1 m2},

without explicitly forming the Kronecker product. This may be easily achieved by exploiting the properties of the vectorization operation vec(·) [22]. Let X ∈ R^{n2×n1} be the matrix formed by folding the vector x so that x = vec(X). Then we obtain

y = vec(B X A^T).    (A.1)

So all we need to do is two matrix multiplications. Notice that for the case of triangular A and B no additional memory is required.
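A minimal NumPy sketch of identity (A.1), assuming the column-stacking convention for vec (Fortran order); the random sizes and the check against the explicitly formed Kronecker product are only there for illustration:

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute y = (A kron B) x via Eq. (A.1), without forming A kron B."""
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape((n2, n1), order="F")       # fold x so that x = vec(X)
    Y = B @ X @ A.T                          # two small matrix products
    return Y.reshape(m1 * m2, order="F")     # unfold: y = vec(B X A^T)

# Quick sanity check against the explicit Kronecker product.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
x = rng.standard_normal(4 * 2)
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```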

A.1.2. Matrix–matrix product
Let A and B be as before and X ∈ R^{n1 n2×s}. We wish to calculate

Y = (A ⊗ B) X ∈ R^{m1 m2×s}.

This can be trivially calculated by working column by column using the results of the previous subsection.

    A.1.3. Three Kronecker products

Let C ∈ R^{m3×n3} and x ∈ R^{n1 n2 n3}. Then the product

y = (A ⊗ B ⊗ C) x ∈ R^{m1 m2 m3}

can be calculated by observing that

y = vec(C X (A ⊗ B)^T),

where X ∈ R^{n3×n1 n2} is such that x = vec(X). To simplify the expression inside the vectorization operator, let Z = (A ⊗ B) X^T ∈ R^{m1 m2×n3}. This matrix can be calculated using what was described in the previous subsection. Finally, we obtain the following:

y = vec(C Z^T).    (A.2)

Again notice that if all matrices are triangular all operations can be performed without additional memory.
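A self-contained sketch of Eq. (A.2), reusing the two-factor routine of the previous subsection; the sizes are arbitrary and the assert against the explicitly formed triple Kronecker product is only a sanity check:

```python
import numpy as np

def kron2_matvec(A, B, x):
    """(A kron B) x = vec(B X A^T), with X of shape (B.shape[1], A.shape[1])."""
    X = x.reshape((B.shape[1], A.shape[1]), order="F")
    return (B @ X @ A.T).reshape(-1, order="F")

def kron3_matvec(A, B, C, x):
    """y = (A kron B kron C) x via Eq. (A.2): Z = (A kron B) X^T, y = vec(C Z^T)."""
    n1, n2, n3 = A.shape[1], B.shape[1], C.shape[1]
    X = x.reshape((n3, n1 * n2), order="F")
    # The columns of X^T are the rows of X; apply the two-factor product to each.
    Z = np.column_stack([kron2_matvec(A, B, X[j, :]) for j in range(n3)])
    return (C @ Z.T).reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
C = rng.standard_normal((5, 3))
x = rng.standard_normal(2 * 4 * 3)
assert np.allclose(kron3_matvec(A, B, C, x), np.kron(np.kron(A, B), C) @ x)
```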


    A.2. Solving linear systems

Now let A ∈ R^{m×m}, B ∈ R^{n×n} and y ∈ R^{mn}. We wish to solve the linear system

(A ⊗ B) x = y

for x ∈ R^{mn}. Let X ∈ R^{n×m} and Y ∈ R^{n×m} be such that x = vec(X) and y = vec(Y), respectively. Using, again, the properties of the vectorization operator, we obtain

B X A^T = Y.

Therefore, we can find X by solving two linear systems:

B Z = Y,    (A.3)
A X^T = Z^T.    (A.4)

If A and B are triangular matrices, then no additional memory is required.
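A minimal sketch of (A.3)–(A.4) in NumPy; it uses general solves for brevity (an implementation working with Cholesky factors would use triangular solves instead), and the small diagonally dominant random matrices are only there to exercise the identity:

```python
import numpy as np

def kron_solve(A, B, y):
    """Solve (A kron B) x = y via B Z = Y and A X^T = Z^T, Eqs. (A.3)-(A.4)."""
    m, n = A.shape[0], B.shape[0]
    Y = y.reshape((n, m), order="F")
    Z = np.linalg.solve(B, Y)          # (A.3)
    X_T = np.linalg.solve(A, Z.T)      # (A.4)
    return X_T.T.reshape(m * n, order="F")

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((4, 4)) + 4 * np.eye(4)
y = rng.standard_normal(3 * 4)
x = kron_solve(A, B, y)
assert np.allclose(np.kron(A, B) @ x, y)
```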

Finally, let C ∈ R^{s×s} be another matrix and y ∈ R^{nms}. We wish to solve the linear system

(A ⊗ B ⊗ C) x = y

for x ∈ R^{nms}. Let X ∈ R^{s×mn} and Y ∈ R^{s×mn} be such that x = vec(X) and y = vec(Y), respectively. Then

C X (A ⊗ B)^T = Y.

We start by solving the system

C Z = Y

and then solve the system

(A ⊗ B) X^T = Z^T,

using the results of the previous paragraph on each of the rows of X and Z.

    Appendix B. Implementation details

Given a set of hyper-parameters θ, the various statistics may be evaluated efficiently in the following sequence:

1. Compute the Cholesky decomposition of all covariance matrices:

   A_ξ = L_ξ L_ξ^T,   A_s = L_s L_s^T,   A_t = L_t L_t^T.

2. Scale the outputs by solving in place the linear system:

   (L_ξ ⊗ L_s ⊗ L_t) Ỹ = Y.

3. Scale the design matrices by solving in place the linear systems:

   L_ξ H̃_ξ = H_ξ,   L_s H̃_s = H_s,   L_t H̃_t = H_t.

4. Calculate the QR factorizations of the scaled design matrices:

   H̃_ξ = Q_ξ R_ξ,   Q_ξ = [Q_{ξ,1}, Q_{ξ,2}],   R_ξ = [R_{ξ,1}^T, 0]^T,
   H̃_s = Q_s R_s,   Q_s = [Q_{s,1}, Q_{s,2}],   R_s = [R_{s,1}^T, 0]^T,
   H̃_t = Q_t R_t,   Q_t = [Q_{t,1}, Q_{t,2}],   R_t = [R_{t,1}^T, 0]^T,

   where, for a = ξ, s, t, Q_{a,1} ∈ R^{n_a×m_a}, Q_{a,2} ∈ R^{n_a×(n_a−m_a)} and R_{a,1} ∈ R^{m_a×m_a} is upper triangular.

5. Find B by solving in place the upper triangular system:

   (R_{ξ,1} ⊗ R_{s,1} ⊗ R_{t,1}) B = (Q_{ξ,1} ⊗ Q_{s,1} ⊗ Q_{t,1})^T Ỹ.

6. Calculate (by doing n rank-1 updates) R:

   R = (1/(n − m)) [Ỹ − (H̃_ξ ⊗ H̃_s ⊗ H̃_t) B]^T [Ỹ − (H̃_ξ ⊗ H̃_s ⊗ H̃_t) B].

7. Calculate the Cholesky decomposition of R:

   R = L_R L_R^T.

8. Now we can evaluate all the determinants involved in the posterior of θ:

   log|A_ξ| = 2 Σ_{i=1}^{n_ξ} log L_{ξ,ii},
   log|A_s| = 2 Σ_{i=1}^{n_s} log L_{s,ii},
   log|A_t| = 2 Σ_{i=1}^{n_t} log L_{t,ii},
   log|H_ξ^T A_ξ^{-1} H_ξ| = 2 Σ_{i=1}^{m_ξ} log |R_{ξ,1,ii}|,
   log|H_s^T A_s^{-1} H_s| = 2 Σ_{i=1}^{m_s} log |R_{s,1,ii}|,
   log|H_t^T A_t^{-1} H_t| = 2 Σ_{i=1}^{m_t} log |R_{t,1,ii}|,
   log|R| = 2 Σ_{i=1}^{q} log L_{R,ii},
   log|A| = (n/n_ξ) log|A_ξ| + (n/n_s) log|A_s| + (n/n_t) log|A_t|,
   log|H^T A^{-1} H| = (m/m_ξ) log|H_ξ^T A_ξ^{-1} H_ξ| + (m/m_s) log|H_s^T A_s^{-1} H_s| + (m/m_t) log|H_t^T A_t^{-1} H_t|.
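The following Python sketch exercises steps 1–5 and the determinant identity of step 8 on deliberately tiny matrices. For clarity it forms the small Kronecker products explicitly (a real implementation would use the vec tricks of Appendix A), and all sizes, random matrices and final checks are purely illustrative assumptions, not part of the actual implementation.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)

def rand_spd(k):
    """A random symmetric positive definite matrix (illustration only)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

# Hypothetical small sizes: n_a points and m_a basis functions per input group.
n_xi, n_s, n_t = 5, 4, 3
A_xi, A_s, A_t = rand_spd(n_xi), rand_spd(n_s), rand_spd(n_t)
H_xi = rng.standard_normal((n_xi, 2))
H_s = rng.standard_normal((n_s, 2))
H_t = rng.standard_normal((n_t, 1))
Y = rng.standard_normal((n_xi * n_s * n_t, 2))   # q = 2 output columns

# Steps 1-3: Cholesky factors, whitened outputs and whitened design matrices.
L_xi, L_s, L_t = (np.linalg.cholesky(M) for M in (A_xi, A_s, A_t))
L = np.kron(np.kron(L_xi, L_s), L_t)             # small enough to form explicitly here
Yw = solve_triangular(L, Y, lower=True)
Hw = [solve_triangular(Lf, Hf, lower=True)
      for Lf, Hf in ((L_xi, H_xi), (L_s, H_s), (L_t, H_t))]

# Steps 4-5: thin QR of the whitened design matrices (only the Q_{a,1}, R_{a,1}
# blocks are needed) and the upper triangular solve for B.
qr = [np.linalg.qr(Hf) for Hf in Hw]
Q1 = np.kron(np.kron(qr[0][0], qr[1][0]), qr[2][0])
R1 = np.kron(np.kron(qr[0][1], qr[1][1]), qr[2][1])
B = solve_triangular(R1, Q1.T @ Yw)

# Check against the dense generalized least-squares solution.
A = np.kron(np.kron(A_xi, A_s), A_t)
H = np.kron(np.kron(H_xi, H_s), H_t)
B_dense = np.linalg.solve(H.T @ np.linalg.solve(A, H), H.T @ np.linalg.solve(A, Y))
assert np.allclose(B, B_dense)

# Step 8: log|A| from the small factors only (Kronecker determinant identity).
n = n_xi * n_s * n_t
log_det_A = sum((n // k) * 2.0 * np.sum(np.log(np.diag(Lf)))
                for k, Lf in ((n_xi, L_xi), (n_s, L_s), (n_t, L_t)))
assert np.isclose(log_det_A, np.linalg.slogdet(A)[1])
```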

    Appendix C. Fast Cholesky updates

This section is concerned with updating the Cholesky decomposition of a covariance matrix in O(n^2) time when a new data point is observed. The updates are useful on two occasions:

1. When we are doing active learning without updating the hyper-parameters.
2. When we wish to sample sequentially the joint distribution in order to obtain a response surface (see Section 2.3).

Let A_n ∈ R^{n×n} be a symmetric positive definite matrix and assume that we have already calculated its Cholesky decomposition L_n ∈ R^{n×n} (lower triangular). Now let A_{n+m} ∈ R^{(n+m)×(n+m)} be another symmetric positive definite matrix (e.g. the one we obtain if we observe m new data points). In particular, let it be given by

A_{n+m} = [ A_n  B ; B^T  C ],

where B ∈ R^{n×m} (e.g. cross covariance) and C ∈ R^{m×m} (e.g. covariance matrix of the new data). Let L_{n+m} ∈ R^{(n+m)×(n+m)} be the lower triangular Cholesky factor of A_{n+m} (i.e. A_{n+m} = L_{n+m} L_{n+m}^T). It is convenient to represent it in the matrix block form

L_{n+m} = [ D_11  0_{n×m} ; D_21  D_22 ],

where D_11 ∈ R^{n×n}, D_21 ∈ R^{m×n} and D_22 ∈ R^{m×m}. We will derive formulas for the efficient calculation of the D_ij based on the Cholesky decomposition of A_n. Notice that


A_{n+m} = L_{n+m} L_{n+m}^T
⟹ [ A_n  B ; B^T  C ] = [ D_11  0_{n×m} ; D_21  D_22 ] [ D_11  0_{n×m} ; D_21  D_22 ]^T
⟹ [ L_n L_n^T  B ; B^T  C ] = [ D_11 D_11^T  D_11 D_21^T ; D_21 D_11^T  D_21 D_21^T + D_22 D_22^T ].

From the above equation, we see right away that D_11 = L_n. D_21 can be found by solving the triangular system L_n D_21^T = B, and finally D_22 is the lower triangular Cholesky factor of C − D_21 D_21^T. To wrap it up, here is how the update should be performed:

1. Set D_11 = L_n.
2. Solve the following system for D_21: L_n D_21^T = B.
3. Compute the Cholesky decomposition of D_22 D_22^T = C − D_21 D_21^T to find D_22.

For the special (but common in practice) case where m = 1, D_21 is a vector and C and D_22 are numbers, and step 3 can be replaced by

D_22 = sqrt(C − D_21 D_21^T).
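A small NumPy/SciPy sketch of the block update (for general m; when m = 1 the last step reduces to the square root above). The 6 × 6 test matrix and the comparison against a full re-factorization are purely illustrative:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def chol_update(L_n, B, C):
    """Grow the lower Cholesky factor L_n of A_n to that of [[A_n, B], [B^T, C]]."""
    n, m = B.shape
    D21 = solve_triangular(L_n, B, lower=True).T        # solve L_n D_21^T = B
    D22 = cholesky(C - D21 @ D21.T, lower=True)          # Cholesky of the small block
    top = np.hstack([L_n, np.zeros((n, m))])
    bottom = np.hstack([D21, D22])
    return np.vstack([top, bottom])

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A_full = M @ M.T + 6 * np.eye(6)
A_n, B, C = A_full[:4, :4], A_full[:4, 4:], A_full[4:, 4:]
L_n = np.linalg.cholesky(A_n)
assert np.allclose(chol_update(L_n, B, C), np.linalg.cholesky(A_full))
```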

We can also update solutions of linear systems. Suppose that we have already solved the system

L_n x_n = y_n

and, after observing m more data points, we need to solve

L_{n+m} x_{n+m} = y_{n+m}.

If we let

y_{n+m} = [ y_n ; z ],

it is trivial to show that x_{n+m} is given by

x_{n+m} = [ x_n ; D_22^{-1} (z − D_21 x_n) ].
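A short sketch of this solution update, reusing the blocks of the grown Cholesky factor; the small sizes and the random right-hand side are assumptions made only for the illustration:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
L_full = np.linalg.cholesky(M @ M.T + 5 * np.eye(5))      # plays the role of L_{n+m}
n = 3
L_n, D21, D22 = L_full[:n, :n], L_full[n:, :n], L_full[n:, n:]
y = rng.standard_normal(5)
y_n, z = y[:n], y[n:]

x_n = solve_triangular(L_n, y_n, lower=True)               # previously available solution
x_new = solve_triangular(D22, z - D21 @ x_n, lower=True)   # only the appended block is solved
x_full = np.concatenate([x_n, x_new])
assert np.allclose(x_full, solve_triangular(L_full, y, lower=True))
```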

    References

[1] I. Babuška, F. Nobile, R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM Journal of Numerical Analysis 45 (3) (2007) 1005–1034.
[2] D. Xiu, G.E. Karniadakis, The Wiener–Askey polynomial chaos for stochastic differential equations, Journal of Scientific Computing (24) (2002) 619–644.
[3] S.A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, in: Dokl. Akad. Nauk SSSR, vol. 4, 1963, p. 123.
[4] D. Xiu, J.S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM Journal on Scientific Computing 27 (3) (2005) 1118–1139.
[5] D. Xiu, Efficient collocational approach for parametric uncertainty analysis, Communications in Computational Physics 2 (2) (2007) 293–309.
[6] F. Nobile, R. Tempone, C. Webster, A sparse grid collocation method for elliptic partial differential equations with random input data, SIAM Journal of Numerical Analysis 45 (2008) 2309–2345.
[7] X. Ma, N. Zabaras, An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations, Journal of Computational Physics 228 (8) (2009) 3084–3113.
[8] C. Currin, T. Mitchell, M. Morris, D. Ylvisaker, A Bayesian approach to the design and analysis of computer experiments, Tech. Rep. ORNL-6498, Oak Ridge Laboratory, 1988.
[9] J. Sacks, W.J. Welch, T.J. Mitchell, H.P. Wynn, Design and analysis of computer experiments, Statistical Science 4 (4) (1989) 409–435.
[10] C. Currin, T. Mitchell, M. Morris, D. Ylvisaker, Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, Journal of the American Statistical Association 86 (416) (1991) 953–963.
[11] W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, M.D. Morris, Screening, predicting, and computer experiments, Technometrics 34 (1) (1992) 15–25.
[12] A. O'Hagan, M.C. Kennedy, J.E. Oakley, Uncertainty analysis and other inference tools for complex computer codes, in: J.M. Bernardo et al. (Eds.), Bayesian Statistics 6, Oxford University Press, 1999, pp. 503–524.
[13] J. Oakley, A. O'Hagan, Bayesian inference for the uncertainty distribution of computer model outputs, Biometrika 89 (4) (2002) 769–784.
[14] M.C. Kennedy, A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (3) (2001) 425–464.
[15] D. Higdon, J. Gattiker, B. Williams, M. Rightley, Computer model calibration using high-dimensional output, Journal of the American Statistical Association 103 (482) (2008) 570–583.


[16] R.B. Gramacy, H.K.H. Lee, Bayesian treed Gaussian process models with an application to computer modeling, Journal of the American Statistical Association 103 (483) (2008) 1119–1130.
[17] I. Bilionis, N. Zabaras, Multi-output local Gaussian process regression: applications to uncertainty quantification, Journal of Computational Physics 231 (17) (2012) 5718–5746.
[18] S. Conti, A. O'Hagan, Bayesian emulation of complex multi-output and dynamic computer models, Journal of Statistical Planning and Inference 140 (3) (2010) 640–651.
[19] M.D. McKay, R.J. Beckman, W.J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (2) (1979) 239–245.
[20] D.J. MacKay, Information-based objective functions for active data selection, Neural Computation 4 (1992) 590–604.
[21] A.P. Dawid, Some matrix-variate distribution theory: notational considerations and a Bayesian application, Biometrika 68 (1) (1981) 265–274.
[22] J. Magnus, H. Neudecker, Matrix Differential Calculus With Applications in Statistics and Econometrics, Wiley Series in Probability and Statistics, John Wiley, 1999.
[23] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[24] M. Goulard, M. Voltz, Linear coregionalization model: tools for estimation and choice of cross-variogram matrix, Mathematical Geology 24 (1992) 269–286.
[25] G. Bourgault, D. Marcotte, Multivariable variogram and its application to the linear model of coregionalization, Mathematical Geology 23 (1991) 899–928.
[26] S. De Iaco, D. Myers, D. Posa, The linear coregionalization model and the product–sum space–time variogram, Mathematical Geology 35 (2003) 25–38.
[27] C.F. van Loan, The ubiquitous Kronecker product, Journal of Computational and Applied Mathematics 123 (1–2) (2000) 85–100.
[28] G. Casella, E.I. George, Explaining the Gibbs sampler, The American Statistician 46 (3) (1992) 167–174.
[29] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092.
[30] W.K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97–109.
[31] R.B. Gramacy, H.K. Lee, Cases for the nugget in modeling computer experiments, Statistics and Computing 22 (3) (2012) 713–722.
[32] H. Harbrecht, M. Peters, R. Schneider, On the low-rank approximation by the pivoted Cholesky decomposition, Applied Numerical Mathematics 62 (4) (2012) 428–440.
[33] X. Wan, G.E. Karniadakis, An adaptive multi-element generalized polynomial chaos method for stochastic differential equations, Journal of Computational Physics 209 (2005) 617–642.
[34] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, P. Alken, M. Booth, F. Rossi, GNU Scientific Library Reference Manual, 2009.
[35] M.D. McKay, R.J. Beckman, W.J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 42 (1) (2000) 202–208.
[36] A.W. Bowman, A. Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations, Oxford Statistical Science Series, Oxford University Press, USA, 1997.
[37] J.E. Aarnes, V. Kippe, K.-A. Lie, A.B. Rustad, Modelling of Multiscale Structures in Flow Simulations for Petroleum Reservoirs, in: G. Hasle, K.-A. Lie, E. Quak (Eds.), Geometric Modelling, Numerical Simulation, and Optimization, Springer, Berlin, Heidelberg, 2007, pp. 307–360 (Ch. 10).
[38] P. Bochev, R.B. Lehoucq, On finite element solution of the pure Neumann problem, SIAM Review 47 (2001) 50–66.
[39] R.G. Ghanem, P.D. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer-Verlag, New York, 1991.
[40] P. Raviart, J. Thomas, A mixed finite element method for 2nd order elliptic problems, in: I. Galligani, E. Magenes (Eds.), Mathematical Aspects of Finite Element Methods, Lecture Notes in Mathematics, vol. 606, Springer, Berlin/Heidelberg, 1977, pp. 292–315 (Ch. 19).
[41] F. Brezzi, T.J.R. Hughes, L.D. Marini, A. Masud, Mixed discontinuous Galerkin methods for Darcy flow, Journal of Scientific Computing 22–23 (1) (2005) 119–145.
[42] A. Logg, G.N. Wells, J. Hake, DOLFIN: a C++/Python Finite Element Library, Springer, 2012 (Ch. 10).
[43] M.A. Heroux, R.A. Bartlett, V.E. Howle, R.J. Hoekstra, J.J. Hu, T.G. Kolda, R.B. Lehoucq, K.R. Long, R.P. Pawlowski, E.T. Phipps, A.G. Salinger, H.K. Thornquist, R.S. Tuminaro, J.M. Willenbring, A. Williams, K.S. Stanley, An overview of the Trilinos project, ACM Transactions on Mathematical Software 31 (3) (2005) 397–423.
