
    Multi-output separable Gaussian process: Towards an efficient,

    fully Bayesian paradigm for uncertainty quantification

Ilias Bilionis a,b, Nicholas Zabaras a,b,*, Bledar A. Konomi c, Guang Lin c

a Center for Applied Mathematics, 657 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853, USA
b Materials Process Design and Control Laboratory, Sibley School of Mechanical and Aerospace Engineering, 101 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA
c Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O. Box 999, MSIN K7-90, Richland, WA 99352, USA

Article info

    Article history:

    Received 29 July 2012

    Accepted 10 January 2013

    Available online 7 February 2013

    Keywords:

    Bayesian

    Gaussian process

    Uncertainty quantification

    Separable covariance function

    Surrogate models

    Stochastic partial differential equations

    Kronecker product

Abstract

    Computer codes simulating physical systems usually have responses that consist of a set of

    distinct outputs (e.g., velocity and pressure) that evolve also in space and time and depend

    on many unknown input parameters (e.g., physical constants, initial/boundary conditions,

    etc.). Furthermore, essential engineering procedures such as uncertainty quantification,

    inverse problems or design are notoriously difficult to carry out mostly due to the limited

    simulations available. The aim of this work is to introduce a fully Bayesian approach for

    treating these problems which accounts for the uncertainty induced by the finite number

    of observations. Our model is built on a multi-dimensional Gaussian process that explicitly

    treats correlations between distinct output variables as well as space and/or time. The

    proper use of a separable covariance function enables us to describe the huge covariance

matrix as a Kronecker product of smaller matrices, leading to efficient algorithms for carrying out inference and predictions. The novelty of this work is the recognition that the

    Gaussian process model defines a posterior probability measure on the function space of

    possible surrogates for the computer code and the derivation of an algorithmic procedure

    that allows us to sample it efficiently. We demonstrate how the scheme can be used in

    uncertainty quantification tasks in order to obtain error bars for the statistics of interest

    that account for the finite number of observations.

© 2013 Elsevier Inc. All rights reserved.

    1. Introduction

It is very common for a research group or a company to spend years developing sophisticated software in order to realistically simulate important physical phenomena. However, carrying out tasks like uncertainty quantification, model calibration or design using the full-fledged model is, in all but the simplest cases, a daunting task, since a single simulation

    might take days or even weeks to complete, even with state-of-the-art modern computing systems. One, then, has to resort

    to computationally inexpensive surrogates of the computer code. The idea is to run the solver on a small, well-selected set of

    inputs and then use these data to learn the response surface. The surrogate surface may be subsequently used to carry out

    any of the computationally intensive engineering tasks.


* Corresponding author at: Materials Process Design and Control Laboratory, Sibley School of Mechanical and Aerospace Engineering, 101 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA. Tel.: +1 607 255 9104.
E-mail address: [email protected] (N. Zabaras).
URL: http://mpdc.mae.cornell.edu/ (N. Zabaras).

Journal of Computational Physics 241 (2013) 212-239


    The engineering community and, in particular, the researchers in uncertainty quantification, have been making extensive

use of surrogates, even though most times it is not explicitly stated. One example is the so-called stochastic collocation (SC) method (see [1] for a classic illustration), in which the response is modeled using a generalized Polynomial Chaos (gPC) basis [2] whose coefficients are approximated via a collocation scheme based on a tensor product rule of one-dimensional Gauss quadrature points. Of course, such approaches scale extremely badly with the number of input dimensions, since the number of required collocation points explodes quite rapidly. A partial remedy can be found in sparse grids (SG) based on the Smolyak algorithm [3], which have a weaker dependence on the dimensionality of the problem (see [4-6] and the adaptive version developed by our group [7]). Despite the rigorous convergence results of all these methods,

    their applicability to the situation of very limited observations is questionable. In that case, a statistical approach seems

    more suitable.

    To the best of our knowledge, the first attempt of the statistics community to build a computer surrogate starts with the

seminal papers of Currin et al. [8] and, independently, Sacks et al. [9], both making use of Gaussian processes. In the same spirit are the subsequent paper by Currin et al. [10] and the work of Welch et al. [11]. One of the first applications to uncertainty quantification can be found in O'Hagan et al. [12] and Oakley and O'Hagan [13]. The problem of model calibration is considered in [14,15]. Refs. [16,17] model non-stationary responses, while [18,15] (in quite different ways) attempt to capture correlations between multiple outputs. Following these trends, we will consider a Bayesian scheme based on Gaussian processes.

Despite the apparent simplicity of the surrogate idea, there are still many hidden obstacles. Firstly, there is the question of choosing the design of inputs on which the full model is to be evaluated. It is generally admitted that a good starting point is a Latin hyper-cube design [19], because of its great coverage properties. However, it is more than obvious that this choice should be influenced by the task in which the surrogate will be used. For example, in uncertainty quantification, it makes sense to bias the design using the probability density of the inputs [17] so that highly probable regions are adequately explored. Furthermore, it also pays off to consider a sequential design that depends on what is already known about the surface. Such a procedure, known as active learning, is particularly suitable for Bayesian surrogates, since one may use the predictive variance as an indicative measure of the informational content of particular points in the input space [20]. Secondly, computer codes solving partial differential equations usually have responses that are multi-output functions of space and/or time. One can hope that explicitly modeling this fact may squeeze more information out of the observations. Finally, it is essential to be able to say something about the epistemic uncertainty induced by the limited number of observations and, in particular, about its effect on the task for which the surrogate is constructed.

In this work, we are mainly concerned with the second and the third points of the last paragraph. Even though we make use of active learning ideas in a completely different context (see Section 2.3), we will be assuming that the observations are simply given to us. Our first goal is the construction of a multi-output Gaussian process model that explicitly treats space and time (Section 2.1). This model, in its full generality, is extremely computationally demanding. In Section 2.2, we carefully develop the so-called separable model, which allows us to express the inference and prediction tasks using Kronecker products of small matrices. This, in turn, results in highly efficient computations. Finally, in Section 2.3, we apply our scheme to uncertainty quantification tasks. Contrary to other approaches, we recognize the fact that the predictive distribution of the Gaussian process conditional on the observations actually defines a probability measure on the function space of possible surrogates. This measure corresponds to the epistemic uncertainty induced by the limited data. Extending the ideas of [13], we develop a procedure that allows us to approximately sample this probability space. Each sample is a kernel approximation of a candidate surrogate for the code. In the same section, we show how we can semi-analytically compute all the statistics of the candidate surrogate up to second order. Higher-order statistics or even probability densities may be calculated quite effortlessly via a Monte Carlo procedure. By repeatedly sampling the posterior surrogate space, we are able to provide error bars for practically anything that depends on it. In Section 3.1, we apply our scheme to a stochastic ordinary differential equation with three distinct outputs and two random variables. The purpose of this example is to demonstrate the validity of our approach in a simple problem. In Section 3.2, we consider the problem of flow through random porous media. There, we model the velocity field and the pressure as a function of space and 50 random variables. In this more challenging problem, we clearly see the advantages of a fully Bayesian approach to uncertainty quantification. Namely, the ability to say something about the statistics of a 50-dimensional stochastic problem with as few as 24 observations is intriguing. Finally, we conclude by noting the limitations of the approach and discuss the many possibilities for extension.

    2. Methodology

We are interested in modeling computer codes returning a multi-output response y ∈ R^q, where q > 0 is the number of distinct outputs, given an input ξ ∈ X_ξ ⊂ R^{k_ξ}, defined over a spatial domain X_s ⊂ R^{k_s} and/or a time interval X_t = [0, T], where k_ξ is the number of inputs to the code, k_s is the spatial dimension (either 1, 2 or 3) and T > 0 is the time horizon. Even though for a given input ξ the code reports the response simultaneously at various spatial and time locations, we will be modeling it as a function f: X_ξ × X_s × X_t → R^q. As an example, you may consider the problem of two-dimensional flow in random porous media. The input variables ξ would represent the permeability field, while for a fixed ξ, f(ξ, x_s, t) would give the velocity components as well as the pressure at the spatial location x_s and time t (here q = 3).


where a(x) = (c(x, x_1; θ), ..., c(x, x_n; θ))^T ∈ R^n. If n > m + q (so that all the distributions involved are proper), it is possible to integrate out both B and Σ, resulting in the predictive distribution of f(·) conditional only on θ. It is a q-variate Student's-t process with n − m degrees of freedom (see [18]):

f(\cdot) \mid \theta, Y \sim \mathcal{T}_q\big( m^*(\cdot),\; c^*(\cdot,\cdot;\theta)\,\hat{\Sigma};\; n - m \big),   (8)

where

m^*(x) = \hat{B}^T h(x) + (Y - H\hat{B})^T A^{-1} a(x),

c^*(x_1, x_2; \theta) = c(x_1, x_2; \theta) - a(x_1)^T A^{-1} a(x_2) + \big[ h(x_1) - H^T A^{-1} a(x_1) \big]^T \big( H^T A^{-1} H \big)^{-1} \big[ h(x_2) - H^T A^{-1} a(x_2) \big],

\hat{B} = \big( H^T A^{-1} H \big)^{-1} H^T A^{-1} Y,

\hat{\Sigma} = \frac{1}{n - m} \big( Y - H\hat{B} \big)^T A^{-1} \big( Y - H\hat{B} \big).
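To make the linear algebra behind these formulas concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) that computes the weight estimate, the scale matrix and the predictive mean for given A, H, Y and user-supplied basis/correlation functions h(x) and a(x):

import numpy as np

def gls_estimates(A, H, Y):
    """B_hat and Sigma_hat as defined below Eq. (8)."""
    Ainv_H = np.linalg.solve(A, H)
    Ainv_Y = np.linalg.solve(A, Y)
    B_hat = np.linalg.solve(H.T @ Ainv_H, H.T @ Ainv_Y)
    R = Y - H @ B_hat
    Sigma_hat = R.T @ np.linalg.solve(A, R) / (A.shape[0] - H.shape[1])
    return B_hat, Sigma_hat

def predictive_mean(x, B_hat, A, H, Y, h, a):
    """m*(x) of Eq. (8); h(x) and a(x) are the basis and cross-correlation vectors."""
    resid = Y - H @ B_hat
    return B_hat.T @ h(x) + resid.T @ np.linalg.solve(A, a(x))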

2.1.4. The posterior distribution of θ

Let us conclude this section by giving the posterior distribution of the hyper-parameters of the correlation function. Using Bayes' theorem to write down the joint posterior for B, Σ and θ conditional on Y (combining Eqs. (3) and (4)) and integrating out B and Σ, we obtain:

p(\theta \mid Y) \propto p(\theta)\, |A|^{-q/2}\, |H^T A^{-1} H|^{-q/2}\, |\hat{\Sigma}|^{-(n-m)/2}.   (9)

    2.2. The separable model

It is apparent that the above mentioned model becomes computationally intractable quite fast, due to the fact that a high-dimensional and dense covariance matrix has to be inverted. An important simplification can be achieved if the spatial and the temporal points at which the output is observed remain fixed, independently of the input ξ, and if we assume that the correlation function is separable, i.e.:

c(x_1, x_2; \theta) := c_\xi(\xi_1, \xi_2; \theta_\xi)\, c_s(x_{s,1}, x_{s,2}; \theta_s)\, c_t(t_1, t_2; \theta_t),   (10)

where c_ξ(·,·; θ_ξ), c_s(·,·; θ_s) and c_t(·,·; θ_t) are the correlation functions of the parameter space, the spatial domain and the time domain, respectively, and θ = (θ_ξ, θ_s, θ_t). We will now show that, under these assumptions, the covariance matrix can be written as the Kronecker product of smaller covariance matrices. Using this observation, it is possible to carry out inference and make predictions without ever forming the full covariance matrix. Finally, we also assume that the hyper-parameters of the various covariance functions are a priori independent:

p(\theta) = p(\theta_\xi)\, p(\theta_s)\, p(\theta_t).   (11)

Remark 1. Another, more general, model for the covariance function is the linear model of coregionalization (LMC) [24-26].

    The more general nature of this covariance function does not necessarily make it more attractive for the applications of

    interest. The introduction of such models is usually associated with higher computational cost which we try to avoid in this

    paper.

2.2.1. Organizing the inputs

Let us consider how the data are collected from a computer code. For a parameter ξ ∈ X_ξ, the computer code returns the (multi-output) response on a given (a priori known) set of n_s spatial points X_s = (x_{s,1}, ..., x_{s,n_s})^T ∈ R^{n_s × k_s}, where k_s = 1, 2 or 3 is the spatial dimension (X_s ⊂ R^{k_s}), at each one of the n_t time steps X_t = (t_1, ..., t_{n_t}) ∈ R^{n_t × 1}. That is, a single choice of the parameter ξ generates a total of n_s n_t training samples. Therefore, the response of the code is a matrix Y_ξ ∈ R^{n_s n_t × q}, which we call the output matrix. The output matrix is assembled as follows:

Y_\xi = \big( y_{\xi,1}^T \cdots y_{\xi,n_s}^T \big)^T,

where each y_{ξ,i} ∈ R^{n_t × q} is the response at the spatial point x_{s,i} at each time step, that is:

y_{\xi,i} = \big( y_{\xi,i,1} \cdots y_{\xi,i,n_t} \big)^T,


where y_{ξ,i,j} ∈ R^q is the response at the spatial point x_{s,i} at time t_j:

y_{\xi,i,j} = \big( y_{\xi,i,j,1} \cdots y_{\xi,i,j,q} \big)^T,

where, of course, y_{ξ,i,j,l} is the l-th output of the response at x_{s,i} at t_j.
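To make the bookkeeping concrete, the following minimal NumPy sketch (names are ours, purely illustrative) assembles the output matrix Y_ξ from a raw array of code outputs indexed by spatial point, time step and output:

import numpy as np

def assemble_output_matrix(raw):
    """raw[i, j, l] = l-th output at spatial point i and time step j.

    Returns Y_xi of shape (ns * nt, q): spatial blocks stacked on top of each
    other, each block holding the nt time steps of one spatial point.
    """
    ns, nt, q = raw.shape
    return raw.reshape(ns * nt, q)

# toy example: 4 spatial points, 3 time steps, 2 outputs
raw = np.random.rand(4, 3, 2)
Y_xi = assemble_output_matrix(raw)
assert Y_xi.shape == (12, 2)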

2.2.2. Separating the covariance matrices

If we take a total of n_ξ samples of the parameters:

X_\xi = \big( \xi_1, \ldots, \xi_{n_\xi} \big)^T \in \mathbb{R}^{n_\xi \times k_\xi},

where k_ξ is the dimension of the input variables ξ (X_ξ ⊂ R^{k_ξ}), we will have a total of

n = n_\xi\, n_s\, n_t

training samples for our model. The covariance matrix A ∈ R^{n × n} can now be written as:

A = A_\xi \otimes A_s \otimes A_t,   (12)

where A_ξ ∈ R^{n_ξ × n_ξ} is the covariance matrix generated by X_ξ and c_ξ(·,·; θ_ξ), A_s ∈ R^{n_s × n_s} is the covariance matrix generated by X_s and c_s(·,·; θ_s), A_t ∈ R^{n_t × n_t} is the covariance matrix generated by X_t and c_t(·,·; θ_t), and ⊗ denotes the Kronecker product.
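As a sanity check of Eq. (12), a short NumPy sketch (ours, with a hypothetical squared-exponential kernel) builds the three small covariance matrices and, purely for illustration, forms the full Kronecker-product covariance; in practice the full matrix is never assembled:

import numpy as np

def sq_exp(X1, X2, ell):
    """Squared-exponential correlation between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 / ell ** 2).sum(-1)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(0)
X_xi, X_s, X_t = rng.random((5, 2)), rng.random((4, 2)), rng.random((3, 1))

A_xi = sq_exp(X_xi, X_xi, 0.7)
A_s = sq_exp(X_s, X_s, 0.3)
A_t = sq_exp(X_t, X_t, 0.5)

# Full covariance on the tensor grid (xi slowest, then space, then time),
# formed here only to check dimensions against Eq. (12).
A_full = np.kron(A_xi, np.kron(A_s, A_t))
assert A_full.shape == (5 * 4 * 3, 5 * 4 * 3)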

2.2.3. Separating the design matrices

Now let us consider the basis functions used in the generalized linear model of Eq. (2). Suppose we wish to use m_t basis functions to capture the time dependence of the mean:

\mathcal{H}_t = \{ h_{t,1}(t), \ldots, h_{t,m_t}(t) \}.

We also choose m_s basis functions to capture the spatial dependence of the mean:

\mathcal{H}_s = \{ h_{s,1}(x_s), \ldots, h_{s,m_s}(x_s) \}.

These can be, for example, the finite element basis of the model or any other suitable functions. Finally, we choose m_ξ basis functions to capture the dependence of the mean on the stochastic parameter:

\mathcal{H}_\xi = \{ h_{\xi,1}(\xi), \ldots, h_{\xi,m_\xi}(\xi) \}.

For example, in an uncertainty quantification setting these could be a gPC basis, as induced by the probability distribution of the ξ's. The global basis functions are formed from the tensor product:

\mathcal{H} = \mathcal{H}_\xi \otimes \mathcal{H}_s \otimes \mathcal{H}_t.

Thus, the total number of basis functions present in the model is:

m = m_\xi\, m_s\, m_t.

In order to have a consistent enumeration, we proceed as follows:

h_1(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,1}(t),
h_2(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,2}(t),
\vdots
h_{m_t}(x) := h_{\xi,1}(\xi)\, h_{s,1}(x_s)\, h_{t,m_t}(t),
h_{m_t + 1}(x) := h_{\xi,1}(\xi)\, h_{s,2}(x_s)\, h_{t,1}(t),
\vdots
h_{m_s m_t (i-1) + m_t (j-1) + l}(x) := h_{\xi,i}(\xi)\, h_{s,j}(x_s)\, h_{t,l}(t),

where, in the last line, i = 1, ..., m_ξ, j = 1, ..., m_s and l = 1, ..., m_t. With this enumeration, the design matrix H defined in Eq. (5) breaks down as:

H = H_\xi \otimes H_s \otimes H_t,   (13)

where H_ξ ∈ R^{n_ξ × m_ξ} is given by

(H_\xi)_{ij} = h_{\xi,j}(\xi_i),

H_s ∈ R^{n_s × m_s} by

(H_s)_{ij} = h_{s,j}(x_{s,i})


and H_t ∈ R^{n_t × m_t} by

(H_t)_{ij} = h_{t,j}(t_i).
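The enumeration above is simply the last-index-fastest flattening of the triple index (i, j, l); a tiny helper (ours, purely illustrative) makes that explicit:

def global_basis_index(i, j, l, m_s, m_t):
    """Map the 1-based triple (i, j, l) to the global index of h_{xi,i} h_{s,j} h_{t,l}."""
    return m_s * m_t * (i - 1) + m_t * (j - 1) + l

# with m_s = 2, m_t = 3: (1, 1, 1) -> 1, (1, 2, 1) -> 4, (2, 1, 1) -> 7
assert global_basis_index(1, 2, 1, 2, 3) == 4
assert global_basis_index(2, 1, 1, 2, 3) == 7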

    2.2.4. Efficient predictions and inference

Given a set of hyper-parameters θ, all the statistics that are required to make predictions or to evaluate the posterior of θ can be calculated efficiently by exploiting the properties of the Kronecker product. Its most important property is that various factorizations (e.g. Cholesky or QR) of a matrix formed by Kronecker products are given by the Kronecker products of the factorizations of the individual matrices [27]. Furthermore, matrix-vector multiplications, as well as solving linear systems when the matrices forming the Kronecker product are triangular, can be carried out without additional memory. Therefore, working consistently with the Cholesky decompositions of the covariance matrices leads to very efficient computations. All the linear algebra details pertaining to efficient computations with Kronecker products are documented in Appendices A and B.
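As a concrete illustration of why the separable structure pays off, the following sketch (ours, not the authors' code) solves A x = b with A = A_ξ ⊗ A_s ⊗ A_t by working only with the small Cholesky factors, never forming the n × n matrix:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def kron_cho_solve(factors, b):
    """Solve (A1 kron A2 kron A3) x = b using only the small Cholesky factors.

    factors : list of cho_factor(A_k) results, one per Kronecker factor
    b       : right-hand side of length n1*n2*n3, ordered consistently with the kron
    """
    dims = [f[0].shape[0] for f in factors]
    X = b.reshape(dims)
    for k, f in enumerate(factors):          # apply A_k^{-1} along mode k
        X = np.moveaxis(X, k, 0)
        shape = X.shape
        X = cho_solve(f, X.reshape(shape[0], -1)).reshape(shape)
        X = np.moveaxis(X, 0, k)
    return X.reshape(-1)

# usage: factors = [cho_factor(A_xi), cho_factor(A_s), cho_factor(A_t)]
#        x = kron_cho_solve(factors, b)

The key point is that the cost and memory scale with the sizes of the individual factors, which is exactly what makes the separable model tractable.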

The posterior of the hyper-parameters (see Eq. (9)) can now be sampled efficiently via Gibbs sampling [28], as described in Algorithm 1. Each one of the steps can be carried out using MCMC [29,30]. The prior distributions, as well as the proposal distributions for θ_ξ, θ_s and θ_t, are given below.

Algorithm 1. Sampling the posterior distribution

Require: Observed data X and Y and initial θ = (θ_ξ, θ_s, θ_t).
Ensure: Repeated application ensures that θ = (θ_ξ, θ_s, θ_t) is a sample from Eq. (9).

Sample:

\theta_\xi \sim p(\theta_\xi \mid Y, \theta_s, \theta_t) \propto p(\theta_\xi)\, |A_\xi|^{-qn/(2n_\xi)}\, |H_\xi^T A_\xi^{-1} H_\xi|^{-qm/(2m_\xi)}\, |\hat{\Sigma}|^{-(n-m)/2}.

Sample:

\theta_s \sim p(\theta_s \mid Y, \theta_\xi, \theta_t) \propto p(\theta_s)\, |A_s|^{-qn/(2n_s)}\, |H_s^T A_s^{-1} H_s|^{-qm/(2m_s)}\, |\hat{\Sigma}|^{-(n-m)/2}.

Sample:

\theta_t \sim p(\theta_t \mid Y, \theta_\xi, \theta_s) \propto p(\theta_t)\, |A_t|^{-qn/(2n_t)}\, |H_t^T A_t^{-1} H_t|^{-qm/(2m_t)}\, |\hat{\Sigma}|^{-(n-m)/2}.
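The Gibbs scan of Algorithm 1, with a Metropolis step inside each block, can be sketched as follows (our own illustration; log_cond stands for the unnormalized log conditional densities displayed above and is assumed to be supplied by the user):

import numpy as np

def metropolis_step(block, log_cond, step=0.01, rng=None):
    """One log-normal random-walk update of a positive hyper-parameter block."""
    rng = rng or np.random.default_rng()
    proposal = block * np.exp(step * rng.standard_normal(block.shape))
    # the log-normal proposal is not symmetric in the block itself: add the Jacobian term
    log_ratio = (log_cond(proposal) - log_cond(block)
                 + np.log(proposal).sum() - np.log(block).sum())
    return proposal if np.log(rng.random()) < log_ratio else block

def gibbs_scan(theta, log_cond, rng=None):
    """One pass of Algorithm 1: update theta['xi'], theta['s'], theta['t'] in turn.

    log_cond(name, value, theta) returns the unnormalized log conditional posterior
    of block `name` set to `value`, with the other blocks taken from `theta`.
    """
    for name in ("xi", "s", "t"):
        theta[name] = metropolis_step(theta[name],
                                      lambda v: log_cond(name, v, theta), rng=rng)
    return theta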

2.2.5. Choice of the covariance function

The separable model described in this section requires the specification of three covariance functions, c_ξ(·,·; θ_ξ), c_s(·,·; θ_s) and c_t(·,·; θ_t). In this work, we choose to work with:

c_\xi(\xi_{n_1}, \xi_{n_2}; \theta_\xi) := \exp\Big\{ -\frac{1}{2} \sum_{k=1}^{k_\xi} \frac{(\xi_{n_1,k} - \xi_{n_2,k})^2}{r_{\xi,k}^2} \Big\} + g_\xi\, \delta_{n_1 n_2},

c_s(x_{s,n_1}, x_{s,n_2}; \theta_s) := \exp\Big\{ -\frac{1}{2} \sum_{k=1}^{k_s} \frac{(x_{s,n_1,k} - x_{s,n_2,k})^2}{r_{s,k}^2} \Big\} + g_s\, \delta_{n_1 n_2},

c_t(t_{n_1}, t_{n_2}; \theta_t) := \exp\Big\{ -\frac{1}{2} \frac{(t_{n_1} - t_{n_2})^2}{r_t^2} \Big\} + g_t\, \delta_{n_1 n_2},

with the hyper-parameters completely specified by:

\theta_\xi = (r_\xi, g_\xi), \quad \theta_s = (r_s, g_s) \quad \text{and} \quad \theta_t = (r_t, g_t).

The core part of the covariance functions is based on the Squared Exponential (SE) kernel, with the r_a, a = ξ, s, t, hyper-parameters being interpreted as the length scales of each input dimension. The g_a, a = ξ, s, t, are termed nuggets. The main purpose of the nuggets is to ensure the well-conditioning of the covariance matrices involved in the calculations, and they are expected to be typically small (of the order of 10^{-2}). By looking at the full covariance function of the separable model and ignoring second-order products of the nuggets, we can see that g_ξ + g_s + g_t can be interpreted as the measurement noise. Apart from improving the stability of the computations, one can argue that the presence of the nugget can also lead to better predictive accuracy [31].


The priors of the hyper-parameters r_ξ, g_ξ, r_s, g_s, r_t and g_t should be chosen to represent any prior knowledge about the computer code that might be available. In order to ensure positive support, we make the common choice, for a = ξ, s and t:

p(r_{a,k} \mid c_a) = \mathcal{E}(r_{a,k} \mid c_a),   (14)

p(g_a \mid \zeta_a) = \mathcal{E}(g_a \mid \zeta_a),   (15)

where \mathcal{E}(\cdot \mid \lambda) denotes the probability density of the exponential distribution with parameter λ > 0.

For the proposals required by the MCMC sampling schemes described in the previous section, we use a log-normal random walk for all hyper-parameters (again because they are all positive). The step size of the random walk is selected so that the observed acceptance rate of the MCMC is between 30% and 60%.

The particular values of c_a and ζ_a for a = ξ, s and t are specified in each numerical example.
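As an illustration of Eqs. (14)-(15), drawing a hyper-parameter configuration from these priors takes one line per block; the rate values below are the ones used later in Section 3.1 and are otherwise arbitrary:

import numpy as np

rng = np.random.default_rng(0)

def sample_exponential_prior(k, c=1 / 0.05, zeta=1e6):
    """Draw k length scales r and a nugget g from the priors of Eqs. (14)-(15).

    NumPy's `scale` argument is the mean 1/lambda, so scale = 1/c for E(. | c).
    """
    r = rng.exponential(scale=1.0 / c, size=k)   # mean length scale 0.05
    g = rng.exponential(scale=1.0 / zeta)        # mean nugget 1e-6
    return r, g

r_xi, g_xi = sample_exponential_prior(k=2)       # e.g. the KO-2 problem with k_xi = 2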

    2.3. Application to uncertainty quantification

In uncertainty quantification tasks, one specifies a probability density on the inputs ξ, p(ξ), and tries to quantify the probability measure induced by it on the output field. In this work, we quantify this uncertainty by interrogating the surrogate built using the Gaussian process model introduced in the previous sections (see Eq. (8)). The whole process is complicated by the fact that our model in reality defines a probability measure over the function space of potential surrogates. This probability measure essentially quantifies the lack of information regarding the real response due to the finite number of observations. In a fully Bayesian setting, this probability measure will be reflected as a probability measure on the predicted statistics (e.g. mean, variance, PDFs, etc.). To the best of our knowledge, such ideas were introduced for the first time in the statistics literature in [13], but were largely ignored by the UQ community. Inspired by the above mentioned work, we describe in this section how our model can be used to essentially sample the posterior distribution of the induced statistics. The procedure is conceptually simple and is described in Algorithm 2. The key component of this algorithm is the ability to sample a response surface based on Eq. (8) that can be described analytically via a kernel representation. This is achieved through a generalization of the techniques discussed in [13]. The final component of the algorithm has to do with the evaluation of the statistics of interest induced by this response surface. We will show that our model allows for semi-analytic calculation of all statistics up to second order. Higher-order statistics, or full probability densities, have to be obtained using Monte Carlo techniques on the sampled surrogate surface.

Algorithm 2. Sampling the posterior of the statistics. By repeatedly calling this algorithm, error bars for the desired statistics may be obtained.

Require: Observed data X and Y, and θ_0, a sample from Eq. (9).
Ensure: S is a sample from the statistic of interest.

Sample a new θ_1 following the Gibbs procedure given in Section 2.2.
Sample a response surface using Algorithm 3.
Interrogate the obtained response surface (analytically or via MC) to obtain S.
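In code, the outer loop of Algorithm 2 is just repeated composition of the pieces above; the following skeleton (ours, with the three callables left as assumptions supplied by the user) shows the structure:

def sample_statistic_posterior(theta, gibbs_scan, sample_surface, statistic, n_draws=100):
    """Draw n_draws samples of a statistic S, one per sampled surrogate (Algorithm 2).

    gibbs_scan(theta)      -> new hyper-parameter sample (Algorithm 1)
    sample_surface(theta)  -> callable surrogate xi -> outputs (Algorithm 3)
    statistic(surface)     -> the statistic of interest (mean, variance, PDF, ...)
    """
    draws = []
    for _ in range(n_draws):
        theta = gibbs_scan(theta)          # refresh the hyper-parameters
        surface = sample_surface(theta)    # one candidate surrogate
        draws.append(statistic(surface))   # interrogate it
    return draws

The spread of the returned draws is exactly what produces the error bars reported in Section 3.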

    2.3.1. Sampling a response surface

In order to obtain an analytical representation of the response surface, Ref. [13] suggests selecting a space-filling design of the input variables, using Eq. (8) to sample the outputs, and then augmenting the original data set with the new observations to derive an updated Eq. (8) with reduced variance. The mean of the updated posterior predictive distribution is an analytic function that can be thought of (if its variance is sufficiently small) as a sample from the predictive probability measure. Several problems arise if one follows this approach. To start with, one does not know a priori how many design points are required in order to reduce the predictive variance to a pre-specified tolerance. Furthermore, design points must be well placed and far away from the initial observations in order to avoid numerical instabilities. Finally (and this is particular to our model), including design points in all sets of different inputs (ξ, x_s and t) breaks down the Kronecker product representation of the covariance and design matrices which, in turn, leads to a tremendous computational burden. In order to avoid the latter of these conundrums, we choose to work with the same spatial and time points as the ones included in the original data (namely X_s and X_t). This approximation will ignore only a (hopefully small) part of the epistemic uncertainty due to the finite number of observations. That is, we only choose design points in X_ξ. The former two problems are addressed by employing a sequential strategy in which the ξ's are selected one by one, by maximizing the predictive variance until a specific tolerance is achieved. This approach is guaranteed to produce a space-filling design that is well separated from the original observations. In addition, the only covariance matrix that needs to be updated is the one pertaining to ξ. In Appendix C, we describe how the Cholesky decomposition of the covariance matrix, as well as solutions to the relevant linear systems, can be updated in quadratic time when new design points are added.


Consider θ fixed, and let X_{ξ,d} ∈ R^{(n_ξ + d) × k_ξ} and Y_d ∈ R^{(n_ξ + d) n_s n_t × q} denote the set of ξ's and the corresponding outputs when d design points have been observed. That is,

X_{\xi,d} = \begin{pmatrix} X_\xi \\ \xi_{n_\xi + 1} \\ \vdots \\ \xi_{n_\xi + d} \end{pmatrix}.

For d = 0, we obtain the observed data:

X_{\xi,0} = X_\xi \quad \text{and} \quad Y_0 = Y.

Define B_d ∈ R^{m × q}, H_{ξ,d} ∈ R^{(n_ξ + d) × m_ξ} and A_{ξ,d} ∈ R^{(n_ξ + d) × (n_ξ + d)} to be the weight, design and covariance matrices pertaining to ξ, respectively, when X_{ξ,d} and Y_d have been observed. In order to avoid cluttering the final formulas, let us also define:

a_{\xi,d}(\xi) = \begin{pmatrix} a_\xi(\xi) \\ c_\xi(\xi_{n_\xi + 1}, \xi; \theta_\xi) \\ \vdots \\ c_\xi(\xi_{n_\xi + d}, \xi; \theta_\xi) \end{pmatrix} \in \mathbb{R}^{n_\xi + d},

A_d = A_{\xi,d} \otimes A_s \otimes A_t

and

H_d = H_{\xi,d} \otimes H_s \otimes H_t.

Now, let ξ ∈ X_ξ and Z_ξ ∈ R^{n_s n_t × q} be the output at ξ and all spatial and time points in X_s and X_t. By using Bayes' theorem and Eq. (8), we can show that:

Z_\xi \mid Y_d, \theta \sim \mathcal{T}_{n_s n_t \times q}\big( M_d(\xi),\, C_d(\xi),\, \hat{\Sigma};\; n_d - m \big),   (16)

where n_d = (n_ξ + d) n_s n_t and the mean is given by:

M_d(\xi) = \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) B_d + \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} \big( Y_d - H_d B_d \big)   (17)

and the covariance matrix by:

C_d(\xi) = c_\xi(\xi, \xi; \theta_\xi)\, (A_s \otimes A_t) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)
+ \big[ \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} H_d \big] \big( H_d^T A_d^{-1} H_d \big)^{-1} \big[ \big( h_\xi^T(\xi) \otimes H_s \otimes H_t \big) - \big( a_{\xi,d}(\xi) \otimes A_s \otimes A_t \big)^T A_d^{-1} H_d \big]^T.   (18)

In order to sample Eq. (16), we need to compute the Cholesky decomposition of C_d(ξ). This is not trivial, since C_d(ξ) does not have any particular structure. In the numerical examples, and in particular for the porous flow problem considered in Section 3.2, this matrix turned out to be extremely ill-conditioned. Even though, theoretically, C_d(ξ) is guaranteed to be symmetric positive definite, numerically it must be treated as positive semi-definite. For this reason, one has to use a low-rank approximation of C_d(ξ) obtained with the pivoted Cholesky factorization [32]. This can be carried out using the LAPACK routine dpstrf. The tolerance we used for this approximation was 10^{-3} for all numerical examples we considered. We found no observable difference between samples obtained with the normal Cholesky factorization and this approach.
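For illustration, a low-rank factor of a numerically semi-definite covariance can also be computed with a few lines of plain NumPy; the routine below is our own stand-in for LAPACK's dpstrf, adequate for drawing samples from Eq. (16):

import numpy as np

def pivoted_cholesky(C, tol=1e-3):
    """Low-rank pivoted Cholesky of a symmetric positive semi-definite matrix.

    Returns L with C ~= L @ L.T; iteration stops once the largest remaining
    diagonal of the residual drops below `tol`.
    """
    n = C.shape[0]
    d = np.diag(C).astype(float).copy()     # diagonal of the current residual
    L = np.zeros((n, n))
    for k in range(n):
        i = int(np.argmax(d))
        if d[i] <= tol:
            return L[:, :k]
        L[:, k] = (C[:, i] - L[:, :k] @ L[i, :k]) / np.sqrt(d[i])
        d -= L[:, k] ** 2
    return L

# usage sketch: a matrix-normal approximation of a draw from Eq. (16) is
#   Z = M_d + L @ rng.standard_normal((L.shape[1], q)) @ np.linalg.cholesky(Sigma_hat).T
# with row covariance L L^T ~ C_d(xi) and column covariance Sigma_hat.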

Finally, let us mention that a scalar quantity that is associated directly with the uncertainty pertaining to ξ is given by:

\sigma_d^2(\xi) = \frac{\operatorname{tr} C_d(\xi)\, \operatorname{tr} \hat{\Sigma}}{n_s n_t q}\, p(\xi).   (19)

This is the total variance of all outputs at all the different spatial and time points (normalized by n_s n_t q) and weighted by the input probability distribution p(ξ). The idea is to sequentially augment the data set by including the design points from a dense subset X_ξ^* of X_ξ that maximize Eq. (19), until a pre-specified tolerance is achieved. At that point, the joint predictive mean given by Eq. (17) may be used as an analytic sample surrogate surface. In general, we would like to evaluate the response surface on a denser spatial design X_s^* ∈ R^{n_s^* × k_s} and/or more time steps X_t^* ∈ R^{n_t^*}. The joint predictive mean at those points is given by:

M_d^*(\xi) = \big( h_\xi^T(\xi) \otimes H_s^* \otimes H_t^* \big) B_d + \big( a_{\xi,d}^T(\xi) \otimes A_s^* \otimes A_t^* \big) A_d^{-1} \big( Y_d - H_d B_d \big),   (20)


where H_s^* ∈ R^{n_s^* × m_s} and H_t^* ∈ R^{n_t^* × m_t} are the design matrices that pertain to the test spatial and time points, respectively, while A_s^* ∈ R^{n_s^* × n_s} and A_t^* ∈ R^{n_t^* × n_t} are the corresponding cross-covariance matrices. We identify Eq. (20) as a sample response surface from the function space of possible surrogates. The complete algorithmic details are given in Algorithm 3.

Algorithm 3. Sample a response surface

Require: Observed data X and Y, θ sampled from Eq. (9), a dense set of design points X_ξ^* = {ξ_1^*, ..., ξ_{n_ξ^*}^*}, the desired final tolerance δ > 0, and dense spatial and time designs X_s^* ∈ R^{n_s^* × k_s} and X_t^* ∈ R^{n_t^*} on which we wish to make predictions.
Ensure: After d ≥ 1 steps, the uncertainty of Eq. (16), as captured by Eq. (19), is less than δ, and M_d^*(ξ) given by Eq. (20) can be used as an analytic representation of the sampled response surface.

Initialize d = 0.
repeat
  Find the next design point:

  \xi_{n_\xi + d + 1} = \arg\max_{\xi \in X_\xi^*} \sigma_d^2(\xi).   (21)

  Sample Z_{ξ_{n_ξ + d + 1}} from Eq. (16).
  Augment the set of observations with the pair (ξ_{n_ξ + d + 1}, Z_{ξ_{n_ξ + d + 1}}).
  d ← d + 1.
until σ_d^2(ξ_{n_ξ + d}) < δ.
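A stripped-down version of the sequential augmentation loop looks as follows (our sketch; predictive_var, sample_output and augment stand for Eqs. (19), (16) and the data-update step, and are assumed to be supplied):

import numpy as np

def sample_response_surface(Xi_star, predictive_var, sample_output, augment, delta=1e-2):
    """Algorithm 3, schematically: add design points at the maximum of Eq. (19)
    until the predictive uncertainty drops below `delta`.

    Xi_star        : dense candidate set of xi design points, shape (N, k_xi)
    predictive_var : xi -> sigma_d^2(xi) for the current augmented data set
    sample_output  : xi -> Z_xi, a draw from Eq. (16) at xi
    augment        : (xi, Z_xi) -> None, appends the pair to the data set
    """
    while True:
        variances = np.array([predictive_var(xi) for xi in Xi_star])
        j = int(np.argmax(variances))
        if variances[j] < delta:
            break
        augment(Xi_star[j], sample_output(Xi_star[j]))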

2.3.2. Analytic first-order and second-order statistics

In applications, we are usually interested in first- and second-order statistics. We can obtain a sample of the mean response by integrating out ξ from Eq. (20):

M_d^* := \int M_d^*(\xi)\, p(\xi)\, d\xi.   (22)

It can be easily shown that:

M_d^* = \big( \varepsilon_h^T \otimes H_s^* \otimes H_t^* \big) B_d + \big( \varepsilon_{a,d}^T \otimes A_s^* \otimes A_t^* \big) A_d^{-1} \big( Y_d - H_d B_d \big),   (23)

where

\varepsilon_h = \int h_\xi(\xi)\, p(\xi)\, d\xi \quad \text{and} \quad \varepsilon_{a,d} = \int a_{\xi,d}(\xi)\, p(\xi)\, d\xi.

Now, let i, j ∈ {1, ..., q} be two arbitrary outputs. The covariance matrix between all possible spatial and time test points is defined by:

C_{ij,d}^* := \int \big( M_{d,i}^*(\xi) - M_{d,i}^* \big) \big( M_{d,j}^*(\xi) - M_{d,j}^* \big)^T p(\xi)\, d\xi,   (24)

where the subscripts i and j select columns of the associated matrices. This matrix contains all second-order statistics of the surrogate. For example, the variance of each output i = 1, ..., q on all spatial and time locations X_s^* and X_t^*, respectively, is given by:

V_{i,d}^* := \operatorname{diag}\big( C_{ii,d}^* \big).   (25)

It can be shown, using tensorial notation, that C_{ij,d}^* may be evaluated by:

C_{ij,d}^* = \big( H_s^* \otimes H_t^* \big) B_d^i\, \nu_{hh}\, \big[ \big( H_s^* \otimes H_t^* \big) B_d^j \big]^T
+ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^i\, \nu_{aa,d}\, \big[ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^j \big]^T
+ \big( H_s^* \otimes H_t^* \big) B_d^i\, \nu_{ha,d}\, \big[ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^j \big]^T
+ \big( A_s^* \otimes A_t^* \big) \tilde{Y}_d^i\, \nu_{ah,d}\, \big[ \big( H_s^* \otimes H_t^* \big) B_d^j \big]^T,   (26)

where B_d^i ∈ R^{m_s m_t × m_ξ} is such that vec(B_d^i) = B_{d,i} (i.e. the i-th column of B_d), \tilde{Y}_d ∈ R^{n_d × q} is defined by:

\tilde{Y}_d = A_d^{-1} \big( Y_d - H_d B_d \big),

\tilde{Y}_d^i ∈ R^{n_s n_t × (n_ξ + d)} is such that vec(\tilde{Y}_d^i) = \tilde{Y}_{d,i}, and \nu_{hh} ∈ R^{m_ξ × m_ξ} is given by:


\nu_{hh} = \int \big( h_\xi(\xi) - \varepsilon_h \big) \big( h_\xi(\xi) - \varepsilon_h \big)^T p(\xi)\, d\xi,   (27)

\nu_{aa,d} ∈ R^{(n_ξ + d) × (n_ξ + d)} by:

\nu_{aa,d} = \int \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big) \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big)^T p(\xi)\, d\xi,   (28)

\nu_{ha,d} ∈ R^{m_ξ × (n_ξ + d)} by:

\nu_{ha,d} = \int \big( h_\xi(\xi) - \varepsilon_h \big) \big( a_{\xi,d}(\xi) - \varepsilon_{a,d} \big)^T p(\xi)\, d\xi,   (29)

and \nu_{ah,d} = \nu_{ha,d}^T.
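The integrals in Eqs. (22)-(29) are over p(ξ) only; when they are not available in closed form, they can be estimated once by plain Monte Carlo, as in the sketch below (ours; h_xi and a_xi_d stand for the basis and cross-correlation functions of the text and are assumed to be supplied):

import numpy as np

def moment_matrices(h_xi, a_xi_d, sample_xi, n_mc=10000):
    """Monte Carlo estimates of eps_h, eps_a, nu_hh, nu_aa and nu_ha (Eqs. (22)-(29))."""
    xis = sample_xi(n_mc)                       # draws from p(xi)
    H = np.array([h_xi(x) for x in xis])        # (n_mc, m_xi)
    A = np.array([a_xi_d(x) for x in xis])      # (n_mc, n_xi + d)
    eps_h, eps_a = H.mean(0), A.mean(0)
    Hc, Ac = H - eps_h, A - eps_a
    nu_hh = Hc.T @ Hc / n_mc
    nu_aa = Ac.T @ Ac / n_mc
    nu_ha = Hc.T @ Ac / n_mc
    return eps_h, eps_a, nu_hh, nu_aa, nu_ha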

Fig. 1. KO-2: the thick blue line is the mean of the statistic predicted by our model, while the gray area provides 95% confidence intervals. The first row ((a) and (b)) corresponds to the mean of the response as captured with n_ξ = 100. The second ((c) and (d)) and the last ((e) and (f)) rows show the variance of the response for n_ξ = 100 and n_ξ = 150, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 2. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the PDF of y2(t) for times t = 4, 6, 8, 10. The thick blue line is the mean of the PDF predicted by our model, while the gray area provides 95% confidence intervals. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


3. Numerical examples

3.1. Kraichnan-Orszag three-mode problem

Consider the system of ordinary differential equations [33]:

Fig. 3. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the joint PDF of y2(t) and y3(t) for times t = 4, 6, 8, 10.


dy_1/dt = y_1 y_3,
dy_2/dt = -y_2 y_3,
dy_3/dt = -y_1^2 + y_2^2,

subject to random initial conditions at t = 0. The stochastic initial conditions are defined by:

y_1(0) = 1, \quad y_2(0) = 0.1\,\xi_1, \quad y_3(0) = \xi_2,

where

\xi_i \sim U(-1, 1), \quad i = 1, 2.

Fig. 4. KO-2: the first column corresponds to n_ξ = 70, the second to n_ξ = 100 and the third to n_ξ = 150. Each row depicts the predictive variance of the joint PDF of y2(t) and y3(t) for times t = 4, 6, 8, 10.

This dynamical system is particularly interesting because the response has a discontinuity at ξ_1 = 0. The deterministic solver we use is a fourth-order Runge-Kutta method, as implemented in the GNU Scientific Library [34].
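For readers who want to reproduce the setup without GSL, an equivalent solve in Python (our sketch, using SciPy's explicit Runge-Kutta integrator rather than the authors' GSL code) is:

import numpy as np
from scipy.integrate import solve_ivp

def ko_rhs(t, y):
    """Right-hand side of the Kraichnan-Orszag three-mode problem."""
    y1, y2, y3 = y
    return [y1 * y3, -y2 * y3, -y1**2 + y2**2]

def ko_solve(xi, t_eval):
    """Solve the KO system for initial conditions parameterized by xi in [-1, 1]^2."""
    y0 = [1.0, 0.1 * xi[0], xi[1]]
    sol = solve_ivp(ko_rhs, (0.0, 10.0), y0, t_eval=t_eval, rtol=1e-8, atol=1e-10)
    return sol.y.T                            # shape (n_t, 3): three outputs per time step

t_eval = np.linspace(0.0, 10.0, 20)           # 20 equidistant time steps, as in the text
Y = ko_solve(np.array([0.3, -0.7]), t_eval)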

As is apparent, the input variables ξ represent the initial conditions. We will consider the case of two input dimensions, i.e. k_ξ = 2. The output consists of three distinct variables (q = 3) that are functions of time (k_s = 0). For convenience, we choose to work with a constant prior mean by selecting:

h_\xi(\xi) = 1 \quad \text{and} \quad h_t(t) = 1.

That is, m_ξ = 1 and m_t = 1. We fix n_ξ and gather the input data X_ξ ∈ R^{n_ξ × k_ξ} from a Latin hyper-cube design [35]. We solve the system for the time interval [0, 10] and record the response at 20 equidistant time steps, i.e. X_t ∈ R^{n_t} with n_t = 20. Both X_ξ and X_t are scaled to [0, 1]. The priors are specified by selecting:

c_a = 1/0.05 \quad \text{and} \quad \zeta_a = 10^6, \quad \text{for } a = \xi, t.

This means that we a priori assume that the mean of all length scales is 0.05 and the mean of all the nuggets is 10^{-6}. We train our model for n_ξ = 70, 100 and 150 observations by sampling the posterior of θ = (r_ξ, g_ξ, r_t, g_t) given in Eq. (9), following the Gibbs-MCMC procedure described in Algorithm 1. To initialize the Markov chain, we sample the prior (Eqs. (14) and (15)) of the hyper-parameters 100 times and set θ_0 equal to the sample with maximum posterior probability as defined by Eq. (9). The proposals are selected to be log-normal random walks, and the step size (the same for all types of inputs) is set to 0.01. The chain is well mixed after about 500 iterations of the Gibbs scheme.

After the Markov chain has been sufficiently mixed, we are ready to start making predictions. Predictions are made at 50 equidistant time steps in [0, 10], i.e. X_t^* ∈ R^{n_t^*} with n_t^* = 50. Then, we draw 100 samples from the posterior distribution of the statistics of interest, as described in Algorithm 2, with tolerance δ = 10^{-2}. We plot the mean of the statistics as well as 95% error bars (two times the standard deviation of the statistic).

Fig. 5. Porous flow: samples drawn from the posterior of the hyper-parameters ((a) for r_ξ, (b) for r_s and (c) for the nuggets) for the case of 120 observations. It is apparent that the spatial length scales are clearly identified, while the hyper-parameters of the stochastic variables have a much wider posterior. Of course, this is expected given the limited number of observations.


To calculate the mean of a sampled response surface, we use Eq. (23), while for the variance we use the diagonal of C_{ii,d}^*, i = 1, ..., q (Eq. (26)). One- or two-dimensional probability densities for each sampled response surface are evaluated by the following MC procedure: (1) we draw 10,000 samples from p(ξ); (2) we evaluate the sampled response (Eq. (20)) at each one of these ξ's; (3) we use a one- or two-dimensional kernel density estimator [36] to approximate the desired PDF. The predicted means of all the statistics are practically identical to the ones obtained via a Monte Carlo estimate (not shown in the figures, see [17]). The first row of Fig. 1 shows the time evolution of the mean of y1(t) and y3(t) for n_ξ = 100. Notice that the error bars are very tight. The second and third rows of the same figure depict the variance of the same quantities for n_ξ = 100 and n_ξ = 150, respectively. We can see the width of the error bars decreasing as the number of observations is increased. Fig. 2 shows the time evolution of the probability density of y2(t). The four rows correspond to different time instants (specifically t = 4, 6, 8 and 10). The columns correspond to n_ξ = 70, 100 and 150, counting from the left. Fig. 3 shows the time evolution of the joint probability density of y2(t) and y3(t). The four rows correspond to different time instants (specifically t = 4, 6, 8 and 10). The columns correspond to n_ξ = 70, 100 and 150, counting from the left. The variance of the same joint probability density is shown in Fig. 4.
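Step (3) of the procedure above can be reproduced with any standard kernel density estimator; a minimal version using SciPy (our illustration, with a hypothetical stand-in surrogate) is:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

def surrogate_y2_at_t4(xi):
    """Hypothetical stand-in for one sampled surrogate evaluated at t = 4."""
    return np.sin(3 * xi[:, 0]) + 0.1 * xi[:, 1]

xi_samples = rng.uniform(-1.0, 1.0, size=(10_000, 2))   # step (1): draws from p(xi)
y2_samples = surrogate_y2_at_t4(xi_samples)              # step (2): push through Eq. (20)
pdf = gaussian_kde(y2_samples)                            # step (3): 1-D kernel density estimate
grid = np.linspace(-3, 3, 200)
density = pdf(grid)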

    3.2. Flow through porous media

In this example, we study a two-dimensional, single-phase, steady-state flow through a random permeability field. A good review of the mathematical models of flow through porous media can be found in [37]. The spatial domain X_s is chosen to be the unit square [0, 1]^2, representing an idealized oil reservoir. Let us denote with p and u the pressure and the velocity fields of the fluid, respectively. These are connected via Darcy's law:

u = -K \nabla p, \quad \text{in } X_s,   (30)

Fig. 6. Porous flow: mean of u_x. Subfigures (a)-(c) show the mean of the mean of u_x for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the mean of u_x for 120 observations. Finally, (e) shows the MC estimate of the same quantity using 108,000 observations.


where K is the permeability tensor that models the ease with which the liquid flows through the reservoir. Combining Darcy's law with the continuity equation, it is easy to show that the governing PDE for the pressure is:

-\nabla \cdot (K \nabla p) = f, \quad \text{in } X_s,   (31)

where the source term f may be used to model injection/production wells. We use two model square wells: an injection well on the bottom-left corner of X_s and a production well on the top-right corner. The particular mathematical form is as follows:

f(x_s) = r, \quad \text{if } |x_{s,i} - \tfrac{1}{2} w| < \ldots

\ldots where s_G > 0 is its variance. The values we choose for the parameters are m = 0, k = 0.1 and s_G = 1. In order to obtain a finite-dimensional representation of G, we employ the Karhunen-Loeve expansion [39] and truncate it after k_ξ = 50 terms:

G(\omega; x_s) = m + \sum_{k=1}^{k_\xi} w_k\, w_k(x_s),

where w = (w_1, ..., w_{k_ξ}) is a vector of independent, zero-mean and unit-variance Gaussian random variables and w_k(x_s) are the eigenfunctions of the exponential covariance given in Eq. (34) (suitably normalized, of course).

Fig. 8. Porous flow: mean of p. Subfigures (a)-(c) show the mean of the mean of p for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the mean of p for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


In order to guarantee the analytical calculation of the first-order statistics of p and u of Section 2.3, we choose to work with the uniform random variables

\xi_k = \Phi(w_k) \sim U(0, 1), \quad k = 1, \ldots, k_\xi,

where Φ(·) is the cumulative distribution function of the standard normal distribution. Putting it all together, the finite-dimensional stochastic representation of the permeability field is:

K(\xi; x_s) = \exp\Big\{ m + \sum_{k=1}^{k_\xi} \Phi^{-1}(\xi_k)\, w_k(x_s) \Big\}.   (35)

In order to make the notational connection with the rest of the paper obvious, let us define the response of the physical model as

f: X_\xi \times X_s \to \mathbb{R}^q,

where, of course, X_ξ = [0, 1]^{k_ξ}, X_s = [0, 1]^2 and q = 3. That is,

f(\xi; x_s) = \big( p(\xi; x_s),\, u(\xi; x_s) \big),

where p(ξ; x_s) and u(ξ; x_s) are the solution of the boundary value problem defined by Eqs. (30), (31) and (33) at the spatial point x_s for a permeability field given by Eq. (35). Our purpose is to learn this map and also propagate the uncertainty of the stochastic variables through it by using a finite number of simulations.
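A compact way to evaluate Eq. (35) numerically, given precomputed (normalized) KL eigenfunctions tabulated on the mesh, is sketched below (ours; `eigenfunctions` is a hypothetical array of shape (k_xi, n_nodes)):

import numpy as np
from scipy.stats import norm

def permeability(xi, eigenfunctions, m=0.0):
    """Evaluate K(xi; x_s) of Eq. (35) at every mesh node.

    xi             : uniform random inputs in (0, 1), shape (k_xi,)
    eigenfunctions : w_k(x_s) tabulated on the mesh, shape (k_xi, n_nodes)
    """
    w = norm.ppf(xi)                 # Phi^{-1}(xi_k): back to standard normals
    G = m + w @ eigenfunctions       # truncated Karhunen-Loeve sum, shape (n_nodes,)
    return np.exp(G)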

The boundary value problem is solved using the mixed finite element formulation. We use first-order Raviart-Thomas elements for the velocity [40] and zero-order discontinuous elements for the pressure [41].

Fig. 9. Porous flow: variance of u_x. Subfigures (a)-(c) show the mean of the variance of u_x for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of u_x for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


The spatial domain is discretized using a 64 × 64 triangular mesh. The solver was implemented using the Dolfin C++ library [42]. The eigenfunctions of the exponential random field used to model the permeability were calculated via Stokhos, which is part of Trilinos [43].

For each stochastic input ξ, the response is observed on a regular 32 × 32 square spatial grid. Because of the regular nature of the spatial grid, as well as the separable nature of the Squared Exponential correlation function we use, it can easily be shown that the 1024 × 1024 spatial covariance matrix A_s can be written as

A_s = A_{s,1} \otimes A_{s,2},

where A_{s,i}, i = 1, 2, are 32 × 32 covariance matrices pertaining to the horizontal and vertical spatial directions, respectively. Of course, it is vital to make use of this simplification. The data collected this way are used to train a 3-dimensional Gaussian process, which is then used to make predictions on the same spatial grid. We train our model using, in sequence, 24, 64 and 120 observations of the deterministic solver, in which the stochastic inputs are selected from a Latin hyper-cube design. The prior hyper-parameters are set to:

c_\xi = 1/3, \quad c_s = 1/0.01, \quad \zeta_a = 10^2, \quad \text{for } a = \xi, s.

The initial values of the hyper-parameters used to start the Gibbs procedure are chosen to be the means of the priors. For each training set, we sample the posterior of the hyper-parameters 100,000 times (see Fig. 5 for a representative example). Then, we draw 100 sample surrogates as described in Algorithm 3.

Fig. 10. Porous flow: variance of u_y. Subfigures (a)-(c) show the mean of the variance of u_y for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of u_y for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.


For each sampled surrogate, we calculate the statistics of interest. Finally, we compute and report the mean and the standard deviation of these statistics. The results are compared to Monte Carlo estimates.

Fig. 6 compares the mean of the mean of u_x to a Monte Carlo estimate using 108,000 observations. Two standard deviations of the mean of u_x for the case of 120 observations are shown in subfigure (d). The same statistic for u_y and p is reported in Figs. 7 and 8, respectively. Fig. 9 compares the mean of the variance of u_x to a Monte Carlo estimate using 108,000 observations. Two standard deviations of the variance of u_x for the case of 120 observations are shown in subfigure (d). The same statistic for u_y and p is reported in Figs. 10 and 11, respectively. We observe, especially for the cases of 24 and 64 observations, that the variance is underestimated. Of course, this is to be expected given the very limited set of data available. Fortunately, the error bars seem to compensate for this under-estimation, with the exception of the case that corresponds to the variance of the pressure p.

Fig. 12 depicts the predicted probability densities of ux(0.5,0.5), along with their error bars, for all available training sets. We see that the tails of the probability density are not estimated correctly. In particular, we observe two different types of potential problems. Firstly, the left-hand side puts too much weight on negative values for ux(0.5,0.5), even though it is quite clear (see subfigure (d)) that ux is always positive at that particular spatial location. However, our prior assumption is that the response is a sample from a Gaussian random field. Hence, negative values for ux(0.5,0.5) are very plausible. The model can correct this belief only by observing an adequate number of data points. It is a fact that all 120 observations of ux near (0.5,0.5) are strictly positive. However, these observations are not enough to change the prior belief that ux(0.5,0.5) might

also take negative values. Notice, though, that as we go from (a) to (c), the trend is gradually corrected. If one wanted to incorporate the fact that a quantity of interest is strictly positive, then it is usually recommended to observe the logarithm of the quantity instead.

Fig. 11. Porous flow: variance of p. Subfigures (a)–(c) show the mean of the variance of p for 24, 64 and 120 observations, respectively. Subfigure (d) plots two standard deviations of the variance of p for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 108,000 observations.

Let us now get to the second problem, which has to do with the underestimation of the right tail of the distribution. Let us start by noticing that cases (a) and (b) put enough weight on it. The reason for this is not that there are observations close to this region. It is again a consequence of the Gaussian assumption, just like in the first problem we discussed. However, for the case of 120 observations, we see that the model significantly underestimates the right tail.

Fig. 12. Porous flow: the PDF of ux(0.5,0.5). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Porous flow: the PDF of ux(0.25,0.25). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The reason, of course, is that there is not a single observation in the training set that takes values close to that region. One cannot possibly expect to capture a long tail without observing any events on it. The remedy here is a smarter choice of the observations, along the lines of the active learning techniques that we have investigated in other places [17]. It is needless to say that if the purpose of the practitioner is the investigation of improbable events, then she should favor active learning schemes that have a bias for extreme values. This is clearly beyond the scope of the present work. Finally, Figs. 13–15 show the predicted probability densities of ux(0.25,0.25), p(0.5,0.5) and p(0.25,0.25), respectively. The same comments as for the ux(0.5,0.5) case are also applicable here. The joint probability density of ux(0.5,0.5) and uy(0.5,0.5) is shown in Fig. 16. We observe, again, the underestimation of the top-right long tail of the distribution and the broadening that occurs close to (0,0).

Fig. 14. Porous flow: the PDF of p(0.5,0.5). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 15. Porous flow: the PDF of p(0.25,0.25). The blue lines show the average PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations. The filled gray area corresponds to two standard deviations of the PDFs about the mean PDF. The solid red line of (d) is the Monte Carlo estimate using 10,000 observations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

    4. Conclusions

We developed a multi-output Gaussian process model that explicitly models the linear part of correlations between distinct outputs as well as space and/or time. By exploiting the static nature of the spatial/time inputs as well as the special nature of separable covariance functions, we were able to express all required quantities for inference and predictions in terms of Kronecker products. This led to highly efficient computations both in terms of memory and CPU time. We recognized the fact that the posterior predictive distribution of the Gaussian process defines a probability measure on the function space of possible surrogates, and we described an approximate method that yields kernel-based analytic sample surrogates. The scheme was applied to uncertainty quantification tasks in which we were able to quantify the epistemic uncertainty induced by the limited number of observations.

Fig. 16. Porous flow: the joint PDF of ux(0.5,0.5) and uy(0.5,0.5). The contours show the average joint PDF over 100 sampled surrogates for the cases of 24 (a), 64 (b) and 120 (c) observations, respectively. Subfigure (d) plots two standard deviations of the joint PDF for 120 observations. Finally, subfigure (e) shows the MC estimate of the same quantity using 10,000 observations.

Despite the successes, we observe certain aspects that require further investigation. Firstly, we noticed a systematic underestimation of the tails of the predicted probability densities. Of course, this is expected in a limited-observations setting. However, we are confident that the model can be considerably improved, without losing efficiency, in several different ways. To start with, in the flow through porous media example, we can see that the assumption of stationarity in space is wrong. It is evident that the velocities vary more close to the wells than they do away from them. The stationary covariance in space forces the model to make a compromise in the spatial length scales. On one hand, regions close to the wells seem smoother than necessary while, on the other hand, regions away from them are more wavy. Hence, we expect that using a non-stationary covariance or a tree-based model will improve the situation significantly [17]. Furthermore, it would be very interesting to see how the results would change if a sequential active learning approach were followed for the collection of the observations. It seems plausible that the most effective way to improve the tails of the distributions would be to select observations with extreme properties. A simple variance-based active learning scheme seems inadequate (of course, here we are talking about the case in which the number of observations is kept very small). The particulars of an alternative are a very interesting research topic.

    Acknowledgements

The research at Cornell was supported by an OSD/AFOSR MURI09 award on uncertainty quantification, the US Department of Energy, Office of Science, Advanced Scientific Computing Research and the Computational Mathematics program of the National Science Foundation (NSF) (award DMS-1214282). The research at Pacific Northwest National Laboratory (PNNL) was supported by the Applied Mathematics program of the US DOE Office of Advanced Scientific Computing Research. PNNL is operated by Battelle for the US Department of Energy under Contract DE-AC05-76RL01830. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Additional computing resources were provided by the NSF through TeraGrid resources provided by NCSA under Grant No. TG-DMS090007.

    Appendix A. Kronecker product properties

A.1. Calculating matrix–vector and matrix–matrix products

A.1.1. Matrix–vector product

Let A ∈ R^{m1×n1}, B ∈ R^{m2×n2} and x ∈ R^{n1 n2}. We wish to calculate

y = (A ⊗ B) x ∈ R^{m1 m2},

without explicitly forming the Kronecker product. This may be easily achieved by exploiting the properties of the vectorization operation vec(·) [22]. Let X ∈ R^{n2×n1} be the matrix formed by folding the vector x so that x = vec(X). Then we obtain

y = vec(B X A^T).    (A.1)

So all we need to do is two matrix multiplications. Notice that for the case of triangular A and B no additional memory is required.
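A minimal NumPy sketch of identity (A.1), assuming the column-stacking convention for vec (Fortran order); the random sizes and the check against the explicitly formed Kronecker product are only there for illustration:

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute y = (A kron B) x via Eq. (A.1), without forming A kron B."""
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape((n2, n1), order="F")       # fold x so that x = vec(X)
    Y = B @ X @ A.T                          # two small matrix products
    return Y.reshape(m1 * m2, order="F")     # unfold: y = vec(B X A^T)

# Quick sanity check against the explicit Kronecker product.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
x = rng.standard_normal(4 * 2)
assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
```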

A.1.2. Matrix–matrix product
Let A and B be as before and X ∈ R^{n1 n2×s}. We wish to calculate

Y = (A ⊗ B) X ∈ R^{m1 m2×s}.

This can be trivially calculated by working column by column using the results of the previous subsection.

    A.1.3. Three Kronecker products

Let C ∈ R^{m3×n3} and x ∈ R^{n1 n2 n3}. Then the product

y = (A ⊗ B ⊗ C) x ∈ R^{m1 m2 m3}

can be calculated by observing that

y = vec(C X (A ⊗ B)^T),

where X ∈ R^{n3×n1 n2} is such that x = vec(X). To simplify the expression inside the vectorization operator, let Z = (A ⊗ B) X^T ∈ R^{m1 m2×n3}. This matrix can be calculated using what was described in the previous subsection. Finally, we obtain the following:

y = vec(C Z^T).    (A.2)

Again notice that if all matrices are triangular all operations can be performed without additional memory.
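A self-contained sketch of Eq. (A.2), reusing the two-factor routine of the previous subsection; the sizes are arbitrary and the assert against the explicitly formed triple Kronecker product is only a sanity check:

```python
import numpy as np

def kron2_matvec(A, B, x):
    """(A kron B) x = vec(B X A^T), with X of shape (B.shape[1], A.shape[1])."""
    X = x.reshape((B.shape[1], A.shape[1]), order="F")
    return (B @ X @ A.T).reshape(-1, order="F")

def kron3_matvec(A, B, C, x):
    """y = (A kron B kron C) x via Eq. (A.2): Z = (A kron B) X^T, y = vec(C Z^T)."""
    n1, n2, n3 = A.shape[1], B.shape[1], C.shape[1]
    X = x.reshape((n3, n1 * n2), order="F")
    # The columns of X^T are the rows of X; apply the two-factor product to each.
    Z = np.column_stack([kron2_matvec(A, B, X[j, :]) for j in range(n3)])
    return (C @ Z.T).reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))
C = rng.standard_normal((5, 3))
x = rng.standard_normal(2 * 4 * 3)
assert np.allclose(kron3_matvec(A, B, C, x), np.kron(np.kron(A, B), C) @ x)
```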


    A.2. Solving linear systems

Now let A ∈ R^{m×m}, B ∈ R^{n×n} and y ∈ R^{mn}. We wish to solve the linear system

(A ⊗ B) x = y

for x ∈ R^{mn}. Let X ∈ R^{n×m} and Y ∈ R^{n×m} be such that x = vec(X) and y = vec(Y), respectively. Using, again, the properties of the vectorization operator, we obtain

B X A^T = Y.

Therefore, we can find X by solving two linear systems:

B Z = Y,    (A.3)
A X^T = Z^T.    (A.4)

If A and B are triangular matrices, then no additional memory is required.
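A minimal sketch of (A.3)–(A.4) in NumPy; it uses general solves for brevity (an implementation working with Cholesky factors would use triangular solves instead), and the small diagonally dominant random matrices are only there to exercise the identity:

```python
import numpy as np

def kron_solve(A, B, y):
    """Solve (A kron B) x = y via B Z = Y and A X^T = Z^T, Eqs. (A.3)-(A.4)."""
    m, n = A.shape[0], B.shape[0]
    Y = y.reshape((n, m), order="F")
    Z = np.linalg.solve(B, Y)          # (A.3)
    X_T = np.linalg.solve(A, Z.T)      # (A.4)
    return X_T.T.reshape(m * n, order="F")

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((4, 4)) + 4 * np.eye(4)
y = rng.standard_normal(3 * 4)
x = kron_solve(A, B, y)
assert np.allclose(np.kron(A, B) @ x, y)
```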

Finally, let C ∈ R^{s×s} be another matrix and y ∈ R^{nms}. We wish to solve the linear system

(A ⊗ B ⊗ C) x = y

for x ∈ R^{nms}. Let X ∈ R^{s×mn} and Y ∈ R^{s×mn} be such that x = vec(X) and y = vec(Y), respectively. Then

C X (A ⊗ B)^T = Y.

We start by solving the system

C Z = Y

and then solve the system

(A ⊗ B) X^T = Z^T,

using the results of the previous paragraph on each of the rows of X and Z.

    Appendix B. Implementation details

Given a set of hyper-parameters θ, the various statistics may be evaluated efficiently in the following sequence:

1. Compute the Cholesky decomposition of all covariance matrices:

   A_ξ = L_ξ L_ξ^T,   A_s = L_s L_s^T,   A_t = L_t L_t^T.

2. Scale the outputs by solving in place the linear system:

   (L_ξ ⊗ L_s ⊗ L_t) Ỹ = Y.

3. Scale the design matrices by solving in place the linear systems:

   L_ξ H̃_ξ = H_ξ,   L_s H̃_s = H_s,   L_t H̃_t = H_t.

4. Calculate the QR factorizations of the scaled design matrices:

   H̃_ξ = Q_ξ R_ξ,   Q_ξ = [Q_{ξ,1}, Q_{ξ,2}],   R_ξ = [R_{ξ,1}^T, 0]^T,
   H̃_s = Q_s R_s,   Q_s = [Q_{s,1}, Q_{s,2}],   R_s = [R_{s,1}^T, 0]^T,
   H̃_t = Q_t R_t,   Q_t = [Q_{t,1}, Q_{t,2}],   R_t = [R_{t,1}^T, 0]^T,

   where, for a = ξ, s, t, Q_{a,1} ∈ R^{n_a×m_a}, Q_{a,2} ∈ R^{n_a×(n_a−m_a)} and R_{a,1} ∈ R^{m_a×m_a} is upper triangular.

5. Find B by solving in place the upper triangular system:

   (R_{ξ,1} ⊗ R_{s,1} ⊗ R_{t,1}) B = (Q_{ξ,1} ⊗ Q_{s,1} ⊗ Q_{t,1})^T Ỹ.

6. Calculate (by doing n rank-1 updates) R:

   R = (1/(n − m)) [Ỹ − (H̃_ξ ⊗ H̃_s ⊗ H̃_t) B]^T [Ỹ − (H̃_ξ ⊗ H̃_s ⊗ H̃_t) B].

7. Calculate the Cholesky decomposition of R:

   R = L_R L_R^T.

8. Now we can evaluate all the determinants involved in the posterior of θ:

   log|A_ξ| = 2 Σ_{i=1}^{n_ξ} log L_{ξ,ii},
   log|A_s| = 2 Σ_{i=1}^{n_s} log L_{s,ii},
   log|A_t| = 2 Σ_{i=1}^{n_t} log L_{t,ii},
   log|H_ξ^T A_ξ^{-1} H_ξ| = 2 Σ_{i=1}^{m_ξ} log |R_{ξ,1,ii}|,
   log|H_s^T A_s^{-1} H_s| = 2 Σ_{i=1}^{m_s} log |R_{s,1,ii}|,
   log|H_t^T A_t^{-1} H_t| = 2 Σ_{i=1}^{m_t} log |R_{t,1,ii}|,
   log|R| = 2 Σ_{i=1}^{q} log L_{R,ii},
   log|A| = (n/n_ξ) log|A_ξ| + (n/n_s) log|A_s| + (n/n_t) log|A_t|,
   log|H^T A^{-1} H| = (m/m_ξ) log|H_ξ^T A_ξ^{-1} H_ξ| + (m/m_s) log|H_s^T A_s^{-1} H_s| + (m/m_t) log|H_t^T A_t^{-1} H_t|.
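The following Python sketch exercises steps 1–5 and the determinant identity of step 8 on deliberately tiny matrices. For clarity it forms the small Kronecker products explicitly (a real implementation would use the vec tricks of Appendix A), and all sizes, random matrices and final checks are purely illustrative assumptions, not part of the actual implementation.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)

def rand_spd(k):
    """A random symmetric positive definite matrix (illustration only)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

# Hypothetical small sizes: n_a points and m_a basis functions per input group.
n_xi, n_s, n_t = 5, 4, 3
A_xi, A_s, A_t = rand_spd(n_xi), rand_spd(n_s), rand_spd(n_t)
H_xi = rng.standard_normal((n_xi, 2))
H_s = rng.standard_normal((n_s, 2))
H_t = rng.standard_normal((n_t, 1))
Y = rng.standard_normal((n_xi * n_s * n_t, 2))   # q = 2 output columns

# Steps 1-3: Cholesky factors, whitened outputs and whitened design matrices.
L_xi, L_s, L_t = (np.linalg.cholesky(M) for M in (A_xi, A_s, A_t))
L = np.kron(np.kron(L_xi, L_s), L_t)             # small enough to form explicitly here
Yw = solve_triangular(L, Y, lower=True)
Hw = [solve_triangular(Lf, Hf, lower=True)
      for Lf, Hf in ((L_xi, H_xi), (L_s, H_s), (L_t, H_t))]

# Steps 4-5: thin QR of the whitened design matrices (only the Q_{a,1}, R_{a,1}
# blocks are needed) and the upper triangular solve for B.
qr = [np.linalg.qr(Hf) for Hf in Hw]
Q1 = np.kron(np.kron(qr[0][0], qr[1][0]), qr[2][0])
R1 = np.kron(np.kron(qr[0][1], qr[1][1]), qr[2][1])
B = solve_triangular(R1, Q1.T @ Yw)

# Check against the dense generalized least-squares solution.
A = np.kron(np.kron(A_xi, A_s), A_t)
H = np.kron(np.kron(H_xi, H_s), H_t)
B_dense = np.linalg.solve(H.T @ np.linalg.solve(A, H), H.T @ np.linalg.solve(A, Y))
assert np.allclose(B, B_dense)

# Step 8: log|A| from the small factors only (Kronecker determinant identity).
n = n_xi * n_s * n_t
log_det_A = sum((n // k) * 2.0 * np.sum(np.log(np.diag(Lf)))
                for k, Lf in ((n_xi, L_xi), (n_s, L_s), (n_t, L_t)))
assert np.isclose(log_det_A, np.linalg.slogdet(A)[1])
```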

    Appendix C. Fast Cholesky updates

This section is concerned with updating the Cholesky decomposition of a covariance matrix in O(n^2) time when a new data point is observed. The updates are useful on two occasions:

1. When we are doing active learning without updating the hyper-parameters.
2. When we wish to sample sequentially the joint distribution in order to obtain a response surface (see Section 2.3).

Let A_n ∈ R^{n×n} be a symmetric positive definite matrix and assume that we have already calculated its Cholesky decomposition L_n ∈ R^{n×n} (lower triangular). Now let A_{n+m} ∈ R^{(n+m)×(n+m)} be another symmetric positive definite matrix (e.g. the one we obtain if we observe m new data points). In particular, let it be given by

A_{n+m} = [ A_n  B ; B^T  C ],

where B ∈ R^{n×m} (e.g. cross covariance) and C ∈ R^{m×m} (e.g. covariance matrix of the new data). Let L_{n+m} ∈ R^{(n+m)×(n+m)} be the lower triangular Cholesky factor of A_{n+m} (i.e. A_{n+m} = L_{n+m} L_{n+m}^T). It is convenient to represent it in the matrix block form

L_{n+m} = [ D_11  0_{n×m} ; D_21  D_22 ],

where D_11 ∈ R^{n×n}, D_21 ∈ R^{m×n} and D_22 ∈ R^{m×m}. We will derive formulas for the efficient calculation of the D_ij based on the Cholesky decomposition of A_n. Notice that


A_{n+m} = L_{n+m} L_{n+m}^T
⟹ [ A_n  B ; B^T  C ] = [ D_11  0_{n×m} ; D_21  D_22 ] [ D_11  0_{n×m} ; D_21  D_22 ]^T
⟹ [ L_n L_n^T  B ; B^T  C ] = [ D_11 D_11^T  D_11 D_21^T ; D_21 D_11^T  D_21 D_21^T + D_22 D_22^T ].

From the above equation, we see right away that D_11 = L_n. D_21 can be found by solving the triangular system L_n D_21^T = B, and finally D_22 is the lower triangular Cholesky factor of C − D_21 D_21^T. To wrap it up, here is how the update should be performed:

1. Set D_11 = L_n.
2. Solve the following system for D_21: L_n D_21^T = B.
3. Compute the Cholesky decomposition of D_22 D_22^T = C − D_21 D_21^T to find D_22.

For the special (but common in practice) case where m = 1, D_21 is a vector and C and D_22 are numbers, and step 3 can be replaced by

D_22 = sqrt(C − D_21 D_21^T).
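A small NumPy/SciPy sketch of the block update (for general m; when m = 1 the last step reduces to the square root above). The 6 × 6 test matrix and the comparison against a full re-factorization are purely illustrative:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def chol_update(L_n, B, C):
    """Grow the lower Cholesky factor L_n of A_n to that of [[A_n, B], [B^T, C]]."""
    n, m = B.shape
    D21 = solve_triangular(L_n, B, lower=True).T        # solve L_n D_21^T = B
    D22 = cholesky(C - D21 @ D21.T, lower=True)          # Cholesky of the small block
    top = np.hstack([L_n, np.zeros((n, m))])
    bottom = np.hstack([D21, D22])
    return np.vstack([top, bottom])

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A_full = M @ M.T + 6 * np.eye(6)
A_n, B, C = A_full[:4, :4], A_full[:4, 4:], A_full[4:, 4:]
L_n = np.linalg.cholesky(A_n)
assert np.allclose(chol_update(L_n, B, C), np.linalg.cholesky(A_full))
```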

We can also update solutions of linear systems. Suppose that we have already solved the system

L_n x_n = y_n

and, after observing m more data points, we need to solve

L_{n+m} x_{n+m} = y_{n+m}.

If we let

y_{n+m} = [ y_n ; z ],

it is trivial to show that x_{n+m} is given by

x_{n+m} = [ x_n ; D_22^{-1} (z − D_21 x_n) ].
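A short sketch of this solution update, reusing the blocks of the grown Cholesky factor; the small sizes and the random right-hand side are assumptions made only for the illustration:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
L_full = np.linalg.cholesky(M @ M.T + 5 * np.eye(5))      # plays the role of L_{n+m}
n = 3
L_n, D21, D22 = L_full[:n, :n], L_full[n:, :n], L_full[n:, n:]
y = rng.standard_normal(5)
y_n, z = y[:n], y[n:]

x_n = solve_triangular(L_n, y_n, lower=True)               # previously available solution
x_new = solve_triangular(D22, z - D21 @ x_n, lower=True)   # only the appended block is solved
x_full = np.concatenate([x_n, x_new])
assert np.allclose(x_full, solve_triangular(L_full, y, lower=True))
```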

    References

[1] I. Babuška, F. Nobile, R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM Journal of Numerical Analysis 45 (3) (2007) 1005–1034.
[2] D. Xiu, G.E. Karniadakis, The Wiener–Askey polynomial chaos for stochastic differential equations, Journal of Scientific Computing (24) (2002) 619–644.
[3] S.A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, in: Dokl. Akad. Nauk SSSR, vol. 4, 1963, p. 123.
[4] D. Xiu, J.S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM Journal on Scientific Computing 27 (3) (2005) 1118–1139.
[5] D. Xiu, Efficient collocational approach for parametric uncertainty analysis, Communications in Computational Physics 2 (2) (2007) 293–309.
[6] F. Nobile, R. Tempone, C. Webster, A sparse grid collocation method for elliptic partial differential equations with random input data, SIAM Journal of Numerical Analysis 45 (2008) 2309–2345.
[7] X. Ma, N. Zabaras, An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations, Journal of Computational Physics 228 (8) (2009) 3084–3113.
[8] C. Currin, T. Mitchell, M. Morris, D. Ylvisaker, A Bayesian approach to the design and analysis of computer experiments, Tech. Rep. ORNL-6498, Oak Ridge Laboratory, 1988.
[9] J. Sacks, W.J. Welch, T.J. Mitchell, H.P. Wynn, Design and analysis of computer experiments, Statistical Science 4 (4) (1989) 409–435.
[10] C. Currin, T. Mitchell, M. Morris, D. Ylvisaker, Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments, Journal of the American Statistical Association 86 (416) (1991) 953–963.
[11] W.J. Welch, R.J. Buck, J. Sacks, H.P. Wynn, T.J. Mitchell, M.D. Morris, Screening, predicting, and computer experiments, Technometrics 34 (1) (1992) 15–25.
[12] A. O'Hagan, M.C. Kennedy, J.E. Oakley, Uncertainty analysis and other inference tools for complex computer codes, in: J.M. Bernardo et al. (Eds.), Bayesian Statistics 6, Oxford University Press, 1999, pp. 503–524.
[13] J. Oakley, A. O'Hagan, Bayesian inference for the uncertainty distribution of computer model outputs, Biometrika 89 (4) (2002) 769–784.
[14] M.C. Kennedy, A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (3) (2001) 425–464.
[15] D. Higdon, J. Gattiker, B. Williams, M. Rightley, Computer model calibration using high-dimensional output, Journal of the American Statistical Association 103 (482) (2008) 570–583.


[16] R.B. Gramacy, H.K.H. Lee, Bayesian treed Gaussian process models with an application to computer modeling, Journal of the American Statistical Association 103 (483) (2008) 1119–1130.
[17] I. Bilionis, N. Zabaras, Multi-output local Gaussian process regression: applications to uncertainty quantification, Journal of Computational Physics 231 (17) (2012) 5718–5746.
[18] S. Conti, A. O'Hagan, Bayesian emulation of complex multi-output and dynamic computer models, Journal of Statistical Planning and Inference 140 (3) (2010) 640–651.
[19] M.D. McKay, R.J. Beckman, W.J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 21 (2) (1979) 239–245.
[20] D.J. MacKay, Information-based objective functions for active data selection, Neural Computation 4 (1992) 590–604.
[21] A.P. Dawid, Some matrix-variate distribution theory: notational considerations and a Bayesian application, Biometrika 68 (1) (1981) 265–274.
[22] J. Magnus, H. Neudecker, Matrix Differential Calculus With Applications in Statistics and Econometrics, Wiley Series in Probability and Statistics, John Wiley, 1999.
[23] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[24] M. Goulard, M. Voltz, Linear coregionalization model: tools for estimation and choice of cross-variogram matrix, Mathematical Geology 24 (1992) 269–286.
[25] G. Bourgault, D. Marcotte, Multivariable variogram and its application to the linear model of coregionalization, Mathematical Geology 23 (1991) 899–928.
[26] S. De Iaco, D. Myers, D. Posa, The linear coregionalization model and the product–sum space–time variogram, Mathematical Geology 35 (2003) 25–38.
[27] C.F. van Loan, The ubiquitous Kronecker product, Journal of Computational and Applied Mathematics 123 (1–2) (2000) 85–100.
[28] G. Casella, E.I. George, Explaining the Gibbs sampler, The American Statistician 46 (3) (1992) 167–174.
[29] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092.
[30] W.K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97–109.
[31] R.B. Gramacy, H.K. Lee, Cases for the nugget in modeling computer experiments, Statistics and Computing 22 (3) (2012) 713–722.
[32] H. Harbrecht, M. Peters, R. Schneider, On the low-rank approximation by the pivoted Cholesky decomposition, Applied Numerical Mathematics 62 (4) (2012) 428–440.
[33] X. Wan, G.E. Karniadakis, An adaptive multi-element generalized polynomial chaos method for stochastic differential equations, Journal of Computational Physics 209 (2005) 617–642.
[34] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, P. Alken, M. Booth, F. Rossi, GNU Scientific Library Reference Manual, 2009.
[35] M.D. McKay, R.J. Beckman, W.J. Conover, A comparison of three methods for selecting values of input variables in the analysis of output from a computer code, Technometrics 42 (1) (2000) 202–208.
[36] A.W. Bowman, A. Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations, Oxford Statistical Science Series, Oxford University Press, USA, 1997.
[37] J.E. Aarnes, V. Kippe, K.-A. Lie, A.B. Rustad, Modelling of Multiscale Structures in Flow Simulations for Petroleum Reservoirs, in: G. Hasle, K.-A. Lie, E. Quak (Eds.), Geometric Modelling, Numerical Simulation, and Optimization, Springer, Berlin, Heidelberg, 2007, pp. 307–360 (Ch. 10).
[38] P. Bochev, R.B. Lehoucq, On finite element solution of the pure Neumann problem, SIAM Review 47 (2001) 50–66.
[39] R.G. Ghanem, P.D. Spanos, Stochastic Finite Elements: A Spectral Approach, Springer-Verlag, New York, 1991.
[40] P. Raviart, J. Thomas, A mixed finite element method for 2nd order elliptic problems, in: I. Galligani, E. Magenes (Eds.), Mathematical Aspects of Finite Element Methods, Lecture Notes in Mathematics, vol. 606, Springer, Berlin/Heidelberg, 1977, pp. 292–315 (Ch. 19).
[41] F. Brezzi, T.J.R. Hughes, L.D. Marini, A. Masud, Mixed discontinuous Galerkin methods for Darcy flow, Journal of Scientific Computing 22–23 (1) (2005) 119–145.
[42] A. Logg, G.N. Wells, J. Hake, DOLFIN: a C++/Python Finite Element Library, Springer, 2012 (Ch. 10).
[43] M.A. Heroux, R.A. Bartlett, V.E. Howle, R.J. Hoekstra, J.J. Hu, T.G. Kolda, R.B. Lehoucq, K.R. Long, R.P. Pawlowski, E.T. Phipps, A.G. Salinger, H.K. Thornquist, R.S. Tuminaro, J.M. Willenbring, A. Williams, K.S. Stanley, An overview of the Trilinos project, ACM Transactions on Mathematical Software 31 (3) (2005) 397–423.
