PARAMETER ESTIMATION
IN PARABOLIC AND HYPERBOLIC EQUATIONS
by
Finbarr O'Sullivan
TECHNICAL REPORT No. 127
May 1988
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Parameter Estimation in Parabolic and Hyperbolic Equations
Finbarr O'Sullivan
Department of Biostatistics and Statistics, University of Washington
Seattle, WA 98195.
ABSTRACT
Two model ill-posed inverse problems associated with the estimation of functional parameters in 1-dimensional parabolic and hyperbolic equations are considered. These estimation problems are approached from a statistical perspective which casts the estimation problem in terms of generalized non-linear regression. A standard regularization method is studied. The role of the sensitivity matrix for statistical inference is very important; however, the evaluation of this matrix involves the numerical solution of a system of coupled parabolic or hyperbolic equations. This is a large scale supercomputing problem, but worth tackling since the sensitivity matrix is useful in the analysis of a number of practically important problems including: fully adaptive choice of the amount of regularization, interval estimates for parameters, and resolution analysis. Numerical experiments are used to justify each of these techniques. The paper includes a discussion of the feasibility of applying similar methods in higher dimensional settings.
AMS 1980 subject classifications. Primary 62-G05; secondary 62-J05, 41-A35, 41-A25, 47-A53, 45-L10, 45-M05.
Key words and phrases. Constrained non-linear regularization, cross-validation, Gauss-Newton algorithm, ill-posed inverse problems, interval estimates, resolution characteristics, simulation, supercomputing.
Running Head: Distributed Parameter Estimation
May 6, 1988
1 Research supported in part by the National Science Foundation under Grant No. MCS-840-3239 and by the Department of Energy under Grant No. DE-FG06-85ER25006. Some of the work was done during a stay at the Institute for Mathematics and its Applications, University of Minnesota.
Parameter Estimation in Parabolic and Hyperbolic Equations
Finbarr O'Sullivan¹
Department of Biostatistics and Statistics
University of Washington
Seattle, WA 98195.
1. Introduction
Estimation problems associated with the identification of functional parameters in partial
differential equations arise in several fields including diffraction tomography, reservoir
engineering and seismology. Several such problems are discussed in [7, 8] for example. A
generalized framework for a large class of these problems is outlined in [11, 16]: U, F and Θ are
function spaces, and there is a system operator A such that for each f in F and θ in a subset C of Θ
there is a locally unique u in U satisfying

A(u, θ) = f .  (1.1)

The inverse system identification problem is to estimate θ from measured information about u
(f is taken to be known). Let X_n be a linear mapping on U, X_n : U → R^n; we are given a vector
of measurements z,

z = X_n u + e_n  (1.2)

where e_n is a vector of measurement or possible modeling errors. The estimation problem
for θ can be cast in the form of a non-linear non-parametric regression. Equation (1.1) gives an
implicit representation of u in terms of θ, so that the measurements
can thus be regarded as non-linear functionals of θ. The data are written as

z_i = u(x_i; θ) + ε_i , i = 1, 2, ..., n  (1.3)

where the design points x_i lie in some set I. The ε_i are the measurement/modeling errors. Since
θ is high dimensional and the data are discrete and noisy, the estimation problem is ill-posed and
some form of regularization is required to obtain reliable solutions. Various forms of
regularization have been proposed for estimating θ, see [3, 11] for example. A constrained non-
linear least squares regularization estimator is defined as

θ̂_λ = argmin over θ in C of { Σ_{i=1}^{n} [z_i − u(x_i; θ)]² + λ J(θ) } .  (1.4)

λ > 0 is the regularization parameter and J is an appropriate penalty functional, see [25] for a
discussion of possible penalty functions. Quadratic J functions based on L2 norms of derivatives
are familiar in the statistical spline smoothing literature and Gaussian state space representations
have been used to motivate such functions from a Bayesian viewpoint [10]. Non-quadratic J
functions involving L1 norms of derivatives for example have also been proposed [20]; a Bayesian
motivation using non-Gaussian state space representations is available for these J functions also.
In general C is some subset of the parameter space Θ. This might represent constraints such as
positivity, monotonicity or convexity. In some circumstances it might be appropriate to replace
the residual sum of squares term in (1.4) by a more robust alternative.
For λ equal to zero, exact identifiability of θ from the discrete data must be relied upon. For λ greater than zero the situation
is much less problematic since, intuitively, the data need only
identify θ up to the smoothing imposed by the penalty functional, see section 3.
Theoretical convergence characteristics of constrained non-linear least squares estimators with
quadratic penalty functionals are studied in [5, 16]. The identifiability of θ when λ is zero and
the continuity of the sampling functionals play an important role in that theory, which provides some theoretical
support for estimation within the general system identification framework.
1.1. Model Problems
(i) Parabolic Diffusion Equation
The temperature at time t and position x on a 1-dimensional rod with unknown heat
conductance characteristics is denoted u(x, t). An initial heat distribution is placed on the rod
and by observing what happens to the distribution of heat in time one hopes to gain information
about the heat conductance θ of the rod. The system is governed by the differential equation

∂u/∂t − ∂/∂x [θ(x) ∂u/∂x] = 0 , 0 ≤ x ≤ 1 , 0 ≤ t ≤ T ,  (1.5)

with initial/boundary conditions

u(x, 0) = u₀(x)
∂u/∂x (x, t) = 0 for x = 0, 1.

u₀ is the initial heat distribution and the boundary condition says that no heat escapes from the
system. The history matching problem of reservoir engineering generalizes this to 3 dimensions,
see [11].
(ii) Hyperbolic Wave Equation
Here u(x, t) represents a local displacement at time t and position x along a rod with
unknown sound transmission characteristics. An explosion is set off at one end of the rod which
causes the rod to vibrate. By observing the displacement pattern at various positions along
the rod (seismograms) one hopes to get information about the sound transmission characteristics
of the material of the rod. The system is modelled by an equation of the form

∂²u/∂t² − ∂/∂x [θ(x) ∂u/∂x] = 0 , 0 ≤ x ≤ 1 , 0 ≤ t ≤ T ,  (1.6)

with u(x, 0) = ∂u/∂t (x, 0) = 0, ∂u/∂x (0, t) = f(t) and u(1, t) = 0.
The initial conditions say that the rod is initially at rest. The explosion is set off at x = 0 and is
described by f(t); the right hand boundary is held fixed. We are interested in the estimation of θ.
The problem of reflection seismology is of this type [20]. Seismologists are interested in
two and three dimensional versions of problems of this type.
The behavior of the above systems with specified values for the initial conditions and
model parameters is illustrated in Figures 1.1.1-1.2.2. Model parameters and initial conditions are
given in Figures 1.1.1 and 1.2.1. Figures 1.1.2 and 1.2.2 give gray-scale image plots of u(x, t)
(heavier shading corresponds to larger values of u(x, t)). For the parabolic system the initial
temperature distribution is quickly smoothed out in time. In the hyperbolic system the initial
explosion sets off a wave which is bounced back and forth from one end of the rod to the other;
the wave pattern spreads out in time. The measured data in both problems are time series of the
solution characteristics observed at a finite number of distinct locations along the rod.
It is easy to see how the non-linear non-parametric regression framework in (1.3) applies to
the model problems. The measurements are

z_ij = z(x_i, t_j) = u(x_i, t_j; θ) + ε_ij , i = 1, 2, ..., n , j = 1, 2, ..., m .  (1.7)

Here we explicitly indicate the dependence of u on the underlying profile θ; the u(x_i, t_j; θ) are non-
linear functionals of the functional parameter θ. Later in the paper we do numerical studies in
which data are generated according to (1.7) with the ε_ij independent Gaussian random
variables.
1.2. The Sensitivity Matrix

For statistical inference the model sensitivity matrix is a most important quantity. If the
profile is approximated by a p-dimensional finite element approximation with coefficients
θ = (θ₁, θ₂, ..., θ_p)' (p may be arbitrarily large) the model sensitivity matrix is defined as

X_(ij),k = ∂u(x_i, t_j; θ)/∂θ_k , i = 1, 2, ..., n , j = 1, 2, ..., m , k = 1, 2, ..., p .  (1.8)

The sensitivity matrix determines a first order linearization of the model and in the standard
non-linear least squares framework the matrix X'X is important for: (a) evaluating the estimator
by the Gauss-Newton algorithm, (b) generation of approximate interval estimates, (c) comparison
of experimental designs, and (d) understanding asymptotic estimation characteristics. The X'X
matrix plays a very similar role in the analysis of regularization procedures but in addition it can
also be used to adaptively choose an appropriate value for the regularization parameter λ. This is
illustrated in section 4.
1.3. Outline
The paper is organized as follows. A Gauss-Newton algorithm for computing the
regularization estimator is described in section 2. Practical convergence characteristics of the
algorithm are discussed. Section 3 gives details on the computation of the sensitivity matrix.
The method is naturally suited to parallel processing. Some applications of X'X are given in
section 4, including adaptive choice of the regularization parameter and interval estimates for
parameters. A straightforward procedure for the analysis of resolution characteristics which uses
X'X is described in [15]. The method is based on a one-step Gauss-Newton expansion; some asymptotic analysis of this
has been presented in [16]. In section 5 we present a simulation study which shows
that resolution analysis based on X'X works well in finite samples. This has obvious practical value.

2. Computing the Regularization Estimator
The minimization of the regularization functional in (1.4) could be approached by either
first or second order gradient algorithms, and it is likely that a combination of first and second
order gradient algorithms would prove most efficient. First order gradient methods have been
proposed for instance in [2, 11]. A powerful result which comes from optimal control theory is
that the gradient of the regularization functional can be computed by solving the original
differential equation (state equation) together with one associated adjoint equation, see [2, 11] for
example. Second order gradient algorithms use Hessian information and as a result the
computational effort per iteration is much greater than for first order methods. In practice this has
to be balanced against available computing resources and the fact that second order methods can
achieve higher order rates of convergence. Here we will focus on the Gauss-Newton algorithm,
which is a very well known second order gradient method for solving non-linear least squares
problems. The Gauss-Newton algorithm requires the sensitivity matrix, which also plays a crucial
role in statistical inference and resolution analysis, see sections 4 and 5.
2.1. Algorithm
For numerical computation we approximate the parameter by a linear combination of
approximating elements

θ(x) = Σ_{k=1}^{p} θ_k φ_k(x)  (2.1)

where the φ_k are cubic B-spline elements defined with respect to a uniform knot sequence on (0, 1) with
multiple knots at 0 and 1, see [6]. The number of elements p may be quite large relative
to the resolution in the data, so the approximation error in (2.1) will not be of concern.
In higher dimensions tensor product B-spline or finite elements could be employed.
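For concreteness, a minimal sketch of such a basis built with scipy is given below. The knot construction (full multiplicity at 0 and 1, uniform interior knots) is one standard realization, and the helper names are ours, not the paper's.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(p, k=3):
    """Cubic B-spline basis on [0,1]: uniform interior knots and
    full-multiplicity knots at 0 and 1, giving p basis elements."""
    interior = np.linspace(0, 1, p - k + 1)[1:-1]     # p - k - 1 interior knots
    knots = np.r_[np.zeros(k + 1), interior, np.ones(k + 1)]
    return [BSpline.basis_element(knots[i:i + k + 2], extrapolate=False)
            for i in range(p)]

def theta_fn(coef, basis):
    """theta(x) = sum_k coef_k * phi_k(x), as in (2.1)."""
    def theta(x):
        vals = np.array([np.nan_to_num(b(x)) for b in basis])
        return coef @ vals
    return theta
```

With p = 30 this yields a basis of the size used in the numerical examples of section 4.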
The discretized regularization criterion becomes

l_λ(θ) = Σ_{i=1}^{n} Σ_{j=1}^{m} [z_ij − u(x_i, t_j; θ)]² + λ θ'Ωθ  (2.2)

where θ = (θ₁, θ₂, ..., θ_p)' and Ω comes from the quadratic penalty term. Throughout the
remainder of the paper

J(θ) = ∫₀¹ [θ̈(x)]² dx  (2.3)

so Ω is given by

Ω_kl = ∫₀¹ φ̈_k(x) φ̈_l(x) dx , k, l = 1, 2, ..., p .  (2.4)
With cubic B-splines, Ω becomes a symmetric seven-banded matrix. To compute the regularized
estimator we minimize l_λ(θ) subject to the constraint that the solution be positive. Positivity is
enforced by requiring that the coefficients be positive; since the individual B-splines are positive,
see [6], the positivity of the solution is then guaranteed. The standard damped Gauss-Newton
algorithm leads to the iteration scheme

θ^(s+1) = θ^(s) − τ · [X'X + λΩ]⁻¹ [−X'r + λΩθ^(s)]  (2.5)

for s = 0, 1, 2, ... . τ is a step size which is chosen so that l_λ is minimized subject to θ^(s+1)
satisfying the constraints. A step size halving strategy is used. In our experiments we have very
rarely found it necessary to explicitly enforce the positivity constraint. X is the sensitivity matrix

X_(ij),k = ∂u(x_i, t_j; θ^(s))/∂θ_k  (2.6)

and r is the current residual,

r_ij = z_ij − u(x_i, t_j; θ^(s))  (2.7)

for i = 1, 2, ..., n , j = 1, 2, ..., m , and k = 1, 2, ..., p. Convergence is declared when successive
iterates produce essentially no change in the objective function; at convergence θ̂_λ is set to the
final iterate. The dominant cost in each iteration is the evaluation of the sensitivity matrix with respect to the p
coefficients. X is a very large array but fortunately it turns out that X does not have to be stored
in memory, rather it is only necessary to store X'X and X'r explicitly. This is described in
section 3. Before going on to this we will spend a bit of time discussing the practical
performance of the algorithm.
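The sketch below shows one way the iteration (2.5) with step halving might be organized; `model` is a hypothetical callable standing in for the PDE and sensitivity solvers of section 3, and the stopping rule follows (2.8) below.

```python
import numpy as np

def damped_gauss_newton(theta0, model, lam, Omega, z, tol=1e-3, max_iter=50):
    """Damped Gauss-Newton iteration (2.5) for the criterion (2.2).

    model(theta) is assumed to return (u, X): model values at the
    measurement sites and the sensitivity matrix (1.8)."""
    def crit(th):
        u, _ = model(th)
        return np.sum((z - u) ** 2) + lam * th @ Omega @ th

    theta = theta0.copy()
    l_old = crit(theta)
    for _ in range(max_iter):
        u, X = model(theta)
        r = z - u
        # Gauss-Newton direction from (2.5)
        step = np.linalg.solve(X.T @ X + lam * Omega,
                               -X.T @ r + lam * Omega @ theta)
        tau, trial = 1.0, theta
        while tau > 1e-8:                     # step-halving line search
            cand = theta - tau * step
            if np.all(cand > 0) and crit(cand) < l_old:   # keep positivity
                trial = cand
                break
            tau /= 2
        theta = trial
        l_new = crit(theta)
        if abs(l_old - l_new) / l_old < tol:  # stopping rule (2.8)
            break
        l_old = l_new
    return theta
```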
2.2. Performance of the Algorithm
2.2.1. Theoretical Convergence
Theoretical results concerning the convergence characteristics of the Gauss-Newton
algorithm are rather well documented in standard texts, see [18] for example. Newton methods
can be shown to be locally quadratically convergent for well behaved objective functions.
Damped Newton methods are globally convergent for strictly convex objective functions with
compact level sets. The objective function involved in (2.2) is very complicated and very
difficult to work with analytically.
The regularization functional consists of two parts: a residual sum of squares term and a
penalty term. The penalty term is quadratic and so it is clearly convex. If we knew that the
residual sum of squares term was approximately convex then we should have a good deal of
confidence in the convergence of the damped Gauss-Newton method. Let the residual sum of
squares be denoted RSS(θ). Some random profiles of the residual sum of squares function were
evaluated to investigate convexity. Two values for θ were generated as follows: θ_j^(1) = θ_j + 3·α_j
and θ_j^(2) = θ_j − 3·(1−α_j) for j = 1, 2, 3, ..., p, where θ is the true value and the α_j's are independent
uniforms over the interval [0, 1]. The percent errors in Euclidean norm between θ^(1), θ^(2)
and θ are substantial. Figure 2.1 gives plots of the residual sum of squares component of the
objective function,

RSS(βθ^(1) + (1−β)θ^(2)) for 0 ≤ β ≤ 1,

for the parabolic and for the hyperbolic problems.
The hyperbolic problem shows some non-convexity on small scales.
From these plots one would guess that the Gauss-Newton algorithm would perform rather well at
least provided the initial estimates are reasonable. (Of course even a very large number of random
profiles would not generate enough information to allow us to conduct a formal statistical test of a
convexity hypothesis - the set of alternatives is just too large. However given the appearance of
random profiles it would be unreasonable to act as if the objective function were pathological).
Line searches should alleviate concerns about starting guesses and the potential of getting caught
in local minima.
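A sketch of the convexity probe just described follows; `rss` is a hypothetical function returning the residual sum of squares for a coefficient vector, obtained by running the forward solver.

```python
import numpy as np

def rss_segment(rss, theta_true, rng, n_beta=50):
    """Evaluate RSS along a random segment between two perturbed
    profiles, as in section 2.2.1: theta1_j = theta_j + 3*alpha_j,
    theta2_j = theta_j - 3*(1 - alpha_j), alpha_j ~ Uniform[0,1]."""
    alpha = rng.uniform(size=theta_true.size)
    theta1 = theta_true + 3 * alpha
    theta2 = theta_true - 3 * (1 - alpha)
    betas = np.linspace(0, 1, n_beta)
    vals = np.array([rss(b * theta1 + (1 - b) * theta2) for b in betas])
    return betas, vals   # plotting vals against betas reveals non-convexity
```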
2.2.2. Observed Convergence Characteristic
The algorithm is terminated when the relative change in the objective function is small; we
test if

|l_λ(θ^(s+1)) − l_λ(θ^(s))| / l_λ(θ^(s)) < tol  (2.8)

where tol = .001. Estimates for successively smaller values of λ are computed, and the best
value is chosen using the method of Generalized Cross Validation described in section 4. If
λ₁ > λ₂ are successive values of λ at which solutions are computed, θ̂_λ₁ is used as a starting guess
in solving for θ̂_λ₂. For the examples computed in section 4, see Figure 4.2, the initial solution
took about 6 to 7 iterations but subsequent solutions numerically converged in 1 or 2 iterations.
The smoothest solution is essentially linear, so the computational effort for this solution
could be reduced by explicitly fitting a line for θ. The order of discretization in solving the
differential equations is N = 4·p, see the next section. With p = 50, average computing times
per iteration on the Cray X-MP supercomputer at UC-Berkeley were on the order of 10.2 seconds for the
parabolic problem and 65 seconds for the hyperbolic. With m = 100 time steps in the parabolic case
and m = 400 in the hyperbolic, this corresponds to a computing time per time step per λ-value on
the order of a tenth of a second.
3.1. Solving the Forward Problems
Implicit finite difference schemes are among the most popular methods for solving
parabolic and hyperbolic equations, see [23] for example. The interval 0 ≤ x ≤ 1 is divided into
N+1 subintervals each of width h = 1/(N+1). The discretized approximation to the solution of
the differential equation is obtained at time intervals of length Δ. The solution at time jΔ is
denoted u^j for j = 0, 1, 2, ... with u_i^j ≈ u(ih, jΔ; θ). Throughout this section we assume for
simplicity that measurement times are t_j = jΔ for j = 1, 2, ..., m. When the measurement times
are otherwise, a simple interpolation scheme is used, like that in section 3.3 below. The elliptic
component of the differential equations is approximated by a difference quotient

∂/∂x [θ(x) ∂u(ih, jΔ; θ)/∂x] ≈ θ(ih) (u_{i+1}^j − 2u_i^j + u_{i−1}^j)/h² + θ̇(ih) (u_{i+1}^j − u_{i−1}^j)/(2h)  (3.1)

for i = 1, 2, ..., N. The dot denotes differentiation with respect to x. The boundary conditions are used
to eliminate u_0^j and u_{N+1}^j from these equations and we obtain, in vector form,

∂/∂x [θ(x) ∂u(·, jΔ; θ)/∂x] ≈ A u^j + q^j  (3.2)

where A is a non-symmetric tridiagonal matrix (independent of time) and q^j involves boundary
and/or forcing conditions. With Dirichlet or zero Neumann boundary conditions q does not
involve θ and so for the parabolic problem q^j = 0. Non-zero Neumann boundary conditions force
q to depend on θ, and for the hyperbolic problem we have

q_i^j = −2θ(h) f(jΔ)/h + θ̇(h) f(jΔ) for i = 1 , and q_i^j = 0 for i > 1 .  (3.3)
Implicit difference schemes set the time differences of u equal to weighted averages of the
spatial difference approximations at successive time levels; such schemes correspond to rational
approximations of the matrix exponential function. The scheme used here is as follows.

Difference Scheme for the Parabolic Equation

S u^{j+1} = B u^j + Δ(½ q^{j+1} + ½ q^j) , j = 0, 1, 2, ...  (3.4)

where S = I − (Δ/2)A and B = I + (Δ/2)A. The method is completely specified once u⁰ is given.
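A minimal sketch of (3.1)-(3.4) for the parabolic problem (where q = 0) is given below; the boundary-row adjustments implied by the Neumann conditions are omitted for brevity, and grid values of θ and its x-derivative are assumed supplied.

```python
import numpy as np
from scipy.linalg import solve_banded

def assemble_A(theta, dtheta, N):
    """Tridiagonal A from (3.1)-(3.2); theta, dtheta hold theta(ih) and
    its x-derivative at the interior grid points, h = 1/(N+1).
    Interior rows only: boundary rows would need the Neumann fix-up."""
    h = 1.0 / (N + 1)
    main = -2 * theta / h**2
    upper = theta[:-1] / h**2 + dtheta[:-1] / (2 * h)   # A[i, i+1]
    lower = theta[1:] / h**2 - dtheta[1:] / (2 * h)     # A[i+1, i]
    return main, upper, lower

def crank_nicolson_steps(u0, main, upper, lower, dt, m):
    """March S u^{j+1} = B u^j with S = I - (dt/2)A, B = I + (dt/2)A (3.4)."""
    N = u0.size
    # banded storage of S for solve_banded: rows = (super, main, sub)
    S = np.zeros((3, N))
    S[0, 1:] = -dt / 2 * upper
    S[1, :] = 1 - dt / 2 * main
    S[2, :-1] = -dt / 2 * lower
    u, out = u0.copy(), [u0.copy()]
    for _ in range(m):
        Bu = (1 + dt / 2 * main) * u
        Bu[:-1] += dt / 2 * upper * u[1:]
        Bu[1:] += dt / 2 * lower * u[:-1]
        u = solve_banded((1, 1), S, Bu)
        out.append(u.copy())
    return np.array(out)
```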
Difference Scheme for the Hyperbolic Equation

S u^{j+1} = B u^j − S u^{j−1} + Δ²(¼ q^{j+1} + ½ q^j + ¼ q^{j−1}) , j = 0, 1, 2, ...  (3.5)

where S = I − (Δ²/4)A and B = 2I + (Δ²/2)A. Here u⁰ is given but the initial conditions must be used
to obtain u¹. Since ∂u/∂t (x, 0) = 0 we set u⁻¹ = u¹ to obtain

2S u¹ = B u⁰ + Δ²(¼ q¹ + ½ q⁰ + ¼ q⁻¹) .  (3.6)

The equations are easily solved by direct methods. The Cholesky decomposition of S is
obtained once at j = 0 and the solutions at all subsequent times are computed by back
substitution. In the case that θ depends on time (directly or via u, which occurs with non-linear
diffusion) separate Cholesky decompositions are needed at every time step.
3.2. Computation of the Sensitivity Matrix

Recall that the sensitivity matrix is defined as the gradient of the solution u with respect to
the profile θ. The k'th component of the sensitivity matrix is denoted d_k u = ∂u/∂θ_k. By
differentiating the implicit difference equations for the forward problem we obtain a method for
numerical approximation to d_k u; let d_k u^j = ∂u^j/∂θ_k.

Difference Scheme for d_k u in the Parabolic Equation

S d_k u^{j+1} = B d_k u^j − S_k u^{j+1} + B_k u^j , j = 0, 1, 2, ...  (3.7)

where S_k = ∂S/∂θ_k and B_k = ∂B/∂θ_k, both of which are easy to evaluate since S and B are linear in θ;
this comes from (3.1) and (3.2). The local support property of B-splines makes S_k and B_k have
mostly zero elements. The q term drops out because in the parabolic problem q is independent of
θ. d_k u⁰ = 0 again because u⁰ does not depend on θ.
Difference Scheme for d_k u in the Hyperbolic Equation

S d_k u^{j+1} = B d_k u^j − S d_k u^{j−1} − S_k u^{j+1} + B_k u^j − S_k u^{j−1}
 + Δ²(¼ d_k q^{j+1} + ½ d_k q^j + ¼ d_k q^{j−1}) , j = 1, 2, ...  (3.8)

again S_k = ∂S/∂θ_k and B_k = ∂B/∂θ_k. Since d_k u⁰ = 0, differentiating (3.6) gives

2S d_k u¹ = B_k u⁰ − 2S_k u¹ + Δ²(¼ d_k q¹ + ½ d_k q⁰ + ¼ d_k q⁻¹) .  (3.9)

3.3. Computation of X'r and X'X
Let d̃_k u^j represent the n-vector resulting from linearly interpolating d_k u^j to the
measurement sites x_i , i = 1, 2, ..., n. The approximations to X'r and X'X are

X'r_k ≈ Σ_{j=1}^{m} (d̃_k u^j)' r(t_j) , k = 1, 2, ..., p
X'X_kl ≈ Σ_{j=1}^{m} (d̃_k u^j)' (d̃_l u^j) , k, l = 1, 2, ..., p  (3.10)

where r(t_j) is the vector of residuals at the measurement sites at the j'th measurement time,
t_j = jΔ. At the j'th step we compute u^{j+1} and
then evaluate d_k u^{j+1} for k = 1, 2, ..., p. X'r and X'X are updated as

X'r_k^{j+1} = X'r_k^j + (d̃_k u^{j+1})' r(t_{j+1})
X'X_kl^{j+1} = X'X_kl^j + (d̃_k u^{j+1})' (d̃_l u^{j+1}) , j = 0, 1, ..., m−1 .  (3.11)

X'r⁰ and X'X⁰ are both set to zero.
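In code the updates (3.11) amount to a streaming accumulation in which X itself is never formed. A sketch, with hypothetical generators supplying the interpolated sensitivities and residuals at each time step:

```python
import numpy as np

def accumulate_cross_products(dku_tilde_steps, residual_steps, p):
    """Accumulate X'r and X'X over time steps as in (3.11).

    dku_tilde_steps yields, per time step j, a p x n array whose k'th row
    is the interpolated sensitivity d~_k u^j; residual_steps yields the
    n-vector r(t_j)."""
    Xr = np.zeros(p)
    XX = np.zeros((p, p))
    for D, r in zip(dku_tilde_steps, residual_steps):
        Xr += D @ r        # adds (d~_k u^j)' r(t_j) for every k
        XX += D @ D.T      # adds (d~_k u^j)'(d~_l u^j) for every k, l
    return Xr, XX
```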
Parallelism

It is obvious how to take advantage of parallel processing for computing X'r and X'X. In a
parallel processing environment, at the j'th step the computation of d_k u^{j+1} for k = 1, 2, ..., p is
distributed over available processors. An individual processor must first evaluate the right hand
side for d_k u^{j+1} and then obtain d_k u^{j+1} by back substitution. The shared data are u^{j+1}, u^j (u^{j−1} in
the hyperbolic case), and the S and B matrices. After all p of the d_k u^{j+1} have been computed, X'r
and X'X are updated using (3.11). The computation may then proceed to the next time step: first
the forward problem update is evaluated and, using this, the X'r and X'X updates are carried out.
Some experimentation with this scheme on an Alliant FX/8 at Lawrence Livermore National
Laboratory resulted in almost linear speed-up over sequential computation.
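A sketch of this organization using a thread pool; `solve_with_S` and `rhs_for_k` are hypothetical closures over the shared data (u^{j+1}, u^j, the factorization of S, and the S_k, B_k).

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def sensitivity_step(solve_with_S, rhs_for_k, p, n_workers=8):
    """One time step: distribute the p back substitutions for
    d_k u^{j+1} over a pool of workers, as described above."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        cols = pool.map(lambda k: solve_with_S(rhs_for_k(k)), range(p))
    return np.array(list(cols))
```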
Feasibility of Higher Dimensional Problems

In 1 dimension the computational effort per iteration is proportional to N·(p+1)·m. The N
term comes from the time required to solve the linear systems in (3.4)-(3.8), (p+1) because there
are (p+1) systems to solve at each time step, and m is the number of time steps. In higher
dimensions the computational effort will be determined by how well the analogues of the basic
linear systems may be solved. Direct solution of the linear systems (3.5)-(3.8) may be worth
investigating because such techniques are easily vectorized, see [19]. Iterative solution with
acceleration schemes is natural because at a given time step the solution at the previous time
step will serve as a good starting guess. Computational experience in solving time dependent
partial differential equations on modern vector and parallel computers is rapidly
accumulating, see [19] and the papers cited there. In our applications computing time
should grow as a function of p² (N = 4·p). For problems in three dimensions with p = 1000,
memory restrictions on the Berkeley machine prevented a direct numerical timing; the projection is
an extrapolation of the computing case considered in section 2, but assuming it is correct it converts to
about 3.75 seconds per time step per λ-value on a Cray 3. A single experiment
carried out on a four processor Cray 2 with p = 1000 indicates a computing time of 6 seconds per
time step per λ-value. Since the Cray 3 is considered to be between 2 and 4 times faster than a
Cray 2, the 3.75 second estimate seems conservative. Computing requirements on this order are
minor when compared to other costs involved in the practical context of these problems.
Furthermore, supercomputing technology is still moving rapidly and it is clear that in time
more powerful systems will become available to further reduce computing time concerns.
4. Applications of X'X

The availability of X'X makes it easy to tackle some important statistical inference
problems such as the adaptive choice of the regularization parameter and interval estimates for
parameters. Both of these problems are discussed in this section.
4.1. Adaptive Choice of the Regularization Parameter
A number of techniques have been proposed for choosing regularization parameters. For
example, section 5 of [11] describes two techniques, one due to Miller [12] and another which is a
variant of the discrepancy principle of Arcangeli [1] and Morozov [13]. In regularization problems
it is well known that the optimal choice of the regularization parameter is a function of the
unknown function θ and the unknown noise level. Miller's method requires an a priori upper
bound on J(θ), the penalty functional evaluated at the true parameter value, and also a specification
of the noise level in the data. In practice such quantities are difficult to obtain. For optimal
convergence in linear regularization problems the upper bound has to increase with increasing
sample size, yet it is unclear how the method can accommodate this. The discrepancy principle
approach is more clearly defined. The method has been analyzed and shown to be suboptimal,
see [9, 22], basically because it produces solutions which are consistently too smooth.
The choice of regularization-like parameters is a problem which has received considerable
attention in the statistical literature; a familiar instance is the problem of choosing the number of
variables to enter into a regression equation. A number of
procedures have emerged which have been shown to have good theoretical properties, one of
these methods is Generalized Cross-Validation (GCV), see [24]. The GCV criterion chooses λ by
minimizing the criterion

V(λ) = RSS(λ) / [mn − μ₁(λ)]²  (4.1)

where

μ₁(λ) = tr{ X [X'X + λΩ]⁻¹ X' } .  (4.2)

RSS(λ) is the residual sum of squares and [mn − μ₁(λ)] is referred to as the effective degrees
of freedom for error. See [14] for another use of this sort of cross-validation function in a non-
linear least squares regularization problem.
In linear regularization problems, the GCV estimate is known to produce estimates which
perform well from the point of view of the predictive mean square error loss. In the present
situation the predictive mean square error is

T(λ) = (1/mn) Σ_{i=1}^{n} Σ_{j=1}^{m} [u(x_i, t_j; θ̂_λ) − u(x_i, t_j; θ)]²  (4.3)

where θ is the true parameter and θ̂_λ is the regularization estimate. An asymptotic theory for
GCV in non-linear regularization models is not available at present but numerical experiments
in [14, 17] have shown that the technique has definite promise.
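With X and the residual vector available from the converged fit, evaluating V(λ) is immediate; the sketch below uses the trace form of μ₁(λ) in (4.2).

```python
import numpy as np

def gcv_score(lam, X, Omega, r):
    """V(lam) = RSS(lam) / [mn - mu1(lam)]^2, with mu1(lam) computed
    from the sensitivity matrix X at the fitted solution and r the
    residual vector there."""
    mn = r.size
    mu1 = np.trace(X @ np.linalg.solve(X.T @ X + lam * Omega, X.T))
    return (r @ r) / (mn - mu1) ** 2
```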
An Illustration

For the parabolic and hyperbolic problems data were generated according to (1.7) with the
ε_ij independent zero mean Gaussian variables with standard deviation σ. One
sample was generated with n = 80 equispaced measurement sites over the interval [0, 1] and a
subsample with n = 20 equispaced measurement sites was also taken. The temporal sampling in
the parabolic case consisted of m = 100 equispaced points over the interval [0, .35]; the
hyperbolic case used m = 400 equispaced points over the observation interval. σ was 3.0 in
the parabolic case and 0.1 in the hyperbolic case; this resulted in comparable ratios of σ to the sample
variation of the true signal in the parabolic and hyperbolic cases.
In total there were 8000 (2000) observations in the parabolic and
32000 (8000) observations in the hyperbolic case. Regularization estimates with p = 30 were
computed on each data set for a range of λ values and the GCV function evaluated. The starting
guess was a constant profile θ(x) = .5. Figure 4.1 shows the GCV function for each sample. The
GCV function is plotted against μ₁(λ), the effective degrees of freedom for the model;
μ₁(0) ≈ p = 30 (the order of discretization for θ) and μ₁(λ) → 2 as λ → ∞. In all four cases the
minimizer of the GCV criterion produces a value of λ which is very close to the minimizer of the
predictive mean square error. The efficacy of the GCV function is measured by

eff = min over λ of T(λ) / T(λ̂)  (4.4)

where λ̂ is the value of λ minimizing the GCV function. Efficacies, marked by "eff" on the
graphs, are close to one in all cases, which means that the GCV criterion chooses a value of λ
which performs well from the point of view of the predictive mean square error loss.
Remark

We chose to scale λ on a degrees of freedom for model scale, i.e. given a X'X matrix,
λ is chosen so that the effective degrees of freedom for the model, μ₁(λ), is some specified value.
Since μ₁ is monotonically decreasing with increasing λ, a zero finding routine was used to find the
λ which achieves the desired model degrees of freedom. At present the optimal value is found by
brute force, but gradient methods might also be used provided care is taken to avoid local minima;
cross-validation functions can be noisy, see Figure 4.1.
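A sketch of the zero-finding step on the degrees of freedom scale, using a bracketing root finder; the bracket endpoints are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def lam_for_df(target_df, X, Omega, lam_lo=1e-10, lam_hi=1e6):
    """Find lambda such that mu1(lambda) equals target_df; mu1 is
    monotone decreasing in lambda, so brentq applies on a bracket."""
    def mu1(lam):
        return np.trace(X @ np.linalg.solve(X.T @ X + lam * Omega, X.T))
    return brentq(lambda lam: mu1(lam) - target_df, lam_lo, lam_hi)
```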
4.2. Simultaneous Interval Estimates for Model Parameters
Approximate Bayesian interval estimates are obtained in a straightforward manner, see [24]
for example. The estimate θ̂_λ corresponds to the mode of a Gaussian approximation to the
posterior distribution of θ,

θ | z ~ N( θ̂_λ , σ²[X'X + λΩ]⁻¹ ) .  (4.5)

σ is estimated by the residual standard deviation

σ̂ = { RSS(λ̂) / [mn − μ₁(λ̂)] }^½ .  (4.6)

As indicated by Silverman [21], approximate interval estimates for θ, or functionals thereof, can
be generated by simulating from the posterior distribution. To obtain Highest Probability Density
(HPD) bands for the curve we generate realizations θ₁, θ₂, ..., θ_B from the posterior distribution
with B large, say 1000 or so. The 2.5'th and 97.5'th percentiles of the set of values
{ θ₁(x), θ₂(x), ..., θ_B(x) } are the lower and upper limits for an HPD interval at a point x. Since
(4.5) is Gaussian this interval has the form

θ̂_λ(x) ± 1.96 se(x)  (4.7)

where se(x) is the standard error in θ̂_λ(x). For simultaneous intervals we replace the standard
normal percentile value of 1.96 by a factor f with the property that

P( |θ(x) − θ̂_λ(x)| ≤ f · se(x) for all x ) = .95 ,  (4.8)

P evaluating the posterior probability of the event. The factor f may be estimated by
simulation, i.e. we take the 95'th percentile of the set of values

f_b = max over x of |θ_b(x) − θ̂_λ(x)| / se(x) , b = 1, ..., B .  (4.9)
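A sketch of the band construction by posterior simulation; `basis_vals` is the matrix of φ_k evaluations on a plotting grid, and the posterior form is the Gaussian approximation (4.5) above.

```python
import numpy as np

def hpd_bands(theta_hat, X, Omega, lam, sigma, basis_vals, B=1000, seed=0):
    """Simultaneous 95% HPD bands for theta(x) by simulation from the
    Gaussian posterior (4.5), following (4.8)-(4.9)."""
    rng = np.random.default_rng(seed)
    cov = sigma**2 * np.linalg.inv(X.T @ X + lam * Omega)
    draws = rng.multivariate_normal(theta_hat, cov, size=B)
    curves = draws @ basis_vals.T        # B posterior curves on the grid
    fit = basis_vals @ theta_hat
    se = curves.std(axis=0)              # pointwise standard errors
    fb = (np.abs(curves - fit) / se).max(axis=1)   # (4.9)
    f = np.quantile(fb, 0.95)            # simultaneous factor from (4.8)
    return fit - f * se, fit + f * se
```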
The simultaneous factor came out at about f = 3.0, so the band is set at the
estimate plus or minus roughly three standard errors. The true curve together with the
cross-validated regularization estimate and simultaneous HPD bands are plotted in Figure 4.2.
When the number of measurement sites is reduced from 80 to 20 the bands widen, expressing
a relative lack of confidence in these estimates. The intervals for the hyperbolic estimation are
wider than for the parabolic. This could be due to a difference in the temporal sampling rate.
Looking back to (1.5) and (1.6), we might also sense that while the parabolic estimation must
somehow involve an estimation of first order time derivatives of u, the
hyperbolic must involve an estimation of second order time derivatives.
This may be another reason why the hyperbolic intervals are wider.
A theoretical discussion of Bayesian confidence intervals for linear regularization
estimators can be found in [4]. One might argue that the intervals ought to be adjusted to take
into account the uncertainty in the determination of λ. An investigation of this issue is beyond
the scope of this paper.
5. Resolution Analysis

A straightforward procedure for the analysis of resolution characteristics based on X'X is
described in [15]. The method is based on a one-step Gauss-Newton expansion; some asymptotic
analysis of this is provided in [16]. Here we present a simulation study which shows that
resolution analysis based on X'X works well in finite samples.
5.1. Bias and Variance
The error in the regularization estimator θ̂_λ may be expressed as

θ̂_λ − θ₀ = (θ_λ − θ₀) + (θ̂_λ − θ_λ)  (5.1)

where θ₀ is the true value of θ and θ_λ is the regularized solution with continuous error free data.
The first term is the systematic error, or bias in statistical jargon, and the second is the random
error. The one-step Gauss-Newton approximations to the systematic and random errors are

θ_λ − θ₀ ≈ [X'X + λΩ]⁻¹ X'X θ₀ − θ₀  (5.2)

and

θ̂_λ − θ_λ ≈ [X'X + λΩ]⁻¹ X'e  (5.3)

where e is the vector of errors. These approximations arise by applying one step of
(2.5) with τ = 1; in the first case the sensitivity matrix is computed at θ₀ while in the second it is
computed at θ_λ. If the errors have mean zero and are uncorrelated with constant variance σ², then
from (5.3)

Cov(θ̂_λ − θ_λ) ≈ σ² [X'X + λΩ]⁻¹ X'X [X'X + λΩ]⁻¹ .  (5.4)

The one-step linearization of the random error can be justified asymptotically; the two formulas are asymptotically
valid provided the estimator lies in a neighborhood of the true value.
With these approximations, asymptotic resolution characteristics can be studied by
considering the simultaneous diagonalization of X'X and Ω, see [15] for example.
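In matrix form the approximations (5.2) and (5.4) are direct to evaluate once X'X and Ω are in hand; a sketch:

```python
import numpy as np

def linearized_bias_sd(theta0, X, Omega, lam, sigma):
    """One-step approximations to the systematic error (5.2) and the
    pointwise standard deviations implied by (5.4), with X evaluated
    at the true profile theta0."""
    G = np.linalg.inv(X.T @ X + lam * Omega)
    XtX = X.T @ X
    bias = G @ XtX @ theta0 - theta0      # (5.2)
    cov = sigma**2 * G @ XtX @ G          # (5.4)
    return bias, np.sqrt(np.diag(cov))
```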
5.2. A Simulation Experiment
A simulation study was used to evaluate the accuracy of the X'X-based resolution analysis.
Data sets were generated according to the parabolic and hyperbolic models,

z_ij^(s) = u(x_i, t_j; θ₀) + ε_ij^(s) , s = 1, 2, ...  (5.5)

for i = 1, 2, ..., n and j = 1, 2, ..., m, with the ε_ij^(s) being i.i.d. zero mean Gaussian random variables
with standard deviation σ. The setup was the same as in section 4 except that only the n = 80
spatial sampling was considered. For each of several λ-values the regularization estimate θ̂_λ[s]
was computed for 25 data sets generated according to (5.5). For fixed λ, the mean and variance
of θ̂_λ[s] over the simulation runs provided direct estimates of the pointwise bias and variance.
The approximations in (5.3) and (5.4) were used to obtain X'X-based pointwise bias and variance
characteristics. The linearized bias based on (5.2) is a poor predictor of the true bias except for
small λ. For larger λ the bias is much better approximated by evaluating the regularization
estimator on error free data (i.e. (5.5) without the ε_ij^(s)). Pointwise bias and standard deviation
results for λ corresponding to a 4.5 degree of freedom model (see Figure 4.1) are presented in
Figure 5.1. The approximation matches the true characteristics remarkably well. We see that the regularization
estimate overshoots the true curve at the edges in the parabolic case and undershoots in the hyperbolic
case. It is natural to want to use this kind of information in a resolution analysis, though the
approximation to the variance degrades for larger λ-values, so some caution is needed there.
Figure 5.2 plots the percent error between the true mean integrated squared error and its
X'X-based approximation. For both the parabolic and hyperbolic problems, when the model degrees
of freedom are not too small, the X'X-based analysis provides a remarkably good approximation to
the true mean integrated squared error.
Acknowledgement

I am grateful to Michael Stewart and Tony for a number of valuable comments.

References
1. Arcangeli, R., "Pseudo-solution de l'équation Ax = y," Comptes Rendus des Séances de l'Académie des Sciences, Ser. A, 1966.
2. Chen, W. H., Gavalas, G. R., Seinfeld, J. H., and Wasserman, M. L., "A New Algorithm for Automatic History Matching," Soc. Pet. Eng. J., vol. 14, pp. 593-608, 1974.
3. Cooley, R. L., "Incorporation of prior information on parameters into nonlinear regression groundwater flow models, 1. Theory," Water Resour. Res., vol. 18, pp. 965-976, 1982.
4. Cox, D. D., "An Analysis of Bayesian Inference for Non-parametric Regression," Dept. of Statistics, UW-Seattle, 1986.
5. Cox, D. D. and O'Sullivan, F., "Analysis of penalized likelihood type estimators with application to generalized smoothing in Sobolev Spaces," 1988 (revision in preparation for Annals of Statistics).
6. de Boor, C., A Practical Guide to Splines, Springer-Verlag, New York, 1978.
7. Deuflhard, P. and Hairer, E., Numerical Treatment of Inverse Problems in Differential and Integral Equations, Birkhäuser, Boston, 1983.
8. Engl, H. W. and Groetsch, C. W., Inverse and Ill-posed Problems, Academic Press, Inc., Boston, 1987.
9. Engl, H. W. and Neubauer, A., "Optimal parameter choice for ordinary and iterated Tikhonov regularization," in Inverse and Ill-Posed Problems, ed. H. W. Engl and C. W. Groetsch, pp. 97-125, Academic Press Inc., Boston, 1987.
10. Kimeldorf, G. S. and Wahba, G., "A Correspondence between Bayesian Estimation on Stochastic Processes and Smoothing by Splines," Annals of Mathematical Statistics, vol. 41, pp. 495-502, 1970.
11. Kravaris, C. and Seinfeld, J. H., "Identification of parameters in distributed parameter systems by regularization," SIAM J. Control and Optimization, vol. 23, pp. 217-241, 1985.
12. Miller, K., "Least squares methods for ill-posed problems with a prescribed bound," SIAM J. Math. Anal., vol. 1, pp. 52-74, 1970.
13. Morozov, V. A., "On the solution of functional equations by the method of regularization," Soviet Math. Dokl., vol. 7, pp. 414-417, 1966.
14. O'Sullivan, F. and Wahba, G., "A cross validated Bayesian retrieval algorithm for non-linear remote sensing experiments," J. Comp. Physics, vol. 59, pp. 441-455, 1985.
15. O'Sullivan, F., "A statistical perspective on ill-posed inverse problems (with discussion)," Statist. Science, vol. 1, pp. 502-527, 1986.
16. O'Sullivan, F., "Constrained non-linear least squares regularization with application to the estimation of functional parameters in elliptic partial differential equations," Tech. Rep. No. 99, Dept. of Statistics, UC-Berkeley, 1987.
17. O'Sullivan, F., "Fast computation of fully automated log-density and log-hazard estimators," SIAM J. Sci. Statist. Comput. (in press), 1988.
18. Ortega, J. M. and Rheinboldt, W. C., Iterative Solution of Non-linear Equations in Several Variables, Academic Press, New York, 1970.
19. Ortega, J. M. and Voigt, R. G., "Solution of Partial Differential Equations on Vector and Parallel Computers," SIAM Review, vol. 27, pp. 149-240, 1985.
20. Santosa, F. and Symes, W. W., "Linear inversion of band-limited reflection seismograms," SIAM J. Sci. Statist. Comput., vol. 7, pp. 1307-1330, 1986.
21. Silverman, B. W., "Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion)," J. Roy. Statist. Soc. Ser. B, vol. 47, pp. 1-52, 1985.
22. Titterington, D. M., "Common structure of smoothing techniques in statistics," Inter. Statist. Rev., vol. 53, pp. 141-170, 1985.
23. Twizell, E. H., Computational Methods for Partial Differential Equations, Ellis Horwood, Chichester, 1984.
24. Wahba, G., "Bayesian confidence intervals for the cross-validated smoothing spline," J. R. Statist. Soc. Ser. B, vol. 45, pp. 133-150, 1983.
25. Wahba, G., "Partial and interaction spline models for semi-parametric estimation of functions of several variables," Proceedings of the 18th Symposium on the Interface, pp. 75-80, American Statistical Association, 1986.
Figure 1.1.1: Parameters for the Parabolic Problem.
Figure 1.1.2: Behavior of u(x,t) in the Parabolic Problem. This is a gray scale image plot with darker shading representing larger values.
Figure 1.2.1: Parameters for the Hyperbolic Problem.
Figure 1.2.2: Behavior of u(x,t) in the Hyperbolic Problem. This is a gray scale image plot with darker shading representing larger values.
Figure 2.1: Random Profiles of the Residual Sum of Squares.
Figure 4.1: Cross-validation functions plotted against the effective degrees of freedom for the model, μ₁(λ) in (4.2). The efficacy relative to the predictive mean square error, see (4.4), is indicated by eff.
Figure 4.2: Estimate and 95% Confidence Bands. The plots show the true curve (solid line), the cross-validated regularization estimate (dashed line) and the upper and lower confidence bands (dotted lines).
Figure 5.1: Pointwise Error Characteristics for λ giving a model degrees of freedom of 4.5. The plots show the true error characteristic estimated from the simulation (solid line) and the theoretical approximations based on X'X (dashed line).
Figure 5.2: Percent Errors in the Mean Integrated Squared Error Approximation.
[Plots not reproduced. Recoverable panel information:
Figure 2.1 panels: Parabolic n=80, Parabolic n=20, Hyperbolic n=80, Hyperbolic n=20; horizontal axis: Beta.
Figure 4.1 panels: Parabolic n=80, Parabolic n=20, Hyperbolic n=80, Hyperbolic n=20; horizontal axis: Model Degrees of Freedom.
Figure 5.1 panels: Parabolic (bias), Parabolic (s.d.), Hyperbolic (bias), Hyperbolic (s.d.).
Figure 5.2 panels: Parabolic, Hyperbolic; horizontal axis: Model Degrees of Freedom.]