PARAMETER ESTIMATION
IN PARABOLIC AND HYPERBOLIC EQUATIONS
by
Finbarr O'Sullivan
TECHNICAL REPORT No. 127
May 1988
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Parameter Estimation in Parabolic and Hyperbolic Equations
Finbarr O'Sullivan
Department of Biostatistics and Statistics, University of Washington
Seattle, WA 98195.
ABSTRACT
Two model ill-posed inverse problems associated with the estimation of functional parameters in 1-dimensional parabolic and hyperbolic equations are considered. These estimation problems are approached from a statistical perspective which casts the estimation problem in terms of generalized non-linear regression. A standard regularization method is studied. The role of the sensitivity matrix for statistical inference is very important; however, the evaluation of this matrix involves the numerical solution of a system of coupled parabolic or hyperbolic equations. This is a large scale supercomputing problem, but worth tackling since the sensitivity matrix is useful in the analysis of a number of practically important problems including: fully adaptive choice of the amount of regularization, interval estimates for parameters, and resolution analysis. Numerical experiments are used to justify each of these techniques. The paper includes a discussion of the feasibility of applying similar methods in higher dimensional settings.
AMS 1980 subject classifications. Primary 62-G05; secondary 62-J05, 41-A35, 41-A25, 47-A53, 45-L10, 45-M05.
Key words and phrases. Constrained non-linear regularization, cross-validation, Gauss-Newton algorithm, ill-posed inverse problems, interval estimates, resolution characteristics, simulation, supercomputing.
Running Head: Distributed Parameter Estimation
May 6, 1988
1 Research supported in part by the National Science Foundation under Grant No. MCS-840-3239 and by the Department of Energy under Grant No. DE-FG06-85ER25006. Some of the work was done during a stay at the Institute for Mathematics and its Applications, University of Minnesota.
Parameter Estimation in Parabolic and Hyperbolic Equations
Finbarr O'Sullivan¹
Department of Biostatistics and Statistics
University of Washington
Seattle, WA 98195.
1. Introduction
Estimation problems associated with the identification of functional parameters in partial
differential equations arise in several fields including diffraction tomography, reservoir
engineering and seismology. Several such problems are discussed in [7, 8] for example. A
generalized framework for a large class of these problems is outlined in [11, 16]: U, F and Θ are
function spaces, and there is a system operator A such that for each f in F and θ in a subset C of Θ
there is a locally unique u in U satisfying

A(u, θ) = f .  (1.1)

The inverse system identification problem is to estimate θ from measured information about u
(f is taken to be known). Let X_n be a linear mapping on U, X_n : U → R^n; we are given a vector
of measurements z,

z = X_n u + e_n  (1.2)

where e_n is a vector of measurement or possible modeling errors. The estimation problem
for θ can be cast in the form of a non-linear non-parametric regression. Equation (1.1) gives an
implicit representation of u in terms of θ, so that the measurements
can thus be regarded as non-linear functionals of θ. The data are written as

z_i = u(x_i; θ) + ε_i , i = 1, 2, ..., n  (1.3)

where the design points x_i lie in some set I. The ε_i are the measurement/modeling errors. Since
θ is high dimensional and the data are discrete and noisy, the estimation problem is ill-posed and
some form of regularization is required to obtain reliable solutions. Various forms of
regularization have been proposed for estimating θ, see [3, 11] for example. A constrained non-
linear least squares regularization estimator is defined as

θ̂_λ = argmin over θ in C of { Σ_{i=1}^{n} [z_i − u(x_i; θ)]² + λ J(θ) } .  (1.4)

λ > 0 is the regularization parameter and J is an appropriate penalty functional, see [25] for a
discussion of possible penalty functions. Quadratic J functions based on L2 norms of derivatives
are familiar in the statistical spline smoothing literature and Gaussian state space representations
have been used to motivate such functions from a Bayesian viewpoint [10]. Non-quadratic J
functions involving L1 norms of derivatives for example have also been proposed [20]; a Bayesian
motivation using non-Gaussian state space representations is available for these J functions also.
In general C is some subset of the parameter space Θ. This might represent constraints such as
positivity, monotonicity or convexity. In some circumstances it might be appropriate to replace
the residual sum of squares term in (1.4) by a more robust alternative.
For λ equal to zero, exact identifiability of θ from the discrete data must be relied upon. For λ greater than zero the situation
is much less problematic since, intuitively, the data need only
identify θ up to the smoothing imposed by the penalty functional, see section 3.
Theoretical convergence characteristics of constrained non-linear least squares estimators with
quadratic penalty functionals are studied in [5, 16]. The identifiability of θ when λ is zero and
the continuity of the sampling functionals play an important role in that theory, which provides some theoretical
support for estimation within the general system identification framework.
1.1. Model Problems
(i) Parabolic Diffusion Equation
The temperature at time t and position x on a 1-dimensional rod with unknown heat
conductance characteristics is denoted u(x, t). An initial heat distribution is placed on the rod
and by observing what happens to the distribution of heat in time one hopes to gain information
about the heat conductance θ of the rod. The system is governed by the differential equation

∂u/∂t − ∂/∂x [θ(x) ∂u/∂x] = 0 , 0 ≤ x ≤ 1 , 0 ≤ t ≤ T ,  (1.5)

with initial/boundary conditions

u(x, 0) = u₀(x)
∂u/∂x (x, t) = 0 for x = 0, 1.

u₀ is the initial heat distribution and the boundary condition says that no heat escapes from the
system. The history matching problem of reservoir engineering generalizes this to 3 dimensions,
see [11].
(ii) Hyperbolic Wave Equation
Here u(x, t) represents a local displacement at time t and position x along a rod with
unknown sound transmission characteristics. An explosion is set off at one end of the rod which
causes the rod to vibrate. By observing the displacement pattern at various positions along
the rod (seismograms) one hopes to get information about the sound transmission characteristics
of the material of the rod. The system is modelled by an equation of the form

∂²u/∂t² − ∂/∂x [θ(x) ∂u/∂x] = 0 , 0 ≤ x ≤ 1 , 0 ≤ t ≤ T ,  (1.6)

with u(x, 0) = ∂u/∂t (x, 0) = 0, ∂u/∂x (0, t) = f(t) and u(1, t) = 0.
The initial conditions say that the rod is initially at rest. The explosion is set off at x = 0 and is
described by f(t); the right hand boundary is held fixed. We are interested in the estimation of θ.
The problem of reflection seismology is of this type [20]. Seismologists are interested in
two and three dimensional versions of problems of this type.
The behavior of the above systems with specified values for the initial conditions and
model parameters is illustrated in Figures 1.1.1-1.2.2. Model parameters and initial conditions are
given in Figures 1.1.1 and 1.2.1. Figures 1.1.2 and 1.2.2 give gray-scale image plots of u(x, t)
(heavier shading corresponds to larger values of u(x, t)). For the parabolic system the initial
temperature distribution is quickly smoothed out in time. In the hyperbolic system the initial
explosion sets off a wave which is bounced back and forth from one end of the rod to the other;
the wave pattern spreads out in time. The measured data in both problems are time series of the
solution characteristics observed at a finite number of distinct locations along the rod.
It is easy to see how the non-linear non-parametric regression framework in (1.3) applies to
the model problems. The measurements are

z_ij = z(x_i, t_j) = u(x_i, t_j; θ) + ε_ij , i = 1, 2, ..., n , j = 1, 2, ..., m .  (1.7)

Here we explicitly indicate the dependence of u on the underlying profile θ; the u(x_i, t_j; θ) are non-
linear functionals of the functional parameter θ. Later in the paper we do numerical studies in
which data are generated according to (1.7) with the ε_ij independent Gaussian random
variables.
1.2. The Sensitivity Matrix

For statistical inference the model sensitivity matrix is a most important quantity. If the
profile is approximated by a p-dimensional finite element approximation with coefficients
θ = (θ₁, θ₂, ..., θ_p)' (p may be arbitrarily large) the model sensitivity matrix is defined as

X_(ij),k = ∂u(x_i, t_j; θ)/∂θ_k , i = 1, 2, ..., n , j = 1, 2, ..., m , k = 1, 2, ..., p .  (1.8)

The sensitivity matrix determines a first order linearization of the model and in the standard
non-linear least squares framework the matrix X'X is important for: (a) evaluating the estimator
by the Gauss-Newton algorithm, (b) generation of approximate interval estimates, (c) comparison
of experimental designs, and (d) understanding asymptotic estimation characteristics. The X'X
matrix plays a very similar role in the analysis of regularization procedures but in addition it can
also be used to adaptively choose an appropriate value for the regularization parameter λ. This is
illustrated in section 4.
1.3. Outline
The paper is organized as follows. A Gauss-Newton algorithm for computing the
regularization estimator is described in section 2. Practical convergence characteristics of the
algorithm are discussed. Section 3 gives details on the computation of the sensitivity matrix.
The method is naturally suited to parallel processing. Some applications of X'X are given in
section 4, including adaptive choice of the regularization parameter and interval estimates for
parameters. A straightforward procedure for the analysis of resolution characteristics which uses
X'X is described in [15]. The method is based on a one-step Gauss-Newton expansion; some asymptotic analysis of this
has been presented in [16]. In section 5 we present a simulation study which shows
that resolution analysis based on X'X works well in finite samples. This has obvious practical value.

2. Computing the Regularization Estimator
The minimization of the regularization functional in (1.4) could be approached by either
first or second order gradient algorithms, and it is likely that a combination of first and second
order gradient algorithms would prove most efficient. First order gradient methods have been
proposed for instance in [2, 11]. A powerful result which comes from optimal control theory is
that the gradient of the regularization functional can be computed by solving the original
differential equation (state equation) together with one associated adjoint equation, see [2, 11] for
example. Second order gradient algorithms use Hessian information and as a result the
computational effort per iteration is much greater than for first order methods. In practice this has
to be balanced against available computing resources and the fact that second order methods can
achieve higher order rates of convergence. Here we will focus on the Gauss-Newton algorithm,
which is a very well known second order gradient method for solving non-linear least squares
problems. The Gauss-Newton algorithm requires the sensitivity matrix, which also plays a crucial
role in statistical inference and resolution analysis, see sections 4 and 5.
2.1. Algorithm
For numerical computation we approximate the parameter by a linear combination of
approximating elements

θ(x) = Σ_{k=1}^{p} θ_k φ_k(x)  (2.1)

where the φ_k are cubic B-spline elements defined with respect to a uniform knot sequence on (0, 1) with
multiple knots at 0 and 1, see [6]. The number of elements p may be quite large relative
to the resolution in the data, so the approximation error in (2.1) will not be of concern.
In higher dimensions tensor product B-spline or finite elements could be employed.
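For concreteness, a minimal sketch of such a basis built with scipy is given below. The knot construction (full multiplicity at 0 and 1, uniform interior knots) is one standard realization, and the helper names are ours, not the paper's.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(p, k=3):
    """Cubic B-spline basis on [0,1]: uniform interior knots and
    full-multiplicity knots at 0 and 1, giving p basis elements."""
    interior = np.linspace(0, 1, p - k + 1)[1:-1]     # p - k - 1 interior knots
    knots = np.r_[np.zeros(k + 1), interior, np.ones(k + 1)]
    return [BSpline.basis_element(knots[i:i + k + 2], extrapolate=False)
            for i in range(p)]

def theta_fn(coef, basis):
    """theta(x) = sum_k coef_k * phi_k(x), as in (2.1)."""
    def theta(x):
        vals = np.array([np.nan_to_num(b(x)) for b in basis])
        return coef @ vals
    return theta
```

With p = 30 this yields a basis of the size used in the numerical examples of section 4.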
The discretized regularization criterion becomes

l_λ(θ) = Σ_{i=1}^{n} Σ_{j=1}^{m} [z_ij − u(x_i, t_j; θ)]² + λ θ'Ωθ  (2.2)

where θ = (θ₁, θ₂, ..., θ_p)' and Ω comes from the quadratic penalty term. Throughout the
remainder of the paper

J(θ) = ∫₀¹ [θ̈(x)]² dx  (2.3)

so Ω is given by

Ω_kl = ∫₀¹ φ̈_k(x) φ̈_l(x) dx , k, l = 1, 2, ..., p .  (2.4)
With cubic B-splines, Ω becomes a symmetric seven-banded matrix. To compute the regularized
estimator we minimize l_λ(θ) subject to the constraint that the solution be positive. Positivity is
enforced by requiring that the coefficients be positive; since the individual B-splines are positive,
see [6], the positivity of the solution is then guaranteed. The standard damped Gauss-Newton
algorithm leads to the iteration scheme

θ^(s+1) = θ^(s) − τ · [X'X + λΩ]⁻¹ [−X'r + λΩθ^(s)]  (2.5)

for s = 0, 1, 2, ... . τ is a step size which is chosen so that l_λ is minimized subject to θ^(s+1)
satisfying the constraints. A step size halving strategy is used. In our experiments we have very
rarely found it necessary to explicitly enforce the positivity constraint. X is the sensitivity matrix

X_(ij),k = ∂u(x_i, t_j; θ^(s))/∂θ_k  (2.6)

and r is the current residual,

r_ij = z_ij − u(x_i, t_j; θ^(s))  (2.7)

for i = 1, 2, ..., n , j = 1, 2, ..., m , and k = 1, 2, ..., p. Convergence is declared when successive
iterates produce essentially no change in the objective function; at convergence θ̂_λ is set to the
final iterate. The dominant cost in each iteration is the evaluation of the sensitivity matrix with respect to the p
coefficients. X is a very large array but fortunately it turns out that X does not have to be stored
in memory, rather it is only necessary to store X'X and X'r explicitly. This is described in
section 3. Before going on to this we will spend a bit of time discussing the practical
performance of the algorithm.
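The sketch below shows one way the iteration (2.5) with step halving might be organized; `model` is a hypothetical callable standing in for the PDE and sensitivity solvers of section 3, and the stopping rule follows (2.8) below.

```python
import numpy as np

def damped_gauss_newton(theta0, model, lam, Omega, z, tol=1e-3, max_iter=50):
    """Damped Gauss-Newton iteration (2.5) for the criterion (2.2).

    model(theta) is assumed to return (u, X): model values at the
    measurement sites and the sensitivity matrix (1.8)."""
    def crit(th):
        u, _ = model(th)
        return np.sum((z - u) ** 2) + lam * th @ Omega @ th

    theta = theta0.copy()
    l_old = crit(theta)
    for _ in range(max_iter):
        u, X = model(theta)
        r = z - u
        # Gauss-Newton direction from (2.5)
        step = np.linalg.solve(X.T @ X + lam * Omega,
                               -X.T @ r + lam * Omega @ theta)
        tau, trial = 1.0, theta
        while tau > 1e-8:                     # step-halving line search
            cand = theta - tau * step
            if np.all(cand > 0) and crit(cand) < l_old:   # keep positivity
                trial = cand
                break
            tau /= 2
        theta = trial
        l_new = crit(theta)
        if abs(l_old - l_new) / l_old < tol:  # stopping rule (2.8)
            break
        l_old = l_new
    return theta
```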
2.2. Performance of the Algorithm
2.2.1. Theoretical Convergence
Theoretical results concerning the convergence characteristics of the Gauss-Newton
algorithm are rather well documented in standard texts, see [18] for example. Newton methods
can be shown to be locally quadratically convergent for well behaved objective functions.
Damped Newton methods are globally convergent for strictly convex objective functions with
compact level sets. The objective function involved in (2.2) is very complicated and very
difficult to work with analytically.
The regularization functional consists of two parts: a residual sum of squares term and a
penalty term. The penalty term is quadratic and so it is clearly convex. If we knew that the
residual sum of squares term was approximately convex then we should have a good deal of
confidence in the convergence of the damped Gauss-Newton method. Let the residual sum of
squares be denoted RSS(θ). Some random profiles of the residual sum of squares function were
evaluated to investigate convexity. Two values for θ were generated as follows: θ_j^(1) = θ_j + 3·α_j
and θ_j^(2) = θ_j − 3·(1−α_j) for j = 1, 2, 3, ..., p, where θ is the true value and the α_j's are independent
uniforms over the interval [0, 1]. The percent errors in Euclidean norm between θ^(1), θ^(2)
and θ are substantial. Figure 2.1 gives plots of the residual sum of squares component of the
objective function,

RSS(βθ^(1) + (1−β)θ^(2)) for 0 ≤ β ≤ 1,

for the parabolic and for the hyperbolic problems.
The hyperbolic problem shows some non-convexity on small scales.
From these plots one would guess that the Gauss-Newton algorithm would perform rather well at
least provided the initial estimates are reasonable. (Of course even a very large number of random
profiles would not generate enough information to allow us to conduct a formal statistical test of a
convexity hypothesis - the set of alternatives is just too large. However given the appearance of
random profiles it would be unreasonable to act as if the objective function were pathological).
Line searches should alleviate concerns about starting guesses and the potential of getting caught
in local minima.
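A sketch of the convexity probe just described follows; `rss` is a hypothetical function returning the residual sum of squares for a coefficient vector, obtained by running the forward solver.

```python
import numpy as np

def rss_segment(rss, theta_true, rng, n_beta=50):
    """Evaluate RSS along a random segment between two perturbed
    profiles, as in section 2.2.1: theta1_j = theta_j + 3*alpha_j,
    theta2_j = theta_j - 3*(1 - alpha_j), alpha_j ~ Uniform[0,1]."""
    alpha = rng.uniform(size=theta_true.size)
    theta1 = theta_true + 3 * alpha
    theta2 = theta_true - 3 * (1 - alpha)
    betas = np.linspace(0, 1, n_beta)
    vals = np.array([rss(b * theta1 + (1 - b) * theta2) for b in betas])
    return betas, vals   # plotting vals against betas reveals non-convexity
```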
2.2.2. Observed Convergence Characteristic
The algorithm is terminated when the relative change in the objective function is small; we
test if

|l_λ(θ^(s+1)) − l_λ(θ^(s))| / l_λ(θ^(s)) < tol  (2.8)

where tol = .001. Estimates for successively smaller values of λ are computed, and the best
value is chosen using the method of Generalized Cross Validation described in section 4. If
λ₁ > λ₂ are successive values of λ at which solutions are computed, θ̂_λ₁ is used as a starting guess
in solving for θ̂_λ₂. For the examples computed in section 4, see Figure 4.2, the initial solution
took about 6 to 7 iterations but subsequent solutions numerically converged in 1 or 2 iterations.
The smoothest solution is essentially linear, so the computational effort for this solution
could be reduced by explicitly fitting a line for θ. The order of discretization in solving the
differential equations is N = 4·p, see the next section. With p = 50, average computing times
per iteration on the Cray X-MP supercomputer at UC-Berkeley were on the order of 10.2 seconds for the
parabolic problem and 65 seconds for the hyperbolic. With m = 100 time steps in the parabolic case
and m = 400 in the hyperbolic, this corresponds to a computing time per time step per λ-value on
the order of a tenth of a second.
3.1. Solving the Forward Problems
Implicit finite difference schemes are among the most popular methods for solving
parabolic and hyperbolic equations, see [23] for example. The interval 0 ≤ x ≤ 1 is divided into
N+1 subintervals each of width h = 1/(N+1). The discretized approximation to the solution of
the differential equation is obtained at time intervals of length Δ. The solution at time jΔ is
denoted u^j for j = 0, 1, 2, ... with u_i^j ≈ u(ih, jΔ; θ). Throughout this section we assume for
simplicity that measurement times are t_j = jΔ for j = 1, 2, ..., m. When the measurement times
are otherwise, a simple interpolation scheme is used, like that in section 3.3 below. The elliptic
component of the differential equations is approximated by a difference quotient

∂/∂x [θ(x) ∂u(ih, jΔ; θ)/∂x] ≈ θ(ih) (u_{i+1}^j − 2u_i^j + u_{i−1}^j)/h² + θ̇(ih) (u_{i+1}^j − u_{i−1}^j)/(2h)  (3.1)

for i = 1, 2, ..., N. The dot denotes differentiation with respect to x. The boundary conditions are used
to eliminate u_0^j and u_{N+1}^j from these equations and we obtain, in vector form,

∂/∂x [θ(x) ∂u(·, jΔ; θ)/∂x] ≈ A u^j + q^j  (3.2)

where A is a non-symmetric tridiagonal matrix (independent of time) and q^j involves boundary
and/or forcing conditions. With Dirichlet or zero Neumann boundary conditions q does not
involve θ and so for the parabolic problem q^j = 0. Non-zero Neumann boundary conditions force
q to depend on θ, and for the hyperbolic problem we have

q_i^j = −2θ(h) f(jΔ)/h + θ̇(h) f(jΔ) for i = 1 , and q_i^j = 0 for i > 1 .  (3.3)
Implicit difference schemes set the time differences of u equal to weighted averages of the
spatial difference approximations at successive time levels; such schemes correspond to rational
approximations of the matrix exponential function. The scheme used here is as follows.

Difference Scheme for the Parabolic Equation

S u^{j+1} = B u^j + Δ(½ q^{j+1} + ½ q^j) , j = 0, 1, 2, ...  (3.4)

where S = I − (Δ/2)A and B = I + (Δ/2)A. The method is completely specified once u⁰ is given.
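A minimal sketch of (3.1)-(3.4) for the parabolic problem (where q = 0) is given below; the boundary-row adjustments implied by the Neumann conditions are omitted for brevity, and grid values of θ and its x-derivative are assumed supplied.

```python
import numpy as np
from scipy.linalg import solve_banded

def assemble_A(theta, dtheta, N):
    """Tridiagonal A from (3.1)-(3.2); theta, dtheta hold theta(ih) and
    its x-derivative at the interior grid points, h = 1/(N+1).
    Interior rows only: boundary rows would need the Neumann fix-up."""
    h = 1.0 / (N + 1)
    main = -2 * theta / h**2
    upper = theta[:-1] / h**2 + dtheta[:-1] / (2 * h)   # A[i, i+1]
    lower = theta[1:] / h**2 - dtheta[1:] / (2 * h)     # A[i+1, i]
    return main, upper, lower

def crank_nicolson_steps(u0, main, upper, lower, dt, m):
    """March S u^{j+1} = B u^j with S = I - (dt/2)A, B = I + (dt/2)A (3.4)."""
    N = u0.size
    # banded storage of S for solve_banded: rows = (super, main, sub)
    S = np.zeros((3, N))
    S[0, 1:] = -dt / 2 * upper
    S[1, :] = 1 - dt / 2 * main
    S[2, :-1] = -dt / 2 * lower
    u, out = u0.copy(), [u0.copy()]
    for _ in range(m):
        Bu = (1 + dt / 2 * main) * u
        Bu[:-1] += dt / 2 * upper * u[1:]
        Bu[1:] += dt / 2 * lower * u[:-1]
        u = solve_banded((1, 1), S, Bu)
        out.append(u.copy())
    return np.array(out)
```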
Difference Scheme for the Hyperbolic Equation

S u^{j+1} = B u^j − S u^{j−1} + Δ²(¼ q^{j+1} + ½ q^j + ¼ q^{j−1}) , j = 0, 1, 2, ...  (3.5)

where S = I − (Δ²/4)A and B = 2I + (Δ²/2)A. Here u⁰ is given but the initial conditions must be used
to obtain u¹. Since ∂u/∂t (x, 0) = 0 we set u⁻¹ = u¹ to obtain

2S u¹ = B u⁰ + Δ²(¼ q¹ + ½ q⁰ + ¼ q⁻¹) .  (3.6)

The equations are easily solved by direct methods. The Cholesky decomposition of S is
obtained once at j = 0 and the solutions at all subsequent times are computed by back
substitution. In the case that θ depends on time (directly or via u, which occurs with non-linear
diffusion) separate Cholesky decompositions are needed at every time step.
3.2. Computation of the Sensitivity Matrix

Recall that the sensitivity matrix is defined as the gradient of the solution u with respect to
the profile θ. The k'th component of the sensitivity matrix is denoted d_k u = ∂u/∂θ_k. By
differentiating the implicit difference equations for the forward problem we obtain a method for
numerical approximation to d_k u; let d_k u^j = ∂u^j/∂θ_k.

Difference Scheme for d_k u in the Parabolic Equation

S d_k u^{j+1} = B d_k u^j − S_k u^{j+1} + B_k u^j , j = 0, 1, 2, ...  (3.7)

where S_k = ∂S/∂θ_k and B_k = ∂B/∂θ_k, both of which are easy to evaluate since S and B are linear in θ;
this comes from (3.1) and (3.2). The local support property of B-splines makes S_k and B_k have
mostly zero elements. The q term drops out because in the parabolic problem q is independent of
θ. d_k u⁰ = 0 again because u⁰ does not depend on θ.
Difference Scheme for d_k u in the Hyperbolic Equation

S d_k u^{j+1} = B d_k u^j − S d_k u^{j−1} − S_k u^{j+1} + B_k u^j − S_k u^{j−1}
 + Δ²(¼ d_k q^{j+1} + ½ d_k q^j + ¼ d_k q^{j−1}) , j = 1, 2, ...  (3.8)

again S_k = ∂S/∂θ_k and B_k = ∂B/∂θ_k. Since d_k u⁰ = 0, differentiating (3.6) gives

2S d_k u¹ = B_k u⁰ − 2S_k u¹ + Δ²(¼ d_k q¹ + ½ d_k q⁰ + ¼ d_k q⁻¹) .  (3.9)

3.3. Computation of X'r and X'X
Let d̃_k u^j represent the n-vector resulting from linearly interpolating d_k u^j to the
measurement sites x_i , i = 1, 2, ..., n. The approximations to X'r and X'X are

X'r_k ≈ Σ_{j=1}^{m} (d̃_k u^j)' r(t_j) , k = 1, 2, ..., p
X'X_kl ≈ Σ_{j=1}^{m} (d̃_k u^j)' (d̃_l u^j) , k, l = 1, 2, ..., p  (3.10)

where r(t_j) is the vector of residuals at the measurement sites at the j'th measurement time,
t_j = jΔ. At the j'th step we compute u^{j+1} and
then evaluate d_k u^{j+1} for k = 1, 2, ..., p. X'r and X'X are updated as

X'r_k^{j+1} = X'r_k^j + (d̃_k u^{j+1})' r(t_{j+1})
X'X_kl^{j+1} = X'X_kl^j + (d̃_k u^{j+1})' (d̃_l u^{j+1}) , j = 0, 1, ..., m−1 .  (3.11)

X'r⁰ and X'X⁰ are both set to zero.
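In code the updates (3.11) amount to a streaming accumulation in which X itself is never formed. A sketch, with hypothetical generators supplying the interpolated sensitivities and residuals at each time step:

```python
import numpy as np

def accumulate_cross_products(dku_tilde_steps, residual_steps, p):
    """Accumulate X'r and X'X over time steps as in (3.11).

    dku_tilde_steps yields, per time step j, a p x n array whose k'th row
    is the interpolated sensitivity d~_k u^j; residual_steps yields the
    n-vector r(t_j)."""
    Xr = np.zeros(p)
    XX = np.zeros((p, p))
    for D, r in zip(dku_tilde_steps, residual_steps):
        Xr += D @ r        # adds (d~_k u^j)' r(t_j) for every k
        XX += D @ D.T      # adds (d~_k u^j)'(d~_l u^j) for every k, l
    return Xr, XX
```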
Parallelism

It is obvious how to take advantage of parallel processing for computing X'r and X'X. In a
parallel processing environment, at the j'th step the computation of d_k u^{j+1} for k = 1, 2, ..., p is
distributed over available processors. An individual processor must first evaluate the right hand
side for d_k u^{j+1} and then obtain d_k u^{j+1} by back substitution. The shared data are u^{j+1}, u^j (u^{j−1} in
the hyperbolic case), and the S and B matrices. After all p of the d_k u^{j+1} have been computed, X'r
and X'X are updated using (3.11). The computation may then proceed to the next time step: first
the forward problem update is evaluated and, using this, the X'r and X'X updates are carried out.
Some experimentation with this scheme on an Alliant FX/8 at Lawrence Livermore National
Laboratory resulted in almost linear speed-up over sequential computation.
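A sketch of this organization using a thread pool; `solve_with_S` and `rhs_for_k` are hypothetical closures over the shared data (u^{j+1}, u^j, the factorization of S, and the S_k, B_k).

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def sensitivity_step(solve_with_S, rhs_for_k, p, n_workers=8):
    """One time step: distribute the p back substitutions for
    d_k u^{j+1} over a pool of workers, as described above."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        cols = pool.map(lambda k: solve_with_S(rhs_for_k(k)), range(p))
    return np.array(list(cols))
```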
Feasibility of Higher Dimensional Problems

In 1 dimension the computational effort per iteration is proportional to N·(p+1)·m. The N
term comes from the time required to solve the linear systems in (3.4)-(3.8), (p+1) because there
are (p+1) systems to solve at each time step, and m is the number of time steps. In higher
dimensions the computational effort will be determined by how well the analogues of the basic
linear systems may be solved. Direct solution of the linear systems (3.5)-(3.8) may be worth
investigating because such techniques are easily vectorized, see [19]. Iterative solution with
acceleration schemes is natural because at a given time step the solution at the previous time
step will serve as a good starting guess. Computational experience in solving time dependent
partial differential equations on modern vector and parallel computers is rapidly
accumulating, see [19] and the papers cited there. In our applications computing time
should grow as a function of p² (N = 4·p). For problems in three dimensions with p = 1000,
memory restrictions on the Berkeley machine prevented a direct numerical timing; the projection is
an extrapolation of the computing case considered in section 2, but assuming it is correct it converts to
about 3.75 seconds per time step per λ-value on a Cray 3. A single experiment
carried out on a four processor Cray 2 with p = 1000 indicates a computing time of 6 seconds per
time step per λ-value. Since the Cray 3 is considered to be between 2 and 4 times faster than a
Cray 2, the 3.75 second estimate seems conservative. Computing requirements on this order are
minor when compared to other costs involved in the practical context of these problems.
Furthermore, supercomputing technology is still moving rapidly and it is clear that in time
more powerful systems will become available to further reduce computing time concerns.
4. Applications of X'X

The availability of X'X makes it easy to tackle some important statistical inference
problems such as the adaptive choice of the regularization parameter and interval estimates for
parameters. Both of these problems are discussed in this section.
4.1. Adaptive Choice of the Regularization Parameter
A number of techniques have been proposed for choosing regularization parameters. For
example, section 5 of [11] describes two techniques, one due to Miller [12] and another which is a
variant of the discrepancy principle of Arcangeli [1] and Morozov [13]. In regularization problems
it is well known that the optimal choice of the regularization parameter is a function of the
unknown function θ and the unknown noise level. Miller's method requires an a priori upper
bound on J(θ), the penalty functional evaluated at the true parameter value, and also a specification
of the noise level in the data. In practice such quantities are difficult to obtain. For optimal
convergence in linear regularization problems the upper bound has to increase with increasing
sample size, yet it is unclear how the method can accommodate this. The discrepancy principle
approach is more clearly defined. The method has been analyzed and shown to be suboptimal,
see [9, 22], basically because it produces solutions which are consistently too smooth.
The choice of regularization-like parameters is a problem which has received considerable
attention in the statistical literature; a familiar instance is the problem of choosing the number of
variables to enter into a regression equation. A number of
procedures have emerged which have been shown to have good theoretical properties, one of
these methods is Generalized Cross-Validation (GCV), see [24]. The GCV criterion chooses λ by
minimizing the criterion

V(λ) = RSS(λ) / [mn − μ₁(λ)]²  (4.1)

where

μ₁(λ) = tr{ X [X'X + λΩ]⁻¹ X' } .  (4.2)

RSS(λ) is the residual sum of squares and [mn − μ₁(λ)] is referred to as the effective degrees
of freedom for error. See [14] for another use of this sort of cross-validation function in a non-
linear least squares regularization problem.
In linear regularization problems, the GCV estimate is known to produce estimates which
perform well from the point of view of the predictive mean square error loss. In the present
situation the predictive mean square error is

T(λ) = (1/mn) Σ_{i=1}^{n} Σ_{j=1}^{m} [u(x_i, t_j; θ̂_λ) − u(x_i, t_j; θ)]²  (4.3)

where θ is the true parameter and θ̂_λ is the regularization estimate. An asymptotic theory for
GCV in non-linear regularization models is not available at present but numerical experiments
in [14, 17] have shown that the technique has definite promise.
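With X and the residual vector available from the converged fit, evaluating V(λ) is immediate; the sketch below uses the trace form of μ₁(λ) in (4.2).

```python
import numpy as np

def gcv_score(lam, X, Omega, r):
    """V(lam) = RSS(lam) / [mn - mu1(lam)]^2, with mu1(lam) computed
    from the sensitivity matrix X at the fitted solution and r the
    residual vector there."""
    mn = r.size
    mu1 = np.trace(X @ np.linalg.solve(X.T @ X + lam * Omega, X.T))
    return (r @ r) / (mn - mu1) ** 2
```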
An Illustration

For the parabolic and hyperbolic problems data were generated according to (1.7) with the
ε_ij independent zero mean Gaussian variables with standard deviation σ. One
sample was generated with n = 80 equispaced measurement sites over the interval [0, 1] and a
subsample with n = 20 equispaced measurement sites was also taken. The temporal sampling in
the parabolic case consisted of m = 100 equispaced points over the interval [0, .35]; the
hyperbolic case used m = 400 equispaced points over the observation interval. σ was 3.0 in
the parabolic case and 0.1 in the hyperbolic case; this resulted in comparable ratios of σ to the sample
variation of the true signal in the parabolic and hyperbolic cases.
In total there were 8000 (2000) observations in the parabolic and
32000 (8000) observations in the hyperbolic case. Regularization estimates with p = 30 were
computed on each data set for a range of λ values and the GCV function evaluated. The starting
guess was a constant profile θ(x) = .5. Figure 4.1 shows the GCV function for each sample. The
GCV function is plotted against μ₁(λ), the effective degrees of freedom for the model;
μ₁(0) ≈ p = 30 (the order of discretization for θ) and μ₁(λ) → 2 as λ → ∞. In all four cases the
minimizer of the GCV criterion produces a value of λ which is very close to the minimizer of the
predictive mean square error. The efficacy of the GCV function is measured by

eff = min over λ of T(λ) / T(λ̂)  (4.4)

where λ̂ is the value of λ minimizing the GCV function. Efficacies, marked by "eff" on the
graphs, are close to one in all cases, which means that the GCV criterion chooses a value of λ
which performs well from the point of view of the predictive mean square error loss.
Remark

We chose to scale λ on a degrees of freedom for model scale, i.e. given a X'X matrix,
λ is chosen so that the effective degrees of freedom for the model, μ₁(λ), is some specified value.
Since μ₁ is monotonically decreasing with increasing λ, a zero finding routine was used to find the
λ which achieves the desired model degrees of freedom. At present the optimal value is found by
brute force, but gradient methods might also be used provided care is taken to avoid local minima;
cross-validation functions can be noisy, see Figure 4.1.
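A sketch of the zero-finding step on the degrees of freedom scale, using a bracketing root finder; the bracket endpoints are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def lam_for_df(target_df, X, Omega, lam_lo=1e-10, lam_hi=1e6):
    """Find lambda such that mu1(lambda) equals target_df; mu1 is
    monotone decreasing in lambda, so brentq applies on a bracket."""
    def mu1(lam):
        return np.trace(X @ np.linalg.solve(X.T @ X + lam * Omega, X.T))
    return brentq(lambda lam: mu1(lam) - target_df, lam_lo, lam_hi)
```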
4.2. Simultaneous Interval Estimates for Model Parameters
Approximate Bayesian interval estimates are obtained in a straightforward manner, see [24]
for example. The estimate θ̂_λ corresponds to the mode of a Gaussian approximation to the
posterior distribution of θ,

θ | z ~ N( θ̂_λ , σ²[X'X + λΩ]⁻¹ ) .  (4.5)

σ is estimated by the residual standard deviation

σ̂ = { RSS(λ̂) / [mn − μ₁(λ̂)] }^½ .  (4.6)

As indicated by Silverman [21], approximate interval estimates for θ, or functionals thereof, can
be generated by simulating from the posterior distribution. To obtain Highest Probability Density
(HPD) bands for the curve we generate realizations θ₁, θ₂, ..., θ_B from the posterior distribution
with B large, say 1000 or so. The 2.5'th and 97.5'th percentiles of the set of values
{ θ₁(x), θ₂(x), ..., θ_B(x) } are the lower and upper limits for an HPD interval at a point x. Since
(4.5) is Gaussian this interval has the form

θ̂_λ(x) ± 1.96 se(x)  (4.7)

where se(x) is the standard error in θ̂_λ(x). For simultaneous intervals we replace the standard
normal percentile value of 1.96 by a factor f with the property that

P( |θ(x) − θ̂_λ(x)| ≤ f · se(x) for all x ) = .95 ,  (4.8)

P evaluating the posterior probability of the event. The factor f may be estimated by
simulation, i.e. we take the 95'th percentile of the set of values

f_b = max over x of |θ_b(x) − θ̂_λ(x)| / se(x) , b = 1, ..., B .  (4.9)
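A sketch of the band construction by posterior simulation; `basis_vals` is the matrix of φ_k evaluations on a plotting grid, and the posterior form is the Gaussian approximation (4.5) above.

```python
import numpy as np

def hpd_bands(theta_hat, X, Omega, lam, sigma, basis_vals, B=1000, seed=0):
    """Simultaneous 95% HPD bands for theta(x) by simulation from the
    Gaussian posterior (4.5), following (4.8)-(4.9)."""
    rng = np.random.default_rng(seed)
    cov = sigma**2 * np.linalg.inv(X.T @ X + lam * Omega)
    draws = rng.multivariate_normal(theta_hat, cov, size=B)
    curves = draws @ basis_vals.T        # B posterior curves on the grid
    fit = basis_vals @ theta_hat
    se = curves.std(axis=0)              # pointwise standard errors
    fb = (np.abs(curves - fit) / se).max(axis=1)   # (4.9)
    f = np.quantile(fb, 0.95)            # simultaneous factor from (4.8)
    return fit - f * se, fit + f * se
```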
The simultaneous factor came out at about f = 3.0, so the band is set at the
estimate plus or minus roughly three standard errors. The true curve together with the
cross-validated regularization estimate and simultaneous HPD bands are plotted in Figure 4.2.
When the number of measurement sites is reduced from 80 to 20 the bands widen, expressing
a relative lack of confidence in these estimates. The intervals for the hyperbolic estimation are
wider than for the parabolic. This could be due to a difference in the temporal sampling rate.
Looking back to (1.5) and (1.6), we might also sense that while the parabolic estimation must
somehow involve an estimation of first order time derivatives of u, the
hyperbolic must involve an estimation of second order time derivatives.
This may be another reason why the hyperbolic intervals are wider.
A theoretical discussion of Bayesian confidence intervals for linear regularization
estimators can be found in [4]. One might argue that the intervals ought to be adjusted to take
into account the uncertainty in the determination of λ. An investigation of this issue is beyond
the scope of this paper.
5. Resolution Analysis

A straightforward procedure for the analysis of resolution characteristics based on X'X is
described in [15]. The method is based on a one-step Gauss-Newton expansion; some asymptotic
analysis of this is provided in [16]. Here we present a simulation study which shows that
resolution analysis based on X'X works well in finite samples.
5.1. Bias and Variance
The error in the regularization estimator θ̂_λ may be expressed as

θ̂_λ − θ₀ = (θ_λ − θ₀) + (θ̂_λ − θ_λ)  (5.1)

where θ₀ is the true value of θ and θ_λ is the regularized solution with continuous error free data.
The first term is the systematic error, or bias in statistical jargon, and the second is the random
error. The one-step Gauss-Newton approximations to the systematic and random errors are

θ_λ − θ₀ ≈ [X'X + λΩ]⁻¹ X'X θ₀ − θ₀  (5.2)

and

θ̂_λ − θ_λ ≈ [X'X + λΩ]⁻¹ X'e  (5.3)

where e is the vector of errors. These approximations arise by applying one step of
(2.5) with τ = 1; in the first case the sensitivity matrix is computed at θ₀ while in the second it is
computed at θ_λ. If the errors have mean zero and are uncorrelated with constant variance σ², then
from (5.3)

Cov(θ̂_λ − θ_λ) ≈ σ² [X'X + λΩ]⁻¹ X'X [X'X + λΩ]⁻¹ .  (5.4)

The one-step linearization of the random error can be justified asymptotically; the two formulas are asymptotically
valid provided the estimator lies in a neighborhood of the true value.
With these approximations, asymptotic resolution characteristics can be studied by
considering the simultaneous diagonalization of X'X and Ω, see [15] for example.
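In matrix form the approximations (5.2) and (5.4) are direct to evaluate once X'X and Ω are in hand; a sketch:

```python
import numpy as np

def linearized_bias_sd(theta0, X, Omega, lam, sigma):
    """One-step approximations to the systematic error (5.2) and the
    pointwise standard deviations implied by (5.4), with X evaluated
    at the true profile theta0."""
    G = np.linalg.inv(X.T @ X + lam * Omega)
    XtX = X.T @ X
    bias = G @ XtX @ theta0 - theta0      # (5.2)
    cov = sigma**2 * G @ XtX @ G          # (5.4)
    return bias, np.sqrt(np.diag(cov))
```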
5.2. A Simulation Experiment
A simulation study was used to evaluate the accuracy of the X'X-based resolution analysis.
Data sets were generated according to the parabolic and hyperbolic models,

z_ij^(s) = u(x_i, t_j; θ₀) + ε_ij^(s) , s = 1, 2, ...  (5.5)

for i = 1, 2, ..., n and j = 1, 2, ..., m, with the ε_ij^(s) being i.i.d. zero mean Gaussian random variables
with standard deviation σ. The setup was the same as in section 4 except that only the n = 80
spatial sampling was considered. For each of several λ-values the regularization estimate θ̂_λ[s]
was computed for 25 data sets generated according to (5.5). For fixed λ, the mean and variance
of θ̂_λ[s] over the simulation runs provided direct estimates of the pointwise bias and variance.
The approximations in (5.3) and (5.4) were used to obtain X'X-based pointwise bias and variance
characteristics. The linearized bias based on (5.2) is a poor predictor of the true bias except for
small λ. For larger λ the bias is much better approximated by evaluating the regularization
estimator on error free data (i.e. (5.5) without the ε_ij^(s)). Pointwise bias and standard deviation
results for λ corresponding to a 4.5 degree of freedom model (see Figure 4.1) are presented in
Figure 5.1. The approximation matches the true characteristics remarkably well. We see that the regularization
estimate overshoots the true curve at the edges in the parabolic case and undershoots in the hyperbolic
case. It is natural to want to use this kind of information in a resolution analysis, though the
approximation to the variance degrades for larger λ-values, so some caution is needed there.
Figure 5.2 plots the percent error between the true mean integrated squared error and its
X'X-based approximation. For both the parabolic and hyperbolic problems, when the model degrees
of freedom are not too small, the X'X-based analysis provides a remarkably good approximation to
the true mean integrated squared error.
Acknowledgement

I am grateful to Michael Stewart and Tony for a number of valuable comments.

References
1. Arcangeli, R., "Pseudo-solution de l'équation Ax = y," Comptes Rendus des Séances de l'Académie des Sciences, Ser. A, 1966.
2. Chen, W. H., Gavalas, G. R., Seinfeld, J. H., and Wasserman, M. L., "A New Algorithm for Automatic History Matching," Soc. Pet. Eng. J., vol. 14, pp. 593-608, 1974.
3. Cooley, R. L., "Incorporation of prior information on parameters into nonlinear regression groundwater flow models, 1. Theory," Water Resour. Res., vol. 18, pp. 965-976, 1982.
4. Cox, D. D., "An Analysis of Bayesian Inference for Non-parametric Regression," Dept. of Statistics, UW-Seattle, 1986.
5. Cox, D. D. and O'Sullivan, F., "Analysis of penalized likelihood type estimators with application to generalized smoothing in Sobolev Spaces," 1988 (revision in preparation for Annals of Statistics).
6. de Boor, C., A Practical Guide to Splines, Springer-Verlag, New York, 1978.
7. Deuflhard, P. and Hairer, E., Numerical Treatment of Inverse Problems in Differential and Integral Equations, Birkhäuser, Boston, 1983.
8. Engl, H. W. and Groetsch, C. W., Inverse and Ill-posed Problems, Academic Press, Inc., Boston, 1987.
9. Engl, H. W. and Neubauer, A., "Optimal parameter choice for ordinary and iterated Tikhonov regularization," in Inverse and Ill-Posed Problems, ed. H. W. Engl and C. W. Groetsch, pp. 97-125, Academic Press Inc., Boston, 1987.
10. Kimeldorf, G. S. and Wahba, G., "A Correspondence between Bayesian Estimation on Stochastic Processes and Smoothing by Splines," Annals of Mathematical Statistics, vol. 41, pp. 495-502, 1970.
11. Kravaris, C. and Seinfeld, J. H., "Identification of parameters in distributed parameter systems by regularization," SIAM J. Control and Optimization, vol. 23, pp. 217-241, 1985.
12. Miller, K., "Least squares methods for ill-posed problems with a prescribed bound," SIAM J. Math. Anal., vol. 1, pp. 52-74, 1970.
13. Morozov, V. A., "On the solution of functional equations by the method of regularization," Soviet Math. Dokl., vol. 7, pp. 414-417, 1966.
14. O'Sullivan, F. and Wahba, G., "A cross validated Bayesian retrieval algorithm for non-linear remote sensing experiments," J. Comp. Physics, vol. 59, pp. 441-455, 1985.
15. O'Sullivan, F., "A statistical perspective on ill-posed inverse problems (with discussion)," Statist. Science, vol. 1, pp. 502-527, 1986.
16. O'Sullivan, F., "Constrained non-linear least squares regularization with application to the estimation of functional parameters in elliptic partial differential equations," Tech. Rep. No. 99, Dept. of Statistics, UC-Berkeley, 1987.
17. O'Sullivan, F., "Fast computation of fully automated log-density and log-hazard estimators," SIAM J. Sci. Statist. Comput. (in press), 1988.
18. Ortega, J. M. and Rheinboldt, W. C., Iterative Solution of Non-linear Equations in Several Variables, Academic Press, New York, 1970.
19. Ortega, J. M. and Voigt, R. G., "Solution of Partial Differential Equations on Vector and Parallel Computers," SIAM Review, vol. 27, pp. 149-240, 1985.
20. Santosa, F. and Symes, W. W., "Linear inversion of band-limited reflection seismograms," SIAM J. Sci. Statist. Comput., vol. 7, pp. 1307-1330, 1986.
21. Silverman, B. W., "Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion)," J. Roy. Statist. Soc. Ser. B, vol. 47, pp. 1-52, 1985.
22. Titterington, D. M., "Common structure of smoothing techniques in statistics," Inter. Statist. Rev., vol. 53, pp. 141-170, 1985.
23. Twizell, E. H., Computational Methods for Partial Differential Equations, Ellis Horwood, Chichester, 1984.
24. Wahba, G., "Bayesian confidence intervals for the cross-validated smoothing spline," J. R. Statist. Soc. Ser. B, vol. 45, pp. 133-150, 1983.
25. Wahba, G., "Partial and interaction spline models for semi-parametric estimation of functions of several variables," Proceedings of the 18th Symposium on the Interface, pp. 75-80, American Statistical Association, 1986.
Figure 1.1.1: Parameters for the Parabolic Problem.
Figure 1.1.2: Behavior of u(x,t) in the Parabolic Problem. This is a gray scale image plot with darker shading representing larger values.
Figure 1.2.1: Parameters for the Hyperbolic Problem.
Figure 1.2.2: Behavior of u(x,t) in the Hyperbolic Problem. This is a gray scale image plot with darker shading representing larger values.
Figure 2.1: Random Profiles of the Residual Sum of Squares.
Figure 4.1: Cross-validation functions plotted against the effective degrees of freedom for the model, μ₁(λ) in (4.2). The efficacy relative to the predictive mean square error, see (4.4), is indicated by eff.
Figure 4.2: Estimate and 95% Confidence Bands. The plots show the true curve (solid line), the cross-validated regularization estimate (dashed line) and the upper and lower confidence bands (dotted lines).
Figure 5.1: Pointwise Error Characteristics for λ giving a model degrees of freedom of 4.5. The plots show the true error characteristic estimated from the simulation (solid line) and the theoretical approximations based on X'X (dashed line).
Figure 5.2: Percent Errors in the Mean Integrated Squared Error Approximation.
[Plots not reproduced. Recoverable panel information:
Figure 2.1 panels: Parabolic n=80, Parabolic n=20, Hyperbolic n=80, Hyperbolic n=20; horizontal axis: Beta.
Figure 4.1 panels: Parabolic n=80, Parabolic n=20, Hyperbolic n=80, Hyperbolic n=20; horizontal axis: Model Degrees of Freedom.
Figure 5.1 panels: Parabolic (bias), Parabolic (s.d.), Hyperbolic (bias), Hyperbolic (s.d.).
Figure 5.2 panels: Parabolic, Hyperbolic; horizontal axis: Model Degrees of Freedom.]