© 2008 Royal Statistical Society 1369-7412/08/70567
J. R. Statist. Soc. B (2008) 70, Part 3, pp. 567-587
A wavelet- or lifting-scheme-based imputation method
T. J. Heaton
Oxford University, UK
and B. W. Silverman
St Peter's College, Oxford, UK
[Received September 2006. Revised September 2007]
Summary. The paper proposes a new approach to imputation using the expected sparse representation of a surface in a wavelet or lifting scheme basis. Our method incorporates a Bayesian mixture prior for these wavelet coefficients into a Gibbs sampler to generate a complete posterior distribution for the variable of interest. Intuitively, the estimator operates by borrowing strength from those observed neighbouring values to impute at the unobserved sites. We demonstrate the strong performance of our estimator in both one- and two-dimensional imputation problems where we also compare its application with the standard imputation techniques of kriging and thin plate splines.
Keywords: Bayesian prior; Gibbs sampler; Imputation; Kriging; Lifting scheme; Thin plate splines; Wavelets
1.1. Aims of an imputation method
Often when performing a spatial survey there are sites of interest for which the measured variable might be unobserved. These could arise either through non-response or simply because the site of interest could not be included in our original survey. A variety of geostatistical techniques, for instance LOESS (Cleveland et al., 1992), thin plate splines (Green and Silverman, 1994) and kriging (Cressie, 1993), have been developed which attempt to estimate the unobserved site on the basis of the surrounding observed sites. Ideally any such imputation method will have the following properties:
(a) it will produce a feasible estimate of the value at the missing site: we wish the estimate that is obtained to be close to its true, unobserved value;
(b) it will produce an estimate which indicates the uncertainty in the surrounding environment: if the missing site is in an area of high volatility, we wish the estimator to reflect that, and similarly in regions of low volatility;
(c) it will be computationally feasible: it will produce the estimate sufficiently rapidly to allow its use in practical situations;
(d) it will require no specification of unknown problem-specific parameters: we wish a method to be fully automatic and to require little or no subjective specification of problem-specific parameters. Many current methods suffer from this problem of determining a suitable smoothing or covariance.
Address for correspondence: T. J. Heaton, Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK. E-mail: firstname.lastname@example.org
In this paper, we propose a non-parametric method for imputation satisfying all the above criteria. Our method operates by utilizing the expected sparsity of wavelet expansions within a Markov chain Monte Carlo framework. As we shall show, it is fully adaptive to the observed data and requires no prespecification of parameters. The scheme proposed produces a complete posterior distribution on the possible values at these missing sites that is capable of representing distinct features of the possible underlying response function. It is also able to recognize and react to the nature of the missing site, producing an estimate with a small range of possible values if the surrounding area is stable and alternatively a large range if the area is more volatile. Finally, owing to the compact support of the wavelets that we use, our method remains computationally inexpensive.
1.2. Wavelet coefficients and a Markov chain Monte Carlo approach to imputation
The great strength of wavelets is their ability to provide parsimonious representations of a large space of functions including those that contain inhomogeneities. As a result, it is often reasonable to assume that an observed function is well approximated by a sparse wavelet expansion with few non-zero coefficients. One approach, which was first suggested by Abramovich et al. (1998), that attempts to capture this potential sparsity models each wavelet coefficient $\theta_{jk}$ independently as
$$f_{\text{prior}}(\theta_{jk}) = (1 - w_j)\,\delta_0 + w_j\,\gamma_a(\theta_{jk}),$$
which is a mixture of an atom of probability at zero $\delta_0$ and a unimodal symmetric density $\gamma_a(\cdot)$. Here, $0 \le w_j \le 1$ is a level-dependent mixing probability and the subscript $a = a_j$ allows possible incorporation of a level-dependent scaling parameter. We incorporate this mixture prior into a Gibbs sampler (Gelfand and Smith, 1990) to take advantage of the function's expected sparse wavelet expansion and hence to borrow strength from those neighbouring sites that we have observed. Intuitively, our method proceeds by updating the missing data points so that the completed function is expressed economically according to our prior.
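To make the mixture prior concrete, the following minimal sketch draws the coefficients of one level from a prior with an atom at zero and a continuous symmetric component. It assumes a normal density for the non-zero component purely for illustration; the function names, the seed and the numerical values of the mixing probability and scale are hypothetical and are not taken from the paper.

```python
import random

def sample_prior(w, scale, rng):
    """Draw one coefficient from (1 - w) * delta_0 + w * gamma_a.

    Assumption: gamma_a is taken to be N(0, scale^2) for illustration only.
    """
    if rng.random() < w:
        return rng.gauss(0.0, scale)  # non-zero component
    return 0.0                        # atom of probability at zero

def sample_level(n_coeffs, w, scale, seed=0):
    """Draw all coefficients of one level independently from the prior."""
    rng = random.Random(seed)
    return [sample_prior(w, scale, rng) for _ in range(n_coeffs)]

# Hypothetical level with mixing probability 0.1 and scale 2.0.
coeffs = sample_level(n_coeffs=1000, w=0.1, scale=2.0)
nonzero_fraction = sum(c != 0.0 for c in coeffs) / len(coeffs)
print(f"fraction of non-zero coefficients: {nonzero_fraction:.3f}")
```

With a small mixing probability most draws are exactly zero, which is the sparsity that the prior is intended to encode.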
1.3. Layout of the paper
The concepts in this paper are applicable to any wavelet-type scheme which creates a sparse expansion in some suitable basis. This includes not only standard one-dimensional wavelets but also the lifting scheme (Sweldens, 1997) and, in particular, its two-dimensional extension. We present results by using both the one-dimensional discrete wavelet transform of Mallat (1989a) and the two-dimensional Voronoi-based lifting scheme that was introduced by Jansen et al. (2004). The one-dimensional case allows a simpler understanding of our method's operation whereas the two-dimensional extension is of much more use in practical applications. Throughout this paper we use the terms wavelet coefficient and lifting coefficient interchangeably.
In Section 2, our method is formally developed with a description of how a Bayesian prior enables imputation. Here we set out the relationships between the various variables within our Gibbs sampler with a special emphasis on the significant computational savings that are available. We continue the explanation of our method in Section 3 where we consider three possible priors to use when modelling the non-zero wavelet coefficients: normal, Laplace and quasi-Cauchy, each requiring different implementation. We give detailed information about each and present the computational implications for their use.
Our technique's performance is demonstrated in Section 4. Initially, we consider the use of one-dimensional wavelets and present some general features of our method. This is followed by a presentation of our imputation results on two-dimensional problems. We consider both a simulated example and real life imputation of rainfall across America. Here we also compare our methods with the traditional approaches of kriging and thin plate splines.
The data that are analysed in the paper can be obtained from
2. The method
2.1. Notation and the wavelet set-up
Suppose that we have a series of sites $t_1, \ldots, t_n$ at which we aim to measure the value of some response function $h(t)$. Further assume that the $t_i$s can be partitioned into $t = (t_F, t_M)$ as follows. At sites $t_F$ we can observe the response function $h$ subject to noise,
$$x_j = h(t_j) + \varepsilon_j,$$
whereas at the sites $t_M$ we cannot gain any observation. Here the $\varepsilon_j$ are independent $N(0, \sigma_j^2)$ random variables. Our problem is the estimation of $h$ at these off-plan sites $t_M$.
Denote by $x_F$ those known, noisy observations corresponding to the on-plan sites $t_F$ and let us postulate that the off-plan sites $t_M$ would generate the unknown observations $x_M$. As such our completed data set would become
$$x = (x_M, x_F),$$
where $x_M$ are the postulated noisy observations at missing sites and $x_F$ are the fixed known noisy observations.
The central idea of our approach is to use either the discrete wavelet transform (Mallat, 1989a, b) or the lifting scheme (Jansen et al., 2004) to transform these $x$ to an alternative basis in which we can legitimately expect the expansion to be sparse. Both of these techniques will leave us with a series of empirical wavelet coefficients $z_{jk}$ which can be modelled as
$$z_{jk} = \theta_{jk} + \varepsilon_{jk}$$
where $\theta_{jk}$ are the underlying wavelet coefficients of the unknown function $h(\cdot)$, and $\varepsilon_{jk}$ are independent $N(0, \sigma_{jk}^2)$ random variables. In the wavelet domain, each of the $x_M$ will typically affect several of the wavelet coefficients. We denote by $z_M$ those empirical wavelet coefficients which do depend on the values $x_M$. Those empirical wavelet coefficients which do not depend on $x_M$ will conversely be denoted by $z_F$. We partition the $\theta$s analogously.
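The split of the empirical coefficients into those that depend on a missing observation and those that do not can be illustrated with a small sketch. It assumes an orthonormal Haar transform on a signal of dyadic length, chosen only because it is short to write down; the paper's own transforms are the discrete wavelet transform and the Voronoi-based lifting scheme, and the helper names below are hypothetical.

```python
import math

def haar_dwt(x):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Returns [scaling coefficient, coarsest detail, ..., finest details].
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser levels are placed in front
    return approx + details

def affected_coefficients(n, missing_index, tol=1e-12):
    """Indices of Haar coefficients that depend on x[missing_index]."""
    unit = [0.0] * n
    unit[missing_index] = 1.0
    # The transform is linear, so the image of a unit vector shows
    # exactly which coefficients the chosen site feeds into.
    return [i for i, z in enumerate(haar_dwt(unit)) if abs(z) > tol]

# One missing site (index 3) in a signal of length 8.
z_M = affected_coefficients(n=8, missing_index=3)
print(z_M)  # → [0, 1, 2, 5]
```

In this toy case a single missing site touches the scaling coefficient and one detail coefficient per level, so four of the eight coefficients fall into the dependent group and the remaining four are unaffected.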
2.2. The principle
2.2.1. Gibbs sampling
Although it may appear that in the wavelet domain we have made our problem more complicated (there will be more wavelet observations which are affected by the missing data points than in our original domain), we expect the wavelet coefficients to be sparse. This expected parsimony is naturally incorporated in the model by placing the Bayesian prior of Abramovich et al. (1998) on our underlying population wavelet coefficients,
$$f_{\text{prior}}(\theta_{jk}) = (1 - w_j)\,\delta_0 + w_j\,\gamma_a(\theta_{jk})$$
where $0 \le w_j \le 1$, $\gamma_a$ is a symmetric unimodal density with possible level-dependent scaling parameter $a_j$ and the $\theta_{jk}$s are independent. Here $j$ denotes the level of the wavelet decomposition in which our coefficient lies. For the lifting scheme we impose an artificial dyadic level structure as discussed in Jansen et al. (2004) so that $a_j$ and $w_j$ can be chosen level by level, using information within each level to make that choice. To complete the specification of the prior we assume that the $w_j$s are independent with $w_j \sim \text{beta}(\alpha_j, \beta_j)$. This beta model enables us to use conjugacy in our Gibbs sampler algorithm. Section 2.3 addresses how to choose $\alpha_j$ and $\beta_j$, along with the scale parameter $a_j$, in a data-adaptive manner.
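The role of conjugacy can be sketched as follows. Under a beta prior on the mixing probability of a level, and given which coefficients in that level are currently non-zero, the standard beta-Bernoulli result gives a beta conditional whose shape parameters are the prior ones plus the counts of non-zero and zero coefficients. The code below is a minimal illustration of that single update, not the paper's full sampler; the function name, the shape parameters (called alpha and beta here) and the example state are hypothetical.

```python
import random

def update_mixing_weight(thetas, alpha, beta, rng):
    """Draw the mixing probability of one level from its conditional.

    Standard beta-Bernoulli conjugacy: the conditional distribution is
    beta(alpha + number of non-zero, beta + number of zero coefficients).
    """
    n_nonzero = sum(1 for t in thetas if t != 0.0)
    n_zero = len(thetas) - n_nonzero
    post_alpha = alpha + n_nonzero
    post_beta = beta + n_zero
    return rng.betavariate(post_alpha, post_beta), (post_alpha, post_beta)

rng = random.Random(1)
# Hypothetical current state of one level: two non-zero coefficients out of eight.
level_thetas = [0.0, 0.0, 1.3, 0.0, -0.4, 0.0, 0.0, 0.0]
w_draw, post_params = update_mixing_weight(level_thetas, alpha=1.0, beta=1.0, rng=rng)
print(post_params)  # → (3.0, 7.0)
```

Because the conditional is available in closed form, this step of a Gibbs sweep needs only a count and a single beta draw per level.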
With this prior on the wavelet coefficients we allow access to a Gibbs sampler; see Gem