
© 2008 Royal Statistical Society 1369–7412/08/70567

J. R. Statist. Soc. B (2008) 70, Part 3, pp. 567–587

A wavelet- or lifting-scheme-based imputation method

T. J. Heaton

Oxford University, UK

and B. W. Silverman

St Peter’s College, Oxford, UK

[Received September 2006. Revised September 2007]

Summary. The paper proposes a new approach to imputation using the expected sparse representation of a surface in a wavelet or lifting scheme basis. Our method incorporates a Bayesian mixture prior for these wavelet coefficients into a Gibbs sampler to generate a complete posterior distribution for the variable of interest. Intuitively, the estimator operates by borrowing strength from those observed neighbouring values to impute at the unobserved sites. We demonstrate the strong performance of our estimator in both one- and two-dimensional imputation problems where we also compare its application with the standard imputation techniques of kriging and thin plate splines.

Keywords: Bayesian prior; Gibbs sampler; Imputation; Kriging; Lifting scheme; Thin plate splines; Wavelets

1. Introduction

1.1. Aims of an imputation method

Often when performing a spatial survey there are sites of interest for which the measured variable might be unobserved. These could arise either through non-response or simply because the site of interest could not be included in our original survey. A variety of geostatistical techniques, for instance LOESS (Cleveland et al., 1992), thin plate splines (Green and Silverman, 1994) and kriging (Cressie, 1993), have been developed which attempt to estimate the unobserved site on the basis of the surrounding observed sites. Ideally any such imputation method will have the following properties:

(a) it will produce a feasible estimate of the value at the missing site: we wish the estimate that is obtained to be close to its true, unobserved value;

(b) it will produce an estimate which indicates the uncertainty in the surrounding environment: if the missing site is in an area of high volatility, we wish the estimator to reflect that and similarly in regions of low volatility;

(c) it will be computationally feasible: it will produce the estimate sufficiently rapidly as to allow its use in practical situations;

(d) it will require no specification of unknown problem-specific parameters: we wish a method to be fully automatic and to require little or no subjective specification of problem-specific parameters. Many current methods suffer from this problem of determining a suitable smoothing or covariance.

Address for correspondence: T. J. Heaton, Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK. E-mail: [email protected]

In this paper, we propose a non-parametric method for imputation satisfying all the above criteria. Our method operates by utilizing the expected sparsity of wavelet expansions within a Markov chain Monte Carlo framework. As we shall show, it is fully adaptive to the observed data and requires no prespecification of parameters. The scheme proposed produces a complete posterior distribution for the values at these missing sites that is capable of representing distinct features of the possible underlying response function. It is also able to recognize and react to the nature of the missing site, producing an estimate with a small range of possible values if the surrounding area is stable and alternatively a large range if the area is more volatile. Finally, owing to the compact support of the wavelets that we use, our method remains computationally inexpensive.

1.2. Wavelet coefficients and a Markov chain Monte Carlo approach to imputation

The great strength of wavelets is their ability to provide parsimonious representations of a large space of functions including those that contain inhomogeneities. As a result, it is often reasonable to assume that an observed function is well approximated by a sparse wavelet expansion with few non-zero coefficients. One approach, which was first suggested by Abramovich et al. (1998), that attempts to capture this potential sparsity models each wavelet coefficient $\theta_{jk}$ independently as

\[ f_{\mathrm{prior}}(\theta_{jk}) = (1 - w_j)\,\delta_0 + w_j\,\gamma_a(\theta_{jk}), \]

which is a mixture of an atom of probability at zero, $\delta_0$, and a unimodal symmetric density $\gamma_a(\cdot)$. Here, $0 \le w_j \le 1$ is a level-dependent mixing probability and the subscript $a = a_j$ allows possible incorporation of a level-dependent scaling parameter. We incorporate this mixture prior into a Gibbs sampler (Gelfand and Smith, 1990) to take advantage of the function’s expected sparse wavelet expansion and hence to borrow strength from those neighbouring sites that we have observed. Intuitively, our method proceeds by updating the missing data points so that the ‘completed’ function is expressed economically according to our prior.
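To illustrate the sparsity that this prior encodes, the following minimal sketch draws coefficients from the mixture. It assumes a Laplace density for $\gamma_a$ (one of the choices considered in Section 3); the function name and the values w = 0.1 and a = 0.5 are illustrative assumptions, not settings taken from the paper.

```python
import random


def sample_mixture_prior(w, a, rng):
    """Draw one coefficient: exactly zero with probability 1 - w,
    otherwise a Laplace variate with rate a (assumed form of gamma_a)."""
    if rng.random() >= w:
        return 0.0
    # A Laplace draw is an exponential magnitude with a random sign.
    magnitude = rng.expovariate(a)
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(0)
theta = [sample_mixture_prior(w=0.1, a=0.5, rng=rng) for _ in range(1000)]
nonzero_fraction = sum(t != 0.0 for t in theta) / len(theta)
print(f"fraction of non-zero coefficients: {nonzero_fraction:.3f}")
```

With a small mixing probability, most draws are exactly zero, which is the economical representation that the sampler later favours.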

1.3. Layout of the paper

The concepts in this paper are applicable to any wavelet-type scheme which creates a sparse expansion in some suitable basis. This includes not only standard one-dimensional wavelets but also the lifting scheme (Sweldens, 1997) and, in particular, its two-dimensional extension. We present results by using both the one-dimensional discrete wavelet transform of Mallat (1989a) and the two-dimensional Voronoi-based lifting scheme that was introduced by Jansen et al. (2004). The one-dimensional case allows a simpler understanding of our method’s operation whereas the two-dimensional extension is of much more use in practical applications. Throughout this paper we use the terms wavelet coefficient and lifting coefficient interchangeably.

In Section 2, our method is formally developed with a description of how a Bayesian prior enables imputation. Here we set out the relationships between the various variables within our Gibbs sampler with a special emphasis on the significant computational savings that are available. We continue the explanation of our method in Section 3 where we consider three possible priors to use when modelling the non-zero wavelet coefficients: normal, Laplace and quasi-Cauchy, each requiring different implementation. We give detailed information about each and present the computational implications for their use.

Our technique’s performance is demonstrated in Section 4. Initially, we consider the use of one-dimensional wavelets and present some general features of our method. This is followed by a presentation of our imputation results on two-dimensional problems. We consider both a simulated example and a real life imputation of rainfall across America. Here we also compare our methods with the traditional approaches of kriging and thin plate splines.

The data that are analysed in the paper can be obtained from

http://www.blackwellpublishing.com/rss

2. The method

2.1. Notation and the wavelet set-up

Suppose that we have a series of sites $t_1, \ldots, t_n$ at which we aim to measure the value of some response function $h(t)$. Further assume that the $t_i$ can be partitioned into $t = (t^F, t^M)$ as follows. At sites $t^F$ we can observe the response function $h$ subject to noise,

\[ x_j = h(t_j) + \varepsilon_j, \]

whereas at the sites $t^M$ we cannot gain any observation. Here the $\varepsilon_j$ are independent $N(0, \sigma_j^2)$ random variables. Our problem is the estimation of $h$ at these ‘off-plan’ sites $t^M$.

Denote by $x^F$ those known, noisy observations corresponding to the ‘on-plan’ sites $t^F$ and let us postulate that the ‘off-plan’ sites $t^M$ would generate the unknown observations $x^M$. As such our ‘completed’ data set would become

\[ x = \begin{cases} x^M & \text{postulated noisy observations at missing sites,} \\ x^F & \text{fixed known noisy observations.} \end{cases} \]

The central idea of our approach is to use either the discrete wavelet transform (Mallat, 1989a, b) or the lifting scheme (Jansen et al., 2004) to transform these $x$ to an alternative basis in which we can legitimately expect the expansion to be sparse. Both of these techniques will leave us with a series of empirical wavelet coefficients $z_{jk}$ which can be modelled as

\[ z_{jk} = \theta_{jk} + \varepsilon_{jk}, \]

where $\theta_{jk}$ are the underlying wavelet coefficients of the unknown function $h(\cdot)$, and $\varepsilon_{jk}$ are independent $N(0, \sigma_{jk}^2)$ random variables. In the wavelet domain, each of the $x^M$ will typically affect several of the wavelet coefficients. We denote by $z^M$ those empirical wavelet coefficients which do depend on the values $x^M$. Those empirical wavelet coefficients which do not depend on $x^M$ will conversely be denoted by $z^F$. We partition the $\theta$s analogously.
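The partition into $z^M$ and $z^F$ can be made concrete with a small sketch. It assumes the orthonormal Haar wavelet (chosen here only for brevity, not one of the filters used later in the paper) and finds which coefficients change when a single site is perturbed; the function names are hypothetical.

```python
import math


def haar_dwt(x):
    """Orthonormal Haar transform of a list whose length is a power of two.
    Output order: [scaling, coarsest detail, ..., finest details]."""
    coeffs = []
    approx = list(x)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        detail = [(p - q) / math.sqrt(2) for p, q in pairs]
        approx = [(p + q) / math.sqrt(2) for p, q in pairs]
        coeffs = detail + coeffs  # coarser levels are placed in front
    return approx + coeffs


def affected_coefficients(n, site):
    """Indices of coefficients that depend on the value at `site`.
    The transform is linear, so a unit impulse reveals the dependence."""
    impulse = [0.0] * n
    impulse[site] = 1.0
    return [k for k, c in enumerate(haar_dwt(impulse)) if abs(c) > 1e-12]


z_M_indices = affected_coefficients(16, site=5)
print(z_M_indices)  # one coefficient per level plus the scaling coefficient
```

For Haar with 16 sites only 5 of the 16 coefficients depend on the missing value; wavelets with longer filters touch more coefficients per level, but the set remains small because of compact support.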

2.2. The principle

2.2.1. Gibbs sampling

Although it may appear that in the wavelet domain we have made our problem more complicated (there will be more wavelet observations which are affected by the missing data points than in our original domain), we expect the wavelet coefficients to be sparse. This expected parsimony is naturally incorporated in the model by placing the Bayesian prior of Abramovich et al. (1998) on our underlying population wavelet coefficients,

\[ f_{\mathrm{prior}}(\theta_{jk}) = (1 - w_j)\,\delta_0 + w_j\,\gamma_a(\theta_{jk}), \]

where $0 \le w_j \le 1$, $\gamma_a$ is a symmetric unimodal density with possible level-dependent scaling parameter $a_j$ and the $\theta_{jk}$ are independent. Here $j$ denotes the level of the wavelet decomposition in which our coefficient lies. For the lifting scheme we impose an artificial dyadic level structure as discussed in Jansen et al. (2004) so that $a_j$ and $w_j$ can be chosen level by level, using information within each level to make that choice. To complete the specification of the prior we assume that the $w_j$ are independent with $w_j \sim \mathrm{beta}(\alpha_j, \beta_j)$. This beta model enables us to use conjugacy in our Gibbs sampler algorithm. Section 2.3 addresses how to choose $\alpha_j$ and $\beta_j$, along with the scale parameter $a_j$, in a data-adaptive manner.

With this prior on the wavelet coefficients we allow access to a Gibbs sampler; see Geman and Geman (1984) or Gelfand and Smith (1990). Furthermore, only those values $z^M$ can affect the values at the missing sites $t^M$: for the wavelet transform this is guaranteed by its orthogonality and a proof for the lifting scheme is given in Appendix A. As such, we are only required to perform updates on the coefficients $z^M$ and $\theta^M$. Since both the wavelet and the lifting scheme have compact basis functions, the cardinality of these sets will be relatively small. Only in the initial selection of the scaling parameter $a_j$ for $\gamma_a(\cdot)$ and the hyperparameters $\alpha_j$ and $\beta_j$ for the prior of $w_j$ do we use the fixed empirical wavelet coefficients $z^F$. These hyperparameters are selected adaptively according to the values of the observed $z^F$ lying in the corresponding level and are not updated. The computational economy that this short cut provides is extremely large and gives our method its practical interest and attractiveness.

2.2.2. The algorithm

The set-up of our Gibbs sampler is illustrated by the flow chart in Fig. 1, with each complete Gibbs cycle consisting of two passes of the chart, the first from left to right with a second returning from right to left, both updating variables as we cross them.

Although five variables appear in our flow chart, several of the relationships are deterministic in nature. The only variables that we are required to update for our Gibbs sampler are $z^M$, $w$ and $\theta^M$. The two other variables, $x^M$ and $h^M$, are shown to clarify the update process and to make explicit the role of the observations in the original domain.

(a) Update $\theta^M$ given $w$ and $z^M$: without loss of generality, let $Z \sim N(\theta, 1)$. Given our prior, simple Bayes theory shows that the posterior of $\theta_{jk}$ given $z_{jk}$ and $w_j$ satisfies

\[ f_{\mathrm{post}}(\theta_{jk} \mid Z_{jk} = z_{jk}) = \{1 - w_{\mathrm{post}}(z_{jk})\}\,\delta_0 + w_{\mathrm{post}}(z_{jk})\, f_1(\theta_{jk} \mid z_{jk}), \]

where $w_{\mathrm{post}}(z)$ is the posterior probability that $\theta \neq 0$ given $Z = z$ and $f_1(\theta \mid z)$ is the posterior density of $\theta$ given $Z = z$ and $\theta \neq 0$. We draw a sample from $f_{\mathrm{post}}(\theta_{jk} \mid Z_{jk} = z_{jk})$ in accordance with standard Gibbs sampling. For further details refer to Section 3.

(b) Update $w$ given $\theta^M$ and $z^M$: first introduce the sufficient variable $\delta = (\delta_j)_{j \in J}$ where

\[ \delta_j = \sum_{\theta^M \text{ in level } j} I(\theta^M \neq 0), \]

the number of $\theta^M$ in level $j$ which are non-zero. For any $w_j$, we have $\delta_j \mid w_j \sim \mathrm{binomial}(n_j, w_j)$ where $n_j$ is the total number of $\theta^M$ which lie in level $j$. Standard Bayesian theory gives the posterior $w_j \mid \delta_j \sim \mathrm{beta}(\alpha_j + \delta_j, \beta_j + n_j - \delta_j)$. For each level $j$ we draw from this distribution $w_j \mid \delta_j$ to update our $w$.

(c) Update $z^M$ given $w$ and $\theta^M$: we first apply the abridged inverse wavelet or lifting transform to $\theta^M$ to obtain the imputed values $h^M$ which will create our estimate. As this is a linear operation the full inverse transform is unnecessary. For each missing site, we require just one vector multiplication, which greatly saves computation. See Appendix A for more information. We then create a new realization of our noisy $x^M \sim N\{h^M, (\sigma^M)^2 I\}$ before performing an abridged forward wavelet or lifting transform to generate new $z^M$. Again linearity means that the full forward transform is not required and we merely need to update those wavelet coefficients $z^M$ which are affected by the new $x^M$. As above, this can be done by using one vector operation for each missing site.

Fig. 1. Gibbs sampler set-up
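Steps (b) and (c) can be sketched for a single missing site as follows. The sketch assumes an orthonormal transform, so that the same weights (here `column_M`, the entries of the transform matrix linking each coefficient in $z^M$ to the missing site) serve for both the abridged inverse and the abridged forward transform; for the lifting scheme the two sets of weights would differ. All function and argument names are hypothetical.

```python
import random


def update_w(theta_level, alpha_j, beta_j, rng):
    """Step (b): draw w_j from beta(alpha_j + delta_j, beta_j + n_j - delta_j),
    where delta_j counts the non-zero theta in this level."""
    n_j = len(theta_level)
    delta_j = sum(1 for t in theta_level if t != 0.0)
    return rng.betavariate(alpha_j + delta_j, beta_j + n_j - delta_j)


def update_z(theta_M, column_M, z_M, x_M_old, sigma_M, rng):
    """Step (c) for one missing site under an orthonormal transform."""
    # Abridged inverse transform: a single dot product gives h at the site.
    h_M = sum(c * t for c, t in zip(column_M, theta_M))
    # New noisy realization of the missing observation.
    x_M_new = rng.gauss(h_M, sigma_M)
    # Abridged forward transform: by linearity each affected coefficient
    # moves by (its weight on the site) times the change in x_M.
    z_M_new = [z + c * (x_M_new - x_M_old) for z, c in zip(z_M, column_M)]
    return h_M, x_M_new, z_M_new


rng = random.Random(1)
w_j = update_w([0.0, 1.3, 0.0, -0.7], alpha_j=1.0, beta_j=1.0, rng=rng)
h_M, x_M, z_M = update_z([2.0, 0.0], [0.5, -0.5], [1.0, 1.0], 0.0, 1.0, rng)
```

Step (a), the draw of each $\theta$ from its spike-and-slab posterior, depends on the choice of $\gamma_a$ and is treated in Section 3.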

2.3. Using the fixed wavelet coefficients $z^F$

Although the beauty of our approach lies in the fact that we are never required to update the fixed wavelet coefficients $z^F$, we can still use them to give information on the type of response function $h$ that is expected. If the function is extremely inhomogeneous at those sites that we can observe (as would be manifested by a high density of large non-zero wavelet coefficients) then we would presumably wish to incorporate this knowledge in our estimator. Similarly, if the function appears relatively smooth at our observed sites (as would be suggested by a high degree of sparsity in our wavelet coefficients) then it would seem reasonable to integrate this information as well. In both cases this can be achieved by an adaptive choice of prior hyperparameters for $w$ and scaling parameter $a$, the probability and size respectively of a non-zero coefficient in our model.

Following the EbayesThresh procedure of Johnstone and Silverman (2004) we can maximize

\[ l(a_j^F, w_j^F) = \sum_{z \in z_j^F} \log\{(1 - w_j^F)\,\phi(z) + w_j^F\, g_a(z)\} \]

to find a partial marginal maximum likelihood estimate of $(a_j, w_j)$. Here $z_j^F$ are those empirical wavelet coefficients within level $j$ that also lie in $z^F$, $\phi(\cdot)$ is the standard normal density and $g_a(\cdot)$ is the convolution of our non-zero prior $\gamma_a$ and $\phi$. Our hope is to use this partial marginal maximum likelihood estimate $(a_j^F, w_j^F)$ to select an appropriate scaling parameter $a_j$ and prior for the mixing weight $w_j$ over the whole level, including the relevant $z^M$, which we can then use repeatedly within our Gibbs sampler.
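As a rough illustration of this maximization, the sketch below evaluates the log-likelihood for one level on a coarse grid. It assumes the normal slab of Section 3, for which $g(z)$ is the $N(0, 1 + \tau^2)$ density, and replaces the numerical optimizer of EbayesThresh by an exhaustive grid search; the data values, grids and function names are made up for the example.

```python
import math


def normal_pdf(z, var=1.0):
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def log_marginal(z_F, w, tau2):
    """Log-likelihood for one level with a normal slab, g(z) = N(z; 0, 1 + tau2)."""
    return sum(
        math.log((1.0 - w) * normal_pdf(z) + w * normal_pdf(z, 1.0 + tau2))
        for z in z_F
    )


def grid_mml(z_F, w_grid, tau2_grid):
    """Return the (w, tau2) pair on the grid with the largest log-likelihood."""
    candidates = [(w, tau2) for w in w_grid for tau2 in tau2_grid]
    return max(candidates, key=lambda pair: log_marginal(z_F, pair[0], pair[1]))


# Hypothetical fixed coefficients for one level: mostly small, two large.
z_F = [0.1, -0.3, 0.2, 0.05, -0.1, 6.0, -5.5, 0.4, -0.2, 0.0]
w_grid = [i / 20 for i in range(1, 20)]
tau2_grid = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
w_F, tau2_F = grid_mml(z_F, w_grid, tau2_grid)
print(w_F, tau2_F)
```

A sparse level such as this one yields a small estimated mixing weight, since only the two large coefficients need the slab.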

To select the scaling parameter in our Gibbs sampler we set for each level $a = a_j^F$ whereas the hyperparameters $\alpha_j$ and $\beta_j$ are chosen as follows. For each level, suppose that $n_j^F$ is the number of fixed wavelet coefficients $z_j^F$. We select our prior for $w_j$ adaptively according to $w_j \sim \mathrm{beta}(\alpha_j, \beta_j)$ where $\alpha_j = 1 + n_j^F w_j^F$ and $\beta_j = 1 + n_j^F - n_j^F w_j^F$. This corresponds to a posterior distribution having initially assumed a prior $w_j \sim U[0, 1]$ before observing $n_j^F$ Bernoulli random variables of which a proportion $w_j^F$ are non-zero.
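A short worked example of this choice, with hypothetical values $n_j^F = 64$ and $w_j^F = 0.125$:

```python
def beta_hyperparameters(n_F, w_F):
    """alpha_j = 1 + n_F * w_F and beta_j = 1 + n_F - n_F * w_F."""
    alpha = 1.0 + n_F * w_F
    beta = 1.0 + n_F - n_F * w_F
    return alpha, beta


alpha_j, beta_j = beta_hyperparameters(n_F=64, w_F=0.125)
prior_mean = alpha_j / (alpha_j + beta_j)
print(alpha_j, beta_j, round(prior_mean, 4))
```

The resulting beta(9, 57) prior has mean 9/66, close to the estimated proportion 0.125, and becomes more concentrated as the number of fixed coefficients in the level grows.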

3. Possible priors and computational details

As explained in Section 2.2.2, each iteration of our Gibbs sampler requires us to sample from the posterior of $\theta_{jk}$ given our empirical $z_{jk}$ and $w_j$,

\[ f_{\mathrm{post}}(\theta_{jk} \mid Z_{jk} = z_{jk}) = \{1 - w_{\mathrm{post}}(z_{jk})\}\,\delta_0 + w_{\mathrm{post}}(z_{jk})\, f_1(\theta_{jk} \mid z_{jk}), \]

where $f_1(\theta \mid z)$ is the posterior density of $\theta$ given $Z = z$ and $\theta \neq 0$ and

\[ \frac{w_{\mathrm{post}}(z_{jk})}{1 - w_{\mathrm{post}}(z_{jk})} = \frac{g_a(z_{jk})}{\phi(z_{jk})}\,\frac{w_j}{1 - w_j}, \]

with notation $g_a = \gamma_a \star \phi$, the convolution of our non-zero prior and the standard normal distribution. We have considered three possible $\gamma(\cdot)$s, all previously used in the wavelet literature (see for instance Abramovich et al. (1998) or Johnstone and Silverman (2005a)).

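(a) Normal prior: mainly because of its simplicity, the normal prior has been the prior of choice in wavelet applications for many years, i.e. $\gamma_{\tau^2}(\theta) = \phi(\theta; 0, \tau^2)$ where $\phi(z; \mu, \sigma^2)$ is the density of an $N(\mu, \sigma^2)$ random variable. For this prior we find both

\[ g_{\tau^2}(z) = \phi(z; 0, 1 + \tau^2), \]

\[ f_1(\theta \mid z) = \phi\left(\theta;\ \frac{\tau^2 z}{1 + \tau^2},\ \frac{\tau^2}{1 + \tau^2}\right). \]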

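(b) Laplace prior: using the prior $\gamma_a(\theta) = \tfrac{1}{2} a \exp(-a|\theta|)$ we obtain

\[ g_a(z) = \tfrac{1}{2} a \exp(a^2/2)\,\{\exp(-az)\,\Phi(z - a) + \exp(az)\,\Phi(-z - a)\}, \]

\[ f_1(\theta \mid z) = \begin{cases} \dfrac{\exp(az)\,\phi(\theta - z - a)}{\exp(-az)\,\Phi(z - a) + \exp(az)\,\Phi(-z - a)} & \text{for } \theta \le 0, \\[2ex] \dfrac{\exp(-az)\,\phi(\theta - z + a)}{\exp(-az)\,\Phi(z - a) + \exp(az)\,\Phi(-z - a)} & \text{for } \theta > 0. \end{cases} \]

For details refer to Johnstone and Silverman (2005b). The posterior $f_1(\theta \mid z)$ is a weighted sum of two truncated normal distributions and is sampled by using rejection techniques.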

(c) Quasi-Cauchy prior: for this prior a scaling parameter is not tractable but

\[ \theta \mid \beta \sim N(0, \beta^{-1} - 1) \quad \text{with} \quad \beta \sim \mathrm{beta}(\tfrac{1}{2}, 1). \]

Referring again to Johnstone and Silverman (2005b) for the details, we find

\[ g(z) = \frac{1}{\sqrt{2\pi}}\, z^{-2}\,\{1 - \exp(-\tfrac{1}{2} z^2)\}. \]

To sample from $f_1(\theta \mid z)$, we initially form a realization from $\beta$ given $z$ via

\[ f(\beta \mid z) = \frac{\tfrac{1}{2} z^2 \exp(-\tfrac{1}{2} \beta z^2)}{1 - \exp(-\tfrac{1}{2} z^2)}, \qquad 0 \le \beta \le 1, \]

which is a truncated exponential distribution. We then sample from the posterior of $\theta$ given $z$ with this value of $\beta$ by using the result of the normal prior above.
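The formulae for the three priors can be collected in a short sketch. It assumes unit noise variance, as in the text; `posterior_weight` is the posterior odds relation solved for $w_{\mathrm{post}}$; the quasi-Cauchy draw inverts the distribution function of the truncated exponential and then uses the normal-prior result with $\tau^2 = \beta^{-1} - 1$. The rejection sampler for the Laplace $f_1$ is omitted, so only its marginal density is shown. All names are hypothetical.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def normal_pdf(x, mean=0.0, var=1.0):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_weight(w, g_z, z):
    """w_post(z): the posterior odds formula solved for the probability."""
    return w * g_z / (w * g_z + (1.0 - w) * normal_pdf(z))


# (a) Normal prior with variance tau2.
def g_normal(z, tau2):
    return normal_pdf(z, 0.0, 1.0 + tau2)


def draw_f1_normal(z, tau2, rng):
    shrink = tau2 / (1.0 + tau2)
    return rng.gauss(shrink * z, math.sqrt(shrink))


# (b) Laplace prior with rate a (marginal density only).
def g_laplace(z, a):
    tails = math.exp(-a * z) * normal_cdf(z - a) + math.exp(a * z) * normal_cdf(-z - a)
    return 0.5 * a * math.exp(0.5 * a * a) * tails


# (c) Quasi-Cauchy prior.
def g_quasi_cauchy(z):
    if abs(z) < 1e-6:
        return 0.5 / SQRT_2PI  # limit of the formula as z -> 0
    return -math.expm1(-0.5 * z * z) / (SQRT_2PI * z * z)


def draw_f1_quasi_cauchy(z, rng):
    rate = 0.5 * z * z
    u = rng.random()
    # Inverse-CDF draw from the exponential(rate) law truncated to [0, 1].
    beta = u if rate < 1e-12 else min(1.0, -math.log1p(u * math.expm1(-rate)) / rate)
    # Normal-prior result with tau2 = 1/beta - 1, so tau2 / (1 + tau2) = 1 - beta.
    return rng.gauss((1.0 - beta) * z, math.sqrt(1.0 - beta))


def draw_theta(z, w, g_z, draw_f1, rng):
    """One draw from f_post: zero with probability 1 - w_post, else from f1."""
    if rng.random() >= posterior_weight(w, g_z, z):
        return 0.0
    return draw_f1()


rng = random.Random(0)
z, w = 3.0, 0.2
theta_normal = draw_theta(z, w, g_normal(z, 4.0), lambda: draw_f1_normal(z, 4.0, rng), rng)
theta_qc = draw_theta(z, w, g_quasi_cauchy(z), lambda: draw_f1_quasi_cauchy(z, rng), rng)
```

Writing $w_{\mathrm{post}}$ as $w g / \{w g + (1 - w)\phi\}$ rather than through the odds avoids a division by $\phi(z)$, which underflows for large $|z|$.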

4. Results

4.1. One and two dimensions

Our results are split into two distinct sections. Section 4.2 is concerned with one-dimensional imputation using standard wavelets. Here we are required to incorporate the standard wavelet assumption that our sites $t$ are equidistant. Although it is unlikely that such one-dimensional imputation is of great practical use, this section aims to make clear the mechanism of our approach by using a more familiar tool than the lifting scheme. Our understanding is further helped by knowledge of the wavelet basis functions. This section demonstrates our method’s performance on two artificial data sets and compares the effectiveness of the various $\gamma$-priors.

In Section 4.3 we present our method’s natural extension to two-dimensional imputation using a Voronoi-based lifting scheme. The lifting scheme does not require that we have $2^N$ sites, or that they be equidistant. Instead our method can be employed given any number of sites $t$ lying arbitrarily in our space. This versatility makes it applicable in many practical situations. Our first illustration uses the simulated function maartenfunc of Jansen et al. (2004) to examine the effectiveness of our method at a range of sites. We then consider a real life imputation example using rainfall data obtained from the US Geological Survey.

4.2. Wavelets

4.2.1. Test data sets

We demonstrate one-dimensional performance by using the well-known wavelet data sets Bumps and Blocks (Donoho and Johnstone, 1994). These functions are shown in Fig. 2 along with realizations of noisy signal where independent white noise (RSNR = 7) has been added. In this section, we consider only the situation of a single missing site $t^M$ at which we wish to predict the value of the function $h$. At all the other sites we can observe the data subject to noise. Although this may seem a large restriction, owing to the non-overlapping nature of the wavelet basis this is not so. As long as the missing sites are not clustered together, our method is capable of producing reliable estimates for each point. In Section 4.3 we investigate the performance of our sampler with multiple imputation sites in two dimensions.

Fig. 2. Blocks and Bumps test functions of Donoho and Johnstone (1994) both with and without noise (RSNR = 7). Panels: Blocks, Bumps, Blocks with Noise and Bumps with Noise, each plotted against Site (0 to 1000)


Fig. 3. Summary of results for missing site 810 with a Laplace prior and Daubechies (1988) least asymmetric wavelets, filter 10: (a) Blocks function with noise (RSNR = 7, with the missing site marked), Observed Value against Site; (b) estimate of missing site 810 by using a Laplace prior, Estimate against Iteration; (c) histogram of estimate for site 810 (removing the burn-in of the first 500 iterations)

4.2.2. An initial Blocks example

We first consider the circumstance illustrated in Fig. 3. Here we observe the Blocks function, subject to noise with RSNR = 7, at all sites except site 810 where we use our method to predict the underlying function on the basis of the observed sites. Our sampler converges extremely quickly as assessed by the R-statistic of Gelman and Rubin (1992). It also provides a plausible value that is centred close to the noiseless Blocks function. In this instance, the underlying Blocks function has a value of 4.20 at site 810 whereas our estimator has a mean of 4.35 and a variance of 0.07.
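The text does not state how many chains were run or which variant of the R-statistic was used. As a minimal sketch under those caveats, the basic form of the Gelman and Rubin potential scale reduction factor for several equal-length chains can be computed as follows; the function name and the toy chains are illustrative only.

```python
import math


def gelman_rubin(chains):
    """Basic potential scale reduction factor for m chains of equal length n."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    W = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_plus = (n - 1) / n * W + B / n
    return math.sqrt(var_plus / W)


r_mixed = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
r_apart = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values close to 1 suggest that the chains have reached a common distribution, whereas chains stuck in different regions give values well above 1.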

4.2.3. Simulation study

In general, it would be improper to assume that there is a definitive answer about the value that the estimator should take. There are many equally valid functions which could be fitted to the data points, all of which would give differing values. This makes any objective measure of our method’s performance somewhat difficult. However, we might on average expect the estimates that are obtained at the various missing sites to follow the underlying test function from which we have created the data.

With this in mind, we devise a ‘leave-one-out’ test. For both the Blocks and the Bumps function, let us define a test range that contains several of the features which characterize the respective functions. For each site $t_i$ in this range, we create five independent realizations of noisy signal (RSNR = 7) and apply our method on each, supposing that we can observe all sites other than $t_i$ as in the example in Section 4.2.2. For each site and realization we run the sampler for 5000 iterations including a burn-in of 1000 which is discarded. The Bumps test range will be between sites 390 and 465, which is an area containing a broad peak accompanied by a sharper narrower peak; the Blocks test range will be between 760 and 845 and contain several of the discontinuities which identify it.

Figs 4 and 5 summarize the results for each of the three priors discussed. Figs 4(a), 4(c) and 4(e), and 5(a), 5(c) and 5(e) display as dots the mean of the estimator averaged over the independent realizations compared with the full line of the underlying function. Meanwhile in Figs 4(b), 4(d) and 4(f), and 5(b), 5(d) and 5(f) we show the variance of the estimator, again averaged over the independent realizations. It is important to appreciate when drawing any conclusions from these figures that by using such a condensed measure of the estimator we lose a tremendous quantity of information about the nature of the individual runs.

Our estimator performs well as an imputation method. At most of the sites the mean recreates the underlying function reliably. Some outliers exist at those sites that are near the very apices of the Bumps function and the breaks in the Blocks function. On investigation, this is a consequence of bimodality. At these sites the observed values can legitimately be modelled by using one of two wavelets and our sampler alternates between these two states. Although one mode is generally a good fit to the underlying function, the other highlights some other feature of the data in the region of interest. We also tested the method on an additional set of narrower peaks from the Bumps function with considerably more noise (RSNR = 3) and similar performance was seen, with reliable tracking of the underlying function at the sides of the peaks but breaking down into bimodality in the centres. We discuss this further in Section 4.2.4 by using the example of site 410 in the Bumps function with the Laplace prior.

The variance of the estimate reflects the nature of the missing site’s neighbourhood. In inhomogeneous regions such as the peaks of the Bumps and the jumps in the Blocks functions, the estimate has a correspondingly high variance. This is again a result of the estimate’s bimodality where our sampler can pick out and present differing possibilities for the underlying function.

Little difference is to be seen between the priors that were considered, with the same general features and values occurring for all three. Finally of note is the introduction of some patternation in the constant sections of the function. We believe that this is due to the rippled tails of those wavelets which must be used to fit the large discontinuities.

4.2.4. Bimodality: Bumps site 410 with Laplace prior

We have already indicated the tendency of our method to settle into a bimodal equilibrium if differing wavelets can be used to represent the possible underlying function. A case in point is site 410 of the Bumps function with the Laplace prior. In Fig. 6(a) we present a particular run of our sampler illustrating this bimodality with one mode occurring around 4.00 and the other around 3.50. The underlying value of Bumps at this site is 3.99.

Table 1 shows individual posterior wavelet coefficients when the sampler lies in each mode together with the coefficients of the noiseless Bumps function and the relevant row of the wavelet decomposition matrix determining the influence of each wavelet on the final estimate. Unsurprisingly the largest discrepancies are seen at wavelets of high influence. Different values of $x^M$ will greatly affect these wavelets and vice versa. They are also at finer scales since a single missing data point will have less relative influence on the presence of coarser-scaled wavelets. Also at coarser scales all coefficients tend to be large and so the posterior $\theta^M$ are rarely sampled as 0. The mode into which our estimator falls seems to depend on which of the two wavelets $j = 8$, $k = 105$, and $j = 9$, $k = 210$, are present in the posterior. In the higher estimate only the $j = 8$, $k = 105$, wavelet is present whereas in the lower estimate only the $j = 9$, $k = 210$, is non-zero. These two wavelets are also shown in Fig. 6 along with the relevant Bumps section.

Fig. 4. Leave-one-out estimator for the Bumps function, sites 390–465, with Daubechies’s least asymmetric wavelets, filter 10: (a), (b) normal prior; (c), (d) Laplace prior; (e), (f) quasi-Cauchy prior. Panels (a), (c) and (e) plot Mean of Estimate and panels (b), (d) and (f) plot Variance of Estimate, each against Missing Site

Fig. 5. Leave-one-out estimator for the Blocks function, sites 760–845, with Daubechies’s least asymmetric wavelets, filter 10: (a), (b) normal prior; (c), (d) Laplace prior; (e), (f) quasi-Cauchy prior. Panels (a), (c) and (e) plot Mean of Estimate and panels (b), (d) and (f) plot Variance of Estimate, each against Missing Site

Fig. 6. (a) Histogram of the estimate for missing site 410 of the Bumps function by using a Laplace prior and Daubechies’s least asymmetric (filter 10) wavelets and (b) the two wavelets determining into which mode the sampler falls (solid line, Bumps function; dashed line, wavelet $j = 9$, $k = 210$; dotted line, wavelet $j = 8$, $k = 105$), plotted against Site (390 to 430)

Table 1. Important posterior wavelet coefficients for the Laplace prior and missing site 410 of the Bumps test

Wavelet     Influence on   Wavelet transform of        Estimate
(j, k)      estimate       noiseless Bumps function    Lower, 3.50    Higher, 4.02

(9, 209)    −0.47          −0.11                       0.00           0.00
(9, 210)    −0.38          0.14                        0.35           0.00
(9, 211)    0.03           −0.04                       0.00           0.00
(8, 104)    0.00           −0.09                       0.00           0.00
(8, 105)    0.64           0.39                        0.00           0.43
(8, 106)    0.01           −0.30                       0.00           −0.25
(7, 51)     0.05           0.30                        0.47           0.56
(7, 52)     −0.20          −0.85                       −0.61          −0.87
(7, 53)     0.17           1.15                        1.32           1.27
(6, 24)     −0.01          −0.19                       −0.24          −0.19
(6, 25)     0.04           0.69                        0.38           0.66
(6, 26)     −0.20          −2.09                       −2.33          −2.36

For insight into the causes of bimodality, we need to recognize that intuitively our method attempts to produce parsimonious representations of the underlying function. It tends to keep large coefficients large and small coefficients small. The differences lie in those moderate-valued coefficients whose presence or absence is more difficult to ascertain. Our sampler is thus predisposed to produce estimates which enable the completed function to be expressed solely in terms of large and small wavelet coefficients. Bimodality can occur when different imputed missing values allow for different sparse representations. For site 410 these possibilities derive from the fact that, depending on the missing site’s value, the peak can be modelled by using the two different wavelets that were mentioned above.
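The role of the influence column in Table 1 can be checked with a little arithmetic. The sketch below assumes that the estimate at site 410 is the sum over wavelets of influence times posterior coefficient, which is how we read the description of the influence as the relevant row of the decomposition matrix. Since Table 1 lists only the important wavelets, the sums are partial and only the gap between the two modes is meaningful.

```python
# Rows of Table 1: (j, k), influence, coefficient in lower mode, coefficient in higher mode.
table1 = [
    ((9, 209), -0.47, 0.00, 0.00),
    ((9, 210), -0.38, 0.35, 0.00),
    ((9, 211), 0.03, 0.00, 0.00),
    ((8, 104), 0.00, 0.00, 0.00),
    ((8, 105), 0.64, 0.00, 0.43),
    ((8, 106), 0.01, 0.00, -0.25),
    ((7, 51), 0.05, 0.47, 0.56),
    ((7, 52), -0.20, -0.61, -0.87),
    ((7, 53), 0.17, 1.32, 1.27),
    ((6, 24), -0.01, -0.24, -0.19),
    ((6, 25), 0.04, 0.38, 0.66),
    ((6, 26), -0.20, -2.33, -2.36),
]

# Partial contributions of the tabulated wavelets to each mode.
lower = sum(influence * lo for _, influence, lo, _ in table1)
higher = sum(influence * hi for _, influence, _, hi in table1)
gap = higher - lower
print(round(lower, 4), round(higher, 4), round(gap, 4))
```

The tabulated wavelets account for a gap of about 0.47 between the modes, close to the difference of 0.52 between the reported estimates 4.02 and 3.50; most of it comes from swapping the negative contribution of wavelet (9, 210) for the positive contribution of wavelet (8, 105).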

4.2.5. Option 1: illustrated by finest scale wavelet $j = 9$, $k = 210$

As shown in Fig. 6, this wavelet has a very narrow central peak centred on site 411 surrounded by small troughs at sites 410 and 412. If the missing site 410 were to take a value near the lower mode then the neighbourhood of the completed function could be accurately represented by this single wavelet: the function value would gently increase up to site 409, dip slightly at site 410 before rising to a very sharp peak at 411. Observing the posterior wavelet coefficients in Table 1 we can see this effect. When the estimate takes the lower value, $j = 9$, $k = 210$, is the only non-zero wavelet in either of the two finest scales.

4.2.6. Option 2: illustrated by wavelet $j = 8$, $k = 105$

Alternatively, the wavelet $j = 8$, $k = 105$, can also allow a highly parsimonious representation of the imputed function if the missing site were to take a value towards the higher mode. As Fig. 6 demonstrates, with such a value the completed peak would have a width that is similar to that of the $j = 8$, $k = 105$, wavelet. Again this would allow us to express the whole neighbourhood without the need for other wavelets. Indeed it is this wavelet which is most significant in building up the peak of the noiseless Bumps function itself.

This can be seen in the posterior coefficients of Table 1. In the higher mode the finer scaled wavelet $j = 9$, $k = 210$, is not present and the coarser $j = 8$, $k = 105$, dominates. In this higher mode the posterior wavelet coefficients are very similar to the wavelet coefficients of the underlying noiseless function and so the resulting estimate has a very similar value.

4.3. The lifting scheme

We now move on to discuss and present the practical applications of our method on two-dimensional geostatistical problems using the lifting scheme (Jansen et al., 2004). Here we can relax the assumption on the spacing and number of data sites. The lifting scheme allows the application of our method for any number of sites no matter where they may lie.

4.3.1. Artificial data study

The initial investigation into our method’s two-dimensional performance considers the imputation of points lying on a surface within a unit square according to Fig. 7. In Fig. 7(a) we can see a plot of this maartenfunc surface,

\[ f(x, y) = (2x + y)\, I(3x - y < 1) + (5x - y)\, I(3x - y > 1). \]

This function, which was introduced in Jansen et al. (2004), consists of two planes with a jump discontinuity where they intersect.
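As a quick check of the surface, the sketch below evaluates maartenfunc at the four imputation sites considered in this section. It assumes that both indicator functions test $3x - y$ against 1, which is the form that reproduces the noiseless values reported in Table 2.

```python
def maartenfunc(x, y):
    """Two planes separated by the line 3x - y = 1 (assumed form of the indicators)."""
    return (2 * x + y) * (3 * x - y < 1) + (5 * x - y) * (3 * x - y > 1)


sites = [(0.2, 0.8), (0.36, 0.38), (0.54, 0.32), (0.8, 0.2)]
values = [round(maartenfunc(x, y), 2) for x, y in sites]
print(values)  # [1.2, 1.1, 2.38, 3.8]
```

The sites (0.36, 0.38) and (0.54, 0.32) lie on opposite sides of the line $3x - y = 1$, so the surface jumps from 1.10 to 2.38 between them.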

We consider the situation that is illustrated in Fig. 7(b); we can observe the value of thissurface, subject to noise, at 100 points that are distributed uniformly at random within thisarea. On the basis of these random observations we wish to predict the surface value at thefour predetermined sites that are shown with a dot. These sites were chosen to demonstratehow our method operates in a range of situations. To illustrate the method’s performance nearmaartenfunc’s discontinuity we selected the two sites .0:36, 0:38/ and .0:54, 0:32/ whereas foran example in more stable regions we chose the two sites .0:2, 0:8/ and .0:8, 0:2/. In Table 2we present the respective means and confidence intervals along with, for comparison, the cor-responding values by using the now standard methods of kriging and thin plate splines. Both


Fig. 7. Contour plots of maartenfunc on the unit square (axes x and y): (a) noiseless function; (b) simulated noisy realization (RSNR = 5) (having been able to observe the noisy realization at 100 uniformly distributed sites we then wish to estimate the value of the underlying function at the four sites (•))

Table 2. Mean and confidence intervals of our estimate at each site by using our method (Laplace, quasi-Cauchy prior and normal prior all with 15 scaling coefficients) along with kriging and thin plate splines in our simulated maartenfunc imputation example

| Method | Site 1 (0.2, 0.8): mean | Site 1: 95% credible interval | Site 2 (0.36, 0.38): mean | Site 2: 95% credible interval | Site 3 (0.54, 0.32): mean | Site 3: 95% credible interval | Site 4 (0.8, 0.2): mean | Site 4: 95% credible interval |
|---|---|---|---|---|---|---|---|---|
| maartenfunc | 1.20 | - | 1.10 | - | 2.38 | - | 3.80 | - |
| Laplace | 1.09 | (−0.02, 2.39) | 1.26 | (0.08, 2.12) | 2.52 | (1.70, 3.53) | 4.09 | (3.13, 5.10) |
| Quasi-Cauchy | 1.14 | (−0.02, 2.53) | 1.25 | (−0.18, 2.01) | 2.48 | (1.85, 3.40) | 4.08 | (3.15, 5.11) |
| Normal | 1.17 | (−0.17, 2.70) | 1.16 | (−0.09, 2.19) | 2.54 | (1.51, 3.68) | 4.06 | (3.13, 5.01) |
| Kriging | 1.13 | (0.77, 1.49) | 1.38 | (0.99, 1.78) | 2.50 | (2.13, 2.88) | 3.99 | (3.63, 4.36) |
| Thin plate splines | 1.12 | (0.84, 1.39) | 1.37 | (1.07, 1.67) | 2.48 | (2.21, 2.74) | 4.00 | (3.73, 4.27) |

these alternative methods were implemented in R by using the fields package (Nychka, 2005) with the default generalized cross-validation settings.
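The simulated data set described above (100 sites placed uniformly at random in the unit square, observed with noise at RSNR = 5) could be generated along the following lines. This is only a sketch under stated assumptions: RSNR is taken to be the ratio of the standard deviation of the noiseless values to the standard deviation of the noise, the noise is taken to be Gaussian, the second indicator of maartenfunc is read as I(3x − y > 1), and the seed and all names are hypothetical.

```python
import random
import statistics


def maartenfunc(x, y):
    # Assumption: both indicators switch on the line 3x - y = 1.
    lower_plane = (2 * x + y) if 3 * x - y < 1 else 0.0
    upper_plane = (5 * x - y) if 3 * x - y > 1 else 0.0
    return lower_plane + upper_plane


def simulate_observations(n_sites=100, rsnr=5.0, seed=1):
    """Draw uniform sites and add Gaussian noise scaled to the requested RSNR."""
    rng = random.Random(seed)
    sites = [(rng.random(), rng.random()) for _ in range(n_sites)]
    signal = [maartenfunc(x, y) for x, y in sites]
    # Assumed definition: RSNR = sd(signal) / sd(noise).
    noise_sd = statistics.pstdev(signal) / rsnr
    noisy = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return sites, signal, noisy, noise_sd


sites, signal, noisy, noise_sd = simulate_observations()
print(len(sites), round(noise_sd, 3))
```

The returned `noisy` values play the role of the observations from which the four held-out sites are predicted; any of the imputation methods compared in Table 2 could then be applied to them.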

All the methods produce feasible estimates in terms of generating means that are close to the value of the underlying function at all our chosen sites and, for our lifting algorithm, there does not seem to be a great difference between the estimates that are obtained with the Laplace, quasi-Cauchy and normal prior. The lifting scheme performs most favourably at site (0.36, 0.38) neighbouring the jump discontinuity where it can generate an estimate that lies closer to the true value than the other techniques. Preserving such inhomogeneities is much more difficult for kriging and thin plate splines since they are based on finding a smooth representation of the underlying surface. As a consequence, both kriging and thin plate splines overestimate the function here. At sites (0.2, 0.8) and (0.54, 0.32) our method allows equally accurate prediction to either kriging or thin plate splines. At site (0.8, 0.2) our lifting method seems to be slightly less accurate than kriging or splines. We expect this to be a consequence of site (0.8, 0.2) lying towards the boundaries of the region surveyed and so having fewer neighbours to draw information from.
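The comparison in the preceding paragraph can be made explicit by tabulating the absolute error of each posterior or predicted mean against the noiseless value. The numbers below are copied from Table 2; the variable names are illustrative.

```python
# Noiseless maartenfunc values at sites 1-4 (first row of Table 2).
truth = [1.20, 1.10, 2.38, 3.80]

# Estimated means at sites 1-4 for each method (Table 2).
means = {
    "Laplace": [1.09, 1.26, 2.52, 4.09],
    "Quasi-Cauchy": [1.14, 1.25, 2.48, 4.08],
    "Normal": [1.17, 1.16, 2.54, 4.06],
    "Kriging": [1.13, 1.38, 2.50, 3.99],
    "Thin plate splines": [1.12, 1.37, 2.48, 4.00],
}

abs_err = {
    method: [abs(est - true) for est, true in zip(values, truth)]
    for method, values in means.items()
}

for method, errors in abs_err.items():
    print(f"{method:18s}", "  ".join(f"{e:.2f}" for e in errors))
```

At site 2, next to the discontinuity, every lifting variant has a smaller error than kriging or thin plate splines, whereas at site 4 the ordering is reversed.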

The mean surface prediction confidence intervals are narrowest with thin plate splines and kriging. The lifting estimates typically have heavier tails and hence give wider intervals. Also, in some alternative examples we discovered that, as in Section 4.2.4, our lifting method can generate estimates exhibiting bimodality.

It should be noted that the lifting estimates that were obtained above are relatively stable with respect to the number of scaling coefficients kept (these coefficients are not assumed to be sparse); we considered values ranging from 5 to 20. Our method is also affected by the method of selection of the lifting scheme’s filter coefficients, which must be chosen so as to generate wavelet coefficients with realistic variances; see Jansen et al. (2004) for further details. We used least squares. Both of these considerations, along with consideration of alternative strategies for the artificial partitioning into levels of the wavelet coefficients, would be interesting topics for further study.

4.3.2. Rainfall data across America

Our final example concerns the prediction of rainfall across the USA. The National Atmospheric Deposition Program (NADP) is a co-operative network of precipitation monitoring stations that spans the USA. On the basis of a network of over 250 sites nationwide, its aim is to collect information on the quantity and chemistry of precipitation to assess possible geographical and temporal trends. The NADP make their collected data publicly available at http://nadp.sws.uiuc.edu/sites/ntnmap.asp?.

An important and common problem for the NADP is predicting values at sites which, for one reason or another, may be unavailable for measurement. We consider the specific case of forecasting the yearly quantity of rainfall in 2004 as illustrated in Fig. 8. To allow numerical comparison of our imputation, we select three sites at which to estimate the rainfall. At these

Fig. 8. Map of the USA showing the quantity of rainfall at government measuring stations: the circles show the locations of the stations, with the shading representing the amount of rain (the darker the shading, the more rain); we remove the three labelled sites (site 1, site 2 and site 3) from the data set and attempt to predict them on the basis of those which are left


sites, we use the NADP data at all the sites bar these three to create a prediction of the amount of precipitation. The predictions are then compared with the known observed quantity of rainfall. The imputation here was done on a log-scale with the error in observed values assumed to have a variance of 5%.

Table 3 displays the mean and 95% credible interval of our estimator by using all three suggested priors with the corresponding results by using kriging and thin plate splines shown for comparison. Additionally, Fig. 9 presents histograms of the estimates that were obtained by using our

Table 3. Mean and 95% credible interval of the rainfall estimate at the selected missing sites by using various methods

| Method | Site 1: mean (cm) | Site 1: 95% credible interval (cm) | Site 2: mean (cm) | Site 2: 95% credible interval (cm) | Site 3: mean (cm) | Site 3: 95% credible interval (cm) |
|---|---|---|---|---|---|---|
| Observed | 15.7 | - | 97.0 | - | 146.1 | - |
| Laplace | 30.5 | (13.5, 43.6) | 74.6 | (37.1, 126.9) | 127.0 | (54.5, 164.9) |
| Quasi-Cauchy | 29.9 | (13.0, 42.0) | 72.9 | (35.3, 112.8) | 127.2 | (64.5, 167.5) |
| Normal | 31.8 | (16.7, 45.4) | 74.3 | (31.5, 135.7) | 127.2 | (53.2, 165.5) |
| Kriging | 41.9 | (18.4, 95.6) | 73.0 | (30.4, 175.6) | 122.4 | (57.1, 262.6) |
| Thin plate splines | 40.6 | (30.6, 53.9) | 74.7 | (54.1, 103.1) | 117.9 | (94.7, 146.9) |

Fig. 9. Histograms showing the amount of rain (in centimetres) predicted by using our method with the Laplace prior at each of the sites selected (horizontal axis: rainfall prediction; the real observed rainfall at each site is also given for comparison): (a) site 1 (observed value 16 cm); (b) site 2 (observed value 97 cm); (c) site 3 (observed value 146 cm)


method with the Laplace prior. The real rainfall values for each site are quoted in the caption for comparison.

Our lifting method gives the closest estimates for both site 1 and site 3 and equal performance at site 2. It also generates feasible credible intervals for the rainfall surface that in all cases, bar site 1 with the normal prior, contain the true value of observed rainfall. Thin plate splines generate very narrow 95% credible intervals that at site 1 are a long way from containing the observed value whereas at sites 2 and 3 the observed value lies on the very boundary. Kriging also fails to include the observed value at site 1 and at the other sites gives very broad credible intervals that are not particularly informative since they cover almost the entire range of observed rainfall across the country. Note also that for all the sites our lifting estimates have considerably more flexibility in their distribution than the log-normals of kriging and thin plate splines.
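The remark about log-normal intervals can be checked from Table 3 alone. An interval that is symmetric on the log-scale has its back-transformed centre at the geometric mean of its end points, so for kriging and thin plate splines the reported point estimate should sit close to the square root of the product of the interval limits, whereas no such constraint applies to the lifting posterior. The sketch below uses only numbers from Table 3; the interpretation of the reported kriging and spline values as back-transformed log-scale centres is an assumption that the arithmetic appears to support.

```python
import math

# (mean, lower limit, upper limit) in cm at sites 1-3, copied from Table 3.
reported = {
    "Kriging": [(41.9, 18.4, 95.6), (73.0, 30.4, 175.6), (122.4, 57.1, 262.6)],
    "Thin plate splines": [(40.6, 30.6, 53.9), (74.7, 54.1, 103.1), (117.9, 94.7, 146.9)],
    "Laplace": [(30.5, 13.5, 43.6), (74.6, 37.1, 126.9), (127.0, 54.5, 164.9)],
}


def midpoint_gap(mean, lower, upper):
    """Distance between the reported mean and the geometric midpoint of the interval."""
    return abs(mean - math.sqrt(lower * upper))


gaps = {
    method: [midpoint_gap(*row) for row in rows]
    for method, rows in reported.items()
}

for method, values in gaps.items():
    print(f"{method:18s}", "  ".join(f"{g:6.2f}" for g in values))
```

For kriging and thin plate splines the gaps are within rounding error of the tabulated values, while for the Laplace prior they are several centimetres, reflecting the more flexible shape of the lifting posterior.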

All methods overpredict the rainfall at site 1. This is perhaps to be expected since, as Fig. 8 shows, this site’s rainfall is lower than that of its surrounding neighbours (its nearest neighbour had 39 cm of rainfall). The nearest predictions are achieved by using lifting with estimate means of around 30 cm. Both kriging and thin plate splines overestimate the rainfall by a greater degree with values above 40 cm. The only methods that can include the observed value in the credible interval are our lifting method with the Laplace and quasi-Cauchy prior.

Similar predictions for site 2 are given whatever the method used with means around 73 cm, as compared with the observed value of 97 cm. All methods contain the observed value within the confidence interval although it is towards the interval boundary for the thin plate spline method.

Finally, at site 3 the lifting scheme again appears to be the best of the methods. In comparison with the observed rainfall of 146 cm, the lifting scheme generates estimates with means of 127 cm whereas kriging gives a prediction of 122 cm and thin plate splines 118 cm. All three techniques include the observed value in their 95% credible interval although it is again on the very boundary of the confidence interval for thin plate splines whereas the kriging interval suggests possible rainfall far beyond what we might reasonably expect to observe.
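The coverage statements made for the three sites can be verified mechanically from Table 3. The following sketch simply tests whether each reported 95% interval contains the observed rainfall; the data are copied from the table and the names are illustrative.

```python
# Observed rainfall (cm) at sites 1-3 (Table 3).
observed = [15.7, 97.0, 146.1]

# Reported 95% intervals (cm) at sites 1-3 for each method (Table 3).
intervals = {
    "Laplace": [(13.5, 43.6), (37.1, 126.9), (54.5, 164.9)],
    "Quasi-Cauchy": [(13.0, 42.0), (35.3, 112.8), (64.5, 167.5)],
    "Normal": [(16.7, 45.4), (31.5, 135.7), (53.2, 165.5)],
    "Kriging": [(18.4, 95.6), (30.4, 175.6), (57.1, 262.6)],
    "Thin plate splines": [(30.6, 53.9), (54.1, 103.1), (94.7, 146.9)],
}

covered = {
    method: [lower <= obs <= upper for (lower, upper), obs in zip(limits, observed)]
    for method, limits in intervals.items()
}

for method, flags in covered.items():
    print(f"{method:18s}", flags)
```

Only the Laplace and quasi-Cauchy intervals contain the observed value at all three sites; the normal prior, kriging and thin plate splines each miss it at site 1.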

5. Closing remarks

In this paper we have developed a novel method of imputation utilizing the expected sparse representation of a function or surface in a wavelet basis. Our method builds on the work of Abramovich et al. (1998) and Johnstone and Silverman (2005a), incorporating their Bayesian framework into a Gibbs sampler to generate a complete posterior distribution for the variable of interest at a missing site. We have developed the method by using three possible priors for the wavelet or lifting coefficients: normal, Laplace and quasi-Cauchy. The relative performance of each was also intended to offer some insight into the suitability of each prior for other wavelet modelling applications.

The performance of our approach was initially investigated in one dimension by using the discrete wavelet decomposition of Mallat (1989a). Although the practical applications of such one-dimensional implementation are limited, this analysis did permit a clearer understanding of our method’s operation. On the standard Blocks and Bumps functions (Donoho and Johnstone, 1994) our approach produced feasible estimates, often highlighting important features of the underlying function such as inhomogeneity through bimodality in the resulting estimate.

Our method was then extended to the two-dimensional problem by using the Voronoi lifting scheme of Jansen et al. (2004). An investigation was carried out on simulated and real data where its performance was contrasted with the well-recognized techniques of kriging and thin plate splines. So long as a suitable scaling parameter within the prior was selected, little difference was observed between the Laplace, quasi-Cauchy and normal prior methods. With any of these three, the lifting technique compared well with both kriging and thin plate splines. Its use matched their performance in homogeneous regions but offered increased accuracy in neighbourhoods containing discontinuities where the traditional methods would smooth.

In the case of the lifting scheme the success of our method does depend on the ability of the particular decomposition scheme to produce coefficients exhibiting sparsity while also having realistic variances. The lifting scheme is a relatively new technique and we believe that finding new decomposition schemes that achieve these aims would be a valuable area for further research.

Acknowledgements

The authors very much thank Maarten Jansen, Guy Nason and Matthew Nunes for permitting the use of their R implementation for the Voronoi lifting scheme, and Doug Nychka for his help in implementing the fields package. This research has also made use of rainfall data in the USA that were made available by the NADP. The authors are very grateful for this resource along with the accompanying personal correspondence from Christopher Lehmann. Finally we thank two reviewers and the Joint Editor for their suggestions on improvements to this paper.

Appendix A

A.1. Defining $z_M$

As discussed in Section 2, the practicality of our method lies largely in the economy of having to perform operations solely on the few wavelet coefficients $z_M$ which are affected by the missing values. Since the transformation is linear, we are not required to update or invert those other coefficients whose basis functions do not overlap the missing sites. Throughout this paper we have denoted these coefficients $z_F$. There is, however, some potential ambiguity in the definition of $z_M$ and $z_F$ as it is not necessarily only those wavelet coefficients which are affected by the missing data points in the forward transformation that can contribute to the value of the missing sites in the inverse step. If there were some such coefficients, our method would be slowed considerably since they would be difficult to identify.

We are required to show the equivalence of the set of coefficients affected or affecting values in the original domain in the forward–backward transformation. In this appendix we first demonstrate a proof for orthogonal and biorthogonal wavelets. We then continue to consider our Voronoi-based lifting scheme, describing the process by which the lifting coefficients evolve (for further details see Jansen et al. (2004)), before moving on to present a simple argument demonstrating that this equivalence still holds.

A.2. Wavelets and biorthogonal wavelets

Letting $\tilde{S}_{j,k}$ denote the support of the dual wavelet function $\tilde{\psi}_{j,k}(t)$, the wavelet coefficient

$$z_{j,k} = \int_{\tilde{S}_{j,k}} x(t)\, \tilde{\psi}_{j,k}(t)\, dt.$$

A coefficient $z_{j,k}$ is therefore affected by a missing value at location $t_i$ if and only if the support of the dual wavelet function $\tilde{\psi}_{j,k}(t)$ contains $t_i$, or, in our discrete case, if and only if the row corresponding to $z_{j,k}$ has a non-zero entry in the column corresponding to $t_i$.

Conversely, a coefficient $z_{j,k}$ affects the value at location $t_i$ in the reconstruction if and only if the primal wavelet function $\psi_{j,k}(t)$ has a support that contains $t_i$. This can be seen since for $S_{j,k}$ the support of the primal wavelet function $\psi_{j,k}(t)$ we have

$$x(t_i) = \sum_k x_{L,k}\, \phi_{L,k}(t_i) + \sum_{j=L}^{J-1} \sum_{k:\, i \in S_{j,k}} z_{j,k}\, \psi_{j,k}(t_i).$$


Here $J$ is the resolution level of the observations, $L$ is the lowest level of the decomposition, $x_{L,k}$ are the scaling coefficients at that level and $\phi_{j,k}$ are the primal scaling functions. In our discrete case we similarly have that coefficient $z_{j,k}$ affects the value at location $t_i$ in reconstruction if and only if the column corresponding to $z_{j,k}$ in the inverse transform has a non-zero value in the row corresponding to $t_i$.

In the case of orthogonal wavelets $\tilde{\psi}_{j,k}(t) = \psi_{j,k}(t)$, our inverse wavelet transform matrix is simply the transpose of the forward wavelet transform matrix and so our equivalence is trivial. In the case of biorthogonal wavelets, the support of the scaling function $\phi_{j,k}$ is determined by the number of non-zero values in the filter of the dilation equation. Specifically we have the following lemma.

Lemma 1. If $h_0, \ldots, h_N$ are non-zero in the dilation equation

$$\phi_{0,0}(t) = \sum_{k \in \mathbb{Z}} h_k\, \phi_{1,k}(t)$$

then $\mathrm{supp}(\phi_{0,0}) = [0, N]$.

Proof. For a proof see Strang and Nguyen (1996), page 195, or Daubechies (1992), page 176.
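As a concrete instance of lemma 1, consider two standard examples that are not taken from the paper itself. The normalization $\phi_{1,k}(t) = \sqrt{2}\, \phi_{0,0}(2t - k)$ is assumed here.

```latex
% Haar scaling function: two non-zero filter taps, so N = 1.
\phi_{0,0}(t) = \tfrac{1}{\sqrt{2}} \{ \phi_{1,0}(t) + \phi_{1,1}(t) \},
\qquad h_0 = h_1 = \tfrac{1}{\sqrt{2}},
\qquad \mathrm{supp}(\phi_{0,0}) = [0, 1].

% Daubechies filter with four non-zero taps h_0, ..., h_3, so N = 3.
\phi_{0,0}(t) = \sum_{k=0}^{3} h_k \, \phi_{1,k}(t),
\qquad \mathrm{supp}(\phi_{0,0}) = [0, 3].
```

In both cases the width of the support equals the number of non-zero filter taps minus 1, as the lemma states.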

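From the wavelet equation it immediately follows (using the support of the scaling functions) that if $g_0, \ldots, g_M$ are non-zero in the wavelet equation $\psi_{0,0}(t) = \sum_{k \in \mathbb{Z}} g_k\, \phi_{1,k}(t)$ then $\mathrm{supp}(\psi_{0,0}) = [0, \tfrac{1}{2}(N + M)]$. If we realize biorthogonality by using the alternating flip, i.e. $g_k = (-1)^k \tilde{h}_{\tilde{N}-k}$ and $\tilde{g}_k = (-1)^k h_{\tilde{M}-k}$, we find that $\tilde{N} = M$ and $\tilde{M} = N$. The equivalence is shown as

$$\mathrm{supp}(\tilde{\psi}_{j,k}) = [0, \tfrac{1}{2}(\tilde{N} + \tilde{M})] = [0, \tfrac{1}{2}(M + N)] = \mathrm{supp}(\psi_{j,k}).$$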
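For the orthogonal case the equivalence can also be checked numerically. The sketch below uses the orthonormal Haar transform purely as an illustration (the paper does not prescribe this wavelet, and the function names are hypothetical). It finds the coefficients affected by one site in the forward transform, by transforming a unit vector, and the coefficients that affect that site in the inverse transform, by inverting each unit coefficient vector in turn.

```python
import math


def haar_forward(x):
    """Orthonormal Haar DWT of a list whose length is a power of 2.

    Output layout: [scaling coefficient, coarsest detail, ..., finest details].
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        level = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        details = level + details  # coarser levels are stored in front
    return approx + details


def haar_inverse(z):
    """Inverse of haar_forward."""
    approx = [z[0]]
    pos = 1
    while pos < len(z):
        level = z[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for a, d in zip(approx, level):
            finer.extend([(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)])
        approx = finer
    return approx


def unit(n, i):
    return [1.0 if m == i else 0.0 for m in range(n)]


def affected_in_forward(n, site, tol=1e-12):
    """Coefficients with a non-zero entry in the column of the forward matrix for `site`."""
    return {m for m, v in enumerate(haar_forward(unit(n, site))) if abs(v) > tol}


def affecting_in_inverse(n, site, tol=1e-12):
    """Coefficients with a non-zero entry in the row of the inverse matrix for `site`."""
    return {m for m in range(n) if abs(haar_inverse(unit(n, m))[site]) > tol}


n, site = 8, 5
forward_set = affected_in_forward(n, site)
inverse_set = affecting_in_inverse(n, site)
print(sorted(forward_set), sorted(inverse_set))
```

With eight observations and one missing site only four of the eight coefficients are involved (one scaling coefficient and one wavelet per level), which illustrates the economy of working with $z_M$ alone.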

A.3. The Voronoi lifting scheme

A.3.1. Inverting the Voronoi lifting scheme for a missing site $i_M$

We now move on to consider the case of the Voronoi-based lifting scheme. Let us suppose that we have an order $i_n, \ldots, i_{l+1}$ in which the coefficients will be lifted. Further, let $T^p$ be the Delaunay triangulation at any stage $p$ of the ‘one coefficient at a time’ lifting scheme. Similarly, for any point $i \in T^p$, let $J^{p}_{i}$ denote the index set of its neighbours in that triangulation. Consider the inverse transformation, in particular the effect of a single wavelet coefficient $\psi_{i_j}$ with index $i_j$ that we removed at stage $j$. Let

$I_q$ = {sites $i$ that at stage $q$ (i.e. the Delaunay triangulation contains $q$ points) of the inverse transformation have a working value that depends on $\psi_{i_j}$}.

Note that $I_q = \emptyset$ for all $q < j$ and $I_j = \{i_j\} \cup J^{j}_{i_j}$, and also that $I_l \subseteq I_{l+1} \subseteq \ldots \subseteq I_n$.

Now consider one step of the inverse transformation from stage $q$ to stage $q+1$, i.e. the introduction of point $i_{q+1}$. Then we have two possibilities: either

(a) $\exists\, k \in J^{q+1}_{i_{q+1}}$ with $k \in I_q \Rightarrow I_{q+1} = I_q \cup J^{q+1}_{i_{q+1}} \cup \{i_{q+1}\}$ or
(b) $\nexists\, k \in J^{q+1}_{i_{q+1}}$ with $k \in I_q \Rightarrow I_{q+1} = I_q$.

The first instance is the case where the new point has a neighbour whose value currently depends on the wavelet. The latter is the case where no such link exists.

Let us suppose that the wavelet coefficient $\psi_{i_j}$ does affect the index $i_M$. Then, for some $q$, $i_M \in I_q$. Let $q^*$ be the smallest value for which this occurs, i.e. the first time that $i_M$ is affected.
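The evolution of the sets $I_q$ can be traced on a toy example. The sketch below is a one-dimensional analogue only: sites lie on a line and the nearest remaining site on each side stands in for the Delaunay neighbours, which is an assumption made to keep the example small. The lifting order and all function names are hypothetical. It applies rules (a) and (b) above literally.

```python
def chain_neighbours(present, site):
    """Nearest remaining site on each side: a 1-D stand-in for Delaunay neighbours."""
    left = [s for s in present if s < site]
    right = [s for s in present if s > site]
    return ({max(left)} if left else set()) | ({min(right)} if right else set())


def neighbours_at_removal(n_sites, removal_order):
    """Neighbour set of each lifted site in the configuration it was removed from."""
    present = set(range(n_sites))
    recorded = {}
    for site in removal_order:
        recorded[site] = chain_neighbours(present, site)
        present.remove(site)
    return recorded


def inverse_reach(wavelet, n_sites, removal_order):
    """Final set I_n for one wavelet: the sites whose value may depend on it."""
    nbrs = neighbours_at_removal(n_sites, removal_order)
    reach = set()  # I_q is empty before the wavelet's own stage
    for site in reversed(removal_order):  # the inverse reinserts sites in reverse order
        if site == wavelet:
            reach = {site} | nbrs[site]  # I_j: the site itself and its neighbours
        elif nbrs[site] & reach:  # rule (a): a neighbour already depends on the wavelet
            reach |= nbrs[site] | {site}
        # rule (b): no link, so the set is unchanged
    return reach


n_sites = 7
removal_order = [5, 1, 3]  # forward lifting order; sites 0, 2, 4, 6 remain as scaling sites
reach = {w: inverse_reach(w, n_sites, removal_order) for w in removal_order}
affecting_site_0 = {w for w, sites in reach.items() if 0 in sites}
print(reach, affecting_site_0)
```

In this example the wavelet at site 5 never reaches site 0, so if site 0 were the missing site $i_M$ that coefficient would not need to be handled.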

A.3.2. The forward lifting transformation

Now let us stop and run the process forwards. Analogously consider the effect in the forward transformation of a single scaling coefficient $\phi_{nk}$ for some $k$. Define by $M_q$ the set of indices whose working value is affected by the coefficient $\phi_{nk}$ at stage $q$,

$M_q$ = {sites $i$ that at stage $q$ (i.e. the Delaunay triangulation contains $q$ points) of the forward transformation have a working value that depends on the value at site $i_M$}.

Here $k = M_n \subseteq M_{n-1} \subseteq \ldots \subseteq M_l$ and $M_l$ will make explicit which of those entries in the corresponding dual basis function are non-zero.

Now let us consider a single lifting step from $q$ to $q-1$ and hence observe the development of $M_{q-1}$. We have three possibilities for the index $i_q$ that we remove:

(a) $i_q \in M_q$, i.e. $i_q$ is currently affected by the coefficient $\phi_{nk} \Rightarrow M_{q-1} = (M_q \cup J^{q}_{i_q})$, or
(b) $\exists\, i_p \in J^{q}_{i_q}$ such that $i_p \in M_q$, i.e. $i_q$ has a neighbour which is currently affected $\Rightarrow M_{q-1} = (M_q \cup i_q \cup J^{q}_{i_q})$ or
(c) neither of the above, i.e. $i_q$ is separated from $M_q \Rightarrow M_{q-1} = M_q$.

Fig. 10. (a) Effect of lifting $i_p$ and (b) effect of lifting $i_s$
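The three cases for the forward step translate directly into a set update. The sketch below again uses a one-dimensional chain in which the nearest remaining site on each side plays the role of the Delaunay neighbours; this simplification, the lifting order and the function names are all assumptions made for illustration.

```python
def chain_neighbours(present, site):
    """Nearest remaining site on each side: a 1-D stand-in for Delaunay neighbours."""
    left = [s for s in present if s < site]
    right = [s for s in present if s > site]
    return ({max(left)} if left else set()) | ({min(right)} if right else set())


def forward_affected(missing_site, n_sites, removal_order):
    """Final set M_l: sites whose working value depends on the missing site."""
    present = set(range(n_sites))
    affected = {missing_site}  # M_n contains only the missing site
    for site in removal_order:
        nbrs = chain_neighbours(present, site)
        if site in affected:  # case (a): the lifted site is itself affected
            affected |= nbrs
        elif nbrs & affected:  # case (b): a neighbour of the lifted site is affected
            affected |= nbrs | {site}
        # case (c): the lifted site is separated from the affected set
        present.remove(site)
    return affected


removal_order = [5, 1, 3]  # sites lifted in turn; sites 0, 2, 4, 6 remain as scaling sites
final_set = forward_affected(missing_site=0, n_sites=7, removal_order=removal_order)
affected_wavelets = final_set & set(removal_order)
print(sorted(final_set), sorted(affected_wavelets))
```

Here a missing value at site 0 propagates to the wavelets at sites 1 and 3 but never to the wavelet at site 5, which is lifted before any of its neighbours has become affected.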

A.3.3. Equivalence in the forward and inverse lifting schemes

Having introduced the notation of the previous sections we prove the following lemma.

Lemma 2. Suppose that wavelet $\psi_{i_j}$ affects $i_M$ in the inverse transformation. Then $\forall\, I_q \neq \emptyset$, $\exists\, i \in M_q$ which is also contained in $I_q$.

Proof. Our proof is by induction. Firstly, by definition we have that $i_M \in M_{q^*}$ and $i_M \in I_{q^*}$. Now let us suppose that we have any point $i_p \in M_r$ for some $j < r < q^*$. Consider lifting index $i_r$. Clearly, $i_p \in I_{r-1}$ and also $i_p \in M_{r-1}$ unless either

(a) we lift index $i_p$ itself, i.e. $p = r$; in this case $\exists\, i_l \in J^{r}_{i_p}$ such that $i_l \in I_{r-1}$, and this $i_l$ must also lie in $M_{r-1}$; or
(b) we lift a neighbour of $i_p$, say $i_s$; this $i_s$ must have some neighbour $i_t$ which lies in $I_{r-1}$ (this $i_t$ could be $i_p$ itself), and this $i_t$ will also lie in $M_{r-1}$.

See Fig. 10 for a visual illustration.

Corollary 1.

$$\{\text{wavelets which affect } i_M \text{ in inverse lifting transformation}\} \subseteq \{\text{wavelets which are affected by } i_M \text{ in forward lifting transformation}\}.$$

Proof. Consider the specific stage $j$, i.e. the lifting to create wavelet $\psi_{i_j}$. Then $\exists\, i \in M_j$ subject to $i \in I_j = \{i_j\} \cup J^{j}_{i_j}$. On lifting site $i_j$ we shall have that $i_j \in M_{j-1} \subseteq M_l$ and so wavelet $\psi_{i_j}$ must be affected in forward transformation.

A similar argument to that for lemma 2 but in the opposite direction and following the evolution of $I_q$ for any wavelet $\psi_i$ with $i \in M_l$ allows us to deduce the following lemma.

Lemma 3.

$$\{\text{wavelets which affect } i_M \text{ in inverse lifting transformation}\} = \{\text{wavelets which are affected by } i_M \text{ in forward lifting transformation}\}.$$

References

Abramovich, F., Sapatinas, T. and Silverman, B. W. (1998) Wavelet thresholding via a Bayesian approach. J. R. Statist. Soc. B, 60, 725–749.

Cleveland, W. S., Gosse, E. and Shyu, W. M. (1992) Local regression models. In Statistical Models in S, pp. 309–376. Pacific Grove: Wadsworth.

Cressie, N. A. C. (1993) Statistics for Spatial Data. New York: Wiley-Interscience.

Daubechies, I. (1988) Orthonormal bases of compactly supported wavelets. Communs Pure Appl. Math., 41, 909–996.

Daubechies, I. (1992) Ten Lectures on Wavelets. Philadelphia: Society for Industrial and Applied Mathematics.

Donoho, D. L. and Johnstone, I. M. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.

Gelfand, A. E. and Smith, A. F. M. (1990) Sampling-based approaches to calculating marginal densities. J. Am. Statist. Ass., 85, 398–409.

Gelman, A. and Rubin, D. B. (1992) Inference from iterative simulation using multiple sequences (with discussion). Statist. Sci., 7, 457–511.

Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattn Anal. Mach. Intell., 6, 721–741.

Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach. London: Chapman and Hall.

Jansen, M., Nason, G. P. and Silverman, B. W. (2004) Multivariate non-parametric regression using lifting. Technical Report. University of Bristol, Bristol.

Johnstone, I. M. and Silverman, B. W. (2004) Needles and straw in haystacks: empirical Bayes estimates of possible sparse sequences. Ann. Statist., 32, 1594–1649.

Johnstone, I. M. and Silverman, B. W. (2005a) Empirical Bayes selection of wavelet thresholds. Ann. Statist., 33, 1700–1752.

Johnstone, I. M. and Silverman, B. W. (2005b) Ebayesthresh: R programs for Empirical Bayes thresholding. J. Statist. Softwr., 12, no. 8, 1–38.

Mallat, S. G. (1989a) Multiresolution approximations and wavelet orthonormal bases of $L^2(\mathbb{R})$. Trans. Am. Math. Soc., 315, 69–87.

Mallat, S. G. (1989b) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattn Anal. Mach. Intell., 11, 674–693.

Nychka, D. (2005) Fields package. National Center for Atmospheric Research, Boulder. (Available from http://www.image.ucar.edu/GSP/Software/Fields.)

Strang, G. and Nguyen, T. (1996) Wavelets and Filter Banks. Wellesley: Wellesley-Cambridge.

Sweldens, W. (1997) The lifting scheme: a construction of second generation wavelets. SIAM J. Math. Anal., 29, 511–546.