
VARIABLE GROUPWISE STRUCTURED WAVELETS∗

Y. FAROUJ†, J.M. FREYERMUTH‡, L. NAVARRO§, M. CLAUSEL¶, AND P. DELACHARTRE‖

Abstract. At the origins of the wavelet transform, there were basically two ways to build multidimensional transforms from the unidimensional ones. The first and most commonly used construction generalizes the concept of multiresolution analysis (MRA) to multidimensional settings by taking a tensor product of the MRA. The second and most mathematically appealing construction is formed by the tensor products of one-dimensional wavelet basis functions. These transforms have many different names in the literature; we refer to them as standard and hyperbolic wavelet transforms respectively. They have been studied and used in a very wide range of applications, from image compression to denoising and classification. In this paper we introduce a generalized hyperbolic wavelet basis which mixes both the standard and the hyperbolic ways of building bases according to the nature of the underlying variables in the multidimensional objects. We demonstrate its enhanced practical performance for denoising images, image sequences and incompressible flows. Moreover, we adapt this transform to the denoising of real-life data, showing how naturally this generalized hyperbolic wavelet transform can be used with divergence-free wavelets or jointly with variance stabilization techniques.

Key words. Structured sparsity, Non-parametric estimation, Hyperbolic wavelets, Data-driven denoising, Hyperspectral data, Image sequences, Flow denoising

AMS subject classifications. 65T60, 94A08

1. Introduction and Motivation. We consider the problem of recovering an unknown multidimensional function f from a noisy observation fε. This task is common in various problems related to image processing. We are interested in the following additive white Gaussian noise (AWGN) model

(1) fε = f + εξ,

where fε is the observed data, ε ∈ (0,∞) is the noise level and ξ ∼ N(0, 1) is a white noise. We are interested in cases where the multidimensional function f depends on variables with different physical meanings along the different coordinate axes (e.g. spatial, temporal, spectral, hyperspectral, ...).

Nonparametric methods for denoising based on wavelet expansions of the function fε have been widely developed over the last two decades since the seminal work

∗Submitted to the editors.
Funding: This work was supported by “Region Rhone-Alpes” under the ARC 6. M. Clausel’s research is supported by the French Agence Nationale de la Recherche (ANR) under reference ANR-13-BS03-0002-01 (ASTRES) and PERSYVAL-Lab (ANR-11-LABX-0025-01). L. Navarro’s research is supported by the ANR under reference ANR-15-CE19-0002 (LBSMI). P. Delachartre is within the framework of the LabEX PRIMES (ANR-11-LABX-0063) of the Universite de Lyon. J.-M. Freyermuth’s research was supported by the Engineering and Physical Sciences Research Council [EP/K021672/2].

†Medical Image Processing Lab (MIP:Lab), Institute of Bioengineering, EPFL, CH-1015 Lausanne, Switzerland ([email protected]).

‡KU Leuven, ORSTAT and Leuven Statistics Research centre, Naamsestraat 69, 3000 Leuven, Belgium ([email protected]).

§University of Lyon, CNRS UMR 6158 LIMOS, INSERM U1059 F4200, SAINBIOSE F-42023 ([email protected]).

¶University of Grenoble and CNRS, Laboratoire Jean Kuntzmann, UMR 5224 du CNRS, 51 rue des Mathematiques 38010 Saint Martin d’Heres, France ([email protected]).

‖Universite de Lyon, CREATIS, UMR5220, INSERM U1044, INSA-Lyon, Lyon 69621, France ([email protected]).


This manuscript is for review purposes only.


of Donoho and Johnstone [19], who introduced the celebrated wavelet shrinkage procedure. In what follows, we propose a novel extension of these wavelet denoising techniques to deal with images or volumes of images. Whenever the variables have the same physical meaning, it is natural to use the standard (isotropic) multidimensional wavelet bases. However, in many applications one is confronted with data in which the variables have different natures and hence the signal or image of interest has different properties according to these different variables. In such an anisotropic case the use of hyperbolic wavelet bases is natural. We give here some examples:

Spectral, Multispectral and Hyperspectral data. A first example is the evolutionary spectrum of a possibly non-stationary process, which is a bidimensional function of frequency and time. The evolutionary spectrum is consistently estimated by smoothing the empirical Wigner-Ville distribution, the so-called preperiodogram in [38]. In this context emerges the theory of estimation of anisotropic functions based on the thresholding of hyperbolic wavelet coefficients of the preperiodogram. Other examples include multispectral and hyperspectral imaging: the 3D stack of images has completely different regularities in the spectral and spatial variables, as the data consist of the same image at different spectral bands or wavelengths [14, 45].

Image and volume sequences. An image or volume temporal sequence in which voxel intensities are preserved over time is governed by the following conservation equation

(2) ∂tI + u · ∇xI = 0,

where I refers to the sequence and u is the velocity field of the voxels. When the unknown is the velocity u, Equation (2) is the so-called optical flow equation and plays a major role in computer vision since it is the reference model for motion estimation [29]. From a PDE point of view, this equation can be seen as a simple advection equation on the scalar field I, assuming that the flow is incompressible. Solutions of such time-dependent partial differential equations have different degrees of smoothness in space and time and they are not well characterized in isotropic spaces [18].

Velocity fields. Now, consider that the velocity field u is given. This situation occurs for example in blood flow measurements provided by phase-contrast MRI [35] or ultrasound Doppler [26]. The output u is computed from phase transitions. It usually suffers from a low velocity-to-noise ratio (VNR) and a denoising step is often needed. The velocity field u can also be computed by solving a PDE modeling the underlying motion. For example, blood flow satisfies variants of the Navier-Stokes equations [40]. In this case, the difference between the temporal and the spatial directions is related to their physical nature, in the sense that the divergence over the spatial dimensions is null.

Partial data-dependent noise intensity. In many applications, the observed data cannot be properly modelled by Equation (1). In particular, the noise intensity can be proportional to the underlying unknown function. As a consequence, wavelet thresholding methods need to be adapted via variance stabilization techniques. Such data-dependent noise models are often related purely to the spatial domain because of the imaging systems. Hereafter, the variance stabilization technique developed in [23] will be applied only to the spatial variables.

In this work, we describe a novel construction of a so-called structured wavelet basis that can faithfully represent multivariate functions presenting different characteristics along the different coordinate axes. Our construction extends the so-called hyperbolic wavelet basis introduced in [17]. In the same spirit, J. Bigot and T. Sapatinas addressed in [9] the problem of denoising time-dependent multivariate functions. It has been shown that the hyperbolic wavelet basis is well adapted to functions having different degrees of smoothness along the directions [37]. In particular, interesting recent results [6] brought to light that classical wavelet estimators cannot achieve optimal reconstructions of functions with different regularities along the different dimensions. We present a structured wavelet construction which considers sparsity assumptions on groups of variables, and not only on single variables, using a generalization of the hyperbolic construction of wavelets. We describe a relevant type of functional classes which are characterized by these structured wavelet bases, and we demonstrate that the minimax lower bound of the L2-loss of the estimator is driven by the dimension of the largest group, breaking, “partially”, the curse of dimensionality. We show that this generalization allows taking into account not only regularity features but also physical characteristics such as divergence-freedom. We also show that the obtained construction can handle situations in which the noise characteristics are data-dependent but only on a part of the variables.

There is no reason for the standard multidimensional wavelet basis to be systematically outperformed by the hyperbolic wavelet basis. Although a theoretical justification does not exist yet, we will explain the advantages of the former from an empirical point of view.

The rest of the paper is organized as follows. In Section 2, we recall wavelets and their extensions to the multivariate case. The proposed wavelet construction is presented and motivated in Section 3, along with a derivation of the lower bound of the L2-loss for a particular functional space. Extensive numerical experiments and comparisons are presented in Section 4.

2. Limits of usual wavelet estimation procedures. First, for the sake of clarity and ease of reading, we recall some definitions. We begin by introducing one-dimensional wavelet bases, then we describe two different extensions to higher dimensions.

One-dimensional wavelet bases are defined from a one-dimensional function ψ, called the mother wavelet, and its dilated and translated versions $\psi_{j,k}(\cdot) = 2^{j/2}\psi(2^{j}\cdot - k)$ with (j, k) ∈ N × Z, together with a scaling function ϕ and its dilated and translated versions $\varphi_{j,k}(\cdot) = 2^{j/2}\varphi(2^{j}\cdot - k)$. It is then well known that $\{\varphi_{0,k}\}_{k\in\mathbb{Z}} \cup \{\psi_{j,k}\}_{j\ge 0,\,k\in\mathbb{Z}}$ forms an orthogonal basis of $L^2(\mathbb{R})$. In the sequel we shall denote $\psi_{-1,k} = \varphi_{0,k}$.

In the multivariate setting, one defines multidimensional wavelets as

(3) $\psi_{j_1,\dots,j_d,k_1,\dots,k_d}(x) = \psi_{j_1,k_1}(x_1)\otimes\cdots\otimes\psi_{j_d,k_d}(x_d).$

Two possible multidimensional extensions of wavelet bases come from this definition. The first one consists in taking all possible combinations of the multi-indices $\mathbf{j} = (j_1,\dots,j_d) \in (\mathbb{N}\cup\{-1\})^d$, $\mathbf{k} = (k_1,\dots,k_d)\in\mathbb{Z}^d$, leading to the so-called “hyperbolic”¹ wavelet basis [17] $B_{\mathrm{hyp},d} = \{\psi_{\mathbf{j},\mathbf{k}}\}_{\mathbf{j}\in(\mathbb{N}\cup\{-1\})^d,\,\mathbf{k}\in\mathbb{Z}^d}$. The second possibility is to fix the directional dilation indices to be the same along each dimension, $j = j_1 = \cdots = j_d$. Then one first defines multidimensional wavelets $\psi^{(i)}$ for any $i \in \{0,1\}^d\setminus\{(0,\dots,0)\}$ as

(4) $\psi^{(i)} = \psi^{(i_1)}\otimes\cdots\otimes\psi^{(i_d)}$ with $\psi^{(0)} = \varphi$, $\psi^{(1)} = \psi$,

and for any $j \ge -1$, $k\in\mathbb{Z}^d$ and $i \in \{0,1\}^d\setminus\{(0,\dots,0)\}$, one sets

$\Psi^{(i)}_{j,k}(x) = 2^{jd/2}\,\psi^{(i)}(2^{j}x - k).$

We also set, for any $k\in\mathbb{Z}^d$, $\Psi^{(0,\dots,0)}_{-1,k}(x) = [\varphi\otimes\cdots\otimes\varphi](x-k)$. The family $B_{\mathrm{iso},d} = \{\Psi^{(i)}_{j,k}\}$ is then an orthonormal wavelet basis of $L^2(\mathbb{R}^d)$, which is said to be “isotropic” [16]. See Figure 1 for a comparison of hyperbolic and classical wavelet decompositions on a two-dimensional example.

¹These wavelets appear under other names in the literature, such as mixing scales [41] or rectangular [50].
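The two constructions above can be sketched numerically. The following minimal Python example computes both decompositions of a square dyadic image with the orthonormal Haar wavelet. The Haar filter, the use of NumPy and the function names (`haar_step`, `haar_full_1d`, `hyperbolic_2d`, `isotropic_2d`) are assumptions made for this illustration; it is not the implementation used in the experiments of this paper.

```python
import numpy as np


def haar_step(a, axis):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    a = np.moveaxis(a, axis, 0)
    approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    # Approximation coefficients first, detail coefficients second.
    return np.moveaxis(np.concatenate([approx, detail], axis=0), 0, axis)


def haar_full_1d(a, axis):
    """Full multilevel Haar transform along one axis (power-of-two length)."""
    out = np.array(a, dtype=float)
    n = out.shape[axis]
    while n > 1:
        idx = [slice(None)] * out.ndim
        idx[axis] = slice(0, n)
        out[tuple(idx)] = haar_step(out[tuple(idx)], axis)
        n //= 2
    return out


def hyperbolic_2d(img):
    """Hyperbolic transform: full 1D transform on rows, then on columns,
    so that every pair of scales (j1, j2) is present."""
    return haar_full_1d(haar_full_1d(img, 0), 1)


def isotropic_2d(img):
    """Isotropic transform: one level along both axes, then recursion on the
    low-pass quadrant, so that a single scale j is shared by both axes."""
    out = np.array(img, dtype=float)
    n = out.shape[0]  # square image with power-of-two side assumed
    while n > 1:
        block = haar_step(haar_step(out[:n, :n], 0), 1)
        out[:n, :n] = block
        n //= 2
    return out
```

Both transforms are orthonormal, so they preserve the energy of the image; they differ only in how the scales of the two axes are combined.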

In the sequel, we denote by I the indices of a particular wavelet or scaling function. The set of all such indices will be denoted by $\mathcal{I}$. If not specified, $\mathcal{I}$ can refer to either the standard or the hyperbolic construction. In the hyperbolic case one has

$\mathcal{I} = \{I = (\mathbf{j},\mathbf{k}) \text{ with } \mathbf{j} = (j_1,\dots,j_d)\in(\mathbb{N}\cup\{-1\})^d,\ \mathbf{k} = (k_1,\dots,k_d)\in\mathbb{Z}^d\},$

whereas in the isotropic case

$\mathcal{I} = \{I = (i, j, \mathbf{k}) \text{ with } i\in\{0,1\}^d\setminus\{(0,\dots,0)\},\ j\in\mathbb{N}\cup\{-1\},\ \mathbf{k} = (k_1,\dots,k_d)\in\mathbb{Z}^d\} \cup \{I = ((0,\dots,0), -1, \mathbf{k}),\ \mathbf{k} = (k_1,\dots,k_d)\in\mathbb{Z}^d\}.$

The wavelet decomposition of a function $f\in L^2([0,1]^d)$ is then given by

(5) $f = \sum_{I\in\mathcal{I}} \beta_I(f)\,\Psi_I,$

where

(6) $\beta_I(f) = \int_{(0,1)^d} f(x)\,\Psi_I(x)\,dx.$

Fig. 1: Wavelet decompositions in isotropic and hyperbolic settings (panels: House, Isotropic, Hyperbolic).

Denoising methods based on thresholding the wavelet coefficients have proven their effectiveness since the seminal work of Donoho and Johnstone [19]. Under model (1), it is possible to recover the unknown function f from its noisy observation fε by different procedures on the empirical wavelet coefficients $\beta_I(f_\varepsilon)$. This results in the following estimator

$\hat f = \sum_{I\in\mathcal{I}_\varepsilon} \beta_I(f_\varepsilon)\,\Psi_I,$

where $\mathcal{I}_\varepsilon\subset\mathcal{I}$ is the set of multi-indices corresponding to the coefficients that are kept in the reconstruction. The choice of $\mathcal{I}_\varepsilon$ can be made following different rules. In the sequel we shall consider the most classical one, the so-called hard thresholding rule, where

$\mathcal{I}_\varepsilon = \{I\in\mathcal{I} \text{ s.t. } |j|\le j(t_\varepsilon) \text{ and } |\beta_I(f_\varepsilon)| > t_\varepsilon\},$

where $t_\varepsilon$ is a threshold calibrated as a function of the noise level ε: $t_\varepsilon = 4\sqrt{2}\,\varepsilon\sqrt{\log\varepsilon^{-1}}$. The integer $j(t_\varepsilon)$ is chosen such that $2^{-j(t_\varepsilon)} \le t_\varepsilon^2 < 2^{1-j(t_\varepsilon)}$. The notation |j| allows us to encompass both the hyperbolic and the isotropic case in the definition of the hard thresholding estimator, since we set $|j| = j_1+\cdots+j_d$ in the case of a multidimensional index $j = (j_1,\dots,j_d)$, whereas |j| = j in the case of a unidimensional scaling index j. Depending on the considered wavelet basis (isotropic or hyperbolic), one then has two possible estimation procedures which can be derived from the hard thresholding rule: the isotropic hard thresholding estimator and the hyperbolic one.
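The coefficient selection in the hard thresholding rule can be sketched as follows. This is a minimal illustration, assuming that the coefficients are stored in a NumPy array and that 0 < ε < 1, so that the logarithm in the threshold is positive; the function names are hypothetical and the scale cut-off |j| ≤ j(tε) is deliberately omitted for brevity.

```python
import math

import numpy as np


def universal_threshold(eps):
    """Threshold t_eps = 4 * sqrt(2) * eps * sqrt(log(1 / eps)), for 0 < eps < 1."""
    return 4.0 * math.sqrt(2.0) * eps * math.sqrt(math.log(1.0 / eps))


def hard_threshold(coeffs, t):
    """Keep the coefficients whose magnitude strictly exceeds t, zero the others."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > t, coeffs, 0.0)
```

Applying `hard_threshold` to the coefficients of any orthonormal wavelet transform and inverting the transform gives the corresponding estimator; only the transform changes between the isotropic and the hyperbolic procedures.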

We want to compare the numerical performance of these two procedures according to the anisotropic nature of the data. From the theoretical point of view, this question has already been addressed in the work of Neumann and von Sachs [38], where the extension of wavelet thresholding techniques to multivariate anisotropic scenarios is introduced. It is shown that hyperbolic wavelets perform better whenever the unknown function has some anisotropic features. Moreover, due to the adaptive nature of hard thresholding and the fact that isotropy is a particular case of anisotropy, it has been proved recently that, even if the considered data are isotropic, hyperbolic wavelet thresholding gives the same theoretical guarantees as isotropic wavelets [6].

However, empirical evidence contradicts this result. We give here a motivational experiment using the sequence “Ayiko”. In Figure 2, we consider a spatial frame (i.e., we fix time). The denoising experiment uses the hard thresholding rule. Isotropic wavelets are slightly better in terms of PSNR. Moreover, undesirable axis-aligned artifacts are visible on the reconstruction from noisy data in the hyperbolic setting. These results are in contradiction with the theoretical estimates. A possible explanation for this phenomenon is the biased definition of anisotropic functional spaces, which consider only axis-aligned regularities, and the fact that the image used, like natural images in general, does not have strong differences in smoothness along the vertical and horizontal directions, that is, it is not strongly anisotropic.

Fig. 2: Denoising of one frame of the Ayiko sequence. Panels: Original, Noisy, Isotropic, Hyperbolic. Reported PSNR values: 26.67 dB, 29.45 dB, 28.93 dB.

Fig. 3: Denoising of a temporal cross section of the sequence Ayiko. Reported PSNR values: 24.53 dB, 30.97 dB, 33.68 dB.

In Figure 3, a temporal cross section of the sequence is considered (i.e., we fix one of the spatial dimensions). The resulting 2D image has different regularities in time and space and is therefore highly anisotropic. In particular, the resulting image is highly regular in the temporal dimension². In this case, the hyperbolic wavelets have clearly superior performance. To better understand this phenomenon, the degree of anisotropy can be seen as maximal when the degree of interaction between the variables is small. This degree is called the atomic dimension. Simple examples where the atomic dimension is one are additive models. For example, in the two-dimensional case, these functions are of the form

(7) f(x1, x2) = f1(x1) + f2(x2), (x1, x2) ∈ R².

Such functions are known to allow rates of convergence corresponding to the one-dimensional case [6]. Figure 4 shows an example of denoising under the model (7), where f1 and f2 are respectively the standard test functions Doppler and Blocks [4]. The results show that the hyperbolic setting outperforms the isotropic setting in this case.

²Of course, this is not a general rule. It is also possible to have some irregularity in the temporal dimension, for example when new objects appear in the scene.

To the best of our knowledge, existing research focuses on cases where the data are fully anisotropic, that is, where the regularity differs in each direction. In this work, we investigate the cases where variables can be grouped into subsets with the same physical meaning. This is motivated by the conclusions of the numerical results in this section and by the fact that functions having a group behavior arise in many applications, such as spatio-temporal and multispectral data. This naturally suggests the use of wavelet atoms that consider both features. In the next section, we discuss the construction of such wavelets, which are an instance of the so-called composite wavelets [27], and the definition of the associated estimation procedures.

Fig. 4: Denoising of an additive model. Reported PSNR values: 28.11 dB, 32.87 dB, 37.72 dB.

3. Estimation procedures based on tensored wavelet basis.

3.1. The tensored wavelet basis and associated estimation procedures. We now introduce the structured wavelet basis, which arises as a case of the composite wavelets defined in [27]. The starting point is N multidimensional wavelet bases $\{\Psi_{I_1,1},\ I_1\in\mathcal{I}_1\},\dots,\{\Psi_{I_N,N},\ I_N\in\mathcal{I}_N\}$ of $L^2(\mathbb{R}^{d_1}),\dots,L^2(\mathbb{R}^{d_N})$ respectively. For each $I = (I_1,\dots,I_N)\in\mathcal{I}_1\times\cdots\times\mathcal{I}_N$, one then sets

$\Psi_I = \Psi_{I_1,1}\otimes\cdots\otimes\Psi_{I_N,N}.$

The family $\{\Psi_I,\ I\in\mathcal{I}_1\times\cdots\times\mathcal{I}_N\}$ is then a basis of $L^2(\mathbb{R}^d)$ with $d = d_1+\cdots+d_N$. Note that if N = d, we recover a hyperbolic wavelet basis as a special case, whereas in the case N = 1 this construction corresponds to the isotropic wavelet basis.
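A minimal numerical sketch of this tensor construction is given below, assuming the orthonormal Haar wavelet inside every group and dyadic sizes; the names `haar_step`, `isotropic_haar` and `structured_haar` are hypothetical and chosen for this illustration only. Each group of axes receives an isotropic transform, and the groups are combined by tensor product, which amounts to applying the group transforms one after the other.

```python
import numpy as np


def haar_step(a, axis):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    a = np.moveaxis(a, axis, 0)
    approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(np.concatenate([approx, detail], axis=0), 0, axis)


def isotropic_haar(a, axes):
    """Isotropic multilevel Haar transform acting jointly on one group of axes.

    All axes of the group must share the same power-of-two length.
    """
    out = np.array(a, dtype=float)
    n = out.shape[axes[0]]
    while n > 1:
        idx = [slice(None)] * out.ndim
        for ax in axes:
            idx[ax] = slice(0, n)
        block = out[tuple(idx)]
        for ax in axes:
            block = haar_step(block, ax)
        out[tuple(idx)] = block
        n //= 2
    return out


def structured_haar(a, groups):
    """Structured transform: isotropic inside each group, tensored across groups."""
    out = np.array(a, dtype=float)
    for axes in groups:
        out = isotropic_haar(out, axes)
    return out
```

For a spatio-temporal volume with axes (x1, x2, t), `groups=[(0, 1), (2,)]` gives the structured case with N = 2, `groups=[(0,), (1,), (2,)]` gives the hyperbolic case N = d, and `groups=[(0, 1, 2)]` gives the isotropic case N = 1. Axes in different groups may have different lengths.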

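Any function $f\in L^2(\mathbb{R}^d)$ can then be decomposed as

$f = \sum_{I} \beta_I\,\Psi_I.$

One then defines the hard thresholding estimator as

$\hat f = \sum_{I\in\mathcal{I}_\varepsilon} \beta_I(f_\varepsilon)\,\Psi_I$

with $\mathcal{I}_\varepsilon = \{I\in\mathcal{I}_1\times\cdots\times\mathcal{I}_N,\ |j| = j_1+\cdots+j_N \le j(\lambda_\varepsilon)\}$, where $j(\lambda_\varepsilon)$ is such that $2^{-j(\lambda_\varepsilon)} \le \lambda_\varepsilon^2 < 2^{1-j(\lambda_\varepsilon)}$ with $\lambda_\varepsilon = m\,\varepsilon\sqrt{\log(\varepsilon^{-1})}$.

The next section aims at proving the optimality of this procedure in the minimax sense.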

3.2. Minimax results. To state and prove our minimax results about the hard thresholding procedure in the tensored wavelet basis, we first need to introduce some appropriate functional spaces, which appear as approximation spaces for our structured wavelet bases. These spaces extend the space of functions with dominating mixed derivatives introduced in [42] as approximation spaces for the hyperbolic wavelet basis.

For a vector $\alpha = (\alpha_1,\dots,\alpha_d)\in\mathbb{N}^d$ and a function $f\in L^2([0,1]^d)$, let us define as usual its partial derivatives in the distribution sense

$D^{\alpha}f = \dfrac{\partial^{|\alpha|}f}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}},$

where $|\alpha| = \alpha_1+\cdots+\alpha_d$. Given $R = (R_1,\dots,R_N)\in\mathbb{N}^N$, we now define the following functional spaces

$F^{R}_{d,N} = \Big\{f\in L^2((0,1)^d) : \sum_{i=1}^{N}\ \sum_{r=(r_1,\dots,r_N)\in\mathbb{N}^{d_1}\times\cdots\times\mathbb{N}^{d_N},\ |r_i|\le R_i} \|D^{r_1}\cdots D^{r_N}f\|_{L^2((0,1)^d)} < \infty\Big\}.$

We equip these spaces with the norm

$\|f\|_{F^{R}_{d,N}} = \|f\|_{L^2((0,1)^d)} + \sum_{i=1}^{N}\ \sum_{r=(r_1,\dots,r_N)\in\mathbb{N}^{d_1}\times\cdots\times\mathbb{N}^{d_N},\ |r_i|\le R_i} \|D^{r_1}\cdots D^{r_N}f\|_{L^2((0,1)^d)}.$

We then denote by $F^{R}_{d,N}(K)$ the ball of $F^{R}_{d,N}$ of radius K.

The space $F^{R}_{d,N}$ contains functions with dominating mixed derivatives which have, for any $\ell\in\{1,\dots,N\}$, $d_\ell$ variables sharing the same regularity order $R_\ell$. Two special cases of interest arise from this definition:

• If $r_\ell\in\mathbb{N}$ for all $\ell = 1,\dots,N$, that is, if N = d, we find the so-called spaces with dominating mixed derivatives defined in [42]

$F^{R}_{N} = \Big\{f\in L^2((0,1)^N) : \sum_{(r_1,\dots,r_N),\ r_i\le R_i} \|D^{r_1}\cdots D^{r_N}f\|_{L^2((0,1)^N)} < \infty\Big\}.$

• If N = 1 and $d_1 = d$, we recover the classical isotropic Sobolev spaces

$H^{R} = \Big\{f\in L^2((0,1)^d) : \sum_{r_1\in\mathbb{N}^d,\ |r_1|\le R} \|D^{r_1}f\|_{L^2((0,1)^d)} < \infty\Big\}.$

We can now state our main result, giving the rate of convergence of the procedure defined in Section 3.1 and proving its optimality in the minimax sense:

Theorem 3.1. Assume that $R_1 = \cdots = R_N = R$. Let $J_\varepsilon$ be such that $\varepsilon^2\,2^{J_\varepsilon d_{\max}}\,J_\varepsilon = 2^{-2J_\varepsilon R}$, with $d_{\max} = \max_i(d_i)$. Then

$\inf_{\hat f}\ \sup_{f\in F^{R}_{d,N}(K)} \mathbb{E}\|\hat f - f\|_2^2 = \big(\varepsilon^{4R/(2R+d_{\max})}\big)\,\big(|\log(\varepsilon)|^{N-1}\big).$

Note that a lower dimension, $d_{\max}$, which may differ from one, appears in this minimax rate, thus “partially” breaking the curse of dimensionality. The lower bound, in this case, is of the order of $\varepsilon^{4R/(2R+d_{\max})}$, while it is of the order of $\varepsilon^{4R/(2R+d)}$ in the isotropic case and of $\varepsilon^{4R/(2R+1)}$ in the fully anisotropic case. It is interesting to note that the order of the logarithmic term is related to the degrees of freedom of the scale vector instead of the usual dimension.
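To make the comparison of rates concrete, the following short sketch evaluates the exponent of ε appearing in Theorem 3.1 for one illustrative configuration. The values R = 2 and the grouping (d1, d2) = (2, 1) are assumptions chosen for the example (two spatial variables and one temporal variable), the function name is hypothetical, and the logarithmic factor is ignored.

```python
def rate_exponent(R, dim):
    """Exponent 4R / (2R + dim) of eps in the rate of Theorem 3.1 (log factor ignored)."""
    return 4.0 * R / (2.0 * R + dim)


R = 2                 # common regularity order (illustrative value)
group_dims = (2, 1)   # d_1 = 2 spatial variables, d_2 = 1 temporal variable
d, d_max = sum(group_dims), max(group_dims)

structured = rate_exponent(R, d_max)  # driven by the largest group
isotropic = rate_exponent(R, d)       # N = 1: a single group of dimension d
hyperbolic = rate_exponent(R, 1)      # N = d: every group has dimension one
```

Since ε < 1, a larger exponent means a faster rate. With these values the exponents are 8/7 for the isotropic case, 4/3 for the structured case and 8/5 for the fully anisotropic case, so the structured rate lies between the other two.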

3.3. Extension of the method to other settings. The construction (3) allows one to consider different families of wavelets along the different dimensions. One could for example consider a representation of a piecewise stationary spectrum with Haar wavelets along time and a smooth wavelet along the spatial frequency. The hard thresholding can also be modified by considering a possibly random threshold depending not only on the noise level but also on the index I. This will be the case when considering variance stabilization approaches. Here, we extend the approach introduced in Section 3 by pointing out other situations where specific types of wavelets and estimation procedures are needed.

3.3.1. Denoising of spatio-temporal incompressible flows. We consider the problem of denoising spatio-temporal vector fields with null divergence in space, such as incompressible flows. Denoising this type of data has gained a lot of attention with the emergence of phase-contrast MRI [35]. A velocity field is a function of (x, t), where x is the spatial variable of dimension d ≥ 2. The wavelet paradigm for such data consists of two types of algorithms based on decomposing the flow in a divergence-free wavelet frame. The classical spatial denoising [35] does not consider the time variable, and thresholding is done on the divergence-free wavelet transform of each spatial stack. A more interesting procedure, which considers spatio-temporal regularization, was introduced recently by Bostan et al. [11, 10]. The algorithm considers two regularization terms in a variational framework

(8) $\hat f = \arg\min_f \big\{ \|W_{B_{\mathrm{div}}}f\|_1 + R_t(f) + \tfrac{1}{2}\|g_\varepsilon - f\|_2^2 \big\},$

where $W_{B_{\mathrm{div}}}f$ denotes the vector of the coefficients resulting from the decomposition of f in the divergence-free wavelet basis $B_{\mathrm{div}}$ and $R_t(f)$ is a term aiming at regularizing f in the temporal dimension, along which incompressibility is not maintained. Many choices for $R_t(f)$ are possible. The presence of two norms makes the minimization process challenging. In particular, the direct use of simple soft-thresholding algorithms is not possible. Moreover, convexity is lost, so that the minimum need not be unique. If $R_t(f)$ is not differentiable, as in the case of the ℓ1-norm, the problem needs to be divided into two sub-problems by considering one norm at a time in an iterative fashion.

Here, we propose an alternative approach, which consists in using one single norm that sparsifies the complete vector. This can be achieved by the structured wavelet construction. Let us define the following spatio-temporal wavelets:

(9) $\Psi = \psi^{\mathrm{div}}_x(x)\otimes\psi_t(t),$

where $\psi^{\mathrm{div}}_x$ is a divergence-free spatial wavelet and $\psi_t$ is a one-dimensional temporal wavelet. With our construction, we can replace the estimation procedure (8) by a simpler one: a thresholding of the wavelet coefficients of the flow of interest in the structured wavelet basis. We also propose to consider other possible bases. For example, if temporal discontinuities are not allowed, $\psi_t$ can be replaced by a Fourier basis, that is, the family of complex exponentials $\{e^{int}\}$.

3.3.2. Partial data-dependent noise. Another advantage of the tensor construction is its ability to deal with the case where the noise depends on the data. Such situations occur, for example, in dynamic imaging when the noise has a specific spatial characteristic, due to the imaging system, which is not preserved in time. We are, of course, interested in cases where the noise can be removed via known thresholding methods. Consider, for example, the following noise model

(10) fε = f + F(f)ξ,

where F is a non-decreasing function and ξ a white noise. Many noise models are of the form (10). Examples are speckle noise in ultrasound imaging [34] and Poisson noise in photon imaging systems [39, 25]. When an image sequence is considered, each image of the sequence follows model (10). For each image, this model can be handled by wavelet techniques after a Gaussianizing process on the wavelet coefficients [21, 24]. The corresponding transform is called the wavelet-Fisz transform and consists in stabilizing the variance by dividing wavelet coefficients by local estimates of the noise standard deviation. Now consider the problem of denoising a sequence in which each image follows model (10). Using a construction similar to (9) comes naturally. Here, instead of a divergence-free wavelet transform, a wavelet-Fisz transform is considered; the spatio-temporal coefficients are stabilized in the spatial dimension before applying a hard thresholding procedure. The next section is devoted to an exhaustive experimental study of the performance of the structured wavelet construction in practice.
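The stabilization step described above can be illustrated on a single Haar level. The sketch below is only an assumption-laden example: it takes Poisson-type noise, for which the noise standard deviation is the square root of the intensity, uses unnormalized Haar coefficients on a 1D signal, and estimates the local standard deviation from the local mean. The function name `haar_fisz_step` is hypothetical and this is not claimed to be the transform used in the experiments.

```python
import numpy as np


def haar_fisz_step(x):
    """One Haar level with Fisz-type variance stabilization.

    `x` is a non-negative 1D signal of even length. Returns the local means
    and the detail coefficients divided by the square root of the local mean.
    """
    x = np.asarray(x, dtype=float)
    smooth = (x[0::2] + x[1::2]) / 2.0  # local mean, estimates the intensity
    detail = (x[0::2] - x[1::2]) / 2.0  # unnormalized Haar detail coefficient
    fisz = np.zeros_like(detail)
    positive = smooth > 0               # avoid dividing by zero
    fisz[positive] = detail[positive] / np.sqrt(smooth[positive])
    return smooth, fisz
```

For Poisson counts with a locally constant intensity, the stabilized coefficients have a variance close to 1/2 whatever the intensity, so a single threshold can then be applied to all of them.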

4. Experiments. In this section, we present some experiments to illustrate the effectiveness of the structured variable grouping presented earlier. We start with the simple problem of sequence denoising under model (1), then we consider the denoising of hyperspectral data and velocity flows, and finally denoising under partially data-dependent noise models. We aim at showing the merits of using the structured wavelets presented above compared to the classical constructions. Therefore we compare our results with those obtained by wavelet thresholding which does not take all variables into account (2D-Wavelets) and by isotropic multivariate wavelet thresholding techniques (3D-Wavelets), which act in the same manner on all variables. We also use simple universal hard thresholding rules. Except for the velocity flows, all data have a [0, 255] grey-value range. At each noise level, we compare the results of the different methods using the classical Peak Signal to Noise Ratio (PSNR) as a criterion. In the case of data-dependent noise, we also show the difference between the true image and the denoised result of every method. This is known in the literature as the method noise [12]. For the wavelet filters, Daubechies wavelets with 6 vanishing moments are used in the different directions for the 2D-Wavelets and the Structured Wavelets, while the 3D-Wavelets are complex isotropic filters [44].
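For reference, the PSNR criterion can be computed as in the following sketch, which uses the standard definition with a peak value of 255 matching the grey-value range stated above. The function name and signature are assumptions made for this example.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak Signal to Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

As a sanity check, a uniform error of 10 grey levels gives a mean squared error of 100 and therefore a PSNR of about 28.13 dB.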

In each case, we first define the sets of variables that we shall group. On these variables the appropriate wavelet transform is taken (isotropic, divergence-free) and the appropriate estimator is defined (isotropic hard thresholding, hyperbolic hard thresholding, Fisz, etc.). The remaining variables are taken into account by the standard tensor product. It is interesting to note that the complexity of the construction in dimension d is $O(N^d)$, as for classical isotropic wavelets, whatever the order of the variables. Figure 5 shows a particular construction for d = 3 in which isotropy in {x1, x2} is considered. This case will be of special interest, as x3 will refer to the time variable. It can be seen that the obtained atoms have an isotropic support along x1 and x2, while the support is different along x3.

Fig. 5: Example of a structured wavelet atom (panels: the 3D atom, x1 − x2 plane, x1 − x3 plane).

We now detail the different cases that we considered.

4.1. Image sequence denoising. The utility of considering isotropy in space and anisotropy in time was discussed and motivated in Section 2. We also want to show the merit of the spatio-temporal treatment in general. We consider two image sequences for our synthetic experiments. The first sequence, “Ayiko”, is available on the web³. The second sequence, “Heart”, is generated by the ASSESS software [15]. This software applies a simple kinematic model of the heart deformation to an initial image. The result is a non-corrupted realistic simulation of a cardiac MRI sequence. We corrupted the two sequences according to noise model (1) by picking a noise level ε ∈ {5, 10, 15, 20}. PSNR results are given in Table 1. Results obtained by the structured wavelets are clearly superior to those obtained by 2D-Wavelets and 3D-Wavelets at all noise levels. The benefit of taking the temporal dimension into account can be observed, as 3D-Wavelets give better results than 2D-Wavelets. As the visualization is done frame by frame, we show in Figure 8 the temporal evolution of the PSNR for each method. It can be observed that the 2D-Wavelet approach imposes a local treatment in time; as a consequence, there is no temporal evolution of the PSNR. In contrast, the results of 3D-Wavelets and structured wavelets demonstrate a time-dependent behaviour. In particular, the PSNR starts and finishes with low values, as there is not enough information in the temporal domain, and has higher values in the rest of the sequence. In the case of structured wavelets, some sudden drops in the PSNR are observed due to temporal discontinuities. Note that an interesting phenomenon is observed in the case of the “Heart” sequence, as the periodicity of the cardiac cycles can be seen in the PSNR results. This is due to the variations in deformation and the discontinuities between cycles, as the heart motion changes direction, introducing a strong temporal discontinuity.

Fig. 6: Results of various methods applied to the 20th image of the sequence “Ayiko” (panels: Original, Noisy (σ = 10), 2D-Wavelets, 3D-Wavelets, Structured Wavelets). Quantitative evaluation is given in Table 1.

4.2. Spectral denoising. We also considered two examples of hyperspectral and multispectral data. The first sample, “Uncle Bens”, is taken from the Multispectral database⁴. It consists of 31 reflectance images at wavelengths ranging from 400 nm to 700 nm. The second sample, “Indian Pines”, is from the AVIRIS hyperspectral sensor database⁵. It contains 220 images, each one representing a specific wavelength. The multispectral data was modified to have a dyadic size in the spectral dimension for the wavelet transform. Note that, unlike the 3D-Wavelet approach, the Structured Wavelet approach does not require the same size in all dimensions. We show the results only for the original 31 samples.

³ http://see.xidian.edu.cn/vipsl/database_Video.html

⁴ http://www2.cmp.uea.ac.uk/Research/compvis/MultiSpectralDB.htm

⁵ https://purr.purdue.edu/publications/1947



Fig. 7: Results of various methods applied to the 20th image of the sequence “Heart”. Quantitative evaluation is given in Table 1.

Fig. 8: PSNR evolution for different methods applied to the sequences “Ayiko” and “Heart”. Axes: frame index (0 to 140) versus PSNR (dB); curves: Noisy, 2D-Wavelets, 3D-Wavelets, Structured Wavelets; one panel per sequence.

| σ | “Ayiko” 2D | “Ayiko” 3D | “Ayiko” Structured | “Heart” 2D | “Heart” 3D | “Heart” Structured |
|---|---|---|---|---|---|---|
| 5 | 35.05 | 38.12 | 40.48 | 35.60 | 38.54 | 41.23 |
| 10 | 30.33 | 33.62 | 36.16 | 31.41 | 34.53 | 37.12 |
| 15 | 27.75 | 30.92 | 33.68 | 29.00 | 32.24 | 34.70 |
| 20 | 26.03 | 29.11 | 31.88 | 27.33 | 30.70 | 32.96 |

Table 1: Quantitative comparison (PSNR) for different methods applied to “Ayiko” and “Heart” at different noise levels.



Fig. 9: 3D Stacks: “Indian Pines” and “Uncle Bens”. Panels: “Uncle Bens”, “Indian Pines”.

Fig. 10: Results of various methods applied to the 20th image of the stack “Uncle Bens”. Quantitative evaluation is given in Table 2.

Fig. 11: Results of various methods applied to the 10th image of the stack “Indian Pines”. Quantitative evaluation is given in Table 2.

Fig. 12: PSNR evolution for different methods applied to the data “Uncle Bens” and “Indian Pines”. Axes: image index versus PSNR (dB); one panel per data set.

| σ | “Uncle Bens” 2D | “Uncle Bens” 3D | “Uncle Bens” Structured | “Indian Pines” 2D | “Indian Pines” 3D | “Indian Pines” Structured |
|---|---|---|---|---|---|---|
| 5 | 36.14 | 37.78 | 39.41 | 35.26 | 35.64 | 37.38 |
| 10 | 32.26 | 34.24 | 36.24 | 31.81 | 32.22 | 34.36 |
| 15 | 30.11 | 32.17 | 34.16 | 30.08 | 30.55 | 32.70 |
| 20 | 28.67 | 30.65 | 32.53 | 28.94 | 29.36 | 31.42 |

Table 2: Quantitative comparison (PSNR) for different methods applied to “Uncle Bens” and “Indian Pines” at different noise levels.

4.3. Incompressible flows Denoising. The experiments were done on a synthetic velocity map initialized with a divergence-free vector flow $(u, v)$ whose horizontal and vertical components are given respectively by $u_0(x, y) = \sin^2(2\pi x)\sin(4\pi y)$ and $v_0(x, y) = -\sin^2(2\pi y)\sin(4\pi x)$. The temporal evolution was governed by a rotation matrix, $u(x, y, t+1) = u(x, y, t) + h\,u(x, y, t)$ and $v(x, y, t+1) = v(x, y, t) - h\,v(x, y, t)$, where $h$ is fixed. The result is a velocity map in the form of 4 vortices with increasing velocity (see Figure 13). We corrupted this flow with Gaussian noise with $\sigma \in \{0.1, 0.3, 0.5\}$. We tested spatial two-dimensional divergence-free wavelets and spatio-temporal structured divergence-free wavelets. Here, the 3D construction is not appropriate, as divergence freedom cannot be imposed in time. We used the divergence-free wavelets with periodic boundary conditions introduced by Harouna and Perrier [30]. A visual comparison of the results is given in Figure 14 and Figure 15. The vector flows obtained using the spatio-temporal structured wavelets are visually smoother and still preserve divergence freedom. As the velocity increases while the noise variance stays constant, the PSNR increases with time (Figure 16). Note that the spatio-temporal approach fails at some time steps because of the discontinuities, but its overall performance is better than that of the spatial approach, as can be seen in Table 3.
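The initial field used in this experiment is easy to reproduce and to check numerically. The sketch below samples $(u_0, v_0)$ on a periodic grid and evaluates a central-difference divergence, which should vanish up to rounding error for this field. The grid size (64), the unit square domain, the finite-difference divergence and the function names are assumptions made for illustration; the text does not specify the discretization that was used.

```python
import numpy as np


def initial_flow(n=64):
    """Sample (u0, v0) on an n x n periodic grid of [0, 1)^2."""
    x = np.arange(n) / n
    X, Y = np.meshgrid(x, x, indexing="ij")  # X varies along axis 0
    u0 = np.sin(2 * np.pi * X) ** 2 * np.sin(4 * np.pi * Y)
    v0 = -np.sin(2 * np.pi * Y) ** 2 * np.sin(4 * np.pi * X)
    return u0, v0


def periodic_divergence(u, v):
    """Central-difference divergence du/dx + dv/dy with periodic wrap."""
    h = 1.0 / u.shape[0]
    du_dx = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2 * h)
    dv_dy = (np.roll(v, -1, axis=1) - np.roll(v, 1, axis=1)) / (2 * h)
    return du_dx + dv_dy


u0, v0 = initial_flow()
max_div = np.max(np.abs(periodic_divergence(u0, v0)))

# Corrupt the field with Gaussian noise, here with sigma = 0.3.
rng = np.random.default_rng(0)
u_noisy = u0 + 0.3 * rng.standard_normal(u0.shape)
v_noisy = v0 + 0.3 * rng.standard_normal(v0.shape)
noisy_div = np.max(np.abs(periodic_divergence(u_noisy, v_noisy)))
```

The noisy field is no longer divergence-free, which is the property that divergence-free wavelets are meant to restore during denoising.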



Fig. 13: 2D+t flow data: (a) Vector field at the 10th time step. (b) Horizontal velocity at the 10th time step. (c) Temporal evolution of the horizontal velocity for a fixed vertical plane.

| σ | Horizontal velocity, 2D | Horizontal velocity, Structured | Vertical velocity, 2D | Vertical velocity, Structured |
|---|---|---|---|---|
| 0.1 | 46.64 | 49.00 | 46.65 | 48.90 |
| 0.3 | 39.65 | 43.83 | 39.51 | 43.53 |
| 0.5 | 35.49 | 39.65 | 35.47 | 39.36 |

Table 3: Quantitative comparison (PSNR) for different methods for horizontal and vertical velocity at different noise levels.

4.4. Mixed models Denoising. Here we consider the model (10). We have a particular interest in models arising from ultrasound imaging [34], in which $h(f) = \sqrt{f}$. Our experiments are performed with respect to this model for σ ∈ {1, 2, 3, 4}. We used spatial isotropic wavelets with Fisz variance stabilization [22, 24] and compared the spatial approach to the spatio-temporal approach. Note that, again, classical 3D wavelets are not applicable to this problem. Visual comparisons for the sequences “Ayiko” and “Heart” are given in Figure 17 and Figure 18. The spatio-temporal approach provided by the structured wavelet construction yields a considerable visual improvement over the spatial approach. In the PSNR evolution in Figure 20, we can again observe the temporal structure linked to the heart motion and the superiority of structured wavelets both locally and globally (see Table 4).
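To see why a variance stabilization step is needed with signal-dependent noise, the following sketch simulates noise whose standard deviation grows like $\sqrt{f}$ and applies a stabilizing transform. Several points are assumptions and should be read as such: model (10) is not restated in this section, so the form y = f + σ√f z (with z standard Gaussian) is assumed here; the square-root transform 2√y is a simple stand-in, not the Fisz stabilization used in the experiments; the intensities 100 and 400 and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0


def corrupt(f, sigma, rng):
    """Assumed form of the mixed model: y = f + sigma * sqrt(f) * z."""
    return f + sigma * np.sqrt(f) * rng.standard_normal(f.shape)


def stabilize(y):
    """Square-root transform, a stand-in for the Fisz stabilization."""
    return 2.0 * np.sqrt(np.maximum(y, 0.0))


dark = corrupt(np.full(100_000, 100.0), sigma, rng)
bright = corrupt(np.full(100_000, 400.0), sigma, rng)

# Before stabilization the noise level grows like sqrt(f).
raw_std = (dark.std(), bright.std())
# After stabilization both regions have a noise level close to sigma.
stab_std = (stabilize(dark).std(), stabilize(bright).std())
```

Once the noise level no longer depends on the intensity, a single threshold can be applied to all wavelet coefficients, which is the situation covered by the Gaussian model.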

5. Discussion & Perspectives. In this section, we discuss the positioning of our work with respect to existing works on structured group-sparsity. We also mention some directions for future work on the wavelet construction presented in this paper.

5.1. Some related work on structured group-sparsity. First, we note that a construction similar to the one presented in Section 3.1 was proposed in [20] for compressed sensing, also motivated by video acquisition and hyperspectral imaging. There, the purpose was to study the efficiency of considering Kronecker products of different bases for multivariate signals with different regularities. In particular, the authors studied the sparsifying properties of these bases and the conditions under which they can be used in compressed sensing theory. Structured sparsity has proven to be useful in denoising. Existing structures consider group-wise sparsity as in Block Thresholding [13], hierarchical sparsity as in Tree Thresholding [8], or combinations of the two [7]. Imposing sparsity in a variational framework can be done by minimizing an $\ell_1$-regularized least squares functional. This is known as the Lasso problem [46]. Structuring the regularization term in groups is known as the group-lasso [36], which has further extensions such as the Fused Lasso [47] and the Group Fused Lasso [3]. All these paradigms consider structures directly on the sparse representation (i.e. after transformation). In our work, we aimed at showing the merit of considering the group selection on variables before transformation. As mentioned before, many works in the statistical literature have demonstrated the advantages of using hyperbolic wavelets in multivariate analysis. Their ability to overcome the curse of dimensionality was first pointed out in [37], with recent contributions in [5]. In all these works, anisotropy was considered on single variables and the authors aimed at bringing the error rate down to that of dimension one.

Fig. 14: Example of results obtained by the two methods for σ = 0.3 at the 10th time step. (a) Noisy flow, (b) 2D-Wavelets reconstruction of the flow, (c) Structured Wavelets reconstruction of the flow, (d) Noisy horizontal velocity, (e) 2D-Wavelets reconstruction of the horizontal velocity, (f) Structured Wavelets reconstruction of the horizontal velocity.

| σ | “Ayiko” 2D | “Ayiko” Structured | “Heart” 2D | “Heart” Structured |
|---|---|---|---|---|
| 1 | 28.82 | 29.66 | 32.93 | 34.31 |
| 2 | 27.13 | 29.26 | 30.44 | 33.28 |
| 3 | 25.64 | 28.82 | 28.48 | 32.27 |
| 4 | 24.42 | 28.39 | 26.90 | 31.34 |

Table 4: Quantitative comparison (PSNR) for different methods applied to “Ayiko” and “Heart” at different noise levels.

Fig. 15: Example of results obtained by the two methods for σ = 0.3 at the 10th time step. (a) Noisy flow, (b) Noisy horizontal velocity, (c) 2D-Wavelets reconstruction of the flow, (d) 2D-Wavelets reconstruction of the horizontal velocity, (e) Structured Wavelets reconstruction of the flow, (f) Structured Wavelets reconstruction of the horizontal velocity.

Fig. 16: PSNR (dB) evolution for the two methods. Panels: horizontal velocity, vertical velocity; curves: Noisy, 2D-Wavelets, Structured Wavelets.

Fig. 17: Example of results obtained by the two methods for the 20th image of the sequence “Ayiko” with σ = 2. Panels: Original, Noisy (σ = 2), 2D-Wavelets, Structured Wavelets.
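The Kronecker-product viewpoint mentioned above can be illustrated on a small two-variable signal. The sketch below builds two different orthonormal bases, one per variable, and checks that transforming each variable in turn gives the same coefficients as applying the Kronecker product of the two bases to the flattened signal. The choice of a Haar basis and a DCT basis, the sizes 8 and 6, and the helper names are assumptions made for illustration; they are not the bases used in this paper or in [20].

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal Haar analysis matrix of size n x n (n a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        averages = np.kron(h, [1.0, 1.0])
        details = np.kron(np.eye(m), [1.0, -1.0])
        h = np.vstack([averages, details]) / np.sqrt(2.0)
    return h


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c


A = haar_matrix(8)   # basis acting on the first variable
B = dct_matrix(6)    # a different basis acting on the second variable
K = np.kron(A, B)    # Kronecker-product basis for the 8 x 6 signal

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))
coeffs_separable = A @ X @ B.T               # transform each variable in turn
coeffs_kron = (K @ X.ravel()).reshape(8, 6)  # same transform, one matrix
```

Because both factors are orthonormal, their Kronecker product is orthonormal as well, so thresholding in this basis keeps the usual energy-preservation property while allowing each variable to use a basis suited to its own regularity.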

5.2. Methodological extensions. The minimax results obtained in this paper were derived assuming the same regularity parameter along the different groups of variables. As mentioned by [37], it would also be interesting to consider different regularity parameters. This is, actually, more appealing when considering problems with different regularities, such as the experiments presented in Section 4. One can predict that the hardest regularity (smallest parameter) appears in the rate of convergence. Still, this is an open question; in particular, it is not known how this parameter will be combined with the dimension of the largest group. For Functional Data Analysis within the functional analysis of variance (FANOVA) framework [2, 1], we believe that the structured wavelet construction might give new ways of gathering variables presenting similar behaviour, especially in very high dimensions. Finally, similarly to considering divergence-free wavelets on some variables, it is also possible to consider operator-like wavelets [48, 33]. This type of wavelet can be used, for instance, to invert convolution operators for joint deconvolution/denoising [32]. The structured construction can deal with problems in which the data is blurred only on some of the variables.

Fig. 18: Example of results obtained by the two methods for the 20th image of the sequence “Heart” with σ = 2. Panels: Original, Noisy (σ = 2), 2D-Wavelets, Structured Wavelets.

Fig. 19: Method noise: various methods applied to the 20th image of the sequence “Heart”. Panels: Original, 2D-Wavelets, Structured Wavelets. Quantitative evaluation is given in Table 1.

Fig. 20: PSNR evolution for different methods applied to the sequences “Ayiko” and “Heart”. Axes: image index versus PSNR (dB); one panel per sequence.

6. Appendix.

6.1. Preliminary results on structured wavelets and function spaces. In what follows, we shall need a structured wavelet characterization of the functional spaces $F^{R}_{N}$. We have the following result, which extends Theorem 3.1 of [43].

Theorem 6.1. Assume that the structured wavelet basis is sufficiently regular. Then $f \in F^{R}_{N,d}$ iff

$$\|f\|^{2}_{\mathrm{hyp},F^{R}_{N,d}} = \sum_{i}\sum_{j}\Bigl(1 + 2^{2\sum_{\ell} j_\ell R_\ell}\Bigr)\sum_{k\in\mathbb{Z}^{N}} |\beta_I|^{2} < \infty .$$

In addition, the two norms $\|\cdot\|_{F^{R}_{N,d}}$ and $\|\cdot\|_{\mathrm{hyp},F^{R}_{N,d}}$ are equivalent.

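These functional spaces can be related in a rather classical way to their weak counterpart. More precisely, let us define the following weak space:

$$W_{N,d}(r) = \Bigl\{ f \in L^{2},\ \sup_{\lambda>0} \lambda^{r-2} \sum_{I} |\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|<\lambda} < \infty \Bigr\} .$$

As in Lemma 2.2 of [31], we can prove that we have an alternative definition of these spaces:

Proposition 6.2. Let $r \in (0, 2)$. Then

$$W_{N,d}(r) = \Bigl\{ f \in L^{2},\ \sup_{\lambda>0} \lambda^{r} \sum_{I} \mathbf{1}_{|\beta_I(f)|>\lambda} < \infty \Bigr\} .$$

Proof. Let us first prove that if $f \in W_{N,d}(r)$ then $\sup_{\lambda>0} \lambda^{r}\sum_{I}\mathbf{1}_{|\beta_I(f)|>\lambda} < \infty$. Indeed,

$$\sum_{I}\mathbf{1}_{|\beta_I(f)|>\lambda} = \sum_{I}\sum_{\ell\ge 0}\mathbf{1}_{2^{\ell}\lambda\le|\beta_I(f)|<2^{\ell+1}\lambda} \le \sum_{\ell\ge 0}\sum_{I}(2^{\ell}\lambda)^{-2}\,|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|<2^{\ell+1}\lambda} .$$

We now use the assumption that $f \in W_{N,d}(r)$, which implies (up to a multiplicative constant) that

$$\sum_{I}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|<2^{\ell+1}\lambda} \le (2^{\ell+1}\lambda)^{2-r} .$$

Hence

$$\sum_{I}\mathbf{1}_{|\beta_I(f)|>\lambda} \le \sum_{\ell\ge 0}(2^{\ell}\lambda)^{-2}(2^{\ell+1}\lambda)^{2-r} \le 2^{2-r}\lambda^{-r}\sum_{\ell\ge 0}2^{-\ell r} .$$

Since $r > 0$, the last sum converges and we get the first inclusion.

Conversely, let us assume that $\sup_{\lambda>0}\lambda^{r}\sum_{I}\mathbf{1}_{|\beta_I(f)|>\lambda} < \infty$ and let us prove that $f \in W_{N,d}(r)$. Indeed,

$$\sum_{I}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|<\lambda} = \sum_{I}\sum_{\ell\ge 0}|\beta_I(f)|^{2}\,\mathbf{1}_{2^{-\ell-1}\lambda<|\beta_I(f)|<2^{-\ell}\lambda} \le \sum_{\ell\ge 0}\sum_{I}(2^{-\ell}\lambda)^{2}\,\mathbf{1}_{2^{-\ell-1}\lambda<|\beta_I(f)|<2^{-\ell}\lambda} .$$

We now use the assumption $\sup_{\lambda>0}\lambda^{r}\sum_{I}\mathbf{1}_{|\beta_I(f)|>\lambda} < \infty$ and deduce (again up to a multiplicative constant) that

$$\sum_{I}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|<\lambda} \le \sum_{\ell\ge 0}(2^{-\ell}\lambda)^{2}(2^{-\ell-1}\lambda)^{-r} \le 2^{r}\lambda^{2-r}\sum_{\ell\ge 0}2^{-\ell(2-r)} .$$

Since $2 - r > 0$, the last sum converges and we can conclude.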
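The equivalence of the two definitions can be illustrated numerically on a simple sequence of coefficients. The sketch below uses the hypothetical coefficients $\beta_n = n^{-1/r}$ with $r = 1$, for which both the energy form of the definition and the counting form of Proposition 6.2 remain bounded as $\lambda$ decreases. The sequence, the value of $r$, the truncation at $10^5$ coefficients and the grid of thresholds are assumptions chosen only for illustration.

```python
import numpy as np

r = 1.0
n = np.arange(1, 100_001)
beta = n ** (-1.0 / r)  # hypothetical coefficients, sorted by decreasing size

lambdas = np.logspace(-3, 0, 20)

# Definition of W(r): lambda^(r-2) * sum of beta^2 over |beta| < lambda.
energy_form = np.array(
    [lam ** (r - 2) * np.sum(beta[beta < lam] ** 2) for lam in lambdas]
)

# Proposition 6.2: lambda^r * number of coefficients with |beta| > lambda.
count_form = np.array(
    [lam ** r * np.count_nonzero(beta > lam) for lam in lambdas]
)
```

Both arrays stay of order one over three decades of $\lambda$, which is the behaviour that the two suprema in the proposition encode.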

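The following proposition gives some details about the embeddings between the spaces $F^{R}_{N,d}$ and $W_{N,d}(r)$:

Proposition 6.3. Assume that $R_1 = \cdots = R_N = R$. Set $r = 2d_{\max}/(d_{\max}+2R)$. Then

$$F^{R}_{N,d} \subset W_{N,d,\log^{N-1}}(r) .$$

Proof. Let $f \in F^{R}_{N,d}$. Define $J_\lambda$ such that $2^{-J_\lambda} \le \lambda^{2/(2R+d_{\max})} \le 2^{-J_\lambda+1}$. Observe that

$$
\begin{aligned}
\sum_{I}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|\le\lambda}
&\le \sum_{|j|\le J_\lambda}\sum_{k}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|\le\lambda} + \sum_{|j|>J_\lambda}\sum_{k}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|\le\lambda} \\
&\le \lambda^{2}\sum_{|j|\le J_\lambda}\sum_{k}1 + \sum_{\ell>J_\lambda}\sum_{|j|=\ell}2^{-2R|j|} \\
&\le \lambda^{2}\sum_{|j|\le J_\lambda}2^{j_1 d_1+\cdots+j_N d_N} + \sum_{\ell>J_\lambda}2^{-2R\ell}\,\ell^{N-1} \\
&\le \lambda^{2}\sum_{\ell\le J_\lambda}\sum_{|j|=\ell}2^{|j| d_{\max}} + \sum_{\ell>J_\lambda}2^{-2R\ell}\,\ell^{N-1} \\
&\le \lambda^{2}\,2^{J_\lambda d_{\max}}J_\lambda^{N-1} + 2^{-2RJ_\lambda}J_\lambda^{N-1},
\end{aligned}
$$

where in the two last inequalities we used the assumption $f \in F^{R}_{N,d}$ and Theorem 6.1. The conclusion comes from the definition of the index $J_\lambda$ and from that of the space $W_{N,d}(r)$ for $r = 2d_{\max}/(d_{\max}+2R)$.

We can now use all these results to deduce the minimax results stated in Section 3.2.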
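The last step of the proof of Proposition 6.3 rests on a short exponent computation, written out here for convenience. It only uses the definition of $J_\lambda$ and of $r$ given in the proposition; the symbol $\asymp$ (equality up to multiplicative constants) and the generic constant $C$ are notational assumptions.

```latex
\begin{align*}
2^{-J_\lambda} \asymp \lambda^{2/(2R+d_{\max})}
  \;&\Longrightarrow\;
  \lambda^{2}\,2^{J_\lambda d_{\max}}
  \asymp \lambda^{\,2-\frac{2d_{\max}}{2R+d_{\max}}}
  = \lambda^{\frac{4R}{2R+d_{\max}}},
  \qquad
  2^{-2RJ_\lambda} \asymp \lambda^{\frac{4R}{2R+d_{\max}}}, \\
2-r &= 2-\frac{2d_{\max}}{d_{\max}+2R} = \frac{4R}{2R+d_{\max}}, \\
\sum_{I}|\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f)|\le\lambda}
  &\le C\,\lambda^{2-r}\,J_\lambda^{N-1},
  \qquad J_\lambda \asymp \log(\lambda^{-1}).
\end{align*}
```

Both terms of the final bound therefore have the same order, and the factor $J_\lambda^{N-1}$ is what produces the logarithmic correction in the index of the weak space.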



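6.2. Proof of upper bound. We fix some function $f \in F^{R}_{N,d}$ and bound as usual the quadratic risk $E\|\hat f_\varepsilon - f\|^{2}_{L^2}$:

$$
\begin{aligned}
E\|\hat f_\varepsilon - f\|^{2}_{L^2} \le{}& \sum_{i}\sum_{j_1+\cdots+j_N>J_\varepsilon}\sum_{k}[\beta_I(f)]^{2} \\
&+ \sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f_\varepsilon)|\le\lambda_\varepsilon}\bigr] \\
&+ \sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}E\bigl[|\beta_I(f_\varepsilon)-\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f_\varepsilon)|>\lambda_\varepsilon}\bigr] .
\end{aligned}
$$

We first bound the sum $\sum_{i}\sum_{j_1+\cdots+j_N>J_\varepsilon}\sum_{k}[\beta_I(f)]^{2}$ using the assumption $f \in F^{R}_{N,d}$ and Theorem 6.1. Since $R_1 = \cdots = R_N = R$ and $j_1+\cdots+j_N > J_\varepsilon$, one deduces that

$$\sum_{i}\sum_{j_1+\cdots+j_N>J_\varepsilon}\sum_{k}[\beta_I(f)]^{2} \le C\,2^{-2RJ_\varepsilon}J_\varepsilon^{N-1} .$$

We now bound the sum $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}E[\mathbf{1}_{|\beta_I(f_\varepsilon)|\le\lambda_\varepsilon}]$. Observe that if $|\beta_I(f_\varepsilon)| \le \lambda_\varepsilon$, then either $|\beta_I(f)| \le 2\lambda_\varepsilon$ or $|\beta_I(f_\varepsilon)-\beta_I(f)| > \lambda_\varepsilon$. It implies that

$$
\begin{aligned}
\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f_\varepsilon)|\le\lambda_\varepsilon}\bigr]
\le{}& \sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f)|\le 2\lambda_\varepsilon}\bigr] \\
&+ \sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f_\varepsilon)-\beta_I(f)|>\lambda_\varepsilon}\bigr] .
\end{aligned}
$$

The sum $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}E[\mathbf{1}_{|\beta_I(f)|\le 2\lambda_\varepsilon}]$ is deterministic and equals

$$\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,\mathbf{1}_{|\beta_I(f)|\le 2\lambda_\varepsilon} .$$

It can then be bounded using the embedding proved in Proposition 6.3 and the definition of the spaces $W_{N,d}(r)$.

The bound of the sum $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}E[\mathbf{1}_{|\beta_I(f_\varepsilon)-\beta_I(f)|>\lambda_\varepsilon}]$ comes from the Gaussian assumption on the noise and the classical concentration inequality $P(|Z| > \lambda) \le Ce^{-\lambda^{2}/2}$, valid for any standard Gaussian random variable $Z$, which implies that

$$
\begin{aligned}
\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f_\varepsilon)-\beta_I(f)|>\lambda_\varepsilon}\bigr]
&= \sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,P\bigl[|(\beta_I(f_\varepsilon)-\beta_I(f))/\varepsilon| > \lambda_\varepsilon/\varepsilon\bigr] \\
&\le C\varepsilon^{m^{2}/2}\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2} \le C\varepsilon^{m^{2}/2} ,
\end{aligned}
$$

where the constant $C$ may change from one inequality to the next. We then deduce that

$$\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}\,E\bigl[\mathbf{1}_{|\beta_I(f_\varepsilon)|\le\lambda_\varepsilon}\bigr] \le C\varepsilon^{m^{2}/2} + \lambda_\varepsilon^{4R/(2R+d_{\max})} \le C\lambda_\varepsilon^{4R/(2R+d_{\max})}$$

as soon as $m > 2\sqrt{2}$.

Finally, the bound of the sum $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}E[|\beta_I(f_\varepsilon)-\beta_I(f)|^{2}\mathbf{1}_{|\beta_I(f_\varepsilon)|>\lambda_\varepsilon}]$ follows from Propositions 6.2 and 6.3. Indeed, by the Cauchy-Schwarz inequality and as in the bound of the last sum, one has

$$
\begin{aligned}
\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}E\bigl[|\beta_I(f_\varepsilon)-\beta_I(f)|^{2}\,\mathbf{1}_{|\beta_I(f_\varepsilon)|>\lambda_\varepsilon}\bigr]
&\le \varepsilon^{2}\sum\mathbf{1}_{|\beta_I(f)|>\lambda_\varepsilon} + 2^{J_\varepsilon/2}\varepsilon^{2}\sum P^{1/2}\bigl[|\beta_I(f_\varepsilon)-\beta_I(f)| > \lambda_\varepsilon/2\bigr] \\
&\le C\lambda_\varepsilon^{4R/(2R+d_{\max})}\,|\log(\lambda_\varepsilon)|^{N-1} + 2^{J_\varepsilon/2}\varepsilon^{m^{2}/16} \\
&\le C\lambda_\varepsilon^{4R/(2R+d_{\max})}\,|\log(\lambda_\varepsilon)|^{N-1}
\end{aligned}
$$

as soon as $m > 8$. The last display is deduced from Proposition 6.3 and the classical concentration inequality for standard Gaussian random variables. Gathering the bounds of the three sums $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}E[\mathbf{1}_{|\beta_I(f)|\le 2\lambda_\varepsilon}]$, $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}\beta_I(f)^{2}E[\mathbf{1}_{|\beta_I(f_\varepsilon)-\beta_I(f)|>\lambda_\varepsilon}]$ and $\sum_{i}\sum_{j_1+\cdots+j_N\le J_\varepsilon}\sum_{k}E[|\beta_I(f_\varepsilon)-\beta_I(f)|^{2}\mathbf{1}_{|\beta_I(f_\varepsilon)|>\lambda_\varepsilon}]$, we get the upper bound stated in Theorem 3.1.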
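The second and third terms of the risk decomposition used in this proof have a direct pathwise counterpart, which the following sketch makes concrete on simulated coefficients: the squared error of a hard-thresholding estimator splits exactly into the energy of the coefficients that were set to zero and the noise carried by the coefficients that were kept. The sparse coefficient vector, the noise level and the universal threshold $\varepsilon\sqrt{2\log n}$ of [19] are assumptions made for illustration; the threshold $\lambda_\varepsilon$ of the paper is defined outside this section and is not reproduced here. The first term of the decomposition (scales beyond $J_\varepsilon$) is not simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 4096, 0.05

beta = np.zeros(n)
beta[:32] = 1.0                                  # hypothetical sparse coefficients
observed = beta + eps * rng.standard_normal(n)   # noisy coefficients

lam = eps * np.sqrt(2.0 * np.log(n))             # universal threshold (assumption)
keep = np.abs(observed) > lam
estimate = np.where(keep, observed, 0.0)         # hard thresholding

risk = np.sum((estimate - beta) ** 2)
# Energy of the true coefficients whose observation fell below the threshold.
killed_term = np.sum(beta[~keep] ** 2)
# Noise carried by the coefficients whose observation passed the threshold.
kept_term = np.sum((observed[keep] - beta[keep]) ** 2)
# Error obtained when no thresholding is applied at all.
raw_error = np.sum((observed - beta) ** 2)
```

The proof bounds the expectation of each of these two contributions separately, using the weak-space embedding for the first and Gaussian concentration for the second.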

6.3. Proof of lower bound. The derivation of a lower bound for wavelet function estimation on $F^{R}_{N,d}(K)$ relies on a classical procedure for constructing minimax lower bounds in non-parametric estimation, known as Assouad's method. We give here a version of this lemma for functional spaces and quadratic $L^2$ distances (see Lemma 2 in [49] and Lemma 10.2 in [28]).

Lemma 6.4 (Assouad's lemma). Let $V$ be a functional space containing a set of functions $\{g_\tau\}_\tau$ with $\tau \in \{0, 1\}^{m}$. For each pair $\tau$ and $\tau'$, we write $\tau \sim \tau'$ if $\tau$ and $\tau'$ differ in only one coordinate, and $\tau \sim_k \tau'$ if it is the $k$th coordinate. Assume that for any $k$

$$\inf_{\tau\sim_k\tau'} E\|g_\tau - g_{\tau'}\|^{2}_{L^2} \ge \delta .$$

Then any estimator $\hat f$ of a function $f \in V$ verifies

$$\max_{g_\tau} E\|\hat f - g_\tau\|^{2}_{L^2} \ge \frac{m\delta}{2}\min_{\tau\sim\tau'}\{\Lambda(P_\tau, P_{\tau'})\},$$

where $P_\tau$ is the probability measure associated to $g_\tau$ and the affinity $\Lambda$ is given by

$$\Lambda(P_\tau, P_{\tau'}) = 1 - \frac{|P_\tau - P_{\tau'}|_1}{2} .$$

In order to use this lemma for constructing the lower bound, we first need to reduce the problem to a parametric family of the form $\{g_\tau\}_\tau$. The value of $m$ depends on a fixed scale which calibrates the risk in order to have an optimal convergence. One expects that such a scale defines the limit between the finest scales, for which the coefficients are small and so dominated by the smoothness (i.e. encode noise on some coefficients), and the coarse scales. As coarse scales do not contribute to the risk, the error is driven by the error made on the hardest of the fine scales, which we denote $J_\varepsilon$. We are given $C_0 > 0$ not depending on $\varepsilon > 0$ and $J_\varepsilon$. Thereafter, we define the following family $\{g_\tau\}_\tau$ depending on $\varepsilon$ and $C_0$. For any $\tau = (\tau_I) \in \{0, 1\}^{m}$ with $m = \#\{I \text{ s.t. } |j| = J_\varepsilon\}$, we set

$$g_\tau = \sum_{i}\sum_{j,\ j_1+\cdots+j_N=J_\varepsilon}\beta_{I,\tau}\,\psi_I \quad\text{with}\quad \beta_{I,\tau} = \begin{cases} 0 & \text{if } \tau_I = 0, \\ C_0\varepsilon & \text{otherwise.} \end{cases}$$

We first check that this family belongs to the class $V = F^{R}_{N,d}(K)$:

Lemma 6.5. Assume that $\varepsilon^{2}\,2^{J_\varepsilon d_{\max}}J_\varepsilon \asymp 2^{-2J_\varepsilon R}$, with $d_{\max} = \max(d_i)$. Then we have

$$\{g_\tau\}_\tau \subseteq F^{R}_{N,d}(K) .$$

Proof. It directly comes from the definition of the class $F^{R}_{N,d}(K)$ and from Theorem 6.1.

All that remains, now, is to apply Lemma 6.4 to the class $F^{R}_{N,d}(K)$ and the parametric family $\{g_\tau\}_\tau$. First, note that for the family $\{g_\tau\}_\tau$, we have

$$\delta = C_0^{2}\varepsilon^{2} .$$

Note also that the hypercube dimension $m$ is given by the cardinality of the coefficients at the scale $J_\varepsilon$:

$$m = \#\{I \text{ s.t. } |j| = J_\varepsilon\} = J_\varepsilon^{N-1}\,2^{\sum_{i=1}^{N} d_i j_i} . \tag{11}$$

Hence,

$$
\begin{aligned}
\max_{g_\tau} E\|\hat f - g_\tau\|^{2}_{L^2} &\ge C_0^{2}\varepsilon^{2}J_\varepsilon^{N-1}\,2^{\sum_{i=1}^{N} d_i j_i}\min_{\tau\sim\tau'}\{\Lambda(P_\tau, P_{\tau'})\} \\
&\ge C_0^{2}\varepsilon^{2}J_\varepsilon^{N-1}\,2^{J_\varepsilon d_{\max}}\min_{\tau\sim\tau'}\{\Lambda(P_\tau, P_{\tau'})\} .
\end{aligned}
$$

As usual, the calibration $\varepsilon^{2}\,2^{J_\varepsilon d_{\max}}J_\varepsilon \asymp 2^{-2J_\varepsilon R}$ imposes that $J_\varepsilon$ is of the same order as $\log(\varepsilon^{-1})$, which yields

$$2^{J_\varepsilon} = \bigl(\varepsilon^{2}[\log(\varepsilon^{-1})]^{N-1}\bigr)^{-1/(2R+d_{\max})} . \tag{12}$$

Thus

$$\max_{g_\tau} E\|\hat f - g_\tau\|^{2}_{L^2} \ge C_0^{2}\varepsilon^{2}[\log(\varepsilon^{-1})]^{N-1}\bigl(\varepsilon^{2}[\log(\varepsilon^{-1})]^{N-1}\bigr)^{-d_{\max}/(2R+d_{\max})}\min_{\tau\sim\tau'}\{\Lambda(P_\tau, P_{\tau'})\} .$$

Finally, by definition the affinity $\Lambda$ takes only positive values, which gives

$$\max_{g_\tau} E\|\hat f - g_\tau\|^{2}_{L^2} \ge C\bigl(\varepsilon^{2}[\log(\varepsilon^{-1})]^{N-1}\bigr)^{2R/(2R+d_{\max})},$$

with $C = C_0^{2}\min_{\tau\sim\tau'}\{\Lambda(P_\tau, P_{\tau'})\}$, which ends the proof.
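The factor $J_\varepsilon^{N-1}$ in the hypercube dimension (11) comes from counting the scale multi-indices on the level $j_1+\cdots+j_N = J_\varepsilon$. A small sketch can confirm the order of this count, assuming that the scale indices $j_\ell$ range over the non-negative integers (the exact index range is fixed outside this section, so this is an assumption). The function name and the values of $J$ and $N$ are hypothetical.

```python
from itertools import product
from math import comb, factorial


def count_scale_indices(J, N):
    """Count multi-indices (j_1, ..., j_N), j_l >= 0, with j_1 + ... + j_N = J."""
    return sum(1 for j in product(range(J + 1), repeat=N) if sum(j) == J)


N = 3
brute = count_scale_indices(6, N)
closed = comb(6 + N - 1, N - 1)  # stars-and-bars formula

# Ratio of the exact count to J^(N-1) / (N-1)! for a large level J.
J = 200
ratio = comb(J + N - 1, N - 1) / (J ** (N - 1) / factorial(N - 1))
```

The ratio tends to one as $J$ grows, so the number of multi-indices on a level is of order $J^{N-1}$, which is the source of the logarithmic factor $[\log(\varepsilon^{-1})]^{N-1}$ in the rate.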

REFERENCES

[1] F. Abramovich and C. Angelini, Testing in mixed-effects fanova models, Journal of statistical planning and inference, 136 (2006), pp. 4326–4348.

[2] F. Abramovich, A. Antoniadis, T. Sapatinas, and B. Vidakovic, Optimal testing in a fixed-effects functional analysis of variance model, International Journal of Wavelets, Multiresolution and Information Processing, 2 (2004), pp. 323–349.

[3] C. M. Alaíz, A. Barbero, and J. R. Dorronsoro, Group fused lasso, in Artificial Neural Networks and Machine Learning–ICANN 2013, Springer, 2013, pp. 66–73.

[4] A. Antoniadis, J. Bigot, and T. Sapatinas, Wavelet estimators in nonparametric regression: a comparative simulation study, Journal of Statistical Software, 6 (2001), pp. pp–1.

[5] F. Autin, G. Claeskens, and J. Freyermuth, Hyperbolic wavelet thresholding methods and the curse of dimensionality through the maxiset approach, Appl. Comput. Harmon. Anal., 36 (2014), pp. 239–255.

[6] F. Autin, G. Claeskens, and J. Freyermuth, Asymptotic performance of projection estimators in standard and hyperbolic wavelet bases, To appear in Electronic journal of statistics, (2015).

[7] F. Autin, J.-M. Freyermuth, and R. von Sachs, Combining thresholding rules: a new way to improve the performance of wavelet estimators, Journal of Nonparametric Statistics, 24 (2012), pp. 905–922.

[8] R. G. Baraniuk, Optimal tree approximation with wavelets, in SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, International Society for Optics and Photonics, 1999, pp. 196–207.

[9] J. Bigot and T. Sapatinas, Nonparametric adaptive time-dependent multivariate function estimation, (2012).

[10] E. Bostan, M. Unser, and J. P. Ward, Divergence-free wavelet frames, Signal Processing Letters, IEEE, 22 (2015), pp. 1142–1146.

[11] E. Bostan, O. Vardoulis, D. Piccini, P. D. Tafti, N. Stergiopulos, and M. Unser, Spatio-temporal regularization of flow-fields, in Biomedical Imaging (ISBI), 2013 IEEE 10th International Symposium on, IEEE, 2013, pp. 836–839.

[12] A. Buades, B. Coll, and J.-M. Morel, A non-local algorithm for image denoising, in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, 2005, pp. 60–65.

[13] T. T. Cai, On block thresholding in wavelet regression: Adaptivity, block size, and threshold level, Statistica Sinica, (2002), pp. 1241–1273.

[14] C.-I. Chang, Hyperspectral imaging: techniques for spectral detection and classification, vol. 1, Springer Science & Business Media, 2003.

[15] P. Clarysse, J. Tafazzoli, P. Delachartre, and P. Croisille, Simulation based evaluation of cardiac motion estimation methods in tagged-mr image sequences, Journal of Cardiovascular Magnetic Resonance, 13 (2011), p. P360, https://doi.org/10.1186/1532-429X-13-S1-P360, http://jcmr-online.com/content/13/S1/P360.

[16] I. Daubechies, Ten lectures on wavelets, vol. 61, SIAM, 1992.

[17] R. DeVore, S. Konyagin, and V. Temlyakov, Hyperbolic wavelet approximation, Constructive Approximation, 14 (1998), pp. 1–26.

[18] R. DeVore, G. Petrova, and P. Wojtaszczyk, Anisotropic smoothness spaces via level sets, Communications on Pure and Applied Mathematics, 61 (2008), pp. 1264–1297.

[19] D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81 (1994), pp. 425–455.

[20] M. F. Duarte and R. G. Baraniuk, Image Processing, IEEE Transactions on, 21 (2012), pp. 494–504.

[21] J. M. Fadili, J. Mathieu, B. Romaniuk, and M. Desvignes, Bayesian wavelet-based poisson intensity estimation of images using the fisz transformation, in International conference on image and signal processing, vol. 1, 2003, pp. 242–253.

[22] J. M. Fadili, J. Mathieu, B. Romaniuk, and M. Desvignes, Bayesian wavelet-based poisson intensity estimation of images using the fisz transformation, in Proc. International Conference on Image and Signal Processing, Agadir, Morocco, 2003, pp. 242–253.

[23] Y. Farouj, J.-M. Freyermuth, L. Navarro, M. Clausel, and P. Delachartre, IEEE Transactions on Computational Imaging, 3 (2017), pp. 1–10.

[24] P. Fryzlewicz, Data-driven wavelet-fisz methodology for nonparametric function estimation, Electronic journal of statistics, 2 (2008), pp. 863–896.

[25] P. Fryzlewicz and G. P. Nason, A haar-fisz algorithm for poisson intensity estimation, Journal of computational and graphical statistics, 13 (2004), pp. 621–638.

[26] D. Garcia, J. C. del Alamo, D. Tanne, R. Yotti, C. Cortina, E. Bertrand, J. C. Antoranz, E. Perez-David, R. Rieu, F. Fernandez-Aviles, et al., Two-dimensional intraventricular flow mapping by digital processing conventional color-doppler echocardiography images, Medical Imaging, IEEE Transactions on, 29 (2010), pp. 1701–1713.

[27] K. Guo, D. Labate, W.-Q. Lim, G. Weiss, and E. Wilson, Wavelets with composite dilations, Electronic research announcements of the American Mathematical Society, 10 (2004), pp. 78–87.

[28] W. Härdle, G. Kerkyacharian, D. Picard, and A. Tsybakov, Wavelets, approximation, and statistical applications, vol. 129, Springer Science & Business Media, 2012.

[29] B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, 17 (1981), pp. 185–203.

[30] S. Kadri Harouna and V. Perrier, Divergence-free wavelet projection method for incompressible viscous flow on the square, Multiscale Modeling & Simulation, 13 (2015), pp. 399–422.

[31] G. Kerkyacharian and D. Picard, Thresholding algorithms, maxisets and well-concentrated bases, Test, 9 (2000), pp. 283–344.

[32] I. Khalidov, J. Fadili, F. Lazeyras, D. Van De Ville, and M. Unser, Activelets: Wavelets for sparse representation of hemodynamic responses, Signal Processing, 91 (2011), pp. 2810–2821.

[33] I. Khalidov, D. Van De Ville, T. Blu, and M. Unser, Construction of wavelet bases that mimic the behaviour of some given operator, in Optical Engineering+ Applications, International Society for Optics and Photonics, 2007, pp. 67010S–67010S.

[34] T. Loupas, W. McDicken, and P. Allan, An adaptive weighted median filter for speckle suppression in medical ultrasonic images, Circuits and Systems, 36 (1989), pp. 129–135.

[35] M. Markl, F. P. Chan, M. T. Alley, K. L. Wedding, M. T. Draney, C. J. Elkins, D. W. Parker, R. Wicker, C. A. Taylor, R. J. Herfkens, et al., Time-resolved three-dimensional phase-contrast mri, Journal of Magnetic Resonance Imaging, 17 (2003), pp. 499–506.

[36] L. Meier, S. Van De Geer, and P. Bühlmann, The group lasso for logistic regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70 (2008), pp. 53–71.

[37] M. H. Neumann, Multivariate wavelet thresholding in anisotropic function spaces, Statistica Sinica, 10 (2000), pp. 399–431.

[38] M. H. Neumann and R. Von Sachs, Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra, The Annals of Statistics, 25 (1997), pp. 38–76.

[39] R. D. Nowak and R. G. Baraniuk, Wavelet-domain filtering for photon imaging systems, Image Processing, IEEE Transactions on, 8 (1999), pp. 666–678.

[40] A. Quarteroni, A. Veneziani, and P. Zunino, Mathematical and numerical modeling of solute dynamics in blood flow and arterial walls, SIAM Journal on Numerical Analysis, 39 (2002), pp. 1488–1511.

[41] N. Remenyi, O. Nicolis, G. Nason, and B. Vidakovic, Image denoising with 2d scale-mixing complex wavelet transforms, Image Processing, IEEE Transactions on, 23 (2014), pp. 5165–5174.

[42] H.-J. Schmeisser, Recent developments in the theory of function spaces with dominating mixed smoothness, Nonlinear Analysis, Function Spaces and Applications, (2007), pp. 145–204.

[43] H.-J. Schmeisser and W. Sickel, Spaces of functions of mixed smoothness and approximation from hyperbolic crosses, Journal of Approximation Theory, 128 (2004), pp. 115–150, https://doi.org/10.1016/j.jat.2004.04.007, http://www.sciencedirect.com/science/article/pii/S0021904504000693.

[44] I. W. Selesnick and K. Y. Li, Video denoising using 2d and 3d dual-tree complex wavelet transforms, in Optical Science and Technology, SPIE’s 48th Annual Meeting, International Society for Optics and Photonics, 2003, pp. 607–618.

[45] G. A. Shaw and H.-h. K. Burke, Spectral imaging for remote sensing, Lincoln Laboratory Journal, 14 (2003), pp. 3–28.

[46] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), (1996), pp. 267–288.

[47] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2005), pp. 91–108.

[48] D. Van De Ville, T. Blu, B. Forster, and M. Unser, Semi-orthogonal wavelets that behave like fractional differentiators, in Optics & Photonics 2005, International Society for Optics and Photonics, 2005, pp. 59140C–59140C.

[49] B. Yu, Assouad, fano, and le cam, in Festschrift for Lucien Le Cam, Springer, 1997, pp. 423–435.

[50] V. Zavadsky, Image approximation by rectangular wavelet transform, Journal of Mathematical Imaging and Vision, 27 (2007), pp. 129–138.


