36
Integrated wavelet processing and spatial statistical testing of fMRI data Dimitri Van De Ville a,* Thierry Blu a Michael Unser a a Biomedical Imaging Group, Swiss Federal Institute of Technology Lausanne (EPFL), CH-1015 Lausanne, Switzerland Abstract We introduce an integrated framework for detecting brain activity from fMRI data, which is based on a 3D spatial discrete wavelet transform. Unlike the standard wavelet-based approach for fMRI analysis, we apply the suitable statistical test procedure in the spatial domain. For a desired significance level, this scheme has one remaining degree of freedom, characterizing the wavelet processing, which is optimized according to the principle of minimal approximation error. This allows us to determine the threshold values in a way that does not depend on data. While developing our framework, we make only conservative assumptions. Consequently the detection of activation is based on strong evidence. We have implemented this framework as a toolbox (WSPM) for the SPM2 software, taking advantage of multi- ple options and functions of SPM such as the setup of the linear model and the use of the hemodynamic response function. We show by experimental results that our method is able to detect activation patterns; the results are comparable to those obtained by SPM even though statistical assumptions are more conservative. Key words: discrete wavelet transform, wavelet thresholding, statistical testing, threshold selection, approximation error Accepted for publication in NeuroImage (YNIMG2708) * Corresponding author. Telephone +41-21-6935142. Fax +41-21-6933701. Email addresses: [email protected] (Dimitri Van De Ville), [email protected] (Thierry Blu), [email protected] (Michael Unser). Preprint submitted to Elsevier Science 26 July 2004

Integrated wavelet processing and spatial statistical

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Integrated wavelet processing and spatial statistical

Integrated wavelet processing and spatial

statistical testing of fMRI data

Dimitri Van De Ville a,∗ Thierry Blu a Michael Unser a

aBiomedical Imaging Group,Swiss Federal Institute of Technology Lausanne (EPFL),

CH-1015 Lausanne, Switzerland

Abstract

We introduce an integrated framework for detecting brain activity from fMRI data,which is based on a 3D spatial discrete wavelet transform. Unlike the standardwavelet-based approach for fMRI analysis, we apply the suitable statistical testprocedure in the spatial domain. For a desired significance level, this scheme hasone remaining degree of freedom, characterizing the wavelet processing, which isoptimized according to the principle of minimal approximation error. This allowsus to determine the threshold values in a way that does not depend on data. Whiledeveloping our framework, we make only conservative assumptions. Consequentlythe detection of activation is based on strong evidence. We have implemented thisframework as a toolbox (WSPM) for the SPM2 software, taking advantage of multi-ple options and functions of SPM such as the setup of the linear model and the useof the hemodynamic response function. We show by experimental results that ourmethod is able to detect activation patterns; the results are comparable to thoseobtained by SPM even though statistical assumptions are more conservative.

Key words: discrete wavelet transform, wavelet thresholding, statistical testing,threshold selection, approximation error

Accepted for publication in NeuroImage (YNIMG2708)

∗ Corresponding author. Telephone +41-21-6935142. Fax +41-21-6933701.Email addresses: [email protected] (Dimitri Van De Ville),

[email protected] (Thierry Blu), [email protected] (Michael Unser).

Preprint submitted to Elsevier Science 26 July 2004

Page 2: Integrated wavelet processing and spatial statistical

1 Introduction

Functional magnetic resonance imaging (fMRI) has become a key modalityto perform non-invasive studies of brain. Its working principle is based oninteraction between neuronal activity and physiology, such as blood oxygena-tion and blood flow. Through a variation of the magnetic field uniformity,these interactions induce a weak and noisy T2*-contrast signal (Ogawa et al.(1993)).

The most widely deployed and recognized method and associated softwarepackage for performing the analysis of fMRI data is Statistical ParametricMapping (SPM) (Friston et al. (1995); Frackowiak et al. (1997)). SPM is aparametric hypothesis-driven approach: it performs a statistical test on thefitted parameters of a linear model (LM) and detects activation at the spa-tial locations where the non-activation hypothesis is rejected. One of SPM’smain characteristics is the application of a fixed Gaussian prefilter. The ra-tionale of the Gaussian prefilter is twofold. First, it improves the signal-to-noise ratio (SNR) as indicated by the matched-filter theorem (Worsley et al.(1996)). Second, it controls the spatial smoothness of the data which is used tocorrect for multiple testing by advanced methods based on continuous Gaus-sian random fields and Euler characteristics (Poline et al. (1997)). However,Gaussian-shaped activation patterns are observed rather rarely in practice.While researchers have also proposed the use of more general filters (Kruggelet al. (1999); Shafie et al. (2003)), the use of fixed Gaussian smoothing is stillthe most-commonly applied method for fMRI analysis.

As an alternative approach, some researchers have advocated the use of aspatial wavelet transform instead of Gaussian prefiltering (Ruttimann et al.(1998); Turkheimer et al. (2000)). This method exploits the sparsity of thedata representation in the wavelet domain to improve the detection sensitiv-ity. In particular, the cluster of voxels that make up an activated region tend tobe spatially correlated and can be efficiently encoded using only a few waveletcoefficients. Such a sparse representation increases the SNR, since the noiseremains evenly distributed in the wavelet domain. Therefore, a coefficient-wisestatistical t-test provides a much higher sensitivity than a voxel-wise approachin the spatial domain, even when multiple testing is compensated by a conser-vative Bonferroni correction. After the detection phase in the wavelet domain,an inverse wavelet transform is applied to reconstruct an activation map fromthe coefficients that are designated as significant. While this reconstructedmap is very useful for visualization purposes, it does not have a direct statis-tical interpretation; i.e., the t-values are computed in the wavelet domain andthere is no straightforward way to map the statistics to the spatial domain.Some proposed solutions include the application of an ad-hoc threshold to thereconstructed map (e.g., a percentage of the maximal signal level); re-testing

2

Page 3: Integrated wavelet processing and spatial statistical

in the spatial domain (without taking into account the effect of the initialtest in the wavelet domain) (Fu et al. (1998)); recursively reconstructing theactivity map by controlling the false discovery rate (Genovese et al. (2002);Bullmore et al. (2003)). Wavelets have also been applied to fMRI analysis asa pure spatial denoising preprocessing step (Hilton et al. (1996); Wink andRoerdink (2004)). Another important issue is to deal with the temporal corre-lation of fMRI data (Woolrich et al. (2001)), for which wavelet-based methodshave also been proposed (Bullmore et al. (2001); Fadili and Bullmore (2001)).The combination of the proposed method with temporal treatments will beconsidered in future work.

The previous spatial wavelet-based approach faces a fundamental problem;i.e., approximation and detection are performed at once in the wavelet domain.Therefore, the detected parameter map problematically remains in the waveletdomain, which complicates the spatial interpretation. In this paper, we proposea new framework that is based on two main principles:

(1) Clear separation of the approximation and detection procedures: The ap-proximation procedure is carried out in the wavelet domain, while thestatistical testing is done in the spatial domain. The test procedure prop-erly takes into account the effect of the approximation step.

(2) Most faithful reconstruction: After statistical testing in the spatial do-main, only a limited number of voxels are detected. Thus, we propose tooptimize the wavelet processing so as to minimize the difference betweenthe detected and unprocessed parameter map. Imposing this constraintallows us to uniquely determine the optimal threshold values that char-acterize the approximation and detection procedures, for a given desiredsignificance level. It turns out that this optimum does not depend on thedata.

The key to the proposed framework is that it is completely integrated: even-though wavelet-domain processing and spatial statistical testing are separate,we take into account their mutual influence. Moreover, the detection proce-dure of the framework rests on conservative assumptions that comply with aworst-case scenario. Therefore, the framework provides a strong type-I errorcontrol; i.e., the rejection of the hypothesis H0 for no activity is evidence foran activated region since false positives are very well controlled.

The new framework is quite general and can be applied to a wide varietyof wavelet-like transforms. In this paper, we restrict ourselves to a proof-of-concept and we demonstrate its usefulness by the way of experiments thatinvolve the well-known B-spline wavelet transform. Despite the conservativeassumptions, we find activation maps comparable to those of SPM. The pro-posed framework has been integrated into SPM2 as a “Wavelet toolbox”,allowing the user to have the usual SPM-based analysis and the joint spatio-

3

Page 4: Integrated wavelet processing and spatial statistical

wavelet analysis side by side.

The paper is organized as follows: In Sect. 2, we briefly review the standardwavelet-based method. Next, in Sect. 3, we introduce the new framework.Experimental results are shown in Sect. 5.

2 Background

We start this section with a brief review of the wavelet transform. We thenpresent the standard wavelet-based method for fMRI analysis, extended withthe linear model (LM) to easily setup experiments and incorporate the effect ofthe hemodynamic response function (HRF) (Friston et al. (1995); Van De Villeet al. (2003); Mueller et al. (2003)). The statistical inference is performed bya coefficient-wise t-test (Turkheimer et al. (2000); Feilner et al. (2000)).

2.1 The discrete wavelet transform

The discrete wavelet transform (DWT) is a popular tool for multi-resolutionanalysis (Mallat (1989, 1998)). Its two-channel filterbank implementation iswell-known and has found its way into a wide range of applications. To providethe best insight, we first formulate the transform in the continuous domain bydecomposing a function into a sum of shifted and scaled continuously definedwavelets.

Let us consider a 1D continuously defined function v(x), which is known by itssample values v[n], n ∈ Z. We start by the representation of this function in ashift-invariant signal space V0(ϕ) = spann∈Z

{ϕ(x−n)}. For the decompositionto be numerically stable and unambiuous, the function ϕ needs to generate aRiesz basis. So we have

v(x) =∑

k

c0[k]ϕ(x − k), (1)

where the coefficients c0[k] can be chosen such that v(n) = v[n]. Perform-ing one stage of the wavelet transform consists of nothing else than splittingthe signal space V0(ϕ) into V1(ϕ) = spann{ϕ(x/2 − n)}, characterized by di-lated basis functions on a coarser grid, and its orthogonal complement in V0,W1(ψ) = spann{ψ(x/2 − n)}, characterized by the wavelet ψ. Consequently,the function (1) can be decomposed as

v(x) =√

2∑

k

c1[k]ϕ(x/2 − k) +√

2∑

k

w1[k]ψ(x/2 − k), (2)

4

Page 5: Integrated wavelet processing and spatial statistical

where the coefficients c1[k] and w1[k] fully describe the orthogonal projectionof v(x) into the spaces V1 and W1, respectively. Specifically, we obtain c1[k]as 〈v(·), ϕ(·/2 − k)〉, where ϕ denotes the dual function of ϕ; i.e., the uniquefunction of V0 such that 〈ϕ(· − k), ϕ(· − l)〉 = δk−l. Similarly, the “detail co-efficients” w1[k] are obtained using the dual wavelet ψ(·/2 − k). Figure 1 (a)shows a flowchart of a one-level decomposition. Using the same principle, thesignal (2) can be decomposed further, Jw times, as in

v(x) = 2Jw/2∑

k

cJw [k]ϕ(x/2Jw − k) +Jw∑

j=1

2j/2∑

k

wj[k]ψ(x/2j − k). (3)

We now introduce a shorthand notation for a wavelet decomposition such as(3):

v(x) =∑

k

wkψk(x), (4)

where wk and its index k runs over all scales of the decomposition, while thefunctions ψk corresponds to the associated scaled, shifted, and dilated versionsof the scaling function or of the wavelet.

In practice, the DWT is implemented efficiently through an iterated filter-bank, which directly operates on the discrete coefficients cj[k]. The filters arederived from the dual scaling function and wavelet. For example, the scalingcoefficients c1[k] after one iteration can be found by filtering and subsamplingwith the scaling filter h at the analysis side:

c1[k] = (c0 ∗ h)[2k]. (5)

Usually, we characterize the filters in the z-domain; e.g., H(z) =∑

k h[k]z−k.In Fig. 1 (b), we show the flowchart of the filterbank representation for oneiteration of the DWT.

The DWT is non-redundant; i.e., the number of coefficients is always equalto the initial number of samples. The wavelet transform of piecewise smoothsignals tends to be quite sparse, a property that is very useful for many ap-plications.

The decomposition above is easily extended to the multi-dimensional caseby using tensor-product basis functions; i.e., the 3D scaling function ϕ(x)corresponds to ϕ(x1)ϕ(x2)ϕ(x3). Specifically, we have one scaling functionand 7 wavelets, while the overall subsampling ratio at each iteration is 8.

2.2 Standard wavelet-based statistical analysis of fMRI data

An fMRI dataset is denoted as v(t)[n], n ∈ Z3, t ∈ Z, where n and t = 1, . . . , Nt

are the 3D-spatial and temporal indices, respectively. The non-redundant spa-

5

Page 6: Integrated wavelet processing and spatial statistical

tial 3D DWT of a volume, v(t)[n], yields the coefficients w(t)k

, where the indexk addresses all subbands and orientations. As in (4), we compactly denote thewavelet decomposition as

v(t)[n] =∑

k

w(t)k

ψk(n). (6)

Now we introduce the time-series vector of length Nt for a wavelet coefficient;i.e., wk = [w

(1)k

. . . w(Nt)k

]T. The temporal behavior of the wavelet coefficient isdescribed by a LM, which represent the vector wk as 1

wk = Xyk + ek, (7)

where X is the Nt×L design matrix, and ek is the residual error. The matrix X

contains the desired signal models. For example, in the case of a simple block-based experiment, X contains two columns: one for the on-off stimulus andone for the background. Usually, one convolves the stimulus with a model forthe HRF. Under the assumption of independently and identically Gaussian-distributed residual error ek, the optimal unbiased estimate of yk is the least-squares estimate given by yk = (XTX)−1XTwk. The residual error of thisestimate is obtained as ek = wk − Xyk. Next, the information of interestis extracted from yk by a contrast vector c (e.g., c = [1 0]T for our simpleexample). At this stage, we obtain two scalar values for the k-th waveletcoefficient:

gk = cTyk, (8)

s2k= eT

kekc

T(XTX)−1c, (9)

where gk and s2k

follow a Gaussian and a χ2 distribution, respectively. Thehypothesis to test is whether the coefficient k is activated and thus has anon-zero mean:

H0 : µk = 0, (10)

H1 : µk 6= 0. (11)

The t-value for each wavelet coefficient can be found as

tk =gk

s2k/J

, with J = Nt − rank(X), (12)

which can be tested against a threshold τw. This value τw is determined by thedesired significance level α (e.g., 5%) for a two-sided t-test (since both positive

1 Since the DWT is a spatial linear operator and the LM analysis a temporal linearoperator, it is equivalent to apply the LM to v(t)[n] or to its wavelet coefficients

w(t)k

. The link between the respective parameters y[n] and yk is simply the DWT.

6

Page 7: Integrated wavelet processing and spatial statistical

and negative wavelet coefficients can contribute to an increased signal in thespatial domain). The significance level α needs to be corrected for multiplehypothesis testing. One typically applies a conservative Bonferroni correction,which reduces the significance level to αB = α/Nc, where Nc is the numberof statistical tests. The value Nc can be chosen as the number of intracranialwavelet coefficients, which corresponds closely to the number of intracranialvoxels.

The wavelet coefficients gk that survived the statistical test |tk| > τw can bereconstructed as

r[n] =∑

k

H(|tk| − τw) gkψk(n), (13)

where H(t) is the Heaviside step function defined as

H(t) =

0, when t < 0,

1, otherwise.(14)

In other words, the term H(|tk| − τw) in (13) acts as an indicator functionwhich is equal to 1 for |tk| ≥ τw and 0 otherwise. Depending on the supportof the wavelet ψk, which can be infinite, the reconstructed volume r[n] willcontain many non-zero voxels. Often, an ad-hoc spatial threshold is appliedto obtain an “acceptable” activation map. However, such a threshold does nottake into account the variability of the underlying voxels as compared to thet-value in the wavelet domain. It is therefore not possible to associate a clearstatistical meaning to the reconstructed volume r[n]; this constitutes the maindisadvantage of the wavelet-based approach.

Figure 2 (a) shows schematically the standard wavelet procedure. The desiredsignificance level αB directly determines the threshold value τw for the t-testin the wavelet domain. The detection procedure in the wavelet domain isstatistically sound and quantitative; unfortunately, the transposition of theresults back into the spatial domain is qualitative only.

The non-redundancy of the DWT is an important property for the multiple-testing correction. Indeed, a redundant transform would require the use ofa higher threshold value and would therefore reduce the sensitivity of theapproach.

7

Page 8: Integrated wavelet processing and spatial statistical

3 Method: Integrated wavelet processing and spatial statistical

testing

3.1 Main idea

The major advantage of the wavelet-based method is its apparent high sensi-tivity, even though the conservative Bonferroni-correction for multiple testingis used. The underlying reason is the sparsity property of the wavelet trans-form. In other words, the activation patterns exhibit a local correlation whichis compactly encoded by the wavelet basis functions. Therefore, thresholdingthe t-values in the wavelet domain is an efficient way to improve the SNR with-out removing any information. It follows that this approach has the potentialto detect activations with a high spatial resolution.

Here, we also propose to rely on thresholding in the wavelet domain. How-ever, this procedure is considered as a pre-processing step (parametrized bythe wavelet threshold), and the statistical test is ultimately performed in thespatial domain, taking into account the processing that has been done before.Figure 2 (b) summarizes this basic philosophy. The threshold τw is still appliedto the t-values in the wavelet domain but is treated as a general parameter ofa nonlinear algorithm. The final testing in the spatial domain is implementedvia a threshold τs. Intuitively, it is clear that there is an infinity of possiblecombinations (τw, τs) that will establish the same desired significance levelαB. We introduce the principle of “minimal approximation error” between theunprocessed data (after the LM) and the final result. Our method does notrequire any additional hypotheses. In particular, the wavelet coefficients donot need to be decorrelated.

The proposed framework consists of two major parts. The first one links theprocessing of the t-values in the wavelet domain, which is parameterized bya threshold τw, to the statistical hypothesis testing in the spatial domain,characterized by a threshold τs. Second, we develop the principle of minimizingthe approximation error between the data after LM fitting and the result afterapplying both thresholds. It is this second part of our framework that willprovide the right way to determine the optimal values of the wavelet andspatial thresholds. These optimal thresholds can be computed off-line and aredata-independent; they are a function of the desired significance level and thenumber of intracranial voxels.

8

Page 9: Integrated wavelet processing and spatial statistical

3.2 Simplified case—true σk are known

To facilitate the exposition of our method, we first consider the case where Nt

is sufficiently large, so that the normalizing term in (12), sk/√

J , is essentiallyequivalent to σk, the standard deviation of gk. In this limiting case, the t-values tk are equivalent to z-scores and are normally distributed. Therefore,we denote gk/σk as zk.

3.2.1 Part I: Effect of wavelet processing

We follow the standard procedure of Sect. 2 up to the calculation of the t-valuein the wavelet domain, which is this time equivalent to a z-score under theassumption σk = sk/

√J . Similarly, we apply a threshold to the values gk to

obtain H(|zk| − τw) gk. However, we ignore the interpretation as a statisticaltest. Our goal is to derive a spatially-varying threshold q[n] such that, underthe null hypothesis, the desired significance level αB is not exceeded by theprobability of the reconstruction of the processed wavelet coefficients thatcontribute to the value of the voxel n:

P

[∑

k

H(|zk| − τw) gkψk(n) ≥ q[n]

]

≤ αB. (15)

For this purpose, we will make use of Theorem 1 in Appendix A, which providesan upper probability bound for a convex sum of random variables. Therefore,we take a closer look at the reconstructed volume and manipulate it to obtaina convex sum of random variables that follow the same probability densityfunction. Specifically, by introducing σk, which is the true standard deviationof gk, we can rewrite the reconstructed map as

k

H(|zk| − τw) gkψk(n) =∑

k

H(|zk| − τw)

zk︷︸︸︷gk

σk

sign(ψk(n)) σk |ψk(n)|

= Λ[n]∑

k

λkH(|zk| − τw) zk sign(ψk(n))

= Λ[n]∑

k

λkξk, (16)

with Λ[n] =∑

k′ σk′ |ψk′(n)|, λk = σk |ψk(n)| /Λ[n], and where we have intro-duced the random variables

ξk = H(|zk| − τw) zksign(ψk(n)). (17)

It is important to notice that ξk follows a normalized probability density func-tion independent of k because zk follows a normalized Gaussian distribution,

9

Page 10: Integrated wavelet processing and spatial statistical

which is symmetric. Furthermore, we see that the factors λk are positive suchthat

k λk = 1. Therefore, Λ[n] can be considered as a normalization ofthe reconstructed volume and the non-stationary threshold can be chosen asq[n] = τsΛ[n].

Using (16) and (17), we can then simplify Expression (15) as

P

[∑

k

H(|zk| − τw) gkψk(n) ≥ q[n]

]

= P

[∑

k

λk(ξk − τs) ≥ 0

]

. (18)

Since the sum on the right-hand side of Eq. (18) satisfies the conditions ofTheorem 1, we obtain the following probability bound for the convex sum:

P

[∑

k

λk(ξk − τs) ≥ 0

]

≤ mina>0

E[(1 + a(ξ − τs))+], (19)

where the notation (x)+ stands for max(0, x). Going back to Eq. (17), we seethat ξ follows a truncated normalized Gaussian distribution with a Dirac peakat the origin, as illustrated in Fig. 3. The effect of the threshold is to mapall small coefficients to zero, while the sign function has no influence on asymmetric distribution.

Usually, the value of a that provides the sharpest probability estimate (i.e.,which minimizes the right-hand side of (19)) is a∗ = 1/τs. The probability canthen be bounded as

P

[∑

k H(|zk| − τw) gkψk(n)∑

k σk |ψk(n)| ≥ τs

]

≤ E[(ξ)+]

τs

,

which finally provides the conservative spatial threshold

τs =E[(ξ)+]

αB

(20)

given the significance level αB.

The expectation value of the right-hand side can be computed as a functionof the wavelet threshold τw, independently from the data. Figure 4 (a) showsthe surface described by the triplet (τs, τw, αB).

The main importance of the bound of (19), is that it allows us to derivesharp threshold values, in particular for short-tailed distributions such as theGaussian. For an illustrative example, we refer to App. B.

10

Page 11: Integrated wavelet processing and spatial statistical

3.2.2 Part II: Minimizing the approximation error

Now we consider the second approximation aspect of our framework. Equa-tion (20) does not present a complete solution to our problem. Indeed, infin-ityly many combinations of τw and τs achieve the same desired significancelevel αB. These solutions differ, however, by the quality of approximation ofthe wavelet processing: the better the approximation—i.e., the smaller thewavelet threshold τw—the larger the spatial threshold τs; on the contrary,if the approximation is poor (i.e., large wavelet threshold), the few spatialdetections cannot reasonably be localized and identified on the unprocessedvolume.

We resolve this issue by searching for the solution that minimizes the worst-case approximation error between the unprocessed and detected parametermap. In other words, we want the final result to be as close as possible tothe original data. We will see how this provides a simple but elegant way todetermine optimal threshold values.

The basic problem is outlined in Fig. 5. Top left, we have the wavelet coeffi-cients gk, obtained after fitting the LM. From these data, we can obtain thethree following reconstructions in the spatial domain:

(1) The reconstruction of the raw wavelet coefficients without any processing:

u[n] =∑

k

gkψk(n). (21)

Notice that the same u[n] would be obtained by fitting the LM directlyin the spatial domain.

(2) The reconstruction after thresholding the wavelet coefficients accordingto their t-values in the wavelet domain:

r[n] =∑

k

H(|zk| − τw) gkψk(n). (22)

(3) The final result after statistical testing of r[n] in the spatial domain:

r′[n] = H(r[n] − τsΛ[n]) r[n]. (23)

Our guiding principle is to minimize the difference between the non-processedreconstruction u[n] and the final result r′[n]. To this end, we express thedifferent between u[n] and r′[n] as

|u[n] − r′[n]|= |u[n] − r[n] + r[n] − r′[n]|≤ |u[n] − r[n]| + |r[n] − r′[n]| ,

where the first term can be bounded by

11

Page 12: Integrated wavelet processing and spatial statistical

|u[n] − r[n]|=∣∣∣∣∣

k

(1 − H(|zk| − τw))g[k]ψk[n]

∣∣∣∣∣

≤ τwσk |ψk[n]|= τwΛ[n],

and the second one by

|r[n] − r′[n]|= (1 − H(r[n] − τsΛ[n])) |r[n]|≤ τsΛ[n].

So we obtain|u[n] − r′[n]| ≤ (τw + τs)Λ[n], (24)

which is valid pointwise for each voxel n, and therefore also conservative, in thesense that it holds for a worst-case scenario. It also important to observe thatthis is a sharp worst-case bound since it can be attained by some configurationof realizations of gk. Consequently, the optimal values for the thresholds τw

and τs are obtained by simply minimizing their sum:

(τw, τs) = arg minτw,τs

{

τw + τs, subject to τs =E[(ξ)+]

αB

}

. (25)

We now apply this optimization to the results obtained from Sect. 3.2.1. First,we derive the relationship between τw and τs by using Eq. (20), which yields

τs =1

αB

1√2π

exp

(

−τ 2w

2

)

. (26)

Furthermore, we can show that the minimization of the sum τw + τs leads usto the optimal values

τs = 1/τw, τw =√

−W−1 (−2πα2B), (27)

where W−1(·) is the −1-branch of the Lambert W -function (also called Omegafunction); i.e., the inverse of the function f(W ) = W exp(W ). This functioncan be evaluated numerically (Corless et al. (1996)). For a detailed derivationof (26) and (27), we refer to Appendix C.

On the probability surface of Fig. 4 (a), the circular dots mark the optimalthreshold values for the covered range of αB. In Fig. 4 (b), we show the optimalthreshold values directly as a function of αB. As a typical example, we couldconsider α = 5% and Nc = 104 (number of intracranial voxels), resulting inαB = 5 × 10−6. Notice that the surface of Fig. 4 (a) can also be used todetermine p-values; i.e., for a fixed threshold τw, the value of τs can be chosenequal to r[n]/Λ[n], resulting into a p-value Nc × αB(τs, τw).

12

Page 13: Integrated wavelet processing and spatial statistical

3.3 General case—true σk are unknown

In the general case, the true σk are unknown and their estimation by sk/√

Jshould be taken into consideration for the testing procedure. Consequently,the number of volumes Nt and the related degrees of freedom J will influencethe threshold values.

The strategy of our approach is very similar to Sect. 3.2. However, this timethe spatially-varying threshold becomes

q[n] = τs

k

sk√J|ψk(n)| , (28)

where sk/√

J is the estimate for σk, meaning that we can no longer reducet-values to z-scores in the inequality inside the probability of (15). We write

k

H(|tk| − τw) gkψk(n) − q[n] ≥ 0

⇐⇒∑

k

H(|tk| − τw) gkψk(n) − τssk√J|ψk(n)| ≥ 0

⇐⇒∑

k

σk |ψk(n)|(

H(|tk| − τw)gk

σk

sign(ψk(n)) − τssk/

√J

σk

)

≥ 0

⇐⇒∑

k

λk

(

H(|tk| − τw)gk

σk

sign(ψk(n)) − τssk/

√J

σk

)

≥ 0

⇐⇒∑

k

λk(ξk − τsςk) ≥ 0, (29)

where λk = σk |ψk(n)| / ∑

l σl |ψl(n)|, and where we have introduced the ran-dom variables

ξk = H(|tk| − τw)gk

σk

sign(ψk(n)), (30)

ςk =sk/

√J

σk

. (31)

Once again, it is important to notice that all ξk and ςk follow the same prob-ability density distributions ξ and ς, respectively. These are defined as somereference random variables

ξ = H

(∣∣∣∣∣

g′

s′/√

J

∣∣∣∣∣− τw

)

g′, (32)

ς = s′/√

J, (33)

13

Page 14: Integrated wavelet processing and spatial statistical

where g′ follows a normalized Gaussian distribution, and where s′2 followsa normalized χ2-distribution with J degrees of freedom, and is statisticallyindependent of g′.

Then, the probability of (29) can be bounded again using Theorem 1:

P

[∑

k H(|tk| − τw) gkψk(n)∑

k sk/√

J |ψk(n)|≥ τs

]

= P

[∑

k

λk (ξk − τsςk) ≥ 0

]

≤mina>0

E [(1 + a(ξ − τsς))+] , (34)

which finally provides the following (conservative) relation between the spatialand the wavelet threshold

mina>0

E [(1 + a(ξ − τsς))+] = αB (35)

given the significance level, αB.

In general, the exact computation of the bound (34) is quite involved. Asharp upper value is given in Appendix D; it is also used to determine theoptimal value a∗ by a numerical optimization procedure. Figure 6 (a) showsthe probability surface that we finally obtain. Again, the optimal thresholdvalues τw and τs can be found by minimizing their sum. The result of thecomplete optimization is provided in Fig. 6 (b) and (c), where we show theoptimal threshold values as function of the desired significance level for Nt =50 and Nt = 150, respectively. Notice how, for the second case, the thresholdvalues approach the idealized case of Sect. 3.2. The calculation of the optimalthresholds could be done offline since their values are data-independent. Thecomputation of one pair (τs, τw), including the optimization of a, requires thedegrees of freedom J and the significance level αB, and takes a few seconds inMATLAB

.

3.4 Summary of the proposed approach

Having discussed the different modules in Figs. 2 (b) and 5, we now brieflysummarize the main computational steps of the proposed approach.

(1) The threshold values τw and τs are determined as a function of the desiredsignificance level α/Nc and the degrees of freedom J .

(2) The spatial DWT is computed for each volume v(t)[n] of the time-series,

resulting in the coefficients w(t)k

.

(3) For each time-serie of wavelet coefficient w(t)k

, the LM is applied and theparameter of interest is extracted. This way, we obtain the parameter’sestimate gk and its estimated standard deviation s2

k.

14

Page 15: Integrated wavelet processing and spatial statistical

(4) After wavelet processing (i.e., applying the threshold τw to the gk’s), weuse the inverse DWT to reconstruct the volume r[n].

(5) The values sk/√

J are reconstructed by a modified inverse DWT algo-rithm, which corresponds to putting the absolute value of the wavelet.We obtain the volume Λ[n].

(6) The detected parameter map is obtained by applying the threshold τs tor[n]/Λ[n].

4 Implementation: A new toolbox for SPM

Our new approach has been implemented as a “Wavelet toolbox”, calledWSPM, for SPM2. In this way, the user can setup his experiments as usualusing SPM’s extensive features for preprocessing (e.g., registration) and LMspecification, including the HRF modelling. Next to the standard analysisperformed by SPM, the toolbox allows to use our joint spatio-wavelet statis-tical testing. Its results are added as new “contrasts” to the SPM structurerelated to the experiment, and they can be explored using SPM’s extensivevisualization features.

5 Experimental results

The aim of this section is to provide a proof-of-concept of the proposed frame-work, rather than a full coverage and fine-tuning of each possible parameteroffered by the method and its wavelet transform. Therefore, unless mentionedotherwise, the results of wavelet-based approach are obtained using the orthog-onal B-spline wavelets of degree 1 with Jw = 1 iteration in 3D. Essentially,we want to show that we are able to detect activation patterns similar toSPM, while making rather conservative assumptions. Further exploration ofthe parameter space will be presented in a future paper.

5.1 Type I error control in null data set experiments

Explicit testing of formal type I error control can be studied by null data setexperiments, e.g. as in Fadili and Bullmore (2001). For a given significancelevel—corresponding to the expected false positives fraction (FPF)—the ob-served FPF is evaluated. Usually, these experiments are carried out for verylow significance levels (e.g., a high value αB = 0.1), in order to determine anestimate of the error control rate from only a few null data sets.

15

Page 16: Integrated wavelet processing and spatial statistical

However, the integrated framework requires the use of high significance levels.First, the tightness of the detection bound (34), gets better as αB decreases.Second, the calculation of the thresholds τw and τs, based on the bound derivedin App. D, is also sharper for small αB. Due to these effects, the resultsfor very high values of αB are not very instructive since the thresholds areoverestimated. Therefore, we conducted a type I error control experiment forsynthetic null data sets, which allows us to regenerate easily new data setsand consequently to operate at the (usual) high significance levels.

The synthetic null data sets are generated for a time-series of Nt = 120 volumesof 64×64×22 voxels. The voxels contain Gaussian white noise and the designmatrix contained a dummy on-off activation with 5 volumes per epoch. Therange of the significance level was chosen αB = 10−6, . . . , 10−3. The average ofthe observed FPF curve over 200 experiments is shown in Fig. 7. As expected,the spatial t-test is exactly calibrated by the type I error. Clearly, the newwavelet approach has a more conservative behavior for these null data sets.This can be explained by the non-linear nature of the wavelet thresholdingprocedure, which suppresses the null-signal extremely well. As we will showin the next sections, this conservative behavior for null data sets does notprevent our method from detecting correctly activation regions.

5.2 Software phantom study

Following Desco et al. (2001), we have performed 3D software phantom studies.The clusters shown in Fig. 9 (a) served as seeds for activation regions. Each ofthese activation patterns was embedded at different signal levels (4%, 2% and1% higher than the background) inside an intracranial mask of a 64× 64× 22volume. To obtain more realistic activations, the initial activation map wassmoothed by a Gaussian filter (FWHM=2 voxels). The central slice of both themask and the 12 activation regions, is shown in Fig. 9 (b) and (c), respectively.The activation map was then transformed into a time-series of 80 volumesaccording to a block-paradigm of 4 cycles with 10 volumes per epoch, takinginto account the HRF used by SPM (assuming TR=3s). Each volumes wascorrupted by Gaussian white noise of 2% of the background level.

The design matrix used to detect activation incorporated the exact knowledgeof the temporal course of activated voxels. The desired significance level waschosen to be α = 5%. The Bonferroni-corrected significance level αB = α/Nc

was corrected by the number of intracranial voxels Nc = 16087. The detectedactivation patterns for the various methods are shown in Fig. 9, and the num-ber of detected voxels per cluster are summarized in Table 1. For detectedvoxels, the colors (red-yellow-white) are attributed according to the statisticalparameter map; i.e., the estimated t-values for SPM and the spatial t-test, the

16

Page 17: Integrated wavelet processing and spatial statistical

normalized value r[n]/Λ[n] for the wavelet approach. None of the methodsdetected the single voxel activation of cluster 1. A comparison of SPM’s re-sults using various levels of smoothing, shows that increasing the width of theGaussian filter improves the number of detected clusters, but also degradestheir localization. The voxel-wise t-test with Bonferroni correction only detectsa few voxels. The proposed wavelet approach using the orthogonal B-splinewavelets closely resembles the result obtained by SPM with FWHM=1.5 vox-els, but misses the second activation region of cluster 3. As suggested in VanDe Ville et al. (2003), the DWT using the dual B-spline wavelet has a “pure”B-spline at the analysis side, which corresponds more closely to a SPM-likeGaussian prefiltering. Interestingly, the results using the dual B-spline waveletwith the same parameters as the orthogonal one (degree 1, 1 iteration), do de-tect the second activation region of cluster 3, and at the same time rendersmore concentrated activation patterns. This example shows that the choiceof the wavelet basis influences the results. An extensive comparison will beshown in a future paper.

5.3 Block-based experiment

Our example is an fMRI experiment with auditory stimulation following ablock-based paradigm (Rees and Friston (1999)). The data were obtained ona 2T Siemens Magneton, 7s repetition time, providing volumes of 64×64×64isotropic voxels of 3mm×3mm×3mm. The number of volumes used for thedata analysis is Nt = 84. The setup of the design matrix has been done usingSPM and incorporates a model for the HRF. In Fig. 10, we show the on-offstimulus function and the modelled activation response, which is obtainedby convolution with a HRF. The significance level α has been fixed at 5%,which gives, after correction for multiple testing by the number of intracranialvoxels, αB = 7.1 × 10−7. The activation patterns obtained by SPM for theslices located around the auditory cortex are shown in Fig. 11. We used theGaussian prefilter for typical values recommended by SPM; i.e., the FWHMin the left and right column is equal to 4mm and 6mm, respectively. Clearly,for a higher FWHM, stronger smoothing emphasizes large activation patterns.Specifically, the number of detected voxels is 348 and 617, respectively. Theactivated voxels are colored according to their t-value above SPM’s threshold.

For the desired significance level αB and Nt, the thresholds as determined byour optimization are τw = 6.058 and τs = 0.234. To visualize the detectedvoxels, we color them according to their ratio with the right-hand side ofEq. (34) since this normalization can be considered to be comparable to at-value.

In the left column of Fig. 12, we show the activation patterns detected by

17

Page 18: Integrated wavelet processing and spatial statistical

our method. The total number of detected voxels is 408, which is between thesensitivity reported using SPM with FWHM=4mm and FWHM=6mm. Giventhe strong type-I error control of the method, this is a promising result.

We also want to indicate the influence of the wavelet coefficients in the high-pass subbands on the results. In the right column, we show the result ofour approach if we only consider the detected coefficients from the lowpasssubband. The activation patterns are more “conventional” (larger and moreconnected regions) since they originate from lowpass-filtered data. The totalnumber of detected voxels now increases to 460. We observe that includingthe wavelet coefficients from the highpass subbands contributes to the spatialresolution of the detection map; i.e., activation patterns are refined, which in-dicates that those highpass wavelet coefficients contain essential informationand that the true activation patterns are well localized.

The threshold values τw and τs have been selected optimally according tothe second principle of our framework. As an experiment, we now lower thethreshold τw at a fixed value 5.6, and compute the remaining threshold τs

so as to satisfy the desired significance level αB. This way, we obtain τs =1.75. While this setting surely detects more wavelet coefficients, the spatialstatistical testing retains almost nothing: only a few voxels are detected. Thisclearly shows that our way of choosing the threshold values significantly differsfrom the standard wavelet approach, which suggests to select τw accordingto a two-sided t-test for the coefficients as in (12) in order to obtain thedesired significance level. For this experiment, such a procedure would lowerτw to 5.20 while further augmenting τs and finally detect no activation at all.These observations confirm the conservative nature of our framework and alsoindicate the optimality of the threshold selection and its criterion based onthe approximation quality of the reconstructed data.

6 Conclusions

We have presented an integrated framework for statistical analysis of fMRIdata using a joint spatio-wavelet approach. The major disadvantage of thestandard wavelet-based method is that it does not yield a statistical inter-pretation in the spatial domain. Therefore, our new framework addresses twoimportant issues: (1) the link between the wavelet and the spatial domain toproperly model the effect of processing wavelet coefficients; (2) the optimal se-lection of the two threshold values characterizing the processing in the waveletdomain and the statistical testing in the spatial domain. The proposed frame-work makes conservative assumptions; in particular, it applies a Bonferronicorrection for multiple testing, and therefore has a strong type-I error control.As a proof-of-concept, we included experimental results of a block-based fMRI

18

Page 19: Integrated wavelet processing and spatial statistical

experiment which are quite comparable to those of SPM. Future research isneeded to further explore the full potential of the new framework; e.g., theinfluence of parameters of the wavelet transform such as the degree/order ofwavelet, the number of iterations, and the type of wavelet transform (orthog-onal, semi-orthogonal).

The proposed framework incorporates only limited knowledge about the data(i.e., the HRF). Recently published methodologies propose to also take intoaccount aspects of neural dynamics (Friston et al. (2003)). The extension ofwavelet-based methods in this direction remains a challenge for future re-search.

A Bound for a convex sum of random variables

Theorem 1 Consider the random variables xk, k = 1, . . . , N , following the

same probability law as a generic random variable x, and the weighted sum∑N

k=1 λkxk, with λk ≥ 0 and∑N

k=1 λk = 1. Then the probability that this sum

be positive is bounded by

P

[N∑

k=1

λkxk ≥ 0

]

≤ mina>0

E [(1 + ax)+] . (A.1)

Proof: Obviously, this upper bound should still be valid without the minoperand, provided that the parameter a is positive. Let f(x) be the func-tion (1+ax)+, where a is a positive real parameter, and where (x)+ is definedas max(0, x). We then have

E

[

f

(N∑

k=1

λkxk

)]

= E

[

f

(N∑

k=1

λkxk

) ∣∣∣∣∣

N∑

k=1

λkxk ≥ 0

]

P

[N∑

k=1

λkxk ≥ 0

]

+E

[

f

(N∑

k=1

λkxk

) ∣∣∣∣∣

N∑

k=1

λkxk < 0

]

P

[N∑

k=1

λkxk < 0

]

(Bayes′ rule)

≥E

[

f

(N∑

k=1

λkxk

) ∣∣∣∣∣

N∑

k=1

λkxk ≥ 0

]

P

[N∑

k=1

λkxk ≥ 0

]

(since f(x) ≥ 0)

≥P

[N∑

k=1

λkxk ≥ 0

]

. (A.2)

(since infx≥0

f(x) = 1)

19

Page 20: Integrated wavelet processing and spatial statistical

Moreover, the function f(x) is convex, which implies Jensen’s inequality (Jensen(1906):

f

(N∑

k=1

λkxk

)

≤N∑

k=1

λkf(xk). (A.3)

Combining (A.2) and (A.3) results into

P

[N∑

k=1

λkxk ≥ 0

]

≤N∑

k=1

λkE [f(xk)] =

(N∑

k=1

λk

)

E [f(x)]

= E [f(x)]

= E [(1 + ax)+] ,

which is the bound of (A.1). Finally, the parameter a > 0 can be optimizedto make the right-hand side as small as possible.

The optimality of the choice f(x) = (1+ax)+ over all possible convex functionswill be shown in another paper.

B On the sharpness of the probability bound

By a simple example of a normalized Gaussian random variable x, we canshow that the bound (A.1) can be advantageously used to obtain “sharp”thresholds. According to Theorem 1, the probability that a realization of xexceeds the threshold τ is bounded by

P [x ≥ τ ] ≤ mina>0

E[(1 + a(x − τ))+]. (B.1)

The optimal value a∗ is found by setting the derivative of the right-hand sideto 0,

d

daE[(1 + a(x − τ))+] ≡ E[(x − τ)H(1 + a(x − τ))] = 0, (B.2)

which yields the optimal value a∗ ≈ τ , for large τ .It is instructive to rewrite the bound (B.1) as

P [x ≥ τ ]≤E[(1 + a∗(x − τ))+]

= E[(1 + a∗(x − τ))(1 + a∗(x − τ))0+]

= E[(1 + a∗(x − τ))0+], thanks to (B.2)

= P[

x ≥ τ − 1

a∗

]

.

20

Page 21: Integrated wavelet processing and spatial statistical

We clearly see that the estimated threshold τ becomes sharper as 1/a∗ getssmaller, which is the case for large threshold values since a∗ ≈ τ . This isparticularly true for short-tailed probability distributions such as the Gaussiandistribution.

Notice the difference with the optimal value a∗ = 1/τs in Sect. 3.2.1, whichcorresponds to the case of a truncated Gaussian random variables.

C Optimal values of τs and τw in Sect. 3.2.2

The relationship between τs and τw, given in (26), can be derived by computingthe expectation in (20):

τs =E[(ξ)+]

αB

=1

αB

∫ +∞

τw

x√2π

exp

(

−x2

2

)

dx

=1

αB

1√2π

exp

(

−τ 2w

2

)

.

To minimize τs + τw, we first rewrite this sum using (26) as

1

αB

1√2π

exp

(

−τ 2w

2

)

+ τw,

and set the derivative with respect to τw equal to zero

− τw

αB

1√2π

exp

(

−τ 2w

2

)

+ 1 = 0.

This finally yields

−τ 2w exp

(

−τ 2w

)

= −2πα2B.

The term −τ 2w can be identified as the inverse of the function f(W ) = W exp(W )

for f(W ) < 0, which is known as the −1-branch of the Lambert W -function(also called Omega function) and can be evaluated numerically (Corless et al.(1996)). In this way, we obtain the optimal threshold values

τs = 1/τw, τw =√

−W−1 (−2πα2B).

21

Page 22: Integrated wavelet processing and spatial statistical

D Calculation of the bound in Sect. 3.3

In this Appendix, we approximate the bound of Eq. (34) by a larger upper-bound. We recall the definition of the random variables

ξ = H

(∣∣∣∣∣

g′

s′/√

J

∣∣∣∣∣− τw

)

g′,

ς = s′/√

J,

where g′ is distributed according to a normalized Gaussian, and s′ accordingto a χ-distribution with J degrees of freedom. We also introduce the variablet′ = g′/(s′/

√J), which follows a t-distribution with J degrees of freedom.

Now, we can split the bound into three parts. We write

E[(1 + a(ξ − τsς))+] =

E[(1 − aτsς)+H(− |t′| − τw)] + (D.1)

E[(1 + a(g′ − τsς))+H(t′ − τw)] + (D.2)

E[(1 + a(g′ − τsς))+H(−t′ − τw)], (D.3)

where each part can be more easily calculated or bounded as follows:

(D.1)≤E[(1 − aτsς)+] (D.4)

(D.2) = E[(1 + a(g′ − τsς))H(t′ − τw)] (D.5)

(D.3)≤E[H(−t′ − τw)]. (D.6)

Equality (D.5) holds as long as τs < τw, which is a very reasonable constraintfor the threshold we expect. The first term (D.4) and the third one (D.6) canbe calculated as

(D.4) =γ

(J2, J

2(aτs)2

)

− aτs

√2J

γ(

J+12

, J2(aτs)2

)

Γ(

J2

) ,

(D.6) = ατw ,

where Γ(a) =∫ +∞

0 xa−1 exp(−x)dx is the Gamma function, and

γ(a, b) =∫ b

0xa−1 exp(−x)dx (D.7)

22

Page 23: Integrated wavelet processing and spatial statistical

is the related lower incomplete Gamma function 2 , and where ατw is the com-plementary cumulative t-distribution

∫ +∞

τwpt(t

′)dt′. Furthermore, by changing

variables from (g′, ς) to (t′, x = ς√

J + t′2), we can also compute the secondterm (D.5) as

(D.5) = ατw +a√

2π (1 + τ 2w/J)J/2

− aτs

2

J

2πB 1

1+τ2w/J

(J + 1

2,1

2

)

,

where Bb(a0, a1) =∫ b0 xa0−1(1 − x)a1−1dx is the incomplete Beta function 2 .

References

Bullmore, E., Fadili, J., Breakspear, M., Salvador, R., Suckling, J., Brammer,M., 2003. Wavelets and statistical analysis of functional magnetic resonanceimages of the human brain. Statistical methods in medical research 12 (5),375–399.

Bullmore, E., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Car-penter, T., Brammer, M., 2001. Colored noise and computational inferencein neurophysiological time series analysis: Resampling methods in time andwavelet domains. Human Brain Mapping 12, 61–78.

Corless, R. M., Gonnet, G. H., Hare, E. G., Knuth, D. E., 1996. On theLambert W function. Advances in Computational Mathematics 5 (4), 329–359.

Desco, M., Hernandez, J., Santos, A., Brammer, M., 2001. Multiresolutionanalysis in fMRI: Sensitivity and specificity in the detection of brain acti-vation. Human Brain Mapping 14, 16–27.

Fadili, M. J., Bullmore, E., 2001. Wavelet-generalised least squares: a newBLU estimator of linear regression models with 1/f errors. NeuroImage 15,217–232.

Feilner, M., Blu, T., Unser, M., 2000. Analysis of fMRI data using splinewavelets. In: Proceedings of the Tenth European Signal Processing Confer-ence (EUSIPCO’00). Vol. IV. Tampere, Finland, pp. 2013–2016.

Frackowiak, R., Friston, K., Frith, C., Dolan, R., Mazziotta, J., 1997. HumanBrain Function. Academic Press.

Friston, K. J., Harrison, L., Penny, W. D., 2003. Dynamic causal modeling.NeuroImage 19, 1273–1302.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D., Frack-owiak, R. S. J., 1995. Statistical parametric maps in functional imaging: Ageneral linear approach. Human Brain Mapping 2, 189–210.

2 Note that MATLAB�

computes a normalized version of the incomplete Gammafunction, namely γ(a, b)/Γ(a), and of the incomplete Beta function, namelyBb(a0, a1)/B(a0, a1).

23

Page 24: Integrated wavelet processing and spatial statistical

Fu, Z., Hui, Y., Liang, Z.-P., 1998. Joint spatiotemporal statistical analysis offunctional MRI data. In: Proceedings ICIP. pp. 709–713.

Genovese, C. R., Lazar, N. A., Nicols, T. E., 2002. Thresholding of statisticalmaps in functional neuroimaging using the false discovery rate. NeuroImage15, 772–786.

Hilton, M., Ogden, T., Hattery, D., Eden, G., Jawerth, B., 1996. Waveletsin Biology and Medicine. CRC Press, Ch. Wavelet Denoising of FunctionalMRI Data, pp. 93–114.

Jensen, J. L. W. V., 1906. Sur les fonctions convexes et les inegalites entre lesvaleurs moyennes. Acta Math. 30, 175–193.

Kruggel, F., von Cramon, D., Descombes, X., 1999. Comparison of filteringmethods for fMRI datasets. NeuroImage 10 (5), 530–543.

Mallat, S., 1989. A theory for multiresolution signal decomposition: Thewavelet decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693.

Mallat, S., 1998. A Wavelet Tour of Signal Processing. Academic Press, SanDiego (CA).

Mueller, K., Lohmann, G., Zysset, S., von Carmon, Y., 2003. Wavelet statisticsof functional MRI data and the general linear model. Journal of MagneticResonance Imaging 17, 20–30.

Ogawa, S., Menon, R., Tank, D., Kim, S., Merkle, H., Ellerman, J., Ugurbil,K., 1993. Functional brain mapping by blood oxygenation level-dependentconstrast magnetic resonance imaging. Biophysical Journal 64, 803–812.

Poline, J., Worsley, K., Evans, A., Friston, K., 1997. Combining spatial extentand peak intensity to test for activations in functional imaging. NeuroImage5 (2), 83—96.

Rees, G., Friston, K., 1999. Single subject epoch (block) auditory fMRI acti-vation data. http://www.fil.ion.ucl.ac.uk/spm/data/.

Ruttimann, U., Unser, M., Rawlings, R., Rio, D., Ramsey, N., Mattay, V.,Hommer, D., Frank, J., Weinberger, D., 1998. Statistical analysis of func-tional MRI data in the wavelet domain. IEEE Transactions on MedicalImaging 17 (2), 142–154.

Shafie, K., Sigal, B., Siegmund, D., Worsley, K., 2003. Rotation space randomfields with an application to fMRI data. Annals of Statistics 31, 1732–1771.

Turkheimer, F. E., Brett, M., Aston, J. A. D., Leff, A. P., Sargent, P. A., Wise,R. J., Grasby, P. M., Cunningham, V. J., 2000. Statistical modelling ofpositron emission tomography images in wavelet space. Journal of CebebralBlood Flow and Metabolism 20, 1610–1618.

Van De Ville, D., Blu, T., Unser, M., 2003. Wavelets versus resels in thecontext of fMRI: establishing the link with SPM. In: SPIE’s Symposium onOptical Science and Technology: Wavelets X. Vol. 5207. SPIE, San DiegoCA, USA, pp. 417–425.

Wink, A. M., Roerdink, J. B. T. M., 2004. Denoising functional MR images:a comparison of wavelet denoising and Gaussian smoothing. IEEE Transac-tions on Medical Imaging.

24

Page 25: Integrated wavelet processing and spatial statistical

Woolrich, M. W., Ripley, B. D., Brady, M., Smith, S. M., 2001. Temporalautocorrelation in univariate linear modeling of fMRI data. NeuroImage14 (6), 1370–1386.

Worsley, K., Marrett, S., Neelin, P., Evans, A., 1996. Searching scale space foractivation in PET images. Human Brain Mapping 4 (1), 74–90.

25

Page 26: Integrated wavelet processing and spatial statistical

(a)

〈v(·), ϕ(·/2 − k)〉

v(x) v(x)⟨

v(·), ψ(·/2 − k)⟩

(b)

H(z−1) ↓ 2 c1[k] ↑ 2 H(z)

v[k] c0[k] c0[k] v[k]

G(z−1) ↓ 2 w1[k] ↑ 2 G(z)

Fig. 1. The discrete wavelet transform for one iteration. (a) The continuous-domainrepresentation. (b) The filterbank representation.

Method cluster 2 cluster 3 cluster 4

4% 2% 1% 4% 2% 1% 4% 2% 1%

SPM (FWHM=1 voxel) 2

non

edet

ecte

d

non

edet

ecte

d

4 0

non

edet

ecte

d

26 4 0

SPM (FWHM=1.5 voxels) 6 9 2 47 18 1

SPM (FWHM=2 voxels) 15 16 4 75 37 7

Spatial t-test 0 1 0 13 0 0

Wavelet (orthogonal) 7 17 0 41 9 1

Wavelet (dual) 1 6 2 30 6 2

Table 1Overview of the number of voxels detected for each activation cluster, correspondingto the results of Fig. 9.

26

Page 27: Integrated wavelet processing and spatial statistical

(a)

αB

preprocessing• realignment

DWT LMstatistical inference• t-test

IDWT

τw

......

(b)

αB

preprocessing• realignment

DWT LMprocessing• threshold

IDWT|IDWT|

test

τw τs

......

Fig. 2. (a) The standard wavelet-based approach. The desired significance level αB

is fed to the statistical inference stage in the wavelet domain. (b) The approachof the present paper is to reinterprete the wavelet threshold as a general algorithmparameter. After reconstruction, a statistical test is performed in the spatial domain,taking into account the wavelet processing.

27

Page 28: Integrated wavelet processing and spatial statistical

pξ(ξ)

ξ

τw −τ

w

Fig. 3. The truncated Gaussian law pξ(ξ).

(a)

(b)

9 8 7 6 5 4 3 23

3.5

4

4.5

5

5.5

6

6.5

7

log10

(α )

τw

9 8 7 6 5 4 3 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

τs

B

Fig. 4. Simplified case where σk is assumed to be known. (a) Probability surface.Equiprobable contour lines are in white. The circular dots mark the optimal thresh-old values. (b) The optimal threshold values τw (full line) and τs (dotted line).

28

Page 29: Integrated wavelet processing and spatial statistical

wavelet coefficientsgk

(1) reconstructionu[n] =

∑k gkψk(n)

minimize worst-caseapproximation error

(2) thresholded coefficientsr[n] =

∑k H(|tk| − τw) gkψk(n)

(3) thresholded reconstructionr′[n] = H(r[n] − τsΛ[n]) r[n]

Fig. 5. Principle of minimizing the worst-case approximation error between thedetected and the unprocessed parameter map, represented by r′[n] and u[n], re-spectively.

29

Page 30: Integrated wavelet processing and spatial statistical

(a)

(b)

9 8 7 6 5 4 3 23

3.5

4

4.5

5

5.5

6

6.5

7

log10

(α )

τw

9 8 7 6 5 4 3 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

τs

B

(c)

9 8 7 6 5 4 3 23

3.5

4

4.5

5

5.5

6

6.5

7

log10

(α )

τw

9 8 7 6 5 4 3 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

τs

B

Fig. 6. General case taking into account the estimate of σk. (a) Probability map. Thewhite contour lines are equiprobable. The circular dots mark the optimal thresholdvalues. (b) Optimal threshold values for Nt = 50. (c) Optimal threshold values forNt = 150.

30

Page 31: Integrated wavelet processing and spatial statistical

10−6

10−5

10−4

10−3

10−8

10−7

10−6

10−5

10−4

10−3

Expected FPF (αB)

Obs

erve

d F

PF

spatial t−testwavelet approach

Fig. 7. Detection results for synthetic null data sets. The curves show the observedversus expected FPF as an average over 200 experiments. Each experiment corre-sponds to a time-series of Nt = 120 volumes (64×64×22 voxels). The design matrixcontained a dummy on-off activation with 5 volumes per epoch.

31

Page 32: Integrated wavelet processing and spatial statistical

(a)

1 2 3 4

(b) (c)

Fig. 8. The software phantom was constructed using the initial 3D activation regionsof (a), which where embedded with three different signal levels into the intracranialmask of (b). The initial activation regions contain 1, 3, 7, and 25 voxels, respectively.Next, these activation regions where smoothed by a Gaussian filter with FWHM=2voxels, resulting into the activation map shown in (c); right: cluster 1, bottom:cluster 2, top: cluster 3, left: cluster 4. Subfigures (b) and (c) show the central sliceof the volume.

32

Page 33: Integrated wavelet processing and spatial statistical

SPM (FWHM=1 voxel) SPM (FWHM=1.5 voxels)

SPM (FWHM=2 voxels) Spatial t-test (Bonferroni)

Wavelet (orthogonal) Wavelet (dual)

10 20 30 40 50 60

10

20

30

40

50

60

Fig. 9. Activation patterns obtained by the various methods for the central slice ofthe software phantom.

33

Page 34: Integrated wavelet processing and spatial statistical

0 100 200 300 400 500−0.2

0

0.2

0.4

0.6

0.8

1

t [s]

Fig. 10. Stimulus function (dotted line) and model for activation response (full line).

34

Page 35: Integrated wavelet processing and spatial statistical

SPM (FWHM=4mm) SPM (FWHM=6mm) Zoom

Fig. 11. Activation patterns obtained by SPM for the slices around the auditorycortex for SPM (FWHM=4mm and FWHM=6mm). The close-ups contain the lowerpart of the activation.

35

Page 36: Integrated wavelet processing and spatial statistical

All subbands Low-pass only Zoom

Fig. 12. Activation patterns obtained by our wavelet-based approach for the sameslices as in Fig. 11 around the auditory cortex. Left: Results using the full waveletdecomposition; Right: Results using only the low-pass subband.

36