Wavelet-Based Statistical Analysis in Functional Neuroimaging

RADU MUTIHAC
University of Bucharest
Department of Electricity and Biophysics
405 Atomistilor St., 077125 Bucharest

[email protected]; http://www.fizica.unibuc.ro/mutihac

Abstract: Wavelet-based analysis is compared with Gaussian smoothing in statistical parametric mapping (SPM) for detecting and analyzing brain activity from functional magnetic resonance imaging (fMRI) data. Detection of activation in fMRI data can be performed in the wavelet domain by a coefficient-wise statistical t-test. The link between wavelet analysis and SPM is based on two observations: (i) the low-pass analysis filter of the discrete wavelet transform (DWT) can be shaped similarly to a Gaussian filter in SPM, and (ii) the subsampling scheme provides a means to define the number of coefficients in the low-pass subband of the wavelet decomposition [52]. An fMRI block-based visual stimulation paradigm was analyzed comparatively by wavelet analysis and by SPM [13] based on the Random Field Theory (RFT). The voxels were isotropic and the same general linear model (GLM) was employed in both the image space and the wavelet domain. Consequently, an equivalent spline degree for which the low-pass part of the wavelet analysis is basically equivalent to SPM can be computed. The processing of neuroimaging data in the wavelet domain was carried out by means of two different biorthogonal transforms, 3D fractional-spline wavelets and 2D+Z fractional quincunx wavelets [51], resulting in activation patterns similar to the activation maps obtained by linear regression analysis in SPM.

Key-Words: Wavelets, Fourier transform, multiresolution analysis, functional magnetic resonance imaging, general linear model, statistical parametric mapping.

1 Introduction

Functional magnetic resonance imaging (fMRI), which yields some of the most complex biomedical time series, is a non-invasive method for localizing and studying dynamic brain processes in the intact living brain. The promise of functional brain imaging is to provide a reliable answer to some of the fundamental open questions of understanding the brain, namely how microscopic and macroscopic organization in the brain relate and interact to produce brain function [42].

The index of neuronal activity most widely used in neuroimaging is the blood oxygenation level dependent (BOLD) contrast [35], which is based on the differing magnetic susceptibilities of oxygenated hemoglobin (diamagnetic) and deoxygenated hemoglobin (paramagnetic) relative to the surrounding tissue. The basic assumption is that an increase in neuronal activity within a brain region leads to local hyperoxemia, with an increase in local blood flow and reduced concentrations of deoxyhemoglobin in the blood vessels. This is due to an overcompensation of the local tissue perfusion in response to the increased energy demand of the activated neurons [28]. Consequently, relative decreases in deoxyhemoglobin concentration entail a reduction in local field inhomogeneity and a slower decay of the MR signal, resulting in higher intensities in T2*-weighted images (Fig. 1). Most neuroimaging data analysis relies on inferential, hypothesis-driven analysis that employs spatially extended processes like SPM, which is a representation of the distribution of functional activity induced by a certain task. In a single-subject experiment, the analysis of statistical parametric maps consists in collecting the results of univariate tests computed on each voxel of the brain. Changes in blood flow, which are closely coupled to functional activation, indicate brain areas involved in various tasks that subjects perform in the scanner (Fig. 2). However, these changes and oxygenation variability (vascular and hemodynamic) are temporally delayed relative to the neural firing (the hemodynamic lag). The nature of these regulatory processes and of the BOLD signal change has not been completely elucidated so far. An additional difficulty in delineating functional correlates from spatiotemporal fMRI data sets stems from the relatively small effect sizes in blood-flow-related phenomena (1%-4% in 1.5 T scanners), quite close to scanner-to-scanner variability, which translates into a low signal-to-noise ratio (SNR) of the BOLD signal. The difficulty in extracting information from raw data is further increased by the possibility that functional correlates of brain activity may relate to given behavioral paradigms in complicated ways. Moreover, the interpretation of functional brain imaging data requires some assumptions on processing in the working brain that may not be entirely realistic and which preclude canonical methods of data analysis and experimental design.

Figure 1: Brain blood flow responds to brain activation.

Proceedings of the 6th WSEAS International Conference on Wavelet Analysis & Multirate Systems, Bucharest, Romania, October 16-18, 2006 59

Low SNR in fMRI and the inevitable presence of confounding effects often result in unsatisfactory analysis. Significant noise and artifacts present in fMRI time series, as well as their unknown structure, complicate the problem of activation detection in the time domain. In functional neuroimaging, SNR can be improved by two means: (i) reducing the scan acquisition time (TR); and (ii) stimulus repetition followed by scan averaging. The duration of BOLD-based fMRI experiments is primarily affected by the physiological sources of variability (cardiac, pulmonary, and other pulsations), subject and scanner movements, RF coil heating, and gradient drift. BOLD methods are sensitive to artifacts associated with head and/or vessel motion [17], vascular inflow [9], and drainage effects [25]. Quite often these artifacts are correlated with the signal of interest and, consequently, are not discarded by simple image averaging. Still, BOLD-based neuroimaging methods are generally superior to other imaging modalities in terms of: (i) equipment availability, (ii) no requirement for exogenous contrast agents or exposure to ionizing radiation, (iii) best spatial resolution, and (iv) the possibility of coregistering activation images with anatomical images acquired on the same machine [37].
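The gain from scan averaging in (ii) can be made explicit with a short derivation. It is only a sketch, under the simplifying assumption (not made in the text) that each of the $n$ repetitions $y_i$ contains the same response $s$ plus independent zero-mean noise $\varepsilon_i$ of variance $\sigma^2$; all symbols are introduced here for illustration.

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = s + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i,
\qquad
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n},
\qquad
\mathrm{SNR}_n = \frac{|s|}{\sigma/\sqrt{n}} = \sqrt{n}\,\mathrm{SNR}_1 .
```

Under this idealization the SNR grows only as $\sqrt{n}$, and the gain does not apply to artifacts that are correlated with the signal of interest, since they do not average out.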

In multisubject and/or multisession experiments, in order for a voxel to represent the same anatomical location for all subjects/sessions under every condition, raw data usually undergo a number of preprocessing steps and are mapped into a standardized coordinate space that accounts for differences in brain size and orientation [13], [54], such as the stereotaxic space [43]. Images of the same subject are realigned among themselves and then coregistered with the corresponding MRI volume. The latter is then used to compute the nonlinear transformation that warps the images of all subjects into a standardized reference space. A test statistic (e.g., t-statistic) is computed at each anatomical site by application of a GLM that corrects for the effects of no interest. The result is a large set of locations that defines a statistical field.

Figure 2: Hypothesis-driven statistical analysis in SPM resulting in parameter maps of task-correlated brain activity. Gaussian Random Field Theory is employed to handle multiple hypothesis testing.

Statistical fields are usually smoothed by application of a Gaussian filter. Spatial smoothing is applied in order to: (i) match the spatial scale of hemodynamic responses among subjects/sessions; (ii) increase the SNR (however, variation between subjects dominates the noise in multisubject data); (iii) improve the match of the data to the assumptions of the Gaussian RFT; (iv) normalize the error distribution to improve the validity of inferences based on parametric tests; (v) decrease errors in the estimated SDs, since spatial smoothing of temporal autocorrelations increases the effective degrees of freedom (df) and decreases their sensitivity to the underlying temporal correlation structure [57]. Less spatially variable t-statistic images and lower thresholds (p < 0.05) entail better detection of activations and improve the physiological relevance of statistical inference; (vi) in multisubject experiments, averaging is necessary to smooth the projected data down to a scale where homologies in functional anatomy are expressed across subjects. Running more subjects will, nevertheless, improve statistical power much more than moving to higher fields.

The final task in the statistical analysis of neuroimaging data is to produce a statistical map in which one has to identify the locations of the images affected by the experimental paradigm. Commonly, these areas are identified by setting an appropriate threshold and retaining as significant those anatomical locations where the statistical field is higher than the chosen level. Values for the threshold are usually computed following criteria derived from the Gaussian RFT. This approach provides the probability of false positives for a certain threshold for a statistical field of a certain smoothness. The higher the smoothness, the higher the correlation among adjacent locations and the lower the risk of a noise field crossing the selected threshold. By smoothing the field with a filter of fixed resolution, the probability of detecting signals of that particular size is maximized, yet sharper foci might be wiped out or signals of varying spatial scale could erroneously be detected. The latter is overcome by a multiresolution approach that employs a number of different smoothing levels and sets the threshold accordingly. However, thresholding the statistical fields amounts to "modeling" the noise rather than the signal, since it yields a binary output: everything over the threshold is unlikely to be noise, everything under the threshold is likely to be noise [55]. The area of change can be detected, though nothing can be said about the shape, scale, and intensity of the signal, because the values of the statistics over the threshold are obviously still subject to random (noise) fluctuations.
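The relation between smoothness and threshold crossings can be illustrated numerically. The sketch below is not the RFT computation itself: it assumes a 1D pure-noise field, a truncated Gaussian kernel rescaled so that the smoothed field keeps unit variance, and an arbitrary threshold of 2.5; all function names are hypothetical.

```python
import math
import random

def gaussian_kernel(sigma):
    """Truncated Gaussian weights, scaled so that sum(w**2) == 1."""
    radius = int(4 * sigma)
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def smooth(field, kernel):
    """'Valid' convolution; unit-variance white noise stays unit variance."""
    m = len(kernel)
    return [sum(kernel[j] * field[i + j] for j in range(m))
            for i in range(len(field) - m + 1)]

def upcrossings(field, u):
    """Number of times the field rises through the level u."""
    return sum(1 for a, b in zip(field, field[1:]) if a < u <= b)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]  # pure-noise "statistical field"
threshold = 2.5                                          # illustrative level (assumption)
raw_count = upcrossings(noise, threshold)
smooth_count = upcrossings(smooth(noise, gaussian_kernel(4.0)), threshold)
print(raw_count, smooth_count)
```

Because both fields have the same marginal variance, the drop in the number of crossings is due to the increased correlation between neighbouring locations alone, which is the effect that smoothness-dependent thresholds exploit.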

Based on some assumptions, various parametric and nonparametric denoising methods have been proposed, though these assumptions may not necessarily hold for fMRI data [29]. Wavelet methods approach the problem of the analysis of statistical fields by estimating the signal at any resolution among the random fluctuations. Recently, the appearance of explicit orthonormal bases in multiresolution analysis (MRA) has had significant implications for fMRI data analysis. Wavelet application to statistical fields is similar to wavelet applications to images, since statistical maps are just images with noise of variance equal to unity. Techniques that are able to identify small signal changes against a noisy background are employed in fMRI, yet many of them cannot deal with responses that change amplitude in an unpredictable manner. Wavelets have successfully been applied to the time-frequency analysis of nonstationary signals to detect hemodynamic responses to experimental stimuli. Wavelet shrinkage of statistic maps overlaps with the work on multiple hypothesis testing [30], yet wavelet shrinkage has not been thoroughly investigated for 2D and 3D denoising of fMRI statistic maps prior to statistical testing in the spatial domain. Gaussian spatial smoothing with a single kernel is widely applied for this purpose, at the risk of failing to detect spatial features of the smoothing kernel size or smaller. Smoothing by wavelet shrinkage allows locally adaptive bandwidth, so that the power to detect spatial features of varying extent is not constrained by the arbitrary choice of a single kernel size [3]. Many techniques have focused on improving the power of the statistical test in the wavelet domain by proposing alternative error rates (e.g., false discovery rate), by modifying the hypotheses to be tested (e.g., recursive testing), or by fine-tuning the statistical models (e.g., Bayesian framework) [51].

In summary, wavelet-based statistical analysis provides: (i) a multiresolution decomposition suitable for scale-invariant process analysis; (ii) a sparse representation of typical brain activation maps; (iii) optimal whitening of data, producing approximately decorrelated wavelet coefficients; (iv) Karhunen-Loève (KL) expansions for long-memory (1/f-like) processes, which is the case in fMRI; and (v) good estimators for the noise process parameters.

2 Functional Neuroimaging Data Analysis

Extracting information from raw data aims to reveal the structure in the data and to model the underlying processes that generated them. In practice, data are acquired in their rawest form and, consequently, are of little immediate use. It is only when the information is extracted via processing that data become meaningful. In data analysis, it is often desirable to reduce the dimension of the feature space because there may be irrelevant or redundant features that complicate subsequent inferences and model design, increase the computational demand, and render the analysis suboptimal.

2.1 Specificity of Cerebral Activity

The general framework for the analysis of functional neuroimaging data was established in positron emission tomography (PET) and extended thereafter to fMRI. Such noninvasive neuroimaging techniques are widely used to study brain function and dysfunction by spatially localizing dynamic brain processes in the intact living brain. Changes of neural activity associated with various stimulus conditions and behaviors are referred to as functional correlates. It is a common feature of all brain imaging methods to measure local neuronal activity by indirect means; PET and fMRI measure local properties of cerebral blood flow (CBF) [20]. PET signals are based on regional cerebral blood flow (rCBF), whereas fMRI signals are most commonly based on BOLD contrast, which can be employed to track blood-flow-related phenomena accompanying or following neuronal activations.


Two general principles of cerebral function have been derived from investigating brain lesions and recording signals from smaller or larger clusters of neurons: (i) functional specialization of brain regions, which means that different brain regions perform different tasks [58], and (ii) functional integration, which states that cerebral functions are carried out by networks of interacting regions and that different functions correspond to different networks [15]. There are consequently two main types of assumption underlying the interpretation of functional neuroimages, namely the subtraction paradigm and the covariance paradigm [19]. Subtraction paradigms assume that different brain regions are engaged in different brain functions (i.e., they rely on functional specialization). Covariance paradigms assess the temporal covariance between different brain regions during a particular task. Significant covariance between regions associated with a particular brain function is termed functional connectivity. The extraction of functional correlates from raw data sets is facilitated by using the subtraction or covariance paradigms for preprocessing. Due to their complementarity, it is often necessary to employ both of them in order to resolve all the functional components of a given cerebral process [41].

2.2 Inferential versus Exploratory Analysis

Data analysis methods employed in neuroimaging to reveal statistical regularities in data that can be associated with brain function can be loosely dichotomized into two large categories: hypothesis-driven (inferential) and data-driven (exploratory) analysis. Most of imaging neuroscience relies on inferential analysis, which makes use of spatially extended processes like statistical parametric mapping (SPM). The expected BOLD changes in fMRI are specified as regressors of interest in a multiple linear regression framework like the general linear model (GLM), and the estimated regression coefficients are tested against a null hypothesis. The voxel-wise test statistics form summary images known as statistical parametric maps, which are representations of the spatial distribution of functional activity induced by the task and which are subsequently assessed for statistical significance [36]. This approach is essentially confirmatory in nature and based on strong priors about the spatio-temporal characteristics of the signals in data. Consequently, the inferred spatial patterns of activation depend heavily on the accuracy of these assumptions.
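The voxel-wise test described above can be sketched for a single regressor of interest. The example is a minimal illustration, not the SPM implementation: it assumes a plain boxcar regressor with no hemodynamic convolution, white Gaussian noise, and simulated time series; the function name and all numerical values are hypothetical.

```python
import math
import random

def glm_t_statistic(y, x):
    """Fit y = b0 + b1*x + e by least squares; return (b1, t) for H0: b1 = 0."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of b1
    return b1, b1 / se

random.seed(1)
# Boxcar regressor: 10 scans rest, 10 scans task, repeated 4 times (80 scans).
boxcar = [float((t // 10) % 2) for t in range(80)]
active = [100.0 + 2.0 * b + random.gauss(0.0, 1.0) for b in boxcar]  # simulated active voxel
silent = [100.0 + random.gauss(0.0, 1.0) for _ in boxcar]            # simulated null voxel

beta_active, t_active = glm_t_statistic(active, boxcar)
beta_silent, t_silent = glm_t_statistic(silent, boxcar)
print(round(t_active, 2), round(t_silent, 2))
```

Repeating this fit at every voxel and storing the t-values produces the summary image; the quality of the result depends directly on how well the assumed regressor matches the true response.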

The analysis of activation images in brain imaging relies on collecting the results of univariate tests computed on each voxel of the brain under different experimental conditions in one subject, in different subjects, or in groups of subjects. Statistical analysis of image differences corresponding to different experimental conditions belongs to one of two classes: (i) techniques which first partition the brain into regions of interest, based upon anatomical structure, and (ii) techniques in which one creates images of a statistical parameter at the level of the full brain. In SPM one applies statistical hypothesis testing to each element of the difference image between two different activation states of the brain controlled by some experimental paradigm, on the basis of modeling the difference image as a realization of a Gaussian random field. Because the images have poor SNR due to intrinsic biological heterogeneity, scanner-induced noise, and subject movement, averaging over several experimental trials/subjects is a common practice that produces a mean difference image and its associated sample standard deviation (SD) image [37]. Their direct statistical analysis in the spatial domain is problematic because of: (i) low-SNR signals, (ii) the large number of pixels under investigation, and (iii) the inevitable spatial correlation among pixels caused by the acquisition process and the spatial preprocessing of data.

The main difficulties encountered by the hypothesis-driven models in fMRI data analysis are the following:

1. Possible presence of unmodeled signals in data, particularly artifactual activity;

2. Structured noise, which is temporally non-orthogonal to an assumed regression model, will bias the parameter estimates in the design space, whereas noise orthogonal to the model subspace will inflate the residual errors, thus reducing the statistical significance and rendering the analysis suboptimal;

3. An increasing number of models include spatial prior information, whereas the RFT-based inference deals with spatial properties after modelling has been completed;

4. The standard GLM is essentially univariate [13]. The main limitations of univariate hypothesis-driven methods stem from their sole dependence on the temporal predictability of the phenomenon to be detected and from neglecting information deriving from the covariance of the acquired voxel time series (though univariate approaches may be formulated independently of a temporal model [11]). Comparatively, multivariate fMRI data analysis relies on the covariance paradigm and is free of prior assumptions on activation functions.

In contrast with the inferential approach, exploratory analysis makes no reference to prior knowledge of the structure in data and provides models whose characteristics are determined by the statistical properties of data only. Within a well-defined hypothesis (model) space, competing models can be tested using the Bayesian second level of inference, which estimates how probable a set of alternative models is [32]. Skilling et al. [40] defined the evidence of a model by integrating the normalization denominator in Bayes' theorem over its prior parameters. Bayesian ranking is carried out by evaluating and comparing the evidence for alternative models.
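In standard notation, the evidence and the resulting model ranking can be written as follows. The symbols $D$ for the data, $M_i$ for a candidate model, and $\theta$ for its parameters are introduced here for illustration and do not come from the cited works.

```latex
P(D \mid M_i) = \int P(D \mid \theta, M_i)\, P(\theta \mid M_i)\, d\theta ,
\qquad
\frac{P(M_i \mid D)}{P(M_j \mid D)}
  = \frac{P(D \mid M_i)}{P(D \mid M_j)} \cdot \frac{P(M_i)}{P(M_j)} .
```

When the alternative models are assigned equal prior probabilities, the ranking reduces to a comparison of their evidences.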

Exploratory analysis of fMRI data is able to detect functional activity without any reference to the experimental protocol or any model space; rather, it can reveal new components in data that are difficult to cast a priori in a temporal model, even though multivariate models can be formalized in terms of temporal modelling. Moreover, multivariate analysis allows the study of neural phenomena that generate a nonzero mutual correlation among voxel time courses from different interconnected areas. The data-driven (model-free) methods generally include the (neuromorphic) unsupervised learning methods like self-organizing artificial neural networks (NNs), eigenimage analysis, temporal clustering analysis (CA) and fuzzy clustering analysis (FCA), factor analysis (FA), projection pursuit (PP), principal component analysis (PCA), and independent component analysis (ICA). Exploratory, data-driven techniques are complementary to hypothesis-led methods; the representative time courses they produce may be viewed as alternative hypotheses to the null hypothesis (i.e., no activation). A critical evaluation and comparison of the data-driven methods used in fMRI data analysis has not been published to date. Besides, as briefly stated by Huber [21], ". . . there are no panaceas in data analysis" whatsoever, so that an educated choice appears to be domain-dependent.

Tukey [44] argued that classical statistics, leaning on the analysis of small, homogeneous, stationary data by means of known distributional models and assumptions, will prove inappropriate to deal with the problems raised by the analysis of large and complex data. A typical 3D multi-slice volume fMRI brain data set for a single subject ranges from $10^8$ to $10^{11}$ bytes, and the values increase further in simultaneous inter-subject and/or multi-subject studies. Two features of fMRI data that characterize massive data sets, namely nonstationarity and distributional heterogeneity, make the analysis even more complicated if taken into account. It is claimed that the difference between real-life large data sets and smaller ones consists not only in size but in qualitative terms as well [22]. Consequently, the investigation of functional brain imaging data should primarily rely on critical consideration of methods that belong to data mining and exploratory data analysis (EDA) [31]. In this respect, multivariate techniques are often combined with data-driven techniques to provide more complex and insightful representations of data.

2.3 Wavelet Analysis

Wavelets are mathematical functions best suited to process signals that contain discontinuities and sharp spikes by cutting them up into different frequency components and subsequently analyzing each component with a resolution matched to its scale. Unlike the traditional Fourier bases, wavelet bases offer a degree of localization in space as well as in frequency. Orthogonal transforms project a signal onto a set of basis functions without alterations, so that its key features can be detected in lower dimensional subspaces. This enables the development of simple function estimates that respond effectively to discontinuities and spatially varying degrees of oscillation in a signal, even if noisy. The wavelet analysis procedure adopts a wavelet prototype function (analyzing wavelet or mother wavelet). Temporal analysis is performed with a contracted, high-frequency version of the prototype wavelet, while frequency analysis is performed with a dilated, low-frequency version of the same wavelet. Scale-varying basis functions render signal processing less sensitive to noise because they measure the average fluctuations of the signal at different scales. Since the original data can be represented in terms of a wavelet expansion (i.e., a linear combination of the wavelet functions), any operations on data can be carried out using the corresponding wavelet coefficients only.

In the context of fMRI data analysis, the features of the brain signals are not well known and the optimal basis functions cannot therefore be specified in advance. Wavelet methods approach the analysis of statistical fields by estimating the signal at any resolution among the random fluctuations. Wavelet application to statistical fields is similar to wavelet applications to images. The variance is computed more straightforwardly for statistical maps than for images because: (i) statistical maps are images with noise variance equal to unity; (ii) pure noise images (i.e., residual images) can be obtained by subtracting from the original scans the effects estimated through statistical analysis. Then the noise power of the field can be computed through Fourier techniques [56]. Variances of wavelet levels are computed by the product of the power function of the field with the power function of the wavelet filters [46]. Therefore, statistical maps are transformed using the DWT, the resulting coefficients are thresholded, and, finally, the denoised statistical maps are reconstructed by the inverse DWT [37] [47].

Figure 3: Multiresolution decomposition of a digital signal
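The transform, threshold, and reconstruct sequence can be sketched in one dimension. The example below rests on several assumptions that are not taken from the text: the orthonormal Haar wavelet, soft thresholding, the universal threshold $\sqrt{2 \ln N}$ for unit-variance noise, and a synthetic "map" with a single activated block; the function names are hypothetical.

```python
import math
import random

S = math.sqrt(2.0)

def haar_dwt(signal):
    """Full orthonormal Haar DWT of a length-2**J list.
    Returns (approximation, [details, finest scale first])."""
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / S for a, b in pairs])
        approx = [(a + b) / S for a, b in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    out = list(approx)
    for d in reversed(details):
        nxt = []
        for c, w in zip(out, d):
            nxt.extend(((c + w) / S, (c - w) / S))
        out = nxt
    return out

def soft(w, thr):
    """Soft thresholding: shrink the coefficient towards zero by thr."""
    return math.copysign(max(abs(w) - thr, 0.0), w)

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(2)
n = 1024
truth = [3.0 if 256 <= i < 512 else 0.0 for i in range(n)]   # one "activated" block
noisy = [v + random.gauss(0.0, 1.0) for v in truth]           # unit-variance noise

approx, details = haar_dwt(noisy)
thr = math.sqrt(2.0 * math.log(n))                            # universal threshold, sigma = 1
denoised = haar_idwt(approx, [[soft(w, thr) for w in d] for d in details])
print(round(mse(noisy, truth), 3), round(mse(denoised, truth), 3))
```

The known unit noise variance of a statistical map is what allows the threshold to be fixed in advance; for raw images the noise level would first have to be estimated.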

A multiresolution analysis (MRA) performed by the WT circumvents the problem by detecting and extracting the key signal features over many scales (Fig. 4). Wavelet methods are particularly attractive due to the broadly fractal properties exhibited by the brain in space and time. Since the interest in fMRI data is focused on relatively spatially localized signals, wavelets are particularly suited to represent them by a small number of strong local coefficients, whereas the power of white noise is uniformly spread throughout the wavelet space. Some wavelet-space partitions may exhibit improved localized SNR conditions that can be exploited by an orthogonal wavelet decomposition. The statistical analysis can then be restricted to the significant coefficients in these partitions only. The potential benefits are: (i) improved SNR conditions, and (ii) a decrease of the detection threshold due to the reduced number of statistical tests, both contributing to a higher detection sensitivity with no increase in type I errors (false positives) [37]. Therefore, the control of false positives is stringent and the Bonferroni approach to wavelet thresholding was initially suggested [45].
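The effect of the number of tests on a Bonferroni threshold can be sketched as follows. The example assumes Gaussian (z) statistics rather than t statistics, a one-sided test, and hypothetical sizes for the full volume and for the retained wavelet partition.

```python
from statistics import NormalDist

def bonferroni_z_threshold(n_tests, alpha=0.05):
    """One-sided z threshold keeping the family-wise error rate below alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / n_tests)

n_voxels = 64 * 64 * 32            # hypothetical volume size
n_subband = n_voxels // 8          # hypothetical retained partition
thr_all = bonferroni_z_threshold(n_voxels)
thr_subband = bonferroni_z_threshold(n_subband)
print(round(thr_all, 2), round(thr_subband, 2))
```

Testing only the coefficients of a selected partition lowers the threshold while the family-wise error rate stays at the same nominal level.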

2.3.1 Mathematical Background

In order to consistently sketch the wavelet analysis framework, some basic mathematical concepts need to be reviewed. The space of all functions $f(x)$, $x \in \mathbb{R}$, that are square integrable (i.e., of finite energy) in Lebesgue's sense

$$\int_{-\infty}^{+\infty} |f(x)|^2 \, dx < +\infty \tag{1}$$

is denoted by $L^2(\mathbb{R})$ or, simply, $L^2$. The wavelet analysis procedure adopts two continuously defined functions:

1. The scaling function (or father function) $\varphi(x)$, which is the solution of a two-scale equation:

$$\varphi(x) = \sqrt{2} \sum_{k \in \mathbb{Z}} h(k)\, \varphi(2x - k) \tag{2}$$

where the sequence $\{h(k)\}_{k \in \mathbb{Z}}$ is the refinement filter.

2. Its associated wavelet function $\psi(x)$ (prototype or mother wavelet function):

$$\psi(x) = \sqrt{2} \sum_{k \in \mathbb{Z}} g(k)\, \varphi(2x - k) \tag{3}$$

where $\{g(k)\}_{k \in \mathbb{Z}}$ is a suitable weighting sequence.
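The two relations can be checked numerically for the simplest case. The sketch below assumes the Haar system, which the text does not single out: the scaling function is the indicator of $[0, 1)$, with $h = (1/\sqrt{2}, 1/\sqrt{2})$ and $g = (1/\sqrt{2}, -1/\sqrt{2})$. It verifies Eq. (2) on a grid, builds $\psi$ from Eq. (3), and confirms that the resulting wavelet has zero net area.

```python
import math

def phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

r = 1.0 / math.sqrt(2.0)
h = {0: r, 1: r}     # Haar refinement filter (assumption: Haar case)
g = {0: r, 1: -r}    # Haar wavelet weights

def two_scale(filt, x):
    """Right-hand side of the two-scale relation for the filter `filt`."""
    return math.sqrt(2.0) * sum(c * phi(2.0 * x - k) for k, c in filt.items())

def psi(x):
    """Haar wavelet generated from phi through Eq. (3)."""
    return two_scale(g, x)

grid = [i / 64.0 - 0.5 for i in range(128)]            # samples of [-0.5, 1.5)
max_err = max(abs(phi(x) - two_scale(h, x)) for x in grid)
integral = sum(psi(x) for x in grid) / 64.0            # Riemann sum of psi
print(max_err, integral)
```

For smoother scaling functions no closed form is available in general, and the two-scale equation itself is used to compute $\varphi$ iteratively from the filter $h$.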

Wavelets are "small" waves: they oscillate and their curves yield zero net area:

$$\int_{-\infty}^{+\infty} \psi(x) \, dx = 0 \tag{4}$$

The "smallness" refers to the fact that they are localized in time, in contrast to the Fourier basis consisting of sines and cosines, which are perfectly localized in frequency space but do not decay as a function of time (i.e., nonlocal support). Wavelets decay to zero as $x \to \pm\infty$ and exhibit good localization properties in frequency space.

In wavelet MRA, details at various levels of resolution are represented by the superposition of wavelets associated with the appropriate dilations (Fig. 4). Complex data can be inspected by dilation, which performs a "zoom-in" on details. Conversely, details can be suppressed, and wavelets are, therefore, candidates for data smoothing. Unlike the case of Fourier transforms, a large selection of wavelet families is available depending on the mother wavelet. Nevertheless, desirable properties like orthogonality, compactness of support, rapid decay, and smoothness impose several restrictions on the choice of mother wavelet.


Figure 4: Wavelet MRA of a typical signal in fMRI: (a) the original MR signal, (b) forward periodized, biorthogonal, symmetric WT of the signal (wavelet coefficients), (c) multiresolution decomposition of the MR signal, (d) inverse WT reconstruction of the MR signal from the wavelet coefficients.

2.3.2 Wavelet Basis Functions

Functional features of the brain signals are complex, largely unknown, and difficult to model mathematically, so that optimal basis functions cannot be specified in advance. Wavelet MRA circumvents this problem by detecting and extracting the key signal features over many scales. A wavelet basis is fractal and so is a natural choice of basis for the analysis of fractal data. Hence wavelets may be more than just another basis for the analysis of fMRI data [2].

Wavelet construction starts from the basic mother wavelet, $\psi(x)$, which generates the basis by (dyadic) dilation (index $j$) and translation (index $k$) in time, $\psi(ax - b)$. For discrete wavelets, the translation parameter $b$ and the dilation parameter $a$ are restricted to discrete sets, usually $a = 2^j$ and $b = k$, where $j, k \in \mathbb{Z}$. Dilation allows hierarchical representation of a data set. Temporal analysis is performed with a contracted, high-frequency version of the prototype wavelet, while frequency analysis is performed with a dilated, low-frequency version of the same wavelet.

$$\psi_{j,k}(x) = 2^{-j/2}\, \psi\!\left(x / 2^{-j} - k\right) \tag{5}$$

Any function $f \in L^2$ can be uniquely represented by the expansion:

$$f(x) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_j(k)\, \psi_{j,k}(x) \tag{6}$$

The wavelet coefficients $\{d_j(k)\}_{j,k \in \mathbb{Z}}$ are obtained by forming the (doubly infinite) sequence of inner products:

$$d_j(k) = \langle f, \tilde{\psi}_{j,k} \rangle_{L^2}, \quad j, k \in \mathbb{Z} \tag{7}$$

where $\{\tilde{\psi}_{j,k}\}_{j,k \in \mathbb{Z}}$ is the biorthogonal basis of $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$ such that:

$$\langle \tilde{\psi}_{j,k}, \psi_{i,l} \rangle = \delta_{j-i} \cdot \delta_{k-l}, \quad i, j, k, l \in \mathbb{Z} \tag{8}$$

Spline bases possess the best approximation properties, such as the smallest $L^2$-error [48]. Due to their smoothness, splines are well localized in both time and frequency domains. Studies on wavelet application in fMRI data analysis emphasized the importance of symmetric wavelets and scaling functions that are free from phase distortions [37]. Orthogonal bases are mainly recommended for the following reasons: (i) signal features not known beforehand can be detected and extracted in a multiresolution approach over many scales; (ii) white noise is transformed into white noise [23]. As such, in wavelet analysis of fMRI time series, the preprocessed data are subject to a spatial non-redundant DWT, rather than spatially convolved with a Gaussian kernel.

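Orthogonal wavelet basis functions for fMRI time series can be found by an appropriate choice of the sequences $\{h(k)\}_{k \in \mathbb{Z}}$ and $\{g(k)\}_{k \in \mathbb{Z}}$ or, equivalently, $\varphi$ and $\psi$, such that $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$ constitutes an orthonormal basis of $L^2$. Hence

$$\forall f \in L^2, \quad f(x) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_j(k)\, \psi_{j,k} + \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} c_j(k)\, \varphi_{j,k} \tag{9}$$

where the wavelet coefficients $\{d_j(k)\}_{j,k \in \mathbb{Z}}$ and the approximation coefficients $\{c_j(k)\}_{j,k \in \mathbb{Z}}$, due to orthogonality, are obtained by inner products with the corresponding basis functions:

$$d_j(k) = \langle f, \psi_{j,k} \rangle, \quad c_j(k) = \langle f, \varphi_{j,k} \rangle \tag{10}$$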

The decomposition of any $f \in L^2$ is practically carried out on a finite number of scales only, say $J$, so that:

$$f(x) = \sum_{j=1}^{J} \sum_{k \in \mathbb{Z}} d_j(k)\, \psi_{j,k} + \sum_{k \in \mathbb{Z}} c_J(k)\, \varphi_{J,k} \tag{11}$$
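As a worked illustration of a finite-scale decomposition, consider the orthonormal Haar filters applied to the four-sample signal $c_0 = (4, 2, 5, 5)$ with $J = 2$. The Haar choice and the sign convention (pairwise sums for the approximation, first-minus-second differences for the detail, both divided by $\sqrt{2}$) are assumptions made here for simplicity.

```latex
\begin{aligned}
c_1 &= \tfrac{1}{\sqrt{2}}\,(4+2,\; 5+5) = (3\sqrt{2},\; 5\sqrt{2}), &
d_1 &= \tfrac{1}{\sqrt{2}}\,(4-2,\; 5-5) = (\sqrt{2},\; 0), \\
c_2 &= \tfrac{1}{\sqrt{2}}\,(3\sqrt{2}+5\sqrt{2}) = 8, &
d_2 &= \tfrac{1}{\sqrt{2}}\,(3\sqrt{2}-5\sqrt{2}) = -2, \\
\|c_0\|^2 &= 16+4+25+25 = 70, &
c_2^2 + d_2^2 + \|d_1\|^2 &= 64+4+2 = 70 .
\end{aligned}
```

The signal is fully described by the single coarse coefficient $c_2$ and the details $d_2$, $d_1$, and the energy is preserved, as expected for an orthonormal transform.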

It is more convenient to describe the underlying decomposition algorithm, which uses two complementary filters $h$ and $g$. We consider hereafter non-redundant dyadic orthogonal wavelet transforms only. The algorithm consists of an iterated orthogonal filterbank with an analysis and a synthesis part (Fig. 5). Orthogonality imposes:

$$\tilde{H}(z) = H(z^{-1}) \quad \text{and} \quad \tilde{G}(z) = G(z^{-1}) \tag{12}$$

where $H(z)$ is the synthesis scaling filter, that is, the transfer function (z-transform) of the low-pass refinement filter $h$, and $\tilde{H}(z)$ is the associated analysis scaling filter. Likewise, $G(z)$ is the synthesis wavelet filter, that is, the transfer function (z-transform) of the high-pass filter $g$, and $\tilde{G}(z)$ is the associated analysis wavelet filter. The high-pass filter $g$ is the modulated version of $h$ given by:

$$G(z) = z \cdot H(-z^{-1}) \tag{13}$$

Figure 5: Decomposition and reconstruction quadrature mirror filters. Analysis part (left) and synthesis part (right) of the WT filterbank.

For perfect reconstruction, the filters must obey thequadrature mirror filter (QMF) conditions:

H(z−1)H(z) + G(z−1)G(z) = 1

H(z−1)H(−z) + G(z−1)G(−z) = 0(14)

In terms of the low-pass filter h only, the QMF condi-tions equate:

H(z)H(z−1) + H(−z)H(−z−1) = 2

H(1) =√

2 ⇔ H(−1) = 0(15)

The wavelet decomposition is iteratively implemented using QMF filterbanks. For a signal vector of length $N_0$, the WT requires $O(N_0)$ operations, as compared with the standard FFT complexity of $O(N_0 \log N_0)$.
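The filterbank algorithm and the low-pass conditions (15) can be illustrated with a minimal Python sketch. It assumes the Haar filter purely for brevity (the analysis in this paper relies on spline wavelets), and all function names are illustrative rather than taken from any package. The sketch evaluates the left-hand side of (15) on the unit circle and runs a multilevel FWT/IFWT pair whose cost is linear in the signal length.

```python
import cmath
import math

# Haar low-pass filter: an illustrative assumption, not the spline filters of the paper.
S = math.sqrt(2.0)
h = [1.0 / S, 1.0 / S]

def H(z):
    """Transfer function H(z) = sum_k h(k) z^(-k) of the low-pass filter."""
    return sum(hk * z ** (-k) for k, hk in enumerate(h))

def qmf_sum(omega):
    """H(z)H(1/z) + H(-z)H(-1/z) at z = exp(j*omega); should equal 2."""
    z = cmath.exp(1j * omega)
    return (H(z) * H(1 / z) + H(-z) * H(-1 / z)).real

def fwt(signal, levels):
    """Multilevel Haar FWT; len(signal) must be divisible by 2**levels."""
    c, details = list(signal), []
    for _ in range(levels):
        pairs = [(c[2 * k], c[2 * k + 1]) for k in range(len(c) // 2)]
        details.append([(a - b) / S for a, b in pairs])  # high-pass, downsampled
        c = [(a + b) / S for a, b in pairs]              # low-pass, downsampled
    return c, details

def ifwt(c, details):
    """Inverse of fwt: rebuilds each c_i from c_{i+1} and d_{i+1}."""
    c = list(c)
    for d in reversed(details):
        rebuilt = []
        for a, b in zip(c, d):
            rebuilt.extend([(a + b) / S, (a - b) / S])
        c = rebuilt
    return c

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = fwt(x, 3)
x_rec = ifwt(approx, details)
```

Each level halves the number of approximation coefficients, so the total work is proportional to N0 + N0/2 + N0/4 + ..., which stays below 2 N0.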

Preprocessing in image analysis (such as noise reduction or contrast enhancement) can be carried out by making the operations frequency dependent, that is, by splitting the signal or image into frequency subbands and applying different operations to each subband. Rather than using a huge multichannel filterbank to encompass the full spectrum, the WT employs recursive two-channel filterbanks (a low-pass filter and a high-pass filter) (Fig. 6). In a one-level FWT, a signal $c_0$ is split into an approximation part $c_1$ and a detail part $d_1$. In a multilevel FWT, each subsequent $c_i$ is split into a pair of approximation $c_{i+1}$ and detail $d_{i+1}$ coefficients. The inverse FWT (IFWT) reconstructs each $c_i$ from $c_{i+1}$ and $d_{i+1}$. The decomposition (10) can be extended to multiple dimensions (e.g., 2D or 3D) by using tensor product basis functions, which amounts to successively applying the 1D decomposition algorithm along each dimension of the multidimensional data. By iteration, $2^q$ different types of basis functions are generated in q dimensions.

Figure 6: One iteration of the WT in 2D space: the 1D algorithm is first applied along the x-direction, splitting the columns of data into two halves (top), which are subject to its subsequent application along the y-direction that splits the rows into two halves (bottom). The iteration continues on the upper left quadrant.

The corresponding qD separable scaling functions with $\mathbf{x} = (x_1, x_2, ..., x_q)$ are:

$$\phi_{j,\mathbf{k}}(\mathbf{x}) = \prod_{i=1}^{q} \phi_{j,k_i}(x_i) \qquad (16)$$

where $\mathbf{k} = (k_1, ..., k_q)$ is the vector integer index. The remaining $2^q - 1$ types of wavelet basis functions are obtained by replacing one or more factors in (16) with wavelet terms of the form $\psi_{j,k_i}(x_i)$, $j \in \mathbb{Z}$, $i = 1, 2, ..., q$. Define a binary vector $\mathbf{b} = (b_1, ..., b_q)$ such that:

$$b_i = \begin{cases} 1 & \text{if } \phi_{j,k_i} \text{ is replaced by } \psi_{j,k_i} \\ 0 & \text{otherwise} \end{cases}, \quad i = 1, 2, ..., q \qquad (17)$$

and

$$\varphi_{j,k_i} = \begin{cases} \psi_{j,k_i} & \text{if } b_i = 1 \\ \phi_{j,k_i} & \text{otherwise} \end{cases}, \quad j \in \mathbb{Z},\ i = 1, 2, ..., q \qquad (18)$$

then the mixed tensor product wavelets can be rewritten as [37]

$$w^{m}_{j,\mathbf{k}}(\mathbf{x}) = \prod_{i=1}^{q} \varphi_{j,k_i}(x_i), \quad m = 1, 2, ..., 2^q - 1 \qquad (19)$$

with

$$m = \sum_{i=1}^{q} b_i\, 2^{i-1} \qquad (20)$$

Here, m indicates a preferential spatial orientation, since $\phi$ is low-pass and $\psi$ is high-pass. For instance, in the 2D case, $w^{m}_{j,\mathbf{k}}(\mathbf{x})$ for m = 1, 2, 3 correspond to wavelets oriented along the horizontal, vertical, and diagonal directions, respectively (Fig. 7).
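The indexing of the mixed tensor product wavelets in (17)-(20) can be made concrete with a short Python sketch. The function name and the string labels "phi" and "psi" are illustrative assumptions; the code only enumerates the binary vectors b and the resulting index m, and does not compute any wavelet.

```python
from itertools import product

def subband_types(q):
    """Map the index m of eq. (20) to the list of 1D factors of the q-D basis function."""
    table = {}
    for b in product((0, 1), repeat=q):
        # m = sum_i b_i * 2^(i-1) for i = 1..q (enumerate starts at 0, hence 2**i)
        m = sum(b_i * 2 ** i for i, b_i in enumerate(b))
        # factor i is the wavelet psi where b_i = 1, the scaling function phi otherwise
        table[m] = ["psi" if b_i else "phi" for b_i in b]
    return table

types_2d = subband_types(2)  # m = 0 is the scaling function, m = 1..3 are wavelets
types_3d = subband_types(3)  # 1 scaling function and 7 wavelet types
```

In 2D, m = 3 is the only index with a high-pass factor along both axes, which is why it corresponds to the diagonal subband.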


Figure 7: Approximation and detail wavelet coefficients in a level-three 2D WT (top); 1D biorthogonal scaling and wavelet functions, respectively (bottom).

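The corresponding multidimensional coefficients:

$$c_j(\mathbf{k}) = \langle f, \phi_{j,\mathbf{k}} \rangle, \qquad d^{m}_{j}(\mathbf{k}) = \langle f, w^{m}_{j,\mathbf{k}} \rangle \qquad (21)$$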

are iteratively obtained by successive filtering and downsampling by a factor of two. In the case of a multilevel FWT of 2D images, each approximation coefficient $c_i$ is split into an approximation coefficient $c_{i+1}$ and three detail coefficients $d^{1}_{i+1}$, $d^{2}_{i+1}$, and $d^{3}_{i+1}$, for horizontally, vertically, and diagonally oriented details, respectively (Fig. 7). The biorthogonal scaling function ("father") and its corresponding wavelet function ("mother") are also shown in Fig. 7. These wavelets are employed to run a 2D three-level WT of an axial MR brain slice, as presented in Fig. 8. Further on, the first three approximation levels are shown in Fig. 9 for visual comparison.

Figure 8: 2D wavelet decomposition of a typical MR axial slice. A coarse (approximation) image at a resolution level L is represented by $2^L$ pixels in each direction. The detail images at a particular level L are produced by horizontal, vertical, and diagonal differences between successive levels. The set of coefficients produced by the WT consists of the lowest coarse level image and the higher level detail images. The original image is $2^8 \times 2^8$ pixels (left).

Figure 9: Coarse images of a typical MR axial slice at various approximation levels: (a) Approximation 1 (resolution level L = 7); (b) Approximation 2 (L = 6); (c) Approximation 3 (L = 5). All images were rescaled to the same size for better comparison.

Symmlets are wavelets with minimum-size support for a given number of vanishing moments, but they are as symmetrical as possible, as opposed to the Daubechies filters, which are highly asymmetrical. They are indexed by the number of vanishing moments, which is equal to half the size of the support. Fractional splines of a real-valued degree were proposed to produce wavelet bases [49] of several kinds: symmetric and causal, orthogonal, and biorthogonal. A reasonable trade-off seems to lead to symmetric, orthonormal cubic spline wavelets. Though symmetric, orthonormal, smooth wavelet basis functions cannot have compact support, they exhibit exponential decay [4]. Symmetric basis functions do not introduce phase distortions, hence a better localization of the signal is achieved in the wavelet domain. Orthogonal spline wavelets were selected for the following reasons: (i) orthogonality is required by the subsequent statistical analysis; (ii) the resulting family of transforms uses symmetric basis functions; (iii) the use of splines reduces spectral overlap between resolution channels by increasing the degree of spline n [37]. In turn, small spectral overlap increases data decorrelation [38], which raises the detection sensitivity. The decorrelation ability of orthogonal spline wavelets stems from the fact that splines of degree n yield L = n + 1 vanishing moments. The uncertainty principle limits the level of decorrelation across scales, since the correlation suppression comes at the expense of a loss in spatial localization, expressed in the decay rate of the filter coefficients. Besides, selecting the degree of the splines depends to some extent on the assumed smoothness of the signal to be detected. Smooth wavelet bases are asymptotically near-optimal for estimating signals that may contain some points of discontinuity [6].

2.3.3 Denoising Images by Wavelet Shrinkage

Wavelet shrinkage refers to reconstructions obtained by the WT of the original signal, followed by shrinking the empirical wavelet coefficients towards zero, followed by the inverse WT [8]. Wavelet-based shrinkage methods are nonparametric regression estimators that provide means of finding structure in a variety of data sets without imposing a parametric regression model. Wavelet-based denoising amounts to thresholding the detail coefficients in the wavelet domain. Gaussian spatial smoothing with a single kernel is widely applied in denoising neuroimaging data, at the risk of failing to detect spatial features of the smoothing kernel size or smaller. In contrast, smoothing by wavelet shrinkage allows a locally adaptive bandwidth, so that the power to detect spatial features of varying extent is not constrained by the arbitrary choice of a single kernel size [3].

Thresholding in the wavelet domain was based on the assumption of white Gaussian noise. When the autocorrelation of the noise is unknown, a level-dependent threshold was suggested [24]. In the wavelet domain, (white) noise is spread out quite evenly among all coefficients, while the signal is concentrated in a few coefficients only. Thus thresholding mostly affects the noise without disturbing the signal. This behavior is in contrast with traditional linear methods of smoothing, which perform noise suppression at the expense of significantly broadening the signal features. The WaveLab802 package [1] used for thresholding in the wavelet domain contains various shrinkage procedures, their discriminating characteristic being the amount of smoothing introduced in the denoised signal [18]. In fact, traditional methods of signal smoothing are little more than diagonal projectors in the Fourier basis. Evidence exists that ideal diagonal projectors work better in wavelet bases than (nonideal) projectors in the Fourier basis [8].

In order to select an optimal WT for wavelet shrinkage of brain signals, a sine with two clicks that mimics spatial inhomogeneities in fMRI data (Fig. 10a), as used by Donoho [5], was subjected to several wavelet transforms and thresholding recipes. Normally distributed white noise with variance 1 was added to this test signal (Fig. 10b). Soft thresholding of the wavelet coefficients with an adaptively chosen threshold and hard thresholding were initially compared. Soft thresholding is a spatially adaptive method for estimating functions from noisy data, and it is a nearly ideal method of spatial adaptation in a well-defined theoretical sense [7]. Soft thresholding closely mimics an ideal diagonal projector in a wavelet basis [7]. The compression abilities of the wavelet bases are responsible for the mean-squared error advantages of wavelet shrinkage. The results are presented for the case of wavelet shrinkage of the modified sine by the periodized symmetric biorthogonal wavelet. Both the wavelet coefficients of the original signal (Fig. 10c) and their noisy version (Fig. 10d) are presented stratified by scale and location. The significant coefficients occur in locations of significant spatial variability. For locally smooth functions, the coefficients tend to be small at fine scales. The noise in the empirical wavelet coefficients is white and, therefore, about the same in every coefficient. The signal exceeds the noise level only in locations of significant spatial variability (Fig. 10d).
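The two thresholding rules compared above can be written down in a few lines. The following Python sketch is a generic illustration with hypothetical function names; it is not the WaveLab implementation. Hard thresholding keeps or kills each coefficient, whereas soft thresholding additionally shrinks the surviving coefficients towards zero by the threshold t.

```python
import math

def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t, set the others to zero."""
    return [w if abs(w) > t else 0.0 for w in coeffs]

def soft_threshold(coeffs, t):
    """Shrink every coefficient towards zero by t, zeroing those below t."""
    return [math.copysign(max(abs(w) - t, 0.0), w) for w in coeffs]

# Two large "signal" coefficients among small "noise" coefficients (made-up values).
coeffs = [5.0, -0.4, 0.3, -3.0, 0.1]
hard = hard_threshold(coeffs, 1.0)
soft = soft_threshold(coeffs, 1.0)
```

Both rules remove the three small coefficients; they differ only in how the two large ones are treated, which is what governs the amount of smoothing in the reconstruction.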

We applied a method proposed by Donoho and Johnstone [5] to find the threshold that minimizes the estimate of the mean squared error (MSE). The approach amounts to applying a soft thresholding nonlinearity, with the threshold selected by Stein's unbiased risk estimate (SURE) in the interval $[0, \sqrt{2\log(n)}]$, where $n = 2^J$ is the number of data points and J is the number of scales. This was proved to possess various optimality properties for MSE estimation. SURE shrinkage produced the best reconstruction of the original signal, both in terms of noise suppression and of sharp structure preservation in the neighborhood of the highly variable spatial components (Fig. 10e,f).
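The SURE-based choice of the threshold can be sketched as follows. This is a minimal Python illustration, assuming unit noise variance (as for the test signal) and the standard SURE expression for soft thresholding, n − 2·#{i : |x_i| ≤ t} + Σ min(|x_i|, t)²; the function names and the example coefficients are hypothetical.

```python
import math

def sure_soft(coeffs, t):
    """Stein's unbiased risk estimate for soft thresholding at t (unit noise variance)."""
    n = len(coeffs)
    n_small = sum(1 for x in coeffs if abs(x) <= t)
    return n - 2 * n_small + sum(min(abs(x), t) ** 2 for x in coeffs)

def sure_threshold(coeffs):
    """Threshold in [0, sqrt(2 log n)] that minimizes the SURE criterion."""
    t_max = math.sqrt(2.0 * math.log(len(coeffs)))
    # The risk estimate increases between consecutive magnitudes, so the minimum
    # is attained at 0 or at one of the (capped) coefficient magnitudes.
    candidates = [0.0] + sorted(min(abs(x), t_max) for x in coeffs)
    return min(candidates, key=lambda t: sure_soft(coeffs, t))

# Mostly small coefficients plus two large ones (made-up values).
coeffs = [0.1, -0.2, 0.05, 0.3, -0.1, 8.0, -7.0, 0.15]
t_star = sure_threshold(coeffs)
```

For this example the selected threshold sits just at the largest of the small coefficients, so that all of them are removed while the two large coefficients are shrunk only slightly.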

2.4 Multiple Hypotheses Testing

Inferential analysis of functional neuroimaging data entails multiple testing in which a large number of correlated test statistics must be assessed. Consider a random variable with a large number of possible outcomes, say V. A statistical parametric map of V voxels consists of the p-values $p_i$, i = 1, 2, ..., V. Given a distribution of outcomes, a p-value is the probability of getting an outcome at least as extreme as the one observed when the null hypothesis $H_0$ is correct. We hereafter discuss the one-sample t-test only, though SPM allows for many statistical tests. The temporal noise in fMRI data is assumed to be Gaussian distributed, $N(\mu, \sigma^2)$. The null hypothesis states that $\mu = 0$. Testing for increased activation entails a one-sided test: $H_1$ states that $\mu > 0$. Since $\sigma^2$ of the temporal noise distribution is unknown, it must be estimated via the sample variance $s^2$, which can be calculated from the residuals. Testing for increased activation is carried out via a t-test using these estimates. BOLD contrasts are constructed as linear combinations of the rows of the parameter matrix of the linear model ($\boldsymbol{\beta}$ in Fig. 11 and eq. 25), each of which is an image of V voxels, and their values are t-distributed. If a t-value in the BOLD contrast lies in the upper $\alpha$ tail of the distribution, its p-value is below $\alpha$. In other words, a small p-value provides strong evidence against the null hypothesis. Active voxels are those with p-values below a significance level $\alpha$. In the case of one test, a common value is $\alpha = 0.05$, which is the probability of erroneously rejecting $H_0$. However, for V simultaneous tests with the same significance level $\alpha$, approximately $\alpha V$ detected activations will be false positives, since that many type I errors would be expected in null data. The problem of false positives with multiple statistical tests amounts to finding an appropriate threshold for the t-values (or, equivalently, z-scores), so that we can be confident that the remaining suprathreshold t-values (or z-scores) are too high to be expected by chance.

Figure 10: Wavelet shrinkage of the modified sine by the periodized symmetric biorthogonal wavelet; (a) 1D test signal: a sinusoid with two clicks; (b) Normally distributed white noise with variance 1 superimposed; (c) Wavelet coefficients of the test signal stratified by scale and location; (d) Empirical wavelet coefficients of the noisy test signal; (e) Reconstruction of the modified sine function; (f) Reconstruction of the noisy modified sine function.

2.4.1 Familywise Error

Consider image data on a 3D regular or irregular lattice and, following a modelling process, assume an image of test statistics $T = \{T_i\}$. Here $T_i$ denotes the value of the statistic image at the spatial location i, $i \in \mathcal{V} = \{1, 2, ..., V\}$, where V is the number of voxels in the brain. Let $H = \{H_i\}$ be a hypothesis image such that $H_i = 0$ indicates that the null hypothesis holds at voxel i, and $H_i = 1$ indicates that the alternative hypothesis holds. The complete null case, $H_i = 0\ \forall i$, is indicated by $H_0$. A decision to reject the null for voxel i will be expressed as $\hat{H}_i = 1$ and not rejecting as $\hat{H}_i = 0$, the null distribution of $T_i$ as $F_{0,T_i}$, and the image of P-values as $P = \{P_i\}$. The test is considered unbiased and all distributions are assumed continuous.

False positives must be controlled over all tests, but there is no single measure of type I error in multiple hypotheses testing. The standard measure of type I errors in multiple testing is the familywise error rate (FWE), which specifies the chance of any false positives. The number of efficient FWE multiple testing procedures for fMRI is limited by the spatial dependence of functional neuroimaging data.

A valid test at significance level $\alpha$ at location i corresponds to a rejection threshold u where $\Pr\{T_i \ge u \mid H_i = 0\} \le \alpha$. The central task in multiple hypotheses testing is to find a threshold u that controls some measure of false positives across the entire image. The weak control of FWE requires that false positives are controlled under the complete null hypothesis $H_0$ only:

$$\Pr\left( \bigcup_{i\in\mathcal{V}} \{T_i \ge u\} \,\Big|\, H_0 \right) \le \alpha_0 \qquad (22)$$

where $\alpha_0$ is the nominal FWE. The strong control of FWE requires that false positives are controlled for any subset $\mathcal{V}_0 \subset \mathcal{V}$ where the null hypothesis holds:

$$\Pr\left( \bigcup_{i\in\mathcal{V}_0} \{T_i \ge u\} \,\Big|\, H_i = 0,\ i \in \mathcal{V}_0 \right) \le \alpha_0 \qquad (23)$$

Statistical significance under weak control implies only that $H_0$ is false and does not permit localization of individual significant voxels. In contrast, statistical significance under strong control allows rejection of individual $H_i$'s while controlling the FWE at all non-significant voxels. Since localization is essential in neuroimaging, strong control of FWE is considered hereafter.

Basically, there are two broad classes of FWE control: (i) Bonferroni inequality-based methods, and (ii) maximum statistic distribution-based methods [33]. The statistical adjustment called Bonferroni correction effectively raises the standard of proof when a wide range of hypotheses are considered simultaneously. If V outcomes are tested rather than one, the significance level $\alpha$ is replaced by $\alpha/V$, which ensures that the probability of any false positive remains at most $\alpha$ in any subset $\mathcal{V}_0$ of the simultaneous tests $\mathcal{V}$. Consequently, Bonferroni correction has strong type I error (false positive) control, meaning that rejecting $H_0$ in a certain region of the brain is evidence for activation in that region. By reducing the probability of rejecting the null hypothesis, Bonferroni correction decreases the number of true positives, which introduces type II errors (false negatives).
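The arithmetic of the Bonferroni correction is simple enough to sketch in Python. The voxel count below assumes, for illustration only, that every voxel of a 64 × 64 × 35 grid is tested; in practice only brain voxels would be counted. The function name and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / V, with V the number of simultaneous tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Illustrative count: all voxels of a 64 x 64 x 35 grid (an assumption).
V = 64 * 64 * 35
expected_false_positives = 0.05 * V   # uncorrected testing on null data
per_voxel_alpha = 0.05 / V            # Bonferroni-corrected level

decisions = bonferroni_reject([1e-8, 0.01, 0.04, 0.5], alpha=0.05)
```

With four tests the corrected level is 0.0125, so only the first two p-values survive. With the full grid, the corrected per-voxel level drops to about 3.5e-7, which shows how conservative the correction becomes for whole-brain analyses.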

2.4.2 False Discovery Rate

Another metric for measuring type I errors is the false discovery rate (FDR), which is defined as the expected proportion of rejected hypotheses that are false positives [14]. FDR controlling procedures are more powerful than FWE measures, yet still control false positives in a useful manner. FDR is an alternative to FWE that does not require spatial smoothness. Instead of controlling the chance of any false positives (as Bonferroni or GRF do), FDR controls the expected proportion of false positives (i.e., type I errors) among suprathreshold voxels (i.e., rejected null hypotheses):

$$\mathrm{FDR} = E\left\{ \frac{\#\,\text{false positives}}{\#\,H_0\ \text{rejected}} \right\} \qquad (24)$$

where E denotes expectation. FDR = 0 if the number of rejected null hypotheses (i.e., the number of detections) is 0.
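A common way of controlling the FDR at a level q is the Benjamini-Hochberg step-up rule; the text above does not name the procedure, so its use here is an assumption. A minimal Python sketch with a hypothetical function name and made-up p-values: sort the p-values, find the largest rank k with p_(k) ≤ k·q/V, and reject the k smallest.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Boolean rejection mask controlling the FDR at level q (step-up rule)."""
    v = len(p_values)
    order = sorted(range(v), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / v:
            k_max = rank  # largest rank whose p-value is under the sloped line
    rejected = [False] * v
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_mask = benjamini_hochberg(p, q=0.05)
bonferroni_mask = [pi <= 0.05 / len(p) for pi in p]
```

On these ten p-values the step-up rule rejects two hypotheses, whereas the Bonferroni level of 0.005 rejects only one, which illustrates the gain in power at the price of a weaker error criterion.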

3 Results and Discussion

3.1 Data Acquisition and Preprocessing

One healthy right-handed subject was selected for single-shot MR FEEPI scanning at a 1.5 T magnetic field while performing a block-based visual task during 12 identical sessions of 228 s each. Both the acquisition and the reconstruction matrices were 64 × 64 × 35 and the voxel size was 3.8 × 3.8 × 3.75 mm³. In all sessions, 80 volumes were acquired at TR = 3 s, and the first 8 volumes were discarded to minimize the T1 saturation effects. A flashing checkerboard was presented in blocks of 24 s followed by 24 s of fixation, starting with activation.

All data were subjected to the following preprocessing steps: (i) acquisition time correction, (ii) realignment (i.e., movement correction) and coregistration, and (iii) spatial normalization to stereotaxic space [43]. For analysis in both the spatial domain (SPM) and the wavelet domain (WT), the design matrix was built in SPM2 and included a model for the hemodynamic response function (HRF).

3.2 Noise Models for fMRI Data

Most of the standard statistical tests in fMRI assume Gaussian distributed noise, though MR magnitude image data were found to obey a Rician distribution [16], [10], [39]. Unlike additive Gaussian noise, Rician noise is signal-dependent, which makes it difficult to separate from the signal. Rician noise is especially problematic in low SNR regimes (SNR < 2), where it causes random fluctuations and introduces a signal-dependent bias to the data that reduces the image contrast [34]. The noise distribution is nearly Gaussian for SNR > 2 [16]. Therefore, wavelet-based noise removal methods that adapt to variations of both signal and noise are attractive for filtering out Rician noise. However, since the BOLD contrast is analyzed as the difference between two MR images (e.g., active minus baseline), both containing Rician distributed noise, the distribution of the noise appears symmetric and closely approximates a Gaussian curve. Moreover, for very low signal intensities, a deviation from Gaussianity was found statistically significant, based on the Kolmogorov-Smirnov (KS) test, in very large images only [53].
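The signal-dependent bias of Rician noise can be illustrated with a small simulation. The sketch below assumes the usual model of a magnitude image as the modulus of a complex signal with independent Gaussian noise of standard deviation σ in the real and imaginary channels; the function name, sample size, and amplitudes are illustrative choices.

```python
import math
import random

def rician_magnitudes(amplitude, sigma, n, rng):
    """Magnitudes of a complex signal of given amplitude plus complex Gaussian noise."""
    return [
        math.hypot(amplitude + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n)
    ]

rng = random.Random(0)  # fixed seed for reproducibility
n = 20000
mean_background = sum(rician_magnitudes(0.0, 1.0, n, rng)) / n   # SNR = 0
mean_high_snr = sum(rician_magnitudes(10.0, 1.0, n, rng)) / n    # SNR = 10
```

Without any signal the mean magnitude is close to σ·sqrt(π/2) ≈ 1.25 rather than 0, whereas at SNR = 10 the mean is biased by only about σ²/(2A) = 0.05, in line with the nearly Gaussian behavior at higher SNR.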

The main task in image denoising is to suppress noise maximally while preserving as much as possible of the image features. In a wavelet representation, wavelet coefficients carry both time and frequency information, as the basis functions vary in position and scale. The BOLD effect (due to the spatial extent of neuronal events) and most preprocessing steps (particularly spatial smoothing) entail spatial autocorrelation. Accordingly, the optimal wavelet basis functions were selected by simulations with two types of noise: (i) white noise, and (ii) noise with a 1/f power spectrum. In the orthonormal wavelet domain, most image information is contained in the largest wavelet coefficients, while white noise is uniformly spread out across all coefficients. An efficient denoising method sets the smallest coefficients to zero and shrinks the remaining ones above a certain threshold.

Figure 11: GLM of fMRI data.

3.3 SPM Analysis

In SPM, a convolution with an isotropic symmetric Gaussian kernel is applied to the preprocessed data prior to statistical analysis. Apart from its benefits, Gaussian filtering degrades the image resolution and complicates the statistical analysis, since the noise can no longer be considered independent. The inferential methods test voxelwise specific hypotheses about the expected changes in BOLD response. These changes are specified as regressors of interest in a (multiple) linear regression framework and their relative weights are given by the regression coefficients (i.e., model parameters). The general linear model (GLM) is applied voxelwise, so that the observed fMRI time series at each voxel is linearly modeled as a superposition of model time courses of activation and Gaussian errors (Fig. 11).

Let the matrix $Y_{[T\times V]}$ denote the fMRI data acquired in the experiment, where each matrix element $y_{ij}$ denotes the observed value at time i, i = 1, 2, ..., T, and voxel location j, j = 1, 2, ..., V; V is the number of voxels in a volume, and T is the number of time points (i.e., the number of full volume scans). Then the linear model gives:

$$Y = X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad (25)$$

where $X_{[T\times N]}$ is the design matrix having the regressors as N column vectors. The row vectors of the matrix $\boldsymbol{\beta}_{[N\times V]}$ are the model parameters of the effects of interest, and the elements of the matrix $\boldsymbol{\varepsilon}_{[T\times V]}$ are the residuals (i.e., errors) of each voxel in each scan. A maximum likelihood (ML) estimate of the model parameters $\boldsymbol{\beta}$ is found by the least squares method (LSM), assuming no temporal correlations in the data:

$$\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} Y \qquad (26)$$
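For a single voxel and a design with one boxcar regressor plus a constant, the estimate (26) and the associated t-statistic can be computed in closed form. The following Python sketch is a simplified illustration under stated assumptions: the regressor is a plain 8-scans-on/8-scans-off boxcar (24 s blocks at TR = 3 s) that is not convolved with an HRF, the voxel time series is synthetic, and the function names are hypothetical.

```python
import math
import random

def boxcar(n_scans, on=8, off=8):
    """1 during stimulation blocks, 0 during fixation (no HRF convolution)."""
    return [1.0 if (t % (on + off)) < on else 0.0 for t in range(n_scans)]

def fit_glm(x, y):
    """Least-squares fit of y = b1*x + b0; returns (b1, b0, t) with t the t-value of b1."""
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                  # determinant of X^T X for X = [x, 1]
    b1 = (n * sxy - sx * sy) / det           # first row of (X^T X)^{-1} X^T y
    b0 = (sxx * sy - sx * sxy) / det         # second row
    rss = sum((b - b1 * a - b0) ** 2 for a, b in zip(x, y))
    s2 = rss / (n - 2)                       # residual variance estimate
    se = math.sqrt(s2 * n / det)             # sqrt(s2 * [(X^T X)^{-1}]_11)
    t = b1 / se if se > 0 else math.inf
    return b1, b0, t

random.seed(0)
x = boxcar(72)                               # 72 retained scans per session
y = [2.0 * a + 10.0 + random.gauss(0.0, 0.5) for a in x]  # synthetic voxel
b1, b0, t_value = fit_glm(x, y)
```

The contrast c = (1, 0) selects the activation parameter b1, and the resulting t-value is what would be thresholded voxel by voxel.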

The parameter selection is done by a contrast vector c, which compares one or multiple parameter values. If a model of the residuals $\boldsymbol{\varepsilon}$ exists, then the statistical significance of the regression coefficients and, implicitly, of the modeled hemodynamic changes can be calculated in each voxel via hypothesis testing. The activation image reconstruction is carried out using a voxel-specific statistic that tests hypotheses on dynamics. The statistical inference relies on continuous Gaussian RFT and takes into account the dependencies introduced by the Gaussian spatial smoothing. The average number of resels (resolution elements) available in the data after smoothing is defined as [50]:

$$\#\text{resels} = \frac{V}{\prod_{i=1}^{3} \mathrm{FWHM}_i} \qquad (27)$$

where $\mathrm{FWHM}_i$ is the Gaussian full width at half maximum along the i-th dimension, i = 1, 2, 3. The voxelwise test statistics form summary images known as statistical parametric maps, which are commonly assessed for statistical significance against the null hypothesis (e.g., no activation). The resulting map of a statistic is a representation of the spatial distribution of functional activity induced by the experimental task.
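As a worked instance of (27), the Python sketch below counts resels for one smoothing kernel. It assumes that the FWHM is expressed in voxel units (kernel width in mm divided by voxel size in mm) and, for illustration only, that V is the full 64 × 64 × 35 grid rather than the number of brain voxels; the function name is hypothetical.

```python
def resel_count(n_voxels, fwhm_mm, voxel_mm):
    """Number of resels: V divided by the product of the FWHMs in voxel units."""
    product = 1.0
    for fwhm, size in zip(fwhm_mm, voxel_mm):
        product *= fwhm / size   # FWHM along this axis, in voxels
    return n_voxels / product

V = 64 * 64 * 35
resels_8mm = resel_count(V, fwhm_mm=(8.0, 8.0, 3.75), voxel_mm=(3.8, 3.8, 3.75))
```

Under these assumptions an 8 × 8 × 3.75 mm³ kernel leaves roughly 32,000 resels out of 143,360 voxels, so smoothing reduces the effective number of independent observations by a factor of about 4.4.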

Denoising by spatial Gaussian filtering was carried out with smoothing kernels of 4 × 4 × 3.75 mm³, 8 × 8 × 3.75 mm³, and 12 × 12 × 3.75 mm³. The shape of the detected activation areas after denoising should not differ significantly from that obtained from the original time series. Similarly to artificially generated fMRI series, the active regions detected by SPM exhibited elliptic shapes with increasingly larger FWHMs of the Gaussian kernels (Fig. 12). The denoised activation images were compared in all cases with the activation map of the original raw data with minimum preprocessing (i.e., acquisition and movement corrections only) as a reference (Fig. 13-left).

Figure 12: Activation parametric maps obtained by SPM of spatially Gaussian smoothed fMRI data after FDR thresholding with q = 0.05.

3.4 Wavelet Analysis

A large variety of non-redundant orthogonal WTs were tested and several denoising methods were applied in each wavelet subband to denoise both synthetic and real-life MR images corrupted by Rician noise. In agreement with [26], for a wide range of input noise levels, the orthogonal fractional (α, τ)-B-splines gave the best peak SNR (PSNR), as compared to some standard wavelet bases (e.g., Daubechies, symmlets, and coiflets).

The DWT was implemented through an iterated filterbank [27] by means of two types of wavelet families: (i) the separable 3D fractional-spline wavelets, and (ii) 2D+Z quincunx wavelets [50]. The high-frequency information contained in the data was preserved in the wavelet subbands, in contrast to spatial smoothing with a Gaussian kernel in SPM. Then the GLM (as in SPM) was applied to the time series of each wavelet coefficient. Significantly non-zero clusters of wavelet coefficients were identified using Kolmogorov-Smirnov statistics. The activation pattern was spatially localized by the IDWT of the thresholded coefficient map and directly compared with the activation images obtained by statistical inference in the spatial domain (Fig. 16). Less spatially variable t-statistic images and lower thresholds (p < 0.05) enforced better detection of activation and improved the physiological relevance of statistical inference.

Following [50], the functions employed throughout as symmetric scaling functions in the DWT were the separable 3D fractional splines, which depend on two parameters: (i) the degree, which is a continuously varying parameter α > −1/2, and (ii) the shift parameter τ. Their expression in the Fourier domain is:

$$\hat{\beta}^{\alpha}(\omega) = \left[ \frac{1 - \exp(-j\omega)}{j\omega} \right]^{\alpha+1} \qquad (28)$$

For the symmetric functions τ = 0, hence their two-scale relationship is given by the scaling filter:

$$B^{\alpha}(z) = \sqrt{2} \left[ \frac{1 + z^{-1}}{2} \right]^{\alpha+1} \qquad (29)$$

Figure 13: (a) Activation parametric maps generated by: minimally preprocessed raw fMRI data (left); SPM analysis with Gaussian smoothing (mid); wavelet-based SPM analysis (right); and their corresponding time courses of activation (bottom). In all cases, the multiple hypothesis testing was controlled by FDR thresholding (q = 0.05).

Both orthogonal and biorthogonal transforms can be constructed, though the dual filters are of special interest. The analysis part consists of the pure B-spline filter, whereas the synthesis part consists of the dual part, which is not so well localized. The fractional-spline wavelet transforms were implemented using the FFT. The use of a non-separable 2D quincunx scheme for the XY-planes [12], combined with a separable transform along the Z-direction, entails a quincunx subsampling matrix:

$$D = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (30)$$

Since $\det D = -2$, each 2D iteration introduces only 2 subbands, as opposed to 4 in the case of a 2D separable transform.
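The effect of the quincunx subsampling matrix (30) can be checked numerically. The Python sketch below, with hypothetical helper names, lists the grid points of the form D·k for integer vectors k and verifies that they are the points with even coordinate sum, that is, every second sample arranged in a checkerboard pattern.

```python
D = ((1, 1), (1, -1))  # quincunx subsampling matrix of eq. (30)

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def quincunx_lattice(size):
    """Points (x, y) of a size x size grid that can be written as D k, k integer."""
    points = set()
    for k1 in range(-size, 2 * size):
        for k2 in range(-size, 2 * size):
            x = D[0][0] * k1 + D[0][1] * k2
            y = D[1][0] * k1 + D[1][1] * k2
            if 0 <= x < size and 0 <= y < size:
                points.add((x, y))
    return points

kept = quincunx_lattice(8)
two_steps = matmul2(D, D)
```

Half of the 64 grid points are kept, matching |det D| = 2, and two quincunx iterations (D applied twice) amount to one separable dyadic step, since D·D equals twice the identity.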

The 3D volumes, each consisting of 35 axial planes of 64 × 64 voxels, were transformed plane-by-plane to the wavelet domain. The wavelet decomposition level was set to 4. The wavelet analysis led to activation patterns similar to those obtained in SPM based on Gaussian random fields. We found bilateral activation within a network of visually responsive regions including the inferior occipital gyrus, fusiform gyrus, superior temporal sulcus, amygdala, inferior frontal gyrus, and orbitofrontal cortex (Fig. 13-right). A rather ad hoc threshold was required in the wavelet domain after the reconstruction of the coefficient map in order to separate the activated from the non-activated voxels. Though the back-projected results were deprived of a precise statistical meaning in the spatial domain, the wavelet analysis yielded activation maps of higher resolution when the coefficients from the high-pass subband were used as well.

For synthetic fMRI-like time series, both heavy Gaussian spatial smoothing and the more strongly smoothing wavelet-based denoising schemes introduced severe deformations and blurred the edges of the activated regions, introducing false negatives (type II errors). In contrast, for low SNRs, the less smoothing methods, both Gaussian filtering and wavelets, generated false positives (type I errors). In the mid SNR range, wavelet-based denoising methods led to fewer errors than Gaussian smoothing.

For real fMRI data, only the smallest Gaussian smoothing kernel yielded reliable results. The wider smoothing kernels yielded much larger detected areas (meaning more type I errors) or completely missed the active regions smaller than the kernel, in contrast to the results obtained via less smoothing wavelet denoising methods. The bottom line is that wavelet-based denoising methods, by introducing relatively little smoothness, are generally preferable to Gaussian spatial smoothing for denoising fMRI time series. We may speculate that wavelet denoising may prove a good alternative for temporal smoothing as well.

4 Conclusion

Signal denoising methods that introduce heavy smoothness are better suited for low SNR input signals, yielding the highest gain in SNR. For images of reasonable quality (i.e., with relatively high input SNR), the denoising schemes that produce heavy smoothness are not recommended because they introduce serious deformations of the objects in the image. Clearly, the more smoothing applied, the larger the deformations, no matter which denoising scheme is applied. Wavelet methods perform as well as Gaussian smoothing for low SNRs, and better than Gaussian smoothing for higher SNRs. Wavelet-based denoising methods, by introducing less smoothing, preserve the sharpness of images and retain the original shapes of the active regions. For virtually all wavelet-based denoising methods, the output SNR is a linear function of the input SNR, that is, the wavelet methods, in contrast to Gaussian smoothing, improve the SNR of input images that already have a high SNR.

For both Gaussian and wavelet-based smoothing schemes, the difference in performance is relatively smaller for 1/f noise than for white noise. In the case of white noise and low SNRs, the less smoothing wavelet methods yield relatively lower output SNRs, which indicates that smoothness (i.e., discarding image features) is required to improve images with very low SNRs.

Previously reported data indicated that wavelet-based methods are preferable for denoising images with SNR > 2 dB and exhibit maximum efficiency for SNR > 10 dB [53]. The preliminary findings so far pointed out that the methods producing smooth images introduce more false positives. The less smoothing wavelet-based methods, though generating more false negatives, produce a smaller total number of errors than (spatial) Gaussian smoothing.

Wavelet-based methods provide a naturally multiscale alternative to the single-scale Gaussian spatial smoothing widely used before hypothesis testing. Scale-varying wavelet-based methods for hypothesis testing of brain activation maps circumvent the need to specify a priori the size of the signals expected and, therefore, the optimal choice of the smoothing kernel required by Gaussian filtering. Due to the smoothness of the wavelet representation, the estimated statistical parameter maps reveal more compact regions of activation than their counterparts obtained by statistical testing in the spatial domain.

Wavelet analysis is optimal in terms of detecting transient events in fMRI time series and adapts well to local or nonstationary features in data within scales of the decomposition. Wavelet-based methods are likely to provide an overall richer characterization of distributed brain activation.

Acknowledgements: The author is grateful for brainstorming with Dr. Vince Calhoun (Associate Prof., Department of Psychiatry, Yale University and Director, Medical Image Analysis Laboratory, Olin Neuropsychiatry Research Center, Institute of Living, New Haven, CT, USA) and to Dr. John Ashburner (Wellcome Department of Imaging Neuroscience, Institute of Neurology, University College London, UK) for clearing up the subtleties of, and commenting on, the latest updates in SPM5.

References:

[1] J.B. Buckheit and D.L. Donoho, ”Wavelab andReproducible Research.” Dept. Statist., Stan-ford Univ., Stanford, CA, 1995. Available at:http://www-stat.stanford.edu/ wavelab

[2] E. Bullmore, ”Wavelets and functional MRI,” inMathematics in Brain Imaging, IPAM, UCLA,July 21, 2004, available on-line.

[3] E. Bullmore, J. Fadili, M. Breakspear, R. Salvador, J. Suckling, and M. Brammer, “Wavelets and statistical analysis of functional magnetic resonance images of the human brain,” Statistical Methods in Medical Research, vol. 12, no. 5, pp. 375-399, Oct. 2003.

[4] A. Cohen, I. Daubechies, and J.C. Feauveau, “Biorthogonal bases of compactly supported wavelets,” Commun. Pure Appl. Math., vol. 45, pp. 485-560, 1992.

[5] D.L. Donoho and I.M. Johnstone, “Adapting to unknown smoothness via wavelet shrinkage,” Journal of the American Statistical Association, vol. 90, no. 432, pp. 1200-1224, 1995.

[6] D.L. Donoho, “Unconditional bases are optimal bases for data compression,” Appl. Comput. Harmonic Anal., vol. 1, pp. 100-115, 1993.

[7] D.L. Donoho and I.M. Johnstone, “Ideal spatial adaptation via wavelet shrinkage,” Tech. Report, Statistics, Univ. Stanford, 1992.

[8] D.L. Donoho, “Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data,” Proc. Symposia in Applied Mathematics, vol. 00, 1993.

[9] J.H. Duyn, C.T.W. Moonen, R.W. de Boer, G.H. van Yperen, and P.R. Luyten, “Influx versus deoxyhemoglobin effects in BOLD functional MRI using gradient echoes at 1.5 T,” NMR Biomed., vol. 7, pp. 83-88, 1994.

[10] W.A. Edelstein, P.A. Bottomley, and P.M. Pfeifer, “A signal-to-noise calibration procedure for NMR imaging systems,” Med. Phys., vol. 11, pp. 180-185, 1983.

[11] F. Esposito, E. Seifritz, E. Formisano, R. Morrone, T. Scarabino, G. Tedeschi, S. Cirillo, R. Goebel, and F. Di Salle, “Real-time independent component analysis of fMRI time-series,” NeuroImage, vol. 20, pp. 2209-2224, 2003.

[12] M. Feilner, M. Jacob, and M. Unser, “Orthogonal quincunx wavelets with fractional orders,” Proc. of the 2001 IEEE Intl. Conf. on Image Processing (ICIP’01), vol. I, pp. 606-609, Thessaloniki, Greece, 2001.

[13] K.J. Friston, A.P. Holmes, K.J. Worsley, J.-B. Poline, C.D. Frith, and R.S.J. Frackowiak, “Statistical parametric maps in functional imaging: A general linear approach,” Hum. Brain Map., vol. 2, pp. 189-210, 1995.

[14] C.R. Genovese, N.A. Lazar, and T.E. Nichols, “Thresholding of statistical maps in functional neuroimaging using the false discovery rate,” NeuroImage, vol. 15, pp. 772-786, 2002.

[15] G.L. Gerstein, P. Bedenbaugh, and A.M.H.J. Aertsen, “Neuronal assemblies,” IEEE Trans. on Biomedical Engineering, vol. 36, pp. 4-14, 1989.

[16] H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI data,” Magn. Reson. Med., vol. 34, pp. 910-914, 1995.

[17] J.V. Hajnal, R. Myers, A. Oatridge, J.E. Schwieso, I.R. Young, and G.M. Bydder, “Artifacts due to stimulus correlated motion in functional imaging of the brain,” Magn. Reson. Med., vol. 31, pp. 283-291, 1994.

[18] M. Hilton, T. Ogden, D. Hattery, G. Eden, and B. Jawerth, “Wavelet processing of functional MRI data,” in Wavelets in Biology and Medicine (A. Aldroubi and M. Unser, Eds.), CRC Press, Boca Raton, FL, 1996.

[19] B. Horwitz and O. Sporns, “Neural modeling and functional neuroimaging,” Hum. Brain Map., vol. 1, pp. 269-283, 1994.

[20] B. Horwitz, K. Friston, and J.G. Taylor, “Neural modeling and functional brain imaging: An overview,” Neural Networks, vol. 13, nos. 8-9, pp. 829-846, 2000.

[21] P.J. Huber, “Projection pursuit,” Annals of Statistics, vol. 13, pp. 435-475, Feb. 1985.

[22] P.J. Huber, “Huge data sets,” Proceedings, Compstat 1994 (R. Dutter and W. Grossman, Eds.), pp. 3-13, Physica Verlag, Heidelberg, 1994.

[23] M. Jansen and A. Bultheel, “Empirical Bayes approach to improve wavelet thresholding for image noise reduction,” J. Amer. Statist. Assoc., vol. 96, pp. 629-639, 2001.

[24] I.M. Johnstone and B.W. Silverman, “Wavelet threshold estimators for data with correlated noise,” J. Roy. Statist. Soc., vol. 59, pp. 319-351, 1997.

[25] S. Kim, K. Hendrich, X. Hu, H. Merkle, and K. Ugurbil, “Potential pitfalls of functional MRI using conventional gradient-recalled echo techniques,” NMR Biomed., vol. 7, pp. 69-74, 1994.

[26] F. Luisier, T. Blu, B. Forster, and M. Unser, “Which wavelet bases are the best for image denoising?” Biomedical Imaging Group (BIG), Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland.

[27] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, pp. 674-693, 1989.

[28] C.T.W. Moonen, “Imaging of human brain activation with functional MRI,” Biological Psych., vol. 37, pp. 141-143, 1995.

[29] R. Mutihac, “Wavelet denoising versus Gaussian spatial smoothing of fMRI data,” European Society for Magnetic Resonance in Medicine and Biology, 22nd Annual Meeting (ESMRMB 2005), Basel, Switzerland, Functional MRI, S204, 2005.

[30] R. Mutihac, “Thresholding brain activation maps - Multiple hypotheses testing versus wavelet shrinkage,” 12th Annual Meeting of the Organization for Human Brain Mapping (OHBM 2006), Florence, Italy, 984, 2006.

[31] R. Mutihac, “Exploratory analysis of functional MRI data,” The Abdus Salam ICTP, IC/2004/69, pp. 1-27, 2004.

[32] R. Mutihac, A. Cicuttin, K. Jansen, and R.C. Mutihac, “An essay on Bayesian inference and Maximum Entropy,” Romanian Biotechnology Letters, vol. 5, no. 2, pp. 83-114, 2000.

[33] T. Nichols, S. Hayasaka, and T. Wager, “Controlling the familywise error rate in functional neuroimaging: A comparative review,” Statistical Methods in Medical Research, vol. 12, no. 5, pp. 419-446, 2003.

[34] R.D. Nowak, “Wavelet-based Rician noise removal for magnetic resonance imaging,” IEEE Transactions on Image Processing, vol. 8, no. 10, pp. 1408-1419, Oct. 1999.

[35] S. Ogawa, T.M. Lee, A.S. Nayak, and P. Glynn, “Oxygenation-sensitive contrast in magnetic resonance image of rodent brain at high magnetic fields,” J. Magnetic Resonance in Medicine, vol. 14, pp. 68-78, 1990.

[36] J.-B. Poline, K.J. Worsley, A.C. Evans, and K. Friston, “Combining spatial extent and peak intensity to test for activations in functional imaging,” NeuroImage, vol. 5, pp. 83-96, 1997.

[37] U.E. Ruttimann, M. Unser, R. Rawlings, D. Rio, N. Ramsey, V. Mattay, D. Hommer, J. Frank, and D. Weinberger, “Statistical analysis of fMRI data in the wavelet domain,” IEEE Trans. Med. Imaging, vol. 17, no. 2, pp. 142-154, Feb. 1998.

[38] U.E. Ruttimann, M. Unser, D. Rio, and R.R. Rawlings, “Use of the wavelet transform to investigate differences in brain PET images between patients,” Proc. SPIE, vol. 2035, Mathematical Methods in Medical Imaging II, pp. 192-203, San Diego, CA, 1993.

[39] J. Sijbers, A.J. den Dekker, J. Van Audekerke, M. Verhoye, and D. Van Dyck, “Estimation of the noise in magnitude in MR images,” Magn. Reson. Imag., vol. 16, pp. 87-90, 1998.

[40] J. Skilling, D.R.T. Robinson, and S.F. Gull, “Probabilistic displays,” in Maximum Entropy and Bayesian Methods in Science and Engineering (W.T. Grandy and L.H. Schick, Eds.), pp. 365-368, Kluwer Academic Publishers, Laramie, Wyoming, 1991.

[41] F.T. Sommer, J.A. Hirsch, and A. Wichert, “Theories, data analysis, and simulation models in neuroimaging - An overview,” in Exploratory Analysis and Data Modeling in Functional Neuroimaging (F.T. Sommer and A. Wichert, Eds.), pp. 1-13, Neural Information Processing Series, The MIT Press, Cambridge, 2003.

[42] M. Spitzer, “Forward,” in Exploratory Analysis and Data Modeling in Functional Neuroimaging (F.T. Sommer and A. Wichert, Eds.), Neural Information Processing Series, The MIT Press, Cambridge, 2003.

[43] J. Talairach and P. Tournoux, Co-Planar Stereotaxic Atlas of the Human Brain, Thieme Medical Publishers, New York, 1988.

[44] J.W. Tukey, “The future of data analysis,” Annals of Mathematical Statistics, vol. 33, pp. 1-67, 1962.

[45] F.E. Turkheimer, M. Brett, D. Visvikis, and V.J. Cunningham, “Multiresolution analysis of emission tomography images in the wavelet domain,” Journal of Cerebral Blood Flow and Metabolism, vol. 19, no. 11, pp. 1189-1208, 1999.

[46] F.E. Turkheimer, J.A.D. Aston, and V.J. Cunningham, “On the logic of hypothesis testing in functional imaging,” European Journal of Nuclear Medicine, vol. 31, no. 5, pp. 725-732, 2004.

[47] M. Unser, P. Thevenaz, C. Lee, and U.E. Ruttimann, “Registration and statistical analysis of PET images using the wavelet transform,” IEEE Eng. Med. Biol. Mag., vol. 14, pp. 603-611, 1995.

[48] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal Processing Mag., vol. 16, pp. 22-38, Nov. 1999.

[49] M. Unser and T. Blu, “Fractional splines and wavelets,” SIAM Rev., vol. 42, pp. 43-67, 2000.

[50] D. Van De Ville, T. Blu, and M. Unser, “Surfing the brain: An overview of wavelet-based techniques for fMRI data analysis,” IEEE Engineering in Medicine and Biology Magazine, vol. 25, no. 2, pp. 65-78, Mar.-Apr. 2006.

[51] D. Van De Ville, T. Blu, and M. Unser, “On the multi-dimensional extension of the quincunx subsampling matrix,” IEEE Signal Processing Letters, vol. 12, no. 2, pp. 112-115, 2005.

[52] D. Van De Ville, T. Blu, and M. Unser, “Integrated wavelet processing and spatial statistical testing of fMRI data,” NeuroImage, vol. 23, no. 4, pp. 1472-1485, Dec. 2004.


[53] A.M. Wink and J.B.T.M. Roerdink, “Denoising functional MR images: A comparison of wavelet denoising and Gaussian smoothing,” IEEE Trans. Med. Imaging, vol. 23, no. 3, pp. 374-387, March 2004.

[54] K.J. Worsley, A.C. Evans, S. Marrett, and P. Neelin, “A three-dimensional statistical analysis for CBF activation studies in human brain,” Journal of Cerebral Blood Flow and Metabolism, vol. 12, no. 6, pp. 900-918, 1992.

[55] K.J. Worsley, “Discussion of the paper by Lange and Zeger,” Appl. Statist., vol. 46, p. 25, 1997.

[56] K.J. Worsley, S. Marrett, P. Neelin, and A.C. Evans, “Searching scale space for activation in PET images,” Hum. Brain Map., vol. 4, pp. 74-90, 1996.

[57] K.J. Worsley, “Spatial smoothing of autocorrelations to control the degrees of freedom in fMRI analysis,” OHBM, Toronto, Canada, 2005.

[58] S. Zeki, “Functional specialization in the visual cortex: the generation of separate constructs and their multistage integration,” in Signal and Sense (G.M. Edelman, W.E. Gall, and W.M. Cowan, Eds.), pp. 85-130, John Wiley, New York, 1990.

