1 Inference on SPMs: Random Field Theory & Alternatives Thomas Nichols, Ph.D. Director, Modelling & Genetics GlaxoSmithKline Clinical Imaging Centre nichols

1

Inference on SPMs:Random Field Theory &

Alternatives

Thomas Nichols, Ph.D.Director, Modelling & Genetics

GlaxoSmithKline Clinical Imaging Centre

http://www.fmrib.ox.ac.uk/~nichols

Zurich SPM CourseFeb 12, 2009

2

realignment &motion

correctionsmoothing

normalisation

General Linear Modelmodel fittingstatistic image

Corrected thresholds & p-values

image data parameterestimates

designmatrix

anatomicalreference

kernel

StatisticalParametric Map

Thresholding &Random Field

Theory

3

Outline

• Orientation

• Assessing Statistic images

• The Multiple Testing Problem

• Random Field Theory & FWE

• Permutation & FWE

• False Discovery Rate

4

Outline

• Orientation






Assessing Statistic Images

Where’s the signal?

t > 0.5t > 3.5t > 5.5

High Threshold Med. Threshold Low Threshold

Good Specificity

Poor Power(risk of false negatives)

Poor Specificity(risk of false positives)

Good Power

...but why threshold?!

6

• Don’t threshold, model the signal!– Signal location?

• Estimates and CI’s on(x,y,z) location

– Signal magnitude?• CI’s on % change

– Spatial extent?• Estimates and CI’s on activation volume

• Robust to choice of cluster definition

• ...but this requires an explicit spatial model

Blue-sky inference:What we’d like

space

Loc.̂ Ext.̂

Mag.̂

7

Blue-sky inference:What we need

• Need an explicit spatial model

• No routine spatial modeling methods exist– High-dimensional mixture modeling problem– Activations don’t look like Gaussian blobs

• Need realistic shapes, sparse – Some initial work

• Hartvig et al., Penny et al., Xu et al.

– Not part of mass-univariate framework

8

Real-life inference:What we get

• Signal location– Local maximum – no inference– Center-of-mass – no inference

• Sensitive to blob-defining-threshold

• Signal magnitude– Local maximum intensity – P-values (& CI’s)

• Spatial extent– Cluster volume – P-value, no CI’s

• Sensitive to blob-defining-threshold

9

Voxel-level Inference

• Retain voxels above -level threshold u

• Gives best spatial specificity– The null hyp. at a single voxel can be rejected

Significant Voxels

space

u

No significant Voxels

10

Cluster-level Inference

• Two step-process– Define clusters by arbitrary threshold uclus

– Retain clusters larger than -level threshold k

Cluster not significant

uclus

space

Cluster significantk k

11

Cluster-level Inference

• Typically better sensitivity

• Worse spatial specificity– The null hyp. of entire cluster is rejected– Only means that one or more of voxels in

cluster active

Cluster not significant

uclus

space

Cluster significantk k

12

Set-level Inference

• Count number of blobs c– Minimum blob size k

• Worst spatial specificity– Only can reject global null hypothesis

uclus

space

Here c = 1; only 1 cluster larger than kk k

13

Outline

• Orientation






14

• Null Hypothesis H0

• Test statistic T– t observed realization of T

level– Acceptable false positive rate

– Level = P( T>u | H0 )

– Threshold u controls false positive rate at level

• P-value– Assessment of t assuming H0

– P( T > t | H0 )• Prob. of obtaining stat. as large

or larger in a new experiment

– P(Data|Null) not P(Null|Data)

Hypothesis Testing

u

Null Distribution of T

t

P-val

Null Distribution of T

15

Multiple Testing Problem

• Which of 100,000 voxels are sig.? =0.05 5,000 false positive voxels

• Which of (random number, say) 100 clusters significant? =0.05 5 false positives clusters

t > 0.5 t > 1.5 t > 2.5 t > 3.5 t > 4.5 t > 5.5 t > 6.5

16

MTP Solutions:Measuring False Positives

• Familywise Error Rate (FWE)– Familywise Error

• Existence of one or more false positives

– FWE is probability of familywise error

• False Discovery Rate (FDR)– FDR = E(V/R)– R voxels declared active, V falsely so

• Realized false discovery rate: V/R

17







18

FWE MTP Solutions: Bonferroni

• For a statistic image T...– Ti ith voxel of statistic image T

...use = 0/V

0 FWE level (e.g. 0.05)

– V number of voxels

– u -level statistic threshold, P(Ti u) = Conservative under correlation

Independent: V tests

Some dep.: ? tests

Total dep.: 1 test

Conservative under correlation

Independent: V tests

Some dep.: ? tests

Total dep.: 1 test

19

Outline

• Orientation






21

FWE MTP Solutions: Controlling FWE w/ Max

• FWE & distribution of maximum

FWE = P(FWE)= P( i {Ti u} | Ho)= P( maxi Ti u | Ho)

• 100(1-)%ile of max distn controls FWEFWE = P( maxi Ti u | Ho) =

– whereu = F-1

max (1-)

. u

22

FWE MTP Solutions:Random Field Theory

• Euler Characteristic u

– Topological Measure• #blobs - #holes

– At high thresholds,just counts blobs

– FWE = P(Max voxel u | Ho)

= P(One or more blobs | Ho)

P(u 1 | Ho)

E(u | Ho)

Random Field

Suprathreshold Sets

Threshold

No holes

Never more than 1 blob

24

Random Field TheorySmoothness Parameterization

• E(u) depends on ||1/2

roughness matrix:

• Smoothness parameterized as Full Width at Half Maximum– FWHM of Gaussian kernel

needed to smooth a whitenoise random field to roughness

Autocorrelation Function

FWHM

25

• RESELS– Resolution Elements– 1 RESEL = FWHMx FWHMy FWHMz

– RESEL Count R• R = () || = (4log2)3/2 () / ( FWHMx FWHMy FWHMz ) • Volume of search region in units of smoothness

– Eg: 10 voxels, 2.5 FWHM 4 RESELS

• Beware RESEL misinterpretation– RESEL are not “number of independent ‘things’ in the image”

• See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res..

Random Field TheorySmoothness Parameterization

1 2 3 4

2 4 6 8 101 3 5 7 9

27

Random Field Intuition

• Corrected P-value for voxel value t Pc = P(max T > t)

E(t) () ||1/2 t2 exp(-t2/2)

• Statistic value t increases– Pc decreases (but only for large t)

• Search volume increases– Pc increases (more severe MTP)

• Smoothness increases (roughness ||1/2 decreases)

– Pc decreases (less severe MTP)

29

5mm FWHM

10mm FWHM

15mm FWHM

• Expected Cluster Size– E(S) = E(N)/E(L)– S cluster size– N suprathreshold volume

(T > uclus})

– L number of clusters

• E(N) = () P( T > uclus )

• E(L) E(u)

– Assuming no holes

Random Field TheoryCluster Size Tests

30

RFT Cluster Inference & Stationarity

• Problem w/ VBM– Standard RFT result assumes

stationarity, constant smoothness– Assuming stationarity, false

positive clusters will be found in extra-smooth regions

– VBM noise very non-stationary• Nonstationary cluster inference

– Must un-warp nonstationarity– Reported but not implemented

• Hayasaka et al, NeuroImage 22:676– 687, 2004

– Now available in SPM toolboxes• http://fmri.wfubmc.edu/cms/software#NS• Gaser’s VBM toolbox (w/ recent update!)

VBM:Image of FWHM Noise

Smoothness

Nonstationarynoise…

…warped to stationarity

33

Random Field Theory Limitations

• Sufficient smoothness– FWHM smoothness 3-4× voxel size (Z)– More like ~10× for low-df T images

• Smoothness estimation– Estimate is biased when images not sufficiently

smooth

• Multivariate normality– Virtually impossible to check

• Several layers of approximations• Stationary required for cluster size results

Lattice ImageData

Continuous Random Field

34

Real Data

• fMRI Study of Working Memory – 12 subjects, block design Marshuetz et al (2000)

– Item Recognition• Active:View five letters, 2s pause,

view probe letter, respond

• Baseline: View XXXXX, 2s pause,view Y or N, respond

• Second Level RFX– Difference image, A-B constructed

for each subject

– One sample t test

...

D

yes

...

UBKDA

Active

...

N

no

...

XXXXX

Baseline

35

Real Data:RFT Result

• Threshold– S = 110,776– 2 2 2 voxels

5.1 5.8 6.9 mmFWHM

– u = 9.870

• Result– 5 voxels above

the threshold– 0.0063 minimum

FWE-correctedp-value

-log 1

0 p

-va

lue

36

Outline

• Orientation






37

Nonparametric Inference• Parametric methods

– Assume distribution ofstatistic under nullhypothesis

– Needed to find P-values, u

• Nonparametric methods– Use data to find

distribution of statisticunder null hypothesis

– Any statistic!

5%

Parametric Null Distribution

5%

Nonparametric Null Distribution

44

Controlling FWE: Permutation Test

• Parametric methods– Assume distribution of

max statistic under nullhypothesis

• Nonparametric methods– Use data to find

distribution of max statisticunder null hypothesis

– Again, any max statistic!

5%

Parametric Null Max Distribution

5%

Nonparametric Null Max Distribution

45

Permutation TestOther Statistics

• Collect max distribution– To find threshold that controls FWE

• Consider smoothed variance t statistic– To regularize low-df variance estimate

46

Permutation TestSmoothed Variance t


• Consider smoothed variance t statistic

t-statisticvariance

mean difference

47

Permutation TestSmoothed Variance t


• Consider smoothed variance t statistic

SmoothedVariancet-statistic

mean difference

smoothedvariance

48

Permutation TestStrengths

• Requires only assumption of exchangeability– Under Ho, distribution unperturbed by permutation– Allows us to build permutation distribution

• Subjects are exchangeable– Under Ho, each subject’s A/B labels can be flipped

• fMRI scans not exchangeable under Ho– Due to temporal autocorrelation

49

Permutation TestLimitations

• Computational Intensity– Analysis repeated for each relabeling– Not so bad on modern hardware

• No analysis discussed below took more than 3 hours

• Implementation Generality– Each experimental design type needs unique

code to generate permutations• Not so bad for population inference with t-tests

50

Permutation TestExample

• fMRI Study of Working Memory – 12 subjects, block design Marshuetz et al (2000)

– Item Recognition• Active:View five letters, 2s pause,

view probe letter, respond

• Baseline: View XXXXX, 2s pause,view Y or N, respond

• Second Level RFX– Difference image, A-B constructed

for each subject

– One sample, smoothed variance t test

...

D

yes

...

UBKDA

Active

...

N

no

...

XXXXX

Baseline

51


• Permute!– 212 = 4,096 ways to flip 12 A/B labels– For each, note maximum of t image.

Permutation DistributionMaximum t

Maximum Intensity Projection Thresholded t

52


• Compare with Bonferroni = 0.05/110,776

• Compare with parametric RFT– 110,776 222mm voxels– 5.15.86.9mm FWHM smoothness– 462.9 RESELs

53

t11 Statistic, RF & Bonf. Thresholdt11 Statistic, Nonparametric Threshold

uRF = 9.87uBonf = 9.805 sig. vox.

uPerm = 7.67

58 sig. vox.

Smoothed Variance t Statistic,Nonparametric Threshold

378 sig. vox.

Test Level vs. t11 Threshold

56

Reliability with Small Groups

• Consider n=50 group study– Event-related Odd-Ball paradigm, Kiehl, et al.

• Analyze all 50– Analyze with SPM and SnPM, find FWE thresh.

• Randomly partition into 5 groups 10– Analyze each with SPM & SnPM, find FWE

thresh

• Compare reliability of small groups with full– With and without variance smoothing.Skip

57

SPM t11: 5 groups of 10 vs all 505% FWE Threshold

10 subj 10 subj 10 subj

10 subj 10 subj all 50

T>10.93 T>11.04 T>11.01

T>10.69 T>10.10 T>4.66

2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45

4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

58

T>4.09

SnPM t: 5 groups of 10 vs. all 505% FWE Threshold


10 subj 10 subj

T>7.06 T>8.28 T>6.3

T>6.49 T>6.19

Arbitrary thresh of 9.0

T>9.00

all 50

2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45

4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

59


10 subj 10 subj

Arbitrary thresh of 9.0

T>9.00

all 50

SnPM SmVar t: 5 groups of 10 vs. all 505% FWE Threshold

T>4.69 T>5.04 T>4.57

T>4.84 T>4.64

2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45

4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

60

Outline

• Orientation






61







63

False Discovery RateIllustration:

Signal

Signal+Noise

Noise

64

FWE

6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%

Control of Familywise Error Rate at 10%

11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%

Control of Per Comparison Rate at 10%

Percentage of Null Pixels that are False Positives

Control of False Discovery Rate at 10%

Occurrence of Familywise Error

Percentage of Activated Pixels that are False Positives

65

Benjamini & HochbergProcedure

• Select desired limit q on FDR• Order p-values, p(1) p(2) ... p(V)

• Let r be largest i such that

• Reject all hypotheses corresponding to p(1), ... , p(r).

p(i) i/V qp(i)

i/V

i/V qp-

valu

e

0 1

01

JRSS-B (1995)57:289-300

66

0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6

0.8

1

P-value threshold when no signal:

/V

P-value thresholdwhen allsignal:

Ord

ered

p-v

alue

s p

(i)

Fractional index i/V

Adaptiveness of Benjamini & Hochberg FDR

67

Benjamini & Hochberg Procedure Details

• Standard Result– Positive Regression Dependency on Subsets

P(X1c1, X2c2, ..., Xkck | Xi=xi) is non-decreasing in xi

• Only required of null xi’s– Positive correlation between null voxels– Positive correlation between null and signal voxels

• Special cases include– Independence– Multivariate Normal with all positive correlations

• Arbitrary covariance structure– Replace q by q/c(V),

c(V) = i=1,...,V 1/i log(V)+0.5772– Much more stringent

Benjamini &Yekutieli (2001).Ann. Stat.29:1165-1188

69

Controlling FDR:Varying Signal Extent

Signal Intensity 3.0 Signal Extent 1.0 Noise Smoothness 3.0

p = z =

1

70



p = z =

2

71



p = z =

3

72



p = 0.000252 z = 3.48

4

73



p = 0.001628 z = 2.94

5

74



p = 0.007157 z = 2.45

6

75



p = 0.019274 z = 2.07

7

77FWE Perm. Thresh. = 9.877 voxels

Real Data: FDR Example

FDR Threshold = 3.833,073 voxels

• Threshold– Indep/PosDep

u = 3.83– Arb Cov

u = 13.15

• Result– 3,073 voxels above

Indep/PosDep u– <0.0001 minimum

FDR-correctedp-value

78

Conclusions

• Must account for multiplicity– Otherwise have a fishing expedition

• FWE– Very specific, not so sensitive– Random Field Theory

• Great for single-subject fMRI, EEG

– Nonparametric / SnPM• Much better power voxel-wise than RFT for small DF

• FDR– Less specific, more sensitive– Interpret with care!

• FP risk is over whole set of surviving voxels

79

References

• Most of this talk covered in these papers

TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.

TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.

CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.

Documents

1 Inference on SPMs: Random Field Theory & Alternatives Thomas Nichols, Ph.D. Director, Modelling & Genetics GlaxoSmithKline Clinical Imaging Centre nichols