View
224
Download
0
Tags:
Embed Size (px)
Citation preview
1
Inference on SPMs:Random Field Theory &
Alternatives
Thomas Nichols, Ph.D.Director, Modelling & Genetics
GlaxoSmithKline Clinical Imaging Centre
http://www.fmrib.ox.ac.uk/~nichols
Zurich SPM CourseFeb 12, 2009
2
realignment &motion
correctionsmoothing
normalisation
General Linear Modelmodel fittingstatistic image
Corrected thresholds & p-values
image data parameterestimates
designmatrix
anatomicalreference
kernel
StatisticalParametric Map
Thresholding &Random Field
Theory
3
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
4
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
Assessing Statistic Images
Where’s the signal?
t > 0.5t > 3.5t > 5.5
High Threshold Med. Threshold Low Threshold
Good Specificity
Poor Power(risk of false negatives)
Poor Specificity(risk of false positives)
Good Power
...but why threshold?!
6
• Don’t threshold, model the signal!– Signal location?
• Estimates and CI’s on(x,y,z) location
– Signal magnitude?• CI’s on % change
– Spatial extent?• Estimates and CI’s on activation volume
• Robust to choice of cluster definition
• ...but this requires an explicit spatial model
Blue-sky inference:What we’d like
space
Loc.̂ Ext.̂
Mag.̂
7
Blue-sky inference:What we need
• Need an explicit spatial model
• No routine spatial modeling methods exist– High-dimensional mixture modeling problem– Activations don’t look like Gaussian blobs
• Need realistic shapes, sparse – Some initial work
• Hartvig et al., Penny et al., Xu et al.
– Not part of mass-univariate framework
8
Real-life inference:What we get
• Signal location– Local maximum – no inference– Center-of-mass – no inference
• Sensitive to blob-defining-threshold
• Signal magnitude– Local maximum intensity – P-values (& CI’s)
• Spatial extent– Cluster volume – P-value, no CI’s
• Sensitive to blob-defining-threshold
9
Voxel-level Inference
• Retain voxels above -level threshold u
• Gives best spatial specificity– The null hyp. at a single voxel can be rejected
Significant Voxels
space
u
No significant Voxels
10
Cluster-level Inference
• Two step-process– Define clusters by arbitrary threshold uclus
– Retain clusters larger than -level threshold k
Cluster not significant
uclus
space
Cluster significantk k
11
Cluster-level Inference
• Typically better sensitivity
• Worse spatial specificity– The null hyp. of entire cluster is rejected– Only means that one or more of voxels in
cluster active
Cluster not significant
uclus
space
Cluster significantk k
12
Set-level Inference
• Count number of blobs c– Minimum blob size k
• Worst spatial specificity– Only can reject global null hypothesis
uclus
space
Here c = 1; only 1 cluster larger than kk k
13
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
14
• Null Hypothesis H0
• Test statistic T– t observed realization of T
level– Acceptable false positive rate
– Level = P( T>u | H0 )
– Threshold u controls false positive rate at level
• P-value– Assessment of t assuming H0
– P( T > t | H0 )• Prob. of obtaining stat. as large
or larger in a new experiment
– P(Data|Null) not P(Null|Data)
Hypothesis Testing
u
Null Distribution of T
t
P-val
Null Distribution of T
15
Multiple Testing Problem
• Which of 100,000 voxels are sig.? =0.05 5,000 false positive voxels
• Which of (random number, say) 100 clusters significant? =0.05 5 false positives clusters
t > 0.5 t > 1.5 t > 2.5 t > 3.5 t > 4.5 t > 5.5 t > 6.5
16
MTP Solutions:Measuring False Positives
• Familywise Error Rate (FWE)– Familywise Error
• Existence of one or more false positives
– FWE is probability of familywise error
• False Discovery Rate (FDR)– FDR = E(V/R)– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
17
MTP Solutions:Measuring False Positives
• Familywise Error Rate (FWE)– Familywise Error
• Existence of one or more false positives
– FWE is probability of familywise error
• False Discovery Rate (FDR)– FDR = E(V/R)– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
18
FWE MTP Solutions: Bonferroni
• For a statistic image T...– Ti ith voxel of statistic image T
...use = 0/V
0 FWE level (e.g. 0.05)
– V number of voxels
– u -level statistic threshold, P(Ti u) = Conservative under correlation
Independent: V tests
Some dep.: ? tests
Total dep.: 1 test
Conservative under correlation
Independent: V tests
Some dep.: ? tests
Total dep.: 1 test
19
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
21
FWE MTP Solutions: Controlling FWE w/ Max
• FWE & distribution of maximum
FWE = P(FWE)= P( i {Ti u} | Ho)= P( maxi Ti u | Ho)
• 100(1-)%ile of max distn controls FWEFWE = P( maxi Ti u | Ho) =
– whereu = F-1
max (1-)
. u
22
FWE MTP Solutions:Random Field Theory
• Euler Characteristic u
– Topological Measure• #blobs - #holes
– At high thresholds,just counts blobs
– FWE = P(Max voxel u | Ho)
= P(One or more blobs | Ho)
P(u 1 | Ho)
E(u | Ho)
Random Field
Suprathreshold Sets
Threshold
No holes
Never more than 1 blob
24
Random Field TheorySmoothness Parameterization
• E(u) depends on ||1/2
roughness matrix:
• Smoothness parameterized as Full Width at Half Maximum– FWHM of Gaussian kernel
needed to smooth a whitenoise random field to roughness
Autocorrelation Function
FWHM
25
• RESELS– Resolution Elements– 1 RESEL = FWHMx FWHMy FWHMz
– RESEL Count R• R = () || = (4log2)3/2 () / ( FWHMx FWHMy FWHMz ) • Volume of search region in units of smoothness
– Eg: 10 voxels, 2.5 FWHM 4 RESELS
• Beware RESEL misinterpretation– RESEL are not “number of independent ‘things’ in the image”
• See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res..
Random Field TheorySmoothness Parameterization
1 2 3 4
2 4 6 8 101 3 5 7 9
27
Random Field Intuition
• Corrected P-value for voxel value t Pc = P(max T > t)
E(t) () ||1/2 t2 exp(-t2/2)
• Statistic value t increases– Pc decreases (but only for large t)
• Search volume increases– Pc increases (more severe MTP)
• Smoothness increases (roughness ||1/2 decreases)
– Pc decreases (less severe MTP)
29
5mm FWHM
10mm FWHM
15mm FWHM
• Expected Cluster Size– E(S) = E(N)/E(L)– S cluster size– N suprathreshold volume
(T > uclus})
– L number of clusters
• E(N) = () P( T > uclus )
• E(L) E(u)
– Assuming no holes
Random Field TheoryCluster Size Tests
30
RFT Cluster Inference & Stationarity
• Problem w/ VBM– Standard RFT result assumes
stationarity, constant smoothness– Assuming stationarity, false
positive clusters will be found in extra-smooth regions
– VBM noise very non-stationary• Nonstationary cluster inference
– Must un-warp nonstationarity– Reported but not implemented
• Hayasaka et al, NeuroImage 22:676– 687, 2004
– Now available in SPM toolboxes• http://fmri.wfubmc.edu/cms/software#NS• Gaser’s VBM toolbox (w/ recent update!)
VBM:Image of FWHM Noise
Smoothness
Nonstationarynoise…
…warped to stationarity
33
Random Field Theory Limitations
• Sufficient smoothness– FWHM smoothness 3-4× voxel size (Z)– More like ~10× for low-df T images
• Smoothness estimation– Estimate is biased when images not sufficiently
smooth
• Multivariate normality– Virtually impossible to check
• Several layers of approximations• Stationary required for cluster size results
Lattice ImageData
Continuous Random Field
34
Real Data
• fMRI Study of Working Memory – 12 subjects, block design Marshuetz et al (2000)
– Item Recognition• Active:View five letters, 2s pause,
view probe letter, respond
• Baseline: View XXXXX, 2s pause,view Y or N, respond
• Second Level RFX– Difference image, A-B constructed
for each subject
– One sample t test
...
D
yes
...
UBKDA
Active
...
N
no
...
XXXXX
Baseline
35
Real Data:RFT Result
• Threshold– S = 110,776– 2 2 2 voxels
5.1 5.8 6.9 mmFWHM
– u = 9.870
• Result– 5 voxels above
the threshold– 0.0063 minimum
FWE-correctedp-value
-log 1
0 p
-va
lue
36
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
37
Nonparametric Inference• Parametric methods
– Assume distribution ofstatistic under nullhypothesis
– Needed to find P-values, u
• Nonparametric methods– Use data to find
distribution of statisticunder null hypothesis
– Any statistic!
5%
Parametric Null Distribution
5%
Nonparametric Null Distribution
44
Controlling FWE: Permutation Test
• Parametric methods– Assume distribution of
max statistic under nullhypothesis
• Nonparametric methods– Use data to find
distribution of max statisticunder null hypothesis
– Again, any max statistic!
5%
Parametric Null Max Distribution
5%
Nonparametric Null Max Distribution
45
Permutation TestOther Statistics
• Collect max distribution– To find threshold that controls FWE
• Consider smoothed variance t statistic– To regularize low-df variance estimate
46
Permutation TestSmoothed Variance t
• Collect max distribution– To find threshold that controls FWE
• Consider smoothed variance t statistic
t-statisticvariance
mean difference
47
Permutation TestSmoothed Variance t
• Collect max distribution– To find threshold that controls FWE
• Consider smoothed variance t statistic
SmoothedVariancet-statistic
mean difference
smoothedvariance
48
Permutation TestStrengths
• Requires only assumption of exchangeability– Under Ho, distribution unperturbed by permutation– Allows us to build permutation distribution
• Subjects are exchangeable– Under Ho, each subject’s A/B labels can be flipped
• fMRI scans not exchangeable under Ho– Due to temporal autocorrelation
49
Permutation TestLimitations
• Computational Intensity– Analysis repeated for each relabeling– Not so bad on modern hardware
• No analysis discussed below took more than 3 hours
• Implementation Generality– Each experimental design type needs unique
code to generate permutations• Not so bad for population inference with t-tests
50
Permutation TestExample
• fMRI Study of Working Memory – 12 subjects, block design Marshuetz et al (2000)
– Item Recognition• Active:View five letters, 2s pause,
view probe letter, respond
• Baseline: View XXXXX, 2s pause,view Y or N, respond
• Second Level RFX– Difference image, A-B constructed
for each subject
– One sample, smoothed variance t test
...
D
yes
...
UBKDA
Active
...
N
no
...
XXXXX
Baseline
51
Permutation TestExample
• Permute!– 212 = 4,096 ways to flip 12 A/B labels– For each, note maximum of t image.
Permutation DistributionMaximum t
Maximum Intensity Projection Thresholded t
52
Permutation TestExample
• Compare with Bonferroni = 0.05/110,776
• Compare with parametric RFT– 110,776 222mm voxels– 5.15.86.9mm FWHM smoothness– 462.9 RESELs
53
t11 Statistic, RF & Bonf. Thresholdt11 Statistic, Nonparametric Threshold
uRF = 9.87uBonf = 9.805 sig. vox.
uPerm = 7.67
58 sig. vox.
Smoothed Variance t Statistic,Nonparametric Threshold
378 sig. vox.
Test Level vs. t11 Threshold
56
Reliability with Small Groups
• Consider n=50 group study– Event-related Odd-Ball paradigm, Kiehl, et al.
• Analyze all 50– Analyze with SPM and SnPM, find FWE thresh.
• Randomly partition into 5 groups 10– Analyze each with SPM & SnPM, find FWE
thresh
• Compare reliability of small groups with full– With and without variance smoothing.Skip
57
SPM t11: 5 groups of 10 vs all 505% FWE Threshold
10 subj 10 subj 10 subj
10 subj 10 subj all 50
T>10.93 T>11.04 T>11.01
T>10.69 T>10.10 T>4.66
2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45
4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49
58
T>4.09
SnPM t: 5 groups of 10 vs. all 505% FWE Threshold
10 subj 10 subj 10 subj
10 subj 10 subj
T>7.06 T>8.28 T>6.3
T>6.49 T>6.19
Arbitrary thresh of 9.0
T>9.00
all 50
2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45
4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49
59
10 subj 10 subj 10 subj
10 subj 10 subj
Arbitrary thresh of 9.0
T>9.00
all 50
SnPM SmVar t: 5 groups of 10 vs. all 505% FWE Threshold
T>4.69 T>5.04 T>4.57
T>4.84 T>4.64
2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45
4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49
60
Outline
• Orientation
• Assessing Statistic images
• The Multiple Testing Problem
• Random Field Theory & FWE
• Permutation & FWE
• False Discovery Rate
61
MTP Solutions:Measuring False Positives
• Familywise Error Rate (FWE)– Familywise Error
• Existence of one or more false positives
– FWE is probability of familywise error
• False Discovery Rate (FDR)– FDR = E(V/R)– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
63
False Discovery RateIllustration:
Signal
Signal+Noise
Noise
64
FWE
6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%
Control of Familywise Error Rate at 10%
11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%
Control of Per Comparison Rate at 10%
Percentage of Null Pixels that are False Positives
Control of False Discovery Rate at 10%
Occurrence of Familywise Error
Percentage of Activated Pixels that are False Positives
65
Benjamini & HochbergProcedure
• Select desired limit q on FDR• Order p-values, p(1) p(2) ... p(V)
• Let r be largest i such that
• Reject all hypotheses corresponding to p(1), ... , p(r).
p(i) i/V qp(i)
i/V
i/V qp-
valu
e
0 1
01
JRSS-B (1995)57:289-300
66
0 0.2 0.4 0.6 0.8 1
0
0.2
0.4
0.6
0.8
1
P-value threshold when no signal:
/V
P-value thresholdwhen allsignal:
Ord
ered
p-v
alue
s p
(i)
Fractional index i/V
Adaptiveness of Benjamini & Hochberg FDR
67
Benjamini & Hochberg Procedure Details
• Standard Result– Positive Regression Dependency on Subsets
P(X1c1, X2c2, ..., Xkck | Xi=xi) is non-decreasing in xi
• Only required of null xi’s– Positive correlation between null voxels– Positive correlation between null and signal voxels
• Special cases include– Independence– Multivariate Normal with all positive correlations
• Arbitrary covariance structure– Replace q by q/c(V),
c(V) = i=1,...,V 1/i log(V)+0.5772– Much more stringent
Benjamini &Yekutieli (2001).Ann. Stat.29:1165-1188
69
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 1.0 Noise Smoothness 3.0
p = z =
1
70
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 2.0 Noise Smoothness 3.0
p = z =
2
71
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 3.0 Noise Smoothness 3.0
p = z =
3
72
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 5.0 Noise Smoothness 3.0
p = 0.000252 z = 3.48
4
73
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 9.5 Noise Smoothness 3.0
p = 0.001628 z = 2.94
5
74
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 16.5 Noise Smoothness 3.0
p = 0.007157 z = 2.45
6
75
Controlling FDR:Varying Signal Extent
Signal Intensity 3.0 Signal Extent 25.0 Noise Smoothness 3.0
p = 0.019274 z = 2.07
7
77FWE Perm. Thresh. = 9.877 voxels
Real Data: FDR Example
FDR Threshold = 3.833,073 voxels
• Threshold– Indep/PosDep
u = 3.83– Arb Cov
u = 13.15
• Result– 3,073 voxels above
Indep/PosDep u– <0.0001 minimum
FDR-correctedp-value
78
Conclusions
• Must account for multiplicity– Otherwise have a fishing expedition
• FWE– Very specific, not so sensitive– Random Field Theory
• Great for single-subject fMRI, EEG
– Nonparametric / SnPM• Much better power voxel-wise than RFT for small DF
• FDR– Less specific, more sensitive– Interpret with care!
• FP risk is over whole set of surviving voxels
79
References
• Most of this talk covered in these papers
TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.
TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.