Upload
jeremy-davis
View
223
Download
1
Embed Size (px)
Citation preview
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Thresholding and multiple comparisons
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Information for making inferences on activation
• Where? Signal location– Local maximum – no inference in SPM
• Could extract peak coordinates and test (e.g., Woods lab, Ploghaus, 1999)
• How strong? Signal magnitude– Local contrast intensity – Main thing tested in SPM
• How large? Spatial extent– Cluster volume – Can get p-values from SPM
• Sensitive to blob-defining-threshold
• When? Signal timing– No inference in SPM; but see Aguirre 1998; Bellgowan 2003;
Miezin et al. 2000, Lindquist & Wager, 2007
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Three “levels” of inference
• Voxel-level– Most spatially specific, least sensitive
• Cluster-level– Less spatially specific, more sensitive
• Set-level– No spatial specificity (no locatization)– Can be most sensitive
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Voxel-level Inference
• Retain voxels above α-level threshold uα
• Gives best spatial specificity– The null hyp. at a single voxel can be
rejected
Significant Voxels
space
uα
No significant Voxels
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Cluster-level Inference
• Two step-process– Define clusters by arbitrary threshold
uclus
– Retain clusters larger than α-level threshold kα
Cluster not significant
uclus
space
Cluster significantkα kα
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Cluster-level Inference
• Typically better sensitivity• Worse spatial specificity
– The null hyp. of entire cluster is rejected
– Only means that one or more voxels in cluster active
Cluster not significant
uclus
space
Cluster significantkα kα
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Multiple comparisons…
SCANLab
http://www.columbia.edu/cu/psychology/tor/
• Null Hypothesis H0
• Test statistic T– t observed realization of T
• α level– Acceptable false positive rate
– Level α = P( T>uα | H0 )
– Threshold uα controls false positive rate at level α
• P-value– Assessment of t assuming H0
– P( T > t | H0 )• Prob. of obtaining stat. as large
or larger in a new experiment
– P(Data|Null) not P(Null|Data)
Hypothesis Testing
uα
α
Null Distribution of T
t
P-val
Null Distribution of T
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Multiple Comparisons Problem
• Which of 100,000 voxels are significant?– Expected false positives for independent
tests:
– αNtests
– α=0.05 : 5,000 false positive voxels
• Which of (e.g.) 100 clusters are significant?– α=0.05 : 5 false positives clusters
• Solution: determine a threshold that controls the false positive rate (α) per map (across all voxels) rather than per voxel
SCANLab
http://www.columbia.edu/cu/psychology/tor/
=0.10 =0.01 =0.001
Note: All images smoothed with FWHM=12mm
Why correct?
• Without multiple comparisons correction, false positive rate in whole-brain analysis is high.• Using typical arbitrary height and extent threshold is too liberal! (e.g., p < .001 and 10 voxels; Forman et al., 1995).• This is because data is spatially smooth, and false positives tend to be “blobs” of many voxels:
From Wager, Lindquist, & Hernandez, 2009. “Essentials of functional neuroimaging.” In: Handbook of Neuroscience for the Behavioral Sciences
True signal inside red square, no signal outside. White is “activation”
SCANLab
http://www.columbia.edu/cu/psychology/tor/
MCP Solutions:Measuring False Positives
• Familywise Error Rate (FWER)
• False Discovery Rate (FDR)
SCANLab
http://www.columbia.edu/cu/psychology/tor/
False Discovery Rate (FDR): Executive summary
FDR = Expected proportion of reported positives that are false positives• (We don’t know what the actual number of false
positives is, just the expected number.)
• Easy to calculate based on p-value map (Benjamini and Hochberg, 1995)
• Always more liberal than FWER control• Adaptive threshold: somewhere between
uncorrected and FWER, depending on signal
• Can claim that most results are likely to be true positives, but not which ones are false
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Family-wise Error Rate vs. False Discovery Rate
Illustration:
Signal
Signal+Noise
Noise
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWE
6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%
Control of Familywise Error Rate at 10%
11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%
Control of Per Comparison Rate at 10%
Percentage of Null Pixels that are False Positives
Control of False Discovery Rate at 10%
Occurrence of Familywise Error
Percentage of Activated Pixels that are False Positives
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Controlling FWE withBonferroni…
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWE MCP Solutions: Bonferroni
• For a statistic image T...– Ti ith voxel of statistic image T, where i = 1…V
• ...use α = αc/V– αc desired FWER level (e.g. 0.05)– α new alpha level to achieve αc corrected
– V number of voxels
• Example: for p < .05 and V = 1000 tests, use p < .05/1000 = .00005.
• Assumes independent tests (voxels) – but voxels are generally positively correlated (images are smooth)!
Conservative under correlation
Independent: V tests
Some dep.: ? tests
Total dep.: 1 test
Conservative under correlation
Independent: V tests
Some dep.: ? tests
Total dep.: 1 testNote: most of the time, this is better:αc = 1 - (1-α)^V, α = 1 – (1-αc) ^ (1/V)See Shaffer, Ann. Rev. Psych. 1995
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Controlling FWE withRandom field theory…
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWER MCP Solutions: Controlling FWER w/ Max
• FWER is the probability of finding any false positive result in the entire brain map– If any voxel is a false positive, the voxel with
the max statistic value (lowest p-value) will be a false positive
– Can assess the probability that the max stat value under the null hypothesis exceeds the threshold
– FWER is the threshold that controls the probability of the max exceeding that threshold at α uα
α
Distribution of maxima
Threshold such that the
max only exceeds it α%
of the time
How to get the distribution of maxima?GRF!
SCANLab
http://www.columbia.edu/cu/psychology/tor/
SPM approach:Random fields…SPM approach:Random fields…
• Consider statistic image as lattice representation of a continuous random field
• Use results from continuous random field theory
• Consider statistic image as lattice representation of a continuous random field
• Use results from continuous random field theory
lattice representation
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWER MCP Solutions:Random Field Theory
• Euler Characteristic χu
– Topological Measure• #blobs - #holes
– At high thresholds,just counts blobs
– FWER = P(Max voxel χ u | Ho)
= P(One or more blobs | Ho)
= P(χu > 1 | Ho)
= E(χu | Ho)
Random Field
Suprathreshold Sets
Threshold
IF No holes
IF Never more than 1 blob
How high does the threshold need to be? ~p<.001?
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Random Field Intuition
• Corrected P-value for voxel value t depends on– Volume of search area– Roughness of data in search area– The t-value in any given voxel tested
• Statistic value t increases– Pc decreases (but only for large t)
• Search volume increases– Pc increases (more severe MCP)
• Roughness increases (Smoothness decreases)– Pc increases (more severe MCP)
SCANLab
http://www.columbia.edu/cu/psychology/tor/
• RESELS = Resolution Elements– 1 RESEL = FWHMx x FWHMy x FWHMz
– Volume of search region in units of smoothness– Eg: 10 voxels, 2.5 FWHM = 4 RESELS
• Beware RESEL misinterpretation– The RESEL count isnot the number of independent
tests in image– See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res.– Do not Bonferroni correct based on resel count.
Random Field TheorySmoothness Parameterization
1 2 3 4
2 4 6 8 101 3 5 7 9
SCANLab
http://www.columbia.edu/cu/psychology/tor/
5mm FWHM
10mm FWHM
15mm FWHM
• Estimate distribution of cluster sizes (mean and variance)
• Mean: Expected Cluster Size– Expected supra-threshold volume /
expected number of clusters= E(S) / E(L)
= E(L) = E(χu), expected Euler characteristic, assuming no holes (high threshold)
This provides uncorrected p-values: Chances of getting a blob of >= k voxels under the null hypothesis
…which are then corrected:Pc = E(L) Puncorr
Random Field TheoryCluster Size Tests
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Random Field Theory Assumptions and Limitations
• Sufficient smoothness– FWHM smoothness 3-4× voxel size (Z)– More like ~10× for low-df T images
• Smoothness estimation– Estimate is biased when images not
sufficiently smooth
• Multivariate normality– Virtually impossible to check
• Several layers of approximations• Stationarity required for cluster size
results– Cluster size results tend to be too
lenient in practice!
Lattice ImageData
Continuous Random Field
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWER correctionStatistical Nonparametric Mapping -
SnPM
Thresholding without (m)any assumptions
Almost all slides from Tom Nichols
SCANLab
http://www.columbia.edu/cu/psychology/tor/28
Nonparametric Inference
• Parametric methods– Assume distribution of
statistic under nullhypothesis
– Needed to find P-values, u
• Nonparametric methods– Use data to find
distribution of statisticunder null hypothesis
– Any statistic!
5%
Parametric Null Distribution
5%
Nonparametric Null Distribution
SCANLab
http://www.columbia.edu/cu/psychology/tor/29
Permutation TestToy Example
• Data from V1 voxel in visual stim. experimentA: Active, flashing checkerboard B: Baseline,
fixation6 blocks, ABABAB Just consider block averages...
• Null hypothesis Ho – No experimental effect, A & B labels arbitrary
• Statistic– Mean difference
A B A B A B
103.00 90.48 99.93 87.83 99.76 96.06
SCANLab
http://www.columbia.edu/cu/psychology/tor/30
Permutation TestToy Example
• Under Ho
– Consider all equivalent relabelings
AAABBB ABABAB BAAABB BABBAA
AABABB ABABBA BAABAB BBAAAB
AABBAB ABBAAB BAABBA BBAABA
AABBBA ABBABA BABAAB BBABAA
ABAABB ABBBAA BABABA BBBAAA
SCANLab
http://www.columbia.edu/cu/psychology/tor/31
Permutation TestToy Example
• Under Ho
– Consider all equivalent relabelings– Compute all possible statistic values
AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86
AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15
AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67
AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25
ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
SCANLab
http://www.columbia.edu/cu/psychology/tor/32
Permutation TestToy Example
• Under Ho
– Consider all equivalent relabelings– Compute all possible statistic values– Find 95%ile of permutation
distribution
AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86
AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15
AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67
AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25
ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
SCANLab
http://www.columbia.edu/cu/psychology/tor/33
Permutation TestToy Example
• Under Ho
– Consider all equivalent relabelings– Compute all possible statistic values– Find 95%ile of permutation
distribution
AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86
AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15
AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67
AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25
ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
SCANLab
http://www.columbia.edu/cu/psychology/tor/34
Permutation TestToy Example
• Under Ho
– Consider all equivalent relabelings– Compute all possible statistic values– Find 95%ile of permutation
distribution
0 4 8-4-8
SCANLab
http://www.columbia.edu/cu/psychology/tor/35
Permutation TestStrengths
• Requires only assumption of exchangeability:– Under Ho, distribution unperturbed by
permutation
• Subjects are exchangeable (good for group analysis)– Under Ho, each subject’s A/B labels can be
flipped
• fMRI scans not exchangeable under Ho (bad for time series analysis/single subject)– Due to temporal autocorrelation
SCANLab
http://www.columbia.edu/cu/psychology/tor/36
Controlling FWER: Permutation Test
• Parametric methods– Assume distribution of
max statistic under nullhypothesis
• Nonparametric methods– Use data to find
distribution of max statisticunder null hypothesis
– Again, any max statistic!
5%
Parametric Null Max Distribution
5%
Nonparametric Null Max Distribution
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Results:RFT vs Bonf. vs Perm.
t Threshold (0.05 Corrected)
df RF Bonf Perm Verbal Fluency 4 4701.32 42.59 10.14 Location Switching 9 11.17 9.07 5.83 Task Switching 9 10.79 10.35 5.10 Faces: Main Effect 11 10.43 9.07 7.92 Faces: Interaction 11 10.70 9.07 8.26 Item Recognition 11 9.87 9.80 7.67 Visual Motion 11 11.07 8.92 8.40 Emotional Pictures 12 8.48 8.41 7.15 Pain: Warn ing 22 5.93 6.05 4.99 Pain: Anticipation 22 5.87 6.05 5.05
SCANLab
http://www.columbia.edu/cu/psychology/tor/
FWER Correction: Performance summary
• RF & Perm adapt to smoothness
• Perm & Truth close
• Bonferroni close to truth for low smoothness
19 df
more
FamilywiseErrorThresholds
Nichols and Hayasaka, 2003
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Permutation vs. Bonferroni: Power curves
d = 2
d = 1
d = 0.5
d = 2
d = 1
d = 0.5
From Wager, Lindquist, & Hernandez, 2009. “Essentials of functional neuroimaging.” In: Handbook of Neuroscience for the Behavioral Sciences
• Bonferroni and SPM’s GRF correction do not account accurately for spatial smoothness with the sample sizes and smoothness values typical in imaging experiments.• Nonparametric tests are more accurate and usually more sensitive.
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Performance Summary
• Bonferroni– Not adaptive to smoothness– Not so conservative for low
smoothness• Random Field
– Adaptive– Conservative for low smoothness & df
• Permutation– Adaptive (Exact)
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Controlling the False Discovery Rate…
SCANLab
http://www.columbia.edu/cu/psychology/tor/
MCP Solutions:Measuring False Positives
• Familywise Error Rate (FWER)– Familywise Error
• Existence of one or more false positives
– FWER is probability of familywise error
• False Discovery Rate (FDR)– FDR = E(V/R)– R voxels declared active, V falsely so
• Realized false discovery rate: V/R
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Benjamini & HochbergProcedure
• Select desired limit q on FDR (e.g., 0.05)• Order p-values, p(1) < p(2) < ... < p(V)
• Let r be largest i such that
• Reject all hypotheses corresponding to p(1), ... , p(r).
• Example:– 1000 Voxels, q = .05– Keep those above P < .05*i/100
p(i) < (i/V)qp(i)
i/V
(i/V)qp-
valu
e
0 1
01
JRSS-B (1995)57:289-300
• Under positive dependence; OK for fMRI
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Benjamini & Hochberg:Key Properties
• FDR is controlled E(FDP) = q m0/V– Conservative, if large fraction of nulls false
• Adaptive– Threshold depends on amount of signal
• More signal, More small p-values,More results with only alpha % expected false discoveries
• Hard to find signal in small areas, even with low p-values
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Conclusions
• Must account for multiple comparisons– Otherwise have a fishing expedition
• FWER– Very specific, not very sensitive
• FDR– Less specific, more sensitive– Adaptive
• Good: Power increases w/ amount of signal• Bad: Number of false positives increases too
SCANLab
http://www.columbia.edu/cu/psychology/tor/
References
• Most of this talk covered in these papers
TE Nichols & S Hayasaka, Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12(5): 419-446, 2003.
TE Nichols & AP Holmes, Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2001.
CR Genovese, N Lazar & TE Nichols, Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.
SCANLab
http://www.columbia.edu/cu/psychology/tor/
End of this section.
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Set-level Inference
• Count number of blobs c – c depends on by uclus & blob size k
threshold
• Worst spatial specificity– Only can reject global null hypothesis
uclus
space
Here c = 1; only 1 cluster larger than kk k
SCANLab
http://www.columbia.edu/cu/psychology/tor/
Random Field TheoryCluster Size Corrected P-Values
• Previous results give uncorrected P-value
• Corrected P-valueIf E(L) is expected number of clusters:
– Bonferroni• Correct for expected number of clusters• Corrected Pc = E(L) Puncorr
– Poisson Clumping Heuristic (Adler, 1980)• Corrected Pc = 1 - exp( -E(L) Puncorr )