
Thresholding using FEAT
David Field

Thanks to Tom Johnstone, Jason Gledhill, and FMRIB.

Overview
- What is being thresholded?
- Multiple comparisons problem in FMRI
- Dealing with the multiple comparisons problem
- FWE control and other approaches
- Reproducibility of FMRI experiments
- Writing FSL scripts and batch files in Linux

Thresholding: the starting point
Each COPE is divided by its standard error to produce a volume of t statistics.

- t is a measure of estimated effect size relative to the degree of uncertainty in that estimate.
- A large t arises from a large effect size, from a small amount of uncertainty (measurement error, individual variation and noise), or from both at once.
- FSL converts t to z prior to thresholding.
- z is more convenient, but for large N, z and t are equivalent anyway.
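The COPE-to-t-to-z conversion described above can be sketched in a few lines. This is a minimal illustration of the idea, not FSL's internal code; the array shapes and degrees of freedom are made up.

```python
# Sketch: convert a volume of t statistics to z statistics by matching
# upper-tail probabilities, as described above (toy data, assumed dof).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cope = rng.normal(size=(4, 4, 4))   # contrast estimates (toy data)
se = np.full_like(cope, 0.5)        # standard errors (toy data)
dof = 20                            # residual degrees of freedom (assumed)

t = cope / se                       # t statistic at each voxel
# Find the z with the same upper-tail probability as each t
z = stats.norm.isf(stats.t.sf(t, df=dof))
```

For large degrees of freedom the conversion barely changes the values, which is the sense in which z and t are "equivalent anyway" for large N.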

Intuitive thresholding
- When the COPE exceeds the error and noise, z > 1.
- If z >> 1, there is probably an effect of interest present.
- Open an unthresholded zstat image in FSLVIEW and threshold it manually.
- Note that negative values of z have the same interpretation, except that the COPE value is negative, so the direction of the effect is reversed.
- Conventionally, to look at these negative values you reverse the COPE to make them positive (e.g. -1 instead of 1).

Formal thresholding: converting z to a p value
- Under the null hypothesis, the expected value of the COPE would be 0 with some error/noise added, and so the value of z would be small.
- z can tell us the probability at each voxel that the observed COPE might be due simply to the error/noise:
  - z > 1: p = 0.31, i.e. about a 30% chance
  - z > 2: p = 0.046, i.e. less than a 5% chance
  - z > 3: p = 0.0027, i.e. less than a 0.3% chance

Formal thresholding: converting z to a p value
We can apply a threshold to the data: show only voxels where z > z', e.g. z > 2 or z > 3.
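The quoted p values (which are two-tailed) and the voxelwise thresholding step can be reproduced directly. The zstat array here is a made-up toy example, not real data.

```python
# Sketch: two-tailed p values for the z cut-offs quoted above, and a
# simple voxelwise threshold applied to a toy 1-D "zstat" array.
import numpy as np
from scipy import stats

def p_two_tailed(z):
    return 2 * stats.norm.sf(abs(z))

# Reproduces the slide's numbers: ~0.317, ~0.046, ~0.003
ps = [round(p_two_tailed(z), 4) for z in (1, 2, 3)]

zstat = np.array([0.5, 1.8, 2.4, 3.2, -2.9])
z_prime = 2.0
survives = zstat > z_prime   # only these voxels are shown after thresholding
```

Note that the strongly negative voxel (-2.9) does not survive a positive threshold; as described earlier, you reverse the contrast to look at effects in that direction.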

[Figure: the same zstat image shown with no threshold, and thresholded at z > 1, z > 2, and z > 3]

Multiple comparisons problem
- If we thresholded an image of pure noise (i.e. no real effect) using a threshold of z > 2.1 (p < 0.05) at each voxel, then with 200,000 voxels, 0.05 * 200,000 = 10,000 voxels would survive thresholding.
  - These are false positives: apparent activation.
- One solution is to control the familywise error rate (FWE): adjust the threshold so that the total risk of one or more false positives among all the tests performed is < 0.05 (or another desired p).
- The Bonferroni method divides the desired p by the total number of independent tests performed: 0.05 / 200,000 = 0.00000025, so threshold at z > 5.
- But this assumes all voxels are independent, which is very wrong for fMRI data. So the Bonferroni correction is overly strict for fMRI, and we may miss real activation.
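The arithmetic above can be checked in a few lines. This is just the Bonferroni calculation from the slide; the resulting one-tailed critical z comes out a little above 5.

```python
# Sketch: expected false positives under the null, and the
# Bonferroni-corrected critical z for a 200,000-voxel image.
from scipy import stats

n_voxels = 200_000
alpha = 0.05

expected_false_positives = alpha * n_voxels   # ~10,000 at uncorrected p < 0.05
p_per_voxel = alpha / n_voxels                # 0.00000025
z_bonf = stats.norm.isf(p_per_voxel)          # one-tailed critical z, just above 5
```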

Voxelwise FWE option in FEAT poststats
- If you select this option you are controlling the probability of one or more false activations occurring anywhere in the image.
- The effective number of tests is equal to the estimated number of RESELS in the image.
- This involves lots of assumptions (it works better if you smooth more).
- The assumptions are not met for group analysis with small N, where the small number of observations at each voxel makes estimation of image smoothness unreliable.
- If you select the Uncorrected option in FEAT, this means uncorrected for multiple comparisons.

Cluster based thresholding
- If you carry out uncorrected thresholding with z > 2.3 (p < 0.01) and look at the results:
  - some clusters will be very small (just one or two voxels)
  - other clusters will be large (100s of voxels)
- The voxelwise FWE has not been controlled, so there will be false positive activations in the image.
- Intuitively, the small activation clusters are more likely than the large clusters to arise from random sampling of a null distribution.
  - Unless you are expecting a small activation in a specific region, e.g. the superior colliculus.

Cluster based thresholding
[Figure: a profile of z across space against the voxelwise threshold z', e.g. z > 3 (p < 0.001); voxels above the line are significant, the rest are not]

Cluster based thresholding
[Figure: the same profile with a lower voxelwise threshold z', e.g. z > 2.3 (p < 0.01); more voxels are now significant]

Cluster based thresholding
[Figure: the thresholded profile contains two suprathreshold clusters; the small cluster is not significant, the large cluster is]
- Intuitively, under the null hypothesis (i.e. in an image of pure noise/error), the lower the voxelwise z', the larger the false-positive clusters we are likely to see.
- Random Field Theory (RFT) can be used to estimate how big a cluster needs to be, at a given voxelwise threshold, for it to be highly unlikely (e.g. p < 0.05) that we would see any such cluster under the null hypothesis.
- This critical cluster size also depends on the smoothness of the data, but RFT takes that into account.

Cluster based thresholding
[Figure: the two clusters again, with the critical cluster extent k marked against each]
So, it's a two-stage procedure:
- threshold the image voxelwise at a certain z'
- apply RFT to keep only those clusters that are big enough, for that z', to ensure an overall (familywise) p < 0.05
There are no set rules for what voxelwise z' to use when doing cluster based thresholding.
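The two-stage procedure can be sketched as follows. The minimum cluster extent k is a hypothetical stand-in here; in FEAT it comes from the RFT calculation, which depends on z' and on the smoothness of the data.

```python
# Sketch: stage 1 thresholds voxelwise at z'; stage 2 keeps only clusters
# of at least k connected voxels (k is assumed, standing in for RFT).
import numpy as np
from scipy import ndimage

zstat = np.array([
    [0.1, 2.5, 2.8, 0.2, 0.0],
    [0.0, 2.6, 3.1, 0.1, 2.4],
    [0.3, 0.2, 0.1, 0.0, 0.1],
])
z_prime = 2.3   # stage 1: voxelwise threshold
k = 3           # stage 2: minimum cluster extent (hypothetical value)

mask = zstat > z_prime
labels, n = ndimage.label(mask)                    # connected clusters
sizes = ndimage.sum(mask, labels, range(1, n + 1)) # voxels per cluster
keep = np.isin(labels, 1 + np.flatnonzero(sizes >= k))
```

In this toy image the four-voxel cluster survives while the isolated suprathreshold voxel is discarded, which matches the intuition on the previous slides.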

Dependency of number of clusters on choice of voxelwise threshold
- High voxelwise z': able to detect small clusters of highly activated voxels, but misses larger clusters of somewhat less activated voxels.
- Low voxelwise z': unable to detect small clusters of highly activated voxels, but captures larger clusters of somewhat less activated voxels.
- The choice will depend on the nature of the task and on hypotheses concerning the size/region of activations.
- The number and size of clusters also depend on the amount of smoothing that took place in preprocessing.

Cluster based thresholding in FEAT
If you choose the cluster option on the poststats tab you set two thresholding values:
- the first is an uncorrected voxelwise threshold. This is typically quite liberal, e.g. z > 2.3 (p < 0.01)
- the second is the familywise error threshold: the probability of one or more false positive clusters in the image. Usually this is set to p < 0.05

[FEAT poststats screenshot: the voxelwise z' and familywise p input fields]

Dependency of cluster size threshold on voxel level threshold (example data)
[Figure: example data showing the cluster size needed to reach FWE p < 0.05 at different voxelwise thresholds]

Summary of thresholding options in FSL
- Voxelwise, uncorrected for multiple comparisons:
  - useful for checking data quality, but almost never acceptable for published research
- Voxelwise, FWE corrected: the p value is the probability of one or more falsely activated voxels in the image
  - but the number of independent comparisons is less than the number of voxels
- Clusterwise: the p value is the probability of one or more falsely activated clusters in the image
  - results are dependent upon the initial voxelwise uncorrected threshold

Other thresholding options
- Nonparametric approaches
  - permutation testing
- FDR (false discovery rate)
- Why control the FWE? As researchers, what we really want to control is the proportion of voxels declared active that are false positives.
- Choosing an FDR of 0.01, if you declare 1000 voxels active then, on average across many samples, 10 of them will be false positives.
  - If there were only 200 activated voxels, that would be ~2 false positives.
- This makes more sense than controlling the probability of a single false positive anywhere in the brain.
- FDR works well with unsmoothed data (unlike FWE), and it is available using a command line program in FSL.
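The idea behind FDR control can be illustrated with the standard Benjamini-Hochberg step-up procedure. This is a sketch of the general method, not the code of FSL's command line tool; the p values below are made up.

```python
# Sketch: Benjamini-Hochberg procedure for choosing an FDR-controlling
# p value threshold from a set of voxelwise p values.
import numpy as np

def bh_threshold(p, q=0.01):
    """Largest p value that survives FDR control at level q (0 if none)."""
    p_sorted = np.sort(np.asarray(p))
    m = len(p_sorted)
    # Compare each ordered p value with its BH critical value q*i/m
    below = p_sorted <= q * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0
    return p_sorted[np.flatnonzero(below).max()]

p_vals = np.array([0.0002, 0.001, 0.003, 0.2, 0.5, 0.9])  # toy data
p_crit = bh_threshold(p_vals, q=0.05)
# Voxels with p <= p_crit are declared active
```

Unlike a fixed Bonferroni cut-off, the threshold adapts to the data: the more genuinely small p values there are, the more lenient the surviving threshold becomes.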

Brain masks: reducing the number of voxels
- FWE and FDR both become more conservative as the number of voxels in the image increases.
- You don't expect activations in the white matter or ventricles.
  - This suggests that performing tissue segmentation and removing non-grey-matter voxels from the image prior to the model fitting stage is a good idea.
- Caution: the presence of activation in white matter or ventricles is often a clue indicating head motion problems or image spikes.
  - So, run the analysis with all voxels in first.
- If you are only interested in a specific part of the brain, then consider scanning only that part of the brain.
  - This will also permit a shorter TR or smaller voxels.
  - But also acquire a whole-head EPI for registration purposes.
- Or extract a region of interest (ROI) for separate analysis.
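Masking to reduce the number of tests can be sketched as below. The arrays are toy data; in practice the grey matter probability map would come from tissue segmentation (e.g. FSL's FAST).

```python
# Sketch: restrict thresholding to a grey matter mask so that FWE/FDR
# corrections are applied over fewer voxels (toy data, assumed cut-off).
import numpy as np

rng = np.random.default_rng(1)
zstat = rng.normal(size=(4, 4, 4))       # toy z statistics
grey_prob = rng.random(size=(4, 4, 4))   # toy grey matter probability map

mask = grey_prob > 0.5                   # assumed probability cut-off
n_tests = int(mask.sum())                # fewer tests than zstat.size
z_in_mask = zstat[mask]                  # only these values are thresholded
```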

Genovese, Lazar, & Nichols (2002)
"Variation across subjects has a critical impact on threshold selection in practice. It has frequently been observed that, even with the same scanner and experimental paradigm, subjects vary in the degree of activation they exhibit, in the sense of contrast-to-noise. Subjective selection of thresholds (set low enough that meaningful structure is observed, but high enough so that appreciable random structure is not evident) suggests that different thresholds are appropriate for different subjects."
- So, perhaps intuitive thresholding is best after all?
- I have seen this used in published papers.

Thresholding: an alternative view
- Journal reviewers and editors are always reassured if the rate of false positives has been controlled using FWE.
  - This is why researchers make every effort to produce activations that survive this very stringent test.
- However, there is a trade-off between the false positive rate and the false negative rate.
- Use of FWE might be producing the wrong balance between these two types of error.

- Classical statistical inference with a single data set provides control of the false positive rate.
  - But it does not quantify the probability that there is a real effect in the population which is not reflected in this specific sample due to chance (the false negative rate).
- If an experiment is repeated many times, and the activations are almost identical each time, this implies that both false positive and false negative rates are low.
- If the activations are slightly different each time, this could be due to the presence of false positives, false negatives, or a mixture of both.
- Therefore, reproducibility provides a way of knowing something about how many real