Upload
deborah-shields
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
Hierarchical ModelsHierarchical Models Hierarchical ModelsHierarchical Models
Wellcome Dept. of Imaging Neuroscience
University College London
Stefan KiebeStefan Kiebell
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamples
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamples
Why hierarchical models?Why hierarchical models?
Data
fMRI, single subjectfMRI, single subject
fMRI, multi-subjectfMRI, multi-subject ERP/ERF, multi-subjectERP/ERF, multi-subject
EEG/MEG, single subjectEEG/MEG, single subject
Hierarchical model for all imaging data!
Hierarchical model for all imaging data!
Time
Time
Intensity
Tim
e
single voxeltime series
single voxeltime series
Reminder: voxel by voxel
modelspecification
modelspecification
parameterestimation
parameterestimation
hypothesishypothesis
statisticstatistic
SPMSPM
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamples
The modelThe model
General Linear Model Xy
=
+X
N
1
N N
1 1p
p
Model is specified by1. Design matrix X2. Assumptions about
Model is specified by1. Design matrix X2. Assumptions about
N: number of scansp: number of regressors
N: number of scansp: number of regressors
y
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
)()()( i
k
i
kk
i QC
Linear hierarchical model
Hierarchical modelHierarchical model Multiple variance components at each level
Multiple variance components at each level
At each level, distribution of parameters is given by level above.
At each level, distribution of parameters is given by level above.
What we don’t know: distribution of parameters and variance parameters.
What we don’t know: distribution of parameters and variance parameters.
=
Example: Two level model
2221
111
X
Xy
1
1+ 1 = 2X
2
+ 2y
)1(1X
)1(2X
)1(3X
Second levelSecond level
First levelFirst level
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
Algorithmic Equivalence
Hierarchicalmodel
Hierarchicalmodel
ParametricEmpirical
Bayes (PEB)
ParametricEmpirical
Bayes (PEB)
EM = PEB = ReMLEM = PEB = ReML
RestrictedMaximumLikelihood
(ReML)
RestrictedMaximumLikelihood
(ReML)
Single-levelmodel
Single-levelmodel
)()()1(
)()1()1(
)2()1()1(
...
nn
nn
XXXX
Xy
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamples
EstimationEstimation
Estimation
EM-algorithmEM-algorithm
gJ
d
LdJ
d
dLg
1
2
2
E-stepE-step
M-stepM-step
Friston et al. 2002, Neuroimage
Friston et al. 2002, Neuroimage
kk
kQC
Assume, at voxel j:
Assume, at voxel j: kjjk
)lnL maximise p(y|λ
111
NppNN
Xy
yCXC
XCXCT
yy
Ty
1||
11| )(
And in practice?
Most 2-level models are just too big to compute.
Most 2-level models are just too big to compute.
And even if, it takes a long time! And even if, it takes a long time!
Moreover, sometimes we‘re only interested in one specific effect and
don‘t want to model all the data.
Moreover, sometimes we‘re only interested in one specific effect and
don‘t want to model all the data.
Is there a fast approximation?Is there a fast approximation?
Data Design Matrix Contrast Images )ˆ(ˆ
ˆ
T
T
craV
ct
SPM(t)
Summary Statistics approach
1̂
2̂
11̂
12̂
21̂
22̂
211̂
212̂
Second levelSecond levelFirst levelFirst level
One-samplet-test @ 2nd level
One-samplet-test @ 2nd level
Validity of approach
The summary stats approach is exact under i.i.d. assumptions at the 2nd level, if for each session:
The summary stats approach is exact under i.i.d. assumptions at the 2nd level, if for each session:
All other cases: Summary stats approach seems to be robust against typical violations.
All other cases: Summary stats approach seems to be robust against typical violations.
Within-session covariance the sameWithin-session covariance the same
First-level design the sameFirst-level design the same
One contrast per sessionOne contrast per session
Friston et al. (2004) Mixed effects and fMRI studies, Neuroimage
Friston et al. (2004) Mixed effects and fMRI studies, Neuroimage
Summarystatistics
Summarystatistics
EMapproach
EMapproach
Mixed-effects
)2(̂
jjj
i
Tii QXQXV
XXY
)2()2()1()1()1()1(
)2(
)1(ˆ
yVXXVX TT 111)2( )(ˆ yVXXVX TT 111)2( )(ˆ
yVXXVX TT 111)1( )(ˆ yVXXVX TT 111)1( )(ˆ
},,{ QXnyyREML T },,{ QXnyyREML T
Step 1
Step 2
},,,{
][)1()2(
1)1()1(
1
)2()1()0(
TXQXQQ
XXXX
IV
XXX
][ )1()0(
datay
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamples
Variance componentsVariance components
Sphericity
‚sphericity‘ means:‚sphericity‘ means:
ICov 2)(
Xy )()( TECovC
Scans
Sca
nsi.e.
2)( iVar12
Errors are independent but not identical
Errors are independent but not identical
Errors are not independent and not identical
Errors are not independent and not identical
Errorcovariance
Errorcovariance
2nd level: Non-sphericity
1st level fMRI: serial correlations
withwithttt a 1 ),0(~ 2 Nt
autoregressive process of order 1 (AR-1)
)(Covautocovariance-
function
N
N
Restricted Maximum Likelihood
Xy ?)(Cov observed
ReMLReMLestimated
2211ˆˆ QQ
j
Tjj yy
voxel
1Q
2Q
Estimation in SPM5
jjj Xy
jOLSj yX ,̂
),,ReML()(ˆˆ
QXyyvoCCjvoxel
Tjj
jTT
MLj yVXXVX 111, )(ˆ
Maximum LikelihoodMaximum LikelihoodOrdinary least-squaresOrdinary least-squares
ReML (pooled estimate)ReML (pooled estimate)
•optional in SPM5•one pass through data•statistic using effective degrees of freedom
•optional in SPM5•one pass through data•statistic using effective degrees of freedom
•2 passes (first pass for selection of voxels)
•more accurate estimate of V
•2 passes (first pass for selection of voxels)
•more accurate estimate of V
2nd level: Non-sphericity
Errors can be non-independent and non-identical
when…
Errors can be non-independent and non-identical
when…
Several parameters per subject, e.g. repeated measures designSeveral parameters per subject, e.g. repeated measures design
Complete characterization of the hemodynamic response,e.g. taking more than 1 HRF parameter to 2nd level
Complete characterization of the hemodynamic response,e.g. taking more than 1 HRF parameter to 2nd level
Example data set (R Henson) on: http://www.fil.ion.ucl.ac.uk/spm/data/
Example data set (R Henson) on: http://www.fil.ion.ucl.ac.uk/spm/data/
Overview
Why hierarchical models?Why hierarchical models?
The modelThe model
EstimationEstimation
Variance componentsVariance components
Expectation-MaximizationExpectation-Maximization
Summary Statistics approachSummary Statistics approach
ExamplesExamplesExamplesExamples
Example I
Stimuli:Stimuli: Auditory Presentation (SOA = 4 secs) of(i) words and (ii) words spoken backwards
Auditory Presentation (SOA = 4 secs) of(i) words and (ii) words spoken backwards
Subjects:Subjects:
e.g. “Book”
and “Koob”
e.g. “Book”
and “Koob”
fMRI, 250 scans per subject, block design
fMRI, 250 scans per subject, block design
Scanning:Scanning:
U. Noppeney et al.U. Noppeney et al.
(i) 12 control subjects(ii) 11 blind subjects
(i) 12 control subjects(ii) 11 blind subjects
Population differences
1st level:1st level:
2nd level:2nd level:
ControlsControls BlindsBlinds
X
]11[ TcV
Example II
Stimuli:Stimuli: Auditory Presentation (SOA = 4 secs) of words Auditory Presentation (SOA = 4 secs) of words
Subjects:Subjects:
fMRI, 250 scans persubject, block design
fMRI, 250 scans persubject, block designScanning:Scanning:
U. Noppeney et al.U. Noppeney et al.
(i) 12 control subjects(i) 12 control subjects
Motion Sound Visual Action
“jump” “click” “pink” “turn”
Question:Question:What regions are affectedby the semantic content ofthe words?
What regions are affectedby the semantic content ofthe words?
Repeated measures Anova
1st level:1st level:
2nd level:2nd level:
VisualVisual ActionAction
X
1100
0110
0011Tc
?=
?=
?=
MotionMotion SoundSound
V
X
Example 3: 2-level ERP model
Single subjectSingle subject
Trial type 1Trial type 1
Trial type iTrial type i
Trial type nTrial type n. . .. . .
. . .. . .
Multiple subjectsMultiple subjects
Subject 1Subject 1
Subject mSubject m
Subject jSubject j
. . .. . .
. . .. . .
. . . =
1
1Tc =
2X
2
+ 2
first levelfirst level
second levelsecond level
yIdentitymatrix
1X
Test for difference in N170
Example: difference between trial types
Example: difference between trial types
ERP-specific contrasts: Average between 150 and
190 ms
ERP-specific contrasts: Average between 150 and
190 ms
-1 1
2nd level contrast
Some practical points
RFX: If not using multi-dimensional contrasts at 2nd level (F-tests), use a series of 1-sample t-tests at the 2nd level.
RFX: If not using multi-dimensional contrasts at 2nd level (F-tests), use a series of 1-sample t-tests at the 2nd level.
Use mixed-effects model only, if seriously in doubt about validity of summary statistics approach.
Use mixed-effects model only, if seriously in doubt about validity of summary statistics approach.
If using variance components at 2nd level: Always check your variance components (explore design).
If using variance components at 2nd level: Always check your variance components (explore design).
Conclusion
Linear hierarchical models are general enough for typical multi-subject imaging data (PET, fMRI, EEG/MEG).
Linear hierarchical models are general enough for typical multi-subject imaging data (PET, fMRI, EEG/MEG).
Summary statistics are robust approximation to mixed-effects analysis.
Summary statistics are robust approximation to mixed-effects analysis.
To minimize number of variance components to be estimated at 2nd level, compute relevant contrasts at 1st level and use simple test at 2nd level.
To minimize number of variance components to be estimated at 2nd level, compute relevant contrasts at 1st level and use simple test at 2nd level.