Hierarchical Models Wellcome Dept. of Imaging Neuroscience University College London Stefan Kiebel

Hierarchical ModelsHierarchical Models Hierarchical ModelsHierarchical Models

Wellcome Dept. of Imaging Neuroscience

University College London

Stefan KiebeStefan Kiebell

Overview

Why hierarchical models?Why hierarchical models?

The modelThe model

EstimationEstimation

Variance componentsVariance components

Expectation-MaximizationExpectation-Maximization

Summary Statistics approachSummary Statistics approach

ExamplesExamples

Overview


The modelThe model





ExamplesExamples


Data

fMRI, single subjectfMRI, single subject

fMRI, multi-subjectfMRI, multi-subject ERP/ERF, multi-subjectERP/ERF, multi-subject

EEG/MEG, single subjectEEG/MEG, single subject

Hierarchical model for all imaging data!

Hierarchical model for all imaging data!

Time

Time

Intensity

Tim

e

single voxeltime series

single voxeltime series

Reminder: voxel by voxel

modelspecification

modelspecification

parameterestimation

parameterestimation

hypothesishypothesis

statisticstatistic

SPMSPM

Overview


The modelThe model





ExamplesExamples

The modelThe model

General Linear Model Xy

=

+X

N

1

N N

1 1p

p

Model is specified by1. Design matrix X2. Assumptions about

Model is specified by1. Design matrix X2. Assumptions about

N: number of scansp: number of regressors

N: number of scansp: number of regressors

y

)()()()1(

)2()2()2()1(

)1()1()1(

nnnn X

X

Xy

)()()( i

k

i

kk

i QC

Linear hierarchical model

Hierarchical modelHierarchical model Multiple variance components at each level

Multiple variance components at each level

At each level, distribution of parameters is given by level above.

At each level, distribution of parameters is given by level above.

What we don’t know: distribution of parameters and variance parameters.

What we don’t know: distribution of parameters and variance parameters.

=

Example: Two level model

2221

111

X

Xy

1

1+ 1 = 2X

2

+ 2y

)1(1X

)1(2X

)1(3X

Second levelSecond level

First levelFirst level

)()()()1(

)2()2()2()1(

)1()1()1(

nnnn X

X

Xy

Algorithmic Equivalence

Hierarchicalmodel

Hierarchicalmodel

ParametricEmpirical

Bayes (PEB)

ParametricEmpirical

Bayes (PEB)

EM = PEB = ReMLEM = PEB = ReML

RestrictedMaximumLikelihood

(ReML)

RestrictedMaximumLikelihood

(ReML)

Single-levelmodel

Single-levelmodel

)()()1(

)()1()1(

)2()1()1(

...

nn

nn

XXXX

Xy

Overview


The modelThe model





ExamplesExamples


Estimation

EM-algorithmEM-algorithm

gJ

d

LdJ

d

dLg

1

2

2

E-stepE-step

M-stepM-step

Friston et al. 2002, Neuroimage

Friston et al. 2002, Neuroimage

kk

kQC

Assume, at voxel j:

Assume, at voxel j: kjjk

)lnL maximise p(y|λ

111

NppNN

Xy

yCXC

XCXCT

yy

Ty

1||

11| )(

And in practice?

Most 2-level models are just too big to compute.

Most 2-level models are just too big to compute.

And even if, it takes a long time! And even if, it takes a long time!

Moreover, sometimes we‘re only interested in one specific effect and

don‘t want to model all the data.

Moreover, sometimes we‘re only interested in one specific effect and

don‘t want to model all the data.

Is there a fast approximation?Is there a fast approximation?

Data Design Matrix Contrast Images )ˆ(ˆ

ˆ

T

T

craV

ct

SPM(t)

Summary Statistics approach

1̂

2̂

11̂

12̂

21̂

22̂

211̂

212̂

Second levelSecond levelFirst levelFirst level

One-samplet-test @ 2nd level

One-samplet-test @ 2nd level

Validity of approach

The summary stats approach is exact under i.i.d. assumptions at the 2nd level, if for each session:

The summary stats approach is exact under i.i.d. assumptions at the 2nd level, if for each session:

All other cases: Summary stats approach seems to be robust against typical violations.

All other cases: Summary stats approach seems to be robust against typical violations.

Within-session covariance the sameWithin-session covariance the same

First-level design the sameFirst-level design the same

One contrast per sessionOne contrast per session

Friston et al. (2004) Mixed effects and fMRI studies, Neuroimage

Friston et al. (2004) Mixed effects and fMRI studies, Neuroimage

Summarystatistics

Summarystatistics

EMapproach

EMapproach

Mixed-effects

)2(̂

jjj

i

Tii QXQXV

XXY

)2()2()1()1()1()1(

)2(

)1(ˆ

yVXXVX TT 111)2( )(ˆ yVXXVX TT 111)2( )(ˆ

yVXXVX TT 111)1( )(ˆ yVXXVX TT 111)1( )(ˆ

},,{ QXnyyREML T },,{ QXnyyREML T

Step 1

Step 2

},,,{

][)1()2(

1)1()1(

1

)2()1()0(

TXQXQQ

XXXX

IV

XXX

][ )1()0(

datay

Overview


The modelThe model





ExamplesExamples


Sphericity

‚sphericity‘ means:‚sphericity‘ means:

ICov 2)(

Xy )()( TECovC

Scans

Sca

nsi.e.

2)( iVar12

Errors are independent but not identical

Errors are independent but not identical

Errors are not independent and not identical

Errors are not independent and not identical

Errorcovariance

Errorcovariance

2nd level: Non-sphericity

1st level fMRI: serial correlations

withwithttt a 1 ),0(~ 2 Nt

autoregressive process of order 1 (AR-1)

)(Covautocovariance-

function

N

N

Restricted Maximum Likelihood

Xy ?)(Cov observed

ReMLReMLestimated

2211ˆˆ QQ

j

Tjj yy

voxel

1Q

2Q

Estimation in SPM5

jjj Xy

jOLSj yX ,̂

),,ReML()(ˆˆ

QXyyvoCCjvoxel

Tjj

jTT

MLj yVXXVX 111, )(ˆ

Maximum LikelihoodMaximum LikelihoodOrdinary least-squaresOrdinary least-squares

ReML (pooled estimate)ReML (pooled estimate)

•optional in SPM5•one pass through data•statistic using effective degrees of freedom

•optional in SPM5•one pass through data•statistic using effective degrees of freedom

•2 passes (first pass for selection of voxels)

•more accurate estimate of V

•2 passes (first pass for selection of voxels)

•more accurate estimate of V

2nd level: Non-sphericity

Errors can be non-independent and non-identical

when…

Errors can be non-independent and non-identical

when…

Several parameters per subject, e.g. repeated measures designSeveral parameters per subject, e.g. repeated measures design

Complete characterization of the hemodynamic response,e.g. taking more than 1 HRF parameter to 2nd level

Complete characterization of the hemodynamic response,e.g. taking more than 1 HRF parameter to 2nd level

Example data set (R Henson) on: http://www.fil.ion.ucl.ac.uk/spm/data/

Example data set (R Henson) on: http://www.fil.ion.ucl.ac.uk/spm/data/

Overview


The modelThe model





ExamplesExamplesExamplesExamples

Example I

Stimuli:Stimuli: Auditory Presentation (SOA = 4 secs) of(i) words and (ii) words spoken backwards

Auditory Presentation (SOA = 4 secs) of(i) words and (ii) words spoken backwards

Subjects:Subjects:

e.g. “Book”

and “Koob”

e.g. “Book”

and “Koob”

fMRI, 250 scans per subject, block design

fMRI, 250 scans per subject, block design

Scanning:Scanning:

U. Noppeney et al.U. Noppeney et al.

(i) 12 control subjects(ii) 11 blind subjects

(i) 12 control subjects(ii) 11 blind subjects

Population differences

1st level:1st level:

2nd level:2nd level:

ControlsControls BlindsBlinds

X

]11[ TcV

Example II

Stimuli:Stimuli: Auditory Presentation (SOA = 4 secs) of words Auditory Presentation (SOA = 4 secs) of words

Subjects:Subjects:

fMRI, 250 scans persubject, block design

fMRI, 250 scans persubject, block designScanning:Scanning:

U. Noppeney et al.U. Noppeney et al.

(i) 12 control subjects(i) 12 control subjects

Motion Sound Visual Action

“jump” “click” “pink” “turn”

Question:Question:What regions are affectedby the semantic content ofthe words?

What regions are affectedby the semantic content ofthe words?

Repeated measures Anova

1st level:1st level:

2nd level:2nd level:

VisualVisual ActionAction

X

1100

0110

0011Tc

?=

?=

?=

MotionMotion SoundSound

V

X

Example 3: 2-level ERP model

Single subjectSingle subject

Trial type 1Trial type 1

Trial type iTrial type i

Trial type nTrial type n. . .. . .

. . .. . .

Multiple subjectsMultiple subjects

Subject 1Subject 1

Subject mSubject m

Subject jSubject j

. . .. . .

. . .. . .

. . . =

1

1Tc =

2X

2

+ 2

first levelfirst level

second levelsecond level

yIdentitymatrix

1X

Test for difference in N170

Example: difference between trial types

Example: difference between trial types

ERP-specific contrasts: Average between 150 and

190 ms

ERP-specific contrasts: Average between 150 and

190 ms

-1 1

2nd level contrast

Some practical points

RFX: If not using multi-dimensional contrasts at 2nd level (F-tests), use a series of 1-sample t-tests at the 2nd level.

RFX: If not using multi-dimensional contrasts at 2nd level (F-tests), use a series of 1-sample t-tests at the 2nd level.

Use mixed-effects model only, if seriously in doubt about validity of summary statistics approach.

Use mixed-effects model only, if seriously in doubt about validity of summary statistics approach.

If using variance components at 2nd level: Always check your variance components (explore design).

If using variance components at 2nd level: Always check your variance components (explore design).

Conclusion

Linear hierarchical models are general enough for typical multi-subject imaging data (PET, fMRI, EEG/MEG).

Linear hierarchical models are general enough for typical multi-subject imaging data (PET, fMRI, EEG/MEG).

Summary statistics are robust approximation to mixed-effects analysis.

Summary statistics are robust approximation to mixed-effects analysis.

To minimize number of variance components to be estimated at 2nd level, compute relevant contrasts at 1st level and use simple test at 2nd level.

To minimize number of variance components to be estimated at 2nd level, compute relevant contrasts at 1st level and use simple test at 2nd level.

Documents

Hierarchical Models Wellcome Dept. of Imaging Neuroscience University College London Stefan Kiebel