25
Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department of Statistics, LMU Spatial Bayesian Variable Selection with Application to fMRI

Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

  • View
    219

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Michael SmithEconometrics & Business Statistics, U. of Sydney

Ludwig FahrmeirDepartment of Statistics, LMU

Spatial Bayesian Variable Selection with Application to fMRI

Page 2: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Key References

• Smith and Fahrmeir (2004) ‘Spatial Bayesian variable selection

with application to fMRI’ (under review)

• Smith, Putz, Auer and Fahrmeir, (2003), Neuroimage

• Smith and Smith (2005), ‘Estimation of binary Markov

Random Fields using Markov chain Monte Carlo’, to appear in

JCGS

• Kohn, Smith and Chan (2001), ‘Nonparametric regression

using linear combinations of basis functions’, Statistics and

Computing, 11, 313-322

Page 3: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Bayesian Variable Selection in Regression

• Bayesian variable selection is now widely used in statistics

• Assume there are i=1,..,N separate regressions:

yi = Xiβi + ei

each located at sites on a lattice

• Each with ni observations and the same p independent variables

• For each regression, introduce a vector of indicator variables γi with elements

γij = 0 iff βij = 0

γij = 1 iff βij ≠ 0

where j=1,…,p

Page 4: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Proper Prior for Coefficients

• Each regression can be restated as

yi = Xi(γi)βi(γi) + ei

• where βi(γi) are the non-zero elements of the βi vector.

• A proper prior is employed for the non-zero elements of the β vector with

• This results in a prior (g-prior)2 1ˆ( ( ), ( ( ) ' ( )) )i i i i i i i iN n X X

1/2 2i ip( ( ) | , ) ( | ( ), , ) in

i i i i i i ip y

Page 5: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Joint Model Posterior

• The posterior of interest is

• where (see Smith and Kohn ‘96)

• Binary MRFs priors are placed on p(γ(j)) to spatially smooth the indicators for each regressor j

( | ) ( | ) ( )

( | ) ( )i ii

p y p y p

p y p

/ 2 / 2( | ) ( ) (1 )i in qi i i ip y S n

Page 6: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

The Ising Prior

• One of the most popular binary MRFs is due to Ernst Ising

• Let γ = ( γ1,…, γN)’ be a vector of binary indicators over a lattice

• Then the pdf of γ is:

• Here:– αi are the external field coefficients

– i~j indicates site i neighbors site j

– θ is the smoothing parameter

– ωij are the weights (taken here as reciprocal of distance between sites i,j)

~

( | ) exp ( )i i ij i ji i j

p I

Page 7: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

The Ising Prior

• Other binary MRF priors could be used

• For example, latent variables with Gaussian MRFs (see Smith and Smith ’05 for a comparison)

• However, the Ising model has three strong advantages in the fMRI application:– Through the external field, anatomical prior information can be

incorporated

– Single-site sampling is much faster than with the latent GMRF based priors

– The edge-preservation properties are strong

Page 8: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

The Ising Prior

• However, the joint distributionp(γ,θ)=p(γ|θ)p(θ)

can be tricky….• When α=0

p(γi=1 | θ) = p(γi=1) = ½,so that θ is a smoothing parameter only

• However, when α≠0 (which proves important in the fMRI application) then

p(γi=1 | θ) ≠ p(γi=1),so that θ is both a smoothing parameter and (along with fixed variables α) determines the marginal prior probability

Page 9: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Posterior Distribution

• The posterior distribution (conditional on θ) can be computed in closed form, with

• However, in general this is not a recognisable density, so that inference has to be undertaken via simulation

• Possible to employ a MH step using a binary MRF approximation as a proposal (see Nott & Green)

• However, single-site sampling proves very hard to beat!

( )1 1

( | , ) ( | ) ( | )pN

i i j ji j

p y p y p

Page 10: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Sampling Scheme

• Can use the Gibbs sampler, generating directly from p(γij|γ\

ij,θ,y)

• Alternatively, can employ a MH step at each site based on the conditional prior p(γij|γ\ij,θ)

• Even under uniform priors can prove significantly faster (see Kohn et al. ‘01)

• This is because it avoids the need for evaluation of the likelihood when there is no switch (that is, 0→0 or 1→1)

• In the case where informative spatial binary MRF priors are used, the MH step can prove even faster because the proposal is a better approximation to the posterior

Page 11: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Co-estimation with Smoothing Parameters

• Assume a uniform hyperprior

• The smoothing parameters can be generated one-at-a-time from

• Where Cj(αj,θj) is the normalising constant for the Ising density

• It is pre-computed using a B-spline numerical approximation, with points evaluated using a Monte Carlo method (see Green and Richardson, ‘02)

• This proves extremely accurate

• The element θj is then generated using a RW MH

max1

( ) (0 )p

jj

p I

max~

( | , ) ( , ) exp ( ) (0 )j k j j j j j ik ij kj ji k

p y C I I

Page 12: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Monte Carlo Estimates

• The following Monte Carlo mixture estimates are of interest

• Note that if primary interest is in the first estimate, then evaluation of the conditional posteriors is undertaken analytically at each step of a Gibbs sampler

• In this case, this negates the speed improvements of the MH step suggested in Kohn et al. ’01

• If interest is in the second estimate, then the faster sampler is preferable

[ ] [ ]\

[ ]

1( 1| ) ( 1| , , )

1( | ) ( | , )

k kij ij ij

k

k

k

p y p yK

E y E yK

Page 13: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Introduction to brain mapping using fMRI data• fMRI is a powerful tool for the non-invasive assessment of the

functionality of neuronal networks

• The crux of this analysis is the distinction between active and inactive voxels from a functional time series

• This data is typically massive and involves a time series of observations at many voxels in the brain

• The data is usually highly noisy, and the issue facing statisticians is the balancing of false-positive with false-negative classifications

• This issue is addressed by exploiting the known high degree of spatial correlation in activation profiles

• If exploited effectively, this greatly enhances the activation maps and reduces both f-p and f-n classifications

Page 14: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Example of MR time series (strongly active, weakly active, inactive voxels)

Page 15: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Regression Modeling of fMRI data• Define the following variables:

– yit : the magnetic resonance time series at voxel i and time t

– ait : baseline trend at voxel i and time t

– zit : transformed stimulus at voxel i and time t

– βi : activation amplitude

• Then the regression model below is a popular model in the literature

yit = ait + zitβi+ eit

• The baseline trend is due to background influence on the patient’s neuronal activity during the experiment

• A simple model is a parametric expansion ait = wt’αi , where we use a quadratic polynomial and 4 low order Fourier terms

• The errors are modelled as iid N(0,σi2)

Page 16: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Anatomically Informed Activation Prior

• The prior for the activation indicators (only) can be informed by all sorts of information

• Here, we only consider the fact that activation can only occur in areas of grey matter

• Let g=(g1,…,gN) denote whether, or not, each voxel is grey matter

• We assume p(g)=p(g1)…p(gN), where p(gi) are known• Then, if p(γi =1 | gi) = a (0.1 in our empirical experiment),

p(γi =1) = p(gi) a = ci

• When θ=0 we can equate this with the marginal prior from the Ising density to obtain values for the external field to get

p(γi =1) = exp(αi)/exp(αi+1) = ci

Page 17: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Two-Step Procedure

• However, the drawback of using non-zero external field coefficients in the Ising prior for the activation effect is that θ is no longer simply a smoothing parameter

• Therefore, we use a two-step procedure:

Step (1)

Set α=0, estimate and obtain a point estimate θ*=E(θ|y)

Step (2)

Refit with an anatomically informed Ising prior, but conditional on fixed level of smoothing θ*

• This overcomes the complex inter-relationship between θ & the marginal probability of activation with non-zero α

Page 18: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Posterior Analysis using 3D Neighborhood

• We fit the data using a 3D neighborhood (9+9+8 neighbors)

• For simplicity here we do not undertake variable selection on the parametric trend terms, just the activation variable

• Three priors are employed (see table 1)

• The activation and amplitude maps obtained using prior (i) are in given in fig 2

• The activation maps using prior (ii) for 8 contiguous slices are given in fig 4- they reveal the important of using anatomical prior information

• The activation mpas using prior (iii) for the same 8 slices are given in fig 6- they reveal the importance of spatial smoothing

Page 19: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Three Ising Priors

• Prior (i) corresponds to step 1 in the two stage procedure• Prior (ii) corresponds to step 2 in the two stage procedure• Prior (iii) is for comparison with prior (ii) to demonstrate the impact of spatial smoothing

Page 20: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Page 21: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Page 22: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Page 23: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Some comparisons

• An obvious alternative is to simply smooth spatially the activation amplitudes (βi, i=1,…,N) and then classify

• Fig. 5 shows the amplitude map using this approach

• What happens when variable selection is also undertaken on trend terms?

• Therefore, using only a 2D neighborhood structure two estimates were obtained: one with BVS on all terms, and the other with BVS on only one term.

• The resulting activation maps are similar and are in fig 8.

Page 24: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney

Page 25: Econometrics & Business Statistics @ The University of Sydney Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department

Econometrics & Business Statistics

@ The University of Sydney