STATISTICAL ANALYSIS OF MEDICAL IMAGES WITH
APPLICATIONS TO NEUROIMAGING
BY
Rafal Kustra
THESIS SUBMITTED IN CONFORMITY WITH THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
GRADUATE DEPARTMENT OF
PUBLIC HEALTH SCIENCES
IN THE
UNIVERSITY OF TORONTO
TORONTO, ONTARIO
AUGUST 2000
© Copyright by Rafal Kustra, 2000
National Library of Canada
Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
UNIVERSITY OF TORONTO
DEPARTMENT OF
PUBLIC HEALTH SCIENCES

The undersigned hereby certify that they have read and recommend to the School of Graduate Studies for acceptance a thesis entitled "Statistical Analysis of Medical Images with Applications to Neuroimaging" by Rafal Kustra in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Dated: August 2000
External Examiner: Keith Worsley, PhD
Research Supervisor: Robert Tibshirani
Examining Committee: Steven Strother, PhD
James Stafford, PhD
Randy McIntosh, PhD
UNIVERSITY OF TORONTO
Date: August 2000
Author: Rafal Kustra
Title: Statistical Analysis of Medical Images with Applications to Neuroimaging
Department: Public Health Sciences
Degree: Ph.D. Convocation: October Year: 2000

Permission is herewith granted to the University of Toronto to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.

Signature of Author

THE AUTHOR RESERVES OTHER PUBLICATION RIGHTS, AND NEITHER THE THESIS NOR EXTENSIVE EXTRACTS FROM IT MAY BE PRINTED OR OTHERWISE REPRODUCED WITHOUT THE AUTHOR'S WRITTEN PERMISSION.

THE AUTHOR ATTESTS THAT PERMISSION HAS BEEN OBTAINED FOR THE USE OF ANY COPYRIGHTED MATERIAL APPEARING IN THIS THESIS (OTHER THAN BRIEF EXCERPTS REQUIRING ONLY PROPER ACKNOWLEDGEMENT IN SCHOLARLY WRITING) AND THAT ALL SUCH USE IS CLEARLY ACKNOWLEDGED.
Contents

List of Tables
List of Figures
Abstract
Acknowledgments

1 Introduction
1.1 Images as Data
1.2 Motivation and the Setup for Statistical Image Analysis
1.3 Functional Data
1.4 Notation and Conventions

2 Neuroimaging Data and Methods
2.1 Goals and Study Design in Neuroimaging
2.2 PET and fMRI Modalities
2.2.1 Positron Emission Tomography
2.2.2 Functional Magnetic Resonance Imaging
2.3 Literature Overview
2.3.1 Single Voxel Analysis: Statistical Parametric Mapping and Gaussian Random Fields
2.3.2 Scaled Subprofile Model: State-Driven Variance Decomposition with Global and Subject Effect Removal
2.3.3 Partial Least Squares
2.4 Datasets Studied
2.4.1 Finger Opposition Task
2.4.2 Static Force fMRI Data

3 Penalized Linear Discriminant Analysis with Basis Expansion
3.1 Classical Linear Discriminant Analysis
3.1.1 Discriminant Functions and MANOVA View of LDA
3.1.2 The Geometry of LDA in a Two-Class, 2D Setting
3.2 LDA and Random Subject Effects
3.2.1 Simulation Study
3.3 Dimension Reduction in LDA using Smoothness Constraints and Penalization
3.3.1 Basis Expansion of Canonical Variates
3.3.2 Penalized Linear Discriminant Analysis
3.3.3 Penalized Discriminant Analysis and Statistical Parametric Mapping
3.3.4 PDA via Regression
3.3.5 Expressing the PDA Algorithm in the n-Dimensional Space
3.3.6 Effective Degrees of Freedom
3.3.7 Prediction Error and its Estimates
3.4 A Note on the Gaussian Assumption
3.5 Is a Ridge Penalty Enough for the B-spline Basis?

4 Results with B-Spline and 3-Dimensional Wavelets
4.1 Wavelet Basis
4.1.1 Wavelets: Introduction
4.1.2 Orthogonal Wavelet Basis and Multiresolution Analysis
4.1.3 Discrete Wavelet Transform
4.1.4 3D Wavelet Basis
4.1.5 Wavelet Thresholding
4.2 Finger Opposition Data: Methods
4.2.1 Data and the Standard t-Test Analysis
4.2.2 Two-way Classification with TPS: Internal Optimization and Scan Normalization
4.2.3 Two-way Classification: External Optimization
4.2.4 Deriving Time and Spatial Projections
4.3 Finger Opposition Data: Results
4.3.1 Two-way Classification and Internal Optimization
4.3.2 External Optimization with Different Bases
4.3.3 Applying PDA to an Eight-Class Problem: State and Temporal Changes
4.4 Results with Wavelet Expansions

5 Static Force fMRI Analysis
5.1 Modeling the Time Series Effects: Time-Smoothed PDA
5.2 Introducing Between-Scan Smoothness within the Discriminant Framework
5.3 The O(N) Algorithm for Time-Smoothed Penalized Discriminant
5.3.1 Constructing the Second-Order B-spline Penalty Matrix
5.4 Connections with Canonical Correlation Analysis and MANOVA
5.5 Penalized Discriminant Analysis of StaticForce Data in B-spline and Wavelet Domains
5.6 Applying the Time-Smoothed PDA Model to the StaticForce Data

6 Conclusions and Extensions
6.1 Extending the Predictive Analysis
6.2 Comparing the Results Across Non-Predictive Paradigms
6.3 Wavelets and Basis Selection Techniques
6.4 Inference and Other Issues

A Tensor Product B-Spline Basis
B Basis Expansion of Canonical Variates
C CCA via Regression
D Correspondence Between CCA and LDA Variates
E Deriving Predictions in the n-Dimensional Space
F Ridge Regression With the Outer Product Matrix
G Centering the Design Matrix
List of Tables

3.1 Estimate of Effects in the four ANOVA models of the simulation results. The terms are: input dimensionality, P = {5, 30} as compared to P = 10; training set size, N = 50 compared to N = 100; number of subjects, Subjects = {5, 10, "N"} compared to 1 subject; and ratio of variances of subject effect to error effect, VarRatio = {10, 100} compared to VarRatio = 1. There is also an interaction term between P and N.
List of Figures

3.1 Demonstration of 2-class LDA in 2 dimensions. The light points (class 1) and darker points (class 2) show the 300 bivariate Gaussian observations generated from each class. The solid line is the true canonical variate (CV). The circles are class means, and the diamonds are the means projected onto the CV. The points marked with the cross represent the test point and its projections onto the mean-difference (broken) and CV lines.

3.2 The orthonormal basis functions (columns of U) from the 2-D tensor-product B-spline expansion. There are 5 x 3 B-spline bases, which were diagonalized with SVD and displayed here in the order of decreasing eigenvalues. The penalties shown for each basis were calculated with λ = 1.

4.1 Haar (left) and Daubechies Symmlet wavelet functions. The detail level grows from bottom up, and only some integer translates are drawn at each level.
4.2 Internal optimization of the ridge tuning parameter (Sec. 4.3.1), expressed as a number of Effective Degrees of Freedom, for Penalized Discriminant Analysis in a two-class problem. This example uses the data projected on the B25 tensor product B-spline basis set. The curves show the change of Prediction Error (both Misclassification (MC) rate and Squared Prediction Error (SPE)) as a function of Effective Degrees of Freedom (EDF). Both cross-validation (CV) and .632+Bootstrap estimates are exhibited. The top left panel shows changes for un-normalized data, while the bottom panel deals with mean-normalized data. The top right panel shows CV and .632+Bootstrap SPE curves for mean-normalized data which were obtained with a different random seed for every EDF; these portray the greater variability of CV estimates. Thin lines: CV estimates. Thick lines: .632+Bootstrap estimates. Solid lines: estimates of SPE. Dashed lines: estimates of MC rate.

4.3 Prediction Error (PE) curves as a function of the effective degrees of freedom (EDF) for all tensor-product B-spline (B15-B35) bases and unprojected raw (Braw) and smoothed (Bsmooth3,5) scans. The upper panel shows Squared Prediction Error (SPE = (1 − p̂_c)², where p̂_c is the estimated posterior probability for the true class); and the lower panel depicts Misclassification rate (MC rate) as a percentage of the total number of scans misclassified. The images from un-projected data are shown with larger-width curves. The markers show the minima for each curve. (The minimum for the B30 curve is not shown as it occurred beyond the figure frame.)
4.4 Functionally activated [15O]water PET voxels above the 93.4 percentile (white overlay) interleaved with registered grayscale MRI brain slices for Penalized Discriminant Analysis of: (A16-A40) unprojected raw data presmoothed with a 3 x 3 x 3 voxel boxcar kernel (BSmooth3); (B16-B40) unprojected raw data without presmoothing (Braw); (C16-C40) tensor product spline basis with 15 spline bases in each spatial dimension (B15); (D16-D40) tensor product spline basis with 35 spline bases in each spatial dimension (B35) (activation images A to D have decreasing squared prediction error (SPE) values as illustrated in Figure 4.3); (E16-E40) is a pooled standard deviation t-test image of scans presmoothed with a 3 x 3 x 3 voxel boxcar kernel; the Bonferroni t-value (t=4.6) at the 93.4 percentile was used to define a conservative activation threshold with which to compare activation image peaks (white overlay) for a fixed number of voxels. PET and MRI slices are 128 x 128 with 3.4 mm pixels with center-to-center slice spacing of 3.4 mm (i.e., slices A23 and A26 are separated by (26 − 23) · 3.4 = 10.2 mm) and are parallel to the AC-PC plane, which coincides with slice 24. Image left = brain left.

4.5 Scatter plot of pairs of activation image values for all Talairach brain voxels (1 point/voxel) for a single-voxel t-test image using a pooled standard deviation estimate, compared to penalized discriminant analysis (PDA) of a tensor product spline representation with 35 B-spline bases along each spatial dimension (B35). The dashed line depicts the principal axis from a principal component analysis of the scatter plot distribution. The circle highlights a group of voxels in the primary visual region that have moved from the 20th percentile in the t-test image to the 90th percentile in the PDA image. The solid vertical line depicts the Bonferroni t-value (t=4.65) at the 93.4 percentile of the t-test distribution of voxel values (white overlay, row E of Figure 4.4) and the solid horizontal line reflects the 93.4 percentile (value=0.0065) for the PDA distribution of voxel values (white overlay, row D of Figure 4.4).
4.6 Squared Prediction Error (SPE) curves, in an 8-class problem, as a function of Equivalent Degrees of Freedom for 8 penalized discriminant models with different representations: 5 tensor-product B-spline projected datasets with varying numbers of basis functions (B15 to B35) and 3 unprojected raw (Braw) and smoothed (BSmooth3 and BSmooth5) datasets (thicker lines). The markers show the minima for each curve.

4.7 Projecting the data on the first two canonical images obtained by Penalized Discriminant Analysis of the 8-way classification problem. The points are labeled according to the class (1: first baseline, 2: first active, 3: second baseline, etc.), and the class means are shown in circles. This figure was obtained from tensor-product B-spline projected data using 35 B-spline bases in each dimension (B35) model with λ corresponding to the minimum Squared Prediction Error (SPE), but is similar across bases and λ's.

4.8 Top panel compares wavelet and B-spline results in the 2-class problem. Shown are Daubechies order 2 thresholded wavelet basis compared to raw and B-spline representations using the .632+Bootstrap estimate of squared prediction error. The bottom panel shows SPE curves for the two-class problem for various wavelet families. We compare Daubechies order 2 and 3 families and the order 2 Coiflet system. For each family we investigate two thresholding strategies: simply 'peeling off' one finest detail level in each dimension (32 x 32 x 16), and the Donoho and Johnstone VisuShrink hard thresholding rule.

4.9 Visual comparison of wavelet (top row) and B30 representations in the 2-class problem. First three slices show portions of the cerebellum, the next two display the midbrain portions, and the last three slices depict the activation of the cortex. The grayscale image is the anatomical MRI scan in the Talairach space and the CV is overlaid on top of it using the hot-metal color coding. Both images were created using the EDF that minimized the SPE: 74.6 for wavelet vs 53.6 for B30.

4.10 Comparing the B-splines (B30) and wavelets (Daub2Thresh) Canonical Images using a corner-cube environment of Rehm et al. (1998) in the two-class setting. Except for the three major overlapping regions, the foci have been fit inside a ball of the same volume as that of the corresponding focus. Blue foci correspond to the B30 Canonical Image.

4.11 Squared Prediction Error for various wavelet families in the 8-class problem. Three wavelet functions are investigated: Daubechies order 2 and 4, and Coiflets order 2. For each family, we either remove the top-scale wavelet coefficient level, resulting in 32 x 32 x 16 wavelet coefficients, or we apply the D&J VisuShrink hard thresholding (Thresh). As a further dimension reduction technique, we investigate using all 7, or the first 2, CVs to perform classification (C7 vs C2).

5.1 Projections on the first 2 CVs from the PDA model applied to …

5.2 Projections of time-points (first row) and force level means onto the first four Canonical Images using the time-smoothed PDA model with B25 Tensor Product B-spline basis and B-splines for the time axis. Force levels were: 1: baseline, 2: 200g, 3: 400g, 4: 600g, 5: 800g, 6: 1000g. The time-structure penalty hyperparameter was set at λ_T = 10.

5.3 Selected slices of the third canonical image resulting from applying the time-smoothed PDA model to the StaticForce data with B25 basis, EDF = 80 and λ_T = 10.

A.1 1D and 2D B-spline basis.
Abstract

We extend a classical multivariate technique, Linear Discriminant Analysis (LDA), and apply it in the analysis of PET and fMRI images of human brain function to discover regions of activation driven by the experimental stimuli. We re-examine and specialize some equivalences between LDA and Canonical Correlation Analysis (CCA) and Multivariate ANOVA (MANOVA). Furthermore, efficient algorithms are derived to facilitate applying these multivariate models to extremely large image data. We deal with the ill-posed nature of the problem using spatial basis expansion and penalization (with the Penalized Discriminant Analysis (PDA) of Hastie et al. (1995)), and utilize efficient measures of predictive performance to optimize hyperparameters and validate the models in a robust fashion. We examine expanding the images into a 3D tensor-product B-spline and wavelet basis and compare to the results obtained without expansion. Some parallels between our proposal and some of those currently popular in the neuroimaging community are discussed. Another extension to PDA is derived and applied that allows one to model time series effects that exist in fMRI images. We conclude with many possible enhancements to the proposed paradigm.
Acknowledgments

This work has been a truly interdisciplinary effort and therefore owes immensely to a large number of people. I would like to thank the whole VA PET team in Minneapolis for helping me in every way, fielding my questions and massaging the image data in some unorthodox and inconvenient ways. I would particularly like to thank Mr. Jon Anderson and Mr. Kirt Schaper for their data and system help.

I would also like to express my gratitude to Dr. Nick Lange from Harvard Medical School. Nick has provided me with many insights, data and some financial backing that have all been very important in completing this thesis. I truly hope that we will be able to work together more in the future.

No thesis work is possible without the guidance and encouragement of a supervisor. I have been most lucky with two such persons: Prof. Rob Tibshirani, to whom I owe a huge debt of gratitude for introducing me to modern statistics, inspiring me to pursue excellence worthy of such a supervisor, and instilling in me a belief in experiment and an enthusiasm for unorthodox approaches to data. On the other hand, I have enjoyed his unconditional trust and support in choosing the path I felt appropriate. There was so little hand-holding, even though I yearned for it at times; only recently have I been able to appreciate his restraint in leading me and his desire to let me discover my own way in the exciting and ever-growing, but definitely not simple, world of statistics.

This thesis also owes its existence to Prof. Steven Strother, PI of the PET group in the VA medical center in Minneapolis. Steve has given me enormous support and guidance throughout our cooperation. He has taught me how to work with complicated data, how to interpret the results of analysis, how to write papers and fight for them later on, and has given me moral support all the way along. His enthusiasm was crucial at times. I have also learned from him how to look beyond the beauty of mathematical models, how to accept the sometimes brutal truths spoken by the data, which more often than not force us to abandon complication and retain simplicity.

I would also like to offer special thanks to all faculty in the Departments of Statistics and Biostatistics at the University of Toronto, and in the Department of Statistics at Stanford University. I would especially like to thank Prof. Paul Corey in Toronto for his support and encouragement, Prof. James Stafford in Toronto for his support and enthusiasm, and Prof. Trevor Hastie in Stanford for his suggestions and opinions. In particular, the material in sections 3.3 and 5.2 was suggested by Trevor. I also thank him for providing me with an excellent display of wavelet functions (figure 4.1). I am greatly indebted to my friends and colleagues, both in Toronto and in Stanford: Ilana Belitskaya, Annie Dupuis, Celia Greenwood, Carmen Mak, Mike Manno, Jorge Picazo, Bogdan Popescu and Jiaming Sun: thank you!

Last, but certainly not least, my love and deep appreciation goes to my family. I want to thank my wife, Kasia, who has been so patient with me, my brother and his family for love and support, and my parents who have always believed in me in the most difficult of times.
Chapter 1
Introduction
1.1 Images as Data
Data comes to us in various forms; increasingly, it is collected, stored and analyzed in the form of images. To me this is not very surprising: humans receive most of their information about the world surrounding us as visual input. It is only natural that we upgrade the position of this medium in the statistical analysis domain.
There are various reasons why it is only in recent years that image data has become increasingly common. Most important are technological advances on many fronts. To store and process any data, be it sound, image, or categorical attributes, we need, as far as our current information processing techniques go, a numerical representation. The methodology for acquiring images and converting them to numbers, a process known as scanning or image digitizing, has become very accessible. Moreover, many medical instruments have been digitized and already store and process images in digital form. Other technological advances had to be made in the storage domain: image data is very byte-consuming. And here, again, much has been done in recent years, mainly to meet the demand of multimedia-savvy consumers. Hard disks of ever-increasing capacity and speed, and other forms of computer storage (most recently DVD, which stores over an hour of good-quality video on a laser disk) have become very accessible.
As important, although less well known, have been the advances in image compression and processing. To appreciate their importance, one needs to understand the sheer size of image data after it has been converted to a numerical representation, or digitized. After digitizing, the image is viewed as a large number of indivisible components called pixels (picture elements), or voxels (volume elements) when the image is three-dimensional. The more pixels there are, the better the resolution of the digitized image. To represent colour information in each pixel, one usually needs three numbers, one for each of the Red, Green and Blue colour channels. The number of available colours is governed by the maximum allowed value of each such number, which is in turn connected to the number of bits allowed for each number. Today 24 bits per pixel (or 8 bits per number), so-called "TrueColor", giving 2^24 or almost seventeen million possible colours, is an industry standard. To give some idea about the storage requirement, a typical 640 by 480 TrueColor image needs 921,600 bytes, almost a megabyte, to store its 307,200 pixels.
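The storage arithmetic above can be checked with a short sketch (plain Python, no external libraries; the function name is only illustrative):

```python
def raw_image_bytes(width, height, bits_per_pixel=24):
    """Uncompressed storage needed for a width x height image."""
    return width * height * bits_per_pixel // 8

pixels = 640 * 480                 # 307,200 pixels
size = raw_image_bytes(640, 480)   # 3 bytes (R, G, B) per pixel -> 921,600 bytes
colours = 2 ** 24                  # 16,777,216 distinct 24-bit colours

print(pixels, size, colours)       # 307200 921600 16777216
```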
It is obvious that storing images requires compression. General compression algorithms are not optimal since they do not take into account the spatial structure of images. Better are methods that use the information about the special meaning of the bytes and incorporate smoothness assumptions. Two such methods have become very popular: GIF and JPEG. Both are capable of reducing the size requirement many-fold. Increasingly, wavelets are being tried for this task (Antonini et al., 1992).
The old saying "a picture speaks a thousand words" has a special meaning to statisticians and biostatisticians. After all, one of the main goals of statistics is to extract and summarize the information in the data, separating it from noise and error. If images indeed carry so much more information, then we should be particularly interested in adapting, extending and developing statistical methodology for analyzing them. Statistics, however, has been slow in joining other fields that have become heavily involved in image analysis, such as the Machine Learning community. There are some notable exceptions, such as Duda and Hart (1973), who introduced some statistical methodology, mainly in the classification domain, to the pattern recognition community; Ripley (1981), who has summarized existing spatial statistics techniques; or Cressie (1993), who has provided a theoretical framework for mathematical image processing. These are just a few examples, of course, and many more could be found. The point is, however, that the statistical analysis of images is still not a well-established division in our field.
The Machine Learning community, which has already provided statisticians with many new, exciting models and methods, such as Neural Networks (Hertz et al., 1991), Support Vector Machines (Vapnik, 1995), and the very promising Boosting learning (Freund (1995), Freund and Schapire (1996), but see also Friedman et al. (In Press)), has been much quicker to approach image data. Most of the examples there, however, deal with building predictive models that use images as inputs: for instance, character and zip-code recognition models. Work in the direction of statistical understanding and analysis of image data has been much less prominent.
There are various possible reasons why statisticians have been reluctant to consider image data. One is the difficulty in working with images: it requires considerable computing sophistication. Images are stored in many formats, and easy-to-use libraries for reading, writing and transferring between them are not readily available. The problem of huge image sizes, where one image may be many times larger than a typical whole dataset used in statistics, requires much larger computational resources and special programming approaches. Another reason may be the total inadequacy of many common statistical tools. Let us take a simple example where the n x p input data matrix X consists of n images, each with tens or hundreds of thousands (p) of pixels, and where each image is associated with one or more numerical responses, y. Now, if we think of running a regression of y onto X, where n is many times smaller than p, we immediately see the difficulty with such data. Most of the asymptotic results we use, increasingly questioned even with "regular" data, are totally inadequate here. Working with images requires a new way of statistical thinking that questions and examines all the assumptions we have come to rely on in statistics.
1.2 Motivation and the Setup for Statistical Image Analysis

This work deals mostly with methods of analysis of medical images that have been acquired under various experimental conditions. Imaging techniques in medicine have been increasing in importance ever since the introduction of X-ray imaging. There are now many modalities that are routinely used to gather visual information in vivo about the workings of our organism. In this dissertation I center on the neuroimage data: scans of the living brain that represent patterns of neuronal activation. The techniques introduced, however, have, in my opinion, a much greater application. In this section I present a general experimental situation which I view as a basis for the methodology presented in the later chapters.
Let us imagine that we have S subjects and that several images have been acquired from each of them. Suppose further that the images have been obtained under various conditions, either induced experimentally or sampled from the population. For instance, we may have "normal" and "sick" patients, or the patients may be asked to perform certain tasks. In the first case each patient is uniquely assigned to one of the conditions; in the second case we have a blocked design with subjects as blocks. There could also be other variables collected on the patients, either for each image separately or once per subject, but such variables are of secondary interest and would be used in the analysis to control for confounding. Each observation could be presented as follows:

(i_{skr}, z_{skr})

Here, and throughout the document, i will denote an image. The indexes s and r denote subjects and repetitive scans acquired from a given subject, respectively, while k refers to the condition or state under which the image was acquired. The image data may be supplemented by additional measurements, z, as mentioned above.
It is assumed that the goal of the analysis is to estimate the "differences" among the images that were acquired under various conditions. I use the quotes on "differences" to stress the fact that I do not necessarily mean algebraic differences, but any measure of disparity. We are thus required to summarize that part of the variability in the images that was induced by the conditions. The important goal of any descriptive analysis of experimental data is to provide some decomposition of that part of the variability that is associated with the experimental setup. Furthermore, it is desired that the results be in the form of one or more images, with important measures (such as percentage of explained variability, assessment of the type of variability explained by each summary image, etc.) attached. This is quite different from the predictive goal of many AI methods: we would not be content with a black box that takes the images as an input and predicts their condition, k, for example.
One of the difficulties alluded to in previous sections is an atypical data setup. In the general framework introduced above, we have independent observations, {i, z}, that have a huge apparent dimensionality equal to the number of pixels (plus whatever extra variables z we measured), while the number of such observations is many times smaller. This is referred to as an extremely ill-posed problem (e.g., Lautrup et al., 1995). This is one of the reasons that the usual inferential statistics based on asymptotic results cannot be applied. The motivation of this work was to develop a framework that would be testable with as few distributional assumptions as possible. To this end, I utilize measures of predictive performance as a goodness-of-fit assessment that are robust across distributional assumptions.
1.3 Functional Data
Image data may be thought of as a two- and three-dimensional extension of functional data: data that is realized by observing smooth functional processes discretized on a common lattice. Ramsay and Silverman (1997) have surveyed and extended common linear statistical methods for such data. They develop functional alternatives to Principal Component Regression, General Linear Models (including MANOVA), Canonical Correlation and Linear Discriminant Analysis, among others.
The starting point for all the models described in the book is the definition of the functional inner product. Since inner products are the workhorse of all the linear methodology in statistics, providing the right definition for functional data yields a functional equivalent of the model. The authors settle on the usual L2(R) Hilbert space inner product:

⟨x, y⟩ = ∫_T x(t) y(t) dt,    (1.2)

where T is the domain of the data.
Of interest for this thesis is the functional approach to Canonical Correlation Analysis (CCA). The classical CCA may be stated as a maximization problem (see also Eq. 3.38):

max_{a,b}  a'Σ_xy b / [(a'Σ_xx a)(b'Σ_yy b)]^(1/2),

where the Σ's are the appropriate covariance and cross-covariance matrices for x_i, y_i, the N observed pairs for which we seek to fit the CCA. If we now assume that what we observed are pairs of functions, x_i(t), y_i(t), we may define sample variance operators, e.g.,

V_x ξ = (1/N) Σ_{i=1}^N ⟨x_i, ξ⟩ x_i

for centered functions, with V_y and the cross-covariance operator V_xy defined analogously.
With these definitions the functional CCA criterion is posed:

max_{ξ,η}  ⟨ξ, V_xy η⟩ / [⟨ξ, V_x ξ⟩ ⟨η, V_y η⟩]^(1/2),    (1.6)

where ξ, η are functions belonging to the Hilbert space defined by the inner product (L2(R) for the one defined in (1.2)). To obtain unique and interpretable solutions, the inner products in the denominator are modified via penalization. Typically a second-derivative penalty would be used. If one denotes the second-derivative differential operator by D², we can modify the criterion (1.6):

max_{ξ,η}  ⟨ξ, V_xy η⟩ / [(⟨ξ, V_x ξ⟩ + λ⟨D²ξ, D²ξ⟩)(⟨η, V_y η⟩ + λ⟨D²η, D²η⟩)]^(1/2),    (1.7)

which, under mild regularity and boundary conditions, is equivalent to replacing λ⟨D²ξ, D²ξ⟩ by λ⟨ξ, D⁴ξ⟩ (and similarly for η), using the fourth-derivative operator D⁴.
Now that the criterion is posed, what remains is to develop an algorithm to optimize it, given the observed data. One possibility, pursued in Ramsay and Silverman (1997), is to use a basis expansion step which permits the use of standard tools for functional data. That is, given a system {φ_k(t)}, k = 1, ..., K, we assume:

x_i(t) = Σ_k c_{ik} φ_k(t),    y_i(t) = Σ_k d_{ik} φ_k(t).

(There is nothing in the following algebra that requires the use of the same system for x and y; in fact, for the linear discrimination only the x's are expanded.)
In place of the operators V_x, etc., one has matrices expressing the covariances in the smooth basis domain, built from the coefficient matrices C = (c_{ij}) and D = (d_{ij}). The covariance matrix for the basis is simply J_{jk} = ⟨φ_j(t), φ_k(t)⟩. Similarly, the penalty matrix is K_{jk} = ⟨(D²φ_j)(t), (D²φ_k)(t)⟩. With these definitions we can show that the penalized criterion (1.7) becomes:

max_{a,b}  a'J(C'D/N)Jb / [(a'(J(C'C/N)J + λK)a)(b'(J(D'D/N)J + λK)b)]^(1/2),    (1.11)

where the left and right canonical variates now are:

ξ(t) = Σ_j a_j φ_j(t),    η(t) = Σ_j b_j φ_j(t).
To see that, let us look at one component of the denominator, ⟨η, V_y η⟩, without penalty. The covariance kernel is:

v_y(s, t) = (1/N) Σ_i y_i(s) y_i(t) = Σ_{j,k} [D'D/N]_{jk} φ_j(s) φ_k(t),

so that ⟨η, V_y η⟩ = b'J(D'D/N)Jb. Thus, with the penalty term added, we get the two components of the denominator of (1.11). The numerator can be derived in an almost identical manner.
The discriminant version of this is to use the class indicator matrix Y without any regularization, but to regularize the observations x_i(t). Our approach has been simpler: as we show in Sec. 3.3.4 we only expand the canonical variate (here, ξ) in the smooth B-spline and wavelet bases. Compared with the approach presented here, this does not necessitate fitting the basis to the data to obtain the coefficients c_{ik}, but merely projecting the data onto the basis. With least-squares fitting, the difference is that we would be using a non-orthogonal projection onto the space spanned by the basis functions; instead, we orthogonally project the observations x_i(t) onto each basis function separately. This means that it is not necessary to compute the basis covariance matrix, J, in our case. For an orthogonal basis, like wavelets, the two approaches are identical. For a non-orthogonal basis, like the B-splines that we use, it would be a worthwhile experiment to compare our (simpler) model with that proposed by Ramsay and Silverman (1997).
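To give a sense of the computations involved, the following is a minimal penalized-CCA sketch in Python (purely illustrative code, not part of the thesis software). It operates on data or basis-coefficient matrices; the identity default for the penalty matrix K is an assumption standing in for the roughness penalty λK above, and the tiny ridge on the y-side is added only for numerical stability.

```python
import numpy as np

def penalized_cca(X, Y, lam=0.0, K=None):
    """First penalized canonical pair for data matrices X (N x p), Y (N x q).

    Maximizes corr(X a, Y b) with the x-side norm penalized by lam * a'Ka,
    in the spirit of the penalized criterion discussed in the text."""
    N = X.shape[0]
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    Sxx = Xc.T @ Xc / N
    Syy = Yc.T @ Yc / N
    Sxy = Xc.T @ Yc / N
    if K is None:
        K = np.eye(X.shape[1])            # ridge penalty as an assumed default
    # Whiten each block (Cholesky), then take the leading SVD of the
    # whitened cross-covariance: the classical reduction of CCA.
    Rx = np.linalg.cholesky(Sxx + lam * K)
    Ry = np.linalg.cholesky(Syy + 1e-8 * np.eye(Y.shape[1]))
    M = np.linalg.solve(Rx, Sxy) @ np.linalg.inv(Ry).T
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Rx.T, U[:, 0])    # left canonical weight vector
    b = np.linalg.solve(Ry.T, Vt[0])      # right canonical weight vector
    return a, b, s[0]                     # s[0] = first (penalized) correlation
```

With lam = 0 this reduces to classical CCA; increasing lam shrinks the x-side solution in the metric defined by K.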
1.4 Notation and Conventions
We will adhere to typical statistical notation, with a few exceptions. Thus x will denote a dependent observation (column) vector, with one or more subscripts as required. For classes (or conditions) we will use superscripts in brackets. Thus x_ij^(k) will denote the (ij)th observation, with the meaning of the subscripts explained at appropriate places, obtained under the kth class or condition. In place of x, we will use i if we are referring to the scan data. We intend to call the input images scans, and usually reserve the word image for the result of some analysis that lies in the same space as the input scans, and may thus be visualized in the same way. However, when the context makes the difference clear, we may sometimes use "scan" and "image" interchangeably for stylistic reasons.
We use boldface in formulas to distinguish vectors from scalars if there is a danger of confusion, such as when both scalars and vectors appear. Upper-case Roman letters from the late alphabet (U, V, W, X, Y, Z) and Greek letters are used for matrices, without boldface. In a few places boldface upper-case X, Y are used to denote random vectors, but, in general, we do not make notational distinctions between random variables (including vectors) and their realizations, unless it is necessary.
We have intended to limit and localize the use of acronyms. Some well-known ones (e.g., MANOVA) are used freely. Other acronyms used globally are explained in Table 1.1.
LDA (Linear Discriminant Analysis): statistical technique, due initially to Fisher, for classifying multivariate observations into one of a few populations by means of linear discriminant functions.

PDA (Penalized Discriminant Analysis): extension of LDA due to Hastie et al. (1995) that introduces general penalization of the covariance matrix and provides an appealing algorithm with a penalized regression as its main component.

CCA (Canonical Correlation Analysis): another classical multivariate technique that, given observations with variables divided into two sets, finds "left" and "right" linear combinations that exhibit maximum correlation. LDA can be seen as a special case of CCA.

PET (Positron Emission Tomography): one of the few tomographic imaging techniques, especially useful for imaging of the brain. The PET camera picks up gamma rays emitted within the imaged organ by a previously injected radiotracer.

fMRI (functional Magnetic Resonance Imaging): another imaging technique used by neuroscientists to study brain function. fMRI is a specialization of MRI that is able to measure relative concentrations of oxygenated blood.

CV (Canonical Variate): the linear combination(s) that result from CCA and LDA. If the CV lies in the image space, or has been reconstructed using the B-spline or wavelet basis, we sometimes refer to it as a Canonical Image.

SPE (Squared Prediction Error): the main measure of predictive performance that we use, defined as SPE = (1 - p̂)^2, where p̂ is the posterior probability of the true class.

EDF (Equivalent Degrees of Freedom): one way to normalize the ridge penalty hyperparameter, by calculating the trace of the "hat" (projection) matrix in ridge regression.

Table 1.1: Some common acronyms used throughout the thesis
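As a concrete illustration of the SPE entry in Table 1.1 (illustrative code, not part of the thesis software), the per-scan error can be computed from a matrix of class posterior probabilities as follows; the posterior values below are made up.

```python
import numpy as np

def squared_prediction_error(posteriors, true_class):
    """Per-scan SPE = (1 - p_hat)^2, where p_hat is the posterior
    probability assigned to the true class of each scan (Table 1.1)."""
    p_true = posteriors[np.arange(len(true_class)), true_class]
    return (1.0 - p_true) ** 2

# hypothetical posteriors for 3 scans over 2 classes
post = np.array([[0.9, 0.1],
                 [0.4, 0.6],
                 [0.5, 0.5]])
labels = np.array([0, 1, 0])
spe = squared_prediction_error(post, labels)   # per-scan errors
```

Averaging `spe` over held-out scans gives the goodness-of-fit measure used for model comparison.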
Chapter 2
Neuroimaging Data and Methods
2.1 Goals and Study Design in Neuroimaging
Neuroimaging is a relatively young discipline that attempts to study the workings of the brain and the central nervous system through imaging techniques (Frackowiak et al., 1997). The most important goal is to discover the functional organization of the brain: the networks of functionally connected structures in the brain specific to a given task or group of tasks. It is postulated that the brain is organized in various, likely overlapping, networks that are connected by function rather than anatomically (Strother et al., 1995a; McIntosh et al., 1997). These networks, which can be observed as patterns of activation, come to life when the brain faces a specific challenge, and work together to deliver a response.
There are two opposing views of brain organization. One depicts the brain as a monolithic black box of neurons. This view of massive parallelism has led to the development of Artificial Neural Networks, which have become a very successful computational and modeling device, rather than a true model of the brain. The other view, of precise localization of areas within the brain, has been supported throughout the century by a series of first anatomical, and then functional, discoveries of specific regions in the brain, starting with the discoveries of language components in the brain by Broca, Wernicke and Lichtheim at the end of the nineteenth century. The updated view lies, as often happens in science, somewhere in between. One way to describe it (Strother et al., 1995a; McIntosh et al., 1997) is to consider functional networks of areas, where the networks are specific for the task. Therefore, on some level, we do have homogeneous areas in the brain, but it takes a system of these, not one, for the brain to process a task. The same regions are likely used in quite different situations, when they will be connected in different networks.
A similar view is espoused by the notions of functional segregation and functional integration (p. 5, Frackowiak et al., 1997). The functional segregation concept refers to a large number of spatially localized areas in the brain that work more or less independently. The integration idea refers to the global integration of these specialized areas in the face of a task. These two views are not exactly the same: if we assumed that some orthogonality needs to be imposed, the functional-networks concept would more easily correspond to orthogonal networks of areas, while the functional integration/segregation description would be better served by orthogonality among the specialized areas. It is not clear, however, that any orthogonality assumption is correct in describing brain function.
To clarify some ideas I now provide an example using parts of the cortex responsible, to some degree, for our motor abilities. Most of the description here is taken from Frackowiak et al. (1997, Ch. 11) and Noback et al. (1991). There are many parts of what is known as the motor cortex: primary motor cortex (M1), supplementary motor area (SMA), premotor cortex, also known as pre-SMA, secondary motor cortex and cingulate motor area (CMA). All of these correspond to distinct Brodmann areas, which are a meticulous division of the human brain made on the basis of the local properties of the brain cells (cytoarchitecture). It has been established that M1, pre-SMA and SMA contain multiple representations of the body, that is, they are organized somatotopically. In other words, one can find mostly contiguous areas that correspond to all parts of our voluntary motor system, from the toes to the tongue and facial muscles. The situation is quite complicated: the cingulate parts of the motor cortex are still controversial, the pre-SMA and SMA areas seem to be composed of even smaller parts with some autonomy, and the function of the secondary motor cortex remains largely unknown. There are also subcortical areas in the cerebellum and the ventral part of the thalamus which contribute significantly to the functioning of the motor cortex. There are also parts of the somatosensory systems necessary for motor control.
Major research questions center on the functional significance and connectivity of all these, and other related, systems. We would like to understand how the brain controls our musculature, how it plans and executes movements, and how the working of the motor cortex of a fine pianist differs from that of an average person. Can movements be divided into groups that correspond to distinct activity patterns? We are only beginning to tackle these questions, mostly with neuroimaging techniques. The second half of the last, and all of the present, century have provided us with a huge body of knowledge related to the anatomical arrangement of the brain. We thus know a fair amount about the major connections in the brain, but to enhance our understanding of this most important organ of ours, we must concentrate on its functional arrangement.
A common type of study design used in neuroimaging attempts to delineate the activation signal using two contrasting states. The states are chosen in such a fashion that the difference in activation patterns will provide maximum information about a specific function of the brain. For example, in our finger opposition (FOPP) PET data (Sec. 2.4.1) the baseline and activation states differ only in the presence or absence of paced finger movement; in particular, the eyes are patched in both states and there is no auditory input except for the pacing signal in the active state. Similarly, in the StaticForce fMRI experiment (Sec. 2.4.2) the subjects observe control lines during the baseline state to compensate for the visual stimulation due to the force level display in the active states.
In another kind of study design one gradually varies a single parameter that relates to the strength of a supposed neural signal; an example is the StaticForce dataset. This situation is similar to the dose-response relationship that is commonly observed in pharmacological studies; here one would look for patterns of activity that change with the parameter. The change may be monotone, linear or not, or may be abrupt at first (in a transition from baseline to active state) and not change much thereafter. Indeed, both kinds of activity may be related to two different neuronal patterns at the same time. For instance, Sadato et al. (1996) report how the bilateral primary motor area and contralateral ventral premotor cortex, among others, were equally activated during an active phase of finger movements of increasing complexity. On the other hand, the ipsilateral premotor area, also among others, has shown a linear increase with movement complexity. We therefore observe at least two important networks of activation in this study: one associated with the movement itself, an executive network, and one responsible for the processing and planning of movement, which therefore has had an increased activity for complex tasks. Similar results have also been reported by Catalan et al. (1998).
2.2 PET and fMRI Modalities
The object of any neuroimaging modality is to reveal the neuronal activity throughout the brain volume or part thereof. The two most commonly used whole-brain modalities are Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI). Neither of these actually records the neuronal spiking patterns; rather, they go after a "proxy" measure whose correlation with the neuronal activity has been established.
2.2.1 Positron Emission Tomography
Positron Emission Tomography (PET) is a general imaging technique that is used for many purposes in medicine where an image of physiological function is required. The PET modality is an improvement over other radioisotope-based modalities, such as single-photon emission computed tomography (SPECT). The description in this section is based mainly on Ollinger and Fessler (1997).
PET works by counting the number of high-energy (511 keV) photons emitted from the imaged organ. In summary, positrons are created when the injected radiotracer steadily decays; each decay produces a single positron which very shortly annihilates with an electron. The annihilation produces two 511 keV photons propagating in nearly opposite directions. The PET camera is able to detect single photons and synchronize two hits to establish that the two photons originated from the same annihilation. In this way discrete approximations to the line integrals of radiotracer density along many lines are computed, and the 2D or 3D image of the activity is obtained by the inverse Radon transform.
A PET camera has detectors made of crystals (usually bismuth germanate) which convert a single high-energy 511 keV photon into about 2,500 light photons. These are then fed into photomultiplier tubes (PMTs) which then change the light activity into electrical signals. Most of the scanners connect each block of small crystals (e.g., a 7 x 8 array of crystals) to a block of fewer PMTs (e.g., a 2 x 2 array of them). The crystals in the block differ slightly among each other, which allows the camera to determine the one crystal in the block hit by a photon. The camera counts the number of events: a pair of photons hitting two crystals on opposite sides of the camera within a very short period of time, called the coincidence timing window, usually about 10 ns. These counts, after they have been processed by the inverse Radon transform, create the 3D image of organ function.
There are many simplifying assumptions and problems that decrease the signal-to-noise ratio of the data. First, it is assumed that the positron will annihilate with an electron immediately after being emitted. It has been shown that the positron range is usually smaller than 1 mm, which is much smaller than the resolution of the scanner, and is therefore ignored. Another assumption is that the annihilation will produce photons flying out in exactly opposite directions. It has been established that the divergence from collinearity is on the order of one degree or less, and can also be ignored. The other problems are more serious and usually cannot be ignored. The first of these is attenuation: a decrease of the photon's energy due to its interactions with body tissue and with outer-shell electrons. The interaction with body tissue, a photoelectric interaction, while a big problem for SPECT, is negligible for PET due to the type of radiotracers used. The other type of interaction, Compton scatter, can be statistically corrected for in the image reconstruction process.
This correction is possible because the attenuation experienced by the pair of photons is independent of the position of the annihilation event. To correct for attenuation one approximates the probability of a single photon pair experiencing Compton scatter, which depends on the total distance traveled, and then includes this correction in the image reconstruction step. The probability of a photon traveling along the line l without experiencing Compton scatter is modeled by the following equation:

p(l) = exp( - ∫_l μ(x) dx ),    (2.1)

where μ(x) is the linear attenuation coefficient at position x. This probability is usually approximated by obtaining two extra scans: transmission and blank. These are obtained with a line source of radiation rotating around the field of view of the camera with (transmission scan) and without (blank scan) the subject. The ratio of counted events for each possible line l approximates the probability (2.1). The number of detected events in the regular emission scan is then corrected for the probability of scatter.
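In practice this correction amounts to a per-line division of the emission counts by the estimated survival probability. A minimal sketch (illustrative code with made-up count values, not scanner software):

```python
import numpy as np

def attenuation_correct(emission, blank, transmission):
    """Correct emission counts using the blank and transmission scans.

    For each line of response, transmission/blank estimates the survival
    probability p(l) of Eq. 2.1; dividing the emission counts by it
    undoes the attenuation."""
    p_survive = transmission / blank
    return emission / p_survive
```

Noise in the transmission scan propagates directly into the corrected counts, which is one reason long transmission acquisitions are preferred.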
An additional way to correct for Compton scatter, so-called scatter correction, comes from the fact that the scattered photons have smaller energy. The energy can be measured, to some degree (to about 10% on most scanners with bismuth germanate crystals), at the individual detectors, and then a threshold is established below which the events are not counted.
Compton scatter produces another undesired effect: it causes a deflection in the path of the affected photon. Most of the time such a photon will not hit any detector; the unscattered complementary photon which will hit the detector is called a single. It is possible, given the large number of scattered photons, that two singles will hit the camera within the coincidence timing window, and therefore be erroneously counted as an event that occurred on the line joining them. Such undesired events are called randoms or accidental coincidences.
The last problem which needs to be corrected for is detector deadtime. This is due to the finite amount of time that the detector needs to process a hit; during this time the detector is not able to sense any other hits. The detector deadtime limits the maximum dose of the radiotracer that can be placed in the patient: the researcher will try to use the maximum dosage that will still not saturate the camera, but which provides enough events to make viable the discrete approximations, used by algorithms such as filtered backprojection, to the line integrals implicit in the Radon transform.
The PET data can be collected in 2D or 3D acquisition modes. In 2D mode, the events are counted in slices physically determined by collimators: thin annular rings of tungsten called septa. This mode results in greater accuracy by decreasing the probability of scattered events and randoms, since many more of the scattered photons originating in the 2D field of view will never hit the collimated detectors. However, 3D mode, in which the septa are retracted and all possible events are counted, has up to eight times increased sensitivity, which leads to decreased image variance and/or lowered doses of radiotracer required. Until recently most of the PET data was collected in 2D mode, mostly due to the lack of 3D reconstruction algorithms. However, 3D mode is now gaining wide acceptance in the neuroimaging community.
The data collected by the PET camera are in the form of counts for each possible line in the field of view. To reconstruct the image of the radiotracer density inside the imaged organ one uses a computational approach to solving the inverse Radon transform, called filtered backprojection (FBP). After correcting for some or all artifacts, such as attenuation and scatter, one has, in the 2D scanning mode, the data in the form of counts of photons emitted along a given line, indexed by depth and angle, and denoted by N_{θd}. These counts constitute the input to the FBP. The goal is to estimate the 3D distribution function of the radioisotope, λ(x, y, z), with only the values of its line integrals available. FBP is a deterministic method that assumes we have observed perfect data, that is:

N_{θd} = ∫_{l(θ,d)} λ(x, y, z) dl,

over a discrete set of angles θ and depths d. This is an instance of an inverse problem, and one specialized solution to this problem is the algorithm called filtered backprojection. The algorithm has been extended to full 3D reconstruction. There are also approaches that use distributional assumptions, mostly Poisson, but they have yet to gain widespread support.
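For intuition, a bare-bones 2D version of the algorithm fits in a few dozen lines. The sketch below (an illustration only, not the reconstruction code used for the data in this thesis) ramp-filters each projection in the Fourier domain and then smears it back across the image grid with nearest-neighbour interpolation:

```python
import numpy as np

def filtered_backprojection(sinogram, angles):
    """Minimal 2D filtered backprojection.

    sinogram: (n_angles, n_detectors) array of line-integral counts N_{theta,d}
    angles:   projection angles in radians
    Returns an (n, n) image, with n = n_detectors."""
    n_angles, n = sinogram.shape
    # 1. Ramp-filter each projection in the Fourier domain.
    ramp = np.abs(np.fft.fftfreq(n))
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
    # 2. Backproject: smear each filtered projection across the image.
    mid = (n - 1) / 2.0
    ys, xs = np.mgrid[0:n, 0:n] - mid
    image = np.zeros((n, n))
    for proj, theta in zip(filtered, angles):
        # detector coordinate of each pixel under this viewing angle
        d = xs * np.cos(theta) + ys * np.sin(theta) + mid
        d = np.clip(np.round(d).astype(int), 0, n - 1)
        image += proj[d]
    return image * np.pi / n_angles
```

Production implementations add apodized filters, interpolation better than nearest-neighbour, and the corrections for attenuation, scatter, randoms and deadtime described above.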
Functional Neuroimaging via PET

PET may be used for imaging many organs. In functional neuroimaging, where we want to obtain information about the neuronal activity, two radiotracers have emerged: [18F]fluoro-2-deoxy-D-glucose (FDG) and [15O]water. FDG is a radiolabeled glucose and allows PET to show the local glucose concentration in the brain. This in turn is able to show us the neuronal activity, since increased neuronal activity is very quickly followed by a surge of glucose-rich blood (Barinaga, 1997). Similarly, the [15O]water tracer allows PET to image the blood flow in the brain. Blood is also thought to surge into active neural areas to meet the demand for oxygen needed for the metabolism of glucose (there is, however, some controversy regarding the nature of the metabolism of neurally active areas; see for example Buxton and Frank (1997) and Barinaga (1997)).
2.2.2 Functional Magnetic Resonance Imaging

We will first describe (non-functional) MRI, which is sometimes called anatomical MRI in the neuroimaging community, as it describes the static anatomy of an organ (the brain, for example) rather than its dynamic function. Most of the description here is based on the overview article by Wright (1997).
General Physics of MR Imaging
MR techniques are based on magnetic properties of atoms, called nuclear magnetic resonance, first observed in the forties. In medical imaging one uses, almost exclusively, the simplest atom: the single-proton hydrogen nucleus. A hydrogen atom may be thought of as a minuscule magnet with its two poles producing a local magnetization vector in a certain orientation. In the absence of any external magnetic field of significant strength, thermodynamic movement causes a random distribution of the local magnetization vector directions, which results in a net magnetization, M, equal to zero.
MR machines used for human diagnostics apply a static field, B0, of a strength that is about five orders of magnitude higher than the earth's field (a typical MR machine has a 1.5 Tesla (1.5T) static field, which is about 20,000 times larger than the earth's field). The most visible effect of such a large static field is that it causes a small portion of the hydrogen nuclei to align themselves in the direction of B0, which we presume to be along the vertical axis, Z, in a 3D reference frame. The process of alignment is not immediate; the net magnetization in the direction of B0 has an exponential delay:

M_z(t) = M_0 (1 - e^(-t/T1)),    (2.2)

where M_z is the vertical magnetization at time t after the static field has been turned on, M_0 is its asymptote, and T1 is the longitudinal relaxation time, which is a property of the material studied. For example, at 1.5T, T1 for grey matter in the brain is about 1000 ms, while for white matter it is only about 650 ms, and even less for fat (260 ms).
Assuming that there is a non-zero net magnetization component in the plane perpendicular to the B0 field, i.e. in the XY plane (which is not the case in a tissue without any magnetic influence; such a component is introduced by the MR machine as described later), the strong static field B0 causes the XY magnetization component to rotate around the Z axis as it tries to align itself with the static field. In the literature this rotation is called precession, and its angular frequency is directly proportional to the strength of B0. This frequency is called the Larmor frequency. Any rotating magnetic dipole, such as a hydrogen proton, generates an electrical current in a coil positioned perpendicular to the plane of rotation. This is the signal detected by the MR machine; the coils are made to resonate at the Larmor frequency of a proton to maximize the signal detected.
In general, any volume of a tissue that contains many protons will have no net magnetization in the XY, or transverse, plane. This magnetization is induced in the MR machine with an RF pulse that rotates with the Larmor frequency in the transverse plane. Figuratively speaking, one may imagine the RF pulse as tipping over these dipoles that have aligned themselves with the static field. The RF pulse is applied long enough to tip the dipoles to the horizontal direction; with time the XY magnetization component, while rotating around the Z axis, will return to the thermal equilibrium condition where the only significant component is in the Z direction. But an even larger component contributing to the decay of the XY signal comes from the gradual loss of phase coherence among the precessing dipoles. After the RF pulse, the precessing dipoles will be in phase. Due to their heterogeneous physical environment they will precess at slightly different rates, and as a consequence the phase coherency will be lost, resulting in a diminishing signal. The associated exponential decay has a characteristic time constant, denoted by T2*. To restore the phase coherency, MR machines apply another magnetic pulse that induces a spin echo. Specifically, let the dipoles evolve for time τ, when some phase discrepancy will be evident. Apply a short (as compared with τ) magnetic pulse in a single direction (say y) in the transverse plane (in practice one applies the pulse in the single direction of the transverse plane rotating with the Larmor frequency). This pulse effectively "flips" the dipoles about the y axis: those that were precessing faster and were "ahead" of y now lag behind by the same amount, and vice versa. The result of the refocusing pulse is that after time 2τ the phase coherency will be restored, assuming that the rate differences among the dipoles do not change with time.
In practice, the precession frequencies of different dipoles do change with time, and, despite the spin echo, the XY signal will decline. The dynamics of this decline, taking the refocusing efforts into account, are modeled as an exponential decay with constant T2:

M_xy(t) = M_0 e^(-t/T2).    (2.3)

As was the case for T1, the transverse relaxation time, T2, is tissue specific. For grey matter, white matter, and fat the T2's are 106 ms, 69 ms and 60 ms (all at 1.5T), respectively. All the above considerations are captured in one equation, discovered by Bloch in 1946, which describes the full dynamics of the magnetization field M(t) = (M_x(t), M_y(t), M_z(t)):

dM/dt = γ M x B - (M_x, M_y, 0)/T2 - (0, 0, M_z - M_0)/T1,

where B is the total magnetic field applied. The first term describes the general precession dynamics (γ is the gyromagnetic constant; for protons, γ = 2π · 42.6 MHz/T) and the remaining terms deal with the transverse decay (Eq. 2.3) and the gradual alignment of the dipoles with the static field (Eq. 2.2). After the signal declines to limiting levels, a short period is required for the system to return to thermal equilibrium within the static field before the next RF pulse is applied and the measurement process repeated. The total time between RF pulses is denoted by TR and is usually on the order of a few seconds. The time between refocusing pulses is denoted by TE = 2τ.
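The two relaxation laws, together with the tissue constants quoted above, are easy to tabulate (an illustrative sketch; times are in milliseconds and M_0 is normalized to 1):

```python
import numpy as np

# T1 and T2 at 1.5 T as quoted in the text, in milliseconds
T1 = {"grey": 1000.0, "white": 650.0, "fat": 260.0}
T2 = {"grey": 106.0, "white": 69.0, "fat": 60.0}

def m_z(t, tissue, m0=1.0):
    """Longitudinal regrowth after the static field is applied (Eq. 2.2)."""
    return m0 * (1.0 - np.exp(-t / T1[tissue]))

def m_xy(t, tissue, m0=1.0):
    """Transverse (spin-echo) signal decay (Eq. 2.3)."""
    return m0 * np.exp(-t / T2[tissue])
```

At t = T1 the longitudinal magnetization has recovered 1 - 1/e (about 63%) of M_0, and the grey/white differences in both constants are what make tissue contrast possible at typical TR and TE settings.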
Contrasts and Spatial Imaging in MRI
The measured MR signal, which comes from the precession dynamics, is eventually used to
produce the tissue images. Depending on the tissue under study, different contrasts may
be used; these are combinations of the T1 and T2 relaxation times. Depending on the time
window during which the data is acquired, one weighs either of these more heavily in the
resulting contrast. What is needed at this point is a way to spatially select regions for data
acquisition, to produce images. This is achieved in a few steps. Firstly, the "static" field,
B_0 in the Z direction, has a linear gradient. Since the precession frequency of protons in the
transverse plane depends on the strength of the field, one may acquire the data in slices by
applying RF pulses with different frequencies, matched to the precession frequencies in thin
slices along the Z axis. This ensures that the recorded signal comes mostly from the dipoles in
a specific horizontal slice.

To locate the signal in the XY plane, similar ideas are used. One introduces gradients
in the X and Y directions that also vary with time. The signal acquired up to time t, say,
will be a 2D spatial Fourier transform of the total slice magnetization, sampled at the
spatial frequency (k_x(t), k_y(t)), the so-called k-space. The k-functions are integrals of the
respective dynamic gradients over time, up to t. The image of the slice magnetization is
reconstructed using the inverse 2D Fourier transform.
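This acquisition-and-reconstruction loop can be sketched numerically: with full Cartesian sampling, k-space is just the 2D Fourier transform of the slice magnetization, and the image is recovered by the inverse transform (a minimal sketch; the phantom is illustrative):

```python
import numpy as np

# A toy "slice magnetization": a bright square on a dark background.
slice_mag = np.zeros((64, 64))
slice_mag[24:40, 24:40] = 1.0

# Fully sampled Cartesian k-space is the 2D Fourier transform of the slice;
# each acquired sample corresponds to one (k_x, k_y) location.
kspace = np.fft.fft2(slice_mag)

# Image reconstruction: inverse 2D Fourier transform of the k-space samples.
recon = np.fft.ifft2(kspace).real

assert np.allclose(recon, slice_mag, atol=1e-10)
```

In practice the gradients trace out a trajectory through k-space and only a finite set of frequencies is sampled, so the real reconstruction is an approximation rather than the exact inverse shown here.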
Functional MRI
Functional MRI was developed by Ogawa et al. (1990a,b). The "functional" adjective refers
to the novel utilization of MR technology for imaging physiological functions, as opposed
to the static structures for which it was originally proposed.
Functional MRI takes advantage of the differing magnetic properties of hemoglobin that
depend on whether it carries oxygen or not. Oxyhemoglobin is diamagnetic, as are other
tissues in the brain. Deoxyhemoglobin is paramagnetic and causes changes to the proton
molecules in the water within the blood and surrounding the blood vessel. This is called
a Blood Oxygenation Level-Dependent (BOLD) contrast. The paramagnetic nature of
deoxyhemoglobin is somehow "felt" by the water molecules "close by", which amplifies the
BOLD signal significantly. The change in magnetic susceptibility affects the distribution
of Larmor frequencies of nearby protons, causing a much greater phase spread. This in
turn causes a significant decrease in MR signal in the affected areas and results in a magnetic
contrast mostly dependent on the T2* time constant. Functional MRI produces images
that show local concentrations of oxygenated hemoglobin in veins and capillaries. Since
we believe that there is a strong correlation between levels of oxygenated blood and neuronal
activation, BOLD images may be interpreted as images of the brain function.
2.3 Literature Overview
PET and, most recently, fMRI data have been analyzed by a plethora of methods of
ever-increasing complexity. The most important challenges seem to be:

Huge input dimensionality: each "observation" is an image composed of 30-500 thousand
numbers. This leads to the "extremely ill-posed" situation (e.g., Morch et al.,
1997) where the number of variables is much larger than the number of observations.

Time series effects: even if all we need is a pattern of neuronal activity which corresponds
to the activity that we study, we know that we cannot obtain a truly repetitive
experiment within a subject: the brain state changes throughout the experiment,
as there is some learning, change in the environment, adaptation, and many other
transient effects.

Subject effects: with multiple-subject studies, needed to obtain results of some generality,
we observe (Strother et al., 1995a) that the differences among brains produce
effects which are much larger than the effect due to the stimulus under study.

Spatial correlation: the activity in nearby brain locations is correlated. This must be
acknowledged in either the modeling stage or in hypothesis-testing paradigms (e.g.,
Worsley et al., 1992), or both.
Many initial methods analyzed Regions of Interest (ROIs), which were manually
delineated using anatomical or other known regions in the brain (e.g., Clark et al., 1985).
The average activity within each region was used as the input to any subsequent analysis.
This resulted in a great reduction in input dimensionality, as one would typically have a
few dozen regions at most. This method has been mostly abandoned with the introduction
of methodology to deal with extremely ill-posed problems and the corresponding software
packages. The most fundamental criticism of the ROI methodology is directed at the fact
that manual and ad hoc ROI definitions impose a strong, mostly anatomical, prior on the
analysis.
Currently there are several groups of models used to estimate activation maps for PET
and the closely related functional magnetic resonance imaging (fMRI) technique. Our
categorization follows that recently proposed by Lange et al. (1999) in fMRI. First, consider
techniques that explicitly incorporate the experimental state of the subject for each
scan (e.g., baseline or activation), with possible additional explanatory variables such as
neuropsychological performance measures, which correspond to the "interesting indicator/categorical
variables" and "covariates" of the general linear model (GLM) approach of
Friston et al. (1995). Simple subtraction of (possibly standardized) average images from
two experimental states is the most widely used example of this approach (e.g., Fox and
Mintun, 1989, Worsley et al., 1992), and ANOVA-related methods for more than two
states have been generalized by the GLM. These methods attempt to find the activation pattern
which is driven by (or which drives) the experimental or observed conditions: an
imposed stimulus, motor task or abnormality.
The second category includes techniques such as principal component analysis (e.g.,
Friston et al., 1993, Strother et al., 1995b), which require no experimental brain-state
information. In these methods, one attempts to explain the general variability with a
set of independent or orthogonal components and then post hoc link some of these to
the experimental conditions. The problem with these methods is that the variability is
partitioned without any reference to the stimulus or experimental conditions, which are
then "sought after" among the resulting components. The third wide category includes all
the non-linear models, such as neural networks (e.g., Kippenham et al., 1994, Lautrup et al.,
1995, Morch et al., 1997), and, very recently, Volterra kernels within the GLM framework
(Friston, 1998).
All three categories may be applied to two different spatial data representations that
have evolved to deal with the highly ill-posed nature of the functional neuroimaging domain:
namely, the huge dimensionality of the input space, equal to the number of voxels in
the image (e.g., 20 to 30 thousand in PET), compared to the available number of independent
scans, which is typically only a few hundred. The most common spatial representation
initially ignores this issue by analyzing individual voxels, or volumes of interest (VOI), as
independent samples, generating a test statistic for each voxel (or VOI), and then post
hoc allowing for simple local spatial correlations by using inferential tests based on random
field theory to threshold the resulting statistical parametric maps (e.g., Worsley et al., 1992,
Friston et al., 1995, Worsley et al., 1996). The second representation uses a data-driven
basis, such as the one obtained from Singular Value Decomposition (SVD) of the input
data matrix, to reduce the effective dimensionality of the modeling problem. This was
introduced to PET for VOI measurements (Clark et al., 1985, Moeller et al., 1987, Moeller
and Strother, 1991) and was then extended to voxel-based [15O]water studies (Lautrup
et al., 1995, Strother et al., 1995a,b, Friston et al., 1996, Worsley et al., 1997).
2.3.1 Single Voxel Analysis: Statistical Parametric Mapping and
Gaussian Random Fields
In this section I will summarize one popular method: Statistical Parametric Mapping
(SPM) with Gaussian Random Field theory testing (Worsley et al., 1993, Friston et al.,
1995, Worsley et al., 1996), a method that works separately with each voxel and then
uses estimated spatial correlations to control the type I error in the multivoxel hypothesis
testing.
With the simplest SPM setup, one assumes that the scans come from two different
conditions, say Baseline and Active (Worsley et al., 1993). A more general framework
was provided in Worsley et al. (1996), where any (one) contrast could be used. Also, in
the newest incarnations the SPM may come from any method that generates a 'Z', 't', 'F'
or 'χ²' statistic at each voxel. We will keep to the simpler two-condition situation, but
the extension to general contrasts is immediate. Let x_ijk(x, y, z) denote the k-th scan from
subject i (i = 1, 2, ..., n) under condition j (j ∈ {A, B}). The normalized subject-specific
contrast images are formed:

    d_i(x, y, z) = [ x̄_iA(x, y, z)/x̄_iA(·,·,·) − x̄_iB(x, y, z)/x̄_iB(·,·,·) ] / √2.    (2.4)
Thus the scans within each subject are averaged for each condition (A, B) and divided
by the subject- and condition-specific constant x̄_ij(·,·,·), which estimates the global blood
flow. (Sometimes the scan-specific normalization by x_ijk(·,·,·) is used before averaging
over k (Strother et al., 1995a, Appendix), to try to remove the scan global blood flow.
Some normalization is necessary, especially for the PET measurements, as these are relative.
The proper normalization methods are subject to some debate (Strother et al., 1995a,
Appendix).) The √2 constant is used to keep the standard deviation of the difference the
same as that of the original scans (Worsley et al., 1996), but it does not appear in the earlier
versions of the method (Worsley et al., 1992). The contrast images are then averaged over
subjects to produce the mean difference image:

    Δ(x, y, z) = (1/√n) Σ_{i=1}^{n} d_i(x, y, z),    (2.5)
where n is the number of subjects. Again, Worsley et al. (1992) use n in place of √n. Some
estimate of the standard deviation of Δ is then used to normalize the mean difference across
voxels. There are many choices; the simplest, proposed in Worsley et al. (1992), is to
calculate the subject-specific estimate for each voxel and then average over voxels:

    S² = (1/V) Σ_{x,y,z} s²(x, y, z),    (2.6)

where V is the number of voxels and:

    s²(x, y, z) = (1/(n−1)) Σ_{i=1}^{n} [ d_i(x, y, z) − Δ(x, y, z)/√n ]².    (2.7)
This assumes that the variance across voxels is the same. The other choices are not to
pool across voxels, to pool over conditions using ANOVA or ANCOVA estimators (Friston
et al., 1991, Worsley et al., 1996), or to combine the latter with the pooled estimator 2.6
(Worsley et al., 1996). The inherent dilemma is the low degrees of freedom available if no
pooling across voxels is done (due to the small number of subjects) versus a strong assumption
of homoscedasticity across voxels if such pooling is done.

The statistical t-map is formed by dividing the mean difference image 2.5 by an estimate
of noise, which can either be an image itself (i.e., voxel-specific) or a scalar, as described
above. Using the estimate 2.6, for example, the t-map is:

    T(x, y, z) = Δ(x, y, z) / S.    (2.8)
This gives a t-statistic for every voxel. The problem is now to determine which of the
many thousand t-statistics are significant, designating neuronal regions with significant
change between the conditions. Typically, in PET images there would be about 30,000-
40,000 intracranial voxels which are used to form the t-map, and therefore that many
t-statistics. By using an unadjusted significance level, α, we will seriously overestimate
the overall significance, or inflate the type I error, because of the multiple testing problem.
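The severity of this multiple-testing problem is easy to demonstrate on simulated null data (a minimal sketch following the pooled-variance t-map construction above; the subject and voxel counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, V = 10, 30000                     # subjects, intracranial voxels

# Per-subject contrast images d_i under the null: no activation anywhere.
d = rng.standard_normal((n, V))

# Mean difference image scaled by sqrt(n); variance pooled across voxels.
delta = d.sum(axis=0) / np.sqrt(n)
S = np.sqrt(d.var(axis=0, ddof=1).mean())
t_map = delta / S

# Testing every voxel at an unadjusted alpha = 0.05 "detects" activation
# in roughly alpha * V voxels even though none exists.
false_hits = (t_map > 1.645).sum()   # 1.645 = N(0,1) upper 5% point
assert false_hits > 1000             # ~1500 expected under the null
```

About five percent of a purely null t-map exceeds the unadjusted threshold, which is why a joint threshold over all voxels is needed.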
Since there are large spatial correlations existing in the images, and therefore in the t-map,
the simplest Bonferroni adjustment method, which is most effective when the tests are
independent, would be very conservative, decreasing the power to very small levels. One
solution, proposed in Worsley et al. (1992), generalizing and making more rigorous the
ideas in Friston et al. (1991), is based on the theory of maxima of Gaussian Random
Fields (Adler and Hasofer, 1976, Hasofer and Adler, 1978). A three-dimensional Gaussian
Random Field with mean μ(x, y, z) and covariance C((x_1, y_1, z_1), (x_2, y_2, z_2)) is a continuous
stochastic process, G(x, y, z), such that, for any finite n and for any selection of points
(x_1, y_1, z_1), ..., (x_n, y_n, z_n), the joint distribution of {G(x_1, y_1, z_1), ..., G(x_n, y_n, z_n)} is
n-variate Gaussian with mean {μ(x_1, y_1, z_1), ..., μ(x_n, y_n, z_n)} and the covariance matrix
obtained by evaluating the covariance function C at the n² pairs of points.
The main idea is to derive a single threshold, t_α, such that under the null hypothesis:

    P(T_max > t_α) = α,    (2.9)

where T_max is the maximum t value. Using the Gaussian Random Field theory, the convenient
null hypothesis is that if there is no difference between conditions, the t-map will constitute
zero-mean Gaussian noise with the (scalar multiple of the) identity covariance function,
C. Remarkably, using the notion of the Euler characteristic number, one can approximately
evaluate probability 2.9 for any Gaussian Random Field, G (Adler and Hasofer, 1976,
Worsley et al., 1993, Eq. 1):

    P(T_max > t) ≈ V |Λ|^(1/2) (2π)^(−2) (t² − 1) exp(−t²/2).    (2.10)

Here, V is the volume of the image, in some units, and Λ is a 3 × 3 variance matrix of the
partial derivatives of the field in each dimension x, y, z, in the same units as V:

    Λ = Var( ∂G/∂x, ∂G/∂y, ∂G/∂z ).    (2.11)
The matrix of partial derivatives (2.11) constitutes a way to specify the covariance
structure for the continuous and homogeneous random field. The diagonal entries tell us
how the field varies in the three axial directions, and the off-diagonal entries give the variability
in the three diagonal directions.
If Λ were known, Eq. 2.10 could be (numerically) inverted to find the desired
threshold, t_α. The covariance matrix Λ may be approximated using numerical differences,
which is, however, a poor and unstable estimate. Another solution is proposed in Worsley
et al. (1992), which uses the known properties of a smoother which is applied to the scans.
Using the assumption that, under the null hypothesis, the t-map is a white Gaussian noise
field, one can derive an expression for the covariance matrix of the white noise convolved
with a kernel smoother. This is then used in Eq. 2.10 to calculate t_α, which is then used to
threshold the t-map and select the "significant" voxels.
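Given Λ, this numerical inversion is straightforward; the sketch below solves for the threshold by bisection (assuming the three-dimensional expected-Euler-characteristic form P(T_max > t) ≈ V|Λ|^(1/2)(2π)^(−2)(t²−1)exp(−t²/2) quoted above; the volume and |Λ| values are illustrative):

```python
import numpy as np

def p_max_exceeds(t, volume, lam_det):
    """Approximate P(T_max > t) for a 3D Gaussian field (expected Euler char.)."""
    return volume * np.sqrt(lam_det) * (2 * np.pi) ** -2 * (t**2 - 1) * np.exp(-t**2 / 2)

def grf_threshold(alpha, volume, lam_det, lo=1.0, hi=10.0):
    """Bisection solve of p_max_exceeds(t) = alpha for the threshold t."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_max_exceeds(mid, volume, lam_det) > alpha:
            lo = mid        # probability still too high: move threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative numbers: ~30,000-voxel volume, Lambda with unit determinant.
t_alpha = grf_threshold(alpha=0.05, volume=30000.0, lam_det=1.0)
assert abs(p_max_exceeds(t_alpha, 30000.0, 1.0) - 0.05) < 1e-6
```

For these illustrative values the resulting threshold (around 5) sits well below the Bonferroni one would be for a comparable voxel count, because the smoothness encoded in Λ reduces the effective number of independent tests.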
SPM with Gaussian Random Field theory for determining the threshold has been a
great step forward in the analysis of neuroimages. It has also been successfully applied
in other fields, such as astrophysics. The theory is as remarkable and practical as it is
beautiful. There are, however, a number of assumptions going into the SPM method, some
of which have been addressed in later papers, which could cause problems in interpreting
the results. The normality assumption may only be viable if there is a large number
of degrees of freedom going into estimating the t-map. Typically, this is only the case
when the voxel-wise variance estimators are pooled across voxels. This, however, leads to
the possibly over-simplistic assumption of homoscedasticity across the brain volume. The
other possible problem with the SPM method is the specification of the null distribution:
that of the white noise Gaussian field convolved with the smoothing filter applied to the
t-map. This seems to disregard any possibility of spatial smoothness present in the t-map
before pre-smoothing is applied, when we know that the hemodynamic response, which is
actually measured by PET and fMRI, has an extent of 3-5mm (Malonek and Grinvald,
1996), and that the reconstruction techniques themselves impose spatial smoothness. When
the null hypothesis is rejected, we therefore still do not know, even upholding the normality
assumption, whether the breach came because of the non-zero mean, which is the desired
result, or because of the misspecified covariance matrix, Λ. Most likely it is both, which, at
best, leads to an inconclusive answer, and at worst may point to totally wrong regions.
With the lack of realistic simulation studies that would examine the robustness of the SPM
method under Λ misspecification, it seems probable to me that errors in estimating Λ may
easily lead to the rejection of the null without any support in the mean.
2.3.2 Scaled Subprofile Model: State-Driven Variance Decomposition
with Global and Subject Effect Removal
The Scaled Subprofile Model (SSM) of Moeller et al. (1987) and Moeller and Strother (1991)
has been developed to identify regional variation produced by a treatment or a stimulus, allowing
for heterogeneous covariance patterns and subject effects. It has been specially formulated
to deal with high-dimensional PET datasets obtained using a small number of subjects,
and to work with a minimal set of assumptions regarding the subject, treatment and residual
covariance patterns. It strives to partition the variability (similarly to an ANOVA model) to
dissociate the subject and treatment covariance patterns.
The two main equations of SSM are:

    x_i = s_i (μ + a_i),    (2.12)

    a_i = Σ_{k=1}^{K} y_ik Φ_k.    (2.13)

Initially (Moeller et al., 1987, Moeller and Strother, 1991), the index i was meant to
denote subjects in studies of one scan per subject, and the method was implemented for
pre-designated regions in the brain. SSM was later (e.g., Strother et al., 1995a) successfully
applied in the situation where index i denotes a combination of subject, treatment
(stimulus) and repetition effects, and is therefore unique for each scan, and the voxels are
used in place of regions. In the above equations, each scan is decomposed into global and
residual images (μ and a_i), which are called the Group Mean Profile, GMP, and Subject
Residual Profiles, SRP, respectively, in Moeller and Strother (1991). Each scan has a scaling
factor, s_i, associated with it. The SRPs are further decomposed into a set of orthogonal
Group Invariant Subprofiles (GISs), here denoted by Φ_k. This is related to a previously
mentioned approach, where each voxel is separately modeled with ANOVA or ANCOVA,
and the residuals are then grouped back together to form residual images, which are then
decomposed using SVD or similar techniques. Strother et al. (1995a) has listed the similarities
and differences between these approaches.
The main part of SSM was the development of a procedure for estimating the various parts
of the model. Moeller and Strother (1991) provide a detailed description, which we will
briefly summarize here. The Scaled Subprofile Model can be approximately expressed as a
voxel-wise, two-way ANOVA on a log scale:

    ln x_ij ≈ ln s_i + ln μ_j + a_ij/μ_j,    (2.14)

where the index j refers to voxels, and the division in the residual term is made voxel-by-voxel.
The small-signal approximation ln(1 + x) ≈ x for x ≪ 1 is used to derive the ANOVA
correspondence, where x = a_ij/μ_j for each voxel j. The estimation procedure assumes
model (2.14). It starts by removing the two main effects from the log-transformed scans by
double-centering the log-scan matrix, l_ij = ln x_ij. The resulting matrix is then decomposed
using Singular Value Decomposition; the left- and right-hand eigenvectors may be shown
to estimate the scan weights y_ik and the subprofiles Φ_k, respectively. One can then use
K-variate regression of the N average log-scans, ln x̄_i, onto ŷ_k to estimate the K offsets
in (2.15) and ln s_i, which come up as the regression coefficients and residuals, respectively.
In fact, the regression only identifies the part of s which lies in an orthogonal complement
of the subspace spanned by y_k; without an assumption of orthogonality between the two,
the SSM model is not identifiable. From the regression results one can estimate the
remaining terms.
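These estimation steps can be sketched end-to-end (a minimal sketch on simulated scans obeying the SSM model; the sizes, noise scale and K are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, K = 24, 500, 2              # scans, voxels, retained subprofiles

# Simulate x_ij = s_i * mu_j * (1 + a_ij/mu_j) with a rank-K residual term.
s = np.exp(rng.normal(0.0, 0.2, size=N))          # global scan factors
mu = np.exp(rng.normal(3.0, 0.3, size=p))         # group mean profile
y_true = rng.standard_normal((N, K))              # scan weights
Phi_true = rng.standard_normal((K, p))            # subprofiles
resid = 0.05 * (y_true @ Phi_true)                # a_ij / mu_j, small signal
x = s[:, None] * mu[None, :] * (1.0 + resid)

# Log-transform and double-center: removes the two ANOVA main effects
# (scan effect ln s_i and voxel effect ln mu_j) of model (2.14).
l = np.log(x)
l_dc = l - l.mean(axis=1, keepdims=True) - l.mean(axis=0, keepdims=True) + l.mean()

# SVD: left vectors carry the scan weights y_ik, right vectors the
# orthogonal Group Invariant Subprofiles Phi_k.
U, S, Vt = np.linalg.svd(l_dc, full_matrices=False)
y = U[:, :K] * S[:K]              # estimated scan-specific weights
Phi = Vt[:K]                      # estimated subprofiles

frac = (S[:K] ** 2).sum() / (S ** 2).sum()
assert frac > 0.95                # first K components recover the residual term
```

The double-centering step is what makes the subsequent SVD act only on the residual (subprofile) part of the log model, up to the small-signal approximation error.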
SSM provides an intuitively appealing, log-linear model for the PET scans collected from
various subjects. The parameters have the following physical interpretations: s_i are scan-specific
multiplicative factors which are related to global scan effects, both physiological and
methodological, e.g., the subject dose. The global radioactivity levels are very hard to control
at the experimental stage: they are a result of a complicated interrelationship between
the dose of the radiolabeled agent, the weight and other physical characteristics of the patient,
and unknown physiological effects that affect the distribution of the agent within the brain.
The mean pattern μ represents a hypothetical brain state that is common to all scans.
This may include a coarse description of regional differences that is invariable across scans
and subjects. The scan-specific variations, a_i, represent patterns superimposed onto this
mean brain-state pattern. The resulting image of the sum of the mean and scan-specific
patterns is normalized via the global scaling factors, s_i.
The main result of SSM consists of a set of residual patterns, Φ_k, together with their
weights, y_ik, which are scan-specific. One may show (Eq. 4, Moeller and Strother, 1991)
that the total variance of the log-transformed scans may be approximately decomposed into a
global term, an error term and the residual profile terms, which are independent. One may
therefore represent the i-th scan's contribution to the total variance by the sum of squared
weights, Σ_k y²_ik, for this scan. Also, if the overall index i is broken into i_s for subjects, i_c
for conditions and i_r for repetitions, the sum of squared weights for each k decomposes into
three orthogonal pieces, since ȳ_···k = 0 for each k. The three pieces represent the between-condition,
repeat-trial and intersubject variance contributions for each Subject Residual Profile, Φ_k,
which are uncorrelated. This allows us to study the particular contribution of each SRP, and thus
determine whether it is mostly associated with the subject variances or the study design.
2.3.3 Partial Least Squares
McIntosh et al. (1996) propose another interesting multivariate method for the analysis
of neuroimages, called Partial Least Squares (PLS) (their method is not related to the
well-known regression model under the same name, described in, e.g., Wold et al., 1984).
The authors motivate PLS as a unique approach that results in the spatial patterns which
optimally explain the covariance between a set of scans and the "exogenous blocks". The
latter can be formed by contrasts of interest, or may include external measures, such as
behavioural or performance ones. PLS is related to both simple t-maps and, conceptually, to
the Scaled Subprofile Model described in the previous section. It is also related to LDA
and hence to our proposal.
Let, as before, X denote an N × p matrix with N scans, each with p voxels. Let Y
be the N × K "exogenous block" matrix, with K blocks: contrasts or external measures.
For instance, to follow an example in the paper, we may have multisubject PET data
obtained under 3 conditions: 1 baseline and 2 active. Y may contain two columns: the
first comparing the baseline to the average of the other two, with [2; −1; −1] for a scan in
condition 1, 2, 3, respectively; and the second comparing the two active conditions with a
[0; 1; −1] contrast.
The basis for the PLS method is an SVD of the cross-correlation matrix Z = XᵀY, the
product of the column-centered and column-normalized X and Y¹. That is:

    Z = A D Bᵀ,

with A a p × K matrix of orthogonal singular images, B a matrix of orthogonal
profiles, and D the diagonal matrix of singular values. The pair a_1, b_1 of first column
vectors of A, B gives the best linear approximation to explaining the cross-correlation
matrix Z. The first singular value d_1 gives the strength of this association; when squared
and divided by the sum of all squared singular values, it is a proportion of the total variance
explained. One may examine the image a_1, together with the first profile, to deduce
the features of the experiment, and the associated spatial map, which contribute most to
the overall variability. McIntosh et al. (1996) also introduce a third measure: subject scores,
obtained by projecting individual scans onto the singular images. Thus for each singular
image one obtains N subject scores (which could more aptly be called scan or image scores)
which can be plotted against the conditions or external measures to gain further insight into
the feature represented by a particular singular image.
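The core PLS computation can be sketched directly, reusing the three-condition contrast coding above (a minimal sketch; the scan matrix is simulated noise and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 30, 200                         # scans, voxels (10 scans per condition)
cond = np.repeat([1, 2, 3], 10)

# Exogenous block Y: the two contrasts from the three-condition example.
Y = np.column_stack([
    np.where(cond == 1, 2, -1),                      # baseline vs. actives
    np.select([cond == 2, cond == 3], [1, -1], 0),   # active 1 vs. active 2
]).astype(float)

X = rng.standard_normal((N, p))        # scan matrix (illustrative noise)

def standardize(M):
    """Column-center and column-normalize a matrix."""
    M = M - M.mean(axis=0)
    return M / np.linalg.norm(M, axis=0)

# SVD of the p x K cross-correlation matrix Z = X'Y.
Z = standardize(X).T @ standardize(Y)
A, d, Bt = np.linalg.svd(Z, full_matrices=False)

scores = X @ A[:, 0]                   # N subject scores, first singular image
explained = d[0] ** 2 / (d ** 2).sum() # share of cross-block variance
assert A.shape == (p, Y.shape[1]) and scores.shape == (N,)
```

Plotting `scores` against `cond` is the kind of post hoc inspection the authors suggest for interpreting each singular image.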
The PLS approach includes a calibration method to validate the model and determine the
number of significant singular image/profile pairs. McIntosh et al. (1996) show that the
covariance between the j-th subject score, X a_j, and the similar j-th left-hand score,
Y b_j, is d_j, the j-th singular value. Thus PLS is finding the singular pairs which
successively maximize the covariances between the left- and right-hand scores. McIntosh
et al. (1996) propose computing the regression of the subject scores on Y, the contrast
or external measure matrix. As a measure of validity they proposed R², the proportion of
variance explained by the regression. To determine a significant cut-off point, PLS uses
a permutation test: the rows of X are permuted, and for each permutation an SVD is applied
and R² computed. The R² computed on the given, unpermuted data, is compared to the
¹The paper does not make it clear whether the cross-covariance or the cross-correlation matrix between X and Y is used. The example in the appendix clearly works with cross-correlations, but the text refers to the cross-covariance. Also, the paper uses notation for X and Y which is the reverse of ours.
distribution determined from the permutation test and, for a given significance value, its
significance asserted or rejected. This way the whole PLS model and the number of significant
singular pairs may be estimated.
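The permutation calibration can be sketched as follows (a minimal sketch; for brevity it uses the first singular value, rather than the paper's regression R², as the statistic being calibrated, and the data and permutation count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, K = 30, 200, 2
X = rng.standard_normal((N, p))        # scans (pure noise here)
Y = rng.standard_normal((N, K))        # exogenous block

def first_singular_value(X, Y):
    """Strength of the first singular image/profile pair of X'Y."""
    Xc = (X - X.mean(0)) / np.linalg.norm(X - X.mean(0), axis=0)
    Yc = (Y - Y.mean(0)) / np.linalg.norm(Y - Y.mean(0), axis=0)
    return np.linalg.svd(Xc.T @ Yc, compute_uv=False)[0]

observed = first_singular_value(X, Y)

# Null distribution: permute the rows of X, breaking the scan-to-design link.
n_perm = 200
null = np.array([first_singular_value(X[rng.permutation(N)], Y)
                 for _ in range(n_perm)])
p_value = (null >= observed).mean()
assert 0.0 <= p_value <= 1.0   # with pure-noise X, typically non-significant
```

Each successive singular pair can be calibrated the same way, stopping when the observed statistic no longer exceeds its permutation distribution.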
PLS constitutes a complete descriptive technique for neuroimage analysis, in much the
same way as the paradigm proposed for the PET data in this thesis (PLS does not include
any extensions to deal with temporal sequences such as the ones found in fMRI data).
It is multivariate in nature but does not attempt to model the spatial properties, such
as spatial smoothness, in the data. These could be introduced in a way similar to that
proposed in this thesis. The permutation test to determine the number of significant
singular pairs represents a big step forward from other descriptive techniques, but it is not
without problems. The scan data normalization, for example normalizing voxels to unit
variance, introduces potentially big variability into the procedure, but it is not tested in
the permutation stage. In general, we can think of LDA, and hence of our procedure, as
an extension of PLS, where the per-voxel normalization is replaced by a full-scale covariance
normalization: one that involves rotation as well as scaling. Since the full-rank covariance
matrix cannot be estimated, we use penalization, as shown in the next chapter.
2.4 Datasets Studied
The methodology described in this thesis was applied to two datasets. Here, we give their
descriptions, together with the study designs, and explain the pre-processing steps applied.
2.4.1 Finger Opposition Task
The finger opposition dataset, which we will call FOPP, is the result of the [15O]water PET
study on 45 volunteers who were scanned between April of 94 and December of 94 in the
PET research center of the Veterans Affairs Medical Center in Minneapolis, MN. Due to the
limited axial size of the VA PET camera (10.8cm), it was not possible to cover the whole
brain. Consequently, scans from 18 subjects were discarded, as they did not adequately
cover two particularly important areas: the motor area in the cortex (top of the head) and
the cerebellum (at the bottom of the head).

Additionally, scans from 7 subjects had unacceptable between-scan movement. Unacceptable
head movement was defined as that which causes a misalignment of more than
one voxel (3.125 × 3.125 × 3.325mm³) between any [15O]water scan and the attenuation
scan. The exact amount of movement was determined based on the results from the 6-parameter
rigid body transformation (Woods et al., 1992). The idea of tracking movement
based on the alignment transformation was described in Strother et al. (1994).

The final dataset used in this thesis consisted of the scans from 20 subjects: 7 males
(ages: 42, 39, 30, 25, 41, 33, 37) and 13 females (ages: 45, 30, 25, 33, 53, 53, 56, 33, 41, 47,
34, 35, 27).
Each subject was scanned eight or ten times. Odd scans were taken under the baseline
condition, with each study starting from scan 1, while the even scans were obtained while
the subject performed a simple motor task. In both states, the subjects had their eyes
covered with a patch and were lying relaxed. During the baseline state the ears were
plugged with insert earphones, while during the active state an auditory pacing signal
was delivered through the earphones. Each subject received one practice lesson before the
study. The motor task consisted of sequentially touching, using the left-hand thumb, each
of the remaining four fingers successively forth and back. The task started with the i.v.
injection of the radioactive [15O]water bolus, and a 90s image acquisition started when
the radioactive bolus reached the brain (typically after 10-20s), as assessed by the total
number of counts detected by the PET camera. The scans were acquired in the 3D mode
and reconstructed using 3D filtered backprojection. The data was corrected for randoms,
dead time and attenuation, but not for scatter.
Scans for each subject were separately aligned to the first scan using the intramodality
image-ratio technique described in Woods et al. (1992). This process uses a linear transformation
to correct for translation and rotation of the head between scans. The eight or ten
subject scans were then averaged, and the averages used to calculate the twelve subject-specific
parameters for the inter-subject alignment algorithm (Woods et al., 1993). This
algorithm transforms the subject scans to the common anatomical space of the brain in
Talairach coordinates by applying rotations and translations, plus non-rigid body transformations
such as shears, to the subject average volumes, comparing them with a simulated
reference PET volume in Talairach coordinate space. The intrasubject-aligned volumes
were smoothed with 3 × 3 × 3 and 5 × 5 × 5 boxcar smoothers, with simple boundary
correction, and these, together with the unsmoothed volumes, were transformed into the
common Talairach space using the subject-specific twelve parameters derived before. The
3 × 3 × 3 smoothed volumes were used to derive the intracranial mask volume, consisting
of 1's for voxels inside the brain and 0's outside. The mask was derived by thresholding
each volume at the 45th percentile.
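The masking step can be sketched as follows (a minimal sketch using SciPy's uniform filter as the 3 × 3 × 3 boxcar smoother; the volume is simulated and the 45th-percentile cutoff follows the text):

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(5)
vol = rng.random((32, 32, 16))           # a toy reconstructed PET volume

# 3x3x3 boxcar smoother (uniform_filter handles the boundary correction).
smoothed = uniform_filter(vol, size=3, mode="nearest")

# Intracranial mask: 1 inside the brain, 0 outside, obtained by thresholding
# the smoothed volume at its 45th percentile.
mask = (smoothed > np.percentile(smoothed, 45)).astype(np.uint8)
assert mask.shape == vol.shape
```

Smoothing before thresholding suppresses isolated noisy voxels, so the resulting mask is spatially contiguous rather than speckled.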
2.4.2 Static Force fMRI data
Seventeen volunteer subjects had been scanned with a static force paradigm. Each run begins
with the baseline condition and alternates between active and baseline, as before. The
active condition required the subject to apply a constant force to a small force transducer
held between his/her thumb and index fingers. The actual force applied was displayed on a
screen together with the "tolerance bars" according to the expected force level. There were
5 force levels: 200g, 400g, 600g, 800g, and 1000g, and the order of these was randomized for
each subject and run. Each subject performed two runs. In each run there are 11 instances
of alternating baseline and active conditions with the 5 force levels. Each instance ran for 44
seconds. During the baseline conditions the subjects were resting and viewing control lines
on the screen.
The data was collected using a 1.5T GE Signa scanner with a whole-brain echo-planar
sequence (TR=4s, TE=70ms, tau offset=Ems). Each image volume consisted of thirty 5mm
oblique axial slices with 64 × 64 voxels (3.125 × 3.125mm²) per slice. The postprocessing
of volumes includes:
1. visual inspection and exclusion of images with obvious motion, artifact, poor positioning, and where the performance and neurophysiological measures indicate a failure to perform the task

2. semi-automated generation of brain masks for anatomical MRI and fMRI volumes

3. calculation of the 6-parameter within-subject rigid-body alignment matrices of masked fMRI scans to the first scan using AIR (Woods et al., 1998); discard runs showing more than sub-voxel movement based on the maximum voxel movement in each volume after application of the 6-parameter matrices to the brain masks (Strother et al., 1994)

4. aligning the within-subject fMRI scans and calculation of the subject-average aligned fMRI scan

5. calculation of rigid-body alignment parameters for the average fMRI to the high-resolution anatomical MRI scan using AIR

6. visual inspection of the alignment between the average fMRI and MRI with and without the transformation and choosing the best

7. calculation of the between-subject 12-parameter affine alignment parameters of the high-resolution anatomical MRI to a high-resolution MRI template in Talairach space

8. formation and application of a single transformation matrix taking each masked fMRI scan to the Talairach space

9. detrending the time series using 4 cosine functions (voxel-by-voxel)
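Step 9, voxel-by-voxel detrending with a small set of cosine regressors, can be sketched as below. This is a hypothetical minimal version; the exact basis convention used in the pipeline (e.g., whether a constant term is counted among the 4 functions) may differ:

```python
import numpy as np

def cosine_detrend(ts, n_basis=4):
    """Remove slow drifts from a (time x voxels) array by projecting out
    a constant plus the first n_basis discrete-cosine regressors."""
    T = ts.shape[0]
    t = np.arange(T)
    # DCT-II-style low-frequency regressors; the constant captures the mean.
    X = np.column_stack(
        [np.ones(T)] +
        [np.cos(np.pi * k * (t + 0.5) / T) for k in range(1, n_basis + 1)])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta
```

Each voxel's time series is regressed on the same design matrix, so the whole volume is detrended with a single least-squares solve.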
The data from three volunteers were eliminated during the first step of visual inspection, and six more were eliminated during further processing. In two of the remaining eight subjects the first run had poor neurophysiological performance measures with one missing force level. As a result the Static Force dataset contains only a single (second) run from 8 subjects. Furthermore, the first three scans and some transition scans (those occurring right before or after a condition change) were dropped because of known artifacts and hemodynamic transition effects. The eight remaining subjects are 3 males and 5 females with an average age of 31 ± 6 years.
Chapter 3
Penalized Linear Discriminant
Analysis with Basis Expansion
In this chapter we present the general framework for the analysis of neuroimages. We present our motivation, algebraic derivation and important details of the computational aspect which we developed to deal with the extremely ill-posed nature of the neuroimaging data.

The basis for our analysis has been Linear Discriminant Analysis (LDA). LDA and its almost-equivalent sister, Canonical Variate Analysis, have been used in neuroimaging before (e.g., Azari et al., 1993, Rottenberg et al., 1996, Ardekani et al., 1998) with the experimental states defining the classes. These studies deal with the ill-posed nature of the problem by either defining a small number of Volumes of Interest (VOI), anatomically homogeneous regions of the brain a priori considered important to the stimulus studied, or by defining the SVD-derived basis. The resulting Canonical Variates, one less than the number of classes or states, are then interpreted as the neural activation patterns. With two classes the resulting single Canonical Variate can be viewed as an alternative to the methods that rely on images formed by subtracting the average of scans in each class, perhaps normalized and preprocessed by ANCOVA (Friston et al., 1991, Worsley et al., 1992, 1996, Friston et al., 1995). With more than two states, the LDA approach results in several Canonical Variates ordered in their contribution to explaining the between-state variance. Finally, the time-series effects may be studied by defining the classes to correspond to the temporal order of each scan.
Section 2.3 mentions a broad categorization of the methodology developed for neuroimage analysis. There are methods that deal with stimulus-induced changes, general variance-decomposition methods that do not take into account the state indicators, and methods that introduce non-linearity in various ways. There are also, existing quite independently of the three categories, two data representations used for analysis: single-voxel and SVD-derived basis.

In our approach, we acknowledge the multivariate, spatially correlated nature of the data and introduce a third representation by expanding the desired canonical activation image in a smooth basis. Then a penalized version of LDA, called PDA and developed in Hastie et al. (1995), is applied with smoothness constraints on the canonical variates. This is seen (Appendix B) as equivalent to projecting the input scans on each basis separately, carrying out the PDA analysis in the projected domain and reconstructing the canonical image from the resulting coefficients. A further ridge penalty on the within-class covariance matrix allows for data-dependent choice of the exact amount of smoothness required. Our method may also be seen as bridging the two categories: analysis of stimulus-induced changes and general variance-partitioning methods.
Even with a preliminary dimensionality reduction using SVD, or a smooth basis, the ill-posed nature of the functional neuroimaging problem precludes naive application of multivariate statistical methods such as a standard GLM or LDA, even though the data is clearly multivariate in nature. The problem of overfitting the data, here tied strongly to the "curse of dimensionality" (Bellman, 1961, Hastie and Tibshirani, 1990, pp. 83-84), is especially acute in these data sets, and leads, in many cases, to singularities or saturated models at best. With input dimensionality (number of voxels or basis elements) so high, even simple linear models become very flexible and powerful, with high overfitting potential. There are simply too many degrees of freedom available even with linear models, seemingly as many as the number of voxels, although we cannot obviously use more than the number of scans. The proper assessment of validity and generalizability of the modeling results is then of paramount importance. The classical goodness-of-fit techniques, based mostly on asymptotic results of some global measure of the residuals, are totally inadequate here since the asymptotic assumption of large N, the number of observations, as compared to p, the dimension of the space, is evidently not met. The need for optimal model selection techniques based on measures of model generalizability has been advocated by some (Kippenham et al., 1994, Lautrup et al., 1995, Strother et al., 1995b, 1997, 1998a, Mørch et al., 1997, Mørch, 1998, Hansen et al., 1999) but has been largely ignored in favour of typically asymptotic inferential tests of unknown generality (e.g., Friston, 1998).
Operationally there are two problems that we try to address with this approach: strategic dimensionality reduction of the input space, and proper assessment of model generalizability. The first problem is attacked in two ways: (1) we induce a smooth prior on the space of the resulting canonical image(s) in the form of a non-adaptive, smooth basis in which we expand the image(s); in this thesis, we use tensor products of cubic B-splines and wavelets; (2) we regularize the model further with a simple ridge penalty that is a compromise between the model and the estimated spatial covariance and acts as an additional smoothness constraint by reducing the effective degrees of freedom.

By posing the estimation of activation maps as a classification problem we operate within the probabilistic framework of decision theory where we can address the issue of model generalizability with predictive performance measures. If we impose the need to operate within the predictive framework and require that a method must result in images that are interpretable in a given experimental context, we are led naturally to LDA-like approaches. Our smoothness-constrained, penalized LDA is an extension and specialization of the general Penalized Discriminant Analysis (PDA) model proposed by Hastie et al. (1995) and also investigated by Nielsen et al. (1998) in the neuroimage domain. In section 3.3.4 we derive an efficient algorithm for fitting PDA suitable for this extremely ill-posed data.

We address the important problem of model generalizability and validity of the resulting activation maps using prediction error. In section 3.3.7 we propose two predictive performance measures: misclassification rate (MC rate) and Squared Prediction Error (SPE) based on posterior probability estimates of class membership. To estimate these we use a state-of-the-art specialization of the Bootstrap, the .632+ Bootstrap (Efron and Tibshirani, 1997) (the name comes from the fact that the probability that any observation is included in the bootstrap sample is, in the limit, .632), which we compare with the more traditional cross-validation (CV). These are resampling techniques which give nearly unbiased estimates of prediction error and are largely free from distributional assumptions. We use these measures to: (1) compare results built with different numbers of B-spline basis functions, and results built without basis expansion but with different amounts of voxel presmoothing, (2) optimize the ridge parameter which fine-tunes the amount of smoothness in the result, and (3) compare simple preprocessing with and without mean scan normalization.
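The .632 constant quoted above is easy to verify: the chance that a fixed observation appears at least once in a bootstrap sample of size n is 1 - (1 - 1/n)^n, which tends to 1 - 1/e ≈ .632 as n grows. A one-line check:

```python
import math

def inclusion_probability(n):
    """P(a fixed observation is drawn at least once when sampling n
    times with replacement from n observations)."""
    return 1.0 - (1.0 - 1.0 / n) ** n

# For large n this approaches 1 - 1/e, the constant behind the .632+ bootstrap.
```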
3.1 Classical Linear Discriminant Analysis
The linear discriminant function was introduced by Fisher (1936) for two classes as a sensible approach to discriminate between two sets of observations. Fisher sought to project the data on the line in a way that would maximize the separation between classes as measured by the between-class variance. The maximization problem must be normalized and it is intuitively appealing to carry it out with respect to some measure of data variability. With multivariate data, the pooled within-class covariance matrix is a good candidate. The problem then becomes one of finding a linear projection that maximizes the ratio of between-class to within-class variance.

The LDA method was later extended to multiple classes (Rao, 1948). LDA can also be derived, if all canonical variates are used for classification, as a maximum likelihood classification rule if the data in all classes is assumed to follow a multivariate Gaussian distribution with a common covariance matrix (Mardia et al., 1979, Hastie et al., 1995). With a simple modification to account for prior class probabilities (usually estimated with the proportion of the observations in each class) the LDA method may also be seen as a plug-in Bayes estimator with the same assumptions.
Let us start by establishing some notation used in this section to introduce the classical LDA. Let $x_i^{(k)}$ be the $i$th $p$-variate observation in class $k = 1, \ldots, K$ in a $K$-class classification problem. Let $n_k$ be the number of observations in class $k$, and $N = \sum_k n_k$. Define the between-class and within-class covariance matrices by the usual MANOVA quantities:

$$B = \frac{1}{K-1} \sum_k n_k (\bar{x}^{(k)} - \bar{x})(\bar{x}^{(k)} - \bar{x})^T, \qquad W = \frac{1}{N-K} \sum_k \sum_i (x_i^{(k)} - \bar{x}^{(k)})(x_i^{(k)} - \bar{x}^{(k)})^T.$$

Algebraically, LDA is the following optimization problem: find $a_h$ such that $a_h^T B a_h$ is maximized subject to $a_h^T W a_h = 1$ and subject to $a_h^T W a_j = 0$ for $j = 1, \ldots, h-1$. This can be posed as a generalized eigenvalue problem:

$$B a_h = \lambda_h W a_h,$$

subject to the aforementioned orthogonality constraints on $a_h$ with respect to $W$. The solution is the eigendecomposition of $W^{-1}B$, with the first eigenvector defining the first canonical variate which explains most of the variability between classes. Geometrically, with two classes, LDA seeks to separate the classes in the $p$-dimensional space by the straight line that is orthogonal to the line joining the two centroids of the data that has first been sphered using $W^{-1}$.
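The generalized eigenproblem above can be solved directly as the eigendecomposition of $W^{-1}B$; the sketch below uses unnormalized scatter matrices, which does not affect the eigenvectors:

```python
import numpy as np

def lda_canonical_variates(X, y):
    """Canonical variates from the generalized eigenproblem B a = lambda W a,
    solved via the eigendecomposition of W^{-1} B (a minimal sketch)."""
    classes = np.unique(y)
    p = X.shape[1]
    xbar = X.mean(axis=0)
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        B += len(Xk) * np.outer(mk - xbar, mk - xbar)   # between-class scatter
        W += (Xk - mk).T @ (Xk - mk)                    # within-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(-evals.real)
    return evals.real[order], evecs[:, order].real
```

With two classes B has rank one, so only the first eigenvalue is nonzero and the first column is the single canonical variate.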
3.1.1 Discriminant Functions and MANOVA View of LDA
With two classes, $k \in \{1, 2\}$, Anderson (1984) defines a discriminant function $W(x) = x^T \delta$ which, with appropriate plug-in estimators, leads to Eq. 3.3. From there we see that the two-class LDA seeks a univariate random variable, $a^T x$, that maximizes the ratio of the expected squared between-class difference to its variance.

A generalization of that result may be viewed in the MANOVA context. In section 12.5, Mardia et al. (1979) develops a test of dimensionality that leads directly to the LDA's canonical variates. Briefly, with all LDA assumptions, let $r \le \min(p, K-1)$ be the proposed dimension of the hyperplane within which all $K$ class means lie. This test is one of the possibilities to explore should the general (one-way) MANOVA test of all means equal be rejected. Note that this test has no analogue in the univariate case: there one can only go after specific contrasts.

In general $K$ means span a $K-1$ dimensional hyperplane, given that $p$ is at least $K$. One may wish to see whether the actual dimensionality of the problem is smaller, hence the test. The Likelihood Ratio version of this test entails a set of vectors, proportional to the canonical variates, that may be used to test for successively larger $r$: the first vector is used to test $r = 1$, the first and second to test whether $r \le 2$ and so on. These vectors span the successively higher dimensional hyperplanes such that, for a given dimension $r$, each hyperplane is a Maximum Likelihood estimate of the hyperplane that contains the $K$ means under the null hypothesis. Therefore, we expect that the first canonical image will exhibit the features that most distinguish the classes, normalized for the covariance structure. Successive canonical images show the further features of the data that are uncorrelated with the previous ones.
3.1.2 The Geometry of LDA in Two Class, 2D setting
Figure 3.1: Demonstration of 2-class LDA in 2 dimensions. The light points (class 1) and darker points (class 2) show the 200 bivariate Gaussian observations generated from each class. The solid line is the true canonical variate (CV). The circles are class means, and the diamonds are the means projected onto the CV. The points marked with the cross represent the test point and its projections onto the mean-difference (broken) and CV lines.
Figure 3.1 shows a demonstration of the LDA with two classes and in two dimensions. The data has been generated using a 2D Gaussian distribution with means (0.55, 0.45) for the first class and (0.25, 0.65) for the second class. The covariance matrix was chosen to obtain non-circular shapes with an oblique angle. The test point (a cross at (0.38, 0.38)) that, given the shape of the two Gaussians, quite clearly belongs to class 2, is actually closer to the mean of class 1 when using the regular Euclidean distance. Using this distance is equivalent to projecting onto the mean-difference line, shown in broken style in Figure 3.1, and carrying out the Euclidean distance classification in one dimension. This line represents the t-test image with the pooled estimate of standard deviation (equation 3.8). The canonical variate line (solid) is corrected to reflect the non-circular shape of the data, and is a compromise between the first principal component and the mean-difference line. One projects the data and class means onto this line and then uses Euclidean distance to classify.
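The effect in Figure 3.1 can be reproduced numerically. With the class means and test point quoted above, and an oblique covariance of our own choosing (elongated along the direction (1, -2); the thesis figure's exact covariance is not given), Euclidean distance picks class 1 while the Mahalanobis distance implicit in LDA picks class 2:

```python
import numpy as np

def classify_euclidean(x, means):
    return int(np.argmin([np.sum((x - m) ** 2) for m in means]))

def classify_mahalanobis(x, means, W):
    Winv = np.linalg.inv(W)
    return int(np.argmin([(x - m) @ Winv @ (x - m) for m in means]))

means = [np.array([0.55, 0.45]), np.array([0.25, 0.65])]  # class 1, class 2
x = np.array([0.38, 0.38])                                # the test point
u = np.array([1.0, -2.0]) / np.sqrt(5)   # high-variance direction (assumed)
v = np.array([2.0, 1.0]) / np.sqrt(5)    # low-variance direction
W = 1.0 * np.outer(u, u) + 0.01 * np.outer(v, v)  # oblique covariance
# Euclidean distance assigns x to class 1; Mahalanobis assigns it to class 2.
```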
3.2 LDA and Random Subject Effects
In all of the analysis in Chapter 4 we do not explicitly consider subject effects. In this section we investigate how dangerous this is, and how much accuracy we lose to gain some computational advantage offered by the scan-space version of PDA. We will only look at the classical LDA with two classes and in the "non-ill-posed" setting, i.e., with p < n.

It turns out that LDA is doing almost "the right thing" if the data are assumed to come from a two-way replicated design with random subject and fixed class effects. This illustrates the clear advantage of LDA over methods based on average difference images, like t-maps. These, and other methods, usually subtract subject effects, which is equivalent to assuming a fixed-effect model. It seems much more plausible to regard subject effects as random rather than fixed. Using random effects also lets us assess the predictive performance of the model, something that is not possible with a fixed-effect structure. With fixed effects, such as models that subtract the subject averages from the data, we cannot extend the results beyond the population studied, and thus cannot use prediction error for validation. Thus we will see that in addition to the fact that LDA is able to correct for a nondiagonal covariance matrix (which corresponds to the non-iid noise structure) it also "automatically" deals with random subject effects.
To show the workings of LDA under a random subject effect model, assume that we have $S$ subjects indexed by $s$, and each subject had her/his observation obtained $r = 1, \ldots, R$ times in each class. The model for the observation $x_{rs}^{(k)}$ (in class $k$) is:

$$x_{rs}^{(k)} = \mu_k + v_s + \epsilon_{rs}, \qquad v_s \sim N(0, \Sigma_S) \text{ and } \epsilon_{rs} \sim N(0, \Sigma_E),$$

where $\Sigma_S$ and $\Sigma_E$ are covariance matrices for both effects. As usual, we assume that the subject terms $v_s$ are independent between subjects and independent of the iid sequence $\epsilon_{rs}$. Also note that both covariance matrices are the same in each class $k$, which is in the spirit of LDA.
The Gaussian view of LDA assumes that we have a multinormal Gaussian distribution in each class with the same covariance matrix, and iid observations in each class. One then classifies by either the maximum likelihood rule (Mardia et al., 1979, p. 301) or by the Bayes rule. Ignoring prior probabilities, both rules assign observation $x$ to the class with maximum likelihood for $x$. (The Fisher LDA is equivalent when one uses all canonical variates and when a pooled within covariance matrix estimate is used for the common class covariance matrix.)

In our case, although the observations from the same subject are no longer independent, the decision rule is similar to the LDA. We have:

$$x_{rs}^{(k)} \sim N(\mu_k,\ \Sigma_S + \Sigma_E).$$

Thus the classification is based, as usual, on the Mahalanobis distance:

$$d_k(x) = (x - \mu_k)^T (\Sigma_S + \Sigma_E)^{-1} (x - \mu_k).$$

As compared to LDA, the only thing that changes is the covariance matrix. To examine how Fisher's LDA is doing in this case, we need to look at whether the pooled within-class covariance matrix, used implicitly in LDA, is a good estimate of $\Sigma_S + \Sigma_E$.

We will start by working with the observations in a single class. Let $X$ be the $N \times p$ matrix of observations.
Let us look at the $(t_1, t_2)$ element of $W$:

$$E\,W(t_1, t_2) = \frac{1}{N-1}\Big(\sum_{r,s} C(r,s;\,r,s) - \frac{1}{N}\sum_{r,s}\sum_{r',s'} C(r,s;\,r',s')\Big),$$

where $C(r,s;\,r',s')$ denotes the covariance between $x_{rs}(t_1)$ and $x_{r's'}(t_2)$. Now,

$$C(r,s;\,r',s') = \begin{cases} \Sigma_S(t_1,t_2) + \Sigma_E(t_1,t_2) & \text{when } (r,s) = (r',s') \\ \Sigma_S(t_1,t_2) & \text{when } s = s',\ r \neq r' \\ 0 & \text{when } s \neq s'. \end{cases} \tag{3.13}$$

There are $N$ pairs for the first case, $SR(R-1) = N(R-1)$ for the second case and $N^2 - N - N(R-1)$ for the third. And thus we conclude that:

$$E\,W(t_1, t_2) = \frac{N-R}{N-1}\,\Sigma_S(t_1, t_2) + \Sigma_E(t_1, t_2). \tag{3.16}$$
We can therefore see that the usual pooled within-covariance estimator that the LDA uses will underestimate the combined covariance $\Sigma_S + \Sigma_E$ that ML requires. The amount of underestimation will depend on the relation between the subject effect and error term, and the number of replications per subject relative to the number of subjects. In our case, $N = 20 \cdot 4 = 80$ and $R = 4$ so the underestimate does not appear to be significant. Asymptotically, if the number of scans for each subject is held fixed, the above estimate is unbiased. The result does not change when $K$ classes are considered and the $K$ estimates of the common covariance matrix in each class are pooled, since each class has the same covariance structure by assumption.
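The expectation in Eq. 3.16 is easy to confirm by Monte Carlo in one dimension ($p = 1$, a single class), with variances $\sigma_S^2$ and $\sigma_E^2$ chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
S, R = 4, 5                    # subjects and replications; N = S * R
N = S * R
sig_s2, sig_e2 = 2.0, 1.0      # subject and error variances (assumed)
w_hat = []
for _ in range(4000):
    v = rng.normal(0.0, np.sqrt(sig_s2), size=S)             # subject effects
    x = np.repeat(v, R) + rng.normal(0.0, np.sqrt(sig_e2), size=N)
    w_hat.append(x.var(ddof=1))                              # within-class estimate
expected = (N - R) / (N - 1) * sig_s2 + sig_e2               # Eq. 3.16
# np.mean(w_hat) is close to `expected`, which is below sig_s2 + sig_e2 = 3.0,
# illustrating the underestimation discussed above.
```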
The amount of bias caused by the factor in front of $\Sigma_S$ will also depend on how different both covariance matrices are. For instance, if they are not different, then LDA results in an estimate which is correct up to a scalar multiple, and the same goes for the canonical variates. Thus if the action of the two matrices is concentrated on the first few eigenvectors which are similar for both $\Sigma$'s, then the bias would be minimal. On the other hand the bias will have a stronger effect on the canonical variates if the leading eigenvalues of the subject covariance are large as compared to their $\Sigma_E$ counterparts, and are associated with very different eigenvectors.
3.2.1 Simulation Study
To assess the effects of the biased covariance matrix estimate that LDA is implicitly using with subject effects, we have performed a simulation study. In the study we compare LDA using three estimators of the covariance matrix:

1. The usual pooled within-class covariance matrix, $W$

2. The sum of the common within-subject and error covariance matrices, $\hat{\Sigma}_S + \hat{\Sigma}_E$

3. The corrected within-class covariance matrix: $W + \frac{R-1}{N-1}\hat{\Sigma}_S$

We used the estimators of $\Sigma_S$ and $\Sigma_E$ proposed in the MANOVA context by Anderson (1985), Anderson et al. (1986). These correct the usual within and between sum-of-squares and products estimators to make them positive semidefinite. The authors concern themselves with one-way random effects MANOVA, but the results hold in our case of mixed-effects 2-way MANOVA. One starts by forming the usual within-subject and error sum-of-squares and products matrices, and the corresponding mean squares, e.g.

$$MSS_S = (S-1)^{-1} SS_S,$$

and similarly for $MSS_E$. The expectations are as follows:

$$E\,MSS_S = \Sigma_E + RK\,\Sigma_S, \qquad E\,MSS_E = \Sigma_E,$$

which suggests an estimator for $\Sigma_S$:

$$\hat{\Sigma}_S = (RK)^{-1}(MSS_S - MSS_E) \tag{3.23}$$
which is not guaranteed to be positive definite. Anderson (1985), Anderson et al. (1986) have obtained modified estimators. These move that part of the variance that would make (3.23) negative to the error variance. This is done by first simultaneously decomposing $MSS_S$ and $MSS_E$:

$$MSS_S = Z D Z^T, \qquad MSS_E = Z Z^T.$$

It may be achieved, for example, by eigendecomposing $MSS_E^{-1/2} MSS_S MSS_E^{-1/2}$ and left-multiplying the resulting eigenvectors by $MSS_E^{1/2}$. The estimator (3.23) now becomes:

$$\hat{\Sigma}_S = (RK)^{-1} Z (D - I_p) Z^T,$$

where, as before, $p$ is the dimension of the covariance matrices. The idea is to remove this part of the variance that would make the estimator negative. This is achieved by excluding those columns of $Z$ with corresponding eigenvalues $\nu < 1$. Let $p^*$ denote the number of eigenvalues $\nu > 1$. Let $D^*$ be as $D$ but with only the first $p^*$ elements. Let $Z^*$ be the corresponding eigenvector matrix with only the first $p^*$ columns of $Z$ included. Then we get the estimator of $\Sigma_S$ which is guaranteed to be positive semidefinite:

$$\hat{\Sigma}_S = (RK)^{-1} Z^* (D^* - I_{p^*}) Z^{*T}. \tag{3.26}$$

The variability removed from $\hat{\Sigma}_S$ is attributed to the error variance, which gives a modified estimator for $\Sigma_E$ built from those parts of $D$ and $Z$ that were left out in Eq. 3.26. If there were none, then $\hat{\Sigma}_E = MSS_E$, as usual. Anderson (1985), Anderson et al. (1986) show that these are maximum likelihood estimators under normality assumptions.
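The truncated estimator of Eq. 3.26 can be sketched as follows. This is our own implementation of the decomposition described above; the companion modified error estimator is omitted:

```python
import numpy as np

def modified_sigma_s(MSS_S, MSS_E, R, K):
    """PSD estimator of the subject covariance: simultaneously decompose
    MSS_S = Z D Z^T, MSS_E = Z Z^T, then keep only the columns of Z whose
    eigenvalues exceed 1 (a sketch of Eq. 3.26)."""
    w, V = np.linalg.eigh(MSS_E)
    E_half = V @ np.diag(np.sqrt(w)) @ V.T            # symmetric square root
    E_half_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = E_half_inv @ MSS_S @ E_half_inv
    d, Zt = np.linalg.eigh((M + M.T) / 2.0)           # symmetrize, then decompose
    Z = E_half @ Zt                                    # so that MSS_E = Z Z^T
    keep = d > 1.0                                     # drop directions with nu < 1
    Zs, ds = Z[:, keep], d[keep]
    return (Zs * (ds - 1.0)) @ Zs.T / (R * K)
```

When all eigenvalues exceed 1 this reduces exactly to Eq. 3.23; when none do, it returns the zero matrix.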
Design of the Simulation Study
We performed a study by simulating the data from a multivariate Gaussian distribution with random subject and error effects. We only considered two classes with the same covariance structure in each. Thus $n$ observations would be generated according to the model in Eq. 3.3. Each observation is a realization of a 1D Gaussian process observed at $p$ points on a line. The mean in class I and II was a discretized sinusoid and cosinusoid, respectively. Regardless of the input dimensionality, $p$, we set the frequency of the sinusoid so that 4 periods would be covered. We chose this mean to reflect our interest in the functional data.
The crucial issue is the specification of covariance matrices for the subject and error terms. Again, to stay within the functional data framework we used covariance matrices of an isometric process: the covariance between two points only depends on the distance between them. Specifically, the covariance structure was:

$$C(x_1, x_2) = \exp(-\alpha \cdot \mathrm{dist}(x_1, x_2)),$$

where "dist" was measured in the number of voxels separating $x_1, x_2$. Increasing $\alpha$ leads to fast-dying correlations and thus to rough processes. We used $\alpha = 5.0$ for the error term and $\alpha = 0.05$ for the subject effect. This conforms to our intuition that subject effects have some large smoothness properties, while error should be mostly noise with only some spatial smoothness remaining.
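The two covariance matrices can be built as below (the exponential decay form is an assumption consistent with the description above): $\alpha = 5$ gives near-zero correlation already at lag 1 (rough error process), while $\alpha = 0.05$ gives slowly decaying correlation (smooth subject process):

```python
import numpy as np

def isometric_cov(p, alpha):
    """Covariance of a stationary process on p voxels: entries depend
    only on the voxel distance, C(x1, x2) = exp(-alpha * |x1 - x2|)."""
    idx = np.arange(p)
    return np.exp(-alpha * np.abs(idx[:, None] - idx[None, :]))

C_err = isometric_cov(30, 5.0)    # error term: fast-dying correlations
C_sub = isometric_cov(30, 0.05)   # subject effect: smooth process
```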
These specify the correlation matrices. One of the parameters of the simulation was VarRatio. The error term always had variance equal to 1.0 and the subject term variance was VarRatio = {1, 10, 100}. Another parameter was the input dimensionality, $p$, with three choices: {5, 10, 30}. We also varied the number of subjects, $S$ = {1, 5, 10, "N"}, where "N" meant the number of observations in each class, $N$ = {50, 200}. Together, $N$ and $S$ determined the number of observations for a subject in each class, $R = N/S$.

For each combination of {VarRatio, $N$, $p$, $S$}, 50 training sets, each of size $2N$ (two classes), and one test set of size 2 x 5,000 (2 x 3,000 for $S = N$ due to computational limitations) were generated. Three LDA models, with the three estimates of the common covariance matrix described at the beginning of this section (Sec. 3.2.1), were estimated on the training sets and applied to the test sets to obtain estimated posterior probabilities for each observation in the test set. We considered two estimates of prediction error: Dev and SPE, defined as follows. If a test observation $x_0$ came from class $C \in \{I, II\}$, and $\{p_I(x_0), p_{II}(x_0)\}$ were the two estimated posterior probabilities, then:

$$\mathrm{Dev}(x_0) = -2 \log p_C(x_0) \qquad \text{and} \qquad \mathrm{SPE}(x_0) = (1 - p_C(x_0))^2.$$

The test-set estimated posterior probabilities were then used to calculate both prediction error estimators and these were averaged over all observations in the test set.
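The two error measures above can be computed directly from each test observation's estimated posterior probability of its true class:

```python
import numpy as np

def dev_and_spe(p_true):
    """Mean Deviance and Squared Prediction Error over a test set, given
    each observation's estimated posterior probability of its true class."""
    p_true = np.asarray(p_true, dtype=float)
    return (-2.0 * np.log(p_true)).mean(), ((1.0 - p_true) ** 2).mean()
```

Both measures are zero when the model assigns probability 1 to the correct class, and Dev diverges as the assigned probability approaches 0, which is why it is more sensitive to outliers.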
We analyzed the results of the simulation study using an ANOVA model for a 4-way factorial design with replications. The four factors were {VarRatio, $N$, $p$, $S$} and there were 50 replications. The tests and p-values are used mostly as guides since the normality assumption is likely questionable, especially for the Deviance response which exhibits many outliers. We analyze four responses, one per combination of error scale and modified estimator, for example

$$\mathrm{SPE\text{-}ES} = \mathrm{SPE}(W) - \mathrm{SPE}(\hat{\Sigma}_E + \hat{\Sigma}_S), \tag{3.32}$$

with Dev-ES, Dev-WS and SPE-WS defined analogously. These four responses analyze the differences between the standard within-class and the two modified estimators of the covariance matrix, on the Deviance and SPE scales. We used a simple additive ANOVA model of all four factors with one interaction between the P and N terms. In many models that we tried, all terms were always significant, but as we mention above we were not overly concerned with p-values. More interesting are the effects' estimates presented in Table 3.1. There are some clearly visible trends. As expected from the result in Eq. 3.16, increasing VarRatio leads to a better performance with the modified estimators regardless of the PE metric used. The improvement is much more pronounced using the "E+S" estimator. Increasing the dimensionality of the data, P, also tends to favour the modified estimators, although there is a surprising twist in the SPE-WS case. Similarly, the modified estimators work better with smaller training set sizes (N50) especially in high dimensions as the interaction term N50:P30 indicates. This suggests that the modified estimators benefit from lower signal-to-noise ratios. It is an apparently surprising finding, since in higher dimensions and with fewer observations one might have expected poor estimates of the variance matrices. Since we need two such estimates for the modified
Term | Dev-ES | Dev-WS | SPE-ES | SPE-WS
Table 3.1: Estimates of effects in the four ANOVA models of the simulation results. The terms are: input dimensionality, P = {5, 30} as compared to P = 10; training set size N = 50 compared to N = 200; number of subjects, Subjects = {5, 10, "N"} compared to 1 subject; and the ratio of variances of the subject effect to the error effect, VarRatio = {10, 100} compared to VarRatio = 1. There is also an interaction term between P and N.
LDA as opposed to one pooled within-class matrix, the modified method could be expected to suffer more under lower signal-to-noise ratios and in higher dimensions. One possible explanation is that the modified estimators use the available signals more efficiently. This hypothesis is partially supported by comparing the "E+S" and "W+S" estimators: "E+S" could be expected to show larger improvement as it does not use the within matrix at all, and this is indeed the case. Another possibility is that the true covariance matrices had a very simple structure dependent only on a single parameter $\alpha$. It may be possible that in that case more dimensions actually help in estimating these matrices.

A good case for modified LDA comes, not surprisingly, from the number-of-subjects effect: Subjects. The baseline was Subjects = 1, when all methods are numerically equivalent. With N subjects, there is understandably little effect of modified LDA, and in fact W+S does worse than the baseline. But for 5 and 10 subjects both methods perform significantly better than the baseline, except that again there is a huge reversal in the SPE-WS column. For the "E+S" estimator the effect for 5 subjects is itself enough to make this method better, regardless of the state of the other parameters, for both measures of Prediction Error. Together with higher VarRatio values, the subject effect makes the modified estimators, especially "E+S", very clear winners over classical LDA.
In general the "E+S" modified estimator of the covariance matrix performs somewhat better than the pooled within covariance matrix in the context of classification, if there are strong subject effects present. The improvement seems to be more pronounced with lower signal-to-noise ratios, here indicated by either smaller training set sizes, higher dimensionality or both. The improvements are not large, however, and they are probably smaller when penalization is included. It may be worthwhile to develop a modified PDA method for PET/fMRI images that would take both subject and error covariance matrices into account.
3.3 Dimension Reduction in LDA using Smoothness
Constraints and Penalization
As described in the preceding section, Linear Discriminant Analysis results in a set of orthogonal vectors in the data space, called canonical or discriminant variates, that best separate the class means with respect to the within-class covariance. The total number of discriminant variates is one less than the number of classes if the problem is of full rank. If all of the variates are used, then LDA can be derived as the Maximum Likelihood estimate of the optimal classification rule under the multivariate normality assumption with a common within-class covariance matrix (Hastie et al., 1995; Ripley, 1996, pp. 96). Linear discriminant analysis is essentially equivalent to canonical correlation analysis, canonical variate analysis and optimal scoring, in that any one is sufficient to derive the others. In the context of images, where the classes are experimental states, LDA can be used to obtain the variates (e.g., Azari et al., 1993; Rottenberg et al., 1996; Friston et al., 1996; Strother et al., 1996; Ardekani et al., 1998) in the voxel (or VOI) space, which can be interpreted as activation images (or profiles). When LDA is used with scans as inputs, the canonical variates are usually called canonical images. With two classes, as in our baseline-activation analysis, the single canonical image may be interpreted as a pattern of the signal driving (or driven by) the activation state. Algebraically, the canonical image is just a mean-difference pattern rescaled by the inverse of the estimated covariance matrix. With more than two classes, the principal canonical image gives the direction (in the scan space) that most separates the classes with respect to the within-class covariance structure. In that sense it is the image that carries the largest amount of information about the classes. Successive canonical images are chosen to extract most of the information about the classes in the orthogonal complement of the subspace spanned by the previous canonical images.
A naive application of LDA to the images will not work. Due to the ill-posed nature of the problem, one will not be able to estimate the inverse of the within-class covariance matrix. Therefore, we need to constrain the problem and bring its dimensionality down. We achieve this in two ways: by constraining the roughness of the canonical images and by penalizing the within-class covariance matrix.
3.3.1 Basis Expansion of Canonical Variates
By constraining the problem through imposing spatial smoothness on the resulting canonical images we not only reduce the effective dimension of the problem, and thus potentially the variance of the result, but we also model some spatial smoothness which is known to exist in the scans. This is done by expressing the unknown canonical image(s) as a linear combination of known basis functions from some smooth space.

If β(v1, v2, v3) is a canonical image, indexed by the location vector (v1, v2, v3), we require that:

    β(v1, v2, v3) = Σ_j γ_j B_j(v1, v2, v3)    (3.34)

Here, B_j(v1, v2, v3) is a basis function in the voxel space. Many choices exist for a basis set. We have experimented with tensor product B-splines (TPS) and wavelet bases.
Having constrained the spatial "roughness" of our canonical images, we need to estimate the coefficients γ_j. LDA works with the scores, which are (discrete) inner products between an observed scan, i_n, and the canonical image. Using Eq. 3.34, we have that:

    ⟨i_n, β⟩ = Σ_j γ_j ⟨i_n, B_j⟩

Thus, to find the coefficients γ in the LDA framework, we need to project the scans i onto the basis set and treat those projections as the input to the LDA. The resulting canonical variate will be a vector of coefficients γ, which lets us reconstruct the canonical image via Eq. 3.34. Appendix B has more details, proving that smoothness-constrained LDA (or PDA) leads to unconstrained LDA (PDA) with the projected data.
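As a concrete sketch of this projection step, the following NumPy toy example uses a one-dimensional Gaussian-bump "basis" and made-up sizes as stand-ins for a real tensor-product B-spline basis over the voxel grid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N scans of p voxels each, q smooth basis functions.
N, p, q = 20, 500, 25

X = rng.standard_normal((N, p))          # scan matrix (rows are scans)

# Stand-in for a tensor-product B-spline basis evaluated on the voxel grid:
# column j holds basis function B_j (here a Gaussian bump).
grid = np.linspace(0.0, 1.0, p)
centers = np.linspace(0.0, 1.0, q)
B = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / 0.05) ** 2)

# Scores: inner products between each scan and each basis function;
# these (N x q) projections are the input to (P)DA.
Z = X @ B

# A canonical variate gamma lives in coefficient space; Eq. 3.34
# reconstructs the canonical image as beta = sum_j gamma_j B_j.
gamma = rng.standard_normal(q)
beta = B @ gamma
```

The same two matrix products are all that is needed in three dimensions; only the construction of B changes.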
3.3.2 Penalized Linear Discriminant Analysis
By imposing smoothness on the canonical images, we already reduce the dimensionality of the problem: there will typically be fewer basis functions (B_j's; see Eq. 3.34) than voxels (we have a few thousand basis functions and about 30 thousand voxels remaining in the i's after masking is applied). The problem is still ill-posed, however, and further regularization is needed. We have worked with the PDA model developed by Hastie et al. (1995). They impose a penalty on the within-class covariance matrix (of the projected image data), which directly affects the canonical variates, here γ, that result. Since we have already expanded the canonical image in a smooth basis, we use a simple ridge penalty, which adds a small value to the diagonal entries of the estimated within covariance matrix. This is equivalent to imposing a penalty on the sum of the squared variate coefficients, ||γ||². The PDA model with a ridge penalty is equivalent to the Canonical Ridge model of Vinod (1976), which is also being explored in functional neuroimaging by Nielsen et al. (1998).
The intuition behind penalization is as follows. While we constrain the image to lie in the smooth space, the smoothness constraint can still be overcome by large coefficients. If one specifies a large positive γ_{j1} in (3.34), for a basis function j1 that is centered at some location, and a large negative coefficient γ_{j2} for a basis function j2 that is centered at a nearby location, then the resulting canonical image will have a steep dip between the two locations, despite our efforts to impose smoothness. How steep a dip will depend on the type and number of basis functions and on the size of the coefficients. Together these three control the effective smoothness of the canonical image. By penalizing the size of the coefficients, we control the amount of smoothness given the choice and number of the basis functions. There is a free tuning parameter, λ, that controls the importance of the penalty term relative to the criterion being minimized by LDA. This is yet another expression of the ubiquitous bias-variance tradeoff (Friedman, 1994).
There are some advantages to basis expansion followed by penalization. If the basis uses smooth functions, like B-splines, projecting the scans onto the basis is similar to smoothing them with a kernel of the shape of the basis function. The whole method, however, is very different from pre-smoothing the scans prior to analyzing them. With the basis-expansion idea we have the power to impose regionally different amounts of smoothness or bandwidth; moreover, this spatially varying bandwidth is utilized to maximize the discriminatory power of PDA. The regional smoothness is again determined by the individual basis functions, their placement, and the size of the coefficients, γ_j. Thus, with γ being small in some regions and large and variable in other regions, we are able to model a wide variety of possible canonical images that exhibit smoothness in some parts and roughness in others. The ability to control the overall size of γ through a ridge penalty gives us the ability to globally fine-tune the smoothness of the canonical image. Smoothing at some level is necessary: the images are reconstructed by a tomographic process which imposes spatial correlation (Pajevic et al., 1998), the registration techniques are not perfect (Kjems et al., 1999), and the actual hemodynamic response of the brain, which PET and fMRI methods use as a proxy for neuronal activation, has a spatial extent on the order of 3-5 mm (Malonek and Grinvald, 1996).
The basis expansion idea is similar in spirit to Ruttimann et al. (1998). There the authors also use a specialized (wavelet) basis to induce a prior and reduce the dimensionality of the fMRI data. Their approach is geared more towards deriving inferential statistical tests on the obtained activation images, a task made easier by the orthogonality of the basis (which leads to the near-orthogonality of the coefficients), while our emphasis is on testing generalizability via prediction error, in a non-parametric way. A second important difference is in the input space to which the basis expansion is applied: we concentrate on the LDA approach, which works with the whole-brain spatial covariance, while Ruttimann et al. (1998) apply the expansion to the voxel-based pooled difference image, basically working with the first-moment statistic.
To summarize our method: the data matrix is obtained by projecting each image, as in Eq. 3.33, using a chosen basis, B_j. Then PDA is applied to the projected data for some value of the tuning parameter λ, which results in the canonical variates matrix Γ. Finally, the canonical images are reconstructed via Eq. 3.34.
3.3.3 Penalized Discriminant Analysis and Statistical Parametric Mapping
Some interesting analogies to the voxel-based methods that rely on (possibly scaled) images derived from the difference of class-averaged images, like SPM, can be established by considering the two-class problem. We have already indicated the basic difference in the geometry of both approaches in Section 3.1, on page 47. Disregarding for a moment the basis projection step, the single canonical image from PDA is:

    β = c (S_W + λI)^{-1} (μ1 - μ0)

where S_W is the pooled within-class covariance matrix, μ1, μ0 are the class-mean images, and c normalizes the image to length one. Therefore, the canonical image is a rotated, rescaled version of the simple class-mean difference image (MDI), where the rotation and rescaling attempt to equalize the variances and to decorrelate voxels, while the penalty term works in the opposite direction.
For very large values of λ, when the penalized within-class covariance matrix becomes essentially (a constant multiple of) the identity, the canonical image is a scaled MDI. This assumes that the variances across voxels are equal and that the voxels are uncorrelated, and thus resembles the voxel-wise t-map with a pooled variance estimate. For moderate λ, we can expect S_W + λI to be diagonally dominant with possibly different diagonal elements. The resulting image will be similar to a diagonally scaled MDI where each voxel in the MDI is compared to its variance (now resembling the voxel-wise t-map with individual voxel variance estimates). For small λ, we get close to fully estimating the within covariance matrix and rotating/scaling the MDI to account for both local variances and covariances. With the internal optimization, we let the Prediction Error decide how much information we have to move away from the unrealistic assumptions of homoscedasticity and independence across voxels.
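The role of λ in this family of images can be illustrated numerically. In the following sketch a small random positive-definite matrix stands in for S_W; the large-λ limit collapses to the normalized mean-difference image, while the small-λ limit applies the full Mahalanobis rotation/scaling:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 6
A = rng.standard_normal((p, p))
S_W = A @ A.T + p * np.eye(p)           # stand-in pooled within-class covariance
mdi = rng.standard_normal(p)            # class-mean difference image, mu1 - mu0

def canonical_image(lam):
    """Two-class penalized canonical image, normalized to length one."""
    v = np.linalg.solve(S_W + lam * np.eye(p), mdi)
    return v / np.linalg.norm(v)

# lam -> infinity: the penalized covariance is ~ a multiple of I and the
# canonical image collapses to a scaled mean-difference image (MDI).
beta_big = canonical_image(1e8)

# lam -> 0: full Mahalanobis rotation/scaling of the MDI.
beta_small = canonical_image(1e-8)
```

Intermediate values of λ trace a path between these two images, which is exactly the tradeoff the internal optimization over prediction error resolves.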
The tensor-product basis projection helps the estimation by somewhat decorrelating the variables, because it models part of the spatial covariance structure. We can expect the covariance matrix of the projected data to be more diagonally dominant than in the unprojected space. This in turn results in better estimates of the covariance matrix. We therefore expect PDA to be a more flexible method than SPM, with one hyperparameter λ that is able to control the tradeoff between the increased flexibility of full covariance normalization and the necessity of the simplifying assumptions of homoscedasticity or a diagonal covariance matrix.
PDA also exhibits some similarity to another well-known method in neuroimaging called Partial Least Squares (PLS), described in McIntosh et al. (1996) and in Section 2.3.3. (There is another well-known algorithm, also called Partial Least Squares, widely used in the Chemometrics community; for a statistical description see, e.g., Wold et al. (1984), and note that the PLS described here is a completely different algorithm.) PLS starts with decomposing Y^T X using the SVD, where X is, as in our case, the scan matrix, and Y is an arbitrary design matrix. Furthermore, PLS provides an interesting paradigm for choosing the number of significant components that result from the SVD. In light of the correspondence between CCA and LDA, which we prove in the next section, PLS may be seen as an unnormalized version of LDA: in LDA one looks at the singular value decomposition of the same cross-product matrix, normalized on the left by the (penalized) covariance and on the right by the class sizes (see Eq. C.4), which is followed by rotating back the left-hand singular vectors. Another way to look at it is via the orthogonality constraint (Eq. 3.41): PLS uses a Euclidean metric while LDA normalizes to unit variance using the within-class covariance matrix estimator. Assuming that the variance can be estimated effectively, the LDA normalization is preferable as it puts all voxels on an equal footing. Of course, we cannot estimate the full covariance matrix of all voxels (or basis functions), and our recourse is to use penalization. One can again establish some analogies for different values of the ridge hyperparameter. Similarly to the previous paragraph, we observe that for a small number of degrees of freedom we may expect our results to be similar to the PLS ones, as the left-hand normalizing matrix will be close to a constant multiple of the identity. The right-hand normalization simply reweights the observations by their class sizes.
3.3.4 PDA via Regression
In this section we will show how to obtain the canonical variates of the PDA model using two steps: a penalized regression followed by an eigendecomposition of the regression results. Our proof is different from that given in Hastie et al. (1995) and relies only on matrix algebra. We also feel it is more appropriate for the neuroimaging community as it hinges more closely on the current approaches widely used in this domain.

In the next section we will show how to "train" the PDA model, given N scans (possibly projected) as input, and how to predict from this model with O(N) computational effort, that is, without dealing with large p x p matrices. This is of vital importance as we use resampling techniques to estimate the prediction error. Without this extension, obtaining the hundreds of model estimates needed by the bootstrap and cross-validation would be computationally prohibitive.

We start with the unpenalized version (LDA) and first show that the closely related method, Canonical Correlation Analysis (CCA), can be expressed as a multiresponse regression followed by an eigendecomposition. We then prove the relationship between CCA and LDA and, finally, introduce penalization and describe an extension to deal with the image data. Our proof differs from the one in Hastie et al. (1995) in that we apply it directly to the CCA formula (Eqs. 3.40 and C.1).
CCA is a symmetric method that, given two sets of variables measured for each observation, x, y, seeks two linear combinations that exhibit maximal correlation. That is, each observation is composed of {x_i, y_i}, where x and y are in general of different dimensions. One attempts to summarize the data by finding two linear combinations, a, b of x and y, respectively, such that:

    corr(a^T x, b^T y)

is maximized.

One extends the method by finding all such possible directions, a_k, b_k, that successively maximize the correlation and are orthogonal to the previously found pairs. Since:

    var(a^T x) = a^T Var(x) a   and   cov(a^T x, b^T y) = a^T Cov(x, y) b    (3.39)

the problem is to find matrices A, B, with the linear combinations in their columns, such that:

    trace(A^T S_xy B)    (3.40)

is maximized subject to:

    A^T S_xx A = I,   B^T S_yy B = I    (3.41)

where S_xx, S_yy and S_xy are the respective covariance and cross-covariance matrices. This is a generalized SVD problem (Mardia et al., 1979, pp. 282), and one can show (Appendix C) that it may be solved, after suitable normalizations, via multiresponse regression of Y onto X, followed by the eigenanalysis of Ŷ^T Ŷ (where Ŷ are the fitted values from the regression step).
In our case, X denotes the data matrix (whose rows are scans or projected scans), and Y is the N x J class-indicator matrix, with 1's denoting the class of each scan. The classes are experimental conditions: here either two classes denoting the Active/Baseline states, or eight classes denoting the temporal order of tasks.
It is known (and we rederive it in Appendix D) that the canonical variates (CVs) associated with x are, up to a scaling factor, the same as the canonical variates that result from LDA (Hastie et al., 1995; Mardia et al., 1979, Ex. 11.3.4). In Appendix D we prove that:

    B_LDA = B D    (3.42)

where D is a diagonal matrix. Thus we show how one obtains the canonical variates of LDA by rescaling B.
As mentioned, the unpenalized version is unsuitable as it requires the inversion of a singular within-class covariance matrix, Σ_W. To remedy that we apply penalization to Σ_W, or, equivalently, to the total covariance matrix S_xx, which results in a penalized regression step:

    Ŷ = X (X^T X + λΩ)^{-1} X^T Y    (3.43)

For any positive definite Ω, this makes X^T X + λΩ invertible. In this paper we use a ridge penalty, Ω = I, and then Eq. 3.43 defines a ridge regression solution.
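A minimal NumPy sketch of the two-step route: ridge regression of the class indicators onto the data (Eq. 3.43 with Ω = I), then an eigendecomposition of the fitted values. The rotation of the regression coefficients shown here recovers the discriminant directions only up to the diagonal rescaling of Eq. 3.42, and all sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, K = 30, 10, 3                      # scans, basis coefficients, classes

X = rng.standard_normal((N, p))
labels = rng.integers(0, K, size=N)
Y = np.eye(K)[labels]                    # N x K class-indicator matrix
lam = 1.0

# Step 1: ridge regression of Y onto the centred X (Eq. 3.43, Omega = I).
Xc = X - X.mean(axis=0)
coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ Y)
Yhat = Xc @ coef                         # fitted values

# Step 2: eigendecomposition of Yhat^T Yhat; rotating the regression
# coefficients by the leading eigenvectors gives, up to the diagonal
# rescaling of Eq. 3.42, the penalized canonical variates.
evals, A = np.linalg.eigh(Yhat.T @ Yhat)
order = np.argsort(evals)[::-1]
B_cv = coef @ A[:, order[:K - 1]]        # at most K - 1 discriminant directions
```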
3.3.5 Expressing the PDA algorithm in the N-dimensional space
Major effort has been spent on deriving the efficient computational algorithms presented in this thesis. The importance of computational issues has increased greatly in statistics, often due to resampling methods that apply a given algorithm many times, due to the popularity of simulations where computer-generated data of large size are used to test the model, or simply due to the ever increasing amounts of data that statisticians have to deal with. As mentioned in the introduction and throughout this chapter, images constitute an especially challenging form of data due to their sizes. With a 128 x 128 x 48 PET scan, we are dealing with 786,432 voxels, and if each is stored as a single-precision floating point number, each scan occupies over six megabytes of disk space. With dozens of images available in one data set, computationally efficient methods are a must.
The PDA algorithm presented in Section 3.3.4 works in the p-dimensional space, where p is the number of voxels or basis functions. The only places where p-dimensional quantities are needed are in the ridge regression step (Eqn. 3.43) and when the canonical variate (Γ) or image (Eqn. 3.34) is constructed. In particular, in the ridge regression step, it appears that we need to form, and invert, a huge p x p matrix X^T X + λI. In Appendix F we show that the fitted values Ŷ of the ridge regression step may be computed using only N-dimensional quantities, where N is the number of scans. In order to do that, one needs to precompute the outer-product matrix, XX^T, an expensive step which needs O(N²p) operations but is performed only once.
Since we use resampling methods to search for the optimal λ, we need to run the above algorithm, with a given set of training inputs X, Y, many times. Additionally, since we then only need to compute the posterior probabilities (and hence, the predicted class memberships), it pays to precompute G = XX^T once and then apply cross-validation or the bootstrap. Each bootstrap sample may then be obtained by selecting only those rows/columns of G that correspond to the sample observations, thus forming G*, the bootstrap version of G.

This, after the full-data G is computed, lets us operate in the N-dimensional scan space for as long as we do not need to compute the canonical variate B_PDA. Since the posterior probability estimates can be obtained using only Y, the fitted values Ŷ, the right-hand eigenvectors A, and the eigenvalues D (Appendix E), we can perform the optimization of the ridge parameter λ in the lower-dimensional space of the observations. Appendix F shows how the ridge regression may be computed using only the N x N matrix G and the matrix Y of class indicators. Appendix G shows the algebraic trick of computing the centered version of G, G̃ = X̃X̃^T, where X̃ has its column means subtracted, using the uncentered G. In fact, since in the resampling methods we use a subset of the data as a training set, this Appendix shows how to center the partial G* corresponding to the subset chosen by resampling, and how to center the rows of the remaining observations with the column means of the training set used in fitting the PDA, all using the once-computed uncentered G.
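The N-dimensional computation rests on the push-through identity X(X^T X + λI)^{-1} X^T = G(G + λI)^{-1} with G = XX^T. A small numerical check (toy sizes, with N much smaller than p as for projected scans):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 15, 200                           # N << p, as with projected scans
X = rng.standard_normal((N, p))
X = X - X.mean(axis=0)                   # centred data
Y = np.eye(2)[rng.integers(0, 2, size=N)]
lam = 0.5

# Primal ridge fit: requires a p x p solve, infeasible for a realistic p.
primal = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Dual fit via the N x N outer-product (Gram) matrix G = X X^T, using the
# push-through identity X (X^T X + lam I)^{-1} X^T = G (G + lam I)^{-1}.
G = X @ X.T
dual = G @ np.linalg.solve(G + lam * np.eye(N), Y)
```

The bootstrap version G* is then just a row/column subset of the precomputed G, so each resample costs only an N x N solve.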
3.3.6 Effective Degrees of Freedom
Linear statistical analysis defines the notion of degrees of freedom (d.f.). These specify the dimensionality of the space onto which we project the data and, in the case of iid Gaussian errors, the expected drop in the Residual Sums of Squares if only noise variables are included in the model (e.g., Sec. 3.5, Hastie and Tibshirani, 1990).

Since we have expressed LDA (and PDA) with a regression as a building block, we can carry over the notion of d.f. In the univariate, full-rank (N > p), linear regression case, d.f. = p, the number of variables. Then also:

    p = trace(X (X^T X)^{-1} X^T) = trace(H)    (3.44)
where H is the projection ("hat") matrix (i.e., Ŷ = HY). By analogy, in the ridge regression case we can define the effective degrees of freedom (EDF):

    EDF = trace(S(λ))    (3.45)

(Craven and Wahba, 1979; Hastie and Tibshirani, 1990), with S(λ) = X (X^T X + λI)^{-1} X^T, the penalized "projection" matrix obtained in the regression step of PDA. Please note that (3.45) is not the only possible definition for EDF; see, e.g., Hastie and Tibshirani (1990) for other possibilities.
In our case, with N < p, we can compute the EDF using only the matrix G of outer products. A bit of algebra shows that:

    EDF = trace(S(λ)) = Σ_{j=1}^{N} d_j / (d_j + λ)    (3.46)

where d_j are the eigenvalues of G = XX^T, the same as the non-zero eigenvalues of X^T X. This shows that the EDF combines the tuning parameter with the smoothness inherent in the basis representation, and is more informative than the unscaled λ. The EDF vary from 1 (for λ = ∞, since the penalization is applied to the centered G) to N, for λ near zero.
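The equivalence between trace(S(λ)) and the eigenvalue sum in Eq. 3.46 is easy to verify numerically (toy sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 12, 80
X = rng.standard_normal((N, p))
X = X - X.mean(axis=0)
lam = 2.0

# Direct definition: EDF = trace(S(lambda)) using the p x p solve.
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
edf_direct = np.trace(S)

# N-dimensional formula: sum of d_j / (d_j + lambda) over the
# eigenvalues d_j of the outer-product matrix G = X X^T.
d = np.linalg.eigvalsh(X @ X.T)
edf_fast = float(np.sum(d / (d + lam)))
```

The second form only ever touches N x N quantities, so the EDF can be reported for every candidate λ at essentially no cost once G is available.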
3.3.7 Prediction Error and its Estimates
It should be realized that the need to impose constraints is more than just a numerical necessity. It affects the generalization ability of our model: i.e., whether the resulting activation map will be interpretable and significant, or whether it will be overshadowed by noise and peculiarities of the data at hand. That is directly related to the predictive performance of the model: if the model has not been constrained enough (and in the "right" way), then it will not be able to classify a new scan that was not used in training the model.

The generalizability of functional neuroimaging models has been addressed before (Kippenhan et al., 1994; Lautrup et al., 1995; Mørch et al., 1997; Mørch, 1998; Strother et al., 1997, 1998a; Hansen et al., 1999). In particular, Mørch (1998) contains a good introduction to generalization error, predictive performance and bias-variance trade-off issues in the context of neuroimaging. Even though prediction is not the main goal in analyzing PET images, one needs to be concerned about the generalizability of the derived activation maps (here, canonical images). Prediction Error (PE) is a way to measure the generalizability of our modeling process. We use PE (or, rather, its estimate) as a function of EDF in three ways: to choose the amount of smoothness, to assess the final usability of the derived patterns, and to compare different data representations.
A Probabilistic Framework.
Linear Discriminant Analysis, while first established by Fisher as a sensible procedure regardless of distributional assumptions (Mardia et al., 1979), can be rederived within a probabilistic framework. If one assumes that the scans come from a multivariate Gaussian distribution, and that these Gaussians have the same covariance structure among classes, then LDA can be derived as a plug-in Bayes classifier for the data with the usual estimates of the class-mean images and covariance matrix.

Specifically, for an image i^(l) from class k, k ∈ {1, ..., K}, let i^(l) ~ N(μ_k, Σ). In general the Bayes classifier would assign a new image i0 to the class k0 which maximizes the posterior probability:

    P(k | i0) = P(i0 | k) P(k) / P(i0)    (3.47)

where P(i0 | k) is the class-specific likelihood, here Gaussian, P(k) is the prior probability of observing an image in class k, and P(i0) is a normalization constant.
Since the covariance matrix is assumed to be the same for all classes, the only class-dependent component of the multivariate Gaussian likelihood is the argument of the exponential function, the Mahalanobis distance between i0 and the mean of class k, μ_k:

    D(i0, μ_k) = (i0 - μ_k)^T Σ^{-1} (i0 - μ_k)    (3.48)

It is an established fact (Hastie et al., 1995; Ripley, 1996, pp. 96) that the Mahalanobis distance is a Euclidean distance when the image and class means are projected onto all canonical variates. Therefore the LDA results can be used to obtain both posterior probabilities and classifications by:

    P(k | i0) = C π_k exp(-½ ||(i0 - μ_k)^T B_LDA||²)    (3.49)

where C normalizes the probabilities to add up to one, B_LDA is the matrix of canonical variates (in columns), which was derived via the route convenient for us in Section 3.3.4, Eq. 3.42, and π_k are the estimated prior probabilities for each class.
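A sketch of this distance-to-posterior step in code. The canonical variate matrix here is a random stand-in (so the particular distances are illustrative only); the point is the Euclidean distance in canonical space and the normalization of the class posteriors:

```python
import numpy as np

rng = np.random.default_rng(5)
p_dim, q, K = 6, 4, 2                    # voxel dim, canonical dims, classes

B_lda = rng.standard_normal((p_dim, q))  # stand-in canonical variate matrix
means = rng.standard_normal((K, p_dim))  # class-mean images
priors = np.array([0.5, 0.5])            # estimated class priors pi_k
i0 = rng.standard_normal(p_dim)          # a new scan to classify

# Mahalanobis distance becomes Euclidean distance after projecting the
# scan and the class means onto the canonical variates.
z0 = i0 @ B_lda
zm = means @ B_lda
d2 = np.sum((z0 - zm) ** 2, axis=1)

# Posterior probabilities, normalized to sum to one.
log_post = np.log(priors) - 0.5 * d2
post = np.exp(log_post - log_post.max())
post /= post.sum()
```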
Prediction Error Measures.
We need to define a suitable measure of Prediction Error in the population. We have used two such measures: the misclassification rate (MC rate) and the Squared Prediction Error (SPE).

The MC rate is the probability of misclassifying a new scan by the model fitted on the training data. It is a rough measure, with a discontinuous 0-1 penalty for misclassification. Thus a model which gives a posterior probability of 49% to the correct class will score the same error on this scan as a model which gives a 1% posterior probability, assuming a 50% threshold is used as in the 2-way classification problem.

Another measure of PE with a more reasonable metric is the Squared Prediction Error, SPE = (1 - p_c)², where p_c is the posterior probability estimated by the model for the correct class. One could also use the deviance (minus twice the likelihood ratio), well known from the theory of Generalized Linear Models (McCullagh and Nelder, 1989), here simply -2 log p_c. We experienced erratic behaviour of this measure, because the posterior probabilities were often close to zero or one: the deviance puts a very large penalty on cases where p_c ≈ 0. This issue has also been addressed by Hintz-Madsen et al. (1998).
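The three measures, applied to some hypothetical posterior probabilities for the correct class, make the difference in behaviour concrete:

```python
import numpy as np

# Posterior probability assigned to the *correct* class of each scan
# (hypothetical values).
p_correct = np.array([0.99, 0.60, 0.49, 0.05])

# Misclassification rate: 0-1 loss with a 50% threshold (2-class case);
# the 0.49 scan costs as much as the 0.05 scan.
mc_rate = np.mean(p_correct < 0.5)

# Squared Prediction Error: a smoother metric, SPE = (1 - p_c)^2.
spe = np.mean((1.0 - p_correct) ** 2)

# Deviance, -2 log p_c, blows up as p_c -> 0, which is why it behaved
# erratically on these data.
deviance = np.mean(-2.0 * np.log(p_correct))
```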
Resampling Estimates.
The above are population parameters, conditional on the model and the training data. We need to derive their estimates. We use (5-fold) cross-validation (CV) and the bootstrap resampling techniques (Efron and Tibshirani, 1993). The 5-fold CV estimate is derived by first randomly dividing the data into 5 equal-sized parts. Then the model is trained on 4/5ths of the data and used to obtain predictions for the remaining 1/5th. This is repeated five times, once for each of the five distinct training/validation set divisions. The prediction error is an average of the errors accumulated over the five validation sets. The CV process mimics the situation where we have a set of independent observations on which to estimate the prediction error. However, using a five-fold CV results in an estimate which is biased due to the diminished size (80%) of the CV-training set. Also, there is a variability associated with the many possible ways to divide the data into five parts. The .632+ bootstrap estimate was designed to remedy that, and has been shown to outperform CV in simulation studies (Efron and Tibshirani, 1993, 1997).
The .632+ bootstrap procedure is a refinement of the "leave-one-out" bootstrap approach, which we now describe. One obtains B bootstrap samples, with replacement, from the original data (in our case, B = 50). Then a model is built on each sample and tested on the observations that were (by chance) not included in the sample. The resulting prediction errors are averaged to give PE^(1), the leave-one-out bootstrap PE estimate.

It should be noted that we have used subjects, each with all his/her scans, as the sampling unit for both the CV and bootstrap resampling techniques. Otherwise large negative biases in the per-scan PE estimates will result, due to the large between-subject variability in these data sets (Strother et al., 1995a,b).
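A sketch of this subject-level resampling: whole subjects, with all their scans, are assigned to CV folds, so no subject contributes scans to both the training and validation sets (subject counts and fold sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n_subjects, scans_per_subject, n_folds = 10, 8, 5

# Which subject each scan belongs to (each subject contributes several scans).
subject_of_scan = np.repeat(np.arange(n_subjects), scans_per_subject)

# Assign whole subjects (not individual scans) to CV folds.
perm = rng.permutation(n_subjects)
fold_of_subject = np.empty(n_subjects, dtype=int)
fold_of_subject[perm] = np.arange(n_subjects) % n_folds

folds = []
for fold in range(n_folds):
    val = np.flatnonzero(fold_of_subject[subject_of_scan] == fold)
    train = np.flatnonzero(fold_of_subject[subject_of_scan] != fold)
    # ... fit PDA on `train`, accumulate prediction errors on `val` ...
    folds.append((train, val))
```

The bootstrap analogue draws subjects with replacement and carries all of each drawn subject's scans into the sample.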
The .632 correction (Efron and Tibshirani, 1993, Ch. 17) was derived to correct for the (positive) bias that results since each bootstrap sample contains, on average, only 63.2% of the original sample. The .632 estimate is:

    PE^(.632) = .368 err + .632 PE^(1)    (3.50)

or a weighted average of the leave-one-out bootstrap estimate and the training error, err, on all of the data. This will now underestimate the PE for models which highly overfit and have training errors close to zero. The '+' correction attempts to deal with that, by first estimating the no-information error rate, γ, which is defined in the population as:

    γ = E_{F_ind} [ Q(y0, r_x(t0)) ]    (3.51)
This means the following: assume a distribution, F_ind, of data points consisting of predictors and responses, {t, y}, such that the marginal distributions of the predictors and responses are the same as for the observed data, but the two are independent; that is, there is no information in t about y. Let r_x(t0) denote the prediction made by our model at the point {t0, y0} from F_ind, trained on the available data x. The point {t0, y0} is also independent of the training set x. The function Q(·) is the prediction error measure, SPE or misclassification rate in our case. A possible empirical estimate of γ, suggested by Efron and Tibshirani (1997), is:

    γ̂ = (1/N²) Σ_i Σ_{i'} Q(y_i, r_x(t_{i'}))    (3.52)

which is an error rate computed for our data using all N² pairs of predictors and responses; this effectively mixes up the two and destroys their relationship.
For the misclassification rate, γ̂ = π̂₁(1 - q̂₁) + (1 - π̂₁) q̂₁, where π̂₁ is the proportion of class-1 observations, and q̂₁ is the proportion of observations predicted by r_x to belong to class 1. The multiclass extension is γ̂ = Σ_j π̂_j (1 - q̂_j). For the SPE, Eq. 3.52 becomes:

    γ̂ = (1/N²) Σ_i Σ_{i'} (1 - P̂(j(i') | i))²    (3.53)

where P̂(j(i') | i) is the estimated posterior probability (as in (3.49)) of the class that scan i' belongs to, evaluated for scan i. One may calculate (3.53) in the following way. Class j will be the "correct" class n_j times for each observation (where n_j is the number of observations in class j). Let P denote the N x J matrix of estimated posterior probabilities. Then γ̂ of equation 3.53 will be equal to the average of all row-sums of the matrix with entries (1 - P_ij)², where each column j is multiplied by n_j/N.
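For the misclassification rate, the closed form γ̂ = Σ_j π̂_j (1 - q̂_j) agrees with the brute-force average over all N² predictor/response pairings, as a small example (with made-up labels and predictions) confirms:

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # observed classes
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 1])   # model predictions

K = 2
pi = np.bincount(y_true, minlength=K) / len(y_true)   # class proportions
q = np.bincount(y_pred, minlength=K) / len(y_pred)    # predicted proportions

# Closed-form no-information misclassification rate.
gamma = float(np.sum(pi * (1.0 - q)))

# Brute force over all N^2 pairs: fraction of pairings where the
# prediction for scan i' disagrees with the true class of scan i.
pairs = float(np.mean(y_pred[None, :] != y_true[:, None]))
```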
The no-information error rate is used to form the weight, ŵ, for the convex combination:

    PE^(.632+) = (1 - ŵ) err + ŵ PE^(1)    (3.54)

as a replacement for (3.50). The weight ŵ is formed in the following way: first define the relative overfitting rate, R̂:

    R̂ = (PE^(1) - err) / (γ̂ - err)

The relative overfitting rate measures the overfitting, the difference between the leave-one-out bootstrap estimate and the training error, relative to the "pure" overfitting, as measured by the difference between the no-information rate and the training error. R̂ varies between 0, if there is no bias in the training error, and 1, if there is "full" overfitting, that is, when PE^(1) equals the no-information error rate, γ̂. The weight ŵ, defined as:

    ŵ = .632 / (1 - .368 R̂)

varies from .632, when R̂ = 0, to 1. With ŵ so defined, Eq. 3.54 is seen as providing some method-based adaptivity to Eq. 3.50.
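Putting the pieces together, a small sketch of the .632+ combination (the input error rates are hypothetical):

```python
import numpy as np

def err632plus(err_train, err_boot, gamma):
    """.632+ combination of the training error and the leave-one-out
    bootstrap error (after Efron and Tibshirani, 1993, Ch. 17)."""
    # Relative overfitting rate R-hat, clipped into [0, 1].
    R = (err_boot - err_train) / (gamma - err_train)
    R = float(np.clip(R, 0.0, 1.0))
    w = 0.632 / (1.0 - 0.368 * R)        # weight grows from .632 towards 1
    return (1.0 - w) * err_train + w * err_boot

# Hypothetical inputs: low training error, moderate bootstrap error,
# and a no-information rate of 0.5 (balanced two-class problem).
est = err632plus(err_train=0.05, err_boot=0.30, gamma=0.50)
```

With no apparent overfitting (err_boot equal to err_train) the weight stays at .632; as the bootstrap error approaches the no-information rate, the weight moves towards 1 and the optimistic training error is discounted entirely.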
3.4 A Note on Gaussian Assumption
The LDA procedure, as derived by Fisher, does not need to rely on the Gaussian distributional assumption, which is also true for its penalized and smoothness-constrained version described here. The only place where normality is used is in estimating the posterior probabilities, and thus in estimating the Prediction Error measures developed in Section 3.3.7. The question of the validity of the Gaussian assumption then becomes a question of the validity of the PE estimates. One may conjecture that departures from Normality would have a detrimental effect on the predictive performance of our method, which would lead to larger prediction errors than one would obtain with similar, but Gaussian distributed, data. Perhaps more important, however, is the value of the ridge parameter λ where the minimum PE occurs, as this determines the final image we obtain from the analysis with a given basis set. It is quite possible that a departure from Normality changes the shape of the PE curves (like those in Fig. 4.3). Again, we do not feel that the location of the minima would drastically change with a departure from normality (as this location is clearly invariant under monotonic transformations of the PE), but we acknowledge a need for some robustness studies in this matter.
Finally, we have some consolation in that at least the PET data may not be very far from Gaussian. Each voxel is based on a linear combination of a large number of random photon counts (see Section 2.2.1), and we thus hope that the Gaussian approximation to the Poisson will work. With the smooth basis expansion these (reconstructed) counts are further smoothed with a large number of neighbouring ones, which hopefully gives us a possibility that the Central Limit Theorem may be applied.
3.5 Is Ridge Penalty Enough for the B-spline Basis?
The ridge penalty is very convenient for us to use computationally, but the question is whether
it penalizes "the right thing". By that we usually mean high-frequency components or
higher derivatives. In one dimension, B-splines are usually used with a second-order derivative
penalty (e.g., Hastie and Tibshirani, 1990), which results in the natural cubic smoothing
spline fit; a similar penalty could be composed using 3-dimensional tensor-product B-splines.
O'Sullivan (1991) gives a remarkable algorithm for computing an eigendecomposition
of a discrete Laplacian penalty matrix, which penalizes the square of the sum of the second
derivatives of the data. Also, there exists a wide literature on thin-plate splines (e.g.,
Green and Silverman, 1994), which are the most popular way to extend cubic smoothing
splines to higher dimensions. All these methods require that one handle p x p matrices,
which is computationally prohibitive in our case.
In this section we show, in a semi-formal way, that even though the ridge penalty is not
optimal in the sense of penalizing second derivatives, it still behaves reasonably, in the sense
that higher-frequency components are penalized more. Intuitive support was given in
Section 3.3.2.
Let us start by looking at the simpler regression problem. Let:

y_i = f(x_i) + ε_i,   i = 1, ..., n,

where ε_i ~ N(0, σ²) and f(x) is a regression function to be estimated. If we want to
constrain f(x) to be smooth, we can expand it in some basis of smooth functions. Let
{B_j(·)}_{j=1}^p be such a basis. Then, if we denote the evaluated basis (n x p) matrix by B,
we have:

β̂ = argmin_β ||y − Bβ||² + λ β^T β,   (3.58)

by fitting the ridge regression onto the smooth basis. In the context of B-splines, we would
have the matrix B as the B-spline matrix, i.e., p B-spline bases, each evaluated at the n design
points x_i. If we wanted f to be a natural cubic spline, the simple ridge penalty is not enough:
in one dimension, we have to use the p x p penalty matrix:

Ω_{jk} = ∫ B_j''(t) B_k''(t) dt.
We would like to avoid using the more complicated Ω since:

• We operate in 3-D. Although there are extensions to higher dimensions (like thin-plate
splines), we would like to use the simpler tensor-product basis, for which a "proper"
Ω is not easy to calculate.

• Simple ridge is much more feasible computationally in our case, as it lets us calculate
fits in the n-dimensional scan space, as shown in Sec. 3.3.5.
A more formal, functional setting for the above problem is as follows: find the function f(·)
that minimizes the penalized regression problem:

min_f Σ_{i=1}^n (y_i − f(t_i))² + λ ∫ (f''(t))² dt.

Remarkably, one can show (e.g., Wahba, 1990; Green and Silverman, 1994) that the minimizing
function is a cubic smoothing spline with knots at the distinct values t_i.
Let us diagonalize the regression equation (3.58) by decomposing B = UDV^T using the
Singular Value Decomposition. Here U and V contain the left and right orthonormal singular vectors,
respectively, and D has the corresponding singular values, d_j, on its diagonal. The problem now
becomes:

θ̂ = argmin_θ ||y − UDθ||² + λ θ^T θ,   with β = Vθ.

We have expressed the simple ridge regression problem (3.58) in the orthonormal basis
U. The penalty associated with basis function j is (λ + d_j²)/d_j², showing that the bases
associated with larger singular values are penalized less.
The question now becomes: when arranged by decreasing singular values, are the orthonormal
basis functions increasing in "complexity", thereby warranting higher penalties? A
partial answer may be obtained by looking at Figure 3.2. Here we have obtained the
orthonormal basis for the B-spline problem in 2 dimensions. It is quite visible that the
"wiggly" bases are penalized more.
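The shrinkage interpretation above can be verified numerically. The sketch below (Python with NumPy; the Gaussian-bump basis is a hypothetical stand-in for an actual B-spline basis) checks that the ridge solution equals componentwise shrinkage in the SVD basis, with shrinkage factor d_j²/(d_j² + λ), so components with small singular values are shrunk hardest:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 20, 1.0

# A smooth "bump" basis standing in for B-splines (illustrative stand-in).
x = np.linspace(0.0, 1.0, n)
centers = np.linspace(0.0, 1.0, p)
B = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.08) ** 2)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Ridge solution in the original basis ...
beta = np.linalg.solve(B.T @ B + lam * np.eye(p), B.T @ y)

# ... equals componentwise shrinkage in the SVD basis B = U D V^T:
U, d, Vt = np.linalg.svd(B, full_matrices=False)
beta_svd = Vt.T @ (d / (d ** 2 + lam) * (U.T @ y))
assert np.allclose(beta, beta_svd, atol=1e-6)

# Shrinkage factor d_j^2 / (d_j^2 + lam): monotone in the singular values,
# so "wiggly" components (small d_j) are penalized more.
shrink = d ** 2 / (d ** 2 + lam)
print(np.round(shrink[:3], 3), np.round(shrink[-3:], 3))
```

Since NumPy returns the singular values in decreasing order, the printed shrinkage factors decrease from near 1 toward 0, mirroring the (λ + d_j²)/d_j² penalty ordering in the text.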
To go back to PDA with basis expansion, we look at the following problem: find the function
β(t) such that:

Σ_{i=1}^n ( y_i − ∫ x_i(t) β(t) dt )² + λ ∫ (β''(t))² dt

is minimized. This is a penalized regression problem, and the solution for β(t) is again a cubic
smoothing spline with knots at the distinct values of t_i. This problem is covered by a special
case of Theorem 1.3.1 in Wahba (1990), the so-called generalized smoothing spline problem.
The details are explored in, for example, Hastie and Tibshirani (1993). The caveat for
us is that one possible way to solve the problem is to expand the coefficient β(·) in a cubic
B-spline basis and apply the second-order penalty matrix Ω (Hastie and Tibshirani, 1993).
Our approach of expanding the Canonical Image in a B-spline basis has the same flavour
and (in the 1-D case) would result in the cubic smoothing spline if the right penalty matrix
Ω were used. We can use the heuristic argument of the previous paragraph to justify the use
of the ridge penalty instead.
Chapter 4
Results with B-Spline and 3-dimensional Wavelets
4.1 Wavelet Basis
As will be apparent from the results presented below, wavelets have shown themselves to
be a possibly more efficient representation of the Canonical Variates in the FOPP neuroimaging
problem than B-splines. Fewer basis components are required to represent the signal, and
their predictive properties are superior to those of B-splines in the two-class setting, although
the results are surprisingly different in the eight-class problem. In this section we will
introduce some properties of wavelets and provide partial justification for the choices we
made when using wavelets as Canonical Variate bases.
Below we introduce wavelet bases, multiresolution analysis and the wavelet
transform. This general discussion was adapted from three excellent books on wavelets:
Ogden (1997) This is the first, and a very readable, book on statistical analysis using
wavelets.
Vidakovic (1999) This is a more comprehensive book on wavelets, also designed for statisticians.
It offers a more complete theoretical framework and discusses a wider spectrum of
auxiliary wavelet topics.
Burrus et al. (1998) This is a book written for engineers in signal-processing fields. It
offers an excellent discussion of wavelet filters and their implementation, together
with a good introduction to signal processing and filter-bank theory.
4.1.1 Wavelets: Introduction
By a wavelet one usually means any family of functions that is composed from a single
mother wavelet function, ψ(x). (Please note that we will use customary wavelet notation
here, which may conflict with previously introduced symbols.) It is assumed that the
mother wavelet satisfies an admissibility condition:

∫ |Ψ(ω)|² / |ω| dω < ∞,   (4.1)

where Ψ(ω) is the Fourier transform of the mother wavelet. Loosely speaking, condition (4.1)
says that the wavelet's power must be concentrated in higher frequencies. Since the wavelet
is in L², this effectively means that the wavelet must be a band-limited function. One easy
consequence of the admissibility condition is that Ψ(0) = 0, which in turn implies that:

∫ ψ(x) dx = 0;   (4.2)

that is, the mother wavelet must average to zero. It is also customary to normalize the
mother wavelet to unit norm:

∫ ψ(x)² dx = 1.

Given a mother wavelet, one constructs the wavelet basis by dyadic dilations and integer
translations:

ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k),   j, k ∈ {0, ±1, ±2, ...}.

The scaling factor 2^{j/2} keeps the unit norm. The translation index k is easily understood
for j = 0: it generates a sequence of mother wavelet translates, each moved to the right or
left by an integer. The dilation index j rescales the x-axis, compressing or expanding the
mother wavelet; it does so in units of powers of 2.
Under mild conditions, the wavelet system is orthogonal:

⟨ψ_{j,k}, ψ_{j',k'}⟩ = δ_{jj'} δ_{kk'},
using the Kronecker δ symbol. There exist non-orthogonal wavelet systems; they are then
usually bi-orthogonal. Bi-orthogonal systems have two sets of wavelets: one to project a
function onto, to obtain the wavelet coefficients (the analysis wavelets), and one to reconstruct
a function from, using the wavelet coefficients (the synthesis wavelets). Bi-orthogonal
systems maintain cross-orthogonality between the analysis and synthesis wavelets. The
usual orthogonal system is a special case where the analysis and synthesis systems are the
same. We will only concern ourselves with the orthogonal wavelet systems, as they have
statistical properties which are better understood.
4.1.2 Orthogonal Wavelet Basis and Multiresolution Analysis
We hinted above at the fact that wavelets constitute an orthonormal basis for some
functional spaces. Of these, the most important is L²(R), the space of all functions f(·)
with a finite L² norm:

∫ f(x)² dx < ∞.

Another well-known basis for L²(R) is the Fourier basis.
Given a wavelet basis for L², one can decompose any function f(x) ∈ L² into its wavelet
coefficients:

d_{j,k} = ⟨f, ψ_{j,k}⟩ = ∫ f(x) ψ_{j,k}(x) dx,   (4.7)

and this mapping is one-to-one, i.e. the decomposition is reversible:

f(x) = Σ_{j,k} d_{j,k} ψ_{j,k}(x).   (4.8)

Equation 4.7 is called an analysis equation and 4.8, a synthesis equation.
One special property that distinguishes wavelets from other bases is the Multi-Resolution
Analysis (MRA) property, which we will now discuss. Let us imagine that the (infinite-dimensional)
space L²(R) has the following decomposition:

⋯ ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ ⋯,

such that the closure of ∪_j V_j is L²(R), ∩_j V_j = {0}, and f(x) ∈ V_j ⟺ f(2x) ∈ V_{j+1}.
Here, the V_j are subspaces of L² which contain functions of increasing detail, as we will see.
The closure of their union is the whole L², but their intersection
is null. The last condition says that for each function f(x) that is in V_j there is a unique
function f₂(x) = f(2x) in V_{j+1} that changes twice as fast, or with twice as much detail.
We suppose that there is an orthonormal basis for V_0 consisting of integer translations
of a scaling function, or a father wavelet, φ(x):

f(x) ∈ V_0 ⟺ f(x) = Σ_k ⟨f(x), φ(x − k)⟩ φ(x − k).   (4.11)

The dyadic dilations, {2^{j/2} φ(2^j x − k)}_{k ∈ Z}, of the basis for V_0 become the basis for V_j. Further,
since the subspaces contain each other, we decompose the subspace V_{j+1}:

V_{j+1} = V_j ⊕ W_j,   (4.12)

i.e., into the direct sum of the previous-level subspace, V_j, and the detail space, W_j. There
exists a canonical ortho-basis for W_j, composed of integer translations of the dilated mother
wavelet function, ψ(·). Since:

L²(R) = V_{j₀} ⊕ W_{j₀} ⊕ W_{j₀+1} ⊕ ⋯,

it is not surprising that the wavelet system associated with a particular MRA constitutes
an ortho-basis for L²(R).
The most famous, the simplest, and the least practically usable wavelet system is the Haar
basis. The Haar scaling function is:

φ(x) = 1 for x ∈ [0, 1), and 0 otherwise;

that is, a unity-constant function between 0 and 1. It is intuitively clear that by scaling
this function down to cover smaller and smaller intervals, and translating it, one can, in
the limit, represent any reasonable (e.g., one in L²(R)) function.
The mother wavelet associated with the Haar scaling function is:

ψ(x) = 1 for x ∈ [0, 1/2), −1 for x ∈ [1/2, 1), and 0 otherwise.

This is a Haar basis for W_0, the zero-level detail subspace. Projecting f(x) onto the integer
translates of ψ(·), one obtains the local difference between f(·) represented with φ_{1,k}(·)
and with φ_{0,k}(·), i.e., between f(·) represented with first-order detail and f(·) represented
with zero-order detail. It is easy to check that different translates and dilates of ψ(·) are
orthonormal; moreover, ψ_{j₀,k} are orthogonal to V_j for any k and j ≤ j₀, as we would
expect from relation (4.12).
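These properties of the Haar system can be checked with a small numerical sketch (Python with NumPy; grid-based Riemann sums are an approximation to the L² inner products):

```python
import numpy as np

def haar_psi(j, k, x):
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    t = (2.0 ** j) * x - k
    return 2.0 ** (j / 2.0) * (((t >= 0) & (t < 0.5)).astype(float)
                               - ((t >= 0.5) & (t < 1.0)).astype(float))

# Approximate L2 inner products by a Riemann sum on a fine grid.
x = np.linspace(-2.0, 3.0, 500_001)
dx = x[1] - x[0]
ip = lambda f, g: float(np.sum(f * g) * dx)

psi = haar_psi(0, 0, x)
print(round(ip(psi, psi), 3))                  # unit norm: 1.0
print(round(ip(psi, haar_psi(0, 1, x)), 3))    # orthogonal translate: 0.0
print(round(ip(psi, haar_psi(1, 0, x)), 3))    # orthogonal dilate: 0.0
print(round(float(np.sum(psi) * dx), 3))       # zero mean (Eq. 4.2): 0.0
```

The same check applied to any pair (j, k), (j', k') with (j, k) ≠ (j', k') returns approximately zero, in line with the orthogonality condition stated above.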
Figure 4.1 shows some examples of wavelet functions. At each level there are twice
as many wavelets as on the previous level, but each of them gets more "squeezed", which
enables it to uncover more detail in the signal. Another feature of many wavelets is their
"spikiness", which is a result of requiring compact support and orthogonality. Higher-order
wavelets get visibly smoother at the expense of longer filter lengths (next section).
Figure 4.1: Haar (left) and Daubechies Symmlet wavelet functions. The detail level grows from bottom up, and only some integer translates are drawn at each level.
4.1.3 Discrete Wavelet Transform
In practice, one is interested in obtaining the wavelet representation of a function, just as
we are interested in Fourier (or frequency-domain) representations computed with Fourier
Transforms. Given a function f(x), one wants to calculate the lowest-level scaling coefficients,
c_{j₀,k}, and the detail coefficients, d_{j,k}, where:

c_{j₀,k} = ⟨f, φ_{j₀,k}⟩   and   d_{j,k} = ⟨f, ψ_{j,k}⟩,   j ≥ j₀.

The arbitrary level j₀, for which we calculate the scaling coefficients, represents the coarsest
scale we are interested in for the function f(·) under study. In practice, one does not have a
function but a sample of it obtained with a given sampling rate (we usually assume that
the function f(·) has been sampled uniformly over the x-axis). We then assume that the
function f(·) is piecewise constant over the sampling intervals: f(x) = f_i for x ∈ Δ_i. A
given sampling rate determines the highest detail level, J, we can possibly calculate. We
can then approximately assume that what we have is the projection of the function f(·) onto
V_J, or that:

c_{J,k} ≈ 2^{−J/2} f_k.   (4.17)
This is the starting point for the Discrete Wavelet Transform (DWT), which is used
extensively in practice. What remains to be shown is how to obtain the detail and coarser-level
scaling coefficients c_{j,k}, j = j₀, ..., J. Since V_0 ⊂ V_1, we can represent the zero-level
scaling function using the first-level ones:

φ(x) = Σ_{k ∈ Z} h_k √2 φ(2x − k).   (4.18)

This is the so-called scaling equation and is fundamental in constructing wavelets. The filter
{h_k} is of finite length when the support of φ(x) is finite, and is then an example of a
Finite Impulse Response filter. Similarly, since W_0 ⊂ V_1, we have:

ψ(x) = Σ_{k ∈ Z} g_k √2 φ(2x − k).   (4.19)

An important theorem, implied by the orthogonality of wavelet and scaling functions at
the same level, states that:

g_k = (−1)^k h_{1−k}.

Now, the scaled and translated version of the scaling equation is:

φ_{j,k}(x) = Σ_m h_{m−2k} φ_{j+1,m}(x).

A similar relationship holds for the wavelet coefficients' equation (4.19). By writing down
the definition of the wavelet and scaling coefficients, one can use the above results to obtain
the two fundamental equations of the DWT:

c_{j,k} = Σ_m h_{m−2k} c_{j+1,m},   (4.23)
d_{j,k} = Σ_m g_{m−2k} c_{j+1,m}.   (4.24)

These equations show how to calculate the lower-level scaling and wavelet coefficients from
the higher-level scaling ones. Given the starting values c_{J,k} from (4.17), one proceeds to calculate
the wavelet coefficients for levels J, J−1, ..., j₀ and the final scaling coefficients for level j₀.
There are similar equations for going up the scale: calculating c_{j+1,m} from pairs of coefficients
c_{j,k} and d_{j,k}. These equations are used in the synthesis stage, while equations (4.23,
4.24) are used in the analysis stage.
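One level of the analysis and synthesis recursions can be sketched for the Haar filter pair, whose two taps make the pairwise structure of the sums explicit (a Python/NumPy sketch; function names are illustrative):

```python
import numpy as np

# Haar filter pair; the high-pass follows from the low-pass via g_k = (-1)^k h_{1-k}.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # scaling (low-pass) filter
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # wavelet (high-pass) filter

def analysis_step(c):
    """One analysis level: c_{j,k} = sum_m h_{m-2k} c_{j+1,m} and
    d_{j,k} = sum_m g_{m-2k} c_{j+1,m}; with Haar filters each output
    coefficient depends on one non-overlapping input pair (m = 2k, 2k+1)."""
    pairs = np.asarray(c, dtype=float).reshape(-1, 2)
    return pairs @ h, pairs @ g

def synthesis_step(c_lo, d_hi):
    """Inverse step: rebuild c_{j+1,m} from coarse and detail coefficients."""
    return (np.outer(c_lo, h) + np.outer(d_hi, g)).reshape(-1)

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
c, d = analysis_step(signal)
print(np.round(c, 3))   # coarse (scaling) coefficients: pairwise sums / sqrt(2)
print(np.round(d, 3))   # detail (wavelet) coefficients: pairwise diffs / sqrt(2)
assert np.allclose(synthesis_step(c, d), signal)   # perfect reconstruction
```

Iterating `analysis_step` on the coarse output walks down the levels J, J−1, ..., j₀ exactly as described in the text; longer filters would additionally require a boundary convention at the edges.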
4.1.4 3D Wavelet Basis
The discussion so far was centered on the one-dimensional wavelet basis. In order to
use wavelets for analyzing PET and fMRI scans we need to construct a 3D basis. The
method of choice is, as in the B-spline case, the tensor product of one-dimensional wavelet functions.
One has to be careful, however, to obtain an orthogonal basis with an appropriate MRA
decomposition. To this end one first generalizes the MRA. Let D denote the dimension of the
domain, R^D. Then we have D univariate MRA decompositions:

⋯ ⊂ V_j^d ⊂ V_{j+1}^d ⊂ ⋯,

for d = 1, 2, ..., D. We are interested in the D-dimensional MRA:

V_j = V_j^1 ⊗ V_j^2 ⊗ ⋯ ⊗ V_j^D,

i.e., each D-variate MRA subspace is a tensor product of the corresponding univariate ones.
By a tensor-product space we mean that its basis consists of tensor products of the univariate
scaling functions that form the bases of the V_j^d:

φ_{j,k}(x) = Π_{d=1}^D φ_{j,k_d}(x_d),

for any k ∈ Z^D. To obtain the multidimensional wavelets we start with expressing V_{j+1}
as a direct sum of the previous level V_j and a detail space W_j. To be concrete, let us
take D = 3; then:

V_{j+1} = (V_j^1 ⊕ W_j^1) ⊗ (V_j^2 ⊕ W_j^2) ⊗ (V_j^3 ⊕ W_j^3) = V_j ⊕ ⊕_{σ=1}^7 W_j^σ.   (4.29)

The detail spaces W_j^σ will emphasize local features in various canonical directions. If we
imagine the directions in the space ordered as running horizontally, vertically and "into the
page", then the spaces W_j^1, W_j^2, W_j^4 will be "turned on" by features in the "depth", vertical
and horizontal directions, and the remaining 4 will pick up various diagonal directions. The
space W_j^σ is spanned by 3-D integer translations of:

ψ^σ(x) = Π_{d=1}^3 ξ^{δ_d(σ)}(x_d),   where ξ^0 = φ, ξ^1 = ψ,

and δ_d(σ) denotes the d-th digit in the binary expansion of σ.
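A one-level separable 3-D transform can be sketched by applying a 1-D analysis step along each axis in turn; the sub-band index σ then encodes, bit by bit, which axes received the wavelet filter (a Python/NumPy sketch with Haar filters; function names are illustrative):

```python
import numpy as np

def haar_step_axis(a, axis):
    """One Haar analysis step along a single axis: returns (low, high) halves."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def haar3d_level(vol):
    """One level of the separable 3-D transform: 8 sub-bands indexed by
    sigma in {0,...,7}; the d-th bit of sigma records whether the wavelet (1)
    or scaling (0) filter was applied along axis d."""
    bands = {0: np.asarray(vol, dtype=float)}
    for axis in range(3):
        new = {}
        for sigma, b in bands.items():
            lo, hi = haar_step_axis(b, axis)
            new[sigma] = lo                     # scaling filter on this axis
            new[sigma | (1 << axis)] = hi       # wavelet filter on this axis
        bands = new
    return bands   # bands[0] ~ V_j; bands[1..7] ~ the detail spaces W_j^sigma

vol = np.random.default_rng(1).standard_normal((8, 8, 8))
bands = haar3d_level(vol)
# Orthogonal transform: total energy is preserved across the 8 sub-bands.
energy = sum(np.sum(b ** 2) for b in bands.values())
assert np.isclose(energy, np.sum(vol ** 2))
print(sorted(bands.keys()), bands[0].shape)
```

Recursing on `bands[0]` produces the coarser levels, mirroring the 1-D pyramid; the seven detail blocks at each level correspond to the W_j^σ spaces described above.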
4.1.5 Wavelet Thresholding
Donoho and Johnstone (1994, 1995) have developed thresholding rules for denoising signals
using wavelets. These results are optimal in the minimax, or "worst-case scenario", sense.
Specifically, one assumes a typical sequence of noisy observations:

y_i = f(t_i) + ε_i,

where f(·) belongs to some functional class and the ε_i are an i.i.d. sequence of standard Gaussian
variables. Donoho and Johnstone develop a series of minimax estimators of f(·) using
hard and soft wavelet-coefficient thresholding. We will only describe here the hard universal
thresholding rule, which they termed VisuShrink.
The DWT is an orthogonal transform which may be represented by a matrix multiplication.
If one denotes the sequence of wavelet coefficients of f(·) by θ, then the DWT of the noisy
signal y(t_i) is:

w = Wy = θ + ε̃,

where the transformed noise coefficients ε̃ are still i.i.d. standard normal because of the
orthogonality of W. The idea of thresholding is to keep only those coefficients that carry
the signal, i.e. that are "big enough". The idea hinges on the fact that for wide classes
of signals wavelets provide a sparse representation; that is, the wavelet expansions of these
signals have many zero or near-zero coefficients. The question is how to determine which
wavelet coefficients of a noisy observation do not carry any signal. Many solutions have
been proposed, but Donoho and Johnstone (1994) proved that a particularly simple rule
has near-optimal MSE in the minimax sense. This rule is: replace w_{j,k} by:

ŵ_{j,k} = w_{j,k} · 1(|w_{j,k}| > √(2 log n)).

In practice the threshold becomes σ̂ √(2 log n), where σ̂ is an estimate of the homoscedastic
noise level. Donoho and Johnstone (1995) propose to use the median of the finest-level
coefficients divided by 0.6745 as an estimate of σ; the constant is derived from the Gaussian
case. Others have proposed a similar Median Absolute Deviation of all wavelet coefficients,
in place of the simple median at the finest level. Everyone agrees that because of the sparsity
properties of wavelets a robust estimator of variance should be used.
The above results work for the i.i.d. case. Remarkably, Johnstone and Silverman (1997)
show that if the noise is correlated, all results of Donoho and Johnstone are still valid,
provided the thresholding is done separately on each level. That is, both the threshold and
the variance are estimated separately for each detail level. In the case of images, we do
that separately for each combination of level and direction, i.e. separately for each W_j^σ in
Eq. 4.29.
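The hard universal rule with the MAD-based noise estimate can be sketched as follows (Python with NumPy; the coefficient layout, and the use of the second half of the vector as stand-in "finest-level" coefficients, are illustrative assumptions):

```python
import numpy as np

def visushrink_hard(w, n, finest):
    """Hard universal thresholding: keep w_{j,k} only where
    |w_{j,k}| > sigma_hat * sqrt(2 log n), with sigma_hat the median
    absolute finest-level coefficient divided by 0.6745."""
    sigma_hat = np.median(np.abs(finest)) / 0.6745
    thr = sigma_hat * np.sqrt(2.0 * np.log(n))
    return np.where(np.abs(w) > thr, w, 0.0), thr

rng = np.random.default_rng(42)
n = 1024
theta = np.zeros(n)
theta[[10, 50, 300]] = [8.0, -9.0, 7.5]        # a sparse "signal" spectrum
w = theta + rng.standard_normal(n)             # noisy wavelet coefficients
w_hat, thr = visushrink_hard(w, n, finest=w[n // 2:])
print(np.flatnonzero(w_hat), round(float(thr), 2))
```

The three planted spikes survive while almost all pure-noise coefficients are zeroed out. For correlated noise, per the Johnstone and Silverman result above, the same function would simply be applied separately to each level (and, for images, to each level-direction block).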
4.2 Finger Opposition Data: Methods
4.2.1 Data and the Standard t-Test Analysis
We apply Penalized Discriminant Analysis using a simple ridge penalty and a tensor-product
B-spline (TPS) basis to the FOPP data described in Section 2.4.1.
After processing by the 3 x 3 x 3 box-car smoother and scan-mean normalization
(i.e., dividing each voxel by the mean of all voxels within the brain mask for that scan), a
pooled standard-deviation estimate was calculated (Worsley et al., 1992) and an activation
t-test value obtained for each voxel, as described in Strother et al. (1995a). Note that
such a pooled t-test activation image has been shown to outperform single-voxel t-test
images with alternate preprocessing schemes (Strother et al., 1998a), making the pooled
activation image a good reference pattern for the various PDA canonical images presented
in this paper.
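The two preprocessing steps described above can be sketched directly (Python with NumPy; the plain triple loop over offsets is an illustrative stand-in for whatever smoothing routine was actually used):

```python
import numpy as np

def boxcar3(vol):
    """3 x 3 x 3 box-car (moving-average) smoother with edge replication."""
    p = np.pad(np.asarray(vol, dtype=float), 1, mode='edge')
    nx, ny, nz = vol.shape
    out = np.zeros(vol.shape)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += p[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out / 27.0

def mean_normalize(scan, brain_mask):
    """Scan-mean normalization: divide every voxel by the mean of the
    voxels inside the brain mask for that scan."""
    return scan / scan[brain_mask].mean()

rng = np.random.default_rng(0)
scan = rng.uniform(100.0, 200.0, size=(16, 16, 16))   # toy scan volume
mask = np.ones(scan.shape, dtype=bool)                # toy brain mask

smoothed = boxcar3(scan)
normalized = mean_normalize(smoothed, mask)
print(round(float(normalized[mask].mean()), 6))       # 1.0 by construction
```

After normalization the within-mask mean of every scan is exactly 1, which is what removes the global-intensity source of variance discussed later in Section 4.3.1.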
4.2.2 Two-way Classification with TPS: Internal Optimization and Scan Normalization
Each scan was assigned a label from the {Active, Baseline} set, according to its experimental
condition. The tensor-product (cubic) B-spline (TPS) basis, composed of 25 B-spline bases
in each dimension (defined as B25 in the next section), was set up in the smallest 3D box
that circumscribed the logical AND of all subject masks, for raw (i.e. unsmoothed) scans
with and without scan-mean normalization.
Five-fold cross-validation and the leave-one-out bootstrap were used to estimate PE measures
for a grid of values of the ridge tuning parameter, λ. The sampling units for both
resampling methods were the subjects; thus all scans from a subject were included if a
subject was sampled. Fifty bootstrap samples were used, for each value of λ, to derive a
.632+ bootstrap estimate of both the MC rate and SPE. We call this process of searching for
the optimal value of the ridge parameter an Internal Optimization.
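The subject-level resampling described above can be sketched as follows (Python with NumPy; the names and the 10-subjects-by-8-scans layout are illustrative):

```python
import numpy as np

def subject_bootstrap_splits(subject_ids, n_boot=50, seed=0):
    """Leave-one-out bootstrap with subjects as the sampling unit: each
    replicate trains on all scans of subjects drawn with replacement and
    tests on all scans of the subjects left out of the draw."""
    rng = np.random.default_rng(seed)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    for _ in range(n_boot):
        drawn = rng.choice(subjects, size=len(subjects), replace=True)
        in_bag = np.isin(subject_ids, drawn)
        if not in_bag.all():                    # skip rare all-in-bag draws
            yield np.flatnonzero(in_bag), np.flatnonzero(~in_bag)

ids = np.repeat(np.arange(10), 8)               # e.g. 10 subjects x 8 scans
for train, test in subject_bootstrap_splits(ids, n_boot=3):
    assert not set(ids[train]) & set(ids[test])  # no subject on both sides
    print(len(train), len(test))
```

Keeping whole subjects together on one side of each split is what prevents within-subject scan correlation from leaking between training and test sets; the out-of-bag error from these replicates is the Err^(1) ingredient of the .632+ estimate.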
4.2.3 Two-Way Classification: External Optimization
The scans were assigned class labels, as before. Different parameterizations of the data
were used, which we denote by B15, B20, B25, B30, B35, Braw, BSmooth3, BSmooth5.
We will use the word "basis" interchangeably with "representation" in reference to any of
these. B?? are tensor-product spline projected images; the two digits denote the number
of B-spline bases in each dimension. That results in a wide range of input dimensionality
(i.e., total number of basis functions with support within the AND mask): between 2,500
and 25,000. Braw denotes the un-projected data, with voxels within the AND of all masks.
This results in 28,500 voxels used. BSmoothk is similar, but from data that has first
been smoothed with a k x k x k box-car (i.e. 3D square kernel) smoother; typically, k = 3
would be used (e.g., Strother et al., 1992).
PDA was applied to the mean-normalized data for different basis choices. For each
basis, the bootstrap analysis over the same grid of λ was done, with B = 50 bootstrap
samples, as described above. A number of bootstrap samples of 50 is admittedly low, but
seems (cf. Fig. 4.2) a good compromise given the huge computational burden associated with
resampling. We compare the error curves (for both the MC rate and SPE) and the position of
the minima on the SPE/EDF plane.
4.2.4 Deriving Time and Spatial Projections
We can use the same model to obtain the activation maps of the within-subject temporal
changes throughout the experiment. This is done by defining an 8-way classification
problem, where the classes denote the order in which the scan was taken (for subjects with 10 scans,
the 9th scan was pooled with the 7th, and the 10th with the 8th). No information about
the experimental state or about the temporal ordering is available to the model. Out of the 7
canonical variates that result, we look for those that represent time and state changes. We
project the labeled data and class means onto the canonical variates and choose the two that
are most appropriate.
4.3 Finger Opposition Data: Results
4.3.1 Two-way Classification and Internal Optimization
Figure 4.2 shows the effect of the ridge tuning parameter (expressed as EDF) on the prediction
error. Both CV and .632+ bootstrap estimates of the PE measures are shown. We
note that the MC rate has an erratic behaviour due to its discontinuous definition. Examining
the SPE curves on the top-left and the bottom panel, we note that the minima
for CV estimates (thin lines) occur earlier and rise faster with increasing EDF than those
for the .632+ bootstrap (thick lines). This is likely because CV estimates are not corrected for
the smaller training-set size. This would cause the increased variability of the canonical
variates that comes with higher EDF to have a more pronounced and quicker effect than
with the full training set.
In the top-right panel we see that CV estimates exhibit higher variance as compared
to their .632+ bootstrap counterparts. To obtain this plot we did not set the random seed
to the same value for every λ in the resampling exercise, as was done for every other plot
in the paper. This allows us to exhibit the between-sample variability of both resampling
methods that contributes to the variance of the estimate of the prediction error. Another
problem with CV is the open question of how many folds to use. At one extreme we have
n-fold cross-validation, which would have negligible training-set bias but high variance of
the PE estimates in each fold. At the other there is two-fold cross-validation, which
has lower variance for each fold but high bias due to the much smaller training-set size, and a
potentially large variance contribution that comes from the many ways to divide the training
Figure 4.2: Internal optimization of the ridge tuning parameter (Sec. 4.3.1), expressed as a number of Effective Degrees of Freedom, for Penalized Discriminant Analysis in the two-class problem. This example uses the data projected on the B25 tensor-product B-spline basis set. The curves show the change of Prediction Error (both Misclassification (MC) rate and Squared Prediction Error (SPE)) as a function of Effective Degrees of Freedom (EDF). Both cross-validation (CV) and .632 bootstrap estimates are exhibited. The top-left panel shows changes for un-normalized data, while the bottom panel deals with mean-normalized data. The top-right panel shows CV and .632 bootstrap SPE curves for mean-normalized data which were obtained with a different random seed for every EDF; these portray the greater variability of CV estimates. Thin lines: CV estimates; thick lines: .632+ bootstrap estimates; solid lines: estimates of SPE; dashed lines: estimates of MC rate.
set in two. The leave-one-out bootstrap may be seen (Efron, 1983) as approximating two-fold
cross-validation but with many two-fold splits, and the .632+ correction deals with the bias.
We have thus chosen the bootstrap estimator for reporting the rest of the results in this
thesis.
The bottom panel shows the curves for the mean-normalized data, where each scan has
been divided by its brain-voxel mean. The errors are lower, indicating that a large source
of variance has been removed. The MC rate (as estimated by the bootstrap) drops to
just below 13%, which is a large improvement over the no-information rate of 50%. More
noticeably, the minimum SPE is around .104, as compared to 0.25 for the no-information value.
The SPE minimum is at around 50 EDF for the normalized data, and around 40 EDF for
un-normalized. This suggests that, after optimizing for the degree of smoothness, the model
is able to extract more information from normalized data. This observation is consistent
with our intuition: mean-normalization removes a large source of variance, increasing the
signal-to-noise ratio and allowing more generalizable structure to be found in the data.
4.3.2 External Optimization with Different Bases
We have investigated the influence of image representation on the resulting canonical images
and the prediction error. Figure 4.3 shows PE curves (SPE: top panel, and MC rate:
bottom panel) across EDF for all representations. We will concentrate on the SPE curves
first. All TPS-projected data behave very similarly, except for the B15 basis. The B15
representation is likely too coarse to capture all but the major, spatially extensive components
of the underlying activation pattern. Its minimum occurs quite early on the EDF
scale (EDF = 36.6), and its SPE rises sharply as EDF increases, indicating that there is
less useful structure to be extracted in this representation. The Braw SPE curve shows
the worst performance for small EDF, and for larger EDF increases faster than all but the
B15-projected basis. This representation seems most sensitive to the choice of the tuning
Figure 4.3: Prediction Error (PE) curves as a function of the effective degrees of freedom (EDF) for all tensor-product B-spline (B15-B35) bases and unprojected raw (Braw) and smoothed (BSmooth3,5) scans. The upper panel shows Squared Prediction Error (SPE = (1 − p_c)², where p_c is the estimated posterior probability for the true class), and the lower panel depicts Misclassification rate (MC rate) as a percentage of the total number of scans misclassified. The curves from un-projected data are shown with larger-width lines. The markers show the minima for each curve. (The minimum for the B30 curve is not shown as it occurred beyond the figure frame.)
parameter. The other two unprojected representations behave similarly to the projected
ones, except that the two minima are farther apart and their SPEs are larger than those
of the projected bases for most of the EDF range.
Examining the SPE plots at around 20 EDF, we note that the SPE curves of the projected bases
and Braw arrange themselves in order of increasing smoothness and decreasing SPE. This
places BSmooth3 between B30 and B35, and BSmooth5 between B20 and B25. At these
low degrees of freedom, the canonical image is very flat, except for some extended bumps
occurring at the most predictive spatial locations. The canonical images are then driven
by large features, rather than by small regional changes, and smoother representations are
understandably better in those circumstances. Therefore, it makes sense that the curves
order themselves according to the degree of smoothness of the representation.
We also examined the placement of the minima for each representation in the SPE-EDF
plane. The unprojected bases exhibit a lowering of the prediction error with rising EDF
as one moves from the smoothest (BSmooth5) to the roughest (Braw) representation. The
projected bases exhibit a similar pattern, but the trend seems to be heading for larger SPE
with larger EDF for "rough" representations, while all along maintaining a lower SPE
than the unprojected bases. One explanation is that the projected bases are somewhat more
efficient for this problem, allowing the canonical images to contain more structure (higher
EDF) while controlling the SPE level. Their utility is, however, limited, and the SPE starts
to rise with very large numbers of basis functions and the associated higher EDF. Also note
that the EDF-at-min-SPE is again ordered with respect to the smoothness of the representation,
for Braw (28,500 voxels) and the projected bases (≤ 25,000 basis functions), and that again the
smoothed, unprojected representations fall somewhere between B15 and B35.
The 1IC rate curves in the bottorn panel of Fig. 4.3 again demonstrate the markedly
diffcrent behaviour of the B15 representation. Even t hough this time it achieves seconct-
lowest crror a t its minimum, it performs significantly worse for the higher EDF values.
as bcforc. Escept for B15. al1 curves from un-projected data have worse 11C rates than
TPS projected data. for niost of the EDF spectrum. Braw eshibits performance that is
visibly worse than the other unprojected cun-es. reversing the ranking of the minima of the
SPE plots. -411 curves. besides B15. flatten-out beyorid EDF= 70. lié also note. that the
EDF-a-min-SIC are much higher. but less well determined than their SPE counterparts.
In both cases, the SPE measure of PE seems much more informative than the MC rate
curves. The minima in the SPE curves are more pronounced, and the curves themselves
smoother. We also believe that a measure which takes into account how much the
model's predictions are right or wrong is more informative for our goal of assessing the
model generalizability. The MC rate plots are highly variable, due to the discontinuous
(0/1) error structure. That they flatten out for higher EDF, with their minima occurring
much further to the right than in the SPE plots, is likely due to the fact that the two-way
classification problem is driven mostly by a few strong regions of activation which are quite
clear indicators of the active state: mostly the right sensory motor area and left cerebellum
(see row C, Fig. 4.4). Once enough weight is put on these regions (achieved with high
EDF), the classifier can perform well, on average, regardless of the rest of the image. There
will be some scans, however, where the activation maps needed to perform classification
are different. In those cases canonical images composed using high EDF will perform very
poorly. While such cases will be penalized relatively mildly in terms of misclassification
(they will score a penalty of 1 regardless of how "badly" they misclassify), their posterior
probabilities will be very much off, and their SPE penalties much stiffer.

We prefer the SPE measure of the prediction error. The MC rate is, however, better
established in testing pattern recognition models and has a somewhat more direct inter-
pretation. We report both while concentrating our interpretation on SPE-based results.
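The contrast between the two error measures can be made concrete with a toy calculation. This is a hypothetical sketch, not the thesis code; here SPE is taken as the mean squared difference between the 0/1 class-indicator vector and the estimated posterior probabilities, which matches the behaviour described above:

```python
import numpy as np

def squared_prediction_error(post, labels):
    """Mean squared difference between the 0/1 class-indicator vector
    and the model's posterior class probabilities."""
    n, J = post.shape
    ind = np.zeros((n, J))
    ind[np.arange(n), labels] = 1.0
    return float(np.mean(np.sum((ind - post) ** 2, axis=1)))

def misclassification_rate(post, labels):
    """Discontinuous 0/1 error: a flat penalty of 1 for any wrong argmax."""
    return float(np.mean(np.argmax(post, axis=1) != labels))

# Two scans, both misclassified, but with very different confidence:
post = np.array([[0.45, 0.55],    # mildly wrong posterior
                 [0.01, 0.99]])   # confidently wrong posterior
labels = np.array([0, 0])
print(misclassification_rate(post, labels))    # 1.0: both errors look alike
print(squared_prediction_error(post, labels))  # about 1.2826: the confident
                                               # error is punished much harder
```

The 0/1 rate cannot distinguish a near-miss from a confidently wrong prediction, while the squared penalty on the posteriors can, which is why the SPE curves are smoother and their minima better determined.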
In all that follows we will use mean-normalized data and .632+ Bootstrap estimates.

However, as evidenced by Fig. 4.4, the SPE differences at the minima are only part
Figure 4.4: Functionally activated [15O]water PET voxels above the 93.4 percentile (white overlay) interleaved with registered grayscale MRI brain slices for Penalized Discriminant Analysis of: (A16-A40) unprojected raw data presmoothed with a 3 x 3 x 3 voxel boxcar kernel (BSmooth3); (B16-B40) unprojected raw data without presmoothing (Braw); (C16-C40) tensor product spline basis with 15 spline bases in each spatial dimension (B15); (D16-D40) tensor product spline basis with 35 spline bases in each spatial dimension (B35) (activation images A to D have decreasing squared prediction error (SPE) values as illustrated in Figure 4.3); (E16-E40) is a pooled standard deviation t-test image of scans presmoothed with a 3 x 3 x 3 voxel boxcar kernel; the Bonferroni t-value (t=4.65) at the 93.4 percentile was used to define a conservative activation threshold with which to compare activation image peaks (white overlay) for a fixed number of voxels. PET and MRI slices are 128 x 128 with 3.4 mm pixels with center-to-center slice spacing of 3.4 mm (i.e., slices A23 and A26 are separated by (26 - 23) x 3.4 = 10.2 mm) and are parallel to the AC-PC plane, which coincides with slice 24. Image left = brain left.
of the story. This figure shows the top 6.6 percent activation in nine chosen slices from
the canonical image obtained using: A - BSmooth3, B - Braw, C - B15, D - B35
representations, and E - a t-test image. The 6.6% threshold is defined by the Bonferroni
t-value of image E and was selected to compare an equal number of voxels across the five
activation images. The ridge parameter, lambda, was set to optimize the SPE for each basis.
The first four PDA images are arranged in order of decreasing SPE. Of particular interest
are the Braw and B35 images. These two representations are somewhat analogous to each
other: Braw and B35 contain the most structure in their unprojected and projected
groups, respectively; their respective minimal SPEs are not far from the group minima;
and they both represent the roughest representations in each group. The figure shows
that the B35 representation results in a less noisy and fragmented image which is more
visually appealing. BSmooth3 regains more smoothness as compared to Braw, making it
more appealing, but it does so at the expense of predictive performance. We also note
significant, interesting differences on the t-test image, which seems to be missing potentially
important structures: the contralateral midbrain tegmentum in slice E23, which contains key
parts of the motor system such as the substantia nigra; the ipsilateral auditory area in slice
E30; and, while ipsilateral parietal and premotor regions are seen on slices A36 and B36,
D36 shows only the parietal area and E36 the premotor area.
Figure 4.5 shows the scatter plot (one point per Talairach voxel) comparing the t-test
and the B35 images. There is a non-linear trend upwards for the significant voxels in the
upper left part of the plot, which shows that the mutually significant activation regions are
more pronounced in the B35 canonical image. The circle shows a small cluster of voxels in
the primary visual cortex that have been elevated by the PDA from the 20th percentile in
the t-test image to the 90th percentile in the B35 image; these voxels are visible in the
primary visual cortex in slices A26, B26 and D26 in Fig. 4.4.

In this finger-opposition data set a single-voxel t-test using pooled standard deviation
Figure 4.5: Scatter plot of pairs of activation image values for all Talairach brain voxels (1 point/voxel) for a single-voxel t-test image using a pooled standard deviation estimate, compared to penalized discriminant analysis (PDA) of a tensor product spline representation with 35 B-spline bases along each spatial dimension (B35). The dashed line depicts the principal axis from a principal component analysis of the scatter plot distribution. The circle highlights a group of voxels in the primary visual region that have moved from the 20th percentile in the t-test image to the 90th percentile in the PDA image. The solid vertical line depicts the Bonferroni t-value (t=4.65) at the 93.4 percentile of the t-test distribution of voxel values (white overlay, row E of Figure 4.4) and the solid horizontal line reflects the 93.4 percentile (value=0.0065) for the PDA distribution of voxel values (white overlay, row D of Figure 4.4).
(SD) estimates has been shown to predict population-based activation image patterns
significantly better than single-voxel t-tests using individual voxel SD estimates, and to
perform at the same level as a canonical variate analysis built with the SVD basis (Strother
et al., 1998a). Therefore, it is not surprising that the BSmooth3 and t-test activation
overlays in Figure 4.4 are quite similar, probably reflecting the fact that for this simple
two-state analysis the ridge penalty is relatively large (see Section 3.3.3). However, there
are several important differences between the PDA solutions and the pooled t-test result.
Figure 4.5 demonstrates that the PDA result has nonlinearly enhanced the most significant
voxels relative to the corresponding t-test values and "noise" values around zero. At least
one area (the primary visual voxels within the circle in Figure 4.5) has been completely
reordered relative to other activated regions so that it is now potentially active, while in the
t-test result it was negative and not distinguishable from noise. In addition, in Figure 4.4
there are "activated areas", plausible given this motor task paced by auditory
cues, that appear in slices 19, 33, 30 and 36 of the PDA results in rows A, B and D, but
not in the t-test result. The key point here is not that the PDA results are right and
the t-test wrong, but that the distribution of potentially activated peaks agrees for many
expected areas, and there is a hint that the tunable PDA results may be more sensitive
and able to identify areas that could change the neuroscientific interpretation of the brain
response to the task. In addition, the PDA framework is much more flexible, as shown by
the eight-class results, and internally optimized through prediction error estimates, so that
we do not need to put as much faith in the validity of distributional assumptions within
an inferential testing framework.
The Effective Degrees of Freedom (EDF) provides us with a way of calibrating the
amount of information extracted from the data, analogous to the dimension of the beta space
in linear regression. It seems, from Fig. 4.3, that it is both the SPE and the EDF-at-
min-SPE that are of interest. Ideally, we would like to extract as much structure from the
data as possible (since we believe that the brain function is anything but simple) while
maintaining low levels of SPE and thus high generality of the canonical images. In that sense,
the most appealing results are the Braw and B35 representations, which have the highest EDF-
at-min-SPE and a small difference in their min SPE. Figure 4.4 shows that of those two,
the projected B35 representation is much more appealing visually. By examining the trend
of the minima in Fig. 4.3 we note that likely nothing more can be gained in the unprojected
representations (the smoothing tends to worsen the results and with Braw we are at the
end of the roughness scale), while we can hope to achieve better results by other choices of
bases to project the data onto.
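For a ridge-type smoother, the EDF can be read directly off the singular values of the data matrix. The sketch below uses the generic ridge formula, not the thesis's PDA-specific code, but illustrates the same calibration: lambda = 0 gives the full model dimension, and large lambda shrinks the EDF toward zero:

```python
import numpy as np

def ridge_edf(X, lam):
    """EDF of a ridge smoother: trace of the hat matrix
    X (X^T X + lam I)^{-1} X^T = sum_i d_i^2 / (d_i^2 + lam),
    where the d_i are the singular values of X."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
print(ridge_edf(X, 0.0))     # 20.0: no shrinkage, EDF equals rank(X)
print(ridge_edf(X, 1e6))     # near 0: heavy shrinkage
```

Sweeping lambda traces out exactly the EDF axis used in the SPE curves of Fig. 4.3.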
4.3.3 Applying PDA to an Eight-Class Problem: State and Temporal Changes
The SPE curves (Fig. 4.6) show some differences when compared to the 2-way classification
problem. There is an increase in the EDF-at-min-SPE, hinting at more extracted structure
in this more complicated, multi-class setting. The Braw representation, which again has the
highest EDF, also has higher SPE (0.585) than BSmooth3, the winner in the unprojected
group (SPE=0.550). B35 has the lowest SPE (0.543) and the second lowest EDF-at-min-SPE
(62.1). By chance, one would expect 0.757 for the no-information SPE.
More importantly, Figure 4.7 shows that the model is able to extract two components
which we a priori consider important: state and temporal effects. The class centroids
projected on the first canonical image arrange themselves largely in the temporal order
in each of the two states. There is also a large jump between the first active/baseline scan
(classes 1 and 2) and the second of these scans. This is intuitively appealing, as the subjects
are probably still learning (in the case of the first active scan) or reflecting on the tasks
to be performed and generally adjusting to the situation (in the case of the first baseline
scan). The first canonical image clearly separates the baseline (odd class numbers) and
Figure 4.6: Squared Prediction Error (SPE) curves in an 8-class problem, as a function of Equivalent Degrees of Freedom for 8 penalized discriminant models with different representations: 5 tensor-product B-spline projected datasets with varying numbers of basis functions (B15 to B35) and 3 unprojected raw (Braw) and smoothed (BSmooth3 and BSmooth5) datasets (thicker lines). The markers show the minima for each curve.
Figure 4.7: Projecting the data on the first two canonical images obtained by Penalized Discriminant Analysis of the 8-way classification problem. The points are labeled according to the class (1: first baseline, 2: first active, 3: second baseline, etc.), and the class means are shown in circles. This figure was obtained from the tensor-product B-spline projected data using 35 B-spline bases in each dimension (B35) with lambda corresponding to the minimum Squared Prediction Error (SPE), but is similar across bases and lambdas.
active scans. It is worth repeating that the model had no knowledge about either the states
or the temporal ordering: its task was simply to differentiate among 8 unordered classes.
These two components come up as the first two canonical variates and thus account for
the majority of the between-class variance (62%). The figure also shows: (1) a potential
interaction between the two experimental states and the temporal process, as the means in
the two groups arrange themselves on lines with different slopes, and (2) a first-scan effect
in both states (scans 1 and 2).
By examining the SPE curves we note that the EDF-at-min-SPE are somewhat higher
(62.1 vs 56.8 for B35, and 75.0 vs 68.8 for Braw), suggesting that more information is
extracted from the data when the temporal structure of the problem is not included in
the within-class covariance. This improvement is consistent with our observation that the
temporal structures seem different for the two experimental states, potentially violating the
common within-state covariance assumption for the two-class analysis. This setting also
demonstrates that the projected representations are potentially even more useful in more
sophisticated situations: the lowest SPE for unprojected data is achieved with BSmooth3,
and for projected data with B35, somewhat reversing the trend found in the two-class
problem. This shows, even more clearly than in the two-way problem, that to extract more
of the generalizable structure we need to impose some constraints. We have attempted to
compare the first canonical image from this PDA (corresponding to the experimental state
classification) to the canonical image obtained in the two-class paradigm, but one problem
is the arbitrary rotation allowed in the space of the first two canonical images. As possible
future work we may develop a PDA model where the first Canonical Variate is specified
from the two-way analysis, and apply PDA to the eight-class problem to discover secondary
structures.
4.4 Results with Wavelet Expansions
The 182 scans have been preprocessed with a Discrete Wavelet Transform, which is equiva-
lent to projecting them on a wavelet basis. Two families were used: Daubechies orthogonal
wavelets and Coiflets, which is also an orthogonal family. We have investigated order 2 and
3 Daubechies wavelets (Daub2 and Daub3) and order 2 Coiflets with 6 coefficients (CoifN6).
Daubechies wavelets are perhaps the most famous wavelet family. They were constructed
to be as symmetric as possible (the only fully symmetric, orthogonal wavelet with compact
support is the Haar wavelet). The order number refers to the highest moment of the wavelet
function that is equal to zero: a Daubechies wavelet of order 3 has mean and second moment
equal to zero. This is directly related to the smoothness of the wavelet. Coiflets have
additional zero-moment requirements on the scaling function (Burrus et al., 1998).
The Donoho & Johnstone thresholding brings about a tremendous dimension reduction.
The thresholded Daub2, Daub3 and CoifN6 representations result in 1884, 4187 and 3850
wavelet functions, respectively. This has to be contrasted with 9670 for B25 and 35,071 for
B35. Given the much better prediction results achieved by wavelets, this great reduction
of dimensionality seems to have successfully decreased the variance of projections.
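The VisuShrink rule referred to above can be sketched in one dimension. The thesis thresholds 3-D wavelet coefficients of real scans; this simplified illustration uses a single-level Haar transform and hypothetical data, but shows the same mechanism: a universal threshold of sigma * sqrt(2 log n) zeroes nearly all noise-driven coefficients while the sharp features survive:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthogonal Haar transform (len(x) must be even)."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # smooth (scaling) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (wavelet) coefficients
    return s, d

def visushrink_hard(d, n):
    """Donoho & Johnstone VisuShrink: hard-threshold detail coefficients
    at sigma * sqrt(2 log n), with sigma estimated from the median absolute
    deviation of the finest-level details."""
    sigma = np.median(np.abs(d)) / 0.6745
    lam = sigma * np.sqrt(2.0 * np.log(n))
    return np.where(np.abs(d) > lam, d, 0.0)

rng = np.random.default_rng(1)
n = 256
signal = np.zeros(n)
signal[101:111] = 5.0                      # one sharp, localized "activation"
x = signal + 0.5 * rng.standard_normal(n)
s, d = haar_dwt(x)
d_thr = visushrink_hard(d, n)
# Most noise-driven details are zeroed; the activation edges survive.
print(np.count_nonzero(d), "->", np.count_nonzero(d_thr))
```

This is exactly the source of the dimension reduction quoted above: the surviving coefficient set is a small fraction of the original basis.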
We first look at the 2-class problem that extracts the single baseline-activation image.
The top panel in Figure 4.8 compares one particular wavelet representation with B-spline
and unprojected results. The improvement is dramatic and is much larger than the differ-
ence between B-splines and unprojected representations. We offer one possible explanation
for this improvement. Wavelets combined with thresholding offer a great reduction in di-
mensionality without lowering the discriminatory capability of the PDA. Reduction of
dimensionality has the effect of lowering the variability of the results: that is, of the esti-
mated canonical variates, hence the projections and hence the posterior probability estimates. It
seems that this reduction in variance is much greater than the associated increase in bias of
these estimates. In fact, due to the scale-space tiling property of wavelets, even the thresh-
olded wavelet basis may have better resolution than the much higher dimensional B35
spline basis. The thresholding attempts to keep the high-resolution wavelets only where
they seem to be needed to estimate the brain function well, and reduces the resolution
elsewhere.
In the bottom panel we compare the three wavelet families with and without thresh-
olding. The thresholding helps the classification problem somewhat, particularly for the
two Daubechies families. Daub2 is the winner with SPE=0.075 at EDF=74.6. As this
is the coarsest wavelet family, it indicates that the discrimination problem hinges on a well-
defined, sharp structure which is best picked up by the low-order Daub2 wavelet. The D&J
thresholding gives a small but consistent improvement.
Figure 4.9 shows a few slices of the Canonical Eigenimage that results from applying
PDA with ridge hyperparameter optimization using two representations: Daubechies of
order 2, and B30. The wavelet representation has a much sharper focus on the activated
areas than the B-spline, with much smaller "bleed-over" from neighbouring pixels or slices. On
Figure 4.8: The top panel compares wavelet and B-spline results in the 2-class problem. Shown is a Daubechies order 2 thresholded wavelet basis compared to raw and B-spline representations using the .632+ Bootstrap estimate of squared prediction error. The bottom panel shows SPE curves for the two-class problem for various wavelet families. We compare the Daubechies order 2 and 3 families and the order 2 Coiflet system. For each family we investigate two thresholding strategies: simply "peeling off" one finest detail level in each dimension (32 x 32 x 16), and the Donoho and Johnstone VisuShrink hard thresholding rule.
Figure 4.9: Visual comparison of the Wavelet (top row) and B30 representations in the 2-class problem. The first three slices show portions of the cerebellum, the next two display the midbrain portions, and the last three slices depict the activation of the cortex. The grayscale image is the anatomical MRI scan in the Talairach space and the CV is overlaid on top of it using the hot-metal color coding. Both images were created using the EDF that minimized the SPE: 74.6 for Wavelet vs 53.6 for B30.
the other hand, it still has fewer spikes and speckles than the unprojected representations
such as Braw or the t-map (not shown), which improves interpretability.

Figure 4.10 compares the Wavelet and B-spline results using a corner-cube environment
(CCE) (Rehm et al., 1998). CCE finds several connected areas with high average activation
(here: above the 99th percentile) and with a pre-set minimal volume, for the images being compared.
These areas, called CCE foci, are then displayed using stems and projections onto the walls
of the 3D volume. The figure shows clearly that the B-spline results are smoother and more
spread out than the wavelets. Also, the wavelet PDA shows some smaller regions of activation that
are either absent or have much smaller activation levels in the B-spline volume. This is
due to the CCE algorithm: except for the major centers of activation (motor and auditory
cortices), the relative levels will be lower in the B-spline CV due to the imposed smoothness,
which causes the regions to be larger but suppresses the peaks, and thus prevents them from
being picked up by CCE.
Figure 4.10: Comparing the B-spline (B30) and wavelet (Daub2Thresh) Canonical Images using the corner-cube environment of Rehm et al. (1998) in the two-class setting. Except for the three major overlapping regions, the foci have been fit inside a ball of the same volume as that of the corresponding focus. Blue foci correspond to the B30 Canonical Image.
Figure 4.11: Squared Prediction Error for various wavelet families in the 8-class problem. Three wavelet functions are investigated: Daubechies order 2 and 4, and Coiflets order 2. For each family, we either remove the top-scale wavelet coefficient level, resulting in 32 x 32 x 16 wavelet coefficients, or we apply the D&J VisuShrink hard thresholding (Thresh). As a further dimension reduction technique, we investigate using all 7, or the first 2, CVs to perform classification (D7 vs D2).
In the 8-class problem that decomposes a full covariance structure associated with both
time and experimental design, wavelets perform surprisingly badly. Figure 4.11 shows the
SPE curves for the same wavelet families and thresholding rules that we used in the 2-
class case. We have also investigated restricting the dimensionality of the PDA model
from the full 7 to 2, since there is an a priori belief that only the baseline-activation and time
structures are important for this problem. That is, we only predict using the first two
canonical variates. While this rank reduction helps somewhat, the results are still signifi-
cantly worse than those of the B-spline and raw representations: the lowest SPE for the wavelet
families is achieved by the Daub2 family, D&J thresholded and restricted to two Canonical
Variates (Wave64Daub4ThreshD2). It achieves SPE=0.672 at EDF=61.7, which we compare
with 0.543 at EDF=62.1 for the B35 representation.
Figure 4.11 shows that dimensionality reduction greatly improves the prediction for
the wavelet families, while, as was the case in the 2-class setting, the thresholding strategy
seems less important. This suggests that the errors may be driven by variance, which is
reduced when only 2 CVs are used. We suspect that the common covariance assumption
may be grossly violated in the wavelet domain when all eight classes are used. Some support
for this assertion comes from observing the curves generated by using all 7 CVs. They
achieve their minima at very low EDF, as compared to the B-spline representations, and rise
sharply afterwards. Since a large ridge penalty (and hence small EDF) works to counteract
the effect of unequal covariance matrices, this would indicate that the common within-
class covariance matrix assumption may be badly violated.
Chapter 5
Static Force fMRI Analysis
In Section 2.4.2 we described the static force data. In this chapter we will extend the
methodology developed for the FOPP task. Our goal is to remain in the same paradigm as
before: develop a descriptive tool to offer several views of the data, as driven by the
experimental setup, but taking into consideration the residual covariance structure.
That is, we would like to look at the data through the canonical variates, which describe the
experimental "gradient": where do the experimental conditions really make the difference.
We feel that even though the task in front of us is not a classification task, the Discriminant
approach to the data is still suitable, as it disassembles the between-conditions covariance
structure into orthogonal pieces of decreasing influence.
5.1 Modeling the Time Series Effects: Time-Smoothed PDA
The main challenge of this data, as contrasted with the PET FOPP data, is the existence
of time series effects on a much finer scale than before. In the PET data, we dealt with the
time series by extending the class structure (our eight-way PDA analysis), and therefore
allowing the time effects to be arbitrary. The static force data contains the 8 subjects'
time series, each of length 91, where each image is taken at 4 s intervals. The force levels,
which are the experimental conditions here, are superimposed on this time series, and
there are about 8 scans during each instance of a force condition (about 11 for baseline). It
is reasonable to suspect that some part of the variability in the scans is due to time-
dependent changes independent of the conditions. These could be related to time drifts in
the MRI machine (the simple linear drift that almost always accompanies the fMRI series
has been removed in the preprocessing stage, but there may be "higher-order" changes), to
the hemodynamic processes in the brain that occur during a given instance of a condition
and may possess a systematic structure, and, as before, to the long-term brain processes
like adaptation, over-learning and fatigue.
It seems intuitively clear that there should be some continuity in the time series of
scans, since they are taken every 4 s. This means that consecutive scans within the
same condition instance should change in a smooth way, apart from noise. We may force
smoothness onto the result similarly to how we forced spatial smoothness onto the canonical
variates. The difference here is that we are forcing smoothness between the scans that
constitute the observations, rather than within the result.
5.2 Introducing Between-Scan Smoothness within the Discriminant Framework
The LDA algorithm may be cast in terms of the orthogonal projection operator, P_Y, that
projects the scans onto the structure of Y. In the typical LDA, Y is just an N x J indicator
matrix, and then:

P_Y = Y (Y^T Y)^{-1} Y^T    (5.1)

projects any data onto the class structure. For example, if, as before, S denotes the N x p
scan matrix, P_Y S is the N x p matrix that has scan (row) i replaced by the class average
of all scans in the class that observation i belongs to. The between- and within-class
covariance matrices are (see also Eqn. D.1) S^T P_Y S and S^T (I_N - P_Y) S, respectively. The
idea here is to work with the P_Y operator, forcing smoothness between scans in the time
series.
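As a sanity check on this construction, a toy numpy sketch (hypothetical sizes; S stands in for the scan matrix):

```python
import numpy as np

labels = np.array([0, 0, 0, 1, 1, 1])     # N = 6 scans in J = 2 classes
N, J, p = len(labels), 2, 3
Y = np.zeros((N, J)); Y[np.arange(N), labels] = 1.0   # N x J indicator

P = Y @ np.linalg.inv(Y.T @ Y) @ Y.T                  # Eq. 5.1

rng = np.random.default_rng(2)
S = rng.standard_normal((N, p))                        # toy scan matrix
PS = P @ S
# Row i of P S is the mean of all scans in scan i's class:
print(np.allclose(PS[0], S[:3].mean(axis=0)))          # True
# The between- and within-class pieces add back to the total cross-product:
B = S.T @ P @ S
W = S.T @ (np.eye(N) - P) @ S
print(np.allclose(B + W, S.T @ S))                     # True
```

The decomposition B + W = S^T S is the familiar between/within split that the smoothed operator below will generalize.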
We can develop the idea intuitively, as follows. Initially, we could partition the time axis
into non-overlapping bins of, say, three scans each (about 12 s). We could designate each
bin as a separate class. We would then have about 30 classes, just from the time structure
alone (we will introduce the combination of the time and force level effects later). If we
assume that the scans are in temporal order, then the first four rows of the Y matrix would
look as follows:

    1 0 0 ...
    1 0 0 ...
    1 0 0 ...
    0 1 0 ...
Our proposal is to replace the rigid 0/1 design above, which corresponds to square kernels
on the time axis, with smoother kernels. If we pick a smooth basis like B-splines, we can
achieve the desired effect by setting up the response matrix Y to be an N x J2 matrix of J2
B-spline bases evaluated at the N time points. In fact, any smooth kernel-shaped function
could be used, and our intuition suggests that very similar results would then be obtained.
B-splines have the advantage of compact support which, together with the banded penalty
matrix, leads to efficient numerical implementations.
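A minimal construction of such a smooth response matrix. This is a hypothetical sketch using equally spaced knots and the Cox-de Boor recursion, rather than the thesis's Splus code:

```python
import numpy as np

def bspline_design(x, n_bases, degree=3):
    """N x J2 matrix of J2 B-spline bases evaluated at the time points x,
    with equally spaced interior knots (Cox-de Boor recursion)."""
    lo, hi = x.min(), x.max()
    n_interior = n_bases - degree - 1
    knots = np.concatenate([
        np.repeat(lo, degree + 1),
        np.linspace(lo, hi, n_interior + 2)[1:-1],
        np.repeat(hi, degree + 1)])
    # degree-0 bases: indicators of the half-open knot intervals
    B = ((x[:, None] >= knots[None, :-1]) &
         (x[:, None] < knots[None, 1:])).astype(float)
    B[x == hi, np.nonzero(knots[:-1] < hi)[0][-1]] = 1.0  # close right end
    for k in range(1, degree + 1):            # Cox-de Boor recursion
        new = np.zeros((len(x), B.shape[1] - 1))
        for i in range(new.shape[1]):
            d1 = knots[i + k] - knots[i]
            d2 = knots[i + k + 1] - knots[i + 1]
            if d1 > 0:
                new[:, i] += (x - knots[i]) / d1 * B[:, i]
            if d2 > 0:
                new[:, i] += (knots[i + k + 1] - x) / d2 * B[:, i + 1]
        B = new
    return B

t = np.arange(0.0, 364.0, 4.0)            # the 91 fMRI time points, 4 s apart
Yb = bspline_design(t, n_bases=10)
print(Yb.shape)                            # (91, 10)
print(np.allclose(Yb.sum(axis=1), 1.0))    # the bases sum to one everywhere
```

Each row of Yb is a smooth, compactly supported "soft membership" over the time axis, replacing the hard 0/1 bins above.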
We still need to include the force levels, which is the main experimental design effect.
Our proposal is to create the Y matrix that combines the force level and the time structures
in a natural way. We could also impose smoothness onto the force structure, but we chose
not to, and allow arbitrary force level effects. This is feasible, since there are only 6 different
force levels, and lets us assess the relationship between the force levels and the brain response
visually, which could then be followed by a more formal investigation in a hypothesis testing
framework.

To complete the story we propose to penalize the time-axis parameterization. It is
natural to regularize the B-spline basis by the second-order penalty matrix, which penal-
izes the second derivative of the resulting function to control its "wiggliness" (e.g., Hastie and
Tibshirani, 1990; Green and Silverman, 1994). Therefore the complete proposal is to
set up the response matrix Y as:
set-up the response matris 1- as:
1- = [ I l . Ib] = [fl. . . . . fJi . Bi ( t ) . . . . . BJ,. ( t ) ]
wliere f, arc indicator columns for force Icl-els. and B,(t) are .J2 B-splinc hasis functions
cvaluatccl on tlic tinie points. Thcn the projection matris is constructeci:
Here R is a penalty matris for the B-splines. with rows and columiis. t h correspond the
force lcvcl basis. zcroed out. cind A,- is an another free Liyperparameter that controls the
csact amount of smootliness in the timc domain. \Ve cal1 the resulting mode1 a timc-
smoothed PD.4.
5.3 The O(N) Algorithm for Time-Smoothed Penalized Discriminant
As presented above, the algebra associated with the time-smoothed PDA would entail
computing p x p matrices and p-vectors, where p is the number of voxels (or image basis
functions) and is much larger than N. We need to modify the algorithm presented in
Section 3.3.4, which was expressed in terms of the N x N matrix of inner products. Our
approach is to construct and disassemble the P_Y projection operator, which then lets us
use the usual PDA algorithm.

Specifically, we start by creating Y as in Eqn. 5.2. Then we compute Sigma_Y = Y^T Y + lambda_Y Omega
and the singular value decomposition of it:

Sigma_Y = U_Y D_Y U_Y^T

We then compute the normalized response matrix:

Y~ = Y U_Y D_Y^{-1/2}    (5.4)

Since D_Y is a diagonal matrix, the inversion above is trivial. One can now easily show
that running our algorithm from Section 3.3.4 with the response matrix Y~ from Eq. 5.4 is
equivalent to eigen-analyzing Sigma_W^{-1} Sigma_B (with both covariance matrices defined through the
projection operator P_Y from Eq. 5.3), which is the holy grail of LDA.
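The key identity behind this construction, Y~ Y~^T = P_Y, can be checked numerically (a hypothetical sketch, with an identity penalty standing in for Omega):

```python
import numpy as np

rng = np.random.default_rng(4)
N, J = 40, 6
Y = rng.standard_normal((N, J))
Omega = np.eye(J)                   # identity penalty, for the check only
lam_Y = 2.0

Sig = Y.T @ Y + lam_Y * Omega       # symmetric positive definite
U, dvals, _ = np.linalg.svd(Sig)    # Sig = U diag(dvals) U^T
Ytil = Y @ U @ np.diag(1.0 / np.sqrt(dvals))   # Eq. 5.4

# Ytil Ytil^T equals the penalized projector Y (Y^T Y + lam Omega)^{-1} Y^T,
# so the ordinary PDA algorithm run with Ytil reproduces the smoothed P_Y.
P_direct = Y @ np.linalg.solve(Sig, Y.T)
print(np.allclose(Ytil @ Ytil.T, P_direct))    # True
```

Since all factorized matrices are J x J (or N x J), the expensive p x p algebra never appears, which is the point of the O(N) formulation.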
There are many ways to display the results. Obviously, we will want to look at the
Canonical Variates, but it is also important to understand what the CVs represent.
The easiest way to assess that in the usual LDA is to project the class means on a pair of
CVs and display the projections. In our case this corresponds to projecting the force level
and time means. If Y_f and Y_t represent the indicator matrices for the force levels and the
time points, respectively, we need (in the notation of Eqn. (2.8)) the projections of M_z,
where M_z is a matrix of time/force level means, and the subscript z stands for either the time t or
force level f structure. This also shows that we can examine the projected means in the
N-space without computing the expensive Canonical Variates.
5.3.1 Constructing the Second-Order B-spline Penalty Matrix
We point out in Section 3.5 that it is desirable to use a "proper" second-derivative penalty
matrix, Omega, to penalize the B-splines. Using such a penalty was computationally incon-
venient for us in the case of the B-spline expansion of the Canonical Variates, but is completely
feasible in the current case. To obtain the cubic smoothing spline representation of the
time structure, we need a 91 B-spline basis with knots at the unique time points and second
derivative 0 at the boundaries. Here we describe a computational trick that lets us avoid explicit construction
of the B-spline basis matrix and Omega by using an existing Splus smoothing spline function,
smooth.spline. For a given lambda, smooth.spline delivers, among other things, the predicted
values, y^:

y^ = B (B^T B + lambda Omega)^{-1} B^T y
where B is an n x n matrix of n B-splines evaluated at the n design points (assuming all
design points are unique). We evaluate smooth.spline n times, with the n canonical basis
vectors for y (i.e., at the kth evaluation y is a vector of all zeros and a single unity at the kth
place), and with x which is a sequence of the n design points. In our case, x is a vector with the 91
time points for the fMRI sequence, x = [0, 4, 8, ..., 360]. For each call to smooth.spline
we get y^, which is a row of the hat matrix, and thus after n evaluations
we can reconstruct:

H = B (B^T B + lambda Omega)^{-1} B^T
Now, B-splines are just one possible (and numerically efficient) basis for obtaining a solution to the smoothing spline problem (Eq. 3.60), but any other full-rank basis system will give the same fitted values. In particular, we can change B to the identity matrix and obtain the solution in the natural cubic splines basis (Eq. (2.10), Hastie and Tibshirani, 1990). That is, we would obtain:
(5.10)
where K is a penalty matrix for the natural cubic spline basis. For λ = 1, we can compute K from H by eigendecomposing it, then inverting and subtracting 1 from the eigenvalues γ, and reconstructing K from these and the eigenvectors. We can use K with the time-structure response Y from Eq. 5.2 being just an indicator matrix.
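Both computational devices above (recovering the hat matrix by smoothing the canonical basis vectors, and recovering the penalty matrix from the eigendecomposition of H) can be checked numerically. The following is a hedged numpy sketch rather than the Splus code: smooth.spline is replaced by a generic black-box linear smoother whose hat matrix is H = (I + K0)^{-1}, i.e., the λ = 1 case.

```python
import numpy as np

# Hedged sketch: a generic symmetric linear smoother y -> H y with
# H = (I + K0)^{-1} for a symmetric PSD penalty K0, standing in for
# smooth.spline at lambda = 1.
rng = np.random.default_rng(0)
n = 25
L = rng.standard_normal((n, n))
K0 = L.T @ L                       # symmetric, positive semi-definite penalty
H = np.linalg.inv(np.eye(n) + K0)  # the smoother ("hat") matrix

def smoother(y):
    """Black-box linear smoother; we may only call it, not read H."""
    return H @ y

# Trick 1: recover the hat matrix by smoothing the n canonical basis
# vectors (H is symmetric, so columns and rows coincide).
H_rec = np.column_stack([smoother(e) for e in np.eye(n)])
assert np.allclose(H_rec, H)

# Trick 2: recover the penalty from H alone. Eigendecompose H, invert
# the eigenvalues, subtract 1, and reconstruct with the eigenvectors:
gamma, V = np.linalg.eigh(H_rec)
K_rec = V @ np.diag(1.0 / gamma - 1.0) @ V.T
assert np.allclose(K_rec, K0)
```

The second assertion works because H and K0 share eigenvectors, with eigenvalues related by γ = 1/(1 + κ).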
5.4 Connections with Canonical Correlation Analysis
and MANOVA
As we have shown in Section 3.3.4, LDA is basically equivalent to Canonical Correlation Analysis (CCA) when the class-indicator matrix is used for Y. This connection is even more appealing in the presently proposed model. CCA does not put any requirements on the right-hand variables. Thus we may choose any representation for Y, for instance the structure shown in Eqn. 5.2. It is up to a researcher to make sure that the representation is sensible from the interpretation point of view. In the present context, we are seeking the canonical correlations of scans with both the force level and the time structure. In addition, it makes sense to parameterize the time axis using smooth basis functions to model part of the inter-scan correlations that exist due to proximity in time.
The penalization scheme proposed here is also appealing in the CCA context. The PDA regularization penalizes the left-hand side of the CCA equation, or modifies the norm for the left-hand side Canonical Variates that correspond to the scan data. The penalization of the time-axis B-spline basis does the same to the right-hand side of the symmetric CCA equation. It is, again, up to a researcher to make sure that the penalization scheme is reasonable from the analytic point of view. The model proposed here is similar to the one described in Chap. 12 of Ramsay and Silverman (1997), which we summarized in Sec. 1.3; now both parts of the criterion (1.6) are treated as functional and regularized. One difference is that we have a mixed response (or Y) structure: fixed force levels and smooth time, which we deal with using an additive model.
In Section 3.1.1 we hinted at the connection between classical LDA and MANOVA. Here we will show that the proposed time-smoothed PDA model has a similar connection with an appropriately parameterized MANOVA.
Section 12.5 of Mardia et al. (1979) shows that the test of dimensionality in one-way MANOVA leads to similar results as LDA. Specifically, if we assume the model:
for the scans x_i, i = 1, ..., N that are in J classes, and with the usual assumption of i.i.d. Gaussian errors ε ~ N(0, Σ_W), we can first test the null hypothesis of equality between the class means μ_j. If we reject the null, then we have at least two options. We can test for specific contrasts, as in the usual ANOVA, but we can also perform a more general test of dimensionality. That is, we can test whether the J class means (which lie in the p-dimensional space of the x_i's) span an r-dimensional hyperplane, with r < J − 1. The GLR test results in the sum of the first r eigenvalues of Σ_W^{-1}Σ_B, which is the decomposition that also gives the LDA results. Also, one can show (Mardia et al., 1979, Sec. 5.4) that the estimated hyperplane for the class means can be parameterized in terms of the eigenvectors of Σ_W^{-1}Σ_B, which are (up to a scale factor) the same as the CVs from LDA. A similar connection is proved by Hastie and Tibshirani (1996), who use these results to derive the EM algorithm for reduced-rank mixture discrimination.
We will now show that similar results hold for the model proposed in this chapter. If j(i) denotes the class (force level) of scan x_i and t_i its time, then we can propose a 2-way MANOVA model:
The Residual Sum of Squares (RSS) for this model has the form:
Let us perform a change of basis to orthogonalize the RSS. This involves left-multiplying the scans x_i and the factors α, β with Σ_W^{-1/2}. We will retain the same symbols for all of these to avoid trivial notational changes. Let us now assume that the effects span an r-dimensional hyperplane with an orthogonal basis that will turn out to be the Canonical Variates; that is, we assert that:
We now parameterize the time effects to achieve the smoothness. We choose a basis for the time axis with J2 components B_l(t), l = 1, ..., J2, and use it to parameterize the time axis, and then to get the time effects in this basis:
The RSS now becomes:
(5.19)
Let us write the RSS in matrix form. In addition to the CV matrix Φ that we defined in Eqn. 5.16, and the response vectors y_i that are the rows of the matrix Y from Eqn. 5.2, we define the (J1 + J2) × r matrix of effects' coefficients, η:
The RSS can be written as:
RSS = Σ_i ||x_i − Φ η^T y_i||²
Let Φ_C = [Φ Φ⊥] be a p × p matrix with columns forming an orthonormal basis for ℝ^p. The first r columns are the canonical variates φ_k, as defined above, and the remaining p − r columns are an orthonormal basis for the orthogonal complement of the CV space. For any choice of orthonormal CVs, Φ, we can minimize the RSS with respect to the coefficients η. Since the second term above does not depend on η, the result is just a regression of the CV-projected scans Φ^T x_i onto Y, and thus the minimizing solution is:
where X is an N × p scan matrix.
To find the canonical variates, we just consider the case of a single CV, i.e., r = 1. Since the CVs are orthogonal, we can do the minimization separately, which simplifies the notation that would otherwise require traces of matrices. With just a single φ, the partially minimized RSS becomes:
The minimizing φ can now be easily seen to be the first eigenvector of R = X^T Y (Y^T Y)^{-1} Y^T X, and in light of the orthogonality of the φ_k's, the full solution to the minimization problem is {η̂, Φ}, where Φ has the first r eigenvectors of R in its columns. To finish the presentation we have to remember that the minimization was carried out in the rotated system. Therefore, to project the unrotated scan x_i onto the hyperplane, we need to change its basis before projecting it onto the φ_k's. Thus the final estimates of the hyperplane's basis are the first r rotated eigenvectors of R, or Σ_W^{-1/2} φ_k, k = 1, ..., r.
What we have computed are the MLE estimates of the successively higher-dimensional hyperplanes that are hypothesized to contain the force level and time effects. The proposed model was the 2-way MANOVA with a B-spline parameterization of the time effect. This result also forms the basis for the GLR test of dimensionality (as in Mardia et al., 1979, Sec. 12.5) for the 2-way MANOVA model with the proposed parameterization of the time effects. To see that the φ_k are essentially the unrotated CVs, we note that they are the eigenvectors of R:
By Theorem A.9.2 of Mardia et al. (1979) we know that the Σ_W^{-1/2} φ_k are then the eigenvectors of Σ_W^{-1}Σ_B, which gives the LDA decomposition.
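The eigenvector correspondence invoked here can be verified numerically on random stand-ins for the within- and between-class covariances (a hedged toy check, not tied to the thesis data):

```python
import numpy as np

# Toy check: eigenvectors of the symmetric, whitened between-covariance,
# rotated back by Sw^{-1/2}, are eigenvectors of Sw^{-1} Sb.
rng = np.random.default_rng(1)
p = 6
A = rng.standard_normal((p, p))
Sw = A @ A.T + p * np.eye(p)          # SPD stand-in for the within-covariance
C = rng.standard_normal((p, 3))
Sb = C @ C.T                          # low-rank stand-in for the between-covariance

# Compute Sw^{-1/2} via the eigendecomposition of Sw.
w, U = np.linalg.eigh(Sw)
Sw_inv_half = U @ np.diag(w ** -0.5) @ U.T

# Leading eigenvector phi of the whitened problem ...
M = Sw_inv_half @ Sb @ Sw_inv_half
vals, Phi = np.linalg.eigh(M)         # ascending eigenvalues
phi = Phi[:, -1]

# ... rotated by Sw^{-1/2}, is an eigenvector of Sw^{-1} Sb with the
# same eigenvalue:
v = Sw_inv_half @ phi
lhs = np.linalg.solve(Sw, Sb @ v)     # Sw^{-1} Sb v
assert np.allclose(lhs, vals[-1] * v)
```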
This connection between an appropriately parameterized 2-way MANOVA model and our proposal may be used to obtain further insights. It is now clearer that the penalization of the time-axis B-spline basis reduces the effective dimensionality of the coefficients η and thus regularizes the time effects β(t). This is on top of the crude regularization provided by limiting the canonical dimensionality. The B-spline penalization prohibits excessive variation of the time effects and thus forces the estimation procedure to explain the variability in other, hopefully more suitable, ways.
5.5 Penalized Discriminant Analysis of StaticForce Data
in B-spline and Wavelet Domains
We have applied the PDA model with ridge regression to the StaticForce fMRI data described in Sec. 2.4.2. We use a 6-class structure: baseline and 5 force levels. There are 91 scans for each subject: 46 of them are in 6 baseline instances, and 45 in 5 active classes. With 8 subjects we have a total of 728 scans.
Figure 5.1: Projections on the first 2 CVs from the PDA model applied to the StaticForce data.
Figure 5.1 shows the projections on the first two CVs for the StaticForce data using second-order Daubechies wavelets with thresholding. The right panel shows the results of the model that was fitted with EDF=200. The projections indicate that the PDA model extracts a reasonable structure: the first CV divides baseline and active scans, and the second CV corresponds to force levels; apart from class 6 (1000g), the scores on this CV increase with the force level. The static force experiment with 1000g is seen as somewhat different from the other ones: it is apparently quite hard to maintain a force of this level for 45s through the experiment. It is reasonable to expect that different brain structures will be involved.
We have performed an extensive predictive analysis study using 3 thresholded wavelet families, as described in Section 4.4, and the B25 B-spline basis. We also used reduced-rank discrimination with 2, instead of 5, canonical variates used for prediction. In all these cases, both the SPE and the misclassification rate achieve their minima at very low degrees of freedom. Reduced rank helps keep the errors from increasing rapidly for higher EDF, but is otherwise not better than the full-rank model. The minima obtained are invariably around the base rates for these data: rates that would be achieved by the model that predicts based on the prior probabilities. The base misclassification rate is 368/728 or 50.55%. Similarly, the base SPE is 0.5055(1 − 0.5055)² + 5 · (72/728)(1 − 72/728)² = 0.525, since for each of the five classes there are 72 scans (out of 728). The left panel of Figure 5.1 shows the prediction for the PDA model fitted with EDF=8. It shows that the model is doing exactly what we suspected: predicting the a priori most probable class regardless of the scan characteristics. That the minimum error occurs at these low degrees of freedom suggests that PDA is not able to effectively predict the class of each particular scan. The predictive failure of PDA does not completely disqualify it from analyzing the data. As we saw in Fig. 5.1, PDA extracts two reasonable components; it is their generalizability over subjects that is in question. Also, there is an important time axis here which is completely ignored in the current analysis and which may constitute a much stronger effect than the condition under which the scan was taken.
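The base rates quoted above can be verified with a few lines of arithmetic (a sketch using only the class counts stated in the text):

```python
# Arithmetic check of the base rates for the 728-scan StaticForce data:
# 368 baseline scans (46 per subject, 8 subjects) and 72 scans in each
# of the five active classes.
p_base = 368 / 728
p_force = 72 / 728

# Base misclassification rate of the prior-probability predictor, in %:
assert round(100 * p_base, 2) == 50.55

# Base SPE, as given in the text:
spe = p_base * (1 - p_base) ** 2 + 5 * p_force * (1 - p_force) ** 2
assert round(spe, 3) == 0.525
```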
5.6 Applying the Time-Smoothed PDA Model to the StaticForce Data
We apply the time-smoothed PDA model to the StaticForce data. We use B25, a tensor product of 25 B-spline basis functions in each dimension, for the image representation, and 91 B-spline basis functions for the time axis with knots at the unique data points. We use the "proper" B-spline penalization with the second-order penalty matrix.
Figure 5.2: Projections of time-point (first row) and force-level means onto the first four Canonical Images using the time-smoothed PDA model with the B25 Tensor Product B-spline basis and B-splines for the time axis. Force levels were: 1 - baseline, 2 - 200g, 3 - 400g, 4 - 600g, 5 - 800g, 6 - 1000g. The time-structure penalty hyperparameter was set at λ_Y = 10.
Figure 5.2 shows the projections of the average of all scans at a given time point (first row) and a given force level (second row) onto the first three Canonical Images. These account for about 85% of the total variance. The hyperparameters were set at an EDF of about 50, and λ_Y = 10. These were not optimized.
The first CV accounts for almost 68% of the variance. It clearly separates the baseline state from all the others. The corresponding time projection shows a possible quadratic time relationship for the baseline states: it starts higher for the first baseline state, then decreases, and increases back for the last baseline. It may correspond to some kind of "anticipation"; however, the effect is much weaker than the baseline-activation effect and thus hard to interpret. In addition to this activation effect, the force levels are ordered on this CV, which may provide some insight into how the force level is modeled within the brain. The second CV (about 10% of the variance explained) shows a curious time trend which is quite linear for most of the time interval. This may be related to a number of things (including MRI machine trend) and requires further scrutiny. The corresponding force level effect is also strong and is mostly geared towards distinguishing the third force level. The third CV (about 7% of variance explained) has a rather noisy time structure, with some periodic behaviour, mostly visible in the early time points and related to the baseline-active changes. The corresponding force level display hints at structure in the brain which is associated with the strength of the force exerted. The baseline condition is an exception, having the same score as force level 4. This may indicate that the baseline is quite distinct from the zero-level force that it is supposed to model and should not be considered together with the other force levels. The force level ordering on this CV suggests that it may be the most interesting one to look at when searching for answers on the relationship between the amount of force applied and the controlling brain structures.
We also note some relationships between the CVs discovered here and in the PDA modeling in the previous section. The first CVs of both models are clearly quite similar.
Figure 5.3: Selected slices of the third canonical image resulting from applying the time-smoothed PDA model to the StaticForce data with the B25 basis, EDF=50 and λ_Y = 10.
The time-smoothed PDA CV described here has a stronger association with the force levels in addition to modeling the baseline-activation changes. The second CV of the time-smoothed PDA model seems to be a novel discovery, as it is strongly related to the time axis. The third time-smoothed CV is similar to the second CV of the PDA model. The difference is that it does put the 1000g force level in the right order with respect to the other forces. This may be occurring because of the explicit modeling of the time axis: this force level does not occur as a first active state in any of the 8 subjects. Thus its unexpected score on the second CV in the PDA model may be a result of a confounded time-order effect.
In general, we believe that the time-smoothed PDA model is potentially very useful for modeling fMRI data. It provides a decomposition of the covariance matrix along the variance components induced by the experimental setup, but it also takes into account the strong time series effects. Currently we lack a criterion for optimizing the hyperparameters and assessing goodness-of-fit, since the classification performance is no longer useful in this paradigm, but we mention a possibility in the next chapter.
Chapter 6
Conclusions and Extensions
The presented paradigm provides a flexible option for constructing summary images from both PET and fMRI studies. It takes into account different experimental setups and is flexible enough to accommodate two smoothness sources known to exist in the data: spatial and temporal (fMRI). For PET studies, the predictive analysis constitutes a validating technique that gives a researcher a degree of confidence in the resulting images and allows him/her to make choices (e.g., among different bases or in the number of degrees of freedom).
Our method has some advantages over others proposed in the literature:

- It deals with the full 3D (4D, with the temporal extension) data in a cohesive way, without a need to delineate regions of interest or perform voxel-based analysis
- It acknowledges the existing spatial and temporal smoothness in a simple way via basis expansion, which has the added benefit of reducing dimensionality and thus possibly variance
- Using a fixed basis and regularization, it avoids the SVD bases, which are wholly data and variance driven and thus do not take into account the spatial nature of scans. When using an SVD basis one also faces the task of choosing a subset of them, which is a task of exponential complexity; we avoid it by regularization with a single hyperparameter
- It provides a simple predictive framework for assessing the goodness-of-fit. While prediction is not a goal per se in neuroimaging studies (although it may become one as the diagnostic value of PET/fMRI brain scans increases), the Prediction Error provides a simple one-number summary of the effectiveness of the resulting image(s)
- We develop an associated, computationally appealing algorithm that avoids constructing huge covariance matrices
6.1 Extending the predictive analysis
We use prediction error estimates as a way both to choose a basis and hyperparameters and to validate a resulting image. We would like to extend this paradigm to the Two-Way PDA model (Section 5.2). One possibility is to use the MANOVA connection: if we think of Canonical Variates (CVs) as a basis for the scan space, we can use the bootstrap to estimate the Mean Square Error (MSE). Specifically, MANOVA tells us that if our model is correct, each scan is composed of a linear combination of Canonical Variates and an error term (Eqs. 5.14 and 5.17). We propose to validate the process by estimating the true MSE:
Here, the double expectation is taken over the distribution of the training sets, X, and then over the distribution of independent test scans, x₀. We use the within-covariance rather than the Euclidean norm to orthogonalize the Canonical Variates, as in Section 3.4. This way, the MSE is (up to an additive constant) a log-likelihood of a new test scan, x₀.
To proceed, let us first orthogonalize the system, as before (Sec. 5.4). What we have now are the estimated orthogonal canonical variates, φ_k, and we can write the norm in the new basis system as:
for some canonical coefficients, which combine both the time and force level structure. The starred quantities refer to the rotated quantities, for example x₀* = Σ_W^{-1/2} x₀.
We propose to estimate the MSE using the best linear coefficients for a given test scan, x₀, and the kth CV, φ_k. Our motivation for this is that we are interested in how well the estimated CVs represent the data, and the γ_{0k} are nuisance parameters in this context. By "best" we understand those resulting in the smallest MSE. It is trivial to show, since the Canonical Variates are orthogonal in the rotated basis, that the minimizing coefficients are just the projections of the test scan onto each CV. Since:
we can project the test scan onto the unrotated Canonical Variates to calculate the coefficients. Using Eqn. E.2 we see that γ̂_{0k} may be calculated without actually resorting to the projection, which is an expensive (O(p)) operation.
The MSE then is a double expectation of:
It should be possible to compute the first summand using only the outer-product matrix G and the model fitted to the training set, without resorting to the expensive operations on the actual scans.

To estimate the MSE (Eq. 6.1) we can use the .632+ bootstrap or cross-validation. Given the estimates obtained using the bootstrap set, we would apply them to compute the MSE for the scans in the validation set, and average as before. Even in the "regular" PDA models, this could be a more appealing alternative to the prediction error that we use.
6.2 Comparing the Results Across Non-Predictive Paradigms
It would be of interest to compare the results of our model (Smooth Canonical Images) with those of other methods currently in use, such as t-maps, ANOVA/ANCOVA-preprocessed PCA and the Scaled Subprofile Model. For two classes, one possible way to compare these would be ROC analysis. ROC curves are a measure of a classification model's predictive power when we do not want to assume any thresholds for determining the class. The area under the ROC curve represents the total amount of information about the class in the result, under a linear model. One powerful feature of ROC analysis is that it is invariant under monotonic transformations of an image.
The preferred paradigm would be to perform a bootstrap study: for each bootstrap sample, compute the summary image (Canonical Image, t-map, first Eigen-Image or Group Invariant Subprofile) and project each test scan onto it, saving the score and the true class. At the end, compute the ROC curves and the areas under them for each model. A similar paradigm was proposed, and tested on a set of simulated data, by Lange et al. (1999), and it has been warmly received by the community.
A different approach, which works for more than two classes, was developed by Strother et al. (1998b), and called NPAIRS. It involves assessing the variability of the resulting image using pairwise permutation studies. Briefly, one performs a large number of experiments in which the data are split randomly into two halves. One obtains the summary image for each half and computes the correlation coefficient between them. The coefficients are averaged over many random samples to give a total variability measure of the image. One problem with NPAIRS is that it does not take into account the "bias" in the result; by bias I mean some measure of the relevancy of the resulting image: a useless method that always returns the same image would score perfectly in this system. However, for any "reasonable" method, especially one that has been internally optimized using, for example, the prediction error, NPAIRS gives a useful indication of the total variability.
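The split-half procedure just described can be sketched in a few lines. This is a hedged toy illustration, not the published implementation: summary_image is a hypothetical stand-in (a simple two-class mean-difference map), and the data are simulated.

```python
import numpy as np

# Hedged sketch of split-half reproducibility: repeatedly split the
# scans into two random halves, compute a summary image from each half,
# and record the correlation between the two images.
rng = np.random.default_rng(2)

def summary_image(scans, labels):
    """Toy summary image: active-minus-baseline mean-difference map."""
    return scans[labels == 1].mean(axis=0) - scans[labels == 0].mean(axis=0)

# Simulated data: 80 scans, 500 "voxels", a small activated region.
n, p = 80, 500
signal = np.zeros(p)
signal[:20] = 2.0
labels = np.repeat([0, 1], n // 2)
scans = rng.standard_normal((n, p)) + np.outer(labels, signal)

corrs = []
for _ in range(50):
    perm = rng.permutation(n)
    h1, h2 = perm[: n // 2], perm[n // 2:]
    img1 = summary_image(scans[h1], labels[h1])
    img2 = summary_image(scans[h2], labels[h2])
    corrs.append(np.corrcoef(img1, img2)[0, 1])

print(f"mean split-half correlation: {np.mean(corrs):.2f}")
```

A reproducible signal yields a clearly positive average correlation; pure noise would average near zero, which is the sense in which the statistic measures image variability.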
6.3 Wavelets and Basis Selection Techniques
We have experimented with two kinds of bases: Tensor-Product B-splines and wavelets. There are many other possibilities, of course, and even within these two meta-families a great many more things may be explored.
Our current approach with the B-spline basis is to delete those basis functions that fall outside of the overall mask. We have not conducted systematic experiments to check whether the overall mask should be an AND or an OR of all masks, or perhaps something in between. More generally, some basis selection (à la wavelet denoising, perhaps) may be useful. Our approach has been to shift the burden of basis selection onto the ridge penalty. However, this may be an over-simplistic strategy and some combination may be desirable. One possibility would be to use a sum-of-absolute-values penalty, like the LASSO strategy of Tibshirani (1996). This offers a compromise between shrinking and basis selection and has been successful in many situations when compared to the classical shrinkage of ridge regression. On the other hand, we have mentioned before that it would be desirable to replace the ridge penalty with the second-derivative one, to perhaps obtain a thin-plate spline solution. It may be possible to combine both strategies: a LASSO-like penalty for shrinkage and basis selection with a thin-plate spline second-derivative penalty. One major obstacle is to implement this in a computationally appealing way that would, similarly to the algorithms presented in this thesis, avoid constructing covariance matrices in the voxel space.
There exists a more systematic approach to selecting bases from many families. Wavelet packets and the associated Best-Basis Pursuit algorithms (e.g., Vidakovic, 1999) start with overcomplete dictionaries which contain redundant bases from one family or multiple families. Best-Basis pursuit was developed for signal and image denoising, but the idea has been extended to multiple images and LDA by Coifman and Saito (1994, 1996), who describe the Local Discriminant Bases (LDB). LDB searches a large redundant basis dictionary in a rapid way, picking those bases that have high discriminatory power. Crucial to a fast implementation is the additivity of the discriminatory measure; examples are the Kullback-Leibler divergence and the Hellinger distance. We definitely feel that the area of basis selection, especially with wavelet bases, warrants more exploration.
We feel strongly that working in the wavelet domain has great potential in neuroimaging. It has a potential for great dimensionality reduction without affecting the results in a major way. Indeed, if done correctly, one may obtain better results, as we saw in the two-class PDA analysis for the FOPP data, due to the decrease in variance. Our approach to basis selection based on image-wise thresholding is simple to implement but has large potential drawbacks. First, it does not pool information across scans. One possibility would be to perform a robust version of an ANOVA analysis, using medians and absolute distances instead of means and the square metric, to estimate the level/channel-dependent noise across scans. This would be in direct analogy to the current MAD estimator but would take variability across scans, as well as potentially between subjects and conditions, into account. It is quite possible that with the current strategy we may be deleting bases important for signal discrimination. It is also possible that in some parts of a scan the variability is high, but that part still has some discriminatory power. Possibly more likely is the reverse scenario: there are low-variability regions in the scans with low discriminatory power, which currently survive the thresholding, perhaps at the expense of other regions, only because we do not take the discrimination problem into account when constructing the wavelet basis. One problem when extending the thresholding strategies to account for subject and condition effects is that it will require a different approach for validating the image: using fixed subject effects and conditions to select the wavelet basis would currently require that this step be performed for every bootstrap sample, which would be computationally prohibitive.
Wavelets also offer a possibility for more intelligent penalization schemes. Each wavelet has a position and a scale associated with it, and thus we may use any prior information to differentiate penalties for different spots in the brain on different scales. For example, we may penalize places with white matter or ventricles more, as they are not likely to participate in the brain function. The smoother representation is already penalized less: since there are twice as many wavelet coefficients at the next higher level, and since each coefficient receives the same penalty, collectively the ridge penalty "favours" smoother results; this could be reinforced with the location-based penalties mentioned above. Since all these schemes result in a diagonal penalty matrix, they are easy to implement in the current algorithmic paradigm.
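As an illustration of how such schemes stay computationally trivial, here is a hedged sketch of a level-dependent diagonal penalty for a one-dimensional dyadic coefficient vector and its use in a ridge-type solve. The geometric weighting rule is invented purely for illustration; the 3-D, location-based variant discussed above would only change how the weights are filled in.

```python
import numpy as np

# Build a diagonal penalty where coefficients at finer levels get a
# larger ridge weight, discouraging high-frequency structure. The
# 4**level rule is an illustrative assumption, not a thesis choice.
J = 5                                    # number of detail levels
base = 1.0
weights = [base]                         # scaling (coarsest) coefficient
for level in range(J):
    weights.extend([base * 4.0 ** level] * 2 ** level)  # 2^level coeffs per level
Omega = np.diag(weights)                 # diagonal penalty matrix, 32 x 32

# Ridge-type estimate with this penalty, for a toy design:
rng = np.random.default_rng(3)
n, m = 40, len(weights)
W = rng.standard_normal((n, m))          # stand-in for a wavelet design matrix
y = W[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.standard_normal(n)
gamma_hat = np.linalg.solve(W.T @ W + Omega, W.T @ y)
print(gamma_hat.shape)
```

Because the penalty is diagonal, it adds only O(m) work to the normal equations, which is the computational point made above.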
6.4 Inference and other issues
We have not done much work on region-specific inference. LDA and other multivariate techniques, as applied to images, are spatially global in nature and have mainly a descriptive appeal. We use the prediction error to validate the procedure, but have not made any attempts to designate specific regions as significantly activated.
A simple approach would be to assume normality, construct a T-map from a Canonical Variate and threshold using the Bonferroni correction. This may be more appealing here than it was in the case of t-maps constructed with Statistical Parametric Mapping (Section 2.3.1), since the basis coefficients, γ, that result from PDA and the projected images are potentially a lot less correlated than the original scans. For one, some spatial correlation has been removed via the basis expansion step, and PDA decorrelates the Canonical Variates further by working in the rotated space. Similarly, it may be possible to utilize the Gaussian Random Field thresholding of Worsley et al. (1992) on the reconstructed Canonical Image: one would assume that under the null hypothesis the canonical variate resulting from PDA applied to the projected data is a zero-mean field, as before. Then the covariance matrix (2.21) of the canonical image, which results from applying Eqn. 3.34, would be possible to calculate using the properties of the basis used. This may be more appealing than the SPM approach, since the homoscedasticity is more tenable in the CV space and the Prediction Error (or MSE) would give us some non-parametric confidence.
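A minimal sketch of the Bonferroni step, assuming the canonical-variate map can be treated as approximately standard normal under the null (a normal quantile stands in for the t quantile here):

```python
import numpy as np
from statistics import NormalDist

# Simulated pure-noise "T-map": at the Bonferroni-corrected threshold,
# (almost) no voxel should survive.
rng = np.random.default_rng(4)
p = 50_000                               # number of voxels
z_map = rng.standard_normal(p)

alpha = 0.05
# Two-sided Bonferroni threshold: per-voxel level alpha / p.
z_crit = NormalDist().inv_cdf(1 - alpha / (2 * p))
n_sig = int(np.sum(np.abs(z_map) > z_crit))
print(z_crit, n_sig)                     # threshold around 4.9
```

The expected number of false positives at this threshold is alpha (here 0.05), so a null map should essentially never show surviving voxels.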
Another issue is that of canonical dimension reduction: choosing a number of significant canonical variates. For example, in the 8-class FOPP problem, we felt that the first 2 canonical variates retain most of the structure associated with the problem. The easiest approach would be to extend the prediction error selection to choose the canonical dimension. Some asymptotic results may be tenable for this problem, however, since we are working with the summary data. A related issue is that of allowable rotation of canonical variates. For example, if only the first two CVs are designated as significant, what we really obtain is a two-dimensional view of the between-class covariance. It is quite possible that some rotation of the CVs would result in more appealing structures. Similar issues are present in the principal component analysis literature, and a number of automatic rotation procedures (such as VARIMAX) have been developed.
LDA and PDA depend heavily on the ability to estimate the covariance matrix. Since a full-rank covariance matrix cannot be estimated, we "cheat" a little by penalizing it, which effectively adds some volume in each direction. More importantly, LDA/PDA use pooled estimates of covariance over all classes, assuming the same shape. The alternative of estimating a separate covariance matrix for each class, called Quadratic Discriminant Analysis, is clearly untenable in the present case. Friedman (1989) offers one intermediate solution, termed Regularized Discriminant Analysis: first shrink the covariance matrix for each class towards a circular one (via a ridge penalty), and separately ridge-penalize the average covariance matrix. Two hyperparameters result, which may be estimated with cross-validation or bootstrap, as in our case. Another possibility is the Mixed Discriminant Analysis (MDA) proposal of Hastie and Tibshirani (1996). There each class (or all of the data) is modeled by a mixture of Gaussians, with the resulting mixture of covariance matrices modeling the common covariance structure. Each mixture-covariance matrix is penalized with a global hyperparameter, which helps keep the degrees of freedom low. This proposal has the potential for modeling different shapes for each class using different means and covariances for the class-specific Gaussian components. The MDA algorithm involves Expectation-Maximization (EM) iterations of the basic PDA algorithm, and is thus computationally appealing in our case, since we can run the analysis in O(N) time.
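As an illustration of the Regularized Discriminant Analysis shrinkage described above, the two-hyperparameter covariance estimate can be sketched in NumPy. This is only a sketch, not code from the thesis: the function name rda_covariances and the synthetic data are our own, and in practice the hyperparameters lam and gamma would be chosen by cross-validation or the bootstrap, as discussed above.

```python
import numpy as np

def rda_covariances(X, y, lam, gamma):
    """Friedman-style regularized class covariance estimates (sketch).

    lam   in [0, 1]: shrink each class covariance toward the pooled one.
    gamma in [0, 1]: shrink further toward a spherical (identity-scaled) matrix.
    """
    classes = np.unique(y)
    n, p = X.shape
    pooled = np.zeros((p, p))
    class_covs = {}
    for k in classes:
        Xk = X[y == k]
        Sk = np.cov(Xk, rowvar=False)          # per-class covariance estimate
        class_covs[k] = Sk
        pooled += (len(Xk) - 1) * Sk
    pooled /= (n - len(classes))               # pooled within-class covariance
    out = {}
    for k in classes:
        Sk = (1 - lam) * class_covs[k] + lam * pooled
        out[k] = (1 - gamma) * Sk + gamma * (np.trace(Sk) / p) * np.eye(p)
    return out

# Example: three classes in four dimensions (synthetic data)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = np.repeat(np.arange(3), 20)
covs = rda_covariances(X, y, lam=0.5, gamma=0.3)
```

At lam = gamma = 0 this reduces to ordinary per-class (QDA) covariances; at lam = 1, gamma = 1 it gives a single spherical estimate.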
Appendix A
Tensor Product B-Spline Basis
Tensor products provide a general way to extend a one-dimensional basis to more dimensions. See, for instance, Green and Silverman (1994), which deals with cubic spline bases, or Ogden (1997) for an example of a tensor product basis in the wavelet domain. On the modeling side, Friedman (1991) develops a powerful and adaptive model using a first-order tensor product B-spline basis with backward elimination.

B-splines, discussed at length by de Boor (1978), were developed as a numerically efficient basis for polynomial splines. If B_j denotes the j-th B-spline basis function, we compose a 3-D basis by multiplying the unidimensional ones:

B_jkl(x, y, z) = B_j(x) B_k(y) B_l(z).

Thus the basis in 3D involves all possible products of the unidimensional bases. Figure A.1 shows the B-spline basis in one and two dimensions. One notable feature of B-splines is their compact support, which results in banded design and penalty matrices, leading to efficient algorithms.

Figure A.1: 1D and 2D B-spline basis.
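A one-dimensional B-spline design matrix, and its tensor-product extension, can be sketched as follows. This is an illustrative sketch, not thesis code; it assumes SciPy's BSpline class and uses identity coefficients to evaluate every basis function at once, then forms the 2-D grid basis as a Kronecker product.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, knots, degree=3):
    """Evaluate all 1-D B-spline basis functions at the points x.

    Returns an (len(x), n_basis) design matrix: with identity coefficients,
    the BSpline object evaluates every basis function in one call.
    """
    n_basis = len(knots) - degree - 1
    return BSpline(knots, np.eye(n_basis), degree)(x)

# Cubic basis on [0, 1] with a clamped knot vector (repeated boundary knots)
k = 3
t = np.r_[np.zeros(k), np.linspace(0, 1, 8), np.ones(k)]
x = np.linspace(0, 1, 50)
Bx = bspline_basis(x, t, k)      # 50 x 10 one-dimensional design matrix

# Tensor-product basis on the 50 x 50 grid of (x, y) pairs: every product
# of unidimensional bases, i.e. the Kronecker product of the 1-D matrices
By = bspline_basis(x, t, k)
B2 = np.kron(Bx, By)             # (50*50) x (10*10) grid design matrix
```

The compact support noted above shows up as bands of zeros in each row of Bx and B2.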
Appendix B
Basis Expansion of Canonical
Variates
In this appendix we show that the basis expansion of canonical variates leads to LDA or PDA with projected data.

As in Section 3.3, let a resulting canonical image be constrained as

where B is a basis matrix with one basis in each column, evaluated over the rosels (rows). PDA can be expressed as an optimization problem: for two classes it finds β that maximizes βᵀΣ_B β subject to βᵀΣ_W β = 1. For more than two classes one successively maximizes the criterion subject to orthogonality in the metric Σ_W, which does not affect the following result.

If we add the constraint on β, the criterion and condition become γᵀBᵀΣ_B Bγ and γᵀBᵀΣ_W Bγ, respectively. The between- and within-class covariance matrices are:

P_Y is the orthogonal projection operator onto the column space of Y (P_Y = Y(YᵀY)⁻¹Yᵀ); for LDA, λ = 0, and for PDA Ω is a chosen penalty matrix in the original space. It is now clear that the PDA problem with the smoothness basis constraint is an unconstrained PDA problem in the projected data matrix XB with a modified penalty, Ω*. Our choice, partly for computational expediency, and partly due to the limited knowledge of the true nature of the data in relation to the TPS basis, has been to set Ω* = I.
Appendix C
CCA via Regression
Here we derive the (unpenalized) CCA algorithm via regression (see also Hastie et al., 1995). Let N be the number of observations (i.e., scans), with p variables as inputs (here, voxels or basis functions). We assume here, for the unpenalized version, that N > p. Let X be the N × p data matrix, and let Y be an N × J class-indicator matrix, with J = number of classes. We can obtain the solution to the CCA problem from the regular SVD of:

where c_i are the singular values, and D_c = diag(c). Anticipating the LDA problem (Appendix D), we are interested in the left canonical variates B, which we will refer to as CVs. These are the rescaled left eigenvectors of K (Eq. 3.40), or:

because of the normalization requirement of CCA. If X and Y have been centered, then we can use the sample estimates:

Thus the sample version of Eq. C.1 becomes:

Now YᵀŶ, whose eigenvectors are the right eigenvectors of K, is:

If we choose orthogonal contrasts for the classes (i.e., normalize Y so that YᵀY = I), we obtain:

where Ŷ = X(XᵀX)⁻¹XᵀY. The two steps mentioned above are now clearly visible: run a multi-response regression of the (centered and orthonormalized) group-indicator matrix Y on the (centered) data matrix X (i.e., Ŷ = Xβ̂), and derive the right eigenvectors of K (Eq. C.1) by eigenanalysis of YᵀŶ:

Then obtain the left eigenvectors using Eqs. C.1, C.2 and C.4:

where β̂ is the matrix of coefficients from the regression step.
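The two-step procedure above can be sketched numerically as follows. This is an illustrative NumPy sketch with synthetic data, not thesis code; it checks that the leading eigenvalue of YᵀŶ is the squared canonical correlation between the corresponding variates Xβ̂θ and Yθ.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, J = 90, 5, 3                          # scans, predictors, classes (N > p)
labels = np.repeat(np.arange(J), N // J)
class_means = rng.normal(size=(J, p))
X = rng.normal(size=(N, p)) + class_means[labels]

# Center the data; build a centered, orthonormalized class-indicator basis Y
Xc = X - X.mean(axis=0)
Yind = np.eye(J)[labels]
U0, _, _ = np.linalg.svd(Yind - Yind.mean(axis=0), full_matrices=False)
Y = U0[:, :J - 1]                           # Y^T Y = I on the J-1 contrasts

# Step 1: multi-response regression of Y on Xc, giving Yhat = Xc @ beta
beta, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
Yhat = Xc @ beta

# Step 2: eigenanalysis of Y^T Yhat; eigenvalues = squared canonical correlations
evals, Theta = np.linalg.eigh(Y.T @ Yhat)
c2, theta = evals[-1], Theta[:, -1]         # leading eigenpair

# Left canonical variate (up to rescaling): image-space direction beta @ theta
u = Xc @ (beta @ theta)
v = Y @ theta
corr = (u @ v) / np.sqrt((u @ u) * (v @ v))
```

Since the columns of Y are centered and orthonormal, corr equals sqrt(c2) exactly, which the check below confirms.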
Appendix D
Correspondence Between CCA and
LDA variates
In this section we will derive the exact amount of rescaling needed to convert the CCA variates, B, in the notation of Eqs. C.1, C.2, into LDA canonical variates, B_LDA, proving Eq. 3.42.

LDA is a generalized eigenvalue problem: find B_LDA that successively maximizes BᵀΣ_B B subject to BᵀΣ_W B = I, where Σ_B, Σ_W are the between- and within-class covariance matrices, as in the previous appendix. If X has been centered, then, for LDA:

Since B are left eigenvectors of K, we have that:

and thus we need to rescale: B_LDA = B D_c(I − D_c²)^(−1/2) to meet the LDA constraint.
Appendix E
Deriving Predictions in the
n-Dimensional Space
In this section we show how to derive the posterior probability estimates using the fitted values and the results of the eigendecomposition step.

From Equations 3.47–3.49 we note that we need x₀B_LDA and the class-centroid projections μ̄_k B_LDA to obtain the estimates. Now, using Eqs. C.8 and 3.42, we have that:

where ŷ₀ is the vector of fitted values for the predictor x₀.

The K × p matrix of class centroids (the μ̄_k's) may be obtained as (YᵀY)⁻¹YᵀX, where Y is an N × K class-indicator matrix. Therefore the required K quantities, μ̄_k B_LDA, are (YᵀY)⁻¹YᵀXB_LDA and may be calculated, similarly as in Eq. E.2, using the rescaled fitted values and A*. By using Eq. E.2, all K posterior probabilities are obtained for x₀.
Appendix F
Ridge Regression With the Outer
Product Matrix
Any penalized regression can be expressed using only the dot products of the observations in the p (= number of columns)-dimensional space, i.e., using an outer-product matrix G = XXᵀ. For our purposes, X is an image matrix, one image per row. We will work with ridge regression, but any penalized regression can be brought into ridge form by a suitable change of basis.

Ridge regression is the solution of the following problem:

argmin_β (y − Xβ)ᵀ(y − Xβ) + λβᵀβ

By taking derivatives with respect to β we have that:

Thus the fitted values are

and the predicted values at the new design points, X*, are

where G* is an N* × N matrix of dot products between the N* new images and the N training images.
Another derivation looks at the projection matrix S_λ = X(XᵀX + λI)⁻¹Xᵀ. Start with the Singular Value Decomposition of X:

In our case, the images will usually span an N-dimensional subspace of the p-dimensional voxel space. To keep things general, let us assume that the images span a k ≤ N dimensional space, i.e., U is N × k, D is diagonal with k strictly positive entries, and V is p × k. The matrices U, V are column-orthonormal, i.e., I_k = UᵀU = VᵀV.

Now,

Lines F.7 and F.10 follow from the fact that {V, (D² + λI_k)} and {U, (D² + λI_N)} are eigensolutions of XᵀX + λI_p and XXᵀ + λI_N, respectively, and that both of these matrices are invertible.

The PDA algorithm is composed of a penalized multi-response regression of the group-indicator matrix Y on an image matrix, followed by the eigenanalysis of YᵀŶ. Thus, if all we need is posterior probabilities at any (new) image x₀, we can operate entirely in the space of observations (much smaller than the space of predictors), once G and G* are precomputed. We need one more step to deal with the centering of the X matrix using only the outer-product matrix G.
Appendix G
Centering the Design Matrix
For PDA, we need to center the matrix X first, before computing G. However, to run the resampling validation studies, the training set will change for each bootstrap (or CV) iteration, and we need to center the validation examples by the training-set means. This would require precomputing the outer-product matrix for each bootstrap iteration separately, defying the computational advantage of this operation. We therefore need to find a way to compute the centered version of G, and a way to center the validation-set matrix, for any selection of training-set examples, given an uncentered G computed using all N uncentered observations.

Let G_All = XXᵀ be the outer-product matrix of all un-centered data. Any given bootstrap/CV sample specifies a subset of rows of X as a training set, with the rest being a validation set. From there, one obtains G and G* to get predictions (Eq. F.4). If G_All is rearranged so that the first N1 columns/rows correspond to the training images, and the last N0 to the validation images, then G and G* are the N1 × N1 upper-left and the N0 × N1 lower-left submatrices of G_All, respectively.

The centering operator associated with any N × p matrix X is:

where 1_N denotes a column N-vector of ones. We want G̃ = X̃X̃ᵀ in terms of G. We have, explicitly:

G̃ = G − ΔG

where (ΔG)_ik = ḡ_i + ḡ_k − ḡ, and ḡ_i, ḡ denote the column (row) means and the overall mean of G, respectively. For G*, with validation points, we proceed similarly, using the column means of G:

where, similarly as before, (ΔG*)_ij = ḡ_j + ḡ*_i − ḡ, and ḡ*_i is the mean of the i-th row of G*.
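The centering identities above can be checked numerically as follows; this is an illustrative NumPy sketch with synthetic data, not thesis code.

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N0, p = 15, 6, 100                       # training / validation images, voxels
X_tr = rng.normal(size=(N1, p))
X_va = rng.normal(size=(N0, p))

G = X_tr @ X_tr.T                            # training outer-product (Gram) matrix
G_star = X_va @ X_tr.T                       # validation-vs-training block

# Center the training Gram matrix: g_ik - rowmean_i - colmean_k + grandmean
rm = G.mean(axis=1, keepdims=True)
cm = G.mean(axis=0, keepdims=True)
gm = G.mean()
G_c = G - rm - cm + gm

# Center the validation block using the TRAINING column means and grand mean
G_star_c = G_star - G_star.mean(axis=1, keepdims=True) - cm + gm
```

Both identities reproduce exactly the Gram matrices of data centered by the training-set means, without ever touching the p-dimensional voxel space.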
Bibliography
R.J. Adler and A.M. Hasofer. Level crossings for random fields. Annals of Probability, 4:1-12, 1976.

B.M. Anderson, T.W. Anderson, and I. Olkin. Maximum likelihood estimators and likelihood ratio criteria in multivariate components of variance. The Annals of Statistics, 14(2):405-417, 1986.

T.W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, second edition, 1984.

T.W. Anderson. Components of variance in MANOVA. In P.R. Krishnaiah, editor, Multivariate Analysis - VI, pages 1-8. Elsevier Science Publishers, 1985.

M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform. IEEE Trans Image Process, 1:205-220, 1992.

B.A. Ardekani, S.C. Strother, J.R. Anderson, I. Law, O.B. Paulson, I. Kanno, and D.A. Rottenberg. On the detection of activation patterns using principal components analysis. In R.E. Carson, M.E. Daube-Witherspoon, and P. Herscovitch, editors, Quantitative Functional Brain Imaging with Positron Emission Tomography, pages 253-257. Academic Press, San Diego, CA, USA, 1998.

N.P. Azari, P. Pietrini, B. Horwitz, K.D. Pettigrew, H.L. Leonard, J.L. Rapoport, M.B. Schapiro, and S.E. Swedo. Individual differences in cerebral metabolic patterns during pharmacotherapy in obsessive-compulsive disorder - a multiple regression discriminant analysis of positron emission tomographic data. Biological Psychiatry, 34(11):795-809, 1993.

M. Barinaga. What makes brain neurons run? Science, 276:196-198, 1997.

R.E. Bellman. Adaptive Control Processes. Princeton University Press, 1961.

C.S. Burrus, R.A. Gopinath, and H. Guo. Introduction to Wavelets and Wavelet Transforms: A Primer. Prentice Hall, 1998.

R.B. Buxton and L.R. Frank. A model for the coupling between cerebral blood flow and oxygen metabolism during neural stimulation. J Cereb Blood Flow Metabol, 17:64-72, 1997.

M.J. Catalan, M. Honda, R.A. Weeks, L.G. Cohen, and M. Hallett. The functional neuroanatomy of simple and complex sequential finger movements: a PET study. Brain, 121:253-264, 1998.

C. Clark, R. Carson, R. Kessler, R. Margolin, M. Buchsbaum, L. DeLisi, C. King, and R. Cohen. Alternative statistical models for the examination of clinical positron emission tomography/fluorodeoxyglucose data. J Cereb Blood Flow Metabol, 5:142-150, 1985.

R.R. Coifman and N. Saito. Constructions of local orthonormal bases for classification and regression. Comptes Rendus Acad. Sci. Paris, Serie I, 319(2):191-196, 1994.

R.R. Coifman and N. Saito. Improved discriminant bases using empirical probability density estimation. In Proceedings, Computing Section of Amer. Statist. Assoc., pages 312-321, 1996.
P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 31:377-403, 1979.

N.A.C. Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, revised edition, 1993.

C. de Boor. A Practical Guide to Splines. Springer-Verlag, New York, 1978.

D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425-455, 1994.

D.L. Donoho and I.M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432):1200-1224, 1995.

R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.

B. Efron and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, 1993.

B. Efron and R.J. Tibshirani. Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92:548-560, 1997.

Bradley Efron. Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78:316-331, 1983.

R.A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179-188, 1936.

P.T. Fox and M.A. Mintun. Noninvasive functional brain mapping by change-distribution analysis of averaged PET images of H2 15O tissue activity. J Nucl Med, 30:141-149, 1989.

R.S.J. Frackowiak, Karl J. Friston, C.D. Frith, R.J. Dolan, and J.C. Mazziotta. Human Brain Function. Academic Press, San Diego, CA, USA, 1997.

Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256-285, 1995.

Y. Freund and R. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148-156, 1996.

J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, in press.

J.H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165-175, 1989.

J.H. Friedman. Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19:1-141, 1991.

J.H. Friedman. An overview of predictive learning and function approximation. In V. Cherkassky, J.H. Friedman, and H. Wechsler, editors, From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NATO ASI Series, pages 1-61. Springer-Verlag, Berlin, 1994.

K.J. Friston. Imaging neuroscience: Principles or maps? Proc Natl Acad Sci, 95:796-802, 1998.

K.J. Friston, C.D. Frith, P.F. Liddle, and R.S.J. Frackowiak. Functional connectivity: The principal component analysis of large (PET) data sets. J Cereb Blood Flow Metabol, 13:5-14, 1993.

K.J. Friston, C.D. Frith, P.F. Liddle, and R.S.J. Frackowiak. Comparing functional (PET) images: The assessment of significant change. J Cereb Blood Flow Metabol, 10:458-466, 1991.
K.J. Friston, A.P. Holmes, K.J. Worsley, J-P. Poline, C.D. Frith, and R.S.J. Frackowiak. Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2:189-210, 1995.

K.J. Friston, J.B. Poline, A.P. Holmes, C.D. Frith, and R.S.J. Frackowiak. A multivariate analysis of PET activation studies. Human Brain Mapping, 4:140-151, 1996.

P.J. Green and B.W. Silverman. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London, 1994.

L.K. Hansen, J. Larsen, F.A. Nielsen, S.C. Strother, E. Rostrup, R. Savoy, N. Lange, J. Sidtis, C. Svarer, and O.B. Paulson. Generalizable patterns in neuroimaging: How many principal components? NeuroImage, 9:534-544, 1999.

A.M. Hasofer and R.J. Adler. Upcrossings of random fields. Advances in Applied Probability (Suppl.), 10:14-21, 1978.

T. Hastie and R. Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society Series B, 55(4):757-796, 1993.

T. Hastie and R. Tibshirani. Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society Series B, 58:155-176, 1996.

T.J. Hastie, A. Buja, and R.J. Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23:73-102, 1995.

T.J. Hastie and R.J. Tibshirani. Generalized Additive Models. Chapman and Hall, 1990.

J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA, 1991.

M. Hintz-Madsen, L.K. Hansen, J. Larsen, M.W. Pedersen, and M. Larsen. Neural classifier construction using regularization, pruning and test error estimation. Neural Networks, 11:1659-1670, 1998.

I.M. Johnstone and B.W. Silverman. Wavelet threshold estimators for data with correlated noise. Journal of the Royal Statistical Society Series B, 59:319-351, 1997.

J.S. Kippenhan, W.W. Barker, J. Nagel, C. Grady, and R. Duara. Neural network classification of normal and Alzheimer's disease subjects using high-resolution and low-resolution PET cameras. J Nucl Med, 35:7-15, 1994.

U. Kjems, S.C. Strother, J.A. Anderson, I. Law, and L.K. Hansen. Enhancing the multivariate signal of [15O]water PET studies with a new nonlinear neuroanatomical registration algorithm. IEEE Trans Med Img, 18:306-319, 1999.

N. Lange, S.C. Strother, J.R. Anderson, F.A. Nielsen, A. Holmes, T. Kolenda, R. Savoy, and L.K. Hansen. Plurality and resemblance in fMRI data analysis. NeuroImage, 10:282-303, 1999.

B. Lautrup, L.K. Hansen, I. Law, N. Mørch, C. Svarer, and S.C. Strother. Massive weight sharing: a cure for extremely ill-posed problems. In H.J. Herrmann, D.E. Wolf, and E. Poppel, editors, Supercomputing in Brain Research: From Tomography to Neural Networks, pages 137-148. World Scientific, 1995.
D. Malonek and A. Grinvald. Interactions between electrical activity and cortical microcirculation revealed by imaging spectroscopy - implications for functional brain mapping. Science, 272(5261):551-554, 1996.

K.V. Mardia, J.T. Kent, and J.M. Bibby. Multivariate Analysis. Academic Press, London, Great Britain, 1979.

P. McCullagh and J.A. Nelder. Generalized Linear Models. Chapman and Hall, London, UK, 2nd edition, 1989.

A.R. McIntosh, F.L. Bookstein, J.V. Haxby, and C.L. Grady. Spatial pattern analysis of functional brain images using partial least squares. NeuroImage, 3:143-157, 1996.

A.R. McIntosh, L. Nyberg, F.L. Bookstein, and E. Tulving. Differential functional connectivity of prefrontal and medial temporal cortices during episodic memory retrieval. Human Brain Mapping, 5:323-327, 1997.

J.R. Moeller and S.C. Strother. A regional covariance approach to the analysis of functional patterns in positron emission tomographic data. J Cereb Blood Flow Metabol, 11:A121-A135, 1991.

J.R. Moeller, S.C. Strother, J.J. Sidtis, and D.A. Rottenberg. Scaled Subprofile Model: A statistical approach to the analysis of functional patterns in positron emission tomography data. J Cereb Blood Flow Metabol, 7:649-658, 1987.

N. Mørch. A Multivariate Approach to Functional Neuromodeling. PhD thesis, Danish Technical University, Lyngby, Denmark, 1998. http://eivind.imm.dtu.dk/publications/phdthesis.html

N. Mørch, L.K. Hansen, S.C. Strother, C. Svarer, D.A. Rottenberg, B. Lautrup, R. Savoy, and O.B. Paulson. Nonlinear versus linear models in functional neuroimaging: Learning curves and generalization crossover. In J. Duncan and G. Gindi, editors, Information Processing in Medical Imaging, volume 1230 of Lecture Notes in Computer Science, pages 259-270. Springer-Verlag, 1997.

F.A. Nielsen, L.K. Hansen, and S.C. Strother. Canonical ridge analysis with ridge parameter optimization. NeuroImage, 7(Part 2 of 3):S758, 1998.

C.R. Noback, N.L. Strominger, and R.J. Demarest. The Human Nervous System: Introduction and Review. Lea & Febiger, 1991.

S. Ogawa, T.M. Lee, A.R. Kay, and D.W. Tank. Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. USA, 87:9868-9872, 1990a.

S. Ogawa, T.M. Lee, A.S. Nayak, and P. Glynn. Oxygenation-sensitive contrast in magnetic resonance image of rodent brain at high fields. Magn. Reson. Med., 14:68-78, 1990b.

R.T. Ogden. Essential Wavelets for Statistical Applications and Data Analysis. Birkhäuser, Boston, 1997.

J.M. Ollinger and J.A. Fessler. Positron-emission tomography. IEEE Signal Processing Magazine, pages 43-55, 1997.

F. O'Sullivan. Discretized Laplacian smoothing by Fourier methods. Journal of the American Statistical Association, 86(415):634-642, 1991.
S. Pajevic, M.E. Daube-Witherspoon, S.L. Bacharach, and R.E. Carson. Noise characteristics of 3-D and 2-D PET images. IEEE Trans Med Img, 17:9-23, 1998.

J.O. Ramsay and B.W. Silverman. Functional Data Analysis. Springer-Verlag, New York, 1997.

C.R. Rao. The utilization of multiple measurements in problems of biological classification (with discussion). Journal of the Royal Statistical Society Series B, 10:159-203, 1948.

K. Rehm, K. Lakshminarayan, S. Frutiger, K.A. Schaper, D.W. Sumners, S.C. Strother, J.R. Anderson, and D.A. Rottenberg. A symbolic environment for visualizing activated foci in functional neuroimaging datasets. Medical Image Analysis, 2:215-226, 1998.

B.D. Ripley. Spatial Statistics. Wiley, New York, 1981.

B.D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, GB, 1996.

D.A. Rottenberg, J.J. Sidtis, S.C. Strother, K.A. Schaper, J.R. Anderson, M.J. Nelson, and R.W. Price. Abnormal cerebral glucose metabolism in HIV-1 seropositives with and without dementia. J Nucl Med, 37:1133-1141, 1996.

U.E. Ruttimann, M. Unser, R.R. Rawlings, D. Rio, N.F. Ramsey, D.W. Hommer, J.A. Frank, and D.R. Weinberger. Statistical analysis of functional MRI data in the wavelet domain. IEEE Trans Med Img, 17(2):142-154, 1998.

N. Sadato, G. Campbell, V. Ibañez, M-P. Deiber, and M. Hallett. Complexity affects regional cerebral blood flow change during sequential finger movements. Journal of Neuroscience, 16(8):2693-2700, 1996.

S.C. Strother, J.R. Anderson, K.A. Schaper, J.J. Sidtis, J-S. Liow, R.P. Woods, and D.A. Rottenberg. Principal component analysis and the scaled subprofile model compared to intersubject averaging and statistical parametric mapping: I. "Functional connectivity" with [15O]water PET. J Cereb Blood Flow Metabol, 15:738-753, 1995a.

S.C. Strother, J.R. Anderson, X-L. Xu, J-S. Liow, D.C. Bonar, and D.A. Rottenberg. Quantitative comparisons of image registration techniques based on high-resolution MRI of the brain. J Comput Assist Tomogr, 18:954-962, 1994.

S.C. Strother, I. Kanno, and D.A. Rottenberg. Principal component analysis, variance partitioning and "functional connectivity". J Cereb Blood Flow Metabol, 15:353-360, 1995b.

S.C. Strother, N. Lange, J.R. Anderson, K.A. Schaper, K. Rehm, L.K. Hansen, and D.A. Rottenberg. Activation pattern reproducibility: measuring the effects of group size and data analysis models. Human Brain Mapping, 5:312-316, 1997.

S.C. Strother, N. Lange, R.L. Savoy, J.R. Anderson, J.J. Sidtis, L.K. Hansen, P.A. Bandettini, K. O'Craven, M. Rezza, B.R. Rosen, and D.A. Rottenberg. Multidimensional state-spaces for fMRI and PET activation studies. NeuroImage, 3(2):S98, 1996.

S.C. Strother, K. Rehm, N. Lange, J.R. Anderson, K.A. Schaper, L.K. Hansen, and D.A. Rottenberg. Measuring activation pattern reproducibility using resampling techniques. In R.E. Carson, M.E. Daube-Witherspoon, and P. Herscovitch, editors, Quantitative Functional Brain Imaging with Positron Emission Tomography, pages 241-246. Academic Press, San Diego, 1998a.
S.C. Strother, K. Rehm, N. Lange, J.R. Anderson, K.A. Schaper, L.K. Hansen, and D.A. Rottenberg. Measuring activation pattern reproducibility using resampling techniques. In R.E. Carson, M.E. Daube-Witherspoon, and P. Herscovitch, editors, Quantitative Functional Brain Imaging with Positron Emission Tomography, pages 253-257. Academic Press, San Diego, CA, 1998b.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1):267-288, 1996.

V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.

B. Vidakovic. Statistical Modeling by Wavelets. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., 1999.

H.D. Vinod. Canonical ridge and econometrics of joint production. Journal of Econometrics, 4:147-166, 1976.

G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, PA, 1990.

S. Wold, A. Ruhe, H. Wold, and W.J. Dunn. The collinearity problem in linear regression: The partial least squares (PLS) approach to generalized inverses. SIAM Journal on Scientific and Statistical Computing, 5(3):735-743, 1984.

R.P. Woods, S.R. Cherry, and J.C. Mazziotta. A rapid automated algorithm for accurately aligning and reslicing positron emission tomography images. J Comput Assist Tomogr, 16:620-633, 1992.

R.P. Woods, S.T. Grafton, J.D. Watson, N.L. Sicotte, and J.C. Mazziotta. Automated image registration: II. Intersubject validation of linear and nonlinear models. J Comput Assist Tomogr, 22:153-165, 1998.

R.P. Woods, J.C. Mazziotta, and S.R. Cherry. Automated image registration. In K. Uemura, N.A. Lassen, T. Jones, and I. Kanno, editors, Proceedings Brain PET '93 AKITA: Quantification of Brain Function, pages 391-400. Excerpta Medica, Amsterdam, 1993.

K.J. Worsley, J-B. Poline, K.J. Friston, and A.C. Evans. Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage, 6:305-319, 1997.

K.J. Worsley, A.C. Evans, S. Marrett, and P. Neelin. A three-dimensional statistical analysis for CBF activation studies in human brain. J Cereb Blood Flow Metabol, 12:900-918, 1992.

K.J. Worsley, S. Marrett, P. Neelin, A.C. Vandal, K.J. Friston, and A.C. Evans. A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping, 4:58-73, 1996.

G.A. Wright. Magnetic resonance imaging. IEEE Signal Processing Magazine, pages 56-66, 1997.