

Biometrics DOI: 10.1111/biom.12633

Hypothesis Testing of Matrix Graph Model with Application to Brain Connectivity Analysis

Yin Xia^{1,2,*} and Lexin Li^{3,**}

1 Department of Statistics, School of Management, Fudan University, Shanghai 200433, China
2 Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A.
3 Division of Biostatistics, University of California at Berkeley, Berkeley, California 94720, U.S.A.
* email: [email protected]
** email: [email protected]

Summary. Brain connectivity analysis is now at the foreground of neuroscience research. A connectivity network is characterized by a graph, where nodes represent neural elements such as neurons and brain regions, and links represent statistical dependence that is often encoded in terms of partial correlation. Such a graph is inferred from matrix-valued neuroimaging data such as electroencephalography and functional magnetic resonance imaging. There have been a good number of successful proposals for sparse precision matrix estimation under normal or matrix normal distribution; however, this family of solutions does not offer a direct statistical significance quantification for the estimated links. In this article, we adopt a matrix normal distribution framework and formulate the brain connectivity analysis as a precision matrix hypothesis testing problem. Based on the separable spatial-temporal dependence structure, we develop oracle and data-driven procedures to test both the global hypothesis that all spatial locations are conditionally independent, and simultaneous tests for identifying conditionally dependent spatial locations with false discovery rate control. Our theoretical results show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality properties. The empirical finite-sample performance of the proposed tests is studied via intensive simulations, and the new tests are applied to a real electroencephalography data analysis.

Key words: Brain connectivity analysis; False discovery rate; Gaussian graphical model; Matrix-variate normal distribution; Multiple testing.

1. Introduction

Matrix-valued data are recently becoming ubiquitous in a wide range of scientific and business applications (Allen and Tibshirani, 2010, 2012; Reiss and Ogden, 2010; Aston and Kirch, 2012; Leng and Tang, 2012; Yin and Li, 2012; Tsiligkaridis et al., 2013; Zhou and Li, 2014, among others). Accordingly, the matrix normal distribution is becoming increasingly popular in modeling matrix-variate observations (Zhou, 2014). Our motivating example is an electroencephalography (EEG) study of 77 alcoholic individuals and 45 controls. For each subject, the voltage values from 61 electrodes placed at various scalp locations were recorded at a rate of 256 Hz for 1 second, resulting in a 61 × 256 matrix. The scientific objective is to infer the connectivity patterns among the 61 spatial locations for both the alcoholic and control groups. This study embodies a more general class of applications of mapping brain connectivity networks, which is now at the foreground of neuroscience research. The overarching goal is to infer the brain network, characterized by a graph, where the nodes represent neural elements such as neurons and brain regions, and the links encode statistical dependency between those elements. Partial correlations, conveyed by a precision matrix, are frequently employed to describe such statistical dependence (Fornito et al., 2013), and this precision matrix, in turn, is to be derived from matrix-valued imaging data, such as EEG and functional magnetic resonance imaging.

Adopting the matrix normal distribution framework, we formulate brain connectivity network analysis as a precision matrix inference problem. Specifically, let $X \in \mathbb{R}^{p\times q}$ denote the spatial-temporal matrix data from an imaging modality, for example, EEG. It is assumed to follow a matrix normal distribution with the Kronecker product covariance structure, $\mathrm{Cov}\{\mathrm{vec}(X)\} = \Sigma_S \otimes \Sigma_T$, where $\mathrm{vec}(X)$ stacks the columns of the matrix $X$ into a vector, $\otimes$ is the Kronecker product, $\Sigma_S \in \mathbb{R}^{p\times p}$ denotes the covariance matrix of the $p$ spatial locations, and $\Sigma_T \in \mathbb{R}^{q\times q}$ denotes the covariance matrix of the $q$ time points. Correspondingly,
$$\mathrm{Cov}^{-1}\{\mathrm{vec}(X)\} = \Sigma_S^{-1} \otimes \Sigma_T^{-1} = \Omega_S \otimes \Omega_T,$$
where $\Omega_S \in \mathbb{R}^{p\times p}$ and $\Omega_T \in \mathbb{R}^{q\times q}$ represent the spatial and temporal precision matrices, respectively. In brain connectivity analysis, our primary interest is to infer the connectivity network characterized by the spatial precision matrix $\Omega_S$. By contrast, the temporal precision matrix $\Omega_T$ is of little interest here and is to be treated as a nuisance parameter. We also make some remarks regarding the assumptions of our adopted framework. First, the matrix normal assumption has



been frequently adopted in numerous applications (Leng and Tang, 2012; Yin and Li, 2012), and is scientifically plausible in neuroimaging analysis. For instance, the majority of standard neuroimaging processing software, such as SPM (Friston et al., 2007) and FSL (Smith et al., 2004), adopts a framework that assumes the data are normally distributed per voxel (location) with a noise factor and an autoregressive structure, which shares a similar spirit as the matrix normal formulation. Second, it is commonly assumed that the precision matrix is sparse, which we also adopt for our inferential procedure. Again, this sparsity assumption is scientifically justifiable, as it is known that brain region connections are energy consuming (Raichle and Gusnard, 2002; Fox and Raichle, 2007), and biological units tend to minimize energy-consuming activities (Bullmore and Sporns, 2009).
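To fix ideas, the Kronecker covariance structure described above, and the factorization of its inverse into the spatial and temporal precision matrices, can be illustrated numerically as follows. This is only a minimal sketch: the toy dimensions, covariances, and random seed are ours and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 3

# Toy spatial (p x p) and temporal (q x q) covariances, both positive definite.
Sigma_S = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Sigma_T = 0.4 ** np.abs(np.subtract.outer(np.arange(q), np.arange(q)))

# One matrix normal draw: X = A Z B^T with A A^T = Sigma_S and B B^T = Sigma_T,
# so the rows of X share the spatial covariance and the columns the temporal one.
A = np.linalg.cholesky(Sigma_S)
B = np.linalg.cholesky(Sigma_T)
X = A @ rng.standard_normal((p, q)) @ B.T

# The inverse of the Kronecker product covariance factorizes into the
# Kronecker product of the spatial and temporal precision matrices.
Omega_S = np.linalg.inv(Sigma_S)
Omega_T = np.linalg.inv(Sigma_T)
assert np.allclose(np.linalg.inv(np.kron(Sigma_S, Sigma_T)),
                   np.kron(Omega_S, Omega_T))
```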

In this article, we aim to address the following two hypothesis testing questions. The first is to test if all spatial locations are conditionally independent; namely, we test
$$H_0: \Omega_S \text{ is diagonal} \quad \text{versus} \quad H_1: \Omega_S \text{ is not diagonal}. \qquad (1)$$
The second is to identify those conditionally dependent pairs of locations with false discovery rate (FDR) and false discovery proportion (FDP) control; that is, we simultaneously test
$$H_{0,i,j}: \omega_{S,i,j} = 0 \quad \text{versus} \quad H_{1,i,j}: \omega_{S,i,j} \neq 0, \quad \text{for } 1 \le i < j \le p, \qquad (2)$$

where $\omega_{S,i,j}$ is the $(i,j)$th element of $\Omega_S$. In the literature, there have been a good number of methods proposed to estimate a sparse precision matrix under the normal distribution (Meinshausen and Buhlmann, 2006; Yuan and Lin, 2007; Friedman et al., 2008; Ravikumar et al., 2011; Cai et al., 2011). There are extensions to multiple precision matrices (Danaher et al., 2014; Zhu et al., 2014), the nonparanormal distribution (Meinshausen and Buhlmann, 2006; Yuan and Lin, 2007; Friedman et al., 2008; Cai et al., 2011; Ravikumar et al., 2011), and matrix-valued graph models (Leng and Tang, 2012; Yin and Li, 2012; Zhou, 2014). However, all those methods tackle the estimation aspect of the problem and induce a network from an estimated precision matrix. Only recently have hypothesis testing procedures begun to emerge (Drton and Perlman, 2007; Liu, 2013; Chen and Liu, 2015; Narayan et al., 2015; Xia et al., 2015).

We aim at hypothesis testing for the spatial precision matrix $\Omega_S$ under the matrix normal framework. We separate the spatial and temporal dependence structures, and consider two scenarios. One is to assume the temporal covariance $\Sigma_T$ is known, and we term the method an oracle procedure. The other is to use a data-driven approach to estimate and plug in $\Sigma_T$, and accordingly, we term it a data-driven procedure. We construct test statistics based on the covariances between the residuals from the inverse regression models. We develop a global test for (1) based on the derived limiting null distribution, and show it is particularly powerful against sparse alternatives. We then develop a multiple testing procedure for (2), and show that the FDR and FDP are controlled asymptotically. The data-driven procedures perform asymptotically as well as the oracle procedures, and enjoy certain optimality under some regularity conditions. Our intensive numerical analysis supports such findings.

Our contributions and novelty are multi-fold. First, brain connectivity analysis is now becoming a central goal of neuroscience research (Fornito et al., 2013), and it constantly calls for statistical significance quantification of the inferred connections between neural elements. However, there is a paucity of systematic hypothesis testing solutions developed for this type of problem, and our proposal offers a timely response. Second, there have been some successful proposals for matrix data network estimation, most notably, Leng and Tang (2012); Yin and Li (2012); Zhou (2014). That class of solutions and our proposal can both produce, in effect, a sparse representation of the network structure. However, the former tackles network estimation, whereas our method targets network hypothesis testing, which is an utterly different problem from estimation. The key to estimation is to seek a bias-variance tradeoff, and many common sparse graph estimators are actually biased. Such methods do not produce a direct quantification of statistical significance for individual network edges. Although they often enjoy a high true positive discovery rate (power), there is no explicit control of the false positive rate (significance level). By contrast, our hypothesis testing solution starts with a nearly unbiased estimator, produces an explicit significance quantification, and achieves a high power under an explicit significance level. Third, our proposal is also distinctly different from the few existing graph model hypothesis testing solutions. In particular, Liu (2013) developed one-sample testing procedures under a vector normal distribution. Directly applying Liu (2013) in our context is equivalent to assuming the columns of the spatial-temporal matrix $X \in \mathbb{R}^{p\times q}$ are i.i.d. vector normal. This is clearly not true, as the measurements at different time points are temporally highly correlated. A preprocessing step of whitening as suggested in Narayan and Allen (2015) can help induce independent columns, which basically uses the $n$ samples to estimate a $q \times q$ temporal covariance matrix at each spatial location. By contrast, our testing procedures are built upon a linear transformation of the data, $X\Sigma_T^{-1/2}$, and pool the $np$ correlated samples to estimate $\Sigma_T$. Crucial to our solution is that the pooled estimator of $\Sigma_T$ guarantees the required convergence rate, which in turn ensures the data-driven procedures perform asymptotically as well as the oracle procedures as if $\Sigma_T$ were known. Conventional whitening, however, does not guarantee this convergence rate, and furthermore, is computationally more expensive. See also Remark 1 in Section 2.1. In Section 4, we numerically compare with Liu (2013), with and without whitening, and demonstrate the clear advantage of our solution. In a more recent parallel work, Chen and Liu (2015) studied a similar one-sample testing problem of matrix-variate graphs. However, their work differs from ours in several ways. We study both the global test and the entry-wise test and focus on the spatial precision matrix inference, whereas Chen and Liu (2015) only considered the entry-wise test and targeted both the spatial and temporal precision matrices. Our solution is invariant with respect to a constant factor of the estimate of $\Sigma_T$, and thus we do not require a strict estimate as in Chen and Liu (2015). Moreover, in their construction of test statistics, Chen and Liu (2015) required both


a bias correction and a variance correction. The variance correction involves estimation of the Frobenius norm and the trace of $\Sigma_S$, and can be challenging when the spatial dimension $p$ is large. Our method, however, separates the spatial and temporal structures and treats $\Sigma_T$ as a nuisance. The resulting test statistics do not require any variance correction. See also Remark 4 in Section 2.2. Finally, both Xia et al. (2015) and Narayan et al. (2015) studied the two-sample testing problem, but Xia et al. (2015) only considered the vector normal case, and Narayan et al. (2015) turned the matrix data into vector normal data by whitening and used a bootstrap resampling method for inference.

The rest of the article is organized as follows. Section 2 develops the testing procedures, and Section 3 studies their asymptotic properties. Section 4 presents intensive simulation studies, and Section 5 analyzes the motivating EEG data. All technical conditions and proofs, along with additional simulations, are relegated to the online supplement.

2. Methodology

2.1. Separation of Spatial-Temporal Dependency

Let $X_k \in \mathbb{R}^{p\times q}$, $k = 1, \ldots, n$, denote $n$ i.i.d. random samples from a matrix normal distribution with mean zero and covariance matrix $\Sigma = \Sigma_S \otimes \Sigma_T$. Note that $\Sigma_S$ and $\Sigma_T$ are not identifiable; however, our aim is to identify the set of nonzero entries of the spatial graph, and this set is invariant up to a constant. Motivated by brain functional connectivity analysis, our goal is to infer about $\Omega_S = \Sigma_S^{-1}$, while treating $\Sigma_T$, or equivalently $\Omega_T$, as a nuisance. Our first step is to separate the spatial and temporal dependence structures. We build the test statistics based on the linear transformation of the original samples, $\{X_k\Sigma_T^{-1/2}, k = 1, \ldots, n\}$, first assuming the temporal covariance $\Sigma_T$ is known, then plugging in a legitimate estimator $\widehat{\Sigma}_T$. There are multiple ways to estimate $\Sigma_T$, or $\Sigma_T^{-1}$, as long as the resulting estimator satisfies the regularity condition (C4) as given in Web Appendix A. Examples include the usual sample covariance estimator, the banded estimator (Bickel and Levina, 2008), and the adaptive thresholding estimator (Cai and Liu, 2011).
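As an illustration of the banded option, a minimal sketch of the banding operator of Bickel and Levina (2008) applied to a $q \times q$ sample covariance is given below; the function name and bandwidth argument are ours, and the exact estimator used later may differ in details such as how the bandwidth is selected.

```python
import numpy as np

def band_covariance(S, k):
    """Banding operator (Bickel and Levina, 2008), sketch form: zero out
    every entry of the sample covariance S that lies more than k positions
    away from the diagonal; k is a tuning parameter (chosen by sample
    splitting in the data analysis of Section 5)."""
    q = S.shape[0]
    dist = np.abs(np.arange(q)[:, None] - np.arange(q)[None, :])
    return np.where(dist <= k, S, 0.0)
```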

Remark 1. Our separation step through a pooled estimator of $\Sigma_T$ can be viewed, conceptually, as matrix-variate whitening. However, it notably differs from conventional whitening. Crucial to our proposed solution is that our test statistics are built upon a nearly unbiased estimator of $\omega_{S,i,j}$, with a convergence rate $o_p\{(nq\log p)^{-1/2}\}$ as specified in Lemma 2 in Web Appendix B. For the data-driven testing procedures, to guarantee the same convergence rate and to ensure nearly the same asymptotic properties as the oracle procedures, we require the estimator of $\Sigma_T$ to satisfy the regularity condition (C4), in terms of a certain convergence rate of some norm of $\widehat{\Sigma}_T^{-1/2} - c\Sigma_T^{-1/2}$, with $c > 0$ a constant. Our estimator obtained by pooling the $np$ samples satisfies this requirement. Due to the correlations among the $np$ samples, the pooled estimator of $\Sigma_T$ is unbiased only up to a constant. However, our test statistics, by construction, are not affected by this constant. By contrast, the conventional whitening procedure seeks an unbiased estimator of the temporal covariance based on the $n$ samples, which does not satisfy the estimation rate of (C4). Consequently, it cannot guarantee the asymptotic performance of the data-driven testing procedures.
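For concreteness, one natural pooled estimator of the temporal covariance, consistent with the expectation identity used in Remark 3 below, is sketched here under the mean-zero assumption; the function name is ours, and the paper's own estimator may differ in details such as centering.

```python
import numpy as np

def pooled_temporal_cov(X_list):
    """Pool the p rows of each of the n subjects to estimate the q x q
    temporal covariance (a sketch). For mean-zero matrix normal data,
    E[X^T X] = trace(Sigma_S) * Sigma_T, so this estimator is unbiased
    for Sigma_T only up to the constant trace(Sigma_S)/p, a constant the
    final test statistics are insensitive to (see Remark 3)."""
    n = len(X_list)
    p, q = X_list[0].shape
    return sum(X.T @ X for X in X_list) / (n * p)
```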

2.2. Test Statistics

We first develop test statistics in the oracle case when $\Sigma_T$ is known. The development of the data-driven statistics with an estimated $\Sigma_T$ is similar, so we omit the details but point out the difference between the oracle and data-driven cases. For simplicity, we also use the same set of notations for the two scenarios, and only differentiate them when we study their respective asymptotic properties in Section 3.

It is well established that, under the normal distribution, the precision matrix can be described in terms of regression models (Anderson, 2003, Section 2.5). Specifically, let $Y_k = X_k\Sigma_T^{-1/2}$, $k = 1, \ldots, n$, denote the transformed samples. We have
$$Y_{k,i,l} = Y_{k,-i,l}^{\top}\beta_i + \epsilon_{k,i,l}, \quad 1 \le i \le p, \ 1 \le l \le q, \qquad (3)$$
where $\epsilon_{k,i,l} \sim N(0, \sigma_{S,i,i} - \Sigma_{S,i,-i}\Sigma_{S,-i,-i}^{-1}\Sigma_{S,-i,i})$ is independent of $Y_{k,-i,l}$, and the subscript "$-i$" means the $i$th entry is removed for a vector, or the $i$th row/column is removed for a matrix. The regression coefficient $\beta_i = (\beta_{1,i}, \ldots, \beta_{p-1,i})$ and the error $\epsilon_{k,i,l}$ satisfy
$$\beta_i = -\omega_{S,i,i}^{-1}\Omega_{S,-i,i}, \quad \text{and} \quad r_{i,j} = \mathrm{Cov}(\epsilon_{k,i,l}, \epsilon_{k,j,l}) = \frac{\omega_{S,i,j}}{\omega_{S,i,i}\,\omega_{S,j,j}}.$$

As such, the elements $\omega_{S,i,j}$ of $\Omega_S$ can be represented in terms of $r_{i,j}$. Next, we construct an estimator of $r_{i,j}$ and its bias-corrected version. We then build on this estimator to obtain an estimator of $\omega_{S,i,j}$, upon which we further develop our test statistics.

A natural estimator of $r_{i,j}$ is the sample covariance between the residuals $\hat\epsilon_{k,i,l} = Y_{k,i,l} - \bar{Y}_{i,l} - (Y_{k,-i,l} - \bar{Y}_{\cdot,-i,l})^{\top}\hat\beta_i$, i.e., $\tilde{r}_{i,j} = \sum_{k=1}^{n}\sum_{l=1}^{q}\hat\epsilon_{k,i,l}\hat\epsilon_{k,j,l}/(nq)$, where $\bar{Y}_{i,l} = \sum_{k=1}^{n}Y_{k,i,l}/n$, $\bar{Y}_{\cdot,-i,l} = \sum_{k=1}^{n}Y_{k,-i,l}/n \in \mathbb{R}^{(p-1)\times 1}$, and $\hat\beta_i$, $i = 1, \ldots, p$, are estimators of $\beta_i$ that satisfy the regularity conditions (C1) or (C1$'$) given in Web Appendix A. Such estimators can be obtained via standard estimation methods such as the Lasso and the Dantzig selector. When $i \neq j$, however, $\tilde{r}_{i,j}$ tends to be biased, due to the correlation induced by the estimated parameters. To correct such bias, following Lemma 2 in Web Appendix B, we have

$$\tilde{r}_{i,j} = R_{i,j} - r_{i,i}(\hat\beta_{i,j} - \beta_{i,j}) - r_{j,j}(\hat\beta_{j-1,i} - \beta_{j-1,i}) + o_p\{(nq\log p)^{-1/2}\},$$
where $R_{i,j}$ is the empirical covariance between $\{\epsilon_{k,i,l}, k = 1, \ldots, n, l = 1, \ldots, q\}$ and $\{\epsilon_{k,j,l}, k = 1, \ldots, n, l = 1, \ldots, q\}$. For $1 \le i < j \le p$, $\beta_{i,j} = -\omega_{S,i,j}/\omega_{S,j,j}$ and $\beta_{j-1,i} = -\omega_{S,i,j}/\omega_{S,i,i}$.

Thus, a bias-corrected estimator of $r_{i,j}$ can be constructed as $\hat{r}_{i,j} = -(\tilde{r}_{i,j} + \tilde{r}_{i,i}\hat\beta_{i,j} + \tilde{r}_{j,j}\hat\beta_{j-1,i})$, for $1 \le i < j \le p$. When $i = j$, we let $\hat{r}_{i,i} = \tilde{r}_{i,i}$, which is a nearly unbiased estimator of $r_{i,i}$. Consequently, an estimator of the entry $\omega_{S,i,j}$ of the spatial precision matrix $\Omega_S$ can be constructed as
$$T_{i,j} = \frac{\hat{r}_{i,j}}{\hat{r}_{i,i}\,\hat{r}_{j,j}}, \quad 1 \le i < j \le p.$$


To further estimate the variance of $T_{i,j}$, note that
$$\theta_{i,j} = \mathrm{Var}\{\epsilon_{k,i}\epsilon_{k,j}/(r_{i,i}r_{j,j})\}/(nq) = (1 + \rho_{i,j}^2)/(nq\,r_{i,i}r_{j,j}), \qquad (4)$$
where $\rho_{i,j}^2 = \beta_{i,j}^2\,r_{i,i}/r_{j,j}$. Then, $\theta_{i,j}$ can be estimated by $\hat\theta_{i,j} = (1 + \hat\beta_{i,j}^2\,\hat{r}_{i,i}/\hat{r}_{j,j})/(nq\,\hat{r}_{i,i}\hat{r}_{j,j})$. Given that $\{T_{i,j}, 1 \le i < j \le p\}$ are heteroscedastic and can possibly have a wide variability, we standardize $T_{i,j}$ by its standard error, which leads to the standardized statistics
$$W_{i,j} = T_{i,j}/\hat\theta_{i,j}^{1/2}, \quad 1 \le i < j \le p.$$

In the next section, we test hypotheses (1) and (2) based on $\{W_{i,j}\}_{i,j=1}^{p}$.
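To make the construction above concrete, here is a minimal sketch of the oracle statistics $W_{i,j}$, assuming the node-wise regression residuals and coefficient estimates have already been computed; the array layout and function name are ours, not the paper's.

```python
import numpy as np

def standardized_statistics(resid, B):
    """Sketch of the W_{ij} statistics of Section 2.2.

    resid : (N, p) array of node-wise regression residuals, with N = n*q
            pooled transformed samples.
    B     : (p, p) array with zero diagonal; B[l, m] is the fitted
            coefficient of variable l in the regression of variable m, so
            B[i, j] plays the role of beta_{i,j} and B[j, i] that of
            beta_{j-1,i} for i < j.
    """
    N, p = resid.shape
    resid = resid - resid.mean(axis=0)
    r_tilde = resid.T @ resid / N          # naive estimators r~_{ij}
    W = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            # bias-corrected covariance estimator r^_{ij}
            r_hat = -(r_tilde[i, j]
                      + r_tilde[i, i] * B[i, j]
                      + r_tilde[j, j] * B[j, i])
            T = r_hat / (r_tilde[i, i] * r_tilde[j, j])
            theta = (1.0 + B[i, j] ** 2 * r_tilde[i, i] / r_tilde[j, j]) \
                    / (N * r_tilde[i, i] * r_tilde[j, j])
            W[i, j] = W[j, i] = T / np.sqrt(theta)
    return W
```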

Remark 2. Construction of the test statistics for the data-driven procedure is almost the same as that for the oracle procedure, except that the data-driven one replaces the transformed samples $Y_k = X_k\Sigma_T^{-1/2}$ in (3) with a plug-in estimator for $\Sigma_T$. Furthermore, the regression coefficients vary slightly at different time points in the data-driven scenario, and we replace (3) by $Y_{k,i,l} = Y_{k,-i,l}^{\top}\beta_{i,l} + \epsilon_{k,i,l}$, for $1 \le i \le p$, $1 \le l \le q$.

Remark 3. When $\Sigma_T$ is unknown, $E(\widehat{\Sigma}_T) = \{\mathrm{trace}(\Sigma_S)/p\}\Sigma_T$. If $\mathrm{trace}(\Sigma_S) = cp$, an unbiased estimator of $\Sigma_T$ becomes $\widehat{\Sigma}_T/c$. Accordingly, we shall define the transformed data $Y_k = \sqrt{c}\,X_k\widehat{\Sigma}_T^{-1/2}$, for $k = 1, \ldots, n$. Then, we have the bias-corrected estimator $\hat{r}_{i,j}^{\mathrm{new}} = c\,\hat{r}_{i,j}$, which in turn leads to $T_{i,j}^{\mathrm{new}} = T_{i,j}/c$ and $\hat\theta_{i,j}^{\mathrm{new}} = \hat\theta_{i,j}/c^{2}$. Thus, the standardized statistic $W_{i,j}$ remains the same, as the constant $c$ is canceled. Therefore, $c$ does not affect our final test statistics, and for simplicity, we set $c = 1$ from the beginning.
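Spelled out, the cancellation claimed in Remark 3 is a one-line check using the scalings just stated:
$$W_{i,j}^{\mathrm{new}} = \frac{T_{i,j}^{\mathrm{new}}}{(\hat\theta_{i,j}^{\mathrm{new}})^{1/2}} = \frac{T_{i,j}/c}{(\hat\theta_{i,j}/c^{2})^{1/2}} = \frac{T_{i,j}}{\hat\theta_{i,j}^{1/2}} = W_{i,j}, \qquad c > 0.$$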

Remark 4. Thanks to the separation of the spatial and temporal covariance structures, the errors $\epsilon_{k,i,l}$, $k = 1, \ldots, n$, $l = 1, \ldots, q$, are independent of each other for each $i = 1, \ldots, p$. As such, the variance of the estimator $T_{i,j}$ can be approximated by the variance of the products of residuals as in (4). On the other hand, Chen and Liu (2015) did not separate the spatial and temporal structures, and performed the inverse regression directly. As a result, for each spatial location, the errors of the corresponding $nq$ linear models are correlated, and the variance of $T_{i,j}$ can no longer be approximated by (4). Therefore, their test statistics require an additional variance correction, whereas ours do not.

2.3. Global Testing Procedure

We propose the following test statistic for the global null hypothesis $H_0: \Omega_S$ is diagonal:
$$M_{nq} = \max_{1 \le i < j \le p} W_{i,j}^{2}.$$
Furthermore, let $q_\alpha = -\log(8\pi) - 2\log\log(1-\alpha)^{-1}$. We define the global test $\Phi_\alpha$ by
$$\Phi_\alpha = I(M_{nq} \ge q_\alpha + 4\log p - \log\log p),$$
where $I(\cdot)$ is the indicator function. The hypothesis $H_0$ is rejected whenever $\Phi_\alpha = 1$.

The above test is developed based on the asymptotic properties of $M_{nq}$, which will be studied in detail in Section 3.1. Intuitively, $\{W_{i,j}\}_{i,j=1}^{p}$ are approximately standard normal variables under $H_0$, and are only weakly dependent under suitable conditions. Thus, $M_{nq}$ is the maximum of the squares of $p(p-1)/2$ such variables, and its value should be close to $2\log\{p(p-1)/2\} \approx 4\log p$. We will later show that, under certain regularity conditions, $M_{nq} - 4\log p + \log\log p$ converges to a type I extreme value distribution under $H_0$.
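As an illustration, a minimal sketch of the global test, together with the approximate p-value implied by the Gumbel limit studied in Section 3.1, is given here; the function and variable names are ours.

```python
import numpy as np

def global_test(W, alpha=0.05):
    """Global test of Section 2.3 (sketch): reject H0 iff M_nq exceeds
    q_alpha + 4*log(p) - log(log(p)); also return the approximate
    p-value implied by the type I extreme value limit."""
    p = W.shape[0]
    iu = np.triu_indices(p, k=1)
    M_nq = np.max(W[iu] ** 2)
    q_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1.0 / (1.0 - alpha)))
    reject = M_nq >= q_alpha + 4 * np.log(p) - np.log(np.log(p))
    t_stat = M_nq - 4 * np.log(p) + np.log(np.log(p))
    p_value = 1 - np.exp(-np.exp(-t_stat / 2) / np.sqrt(8 * np.pi))
    return reject, p_value
```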

2.4. Multiple Testing Procedure

Next, we develop a multiple testing procedure, based on $W_{i,j}$, for $H_{0,i,j}: \omega_{S,i,j} = 0$, so as to identify the spatial locations that are conditionally dependent. Let $t$ be the threshold level such that $H_{0,i,j}$ is rejected if $|W_{i,j}| \ge t$. Let $\mathcal{H}_0 = \{(i,j): \omega_{S,i,j} = 0, 1 \le i < j \le p\}$ be the set of true nulls. Denote by $R_0(t) = \sum_{(i,j)\in\mathcal{H}_0} I(|W_{i,j}| \ge t)$ and $R(t) = \sum_{1\le i<j\le p} I(|W_{i,j}| \ge t)$ the total numbers of false positives and rejections, respectively. The FDP and FDR are defined as
$$\mathrm{FDP}(t) = \frac{R_0(t)}{R(t) \vee 1}, \qquad \mathrm{FDR}(t) = E\{\mathrm{FDP}(t)\}.$$
An ideal choice of $t$ would reject as many true positives as possible while controlling the FDP at the pre-specified level $\alpha$. That is, we select $t_0 = \inf\{0 \le t \le 2(\log p)^{1/2} : \mathrm{FDP}(t) \le \alpha\}$. We shall estimate $\sum_{(i,j)\in\mathcal{H}_0} I\{|W_{i,j}| \ge t\}$ by $2\{1 - \Phi(t)\}|\mathcal{H}_0|$, where $\Phi(t)$ is the standard normal cumulative distribution function. Note that $|\mathcal{H}_0|$ is at most $(p^2 - p)/2$, and is close to $(p^2 - p)/2$ due to the sparsity of $\Omega_S$. This leads to the following multiple testing procedure.

Step 1. Calculate the test statistics $W_{i,j}$.
Step 2. For given $0 \le \alpha \le 1$, calculate
$$\hat{t} = \inf\left\{0 \le t \le 2(\log p)^{1/2} : \frac{2\{1 - \Phi(t)\}(p^2 - p)/2}{R(t) \vee 1} \le \alpha\right\}.$$
If $\hat{t}$ does not exist, set $\hat{t} = 2(\log p)^{1/2}$.
Step 3. For $1 \le i < j \le p$, reject $H_{0,i,j}$ if and only if $|W_{i,j}| \ge \hat{t}$.
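A minimal sketch of Steps 1–3, using the observed statistic values as candidate thresholds (a common discrete approximation to the infimum; the names are ours), follows.

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(W, alpha=0.01):
    """Multiple testing procedure of Section 2.4 (sketch): find the
    smallest threshold t at which the estimated FDP,
    2{1 - Phi(t)}(p^2 - p)/2 / (R(t) v 1), falls below alpha."""
    p = W.shape[0]
    w = np.abs(W[np.triu_indices(p, k=1)])
    t_max = 2 * np.sqrt(np.log(p))
    for t in np.sort(w[w <= t_max]):           # candidate thresholds
        R = np.sum(w >= t)
        fdp_est = 2 * (1 - norm.cdf(t)) * (p * p - p) / 2 / max(R, 1)
        if fdp_est <= alpha:
            return t                            # plays the role of t-hat
    return t_max                                # no valid t found
```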

3. Theoretical Properties

In this section, we analyze the theoretical properties of the global and multiple testing procedures for both the oracle and data-driven scenarios. We show that the data-driven procedures perform asymptotically as well as the oracle procedures and enjoy certain optimality under the regularity conditions. For separate treatment of the oracle and data-driven procedures, we now distinguish the notations of the two, and add the superscript "o" to denote the statistics and tests of the oracle procedures, e.g., $\hat\beta_i^{o}$, $M_{nq}^{o}$, $\Phi_\alpha^{o}$, $\hat{t}^{o}$, and the superscript "d" to denote those of the data-driven procedures, e.g., $\hat\beta_i^{d}$, $M_{nq}^{d}$, $\Phi_\alpha^{d}$, and $\hat{t}^{d}$. In the interest of space, we present all the regularity conditions and their discussion in Web Appendix A.


3.1. Oracle Global Testing Procedure

We first analyze the limiting null distribution of the oracle global test statistic $M_{nq}^{o}$ and the power of the corresponding test $\Phi_\alpha^{o}$. We are particularly interested in the power of the test under the alternatives when $\Omega_S$ is sparse, and show that the power is minimax rate optimal.

The following theorem states the asymptotic null distribution for $M_{nq}^{o}$, and indicates that, under $H_0$, $M_{nq}^{o} - 4\log p + \log\log p$ converges weakly to a Gumbel random variable with distribution function $\exp\{-(8\pi)^{-1/2}e^{-t/2}\}$.

Theorem 1. Assuming the regularity conditions (C1)–(C3), under $H_0$, for any $t \in \mathbb{R}$,
$$P(M_{nq}^{o} - 4\log p + \log\log p \le t) \to \exp\{-(8\pi)^{-1/2}\exp(-t/2)\}, \quad \text{as } nq, p \to \infty.$$
The above convergence is uniform for all $\{X_k\}_{k=1}^{n}$ satisfying (C1)–(C3).

We next study the power of the corresponding test $\Phi_\alpha^{o}$. We define the following class of precision matrices for the spatial locations:
$$\mathcal{U}(c) = \left\{\Omega_S : \max_{1 \le i < j \le p} |\omega_{S,i,j}|/\theta_{i,j}^{1/2} \ge c(\log p)^{1/2}\right\}. \qquad (5)$$
This class includes all precision matrices such that there exists one standardized off-diagonal entry having magnitude exceeding $c(\log p)^{1/2}$. By the definition in (4), $\theta_{i,j}$ is of the order $1/(nq)$, and thus we only require one of the off-diagonal entries of $\Omega_S$ to be larger than $C\{\log p/(nq)\}^{1/2}$, with $C > 0$ fully determined by $c$ in (5) and $c_0$ and $c_1$ in the regularity condition (C2). Then, if we choose the constant $c = 4$, the next theorem shows that the null parameter set in which $\Omega_S$ is diagonal is asymptotically distinguishable from $\mathcal{U}(4)$ by the test $\Phi_\alpha^{o}$. That is, $H_0$ is rejected by the test $\Phi_\alpha^{o}$ with high probability if $\Omega_S \in \mathcal{U}(4)$.

Theorem 2. Assuming the regularity conditions (C1) and (C2),
$$\inf_{\Omega_S \in \mathcal{U}(4)} P(\Phi_\alpha^{o} = 1) \to 1, \quad \text{as } nq, p \to \infty.$$

The next theorem further shows that this lower bound $4(\log p)^{1/2}$ is rate-optimal. Let $\mathcal{T}_\alpha$ be the set of all $\alpha$-level tests, that is, $P(T_\alpha = 1) \le \alpha$ under $H_0$ for all $T_\alpha \in \mathcal{T}_\alpha$.

Theorem 3. Suppose that $\log p = o(nq)$. Let $\alpha, \beta > 0$ and $\alpha + \beta < 1$. Then there exists a constant $c_2 > 0$ such that for all sufficiently large $nq$ and $p$,
$$\inf_{\Omega_S \in \mathcal{U}(c_2)} \sup_{T_\alpha \in \mathcal{T}_\alpha} P(T_\alpha = 1) \le 1 - \beta.$$

Therefore, if $c_2$ is sufficiently small, then any $\alpha$-level test is unable to reject the null hypothesis correctly uniformly over $\Omega_S \in \mathcal{U}(c_2)$ with probability tending to one. So the order $(\log p)^{1/2}$ in the lower bound of $\max_{1\le i<j\le p}\{|\omega_{S,i,j}|\theta_{i,j}^{-1/2}\}$ in (5) cannot be further improved.

3.2. Oracle Multiple Testing Procedure

We next investigate the properties of the oracle multiple testing procedure. The following theorem shows that it controls the FDP and FDR at the pre-specified level $\alpha$ asymptotically.

Theorem 4. Assuming the regularity conditions (C1) and (C2), and letting
$$\mathcal{S}_\rho = \left\{(i,j) : 1 \le i < j \le p, \ \frac{|\omega_{S,i,j}|}{\theta_{i,j}^{1/2}} \ge (\log p)^{1/2+\rho}\right\}.$$
Suppose for some $\rho, \delta > 0$, $|\mathcal{S}_\rho| \ge [1/\{(8\pi)^{1/2}\alpha\} + \delta](\log\log p)^{1/2}$. Suppose $l_0 = |\mathcal{H}_0| \ge c_0 p^{2}$ for some $c_0 > 0$, and $p \le c(nq)^{r}$ for some $c, r > 0$. Let $l = (p^2 - p)/2$. Then
$$\lim_{(nq,p)\to\infty} \frac{\mathrm{FDR}(\hat{t}^{o})}{\alpha l_0/l} = 1, \qquad \frac{\mathrm{FDP}(\hat{t}^{o})}{\alpha l_0/l} \to 1 \ \text{in probability},$$
as $(nq, p) \to \infty$.

We make a few remarks. First, the condition $|\mathcal{S}_\rho| \ge [1/\{(8\pi)^{1/2}\alpha\} + \delta](\log\log p)^{1/2}$ in Theorem 4 is mild, because we have $(p^2 - p)/2$ hypotheses in total and this condition only requires a few entries of $\Omega_S$ having standardized magnitude exceeding $\{(\log p)^{1/2+\rho}/(nq)\}^{1/2}$ for some $\rho > 0$. Second, the condition $l_0 = |\mathcal{H}_0| \ge c_0 p^{2}$ can be easily satisfied under the sparsity assumption. It is to ensure that the test is not overly conservative, which could occur if this bound is not met. Finally, the FDR and FDP are controlled at the level $\alpha l_0/l$, instead of $\alpha$, since $p(p-1)/2$ is an overestimate of $|\mathcal{H}_0|$. This explains why our test tends to be conservative in simulations, where $l_0/l$ is usually less than one.

3.3. Data-Driven Procedures

We next turn to the data-driven procedures for both the global and multiple testing. We show that they perform as well as the oracle testing procedures asymptotically.

Theorem 5. Assume the regularity conditions (C1$'$) and (C2)–(C4).

(i) Under $H_0$, for any $t \in \mathbb{R}$,
$$P(M_{nq}^{d} - 4\log p + \log\log p \le t) \to \exp\{-(8\pi)^{-1/2}\exp(-t/2)\}, \quad \text{as } nq, p \to \infty,$$
and this convergence is uniform for all $\{X_k\}_{k=1}^{n}$ satisfying (C1$'$) and (C2)–(C4).

(ii) Furthermore, $\inf_{\Omega_S \in \mathcal{U}(4)} P(\Phi_\alpha^{d} = 1) \to 1$, as $nq, p \to \infty$.

This theorem shows that $M_{nq}^{d}$ has the same limiting null distribution as the oracle test statistic $M_{nq}^{o}$, and that the power of the corresponding test $\Phi_\alpha^{d}$ performs as well as that of the oracle test and is thus minimax rate optimal. The same observation


applies to Theorem 6 below, which shows that the data-driven multiple testing procedure also performs as well as the oracle case, and controls the FDP and FDR at the pre-specified level $\alpha$ asymptotically.

Theorem 6. Assuming (C1$'$), (C4), and the same conditions as in Theorem 4,
$$\lim_{(nq,p)\to\infty} \frac{\mathrm{FDR}(\hat{t}^{d})}{\alpha l_0/l} = 1, \qquad \frac{\mathrm{FDP}(\hat{t}^{d})}{\alpha l_0/l} \to 1 \ \text{in probability},$$
as $(nq, p) \to \infty$.

4. Simulation Studies

We study in this section the finite-sample performance of the proposed testing procedures. We compare the following methods: the oracle procedure (denoted as "oracle"), the data-driven procedure with $\Sigma_T$ estimated by the usual sample covariance (denoted as "data-driven-S"), the one with the banded estimator (denoted as "data-driven-B"), the testing method of Liu (2013) with whitening (denoted as "whitening"), and the one without whitening (denoted as "vector normal"). We examine a range of spatial and temporal dimensions, as well as sample sizes. Specifically, $p = \{50, 200, 800\}$, $q = \{20, 50, 200\}$, and $n = \{2, 10, 50\}$.

We consider two temporal covariance structures: (1) autoregressive model: $\Sigma_T = (\sigma_{T,i,j})$ with elements $\sigma_{T,i,j} = 0.4^{|i-j|}$, $1 \le i, j \le q$; and (2) moving average model: $\Sigma_T = (\sigma_{T,i,j})$ with nonzero elements $\sigma_{T,i,j} = 1/(|i-j| + 1)$, for $|i-j| \le 3$. We also examine two additional temporal structures and report the corresponding results in Web Appendix D.
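For reference, the two temporal covariance structures can be generated as follows (a sketch; the function names are ours).

```python
import numpy as np

def ar_cov(q, rho=0.4):
    """Autoregressive temporal covariance: sigma_{T,ij} = rho^|i-j|."""
    d = np.abs(np.subtract.outer(np.arange(q), np.arange(q)))
    return rho ** d

def ma_cov(q, bandwidth=3):
    """Moving average temporal covariance: sigma_{T,ij} = 1/(|i-j| + 1)
    for |i-j| <= bandwidth, and 0 otherwise."""
    d = np.abs(np.subtract.outer(np.arange(q), np.arange(q)))
    return np.where(d <= bandwidth, 1.0 / (d + 1), 0.0)
```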

In all simulations, we use the Lasso to estimate $\beta_i$:
$$\hat\beta_i = D_i^{-1/2} \arg\min_{u} \left\{ \frac{1}{2nq} \left| (\mathbb{Y}_{\cdot,-i} - \bar{\mathbb{Y}}_{(\cdot,-i)}) D_i^{-1/2} u - (\mathbb{Y}_{(i)} - \bar{\mathbb{Y}}_{(i)}) \right|_2^2 + \lambda_{n,i} |u|_1 \right\}, \qquad (6)$$
where $\mathbb{Y}$ is the $nq \times p$ data matrix obtained by stacking the transformed samples $\{Y_{k,\cdot,l}, k = 1, \ldots, n, l = 1, \ldots, q\}$, with $Y_k = X_k\Sigma_T^{-1/2}$ for the oracle procedure and $Y_k = X_k\widehat{\Sigma}_T^{-1/2}$ for the data-driven procedure, $k = 1, \ldots, n$; $\mathbb{Y}_{(i)} = (Y_{1,i}, \ldots, Y_{nq,i})^{\top} \in \mathbb{R}^{nq\times 1}$; $\bar{\mathbb{Y}}_{(i)} = (\bar{Y}_i, \ldots, \bar{Y}_i)^{\top} \in \mathbb{R}^{nq\times 1}$ with $\bar{Y}_i = \frac{1}{nq}\sum_{k=1}^{nq} Y_{k,i}$; $\bar{\mathbb{Y}}_{(\cdot,-i)} = (\bar{Y}_{\cdot,-i}^{\top}, \ldots, \bar{Y}_{\cdot,-i}^{\top})^{\top} \in \mathbb{R}^{nq\times(p-1)}$ with $\bar{Y}_{\cdot,-i} = \frac{1}{nq}\sum_{k=1}^{nq} Y_{k,-i}$; $D_i = \mathrm{diag}(\widehat{\Sigma}_{S,-i,-i})$; and $\widehat{\Sigma}_S$ is the sample covariance matrix estimate of $\Sigma_S$ based on the $nq$ transformed samples.
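A minimal sketch of these node-wise Lasso fits using scikit-learn is given below; it follows the scaling in (6) only approximately (e.g., population rather than adjusted standard deviations, and a common penalty level across $i$), and the function name is ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(Y, lam):
    """Node-wise Lasso regressions in the spirit of (6): regress each
    centered column of the nq x p matrix Y on the remaining columns,
    scaled by D_i^{-1/2}, then map the coefficients back. Returns a
    (p, p) matrix with B[l, m] the coefficient of column l in the
    regression of column m."""
    Yc = Y - Y.mean(axis=0)
    _, p = Yc.shape
    scale = Yc.std(axis=0)                        # diag(Sigma_S-hat)^{1/2}
    B = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        X = Yc[:, others] / scale[others]         # apply D_i^{-1/2}
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X, Yc[:, i])
        B[others, i] = fit.coef_ / scale[others]  # undo the scaling
    return B
```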

4.1. Global Testing Simulation

For global testing, the data are generated from a matrix normal distribution with precision matrix $\Omega_S \otimes \Omega_T$. To evaluate the size of the tests under the null, we set $\Omega_S = I$. To evaluate the power under the alternative, we set $\Omega_S = (I + U + \delta I)/(1 + \delta)$, where $\delta = |\lambda_{\min}(I + U)| + 0.05$, and $U$ is a matrix with eight random nonzero entries. The locations of four nonzero entries are selected randomly from the upper triangle of $U$, each with a magnitude generated randomly and uniformly from the set $[-4\{\log p/(nq)\}^{1/2}, -2\{\log p/(nq)\}^{1/2}] \cup [2\{\log p/(nq)\}^{1/2}, 4\{\log p/(nq)\}^{1/2}]$. The other four entries are determined by symmetry. We set the tuning parameters in (6) as $\lambda_{n,i} = 2\{\widehat{\Sigma}_{S,i,i}\log p/(nq)\}^{1/2}$, following Xia et al. (2015).

Table 1 summarizes the size and power, in percentages, of the proposed global tests based on 1000 data replications with the significance level $\alpha_1 = 5\%$. We see from the table that the empirical sizes of both the oracle and data-driven procedures are well controlled under $\alpha_1$ for all settings. By contrast, the vector normal based testing shows a serious distortion in size, and the whitening aided testing shows a clear increase in size too, especially when $n$ is small and $q$ is large. We also observe that, for our proposed tests, the empirical sizes are slightly below the nominal level when $p$ is large, which is due to the correlation among the variables. A similar phenomenon has been observed and justified in Cai et al. (2013, Proposition 1). Moreover, we see that the proposed test based on the banded covariance estimator is powerful in all settings, even though the two spatial precision matrices differ only in eight entries with the magnitude of difference of the order $\{\log p/(nq)\}^{1/2}$. The new test is also more powerful than the whitening procedure. In addition, the data-driven procedure based on the banded covariance estimator is seen to perform similarly to the oracle procedure, and it clearly outperforms the one based on the sample covariance estimator. As such, we recommend using the banded estimator in our data-driven testing. The results for $p = 800$ are reported in Web Appendix D.

4.2. Multiple Testing Simulation

For multiple testing, the data $\{X_1, \ldots, X_n\}$ are generated from the matrix normal distribution with the above two temporal covariance matrices and five spatial graph structures. They include: (1) banded graph: a banded graph generated by the R package huge (Zhao et al., 2012) with bandwidth equal to 3; (2) sparse graph: a random graph by huge with the default probability $3/p$ that a pair of nodes has an edge; (3) relatively dense graph: a random graph by huge with probability 0.01 that a pair of nodes has an edge; (4) hub graph: a hub graph by huge with rows and columns evenly partitioned into 10 disjoint groups; and (5) small-world graph: a small-world graph generated by the R package rags2ridges (van Wieringen and Peeters, 2014) with three starting neighbors and 5% probability of rewiring.

We select the tuning parameters $\lambda_{n,i}$ in (6) adaptively given the data. The idea is to make $\sum_{(i,j)\in\mathcal{H}_0} I(|W_{i,j}| \ge t)$ and $\{2 - 2\Phi(t)\}(p^2 - p)/2$ close. That is, a good choice of the tuning parameter $\lambda_{n,i} = (b/20)\{\widehat{\Sigma}_{S,i,i}\log p/(nq)\}^{1/2}$ should minimize the error
$$\int_{c}^{1} \left( \frac{\sum_{(i,j)\in\mathcal{H}_0} I\{|W_{i,j}^{(b)}| \ge \Phi^{-1}(1 - \alpha/2)\}}{\alpha(p^2 - p)/2} - 1 \right)^2 d\alpha,$$
with $c > 0$, where $W_{i,j}^{(b)}$ is the statistic based on the corresponding tuning parameter. Toward that end, we employ the following procedure for parameter tuning. Step 2 is a discretization of the above integral, where the sum in the numerator is taken over $\mathcal{H}$, the set of all pairs, as $\mathcal{H}_0$ is unknown.


Table 1. Global testing: empirical size (with standard error in parentheses) and empirical power (in percentage) based on 1000 data replications; $n = \{10, 50\}$, $p = \{50, 200\}$, $q = \{20, 50, 200\}$, $\alpha_1 = 5\%$. Methods under comparison are: the oracle procedure with the true temporal covariance ("oracle"), the data-driven procedure with a sample covariance estimator ("data-driven-S"), the data-driven procedure with a banded covariance estimator ("data-driven-B"), and the procedure of Liu (2013) with ("whitening") and without ("vector normal") conventional whitening. [Table entries: empirical size and empirical power, in %, for each method under the autoregressive (AR) and moving average (MA) temporal structures, across the $(n, p, q)$ combinations.]


Step 1. Let $\lambda_{n,i} = (b/20)\{\widehat{\Sigma}_{S,i,i}\log p/(nq)\}^{1/2}$, for $b = 1, \ldots, 40$. For each $b$, calculate $\hat\beta_i^{(b)}$, $i = 1, \ldots, p$, and construct the corresponding standardized statistics $W_{i,j}^{(b)}$.
Step 2. Choose $\hat{b}$ as the minimizer of
$$\sum_{s=1}^{10} \left( \frac{\sum_{(i,j)\in\mathcal{H}} I\left[|W_{i,j}^{(b)}| \ge \Phi^{-1}\{1 - s(1 - \Phi(\sqrt{\log p}))/10\}\right]}{s\{1 - \Phi(\sqrt{\log p})\}/10 \cdot p(p-1)} - 1 \right)^2.$$
Step 3. The tuning parameters $\lambda_{n,i}$ are then set as $\lambda_{n,i} = (\hat{b}/20)\{\widehat{\Sigma}_{S,i,i}\log p/(nq)\}^{1/2}$.
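A minimal sketch of this tuning criterion is given below; it assumes the standardized statistics have already been computed for every candidate $b$, and the data structure and function name are ours.

```python
import numpy as np
from scipy.stats import norm

def select_b(W_by_b, p):
    """Discretized tuning criterion of Section 4.2 (sketch): for each
    candidate b, compare the exceedance counts of |W^{(b)}_{ij}| over ten
    thresholds with their nominal levels, and return the b minimizing the
    total squared discrepancy. W_by_b maps b to the (p, p) matrix W^{(b)}."""
    tail = 1 - norm.cdf(np.sqrt(np.log(p)))
    iu = np.triu_indices(p, k=1)
    best_b, best_err = None, np.inf
    for b, W in W_by_b.items():
        w = np.abs(W[iu])
        err = 0.0
        for s in range(1, 11):
            level = s * tail / 10
            thr = norm.ppf(1 - level)
            err += (np.sum(w >= thr) / (level * p * (p - 1)) - 1) ** 2
        if err < best_err:
            best_b, best_err = b, err
    return best_b
```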

Tables 2 and 3 summarize the empirical FDR (standard errors) and the empirical power, in percentages, of the multiple testing procedures based on 100 data replications with the significance level $\alpha_2 = 1\%$ and the sample size $n = 10$. Here, the power is calculated as $100^{-1}\sum_{l=1}^{100}\{\sum_{(i,j)\in\mathcal{H}_1} I(|W_{i,j,l}| \ge \hat{t})\}/|\mathcal{H}_1|$, where $W_{i,j,l}$ denotes the standardized statistic for the $l$th replication and $\mathcal{H}_1$ denotes the set of nonzero locations. We also report the simulation results for $n = 50$, a more extreme small-sample case with $n = 2$, and the case when $p = 800$ in Web Appendix D. We have observed similar qualitative patterns as shown here.

For the empirical FDR, we see from Table 2 that the FDRs of the data-driven procedure based on the banded temporal covariance estimator are well under control across all settings, with better performance as $(p, q)$ increase, and are also close to those of the oracle procedure. By contrast, the vector normal procedure yields much larger empirical FDRs than $\alpha_2$. The whitening procedure also suffers from some obvious FDR distortion, especially for the moving average temporal structure. We also observe that the performance of the data-driven procedure based on the sample temporal covariance estimator degrades and becomes more conservative when the temporal dimension $q$ grows. This is due to the fact that the sample covariance is incapable of estimating the true covariance matrix well when $q$ is large. In addition, for all testing procedures, the standard errors of the empirical FDR decrease when $(p, q)$ increase, whereas our proposed oracle and data-driven procedures perform similarly and achieve smaller standard errors than the competing solutions. For the empirical power, we observe from Table 3 that our testing procedure based on the banded estimator is more powerful than the whitening one, and performs similarly to the oracle. Moreover, the empirical power decreases when $p$ increases, as we have more edges to estimate. On the other hand, the power increases when $q$ increases, as we have more samples to estimate the graph.

5. Real Data Analysis

We revisit the motivating electroencephalography (EEG) data and illustrate our testing methods. The data were collected in a study examining EEG correlates of genetic predisposition to alcoholism and are available at http://kdd.ics.uci.edu/datasets/eeg/eeg.data.html. They consist of 77 alcoholic individuals and 45 controls; each subject was fitted with a 61-lead electrode cap and was recorded at 256 Hz for 1 second. There were in addition a ground and two bipolar deviation electrodes, which are excluded from the analysis. The electrode positions were located at standard sites (Standard Electrode Position Nomenclature, American Electroencephalographic Association 1990), and were organized into frontal, central, parietal, occipital, left temporal, and right temporal regions. Each subject performed 120 trials under three types of stimuli. More details of the data collection can be found in Zhang et al. (1995). We preprocessed the data by first averaging all trials under a single stimulus condition, following Li et al. (2010). We then performed $\alpha$-band filtering on the signals, following Hayden et al. (2006). The resulting data form a 61 × 256 matrix for each subject, and our goal is to infer the 61 × 61 connectivity network of the brain spatial locations.

We applied our testing procedures to the alcoholic and control groups separately. We employed the banded estimator of Bickel and Levina (2008) to estimate the temporal covariance. By splitting the data of each group 10 times, we obtained a best bandwidth equal to 3 for both the alcoholic and control groups, which is close to our simulation settings. We applied the global test and obtained a p-value of 0 for both groups, clearly indicating that some brain regions are connected in the two groups. We then applied the data-driven multiple testing procedure with a pre-specified FDR significance level $\alpha = 1\%$. For graphical illustration, we report the top 100 most significant pairs of spatial locations, ordered by their p-values, in Figure 1. Examining the connection patterns among the electrodes in the frontal region (denoted by the symbols FP, AF, and F), we noted a decrease in connections and some asymmetry between the left and right frontal regions in the alcoholic group compared to the control group. This finding agrees with that in the literature (Hayden et al., 2006).

6. Discussion

Motivated by brain connectivity analysis, we have proposed in this article hypothesis testing procedures for detecting conditional dependence between spatial locations. Our work explicitly exploits the special covariance structure of the matrix normal distribution, and our results suggest that using such information can improve the inferential capability. Our testing methods can handle a small sample size as well as an adequately large network; in our simulations, $n$ is as small as 2 or 10, whereas the spatial dimension $p$ can reach 800. Meanwhile, we have treated the temporal covariance as a nuisance. We make a few remarks regarding the temporal dimension $q$ and the temporal covariance estimator $\widehat{\Sigma}_T$. First, we do not require $q < n$, as we in effect pool $np$ samples to estimate the $q \times q$ temporal covariance $\Sigma_T$. Second, there are multiple ways to estimate $\Sigma_T$. Our numerical study suggests that the usual sample covariance estimator does not work well, but favors the banded estimator (Bickel and Levina, 2008). Meanwhile, several other existing covariance estimators can be employed as well (Cai and Liu, 2011; Cai et al., 2011). Third, the estimator of $\Sigma_T$ needs to satisfy the estimation rate condition (C4). We view this condition as reasonable, because it holds under some sparsity requirement, which is scientifically plausible, and it allows the temporal dimension $q$ to be of the same order as a polynomial of $np$, which again generally holds in the neuroimaging context. Under such situations, we feel our proposed testing procedure is advisable. Finally, the empirical performance of our testing procedure depends on how good the plug-in estimator $\widehat{\Sigma}_T$ is. In cases where the sparsity does not hold or $q$ far exceeds $np$, the quality of $\widehat{\Sigma}_T$ can deteriorate, which would in turn adversely affect our testing procedure.


Table 2. Multiple testing: empirical FDR (with standard error in parentheses, in %) based on 100 data replications; $n = 10$, $p = \{50, 200\}$, $q = \{20, 50, 200\}$, $\alpha_2 = 1\%$. [Table entries: empirical FDR of the oracle, data-driven-S, data-driven-B, whitening, and vector normal procedures under the autoregressive and moving average temporal structures, for each of the five spatial graph structures and each $(p, q)$ combination.]


Table 3. Multiple testing: empirical power (in percentage) based on 100 data replications; $n = 10$, $p = \{50, 200\}$, $q = \{20, 50, 200\}$, $\alpha_2 = 1\%$. [Table entries: empirical power of the oracle, data-driven-S, data-driven-B, whitening, and vector normal procedures under the autoregressive and moving average temporal structures, for each of the five spatial graph structures and each $(p, q)$ combination.]


Figure 1. Connectivity network inferred by the multiple testing procedure for the EEG data. The left panel is for the alcoholic group, and the right panel for the control group. Top 100 significant links are shown in this graph.
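As a rough illustration of how a display like Figure 1 can be produced from the output of the multiple testing procedure, the Python sketch below ranks candidate links by the magnitude of their standardized test statistics, keeps the top 100, and draws the resulting graph with networkx. The helper top_links_graph, the statistic matrix W, and the cutoff of 100 links are hypothetical placeholders for this illustration, not the authors' code.

    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt

    def top_links_graph(test_stat, n_links=100):
        """Keep the n_links upper-triangular entries with the largest absolute
        standardized statistics and return them as an undirected graph
        (hypothetical helper, for illustration only)."""
        p = test_stat.shape[0]
        iu = np.triu_indices(p, k=1)
        order = np.argsort(np.abs(test_stat[iu]))[::-1][:n_links]
        G = nx.Graph()
        G.add_nodes_from(range(p))
        for k in order:
            i, j = iu[0][k], iu[1][k]
            G.add_edge(i, j, weight=float(abs(test_stat[i, j])))
        return G

    # Toy usage with a random symmetric matrix standing in for the standardized
    # test statistics over the 61 EEG electrodes.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((61, 61))
    W = (W + W.T) / 2
    G = top_links_graph(W, n_links=100)
    nx.draw(G, node_size=20, width=0.5)
    plt.show()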

order as a polynomial of np, which again generally holds in the neuroimaging context. Under such situations, we feel our proposed testing procedure is advisable. Finally, the empirical performance of our testing procedure depends on how good the plug-in estimator of the temporal precision matrix is. In cases where the sparsity does not hold or q far exceeds np, the quality of this plug-in estimator can deteriorate, which would in turn adversely affect our testing procedure.
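To make the plug-in step concrete, the following Python sketch shows one simple way to form a sparse estimate of the temporal precision matrix from n centered p × q replicates: under a separable (Kronecker) covariance, each row of a centered replicate has covariance proportional to the temporal covariance matrix, so the rows can be pooled and fed to a graphical lasso. This is a minimal illustration under that assumption, not the estimator used in this article; the function name temporal_precision_plugin and the regularization level alpha are placeholders to be tuned.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    def temporal_precision_plugin(X, alpha=0.2):
        """Illustrative sparse plug-in estimate of the temporal precision matrix.

        X : array of shape (n, p, q), n replicates of a p x q spatial-temporal
        matrix. Rows of the centered replicates are pooled; under a separable
        covariance each row's covariance is proportional to the temporal
        covariance, so the pooled fit targets it up to an overall scale."""
        n, p, q = X.shape
        Xc = X - X.mean(axis=0)          # center across replicates
        rows = Xc.reshape(n * p, q)      # pool all spatial rows as q-vectors
        gl = GraphicalLasso(alpha=alpha).fit(rows)
        return gl.precision_             # estimated temporal precision (up to scale)

    # Toy usage: n = 10 replicates of a 50 x 20 matrix with independent entries.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((10, 50, 20))
    Omega_T_hat = temporal_precision_plugin(X)
    print(Omega_T_hat.shape)             # (20, 20)

Consistent with the discussion above, when q grows toward or beyond np the pooled sample becomes small relative to q, and the quality of such a plug-in estimate degrades accordingly.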

7. Supplementary Materials

Web Appendices, Tables, and Figures referenced in Sections 2–4, the EEG data, and the computer code are available with this article at the Biometrics website on Wiley Online Library.

Acknowledgements

We thank the editor, the associate editor, and two reviewers for their constructive and insightful comments. This research was partially supported by NSF grants DMS-1310319 (Li), DMS-1613137 (Li), and DMS-1612906 (Xia), the Recruitment Program of Global Experts Youth Project (Xia), and the startup fund from Fudan University (Xia).

References

Allen, G. I. and Tibshirani, R. (2010). Transposable regularized covariance models with an application to missing data imputation. Annals of Applied Statistics 4, 764–790.
Allen, G. I. and Tibshirani, R. (2012). Inference with transposable data: Modelling the effects of row and column correlations. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 74, 721–743.
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. Third edition. New York: Wiley.
Aston, J. A. and Kirch, C. (2012). Estimation of the distribution of change-points with application to fMRI data. Annals of Applied Statistics 6, 1906–1948.
Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics 36, 199–227.
Bullmore, E. and Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186–198.
Cai, T. T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association 106, 672–684.
Cai, T. T., Liu, W., and Luo, X. (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106, 594–607.
Cai, T. T., Liu, W., and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association 108, 265–277.
Chen, X. and Liu, W. (2015). Statistical inference for matrix-variate Gaussian graphical models and false discovery rate control. arXiv preprint arXiv:1509.05453.
Danaher, P., Wang, P., and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 76, 373–397.
Drton, M. and Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statistical Science 22, 430–449.
Fornito, A., Zalesky, A., and Breakspear, M. (2013). Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage 80, 426–444.


Fox, M. D. and Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience 8, 700–711.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
Friston, K., Ashburner, J., Kiebel, S., Nichols, T., and Penny, W. (2007). Statistical Parametric Mapping: The Analysis of Functional Brain Images. London: Academic Press.
Hayden, E. P., Wiegand, R. E., Meyer, E. T., Bauer, L. O., O'Connor, S. J., Nurnberger, J. I., et al. (2006). Patterns of regional brain activity in alcohol-dependent subjects. Alcoholism: Clinical and Experimental Research 30, 1986–1991.
Leng, C. and Tang, C. Y. (2012). Sparse matrix graphical models. Journal of the American Statistical Association 107, 1187–1200.
Li, B., Kim, M. K., and Altman, N. (2010). On dimension folding of matrix- or array-valued statistical objects. The Annals of Statistics 38, 1094–1121.
Liu, H., Han, F., Yuan, M., Lafferty, J., and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics 40, 2293–2326.
Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics 41, 2948–2978.
Meinshausen, N. and Buhlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 1436–1462.
Narayan, M., Allen, G., and Tomson, S. (2015). Two sample inference for populations of graphical models with applications to functional connectivity. arXiv preprint arXiv:1502.03853.
Narayan, M. and Allen, G. I. (2016). Mixed effects models for resampled network statistics improves statistical power to find differences in multi-subject functional connectivity. Frontiers in Neuroscience 10. doi:10.3389/fnins.2016.00108.
Raichle, M. E. and Gusnard, D. A. (2002). Appraising the brain's energy budget. Proceedings of the National Academy of Sciences of the United States of America 99, 10237–10239.
Ravikumar, P., Wainwright, M. J., Raskutti, G., and Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5, 935–980.
Reiss, P. and Ogden, R. (2010). Functional generalized linear models with images as predictors. Biometrics 66, 61–69.
Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E., Johansen-Berg, H., et al. (2004). Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage 23, Supplement 1, S208–S219.
Tsiligkaridis, T., Hero, A. O., III, and Zhou, S. (2013). On convergence of Kronecker graphical lasso algorithms. IEEE Transactions on Signal Processing 61, 1743–1755.
van Wieringen, W. N. and Peeters, C. F. (2014). Ridge estimation of inverse covariance matrices from high-dimensional data. arXiv preprint arXiv:1403.0904.
Xia, Y., Cai, T., and Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 102, 247–266.
Yin, J. and Li, H. (2012). Model selection and estimation in the matrix normal graphical model. Journal of Multivariate Analysis 107, 119–140.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19–35.
Zhang, X., Begleiter, H., Porjesz, B., Wang, W., and Litke, A. (1995). Event related potentials during object recognition tasks. Brain Research Bulletin 38, 531–538.
Zhao, T., Liu, H., Roeder, K., Lafferty, J., and Wasserman, L. (2012). The huge package for high-dimensional undirected graph estimation in R. The Journal of Machine Learning Research 13, 1059–1062.
Zhou, H. and Li, L. (2014). Regularized matrix regression. Journal of the Royal Statistical Society, Series B 76, 463–483.
Zhou, S. (2014). Gemini: Graph estimation with matrix variate normal instances. The Annals of Statistics 42, 532–562.
Zhu, Y., Shen, X., and Pan, W. (2014). Structural pursuit over multiple undirected graphs. Journal of the American Statistical Association 109, 1683–1696.

Received November 2015. Revised September 2016. Accepted October 2016.