Biometrika (year), volume, number, pp. 1–20. Advance Access publication on date. Printed in Great Britain.

Supplementary Material for "Testing separability of space-time functional processes"

BY P. CONSTANTINOU
Department of Statistics, Pennsylvania State University, 330B Thomas Building, University Park, Pennsylvania 16802, USA
[email protected]

P. KOKOSZKA
Department of Statistics, Colorado State University, 202 Statistics Building, Fort Collins, Colorado 80523, USA
[email protected]

AND M. REIMHERR
Department of Statistics, Pennsylvania State University, 411 Thomas Building, University Park, Pennsylvania 16802, USA
[email protected]

SUMMARY

This Supplementary Material contains mostly proofs of the large sample results used in the paper. However, we begin with the detailed description of testing procedures which do not involve spatial dimension reduction, Section A. Then, in Section B, we proceed with the derivation of the form of several matrices which are needed to formulate the joint asymptotic distribution of the maximum likelihood estimators. The form of these matrices is complex, and in some cases can be defined only via an algorithm, which can however be coded. With these matrices defined, we present in Section C the proof of our main result, Theorem 1. Section D contains the proof of Theorem 2, and finally Section E has additional empirical rejection rates, more discussion of the Irish wind data, and an application to pollution data.

A. TESTS FOR A SMALL NUMBER OF SPATIAL LOCATIONS

In the main article, we described the testing procedure which involves dimension reduction in both space and time, using estimated principal components. In this section, we describe two alternative versions of the tests. Both are suitable for curves defined at a small number of spatial locations. In Section A·1, we describe the simplest approach, which uses a fixed basis of time domain functions. Section A·2 considers temporal dimension reduction via functional principal components.

A·1. Procedure 1: fixed spatial locations, fixed temporal basis

We assume that for each $n$ the field $X_n$ is observed at the same spatial locations $s_k$, $k = 1, \dots, K$. We estimate $\mu(s_k, t)$ by the sample average $\hat\mu(s_k, t) = N^{-1}\sum_{n=1}^N X_n(s_k, t)$, and focus on the covariance structure. Under $H_0$, the covariances of the observations are
$$\operatorname{cov}\{X_n(s_k, t), X_n(s_\ell, t')\} = U(k, \ell)\,\mathcal{V}(t, t'),$$

© 2016 Biometrika Trust


with entries $U(k, \ell) = \mathcal{U}(s_k, s_\ell)$ forming a $K \times K$ matrix $U$, and $\mathcal{V}$ being the temporal covariance function over $\mathcal{T} \times \mathcal{T}$. As in the multivariate setting, the estimation of the matrix $U$ and the covariance function $\mathcal{V}$ must involve an iterative procedure. In the functional setting, a dimension reduction is also needed. Suppose $\{v_j, j \ge 1\}$ is a basis system in $L^2(\mathcal{T})$ such that for sufficiently large $J$, the functions
$$X_n^{(J)}(s_k, t) = \mu(s_k, t) + \sum_{j=1}^J \xi_{jn}(s_k) v_j(t)$$
are good approximations to the functions $X_n(s_k)$. We thus replace a large number of time points by a moderate number $J$, and seek to reduce the testing of $H_0$ to testing the separability of the covariances of the transformed observations given as $K \times J$ matrices
$$\Xi_n = \{\xi_{jn}(s_k),\ k = 1, \dots, K,\ j = 1, \dots, J\}. \tag{A1}$$

The index $j$ should be viewed as a transformed time index. The number $I$ of the actual time points $t_i$ can be very large; $J$ is usually much smaller. The following proposition is easy to prove. It establishes the connection between the original testing problem and testing the separability of the transformed data (A1). The assumption that the $v_j$ are orthonormal cannot be removed.

PROPOSITION 1. For some orthonormal $v_j$, set
$$X^{(J)}(s_k, t) = \mu(s_k, t) + \sum_{j=1}^J \xi_j(s_k) v_j(t).$$
If
$$E\{\xi_j(s_k)\xi_i(s_\ell)\} = U(k, \ell) V(j, i), \tag{A2}$$
then
$$\operatorname{cov}\{X^{(J)}(s_k, t), X^{(J)}(s_\ell, s)\} = U(k, \ell)\,\mathcal{V}(t, s). \tag{A3}$$
Conversely, (A3) implies (A2). The entries $V(j, i)$ and $\mathcal{V}(t, s)$ are related via
$$V(j, i) = \iint \mathcal{V}(t, s) v_j(t) v_i(s)\,dt\,ds, \qquad \mathcal{V}(t, s) = \sum_{j,i=1}^J V(j, i) v_j(t) v_i(s).$$
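The two identities relating $V(j,i)$ and $\mathcal{V}(t,s)$ can be checked numerically. The following Python sketch (pure standard library; the trigonometric basis and the coefficient matrix are illustrative assumptions, not taken from the paper) builds the kernel $\mathcal{V}(t,s)$ from a coefficient matrix and recovers the coefficients by midpoint-rule quadrature:

```python
import math

# Orthonormal basis on [0, 1] (an illustrative choice): constant, sin, cos.
basis = [
    lambda t: 1.0,
    lambda t: math.sqrt(2.0) * math.sin(2.0 * math.pi * t),
    lambda t: math.sqrt(2.0) * math.cos(2.0 * math.pi * t),
]
J = len(basis)

# An arbitrary symmetric coefficient matrix V(j, i).
V = [[2.0, 0.3, 0.0],
     [0.3, 1.0, 0.1],
     [0.0, 0.1, 0.5]]

n = 100                                   # midpoint-rule grid size
h = 1.0 / n
grid = [(m + 0.5) * h for m in range(n)]
B = [[f(t) for f in basis] for t in grid]  # B[ti][j] = v_j(t_i)

# Kernel V(t, s) = sum_{j,i} V(j, i) v_j(t) v_i(s), tabulated on the grid.
kern = [[sum(V[a][b] * B[ti][a] * B[si][b]
             for a in range(J) for b in range(J))
         for si in range(n)] for ti in range(n)]

# Recover V(j, i) = integral of V(t, s) v_j(t) v_i(s) dt ds by the midpoint rule.
recovered = [[h * h * sum(kern[ti][si] * B[ti][j] * B[si][i]
                          for ti in range(n) for si in range(n))
              for i in range(J)] for j in range(J)]
```

Because the integrands are low-order trigonometric polynomials, the midpoint rule here is essentially exact, and `recovered` matches `V` to floating-point accuracy.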

We assume that $\{v_j, j = 1, \dots, J\}$ is a fixed orthonormal system, for example the first $J$ trigonometric basis functions. Slightly abusing notation, consider the matrices
$$\Sigma \ (KJ \times KJ), \qquad U \ (K \times K), \qquad V \ (J \times J),$$
defined as in Theorem B, but with the matrices $X_n$ replaced by the matrices $\Xi_n$. The index $j \le J$ now plays the role of the index $i \le I$ of Section 2. To apply tests based on Theorem 1, we must recursively calculate $U$ and $V$ using the relations stated in Theorem B. This can be done using Algorithm 1 with $\Xi_n$ in place of $X_n$. This approach leads to the following test procedure. The test statistic can be one of the three statistics introduced in Section 2.

Procedure A.1
1. Choose a deterministic orthonormal basis $v_j$, $j \ge 1$.
2. Approximate each curve $X_n(s_k, t)$ by
$$X_n^{(J)}(s_k, t) = \mu(s_k, t) + \sum_{j=1}^J \xi_{jn}(s_k) v_j(t).$$
Construct the $K \times J$ matrices $\Xi_n$ defined in (A1).
3. Compute the matrix
$$\Sigma = \frac{1}{N} \sum_{n=1}^N \operatorname{vec}(\Xi_n - M)\{\operatorname{vec}(\Xi_n - M)\}^\top, \qquad M = \frac{1}{N} \sum_{n=1}^N \Xi_n.$$
Using Algorithm 1 with $\Xi_n$ in place of $X_n$, compute the matrices $U$ and $V$.
4. Estimate the matrix $W$ defined by (C1) by replacing $\Sigma, U, V$ by their estimates.
5. Calculate the p-value using the limit distribution specified in Theorem 3, with $I$ replaced by $J$.

Step 2 can be easily implemented using the R function pca.fd; see Ramsay et al. (2009), Chapter 7. Several methods of choosing $J$ are available. We used the cumulative variance rule, requiring that $J$ be so large that at least 85% of the variance is explained for each location $s_k$.
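Step 2 of Procedure A.1 reduces to computing inner products of de-meaned curves with fixed basis functions, $\xi_{jn}(s_k) = \int \{X_n(s_k,t) - \mu(s_k,t)\} v_j(t)\,dt$. A minimal Python sketch under assumed inputs (the trigonometric basis, the time grid, and the synthetic curve are our illustrative choices, not from the paper):

```python
import math

# Fixed orthonormal temporal basis on [0, 1] (illustrative; the paper only
# requires orthonormality, e.g. the first J trigonometric functions).
def v(j, t):
    if j == 0:
        return 1.0
    if j % 2 == 1:  # sine terms
        return math.sqrt(2.0) * math.sin(2.0 * math.pi * ((j + 1) // 2) * t)
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * (j // 2) * t)

J = 3
n = 1000                                  # time grid (midpoint rule)
h = 1.0 / n
grid = [(m + 0.5) * h for m in range(n)]

# A synthetic de-meaned curve X_n(s_k, .) whose true scores we know.
true_scores = [1.5, -0.7, 0.25]
curve = [sum(c * v(j, t) for j, c in enumerate(true_scores)) for t in grid]

# Scores xi_{jn}(s_k): numerical inner product of the curve with each v_j.
scores = [h * sum(x * v(j, t) for x, t in zip(curve, grid))
          for j in range(J)]
```

Repeating this over all locations $s_k$ and replications $n$ fills the $K \times J$ matrices $\Xi_n$ of (A1).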

A·2. Procedure 2: fixed spatial locations, data driven temporal basis

In Section A·1, we used a deterministic orthonormal system. To achieve the most efficient dimension reduction, it is usual to project on a data-driven system, with the functional principal components being used most often. Since the sequences of functions are defined at a number of spatial locations, it is not a priori clear how a suitable orthonormal system should be constructed, as each sequence $\{X_n(s_k), n = 1, \dots, N\}$ has different functional principal components $v_j(s_k)$, $j \ge 1$, and Proposition 1 requires that a single system be used. Our next algorithm proposes an approach which leads to suitable estimates $U$, $V$ and $\Sigma$. It is not difficult to show that these estimators are $O_P(N^{-1/2})$ consistent.

Algorithm 1. Initialize with $U_0 = I_K$. For $i = 1, 2, \dots$, perform the following two steps until convergence is reached.
1. Calculate
$$\mathcal{V}_i(t, t') = (NK)^{-1} \sum_{n=1}^N \{X_n(\cdot, t) - \mu(\cdot, t)\}^\top U_{i-1}^{-1} \{X_n(\cdot, t') - \mu(\cdot, t')\}.$$
Denote the eigenfunctions and eigenvalues of $\mathcal{V}_i$ by $\{v_{ij}\}$ and $\{\lambda_{ij}\}$. Determine $J_i$ such that the first $J_i$ eigenfunctions of $\mathcal{V}_i$ explain at least 85% of the variance.
2. Project each function $X_n(s_k, \cdot)$ on the first $J_i$ eigenfunctions of $\mathcal{V}_i$. Denote the scores of these projections by
$$Z_{in}(s_k, j) = \langle X_n(s_k, \cdot) - \mu(s_k, \cdot),\ v_{ij} \rangle$$
and calculate
$$U_i(k, \ell) = (NJ_i)^{-1} \sum_{j=1}^{J_i} \sum_{n=1}^N \frac{Z_{in}(s_k, j) Z_{in}(s_\ell, j)}{\lambda_{ij}}.$$
Normalize $U_i$ so that $\operatorname{tr}(U_i) = K$.

Let $\{v_j, j = 1, \dots, J\}$ denote the final eigenfunctions. Carry out the final projection $Z_n(s_k, j) = \langle X_n(s_k, \cdot) - \mu(s_k, \cdot),\ v_j \rangle$. For each $n$, denote by $Z_n$ the $K \times J$ matrix with these entries. Set
$$\Sigma = \frac{1}{N} \sum_{n=1}^N \operatorname{vec}(Z_n) \operatorname{vec}(Z_n)^\top$$
and apply Algorithm 1 with $X_n = Z_n$ to obtain $U$ and $V$.

Using the above algorithm, the testing procedure is as follows:

Procedure A.2
1. Calculate the matrices $\Sigma$, $U$, $V$ according to the algorithm above.
2. Perform steps 4 and 5 of Procedure A.1.


B. DERIVATION OF THE Q MATRICES

This section introduces four matrices that describe the covariance structure of products of various vectorized matrices consisting of standard normal variables. We refer to them collectively as Q matrices, as we use the symbol $Q$ with suitable subscripts and superscripts to denote them. These matrices appear in the asymptotic distribution of the vectorized matrices $U$, $V$, $\Sigma$, which, in turn, is used to prove Theorem 1. In particular, the asymptotic distribution of the statistic $T_F$, which we recommended in Section 2, is expressed in terms of these Q matrices. Some of them are defined through an algorithm.

THEOREM 1. If $E$ is a $K \times I$ matrix of independent standard normals, then
$$\operatorname{cov}\{\operatorname{vec}(EE^\top)\} = 2I Q_K, \tag{B1}$$
where
$$Q_K(i, j) = \begin{cases} 1, & i = j = k + (k-1)K,\ k = 1, \dots, K, \\ 1/2, & i = j \ne k + (k-1)K,\ k = 1, \dots, K, \\ 1/2, & i \ne j = 1 + \left[\{(i-1) - ((i-1) \bmod K)\}/K\right] + \{(i-1) \bmod K\}\,K, \\ 0, & \text{otherwise}. \end{cases}$$

Proof. Denote by $e_{kl}$ the independent standard normals, and set
$$e_k = (e_{k1}, e_{k2}, \dots, e_{kI})^\top, \qquad k = 1, \dots, K,$$
so that
$$E = \begin{pmatrix} e_1^\top \\ \vdots \\ e_K^\top \end{pmatrix}.$$
Then for any $i, j$ in $\{1, \dots, K^2\}$, the $(i, j)$ entry of $\operatorname{cov}\{\operatorname{vec}(EE^\top)\}$ can be written as
$$\operatorname{cov}(e_{k_1}^\top e_{l_1},\ e_{k_2}^\top e_{l_2}),$$
where we have the relationships
$$i = k_1 + (l_1 - 1)K, \quad k_1 = \{(i-1) \bmod K\} + 1, \quad l_1 = 1 + \frac{(i-1) - \{(i-1) \bmod K\}}{K},$$
$$j = k_2 + (l_2 - 1)K, \quad k_2 = \{(j-1) \bmod K\} + 1, \quad l_2 = 1 + \frac{(j-1) - \{(j-1) \bmod K\}}{K}.$$
For the diagonal terms, i.e. $i = j$, we have two settings: $k_1 = k_2 = l_1 = l_2$, in which case the covariance is $2I$, or alternatively $k_1 = k_2 \ne l_1 = l_2$, in which case the covariance is $I$. Since the former occurs exactly at the positions $i = k + (k-1)K$, we have established the proper pattern for the diagonal.

We now need only establish the pattern for the off-diagonal. Every off-diagonal term can be expressed as $\operatorname{cov}(e_i^\top e_j,\ e_k^\top e_l)$ for some $i, j, k, l = 1, \dots, K$. Clearly, if any one index is different from the other three, then the covariance is 0. We cannot have all four indices equal, as that would be a diagonal element, and we cannot have $i = k \ne j = l$, as that would also be a diagonal element. If $i = j \ne k = l$, then the two inner products are independent, and thus the covariance is zero. Therefore, the only nonzero off-diagonal entries occur when $i = l \ne j = k$, and the covariance is then $I$. To determine where in the $K^2 \times K^2$ matrix these occur, we use the change of base formulas. $\Box$

We illustrate the form of the matrices $Q_K$:
$$Q_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \\ 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
$$Q_3 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.5 & 0 & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.5 & 0 & 0 & 0 & 0.5 & 0 & 0 \\
0 & 0.5 & 0 & 0.5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.5 & 0 & 0.5 & 0 \\
0 & 0 & 0.5 & 0 & 0 & 0 & 0.5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.5 & 0 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
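The closed form of $Q_K$ is straightforward to code. A Python sketch transcribing the definition in (B1) (the function name and the 1-based index bookkeeping are ours), which reproduces the displayed $Q_2$ and $Q_3$:

```python
def Q_K(K):
    """Build the K^2 x K^2 matrix Q_K of (B1).

    Position i (1-based) corresponds to the pair (k1, l1) with
    k1 = ((i-1) mod K) + 1 and l1 = 1 + ((i-1) - (i-1) mod K)/K;
    the nonzero off-diagonal 1/2 entries sit at the 'transposed' position.
    """
    n = K * K
    Q = [[0.0] * n for _ in range(n)]
    diag_one = {k + (k - 1) * K for k in range(1, K + 1)}  # positions with entry 1
    for i in range(1, n + 1):
        r = (i - 1) % K
        j = 1 + (i - 1 - r) // K + r * K  # index of the transposed entry
        Q[i - 1][i - 1] = 1.0 if i in diag_one else 0.5
        if j != i:
            Q[i - 1][j - 1] = 0.5
    return Q
```

For instance, `Q_K(2)` returns exactly the $4 \times 4$ matrix $Q_2$ shown above.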

THEOREM 2. If $E$ is a $K \times I$ matrix of independent standard normals, then
$$\operatorname{cov}\{\operatorname{vec}(EE^\top), \operatorname{vec}(E^\top E)\} = 2(IK)^{1/2} Q_{K,I}, \tag{B2}$$
where $Q_{K,I}$ is a $K^2 \times I^2$ matrix given by
$$Q_{K,I}(i, j) = \begin{cases} (KI)^{-1/2}, & i = k_1 + K(k_1 - 1),\ j = k_2 + I(k_2 - 1),\ k_1 = 1, \dots, K,\ k_2 = 1, \dots, I, \\ 0, & \text{otherwise}. \end{cases}$$

Proof. Here, each entry of the above covariance matrix is obtained by taking two rows of $E$, possibly the same row, forming their inner product, taking two columns, possibly the same column, forming their inner product, and then computing the covariance between the two. Due to the symmetry of this calculation, there are only three possible configurations: the two rows are different, the two columns are different, or both the rows and the columns are different. When both rows and columns are different, we can, without loss of generality, take the first two rows and columns. In that case, the covariance becomes
$$\operatorname{cov}\left(\sum_{k=1}^I e_{1,k} e_{2,k},\ \sum_{i=1}^K e_{i,1} e_{i,2}\right) = \sum_{k=1}^I \sum_{i=1}^K \operatorname{cov}(e_{1,k} e_{2,k},\ e_{i,1} e_{i,2}).$$
However, every summand above is zero when $i > 2$ or $k > 2$, since it then involves independent variables. Therefore, we can express the above as
$$\operatorname{cov}(e_{1,1} e_{2,1} + e_{1,2} e_{2,2},\ e_{1,1} e_{1,2} + e_{2,1} e_{2,2}) = 0.$$
Hence, any term with two different rows and columns is zero. A similar result holds when there are either two different rows or two different columns. The only nonzero terms stem from taking the same row and the same column, in which case the value becomes
$$\operatorname{cov}(e_{1,1}^2 + e_{1,2}^2 + \dots + e_{1,I}^2,\ e_{1,1}^2 + e_{2,1}^2 + \dots + e_{K,1}^2) = \operatorname{var}(e_{1,1}^2) = 2.$$
Therefore, every nonzero entry equals 2. We now only need to determine which entries of the covariance matrix correspond to taking the same row and the same column. Considering the structure induced by vectorizing, the rows with index $i = k + (k-1)K$ correspond to matching the same row of $E$, and the columns with index $j = k + (k-1)I$ correspond to matching the same column of $E$. This corresponds to our definition, and the result follows. $\Box$

Some examples of $Q_{K,I}$ are
$$Q_{2,2} = \begin{pmatrix} 0.5 & 0 & 0 & 0.5 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0.5 & 0 & 0 & 0.5 \end{pmatrix}$$
and
$$Q_{2,3} = \begin{pmatrix}
6^{-1/2} & 0 & 0 & 0 & 6^{-1/2} & 0 & 0 & 0 & 6^{-1/2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
6^{-1/2} & 0 & 0 & 0 & 6^{-1/2} & 0 & 0 & 0 & 6^{-1/2}
\end{pmatrix}.$$
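Similarly, $Q_{K,I}$ can be constructed directly from its definition in (B2). A short Python sketch (the function name is ours), consistent with the displayed $Q_{2,2}$ and $Q_{2,3}$:

```python
def Q_KI(K, I):
    """Build the K^2 x I^2 matrix Q_{K,I} of (B2): the value (K*I)**-0.5
    appears exactly where the row index is a 'diagonal' position
    k1 + K*(k1 - 1) and the column index a 'diagonal' position
    k2 + I*(k2 - 1), both 1-based."""
    Q = [[0.0] * (I * I) for _ in range(K * K)]
    val = (K * I) ** -0.5
    for k1 in range(1, K + 1):
        for k2 in range(1, I + 1):
            Q[k1 + K * (k1 - 1) - 1][k2 + I * (k2 - 1) - 1] = val
    return Q
```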

Before the next theorem, we define the matrix $Q_{R,K}$ via pseudo code.

Begin Code
Set $Q_{R,K}$ to be an $R = K^2 I^2$ by $K^2$ matrix of zeros
For $i = 1, \dots, R$
  $l = 1 + \lfloor (i-1)/(K^2 I) \rfloor$
  $k = 1 + \lfloor \{i-1-(l-1)K^2 I\}/(KI) \rfloor$
  $p = 1 + \lfloor \{i-1-(l-1)K^2 I-(k-1)KI\}/K \rfloor$
  $m = i - (l-1)K^2 I - (k-1)KI - (p-1)K$
  If $p = l$ and $m \ne k$ then $Q_{R,K}\{i, m+(k-1)K\} = 1/(2I^{1/2})$ and $Q_{R,K}\{i, k+(m-1)K\} = 1/(2I^{1/2})$
  If $p = l$ and $m = k$ then $Q_{R,K}\{i, m+(m-1)K\} = 1/I^{1/2}$
End For Loop
End Code

THEOREM 3. If $E$ is a $K \times I$ matrix of independent standard normals and $E_\diamond = \operatorname{vec}(E)$, then
$$\operatorname{cov}\{\operatorname{vec}(E_\diamond E_\diamond^\top), \operatorname{vec}(EE^\top)\} = 2I^{1/2} Q_{R,K}, \tag{B3}$$
where $Q_{R,K}$ is the $R \times K^2$ matrix defined by the pseudo code above.

Proof. Begin by considering the $(i, j)$ entry of the desired covariance matrix. There exist indices such that the $(i, j)$ entry is equal to
$$\operatorname{cov}(e_{m,p} e_{k,l},\ e_r^\top e_s),$$
where $m, k, r, s$ take values $1, \dots, K$ and $p, l$ take values $1, \dots, I$. Moving from $(r, s)$ to $j$ we have that
$$j = 1 + (r-1) + (s-1)K,$$
and the reverse is obtained using
$$s = 1 + \lfloor (j-1)/K \rfloor, \qquad r = j - (s-1)K.$$
Moving from $(m, p, k, l)$ to $i$ we have that
$$i = 1 + (m-1) + (p-1)K + (k-1)KI + (l-1)K^2 I.$$
We can move back to $(m, p, k, l)$ from $i$ using
$$l = 1 + \lfloor (i-1)/(K^2 I) \rfloor, \qquad k = 1 + \lfloor \{i-1-(l-1)K^2 I\}/(KI) \rfloor,$$
$$p = 1 + \lfloor \{i-1-(l-1)K^2 I-(k-1)KI\}/K \rfloor, \qquad m = i - (l-1)K^2 I - (k-1)KI - (p-1)K.$$
We can see that the covariance will be zero if any one of $m, k, r$ or $s$ is distinct from the others. Thus, the only nonzero entries can correspond to $m = r = k = s$, $m = r \ne k = s$, or $m = s \ne k = r$; when $m = k \ne r = s$ we get zero. When all four are equal we get that
$$\operatorname{cov}(e_{m,p} e_{m,l},\ e_m^\top e_m) = \operatorname{cov}(e_{m,p} e_{m,l},\ e_{m,p}^2 + e_{m,l}^2 1_{p \ne l}),$$
which will be zero unless $p = l$, in which case it equals
$$\operatorname{cov}(e_{m,p}^2,\ e_{m,p}^2) = E(e_{m,p}^4) - \{E(e_{m,p}^2)\}^2 = 2.$$
This establishes the case $m = k$, $p = l$ of the pseudo code. Turning to the next case, when $m = r \ne k = s$, we have that
$$\operatorname{cov}(e_{m,p} e_{k,l},\ e_m^\top e_k) = \operatorname{cov}(e_{m,p} e_{k,l},\ e_{m,p} e_{k,p} + e_{m,l} e_{k,l} 1_{l \ne p}),$$
which is again only nonzero when $p = l$, in which case it equals
$$\operatorname{cov}(e_{m,p} e_{k,p},\ e_{m,p} e_{k,p}) = E(e_{m,p}^2)\, E(e_{k,p}^2) = 1.$$
An identical result holds when $m = s \ne k = r$, which gives the two symmetric assignments of the pseudo code, and the proof is established. $\Box$

One example of $Q_{R,K}$ is
$$Q_{16,2} = \begin{pmatrix}
2^{-1/2} & 0 & 0 & 0 \\
0 & 8^{-1/2} & 8^{-1/2} & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 8^{-1/2} & 8^{-1/2} & 0 \\
0 & 0 & 0 & 2^{-1/2} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
2^{-1/2} & 0 & 0 & 0 \\
0 & 8^{-1/2} & 8^{-1/2} & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 8^{-1/2} & 8^{-1/2} & 0 \\
0 & 0 & 0 & 2^{-1/2}
\end{pmatrix}.$$
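The pseudo code defining $Q_{R,K}$ transcribes directly into Python. The sketch below (the function name is ours; indices are kept 1-based as in the pseudo code) reproduces the displayed $Q_{16,2}$ when $K = I = 2$:

```python
def Q_RK(K, I):
    """Build the R x K^2 matrix Q_{R,K} (R = K^2 I^2) from the pseudo code."""
    R = K * K * I * I
    Q = [[0.0] * (K * K) for _ in range(R)]
    for i in range(1, R + 1):
        # Recover (m, p, k, l) from the 1-based linear index i.
        l = 1 + (i - 1) // (K * K * I)
        k = 1 + (i - 1 - (l - 1) * K * K * I) // (K * I)
        p = 1 + (i - 1 - (l - 1) * K * K * I - (k - 1) * K * I) // K
        m = i - (l - 1) * K * K * I - (k - 1) * K * I - (p - 1) * K
        if p == l and m != k:
            Q[i - 1][m + (k - 1) * K - 1] = 0.5 * I ** -0.5
            Q[i - 1][k + (m - 1) * K - 1] = 0.5 * I ** -0.5
        elif p == l and m == k:
            Q[i - 1][m + (m - 1) * K - 1] = I ** -0.5
    return Q
```

With $K = I = 2$ the nonzero values are $2^{-1/2}$ and $8^{-1/2}$, in the positions shown in $Q_{16,2}$ above.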

For the last Q matrix, we also use pseudo code, which is only slightly different from the code defining $Q_{R,K}$.

Begin Code
Set $Q_{R,I}$ to be an $R = K^2 I^2$ by $I^2$ matrix of zeros
For $i = 1, \dots, R$
  $l = 1 + \lfloor (i-1)/(K^2 I) \rfloor$
  $k = 1 + \lfloor \{i-1-(l-1)K^2 I\}/(KI) \rfloor$
  $p = 1 + \lfloor \{i-1-(l-1)K^2 I-(k-1)KI\}/K \rfloor$
  $m = i - (l-1)K^2 I - (k-1)KI - (p-1)K$
  If $m = k$ and $p \ne l$ then $Q_{R,I}\{i, p+(l-1)I\} = 1/(2K^{1/2})$ and $Q_{R,I}\{i, l+(p-1)I\} = 1/(2K^{1/2})$
  If $m = k$ and $p = l$ then $Q_{R,I}\{i, l+(l-1)I\} = 1/K^{1/2}$
End For Loop
End Code

THEOREM 4. If $E$ is a $K \times I$ matrix of independent standard normals and $E_\diamond = \operatorname{vec}(E)$, then
$$\operatorname{cov}\{\operatorname{vec}(E_\diamond E_\diamond^\top), \operatorname{vec}(E^\top E)\} = 2K^{1/2} Q_{R,I}, \tag{B4}$$
where $Q_{R,I}$ is the $R \times I^2$ matrix defined by the pseudo code above.


Proof. Begin by considering the $(i, j)$ entry of the desired covariance matrix. There exist indices such that the $(i, j)$ entry is equal to
$$\operatorname{cov}\{e_{m,p} e_{k,l},\ e_{(r)}^\top e_{(s)}\},$$
where $e_{(r)}$ denotes the $r$th column of $E$, $m, k$ take values $1, \dots, K$, and $p, l, r, s$ take values $1, \dots, I$. Moving from $(r, s)$ to $j$ we have that
$$j = 1 + (r-1) + (s-1)I,$$
and the reverse is obtained using
$$s = 1 + \lfloor (j-1)/I \rfloor, \qquad r = j - (s-1)I.$$
Moving from $(m, p, k, l)$ to $i$ we have that
$$i = 1 + (m-1) + (p-1)K + (k-1)KI + (l-1)K^2 I.$$
We can move back to $(m, p, k, l)$ from $i$ using
$$l = 1 + \lfloor (i-1)/(K^2 I) \rfloor, \qquad k = 1 + \lfloor \{i-1-(l-1)K^2 I\}/(KI) \rfloor,$$
$$p = 1 + \lfloor \{i-1-(l-1)K^2 I-(k-1)KI\}/K \rfloor, \qquad m = i - (l-1)K^2 I - (k-1)KI - (p-1)K.$$
We can see that the covariance will be zero if any one of $p, l, r$ or $s$ is distinct from the others. Thus, the only nonzero entries can correspond to $p = r = l = s$, $p = r \ne l = s$, or $p = s \ne l = r$; when $p = l \ne r = s$ we get zero. When all four are equal we get that
$$\operatorname{cov}\{e_{m,p} e_{k,p},\ e_{(p)}^\top e_{(p)}\} = \operatorname{cov}(e_{m,p} e_{k,p},\ e_{m,p}^2 + e_{k,p}^2 1_{m \ne k}),$$
which will be zero unless $m = k$, in which case it equals 2. This establishes the case $m = k$, $p = l$ of the pseudo code. Turning to the next case, when $p = r \ne l = s$, we have that
$$\operatorname{cov}\{e_{m,p} e_{k,l},\ e_{(p)}^\top e_{(l)}\} = \operatorname{cov}(e_{m,p} e_{k,l},\ e_{m,p} e_{m,l} + e_{k,p} e_{k,l} 1_{m \ne k}),$$
which is again only nonzero when $m = k$, in which case it equals
$$\operatorname{cov}(e_{m,p} e_{m,l},\ e_{m,p} e_{m,l}) = E(e_{m,p}^2)\, E(e_{m,l}^2) = 1.$$
An identical result holds when $p = s \ne l = r$, which gives the two symmetric assignments of the pseudo code, and the proof is established. $\Box$

C. PROOF OF THEOREM 1

We begin by establishing in Theorem 5 the joint null limit distribution of the vectors $\operatorname{vec}(\hat U - U)$, $\operatorname{vec}(\hat V - V)$, and $\operatorname{vec}(\hat\Sigma - \Sigma)$. We first define several matrices that appear in this distribution. Recall the Q matrices derived in Section B: the matrix $Q_K$ is defined in (B1), $Q_{K,I}$ in (B2), $Q_{R,K}$ in (B3), and $Q_{R,I}$ in (B4). Denote by $(\cdot)^+$ the Moore–Penrose generalized inverse. We define the following generalized information matrices:
$$\mathcal{I}_{U,V} = \frac{1}{2} \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix} \begin{pmatrix} I Q_K & (IK)^{1/2} Q_{K,I} \\ (IK)^{1/2} Q_{K,I}^\top & K Q_I \end{pmatrix} \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix},$$
$$\mathcal{I}_{U,V}^c = \{D(D^\top \mathcal{I}_{U,V} D)^+ D^\top\}^+,$$
where $D$ is a $(K^2 + I^2) \times (K^2 + I^2 - 1)$ matrix whose columns are orthonormal and perpendicular to $\operatorname{vec}(I_{K^2+I^2})$, and
$$\mathcal{I}_\Sigma = \frac{1}{2} (\Sigma^{-1/2} \otimes \Sigma^{-1/2})\, Q_{R^{1/2}}\, (\Sigma^{-1/2} \otimes \Sigma^{-1/2}),$$
where $R^{1/2} = KI$.

THEOREM 5. Suppose Assumption 1 and decomposition (3) hold. Assume further that $\operatorname{tr}(U) = K$. Then
$$N^{1/2} \begin{pmatrix} \operatorname{vec}(\hat U - U) \\ \operatorname{vec}(\hat V - V) \\ \operatorname{vec}(\hat\Sigma - \Sigma) \end{pmatrix} \to N(0, \Gamma), \qquad N \to \infty.$$
The asymptotic covariance matrix $\Gamma$ is defined as follows. The asymptotic covariance of $\{\operatorname{vec}(\hat U - U), \operatorname{vec}(\hat V - V)\}$ is given by $(\mathcal{I}_{U,V}^c)^+$, that of $\operatorname{vec}(\hat\Sigma - \Sigma)$ is given by $\mathcal{I}_\Sigma^+$, and the cross-covariance matrix between the two is
$$\frac{1}{2} (\mathcal{I}_{U,V}^c)^+ \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix} \begin{pmatrix} I^{1/2} Q_{R,K}^\top \\ K^{1/2} Q_{R,I}^\top \end{pmatrix} (\Sigma^{-1/2} \otimes \Sigma^{-1/2})\, \mathcal{I}_\Sigma^+.$$

Proof. From standard theory for maximum likelihood estimators, Ferguson (1996), Chapter 18, we can use the partial derivatives of the log likelihood function, the score equations, to find the Fisher information as well as asymptotic expressions for the maximum likelihood estimators. One can show that the cross terms of the Fisher information involving $M$ and $V$, $U$ and $\Sigma$ are all zero, meaning that the estimate of $M$ is asymptotically independent of $\hat U$, $\hat V$ and $\hat\Sigma$. We therefore treat $M$ as known in what follows.

We start by working with $U$ and $V$. Applying the constrained likelihood methods described in Moore et al. (2008), asymptotically, $\hat U$ and $\hat V$ are jointly normally distributed with means $U$ and $V$ and covariance given by the generalized inverse of the constrained Fisher information matrix. Starting with $U$, the unconstrained score equation is given by
$$\frac{1}{N^{1/2}} \frac{\partial l(M,U,V)}{\partial U} = \frac{1}{2N^{1/2}} \sum_{n=1}^N \{U^{-1}(X_n - M)V^{-1}(X_n - M)^\top U^{-1} - IU^{-1}\} = \frac{1}{2N^{1/2}} \sum_{n=1}^N (U^{-1/2} E_n E_n^\top U^{-1/2} - IU^{-1}) = \frac{1}{2N^{1/2}} \sum_{n=1}^N U^{-1/2}(E_n E_n^\top - I\, I_{K\times K}) U^{-1/2}.$$
To get a handle on the unconstrained Fisher information matrix, and therefore the covariance matrix, it is easier to work with the vectorized version
$$\frac{1}{2N^{1/2}} (U^{-1/2} \otimes U^{-1/2}) \sum_{n=1}^N \operatorname{vec}(E_n E_n^\top - I\, I_{K\times K}).$$
Notice that we will have a complete handle on the above if we can understand the form of the covariance of $\operatorname{vec}(E_n E_n^\top - I\, I_{K\times K})$. However, this is a term that in no way depends on the underlying parameters, as it is composed entirely of iid standard normals. We label $Q_K = (2I)^{-1} \operatorname{cov}\{\operatorname{vec}(E_n E_n^\top - I\, I_{K\times K})\}$; its explicit form is given in (B1). The part of the Fisher information matrix for $\operatorname{vec}(U)$ is given by
$$\frac{I}{2}(U^{-1/2} \otimes U^{-1/2})\, Q_K\, (U^{-1/2} \otimes U^{-1/2}).$$


Identical arguments give that the part of the Fisher information matrix for $\operatorname{vec}(V)$ is given by
$$\frac{K}{2}(V^{-1/2} \otimes V^{-1/2})\, Q_I\, (V^{-1/2} \otimes V^{-1/2}).$$
The joint unconstrained Fisher information matrix for $U$ and $V$ is given by
$$\mathcal{I}_{U,V} = \frac{1}{2} \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix} \begin{pmatrix} I Q_K & (IK)^{1/2} Q_{K,I} \\ (IK)^{1/2} Q_{K,I}^\top & K Q_I \end{pmatrix} \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix},$$
where $Q_{K,I}$ is defined in (B2). The constrained version is then given by
$$\mathcal{I}_{U,V}^c = \{D(D^\top \mathcal{I}_{U,V} D)^+ D^\top\}^+.$$
Recall that $D$ is a $(K^2 + I^2) \times (K^2 + I^2 - 1)$ matrix whose columns are orthonormal and perpendicular to $\operatorname{vec}(I_{K^2+I^2})$. The form for $D$ comes from the gradient of the constraint $\operatorname{tr}(U) = K$.

The last piece we need is the joint behavior of $\hat U$ or $\hat V$ and the estimator $\hat\Sigma$. The score equation for $\Sigma$ can be expressed as
$$\frac{\partial l(\mu,\Sigma)}{\partial \Sigma} = -\frac{N}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left\{\sum_{n=1}^N (Y_n-\mu)(Y_n-\mu)^\top\right\}\Sigma^{-1} = -\frac{N}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1/2}\left(\sum_{n=1}^N E_n E_n^\top\right)\Sigma^{-1/2} = \frac{1}{2}\Sigma^{-1/2}\sum_{n=1}^N (E_n E_n^\top - I_{KI\times KI})\,\Sigma^{-1/2}.$$
Using the same arguments as before, we get that the Fisher information matrix for $\operatorname{vec}(\Sigma)$ is
$$\mathcal{I}_\Sigma = \frac{1}{2}(\Sigma^{-1/2}\otimes\Sigma^{-1/2})\, Q_{R^{1/2}}\, (\Sigma^{-1/2}\otimes\Sigma^{-1/2}).$$
For the joint behavior, we use the following asymptotic expressions for the maximum likelihood estimators, Ferguson (1996), Chapter 18:
$$N^{1/2}\begin{pmatrix} \operatorname{vec}(\hat U - U) \\ \operatorname{vec}(\hat V - V) \end{pmatrix} = N^{-1/2}(\mathcal{I}_{U,V}^c)^+ \begin{pmatrix} \operatorname{vec}\{\partial l(M,U,V)/\partial U\} \\ \operatorname{vec}\{\partial l(M,U,V)/\partial V\} \end{pmatrix} + o_P(1)$$
and
$$N^{1/2}\operatorname{vec}(\hat\Sigma - \Sigma) = N^{-1/2}\,\mathcal{I}_\Sigma^+ \operatorname{vec}\left\{\frac{\partial l(\mu,\Sigma)}{\partial \Sigma}\right\} + o_P(1).$$

For the covariance between $\hat\Sigma$ and $\hat U$ or $\hat V$ we obtain two more matrices, called $Q_{R,K}$ and $Q_{R,I}$, which satisfy
$$\operatorname{cov}\{\operatorname{vec}(E_{\diamond n} E_{\diamond n}^\top - I_{KI\times KI}),\ \operatorname{vec}(E_n E_n^\top - I\, I_{K\times K})\} = 2I^{1/2} Q_{R,K},$$
$$\operatorname{cov}\{\operatorname{vec}(E_{\diamond n} E_{\diamond n}^\top - I_{KI\times KI}),\ \operatorname{vec}(E_n^\top E_n - K\, I_{I\times I})\} = 2K^{1/2} Q_{R,I}.$$
Recall that the diamond subscript indicates vectorization; the definitions of $Q_{R,K}$ and $Q_{R,I}$ can be found in (B3) and (B4), respectively. The cross-covariance matrix for $N^{1/2}\operatorname{vec}(\hat\Sigma - \Sigma)$ and $N^{1/2}\{\operatorname{vec}(\hat U - U), \operatorname{vec}(\hat V - V)\}$ is then given by
$$(\mathcal{I}_{U,V}^c)^+ \operatorname{cov}\left[\begin{pmatrix} \operatorname{vec}\{N^{-1/2}\,\partial l(M,U,V)/\partial U\} \\ \operatorname{vec}\{N^{-1/2}\,\partial l(M,U,V)/\partial V\} \end{pmatrix},\ \operatorname{vec}\{N^{-1/2}\,\partial l(\mu,\Sigma)/\partial \Sigma\}\right] \mathcal{I}_\Sigma^+$$
$$= \frac{1}{2}(\mathcal{I}_{U,V}^c)^+ \begin{pmatrix} U^{-1/2} \otimes U^{-1/2} & 0 \\ 0 & V^{-1/2} \otimes V^{-1/2} \end{pmatrix} \begin{pmatrix} I^{1/2} Q_{R,K}^\top \\ K^{1/2} Q_{R,I}^\top \end{pmatrix} (\Sigma^{-1/2} \otimes \Sigma^{-1/2})\, \mathcal{I}_\Sigma^+. \qquad \Box$$

Proof of Theorem 1. Since we have the joint asymptotic distribution of $\hat U$, $\hat V$, and $\hat\Sigma$, we can use the delta method to find the asymptotic distributions of the desired test statistics; in particular, we can find the form of $W$, the asymptotic covariance matrix of $\operatorname{vec}(\hat V \otimes \hat U) - \operatorname{vec}(\hat\Sigma)$. To apply the delta method, we need the partial derivatives. Taking the derivative with respect to $V_{i,j}$ yields
$$\operatorname{vec}(1_{i,j} \otimes U),$$
and with respect to $U_{k,l}$,
$$\operatorname{vec}(V \otimes 1_{k,l}).$$
So the matrix of partials with respect to $\operatorname{vec}(V)$ is
$$G_V = \begin{pmatrix} \operatorname{vec}(1_{1,1}\otimes U)^\top \\ \operatorname{vec}(1_{2,1}\otimes U)^\top \\ \vdots \\ \operatorname{vec}(1_{I,I}\otimes U)^\top \end{pmatrix},$$
with respect to $\operatorname{vec}(U)$ is
$$G_U = \begin{pmatrix} \operatorname{vec}(V\otimes 1_{1,1})^\top \\ \operatorname{vec}(V\otimes 1_{2,1})^\top \\ \vdots \\ \operatorname{vec}(V\otimes 1_{K,K})^\top \end{pmatrix},$$
and with respect to $\operatorname{vec}(\Sigma)$ is just $(-1)$ times the $KI \times KI$ identity matrix. We therefore have that
$$\operatorname{vec}(\hat V \otimes \hat U) - \operatorname{vec}(\hat\Sigma) \approx \begin{pmatrix} G_U \\ G_V \\ -I_{KI\times KI} \end{pmatrix}^\top \begin{pmatrix} \operatorname{vec}(\hat U) \\ \operatorname{vec}(\hat V) \\ \operatorname{vec}(\hat\Sigma) \end{pmatrix}.$$
This implies that
$$W = \begin{pmatrix} G_U \\ G_V \\ -I_{KI\times KI} \end{pmatrix}^\top \Gamma \begin{pmatrix} G_U \\ G_V \\ -I_{KI\times KI} \end{pmatrix}. \tag{C1}$$
The degrees of freedom are obtained by noticing that under the alternative $\Sigma$ has $KI(KI+1)/2$ free parameters, while under the null there are $K(K+1)/2 + I(I+1)/2 - 1$, where the last $-1$ is included because we have one constraint, $\operatorname{tr}(U) = K$. $\Box$
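The degrees-of-freedom count described above is a one-line computation; a small Python sketch (the function name is ours):

```python
def df_separability(K, I):
    """Chi-square degrees of freedom for the separability test: free
    parameters of an unrestricted KI x KI covariance minus those of the
    separable model (U, V) under the constraint tr(U) = K."""
    alternative = K * I * (K * I + 1) // 2
    null = K * (K + 1) // 2 + I * (I + 1) // 2 - 1
    return alternative - null
```

For example, with $K = I = 2$ this gives $10 - 5 = 5$ degrees of freedom.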

D. PROOF OF THEOREM 2

We begin by restating Assumption 2 in Section 3 of the main paper. We assume that the processes have mean zero for simplicity.

Assumption 1. Assume that the $X_n$ are independent and identically distributed mean zero Gaussian processes in $L^2(S \times T)$ with separable covariance $\sigma(s, s', t, t') = \mathcal{U}(s, s')\mathcal{V}(t, t')$. Assume the functions are scaled such that $\operatorname{tr}(\mathcal{U}) = \sum_k \mathcal{U}(s_k, s_k) = \sum_k U(k, k) = K$.

The following corollary essentially follows from Isserlis' theorem applied to our stochastic processes.

COROLLARY 1. Under Assumption D1, we have
$$\operatorname{cov}\{X_n(s_k,t)X_n(s_k,t'),\ X_n(s_i,t)X_n(s_i,t')\} = E\{X_n(s_k,t)X_n(s_i,t)\}E\{X_n(s_k,t')X_n(s_i,t')\} + E\{X_n(s_k,t)X_n(s_i,t')\}E\{X_n(s_k,t')X_n(s_i,t)\},$$
and
$$\operatorname{cov}\{\xi_{jn}(s_k)\xi_{jn}(s_\ell),\ \xi_{in}(s_k)\xi_{in}(s_\ell)\} = E\{\xi_{jn}(s_k)\xi_{in}(s_k)\}E\{\xi_{jn}(s_\ell)\xi_{in}(s_\ell)\} + E\{\xi_{jn}(s_k)\xi_{in}(s_\ell)\}E\{\xi_{jn}(s_\ell)\xi_{in}(s_k)\}.$$

The key to establishing the equivalence of procedures based on population and estimated principal components is the derivation of $O_P(N^{-1})$ bounds for the differences between the population covariance operator $\mathcal{V}$ and the operator $\hat{\mathcal{V}}$ defined in the procedure of Section 3, and an analogous bound for the difference between $U$ and $\hat U$. To emphasize the different dimensionalities, we refer to the Hilbert–Schmidt norm of a matrix as a Frobenius norm, and use the subscript $F$ to indicate it. The Hilbert–Schmidt norm of operators is indicated with $HS$.

THEOREM 6. Under Assumption D1,
$$E\left(\|\hat{\mathcal{V}} - \mathcal{V}\|^2_{HS}\right) \le \frac{2}{NK^2}\,\|U\|^2_F\,\|\mathcal{V}^{1/2}\|^4_{HS},$$
where $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|_{HS}$ is the Hilbert–Schmidt norm.

Proof. The expected value of the estimator $\hat{\mathcal{V}}(t, t')$ is given by
$$E\{\hat{\mathcal{V}}(t,t')\} = E\left\{\frac{1}{NK}\sum_{n=1}^N\sum_{k=1}^K X_n(s_k,t)X_n(s_k,t')\right\} = \frac{1}{NK}\sum_{n=1}^N\sum_{k=1}^K E\{X_n(s_k,t)X_n(s_k,t')\} = \frac{1}{NK}\sum_{n=1}^N\sum_{k=1}^K U(k,k)\mathcal{V}(t,t') = \mathcal{V}(t,t'),$$
since $\sum_{k=1}^K U(k,k) = \operatorname{tr}(U) = K$, which means the estimator $\hat{\mathcal{V}}(t,t')$ is unbiased. Next observe that
$$E\left(\|\hat{\mathcal{V}} - \mathcal{V}\|^2_{HS}\right) = E\left[\iint \{\hat{\mathcal{V}}(t,t') - \mathcal{V}(t,t')\}^2\,dt\,dt'\right] = \iint E\{\hat{\mathcal{V}}(t,t') - \mathcal{V}(t,t')\}^2\,dt\,dt'.$$


Since the estimator $\hat{\mathcal{V}}(t,t')$ is unbiased, $E\{\hat{\mathcal{V}}(t,t') - \mathcal{V}(t,t')\}^2 = \operatorname{var}\{\hat{\mathcal{V}}(t,t')\}$. Applying Corollary 1, we obtain
$$\operatorname{var}\{\hat{\mathcal{V}}(t,t')\} = \operatorname{cov}\left\{\frac{1}{NK}\sum_{n=1}^N\sum_{k=1}^K X_n(s_k,t)X_n(s_k,t'),\ \frac{1}{NK}\sum_{m=1}^N\sum_{i=1}^K X_m(s_i,t)X_m(s_i,t')\right\}$$
$$= \frac{1}{NK^2}\sum_{k=1}^K\sum_{i=1}^K \operatorname{cov}\{X_n(s_k,t)X_n(s_k,t'),\ X_n(s_i,t)X_n(s_i,t')\}$$
$$= \frac{1}{NK^2}\sum_{k=1}^K\sum_{i=1}^K \left[E\{X_n(s_k,t)X_n(s_i,t)\}E\{X_n(s_k,t')X_n(s_i,t')\} + E\{X_n(s_k,t)X_n(s_i,t')\}E\{X_n(s_k,t')X_n(s_i,t)\}\right] \quad \text{(by Corollary 1)}$$
$$= \frac{1}{NK^2}\sum_{k=1}^K\sum_{i=1}^K \{U(k,i)\mathcal{V}(t,t)U(k,i)\mathcal{V}(t',t') + U(k,i)\mathcal{V}(t,t')U(k,i)\mathcal{V}(t',t)\}$$
$$= \frac{1}{NK^2}\sum_{k=1}^K\sum_{i=1}^K U^2(k,i)\{\mathcal{V}(t,t)\mathcal{V}(t',t') + \mathcal{V}^2(t,t')\} = \frac{1}{NK^2}\,\|U\|^2_F\,\{\mathcal{V}(t,t)\mathcal{V}(t',t') + \mathcal{V}^2(t,t')\}.$$

Applying the Cauchy–Schwarz inequality we conclude that
$$E\left(\|\hat{\mathcal{V}} - \mathcal{V}\|^2_{HS}\right) = \frac{1}{NK^2}\|U\|^2_F \iint \{\mathcal{V}(t,t)\mathcal{V}(t',t') + \mathcal{V}^2(t,t')\}\,dt\,dt' = \frac{1}{NK^2}\|U\|^2_F\left\{\int \mathcal{V}(t,t)\,dt \int \mathcal{V}(t',t')\,dt' + \iint \mathcal{V}^2(t,t')\,dt\,dt'\right\}$$
$$\le \frac{2}{NK^2}\|U\|^2_F\left\{\int \mathcal{V}(t,t)\,dt \int \mathcal{V}(t',t')\,dt'\right\} = \frac{2}{NK^2}\|U\|^2_F\,\operatorname{tr}^2(\mathcal{V}) = \frac{2}{NK^2}\|U\|^2_F\,\|\mathcal{V}^{1/2}\|^4_{HS},$$
as desired. $\Box$

As corollaries, we have that the eigenvalues and eigenfunctions can also be consistently estimated. The proofs are directly based on well-known inequalities from operator theory, e.g., Horváth & Kokoszka (2012), Chapter 2.

COROLLARY 2. Let $v_j$, $\hat v_j$ be the eigenfunctions of $\mathcal{V}$, $\hat{\mathcal{V}}$, respectively. Then
$$E\left(\|\hat v_j - v_j\|^2\right) \le \frac{2(2)^{1/2}}{\alpha_j}\, E\left(\|\hat{\mathcal{V}} - \mathcal{V}\|^2_{HS}\right) \le \frac{4(2)^{1/2}}{NK^2\alpha_j}\,\|U\|^2_F\,\|\mathcal{V}^{1/2}\|^4_{HS},$$
where $\alpha_1 = \lambda_1 - \lambda_2$ and $\alpha_j = \min\{\lambda_j - \lambda_{j+1},\ \lambda_{j-1} - \lambda_j\}$ for $j \ge 2$.

COROLLARY 3. Let $\lambda_j$, $\hat\lambda_j$ be the eigenvalues of $\mathcal{V}$, $\hat{\mathcal{V}}$, respectively. Then
$$E\left(|\hat\lambda_j - \lambda_j|^2\right) \le E\left(\|\hat{\mathcal{V}} - \mathcal{V}\|^2_{HS}\right) \le \frac{2}{NK^2}\,\|U\|^2_F\,\|\mathcal{V}^{1/2}\|^4_{HS}.$$

THEOREM 7. Under Assumption D1, $\|\hat U - U\|^2_F = O_P(N^{-1})$.


Proof. Recall that
$$\hat U(k,k') = \frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\hat\sigma)}.$$
Notice that
$$\operatorname{tr}(\hat U) = \frac{K\sum_n\sum_k \int X_n(s_k,t)X_n(s_k,t)\,dt}{N\operatorname{tr}(\hat\sigma)} = \frac{K\operatorname{tr}(\hat\sigma)}{\operatorname{tr}(\hat\sigma)} = K,$$
which explains the chosen normalization. Define the intermediate term
$$\tilde U(k,k') = \frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\sigma)},$$
based on the true trace of $\sigma$. To establish the desired result, we first separate it into two terms:
$$\|\hat U - U\|_F \le \|\hat U - \tilde U\|_F + \|\tilde U - U\|_F.$$

The square of the first term is given by
$$|\hat U(k,k') - \tilde U(k,k')|^2 = \left|\frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\hat\sigma)} - \frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\sigma)}\right|^2 = \left|\frac{K}{N}\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt\left\{\frac{1}{\operatorname{tr}(\hat\sigma)} - \frac{1}{\operatorname{tr}(\sigma)}\right\}\right|^2$$
$$\le \frac{K^2}{N^2}\left\{\sum_n \int X_n^2(s_k,t)\,dt\right\}\left\{\sum_n \int X_n^2(s_{k'},t)\,dt\right\}\left|\frac{1}{\operatorname{tr}(\hat\sigma)} - \frac{1}{\operatorname{tr}(\sigma)}\right|^2 = \frac{K^2}{N^2}\left(\sum_n \|X_n(s_k)\|^2\right)\left(\sum_n \|X_n(s_{k'})\|^2\right)\left|\frac{1}{\operatorname{tr}(\hat\sigma)} - \frac{1}{\operatorname{tr}(\sigma)}\right|^2 = O_P(K^2 N^{-1}),$$
by the Cauchy–Schwarz inequality, and since $\hat\sigma$ is $N^{1/2}$-consistent, so that, by the delta method applied to the function $f(\sigma) = 1/\operatorname{tr}(\sigma)$, $1/\operatorname{tr}(\hat\sigma)$ is also $N^{1/2}$-consistent.

Turning to the second term, the expected value of $\tilde U(k,k')$ is given by
$$E\{\tilde U(k,k')\} = E\left\{\frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\sigma)}\right\} = \frac{K}{N\operatorname{tr}(\sigma)}\sum_n \int E\{X_n(s_k,t)X_n(s_{k'},t)\}\,dt = \frac{K}{N\operatorname{tr}(\sigma)}\sum_n U(k,k')\int \mathcal{V}(t,t)\,dt = \frac{K}{\operatorname{tr}(\sigma)}\,U(k,k')\operatorname{tr}(\mathcal{V}) = U(k,k'),$$


since $\operatorname{tr}(\sigma) = \operatorname{tr}(\mathcal{V})\operatorname{tr}(U)$ and $\operatorname{tr}(U) = K$, which means the estimator $\tilde U(k,k')$ is unbiased. Comparing $\tilde U$ to $U$ we have
$$E\left(\|\tilde U - U\|^2\right) = E\left[\sum_{k=1}^K\sum_{k'=1}^K \{\tilde U(k,k') - U(k,k')\}^2\right] = \sum_{k=1}^K\sum_{k'=1}^K E\{\tilde U(k,k') - U(k,k')\}^2.$$

Since the estimator $\tilde U(k,k')$ is unbiased, we have $E\{\tilde U(k,k') - U(k,k')\}^2 = \operatorname{var}\{\tilde U(k,k')\}$, and so
$$\operatorname{var}\{\tilde U(k,k')\} = \operatorname{cov}\left\{\frac{K\sum_n \int X_n(s_k,t)X_n(s_{k'},t)\,dt}{N\operatorname{tr}(\sigma)},\ \frac{K\sum_m \int X_m(s_k,t')X_m(s_{k'},t')\,dt'}{N\operatorname{tr}(\sigma)}\right\}$$
$$= \frac{K^2}{N\operatorname{tr}^2(\sigma)}\iint \operatorname{cov}\{X_n(s_k,t)X_n(s_{k'},t),\ X_n(s_k,t')X_n(s_{k'},t')\}\,dt\,dt'$$
$$= \frac{K^2}{N\operatorname{tr}^2(\sigma)}\iint \left[E\{X_n(s_k,t)X_n(s_k,t')\}E\{X_n(s_{k'},t)X_n(s_{k'},t')\} + E\{X_n(s_k,t)X_n(s_{k'},t')\}E\{X_n(s_{k'},t)X_n(s_k,t')\}\right]dt\,dt' \quad \text{(by Corollary 1)}$$
$$= \frac{K^2}{N\operatorname{tr}^2(\sigma)}\iint \{U(k,k)\mathcal{V}(t,t')U(k',k')\mathcal{V}(t,t') + U(k,k')\mathcal{V}(t,t')U(k,k')\mathcal{V}(t,t')\}\,dt\,dt' = \frac{K^2}{N\operatorname{tr}^2(\sigma)}\{U(k,k)U(k',k') + U^2(k,k')\}\iint \mathcal{V}^2(t,t')\,dt\,dt'.$$

∫t′V2(t, t′)dt dt′.

We therefore have

E(‖U − U‖2

)=

K2

N tr2(σ)

K∑k=1

K∑k′=1

{U(k, k)U(k′, k′) + U2(k, k′)}∫t

∫t′V2(t, t′)dt dt′

C.S.≤ 2K2

N tr2(σ)

K∑k=1

K∑k′=1

U(k, k)U(k′, k′)

∫t

V(t, t)dt

∫t′V(t′, t′)dt′

=2K2

N tr2(σ)tr2(U) tr2(V) 385

= OP (K2N−1)

since tr(σ) = tr(U) tr(V) = K tr(V). �

Using Theorems D6 and D7, one can show that replacing the \(u_k\) and the \(v_j\) by their estimates \(\hat u_k\) and \(\hat v_j\) has an asymptotically negligible effect. We provide the argument for the statistic \(T_F\), which generally leads to a test that performs better than the other tests. We also assume that \(T_F\) is computed without iterating, using the estimates
\[
\hat V(t,t') = \frac{1}{NK} \sum_{n=1}^{N}\sum_{k=1}^{K} X_n(s_k,t)\,X_n(s_k,t'),
\qquad
\hat U(k,k') = \frac{K}{N\operatorname{tr}(\hat\sigma)} \sum_{n=1}^{N} \int X_n(s_k,t)\,X_n(s_{k'},t)\,dt.
\]
This simplifies the asymptotic arguments greatly, as the test statistic then depends linearly on the eigenfunctions. In this case, the test statistic is given by
\[
\hat T_F = \sum_{k_1=1}^{K}\sum_{j_1=1}^{J}\sum_{k_2=1}^{K}\sum_{j_2=1}^{J}
N \langle \hat V \otimes \hat U - \hat\Sigma,\; \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} \rangle^2.
\]
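For concreteness, the non-iterated statistic can be computed on a discretized sample. The following is a minimal numpy sketch, assuming centred curves `X[n, k, i]` on a regular time grid on [0, 1]; the function `tf_statistic` and its quadrature choices are ours, for illustration only, and do not reproduce the authors' implementation.

```python
import numpy as np

def tf_statistic(X, J=2):
    """Sketch of the non-iterated T_F statistic for centred data X[n, k, i]
    observed on a regular grid of I time points on [0, 1]."""
    N, K, I = X.shape
    dt = 1.0 / I                                    # quadrature weight
    # tr(sigma_hat) = N^{-1} sum_{n,k} int X_n(s_k, t)^2 dt
    tr_sigma = np.sum(X ** 2) * dt / N
    # marginal estimators V_hat(t, t') and U_hat(k, k')
    V = np.einsum('nki,nkj->ij', X, X) / (N * K)
    U = K * np.einsum('nki,nli->kl', X, X) * dt / (N * tr_sigma)
    # full empirical covariance Sigma_hat[(k, i), (l, j)]
    Sig = np.einsum('nki,nlj->kilj', X, X) / N
    # discrepancy between the separable product and the full covariance
    T_star = np.einsum('ij,kl->kilj', V, U) - Sig
    # estimated eigenvectors/eigenfunctions; eigh returns ascending order
    u = np.linalg.eigh(U)[1][:, ::-1]               # all K spatial vectors
    v = np.linalg.eigh(V * dt)[1][:, ::-1][:, :J] / np.sqrt(dt)  # top J, L2-normalized
    # squared projections <T*, u_{k1} x v_{j1} x u_{k2} x v_{j2}>^2
    proj = np.einsum('kilj,ka,ib,lc,jd->abcd', T_star, u, v * dt, u, v * dt)
    return N * np.sum(proj ** 2)

X = np.random.default_rng(1).standard_normal((50, 4, 25))
X -= X.mean(axis=0)                                 # centre the sample
t_f = tf_statistic(X)                               # nonnegative scalar
```

The `einsum` calls carry out the temporal integrals via Riemann sums, so the statistic depends on the grid resolution; on finer grids the sums approximate the integrals in the display above more closely.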


THEOREM 8. Denote by \(\hat T_F\) the statistic computed using the estimators \(\hat u_k\) and \(\hat v_j\), and by \(T_F^\dagger\) the random variable computed using the population functions \(u_k\) and \(v_j\). If Assumption D1 holds, then
\[
\hat T_F - T_F^\dagger = O_P(N^{-1/2}).
\]

Proof. To lighten the notation, we drop the superscript \(F\). First notice that
\[
\hat T = \sum_{k_1=1}^{K}\sum_{j_1=1}^{J}\sum_{k_2=1}^{K}\sum_{j_2=1}^{J}
N \langle \hat V \otimes \hat U - \hat\Sigma,\; \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} \rangle^2,
\]

with an analogous formula for \(T^\dagger\). Let \(T^* = \hat V \otimes \hat U - \hat\Sigma\). By the triangle inequality we have that
\[
\left| \sum_{k_1,j_1,k_2,j_2} N \langle T^*, \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} \rangle^2
- \sum_{k_1,j_1,k_2,j_2} N \langle T^*, u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2} \rangle^2 \right|
\]
\[
\le N \sum_{k_1,j_1,k_2,j_2}
\left| \langle T^*, \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} \rangle^2
- \langle T^*, u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2} \rangle^2 \right|.
\]

By the identity \(a^2 - b^2 = (a-b)(a+b)\) and the linearity of the inner product, each summand equals
\[
\left| \langle T^*, \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2} \rangle \right|
\times
\left| \langle T^*, \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
+ u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2} \rangle \right|.
\]
Applying the Cauchy–Schwarz inequality, the above is bounded by
\[
\|T^*\|^2 \,
\|\hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\| \,
\|\hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
+ u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\|
\]
\[
\le 2 \|T^*\|^2 \,
\|\hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\|,
\]
since each of the two tensor products has unit norm.

Adding and subtracting intermediate terms and using the fact that all factors have unit norm, we obtain
\[
\|\hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\|
\le \|(\hat u_{k_1} - u_{k_1}) \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}\|
+ \|u_{k_1} \otimes (\hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- v_{j_1} \otimes u_{k_2} \otimes v_{j_2})\|
\]
\[
= \|\hat u_{k_1} - u_{k_1}\|
+ \|\hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} - v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\|.
\]
Iterating the same argument over the remaining factors yields
\[
\|\hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2}
- u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2}\|
\le \|\hat u_{k_1} - u_{k_1}\| + \|\hat v_{j_1} - v_{j_1}\|
+ \|\hat u_{k_2} - u_{k_2}\| + \|\hat v_{j_2} - v_{j_2}\|
\]
\[
\le \frac{2(2)^{1/2}}{\beta_{k_1}} \|\hat U - U\|
+ \frac{2(2)^{1/2}}{\alpha_{j_1}} \|\hat V - V\|
+ \frac{2(2)^{1/2}}{\beta_{k_2}} \|\hat U - U\|
+ \frac{2(2)^{1/2}}{\alpha_{j_2}} \|\hat V - V\|,
\]

where \(\alpha_1 = \lambda_1 - \lambda_2\), \(\alpha_j = \min\{\lambda_j - \lambda_{j+1}, \lambda_{j-1} - \lambda_j\}\) for \(j \ge 2\), \(\beta_1 = \mu_1 - \mu_2\), and \(\beta_k = \min\{\mu_k - \mu_{k+1}, \mu_{k-1} - \mu_k\}\) for \(k \ge 2\). Here \(\lambda_1, \dots, \lambda_J\) are the eigenvalues of \(V\) and \(\mu_1, \dots, \mu_K\) are the eigenvalues of \(U\). The last inequality is the standard bound on estimated eigenfunctions in terms of the corresponding eigenvalue gaps; see Horváth & Kokoszka (2012).
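The eigenfunction perturbation bound used in the last step, \(\|\hat v_j - v_j\| \le 2(2)^{1/2}\alpha_j^{-1}\|\hat V - V\|\), can be checked numerically. The following is a minimal numpy sketch on a small symmetric matrix standing in for the covariance operator; the matrix, the perturbation, and their sizes are illustrative choices.

```python
import numpy as np

# check ||v1_hat - v1|| <= 2 * 2^{1/2} / alpha_1 * ||E||_F on a toy operator
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
V = B @ B.T                                   # "true" symmetric operator
E = rng.standard_normal((6, 6))
E = 1e-3 * (E + E.T) / 2                      # small symmetric perturbation

lam, vec = np.linalg.eigh(V)                  # ascending eigenvalues
lam_h, vec_h = np.linalg.eigh(V + E)
v1, v1_h = vec[:, -1], vec_h[:, -1]           # leading eigenvectors
if v1 @ v1_h < 0:                             # resolve the sign ambiguity
    v1_h = -v1_h

alpha1 = lam[-1] - lam[-2]                    # leading eigenvalue gap
bound = 2 * np.sqrt(2) / alpha1 * np.linalg.norm(E, 'fro')
err = np.linalg.norm(v1_h - v1)
print(err <= bound)
```

To first order, the error is of size \(\|E\|/\alpha_1\), so the bound holds with room to spare for small perturbations; it degrades, as the formula suggests, when the eigenvalue gap shrinks.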


To sum up, we have
\[
\left| \sum_{k_1,j_1,k_2,j_2} N \langle T^*, \hat u_{k_1} \otimes \hat v_{j_1} \otimes \hat u_{k_2} \otimes \hat v_{j_2} \rangle^2
- \sum_{k_1,j_1,k_2,j_2} N \langle T^*, u_{k_1} \otimes v_{j_1} \otimes u_{k_2} \otimes v_{j_2} \rangle^2 \right|
\]
\[
\le N \sum_{k_1,j_1,k_2,j_2} 2\|T^*\|^2
\left( \frac{2(2)^{1/2}}{\beta_{k_1}} \|\hat U - U\|
+ \frac{2(2)^{1/2}}{\alpha_{j_1}} \|\hat V - V\|
+ \frac{2(2)^{1/2}}{\beta_{k_2}} \|\hat U - U\|
+ \frac{2(2)^{1/2}}{\alpha_{j_2}} \|\hat V - V\| \right)
\]
\[
= \|N^{1/2} T^*\|^2 \, 8(2)^{1/2} JK
\left( J \sum_{k_1=1}^{K} \frac{1}{\beta_{k_1}} \|\hat U - U\|
+ K \sum_{j_1=1}^{J} \frac{1}{\alpha_{j_1}} \|\hat V - V\| \right)
= O_P(N^{-1/2}),
\]
since \(\|N^{1/2} T^*\|^2 = O_P(1)\) while \(\|\hat U - U\|\) and \(\|\hat V - V\|\) are \(O_P(N^{-1/2})\), and the claim holds. \(\square\)

E. ADDITIONAL EMPIRICAL REJECTION RATES, MORE DISCUSSION OF THE IRISH WIND DATA, AND APPLICATION TO POLLUTION DATA

E·1. Additional simulations
We now study the effect of increasing the number of spatial locations and the number of spatial and temporal principal components. We use the first spatiotemporal covariance function introduced by Gneiting (2002), with I = 100 time points equally spaced on [0, 1] and K = 16, 25 points equally spaced in [0, 1] × [0, 1]. We display the results, based on 1000 replications, for two extreme values of the space-time interaction parameter, β = 0 and β = 1. Table 1 shows that increasing the number of spatial points and the number of principal components does not change our previous conclusions: only the tests TL−MC and TF are robust to the number of principal components used, with TF having more power.
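For reference, the role of the interaction parameter β can be illustrated in code. The following is a minimal Python sketch, assuming one common parameterization of the Gneiting (2002) covariance; the constants a, c, α, γ, and d below are illustrative choices and are not the values used in our simulations.

```python
import numpy as np

def gneiting_cov(h, u, beta, a=1.0, c=1.0, alpha=1.0, gamma=1.0, d=2):
    """One common parameterization of the Gneiting (2002) space-time
    covariance; h is spatial distance, u the time lag, beta in [0, 1]
    the space-time interaction (beta = 0 gives a separable model)."""
    psi = a * np.abs(u) ** (2 * alpha) + 1.0       # purely temporal term
    return np.exp(-c * h ** (2 * gamma) / psi ** (beta * gamma)) / psi ** (d / 2)

# beta = 0: the cross-product identity of a separable covariance holds
lhs0 = gneiting_cov(1.0, 0.0, 0.0) * gneiting_cov(2.0, 1.0, 0.0)
rhs0 = gneiting_cov(1.0, 1.0, 0.0) * gneiting_cov(2.0, 0.0, 0.0)

# beta = 1: the same identity fails, i.e. the model is nonseparable
lhs1 = gneiting_cov(1.0, 0.0, 1.0) * gneiting_cov(2.0, 1.0, 1.0)
rhs1 = gneiting_cov(1.0, 1.0, 1.0) * gneiting_cov(2.0, 0.0, 1.0)
```

The identity c(h1, u1)c(h2, u2) = c(h1, u2)c(h2, u1) characterizes separable covariances, which is exactly what the tests in the paper probe.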

We now explore the behavior of the method which involves dimension reduction in time only; details are presented in Section A. We continue to work with the covariance function of Gneiting (2002). Table 2 shows that only the tests TF and TL−MC are robust to the selection of J, but in some settings the empirical size of this method is not as well calibrated as that of the general method of Section 3. Test TL−MC tends to be too conservative, with a corresponding loss of power. If K = 9 and J ≥ 3, test TF overrejects. It appears that the cut-off point for spatial dimension reduction is about K = 10.

Finally, Figure 1 shows QQ-plots of the p-values of the three test statistics introduced in the main paper, TL, TF and TW. The QQ-plots alone do not provide clear evidence that the approximate null distributions of the test statistics are the ones claimed, so we apply Kolmogorov–Smirnov tests and obtain p-values 0.5361, 0.8593 and 0.0971 for TL, TF and TW, respectively. Based on these p-values we conclude that the approximate null distributions are as claimed. To construct the QQ-plots we use the first spatiotemporal covariance function introduced by Gneiting (2002), I = 100 time points equally spaced on [0, 1], K = 11 space points in [0, 1] × [0, 1], β = 0, N = 100 and L = J = 2. The results are based on 1000 replications.
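The uniformity check behind these numbers is a one-sample Kolmogorov–Smirnov test of the replication p-values against the uniform distribution. The following is a minimal scipy sketch; the simulated p-values below are only a stand-in for the 1000 replication p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# stand-in for the replication p-values; under H0 they should be
# uniform on [0, 1], which the Kolmogorov-Smirnov test checks
pvals = rng.uniform(size=1000)

ks = stats.kstest(pvals, 'uniform')
print(ks.pvalue)   # large values are consistent with uniformity
```

A small KS p-value would indicate that the null distribution of the corresponding test statistic is not the one claimed.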

E·2. Irish wind data
In addition to the p-values obtained by the procedures described in Section 4.2, we applied our tests to the deseasonalized square root transformed wind speed data and to the data deseasonalized in the manner of Haslett & Raftery (1989). Haslett & Raftery (1989) estimated the seasonal effect by calculating the average of the square roots of the daily means over all years and stations for each day of the year, and then regressing the results on a set of annual harmonics. Subtraction of the estimated seasonal effect from the square roots of the daily means yields the deseasonalized data, referred to as velocity measures. The p-values for both procedures follow the same pattern as in Table 5.

Table 1. Rejection rates (%), based on 1000 replications, for N = 100, L, J ∈ {2, 3, 4, 5, 6, 7}, β = 0 (H0) and β = 1 (HA). L is the number of spatial principal components used in the dimension reduction and J the number of temporal principal components. The proportion of variance explained by the temporal principal components is given in the last column. Standard errors are given in parentheses.

β   K    L   J   TL−MC         TL            TF            TW            Variance explained (%)
0   16   2   2   5.0 (0.69)    5.8 (0.74)    5.8 (0.74)    4.6 (0.66)    90
0   16   3   3   4.4 (0.65)    6.7 (0.79)    6.0 (0.75)    38.7 (1.54)   93
0   16   4   4   3.9 (0.61)    13.7 (1.08)   6.2 (0.76)    98.2 (0.42)   95
0   16   5   5   6.2 (0.76)    33.7 (1.49)   7.1 (0.81)    100 (0)       96
0   16   6   6   3.3 (0.56)    84.2 (1.15)   4.3 (0.64)    100 (0)       97
0   16   7   7   5.7 (0.73)    100 (0)       5.0 (0.69)    100 (0)       97
1   16   2   2   54.0 (1.58)   57.4 (1.56)   67.1 (1.49)   32.0 (1.48)   88
1   16   3   3   50.0 (1.58)   56.7 (1.57)   89.7 (0.96)   94.6 (0.71)   92
1   16   4   4   71.5 (1.43)   87.4 (1.05)   99.6 (0.20)   100 (0)       95
1   16   5   5   80.5 (1.25)   98.1 (0.43)   100 (0)       100 (0)       96
1   16   6   6   92.5 (0.83)   100 (0)       100 (0)       100 (0)       96
1   16   7   7   94.5 (0.72)   100 (0)       100 (0)       100 (0)       97
0   25   2   2   4.2 (0.63)    4.9 (0.68)    4.2 (0.63)    3.0 (0.54)    89
0   25   3   3   4.5 (0.66)    6.9 (0.80)    4.9 (0.68)    38.0 (1.53)   93
0   25   4   4   3.8 (0.60)    12.4 (1.04)   5.4 (0.71)    98.4 (0.40)   95
0   25   5   5   5.0 (0.69)    31.4 (1.47)   5.7 (0.73)    100 (0)       96
0   25   6   6   2.5 (0.49)    82.8 (1.19)   4.2 (0.63)    100 (0)       97
0   25   7   7   5.5 (0.72)    99.9 (0.10)   4.8 (0.68)    100 (0)       97
1   25   2   2   52.1 (1.58)   56.0 (1.57)   65.5 (1.50)   25.2 (1.37)   87
1   25   3   3   48.0 (1.58)   56.1 (1.57)   89.6 (0.97)   94.8 (0.70)   92
1   25   4   4   65.9 (1.50)   84.1 (1.16)   99.8 (0.14)   99.9 (0.10)   94
1   25   5   5   76.9 (1.33)   97.0 (0.54)   100 (0)       100 (0)       96
1   25   6   6   90.8 (0.91)   100 (0)       100 (0)       100 (0)       96
1   25   7   7   94.4 (0.73)   100 (0)       100 (0)       100 (0)       97

E·3. Air quality data
We conclude this section by considering air quality data from the east coast of the United States. The Environmental Protection Agency, EPA, collects massive amounts of air quality data, which are available through its website http://www3.epa.gov/airdata/ad_data_daily.html. The records consist of data for 6 common pollutants, collected by outdoor monitors in hundreds of locations across the United States. The number and frequency of the observations vary greatly by location, but some locations have as many as three decades' worth of daily measurements. We focus on nitrogen dioxide, a common pollutant emitted by combustion engines and power stations, which has been linked to adult and infant deaths and to the incidence of lung cancer. In addition, nitrogen dioxide contributes to ozone formation, with all its health impacts, as well as impacts on both terrestrial and aquatic ecosystems.

We consider nine locations that have relatively complete records since 2000: Allentown, Baltimore, Boston, Harrisburg, Lancaster, New York City, Philadelphia, Pittsburgh, and Washington D.C. We use the data for the years 2000-2012. Each functional observation Xn(sk, t) consists of the daily maximum one-hour nitrogen dioxide concentration, measured in ppb (parts per billion), for day t of month n, N = 156, at location sk. Figure 3 shows the data for the nine locations for December 2012.

Applying the tests to the deseasonalized curves obtained after removing the monthly mean from each curve, as we did for the Irish wind data, we obtained p-values smaller than 0.001 for all L, J ∈ {2, 3, 4, 5}. In addition, since the number of spatial locations is relatively small, only 9 stations, we applied our tests without dimension reduction in space. Again, we obtained p-values smaller than 0.001. We see that

Table 2. Rejection rates (%), based on 1000 replications, for the method of Section A, temporal dimension reduction only; N = 100, J ∈ {2, 3, 4}, β = 0 and β = 1. J is the number of temporal principal components. The proportion of variance explained by the temporal principal components is given in the last column. Standard errors are given in parentheses.

β   K   J   TL−MC         TL            TF            TW            Variance explained (%)
0   4   2   4.2 (0.63)    7.1 (0.81)    5.6 (0.73)    5.4 (0.71)    89
0   4   3   5.3 (0.71)    9.3 (0.92)    5.5 (0.72)    85.4 (1.12)   93
0   4   4   3.2 (0.56)    12.2 (1.03)   5.8 (0.74)    99.9 (0.10)   95
1   4   2   71.5 (1.43)   81.7 (1.22)   91.5 (0.88)   75.7 (1.36)   89
1   4   3   86.6 (1.08)   92.1 (0.85)   98.5 (0.38)   80.7 (1.25)   93
1   4   4   79.8 (1.27)   92.4 (0.84)   99.6 (0.20)   100 (0)       95
0   6   2   1.5 (0.38)    8.8 (0.90)    4.4 (0.65)    65.0 (1.51)   88
0   6   3   4.3 (0.64)    17.1 (1.19)   4.8 (0.68)    100 (0)       93
0   6   4   3.8 (0.60)    29.5 (1.44)   5.1 (0.70)    100 (0)       94
1   6   2   79.5 (1.28)   95.8 (0.63)   98.4 (0.40)   100 (0)       88
1   6   3   96.5 (0.58)   99.7 (0.17)   99.9 (0.10)   100 (0)       92
1   6   4   97.0 (0.54)   99.9 (0.10)   100 (0)       100 (0)       94
0   9   2   4.5 (0.66)    17.6 (1.20)   5.2 (0.70)    100 (0)       88
0   9   3   2.1 (0.45)    43.1 (1.57)   6.6 (0.79)    100 (0)       92
0   9   4   3.7 (0.60)    84.6 (1.14)   7.2 (0.82)    100 (0)       94
1   9   2   96.4 (0.59)   99.1 (0.30)   99.4 (0.24)   100 (0)       90
1   9   3   99.6 (0.20)   100 (0)       100 (0)       100 (0)       92
1   9   4   99.3 (0.26)   100 (0)       100 (0)       100 (0)       94

Figure 1. QQ-plots of the p-values of the test statistics TL, TF, TW, from left to right. In each panel, uniform quantiles are plotted on the horizontal axis and the ordered p-values on the vertical axis.

nonseparability is a feature of pollution data which is difficult to ignore. This means that simplifying the spatiotemporal dependence structure of many pollution data sets by assuming separability is questionable. Conclusions obtained under the assumption of separability may be incorrect, and may lead to incorrect public health policies.

BIBLIOGRAPHY

FERGUSON, T. S. (1996). A Course in Large Sample Theory. London: Chapman & Hall.

Figure 2. Irish wind speed curves for January 1961. Each curve corresponds to a different location (RPT, VAL, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL, MAL). The curves are offset since they overlap.

GNEITING, T. (2002). Nonseparable, stationary covariance functions for space–time data. Journal of the American Statistical Association 97, 590–600.

HASLETT, J. & RAFTERY, A. E. (1989). Space–time modelling with long-memory dependence: Assessing Ireland's wind power resource (with discussion). Applied Statistics 38, 1–50.

HORVÁTH, L. & KOKOSZKA, P. (2012). Inference for Functional Data with Applications. New York: Springer.

MOORE, J. T., SADLER, B. M. & KOZICK, R. J. (2008). Maximum-likelihood estimation, the Cramér–Rao bound, and the method of scoring with parameter constraints. IEEE Transactions on Signal Processing 56, 895–908.

RAMSAY, J. O., HOOKER, G. & GRAVES, S. (2009). Functional Data Analysis with R and MATLAB. New York: Springer.

[Received month year. Revised month year]

Figure 3. Maximum one-hour nitrogen dioxide curves for December 2012 at the nine locations (ALL, BAL, BOS, HAR, LAN, NYC, PHI, PIT, WAS D.C.). The curves are offset since they overlap.