Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel Data With Fixed Effects When Both n and T Are Large: A Nonstationary Case*

Jihai Yu, Robert de Jong, Lung-fei Lee†

Department of Economics

The Ohio State University

August 25, 2007

Abstract

Yu, de Jong and Lee (2006) established asymptotic properties of quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both the number of individuals n and the number of time periods T are large. This paper covers a nonstationary case where there are unit roots in the data generating process. When not all the roots in the DGP are unit, the estimators' rates of convergence will be the same as in the stationary case, and the estimators can be asymptotically normal. The presence of the nonstationary components, however, will make the estimators' asymptotic variance matrix singular. Consequently, a linear combination of the spatial and dynamic effects can converge at a higher rate. We also propose a bias correction for our estimator. When T grows faster than $n^{1/3}$, the correction will asymptotically eliminate the bias and yield a centered confidence interval.

JEL classification: C13; C23

Keywords: Spatial autoregression, Dynamic panels, Fixed effects, Quasi-maximum likelihood estimation, Bias correction, Unit root, Nonstationarity

* We would like to thank participants of the Econometrics Seminar at The Ohio State University (March 2007) and the Third Symposium on Econometric Theory and Applications at HKUST (April 2007) for helpful comments.

† Lee acknowledges financial support for his research from NSF under Grant No. SES-0519204.



1 Introduction

This paper investigates the properties of maximum likelihood (ML) estimators and quasi-maximum likelihood (QML) estimators for spatial dynamic panel data models with individual fixed effects when both the number of individuals $n$ and the number of time periods $T$ are large, for a nonstationary case.

In Yu, de Jong and Lee (2006), the consistency and asymptotic distribution of the QML estimators are established for the stationary case. Also, a bias correction procedure for the estimators is proposed. It is shown that as long as $T$ grows faster than $n^{1/3}$, the correction will asymptotically eliminate the bias and will yield a centered confidence interval. When there are unit roots in the process, the assumption of absolute summability in Yu, de Jong and Lee (2006) does not hold, so the analysis for the stationary case, which relies crucially on that condition, is no longer valid. In this paper, we will show that when the spatial weights matrix is row normalized from a symmetric matrix, we can still obtain the consistency and asymptotic normality of the ML and QML estimators with the same rate of convergence as in the stationary case. The difference is that the variance matrix differs from that of the stationary case, and it is singular in the limit. Also, for this nonstationary case, there is a linear combination of common parameters that will have a higher rate of convergence.

The nonstationary case we consider is relevant in empirical applications. In Yu (2006), a spatial dynamic panel data model is applied to study the growth convergence of the 48 contiguous states. In the estimation result, the spatial effects are significant and the sum of the estimators of the spatial and dynamic effects is nearly equal to 1. This implies that there may be nonstationary components in the DGP (see discussion in Section 2.1 for details), which motivates deriving asymptotic theory for the estimators under nonstationarity. Also, in Tao's (2006) study on the education spending of local school districts using a spatial dynamic panel model, the spatial effects are significant and the sum of the estimators of the spatial and dynamic effects is nearly equal to 1.

There is growing research interest in nonstationary panels in recent years. For independent panels, we have Maddala and Wu (1999), Levin, Lin and Chu (2002), Im, Pesaran and Shin (2003), etc. For cross-sectionally correlated panels, we have Pesaran (2003), Phillips and Sul (2003), Moon and Perron (2004), etc., where the cross sectional dependence is specified by common factors. This paper covers a case of nonstationary panel data where the cross sectional dependence is specified by spatial correlation among units directly. There are already extensive empirical applications for nonstationary panel data.¹ We expect that our model can shed light on existing nonstationary panel data models and empirical applications.

This paper is organized as follows. In Section 2, the model is introduced. We then explain our method of estimation, which is a concentrated QML estimation. Several lemmas on matrix algebra and a central limit theorem are stated. Section 3 derives the consistency and asymptotic distribution of the spatial effect parameter. Using the results of Section 3, we establish the asymptotic distribution of the common parameters in Section 4. Also, a bias correction procedure is proposed and simulation results are reported. Section 5 concludes the paper. Some useful lemmas and proofs are collected in the Appendix.

¹ The applications include purchasing power parity, growth and convergence, money demand, monetary exchange rate model, inflation-rate convergence, interest rate, health care expenditure, hysteresis in unemployment, etc. See Choi (2004) for more details.

2 The Model and The Likelihood Function

2.1 The Model

The model considered in this paper is

$$Y_{nt} = \lambda_0 W_n Y_{nt} + \gamma_0 Y_{n,t-1} + \rho_0 W_n Y_{n,t-1} + X_{nt}\beta_0 + c_{n0} + V_{nt}, \qquad t = 1, 2, \ldots, T, \tag{2.1}$$

where $Y_{nt} = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ and $V_{nt} = (v_{1t}, v_{2t}, \ldots, v_{nt})'$ are $n \times 1$ column vectors and $v_{it}$ is i.i.d. across $i$ and $t$ with zero mean and variance $\sigma_0^2$, $W_n$ is a known $n \times n$ spatial weights matrix which is nonstochastic and generates the spatial dependence between cross sectional units $y_{it}$, $X_{nt}$ is an $n \times k_x$ matrix of nonstochastic regressors, and $c_{n0}$ is an $n \times 1$ column vector of fixed individual effects. Therefore, the total number of parameters in this model is equal to the number of individuals $n$ plus the dimension of the common parameters $(\gamma, \rho, \beta', \lambda, \sigma^2)'$, which is $k_x + 4$. $W_n$ is usually row normalized from a symmetric matrix such that its $i$th row is $[c_{n,i1}, c_{n,i2}, \cdots, c_{n,in}] / \sum_{j=1}^{n} c_{n,ij}$, where $c_{n,ij}$ represents a function of the spatial distance of different units in some space. As a normalization, $c_{n,ii} = 0$. It is a common practice in empirical work that $W_n$ is row normalized, which ensures that all the weights are between 0 and 1 and weighting operations can be interpreted as an average of the neighboring values. Also, a weights matrix row normalized from a symmetric matrix has real eigenvalues, with all its eigenvalues less than or equal to one in absolute value and its largest eigenvalue always 1 (see Ord (1975)). Such a spatial weights matrix is also diagonalizable (see Proposition B.1 in Appendix B).
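The eigenvalue properties of a row-normalized weights matrix can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical six-unit contiguity matrix `C` (a ring with one extra link) that is not taken from the paper. It relies on the fact that $W_n = D^{-1}C$ is similar to the symmetric matrix $D^{-1/2} C D^{-1/2}$, where $D$ is the diagonal matrix of row sums of `C`.

```python
import numpy as np

# Hypothetical symmetric contiguity matrix C for n = 6 units (zero diagonal):
# a ring 0-1-2-3-4-5-0 plus one extra link between units 0 and 3.
n = 6
C = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    C[i, j] = C[j, i] = 1.0
C[0, 3] = C[3, 0] = 1.0

row_sums = C.sum(axis=1)
W = C / row_sums[:, None]            # row-normalized weights matrix W_n

# W = D^{-1} C is similar to the symmetric matrix D^{-1/2} C D^{-1/2},
# so its eigenvalues are real even though W itself is not symmetric.
d_inv_sqrt = 1.0 / np.sqrt(row_sums)
W_sym = d_inv_sqrt[:, None] * C * d_inv_sqrt[None, :]
eig_W = np.linalg.eigvalsh(W_sym)    # real eigenvalues in ascending order

print("W symmetric?     ", np.allclose(W, W.T))
print("row sums of W    ", W.sum(axis=1))
print("eigenvalues of W ", np.round(eig_W, 4))
```

In this example the rows of `W` sum to one, the largest eigenvalue is 1, and no eigenvalue is below -1, in line with the properties cited from Ord (1975).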

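Define $S_n(\lambda) = I_n - \lambda W_n$ and denote $S_n \equiv S_n(\lambda_0) = I_n - \lambda_0 W_n$. Then, presuming $S_n$ is invertible and denoting $A_n = S_n^{-1}(\gamma_0 I_n + \rho_0 W_n)$, (2.1) can be rewritten as

$$Y_{nt} = A_n Y_{n,t-1} + S_n^{-1} X_{nt}\beta_0 + S_n^{-1} c_{n0} + S_n^{-1} V_{nt}. \tag{2.2}$$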

A nonstationary case occurs if some eigenvalues $d_{ni}$ of $A_n$ are equal to 1, i.e., $d_{ni} = \frac{\gamma_0 + \rho_0 \varpi_{ni}}{1 - \lambda_0 \varpi_{ni}} = 1$ for some $i$, where $\varpi_{ni}$ is an eigenvalue of $W_n$. For the nonstationary case, we can decompose $Y_{nt}$ into a stationary part and a nonstationary part. To do that, we can first diagonalize² $A_n$ as $A_n = R_n D_n R_n^{-1}$, where $R_n$ is the matrix of eigenvectors of $A_n$ and $D_n = \operatorname{diag}(d_{n1}, d_{n2}, \cdots, d_{nn})$, where the $d_{ni}$'s are the eigenvalues of $A_n$. When $d_{n,\max} = 1$ and $d_{n,\min} > -1$, where $d_{n,\max}$ and $d_{n,\min}$ are respectively the largest and smallest eigenvalues of $A_n$, without loss of generality, suppose that $d_{ni} = 1$ for $i = 1, 2, \cdots, m_n$ and $|d_{ni}| < 1$ for $m_n + 1 \le i \le n$, where $m_n$ is the number of unit roots. Let $B_n = R_n \tilde D_n R_n^{-1}$ with $\tilde D_n = \operatorname{diag}(0, \cdots, 0, d_{n,m_n+1}, \cdots, d_{nn})$, so that $D_n = J_n + \tilde D_n$, where $J_n = \operatorname{diag}\{1_{m_n}', 0, \cdots, 0\}$ with $1_{m_n}$ being an $m_n \times 1$ vector of ones. As $J_n$ is idempotent and $J_n \cdot \tilde D_n = 0$, $A_n^h = M_n + B_n^h$ for any $h = 1, 2, \cdots$, where $M_n = R_n J_n R_n^{-1}$. Then (see Proposition B.5 in Appendix B), for $t \ge 0$, we can decompose $Y_{nt}$ into the sum of a stationary part and a nonstationary part:

$$Y_{nt} = Y_{nt}^u + Y_{nt}^s, \tag{2.3}$$

² See Proposition B.2 in Appendix B for diagonalizability of $A_n$.

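where

$$Y_{nt}^u = M_n \left( Y_{n,-1} + c_{n0}\frac{t}{1-\lambda_0} + \frac{\sum_{h=0}^{t-1} X_{nh}\beta_0}{1-\lambda_0} + \frac{\sum_{h=0}^{t-1} V_{nh}}{1-\lambda_0} \right), \tag{2.4}$$

$$Y_{nt}^s = \sum_{h=0}^{\infty} B_n^h S_n^{-1} c_{n0} + \sum_{h=0}^{\infty} B_n^h S_n^{-1} X_{n,t-h}\beta_0 + \sum_{h=0}^{\infty} B_n^h S_n^{-1} V_{n,t-h}. \tag{2.5}$$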

Compared to the stationary case, the model has a time trend attachment $M_n c_{n0}\frac{t}{1-\lambda_0} + M_n \frac{\sum_{h=0}^{t-1} X_{nh}\beta_0}{1-\lambda_0}$, a random walk attachment $M_n \frac{\sum_{h=0}^{t-1} V_{nh}}{1-\lambda_0}$ and a nonstationary initial value component $M_n Y_{n,-1}$.
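The decomposition $A_n^h = M_n + B_n^h$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the six-unit weights matrix and the parameter values $(\lambda_0, \gamma_0, \rho_0) = (0.2, 0.5, 0.3)$ are hypothetical choices satisfying $\lambda_0 + \gamma_0 + \rho_0 = 1$, and the eigenvectors of $W_n$ are obtained through its symmetric similar matrix rather than by any routine described in the paper.

```python
import numpy as np

# Hypothetical 6-unit example: ring contiguity plus one extra link (0-3).
n = 6
C = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    C[i, j] = C[j, i] = 1.0
C[0, 3] = C[3, 0] = 1.0
deg = C.sum(axis=1)
W = C / deg[:, None]                       # row-normalized W_n

lam0, gam0, rho0 = 0.2, 0.5, 0.3           # illustrative values, lam0 + gam0 + rho0 = 1
I = np.eye(n)
S = I - lam0 * W
A = np.linalg.solve(S, gam0 * I + rho0 * W)   # A_n = S_n^{-1}(gam0 I + rho0 W_n)

# Eigen-decomposition of W_n via the symmetric matrix D^{-1/2} C D^{-1/2}.
varpi, Q = np.linalg.eigh(C / np.sqrt(np.outer(deg, deg)))
R = Q / np.sqrt(deg)[:, None]              # W_n = R diag(varpi) R^{-1}
R_inv = Q.T * np.sqrt(deg)[None, :]

d = (gam0 + rho0 * varpi) / (1.0 - lam0 * varpi)   # eigenvalues of A_n
unit = np.isclose(d, 1.0)
m_n = int(unit.sum())                      # number of unit roots

M = (R * unit) @ R_inv                     # M_n = R J_n R^{-1}
B = (R * np.where(unit, 0.0, d)) @ R_inv   # B_n = R D~_n R^{-1}

for h in (1, 2, 5, 20):
    gap = np.abs(np.linalg.matrix_power(A, h)
                 - (M + np.linalg.matrix_power(B, h))).max()
    print(f"h = {h:2d}: max |A^h - (M + B^h)| = {gap:.2e}")
print("number of unit roots m_n =", m_n)
```

Because the example graph is connected, the unit eigenvalue of $W_n$ is simple, so $m_n = 1$ here. The same construction also shows why the factor $1/(1-\lambda_0)$ appears in (2.4): $M_n$ projects onto the eigenspace where $\varpi_{ni} = 1$, on which $S_n^{-1}$ acts as multiplication by $1/(1-\lambda_0)$.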

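Using (2.3), (2.4) and (2.5), we have³

$$\tilde Y_{nt} = \tilde Y_{nt}^u + \tilde Y_{nt}^s, \qquad t = 0, 1, \cdots, T, \tag{2.6}$$

where

$$\tilde Y_{nt}^u = \frac{1}{1-\lambda_0} M_n \left( c_{n0}\tilde t + \tilde{\mathcal X}_{nt}\beta_0 + \tilde \xi_{nt} \right), \qquad \tilde Y_{nt}^s = \tilde{\mathcal X}_{nt}^s \beta_0 + \tilde U_{nt}^s, \tag{2.7}$$

with $\tilde t = t - \frac{T+1}{2}$, $\mathcal X_{nt} = \sum_{h=0}^{t-1} X_{nh}$, $\xi_{nt} = \sum_{h=0}^{t-1} V_{nh}$, $\mathcal X_{nt}^s = \sum_{h=0}^{\infty} B_n^h S_n^{-1} X_{n,t-h}$ and $U_{nt}^s = \sum_{h=0}^{\infty} B_n^h S_n^{-1} V_{n,t-h}$.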

To analyze the model, the following assumptions are needed.

Assumption 1. $W_n$ is a nonstochastic spatial weights matrix, row normalized from a symmetric weights matrix.

Assumption 2. The disturbances $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are i.i.d. across $i$ and $t$ with zero mean, variance $\sigma_0^2$ and $E|v_{it}|^{4+\eta} < \infty$ for some $\eta > 0$.

Assumption 3. $n$ is a nondecreasing function of $T$.

Assumption 4. The elements of $X_{nt}$ and $c_{n0}$ are nonstochastic and bounded, uniformly in $n$ and $t$, and $\lim_{T\to\infty} \frac{1}{nT}\sum_{t=1}^{T} \tilde X_{nt}' \tilde X_{nt}$ exists and is nonsingular. Also, $\lim_{T\to\infty} \frac{1}{nT^3}\sum_{t=1}^{T} (c_{n0}\tilde t + \tilde{\mathcal X}_{nt}\beta_0)' M_n' M_n (c_{n0}\tilde t + \tilde{\mathcal X}_{nt}\beta_0) \neq 0$.

Assumption 5. $S_n(\lambda)$ is invertible for all $\lambda \in \Lambda$. Furthermore, $\Lambda$ is compact⁴ and the true parameter $\lambda_0$ with $|\lambda_0| < 1$ is in the interior of $\Lambda$.

Assumption 6. $\lambda_0 + \gamma_0 + \rho_0 = 1$ with $\gamma_0 \neq 1$. Also, $d_{n,\max} = 1$ and $d_{n,\min} > -1$, where $d_{n,\max}$ and $d_{n,\min}$ are the largest and smallest eigenvalues of $A_n$.

Assumption 7. The row and column sums of $W_n$ and $S_n^{-1}(\lambda)$ are bounded uniformly⁵ in $n$, also uniformly in $\lambda \in \Lambda$ for $S_n^{-1}(\lambda)$.

Assumption 8. The row and column sums of $\sum_{h=1}^{\infty} \operatorname{abs}(B_n^h)$ are bounded uniformly in $n$, where $[\operatorname{abs}(B_n)]_{ij} = |B_{n,ij}|$.

³ For notational purposes, for any $n \times 1$ vector $\nu_{nt}$ at period $t$, we define $\tilde \nu_{nt} = \nu_{nt} - \bar \nu_{nT}$ and $\tilde{\tilde \nu}_{n,t-1} = \nu_{n,t-1} - \bar \nu_{nT,-1}$ for $t = 1, 2, \cdots, T$, where $\bar \nu_{nT} = \frac{1}{T}\sum_{t=1}^{T} \nu_{nt}$ and $\bar \nu_{nT,-1} = \frac{1}{T}\sum_{t=1}^{T} \nu_{n,t-1}$.

⁴ Note that in the literature, $\Lambda$ is typically assumed to be a compact subset of $(-1, 1)$.

⁵ We say the row and column sums of a (sequence of $n \times n$) matrix $P_n$ are uniformly bounded in $n$ if $\sup_{1 \le i \le n,\, n \ge 1} \sum_{j=1}^{n} |p_{ij,n}| < \infty$ and $\sup_{1 \le j \le n,\, n \ge 1} \sum_{i=1}^{n} |p_{ij,n}| < \infty$.

Assumptions 1 and 2 provide essential features of the weights matrix and disturbances of the model. Assumption 3 allows two cases: (i) $n \to \infty$ as $T \to \infty$; (ii) $n$ is fixed as $T \to \infty$. For case (i), we say that $n, T \to \infty$ simultaneously. When exogenous variables $X_{nt}$ are included in the model, it is convenient to assume that the exogenous regressors are uniformly bounded, as is done in Assumption 4. Also, we make the assumption that either $c_{n0}$ or $X_{nt}$ is relevant in the model. A simple consequence of Assumption 5 is that, for the system (2.1), $Y_{nt}$ can be solved in terms of $c_{n0}$, $X_{nt}$ and $V_{nt}$. Assumption 6 specifies that some roots of $A_n$ are equal to 1, while the other roots are less than 1 in absolute value. The first part of this assumption rules out explicitly the pure unit root time series case without spatial interaction; more generally, it rules out the case where $\gamma_0 = 1$ and $\lambda_0 + \rho_0 = 0$. A sufficient condition for Assumption 6 is $\lambda_0 < 1$ with $|\gamma_0| < 1$ and $|\rho_0| < 1$ under $\lambda_0 + \gamma_0 + \rho_0 = 1$ (see Proposition B.3 in Appendix B). Assumption 7 originates from Kelejian and Prucha (1998, 2001). The uniform boundedness of $W_n$ and $S_n^{-1}(\lambda)$ is a condition to limit the spatial correlation to a manageable degree. Assumption 8 is the absolute summability condition and the row and column sum boundedness condition, which will play an important role in deriving asymptotic properties of QML estimators. This assumption is essential for the model because it limits the dependence between time series and between cross sectional units for the stationary component $Y_{nt}^s$ in the process. In order to justify the absolute summability of $B_n$ in (2.5) and Assumption 8, a sufficient condition is $\|B_n\| < 1$ for any matrix norm (see Horn and Johnson (1985), Corollary 5.6.16) that satisfies $\|B_n\| = \|\operatorname{abs}(B_n)\|$. When $\|B_n\| < 1$, $\sum_{h=0}^{\infty} B_n^h$ exists and can be defined as $(I_n - B_n)^{-1}$ (see Appendix B.1 for an example where $A_n$ has some eigenvalues equal to one but others strictly less than one in absolute value).
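The sufficient condition for Assumption 6 discussed above can be explored numerically. The sketch below is an illustration, not part of the paper's argument: it draws hypothetical parameter values with $|\gamma|, |\rho|, |\lambda| < 0.95$ and $\lambda + \gamma + \rho = 1$, evaluates the eigenvalue map $d = (\gamma + \rho\varpi)/(1 - \lambda\varpi)$ on a grid of possible eigenvalues $\varpi \in [-1, 1]$ of $W_n$, and records the largest $|d|$ away from $\varpi = 1$. The bound 0.95 is an arbitrary choice made to stay clear of the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)
varpi = np.linspace(-1.0, 1.0, 2001)       # grid over possible eigenvalues of W_n

def eigenvalues_of_A(lam, gam, rho, varpi):
    """Eigenvalue map of A_n: d = (gam + rho*varpi) / (1 - lam*varpi)."""
    return (gam + rho * varpi) / (1.0 - lam * varpi)

worst_interior = 0.0
n_draws = 0
while n_draws < 500:
    gam, rho = rng.uniform(-0.95, 0.95, size=2)
    lam = 1.0 - gam - rho                  # impose lam + gam + rho = 1
    if not abs(lam) < 0.95:
        continue                           # keep only draws with |lam| < 0.95
    d = eigenvalues_of_A(lam, gam, rho, varpi)
    assert np.isclose(d[-1], 1.0)          # varpi = 1 always gives a unit root
    worst_interior = max(worst_interior, np.max(np.abs(d[:-1])))
    n_draws += 1

print("largest |d| over varpi in [-1, 1):", worst_interior)
```

On this grid every eigenvalue of $A_n$ other than the one associated with $\varpi = 1$ stays strictly inside the unit circle, which is the pattern Assumption 6 requires.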

2.2 Concentrated Likelihood Function

Denote $Z_{nt} = (Y_{n,t-1}, W_n Y_{n,t-1}, X_{nt})$ and $\theta = (\delta', \lambda, \sigma^2)'$ where $\delta = (\gamma, \rho, \beta')'$. The log likelihood function of (2.1) is

$$\ln L_{n,T}(\theta, c_n) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln \sigma^2 + T \ln |S_n(\lambda)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T} V_{nt}'(\zeta) V_{nt}(\zeta), \tag{2.8}$$

where $V_{nt}(\zeta) = S_n(\lambda) Y_{nt} - Z_{nt}\delta - c_n$ and $\zeta = (\delta', \lambda, c_n')'$. The QML estimators $\hat\theta_{nT}$ and $\hat c_{nT}$ are the extremum estimators derived from the maximization of (2.8). When the $V_{nt}$'s are normally distributed, $\hat\theta_{nT}$ and $\hat c_{nT}$ are the ML estimators; when the $V_{nt}$'s are not normally distributed, $\hat\theta_{nT}$ and $\hat c_{nT}$ are QML estimators. As the number of parameters goes to infinity when $n$ goes to infinity, it is convenient to use the concentrating approach. We will concentrate $c_n$ and $\delta$ out and focus the asymptotic analysis on the estimator of $\lambda_0$ via the concentrated likelihood function.⁶ For the concentrated likelihood function, the dimension of the parameter space does not change as $n$ and/or $T$ increase.

⁶ The reason to concentrate out $\delta$ is to avoid technical complications in the consistency proof and in deriving the asymptotic distribution jointly for the common parameters. See footnote 15 for details.

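From (2.8), using the first order conditions, we can get the concentrated estimators given $\lambda$:

$$\hat\delta_{nT}(\lambda) = \left[\frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' \tilde Z_{nt}\right]^{-1}\left[\frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' S_n(\lambda) \tilde Y_{nt}\right], \qquad \hat c_{nT}(\lambda) = \frac{1}{T}\sum_{t=1}^{T}\left(S_n(\lambda) Y_{nt} - Z_{nt}\hat\delta_{nT}(\lambda)\right),$$

$$\hat\sigma_{nT}^2(\lambda) = \frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)\tilde Y_{nt} - \tilde Z_{nt}\hat\delta_{nT}(\lambda)\right)'\left(S_n(\lambda)\tilde Y_{nt} - \tilde Z_{nt}\hat\delta_{nT}(\lambda)\right), \tag{2.9}$$

and the concentrated likelihood is

$$\ln L_{n,T}(\lambda) = -\frac{nT}{2}(\ln 2\pi + 1) - \frac{nT}{2}\ln \hat\sigma_{nT}^2(\lambda) + T \ln |S_n(\lambda)|. \tag{2.10}$$

The QML estimator $\hat\lambda_{nT}$ maximizes the concentrated likelihood function (2.10), and the QML estimators of $\delta_0$, $\sigma_0^2$ and $c_{n0}$ are $\hat\delta_{nT}(\hat\lambda_{nT})$, $\hat\sigma_{nT}^2(\hat\lambda_{nT})$ and $\hat c_{nT}(\hat\lambda_{nT})$.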
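The concentration steps in (2.9) and (2.10) translate directly into a few lines of linear algebra. The following sketch is one possible implementation under stated assumptions, not the authors' code: the function name `concentrated_loglik`, the array layout (`Y` of shape `(T+1, n)`, `X` of shape `(T, n, k)`), the simulated design, and the grid search over $\lambda$ are all choices made here for illustration.

```python
import numpy as np

def concentrated_loglik(lam, Y, X, W):
    """Concentrated log likelihood (2.10) with the estimators of (2.9).

    Y : (T+1, n) array, Y[0] is the initial observation Y_{n,0}
    X : (T, n, k) array of regressors for t = 1..T
    W : (n, n) spatial weights matrix
    """
    T, n, k = X.shape
    S = np.eye(n) - lam * W
    Y_lag = Y[:-1]
    # Z_nt = (Y_{n,t-1}, W_n Y_{n,t-1}, X_nt), stacked over t
    Z = np.concatenate([Y_lag[:, :, None], (Y_lag @ W.T)[:, :, None], X], axis=2)
    SY = Y[1:] @ S.T                                   # rows are (S_n(lam) Y_nt)'
    Z_dm = Z - Z.mean(axis=0)                          # deviations from time averages
    SY_dm = SY - SY.mean(axis=0)
    delta, *_ = np.linalg.lstsq(Z_dm.reshape(T * n, k + 2),
                                SY_dm.reshape(T * n), rcond=None)
    resid = SY_dm - Z_dm @ delta
    sigma2 = np.sum(resid ** 2) / (n * T)
    c = (SY - Z @ delta).mean(axis=0)
    _, logdet = np.linalg.slogdet(S)
    loglik = (-0.5 * n * T * (np.log(2 * np.pi) + 1)
              - 0.5 * n * T * np.log(sigma2) + T * logdet)
    return loglik, delta, sigma2, c

# Simulated data from (2.1) with hypothetical parameter values (sum equal to 1).
rng = np.random.default_rng(0)
n, T, k = 8, 40, 1
C = np.zeros((n, n))
for i in range(n):
    j = (i + 1) % n
    C[i, j] = C[j, i] = 1.0
C[0, 4] = C[4, 0] = 1.0
W = C / C.sum(axis=1, keepdims=True)
lam0, gam0, rho0, beta0 = 0.2, 0.5, 0.3, 1.0
S0_inv = np.linalg.inv(np.eye(n) - lam0 * W)
c0 = rng.normal(size=n)
X = rng.normal(size=(T, n, k))
Y = np.zeros((T + 1, n))
for t in range(1, T + 1):
    rhs = (gam0 * Y[t - 1] + rho0 * W @ Y[t - 1]
           + X[t - 1] @ np.array([beta0]) + c0 + rng.normal(size=n))
    Y[t] = S0_inv @ rhs

# Maximize (2.10) over a grid of lambda values (a simple stand-in for an optimizer).
grid = np.linspace(-0.9, 0.9, 181)
values = [concentrated_loglik(l, Y, X, W)[0] for l in grid]
lam_hat = grid[int(np.argmax(values))]
loglik_hat, delta_hat, sigma2_hat, c_hat = concentrated_loglik(lam_hat, Y, X, W)
print("lambda_hat:", lam_hat, " delta_hat:", np.round(delta_hat, 3))
```

A grid search is used only because it is transparent; any one-dimensional optimizer over a compact subset of $(-1, 1)$ would serve the same purpose. The value returned for a given $\lambda$ coincides with (2.8) evaluated at the concentrated estimators, since the residual sum of squares equals $nT\hat\sigma_{nT}^2(\lambda)$.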

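Also, the reduced form of (2.1) can be represented as

$$Y_{nt} = S_n^{-1}(Z_{nt}\delta_0 + c_{n0} + V_{nt}) = Z_{nt}\delta_0 + \lambda_0 G_n Z_{nt}\delta_0 + S_n^{-1}(c_{n0} + V_{nt}), \qquad t = 0, 1, \ldots, T, \tag{2.11}$$

because $I_n + \lambda_0 G_n = S_n^{-1}$ where $G_n = W_n S_n^{-1}$. Denote

$$H_{nT} = \frac{1}{nT}\sum_{t=1}^{T} (\tilde Z_{nt}, G_n \tilde Z_{nt}\delta_0)'(\tilde Z_{nt}, G_n \tilde Z_{nt}\delta_0) = \begin{pmatrix} H_{1,nT} & H_{2,nT} \\ H_{2,nT}' & H_{3,nT} \end{pmatrix},$$

where $H_{1,nT} = \frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' \tilde Z_{nt}$, $H_{2,nT} = \frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' G_n \tilde Z_{nt}\delta_0$ and $H_{3,nT} = \frac{1}{nT}\sum_{t=1}^{T} \delta_0' \tilde Z_{nt}' G_n' G_n \tilde Z_{nt}\delta_0$.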
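The identity $I_n + \lambda_0 G_n = S_n^{-1}$ behind the second line of (2.11) can be verified in one step. The short derivation below uses only the definitions $S_n = I_n - \lambda_0 W_n$ and $G_n = W_n S_n^{-1}$ given in the text:

```latex
\begin{aligned}
I_n + \lambda_0 G_n
  &= S_n S_n^{-1} + \lambda_0 W_n S_n^{-1} \\
  &= (I_n - \lambda_0 W_n + \lambda_0 W_n)\, S_n^{-1} \\
  &= S_n^{-1}.
\end{aligned}
```

Applying both sides to $Z_{nt}\delta_0$ gives $S_n^{-1} Z_{nt}\delta_0 = Z_{nt}\delta_0 + \lambda_0 G_n Z_{nt}\delta_0$, which is the form used in the reduced form.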

Hence, $H_{nT}$ is the covariance matrix of the explanatory variables of the reduced form (2.11) after taking deviations from the time average, which is crucial for our asymptotic analysis of QML estimators because $\hat\sigma_{nT}^2(\lambda)$ in (2.10) involves the $H_{i,nT}$ terms for $i = 1, 2, 3$. To study $H_{nT}$, it is desirable to decompose $\tilde Z_{nt}$ into a stationary part and a nonstationary part such that $\tilde Z_{nt} = \tilde Z_{nt}^u + \tilde Z_{nt}^s$ where

$$\tilde Z_{nt}^u = (\tilde{\tilde Y}_{n,t-1}^u, W_n \tilde{\tilde Y}_{n,t-1}^u, 0_{n \times k_x}), \qquad \tilde Z_{nt}^s = (\tilde{\tilde Y}_{n,t-1}^s, W_n \tilde{\tilde Y}_{n,t-1}^s, \tilde X_{nt}). \tag{2.12}$$

As (see Proposition B.4 in Appendix B) $\tilde{\tilde Y}_{n,t-1}^u = W_n \tilde{\tilde Y}_{n,t-1}^u = G_n \tilde Z_{nt}^u \delta_0$, we have $\tilde Z_{nt}^u = \tilde{\tilde Y}_{n,t-1}^u \cdot c'$ and $(\tilde Z_{nt}^u, G_n \tilde Z_{nt}^u \delta_0) = \tilde{\tilde Y}_{n,t-1}^u \cdot c^{*\prime}$ where $c = (1, 1, 0_{1 \times k_x})'$ and $c^* = (c', 1)'$. Hence, denoting $H_{nT}^s = \frac{1}{nT}\sum_{t=1}^{T} (\tilde Z_{nt}^s, G_n \tilde Z_{nt}^s \delta_0)'(\tilde Z_{nt}^s, G_n \tilde Z_{nt}^s \delta_0)$, we can express $H_{nT}$ in terms of vectors such that

$$H_{nT} \equiv \omega_{nT}\left(T^2 \cdot c^* c^{*\prime} + T \cdot d_{nT} \cdot c^{*\prime} + T \cdot c^* \cdot d_{nT}' + H_{nT}^s / \omega_{nT}\right), \tag{2.13}$$

where $\omega_{nT} = \frac{1}{nT^3}\sum_{t=1}^{T} \tilde{\tilde Y}_{n,t-1}^{u\prime} \tilde{\tilde Y}_{n,t-1}^u$ and $d_{nT} = \frac{1}{\omega_{nT}}\left(\frac{1}{nT^2}\sum_{t=1}^{T} (\tilde Z_{nt}^s, G_n \tilde Z_{nt}^s \delta_0)' \tilde{\tilde Y}_{n,t-1}^u\right)$. Similarly, we can express $H_{i,nT}$ in terms of vectors such that

$$H_{1,nT} = \omega_{nT}\left(T^2 \cdot cc' + T \cdot d_{1,nT} \cdot c' + T \cdot c \cdot d_{1,nT}' + H_{1,nT}^s / \omega_{nT}\right), \tag{2.14a}$$

$$H_{2,nT} = \omega_{nT}\left(T^2 \cdot c + T \cdot d_{1,nT} + T \cdot d_{2,nT} \cdot c + H_{2,nT}^s / \omega_{nT}\right), \tag{2.14b}$$

$$H_{3,nT} = \omega_{nT}\left(T^2 + 2T \cdot d_{2,nT} + H_{3,nT}^s / \omega_{nT}\right), \tag{2.14c}$$

where $d_{1,nT} = \frac{1}{\omega_{nT}} \frac{1}{nT^2}\sum_{t=1}^{T} \tilde Z_{nt}^{s\prime} \tilde{\tilde Y}_{n,t-1}^u$ and $d_{2,nT} = \frac{1}{\omega_{nT}} \frac{1}{nT^2}\sum_{t=1}^{T} (G_n \tilde Z_{nt}^s \delta_0)' \tilde{\tilde Y}_{n,t-1}^u$. We notice that the elements of $H_{nT}$ are of the order $O(T^2)$ and $T^{-2} H_{nT}$ is singular in the limit. However, because of the pattern of the nonstationary component, $H_{nT}^{-1}$ exists and $H_{nT}^{-1} \cdot c^*$ has a lower order of $O(T^{-1})$ from Proposition 2.1 below.

2.3 Two Technical Propositions

To study the asymptotic behavior of $H_{nT}^{-1}$, we need the following proposition about matrix algebra.

Proposition 2.1 Let $K_T = T^2 c_T c_T' + T(c_T d_T' + d_T c_T') + A_T$, where $c_T$, $d_T$ are $m$-dimensional column random vectors, $\operatorname{plim}_{T\to\infty} c_T \neq 0$ and is nonstochastic, $A_T$ is positive definite for large enough $T$ with probability one, and $\operatorname{plim}_{T\to\infty} A_T$ exists and is an $m \times m$ positive definite matrix. Denote $\kappa_T = 1 - \left(d_T' A_T^{-1} d_T - \frac{(d_T' A_T^{-1} c_T)^2}{c_T' A_T^{-1} c_T}\right)$. Under the assumption that $\operatorname{plim}_{T\to\infty} \kappa_T \neq 0$, the sequence $\{K_T\}$ has the following properties:

(a) the elements of $K_T^{-1}$ are $O_p(1)$;

(b) the elements of $K_T^{-1} c_T$ are $O_p(T^{-1})$;

(c) $T^2 c_T' K_T^{-1} c_T = 1 + O_p(T^{-1})$.

Proof. See the proof for Proposition B.13 in Appendix B.3.
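Properties (a) and (b) of Proposition 2.1 can be illustrated with a small deterministic example. The sketch below is only a numerical illustration under assumed inputs: the vectors `c`, `d` and the positive definite matrix `A` are arbitrary fixed values chosen here (so the probability limits are trivial), and the code does not reproduce the proof in Appendix B.3.

```python
import numpy as np

# Arbitrary fixed inputs (assumptions for illustration only).
c = np.array([1.0, 1.0, 0.0])
d = np.array([0.2, -0.1, 0.3])
A = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.5, 0.2],
              [0.0, 0.2, 1.0]])             # symmetric positive definite

A_inv = np.linalg.inv(A)
kappa = 1.0 - (d @ A_inv @ d - (d @ A_inv @ c) ** 2 / (c @ A_inv @ c))
print("kappa =", round(float(kappa), 4))   # must be nonzero for the proposition

results = {}
for T in (10, 100, 1_000, 10_000):
    K = T**2 * np.outer(c, c) + T * (np.outer(c, d) + np.outer(d, c)) + A
    max_K_inv = np.abs(np.linalg.inv(K)).max()          # property (a): stays bounded
    scaled = T * np.abs(np.linalg.solve(K, c)).max()    # property (b): T * K^{-1} c stays bounded
    results[T] = (max_K_inv, scaled)
    print(f"T = {T:6d}: max|K^-1| = {max_K_inv:.4f}, T * max|K^-1 c| = {scaled:.4f}")
```

Although the entries of $K_T$ grow like $T^2$, the entries of $K_T^{-1}$ settle at a finite level and those of $K_T^{-1} c_T$ shrink at rate $T^{-1}$, which is the mechanism that lets $H_{nT}^{-1} c^*$ be of order $O(T^{-1})$.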

In our application, we can apply $K_T$ to $H_{nT}$ in (2.13) and $H_{1,nT}$ in (2.14). To apply Proposition 2.1, we need an additional assumption.

Assumption 9. $H_{nT}^s$ is nonsingular for large enough $T$ with probability one, and $\operatorname{plim}_{T\to\infty} H_{nT}^s$ exists and is nonsingular.

As $H_{nT}^s = \frac{1}{nT}\sum_{t=1}^{T} (\tilde Z_{nt}^s, G_n \tilde Z_{nt}^s \delta_0)'(\tilde Z_{nt}^s, G_n \tilde Z_{nt}^s \delta_0)$ is always positive semidefinite, with Assumption 9, $H_{nT}^s$ is positive definite for large enough $T$ and $\operatorname{plim}_{T\to\infty} H_{nT}^s$ will also be positive definite.

In this paper, we need a central limit theorem for linear and quadratic forms of $V_{nt}$. Denote $Q_{nT} = Q_{nT}^s + Q_{nT}^u$ where

$$Q_{nT}^s = \sum_{t=1}^{T}\left(U_{n,t-1}' V_{nt} + D_{nt}' V_{nt} + V_{nt}' B_n V_{nt} - \sigma_0^2 \operatorname{tr} B_n\right), \tag{2.15a}$$

$$Q_{nT}^u = \frac{k_T}{T}\sum_{t=1}^{T}\left(M_n\left(c_{n0}(\tilde t - 1) + \tilde{\mathcal X}_{n,t-1}\beta_0 + \xi_{n,t-1}\right)\right)' \cdot V_{nt}. \tag{2.15b}$$

Here, $U_{nt} = \sum_{h=1}^{\infty} P_{nt,h} V_{n,t+1-h}$ where $\{P_{nt,h}\}_{h=1}^{\infty}$ is a sequence of $n \times n$ nonstochastic square matrices, $D_{nt}$ is an $n \times 1$ vector, which is nonstochastic and bounded, uniformly in $n$ and $t$, $B_n$ is a nonstochastic $n \times n$ symmetric matrix⁷ whose row and column sums are bounded uniformly in $n$, and $k_T$ is $O(1)$. Denoting the mean and variance of $Q_{nT}$ as $\mu_{Q_{nT}}$ and $\sigma_{Q_{nT}}^2$ respectively, with $\mu_{Q_{nT}} = 0$, we have the following proposition.

Assumption A1. The disturbances $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are i.i.d. across $i$ and $t$ with zero mean, variance $\sigma_0^2$ and $E|v_{it}|^{4+\eta} < \infty$ for some $\eta > 0$.

Assumption A2. The row and column sums of $\sum_{h=1}^{\infty} \operatorname{abs}(P_{nt,h})$ are bounded uniformly in $n$ and $t$.

Assumption A3. The elements of the $n \times 1$ vector $D_{nt}$ are nonstochastic and bounded, uniformly in $n$ and $t$.

Assumption A4. $n$ is a nondecreasing function of $T$.

Proposition 2.2 Assume that the row and column sums of $B_n$ are bounded uniformly in $n$ and assume the sequence $\frac{1}{nT}\sigma_{Q_{nT}}^2$ is bounded away from zero. Then under Assumptions A1, A2, A3 and A4, $\frac{Q_{nT}}{\sigma_{Q_{nT}}} \xrightarrow{d} N(0, 1)$.

Proof. See Appendix B.4.

⁷ The assumption that $B_n$ is symmetric is maintained w.l.o.g. since $V_{nt}' B_n V_{nt} = V_{nt}'[(B_n + B_n')/2] V_{nt}$.

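3 Consistency and Asymptotic Distribution of $\hat\lambda_{nT}$

We have the Taylor expansion

$$\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) = \left(-\frac{1}{nT}\frac{\partial^2 \ln L_{n,T}(\bar\lambda_{nT})}{\partial \lambda^2}\right)^{-1}\left(\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda}\right),$$

where $\bar\lambda_{nT}$ lies between $\hat\lambda_{nT}$ and $\lambda_0$. From the concentrated likelihood function (2.10):

$$\frac{1}{nT}\frac{\partial \ln L_{n,T}(\lambda)}{\partial \lambda} = -\frac{1}{2\hat\sigma_{nT}^2(\lambda)}\frac{\partial \hat\sigma_{nT}^2(\lambda)}{\partial \lambda} - \frac{1}{n}\operatorname{tr} G_n(\lambda), \tag{3.16a}$$

$$\frac{1}{nT}\frac{\partial^2 \ln L_{n,T}(\lambda)}{\partial \lambda^2} = -\frac{1}{2\hat\sigma_{nT}^4(\lambda)}\left(\frac{\partial^2 \hat\sigma_{nT}^2(\lambda)}{\partial \lambda^2}\hat\sigma_{nT}^2(\lambda) - \left(\frac{\partial \hat\sigma_{nT}^2(\lambda)}{\partial \lambda}\right)^2\right) - \frac{1}{n}\operatorname{tr}(G_n^2(\lambda)). \tag{3.16b}$$

The terms $\hat\sigma_{nT}^2(\lambda)$, $\frac{\partial \hat\sigma_{nT}^2(\lambda)}{\partial \lambda}$ and $\frac{\partial^2 \hat\sigma_{nT}^2(\lambda)}{\partial \lambda^2}$ have the explicit forms (see (B.49), (B.50) and (B.51)) implied by (2.9). Using Proposition B.14, we have (derived in Appendix B.5)

$$\hat\sigma_{nT}^2(\lambda) = \sigma_0^2 + |\lambda - \lambda_0| \cdot O_p(1) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right), \tag{3.17a}$$

$$\frac{\partial \hat\sigma_{nT}^2(\lambda)}{\partial \lambda} = -\frac{2}{n}\sigma_0^2 \operatorname{tr} G_n + |\lambda - \lambda_0| \cdot O_p(1) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right), \tag{3.17b}$$

$$\frac{\partial^2 \hat\sigma_{nT}^2(\lambda)}{\partial \lambda^2} = 2\left(H_{3,nT} - H_{2,nT}' H_{1,nT}^{-1} H_{2,nT}\right) + 2\sigma_0^2 \frac{1}{n}\operatorname{tr} G_n' G_n + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right), \tag{3.17c}$$

$$\sqrt{nT}\frac{\partial \hat\sigma_{nT}^2(\lambda_0)}{\partial \lambda} = -\frac{2}{\sqrt{nT}}\sum_{t=1}^{T} \tilde V_{nt}' G_n' \tilde V_{nt} - \frac{2}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0' \tilde Z_{nt}' G_n' - H_{2,nT}' H_{1,nT}^{-1} \tilde Z_{nt}'\right)\tilde V_{nt} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right), \tag{3.17d}$$

where the $O_p(1)$, $O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$ and $O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right)$ terms are uniform in $\lambda$. (3.16) through (3.17) will be used to derive the consistency and asymptotic distribution of the estimator of the spatial effect parameter $\lambda$.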

3.1 Consistency of $\hat\lambda_{nT}$

For the log likelihood function (2.10) divided by the sample size $nT$, we have the corresponding $Q_{n,T}(\lambda) = \max_{\delta, c_n, \sigma^2} E\frac{1}{nT}\ln L_{n,T}(\theta, c_n)$, and the optimal solution to the problem is

$$\delta_{nT}^*(\lambda) = \left[E\frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' \tilde Z_{nt}\right]^{-1}\left[E\frac{1}{nT}\sum_{t=1}^{T} \tilde Z_{nt}' S_n(\lambda)\tilde Y_{nt}\right], \qquad c_{nT}^*(\lambda) = E\frac{1}{T}\sum_{t=1}^{T}\left(S_n(\lambda) Y_{nt} - Z_{nt}\delta_{nT}^*(\lambda)\right),$$

$$\sigma_{nT}^{*2}(\lambda) = E\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)\tilde Y_{nt} - \tilde Z_{nt}\delta_{nT}^*(\lambda)\right)'\left(S_n(\lambda)\tilde Y_{nt} - \tilde Z_{nt}\delta_{nT}^*(\lambda)\right). \tag{3.18}$$

Hence,

$$Q_{n,T}(\lambda) = -\frac{1}{2}(\ln 2\pi + 1) - \frac{1}{2}\ln \sigma_{nT}^{*2}(\lambda) + \frac{1}{n}\ln |S_n(\lambda)|. \tag{3.19}$$

Claim 3.1 Under Assumptions 1-9, $\frac{1}{nT}\ln L_{n,T}(\lambda) - Q_{n,T}(\lambda) \xrightarrow{p} 0$ uniformly in $\lambda$ in any compact parameter space $\Lambda$, and $Q_{n,T}(\lambda)$ is uniformly equicontinuous for $\lambda \in \Lambda$.

Proof. See Appendix C.1.

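From (3.19), we have

$$\frac{\partial^2 Q_{n,T}(\lambda)}{\partial \lambda^2} = -\frac{1}{2\sigma_{nT}^{*4}(\lambda)}\left(\frac{\partial^2 \sigma_{nT}^{*2}(\lambda)}{\partial \lambda^2}\sigma_{nT}^{*2}(\lambda) - \left(\frac{\partial \sigma_{nT}^{*2}(\lambda)}{\partial \lambda}\right)^2\right) - \frac{1}{n}\operatorname{tr}(G_n^2(\lambda)). \tag{3.20}$$

Using (B.60) about $\sigma_{nT}^{*2}(\lambda)$, $\frac{\partial \sigma_{nT}^{*2}(\lambda)}{\partial \lambda}$ and $\frac{\partial^2 \sigma_{nT}^{*2}(\lambda)}{\partial \lambda^2}$, we have

$$\frac{\partial^2 Q_{n,T}(\lambda_0)}{\partial \lambda^2} = -\frac{1}{\sigma_0^2}\left(E H_{3,nT} - E H_{2,nT}'(E H_{1,nT})^{-1} E H_{2,nT}\right) - \frac{1}{n}\left(\operatorname{tr} G_n' G_n + \operatorname{tr} G_n^2 - \frac{2(\operatorname{tr} G_n)^2}{n}\right) + O\left(\frac{1}{T}\right), \tag{3.21}$$

and its limit will be negative if $\lim_{T\to\infty}\left(H_{3,nT} - H_{2,nT}' H_{1,nT}^{-1} H_{2,nT}\right) \neq 0$ or $\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(C_n + C_n')(C_n + C_n')' \neq 0$, where $C_n = G_n - \frac{\operatorname{tr} G_n}{n} I_n$ (see Appendix C.2). Claim 3.1 is the uniform convergence condition; combined with identification, it gives the consistency of the QML estimators.

Theorem 3.2 Under Assumptions 1-9, $\lambda_0$ is globally identified and $\hat\lambda_{nT}$ is consistent.

Proof. See Appendix C.3.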

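3.2 Distribution of $\hat\lambda_{nT}$

Plugging (3.17) into $\frac{\partial \ln L_{n,T}(\lambda)}{\partial \lambda}$ in (3.16a), we have

$$\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda} = \frac{1}{\hat\sigma_{nT}^2(\lambda_0)}\left(\frac{1}{\sqrt{nT}}\sum_{t=1}^{T} \tilde V_{nt}'\left(G_n' - \frac{1}{n}\operatorname{tr} G_n \cdot I_n\right)\tilde V_{nt} + \frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0' \tilde Z_{nt}' G_n' - H_{2,nT}' H_{1,nT}^{-1} \tilde Z_{nt}'\right)\tilde V_{nt}\right) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right). \tag{3.22}$$

As $\tilde Z_{nt}$ has stationary and nonstationary parts (see (2.12)), we can decompose $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda}$ into two parts accordingly such that

$$\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda} = \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^s(\lambda_0)}{\partial \lambda} + \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^u(\lambda_0)}{\partial \lambda} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right),$$

where $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^s(\lambda_0)}{\partial \lambda}$ is the stationary part and $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^u(\lambda_0)}{\partial \lambda}$ is the nonstationary part as defined via (C.5)-(C.9). The stationary part itself has two parts, $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^s(\lambda_0)}{\partial \lambda} = \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{s*}(\lambda_0)}{\partial \lambda} - \Delta_{\lambda_0,nT}$ (defined in (C.5) and (C.6) respectively), where $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{s*}(\lambda_0)}{\partial \lambda}$ has zero mean and $\Delta_{\lambda_0,nT}$ has nonzero mean because the latter involves $\bar V_{nT}$. The nonstationary part also has two parts, $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^u(\lambda_0)}{\partial \lambda} = \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{u*}(\lambda_0)}{\partial \lambda} - N_{\lambda_0,nT}$ (defined in (C.8) and (C.9) respectively), where $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{u*}(\lambda_0)}{\partial \lambda}$ has zero mean and $N_{\lambda_0,nT}$ has nonzero mean. To study the asymptotic behavior of $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda}$, we will first study $\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{s*}(\lambda_0)}{\partial \lambda} + \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}^{u*}(\lambda_0)}{\partial \lambda}$ (using Proposition 2.2), then $\Delta_{\lambda_0,nT} + N_{\lambda_0,nT}$ (using Lemma B.11).

Theorem 3.3 Under Assumptions 1-9,⁸

$$\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{n,T}(\lambda_0)}{\partial \lambda} + \sqrt{\frac{n}{T}}\left(a_{\lambda_0,nT}^s + \frac{m_n}{n} \cdot a_{\lambda_0,nT}^u\right) + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt T}\right)\right) \xrightarrow{d} N(0, \Sigma_{\lambda_0} + \Omega_{\lambda_0}), \tag{3.23}$$

where

$$\Sigma_{\lambda_0} = \frac{1}{\sigma_0^2}\lim_{T\to\infty}\left(H_{3,nT} - H_{2,nT}' H_{1,nT}^{-1} H_{2,nT}\right) + \lim_{n\to\infty}\frac{1}{n}\left(\operatorname{tr} G_n' G_n + \operatorname{tr} G_n^2 - \frac{2(\operatorname{tr} G_n)^2}{n}\right), \tag{3.24}$$

$$\Omega_{\lambda_0} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\lim_{n\to\infty}\sum_{i=1}^{n} G_{n,ii}^2, \tag{3.25}$$

$$a_{\lambda_0,nT}^s = \frac{1}{n}\operatorname{tr}\left[\left(G_n\gamma_0 - (H_{1,nT}^{-1} H_{2,nT})_1 I_n\right)\left(\sum_{h=0}^{\infty} B_n^h\right) S_n^{-1}\right] + \frac{1}{n}\operatorname{tr}\left[\left(G_n\rho_0 - (H_{1,nT}^{-1} H_{2,nT})_2 I_n\right)\left(\sum_{h=0}^{\infty} W_n B_n^h\right) S_n^{-1}\right], \tag{3.26}$$

$$a_{\lambda_0,nT}^u = T \cdot \left(1 - c' H_{1,nT}^{-1} H_{2,nT}\right) \cdot \frac{1}{2(1-\lambda_0)}. \tag{3.27}$$

Proof. See Appendix C.4.

⁸ Only parts of Assumptions 5 and 7 are required. Specifically, $S_n$ is invertible, and the row and column sums of $W_n$ and $S_n^{-1}$ are uniformly bounded in $n$.

Also, we have the following claims.

Claim 3.4 Under Assumptions 1-9, $\frac{1}{nT}\frac{\partial^2 \ln L_{n,T}(\lambda)}{\partial \lambda^2} - \frac{1}{nT}\frac{\partial^2 \ln L_{n,T}(\lambda_0)}{\partial \lambda^2} = |\lambda - \lambda_0| \cdot O(1) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$.

Proof. See Appendix C.5.

Claim 3.5 Under Assumptions 1-9, $\frac{1}{nT}\frac{\partial^2 \ln L_{n,T}(\lambda_0)}{\partial \lambda^2} - \frac{\partial^2 Q_{n,T}(\lambda_0)}{\partial \lambda^2} = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$.

Proof. See Appendix C.6.

Using Theorem 3.3, Claim 3.4 and Claim 3.5, we have the following theorem:

Theorem 3.6 Under Assumptions 1-9,

$$\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) + \sqrt{\frac{n}{T}} b_{\lambda_0,nT} + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt T}\right)\right) \xrightarrow{d} N(0, \Sigma_{\lambda_0}^{-1} + \Sigma_{\lambda_0}^{-2}\Omega_{\lambda_0}), \tag{3.28}$$

where

$$b_{\lambda_0,nT} = \Sigma_{\lambda_0}^{-1}\left(a_{\lambda_0,nT}^s + \frac{m_n}{n} a_{\lambda_0,nT}^u\right). \tag{3.29}$$

When $\frac{n}{T} \to 0$,

$$\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) \xrightarrow{d} N(0, \Sigma_{\lambda_0}^{-1} + \Sigma_{\lambda_0}^{-2}\Omega_{\lambda_0}). \tag{3.30}$$

When $\frac{n}{T} \to k$,

$$\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) + \sqrt{k}\, b_{\lambda_0,nT} \xrightarrow{d} N(0, \Sigma_{\lambda_0}^{-1} + \Sigma_{\lambda_0}^{-2}\Omega_{\lambda_0}). \tag{3.31}$$

When $\frac{n}{T} \to \infty$,

$$T(\hat\lambda_{nT} - \lambda_0) + b_{\lambda_0,nT} \xrightarrow{p} 0. \tag{3.32}$$

Additionally, if $v_{it}$ is normal, (3.28) becomes

$$\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) + \sqrt{\frac{n}{T}} b_{\lambda_0,nT} + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt T}\right)\right) \xrightarrow{d} N(0, \Sigma_{\lambda_0}^{-1}). \tag{3.33}$$

Proof. See Appendix C.7.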

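4 Distribution of QML Estimator $\hat\theta_{nT}$ and Bias Corrected $\hat\theta_{nT}^1$

4.1 QML Estimator $\hat\theta_{nT}$

After we get the distribution of $\hat\lambda_{nT}$, the distribution of $\hat\delta_{nT} = \hat\delta_{nT}(\hat\lambda_{nT})$, $\hat\sigma_{nT}^2 = \hat\sigma_{nT}^2(\hat\lambda_{nT})$ and $\hat c_{nT} = \hat c_{nT}(\hat\lambda_{nT})$ can be derived from (2.9). As is derived in Appendix C.8,

$$\sqrt{nT}\left(\hat\theta_{nT} - \theta_0\right) = \Sigma_{\theta_0,nT}^{-1} \cdot \frac{1}{\sqrt{nT}}\frac{\partial \ln L_{nT}(\theta_0)}{\partial \theta} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right), \tag{4.1}$$

where

$$\frac{1}{\sqrt{nT}}\frac{\partial \ln L_{nT}(\theta_0)}{\partial \theta} = \begin{pmatrix} \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T} \tilde Z_{nt}' \tilde V_{nt} \\ \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}' G_n \tilde V_{nt} - \sigma_0^2 \operatorname{tr} G_n\right) + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T} (G_n \tilde Z_{nt}\delta_0)' \tilde V_{nt} \\ \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}' \tilde V_{nt} - n\sigma_0^2\right) \end{pmatrix},$$

$$\Sigma_{\theta_0,nT} = \frac{1}{\sigma_0^2}\begin{pmatrix} E H_{nT} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{n}\left[\operatorname{tr}(G_n' G_n) + \operatorname{tr}(G_n^2)\right] & \frac{1}{\sigma_0^2 n}\operatorname{tr}(G_n) \\ 0 & \frac{1}{\sigma_0^2 n}\operatorname{tr}(G_n) & \frac{1}{2\sigma_0^4} \end{pmatrix}.$$

Using the central limit theorem for martingale difference arrays (see Proposition 2.2), we have the joint distribution of the common parameters in the following theorem. Denote

$$\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\begin{pmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{n}\sum_{i=1}^{n} G_{n,ii}^2 & \frac{1}{2\sigma_0^2 n}\operatorname{tr} G_n \\ 0 & \frac{1}{2\sigma_0^2 n}\operatorname{tr} G_n & \frac{1}{4\sigma_0^4} \end{pmatrix}, \tag{4.2}$$

$$b_{\theta_0,nT} \equiv \Sigma_{\theta_0,nT}^{-1} \cdot a_{\theta_0,nT}, \tag{4.3}$$

where $a_{\theta_0,nT} = a_{\theta_0,n}^s + \frac{m_n}{n} a_{\theta_0,T}^u$ with

$$a_{\theta_0,n}^s = \begin{pmatrix} \frac{1}{n}\operatorname{tr}\left(\left(\sum_{h=0}^{\infty} B_n^h\right) S_n^{-1}\right) \\ \frac{1}{n}\operatorname{tr}\left(W_n\left(\sum_{h=0}^{\infty} B_n^h\right) S_n^{-1}\right) \\ 0_{k_x \times 1} \\ \frac{1}{n}\gamma_0 \operatorname{tr}\left(G_n\left(\sum_{h=0}^{\infty} B_n^h\right) S_n^{-1}\right) + \frac{1}{n}\rho_0 \operatorname{tr}\left(G_n W_n\left(\sum_{h=0}^{\infty} B_n^h\right) S_n^{-1}\right) + \frac{1}{n}\operatorname{tr} G_n \\ \frac{1}{2\sigma_0^2} \end{pmatrix}, \tag{4.4}$$

$$a_{\theta_0,T}^u = T \cdot \frac{1}{2(1-\lambda_0)} \cdot (c^{*\prime}, 0)'. \tag{4.5}$$

Theorem 4.1 Under Assumptions 1-9,

$$\sqrt{nT}(\hat\theta_{nT} - \theta_0) + \sqrt{\frac{n}{T}} b_{\theta_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt T}, \sqrt{\frac{n}{T^3}}\right)\right) \xrightarrow{d} N\left(0, \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1} + \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1}\Omega_{\theta_0,n}\Sigma_{\theta_0,nT}^{-1}\right). \tag{4.6}$$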

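When $\frac{n}{T} \to 0$,

$$\sqrt{nT}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\left(0, \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1} + \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1}\Omega_{\theta_0,n}\Sigma_{\theta_0,nT}^{-1}\right). \tag{4.7}$$

When $\frac{n}{T} \to k < \infty$,

$$\sqrt{nT}(\hat\theta_{nT} - \theta_0) + \sqrt{k}\, b_{\theta_0,nT} \xrightarrow{d} N\left(0, \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1} + \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1}\Omega_{\theta_0,n}\Sigma_{\theta_0,nT}^{-1}\right). \tag{4.8}$$

When $\frac{n}{T} \to \infty$,

$$T(\hat\theta_{nT} - \theta_0) + b_{\theta_0,nT} \xrightarrow{p} 0. \tag{4.9}$$

Additionally, if $v_{it}$ is normal, (4.6) becomes

$$\sqrt{nT}(\hat\theta_{nT} - \theta_0) + \sqrt{\frac{n}{T}} b_{\theta_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt T}, \sqrt{\frac{n}{T^3}}\right)\right) \xrightarrow{d} N\left(0, \lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1}\right). \tag{4.10}$$

Proof. See Appendix C.9.

Hence, $\hat\theta_{nT}$ has a bias of the order $O(T^{-1})$. Also, the asymptotic variance matrix of $\sqrt{nT}\hat\theta_{nT}$ is singular because $\Sigma_{\theta_0,nT}^{-1} \cdot (c^{*\prime}, 0)' = O(T^{-1})$. This implies that we have a different rate of convergence for $(c^{*\prime}, 0) \cdot (\hat\theta_{nT} - \theta_0) = \hat\lambda_{nT} + \hat\gamma_{nT} + \hat\rho_{nT} - 1$, using $H_{nT}^{-1} \cdot c^* = O(T^{-1})$ from Proposition 2.1.

Theorem 4.2 Under Assumptions 1-9,

$$\sqrt{nT^3}(c^{*\prime}, 0)(\hat\theta_{nT} - \theta_0) + \sqrt{\frac{n}{T}}\left(T(c^{*\prime}, 0) b_{\theta_0,nT}\right) + O_p\left(\max\left(\frac{1}{\sqrt T}, \sqrt{\frac{n}{T^3}}\right)\right) \xrightarrow{d} N\left(0, \lim_{T\to\infty}\omega_{nT}^{-1} + \lim_{T\to\infty} T^2 (c^{*\prime}, 0)\left(\lim_{T\to\infty}\Sigma_{\theta_0,nT}^{-1}\Omega_{\theta_0,n}\Sigma_{\theta_0,nT}^{-1}\right)(c^{*\prime}, 0)'\right). \tag{4.11}$$

Proof. See Appendix C.10.

The estimators of the fixed effects are $\sqrt T$ consistent and asymptotically centered normal, as shown below.

Theorem 4.3 Under Assumptions 1-9, if $(Y_{n,-1}/T)_i - E(Y_{n,-1}/T)_i = o_p(1)$ and $E(Y_{n,-1}/T)_i = O(1)$ uniformly in $n$ and $i$, then, for $i = 1, 2, \cdots, n$, $\sqrt T(\hat c_{i,nT} - c_{i,0}) \xrightarrow{d} N(0, \sigma_{n,c_i})$ where $\sigma_{n,c_i}$ is in (C.40). When $n$ also goes to infinity, $\sqrt T(\hat c_{i,nT} - c_{i,0}) \xrightarrow{d} N(0, \sigma_0^2)$.

Proof. See Appendix C.11.

4.2 Bias Corrected Estimators $\hat\theta_{nT}^1$

From (4.6), the QML estimator has the bias $-\frac{1}{T} b_{\theta_0,nT}$ where $b_{\theta_0,nT} \equiv \Sigma_{\theta_0,nT}^{-1} \cdot \left(a_{\theta_0,n}^s + \frac{m_n}{n} a_{\theta_0,T}^u\right)$, and the confidence interval is not centered when $\frac{n}{T} \to k$ where $0 < k < \infty$. Furthermore, when $T$ is small relative to $n$ in the sense that $\frac{n}{T} \to \infty$, the presence of $b_{\theta_0,nT}$ causes $\hat\theta_{nT}$ to have the slower $T^{-1}$ rate of convergence in (4.9). An analytical bias reduction procedure is to correct the bias $B_{nT} = -b_{\theta_0,nT}$ by constructing an estimator $\hat B_{nT}$ and defining the bias corrected estimator as

$$\hat\theta_{nT}^1 = \hat\theta_{nT} - \frac{\hat B_{nT}}{T}. \tag{4.12}$$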


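From Theorem 4.1, $B_{nT}=-\Sigma^{-1}_{\theta_0,nT}\cdot\left(a^s_{\theta_0,n}+\frac{m_n}{n}a^u_{\theta_0,T}\right)$ and a natural candidate for $\hat B_{nT}$ is $\left[-\Sigma^{-1}_{\theta,nT}\cdot a_{\theta,nT}\right]\big|_{\theta=\hat\theta_{nT}}$. As $\Sigma^{-1}_{\hat\theta_{nT},nT}$ involves $EH_{nT}(\hat\theta_{nT})$ (see (B.47)), which is hard to evaluate, our alternative estimate is

$$\hat B_{nT}=\left[-\hat\Sigma^{-1}_{\theta,nT}\cdot a_{\theta,nT}\right]\Big|_{\theta=\hat\theta_{nT}}, \tag{4.13}$$

and $\hat\Sigma^{-1}_{\theta,nT}$ is defined in (B.36), where $EH_{nT}(\hat\theta_{nT})$ in $\Sigma_{\hat\theta_{nT},nT}$ is replaced with $H_{nT}(\hat\theta_{nT})$. We show that when $n/T^3\to 0$, $\hat\theta^1_{nT}$ is $\sqrt{nT}$ consistent and asymptotically centered normal even when $n/T\to\infty$.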
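The correction in (4.12)-(4.13) is a single linear-algebra step once the two ingredients are available. The following is a minimal sketch, assuming that an estimate of the matrix in (B.36) (`sigma_hat`) and of the vector $a_{\theta,nT}$ evaluated at $\hat\theta_{nT}$ (`a_hat`) have already been computed elsewhere; the function name and argument names are hypothetical and not part of the paper.

```python
import numpy as np

def bias_corrected_estimate(theta_hat, sigma_hat, a_hat, T):
    """Sketch of (4.12)-(4.13): theta_hat - B_hat / T with B_hat = -inv(sigma_hat) @ a_hat.

    theta_hat : QML estimate (1-D array)
    sigma_hat : plug-in estimate of the matrix in (B.36) (assumed given)
    a_hat     : plug-in estimate of a_{theta,nT} (assumed given)
    T         : number of time periods
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    # Solve the linear system rather than forming the inverse explicitly.
    B_hat = -np.linalg.solve(np.asarray(sigma_hat, dtype=float), a_hat)
    return theta_hat - B_hat / T
```

Solving the system instead of inverting `sigma_hat` is only a numerical convenience; it does not change the estimator.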

To show our result for the bias corrected estimators, we need the following additional assumption.

Assumption 10. Either the row sums or the column sums of $\sum_{h=0}^{\infty}B_n^h(\theta)$ and $\sum_{h=1}^{\infty}hB_n^{h-1}(\theta)$ are bounded uniformly in $n$ and in a neighborhood of $\theta_0$.

Assumption 10 can be verified through the following lemma.

Lemma 4.4 If $\sup_n\{\|B_n(\theta_0)\|_\infty\}<1$ (resp. $\sup_n\{\|B_n(\theta_0)\|_1\}<1$), then the row sums (resp. column sums) of $\sum_{h=0}^{\infty}B_n^h(\theta)$ and $\sum_{h=1}^{\infty}hB_n^{h-1}(\theta)$ are bounded uniformly in $n$ and in a neighborhood of $\theta_0$.

Proof. This is Lemma 3.9 in Yu, de Jong and Lee (2006).

Our result for the bias corrected estimator is as follows.

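Theorem 4.5 Under Assumptions 1-10, if $n/T^3\to 0$,

$$\sqrt{nT}(\hat\theta^1_{nT}-\theta_0)\xrightarrow{d}N\!\left(0,\ \lim_{T\to\infty}\Sigma^{-1}_{\theta_0,nT}+\lim_{T\to\infty}\Sigma^{-1}_{\theta_0,nT}\Omega_{\theta_0,n}\Sigma^{-1}_{\theta_0,nT}\right). \tag{4.14}$$

Additionally,

$$\sqrt{nT^3}\,(c_\theta',0)(\hat\theta^1_{nT}-\theta_0)\xrightarrow{d}N\!\left(0,\ \lim_{T\to\infty}\omega^{-1}_{nT}+\lim_{T\to\infty}T^2(c_\theta',0)\left(\Sigma^{-1}_{\theta_0,nT}\Omega_{\theta_0,n}\Sigma^{-1}_{\theta_0,nT}\right)(c_\theta',0)'\right). \tag{4.15}$$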

Proof. See Appendix C.12.

4.3 Monte Carlo Results

We conduct a small Monte Carlo experiment to evaluate the performance of our ML estimators and the bias corrected estimators. We generate samples from (2.1) using $\theta_0^a=(0.4,0.2,1,0.4,1)'$ and $\theta_0^b=(0.6,-0.4,1,0.8,1)'$ where $\theta_0=(\gamma_0,\rho_0,\beta_0',\lambda_0,\sigma_0^2)'$, and $X_{nt}$, $c_{n0}$ and $V_{nt}$ are generated from independent normal distributions⁹ and the spatial weights matrix we use is a rook matrix. We use $T=10$, $50$ and $n=49$, $196$. For each set of generated sample observations, we calculate the ML estimator $\hat\theta_{nT}$ and evaluate the bias $\hat\theta_{nT}-\theta_0$; we then construct the bias corrected estimator $\hat\theta^1_{nT}$ and evaluate the bias $\hat\theta^1_{nT}-\theta_0$. We do this 1000 times to see if the bias is reduced on average by using the analytical bias correction procedure,

⁹ We generated the spatial panel data with $20+T$ periods and then took the last $T$ periods as our sample. The initial value is generated as $N(0,I_n)$ in the simulation. We have also generated the data with a much longer history of $1000+T$ periods and the results are similar. Also, in our example, the second largest eigenvalue of $W_n$ is 0.94107. If we count it as a unit root, the bias corrected estimator does not change much.


Table 1: Performance of QMLs and Their Bias Corrected Estimators: Biases

Bias of $\hat\theta_{nT}$ (1st line) and $\hat\theta^1_{nT}$ (2nd line)

Case  T    n    $\theta_0$    $\gamma$  $\rho$   $\beta$  $\lambda$ $\sigma^2$
(1)   10   49   $\theta_0^a$  -0.0758   0.0187  -0.0135  -0.0107  -0.1211
                              -0.0021   0.0161   0.0015  -0.0042  -0.0346
(2)   10   49   $\theta_0^b$  -0.0939   0.0785  -0.0180  -0.0087  -0.1234
                              -0.0050   0.0124   0.0026  -0.0063  -0.0374
(3)   10   196  $\theta_0^a$  -0.0749   0.0160  -0.0135  -0.0108  -0.1147
                              -0.0019   0.0163   0.0015  -0.0039  -0.0276
(4)   10   196  $\theta_0^b$  -0.0919   0.0745  -0.0184   0.0071  -0.1179
                              -0.0042   0.0119   0.0020  -0.0046  -0.0312
(5)   50   49   $\theta_0^a$  -0.0139   0.0081  -0.0009  -0.0018  -0.0219
                               0.0004   0.0024  -0.0000  -0.0030  -0.0020
(6)   50   49   $\theta_0^b$  -0.0170   0.0172  -0.0003  -0.0029  -0.0204
                              -0.0002   0.0031   0.0008  -0.0030  -0.0008
(7)   50   196  $\theta_0^a$  -0.0142   0.0087  -0.0005  -0.0019  -0.0208
                               0.0002   0.0040   0.0004  -0.0031  -0.0010
(8)   50   196  $\theta_0^b$  -0.0172   0.0166  -0.0003  -0.0019  -0.0202
                              -0.0004   0.0028   0.0008  -0.0023  -0.0003

Note: $\theta_0^a=(0.4,0.2,1,0.4,1)$ and $\theta_0^b=(0.6,-0.4,1,0.8,1)$.

i.e., to compare $\frac{1}{1000}\sum_{i=1}^{1000}(\hat\theta_{nT}-\theta_0)_i$ with $\frac{1}{1000}\sum_{i=1}^{1000}(\hat\theta^1_{nT}-\theta_0)_i$. With two different values of $\theta_0$ for each $n$ and $T$, finite sample properties of both estimators are summarized in Table 1 and Table 2, where Table 1 is for the biases and Table 2 is for the standard errors of estimators.
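The data generating step of such an experiment can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the reduced form $Y_{nt}=A_nY_{n,t-1}+S_n^{-1}(X_{nt}\beta_0+c_{n0}+V_{nt})$ with $A_n=S_n^{-1}(\gamma_0I_n+\rho_0W_n)$ used in Appendix B, a scalar regressor, a row-normalized rook matrix on a square grid, and the 20-period burn-in described in the footnote. The function names `rook_weights` and `simulate_panel` are hypothetical.

```python
import numpy as np

def rook_weights(side):
    """Row-normalized rook contiguity matrix on a side x side grid (assumed layout)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[i, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_panel(W, gamma, rho, beta, lam, sigma2, T, burn_in=20, rng=None):
    """Simulate Y and X for T periods after discarding burn_in periods."""
    rng = np.random.default_rng(rng)
    n = W.shape[0]
    S_inv = np.linalg.inv(np.eye(n) - lam * W)       # S_n^{-1}
    A = S_inv @ (gamma * np.eye(n) + rho * W)        # A_n
    c = rng.standard_normal(n)                       # fixed effects c_n0
    y = rng.standard_normal(n)                       # initial value ~ N(0, I_n)
    Y = np.empty((burn_in + T, n))
    X = np.empty((burn_in + T, n))
    for t in range(burn_in + T):
        x = rng.standard_normal(n)
        v = np.sqrt(sigma2) * rng.standard_normal(n)
        y = A @ y + S_inv @ (beta * x + c + v)
        Y[t], X[t] = y, x
    return Y[-T:], X[-T:]                            # keep the last T periods
```

For example, `simulate_panel(rook_weights(7), 0.4, 0.2, 1.0, 0.4, 1.0, T=10, rng=0)` draws one sample with $n=49$ under $\theta_0^a$; the estimation step itself is not shown.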

We see that both estimators have some biases, but the bias corrected estimators reduce those biases. This is consistent with our asymptotic analysis, because the bias corrected estimators eliminate the bias of order $O(T^{-1})$. Also, the bias reduction is achieved without a significant increase in the variance of the estimators, as can be seen from Table 2.
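The size of the reduction can be read directly off Table 1. The sketch below, which only restates two rows of that table (cases (1) and (5)), computes the ratio of absolute biases after and before correction; the dictionary layout and the helper name `abs_bias_ratio` are illustrative choices, not part of the paper.

```python
import numpy as np

# Columns: gamma, rho, beta, lambda, sigma^2 (values copied from Table 1).
table1 = {
    1: {"qml": [-0.0758, 0.0187, -0.0135, -0.0107, -0.1211],
        "bc":  [-0.0021, 0.0161, 0.0015, -0.0042, -0.0346]},
    5: {"qml": [-0.0139, 0.0081, -0.0009, -0.0018, -0.0219],
        "bc":  [0.0004, 0.0024, -0.0000, -0.0030, -0.0020]},
}

def abs_bias_ratio(case):
    """|bias of corrected estimator| / |bias of QML estimator|, per parameter."""
    qml = np.abs(np.array(table1[case]["qml"]))
    bc = np.abs(np.array(table1[case]["bc"]))
    return bc / qml

ratio_case1 = abs_bias_ratio(1)
ratio_case5 = abs_bias_ratio(5)
```

A ratio below one means the correction shrank the absolute bias for that parameter. In case (1) every ratio is below one, while in case (5) the $\lambda$ entry is an exception, where both biases are already small.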

Comparing different cases of $n$ and $T$: for each given $n$, when $T$ is larger, the biases of both sets of estimators are smaller and the variances are smaller; for each given $T$, when $n$ is larger, the biases of both sets of estimators are nearly the same, but the variances are smaller. This is consistent with our theoretical prediction.


Table 2: Performance of QMLs and Their Bias Corrected Estimators: Standard Errors

S.E. of $\hat\theta_{nT}$ (1st line) and $\hat\theta^1_{nT}$ (2nd line)

Case  T    n    $\theta_0$    $\gamma$  $\rho$   $\beta$  $\lambda$ $\sigma^2$
(1)   10   49   $\theta_0^a$   0.0320   0.0534   0.0454   0.0426   0.0568
                               0.0336   0.0572   0.0476   0.0428   0.0625
(2)   10   49   $\theta_0^b$   0.0312   0.0415   0.0460   0.0237   0.0582
                               0.0327   0.0441   0.0482   0.0237   0.0639
(3)   10   196  $\theta_0^a$   0.0160   0.0276   0.0227   0.0221   0.0286
                               0.0168   0.0296   0.0238   0.0222   0.0315
(4)   10   196  $\theta_0^b$   0.0156   0.0214   0.0230   0.0126   0.0292
                               0.0163   0.0228   0.0241   0.0126   0.0321
(5)   50   49   $\theta_0^a$   0.0136   0.0219   0.0203   0.0184   0.0283
                               0.0137   0.0222   0.0205   0.0185   0.0289
(6)   50   49   $\theta_0^b$   0.0124   0.0165   0.0206   0.0102   0.0290
                               0.0125   0.0167   0.0208   0.0103   0.0296
(7)   50   196  $\theta_0^a$   0.0068   0.0113   0.0102   0.0095   0.0142
                               0.0069   0.0114   0.0103   0.0096   0.0144
(8)   50   196  $\theta_0^b$   0.0062   0.0085   0.0103   0.0054   0.0145
                               0.0062   0.0086   0.0104   0.0055   0.0148

5 Conclusion

In this paper, we derived the properties of QML estimators of a nonstationary spatial dynamic panel data model with fixed effects when both $n$ and $T$ are large. For the distribution of the common parameters, when $T$ is asymptotically large relative to $n$, the estimators are $\sqrt{nT}$ consistent and asymptotically normal, with the limit distribution centered around 0; when $n$ is asymptotically proportional to $T$, the estimators are $\sqrt{nT}$ consistent and asymptotically normal, but the limit distribution is not centered around 0; and when $n$ is large relative to $T$, the estimators are consistent with rate $T$, and have a degenerate limit distribution. Compared to Yu, de Jong and Lee (2006), the estimators' rate of convergence is the same, but the asymptotic variance matrix is driven by the nonstationary component and is singular. Also, the sum of the spatial effect coefficients and the dynamic effect coefficient has a higher rate of convergence. We also propose a bias correction for our estimators. We show that as long as $T$ grows faster than $n^{1/3}$, the correction will eliminate the bias of order $O(T^{-1})$ and yield a centered confidence interval.


Appendices

A Notations

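The following list summarizes some frequently used notations in the text:

$S_n(\lambda)=I_n-\lambda W_n$ for any possible $\lambda$.

$S_n=I_n-\lambda_0W_n$.

$G_n=W_nS_n^{-1}$. $A_n=S_n^{-1}(\gamma_0I_n+\rho_0W_n)$.

$A_n=R_nD_nR_n^{-1}$ where $R_n$ is the matrix of eigenvectors and $D_n$ is the diagonal matrix of eigenvalues.

$J_n=\mathrm{Diag}\{1_{m_n}',0,\cdots,0\}$ where $1_{m_n}$ is an $m_n\times 1$ vector of ones. $M_n=R_nJ_nR_n^{-1}$.

$c=(1,1,0_{1\times k_x})'$ and $c_\theta=(c',1)'$.

$Z_{nt}=(Y_{n,t-1},W_nY_{n,t-1},X_{nt})$.

$\theta=(\delta',\lambda,\sigma^2)'$ where $\delta=(\gamma,\rho,\beta')'$.

$\ln L_{n,T}(\theta,c_n)$ is the log-likelihood of $\theta$ and $c_n$.

$\ln L_{n,T}(\theta)$ is the concentrated log-likelihood of $\theta$.

$Q_{n,T}(\lambda)=\max_{\delta,c_n,\sigma^2}E\frac{1}{nT}\ln L_{n,T}(\theta,c_n)$.

$\tilde\xi_{nt}=\xi_{nt}-\bar\xi_{nT}$ and $\tilde{\tilde\xi}_{n,t-1}=\xi_{n,t-1}-\bar\xi_{nT,-1}$ where $\bar\xi_{nT}=\frac{1}{T}\sum_{t=1}^{T}\xi_{nt}$ and $\bar\xi_{nT,-1}=\frac{1}{T}\sum_{t=1}^{T}\xi_{n,t-1}$.

$\omega_{nT}=\frac{1}{nT^3}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}\tilde{\tilde Y}^{u}_{n,t-1}$ (see (2.7) for $\tilde{\tilde Y}^{u}_{n,t-1}$).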

B Algebra for the Nonstationary Case

B.1 An Example to Justify the Assumptions

Consider the group case with equal weights for peers, i.e., $W_n$ is a block diagonal matrix with its $j$th block being $W_{jn}=\frac{1}{n_j-1}[l_{n_j}l_{n_j}'-I_{n_j}]$, $j=1,\cdots,R$, where $R$ is the total number of groups.

The eigenvalues are roots of the characteristic polynomial

$$|W_{jn}-\mu I_{n_j}|=\left|\frac{1}{n_j-1}l_{n_j}l_{n_j}'-\left(\mu+\frac{1}{n_j-1}\right)I_{n_j}\right|=(-1)^{n_j}\left(\mu+\frac{1}{n_j-1}\right)^{n_j-1}(\mu-1),$$

by using the property of a determinant that $|A+\alpha bd'|=|A|(1+\alpha d'A^{-1}b)$ (Proposition 31 in Dhrymes (1978)). Hence the eigenvalues of $W_{jn}$ are a single root equal to one, and $(n_j-1)$ multiple roots with the value $-\frac{1}{n_j-1}$ for the $j$th group. As $W_n$ is a block diagonal matrix, its determinant is the product of the determinants of the diagonal block matrices. It follows that there are $R$-multiple roots equal to one, and $(n_j-1)$-multiple roots of the value $-\frac{1}{n_j-1}$ for each $j=1,\cdots,R$.
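The eigenvalue structure of a single group block can be checked numerically. The following sketch builds $W_{jn}$ for one group and computes its spectrum; the helper name `group_block` and the group size 5 are arbitrary illustrative choices.

```python
import numpy as np

def group_block(n_j):
    """Equal-weight peer block W_jn = (l l' - I) / (n_j - 1)."""
    ones = np.ones((n_j, 1))
    return (ones @ ones.T - np.eye(n_j)) / (n_j - 1)

# The block is symmetric, so eigvalsh applies and returns real eigenvalues.
eigenvalues = np.sort(np.linalg.eigvalsh(group_block(5)))
```

For $n_j=5$ the characteristic polynomial above implies one eigenvalue equal to one and four eigenvalues equal to $-1/4$, which is what the sorted array should contain up to rounding error.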


As the total number of unit eigenvalues of $W_n$ is $R$, the corresponding orthonormal matrix of eigenvectors of $W_n$ is $R_n=(R_{n,R},R_{n,n-R})$, where

$$R_{n,R}=\begin{pmatrix}\frac{l_{n_1}}{\sqrt{n_1}}&0&\cdots&0\\0&\frac{l_{n_2}}{\sqrt{n_2}}&\cdots&0\\\vdots&\cdots&\ddots&\cdots\\0&0&\cdots&\frac{l_{n_R}}{\sqrt{n_R}}\end{pmatrix}.$$

As $J_n=\begin{pmatrix}I_R&0\\0&0\end{pmatrix}$, we have

$$R_nJ_nR_n'=(R_{n,R},R_{n,n-R})J_n(R_{n,R},R_{n,n-R})'=R_{n,R}R_{n,R}'=\begin{pmatrix}\frac{l_{n_1}l_{n_1}'}{n_1}&\cdots&0\\0&\ddots&0\\0&\cdots&\frac{l_{n_R}l_{n_R}'}{n_R}\end{pmatrix},$$

which is uniformly bounded in both row and column sums.

The matrix $A_n=(I_n-\lambda_0W_n)^{-1}(\gamma_0I_n+\rho_0W_n)$ for this group setting is a block diagonal matrix. Because $B_n=A_n-R_nJ_nR_n'$ as defined, $B_n$ is also a block diagonal matrix. Consider the first diagonal block of $A_n$, which is $A_{n1}=(I_{n_1}-\lambda_0W_{1n})^{-1}(\gamma_0I_{n_1}+\rho_0W_{1n})$. Note that

$$(I_{n_1}-\lambda_0W_{1n})^{-1}=\frac{n_1-1}{n_1-1+\lambda_0}\left(I_{n_1}-\frac{\lambda_0}{n_1-1+\lambda_0}l_{n_1}l_{n_1}'\right)^{-1}=\frac{n_1-1}{n_1-1+\lambda_0}\left(I_{n_1}+\frac{\lambda_0}{(n_1-1)(1-\lambda_0)}l_{n_1}l_{n_1}'\right).$$

As $\gamma_0+\rho_0=1-\lambda_0$, it follows that

$$A_{n1}=(I_{n_1}-\lambda_0W_{1n})^{-1}(\gamma_0I_{n_1}+\rho_0W_{1n})=\frac{n_1-1}{n_1-1+\lambda_0}\left\{\left(\gamma_0-\frac{\rho_0}{n_1-1}\right)I_{n_1}+\frac{\lambda_0+\rho_0}{n_1-1}l_{n_1}l_{n_1}'\right\}.$$

Hence,

$$B_{n1}=A_{n1}-\frac{l_{n_1}l_{n_1}'}{n_1}=\left(\frac{n_1\gamma_0-1+\lambda_0}{n_1-1+\lambda_0}\right)\left(I_{n_1}-\frac{1}{n_1}l_{n_1}l_{n_1}'\right).$$

Because $(I_{n_1}-\frac{1}{n_1}l_{n_1}l_{n_1}')$ is an idempotent matrix, it follows that for any positive integer $h$,

$$B_{n1}^h=\left(\frac{n_1\gamma_0-1+\lambda_0}{n_1-1+\lambda_0}\right)^h\left(I_{n_1}-\frac{1}{n_1}l_{n_1}l_{n_1}'\right).$$

The matrix $(I_{n_1}-\frac{1}{n_1}l_{n_1}l_{n_1}')$ is uniformly bounded in both row and column sums, so $\sum_{h=0}^{\infty}\mathrm{abs}(B_{n1}^h)$ will be uniformly bounded in both row and column sums if $\left|\frac{n_1\gamma_0-1+\lambda_0}{n_1-1+\lambda_0}\right|<1$. The corresponding $B_n$ will be so if $\max_{j=1,\cdots,R}\left|\frac{n_j\gamma_0-1+\lambda_0}{n_j-1+\lambda_0}\right|<1$. A sufficient condition for this to occur is that $|\lambda_0|<1$, $\gamma_0<1$ and $\rho_0<1$. This is so as follows. Define the function $f(x)=\frac{x\gamma_0-1+\lambda_0}{x-1+\lambda_0}$. The derivative of $f(x)$ is $\frac{df(x)}{dx}=\frac{(1-\lambda_0)(1-\gamma_0)}{(x-1+\lambda_0)^2}$, which is positive if $1>\lambda_0$ and $1>\gamma_0$. Hence $f$ is increasing, its upper bound is $\gamma_0<1$, and, since $x\geq 2$, its lower bound is $f(2)=\frac{2\gamma_0-1+\lambda_0}{1+\lambda_0}=\frac{1-\lambda_0-2\rho_0}{1+\lambda_0}>-1$ because $1+\lambda_0>0$ and $1>\rho_0$. Under this situation, we can justify Assumption 8 in the text for this example.
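The closed form for $B_{n1}$ and its powers can be verified numerically for a specific parameter value. The sketch below uses $\theta_0^a$ from the Monte Carlo section ($\gamma_0=0.4$, $\rho_0=0.2$, $\lambda_0=0.4$) and a hypothetical group size $n_1=5$; the function name `check_block` is illustrative.

```python
import numpy as np

def check_block(n1, gamma, rho, lam):
    """Return B_n1 computed directly, its closed form, and the scalar factor f(n1)."""
    ones = np.ones((n1, 1))
    I = np.eye(n1)
    W = (ones @ ones.T - I) / (n1 - 1)                        # group block W_1n
    A = np.linalg.solve(I - lam * W, gamma * I + rho * W)     # A_n1
    P = I - ones @ ones.T / n1                                # idempotent projector
    f = (n1 * gamma - 1 + lam) / (n1 - 1 + lam)               # scalar factor
    B_direct = A - ones @ ones.T / n1
    return B_direct, f * P, f

B_direct, B_closed, f = check_block(5, 0.4, 0.2, 0.4)
# Third power, to illustrate B_n1^h = f^h (I - l l'/n1).
B_cubed = np.linalg.matrix_power(B_direct, 3)
```

With these inputs the two expressions for $B_{n1}$ should agree to machine precision and $|f|<1$, so the geometric series $\sum_h \mathrm{abs}(B_{n1}^h)$ converges.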

The same consideration will also justify that the smallest eigenvalue of $A_n$ is less than one in absolute value in Assumption 6¹⁰. Because $A_n$ is a block diagonal matrix, it is sufficient to consider the eigenvalues of each block $A_{nj}$. An eigenvalue $\mu$ of $A_{nj}$ for some $j$ will also be an eigenvalue of $A_n$. This is so because if $\mu$ is an eigenvalue of $A_{nj}$ with eigenvector $x_{nj}$ such that $A_{nj}x_{nj}=\mu x_{nj}$, then $A_nx_n=\mu x_n$ where $x_n=(0,\cdots,0,x_{nj}',0,\cdots,0)'$. Consider the eigenvalue $-\frac{1}{n_1-1}$ of $W_{n1}$ (the remaining eigenvalue is one) and a corresponding eigenvector $x_1$. As

$$A_{n1}x_1=(I_{n_1}-\lambda_0W_{n1})^{-1}(\gamma_0I_{n_1}+\rho_0W_{n1})x_1=\left(\frac{n_1\gamma_0-1+\lambda_0}{n_1-1+\lambda_0}\right)x_1,$$

the corresponding eigenvalue of $A_n$ is $\frac{n_1\gamma_0-1+\lambda_0}{n_1-1+\lambda_0}$, which lies in $(-1,\gamma_0)$ with $\gamma_0<1$, as previously shown.

B.2 Some Basic Lemmas

Proposition B.1 Suppose that $W_n$ is a weights matrix row normalized from a symmetric matrix $C_n$, i.e., $W_n=\Lambda_n^{-1}C_n$, where $\Lambda_n$ is a diagonal matrix with its diagonal elements formed by the row sums of $C_n$. Then, the eigenvalues of $W_n$ are all real and $W_n$ is diagonalizable.

Proposition B.2 Suppose that $A_n=(I_n-\lambda_0W_n)^{-1}(\gamma_0I_n+\rho_0W_n)$, where $W_n$ is the row normalized weights matrix in Proposition B.1. Then, $A_n$ is diagonalizable with all real eigenvalues. If $W_n$ is diagonalizable as $W_n=R_nD_n^*R_n^{-1}$, then $A_n$ can be diagonalized as $A_n=R_nD_nR_n^{-1}$, with its eigenvalue matrix $D_n=(I_n-\lambda_0D_n^*)^{-1}(\gamma_0I_n+\rho_0D_n^*)$.
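Propositions B.1 and B.2 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses a rook contiguity matrix on a hypothetical 4 x 4 grid as the symmetric matrix $C_n$, obtains the eigen-decomposition of $W_n$ through the symmetric matrix $\Lambda_n^{-1/2}C_n\Lambda_n^{-1/2}$, and maps the eigenvalues of $W_n$ into those of $A_n$ with $\theta_0^b$ ($\gamma_0=0.6$, $\rho_0=-0.4$, $\lambda_0=0.8$).

```python
import numpy as np

def rook_contiguity(side):
    """Symmetric 0/1 rook contiguity matrix C_n on a side x side grid."""
    n = side * side
    C = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if r + 1 < side:
                C[i, i + side] = C[i + side, i] = 1.0
            if c + 1 < side:
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

C = rook_contiguity(4)
deg = C.sum(axis=1)                       # diagonal of Lambda_n (row sums of C_n)
W = C / deg[:, None]                      # W_n = Lambda_n^{-1} C_n
sym = C / np.sqrt(np.outer(deg, deg))     # Lambda^{-1/2} C Lambda^{-1/2}, symmetric
w_eigs, R_star = np.linalg.eigh(sym)      # real eigenvalues, orthonormal R*
R = R_star / np.sqrt(deg)[:, None]        # R_n = Lambda^{-1/2} R*

gamma, rho, lam = 0.6, -0.4, 0.8
n = C.shape[0]
A = np.linalg.solve(np.eye(n) - lam * W, gamma * np.eye(n) + rho * W)
a_eigs = (gamma + rho * w_eigs) / (1.0 - lam * w_eigs)   # diagonal of D_n
```

If the propositions hold, `W @ R` equals `R` with its columns scaled by `w_eigs`, and `A @ R` equals `R` with its columns scaled by `a_eigs`; the largest entry of both eigenvalue arrays is one because $\lambda_0+\gamma_0+\rho_0=1$.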

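Proposition B.3 Denote by $d_{n,i}$'s the eigenvalues of $A_n$. Under Assumption 1 for $W_n$, $|\lambda_0|<1$ and $\lambda_0+\gamma_0+\rho_0=1$, (1) if $\rho_0+\gamma_0\lambda_0>0$ and $\frac{\gamma_0-\rho_0}{1+\lambda_0}>-1$, we have $d_{n,\max}=1$ and $d_{n,\min}>-1$; (2) when $\lambda_0+\gamma_0+\rho_0=1$, "$\rho_0<1$, $|\gamma_0|<1$ and $|\lambda_0|<1$" implies "$\rho_0+\gamma_0\lambda_0>0$ and $\frac{\gamma_0-\rho_0}{1+\lambda_0}>-1$".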

Proposition B.4 (1) Suppose that $|\lambda_0|<1$ and $\gamma_0\neq 1$, then the unit eigenvalues of $W_n$ correspond to unit eigenvalues of $A_n$ via the relation $\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$, if and only if $\lambda_0+\gamma_0+\rho_0=1$.

(2) $A_nR_nJ_nR_n^{-1}=R_nJ_nR_n^{-1}A_n=R_nJ_nR_n^{-1}$.

(3) Assuming that the unit eigenvalues of $W_n$ correspond to unit eigenvalues of $A_n$, then,

(3i) $W_nR_nJ_nR_n^{-1}=R_nJ_nR_n^{-1}$;

(3ii) $S_n^{-1}R_nJ_nR_n^{-1}=R_nJ_nR_n^{-1}S_n^{-1}=\frac{1}{1-\lambda_0}R_nJ_nR_n^{-1}$.
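These identities can be checked on the group example of Appendix B.1, where $R_nJ_nR_n^{-1}$ is the block diagonal matrix with blocks $l_{n_j}l_{n_j}'/n_j$. The sketch below is a numerical illustration only; the group sizes (3, 4), the parameter values from $\theta_0^a$, and the helper names are assumptions made for the example.

```python
import numpy as np

def group_weights(sizes):
    """Block diagonal W_n with blocks (l l' - I) / (n_j - 1)."""
    n = sum(sizes)
    W = np.zeros((n, n))
    start = 0
    for n_j in sizes:
        block = (np.ones((n_j, n_j)) - np.eye(n_j)) / (n_j - 1)
        W[start:start + n_j, start:start + n_j] = block
        start += n_j
    return W

def unit_root_projector(sizes):
    """Block diagonal M_n = R_n J_n R_n^{-1} with blocks l l' / n_j."""
    n = sum(sizes)
    M = np.zeros((n, n))
    start = 0
    for n_j in sizes:
        M[start:start + n_j, start:start + n_j] = 1.0 / n_j
        start += n_j
    return M

sizes = (3, 4)
gamma, rho, lam = 0.4, 0.2, 0.4           # lambda + gamma + rho = 1
W = group_weights(sizes)
M = unit_root_projector(sizes)
n = W.shape[0]
S_inv = np.linalg.inv(np.eye(n) - lam * W)
A = S_inv @ (gamma * np.eye(n) + rho * W)
B = A - M                                  # stationary part B_n = A_n - M_n
```

Under the proposition, `A @ M`, `M @ A` and `W @ M` all reproduce `M`, `S_inv @ M` equals `M / (1 - lam)`, and the power decomposition $A_n^h=R_nJ_nR_n^{-1}+B_n^h$ used in the proof of Proposition B.5 holds for any positive integer $h$.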

¹⁰ See also sufficient conditions on parameters in Proposition B.3, which guarantee Assumption 6.


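Proposition B.5 Under Assumptions 5 and 6, for $Y_{nt}$ in (2.1), $Y_{nt}=Y^u_{nt}+Y^s_{nt}$ where

$$Y^u_{nt}=(R_nJ_nR_n^{-1})\left(Y_{n,-1}+c_{n0}\frac{t}{1-\lambda_0}+\frac{\sum_{h=0}^{t-1}X_{nh}\beta_0}{1-\lambda_0}+\frac{\sum_{h=0}^{t-1}V_{nh}}{1-\lambda_0}\right), \tag{B.1}$$

$$Y^s_{nt}=\sum_{h=0}^{\infty}B_n^hS_n^{-1}c_{n0}+\sum_{h=0}^{\infty}B_n^hS_n^{-1}X_{n,t-h}\beta_0+\sum_{h=0}^{\infty}B_n^hS_n^{-1}V_{n,t-h}. \tag{B.2}$$

Furthermore,

$$\tilde{\tilde Y}^u_{n,t-1}=\frac{R_nJ_nR_n^{-1}}{1-\lambda_0}\left(c_{n0}\left[(t-1)-\frac{T-1}{2}\right]+\sum_{h=0}^{t-2}(X_{nh}\beta_0+V_{nh})-\frac{1}{T}\sum_{h=0}^{T-2}(T-1-h)(X_{nh}\beta_0+V_{nh})\right),$$

$$\tilde{\tilde Y}^s_{n,t-1}=S_n^{-1}\sum_{h=0}^{\infty}B_n^h\left[\left(X_{n,t-1-h}-\frac{1}{T}\sum_{t=0}^{T-1}X_{n,t-h}\right)\beta_0+\left(V_{n,t-1-h}-\frac{1}{T}\sum_{t=0}^{T-1}V_{n,t-h}\right)\right].$$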

Proposition B.6 Under Assumptions 5 and 6, for the nonstationary part $Y^u_{n,t-1}$ of $Y_{n,t-1}$,

$$W_nY^u_{n,t-1}=Y^u_{n,t-1},\qquad G_nY^u_{n,t-1}=\frac{1}{1-\lambda_0}Y^u_{n,t-1}. \tag{B.3}$$

Also, for the nonstationary part $Z^u_{nt}$ of $Z_{nt}$, with $c=(1,1,0_{1\times k_x})'$, we have

$$Z^u_{nt}=(Y^u_{n,t-1},W_nY^u_{n,t-1},0_{1\times k_x})=Y^u_{n,t-1}\cdot c',\qquad G_nZ^u_{nt}\delta_0=Y^u_{n,t-1}. \tag{B.4}$$

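Denote $\xi_{nt}=\sum_{h=0}^{t-1}V_{nh}$, $\mathbb{X}_{nt}=\sum_{h=0}^{t-1}X_{nh}$, $\mathbb{U}_{nt}=\sum_{h=1}^{\infty}P_{nt,h}V_{n,t+1-h}$ and $\mathbb{W}_{nt}=\sum_{h=1}^{\infty}Q_{nt,h}V_{n,t+1-h}$ where $P_{nt,h}$ and $Q_{nt,h}$ are $n\times n$ nonstochastic matrices and the row and column sums of $\sum_{h=1}^{\infty}\mathrm{abs}(P_{nt,h})$ and $\sum_{h=1}^{\infty}\mathrm{abs}(Q_{nt,h})$ are bounded uniformly in $n$ and $t$. Also, the $n\times 1$ vectors $D_{nt}$ are nonstochastic and bounded, uniformly in $n$ and $t$. We note that $\bar\xi_{nT}=\frac{1}{T}\sum_{t=1}^{T}\xi_{nt}=\sum_{h=1}^{T}\frac{h}{T}V_{n,T-h}$. Also, as $\bar{\mathbb{U}}_{nT}=\left(\sum_{t=1}^{T}\mathbb{U}_{nt}\right)/T$, we have $\bar{\mathbb{U}}_{nT}=\sum_{h=1}^{\infty}\bar P_{nT,h}V_{n,T+1-h}$, where

$$\bar P_{nT,h}=\begin{cases}\frac{1}{T}(P_{nT,1}+P_{nT,2}+\cdots+P_{nT,h})=\frac{1}{T}\sum_{g=1}^{h}P_{n,T-h+g,g}&\text{for }h\leq T\\[4pt]\frac{1}{T}\sum_{g=h-T+1}^{h}P_{n,T-h+g,g}&\text{for }h>T.\end{cases} \tag{B.5}$$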

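Lemma B.7 Under Assumption 2, for $t\geq s$,

$$E(\mathbb{U}_{nt}\mathbb{W}_{ns}')=\sigma_0^2\left(\sum_{h=1}^{\infty}P_{nt,t-s+h}Q_{ns,h}'\right),\qquad E(\mathbb{U}_{nt}'\mathbb{W}_{ns})=\sigma_0^2\,\mathrm{tr}\left(\sum_{h=1}^{\infty}P_{nt,t-s+h}'Q_{ns,h}\right), \tag{B.6}$$

$$\mathrm{Cov}(\mathbb{U}_{nt}'\mathbb{W}_{nt},\mathbb{U}_{ns}'\mathbb{W}_{ns})=(\mu_4-3\sigma_0^4)\sum_{h=1}^{\infty}\sum_{i=1}^{n}(P_{nt,t-s+h}'Q_{nt,t-s+h})_{ii}(P_{ns,h}'Q_{ns,h})_{ii}$$

$$\qquad+\ \sigma_0^4\,\mathrm{tr}\left[\left(\sum_{h=1}^{\infty}P_{ns,h}P_{nt,t-s+h}'\right)\left(\sum_{h=1}^{\infty}Q_{nt,t-s+h}Q_{ns,h}'\right)+\left(\sum_{h=1}^{\infty}Q_{ns,h}P_{nt,t-s+h}'\right)\left(\sum_{h=1}^{\infty}Q_{nt,t-s+h}P_{ns,h}'\right)\right].$$

Lemma B.8¹¹ Denote $B_n$ an $n\times n$ nonstochastic matrix which is row sum and column sum bounded uniformly in $n$. Under Assumptions 1-8, for $\xi_{nt}$, $c_{n0}$, $\mathbb{X}_{nt}$, $\mathbb{U}_{nt}$, $D_{nt}$ and their cross products, we have

$$\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'B_n(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)=O(T^2); \tag{B.7}$$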

¹¹ For $M_n=R_nJ_nR_n^{-1}=A_n-B_n$, as $A_n=S_n^{-1}(\gamma_0I_n+\rho_0W_n)$ is row sum and column sum bounded, and $B_n$ is also row sum and column sum bounded as implied by Assumption 8, $M_n$ is also row sum and column sum bounded. Hence, we can replace $B_n$ with $M_n$ or $M_n'M_n$ to apply the following lemmas.


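$$\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'B_n\tilde\xi_{nt}=O_p\!\left(\sqrt{\frac{T^3}{n}}\right)\text{ with zero mean}; \tag{B.8}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'B_n\tilde\xi_{nt}-E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'B_n\tilde\xi_{nt}\right)=O_p\!\left(\frac{T}{\sqrt{n}}\right) \tag{B.9}$$

where $E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'B_n\tilde\xi_{nt}\right)=O(T)$;

$$\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'B_nD_{nt}=O(T); \tag{B.10}$$

$$\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'\tilde{\mathbb{U}}_{nt}=O_p\!\left(\sqrt{\frac{T}{n}}\right)\text{ with mean zero}; \tag{B.11}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'B_nD_{nt}=O_p\!\left(\sqrt{\frac{T}{n}}\right)\text{ with mean zero}; \tag{B.12}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'\tilde{\mathbb{U}}_{nt}-E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'\tilde{\mathbb{U}}_{nt}\right)=O_p\!\left(\sqrt{\frac{T}{n}}\right), \tag{B.13}$$

where $E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde\xi_{nt}'\tilde{\mathbb{U}}_{nt}\right)=O(1)$ and

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\mathbb{U}}_{nt}'\tilde{\mathbb{W}}_{nt}-E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde{\mathbb{U}}_{nt}'\tilde{\mathbb{W}}_{nt}\right)=O_p\!\left(\frac{1}{\sqrt{nT}}\right), \tag{B.14}$$

where $E\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde{\mathbb{U}}_{nt}'\tilde{\mathbb{W}}_{nt}\right)=O(1)$.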

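Lemma B.9 Denote $B_n$ an $n\times n$ nonstochastic matrix which is row sum and column sum bounded uniformly in $n$. Under Assumptions 1-8, for $c_{n0}$, $\mathbb{X}_{nt}$, $\xi_{nt}$, $V_{nt}$, $\mathbb{U}_{nt}$ and their cross products,

$$\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t_{-1}+\tilde{\mathbb{X}}_{n,t-1}\beta_0)'B_nV_{nt}=O_p\!\left(\sqrt{\frac{T}{n}}\right); \tag{B.15}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\xi_{n,t-1}'B_nV_{nt}=O_p\!\left(\frac{1}{\sqrt{n}}\right); \tag{B.16}$$

$$\frac{1}{n}\bar\xi_{n,T-1}'B_n\bar V_{nT}-E\,\frac{1}{n}\bar\xi_{n,T-1}'B_n\bar V_{nT}=O_p\!\left(\frac{1}{\sqrt{n}}\right), \tag{B.17}$$

where $E\,\frac{1}{n}\bar\xi_{n,T-1}'B_n\bar V_{nT}=O(1)$ and, for $B_n=M_n'$, $E\,\frac{1}{n}(M_n\bar\xi_{n,T-1})'\bar V_{nT}=\frac{\sigma_0^2(T-1)(T-2)m_n}{2T^2n}=O\!\left(\frac{m_n}{n}\right)$;

$$\frac{1}{n}\bar{\mathbb{U}}_{n,T-1}'\bar V_{nT}-E\,\frac{1}{n}\bar{\mathbb{U}}_{n,T-1}'\bar V_{nT}=O_p\!\left(\frac{1}{\sqrt{n}}\right), \tag{B.18}$$

where $E\,\frac{1}{n}\bar{\mathbb{U}}_{n,T-1}'\bar V_{nT}=O\!\left(\frac{1}{T}\right)$;

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde V_{nt}'B_n\tilde V_{nt}=\left(1-\frac{1}{T}\right)\sigma_0^2\,\frac{1}{n}\mathrm{tr}(B_n)+O_p\!\left(\frac{1}{\sqrt{nT}}\right). \tag{B.19}$$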


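Lemma B.10 Denote $B_n$ an $n\times n$ nonstochastic matrix which is row sum and column sum bounded uniformly in $n$. Under Assumptions 1-8,

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{u}_{n,t-1}-E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{u}_{n,t-1}=O_p\!\left(T\cdot\sqrt{\frac{T}{n}}\right), \tag{B.20}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}-E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}=O_p\!\left(\sqrt{\frac{T}{n}}\right), \tag{B.21}$$

and

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}-E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}=O_p\!\left(\frac{1}{\sqrt{nT}}\right), \tag{B.22}$$

where $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{u}_{n,t-1}=O(T^2)$, $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}=O(T)$ and $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_n\tilde{\tilde Y}^{s}_{n,t-1}=O(1)$.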

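Lemma B.11 Under Assumptions 1-8, with $B_n$ an $n\times n$ nonstochastic matrix which is row sum and column sum bounded uniformly in $n$,

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_nV_{nt}-E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_nV_{nt}=O_p\!\left(\sqrt{\frac{T}{n}}\right), \tag{B.23}$$

$$\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_nV_{nt}-E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_nV_{nt}=O_p\!\left(\frac{1}{\sqrt{nT}}\right), \tag{B.24}$$

where $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}B_nV_{nt}=O(1)$ and $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{s\prime}_{n,t-1}B_nV_{nt}=O\!\left(\frac{1}{T}\right)$. For the special case with $B_n=I_n$, we have $E\,\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}^{u\prime}_{n,t-1}V_{nt}=\frac{\sigma_0^2(T-1)(T-2)m_n}{2T^2n}\cdot\frac{1}{1-\lambda_0}=O\!\left(\frac{m_n}{n}\right)$ where $m_n$ is the number of unit roots.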

Proposition B.12 Consider the $m\times m$ square matrix $H_T=I_m+T(g_Td_T'+h_Tb_T')$, where $g_T$, $h_T$, $b_T$, and $d_T$ are all $m$-dimensional column vectors. Then, under the assumption $\Delta_T\neq 0$,

$$H_T^{-1}=I_m-\frac{T}{\Delta_T}B_T, \tag{B.25}$$

where

$$\Delta_T=1+T(b_T'h_T+d_T'g_T)-T^2\,\mathrm{Det}\left((b_T,d_T)'(g_T,h_T)\right), \tag{B.26}$$

and

$$B_T=(h_Tb_T'+g_Td_T')-T\left[(d_T'h_T)g_Tb_T'+(b_T'g_T)h_Td_T'-(d_T'g_T)h_Tb_T'-(b_T'h_T)g_Td_T'\right]. \tag{B.27}$$
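The inverse formula (B.25)-(B.27) is a rank-two update identity and is easy to check numerically. The sketch below implements it literally and compares it with a direct inverse for one arbitrary set of vectors; the function name and the test vectors are illustrative assumptions.

```python
import numpy as np

def inverse_rank_two_update(g, h, b, d, T):
    """Inverse of H_T = I + T (g d' + h b') via (B.25)-(B.27)."""
    m = g.size
    left = np.column_stack([b, d])         # (b_T, d_T)
    right = np.column_stack([g, h])        # (g_T, h_T)
    delta = 1 + T * (b @ h + d @ g) - T**2 * np.linalg.det(left.T @ right)
    B = (np.outer(h, b) + np.outer(g, d)) - T * (
        (d @ h) * np.outer(g, b)
        + (b @ g) * np.outer(h, d)
        - (d @ g) * np.outer(h, b)
        - (b @ h) * np.outer(g, d)
    )
    return np.eye(m) - (T / delta) * B

g = np.array([1.0, 0.0, 2.0])
h = np.array([0.0, 1.0, 1.0])
b = np.array([1.0, 1.0, 0.0])
d = np.array([0.5, -1.0, 1.0])
T = 3
H = np.eye(3) + T * (np.outer(g, d) + np.outer(h, b))
H_inv = inverse_rank_two_update(g, h, b, d, T)
```

The product `H @ H_inv` should be the identity up to rounding error whenever $\Delta_T\neq 0$; for the vectors above $\Delta_T=34$.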

Proposition B.13 Consider the $m\times m$ stochastic matrix $K_T$,

$$K_T=T^2c_Tc_T'+T(b_Td_T'+d_Tb_T')+A_T,$$

where $c_T$, $b_T$ and $d_T$ are $m$-dimensional column random vectors with $c_T$ proportional to $b_T$ such that $c_T=\omega_T\cdot b_T$, where $\omega_T$ is a nonzero random variable with probability one. Suppose that, as $T\to\infty$, $c_T$, $b_T$, $d_T$, and $\omega_T$ converge in probability, respectively, to finite limits $c$, $b$, $d$ and $\omega$ where $c$ and $\omega$ are nonstochastic and nonzero. Assume that $A_T$ is positive definite for large enough $T$ with probability one and its limit $A$ exists and is also a positive definite matrix. Then, under the condition that $\Delta=1-\frac{1}{\omega^2}\left[d'A^{-1}d-\frac{(d'A^{-1}b)^2}{b'A^{-1}b}\right]\neq 0$,

(a) the limit of $K_T^{-1}$ is a finite matrix $L_k$, where

$$L_k=\left(A^{-1}-\frac{1}{b'A^{-1}b}A^{-1}bb'A^{-1}\right)+\frac{1}{\omega^2\Delta}A^{-1}\left(d-\frac{d'A^{-1}b}{b'A^{-1}b}b\right)\left(d-\frac{d'A^{-1}b}{b'A^{-1}b}b\right)'A^{-1};$$

(b) $K_T^{-1}\cdot c_T=O_p(T^{-1})$;

(c) $c_T'K_T^{-1}c_T=O_p(T^{-2})$ and $T^2c_T'K_T^{-1}c_T=1+O_p(T^{-1})$.

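Proposition B.14 Assuming $\mathrm{plim}_{T\to\infty}H^s_{nT}$ is positive definite, we have

$$H^{-1}_{1,nT}c=O_p(T^{-1}), \tag{B.28}$$

$$c'H^{-1}_{1,nT}c=O_p(T^{-2}), \tag{B.29}$$

$$H^{-1}_{1,nT}H_{2,nT}=O_p(1), \tag{B.30}$$

$$1-c'H^{-1}_{1,nT}H_{2,nT}=O_p(T^{-1}), \tag{B.31}$$

$$\left(H_{3,nT}-H_{2,nT}'H^{-1}_{1,nT}H_{2,nT}\right)^{-1}\text{ exists and is }O_p(1), \tag{B.32}$$

$$\mathrm{plim}_{T\to\infty}\left(H_{3,nT}-H_{2,nT}'H^{-1}_{1,nT}H_{2,nT}\right)^{-1}\neq 0, \tag{B.33}$$

$$\left(H_{3,nT}-H_{2,nT}'H^{-1}_{1,nT}H_{2,nT}\right)-\left(EH_{3,nT}-EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right)\xrightarrow{p}0, \tag{B.34}$$

$$\mathrm{plim}_{T\to\infty}\left(H_{3,nT}-H_{2,nT}'H^{-1}_{1,nT}H_{2,nT}\right)=\lim_{T\to\infty}\left(EH_{3,nT}-EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right). \tag{B.35}$$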

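Proposition B.15 For the QML estimator $\hat\theta_{nT}$ in Theorem 4.1, define

$$\hat\Sigma_{\hat\theta_{nT},nT}=\frac{1}{\hat\sigma^2_{nT}}\begin{pmatrix}H_{nT}(\hat\theta_{nT})&0\\0&0\end{pmatrix}+\begin{pmatrix}0&0&0\\0&\frac{1}{n}\left[\mathrm{tr}(G_n'(\hat\lambda_{nT})G_n(\hat\lambda_{nT}))+\mathrm{tr}(G_n^2(\hat\lambda_{nT}))\right]&\frac{1}{\hat\sigma^2_{nT}n}\mathrm{tr}(G_n(\hat\lambda_{nT}))\\0&\frac{1}{\hat\sigma^2_{nT}n}\mathrm{tr}(G_n(\hat\lambda_{nT}))&\frac{1}{2\hat\sigma^4_{nT}}\end{pmatrix}, \tag{B.36}$$

where $H_{nT}(\hat\theta_{nT})$ is $H_{nT}(\theta)$ (see (B.47)) evaluated at $\hat\theta_{nT}$, then,

$$\hat\Sigma^{-1}_{\hat\theta_{nT},nT}-\Sigma^{-1}_{\theta_0,nT}=O_p\!\left(\max\!\left(\frac{1}{\sqrt{nT}},\frac{1}{T}\right)\right), \tag{B.37}$$

$$T\cdot\left[\hat\Sigma^{-1}_{\hat\theta_{nT},nT}-\Sigma^{-1}_{\theta_0,nT}\right](c_\theta',0)'=O_p\!\left(\max\!\left(\frac{1}{\sqrt{nT}},\frac{1}{T}\right)\right). \tag{B.38}$$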


B.3 Proofs for Basic Lemmas

Proof for Proposition B.1: The first part is known in Ord (1975). To show that $W_n$ is diagonalizable, note that as $W_n=\Lambda_n^{-1}C_n$, it implies that $\Lambda_n^{1/2}W_n\Lambda_n^{-1/2}=\Lambda_n^{-1/2}C_n\Lambda_n^{-1/2}$, which is a symmetric matrix. Let $D_n^*$ be the eigenvalue matrix of $\Lambda_n^{-1/2}C_n\Lambda_n^{-1/2}$, which is real; and let $R_n^*$ be the corresponding orthonormal matrix such that $\Lambda_n^{-1/2}C_n\Lambda_n^{-1/2}=R_n^*D_n^*R_n^{*\prime}$. Hence, $W_n=\Lambda_n^{-1/2}(R_n^*D_n^*R_n^{*\prime})\Lambda_n^{1/2}=R_nD_n^*R_n^{-1}$ where $R_n=\Lambda_n^{-1/2}R_n^*$ is an eigenvector matrix of $W_n$, and $D_n^*$ is the eigenvalue matrix for $W_n$. ∎

Proof for Proposition B.2: Because $W_n=R_nD_n^*R_n^{-1}$ from Proposition B.1, it follows that

$$\begin{aligned}A_n&=(I_n-\lambda_0W_n)^{-1}(\gamma_0I_n+\rho_0W_n)\\&=(I_n-\lambda_0R_nD_n^*R_n^{-1})^{-1}(\gamma_0I_n+\rho_0R_nD_n^*R_n^{-1})\\&=R_n(I_n-\lambda_0D_n^*)^{-1}R_n^{-1}\cdot R_n(\gamma_0I_n+\rho_0D_n^*)R_n^{-1}\\&=R_n(I_n-\lambda_0D_n^*)^{-1}(\gamma_0I_n+\rho_0D_n^*)R_n^{-1}.\end{aligned}$$

Note that $D_n=(I_n-\lambda_0D_n^*)^{-1}(\gamma_0I_n+\rho_0D_n^*)$ is a diagonal real matrix because $D_n^*$ is diagonal and real, and $(I_n-\lambda_0D_n^*)$ is invertible because $(I_n-\lambda_0W_n)$ is assumed to be invertible to begin with. ∎

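Proof for Proposition B.3: For (1): The eigenvalue of $A_n$ has the formula $\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$ where $\varpi_{ni}$ is an eigenvalue of $W_n$ with $|\varpi_{ni}|\leq 1$ for all $i$ and $\varpi_{ni}=1$ for some $i$ (see Ord (1975)). Because $\frac{\partial}{\partial\varpi_{ni}}\left(\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}\right)=\frac{\rho_0+\gamma_0\lambda_0}{(1-\lambda_0\varpi_{ni})^2}$ and $|\varpi_{ni}|\leq 1$ for all $i$, $\rho_0+\gamma_0\lambda_0>0$ will imply that $\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$ is an increasing function of $\varpi_{ni}$. As $\varpi_{n,\max}=1$, the maximum value of $\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$ will be achieved at $\varpi_{ni}=1$; additionally, as $\varpi_{n,\min}\geq -1$, $\frac{\gamma_0-\rho_0}{1+\lambda_0}>-1$ will assure that the minimum value of $\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$ will be greater than $-1$. For (2): If $\lambda_0+\gamma_0+\rho_0=1$, then $\rho_0+\gamma_0\lambda_0>0$ is equivalent to $(1-\gamma_0)(1-\lambda_0)>0$; also, $\frac{\gamma_0-\rho_0}{1+\lambda_0}>-1$ if $\gamma_0+\lambda_0>0$ and $\lambda_0>-1$. Under $\lambda_0+\gamma_0+\rho_0=1$, $\gamma_0+\lambda_0>0$ is equivalent to $\rho_0<1$. The conclusion in (2) follows. ∎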

Proof for Proposition B.4: (1) An eigenvalue $d_{ni}$ of $A_n$ has the form $d_{ni}=\frac{\gamma_0+\rho_0\varpi_{ni}}{1-\lambda_0\varpi_{ni}}$ for some eigenvalue $\varpi_{ni}$ of $W_n$. Thus, $d_{ni}=1$ is equivalent to $\gamma_0+\rho_0\varpi_{ni}=1-\lambda_0\varpi_{ni}$ when $|\lambda_0|<1$. It is apparent that $\varpi_{ni}=1$ is equivalent to $d_{ni}=1$ when $\lambda_0+\gamma_0+\rho_0=1$ and $\gamma_0\neq 1$. That $\lambda_0+\gamma_0+\rho_0=1$ is a necessary condition is trivial.

(2) Because $A_n=R_nD_nR_n^{-1}$ and $D_nJ_n=(J_n+\tilde D_n)J_n=J_n$, we have $A_nR_nJ_nR_n^{-1}=R_nD_nJ_nR_n^{-1}=R_nJ_nR_n^{-1}$. Note that because $J_n$ and $D_n$ are diagonal matrices, $R_nJ_nR_n^{-1}A_n=R_nJ_nD_nR_n^{-1}=R_nD_nJ_nR_n^{-1}=A_nR_nJ_nR_n^{-1}$.

(3) From Proposition B.1, $W_n=R_nD_n^*R_n^{-1}$. Hence, $W_nR_nJ_nR_n^{-1}=R_nD_n^*J_nR_n^{-1}=R_nJ_nR_n^{-1}$ as $D_n^*J_n=J_n$ when the unit eigenvalues of $W_n$ correspond to unit eigenvalues of $A_n$. As $S_n^{-1}=R_n(I_n-\lambda_0D_n^*)^{-1}R_n^{-1}$, we have $S_n^{-1}R_nJ_n=R_n(I_n-\lambda_0D_n^*)^{-1}J_n=\frac{1}{1-\lambda_0}R_nJ_n$ because $(I_n-\lambda_0D_n^*)^{-1}J_n=\frac{1}{1-\lambda_0}J_n$. It follows that $S_n^{-1}R_nJ_nR_n^{-1}=\frac{1}{1-\lambda_0}R_nJ_nR_n^{-1}$. Furthermore, $R_nJ_nR_n^{-1}S_n^{-1}=R_nJ_n(I_n-\lambda_0D_n^*)^{-1}R_n^{-1}=R_n(I_n-\lambda_0D_n^*)^{-1}J_nR_n^{-1}=S_n^{-1}R_nJ_nR_n^{-1}$. ∎

Proof for Proposition B.5: Suppose that the number of unit roots of $A_n$ is $m_n$, then $D_n=J_n+\tilde D_n$ where $J_n=\mathrm{Diag}\{1_{m_n}',0,\cdots,0\}$ and $\tilde D_n=\mathrm{Diag}\{0,\cdots,0,d_{n,m_n+1},\cdots,d_{nn}\}$ with $|d_{nj}|<1$ for all $j=m_n+1,\cdots,n$. As $J_n$ is idempotent and $J_n\cdot\tilde D_n=0$, we have $A_n^h=R_nJ_nR_n^{-1}+B_n^h$ where $B_n^h=R_n\tilde D_n^hR_n^{-1}$ for any $h=1,2,3,\cdots$.

Because $Y_{nt}=A_nY_{n,t-1}+S_n^{-1}(X_{nt}\beta_0+c_{n0}+V_{nt})$, we can decompose $Y_{nt}$ as $Y_{nt}=Y^u_{nt}+Y^s_{nt}$ where $Y^u_{nt}=R_nJ_nR_n^{-1}Y_{n,t-1}$ and $Y^s_{nt}=B_nY_{n,t-1}+S_n^{-1}(X_{nt}\beta_0+c_{n0}+V_{nt})$. By using $B_nA_n=B_n^2$ and $B_nS_n^{-1}=S_n^{-1}B_n$, $Y^s_{nt}$ can be written as an infinite sum of the past by recursive induction for any integer $t$:

$$\begin{aligned}Y^s_{nt}&=B_nY_{n,t-1}+S_n^{-1}(X_{nt}\beta_0+c_{n0}+V_{nt})\\&=B_n[A_nY_{n,t-2}+S_n^{-1}(X_{n,t-1}\beta_0+c_{n0}+V_{n,t-1})]+S_n^{-1}(X_{nt}\beta_0+c_{n0}+V_{nt})\\&=S_n^{-1}\Big(\sum_{h=0}^{\infty}B_n^h\Big)c_{n0}+S_n^{-1}\sum_{h=0}^{\infty}B_n^h(X_{n,t-h}\beta_0+V_{n,t-h}).\end{aligned}$$

For $Y^u_{nt}$, there are two versions which will be useful. By using $R_nJ_nR_n^{-1}A_n=R_nJ_nR_n^{-1}$ and $R_nJ_nR_n^{-1}S_n^{-1}=S_n^{-1}R_nJ_nR_n^{-1}$,

$$\begin{aligned}Y^u_{nt}&=R_nJ_nR_n^{-1}Y_{n,t-1}\\&=R_nJ_nR_n^{-1}[A_nY_{n,t-2}+S_n^{-1}(X_{n,t-1}\beta_0+c_{n0}+V_{n,t-1})]\\&=R_nJ_nR_n^{-1}Y_{n,t-2}+S_n^{-1}R_nJ_nR_n^{-1}(X_{n,t-1}\beta_0+c_{n0}+V_{n,t-1})\\&=R_nJ_nR_n^{-1}Y_{n,0}+(t-1)S_n^{-1}R_nJ_nR_n^{-1}c_{n0}+S_n^{-1}R_nJ_nR_n^{-1}\sum_{h=1}^{t-1}(X_{nh}\beta_0+V_{nh}),\end{aligned}$$

for $t=1,2,\cdots$, where $\sum_{h=1}^{0}$ is zero by convention. Another version is to expand $Y^u_{nt}$ to $Y_{n,-1}$ as

$$Y^u_{nt}=R_nJ_nR_n^{-1}Y_{n,-1}+tS_n^{-1}R_nJ_nR_n^{-1}c_{n0}+S_n^{-1}R_nJ_nR_n^{-1}\sum_{h=0}^{t-1}(X_{nh}\beta_0+V_{nh}), \tag{B.39}$$

for $t=0,1,2,\cdots$.

Using $\frac{1}{T}\sum_{t=1}^{T}(t-1)=\frac{1}{T}\sum_{t=1}^{T-1}t=\frac{T-1}{2}$, $\frac{1}{T}\sum_{t=1}^{T-1}\sum_{h=0}^{t-1}z_h=\frac{1}{T}\sum_{t=1}^{T-1}(T-t)z_{t-1}$ and $\frac{1}{T}\sum_{t=2}^{T}\sum_{h=1}^{t-1}z_h=\frac{1}{T}\sum_{t=1}^{T-1}(T-t)z_t$, it follows that

$$\begin{aligned}\bar Y^u_{nT}&=\frac{1}{T}\sum_{t=1}^{T}Y^u_{nt}\\&=R_nJ_nR_n^{-1}Y_{n0}+S_n^{-1}R_nJ_nR_n^{-1}c_{n0}\frac{1}{T}\sum_{t=1}^{T}(t-1)+S_n^{-1}R_nJ_nR_n^{-1}\frac{1}{T}\sum_{t=2}^{T}\sum_{h=1}^{t-1}(X_{nh}\beta_0+V_{nh})\\&=R_nJ_nR_n^{-1}Y_{n0}+S_n^{-1}R_nJ_nR_n^{-1}c_{n0}\Big(\frac{T-1}{2}\Big)+S_n^{-1}R_nJ_nR_n^{-1}\frac{1}{T}\sum_{t=1}^{T-1}(T-t)(X_{nt}\beta_0+V_{nt}),\end{aligned}$$

and

$$\begin{aligned}\bar Y^u_{nT,-1}&=\frac{1}{T}\sum_{t=0}^{T-1}Y^u_{nt}\\&=R_nJ_nR_n^{-1}Y_{n,-1}+S_n^{-1}R_nJ_nR_n^{-1}c_{n0}\frac{1}{T}\sum_{t=1}^{T-1}t+S_n^{-1}R_nJ_nR_n^{-1}\frac{1}{T}\sum_{t=1}^{T-1}\sum_{h=0}^{t-1}(X_{nh}\beta_0+V_{nh})\\&=R_nJ_nR_n^{-1}Y_{n,-1}+S_n^{-1}R_nJ_nR_n^{-1}c_{n0}\Big(\frac{T-1}{2}\Big)+S_n^{-1}R_nJ_nR_n^{-1}\frac{1}{T}\sum_{t=0}^{T-2}(T-1-t)(X_{nt}\beta_0+V_{nt}).\end{aligned}$$

Hence,

$$\begin{aligned}\tilde{\tilde Y}^u_{n,t-1}&=Y^u_{n,t-1}-\bar Y^u_{nT,-1}\\&=S_n^{-1}R_nJ_nR_n^{-1}\Big\{c_{n0}\Big[(t-1)-\frac{T-1}{2}\Big]+\sum_{h=0}^{t-2}(X_{nh}\beta_0+V_{nh})-\frac{1}{T}\sum_{h=0}^{T-2}(T-1-h)(X_{nh}\beta_0+V_{nh})\Big\}\\&=\frac{R_nJ_nR_n^{-1}}{1-\lambda_0}\Big\{c_{n0}\Big[(t-1)-\frac{T-1}{2}\Big]+\sum_{h=0}^{t-2}(X_{nh}\beta_0+V_{nh})-\frac{1}{T}\sum_{h=0}^{T-2}(T-1-h)(X_{nh}\beta_0+V_{nh})\Big\}\end{aligned}$$

because $S_n^{-1}R_nJ_nR_n^{-1}=\frac{1}{1-\lambda_0}R_nJ_nR_n^{-1}$ from Proposition B.4 (3ii). For the stationary component,

$$\bar Y^s_{nT,-1}=\frac{1}{T}\sum_{t=0}^{T-1}Y^s_{nt}=S_n^{-1}\Big(\sum_{h=0}^{\infty}B_n^h\Big)c_{n0}+S_n^{-1}\sum_{h=0}^{\infty}B_n^h\frac{1}{T}\sum_{t=0}^{T-1}(X_{n,t-h}\beta_0+V_{n,t-h}),$$

and

$$\tilde{\tilde Y}^s_{n,t-1}=Y^s_{n,t-1}-\bar Y^s_{nT,-1}=S_n^{-1}\sum_{h=0}^{\infty}B_n^h\Big[\Big(X_{n,t-1-h}-\frac{1}{T}\sum_{t=0}^{T-1}X_{n,t-h}\Big)\beta_0+\Big(V_{n,t-1-h}-\frac{1}{T}\sum_{t=0}^{T-1}V_{n,t-h}\Big)\Big].\ ∎$$

Proof for Proposition B.6: We use the result of Proposition B.4 to prove the result here. The conditions there are satisfied under Assumptions 5 and 6. That $W_nY^u_{n,t-1}=Y^u_{n,t-1}$ follows from (B.1) of Proposition B.5 using $W_nR_nJ_nR_n^{-1}=R_nJ_nR_n^{-1}$ from Proposition B.4. For $G_nY^u_{n,t-1}=\frac{1}{1-\lambda_0}Y^u_{n,t-1}$, this is so because (1) $S_n^{-1}Y^u_{n,t-1}=\frac{1}{1-\lambda_0}Y^u_{n,t-1}$ using $S_n^{-1}R_nJ_nR_n^{-1}=\frac{1}{1-\lambda_0}R_nJ_nR_n^{-1}$ and (2) $G_n=W_nS_n^{-1}=S_n^{-1}W_n$. Also, as $Z^u_{nt}=(Y^u_{n,t-1},W_nY^u_{n,t-1},0_{1\times k_x})=Y^u_{n,t-1}\cdot c'$, we have $G_nZ^u_{nt}\delta_0=Y^u_{n,t-1}$. This follows because $G_nZ^u_{nt}\delta_0=G_nY^u_{n,t-1}(\gamma_0+\rho_0)$ and $\gamma_0+\rho_0=1-\lambda_0$. ∎

Proof for Lemma B.7: See Lemma A.2 and A.4 in Yu, de Jong and Lee (2006). ∎

Proof for Lemma B.8:

Equation (B.7): Let $\rho(B_n)$ be its spectral radius (the largest eigenvalue in absolute value) and $\|\cdot\|$ be a matrix norm. It is known from matrix theory that $\rho(B_n)\leq\|B_n\|$ (see Horn and Johnson (1985)). Taking $\|\cdot\|$ to be either $\|\cdot\|_\infty$ or $\|\cdot\|_1$, it follows that $\{\|B_n\|\}$ is bounded because $B_n$ is row sum and column sum bounded. With the above settings,

$$\begin{aligned}\left|\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'B_n(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)\right|&\leq\rho(B_n)\cdot\left|\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)\right|\\&\leq\|B_n\|\cdot\left|\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)\right|\\&=\|B_n\|\cdot\frac{1}{nT}\sum_{t=1}^{T}\left(c_{n0}'c_{n0}\tilde t^2+2c_{n0}'\tilde{\mathbb{X}}_{nt}\beta_0\tilde t+(\tilde{\mathbb{X}}_{nt}\beta_0)'(\tilde{\mathbb{X}}_{nt}\beta_0)\right).\end{aligned}$$

Because $\frac{1}{T}\sum_{t=1}^{T}t^2=\frac{1}{6}(T+1)(2T+1)=O(T^2)$, $\left|\tilde{\mathbb{X}}_{nt}\beta_0\right|=\left|\sum_{h=0}^{t-1}\tilde X_{nh}\beta_0\right|\leq t\cdot\sup_{n,t}\left|\tilde X_{nt}\beta_0\right|$, $\tilde X_{nt}$ is bounded uniformly in all $n$ and $t$, and elements of $c_{n0}$ are also uniformly bounded, we have the result that $\left|\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'B_n(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)\right|=O(T^2)$.

Equation (B.8): As E�nt�0ns = �

20minft; sgIn, we have

V ar(1

nT

TXt=1

(cn0~t+ ~Xnt�0)0Bn~�nt) = V ar(1

nT

TXt=1

(cn0~t+ ~Xnt�0)0Bn�nt)

=1

n2T 2

TXt=1

TXs=1

(cn0~t+ ~Xnt�0)0BnE�nt�0nsB0n(cn0~s+ ~Xns�0)

=�20n2T 2

TXt=1

TXs=1

minfs; tg(cn0~t+ ~Xnt�0)0B2n(cn0~s+ ~Xns�0)

� �20n2T 2

TXt=1

t(cn0~t+ ~Xnt�0)

!B2n

TXs=1

�cn0~s+ ~Xns�0

�!

=�20T

3

n

1

n

1

T 3

TXt=1

t(cn0~t+ ~Xnt�0)

!B2n

1

T 2

TXs=1

�cn0~s+ ~Xns�0

�!= O(

T 3

n)

by the uniform boundedness elements of cn0 and Xnt, and the uniform boundedness of B2n in row and columnsums. The result follows.

Equation (B.9): We have 1nT

PTt=1

~�0ntBn~�nt = 1

nT

PTt=1(�

0ntBn�nt)� 1

n��0nTBn��nT .

For the �rst part, E( 1nT

PTt=1(�

0ntBn�nt)) = �20tr(Bn)

�1nT

PTt=1 t

�= O(T ) and

V ar( 1nT

PTt=1 �

0ntBn�nt) = 1

n2T 2

PTt=1

PTs=1 Cov(�

0ntBn�nt; �0nsBn�ns). Using Lemma B.7 for covariance be-

tween U0ntWnt and U0ntWnt (in our case here, Unt =P1

h=1 Pnt;hVn;t+1�h and Wnt =P1

h=1Qnt;hVn;t+1�h,

where Pnt;h = In andQnt;h = Bn for h � t, and Pnt;h = Qnt;h = 0 for h > t.), we have V ar( 1nT

PTt=1 �

0ntBn�nt) =

T 2

n .

For the second part, E( 1n��0nTBn��nT ) = E( 1n

�U0nT �WnT ) where �UnT =P1

h=1�PnT;hVn;T�h and �WnT =P1

h=1�QnT;hVn;T�h with

�PnT;h =

8<: InhT for h � T

0 for h > Tand �QnT;h =

8<: Bn hT for h � T0 for h > T

. (B.40)

Then, using Lemma B.7, E( 1n��0nTBn��nT ) = O(T ) and V ar( 1n

��0nTBn��nT ) = 1

n2Cov(�U0nT �WnT ; �U0nT �WnT ) =

O(T2

n ) becauseP1

h=1�PnT;h �P

0nT;h =

PTh=1(

hT )

2In,P1

h=1�Q0nT;h

�PnT;h =PT

h=1(hT )

2B0n,P1

h=1�QnT;h �Q

0nT;h =PT

h=1(hT )

2BnB0n andPT

h=1 h2 = O(T 3).

Equation (B.10): Because of the uniform boundedness of cn0, ~Dnt and Bn, there exist �nite constants c1


and c2 such that����� 1nTTXt=1

(cn0~t+ ~Xnt�0)0Bn ~Dnt

������ 1

T

TXt=1

�����c0n0Bn ~Dntn

����� � ��~t��+ 1

T

TXt=1

����� (~Xnt�0)0Bn ~Dntn

����� � c1T

TXt=1

j~tj+ c2T

TXt=1

j~tj = O(T ):

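Equation (B.11): $\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'\tilde{\mathbb{U}}_{nt}=\frac{1}{n}\sum_{t=1}^{T}\left(c_{n0}\frac{\tilde t}{T}+\frac{1}{T}\tilde{\mathbb{X}}_{nt}\beta_0\right)'\tilde{\mathbb{U}}_{nt}=T\left[\frac{1}{nT}\sum_{t=1}^{T}\left(c_{n0}\frac{\tilde t}{T}+\frac{1}{T}\tilde{\mathbb{X}}_{nt}\beta_0\right)'\tilde{\mathbb{U}}_{nt}\right]$. As $\frac{\tilde t}{T}$ and $\frac{1}{T}\tilde{\mathbb{X}}_{nt}\beta_0$ are bounded, using Theorem A.8 in Yu, de Jong and Lee (2006), $\frac{1}{nT}\sum_{t=1}^{T}\left(c_{n0}\frac{\tilde t}{T}+\frac{1}{T}\tilde{\mathbb{X}}_{nt}\beta_0\right)'\tilde{\mathbb{U}}_{nt}=O_p\!\left(\frac{1}{\sqrt{nT}}\right)$. Hence, $\frac{1}{nT}\sum_{t=1}^{T}(c_{n0}\tilde t+\tilde{\mathbb{X}}_{nt}\beta_0)'\tilde{\mathbb{U}}_{nt}=O_p\!\left(\sqrt{\frac{T}{n}}\right)$.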

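Equation (B.12): As $E\,\xi_{nt}\xi_{ns}' = \sigma_0^2\min\{t,s\}I_n$, we have

$$\mathrm{Var}\Big(\frac{1}{nT}\sum_{t=1}^{T}\tilde{\xi}_{nt}'B_n\tilde{D}_{nt}\Big) = \mathrm{Var}\Big(\frac{1}{nT}\sum_{t=1}^{T}\xi_{nt}'B_n\tilde{D}_{nt}\Big) = \frac{1}{n^2T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\tilde{D}_{nt}'B_n'\,E(\xi_{nt}\xi_{ns}')\,B_n\tilde{D}_{ns}$$

$$= \frac{\sigma_0^2}{n^2T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\min\{s,t\}\,\tilde{D}_{nt}'B_n'B_n\tilde{D}_{ns} = O\Big(\frac{T}{n}\Big).$$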
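The order $O(T/n)$ in (B.12) rests on the double sum of $\min\{s,t\}$ growing like $T^3$: if each quadratic form $\tilde{D}_{nt}'B_n'B_n\tilde{D}_{ns}$ is $O(n)$, the variance is of order $n\,T^3/(n^2T^2)=T/n$. The following is a minimal sketch that checks this growth rate by brute force against the closed form $T(T+1)(2T+1)/6$; the function names are illustrative and do not come from the paper.

```python
def min_double_sum(T):
    """Brute-force sum of min(s, t) over s, t = 1, ..., T."""
    return sum(min(s, t) for s in range(1, T + 1) for t in range(1, T + 1))


def min_double_sum_closed_form(T):
    """Closed form T(T+1)(2T+1)/6, i.e. the sum of k^2 for k = 1, ..., T."""
    return T * (T + 1) * (2 * T + 1) // 6


def ratio_to_cube(T):
    """Ratio of the double sum to T^3; tends to 1/3 as T grows."""
    return min_double_sum_closed_form(T) / T**3
```

The ratio returned by `ratio_to_cube` approaches 1/3, which is the sense in which the double sum is $O(T^3)$.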

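Equation (B.13): We have $\frac{1}{nT}\sum_{t=1}^{T}\tilde{\xi}_{nt}'\tilde{U}_{nt} = \frac{1}{nT}\sum_{t=1}^{T}\xi_{nt}'U_{nt} - \frac{1}{n}\bar{\xi}_{nT}'\bar{U}_{nT}$.

For the first part, using Lemma B.7,

$$E\Big(\frac{1}{nT}\sum_{t=1}^{T}\xi_{nt}'U_{nt}\Big) = \frac{1}{nT}\sum_{t=1}^{T}\Big(\sigma_0^2\,\mathrm{tr}\Big(\sum_{h=1}^{t}P_{nt,h}\Big)\Big) = \sigma_0^2\frac{1}{nT}\,\mathrm{tr}\Big(\sum_{t=1}^{T}\sum_{h=1}^{t}P_{nt,h}\Big) = O(1),$$

because $\sum_{h=1}^{\infty}\mathrm{abs}(P_{nt,h})$ is row sum and column sum bounded. Also,

$$\mathrm{Var}\Big(\frac{1}{nT}\sum_{t=1}^{T}\xi_{nt}'U_{nt}\Big) = \frac{1}{n^2T^2}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathrm{Cov}(\xi_{nt}'U_{nt},\,\xi_{ns}'U_{ns}) = O\Big(\frac{T}{n}\Big).$$

This is so as follows. As $\xi_{nt}=\sum_{h=1}^{\infty}Q_{nt,h}V_{n,t-h}$ where $Q_{nt,h} = I_n$ for $h=1,2,\cdots,t$ and $Q_{nt,h}=0$ for $h\ge t+1$, we have $\mathrm{Var}\big(\frac{1}{nT}\sum_{t=1}^{T}\xi_{nt}'U_{nt}\big) = O\big(\frac{T}{n}\big)$ using Lemma B.7 because the leading factor is $\sum_{h=1}^{\infty}Q_{ns,h}Q_{ns,h}' = \sum_{h=1}^{s}I_n = s\cdot I_n$ and $\sum_{t=1}^{T}\sum_{s=1}^{t}s = O(T^3)$.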

For the second part, E( 1n��0nT�UnT ) = �20

nT tr�PT

h=1 h � �PnT;h�where �PnT;h is speci�ed in (B.5). So,

E( 1n��0nT�UnT ) = O(1). Also, V ar( 1n��

0nT�UnT ) = 1

n2Cov(�W0nT�UnT ; �W0

nT�UnT ) where �UnT =

P1h=1

�PnT;hVn;t+1�h

and �WnT =P1

h=1�PnT;hVn;t+1�h with �PnT;h speci�ed in (B.5) and �PnT;h speci�ed in (B.40). Then, using

Lemma B.7, we have V ar( 1n��0nT�UnT ) = O(Tn ).

Equation (B.14): This is Theorem A.7 in Yu, de Jong and Lee (2006).


Proof for Lemma B.9

Equation (B.15): Denote �(BnB0nB0n) be the spectral radius of BnB0n. Then,

V ar(1

nT

XT

t=1(cn0~t�1 + ~Xn;t�1�0)0BnVnt)

=�20n2T 2

XT

t=1(cn0~t�1 + ~Xn;t�1�0)0BnB0n(cn0~t�1 + ~Xn;t�1�0)

� �20n2T 2

� �(BnB0n) �����XT

t=1(cn0~t�1 + ~Xn;t�1�0)0(cn0~t�1 + ~Xn;t�1�0)

����� �20

n2T 2� kBnB0nk1 �

����XT

t=1(cn0~t�1 + ~Xn;t�1�0)0(cn0~t�1 + ~Xn;t�1�0)

���� = O(Tn ),because

PTt=1 t

2 = O(T 3) in the leading term.

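Equation (B.16):

$$\mathrm{Var}\Big(\frac{1}{nT}\sum_{t=1}^{T}\xi_{n,t-1}'B_nV_{nt}\Big) = \frac{\sigma_0^2}{n^2T^2}\sum_{t=1}^{T}E(\xi_{n,t-1}'B_nB_n'\xi_{n,t-1}) = \frac{\sigma_0^2}{n^2T^2}\sum_{t=1}^{T}\mathrm{tr}\big(E(B_nB_n'\xi_{n,t-1}\xi_{n,t-1}')\big)$$

$$= \frac{\sigma_0^4}{n^2T^2}\cdot\mathrm{tr}(B_nB_n')\cdot\sum_{t=1}^{T}(t-1) = O\Big(\frac{1}{n}\Big).$$

Then, $\frac{1}{nT}\sum_{t=1}^{T}\xi_{n,t-1}'B_nV_{nt} = O_p\big(\frac{1}{\sqrt{n}}\big)$ with mean zero.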

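Equation (B.17): As $\bar{\xi}_{n,T-1} = \frac{1}{T}\sum_{t=0}^{T-1}\xi_{nt} = \frac{1}{T}\sum_{t=0}^{T-1}\sum_{h=0}^{t-1}V_{nh} = \frac{1}{T}\sum_{t=1}^{T-1}(T-t)V_{n,t-1}$, we have

$$E(\bar{\xi}_{n,T-1}'B_n\bar{V}_{nT}) = \frac{1}{T^2}E\Big[\Big(\sum_{t=1}^{T-1}(T-t)V_{n,t-1}\Big)'B_n\sum_{t=1}^{T}V_{nt}\Big] = \sigma_0^2\frac{(T-1)(T-2)}{2T^2}\mathrm{tr}(B_n). \qquad \text{(B.41)}$$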
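The factor $(T-1)(T-2)/2$ in (B.41) can be checked by matching time indices: with the $V_{nt}$ uncorrelated across $t$ and $E(V_{nj}'B_nV_{nj})=\sigma_0^2\,\mathrm{tr}(B_n)$, only pairs with a common index contribute, so the expectation is $\sigma_0^2\,\mathrm{tr}(B_n)/T^2$ times the sum of the weights $(T-t)$ attached to indices that appear in both sums. The sketch below carries out that bookkeeping; the function names are illustrative, and the independence assumption is the one just stated.

```python
from fractions import Fraction


def matched_weight_sum(T):
    """Sum of weights on time indices shared by the two sums in (B.41).

    Left sum:  sum_{t=1}^{T-1} (T - t) V_{n,t-1}  -> index t-1 carries weight T-t.
    Right sum: sum_{t=1}^{T} V_{nt}               -> index t carries weight 1.
    """
    left = {t - 1: T - t for t in range(1, T)}
    right = {t: 1 for t in range(1, T + 1)}
    return sum(w * right.get(j, 0) for j, w in left.items())


def b41_scale(T):
    """Multiplier of sigma_0^2 * tr(B_n) implied by the index matching."""
    return Fraction(matched_weight_sum(T), T * T)
```

For example, `b41_scale(4)` equals `Fraction(3, 16)`, which agrees with $(T-1)(T-2)/(2T^2)$ at $T=4$.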

For the special case where $B_n = M_n$, $E(\bar{\xi}_{n,T-1}'M_n\bar{V}_{nT}) = \sigma_0^2\frac{(T-1)(T-2)}{2T^2}\mathrm{tr}(M_n) = \sigma_0^2\frac{(T-1)(T-2)}{2T^2}m_n$ because $\mathrm{tr}(M_n) = \mathrm{tr}(R_nJ_nR_n^{-1}) = \mathrm{tr}(J_n) = m_n$. Also,

V ar(��0n;T�1Bn �VnT )

= V ar(1

T 2(XT�1

t=1(T � t)Vn;t�1)0Bn

XT

t=1Vnt) =

1

T 4V ar(

XT

t=1V 0ntB0n(

XT�1

t=1(T � t)Vn;t�1))

=1

T 4V ar(U0nT;�1WnT;�1),

where UnT;�1 =P1

h=1 Pnt;hVn;t+1�h with Pnt;h =

8<: In for h � T0 for h > T

and WnT;�1 =P1

h=1Qnt;hVn;t+1�h


with Qnt;h =

8<: B0n � (h� 1) for h � T0 for h > T

. From Lemma B.7,

V ar(��0n;T�1Bn �VnT )

=1

T 4�40tr

�In � T � B0nBn �

XT

h=1(h� 1)2 +

XT

h=1(h� 1) �

XT

h=1(h� 1)B0nB0n

�+1

T 4(�4 � 3�40)

XT

h=1(h� 1)2 �

Xn

i=1(Bn)ii(Bn)ii

=1

T 4�40

�tr(B0nBn) � T �

XT

h=1(h� 1)2 + (tr(B0nB0n) �

(T � 1)2T 24

�+Xn

i=1(Bn)ii(Bn)ii �

1

T 4(�4 � 3�40)

XT

h=1(h� 1)2 = O(n).

So, E(��0n;T�1Bn �VnT ) = O(n) and V ar(��

0n;T�1Bn �VnT ) = O(n).

Equation (B.18): This is implied by Theorem A.11 in Yu, de Jong and Lee (2006). □

Proof for Lemma B.10: Using Lemma B.8 and that $Y_{nt}$ has those components, we have the result. For (B.20), we use (B.7), (B.8) and (B.9) in Lemma B.8. For (B.21), we use (B.10), (B.11), (B.12) and (B.13) in Lemma B.8. For (B.22), it is implied by Lemma B.1 in Yu, de Jong and Lee (2006). □

Proof for Lemma B.11: Using Lemma B.9 and that $Y_{nt}$ has those components, we have the result. For (B.23), we use (B.15), (B.16) and (B.17) in Lemma B.9. For (B.24), it is in Lemma B.1 in Yu, de Jong and Lee (2006). □

Proof for Proposition B.12: The form of the inverse of $H_T$ can be checked by direct multiplication of $H_T$ with the right hand side matrix expression (of $H_T^{-1}$), which will result in an identity matrix. The explicit expression of $H_T^{-1}$ is complicated, but it can be motivated as follows. Define $Q_T = I_m + Tg_Td_T'$ and $R_T = I_m + TQ_T^{-1}h_Tb_T'$. It follows, by construction, that $H_T = Q_TR_T$. If both $R_T$ and $Q_T$ are invertible, then $H_T$ must be invertible. By the familiar pattern of $Q_T$, its inverse will have the form (see Dhrymes (1978)) $Q_T^{-1} = I_m - \frac{T}{1+Td_T'g_T}g_Td_T'$, and also, the inverse of $R_T$ has the form $R_T^{-1} = I_m - \frac{T}{1+Tb_T'Q_T^{-1}h_T}Q_T^{-1}h_Tb_T'$. The final expression of $H_T^{-1} = R_T^{-1}Q_T^{-1}$ can be derived by exploring the explicit expressions of $Q_T^{-1}$, $R_T^{-1}$ and their multiplication. □

Proof for Proposition B.13: The following proof is for the case that $K_T$ is nonrandom. After we get the result, it can be extended to the case that $K_T$ is random as long as $A_T$ is nonsingular with probability 1.

Using the notations in Proposition B.12, $K_T = P_TH_T$ and $K_T^{-1} = H_T^{-1}P_T^{-1}$, where $H_T^{-1}$ is in (B.25) and

$$P_T^{-1} = (T^2c_Tc_T' + A_T)^{-1} = A_T^{-1} - \frac{T^2}{1+T^2c_T'A_T^{-1}c_T}A_T^{-1}c_Tc_T'A_T^{-1}. \qquad \text{(B.42)}$$

Furthermore, denote $h_T = P_T^{-1}d_T$ and $g_T = P_T^{-1}b_T$, where $h_T$ and $g_T$ are in Proposition B.12. As $c_T$ is proportional to $b_T$ and the explicit inverse formula of $P_T$ involves $c_T$, $g_T = P_T^{-1}b_T = \frac{1}{1+T^2c_T'A_T^{-1}c_T}A_T^{-1}b_T$.
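The rank-one inverse in (B.42) and the simplification of $g_T$ when $c_T$ is proportional to $b_T$ can be verified exactly on a small example. The sketch below uses exact rational arithmetic from the Python standard library; the matrix `A`, the vector `b`, the proportionality constant `omega` and the value of `T` are arbitrary illustrative choices, not values from the paper, and `A` is taken symmetric so that $c'A^{-1} = (A^{-1}c)'$.

```python
from fractions import Fraction as F


def inverse(M):
    """Gauss-Jordan inverse of a square matrix with Fraction entries."""
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * x for a, x in zip(u, v))


def outer(u, v):
    return [[a * x for x in v] for a in u]


# Illustrative inputs (assumptions): symmetric positive definite A, c = omega * b.
A = [[F(4), F(1), F(0)], [F(1), F(3), F(1)], [F(0), F(1), F(2)]]
b = [F(1), F(2), F(-1)]
omega = F(3)
c = [omega * x for x in b]
T = F(7)

# Direct inverse of P_T = T^2 c c' + A.
P = [[T * T * cc + a for cc, a in zip(crow, arow)]
     for crow, arow in zip(outer(c, c), A)]
P_inv_direct = inverse(P)

# Right-hand side of (B.42).
A_inv = inverse(A)
Ainv_c = matvec(A_inv, c)
denom = 1 + T * T * dot(c, Ainv_c)
P_inv_formula = [[a - T * T / denom * u for a, u in zip(arow, urow)]
                 for arow, urow in zip(A_inv, outer(Ainv_c, Ainv_c))]

# g_T computed directly and via the simplified expression.
g_direct = matvec(P_inv_direct, b)
g_formula = [x / denom for x in matvec(A_inv, b)]
```

Because the arithmetic is exact, `P_inv_direct == P_inv_formula` and `g_direct == g_formula` hold as equalities rather than up to rounding.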


This implies the following scalar values: b0T gT =b0TA

�1T bT

1+T 2c0TA�1T cT

, d0T gT = b0ThT =

b0TA�1T dT

1+T 2c0TA�1T cT

, and d0ThT =

d0TA�1T dT � T 2(c0TA

�1T dT )

2

1+T 2c0TA�1T cT

. In terms of orders of magnitude, we have d0ThT = O(1), b0T gT = O(

1T 2 ), d

0T gT =

O( 1T 2 ), and b0ThT = O(

1T 2 ).

With these, one can evaluate �T in H�1T and its limit. The two terms of �T are T (b0ThT + d

0T gT ) =

2b0TA

�1T dT

T (T�2+c0TA�1T cT )

= O( 1T ); and

T 2(b0T gT � d0ThT � b0ThT � d0T gT )

= (b0TA

�1T bT

T�2 + c0TA�1T cT

)(d0TA�1T dT �

(c0TA�1T dT )

2

T�2 + c0TA�1T cT

)� ( Tb0TA�1T dT

1 + T 2c0TA�1T cT

)2

�! 1

!2(d0A�1d� (c

0A�1d)2

c0A�1c):

Hence,

� = limT!1

�T = 1�1

!2

�d0A�1d� (d

0A�1b)2

b0A�1b

�:

As K�1T = H�1

T P�1T = (Im� TBT

�T)P�1T , it remains to consider the limiting behavior of TBT �P�1T where BT is

in (B.27) and P�1T is in (B.42). As b0TP�1T = 1

1+T 2c0TA�1T cT

b0TA�1T and d0TP

�1T = (dT � T 2c0TA

�1T dT

1+T 2c0TA�1T cT

cT )0A�1T ,

these imply the following matrices

gT b0TP

�1T = P�1T bT b

0TP

�1T =

1

(1 + T 2c0TA�1T cT )2

A�1T bT b0TA

�1T ;

hT d0TP

�1T = P�1T dT d

0TP

�1T = A�1T (dT �

T 2c0TA�1T dT

1 + T 2c0TA�1T cT

cT )(dT �T 2c0TA

�1T dT

1 + T 2c0TA�1T cT

cT )0A�1T ;

and

gT d0TP

�1T = (hT b

0TP

�1T )0 = P�1T bT d

0TP

�1T =

1

(1 + T 2c0TA�1T cT )

A�1T bT (dT �T 2cTA

�1T dT

1 + T 2c0TA�1T cT

cT )0A�1T :

In terms of orders of magnitude, hT d0TP�1T = O(1), hT b0TP

�1T = O( 1T 2 ), gT d

0TP

�1T = O( 1T 2 ) and gT b

0TP

�1T =

O( 1T 4 ). Therefore, for TBT �P�1T where BT is in (B.27) and P

�1T is in (B.42), we have T (hT b0T +gT d

0T )P

�1T =

O( 1T ) and

T 2[(d0ThT )gT b0T + (b

0T gT )hT d

0T � (d0T gT )hT b0T � (b0ThT )gT d0T ]P�1T

= T 2(b0T gT )hT d0TP

�1T +O(

1

T 2)

=T 2b0TA

�1T bT

1 + T 2c0TA�1T cT

A�1T (dT �T 2d0TA

�1T cT

1 + T 2c0TA�1T cT

cT )(dT �T 2d0TA

�1T cT

1 + T 2c0TA�1T cT

cT )0A�1T +O(

1

T 2)

�! 1

!2A�1(d� d

0A�1b

b0A�1bb)(d� d

0A�1b

b0A�1bb)0A�1;

because c = !b, which is the limit of (�TBTP�1T ).

Thus, K�1T = (Im � TBT

�T)P�1T = P�1T � 1

�TTBTP

�1T converges to Lk, where

Lk = (A�1 � 1

b0A�1bA�1bb0A�1) +

1

!2�A�1(d� d

0A�1b

b0A�1bb)(d� d

0A�1b

b0A�1bb)0A�1:


For (b) and (c), we have K�1T = (PTHT )

�1 where P�1T = A�1T � T 2

1+T 2c0TA�1T cT

A�1T cT c0TA

�1T and H�1

T =

Im� T�TBT with �T = 1+T (b0ThT+d

0T gT )�T 2j(bT ; dT )0(gT ; hT )j and BT = (hT b0T+gT d0T )�T [(d0ThT )gT b0T+

(b0T gT )hT d0T � (d0T gT )hT b0T � (b0ThT )gT d0T ]. Hence,

K�1T = P�1T � T

�TBTP

�1T (B.43)

= P�1T � T

�T((hT b

0T + gT d

0T )� T [(d0ThT )gT b0T + (b0T gT )hT d0T � (d0T gT )hT b0T � (b0ThT )gT d0T ])P�1T

= P�1T � T

�T(hT b

0T + gT d

0T )P

�1T +

T 2

�T[(d0ThT )gT b

0T + (b

0T gT )hT d

0T � (d0T gT )hT b0T � (b0ThT )gT d0T ]P�1T

= P�1T � T

�T(P�1T dT b

0TP

�1T + P�1T bT d

0TP

�1T )

+T 2

�T[(d0TP

�1T dT )P

�1T bT b

0TP

�1T + (b0TP

�1T bT )P

�1T dT d

0TP

�1T ]

� T2

�T[(d0TP

�1T bT )P

�1T dT b

0TP

�1T + (b0TP

�1T dT )P

�1T bT d

0TP

�1T ].

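As

$$c_T'P_T^{-1}c_T = \frac{c_T'A_T^{-1}c_T}{1+T^2c_T'A_T^{-1}c_T} = O(T^{-2}), \qquad c_T'P_T^{-1}d_T = \frac{c_T'A_T^{-1}d_T}{1+T^2c_T'A_T^{-1}c_T} = O(T^{-2}), \qquad \text{(B.44a)}$$

$$d_T'P_T^{-1}d_T = d_T'A_T^{-1}d_T - \frac{T^2(c_T'A_T^{-1}d_T)^2}{1+T^2c_T'A_T^{-1}c_T} = O(1) \qquad \text{(B.44b)}$$

and $b_T$ is proportional to $c_T$, we have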

c0TK�1T cT = c0TP

�1T cT �

2T

�T(c0TP

�1T dT b

0TP

�1T cT )

+T 2

�T[(d0TP

�1T dT )c

0TP

�1T bT b

0TP

�1T cT + (b

0TP

�1T bT )c

0TP

�1T dT d

0TP

�1T cT ]

+2T 2

�T[�(d0TP�1T bT )c

0TP

�1T dT b

0TP

�1T cT ].

Using (B.44), we have c0TK�1T cT = O(T

�2) and similarly,K�1T cT = O(T

�1). Also, we have that T 2c0TK�1T cT =

T 2c0TP�1T cT +O(T

�1). This is so because T 2c0TP�1T cT =

T 2c0TA�1T cT

1+T 2c0TA�1T cT

and 1� T 2c0TA�1T cT

1+T 2c0TA�1T cT

= O(T�2).

When this Proposition is applied to Proposition 2.1 and B.14, we have bT = cT . Also, it can be extended

to the case where dT and AT are stochastic. �

Proof for Proposition B.14: We are going to use Proposition 2.1 to prove in Proposition B.14. First, we

need to show that �T = 1��d0TA

�1T dT �

(d0TA�1T cT )

2

c0TA�1T cT

�has the property that � �plimT!1�T 6= 0.

In our paper, � = 1 �hd0A�1d� (d0A�1c)2

c0A�1c

i= 1� d0A�1d

h1� (d0A�1c)2

c0A�1cd0A�1d

i. Using Cauchy inequality,

(d0A�1c)2

c0A�1cd0A�1d � 1; using positive de�niteness of A, (d0A�1c)2

c0A�1cd0A�1d � 0. Hence, 0 � 1 � (d0A�1c)2

c0A�1cd0A�1d � 1.Also, d0A�1d < 1 in our application because it is equivalent to limT!1 (dnT )

0(Hs

nT =!nT )�1(dnT ) < 1

where dnT = 1!nT

�1

nT 2

TPt=1( ~Zsnt; Gn

~Zsnt�0)0 ~Y un;t�1

�, Hs

nT =1nT

TPt=1( ~Zsnt; Gn

~Zsnt�0)0( ~Zsnt; Gn

~Zsnt�0) and !nT =


1nT 3

PTt=1

~Y u0n;t�1~Y un;t�1. This is so, as we have 1

nT 2

TXt=1

~Y u0n;t�1( ~Zsnt; Gn ~Z

snt�0)

! 1

nT 3

TXt=1

~Y u0n;t�1 ~Yun;t�1

!�1�

1

nT

TXt=1

( ~Zsnt; Gn ~Zsnt�0)

0( ~Zsnt; Gn ~Zsnt�0)

!�1 1

nT 2

TXt=1

~Y u0n;t�1( ~Zsnt; Gn ~Z

snt�0)

!0< 1

and d0A�1d < 1 because of the generalized Schwartz inequality and that, for large enough T , ( ~Y un;t�1)i

is not a linear function of ( ~Zsnt; Gn ~Zsnt�0)i with some positive probability. Hence, combined with 0 �

1� (d0A�1c)2

c0A�1cd0A�1d � 1 and d0A�1d < 1, � > 0. �

Proof for (B.28) and (B.29): This is implied by (b) and (c) in Proposition 2.1 when KT there is taken

to be H1;nT here. �

Proof for (B.30): To prove it, we take H1;nT =!nT to be KT . As K�1T = O(1) and K�1

T cT = Op(T�1) ,

we need to show K�1T (T 2cT + TdT ) = Op(1).

From (B.43), we have

K�1T =P�1T � 1

�TfTP�1T (dT b

0T + bT d

0T )P

�1T � T 2[(d0TP�1T dT )P

�1T bT b

0TP

�1T

+(b0TP�1T bT )P

�1T dT d

0TP

�1T � (b0TP�1T dT )P

�1T (dT b

0T + bT d

0T )P

�1T ]g;

where �T = 1 + T (b0TP�1T dT + d

0TP

�1T bT ) � T 2[(b0TP�1T bT )(d

0TP

�1T dT ) � (b0TP�1T dT )(d

0TP

�1T bT )]. It follows

that

K�1T dT =

1

�TfP�1T dT + T (b

0TP

�1T dT )P

�1T dT � T (d0TP�1T dT )P

�1T bT g;

and

K�1T bT =

1

�TfP�1T bT + T (b

0TP

�1T dT )P

�1T bT � T (b0TP�1T bT )P

�1T dT g. (B.45)

As bT = cT in our case, after arrangement of terms,

K�1T (T 2cT +TdT ) =

1

�TfT 2(1+Tc0TP�1T dT �d0TP�1T dT )P

�1T cT +T (1+Tc

0TP

�1T dT �T 2c0TP�1T cT )P

�1T dT g:

The �rst part on the right hand side is of order Op(1) because P�1T cT = Op(

1T 2 ) and c

0TP

�1T dT = Op(

1T 2 ).

For the second part, note that

$$T + T^2c_T'P_T^{-1}d_T - T^3c_T'P_T^{-1}c_T = T\Big[1 + \frac{T}{1+T^2c_T'A_T^{-1}c_T}c_T'A_T^{-1}d_T - \frac{T^2}{1+T^2c_T'A_T^{-1}c_T}c_T'A_T^{-1}c_T\Big] = \frac{T(1+Tc_T'A_T^{-1}d_T)}{1+T^2c_T'A_T^{-1}c_T} = O_p(1);$$

so $K_T^{-1}(T^2c_T + Td_T) = O_p(1)$. □

Proof for (B.31): As H1;nT and HnT have the form speci�ed in (2.14), we need to prove that for KT =

T 2cT c0T + T (cT d

0T + dT c

0T ) + AT , we have 1 � c0TK�1

T (T 2cT + TdT + Td2;nT cT + Hs2;nT =!nT ) = Op(T

�1)


where d2;nT and Hs2;nT =!nT are de�ned in (2.14). As c

0TK

�1T = Op(T

�1) and c0TK�1T cT = Op(T

�2), we need

to show 1� c0TK�1T (T 2cT + TdT ) = Op(T

�1).

From (B.43),

T 2c0TK�1T cT = T

2c0TP�1T cT �

T 3

�T(c0TP

�1T dT b

0TP

�1T cT + c

0TP

�1T bT d

0TP

�1T cT )

+T 4

�T[(d0TP

�1T dT )c

0TP

�1T bT b

0TP

�1T cT + (b

0TP

�1T bT )c

0TP

�1T dT d

0TP

�1T cT ]

+T 4

�T[�(d0TP�1T bT )c

0TP

�1T dT b

0TP

�1T cT � (b0TP�1T dT )c

0TP

�1T bT d

0TP

�1T cT ]

=T 2c0TA

�1T cT

1 + T 2c0TA�1T cT

+1

�T(T 4c0TP

�1T bT b

0TP

�1T cT )(d

0TP

�1T dT ) +Op(T

�1)

by using (B.44) because bT = cT for our case, and, similarly,

Tc0TK�1T dT = �

1

�T(T 2c0TP

�1T bT )(d

0TP

�1T dT ) +Op(T

�1):

Hence,

T 2c0TK�1T cT + Tc

0TK

�1T dT

=T 2c0TA

�1T cT

1 + T 2c0TA�1T cT

+1

�T(T 4c0TP

�1T bT b

0TP

�1T cT )(d

0TP

�1T dT )�

1

�T(T 2c0TP

�1T bT )(d

0TP

�1T dT ) +Op(T

�1)

=T 2c0TA

�1T cT

1 + T 2c0TA�1T cT

+d0TP

�1T dT�T

(T 2c0TP�1T bT )((T

2c0TP�1T bT )� 1) +Op(T�1).

As bT = cT , (T 2c0TP�1T bT )�1 = � 1

1+T 2c0TA�1T cT

= Op(T�2), using the fact that �T and d0TP

�1T dT are Op(1),

we have 1� c0TK�1T (T 2cT + TdT ) = Op(T

�1). �

Proof for (B.32):

0@ H1;nT H2;nT

H02;nT H3;nT

1A = !nT�T 2 � c�c�0 + T � dnT � c�0 + T � c� � d0nT +Hs

nT =!nT�where

c� = (c0; 1)0 and dnT = (d01;nT ; d2;nT )0. As we have already established that� > 0, by using Proposition B.13,

inverse of

0@ H1;nT H2;nT

H02;nT H3;nT

1A exists and is Op(1). Using the formula of inverting a partitioned matrix, we

can get�H3;nT �H0

2;nTH�11;nTH2;nT

��1exists and is Op(1). �

Proof for (B.33): To prove plimT!1

�H3;nT �H0

2;nTH�11;nTH2;nT

��16= 0, we will make use of the matrix

algebra result of Proposition B.13 we have developed. Denoting HnT =

0@ H1;nT H2;nT

H02;nT H3;nT

1A and LH =

plimT!1H�1nT , we are going to prove that e

0LHe 6= 0 where e is a unit vector such that e = (0; � � � ; 0; 1)0. Here,HnT takes the form of HnT = !nT �

�T 2 � c�c�0 + T � dnT � c�0 + T � c� � d0nT +Hs

nT =!nT�. From Proposition

B.13, for KT =�T 2 � cT c0T + T � dnT � c0T + T � cT � d0nT +AT

�, the limit of K�1

T is

Lk = (A�1 � 1

c0A�1cA�1cc0A�1) +

1

�A�1(d� d

0A�1c

c0A�1cc)(d� d

0A�1c

c0A�1cc)0A�1


where � = 1�hd0A�1d� (d0A�1c)2

c0A�1c

i. Then,

e0Lke = (e0A�1e� (e

0A�1c)2

c0A�1c) +

1

�e0A�1d� d

0A�1c

c0A�1ce0A�1c

�2.

The Cauchy inequality guarantees that e0A�1e� (e0A�1c)2

c0A�1c > 0 as e and c are not proportional. The second

part of e0Lke will be nonnegative if� > 0 where� = 1�hd0A�1d� (d0A�1c)2

c0A�1c

i= 1�d0A�1d

h1� (d0A�1c)2

c0A�1cd0A�1d

i,

which is proved in the beginning of the proof for Proposition B.14. �

Proof for (B.34): From (B.32) and (B.33), H3;nT �H02;nTH�1

1;nTH2;nT is Op(1) and is a function of !nT and

dnT ; more explicitly, !nT and !nT dnT in (2.14). As we have !nT �E!nTp! 0 and !nT dnT �E!nT dnT

p! 0

(by using Lemma B.10) where !nT and dnT are Op(1), we have the result. �

Proof for (B.35): (B.33) states that plimT!1

�H3;nT �H0

2;nTH�11;nTH2;nT

��16= 0. AsH3;nT�H0

2;nTH�11;nTH2;nT

is a scalar, we have plimT!1

�H3;nT �H0

2;nTH�11;nTH2;nT

�also exists.

Similarly, plimT!1�EH3;nT � EH0

2;nT (EH1;nT )�1EH2;nT

�also exists. Using (B.34), we have the result.

Proof for Proposition B.15: From (B.45), using bT = cT for our case, we have

TK�1T cT =

1

�TfTP�1T cT + (T

2c0TP�1T dT )P

�1T cT � (T 2c0TP�1T cT )P

�1T dT g.

As �T = Op(1) and bounded away from zero, c0TP�1T dT = Op(T

�2) and P�1T cT = Op(T�2), we have

TK�1T cT = � 1

�T(T 2c0TP

�1T cT )P

�1T dT + Op(T

�1). Also, as P�1T = A�1T � T 2

1+T 2c0TA�1T cT

A�1T cT c0TA

�1T and

T 2c0TP�1T cT =

T 2c0TA�1T cT

1+T 2c0TA�1T cT

, we have

TK�1T cT = �

1

�T

T 2c0TA�1T cT

1 + T 2c0TA�1T cT

(A�1T dT �T 2

1 + T 2c0TA�1T cT

A�1T cT c0TA

�1T dT ) +Op(T

�1). (B.46)

Also, for K�1T , using (B.43), it is Op(1) and is just a function of A

�1T , cT and dT . We are going to apply the

above results to ��;nT where

��;nT =1

�2

0@ EHnT (�) 0

0 0

1A+0BB@0 0 0

0 1n

�tr(G0n(�)Gn(�)) + tr(G

2n(�))

�1�2n tr(Gn(�))

0 1�2n tr(Gn(�))

12�4

1CCA ,and

HnT (�) =1

nT

TXt=1

( ~Znt; Gn(�) ~Znt�)0( ~Znt; Gn(�) ~Znt�). (B.47)

As ~~Y un;t�1 = Wn~~Y un;t�1 = Gn ~Z

unt�0 (see Proposition B.4 in Appendix B), we have ~Z

unt =

~~Y un;t�1 � c0 wherec = (1; 1;01�kx)

0 and Gn(�) ~Zunt� = +�1��

~~Y un;t�1. Hence, we have ( ~Zunt; Gn(�) ~Zunt�) =

~~Y un;t�1 � (c0; +�1�� )0.


Therefore, denote c(�) = (c0; +�1�� ; 0)0, we can write ��;nT as

��;nT = (E!nT )�T 2 � c(�)c0(�) + T � (EdnT (�)) � c0(�) + T � c0(�) � (EdnT (�))0 +�s�;nT =E!nT

�,

where c(�) = (1; 1;01�kx ; +�1�� ; 0)

0, dnT (�) = 1E!nT

( 1nT 2

TPt=1( ~Zsnt; Gn(�) ~Z

snt�; 0)

0 ~~Y un;t�1)0, �s�;nT =

1�2

0@ EHsnT (�) 0

0 0

1A+0BB@0 0 0

0 1n

�tr(G0n(�)Gn(�)) + tr(G

2n(�))

�1�2n tr(Gn(�))

0 1�2n tr(Gn(�))

12�4

1CCA andHsnT (�) =

1nT

TPt=1( ~Zsnt; Gn(�) ~Z

snt�)

0( ~Zsnt; Gn(�) ~Zsnt�).

For c(�) evaluated at �0 and �nT , we have c(�0) = (c0; 0+�01��0 ; 0)

0 = (c0; 1; 0)0 (because 0 + �0 + �0 = 1

under Assumption 6) and c(�nT ) = (c0; nT+�nT1��nT

; 0)0. Also, nT+�nT1��nT

= 1 + Op

�max

�1pnT 3

; 1T 2

��. This

is so as follows. From Theorem 4.2, nT + �nT + �nT � 1 = Op

�max

�1pnT 3

; 1T 2

��. As �nT � 1 6= 0

for large enough T with probability close to one12 , nT+�nT1��nT

= 1 + 11��nT

Op

�max

�1pnT 3

; 1T 2

��. Hence,

nT+�nT1��nT

= 1 +Op

�max

�1pnT 3

; 1T 2

��.

Hence, for ��0;nT , we have

��0;nT = (E!nT )�T 2 � c(�0)c0(�0) + T � EdnT (�0) � c0(�0) + T � c0(�0) � (EdnT (�0))0 +�s�0;nT =E!nT

�,

where c(�0) = (c�0; 0)0, dnT (�0) = 1E!nT

( 1nT 2

TPt=1( ~Zsnt; Gn

~Zsnt�0; 0)0 ~~Y un;t�1)

0 and �s�0;nT =1�20

0@ EHsnT 0

0 0

1A+0BB@0 0 0

0 1n

�tr(G0nGn) + tr(G

2n)�

1�20ntr(Gn)

0 1�20ntr(Gn)

12�40

1CCA. For ���nT ;nT (see (B.36)), we have���nT ;nT = !nT

�T 2 � c(�nT )c0(�nT ) + T � dnT (�nT ) � c0(�nT ) + T � c0(�nT ) � d0nT (�nT ) + ��s�nT ;nT =!nT

�,

(B.48)

where c(�nT ) = (1; 1;0; nT+�nT1��nT

; 0)0, dnT (�nT ) = 1!nT

( 1nT 2

TPt=1( ~Zsnt; Gn(�nT ) ~Z

snt�nT ; 0)

0 ~~Y un;t�1)0 and ��s

�nT ;nT=

1�2nT

0@ HsnT (�nT ) 0

0 0

1A+0BBB@0 0 0

0 1n

htr(G0n(�nT )Gn(�nT )) + tr(G

2n(�nT ))

i1

�2nTntr(Gn(�nT ))

0 1�2nTn

tr(Gn(�nT ))1

2�4nT

1CCCA.As nT+�nT

1��nT= 1+Op

�max

�1pnT 3

; 1T 2

��, we have c(�nT )� c(�0) = (0; 0;0; Op(max( 1p

nT 3; 1T 2 )); 0)

0. For

dnT (�nT )�EdnT (�0) = [dnT (�nT )�dnT (�0)]+[dnT (�0)�EdnT (�0)], dnT (�nT )�dnT (�0) isOp�max

�1pnT; 1T

��as �nT � �0 = Op

�max

�1pnT; 1T

��from (4.6); also, dnT (�0) � EdnT (�0) = Op

�1pnT

�using (B.21) in

Lemma B.10. Hence, dnT (�nT )�EdnT (�0) = Op�max

�1pnT; 1T

��. From Equation (C.9) and (C.10) in Yu,

12From (3.28), �nT � �0 = Op�max

�1pnT; 1T

��, this implies that 1 � �nT = 1 � �0 + Op

�max

�1pnT; 1T

��. As �0 6= 1

under Assumption 5 (If �0 = 1, Sn(�0) = In �Wn would not be invertible because Wn is row normalized under Assumption

1.), 1� �nT 6= 0 for large enough T with probability close to one.


de Jong and Lee (2006), (��s�nT ;nT

)�1 � (�s�0;nT )�1 = Op

�max

�1pnT; 1T

��. Also, from (B.8) and (B.9) in

Lemma B.8, !nT � E!nT = Op�max

�1pnT; 1T

��.

To prove (B.37): From (B.43), ���1�nT ;nT

is Op(1) and is a simple function of !nT ; c(�nT ), dnT (�nT ),

(��s�nT ;nT

)�1. Similarly, ��1�0;nT is O(1) and is a simple function of E!nT ; c(�0), EdnT (�0), (��s�0;nT )�1.

As !nT � E!nT , c(�nT ) � c(�0), dnT (�nT ) � EdnT (�0) and (��s�nT ;nT )�1 � (�s�0;nT )

�1 are all having at

most the order Op�max

�1pnT; 1T

��, it implies that elements of ���1

�nT ;nT� ��1�0;nT will be of the order

Op

�max

�1pnT; 1T

��.

To prove (B.38): We are going to show �rst that T �h���1�nT ;nT

c(�nT )� ��1�0;nT c(�0)i= Op

�max

�1pnT; 1T

��.

From (B.46), T � ���1�nT ;nT

� c(�nT ) is a simple function of !nT , c(�nT ), dnT (�nT ), (��s�nT ;nT )�1 and T �

��1�0;nT � c(�0) is a simple function of E!nT , c(�0), EdnT (�0), (�s�0;nT

)�1. As !nT � E!nT , c(�nT ) � c(�0),dnT (�nT )� EdnT (�0) and (��s�nT ;nT )

�1 � (�s�0;nT )�1 are all having at most the order Op

�max

�1pnT; 1T

��,

it implies that elements of T �h���1�nT ;nT

c(�nT )� ��1�0;nT c(�0)iwill be of the order Op

�max

�1pnT; 1T

��. As

T �h���1�nT ;nT

� ��1�0;nTi� c(�0) = T � ���1�nT ;nT (c(�0)� c(�nT ))+T �

���1�nT ;nT

c(�nT )���1�0;nT c(�0), (B.38) follows

because T � (c(�nT )� c(�0)) = Op�max

�1pnT; 1T

��and ���1

�nT ;nTis Op(1). �

B.4 Proof for Proposition 2.2

We have already proved the central limit theorem for statistics like QsnT for stationary case (see Theorem

2.4 in Yu, de Jong and Lee (2006)). As QnT = QsnT +QunT speci�ed in (2.15), we can prove that QnT still

behaves like QsnT . Rewrite QnT as

QnT =

TXt=1

�Un;t�1 +

kTT(RnJnR

�1n )�n;t�1

�0� Vnt

+TXt=1

�Dnt +

kTT(RnJnR

�1n )

�cn0~t�1 + ~Xn;t�1�0

��0� Vnt

+TXt=1

�V 0ntBnVnt � �20trBn

�,

then QnT has just another form of QsnT so that the central limit theorem in Yu, de Jong and Lee (2006) is

applicable. To con�rm this, we need to show that

(1) For Wnt = Unt+ kTT (RnJnR

�1n )�n;t =

P1h=1Qnt;hVn;t+1�h,

P1h=1 abs(Qnt;h) is row sum and column

sum bounded uniformly in n and t;

(2) For Dnt = Dnt+ kTT (RnJnR

�1n )

�cn0~t�1 + ~Xn;t�1�0

�, elements of Dnt is bounded uniformly in n and

t.

For (1), as �nt =Pt�1

h=0 Vnh, we have Qnt;h =

8<: Pnt;h +kTT (RnJnR

�1n ) for h � t� 1

Pnt;h for h > t� 1. Hence,

1Xh=1

abs(Qnt;h) =t�1Xh=1

abs(Pnt;h +kTT(RnJnR

�1n )) +

1Xh=t

abs(Pnt;h).


This implies that kP1

h=1 abs(Qnt;h)k � kP1

h=1 abs(Pnt;h)k + Pt�1

h=1 abs(kTT (RnJnR

�1n ))

, where k�k rep-resents either the row or column sum norm. Here,

P1h=1 abs(Pnt;h) is row sum and column sum bounded

uniformly in n and t. Also, Pt�1

h=1 abs(kTT (RnJnR

�1n ))

is row sum and column sum bounded uniformly in

n and t because RnJnR�1n is row sum and column sum bounded and KT is O(1).

For (2), Dnt = Dnt + kTT (RnJnR

�1n )

�cn0~t�1 + ~Xn;t�1�0

�where ~Xn;t =

Pt�1h=0

~Xnt. As elements of cn0,

Xnt and Dnt are uniformly bounded in n and t and RnJnR�1n is row sum and column sum bounded, usingtT � 1 for t = 1; 2; :::; T , elements of Dnt are uniformly bounded. �

B.5 Proof about �2nT (�) and �2nT (�0)

Using (2.9), we have

�2nT (�) = (�0 � �)2(H3;nT �H02;nTH�1

1;nTH2;nT ) +1

nT

TXt=1

~V 0ntS0�1n S0n(�)Sn(�)S

�1n~Vnt (B.49)

+2(�0 � �)1

nT

TXt=1

(�00~Z 0ntG

0n �H0

2;nTH�11;nT

~Z 0nt)Sn(�)S�1n~Vnt

� 1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!0H�11;nT

1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!,

@�2nT (�)

@�= 2(�� �0)(H3;nT �H0

2;nTH�11;nTH2;nT )�

2

nT

TXt=1

~V 0ntG0nSn(�)S

�1n~Vnt (B.50)

�2(�0 � �)1

nT

TXt=1

(�00 ~Z0ntG

0n �H0

2;nTH�11;nT

~Z 0nt)Gn ~Vnt

� 2

nT

TXt=1

(�00 ~Z0ntG

0n �H0

2;nTH�11;nT

~Z 0nt)Sn(�)S�1n~Vnt

+2

1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!0H�11;nT

1

nT

TXt=1

~Z 0ntGn~Vnt

!,

@2�2nT (�)

@�2= 2(H3;nT �H0

2;nTH�11;nTH2;nT ) +

2

nT

TXt=1

~V 0ntG0nGn ~Vnt (B.51)

+4

nT

TXt=1

(�00~Z 0ntG

0n �H0

2;nTH�11;nT

~Z 0nt)Gn ~Vnt

�2 1

nT

TXt=1

~Z 0ntGn~Vnt

!0H�11;nT

1

nT

TXt=1

~Z 0ntGn~Vnt

!.

To study �2nT (�),@�2nT (�)

@� and @2�2nT (�)

@�2, we need the following (B.52).

1

nT

TXt=1

~Zu0ntBn ~Vnt =

1

nT

TXt=1

~Y u0n;t�1Bn ~Vnt

!� c, (B.52a)

Gn ~Zunt�0 � ~ZuntH�1

1;nTH2;nT = ~Y u0n;t�1 � (1� c0H�11;nTH2;nT ). (B.52b)


For �2nT (�) in (B.49), using (B.32) in Proposition B.14, (�0��)2(H3;nT�H02;nTH�1

1;nTH2;nT ) is j�� �0j�Op(1).

Using (B.19) and Sn(�)S�1n = In � (�� �0)Gn, 1nT

TPt=1

~V 0ntS0�1n S0n(�)Sn(�)S

�1n~Vnt = �

20 + j�� �0j �Op(1) +

Op

�max

�1pnT; 1T

��. For 2(�0 � �) 1

nT

TPt=1(�00

~Z 0ntG0n � H0

2;nTH�11;nT

~Z 0nt)Sn(�)S�1n~Vnt, it can be rewritten

as a sum of stationary component and nonstationary component. For the stationary component 2(�0 �

�) 1nT

TPt=1(�00 ~Z

s0ntG

0n�H0

2;nTH�11;nT

~Zs0nt)Sn(�)S�1n~Vnt, using (B.24), it is j�� �0j �Op

�max

�1pnT; 1T

��. For the

nonstationary component 2(�0 � �) 1nT

TPt=1(�00 ~Z

u0ntG

0n �H0

2;nTH�11;nT

~Zu0nt)Sn(�)S�1n~Vnt, it is equal to

2(�0 � �)(1� c0H�11;nTH2;nT )

1

nT

TXt=1

~Y u0n;t�1Sn(�)S�1n~Vnt

using (B.52). Then, using (B.23) and (1�c0H�11;nTH2;nT ) = Op(T

�1) from (B.31), 2(�0��) 1nT

TPt=1(�00 ~Z

u0ntG

0n�

H02;nTH�1

1;nT~Zu0nt)Sn(�)S

�1n~Vnt is j�� �0j�Op

�max

�1pnT; 1T

��. For the last term of �2nT (�) in (B.49), we have

1nT

TPt=1

~Z 0ntSn(�)S�1n~Vnt =

1nT

TPt=1

~Zu0ntSn(�)S�1n~Vnt +

1nT

TPt=1

~Zs0ntSn(�)S�1n~Vnt. As 1

nT

TPt=1

~Zs0ntSn(�)S�1n~Vnt =

Op

�max

�1pnT; 1T

��from (B.24) and 1

nT

TPt=1

~Zu0ntSn(�)S�1n~Vnt = c � Op

�max

�1;q

Tn

��from (B.52) and

(B.23), using H�11;nT is Op(1), H

�11;nT c is Op(T

�1) and c0H�11;nT c is Op(T

�2) from Proposition B.14, we have 1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!0H�11;nT

1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!= Op

�max

�1

nT;

1pnT 3

;1

T 2

��. (B.53)

Hence, �2nT (�) = �20 + j�� �0j �Op(1) +Op

�max

�1pnT; 1T

��. Also,

�2nT (�) = (�� �0)2(H3;nT �H02;nTH�1

1;nTH2;nT ) + �20

1

ntr(S0�1n S0n(�)Sn(�)S

�1n ) +Op

�max

�1

T;1pnT

��.

(B.54)

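Similarly, we can get the behavior of $\frac{\partial\hat{\sigma}^2_{nT}(\lambda)}{\partial\lambda}$ and $\frac{\partial^2\hat{\sigma}^2_{nT}(\lambda)}{\partial\lambda^2}$ by using Lemma B.11, (B.52) and Proposition B.14. The results are summarized as follows.

$$\hat{\sigma}^2_{nT}(\lambda) = \sigma_0^2 + |\lambda-\lambda_0|\cdot O_p(1) + O_p\Big(\max\Big(\frac{1}{\sqrt{nT}},\frac{1}{T}\Big)\Big), \qquad \text{(B.55a)}$$

$$\frac{\partial\hat{\sigma}^2_{nT}(\lambda)}{\partial\lambda} = -\sigma_0^2\frac{2}{n}\mathrm{tr}\,G_n + |\lambda-\lambda_0|\cdot O_p(1) + O_p\Big(\max\Big(\frac{1}{\sqrt{nT}},\frac{1}{T}\Big)\Big), \qquad \text{(B.55b)}$$

$$\frac{\partial^2\hat{\sigma}^2_{nT}(\lambda)}{\partial\lambda^2} = 2(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}) + 2\sigma_0^2\frac{1}{n}\mathrm{tr}\,G_n'G_n + O_p\Big(\max\Big(\frac{1}{\sqrt{nT}},\frac{1}{T}\Big)\Big). \qquad \text{(B.55c)}$$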

Furthermore, at � = �0, we have, from (B.50),

pnT@�2nT (�0)

@�= � 2p

nT

TXt=1

~V 0ntG0n~Vnt �

2pnT

TXt=1

(�00 ~Z0ntG

0n �H0

2;nTH�11;nT

~Z 0nt) ~Vnt

+2

1pnT

TXt=1

~Z 0nt ~Vnt

!0H�11;nT

1

nT

TXt=1

~Z 0ntGn ~Vnt

!.


Using (B.53), we have

pnT@�2nT (�0)

@�= � 2p

nT

TXt=1

~V 0ntG0n~Vnt �

2pnT

TXt=1

(�00 ~Z0ntG

0n �H0

2;nTH�11;nT

~Z 0nt) ~Vnt (B.56)

+Op

�max

�1pnT;1

T;

rn

T 3

��. �

B.6 Proof about ��2nT (�) and ��2nT (�0)

From (3.18), we have

��2nT (�) = (�0 � �)2(EH3;nT � EH02;nT (EH1;nT )

�1EH2;nT ) +1

nTE

TXt=1

~V 0ntS0�1n S0n(�)Sn(�)S

�1n~Vnt

+2(�0 � �)1

nTE

TXt=1

(�00 ~Z0ntG

0n � EH0

2;nT (EH1;nT )�1 ~Z 0nt)Sn(�)S

�1n~Vnt

� 1

nTE

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!0(EH1;nT )

�1

E1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!, (B.57)

@��2nT (�)

@�= 2(�� �0)(EH3;nT � EH0

2;nT (EH1;nT )�1EH2;nT )�

2

nTE

TXt=1

~V 0ntG0nSn(�)S

�1n~Vnt

�2(�0 � �)1

nTE

TXt=1

(�00 ~Z0ntG

0n � EH0

2;nT (EH1;nT )�1 ~Z 0nt)Gn ~Vnt

� 2

nTE

TXt=1

(�00~Z 0ntG

0n � EH0

2;nT (EH1;nT )�1 ~Z 0nt)Sn(�)S

�1n~Vnt

+2

E1

nT

TXt=1

~Z 0ntSn(�)S�1n~Vnt

!0(EH1;nT )

�1

E1

nT

TXt=1

~Z 0ntGn~Vnt

!, (B.58)

@2��2nT (�)

@�2= 2(EH3;nT � EH0

2;nT (EH1;nT )�1EH2;nT ) +

2

nTE

TXt=1

~V 0ntG0nGn ~Vnt

+4

nTE

TXt=1

(�00 ~Z0ntG

0n � EH0

2;nT (EH1;nT )�1 ~Z 0nt)Gn ~Vnt

�2 1

nTE

TXt=1

~Z 0ntGn ~Vnt

!0(EH1;nT )

�1

1

nTE

TXt=1

~Z 0ntGn ~Vnt

!. (B.59)

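Using Lemma B.11, (B.52) and Proposition B.14, in the same way as we derived (B.55), we have

$$\sigma^{*2}_{nT}(\lambda) = \sigma_0^2 + |\lambda-\lambda_0|\cdot O_p(1) + O\Big(\frac{1}{T}\Big), \qquad \text{(B.60a)}$$

$$\frac{\partial\sigma^{*2}_{nT}(\lambda)}{\partial\lambda} = -\sigma_0^2\frac{2}{n}\mathrm{tr}\,G_n + |\lambda-\lambda_0|\cdot O_p(1) + O\Big(\frac{1}{T}\Big), \qquad \text{(B.60b)}$$

$$\frac{\partial^2\sigma^{*2}_{nT}(\lambda)}{\partial\lambda^2} = 2(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}) + 2\sigma_0^2\frac{1}{n}\mathrm{tr}\,G_n'G_n + O\Big(\frac{1}{T}\Big). \qquad \text{(B.60c)}$$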


Furthermore,

��2nT (�) = (�� �0)2(EH3;nT � EH02;nT (EH1;nT )

�1EH2;nT ) + �20

1

ntr(S0�1n S0n(�)Sn(�)S

�1n ) +O

�1

T

�: �(B.61)

C Proof for Theorems

C.1 Proof of Claim 3.1

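To prove $\frac{1}{nT}\ln L_{n,T}(\lambda) - Q_{n,T}(\lambda) \overset{p}{\to} 0$ uniformly in $\lambda$ in any compact parameter space $\Lambda$:

As $\ln L_{n,T}(\lambda) = -\frac{nT}{2}(\ln 2\pi + 1) - \frac{nT}{2}\ln\hat{\sigma}^2_{nT}(\lambda) + T\ln|S_n(\lambda)|$ and $Q_{n,T}(\lambda) = -\frac{1}{2}(\ln 2\pi + 1) - \frac{1}{2}\ln\sigma^{*2}_{nT}(\lambda) + \frac{1}{n}\ln|S_n(\lambda)|$ ((2.10) and (3.19)), $\frac{1}{nT}\ln L_{n,T}(\lambda) - Q_{n,T}(\lambda) = \frac{1}{2}\ln\sigma^{*2}_{nT}(\lambda) - \frac{1}{2}\ln\hat{\sigma}^2_{nT}(\lambda)$. By the mean value theorem, $\frac{1}{nT}\ln L_{n,T}(\lambda) - Q_{n,T}(\lambda) = -\frac{1}{2}\frac{1}{\tilde{\sigma}^2_{nT}(\lambda)}\big(\hat{\sigma}^2_{nT}(\lambda) - \sigma^{*2}_{nT}(\lambda)\big)$ where $\tilde{\sigma}^2_{nT}(\lambda)$ lies between $\hat{\sigma}^2_{nT}(\lambda)$ and $\sigma^{*2}_{nT}(\lambda)$. We need to show that (1) $\hat{\sigma}^2_{nT}(\lambda) - \sigma^{*2}_{nT}(\lambda) \overset{p}{\to} 0$ uniformly in $\lambda$ and (2) $\tilde{\sigma}^2_{nT}(\lambda)$ is bounded away from zero uniformly in $\lambda$ with probability one.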

To prove (1): We have �2nT (�) and ��2nT (�) in (B.54) and (B.61). Using (B.34) in Proposition B.14,

�2nT (�)� ��2nT (�)p! 0 uniformly in �.

To prove (2): As ~�2nT (�) lies between �2nT (�) and �

�2nT (�), we have

1~�2nT (�)

� maxf 1�2nT (�)

; 1��2nT (�)

g.Denote �2nT (�) = �20

1n tr(S

0�1n S0n(�)Sn(�)S

�1n ), then �2nT (�) is uniformly bounded away from zero13 . As

H3;nT �H02;nTH�1

1;nTH2;nT is nonnegative14 , �2nT (�) and �

�2nT (�) are uniformly bounded away from zero. So,

1~�2nT (�)

is uniformly bounded.

Combining (1) �2nT (�) � ��2nT (�)p! 0 uniformly in � and (2) 1

~�2nT (�)is Op(1) uniformly in �, we have

1nT lnLn;T (�)�Qn;T (�)

p! 0 uniformly in �.

To prove Qn;T (�) is uniformly equicontinuous in � in any compact parameter space �:

To prove this property, from the expression of $Q_{n,T}(\lambda)$, the following are sufficient: (1) $\frac{1}{n}\ln|S_n(\lambda)|$ is uniformly equicontinuous; (2) $(\lambda-\lambda_0)^2(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT})$ is uniformly equicontinuous; (3) $\sigma_n^2(\lambda)$ is uniformly equicontinuous.

For (1), 1n ln jSn(�2)j �1n ln jSn(�1)j =

1n tr

�WnS

�1n

�����(�2 � �1) where �� lies between �2 and �1. As

S�1n (�) is uniformly bounded in row and column sums, uniformly in � 2 �, 1n tr�WnS

�1n

�����is bounded,

we have 1n ln jSn(�)j is uniformly equicontinuous. For (2), because � is bounded and because EH3;nT �

EH02;nT (EH1;nT )

�1EH2;nT is O(1) according to Proposition B.14, the result follows. For (3), �2n(�2) ��2n(�1) =

�20n tr(S

0�1n S0n(�2)Sn(�2)S

�1n )� �20

n tr(S0�1n S0n(�1)Sn(�1)S

�1n ). Using Sn(�)S�1n = In � (�� �0)Gn,

�2n(�2)� �2n(�1) = �20�(�2 � �1) (�2 + �1 � 2�0) trG

0nGn

n � (�2 � �1)tr(G0

n+Gn)n

�. As elements of G0nGn and

Gn are uniformly bounded, �2n(�) is uniformly equicontinuous. �13See the supplement to Lee (2004), Page 8 for the proof of consistency, available in http://economics.sbs.ohio-state.edu/lee/.14Here, H3;nT �H0

2;nTH�11;nTH2;nT � 0 because of the Cauchy inequality.
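The algebra behind step (3) of the equicontinuity argument can be checked numerically. The following is a minimal sketch and not part of the original proof: it assumes a hypothetical row-normalized random weights matrix, and the helper names `row_normalized_weights`, `sigma2_n` and `sigma2_n_difference` are illustrative. It compares the trace form of $\sigma_n^2(\lambda_2)-\sigma_n^2(\lambda_1)$ with the closed form obtained from $S_n(\lambda)S_n^{-1} = I_n-(\lambda-\lambda_0)G_n$.

```python
import numpy as np

def row_normalized_weights(n, seed=0):
    """Hypothetical spatial weights: nonnegative, zero diagonal, rows sum to one."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def sigma2_n(lam, lam0, W, sigma0_sq=1.0):
    """sigma_n^2(lam) = sigma0^2 / n * tr(M'M), with M = S_n(lam) S_n(lam0)^{-1}."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - lam * W) @ np.linalg.inv(I - lam0 * W)
    return sigma0_sq * np.trace(M.T @ M) / n

def sigma2_n_difference(lam1, lam2, lam0, W, sigma0_sq=1.0):
    """Closed form of sigma_n^2(lam2) - sigma_n^2(lam1) in terms of G_n."""
    n = W.shape[0]
    G = W @ np.linalg.inv(np.eye(n) - lam0 * W)
    quad = (lam2 - lam1) * (lam2 + lam1 - 2.0 * lam0) * np.trace(G.T @ G) / n
    lin = (lam2 - lam1) * np.trace(G.T + G) / n
    return sigma0_sq * (quad - lin)

W = row_normalized_weights(6)
direct = sigma2_n(0.5, 0.3, W, 2.0) - sigma2_n(-0.2, 0.3, W, 2.0)
closed = sigma2_n_difference(-0.2, 0.5, 0.3, W, 2.0)
print(abs(direct - closed))  # should be at floating-point noise level
```

Because the closed form is linear in $\operatorname{tr}G_n'G_n/n$ and $\operatorname{tr}(G_n'+G_n)/n$, bounded traces translate directly into a Lipschitz bound in $\lambda$, which is what the equicontinuity step uses.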


C.2 Proof of Nonsingularity of Information Matrix (Scalar)

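From (3.21),

\[
\frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2} = -\frac{1}{\sigma_0^2}\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right) - \frac{1}{n}\left(\operatorname{tr}G_n'G_n + \operatorname{tr}G_n^2 - \frac{2(\operatorname{tr}G_n)^2}{n}\right) + O_p\left(T^{-1}\right).
\]

Then, using (B.35) in Proposition B.14,

\[
\lim_{T\to\infty}\frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2} = -\frac{1}{\sigma_0^2}\operatorname*{plim}_{T\to\infty}\left(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}\right) - \lim_{n\to\infty}\frac{1}{n}\left[\operatorname{tr}G_n'G_n + \operatorname{tr}G_n^2 - \frac{2(\operatorname{tr}G_n)^2}{n}\right].
\]

If $\operatorname*{plim}_{T\to\infty}(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}) \ne 0$ or $\lim_{n\to\infty}\frac{1}{n}\left(\operatorname{tr}G_n'G_n + \operatorname{tr}G_n^2 - \frac{2(\operatorname{tr}G_n)^2}{n}\right) \ne 0$, then $\lim_{T\to\infty}\left(-\frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2}\right)$ is positive. Here, $(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}) > 0$ for large enough $T$ because of the Cauchy inequality; also, denoting $C_n = G_n - \frac{\operatorname{tr}G_n}{n}I_n$, we have $\frac{1}{n}\left\{\operatorname{tr}G_n'G_n + \operatorname{tr}G_n^2 - \frac{2(\operatorname{tr}G_n)^2}{n}\right\} = \frac{1}{2n}\operatorname{tr}\left[(C_n + C_n')(C_n + C_n')'\right] \ge 0$. $\blacksquare$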
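The nonnegativity argument rests on rewriting the trace expression as a sum of squares in $C_n = G_n - \frac{\operatorname{tr}G_n}{n}I_n$. The sketch below is an illustrative numerical check, not part of the original proof; the function name `trace_identity_sides` and the random test matrix are assumptions. Expanding $\operatorname{tr}[(C_n+C_n')(C_n+C_n')']$ gives twice the bracketed term, so the sketch normalizes the right-hand side by $1/(2n)$.

```python
import numpy as np

def trace_identity_sides(G):
    """Return both sides of the sum-of-squares identity for a square matrix G.

    lhs = (tr G'G + tr G^2 - 2 (tr G)^2 / n) / n
    rhs = tr[(C + C')(C + C')'] / (2n), with C = G - (tr G / n) I
    """
    n = G.shape[0]
    tr_g = np.trace(G)
    lhs = (np.trace(G.T @ G) + np.trace(G @ G) - 2.0 * tr_g**2 / n) / n
    C = G - (tr_g / n) * np.eye(n)
    S = C + C.T
    rhs = np.trace(S @ S.T) / (2.0 * n)
    return lhs, rhs

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 5))  # arbitrary matrix standing in for G_n
lhs, rhs = trace_identity_sides(G)
print(lhs, rhs)  # the two values agree and are nonnegative
```

The right-hand side is a squared Frobenius norm divided by $2n$, so it vanishes only when $C_n + C_n' = 0$.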

C.3 Proof of Theorem 3.2

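We have

\[
Q_{n,T}(\lambda) = -\frac{1}{2}(\ln 2\pi + 1) - \frac{1}{2}\ln\sigma^{*2}_{nT}(\lambda) + \frac{1}{n}\ln|S_n(\lambda)| \tag{C.1}
\]

where $\sigma^{*2}_{nT}(\lambda) = (\lambda_0-\lambda)^2\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right) + \frac{1}{n}\sigma_0^2\operatorname{tr}\left(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\right) + O\left(\frac{1}{T}\right)$ and the $O\left(\frac{1}{T}\right)$ term is uniform in $\lambda$. At $\lambda = \lambda_0$, $Q_{n,T}(\lambda_0) = -\frac{1}{2}(\ln 2\pi + 1) - \frac{1}{2}\ln\sigma^{*2}_{nT}(\lambda_0) + \frac{1}{n}\ln|S_n(\lambda_0)|$. We are going to prove that $\lim_{T\to\infty}Q_{n,T}(\lambda) < \lim_{T\to\infty}Q_{n,T}(\lambda_0)$ for any $\lambda \ne \lambda_0$.

\[
Q_{n,T}(\lambda) - Q_{n,T}(\lambda_0) = -\frac{1}{2}\left[\ln\sigma^{*2}_{nT}(\lambda) - \ln\sigma^{*2}_{nT}(\lambda_0)\right] + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{n}\ln|S_n(\lambda_0)| = T_{1,nT} - T_{2,nT} + O\left(\frac{1}{T}\right)
\]

where

\[
T_{1,nT} = -\frac{1}{2}\left[\ln\left\{\frac{1}{n}\sigma_0^2\operatorname{tr}\left(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\right)\right\} - \ln\sigma^{*2}_{nT}(\lambda_0)\right] + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{n}\ln|S_n(\lambda_0)|,
\]

\[
T_{2,nT} = \frac{1}{2}\ln\left(1 + \frac{(\lambda_0-\lambda)^2\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right)}{\sigma_0^2\operatorname{tr}\left(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\right)/n}\right).
\]

Consider the pure spatial panel process $Y_{nt} = \lambda_0W_nY_{nt} + c_{n0} + V_{nt}$. The log likelihood function of this process is

\[
\ln L_{p,n,T}(\lambda) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(S_n(\lambda)Y_{nt} - c_{n0})'(S_n(\lambda)Y_{nt} - c_{n0}), \tag{C.2}
\]

and the concentrated likelihood is

\[
\ln L_{p,n,T}(\lambda) = -\frac{nT}{2}(\ln 2\pi + 1) - \frac{nT}{2}\ln\hat\sigma^2_{p,nT}(\lambda) + T\ln|S_n(\lambda)|, \tag{C.3}
\]

where $\hat c_{p,nT}(\lambda) = \frac{1}{T}\sum_{t=1}^{T}S_n(\lambda)Y_{nt}$ and $\hat\sigma^2_{p,nT}(\lambda) = \frac{1}{nT}\sum_{t=1}^{T}(S_n(\lambda)\tilde Y_{nt})'S_n(\lambda)\tilde Y_{nt}$. Then, $E\ln L_{p,n,T}(\lambda) - E\ln L_{p,n,T}(\lambda_0)$ would be equal to $T_{1,nT}$. By the information inequality, $E\ln L_{p,n,T}(\lambda) - E\ln L_{p,n,T}(\lambda_0) \le 0$. Thus, $T_{1,nT} \le 0$ for any $\lambda$. Also, $\lim_{T\to\infty}T_{2,nT} > 0$ as long as $\lim_{T\to\infty}\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right) \ne 0$. Under Assumption 9, $\lim_{T\to\infty}\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right) \ne 0$ from Proposition B.14. This proves the global identification. The consistency then follows from the global identification, uniform convergence and uniform equicontinuity in Claim 3.1. $\blacksquare$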
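The sign of $T_{1,nT}$ can also be seen numerically. The sketch below is illustrative only and makes two assumptions that are not in the original: it replaces $\sigma^{*2}_{nT}(\lambda_0)$ by its leading term $\sigma_0^2$ (dropping the $O(1/T)$ remainder), and it uses a hypothetical row-normalized weights matrix. With $M = S_n(\lambda)S_n^{-1}$, the leading term of $T_{1,nT}$ is $-\frac{1}{2}\ln\left(\frac{1}{n}\operatorname{tr}M'M\right) + \frac{1}{n}\ln|\det M|$, which is nonpositive by the arithmetic-geometric mean inequality applied to the eigenvalues of $M'M$.

```python
import numpy as np

def row_normalized_weights(n, seed=0):
    """Hypothetical spatial weights: nonnegative, zero diagonal, rows sum to one."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def t1_leading_term(lam, lam0, W):
    """Leading term of T_{1,nT} with sigma*^2(lam0) replaced by sigma0^2."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - lam * W) @ np.linalg.inv(I - lam0 * W)
    _, logabsdet = np.linalg.slogdet(M)  # ln|det M| = ln|S_n(lam)| - ln|S_n(lam0)|
    return -0.5 * np.log(np.trace(M.T @ M) / n) + logabsdet / n

W = row_normalized_weights(5)
lam0 = 0.3
grid = np.linspace(-0.9, 0.9, 37)
values = np.array([t1_leading_term(lam, lam0, W) for lam in grid])
print(values.max(), t1_leading_term(lam0, lam0, W))
```

On this grid the maximum is attained where $\lambda = \lambda_0$, where the term is zero, matching the information-inequality step in the proof.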


C.4 Proof of Theorem 3.3

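As $\tilde Z_{nt}$ has stationary and nonstationary parts (see (2.12)), we can decompose $\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}(\lambda_0)}{\partial\lambda}$ from (3.22) into two parts accordingly:

\[
\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}(\lambda_0)}{\partial\lambda} = \frac{1}{\sqrt{nT}}\frac{\partial\ln L^{s}_{n,T}(\lambda_0)}{\partial\lambda} + \frac{1}{\sqrt{nT}}\frac{\partial\ln L^{u}_{n,T}(\lambda_0)}{\partial\lambda} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right), \tag{C.4}
\]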

where 1pnT

@ lnLsn;T (�0)

@� is the stationary part and 1pnT

@ lnLun;T (�0)

@� is the nonstationary part as de�ned via

(C.5)-(C.9). The 1pnT

@ lnLsn;T (�0)

@� has two parts, namely, 1pnT

@ lnLsn;T (�0)

@� = 1pnT

@ lnLs�n;T (�0)

@� ���;nT where

1pnT

@ lnLs�n;T (�0)

@�=

1

�2nT (�0)

1pnT

TXt=1

V 0nt(G0n �

1

ntrGn � In)Vnt +

1pnT

TXt=1

(�00�Zs�0nt G

0n �H0

2;nTH�11;nT

�Zs�0nt )Vnt

!(C.5)

and

��0;nT =1

�2n;T (�0)

rT

n�V 0nT (G

0n �

1

ntrGn � In) �VnT

!(C.6)

+1

�2n;T (�0)

rT

n(�00( �U

snT;�1;Wn

�UsnT;�1;0n�kx)0G0n �H0

2;nTH�11;nT (

�UsnT;�1;Wn�UsnT;�1;0n�kx)

0) �VnT

!,

Here, �Zs�nt is the component of ~Zsnt, which is uncorrelated with Vnt such that

~Zsnt = �Zs�nt � ( �UsnT;�1; Wn�UsnT;�1; 0n�kx) (C.7)

and �Zs�nt = (( ~~X sn;t�1�0 + U

sn;t�1); (Wn

~~X sn;t�1�0 + WnU

sn;t�1);

~Xnt) with~~X sn;t�1 = X s

n;t�1 � �X snT;�1. For

1pnT

@ lnLun;T (�0)

@� , it also has two parts 1pnT

@ lnLun;T (�0)

@� = 1pnT

@ lnLu�n;T (�0)

@� � N�0;nT where

1pnT

@ lnLu�n;T (�0)

@�=

1

�2n;T (�0)

((1� c0H�1

1;nTH2;nT ) �1pnT

TXt=1

�Y u�0n;t�1Vnt

)(C.8)

with �Y u�n;t�1 =1

(1��0)Mn

�cn0~t�1 +

�Xn;t�1�0 + �n;t�1

�, ~t�1 = (t� 1)� T�1

2 and

N�0;nT =T (1� c0H�1

1;nTH2;nT )

�2n;T (�0)(1� �0)

�rn

T

1

n(Mn

��n;T�1)0 � �VnT

�. (C.9)

For 1pnT

@ lnL�n;T (�0)

@� = 1pnT

@ lnLs�n;T (�0)

@� + 1pnT

@ lnLu�n;T (�0)

@� from (C.5) and (C.8), denote

�Z�nt = �Zs�nt + �Zu�nt , (C.10)

we have

1pnT

@ lnL�n;T (�0)

@�=

1

�2nT (�0)

1pnT

TXt=1

V 0nt(G0n �

1

ntrGn � In)Vnt +

1pnT

TXt=1

(�00 �Z�0ntG

0n �H0

2;nTH�11;nT

�Z�0nt)Vnt

!.


Proposition 2.2 implies that it will be asymptotically normally distributed because 1pnT

@ lnLs�n;T (�0)

@� from

(C.5) and 1pnT

@ lnLu�n;T (�0)

@� from (C.8) are counterparts of QsnT and QunT in Proposition 2.2. To calculate the

limit variance, using uncorrelatedness of �Z�nt and Vnt, we have

Cov

1

�20

1pnT

TXt=1

(V 0nt(G0n � �20

1

ntrGn)Vnt);

1

�20

1pnT

TXt=1

(�0 �Z�0ntG

0n �H0

2;nTH�11;nT

�Z�0nt)Vnt

!

=�3�40

1

nT

nXi=1

(Gn;ii �1

ntrGn)

E

TXt=1

(Gn �Z�nt�0 � �Z�ntH�1

1;nTH2;nT )

!i

= 0

because ETPt=1Gn �Z

�nt�0 = 0 and E

TPt=1

�Z�nt = 0. Hence,

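\[
\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{s*}_{n,T}(\lambda_0)}{\partial\lambda} + \frac{1}{\sqrt{nT}}\frac{\partial\ln L^{u*}_{n,T}(\lambda_0)}{\partial\lambda} \xrightarrow{d} N(0, \Sigma_{\lambda_0} + \Omega_{\lambda_0}) \tag{C.11}
\]

where

\[
\Sigma_{\lambda_0} = \frac{1}{\sigma_0^2}\lim_{T\to\infty}\left(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}\right) + \lim_{n\to\infty}\frac{1}{n}\left(\operatorname{tr}G_n'G_n + \operatorname{tr}G_n^2 - \frac{2(\operatorname{tr}G_n)^2}{n}\right), \tag{C.12}
\]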

�0 =�4 � 3�40�40

limn!1

nXi=1

G2n;ii. (C.13)

For ��0;nT , using the results in Yu, de Jong and Lee (2006) (Theorem A.11, page 19), we have ��0;nT ��2nT (�0)

�20=p

nT a

s�0;nT

+Op

�max

�pnT 3 ;

1pT

��where

as�0;nT =1

ntr�Gn 0 � (H�1

1;nTH2;nT )1In

��X1

h=0Bhn

�S�1n

+1

ntr�Gn�0 � (H�1

1;nTH2;nT )2In

��X1

h=0WnB

hn

�S�1n

is O(1). As �2nT (�0) = �20 + Op

�max

�1pnT; 1T

��from (3.17) so that �20

�2nT (�0)= 1 + Op

�max

�1pnT; 1T

��,

��0;nT =p

nT a

s�0;nT

+ Op

�max

�pnT 3 ;

1pT

��. Also, using (B.17) and (C.9), we have N�0;nT �

�2nT (�0)

�20=p

nT �

mn

n � au�0;nT + Op�max

�pnT 3 ;

1pT

��where au�0;nT = T � (1 � c0H�1

1;nTH2;nT )1

2(1��0) . As�20

�2nT (�0)=

1 +Op

�max

�1pnT; 1T

��, N�0;nT =

pnT �

mn

n � au�0;nT +Op�max

�pnT 3 ;

1pT

��. Hence,

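\[
\Delta_{\lambda_0,nT} + N_{\lambda_0,nT} = \sqrt{\frac{n}{T}}\cdot\left(a^{s}_{\lambda_0,nT} + \frac{m_n}{n}\cdot a^{u}_{\lambda_0,nT}\right) + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt{T}}\right)\right). \qquad \blacksquare \tag{C.14}
\]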

C.5 Proof of Claim 3.4

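First, by the mean value theorem, $\operatorname{tr}(G_n^2(\lambda)) = \operatorname{tr}(G_n^2) + 2\operatorname{tr}(G_n^3(\bar\lambda))(\lambda - \lambda_0)$ where $\bar\lambda$ lies between $\lambda$ and $\lambda_0$. So, $\frac{1}{n}\operatorname{tr}(G_n^2(\lambda)) = \frac{1}{n}\operatorname{tr}(G_n^2) + |\lambda - \lambda_0|\cdot O(1)$ as $\frac{1}{n}\operatorname{tr}(G_n^3(\bar\lambda))$ is uniformly bounded (see Lee (2001), Lemma A.8 on page 22). Second, using (3.16), we can express $\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda)}{\partial\lambda^2}$ in terms of $\hat\sigma^2_{nT}(\lambda)$, $\frac{\partial\hat\sigma^2_{nT}(\lambda)}{\partial\lambda}$ and $\frac{\partial^2\hat\sigma^2_{nT}(\lambda)}{\partial\lambda^2}$. Then, using (3.17), we have the result that

\[
\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda)}{\partial\lambda^2} - \frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda_0)}{\partial\lambda^2} = |\lambda - \lambda_0|\cdot O_p(1) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right). \qquad \blacksquare
\]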
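The mean value step uses $\frac{d}{d\lambda}\operatorname{tr}(G_n^2(\lambda)) = 2\operatorname{tr}(G_n^3(\lambda))$, which follows from $\frac{d}{d\lambda}G_n(\lambda) = G_n^2(\lambda)$ for $G_n(\lambda) = W_nS_n^{-1}(\lambda)$. The following sketch compares that derivative with a central finite difference; it is an illustration under the assumption of a hypothetical row-normalized weights matrix, and the function names are not from the paper.

```python
import numpy as np

def row_normalized_weights(n, seed=0):
    """Hypothetical spatial weights: nonnegative, zero diagonal, rows sum to one."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def g_matrix(lam, W):
    """G_n(lam) = W_n (I_n - lam W_n)^{-1}."""
    n = W.shape[0]
    return W @ np.linalg.inv(np.eye(n) - lam * W)

def dtr_g2_analytic(lam, W):
    """Analytic derivative of tr(G_n(lam)^2): 2 tr(G_n(lam)^3)."""
    G = g_matrix(lam, W)
    return 2.0 * np.trace(G @ G @ G)

def dtr_g2_numeric(lam, W, h=1e-6):
    """Central finite-difference derivative of tr(G_n(lam)^2)."""
    def tr_g2(x):
        G = g_matrix(x, W)
        return np.trace(G @ G)
    return (tr_g2(lam + h) - tr_g2(lam - h)) / (2.0 * h)

W = row_normalized_weights(6)
print(dtr_g2_analytic(0.3, W), dtr_g2_numeric(0.3, W))
```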


C.6 Proof of Claim 3.5

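Using (3.16), we can express $\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda_0)}{\partial\lambda^2}$ in terms of $\hat\sigma^2_{nT}(\lambda_0)$, $\frac{\partial\hat\sigma^2_{nT}(\lambda_0)}{\partial\lambda}$ and $\frac{\partial^2\hat\sigma^2_{nT}(\lambda_0)}{\partial\lambda^2}$ where, using (3.17),

\[
\hat\sigma^2_{nT}(\lambda_0) = \sigma_0^2 + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right), \qquad \frac{\partial\hat\sigma^2_{nT}(\lambda_0)}{\partial\lambda} = -2\frac{1}{n}\sigma_0^2\operatorname{tr}G_n + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right),
\]

\[
\frac{\partial^2\hat\sigma^2_{nT}(\lambda_0)}{\partial\lambda^2} = 2\left(H_{3,nT} - H_{2,nT}'H_{1,nT}^{-1}H_{2,nT}\right) + 2\sigma_0^2\frac{1}{n}\operatorname{tr}G_n'G_n + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right).
\]

Similarly, we can express $\frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2}$ in terms of $\sigma^{*2}_{nT}(\lambda_0)$, $\frac{\partial\sigma^{*2}_{nT}(\lambda_0)}{\partial\lambda}$ and $\frac{\partial^2\sigma^{*2}_{nT}(\lambda_0)}{\partial\lambda^2}$ via (3.20) where, using (B.60),

\[
\sigma^{*2}_{nT}(\lambda_0) = \sigma_0^2 + O\left(\frac{1}{T}\right), \qquad \frac{\partial\sigma^{*2}_{nT}(\lambda_0)}{\partial\lambda} = -2\sigma_0^2\frac{1}{n}\operatorname{tr}G_n + O\left(\frac{1}{T}\right),
\]

\[
\frac{\partial^2\sigma^{*2}_{nT}(\lambda_0)}{\partial\lambda^2} = 2\left(EH_{3,nT} - EH_{2,nT}'(EH_{1,nT})^{-1}EH_{2,nT}\right) + 2\sigma_0^2\frac{1}{n}\operatorname{tr}G_n'G_n + O\left(\frac{1}{T}\right).
\]

Hence, we have the result $\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda_0)}{\partial\lambda^2} - \frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2} = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$. $\blacksquare$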

C.7 Proof of Theorem 3.6

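(3.28) follows from the Taylor expansion $(\hat\lambda_{nT} - \lambda_0) = \left(-\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1}\frac{\partial\ln L_{n,T}(\lambda_0)}{\partial\lambda}$ where $\bar\lambda_{nT}$ lies between $\lambda_0$ and $\hat\lambda_{nT}$. Note that because

\[
-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2} = \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2} - \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda_0)}{\partial\lambda^2}\right)\right) + \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\lambda_0)}{\partial\lambda^2} - \Sigma_{\lambda_0,nT}\right) + \Sigma_{\lambda_0,nT}
\]

where $\Sigma_{\lambda_0,nT} \equiv -\frac{\partial^2Q_{n,T}(\lambda_0)}{\partial\lambda^2}$, we have $-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2} = \Sigma_{\lambda_0,nT} + |\bar\lambda_{nT} - \lambda_0|\cdot O_p(1) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$ according to Claims 3.4 and 3.5 (Appendices C.5 and C.6). Because $|\bar\lambda_{nT} - \lambda_0| = o_p(1)$ as $\hat\lambda_{nT}$ is consistent and $\Sigma_{\lambda_0,nT}$ is positive in the limit from Appendix C.2, $-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}$ is invertible for large $T$ and $\left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1}$ is $O_p(1)$.

According to the Taylor expansion, $\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) = \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1}\cdot\left(\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} - \Delta_{\lambda_0,nT} - N_{\lambda_0,nT}\right)$ where $\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} \xrightarrow{d} N(0, \Sigma_{\lambda_0} + \Omega_{\lambda_0})$ from (C.11) and $\Delta_{\lambda_0,nT} + N_{\lambda_0,nT} = \sqrt{\frac{n}{T}}\cdot\left(a^{s}_{\lambda_0,nT} + \frac{m_n}{n}\cdot a^{u}_{\lambda_0,nT}\right) + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt{T}}\right)\right)$ with $a^{s}_{\lambda_0,nT} + \frac{m_n}{n}\cdot a^{u}_{\lambda_0,nT} = O(1)$ from (C.14). Then, $\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) = O_p(1)\cdot\left(O_p(1) + O\left(\sqrt{\frac{n}{T}}\right)\right)$, which implies that $\hat\lambda_{nT} - \lambda_0 = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$. Hence,

\[
\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) = \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1}\cdot\left(\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} - \Delta_{\lambda_0,nT} - N_{\lambda_0,nT}\right) \tag{C.17}
\]

\[
= \left(\Sigma_{\lambda_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\right)^{-1}\cdot\left(\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} - \Delta_{\lambda_0,nT} - N_{\lambda_0,nT}\right)
\]

using Claim 3.5. Using the fact that

\[
\left(\Sigma_{\lambda_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\right)^{-1} = \Sigma^{-1}_{\lambda_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right) \tag{C.18}
\]

given that $\Sigma_{\lambda_0,nT}$ is positive in the limit, we have

\[
\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) = \left(\Sigma^{-1}_{\lambda_0,nT} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\right)\cdot\left(\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} - \Delta_{\lambda_0,nT} - N_{\lambda_0,nT}\right)
\]

\[
= \Sigma^{-1}_{\lambda_0,nT}\cdot\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\cdot\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda} - \Sigma^{-1}_{\lambda_0,nT}\cdot(\Delta_{\lambda_0,nT} + N_{\lambda_0,nT}) - O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\cdot(\Delta_{\lambda_0,nT} + N_{\lambda_0,nT}),
\]

which implies that

\[
\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) + \Sigma^{-1}_{\lambda_0,nT}\cdot(\Delta_{\lambda_0,nT} + N_{\lambda_0,nT}) + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)\cdot(\Delta_{\lambda_0,nT} + N_{\lambda_0,nT}) = \left(\Sigma^{-1}_{\lambda_0,nT} + o_p(1)\right)\cdot\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{*}_{n,T}(\lambda_0)}{\partial\lambda}. \tag{C.19}
\]

As $\Sigma_{\lambda_0} = \lim_{T\to\infty}\Sigma_{\lambda_0,nT}$ exists, then using Theorem 3.3 and that $\Delta_{\lambda_0,nT} + N_{\lambda_0,nT} = \sqrt{\frac{n}{T}}\cdot\left(a^{s}_{\lambda_0,nT} + \frac{m_n}{n}\cdot a^{u}_{\lambda_0,nT}\right) + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt{T}}\right)\right)$, we have $\sqrt{nT}(\hat\lambda_{nT} - \lambda_0) + \sqrt{\frac{n}{T}}\,b_{\lambda_0,nT} + O_p\left(\max\left(\sqrt{\frac{n}{T^3}}, \frac{1}{\sqrt{T}}\right)\right) \xrightarrow{d} N\left(0, \Sigma^{-1}_{\lambda_0} + \Sigma^{-2}_{\lambda_0}\Omega_{\lambda_0}\right)$. The results in (3.30)-(3.32) are immediate consequences of (3.28). $\blacksquare$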

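C.8 Proof for (4.1)$^{15}$

From the concentrated estimators ((2.9)), $\hat\delta_{nT}(\lambda) = \delta_0 - (\lambda - \lambda_0)H_{1,nT}^{-1}H_{2,nT} + H_{1,nT}^{-1}\left(\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}'S_n(\lambda)S_n^{-1}\tilde V_{nt}\right)$. Using $S_n(\lambda)S_n^{-1} = I_n - (\lambda - \lambda_0)G_n$,

\[
\sqrt{nT}\left(\hat\delta_{n,T}(\hat\lambda_{nT}) - \delta_0\right) = -\sqrt{nT}(\hat\lambda_{nT} - \lambda_0)\left(H_{1,nT}^{-1}H_{2,nT} + H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}'G_n\tilde V_{nt}\right) + H_{1,nT}^{-1}\left(\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\right) \tag{C.20}
\]

\[
= -\sqrt{nT}(\hat\lambda_{nT} - \lambda_0)H_{1,nT}^{-1}H_{2,nT} + H_{1,nT}^{-1}\left(\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\right) + R_{\delta nT}
\]

where $R_{\delta nT} = -\sqrt{nT}(\hat\lambda_{nT} - \lambda_0)H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}'G_n\tilde V_{nt}$.

For the term $H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}'G_n\tilde V_{nt} = H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}^{s\prime}G_n\tilde V_{nt} + H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}^{u\prime}G_n\tilde V_{nt}$, we have $\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}^{s\prime}G_n\tilde V_{nt} = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}\right)\right)$ from Theorem A.7 in Yu, de Jong and Lee (2006) and $H_{1,nT}^{-1}\frac{1}{nT}\sum_{t=1}^{T}\tilde Z_{nt}^{u\prime}G_n\tilde V_{nt} = H_{1,nT}^{-1}c\cdot\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}_{n,t-1}^{u\prime}G_n\tilde V_{nt} = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}\right)\right)$ because $\frac{1}{nT}\sum_{t=1}^{T}\tilde{\tilde Y}_{n,t-1}^{u\prime}G_n\tilde V_{nt} = O_p\left(\max\left(1, \sqrt{\frac{T}{n}}\right)\right)$ from Lemma B.11 and $H_{1,nT}^{-1}\cdot c = O_p(T^{-1})$. Then, because $\hat\lambda_{nT} - \lambda_0 = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$ from (3.28), $H_{1,nT}^{-1}\cdot c = O_p(T^{-1})$ and $c'H_{1,nT}^{-1}\cdot c = O_p(T^{-2})$, we have $R_{\delta nT} = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}, \sqrt{\frac{n}{T^3}}\right)\right)$ and $c'\cdot R_{\delta nT} = O_p\left(\max\left(\frac{1}{T^2}, \frac{1}{\sqrt{nT^3}}, \sqrt{\frac{n}{T^5}}\right)\right)$.

15 Note that the derivation of (4.1) is built up from the estimates of various components of $\hat\theta_{nT}$ in (2.9). The reason is that the conventional mean value theorem cannot be directly applied to $\frac{\partial^2\ln L_{nT}(\theta)}{\partial\theta\partial\theta'}$ at $\theta_0$ for analysis due to technical complications.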

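Also, from (B.49) and $\hat\lambda_{nT} - \lambda_0 = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$,

\[
\sqrt{nT}\left(\hat\sigma^2_{n,T}(\hat\lambda_{nT}) - \sigma_0^2\right) = \frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right) - 2\sigma_0^2\frac{1}{n}\operatorname{tr}G_n\left(\sqrt{nT}(\hat\lambda_{nT} - \lambda_0)\right) + R_{\sigma^2nT}, \tag{C.21a}
\]

\[
R_{\sigma^2nT} = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}, \sqrt{\frac{n}{T^3}}\right)\right). \tag{C.21b}
\]

From the Taylor expansion, $\sqrt{nT}\left(\hat\lambda_{nT} - \lambda_0\right) = \left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1}\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}(\lambda_0)}{\partial\lambda}$ where $\bar\lambda_{nT}$ lies between $\lambda_0$ and $\hat\lambda_{nT}$. From Claim 3.4, Claim 3.5 and (C.18), $\left(-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}(\bar\lambda_{nT})}{\partial\lambda^2}\right)^{-1} = \Sigma^{-1}_{\lambda_0} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$. Using Theorem 3.3,

\[
\sqrt{nT}\left(\hat\lambda_{nT} - \lambda_0\right) = \Sigma^{-1}_{\lambda_0}\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}(\lambda_0)}{\partial\lambda} + R_{\lambda nT} \quad\text{and}\quad R_{\lambda nT} = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}, \sqrt{\frac{n}{T^3}}\right)\right). \tag{C.22a}
\]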

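Hence, we have

\[
\sqrt{nT}\begin{pmatrix} I_{k_x+2} & H_{1,nT}^{-1}H_{2,nT} & 0\\ 0 & 1 & 0\\ 0 & \frac{2}{n}\sigma_0^2\operatorname{tr}G_n & 1\end{pmatrix}\begin{pmatrix}\hat\delta_{n,T}(\hat\lambda_{nT}) - \delta_0\\ \hat\lambda_{nT} - \lambda_0\\ \hat\sigma^2_{nT}(\hat\lambda_{nT}) - \sigma_0^2\end{pmatrix}
\]

\[
= \begin{pmatrix} H_{1,nT}^{-1}\left(\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\right)\\[4pt] \Sigma^{-1}_{\lambda_0}\left(\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde V_{nt}'\left(G_n' - \frac{1}{n}\operatorname{tr}G_n\cdot I_n\right)\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0'\tilde Z_{nt}'G_n' - H_{2,nT}'H_{1,nT}^{-1}\tilde Z_{nt}'\right)\tilde V_{nt}\right)\\[4pt] \frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right)\end{pmatrix} + (R_{\delta nT}', R_{\lambda nT}, R_{\sigma^2nT})'
\]

\[
= \begin{pmatrix} H_{1,nT}^{-1}\sigma_0^2 & 0 & 0\\ 0 & \Sigma^{-1}_{\lambda_0} & 0\\ 0 & 0 & 2\sigma_0^4\end{pmatrix}\times\begin{pmatrix}\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\\[4pt] \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'G_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n\right) - \frac{1}{\sigma_0^2}\frac{1}{n}\operatorname{tr}G_n\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right) + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0'\tilde Z_{nt}'G_n' - H_{2,nT}'H_{1,nT}^{-1}\tilde Z_{nt}'\right)\tilde V_{nt}\\[4pt] \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right)\end{pmatrix} + (R_{\delta nT}', R_{\lambda nT}, R_{\sigma^2nT})'
\]

\[
= \begin{pmatrix} H_{1,nT}^{-1}\sigma_0^2 & 0 & 0\\ 0 & \Sigma^{-1}_{\lambda_0} & 0\\ 0 & 0 & 2\sigma_0^4\end{pmatrix}\times\begin{pmatrix} I_{k_x+2} & 0 & 0\\ -H_{2,nT}'H_{1,nT}^{-1} & 1 & -\frac{2}{n}\sigma_0^2\operatorname{tr}G_n\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\\[4pt] \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'G_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n\right) + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0'\tilde Z_{nt}'G_n'\right)\tilde V_{nt}\\[4pt] \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right)\end{pmatrix} + (R_{\delta nT}', R_{\lambda nT}, R_{\sigma^2nT})'.
\]

Hence,

\[
\sqrt{nT}\begin{pmatrix}\hat\delta_{n,T}(\hat\lambda_{nT}) - \delta_0\\ \hat\lambda_{nT} - \lambda_0\\ \hat\sigma^2_{nT}(\hat\lambda_{nT}) - \sigma_0^2\end{pmatrix} = C_{1,nT}\cdot\begin{pmatrix}\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\\[4pt] \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'G_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n\right) + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0'\tilde Z_{nt}'G_n'\right)\tilde V_{nt}\\[4pt] \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right)\end{pmatrix} + \begin{pmatrix} I_{k_x+2} & H_{1,nT}^{-1}H_{2,nT} & 0\\ 0 & 1 & 0\\ 0 & \frac{2}{n}\sigma_0^2\operatorname{tr}G_n & 1\end{pmatrix}^{-1}\cdot(R_{\delta nT}', R_{\lambda nT}, R_{\sigma^2nT})'
\]

where

\[
C_{1,nT} = \begin{pmatrix} I_{k_x+2} & H_{1,nT}^{-1}H_{2,nT} & 0\\ 0 & 1 & 0\\ 0 & \frac{2}{n}\sigma_0^2\operatorname{tr}G_n & 1\end{pmatrix}^{-1}\begin{pmatrix} H_{1,nT}^{-1}\sigma_0^2 & 0 & 0\\ 0 & \Sigma^{-1}_{\lambda_0} & 0\\ 0 & 0 & 2\sigma_0^4\end{pmatrix}\times\begin{pmatrix} I_{k_x+2} & 0 & 0\\ -H_{2,nT}'H_{1,nT}^{-1} & 1 & -\frac{2}{n}\sigma_0^2\operatorname{tr}G_n\\ 0 & 0 & 1\end{pmatrix}
\]

\[
= \begin{pmatrix} I_{k_x+2} & -H_{1,nT}^{-1}H_{2,nT} & 0\\ 0 & 1 & 0\\ 0 & -\frac{2}{n}\sigma_0^2\operatorname{tr}G_n & 1\end{pmatrix}\times\begin{pmatrix} H_{1,nT}^{-1}\sigma_0^2 & 0 & 0\\ 0 & \Sigma^{-1}_{\lambda_0} & 0\\ 0 & 0 & 2\sigma_0^4\end{pmatrix}\times\begin{pmatrix} I_{k_x+2} & 0 & 0\\ -H_{2,nT}'H_{1,nT}^{-1} & 1 & -\frac{2}{n}\sigma_0^2\operatorname{tr}G_n\\ 0 & 0 & 1\end{pmatrix} = \Sigma^{-1}_{\theta_0,nT}.
\]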
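The simplification of $C_{1,nT}$ uses the fact that the block matrix with $H_{1,nT}^{-1}H_{2,nT}$ and $\frac{2}{n}\sigma_0^2\operatorname{tr}G_n$ in its middle column is inverted by flipping the signs of those off-diagonal entries. A minimal numerical sketch follows; the helper `block_matrix` and the random vector standing in for $H_{1,nT}^{-1}H_{2,nT}$ are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def block_matrix(a, b):
    """Build [[I_k, a, 0], [0, 1, 0], [0, b, 1]] for a vector a of length k and scalar b."""
    k = a.shape[0]
    M = np.eye(k + 2)
    M[:k, k] = a      # upper part of the middle column (role of H_1^{-1} H_2)
    M[k + 1, k] = b   # lower entry of the middle column (role of (2/n) sigma0^2 tr G_n)
    return M

rng = np.random.default_rng(2)
a = rng.normal(size=4)   # stand-in for H_{1,nT}^{-1} H_{2,nT}
b = 0.7                  # stand-in for (2/n) sigma_0^2 tr G_n
M = block_matrix(a, b)
M_inv_claimed = block_matrix(-a, -b)
print(np.allclose(M @ M_inv_claimed, np.eye(a.shape[0] + 2)))
```

The check works because the off-diagonal part is a rank-one matrix whose square is zero, so $(I + N)^{-1} = I - N$ exactly.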

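We note that, from the log likelihood in (2.8), by concentrating out $c_n$ in terms of $\theta = (\delta', \lambda, \sigma^2)'$, the concentrated likelihood of $\theta$ is

\[
\ln L_{n,T}(\theta) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) \tag{C.23}
\]

where $\tilde V_{nt}(\theta) = S_n(\lambda)\tilde Y_{nt} - \tilde Z_{nt}\delta$. It follows that the first derivative of $\ln L_{n,T}(\theta)$ with $\theta$ evaluated at $\theta_0$ is

\[
\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{nT}(\theta_0)}{\partial\theta} = \begin{pmatrix}\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\tilde Z_{nt}'\tilde V_{nt}\\[4pt] \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'G_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n\right) + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\delta_0'\tilde Z_{nt}'G_n'\right)\tilde V_{nt}\\[4pt] \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\right)\end{pmatrix}. \tag{C.24}
\]

Hence,

\[
\sqrt{nT}\begin{pmatrix}\hat\delta_{n,T}(\hat\lambda_{nT}) - \delta_0\\ \hat\lambda_{nT} - \lambda_0\\ \hat\sigma^2_{nT}(\hat\lambda_{nT}) - \sigma_0^2\end{pmatrix} = \Sigma^{-1}_{\theta_0,nT}\cdot\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{nT}(\theta_0)}{\partial\theta} + \begin{pmatrix} I_{k_x+2} & H_{1,nT}^{-1}H_{2,nT} & 0\\ 0 & 1 & 0\\ 0 & \frac{2}{n}\sigma_0^2\operatorname{tr}G_n & 1\end{pmatrix}^{-1}\cdot(R_{\delta nT}', R_{\lambda nT}, R_{\sigma^2nT})'. \tag{C.25}
\]

As $H_{1,nT}^{-1}H_{2,nT} = O_p(1)$ from Proposition B.14 and the elements of $R_{\delta nT}'$, $R_{\lambda nT}$, $R_{\sigma^2nT}$ are $O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right)$, we have $\sqrt{nT}\left(\hat\theta_{nT} - \theta_0\right) = \Sigma^{-1}_{\theta_0,nT}\cdot\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{nT}(\theta_0)}{\partial\theta} + O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}, \sqrt{\frac{n}{T^3}}\right)\right)$.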


C.9 Proof for Theorem 4.1

As ~Znt has stationary and nonstationary parts ((2.12)), we can decompose 1pnT

@ lnLnT (�0)@� into two parts:

1pnT

@ lnLn;T (�0)

@�=

1pnT

@ lnLsn;T (�0)

@�+

1pnT

@ lnLun;T (�0)

@�(C.26)

where 1pnT

@ lnLsn;T (�0)

@� is the stationary part and 1pnT

@ lnLun;T (�0)

@� is the nonstationary part as follows. For1pnT

@ lnLsn;T (�0)

@� , it has two parts 1pnT

@ lnLsn;T (�0)

@� = 1pnT

@ lnLs�n;T (�0)

@� ���0;nT where

1pnT

@ lnLs�nT (�0)

@�=

0BBBBBB@1�20

1pnT

TPt=1

�Zs�0nt Vnt

1�20

1pnT

TPt=1(V 0ntG

0nVnt � �20trGn) + 1

�20

1pnT

TPt=1(�00

�Zs�0nt G0n)Vnt

12�40

1pnT

TPt=1

�V 0ntVnt � n�20

1CCCCCCA (C.27)

and

��0;nT =

0BBB@1�20

qTn (�UsnT;�1;Wn

�UsnT;�1;0n�kx)0 �VnT

1�20

qTn�V 0nTG

0n�VnT +

1�20

qTn (�

00(�UsnT;�1;Wn

�UsnT;�1;0n�kx)0G0n)

�VnT

12�40

qTn�V 0nT

�VnT

1CCCA . (C.28)

For 1pnT

@ lnLun;T (�0)

@� , it also has two parts 1pnT

@ lnLun;T (�0)

@� = 1pnT

@ lnLu�n;T (�0)

@� � N�0;nT where

1pnT

@ lnLu�n;T (�0)

@�=1

�20

1pnT

TXt=1

�Y u�0n;t�1Vnt � (c�0; 0)0 (C.29)

with �Y u�n;t�1 =1

(1��0) (RnJnR�1n )

�cn0~t�1 +

�Xn;t�1�0 + �n;t�1

�, ~t�1 = (t� 1)� T�1

2 and

N�0;nT =1

�20

(1

(1� �0)

rT

n(Mn

��n;T�1)0 � �VnT

)� (c�0; 0)0. (C.30)

Let 1pnT

@ lnL�n;T (�0)

@� = 1pnT

@ lnLs�n;T (�0)

@� + 1pnT

@ lnLu�n;T (�0)

@� . For ��1�0;nT �1pnT

@ lnL�n;T (�0)

@� , because ��1�0;nT �(c�0; 0)0 = O(T�1) according to Proposition 2.1, it has the form of the CLT in Proposition 2.2 and is

normally. For its variance, we can write E( 1pnT

@ lnL�n;T (�0)

@� � 1pnT

@ lnL�n;T (�0)

@�0 ) =

E 1nT

0BBBBBBB@

1�40

�TPt=1

�Z�0ntVnt

��TPt=1

�Z�0ntVnt

�0� �

1�40

�TPt=1(Gn �Z

�nt�0)

0Vnt +TPt=1(V 0ntG

0nVnt � �20trGn)

��TPt=1

�Z�0ntVnt

�00 0

12�60

�TPt=1(V 0ntVnt � n�20)

��TPt=1

�Z�0ntVnt

�00 0

1CCCCCCCA

+E 1nT

0BBBBB@0 0 0

0 1�40

�TPt=1(Gn �Z

�nt�0)

0Vnt +TPt=1(V 0ntG

0nVnt � �20trGn)

�2�

0 12�60

�TPt=1(Gn �Z

�nt�0)

0Vnt +TPt=1(V 0ntG

0nVnt � �20trGn)

��TPt=1(V 0ntVnt � n�20)

�00

1CCCCCA48


+E 1nT

0BBBB@0 0 0

0 0 0

0 0 14�80

�TPt=1(V 0ntVnt � n�20)

��TPt=1(V 0ntVnt � n�20)

�01CCCCA.

As �Z�nt is uncorrelated with Vnt, we have E(1pnT

@ lnL�n;T (�0)

@� � 1pnT

@ lnL�n;T (�0)

@�0 )

=

0BBBBB@1

�20nTE

TPt=1

�Z�0nt�Z�nt

1�20nT

ETPt=1

�Z�0ntGn�Z�nt�0 0

1�20nT

ETPt=1(Gn �Z

�nt�0)

0 �Z�nt1

�20nTE

TPt=1(Gn �Z

�nt�0)

0Gn �Z�nt�0 +

1n

�tr(G0nGn) + tr(G

2n)�

1�20ntr(Gn)

0 1�20ntr(Gn)

12�40

1CCCCCA

+

0BBBBB@0 � �

�3�40nT

nPi=1

Gn;iiE(TPt=1

�Z�nt)i2�3�40nT

nPi=1

Gn;iiE(TPt=1Gn �Z

�nt�0)i +

�4�3�40�40n

nPi=1

G2n;ii �

�32�60nT

l0nETPt=1

�Z�nt1

2�60nT�3l

0nE

TPt=1Gn �Z

�nt�0 +

�4�3�402�60n

trGn�4�3�404�80

1CCCCCA.

As ETPt=1

�Z�nt = 0 and ETPt=1Gn �Z

�nt�0 = 0, the second matrix equals

�0;n =�4 � 3�40�40

0BBB@0 0 0

0 1n

nPi=1

G2n;ii1

2�20ntrGn

0 12�20n

trGn14�40

1CCCA .When Vnt are normally distributed, �0;n = 0 because �4 � 3�40 = 0 for a normal distribution. For the �rstmatrix, premultiplying and postmultiplying it with ��1�0;nT will yield �

�1�0;nT

+ O�1T

�. To see this, denote

�z;nT = ( �Usn;T�1;Wn

�Usn;T�1;0n�kx) and Nz;nT =�

11��0Mn

��n;T�1

�� c0, then, ~Znt = �Z�nt ��z;nT � Nz;nT .

This implies that E( 1pnT

@ lnL�n;T (�0)

@� � 1pnT

@ lnL�n;T (�0)

@�0 ) = ��0;nT +�0;n + ��0;nT where

��0;nT

=

0BBBBB@1

�20nTE

TPt=1

~Z 0nt(Nz;nT +�z;nT ) 1�20nT

ETPt=1

~Z 0ntGn(Nz;nT +�z;nT )�0 0

1�20nT

ETPt=1(Gn ~Znt�0)

0(Nz;nT +�z;nT ) 2�20nT

ETPt=1(Gn ~Znt�0)

0Gn(Nz;nT +�z;nT )�0 0

0 0 0

1CCCCCA

+

0BBBBB@1

�20nTE

TPt=1(Nz;nT +�z;nT )0 ~Znt 1

�20nTE

TPt=1(Nz;nT +�z;nT )0Gn ~Znt�0 0

1�20nT

ETPt=1(Gn(Nz;nT +�z;nT )�0)0 ~Znt 0 0

0 0 0

1CCCCCA

+

0BBBBB@1

�20nTE

TPt=1(Nz;nT +�z;nT )0(Nz;nT +�z;nT ) 1

�20nTE

TPt=1(Nz;nT +�z;nT )0Gn(Nz;nT +�z;nT )�0 0

1�20nT

ETPt=1(Gn(Nz;nT +�z;nT )�0)0(Nz;nT +�z;nT ) 1

�20nTE

TPt=1(Gn(Nz;nT +�z;nT )�0)0Gn(Nz;nT +�z;nT )�0 0

0 0 0

1CCCCCA


=

0BBBBB@1

�20nTE

TPt=1(Nz;nT +�z;nT )0(Nz;nT +�z;nT ) 1

�20nTE

TPt=1(Nz;nT +�z;nT )0Gn(Nz;nT +�z;nT )�0 0

1�20nT

ETPt=1(Gn(Nz;nT +�z;nT )�0)0(Nz;nT +�z;nT ) 1

�20nTE

TPt=1(Gn(Nz;nT +�z;nT )�0)0Gn(Nz;nT +�z;nT )�0 0

0 0 0

1CCCCCAbecause the expectations in the �rst two matrices are all zero.

We have

E1

�20nTE

TXt=1

(Nz;nT +�z;nT )0(Nz;nT +�z;nT )

= c ��

1

(1� �0)21

�20nE��

0n;T�1M

0nMn

��n;T�1

�� c0 +

�1

�20nE�0z;nT�z;nT

�:

+c ��� 1

(1� �0)1

�20nE��

0n;T�1M

0n�z;nT

�+

�� 1

1� �01

�20nE�0z;nTMn

��n;T�1

�c0.

Similarly we can expand 1�20nT

ETPt=1(Nz;nT+�z;nT )0Gn(Nz;nT+�z;nT )�0 and 1

�20nTE

TPt=1(Gn(Nz;nT+�z;nT )�0)0Gn(Nz;nT+

�z;nT )�0. Using the orders of relevant terms from Lemma B.8 and Lemma B.10, we have 1nE�0z;nTBn��n;T�1 =

O(1), 1nE�0z;nTBn�z;nT = O(T�1) and 1

nE��0n;T�1Bn��n;T�1 = O(T ) where Bn is a row sum and column sum

bounded matrix. Using ��1�0;nT � (c�0; 0)0 = O(T�1) and (c�0; 0) � ��1�0;nT � (c

�0; 0)0 = O(T�2) (see Proposition

2.1) and the above, it follows that ��1�0;nT � ��0;nT � ��1�0;nT

= O(T�1). Hence,

��1�0;nTE(1pnT

@ lnL�n;T (�0)

@�� 1pnT

@ lnL�n;T (�0)

@�0)��1�0;nT = �

�1�0;nT

+��1�0;nT�0;n��1�0;nT

+O�T�1

�. (C.31)

Therefore,

��1�0;nT

�1pnT

@ lnL�n;T (�0)

@�0

�p! N(0; lim

T!1��1�0;nT + lim

T!1��1�0;nT�0;n�

�1�0;nT

) (C.32)

Using the results in Yu, de Jong and Lee (2006) (Theorem A.11, page 19), we have ��0;nT =p

nT a

s�0;n

+

Op

�max

�pnT 3 ;

1pT

��. Using (B.17) and (C.30), we have N�0;nT =

pnT �

mn

n �au�0;T

+T �(c�0; 0)�Op�max

�pnT 3 ;

1pT

��where

as�0;n =

0BBBBBBBB@

1n tr

��P1h=0B

hn

�S�1n

�1n tr

�Wn

�P1h=0B

hn

�S�1n

�0

1n 0tr(Gn

�P1h=0B

hn

�S�1n ) + 1

n�0tr(GnWn

�P1h=0B

hn

�S�1n ) + 1

n trGn12�20

1CCCCCCCCA(C.33)

au�0;T = T � 1

2(1� �0)� (c�0; 0)0.

Hence,

��0;nT +N�0;nT =rn

T�(as�0;n+a

u�0;T

mn

n)+T �(c�0; 0)�Op

�max

�rn

T 3;1pT

��+Op

�max

�rn

T 3;1pT

��.

(C.34)

Using ��1�0;nT � (c�0; 0)0 = O(T�1), we have ��1�0;nT

���0;nT + N�0;nT

�=p

nT � b�0;nT +Op

�max

�pnT 3 ;

1pT

��.

Hence, combining (4.1), (C.32) and the equation above, we have the result. �


C.10 Proof for Theorem 4.2

For the remainder term in (C.25), using (C.20), (C.22) and 1�c0H�11;nTH2;nT = O(T

�1) from Proposition

B.14, we have

(c�0; 0)0

0BB@Ikx+2 H�1

1;nTH2;nT 0

0 1 0

0 2n�

20trGn 1

1CCA�10BB@

R�nT

R�nT

R�2nT

1CCA = (c�0; 0)0

0BB@Ikx+2 �H�1

1;nTH2;nT 0

0 1 0

0 � 2n�

20trGn 1

1CCA0BB@R�nT

R�nT

R�2nT

1CCA= c0R�nT + (1� c

0H�11;nTH2;nT )R�nT = Op

�max

�1T 2 ;

1pnT 3

;p

nT 5

��:

Hence,

pnT 3(c�0; 0)(�nT � �0) (C.35)

= T (c�0; 0)��1�0;nT �1pnT

@ lnLnT (�0)

@�+Op

�max

�1

T;1pnT;

rn

T 3

��= T (c�0; 0)��1�0;nT �

1pnT

@ lnLs�nT (�0)

@�+ T (c�0; 0)��1�0;nT �

1pnT

@ lnLu�nT (�0)

@�

�T (c�0; 0)��1�0;nT (��0;nT + N�0;nT ) +Op�max

�1

T;1pnT;

rn

T 3

��where 1p

nT

@ lnLs�nT (�0)@� is in (C.27), 1p

nT

@ lnLu�nT (�0)@� is in (C.29), ��0;nT is in (C.28) and N�0;nT is in (C.30).

We shall investigate the orders of those terms.

For stationary terms, from Yu, de Jong and Lee (2006) (Claim 3.4, page 10), 1pnT

@ lnLs�nT (�0)@� has the

typical Op(1) and ��0;nT � E(��0;nT ) = Op( 1pT) where E(��0;nT ) = O(

pnT ). For the nonstationary term

T (c�0; 0)��1�0;nTN�0;nT = T (c�0; 0)��1�0;nT (c

�0; 0)n

1�20(1��0)

qTn (Mn

��n;T�1)0 � �VnT

o, we have T (c�0; 0)��1�0;nTN�0;nT�

E�T (c�0; 0)��1�0;nTN�0;nT

�= Op(

1pT) where E

�T (c�0; 0)��1�0;nTN�0;nT

�= O(

pnT ) by using (B.17) in Lemma

B.9 and (c�0; 0)��1�0;nT (c�0; 0)0 = O(T�2). For nonstationary term T (c�0; 0)��1�0;nT �

1pnT

@ lnLu�nT (�0)@� , we have

1pnT

@ lnLu�nT (�0)

@�=

1pnT

TXt=1

1

�20(1� �0)Mn

�cn0~t�1 +

�Xn;t�1�0 + �n;t�1

�0Vnt � (c�0; 0)0:

Hence,

pnT 3(c�0; 0)(�nT � �0)

= T (c�0; 0)��1�0;nT �1pnT

@ lnLs�nT (�0)

@�

+T 2(c�0; 0)��1�0;nT (c�0; 0)0 � 1p

nT

TXt=1

1

�20(1� �0)Mn

1

T

�cn0~t�1 +

�Xn;t�1�0 + �n;t�1

�0Vnt

�T (c�0; 0)��1�0;nTE(��0;nT + N�0;nT ) +Op(1pT) +Op

�max

�1

T;1pnT;

rn

T 3

��where T (c�0; 0)��1�0;nTE(��0;nT + N�0;nT ) = O(

pnT ) represents the asymptotic bias term and the �rst two

terms will be asymptotically jointly normally distributed. As (c�0; 0)(�nT � �0) = (�nT + nT + �nT � 1),the rate of convergence of �nT + nT + �nT to the unit is of higher order O(

1pnT 3

) as long as nT 3 ! 0.


So, we have

pnT 3(c�0; 0)(�nT � �0) +

rn

TT (c�0; 0)b�0;nT +Op

�max

�1pT;

rn

T 3

��d! N

�0; limT!1

T 2(c�0; 0)(��1�0;nT + limT!1

��1�0;nT�0��1�0;nT

)(c�0; 0)0�. (C.36)

Also, using Proposition 2.1, limT!1 T2(c�0; 0)��1�0;nT (c

�0; 0)0 = limT!1 !�1nT , where !nT is de�ned in (2.13).

C.11 Proof for Theorem 4.3

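From the first order condition that $\frac{\partial\ln L_{n,T}(\theta, c_n)}{\partial c_n} = \frac{1}{\sigma^2}\sum_{t=1}^{T}V_{nt}(\theta) = 0$, we have $\hat c_{nT}(\theta) = \frac{1}{T}\sum_{t=1}^{T}(S_n(\lambda)Y_{nt} - Z_{nt}\delta)$. As $S_nY_{nt} = Z_{nt}\delta_0 + c_{n0} + V_{nt}$ and $S_n(\lambda)S_n^{-1} = I_n - (\lambda - \lambda_0)G_n$, it implies that $\hat c_{nT}(\theta) = \frac{1}{T}\sum_{t=1}^{T}\left((I_n - (\lambda - \lambda_0)G_n)(Z_{nt}\delta_0 + c_{n0} + V_{nt}) - Z_{nt}\delta\right)$. Hence,

\[
\hat c_{nT}(\theta) - c_{n0} = \frac{1}{T}\sum_{t=1}^{T}\left((I_n - (\lambda - \lambda_0)G_n)(Z_{nt}\delta_0 + c_{n0} + V_{nt}) - Z_{nt}\delta\right) - c_{n0}
\]

\[
= -\frac{1}{T}\sum_{t=1}^{T}\left[Z_{nt}(\delta - \delta_0) + (\lambda - \lambda_0)(G_nc_{n0} + G_nZ_{nt}\delta_0) - (I_n - (\lambda - \lambda_0)G_n)V_{nt}\right]
\]

\[
= -\frac{1}{T}\sum_{t=1}^{T}\left[Z^{s}_{nt}(\delta - \delta_0) + (\lambda - \lambda_0)(G_nc_{n0} + G_nZ^{s}_{nt}\delta_0) - (I_n - (\lambda - \lambda_0)G_n)V_{nt}\right] - \frac{1}{T}\sum_{t=1}^{T}\left[Z^{u}_{nt}(\delta - \delta_0) + (\lambda - \lambda_0)(G_nZ^{u}_{nt}\delta_0)\right].
\]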
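The expansion of $\hat c_{nT}(\theta) - c_{n0}$ is pure algebra and can be verified on simulated arrays. The sketch below is an illustration under stated assumptions: the regressors are drawn as fixed random arrays rather than lagged outcomes (the identity does not depend on how $Z_{nt}$ is generated), the weights matrix is a hypothetical row-normalized one, and all function and variable names are invented for the example.

```python
import numpy as np

def fixed_effect_gap(W, Z, V, c0, delta0, lam0, delta, lam):
    """Return (direct, expanded) versions of c_hat_nT(theta) - c_n0.

    Z has shape (T, n, k) and V has shape (T, n); both are simulated inputs.
    """
    T, n, _ = Z.shape
    I = np.eye(n)
    S0_inv = np.linalg.inv(I - lam0 * W)
    G = W @ S0_inv
    a = lam - lam0
    direct = np.zeros(n)
    expanded = np.zeros(n)
    for t in range(T):
        # Data generated from S_n Y_nt = Z_nt delta0 + c_n0 + V_nt.
        Y_t = S0_inv @ (Z[t] @ delta0 + c0 + V[t])
        # Direct definition: average of S_n(lam) Y_nt - Z_nt delta.
        direct += (I - lam * W) @ Y_t - Z[t] @ delta
        # Bracketed term of the expansion in the text.
        expanded += (Z[t] @ (delta - delta0)
                     + a * (G @ c0 + G @ (Z[t] @ delta0))
                     - (I - a * G) @ V[t])
    return direct / T - c0, -expanded / T

rng = np.random.default_rng(3)
n, T, k = 5, 8, 3
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)
Z = rng.normal(size=(T, n, k))
V = rng.normal(size=(T, n))
c0 = rng.normal(size=n)
delta0 = np.array([0.4, 0.2, 1.0])
delta = delta0 + np.array([0.05, -0.03, 0.1])
direct, expanded = fixed_effect_gap(W, Z, V, c0, delta0, 0.3, delta, 0.35)
print(np.max(np.abs(direct - expanded)))
```

Evaluating at $\theta = \theta_0$ leaves only the time average of $V_{nt}$, which is the term that drives the $\sqrt{T}$ rate for each fixed effect.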

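As $\frac{1}{T}\sum_{t=1}^{T}\left[Z^{u}_{nt}(\delta - \delta_0) + (\lambda - \lambda_0)(G_nZ^{u}_{nt}\delta_0)\right] = \left(\frac{1}{T}\sum_{t=1}^{T}Y^{u}_{n,t-1}\right)(\gamma + \rho + \lambda - 1)$, we have

\[
\frac{1}{T}\sum_{t=1}^{T}\left[Z^{u}_{nt}\left(\hat\delta_{nT} - \delta_0\right) + \left(\hat\lambda_{nT} - \lambda_0\right)(G_nZ^{u}_{nt}\delta_0)\right] = \left(\frac{1}{T^2}\sum_{t=1}^{T}Y^{u}_{n,t-1}\right)T\cdot(\hat\gamma_{nT} + \hat\rho_{nT} + \hat\lambda_{nT} - 1).
\]

From Theorem 4.2, $T\cdot(\hat\gamma_{nT} + \hat\rho_{nT} + \hat\lambda_{nT} - 1) = O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}\right)\right)$. From (2.4), elements of $\frac{1}{T^2}\sum_{t=1}^{T}Y^{u}_{n,t-1}$ are $O_p(1)$ if elements of $\frac{Y_{n,-1}}{T}$ are $O_p(1)$. Then, for each fixed effect, we have

\[
\hat c_{i,nT}(\hat\theta_{nT}) - c_{i,0} = -\frac{1}{T}\sum_{t=1}^{T}\left((G_nc_{n0} + G_nZ^{s}_{nt}\delta_0)_i, (Z^{s}_{nt})_i\right)\cdot\begin{pmatrix}\hat\lambda_{nT} - \lambda_0\\ \hat\delta_{nT} - \delta_0\end{pmatrix} + \frac{1}{T}\sum_{t=1}^{T}\left\{\left(I_n - (\hat\lambda_{nT} - \lambda_0)G_n\right)V_{nt}\right\}_i + O_p\left(\max\left(\frac{1}{T}, \frac{1}{\sqrt{nT}}\right)\right), \tag{C.37}
\]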

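where $(Z^{s}_{nt})_i$ is the $i$th row of $Z^{s}_{nt}$ and $(G_nc_{n0} + G_nZ^{s}_{nt}\delta_0)_i$ is the $i$th element of $(G_nc_{n0} + G_nZ^{s}_{nt}\delta_0)$. As elements of $\frac{1}{T}\sum_{t=1}^{T}\left((G_nc_{n0} + G_nZ^{s}_{nt}\delta_0)_i, (Z^{s}_{nt})_i\right)$ are $O_p(1)$ uniformly in $n$ and $i$, as implied by Lemma B.4 of Yu, de Jong and Lee (2006), and $\hat\theta_{nT} - \theta_0 = O_p\left(\max\left(\frac{1}{\sqrt{nT}}, \frac{1}{T}\right)\right)$ by Theorem 3.6, the dominant term of $\sqrt{T}(\hat c_{i,nT}(\hat\theta_{nT}) - c_{i,0})$ would be $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}v_{it} + O_p\left(\frac{1}{\sqrt{n}}\right)$ when $T \to \infty$ where the $O_p\left(\frac{1}{\sqrt{n}}\right)$ term is the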


�1T 2

TPt=1Y un;t�1

�term multiplied by the distribution part of T �( nT+�nT+�nT�1), which is T (c�0; 0)��1�0;nT �

1npT

@ lnL�nT (�0)@� (see (C.35)). So, for each �xed e¤ect,

pT�ci;nT (�nT )� ci;0

�=

1pT

TXt=1

vit+1pn

1

T 2

TXt=1

Y un;t�1

!i

�[T (c�0; 0)��1�0;nT ]�

1pnT

@ lnL�nT (�0)

@�

�+Op

�1pT

�(C.38)

where 1pnT

@ lnL�nT (�0)@� = 1p

nT

@ lnLs�nT (�0)@� + 1p

nT

@ lnLu�nT (�0)@� (de�ned in (C.27) and (C.29)) and [T (c�0; 0)��1�0;nT ]�

1pnT

@ lnL�(�0)@� is normally distributed asymptotically with the variance speci�ed in (4.11). Hence,

pT�ci;nT (�nT )� ci;0

�is a linear and quadratic form of Vnt and it will be normally distributed asymptotically using the central

limit theorem by Proposition 2.2. We need to calculate its variance.

Under the assumption that (Yn;�1=T )i � E(Yn;�1=T )i = op(1) and E(Yn;�1=T )i = O(1) uniformly in n

and i, we have 1T 2

TPt=1(Y un;t�1)i = E

1T 2

TPt=1(Y un;t�1)i + op(1) where E

1T 2

TPt=1(Y un;t�1)i is O(1). Hence,

pT�ci;nT (�nT )� ci;0

�=

1pT

TXt=1

vit+1pn

E1

T 2

TXt=1

Y un;t�1

!i

�[T (c�0; 0)��1�0;nT ]�

1pnT

@ lnL�nT (�0)

@�

�+op(1).

(C.39)

As 1pnT

@ lnL�nT (�0)@� =

0BBBBBB@1�20

1pnT

TPt=1

�Z�0ntVnt

1�20

1pnT

TPt=1(V 0ntG

0nVnt � �20trGn) + 1

�20

1pnT

TPt=1(�00 �Z

�0ntG

0n)Vnt

12�40

1pnT

TPt=1

�V 0ntVnt � n�20

1CCCCCCA where �Z�nt is de�ned

in (C.10), the asymptotic variance ofpT�ci;nT (�nT )� ci;0

�would be �n;ci where

�n;ci = �20 +2

n

E1

T 2

TXt=1

Y un;t�1

!i

��3

�[T (c�0; 0)��1�0;nT ] � [0; Gii; 1]

0��

(C.40)

+2

n

E1

T 2

TXt=1

Y un;t�1

!i

�20

[T (c�0; 0)��1�0;nT ] � [(

TXt=1

E �Z�nt; )i; (TXt=1

EGn �Z�nt�0; )i; 0]

0

!!

+1

n

0@ E 1

T 2

TXt=1

Y un;t�1

!2i

�limT!1

!�1nT + limT!1

T 2(c�0; 0)( limT!1

��1�0;nT�0;nT��1�0;nT

)(c�0; 0)0�1A .

When n!1, we have �n;ci ! �20. �

C.12 Proof for Theorem 4.5

Theorem 4.1 states thatpnT (�nT � �0) +

pnT b�0;nT + Op

�max

�1T ;p

nT 3

�� d! N(0; limT!1��1�0;nT

+

limT!1��1�0;nT

�0;n��1�0;nT

). As the bias corrected estimator �1

nT = �nT +1T b�nT ;nT , we have

pnT (�

1

nT ��0)

d! N(0; limT!1��1�0;nT

+ limT!1��1�0;nT

�0;n��1�0;nT

) ifp

nT

�b�nT ;nT � b�0;nT

�p! 0 and n

T 3 ! 0.

So, given nT 3 ! 0, we are going to prove that

pnT

�b�nT ;nT � b�0;nT

�p! 0 where b�0;nT = ��1�0;nT ��

as�0;n + au�0;T

mn

n

�and b�nT ;nT =

���1�nT ;nT

��as�nT ;n

+ au�nT ;T

mn

n

�. As ��1�0;nT =

���1�nT ;nT

+Op

�max

�1pnT; 1T

��53


and T �h���1�nT ;nT

� ��1�0;nTi�(c�0; 0)0 = Op

�max

�1pnT; 1T

��from Proposition B.15,

pnT

�b�nT ;nT � b�0;nT

�p!

0 is reduced to rn

T

���1�0;nTa

u�nT ;n

� ��1�0;nTau�0;T

�p! 0 (C.41)

and rn

T

�as�nT ;n

� as�0;n�

p! 0. (C.42)

For (C.41), as au�0;nT = T1

2(1��0) (c�0; 0)0 with ��1�0;nT �(c

�0; 0)0 = O(T�1),p

nT

���1�0;nTa

u�nT ;n

� ��1�0;nTau�0;n

�=p

nT

�T � ��1�0;nT (c

�0; 0)0���

12(1��nT )

� 12(1��0)

�=p

nT

�T � ��1�0;nT (c

�0; 0)0���

�nT��02(1��nT )(1��0)

�. As �nT��0 =

Op

�max

�1T ;

1pnT

��, we have

pnT

���1�0;nTa

u�nT ;n

� ��1�0;nTau�0;n

�p! 0 if n

T 3 ! 0.

For (C.42), as �nT ��0 = Op�max

�1T ;

1pnT

��and asn(�0) is O(1) where a

sn(�0) � as�0;n, according to the

Taylor expansion of asn(�nT ) around asn(�0), to prove (C.42) is reduced to proving that elements of

@asn(��nT )

@�0

are O(1) where ��nT lies between �nT and �0 and

asn(�) =

0BBBBBBBB@

1n tr

��P1h=0B

hn(�)

�S�1n (�)

�1n tr

�Wn

�P1h=0B

hn(�)

�S�1n (�)

�0

1n tr(Gn(�)

�P1h=0B

hn(�)

�S�1n (�)) + 1

n�tr(GnWn

�P1h=0B

hn(�)

�S�1n (�)) + 1

n trGn(�)

12�2

1CCCCCCCCA.

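From Proposition B.2, for $A_n(\theta) = (I_n - \lambda W_n)^{-1}(\gamma I_n + \rho W_n)$ where $W_n$ is diagonalizable as $W_n = R_n D^{*}_n R^{-1}_n$, we have that $A_n(\theta)$ is diagonalizable as $A_n(\theta) = R_n D_n(\theta) R^{-1}_n$, with its eigenvalue matrix $D_n(\theta) = (I_n - \lambda D^{*}_n)^{-1}(\gamma I_n + \rho D^{*}_n)$. As $B_n(\theta) = R_n \tilde D_n(\theta) R^{-1}_n$ with $\tilde D_n(\theta) = \mathrm{Diag}(0, \cdots, 0, d_{n,m_n+1}, \cdots, d_{nn})$ so that $D_n(\theta) = J_n + \tilde D_n(\theta)$ where $J_n = \mathrm{Diag}\{1'_{m_n}, 0, \cdots, 0\}$, we have

$$B_n(\theta) = R_n\left(D_n(\theta) - J_n\right)R^{-1}_n = R_n (I_n - \lambda D^{*}_n)^{-1}(\gamma I_n + \rho D^{*}_n) R^{-1}_n - R_n J_n R^{-1}_n.$$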
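The decomposition of $A_n$ into a unit-root part $R_n J_n R_n^{-1}$ and a stable part $B_n$ can be illustrated on a small example. The sketch below is not taken from the paper: the weights matrix (equal weights on all other units), the parameter values and all helper names are assumptions chosen so that $\gamma + \rho + \lambda = 1$ and $W_n$ is symmetric. In that case the unit eigenvalue of $W_n$ has multiplicity $m_n = 1$ and $R_n J_n R_n^{-1}$ reduces to the matrix with every entry equal to $1/n$.

```python
def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def add(a, b, sign=1.0):
    """Return a + sign * b, element by element."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matpow(a, h):
    out = identity(len(a))
    for _ in range(h):
        out = matmul(out, a)
    return out


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for small examples)."""
    n = len(m)
    aug = [row[:] + identity(n)[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def build_example(n=4, gamma=0.4, rho=0.2, lam=0.4):
    """Return (A, P, B) for an assumed W_n with gamma + rho + lam = 1."""
    # Assumed W_n: equal weight 1/(n-1) on every other unit (symmetric, row-normalized).
    W = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
    I = identity(n)
    S_inv = inverse(add(I, scale(lam, W), sign=-1.0))       # (I - lam W)^{-1}
    A = matmul(S_inv, add(scale(gamma, I), scale(rho, W)))  # A_n
    # For this symmetric W the unit eigenvector is proportional to a vector of
    # ones, so R_n J_n R_n^{-1} is the matrix with all entries 1/n.
    P = [[1.0 / n] * n for _ in range(n)]
    B = add(A, P, sign=-1.0)                                # B_n = A_n - R J R^{-1}
    return A, P, B


if __name__ == "__main__":
    A, P, B = build_example()
    for h in (1, 2, 5, 10):
        gap = max_abs_diff(matpow(A, h), add(P, matpow(B, h)))
        print(f"h={h:2d}  max|A^h - (R J R^-1 + B^h)| = {gap:.2e}")
```

Because $R_n J_n R_n^{-1}$ and $B_n$ annihilate each other in this example, $A_n^h = R_n J_n R_n^{-1} + B_n^h$ for every $h \geq 1$, and the powers of $B_n$ decay geometrically while the unit-root part does not.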

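With $B_n(\theta)$ as a function explicitly in $\theta$, $\frac{\partial B_n(\theta)}{\partial\theta'}$ can be easily evaluated. Because $\frac{\partial B^h_n(\theta)}{\partial\theta'} = h B^{h-1}_n(\theta)\frac{\partial B_n(\theta)}{\partial\theta'}$ for $h \geq 1$ (see footnote 9 in Yu, de Jong and Lee (2006)), we have $\sum_{h=1}^{\infty}\frac{\partial B^h_n(\theta)}{\partial\theta'} = \sum_{h=1}^{\infty} h B^{h-1}_n(\theta)\frac{\partial B_n(\theta)}{\partial\theta'}$.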

As (1) $\sum_{h=0}^{\infty} B^h_n(\theta)$ and $\sum_{h=1}^{\infty} h B^{h-1}_n(\theta)$ are uniformly bounded in either row sum or column sum, uniformly in a neighborhood of $\theta_0$, (2) $S^{-1}_n(\lambda)$ is uniformly bounded in both row and column sums, also uniformly in $\lambda$ in a neighborhood of $\lambda_0$, and (3) $W_n$ is uniformly bounded in both row and column sums, we have the result that the elements of $\frac{\partial a^s_n(\theta)}{\partial\theta'}$ will be uniformly bounded in $n$ in a neighborhood of $\theta_0$. As $\bar\theta_{nT}$ converges in probability to $\theta_0$, we conclude that the elements of $\frac{\partial a^s_n(\bar\theta_{nT})}{\partial\theta'}$ are $O_p(1)$.

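For (4.15), we can start from (4.11). Similarly, we can prove $\sqrt{\frac{n}{T}}\,T\,(c', 0)\left(b_{\hat\theta_{nT},nT} - b_{\theta_0,nT}\right) \overset{p}{\to} 0$. Hence,

$$\sqrt{nT^3}\,(c', 0)\left(\hat\theta^1_{nT} - \theta_0\right) \overset{d}{\to} N\left(0,\ \lim_{T\to\infty} T^2 (c', 0)\left(\Sigma^{-1}_{\theta_0,nT} + \Sigma^{-1}_{\theta_0,nT}\,\Omega_{\theta_0,n}\,\Sigma^{-1}_{\theta_0,nT}\right)(c', 0)'\right)$$

under $\frac{n}{T^3} \to 0$, where $\lim_{T\to\infty} T^2 (c', 0)\,\Sigma^{-1}_{\theta_0,nT}\,(c', 0)' = \lim_{T\to\infty}\omega^{-1}_{nT}$ using Proposition 2.1. ∎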


References

[1] Dhrymes, P. (1978), Mathematics for Econometrics, Springer-Verlag.

[2] Choi, I. (2004), Nonstationary Panels, Palgrave Handbooks of Econometrics, Vol. 1, forthcoming (Jul., 2004).

[3] Horn, R. and C. Johnson (1985), Matrix Analysis, Cambridge University Press.

[4] Im, K.S., M.H. Pesaran and S. Shin (2003), Testing for Unit Roots in Heterogeneous Panels, Journal of Econometrics, 115, 53-74.

[5] Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate Finance and Economics, Vol. 17:1, 99-121.

[6] Kelejian, H.H. and I.R. Prucha (2001), On the Asymptotic Distribution of the Moran I Test Statistic With Applications, Journal of Econometrics, 104, 219-257.

[7] Lee, L.F. (2001), Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Econometric Models I: Spatial Autoregressive Process, Working Paper, The Ohio State University.

[8] Lee, L.F. (2004), Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Econometric Models, Econometrica, Vol. 72, No. 6, 1899-1925.

[9] Levin, A., C-F. Lin and C-S.J. Chu (2002), Unit Root Tests in Panel Data: Asymptotic and Finite Sample Properties, Journal of Econometrics, 108, 1-24.

[10] Maddala, G.S. and S. Wu (1999), A Comparative Study of Unit Root Tests With Panel Data and A New Simple Test, Oxford Bulletin of Economics and Statistics, 61, 631-652.

[11] Moon, H.R. and B. Perron (2004), Testing for A Unit Root in Panels With Dynamic Factors, Journal of Econometrics, 122, 81-126.

[12] Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of the American Statistical Association, 70, 120-126.

[13] Pesaran, M.H. (2003), A Simple Panel Unit Root Test in the Presence of Cross Section Dependence, Mimeo, Trinity College, Cambridge.

[14] Phillips, P.C.B. and D. Sul (2003), Dynamic Panel Estimation and Homogeneity Testing Under Cross Section Dependence, Econometrics Journal, 6, 217-259.

[15] Tao, J. (2006), Analyzing Local School Expenditure in A Dynamic Game, Working Paper, Shanghai University of Finance and Economics.

[16] Rothenberg, T.J. (1971), Identification in Parametric Models, Econometrica, Vol. 39, No. 3, 577-591.

[17] Yu, J. (2006), Convergence: A Spatial Dynamic Panel Data Approach, Working Paper, The Ohio State University.

[18] Yu, J., R. de Jong and L-F. Lee (2006), Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel Data With Fixed Effects When Both n and T Are Large, Working Paper, The Ohio State University.