
Simultaneous Equations Models with Higher-Order Spatial or Social Network Interactions

David M. Drukker1, Peter H. Egger2, and Ingmar R. Prucha3,

May 24, 2019 (This Version)

1 StataCorp, 4905 Lakeway Drive, College Station, TX 77845, Tel.: +1-979-696-4600, e-mail: [email protected]

2 ETH Zurich, Leonhardstrasse 21, 8092 Zurich, Switzerland; Tel.: +41-44-632-4108, e-mail: [email protected]

3 Department of Economics, University of Maryland, College Park, MD 20742, Tel.: +1-301-405-3499, e-mail: [email protected]

Abstract

This paper develops an estimation methodology for network data generated

from a system of simultaneous equations. Our specification allows for network

interdependencies via spatial lags in the endogenous and exogenous variables, as

well as in the disturbances. By allowing for higher-order spatial lags our spec-

ification provides important flexibility in modeling network interactions. The

estimation methodology builds on the generalized method of moments (GMM)

estimation approach introduced in Kelejian and Prucha (1998, 1999, 2004). The

paper considers limited and full information estimators, and one- and two-step

estimators, and establishes their asymptotic properties. In contrast to some of

the earlier literature our asymptotic results facilitate joint tests for the absence

of all forms of network spillovers.

Key Words: Cliff-Ord spatial model; Limited information estimation; Full in-

formation estimation; Two-stage least squares estimation; Three-stage least

squares estimation; Generalized method of moments estimation

JEL Classification: C21, C31

1 Introduction1

In this paper we develop a generalized estimation theory for simultaneous equa-

tion systems for cross sectional data with possible network interactions in the

dependent variables, the exogenous variables and the disturbances. A leading

application will be spatial networks. However, since network interdependencies

are modeled only to relate to a measure of proximity, without assuming that ob-

servations are indexed by location, the developed methodology can be of interest

to the estimation of a much wider class of networks, including social networks.

There is substantial empirical evidence of cross sectional interdependence

among observations in many areas of economics both at the macro level, where

cross sectional units may, e.g., be countries, states or counties, as well as at

the micro level where cross sectional units may, e.g., be industries, firms or

individuals.2

An important class of models for spatial networks originates from the seminal

work by Whittle (1954) and Cliff and Ord (1971, 1983). In those models cross-

sectional interactions are modeled through spatial lags, where the weights used

in forming the spatial lags are reflective of the relative importance of the links

between neighbors for the generation of spillovers. In a spatial setting the

relative importance would typically be taken to be inversely related to a measure

of distance. The usefulness of those models for the analysis of a wide class of

networks beyond spatial networks stems from the recognition that the notion of

distance is not confined to geographic distance.

Spatial econometrics has a long history in geography, regional science and

urban economics; see, e.g., Anselin (1988). For the last two decades the devel-

opment of econometric methods of inference for Cliff-Ord type models has also

been an active area of research in economics.3 Most of the literature focused on single-equation models where a single dependent variable, say, $y$, is determined for units $i = 1, \ldots, n$. However, in economics it is frequent that the outcomes for several dependent variables, say, $y_1, \ldots, y_G$, are determined jointly by a system of equations for units $i = 1, \ldots, n$. In this case the simultaneous nature of the

1 We gratefully acknowledge financial support from the National Institutes of Health through SBIR grants R43 AG027622 and R44 AG027622. We also thank CESifo for their hospitality.

2 For instance, Conley and Ligon (2002) or Ertur and Koch (2007) document spatial spillovers in economic growth. Holtz-Eakin (1994) or Audretsch and Feldmann (1996) put forward evidence for spatial spillovers in productivity. Egger, Pfaffermayr, and Winner (2005) provide evidence for spatial interdependencies of value added tax rates, and Devereux, Lockwood, and Redoano (2008) for spatial corporate tax rates. Case, Hines, and Rosen (1993) report on spatial budget spillovers. The results in Behrens, Ertur, and Koch (2012) suggest that bilateral trade flows exhibit spatial interdependence, and Baltagi, Egger, and Pfaffermayr (2007, 2008) and Blonigen, Davies, Waddell, and Naughton (2007) illustrate that the same holds true for bilateral foreign direct investment. Pinkse, Slade, and Brett (2002) provide evidence for spatial price competition among wholesale-gasoline terminals. For contributions to the literature on social interactions see, e.g., Ballester, Calvó-Armengol and Zenou (2006), Lee (2007), Calvó-Armengol, Patacchini and Zenou (2009), Blume, Brock, Durlauf and Ioannides (2011), Cohen-Cole, Liu and Zenou (2018), and Liu (2014).

3 See, e.g., Anselin (2010) for a review of the development of spatial econometric methods.

outcomes can stem from two sources, interactions between different economic

variables as well as interactions between cross sectional units. Surprisingly, the

literature on the estimation of simultaneous systems of spatially interrelated

cross sectional equations has so far been limited. Kelejian and Prucha (2004)

provide, by extending the methodology developed in Kelejian and Prucha (1998,

1999) for single equations, an early development of generalized method of mo-

ments estimators for such models. However, as discussed in more detail below,

they do not provide a full asymptotic theory for all considered estimators and

their setup only covers first-order spatial lags. Liu (2014) and Cohen-Cole,

Liu, and Zenou (2018) build on the methodology of Kelejian and Prucha (2004)

within the context of social interaction models, and provide further refinements,

which include a bias correction procedure for many instruments.

Other recent contributions to the literature on spatial simultaneous equation models include Baltagi and Deng (2015), who consider an extension of a two-equation system with first-order spatial lags to panels. Wang, Li, and Wang

(2014) analyze the quasi maximum likelihood estimator for such a system in the

cross section. Yang and Lee (2017) consider the quasi maximum likelihood esti-

mator for a multi-equation system with a first-order spatial lag in the dependent

variable. In contrast to the current paper, those papers only consider first-order

spatial lags in the dependent variable, and do not also consider spatial spillovers

in the disturbance process. Those papers also differ in terms of the considered

estimation methodology.

An important limitation of the estimation methodology developed in Kele-

jian and Prucha (2004) is that the paper only establishes the consistency, but

not the asymptotic normality, of important spatial parameters. As a result, the

methodology does not facilitate a joint test for the absence of spatial interac-

tions in the dependent variables, the exogenous variables and the disturbances.

Closely related to this is that Kelejian and Prucha (2004) do not consider ef-

ficiently weighted GMM estimators for the spatial autoregressive parameters,

since that paper lacked the knowledge of the limiting distribution for those

estimators.

Another important limitation of the earlier paper is that it only allowed for

first-order spatial lags. Allowing for higher-order spatial lags is important for

at least two reasons. First, the researcher may not be sure about the channel through which interactions occur, e.g., through geographic proximity, or technological proximity, or both. By allowing for higher-order spatial lags the

researcher can consult the data on this issue. Second, as argued below, higher-

order spatial lags can be used to relax the requirement regarding a priori knowl-

edge of what weights should be assigned to different units in the construction

of a spatial lag.

The paper is organized as follows: Section 2 specifies the considered simul-

taneous equation system with spatial/cross sectional network interactions. In

Section 3 we discuss two exemplary applications. The first example highlights

how higher-order spatial lags can be used to achieve a more flexible specification

of the spatial weights. The second example considers a social interaction model

where individuals make interdependent choices on the level of effort for multi-


ple activities. In Section 4, we discuss the moment conditions underlying the

considered GMM estimators for the regression parameters and the parameters

of the disturbance process. The paper focuses on two-step estimation proce-

dures. It turns out that the distribution of the GMM estimator for the spatial

autoregressive parameters of the disturbance process depends on the estimator

of regression parameters. Section 5 is hence devoted to give generic results con-

cerning the consistency and asymptotic normality of two-step GMM estimators.

In particular, we give generic results concerning the joint limiting distribution

of estimators for all model parameters of interest, which can be utilized in the

usual way to form general Wald tests regarding the model parameters. Results

for one-step estimators are in essence delivered as a special case of two-step es-

timation. In Sections 6 and 7 we introduce specific limited and full information

estimators, and provide specific expressions for consistent estimators of the as-

sociated asymptotic variance-covariance (VC) matrices of those estimators. The

last section concludes with a summary of our findings and possible directions

for future research. All technical derivations are given in appendices, including

a supplemental appendix. The latter will be made available on our web sites

and also contains some initial Monte Carlo results.

Throughout the paper we adopt the following notations and conventions. Let $(A_n)_{n \in \mathbb{N}}$ be some sequence of matrices, then we denote the $(i,j)$-th element of $A_n$ with $a_{ij,n}$. If $A_n$ is nonsingular, then we denote its inverse with $A_n^{-1}$, and the $(i,j)$-th element of $A_n^{-1}$ with $a_n^{ij}$. Let $A_n$ be of dimension $p_n \times p_n$, then the maximum column sum and row sum matrix norms of $A_n$ are, respectively, defined as

$$\|A_n\|_1 = \max_{1 \le j \le p_n} \sum_{i=1}^{p_n} |a_{ij,n}| \quad \text{and} \quad \|A_n\|_\infty = \max_{1 \le i \le p_n} \sum_{j=1}^{p_n} |a_{ij,n}|.$$

If $\|A_n\|_1 \le c$ and $\|A_n\|_\infty \le c$ for some finite constant $c$ which does not depend on $n$, then we say that the row and column sums of the sequence of matrices $A_n$ are uniformly bounded in absolute value. We note that if the row and column sums of the matrices $A_n$ and $B_n$ are uniformly bounded in absolute value, then so are the row and column sums of $A_n + B_n$ and $A_n B_n$; cf., e.g., Kapoor, Kelejian and Prucha (2007, Remark A2). For any square matrix $A_n$, $A_n^{s} = (A_n + A_n')/2$, and for any vector or matrix $A_n$, $\|A_n\| = [\operatorname{tr}(A_n' A_n)]^{1/2}$, where $\operatorname{tr}$ denotes the trace operator.
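The norms just defined are easy to compute directly. The following is a minimal sketch, assuming NumPy; the helper names and the two example matrices are illustrative and do not come from the paper. It also checks, for this one pair of matrices, the bounds behind the remark that sums and products inherit bounded row sums.

```python
import numpy as np

def max_col_sum_norm(a):
    """Maximum absolute column sum (the matrix 1-norm)."""
    return np.abs(a).sum(axis=0).max()

def max_row_sum_norm(a):
    """Maximum absolute row sum (the matrix infinity-norm)."""
    return np.abs(a).sum(axis=1).max()

def frobenius_norm(a):
    """[tr(A'A)]^(1/2)."""
    return np.sqrt(np.trace(a.T @ a))

# Two small illustrative matrices with zero diagonals.
A = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.25, 0.75, 0.0]])
B = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

# The triangle inequality and submultiplicativity bound the row sums
# of A + B and AB by those of A and B.
norm_sum = max_row_sum_norm(A + B)
norm_prod = max_row_sum_norm(A @ B)
```

The same two inequalities hold for the column sum norm, which is why uniform boundedness carries over to sums and products of such sequences.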

2 Model

In the following, we specify our simultaneous system of equations for $G$ endogenous variables observed for $n$ cross sectional units. This system allows for two sources of simultaneity. First, the observations for the $g$-th endogenous variable for the $i$-th unit may depend on observations of the other endogenous variables for the $i$-th unit, as in the classical textbook simultaneous equation system.4 Second, simultaneity may stem from Cliff and Ord (1973, 1981) type higher-order cross-sectional network interactions, where spatial interactions represent a leading application.

4 Our specification differs from, e.g., Anselin (1988) and Wang, Lee and Bao (2018) who consider systems for one variable.

As remarked, the model specification will be fairly general and allows for network interactions modeled by, possibly, higher-order spatial lags in the dependent variables, the exogenous variables and the disturbances. More specifically, we assume that the cross-sectional data are generated by the following system ($g = 1, \ldots, G$):5

$$y_{g,n} = \sum_{l=1}^{G} b_{lg,n}\, y_{l,n} + \sum_{k=1}^{K} c_{kg,n}\, x_{k,n} + \sum_{l=1}^{G} \left[ \sum_{r=1}^{p} \lambda_{lg,r,n} W_{r,n} \right] y_{l,n} + u_{g,n}, \qquad (1)$$

$$u_{g,n} = \left[ \sum_{s=1}^{q} \rho_{g,s,n} M_{s,n} \right] u_{g,n} + \varepsilon_{g,n},$$

where $y_{g,n}$ is the $n \times 1$ vector of cross-sectional observations on the dependent variable in the $g$-th equation, $x_{k,n}$ is the $n \times 1$ vector of cross-sectional observations on the $k$-th exogenous variable, which is taken to be nonstochastic,6 $u_{g,n}$ is the $n \times 1$ disturbance vector in the $g$-th equation, $W_{r,n}$ and $M_{s,n}$ are $n \times n$ weights matrices, $\varepsilon_{g,n}$ is the $n \times 1$ vector of innovations entering the disturbance process for the $g$-th equation, and $n$ denotes the sample size. With $b_{lg,n}$ and $c_{kg,n}$ we denote the (scalar) parameters corresponding to the $l$-th endogenous and $k$-th exogenous variables, respectively.

Consistent with the usual terminology for Cliff-Ord type network interactions, we refer to $W_{r,n}$ and $M_{s,n}$ as spatial weights matrices, to

$$\bar{y}_{lr,n} = W_{r,n}\, y_{l,n} \quad \text{and} \quad \bar{u}_{gs,n} = M_{s,n}\, u_{g,n}$$

as spatial lags, and to the (real scalar) parameters $\lambda_{lg,r,n}$ and $\rho_{g,s,n}$ as spatial autoregressive parameters. The weights matrices carry the information on the links between units and on the relative weight of those links, and the spatial autoregressive parameters describe the strength of the spillovers. Although originally introduced for spatial networks, Cliff-Ord type interaction models do not require the indexing of observations by location. In general they only rely on a measure of distance in the formation of the spatial weights. Since the notions of space and distance or proximity are not confined to geographic space, these models have, as discussed in the introduction, also been applied in various other settings. This includes social interaction models, where one considered specification has been to assign to each of the $i$-th individual's friends a weight of $1/n_i$, where $n_i$ denotes the total number of friends of $i$, while assigning zero weights to individuals not belonging to the circle of friends. In the following, we continue to refer to $\bar{y}_{lr,n}$ and $\bar{u}_{gs,n}$ and $\lambda_{lg,r,n}$ and $\rho_{g,s,n}$ as spatial lags and spatial autoregressive parameters, but note the wider applicability.

5 Of course, the structural model parameters are not identified without certain restrictions. Those restrictions will be introduced below.

6 In treating the exogenous variables as non-stochastic, the analysis may be viewed as conditional on the exogenous variables.

The reason for allowing the elements of the spatial weights matrices to depend on the sample size is to permit, as is frequent practice in applications, normalizations of these matrices where the normalization factor(s) depend on the sample size.7 The $i$-th element of $\bar{y}_{lr,n}$ is given by $\bar{y}_{il,r,n} = \sum_{j=1}^{n} w_{ij,r,n}\, y_{jl,n}$. We note that even if the elements of the spatial weights matrices do not depend on sample size, the elements of the spatial lag $\bar{y}_{lr,n}$ and, analogously, the elements of $\bar{u}_{gs,n}$ will generally depend on sample size. This in turn implies that also the elements of $y_{g,n}$ and $u_{g,n}$ will generally depend on the sample size, i.e., form triangular arrays. In allowing the elements of $x_{k,n}$ to depend on the sample size, we implicitly also allow for some of the exogenous variables to be spatial lags of exogenous variables, and thus the model accommodates, as remarked above, cross-sectional interactions in the endogenous variables, the exogenous variables, and the disturbances.

The above model generalizes the spatial simultaneous equation model considered in Kelejian and Prucha (2004) in allowing for higher-order spatial lags.8 Consistent with the terminology introduced by Anselin and Florax (1995) in a single-equation context, we refer to the above model as a simultaneous spatial autoregressive model of order $p$ with spatially autoregressive disturbances of order $q$, for short, a simultaneous SARAR($p$,$q$) model.9 One reason for allowing

for multiple spatial weights matrices is that they can capture different forms

of proximity between units. For example, within the context of R&D spillovers

between firms, one matrix may refer to geographic proximity between firms, and

the other may correspond to a measure of proximity in the product space. As

another example, as discussed in more detail below, within the context of a so-

cial interaction model different matrices may refer to different circles of friends,

e.g., one matrix may identify the very close friends, and a second matrix the

other friends. Additionally, as discussed below, an estimation theory that allows

for multiple spatial weights matrices can also be used to accommodate certain

parameterizations of the spatial weights.

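Model (1) can be written more compactly as

$$Y_n = Y_n B_n + X_n C_n + \bar{Y}_n \Lambda_n + U_n, \qquad (2)$$

$$U_n = \bar{U}_n R_n + E_n,$$

with

$$Y_n = (y_{1,n}, \ldots, y_{G,n})_{n \times G}, \quad X_n = (x_{1,n}, \ldots, x_{K,n})_{n \times K}, \quad U_n = (u_{1,n}, \ldots, u_{G,n})_{n \times G}, \quad E_n = (\varepsilon_{1,n}, \ldots, \varepsilon_{G,n})_{n \times G},$$

$$\bar{Y}_n = (\bar{y}_{11,n}, \ldots, \bar{y}_{1p,n}, \ldots, \bar{y}_{G1,n}, \ldots, \bar{y}_{Gp,n})_{n \times Gp}, \quad \bar{U}_n = (\bar{u}_{11,n}, \ldots, \bar{u}_{1q,n}, \ldots, \bar{u}_{G1,n}, \ldots, \bar{u}_{Gq,n})_{n \times Gq},$$

and where the parameter matrices $B_n = (b_{lg,n})_{G \times G}$, $C_n = (c_{kg,n})_{K \times G}$, $\Lambda_n = (\lambda_{lg,r,n})_{Gp \times G}$, and $R_n = (\rho_{g,s,n})_{Gq \times G}$ are defined conformably.10

7 The normalizing factors may in turn affect the parameters of the spatial lags, which is the reason for allowing the parameters in (1) to depend on the sample size; see, e.g., Kelejian and Prucha (2010) for further discussions regarding normalizations.

8 Extensions of the estimation methodology will be discussed later.

9 For single equations higher order SAR models have been considered by Blommestein (1983, 1985) and Huang (1984), among others, and more recently by Bell and Bockstael (2000), Cohen and Morrison Paul (2007), Badinger and Egger (2010), and Lee and Liu (2010).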

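Towards computing the reduced form of the above model, let

$$y_n = \operatorname{vec}(Y_n), \quad x_n = \operatorname{vec}(X_n), \quad u_n = \operatorname{vec}(U_n), \quad \varepsilon_n = \operatorname{vec}(E_n),$$

and let

$$W_n = [W_{1,n}', \ldots, W_{p,n}']', \quad M_n = [M_{1,n}', \ldots, M_{q,n}']'.$$

Observing that $\operatorname{vec}(\bar{Y}_n) = (I_G \otimes W_n)\, y_n$ and $\operatorname{vec}(\bar{U}_n) = (I_G \otimes M_n)\, u_n$, and that for any two conformable matrices $A_1$ and $A_2$, $\operatorname{vec}(A_1 A_2) = (A_2' \otimes I)\operatorname{vec}(A_1)$, it is readily seen that the spatial simultaneous equation system (2) can be re-written in stacked notation as

$$y_n = B_n^* y_n + C_n^* x_n + u_n, \qquad (3)$$

$$u_n = R_n^* u_n + \varepsilon_n,$$

where $B_n^* = [(B_n' \otimes I_n) + (\Lambda_n' \otimes I_n)(I_G \otimes W_n)]$, $C_n^* = (C_n' \otimes I_n)$, and $R_n^* = (R_n' \otimes I_n)(I_G \otimes M_n)$. Assuming invertibility of $I_{nG} - B_n^*$ and $I_{nG} - R_n^*$, the reduced form of the system is now given by

$$y_n = (I_{nG} - B_n^*)^{-1} [C_n^* x_n + u_n], \qquad (4)$$

$$u_n = (I_{nG} - R_n^*)^{-1} \varepsilon_n.$$

As remarked, the structural parameters of the spatial simultaneous equation system (1) and (2) are not identified unless we impose exclusion restrictions. Let $\beta_{g,n}$, $\gamma_{g,n}$, $\lambda_{g,n}$, and $\rho_{g,n}$ denote the $G_g \times 1$, $K_g \times 1$, $P_g \times 1$ and $Q_g \times 1$ vectors of non-zero elements of the $g$-th column of $B_n$, $C_n$, $\Lambda_n$, and $R_n$, respectively, and let $Y_{g,n}$, $X_{g,n}$, $\bar{Y}_{g,n}$, and $\bar{U}_{g,n}$ be the corresponding matrices of observations on the endogenous variables, exogenous variables, spatially lagged endogenous variables, and spatially lagged disturbances appearing in the structural equation for the $g$-th endogenous variable. Then, system (2) can be expressed as ($g = 1, \ldots, G$):

$$y_{g,n} = Z_{g,n} \delta_{g,n} + u_{g,n}, \qquad (5)$$

$$u_{g,n} = \bar{U}_{g,n} \rho_{g,n} + \varepsilon_{g,n},$$

where $Z_{g,n} = [Y_{g,n}, X_{g,n}, \bar{Y}_{g,n}]$ and $\delta_{g,n} = [\beta_{g,n}', \gamma_{g,n}', \lambda_{g,n}']'$.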
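The stacked form (3) and the reduced form (4) can be checked numerically on a small simulated system. The sketch below assumes NumPy, two equations, two weights matrices for the dependent variables and one for the disturbances; all dimensions, parameter values and helper names are illustrative assumptions. It builds the starred matrices via Kronecker products, solves the reduced form, and verifies that the solution satisfies the matrix form (2).

```python
import numpy as np

rng = np.random.default_rng(0)
n, G, K, p, q = 6, 2, 3, 2, 1  # illustrative sizes, not from the paper

def row_normalized_weights(n, rng):
    """Random weights with zero diagonal and unit row sums."""
    w = rng.random((n, n))
    np.fill_diagonal(w, 0.0)
    return w / w.sum(axis=1, keepdims=True)

def vec(a):
    """Stack the columns of a matrix."""
    return a.reshape(-1, order="F")

W = [row_normalized_weights(n, rng) for _ in range(p)]
M = [row_normalized_weights(n, rng) for _ in range(q)]

B = np.array([[0.0, 0.3], [0.2, 0.0]])   # G x G, zero diagonal
C = rng.normal(size=(K, G))              # K x G
Lam = 0.1 * rng.random((G * p, G))       # Gp x G
R = np.zeros((G * q, G))                 # Gq x G, only own-equation lags
for g in range(G):
    R[g * q:(g + 1) * q, g] = 0.4 / q

X = rng.normal(size=(n, K))
E = rng.normal(size=(n, G))

I_n, I_G = np.eye(n), np.eye(G)
W_stack, M_stack = np.vstack(W), np.vstack(M)

B_star = np.kron(B.T, I_n) + np.kron(Lam.T, I_n) @ np.kron(I_G, W_stack)
C_star = np.kron(C.T, I_n)
R_star = np.kron(R.T, I_n) @ np.kron(I_G, M_stack)

# Reduced form (4): solve for u first, then for y.
u = np.linalg.solve(np.eye(n * G) - R_star, vec(E))
y = np.linalg.solve(np.eye(n * G) - B_star, C_star @ vec(X) + u)

# Check that the solution satisfies the matrix form (2).
Y, U = y.reshape(n, G, order="F"), u.reshape(n, G, order="F")
Y_bar = np.column_stack([W[r] @ Y[:, l] for l in range(G) for r in range(p)])
U_bar = np.column_stack([M[s] @ U[:, g] for g in range(G) for s in range(q)])
resid_y = Y - (Y @ B + X @ C + Y_bar @ Lam + U)
resid_u = U - (U_bar @ R + E)
```

Linear solves are used instead of explicit inverses. For large samples one would exploit sparsity of the weights matrices rather than form dense Kronecker products.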

We maintain the following assumptions regarding the spatial weights matri-

ces and model parameters.

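Assumption 1 For $r = 1, \ldots, p$ and $s = 1, \ldots, q$: (a) All diagonal elements of $W_{r,n}$ and $M_{s,n}$ are zero. (b) $\|W_{r,n}\|_1 \le c$, $\|M_{s,n}\|_1 \le c$ for some finite constant $c$ which does not depend on $n$, and $\|W_{r,n}\|_\infty = 1$, $\|M_{s,n}\|_\infty = 1$.

Assumption 2 (a) The matrices $I_{nG} - B_n^*$ are nonsingular. (b) The spatial autoregressive parameters satisfy $\sup_n \sum_{s \in \mathcal{I}_g} |\rho_{g,s,n}| < 1$ for $g = 1, \ldots, G$, where $\mathcal{I}_g \subseteq \{1, \ldots, q\}$ denotes the set of indices associated with the elements of $\rho_{g,n}$. (c) The row and column sums of the matrices $[I_{nG} - B_n^*]^{-1}$ are uniformly bounded in absolute value.

10 For clarity, we note that the $g$-th column of $\Lambda_n$ and $R_n$ are, respectively, given by $[\lambda_{1g,1,n}, \ldots, \lambda_{1g,p,n}, \ldots, \lambda_{Gg,1,n}, \ldots, \lambda_{Gg,p,n}]'$ and $[0, \ldots, 0, \rho_{g,1,n}, \ldots, \rho_{g,q,n}, 0, \ldots, 0]'$.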

The above assumptions are in line with the recent spatial literature. Assumption 1(a) entails a normalization rule. Assumption 1(b) implies that the row and column sums of the matrices $W_{r,n}$ and $M_{s,n}$ are uniformly bounded in absolute value. The assumption that $\|W_{r,n}\|_\infty = 1$ and $\|M_{s,n}\|_\infty = 1$ implies a normalization for the parameters. For interpretation, let $W_{r,n}^*$ be some spatial weights matrix with $\|W_{r,n}^*\|_\infty \ne 1$ and let $\lambda_{lg,r,n}^*$ be the corresponding spatial autoregressive parameter on $W_{r,n}^* y_{l,n}$. Now define $W_{r,n} = W_{r,n}^* / \|W_{r,n}^*\|_\infty$ and $\lambda_{lg,r,n} = \lambda_{lg,r,n}^* \|W_{r,n}^*\|_\infty$, then $\|W_{r,n}\|_\infty = 1$ and $\lambda_{lg,r,n} W_{r,n} = \lambda_{lg,r,n}^* W_{r,n}^*$. Thus the normalizations $\|W_{r,n}\|_\infty = 1$ and $\|M_{s,n}\|_\infty = 1$ can always be achieved by appropriately re-scaling the elements of the spatial weights matrix, provided the corresponding spatial autoregressive parameter is correspondingly redefined; for further discussions see Kelejian and Prucha (2010).11
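The re-scaling argument amounts to a few lines of arithmetic. The following sketch, assuming NumPy and an arbitrary illustrative weights matrix and parameter value, normalizes a matrix by its maximum row sum norm, rescales the parameter in the opposite direction, and confirms that the implied spillover term is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

W_raw = rng.random((n, n))
np.fill_diagonal(W_raw, 0.0)              # zero diagonal, as in Assumption 1(a)
lam_raw = 0.05                            # parameter on the unscaled matrix (illustrative)

scale = np.abs(W_raw).sum(axis=1).max()   # maximum row sum norm of W_raw
W = W_raw / scale                         # rescaled weights, norm equal to one
lam = lam_raw * scale                     # parameter rescaled the other way

y = rng.normal(size=n)
spillover_raw = lam_raw * (W_raw @ y)
spillover_scaled = lam * (W @ y)
```

Because only the product of parameter and weights matrix enters the model, the two parameterizations are observationally equivalent; the normalization fixes the scale of the parameter space.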

Assumption 2(a) ensures that the first equation of the expression for the reduced form in (4) is well defined. In allowing in Assumption 2(b) for the index set $\mathcal{I}_g$ to vary with $g$ we allow for different orders of spatial lags in the disturbance process of different equations. Next observe that $R_n^* = \operatorname{diag}_{g=1}^{G} [R_{g,n}^*]$ with $R_{g,n}^* = R_{g,n}^*(\rho_{g,n}) = \sum_{s \in \mathcal{I}_g} \rho_{g,s,n} M_{s,n}$. In light of this it follows from Assumptions 1(b) and 2(b) that $\|R_n^*\|_\infty \le \max_g \sum_{s \in \mathcal{I}_g} |\rho_{g,s,n}| < 1$, which in turn implies that $I_{nG} - R_n^*$ is nonsingular; see Horn and Johnson (1985, p. 301). Consequently, also the second equation of the expression for the reduced form in (4) is well defined, and thus $y_n$ is uniquely defined by the model.
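The nonsingularity argument can be illustrated for a single equation. The sketch below assumes NumPy, two illustrative weights matrices scaled to unit maximum row sum, and arbitrary parameter values whose absolute values sum to less than one. It also evaluates the standard bound on the row sum norm of the inverse that follows from the Neumann series.

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 7, 2

def unit_row_norm_weights(n, rng):
    """Random weights with zero diagonal and maximum row sum equal to one."""
    m = rng.random((n, n))
    np.fill_diagonal(m, 0.0)
    return m / np.abs(m).sum(axis=1).max()

M = [unit_row_norm_weights(n, rng) for _ in range(q)]
rho = np.array([0.5, -0.3])               # absolute values sum to 0.8 < 1

R_g = sum(r * m for r, m in zip(rho, M))  # weighted sum of the weights matrices
norm_R = np.abs(R_g).sum(axis=1).max()

inv = np.linalg.inv(np.eye(n) - R_g)
norm_inv = np.abs(inv).sum(axis=1).max()
bound = 1.0 / (1.0 - norm_R)              # bound implied by the Neumann series
```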

Assumptions 1(b) and 2(b) imply even that $\sup_n \|R_n^*\|_\infty < 1$, which implies that the row sums of the matrices $[I_{nG} - R_n^*]^{-1}$ are uniformly bounded in absolute value. To see this observe that $\|[I_{nG} - R_n^*]^{-1}\|_\infty \le 1/[1 - \|R_n^*\|_\infty] \le 1/[1 - \sup_n \|R_n^*\|_\infty] < \infty$; see Horn and Johnson (1985, p. 301).

Assumption 3 (a) The matrix of (nonstochastic) exogenous regressors $X_n$ in (2) has full column rank (for $n$ sufficiently large). Furthermore, the elements of $X_n$ are uniformly bounded in absolute value by some finite constant. (b) The elements of the parameter matrices $B_n$, $C_n$ and $\Lambda_n$ are uniformly bounded in absolute value.

11 The suggested re-scaling is simple and practically implementable even if $n$ is large. In situations where $n$ is sufficiently small such that the eigenvalues of the spatial weights matrices are computable, one could alternatively normalize each spatial weights matrix by its spectral radius, which would in conjunction with the next assumptions entail an expansion of the admissible parameter space.

An assumption such as Assumption 3(a) is common in the spatial literature. In treating $X_n$ as nonstochastic, our analysis should be viewed as conditional on $X_n$. Note that in part (b) of Assumption 3 uniformity refers to $n$. Assumption 3(b) is trivially satisfied if the parameters do not depend on the sample size, since any real number is finite.

We next state the assumptions maintained w.r.t. $\varepsilon_n$. In the following let $V_n = [v_{1,n}, \ldots, v_{G,n}]$ be an $n \times G$ matrix of basic innovations and let $v_n = \operatorname{vec}(V_n)$.

Assumption 4 The innovations $\varepsilon_n$ are generated as follows:

$$\varepsilon_n = (\Sigma_*' \otimes I_n)\, v_n, \qquad (6)$$

where $\Sigma_*$ is a nonsingular $G \times G$ matrix and the random variables $\{v_{ig,n} : i = 1, \ldots, n,\ g = 1, \ldots, G\}$ are, for each $n$, identically and independently distributed with zero mean, unit variance, and finite $4 + \eta$ moments for some $\eta > 0$, and their distribution does not depend on $n$. Furthermore, let $\Sigma = \Sigma_*' \Sigma_*$, then the diagonal elements of $\Sigma$ are bounded by some finite constant.

The above assumption on the innovation process is in line with the specification of the disturbance terms for a classical simultaneous equation system. Let $\varepsilon_{(i),n}$ denote the $i$-th row of $E_n$, then, observing that $E_n = V_n \Sigma_*$, it is readily seen that the innovation vectors $\{\varepsilon_{(i),n} : 1 \le i \le n\}$ are i.i.d. with zero mean and VC matrix $\Sigma$. With respect to the stacked innovation vector, the assumption implies that $E\varepsilon_n = 0$ and $E\varepsilon_n \varepsilon_n' = \Sigma \otimes I_n$.

Given (4), we note that Assumption 4 implies furthermore that $Eu_n = 0$ and $Ey_n = (I_{nG} - B_n^*)^{-1} C_n^* x_n$, and that the VC matrices of $u_n$ and $y_n$ are given by, respectively,

$$\Omega_{u,n} = (I_{nG} - R_n^*)^{-1} (\Sigma \otimes I_n) (I_{nG} - R_n^{*\prime})^{-1},$$

$$\Omega_{y,n} = (I_{nG} - B_n^*)^{-1}\, \Omega_{u,n}\, (I_{nG} - B_n^{*\prime})^{-1}.$$

Assumptions 2 and 4 imply that the row and column sums of the VC matrix of $u_n$ (and similarly those of $y_n$) are uniformly bounded in absolute value, thus limiting the degree of correlation between, respectively, the elements of $u_n$ and of $y_n$.

Remark: Under the above assumptions it is shown in Lemmata A.2 and A.3 that all random variables in $[Y_n, X_n, \bar{Y}_n]$ have uniformly bounded finite fourth moments, and that

$$n^{-1} Z_{g,n}' A_n u_{g,n} - n^{-1} E Z_{g,n}' A_n u_{g,n} = o_p(1)$$

for any $n \times n$ real matrix $A_n$ whose row and column sums are bounded uniformly in absolute value.

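For purposes of estimation it proves helpful to apply a spatial Cochrane-Orcutt transformation to the model. In particular, premultiplying (5) by $I_n - R_{g,n}^*(\rho_{g,n})$ yields

$$y_{*g,n} = Z_{*g,n}\, \delta_{g,n} + \varepsilon_{g,n} \qquad (7)$$

with $y_{*g,n} = y_{*g,n}(\rho_{g,n}) = [I_n - R_{g,n}^*(\rho_{g,n})]\, y_{g,n}$ and $Z_{*g,n} = Z_{*g,n}(\rho_{g,n}) = [I_n - R_{g,n}^*(\rho_{g,n})]\, Z_{g,n}$. Stacking the transformed equations yields

$$y_{*n} = Z_{*n}\, \delta_n + \varepsilon_n \qquad (8)$$

with $y_{*n} = [y_{*1,n}', \ldots, y_{*G,n}']'$, $Z_{*n} = \operatorname{diag}_{g=1}^{G} [Z_{*g,n}]$, $\delta_n = [\delta_{1,n}', \ldots, \delta_{G,n}']'$ and where $\varepsilon_n$ is as defined above.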
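The effect of the spatial Cochrane-Orcutt transformation in (7) can be seen in a single-equation sketch. It assumes NumPy, uses a purely exogenous stand-in for the regressor matrix (in the model the regressors also contain endogenous variables), and picks illustrative weights and parameters. Premultiplying by the identity minus the weighted sum of disturbance weights matrices leaves exactly the innovations as the error term.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 8, 2

# Illustrative disturbance weights: zero diagonal, unit maximum row sum.
M = []
for _ in range(q):
    m = rng.random((n, n))
    np.fill_diagonal(m, 0.0)
    M.append(m / np.abs(m).sum(axis=1).max())

rho = np.array([0.3, 0.2])
R_g = sum(r * m for r, m in zip(rho, M))

Z = rng.normal(size=(n, 3))               # stand-in regressors (assumption)
delta = np.array([1.0, -0.5, 0.25])
eps = rng.normal(size=n)                  # innovations

u = np.linalg.solve(np.eye(n) - R_g, eps) # spatially autoregressive disturbances
y = Z @ delta + u

# Transformed variables as in (7).
transform = np.eye(n) - R_g
y_star = transform @ y
Z_star = transform @ Z
recovered_eps = y_star - Z_star @ delta
```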

3 Exemplary Applications

In the following we give two examples involving the use of higher-order spatial

lags. The first example illustrates how higher-order spatial lags can be useful for

certain parameterizations of the spatial weights. The second example formulates

a social interaction model where the utility maximizing solution is described by

a system of equations with higher-order spillovers as defined in (1).

3.1 Parameterized Spatial Weights

As part of the specification of the model in (1) the researcher has to specify

the elements of the spatial weights matrices. In case that those elements are

specified incorrectly, the model is misspecified and estimates will generally be

inconsistent. Allowing for higher-order spatial lags provides important flexibility

and robustness in modeling network interactions.

In the following we discuss, by way of example, how higher-order spatial lags can be used to allow for certain flexible parameterizations of the spatial weights. For simplicity of notation we drop subscripts $n$. Spatial weights are often specified as a function of some distance measure, possibly combined with some contiguity measure. Let $W = (w_{ij})$ be the basic spatial weights matrix, let $d_{ij}$ denote some distance measure between units $i$ and $j$, and let $d_{ij}^*$ be some contiguity measure taking on values of one or zero. Then, the researcher may specify the weights as the product of the contiguity measure and a polynomial in $d_{ij}$, treating the coefficients of the polynomial as unknown parameters:12

$$w_{ij} = d_{ij}^* \left[ \alpha_1 d_{ij} + \ldots + \alpha_p d_{ij}^{p} \right].$$

Now, suppose the researcher models $y_g$ as a function of, say, $\lambda W y_l$, then clearly

$$\lambda W y_l = \left[ \sum_{r=1}^{p} \lambda \alpha_r W_r \right] y_l = \left[ \sum_{r=1}^{p} \lambda_r W_r \right] y_l$$

with $\lambda_r = \lambda \alpha_r$, $W_r = (w_{ij,r})$, and $w_{ij,r} = d_{ij}^* d_{ij}^{r}$. In allowing for higher-order spatial lags, model (1) covers this specification as a special case. Of course, the above specification of spatial weights is entirely illustrative, and model (1) will cover many other specifications, including specifications with alternate basis functions instead of power functions, and more general measures of distance and contiguity. The same ideas also apply to the modeling of the disturbance process.13

12 Alternatively the researcher could specify a polynomial in $1/d_{ij}$.
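The equivalence between a polynomial weight specification and a higher-order spatial lag can be verified numerically. The sketch below assumes NumPy, random planar coordinates, a distance cutoff as the contiguity measure, and arbitrary polynomial coefficients; normalization of the weights is ignored for simplicity. It compares the single parameterized lag with the sum of lags built from one weights matrix per power of distance.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 3

coords = rng.random((n, 2))                        # illustrative locations
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
contig = (d < 0.6).astype(float)                   # contiguity indicator (assumption)
np.fill_diagonal(contig, 0.0)

alpha = np.array([1.0, -0.8, 0.2])                 # polynomial coefficients (illustrative)
lam = 0.5                                          # parameter on the parameterized lag

# One weights matrix per power of distance.
W_r = [contig * d ** (r + 1) for r in range(p)]
# Parameterized weights: contiguity times a polynomial in distance.
W = contig * sum(a * d ** (r + 1) for r, a in enumerate(alpha))

y = rng.normal(size=n)
lhs = lam * (W @ y)
lam_r = lam * alpha                                # implied higher-order parameters
rhs = sum(l * (w @ y) for l, w in zip(lam_r, W_r))
```

Estimating the higher-order parameters freely therefore lets the data determine the shape of the distance decay, up to the chosen basis.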

3.2 Social Interactions in Multiple Activities

The following example extends Cohen-Cole, Liu and Zenou (2018). We follow

their basic setup, but allow for more flexible peer effects. More specifically,

consider a model where individuals choose effort levels for activities, say

1 , allowing for peer effects among groups of peers, e.g., for = 2 we

may distinguish between very close friends and other friends. Now let ∗ beone or zero depending on whether individual belongs to the -th peer group

of individual , and let be the size of that peer group. Let = ∗denote the corresponding normalized weights, let =

P=1 denote

the average effort level for activity by the -th group of peers, and assume

that the utility of individual is of the following linear-quadratic form:

(1 ) = (1 11 1 1 ) (9)

=

X=1

∗ +X=1

X=1

X=1

∗| z

−12

X=1

∗2 −

X=1

X=1 6=

∗| z

The specification considered in Cohen-Cole, Liu and Zenou (2018) corresponds

to = 1. The first-order conditions for the maximum of (1 ) yield

= +

X=1 6=

+

X=1

X=1

with = ∗∗, = −∗∗, = ∗

∗, or in matrix notation

y = π +

X=16=

y +

X=1

X=1

Wy (10)

with π = [1 ]0. Similar to Cohen-Cole, Liu and Zenou (2018) assume

that π can be modeled as

π =

X=1

x + u (11)

13The above observations are related to Pinkse and Slade (1998), who estimate, in a single-

equation context, the spatial weights corresponding to the dependent variable nonparamet-

rically. Given the complexity of our systems specification, we do not pursue nonparametric

estimation, here.


Substituting (11) into (10) then shows that the utility maximizing vectors of

effort for the activities are defined as the solution of a model of the form

specified in (1).

We note that by specifying the W to be block diagonal, we can accom-

modate situations where the individuals = 1 belong to, say, groups

which, e.g., represent class rooms. Also, some of the x covariates may rep-

resent group indicator variables, and others may be spatial lags of some basic

covariates.

4 Moment Conditions

Recall that from the Cochrane-Orcutt transformed form of the model (7) we have

$$\varepsilon_{g,n} = \varepsilon_{g,n}(\rho_{g,n}, \delta_{g,n}) = [I_n - R_{g,n}^*(\rho_{g,n})]\, [y_{g,n} - Z_{g,n} \delta_{g,n}].$$

The estimators for the model parameters $\rho_{g,n}$ and $\delta_{g,n}$ considered in this paper will utilize a set of linear and quadratic moment conditions of the form ($g = 1, \ldots, G$)

$$E\, m_{g,n}^{lin}(\rho_{g,n}, \delta_{g,n}) = n^{-1} E\, H_n' \varepsilon_{g,n} = 0, \qquad (12)$$

$$E\, m_{g,n}^{quad}(\rho_{g,n}, \delta_{g,n}) = E \begin{bmatrix} n^{-1} \varepsilon_{g,n}' A_{1,n} \varepsilon_{g,n} \\ \vdots \\ n^{-1} \varepsilon_{g,n}' A_{S,n} \varepsilon_{g,n} \end{bmatrix} = 0, \qquad (13)$$

where the $n \times p_H$ instrument matrix $H_n$ in the linear form and the $n \times n$ weighting matrices $A_{a,n}$ in the quadratic forms are non-stochastic. In the following we will also simply write $m_{g,n}^{lin}$ and $m_{g,n}^{quad}$ for the sample moment vector at the true parameter values.14

We maintain the following assumptions regarding the instruments $H_n$.

Assumption 5: The instrument matrices $H_n$ are nonstochastic and have full column rank $p_H \ge G_g + K_g + P_g$ (for all $n$ large enough). Furthermore, the elements of the matrices $H_n$ are uniformly bounded in absolute value. Additionally $H_n$ is assumed to contain, at least, the linearly independent columns of $H_{X,n} = [X_n, M_{1,n} X_n, \ldots, M_{q,n} X_n]$.

The inclusion of $H_{X,n}$ in $H_n$ ensures that the exogenous variables on the r.h.s. of the Cochrane-Orcutt transformed model serve as their own best instruments. For limited information estimators it suffices to postulate that $H_n$ contains, at least, the linearly independent columns of $[X_{g,n}, M_{1,n} X_{g,n}, \ldots, M_{q,n} X_{g,n}]$.

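Assumption 6 The instruments $H_n$ satisfy furthermore:

(a) $Q_{HH} = \lim_{n \to \infty} n^{-1} H_n' H_n$ is finite and nonsingular.

(b) $Q_{HZ,g} = \operatorname{plim}_{n \to \infty} n^{-1} H_n' Z_{g,n}$ and $Q_{HMZ,g,s} = \operatorname{plim}_{n \to \infty} n^{-1} H_n' M_{s,n} Z_{g,n}$ are finite and have full column rank.

(c) Let $Q_{HZ,g}^*(\rho_g) = Q_{HZ,g} - \sum_{s \in \mathcal{I}_g} \rho_{g,s}\, Q_{HMZ,g,s}$, then $\lambda_{\min}\left[ Q_{HZ,g}^*(\rho_g)'\, Q_{HH}^{-1}\, Q_{HZ,g}^*(\rho_g) \right] \ge c$ for some $c > 0$.

14 We note that our setup could be readily modified to accommodate for $H_n$ and for the $A_{a,n}$ to vary with $g$ at the expense of further complicating the notation.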

The above assumptions are in the spirit of those maintained, e.g., in Kelejian

and Prucha (1998, 2004, 2010) and Lee (2003). We first discuss Assumption 5.

The best instruments for Y and Y are given by their conditional means.

Observe that in light of (3) we have

y = (I −B∗)−1C∗xwith B∗ = [(B

0 ⊗ I) + (Λ0 ⊗ I) (I ⊗W)]. For large the accurate com-

putation of the inverse of I−B∗, which is of dimension × and depends

on unknown parameters, will be challenging if not impossible, unless the weights

matrices are sparse. Furthermore, even in the single equation case existing re-

sults on the asymptotic properties of GMM estimators based on the best instru-

ments have so far only been obtained by restricting the parameter space to a

compact interval in, say, (−1 1). To avoid those difficulties and limitations weemploy an approximation of the best instruments, which is consistent in spirit

with the approach adopted in the above cited literature.

Given kB∗k 1 we have (I −B∗)−1 =P∞

=0(B∗)

and thus y =P∞=0 (B

∗)

C∗x. In light of the structure of B

∗ it is not difficult to see that

the blocks of (I −B∗)−1 can be expressed as infinite weighted sums of thematrices I, W11=1, W1W212=1, W1W2W3123=1, Adopting the notation [A ]

=1:= [A1 A] for any set of conformable

matrices A1 A, define

X1 = [W1X]

1=1 X = [W1W2 WX]

12=1

and let H = [XX1 X]. Now supposeW1 W are the spa-

tial weights matrices appearing in Y, then by including inH the linearly in-

dependent columns of£HW1H WH

¤, we may view the fitted

values of Z as computationally simple approximations of the best instruments

Z. Suppose further that M1 M are the spatial weights matrices in

the disturbance process of the -th equation, then by including in H also

the linearly independent columns of M1

£HW1H WH

¤, ,

M

£HW1H WH

¤we may view the fitted values of Z∗ as

computationally simple approximations of the best instruments Z∗. Preliminary Monte Carlo results suggest that in many situations relatively low values

of are sufficient for providing a good approximation.
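The approximation idea above can be illustrated with a minimal sketch. It assumes two hypothetical ring-shaped weights matrices, a small simulated regressor matrix, and a truncation order of two; the helper names (`ring_weights`, `truncated_neumann`, `instrument_blocks`, `independent_columns`) are illustrative and not taken from the paper.

```python
import itertools
import numpy as np


def ring_weights(n, k):
    """Row-normalized weights linking each unit to its k-th neighbors on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + k) % n] = 1.0
        W[i, (i - k) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def truncated_neumann(B, order):
    """Approximate (I - B)^{-1} by the partial sum I + B + ... + B^order."""
    total = np.eye(B.shape[0])
    power = np.eye(B.shape[0])
    for _ in range(order):
        power = power @ B
        total = total + power
    return total


def instrument_blocks(X, W_list, order):
    """Stack X with all products W_{p1} ... W_{ps} X for s = 1, ..., order."""
    blocks = [X]
    for s in range(1, order + 1):
        for idx in itertools.product(range(len(W_list)), repeat=s):
            block = X
            for p in reversed(idx):  # the rightmost matrix is applied first
                block = W_list[p] @ block
            blocks.append(block)
    return np.hstack(blocks)


def independent_columns(H, tol=1e-8):
    """Keep a maximal set of linearly independent columns (greedy Gram-Schmidt)."""
    kept = []
    Q = np.zeros((H.shape[0], 0))
    for j in range(H.shape[1]):
        v = H[:, j]
        resid = v - Q @ (Q.T @ v)
        resid = resid - Q @ (Q.T @ resid)  # second pass for numerical stability
        if np.linalg.norm(resid) > tol * max(np.linalg.norm(v), 1.0):
            Q = np.hstack([Q, (resid / np.linalg.norm(resid))[:, None]])
            kept.append(j)
    return H[:, kept]


rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
W_list = [ring_weights(n, 1), ring_weights(n, 2)]

# Truncated series versus the exact inverse for a matrix with norm below one.
B = 0.3 * W_list[0]
approx_inverse = truncated_neumann(B, order=40)

# Instruments: X, first-order and second-order spatial lags of X.
H_full = instrument_blocks(X, W_list, order=2)
H = independent_columns(H_full)
```

Because the weights matrices in this toy example are row-normalized, every spatial lag of the constant reproduces the constant, which is why dropping linearly dependent columns matters in practice.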

Assumption 6(a) is standard. Assumption 6(b) ensures the identification of

δ, while Assumption 6(c) ensures the identification of ρ.

We will maintain the following assumptions regarding the matrices A.


Assumption 7 : The row and column sums of the matrices A, = 1 ,

are bounded uniformly in absolute value by some finite constant and, further-

more, all diagonal elements of A are zero for any = 1 .

The assumption that the diagonal elements of A are zero ensures that the moment conditions are robust against heteroskedasticity. Exemplary specifications for A include

$M$, $M'M - \operatorname{diag}(M'M)$, $W$, $W'W - \operatorname{diag}(W'W)$, $M'W - \operatorname{diag}(M'W)$.
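To see why a zero diagonal delivers robustness against heteroskedasticity, recall that for independent innovations with covariance matrix $\Sigma_\varepsilon$ (diagonal, with arbitrary variances) we have $E[\varepsilon' A \varepsilon] = \operatorname{tr}(A\Sigma_\varepsilon) = \sum_i a_{ii}\sigma_i^2$, which vanishes whenever all $a_{ii} = 0$. The following sketch checks this numerically; the ring-shaped matrix `M` and the variance profile are hypothetical choices made only for illustration.

```python
import numpy as np


def zero_diagonal(B):
    """Return B with its diagonal elements set to zero."""
    return B - np.diag(np.diag(B))


n = 6
M = np.zeros((n, n))
for i in range(n):
    M[i, (i + 1) % n] = 0.5
    M[i, (i - 1) % n] = 0.5

# Two candidate matrices A, both with zero diagonal.
A_list = [M, zero_diagonal(M.T @ M)]

# Heteroskedastic innovations: unit-specific variances on the diagonal.
variances = np.linspace(0.5, 3.0, n)
Sigma_eps = np.diag(variances)

# E[eps' A eps] = tr(A Sigma_eps) for each candidate A.
expected_moments = [np.trace(A @ Sigma_eps) for A in A_list]

# Without removing the diagonal of M'M the moment depends on the variances.
expected_without_fix = np.trace(M.T @ M @ Sigma_eps)
```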

For computational purposes and for proving consistency, it is convenient to

re-write the moment conditions in (13) as

γ − Γr(ρ) = 0 (14)

where

γ×1

=

⎡⎢⎣ γ1...

γ

⎤⎥⎦ Γ×∗

=

⎡⎢⎣ Γ11 Γ12 Γ13...

......

Γ1 Γ2 Γ3

⎤⎥⎦ r∗×1

(ρ) =

⎡⎣ r1r2r3

⎤⎦with

γ = −1u0Au

Γ1 = −1(2u0M01Au 2u

0M

0

Au)

Γ2 = −−1(u0M01AM1u u

0M

0

AMu)

Γ3 = −−1(2u0M01AM2u 2u

0M

0−1AMu)

r1 = (1 )0

r2 = (21 2

)0

r3 = (12 1 −1)0

and ∗ = 2+ (−1)2. Now let eδ be some estimator for δ, let eu =y − Zeδ, and let eΓ and eγ denote the corresponding estimators ofΓ and γ, respectively, which are obtained by suppressing the expectations

operator and replacing u by eu in the above expressions. Thenm

(ρeδ) = eγ − eΓr(ρ) (15)
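The rewriting of the quadratic moments as $\gamma - \Gamma r(\rho)$ can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a single symmetric matrix `A` (so that the factor-2 terms apply as printed), two hypothetical ring-shaped matrices `M1` and `M2`, and a simulated vector `u`; the function names are not from the paper.

```python
from itertools import combinations

import numpy as np


def ring_weights(n, k):
    """Row-normalized weights linking each unit to its k-th neighbors on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + k) % n] = 1.0
        W[i, (i - k) % n] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def moment_pieces(u, A, M_list):
    """Sample analogues of gamma and the row [Gamma1, Gamma2, Gamma3] for one symmetric A."""
    n = u.shape[0]
    Mu = [M @ u for M in M_list]
    gamma = u @ A @ u / n
    g1 = [2.0 * (Mu_m @ A @ u) / n for Mu_m in Mu]      # 2 u'M_m'A u
    g2 = [-(Mu_m @ A @ Mu_m) / n for Mu_m in Mu]        # -u'M_m'A M_m u
    g3 = [-2.0 * (Mu[m] @ A @ Mu[l]) / n                # -2 u'M_m'A M_l u, m < l
          for m, l in combinations(range(len(Mu)), 2)]
    return gamma, np.array(g1 + g2 + g3)


def r_vector(rho):
    """Stack rho, its squares, and the cross products rho_m * rho_l for m < l."""
    rho = np.asarray(rho, dtype=float)
    cross = [rho[m] * rho[l] for m, l in combinations(range(len(rho)), 2)]
    return np.concatenate([rho, rho ** 2, cross])


rng = np.random.default_rng(0)
n = 8
M1, M2 = ring_weights(n, 1), ring_weights(n, 2)
A = M1.T @ M1
A = A - np.diag(np.diag(A))          # symmetric with zero diagonal
u = rng.normal(size=n)
rho = np.array([0.3, -0.2])

gamma, Gamma = moment_pieces(u, A, [M1, M2])
m_decomposed = gamma - Gamma @ r_vector(rho)

# Direct evaluation: quadratic form in the transformed vector.
eps = u - rho[0] * (M1 @ u) - rho[1] * (M2 @ u)
m_direct = eps @ A @ eps / n
```

With two spatial lags the vector returned by `r_vector` has five elements, in line with the count of linear, squared, and cross-product terms in the text.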

5 Generic Asymptotic Properties

In this section we give a generic discussion of the asymptotic properties of GMM

estimators for ρ and δ corresponding to the moment conditions in (12) and


(13). We focus on two-step estimators. Two-step estimators are appealing, since

they are computationally simple, and since in a single-equation context the loss

of efficiency has been found to be small under various reasonable scenarios, see,

e.g. Das et al. (2003). A two-step procedure also provides some robustness for

the estimation of δ against misspecification of the disturbance process. As

will be seen below, the limiting distribution of the GMM estimators for ρ will depend on the estimator of δ used in computing the estimated residual.

That is, δ is not a nuisance parameter for ρ (although the reverse is true).

As a consequence, the derivation of the limiting distribution of the two-step

estimators is technically more challenging. We will discuss one-step estimators

later on in the paper. In essence the results in this section also deliver the

limiting distribution of one-step estimators as a special case.

The GMM estimators for the autoregressive parameter vectors ρ intro-

duced below generalize the GMM estimators in Kelejian and Prucha (2004). In

contrast to Kelejian and Prucha (2004), we not only accommodate higher-order

spatial disturbance processes, but we also provide for a full asymptotic theory

for those estimators. As a result, we are able to introduce more efficient esti-

mators, and provide results on the joint limiting distribution of the estimators

for all model parameters.

5.1 Consistency of GMM Estimator for ρ

Let eΥ be some × symmetric positive semidefinite (moments) weighting

matrix. Then a corresponding GMM estimator for ρ can be defined as

eρ = eρ( eΥ) = :

∈I ||∈[−]m

(ρeδ)0 eΥm

(ρ

eδ)(16)

with ≥ 1. We note that the objective function for $\tilde{\rho}$ remains well defined even for values of ρ for which I−R∗0(ρ) is singular, which allows us to take as the optimization space a compact set containing the true parameter space.

We postulate the following additional assumptions to establish consistency of $\tilde{\rho}$.

Assumption 8 The smallest eigenvalue of $\Gamma'\Gamma$ is uniformly bounded away from zero.

Assumption 9 eΥ −Υ = (1), where Υ is an × non-stochastic

symmetric positive definite matrix. The largest eigenvalues of Υ are bounded

uniformly from above, and the smallest eigenvalues Υ are bounded uniformly

away from zero (and, thus, by the equivalence of matrix norms, Υ and Υ−1

are (1)).

Assumption 8 requires $\Gamma'\Gamma$ to be nonsingular and, in conjunction with Assumption 9, ensures that the smallest eigenvalue of $\Gamma'\Upsilon\Gamma$ is uniformly bounded away from zero. This will be sufficient to demonstrate that ρ is identifiably unique w.r.t. the nonstochastic analogue of the objective function of the GMM estimator

$$m(\rho,\delta)'\,\Upsilon\, m(\rho,\delta) = [\gamma - \Gamma r(\rho)]'\,\Upsilon\,[\gamma - \Gamma r(\rho)].$$

Assumption 9 ensures that $\tilde{\Upsilon}$ is positive definite with probability tending to one. Of course, Assumption 9 is satisfied for $\tilde{\Upsilon} = \Upsilon = I$. In this case, the estimator defined by (16) reduces to a nonlinear least squares estimator. Choices of $\tilde{\Upsilon}$ which result in efficient estimates of ρ will be discussed below in conjunction with the asymptotic normality result.
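As an illustration of the estimator in (16) with identity weighting, the following sketch treats the simplest case of a single spatial lag in the disturbances and minimizes the objective over a grid on a compact interval. It is a sketch under assumptions rather than the authors' implementation: the ring-shaped matrix `M`, the two symmetric moment matrices, the simulated disturbances (used in place of estimated residuals), and the grid search are all choices made here for illustration.

```python
import numpy as np


def ring_weights(n):
    """Row-normalized nearest-neighbor weights on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = 0.5
        W[i, (i - 1) % n] = 0.5
    return W


def quad_moments(u, M, A_list):
    """Sample gamma (S,) and Gamma (S x 2) for a first-order process, symmetric A."""
    n = u.shape[0]
    Mu = M @ u
    gamma = np.array([u @ A @ u for A in A_list]) / n
    Gamma = np.array([[2.0 * (Mu @ A @ u), -(Mu @ A @ Mu)] for A in A_list]) / n
    return gamma, Gamma


def gmm_rho(u, M, A_list, Upsilon=None, bound=1.0, grid_size=2001):
    """Minimize [gamma - Gamma r(rho)]' Upsilon [gamma - Gamma r(rho)] over a grid."""
    gamma, Gamma = quad_moments(u, M, A_list)
    S = len(A_list)
    Upsilon = np.eye(S) if Upsilon is None else Upsilon
    grid = np.linspace(-bound, bound, grid_size)
    R = np.vstack([grid, grid ** 2])              # r(rho) = (rho, rho^2)' per grid point
    m = gamma[:, None] - Gamma @ R                # S x grid_size
    objective = np.einsum("sg,st,tg->g", m, Upsilon, m)
    return grid[np.argmin(objective)]


rng = np.random.default_rng(0)
n = 1500
rho_true = 0.4
M = ring_weights(n)

# Disturbances generated by u = rho * M u + eps.
eps = rng.normal(size=n)
u = np.linalg.solve(np.eye(n) - rho_true * M, eps)

# Two symmetric moment matrices with zero diagonal.
MtM = M.T @ M
A_list = [MtM - np.diag(np.diag(MtM)), M]

rho_hat = gmm_rho(u, M, A_list)
```

The objective is a polynomial in ρ and is well defined on the whole grid, including values for which the spatial filter would be singular, which mirrors the remark on the optimization space above.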

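Our basic consistency result for $\tilde{\rho}$ is given by the next theorem.

Theorem 1 (Consistency) Let $\tilde{\rho} = \tilde{\rho}(\tilde{\Upsilon})$ denote the GMM estimator for ρ defined by (16). Suppose Assumptions 1-9 hold, and suppose that $n^{1/2}(\tilde{\delta} - \delta) = O_p(1)$. Then

$$\tilde{\rho} - \rho \overset{p}{\to} 0 \quad \text{as } n \to \infty.$$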

5.2 Asymptotic Distribution of GMM Estimator for ρ

The limiting distribution of the GMM estimator $\tilde{\rho}$ will generally depend on the estimator $\tilde{\delta}$ used to compute estimated disturbances. To define GMM estimators for δ we can employ the moment conditions (12). Leading examples for limited and full information GMM estimators for δ will be the spatial 2SLS and 3SLS estimators defined in the next section. To keep the discussion general we maintain the following assumption regarding $\tilde{\delta}$.

Assumption 10 The estimator $\tilde{\delta}$ is asymptotically linear in ε in the sense that

12(eδ − δ) = −12X=1

T0ε + (1)

with T = FP, where F and P are, respectively, × and

× real non-stochastic matrices whose elements are uniformly bounded

in absolute value by a finite constant, and where is the dimension of the

parameter vector δ. (We note that under the maintained assumptions the

elements of T are again uniformly bounded in absolute value by a finite

constant).

In the Appendix we show that our spatial 2SLS and 3SLS estimators satisfy

this assumption. For 2SLS estimators of the parameters of the, say, first equa-

tion,P

=1T01ε reduces to T

011ε1. Also note that under Assumptions

4 and 10 we have 12(eδ − δ) = (1), as is assumed by Theorem 1.


In preparation of our result concerning the asymptotic distribution of the

GMM estimator we next define the matrices that will compose the limiting VC

matrix. In particular, consider the × matrix

J = −m

ρ= Γ

r(ρ)

ρ (17)

and the × matrix Ψ =

¡

¢where

= 2(2)−1

£¡A +A

0

¢ ¡A +A

0

¢¤(18)

+−1α0

"X=1

X=1

T0T

#α

with

α = −−1 £Z0(I −R∗0(ρ))(A +A0)(I −R∗(ρ))u

¤

As shown in the proof of the subsequent theorem, Ψ is the asymptotic

VC matrix of the sample moment vector m(ρ

eδ). The second term in

(18) stems from the fact that the sample moment vector depends on estimated

residuals. If the true residuals could be observed, we could take eδ = δ or

T = 0, in which case the second term in (18) is zero.

Theorem 2 (Asymptotic Normality) Let eρ = eρ(eΥ) denote the GMM

estimator for ρ defined by (16). Then given Assumptions 1-10, and given

that min(Ψ) ≥ ∗Ψ 0, we have

$$n^{1/2}(\tilde{\rho} - \rho) = [J'\Upsilon J]^{-1} J'\Upsilon\,\xi + o_p(1) \qquad (19)$$

where

ξ =

⎡⎢⎣ 1...

⎤⎥⎦ = −−12⎡⎢⎢⎣

12ε0(A1 +A

01)ε +α01

P=1T

0ε

...12ε0(A +A

0)ε +α0

P=1T

0ε

⎤⎥⎥⎦(20)

and (Ψ)

−12ξ→ (0 I). Furthermore

12(eρ − ρ) = (1) and

min£Ω(Υ)

¤ ≥ 0 for

$$\Omega(\Upsilon) = (J'\Upsilon J)^{-1} J'\Upsilon\,\Psi\,\Upsilon J\,(J'\Upsilon J)^{-1}. \qquad (21)$$

The above theorem implies that the difference between the cumulative distribution function of $n^{1/2}(\tilde{\rho} - \rho)$ and that of $N[0, \Omega]$ converges pointwise to zero, which justifies the use of the latter distribution as an approximation of the former.15

Remark. Clearly, $\Omega(\Psi^{-1}) = [J'\Psi^{-1}J]^{-1}$ and $\Omega(\Upsilon) - \Omega(\Psi^{-1})$ is positive semi-definite for any $\Upsilon$. Thus, choosing $\tilde{\Upsilon}$ as a consistent estimator for $\Psi^{-1}$ leads to the efficient GMM estimator. As discussed in the proof of the above theorem, the elements of $\Psi$ are uniformly bounded in absolute value and, hence, $\lambda_{\max}(\Psi) \le c^{**}_{\Psi}$ for some $c^{**}_{\Psi} < \infty$. Since by assumption also $0 < c^{*}_{\Psi} \le \lambda_{\min}(\Psi)$, it follows that the conditions on the eigenvalues of $\Upsilon$ postulated in Assumption 9 are automatically satisfied by $\Psi^{-1}$.
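The efficiency claim in the remark can be verified numerically for arbitrary inputs. The sketch below draws a hypothetical Jacobian `J`, a positive definite `Psi`, and an arbitrary positive definite weighting matrix `Upsilon`, evaluates the sandwich form in (21), and compares it with the efficient choice. All matrices are simulated placeholders, not quantities from the model.

```python
import numpy as np


def omega(J, Psi, Upsilon):
    """Sandwich VC matrix (J'UJ)^{-1} J'U Psi U J (J'UJ)^{-1}."""
    bread = np.linalg.inv(J.T @ Upsilon @ J)
    return bread @ J.T @ Upsilon @ Psi @ Upsilon @ J @ bread


rng = np.random.default_rng(1)
S, q = 4, 2                                  # number of moments, number of parameters

J = rng.normal(size=(S, q))
B = rng.normal(size=(S, S))
Psi = B @ B.T + S * np.eye(S)                # positive definite moment VC matrix
C = rng.normal(size=(S, S))
Upsilon = C @ C.T + np.eye(S)                # arbitrary positive definite weighting

Psi_inv = np.linalg.inv(Psi)
omega_efficient = omega(J, Psi, Psi_inv)     # collapses to (J' Psi^{-1} J)^{-1}
omega_generic = omega(J, Psi, Upsilon)

# The difference should be positive semi-definite.
gap = omega_generic - omega_efficient
smallest_eigenvalue = np.linalg.eigvalsh((gap + gap.T) / 2.0).min()
```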

We next define a consistent estimator for Ω(Υ). As a preliminary

result we have the following lemma.

Lemma 1 : Suppose Assumptions 1-4 hold. For = 1 definee = −1eε0eε (22)

with eε = y∗(eρ) − Z∗(eρ)eδ and assume eδ − δ = (1) andeρ − ρ = (1), then e − = (1)

We note that in the above lemma $\tilde{\rho}$ and $\tilde{\delta}$ can be any consistent estimators. Now let $\tilde{\Gamma}$ be defined as in (15) and corresponding to (17) define

eJ = eΓ r(eρ)ρ

(23)

Furthermore, corresponding to (18) consider the following estimator eΨ =³e´ for Ψ

, where

e = e2(2)−1 £¡A +A0

¢ ¡A +A

0

¢¤(24)

+−1eα0"

X=1

X=1

eeT0 eT

# eα

witheα = −−1 £Z0(I −R∗0(eρ))(A +A0)(I −R∗(eρ))eu¤

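where $\tilde{T}$ is some estimator for $T$, and $\tilde{u} = y - Z\tilde{\delta}$. Given estimators for $J$ and $\Psi$ we can now formulate the following estimator for $\Omega$:

$$\tilde{\Omega}(\tilde{\Upsilon}) = (\tilde{J}'\tilde{\Upsilon}\tilde{J})^{-1}\tilde{J}'\tilde{\Upsilon}\,\tilde{\Psi}\,\tilde{\Upsilon}\tilde{J}\,(\tilde{J}'\tilde{\Upsilon}\tilde{J})^{-1}. \qquad (25)$$

The next theorem establishes the consistency of $\tilde{\Psi}$ and $\tilde{\Omega}$.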

15This follows from Corollary F4 in Pötscher and Prucha (1997). Compare also the discus-

sion on pp. 86-87 in that reference.


Theorem 3 (VC Matrix Estimation) Suppose all assumptions of Theorem 2

hold, except for Assumption 9, and suppose that −1 eT0 eT−−1T0T =

(1). Then, provided that eρ − ρ = (1),

eΨ −Ψ

= (1), ( eΨ)

−1 − (Ψ)

−1 = (1),

and Ψ = (1), (Ψ

)−1 = (1). If furthermore Assumption 9 holds, then,

eΩ −Ω

= (1), (eΩ)

−1 − (Ω)

−1 = (1),

and Ω = (1) (Ω

)−1 = (1)

Note that for the first part of the above theorem $\tilde{\rho}$ can be any consistent estimator.

5.3 Joint Asymptotic Distribution of Estimators for ρ and

δ

In the following, let δ= [δ01 δ

0]

0 and ρ= [ρ01 ρ

0]

0, and leteδ= [eδ01 eδ0]0 and eρ= [eρ01 eρ0]0 denote the corresponding esti-mators as defined in the previous subsections. In this section, we derive the

joint limiting distribution of $\tilde{\delta}$ and $\tilde{\rho}$. Knowledge of the joint asymptotic distribution of all model parameters will then enable the researcher to test for the

presence of network spillovers in any of the dependent variables, explanatory

variables, or disturbances in the system.

As shown in the proof of the next theorem, the joint limiting distribution

of the estimators will depend on the joint limiting distribution of the following

vector of linear and quadratic forms [η0 ξ0]0 with η0 = [η01 η

0]

0 and

ξ0 = [ξ01 ξ

0]

0, where η = −12P

=1T0ε and ξ are defined

in (20). Let

T =

⎡⎢⎣ T11 T1

.... . .

...

T1 T

⎤⎥⎦ then the VC matrix of [η0 ξ

0]0 is given by

Ψ =

∙Ψ Ψ

Ψ0 Ψ

¸ (26)

where

Ψ = ξξ

0 =

⎡⎢⎣ Ψ11 Ψ

1

.... . .

...

Ψ1 Ψ

⎤⎥⎦ Ψ = ηη

0 = −1T0(Σ⊗ I)T

Ψ = ηξ

0 = −1T0(Σ⊗ I)T

=1 [α1 α]


and where the ( )-th element of Ψ is given by

= 2(2)

−1£¡A +A

0

¢ ¡A +A

0

¢¤+−1α0

"X=1

X=1

T0T

#α

Analogous to (24) consider the following estimator eΨ =

³e´ forΨ

, where

e = e2(2)−1 £¡A +A0

¢ ¡A +A

0

¢¤+−1eα0

"X=1

X=1

e eT0 eT

# eα

witheα = −−1 £Z0(I −R∗0(eρ))(A +A0)(I −R∗(eρ))eu¤

with $\tilde{T}$ being some estimator for $T$, and $\tilde{u} = y - Z\tilde{\delta}$. Next, consider the estimators

eΨ = −1 eT0(eΣ⊗I)eTeΨ = −1 eT0(eΣ⊗I)eT

=1 ([eα1 eα])

then our estimator for Ψ can be formulated as

eΨ =

" eΨ

eΨeΨ0

eΨ

# (27)

We now have the following theorem concerning the joint limiting distribution

of eδ − δ and eρ − ρ.Theorem 4 (Asymptotic Normality) Let eρ= [eρ01 eρ0]0 where eρ =eρ( eΥ) denotes the GMM estimator for ρ defined by (16), and let

eδ= [eδ01 eδ0]0be an estimator for δ. Then given Assumptions 1-10, −1 eT0 eT−−1T0T =

(1), and given that min(Ψ) ≥ for some 0, we have

12∙ eδ − δeρ − ρ

¸=

∙I 0

0 [£J0ΥJ

¤−1J0Υ]

¸ ∙ηξ

¸+ (1)

with

Ψ−12

∙ηξ

¸→ (0 I)


where =P

=1( + + + ). Furthermore, let

Ω =

∙I 0

0 =1¡(J0ΥJ)

−1J0Υ

¢ ¸Ψ (28)

×∙I 0

0 =1¡ΥJ(J

0ΥJ)

−1¢ ¸ eΩ =

"I 0

0 =1

³(eJ0 eΥ

eJ)−1eJ0 eΥ

´ # eΨ

×"I 0

0 =1

³eΥeJ(eJ0 eΥ

eJ)−1´#

Then, eΨ −Ψ = (1), eΨ−1 −Ψ−1 = (1), Ψ = (1), Ψ−1 = (1) andeΩ −Ω = (1), eΩ−1 −Ω−1 = (1), Ω = (1), Ω−1 = (1).

Theorem 4 implies that the difference between the joint cumulative distribution function of the estimators of all model parameters, $n^{1/2}[(\tilde{\delta}-\delta)', (\tilde{\rho}-\rho)']'$, and that of $N(0, \Omega)$ converges pointwise to zero, so that using the latter as an approximation of the former is justified.16 The theorem also states that $\tilde{\Omega}$ is a consistent estimator of $\Omega$. Of course, since the marginal distribution of a multivariate normal distribution is normal, the above theorem also establishes the limiting distribution of any subvector of $n^{1/2}[(\tilde{\delta}-\delta)', (\tilde{\rho}-\rho)']'$.

6 Limited and Full Information Two-Step Estimators

In the previous section we developed generic results regarding the asymptotic

properties of two-step GMM estimators for the parameters of model (1). The

results show that the limiting distribution of GMM estimators for ρ, which

employ an initial estimator for δ in computing estimated residuals, will gen-

erally depend on the limiting distribution of the latter. Thus, establishing the

proper asymptotic theory for specific estimators is “delicate”. In this section,

we define specific limited and full information estimators and provide specific

expressions for their asymptotic VC matrices.

6.1 Definition of Limited Information Estimators

In the following we define, in a sequence of steps, a specific generalized spatial

two-stage least squares (GS2SLS) estimator of δ and a GMM estimator of

ρ based on GS2SLS residuals. W.o.l.o.g. we consider the estimation of the

parameters of the -th equation.

16This follows from Corollary F4 in Pötscher and Prucha (1997). Compare also the discus-

sion on pp. 86-87 in that reference.


Step 1a: 2SLS estimator of δ

As a first step, we apply 2SLS to the equation under consideration in the untransformed model (5), using the instrument matrix H in Assumptions 5 and 6 to estimate δ. The 2SLS estimator, say $\tilde{\delta}$, is then defined as

$$\tilde{\delta} = (\tilde{Z}'Z)^{-1}\tilde{Z}'y \qquad (29)$$

where $\tilde{Z} = P_H Z = (\tilde{Y}, X, \tilde{\bar{Y}})$, $\tilde{Y} = P_H Y$, $\tilde{\bar{Y}} = P_H \bar{Y}$, and where $P_H = H(H'H)^{-1}H'$.17
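A minimal numerical sketch of the 2SLS formula in (29) is given below. The data-generating process, the variable names, and the choice of instruments are hypothetical and serve only to exercise the formula; no spatial structure is imposed here.

```python
import numpy as np


def projection(H):
    """Projection matrix P_H = H (H'H)^{-1} H'."""
    return H @ np.linalg.solve(H.T @ H, H.T)


def two_sls(y, Z, H):
    """2SLS estimator (Z_hat' Z)^{-1} Z_hat' y with Z_hat = P_H Z."""
    Z_hat = projection(H) @ Z
    return np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)


rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
h = rng.normal(size=(n, 2))                  # excluded instruments
common = rng.normal(size=n)                  # source of endogeneity

endog = h @ np.array([1.0, -1.0]) + 0.5 * x + common + rng.normal(size=n)
y = 1.0 + 2.0 * endog - 1.0 * x + common + rng.normal(size=n)

Z = np.column_stack([endog, np.ones(n), x])  # right-hand-side variables
H = np.column_stack([np.ones(n), x, h])      # instrument matrix

delta_2sls = two_sls(y, Z, H)

# Sanity checks: P_H is a projection, and instrumenting Z by itself gives OLS.
P = projection(H)
delta_ols = np.linalg.lstsq(Z, y, rcond=None)[0]
delta_self_instrumented = two_sls(y, Z, Z)
```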

Step 1b: Initial GMM estimator of the vector ρ based on 2SLS

residuals

Let eu = u(eδ) = y − Zeδ denote the 2SLS residuals of the -

th equation, and let m(ρ

eδ) denote the corresponding sample momentvector as defined in (15). Our initial GMM estimator for ρ is now defined as

eρ = :

∈I ||∈[−]m

(ρeδ)0m

(ρeδ) (30)

with ≥ 1.

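Step 2a: GS2SLS estimator of δ

Analogous to Kelejian and Prucha (1998), we next compute a generalized spatial two-stage least squares (GS2SLS) estimator of δ, $\hat{\delta}(\tilde{\rho})$. This estimator is defined as the 2SLS estimator of the equation under consideration in the spatially Cochrane-Orcutt transformed model (7) with ρ replaced by $\tilde{\rho}$, i.e.,

$$\hat{\delta} = \hat{\delta}(\tilde{\rho}) = [\hat{Z}_*(\tilde{\rho})'Z_*(\tilde{\rho})]^{-1}\hat{Z}_*(\tilde{\rho})'y_*(\tilde{\rho}) \qquad (31)$$

where $y_*(\tilde{\rho}) = [I - R^*(\tilde{\rho})]y$, $Z_*(\tilde{\rho}) = [I - R^*(\tilde{\rho})]Z$, and $\hat{Z}_*(\tilde{\rho}) = P_H Z_*(\tilde{\rho})$. We shall also utilize the following estimator for the diagonal block of Ψ associated with this equation (corresponding to $\hat{\delta}$):

$$\hat{\Psi} = \hat{\sigma}\,[n^{-1}\hat{Z}_*(\tilde{\rho})'\hat{Z}_*(\tilde{\rho})]^{-1}$$

where $\hat{\sigma} = n^{-1}\hat{\varepsilon}'\hat{\varepsilon}$ with $\hat{\varepsilon} = y_*(\tilde{\rho}) - Z_*(\tilde{\rho})\hat{\delta}$.

Step 2b: Efficient GMM estimator of ρ based on GS2SLS residuals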

Let bu = y−Zbδ denote the GS2SLS residuals of the -th equation,and let m

(ρbδ) denote the corresponding sample moment vector as

defined in (15). Then, the corresponding efficient GMM estimator for ρbased on GS2SLS residuals is given by

bρ = :

∈I ||∈[−]m

(ρbδ)0(bΨ

)−1m

(ρbδ) (32)

17 In the previous section we used tilde to denote generic estimators. In the following tilde

is used to denote our initial 2SLS based estimators.


where bΨ = (

b ) is an estimator of the VC matrix of the limiting dis-tribution of the normalized sample moments 12m

(ρbδ). Specifically,

we have b = (2)−1b2 £(A +A0)(A +A

0)

¤+bα0 bΨ

bα

with

bα = −−1£Z0∗(eρ)(A +A

0)(I −R∗(eρ))bu¤

The claim that $\hat{\Psi}^{-1}$ provides the efficient weighting of the sample moments will be established by Theorem 5 below.
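The GS2SLS step (Step 2a) amounts to applying the spatial Cochrane-Orcutt filter to the data and then running 2SLS on the filtered data. The sketch below illustrates this under assumptions: the filter is taken to be $I - \sum_m \rho_m M_m$, the data are simulated placeholders, and `rho_tilde` stands in for a first-step estimate; none of the names are from the paper.

```python
import numpy as np


def spatial_co_transform(v, M_list, rho):
    """Apply (I - sum_m rho_m M_m) to a vector or matrix v."""
    out = np.array(v, dtype=float)
    for rho_m, M in zip(rho, M_list):
        out = out - rho_m * (M @ v)
    return out


def two_sls(y, Z, H):
    """2SLS estimator with instrument matrix H."""
    Z_hat = H @ np.linalg.solve(H.T @ H, H.T @ Z)
    return np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)


def gs2sls(y, Z, H, M_list, rho):
    """2SLS applied to the spatially Cochrane-Orcutt transformed equation."""
    y_star = spatial_co_transform(y, M_list, rho)
    Z_star = spatial_co_transform(Z, M_list, rho)
    return two_sls(y_star, Z_star, H)


rng = np.random.default_rng(0)
n = 50
M = np.zeros((n, n))
for i in range(n):
    M[i, (i + 1) % n] = 0.5
    M[i, (i - 1) % n] = 0.5

x = rng.normal(size=n)
h = rng.normal(size=(n, 2))
endog = h @ np.array([1.0, -1.0]) + rng.normal(size=n)
y = 1.0 + 2.0 * endog - x + rng.normal(size=n)
Z = np.column_stack([endog, np.ones(n), x])
H = np.column_stack([np.ones(n), x, h])

rho_tilde = [0.3]                            # placeholder for a first-step estimate
delta_hat = gs2sls(y, Z, H, [M], rho_tilde)
y_star = spatial_co_transform(y, [M], rho_tilde)
```

Setting all autoregressive parameters to zero switches the filter off, so the function then reproduces the untransformed 2SLS estimator.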

6.2 Asymptotic Properties of Limited Information Esti-

mators

In this subsection, we derive results concerning the joint limiting distribution

of the GS2SLS estimators bρ and bδ by applying the generic limit theorydeveloped in Theorem 4. In preparation we first specialize the expressions for

estimators of the ( )-th blocks of Ψ , Ψ

and Ω

, Ω , Ω

as implied by

the specific structure of bδ and bρ. More specifically, letbΩ =

bΨ

bΩ =

bΨ

bJ bΩ =

∙eJ0 ³bΨ

´−1 eJ¸−1 with bΨ

= bΨ [bα1 bα]

bJ =³bΨ

´−1 eJ ∙eJ0 ³bΨ

´−1 eJ¸−1 and eJ = J(eρ). We now have the following result concerning the jointasymptotic distribution of bδ and bρ.Theorem 5 (Joint Asymptotic Normality of bρ and bδ) Suppose Assump-tions 1-8 hold, and that the smallest eigenvalues of Ψ are bounded away from

zero.18 Then, bρ is efficient among the class of GMM estimators based on

GS2SLS residuals, and∙ bδ − δ)bρ − )

¸∼

"−1

Ã bΩ

bΩbΩ0

bΩ

!#

18Explicit expressions for the sub-matrices composing Ψ specialized to and aregiven in the proof of the theorem.


In the above theorem the estimators for the asymptotic VC matrix of $\hat{\rho}$ and $\hat{\delta}$ employ $\tilde{\rho}$ as an estimator for ρ. The theorem continues to hold if $\tilde{\rho}$ is replaced by $\hat{\rho}$ (or any other consistent estimator).

6.3 Definition of Full Information Estimators

In the previous section we discussed GS2SLS estimation, where the parameters

of each equation are estimated separately from the spatially Cochrane-Orcutt

transformed model (7). In the following, we consider full information estimation,

where all parameters are estimated jointly from the stacked spatially Cochrane-

Orcutt transformed model (8). In particular, we will consider a generalized

spatial three-stage least squares (GS3SLS) estimator.

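Step 3a: GS3SLS estimator of δ

The GS3SLS estimator of δ based on the stacked spatially Cochrane-Orcutt transformed model (8) is given by

$$\hat{\hat{\delta}} = [\hat{Z}_*'(\hat{\rho})(\hat{\Sigma}^{-1}\otimes I)Z_*(\hat{\rho})]^{-1}\hat{Z}_*'(\hat{\rho})(\hat{\Sigma}^{-1}\otimes I)\,y_*(\hat{\rho}) \qquad (33)$$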

where y∗(bρ) = £y0∗1(bρ1) y0∗(bρ)¤0, Z∗(bρ) = =1£Z∗(bρ)¤,bZ∗(bρ) = =1

hbZ∗(bρ)i with bZ∗(bρ) = PHZ∗(bρ), and wherebΣ = (b) with b = −1bε0bε and bε = y∗(bρ)−Z∗(bρ)bδ.

Below, we shall also utilize the following estimator for Ψ (corresponding tobbδ): bbΨ

=h−1bZ0∗(bρ)(bΣ−1 ⊗ I)bZ∗(bρ)i−1

and we denote the ( )-th blocks of Ψ and

bbΨ

with Ψ and

bbΨ

,

respectively.
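The stacked GS3SLS formula in (33) can be sketched as follows, assuming the equation-specific data have already been spatially Cochrane-Orcutt transformed. The two-equation system, the simulated inputs, and the helper names are hypothetical; the sketch only illustrates the algebra of the stacked estimator.

```python
import numpy as np


def block_diag(blocks):
    """Assemble a block-diagonal matrix from a list of 2-D arrays."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


def two_sls(y, Z, H):
    """Equation-by-equation 2SLS, used here as a benchmark."""
    Z_hat = H @ np.linalg.solve(H.T @ H, H.T @ Z)
    return np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)


def gs3sls(y_star_list, Z_star_list, H, Sigma_hat):
    """Stacked estimator [Z_hat'(S^{-1} x I) Z]^{-1} Z_hat'(S^{-1} x I) y."""
    n = H.shape[0]
    P = H @ np.linalg.solve(H.T @ H, H.T)
    Z = block_diag(Z_star_list)
    Z_hat = block_diag([P @ Zg for Zg in Z_star_list])
    y = np.concatenate(y_star_list)
    weight = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))
    return np.linalg.solve(Z_hat.T @ weight @ Z, Z_hat.T @ weight @ y)


rng = np.random.default_rng(0)
n = 40
H = rng.normal(size=(n, 4))
Z_star_list = [rng.normal(size=(n, 2)), rng.normal(size=(n, 3))]
y_star_list = [rng.normal(size=n), rng.normal(size=n)]

Sigma_hat = np.array([[1.0, 0.4], [0.4, 2.0]])   # placeholder innovation VC estimate
delta_3sls = gs3sls(y_star_list, Z_star_list, H, Sigma_hat)

# With an identity Sigma the stacked estimator reduces to equation-wise 2SLS.
delta_identity = gs3sls(y_star_list, Z_star_list, H, np.eye(2))
delta_equationwise = np.concatenate(
    [two_sls(y, Z, H) for y, Z in zip(y_star_list, Z_star_list)]
)
```

Forming the Kronecker product explicitly is convenient for a small illustration; for large samples one would exploit the block structure instead of building the full weighting matrix.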

Step 3b: GMM estimator of ρ based on GS3SLS residuals

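In a final step we compute a further GMM estimator of ρ based on the GS3SLS residuals $\hat{\hat{u}} = y - Z\hat{\hat{\delta}}$, where the equation-specific $\hat{\hat{\delta}}$ denotes the corresponding component of the stacked GS3SLS estimator. Let m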

(ρbbδ) denote the corresponding sample moment vector as

defined in (15). Then, the corresponding GMM estimator for ρ based on

GS3SLS residuals is given by

bbρ = :

∈I ||∈[−]m

(ρbbδ)0µbbΨ

¶−1m

(ρbbδ)

(34)

wherebbΨ

is an estimator of the VC matrix Ψ of the limiting distribution

of the normalized sample moments 12m(ρ

bbδ). Towards presenting

the asymptotic distribution of bbρ1 bbρ we need estimators not only forthe ( )-th block of Ψ

, but more generally for the ( )-th block Ψ. LetbbΨ

andbbΨ

denote the estimators for Ψ and Ψ

, respectively, then the

( )-th element ofbbΨ

is defined asbb = (2)−1b2 £(A +A0)(A +A

0)

¤(35)

+bbα0 bbΨ

bbα

with bbα = −−1hZ0(I −R∗0(bρ))(A +A

0)(I −R∗(bρ))bbui.

6.4 Asymptotic Properties of Full Information Estimators

In this subsection, we derive results concerning the joint limiting distribution of

the GS3SLS estimators bbρ and bbδ by applying again the generic limit theorydeveloped in Theorem 4. In preparation, we first specialize the expressions for

estimators of Ψ , Ψ

and Ω

, Ω , Ω

as implied by the specific structure

of bbρ and bbδ. More specifically, letbbΩ

=bbΨ

bbΩ

=bbΨ

=1

µbbJ¶ bbΩ

= =1

µbbJ0¶ bbΨ

=1

µbbJ¶with

bbΨ

=bbΨ

=1

³[bbα1 bbα]

´bbJ =

µbbΨ

¶−1 bJ "bJ0µbbΨ

¶−1 bJ#−1 and with bJ = J(bρ). The next theorem establishes the joint limiting

distribution of bbρ and bbδ.Theorem 6 (Joint Asymptotic Normality of bbρ and bbδ) Suppose Assumptions1-8 hold, and that the smallest eigenvalues of Ψ are bounded away from zero.

19

Then, "(bbδ − δ)(bbρ − ρ)

#∼

⎡⎣−1⎛⎝ bbΩ

bbΩ

bbΩ0

bbΩ

⎞⎠⎤⎦ The estimators bbρ based on GS3SLS residuals are efficient within their class.19Explicit expressions for the sub-matrices Ψ are given in the proof of the theorem.


In the above theorem the estimators for the asymptotic VC matrix of $\hat{\hat{\rho}}$ and $\hat{\hat{\delta}}$ employ $\hat{\rho}$ as an estimator for ρ. The theorem continues to hold if $\hat{\rho}$ is replaced by $\hat{\hat{\rho}}$ (or any other consistent estimator).

7 Limited and Full Information One-Step Estimators

In the following we discuss the one-step analogues to the two-step estimators

considered in the previous section. For simplicity we assume the availability of

a consistent estimator for Σ, say $\tilde{\Sigma} = (\tilde{\sigma})$.

Towards defining our one-step limited information GMM estimator, consider

the stacked moment vector

m(ρ δ) =

∙m

(ρ δ)

m(ρ δ)

¸

wherem(ρ δ) andm

(ρ δ) are the vectors of linear and quadratic

sample moments for the -th equation as defined in (12) and (13). Let

Φ =

£−1H0

H

¤and Φ

= ()

with

= (2)−12

£(A +A

0)(A +A

0)

¤

Then, m(ρ δ) = 0 and

(12m(ρ δ)) =

∙Φ 0

0 Φ

¸

Let bΦ and

bΦ denote the corresponding estimators, where is re-

placed by some consistent estimator e. Then the one-step limited informa-tion GMM estimator is defined as

(bδ bρ) = argmin

m(ρ δ)0" bΦ

0

0 bΦ

#−1m(ρ δ)

A simple adaptation of the methodology used to derive the limiting distribution

of two-step estimators yields∙ bδ − δ)bρ − ρ)¸∼

"−1

Ã bΩ 0

0 bΩ

!#

with bΩ = b[−1bZ∗(bρ)0bZ∗(bρ)]−1bΩ =

∙J0(bρ)³bΦ

´−1J(bρ)¸−1


A comparison of the above expressions for the asymptotic VC matrix of the one-

step estimator with those for the two-step estimator given by Theorem 5 reveals

that the difference stems solely from terms in the latter expression involving

estimates for α. They reflect that the two-step GMM estimator for ρ is based on estimated disturbances. If the disturbances were observed, we would have α = 0, and both one- and two-step estimators would have the same limiting distribution.

Towards defining our one-step full information GMM estimator consider the

stacked sample moment vector

m(ρ δ) =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎣

m1(ρ1 δ1)

...

m(ρ δ)

m1(ρ1 δ1)

...

m(ρ δ)

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎦

Let

Φ = Σ⊗ £−1H0

H

¤and

Φ =

⎡⎢⎣ Φ11 Φ

1

.... . .

...

Φ1 Φ

⎤⎥⎦ where the ( )-th element of Φ

is given by

= 2(2)

−1£¡A +A

0

¢ ¡A +A

0

¢¤

Then, m(ρ δ) = 0 and

(m(ρ δ)) = −1∙Φ 0

0 Φ

¸

Let bΦ and bΦ

denote the corresponding estimators where is replaced by some consistent estimator e. Then the one-step full information GMM estimator is defined as

(bbδ bbρ) = argmin

m(ρ δ)0" bΦ

0

0 bΦ

#−1m(ρ δ)

Again, a simple adaptation of the methodology used to derive the limiting dis-

tribution of two-step estimators yields" bbδ − δ)bbρ − ρ)#∼

"−1

Ã bΩ 0

0 bΩ

!#


where

bbΩ

=h−1bZ0∗(bbρ)(eΣ−1 ⊗ I)bZ∗(bbρ)i−1 bbΩ

= =1

µbbJ0¶ bΦ =1

µbbJ¶

with

bbJ = ³bΦ

´−1J(bbρ) ∙J0(bbρ)³bΦ

´−1J(bbρ)¸−1

A comparison of the above expressions for the asymptotic VC matrix of the

one-step estimator with those for the two-step estimator given by Theorem 6

reveals again that the difference stems solely from terms in the latter expression

involving estimates for α.

8 Concluding Remarks

This paper develops estimation methodologies for a cross-sectional simultaneous

equation model in variables, where simultaneity stems from interdependencies

in the variables as well as from network interdependencies. Taking guidance

from the spatial literature network interdependencies are modeled in the form of

weighted averages. For simplicity, and consistent with the spatial literature, we

refer to those weighted averages as spatial lags. We allow for higher order spatial

lags in the endogenous variables, exogenous variables and disturbances. As a

consequence, the model provides for significant flexibility in modeling network

effects.

The paper develops an estimation theory for both limited and full informa-

tion generalized method of moments estimators, which utilize both linear and

quadratic moment conditions. We consider both two-step and one-step estima-

tors. An important aim in specifying our estimators was that the estimators

remain feasible even for very large data sets and general weight matrices. Re-

sults from an initial Monte Carlo study of the small sample behavior of the

estimators reported in the supplemental appendix are encouraging.

We expect the model will be helpful for empirical research in both macro and

micro economic settings, as well as areas outside of economics. As illustrated in

the paper, one potential application is for modeling social interaction in different

activities; e.g., the level of different physical activities among groups of friends

connected via an activity tracker such as Fitbit. Future research includes an

extension of the methodology to panel data.


A Appendix: Preliminary Results

In this appendix we collect some preliminary results. All proofs are relegated

to a supplemental appendix, which will be made available on our web sites.

A.1 Asymptotic Linearity of S2SLS, GS2SLS and GS3SLS

Assumption 10 postulates that the estimators of the regression parameters are

asymptotically linear. In the following we show that the S2SLS, GS2SLS and

GS3SLS estimators are indeed asymptotically linear.

Lemma A.1 : Let A = () be some × real matrix where the row

sums of the absolute elements are bounded uniformly in by some finite con-

stant. Let μ = (1 ) and η = (1

)0 be some × 1random vectors with supmax

=1¯

¯ ∞ and supmax

=1¯

¯

∞ for some 1, and let ξ = (1 )0 = μ + Aη. Then

supmax

=1¯

¯∞.

Lemma A.2 : Suppose Assumptions 1-4 hold. Let Z = [YXY], then

¯

¯4 ≤ ∞ (A.1)

where does not depend on , and .

Lemma A.3 : Suppose Assumptions 1-4 hold. Let Z = [YXY] and

let A = () be some × matrix, where the row and column sums of the

absolute elements are bounded uniformly in by some finite constant. Then

−1u0Au = (1), −1ZAu = (1) and

−1ZAZ = 1) and

furthermore

−1ZAu − −1ZAu = (1)

Lemma A.4 : Suppose Assumptions 1-4, 5 and 6 hold. Consider the S2SLS

estimator eδ = (bZ0Z)−1bZ0ywhere bZ = PH

Z and PH=H(H

0H)

−1H0. Then,

(a) 12(eδ − δ) = −12T0ε + (1) with T = FP and

where

P = Q−1HHQHZ[Q0HZQ

−1HHQHZ]

−1

F =¡I −R∗0

¢−1H


(b) −12T0ε = (1).

(c) P is a finite matrix and eP −P = (1) for

eP = (−1H0H)

−1(−1H0Z)×

[(−1Z0H)(−1H0

H)−1(−1H0

Z)]−1

(d) min(−1T0T) ≥ for some 0 for all large .

Lemma A.5 : Suppose Assumptions 1-4, 5 and 6 hold. Consider the GS2SLS

estimator bδ = [bZ∗(bρ)0Z∗(bρ)]−1bZ∗(bρ)0y∗(bρ)where bZ∗(bρ) = PH

Z∗(bρ), where bρ is any consistent estimator forρ. Then,

(a) 12[bδ − δ] = −12T∗0ε + (1) with T∗ = F∗P

∗ and

where

P∗ = Q−1HHQHZ∗(ρ)[Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1

F∗ = H

(b) −12T∗0ε = (1).

(c) P∗ = (1) and eP∗ −P∗ = (1) for

eP∗ = (−1H0H)

−1(−1H0Z∗(bρ))×£

(−1Z0∗(bρ)H)(−1H0

H)−1(−1H0

Z∗(bρ))¤−1 (d) min(

−1T∗0T∗) ≥ for some 0 for large .

Lemma A.6 : Suppose Assumptions 1-4, 5-6 hold. Consider the GS3SLS es-

timatorbbδ = hbZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1 bZ0∗(bρ)(bΣ−1 ⊗ I)y∗(bρ)where bZ∗(bρ) = =1

hbZ0∗(bρ)i with bZ∗(bρ) = PHZ∗(bρ), andbρ = £bρ01 bρ0¤0 is any consistent estimator for bρ and bΣ is any consis-

tent estimator for Σ. Then

(a) 12[bbδ − δ] = −12T∗∗0 ε + (1) with T

∗∗ = F∗∗ P

∗∗ , and where

P∗∗ =£Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤×© £Q0

HZ∗(ρ)¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−129

and

F∗∗ = I ⊗H

(b) −12T∗∗0 = (1).

(c) P∗∗ = (1) and eP∗∗ −P∗∗ = (1) for

eP∗∗ =hbΣ−1 ⊗ (−1H0

H)−1i

£−1H0

Z∗(bρ)¤×h−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1

(d) min(−1T∗∗0 T∗∗ ) ≥ for some 0 for large .

A.2 Auxiliary Results for Linear Quadratic Forms

In the following we establish some auxiliary results on the relationship between

linear and quadratic forms based on some $n \times 1$ disturbance vector u and corresponding forms based on a predictor $\tilde{u}$.

Assumption A.1 For $n \ge 1$ the $n \times 1$ disturbance vector u is generated by

R×

u×1

= e×1

where R is a non-stochastic nonsingular $n \times n$ matrix, and the row and column

sums of the absolute elements of R and R−1 are bounded uniformly by some

finite constant, and the innovations e = (e1 e)0 have the following

properties: For each ≥ 1 the random variables e1 e are totally inde-

pendent with e = 0, (e2) = 2 0 , and sup1≤≤≥1 |e|4+ ∞for some 0.

Assumption A.2 : The predictor eu for u satisfies thateu − u = D∆

where D = (d) is an × ∆ random matrix and ∆ is a ∆ × 1 randomvector. Furthermore sup sup1≤≤1≤≤∆ |d|

2+ ∞ for some 0,

and 12 k∆k = (1).

Assumption A.3 For any × real matrix A∗, whose row and column sumsare bounded uniformly in absolute value,

−1D0A∗u − −1D0A

∗u = (1)

Remark: The above assumptions are formulated in a general fashion, so that

the results on the properties of linear quadratic forms established below can also

be utilized in a variety of contexts. For an interpretation of the results specific


to this paper, suppose that u corresponds to the disturbance term of the -

th equation of the model defined by (1)-(5). Then the quantities considered

in Assumptions A.1-A.3 should be interpreted as u = u, D = − Z,

R = I −P∈I M, e = ε, and ∆ = eδ − δ, where eδ issome estimator for the parameter vector δ. Observe that under Assumptions

1-4, and given that $\tilde{\delta}$ is an $n^{1/2}$-consistent estimator, these quantities clearly satisfy Assumptions A.1-A.3 in light of Lemmata A.1 and A.2.

Lemma A.7 : Let A∗ be an × real matrix whose row and column sums are

bounded uniformly in absolute value. Suppose Assumptions A.1 and A.2 hold,

then:

(a) −1 |u0A∗u| = (1), (−1u0A∗u) = (1) and

−1eu0A∗eu − −1u0A∗u = (1)

(b) −1 |D0A∗u| = (1), and

−1D0A∗eu − −1D0A

∗u = (1)

(c) If furthermore Assumption A.3 holds, then

−12eu0A∗eu = −12u0A∗u +α∗0

12∆ + (1)

with α∗ = −1D0(A∗ +A

∗0 )u. (Of course, in light of (a) and (b) we

have α∗ = (1) and −1D0(A∗ +A

∗0 )eu −α∗ = (1))

We next state an additional assumption regarding ∆, which is satisfied

by the various IV estimators considered in the paper; compare Lemmata A.4-

A.6. This assumption then permits a specialization of the r.h.s. expression for

−12eu0A∗eu given in Lemma A.7(c) for the case where A∗ = R0AR, which

will be helpful for deriving its limiting distribution. Note that given Assumption

A.1 and A∗ = R0AR we have u

0A∗u = u

0R

0ARu = e

0Ae.

Assumption A.4 (a) The vector of innovations ε = [ε01 ε

0]

0 satisfiesAssumption 4. (b) The estimator ∆ is asymptotically linear in the sense that

12∆ = −12X=1

T0ε + (1)

with T = FP where the F = () are × dimensional

real nonstochastic matrices with sup −1P

=1 || ∞ with 2 for

= 1 = 1 , and P = () are × ∆ dimensional real

nonstochastic matrices whose elements are uniformly bounded in absolute value.


We now have the following specialization of Lemma A.7(c).

Lemma A.8 : Suppose Assumptions A.1-A.3 hold where ∆ is asymptotically

linear satisfying Assumption A.4, and suppose e = ε. Furthermore, let A

be an × real matrix whose row and column sums are bounded uniformly in

absolute value.

(a) Then

−12eu0R0AReu = −12ε0Aε + −12X=1

a0ε + (1)

where a = (1 )0 = Tα with α = −1D0R

0(A +

A0)Ru. Furthermore, sup

−1P=1 || ∞ for 2.

(b) If furthermore the diagonal elements of A are zero, then

"−12ε0Aε + −12

X=1

a0ε

#= 0

"−12ε0Aε + −12

X=1

a0ε

#

= −12 [A(A +A0)] + −1

X=1

X=1

a0a

B Appendix: Proofs for Section 5

Proof of Theorem 1: The existence and measurability of $\tilde{\rho}$ is assured by, e.g., Lemma 3.4 in Pötscher and Prucha (1997). The objective function of the

weighted nonlinear least squares estimator and its corresponding non-stochastic

counterpart are given by, respectively,

(ρ) = m(ρ

eδ)0 eΥm(ρ

eδ)=

heΓr(ρ)− eγi0 eΥ

heΓr(ρ)− eγi(ρ) =

£Γr(ρ)− γ

¤0Υ

£Γr(ρ)− γ

¤

where the quantities appearing in the above equation are defined before the

theorem in the text. To prove the consistency of $\tilde{\rho}$ we show that the conditions of, e.g., Lemma 3.1 in Pötscher and Prucha (1997) are satisfied for the problem at

hand. We first show that ρ is an identifiably unique sequence of minimizers of


. Observe that (ρ) ≥ 0 and that (ρ) = 0, since = Γr(ρ)

by (14). Utilizing Assumptions 8 and 9 we get

(ρ)−(ρ) = (ρ)

=£Γr(ρ)− Γr(ρ)

¤0Υ

£Γr(ρ)− Γr(ρ)

¤=

£r(ρ)− r(ρ)

¤0Γ0ΥΓ

£r(ρ)− r(ρ)

¤≥ min(Υ)min(Γ

0Γ)

£r(ρ)− r(ρ)

¤0 £r(ρ)− r(ρ)

¤≥ ∗

X=1

£ −

¤2for some ∗ 0. Hence, for every 0 and we have

inf:

∈I ||∈[−] and k−k≥[(ρ)−(ρ)]

≥ inf:

∈I ||∈[−] and k−k≥∗

X∈I

£ −

¤2= ∗2 0

which proves that ρ is identifiably unique. Next let $\Phi = [\Gamma, -\gamma]$ and $\tilde{\Phi} = [\tilde{\Gamma}, -\tilde{\gamma}]$, then

¯(ρ)−(ρ)

¯=

¯£r(ρ)

0 1¤ heΦ0 eΥ

eΦ −Φ0ΥΦ

i £r(ρ)

0 1¤0 ¯

≤°°°eΦ0 eΥ

eΦ −Φ0ΥΦ

°°°°°[r(ρ)0 1]°°2≤

°°°eΦ0 eΥeΦ −Φ0ΥΦ

°°°©1 + [2 + ( − 1)2] 4ª

Next observe that the elements of Φ and eΦ are all of the form −1u0uand −1eu0eu, where the row and column sums of are bounded uni-

formly in absolute value. Recall that u =hP

∈I M

iu + ε

and observe that $\tilde{u} - u = -Z(\tilde{\delta} - \delta)$. By Assumptions 1 and 2

the row and column sums of the absolute elements ofP

∈I M andhP∈I M

i−1are bounded in absolute value by some finite constant.

By Assumption 4 the elements of ε are totally independent with zero mean,

positive variance and finite 4 + absolute moments for some 0. Further-

more, observe that by Lemma A.2 the fourth absolute moments of the elements

of Z are uniformly bounded by a finite constant and 12(eδ−δ) = (1).

It now follows immediately from Lemma A.7(a) that eΦ − Φ→ 0, that the

elements of Φ are (1) and, consequently, that the elements of eΦ are (1).

The elements of eΥ and Υ have the analogous properties in light of condi-

tion (b) in the theorem. Given this it follows from the above inequality that


(ρ) − (ρ) converges to zero uniformly over the optimization space,

i.e.,

sup:

∈I ||∈[−]¯(ρ)−(ρ)

¯≤

°°°eΦ0 eΥeΦ −Φ0ΥΦ

°°°©1 + [2 + ( − 1)2] 4ª → 0

as $n \to \infty$. The consistency of $\tilde{\rho}$ now follows directly from Lemma 3.1 in

Pötscher and Prucha (1997).
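The identifiable-uniqueness step rests on the eigenvalue bound $x'\Gamma'\Upsilon\Gamma x \ge \lambda_{\min}(\Upsilon)\,\lambda_{\min}(\Gamma'\Gamma)\,x'x$. The following is a minimal numerical sketch of that inequality only; the dimensions, the random matrices and all variable names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 moment conditions, 2 autoregressive parameters.
Gamma = rng.normal(size=(5, 2))
M = rng.normal(size=(5, 5))
Upsilon = M @ M.T + 0.5 * np.eye(5)  # symmetric positive definite weight matrix

# Lower bound lambda_min(Upsilon) * lambda_min(Gamma' Gamma).
lam_upsilon = np.linalg.eigvalsh(Upsilon).min()
lam_gamma = np.linalg.eigvalsh(Gamma.T @ Gamma).min()
lower_bound = lam_upsilon * lam_gamma


def rayleigh_ratio(x):
    """x' Gamma' Upsilon Gamma x divided by x'x."""
    v = Gamma @ x
    return float(v @ Upsilon @ v) / float(x @ x)


# Evaluate the ratio on many random directions; none may fall below the bound.
directions = rng.normal(size=(1000, 2))
min_ratio = min(rayleigh_ratio(x) for x in directions)
```

In the proof the vector $x$ plays the role of the difference between $r(\cdot)$ evaluated at a candidate parameter value and at the true value, so a strictly positive lower bound separates the objective function from zero away from the true parameter.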

Proof of Theorem 2: We have shown in Theorem 1 that the GMM estimator $\tilde{\rho}$ defined in (16) is consistent. Apart from a set whose probability tends to zero, the estimator satisfies the following first-order condition

m(eρeδ)0 eΥ

m(eρeδ)

ρ= 0

Substituting the mean value theorem expression

m(eρeδ) =m

(ρeδ) + m

(ρeδ)

ρ(eρ − ρ)

where ρ is some between value, into the first-order condition yields

m(eρeδ)

ρ0eΥ

m(ρ

eδ)ρ

12(eρ − ρ) (B.1)

= −m(eρeδ)

ρ0eΥm

(ρ

eδ)Observe that

m(ρ

eδ)ρ

= −eΓ r(ρ)ρ

(B.2)

and consider

eΞ =m

(eρeδ)ρ0

eΥ

m(ρ

eδ)ρ

(B.3)

=r(eρ)

ρ0eΓ0 eΥ

eΓ r(ρ)ρ

Ξ =r(ρ)

ρ0Γ0ΥΓ

r(ρ)

ρ

In proving Theorem 1 we have demonstrated that $\tilde{\Gamma} - \Gamma \overset{p}{\to} 0$ and that the elements of $\tilde{\Gamma}$ and $\Gamma$ are $O_p(1)$ and $O(1)$, respectively. By Assumption 9


we have $\tilde{\Upsilon} - \Upsilon = o_p(1)$ and also that the elements of $\tilde{\Upsilon}$ and $\Upsilon$ are $O_p(1)$ and $O(1)$, respectively. Since $\tilde{\rho}$ and the between value are consistent for $\rho$, and the elements of $\rho$ are bounded in absolute value, clearly

eΞ −Ξ → 0 (B.4)

as → ∞, and furthermore eΞ = (1) and Ξ = (1). In particular

max(Ξ) ≤ ∗∗Ξ where ∗∗Ξ is some finite constant. Observe that in light of

Assumptions 8 and 9 we have

min(Ξ) ≥ min(Υ)min(Γ0Γ)min

½r(ρ)

ρ0

r(ρ)

ρ

¾≥ ∗Ξ

for some ∗Ξ 0, observing that

min

½r(ρ)

ρ0

r(ρ)

ρ

¾= min

© + semi-positive matrix

ª ≥ 1. Hence $0 < \lambda_{\max}(\Xi^{-1}) = 1/\lambda_{\min}(\Xi) \le 1/\lambda^{*}_{\Xi} < \infty$, and thus we also have $\Xi^{-1} = O(1)$. Let $\tilde{\Xi}^{+}$ denote the generalized inverse of $\tilde{\Xi}$. It then follows as a special case of Lemma F1 in Pötscher and Prucha (1997) that $\tilde{\Xi}$ is nonsingular eventually with probability tending to one, that $\tilde{\Xi}^{+} = O_p(1)$, and

that eΞ+ −Ξ−1 → 0 (B.5)

as $n \to \infty$. Premultiplying (B.1) by $\tilde{\Xi}^{+}$ and rearranging terms yields

12(eρ − ρ) =

h − eΞ+eΞi12(eρ − ρ)−eΞ+ m

(eρeδ)ρ0

eΥ12m

(ρeδ)

In light of the above discussion the first term on the r.h.s. is zero on ω-sets

of probability tending to one. This yields

12(eρ − ρ) = −eΞ+ m(eρeδ)

ρ0eΥ

12m(ρ

eδ) + (1)

(B.6)

Observe that

eΞ+ m(eρeδ)

ρ0eΥ −Ξ−1

r(ρ)

ρ0Γ0Υ = (1) (B.7)

In light of (15) the elements of 12m(ρ

eδ) are of the form ( = 1 )−12eu0 £I −R∗0(ρ)¤A

£I −R∗(ρ)

¤ eu

with £I −R∗(ρ)

¤u = εeu − u = −Z(eδ − δ)

12(eδ − δ) = −12X=1

T0ε + (1)

Recall that if we define eu = eu, u = u, e = ε, R = I − R∗(ρ), D = −Z, ∆ = eδ − δ, these quantities satisfy Assumptions A.1-A.3 in light of Lemmata A.1-A.3. Hence it follows from Lemma A.8 that

−12eu0 £I −R∗0(ρ)¤A

£I −R∗(ρ)

¤ eu= −12

1

2ε0(A +A

0)ε + −12α0

X=1

T0ε + (1)

where α = −−1Z0£I −R∗0(ρ)

¤(A +A

0)ε. Furthermore,

the lemma implies that for some 2 the sample moments of the absolute

elements Tα are uniformly bounded. We now have

12m(ρ

eδ) (B.8)

= −12

⎡⎢⎢⎣12ε0(A1 +A

01)ε +α01

P=1T

0ε

...12ε0(A +A

0)ε +α0

P=1T

0ε

⎤⎥⎥⎦+ (1)

Let Ψ = (

) denote the VC matrix of the vector of linear quadratic

forms on the r.h.s. of (B.8), then observing that the diagonal elements of A

are zero and that the VC matrix of ε is Σ⊗ I, it follows from Lemma A.1 in

Kelejian and Prucha (2010) that

=1

22

£(A +A

0)(A +A

0)

¤(B.9)

+1

α0

"X=1

X=1

T0T

#α

We note that min(Ψ) ≥ 0 by assumption. Since the matrices A,

the vectors of the linear forms, and — in light of Assumption 4 — the innovations

ε satisfy all of the remaining assumptions of the central limit theorem for

vectors of linear quadratic forms given in Kelejian and Prucha (2010) it now

follows that

ξ = −(Ψ)

−12−12

⎡⎢⎢⎣12ε0(A1 +A

01)ε +α01

P=1T

0ε

...12ε0(A +A

0)ε +α0

P=1T

0ε

⎤⎥⎥⎦→ (0 I) (B.10)


Next observe that Ψ = (1), and hence (Ψ

)12 = (1), since the row

and column sums of the absolute elements of A are uniformly bounded by as-

sumption, and since in light of the above assumptions the terms −1α0T0Tα

0

are uniformly bounded in absolute value. It now follows from (B.6), (B.7) and

(B.10) that

12(eρ − ρ) = [J0ΥJ]−1J0Υ(Ψ

)

12ξ + (1) (B.11)

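observing that $\Xi = J'\Upsilon J$. This establishes (19). Since all of the nonstochastic terms on the r.h.s. of (B.11) are $O(1)$, it follows that $n^{1/2}(\tilde{\rho} - \rho) = O_p(1)$. Next recall that $0 < \lambda^{*}_{\Xi} \le \lambda_{\min}(\Xi) \le \lambda_{\max}(\Xi) \le \lambda^{**}_{\Xi} < \infty$ and observe that $\lambda_{\min}(\Xi^{-1}) = 1/\lambda_{\max}(\Xi)$. Hence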

min©Ξ−1J

0ΥΨ

ΥJΞ

−1

ª≥ min

¡Ψ

¢min (Υ)min(Ξ

−1J

0ΥJΞ

−1)

≥ min¡Ψ

¢min (Υ)

∗∗Ξ ≥ 0.

This establishes the last claim of the theorem.
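The representation (B.11) implies a sandwich form $[J'\Upsilon J]^{-1}J'\Upsilon\Psi\Upsilon J[J'\Upsilon J]^{-1}$ for the asymptotic VC matrix. As a hedged numerical sketch, with randomly generated stand-ins for $J$, $\Upsilon$ and $\Psi$ (all names and dimensions are assumptions made for illustration), one can check that the choice $\Upsilon = \Psi^{-1}$ collapses the sandwich to $(J'\Psi^{-1}J)^{-1}$ and that any other positive definite weighting yields a VC matrix that is at least as large in the positive semidefinite sense.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_spd(k):
    """Random symmetric positive definite k x k matrix."""
    M = rng.normal(size=(k, k))
    return M @ M.T + np.eye(k)


def sandwich(J, Upsilon, Psi):
    """[J'UJ]^{-1} J'U Psi U J [J'UJ]^{-1} for a weighting matrix U."""
    bread = np.linalg.inv(J.T @ Upsilon @ J)
    return bread @ J.T @ Upsilon @ Psi @ Upsilon @ J @ bread


# Hypothetical sizes: 6 moment conditions, 2 parameters.
J = rng.normal(size=(6, 2))
Psi = random_spd(6)
Upsilon = random_spd(6)

V_generic = sandwich(J, Upsilon, Psi)
V_optimal = sandwich(J, np.linalg.inv(Psi), Psi)
V_closed = np.linalg.inv(J.T @ np.linalg.inv(Psi) @ J)

# Eigenvalues of the (symmetrized) difference should be nonnegative.
gap = V_generic - V_optimal
gap_eigs = np.linalg.eigvalsh((gap + gap.T) / 2.0)
```

This is the standard GMM efficiency argument and motivates the use of an estimate of $\Psi^{-1}$ as weighting matrix in the second step.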

Proof of Lemma 1: Observe that $\tilde{u} = u - Z\Delta$ with $\Delta = \tilde{\delta} - \delta$, and thus

eε =

£I −R∗(eρ)¤ eu

= ε −R∗(eρ − ρ)u − £I −R∗(eρ)¤Z∆

Consequently

e = −1eε0eε = −1ε0ε

+−1u0£R∗(eρ − ρ)¤0R∗(eρ − ρ)u

+∆0

−1Z0£I −R∗(eρ)¤0 £I −R∗(eρ)¤Z∆

−−1ε0R∗(eρ − ρ)u − −1ε0R∗(eρ − ρ)u

−−1ε0£I −R∗(eρ)¤Z∆ − −1ε0

£I −R∗(eρ)¤Z∆

+−1u0£R∗(eρ − ρ)¤0 £I −R∗(eρ)¤Z∆

+−1u0£R∗(eρ − ρ)¤0 £I −R∗(eρ)¤Z∆

By Assumption 4 we have =P

=1 ∗∗ and =P

=1 ∗v. Since the elements of v are i.i.d. (0,1), it follows that −1v0v = 1 + (1) and

−1v0v = (1) for 6= . Hence

−1ε0ε =X=1

X=1

∗∗−1v0v =X=1

∗∗ + (1) = + (1)

Next observe that, taking into account that ε =£I −R∗(ρ)

¤u,

all the other terms consist of expressions of the form (1)−1u0Au,


(1)−1u0AZ and (1)

−1Z0AZ, where A is a matrix whose

row and column sums of the absolute elements are uniformly bounded. In light

of Lemma A.3 all those expressions are seen to be (1), and thus e− =(1) as claimed.
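The leading term in the proof, the sample second moment $n^{-1}\varepsilon'\varepsilon$ of innovations built as linear combinations of i.i.d. (0,1) variables, can be illustrated by simulation. The sketch below is only an illustration under assumed values: the number of equations `G`, the mixing matrix `Sigma_star` and the Gaussian draws are hypothetical choices, and the remainder terms involving the estimation error are not simulated.

```python
import numpy as np

rng = np.random.default_rng(2)

n, G = 200_000, 3  # hypothetical sample size and number of equations

# Hypothetical nonsingular mixing matrix; column g holds the weights
# that combine the basic innovations v_1, ..., v_G into eps_g.
Sigma_star = np.array([[1.0, 0.5, 0.0],
                       [0.0, 1.0, 0.3],
                       [0.0, 0.0, 1.0]])
Sigma = Sigma_star.T @ Sigma_star  # implied VC matrix of (eps_1i, ..., eps_Gi)

V = rng.standard_normal(size=(n, G))  # columns are i.i.d. (0,1) vectors v_l
E = V @ Sigma_star                    # columns are the innovations eps_g

# n^{-1} eps_g' eps_h for all pairs (g, h) at once.
Sigma_hat = E.T @ E / n
max_err = float(np.abs(Sigma_hat - Sigma).max())
```

Since $n^{-1}v_l'v_l$ is close to one and $n^{-1}v_l'v_k$ is close to zero for $l \neq k$, `Sigma_hat` is close to `Sigma`, with a discrepancy of order $n^{-1/2}$.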

Proof of Theorem 3: We first demonstrate that eΨ − Ψ

= (1),

where the elements of eΨ and Ψ

are defined in (18) and (24). By Lemma

1 we have e − = (1). Furthermore, by assumption −1eT0 eT −−1T0T = (1), where

−1T0T = (1) in light of Assumption

10. Next, observe that under the maintained assumption the row and column

sums of the absolute elements of the matrices A and A are uniformly

bounded, and thus clearly are those of the matrices A = () with =

(+ )(+ ). It then follows directly from Lemma A.7 that eα−α = (1), where α = (1) and eα = (1). Hence, clearly e − = (1), = (1) and e = (1), as well as eΨ

− Ψ = (1), Ψ

= (1) and eΨ

= (1). Observing that

min(Ψ) ≥ ∗Ψ 0 it follows further that ( eΨ

)−1 − (Ψ

)−1 = (1),

(Ψ)

−1 = (1) and ( eΨ)

−1 = (1).

By Assumption 9 eΥ−Υ = (1), Υ = (1) and thus eΥ = (1).

In proving Theorem 2 we have verified that eJ − J → 0, J = (1) and eJ = (1), and furthermore that (eJ0eΥeJ)+ − (J0ΥJ)−1 = (1),

(eJ0eΥeJ)+ = (1) and (J0ΥJ)

−1 = (1). The claim that eΩ−

Ω = (1) and Ω

= (1) is now obvious. Observing that by Theorem 2

min£Ω(Υ)

¤ ≥ 0 it follows further that f(Ω

)−1− (Ω

)−1 =

(1), (Ω)

−1 = (1) and (eΩ)

−1 = (1).

We shall utilize the following lemma.

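Lemma B.1: Suppose the $nG \times 1$ vector of innovations $\varepsilon$ is generated as postulated in Assumption 4. Let $A = \operatorname{diag}_{g=1}^{G}(A_g)$ and $B = \operatorname{diag}_{g=1}^{G}(B_g)$ be symmetric $nG \times nG$ matrices with zero diagonal elements, and let $a = [a_1', \ldots, a_G']'$ and $b = [b_1', \ldots, b_G']'$ be $nG \times 1$ nonstochastic vectors. Then

$$E(\varepsilon'A\varepsilon + a'\varepsilon) = 0,$$

$$\operatorname{cov}(\varepsilon'A\varepsilon + a'\varepsilon,\ \varepsilon'B\varepsilon + b'\varepsilon) = 2\operatorname{tr}\left[A(\Sigma \otimes I)B(\Sigma \otimes I)\right] + a'(\Sigma \otimes I)b = 2\sum_{g=1}^{G}\sum_{h=1}^{G}\sigma_{gh}^{2}\operatorname{tr}\left[A_g B_h\right] + \sum_{g=1}^{G}\sum_{h=1}^{G}\sigma_{gh}\,a_g'b_h.$$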

Proof. The result follows immediately from Lemma A1 in Kelejian and Prucha

(2010), observing that the diagonal elements of A∗ = (Σ ⊗ I)A(Σ0 ⊗ I) and B∗ = (Σ ⊗ I)B(Σ0 ⊗ I) are zero.
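For intuition, the mean and covariance formulae for linear-quadratic forms can be checked exactly in a small special case. The sketch below assumes a single equation with unit innovation variance and independent innovations taking the values $\pm 1$ with equal probability, so that the covariance formula reduces to $2\operatorname{tr}(AB) + a'b$; these simplifications and all names in the code are assumptions made for illustration. Because the innovations are discrete, expectations are computed by enumerating all $2^n$ outcomes rather than by simulation.

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
n = 6


def sym_zero_diag(k):
    """Random symmetric k x k matrix with zero diagonal."""
    M = rng.normal(size=(k, k))
    S = (M + M.T) / 2.0
    np.fill_diagonal(S, 0.0)
    return S


A, B = sym_zero_diag(n), sym_zero_diag(n)
a, b = rng.normal(size=n), rng.normal(size=n)

# All 2^n equally likely sign vectors (mean 0, variance 1, independent entries).
draws = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))


def lq_form(M, m):
    """Values of eps' M eps + m' eps over all enumerated outcomes."""
    return np.einsum("si,ij,sj->s", draws, M, draws) + draws @ m


qa, qb = lq_form(A, a), lq_form(B, b)
mean_qa = float(qa.mean())
cov_exact = float(np.mean((qa - qa.mean()) * (qb - qb.mean())))
cov_formula = float(2.0 * np.trace(A @ B) + a @ b)
```

The fourth moment of these innovations equals one rather than three, so the agreement of `cov_exact` and `cov_formula` reflects the fact that, with zero diagonal elements, no third or fourth moment terms enter the covariance.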


Proof of Theorem 4: Observe that η = −12T0ε and thus clearly Ψ =

ηη0 = −1T0εε

0T = −1T0(Σ⊗ I)T as claimed.

Next observe that = −12 [ε0Bε + b0ε] withB = (0 B 0),

whereB = (A+A0)2 and b = Tα withT = [T

01 T

0]

0.It now follows from Lemma B.1 that

(η ) = (−12T0ε −12 [ε0Bε + b

0ε])=(−12η −12b0ε)

= −1T0(Σ⊗ I)Tα

and thus

(η ξ) = −1T0(Σ⊗ I)T [α1 α]

and

Ψ = (η ξ) = −1T0(Σ⊗ In)T

=1 [α1 α]

as claimed.

Define = −12 [ε0Aε + a0ε] withA = (0 A 0) where

A = (A +A0)2 and a = Tα. Then applying Lemma B.1 we see

that

( ) = 2(2)−1

£¡A +A

0

¢ ¡A +A

0

¢¤+α0

"X=1

X=1

T0T

#α

which verifies the expressions for the elements of Ψ = (ξ ξ).

In light of Assumption 10 and Theorem 2 we have

12∙ eδ − δeρ − ρ

¸=

∙I 0

0 [£J0ΥJ

¤−1J0Υ]

¸ ∙ηξ

¸+ (1)

The vector of linear-quadratic forms [η0 ξ0]0 is readily seen to satisfy the as-

sumptions of Theorem A.1 in Kelejian and Prucha (2010). Hence it follows from

that central limit theorem that

Ψ−12

∙ηξ

¸→ (0 I)

which proves the first part of the theorem.

Note that min(Ψ) ≥ min(Ψ

) ≥ min(Ψ) ≥ 0. In proving

Theorem 3 we have shown that eΨ − Ψ

= (1), Ψ = (1) andeΨ

= (1). The proof that eΨ − Ψ

= (1), Ψ = (1) andeΨ

= (1) is analogous. Thus eΨ − Ψ

= (1), Ψ = (1) andeΨ

= (1). Observing that min(Ψ ) ≥ 0 it follows further that

( eΨ )−1 − (Ψ

)−1 = (1), (Ψ

)−1 = (1) and (eΨ

)−1 = (1).


Next recall that in proving Theorem 3 we demonstrated that eα−α =

(1), α = (1) and eα = (1), and furthermore that e − =

(1). Since −1 eT0eT−−1T0T = (1) it follows that eΨ

−Ψ =

(1) and eΨ −Ψ

= (1). Also observe that in light of Assumption 10 we

have Ψ = (1) and Ψ

= (1). This and the above results imply that eΨ −Ψ = (1), Ψ = (1) and eΨ = (1). Recalling that min(Ψ) ≥ 0, it follows further that eΨ−1 −Ψ−1 = (1), Ψ

−1 = (1) and eΨ−1 =

(1).

By Assumption 9, eΥ−Υ = (1),Υ = (1) and thus eΥ = (1).

Also recall from the proof of Theorem 3 that eJ − J = (1), J = (1)

and eJ = (1), and furthermore that (eJ0eΥeJ)+ − (J0ΥJ)−1 =(1), (eJ0eΥeJ)+ = (1) and (J

0ΥJ)

−1 = (1). The claim thateΩ −Ω = (1) and Ω = (1) is now obvious. Observing that

min(Ω) ≥ min(Ψ)min©[(J

0ΥJ)

−1]ª ≥ 0

utilizing that min£J0ΥJ)

−1¤ ≥ 0 as demonstrated in the proof

of Theorem 2, it follows further that eΩ−1 − Ω−1 = (1), Ω−1 = (1) andeΩ−1 = (1).

C Appendix: Proofs for Section 6

Lemma C.1: Suppose the assumptions of Theorem 5 hold. Let $\tilde{\rho}$ be the initial GMM estimators defined in (30). Then $\tilde{\rho} - \rho = o_p(1)$.

Proof. To prove the claim we verify the assumptions of Theorem 1. Assump-

tions 1-8 are maintained. Assumption 9 holds trivially with eΥ = Υ = I.

Observing furthermore that by Lemma A.4 we have 12(eδ − δ) = (1)

completes the proof.

Proof of Theorem 5: The proof of Theorem 5 is based on the generic limit

theory developed in Theorem 4. In light of that theorem it proves convenient

to first derive the limiting distribution of bδ = (bδ01 bδ0)0 and bρ =

(bρ01 bρ0)0. The limiting distribution of bδ and bρ is then obtained as a trivial specialization. For clarity we divide the remainder of the proof into

several parts.

Part 1: (Verification of Assumption 10 for bδ) In light of Lemma A.5 we have 12[bδ − δ] = −12T0ε + (1) with

T =HP with P = Q−1HHQHZ∗(ρ)[Q

0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1


and thus

−1T0T = [Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1Q0HZ∗(ρ)Q

−1HH

×(−1H0H)Q

−1HHQHZ∗(ρ)[Q

0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1

The remaining conditions of Assumption 10 are also seen to hold in light of

Lemma A.5.

Part 2: (Specialized Expressions for Ψ and the Corresponding Estimator)

Next observe that the components of Ψ defined in (26) simplify in obvious

ways in that T = (T), and thus all terms involving a T with

6= are zero. In particular, the ( )-th blocks of Ψ and Ψ

are given by

Ψ =

−1T0T

Ψ =

−1T0T [α1 α]

and the elements of Ψ, i.e., the elements of the ( )-th block of Ψ

are

given by

= 2(2)

−1£¡A +A

0

¢ ¡A +A

0

¢¤+

−1α0T0Tα

Now consider the estimatorbT = H(−1H0

H)−1(−1H0

Z∗(eρ))×£(−1Z0∗(eρ)H)(

−1H0H)

−1(−1H0Z∗(eρ))¤−1

where eρ denotes the first-stage estimator for ρ, letbα = −−1£Z0(I −R∗0(eρ))(A +A

0)(I −R∗(eρ))bu¤

with bu = y−Zbδ, and let b = −1bε0bε with bε = y∗(eρ)−Z∗(eρ)bδ. Then the components of the estimator for Ψ defined in (27)

simplify to bΨ = b−1 bT0 bTbΨ = b−1 bT0 bT [bα1 bα]

and the elements of the estimator bΨ of Ψ

are given byb = b2(2)−1 £¡A +A

0

¢ ¡A +A

0

¢¤+b−1bα0bT0 bTbα

Note that

−1 bT0bT =h−1bZ0∗(eρ)bZ∗(eρ)i−1 −1bZ0∗(eρ)bZ∗(eρ)×h−1bZ0∗(eρ)bZ∗(eρ)i−1


and

−1 bT0bT =h−1bZ0∗(eρ)bZ∗(eρ)i−1 = h−1bZ0∗(eρ)Z∗(eρ)i−1

Also note that in light of Lemma A.5 we have −1bT0 bT−−1T0T =

(1).
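The algebra behind the simplification of $n^{-1}\hat{T}'\hat{T}$ can be verified numerically. In the sketch below, `H` stands for the instrument matrix and `Z` for the transformed regressor matrix evaluated at the first-stage estimate; both are filled with simulated numbers, and the sizes and names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 200, 5, 2  # hypothetical: observations, instruments, regressors

H = rng.normal(size=(n, p))
Z = H[:, :k] + 0.5 * rng.normal(size=(n, k))  # regressors correlated with H

Qhh = H.T @ H / n  # n^{-1} H'H
Qhz = H.T @ Z / n  # n^{-1} H'Z
M = Qhz.T @ np.linalg.solve(Qhh, Qhz)  # (n^{-1}Z'H)(n^{-1}H'H)^{-1}(n^{-1}H'Z)

# T_hat = H (n^{-1}H'H)^{-1} (n^{-1}H'Z) M^{-1}
T_hat = H @ np.linalg.solve(Qhh, Qhz) @ np.linalg.inv(M)

# Z_hat is the projection of Z on the column space of H.
Z_hat = H @ np.linalg.solve(H.T @ H, H.T @ Z)

lhs = T_hat.T @ T_hat / n
rhs = np.linalg.inv(Z_hat.T @ Z_hat / n)
rhs_alt = np.linalg.inv(Z_hat.T @ Z / n)
```

The identity holds because $n^{-1}\hat{T}'\hat{T} = M^{-1}MM^{-1} = M^{-1}$ with $M = n^{-1}\hat{Z}'\hat{Z}$, and because $\hat{Z}'\hat{Z} = \hat{Z}'Z$ for a projection onto the instruments.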

Part 3: (Verification of Assumption 9 for Υ = (Ψ)

−1 and eΥ =

( bΨ)

−1 with bΨ = (

b)). Observe that the assumption that min(Ψ) ≥ for some 0 implies that also min(Ψ

) ≥ . Recall furthermore that by

Lemma C.1 we have eρ− ρ = (1). It now follows directly from Theorem 3

that bΨ−Ψ

= (1), ( bΨ)

−1− (Ψ)

−1 = (1), and Ψ = (1),

(Ψ)

−1 = (1), which verifies Assumption 9.

Part 4: (Limiting Distribution of bδ and bρ) Recall that Assumptions 1-8 are maintained. Thus, in light of the above discussion, all assumptions of

Theorem 4 are satisfied. Next observe that since T = 0 for 6= the

expression for Ω given in (28) simplifies to:

Ω =

∙Ω Ω

Ω0 Ω

¸with

Ω = Ψ

Ω = Ψ

=1¡(Ψ

)−1J(J0(Ψ

)

−1J)−1¢

Ω = =1

¡(J0(Ψ

)

−1J)−1J0(Ψ)

−1¢⎡⎢⎣ Ψ

11 Ψ

1

.... . .

...

Ψ1 Ψ

⎤⎥⎦×=1

¡(Ψ

)−1J(J0(Ψ

)

−1J)−1¢

By Theorem 4, 12[(bδ−δ)0 (bρ−ρ)0]0 → (0Ω), and as a specialization,

12∙ bδ − δ)bρ − )

¸→

∙Ω Ω

Ω0 Ω

¸(C.1)

with

Ω = Ψ

= −1T0T

Ω = Ψ

(Ψ)

−1J(J0(Ψ)

−1J)−1

= −1T0T [α1 α] (Ψ

)

−1J(J0(Ψ)

−1J)−1

Ω = (J0(Ψ

)

−1J)−1


Observe from part 3 of the proof that

−1 bT0 bT =h−1bZ0∗(eρ)bZ∗(eρ)i−1

and that −1 bT0 bT − −1T0T = (1). The asymptotic normal-

ity result of the theorem now follows immediately from (C.1), observing that

Theorem 4 also established the consistency of the VC estimators.

Proof of Theorem 6: The proof is again based on the generic limit theory

developed in Theorem 4. For clarity we divide the proof, analogous to the proof

of Theorem 5, into several parts.

Part 1: (Verification of Assumption 10 for $\hat{\hat{\delta}}$) In light of Lemma A.6 we have

12[bbδ − δ] = −12T0ε + (1) with

T = (I ⊗H)P (C.2)

P =£Σ−1 ⊗Q−1HH

¤=1

£QHZ∗(ρ)

¤×©=1 £Q0

HZ∗(ρ)¤ £Σ−1 ⊗Q−1HH

¤=1

£QHZ∗(ρ)

¤ª−1

Now let T and P denote the ( )-th block of T and P, then T =

FP with F = H. The remaining conditions of Assumption 10 are

then seen to hold in light of Lemma A.6.

Part 2: (Specialized Expressions for Ψ and the Corresponding Estimator) Specialized expressions for each of the submatrices Ψ , Ψ

, Ψ

of Ψ defined in

(26) are readily found by substituting into those expressions the formulae for T

given in (C.2), and by observing thatP

=1

P=1 T

0T represents

the ( )-th block of Ψ = T0(Σ⊗ I)T.

Next letbbT = (I ⊗H)bbPbbP =

hbΣ−1 ⊗ (−1H0H)

−1i

£−1H0

Z∗(bρ)¤×h−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1

LetbbT and

bbP denote the ( )-th block ofbbT and

bbP, respectively.

Then, clearly,bbT =H

bbP. Next observe that

bbΨ

=h−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1 = −1 bbT0(bΣ ⊗ I)bbT,

and thus the ( )-th block ofbbΨ

is given bybbΨ

=P

=1

P=1 b bbT0 bbT.

From this we see that the estimatorsbbΨ

,bbΨ

,bbΨ

bbΩ

bbΩ

bbΩ

are special


cases of the class of VC estimators considered by Theorem 4. Also note that in

light of Lemma A.6 we have that

−1bbT0 bbT −T0T = (1)

which verifies that also this condition of Theorem 4 holds.

Part 3: (Verification of Assumption 9 for Υ = (Ψ)

−1 and eΥ =

(bbΨ

)−1 with bbΨ

= (bb)) Observe that the assumption that min(Ψ) ≥

for some 0 implies that also min(Ψ) ≥ . Recall furthermore that by

Lemma C.1 we have bbρ− ρ = (1). It now follows directly from Theorem 3

thatbbΨ

−Ψ = (1), (

bbΨ

)−1− (Ψ

)−1 = (1), and Ψ

= (1),

(Ψ)

−1 = (1), which verifies Assumption 9.

Part 4: (Limiting distribution of $\hat{\hat{\delta}}$ and $\hat{\hat{\rho}}$) Recall that Assumptions 1-8 are maintained. Thus, in light of the above discussion, all assumptions of

Theorem 4 are satisfied. It thus follows from that theorem that"(bbδ − δ)(bbρ − ρ)

#→

∙Ω Ω

Ω0 Ω

¸ (C.3)

where

Ω = Ψ

Ω = Ψ

=1(J), Ω = =1(J

0)Ψ

=1(J)

with

Ψ = Ψ

=1[α1 α]

J = (Ψ )−1J

£J0(Ψ

)

−1J¤−1

The asymptotic normality result of the theorem now follows immediately from

(C.3), observing that Theorem 4 also establishes the consistency of the VC

estimators.


References

[1] Anselin, L., Spatial Econometrics: Methods and Models. Boston: Kluwer

Academic Publishers, 1988.

[2] Anselin, L., Thirty Years of Spatial Econometrics, Papers in Regional Sci-

ence 89 (2010), 3-25.

[3] Anselin, L. and Florax, R., New Directions in Spatial Econometrics. Lon-

don: Springer, 1995.

[4] Arraiz, I., Drukker, D.M., Kelejian, H.H. and Prucha, I.R., A Spatial Cliff-Ord-type Model with Heteroskedastic Innovations: Small and Large Sample Results, Journal of Regional Science 50 (2009), 592-614.

[5] Audretsch, D.B. and Feldman, M.P., R&D Spillovers and the Geography of Innovation and Production. American Economic Review 86 (1996), 630-640.

[6] Ballester, C., Calvó-Armengol, A. and Y. Zenou, Who’s Who in Networks.

Wanted: The Key Player. Econometrica 74 (2006), 1403-1417.

[7] Baltagi, B.H. and Y. Deng, EC3SLS Estimator for a Simultaneous System

of Spatial Autoregressive Equations with Random Effects. Econometric Re-

views 34 (2015), 659-694.

[8] Baltagi, B.H., Egger, P. and Pfaffermayr, M., Estimating Models of Com-

plex FDI: Are There Third-Country Effects? Journal of Econometrics 140

(2007), 260-281.

[9] Baltagi, B.H., Egger, P. and Pfaffermayr, M., Estimating Regional Trade

Agreement Effects on FDI in an Interdependent World. Journal of Econo-

metrics 145 (2008), 194-208.

[10] Behrens, K., Koch, W., and Ertur, C., Dual Gravity: Using Spatial Econo-

metrics to Control For Multilateral Resistance. Journal of Applied Econo-

metrics 27 (2012), 773-794.

[11] Bell, K.P. and Bockstael, N.E., Applying the Generalized-Moments Esti-

mation Approach to Spatial Problems Involving Microlevel Data. Review

of Economics and Statistics 82 (2000), 72-82.

[12] Blonigen, B.A., Davies, R.B., Waddell, G.R. and Naughton, H.T., FDI in

Space: Spatial Autoregressive Relationships in Foreign Direct Investment.

European Economic Review 51 (2007), 1303-1325.

[13] Blume, L.E., Brock, W.A., Durlauf, S.N. and Ioannides, Y.M., Identifica-

tion of Social Interactions, In: J. Benhabib, A. Bisin, and M.O. Jackson

(Eds.), Handbook of Social Economics, Vol. 1B, Amsterdam: Elsevier Pub-

lisher, 2011, pp. 853-964.


[14] Calvó-Armengol, A., Patacchini, E. and Zenou, Y., Peer Effects and Social

Networks in Education, Review of Economic Studies 76, 2009, 1239-1267.

[15] Case, A., Hines Jr., J. and Rosen, H., Budget Spillovers and Fiscal Policy

Independence: Evidence from the States. Journal of Public Economics 52

(1993), 285-307.

[16] Das, D., Kelejian, H.H., and Prucha, I.R., Finite Sample Properties of Esti-

mators of Spatial Autoregressive Models with Autoregressive Disturbances,

Papers in Regional Science, 82 (2003), 1-26.

[17] Ertur, C. and W. Koch, Growth, Technological Interdependence and Spa-

tial Externalities: Theory and Evidence. Journal of Applied Econometrics

22 (2007), 1033-1062.

[18] Cliff, A. and Ord, J., Spatial Autocorrelation, London: Pion, 1973.

[19] Cliff, A. and Ord, J., Spatial Processes, Models and Applications. London:

Pion, 1981.

[20] Cohen, J.P, and Morrison Paul, C.J., Higher Order Spatial Autocorrelation

in a System of Equations: The Impacts of Transportation Infrastructure

on Capital Asset Values, Journal of Regional Science 47 (2007), 457-478.

[21] Cohen-Cole, E., Liu, X. and Y. Zenou, Multivariate Choice and Identifi-

cation of Social Interactions. Journal of Applied Econometrics 33 (2018),

165-178.

[22] Conley, T.G. and E. Ligon, Economic Distance and Cross-Country

Spillovers, Journal of Economic Growth 7 (2002), 157-187.

[23] Devereux, M.P., Lockwood, B., and Redoano, M., Do Countries Compete

Over Corporate Taxes? Journal of Public Economics 92 (2008), 1210-1235.

[24] Egger, P., Pfaffermayr, M., and Winner, H. Commodity Taxation in a

’Linear’ World: A Spatial Panel Data Approach. Regional Science and

Urban Economics 35 (2005), 527-541.

[25] Holtz-Eakin, D., Public Sector Capital and the Productivity Puzzle. Review

of Economics and Statistics 76 (1994), 12-21.

[26] Horn, R.A. and Johnson, C.R. Matrix Analysis. Cambridge: Cambridge

University Press, 1985.

[27] Kapoor, M., Kelejian, H.H., and Prucha, I.R., Panel Data Models with Spa-

tially Correlated Error Components. Journal of Econometrics 140 (2007),

97-130.

[28] Kelejian, H.H. and Prucha, I.R., A Generalized Spatial Two-Stage Least

Squares Procedure for Estimating a Spatial Autoregressive Model with Au-

toregressive Disturbances. Journal of Real Estate Finance and Economics

17 (1998), 99-121.


[29] Kelejian, H.H. and Prucha, I.R., A Generalized Moments Estimator for

the Autoregressive Parameter in a Spatial Model. International Economic

Review 40 (1999), 509-533.

[30] Kelejian, H.H. and Prucha, I.R., Estimation of Simultaneous Systems of

Spatially Interrelated Cross Sectional Equations. Journal of Econometrics,

118 (2004), 27-50.

[31] Kelejian, H.H. and Prucha, I.R., HAC Estimation in a Spatial Framework.

Journal of Econometrics 140 (2007), 131-154.

[32] Kelejian, H.H. and Prucha, I.R., Specification and Estimation of Spatial

Autoregressive Models with Autoregressive and Heteroskedastic Distur-

bances. Journal of Econometrics 157 (2010), 53-67.

[33] Lee, L.-F., Best Spatial Two-Stage Least Squares Estimators for a Spa-

tial Autoregressive Model with Autoregressive Disturbances. Econometric

Reviews 22 (2003), 307-335.

[34] Lee, L.-F. , Identification and Estimation of Econometric Models with

Group Interactions, Contextual Factors and Fixed Effects, Journal of

Econometrics 140 (2007), 333-374.

[35] Lee, L.-F. and Liu, X., Efficient GMM Estimation of High Order Spatial Autoregressive Models with Autoregressive Disturbances. Econometric Theory 26 (2010), 187-230.

[36] Liu, X., Identification and Efficient Estimation of Simultaneous Equations

Network Models, Journal of Business & Economic Statistics 32 (2014),

516-536.

[37] Livanis, G., Moss, C.B., Breneman, V.C., and Nehring, R.F., Urban Sprawl

and Farmland Prices. American Journal of Agricultural Economics 88

(2006), 915-929.

[38] Pinkse, J. and Slade, M.E., Contracting in Space: An Application of Spatial

Statistics to Discrete-Choice Models. Journal of Econometrics 85 (1998),

125-154.

[39] Pinkse, J., Slade, M.E. and Brett, C., Spatial Price Competition: A Semi-

parametric Approach. Econometrica 70 (2002), 1111-1153.

[40] Pötscher, B.M. and Prucha, I.R., Dynamic Nonlinear Econometric Models,

Asymptotic Theory. New York: Springer Verlag, 1997.

[41] Wang, W., Lee, L.-F. and Y. Bao, GMM Estimation of the Spatial Autore-

gressive Model in a System of Interrelated Networks. Regional Science and

Urban Economics, 69 (2018), 167-198.


[42] Wang, L., Li, K. and Z. Wang, Quasi Maximum Likelihood Estimation for Simultaneous Spatial Autoregressive Models. MPRA Working Paper 59901, 2014.

[43] Whittle, P., On Stationary Processes in the Plane. Biometrika 41 (1954), 434-449.

[44] Yang, K. and Lee, L.-F., Identification and QML Estimation of Multivari-

ate and Simultaneous Equations Spatial Autoregressive Models. Journal of

Econometrics 196 (2017), 196-214.


Supplementary Appendix for Simultaneous Equations Models with Higher-Order Spatial or Social Network Interactions

D Supplement to Appendix A

Proof of Lemma A.1: Let = supmax

=1

P=1 ||, = supmax

=1¯

¯and = supmax

=1¯

¯. Clearly

¯

¯≤¯

¯+

X=1

||¯¯≤¯

¯+

X=1

¯¯

with = || ³P

=1 ||´ifP

=1 || 0 and = 0 ifP

=1 || =0. Since 0 ≤ ≤ 1 and

P

=1 = 1 it follows from Hölder’s inequality that P

=1 ¯¯≤hP

=1 ¯¯i1

and consequently

|| ≤ 2¯

¯+ 2

⎡⎣X=1

¯¯⎤⎦

≤ 2¯

¯+ 2

X=1

[¯¯] ≤ 2 + 2

∞

which proves the claim since , and do not depend on and .

Proof of Lemma A.2: By assumption X is non-stochastic with sup sup || ∞, and so (A.1) holds trivially if corresponds to an element of X. Next

observe that by (4) and (6) we have

$$y = a + A\nu,$$

$$a = (I - B^{*})^{-1}C^{*}x, \qquad A = (I - B^{*})^{-1}(I - R^{*})^{-1}(\Sigma_{*}' \otimes I).$$

In light of Assumptions 1-3 the absolute elements of a are uniformly bounded,

and furthermore the row and column sums of the absolute elements of A are

uniformly bounded; compare, e.g., Remark A.1 in Kelejian and Prucha (2004).

By Assumption 4 the elements of ν are i.i.d. with finite fourth moments. Thus

it follows immediately from Lemma A.1 that sup sup ||4 ∞. Next observe that the columns of Y are of the form y = Wy. Since by Assumption 1 the row and column sums of the absolute elements of W are uniformly bounded it follows further from Lemma A.1 that sup sup¯

¯4

∞, which completes the proof.


Proof of Lemma A.3: In light of the proof of Lemma A.2, and observing that

$u = (I - R^{*})^{-1}(\Sigma_{*}' \otimes I)\nu$, it is readily seen that under the maintained assumptions $u$ and all columns of $Z$ are of the generic form

Cν, c or c +Cν (D.1)

where c is an × 1 nonstochastic vector with uniformly bounded elements and C is an × nonstochastic matrix whose row and column sums are

uniformly bounded in absolute value. By Assumption 4 the elements of the

× 1 vector ν are i.i.d. (0 1) with finite fourth moments. Given this,

it is readily seen that −1u0Au and the elements of −1Z0Au and

−1ZAZ are of the generic form , −1d0ν or

−1ν0Dν, or sums

thereof, where ||, the absolute elements of d and the row and column sums of the absolute elements of D are uniformly bounded by some finite constant, say . In the following let D = () = (D + D0)2. Observe

that £−1d0ν

¤= 0 and

£−1ν0Dν

¤= −1(D) = (1). Further-

more observe that (−1d0ν) ≤ −12 = (1), and that in light of, e.g.,

Lemma A.1 in Kelejian and Prucha (2004) and Remark A.2 in Kapoor et al.

(2007), we have (−1ν 0Dν) ≤ −2(D2

) + −2P

¯4 − 3

¯≤

−1∗ = (1) for some finite constant ∗. Thus clearly −1u0Au =

(1), −1ZAu = (1) and −1ZAZ = 1). The third claim in

the lemma follows from Chebyshev’s inequality.
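The variance expression for a quadratic form in i.i.d. (0,1) innovations with fourth moment $\mu_4$, namely $\operatorname{var}(\nu'D\nu) = 2\operatorname{tr}(\bar{D}^{2}) + (\mu_4 - 3)\sum_i d_{ii}^{2}$ with $\bar{D} = (D + D')/2$, can be confirmed exactly in a small example. The sketch below is an illustration under assumed inputs: the three-point distribution, the matrix `D` and all names are hypothetical, and expectations are obtained by enumerating all outcomes.

```python
import itertools

import numpy as np

rng = np.random.default_rng(5)
n = 5
D = rng.normal(size=(n, n))  # arbitrary matrix: not symmetric, nonzero diagonal
D_bar = (D + D.T) / 2.0

# Three-point innovation with mean 0 and variance 1:
# values -2, 0, 2 with probabilities 1/8, 3/4, 1/8.
values = np.array([-2.0, 0.0, 2.0])
probs = np.array([0.125, 0.75, 0.125])
mu4 = float(probs @ values**4)  # fourth moment of the innovation

mean_q = 0.0
second_q = 0.0
for idx in itertools.product(range(3), repeat=n):
    nu = values[list(idx)]                 # one outcome of the innovation vector
    w = float(np.prod(probs[list(idx)]))   # its probability under independence
    q = float(nu @ D @ nu)
    mean_q += w * q
    second_q += w * q * q

var_exact = second_q - mean_q**2
var_formula = float(
    2.0 * np.trace(D_bar @ D_bar) + (mu4 - 3.0) * np.sum(np.diag(D) ** 2)
)
```

After scaling by $n^{-2}$, both terms of the variance are of order $n^{-1}$ when the row and column sums of $D$ are uniformly bounded, which is what the appeal to Chebyshev's inequality requires.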

Proof of Lemma A.4: Clearly,

$$n^{1/2}(\tilde{\delta} - \delta) = \tilde{P}'\,n^{-1/2}F'\varepsilon$$

where $\tilde{P}$ and $F$ are defined in the lemma. Given Assumption 6 clearly $\tilde{P} = P + o_p(1)$ with $P$ being finite, which establishes (c). Since by Assumption 2 the row and column sums of $(I - R^{*\prime})^{-1}$ are uniformly bounded in absolute value, and since by Assumption 5 the elements of $H$ are uniformly bounded in absolute value, it follows that the elements of $F$ are

uniformly bounded in absolute value. By Assumption 4, (ε) = 0 and

(εε0) = I. Therefore,

−12F0ε = 0 and the elements of

(−12F0ε) = −1F0F are also uniformly bounded in absolute

value. Thus, by Chebyshev’s inequality $n^{-1/2}F'\varepsilon = O_p(1)$, and consequently $n^{1/2}(\tilde{\delta} - \delta) = P'n^{-1/2}F'\varepsilon + o_p(1)$ and $P'n^{-1/2}F'\varepsilon = O_p(1)$. This establishes (a) and (b), recalling that $T = FP$. Next

observe that

min(−1T0T)

≥ min

h¡I −R∗0

¢−1 ¡I −R∗

¢−1imin

£−1H0

H

¤min

©[Q0

HZQ−1HHQHZ]

−1Q0HZQ

−1HHQ

−1HHQHZ[Q

0HZQ

−1HHQHZ]

−1ª≥


for some 0, since in light of Assumptions 1 and 2 the largest eigenvalue of¡I −R∗

¢ ¡I −R∗0

¢is bounded from above, and thus the smallest eigen-

value of¡I −R∗0

¢−1 ¡I −R∗

¢−1is bounded away from zero, and since

min£−1H0

H

¤ ≥ [min(QHH)] 2 0 for sufficiently large in light of As-

sumption 6. This establishes (d).

Proof of Lemma A.5: Note from (1) and (7) that

y∗(bρ) = Z∗(bρ)δ + ε − (bR∗ −R)u

and hence

12[bδ − δ]=

h−1bZ0∗(bρ)Z∗(bρ)i−1 −12bZ0∗(bρ) hε − (bR∗ −R)u

i= eP∗0

⎡⎣−12F∗0ε − X∈I

(b − )−12F

∗0ε

⎤⎦ with bR∗ = R∗(bρ), and where eP∗ is defined in the lemma, F∗ =H,

and F∗ =

¡I −R∗0

¢−1M0

H. In light of Assumption 6, and since bρis consistent, it follows that

−1bZ0∗(bρ)Z∗(bρ)−Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ) = (1)

Since by Assumption 6 we have Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ) = (1) and

[Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1 = (1) it follows that

[−1bZ0∗(bρ)Z∗(bρ)]−1 − [Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ)]

−1 = (1);

compare, e.g., Pötscher and Prucha (1997), Lemma F1. In light of this it fol-

lows further that eP∗ −P∗ = (1) and P∗ = (1), with P∗ defined

in the lemma. By argumentation analogous to that in the proof of Lemma

A.4 it is readily seen that −12F∗0ε = (1) and −12F∗0ε =

(1). Consequently 12[bδ − δ] = P∗0−12F∗0ε + (1) and

P∗0−12F∗0ε = (1), observing again that bρ − ρ = (1). This

establishes (a)-(c), recalling that T∗ = F∗P

∗. Next observe that

min(−1T∗0T

∗)

≥ min

hQ0−12HH −1H0

HQ−12HH

imin

©[Q0

HZ∗(ρ)Q−1HHQHZ∗(ρ)]

−1ª≥ min(Q

−1HH)min

©[Q0


−1ªmin £−1H0H

¤≥ min(Q

−1HH)min

©[Q0


−1ª £min(Q−1HH)¤ 2 ≥ ∗


for some ∗ 0 in light of Assumption 6, and observing that min£−1H0

H

¤ ≥[min(QHH)] 2 0 for sufficiently large. This establishes (d).

Proof of Lemma A.6: Note from (1) and (8) that

y∗(bρ) = Z∗(bρ)δ + ε − (bR∗ −R)u

and hence

12[bδ − δ]=

h−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1 −12bZ0∗(bρ)(bΣ−1 ⊗ I)×hε − (bR∗ −R)u

i=

h−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)i−1 £−1Z0∗(bρ)H

¤×hbΣ−1 ⊗ (−1H0

H)−1i(I ⊗ −12H0

)hε − (bR∗ −R)u

i

= eP∗∗0 −12F∗∗0 ε + eP∗∗0

⎡⎢⎢⎣P

∈I1(b1 − 1)−12F

∗∗011ε1

...P∈I(b − )

−12F∗∗0ε

⎤⎥⎥⎦ with bR∗ = =1

£R∗(bρ)¤, and where eP∗∗ and F∗∗ are defined in the

lemma, and F∗∗ =

¡I −R∗0

¢−1M0

H. Observe that the ( )-th block

of

−1bZ0∗(bρ)(bΣ−1 ⊗I)Z∗(bρ)− £Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤is

b−1bZ0∗(bρ)Z∗(bρ)− Q0HZ∗(ρ)Q

−1HHQHZ∗(ρ) = (1)

in light of Assumption 6, and since bρ and bΣ are consistent. By Assumption

6 we have £Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤= (1) and

min©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª≥ min

©Σ−1 ⊗ Iªmin © £Q0

HZ∗(ρ)Q−1HHQHZ∗(ρ)

¤ª ≥ ∗

for some ∗ 0. This in turn implies that©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−1= (1)

Consequentlyn−1bZ0∗(bρ)(bΣ−1 ⊗ I)Z∗(bρ)o−1 −©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−1= (1);


compare, e.g., Pötscher and Prucha (1997), Lemma F1. In light of this it is now

readily seen that eP∗∗ − P∗∗ = (1) and P∗∗ = (1), with P∗∗ defined in the

lemma. By argumentation analogous to that in the proof of Lemma A.4 it is

readily seen that −12F∗∗0 ε = (1) and −12F∗0ε = (1). Conse-

quently 12[bbδ−δ] = P∗∗0 −12F∗∗0 +(1) with P

∗∗0 −12F∗∗0 = (1),

observing again that bρ−ρ = (1). This establishes (a)-(c), recalling that

T∗∗ = F∗∗ P∗∗ . Next observe that

min(−1T∗∗0 T∗∗ )

≥ min(Σ−1)min

hQ0−12HH −1H0

HQ−12HH

i×min

h©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−1i≥ min(Σ

−1)min(Q−1HH)min£−1H0

H

¤×min

h©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−1i≥ min(Σ

−1)min(Q−1HH)[min(Q−1HH)2]

×minh©

£Q0HZ∗(ρ)

¤ £Σ−1 ⊗Q−1HH

¤

£QHZ∗(ρ)

¤ª−1i ≥ ∗

for some ∗ 0 in light of Assumption 6, and observing that min£−1H0

H

¤ ≥[min(QHH)] 2 0 for sufficiently large. This establishes (d).

Proof of Lemma A.7: Without loss of generality, assume that $\sigma^2 = 1$, since the model in Assumption A.1 can always be normalized accordingly.

We first prove part (a) of the lemma. Let $\vartheta_n = n^{-1} u_n' A_n^* u_n$ and $\tilde\vartheta_n = n^{-1} \tilde u_n' A_n^* \tilde u_n$. Then, in light of Assumption A.1, we have $\vartheta_n = n^{-1} e_n' B_n^* e_n$ with $B_n^* = (1/2) R_n^{-1\prime} (A_n^* + A_n^{*\prime}) R_n^{-1}$. Furthermore, by Assumption A.1, the row and column sums of the matrices $R_n^{-1}$ are uniformly bounded in absolute value. Since this property is preserved under matrix addition and multiplication (see, e.g., Remark A.1 in Kelejian and Prucha (2004)), it follows that the row and column sums of the matrices $B_n^*$ and $B_n^* B_n^*$ are also uniformly bounded in absolute value. In the following let $K < \infty$ be a common bound for the row and column sums of the absolute elements of $B_n^*$ and $B_n^* B_n^*$, and of their respective elements. Then, using the triangle inequality and the Cauchy-Schwarz inequality, we have

$$
E|\vartheta_n| \le n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} |b^*_{ij,n}| \, E|e_{i,n}| |e_{j,n}| \le n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} |b^*_{ij,n}| \le K .
$$

Furthermore, utilizing the expression for the variance of linear quadratic forms given in Lemma A.1 in Kelejian and Prucha (2007), we have in light of Assumption A.1

$$
\operatorname{var}(\vartheta_n) = n^{-2}\, 2 \operatorname{tr}(B_n^* B_n^*) + n^{-2} \sum_{i=1}^{n} b^{*2}_{ii,n} \left[ E e_{i,n}^4 - 3 \right] \le n^{-1} 2K + n^{-1} K^2 \sup_{i} \left[ E e_{i,n}^4 - 3 \right] .
$$


Given that the fourth moments of the $e_{i,n}$ are uniformly bounded in light of Assumption A.1, this establishes the first two claims of part (a) of the lemma.
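As a hedged numerical illustration of the variance expression invoked above, the following sketch checks the special case of a symmetric matrix and independent innovations with mean zero and unit variance, where the variance of $e'Be$ reduces to $2\operatorname{tr}(B^2) + \sum_i b_{ii}^2 [E e_i^4 - 3]$. The matrix `B` and the choice of Rademacher innovations (values $\pm 1$ with equal probability, so that $E e_i^4 = 1$) are assumptions made only for this example; they are not part of the paper's setup. The exact variance is obtained by enumerating all sign patterns.

```python
from itertools import product


def quad_form(b, e):
    """Evaluate the quadratic form e' B e."""
    n = len(e)
    return sum(b[i][j] * e[i] * e[j] for i in range(n) for j in range(n))


def variance_formula(b, fourth_moment):
    """2 tr(B B) + sum_i b_ii^2 (E e^4 - 3), valid for symmetric B and
    independent innovations with mean zero and unit variance."""
    n = len(b)
    trace_bb = sum(b[i][j] * b[j][i] for i in range(n) for j in range(n))
    diag_sq = sum(b[i][i] ** 2 for i in range(n))
    return 2.0 * trace_bb + diag_sq * (fourth_moment - 3.0)


def exact_variance_rademacher(b):
    """Exact variance of e' B e when each e_i is +1 or -1 with probability 1/2."""
    n = len(b)
    values = [quad_form(b, e) for e in product((-1.0, 1.0), repeat=n)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical symmetric matrix, chosen only for illustration.
B = [[1.0, 0.5, 0.0],
     [0.5, 2.0, -1.0],
     [0.0, -1.0, 0.5]]

formula_value = variance_formula(B, fourth_moment=1.0)
exact_value = exact_variance_rademacher(B)
print(formula_value, exact_value)
```

Dividing either value by $n^2$ gives the variance of the scaled quadratic form considered in the proof.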

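We next prove the last claim of part (a) of the lemma. The above discussion implies that $\vartheta_n - E\vartheta_n = o_p(1)$. Hence it suffices to show that $\tilde\vartheta_n - \vartheta_n = o_p(1)$. By Assumptions A.1 and A.2

$$
\tilde\vartheta_n - \vartheta_n = \phi_n + \psi_n
$$

with

$$
\phi_n = n^{-1} \Delta_n' D_n' (A_n^* + A_n^{*\prime}) u_n = n^{-1} \Delta_n' D_n' C_n^* e_n, \qquad \psi_n = n^{-1} \Delta_n' D_n' A_n^* D_n \Delta_n,
$$

and $C_n^* = (c^*_{ij,n}) = (A_n^* + A_n^{*\prime}) R_n^{-1}$. The row and column sums of the matrices $C_n^*$ are again seen to be uniformly bounded in absolute value. Let $K < \infty$ denote a uniform bound for the row and column sums of the absolute elements of the matrices $A_n^*$ and $C_n^*$, and let $c^*_{i.,n}$ and $d_{i.,n}$ denote the $i$-th row of $C_n^*$ and $D_n$, respectively.

To prove the claim we now show that both $\phi_n$ and $\psi_n$ are $o_p(1)$. Using the triangle and Hölder inequality we get

$$
\begin{aligned}
|\phi_n| &= \left| n^{-1} \sum_{i=1}^{n} \Delta_n' d_{i.,n}' c^*_{i.,n} e_n \right| \\
&\le n^{-1} \|\Delta_n\| \sum_{i=1}^{n} \|d_{i.,n}\| \sum_{j=1}^{n} |c^*_{ij,n}| |e_{j,n}| \le n^{-1} \|\Delta_n\| \sum_{j=1}^{n} |e_{j,n}| \sum_{i=1}^{n} \|d_{i.,n}\| |c^*_{ij,n}| \\
&\le n^{-1} \|\Delta_n\| \sum_{j=1}^{n} |e_{j,n}| \left( \sum_{i=1}^{n} \|d_{i.,n}\|^p \right)^{1/p} \left( \sum_{i=1}^{n} |c^*_{ij,n}|^q \right)^{1/q} \\
&\le K n^{1/p - 1/2} \left( n^{1/2} \|\Delta_n\| \right) \left( n^{-1} \sum_{j=1}^{n} |e_{j,n}| \right) \left( n^{-1} \sum_{i=1}^{n} \|d_{i.,n}\|^p \right)^{1/p}
\end{aligned}
\tag{D.2}
$$

for $p = 2 + \delta$ and $1/p + 1/q = 1$, and where $\delta > 0$ is as in Assumption A.2. The last inequality utilizes the observation of Remark C.1 in Kelejian and Prucha (2007). Since the $e_{j,n}$ are independent with bounded second moments, it follows that $n^{-1} \sum_{j=1}^{n} |e_{j,n}| = O_p(1)$. The terms $n^{1/2} \|\Delta_n\|$ and $n^{-1} \sum_{i=1}^{n} \|d_{i.,n}\|^p$ are $O_p(1)$ by Assumption A.2. Since $n^{1/p - 1/2} \to 0$ as $n \to \infty$ it follows that $\phi_n = o_p(1)$.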


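Again, using the triangle and Hölder inequality yields

$$
\begin{aligned}
|\psi_n| &= \left| n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_n' d_{i.,n}' a^*_{ij,n} d_{j.,n} \Delta_n \right| \\
&\le n^{-1} \|\Delta_n\|^2 \sum_{i=1}^{n} \|d_{i.,n}\| \sum_{j=1}^{n} \|d_{j.,n}\| |a^*_{ij,n}| \\
&\le n^{-1} \|\Delta_n\|^2 \sum_{i=1}^{n} \|d_{i.,n}\| \left( \sum_{j=1}^{n} \|d_{j.,n}\|^p \right)^{1/p} \left( \sum_{j=1}^{n} |a^*_{ij,n}|^q \right)^{1/q} \\
&\le K n^{1/p} \|\Delta_n\|^2 \left( n^{-1} \sum_{i=1}^{n} \|d_{i.,n}\| \right) \left( n^{-1} \sum_{j=1}^{n} \|d_{j.,n}\|^p \right)^{1/p} \\
&\le K n^{1/p - 1/2} n^{-1/2} \left( n^{1/2} \|\Delta_n\| \right)^2 \left( n^{-1} \sum_{i=1}^{n} \|d_{i.,n}\|^p \right)^{2/p}
\end{aligned}
\tag{D.3}
$$

with $p$ and $q$ as before. By Assumption A.2 both $n^{-1} \sum_{i=1}^{n} \|d_{i.,n}\|^p$ and $n^{1/2} \|\Delta_n\|$ are $O_p(1)$. Since $n^{1/p - 1/2} \to 0$ as $n \to \infty$ it follows that $\psi_n = o_p(1)$. From the last inequality we see also that $n^{1/2} \psi_n = o_p(1)$.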
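The decomposition used in this part of the proof rests on a purely algebraic identity: if the predictor satisfies $\tilde u = u + D\Delta$, then $\tilde u' A \tilde u - u' A u = \Delta' D' (A + A') u + \Delta' D' A D \Delta$. The following minimal sketch verifies the identity numerically. All numbers (`A`, `D`, `delta`, `u`) are hypothetical values chosen for illustration, and the $n^{-1}$ scaling is omitted because it multiplies both sides equally.

```python
def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical inputs (not taken from the paper).
A = [[0.0, 1.0, 2.0],
     [0.5, 0.0, -1.0],
     [1.5, 0.3, 0.0]]
D = [[1.0, 0.2],
     [0.0, -1.0],
     [2.0, 0.5]]
delta = [0.1, -0.3]
u = [1.0, -2.0, 0.5]

d_delta = matvec(D, delta)                      # D * Delta
u_tilde = [a + b for a, b in zip(u, d_delta)]   # predictor u + D * Delta
A_sum = [[A[i][j] + A[j][i] for j in range(3)] for i in range(3)]  # A + A'

lhs = dot(u_tilde, matvec(A, u_tilde)) - dot(u, matvec(A, u))
linear_term = dot(d_delta, matvec(A_sum, u))     # Delta' D' (A + A') u
quadratic_term = dot(d_delta, matvec(A, d_delta))  # Delta' D' A D Delta
print(lhs, linear_term + quadratic_term)
```

The linear term corresponds to the first remainder in the proof and the quadratic term to the second; the proof then bounds each separately.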

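We next prove part (b) of the lemma. In the following let $\varsigma_{s,n}$ denote the $s$-th element of $n^{-1} D_n' A_n^* u_n$. Observe that $E u_n u_n' = R_n^{-1} R_n^{-1\prime}$. Then given Assumptions A.1 and A.2 there exists a constant $K < \infty$ such that $E u_{j,n}^2 \le K$ and $E |d_{is,n}|^p \le K$. W.l.o.g. assume that the row and column sums of the matrices $A_n^*$ are uniformly bounded by $K$. Utilizing the Cauchy-Schwarz and Lyapunov inequalities we then have

$$
E |u_{j,n}| |d_{is,n}| \le \left[ E u_{j,n}^2 \right]^{1/2} \left[ E d_{is,n}^2 \right]^{1/2} \le \left[ E u_{j,n}^2 \right]^{1/2} \left( E |d_{is,n}|^p \right)^{1/p} \le K^{1/2 + 1/p}
$$

with $p$ as before and, hence,

$$
E |\varsigma_{s,n}| \le n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} |a^*_{ij,n}| \, E \left[ |u_{j,n}| |d_{is,n}| \right] \le K^{1/2 + 1/p} n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} |a^*_{ij,n}| \le K^{3/2 + 1/p} < \infty,
$$

which shows that indeed $E \left| n^{-1} d_{.s,n}' A_n^* u_n \right| = O(1)$, where $d_{.s,n}$ denotes the $s$-th column of $D_n$. Of course, the argument also shows that $\alpha_n^* = n^{-1} E D_n' (A_n^* + A_n^{*\prime}) u_n = O(1)$. Next observe that

$$
n^{-1} D_n' A_n^* \tilde u_n = n^{-1} D_n' A_n^* u_n + \chi_n
$$

where $\chi_n = n^{-1} D_n' A_n^* D_n \Delta_n$. By argumentation analogous to that employed to demonstrate that $n^{1/2} \psi_n = o_p(1)$ it follows that also $\chi_n = o_p(1)$, which completes the proof of part (b).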

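We next prove part (c). In light of the proof of part (a) we have

$$
n^{-1/2} \tilde u_n' A_n^* \tilde u_n = n^{-1/2} u_n' A_n^* u_n + \left[ n^{-1} u_n' (A_n^* + A_n^{*\prime}) D_n \right] n^{1/2} \Delta_n + n^{1/2} \psi_n
$$

with $n^{1/2} \psi_n = o_p(1)$. In light of part (b) and Assumption A.3 we have $n^{-1} u_n' (A_n^* + A_n^{*\prime}) D_n - \alpha_n^{*\prime} = o_p(1)$. The claim follows since $n^{1/2} \Delta_n = O_p(1)$ by Assumption A.2.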

Remark A.1: For future reference it proves helpful to note that in light of

Remark A.1 in Kelejian and Prucha (2004) the constant used in proving the

last claim of part (a) of the above lemma can be chosen as = 2 where and denote a bound for the row and column sums of the absolute elements

of R∗−1 and A∗. Furthermore it proves helpful to observe that in light of (D.2)and (D.3) ¯e −

¯≤ 2

where = (1) does not depend on A∗.

Proof of Lemma A.8: Given Assumption A.1 and the maintained assumptions

on A it follows that the row and column sums of A∗ = R

0AR are bounded

uniformly in absolute value. Thus by Lemma A.7(c), and utilizing Assumption

A.4, we have

−12eu0R0AReu= −12u0R

0ARu +α0

12∆ + (1)

= −12ε0Aε + −12α0

"X=1

T0ε + (1)

#+ (1)

= −12ε0Aε + −12X=1

a0ε + (1)

The last inequality holds since α = (1); see the remark in Lemma A.7(c).

Given this and the maintained assumption on P it follows that c =

(1 ∆)0 = Pα = (1). Since a = Fc we have

|| =¯¯X=1

¯¯

≤

∆X=1

||

using inequality (1.4.4.) in Bierens (1994). Thus

sup

−1X=1

|| ≤

X=1

sup

−1X=1

|| ∞

in light of Assumption A.4(a). This proves part (a). Part (b) follows readily

from, e.g., Lemma A.1 in Kelejian and Prucha (2010).


E Some Monte Carlo Simulation

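This section reports on initial results of a Monte Carlo study of the small sample properties of the estimators developed in the paper. We used a social-network design, and considered the following two-equation system as a special case of (2):

$$
\begin{aligned}
y_1 &= b_{21} y_2 + [\lambda_{11,1} M_1 + \lambda_{11,2} M_2] y_1 + \sum_{k=1,2,4,6} c_{k1} x_k + u_1, \\
y_2 &= b_{12} y_1 + [\lambda_{22,1} M_1 + \lambda_{22,2} M_2] y_2 + \sum_{k=1,3,5,7} c_{k2} x_k + u_2, \\
u_g &= [\rho_{g1} M_1 + \rho_{g2} M_2] u_g + \varepsilon_g, \qquad g = 1, 2.
\end{aligned}
\tag{E.1}
$$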

The stylized social-network design employs a group structure, which can be viewed as emulating groups of friends in a classroom setting. More specifically, suppose there are $h$ schools, and each school has three classrooms of size $m_1$, $m_2$, and $m_3$. Now consider the matrix

$$
S = (s_{ij}) = I_h \otimes
\begin{bmatrix}
S_{m_1} & 0 & 0 \\
0 & S_{m_2} & 0 \\
0 & 0 & S_{m_3}
\end{bmatrix},
$$

where the $S_{m_j}$, $j = 1, 2, 3$, are $m_j \times m_j$ matrices with zeros on the diagonal and with ones off the diagonal. Then the elements of $S$ can be viewed as indicator variables that are equal to one if two students belong to the same classroom, and zero otherwise. The matrix $S$ is of dimension $n \times n$, where the sample size is $n = hm$ with $m = m_1 + m_2 + m_3$. We used the following values in our Monte Carlo simulations:

$$
m_1 = 10, \quad m_2 = 15, \quad m_3 = 25, \quad h = 20,
$$

which implies a sample size of $n = 1000$.
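The block structure of this same-classroom indicator matrix can be sketched as follows. This is a minimal illustration rather than the authors' simulation code; the function names and the list-of-lists representation are assumptions made for the example. It uses 20 schools and classroom sizes 10, 15, and 25, as in the design above.

```python
def classroom_block(m):
    """m x m matrix with zeros on the diagonal and ones off the diagonal."""
    return [[0 if i == j else 1 for j in range(m)] for i in range(m)]


def same_classroom_matrix(n_schools, class_sizes):
    """Block-diagonal indicator matrix: entry (i, j) equals 1 when students
    i and j (i != j) sit in the same classroom, and 0 otherwise."""
    n = n_schools * sum(class_sizes)
    s = [[0] * n for _ in range(n)]
    offset = 0
    for _ in range(n_schools):
        for m in class_sizes:
            block = classroom_block(m)
            for i in range(m):
                # Copy row i of the block into the matching diagonal position.
                s[offset + i][offset:offset + m] = block[i]
            offset += m
    return s


S = same_classroom_matrix(n_schools=20, class_sizes=(10, 15, 25))
print(len(S))      # sample size implied by the design
print(sum(S[0]))   # number of classmates of the first student
```

Each row sum equals the classroom size minus one, since a student is not counted as his or her own classmate.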

To generate the weight matricesM1 andM2, let and be, respectively,

an i.i.d. binary random variable taking values 0 and 1 with equal probability, and

an i.i.d. discrete random variable taking values 1 2 10 with equal probabil-

ity, and let be i.i.d. (0 1). Furthermore, (), () and () are generated

independently. Now define

=£( − ) + ( − ) + ( − )

¤where 2 = 14, 2 = (100 − 1)12, 2 = 1 denote the variances of ,

, and , respectively. For an exemplary interpretation, could be an

indicator for the gender of an individual, could represent the family income

decile of an individual, the are unobserved characteristics, and could then

be interpreted as a measure of similarity between two individuals. With this

interpretation we define

=

(1 if | | ∗0 otherwise


and

∗1 = [ ], ∗2 = (1− )[ ]

as indicators for whether or not two individuals are close friends.20

We used the following values in the Monte Carlo simulations:

= 4, = 4, = 1, = 1, ∗ = 4

The weight matrices $M_1 = [m_{1,ij}]$ and $M_2 = [m_{2,ij}]$ are then obtained by applying the following normalization ($r = 1, 2$): $m_{r,ij} = m^*_{r,ij} / \sum_{l=1}^{n} m^*_{r,il}$ if $\sum_{l=1}^{n} m^*_{r,il} > 0$, and $m_{r,ij} = m^*_{r,ij}$ if $\sum_{l=1}^{n} m^*_{r,il} = 0$. Note that the design allows for situations where an individual has no close friends or only close friends. That is, we allow for a row of $M_1$ or $M_2$ to only contain zeros. If a row contains nonzero elements, then that row is normalized so that the row sum is one.
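The normalization rule just described, which row-normalizes nonzero rows and leaves all-zero rows untouched, can be sketched in a few lines. The function name and the small example matrix are hypothetical and serve only to illustrate the rule.

```python
def row_normalize(m_star):
    """Divide each row by its row sum; rows that sum to zero stay as they are."""
    normalized = []
    for row in m_star:
        total = sum(row)
        if total > 0:
            normalized.append([value / total for value in row])
        else:
            normalized.append(list(row))
    return normalized


# Hypothetical 0/1 friendship indicators: the second individual has no close friends.
M_star = [[0, 1, 1],
          [0, 0, 0],
          [1, 0, 0]]
M = row_normalize(M_star)
print(M)
```

Treating zero rows separately avoids a division by zero and keeps isolated individuals in the sample with no network exposure.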

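The parameter values in our Monte Carlo simulations were:

$$
\begin{aligned}
& b_{21} = .1, \quad b_{12} = .1, \\
& \lambda_{11,1} = .2, \quad \lambda_{11,2} = .1, \quad \lambda_{22,1} = .3, \quad \lambda_{22,2} = .15, \\
& \rho_{11} = .25, \quad \rho_{12} = .15, \quad \rho_{21} = .2, \quad \rho_{22} = .17,
\end{aligned}
$$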

11 = 9 21 = 9 12 = 9 32 = 9

41 = 9 61 = 9 52 = 9 72 = 9

Let $x_{(i)} = [x_{i1}, \ldots, x_{i7}]$ be the $i$-th row of the regressor matrix $[x_1, \ldots, x_7]$. The $x_{(i)}$ are generated i.i.d. in $i$ and they are generated once for all of the MC repetitions. More specifically, let $\phi_i = [\phi_{i1}, \cdots, \phi_{i7}]$ be independently distributed $\chi^2$ random variables with $\nu = 8$ degrees of freedom and define $\phi^*_i = [\phi^*_{i1}, \cdots, \phi^*_{i7}]$ to be the corresponding standardized versions. In other words,

$$
\phi^*_{ik} = (\phi_{ik} - \nu) / \sqrt{2\nu} .
$$

Then the $x_{(i)}$ are generated as $x_{(i)}' = \Sigma_x^{1/2} \phi^{*\prime}_i$, where the elements of $\Sigma_x$ are given as $\sigma_{kk} = \sigma_x^2$, $\sigma_{kl} = \rho_x \sigma_x^2$ for $k \neq l$, with $\sigma_x^2 = 4$, and $\rho_x = .19$. That is, each regressor was generated with a variance of 4, and all regressors are correlated with a correlation coefficient of 0.19.
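A minimal sketch of this regressor design follows, under stated assumptions: the symmetric square root of the equicorrelated covariance matrix is used (the text does not say which square root was taken), the chi-square draws come from Python's `random.gammavariate`, and the function names and seed are hypothetical. For an equicorrelated matrix the symmetric root has a closed form, so no matrix library is needed.

```python
import math
import random


def equicorrelated_sqrt(k, var, rho):
    """Symmetric square root of var * [(1 - rho) I + rho 11']."""
    low = math.sqrt(var * (1.0 - rho))              # root of the (k - 1)-fold eigenvalue
    high = math.sqrt(var * (1.0 + (k - 1) * rho))   # root of the eigenvalue for the constant vector
    off = (high - low) / k
    return [[low + off if i == j else off for j in range(k)] for i in range(k)]


def draw_regressors(n, k=7, df=8, var=4.0, rho=0.19, seed=0):
    """Draw n rows of k correlated regressors from standardized chi-square variables."""
    rng = random.Random(seed)
    root = equicorrelated_sqrt(k, var, rho)
    rows = []
    for _ in range(n):
        # chi-square(df) equals Gamma(shape=df/2, scale=2); it has mean df and variance 2*df.
        z = [(rng.gammavariate(df / 2.0, 2.0) - df) / math.sqrt(2.0 * df) for _ in range(k)]
        rows.append([sum(root[a][b] * z[b] for b in range(k)) for a in range(k)])
    return rows


X = draw_regressors(n=1000)
print(len(X), len(X[0]))
```

Because the standardized draws have mean zero and unit variance, each generated regressor has variance 4 and any two regressors have covariance 0.19 × 4 = 0.76 in population.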

Let ²() = (1 2) be the -th row of [²1 ²2], then the ²() were generated as

i.i.d. from a normal distribution with mean zero and variance covariance matrixµ225 675

675 225

¶

Table 1 contains results based on 1,000 Monte Carlo repetitions. For each

parameter, we report the mean of the estimates (mean), the standard devi-

ation (SD) of the estimates, and the root mean squared error (RMSE). We

report results for the maximum-likelihood estimator (ML), the two-step limited-

information estimator (GS2SLS), and for the two-step full-information estimator

(GS3SLS).

20That || = 0 and 0 is always less than ∗ does not cause a problem, because [ ] = 0

for all . Similarly, there is no problem when = 1, because [ ] = 0.


For all the parameters, the RMSE is essentially the same as the SD, indicating that there is little bias in the estimates. As expected, the ML estimates have the smallest RMSE, but the RMSEs of the GS2SLS and GS3SLS estimates are not much larger. On the whole, the results indicate that the GS2SLS and GS3SLS estimators derived in the text perform well in the finite sample explored in the simulation.
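The link between the reported statistics can be made explicit: when the SD is computed with divisor equal to the number of repetitions, the squared RMSE equals the squared SD plus the squared bias. The sketch below is an illustration under that assumption (the text does not state which divisor was used), and the helper names are hypothetical. The final check plugs in the rounded mean and SD reported in Table 1 for the GS2SLS estimate of the parameter with true value 0.200.

```python
import math


def summarize(estimates, true_value):
    """Return (mean, SD, RMSE) of Monte Carlo estimates, using divisor R for the SD."""
    r = len(estimates)
    mean = sum(estimates) / r
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / r)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    return mean, sd, rmse


def rmse_from_bias_sd(mean, sd, true_value):
    """RMSE implied by the decomposition RMSE^2 = SD^2 + bias^2."""
    return math.sqrt(sd ** 2 + (mean - true_value) ** 2)


# Reported mean and SD (rounded to three digits) for one GS2SLS row of Table 1.
implied_rmse = rmse_from_bias_sd(mean=0.211, sd=0.023, true_value=0.200)
print(round(implied_rmse, 3))
```

A bias of about 0.011 next to an SD of 0.023 raises the RMSE only slightly, which is why the RMSE and SD columns are so close.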


Table 1: Simulation results

Parameter           Estimator   True    Mean    SD      RMSE
$b_{12}$            ML          0.100   0.100   0.015   0.015
                    GS2SLS      0.100   0.102   0.014   0.014
                    GS3SLS      0.100   0.100   0.014   0.014
$\lambda_{11,1}$    ML          0.200   0.199   0.022   0.022
                    GS2SLS      0.200   0.211   0.023   0.025
                    GS3SLS      0.200   0.213   0.022   0.026
$\lambda_{11,2}$    ML          0.100   0.097   0.034   0.034
                    GS2SLS      0.100   0.111   0.034   0.036
                    GS3SLS      0.100   0.113   0.033   0.035
$\rho_{11}$         ML          0.250   0.243   0.046   0.046
                    GS2SLS      0.250   0.235   0.047   0.049
                    GS3SLS      0.250   0.234   0.047   0.050
$\rho_{12}$         ML          0.150   0.146   0.060   0.060
                    GS2SLS      0.150   0.133   0.061   0.063
                    GS3SLS      0.150   0.132   0.061   0.064
$b_{21}$            ML          0.100   0.101   0.014   0.014
                    GS2SLS      0.100   0.104   0.014   0.014
                    GS3SLS      0.100   0.102   0.014   0.014
$\lambda_{22,1}$    ML          0.300   0.299   0.018   0.018
                    GS2SLS      0.300   0.308   0.019   0.020
                    GS3SLS      0.300   0.309   0.018   0.020
$\lambda_{22,2}$    ML          0.150   0.147   0.026   0.026
                    GS2SLS      0.150   0.155   0.026   0.026
                    GS3SLS      0.150   0.156   0.025   0.026
$\rho_{21}$         ML          0.200   0.194   0.046   0.047
                    GS2SLS      0.200   0.187   0.048   0.049
                    GS3SLS      0.200   0.186   0.047   0.049
$\rho_{22}$         ML          0.170   0.170   0.061   0.061
                    GS2SLS      0.170   0.159   0.063   0.064
                    GS3SLS      0.170   0.158   0.062   0.063
