Partially Linear Functional-Coefficient Dynamic Panel Data Models: Sieve Estimation and Specification Testing∗

Yonghui Zhang† Qiankun Zhou‡

Job Market Paper

This version: November 10, 2016

Abstract

In this paper, we study nonparametric estimation and testing for partially linear functional-coefficient dynamic panel data models, in which the effects of some covariates on the dependent variable vary nonparametrically with a set of low-dimensional variables. Based on the sieve approximation of the unknown functions, we propose a sieve 2SLS procedure to estimate the model. The asymptotic properties of both the parametric and nonparametric components are established when the sample sizes N and T tend to infinity jointly or when only N goes to infinity. We also propose a specification test for the constancy of slopes, and we show that, after being appropriately standardized, our test is asymptotically normally distributed under the null hypothesis. Monte Carlo simulations show that our sieve 2SLS estimators and test perform remarkably well in finite samples. We apply our method to study the effect of income on democracy and find strong evidence of a nonconstant effect of income on democracy.

Key words: Dynamic panel models, Sieve approximation, Functional-coefficient, 2SLS estimation, Specification testing

JEL Classification: C12, C23, C26, C33, C38.

∗Address correspondence to: Qiankun Zhou, Department of Economics, State University of New York at Binghamton, Binghamton, NY 13902, USA. Email: [email protected]. Zhang gratefully acknowledges the financial support from the National Science Foundation of China under Grant 71401166. All errors are the authors' sole responsibility.

†School of Economics, Renmin University of China, Beijing, China.

‡Department of Economics, State University of New York at Binghamton, Binghamton, NY 13902, USA.


1 Introduction

Since the seminal work of Balestra and Nerlove (1966), a rich literature on dynamic panel data models has developed among both theoretical and empirical economists. Following the influential work of Anderson and Hsiao (1981, 1982), the estimation of dynamic panel data models by two-stage least squares (2SLS) or the generalized method of moments (GMM) has received considerable attention; see Arellano and Bond (1991) and Alvarez and Arellano (2003), among others.

However, a linear parametric form is generally assumed in the aforementioned studies of dynamic panel models, and it is well known that parametric dynamic panel data models might not be flexible enough to capture nonlinear structure in practice; such a failure may result in model misspecification. To deal with this issue, various nonparametric or semiparametric dynamic panel data models have been proposed. For example, in earlier work, Li and Ullah (1998) and Baltagi and Li (2002) consider semiparametric estimation of partially linear dynamic panel data models using instrumental variable methods. More recently, varying-coefficient models, which allow coefficients to depend on some informative variables, have received considerable attention and have found wide application in the economics literature. As a first example, in the traditional labor economics literature on the return to schooling, researchers usually apply a linear IV regression model. However, Card (2001) finds that the returns to education tend to be underestimated by the 2SLS method when one ignores the nonlinearity and the interaction between schooling and working experience, and Schultz (2003) argues that the marginal returns to education may vary with different levels of working experience and schooling. This motivates Cai et al. (2010) and Su et al. (2014) to consider the partially linear functional-coefficient model; both models allow the impact of education on the log-wage to vary with working experience. Another application of the varying-coefficient model is the heterogeneous effect of FDI on economic growth. Building on the finding of Kottaridi and Stengos (2010), Cai et al. (2010) find that the effect of FDI on economic growth varies across initial income levels, and a varying-coefficient model is therefore adopted. For other applications of varying-coefficient models in economics and finance, see Baglan (2010), Cai (2010), Cai et al. (2000, 2010) and Cai and Hong (2009), among others.

In this paper, we consider a new class of partially linear varying-coefficient additive dynamic models, which allows for linearity in some regressors and nonlinearity in others. In other words, some coefficients are constant and others vary with some variables. This new class of models is flexible enough to include many existing models as special cases. By extending the model in Cai and Li (2008) to a partially varying-coefficient model with fixed effects, we reduce the model dimension without reducing the flexibility of the model, and $\sqrt{NT}$-consistent estimation of the parametric coefficients can be achieved. We also extend the work of Cai et al. (2015) to sieve estimation instead of kernel estimation. We choose sieve estimation over kernel estimation because series methods are more convenient than kernel methods under certain types of restrictions (such as additivity or shape-preserving estimation; see Dechevsky and Penez (1997)). Sieve estimation is also computationally convenient because the results can be summarized by a relatively small number of coefficients.

Based on the sieve approximation of the unknown varying-coefficient functions, we follow the standard approach of taking first differences to eliminate the fixed effects and use lagged variables as instruments. This results in a sieve two-stage least squares (2SLS) estimator for partially linear functional-coefficient dynamic panel models. The asymptotic properties of both the parametric and nonparametric components are established when the sample sizes N and T tend to infinity jointly or when only N goes to infinity. We also discuss the plausibility of extending the proposed sieve 2SLS estimation procedure to unbalanced dynamic panels.

We also propose a nonparametric test for the linearity of the nonparametric component, i.e., for the null hypothesis that the slopes of the nonparametric part are constant. This specification test for the constancy of slopes is based on a weighted empirical $L_2$-norm distance between the estimates under the null and under the alternative. We show that, after being appropriately standardized, our test is asymptotically normally distributed under the null hypothesis.

Compared with the existing literature on the estimation of varying-coefficient additive dynamic models, our paper has the following merits. First, it is common in the existing literature to use the within-group transformation to eliminate the fixed effects; however, for dynamic panels such a transformation in general leads to biased estimation, and bias correction is needed (e.g., Cai and Li (2008), Tran (2014), Rodriguez-Poo and Soberon (2015) and references therein). In contrast, we use the first-difference transformation to remove the fixed effects, use lagged variables as instruments, and propose 2SLS estimation of the constant and varying coefficients. We show that the resulting 2SLS estimators are free of asymptotic bias. Second, instead of assuming that the time series dimension is short (e.g., An et al. (2016) and Cai et al. (2015)), we establish the asymptotics of the 2SLS estimators when both N and T are large, so that our asymptotic results cover the foregoing results as special cases. We also discuss the applicability of 2SLS estimation when the panel is unbalanced, and the simulations show that the proposed sieve 2SLS estimation works remarkably well even in that case.

The small-sample properties of the sieve 2SLS estimation and specification testing for partially linear varying-coefficient additive dynamic models are investigated through Monte Carlo simulations using six different data generating processes (DGPs). The first four DGPs are designed to check the performance of the sieve 2SLS estimation for balanced panels when T is large or fixed, and the fifth DGP verifies the applicability of the sieve 2SLS estimation to unbalanced panels. The simulation results show that the proposed sieve 2SLS estimator works remarkably well for both the parametric and nonparametric parts of the model: the constant coefficients are consistently estimated, and the shapes of the estimated functional coefficients are close to the true pre-specified functions. Similar findings hold when the panel is unbalanced. The last DGP investigates the performance of the specification test. The empirical size of the test is very close to the nominal level, and the empirical power increases steadily as either N or T increases.

Finally, we apply the proposed method to study the relationship between income and democracy, as in Acemoglu et al. (2008) and Cervellati et al. (2014). Using sieve 2SLS estimation, we find substantial nonlinearity in the relationship between a country's degree of democracy and its lagged value, as well as a nonlinear relationship between income and democracy.

The rest of the paper is organized as follows. We introduce the model and the sieve 2SLS estimation in Section 2. Asymptotics for the sieve 2SLS estimation are established in Section 3. We propose the specification test in Section 4, and in Section 5 we conduct a small set of Monte Carlo simulations to evaluate the finite-sample performance of the sieve 2SLS estimation and specification testing. We apply our method to study the relationship between income per capita and democracy in Section 6. Conclusions are given in Section 7. All technical details are relegated to the Appendix.

Notations: For a real matrix $A$, let $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ denote its Frobenius norm and $\|A\|_2 = [\lambda_{\max}(A'A)]^{1/2}$ its spectral norm, where $\lambda_{\max}(\cdot)$ is the largest eigenvalue of "$\cdot$". Define $P_A \equiv A(A'A)^{-1}A'$ and $M_A = I_{\mathrm{col}(A)} - P_A$, where $\mathrm{col}(A)$ denotes the column number of $A$. Throughout, $C$ denotes a generic positive constant that does not depend on $N$ or $T$ and may vary case by case. The symbols $\to_p$ and $\to_d$ denote convergence in probability and in distribution, respectively.

2 Model and Sieve Estimation

In this section, we first introduce the model and then propose a nonparametric estimation

procedure based on sieve approximation.

2.1 The models

We consider the following partially linear functional-coefficient dynamic panel data model with fixed effects:

$$y_{it} = d_{it}'\theta(z_{it}) + x_{it}'\beta + \eta_i + u_{it}, \quad i = 1,\ldots,N,\ t = 1,\ldots,T, \tag{2.1}$$

where $d_{it}$ and $x_{it}$ are $p_d \times 1$ and $p_x \times 1$ vectors of covariates, respectively, $z_{it}$ is a $p_z \times 1$ ($p_z \le 3$) vector of covariates which enter the unknown functions $\theta(\cdot) = (\theta_1(\cdot),\ldots,\theta_{p_d}(\cdot))'$ nonparametrically,¹ $\eta_i$ represents the unobserved heterogeneity of the $i$-th individual, and $u_{it}$ is the idiosyncratic error. For identification, we follow Chen and Liu (2001) and assume that there are no common variables between $d_{it}$ and $x_{it}$. As classical fixed effects, $\eta_i$ can be correlated with $(d_{it}, x_{it}, z_{it})$. In this paper, we are interested in the estimation of $\theta(\cdot)$ and $\beta$ when $N$ and $T$ go to infinity jointly, or when only $N$ goes to infinity while $T$ is fixed.

The specification in (2.1) is a natural extension of classical parametric models with good interpretability, and such models are becoming more and more popular in data analysis. Thanks to its flexibility and interpretability, the panel version of the varying-coefficient model has many potential applications. For instance, in the study of the return to schooling, the impact of education on the wage may vary with working experience (Cai et al. (2010) and Su et al. (2014)) while the impacts of other variables are constant. Moreover, model (2.1) generalizes Cai and Li (2008), Sun et al. (2009) and Cai et al. (2015) by including fixed effects. If instead $\eta_i$ is assumed to be random, then model (2.1) is similar to the one considered by Zhou et al. (2010). Furthermore, model (2.1) extends Feng et al. (2017) and Cai and Li (2008) by allowing for a partially linear component. Finally, model (2.1) extends the nonparametric dynamic models of Lee (2014) and Su and Zhang (2016) to partially linear functional-coefficient dynamic models, which can further alleviate the "curse of dimensionality".

¹Another, more flexible specification is $y_{it} = \sum_{k=1}^{p_d} d_{k,it}\theta_k(z_{k,it}) + x_{it}'\beta + \eta_i + u_{it}$, where different coefficient functions $\theta_k(\cdot)$ may have different conditioning variables.

Before we move to estimation, two important features of model (2.1) need to be emphasized. First, model (2.1) allows for general dynamic patterns: lagged dependent variables may enter $d_{it}$, $x_{it}$ or $z_{it}$. In particular, when $d_{it} = (y_{i,t-1},\ldots,y_{i,t-p^*})'$ and $z_{it} = y_{i,t-q}$, where $1 \le q \le p^*$ for some $p^* \ge 1$, we obtain the panel version of the functional-coefficient autoregressive model of Chen and Tsay (1993). Second, either $d_{it}$ or $x_{it}$ may include endogenous covariates.²
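As a purely illustrative sketch of the functional-coefficient autoregressive special case with $p^* = q = 1$, the following Python snippet simulates $y_{it} = \theta(y_{i,t-1})\,y_{i,t-1} + \eta_i + u_{it}$. The coefficient function, the Gaussian fixed effects and errors, the burn-in length and the name `simulate_far_panel` are all assumptions made for this illustration; they are not the designs used in the paper.

```python
import numpy as np

def simulate_far_panel(N, T, theta, burn=50, seed=0):
    """Simulate y_it = theta(y_{i,t-1}) * y_{i,t-1} + eta_i + u_it.

    Returns an (N, T+1) array holding y_{i0}, ..., y_{iT} after a burn-in,
    together with the fixed effects eta (length N).
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=N)                  # fixed effects (illustrative)
    y = np.zeros((N, T + burn + 1))
    for t in range(1, T + burn + 1):
        u = rng.normal(size=N)                # idiosyncratic errors
        y_lag = y[:, t - 1]
        y[:, t] = theta(y_lag) * y_lag + eta + u
    return y[:, burn:], eta

# Hypothetical coefficient function, bounded by 0.5 in absolute value
theta = lambda z: 0.5 * np.exp(-z ** 2)
y, eta = simulate_far_panel(N=100, T=10, theta=theta)
```

Because the coefficient is bounded well inside the unit interval here, the simulated series stays stable; other choices of `theta` may not.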

2.2 Sieve estimation

In principle, one can choose either the kernel method or the sieve method to estimate the unknown nonparametric component in model (2.1). In this paper, we focus on the sieve method, mainly for two reasons. First, the conditioning variables for different coefficient functions $\theta_l(\cdot)$ may differ, which is complicated to handle with kernel methods. Second, as argued by Su and Hoshino (2016), although the kernel method has the advantage of capturing the local properties of the coefficient functions and its asymptotic properties are well documented in the literature, kernel estimation of functional-coefficient models usually requires iterative methods, which are particularly computationally demanding. In contrast, sieve approximation handles different conditioning variables conveniently; in our case, the estimates have explicit expressions and the computation is fast. See Chen (2007) and Li and Racine (2007) for an overview of sieve methods.

Let $h^L(\cdot) = (h_1(\cdot),\ldots,h_L(\cdot))'$ be an $L \times 1$ sequence of basis functions, where the number of sieve basis functions $L \equiv L_{NT}$ increases as either $N$ or $T$ increases. Then for $k = 1,\ldots,p_d$, we have $\theta_k(\cdot) \approx h^L(\cdot)'\gamma_k$, where $\gamma_k = (\gamma_{k1},\ldots,\gamma_{kL})'$ is an $L \times 1$ vector of coefficients on the basis functions.³ For notational simplicity, we suppress the dependence of $h^L(\cdot)$ on $L$ and let $h(\cdot) = h^L(\cdot)$ and $h_{it} = h(z_{it})$. Define the $p_d L \times 1$ vector

$$H(d,z) = d \otimes h(z) = \left(d_1 h(z)',\ldots,d_{p_d} h(z)'\right)',$$

where $\otimes$ is the Kronecker product, and let $H_{it} \equiv H(d_{it}, z_{it})$. Then we can rewrite model (2.1) as

$$y_{it} = H_{it}'\Gamma + x_{it}'\beta + \eta_i + \varepsilon_{it}, \tag{2.2}$$

where $\Gamma \equiv (\gamma_1',\ldots,\gamma_{p_d}')'$ denotes the vector of sieve approximation coefficients, $\varepsilon_{it} = u_{it} + r_{it}$, and $r_{it} = \sum_{k=1}^{p_d} d_{k,it}\, r^L_{k,it}$ with $r^L_{k,it} \equiv \theta_k(z_{it}) - \gamma_k' h(z_{it})$ signifying the sieve approximation error. We now have a linear dynamic panel data model with two new features: the increasing dimension of $H_{it}$ and the composite structure of the new error $\varepsilon_{it}$. When establishing the asymptotic properties of our estimates, we have to take these features into consideration.

²However, we do not consider the case with endogenous $z_{it}$. Otherwise, we would face the problem of nonparametric instrumental variables (NPIV) regression, which raises different issues in identification and estimation. This is beyond the scope of this paper.

³For simplicity, we use the same sieve basis functions in the approximation of different coefficient functions. We also allow for different numbers of basis functions for different unknown functions.
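The construction of the sieve regressors $H_{it} = d_{it} \otimes h(z_{it})$ can be sketched as follows. This is a minimal illustration that assumes a scalar $z_{it}$ ($p_z = 1$) and a power-series basis $h(z) = (1, z, \ldots, z^{L-1})'$; the basis choice and the helper names `power_basis` and `build_H` are assumptions of this sketch, not prescriptions from the text.

```python
import numpy as np

def power_basis(z, L):
    """Power-series basis h(z) = (1, z, ..., z^(L-1))' for scalar z.

    z has shape (n,); the result has shape (n, L), one row per observation.
    """
    z = np.asarray(z, dtype=float)
    return np.vander(z, N=L, increasing=True)

def build_H(d, z, L):
    """Row-wise Kronecker product d_it (x) h(z_it).

    d has shape (n, p_d) and z has shape (n,). The result has shape
    (n, p_d * L), ordered as (d_1 h', d_2 h', ..., d_{p_d} h').
    """
    h = power_basis(z, L)
    return np.einsum("nk,nl->nkl", d, h).reshape(d.shape[0], -1)

# Two observations, p_d = 2 covariates with functional coefficients, L = 3
d = np.array([[1.0, 2.0], [3.0, 4.0]])
z = np.array([0.5, -1.0])
H = build_H(d, z, L=3)
```

Each row of `H` matches `np.kron(d[n], h[n])`, so the first $L$ columns carry $d_{1,it}h(z_{it})'$, the next $L$ columns carry $d_{2,it}h(z_{it})'$, and so on, in line with the ordering of $\Gamma$.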

Since the fixed effect $\eta_i$ enters model (2.2) linearly, one can apply a linear transformation to eliminate it. When there are neither lagged dependent variables nor endogenous variables, and all the variables are strictly exogenous, we can use the within-group transformation to remove $\eta_i$. For dynamic panel data models, however, the most commonly used transformation is the first difference (Anderson and Hsiao (1981, 1982) and Arellano and Bond (1991)). Let $\Delta A_{it} = A_{it} - A_{i,t-1}$ be the first difference (FD) of the sequence $\{A_{it}\}_{t=1}^{T}$ for $A = y, H, x, \varepsilon, u$ and $r$. Then the first-differenced version of model (2.2) is

$$\Delta y_{it} = \Delta H_{it}'\Gamma + \Delta x_{it}'\beta + \Delta\varepsilon_{it}, \quad t = 2,\ldots,T;\ i = 1,\ldots,N. \tag{2.3}$$
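The first-difference step can be sketched in a few lines. The example below assumes a balanced panel stored as an array with individuals on the first axis and time on the second (a layout chosen for this illustration), and checks numerically that an additive individual effect drops out after differencing.

```python
import numpy as np

def first_difference(A):
    """Return A_it - A_{i,t-1} for t = 2, ..., T.

    A has shape (N, T, ...) with time on axis 1; the result has
    shape (N, T-1, ...).
    """
    return A[:, 1:] - A[:, :-1]

rng = np.random.default_rng(0)
N, T = 5, 6
eta = rng.normal(size=(N, 1))        # individual effects, constant over t
signal = rng.normal(size=(N, T))     # everything in y_it other than eta_i
y = signal + eta

dy = first_difference(y)
# The fixed effect cancels: differencing y equals differencing the signal
effect_removed = np.allclose(dy, first_difference(signal))
```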

As $L \to \infty$ when $(N,T) \to \infty$, the sieve approximation errors become asymptotically negligible and $\Delta u_{it}$ dominates in $\Delta\varepsilon_{it}$. The FD model (2.3) suffers from an endogeneity problem, $E(\Delta x_{it}\Delta\varepsilon_{it}) = E(\Delta u_{it}\Delta x_{it}) + o(1) \neq 0$ or $E(\Delta H_{it}\Delta\varepsilon_{it}) = E(\Delta u_{it}\Delta H_{it}) + o(1) \neq 0$, which may be caused by lagged dependent variables in either $H_{it}$ or $x_{it}$, or by endogenous covariates in $d_{it}$ or $x_{it}$.⁴ To handle this problem, we suppose that there exists a $p_w \times 1$ vector $w_{it}$ such that $E(\Delta u_{it} w_{it}) = 0$, which can be used as instruments for $\Delta H_{it}$ and $\Delta x_{it}$. We can then apply IV or 2SLS estimation to obtain consistent estimators of $\Gamma$ and $\beta$ in model (2.3).

⁴An illustrative example: if $\Delta x_{it} = \Delta y_{i,t-1}$, then $E(\Delta x_{it}\Delta\varepsilon_{it}) \neq 0$.

Remark 2.1 The choice of IVs, or the construction of moment conditions, depends on the model specification and must be made case by case. When there is only one lagged dependent variable on the right-hand side of (2.3), we should include all the lagged levels $y_{i,t-2},\ldots,y_{i1}$ or lagged differences $\Delta y_{i,t-2},\ldots,\Delta y_{i2}$ as IVs when $T$ is small; when $T$ is large, we focus on consistency rather than efficiency of the estimator, because the latter is still an open question in the literature on nonparametric/semiparametric dynamic panel data models. For simplicity, we assume that there is only one lagged dependent variable $y_{i,t-1}$ in $d_{it}$, $x_{it}$ or $z_{it}$:

(i) Linear dynamic panel: $x_{it} = (y_{i,t-1}, x_{-1,it}')'$, where $x_{-1,it} = (x_{2,it},\ldots,x_{p_x,it})'$ are sequentially exogenous. Choose $w_{it} = (\Delta H_{it}', y_{i,t-2}, \Delta x_{-1,it}')'$ or $w_{it} = (\Delta H_{it}', \Delta y_{i,t-2}, \Delta x_{-1,it}')'$.⁵

(ii) Lagged dependent variable with functional coefficient: $d_{1,it} = y_{i,t-1}$. Let $w_{it} = (y_{i,t-2}h_{i,t-2}', \Delta H_{-1,it}', \Delta x_{it}')'$ or $w_{it} = (\Delta H_{1,i,t-1},\ldots,\Delta H_{L,i,t-1}, \Delta H_{-1,it}', \Delta x_{it}')'$, where $\Delta H_{-1,it} = (\Delta H_{L+1,it},\ldots,\Delta H_{p_dL,it})'$ with $\Delta H_{k,it}$ being the $k$th element of $\Delta H_{it}$.

(iii) Nonparametric dynamic coefficient functions: $z_{it} = y_{i,t-1}$. Set $w_{it} = (\Delta H_{i,t-2}', \Delta x_{it}')'$ or $w_{it} = (H_{i,t-2}', \Delta x_{it}')'$.

(iv) When endogenous covariates are included in $x_{it}$ or $d_{it}$, we can use additional variables as IVs or construct IVs from the lags of $x_{it}$, $d_{it}$, $z_{it}$ or $y_{it}$ according to the specific dependence structure of the model.
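To make the index alignment in case (i) concrete, the sketch below stacks $w_{it} = (\Delta H_{it}', y_{i,t-2}, \Delta x_{-1,it}')'$ for the usable periods $t = 3,\ldots,T$. The array layout (levels $y_{i1},\ldots,y_{iT}$ in an $(N, T)$ array and differences for $t = 2,\ldots,T$ in arrays with $T-1$ time slots) and the function name `instruments_case_i` are assumptions of this illustration.

```python
import numpy as np

def instruments_case_i(y, dH, dx_rest):
    """Stack w_it = (dH_it', y_{i,t-2}, dx_{-1,it}')' for t = 3, ..., T.

    y:       (N, T) levels y_i1, ..., y_iT
    dH:      (N, T-1, q) first differences of H_it for t = 2, ..., T
    dx_rest: (N, T-1, r) first differences of x_{-1,it} for t = 2, ..., T
    Returns an array of shape (N, T-2, q + 1 + r).
    """
    # y_{i,t-2} exists only for t >= 3, which corresponds to y_i1, ..., y_{i,T-2}
    y_lag2 = y[:, :-2, None]
    # Drop the t = 2 slot of the differenced arrays to align with t = 3, ..., T
    return np.concatenate([dH[:, 1:], y_lag2, dx_rest[:, 1:]], axis=2)

rng = np.random.default_rng(0)
N, T, q, r = 4, 6, 3, 2
y = rng.normal(size=(N, T))
dH = rng.normal(size=(N, T - 1, q))
dx_rest = rng.normal(size=(N, T - 1, r))
w = instruments_case_i(y, dH, dx_rest)
```

Using the lagged difference $\Delta y_{i,t-2}$ instead of the lagged level would cost one further period, since it requires $y_{i,t-3}$.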

Remark 2.2 How to choose optimal instruments for the nonparametric/semiparametric dynamic panel model when $T$ is large is still an open question, owing to the curse of dimensionality and the possible problems caused by many weak IVs (e.g., Newey and Windmeijer (2009), Okui (2009) and references therein).

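Let $W_i \equiv (w_{i2},\ldots,w_{iT})'$, $W \equiv (W_1', W_2',\ldots,W_N')'$, $\Delta y_i \equiv (\Delta y_{i2},\ldots,\Delta y_{iT})'$, and $\Delta Y \equiv (\Delta y_1', \Delta y_2',\ldots,\Delta y_N')'$. Similarly define $\Delta H_i$, $\Delta H$, $\Delta x_i$ and $\Delta X$. Then the sieve IV/2SLS estimates of $\Gamma$ and $\beta$ based on model (2.3) are given by⁶

$$(\hat\Gamma', \hat\beta')' = \left[\Delta\mathcal{X}' P_W \Delta\mathcal{X}\right]^{-1} \Delta\mathcal{X}' P_W \Delta Y,$$

where $\Delta\mathcal{X} = (\Delta H, \Delta X)$ and $P_W = W(W'W)^{-}W'$ is a projection matrix, with $A^{-}$ denoting the Moore-Penrose generalized inverse of a square matrix $A$ (e.g., Horn and Johnson (2012)). Let $Y_W \equiv P_W\Delta Y$, $H_W \equiv P_W\Delta H$, $X_W \equiv P_W\Delta X$ and $M_{X_W} = I_{N(T-1)} - X_W(X_W'X_W)^{-}X_W'$, and define $M_{H_W}$ analogously. By the formula for partitioned regression, we can write the estimators of $\Gamma$ and $\beta$ separately as

$$\hat\Gamma = \left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} Y_W, \tag{2.4}$$

$$\hat\beta = \left(X_W' M_{H_W} X_W\right)^{-1} X_W' M_{H_W} Y_W. \tag{2.5}$$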

⁵In practice, one usually adds several more lags, but not all lags, to the IVs to improve finite-sample performance in dynamic panel models. The use of all lags raises new issues, such as weak and many IVs. The choice of optimal IVs is still an open question. See Okui (2009) for an overview.

⁶More generally, we can consider the sieve GMM estimate defined by $(\hat\Gamma', \hat\beta')' = [\Delta\mathcal{X}' W A_{NT} W' \Delta\mathcal{X}]^{-} \Delta\mathcal{X}' W A_{NT} W' \Delta Y$, where $\Delta\mathcal{X} = (\Delta H, \Delta X)$ collects the differenced regressors and $A_{NT}$ is a $p_w \times p_w$ weighting matrix that is symmetric and asymptotically positive definite. The asymptotic properties are similar to those of the 2SLS estimator, but the notation is slightly more complicated, so we focus on the sieve 2SLS estimation here.

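Given the estimator $\hat\Gamma = (\hat\gamma_1',\ldots,\hat\gamma_{p_d}')'$, we can estimate the varying coefficient $\theta(u)$ by

$$\hat\theta(u) = H_S(u)\hat\Gamma, \quad\text{where}\quad H_S(u) = \begin{pmatrix} h(u)'S_1 \\ \vdots \\ h(u)'S_{p_d} \end{pmatrix}, \tag{2.6}$$

where $S_k = i_k' \otimes I_L$ and $i_k$ is the $p_d \times 1$ unit vector whose only nonzero element is a 1 in its $k$-th place. The $L \times p_dL$ matrix $S_k$ selects the estimator of $\gamma_k$; that is,

$$\hat\gamma_k = S_k\hat\Gamma \quad\text{and}\quad \hat\theta_k(u) = h(u)'\hat\gamma_k \quad\text{for } k = 1,\ldots,p_d.$$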
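The role of the selection matrices $S_k$ can be illustrated with a small numerical sketch. It assumes a scalar evaluation point $u$ and a power-series basis, both of which are choices made for this illustration; the function name `theta_hat` is hypothetical.

```python
import numpy as np

def theta_hat(u, Gamma_hat, p_d, L):
    """Evaluate theta_hat_k(u) = h(u)' S_k Gamma_hat for k = 1, ..., p_d.

    Assumes scalar u and the power basis h(u) = (1, u, ..., u^(L-1))'.
    """
    h = np.vander(np.atleast_1d(u), N=L, increasing=True)[0]
    out = np.empty(p_d)
    for k in range(p_d):
        i_k = np.zeros(p_d)
        i_k[k] = 1.0
        S_k = np.kron(i_k[None, :], np.eye(L))   # L x (p_d L), S_k = i_k' (x) I_L
        out[k] = h @ (S_k @ Gamma_hat)           # picks the k-th block of Gamma_hat
    return out

# p_d = 2 coefficient functions, L = 3 basis terms each
Gamma_hat = np.arange(6.0)          # blocks (0, 1, 2) and (3, 4, 5)
values = theta_hat(2.0, Gamma_hat, p_d=2, L=3)
```

With $h(2) = (1, 2, 4)'$, the two entries of `values` are $0 + 2 + 8 = 10$ and $3 + 8 + 20 = 31$. In practice one would simply slice $\hat\Gamma$ into blocks of length $L$ rather than build each $S_k$.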

3 Asymptotic Properties of Sieve Estimators

In this section, we study the asymptotic properties of the sieve estimators $\hat\beta$ and $\hat\theta(\cdot)$. We focus on the case of large $N$ and large $T$ and briefly discuss the extension to the case of large $N$ and small $T$. The latter case is much simpler because we do not need to impose stationarity and mixing conditions on the system (2.1), whereas extra conditions (e.g., mixing conditions) are needed to ensure that the system (2.1) is ergodic and stationary when $T$ is large.

3.1 Assumptions

To apply the method of sieves, we assume that the $\theta_l(\cdot)$'s satisfy some smoothness conditions. Let $\mathcal{U} \subset \mathbb{R}^{p_z}$ be the support of $z_{it}$. To allow for the possible unboundedness of $\mathcal{U}$, we follow Chen et al. (2005), Su and Jin (2012), and Lee (2014) and use a weighted sup-norm: $\|m\|_{\infty,\varpi} \equiv \sup_{u\in\mathcal{U}} |m(u)|\,[1+\|u\|^2]^{-\varpi/2}$ for some $\varpi \ge 0$. When $\varpi = 0$, the norm is the usual sup-norm, which is suitable for the case with compact support (Newey (1997) and Andrews (2005)).
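The effect of the weight in this norm can be seen in a small numerical sketch. The supremum is approximated by a maximum over a finite grid, which is an assumption of this illustration (as are the test function and the grid), so the result is only a lower bound for the true weighted sup-norm.

```python
import numpy as np

def weighted_sup_norm(m, grid, varpi):
    """Approximate sup_u |m(u)| [1 + ||u||^2]^(-varpi/2) over the rows of grid.

    grid has shape (n, p_z); m maps such an array to n function values.
    """
    grid = np.asarray(grid, dtype=float)
    weights = (1.0 + np.sum(grid ** 2, axis=1)) ** (-varpi / 2.0)
    return np.max(np.abs(m(grid)) * weights)

grid = np.linspace(-10.0, 10.0, 2001)[:, None]   # p_z = 1
m = lambda u: u[:, 0]                            # m(u) = u, unbounded on the real line

plain = weighted_sup_norm(m, grid, 0.0)      # grows with the width of the grid
weighted = weighted_sup_norm(m, grid, 1.0)   # stays below 1 however wide the grid
```

For $m(u) = u$ the unweighted norm equals the grid's endpoint, while with $\varpi = 1$ the weighted value $|u|/\sqrt{1+u^2}$ is bounded by 1, which is why a positive $\varpi$ accommodates functions that grow on an unbounded support.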

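Let $\alpha \equiv (\alpha_1,\ldots,\alpha_{p_z})'$ be a $p_z$-vector of non-negative integers and $|\alpha| \equiv \sum_{k=1}^{p_z}\alpha_k$. For any $u = (u_1,\ldots,u_{p_z})' \in \mathcal{U}$, the $\alpha$-th derivative of $m$ is denoted as $\nabla^{\alpha} m(u) \equiv \partial^{\alpha} m(u)/(\partial u_1^{\alpha_1}\cdots\partial u_{p_z}^{\alpha_{p_z}})$, and the $l$-th derivatives of $m$ include all $\nabla^{\alpha} m(u)$'s with $|\alpha| = l$. The Hölder space $\Lambda^{\gamma}(\mathcal{U})$ with order $\gamma > 0$ is the set of functions $m: \mathcal{U} \to \mathbb{R}$ such that the first $\lceil\gamma\rceil$ derivatives are bounded and the $\lceil\gamma\rceil$-th derivatives are Hölder continuous with exponent $\gamma - \lceil\gamma\rceil \in (0,1]$. The Hölder norm is defined by

$$\|m\|_{\Lambda^{\gamma}} \equiv \sup_{u\in\mathcal{U}} |m(u)| + \max_{|\alpha| = \lceil\gamma\rceil}\ \sup_{u \neq u^*} \frac{|\nabla^{\alpha} m(u) - \nabla^{\alpha} m(u^*)|}{\|u - u^*\|^{\gamma - \lceil\gamma\rceil}}.$$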


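The following definition is adopted from Chen et al. (2005).

Definition 1. Let $\Lambda^{\gamma}(\mathcal{U},\varpi) \equiv \{m: \mathcal{U} \to \mathbb{R} \text{ such that } m(\cdot)[1+\|\cdot\|^2]^{-\varpi/2} \in \Lambda^{\gamma}(\mathcal{U})\}$ denote a weighted Hölder space of functions. A weighted Hölder ball with radius $c$ is

$$\Lambda^{\gamma}_{c}(\mathcal{U},\varpi) \equiv \left\{ m \in \Lambda^{\gamma}(\mathcal{U},\varpi) : \left\| m(\cdot)[1+\|\cdot\|^2]^{-\varpi/2} \right\|_{\Lambda^{\gamma}} \le c < \infty \right\}.$$

A function $m(\cdot)$ is said to be $H(\gamma,\varpi)$-smooth on $\mathcal{U}$ if it belongs to a weighted Hölder ball $\Lambda^{\gamma}_{c}(\mathcal{U},\varpi)$ for some $\gamma > 0$, $c > 0$ and $\varpi \ge 0$.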

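Let $y_i \equiv (y_{i1},\ldots,y_{iT})'$ and define $d_i$, $z_i$, $x_i$ and $u_i$ analogously. Let $\underline{y}_{i,t-1} \equiv (y_{i,t-1}, y_{i,t-2},\ldots,y_{i1})'$ and define $\underline{d}_{it}$, $\underline{x}_{it}$ and $\underline{z}_{it}$ in the same way. Denote

$$Q_{wx,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} w_{it}\Delta x_{it}', \qquad Q_{ww,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} w_{it}w_{it}',$$

$$Q_{wh,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} w_{it}\Delta H_{it}', \qquad Q_{hh,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} \Delta H_{it}\Delta H_{it}',$$

and let $Q_{wx} \equiv E(Q_{wx,NT})$, $Q_{ww} \equiv E(Q_{ww,NT})$, $Q_{wh} \equiv E(Q_{wh,NT})$ and $Q_{hh} = E(Q_{hh,NT})$. Then define

$$Q_1 \equiv Q_{wx}'Q_{ww}^{-1}Q_{wx} - Q_{wx}'Q_{ww}^{-1}Q_{wh}\left(Q_{wh}'Q_{ww}^{-1}Q_{wh}\right)^{-1}Q_{wh}'Q_{ww}^{-1}Q_{wx}, \tag{3.1}$$

$$Q_2 \equiv Q_{wx}'Q_{ww}^{-1} - Q_{wx}'Q_{ww}^{-1}Q_{wh}\left(Q_{wh}'Q_{ww}^{-1}Q_{wh}\right)^{-1}Q_{wh}'Q_{ww}^{-1}, \tag{3.2}$$

$$Q_3 \equiv Q_{wh}'Q_{ww}^{-1}Q_{wh} - Q_{wh}'Q_{ww}^{-1}Q_{wx}\left(Q_{wx}'Q_{ww}^{-1}Q_{wx}\right)^{-1}Q_{wx}'Q_{ww}^{-1}Q_{wh}, \tag{3.3}$$

$$Q_4 \equiv Q_{wh}'Q_{ww}^{-1} - Q_{wh}'Q_{ww}^{-1}Q_{wx}\left(Q_{wx}'Q_{ww}^{-1}Q_{wx}\right)^{-1}Q_{wx}'Q_{ww}^{-1}, \text{ and} \tag{3.4}$$

$$Q_5 \equiv Q_{hh} - Q_{wh}'Q_{ww}^{-1}Q_{wh}. \tag{3.5}$$

For model (2.1), we make the following assumptions.

Assumption A.1 (i) $(y_i, d_i, x_i, z_i, \eta_i, u_i)$ are independent across $i$ and $E(u_{it} \mid \underline{y}_{i,t-1}, \underline{d}_{it}, \underline{x}_{it}, \underline{z}_{it}) = 0$.

(ii) There exists a $p_w \times 1$ vector $w_{it}$ such that $p_w \ge p_x + p_dL$ and $E(\Delta u_{it}w_{it}) = 0$.

(iii) For each $i$, $\{(y_{it}, d_{it}, x_{it}, z_{it}, u_{it}) : t = 1, 2, \ldots\}$ is strong mixing with mixing coefficient $\alpha_i(\cdot)$ given the fixed effects. $\alpha(\cdot) = \max_{1\le i\le N}\alpha_i(\cdot)$ satisfies $\sum_{s=1}^{\infty} s^{2}\alpha^{\delta/(4+\delta)}\rho(s) < C < \infty$ for some $\delta > 0$.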

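Assumption A.2 (i) The $\theta_l(\cdot)$'s ($l = 1,\ldots,p_d$) are all $H(\gamma,\varpi)$-smooth on $\mathcal{U}$ for some $\gamma > (p_z+1)/2$ and $\varpi \ge 0$.

(ii) For any $H(\gamma,\varpi)$-smooth function $\theta(u)$, there exists a linear combination of basis functions $\Pi_{\infty,L}\theta \equiv \gamma_{\theta}'h(\cdot)$ in the linear sieve space $\mathcal{G}_L = \{\theta(\cdot) = \gamma'h(\cdot)\}$ such that $\|\theta - \Pi_{\infty,L}\theta\|_{\infty,\bar\varpi} = O(L^{-\gamma/p_z})$ for some $\bar\varpi > \varpi + \gamma$.

(iii) $\operatorname{plim}_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}(1+\|z_{it}\|^2)^{\varpi}\alpha(z_{it}) < \infty$.

(iv) There are a sequence of constants $\zeta_0(L)$ and a sequence of increasing compact sets $\mathcal{U}_{NT}$ satisfying $\sup_{u\in\mathcal{U}_{NT}}\|u\| = O(\zeta_0(L)^{1/\varpi})$, $\sup_{u\in\mathcal{U}_{NT}}\|h(u)\| \le \zeta_0(L)$, and $\zeta_0(L)^2 L/(NT) \to 0$ as $(N,T) \to \infty$.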


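Assumption A.3 (i) $\sup_{1\le i\le N}\sup_{2\le t\le T} E\|\chi_{it}\|^{4+\epsilon} \le C < \infty$ for some $\epsilon > 0$ and $\chi_{it} = u_{it}$, $z_{it}$, $d_{it}$, $w_{it}$, and $H_{it}$.

(ii) $Q_{ww}$ is invertible, $Q_{wx}'Q_{ww}^{-1}Q_{wx}$ is invertible, $Q_1$ and $Q_3$ are invertible, and $(Q_{wh}, Q_{wx})$ has full rank $p_x + p_dL$.

(iii) The eigenvalues of $Q_1$ and $Q_3$ are all bounded and bounded away from 0, and $\lambda_{\min}(Q_4Q_4') > C > 0$.

(iv) $\lambda_{\max}(Q_{hh\omega}) < \infty$, where $Q_{hh\omega} \equiv \int_{u\in\mathcal{U}} H_S(u)H_S(u)'\omega(u)\,du$ and $\omega(\cdot)$ is a non-negative weight function.

(v) $\Omega = \lim_{(N,T)\to\infty}\frac{1}{NT}\sum_{i=1}^{N}E(W_i'\Delta u_i\Delta u_i'W_i) > 0$ and $\lambda_{\max}(\Omega) < C < \infty$.

Assumption A.4 As $(N,T) \to \infty$, $L^3/(NT) \to 0$ and $\sqrt{NT}\,L^{-\gamma/p_z} \to 0$.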

Most of the above assumptions are very similar to those used in Lee (2014) and Su and Zhang (2016); we modify a few of them for the purpose of our analysis. Assumption A.1(i) is standard for dynamic panel data models (e.g., Alvarez and Arellano, 2003); A.1(ii) requires the existence of a vector of IVs; and A.1(iii) imposes a strong mixing condition on the data generating process, which is easily satisfied by a wide class of nonlinear autoregressive functions in the time series context (e.g., Chen and Shen, 1998). Assumption A.2 is widely used in the literature on sieve estimation with infinite support. A.2(i) imposes smoothness conditions on the unknown functions; A.2(ii) controls the uniform sieve approximation errors; A.2(iii) is used to obtain the convergence of our sieve estimator in the empirical $L_2$-norm; and A.2(iv) is used to derive the uniform convergence rate on an increasing compact support $\mathcal{U}_{NT}$. Assumption A.3(i) gives some moment conditions on the variables and basis functions; A.3(ii)-(iii) serve as identification conditions for the sieve 2SLS estimation. In the sieve literature, it is known that many series functions satisfy Assumptions A.2 and A.3, for example, power series, orthogonal polynomials, trigonometric series and splines. Assumption A.4 imposes rate conditions on $L$ to control the sieve approximation error and variance.
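To see what Assumption A.4 requires in a concrete case, suppose, purely as an illustration, that the number of sieve terms grows polynomially, $L \asymp (NT)^{\kappa}$ with $\kappa > 0$. The polynomial rate and the exponent $\kappa$ are assumptions introduced here and do not appear in the text.

```latex
% Illustrative only: assumes L \asymp (NT)^{\kappa} with \kappa > 0.
\begin{align*}
\frac{L^{3}}{NT} \asymp (NT)^{3\kappa - 1} \to 0
  &\iff \kappa < \tfrac{1}{3}, \\
\sqrt{NT}\, L^{-\gamma/p_z} \asymp (NT)^{1/2 - \kappa\gamma/p_z} \to 0
  &\iff \kappa > \frac{p_z}{2\gamma}.
\end{align*}
% Both conditions can hold for some \kappa iff p_z/(2\gamma) < 1/3,
% that is, iff \gamma > 3 p_z / 2.
```

Under this reading, an admissible $\kappa$ exists only if $\gamma > 3p_z/2$, which for $p_z \ge 1$ is a stronger smoothness requirement than $\gamma > (p_z+1)/2$ in Assumption A.2(i). The first condition limits the variance from a growing number of sieve terms, and the second makes the approximation bias negligible at the $\sqrt{NT}$ scale.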

3.2 Asymptotic properties of sieve estimators

We establish the asymptotic normality of $\hat\beta$ in the following theorem.

Theorem 3.1 Suppose Assumptions A.1-A.4 hold. Then

$$\sqrt{NT}\,(\hat\beta - \beta) \to_d N\!\left(0,\ Q_1^{-1}Q_2\,\Omega\,Q_2'Q_1^{-1}\right). \tag{3.6}$$


The above theorem gives the asymptotic distribution of $\hat\beta$, which is shown to be $\sqrt{NT}$-consistent, asymptotically unbiased and asymptotically normally distributed.
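The sandwich form of the variance in (3.6) can be traced back to the partitioned estimator (2.5) by a short heuristic calculation. The sample analogues $Q_{1,NT}$ and $Q_{2,NT}$ below, obtained by replacing each population moment in (3.1) and (3.2) with its sample counterpart and reading every inverse there as $Q_{ww}^{-1}$, are notation introduced for this sketch; the argument is informal and is not the proof given in the Appendix.

```latex
% Heuristic sketch; Q_{1,NT}, Q_{2,NT} are sample analogues of Q_1, Q_2.
\begin{align*}
\hat\beta - \beta
  &= \left(X_W' M_{H_W} X_W\right)^{-1} X_W' M_{H_W} P_W \Delta\varepsilon
     && \text{since } M_{H_W} H_W = 0, \\
X_W' M_{H_W} P_W
  &= Q_{2,NT}\, W',
  \qquad
  X_W' M_{H_W} X_W = NT\, Q_{2,NT}\, Q_{wx,NT} = NT\, Q_{1,NT}, \\
\sqrt{NT}\,(\hat\beta - \beta)
  &= Q_{1,NT}^{-1}\, Q_{2,NT}\, \frac{1}{\sqrt{NT}}\, W' \Delta\varepsilon .
\end{align*}
% Population counterparts: Q_2 Q_{wx} = Q_1 and Q_2 Q_{wh} = 0.
```

If the approximation error in $\Delta\varepsilon$ is negligible and $(NT)^{-1/2}W'\Delta u$ is asymptotically normal with variance $\Omega$, the last line yields the limiting variance $Q_1^{-1}Q_2\Omega Q_2'Q_1^{-1}$. The same steps with the roles of $\Delta H$ and $\Delta X$ exchanged give $Q_4Q_{wh} = Q_3$ and $Q_4Q_{wx} = 0$, which is where $Q_3$ and $Q_4$ enter the result for $\hat\theta(u)$.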

We now turn to the asymptotic properties of the sieve estimator (2.6). The following theorem reports the convergence rates and asymptotic normality of $\hat\theta(u)$.

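Theorem 3.2 Suppose Assumptions A.1-A.4 hold and $\inf_{u\in\mathcal{U}_{NT}}\|h(u)\| \ge C > 0$. Then

(i) $\int \|\hat\theta(u) - \theta(u)\|^2\,\omega(u)\,du = O_p\!\left(\frac{L}{NT} + L^{-2\gamma/p_z}\right)$;

(ii) $\sup_{u\in\mathcal{U}_{NT}} \|\hat\theta(u) - \theta(u)\|_{\infty} = O_p\!\left(\zeta_0(L)\left(\sqrt{L/(NT)} + L^{-\gamma/p_z}\right)\right)$;

(iii) $\sqrt{NT}\,\Xi(u)^{-1/2}\left[\hat\theta(u) - \theta(u)\right] \to_d N(0,1)$, where $\Xi(u) = H_S(u)'Q_3^{-1}Q_4\,\Omega\,Q_4'Q_3^{-1}H_S(u)$.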

Several remarks can be made for the above asymptotic results.

Remark 3.3 The most important feature of the above results is that, with oversmoothing, the sieve estimation of $\beta$ and $\theta(\cdot)$ is asymptotically unbiased in the sense that the limiting distributions of $\hat{\beta}$ and $\hat{\theta}(\cdot)$ are centered at zero. This is unlike sieve estimation based on the within-group transformation, which is shown by Lee (2014) and Tran (2014) to be asymptotically biased of order $O(T^{-1})$, so that a bias correction is needed for statistical inference. It would be interesting to compare the bias-corrected estimators with our sieve 2SLS estimators.

Remark 3.4 The above asymptotic results assume that both $N$ and $T$ go to infinity. However, similar asymptotic results for both $\hat{\beta}$ and $\hat{\theta}(u)$ still hold if $T$ is fixed and $N$ goes to infinity. When $T$ is fixed, the sieve 2SLS estimation procedure for $\beta$ and $\theta(u)$ remains the same as in the large-$T$ case. Moreover, the assumptions for the asymptotics with large $N$ and fixed $T$ can be relaxed; namely, we do not need to impose the weak dependence condition (strong mixing) on the variables. Also, all assumptions regarding the limit as $T \to \infty$ can be relaxed. For instance, Assumption A.3(iii) can be relaxed to $\Omega = \lim_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N} E(W_i' \Delta u_i \Delta u_i' W_i) > 0$. Finally, Assumption A.4 can be relaxed to $L/N \to 0$, $\sqrt{N} L^{-\gamma/p_z} \to 0$, and $L^3/N \to 0$ as $N \to \infty$. Under these assumptions, the above asymptotics for both $\hat{\beta}$ and $\hat{\theta}(\cdot)$ can be established by following the derivation in this paper.

Remark 3.5 In the above sieve 2SLS estimation procedure, it is assumed that the panel is balanced, i.e., the time dimension $T$ is the same for all cross-sectional units. However, the proposed sieve 2SLS estimation procedure can be easily modified to suit unbalanced panels. In such a case, the only thing that needs to be changed in both (2.4) and (2.5) is the dimension of the data matrices. For example, for (2.4), the dimensions of $\Delta H$ and $\Delta Y_W$ change to $(T_1 + T_2 + \cdots + T_N - 2N) \times p_d L$ and $(T_1 + T_2 + \cdots + T_N - 2N) \times 1$, respectively, where $T_i$ is the number of observations for the $i$-th cross-sectional unit. As shown in the simulations below, the sieve 2SLS estimation works remarkably well for unbalanced panels regardless of whether $\min_{1 \le i \le N} T_i$ is large (large $T$ case) or $\max_{1 \le i \le N} T_i$ is fixed (fixed $T$ case).
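The row count in Remark 3.5 can be checked with a short sketch. The function name `stacked_rows` and the example values of $T_i$ are hypothetical; the only ingredient taken from the text is the count $T_1 + \cdots + T_N - 2N$, which implies that each unit contributes $T_i - 2$ rows to the stacked data.

```python
def stacked_rows(T_list, lost_per_unit=2):
    """Rows of the stacked data when unit i contributes T_i - lost_per_unit rows.

    lost_per_unit=2 mirrors the count T_1 + ... + T_N - 2N given in the text.
    """
    if any(T <= lost_per_unit for T in T_list):
        raise ValueError("every unit needs more than lost_per_unit periods")
    return sum(T - lost_per_unit for T in T_list)


# Hypothetical unbalanced panel with N = 4 units.
T_list = [5, 7, 10, 6]
print(stacked_rows(T_list))      # (5 + 7 + 10 + 6) - 2 * 4 = 20

# A balanced panel reduces to N * (T - 2).
print(stacked_rows([5] * 100))   # 100 * 3 = 300
```

In the balanced case the count collapses to $N(T-2)$, so the same code path can serve both designs.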

4 Specification Test for the Constant Slopes

In this section we maintain the correct specification of the partially linear panel data model and consider testing for the constancy of the nonparametric component $\theta(\cdot)$ in the partially linear model. The null hypothesis is

\[ H_0: \Pr\left(\theta(z_{it}) = \gamma_0\right) = 1 \ \text{ for some } \gamma_0 \in \Theta \subset \mathbb{R}^{p_d}, \tag{4.1} \]

where $i = 1, \ldots, N$, $t = 1, \ldots, T$. The alternative hypothesis is given by

\[ H_1: \Pr\left[\theta(z_{it}) = \gamma\right] < 1 \ \text{ for all } \gamma \in \Theta \subset \mathbb{R}^{p_d}. \tag{4.2} \]

To facilitate the asymptotic local power analysis, we consider the following sequence of Pitman local alternatives:

\[ H_1(\delta_{NT}): \Pr\left(\theta(z_{it}) = \gamma_0 + \delta_{NT}\Psi(z_{it})\right) = 1 \ \text{ for some } \gamma_0 \in \Theta \subset \mathbb{R}^{p_d}, \]

where $\Psi(\cdot) = \Psi_{NT}(\cdot)$ is a measurable nonlinear function and $\delta_{NT} \to 0$ as $(N,T) \to \infty$.

Here we follow Su and Zhang (2016) and propose a test for $H_0$ versus $H_1$ by comparing the weighted empirical $L_2$-norm distance between two estimators of the slopes of $d_{it}$, i.e., the nonparametric estimator $\hat{\theta}(\cdot)$ and the parametric estimator $\hat{\gamma}$. Intuitively, both estimators are consistent under the null hypothesis, while only the sieve estimator is consistent under the alternative. So if there is any deviation from the null, the distance between the two estimators will signal it asymptotically. This motivates us to consider the following test statistic⁷

\[ D_{NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} \left\|\hat{\theta}(z_{it}) - \hat{\gamma}\right\|^2 a(z_{it}), \tag{4.3} \]

where $a(\cdot)$ is a user-specified nonnegative weighting function, and $\hat{\gamma}$ is the usual IV/2SLS estimate of $\gamma$ in the linear panel data model under $H_0$. Similar test statistics have been proposed in various other contexts in the literature; see, e.g., Härdle and Mammen (1993), Hong and White (1995), and Su and Zhang (2016). We will show that after being appropriately centered and scaled, $D_{NT}$ is asymptotically normally distributed under the null hypothesis of constant slopes.
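As a rough illustration of how the distance in (4.3) is assembled once the two sets of estimates are available, the following sketch computes $D_{NT}$ from fitted values. The function name `d_nt`, the argument layout, and the toy numbers are assumptions made for this example; estimation of $\hat{\theta}(\cdot)$ and $\hat{\gamma}$ is not shown.

```python
def d_nt(theta_hat, gamma_hat, weights, N, T):
    """Weighted empirical L2 distance in the spirit of (4.3).

    theta_hat: list of vectors theta_hat(z_it), one per (i, t) with t = 2..T
    gamma_hat: constant-slope estimate, a vector of the same length
    weights:   list of nonnegative weights a(z_it), same order as theta_hat
    """
    if len(theta_hat) != N * (T - 1) or len(weights) != N * (T - 1):
        raise ValueError("expected N * (T - 1) observations (t = 2..T)")
    total = 0.0
    for th, a in zip(theta_hat, weights):
        sq_norm = sum((x - g) ** 2 for x, g in zip(th, gamma_hat))
        total += sq_norm * a
    return total / (N * T)   # the normalisation 1/(NT) follows (4.3)


# Toy example: N = 2, T = 3, scalar slope, unit weights.
theta_hat = [[0.5], [0.7], [0.3], [0.5]]
print(d_nt(theta_hat, [0.5], [1.0] * 4, N=2, T=3))
```

When every $\hat{\theta}(z_{it})$ equals $\hat{\gamma}$ the distance is exactly zero, which is the intuition behind using $D_{NT}$ to detect departures from constant slopes.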

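Under $H_0$, taking the first difference of the linear panel data model leads to

\[ \Delta y_{it} = \Delta d_{it}'\gamma + \Delta x_{it}'\beta + \Delta u_{it}, \]

for $i = 1, \ldots, N$ and $t = 2, \ldots, T$. Due to the possible endogeneity of $\Delta d_{it}$ or $\Delta x_{it}$, let $v_{it}$ be a $p_v \times 1$ vector of instrumental variables. Let $\Delta D = (\Delta d_{12}', \ldots, \Delta d_{1T}', \ldots, \Delta d_{N2}', \ldots, \Delta d_{NT}')'$ and $V = (v_{12}', \ldots, v_{1T}', \ldots, v_{N2}', \ldots, v_{NT}')'$. Denote $P_V = V(V'V)^{-1}V'$. Define the linear projection of $\Delta A$ on $V$ by $A_V = P_V \Delta A$, where $\Delta A = \Delta D$, $\Delta Y$, or $\Delta X$. Then the 2SLS estimator of $\gamma$ can be written as

\[ \hat{\gamma} = \left(D_V' M_{X_V} D_V\right)^{-1} D_V' M_{X_V} Y_V, \]

where $M_{X_V} = I_{N(T-1)} - P_{X_V}$. Denote $Q_{vx,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} v_{it}\Delta x_{it}'$, $Q_{vx} \equiv E(Q_{vx,NT})$, $Q_{vv,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} v_{it}v_{it}'$, $Q_{vv} \equiv E(Q_{vv,NT})$, $Q_{vd,NT} \equiv \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} v_{it}\Delta d_{it}'$, $Q_{vd} \equiv E(Q_{vd,NT})$, $q_{v\Psi,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} v_{it}\Delta\Psi_{it}'$ and $q_{v\Psi} = E(q_{v\Psi,NT})$, where $\Delta\Psi_{it} = \Psi(z_{it}) - \Psi(z_{i,t-1})$. Define

\[ Q_6 = Q_{vx}' Q_{vv}^{-1} Q_{vx} - Q_{vx}' Q_{vv}^{-1} Q_{vd}\left(Q_{vd}' Q_{vv}^{-1} Q_{vd}\right)^{-1} Q_{vd}' Q_{vv}^{-1} Q_{vx}, \]

\[ q_{\Psi} = Q_{vx}' Q_{vv}^{-1} q_{v\Psi} - Q_{vx}' Q_{vv}^{-1} Q_{vd}\left(Q_{vd}' Q_{vv}^{-1} Q_{vd}\right)^{-1} Q_{vd}' Q_{vv}^{-1} q_{v\Psi}, \]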

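⁷ One can also construct a test statistic by comparing the two estimates of the whole conditional mean function under the null and the alternative:

\[ D^{\diamond}_{NT} = \frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=2}^{T} \left\|\left(\hat{\theta}(z_{it}) - \hat{\gamma}\right)' d_{it} + x_{it}'\left(\hat{\beta} - \tilde{\beta}\right)\right\|^2 a(z_{it}), \]

where $\hat{\beta}$ and $\tilde{\beta}$ denote the estimates of $\beta$ under the alternative and under the null, respectively.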


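and $\gamma_{\Psi} = Q_6^{-1} q_{\Psi}$. Let $Q^{(a)}_{hs,NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} H_S(z_{it}) H_S(z_{it})' a(z_{it})$ and $Q^{(a)}_{hs} = E(Q^{(a)}_{hs,NT})$. Then define

\[ Q_{7,NT} = Q_{wh,NT} - Q_{wx,NT}\left(Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wx,NT}\right)^{-1} Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wh,NT}, \]

\[ Q_7 = Q_{wh} - Q_{wx}\left(Q_{wx}' Q_{ww}^{-1} Q_{wx}\right)^{-1} Q_{wx}' Q_{ww}^{-1} Q_{wh}, \]

\[ Q_{NT} = Q_{ww,NT}^{-1} Q_{7,NT} Q_{3,NT}^{-1} Q^{(a)}_{hs,NT} Q_{3,NT}^{-1} Q_{7,NT}' Q_{ww,NT}^{-1}, \]

\[ Q = Q_{ww}^{-1} Q_7 Q_3^{-1} Q^{(a)}_{hs} Q_3^{-1} Q_7' Q_{ww}^{-1}. \]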

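We now give some additional assumptions that are used in deriving the asymptotic properties of our test statistic.

Assumption A.5. (i) $\mathrm{plim}_{(N,T)\to\infty} \frac{1}{NT} D_V' M_{X_V} D_V$ exists and is invertible.

(ii) $\mathrm{plim}_{(N,T)\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} v_{it}\Delta\Psi_{it}'$ exists.

(iii) $\frac{1}{NT} D_V' M_{X_V} P_V \Delta u = O_p((NT)^{-1/2})$.

Assumption A.6. (i) $\sup_{1 \le i \le N}\sup_{2 \le t \le T} E\|\chi_{it}\|^{8+8\epsilon} \le C < \infty$ for some $\epsilon > 0$ and $\chi_{it} = u_{it}$, $z_{it}$, $d_{it}$, $w_{it}$, and $h_{it}$.

(ii) $\lambda_{\max}(\Omega_i) < C < \infty$, where $\Omega_i = E(W_i'\Delta u_i \Delta u_i' W_i)$ for $i = 1, \ldots, N$.

(iii) $\sum_{d=1}^{\infty} d\,\alpha^{\frac{4\delta-1}{4\delta+1}}(d) < \infty$ and $\sum_{d=1}^{\infty} d^2\,\alpha^{\frac{\delta}{\delta+1}}(d) < \infty$ for some $\delta > 1/4$.

Assumption A.7. As $(N,T) \to \infty$, $L^4/(NT) \to 0$, $\sqrt{NT}\,L^{-\gamma/p_z} \to 0$, and $L^3/N \to 0$.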

Remark 4.1 Assumption A.5 is used to study the asymptotic behavior of the parametric estimator $\hat{\gamma}$ under the local alternatives. Assumption A.6(i) imposes higher-order moment conditions on the variables used in testing; A.6(ii) requires that the variance-covariance matrix of $T^{-1/2}\Delta u_i' W_i$ has bounded eigenvalues. Assumption A.7 gives stricter rate conditions on $L$ diverging to $\infty$ in testing.

We define the test statistic

\[ J_{NT} = \left(NT\,D_{NT} - B_{NT}\right)/\sqrt{V_{NT}}, \]

where $B_{NT} = \mathrm{tr}(Q\Omega_1)$ and $V_{NT} = 2\,\mathrm{tr}(Q\Omega_1 Q\Omega_1)$ are the asymptotic bias and variance terms, respectively. Since $J_{NT}$ is infeasible due to the unknown $B_{NT}$ and $V_{NT}$, we define a feasible test statistic

\[ \hat{J}_{NT} = \left(NT\,D_{NT} - \hat{B}_{NT}\right)/\sqrt{\hat{V}_{NT}}, \]

where $B_{NT}$ and $V_{NT}$ are respectively estimated by

\[ \hat{B}_{NT} = \mathrm{tr}(Q_{NT}\hat{\Omega}_{1,NT}) \quad \text{and} \quad \hat{V}_{NT} = 2\,\mathrm{tr}(Q_{NT}\hat{\Omega}_{1,NT} Q_{NT}\hat{\Omega}_{1,NT}), \]

with $\hat{\Omega}_{1,NT} = \frac{1}{N(T-2)}\sum_{i=1}^{N}\sum_{t=2}^{T} w_{it}w_{it}'(\Delta\hat{u}_{it})^2$ and $\Delta\hat{u}_{it} = \Delta y_{it} - \Delta H_{it}'\hat{\Gamma} - \Delta x_{it}'\hat{\beta}$.

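Let $\mu_{\Psi} = \mathrm{plim}_{(N,T)\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} \|\Psi(z_{it}) - \gamma_{\Psi}\|^2 a(z_{it})$, where $\gamma_{\Psi} = Q_6^{-1} q_{\Psi}$. The following theorem establishes the asymptotic distribution of $\hat{J}_{NT}$ under $H_1(\delta_{NT})$.

Theorem 4.2 Suppose that Assumptions A.1-A.3 and A.5-A.6 hold. Under $H_1(\delta_{NT})$ with $\delta_{NT} \equiv (NT)^{-1/2} V_{NT}^{1/4}$, as $(N,T) \to \infty$,

\[ \hat{J}_{NT} \to_d N(\mu_{\Psi}, 1). \]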

Remark 4.3 The proof of the above theorem is tedious and is relegated to the appendix. We complete the proof by showing that (i) $J_{NT} \to_d N(\mu_{\Psi}, 1)$, and (ii) $\hat{J}_{NT} - J_{NT} = o_p(1)$. The idea for proving (i) is to write $J_{NT}$ as a degenerate second-order U-statistic plus some smaller-order terms and then apply de Jong's (1987) CLT for independent but non-identically distributed (INID) observations.

Remark 4.4 In view of the fact that $V_{NT} = O(L)$, $\delta_{NT}$ is of order $(NT)^{-1/2} L^{1/4}$, which indicates that our test has power to detect local alternatives that converge to the null hypothesis at the rate $(NT)^{-1/2} L^{1/4}$. The asymptotic local power function is given by

\[ \Pr\left(\hat{J}_{NT} \ge z_{\alpha} \mid H_1(\delta_{NT})\right) \to 1 - \Phi(z_{\alpha} - \mu_{\Psi}) \quad \text{as } (N,T) \to \infty, \]

where $z_{\alpha}$ is the upper $\alpha$th percentile of the standard normal distribution, and $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF).
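To get a feel for the local power function $1 - \Phi(z_{\alpha} - \mu_{\Psi})$, the sketch below evaluates it for a few values of $\mu_{\Psi}$ using Python's standard library. The function name `local_power` and the chosen values of $\mu_{\Psi}$ are illustrative assumptions, not quantities from the paper.

```python
from statistics import NormalDist


def local_power(mu_psi, alpha=0.05):
    """Asymptotic local power 1 - Phi(z_alpha - mu_psi) of the one-sided test."""
    std_normal = NormalDist()                    # standard normal, mean 0 and sd 1
    z_alpha = std_normal.inv_cdf(1.0 - alpha)    # upper-alpha critical value
    return 1.0 - std_normal.cdf(z_alpha - mu_psi)


# Illustrative drift values; mu_psi = 0 corresponds to the null hypothesis.
for mu in (0.0, 1.0, 2.0, 3.0):
    print(mu, round(local_power(mu), 3))
```

At $\mu_{\Psi} = 0$ the function returns the nominal level $\alpha$, and it increases monotonically in $\mu_{\Psi}$, in line with the one-sided rejection rule.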

Remark 4.5 To study the asymptotic behavior of $\hat{J}_{NT}$ under global alternatives, we need to study the asymptotic properties of $\hat{\gamma}$ under $H_1$. We can define the pseudo-true parameter $\gamma^{*}$ as the probability limit of $\hat{\gamma}$. Then $\Pr(\Pi(z_{it}) \equiv \theta(z_{it}) - \gamma^{*} \neq 0) > C$ for some $C > 0$. Let $\Pi = (\Pi(z_{11})', \ldots, \Pi(z_{1T})', \ldots, \Pi(z_{N1})', \ldots, \Pi(z_{NT})')'$. With the additional assumption that $\|\Pi\| = O_p((NT)^{1/2})$, we can show that

\[ D_{NT} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} \left\|\hat{\theta}(z_{it}) - \hat{\gamma}\right\|^2 a(z_{it}) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} \left\|\Pi(z_{it})\right\|^2 a(z_{it}) + o_p(1) = O_p(1). \]

Then, together with the fact that $\hat{B}_{NT} = O_p(L)$ and $\hat{V}_{NT} = O_p(L)$, we can demonstrate that $\hat{J}_{NT}$ diverges to infinity at the rate $NT/\sqrt{L}$ under $H_1$ as $L/(NT) \to 0$. That is, $\Pr(\hat{J}_{NT} > b_{NT} \mid H_1) \to 1$ for any nonstochastic sequence $b_{NT} = o(NT/\sqrt{L})$. So our test is consistent against global alternatives.


Remark 4.6 With a slight modification, our test can be applied to testing for a parametric specification of $\theta(\cdot)$. One can still construct test statistics based on the empirical $L_2$-norm distance between the sieve estimate and the parametric estimate of the slope vector of $d_{it}$.

The null hypothesis $H_0$ can be seen as a special case of $H_1(\delta_{NT})$ with $\mu_{\Psi} = 0$. Clearly, $\hat{J}_{NT}$ is then asymptotically distributed as $N(0, 1)$. This result is stated as a corollary.

Corollary 4.7 Suppose Assumptions A.1-A.3 and A.6-A.7 hold. Under $H_0$, $\hat{J}_{NT} \to_d N(0, 1)$ as $(N,T) \to \infty$.

Remark 4.8 In principle, we can compare $\hat{J}_{NT}$ with the one-sided critical value $z_{\alpha}$ from the standard normal distribution, and reject the null when $\hat{J}_{NT} \ge z_{\alpha}$. In finite samples, tests based on standard normal critical values tend to suffer from severe size distortion due to the nonparametric nature of our test. Therefore, we propose to implement the test based on a bootstrap p-value.

To improve the finite sample performance of our test, we propose a fixed-regressor bootstrap (Hansen, 2000) procedure as follows:

1. Estimate the restricted model under $H_0$ and obtain the residuals $\hat{u}_{it} = y_{it} - d_{it}'\hat{\gamma} - x_{it}'\hat{\beta}$, where $\hat{\gamma}$ and $\hat{\beta}$ are the IV or GMM estimates of $\gamma$ and $\beta$ under the null; under $H_1$, obtain the sieve estimator $\hat{\theta}(z_{it})$. Calculate the test statistic $\hat{J}_{NT}$ based on the original sample $\{y_{it}, d_{it}, x_{it}, z_{it}\}$. Let $\hat{\eta}_i \equiv T^{-1}\sum_{t=1}^{T}\hat{u}_{it}$.

2. Obtain the bootstrap error $u^{*}_{it} = (\hat{u}_{it} - \hat{\eta}_i)\varepsilon_{it}$ for $i = 1, 2, \ldots, N$ and $t = 2, \ldots, T$, where the $\varepsilon_{it}$'s are IID across both $i$ and $t$ and follow a two-point distribution: $\varepsilon_{it} = \frac{1-\sqrt{5}}{2}$ with probability $\frac{1+\sqrt{5}}{2\sqrt{5}}$ and $\varepsilon_{it} = \frac{\sqrt{5}+1}{2}$ with probability $\frac{\sqrt{5}-1}{2\sqrt{5}}$. Generate the bootstrap analogue $y^{*}_{it}$ of $y_{it}$ as

\[ y^{*}_{it} = d_{it}'\hat{\gamma} + x_{it}'\hat{\beta} + \hat{\eta}_i + u^{*}_{it} \quad \text{for } i = 1, 2, \ldots, N \text{ and } t = 2, \ldots, T, \]

where $y^{*}_{i1} = y_{i1}$.

3. Given the bootstrap resample $\{y^{*}_{it}, d_{it}, x_{it}, z_{it}\}$, estimate both the restricted (linear) and unrestricted (semiparametric) panel data models and calculate the bootstrap test statistic $J^{*}_{NT}$.

4. Repeat steps 2 and 3 $B$ times and index the bootstrap test statistics as $\{J^{*}_{NT,b}\}_{b=1}^{B}$. The bootstrap p-value is calculated as $p^{*} = B^{-1}\sum_{b=1}^{B} 1(J^{*}_{NT,b} > \hat{J}_{NT})$.

It is straightforward to implement the above bootstrap procedure. Clearly, we impose the null hypothesis of constant slopes for $d_{it}$ in step 2. Since there are no dynamics or endogeneity in the bootstrap world, we can estimate the model with or without the IV approach. Conditional on the data, $(y^{*}_{it}, u^{*}_{it})$ are independently but not identically distributed (INID) across $i$, and $u^{*}_{it}$ are also independently distributed across $t$. So we need to resort to the CLT for second-order U-statistics with INID data (e.g., de Jong, 1987) to justify the asymptotic validity of the above bootstrap procedure. Following Su and Lu (2013) and Su and Zhang (2016), we can justify the validity of our bootstrap procedure.
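The mechanical parts of the bootstrap (the two-point weights, the recentred residuals, and the p-value) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `draw_two_point`, `bootstrap_errors` and `bootstrap_p_value` are hypothetical, the residuals are supplied by the user, and the re-estimation of the restricted and sieve models in step 3 is not shown.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
LOW = (1.0 - SQRT5) / 2.0               # first support point
HIGH = (1.0 + SQRT5) / 2.0              # second support point
P_LOW = (1.0 + SQRT5) / (2.0 * SQRT5)   # probability of drawing LOW


def draw_two_point(rng):
    """One draw from the two-point distribution (mean 0, variance 1)."""
    return LOW if rng.random() < P_LOW else HIGH


def bootstrap_errors(resid, rng):
    """resid[i] holds the restricted residuals of unit i over its periods.

    Returns (u_it - eta_i) * eps_it for every supplied period; a caller that
    follows the text would keep only the entries for t = 2..T.
    """
    out = []
    for u_i in resid:
        eta_i = sum(u_i) / len(u_i)      # time average of unit i's residuals
        out.append([(u - eta_i) * draw_two_point(rng) for u in u_i])
    return out


def bootstrap_p_value(j_stat, j_boot):
    """Share of bootstrap statistics strictly exceeding the sample statistic."""
    return sum(1 for j in j_boot if j > j_stat) / len(j_boot)


rng = random.Random(12345)               # fixed seed, for reproducibility only
toy_resid = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3]]   # made-up residuals
print(bootstrap_errors(toy_resid, rng))
print(bootstrap_p_value(1.0, [0.5, 1.5, 2.0, 0.2]))  # 2 of 4 exceed 1.0
```

The two support points and probabilities give the weights mean zero and unit variance, so the bootstrap errors preserve the first two conditional moments of the recentred residuals.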

5 Simulations

In this section, we conduct a small set of Monte Carlo simulations to examine the finite sample performance of our proposed sieve 2SLS estimation for partially linear functional-coefficient dynamic panel models.

We consider the following five data generating processes (DGPs) with functional coefficients, allowing the panel structure to be either balanced or unbalanced.

DGP 1. (functional coefficient on lagged variable):

\[ y_{it} = \left(0.5 - e^{-2z_{it}^2}\right) y_{i,t-1} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.1} \]

so the functional coefficient is $\theta(z) = 0.5 - e^{-2z^2}$; a similar setting can be found in Cai et al. (2015). We also assume that $\varepsilon_{it}$ are IID $N(0, 1)$ across both $i$ and $t$, $\eta_i$ are IID $N(0, 1)$, and

\[ x_{it} = \rho_{x,i}x_{i,t-1} + 0.5\eta_i + \varepsilon_{x,it}, \]

\[ z_{it} = \rho_{z,i}z_{i,t-1} + 0.5\eta_i + \varepsilon_{z,it}, \]

where $\rho_{x,i}$ and $\rho_{z,i}$ are independent draws from $U(0.2, 0.8)$ for $i = 1, 2, \ldots, N$.
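A minimal simulation sketch of DGP 1 is given below. Several ingredients are assumptions made only for illustration and are not stated in the text: the innovations $\varepsilon_{x,it}$ and $\varepsilon_{z,it}$ are drawn as IID $N(0,1)$, all processes start at zero, and a burn-in of 100 periods is discarded. The function name `simulate_dgp1` is hypothetical.

```python
import math
import random


def simulate_dgp1(N, T, burn_in=100, seed=0):
    """Simulate a balanced panel from DGP 1; returns panel[i][t] = (y, x, z)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        eta = rng.gauss(0.0, 1.0)            # individual effect eta_i
        rho_x = rng.uniform(0.2, 0.8)        # unit-specific AR coefficients
        rho_z = rng.uniform(0.2, 0.8)
        x = z = y = 0.0                      # assumed zero initial values
        rows = []
        for t in range(burn_in + T):
            # Assumed N(0, 1) innovations for x and z.
            x = rho_x * x + 0.5 * eta + rng.gauss(0.0, 1.0)
            z = rho_z * z + 0.5 * eta + rng.gauss(0.0, 1.0)
            theta = 0.5 - math.exp(-2.0 * z * z)   # functional coefficient
            y = theta * y + 0.5 * x + eta + rng.gauss(0.0, 1.0)
            if t >= burn_in:                 # keep only post-burn-in periods
                rows.append((y, x, z))
        panel.append(rows)
    return panel


panel = simulate_dgp1(N=100, T=5, seed=1)
print(len(panel), len(panel[0]))             # 100 units, 5 periods each
```

Because $\theta(z)$ lies in $[-0.5, 0.5)$, the lag coefficient stays strictly inside the unit interval, so the simulated $y_{it}$ does not explode during the burn-in.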

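DGP 2. (functional coefficient on lagged variable):

\[ y_{it} = e^{-z_{it}^2}\left(z_{it}^2 + z_{it}\right) y_{i,t-1} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.2} \]

so the functional coefficient is $\theta(z) = e^{-z^2}(z^2 + z)$,⁸ and the generation of $x_{it}$, $z_{it}$, $\eta_i$ and $\varepsilon_{it}$ is the same as in (5.1).

⁸ For this function, it can be verified that $\theta(z)$ attains its minimum (approximately $-0.20$) near $z = -0.4$ and its maximum (approximately $0.76$) near $z = 0.85$.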
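The range of the DGP 2 coefficient function can be checked numerically with a simple grid search. The grid limits and step size below are arbitrary choices for this sketch; the search suggests a minimum of roughly $-0.20$ near $z = -0.4$ and a maximum of roughly $0.76$ near $z = 0.85$, so that $|\theta(z)|$ stays below one.

```python
import math


def theta2(z):
    """Functional coefficient of DGP 2: exp(-z^2) * (z^2 + z)."""
    return math.exp(-z * z) * (z * z + z)


# Grid on [-5, 5] with step 0.001; theta2 is negligible outside this range.
grid = [-5.0 + 0.001 * k for k in range(10001)]
z_min = min(grid, key=theta2)
z_max = max(grid, key=theta2)

print("argmin", round(z_min, 2), "min", round(theta2(z_min), 2))
print("argmax", round(z_max, 2), "max", round(theta2(z_max), 2))
```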


DGP 3. (functional coefficient on lagged variable):

\[ y_{it} = 0.8\sin(z_{it})\,y_{i,t-1} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.3} \]

so the functional coefficient is $\theta(z) = 0.8\sin(z)$, and the generation of $x_{it}$, $z_{it}$, $\eta_i$ and $\varepsilon_{it}$ is the same as in (5.1).

DGP 4. (functional coefficients on both lagged and exogenous variables):

\[ y_{it} = \left(0.5 - e^{-2z_{1,it}^2}\right) y_{i,t-1} + \left(1.5 + \phi(z_{2,it})\right) d_{it} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.4} \]

so the functional coefficients are $\theta_1(z) = 0.5 - e^{-10z^2}$ and $\theta_2(z) = 1.5 + \phi(z)$, where $\phi(\cdot)$ is the standard normal PDF. We assume that the generation of $\varepsilon_{it}$, $\eta_i$ and $x_{it}$ is the same as in DGP 1, and

\[ z_{1,it} = \rho_{z_1,i}z_{1,i,t-1} + 0.5\eta_i + \varepsilon_{z_1,it}, \]

\[ z_{2,it} = \eta_i + \varepsilon_{z_2,it}, \]

\[ d_{it} = \rho_{d,i}d_{i,t-1} + 0.5\eta_i + \varepsilon_{d,it}, \]

with $\varepsilon_{z_1,it}$, $\varepsilon_{d,it}$ being IID $N(0, 1)$ and $\varepsilon_{z_2,it}$ being IID $\chi^2(1)$ across both $i$ and $t$ and independent of $\eta_i$ and $\varepsilon_{it}$. Also, $\rho_{z_1,i}$ and $\rho_{d,i}$ are independent draws from $U(0.2, 0.8)$ for $i = 1, 2, \ldots, N$.

DGP 5. (unbalanced model with functional coefficient on lagged variable):

For this DGP, we assume that the generation of $\theta(z)$, $y_{it}$, $z_{it}$ and $x_{it}$ is the same as in DGP (5.1), but the panel is unbalanced in the sense that $T_i \neq T_j$ for some $i \neq j$. To this end, we consider two cases for the generation of $T_i$ ($i = 1, \ldots, N$). In the first case, we assume that the $T_i$ are integers randomly drawn from $[5, 10]$, i.e., the time dimension of the panel is fixed. In the second case, we assume that the $T_i$ are integers randomly drawn from $[40, 50]$, i.e., we consider a relatively large panel.

DGP 6. (linear dynamic panels with constant lag coefficient)

For this DGP, we consider a linear dynamic panel with a constant lag coefficient, and we test the hypothesis that the lag coefficient is indeed constant. The prototype model (under the null hypothesis that the lag coefficient is constant) is given by

\[ y_{it} = 0.5y_{i,t-1} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.5} \]

and we assume that the model under the alternative hypothesis is given by

\[ y_{it} = \theta(z_{it})\,y_{i,t-1} + 0.5x_{it} + \eta_i + \varepsilon_{it}, \tag{5.6} \]

where $\theta(z) = 0.5 - e^{-2z^2}$ as in (5.1) or $\theta(z) = e^{-z^2}(z^2 + z)$ as in (5.2). The purpose of DGP (5.6) is to verify the power of the proposed test statistic. The generation of $x_{it}$, $\eta_i$ and $z_{it}$ is the same as in DGP (5.1).

For the $(N,T)$ pairs, we consider $N = 100, 200$ and $T = 5, 10, 50$ for DGPs (5.1)-(5.4), and we set the number of replications to 1000 for the estimation.

In the estimation, we consider the sieve estimators $\hat{\theta}_{sieve}(\cdot)$ and $\hat{\beta}_{sieve}$ of $\theta(\cdot)$ and $\beta$, respectively. Since the coefficient of $y_{i,t-1}$ is assumed to be functional, we follow suggestion (i) of Remark 2.1 for the choice of IVs. For the sieve estimates, we choose the cubic B-spline as the sieve basis and include the tensor product terms to approximate the function $\theta(\cdot)$. Along each dimension of the covariate in $\theta(\cdot)$, we let $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$ and choose $L_c$ sieve approximating terms, where $\lfloor a \rfloor$ denotes the integer part of $a$ and $c = 1, 2, 3$. For the sieve estimates $\hat{\theta}_{sieve}(\cdot)$ for DGPs (5.1)-(5.3) and $\hat{\theta}_{1,sieve}(\cdot)$ and $\hat{\theta}_{2,sieve}(\cdot)$ for DGP (5.4), we calculate the median bias (which is computed as the difference of the medians of $\theta(\cdot)$ and $\hat{\theta}_{sieve}(\cdot)$) and the RMSE (which is computed as the square root of the pointwise difference of $\theta(\cdot)$ and $\hat{\theta}_{sieve}(\cdot)$). For the estimation of $\hat{\beta}_{sieve}$ for DGPs (5.1)-(5.4), we calculate the bias and RMSE around the true value for comparison. Finally, for the specification test of a constant lag coefficient in DGP (5.5), we use 500 replications and 300 bootstrap resamples for the empirical size and power study. The simulation results are summarized in Tables 1-5 and Figs. 1-6.
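For concreteness, the rule $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$ stated above can be tabulated for the sample sizes used in the simulations. The function name `sieve_terms` is hypothetical; the formula is the one given in the text.

```python
import math


def sieve_terms(N, T, c):
    """Number of sieve terms per dimension: c * floor((N*T)^(1/5)) + 1."""
    return c * math.floor((N * T) ** (1.0 / 5.0)) + 1


for N in (100, 200):
    for T in (5, 10, 50):
        print(N, T, [sieve_terms(N, T, c) for c in (1, 2, 3)])
```

For example, with $N = 100$ and $T = 5$ we have $NT = 500$ and $\lfloor 500^{1/5} \rfloor = 3$, so the three choices are 4, 7 and 10 terms.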

Several interesting findings can be observed from the simulation results. On the one hand, when the panel is balanced (DGPs 1-4), Tables 1-2 report the median bias (m.b) and root mean squared error (RMSE) of various estimates of $\theta(\cdot)$ (or $\theta_1(\cdot)$ and $\theta_2(\cdot)$), and Table 4 reports the bias and RMSE of various estimates of $\beta$. For all DGPs under investigation, the RMSEs of both the nonparametric part and the parametric part decrease as either $N$ or $T$ increases and are roughly halved as $N$ is quadrupled, regardless of the choice of the number of approximation terms $L_1$, $L_2$ and $L_3$. This observation is valid whether $N$ is large or $T$ is large. However, it should be noted that the choice of the number of approximation terms $L_1$, $L_2$ and $L_3$ does have an impact on the sieve estimation of the nonparametric part; namely, compared with using a small number of sieve approximation terms, estimation based on a larger number of sieve approximation terms has a relatively smaller RMSE. The impact is almost negligible for the sieve estimation of the parametric part of the model. These findings suggest that the sieve 2SLS estimation for balanced dynamic panels is indeed consistent.

On the other hand, when the panel is unbalanced (DGP 5), we can still observe from the simulation results that the RMSE of the sieve estimation of both $\theta(\cdot)$ and $\beta$ decreases as either $N$ or $T$ increases, regardless of whether $N$ is large or $T$ is large, which shows the effectiveness of applying the sieve 2SLS estimation to unbalanced panels.

Turning to the specification test, Table 5 gives the empirical rejection frequencies for our proposed test. From this table, we can see that the empirical sizes behave reasonably well for DGP (5.5), and they are close to the nominal levels 1%, 5% and 10% regardless of the number of sieve approximation terms. The powers are reasonably good and increase quite fast as either $N$ or $T$ increases. These empirical sizes and powers suggest that the proposed test is applicable to testing whether the lag coefficient is constant in linear dynamic panels.

Finally, we compare the estimated functional coefficient with the true functional coefficient for DGPs 1-5 in Figs. 1-6. The estimated functional coefficient using sieve 2SLS estimation fits the true functional coefficient quite well in all simulation designs, regardless of whether the panel is balanced or not and whether $T$ is large or fixed. In most cases, the shape of the estimated functional coefficient almost coincides with that of the true functional coefficient, which illustrates the validity of using the sieve approximation for the unknown functional coefficient.

In all, we can conclude that the simulation results confirm our theoretical findings in the paper and that the sieve 2SLS estimation performs reasonably well.


Table 1: Simulation results for sieve estimates $\hat{\theta}(\cdot)$ for DGPs (5.1)-(5.3)

                       L1                L2                L3
DGP  T    N          m.b     RMSE      m.b     RMSE      m.b     RMSE
1    5    100     0.1939   0.1123   0.0999   0.0845   0.1324   0.1194
          200     0.0487   0.0372   0.0758   0.0633   0.0985   0.0861
     10   100     0.0350   0.0348   0.0467   0.0402   0.0660   0.0551
          200     0.0053   0.0230   0.0245   0.0197   0.0337   0.0280
     50   100     0.0084   0.0059   0.0086   0.0077   0.0115   0.0112
          200     0.0026   0.0023   0.0058   0.0053   0.0083   0.0073
2    5    100     0.0210   0.0898   0.0590   0.0654   0.0820   0.0969
          200     0.0179   0.0281   0.0386   0.0431   0.0502   0.0626
     10   100     0.0173   0.0212   0.0275   0.0285   0.0380   0.0416
          200     0.0109   0.0143   0.0131   0.0187   0.0203   0.0271
     50   100     0.0036   0.0076   0.0063   0.0072   0.0091   0.0106
          200     0.0033   0.0037   0.0044   0.0051   0.0064   0.0071
3    5    100     0.0144   0.0299   0.0200   0.0480   0.0413   0.0714
          200     0.0069   0.0124   0.0124   0.0275   0.0174   0.0398
     10   100     0.0057   0.0104   0.0120   0.0232   0.0142   0.0328
          200     0.0027   0.0046   0.0059   0.0106   0.0114   0.0160
     50   100     0.0005   0.0028   0.0014   0.0052   0.0040   0.0080
          200    -0.0004   0.0014   0.0013   0.0029   0.0025   0.0044

Notes: 1. "m.b" refers to the median bias.
2. "L1", "L2" and "L3" refer to sieve estimation using $L_c$ sieve terms in the approximation, where $L_c = c\lfloor (NT)^{1/5} \rfloor$ for $c = 1, 2, 3$.


Table 2: Simulation results for sieve estimates $\hat{\theta}_1(\cdot)$ and $\hat{\theta}_2(\cdot)$ for DGP (5.4)

                         L1                L2                L3
        T    N         m.b     RMSE      m.b     RMSE      m.b     RMSE
θ1(·)   5    100    0.2616   0.1888   0.0482   0.0484   0.0629   0.0557
             200    0.0343   0.0286   0.0262   0.0234   0.0351   0.0290
        10   100    0.0225   0.0179   0.0138   0.0191   0.0192   0.0263
             200   -0.0056   0.0119   0.0113   0.0101   0.0170   0.0143
        50   100   -0.0007   0.0052   0.0034   0.0033   0.0055   0.0052
             200   -0.0004   0.0017   0.0025   0.0027   0.0035   0.0040
θ2(·)   5    100    0.0483   0.0711   0.0407   0.0359   0.0500   0.0385
             200    0.0144   0.0136   0.0123   0.0126   0.0182   0.0160
        10   100    0.0212   0.0123   0.0089   0.0119   0.0092   0.0148
             200    0.0012   0.0101   0.0051   0.0054   0.0113   0.0080
        50   100    0.0021   0.0023   0.0019   0.0023   0.0038   0.0032
             200   0.00.13   0.0013   0.0007   0.0017   0.0003   0.0023

See notes of Table 1.

Table 3: Simulation results for sieve estimates $\hat{\theta}(\cdot)$ and $\hat{\beta}$ for DGP 5 with unbalanced panel

                               L1                L2                L3
        Ti         N         m.b     RMSE      m.b     RMSE      m.b     RMSE
θ(·)    [5, 10]    100    0.0250   0.0455   0.0347   0.0362   0.0435   0.0362
                   200    0.0106   0.0216   0.0264   0.0233   0.0377   0.0326
        [40, 50]   100    0.0066   0.0060   0.0093   0.0079   0.0129   0.0116
                   200    0.0005   0.0035   0.0046   0.0058   0.0064   0.0082
β       [5, 10]    100   -0.0011   0.0556  -0.0067   0.0563  -0.0079   0.0564
                   200   -0.0024   0.0367  -0.0018   0.0366  -0.0029   0.0366
        [40, 50]   100   -0.0004   0.0207  -0.0009   0.0209  -0.0016   0.0209
                   200   -0.0011   0.0143  -0.0014   0.0144  -0.0018   0.0144

See notes of Table 1.


Table 4: Simulation results for $\hat{\beta}$ for DGPs (5.1)-(5.4)

                       L1                L2                L3
DGP  T    N         Bias     RMSE     Bias     RMSE     Bias     RMSE
1    5    100     0.0027   0.0796  -0.0089   0.0813  -0.0118   0.0823
          200    -0.0043   0.0521  -0.0088   0.0523  -0.0133   0.0544
     10   100    -0.0096   0.0502  -0.0103   0.0502  -0.0140   0.0502
          200    -0.0017   0.0338  -0.0040   0.0339  -0.0042   0.0342
     50   100     0.0006   0.0198   0.0001   0.0196  -0.0001   0.0196
          200     0.0004   0.0133   0.0002   0.0133   0.0000   0.0133
2    5    100     0.0093   0.0808  -0.0068   0.0815  -0.0108   0.0823
          200    -0.0027   0.0522  -0.0055   0.0523  -0.0091   0.0537
     10   100    -0.0028   0.0493  -0.0060   0.0491  -0.0081   0.0497
          200    -0.0007   0.0335  -0.0019   0.0337  -0.0026   0.0336
     50   100     -0.007   0.0197   0.0003   0.0196   0.0003   0.0196
          200     0.0004   0.0133   0.0003   0.0133  -0.0003   0.0133
3    5    100    -0.0031   0.0791  -0.0073   0.0775  -0.0103   0.0768
          200    -0.0003   0.0541  -0.0015   0.0544  -0.0025   0.0545
     10   100     0.0014   0.0503  -0.0002   0.0499  -0.0011   0.0499
          200    -0.0005   0.0341  -0.0012   0.0342  -0.0017   0.0342
     50   100    -0.0005   0.0194  -0.0008   0.0194   0.0012   0.0194
          200    -0.0004   0.0133  -0.0005   0.0133  -0.0006   0.0133
4    5    100     0.0440   0.0933   0.0066   0.0875  -0.0016   0.0894
          200    -0.0148   0.0545  -0.0074   0.0532  -0.0106   0.0543
     10   100    -0.0003   0.0493  -0.0032   0.0499  -0.0032   0.0497
          200    -0.0046   0.0338  -0.0003   0.0335  -0.0003   0.0338
     50   100    -0.0004   0.0200  -0.0001   0.0200  -0.0001   0.0200
          200    -0.0005   0.0135  -0.0007   0.0135  -0.0008   0.0136

Notes: The true value of $\beta$ is $\beta = 0.5$. See also Note 2 of Table 1.


Table 5: Empirical size and power for specification test of DGP (5.5)

Size study for $H_0: \theta(z) = 0.5$

                     L1                      L2                      L3
T    N         1%      5%     10%      1%      5%     10%      1%      5%     10%
5    100     0.4%      2%    4.4%    0.4%    2.4%    5.4%    0.4%    1.2%      4%
     200     0.4%    4.4%    7.8%    0.6%    2.4%    7.2%    0.1%    2.8%      7%
10   100     0.4%    3.6%     10%      1%    4.4%    9.2%    0.8%    5.4%    9.2%
     200     0.8%    4.6%    8.8%    0.4%    3.4%    8.8%    0.8%    3.6%    9.6%
50   100     1.4%    6.6%   12.2%    1.4%    6.6%   11.2%    1.4%      7%   11.2%
     200       1%      5%    9.8%    0.8%    4.2%      9%    0.8%    4.6%      9%

Power study for $H_1: \theta(z) = 0.5 - e^{-2z^2}$

                     L1                      L2                      L3
T    N         1%      5%     10%      1%      5%     10%      1%      5%     10%
5    100       7%   32.6%   53.6%    5.8%   24.4%   46.4%      3%   13.8%   30.6%
     200    24.2%   65.4%   84.6%   21.6%   63.8%   85.4%   12.2%   46.2%   75.2%
10   100    97.8%   99.6%    100%   97.4%   99.8%    100%   91.8%   99.2%   99.8%
     200     100%    100%    100%    100%    100%    100%   99.8%    100%    100%
50   100     100%    100%    100%    100%    100%    100%    100%    100%    100%
     200     100%    100%    100%    100%    100%    100%    100%    100%    100%

Power study for $H_1: \theta(z) = e^{-z^2}(z^2 + z)$

                     L1                      L2                      L3
T    N         1%      5%     10%      1%      5%     10%      1%      5%     10%
5    100     7.4%   27.6%   54.8%    3.8%   19.6%   41.6%    1.4%     10%   25.4%
     200      32%   75.2%   89.8%   17.2%   59.8%   82.8%   10.2%   41.4%   71.8%
10   100    97.6%    100%    100%     98%    100%    100%     91%   99.6%    100%
     200     100%    100%    100%    100%    100%    100%    100%    100%    100%
50   100     100%    100%    100%    100%    100%    100%    100%    100%    100%
     200     100%    100%    100%    100%    100%    100%    100%    100%    100%

See notes of Table 1.


Fig 1. Sieve approximation of $\theta(z) = 0.5 - e^{-10z^2}$ for DGP (5.1)

[Figure: six panels for (N, T) = (100, 5), (200, 5), (100, 10), (200, 10), (100, 50), (200, 50); horizontal axis z from -3 to 3, vertical axis from -0.4 to 0.6; curves: L1, L2, L3, True.]

Note: "L1" refers to sieve approximation using L1 terms, "L2" refers to sieve approximation using L2 terms, "L3" refers to sieve approximation using L3 terms, and "True" refers to the true function.


Fig 2. Sieve approximation of $\theta(z) = e^{-z^2}(z^2 + z)$ for DGP (5.2)

[Figure: six panels for (N, T) = (100, 5), (200, 5), (100, 10), (200, 10), (100, 50), (200, 50); horizontal axis z from -3 to 3, vertical axis from -0.4 to 1; curves: L1, L2, L3, True.]

See note of Fig 1.


Fig 3. Sieve approximation of $\theta(z) = 0.8\sin(z)$ for DGP (5.3)

[Figure: six panels for (N, T) = (100, 5), (200, 5), (100, 10), (200, 10), (100, 50), (200, 50); horizontal axis z from -3 to 3, vertical axis from -1 to 1; curves: L1, L2, L3, True.]

See note of Fig 1.


Fig 4. Sieve approximation of $\theta_1(z) = 0.5 - e^{-10z^2}$ for DGP (5.4)

[Figure: six panels for (N, T) = (100, 5), (200, 5), (100, 10), (200, 10), (100, 50), (200, 50); horizontal axis z from -3 to 3, vertical axis from -0.8 to 0.6 in the first panel and from -0.4 to 0.6 in the others; curves: L1, L2, L3, True.]

See note of Fig 1.


Fig 5. Sieve approximation of $\theta_2(z) = 1.5 + \phi(z)$ for DGP (5.4)

[Figure: six panels for (N, T) = (100, 5), (200, 5), (100, 10), (200, 10), (100, 50), (200, 50); horizontal axis z from -1.5 to 3, vertical axis from 1.4 to 2 in the first panel and from 1.5 to 2 in the others; curves: L1, L2, L3, True.]

See note of Fig 1.


Fig 6. Sieve approximation of $\theta(z) = 0.5 - e^{-10z^2}$ for DGP 5 with unbalanced panel

[Figure panels: N = 100, T ∼ U[5, 10]; N = 100, T ∼ U[40, 50]; N = 200, T ∼ U[5, 10]; N = 200, T ∼ U[40, 50]. Legend: L1, L2, L3, True.]

See note of Fig 1.


6 Empirical Application to Democracy and Income

In this section, we apply the sieve 2SLS estimation to the study of democracy and income. The relationship between income per capita and democracy has recently attracted considerable attention; see, for example, Acemoglu et al. (2008), Cervellati et al. (2014), Robinson (2006) and the references therein. This literature has not reached a conclusive finding on the relationship between democracy and income. For instance, Acemoglu et al. (2008) find a positive effect between changes in income and democracy, while Cervellati et al. (2014) find heterogeneous effects of income on democracy: the effect is negative for former colonies but positive for non-colonies. However, all these studies assume a linear panel framework, i.e., the impact of income on democracy is linear and constant over time and across countries. It remains unclear what the impact of income on democracy is if the relation is nonlinear and depends on the level of income per capita. In this section, we reinvestigate this topic using our proposed partially linear functional-coefficient dynamic panel model, which allows an unknown functional coefficient to capture a general nonlinear relation between income and democracy.

We consider the five-year panel dataset over the period 1960-1995 (see footnote 9). The data set we adopt in this section is the same as in Acemoglu et al. (2008) and Cervellati et al. (2014). Let democracy_{it} be the measure of democracy (the Freedom House Political Rights Index) for country i in the t-th five-year period. In order to capture persistence in democracy and potential mean-reverting dynamics (for instance, the tendency of the democracy score to return to some equilibrium value for the country; see Acemoglu et al. (2008)), the lagged value democracy_{i,t-1} is included in the model. Let ln gdp_{it} be the log GDP per capita, which is the main variable of interest and is used to measure the effect of income per capita on democracy. Other potential covariates are included in the vector x_{it}, which contains log population (the log of total population in thousands) and average education level (average years of schooling). Summary statistics for the variables are listed in Table 6.

Footnote 9. After deletion of missing values, the sample used in this section contains 682 observations on 92 countries; the number of five-year periods per country varies from 4 to 8, i.e., the panel is unbalanced.


Table 6. Descriptive statistics (92 countries)

Panel (5-year panel)                   Mean      Std. deviation   Min       Max
Freedom House measure of democracy     0.6053    0.3485           0         1
Log GDP per Capita_{t-1}               8.2850    1.0102           5.7739    10.2544
Log population_{t-1}                   9.0888    1.5510           5.1704    14.0018
Education_{t-1}                        4.5267    2.8578           0.042     12.179

When the impact of income on democracy is assumed to be linear, as in Acemoglu et al. (2008) and Cervellati et al. (2014), different conclusions are drawn. To allow for a nonlinear and interactive effect of income on democracy, we consider the following partially linear functional-coefficient dynamic panel model with fixed effects:

$$\text{democracy}_{it} = m(\ln gdp_{i,t-1})\,\text{democracy}_{i,t-1} + \gamma \ln gdp_{i,t-1} + x_{i,t-1}'\beta + \eta_i + u_{it}, \qquad (6.1)$$

where $m(\cdot)$ is an unknown function that captures the nonlinearity of, and the interaction between, democracy and income.
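To make the structure of (6.1) concrete, the following is a minimal sketch of how a functional coefficient can be approximated by a sieve and estimated by a generic 2SLS step. It is only an illustration under stated assumptions: the power-series basis, the helper names `power_sieve`, `functional_coef_design` and `tsls`, and the simulated noise-free data are all hypothetical, and the sketch omits the first-differencing that removes $\eta_i$ as well as the instrument choice of Remark 2.1.

```python
import numpy as np

def power_sieve(z, L):
    """Return the n x L matrix [1, z, ..., z^(L-1)] (an assumed power-series sieve)."""
    z = np.asarray(z, dtype=float)
    return np.vander(z, N=L, increasing=True)

def functional_coef_design(z, d, L):
    """Regressors p_l(z) * d, so that m(z) * d is approximated by (p(z)' g) * d."""
    return power_sieve(z, L) * np.asarray(d, dtype=float)[:, None]

def tsls(y, X, W):
    """Generic 2SLS: project X on the instruments W, then regress y on the fitted X."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Toy illustration with simulated, noise-free data (not the paper's data set).
rng = np.random.default_rng(0)
n, L = 500, 3
z = rng.uniform(-1.0, 1.0, n)         # plays the role of ln gdp_{i,t-1}
d = rng.normal(size=n)                # plays the role of democracy_{i,t-1}
x = rng.normal(size=n)                # a single linear control
g_true = np.array([0.5, -0.2, 0.3])   # m(z) = 0.5 - 0.2 z + 0.3 z^2
beta_true = 1.5

H = functional_coef_design(z, d, L)
y = H @ g_true + beta_true * x
R = np.column_stack([H, x])           # all regressors
coef = tsls(y, R, R)                  # instruments = regressors, so 2SLS equals OLS here

# Estimated functional coefficient at z = 0 and z = 0.5.
m_hat = power_sieve(np.array([0.0, 0.5]), L) @ coef[:L]
```

In an application the instrument matrix passed to `tsls` would differ from the regressor matrix; the toy call uses the same matrix only so that the recovered coefficients can be checked exactly.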

Table 7 presents the sieve estimation results for the parametric part of model (6.1) based on the 2SLS estimation procedure. As in the simulation, we use three different numbers of approximation terms in the estimation, and we follow the procedure in Remark 2.1 to choose instruments for the sieve 2SLS estimation. We report three semiparametric estimates using $L_1$, $L_2$ and $L_3$ sieve approximation terms, where $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$ for $c = 1, 2, 3$, the same rule as in the simulation. Robust standard errors are provided in parentheses in Table 7. The estimation results show that Education has a positive and significant effect on Democracy for all three choices of the number of sieve terms, while the effect of Population is positive and is significant for $L_1$ and $L_2$. These positive effects contrast with the result of Acemoglu et al. (2008, Table 4, p. 821), who find the effects to be insignificant. We also calculate the correlation between Democracy and lagged Education, which is 0.5587, and the correlation between Democracy and Population, which is 0.0045. Both correlation coefficients are consistent with positive effects of Education and Population on Democracy. Even after lagged Log GDP per Capita enters through the functional coefficient $m(\ln gdp_{i,t-1})$, its linear effect on Democracy is positive, which is consistent with the findings in the literature.
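As a small worked example of the rule $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$, the sketch below evaluates it for the three values of c. It assumes, as an illustration only, that NT is replaced by the total number of observations in the unbalanced panel (682, from footnote 9); the function name `sieve_terms` is hypothetical.

```python
import math

def sieve_terms(n_obs, c, exponent=1 / 5):
    """Number of sieve terms L_c = c * floor(n_obs ** exponent) + 1."""
    return c * math.floor(n_obs ** exponent) + 1

n_obs = 682  # assumption: total observations stand in for NT in the unbalanced panel
terms = [sieve_terms(n_obs, c) for c in (1, 2, 3)]
print(terms)  # [4, 7, 10], since 3**5 = 243 <= 682 < 1024 = 4**5
```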


Table 7. Estimation results of the parametric part for model (6.1)

Dependent variable: democracy

Variable                      L1           L2           L3
Log GDP per Capita_{t-1}      0.0013       0.0019*      0.0024**
                              (0.0010)     (0.0011)     (0.0011)
Education_{t-1}               0.0375***    0.0427***    0.0504***
                              (0.0048)     (0.0057)     (0.0069)
Population_{t-1}              0.0191**     0.0155*      0.0131
                              (0.0072)     (0.0081)     (0.0093)

Notes: 1. Robust standard errors are given in parentheses.

2. "***", "**" and "*" denote significance at the 1%, 5% and 10% levels, respectively.

3. "L1", "L2" and "L3" refer to using $L_c$ sieve terms in the approximation, which is determined by $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$ for c = 1, 2, 3.

4. The correlation between democracy and lagged Education is 0.5587, and the correlation between democracy and Population is 0.0045.

We now turn to the estimation of the functional coefficient $m(\cdot)$ in (6.1). Following the sieve 2SLS estimation procedure in the main text, we consider three semiparametric estimates of $m(\cdot)$ using $L_1$, $L_2$ and $L_3$ sieve approximation terms as above. The sieve estimates of $m(\cdot)$ are plotted in Fig 7, from which we observe a highly nonlinear relationship between Democracy_t and Democracy_{t-1} that changes with the level of Log GDP per Capita. Namely, the variation of the effect is much larger when Log GDP per Capita is less than 9, whilst the effect becomes relatively smooth when Log GDP per Capita is greater than 9. We also calculate the correlation coefficient between Democracy_t and Democracy_{t-1}, which is 0.6749, and the correlation coefficient between Democracy_t and Log GDP per Capita_{t-1}, which is 0.5717. These correlation coefficients suggest that income has a positive effect on democracy, which is consistent with the findings of Acemoglu et al. (2008). However, the effect of income on democracy is highly nonlinear in the sense that it varies with the level of Log GDP per Capita. More specifically, Fig 7 shows that when Log GDP per Capita is less than 9, the functional coefficient $m(\cdot)$ fluctuates considerably, while it becomes flatter when Log GDP per Capita is greater than 9. This suggests that when income is high enough (i.e., Log GDP per Capita is greater than 9), the degree of democracy tends to remain the same as in the previous period, and that different levels of income have different effects on the degree of democracy.

Fig 7. Plot of estimated $m(\cdot)$ of (6.1)

[Figure: horizontal axis, Log GDP per Capita_{t-1}; vertical axis, sieve estimation of the functional coefficient m(.). Legend: sieve estimation using L1 terms; sieve estimation using L2 terms; sieve estimation using L3 terms.]

Notes: "L1", "L2" and "L3" refer to using $L_c$ sieve terms in the approximation, which is determined by $L_c = c\lfloor (NT)^{1/5} \rfloor + 1$ for c = 1, 2, 3.

In model (6.1), the functional coefficient is placed on the lagged dependent variable. We now test whether this functional-coefficient specification is needed. Based on our proposed testing procedure, the null hypothesis of a constant coefficient can be stated as follows:

$$H_0 : m(\ln gdp_{i,t-1}) = \rho \ \text{ for some } \rho \in \mathbb{R},$$

and the alternative hypothesis is $H_1$: not $H_0$.

Under the null, model (6.1) does not have a functional coefficient and the effect of income on democracy is constant (it is measured by the coefficient of $\ln gdp_{i,t-1}$ in (6.1)). The testing results are summarized in Table 8, where the bootstrap p-values for our test statistic $J_{NT}$, based on different numbers of sieve approximation terms, are computed from 1000 bootstrap resamples. We reject $H_0$ at the 5% significance level for all three choices. We therefore conclude that the nonparametric component $m(\cdot)$ is not constant in Log GDP per Capita, i.e., model (6.1) is nonlinear, at the 5% significance level.

Table 8. Bootstrap p-values based on 1000 bootstrap resamples

Test statistic J_NT             L1       L2       L3
Linear function (H0 vs H1)      0.024    0.018    0.038

Notes: "L1", "L2" and "L3" refer to using $L_c$ sieve terms in the approximation, which is determined by $L_c = c\lfloor (NT)^{1/4} \rfloor + 1$ for c = 1, 2, 3.
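For readers unfamiliar with bootstrap p-values, the sketch below shows one common way to compute them from a test statistic and its bootstrap replicates. It assumes an upper-tailed test (large values of the statistic count against the null); the function name `bootstrap_p_value` and the numbers are purely illustrative and are not taken from the paper.

```python
import numpy as np

def bootstrap_p_value(stat, boot_stats):
    """Share of bootstrap statistics at least as large as the observed one."""
    boot_stats = np.asarray(boot_stats, dtype=float)
    return float(np.mean(boot_stats >= stat))

# Illustrative inputs: 1000 artificial "bootstrap statistics" 0, 1, ..., 999.
boot_stats = np.arange(1000)
p_value = bootstrap_p_value(950, boot_stats)  # 50 of 1000 replicates are >= 950
reject_at_5_percent = p_value < 0.05
```

With these artificial inputs the p-value equals 0.05, so the strict comparison does not reject at the 5% level; whether the boundary case counts as a rejection is a convention the user has to fix.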

7 Conclusion

In this paper, we study sieve 2SLS estimation of partially linear functional-coefficient dynamic panel data models, in which the effects of some covariates on the dependent variable vary nonparametrically with a set of low-dimensional conditioning variables. The asymptotic properties of both the parametric and nonparametric components are established when N and T tend to infinity jointly or when only N goes to infinity. We also propose a specification test for the null of constant slopes in the nonparametric part of the model. Monte Carlo simulations show that our sieve 2SLS estimators and specification test perform remarkably well in finite samples. We apply our method to study the effect of income on democracy and find strong evidence of a nonconstant effect of income on democracy.

There are several interesting topics for further research. First, we do not address the choice of optimal IVs in this paper. Since IVs are needed for the estimation of both the parametric and nonparametric components, the choice of IVs for each component requires separate consideration. Second, as shown in the simulation, the number of sieve approximation terms affects the estimation of the nonparametric part of the model, and how to choose the optimal number of sieve approximation terms remains an open question. We leave these for future research.

References

[1] Acemoglu, D., S. Johnson, J. Robinson and P. Yared, 2008, Income and democracy, The American Economic Review 98, 808-842.

[2] Ahn, S.C., and P. Schmidt, 1995, Efficient estimation of models for dynamic panel data, Journal of Econometrics 68, 5-27.


[3] Alvarez, J., and M. Arellano, 2003, The time series and cross-section asymptotics of dynamic panel data estimators, Econometrica 71, 1795-1843.

[4] An, Y., C. Hsiao and D. Li, 2016, Semiparametric estimation of partially linear varying coefficient panel data models, Advances in Econometrics Volume 36, Essays in Honor of Aman Ullah, Emerald Group Publishing Limited, 47-65.

[5] Anderson, T.W., and C. Hsiao, 1981, Estimation of dynamic models with error components, Journal of the American Statistical Association 76, 598-606.

[6] Anderson, T.W., and C. Hsiao, 1982, Formulation and estimation of dynamic models using panel data, Journal of Econometrics 18, 47-82.

[7] Arellano, M., and S. Bond, 1991, Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Review of Economic Studies 58, 277-297.

[8] Baglan, D., 2010, Efficient estimation of a partially linear dynamic panel data model with fixed effects: an application to unemployment dynamics in the U.S., Working Paper, Department of Economics, Howard University, Washington DC 20059.

[9] Balestra, P., and M. Nerlove, 1966, Pooling cross-section and time series data in the estimation of a dynamic model: the demand for natural gas, Econometrica 34, 585-612.

[10] Baltagi, B.H., and Q. Li, 2002, On instrumental variable estimation of semiparametric dynamic panel data models, Economics Letters 76, 1-9.

[11] Cai, Z., 2010, Functional coefficient models for economic and financial data, in: Ferraty, F., and Y. Romain (eds), Oxford Handbook of Functional Data Analysis, Oxford University Press, Oxford, 166-186.

[12] Cai, Z., Y. Fang, M. Lin and J. Su, 2010, Semiparametric varying-coefficient instrumental variables models, Working paper.

[13] Cai, Z., and Q. Li, 2008, Nonparametric estimation of varying coefficient dynamic panel data models, Econometric Theory 24, 1321-1342.

[14] Cai, Z., L. Chen and Y. Fang, 2015, Semiparametric estimation of partially varying-coefficient dynamic panel data models, Econometric Reviews 34, 695-719.

[15] Cai, Z., and Y. Hong, 2009, Some recent developments in nonparametric finance, Advances in Econometrics 25, 379-432.

[16] Card, D., 2001, Estimating the return to schooling: progress on some persistent econometric problems, Econometrica 69, 1127-1160.

[17] Cervellati, M., F. Jung, U. Sunde and T. Vischer, 2014, Income and democracy: comment, The American Economic Review 104, 707-719.

[18] Chen, X., 2007, Large sample sieve estimation of semi-nonparametric models, in: J.J. Heckman and E. Leamer (eds), Handbook of Econometrics, Vol. 6, pp. 5549-5632, North Holland, Amsterdam.

[19] Chen, X., H. Hong and E. Tamer, 2005, Measurement error models with auxiliary data, Review of Economic Studies 72, 343-366.


[20] Chen, X., and X. Shen, 1998, Sieve extremum estimators for weakly dependent data, Econometrica 66, 289-314.

[21] Chen, R., and R. Tsay, 1993, Functional-coefficient autoregressive models, Journal of the American Statistical Association 88, 298-308.

[22] Chen, R., and L.-M. Liu, 2001, Functional coefficient autoregressive models: estimation and tests of hypotheses, Journal of Time Series Analysis 22, 151-173.

[23] de Jong, P., 1987, A central limit theorem for generalized quadratic forms, Probability Theory and Related Fields 75, 261-277.

[24] de Jong, R.M., 2002, A note on "Convergence rates and asymptotic normality for series estimators": uniform convergence rates, Journal of Econometrics 111, 1-9.

[25] Dechevsky, L., and S. Penev, 1997, On shape-preserving probabilistic wavelet approximators, Stochastic Analysis and Applications 15, 187-215.

[26] Feng, G., J. Gao, B. Peng and X. Zhang, 2017, A varying-coefficient panel data model with fixed effects: theory and an application to U.S. commercial banks, Journal of Econometrics 196, 68-82.

[27] Gao, J., 2007, Nonlinear Time Series: Semiparametric and Nonparametric Methods, Chapman & Hall/CRC Press.

[28] Hansen, B., 2000, Testing for structural change in conditional models, Journal of Econometrics 97, 93-115.

[29] Härdle, W., and E. Mammen, 1993, Comparing nonparametric versus parametric regression fits, The Annals of Statistics 21, 1926-1947.

[30] Hong, Y., and H. White, 1995, Consistent specification testing via nonparametric series regressions, Econometrica 63, 1133-1159.

[31] Horn, R., and C. Johnson, 2012, Matrix Analysis, Cambridge University Press.

[32] Kottaridi, C., and T. Stengos, 2010, Foreign direct investment, human capital and nonlinearities in economic growth, Journal of Macroeconomics 32, 858-871.

[33] Lee, Y., 2014, Nonparametric estimation of dynamic panel models with fixed effects, Econometric Theory 30, 1315-1347.

[34] Li, Q., C. Huang, D. Li and T.-T. Fu, 2002, Semiparametric smooth coefficient models, Journal of Business & Economic Statistics 20, 412-422.

[35] Li, Q., and J. Racine, 2007, Nonparametric Econometrics: Theory and Practice, Princeton University Press.

[36] Li, Q., and A. Ullah, 1998, Estimating partially linear models with one-way error components, Econometric Reviews 17, 145-166.

[37] Newey, W.K., 1997, Convergence rates and asymptotic normality for series estimators, Journal of Econometrics 79, 147-168.

[38] Newey, W., and F. Windmeijer, 2009, Generalized method of moments with many weak moment conditions, Econometrica 77, 687-719.


[39] Okui, R., 2009, The optimal choice of moments in dynamic panel data models, Journal of Econometrics 151, 1-16.

[40] Robinson, J., 2006, Economic development and democracy, Annual Review of Political Science 9, 503-527.

[41] Rodriguez-Poo, J.M., and A. Soberon, 2015, Nonparametric estimation of fixed effects panel data varying coefficient models, Journal of Multivariate Analysis 133, 95-122.

[42] Sun, Y., R. Carroll and D. Li, 2009, Semiparametric estimation of fixed effects panel data with smooth coefficient models, Advances in Econometrics 25, 101-130.

[43] Schultz, T.P., 2003, Human capital, schooling and health, Economics and Human Biology 1, 207-221.

[44] Su, L., and S. Jin, 2012, Sieve estimation of panel data models with cross section dependence, Journal of Econometrics 169, 34-47.

[45] Su, L., and X. Lu, 2013, Nonparametric dynamic panel data models: kernel estimation and specification testing, Journal of Econometrics 176, 112-133.

[46] Su, L., and T. Hoshino, 2016, Sieve instrumental variable quantile regression estimation of functional coefficient models, Journal of Econometrics 191, 231-254.

[47] Su, L., I. Murtazashvili and A. Ullah, 2014, Local linear GMM estimation of functional coefficient IV models with application to the estimation of rate of return to schooling, Journal of Business & Economic Statistics 31, 184-207.

[48] Su, L., and Y. Zhang, 2016, Semiparametric estimation of partially linear dynamic panel data models with fixed effects, Advances in Econometrics Volume 36, Essays in Honor of Aman Ullah, Emerald Group Publishing Limited, 137-204.

[49] Tran, K.C., 2014, Nonparametric estimation of functional-coefficient partially linear dynamic panel data model with fixed effects, Economics Bulletin 34, 1751-1761.

[50] Zhou, B., J. You, Q. Xu and G. Chen, 2010, Weighted profile least squares estimation for a panel data varying-coefficient partially linear model, Chinese Annals of Mathematics, Series B, 31B(2), 247-272.


Appendix

This appendix contains the proofs of the main results in the paper. We first state some lemmas that are used in the proofs of the main results in Sections 3 and 4, and then prove the theorems. The proofs of all lemmas are straightforward (e.g., Su and Zhang (2016)), and we omit them to save space.

A Proof of Theorems in Section 3

Lemma A.1 Suppose Assumptions A.1-A.4 hold. Then

(i) $\|Q_{wx,NT} - Q_{wx}\| = O_p(\sqrt{L/NT})$;

(ii) $\|Q_{wh,NT} - Q_{wh}\| = O_p(L/\sqrt{NT})$;

(iii) $\|Q_{ww,NT} - Q_{ww}\| = O_p(L/\sqrt{NT})$;

(iv) $\|Q_{hh,NT} - Q_{hh}\| = O_p(L/\sqrt{NT})$;

(v) $\lambda_{\max}(Q_{3,NT}) = \lambda_{\max}(Q_3) + O_p(L/\sqrt{NT})$, and $\lambda_{\min}(Q_{3,NT}) \geq \lambda_{\min}(Q_3)/2$ w.p.a.1, where $Q_{3,NT} = \frac{1}{NT} H_W' M_{X_W} H_W$;

(vi) $\lambda_{\max}(Q_{5,NT}) = \lambda_{\max}(Q_5) + O_p(L/\sqrt{NT})$, where $Q_{5,NT} \equiv \frac{1}{NT} H_W H_W'$.

Lemma A.2 Under Assumptions A.1-A.4, $\frac{1}{NT} E(\|\Delta r\|^2) = O(L^{-2\gamma/p_z})$.

Lemma A.3 Under Assumptions A.1-A.4, $\frac{1}{NT} E(\|W'\Delta u\|^2) = O(L)$.

Lemma A.4 Suppose Assumptions A.1-A.4 hold. Then $\frac{1}{\sqrt{NT}} A W'\Delta u \to_d N(0, A\Omega A')$ for any nonstochastic matrix $A$ of dimension $n \times p_w$.

We now turn to the proofs of the theorems in Section 3.

Proof of Theorem 3.1

By the formula in (2.5), we have

$$\sqrt{NT}(\hat{\beta} - \beta) = Q_{1,NT}^{-1} \frac{1}{\sqrt{NT}} X_W' M_{H_W} P_W \Delta u + Q_{1,NT}^{-1} \sqrt{NT} R_{NT}, \qquad (A.1)$$

where $Q_{1,NT} \equiv \frac{1}{NT} X_W' M_{H_W} X_W$ and $R_{NT} \equiv \frac{1}{NT} X_W' M_{H_W} P_W \Delta r$. It then suffices to prove the theorem by showing that, as $(N,T) \to \infty$, (i) $Q_{1,NT} \to_p Q_1 > 0$, (ii) $R_{NT} = o_p((NT)^{-1/2})$, and (iii) $T_{NT} \equiv \frac{1}{\sqrt{NT}} X_W' M_{H_W} P_W \Delta u \to_d N(0, Q_2 \Omega Q_2')$.

To show (i), we note that

$$Q_{1,NT} = Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wx,NT} - Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wh,NT} \left(Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wh,NT}\right)^{-1} Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wx,NT}.$$

By Lemma A.1, one can readily show that $\|Q_{1,NT} - Q_1\| = O_p(L/\sqrt{NT})$, where $Q_1$ is defined in (3.1).


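In order to show (ii), by the repeated use of $\mathrm{tr}(AB) \leq \lambda_{\max}(A)\,\mathrm{tr}(B)$ for any positive semi-definite (p.s.d.) matrix $B$ and symmetric matrix $A$, we have

$$\begin{aligned}
\|R_{NT}\|^2 &= \frac{1}{N^2T^2} \mathrm{tr}\left(\Delta r' P_W M_{H_W} \Delta X' P_W \Delta X M_{H_W} P_W \Delta r\right) \\
&\leq \frac{1}{N^2T^2} \lambda_{\max}(P_W)\, \lambda_{\max}(\Delta X'\Delta X)\, \lambda_{\max}^2(M_{H_W})\, \lambda_{\max}^2(P_W)\, \|\Delta r\|^2 \\
&\leq C\, \frac{\|\Delta X\|^2}{NT}\, \frac{\|\Delta r\|^2}{NT} = O_p\left(L^{-2\gamma/p_z}\right)
\end{aligned}$$

by Lemma A.2 and the fact that $\lambda_{\max}(P_W) = \lambda_{\max}(M_{H_W}) = 1$, and $\lambda_{\max}(\Delta X'\Delta X) \leq \|\Delta X\|^2 = O_p(1)$ by the Markov inequality and Assumption A.3(i). It follows that $\sqrt{NT}\, \|R_{NT}\| = O_p(\sqrt{NT} L^{-\gamma/p_z}) = o_p(1)$.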
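The trace inequality used repeatedly above, $\mathrm{tr}(AB) \leq \lambda_{\max}(A)\,\mathrm{tr}(B)$ for symmetric $A$ and p.s.d. $B$, can be checked numerically. The sketch below does so on randomly generated matrices; the helper name `trace_bound_gap`, the matrix size and the seed are illustrative choices, and the check is only a sanity test, not a proof.

```python
import numpy as np

def trace_bound_gap(A, B):
    """Return lambda_max(A) * tr(B) - tr(AB); nonnegative for symmetric A, p.s.d. B."""
    lam_max = np.linalg.eigvalsh(A).max()
    return lam_max * np.trace(B) - np.trace(A @ B)

rng = np.random.default_rng(42)
k = 6
gaps = []
for _ in range(100):
    S = rng.normal(size=(k, k))
    A = (S + S.T) / 2.0      # symmetric, possibly indefinite
    G = rng.normal(size=(k, k))
    B = G @ G.T              # positive semi-definite by construction
    gaps.append(trace_bound_gap(A, B))

min_gap = min(gaps)          # should never fall below zero (up to rounding)
```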

We now turn to the proof of (iii). Let

$$Q_{2,NT} = X_W' M_{H_W} W (W'W)^{-1} = \left[Q_{wx,NT}' - Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wh,NT} \left(Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wh,NT}\right)^{-1} Q_{wh,NT}'\right] Q_{ww,NT}^{-1}.$$

By Lemma A.1, one can show that $\|Q_{2,NT} - Q_2\| = O_p(L/(NT)^{1/2})$, where $Q_2$ is defined in (3.2). Then decompose $T_{NT}$ as follows:

$$T_{NT} = Q_2 \frac{1}{\sqrt{NT}} W'\Delta u + (Q_{2,NT} - Q_2) \frac{1}{\sqrt{NT}} W'\Delta u \equiv T_{1,NT} + T_{2,NT}, \text{ say.}$$

For $T_{2,NT}$, we have $|T_{2,NT}| \leq \|Q_{2,NT} - Q_2\| \left\|\frac{1}{\sqrt{NT}} W'\Delta u\right\| = O_p(L/(NT)^{1/2})\, O_p(L^{1/2}) = O_p(L^{3/2}/(NT)^{1/2}) = o_p(1)$ by the Markov inequality and Lemma A.3. We apply Lemma A.4, with $A$ replaced by $Q_2$, to show that $T_{1,NT} \to_d N(0, Q_2 \Omega Q_2')$. It follows that $T_{NT} \to_d N(0, Q_2 \Omega Q_2')$ as required.

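Proof of Theorem 3.2

(i) Recall that $M_{X_W} Y_W = M_{X_W}(H_W \Gamma + P_W \Delta u + P_W \Delta r)$ and $\hat{\Gamma} = (H_W' M_{X_W} H_W)^{-1} H_W' M_{X_W} Y_W$. We have with probability approaching 1 (w.p.a.1),

$$\begin{aligned}
\hat{\theta}(u) - \theta(u) &= H_S(u)' \hat{\Gamma} - \theta(u) \\
&= H_S(u)' \left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} Y_W - \theta(u) \\
&= H_S(u)' \left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} P_W \Delta u \\
&\quad + H_S(u)' \left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} P_W \Delta r \\
&\quad + \left[H_S(u)' \Gamma - \theta(u)\right] \\
&\equiv A_{NT,1}(u) + A_{NT,2}(u) + A_{NT,3}(u), \text{ say.} \qquad (A.2)
\end{aligned}$$

Then by the Cauchy-Schwarz inequality, we have

$$\int \left\|\hat{\theta}(u) - \theta(u)\right\|^2 \omega(u)\, du \leq 3 \sum_{j=1}^{3} \int \|A_{NT,j}(u)\|^2 \omega(u)\, du \equiv 3 \sum_{j=1}^{3} \Lambda_{NT,j}, \text{ say.}$$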


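First, note that

$$\begin{aligned}
Q_{3,NT} &= \frac{1}{NT} H_W' M_{X_W} H_W \\
&= Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wh,NT} - Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wx,NT} \left(Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wx,NT}\right)^{-1} Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wh,NT}.
\end{aligned}$$

We can show that $\|Q_{3,NT} - Q_3\| = O_p(L/\sqrt{NT})$, where $Q_3$ is defined in (3.3), and $\lambda_{\min}(Q_{3,NT}) > C$ w.p.a.1 by Lemma A.1. By the repeated use of $\mathrm{tr}(AB) \leq \mathrm{tr}(A)\, \lambda_{\max}(B)$ for any symmetric matrix $B$ and p.s.d. matrix $A$ (e.g., Horn and Johnson (2012)) and Lemmas A.1 and A.3, we have

$$\begin{aligned}
\Lambda_{NT,1} &= \frac{1}{N^2T^2} \mathrm{tr}\left[\Delta u' P_W M_{X_W} H_W Q_{3,NT}^{-1} Q_{hh\omega} Q_{3,NT}^{-1} H_W' M_{X_W} P_W \Delta u\right] \\
&= \frac{1}{N^2T^2} \mathrm{tr}\left[Q_{3,NT}^{-1} H_W' M_{X_W} P_W \Delta u \Delta u' P_W M_{X_W} H_W Q_{3,NT}^{-1} Q_{hh\omega}\right] \\
&\leq \lambda_{\max}(Q_{hh\omega}) \frac{1}{N^2T^2} \mathrm{tr}\left[H_W' M_{X_W} P_W \Delta u \Delta u' P_W M_{X_W} H_W Q_{3,NT}^{-2}\right] \\
&\leq C \lambda_{\min}^{-2}(Q_{3,NT}) \frac{1}{N^2T^2} \mathrm{tr}\left(M_{X_W} P_W \Delta u \Delta u' P_W M_{X_W} H_W H_W'\right) \\
&\leq C \lambda_{\max}\left(\frac{1}{NT} H_W H_W'\right) \frac{1}{NT} \mathrm{tr}\left(M_{X_W} P_W \Delta u \Delta u' P_W M_{X_W}\right) \\
&\leq C \lambda_{\max}(Q_{5,NT})\, \lambda_{\max}^2(M_{X_W}) \frac{1}{N^2T^2} \mathrm{tr}\left(W'\Delta u \Delta u' W\right) \\
&\leq \frac{C}{N^2T^2} \left\|W'\Delta u\right\|^2 = O_p\left(\frac{L}{NT}\right),
\end{aligned}$$

where $Q_{3,NT}$ and $Q_{5,NT}$ are defined in Lemma A.1, and we use the fact that $\lambda_{\max}(Q_{5,NT}) = \lambda_{\max}\left(\frac{1}{NT} H_W' H_W\right) = \lambda_{\max}\left(\frac{1}{NT} H_W H_W'\right)$.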

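Similarly, noting that $\lambda_{\max}(P_W) = 1$ and $\frac{1}{NT}\|\Delta r\|^2 = O_p(L^{-2\gamma/p_z})$ by Lemma A.2, we have

$$\begin{aligned}
\Lambda_{NT,2} &\leq \lambda_{\max}(Q_{hh\omega})\, \lambda_{\min}^{-2}(Q_{3,NT}) \frac{1}{N^2T^2} \mathrm{tr}\left(H_W' M_{X_W} P_W \Delta r \Delta r' P_W M_{X_W} H_W\right) \\
&= \lambda_{\max}(Q_{hh\omega})\, \lambda_{\min}^{-2}(Q_{3,NT}) \frac{1}{N^2T^2} \mathrm{tr}\left(P_W \Delta r \Delta r' P_W M_{X_W} H_W H_W' M_{X_W}\right) \\
&\leq \lambda_{\max}(Q_{hh\omega})\, \lambda_{\min}^{-2}(Q_{3,NT})\, \lambda_{\max}(Q_{5,NT})\, \lambda_{\max}^2(M_{X_W})\, \lambda_{\max}^2(P_W) \frac{1}{NT} \mathrm{tr}\left(\Delta r \Delta r'\right) \\
&\leq C \frac{1}{NT} \|\Delta r\|^2 = O_p\left(L^{-2\gamma/p_z}\right).
\end{aligned}$$

Lastly, it is straightforward to show that $\Lambda_{NT,3} = \int \left\|H_S(u)'\Gamma - \theta(u)\right\|^2 \omega(u)\, du = O_p(L^{-2\gamma/p_z})$ by Assumption A.2.

Combining the above derivations yields (i) as required.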


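(ii) The proof follows the same argument as part (i). The major difference is that now we also need to use Assumption A.2(iv) and restrict our attention to the sequence of compact subsets $\mathcal{U}_{NT}$ of $\mathcal{U}$ that is expanding at a controllable rate. For the first term, we have

$$\sup_{u \in \mathcal{U}_{NT}} \|A_{NT,1}(u)\|_{\infty} \leq \sup_{u \in \mathcal{U}_{NT}} \|h(u)\| \left\|\left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} P_W \Delta u\right\| = O_p\left(\zeta_0(L) \sqrt{\frac{L}{NT}}\right).$$

By the Cauchy-Schwarz inequality and Assumption A.2(iv), we can obtain

$$\sup_{u \in \mathcal{U}_{NT}} \|A_{NT,2}(u)\|_{\infty} \leq \sup_{u \in \mathcal{U}_{NT}} \|h(u)\| \left\|\left(H_W' M_{X_W} H_W\right)^{-1} H_W' M_{X_W} P_W \Delta r\right\| = \zeta_0(L)\, O_p(L^{-\gamma/p_z}).$$

By Assumptions A.2(ii) and (iv),

$$\begin{aligned}
\sup_{u \in \mathcal{U}_{NT}} \|A_{NT,3}(u)\|_{\infty} &= \max_{1 \leq l \leq p_d} \sup_{u \in \mathcal{U}_{NT}} \left|h(u)'\gamma_l - \theta_l(u)\right| \\
&\leq \max_{1 \leq l \leq p_d} \sup_{u \in \mathcal{U}_{NT}} \left\|\theta_l - \Pi_{\infty,L}(l)\, h(u)\right\|_{\infty,\varpi} \sup_{u \in \mathcal{U}_{NT}} \left(1 + \|u\|^2\right)^{\varpi/2} = O(L^{-\gamma/p_z})\, \zeta_0(L).
\end{aligned}$$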

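(iii) To obtain the asymptotic distribution, by the decomposition in (A.2), it suffices to prove the theorem by showing that (a) $\bar{A}_{NT,1} \equiv \sqrt{NT}\, \Xi^{-1/2}(u) A_{NT,1}(u) \to_d N(0,1)$, (b) $\bar{A}_{NT,2} \equiv \sqrt{NT}\, \Xi^{-1/2}(u) A_{NT,2}(u) = o_p(1)$, and (c) $\bar{A}_{NT,3} \equiv \sqrt{NT}\, \Xi^{-1/2}(u) A_{NT,3}(u) = o_p(1)$.

Let $Q_{4,NT} \equiv H_W' M_{X_W} W (W'W)^{-1}$. First, noting that

$$\begin{aligned}
Q_{4,NT} &= H_W' M_{X_W} W (W'W)^{-1} \\
&= Q_{wh,NT}' Q_{ww,NT}^{-1} - Q_{wh,NT}' Q_{ww,NT}^{-1} Q_{wx,NT} \left(Q_{wx,NT}' Q_{ww,NT}^{-1} Q_{wx,NT}\right)^{-1} Q_{wx,NT}' Q_{ww,NT}^{-1},
\end{aligned}$$

we can readily apply Lemma A.1(i)-(iii) to show that $\|Q_{4,NT} - Q_4\| = O_p(L/\sqrt{NT})$.

For (a), we have

$$\begin{aligned}
\bar{A}_{NT,1} &= \Xi^{-1/2}(u) H_S(u)' Q_{3,NT}^{-1} Q_{4,NT} \frac{W'\Delta u}{\sqrt{NT}} \\
&= \Xi^{-1/2}(u) H_S(u)' Q_3^{-1} Q_4 \frac{W'\Delta u}{\sqrt{NT}} + \Xi^{-1/2}(u) H_S(u)' \left(Q_{3,NT}^{-1} Q_{4,NT} - Q_3^{-1} Q_4\right) \frac{W'\Delta u}{\sqrt{NT}} \\
&\equiv \bar{A}_{NT,11} + \bar{A}_{NT,12}, \text{ say.}
\end{aligned}$$

Then we can apply Lemma A.4 to show that

$$\bar{A}_{NT,11} = \left(\Xi^{-1/2}(u) H_S(u)' Q_3^{-1} Q_4\right) \frac{W'\Delta u}{\sqrt{NT}} \to_d N\left(0, \frac{H_S(u)' Q_3^{-1} Q_4 \Omega Q_4' Q_3^{-1} H_S(u)}{\Xi(u)}\right) = N(0,1).$$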


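Also, by the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
\left|\bar{A}_{NT,12}\right| &\leq \left(\Xi^{-1/2}(u) \|H_S(u)\|\right) \left\|Q_{3,NT}^{-1} Q_{4,NT} - Q_3^{-1} Q_4\right\| \frac{1}{\sqrt{NT}} \left\|W'\Delta u\right\| \\
&\leq C\, O_p\left(L/\sqrt{NT}\right) O_p(\sqrt{L}) = O_p(L^{3/2}/(NT)^{1/2}) = o_p(1),
\end{aligned}$$

where we use the fact that $\frac{1}{\sqrt{NT}} \|W'\Delta u\| = O_p(\sqrt{L})$ by Lemma A.3,

$$\Xi^{-1}(u) \|H_S(u)\|^2 = \frac{H_S(u)' H_S(u)}{H_S(u)' Q_3^{-1} Q_4 \Omega Q_4' Q_3^{-1} H_S(u)} \geq \lambda_{\min}(\Omega_1)\, \lambda_{\min}(Q_4 Q_4')\, \lambda_{\max}^{-2}(Q_3) = C \qquad (A.3)$$

and $\left\|Q_{3,NT}^{-1} Q_{4,NT} - Q_3^{-1} Q_4\right\| = O_p(L/\sqrt{NT})$ by Lemma A.1.

Next, in order to show (b), by the Cauchy-Schwarz inequality, the fact that $a'Ba \leq \|a\|^2 \lambda_{\max}(B)$ for any vector $a$ and conformable p.s.d. symmetric matrix $B$, that $\lambda_{\max}(C'C) = \lambda_{\max}(CC')$ for any real matrix $C$, and that $\lambda_{\max}(P_W) = 1$, we have

$$\begin{aligned}
\left\|\bar{A}_{NT,2}\right\|^2 &\leq \frac{1}{NT} \kappa(u)\, \Delta r' P_W M_{X_W} H_W Q_{3,NT}^{-2} H_W' M_{X_W} P_W \Delta r \\
&\leq O(1)\, \lambda_{\min}^{-2}(Q_{3,NT})\, \Delta r' P_W M_{X_W} \frac{H_W H_W'}{NT} M_{X_W} P_W \Delta r \\
&\leq O_p(1)\, \lambda_{\max}^2(M_{X_W})\, \lambda_{\max}(Q_{5,NT})\, \lambda_{\max}(P_W)\, \|\Delta r\|^2 \\
&\leq O_p(1) \|\Delta r\|^2 = O_p\left(NT L^{-2\gamma/p_z}\right) = o_p(1),
\end{aligned}$$

under Assumption A.4, where $\kappa(u) \equiv \Xi^{-1}(u) H_S(u)' H_S(u) = O(1)$ by (A.3), and the penultimate equality follows because $(NT)^{-1} \|\Delta r\|^2 = O_p(L^{-2\gamma/p_z})$. Then (b) follows.

Finally, for (c), by Assumptions A.2(ii) and A.4,

$$\begin{aligned}
\left\|\bar{A}_{NT,3}\right\| &= \sqrt{NT}\, \Xi^{-1/2}(u) \max_{1 \leq l \leq p_d} \left|h(u)'\gamma_l - \theta_l(u)\right| \left(1 + \|u\|^2\right)^{-\varpi/2} \left(1 + \|u\|^2\right)^{\varpi/2} \\
&\leq \sqrt{NT}\, \Xi^{-1/2}(u) \max_{1 \leq l \leq p_d} \left\|\theta_l - \Pi_{\infty,L}\theta_l\right\|_{\infty,\varpi} \left(1 + \|u\|^2\right)^{\varpi/2} = O_p(\sqrt{NT} L^{-\gamma/p_z}) = o_p(1),
\end{aligned}$$

for any given $u \in \mathcal{U}$, so (c) follows.

Combining the above results of (a)-(c) yields result (iii) as required.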
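Two eigenvalue facts are used throughout the proof above: a projection matrix such as $P_W$ (and its complement) has largest eigenvalue 1, and $\lambda_{\max}(C'C) = \lambda_{\max}(CC')$ for any real matrix $C$. The sketch below checks both numerically; the matrix dimensions and the seed are arbitrary, and the variable names are illustrative rather than the paper's objects.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 4

# Projection onto the column space of a random full-rank matrix W, and its complement.
W = rng.normal(size=(n, p))
P_W = W @ np.linalg.solve(W.T @ W, W.T)
M_W = np.eye(n) - P_W

lam_P = np.linalg.eigvalsh(P_W).max()   # largest eigenvalue of the projection
lam_M = np.linalg.eigvalsh(M_W).max()   # largest eigenvalue of the annihilator

# lambda_max(C'C) equals lambda_max(CC') for a rectangular real matrix C.
C = rng.normal(size=(5, 8))
lam_CtC = np.linalg.eigvalsh(C.T @ C).max()
lam_CCt = np.linalg.eigvalsh(C @ C.T).max()
```

Both projections have eigenvalues in {0, 1}, which is why the factors $\lambda_{\max}(P_W)$ and $\lambda_{\max}(M_{X_W})$ drop out of the bounds.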

B Proof of Theorems in Section 4

Lemma B.1 Under Assumptions A.1, A.3(i) and A.5, $\hat{\gamma} = \gamma_0 + \delta_{NT} Q_6^{-1} q_{\Psi} + O_p((NT)^{-1/2})$ under $H_1(\delta_{NT})$.


Lemma B.2 Under Assumptions A.1-A.3 and A.6-A.7, (i) $\|Q_{NT} - Q\| = O_p(L/(NT)^{1/2})$; and (ii) $\|\Omega_{NT} - \Omega\| = O_p(L/\sqrt{NT})$.

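Proof of Theorem 4.2.

Noting that $\hat{\theta}_{it} - \hat{\gamma} = (\hat{\theta}_{it} - \theta_{it}) + (\theta_{it} - \gamma) + (\gamma - \hat{\gamma})$, we write $D_{NT}$ as follows:

$$D_{NT} = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\|(\hat{\theta}_{it} - \theta_{it}) + (\theta_{it} - \gamma) + (\gamma - \hat{\gamma})\right\|^2 a_{it} = \sum_{s=1}^{6} D_{s,NT},$$

where $a_{it} \equiv a(z_{it})$, $\theta_{it} \equiv \theta(z_{it})$, $\hat{\theta}_{it} \equiv \hat{\theta}(z_{it})$, and

$$\begin{aligned}
D_{1,NT} &\equiv \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\|\hat{\theta}_{it} - \theta_{it}\right\|^2 a_{it}, &
D_{2,NT} &\equiv \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\|\theta_{it} - \gamma\right\|^2 a_{it}, \\
D_{3,NT} &\equiv \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} \left\|\gamma - \hat{\gamma}\right\|^2 a_{it}, &
D_{4,NT} &\equiv \frac{2}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} (\hat{\theta}_{it} - \theta_{it})' (\theta_{it} - \gamma)\, a_{it}, \\
D_{5,NT} &\equiv \frac{2}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} (\hat{\theta}_{it} - \theta_{it})' (\gamma - \hat{\gamma})\, a_{it}, &
D_{6,NT} &\equiv \frac{2}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} (\theta_{it} - \gamma)' (\gamma - \hat{\gamma})\, a_{it}.
\end{aligned}$$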
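The six-term decomposition is the weighted version of the identity $\|a + b + c\|^2 = \|a\|^2 + \|b\|^2 + \|c\|^2 + 2a'b + 2a'c + 2b'c$. The sketch below verifies it numerically on artificial data; the arrays `a`, `b`, `c` and the weights `w` are random stand-ins for $\hat{\theta}_{it} - \theta_{it}$, $\theta_{it} - \gamma$, $\gamma - \hat{\gamma}$ and $a_{it}$, and a plain average over observations replaces the $1/(NT)$ normalization.

```python
import numpy as np

rng = np.random.default_rng(7)
n_obs, dim = 200, 2
a = rng.normal(size=(n_obs, dim))   # stands in for theta_hat_it - theta_it
b = rng.normal(size=(n_obs, dim))   # stands in for theta_it - gamma
c = rng.normal(size=dim)            # stands in for gamma - gamma_hat (same for all i, t)
w = rng.uniform(size=n_obs)         # stands in for the weights a_it

def wmean(v):
    """Weighted average over observations."""
    return float(np.mean(v * w))

D = wmean(np.sum((a + b + c) ** 2, axis=1))
D1 = wmean(np.sum(a ** 2, axis=1))
D2 = wmean(np.sum(b ** 2, axis=1))
D3 = wmean(np.full(n_obs, np.sum(c ** 2)))
D4 = 2.0 * wmean(np.sum(a * b, axis=1))
D5 = 2.0 * wmean(a @ c)
D6 = 2.0 * wmean(b @ c)

gap = D - (D1 + D2 + D3 + D4 + D5 + D6)  # zero up to floating-point rounding
```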

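Then we can decompose $J_{NT}$ as follows:

$$J_{NT} = \frac{NT D_{1,NT} - B_{NT}}{\sqrt{V_{NT}}} + \frac{NT \sum_{s=2}^{6} D_{s,NT}}{\sqrt{V_{NT}}} + \frac{B_{NT} - \hat{B}_{NT}}{\sqrt{V_{NT}}} - J_{NT} \frac{\sqrt{\hat{V}_{NT}} - \sqrt{V_{NT}}}{\sqrt{V_{NT}}}.$$

We complete the proof by showing that under $H_1(\delta_{NT})$:

(i) $(NT D_{1,NT} - B_{NT})/\sqrt{V_{NT}} \to_d N(0,1)$;

(ii) $NT (D_{2,NT} + D_{3,NT} + D_{6,NT})/\sqrt{V_{NT}} \to_p \mu_{\Psi}$;

(iii) $NT D_{4,NT}/\sqrt{V_{NT}} = o_p(1)$;

(iv) $NT D_{5,NT}/\sqrt{V_{NT}} = o_p(1)$;

(v) $\hat{B}_{NT} = B_{NT} + o_p(1)$;

and (vi) $\hat{V}_{NT} = V_{NT} + o_p(1)$.

We show result (i) in Proposition B.3, (ii) in Proposition B.4, and (v) and (vi) in Proposition B.5. For (iii) and (iv), we use the Cauchy-Schwarz inequality to show that

$$\left|\frac{NT D_{4,NT}}{\sqrt{V_{NT}}}\right| \leq \left\{\left|\frac{NT D_{1,NT} - B_{NT}}{\sqrt{V_{NT}}}\right| + \left|\frac{B_{NT}}{\sqrt{V_{NT}}}\right|\right\}^{1/2} \left\{\frac{NT D_{2,NT}}{\sqrt{V_{NT}}}\right\}^{1/2} = \left[O_p(1) + O_p(L^{1/4})\right] O_p(\delta_{NT}) = O_p\left(\sqrt{L}/\sqrt{NT}\right) = o_p(1),$$

and

$$\left|\frac{NT D_{5,NT}}{\sqrt{V_{NT}}}\right| \leq \left\{\left|\frac{NT D_{1,NT} - B_{NT}}{\sqrt{V_{NT}}}\right| + \left|\frac{B_{NT}}{\sqrt{V_{NT}}}\right|\right\}^{1/2} \left\{\frac{NT D_{3,NT}}{\sqrt{V_{NT}}}\right\}^{1/2} = \left[O_p(1) + O_p(L^{1/4})\right] O_p(\delta_{NT}) = O_p\left(\sqrt{L}/\sqrt{NT}\right) = o_p(1).$$

This completes the proof of Theorem 4.2.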


Proposition B.3 Under Assumptions A.1-A.3, A.5-A.8, (NTD1,NT − BNT ) /√VNT →d N (0, 1) .

Proof. Recall that θit − θit = HS,it (zit)′ (H′WMXW

HW)−1 H′WMXWPW (∆u + ∆r) +

eθ,it, whereHS,it = HS (zit) and eθ,it ≡ H′S,itΓ−θit. Let eθ =(e′θ,12, ..., e

′θ,1T , ..., e

′θ,N2, ..., e

′θ,NT

)′and HS = (H′S,12, ...,H

′S,1T , ...,H

′S,N2, ..., H′S,NT )′. Then we have

NTD1,NT − BNT√VNT

=1NT ∆u′WQNTW′∆u− BNT√

VNT+

∆r′WQNTW′∆r

NT√VNT

+e′θeθ

NT√VNT

+2∆u′WQNTW′∆r√

VNTNT

+2∆u′PWMXW

HW (H′WMXWHW)−1 H′Se

θ√VNT

+2∆r′PWMXW

HW (H′WMXWHW)−1 H′Seθ√

VNTNT= J1,NT + J2,NT + J3,NT + 2J4,NT + 2J5,NT + 2J6,NT , say.

First, we can show that J2,NT ≤ λmax

(1NT WQNTW′) ‖∆r‖2√

VNT= Op

(NTL−2γ/pzL−1/2

)=

op (1) by Lemma A.2. It is clear to see that J3,NT = Op(NTL−2γ/pzL−1/2

)because of ‖eθ‖2 / (NT ) =

Op(L−2γ/pz

)which can be verified by the similar proof of Lemma A.2 and Assumption A.2(ii).

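Second, we consider the first term $J_{1,NT}$. Rewrite

$$J_{1,NT} = \frac{(NT)^{-1}\Delta u' W Q W' \Delta u - B_{NT}}{\sqrt{V_{NT}}} + \frac{\Delta u' W (Q_{NT} - Q) W' \Delta u}{NT\sqrt{V_{NT}}} \equiv J^{(a)}_{1NT} + J^{(b)}_{1NT}.$$

We show that (i) $J^{(a)}_{1NT} \to_d N(0,1)$ and (ii) $J^{(b)}_{1NT} = o_p(1)$. For (ii), by Lemmas A.3 and B.2, we have $J^{(b)}_{1NT} \le (NT)^{-1} O_p(L/\sqrt{NT})\, O_p(NTL)\, O_p(L^{-1/2}) = O_p(L^{3/2}/\sqrt{NT}) = o_p(1)$. Now we turn to the proof of (i). Rewrite $J^{(a)}_{1NT}$ as follows:

$$\begin{aligned}
J^{(a)}_{1NT} &= \frac{(NT)^{-1}\Delta u' W Q W' \Delta u - B_{NT}}{\sqrt{V_{NT}}} \\
&= \frac{1}{NT\sqrt{V_{NT}}} \sum_{i=1}^{N} \sum_{j=1, j\ne i}^{N} \Delta u_i' W_i Q W_j' \Delta u_j + \frac{(NT)^{-1}\sum_{i=1}^{N} \Delta u_i' W_i Q W_i' \Delta u_i - B_{NT}}{\sqrt{V_{NT}}} \\
&\equiv J^{(a)}_{11NT} + J^{(a)}_{12NT}, \text{ say.}
\end{aligned}$$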

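For the second term, Lemma A.2 implies that

$$J^{(a)}_{12NT} = \frac{1}{\sqrt{V_{NT}}}\left[\mathrm{tr}(Q\Omega_{NT}) - B_{NT}\right] = \frac{1}{\sqrt{V_{NT}}}\mathrm{tr}\left(Q(\Omega_{NT} - \Omega)\right) \le \frac{1}{\sqrt{V_{NT}}}\sqrt{\mathrm{tr}(Q'Q)}\,\|\Omega_{NT} - \Omega\| = O_p\left(L/\sqrt{NT}\right) = o_p(1).$$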


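Next, we show that $J^{(a)}_{11NT} \to_d N(0,1)$. Let $H_{ij} = W_i Q W_j'$ and $H_{ij,ts} = W_{it}' Q W_{js}$. Then we can write $J^{(a)}_{11NT}$ as

$$J^{(a)}_{11NT} = \sum_{1\le i\ne j\le N} \frac{1}{NT\sqrt{V_{NT}}} \Delta u_i' H_{ij} \Delta u_j = \sum_{1\le i<j\le N} \mathcal{W}_{ij},$$

where $\mathcal{W}_{ij} \equiv \mathcal{W}_{NT}(\mathbf{u}_i, \mathbf{u}_j) \equiv \frac{2}{NT\sqrt{V_{NT}}} \Delta u_i' H_{ij} \Delta u_j$ and $\mathbf{u}_i \equiv (W_i, \Delta u_i)$. Noting that $J^{(a)}_{11NT}$ is a second-order degenerate U-statistic that is "clean" ($E\mathcal{W}_{NT}(\mathbf{u}_i, \mathbf{u}) = E\mathcal{W}_{NT}(\mathbf{u}, \mathbf{u}_j) = 0$ a.s. for any nonrandom $\mathbf{u}$), we apply Proposition 3.2 in de Jong (1987) to prove the CLT for $J^{(a)}_{11NT}$ by verifying that

(i) $\mathrm{Var}(J^{(a)}_{11NT}) = 1 + o(1)$;

(ii) $G_I \equiv \sum_{1\le i<j\le N} E(\mathcal{W}_{ij}^4) = o(1)$;

(iii) $G_{II} \equiv \sum_{1\le i<j<l\le N} E(\mathcal{W}_{ij}^2\mathcal{W}_{il}^2 + \mathcal{W}_{ij}^2\mathcal{W}_{jl}^2 + \mathcal{W}_{jl}^2\mathcal{W}_{il}^2) = o(1)$;

and (iv) $G_{III} \equiv \sum_{1\le i<j<l<r\le N} E(\mathcal{W}_{ij}\mathcal{W}_{ir}\mathcal{W}_{lj}\mathcal{W}_{lr} + \mathcal{W}_{ij}\mathcal{W}_{il}\mathcal{W}_{jr}\mathcal{W}_{lr} + \mathcal{W}_{ir}\mathcal{W}_{il}\mathcal{W}_{jr}\mathcal{W}_{jl}) = o(1)$.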
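The variance step that follows relies on a general property of clean degenerate U-statistics: cross-products of distinct kernel terms have zero mean, so the variance of the sum equals the sum of the second moments. The sketch below checks this by exact enumeration in a toy setting that is not the paper's model: the kernel is `a_ij * e_i * e_j` with i.i.d. Rademacher draws `e_i`, and the weights `a_ij` are hypothetical values chosen only for illustration.

```python
from itertools import combinations, product


def u_stat_moments(weights, n):
    """Exact mean and variance of sum_{i<j} a_ij * e_i * e_j over all sign vectors.

    e_1, ..., e_n are i.i.d. Rademacher (+1/-1 with probability 1/2), so the
    kernel a_ij * e_i * e_j has zero conditional mean given either argument.
    """
    total = 0.0
    total_sq = 0.0
    for signs in product((-1, 1), repeat=n):
        stat = sum(weights[(i, j)] * signs[i] * signs[j]
                   for i, j in combinations(range(n), 2))
        total += stat
        total_sq += stat * stat
    count = 2 ** n
    mean = total / count
    return mean, total_sq / count - mean ** 2


n = 5
# Hypothetical kernel weights a_ij, chosen only for illustration.
weights = {(i, j): 1.0 + 0.5 * i - 0.25 * j
           for i, j in combinations(range(n), 2)}
mean, variance = u_stat_moments(weights, n)
sum_of_squares = sum(a * a for a in weights.values())
print(mean, variance, sum_of_squares)
```

In this toy case the enumerated variance coincides with the sum of squared weights, which mirrors the passage from the squared double sum to the sum of $E(\Delta u_i' W_i Q W_j' \Delta u_j)^2$ in the proof of (i).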

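Proof of (i). Note that $E J^{(a)}_{11NT} = 0$ and

$$\begin{aligned}
\mathrm{Var}\left(J^{(a)}_{11NT}\right)
&= \frac{1}{N^2T^2V_{NT}} E\left[\left(2\sum_{1\le i<j\le N} \Delta u_i' W_i Q W_j' \Delta u_j\right)^2\right] \\
&= \frac{4}{N^2T^2V_{NT}} \sum_{1\le i<j\le N} E\left(\Delta u_i' W_i Q W_j' \Delta u_j\right)^2 \\
&= \frac{4}{N^2T^2V_{NT}} \sum_{1\le i<j\le N} \mathrm{tr}\left[E\left(W_i'\Delta u_i \Delta u_i' W_i\right) Q\, E\left(W_j'\Delta u_j \Delta u_j' W_j\right) Q\right] \\
&= \frac{2}{V_{NT}} \mathrm{tr}(\Omega Q \Omega Q) - \frac{2}{NT\,V_{NT}} \mathrm{tr}\left(\frac{1}{N}\sum_{i=1}^{N} \Omega_i Q \Omega_i Q\right) \\
&= \frac{2}{V_{NT}} \mathrm{tr}(\Omega Q \Omega Q) - O\left((NT)^{-1}\right) = 1 + o(1),
\end{aligned}$$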

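where $\Omega_i \equiv T^{-1} E(W_i'\Delta u_i \Delta u_i' W_i)$ for $i = 1, \ldots, N$, and we use the fact that

$$\frac{2}{NT\,V_{NT}} \frac{1}{N} \sum_{i=1}^{N} \mathrm{tr}(\Omega_i Q \Omega_i Q) \le \frac{2\lambda_{\max}(\Omega_i)\lambda_{\max}(\Omega_1)\lambda_{\max}(Q)}{NT} \frac{\mathrm{tr}(Q)}{V_{NT}} = O\left(\frac{1}{NT}\right) = o(1)$$

by repeated use of $\mathrm{tr}(AB) \le \lambda_{\max}(A)\,\mathrm{tr}(B)$ for a p.s.d. matrix $A$ and a symmetric matrix $B$.

Proof of (ii). Let $\varphi_{it,l} = W_{l,it}\Delta u_{it}$. Noting that $H_{ij,ts} = \sum_{l_1=1}^{L_0}\sum_{l_2=1}^{L_0} W_{l_1,it} Q_{l_1 l_2} W_{l_2,js}$,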


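where $L_0 \equiv p_d L$, we have

$$\begin{aligned}
G_I &= \frac{16}{N^4T^4V_{NT}^2} \sum_{1\le i<j\le N} E\left[\left(\sum_{t=2}^{T}\sum_{s=2}^{T}\sum_{l_1=1}^{L_0}\sum_{l_2=1}^{L_0} Q_{l_1 l_2}\Delta u_{it} W_{l_1,it} W_{l_2,js}\Delta u_{js}\right)^4\right] \\
&= \frac{16}{N^4T^4V_{NT}^2} \sum_{1\le l_1,\ldots,l_8\le L_0} Q_{l_1 l_2}Q_{l_3 l_4}Q_{l_5 l_6}Q_{l_7 l_8} \\
&\quad \times \sum_{1\le i<j\le N}\ \sum_{2\le t_1,\ldots,t_8\le T} E\left(\varphi_{it_1,l_1}\varphi_{it_3,l_3}\varphi_{it_5,l_5}\varphi_{it_7,l_7}\right) E\left(\varphi_{jt_2,l_2}\varphi_{jt_4,l_4}\varphi_{jt_6,l_6}\varphi_{jt_8,l_8}\right) \\
&= \frac{16}{N^4T^4V_{NT}^2} \sum_{1\le l_1,\ldots,l_8\le L_0} Q_{l_1 l_2}Q_{l_3 l_4}Q_{l_5 l_6}Q_{l_7 l_8} \sum_{i=1}^{N} G_{I,l_1 l_3 l_5 l_7}(i) \sum_{j=1}^{N} G_{I,l_2 l_4 l_6 l_8}(j) \\
&\quad - \frac{16}{N^4T^4V_{NT}^2} \sum_{1\le l_1,\ldots,l_8\le L_0} Q_{l_1 l_2}Q_{l_3 l_4}Q_{l_5 l_6}Q_{l_7 l_8} \sum_{i=1}^{N} G_{I,l_1 l_3 l_5 l_7}(i)\, G_{I,l_2 l_4 l_6 l_8}(i),
\end{aligned}$$

where

$$G_{I,lrkq}(i) = \sum_{2\le t,s,p,v\le T} E\left(\varphi_{it,l}\varphi_{is,r}\varphi_{ip,k}\varphi_{iv,q}\right).$$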

We determine the order of $G_I$ by investigating the order of $G_{I,l_1 l_3 l_5 l_7}(i)$. We consider three different cases according to the cardinality of the set $S = \{t_1, t_3, t_5, t_7\}$: (i') $|S| = 4$; (ii') $|S| = 3$; and (iii') $|S| \le 2$. We denote the summations for cases (i')-(iii') by $G^{[4]}_{I,l_1 l_3 l_5 l_7}(i)$, $G^{[3]}_{I,l_1 l_3 l_5 l_7}(i)$ and $G^{[2-]}_{I,l_1 l_3 l_5 l_7}(i)$, respectively, so that $G_{I,l_1 l_3 l_5 l_7}(i) = G^{[4]}_{I,l_1 l_3 l_5 l_7}(i) + G^{[3]}_{I,l_1 l_3 l_5 l_7}(i) + G^{[2-]}_{I,l_1 l_3 l_5 l_7}(i)$.
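To see how many terms each of the three cases contributes, the following sketch counts the index tuples $(t_1, t_3, t_5, t_7)$ by the cardinality of $S$ through brute-force enumeration and compares the counts with closed forms obtained by counting set partitions. The function name and the choice of `n = 8` periods are illustrative assumptions, not part of the paper.

```python
from itertools import product


def count_by_cardinality(n_periods):
    """Count 4-tuples (t1, t3, t5, t7) from n_periods time points by |S|."""
    counts = {4: 0, 3: 0, "2-": 0}
    for tup in product(range(n_periods), repeat=4):
        distinct = len(set(tup))
        key = distinct if distinct >= 3 else "2-"
        counts[key] += 1
    return counts


n = 8  # number of usable periods, T - 1 when t runs over 2, ..., T
counts = count_by_cardinality(n)
# Four positions can be partitioned into 4, 3 and 2 nonempty groups in
# 1, 6 and 7 ways; there are also n tuples with a single repeated value.
assert counts[4] == n * (n - 1) * (n - 2) * (n - 3)
assert counts[3] == 6 * n * (n - 1) * (n - 2)
assert counts["2-"] == 7 * n * (n - 1) + n
print(counts)
```

The counts are of order $T^4$, $T^3$ and $T^2$ respectively, so case (iii') can be handled by a crude bound on each term, while cases (i') and (ii') need the mixing inequalities to reduce the order of the sum.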

For each term inside the summation of $G^{[4]}_{I,l_1 l_3 l_5 l_7}(i)$, there are four different time indices. Let $k_1 \le k_3 \le k_5 \le k_7$ be the permutation of $t_1, t_3, t_5$ and $t_7$ in ascending order and let $d_c$ be the $c$-th largest difference among $k_{2j+1} - k_{2j-1}$ for $j = 3, 2, 1$. Let

$$H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7) = E\left(\varphi_{it_1,l_1}\varphi_{it_3,l_3}\varphi_{it_5,l_5}\varphi_{it_7,l_7}\right).$$

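We first consider the subcases with $k_7 - k_5 = d_1$ or $k_3 - k_1 = d_1$. By the covariance inequality for strong mixing sequences in Section 2.1.2 in Doukan (1997), we have

$$\left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right]\right| \le
\begin{cases}
8M_1\,\alpha^{\frac{4\delta-1}{4\delta+4}}(k_7 - k_5) & \text{if } k_7 - k_5 = d_1, \\
8M_1\,\alpha^{\frac{4\delta-1}{4\delta+4}}(k_3 - k_1) & \text{if } k_3 - k_1 = d_1,
\end{cases}$$

where $M_1 = \sup_{L_0}\max_{l_1,l_3,l_5,l_7}\max_i\max_{t_1,t_3,t_5,t_7} \left[E\left|\varphi_{it_1,l_1}\right|^{4+4\delta}\right]^{1/(4+4\delta)} \left[E\left|\varphi_{it_3,l_3}\varphi_{it_5,l_5}\varphi_{it_7,l_7}\right|^{1+\delta}\right]^{1/(1+\delta)}$.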
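The exponents on the mixing coefficient can be cross-checked against the standard form of Davydov's covariance inequality for strong mixing sequences, $|\mathrm{Cov}(X, Y)| \le 8\,\alpha^{1 - 1/p - 1/q}\,\|X\|_p\,\|Y\|_q$. The sketch below assumes this is the inequality being invoked and verifies, with exact rational arithmetic, that the moment pairs appearing in the constants reproduce the exponents $(4\delta-1)/(4\delta+4)$ and $(2\delta+1)/(2\delta+2)$ used in this proof. The helper name and the sample values of $\delta$ are illustrative.

```python
from fractions import Fraction


def davydov_exponent(p, q):
    """Mixing-coefficient exponent 1 - 1/p - 1/q in Davydov's inequality."""
    return 1 - Fraction(1) / p - Fraction(1) / q


for delta in (Fraction(1, 2), Fraction(1), Fraction(3)):
    # One factor in L^{4+4*delta}, the product of the other three in L^{1+delta}.
    one_vs_three = davydov_exponent(4 + 4 * delta, 1 + delta)
    assert one_vs_three == (4 * delta - 1) / (4 * delta + 4)
    # Two single factors, each in L^{4+4*delta}.
    one_vs_one = davydov_exponent(4 + 4 * delta, 4 + 4 * delta)
    assert one_vs_one == (2 * delta + 1) / (2 * delta + 2)
    print(delta, one_vs_three, one_vs_one)
```

Note that the first exponent is positive only when $\delta > 1/4$, so the bound is informative only under a moment condition at least that strong.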

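Thus

$$\begin{aligned}
\sum_{\substack{2\le k_1<k_3<k_5<k_7\le T \\ k_3-k_1=d_1}} \left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right]\right|
&\le \sum_{t=2}^{T}\sum_{d_1=2}^{T-1}\sum_{d_2=2}^{d_1-1}\sum_{d_3=2}^{d_2-1} 8M_1\,\alpha^{\frac{4\delta-1}{4\delta+4}}(d_1) \\
&\le 8M_1 T \sum_{d_1=2}^{T-1} d_1^2\,\alpha^{\frac{4\delta-1}{4\delta+4}}(d_1) = O(T).
\end{aligned}$$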
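The $O(T)$ conclusion requires the weighted sum $\sum_{d} d^2 \alpha^{\kappa}(d)$, with $\kappa = (4\delta-1)/(4\delta+4)$, to stay bounded as $T$ grows. As a hedged illustration, the sketch below assumes a geometric mixing rate $\alpha(d) = \rho^d$ (an assumption made here for concreteness, with hypothetical values of $\rho$ and $\delta$) and compares the partial sums with the closed-form limit of $\sum_{d\ge 1} d^2 r^d$.

```python
def weighted_mixing_sum(rho, kappa, upper):
    """Partial sum of d**2 * alpha(d)**kappa for d = 2..upper, alpha(d) = rho**d."""
    r = rho ** kappa
    return sum(d * d * r ** d for d in range(2, upper + 1))


def limit_bound(rho, kappa):
    """Closed form of sum_{d>=1} d**2 * r**d = r(1 + r)/(1 - r)**3, r = rho**kappa."""
    r = rho ** kappa
    return r * (1 + r) / (1 - r) ** 3


delta = 1.0                                  # hypothetical moment parameter
kappa = (4 * delta - 1) / (4 * delta + 4)    # exponent on the mixing coefficient
rho = 0.5                                    # hypothetical geometric mixing rate
for T in (10, 100, 1000):
    partial = weighted_mixing_sum(rho, kappa, T - 1)
    assert partial <= limit_bound(rho, kappa)
    print(T, round(partial, 4))
```

Under this assumption the partial sums stabilize quickly, so the extra factor $T$ from the outer sum over $t$ is the only source of growth. Slower, polynomial mixing rates would need a separate summability check.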


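Similarly, we can show that $\sum_{2\le k_1<k_3<k_5<k_7\le T,\ k_7-k_5=d_1} \left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right]\right| = O(T)$. Next, we consider the subcase with $k_5 - k_3 = d_1$. It follows that $k_7 - k_5 = d_2$ or $k_3 - k_1 = d_2$. We only consider $k_7 - k_5 = d_2$. Then by Lemma A.1 in Ch. 7 in Gao (2007), for each term in the summation,

$$\left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right] - E\left(\varphi_{ik_1,l_1^*}\varphi_{ik_3,l_3^*}\right) E\left(\varphi_{ik_5,l_5^*}\varphi_{ik_7,l_7^*}\right)\right| \le 10 M_2^2\,\alpha^{\frac{\delta}{\delta+1}}(k_5 - k_3),$$

where $M_2 = \sup_{L_0}\max_{l_1,l_3}\max_i\max_{t_1,t_3} \left[E\left|\varphi_{it_1,l_1}\varphi_{it_3,l_3}\right|^{2+2\delta}\right]^{1/(2+2\delta)}$, and $l_1^*, l_3^*, l_5^*$ and $l_7^*$ denote the permutation of $l_1, l_3, l_5$ and $l_7$ corresponding to the ascending order of $t_1, t_3, t_5$ and $t_7$. Further, by using the covariance inequality again, we have

$$\left|E\left(\varphi_{ik_5,l_5^*}\varphi_{ik_7,l_7^*}\right)\right| \le 8M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_7 - k_5),$$

where $M_3 = \sup_{L_0}\max_l\max_t \left[E\left|\varphi_{it,l}\right|^{4+4\delta}\right]^{1/(4+4\delta)}$. Then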

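$$\left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right]\right| \le 10M_2^2\,\alpha^{\frac{\delta}{\delta+1}}(k_5 - k_3) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_7 - k_5),$$

where $M_4 = \max_{l,k}\max_{t\ne s}\left|E\left(\varphi_{it,l}\varphi_{is,k}\right)\right|$. Thus

$$\begin{aligned}
\sum_{\substack{2\le k_1<k_3<k_5<k_7\le T \\ k_5-k_3=d_1,\ k_7-k_5=d_2}} \left|E\left[H_{l_1 l_3 l_5 l_7}(k_1, k_3, k_5, k_7)\right]\right|
&\le \sum_{t=2}^{T}\sum_{d_1=2}^{T-1}\sum_{d_2=2}^{d_1-1}\sum_{d_3=2}^{d_2-1} \left[10M_2^2\,\alpha^{\frac{\delta}{\delta+1}}(d_1) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(d_2)\right] \\
&\le 10M_2^2\, T\sum_{d_1=2}^{T-1} d_1^2\,\alpha^{\frac{\delta}{\delta+1}}(d_1) + 8M_4 M_3^2\, T^2 \sum_{d_2=2}^{T-1} d_2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(d_2) \\
&= O(T) + O(T^2).
\end{aligned}$$

It follows that $G^{[4]}_{I,l_1 l_3 l_5 l_7}(i) = O(T^2)$ uniformly.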

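For $G^{[3]}_{I,l_1 l_3 l_5 l_7}(i)$, we can write

$$G^{[3]}_{I,l_1 l_3 l_5 l_7}(i) = \sum_{2\le t_1,t_3,t_5,t_7\le T,\ |S|=3} E\left(\varphi_{it_1,l_1}\varphi_{it_3,l_3}\varphi_{it_5,l_5}\varphi_{it_7,l_7}\right).$$

As before, we let $k_1, k_3, k_5$ and $k_7$ be the permutation of $t_1, t_3, t_5, t_7$ in ascending order, that is, $k_1 \le k_3 \le k_5 \le k_7$. Clearly, there are three possible orderings, which we group into two subcases: (iia) $k_1 = k_3 < k_5 < k_7$ or $k_1 < k_3 < k_5 = k_7$; (iib) $k_1 < k_3 = k_5 < k_7$. We denote the summations in (iia) and (iib) by $G^{[3a]}_{I,l_1 l_3 l_5 l_7}(i)$ and $G^{[3b]}_{I,l_1 l_3 l_5 l_7}(i)$, respectively. For (iia), the typical terms in the summation are $E\left(\varphi_{ik_1,l_1^*}\varphi_{ik_1,l_3^*}\varphi_{ik_5,l_5^*}\varphi_{ik_7,l_7^*}\right)$ or $E\left(\varphi_{ik_1,l_1^*}\varphi_{ik_3,l_3^*}\varphi_{ik_7,l_5^*}\varphi_{ik_7,l_7^*}\right)$, which are bounded by

$$M_5 M_2\,\alpha^{\frac{\delta}{1+\delta}}(k_5 - k_1) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_7 - k_5) \quad \text{or} \quad M_5 M_2\,\alpha^{\frac{\delta}{1+\delta}}(k_7 - k_3) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_3 - k_1),$$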


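where $M_5 = \max_i\max_k\max_{l,r}\left[E\left|\varphi_{ik,l}\varphi_{ik,r}\right|^{2+2\delta}\right]^{\frac{1}{2+2\delta}}$. Thus

$$\begin{aligned}
G^{[3a]}_{I,l_1 l_3 l_5 l_7}(i)
&\le \sum_{k_1}\sum_{k_5}\sum_{k_7} \left[M_5 M_2\,\alpha^{\frac{\delta}{1+\delta}}(k_5 - k_1) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_7 - k_5)\right] \\
&\quad + \sum_{k_1}\sum_{k_3}\sum_{k_7} \left[M_5 M_2\,\alpha^{\frac{\delta}{1+\delta}}(k_7 - k_3) + 8M_4 M_3^2\,\alpha^{\frac{2\delta+1}{2\delta+2}}(k_3 - k_1)\right] \\
&\le TC\sum_{k=1}^{T}\sum_{d=1}^{T} \alpha^{\frac{\delta}{1+\delta}}(d) + TC\sum_{k=1}^{T}\sum_{d=1}^{T} \alpha^{\frac{2\delta+1}{2\delta+2}}(d) = O(T^2).
\end{aligned}$$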

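For (iib), the typical term in the summation is $E\left(\varphi_{ik_1,l_1^*}\varphi_{ik_3,l_3^*}\varphi_{ik_3,l_5^*}\varphi_{ik_7,l_7^*}\right)$, which can be bounded by $8M_1\,\alpha^{\frac{4\delta-1}{4\delta+4}}(k_3 - k_1)$ when $k_3 - k_1 = d_1$ and by $8M_1\,\alpha^{\frac{4\delta-1}{4\delta+4}}(k_7 - k_3)$ when $k_7 - k_3 = d_1$. Then, following the analysis of case (i'), we can show that $G^{[3b]}_{I,l_1 l_3 l_5 l_7}(i) = O(T)$.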

For (iii'), the absolute value of each term in the summation is bounded by $(M_6)^2$, where $M_6 \equiv \sup_{L_0}\max_i\max_{1\le l_1,l_3,l_5,l_7\le L}\left|E\left(\varphi_{it,l_1}\varphi_{it,l_3}\varphi_{it,l_5}\varphi_{it,l_7}\right)\right|$, and the number of such terms is $O(T^2)$. Thus, we have

$$G^{[2-]}_{I,l_1 l_3 l_5 l_7}(i) = M_6^2\, O(T^2) = O(T^2).$$

Combining the results for cases (i')-(iii'), we have $G_{I,l_1 l_3 l_5 l_7}(i) = O(T^2)$ uniformly. Then

$$G_I \le \frac{16}{N^4T^4V_{NT}^2}\left(\sup_{L_0}\max_{l,k} Q_{lk}\right)^4 L_0^8 \left[N\,O(T^2)\,N\,O(T^2) + N\,O(T^2)\,O(T^2)\right] = O\left(L^6 N^{-2}\right) = o(1).$$

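Proof of (iii). We write $G_{II} = \sum_{1\le i<j<l\le N} E\left(\mathcal{W}_{ij}^2\mathcal{W}_{il}^2 + \mathcal{W}_{ij}^2\mathcal{W}_{jl}^2 + \mathcal{W}_{jl}^2\mathcal{W}_{il}^2\right) = G_{IIa} + G_{IIb} + G_{IIc}$, say. By the symmetry of the indices $i$, $j$ and $l$, we only prove $G_{IIa} = \sum_{1\le i<j<l\le N} E\left(\mathcal{W}_{ij}^2\mathcal{W}_{il}^2\right) = o(1)$. Note that

$$\begin{aligned}
G_{IIa} &= \frac{2}{N^4T^4V_{NT}^2}\sum_{1\le i\ne j\ne r\le N} E\left[\left(\Delta u_i' H_{ij}\Delta u_j\right)^2\left(\Delta u_i' H_{ir}\Delta u_r\right)^2\right] \\
&= \frac{2}{N^4T^4V_{NT}^2}\sum_{1\le i\ne j\ne r\le N} E\left[\Delta u_i' W_i Q W_j'\Delta u_j \Delta u_j' W_j Q W_i'\Delta u_i \Delta u_i' W_i Q W_r'\Delta u_r \Delta u_r' W_r Q W_i'\Delta u_i\right] \\
&= \frac{2}{N^4T^4V_{NT}^2}\sum_{i=1}^{N} E\left[\Delta u_i' W_i Q\Omega_{-i} Q W_i'\Delta u_i \Delta u_i' W_i Q\Omega_{-i} Q W_i'\Delta u_i\right] \\
&\quad - \frac{2}{N^4T^4V_{NT}^2}\sum_{i=1}^{N}\sum_{j=1, j\ne i}^{N} E\left[\left(\Delta u_i' W_i Q W_j'\Delta u_j\right)^4\right] \\
&\equiv G_{IIa1} + G_{IIa2}, \text{ say,}
\end{aligned}$$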


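where $\Omega_{-i} = \frac{1}{NT}\sum_{j=1, j\ne i}^{N} E\left(W_j'\Delta u_j \Delta u_j' W_j\right)$. Note that $G_{IIa2} = o(1)$ has already been shown in the proof of (ii). For the first term, we have

$$\begin{aligned}
G_{IIa1} &\le \lambda_{\max}(Q\Omega_{-i}Q)\,\frac{2}{N^2T^2V_{NT}^2}\sum_{i=1}^{N} \mathrm{tr}\left[E\left(W_i'\Delta u_i \Delta u_i' W_i Q\Omega_{-i} Q W_i'\Delta u_i \Delta u_i' W_i\right)\right] \\
&\le \lambda_{\max}^2(Q\Omega_{-i}Q)\,\frac{2}{N^2T^2V_{NT}^2}\sum_{i=1}^{N} \mathrm{tr}\left[E\left(W_i'\Delta u_i \Delta u_i' W_i W_i'\Delta u_i \Delta u_i' W_i\right)\right] \\
&= \frac{2C}{N^2T^2V_{NT}^2}\sum_{i=1}^{N} E\left(\sum_{t=2}^{T}\sum_{s=2}^{T} W_{is}' W_{it}\Delta u_{it}\Delta u_{is}\right)^2 \\
&= \frac{2C}{N^2T^2V_{NT}^2}\sum_{i=1}^{N}\sum_{t_1,t_2,t_3,t_4} E\left(\varphi_{it_1}'\varphi_{it_2}\varphi_{it_3}'\varphi_{it_4}\right),
\end{aligned}$$

where $\varphi_{it} = W_{it}\Delta u_{it}$. Following the proof of (ii), we can show that the last term is $O\left(L^2 N^{-1}\right) = o(1)$.

Proof of (iv). We write $G_{III} = \sum_{1\le i<j<l<r\le N} E\left(\mathcal{W}_{ij}\mathcal{W}_{ir}\mathcal{W}_{lj}\mathcal{W}_{lr} + \mathcal{W}_{ij}\mathcal{W}_{il}\mathcal{W}_{jr}\mathcal{W}_{lr} + \mathcal{W}_{ir}\mathcal{W}_{il}\mathcal{W}_{jr}\mathcal{W}_{jl}\right) = \sum_{s=1}^{3} G_{III,s}$, say. We have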

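$$\begin{aligned}
G_{III,1} &= \frac{1}{N^4T^4V_{NT}^2}\sum_{1\le i\ne j\ne l\ne r\le N} \mathrm{tr}\left[E\left(Q W_i'\Delta u_i \Delta u_i' W_i Q W_r'\Delta u_r \Delta u_r' W_r Q W_l'\Delta u_l \Delta u_l' W_l Q W_j'\Delta u_j \Delta u_j' W_j\right)\right] \\
&= \frac{1}{N^4V_{NT}^2}\sum_{1\le i\ne j\ne l\ne r\le N} \mathrm{tr}\left(Q\Omega_i Q\Omega_r Q\Omega_l Q\Omega_j\right) \\
&= \frac{1}{V_{NT}^2}\mathrm{tr}\left(Q\Omega Q\Omega Q\Omega Q\Omega\right) - \frac{1}{N^4V_{NT}^2}\sum_{1\le j\ne l\ne r\le N} \mathrm{tr}\left(Q\Omega_r Q\Omega_r Q\Omega_l Q\Omega_j\right) \\
&\quad - \frac{C}{N^4V_{NT}^2}\sum_{1\le l\ne r\le N} \mathrm{tr}\left(Q\Omega_r Q\Omega_r Q\Omega_l Q\Omega_l\right) - \frac{1}{N^4V_{NT}^2}\sum_{1\le r\le N} \mathrm{tr}\left(Q\Omega_r Q\Omega_r Q\Omega_r Q\Omega_r\right) \\
&= \frac{1}{V_{NT}^2}\mathrm{tr}\left(Q\Omega Q\Omega Q\Omega Q\Omega\right) + O(L_0/N) + O(L_0/N^2) + O(L_0/N^3) \\
&= O(L^{-1}) + O(L_0/N) + O(L_0/N^2) + O(L_0/N^3) = o(1).
\end{aligned}$$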

Lastly, by the Cauchy-Schwarz inequality, we can show that $J_{s,NT} = o_p(1)$ for $s = 4, 5, 6$. This completes the proof.

Proposition B.4 Under Assumptions A.1-A.8, $NT(D_{2,NT} + D_{3,NT} + D_{6,NT})/\sqrt{V_{NT}} \to_p \mu_\Psi$ under $H_1(\delta_{NT})$.

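Proof. (i) Note that under $H_1(\delta_{NT})$, $\theta_{it} - \gamma_0 = \delta_{NT}\Psi(z_{it})$. We have

$$\frac{NT D_{2,NT}}{\sqrt{V_{NT}}} = \frac{NT\delta_{NT}^2}{\sqrt{V_{NT}}}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\|\Psi(z_{it})\|^2 a_{it} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\|\Psi(z_{it})\|^2 a_{it}.$$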


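(ii) Under $H_1(\delta_{NT})$, we have $\hat{\gamma} = (D_V' M_{X_V} D_V)^{-1} D_V' M_{X_V} Y_V = \gamma_0 + \delta_{NT}\gamma_\Psi + O_p[(NT)^{-1/2}]$ by Lemma B.2. Then we have

$$\frac{NT D_{3,NT}}{\sqrt{V_{NT}}} \equiv \frac{NT\delta_{NT}^2}{\sqrt{V_{NT}}}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\|\gamma_\Psi\|^2 a_{it} + o_p(1) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\|\gamma_\Psi\|^2 a_{it} + o_p(1).$$

(iii) Under $H_1(\delta_{NT})$, we have

$$\frac{NT D_{6,NT}}{\sqrt{V_{NT}}} \equiv -\frac{2}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\Psi(z_{it})'\gamma_\Psi a_{it} + o_p(1).$$

Combining (i)-(iii) completes the proof.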

Proposition B.5 Under Assumptions A.1-A.4 and A.6-A.7, (i) $\hat{B}_{NT} = B_{NT} + o_p(1)$ and (ii) $\hat{V}_{NT} = V_{NT} + o_p(1)$.

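Proof. (i) Recall that $\hat{B}_{NT} = \mathrm{tr}(Q_{NT}\Omega_{NT})$. Write

$$\begin{aligned}
\left|\hat{B}_{NT} - B_{NT}\right| &= \left|\mathrm{tr}(Q_{NT}\Omega_{NT} - Q\Omega)\right| \\
&\le \left|\mathrm{tr}\left[(Q_{NT} - Q)\Omega\right]\right| + \left|\mathrm{tr}\left[Q_{NT}(\Omega_{NT} - \Omega)\right]\right| \\
&\le \|Q_{NT} - Q\|\,\|\Omega\| + \|Q_{NT}\|\,\|\Omega_{NT} - \Omega\| \\
&= O_p\left(L^2/\sqrt{NT}\right) + O_p\left(L^2/\sqrt{NT}\right) = o_p(1),
\end{aligned}$$

because $\|\Omega_{NT} - \Omega\| = O_p\left(L/\sqrt{NT}\right)$ and $\|Q_{NT} - Q\| = o_p\left(L/\sqrt{NT}\right)$ by Lemma B.2 and under Assumption A.7.

(ii) Recall that $\hat{V}_{NT} = 2\mathrm{tr}(Q_{NT}\Omega_{NT}Q_{NT}\Omega_{NT})$ and $V_{NT} = 2\mathrm{tr}(Q\Omega Q\Omega)$. We have

$$\begin{aligned}
\left|\hat{V}_{NT} - V_{NT}\right| &= 2\left|\mathrm{tr}\left(Q_{NT}\Omega_{NT}Q_{NT}\Omega_{NT} - Q\Omega Q\Omega\right)\right| \\
&\le 2\left|\mathrm{tr}\left[(Q_{NT} - Q)\Omega_{NT}Q_{NT}\Omega_{NT}\right]\right| + 2\left|\mathrm{tr}\left[Q(\Omega_{NT} - \Omega)Q_{NT}\Omega_{NT}\right]\right| \\
&\quad + 2\left|\mathrm{tr}\left[Q\Omega(Q_{NT} - Q)\Omega_{NT}\right]\right| + 2\left|\mathrm{tr}\left[Q\Omega Q(\Omega_{NT} - \Omega)\right]\right| \\
&= O_p\left(L/\sqrt{NT}\right) O_p(L) = o_p(1)
\end{aligned}$$

by the Cauchy-Schwarz inequality and Lemma B.2 again and under Assumption A.7.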
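The trace bounds in this proof use the Cauchy-Schwarz inequality in the form $|\mathrm{tr}(AB)| \le \|A\|\,\|B\|$. The sketch below checks it numerically, assuming that $\|\cdot\|$ denotes the Frobenius norm (an assumption, since the norm is defined elsewhere in the paper). The small matrices `A` and `B` are hypothetical stand-ins for $Q_{NT} - Q$ and $\Omega$.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def frobenius(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in a for x in row))


# Hypothetical small matrices standing in for Q_NT - Q and Omega.
A = [[0.10, -0.02, 0.00], [-0.02, 0.05, 0.01], [0.00, 0.01, -0.03]]
B = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]]
lhs = abs(trace(matmul(A, B)))
rhs = frobenius(A) * frobenius(B)
assert lhs <= rhs
print(lhs, rhs)
```

The inequality holds because $\mathrm{tr}(AB) = \sum_{i,j} A_{ij}B_{ji}$ is an inner product of the entries of $A$ and $B'$, so the rate of each trace term is governed by the product of the two norms.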
