
DEPARTMENT OF ECONOMICS
WORKING PAPER SERIES

2018-02

McMASTER UNIVERSITY
Department of Economics
Kenneth Taylor Hall 426
1280 Main Street West
Hamilton, Ontario, Canada L8S 4M4
http://www.mcmaster.ca/economics/

Nonparametric Estimation and Inference for Panel Data Models

    Christopher F. Parmeter∗

    Jeffrey S. Racine†

    January 02, 2018

    Abstract

This chapter surveys nonparametric methods for estimation and inference in a panel data setting. Methods surveyed include profile likelihood, kernel smoothers, as well as series and sieve estimators. The practical application of nonparametric panel-based techniques is less prevalent than, say, nonparametric density and regression techniques. It is our hope that the material covered in this chapter will prove useful and facilitate their adoption by practitioners.

    1 Introduction

Though panel data models have proven particularly popular among applied econometricians, the most widely embraced approaches rely on parametric assumptions. When these assumptions are at odds with the data generating process (DGP), the corresponding estimators will be biased and, worse, inconsistent. Practitioners who subject their parametric models to a battery of diagnostic tests are often disappointed to learn that their models are rejected by the data. Consequently, they may find themselves in need of more flexible nonparametric alternatives.

Given the popularity of panel data methods in applied settings, and given how quickly the field of nonparametric panel methods is developing, this chapter presents a current survey of available nonparametric methods and outlines how practitioners can avail themselves of these recent developments. The existing literature that surveys semi- and nonparametric panel data methods includes Li and Racine [2007], Ai and Li [2008], Su and Ullah [2011], Chen et al. [2013], Henderson and Parmeter [2015], Sun et al. [2015], and Rodriguez-Poo and Soberon [2017], among others. Our goal here is to unify and extend existing treatments and inject some additional insight that we hope is useful for practitioners trying to keep abreast of this rapidly growing field. By way of illustration, Rodriguez-Poo and Soberon [2017] provide a nice survey of available estimators; however, they do not address inference, which is a practical necessity. We attempt to provide a more comprehensive treatment than that found elsewhere, keeping the needs of the practitioner first and foremost.

∗Department of Economics, University of Miami; [email protected]
†Department of Economics and Graduate Program in Statistics, McMaster University, racinej@mcmaster.ca; Department of Economics and Finance, La Trobe University; Info-Metrics Institute, American University. Racine would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC: www.nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC: www.sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET: www.sharcnet.ca).

2 How Unobserved Heterogeneity Complicates Estimation

We begin with the conventional one-way nonparametric setup for panel data:

$$y_{it} = m(x_{it}) + \alpha_i + \varepsilon_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T, \tag{1}$$

where xit is a q × 1 vector, m(·) is an unknown smooth function, αi captures individual-specific heterogeneity, and εit is the random error term. The standard panel framework treats i as indexing the individual and t as indexing time, though in many applications t may not represent time. For example, in the meta-analysis field, i represents a given research study and t the individual estimates produced from the study. As in a fully parametric setting, the αi's need to be accounted for due to the incidental parameters problem under the fixed effects framework. Two common transformations to eliminate αi prior to estimation are linear differencing and time-demeaning.

Consider time-demeaning; in this case we use the standard notation $\bar{z}_{i\cdot} = T^{-1}\sum_{t=1}^{T} z_{it}$ to represent the mean of variable z for individual i. Given that αi is constant over time, time-demeaning will eliminate αi from Equation (1):

$$y_{it} - \bar{y}_{i\cdot} = m(x_{it}) - T^{-1}\sum_{t=1}^{T} m(x_{it}) + \varepsilon_{it} - \bar{\varepsilon}_{i\cdot}. \tag{2}$$

Unfortunately, we now have the function m(·) appearing T + 1 times on the right hand side. Given that m(·) is unknown, this causes problems with standard estimation because we must ensure that the same m(·) is being used. Moreover, even if m(·) were known, if it is nonlinear in the parameter space, then the resulting nonlinear estimating equation may complicate optimization of the associated objective function. This is true in other settings as well, for example, in the estimation of conditional quantiles with panel data. As we will discuss shortly, a variety of approaches have been proposed for tackling the presence of unobserved heterogeneity.

In the random effects framework, the incidental parameters problem no longer exists; however, a complicating factor for estimation is how to effectively capture the structure of the variance-covariance matrix of vit = αi + εit. In this case it is not clear how best to smooth the data, and a variety of estimators have been proposed. At issue is the best way to smooth the covariates while simultaneously accounting for the covariance structure. As we will discuss shortly, several early estimators were not able to achieve asymptotic gains because the smoothing procedure advocated did not adequately capture the covariance structure, asymptotically.

We could also discuss the two-way error component setting, but in this case, if the fixed effects framework is assumed, then it is easier to treat time as a covariate and smooth it appropriately (using an ordered discrete kernel), while in the random effects framework it further complicates the variance matrix of the error term. In light of this, we will focus on the one-way error component model.

Prior to moving on to the discussion of estimation in either the fixed or random effects framework in a one-way error component model, some definitions are in order. Under the random effects framework, we assume that E[αi|xi1, . . . , xiT] = E[αi] = 0, whereas under the fixed effects framework we assume that E[αi|xi1, . . . , xiT] = αi. The difference between the two should be clear; under the random effects framework, αi is assumed to be independent of xit for any t, whereas under the fixed effects framework αi and xit are allowed to be dependent upon one another. No formal relationship on this dependence is specified under the fixed effects framework. One could relax this set of all-or-nothing assumptions by following the approach of Hausman and Taylor [1981]; however, this is currently an unexplored area within the field of nonparametric estimation of panel data models. We now turn to a discussion of nonparametric estimation under both the fixed and random effects frameworks.

3 Estimation in the Random Effects Framework

    3.1 Preliminaries

If we assume that αi is uncorrelated with xit, then its presence in Equation (1) can be dealt with in a more traditional manner as it relates to kernel smoothing. To begin, assume that Var(εit) = σ²ε and Var(αi) = σ²α. Then, for vit = αi + εit, we set vi = [vi1, vi2, . . . , viT]′, a T × 1 vector, and Vi ≡ E(viv′i) takes the form

$$V_i = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2\, i_T i_T', \tag{3}$$

where IT is an identity matrix of dimension T and iT is a T × 1 column vector of ones. Since the observations are independent over i and j, the covariance matrix for the full nT × 1 disturbance vector v, Ω = E(vv′), is an nT × nT block diagonal matrix where the blocks are equal to Vi, for i = 1, 2, . . . , n. Note that this specification assumes a homoskedastic variance for all i and t.

Note that serial correlation over time is admitted, but only between the disturbances for the same individual. In other words,

$$\mathrm{Cov}(v_{it}, v_{js}) = \mathrm{Cov}(\alpha_i + \varepsilon_{it}, \alpha_j + \varepsilon_{js}) = E(\alpha_i\alpha_j) + E(\varepsilon_{it}\varepsilon_{js}),$$

given the independence assumption between αj and εjs and the i.i.d. nature of εit. The covariance between two error terms equals σ²α + σ²ε when i = j and t = s, equals σ²α when i = j and t ≠ s, and equals zero when i ≠ j. This is the common structure in parametric panel data models under the random effects framework as well. Estimation can ignore this structure at the expense of a loss in efficiency. For example, ignoring the correlation architecture in Ω, standard kernel regression methods, such as local-constant or local-linear least-squares, could be deployed.

Parametric estimators under the random effects framework typically require the use of Ω^(−1/2) so that a generalized least squares estimator can be constructed. Inversion of the nT × nT matrix is computationally expensive, but Baltagi [2013] provides a simple approach to inverting Ω based on the spectral decomposition. For any integer r,

$$\Omega^r = \left(T\sigma_\alpha^2 + \sigma_\varepsilon^2\right)^r P + \left(\sigma_\varepsilon^2\right)^r Q, \tag{4}$$

where P = In ⊗ JT, Q = In ⊗ ET, JT is a T × T matrix where each element is equal to 1/T, and ET = IT − JT. Ω is infeasible as both σ²α and σ²ε are unknown; a variety of approaches exist to estimate the two unknown variance parameters, which we will discuss in the sequel. An important feature of Ω is that its structure is independent of m(xit). Thus, whether we model the unknown conditional mean parametrically or nonparametrically does not influence the manner in which we account for the variance structure.
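For concreteness, a minimal NumPy sketch of the spectral decomposition in Equation (4); the values of n, T, and the variance components below are assumptions for illustration only.

```python
import numpy as np

# Illustrative (assumed) values; in practice the variance components are estimated.
n, T = 4, 3
sigma2_alpha, sigma2_eps = 0.5, 1.0

J_T = np.full((T, T), 1.0 / T)        # T x T matrix with each element 1/T
E_T = np.eye(T) - J_T
P = np.kron(np.eye(n), J_T)           # P = I_n kron J_T
Q = np.kron(np.eye(n), E_T)           # Q = I_n kron E_T

def omega_power(r):
    """Compute Omega^r via the spectral decomposition in Equation (4)."""
    return (T * sigma2_alpha + sigma2_eps) ** r * P + sigma2_eps ** r * Q

Omega = omega_power(1)
Omega_inv_half = omega_power(-0.5)    # Omega^{-1/2}, used for GLS transformations

# Sanity check: Omega^{-1/2} Omega Omega^{-1/2} should equal the identity.
assert np.allclose(Omega_inv_half @ Omega @ Omega_inv_half, np.eye(n * T))
```

Because P and Q are complementary orthogonal projections, any (even fractional) power of Ω reduces to scalar powers of the two eigenvalues, which is what makes the GLS transformation cheap.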

    3.2 Local-Polynomial Weighted Least-Squares

Lin and Carroll [2000] and Henderson and Ullah [2005] were among the first attempts to properly capture the architecture of the covariance of the one-way error component model. To begin, take a pth order Taylor approximation of Equation (1) around the point x:

$$y_{it} = m(x) + \alpha_i + \varepsilon_{it},$$

where

$$m(x) = \sum_{0 \le |j| \le p} \beta_j \left(x_{it} - x\right)^j,$$

where $j = (j_1, \ldots, j_q)$, $|j| = \sum_{i=1}^{q} j_i$, $x^j = \prod_{i=1}^{q} x_i^{j_i}$, $j! = \prod_{i=1}^{q} j_i! = j_1! \times \cdots \times j_q!$, and

$$\sum_{0 \le |j| \le p} = \sum_{\ell=0}^{p} \sum_{j_1=0}^{\ell} \cdots \sum_{\substack{j_q=0 \\ j_1 + \cdots + j_q = \ell}}^{\ell},$$

where we have used the notation of Masry [1996]. $j!\,\beta_j(x)$ corresponds to $(D^j m)(x)$, the partial derivative of m(x), which is defined as

$$(D^j m)(x) \equiv \frac{\partial^{|j|} m(x)}{\partial x_1^{j_1} \cdots \partial x_q^{j_q}},$$

and β vertically concatenates βj (0 ≤ |j| ≤ p) in lexicographical order, with highest priority given to the last position, so that (0, . . . , 0, i) is the first element in the sequence and (i, 0, . . . , 0) is the last element.

Thus, the pth-order local-polynomial estimator corresponds to the minimizer of the objective function

$$\min_{\beta}\; (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \left\{ y_{it} - \sum_{0 \le |j| \le p} \beta_j \left(x_{it} - x\right)^j \right\}^2 K_{itxh}, \tag{5}$$

where $K_{itxh} = \prod_{s=1}^{q} h_s^{-1}\, k\!\left(\frac{x_{its} - x_s}{h_s}\right)$ is the standard product kernel, k(·) is any second order univariate kernel (Epanechnikov or Gaussian, e.g.), and hs is the sth element of the bandwidth vector h and smooths the sth dimension of x. Let Kx = diag(K11x, K12x, . . . , K1Tx, K21x, . . . , KnTx). Finally, collecting yit into the vector y and denoting by Ditx the matrix which vertically concatenates (xit − x)^j for 0 ≤ |j| ≤ p in lexicographical order, we use the notation Dx = [D11x, D12x, . . . , D1Tx, D21x, . . . , DnTx]′. For example, Ditx = 1 for p = 0 (the local-constant setting), and Ditx = [1, (xit − x)′]′ for p = 1 (the local-linear setting).

It can be shown that the local-polynomial estimator for the minimization problem in Equation (5) is

$$\hat{\beta}(x) = (D_x' K_x D_x)^{-1} D_x' K_x y. \tag{6}$$
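To fix ideas, a minimal sketch of Equation (6) for the local-linear case (p = 1) with a Gaussian product kernel; the simulated data and bandwidths below are assumptions for illustration.

```python
import numpy as np

def loclin_beta(x_eval, X, y, h):
    """Local-linear estimator (Equation (6), p = 1): returns (m_hat, gradient)."""
    # Product kernel weights K_itxh for each observation; X is (NT, q).
    u = (X - x_eval) / h
    K = np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h), axis=1)
    # D_x stacks D_itx = [1, (x_it - x)']' row by row.
    D = np.column_stack([np.ones(len(X)), X - x_eval])
    DK = D.T * K                       # D'_x K_x
    beta = np.linalg.solve(DK @ D, DK @ y)
    return beta[0], beta[1:]           # m_hat(x) and the first-derivative estimates

# Illustrative use with simulated data (q = 2 covariates).
rng = np.random.default_rng(42)
X = rng.uniform(size=(500, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
m_hat, grad_hat = loclin_beta(np.array([0.5, 0.5]), X, y, h=np.array([0.2, 0.2]))
```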

The local-polynomial estimator in Equation (6) ignores the covariance structure in the one-way error component model. Both Lin and Carroll [2000] and Henderson and Ullah [2005] proposed alternative weighting schemes to capture this covariance structure. Henderson and Ullah [2005] focused on the specific setting of p = 1, and thus used what they termed the local-linear weighted least-squares (LLWLS) estimator. This estimator is identical to that in Equation (6) except that the diagonal kernel weighting matrix Kx is replaced with a non-diagonal weighting matrix, designed to account for within-individual correlation. This non-diagonal weighting matrix is designated Wx and can take an array of shapes.

Ullah and Roy [1998] propose

$$W_{1x} = \Omega^{-1/2} K_x \Omega^{-1/2},$$

while Lin and Carroll [2000] propose

$$W_{2x} = \Omega^{-1} K_x$$

and

$$W_{3x} = \sqrt{K_x}\,\Omega^{-1}\sqrt{K_x},$$

resulting in the LPWLS estimator

$$\hat{\beta}_r(x) = (D_x' W_{rx} D_x)^{-1} D_x' W_{rx} y, \quad r = 1, 2, 3. \tag{7}$$

When Ω is diagonal, W1x = W2x = W3x. Both W1x and W3x are symmetric and amount to local-polynomial estimation of the transformed observations $\sqrt{K_x}\,\Omega^{-1/2} y$ on $\sqrt{K_x}\,\Omega^{-1/2} x$ and $\Omega^{-1/2} K_x y$ on $\Omega^{-1/2} K_x x$, respectively.¹

¹The most popular weighting scheme is W3x of Lin and Carroll [2000]. See Henderson and Ullah [2014] for a Monte Carlo comparison of these alternative weighting schemes.

Given that Ω is unknown, due to the presence of σ²α and σ²ε, a feasible matrix must be constructed. This can be accomplished most easily by deploying the local-polynomial least-squares estimator in Equation (6) first, obtaining the residuals, and then using the best linear unbiased predictors for these variances as provided in Baltagi [2013]:

$$\hat{\sigma}_1^2 = \frac{T}{N} \sum_{i=1}^{N} \hat{v}_{i\cdot}^2, \qquad \hat{\sigma}_\varepsilon^2 = \frac{1}{NT - N} \sum_{i=1}^{N} \sum_{t=1}^{T} \left(\hat{v}_{it} - \hat{v}_{i\cdot}\right)^2,$$

where $\hat{v}_{i\cdot} = T^{-1}\sum_{t=1}^{T} \hat{v}_{it}$ is the cross-sectional average of the residuals for cross-section i and v̂it = yit − m̂(xit) is the LPLS residual based on the first stage estimator of β(x). Here σ²₁ = Tσ²α + σ²ε in Equation (4).
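A minimal sketch of these feasible variance-component estimators, assuming the first-stage LPLS residuals are already in hand as an N × T array (the array name is illustrative).

```python
import numpy as np

def variance_components(resid):
    """Feasible variance estimates from first-stage LPLS residuals (N x T array)."""
    N, T = resid.shape
    v_bar = resid.mean(axis=1)                         # v-bar_{i.}
    sigma2_1 = (T / N) * np.sum(v_bar ** 2)            # sigma_1^2 = T*sigma_alpha^2 + sigma_eps^2
    sigma2_eps = np.sum((resid - v_bar[:, None]) ** 2) / (N * T - N)
    sigma2_alpha = (sigma2_1 - sigma2_eps) / T
    return sigma2_1, sigma2_eps, sigma2_alpha
```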

Lin and Carroll [2000] derive the bias and variance of β̂(x), while Henderson and Ullah [2005] provide the rate of convergence of β̂(x) for p = 1 under the assumption that N → ∞. Most importantly, Lin and Carroll [2000] note that the asymptotic variance of the LPWLS estimator in Equation (7) for r = 1, 2, 3 is actually larger than that of the LPLS estimator in Equation (6). While this result seems counter-intuitive, Wang [2003] explains that this is natural when T is finite. By assumption, as N → ∞, h → 0, and the kernel weighting, evaluated at the point xit, say, implies that the other points for individual i, xis, s ≠ t, will receive no weight asymptotically. This is true since, as N → ∞, we obtain information on more individuals, not more information on a given individual. Under the common assumption in the random effects framework that we have independence across individuals, this suggests that the most efficient estimator occurs when Wrx = Kx.

A more general approach for local-polynomial estimation in the presence of a specific covariance structure is found in Ruckstuhl et al. [2000] and Martins-Filho and Yao [2009]. While Martins-Filho and Yao [2009] consider a more general structure for Ω, both provide a two-step estimator which will achieve asymptotic efficiency gains relative to the LPLS estimator. The proposed estimator can be explained as follows. First, premultiply both sides of Equation (1) by Ω^(−1/2) to obtain

$$\Omega^{-1/2} y_{it} = \Omega^{-1/2} m(x_{it}) + \Omega^{-1/2} v_{it},$$

and then add and subtract m(xit) on the right hand side:

$$\Omega^{-1/2} y_{it} = \Omega^{-1/2} m(x_{it}) + m(x_{it}) - m(x_{it}) + \Omega^{-1/2} v_{it}.$$

This results in

$$\Omega^{-1/2} y_{it} - \Omega^{-1/2} m(x_{it}) + m(x_{it}) = m(x_{it}) + \Omega^{-1/2} v_{it},$$
$$\tilde{y}_{it} = m(x_{it}) + \Omega^{-1/2} v_{it},$$

where ỹit = Ω^(−1/2)yit − Ω^(−1/2)m(xit) + m(xit) = Ω^(−1/2)yit + (1 − Ω^(−1/2))m(xit). For given ỹit, m(xit) can be estimated using local-polynomial least-squares. Unfortunately, ỹit is unknown due to the presence of Ω and m(xit). Ruckstuhl et al. [2000] and Martins-Filho and Yao [2009] propose estimation of Ω in a first stage that ignores the error structure. The two-step estimator is:

1. Estimate m(xit) using local-polynomial least-squares, and obtain the residuals to construct Ω̂.

2. Run the local-polynomial least-squares regression of $\hat{\tilde{y}}_{it}$ on xit, where $\hat{\tilde{y}}_{it} = \hat{\Omega}^{-1/2} y_{it} + (1 - \hat{\Omega}^{-1/2})\hat{m}(x_{it})$.

Both Su and Ullah [2007] and Martins-Filho and Yao [2009] discuss the large sample properties of this random effects estimator.
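A compact sketch of this two-step logic, reusing the hypothetical `loclin_beta` helper from the sketch following Equation (6) and an assumed feasible estimate of Ω^(−1/2) (e.g., built via the `omega_power` sketch above).

```python
import numpy as np

def two_step_re(X, y, h, omega_inv_half):
    """Two-step random effects estimator (Ruckstuhl et al. / Martins-Filho and Yao).

    X: (NT, q) covariates; y: (NT,) response ordered with t fastest;
    omega_inv_half: feasible (NT, NT) estimate of Omega^{-1/2}, assumed to be
    built from first-stage residual variance components."""
    NT = len(y)
    # Step 1: pooled local-linear fit ignoring the error structure.
    m_hat = np.array([loclin_beta(X[i], X, y, h)[0] for i in range(NT)])
    # Step 2: transform the response and re-smooth.
    y_tilde = omega_inv_half @ y + (np.eye(NT) - omega_inv_half) @ m_hat
    m_hat2 = np.array([loclin_beta(X[i], X, y_tilde, h)[0] for i in range(NT)])
    return m_hat2
```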

    3.3 Spline-Based Estimation

Unlike Ullah and Roy [1998], who consider kernel-based procedures, Ma et al. [2015a] consider a B-spline regression approach to nonparametric modelling of a random effects (error component) model. Their focus is on the estimation of marginal effects in these models, something that has perhaps not received as much attention as it might otherwise. To describe their estimator, first, for the vector of covariates xit = (xit1, . . . , xitd)′, assume for 1 ≤ s ≤ d that each xits is distributed on a compact interval [as, bs]; without loss of generality, Ma et al. [2015a] take all intervals [as, bs] = [0, 1]. Furthermore, they allow εit to follow the random effects specification, where εi = (εi1, . . . , εiT)′ is a T × 1 vector. Then V ≡ E(εiε′i) takes the form

$$V = \sigma_v^2 I_T + \sigma_\alpha^2 \mathbf{1}_T \mathbf{1}_T',$$

where IT is an identity matrix of dimension T and 1T is a T × 1 column vector of ones. The covariance matrix for ε = (ε′1, . . . , ε′n)′ is

$$\Omega = E(\varepsilon\varepsilon') = I_N \otimes V, \qquad \Omega^{-1} = I_N \otimes V^{-1}.$$

By simple linear algebra, $V^{-1} = (V^{tt'})_{t,t'=1}^{T} = V_1 I_T + V_2 \mathbf{1}_T \mathbf{1}_T'$ with $V_1 = \sigma_v^{-2}$ and $V_2 = -\left(\sigma_v^2 + \sigma_\alpha^2 T\right)^{-1}\sigma_\alpha^2\sigma_v^{-2}$.

Ma et al. [2015a] use regression B-splines to estimate the mean function m(·) and its first derivative. Let N = Nn be the number of interior knots and let q be the spline order. Divide [0, 1] into (N + 1) subintervals Ij = [rj, rj+1), j = 0, . . . , N − 1, and IN = [rN, 1], where $\{r_j\}_{j=1}^{N}$ is a sequence of interior knots, given as

$$r_{-(q-1)} = \cdots = r_0 = 0 < r_1 < \cdots < r_N < 1 = r_{N+1} = \cdots = r_{N+q}.$$

Define the qth order B-spline basis as Bs,q = {Bj(xs) : 1 − q ≤ j ≤ N}′ [de Boor, 2001, Page 89]. Let $G_{s,q} = G_{s,q}^{(q-2)}$ be the space spanned by Bs,q, and let Gq be the tensor product of G1,q, . . . , Gd,q, which is the space of functions spanned by

$$B_q(x) = B_{1,q} \otimes \cdots \otimes B_{d,q} = \left\{ \prod_{s=1}^{d} B_{j_s,q}(x_s) : 1 - q \le j_s \le N,\; 1 \le s \le d \right\}'_{K_n \times 1} = \left[\left\{ B_{j_1,\ldots,j_d,q}(x) : 1 - q \le j_s \le N,\; 1 \le s \le d \right\}'\right]_{K_n \times 1},$$

where x = (x1, . . . , xd)′ and Kn = (N + q)^d. Let $B_q = \left[\{B_q(x_{11}), \ldots, B_q(x_{nT})\}'\right]_{nT \times K_n}$, where xit = (xit1, . . . , xitd)′. Then m(x) can be approximated by Bq(x)′β, where β is a Kn × 1 vector. Letting $Y = \left[\{(Y_{it})_{1 \le t \le T, 1 \le i \le n}\}'\right]_{nT \times 1}$, they estimate β by minimizing the weighted least squares criterion

$$\{Y - B_q\beta\}'\,\Omega^{-1}\,\{Y - B_q\beta\}.$$

The estimator of β, β̂, solves the estimating equations $B_q'\Omega^{-1}\{Y - B_q\beta\} = 0$, which gives the GLS estimator

$$\hat{\beta} = \left(B_q'\Omega^{-1}B_q\right)^{-1}B_q'\Omega^{-1}Y.$$

The estimator of m(x) is then given by m̂(x) = Bq(x)′β̂. In de Boor [2001], Page 116, it is shown that the first derivative of a spline function can be expressed in terms of a spline of one order lower. For any function s(x) ∈ Gq that can be expressed by $s(x) = \sum_{j_1,\ldots,j_d} a_{j_1,\ldots,j_d} B_{j_1,q}(x_1)\cdots B_{j_d,q}(x_d)$, the first derivative of s(x) with respect to xs is

$$\frac{\partial s}{\partial x_s}(x) = \sum_{j_s=2-q}^{N} \;\sum_{\substack{1-q \le j_{s'} \le N \\ 1 \le s' \ne s \le d}} a^{(1_s)}_{j_1,\ldots,j_d}\, B_{j_s,q-1}(x_s) \prod_{s' \ne s} B_{j_{s'},q}(x_{s'}),$$

in which $a^{(1_s)}_{j_1,\ldots,j_d} = (q-1)\left(a_{j_1,\ldots,j_s,\ldots,j_d} - a_{j_1,\ldots,j_s-1,\ldots,j_d}\right)/\left(t_{j_s+q-1} - t_{j_s}\right)$, for 2 − q ≤ js ≤ N and 1 ≤ s′ ≠ s ≤ d, 1 − q ≤ js′ ≤ N. Let Ln = (N + q)^(d−1)(N + q − 1), and

$$B_{s,q-1}(x) = \left[\left\{ B_{j_1,q}(x_1)\cdots B_{j_s,q-1}(x_s)\cdots B_{j_d,q}(x_d) \right\}_{\substack{1-q \le j_{s'} \le N,\, s' \ne s \\ 2-q \le j_s \le N}}\right]_{L_n \times 1}.$$

For 1 ≤ s ≤ d, ∂m/∂xs(x), the first derivative of m(x) with respect to xs, is estimated as

$$\widehat{\frac{\partial m}{\partial x_s}}(x) = B_{s,q-1}(x)'\, D_s \left(B_q'\Omega^{-1}B_q\right)^{-1}B_q'\Omega^{-1}Y,$$

in which $D_s = \left\{ I_{(N+q)^{s-1}} \otimes M_1 \otimes I_{(N+q)^{d-s}} \right\}_{L_n \times K_n}$, and

$$M_1 = (q-1)\begin{pmatrix} \frac{-1}{t_1 - t_{2-q}} & \frac{1}{t_1 - t_{2-q}} & 0 & \cdots & 0 \\ 0 & \frac{-1}{t_2 - t_{3-q}} & \frac{1}{t_2 - t_{3-q}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{-1}{t_{N+q-1} - t_N} & \frac{1}{t_{N+q-1} - t_N} \end{pmatrix}_{(N+q-1) \times (N+q)}.$$

Let ∇m(x) be the gradient vector of m(x). The estimator of ∇m(x) is

$$\widehat{\nabla m}(x) = \left\{ \widehat{\frac{\partial m}{\partial x_1}}(x), \ldots, \widehat{\frac{\partial m}{\partial x_d}}(x) \right\}' = B^*_{q-1}(x)'\left(B_q'\Omega^{-1}B_q\right)^{-1}B_q'\Omega^{-1}Y,$$

in which $B^*_{q-1}(x) = \left[\left\{ D_1' B_{1,q-1}(x), \ldots, D_d' B_{d,q-1}(x) \right\}\right]_{K_n \times d}$. For any µ ∈ (0, 1], we denote by $C^{0,\mu}[0,1]^d$ the space of order-µ Hölder continuous functions on [0, 1]^d, i.e.,

$$C^{0,\mu}[0,1]^d = \left\{ \phi : \|\phi\|_{0,\mu} = \sup_{x \ne x',\; x, x' \in [0,1]^d} \frac{|\phi(x) - \phi(x')|}{\|x - x'\|_2^{\mu}} < +\infty \right\},$$

in which $\|x\|_2 = \left(\sum_{s=1}^{d} x_s^2\right)^{1/2}$ is the Euclidean norm of x, and ‖φ‖0,µ is the C^(0,µ)-norm of φ. Given a d-tuple α = (α1, . . . , αd) of non-negative integers, let [α] = α1 + · · · + αd and let D^α denote the differential operator defined by $D^{\alpha} = \frac{\partial^{[\alpha]}}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}}$.

Ma et al. [2015a] establish consistency and asymptotic normality for the estimators m̂(x) and ∇̂m(x), i.e.,

$$\sigma_n^{-1}(x)\left\{\hat{m}(x) - m(x)\right\} \longrightarrow N(0, 1)$$

and

$$\Phi_n^{-1/2}(x)\left\{\widehat{\nabla m}(x) - \nabla m(x)\right\} \longrightarrow N(0_d, I_d),$$

in which 0d is a d × 1 vector of zeros, and where $\sigma_n^2(x) = B_q(x)'\Sigma^{-1}B_q(x)$ and $\Phi_n(x) = \left\{B^*_{q-1}(x)'\Sigma^{-1}B^*_{q-1}(x)\right\}_{d \times d}$.

    Ma et al. [2015a] use cross-validation to determine the degree vector for their B-spline method,and the approach admits discrete covariates; see Ma et al. [2015b] for details.

    Although the description above may be notationally cumbersome, the approach is in factextremely simple and requires only a few lines of code for its implementation.
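To illustrate, a minimal univariate (d = 1) sketch of the B-spline GLS estimator using NumPy and SciPy; the knot placement, spline order, and variance components below are assumptions for illustration, and for d > 1 one would form tensor products of the univariate bases.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, T, q = 100, 4, 4                      # q = 4: cubic B-splines (assumed order)
sigma2_v, sigma2_alpha = 1.0, 0.5        # assumed variance components

# One-way error component data on [0, 1] (illustrative DGP).
x = rng.uniform(size=(n, T))
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=n)
y = np.sin(2 * np.pi * x) + alpha[:, None] + rng.normal(scale=np.sqrt(sigma2_v), size=(n, T))

# Knot vector with repeated boundary knots and N interior knots.
N_knots = 5
t = np.concatenate([np.zeros(q), np.linspace(0, 1, N_knots + 2)[1:-1], np.ones(q)])
B = BSpline.design_matrix(x.ravel(), t, q - 1).toarray()   # nT x (N + q) basis

# Omega^{-1} = I_n kron V^{-1}, with V^{-1} = V1*I_T + V2*1_T 1_T'.
V1 = 1.0 / sigma2_v
V2 = -sigma2_alpha / (sigma2_v * (sigma2_v + sigma2_alpha * T))
V_inv = V1 * np.eye(T) + V2 * np.ones((T, T))
Omega_inv = np.kron(np.eye(n), V_inv)

# GLS estimator beta-hat = (B' Omega^{-1} B)^{-1} B' Omega^{-1} Y.
beta_hat = np.linalg.solve(B.T @ Omega_inv @ B, B.T @ Omega_inv @ y.ravel())
m_hat = B @ beta_hat                      # fitted values of m(x) at the sample points
```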

    3.4 Profile Likelihood Estimation

Profile likelihood methods are often used when traditional maximum likelihood methods fail. This is common in nonparametric models where the unknown function is treated as an infinite dimensional parameter. These methods commonly require specifying a criterion function based around an assumption of Gaussianity of the error term.² In the present setting we would have, for individual i, the criterion function

$$L_i(\cdot) = L(y_i, m(x_i)) = -\tfrac{1}{2}\,(y_i - m(x_i))'\, V_i^{-1}\, (y_i - m(x_i)),$$

where yi = (yi1, yi2, . . . , yiT)′ and m(xi) = (m(xi1), m(xi2), . . . , m(xiT))′.

²Note that the assumption of Gaussianity is only required to construct a criterion function.

Differentiating Li(·) with respect to m(xit) yields

$$L_{itm} = \frac{\partial L_i}{\partial m(x_{it})} = e_t' V_i^{-1} (y_i - m(x_i)) = \sum_{s=1}^{T} \sigma^{ts}\,(y_{is} - m(x_{is})), \tag{8}$$

where et is a T dimensional vector whose tth element is unity and all other elements are zero and where σ^(ts) is the (t, s) element of V_i^(−1).³

³σ^(tt) and σ^(ts) will differ across cross-sectional units in the presence of an unbalanced panel.

Lin and Carroll [2006] show that m(x) can be estimated in a local-linear fashion by solving the first order condition

$$0 = \sum_{i=1}^{N} \sum_{t=1}^{T} K_{itxh}\, G_{itx}\, L_{itm}(y_i, \check{m}(x_i)),$$

where Gitx vertically concatenates $(x_{it} - x)^j / h^j$ for 0 ≤ |j| ≤ p in lexicographical order, $\check{m}(x_i) = (\hat{m}(x_{i1}), \ldots, \hat{m}(x) + \check{x}_{it}\hat{\beta}(x), \ldots, \hat{m}(x_{iT}))$, and x̌it = xit − x. Note that the argument of Litm is m̂(xis) for s ≠ t and m̂(x) + x̌itβ̂(x) for s = t. Plugging into Equation (8) and solving yields

$$L_{itm}(y_i, \check{m}(x_i)) = \sigma^{tt}\left(y_{it} - \hat{m}(x) - \check{x}_{it}\hat{\beta}(x)\right) + \sum_{\substack{s=1 \\ s \ne t}}^{T} \sigma^{ts}\left(y_{is} - \hat{m}(x_{is})\right).$$

Wang [2003] developed an iterative procedure to estimate m(x). This estimator is shown to produce efficiency gains relative to the local-linear estimator of Lin and Carroll [2000]. The iterative estimator is composed of two parts: one based on a standard local-linear (or polynomial) least-squares regression between y and x, and a secondary component that uses the residuals.

The first stage estimator is constructed using any consistent estimator of the conditional mean; the pooled LLLS estimator suffices in this setting. To highlight the fact that we have an iterative estimator, we will refer to our first stage estimator as m̂[1](x) (the subscript [1] represents that we are at the ℓ = 1 step); the residuals from this model are given by v̂[1]it = yit − m̂[1](xit). At the ℓth step, m̂[ℓ](x) and the gradient β̂[ℓ](x) are shown by Wang [2003] to be

$$\begin{pmatrix} \hat{m}_{[\ell]}(x) \\ \hat{\beta}_{[\ell]}(x) \end{pmatrix} = J_1^{-1}(J_2 + J_3),$$

where

$$J_1 = \sum_{i=1}^{N} \sum_{t=1}^{T} \sigma^{tt} K_{itxh}\, G_{itx} G_{itx}',$$

$$J_2 = \sum_{i=1}^{N} \sum_{t=1}^{T} \sigma^{tt} K_{itxh}\, G_{itx}\, y_{it},$$

$$J_3 = \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{\substack{s=1 \\ s \ne t}}^{T} \sigma^{st} K_{itxh}\, G_{itx}\, \hat{v}_{[\ell-1]is}.$$

If one ignores the presence of J3, then the estimator of Wang [2003] is nearly identical to the pooled LLLS estimator, outside of the presence of σ^(tt) (which only has an impact if there is an unbalanced panel). The contribution of J3 is what provides asymptotic gains in efficiency. J3 effectively contributes the covariance among the within-cluster residuals to the overall smoothing. To see why this is the case, consider the LLWLS estimator in Equation (7). Even though the within-cluster covariance is included via Wrx, asymptotically this effect does not materialize since it appears in both the numerator and denominator in an identical fashion. For the estimator of Wang [2003], this effect materializes given that the within-cluster covariance only occurs in the numerator (i.e., through J3).

The Wang [2003] estimator is iterated until convergence is achieved; for example, a useful convergence criterion is

$$\sum_{i=1}^{N}\sum_{t=1}^{T}\left\{\hat{m}_{[\ell]}(x_{it}) - \hat{m}_{[\ell-1]}(x_{it})\right\}^2 \Big/ \sum_{i=1}^{N}\sum_{t=1}^{T} \hat{m}_{[\ell-1]}(x_{it})^2 < \omega,$$

where ω is some small number. Wang [2003] demonstrates that the estimator usually converges in only a few iterations and argues that the once-iterated (ℓ = 2) estimator has the same asymptotic behavior as the fully iterated estimator.

Feasible estimation requires estimates of σ^(ts), which can be obtained using the residuals from m̂[1](x). The (t, s)th element is

$$\sigma^{ts} = \frac{\sigma_\varepsilon^2 - \sigma_1^2}{\sigma_1^2 \sigma_\varepsilon^2 T},$$

and the (t, t)th element is

$$\sigma^{tt} = \frac{\sigma_\varepsilon^2 + (T-1)\sigma_1^2}{\sigma_1^2 \sigma_\varepsilon^2 T}.$$
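A minimal sketch of the Wang [2003] iteration for a single covariate and p = 1; the Gaussian kernel, bandwidth, variance components, and tolerance are assumptions for illustration, and the once-iterated estimator would stop after a single pass of the loop.

```python
import numpy as np

def wang_iterate(x, y, h, sig2_1, sig2_eps, tol=1e-6, max_iter=50):
    """Wang (2003) iterative random effects estimator, q = 1, local-linear.

    x, y: (N, T) arrays; h: bandwidth; sig2_1 = T*sig2_alpha + sig2_eps."""
    N, T = y.shape
    s_tt = (sig2_eps + (T - 1) * sig2_1) / (sig2_1 * sig2_eps * T)  # diag of V^{-1}
    s_ts = (sig2_eps - sig2_1) / (sig2_1 * sig2_eps * T)            # off-diag of V^{-1}

    def fit(pt, resid=None):
        K = np.exp(-0.5 * ((x - pt) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        G = np.stack([np.ones_like(x), (x - pt) / h], axis=-1)      # G_itx, (N, T, 2)
        w = s_tt * K
        J1 = np.einsum('it,itj,itk->jk', w, G, G)
        J2 = np.einsum('it,itj,it->j', w, G, y)
        if resid is None:
            return np.linalg.solve(J1, J2)[0]                       # pooled first stage
        # J3 adds the within-cluster residual covariance: sum over s != t.
        r_sum = s_ts * (resid.sum(axis=1, keepdims=True) - resid)
        J3 = np.einsum('it,itj,it->j', K, G, r_sum)
        return np.linalg.solve(J1, J2 + J3)[0]

    m_hat = np.array([[fit(x[i, t]) for t in range(T)] for i in range(N)])
    for _ in range(max_iter):
        m_new = np.array([[fit(x[i, t], y - m_hat) for t in range(T)] for i in range(N)])
        if np.sum((m_new - m_hat) ** 2) / np.sum(m_hat ** 2) < tol:
            return m_new
        m_hat = m_new
    return m_hat
```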

    4 Estimation in the Fixed Effects Framework

    4.1 Differencing/Transformation Methods

Consider first differencing Equation (1), which leads to

$$y_{it} - y_{it-1} = m(x_{it}) - m(x_{it-1}) + \varepsilon_{it} - \varepsilon_{it-1}, \quad i = 1, \ldots, N, \; t = 2, \ldots, T.$$

By assuming a fixed number of derivatives of m(x) to exist, Ullah and Roy [1998] posited that the derivatives of m(x) would be identified and easily estimated using local-linear (polynomial) regression. For example, consider q = 1, and use the notation Δzit = zit − zit−1, resulting in

$$\Delta y_{it} = m(x_{it}) - m(x_{it-1}) + \Delta\varepsilon_{it}.$$

Next, a first order Taylor expansion of m(xit) and m(xit−1) around the point x results in

$$\begin{aligned} \Delta y_{it} &= m(x) + m'(x)(x_{it} - x) - \left(m(x) + m'(x)(x_{it-1} - x)\right) + \Delta\varepsilon_{it} \\ &= m(x) - m(x) + m'(x)\left((x_{it} - x) - (x_{it-1} - x)\right) + \Delta\varepsilon_{it} \\ &= \Delta x_{it}\, m'(x) + \Delta\varepsilon_{it}. \end{aligned}$$

This same argument could also be made using the within transformation. The local-linear estimator of m′(x) as proposed in Lee and Mukherjee [2014] is

$$\hat{m}'(x) = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh}\,\Delta x_{it}\,\Delta y_{it}}{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh}\,\Delta x_{it}^2}. \tag{9}$$

There are two main problems with this differencing estimator. First, the conditional mean is not identified in this setting. Second, as Lee and Mukherjee [2014] demonstrate, the local-linear estimator possesses a non-vanishing asymptotic bias even as h → 0. Note that in the construction of Equation (9), the linear expansion is around the point xit (which is acceptable in a cross-sectional setting). However, in the panel setting, information from the same individual (i.e., xis, s ≠ t) cannot be controlled as h → 0. In particular, we took the Taylor expansion around both the points xit and xit−1 in our first differencing; however, the kernel weights only account for the difference between x and xit. This means that the Taylor approximation error will not decay to zero as the bandwidth decays.
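A minimal sketch of the differencing estimator in Equation (9) for q = 1, with an assumed Gaussian kernel and illustrative array shapes.

```python
import numpy as np

def fd_derivative(x, y, pt, h):
    """First-difference local-linear derivative estimator (Equation (9)).

    x, y: (N, T) panel arrays; pt: evaluation point; h: bandwidth."""
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)       # deltas for t = 2, ..., T
    K = np.exp(-0.5 * ((x[:, 1:] - pt) / h) ** 2)         # kernel weight at x_it
    return np.sum(K * dx * dy) / np.sum(K * dx ** 2)
```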

Several approaches have been proposed to overcome the non-vanishing asymptotic bias. For example, Mundra [2005] considers local-linear estimation around the pair (xit, xit−1), which produces the estimator (termed the first-difference local-linear estimator, or FDLL)

$$\hat{m}'_{FDLL}(x) = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh} K_{it-1xh}\,\Delta x_{it}\,\Delta y_{it}}{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh} K_{it-1xh}\,\Delta x_{it}^2}.$$

Even with this simple fix, however, the issue remains that only the derivatives of the conditional mean are identified. In some settings this is acceptable. For example, in the hedonic price setting the gradients of the conditional mean are of interest because they can be used to recover preferences of individuals. Bishop and Timmins [2016] use this insight and follow the logic above (albeit using the within transformation) to recover the gradients of a hedonic price function to value preferences for clean air in the San Francisco Bay area of California.

As an alternative to the first-difference transformation, the within transformation could also be applied, as in Equation (2). Again, as noted in Lee and Mukherjee [2014], direct local-linear estimation in this framework ignores the application of the Taylor approximation and results in an asymptotic bias. To remedy this, Lee and Mukherjee [2014] propose the local within transformation. To see how this estimator works, consider that the standard within transformation calculates the individual specific mean of a variable z as $\bar{z}_{i\cdot} = T^{-1}\sum_{t=1}^{T} z_{it}$. Lee and Mukherjee [2014] replace the uniform 1/T weighting with kernel weights, producing the local individual specific means

$$\tilde{\bar{z}}_{i\cdot} = \frac{\sum_{t=1}^{T} K_{itzh}\, z_{it}}{\sum_{s=1}^{T} K_{iszh}} = \sum_{t=1}^{T} w_{itzh}\, z_{it}.$$

Using the notation $z^*_{it} = z_{it} - \tilde{\bar{z}}_{i\cdot}$, the locally-within transformed local-constant estimator is

$$\hat{m}'_{LWTLC}(x) = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh}\, x^*_{it}\, y^*_{it}}{\sum_{i=1}^{N}\sum_{t=2}^{T} K_{itxh}\, x^{*2}_{it}}.$$

These local differencing and transformation methods are simple to implement. However, at present, it is not clear if they can be used to recover the conditional mean, which would be important in applications where forecasting or prediction is desirable.
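A sketch of the locally-within transformed local-constant estimator under the same illustrative conventions (Gaussian kernel, q = 1); centering the demeaning weights at the evaluation point is an assumption of this sketch.

```python
import numpy as np

def lwt_derivative(x, y, pt, h):
    """Lee and Mukherjee (2014) locally-within transformed local-constant estimator.

    x, y: (N, T) panel arrays; pt: evaluation point; h: bandwidth."""
    def local_demean(z, w):
        # Kernel-weighted individual-specific means replace the uniform 1/T weights.
        return z - (w * z).sum(axis=1, keepdims=True) / w.sum(axis=1, keepdims=True)

    Kz = np.exp(-0.5 * ((x - pt) / h) ** 2)           # weights for the local means
    xs, ys = local_demean(x, Kz), local_demean(y, Kz)
    K = Kz[:, 1:]                                      # K_itxh for t = 2, ..., T
    return np.sum(K * xs[:, 1:] * ys[:, 1:]) / np.sum(K * xs[:, 1:] ** 2)
```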

    4.2 Profile Estimation

The obvious drawback of the differencing/transformation approaches proposed by Ullah and Roy [1998], Mundra [2005], and Lee and Mukherjee [2014] is that the conditional mean cannot be identified. To remedy this, Henderson et al. [2008] consider estimation of Equation (1) under the fixed effects framework using the first difference transformation based on period 1:

$$\tilde{y}_{it} \equiv y_{it} - y_{i1} = m(x_{it}) - m(x_{i1}) + \varepsilon_{it} - \varepsilon_{i1}. \tag{10}$$

While this estimator could be implemented using period t − 1, as is more common, we follow their approach here. The benefit of this transformation is that, indeed, the fixed effects are removed, and, under exogeneity of the covariates, E[m(xit)] = E(yit). The problem as it stands with Equation (10) is the presence of both m(xit) and m(xi1). The main idea of Henderson et al. [2008] is to exploit the variance structure of εit − εi1 when constructing the estimator, which is similar to Wang [2003]'s approach in the random effects setting.

Start by defining ε̃it = εit − εi1 and ε̃i = (ε̃i2, . . . , ε̃iT)′. The variance-covariance matrix of ε̃i, Vi = Cov(ε̃i|xi1, . . . , xiT), is defined as

$$V_i = \sigma_\varepsilon^2\left(I_{T-1} + i_{T-1} i_{T-1}'\right),$$

where IT−1 is an identity matrix of dimension T − 1, and iT−1 is a (T − 1) × 1 vector of ones (note that in the random effects case it was of dimension T). Further, $V_i^{-1} = \sigma_\varepsilon^{-2}\left(I_{T-1} - i_{T-1}i_{T-1}'/T\right)$. Following Wang [2003] and Lin and Carroll [2006], Henderson et al. [2008] deploy a profile likelihood approach to estimate m(·). The criterion function for individual i is

$$L_i(\cdot) = L(y_i, m(x_i)) = -\tfrac{1}{2}\left(\tilde{y}_i - m(x_i) + m(x_{i1})\, i_{T-1}\right)' V_i^{-1} \left(\tilde{y}_i - m(x_i) + m(x_{i1})\, i_{T-1}\right), \tag{11}$$

where ỹi = (ỹi2, . . . , ỹiT)′ and m(xi) = (m(xi2), m(xi3), . . . , m(xiT))′.

As in the construction of Wang [2003]'s random effects estimator, define Litm = ∂Li(·)/∂m(xit). From Equation (11) we have

$$L_{i1m} = -i_{T-1}' V_i^{-1}\left(\tilde{y}_i - m(x_i) + m(x_{i1})\, i_{T-1}\right);$$
$$L_{itm} = c_{t-1}' V_i^{-1}\left(\tilde{y}_i - m(x_i) + m(x_{i1})\, i_{T-1}\right) \quad \text{for } t \ge 2,$$

where ct−1 is a vector of dimension (T − 1) × 1 with the (t − 1) element being 1 and all other elements being 0.

We estimate the unknown function m(x) by solving the first order condition

$$0 = \sum_{i=1}^{N} \sum_{t=1}^{T} K_{itxh}\, G_{itx}\, L_{itm}\!\left[y_i, \hat{m}(x_{i1}), \ldots, \hat{m}(x) + \check{x}_{it}\hat{\beta}(x), \ldots, \hat{m}(x_{iT})\right],$$

where the argument of Litm is m̂(xis) for s ≠ t and m̂(x) + x̌itβ̂(x) when s = t.

As in Wang [2003], an iterative procedure is required. Denote the estimate of m(x) at the [ℓ − 1]th step as m̂[ℓ−1](x). Then the current estimates of m(x) and its derivative β(x) are m̂[ℓ](x) and β̂[ℓ](x), which solve

$$0 = \sum_{i=1}^{N} \sum_{t=1}^{T} K_{itxh}\, G_{itx}\, L_{itm}\!\left[y_i, \hat{m}_{[\ell-1]}(x_{i1}), \ldots, \hat{m}_{[\ell]}(x) + \check{x}_{it}\hat{\beta}_{[\ell]}(x), \ldots, \hat{m}_{[\ell-1]}(x_{iT})\right].$$

Henderson et al. [2008] use the restriction $\sum_{i=1}^{N}\sum_{t=1}^{T}\left(y_{it} - \hat{m}(x_{it})\right) = 0$ so that m(·) is uniquely defined, since E(yit) = E[m(xit)].

The next step estimator is

$$\left(\hat{m}_{[\ell]}, \hat{\beta}_{[\ell]}\right)' = D_1^{-1}(D_2 + D_3),$$

where

$$D_1 = \frac{T-1}{T\sigma_\varepsilon^2} \sum_{i=1}^{N}\sum_{t=1}^{T} K_{itxh}\, G_{itx} G_{itx}',$$

$$D_2 = \frac{T-1}{T\sigma_\varepsilon^2} \sum_{i=1}^{N}\sum_{t=1}^{T} K_{itxh}\, G_{itx}\, \hat{m}_{[\ell-1]}(x_{it}),$$

$$D_3 = \sum_{i=1}^{N}\sum_{t=2}^{T} \left( K_{itxh}\, G_{itx}\, c_{t-1}' - K_{i1xh}\, G_{i1x}\, i_{T-1}' \right) V_i^{-1} H_{i,[\ell-1]},$$

and

$$H_{i,[\ell-1]} = \begin{pmatrix} \varepsilon^{[\ell-1]}_{i2} \\ \vdots \\ \varepsilon^{[\ell-1]}_{iT} \end{pmatrix} - \varepsilon^{[\ell-1]}_{i1}\, i_{T-1},$$

where $\varepsilon^{[\ell-1]}_{is} = y_{is} - \hat{m}_{[\ell-1]}(x_{is})$ are the differenced residuals. Henderson and Parmeter [2015] note that the derivative estimator of Henderson et al. [2008] is incorrect; β̂[ℓ] defined above needs to be divided by h to produce the correct vector of first derivatives of m̂[ℓ](x). Regarding the asymptotic properties of this profile likelihood estimator, Henderson et al. [2008] only provide a sketch of the form of the asymptotic bias, variance, and asymptotic normality, whereas Li and Liang [2015], using the theory of Mammen et al. [2009], provide a full derivation of asymptotic normality and also demonstrate the robustness of the estimator to which period is used for differencing.

Recall that the estimator proposed by Wang [2003] required a consistent initial estimator of Vi to be operational. There, setting Vi to be an identity matrix resulted in the pooled local-linear least-squares estimator as an ideal choice. The same is true here; if we replace Vi by an identity matrix, Equation (10) is an additive model with the restriction that the two additive functions have the same functional form. Either a fourth-order polynomial or a series estimator can be used in this setting to construct an initial consistent estimator of m(·) and, subsequently, of Vi. The variance parameter σ²ε can be consistently estimated by

$$\hat{\sigma}_\varepsilon^2 = \frac{1}{2NT - 2N} \sum_{i=1}^{N}\sum_{t=2}^{T} \left( y_{it} - y_{i1} - \{\hat{m}(x_{it}) - \hat{m}(x_{i1})\} \right)^2.$$

However, note that σ̂²ε is only necessary in order to estimate the covariance matrix of m̂(x). It is not necessary for the construction of (m̂[ℓ], β̂[ℓ]) given that σ̂²ε simply cancels in the construction of D₁⁻¹(D₂ + D₃).

4.3 Marginal Integration

An alternative to the iterative procedure of Henderson et al. [2008] is to estimate the model in Equation (10) using marginal integration. This was proposed by Qian and Wang [2012]. To describe their estimator we first restate the first-differenced regression model as follows:

$$\Delta y_{it} \equiv y_{it} - y_{it-1} = m(x_{it}) - m(x_{it-1}) + \Delta\varepsilon_{it}. \tag{12}$$

Qian and Wang [2012] suggest estimating the model in Equation (12) by estimating

$$\Delta y_{it} = m(x_{it}, x_{it-1}) + \Delta\varepsilon_{it}$$

and then integrating out xit−1 to obtain an estimator of m(xit). The benefit of this approach is that any standard nonparametric estimator can be used, such as local-polynomial least-squares. Consider our earlier discussion of local-polynomial least-squares estimation, where the estimator was defined in Equation (6). Now, instead of estimation at the point x, we have estimation at the pair (x, z), resulting in the estimator

$$\hat{\beta}(x, z) = (D_{xz}' K_{xz} D_{xz})^{-1} D_{xz}' K_{xz}\, \Delta y.$$

Here we have used the notation Kxz = KxKz, and Dxz is the same vertically concatenated matrix, but now combined over the points x and z. Regardless of p, m̂(x, z) = e′1β̂(x, z), where e1 is a vector with 1 in the first position and zeros everywhere else. Once this estimator has been constructed, we can estimate m(x) from m̂(x, z) by integrating out z. The easiest way to do this is

$$\hat{m}(x) = (NT)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{m}(x, x_{it}). \tag{13}$$
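A sketch of the integration step in Equation (13), assuming a bivariate estimate m̂(x, z) is already available as a callable; the `m_hat_xz` helper below is hypothetical, and any standard nonparametric fit of Δy on (xit, xit−1) would do.

```python
import numpy as np

def marginal_integration(x_eval, X_panel, m_hat_xz):
    """Integrate out the second argument of m_hat(x, z) as in Equation (13).

    x_eval: evaluation point; X_panel: (N, T) array of covariates;
    m_hat_xz: callable returning the bivariate estimate m_hat(x, z)."""
    # Average the counterfactual evaluations m_hat(x, x_it) over all NT points.
    return np.mean([m_hat_xz(x_eval, z) for z in X_panel.ravel()])
```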

There are two main issues with the marginal integration estimator of Qian and Wang [2012]. First, given the well known curse of dimensionality, estimation of m(x, z) is likely to be plagued by bias if q is large and/or NT is relatively small. Second, notice that the marginal integration estimator in Equation (13) requires counterfactual construction, which implies that evaluating the estimator at a single point requires NT function evaluations, and so (NT)² evaluations are needed to estimate the function at all data points. Even for moderately sized N and T this could prove computationally expensive. In simulations, Gao and Li [2013] report that the marginal integration estimator takes substantial amounts of time to compute even for small N and T.

Even given these two drawbacks, the marginal integration estimator of Qian and Wang [2012] has much to offer. First, given that the only transformation required is first-differencing, this estimator can easily be implemented in any software that can conduct kernel smoothing and allow the construction of counterfactuals. Moreover, this estimator does not require iteration or an arbitrary initial consistent estimator. Both of these advantages may lead to the increasing adoption of this estimator in the future. Qian and Wang [2012] prove asymptotic normality of the estimator for the local-linear setting. In Monte Carlo simulations they demonstrate that the estimator works well, and they also show that the marginal integration estimator outperforms the profile likelihood estimator of Henderson et al. [2008].

    4.4 Profile Least-Squares

Gao and Li [2013], Li et al. [2013], and Lin et al. [2014], following Su and Ullah [2006] and Sun et al. [2009], propose estimation of the model in Equation (1) through profile least squares. Assuming the data are ordered so that t is the fast index, the profile least-squares estimator of Li et al. [2013] and Lin et al. [2014] begins by assuming that αi is known. In the local-linear setting, M(x) = (m(x), h ⊙ ṁ(x)′)′ (where ⊙ represents Hadamard multiplication) is estimated from

$$M_\alpha(x) = \arg\min_{M \in \mathbb{R}^{q+1}} \left( Y - D\alpha - D_x M \right)' K_x \left( Y - D\alpha - D_x M \right), \tag{14}$$

where D = (In ⊗ iT) · dn, dn = [−in−1, In−1]′, and in is an n × 1 vector of ones. D is introduced in such a way as to ensure that $N^{-1}\sum_{i=1}^{N} \alpha_i = 0$, a necessary identification condition. Define the smoothing operator

$$S(x) = (D_x' K_x D_x)^{-1} D_x' K_x,$$

and the estimator which solves the minimization problem in Equation (14) is

$$\hat{M}_\alpha(x) = S(x)\,\ddot{\varepsilon},$$

where ε̈ = Y − Dα. M̂α(x) contains the estimator of the conditional mean, m̂α(x), as well as the q × 1 vector of first derivatives scaled by the appropriate bandwidth, h ⊙ m̂̇α(x)′. Define s(x)′ = e′S(x) with e = (1, 0, . . . , 0)′ the (q + 1) × 1 vector. Then m̂α(x) = s(x)′ε̈.

Once the estimator of m(x) is obtained, α is estimated through profile least squares from

$$\hat{\alpha} = \arg\min_{\alpha} \left( Y - D\alpha - \hat{m}_\alpha(x) \right)' K_x \left( Y - D\alpha - \hat{m}_\alpha(x) \right),$$

with m̂α(x) = (m̂α(x11), m̂α(x12), . . . , m̂α(x1T), m̂α(x21), . . . , m̂α(xNT))′. Lin et al. [2014] show that the parametric estimator which solves the profile least squares problem is

$$\hat{\alpha} = \left(\tilde{D}'\tilde{D}\right)^{-1}\tilde{D}'\tilde{Y},$$

with D̃ = (INT − S)D and Ỹ = (INT − S)Y. Here S = (s(x11), s(x12), . . . , s(x1T), s(x21), . . . , s(xNT))′. Finally, $\hat{\alpha}_1 = -\sum_{i=2}^{n} \hat{\alpha}_i$.

The profile least-squares estimator for M(x) is given by

$$\hat{M}(x) = \hat{M}_{\hat{\alpha}}(x) = S(x)\,\hat{\ddot{\varepsilon}}, \tag{15}$$

with $\hat{\ddot{\varepsilon}} = Y - D\hat{\alpha}$.

Su and Ullah [2006] discuss the asymptotic properties of the profile least squares estimator in the context of the partially linear model, Sun et al. [2009] for the smooth coefficient model, and Gao and Li [2013], Li et al. [2013], and Lin et al. [2014] in the fully nonparametric model setting. The elegance of the profile least-squares estimator is that neither marginal integration techniques nor iteration are required. This represents a computationally simple alternative to the other estimators previously discussed. To our knowledge, only Qian and Wang [2012] and Gao and Li [2013] have compared the profile least-squares estimator in a fully nonparametric setting. Gao and Li [2013] run an extensive set of simulations comparing the profile least-squares estimator, the profile likelihood estimator of Henderson et al. [2008], and the marginal integration estimator of Qian and Wang [2012], finding that the profile least-squares estimator outperforms the other estimators in a majority of the settings considered. Lastly, we know of no application using the profile least-squares approach to estimate the conditional mean nonparametrically, which would be a true test of its applied appeal.
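A schematic sketch of these profile least-squares steps for a single covariate; the local-linear smoother, Gaussian kernel, and data layout are assumptions for illustration.

```python
import numpy as np

def profile_ls(X, Y, n, T, h):
    """Profile least-squares sketch (after Li et al. 2013; Lin et al. 2014), q = 1.

    X, Y: (nT,) arrays ordered with t the fast index; h: bandwidth."""
    NT = n * T
    # D = (I_n kron i_T) d_n with d_n = [-i_{n-1}, I_{n-1}]', so the alphas sum to 0.
    d_n = np.vstack([-np.ones((1, n - 1)), np.eye(n - 1)])
    D = np.kron(np.eye(n), np.ones((T, 1))) @ d_n               # NT x (n-1)

    def s_row(x0):
        # s(x)' = e' (D_x' K_x D_x)^{-1} D_x' K_x for the local-linear smoother.
        K = np.exp(-0.5 * ((X - x0) / h) ** 2)
        Dx = np.column_stack([np.ones(NT), X - x0])
        return np.linalg.solve(Dx.T @ (K[:, None] * Dx), Dx.T * K)[0]

    S = np.array([s_row(x0) for x0 in X])                        # NT x NT smoother
    D_t, Y_t = (np.eye(NT) - S) @ D, (np.eye(NT) - S) @ Y
    alpha_r = np.linalg.lstsq(D_t, Y_t, rcond=None)[0]           # (D~'D~)^{-1} D~'Y~
    alpha = d_n @ alpha_r                                        # all n effects, sum to 0
    m_hat = S @ (Y - D @ alpha_r)                                # m_hat at each observation
    return m_hat, alpha
```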

The practitioner may find the profile least-squares estimator to be the most accessible of all of the fixed effects estimators described herein. This is no doubt due in part to the fact that iteration is not required, nor is the counterfactual construction needed for marginal integration. Moreover, in the local-linear setting described here, both the conditional mean and the corresponding gradients are easily calculated (unlike the local within transformation). Lastly, the profile least-squares estimator can be easily adapted to any order local polynomial and readily modified to include other panel type settings. For example, both time and individual specific heterogeneity could be accounted for, or, if three-way panel data were available, as in the gravity model of international trade, a range of heterogeneity types could be included, such as importer, exporter, and time specific heterogeneity. This estimator offers a range of attractive features for the applied economist and we anticipate it will become increasingly popular over time.

    4.4.1 Estimation with Unbalanced Panel Data

Given the way in which the data are ordered and how the smoothing is conducted, if unbalanced panel data are present, the only modification to the estimator is the construction of the matrix D. Whereas in the balanced setting D is an nT × (n − 1) matrix, in the unbalanced setting D becomes a Ť × (n − 1) matrix, where $\check{T} = \sum_{i=1}^{n} T_i$ is the total number of observations in the data set and Ti is the number of time periods that individual i appears in the data.

To understand how D changes when unbalanced panel data are present, define Δ₁ as the T × (n − 1) matrix consisting of all −1s and Δℓ, ℓ ∈ (2, . . . , n), as the T × (n − 1) matrix which has all entries 0 except for the (ℓ − 1)th column, which contains 1s. Then, in the balanced case,

$$D_{bal} = \begin{pmatrix} \Delta_1 \\ \Delta_2 \\ \vdots \\ \Delta_n \end{pmatrix}.$$

In the unbalanced setting, let eℓ be the vector of 1s and 0s representing which of the T time periods individual ℓ appears in, and let Γℓ be the Tℓ × T selection matrix which vertically concatenates the rows of the T × T identity matrix corresponding to the nonzero elements of eℓ. If individual ℓ appears in all T time periods, then Γℓ = IT. In the unbalanced case we have

$$D_{unbal} = \begin{pmatrix} \Gamma_1 \Delta_1 \\ \Gamma_2 \Delta_2 \\ \vdots \\ \Gamma_n \Delta_n \end{pmatrix}.$$

Aside from this specification of D, no other changes are needed to implement the profile estimator of Li et al. [2013] or Lin et al. [2014] in the presence of unbalanced panel data.
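A small sketch of this construction of D; the observation pattern below is an assumed example.

```python
import numpy as np

def build_D(present):
    """Build D for the profile least-squares estimator from an n x T 0/1
    array indicating which periods each individual appears in."""
    n, T = present.shape
    d_n = np.vstack([-np.ones((1, n - 1)), np.eye(n - 1)])    # row l = Delta_l's row pattern
    blocks = []
    for l in range(n):
        Delta_l = np.tile(d_n[l], (T, 1))                     # T x (n-1) block Delta_l
        Gamma_l = np.eye(T)[present[l].astype(bool)]          # T_l x T selection matrix
        blocks.append(Gamma_l @ Delta_l)
    return np.vstack(blocks)

# Example: n = 3 individuals, T = 4 periods, individual 2 missing period 3.
present = np.array([[1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 1, 1]])
D = build_D(present)        # shape (11, 2), since T-check = 11 total observations
```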

5 Dynamic Panel Estimation

Su and Lu [2013] consider kernel estimation of a dynamic nonparametric panel data model under the fixed effects framework, which can be expressed as

$$y_{it} = m(y_{i,t-1}, x_{it}) + \alpha_i + \varepsilon_{it}.$$

To construct a kernel estimator for this dynamic model we first difference to eliminate the fixed effect, obtaining

$$\Delta y_{it} = m(y_{i,t-1}, x_{it}) - m(y_{i,t-2}, x_{i,t-1}) + \Delta\varepsilon_{it}, \tag{16}$$

where Δyit = yit − yi,t−1 and Δεit = εit − εi,t−1. The model in Equation (16) is only identified up to location, and a further restriction is needed to ensure full identification. Since E(yit) = E[m(zi,t−1)], recentering is a simple way to achieve full identification of the unknown conditional mean. This model can be estimated using additive methods, following the marginal integration approach of Qian and Wang [2012]; however, as noted earlier, several complications arise. First, the fact that the two functions are identical is not used by the marginal integration estimator, most likely resulting in a loss of efficiency. Second, the marginal integration estimator requires counterfactual construction, which can be prohibitive for large NT. Third, the curse of dimensionality is likely to impede reliable estimation of the first stage function. Given these hurdles, Su and Lu [2013]'s proposed estimator is a simplification of the profile likelihood estimator of Henderson et al. [2008], being computationally easier to implement.

To describe how Su and Lu [2013] construct a dynamic nonparametric panel data estimator, define zi,t−1 = (yi,t−1, xit) and assume that E[Δεit|zi,t−2] = 0. Then

$$E(\Delta y_{it}|z_{i,t-2}) = E[m(z_{i,t-1})|z_{i,t-2}] - m(z_{i,t-2}).$$

Setting zi,t−2 = z and rearranging, we have

$$\begin{aligned} m(z) &= -E(\Delta y_{it}|z_{i,t-2} = z) + E[m(z_{i,t-1})|z_{i,t-2} = z] \\ &= -E(\Delta y_{it}|z_{i,t-2} = z) + \int m(v)\, f(v|z_{i,t-2} = z)\, dv. \end{aligned} \tag{17}$$

The last equality is known as a Fredholm integral equation of the second kind [Kress, 1999]. While there exist a variety of avenues for solving integral equations of the second kind, perhaps the most straightforward is through iteration, which is the way Su and Lu [2013] constructed their estimator.

To see how an iterative approach works, assume that m(z) is known. In this case the integral in Equation (17) could be evaluated by setting v = zi,t−1 and running a local-polynomial least-squares regression with zi,t−2 as the covariates and m(zi,t−1) as the regressand. Obviously, m(z) is unknown, and thus an initial consistent estimator is required (Su and Lu [2013] propose a two-stage least-squares sieve estimator). The iterations are designed to mitigate the impact that the initial estimator has on the final estimates.

    Su and Lu [2013]’s iterative estimation routine is implemented as follows:

    1. For a given bandwidth, perform local-polynomial least-squares estimation of -∆yit onzi,t−2, evaluating this conditional mean at zi,t−1. Call these estimates r̂.

    2. Define m̂[0] = (NT2)−1N∑i=1

    T∑t=1

    yit where NTj =N∑i=1

    (T − j). Using the same band-width as in Step 1, regress m̂[0] on zi,t−2 using local-polynomial least-squares,evaluating this conditional mean at zi,t−1. Recentering our estimates of m̂[0] by

    (NT1)−1N∑i=1

    T∑t=2

    (yit − m̂[0](zi,t−1)

    ), the initial estimator of the unknown conditional

    mean ism̃[0] = m̂[0] + (NT1)−1

    N∑i=1

    T∑t=2

    (yit − m̂[0](zi,t−1)

    ).

    3. Our next step estimator of m(zi,t−1) is

    m̂[1](zi,t−1) = m̃[1](zi,t−1) + r̂.

    Again, for identification purposes, recenter m̂[1](zi,t−1) by (NT1)−1N∑i=1

    T∑t=2

    (yit − m̂[1](zi,t−1)

    )to produce m̃[1](zi,t−1).

    4. Repeat Step 3, which at the `th step produces

    m̂[`](zi,t−1) = m̃[`−1](zi,t−1) + r̂.

    Lastly, recenter m̂[`](zi,t−1) to obtain the `th step estimator of the unknown conditionalmean, m̃[`](zi,t−1).

The estimator should be iterated, as with the estimators of Wang [2003] and Henderson et al. [2008]. Given that the evaluation points of the unknown conditional mean do not change, there is no need to recalculate the kernel weights across iterations, potentially resulting in dramatic improvements in computational speed for even moderately sized panels. The steps listed above are for the estimation of the conditional mean, through application of local-polynomial least-squares estimation. If higher order derivatives from the local-polynomial approach were desired, they can be taken from the corresponding higher order derivatives of the last stage estimator, along with the appropriate derivatives from r̂, given that recentering amounts to a location shift. Su and Lu [2013] demonstrate that the limiting distribution of this estimator is normal.
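A compact sketch of the iteration, using a univariate local-constant smoother as a stand-in for local-polynomial least-squares and reading Steps 3-4 as re-smoothing the previous recentered estimate onto zi,t−2 (per the fixed-point form of Equation (17)); all arrays and the bandwidth are assumptions for illustration, and the same pattern applies in the static setting of Section 5.1.

```python
import numpy as np

def lc_smooth(x_train, y_train, x_eval, h):
    """Local-constant (Nadaraya-Watson) smoother used as a stand-in regression."""
    u = (x_eval[:, None] - x_train[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return (K @ y_train) / K.sum(axis=1)

def su_lu(z_lag1, z_lag2, dy, y, h, n_iter=10):
    """Su-Lu style iteration; all inputs are aligned 1-D arrays over (i, t).

    z_lag1, z_lag2: z_{i,t-1} and z_{i,t-2}; dy: first differences of y;
    y: levels y_it used for the recentering restriction."""
    # Step 1: r_hat = E[-dy | z_{t-2}], evaluated at the z_{t-1} points.
    r_hat = lc_smooth(z_lag2, -dy, z_lag1, h)
    # Step 2: constant initial estimator (its smooth is the constant), recentered.
    m_tilde = np.full(z_lag1.shape, y.mean())
    m_tilde += np.mean(y - m_tilde)
    # Steps 3-4: smooth previous estimate onto z_{t-2}, add r_hat, recenter.
    for _ in range(n_iter):
        m_hat = lc_smooth(z_lag2, m_tilde, z_lag1, h) + r_hat
        m_tilde = m_hat + np.mean(y - m_hat)
    return m_tilde
```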

    5.1 The Static Setting

Nothing prevents application of the Su and Lu [2013] estimator in the static setting of Equation (1). In comparison to Henderson et al. [2008]'s iterative estimator, Su and Lu [2013]'s estimator is less computationally expensive given that it only requires successive local-polynomial estimation of an updated quantity. This estimator also avoids performing the marginal integration required by Qian and Wang [2012] for their fixed effects nonparametric panel data estimator.

    In the static setting Su and Lu [2013]’s iterative estimation routine is implemented as follows:

1. For a given bandwidth, perform local-polynomial least-squares estimation of −Δyit on xi,t−2, evaluating this conditional mean at xi,t−1. Call these estimates r̂.

2. Define $\hat{m}_{[0]} = (NT_2)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}$. Using the same bandwidth as in Step 1, regress m̂[0] on xi,t−2 using local-polynomial least-squares, evaluating this conditional mean at xi,t−1. Recentering our estimates of m̂[0] by $(NT_1)^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(y_{it} - \hat{m}_{[0]}(x_{i,t-1})\right)$, the initial estimator of the unknown conditional mean is

$$\tilde{m}_{[0]} = \hat{m}_{[0]} + (NT_1)^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(y_{it} - \hat{m}_{[0]}(x_{i,t-1})\right).$$

3. Our next step estimator of m(xi,t−1) is

$$\hat{m}_{[1]}(x_{i,t-1}) = \tilde{m}_{[0]}(x_{i,t-1}) + \hat{r}.$$

Again, for identification purposes, recenter m̂[1](xi,t−1) by $(NT_1)^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(y_{it} - \hat{m}_{[1]}(x_{i,t-1})\right)$ to produce m̃[1](xi,t−1).

4. Repeat Step 3, which at the ℓth step produces

$$\hat{m}_{[\ell]}(x_{i,t-1}) = \tilde{m}_{[\ell-1]}(x_{i,t-1}) + \hat{r}.$$

Lastly, recenter m̂[ℓ](xi,t−1) to construct the ℓth step estimator of the unknown conditional mean, m̃[ℓ](xi,t−1).

Outside of Qian and Wang [2012] and Gao and Li [2013], there are no finite sample comparisons of the range of nonparametric estimators of the static nonparametric panel data model under the fixed effects framework. This would be an interesting avenue to explore in future research to help guide authors towards the most appropriate estimator for this model.

    6 Inference

    6.1 Poolability

Having access to panel data affords researchers the ability to examine the presence (or lack thereof) of heterogeneity in many interesting dimensions that do not exist when cross-sectional data are present. One of the main tests of homogeneity is that of poolability. Baltagi et al. [1996] proposed one of the first nonparametric tests of poolability. The importance of such a test is that the size and power of a parametric test of poolability (such as a Chow test) could be adversely affected by parametric misspecification of the conditional mean.

Baltagi et al. [1996] consider a test of poolability for the model

$$y_{it} = m_t(x_{it}) + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{18}$$

where mt(xit) is the unknown functional form that may vary over time, xit is a 1 × q vector of regressors, and εit is the error term. For the data to be poolable across time, mt(x) = m(x) ∀t almost everywhere, with m(x) representing the unknown functional form in the pooled model. More specifically, Baltagi et al. [1996] test

$$H_0 : m_t(x) = m(x) \;\; \forall t$$

almost everywhere versus the alternative that

$$H_1 : m_t(x) \ne m(x)$$

for some t with positive probability.

Under H0, E(εit|xit) = 0 almost everywhere, where εit = yit − m(xit). Under H1, ε̂it from the pooled model will not converge to εit, and hence E(ε|x) ≠ 0 almost everywhere. Thus, a consistent test for poolability based on E[εE(ε|x)] is available.

Baltagi et al. [1996] construct the test statistic as

$$\hat{J}_{NT} = \frac{N|h|^{1/2}\,\hat{I}_{NT}}{\hat{\sigma}_{NT}},$$

where

$$\hat{I}_{NT} = \frac{1}{NT(N-1)|h|} \sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{\substack{j=1 \\ j \ne i}}^{N} K_{itjth}\, \hat{\varepsilon}_{it}\hat{\varepsilon}_{jt}\, \hat{f}(x_{it})\hat{f}(x_{jt}),$$

and

$$\hat{\sigma}^2_{NT} = \frac{2}{NT(N-1)|h|} \sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{\substack{j=1 \\ j \ne i}}^{N} \hat{\varepsilon}_{it}^2\hat{\varepsilon}_{jt}^2\, \hat{f}(x_{it})^2\hat{f}(x_{jt})^2\, K^2_{itjth},$$

with |h| = h₁ · · · hq. Baltagi et al. [1996] prove that ĴNT has a standard normal distribution under H0. While the limiting distribution of ĴNT is available, in nonparametric inference it is typically recommended to use resampling plans (bootstrapping or subsampling) to construct the finite sample distribution. The steps used to construct the wild bootstrap test statistic are as follows:

1. For i = 1, 2, . . . , N and t = 1, 2, . . . , T, generate the two-point wild bootstrap error $\varepsilon^*_{it} = \frac{1 - \sqrt{5}}{2}\left(\hat{\varepsilon}_{it} - \bar{\hat{\varepsilon}}\right)$ with probability $p = \frac{1 + \sqrt{5}}{2\sqrt{5}}$ and $\varepsilon^*_{it} = \frac{1 + \sqrt{5}}{2}\left(\hat{\varepsilon}_{it} - \bar{\hat{\varepsilon}}\right)$ with probability 1 − p, where ε̂it = yit − m̂(xit) is the residual from the pooled estimator. Here we are using the common wild bootstrap weights, but Rademacher weights could also be used.

2. Construct the bootstrap left-hand-side variable y*it = m̂(xit) + ε*it for i = 1, 2, . . . , N and t = 1, 2, . . . , T. The resulting sample {y*it, xit} is the bootstrap sample. Note that these data are generated under the null of a pooled sample. Using the bootstrap sample, estimate m̂*(xit) via pooled LCLS, where yit is replaced by y*it.

3. Use the bootstrap residuals ε̂*it to construct the bootstrap test statistic Ĵ*NT.

4. Repeat Steps 1-3 a large number (B) of times and then construct the sampling distribution of the bootstrapped test statistics. The null of poolability is rejected if ĴNT is greater than the upper α-percentile of the bootstrapped test statistics.
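A sketch of the wild bootstrap loop just described; `fit_pooled` and `J_stat` below are hypothetical helpers wrapping the pooled estimator and the studentized statistic.

```python
import numpy as np

def poolability_pvalue(x, y, fit_pooled, J_stat, B=399, seed=0):
    """Wild bootstrap p-value for the Baltagi et al. (1996) poolability test.

    x, y: (N, T) panel arrays; fit_pooled(x, y) returns fitted values m_hat;
    J_stat(x, y, m_hat) returns the studentized statistic J_NT."""
    rng = np.random.default_rng(seed)
    m_hat = fit_pooled(x, y)
    resid = y - m_hat
    J = J_stat(x, y, m_hat)
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    p = (1 + np.sqrt(5)) / (2 * np.sqrt(5))
    centered = resid - resid.mean()
    J_boot = np.empty(B)
    for r in range(B):
        # Two-point wild bootstrap weights applied to centered pooled residuals.
        w = np.where(rng.uniform(size=resid.shape) < p, a, b)
        y_star = m_hat + w * centered            # DGP imposed under the null
        J_boot[r] = J_stat(x, y_star, fit_pooled(x, y_star))
    return np.mean(J_boot >= J)                  # reject the null if small (e.g., < 0.05)
```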


Lavergne [2001] is critical of this test, partially because the smoothing parameter used in the pooled model is the same in each period. Further, he disagrees that the density of the regressors, f̂(x), should remain fixed across time. Lavergne [2001] argues that if f̂(x) varies over time, this can lead to poor performance of Baltagi et al. [1996]'s test of poolability.

An alternative approach to testing poolability is that of Jin and Su [2013]. They consider the model

$$y_{it} = m_i(x_{it}) + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T,$$

where mi(xit) is the unknown functional form that may vary over individuals, with everything else defined as in Equation (18). For the data to be poolable across individuals, mi(x) = mj(x) ∀i, j almost everywhere, with m(x) representing the unknown functional form in the pooled model. More specifically, the null hypothesis is

$$H_0 : m_i(x) = m_j(x) \;\; \forall i, j$$

almost everywhere versus the alternative that

$$H_1 : m_i(x) \ne m_j(x)$$

for some i ≠ j with positive probability.

Jin and Su [2013] do not consider a conditional moment based test for poolability as do Baltagi et al. [1996], but rather a weighted integrated squared error statistic, defined as

$$\Gamma_{NT} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \int \left(m_i(x) - m_j(x)\right)^2 w(x)\, dx, \tag{19}$$

where w(x) is a user-specified probability density function. Under H0, ΓNT = 0; otherwise ΓNT > 0. Jin and Su [2013] point out that ΓNT cannot distinguish all departures from H0. Rather, Equation (19) corresponds to testing

$$\bar{H}_0 : \Delta_m = 0 \quad \text{versus} \quad \bar{H}_1 : \Delta_m > 0,$$

where $\Delta_m = \lim_{N\to\infty} \left(N(N-1)\right)^{-1} \Gamma_{NT}$. The difference between H0 and H̄0 is that H̄0 allows, for some i ≠ j, mi(x) ≠ mj(x) with probability greater than zero, but the count measure of such pairs has to be of smaller order than N(N − 1). That is, it can happen that mi(x) ≠ mj(x), but these occurrences cannot increase as N increases. This is intuitive, as the test of poolability in this case is predicated on increasing the number of cross-sections, and as more cross-sections become available, it would seem likely that it will be difficult to rule out that mi(x) ≠ mj(x) for every pair (i, j). Jin and Su [2013] show theoretically that, under H̄1, the poolability test is consistent as long as (N(N − 1))⁻¹ ΓNT does not shrink to 0 too quickly.

The statistic in Equation (19) is calculated under H1 and requires the user to estimate mi(x) for all individuals. For example, using local-polynomial least-squares estimates m̂i(x), ΓNT is estimated by

$$\hat{\Gamma}_{NT} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \int \left(\hat{m}_i(x) - \hat{m}_j(x)\right)^2 w(x)\, dx.$$

Jin and Su [2013] show that, after appropriate normalization, Γ̂NT is normally distributed under reasonable assumptions.⁴ A bootstrap approach similar to that of Baltagi et al. [1996] can be deployed to construct the finite sample distribution of the test statistic.

⁴Jin and Su [2013]'s theory focuses on the setting where mi(x) is estimated using sieves, but it can be adapted to the kernel setting.

    6.1.1 Poolability Through Irrelevant Individual and Time Effects

An informal way to determine whether or not the data are poolable is to directly smooth over both individual and time and assess the size of the bandwidths on these two variables. It is well known that discrete variables whose bandwidths hit their upper bounds are removed from the smoothing operation. Thus, rather than splitting the data on time (Baltagi et al. [1996]) or individual (Jin and Su [2013]), individual and time heterogeneity can be directly included in the set of covariates and the pooled regression model can be estimated. If the bandwidths on either of these variables are at their corresponding upper bounds, then this would signify the ability to pool in that dimension. This approach was advocated by Racine [2008] and would seem to hold promise, though further development of the theoretical properties as they pertain to data-driven bandwidths is warranted.

We again note that this is informal, but it will reveal poolability in either the individual and/or time dimension. Further, there is no need to construct a test statistic or use resampling plans. The problem facing practitioners will occur when the estimated bandwidth is close, but not equal, to its upper bound. In that case, a more formal approach would need to be undertaken.
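A minimal sketch of this informal check using the np package follows; the data frame dat and its column names are hypothetical.

library(np)

## Treat the individual index as an unordered factor and time as an ordered
## factor, then let cross-validation choose all bandwidths jointly.
dat$id   <- factor(dat$id)
dat$year <- ordered(dat$year)

bw <- npregbw(y ~ x + id + year, data = dat, regtype = "ll")
summary(bw)   # a bandwidth for id or year at its upper bound suggests that
              # the variable is smoothed out entirely, i.e., the data may be
              # pooled along that dimension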

An alternative to the approach of Racine [2008] would be that of Lu and Su [2017], who develop a consistent model selection routine for the fixed effects panel data model. While their test is developed and studied in a parametric setting, m(x) = xβ, it can be easily extended to the nonparametric setting that we have described here. Lu and Su [2017]'s selection device entails choosing the estimator from the model which has the lowest leave-one-out cross-validation score. For example, if we were comparing the pooled model against the one way, individual effects model, we would estimate both models leaving a single observation out, predict y_it for the omitted observation, and repeat this over all NT observations, to calculate the squared prediction error. The model with the lowest squared prediction error is then chosen as the best model. Lu and Su [2017] demonstrate that this approach works remarkably well even in the presence of serial correlation or cross-section dependence and substantially outperforms more common model selection approaches such as AIC or BIC.
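A sketch of this selection device in the parametric setting m(x) = xβ is given below; for linear models the leave-one-out residual equals the ordinary residual divided by one minus the hat value, so no explicit loop is required. The data frame dat and its columns y, x, and id are hypothetical.

## Leave-one-out cross-validation score for a linear model
loocv <- function(fit) mean((residuals(fit) / (1 - hatvalues(fit)))^2)

pooled <- lm(y ~ x, data = dat)               # pooled model
fe     <- lm(y ~ x + factor(id), data = dat)  # one-way individual effects (LSDV)

c(pooled = loocv(pooled), fe = loocv(fe))     # select the model with the smaller score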

    6.2 Specification Testing

Lin et al. [2014] provide an integrated squared error statistic to test for correct specification of a panel data model under the fixed effects framework. The null hypothesis is

H_0: Pr{m(x) = xβ_0} = 1   for some β_0 ∈ R^q,

against the alternative hypothesis

H_1: Pr{m(x) = xβ_0} < 1   for any β_0 ∈ R^q.

Let β̂ denote the parametric estimator of β_0 (perhaps using within estimation) and m̂(x) the profile least-squares estimator. Then a consistent test for H_0 can be based on

∫ (m̂(x) − xβ̂)² dx.

However, as noted in Lin et al. [2014], this test statistic would possess several non-zero centering terms which, if not removed, would lead to an asymptotic bias. To avoid this, Lin et al. [2014] use the approach of Härdle and Mammen [1993] and smooth x_itβ̂. More specifically, estimate m(x) using local-constant least-squares (our earlier discussion focused on local-linear least-squares) as

m̂(x) = (ı′_NT S(x) ı_NT)^{−1} ı′_NT S(x) y,   (20)

where S(x) = Q(x)′ K_x Q(x) and Q(x) = I_NT − D(D′ K_x D)^{−1} D′ K_x, with ı_NT an NT × 1 vector of ones. Note that we smooth over y in Equation (20), as Q(x)D = 0, which will eliminate the presence of the fixed effects as in Equation (15). The same smoothing is applied to the parametric estimates to produce

m̂_para(x) = (ı′_NT S(x) ı_NT)^{−1} ı′_NT S(x) (x_itβ̂).

Let ε̂_it = y_it − x_itβ̂ (the parametric residuals, free of the fixed effects). Then it holds that

m̂(x) − m̂_para(x) = (ı′_NT S(x) ı_NT)^{−1} ı′_NT S(x) ε̂.

Lin et al. [2014] discuss the fact that the presence of the random denominator in m̂(x) − m̂_para(x) will complicate computation of the asymptotic distribution of the integrated squared error test statistic.

Instead, Lin et al. [2014] propose a simpler leave-one-out test statistic (which omits center terms), given by

Î_NT = (1/(N²|h|)) Σ_{i=1}^{N} Σ_{j≠i}^{N} Σ_{t=1}^{T} Σ_{s=1}^{T} ε̃̂_it ε̃̂_js K_{itjs,h},   (21)

where ε̃̂_it = ε̂_it − ε̄̂_i· are the within residuals. An alternative test statistic is proposed by Henderson et al. [2008]; however, this test is less appealing because it involves iteration of the estimator of the unknown conditional mean. The test statistic of Lin et al. [2014] only requires kernel weighting of the within residuals from parametric estimation. (This fact can be used to exploit existing software to implement the test: the np package (Hayfield and Racine [2008]) offers a test of consistent model specification in the cross-sectional setting, and the adept user can simply within transform their data and call npcmstest(), making implementation straightforward.) This simplicity allows Lin et al. [2014] to demonstrate that when Î_NT is appropriately normalized it has an asymptotically normal distribution. They focus on the asymptotic behavior of Î_NT for N → ∞, but the theory can be established when both N and T are increasing.

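A direct O((NT)²) sketch of Equation (21) for a single regressor with a Gaussian kernel follows; the input names (within residuals e.tilde, regressor x, individual index id, bandwidth h) are hypothetical.

## I.hat_NT: double sum over all (i,t), (j,s) pairs with j != i
I.NT <- function(e.tilde, x, id, h) {
  N      <- length(unique(id))
  K      <- dnorm(outer(x, x, "-") / h)   # K_itjs,h evaluated at all pairs
  same.i <- outer(id, id, "==")           # TRUE for pairs sharing an individual
  sum((e.tilde %o% e.tilde) * K * !same.i) / (N^2 * h)
}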
As is well known, kernel-based nonparametric tests commonly display poor finite sample size and power. To remedy this, Lin et al. [2014] propose a bootstrap procedure to approximate the distribution of the scaled test statistic Ĵ_NT = N|h|^{1/2} Î_NT / σ̂₀, where

σ̂₀² = (2/(N²|h|)) Σ_{i=1}^{N} Σ_{j≠i}^{N} Σ_{t=1}^{T} Σ_{s=1}^{T} ε̃̂²_it ε̃̂²_js K²_{itjs,h}.   (22)

Their bootstrap procedure is

1. Estimate the linear panel data model under the fixed effects framework using the within estimator and obtain the residuals ε̂_it = y_it − x_itβ̂.

2. For i = 1, 2, . . . , N and t = 1, 2, . . . , T generate the two-point wild bootstrap error ε*_it = ((1 − √5)/2)(ε̂_it − ε̄̂) with probability p = (1 + √5)/(2√5) and ε*_it = ((1 + √5)/2)(ε̂_it − ε̄̂) with probability 1 − p, where ε̄̂ denotes the mean of the residuals. Then construct y*_it = x_itβ̂ + ε*_it. Call (y*_it, x_it) for i = 1, 2, . . . , N and t = 1, 2, . . . , T the bootstrap sample.

3. Use the bootstrap sample to estimate β based on the bootstrap sample using the within estimator, yielding β̂*. Calculate the residuals ε̂*_it = y*_it − x_itβ̂*.

4. Compute Ĵ*_NT, where Ĵ*_NT is obtained from Ĵ_NT using the residuals ε̃̂*_it.

5. Repeat steps (2)-(4) a large number (B) of times and reject H_0 if the estimated test statistic Ĵ_NT is greater than the upper α-percentile of the bootstrapped test statistics.

Lin et al. [2014] demonstrate that this bootstrap approach provides an asymptotically valid approximation of the distribution of Ĵ_NT. Moreover, in the simulations that they conduct, the bootstrap test has correct size in both univariate and bivariate settings and also displays high power.
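Following the npcmstest() remark above, one way to implement the test with existing software is to within transform the data (sweeping out the fixed effects) and then treat the transformed data as a cross-section. A sketch, with hypothetical column names, is

library(np)

dat$y.w <- with(dat, y - ave(y, id))   # within (demeaning) transformation
dat$x.w <- with(dat, x - ave(x, id))

## npcmstest() requires an lm object fitted with x = TRUE, y = TRUE
par.model <- lm(y.w ~ x.w, data = dat, x = TRUE, y = TRUE)
npcmstest(model = par.model, xdat = dat["x.w"], ydat = dat$y.w,
          boot.method = "wild", boot.num = 399)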

    6.3 A Hausman Test

Aside from testing for poolability of the data, another interesting question that researchers can ask in the presence of panel data is whether the fixed or random effects framework is appropriate. As should be obvious from our discussion above, the estimators for Equation (1) under the fixed or random effects framework take on different forms. Further, if the random effects estimator is applied erroneously, then it is inconsistent. The common approach to testing between these two frameworks is to use the Hausman test (Hausman [1978]). Even with the range of estimators that we have discussed for the fixed effects framework, to our knowledge only Henderson et al. [2008] describe a nonparametric Hausman test. One of the benefits of using a random effects estimator (Martins-Filho and Yao [2009], for example), when the random effects framework is true, is that the gains in efficiency relative to a fixed effects estimator, profile least-squares, say, can be substantial. In general, the larger T is, or the larger σ_α is relative to σ_v, the more efficient the random effects estimator is relative to the fixed effects estimator.

Recall that a Hausman test works by examining the difference between estimators where one estimator is consistent under both the null and alternative hypotheses, while the other estimator is only consistent under the null hypothesis. Formally, under the random effects framework the null hypothesis is

H_0: E(α_i | x_it) = 0

almost everywhere. The alternative hypothesis is

H_1: E(α_i | x_it) ≠ 0

on a set with positive measure. Henderson et al. [2008] test H_0 based on the sample analogue of J = E[v_it E(v_it | x_it) f(x_it)]. Note that J = 0 under H_0 and is positive under H_1, which makes this a proper statistic for testing H_0.

Let m̂(x) denote a consistent estimator of m(x) under the fixed effects assumption, profile least-squares for example. Then a consistent estimator of v_it is given by v̂_it = y_it − m̂(x_it). A feasible test statistic is given by

Ĵ_NT = (1/(NT(NT − 1))) Σ_{i=1}^{N} Σ_{t=1}^{T} Σ_{j=1}^{N} Σ_{s=1, {j,s}≠{i,t}}^{T} v̂_it v̂_js K_{itjs,h}.

It can be shown that Ĵ_NT is a consistent estimator of J. Hence, Ĵ_NT →_p 0 under the null, and Ĵ_NT →_p C if H_0 is false, where C > 0 is a positive constant. This test works by assessing whether there is any dependence between the residuals and the covariates. Under H_0, the fixed effects estimator is consistent, and so the residuals, v̂_it, should be unrelated to the covariates, x_it.
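A sketch of Ĵ_NT for a single regressor follows. We use a Gaussian kernel normalized so that K_h(u) = h⁻¹K(u/h), which is an assumption on the kernel scaling rather than a detail taken from the source; v.hat (fixed effects residuals), x, and h are hypothetical names.

J.NT <- function(v.hat, x, h) {
  n <- length(v.hat)                     # n = NT
  K <- dnorm(outer(x, x, "-") / h) / h   # normalized Gaussian kernel K_h
  diag(K) <- 0                           # drop the {j,s} = {i,t} terms
  sum((v.hat %o% v.hat) * K) / (n * (n - 1))
}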

Henderson et al. [2008] suggest a bootstrap procedure to implement this test to approximate the finite sample null distribution of Ĵ_NT. The steps are as follows:

1. Let ṽ_i = (ṽ_i1, . . . , ṽ_iT)′, where ṽ_it = y_it − m̃(x_it) is the residual from a random effects model, and m̃(x) is a random effects estimator of m(x). Compute the two-point wild bootstrap errors by v*_i = {(1 − √5)/2}ṽ_i with probability p = (1 + √5)/(2√5) and v*_i = {(1 + √5)/2}ṽ_i with probability 1 − p. Generate y*_it via y*_it = m̃(x_it) + v*_it. Call {y*_it, x_it}_{i=1,t=1}^{N,T} the bootstrap sample. Note here that all residuals for a given individual are scaled by the same point of the two-point wild bootstrap (see the sketch following this list).

2. Use {y*_it, x_it}_{i=1,t=1}^{N,T} to estimate m(x) with a fixed effects estimator and denote the estimate by m̂*(x). Obtain the bootstrap residuals as v̂*_it = y*_it − m̂*(x_it). (Henderson et al. [2008] suggest using a random effects estimator in this step. However, as pointed out in Amini et al. [2012], asymptotically the performance of the test is independent of the estimator used within the bootstrap, but in finite samples the use of the fixed effects estimator leads to improved size.)

3. The bootstrap test statistic Ĵ*_NT is obtained as for Ĵ_NT except that v̂_it (v̂_js) is replaced by v̂*_it (v̂*_js) wherever it occurs.

4. Repeat steps (1)-(3) a large number (B) of times and reject if the estimated test statistic Ĵ_NT is greater than the upper α-percentile of the bootstrapped test statistics.
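The cluster-level two-point wild bootstrap in step 1 can be sketched as follows, with the residuals stored in a hypothetical N × T matrix v.tilde (rows indexing individuals) and m.tilde denoting the fitted random effects values:

two.point.wild <- function(v.tilde) {
  N <- nrow(v.tilde)
  a <- (1 - sqrt(5)) / 2               # the two points of the distribution
  b <- (1 + sqrt(5)) / 2
  p <- (1 + sqrt(5)) / (2 * sqrt(5))   # P(scale = a); yields mean 0, variance 1
  scale.i <- ifelse(runif(N) < p, a, b)
  v.tilde * scale.i                    # row i (all of individual i's residuals)
}                                      # is scaled by the same draw scale.i[i]

## Bootstrap sample for step 1:
## y.star <- m.tilde + two.point.wild(v.tilde)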

6.4 Simultaneous Confidence Bounds

Li et al. [2013] provide the maximum absolute deviation between m̂(x) and m(x), allowing for the construction of uniform confidence bounds on the profile least-squares estimator for Equation (1) under the fixed effects framework. Their theoretical work focuses on the univariate case. Specifically, they establish that

P( (−2 log h)^{1/2} [ sup_{x∈[0,1]} |(m̂(x) − m(x) − B̂ias(m̂(x))) / √(V̂ar(m̂(x)|X))| − d_n ] ≤ z ) → e^{−2e^{−z}},   (23)

where d_n = (−2 log h)^{1/2} + (−2 log h)^{−1/2} log(R(K′)/(4πν₀)), B̂ias(m̂(x)) is a consistent estimator of the bias of the local-linear profile least-squares estimator, defined as

Bias(m̂(x)) = h²κ₂m″(x)/2,

and V̂ar(m̂(x)|X) is a consistent estimator of the variance of the local-linear profile least-squares estimator, defined as

Var(m̂(x)|X) = ν₀σ̄²(x)/f²(x),

where κ_j = ∫ u^j K(u) du, ν_j = ∫ u^j K²(u) du, X = {x_it, 1 ≤ i ≤ N, 1 ≤ t ≤ T}, f(x) = Σ_{t=1}^{T} f_t(x) with f_t(x) the density function of x for each time period t, σ̄²(x) = Σ_{t=1}^{T} σ²_t(x) f_t(x) with σ²_t(x) = E[ε̃²_it | x_it = x] and ε̃_it = ε_it − ε̄_i·, and R(K) = ∫ K²(u) du.

From Equation (23), a (1 − α) × 100% simultaneous confidence bound for m̂(x) is

( m̂(x) − B̂ias(m̂(x)) ± Δ_{1,α}(x) ),

where

Δ_{1,α}(x) = ( d_n + (log 2 − log(−log(1 − α)))(−2 log h)^{−1/2} ) √(V̂ar(m̂(x)|X)).

As with inference, it is expected that a bootstrap approach will perform well in finite samples. To describe the bootstrap proposed by Li et al. [2013], set

T = sup_{x∈[0,1]} |m̂(x) − m(x)| / √(V̂ar(m̂(x)|X)).

Denote the upper α-quantile of T as c_α. When c_α and V̂ar(m̂(x)|X) are known, the simultaneous confidence bound of m̂(x) would be m̂(x) ± c_α √(V̂ar(m̂(x)|X)). As these are unknown in practice, they need to be estimated. The bootstrap algorithm is

1. Obtain the residuals from the fixed effects framework model and denote them as ε̂_it.

2. For each i and t, compute ε*_it = a_i ε̂_it, where the a_i are i.i.d. N(0, 1) across i. Generate the bootstrap observations as y*_it = m̂(x_it) + ε*_it. Call {y*_it, x_it}_{i=1,t=1}^{N,T} the bootstrap sample. Note here that all residuals for a given individual are scaled by the same factor.

3. With the bootstrap sample {y*_it, x_it}_{i=1,t=1}^{N,T}, use local-linear profile least-squares to obtain the bootstrap estimator of m(x), denoted as m̂*(x).

4. Repeat steps 2 and 3 a large number (B) of times. The estimator V̂ar*(m̂(x)|X) of Var(m̂(x)|X) is taken as the sample variance of the B estimates m̂*(x). Compute

T*_b = sup_{x∈[0,1]} |m̂*(x) − m̂(x)| / √(V̂ar*(m̂(x)|X))

for b = 1, . . . , B.

5. Use the upper α-percentile of {T*_b}_{b=1,...,B} to estimate the upper α-quantile c_α of T, call this ĉ_α. Construct the simultaneous confidence bound of m̂(x) as m̂(x) ± ĉ_α √(V̂ar*(m̂(x)|X)) (see the sketch following this list).
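A sketch of the full algorithm follows, treating the local-linear profile least-squares fit as a black-box function pls.fit(y, x, xeval), a hypothetical placeholder returning estimates of m at the points xeval; data are assumed stacked so that observations (i − 1)Ti + 1, . . . , iTi belong to individual i.

boot.band <- function(y, x, N, Ti, pls.fit, x.grid, B = 399, alpha = 0.05) {
  m.hat <- pls.fit(y, x, x.grid)         # estimate on the evaluation grid
  fit   <- pls.fit(y, x, x)              # fitted values at the sample points
  e.hat <- y - fit                       # step 1: fixed effects residuals
  m.star <- replicate(B, {               # steps 2-3, repeated B times
    a <- rep(rnorm(N), each = Ti)        # one N(0,1) factor per individual
    pls.fit(fit + a * e.hat, x, x.grid)
  })                                     # a length(x.grid) x B matrix
  v.star <- apply(m.star, 1, var)        # step 4: pointwise bootstrap variance
  T.star <- apply(m.star, 2, function(m) max(abs(m - m.hat) / sqrt(v.star)))
  c.hat  <- quantile(T.star, 1 - alpha)  # step 5: estimated upper alpha-quantile
  list(lower = m.hat - c.hat * sqrt(v.star),
       upper = m.hat + c.hat * sqrt(v.star))
}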

The simultaneous confidence bounds, in the univariate setting, can be used to provide a graphical depiction of when to reject a parametric functional form for m(x), offering an alternative to the functional form test of Lin et al. [2014].

    7 Conclusions

This chapter has reviewed the recent literature focusing on estimation and inference in nonparametric panel data models under both the random and fixed effects frameworks. A range of estimation techniques were covered. This area is ripe for application across a range of domains. Nonparametric estimation under both fixed and random effects allows the practitioner to explore a wide range of hypotheses of interest, while consistent model specification tests provide a robustness check on less than fully nonparametric approaches. Bandwidth selection remains less studied, but as these methods are more widely embraced, it is likely that a potentially wider range of data-driven approaches will become available. We are optimistic that the interested practitioner can keep abreast of this rapidly expanding field by digesting this chapter and the references herein.


References

C. Ai and Q. Li. Semi-parametric and non-parametric methods in panel data models. In L. Matyas and P. Sevestre, editors, The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice, pages 451–478. Springer, 2008.

S. Amini, M. S. Delgado, D. J. Henderson, and C. F. Parmeter. Fixed vs. random: The Hausman test four decades later. In B. H. Baltagi, R. C. Hill, W. K. Newey, and H. L. White, editors, Advances in Econometrics: Essays in Honor of Jerry Hausman, volume 29, pages 479–513. Emerald, 2012.

B. H. Baltagi. Econometric Analysis of Panel Data. John Wiley & Sons, West Sussex, United Kingdom, 5th edition, 2013.

B. H. Baltagi, J. Hidalgo, and Q. Li. A nonparametric test for poolability using panel data. Journal of Econometrics, 75:345–367, 1996.

K. C. Bishop and C. Timmins. Using panel data to easily estimate hedonic demand functions. Journal of the Association of Environmental and Resource Economists, 2016. Forthcoming.

J. Chen, D. Li, and J. Gao. Non- and semi-parametric panel data models: A selective review. Monash University Working Paper 18/13, 2013.

C. de Boor. A Practical Guide to Splines. Springer, 2001.

Y. Gao and K. Li. Nonparametric estimation of fixed effects panel data models. Journal of Nonparametric Statistics, 25(3):679–693, 2013.

W. Härdle and E. Mammen. Comparing nonparametric versus parametric regression fits. Annals of Statistics, 21:1926–1947, 1993.

J. A. Hausman. Specification tests in econometrics. Econometrica, 46:1251–1271, 1978.

J. A. Hausman and W. E. Taylor. Panel data and unobservable individual effects. Econometrica, 49:1377–1398, 1981.

T. Hayfield and J. S. Racine. Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5), 2008. URL http://www.jstatsoft.org/v27/i05/.

D. J. Henderson and C. F. Parmeter. Applied Nonparametric Econometrics. Cambridge University Press, 2015.

D. J. Henderson and A. Ullah. A nonparametric random effects estimator. Economics Letters, 88(3):403–407, 2005.

D. J. Henderson and A. Ullah. Nonparametric estimation in a one-way error component model: A Monte Carlo analysis. In A. Sengupta, T. Samanta, and A. Basu, editors, Statistical Paradigms: Recent Advances and Reconciliations, chapter 12, pages 213–237. World Scientific Review, 2014.

D. J. Henderson, R. J. Carroll, and Q. Li. Nonparametric estimation and testing of fixed effects panel data model. Journal of Econometrics, 144(2):257–275, 2008.

S. Jin and L. Su. A nonparametric poolability test for panel data models with cross-section dependence. Econometric Reviews, 32(4):469–512, 2013.

R. Kress. Linear Integral Equations. Springer-Verlag, New York, New York, 2nd edition, 1999.

P. Lavergne. An equality test across nonparametric regressions. Journal of Econometrics, 103:307–344, 2001.

Y. Lee and D. Mukherjee. Nonparametric estimation of the marginal effect in fixed-effect panel data models: An application on the environmental Kuznets curve. Unpublished working paper, 2014.

C. Li and Z. Liang. Asymptotics for nonparametric and semiparametric fixed effects panel models. Journal of Econometrics, 185(3):420–434, 2015.

G. Li, H. Peng, and T. Tong. Simultaneous confidence band for nonparametric fixed effects panel data models. Economics Letters, 119(2):229–232, 2013.

Q. Li and J. S. Racine. Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton, New Jersey, 2007.

X. Lin and R. J. Carroll. Nonparametric function estimation for clustered data when the predictor is measured without/with error. Journal of the American Statistical Association, 95:520–534, 2000.

X. Lin and R. J. Carroll. Semiparametric estimation in general repeated measures problems. Journal of the Royal Statistical Society (Series B), 68:68–88, 2006.

Z. Lin, Q. Li, and Y. Sun. A consistent nonparametric test of parametric regression functional form in fixed effects panel data models. Journal of Econometrics, 178(1):167–179, 2014.

X. Lu and L. Su. Determining individual or time effects in panel data models. Unpublished working paper, 2017.

S. Ma, J. S. Racine, and A. Ullah. Nonparametric estimation of marginal effects in regression-spline random effects models. Technical Report Working Paper 2015-10, McMaster University, 2015a.

S. Ma, J. S. Racine, and L. Yang. Spline regression in the presence of categorical predictors. Journal of Applied Econometrics, 30:703–717, 2015b.

E. Mammen, B. Støve, and D. Tjøstheim. Nonparametric additive models for panels of time series. Econometric Theory, 25:442–481, 2009.

C. Martins-Filho and F. Yao. Nonparametric regression estimation with general parametric error covariance. Journal of Multivariate Analysis, 100:309–333, 2009.

E. Masry. Multivariate local polynomial regression for time series: Uniform strong consistency and rates. Journal of Time Series Analysis, 17:571–599, 1996.

K. Mundra. Nonparametric slope estimation for the fixed-effect panel data model. Working Paper, Department of Economics, San Diego State University, 2005.

J. Qian and L. Wang. Estimating semiparametric panel data models by marginal integration. Journal of Econometrics, 167(3):483–493, 2012.

J. S. Racine. Nonparametric econometrics: A primer. Foundations and Trends in Econometrics, 3(1):1–88, 2008.

J. M. Rodriguez-Poo and A. Soberon. Nonparametric and semiparametric panel data models: Recent developments. Journal of Economic Surveys, 31(4):923–960, 2017.

A. F. Ruckstuhl, A. H. Welsh, and R. J. Carroll. Nonparametric function estimation of the relationship between two repeatedly measured variables. Statistica Sinica, 10:51–71, 2000.

L. Su and X. Lu. Nonparametric dynamic panel data models: kernel estimation and specification testing. Journal of Econometrics, 176(1):112–133, 2013.

L. Su and A. Ullah. Profile likelihood estimation of partially linear panel data models with fixed effects. Economics Letters, 92(1):75–81, 2006.

L. Su and A. Ullah. More efficient estimation of nonparametric panel data models with random effects. Economics Letters, 96(3):375–380, 2007.

L. Su and A. Ullah. Nonparametric and semiparametric panel econometric models: Estimation and testing. In D. E. A. Giles and A. Ullah, editors, Handbook of Empirical Economics and Finance, pages 455–497. Chapman & Hall/CRC, Boca Raton, FL, 2011.

Y. Sun, R. J. Carroll, and D. Li. Semiparametric estimation of fixed effects panel data varying coefficient models. In Q. Li and J. S. Racine, editors, Advances in Econometrics: Nonparametric Econometric Methods, volume 25, chapter 3, pages 101–130. Emerald, 2009.

Y. Sun, Y. Y. Zhang, and Q. Li. Nonparametric panel data regression models. In B. H. Baltagi, editor, The Oxford Handbook of Panel Data, pages 285–324. Oxford University Press, 2015.

A. Ullah and N. Roy. Nonparametric and semiparametric econometrics of panel data. In D. E. A. Giles and A. Ullah, editors, Handbook of Applied Economic Statistics, pages 579–604. Marcel Dekker, New York, 1998.

N. Wang. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika, 90(1):43–52, 2003.
