
Uniform Bahadur Representation for Local Polynomial Estimates of M-Regression

and Its Application to The Additive Model

Efang Kong∗ (Technische Universiteit Eindhoven)

Oliver Linton†

(London School of Economics)

Yingcun Xia‡ (National University of Singapore)

The Suntory Centre

Suntory and Toyota International Centres for Economics and Related Disciplines London School of Economics and Political Science

DP No: EM 2009 535
Houghton Street, London WC2A 2AE
Tel: 020 7955 6674
2009

∗ Eurandom, Technische Universiteit Eindhoven, The Netherlands. E-mail address: [email protected]. † Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. http://econ.lse.ac.uk/staff/olinton/˜index own.html. E-mail address: [email protected]. ‡ Department of Statistics and Applied Probability, National University of Singapore, Singapore. http://www.stat.nus.edu.sg/˜staxyc. E-mail address: [email protected].


Abstract

We use local polynomial fitting to estimate the nonparametric M-regression function for strongly mixing stationary processes $\{(Y_i, X_i)\}$. We establish a strong uniform consistency rate for the Bahadur representation of estimators of the regression function and its derivatives. These results are fundamental for statistical inference and for applications that involve plugging such estimators into other functionals where some control over higher-order terms is required. We apply our results to the estimation of an additive M-regression model. © The authors. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.


Uniform Bahadur Representation for Local

Polynomial Estimates of M-Regression and Its

Application to The Additive Model

EFANG KONG∗
Technische Universiteit Eindhoven

OLIVER LINTON†
London School of Economics

YINGCUN XIA‡
National University of Singapore

We use local polynomial fitting to estimate the nonparametric M-regression function for strongly mixing

stationary processes {(Yi, Xi)}. We establish a strong uniform consistency rate for the Bahadur rep-

resentation of estimators of the regression function and its derivatives. These results are fundamental

for statistical inference and for applications that involve plugging such estimators into other functionals

where some control over higher-order terms is required. We apply our results to the estimation of an

additive M-regression model.

1 INTRODUCTION

In many contexts one wants to evaluate the properties of some procedure that is a functional of

some given estimators. It is useful to be able to work with some plausible high level assumptions

about those estimators rather than to re-derive their properties for each different application.

In a fully parametric (and stationary, weakly dependent data) context, it is quite common to

assume that estimators are root-n consistent and asymptotically normal. In some cases this

property suffices; in other cases one needs to be more explicit in terms of the linear expansion

of these estimators, but in any case such expansions are quite natural and widely applicable. In


a nonparametric context there is less agreement about the use of such expansions and one often

sees standard properties of standard estimators derived anew for a different purpose. It is our

objective to provide results that can circumvent this. The types of application we have in mind

are estimation of semiparametric models where the parameters of interest are explicit or im-

plicit functionals of nonparametric regression functions and their derivatives; see Powell (1994),

Andrews (1994) and Chen, Linton and Van Keilegom (2003). Another class of applications

includes estimation of structured nonparametric models like the additive models (Linton and

Nielsen, 1995) or the generalized additive models (Linton, Sperlich and Van Keilegom, 2007).

We motivate our results in a simple i.i.d. setting. Suppose we have a random sample

{Yi, Xi}ni=1 and consider the Nadaraya-Watson estimator of the regression function m(x) =

E(Yi|Xi = x),

\[
\hat m(x) = \frac{\hat r(x)}{\hat f(x)} = \frac{n^{-1}\sum_{i=1}^{n} K_h(X_i - x)Y_i}{n^{-1}\sum_{i=1}^{n} K_h(X_i - x)},
\]

where $K(\cdot)$ is a symmetric density function, $h$ is a bandwidth and $K_h(\cdot) = K(\cdot/h)/h$. Standard

arguments (Hardle, 1990) show that under suitable smoothness conditions,

\[
\hat m(x) - m(x) = h^2 b(x) + \frac{1}{n f(x)} \sum_{i=1}^{n} K_h(X_i - x)\,\varepsilon_i + R_n(x), \tag{1}
\]

where $b(x) = \frac{1}{2}\int u^2 K(u)\,du\,\{m''(x) + 2m'(x)f'(x)/f(x)\}$, while $f(x)$ is the covariate density

function and $\varepsilon_i \equiv Y_i - m(X_i)$ is the error term. The remainder term $R_n(x)$ is of smaller order

(almost surely) than the two leading terms. Such an expansion is sufficient to derive the central

limit theorem for $\hat m(x)$ itself, but generally is not sufficient if $\hat m(x)$ is to be plugged into some

semiparametric procedure. For example, suppose we estimate the parameter $\theta_0 = \int m(x)^2\,dx \neq 0$

by $\hat\theta = \int \hat m(x)^2\,dx$, where the integral is over some compact set $D$; we would expect to find that

$n^{1/2}(\hat\theta - \theta_0)$ is asymptotically normal. Based on expansion (1), the argument goes like this.

First, we obtain

\[
n^{1/2}(\hat\theta - \theta_0) = 2n^{1/2}\int m(x)\{\hat m(x) - m(x)\}\,dx + n^{1/2}\int [\hat m(x) - m(x)]^2\,dx.
\]

If it can be shown that $\hat m(x) - m(x) = o(n^{-1/4})$ a.s. uniformly in $x \in D$ (such results are widely


available; see for example Masry (1996)), we have

\[
n^{1/2}(\hat\theta - \theta_0) = 2n^{1/2}\int m(x)\{\hat m(x) - m(x)\}\,dx + o(1) \quad \text{a.s.}
\]

Note that the quantity on the right hand side is the term in Assumption 2.6 of Chen, Linton,

and Van Keilegom (2003) which is assumed to be asymptotically normal. It is the verification

of this condition with which we are now concerned. We substitute in expansion (1) and obtain

\[
n^{1/2}(\hat\theta - \theta_0) = 2n^{1/2}h^2\int m(x)b(x)\,dx + 2n^{1/2}\int n^{-1}\sum_{i=1}^{n}\varepsilon_i K_h(X_i - x)\frac{m(x)}{f(x)}\,dx
+ 2n^{1/2}\int m(x)R_n(x)\,dx + o(1) \quad \text{a.s.}
\]

If $nh^4 \to 0$, then the first term (the smoothing bias term) is $o(1)$. The second term (the stochastic

term) is a sum of independent random variables with mean zero, which can be rewritten, using

a change of variables, as

\[
n^{1/2}\int m(x)f^{-1}(x)\,n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\varepsilon_i\,dx = n^{-1/2}\sum_{i=1}^{n}\xi_n(X_i)\varepsilon_i,
\qquad
\xi_n(X_i) = \int m(X_i + uh)f^{-1}(X_i + uh)K(u)\,du,
\]

and this term obeys the Lindeberg central limit theorem under standard conditions. The problem

is with the third term, as equation (1) only guarantees that $\int m(x)R_n(x)\,dx = o(n^{-2/5})$ a.s. at

best. In fact, it is possible to derive a more useful Bahadur representation (Bahadur, 1966) for

the kernel estimator

\[
\hat m(x) - m(x) = h^2 b_n(x) + \{E\hat f(x)\}^{-1} n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\varepsilon_i + R_n^{*}(x), \tag{2}
\]

where $b_n(x)$ is deterministic and satisfies $b_n(x) \to b(x)$ and $E\hat f(x) \to f(x)$ uniformly in $x \in D$,

while the remainder term now satisfies

\[
\sup_{x\in D}|R_n^{*}(x)| = O\Big(\frac{\log n}{nh}\Big) \quad \text{a.s.} \tag{3}
\]

This property is a consequence of the uniform convergence rates of $\hat f(x) - E\hat f(x)$, of
$n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\{m(X_i) - m(x)\} - E\big[K_h(X_i - x)\{m(X_i) - m(x)\}\big]$, and of
$n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\varepsilon_i$ that follow


from, for example, Masry (1996). Clearly, by appropriate choice of the bandwidth $h$, $R_n^{*}(x)$ can

be made $o(n^{-1/2})$ a.s. uniformly over $D$, and thus $2n^{1/2}\int m(x)R_n^{*}(x)\,dx = o(1)$ a.s. Therefore,

to derive asymptotic normality for $n^{1/2}(\hat\theta - \theta_0)$, one can just work with the two leading terms

in (2). These terms are slightly more complicated than in the previous expansion but are still

sufficiently simple for many purposes; in particular, $b_n(x)$ is uniformly bounded so that, provided

$nh^4 \to 0$, the smoothing bias term satisfies $h^2 n^{1/2}\int m(x)b_n(x)\,dx \to 0$, while the stochastic term

is a sum of zero-mean independent random variables

\[
n^{1/2}\int \frac{m(x)}{\bar f(x)}\,n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\varepsilon_i\,dx = n^{-1/2}\sum_{i=1}^{n}\bar\xi_n(X_i)\varepsilon_i,
\qquad
\bar\xi_n(X_i) = \int \frac{m(X_i + uh)}{\bar f(X_i + uh)}K(u)\,du,
\]

and obeys the Lindeberg central limit theorem under standard conditions, where $\bar f(x) = E\hat f(x)$.

This argument shows the utility of the Bahadur representation (2). There are many other applications

of this result, because a host of probabilistic results are available for random variables like

$n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\varepsilon_i$ and integrals thereof.

The one-dimensional Nadaraya-Watson estimator for i.i.d. data is particularly easy to analyze,

and the above arguments are well known. However, the limitations of this estimator are

manifold, and there are good theoretical reasons for working instead with the local polynomial

class of estimators (Fan and Gijbels, 1996). In addition, for many data, especially financial time

series data, one may have concerns about heavy tails or outliers that point in the direction of

using robust estimators like the local median or local quantile method, perhaps combined with

local polynomial fitting. We examine a general class of (nonlinear) M-regression functions (that

is, location functionals defined through minimization of a general objective function $\rho(\cdot)$) and

their derivative estimators. We treat a general time series setting where the multivariate data are

strongly mixing. Under mild conditions, we establish a uniform strong Bahadur representation

like (2) and (3) with remainder term of order $(\log n/(nh^d))^{3/4}$ almost surely, a rate that is nearly

optimal: by the results in Kiefer (1967) for the i.i.d. setting, it cannot be improved further. The

leading terms are linear, and functionals of them can be analyzed simply.


The remainder term can be made $o(n^{-1/2})$ a.s. under restrictions on the dimensionality

in relation to the amount of smoothness possessed by the M-regression function.

The best convergence rate of unrestricted nonparametric estimators depends strongly on $d$,

the dimension of the covariates. The rate decreases dramatically as $d$ increases (Stone, 1982).

This phenomenon is the so-called "curse of dimensionality". One approach to reducing the curse

is to impose model structure. A popular structure is the additive model, which assumes that

\[
m(x_1,\ldots,x_d) = c + m_1(x_1) + \cdots + m_d(x_d), \tag{4}
\]

where $c$ is an unknown constant and $m_k(\cdot)$, $k = 1,\ldots,d$, are unknown functions normalized

so that $Em_k(x_k) = 0$, $k = 1,\ldots,d$. In this case, the optimal rate of convergence

is the same as in univariate nonparametric regression (Stone, 1986). An additive M-regression

function is given by (4), where $m(x)$ is the M-regression function defined in (5) for some loss

function $\rho(\cdot;\cdot)$. Previous work on additive quantile regression, for example, includes Linton

(2001) and Horowitz and Lee (2005) for the i.i.d. case. An interesting application of the

additive M-regression model is to combine (4) with the volatility model

\[
Y_i = \sigma_i\varepsilon_i \quad \text{and} \quad \ln\sigma_i^2 = m(X_i),
\]

where $X_i = (Y_{i-1},\ldots,Y_{i-d})^{\top}$. We suppose that $\varepsilon_i$ satisfies $E[\varphi(\ln\varepsilon_i^2;0)|X_i] = 0$, with $\varphi(\cdot;\cdot)$ the

piecewise derivative of $\rho(\cdot;\cdot)$, whence $m(\cdot)$ is the conditional M-regression of $\ln Y_i^2$ given $X_i$.

Peng and Yao (2003) applied LAD estimation to parametric ARCH and GARCH models and

showed the superior robustness of this procedure over the Gaussian QMLE with regard to

heavy-tailed innovations. This heavy-tail issue also arises in nonparametric regression models,

and empirical evidence suggests that moderately high-frequency financial data are often heavy-

tailed, which is why our procedures may be useful. We apply the Bahadur representations to

the study of the marginal integration estimators (Linton and Nielsen, 1995) of the component

functions in the additive M-regression model, in which case we only need the remainder term to

be $o(n^{-p/(2p+1)})$ a.s., where $p$ is a smoothness index.

Bahadur representations (Bahadur, 1966) have been widely studied and applied, with notable

refinements in the i.i.d. setting by Kiefer (1967). A recent paper by Wu (2005) extends these

results to a general class of dependent processes and provides a review. The paper closest to

ours is Hong (2003), which established the Bahadur representation for essentially the same local

polynomial M-regression estimator as ours. However, his results are (a) pointwise, i.e., for a

single $x$ only; (b) for a univariate covariate; and (c) for i.i.d. data. Clearly, this limits the range

of applicability of his results; in particular, the applications to semiparametric or additive

models are perforce precluded.

2 THE GENERAL SETTING

Let $\{(Y_i, X_i)\}$ be a jointly stationary process, where $X_i = (x_{i1},\ldots,x_{id})^{\top}$ with $d \geq 1$ and $Y_i$ is

a scalar. As dependent observations are considered in this paper, we introduce here the mixing

coefficient. Let $\mathcal{F}_s^t$ be the $\sigma$-algebra of events generated by the random variables $\{(Y_i, X_i),\, s \leq i \leq t\}$. A stationary stochastic process $\{(Y_i, X_i)\}$ is strongly mixing if

\[
\sup_{A\in\mathcal{F}_{-\infty}^{0},\,B\in\mathcal{F}_{k}^{\infty}} \big|P[AB] - P[A]P[B]\big| = \gamma[k] \to 0, \quad \text{as } k \to \infty,
\]

and $\gamma[k]$ is called the strong mixing coefficient.

Suppose $\rho(\cdot;\cdot)$ is a loss function. Our first goal is to estimate the multivariate M-regression

function

\[
m(x_1,\cdots,x_d) = \arg\min_{\theta} E\{\rho(Y_i;\theta)\,|\,X_i = (x_1,\cdots,x_d)\}, \tag{5}
\]

and its partial derivatives, based on the observations $\{(Y_i, X_i)\}_{i=1}^{n}$. An important example of the

M-function is the $q$th ($0 < q < 1$) quantile of $Y_i$ given $X_i = (x_1,\cdots,x_d)^{\top}$, with loss function

$\rho(y;\theta) = (2q-1)(y-\theta) + |y-\theta|$. Another example is the $L_q$ criterion $\rho(y;\theta) = |y-\theta|^{q}$ for $q > 1$,

which includes the least squares criterion $\rho(y;\theta) = (y-\theta)^2$, with $m(\cdot)$ the conditional expectation

of $Y_i$ given $X_i$. Yet another example is the celebrated Huber's function (Huber, 1973)

\[
\rho(t) = (t^2/2)\,I\{|t| < k\} + (k|t| - k^2/2)\,I\{|t| \geq k\}. \tag{6}
\]
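The loss functions above can be written down directly; a small sketch (the parameter values are illustrative choices of ours):

```python
import numpy as np

def rho_quantile(y, theta, q):
    """Check-type loss (2q - 1)(y - theta) + |y - theta|, minimised at the q-th quantile."""
    u = y - theta
    return (2.0 * q - 1.0) * u + np.abs(u)

def rho_huber(t, k):
    """Huber's loss (6): quadratic inside [-k, k], linear outside."""
    a = np.abs(t)
    return np.where(a < k, a**2 / 2.0, k * a - k**2 / 2.0)

print(rho_quantile(3.0, 1.0, 0.5))                      # q = 1/2 gives |3 - 1| = 2
print(rho_huber(0.5, 1.345), rho_huber(3.0, 1.345))
```

For $q = 1/2$ the check loss reduces to the absolute loss, so the corresponding M-regression function is the conditional median.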


Suppose $m(x)$ is differentiable up to order $p+1$ at $x = (x_1,\ldots,x_d)^{\top}$. Then the multivariate $p$th

order local polynomial approximation of $m(z)$, for any $z$ close to $x$, is given by

\[
m(z) \approx \sum_{0\leq |r|\leq p} \frac{1}{r!}\,D^{r}m(x)\,(z - x)^{r},
\]

where $r = (r_1,\ldots,r_d)$, $|r| = \sum_{i=1}^{d} r_i$, $r! = r_1!\times\cdots\times r_d!$, and

\[
D^{r}m(x) = \frac{\partial^{|r|} m(x)}{\partial x_1^{r_1}\cdots\partial x_d^{r_d}},
\qquad
x^{r} = x_1^{r_1}\times\cdots\times x_d^{r_d},
\qquad
\sum_{0\leq |r|\leq p} = \sum_{j=0}^{p}\;\sum_{\substack{0\leq r_1,\ldots,r_d\leq j \\ r_1+\cdots+r_d=j}}. \tag{7}
\]

Let $K(u)$ be a density function on $\mathbb{R}^d$, $h$ a bandwidth and $K_h(u) = K(u/h)$. With the observations

$\{(Y_i, X_i)\}_{i=1}^{n}$, we consider minimizing the following quantity with respect to $\beta_r$, $0 \leq |r| \leq p$:

\[
\sum_{i=1}^{n} K_h(X_i - x)\,\rho\Big(Y_i;\; \sum_{0\leq |r|\leq p}\beta_r (X_i - x)^{r}\Big). \tag{8}
\]

Denote by $\hat\beta_r(x)$, $0 \leq |r| \leq p$, the minimizers of (8). The M-regression function $m(x)$ and its partial

derivatives $D^{r}m(x)$, $1 \leq |r| \leq p$, are then estimated respectively by

\[
\hat m(x) = \hat\beta_0(x) \quad \text{and} \quad \widehat{D^{r}m}(x) = r!\,\hat\beta_r(x), \quad 1 \leq |r| \leq p. \tag{9}
\]
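For a general $\rho$ the minimization in (8) has no closed form. For Huber's loss (6), one standard computational route (our choice of algorithm; the paper does not prescribe one) is iteratively reweighted least squares. A sketch for $d = 1$, $p = 1$ with the Epanechnikov kernel and synthetic data:

```python
import numpy as np

def local_huber(x, X, Y, h, k=1.345, iters=25):
    """Local linear (d = 1, p = 1) Huber M-estimate at x via IRLS.

    Approximately minimises sum_i K_h(X_i - x) * rho(Y_i - b0 - b1*(X_i - x))
    with rho Huber's loss (6); returns (m-hat(x), m'-hat(x))."""
    u = (X - x) / h
    kern = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # K(u/h)
    Z = np.column_stack([np.ones_like(X), X - x])                # (1, X_i - x)
    beta = np.zeros(2)
    for _ in range(iters):
        r = Y - Z @ beta
        # Huber weights: 1 inside [-k, k], k/|r| outside, so outliers are downweighted
        w = kern * np.where(np.abs(r) < k, 1.0, k / np.maximum(np.abs(r), 1e-12))
        WZ = Z * w[:, None]
        beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)  # weighted least squares step
    return beta

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 400)
Y = 2.0 * X + 0.1 * rng.standard_normal(400)
Y[:5] += 20.0                        # gross outliers
b0, b1 = local_huber(0.5, X, Y, h=0.25)
print(b0, b1)                        # near m(0.5) = 1 and m'(0.5) = 2
```

Despite the injected outliers, the Huber weights keep both the level and the slope estimates close to the truth, which is the robustness motivation given in the Introduction.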

3 MAIN RESULTS

In Theorem 3.2 below we give our main result, the uniform strong Bahadur representation for

the vector $\hat\beta_p(x)$. We first need to develop some notation to define the leading terms in the

representation.

Let $N_i = \binom{i+d-1}{d-1}$ be the number of distinct $d$-tuples $r$ with $|r| = i$. Arrange these $d$-tuples

as a sequence in lexicographical order, with the highest priority given to the last position,

so that $(0,\cdots,0,i)$ is the first element in the sequence and $(i,0,\cdots,0)$ the last. Let

$\tau_i$ denote this one-to-one mapping, i.e. $\tau_i(1) = (0,\cdots,0,i),\ \cdots,\ \tau_i(N_i) = (i,0,\cdots,0)$. For each

$i = 1,\cdots,p$, define an $N_i \times 1$ vector $\mu_i(x)$ with its $k$th element given by $x^{\tau_i(k)}$, and write $\mu(x) =

(1, \mu_1(x)^{\top},\cdots,\mu_p(x)^{\top})^{\top}$, which is a column vector of length $N = \sum_{i=0}^{p} N_i$. Similarly define

the vectors $\beta_p(x)$ and $\beta$ through the same lexicographical arrangement of $D^{r}m(x)$ and $\beta_r$ in (8) for


$0 \leq |r| \leq p$. Thus (8) can be rewritten as

\[
\sum_{i=1}^{n} K_h(X_i - x)\,\rho\big(Y_i;\,\mu(X_i - x)^{\top}\beta\big). \tag{10}
\]

Denote the minimizer of (10) by $\hat\beta_n(x)$, and let $\hat\beta_p(x) = W_p\hat\beta_n(x)$, where $W_p$ is a diagonal

matrix whose diagonal entries are the lexicographical arrangement of $r!$, $0 \leq |r| \leq p$.
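The lexicographical arrangement just described is easy to implement; a sketch (the helper name `tau` mirrors the mapping $\tau_i$ in the text):

```python
from itertools import product
from math import comb, factorial

def tau(i, d):
    """All d-tuples r with |r| = i, ordered with highest priority on the last
    position: (0,...,0,i) first and (i,0,...,0) last."""
    tuples = [r for r in product(range(i + 1), repeat=d) if sum(r) == i]
    return sorted(tuples, key=lambda r: r[::-1], reverse=True)

d, p = 2, 2
order = [r for i in range(p + 1) for r in tau(i, d)]
print(order)                         # [(0,0), (0,1), (1,0), (0,2), (1,1), (2,0)]
assert all(len(tau(i, d)) == comb(i + d - 1, d - 1) for i in range(p + 1))  # N_i
# diagonal of W_p: r! = r_1! * ... * r_d! in the same order
print([factorial(r[0]) * factorial(r[1]) for r in order])   # [1, 1, 1, 2, 1, 2]
```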

Let $\nu_i = \int K(u)u^{i}\,du$. For $g(\cdot)$ given in (A.7), define

\[
\nu_{ni}(x) = \int K(u)u^{i}g(x + hu)f(x + hu)\,du.
\]

For $0 \leq j, k \leq p$, let $S_{j,k}$ and $S_{n,j,k}(x)$ be two $N_j \times N_k$ matrices with their $(l,m)$ elements

respectively given by

\[
\big[S_{j,k}\big]_{l,m} = \nu_{\tau_j(l)+\tau_k(m)}, \qquad \big[S_{n,j,k}(x)\big]_{l,m} = \nu_{n,\tau_j(l)+\tau_k(m)}(x). \tag{11}
\]

Now define the $N \times N$ matrices $S_p$ and $S_{n,p}(x)$ by

\[
S_p = \begin{pmatrix}
S_{0,0} & S_{0,1} & \cdots & S_{0,p} \\
S_{1,0} & S_{1,1} & \cdots & S_{1,p} \\
\vdots & & \ddots & \vdots \\
S_{p,0} & S_{p,1} & \cdots & S_{p,p}
\end{pmatrix},
\qquad
S_{n,p}(x) = \begin{pmatrix}
S_{n,0,0}(x) & S_{n,0,1}(x) & \cdots & S_{n,0,p}(x) \\
S_{n,1,0}(x) & S_{n,1,1}(x) & \cdots & S_{n,1,p}(x) \\
\vdots & & \ddots & \vdots \\
S_{n,p,0}(x) & S_{n,p,1}(x) & \cdots & S_{n,p,p}(x)
\end{pmatrix}.
\]

According to Lemma 5.8, $S_{n,p}(x)$ converges to $g(x)f(x)S_p$ uniformly in $x \in D$ almost surely.

Hence for $|S_p| \neq 0$, we can define

\[
\beta_n^{*}(x) = -\frac{1}{nh^d}\,W_p S_{n,p}^{-1}(x) H_n^{-1}\sum_{i=1}^{n} K_h(X_i - x)\,\varphi\big(Y_i;\,\mu(X_i - x)^{\top}\beta_p(x)\big)\,\mu(X_i - x), \tag{12}
\]

where $\varphi(\cdot;\cdot)$ is the piecewise derivative of $\rho(\cdot;\cdot)$ as defined in (A1), and $H_n$ is a diagonal matrix

with diagonal entries $h^{|r|}$, $0 \leq |r| \leq p$, in the aforementioned lexicographical order. The quantity

$\beta_n^{*}(x)$ is the leading term in the Bahadur representation of $\hat\beta_p(x) - \beta_p(x)$; it is the sum of a bias

term, $E\beta_n^{*}(x)$, and a stochastic term, $\beta_n^{*}(x) - E\beta_n^{*}(x)$.

Denote the typical element of $\beta_n^{*}(x)$ by $\beta_{nr}^{*}(x)$, $0 \leq |r| \leq p$, and the probability density

function of $X$ by $f(\cdot)$. The following result on $E\beta_{nr}^{*}(x)$ is an extension of Proposition 2.2 in

Hong (2003) to the multivariate case.


PROPOSITION 3.1 If $f(x) > 0$ and conditions (A1)-(A5) in the Appendix hold, then

\[
E\beta_{nr}^{*}(x) =
\begin{cases}
-h^{p+1} e_{N(r)} W_p S_p^{-1} B_1 m_{p+1}(x) + o(h^{p+1}), & \text{for } p - |r| \text{ odd}, \\[6pt]
-h^{p+2} e_{N(r)} W_p S_p^{-1}\Big[\{fg\}^{-1}(x)\,m_{p+1}(x)\{M(x) - N_p S_p^{-1}B_1\} + B_2 m_{p+2}(x)\Big] \\
\quad + o(h^{p+2}), & \text{for } p - |r| \text{ even},
\end{cases}
\]

where $N(r) = \tau_{|r|}^{-1}(r) + \sum_{k=0}^{|r|-1} N_k$, $e_i$ is an $N \times 1$ vector having 1 as its $i$th entry and all other

entries 0, and $B_1 = [S_{0,p+1}, S_{1,p+1},\cdots,S_{p,p+1}]^{\top}$, $B_2 = [S_{0,p+2}, S_{1,p+2},\cdots,S_{p,p+2}]^{\top}$.

We next present our main result, the Bahadur representation for the local polynomial estimates

$\hat\beta_p(x)$.

THEOREM 3.2 Suppose (A1)-(A7) in the Appendix hold with $\lambda_2 = (p+1)/\{2(p+s+1)\}$ for

some $s \geq 0$, and let $D$ be any compact subset of $\mathbb{R}^d$. Then

\[
\sup_{x\in D}\big|H_n\{\hat\beta_p(x) - \beta_p(x)\} - \beta_n^{*}(x)\big| = O\Big(\Big\{\frac{\log n}{nh^d}\Big\}^{\lambda(s)}\Big) \quad \text{almost surely},
\]

where $|\cdot|$ is taken to be the sup norm and

\[
\lambda(s) = \min\Big\{\frac{p+1}{p+s+1},\; \frac{3p+3+2s}{4p+4s+4}\Big\}.
\]

REMARK 1. According to Theorem 1 in Kiefer (1967), the pointwise sharpest bound on the

remainder term in the Bahadur representation of sample quantiles is $(\log\log n/n)^{3/4}$. As

$\lambda(0) = 3/4$, we can safely claim that the results here cannot be further improved for the general

class of loss functions $\rho(\cdot)$ specified by (A1) and (A2). Nevertheless, it is possible to derive

stronger results if the loss function in question enjoys a higher degree of smoothness, e.g. (3),

in which case $\rho(\cdot)$ is the squared loss function. More specifically, suppose that $\varphi(\cdot)$ is Lipschitz

continuous and (A1)-(A7) in the Appendix hold with $\lambda_2 = 1/2$ and $\lambda_1 = 1$. Then we prove in

the Appendix that

\[
\sup_{x\in D}\big|H_n\{\hat\beta_p(x) - \beta_p(x)\} - \beta_n^{*}(x)\big| = O\Big(\frac{\log n}{nh^d}\Big) \quad \text{almost surely}. \tag{13}
\]

REMARK 2. The dependence among the observations does not affect the rate of

uniform convergence, provided that the degree of dependence, as measured by the mixing

coefficient $\gamma[k]$, is weak enough that (A.3) and (A.4) are satisfied. This is in accordance

with the results in Masry (1996), who proved that for the local polynomial estimator of the

conditional mean function, the uniform convergence rate is $(nh^d/\log n)^{-1/2}$, the same as in the

independent case.

REMARK 3. It is of practical interest to provide an explicit rate of decay for the strong mixing

coefficient $\gamma[k]$, of the form $\gamma[k] = O(1/k^{c})$ for some $c > 0$ (to be determined), for Theorem 3.2 to

hold. It is easy to see that, among all the conditions imposed on $\gamma[k]$, the summability condition

(A.4) is the most restrictive. We assume that

\[
h = h_n \sim (\log n/n)^{a} \quad \text{for some} \quad
\frac{1}{2(p+s+1)+d} \leq a < \frac{1}{d}\Big\{1 - \frac{4}{(1-\lambda_2)\nu_2 - 4\lambda_1 + 2(1+\lambda_2)}\Big\},
\]

whence (A.2) holds. Algebraic calculations show that (A.4) would be true if

\[
c > \frac{\nu_2(1-ad)\{(1-\lambda_2)(4N+1) + 8N\lambda_1\} + 10 + (4+8N)ad}{2(1-\lambda_2)(1-ad)\nu_2 - 8ad + 4(1-ad)(1-\lambda_2-2\lambda_1)} - 1 \equiv c(d, p, \nu_2, a, \lambda_1, \lambda_2). \tag{14}
\]

Note that we would need the condition

\[
\nu_2 > 2 + \frac{4\{ad + (1-ad)\lambda_1\}}{(1-ad)(1-\lambda_2)}
\]

to secure a positive denominator in (14). As $c(d, p, \nu_2, a, \lambda_1, \lambda_2)$ is decreasing in $\nu_2$ ($\leq \nu_1$), there

is a tradeoff between the order $\nu_1$ of the moment $E|\varphi(\varepsilon_i)|^{\nu_1} < \infty$ and the decay rate of the

strong mixing coefficient $\gamma[k]$: the existence of higher-order moments allows $\gamma[k]$ to decay more

slowly.

REMARK 4. It is straightforward to generalize the result in Theorem 3.2 to functionals of the

M-estimates $\hat\beta_p(x)$. Denote the typical elements of $\hat\beta_p(x)$ and $\beta_p(x)$ by $\hat\beta_{pr}(x)$ and $\beta_{pr}(x)$, $0 \leq

|r| \leq p$, respectively. Suppose $G(\cdot): \mathbb{R} \to \mathbb{R}$ is such that for any compact set $D \subset \mathbb{R}^d$, there

exists some constant $C > 0$ such that $|G'(\beta_{pr}(x))| \leq C$ and $|G''(\beta_{pr}(x))| \leq C$ for all $x \in D$.

Then with probability 1,

\[
\sup_{x\in D}\Big|h^{|r|}\{G(\hat\beta_{pr}(x)) - G(\beta_{pr}(x))\} - G'(\beta_{pr}(x))\beta_{nr}^{*}(x)\Big| = O\Big(\Big\{\frac{\log n}{nh^d}\Big\}^{\lambda(s)}\Big). \tag{15}
\]

The following corollary follows from Theorem 3.2 and the uniform convergence of sums

of weakly dependent zero-mean random variables.

COROLLARY 3.3 Suppose the conditions in Theorem 3.2 hold with $s = 0$. Then with probability

1 we have, uniformly in $x \in D$,

\[
H_n\{\hat\beta_p(x) - \beta_p(x)\} - E\beta_n^{*}(x)
- \frac{W_p H_n^{-1}}{nh^d}\,S_{n,p}^{-1}(x)\sum_{i=1}^{n} K_h(X_i - x)\varphi(\varepsilon_i)\mu(X_i - x)
= O\Big(\Big\{\frac{\log n}{nh^d}\Big\}^{3/4}\Big).
\]

4 M-ESTIMATION OF THE ADDITIVE MODEL

In this section, we apply our main result to derive the properties of a class of estimators in the

additive M-regression model (4). For estimating the component functions $m_k(\cdot)$, $k =

1,\ldots,d$, in (4), the marginal integration method (Linton and Nielsen, 1995) is known to achieve

the optimal rate under certain conditions. It involves first estimating the unrestricted M-

regression function $m(\cdot)$ and then integrating it over some directions. Partition $X_i = (x_1,\ldots,x_d)$

as $X_i = (x_{1i}, X_{2i})$, where $x_{1i}$ is the one-dimensional direction of interest and $X_{2i}$ is a $(d-1)$-

dimensional nuisance direction. Let $x = (x_1, x_2)$ and define the functional

\[
\phi_1(x_1) = \int m(x_1, x_2)f_2(x_2)\,dx_2, \tag{16}
\]

where $f_2(x_2)$ is the joint probability density of $X_{2i}$. Under the additive structure (4), $\phi_1(\cdot)$ is

$m_1(\cdot)$ up to a constant. Replacing $m(\cdot)$ in (16) with $\hat\beta_0(x_1, x_2) \equiv \hat\beta_0(x)$ given by (9), $\phi_1(x_1)$

can be estimated by the sample version of (16):

\[
\hat\phi_{n1}(x_1) = n^{-1}\sum_{i=1}^{n}\hat\beta_0(x_1, X_{2i}).
\]
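To see why averaging over the nuisance directions recovers $m_1$ up to a constant, the following sketch averages the true additive $m$ (in place of the local polynomial fit $\hat\beta_0$, purely to illustrate (16)); the component functions are hypothetical choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
m1 = lambda x1: x1**2                      # hypothetical components, d = 2
m2 = lambda x2: np.sin(x2)
m = lambda x1, x2: m1(x1) + m2(x2)

X2 = rng.uniform(0.0, 1.0, 5000)           # sample from the nuisance direction
phi_n1 = lambda x1: np.mean(m(x1, X2))     # n^{-1} sum_i m(x1, X_{2i})
# the constant E m2(X2) cancels in differences, leaving m1
print(phi_n1(0.3) - phi_n1(0.0))           # m1(0.3) - m1(0.0) = 0.09 (up to rounding)
```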

As noted by Linton and Hardle (1996) and Hengartner and Sperlich (2005), careful choice

of the bandwidth is crucial for $\hat\phi_{n1}(\cdot)$ to be asymptotically normal. They suggested that different

bandwidths be used for the direction of interest $X_1$ and the $(d-1)$-dimensional nuisance direction

$X_2$, say $h_1$ and $h$ respectively. Sperlich et al. (1998) provide an extensive study of the small-

sample properties of the marginal integration estimators, including an evaluation of bandwidth

choice.

The following corollary concerns the asymptotic properties of $\hat\phi_{n1}(\cdot)$.


COROLLARY 4.1 Suppose the support of $X$ is $[0,1]^{\otimes d}$ with strictly positive probability density

function. Assume that the conditions in Corollary 3.3 hold with $T_n \equiv \{r(n)/\min(h_1,h)\}^{d}$ and with

$h^d$ replaced by $h_1 h^{d-1}$ in all the notation defined in (A.1) or (A.2). If $h_1 \propto n^{-1/(2p+3)}$,

$h = O(h_1)$ and (A.2) is modified as

\[
nh_1 h^{3(d-1)}/\log^3 n \to \infty, \qquad n^{-1}\{r(n)\}^{\nu_2/2}d_n\log n/M_n^{(2)} \to \infty, \tag{17}
\]

then we have

\[
(nh_1)^{1/2}\{\hat\phi_{n1}(x_1) - \phi_1(x_1)\} \overset{L}{\to}
N\big(e_1 W_p S_p^{-1}B_1 E m_{p+1}(x_1, X_2),\ \sigma^2(x_1)\big),
\]

where `$\overset{L}{\to}$' stands for convergence in distribution,

\[
\sigma^2(x_1) = \Big\{\int_{[0,1]^{\otimes(d-1)}}\{fg^2\}^{-1}(x_1, X_2)\,f_2^2(X_2)\,\sigma^2(x_1, X_2)\,dX_2\Big\}\,
e_1 S_p^{-1}K_2 K_2^{\top}S_p^{-1}e_1^{\top},
\]

$\sigma^2(x) = E[\varphi^2(\varepsilon)|X = x]$ and $K_2 = \int_{[0,1]^{\otimes d}}K(v)\mu(v)\,dv$. In particular, for the additive quantile

regression model, i.e. $\rho(y;\theta) = (2q-1)(y-\theta) + |y-\theta|$, we have

\[
\sigma^2(x_1) = q(1-q)\Big\{\int_{[0,1]^{\otimes(d-1)}}f^{-1}(x_1, X_2)\,f_{\varepsilon}^{-2}(0|x_1, X_2)\,f_2^2(X_2)\,dX_2\Big\}\,
e_1 S_p^{-1}K_2 K_2^{\top}S_p^{-1}e_1^{\top}.
\]

REMARK 5. For the conditions in Corollary 4.1 to hold, we need $3d < 2p + 5$; i.e., the

order $p$ of the local polynomial approximation should increase with the dimension of the covariates

$X$. See also the discussion in Hengartner and Sperlich (2005).

REMARK 6. Besides asymptotic normality, we could also, by applying Theorem 3.2, develop

Bahadur representations for $\hat\phi_{n1}(x_1)$, like those assumed in Linton, Sperlich and Van Keilegom

(2007). Based on (15), similar results also apply to the generalized additive M-regression

model, i.e. $G(m(x_1,\ldots,x_d)) = c + m_1(x_1) + \cdots + m_d(x_d)$ for some known smooth function $G(\cdot)$, in

which case the marginal integration estimator is defined as the sample average of $G(\hat m(x_1, X_{2i}))$.

5 CONCLUSION

We have obtained an asymptotic expansion for a nonlinear local polynomial M-estimator of a

conditional location functional for stationary weakly dependent processes. The approximations

we have obtained are of high enough order for many applications based on computing func-

tionals of these estimators. The error from the omitted terms is established in two cases, the

smooth case and the unsmooth case, and in both cases we achieve what appears to be the optimal

rate.

REFERENCES

Andrews, D.W.K. (1994) Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica 62, 43-72.

Bahadur, R.R. (1966) A note on quantiles in large samples. Annals of Mathematical Statistics 37, 577-580.

Bosq, D. (1998) Nonparametric Statistics for Stochastic Processes. New York: Springer-Verlag.

Chen, X., Linton, O. & Van Keilegom, I. (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71, 1591-1608.

Fan, J., Heckman, N.E. & Wand, M.P. (1995) Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association 90, 141-150.

Fan, J. & Gijbels, I. (1996) Local Polynomial Modelling and Its Applications. London: Chapman & Hall.

Hall, P. & Heyde, C.C. (1980) Martingale Limit Theory and Its Application. New York: Academic Press.

Hengartner, N.W. & Sperlich, S. (2005) Rate optimal estimation with the integration method in the presence of many covariates. Journal of Multivariate Analysis 95, 246-272.

Hong, S. (2003) Bahadur representation and its application for local polynomial estimates in nonparametric M-regression. Journal of Nonparametric Statistics 15, 237-251.

Horowitz, J.L. & Lee, S. (2005) Nonparametric estimation of an additive quantile regression model. Journal of the American Statistical Association 100, 1238-1249.

Huber, P.J. (1973) Robust regression. Annals of Statistics 1, 799-821.

Kiefer, J. (1967) On Bahadur's representation of sample quantiles. Annals of Mathematical Statistics 38, 1323-1342.

Linton, O. (2001) Estimating additive nonparametric models by partial Lq norm: the curse of fractionality. Econometric Theory 17, 1037-1350.

Linton, O. & Hardle, W. (1996) Estimation of additive regression models with known links. Biometrika 83, 529-540.

Linton, O. & Nielsen, J.P. (1995) A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93-100.

Linton, O., Sperlich, S. & Van Keilegom, I. (in press) Estimation of a semiparametric transformation model by minimum distance. Annals of Statistics.

Masry, E. (1996) Multivariate local polynomial regression for time series: uniform strong consistency and rates. Journal of Time Series Analysis 17, 571-599.

Peng, L. & Yao, Q. (2003) Least absolute deviation estimation for ARCH and GARCH models. Biometrika 90, 967-975.

Powell, J.L. (1994) Estimation in semiparametric models. In Engle, R.F. & McFadden, D.L. (eds.), The Handbook of Econometrics, vol. 4, pp. 2444-2521. Amsterdam: North Holland.

Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences of the United States of America 42, 43-47.

Sperlich, S., Linton, O. & Hardle, W. (1998) A simulation comparison between the backfitting and integration methods of estimating separable nonparametric models. Test 8, 419-458.

Stone, C.J. (1982) Optimal global rates of convergence for nonparametric regression. Annals of Statistics 10, 1040-1053.

Stone, C.J. (1986) The dimensionality reduction principle for generalized additive models. Annals of Statistics 14, 592-606.

Wu, W.B. (2005) On the Bahadur representation of sample quantiles for dependent sequences. Annals of Statistics 33, 1934-1963.


APPENDIX: Proofs

We will need the following notation. For any λ_2 ∈ (0, 1), λ_1 ∈ (λ_2, (1 + λ_2)/2] and M > 2, define

d_n = (nh^d/log n)^{-(λ_1+λ_2/2)} (nh^d log n)^{1/2},  r(n) = (nh^d/log n)^{(1-λ_2)/2},  (A.1)

M_n^{(1)} = M (nh^d/log n)^{-λ_1},  M_n^{(2)} = M^{1/4} (nh^d/log n)^{-λ_2},  T_n = {r(n)/h}^d,

and let L_n be the smallest integer such that (log n)(M/2)^{L_n+1} > n M_n^{(2)}/d_n. Let ‖·‖ denote the Euclidean norm and let C be a generic constant, which may take different values at each appearance. Let ε_i ≡ Y_i − m(X_i) and assume that the following conditions hold.

(A1) For each y ∈ R, ρ(y; θ) is absolutely continuous in θ, i.e., there exists a function ϕ(y; θ) ≡ ϕ(y − θ) such that for any θ ∈ R, ρ(y; θ) = ρ(y; 0) + ∫_0^θ ϕ(y; t) dt. The probability density function of ε_i is bounded, E|ϕ(ε_i)|^{ν_1} < ∞ for some ν_1 > 2, and E{ϕ(ε_i)|X_i} = 0 almost surely.

(A2) ϕ(·) satisfies a Lipschitz condition on each interval (a_j, a_{j+1}), j = 0, · · · , m, where a_0 ≡ −∞, a_{m+1} ≡ +∞ and a_1 < · · · < a_m are the finitely many jump discontinuity points of ϕ(·).

(A3) K(·) has compact support, say [−1, 1]^{⊗d}, and |H_j(u) − H_j(v)| ≤ C‖u − v‖ for all j with 0 ≤ |j| ≤ 2p + 1, where H_j(u) = u^j K(u).

(A4) The probability density function of X, f(·), is bounded with bounded first-order derivatives. The joint probability density f(u, v; l) of (X_0, X_l) satisfies f(u, v; l) ≤ C < ∞ for all l ≥ 1.

(A5) For r with |r| = p + 1, D^r m(x) is bounded with bounded first-order derivatives.

(A6) The bandwidth h → 0 is such that

nh^d/log n → ∞,  nh^{d+(p+1)/λ_2}/log n < ∞,  n^{-1}{r(n)}^{ν_2/2} d_n log n / M_n^{(2)} → ∞,  (A.2)


for some 2 < ν_2 ≤ ν_1, and the process {(Y_i, X_i)} is strongly mixing with mixing coefficients γ[k] satisfying

∑_{k=1}^∞ k^a {γ[k]}^{1−2/ν_2} < ∞ for some a > (p + d + 1)(1 − 2/ν_2)/d.  (A.3)

Moreover, the bandwidth h and γ[k] should jointly satisfy the following condition:

∑_{n=1}^∞ n^{3/2} T_n {M_n^{(1)}/d_n}^{1/2} · γ[r(n)(2^{ν_2/2}/M)^{2L_n/ν_2}] / {r(n)(2^{ν_2/2}/M)^{2L_n/ν_2}} · {4M^{2N}}^{L_n} < ∞,  ∀M > 0.  (A.4)

(A7) The conditional density f_{X|Y} of X given Y exists and is bounded. The conditional density f_{(X_1,X_{l+1})|(Y_1,Y_{l+1})} of (X_1, X_{l+1}) given (Y_1, Y_{l+1}) exists and is bounded for all l ≥ 1.

REMARK 7. The assumptions on ϕ(·) in (A1) and (A2) are satisfied in almost all known robust and likelihood-type regressions. For example, in qth quantile regression we have ϕ(t) = 2qI{t ≥ 0} + (2q − 2)I{t < 0}, while for Huber's function (6) the piecewise derivative is given by ϕ(t) = tI{|t| < k} + sign(t)kI{|t| ≥ k}.
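The two score functions just displayed can be sketched directly; the following is our own illustration (function names are ours), showing that both are bounded and piecewise Lipschitz, as (A1)-(A2) require.

```python
import numpy as np

# Sketch of the two score functions from Remark 7 (function names are ours):
# the qth-quantile score and the derivative of Huber's function with
# truncation point k.

def phi_quantile(t, q):
    """phi(t) = 2q I{t >= 0} + (2q - 2) I{t < 0}; a single jump at t = 0."""
    return np.where(np.asarray(t, dtype=float) >= 0, 2.0 * q, 2.0 * q - 2.0)

def phi_huber(t, k):
    """phi(t) = t I{|t| < k} + sign(t) k I{|t| >= k}; bounded and Lipschitz."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < k, t, np.sign(t) * k)

# For q = 1/2 the quantile score reduces to the sign function, so the
# centering condition E{phi(eps)|X} = 0 holds at a conditional median.
print(phi_quantile([-1.0, 1.0], q=0.5))
print(phi_huber([-3.0, 0.5, 3.0], k=1.0))
```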

Note that the condition E{ϕ(ε_i)|X_i} = 0 almost surely is necessary for correct model specification. Moreover, if the conditional density f(y|x) of Y given X is also continuously differentiable with respect to y, then as shown in Hong (2003) there exists a constant C > 0 such that for all small t and x,

E[{ϕ(Y; t + a) − ϕ(Y; a)}^2 | X = u] ≤ C|t|  (A.5)

holds for all (a, u) in a neighborhood of (m(x), x). Define

G(t, u) = E{ϕ(Y; t)|X = u},  G_i(t, u) = (∂^i/∂t^i) G(t, u), i = 1, 2.  (A.6)

Then it holds that

g(x) = G_1(m(x), x) ≥ C > 0, and G_2(t, x) is bounded for all x ∈ D and t near m(x).  (A.7)

Assumptions (A3)-(A7) are standard for nonparametric smoothing in multivariate time series analysis; see Masry (1996). For example, condition (A.3) is needed to bound the covariance


of the partial sums of the time series, as in Lemma 5.5, while (A.4) plays a similar role to (4.7b) in Masry (1996). It guarantees that the dependence of the time series is weak enough that the deviation caused by the approximation of dependent random variables by independent ones (through Bradley's strong approximation theorem) is negligible; see Lemma 5.4. Of course, (A.4) is more stringent than (4.7b) in Masry (1996), due to the non-linear nature of the estimates obtained by using the loss function ρ(·) instead of the method of least squares.

Proof of Proposition 3.1. Write β*_n(x) = −W_p S_{n,p}^{-1}(x) ∑_{i=1}^n Z_{ni}(x)/n, where

Z_{ni}(x) = H_n^{-1} h^{-d} K_h(X_i − x) ϕ(Y_i, µ(X_i − x)^⊤ β_p(x)) µ(X_i − x).

We first focus on EZ_{ni}(x). Based on (A.6) and (A.7), we have

E{ϕ(Y_i, µ(X_i − x)^⊤ β_p(x)) | X_i} = G(µ(X_i − x)^⊤ β_p(x), X_i)
 = −g(X_i){m(X_i) − µ(X_i − x)^⊤ β_p(x)} + G_2(ξ_i(x), X_i){m(X_i) − µ(X_i − x)^⊤ β_p(x)}^2/2

for some ξ_i(x) between µ(X_i − x)^⊤ β_p(x) and m(X_i). Apparently, if X_i = x + hv, then

m(X_i) − µ(X_i − x)^⊤ β_p(x) = h^{p+1} ∑_{|k|=p+1} {D^k m(x)/k!} v^k + h^{p+2} ∑_{|k|=p+2} {D^k m(x)/k!} v^k + o(h^{p+2}).
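As a concrete instance of this expansion (our illustration, not from the paper): for d = 1 and a local linear fit, p = 1, the display above is the ordinary one-dimensional Taylor expansion

```latex
m(x+hv) - \bigl\{ m(x) + hv\,m'(x) \bigr\}
  = \frac{h^{2}v^{2}}{2!}\,m''(x) + \frac{h^{3}v^{3}}{3!}\,m'''(x) + o(h^{3}),
```

so the multi-index sums over |k| = p + 1 and |k| = p + 2 reduce to the single terms k = 2 and k = 3.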

Therefore,

EZ_{ni}(x) = h^{p+1} ∫ K(v){fg}(x + hv)µ(v) ∑_{|k|=p+1} {D^k m(x)/k!} v^k dv
 + h^{p+2} ∫ K(v){fg}(x + hv)µ(v) ∑_{|k|=p+2} {D^k m(x)/k!} v^k dv + o(h^{p+2})
 ≡ T_1 + T_2.

Now arrange the N_{p+1} elements of the derivatives D^r m(x)/r! for |r| = p + 1 as a column vector m_{p+1}(x) using the lexicographical order introduced earlier, and define m_{p+2}(x) in a similar way. Let the N × N_{p+1} matrix B_{n1}(x) and the N × N_{p+2} matrix B_{n2}(x) be defined as

B_{n1}(x) = \begin{pmatrix} S_{n,0,p+1}(x) \\ S_{n,1,p+1}(x) \\ \vdots \\ S_{n,p,p+1}(x) \end{pmatrix},  B_{n2}(x) = \begin{pmatrix} S_{n,0,p+2}(x) \\ S_{n,1,p+2}(x) \\ \vdots \\ S_{n,p,p+2}(x) \end{pmatrix},


where S_{n,i,p+1}(x) and S_{n,i,p+2}(x) are as given by (11). Therefore, T_1 = h^{p+1} B_{n1}(x) m_{p+1}(x), T_2 = h^{p+2} B_{n2}(x) m_{p+2}(x), and

Eβ*_n(x) = −W_p h^{p+1} S_{n,p}^{-1}(x) B_{n1}(x) m_{p+1}(x) − W_p h^{p+2} S_{n,p}^{-1}(x) B_{n2}(x) m_{p+2}(x) + o(h^{p+2}).

Let e_i, i = 1, · · · , d, be the d × 1 vector having 1 in the ith entry and all other entries 0. For 0 ≤ j ≤ p, 0 ≤ k ≤ p + 1, let N_{j,k}(x) be the N_j × N_k matrix with (l, m) element

[N_{j,k}(x)]_{l,m} = ∑_{i=1}^d D^{e_i}{fg}(x) ∫ K(u) u^{τ_j(l)+τ_k(m)+e_i} du,

and use these N_{j,k}(x) to construct an N × N matrix N_p(x) and an N × N_{p+1} matrix M(x) via

N_p(x) = \begin{pmatrix} N_{0,0}(x) & N_{0,1}(x) & \cdots & N_{0,p}(x) \\ N_{1,0}(x) & N_{1,1}(x) & \cdots & N_{1,p}(x) \\ \vdots & & \ddots & \vdots \\ N_{p,0}(x) & N_{p,1}(x) & \cdots & N_{p,p}(x) \end{pmatrix},  M(x) = \begin{pmatrix} N_{0,p+1}(x) \\ N_{1,p+1}(x) \\ \vdots \\ N_{p,p+1}(x) \end{pmatrix}.

Then S_{n,p}(x) = {fg}(x)S_p + hN_p(x) + O(h^2), B_{n1}(x) = {fg}(x)B_1 + hM(x) + O(h^2) and B_{n2}(x) = {fg}(x)B_2 + O(h). As

S_{n,p}^{-1}(x) = {fg}^{-1}(x)S_p^{-1} − h{fg}^{-2}(x)S_p^{-1}N_p(x)S_p^{-1} + O(h^2),

we have

−Eβ*_n(x) = W_p h^{p+1}[{fg}^{-1}(x)S_p^{-1} − h{fg}^{-2}(x)S_p^{-1}N_p(x)S_p^{-1}][{fg}(x)B_1 + hM(x)]m_{p+1}(x)
 + W_p h^{p+2}{fg}^{-1}(x)S_p^{-1}{fg}(x)B_2 m_{p+2}(x) + o(h^{p+2})
 = h^{p+1}W_p S_p^{-1}B_1 m_{p+1}(x) + h^{p+2}W_p S_p^{-1}[{fg}^{-1}(x){M(x) − N_p(x)S_p^{-1}B_1}m_{p+1}(x) + B_2 m_{p+2}(x)] + o(h^{p+2}).

We claim that for elements Eβ*_{nr}(x) of Eβ*_n(x) with p − |r| even, the h^{p+1} term will vanish. This means that for any given r_1 with |r_1| ≤ p, p − |r_1| even, and r_2 with |r_2| = p + 1,

∑_{0≤|r|≤p} {S_p^{-1}}_{N(r_1),N(r)} ν_{r+r_2} = 0.  (A.8)

To prove this, first note that for any r_1 with 0 ≤ |r_1| ≤ p and r_2 with |r_2| = p + 1,

∑_{0≤|r|≤p} {S_p^{-1}}_{N(r_1),N(r)} ν_{r+r_2} = ∫ u^{r_2} K_{r_1,p}(u) du,  (A.9)


where K_{r_1,p}(u) = {|M_{r_1,p}(u)|/|S_p|}K(u) and M_{r_1,p}(u) is the same as S_p, but with the N(r_1) column replaced by µ(u). Let c_{ij} denote the cofactor of {S_p}_{i,j} and expand the determinant of M_{r_1,p}(u) along the N(r_1) column. We can see that

∫ u^{r_2} K_{r_1,p}(u) du = |S_p|^{-1} ∫ ∑_{0≤|r|≤p} c_{N(r),N(r_1)} u^{r_2+r} K(u) du,

whence (A.9) follows, because c_{N(r),N(r_1)}/|S_p| = {S_p^{-1}}_{N(r_1),N(r)} by the symmetry of S_p and a standard result concerning cofactors. As a generalization of Lemma 4 in Fan et al. (1995) to the multivariate case, we can further show that for any r_1 with 0 ≤ |r_1| ≤ p and p − |r_1| even,

∫ u^{r_2} K_{r_1,p}(u) du = 0 for any |r_2| = p + 1,

which together with (A.9) leads to (A.8). �
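The vanishing-moment property just used can be checked numerically in a simple special case; the following is our own illustration, not part of the proof, with d = 1, p = 2 and r_1 = 0 (so p − |r_1| is even) and the Epanechnikov kernel.

```python
import numpy as np

# Numerical check (ours, not part of the proof) of the vanishing moment
# behind (A.8): d = 1, p = 2, r1 = 0, Epanechnikov kernel on [-1, 1].
u = np.linspace(-1.0, 1.0, 200001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u**2)                   # Epanechnikov kernel
mu = np.vstack([u**j for j in range(3)])  # mu(u) = (1, u, u^2)^T
Sp = (mu * K) @ mu.T * du                 # S_p = int mu(u) mu(u)^T K(u) du
w = np.linalg.solve(Sp, np.eye(3))[0]     # first row of S_p^{-1}
Keq = (w @ mu) * K                        # equivalent kernel K_{0,2}(u)
moment = np.sum(u**3 * Keq) * du          # int u^{p+1} K_{0,2}(u) du
mass = np.sum(Keq) * du                   # int K_{0,2}(u) du
print(moment, mass)                       # moment is numerically zero, mass is 1
```

Here the zero moment follows from the evenness of the equivalent kernel, which is exactly the symmetry exploited in the cofactor argument above.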

We proceed to prove Theorem 3.2. Define X_{ix} = X_i − x, µ_{ix} = µ(X_{ix}), K_{ix} = K_h(X_{ix}) and ϕ_{ni}(x; t) = ϕ(Y_i; µ_{ix}^⊤ β_p(x) + t). For any α, β ∈ R^N, define

Φ_{ni}(x; α, β) = K_{ix}{ρ(Y_i; µ_{ix}^⊤(α + β + β_p(x))) − ρ(Y_i; µ_{ix}^⊤(β + β_p(x))) − ϕ_{ni}(x; 0) µ_{ix}^⊤ α}
 = K_{ix} ∫_{µ_{ix}^⊤ β}^{µ_{ix}^⊤(α+β)} {ϕ_{ni}(x; t) − ϕ_{ni}(x; 0)} dt

and R_{ni}(x; α, β) = Φ_{ni}(x; α, β) − EΦ_{ni}(x; α, β).

Lemma 5.1 Under assumptions (A1)-(A6), we have for all large M > 0,

sup_{x∈D} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n R_{ni}(x; α, β)| ≤ M^{3/2} d_n  almost surely,  (A.10)

where B_n^{(i)} ≡ {β ∈ R^N : |H_n β| ≤ M_n^{(i)}}, i = 1, 2.

Proof. Since D is compact, it can be covered by a finite number T_n of cubes D_k = D_{n,k} with side length l_n = O(T_n^{-1/d}) = O{h(nh^d/log n)^{-(1-λ_2)/2}} and centers x_k = x_{n,k}. Write

sup_{x∈D} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n R_{ni}(x; α, β)| ≤ max_{1≤k≤T_n} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n Φ_{ni}(x_k; α, β) − EΦ_{ni}(x_k; α, β)|
 + max_{1≤k≤T_n} sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n {Φ_{ni}(x_k; α, β) − Φ_{ni}(x; α, β)}|
 + max_{1≤k≤T_n} sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n {EΦ_{ni}(x_k; α, β) − EΦ_{ni}(x; α, β)}|
 ≡ Q_1 + Q_2 + Q_3.

In Lemma 5.2 it is shown that Q_2 ≤ M^{3/2} d_n/3 almost surely, and thus Q_3 ≤ M^{3/2} d_n/3.

It remains to bound Q_1. To this end, partition B_n^{(i)}, i = 1, 2, into a sequence of disjoint subrectangles D_1^{(i)}, · · · , D_{J_1}^{(i)} such that

|D_{j_1}^{(i)}| = sup{|H_n(α − β)| : α, β ∈ D_{j_1}^{(i)}} ≤ 2M^{-1} M_n^{(i)}/log n,  1 ≤ j_1 ≤ J_1.

Apparently, J_1 ≤ (M log n)^N. For every 1 ≤ j_1 ≤ J_1, 1 ≤ k_1 ≤ J_1, choose a point α_{j_1} ∈ D_{j_1}^{(1)} and β_{k_1} ∈ D_{k_1}^{(2)}. Then

Q_1 ≤ max_{1≤k≤T_n, 1≤j_1,k_1≤J_1} sup_{α∈D_{j_1}^{(1)}, β∈D_{k_1}^{(2)}} |∑_{i=1}^n {R_{ni}(x_k; α_{j_1}, β_{k_1}) − R_{ni}(x_k; α, β)}|
 + max_{1≤k≤T_n, 1≤j_1,k_1≤J_1} |∑_{i=1}^n R_{ni}(x_k; α_{j_1}, β_{k_1})| ≡ H_{n1} + H_{n2}.  (A.11)

We first consider H_{n1}. For each j_1 = 1, · · · , J_1 and i = 1, 2, partition each rectangle D_{j_1}^{(i)} further into a sequence of subrectangles D_{j_1,1}^{(i)}, · · · , D_{j_1,J_2}^{(i)}. Repeat this process recursively as follows. Suppose that after the lth round we have a sequence of rectangles D_{j_1,j_2,··· ,j_l}^{(i)} with 1 ≤ j_k ≤ J_k, 1 ≤ k ≤ l; then in the (l+1)th round, each rectangle D_{j_1,j_2,··· ,j_l}^{(i)} is partitioned into a sequence of subrectangles {D_{j_1,j_2,··· ,j_l,j_{l+1}}^{(i)}, 1 ≤ j_{l+1} ≤ J_{l+1}} such that

|D_{j_1,j_2,··· ,j_l,j_{l+1}}^{(i)}| = sup{|H_n(α − β)| : α, β ∈ D_{j_1,j_2,··· ,j_l,j_{l+1}}^{(i)}} ≤ 2M_n^{(i)}/(M^l log n),  1 ≤ j_{l+1} ≤ J_{l+1},


where J_{l+1} ≤ M^N. End this process after the (L_n + 1)th round, with L_n as defined at the beginning of this Appendix. Let D_l^{(i)}, i = 1, 2, denote the set of all subrectangles of D_0^{(i)} after the lth round of partitioning; a typical element D_{j_1,j_2,··· ,j_l}^{(i)} of D_l^{(i)} is denoted D_{(j_l)}^{(i)}. Choose a point α_{(j_l)} ∈ D_{(j_l)}^{(1)} and β_{(k_l)} ∈ D_{(k_l)}^{(2)} and define

V_l = ∑_{(j_l),(k_l)} P{ |∑_{i=1}^n {R_{ni}(x_k; α_{j_l}, β_{k_l}) − R_{ni}(x_k; α_{j_{l+1}}, β_{k_{l+1}})}| ≥ M^{3/2}d_n/2^l },  1 ≤ l ≤ L_n,

Q_l = ∑_{(j_l),(k_l)} P{ sup_{α∈D_{(j_l)}^{(1)}, β∈D_{(k_l)}^{(2)}} |∑_{i=1}^n {R_{ni}(x_k; α_{j_l}, β_{k_l}) − R_{ni}(x_k; α, β)}| ≥ M^{3/2}d_n/2^l },  1 ≤ l ≤ L_n + 1.

By (A4), it is easy to see that for any α ∈ D_{(j_{L_n+1})}^{(1)} ∈ D_{L_n+1}^{(1)} and β ∈ D_{(k_{L_n+1})}^{(2)} ∈ D_{L_n+1}^{(2)},

|R_{ni}(x_k; α, β) − R_{ni}(x_k; α_{j_{L_n+1}}, β_{k_{L_n+1}})| ≤ C M_n^{(2)}/(M^{L_n+1} log n),

which together with the choice of L_n implies that Q_{L_n+1} = 0. As Q_l ≤ V_l + Q_{l+1}, 1 ≤ l ≤ L_n,

P(H_{n1} > M^{3/2}d_n/2) ≤ T_n Q_1 ≤ T_n ∑_{l=1}^{L_n} V_l.  (A.12)

To bound V_l, l = 1, · · · , L_n, let

W_n = ∑_{i=1}^n Z_{ni},  Z_{ni} ≡ R_{ni}(x_k; α_{j_l}, β_{k_l}) − R_{ni}(x_k; α_{j_{l+1}}, β_{k_{l+1}}).  (A.13)

Note that by (A2) we have, uniformly in x, α and β,

|Φ_{ni}(x; α, β)| ≤ C M_n^{(1)}.  (A.14)

Therefore, |Z_{ni}| ≤ C M_n^{(1)}. Using Lemma 5.6, we can apply Lemma 5.4 to each V_l with

B_1 = C_1 M_n^{(1)},  B_2 = nh^d (M_n^{(1)})^2 M_n^{(2)} {M^l log n}^{-2/ν_2},
r_n = r_n^l ≡ (2^{ν_2/2}/M)^{2l/ν_2} r(n),  q = n/r_n^l,  η = M^{3/2} d_n/2^l,
λ_n = (2C_1 M_n^{(1)} r_n^l)^{-1},  Ψ(n) = C q^{3/2} η^{-1/2} γ[r_n^l] {r_n^l M_n^{(1)}}^{1/2}.

Note that nM_n^{(1)}/η → ∞ and r_n^l → ∞ for all 1 ≤ l ≤ L_n from (A.2), and

λη = C M^{1/2} (log n) M^{2l/ν_2}/2^{2l},  λ^2 B_2 = C (log n)^{1−2/ν_2} M^{2l/ν_2}/2^{2l} = o(λη),


which hold uniformly for all 1 ≤ l ≤ L_n. Therefore,

V_l ≤ ( ∏_{j=1}^{l+1} J_j^2 ) 4 exp{−C_1 (log n) (M/2^{ν_2})^{2l/ν_2}} + C_2 τ_n^l,

where, because J_1 ≤ 2(M log n)^N and J_l ≤ 2M^N for 2 ≤ l ≤ L_n, τ_n^l is given by

τ_n^l = 4^l M^{2N(l+1)} (log n)^{2N} n^{3/2} γ[r_n^l] {M_n^{(1)}}^{1/2} / (r_n^l {d_n}^{1/2}).

It is tedious but easy to check that for M large enough,

T_n ∑_{l=1}^{L_n} [ ( ∏_{j=1}^{l+1} J_j^2 ) 4 exp{−C_1 (log n) (M/2^{ν_2})^{2l/ν_2}} ] is summable over n.  (A.15)

As γ[r_n^l]/r_n^l is increasing in l, we have

T_n ∑_{l=1}^{L_n} τ_n^l ≤ T_n (log n)^{2N} n^{3/2} ({M_n^{(1)}}^{1/2}/{d_n}^{1/2}) (γ[r_n^{L_n}]/r_n^{L_n}) ∏_{l=1}^{L_n} 4^l M^{2N(l+1)},

which is again summable over n according to (A.4). This along with (A.12) and (A.15) implies that H_{n1} ≤ M^{3/2} d_n/2 almost surely, by the Borel-Cantelli lemma.

For H_{n2}, first note that

P(H_{n2} > η) ≤ T_n J_1^2 P( |∑_{i=1}^n R_{ni}(x; α_{j_1}, β_{k_1})| > η ).  (A.16)

We apply Lemma 5.4 to quantify P(|∑_{i=1}^n R_{ni}(x; α_{j_1}, β_{k_1})| > η), with r_n = r(n), B_1 = 2C_1 M_n^{(1)}, B_2 = C_2 nh^d (M_n^{(1)})^2 M_n^{(2)}, λ_n = {4C_1 r(n) M_n^{(1)}}^{-1} and η = M^{3/2} d_n. Then nB_1/η → ∞ and

λ_n η/4 = M^{1/2}(nh^d)^{(1−λ_2)/2}(log n)^{(1+λ_2)/2}/{16C_1 r(n)} = M^{1/2} log n/(16C_1),
λ_n^2 B_2 = C_2 M^{1/4}(nh^d)^{1−λ_2}(log n)^{λ_2}/{16C_1^2 r^2(n)} = C_2 M^{1/4} log n/(16C_1^2),
Ψ(n) ≡ q_n {nB_1/η}^{1/2} γ[r_n] = C n^{3/2} {M_n^{(1)}}^{1/2} γ[r(n)] / (r(n){d_n}^{1/2}),

where T_n J_1^2 Ψ(n) is summable over n under condition (A.4). Therefore,

P(H_{n2} > η) ≤ 2T_n J_1^2/n^b + T_n J_1^2 Ψ(n),  b = (M^{1/2} − M^{1/4} C_2/C_1)/(16C_1).  (A.17)

By selecting M large enough, we can ensure that the right-hand side of (A.17) is summable over n. Thus, for M large enough, H_{n2} ≤ M^{3/2} d_n almost surely. By (A.11), we then know that for large M, Q_1 ≤ M^{3/2} d_n almost surely. �

The quantification of Q_2 is more involved, so we state it as a separate lemma.


Lemma 5.2 Under the conditions of Lemma 5.1, Q_2 ≤ M^{3/2} d_n/3 almost surely.

Proof. Let X_{ik} = X_i − x_k, µ_{ik} = µ(X_{ik}) and K_{ik} = K_h(X_{ik}). Write Φ_{ni}(x_k; α, β) − Φ_{ni}(x; α, β) = ξ_{i1} + ξ_{i2} + ξ_{i3}, where

ξ_{i1} = (K_{ik}µ_{ik} − K_{ix}µ_{ix})^⊤ α ∫_0^1 {ϕ_{ni}(x_k; µ_{ik}^⊤(β + αt)) − ϕ_{ni}(x_k; 0)} dt,
ξ_{i2} = K_{ix} µ_{ix}^⊤ α ∫_0^1 {ϕ_{ni}(x_k; µ_{ik}^⊤(β + αt)) − ϕ_{ni}(x; µ_{ix}^⊤(β + αt))} dt,
ξ_{i3} = K_{ix} µ_{ix}^⊤ α {ϕ_{ni}(x; 0) − ϕ_{ni}(x_k; 0)}.

Then P(Q_2 > M^{3/2}d_n/3) ≤ T_n(P_{n1} + P_{n2} + P_{n3}), with

P_{nj} ≡ max_{1≤k≤T_n} P( sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ_{ij}| ≥ M^{3/2}d_n/9 ),  j = 1, 2, 3.

By the Borel-Cantelli lemma, Q_2 ≤ M^{3/2}d_n/3 almost surely if ∑_n T_n P_{nj} < ∞, j = 1, 2, 3.

We first study P_{n1}. For any fixed α ∈ B_n^{(1)} and β ∈ B_n^{(2)}, let I_{ik}^{α,β} = 1 if there exists some t ∈ [0, 1] such that ϕ(Y_i; θ) has a discontinuity point between µ_{ik}^⊤(β_p(x_k) + β + αt) and µ_{ik}^⊤ β_p(x_k), and I_{ik}^{α,β} = 0 otherwise. Write ξ_{i1} = ξ_{i1} I_{ik}^{α,β} + ξ_{i1}(1 − I_{ik}^{α,β}). Note that by (A3), |(K_{ik}µ_{ik} − K_{ix}µ_{ix})^⊤ α| ≤ C_2 M_n^{(1)} l_n/h. Then by (A2) and the fact that |µ_{ik}^⊤(β + αt)| ≤ C M_n^{(2)}, we have |ξ_{i1}(1 − I_{ik}^{α,β})| ≤ C M_n^{(2)} M_n^{(1)} l_n/h uniformly in i, α, β and x ∈ D_k. Define U_{ik} = I{|X_{ik}| ≤ 2h}, whence ξ_{i1} = ξ_{i1} U_{ik} since l_n = o(h). Therefore,

P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n ξ_{i1}(1 − I_{ik}^{α,β})| > M^{3/2}d_n/18 ) ≤ P( ∑_{i=1}^n U_{ik} > M^{1/4}nh^d/(18C) )
 ≤ P( |∑_{i=1}^n U_{ik} − EU_{ik}| > M^{1/4}nh^d/(36C) ),  (A.18)

where the second inequality follows from the fact that Var(∑_{i=1}^n I{|X_{ik}| ≤ 2h}) = O(nh^d), implied by Lemma 5.5. To quantify (A.18), we apply Lemma 5.4 with B_1 = 1, η = M^{1/4}nh^d/(18C), B_2 = nh^d, r_n = r(n). As λ_nη = CM^{1/4} (log n) (nh^d/log n)^{(1+λ_2)/2}, λ_n^2 B_2 = o(λ_nη) and T_nΨ_n is summable over n under condition (A.4), we know that

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ_{i1}(1 − I_{ik}^{α,β})| > M^{3/2}d_n/18 ) is summable over n,  (A.19)


whence ∑_n T_n P_{n1} < ∞ is equivalent to

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ_{i1} I_{ik}^{α,β}| > M^{3/2}d_n/18 ) being summable over n.  (A.20)

To prove (A.20), first note that I_{ik}^{α,β} ≤ I{ε_i ∈ S_{i;k}^{α,β}}, where

S_{i;k}^{α,β} = ⋃_{j=1}^m ⋃_{t∈[0,1]} [a_j − A(X_i, x_k) + µ_{ik}^⊤(β + αt), a_j − A(X_i, x_k)]
 ⊆ ⋃_{j=1}^m [a_j − CM_n^{(2)}, a_j + CM_n^{(2)}] ≡ D_n, for some C > 0,

A(x_1, x_2) = (p + 1) ∑_{|r|=p+1} (1/r!) (x_1 − x_2)^r ∫_0^1 D^r m(x_2 + w(x_1 − x_2))(1 − w)^p dw,

where in the derivation of S_{i;k}^{α,β} ⊆ D_n we have used the fact that |X_{ik}| ≤ 2h and A(X_i, x_k) = O(h^{p+1}) = O(M_n^{(2)}) uniformly in i. As I_{ik}^{α,β} ≤ I{ε_i ∈ D_n}, we have |ξ_{i1}| I_{ik}^{α,β} ≤ |ξ_{i1}| U_{ni}, where U_{ni} ≡ I{|X_{ik}| ≤ 2h} I{ε_i ∈ D_n}, which is independent of the choice of α and β. Therefore,

P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ_{i1} I_{ik}^{α,β}| > M^{3/2}d_n/18 ) ≤ P( ∑_{i=1}^n U_{ni} > M^{1/2}nh^d M_n^{(2)}/(18C) )
 ≤ P( ∑_{i=1}^n (U_{ni} − EU_{ni}) > M^{1/2}nh^d M_n^{(2)}/(36C) ),  (A.21)

where the first inequality is because |ξ_{i1}| ≤ CM_n^{(1)} l_n/h and the second is because EU_{ni} = O(h^d M_n^{(2)}) by (A1). As EU_{ni}^2 = EU_{ni}, by Lemma 5.5 we know that Var(∑_{i=1}^n U_{ni}) = O(nh^d M_n^{(2)}). We can then apply Lemma 5.4 to the last term in (A.21) with

B_2 = Cnh^d M_n^{(2)},  B_1 ≡ 1,  r_n = r(n),  η ≡ M^{1/2}nh^d M_n^{(2)}/(36C).

Apparently, λ_nη = C (log n) (nh^d/log n)^{(1−λ_2)/2} and λ_n^2 B_2 = o(λ_nη). As in this case T_nΨ_n is still summable over n by (A.4), (A.20) follows.

For P_{n2}, first note that, using the approach for P_{n1}, we can show that

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n {ξ_{i2} − ξ̄_{i2}}| ≥ M^{3/2}d_n/18 ) is summable over n,

where

ξ̄_{i2} = K_{ik} µ_{ik}^⊤ α ∫_0^1 {ϕ_{ni}(x_k; µ_{ik}^⊤(β + αt)) − ϕ_{ni}(x; µ_{ix}^⊤(β + αt))} dt.

Therefore, we would have ∑_n T_n P_{n2} < ∞ if

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n ξ̄_{i2}| ≥ M^{3/2}d_n/18 ) is summable over n.  (A.22)

For any fixed α ∈ B_n^{(1)}, β ∈ B_n^{(2)} and x ∈ D_k, let I_{i;k,x}^{α,β} = 1 if there exist an interval [t_1, t_2] ⊆ [0, 1] and some a_j ∈ {a_1, · · · , a_m} such that

Y_i − µ_{ik}^⊤(β_p(x_k) + β + αt) ≤ a_j ≤ Y_i − µ_{ix}^⊤(β_p(x) + β + αt),  ∀t ∈ [t_1, t_2],  (A.23)

and I_{i;k,x}^{α,β} = 0 otherwise. Write ξ̄_{i2} = ξ̄_{i2} I_{i;k,x}^{α,β} + ξ̄_{i2}(1 − I_{i;k,x}^{α,β}). Note that K_{ik}µ_{ik}^⊤α = O(M_n^{(1)}) and ϕ_{ni}(x_k; µ_{ik}^⊤(β + αt)) − ϕ_{ni}(x; µ_{ix}^⊤(β + αt)) = O(M_n^{(2)} l_n/h) if I_{i;k,x}^{α,β} = 0.

Then again, as ξ̄_{i2} = ξ̄_{i2} I{|X_{ik}| ≤ 2h}, we have, similarly to (A.19), that

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ̄_{i2}(1 − I_{i;k,x}^{α,β})| > M^{3/2}d_n/18 ) is summable over n.

Therefore, by (A.22), to show ∑_n T_n P_{n2} < ∞ it is sufficient to show that

T_n P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n ξ̄_{i2} I_{i;k,x}^{α,β}| ≥ M^{3/2}d_n/36 ) is summable over n.  (A.24)

To this end, define ε̄_i = ε_i + A(X_i, x_k). Then I_{i;k,x}^{α,β} = 1, i.e. (A.23), is equivalent to

A(X_i, x_k) − A(X_i, x) + µ_{ix}^⊤(β + αt) ≤ ε̄_i − a_j ≤ µ_{ik}^⊤(β + αt),  ∀t ∈ [t_1, t_2].  (A.25)

Let δ_n ≡ M_n^{(2)} l_n/h. Then |A(X_i, x_k) − A(X_i, x)| ≤ Cδ_n, |(µ_{ik} − µ_{ix})^⊤β| ≤ Cδ_n, and (A.25) thus implies that

−2Cδ_n + µ_{ik}^⊤(β + αt) ≤ ε̄_i − a_j ≤ µ_{ik}^⊤(β + αt) + 2Cδ_n,  ∀t ∈ [t_1, t_2].  (A.26)

Without loss of generality, assume µ_{ik}^⊤α > 0. Then from (A.26) we can see that

−2Cδ_n + µ_{ik}^⊤(β + αt_2) ≤ ε̄_i − a_j ≤ µ_{ik}^⊤(β + αt_1) + 2Cδ_n,  (A.27)


which in turn means that if I_{i;k,x}^{α,β} = 1, then |ξ̄_{i2}| ≤ C(t_2 − t_1)|µ_{ik}^⊤α| ≤ 4Cδ_n uniformly in i, α ∈ B_n^{(1)}, β ∈ B_n^{(2)} and x ∈ D_k. Therefore, as ξ̄_{i2} = ξ̄_{i2} I{|X_{ik}| ≤ 2h}, we have

P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n ξ̄_{i2} I_{i;k,x}^{α,β}| ≥ M^{3/2}d_n/36 )
 ≤ P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} ∑_{i=1}^n I{|X_{ik}| ≤ 2h} I_{i;k,x}^{α,β} ≥ M^{5/4}nh^d M_n^{(1)}/(36C) ).  (A.28)

We will bound I_{i;k,x}^{α,β} by a random variable that is independent of the choice of α ∈ B_n^{(1)} and x ∈ D_k. By the definition of I_{i;k,x}^{α,β} and (A.27), a necessary condition for I_{i;k,x}^{α,β} = 1 is

ε̄_i ∈ ⋃_{j=1}^m [a_j + µ_{ik}^⊤β − 2M_n^{(1)}, a_j + µ_{ik}^⊤β + 2M_n^{(1)}] ≡ D_{ni}^β,  (A.29)

which is indeed independent of the choice of α and x ∈ D_k. Therefore,

P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} ∑_{i=1}^n I{|X_{ik}| ≤ 2h} I_{i;k,x}^{α,β} ≥ M^{5/4}nh^d M_n^{(1)}/(36C) )
 ≤ P( sup_{β∈B_n^{(2)}} ∑_{i=1}^n I{|X_{ik}| ≤ 2h} I{ε̄_i ∈ D_{ni}^β} ≥ M^{5/4}nh^d M_n^{(1)}/(36C) ).  (A.30)

Now we partition B_n^{(2)} into a sequence of subrectangles S_1, · · · , S_m, such that

|S_l| = sup{|H_n(β − β′)| : β, β′ ∈ S_l} ≤ M_n^{(1)},  1 ≤ l ≤ m.

Obviously, m ≤ (M_n^{(2)}/M_n^{(1)})^N = M^{−3N/4}(nh^d/log n)^{(λ_1−λ_2)N}. Choose a point β_l ∈ S_l for each 1 ≤ l ≤ m, and thus

P( sup_{β∈B_n^{(2)}} ∑_{i=1}^n I{|X_{ik}| ≤ 2h} I{ε̄_i ∈ D_{ni}^β} ≥ M^{5/4}nh^d M_n^{(1)}/(36C) )
 ≤ m P( ∑_{i=1}^n I{|X_{ik}| ≤ 2h} I{ε̄_i ∈ D_{ni}^{β_l}} ≥ M^{5/4}nh^d M_n^{(1)}/(72C) )
 + m P( sup_{β′∈S_l} ∑_{i=1}^n I{|X_{ik}| ≤ 2h} |I{ε̄_i ∈ D_{ni}^{β_l}} − I{ε̄_i ∈ D_{ni}^{β′}}| ≥ M^{5/4}nh^d M_n^{(1)}/(72C) )
 ≡ m(T_1 + T_2).  (A.31)


We deal with T_1 first. Let

U_{ni}^l ≡ I{|X_{ik}| ≤ 2h} I{ε̄_i ∈ D_{ni}^{β_l}}.  (A.32)

Then by the definition of D_{ni}^{β_l} given in (A.29), EU_{ni}^l = O(h^d M_n^{(1)}) < M^{5/4}h^d M_n^{(1)}/(144C) for large M, and we have

T_1 ≤ P( ∑_{i=1}^n (U_{ni}^l − EU_{ni}^l) ≥ M^{5/4}nh^d M_n^{(1)}/(144C) ).

We can thus apply Lemma 5.4 to the quantity on the right-hand side with B_1 ≡ 1, B_2 given by (A.51), r_n = r(n), η ∝ M^{5/4}nh^d M_n^{(1)} and λ_n = 1/(2r_n). It follows that

λ_nη = CM^{5/4} (log n) (nh^d/log n)^{(1+λ_2)/2−λ_1},  λ_n^2 B_2 = C (log n) (nh^d/log n)^{−2(λ_1−λ_2)/ν_2}.

As (1 + λ_2)/2 ≥ λ_1 and λ_2 < λ_1, we have T_1 = O(n^{−b}) for any b > 0.

For T_2, note that as |µ_{ik}^⊤(β − β_l)| ≤ CM_n^{(1)} for any β ∈ S_l, 1 ≤ l ≤ m, we have

|I{ε̄_i ∈ D_{ni}^{β_l}} − I{ε̄_i ∈ D_{ni}^β}| = I{ε̄_i ∈ D_{ni}^{β_l} △ D_{ni}^β}
 ≤ I{ ε̄_i ∈ ⋃_{j=1}^m [a_j + µ_{ik}^⊤β_l − CM_n^{(1)}, a_j + µ_{ik}^⊤β_l + CM_n^{(1)}] } ≡ U_{ni},

for some C > 0, which is independent of the choice of β ∈ S_l. Therefore,

T_2 ≤ P( ∑_{i=1}^n I{|X_{ik}| ≤ 2h} U_{ni} ≥ M^{5/4}nh^d M_n^{(1)}/(72C) ),

which can be dealt with similarly to T_1, and thus T_2 = O(n^{−b}) for any b > 0. From (A.28), (A.30) and (A.31) we can therefore claim that (A.24) is true, and thus T_n P_{n2} is summable over n.

Dealing with P_{n3} is simpler, as no β is involved in ξ_{i3}. For any given x ∈ D_k, let I_{i;k,x} = 1 if there is a discontinuity point of ϕ(Y_i; θ) between µ_{ik}^⊤β_p(x_k) and µ_{ix}^⊤β_p(x), and I_{i;k,x} = 0 otherwise. Write ξ_{i3} = ξ_{i3}I_{i;k,x} + ξ_{i3}(1 − I_{i;k,x}). Again by (A2) and the facts that |K_{ix}µ_{ix}^⊤α| = O(M_n^{(1)}) and |µ_{ik}^⊤β_p(x_k) − µ_{ix}^⊤β_p(x)| = |A(X_i, x_k) − A(X_i, x)| = O(M_n^{(2)} l_n/h), we have, similarly to (A.19), that

T_n P( sup_{α∈B_n^{(1)}, x∈D_k} |∑_{i=1}^n ξ_{i3}(1 − I_{i;k,x})| > M^{3/2}d_n/18 ) is summable over n.


It is easy to see that I_{i;k,x} ≤ I{ε_i + A(X_i, x_k) ∈ S_{i;k,x}}, where

S_{i;k,x} = ⋃_{j=1}^m [a_j − |A(X_i, x_k) − A(X_i, x)|, a_j + |A(X_i, x_k) − A(X_i, x)|]
 ⊆ ⋃_{j=1}^m [a_j − CM_n^{(2)} l_n/h, a_j + CM_n^{(2)} l_n/h] ≡ D_n, for some C > 0.

Therefore, |ξ_{i3}| I_{i;k,x} = |ξ_{i3}| I{|X_{ik}| ≤ 2h} I_{i;k,x} ≤ U_{ni}, with

U_{ni} ≡ M_n^{(1)} I{|X_{ik}| ≤ 2h} I{ε_i + A(X_i, x_k) ∈ D_n},

which is independent of the choice of α ∈ B_n^{(1)} and x ∈ D_k. Therefore,

T_n P( sup_{α∈B_n^{(1)}, x∈D_k} |∑_{i=1}^n ξ_{i3} I_{i;k,x}| > M^{3/2}d_n/18 ) ≤ T_n P( ∑_{i=1}^n [U_{ni} − EU_{ni}] > M^{3/2}d_n/36 ),  (A.33)

where we have used the fact that EU_{ni} = O(h^d M_n^{(1)} M_n^{(2)} l_n/h) = O(d_n/n). We will have ∑_n T_n P_{n3} < ∞ if the right-hand side of (A.33) is summable over n, i.e. if

T_n P( ∑_{i=1}^n [U_{ni} − EU_{ni}] > M^{3/2}d_n/36 ) is summable over n.  (A.34)

It is easy to check that Lemma 5.5 again holds with ψ_x(X_i, Y_i) standing for U_{ni}. Applying Lemma 5.4 to (A.34) with B_1 ≡ M_n^{(1)}, B_2 ≡ Cnh^d(M_n^{(1)})^2 M_n^{(2)} l_n/h, η ≡ M^{3/2}d_n/36 and r_n = r(n), we have (note that nB_1/η → ∞ indeed)

λ_nη/4 = CM^{1/2} log n,  λ_n^2 B_2 = C r_n^{−2/ν_2} log n = o(λ_nη).

Thus, T_nΨ_n is again summable over n and (A.34) indeed holds. �

Proof of Theorem 3.2. Let λ_1 = λ(s). Then according to Lemma 5.1 and Lemma 5.9, we know that with probability 1 there exists some C_1 > 1 such that for all large M > 0,

sup_{x∈D} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n Φ_{ni}(x; α, β) − (nh^d/2)(H_nα)^⊤ S_{np}(x) H_n(α + 2β)|
 ≤ C_1 M^{3/2}(d_{n1} + d_n) ≤ 2C_1 M^{3/2}(nh^d)^{1−2λ_1}(log n)^{2λ_1} for large n,  (A.35)

where d_{n1} = (nh^d)^{1−λ_1−2λ_2}(log n)^{λ_1+2λ_2}. Note that based on (12), we can write

∑_{i=1}^n K_{ni} ϕ(Y_i; µ_{ni}^⊤β_p(x)) µ_{ni}^⊤α = nh^d β*_n(x)^⊤ W_p^{-1} S_{np}(x) H_nα.


Replace B_n^{(1)} in (A.35) with B_{nk}^{(1)} = {α ∈ R^N : k ≤ M^{-1}(nh^d/log n)^{λ_1}|H_nα| ≤ k + 1} and M with (k + 1)M. We have, by the definition of Φ_{ni}(x; α, β), that

inf_{x∈D} inf_{α∈B_{nk}^{(1)}, β∈B_n^{(2)}} { ∑_{i=1}^n ρ(Y_i; µ_{ni}^⊤(α + β + β_p(x)))K_{ni} − ∑_{i=1}^n ρ(Y_i; µ_{ni}^⊤(β + β_p(x)))K_{ni}
 + nh^d(W_p^{-1}β*_n(x) − H_nβ)^⊤ S_{np}(x) H_nα }
 ≥ inf_{x∈D} inf_{α∈B_{nk}^{(1)}} (nh^d/2)(H_nα)^⊤ S_{np}(x) H_nα − 2C_1(k+1)^{3/2}M^{3/2}(nh^d)^{1−2λ_1}(log n)^{2λ_1}
 ≥ {C_3(kM)^2/2 − 2C_1(k + 1)^{3/2}M^{3/2}}(nh^d)^{1−2λ_1}(log n)^{2λ_1}
 ≥ (8 − 2^{5/2})C_1C_3/2^4 · (nh^d)^{1−2λ_1}(log n)^{2λ_1} > 0 almost surely,  (A.36)

where the last term is independent of the choice of k ≥ 1. The last inequality is derived as follows.

As S_p > 0, suppose its minimum eigenvalue is τ_1 > 0. As S_{np}(x) → g(x)f(x)S_p uniformly in x ∈ D by Lemma 5.8, and g(x)f(x) is bounded away from zero by (A5) and (A.7), there exists some constant C_3 > 0 such that for all x ∈ D the minimum eigenvalue of S_{np}(x) is greater than C_3. The last inequality thus holds if M ≥ C_4 = (16C_1/C_3)^2. Note that

⋃_{k=1}^∞ B_{nk}^{(1)} = { α ∈ R^N : (nh^d/log n)^{λ_1}|H_nα| ≥ M } := B_n^N.  (A.37)

Therefore, from (A.36) and (A.37), we have

inf_{x∈D} inf_{α∈B_n^N, β∈B_n^{(2)}} { ∑_{i=1}^n ρ(Y_i; µ_{ni}^⊤(α + β + β_p(x)))K_{ni} − ∑_{i=1}^n ρ(Y_i; µ_{ni}^⊤(β + β_p(x)))K_{ni}
 + nh^d(W_p^{-1}β*_n(x) − H_nβ)^⊤ S_{np}(x) H_nα } > 0 almost surely.  (A.38)

Note that by (A.40), Lemma 5.10 and Proposition 3.1, we have |β*_n(x)| ≤ C_3(nh^d/log n)^{−λ_2} uniformly in x ∈ D almost surely. Namely, β*_n(x) ∈ B_n^{(2)} for all x ∈ D if M > C_3^4. This implies that if M > max(C_3^4, C_4), then (A.38) still holds with β replaced by H_n^{-1}W_p^{-1}β*_n(x). Therefore,

inf_{x∈D} inf_{α∈B_n^N} { ∑_{i=1}^n K_{ni} ρ(Y_i; µ_{ni}^⊤(α + H_n^{-1}W_p^{-1}β*_n(x) + β_p(x)))
 − ∑_{i=1}^n K_{ni} ρ(Y_i; µ_{ni}^⊤(H_n^{-1}W_p^{-1}β*_n(x) + β_p(x))) } > 0,


which is equivalent to Theorem 3.2. �

Proof of (13). Let d_n = (nh^d)^{1−2λ_1}(log n)^{2λ_1}. Following the lines of the proof of Theorem 3.2, we can see that (13) will follow if

sup_{x∈D} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n R_{ni}(x; α, β)| ≤ M^{3/2}d_n almost surely,

with λ_1 = 1, λ_2 = 1/2 and B_n^{(i)}, i = 1, 2, defined as in Lemma 5.1.

To prove this, cover D by a finite number T_n = {(nh^d/log n)^{1/2}/h}^d of cubes D_k = D_{nk} with side length l_n = O{h(nh^d/log n)^{−1/2}} and centers x_k = x_{n,k}. Write

sup_{x∈D} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n R_{ni}(x; α, β)| ≤ max_{1≤k≤T_n} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n Φ_{ni}(x_k; α, β) − EΦ_{ni}(x_k; α, β)|
 + max_{1≤k≤T_n} sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n {Φ_{ni}(x_k; α, β) − Φ_{ni}(x; α, β)}|
 + max_{1≤k≤T_n} sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n {EΦ_{ni}(x_k; α, β) − EΦ_{ni}(x; α, β)}|
 ≡ Q_1 + Q_2 + Q_3.

We will show that with probability 1, Q_k ≤ M^{3/2}d_n/3, k = 1, 2, 3.

Define ξ_{ij} as in the proof of Lemma 5.2. As P(Q_2 > M^{3/2}d_n/2) ≤ T_n(P_{n1} + P_{n2} + P_{n3}), where

P_{nj} ≡ max_{1≤k≤T_n} P( sup_{x∈D_k} sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} |∑_{i=1}^n ξ_{ij}| ≥ M^{3/2}d_n/9 ),  j = 1, 2, 3,

by the Borel-Cantelli lemma, Q_2 ≤ M^{3/2}d_n/2 almost surely if ∑_n T_n P_{nj} < ∞ for j = 1, 2, 3.

We only prove this for P_{n1}, to illustrate. Recall that

ξ_{i1} = (K_{ik}µ_{ik} − K_{ix}µ_{ix})^⊤ α ∫_0^1 {ϕ_{ni}(x_k; µ_{ik}^⊤(β + αt)) − ϕ_{ni}(x_k; 0)} dt.

Because |(K_{ik}µ_{ik} − K_{ix}µ_{ix})^⊤α| ≤ C_2 M_n^{(1)} l_n/h, |µ_{ik}^⊤(β + αt)| ≤ CM_n^{(2)} and ϕ(·) is Lipschitz continuous, we have |ξ_{i1}| ≤ CM_n^{(2)} M_n^{(1)} l_n/h. Define U_{ik} = I{|X_{ik}| ≤ 2h}. As l_n = o(h), we can


see that ξ_{i1} = ξ_{i1}U_{ik}, and similarly to (A.18) we have

P( sup_{α∈B_n^{(1)}, β∈B_n^{(2)}} sup_{x∈D_k} |∑_{i=1}^n ξ_{i1}| > M^{3/2}d_n/9 ) ≤ P( ∑_{i=1}^n U_{ik} > M^{1/4}nh^d/(9C) )
 ≤ P( |∑_{i=1}^n U_{ik} − EU_{ik}| > M^{1/4}nh^d/(18C) ),

and ∑_n T_n P_{n1} < ∞ thus follows from arguments similar to those between (A.18) and (A.19).

The proof that Q_1 ≤ M^{3/2}d_n/2 almost surely is much easier than in Lemma 5.1 when ϕ(·) is Lipschitz continuous. Instead of the iterative partition approach adopted there, we partition B_n^{(i)}, i = 1, 2, once and for all into a sequence of disjoint subrectangles D_1^{(i)}, · · · , D_{J_1}^{(i)} such that

|D_{j_1}^{(i)}| = sup{|H_n(α − β)| : α, β ∈ D_{j_1}^{(i)}} ≤ M_n^{(i)}(log n/n)^{1/2},  1 ≤ j_1 ≤ J_1.

Obviously, J_1 ≤ (n/log n)^{N/2}. Choose a point α_{j_1} ∈ D_{j_1}^{(1)} and β_{k_1} ∈ D_{k_1}^{(2)}. Then

Q_1 ≤ max_{1≤k≤T_n, 1≤j_1,k_1≤J_1} sup_{α∈D_{j_1}^{(1)}, β∈D_{k_1}^{(2)}} |∑_{i=1}^n {R_{ni}(x_k; α_{j_1}, β_{k_1}) − R_{ni}(x_k; α, β)}|
 + max_{1≤k≤T_n, 1≤j_1,k_1≤J_1} |∑_{i=1}^n R_{ni}(x_k; α_{j_1}, β_{k_1})| ≡ H_{n1} + H_{n2}.  (A.39)

By the Lipschitz continuity of ϕ(·), we have for any α ∈ D_{j_1}^{(1)} and β ∈ D_{k_1}^{(2)},

|Φ_{ni}(x_k; α_{j_1}, β_{k_1}) − Φ_{ni}(x_k; α, β)|^2 = O({M_n^{(2)}}^3 log n/n) < M^{3/2}d_n/(4n).

Therefore, it remains to show that P(H_{n2} > M^{3/2}d_n/4) is summable over n.

First note that, by the Cauchy inequality, |R_{ni}(x; α, β)|^2 = O({M_n^{(1)}M_n^{(2)}}^2) and E|R_{ni}(x; α, β)|^2 = O(h^d{M_n^{(1)}M_n^{(2)}}^2) uniformly in X_i, x, α ∈ B_n^{(1)} and β ∈ B_n^{(2)}. Next, for any η > 0,

P(H_{n2} > η) ≤ T_n J_1^2 P( |∑_{i=1}^n R_{ni}(x; α_{j_1}, β_{k_1})| > η ).

We apply Lemma 5.4 with r_n = (nh^d/log n)^{1/2}, B_1 = 2C_1 M_n^{(1)}M_n^{(2)}, B_2 = C_2 nh^d(M_n^{(1)}M_n^{(2)})^2, λ_n = (4C_1 r_n{M_n^{(2)}}^2)^{-1} and η = M^{3/2}d_n/4. It is easy to see that nB_1/η → ∞ and

λ_nη/4 = M log n/(16C_1),  λ_n^2 B_2 = o(λ_nη),
Ψ(n) ≡ q_n{nB_1/η}^{1/2}γ[r_n] = n^{3/2}(log n)^{−1/2}γ[r(n)]/r(n).

As T_n J_1^2 Ψ(n) is summable over n by condition (A.4), so is P(H_{n2} > M^{3/2}d_n/4). �

Proof of Corollary 3.3. As 1 + λ_2 ≥ 2λ_1, it is sufficient to prove that with probability 1,

β*_n(x) − Eβ*_n(x) − (nh^d)^{-1} W_p S_{np}^{-1}(x) H_n^{-1} ∑_{i=1}^n K_h(X_i − x) ϕ(ε_i) µ(X_i − x) = O{(log n/(nh^d))^{(1+λ_2)/2}},  (A.40)

uniformly in x ∈ D. As ϕ(ε_i) ≡ ϕ(Y_i, m(X_i)) and Eϕ(ε_i) = 0, the term on the left-hand side of (A.40) stands for

W_p S_{n,p}^{-1}(x) (nh^d)^{-1} ∑_{i=1}^n {Z_{ni}(x) − EZ_{ni}(x)},

where

Z_{ni}(x) = H_n^{-1} K_h(X_i − x) µ(X_i − x) {ϕ(Y_i, µ(X_i − x)^⊤β_p(x)) − ϕ(ε_i)}.

Next, similarly to what we did in Lemma 5.1, we cover D with T_n cubes D_k = D_{n,k} with side length l_n = O(T_n^{−1/d}) and centers x_k = x_{n,k}. Write

sup_{x∈D} |∑_{i=1}^n Z_{ni}(x) − EZ_{ni}(x)| ≤ max_{1≤k≤T_n} |∑_{i=1}^n Z_{ni}(x_k) − EZ_{ni}(x_k)|
 + max_{1≤k≤T_n} sup_{x∈D_k} |∑_{i=1}^n Z_{ni}(x) − Z_{ni}(x_k)| + max_{1≤k≤T_n} sup_{x∈D_k} |∑_{i=1}^n EZ_{ni}(x) − EZ_{ni}(x_k)|
 ≡ Q_1 + Q_2 + Q_3.

As Z_{ni}(x) − Z_{ni}(x_k) = H_n^{-1} K_h(X_i − x) µ(X_i − x){ϕ_{ni}(x; 0) − ϕ_{ni}(x_k; 0)}, through approaches similar to that for ξ_{i3} in the proof of Lemma 5.2 we can show that

Q_2 = O{ (nh^d/log n)^{(1−λ_2)/2} log n } almost surely,

and the same result holds for Q_3. To bound Q_1, first note that EZ_{ni}^2(x_k) = O(h^{p+1+d}) uniformly in i and k. As |Z_{ni}(x)| ≤ C for some constant C by (A2), we can see from


Lemma 5.5 that

∑_{i=1}^n EZ_{ni}^2(x_k) + ∑_{i<j} |Cov(Z_{ni}(x_k), Z_{nj}(x_k))| ≤ C_2 nh^{p+1+d}.

Finally, by Lemma 5.4 with B_1 = C_1, B_2 ≡ Cnh^{p+1+d}, η = A_3(nh^d/log n)^{(1−λ_2)/2}log n and r_n = r(n), we have, as nB_1/η → ∞,

λ_nη = A_3/(2C_1) log n,  λ_n^2 B_2 = C_2/(4C_1^2) log n.

Therefore,

P(

max1≤k≤Tn

∣∣∣ n∑i=1

Zni(xk)− EZni(xk)∣∣∣ ≥ A3(nhd/ log n)(1−λ2)/2 log n

)≤ Tn/n

a + CTnΨn,

where a = A3/(8C1) − C2/(4C21 ). By selecting A3 large enough, we can ensure that Tn/n

a is

summable over n. As TnΨn is summable over n from (A.4), we can conclude that

Q1 = O{( nhd

log n

)(1−λ2)/2log n

}almost surely.

This together with Lemma 5.8 completes the proof. �
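The covering step above, which bounds a supremum over $D$ by a maximum over the $T_n$ cube centers plus an oscillation term within each cube, can be checked numerically. The following is a minimal sketch assuming a generic Lipschitz function on $[0,1]^2$; the function, grid sizes and Lipschitz constant are illustrative choices, not objects from the paper.

```python
import numpy as np

# Covering/discretization step behind Q1 + Q2 + Q3: for a Lipschitz S on
# D = [0,1]^d,  sup_D |S| <= max_k |S(x_k)| + L * l_n * sqrt(d) / 2,
# where the T_n = m^d cubes have side length l_n = 1/m and centers x_k.
def covering_bound(S, d=2, m=20, fine=200, L=4.0):
    c = (np.arange(m) + 0.5) / m                      # cube centers per axis
    centers = np.stack(np.meshgrid(*([c] * d), indexing="ij"), axis=-1).reshape(-1, d)
    center_max = np.abs(S(centers)).max()             # the Q1-type maximum
    g = np.linspace(0.0, 1.0, fine)                   # dense grid ~ the supremum
    grid = np.stack(np.meshgrid(*([g] * d), indexing="ij"), axis=-1).reshape(-1, d)
    true_sup = np.abs(S(grid)).max()
    osc = L * np.sqrt(d) / (2 * m)                    # within-cube oscillation
    return true_sup, center_max, osc

# illustrative Lipschitz function (gradient norm <= sqrt(13) < 4)
S = lambda x: np.sin(3 * x[..., 0]) + np.cos(2 * x[..., 1])
true_sup, center_max, osc = covering_bound(S)
assert center_max <= true_sup <= center_max + osc
```

Since every point of $D$ lies within $l_n\sqrt d/2$ of some center, the discretization error is controlled exactly as in the bounds for $Q_2$ and $Q_3$.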

Proof of Corollary 4.1. From the proofs of Theorem 3.2 and Corollary 3.3, it is not difficult to see that Corollary 3.3 still holds under the conditions imposed here. Under the additive structure (4), we thus have
\begin{align*}
\phi_{n1}(x_1) = {}&\phi_1(x_1) + \frac1n\sum_{i=1}^n m_2(X_{2i}) - h^{p+1}e_1W_pS_p^{-1}B_1\frac1n\sum_{i=1}^n m_{p+1}(x_1, X_{2i}) \\
&+ \frac{1}{n^2h_1h^{d-1}}e_1\sum_{j=1}^n\varphi(\varepsilon_j)\sum_{i=1}^n S_{np}^{-1}(x_1, X_{2i})K(X_{1,xj}/h_1, X_{2,ij}/h)\mu(X_{1,xj}/h_1, X_{2,ij}/h) \\
&+ o_p(\{\max(h_1, h)\}^{p+1}) + O_p\{(nh_1h^{d-1}/\log n)^{-3/4}\}, \tag{A.41}
\end{align*}
where $X_{1,xj} = X_{1j} - x_1$, $X_{2,ij} = X_{2i} - X_{2j}$ and $e_1$ is as in Proposition 3.1. Note that by (17), $(nh_1)^{1/2}(nh_1h^{d-1}/\log n)^{-3/4} \to 0$, so the $O_p(\cdot)$ term can safely be ignored.

By the central limit theorem for strongly mixing processes (Bosq, 1998, Theorem 1.7), we have
\[
\frac1n\sum_{i=1}^n m_2(X_{2i}) = O_p(n^{-1/2}), \qquad \frac1n\sum_{i=1}^n m_{p+1}(x_1, X_{2i}) = Em_{p+1}(x_1, X_2) + O_p(n^{-1/2}).
\]
As the expectations of all other terms in (A.41) are 0, the leading term in the asymptotic bias of $\phi_{n1}(x_1) - \phi_1(x_1)$ is thus given by
\[
-\{\max(h_1, h)\}^{p+1}e_1W_pS_p^{-1}B_1Em_{p+1}(x_1, X_2).
\]
Again, through standard arguments in Masry (1996), we can see that
\[
\frac{1}{nh^{d-1}}\sum_{i=1}^n S_{np}^{-1}(x_1, X_{2i})K_h(X_{1,xj}, X_{2,ij})\mu(X_{1,xj}/h_1, X_{2,ij}/h) = S_{np}^{-1}(x_1, X_{2j})f_2(X_{2j})\int_{[0,1]^{\otimes(d-1)}}\{K\mu\}(X_{1,xj}/h_1, v)\,dv\,\Big\{1 + O\Big(\Big\{\frac{\log n}{nh^{d-1}}\Big\}^{1/2}\Big)\Big\}
\]
uniformly in $1 \le j \le n$. Therefore, the leading term in the asymptotic variance of $\phi_{n1}(x_1) - \phi_1(x_1)$ is the variance of the term
\[
(nh_1)^{-1}e_1\sum_{j=1}^n\varphi(\varepsilon_j)S_{np}^{-1}(x_1, X_{2j})f_2(X_{2j})\int_{[0,1]^{\otimes(d-1)}}\{K\mu\}(X_{1,xj}/h_1, v)\,dv,
\]
which is asymptotically
\[
(nh_1)^{-1}\Big\{\int_{[0,1]^{\otimes(d-1)}}\{fg^2\}^{-1}(x_1, X_2)f_2^2(X_2)\sigma^2(x_1, X_2)\,dX_2\Big\}e_1S_p^{-1}K_2K_2^\top S_p^{-1}e_1^\top. \tag{A.42}
\]
If $\rho(y;\theta) = (2q-1)(y-\theta) + |y-\theta|$ and $\varphi(\theta) = 2qI\{\theta > 0\} + (2q-2)I\{\theta < 0\}$, we have $g(x) = 2f_\varepsilon(0|x)$ and
\[
\sigma^2(x) = E[\varphi^2(\varepsilon)|X = x] = 4q^2(1 - F_\varepsilon(0)) + 4(1-q)^2F_\varepsilon(0) = 4q(1-q),
\]
which, when substituted into (A.42), yields the asymptotic variance of the quantile regression estimator,
\[
\sigma^2(x_1) = q(1-q)\Big\{\int_{[0,1]^{\otimes(d-1)}}f^{-1}(x_1, X_2)f_\varepsilon^{-2}(0|x_1, X_2)f_2^2(X_2)\,dX_2\Big\}e_1S_p^{-1}K_2K_2^\top S_p^{-1}e_1^\top. \quad \Box
\]
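The identity $\sigma^2(x) = 4q(1-q)$ is elementary arithmetic once $F_\varepsilon(0) = q$; the sketch below verifies it both exactly and by simulation. The shifted normal error is an illustrative choice of distribution, not taken from the paper.

```python
import numpy as np

# phi for the check loss rho(y; theta) = (2q-1)(y-theta) + |y-theta|
def phi(eps, q):
    return 2.0 * q * (eps > 0) + (2.0 * q - 2.0) * (eps < 0)

rng = np.random.default_rng(0)
for q in (0.1, 0.25, 0.5, 0.9):
    # exact arithmetic: with F_eps(0) = q,
    # 4 q^2 (1 - q) + 4 (1 - q)^2 q = 4 q (1 - q)
    exact = 4 * q**2 * (1 - q) + 4 * (1 - q) ** 2 * q
    assert abs(exact - 4 * q * (1 - q)) < 1e-12

    # Monte Carlo: a N(0,1) error shifted so that its q-quantile is 0
    # (illustrative error distribution)
    eps = rng.standard_normal(200_000)
    eps -= np.quantile(eps, q)                     # now F_eps(0) is about q
    v = phi(eps, q)
    assert abs(v.mean()) < 0.02                    # E[phi(eps)] = 0
    assert abs((v**2).mean() - 4 * q * (1 - q)) < 0.02
```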

The next lemma is due to Davydov (Hall and Heyde (1980), Corollary A.2).

Lemma 5.3 Suppose $X$ and $Y$ are random variables that are $\mathcal{G}$- and $\mathcal{H}$-measurable, respectively, where $\mathcal{G}$ and $\mathcal{H}$ are two $\sigma$-algebras, and $E|X|^p < \infty$, $E|Y|^q < \infty$, with $p > 1$, $q > 1$ and $p^{-1} + q^{-1} < 1$. Then
\[
|EXY - EXEY| \le 8\|X\|_p\|Y\|_q\Big\{\sup_{A\in\mathcal{G},\,B\in\mathcal{H}}|P(AB) - P(A)P(B)|\Big\}^{1 - p^{-1} - q^{-1}}.
\]
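When $X$ and $Y$ are indicators, the $\sigma$-algebras $\mathcal{G} = \sigma(X)$ and $\mathcal{H} = \sigma(Y)$ are finite, so both sides of Davydov's inequality can be computed exactly. A minimal numerical check, with an arbitrary illustrative joint distribution:

```python
import numpy as np

# Exact check of Davydov's inequality for indicator variables X, Y in {0,1}:
# G = sigma(X) and H = sigma(Y) are finite, so every quantity is computable.
p_joint = np.array([[0.35, 0.15],        # P(X=i, Y=j): arbitrary dependent pmf
                    [0.10, 0.40]])
px, py = p_joint.sum(axis=1), p_joint.sum(axis=0)

events = [(), (0,), (1,), (0, 1)]        # all events in a two-point sigma-algebra
def pr(ev, marg):
    return sum(marg[i] for i in ev)
def pr2(A, B):
    return sum(p_joint[i, j] for i in A for j in B)

# the mixing supremum between sigma(X) and sigma(Y)
alpha = max(abs(pr2(A, B) - pr(A, px) * pr(B, py)) for A in events for B in events)

cov = abs(p_joint[1, 1] - px[1] * py[1])            # |EXY - EX EY| for indicators
p_exp = q_exp = 4.0                                 # 1/p + 1/q = 1/2 < 1
norm_x = px[1] ** (1 / p_exp)                       # ||X||_p = P(X=1)^(1/p)
norm_y = py[1] ** (1 / q_exp)
bound = 8 * norm_x * norm_y * alpha ** (1 - 1 / p_exp - 1 / q_exp)
assert cov <= bound
```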


The next lemma is a generalization of some results in the proof of Theorem 2 in Masry (1996).

Lemma 5.4 Suppose $\{Z_i\}_{i=1}^\infty$ is a zero-mean strictly stationary process with strong mixing coefficients $\gamma[k]$, and that $|Z_i| \le B_1$ and $\sum_{i=1}^n EZ_i^2 + \sum_{i<j}|\mathrm{Cov}(Z_i, Z_j)| \le B_2$. Then for any $\eta > 0$ and any integer sequence $r_n \to \infty$, if $nB_1/\eta \to \infty$ and $q_n \equiv [n/r_n] \to \infty$, we have
\[
P\Big(\Big|\sum_{i=1}^n Z_i\Big| \ge \eta\Big) \le 4\exp\Big\{-\frac{\lambda_n\eta}{4} + \lambda_n^2B_2\Big\} + C\Psi(n),
\]
where $\Psi(n) = q_n\{nB_1/\eta\}^{1/2}\gamma[r_n]$ and $\lambda_n = 1/\{2r_nB_1\}$.

Proof. We partition the set $\{1, \dots, n\}$ into $2q \equiv 2q_n$ consecutive blocks of size $r \equiv r_n$, with $n = 2qr + v$ and $0 \le v < r$. Write
\[
V_n(j) = \sum_{i=(j-1)r+1}^{jr}Z_i, \quad j = 1, \dots, 2q,
\]
and
\[
W_n' = \sum_{j=1}^q V_n(2j-1), \qquad W_n'' = \sum_{j=1}^q V_n(2j), \qquad W_n''' = \sum_{i=2qr+1}^n Z_i.
\]
Then $W_n \equiv \sum_{i=1}^n Z_i = W_n' + W_n'' + W_n'''$. The contribution of $W_n'''$ is negligible, as it consists of at most $r$ terms, compared with the $qr$ terms in $W_n'$ or $W_n''$. By the stationarity of the process, for any $\eta > 0$,
\[
P(W_n > \eta) \le P(W_n' > \eta/2) + P(W_n'' > \eta/2) = 2P(W_n' > \eta/2). \tag{A.43}
\]
To bound $P(W_n' > \eta/2)$, we apply Bradley's lemma recursively to approximate the random variables $V_n(1), V_n(3), \dots, V_n(2q-1)$ by independent random variables $V_n^*(1), V_n^*(3), \dots, V_n^*(2q-1)$ such that, for $1 \le j \le q$, $V_n^*(2j-1)$ has the same distribution as $V_n(2j-1)$ and
\[
P\big(|V_n^*(2j-1) - V_n(2j-1)| > u\big) \le 18\big(\|V_n(2j-1)\|_\infty/u\big)^{1/2}\sup|P(AB) - P(A)P(B)|, \tag{A.44}
\]
where $u$ is any positive value such that $0 < u \le \|V_n(2j-1)\|_\infty < \infty$, and the supremum is taken over all sets $A$ and $B$ in the $\sigma$-algebras of events generated by $\{V_n(1), V_n(3), \dots, V_n(2j-3)\}$ and by $V_n(2j-1)$, respectively. By the definition of $V_n(j)$, we can see that $\sup|P(AB) - P(A)P(B)| = \gamma[r_n]$. Write
\[
P\Big(W_n' > \frac{\eta}{2}\Big) \le P\Big(\Big|\sum_{j=1}^q V_n^*(2j-1)\Big| > \frac{\eta}{4}\Big) + P\Big(\Big|\sum_{j=1}^q V_n(2j-1) - V_n^*(2j-1)\Big| > \frac{\eta}{4}\Big) \equiv I_1 + I_2. \tag{A.45}
\]
We bound $I_1$ as follows. Let $\lambda = 1/\{2B_1r\}$. Since $|Z_i| \le B_1$, we have $\lambda|V_n(j)| \le 1/2$; using the fact that $e^x \le 1 + x + x^2$ holds for $|x| \le 1/2$, we obtain
\[
E\big\{e^{\pm\lambda V_n^*(2j-1)}\big\} \le 1 + \lambda^2E\{V_n^*(2j-1)\}^2 \le e^{\lambda^2E\{V_n^*(2j-1)\}^2}. \tag{A.46}
\]
By the Markov inequality, (A.46) and the independence of $\{V_n^*(2j-1)\}_{j=1}^q$, we have
\begin{align*}
I_1 &\le e^{-\lambda\eta/4}\Big[E\exp\Big(\lambda\sum_{j=1}^q V_n^*(2j-1)\Big) + E\exp\Big(-\lambda\sum_{j=1}^q V_n^*(2j-1)\Big)\Big] \\
&\le 2\exp\Big(-\lambda\eta/4 + \lambda^2\sum_{j=1}^q E\{V_n^*(2j-1)\}^2\Big) \le 2\exp\big\{-\lambda\eta/4 + C_2\lambda^2B_2\big\}. \tag{A.47}
\end{align*}
We now bound the term $I_2$ in (A.45). Notice that
\[
I_2 \le \sum_{j=1}^q P\Big(\big|V_n(2j-1) - V_n^*(2j-1)\big| > \frac{\eta}{4q}\Big).
\]
If $\|V_n(2j-1)\|_\infty \ge \eta/(4q)$, substituting $\eta/(4q)$ for $u$ in (A.44) gives
\[
I_2 \le 18q\{4q\|V_n(2j-1)\|_\infty/\eta\}^{1/2}\gamma[r_n] \le Cq^{3/2}\eta^{-1/2}(r_nB_1)^{1/2}\gamma[r_n]. \tag{A.48}
\]
If $\|V_n(2j-1)\|_\infty < \eta/(4q)$, taking $u \equiv \|V_n(2j-1)\|_\infty$ in (A.44) yields $I_2 \le Cq\gamma[r_n]$, which is of smaller order than (A.48) if $nB_1/\eta \to \infty$. Thus, by (A.43), (A.45), (A.47) and (A.48),
\[
P(W_n > \eta) \le 4\exp\{-\lambda_n\eta/4 + C_2B_2\lambda_n^2\} + C\Psi_n,
\]
where the constant $C$ is independent of $n$. $\Box$
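The odd/even blocking bookkeeping of the proof, $W_n = W_n' + W_n'' + W_n'''$, can be sketched directly; the AR(1) sequence and the block sizes below are illustrative assumptions only.

```python
import numpy as np

# Odd/even blocking from the proof: {1,...,n} -> 2q blocks of length r plus
# a remainder of v < r terms; W_n = W'_n + W''_n + W'''_n by construction.
rng = np.random.default_rng(1)
n, r = 1013, 50
q = n // (2 * r)                         # number of odd (and of even) blocks
v = n - 2 * q * r                        # leftover terms, here v = 13 < r

eps = rng.standard_normal(n)
Z = np.empty(n)                          # zero-mean AR(1): strongly mixing
Z[0] = eps[0]
for i in range(1, n):
    Z[i] = 0.5 * Z[i - 1] + eps[i]

V = Z[: 2 * q * r].reshape(2 * q, r).sum(axis=1)   # block sums V_n(1..2q)
W_odd = V[0::2].sum()                    # W'_n  : sums over odd-numbered blocks
W_even = V[1::2].sum()                   # W''_n : sums over even-numbered blocks
W_rem = Z[2 * q * r:].sum()              # W'''_n: at most r remaining terms
assert v < r
assert np.isclose(Z.sum(), W_odd + W_even + W_rem)
```

Bradley's lemma is then applied to the $q$ odd-block sums: they are separated by even blocks of length $r$, so their dependence is controlled by $\gamma[r_n]$.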


Lemma 5.5 For any $x \in \mathbb{R}^d$, let $\psi_x(X_i, Y_i) = I(|X_{ix}| \le h)\psi_x(X_{ix}, Y_i)$ be a measurable function of $(X_i, Y_i)$ with $|\psi_x(X_i, Y_i)| \le B$, and let $V = E\psi_x^2(X_i, Y_i)$. Suppose the mixing coefficient $\gamma[k]$ satisfies (A.3). Then
\[
\mathrm{Var}\Big(\sum_{i=1}^n\psi_x(X_i, Y_i)\Big) = nV\Big[1 + o\big\{\big(B^2h^{p+d+1}/V\big)^{1-2/\nu_2}\big\}\Big].
\]
Proof. Denote $\psi_x(X_i, Y_i)$ by $\psi_{ix}$. First note that
\[
V = E\psi_{ix}^2 = h^d\int_{|u|\le1}E(\psi_{ix}^2|X_i = x + hu)f(x + hu)\,du,
\]
\[
\sum_{i<j}|\mathrm{Cov}(\psi_{ix}, \psi_{jx})| = \sum_{l=1}^{n-d}(n - l - d + 1)|\mathrm{Cov}(\psi_{0x}, \psi_{lx})| \le n\sum_{l=1}^{n-d}|\mathrm{Cov}(\psi_{0x}, \psi_{lx})| = n\sum_{l=1}^{d-1} + n\sum_{l=d}^{\pi_n} + n\sum_{l=\pi_n+1}^{n-d} \equiv nJ_{21} + nJ_{22} + nJ_{23},
\]
where $\pi_n = h^{(p+d+1)(2/\nu_2-1)/a}$. For $J_{21}$, there might be an overlap between the components of $X_0$ and $X_l$, for example when $X_i = (X_{i-d}, \dots, X_{i-1})$, where $\{X_i\}$ is a univariate time series. Without loss of generality, let $u'$, $u''$ and $u'''$, of dimensions $l$, $d - l$ and $l$ respectively, be the $d + l$ distinct random variables in $(X_{0x}/h, X_{lx}/h)$. Write $u_1 = (u'^\top, u''^\top)^\top$ and $u_2 = (u''^\top, u'''^\top)^\top$. Then by the Cauchy inequality, we have
\[
\Big|E\big(\psi_{0x}\psi_{lx}\,\big|\,X_0 = x + hu_1, X_l = x + hu_2\big)\Big| \le \big\{E(\psi_{0x}^2|X_0 = x + hu_1)E(\psi_{lx}^2|X_l = x + hu_2)\big\}^{1/2} = V/h^d, \tag{A.49}
\]
and through a change of variables we have
\[
|\mathrm{Cov}(\psi_{0x}, \psi_{lx})| \le h^lV\int_{|u_1|\le1,\,|u_2|\le1}|f(x + hu_1, x + hu_2; l) - f(x + hu_1)f(x + hu_2; l + d - 1)|\,du'du''du''',
\]
where by (A4) and (A5) the integral is bounded. Therefore,
\[
nJ_{21} \le CnV\sum_{l=1}^{d-1}h^l = o(nV).
\]
For $J_{22}$, there is no overlap between the components of $X_0$ and $X_l$. Let $X_{0x} = hu$ and $X_{lx} = hv$; then
\[
|\mathrm{Cov}(\psi_{0x}, \psi_{lx})| \le h^{2d}\int_{|u|\le1,\,|v|\le1}E\big(\psi_{0x}\psi_{lx}\,\big|\,X_0 = x + hu, X_l = x + hv\big)\big[f(x + hu, x + hv; l + d - 1) - f(x + hu)f(x + hv)\big]\,du\,dv = Ch^dV,
\]
where the last equality follows from (A4), (A5) and (A.49). Therefore, as $\pi_nh^d \to 0$,
\[
nJ_{22} = O\{n\pi_nh^dV\} = o(nV).
\]
For $J_{23}$, using Davydov's lemma (Lemma 5.3), we have
\[
|\mathrm{Cov}(\psi_{0x}, \psi_{lx})| \le 8\{\gamma[l - d + 1]\}^{1-2/\nu_2}\{E|\psi_{ix}|^{\nu_2}\}^{2/\nu_2}, \quad \text{as } \nu_2 > 2. \tag{A.50}
\]
As $|\psi_{ix}| \le B$, $E|\psi_{ix}|^{\nu_2} \le B^{\nu_2-2}V$, so
\[
J_{23} \le CB^{2(\nu_2-2)/\nu_2}V^{2/\nu_2}\pi_n^{-a}\sum_{l=\pi_n+1}^\infty l^a\{\gamma[l - d + 1]\}^{1-2/\nu_2},
\]
where the summation term is $o(1)$ as $\pi_n \to \infty$. Thus $J_{23} = o\big\{V\big(B^2h^{p+d+1}/V\big)^{1-2/\nu_2}\big\}$, which completes the proof. $\Box$

Lemma 5.6 Suppose (A2)-(A6) hold. Then for $U_{ni}^l$, $l = 1, \dots, m$, defined in (A.32) and $Z_{ni}$, $l = 1, \dots, \mathcal{L}_n$, defined in (A.13), we have
\[
\sum_{i=1}^n E(U_{ni}^l)^2 + \sum_{i<j}|\mathrm{Cov}(U_{ni}^l, U_{nj}^l)| \le Cnh^dM_n^{(1)}\{M_n^{(2)}/M_n^{(1)}\}^{1-2/\nu_2}, \tag{A.51}
\]
\[
\sum_{i=1}^n EZ_{ni}^2 + \sum_{i<j}|\mathrm{Cov}(Z_{ni}, Z_{nj})| = nh^d(M_n^{(1)})^2M_n^{(2)}\{M_l\log n\}^{-2/\nu_2}, \tag{A.52}
\]
uniformly in $x_k$, $1 \le k \le T_n$.

Proof. We only prove (A.52), which is more involved than (A.51). To simplify the notation, denote $\alpha_{jl}$, $\beta_{kl}$, $\alpha_{j'l}$ and $\beta_{j'l}$ by $\alpha_1$, $\beta_1$, $\alpha_2$ and $\beta_2$, respectively. Clearly,
\[
\int_{u^\top H_n\beta_2}^{u^\top H_n(\alpha_2+\beta_2)}\{\varphi_{ni}(x_k; t) - \varphi_{ni}(x_k; 0)\}\,dt = \int_{u^\top H_n\beta_1}^{u^\top H_n(\alpha_2+\beta_1)}\{\varphi_{ni}(x_k; t + u^\top H_n(\beta_2 - \beta_1)) - \varphi_{ni}(x_k; 0)\}\,dt,
\]
and
\begin{align*}
Z_{ni} &= \int_{u^\top H_n\beta_1}^{u^\top H_n(\alpha_1+\beta_1)}\{\varphi_{ni}(x_k; t) - \varphi_{ni}(x_k; 0)\}\,dt - \int_{u^\top H_n\beta_2}^{u^\top H_n(\alpha_2+\beta_2)}\{\varphi_{ni}(x_k; t) - \varphi_{ni}(x_k; 0)\}\,dt \\
&= \int_{u^\top H_n\beta_1}^{u^\top H_n(\alpha_1+\beta_1)}\{\varphi_{ni}(x_k; t) - \varphi_{ni}(x_k; t + u^\top H_n(\beta_2 - \beta_1))\}\,dt \\
&\quad - \int_{u^\top H_n(\alpha_1+\beta_1)}^{u^\top H_n(\alpha_2+\beta_1)}\{\varphi_{ni}(x_k; t + u^\top H_n(\beta_2 - \beta_1)) - \varphi_{ni}(x_k; 0)\}\,dt \equiv \Delta_1 + \Delta_2.
\end{align*}
Therefore, $E\{Z_{ni}\}^2 = h^d\int K^2(u)f(x_k + hu)E\{(\Delta_1 + \Delta_2)^2|X_i = x_k + hu\}\,du$. The conclusion is then obvious, observing that by the Cauchy inequality and (A.5),
\begin{align*}
E(\Delta_1^2|X_i = x_k + hu) &\le |u^\top H_n\alpha_1 \cdot u^\top H_n(\beta_2 - \beta_1) \cdot u^\top H_n\alpha_1| \le 2(M_n^{(1)})^2M_n^{(2)}/(M_l\log n), \\
E(\Delta_2^2|X_i = x_k + hu) &\le \{u^\top H_n(\alpha_2 - \alpha_1)\}^2\big(|u^\top H_n\alpha_2| + |u^\top H_n\alpha_1| + 2|u^\top H_n\beta_2|\big) \le 4(M_n^{(1)})^2M_n^{(2)}/(M_l\log n)^2,
\end{align*}
where we used the facts that $|\alpha_1 - \alpha_2| \le 2M_n^{(1)}/(M_l\log n)$ and $|\beta_1 - \beta_2| \le 2M_n^{(2)}/(M_l\log n)$. Therefore, $E\{Z_{ni}\}^2 = Ch^d(M_n^{(1)})^2M_n^{(2)}/(M_l\log n)$. As $|Z_{ni}| \le CM_n^{(1)}$ and $h^{p+1}/M_n^{(2)} < \infty$, the rest of the proof can be completed following the proof of Lemma 5.5. $\Box$

Lemma 5.7 Suppose (A2)-(A6) hold. Then
\[
\sum_{i=1}^n E\Phi_{ni}^2 + \sum_{i<j}|\mathrm{Cov}(\Phi_{ni}, \Phi_{nj})| \le Cnh^d(M_n^{(1)})^2M_n^{(2)}, \tag{A.53}
\]
uniformly in $x \in D$, $\alpha \in \mathcal{B}_n^{(1)}$ and $\beta \in \mathcal{B}_n^{(2)}$.

Proof. By the Cauchy inequality and (A.5), we have
\begin{align*}
E\Phi_{ni}^2 &= h^d\int K^2(u)E\Big[\Big\{\int_{\mu(u)^\top H_n\beta}^{\mu(u)^\top H_n(\alpha+\beta)}\big(\varphi_{ni}(x; t) - \varphi_{ni}(x; 0)\big)\,dt\Big\}^2\Big|X_i = x + hu\Big]f(x + hu)\,du \\
&\le h^d\int f(x + hu)K^2(u)\,\mu(u)^\top H_n\alpha\int_{\mu(u)^\top H_n\beta}^{\mu(u)^\top H_n(\alpha+\beta)}E\big[\big(\varphi_{ni}(x; t) - \varphi_{ni}(x; 0)\big)^2\big|X_i = x + hu\big]\,dt\,du \\
&\le h^d\int K^2(u)\,\mu(u)^\top H_n\alpha\int_{\mu(u)^\top H_n\beta}^{\mu(u)^\top H_n(\alpha+\beta)}C|t|\,dt\,f(x + hu)\,du = O\big\{h^d(M_n^{(1)})^2M_n^{(2)}\big\}, \tag{A.54}
\end{align*}
uniformly in $x \in D$, $\alpha \in \mathcal{B}_n^{(1)}$ and $\beta \in \mathcal{B}_n^{(2)}$. (A.53) thus follows from (A.54) and Lemma 5.5. $\Box$


Lemma 5.8 Let (A3)-(A6) hold. Then
\[
\sup_{x\in D}|S_{np}(x) - g(x)f(x)S_p| = O\big(h + (nh^d/\log n)^{-1/2}\big) \quad \text{almost surely}.
\]
Proof. The result is essentially Theorem 2 in Masry (1996). In particular, if (A.4) holds, then condition (3.8a) there on the mixing coefficient $\gamma[k]$ is satisfied. $\Box$

Lemma 5.9 Denote $d_{n1} = (nh^d)^{1-\lambda_1-2\lambda_2}(\log n)^{\lambda_1+2\lambda_2}$ and let $\lambda_1$ and $\mathcal{B}_n^{(i)}$, $i = 1, 2$, be as in Lemma 5.1. Suppose that (A1)-(A5) and (A.2) hold. Then there is a constant $C > 0$ such that for each $M > 0$ and all large $n$,
\[
\sup_{x\in D}\sup_{\alpha\in\mathcal{B}_n^{(1)},\,\beta\in\mathcal{B}_n^{(2)}}\Big|\sum_{i=1}^n E\Phi_{ni}(x; \alpha, \beta) - \frac{nh^d}{2}(H_n\alpha)^\top S_{np}(x)H_n(\alpha + 2\beta)\Big| \le CM^{3/2}d_{n1}.
\]
Proof. Recall that $G(t, u) = E(\varphi(Y; t)|X = u)$ and
\[
E\Phi_{ni}(x; \alpha, \beta) = h^d\int K(u)f(x + hu)\,du \times \int_{\mu(u)^\top H_n\beta}^{\mu(u)^\top H_n(\alpha+\beta)}\big\{G(t + \mu(u)^\top H_n\beta_p(x), x + hu) - G(\mu(u)^\top H_n\beta_p(x), x + hu)\big\}\,dt. \tag{A.55}
\]
By (A3) and (A5), we have
\begin{align*}
G(t + \mu(u)^\top H_n\beta_p(x), x + hu) - G(\mu(u)^\top H_n\beta_p(x), x + hu) &= tG_1(\mu(u)^\top H_n\beta_p(x), x + hu) + \frac{t^2}{2}G_2(\xi_n(t, u; x), x + hu), \\
G_1(\mu(u)^\top H_n\beta_p(x), x + hu) &= g(x + hu) + O(h^{p+1}),
\end{align*}
where $\xi_n(t, u; x)$ falls between $\mu(u)^\top H_n\beta_p(x)$ and $t + \mu(u)^\top H_n\beta_p(x)$, and the term $O(h^{p+1})$ is uniform in $x \in D$. Therefore, the inner integral in (A.55) is given by
\[
\frac12g(x + hu)(H_n\alpha)^\top\mu(u)\mu(u)^\top H_n(\alpha + 2\beta) + O\Big\{M^{3/2}\Big(\frac{\log n}{nh^d}\Big)^{\lambda_1+2\lambda_2}\Big\}
\]
uniformly in $x \in D$, where we have used the fact that $nh^{d+(p+1)/\lambda_2}/\log n < \infty$. By the definition of $S_{np}(x)$, the proof is thus completed. $\Box$
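The leading term of the inner integral follows from the linear part of the expansion alone: writing $a = \mu(u)^\top H_n\alpha$ and $b = \mu(u)^\top H_n\beta$, one has

```latex
\int_{b}^{a+b} t\,\mathrm{d}t
  \;=\; \frac{(a+b)^2 - b^2}{2}
  \;=\; \frac{a^2}{2} + ab
  \;=\; \frac{1}{2}\,a\,(a + 2b),
```

and since $a(a + 2b) = (H_n\alpha)^\top\mu(u)\mu(u)^\top H_n(\alpha + 2\beta)$, multiplying by $g(x + hu)$ recovers the displayed leading term.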


Lemma 5.10 Under the conditions of Theorem 3.2, we have
\[
\sup_{x\in D}\Big|\frac{1}{nh^d}W_pS_{np}^{-1}(x)H_n^{-1}\sum_{i=1}^n K_h(X_i - x)\varphi(\varepsilon_i)\mu(X_i - x)\Big| = O\Big\{\Big(\frac{\log n}{nh^d}\Big)^{1/2}\Big\} \quad \text{almost surely}.
\]
Proof. Note that under the conditions of Theorem 3.2, the assumptions imposed in Theorem 5 of Masry (1996) hold. Specifically, (4.5) there follows from (A.2), and (4.7b) there from (A.4). Therefore, mimicking the proof there, we can show that
\[
\sup_{x\in D}\Big|\frac{1}{nh^d}H_n^{-1}\sum_{i=1}^n K_h(X_i - x)\varphi(\varepsilon_i)\mu(X_i - x)\Big| = O\Big\{\Big(\frac{\log n}{nh^d}\Big)^{1/2}\Big\},
\]
which together with Lemma 5.8 yields the desired result. $\Box$
