Biometrika (2011), xx, x, pp. 1–17. © 2007 Biometrika Trust. Printed in Great Britain.

Asymptotic properties and variance estimators of the M-quantile regression coefficients estimators

BY ANNAMARIA BIANCHI

DMSIA, Università di Bergamo, [email protected]

NICOLA SALVATI

DSMAE, Università di Pisa, [email protected]

SUMMARY

M-quantile regression is defined as a 'quantile-like' generalization of robust regression based on influence functions. The most appropriate assumption in the presence of gross errors in the data is that of independent not identically distributed (i.n.i.d.) regressors and errors. To the best of our knowledge, there is no existing theory for the asymptotic properties of the M-quantile regression coefficients estimators and the estimation of their variance in this context. The paper proves the consistency and asymptotic normality of the M-quantile regression coefficients estimators in both the i.i.d. and the i.n.i.d. settings. Estimators of the variance of the M-quantile regression coefficients appropriate to each setting are then proposed. Empirical results show that these estimators appear to perform well under different simulated scenarios.

Some key words: Influence function; M-estimation; Taylor expansion; Simulation experiments.

1. INTRODUCTION

Regression analysis is a standard tool for summarizing the average behaviour of a response variable y given a set of covariates x. It has been one of the most important statistical methods for applied research for many decades. However, the mean is not in general an adequate summary of a distribution, and so may be an inappropriate target of inference if one is, for example, interested in the extreme behaviour of y conditional on x. For this reason, Koenker & Bassett (1978) proposed the quantile regression (QR) model, which allows one to characterise the distribution of a response variable given a set of explanatory variables through models for the quantiles of this conditional distribution.

In the linear case, quantile regression leads to a family (or 'ensemble') of planes indexed by the value of the corresponding percentile coefficient. For each value of q in (0, 1), the corresponding model shows how the q-th quantile of the conditional distribution of y given x varies with x. In their seminal work, Koenker & Bassett (1978) formalized asymptotic properties of the least absolute deviation estimator of the QR model for independent observations.

Quantile regression can be viewed as a generalization of median regression. Similarly, expectile regression (Newey & Powell, 1987) is a 'quantile-like' generalization of mean, i.e. standard, regression. M-quantile (MQ) regression (Breckling & Chambers, 1988) extends this idea to a 'quantile-like' generalization of regression based on influence functions (M-regression). M-regression was introduced by Huber (1973) as a method of ensuring regression estimates that are robust against the presence of gross errors in the data. He also provided conditions that ensure the existence and asymptotic normality of the M-regression coefficients estimators for the case of


fixed regressors and independent identically distributed (i.i.d.) errors. Yohai & Maronna (1979) provide similar results for stochastic i.i.d. regressors.

The aim of this paper is to study asymptotic properties of the M-quantile regression coefficients estimators. First we show how the results in Huber (1973) apply to this case (as already pointed out by Breckling & Chambers, 1988). Next we extend these results to the case of independent not identically distributed (i.n.i.d.) regressors and disturbances. This is an important case which, to the best of our knowledge, has not been discussed in the literature. This assumption is the most natural in the presence of gross errors in the data. Moreover, it is the most appropriate one for relationships estimated using stratified cross-sectional data, e.g. when data are obtained from social and business surveys, like the EU-SILC surveys and the various national Labour Force Surveys. It is therefore very useful to have conditions which ensure that the familiar asymptotic properties of robust regression coefficients estimates hold under the i.n.i.d. case. We also propose covariance matrix estimators for both the i.i.d. and i.n.i.d. situations and prove their consistency.

The structure of the paper is as follows. In Section 2 we present the asymptotic properties of the M-quantile regression coefficients estimators and propose a covariance matrix estimator in the i.i.d. setting. The extension of these results to the i.n.i.d. case is provided in Section 3. These large sample approximations are then validated by a simulation study in Section 4. Finally, in Section 5 we summarize our main findings and provide directions for future research.

2. ASYMPTOTIC PROPERTIES AND COVARIANCE MATRIX ESTIMATION: THE I.I.D. CASE

Assume that the observations $\{(y_i, x_i^T),\ i = 1,\dots,n\}$ are generated by the linear model

$$y_i = x_i^T\beta_0 + u_i, \qquad (1)$$

where $\{u_i\}$ is a sequence of independent identically distributed (i.i.d.) random variables with variance $\sigma^2 > 0$, $\beta_0$ is a vector of unknown parameters and the $x_i$ are $p$-dimensional fixed regressors. We denote by $X_n$ the $n\times p$ matrix with rows $x_i^T$.

The M-quantile regression coefficient estimator (Breckling & Chambers, 1988) is defined as the vector $\hat\beta_q$ which minimizes

$$\sum_{i=1}^n \rho_q\Big(\frac{y_i - x_i^T\beta}{\sigma}\Big) \qquad (2)$$

over $\beta$, where $\rho$ is the convex loss function associated with the M-quantile and $\rho_q$ is defined as $\rho_q(u) := |q - I(u<0)|\,\rho(u)$, for any $q\in(0,1)$. The most common choices for $\rho$ are $\rho(u) = |u|$, which corresponds to quantile regression (Koenker & Bassett, 1978), $\rho(u) = u^2$, which leads to expectile regression (Newey & Powell, 1987), and the Huber loss function

$$\rho(u) = \Big(c|u| - \frac{1}{2}c^2\Big)I(|u|>c) + \frac{1}{2}u^2\,I(|u|\le c),$$

which represents a compromise between the quantile and expectile loss functions. Since $\rho$ is convex, the vector $\hat\beta_q$ can equivalently be defined as the solution of the equations

$$\sum_{i=1}^n \psi_q\Big(\frac{y_i - x_i^T\beta}{\sigma}\Big)x_i = 0, \qquad (3)$$

where $\psi_q(u) = d\rho_q(u)/du = |q - I(u<0)|\,\psi(u)$, with $\psi(u) = d\rho(u)/du$. An iterative solution is needed here to obtain estimates of $\beta_q$. The application of an iteratively reweighted least squares algorithm or the use of the Newton-Raphson algorithm then leads to a solution of (3). Theoretically the vector $\beta_q$ is defined as the vector that minimizes the expectation of (2) or,
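The iteratively reweighted least squares idea just mentioned can be sketched in a few lines. The following is a minimal illustrative NumPy implementation, not the authors' code: the function names, the OLS starting value and the fixed `scale` argument are assumptions, and the Huber ψ is used for concreteness.

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber influence function: psi(u) = u for |u| <= c, c*sign(u) otherwise."""
    return np.clip(u, -c, c)

def mq_coefficients(X, y, q=0.5, c=1.345, scale=1.0, tol=1e-8, max_iter=200):
    """Solve the estimating equations (3) for the q-th M-quantile fit by
    iteratively reweighted least squares (illustrative sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS starting value
    for _ in range(max_iter):
        u = (y - X @ beta) / scale                       # standardized residuals
        qw = np.where(u < 0, 1.0 - q, q)                 # |q - I(u < 0)|
        psi = qw * huber_psi(u, c)                       # psi_q(u_i)
        # IRLS weights w_i = psi_q(u_i)/u_i, with the limiting value at u_i = 0
        safe_u = np.where(np.abs(u) < 1e-12, 1.0, u)
        w = np.where(np.abs(u) < 1e-12, qw, psi / safe_u)
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new
```

With q = 0.5 this reduces to standard Huber M-regression; sweeping q over (0, 1) traces out the family of M-quantile fits.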


equivalently, as the solution of the expected value of equation (3). In order to better understand the definition of $\beta_q$, let $Q(x_i, q) = x_i^T\beta_q$ denote the M-quantile of $y_i$. From location and scale equivariance of $Q(x_i, q)$, it follows that $Q(x_i, q) = x_i^T\beta_0 + Q(q)$, where $Q(q)$ is the $q$-th M-quantile of $u_i$. This implies that

$$\beta_q = \beta_0 + Q(q)e_1, \qquad e_1 = (1, 0, \dots, 0)^T,$$

so that changing $q$ only changes the intercept term in $\beta_q$. On the other hand, when the scale of $y_i$ also depends on $x_i$, say $u_i = (x_i^T\kappa)\varepsilon_i$, with $\varepsilon_i$ independent of $x_i$ and $\kappa \in \mathbb{R}^p$, then

$$Q(x_i, q) = x_i^T\beta_0 + Q(q)\,x_i^T\kappa = x_i^T[\beta_0 + Q(q)\kappa],$$

where $Q(q)$ denotes the $q$-th M-quantile of $\varepsilon_i$. Hence in this case the slope coefficients in $\beta_q$ also vary with $q$ (since $\beta_q = \beta_0 + Q(q)\kappa$). This case will be considered in more detail in Section 3.

The asymptotic theory for quantile regression and expectile regression models has already been studied by Koenker & Bassett (1978) and Newey & Powell (1987), respectively. The case of M-quantile regression with i.i.d. errors and fixed regressors can be easily derived from the results in Huber (1973), as pointed out in Breckling & Chambers (1988). For the sake of completeness we now state these results. For $u_i(q) = (y_i - x_i^T\beta_q)/\sigma$, $\psi'_{qi} = \psi'_q(u_i(q))$ and $\psi_{qi} = \psi_q(u_i(q))$, let

$$B_n := \frac{\sigma^2 E[\psi_{q1}^2]}{(E[\psi'_{q1}])^2}\,\Big[\frac{1}{n}\sum_{i=1}^n x_i x_i^T\Big]^{-1}.$$

Consider the following assumptions:

ASSUMPTION 1. The diagonal elements $\gamma_{ii}$ of the projection matrix $X_n(X_n^TX_n)^{-1}X_n^T$ satisfy $\max_{1\le i\le n}\gamma_{ii} \to 0$ as $n\to+\infty$.

ASSUMPTION 2. The function $\psi$ is bounded and non-decreasing and possesses bounded derivatives up to the second order.

ASSUMPTION 3. $E[\psi_{qi}] = 0$ for all $i$.

THEOREM 1. Under Assumptions 1-3, for each $q\in(0,1)$, $\hat\beta_q$ exists in probability. Moreover

$$\sqrt{n}\,B_n^{-1/2}(\hat\beta_q - \beta_q) \xrightarrow{d} N(0, I_p),$$

where $I_p$ is the identity matrix of size $p$.

In order to use the previous theorem to build confidence intervals and hypothesis tests, a consistent estimator of the asymptotic covariance matrix of $\hat\beta_q$ is needed.

Assume that $s$ is a consistent estimator for $\sigma$ and define $\hat u_i(q) := (y_i - x_i^T\hat\beta_q)/s$, $\hat\psi'_{qi} := \psi'_q(\hat u_i(q))$ and $\hat\psi_{qi} := \psi_q(\hat u_i(q))$. The analytical form of $B_n$ suggests an estimator already proposed by Street et al. (1988) and now extended to the covariance matrix of $\hat\beta_q$:

$$\widehat{Var}_{SCR}(\hat\beta_q) = \frac{1}{n}\hat B_n = s^2\,\frac{(n-p)^{-1}\sum_{i=1}^n\hat\psi_{qi}^2}{\big[n^{-1}\sum_{i=1}^n\hat\psi'_{qi}\big]^2}\,\Big[\sum_{i=1}^n x_i x_i^T\Big]^{-1}. \qquad (4)$$


An approximate confidence interval of level $(1-\alpha)$ for the $h$th coefficient at $q$ is then

$$\hat\beta_{qh} \pm z_{(1-\alpha/2)}\sqrt{\widehat{Var}_{SCR}(\hat\beta_{qh})},$$

where $z_{(1-\alpha/2)}$ denotes the $(1-\alpha/2)$ quantile of the standard normal distribution. In the next theorem we prove consistency of $\hat B_n$.
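A plug-in computation of estimator (4) and the corresponding normal interval might look as follows. This is an illustrative sketch under the Huber loss, for which $\psi'_q(u) = |q - I(u<0)|\,I(|u|\le c)$; the names `var_scr` and `normal_ci` are hypothetical, and a consistent scale estimate `s` must be supplied.

```python
import numpy as np
from statistics import NormalDist

def var_scr(X, y, beta_hat, q, s, c=1.345):
    """Plug-in covariance matrix estimator (4): i.i.d. errors, fixed design."""
    n, p = X.shape
    u = (y - X @ beta_hat) / s
    qw = np.where(u < 0, 1.0 - q, q)                 # |q - I(u < 0)|
    psi = qw * np.clip(u, -c, c)                     # psi_q(u_i) (Huber)
    psi_prime = qw * (np.abs(u) <= c)                # psi'_q(u_i)
    num = np.sum(psi ** 2) / (n - p)
    den = np.mean(psi_prime) ** 2
    return s ** 2 * (num / den) * np.linalg.inv(X.T @ X)

def normal_ci(beta_hat, cov, h, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the h-th coefficient."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = float(np.sqrt(cov[h, h]))
    return beta_hat[h] - z * se, beta_hat[h] + z * se
```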

THEOREM 2. Under Assumptions 1-3,

$$\hat B_n - B_n \xrightarrow{P} 0.$$

Proof. Since $\psi'_{qi}$ is bounded and continuous in $\beta$, the uniform law of large numbers yields

$$\sup_\beta\Big|\frac{1}{n}\sum_{i=1}^n\Big\{\psi'_q\Big(\frac{y_i - x_i^T\beta}{\sigma}\Big) - E\Big[\psi'_q\Big(\frac{y_i - x_i^T\beta}{\sigma}\Big)\Big]\Big\}\Big| \xrightarrow{P} 0.$$

Since $\hat\beta_q \xrightarrow{P} \beta_q$ and $s \xrightarrow{P} \sigma$, it follows from a slight modification of Lemma 2.6 in White (1980) that

$$\frac{1}{n}\sum_{i=1}^n \hat\psi'_{qi} \xrightarrow{P} E[\psi'_{q1}].$$

Analogously we can prove that

$$\frac{1}{n-p}\sum_{i=1}^n \hat\psi_{qi}^2 \xrightarrow{P} E[\psi_{q1}^2].$$

So the result is a consequence of the continuous mapping theorem. $\Box$

The next result shows that using estimator (4) to test linear hypotheses and to construct confidence intervals is asymptotically correct.

COROLLARY. Under Assumptions 1-3,

$$\sqrt{n}\,\hat B_n^{-1/2}(\hat\beta_q - \beta_q) \xrightarrow{d} N(0, I_p)$$

and

$$n(\hat\beta_q - \beta_q)^T\,\hat B_n^{-1}(\hat\beta_q - \beta_q) \xrightarrow{d} \chi^2_p.$$

Proof. Consequence of Theorems 1 and 2 and Lemma 3.3 in White (1980). $\Box$

3. ASYMPTOTIC PROPERTIES AND COVARIANCE MATRIX ESTIMATION: THE I.N.I.D. CASE

In the previous Section we stated asymptotic properties of the M-quantile regression coefficients estimators in the case of i.i.d. errors and fixed regressors. However, these assumptions on the data are quite strong and do not cover many cases which arise in practice. For this reason we extend the results presented previously to the case of i.n.i.d. data, which is the most natural assumption in the presence of gross errors in the data. First we prove asymptotic normality and then we introduce a new covariance matrix estimator, which is a generalization of the previous one. In Section 4 we will compare the two estimators and see that this new estimator performs better in the presence of heteroskedasticity.


Assume that the observations $\{(y_i, x_i^T),\ i = 1,\dots,n\}$ are generated by the linear model (1), with $\{(x_i^T, u_i)\}$ a sequence of independent not (necessarily) identically distributed (i.n.i.d.) random vectors, and $x_i$ a $p$-dimensional vector. The value $u_i$ is a scalar error with variance $\sigma_i^2$ and $\beta_0$ is a vector of unknown parameters. Notice that by assuming that $\{(x_i^T, u_i)\}$ is an i.n.i.d. sequence, the case in which observations are obtained not from a controlled experiment but from a stratified cross-sectional sample is covered. Also covered is the case of fixed regressors with (possibly) heteroskedastic errors.

Since the errors are no longer i.i.d., the definition of the M-quantile regression coefficient estimator needs to be modified from the definition given in the previous Section. Here $\hat\beta_q$ is the vector that minimizes

$$\sum_{i=1}^n \rho_q\Big(\frac{y_i - x_i^T\beta}{\sigma_i}\Big) \qquad (5)$$

over $\beta$, or equivalently it is the solution of the equations

$$\sum_{i=1}^n \psi_q\Big(\frac{y_i - x_i^T\beta}{\sigma_i}\Big)\frac{x_i}{\sigma_i} = 0. \qquad (6)$$

For $u_i(q) = (y_i - x_i^T\beta_q)/\sigma_i$, $\psi'_{qi} = \psi'_q(u_i(q))$ and $\psi_{qi} = \psi_q(u_i(q))$, let $W_n^{-1}V_nW_n^{-1}$ be the asymptotic covariance of $\hat\beta_q$ obtained by Taylor expansion (Binder, 1983) of (6) at $\hat\beta_q = \beta_q$, with

$$W_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{\sigma_i^2}E[\psi'_{qi}\,x_ix_i^T], \qquad V_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{\sigma_i^2}E[\psi_{qi}^2\,x_ix_i^T].$$

Notice that if the errors $u_i$ are i.i.d. with variance $\sigma^2$ then $y_i - x_i^T\beta_q = y_i - x_i^T\beta_0 - Q(q) = u_i - Q(q)$, where $Q(q)$ is the $q$-th M-quantile of $u_i$. It follows that the $y_i - x_i^T\beta_q$ are i.i.d. random variables. If moreover the regressors are fixed, the matrices $W_n$ and $V_n$ simplify to

$$W_n = \sigma^{-2}E[\psi'_{q1}]\,\frac{1}{n}\sum_{i=1}^n x_ix_i^T, \qquad V_n = \sigma^{-2}E[\psi_{q1}^2]\,\frac{1}{n}\sum_{i=1}^n x_ix_i^T,$$

and the asymptotic covariance matrix $B_n$ follows directly.

The asymptotic theory will be developed under the following assumptions.

ASSUMPTION 4. For each $i$ there exists a finite constant $M_{in}$ such that

$$\frac{|x_i|}{\sqrt{n}} \le M_{in} \ \text{a.s.} \qquad \text{and} \qquad \varepsilon_n := \max_{1\le i\le n} M_{in}^2 \longrightarrow 0 \ \text{as } n\to+\infty.$$

ASSUMPTION 5. The matrices $V_n$ and $W_n$ are uniformly positive definite.

ASSUMPTION 6. There exists a finite positive constant $\Delta$ such that for all $i$, $E|x_{ij}|^2 < \Delta$, $j = 1,\dots,p$.

ASSUMPTION 7. $E[\psi_{qi}x_i] = 0$ for all $i$.

ASSUMPTION 8. The function $\psi$ is bounded and non-decreasing and possesses bounded derivatives up to the second order.

ASSUMPTION 9. The variances $\sigma_i^2 = Var(u_i)$ are bounded away from zero: $\sigma_i^2 > \sigma_{MIN}^2$ for all $i$.

Assumption 4 is a generalization of Assumption 1 in Section 2. It is used in the application of the Lindeberg central limit theorem. The uniform boundedness of the elements of $V_n$ and $W_n$ is guaranteed by Assumption 6. Assumption 6 together with Assumption 5 ensure that $V_n$, $W_n$ and their inverses are uniformly bounded and uniformly positive definite for $n$ sufficiently large. The generality of the assumptions about $V_n$ and $W_n$ is necessary because they are not required to converge to any particular limit. This case is particularly important in the sampling situation, where the investigator usually cannot control the experiment to ensure the convergence of these matrices to some limit. Assumption 7 is an identifiability condition; it guarantees that the expectation of (5) reaches its minimum at the true value $\beta_q$. When $E[\psi_{qi}] = 0$, independence of $u_i(q)$ and $x_i$ is sufficient but not necessary for Assumption 7. Assumption 7 also covers the case in which $E[\psi_{qi}|x_i] = 0$, thus allowing heteroskedasticity of the form $E[\psi_{qi}^2|x_i] = g(x_i)$, where $g$ is a known function. Finally, Assumption 8 is technically convenient; however, we believe that the existence of higher order derivatives of $\psi$ is not essential for the result to hold. The next theorem demonstrates the asymptotic normality of $\hat\beta_q$. Its proof is an extension of the proof of the asymptotic normality of the M-regression estimator $\hat\beta$ in Huber (1973).

THEOREM 3. Under Assumptions 4-9, for each $q\in(0,1)$, $\hat\beta_q$ exists in probability. Moreover

$$\sqrt{n}\,V_n^{-1/2}W_n(\hat\beta_q - \beta_q) \xrightarrow{d} N(0, I_p).$$

Proof. The matrix $W_n$ is positive definite for $n$ sufficiently large. Thus we can define the symmetric positive definite matrix $W_n^{-1/2}$ such that $(W_n^{-1/2})^2 = W_n^{-1}$. The elements of $W_n^{-1/2}$ are uniformly bounded under Assumptions 5, 6 and 9. Without loss of generality we assume $\beta_q = 0$ and consider the transformation

$$z_i = \frac{1}{\sigma_i\sqrt{n}}\,W_n^{-1/2}x_i, \qquad \beta^* = \sqrt{n}\,W_n^{1/2}\beta.$$

Then $\hat\beta^*_q = \sqrt{n}\,W_n^{1/2}\hat\beta_q$ is a solution of

$$\sum_{i=1}^n \psi_q\big(u_i(q) - z_i^T\beta^*_q\big)\,z_i = 0.$$

The idea is to compare the zeros of the two random functions

$$\Phi(\beta^*) = -\sum_{i=1}^n \psi_q\big(u_i(q) - z_i^T\beta^*\big)\,z_i, \qquad \Psi(\beta^*) = \beta^* - \sum_{i=1}^n \psi_{qi}z_i.$$

The zero $\hat\beta^*_q$ of $\Phi$ corresponds to $\sqrt{n}\,W_n^{1/2}\hat\beta_q$, while the zero of $\Psi$ is

$$\tilde\beta^*_q = \sum_{i=1}^n \psi_{qi}z_i.$$

Let $a \in \mathbb{R}^p$ with $|a|^2 = a^Ta = 1$. Thanks to Assumption 4, the Lindeberg theorem applied to $a^T\tilde\beta^*_q$ implies that $\tilde\beta^*_q$ is asymptotically normal with mean $0$ and covariance matrix $\sum_{i=1}^n E[\psi_{qi}^2 z_iz_i^T] = W_n^{-1/2}V_nW_n^{-1/2}$.

We will show that $|\hat\beta^*_q - \tilde\beta^*_q| = o_P(1)$, so that they follow the same law. Obviously, this result implies that $\sqrt{n}\,\hat\beta_q = W_n^{-1/2}\hat\beta^*_q$ is asymptotically normal with mean zero and covariance matrix $W_n^{-1}V_nW_n^{-1}$. Performing a Taylor expansion of $a^T\Phi(\beta^*)$, we have

$$a^T\Phi(\beta^*) = -a^T\Big[\sum_{i=1}^n \psi_{qi}z_i - \sum_{i=1}^n \psi'_{qi}z_i(z_i^T\beta^*) + \frac{1}{2}\sum_{i=1}^n \psi''_q\big(u_i(q) - \theta z_i^T\beta^*\big)\,z_i(z_i^T\beta^*)^2\Big],$$

with $0 < \theta < 1$. Since $\sum_i E[\psi'_{qi}z_iz_i^T] = I_p$, we have that

$$a^T[\Phi(\beta^*) - \Psi(\beta^*)] = a^T\sum_{i=1}^n\big[\psi'_{qi}z_iz_i^T - E(\psi'_{qi}z_iz_i^T)\big]\beta^* - \frac{1}{2}\sum_{i=1}^n \psi''_q\big(u_i(q) - \theta z_i^T\beta^*\big)(z_i^Ta)(z_i^T\beta^*)^2 =: A + B.$$

In what follows we show that $\Phi - \Psi$ is uniformly bounded in a neighborhood of $\beta^* = 0$, more precisely on sets of the form $\Gamma = \{(\beta^*, a) : |\beta^*|^2 \le K,\ |a|^2 = 1\}$. Introduce the notation $r_i := z_i^Ta$ and $t_i := z_i^T\beta^*$ and consider the first term:

$$E(A^2) = \sum_{i=1}^n E\big(a^T[\psi'_{qi}z_iz_i^T - E(\psi'_{qi}z_iz_i^T)]\beta^*\big)^2 \le \sum_{i=1}^n E(\psi'_{qi}r_it_i)^2.$$

Notice that, thanks to Assumptions 5 and 6, $|z_i|^2 \le \eta M_{in}^2/\sigma_{MIN}^2$ a.s., where $\eta$ is an upper bound for the eigenvalues of $W_n^{-1}$. Since $|\psi'_q(u)| \le C'$ we have that

$$E(A^2) \le \frac{C'\eta}{\sigma_{MIN}^2}\,|\beta^*|^2\,\varepsilon_n\sum_{i=1}^n E[\psi'_{qi}|z_i|^2] \le \frac{C'\eta Kp}{\sigma_{MIN}^2}\,\varepsilon_n =: K_1\varepsilon_n.$$

Given $\delta > 0$, Markov's inequality implies that

$$P\Big(|A| \ge \big(\tfrac{2}{\delta}K_1\varepsilon_n\big)^{1/2}\Big) \le \frac{\delta}{2},$$

for all $(\beta^*, a) \in \Gamma$. As far as the second term is concerned, assuming $|\psi''_q(u)| \le C''$,

$$E|B| \le \frac{C''\eta^{1/2}}{2\sigma_{MIN}}\,\varepsilon_n^{1/2}\,E\Big(\sum_i t_i^2\Big) \le \frac{C''\eta^{1/2}}{2\sigma_{MIN}}\,\varepsilon_n^{1/2}\,|\beta^*|^2\,\mathrm{tr}\Big(E\Big[\sum_i z_iz_i^T\Big]\Big) \le \frac{C''\eta^{1/2}KC_1p}{2\sigma_{MIN}}\,\varepsilon_n^{1/2} =: K_2\varepsilon_n^{1/2},$$

where we have used the fact that $\mathrm{tr}\big(E[\sum_i z_iz_i^T]\big) \le C_1p$, since the matrix $E[\sum_i z_iz_i^T]$ is uniformly bounded by Assumption 6. Again, Markov's inequality yields that

$$P\Big(|B| \ge \frac{2}{\delta}K_2\varepsilon_n^{1/2}\Big) \le \frac{\delta}{2}.$$

It follows that

$$P\big(|a^T[\Phi(\beta^*) - \Psi(\beta^*)]| < r\big) \ge 1 - \delta,$$

where $r := \big((\tfrac{2}{\delta}K_1)^{1/2} + \tfrac{2}{\delta}K_2\big)\varepsilon_n^{1/2}$. Since this result holds simultaneously for all $a$ with $|a| = 1$,

$$P\big(|\Phi(\beta^*) - \Psi(\beta^*)| \le r\big) \ge 1 - \delta. \qquad (7)$$

Now, assuming $|\psi_q(u)| \le C$, we see that

$$E|\tilde\beta^*_q|^2 \le C^2C_1p.$$

So from Markov's inequality it follows that $|\tilde\beta^*_q| \le \sqrt{K}/2$ with arbitrarily high probability, provided $K$ is chosen large enough. Now notice that

$$|\beta^* - \Phi(\beta^*)| \le |\tilde\beta^*_q| + |\Psi(\beta^*) - \Phi(\beta^*)| \le \frac{\sqrt{K}}{2} + r,$$

on the set $\{|\beta^*| \le \sqrt{K}\}$, with arbitrarily high probability. Since $\varepsilon_n \to 0$, for $n$ sufficiently large $r$ is smaller than $\sqrt{K}/2$, so that $|\beta^* - \Phi(\beta^*)| \le \sqrt{K}$ for $|\beta^*| \le \sqrt{K}$. Applying Brouwer's fixed point theorem to the function $f(\beta^*) := \beta^* - \Phi(\beta^*)$, we obtain that $\Phi$ has a zero $\hat\beta^*_q$ such that $|\hat\beta^*_q| \le \sqrt{K}$.

Finally, substituting $\hat\beta^*_q$ into (7), we obtain that $|\hat\beta^*_q - \tilde\beta^*_q| = o_P(1)$ and the result follows. $\Box$

To proceed further, we now need to define an estimator of the asymptotic covariance matrix of $\hat\beta_q$ for the i.n.i.d. situation. Assume that $s_i$ is a consistent estimator for $\sigma_i$ for each $i$ and define $\hat u_i(q) := (y_i - x_i^T\hat\beta_q)/s_i$, $\hat\psi'_{qi} := \psi'_q(\hat u_i(q))$ and $\hat\psi_{qi} := \psi_q(\hat u_i(q))$. The estimator that we propose is based on using the well-known sandwich approach to estimate $W_n$ and $V_n$ directly from the sample data:

$$\widehat{Var}_{SAN}(\hat\beta_q) = \frac{n}{n-p}\cdot\frac{1}{n}\,\hat W_n^{-1}\hat V_n\hat W_n^{-1} = (n-p)^{-1}\,\hat W_n^{-1}\hat V_n\hat W_n^{-1}, \qquad (8)$$

where

$$\hat W_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{s_i^2}\hat\psi'_{qi}\,x_ix_i^T, \qquad \hat V_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{s_i^2}\hat\psi_{qi}^2\,x_ix_i^T.$$

This estimator is based on White (1980) and takes into account the stochastic nature of the regressors, possible heteroskedasticity of the errors, and possible non-independence of errors and regressors. Note that the degrees-of-freedom factor $n/(n-p)$ in equation (8) ensures agreement with expression (4) when $x = 1$.
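A plug-in computation of the sandwich estimator (8) can be sketched as follows (illustrative only, with the Huber ψ; `var_san` is a hypothetical name, and a vector `s_i` of consistent scale estimates must be supplied). The factor n/(n-p) multiplies the plug-in asymptotic covariance $n^{-1}\hat W_n^{-1}\hat V_n\hat W_n^{-1}$.

```python
import numpy as np

def var_san(X, y, beta_hat, q, s_i, c=1.345):
    """Sandwich covariance matrix estimator (8) for the i.n.i.d. case."""
    n, p = X.shape
    s_i = np.asarray(s_i, dtype=float)
    u = (y - X @ beta_hat) / s_i
    qw = np.where(u < 0, 1.0 - q, q)                 # |q - I(u < 0)|
    psi = qw * np.clip(u, -c, c)                     # psi_q(u_i)
    psi_prime = qw * (np.abs(u) <= c)                # psi'_q(u_i)
    Xs = X / s_i[:, None]                            # rows x_i / s_i
    W = (Xs * psi_prime[:, None]).T @ Xs / n         # W_n hat
    V = (Xs * (psi ** 2)[:, None]).T @ Xs / n        # V_n hat
    Winv = np.linalg.inv(W)
    return (Winv @ V @ Winv) / (n - p)               # (n/(n-p)) * n^{-1} W^-1 V W^-1
```

Under i.i.d. errors, fixed regressors and a common scale estimate, this agrees with expression (4).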

We now show that the covariance matrix estimator (8) is consistent. This requires the following additional assumption.

ASSUMPTION 10. There exist finite positive constants $\delta$, $\Delta$ such that for all $i$, $E|x_{ij}x_{ik}|^{1+\delta} < \Delta$, for $j, k = 1,\dots,p$.

THEOREM 4. Under Assumptions 4-10,

$$\hat W_n^{-1}\hat V_n\hat W_n^{-1} - W_n^{-1}V_nW_n^{-1} \xrightarrow{P} 0.$$

Proof. Let us first prove the consistency of $\hat W_n$. Its $(j,k)$ component is given by $\frac{1}{n}\sum_{i=1}^n\frac{1}{s_i^2}\hat\psi'_{qi}x_{ij}x_{ik}$. Notice that

$$\Big|\frac{1}{\sigma_i^2}\psi'_{qi}x_{ij}x_{ik}\Big| \le \frac{C'}{\sigma_{MIN}^2}|x_{ij}x_{ik}| =: m(x_i),$$

with $E|m(x_i)|^{1+\delta} = (C'/\sigma_{MIN}^2)^{1+\delta}E|x_{ij}x_{ik}|^{1+\delta} < +\infty$ by Assumption 10. From Theorem A.5 in Hoadley (1971) and thanks to the continuity of $\psi'_q$ we have that

$$\sup_\beta\Big|\frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\psi'_q\Big(\frac{y_i - x_i^T\beta}{\sigma_i}\Big)x_{ij}x_{ik} - \frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}E\Big[\psi'_q\Big(\frac{y_i - x_i^T\beta}{\sigma_i}\Big)x_{ij}x_{ik}\Big]\Big| \xrightarrow{P} 0.$$

Since $\hat\beta_q \xrightarrow{P} \beta_q$ and $s_i \xrightarrow{P} \sigma_i$, it follows from a slight modification of Lemma 2.6 by White (1980) that $|\hat W_n - W_n| \xrightarrow{P} 0$. Similarly the convergence of $\hat V_n$ to $V_n$ can be proved. Since $W_n$ is uniformly positive definite by Assumption 5, thanks to Proposition 2.30 in White (2001) we obtain the result. $\Box$

As the following Corollary demonstrates, the variance estimator (8) leads to asymptotically valid tests of linear hypotheses and confidence intervals.

COROLLARY. Under Assumptions 4-10,

$$\sqrt{n}\,\hat V_n^{-1/2}\hat W_n(\hat\beta_q - \beta_q) \xrightarrow{d} N(0, I_p)$$

and

$$n(\hat\beta_q - \beta_q)^T\,\hat W_n\hat V_n^{-1}\hat W_n(\hat\beta_q - \beta_q) \xrightarrow{d} \chi^2_p.$$

Proof. Consequence of Theorems 3 and 4 and Lemma 3.3 by White (1980). $\Box$
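The corollary's chi-squared statistic can be formed directly from plug-in estimates of $W_n$ and $V_n$. A sketch (`wald_test` is a hypothetical name; the matrices $\hat W_n$, $\hat V_n$ are assumed to be computed elsewhere):

```python
import numpy as np

def wald_test(beta_hat, beta_0, W_hat, V_hat, n):
    """Wald statistic n (b - b0)' W V^{-1} W (b - b0), asymptotically
    chi-squared with p degrees of freedom under H0: beta_q = beta_0."""
    d = np.asarray(beta_hat) - np.asarray(beta_0)
    M = W_hat @ np.linalg.inv(V_hat) @ W_hat
    return float(n * d @ M @ d)
```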


Technically, the previous results do not hold for the Huber proposal 2 influence function because Assumption 8 is not satisfied. The next theorem shows that in fact they still hold.

THEOREM 5. Under Assumptions 4-7, 9 and 10, with $\psi(u) = u\,I(|u|\le c) + c\,\mathrm{sgn}(u)\,I(|u|>c)$,

$$\hat W_n^{-1}\hat V_n\hat W_n^{-1} - W_n^{-1}V_nW_n^{-1} \xrightarrow{P} 0.$$

Proof. Convergence of $\hat V_n$ to $V_n$ follows as in Theorem 4. Since in the present case $\psi'$ is not continuous, we have to proceed differently to prove consistency of $\hat W_n$. Since $|s_i - \sigma_i| \xrightarrow{P} 0$ and $\sigma_i > \sigma_{MIN}$ for any $i$, we have $|1/s_i^2 - 1/\sigma_i^2| \xrightarrow{P} 0$, so that

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n\frac{1}{s_i^2}\hat\psi'_{qi}x_ix_i^T - \frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\hat\psi'_{qi}x_ix_i^T\Big| \ge \delta\Big) \le P\Big(\frac{C'}{n}\sum_{i=1}^n\Big|\frac{1}{s_i^2} - \frac{1}{\sigma_i^2}\Big|\,|x_i|^2 \ge \delta\Big) \le \frac{C'\varepsilon}{\delta n}\sum_{i=1}^n E|x_i|^2 + o_P(1) \le \frac{\varepsilon\Delta C'p}{\delta} + o_P(1),$$

where we have used Markov's inequality, Assumption 6 and the fact that $|\hat\psi'_{qi}| \le C'$. The previous quantity tends to zero as $\varepsilon \to 0$ and $n \to +\infty$. Now notice that $\hat\psi'_{qi}$ differs from $\psi'_{qi}$ only if $|y_i - x_i^T\beta_q| \le |x_i^T[\hat\beta_q - \beta_q]|$, or if $-\sigma_ic < y_i - x_i^T\beta_q < x_i^T[\hat\beta_q - \beta_q] - s_ic$, or if $x_i^T[\hat\beta_q - \beta_q] - s_ic < y_i - x_i^T\beta_q < -\sigma_ic$, or if $\sigma_ic < y_i - x_i^T\beta_q < x_i^T[\hat\beta_q - \beta_q] + s_ic$, or if $x_i^T[\hat\beta_q - \beta_q] + s_ic < y_i - x_i^T\beta_q < \sigma_ic$. Therefore,

$$|\hat\psi'_{qi} - \psi'_{qi}| \le I\big(|y_i - x_i^T\beta_q| \le |x_i^T[\hat\beta_q - \beta_q]|\big) + I\big(-\sigma_ic < y_i - x_i^T\beta_q < x_i^T[\hat\beta_q - \beta_q] - s_ic\big) + I\big(x_i^T[\hat\beta_q - \beta_q] - s_ic < y_i - x_i^T\beta_q < -\sigma_ic\big) + I\big(\sigma_ic < y_i - x_i^T\beta_q < x_i^T[\hat\beta_q - \beta_q] + s_ic\big) + I\big(x_i^T[\hat\beta_q - \beta_q] + s_ic < y_i - x_i^T\beta_q < \sigma_ic\big).$$

It follows that for any fixed $\delta > 0$ and any $\varepsilon > 0$,

$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\hat\psi'_{qi}x_ix_i^T - \frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\psi'_{qi}x_ix_i^T\Big| \ge \delta\Big) \le P\Big(\frac{1}{n\sigma_{MIN}^2}\sum_{i=1}^n|x_i|^2\,|\hat\psi'_{qi} - \psi'_{qi}| \ge \delta\Big)$$

$$\le P\Big(\frac{1}{n\sigma_{MIN}^2}\sum_{i=1}^n|x_i|^2\big\{I\big(|y_i - x_i^T\beta_q| \le \varepsilon|x_i|\big) + I\big(-\sigma_ic < y_i - x_i^T\beta_q < \varepsilon|x_i| - \sigma_ic + c\varepsilon\big) + I\big(-\varepsilon|x_i| - \sigma_ic - c\varepsilon < y_i - x_i^T\beta_q < -\sigma_ic\big) + I\big(\sigma_ic < y_i - x_i^T\beta_q < \varepsilon|x_i| + \sigma_ic + c\varepsilon\big) + I\big(-\varepsilon|x_i| + \sigma_ic - c\varepsilon < y_i - x_i^T\beta_q < \sigma_ic\big)\big\} \ge \delta\Big) + o_P(1)$$

$$\le \frac{1}{\delta\sigma_{MIN}^2 n}\sum_{i=1}^n E\big[|x_i|^2\,\{\text{the same indicators as above}\}\big] + o_P(1),$$

where the last inequality follows from Markov's inequality. By the dominated convergence theorem, the consistency of $\hat\beta_q$ and $s_i$, and the fact that $y_i$ is an absolutely continuous random variable, the previous quantity converges to zero as $\varepsilon \to 0$ and $n \to +\infty$.

Applying the triangle inequality, we have

$$|\hat W_n - W_n| \le \Big|\hat W_n - \frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\hat\psi'_{qi}x_ix_i^T\Big| + \Big|\frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\hat\psi'_{qi}x_ix_i^T - \frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\psi'_{qi}x_ix_i^T\Big| + \Big|\frac{1}{n}\sum_{i=1}^n\frac{1}{\sigma_i^2}\psi'_{qi}x_ix_i^T - W_n\Big|.$$

Consistency of $\hat W_n$ then follows from the law of large numbers. $\Box$

4. SIMULATION STUDY

In this Section we provide results from a simulation study that was used to evaluate the large sample approximations outlined in Sections 2 and 3 and to investigate the performance of the variance estimators (4) and (8) of the M-quantile regression coefficients. Data are generated under the following model:

$$y_i = 1 + 2x_i + u_i, \qquad i = 1,\dots,1000,$$

with $x$ drawn from a uniform distribution over the interval $[0, 1]$ for each replication. Five different settings are considered for the individual effects $u_i$:

(a) Gaussian errors with mean 0 and standard deviation 0.16;
(b) 10% contaminated Gaussian errors, where 90% of errors are generated from a normal distribution with mean 0 and standard deviation 0.16 and the remaining 10% of errors are generated from a normal distribution with mean 0 and standard deviation 0.8;
(c) heteroskedastic errors, where $u_i = \varepsilon_i(1 + \sqrt{2x})$ and the $\varepsilon_i$ are generated from a normal distribution with mean 0 and standard deviation 0.16;
(d) strongly heteroskedastic errors, where $u_i = \varepsilon_i\exp\{3x\}$ and the $\varepsilon_i$ are generated from a normal distribution with mean 0 and standard deviation 0.16;
(e) Chi-squared errors with 3 degrees of freedom.

The above scenarios will enable us to evaluate the large sample approximations and the performance of the variance estimators both when the assumption of Gaussian errors holds and when this assumption is violated. Indeed, the first setting considers a situation of 'regularly' noisy data. The second one, on the contrary, defines a situation of noisier data with the likely presence of outlying observations. In the last three settings the assumption of Gaussian errors is violated. A measure of the strength of the heteroskedasticity is given by $\lambda = \max(\sigma_i^2)/\min(\sigma_i^2)$. Under scenario (c) $\lambda$ is equal to 3, corresponding to weak heteroskedasticity, whereas it assumes the value 20.06 under case (d), denoting strong heteroskedasticity. Each scenario is independently simulated $T = 5000$ times and samples of size $n = 100$ are selected from the simulated population by simple random sampling. For the M-quantile model the Huber proposal 2 influence function is used with $c = 1.345$. This value gives reasonably high efficiency in the normal case (it produces 95% efficiency when the standardized errors are normal) and still offers protection against outliers (Huber, 1981). As estimate of scale the median absolute deviation (MAD) of the residuals is taken:

$$s = \frac{\mathrm{MED}\{|(y_i - x_i^T\hat\beta_q) - \mathrm{MED}(y_i - x_i^T\hat\beta_q)|;\ i = 1,\dots,n\}}{0.6745}. \qquad (9)$$
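Equation (9) translates directly into code (an illustrative sketch; `mad_scale` is a hypothetical name):

```python
import numpy as np

def mad_scale(residuals):
    """MAD scale estimate, equation (9): median absolute deviation of the
    residuals about their median, divided by 0.6745."""
    r = np.asarray(residuals, dtype=float)
    return np.median(np.abs(r - np.median(r))) / 0.6745
```

The constant 0.6745 (the 0.75 quantile of the standard normal) makes the estimate consistent for σ under N(0, σ²) errors.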


This estimator is consistent for $\sigma$ when the error terms are $N(0, \sigma^2)$ (see, for example, Van der Vaart, 1998). Under settings (c) and (d) the heteroskedasticity is of the form $\sigma_i = \sigma v^{-1/2}(x_i)$, where $v(\cdot)$ is a non-negative function. The MAD estimator of $\sigma$ then becomes

$$s = \frac{\mathrm{MED}\Big\{\Big|\frac{y_i - x_i^T\hat\beta_q}{v^{-1/2}(x_i)} - \mathrm{MED}\Big(\frac{y_i - x_i^T\hat\beta_q}{v^{-1/2}(x_i)}\Big)\Big|;\ i = 1,\dots,n\Big\}}{0.6745},$$

and the scale $\sigma_i$ is estimated by $s_i = s\,v^{-1/2}(x_i)$.

For each type of distribution and for each component of the estimator, $\hat\beta_k$, $k = 0, 1$, at $q = (0.10, 0.25, 0.50, 0.75, 0.90)$, Tables 1 and 2 report:

- the Monte Carlo variance,
$$S^2(\hat\beta_k) = \frac{1}{T}\sum_{t=1}^T\big(\hat\beta_k^{(t)} - \bar\beta_k\big)^2,$$
where $\hat\beta_k^{(t)}$ is the estimated M-quantile coefficient for the $t$th replication and $\bar\beta_k = T^{-1}\sum_{t=1}^T\hat\beta_k^{(t)}$;

- the estimated variance averaged over the simulations,
$$\bar S^2(\hat\beta_k) = \frac{1}{T}\sum_{t=1}^T\widehat{Var}(\hat\beta_k^{(t)});$$

- the coverage rate (CR%) of nominal 95 per cent confidence intervals and their mean length. The coverage of these intervals is defined by the number of times the interval $\hat\beta_k \pm 2\{\widehat{Var}(\hat\beta_k)\}^{1/2}$ contains the 'true' population parameter.
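The coverage computation just described can be sketched as follows (illustrative; `coverage_rate` is a hypothetical name operating on the T replicated estimates and their estimated variances):

```python
import numpy as np

def coverage_rate(estimates, variances, true_value, z=2.0):
    """CR%: percentage of replications whose interval
    estimate +/- z * sqrt(variance) contains the true parameter."""
    est = np.asarray(estimates, dtype=float)
    half = z * np.sqrt(np.asarray(variances, dtype=float))
    hits = (est - half <= true_value) & (true_value <= est + half)
    return 100.0 * hits.mean()
```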

Tables 1 and 2 summarize the results. In particular, Table 1 reports the results for the estima-tor (4), while Table 2 shows the behaviour of the sandwich-type variance estimator (8). Underthe five scenarios the asymptotic variance estimators of the M-quantile regression coefficientsprovide a good approximation to the true variances. In particular, under scenarios (a), (b) and(e) the two estimators show similar performance in terms of point estimation and coverage rate,but estimator (8) has wider confidence intervals than the estimator (4). On the other hand, thesandwich-type variance estimator (8) has better performance than (4) under scenarios (c) and (d).As expected, the heteroskedasticity of the disturbances of the linear model used to generate thedata in these cases causes (4) to become inconsistent.

We now turn to the examination of asymptotic normality of the M-quantile regression coefficients estimators. Figures 1 and 2 present normal probability plots of these estimators under the Gaussian and Chi-squared scenarios at quantiles q = (0.25, 0.50, 0.75). The plots for the other cases are not reported here, but are available from the authors upon request. It is observed that under the four scenarios a normal approximation of the distribution of the M-quantile regression coefficients estimators is reasonable. This is also confirmed by the values W of the Shapiro–Wilk test, with a few exceptions. Under the setting based on Chi-squared errors there are some departures from normality for q = 0.25 and q = 0.75, but they are not severe and we believe that the normal approximation improves as the sample size increases. This is also supported by the fact that the coverage rates of normal confidence intervals for β0 and β1 are about 95% for all percentiles. Consequently, the construction of confidence intervals based on asymptotic normality seems to be correct. We therefore conclude that the proposed large sample approximations are suitable for approximate inference based on the estimators of the β_q coefficients of the M-quantile model.

Table 1. Empirical variances S²(β̂_k), estimated variances (4) averaged over the simulations S̄²(β̂_k), coverage rate (CR%) and average lengths of 95% confidence intervals on β̂_k. Sample size n = 100.

Errors                    q      S²(β̂0)   S̄²(β̂0)   CR%    Length    S²(β̂1)   S̄²(β̂1)   CR%    Length
Gaussian                 0.10    0.0118    0.0116    94.0   0.412     0.0355    0.0347    94.0   0.715
                         0.25    0.0080    0.0078    95.5   0.344     0.0241    0.0235    95.3   0.596
                         0.50    0.0069    0.0068    95.0   0.321     0.0210    0.0204    95.3   0.557
                         0.75    0.0079    0.0078    95.0   0.343     0.0247    0.0235    95.0   0.596
                         0.90    0.0118    0.0115    94.0   0.412     0.0370    0.0347    93.7   0.714
Contaminated             0.10    0.0409    0.0356    93.1   0.693     0.1215    0.1072    92.8   1.203
                         0.25    0.0120    0.0120    95.7   0.423     0.0370    0.0359    95.3   0.734
                         0.50    0.0089    0.0090    95.9   0.369     0.0276    0.0271    95.9   0.641
                         0.75    0.0120    0.0119    95.3   0.422     0.0372    0.0357    95.2   0.732
                         0.90    0.0403    0.0358    93.3   0.694     0.1236    0.1077    93.1   1.205
Heteroskedastic          0.10    0.0167    0.0116    90.0   0.412     0.0657    0.0348    85.3   0.716
                         0.25    0.0112    0.0078    91.0   0.344     0.0443    0.0235    86.6   0.557
                         0.50    0.0097    0.0068    91.0   0.321     0.0389    0.0204    85.6   0.557
                         0.75    0.0112    0.0078    91.1   0.343     0.0458    0.0236    85.6   0.596
                         0.90    0.0168    0.0115    89.4   0.412     0.0693    0.0347    85.0   0.714
Strong Heteroskedastic   0.10    0.0221    0.0115    84.2   0.412     0.1583    0.0346    64.6   0.715
                         0.25    0.0153    0.0078    85.5   0.344     0.1100    0.0235    65.5   0.597
                         0.50    0.0134    0.0068    85.2   0.321     0.0976    0.0204    66.1   0.557
                         0.75    0.0154    0.0078    85.9   0.343     0.1133    0.0235    65.0   0.596
                         0.90    0.0226    0.0115    85.0   0.411     0.1665    0.0346    63.7   0.714
Chi-squared              0.10    0.0660    0.0611    92.8   0.959     0.1901    0.1835    95.0   1.663
                         0.25    0.1013    0.0948    92.4   1.198     0.2883    0.2850    95.5   2.078
                         0.50    0.1813    0.1659    91.7   1.585     0.5121    0.4985    95.5   2.749
                         0.75    0.4270    0.4358    94.3   2.548     1.3031    1.3100    95.0   4.420
                         0.90    1.2195    1.2600    93.3   4.213     3.7547    3.779     93.1   7.305
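A simple numerical companion to the Q–Q plots is the correlation between the ordered estimates and standard normal quantiles, a crude stand-in for the Shapiro–Wilk W statistic reported in the text (a pure-stdlib sketch; the choice of Blom plotting positions is ours):

```python
import numpy as np
from statistics import NormalDist

def qq_normality_correlation(estimates):
    """Correlation of ordered estimates with standard normal quantiles.

    Values near 1 support the normal approximation that the Q-Q plots
    check graphically.
    """
    x = np.sort(np.asarray(estimates, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)  # Blom plotting positions
    z = np.array([NormalDist().inv_cdf(pi) for pi in p])
    return float(np.corrcoef(x, z)[0, 1])
```

Applied to the T replicated estimates of β̂_k at a given q, a value close to 1 indicates approximate normality.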

Table 2. Empirical variances S²(β̂_k), estimated variances (8) averaged over the simulations S̄²(β̂_k), coverage rate (CR%) and average lengths of 95% confidence intervals on β̂_k. Sample size n = 100.

Errors                    q      S²(β̂0)   S̄²(β̂0)   CR%    Length    S²(β̂1)   S̄²(β̂1)   CR%    Length
Gaussian                 0.10    0.0118    0.0123    93.0   0.415     0.0355    0.0368    93.1   0.725
                         0.25    0.0080    0.0080    95.0   0.344     0.0241    0.0239    94.9   0.598
                         0.50    0.0069    0.0069    95.1   0.321     0.0210    0.0206    95.0   0.557
                         0.75    0.0079    0.0080    94.7   0.343     0.0247    0.0239    94.4   0.597
                         0.90    0.0118    0.0124    92.8   0.416     0.0370    0.0370    93.1   0.725
Contaminated             0.10    0.0409    0.0546    93.1   0.768     0.1215    0.1684    95.1   1.362
                         0.25    0.0120    0.0126    95.6   0.429     0.0370    0.0378    96.0   0.747
                         0.50    0.0089    0.0092    95.8   0.371     0.0276    0.0277    95.6   0.645
                         0.75    0.0120    0.0125    95.0   0.426     0.0372    0.0377    95.2   0.745
                         0.90    0.0403    0.0540    92.4   0.764     0.1236    0.1598    95.9   1.363
Heteroskedastic          0.10    0.0167    0.0175    92.1   0.488     0.0657    0.0693    93.3   0.993
                         0.25    0.0112    0.0112    94.4   0.405     0.0443    0.0445    94.6   0.816
                         0.50    0.0097    0.0096    95.0   0.379     0.0389    0.0383    95.3   0.760
                         0.75    0.0112    0.0111    94.3   0.404     0.0458    0.0444    94.6   0.815
                         0.90    0.0168    0.0174    91.5   0.487     0.0693    0.0692    92.7   0.990
Strong Heteroskedastic   0.10    0.0221    0.0239    90.3   0.557     0.1583    0.1704    93.8   1.561
                         0.25    0.0153    0.0153    93.5   0.470     0.1100    0.1123    95.1   1.297
                         0.50    0.0134    0.0132    94.9   0.442     0.0976    0.0969    95.0   1.211
                         0.75    0.0154    0.0153    94.0   0.471     0.1133    0.1121    95.1   1.296
                         0.90    0.0226    0.0233    90.6   0.552     0.1665    0.1684    93.7   1.554
Chi-squared              0.10    0.0660    0.0627    91.9   0.964     0.1901    0.1883    95.0   1.676
                         0.25    0.1013    0.0961    91.2   1.200     0.2883    0.2890    95.8   2.087
                         0.50    0.1813    0.1688    90.5   1.591     0.5121    0.5089    95.8   2.768
                         0.75    0.4270    0.4646    92.6   2.587     1.3031    1.3960    94.9   4.519
                         0.90    1.2195    1.0112    91.5   4.488     3.7547    3.224     92.5   7.867

5. DISCUSSION

M-quantile regression allows one to investigate the behaviour of the conditional M-quantile regression functions of a response variable in terms of a set of covariates. Like ordinary quantiles, the M-quantiles characterize a distribution, and so the M-quantile regression functions lead to a more complete picture of this relationship. However, little or no work has been done on the asymptotic properties of the M-quantile regression coefficients estimators and on the estimation


[Figure: six normal Q–Q panels, Beta0 (top row) and Beta1 (bottom row) plotted against theoretical quantiles, for q = 0.25, 0.50, 0.75.]

Fig. 1. Q–Q plots of estimates of β̂_q for q = 0.25, 0.50, 0.75 in the case of Gaussian errors.

[Figure: six normal Q–Q panels, Beta0 (top row) and Beta1 (bottom row) plotted against theoretical quantiles, for q = 0.25, 0.50, 0.75.]

Fig. 2. Q–Q plots of estimates of β̂_q for q = 0.25, 0.50, 0.75 in the case of Chi-squared errors.

of their variances. This is particularly true for the i.n.i.d. case, which is appropriate when there are gross errors in the data.

In this paper we prove the consistency and asymptotic normality of the M-quantile regression coefficients estimators for this case. Moreover, we propose two estimators of the asymptotic variance of the M-quantile regression coefficients estimators and we prove their consistency. The empirical results described in the previous Section provide empirical support for the asymptotic theory developed in Sections 2 and 3 and demonstrate that the variance estimators (4) and (8) are reasonably efficient over a wide range of error distributions.

The variance of the M-quantile regression coefficients could also have been estimated with a resampling method. In particular, it can be shown that a ‘one-step’ jackknife estimator of the variance of the M-quantile regression coefficients is asymptotically equivalent to the sandwich estimator (8). We have evaluated the performance of this ‘one-step’ jackknife estimator and have observed that it works as well as the sandwich-type variance estimator (8), except for some extreme quantiles where the ‘one-step’ jackknife estimator showed some under-coverage. For reasons of space, these results are not reported in the paper. However, they are available from the authors upon request.
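The paper's ‘one-step’ jackknife is not reproduced here. As a point of comparison, a plain delete-one jackknife variance for a coefficient estimator looks like this (full refits, i.e. the slow variant that the ‘one-step’ version avoids; the function names are ours):

```python
import numpy as np

def jackknife_variance(estimator, X, y):
    """Delete-one jackknife variance of a coefficient estimator.

    `estimator(X, y)` returns a coefficient vector; one full refit is done
    per left-out observation.
    """
    n = len(y)
    thetas = np.array([estimator(np.delete(X, i, axis=0), np.delete(y, i))
                       for i in range(n)])
    theta_bar = thetas.mean(axis=0)
    # (n-1)/n times the sum of squared deviations of the leave-one-out fits
    return (n - 1) / n * ((thetas - theta_bar) ** 2).sum(axis=0)
```

For the sample mean this reproduces the classical result that the jackknife variance equals the sample variance divided by n.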

By construction, M-quantile-based estimators are robust against outliers in the distributions of the random errors. However, the method does not take into account the presence of outliers in the auxiliary variables. In such cases, the M-quantile estimating equations (6) for β_q can be modified as

Σ_{i=1}^{n} d_i ψ_q((y_i − x_i^T β)/σ_i) (x_i/σ_i) = 0,    (10)

where d_i is a diagonal matrix of weight functions d(x_i) which allows one to downweight the outliers in the auxiliary variables when estimating the parameters of the M-quantile model. In this context, we note that a function of the Mahalanobis distance can be used as the weight function d(x_i) when the x's are continuous (Sinha, 2004). If the auxiliary variables are discrete, Mallows weights can be used (see de Jongh et al., 1988).
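Estimating equations of the form (10) can be solved by iteratively reweighted least squares, as in standard M-estimation (Street et al., 1988). A sketch with the Huber influence function, a MAD scale update and optional user-supplied covariate downweights d_i; all implementation choices here (tuning constant, convergence rule, starting value) are ours, not the authors':

```python
import numpy as np

def irls_mq(X, y, q=0.5, c=1.345, d=None, n_iter=100, tol=1e-8):
    """Solve a weighted M-quantile estimating equation of the form (10) by IRLS.

    d : optional covariate downweights d(x_i) (e.g. a decreasing function of
        the Mahalanobis distance); defaults to 1 (no downweighting).
    The scale is taken constant across i and re-estimated by the MAD at
    each step. A sketch only, not the authors' implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.ones(n) if d is None else np.asarray(d, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s <= 0:
            s = 1.0                                  # guard against exact fits
        u = r / s
        # IRLS weight psi_q(u)/u = 2[q I(u>0) + (1-q) I(u<=0)] * min(1, c/|u|)
        w = d * 2.0 * np.where(u > 0, q, 1.0 - q) \
              * np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```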

Recently Pratesi et al. (2009) have extended the M-quantile regression model to nonparametric penalized spline regression, in the sense that the M-quantile regression functions do not have to follow any particular functional form, but can be left undefined and estimated from the data. It could be interesting to explore whether the variance estimators proposed in Sections 2 and 3 can also be used in this case. Here we note that the asymptotic results would then have to take into account the penalty term and the number and position of knots used in the spline approximation.

REFERENCES

BINDER, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review 51, 279–292.
BRECKLING, J. & CHAMBERS, R. (1988). M-quantiles. Biometrika 75, 761–771.
DE JONGH, P., DE WET, T. & WELSH, A. (1988). Mallows-type bounded-influence-regression trimmed means. Journal of the American Statistical Association 83, 805–810.
HUBER, P. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics 1, 799–821.
HUBER, P. J. (1981). Robust Statistics. New York: John Wiley & Sons.
KOENKER, R. & BASSETT, G. (1978). Regression quantiles. Econometrica 46, 33–50.
NEWEY, W. & POWELL, J. (1987). Asymmetric least squares estimation and testing. Econometrica 55, 819–847.
PRATESI, M., RANALLI, M. G. & SALVATI, N. (2009). Nonparametric M-quantile regression via penalized splines. Journal of Nonparametric Statistics 21, 287–304.
SINHA, S. (2004). Robust analysis of generalized linear mixed models. Journal of the American Statistical Association 99, 451–460.
STREET, J., CARROLL, R. & RUPPERT, D. (1988). A note on computing robust regression estimates via iteratively reweighted least squares. The American Statistician 42, 152–154.
VAN DER VAART, A. (1998). Asymptotic Statistics. Cambridge: Cambridge University Press.
WHITE, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
WHITE, H. (2001). Asymptotic Theory for Econometricians. San Diego: Academic Press.
YOHAI, V. J. & MARONNA, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. The Annals of Statistics 7, 258–268.