Asymptotic Properties of M-estimators Based on Estimating Equations and Censored Data

JANE-LING WANG

University of California, Davis

ABSTRACT. Properties of Huber's M-estimators based on estimating equations have been studied extensively and are well understood for complete (i.i.d.) data. Although the concepts of M-estimators and influence curves have been extended for some time by Reid (1981) to incomplete data that are subject to right censoring, results on the general behavior of M-estimators based on incomplete data remain scattered and restrictive. This paper establishes a general large sample theory for M-estimators based on censored data. We show how to extend any asymptotic result available for M-estimators based on complete data to the case of censored data. The extensions are usually straightforward and include the multiparameter situation. Both the lifetime and censoring distributions may be discontinuous. We illustrate several extensions which provide simple and tractable sufficient conditions for an M-estimator to be strongly consistent and asymptotically normal. The influence curves and asymptotic variance of the M-estimators are also derived. The applicability of the new sufficient conditions is demonstrated through several examples, including location and scale M-estimators.

Key words: asymptotic normality of M-estimators, functionals of Kaplan–Meier estimator, influence curve, one-step M-estimator, strong consistency

1. Introduction

Let $X_1, \ldots, X_n$ be $n$ independent observations on a real-valued random variable $X$ with distribution function $F$, and let $\theta = \theta(F)$ be a parameter of interest from a set $\Theta \subset \mathbb{R}^k$. Huber (1964) proposed a class of estimators, called M-estimators, as the solution of an estimating equation of the form

$$\int \psi(x, \theta)\, dF_n(x) = n^{-1} \sum_{i=1}^{n} \psi(X_i, \theta) = 0. \tag{1.1}$$

Here, $F_n$ is the empirical distribution function based on the observations $X_1, \ldots, X_n$, and $\psi$ is a given function from $\mathbb{R} \times \Theta$ to $\mathbb{R}^k$ which satisfies

$$E\,\psi(X, \theta) = \int \psi(x, \theta)\, dF(x) = 0.$$

Examples of M-estimators include maximum likelihood estimators (MLE) (when $\psi(x, \theta) = (\partial/\partial\theta)\log f(x, \theta)$, where $f$ is the density function of $F$), generalized method of moment estimators (when $\psi(x, \theta) = g(x) - E_\theta\{g(x)\}$, for some function $g$), and many robust estimators to be discussed in section 4.

The above definition of M-estimators applies to situations where a complete random sample is available. In reality $X$ may be subject to censoring by another random variable $C$ so that one observes only $Y = \min(X, C)$ and $\delta = 1\{X \le C\}$, the indicator variable of the censoring. Let $(Y_i, \delta_i)$, $1 \le i \le n$, denote a random sample of $(Y, \delta)$ that one observes, and let $Y_{(1)}, \ldots, Y_{(m)}$ denote the $m$ distinct ordered values of the $Y$'s. When there are ties among the $Y$'s, we have $m < n$.

© Board of the Foundation of the Scandinavian Journal of Statistics 1998. Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA. Vol 26: 297–318, 1999

We assume the above random censorship model in this paper and extend the estimating equation (1.1) by replacing the empirical distribution function $F_n$ by the Kaplan–Meier product-limit estimator $\hat F_n$ defined as:

$$1 - \hat F_n(x) = \prod_{i=1}^{m} \left(1 - \frac{d_i}{n_i}\right)^{1\{Y_{(i)} \le x\}},$$

where $n_i = \sum_{j=1}^{n} 1\{Y_j \ge Y_{(i)}\}$ and $d_i = \sum_{j=1}^{n} 1\{Y_j = Y_{(i)}, \delta_j = 1\}$ denote the number of individuals still at risk at time $Y_{(i)}$ and the number of deaths at time $Y_{(i)}$, respectively.
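The product-limit formula translates directly into a few lines of code. The following is a minimal sketch in Python with NumPy, not an optimized implementation; the function name `kaplan_meier` and the toy sample are illustrative assumptions rather than part of the paper. It returns the distinct ordered times $Y_{(i)}$, the values of $1 - \hat F_n$ at those times, and the mass that $\hat F_n$ places at each of them.

```python
import numpy as np

def kaplan_meier(y, delta):
    """Return distinct ordered times, KM survival values and KM jump sizes."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(y)                           # Y_(1) < ... < Y_(m)
    surv = np.empty(len(times))
    s = 1.0
    for i, t in enumerate(times):
        n_i = np.sum(y >= t)                       # number still at risk at Y_(i)
        d_i = np.sum((y == t) & (delta == 1))      # number of deaths at Y_(i)
        s *= 1.0 - d_i / n_i
        surv[i] = s                                # 1 - F_n-hat(Y_(i))
    # Mass of F_n-hat at each Y_(i) is the drop in the survival curve there.
    jumps = -np.diff(np.concatenate(([1.0], surv)))
    return times, surv, jumps

# Toy sample (hypothetical): one tie at 2.0, largest observation censored.
y = [1.0, 2.0, 2.0, 3.0, 4.0]
delta = [1, 1, 0, 1, 0]
times, surv, jumps = kaplan_meier(y, delta)
```

In this toy sample the largest observation is censored, so the jumps sum to 0.7 rather than 1; the estimator leaves the remaining mass unassigned.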

For a given function $\psi$, let

$$\lambda_n(\theta) = \int \psi(x, \theta)\, d\hat F_n(x). \tag{1.2}$$

Definition 1

An M-estimator $\hat\theta_n$ for censored data is defined as the solution of the estimating equation

$$\lambda_n(\theta) = 0. \tag{1.3}$$

Note that $\lambda_n$ is a function from $\Theta$ to $\mathbb{R}^k$. Thus there are $k$ equations to be solved simultaneously in (1.3). The above extension of Huber's M-estimators was first postulated by Reid (1981) for $k = 1$. James (1986) derived it differently from (1.1), namely by replacing $\psi(X_i, \theta)$ in (1.1) for an unobserved (censored) $X_i$ by an estimate of $E(\psi(X, \theta) \mid X > Y_i)$.
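Because the Kaplan–Meier estimator is a step function, the integral in (1.2) is a finite weighted sum over the distinct observed times, with weights equal to the Kaplan–Meier jumps. The sketch below illustrates solving (1.3) for the mean functional, $\psi(x, \theta) = x - \theta$, by bisection. The helper names (`lambda_n`, `solve_bisect`) and the toy times and jumps are assumptions made for illustration only.

```python
import numpy as np

# Assumed toy Kaplan-Meier masses at the distinct observed times. The largest
# observation is taken to be censored, so the masses sum to 0.7, not 1.
times = np.array([1.0, 2.0, 3.0, 4.0])
jumps = np.array([0.2, 0.2, 0.3, 0.0])

def psi_mean(x, theta):
    """psi for the mean functional: psi(x, theta) = x - theta."""
    return x - theta

def lambda_n(theta, psi, times, jumps):
    """lambda_n(theta): integral of psi(x, theta) against the KM estimator."""
    return float(np.sum(jumps * psi(times, theta)))

def solve_bisect(f, lo, hi, tol=1e-10):
    """Bisection root finder; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)

theta_hat = solve_bisect(
    lambda th: lambda_n(th, psi_mean, times, jumps), times[0], times[-1]
)
```

For this choice of $\psi$ the root can also be written in closed form as the jump-weighted average of the observed times, which gives a quick check on the numerical solution.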

Note that when $\psi$ is not continuous in $\theta$, a solution sequence for equation (1.3) may not exist. For example, the $\lambda_n(\theta)$ corresponding to a sample quantile (see example 4.2(e) in section 4) may change its sign at a sample quantile but, because of a discontinuity there, equation (1.3) may have no solution. To include sample quantiles, we use the following more general definition of M-estimators for the one-parameter case ($k = 1$):

Definition 2

In the single parameter case with $k = 1$, any sequence $\{\hat\theta_n\}$ for which $\lambda_n(\theta)$ (defined as in (1.2)) changes sign at $\theta = \hat\theta_n$ is also called an M-estimator.
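A small numerical sketch may help to see why a sign change, rather than an exact root, is the natural notion for quantiles. The code assumes the quantile-type function $\psi(x, \theta) = 1\{x \le \theta\} - p$ (the paper's own example 4.2(e) may use a different but equivalent convention) and uses assumed toy Kaplan–Meier masses that sum to one.

```python
import numpy as np

# Assumed toy Kaplan-Meier masses at the distinct observed times (largest
# observation uncensored, so the masses sum to 1).
times = np.array([1.0, 2.0, 3.0, 4.0])
jumps = np.array([0.2, 0.2, 0.3, 0.3])

def psi_quantile(x, theta, p=0.5):
    """Assumed psi for the p-th quantile; discontinuous in theta."""
    return (x <= theta).astype(float) - p

def lambda_n(theta):
    """Equals (KM estimate of F at theta) minus p when the masses sum to 1."""
    return float(np.sum(jumps * psi_quantile(times, theta)))

# lambda_n is a non-decreasing step function of theta that jumps only at the
# observed times, so scanning those times locates the sign change.
values = np.array([lambda_n(t) for t in times])
theta_hat = times[int(np.argmax(values >= 0))]   # first time with lambda_n >= 0
has_exact_root = bool(np.any(values == 0))
```

Here $\lambda_n$ jumps from a negative to a positive value at the third observed time without ever being zero, so the estimating equation has no solution but the sign-change estimator is well defined.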

A distinction of the M-estimators in definition 1 (or definition 2) from those defined by (1.1) is that, for $\psi(x, \theta) = (\partial/\partial\theta)\log f(x, \theta)$, they no longer correspond to the MLE when censoring is present. Oakes (1986) refers to this particular type of M-estimator as the approximate MLE and points out its computational and potential robustness advantages over the actual MLE. Borgan (1984) studied the asymptotic properties of the actual MLE. Another type of M-estimator, based on the cumulative hazard function and aiming at inclusion of the MLEs under censoring, is discussed in Hjort (1985). We focus on Reid's extension of Huber's M-estimators in this paper. Some discussion of the two types of M-estimators is given at the end of this paper in section 5.

Asymptotic properties of Huber's M-estimators based on complete data are well understood nowadays and can be found, for example, in Huber (1981) and Serfling (1980), among others. For censored data, Wang (1995) recently showed that a full extension of strong consistency results, such as those of Wald (1949) and Huber (1967), is possible. However, results on asymptotic normality are sparse and available only under restrictive conditions (Reid, 1981; Lai & Ying, 1994). For example, the asymptotic normality results in Reid (1981) are applicable only to M-estimators whose corresponding $\psi$-function is differentiable and vanishes outside a compact interval. Such M-estimators are of the truncated type (i.e. they minimize $\int_0^T \psi(x, \theta)\, d\hat F_n(x)$ for some $T$ with $F(T) < 1$) and this excludes most useful functionals such as the mean (with $\psi(x, \theta) = x - \theta$) and the parameters in a parametric model. Lai and Ying's (1994) results for regression M-estimators, though more general, require much stronger regularity conditions than the classical ones for uncensored data. The goal of this paper is to establish general asymptotic normality results, which are comparable to those in Cramér (1946), Huber (1964, 1967) and subsequent work. The influence curves of an M-estimator based on possibly non-differentiable $\psi$-functions are obtained as well. In addition to the asymptotic normality results, we also derive some simple sufficient conditions for the strong consistency of M-estimators. Such conditions supplement those of Wang (1995), which are based on topological arguments and are more general but harder to verify.

It turns out that most of the conditions which guarantee strong consistency and asymptotic normality in the complete data case have an extension to the case of censored data. We demonstrate several such extensions in section 3. If $\psi$ is either monotone (with $k = 1$) or bounded continuous, strong consistency results are easily established as in the complete data case (cf. theorems 1 and 3). However, asymptotic normality for monotone $\psi$-functions requires additional care due to the lack of Lindeberg–Feller type conditions for triangular arrays under random censorship. Huber's (1964) result for monotone $\psi$-functions cannot be extended fully yet. Since it is important to deal with such $\psi$-functions, especially when $\psi$ is not continuous, as is the case for quantile functions, we illustrate in theorem 2 a modified version of Huber's result and provide some practical ways of implementing it (see the remark after theorem 2). If $k > 1$ or if $k = 1$ but $\psi$ is not monotone, further regularity conditions are required as in the complete data case. Several such conditions are given in theorems 4 and 5.

Because the sufficient conditions in section 3 cover most of the frequently encountered M-estimates, for simplicity of presentation we did not adopt the most general framework. For example, Huber's (1967) results under non-standard conditions can be extended to censored data as well, but this is not illustrated in this paper. Instead, we present two cases (example 4.2(c) and example 4.3(b)) where Huber's conditions are called for. The law of the iterated logarithm can be treated as well but we choose to omit this. Our main goal in this paper is to illustrate the link between the complete data and the censoring case and to present a set of simple and tractable conditions, some of which appear to be new even for complete data.

The main hurdle for the full development of the asymptotic properties of M-estimators is the lack of a law of large numbers and of a central limit theorem for general functionals, $\int \varphi\, d\hat F_n$, of the Kaplan–Meier estimate. When there is no censoring (hereafter referred to also as the complete data case), the Kaplan–Meier estimator yields the empirical distribution function and $\int \varphi\, d\hat F_n$ is a sample $\varphi$-mean. Hence the classical strong law of large numbers (SLLN) and central limit theorem (CLT) for i.i.d. observations provide sufficient tools. In the presence of censoring such fundamental results did not exist for arbitrary functions $\varphi$ until very recently. Only scattered results were available for special functions $\varphi$ such as indicator functions or for functions subject to regularity constraints (Gijbels & Veraverbeke, 1991). Recently, Stute & Wang (1993) and Stute (1995) obtained, respectively, the most general SLLN and CLT for $\int \varphi\, d\hat F_n$, with arbitrary $\varphi$-function. Their results facilitate the asymptotic results in this paper and are summarized in propositions 1 and 2 in section 2. The SLLN in proposition 1 is a full and the best possible extension of the classical SLLN to censored data. Consequently, all existing sufficient conditions for the strong consistency of M-estimators are applicable to censored data. Theorems 1 and 3 demonstrate two such extensions, which are the simplest and possibly the most useful ones. As for the extension of the CLT to censored data, due to the additional requirement (R3) it is not clear whether the sufficient conditions presented in proposition 2 due to Stute (1995) are the best possible ones or not. In that same paper, Stute gave an example where the CLT fails when the requirement (R3) is not met. This suggests that the sufficient conditions in proposition 2 for the CLT to hold are close to being necessary. Thus, any existing sufficient conditions for the asymptotic normality of M-estimators hold, nearly to a full extent, for censored data. Several such extensions are presented in theorems 2, 4 and 5.

Another approach to establish the asymptotic normality of M-estimators is to apply the functional delta method or the so-called von Mises calculus. This is illustrated in Heesterman & Gill (1992) for complete data. A full extension of their results to censored data is not available at this point owing to the lack of a censored version of their lem. 4.1. Further research along this line would be of interest.

A final remark is that the methods and arguments in this paper can be extended to other types of incomplete data (e.g. truncation, double censoring, interval censoring etc.) or data subject to sampling bias, where the Kaplan–Meier product-limit estimate $\hat F_n$ will be replaced by an appropriate estimate, usually the non-parametric maximum likelihood estimate of the true lifetime distribution function. Such an extension is straightforward whenever, for the particular choice of $\hat F_n$, the SLLN and CLT of $\int \varphi\, d\hat F_n$ have been established for an arbitrary $\varphi$ function.

The rest of the paper is organized as follows: section 2 contains some notation and fundamental results including the two key results on SLLN and CLT. It also lists some of the regularity assumptions needed for the results in section 3. The main results of this paper are in section 3. Several examples of M-estimators are given in section 4 and the main results in section 3 are applied to those examples. Section 5 discusses two extensions of M-estimators due to Reid and Hjort and confirms the use of one-step M-estimators. The proofs are given in an appendix.

2. Preliminaries and assumptions

Consider the random censorship model in section 1 and let $F$ denote the lifetime distribution of $X$ and $G$ the censoring distribution of $C$. Assume the independence of $X$ and $C$. Then the distribution $H$ of the observation $Y = \min(X, C)$ satisfies $1 - H = (1 - F)(1 - G)$.
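The product identity follows in one line from the independence of $X$ and $C$. A short derivation, using only the definitions above, is:

```latex
\begin{aligned}
1 - H(y) &= P(\min(X, C) > y) \\
         &= P(X > y,\ C > y) \\
         &= P(X > y)\, P(C > y) \qquad \text{(independence of $X$ and $C$)} \\
         &= (1 - F(y))\,(1 - G(y)).
\end{aligned}
```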

2.1. SLLN and CLT for $\int \varphi\, d\hat F_n$

For any specified real function $\varphi$ we state in this section the strong law of large numbers (SLLN) and the central limit theorem (CLT) for the Kaplan–Meier integral $\int \varphi\, d\hat F_n$. Such results are essential to study the limiting behaviour of M-estimates in the next section.

For any distribution function $L$ let $\tau_L = \sup\{x : L(x) < 1\}$ denote the upper bound of the support of $L$. Let $\Delta F(x) = F(x) - F(x-)$ denote the probability mass of $F$ at $x$. Since one can only observe data in the range $[0, \tau_H]$, it is possible to estimate $\int \varphi\, dF$ consistently only if $\tau_F = \tau_H$ or if $\varphi(x)$ is zero for $x \ge \tau_H$. The specific requirement for strong consistency is:

(R1) At least one of (i) or (ii) below holds:

(i) For some $b < \tau_H$, $\varphi(x) = 0$ for $b < x \le \tau_H$.

(ii) $\tau_F \le \tau_G$, where equality may hold except when $G$ is continuous at $\tau_F$ and $\Delta F(\tau_F) > 0$.

Note that (R1) (ii) implies $\tau_F = \tau_H$, and is the necessary and sufficient condition so that $F$ can be estimated consistently on its entire support. Such a requirement can be dispensed with only if the $\varphi$ function satisfies requirement (R1) (i), which then results in a truncated Kaplan–Meier integral. Note that only one of the two conditions (i) and (ii), not necessarily both, needs to hold for (R1).

We state in the next proposition the strong consistency of $\int \varphi\, d\hat F_n$, which follows from (R1), th. 1.1 and coroll. 1.2 of Stute & Wang (1993). Note that the original strong law in Stute & Wang (1993) requires further that $F$ and $G$ have no common point of discontinuity. Such a restriction was later discovered to be dispensable; see Stute (1995) for details.

Proposition 1 (Strong Law of Large Numbers)

Under (R1) and for any $\varphi$ with $\int |\varphi(x)|\, dF(x) < \infty$, it follows that

$$\int \varphi(x)\, d\hat F_n(x) \to \int \varphi(x)\, dF(x), \quad \text{with probability one.} \tag{2.1}$$

Moreover, under (R1) (ii), it follows that

$$\sup_{-\infty < x \le \tau_H} |\hat F_n(x) - F(x)| \to 0, \quad \text{with probability one.} \tag{2.2}$$

Proposition 1 essentially implies that the law of large numbers for censored data holds under the same condition, namely the integrability of $\varphi$, as for the i.i.d. case. The CLT, however, requires a little more than the i.i.d. case.

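Denote $m(y) = P(\delta = 1 \mid Y = y)$ and denote the subdistribution functions for the censored and uncensored observations respectively by

$$\begin{aligned}
H_0(y) &= P(Y \le y, \delta = 0) = \int_{-\infty}^{y} (1 - m(t))\, dH(t) = \int_{-\infty}^{y} (1 - F(t))\, dG(t), \\
H_1(y) &= P(Y \le y, \delta = 1) = \int_{-\infty}^{y} m(t)\, dH(t) = \int_{-\infty}^{y} (1 - G(t-))\, dF(t),
\end{aligned} \tag{2.3}$$

and let the corresponding empirical estimates be denoted by

$$H_{jn}(y) = n^{-1} \sum_{i=1}^{n} 1\{Y_i \le y, \delta_i = j\}, \quad j = 0, 1. \tag{2.4}$$

Note that $H_0 + H_1 = H$. Define

$$\begin{aligned}
\gamma_0(x) &= \exp\left\{ \int \frac{1\{y < x\}\, dH_0(y)}{1 - H(y)} \right\}, \\
\gamma_1(x) &= [1 - H(x)]^{-1} \int 1\{x < y\}\, \varphi(y) \gamma_0(y)\, dH_1(y), \\
\gamma_2(x) &= \int \varphi(z) \gamma_0(z)\, C(x \wedge z)\, dH_1(z),
\end{aligned} \tag{2.5}$$

where

$$C(x) = \int \frac{1\{y < x\}\, dH_0(y)}{[1 - H(y)]^2} = \int \frac{1\{y < x\}\, dG(y)}{[1 - F(y)][1 - G(y)]^2}. \tag{2.6}$$

Let $U$ denote the random variable defined as:

$$U = \varphi(Y) \gamma_0(Y) \delta + \gamma_1(Y)(1 - \delta) - \gamma_2(Y) - \int \varphi\, dF. \tag{2.7}$$

It turns out that $E(U) = 0$. The variance of $U$ depends on $\varphi$, $F$ and $G$ and is denoted by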

$$\sigma^2(\varphi, F, G) = \operatorname{var}(U) = \int \varphi^2(y) \gamma_0^2(y)\, dH_1(y) - \int \gamma_1^2(y)\, dH_0(y) - \left( \int \varphi\, dF \right)^2 + \int \frac{\gamma_1^2(y) [1 - m(y)]^2}{1 - H(y)}\, \Delta H(y)\, dH(y). \tag{2.8}$$

Clearly, the last integral vanishes for a continuous $H$.

The additional requirements for the asymptotic normality of $\int \varphi\, d(\hat F_n - F)$ are:

(R2): $E[\varphi(Y) \gamma_0(Y) \delta]^2 = \int \varphi^2(y) \gamma_0^2(y)\, dH_1(y) < \infty$,

(R3): $\int |\varphi(x)|\, C^{1/2}(x)\, dF(x) < \infty$.

The requirement (R2) is the modified "finite second moment" assumption on $\varphi$ for censored data. The requirement (R3) is mainly to control the bias of $\int \varphi\, d\hat F_n$ and involves only the first $\varphi$-moment. Note that, as Stute (1994) indicates, although the bias $\int \varphi\, d(\hat F_n - F)$ tends to zero, the rate of convergence may be worse than $n^{-1/2}$. Thus (R3) is required for general $\varphi$s and (R3) is not necessarily implied by (R2). The function $C(x)$ arises from the variance of a process related to the cumulative hazard function. When there is no censoring, $C = 0$, $\gamma_0 = 1$ and $\gamma_2 = 0$, whence (R3) is redundant and (R2) reduces to the usual second moment assumption $\int \varphi^2\, dF < \infty$. Thus proposition 2 below reduces to the central limit theorem for i.i.d. observations. In the presence of censoring, assumptions (R2) and (R3) are implied by $\int \varphi^2\, dF < \infty$ and assumption (R1) (i). The latter requirement (R1) (i), that $\varphi$ is non-zero only for $x \le b < \tau_H$, is imposed in Reid (1981) and Gijbels & Veraverbeke (1991) and excludes many useful examples including the mean function. Assumptions (R2) and (R3), however, allow $\varphi$s that have non-compact support.
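As a check on the notation, the reduction to the complete data case can be written out explicitly. The sketch below assumes that there is no censoring in the sense that $\delta = 1$ with probability one, so that $H_0 \equiv 0$ and $H_1 = H = F$; it uses only the definitions (2.5) to (2.8):

```latex
\begin{aligned}
\gamma_0(x) &= \exp\{0\} = 1, \qquad C(x) = 0, \qquad \gamma_2(x) = 0, \\
U &= \varphi(Y)\gamma_0(Y)\delta + \gamma_1(Y)(1 - \delta) - \gamma_2(Y) - \int \varphi\, dF
   = \varphi(X) - \int \varphi\, dF, \\
\sigma^2(\varphi, F, G) &= \operatorname{var}(U)
   = \int \varphi^2\, dF - \Bigl( \int \varphi\, dF \Bigr)^2 .
\end{aligned}
```

The terms involving $\gamma_1$ drop out because they are integrated against $dH_0$ or multiplied by $1 - m(y)$, both of which vanish in this case, so the variance is the ordinary variance of $\varphi(X)$.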

When the censoring distribution $G$ is continuous, $\gamma_0$ has a simpler form: $\gamma_0(x) = [1 - G(x)]^{-1}$ by (2.3), and (R2) is equivalent to $\int \varphi^2 / [1 - G]\, dF < \infty$. This latter assumption is utilized in Yang (1994) to obtain the asymptotic normality of $\int \varphi\, d(\hat F_n - F)$ for continuous lifetime distributions $F$. Note that an additional assumption based on (R3) is required in Yang (1994) owing to the order of the bias of $\int \varphi\, d\hat F_n$. The variance function (2.8) reduces to

$$\sigma^2(\varphi, F, G) = \int_{-\infty}^{\infty} \frac{\left\{ \varphi(x)[1 - F(x)] - \int_x^{\infty} \varphi(t)\, dF(t) \right\}^2}{[1 - H(x)]^2}\, dH_1(x), \tag{2.9}$$

when both $F$ and $G$ are continuous.

We now present the asymptotic normality results and influence curves of $\int \varphi\, d(\hat F_n - F)$, which follow from th. 1 of Stute (1995) and (R1). Note that we adopt Reid's notion of influence curves.

Proposition 2 (Central Limit Theorem)

Under (R1)–(R3),

$$\begin{aligned}
\int \varphi(x)\, d(\hat F_n - F)(x) &= n^{-1} \sum_{i=1}^{n} U_i + o_p(n^{-1/2}) \\
&= \int IC_1(x, \varphi)\, dH_{1n}(x) + \int IC_0(x, \varphi)\, dH_{0n}(x) + o_p(n^{-1/2}),
\end{aligned}$$

where the $U_i$s are i.i.d. copies of the variable $U$, obtained by replacing the $Y$ and $\delta$ in (2.7) by $Y_i$ and $\delta_i$ respectively, and the influence curves are

$$\begin{aligned}
IC_0(x, \varphi) &= \gamma_1(x) - \gamma_2(x) - \int \varphi\, dF, \\
IC_1(x, \varphi) &= \varphi(x) \gamma_0(x) - \gamma_1(x) + IC_0(x, \varphi).
\end{aligned} \tag{2.10}$$

Thus, for $\sigma^2(\varphi, F, G)$ defined in (2.8),

$$n^{1/2} \int \varphi(x)\, d(\hat F_n - F)(x) \to N(0, \sigma^2(\varphi, F, G)) \quad \text{in distribution.} \tag{2.11}$$

If $\varphi$ is differentiable, upon integration by parts and some additional calculations, (2.10) becomes

$$\begin{aligned}
IC_0(x, \varphi) &= -\int [1 - F(t)]\, C(x \wedge t)\, \varphi'(t)\, dt, \\
IC_1(x, \varphi) &= [1 - H(x)]^{-1} \int_x^{\infty} [1 - F(t)]\, \varphi'(t)\, dt + IC_0(x, \varphi),
\end{aligned} \tag{2.12}$$

and for continuous $H$ the asymptotic variance in (2.11) becomes

$$\begin{aligned}
\sigma^2(\varphi, F, G) &= \int_{-\infty}^{\infty} \frac{\left\{ \int_x^{\infty} \varphi'(t)[1 - F(t)]\, dt \right\}^2}{[1 - F(x)]^2 [1 - G(x)]}\, dF(x) \\
&= \int_{-\infty}^{\infty} \frac{\left\{ \int_x^{\infty} \varphi'(t)[1 - F(t)]\, dt \right\}^2}{[1 - H(x)]^2}\, dH_1(x).
\end{aligned} \tag{2.13}$$

The last equality in (2.13) follows from (2.3). A variance estimate can be obtained by replacing $F$, $H_1$ and $H$ respectively by their empirical estimates. Note that the above influence curves in (2.12) and the variance in (2.13) coincide with the influence curves and variance expressions in Reid (1981, formulas (3.3) and (3.5)).

2.2. Notations and assumptions

Next we define some notation for the asymptotic results in section 3. The parameter space $\Theta$ is a subset of $\mathbb{R}^k$ and $\theta_0$ denotes the true parameter of interest.

For any function $\psi(x, \theta)$ from $\mathbb{R} \times \Theta$ to $\mathbb{R}^k$, define

$$\lambda_F(\theta) = \int \psi(x, \theta)\, dF(x), \tag{2.14}$$

to be the target of its empirical counterpart $\lambda_n(\theta)$ defined in (1.2). Let $\psi_j(x, \theta)$ denote the $j$th component of $\psi(x, \theta)$. Replace $\varphi$ by $\psi_j(x, \theta)$ in (2.5) and (2.7) and denote the corresponding $\gamma_i$s and $U$ by $\{\gamma_{ij}, i = 0, 1, 2\}$ and $U(\psi_j)$ respectively. It now follows from proposition 2 and the multivariate central limit theorem that

$$n^{1/2} \int \psi(x, \theta)\, d(\hat F_n - F)(x)$$

converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $C(\psi, \theta, F, G)$, whose $(i, j)$-entry is

$$\begin{aligned}
C_{ij}(\psi, \theta, F, G) &= E(U(\psi_i) U(\psi_j)) \\
&= E\Bigl\{ \Bigl[ \psi_i(Y, \theta) \gamma_{0i}(Y) \delta + \gamma_{1i}(Y)(1 - \delta) - \gamma_{2i}(Y) - \int \psi_i(x, \theta)\, dF(x) \Bigr] \\
&\qquad \cdot \Bigl[ \psi_j(Y, \theta) \gamma_{0j}(Y) \delta + \gamma_{1j}(Y)(1 - \delta) - \gamma_{2j}(Y) - \int \psi_j(x, \theta)\, dF(x) \Bigr] \Bigr\}.
\end{aligned} \tag{2.15}$$

Note that $C_{ii}(\psi, \theta, F, G)$ is equal to $\sigma^2(\psi_i, F, G)$ in (2.8), (2.9) or (2.13).

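Let

$$\frac{\partial}{\partial\theta} \psi(x, \theta) = \left( \frac{\partial}{\partial\theta_j} \psi_i(x, \theta) \right)_{ij}$$

denote the $k \times k$ derivative matrix of $\psi$ with respect to $\theta$ and let $\Lambda_F(t)$ denote the $k \times k$ matrix with

$$\Lambda_F(t) = \int \frac{\partial}{\partial\theta} \psi(x, \theta) \Big|_{\theta = t}\, dF(x). \tag{2.16}$$

There are two forms of asymptotic variance for M-estimators, one in (2.17) below for the one-parameter case with $k = 1$, and one in (2.18) below for the general case with $k \ge 1$:

$$V(\psi, F, G) = C(\psi, \theta_0, F, G)\, [\lambda'_F(\theta_0)]^{-2}, \quad \text{for } k = 1, \tag{2.17}$$

and

$$\Sigma(\psi, F, G) = [\Lambda_F(\theta_0)]^{-1}\, C(\psi, \theta_0, F, G)\, [\Lambda_F(\theta_0)^T]^{-1}, \quad \text{for } k \ge 1, \tag{2.18}$$

where $A^T$ denotes the transpose of a matrix $A$.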
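The sandwich form in (2.18) is simple to evaluate once the two matrices are available. The following sketch uses hypothetical plug-in values for the derivative matrix and the covariance matrix (in practice both would be replaced by estimates); the function name `sandwich_variance` is an assumption made for illustration. It uses linear solves instead of explicit inverses, and reduces to (2.17) when $k = 1$.

```python
import numpy as np

def sandwich_variance(Lambda, C):
    """Return Lambda^{-1} C (Lambda^T)^{-1}, the sandwich form of (2.18)."""
    Lambda = np.atleast_2d(np.asarray(Lambda, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    A = np.linalg.solve(Lambda, C)            # A = Lambda^{-1} C
    # solve(Lambda, A^T) = Lambda^{-1} C^T Lambda^{-T}; transposing gives
    # Lambda^{-1} C Lambda^{-T}.
    return np.linalg.solve(Lambda, A.T).T

# Hypothetical plug-in values for a two-parameter problem.
Lambda_example = np.array([[2.0, 0.0], [1.0, 1.0]])
C_example = np.array([[4.0, 0.0], [0.0, 1.0]])
Sigma = sandwich_variance(Lambda_example, C_example)

# One-parameter case: C / lambda'^2 as in (2.17), here 3 / 2^2.
V = sandwich_variance(2.0, 3.0)[0, 0]
```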

The variance expression in (2.18) requires the differentiability of $\psi$ w.r.t. $\theta$, while (2.17) requires the differentiability of $\lambda_F$ instead. The two variance expressions coincide when $\psi$ is differentiable w.r.t. $\theta$ and integration and differentiation can be interchanged in (2.16).

We now list some of the assumptions on $\psi$ and $\lambda$ for the asymptotic results in the next section. All the statements involving $x$ hold almost everywhere (w.r.t. $P_{\theta_0}$).

Assumptions

(A1). $\theta_0$ is the unique root of $\lambda_F(\theta) = 0$.

(A1*). For $k = 1$, $\lambda_F(\theta)$ changes sign only once at $\theta = \theta_0$, i.e. $\lambda_F(\theta) < 0$ for $\theta < \theta_0$ and $\lambda_F(\theta) > 0$ for $\theta > \theta_0$, or vice versa.

(A2). For $k = 1$, $\lambda_F(\theta)$ is differentiable at $\theta = \theta_0$ with $\lambda'_F(\theta_0) \ne 0$.

(A3). For the case $k = 1$ and for each $z \in \mathbb{R}$, the following CLT holds:

$$\lim_{n \to \infty} \Pr\Bigl\{ [C(\psi, \theta_{z,n}, F, G)]^{-1/2}\, n^{1/2} \int \psi(x, \theta_{z,n})\, d(\hat F_n - F)(x) \le z \Bigr\} = \Phi(z),$$

where $\theta_{z,n} = \theta_0 + n^{-1/2} z\, [V(\psi, F, G)]^{-1/2}$, and $V(\psi, F, G)$ is defined in (2.17).

(A4). The matrix $C(\psi, \theta_0, F, G)$ defined in (2.15) is finite (i.e. each component of it is finite).

Assumptions (A1) and (A1*) correspond to Fisher consistency and only one of them needs to hold. Assumptions (A2) and (A4) are needed for the existence of the variance of the M-estimate. Assumption (A3) is needed to establish the asymptotic normality of an M-estimate with monotone $\psi$-function. Note that (A3) involves a triangular array. When no censoring is present, Huber (1964) showed that the Lindeberg–Feller condition implies that (A3) always holds. The presence of censoring complicates the situation as no Lindeberg type conditions are available for censored data and one has to verify (A3) individually. Some guidelines are given at the end of section 3.1 and several alternative conditions which imply (A3) are listed in corollary 1.

3. Strong consistency and asymptotic normality

In this section sufficient conditions which imply the strong consistency and asymptotic normality of M-estimates will be derived. Almost all the existing conditions for complete observations (Huber, 1964, 1967; Boos & Serfling, 1980; Wang, 1985) can be extended to accommodate censoring via proper adjustment upon replacing the classical SLLN and CLT by propositions 1 and 2. Chapter 7 of Serfling (1980) and sect. 3.2, sect. 5.2 and ch. 6 of Huber (1981) contain clear and self-contained presentations for such a situation. For simplicity of illustration we present only those sufficient conditions that are easily verifiable. We also add some new results (cf. assumption (A5) (iii) and (iv)). With a few exceptions they cover most of the interesting M-estimates. Interested readers can make their own extensions of any existing or future results based on complete data to censored data. To keep the presentation concise we omit overlapping proofs and refer the reader to the aforementioned references. Non-overlapping proofs are given in an appendix.

3.1. Single parameter ($k = 1$) and monotone $\psi$

For a single parameter, Huber (1964) showed that the asymptotic properties of M-estimates can be handled easily for $\psi(x, \theta)$ which are monotone in $\theta$. The conditions required to ensure consistency and asymptotic normality are much weaker than for the case when $\psi$ is not monotone.

The presence of censoring does not cause any difficulty in the extension of strong consistency results. For example, lem. A on p. 249 of Serfling (1980) and prop. 2.1 and coroll. 2.2 on p. 48 of Huber (1981) can be extended by replacing the classical SLLN by proposition 1 above. The corresponding results are summarized in theorem 1 and the proofs omitted. It is assumed in this subsection that $k = 1$.

Theorem 1 (Strong consistency)

Let $\psi(x, \theta)$ be monotone in $\theta$.

(i) Under (A1*) and (R1) (for $\varphi(x) = \psi(x, \theta_0)$) there exists a sequence of M-estimates $\{\hat\theta_n\}$ satisfying (1.3) and any such sequence $\{\hat\theta_n\}$ converges to $\theta_0$ with probability one.

(ii) If (A1) holds in addition, then any solution sequence $\{\hat\theta_n\}$ satisfying (1.2) converges to $\theta_0$ with probability one. Such a solution sequence $\{\hat\theta_n\}$ exists provided $\psi(x, \theta)$ is continuous in $\theta$ in a neighbourhood of $\theta_0$.

Note that (A1) is satisfied for monotone $\psi$ functions if $\theta_0$ is an isolated root of $\lambda_F(\theta) = 0$ and (A1*) is satisfied if $\theta_0$ is an isolated point where $\lambda_F$ changes sign.

Theorem 1 implies that the M-estimates corresponding to definition 2 always exist and are consistent under (A1*) and (R1) regardless of whether $\psi$ is continuous in $\theta$ or not. The continuity assumption on $\psi$ is needed only if one insists on finding the exact root of $\lambda_n(\theta)$ as in definition 1. We show in section 4 that theorem 1 guarantees the strong consistency of many location estimates.

Next we consider asymptotic normality. Here, the presence of censoring complicates the situation, largely due to the lack of Lindeberg–Feller type conditions for triangular arrays of the form $\int \varphi_n\, d\hat F_n$, for an arbitrary sequence of functions $\{\varphi_n\}$ (note that proposition 2 is for a fixed $\varphi$). An additional condition (A3) for the asymptotic normality of triangular arrays needs to be verified case by case until a more general form of proposition 2 is available for a sequence $\{\varphi_n\}$. We show in section 4 that (A3) holds for many of the monotone $\psi$-functions for location and scale parameters.

Theorem 2 (Asymptotic normality)

Let $\psi(x, \theta)$ be monotone in $\theta$. Assume that $C(\psi, \theta, F, G)$ is finite for each $\theta$ in a neighbourhood of $\theta_0$ and is continuous at $\theta = \theta_0$. Under (A1) (or (A1*)), (A2), (A3) and (R1) (for $\varphi(x) = \psi(x, \theta_0)$), any sequence of M-estimates $\{\hat\theta_n\}$ satisfies

$$n^{1/2} (\hat\theta_n - \theta_0) \xrightarrow{D} N(0, V(\psi, F, G)),$$

where $V(\psi, F, G)$ is defined in (2.17). (The strong consistency of $\hat\theta_n$ to $\theta_0$ is implied by theorem 1.)

The direct verification of (A3) can be complicated under random censoring. If (R2) and (R3) hold for $\varphi(x) = \psi(x, \theta_0)$, then (A3) is implied by

(A3*) For each $z$, $\int [\psi(x, \theta_{z,n}) - \psi(x, \theta_0)]\, d(\hat F_n - F)(x) = o_p(n^{-1/2})$.

In practice one can verify (A3*) by noting that it is implied, via integration by parts, by either one of the following conditions (C1) or (C2). Hereafter, for a real-valued function $g$, its total variation or variation norm is defined as

$$\| g \|_v = \sup \sum_{j=1}^{N+1} |g(x_j) - g(x_{j-1})|,$$

where the supremum is taken over all $N$ and over all choices of $\{x_j\}$ such that $-\infty = x_0 < x_1 < \cdots < x_N < x_{N+1} = \infty$.

(C1) $\psi(x, \theta)$ is differentiable in a neighbourhood of $\theta_0$, with $(\partial/\partial\theta)\psi(x, \theta)$ continuous in $x$ and $\lim_{\theta \to \theta_0} \| (\partial/\partial\theta)\psi(\cdot, \theta) \|_v < \infty$.

(C2) $\psi(x, \theta)$ is continuous in $x$ for $\theta$ in a neighbourhood of $\theta_0$. For any $b > 0$,

$$\lim_{n^{1/2} |\theta - \theta_0| < b} n^{1/2}\, \| \psi(\cdot, \theta) - \psi(\cdot, \theta_0) \|_v < \infty.$$

Note that the continuity in $x$ for $\partial\psi/\partial\theta$ and $\psi$ above is just to ensure that proper integration by parts can take place. Many of the location and scale M-estimators in section 4 satisfy (A3*). We summarize the above findings in corollary 1.

Corollary 1

Assume that (R2) and (R3) hold for $\varphi(x) = \psi(x, \theta_0)$. Then theorem 2 remains true when (A3) is replaced by any one of (A3*), (C1) or (C2).
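The variation norm appearing in (C1) and (C2) can be approximated numerically by summing absolute increments over a fine grid, which gives a lower bound on the supremum. The sketch below does this for Huber's location function $\psi(x, \theta) = \max(-k, \min(k, x - \theta))$, which is used here only as an assumed illustration; the function names, the grid and the constants are likewise assumptions.

```python
import numpy as np

def total_variation(g, grid):
    """Grid approximation (a lower bound) to the variation norm of g."""
    values = g(np.asarray(grid, dtype=float))
    return float(np.sum(np.abs(np.diff(values))))

def psi_huber(x, theta, k=1.5):
    """Huber's location psi, clipped at -k and k (assumed example)."""
    return np.clip(x - theta, -k, k)

grid = np.linspace(-10.0, 10.0, 200001)
theta0, h = 0.0, 0.01

# Variation of psi itself: it rises monotonically from -k to k.
tv_psi = total_variation(lambda x: psi_huber(x, theta0), grid)

# Variation of the difference psi(., theta0 + h) - psi(., theta0).
tv_diff = total_variation(
    lambda x: psi_huber(x, theta0 + h) - psi_huber(x, theta0), grid
)
```

In this example the variation of the difference equals twice the shift, so multiplying it by $n^{1/2}$ keeps it below $2b$ whenever $n^{1/2} |\theta - \theta_0| < b$, which is the kind of bound that (C2) asks for.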

3.2. Multiparameter ($k \ge 1$) case and general $\psi$

When $\psi$ is not monotone or in the multiparameter case with $k > 1$, we need stronger conditions than those in section 3.1.

Theorem 3 (Strong consistency)

Let $\psi(x, \theta)$ be continuous in $\theta$ and bounded. Assume that (R1) holds for $\varphi(x) = \psi_j(x, \theta_0)$, $1 \le j \le k$.

(i) Under (A1) there exists a sequence of M-estimates $\{\theta_n\}$ which satisfies (1.2) and converges to $\theta_0$ with probability one.

(ii) If in addition, there exists a compact set $K$ in $R^k$ such that

$$\inf_{\theta \notin K} \left| \int \psi_j(x, \theta) \, dF(x) \right| > 0, \quad \text{for } 1 \le j \le k,$$

then any sequence of M-estimators $\{\theta_n\}$, satisfying (1.2), converges to $\theta_0$ with probability one.

The asymptotic normality requires smoothness or variational restrictions on $\psi$ or $(\partial/\partial\theta)\psi(x, \theta)$. We present two types of conditions in theorems 4 and 5. Theorem 4, which extends th. 2.2 of Boos & Serfling (1980), applies to the one-parameter ($k = 1$) case only, while theorem 5 is applicable to the multiparameter case ($k \ge 1$). Note that the asymptotic variance in theorem 5 is different in appearance from that of theorems 2 and 4. For the single parameter case ($k = 1$), the variance expression (2.17) is less restrictive as it allows a non-differentiable $\psi$-function, while (2.18) requires $\psi$ to be differentiable w.r.t. $\theta$. In case $\psi$ is differentiable w.r.t. $\theta$, the two expressions (2.17) and (2.18) usually coincide.

Theorem 4 (Asymptotic normality, $k = 1$).

Assume that $\psi(x, \theta): R \times \Theta \to R$ is continuous in $x$ and satisfies

$$\lim_{\theta \to \theta_0} \|\psi(\cdot, \theta) - \psi(\cdot, \theta_0)\|_v = 0.$$

Also assume that (R2) and (R3) hold for $\varphi(x) = \psi(x, \theta_0)$. Under (A1), (A2) and (A4) any sequence of M-estimators $\{\theta_n\}$ satisfying both (1.2) and $\theta_n \xrightarrow{P} \theta_0$ is asymptotically normal with

$$n^{1/2}(\theta_n - \theta_0) \xrightarrow{D} N(0, V(\psi, F, G)).$$

Another approach to establish asymptotic normality of M-estimates for non-monotone $\psi$-functions or the multi-parameter situation ($k \ge 1$) is to utilize the usual approach for maximum likelihood estimates by imposing differentiability restrictions on $\psi$. This approach, which originated in Cramér (1946), has been studied extensively for complete data. We extend this approach in theorem 5 to censored data. First, we present a lemma which may be of independent interest. The proof of lemma 1 is given in the appendix.

Lemma 1

Let $g(x, \theta)$ be any real function with $\int |g(x, \theta_0)| \, dF(x) < \infty$. Assume that (R1) holds with $\varphi(x) = g(x, \theta_0)$. For any sequence $\theta_n \xrightarrow{P} \theta_0$, it follows that

$$\int g(x, \theta_n) \, dF_n(x) \xrightarrow{P} \int g(x, \theta_0) \, dF(x),$$

provided that any one of the following conditions holds:

(A5) (i) $g(x, \theta)$ is continuous at $\theta_0$ uniformly in $x$.

(ii) $\int \sup_{\{\theta: |\theta - \theta_0| \le \delta\}} |g(x, \theta) - g(x, \theta_0)| \, dF(x) = h_\delta \to 0$ as $\delta \to 0$.

(iii) $g$ is continuous in $x$ for $\theta$ in a neighbourhood of $\theta_0$, and $\lim_{\theta \to \theta_0} \|g(\cdot, \theta) - g(\cdot, \theta_0)\|_v = 0$.

(iv) $\int g(x, \theta) \, dF(x)$ is continuous at $\theta = \theta_0$, and $g$ is continuous in $x$ for $\theta$ in a neighbourhood of $\theta_0$, and $\lim_{\theta \to \theta_0} \|g(\cdot, \theta) - g(\cdot, \theta_0)\|_v < \infty$.

(v) $\int g(x, \theta) \, dF(x)$ is continuous at $\theta = \theta_0$, and $\int g(x, \theta) \, dF_n(x) \xrightarrow{P} \int g(x, \theta) \, dF(x) < \infty$, uniformly for $\theta$ in a neighbourhood of $\theta_0$.

Theorem 5 (Asymptotic normality, k > 1)

Let $\psi$ be differentiable in $\theta$ for $\theta$ in a neighbourhood of $\theta_0$, and let $\Lambda_F(\theta_0)$ defined in (2.16) be a finite and non-singular $k \times k$ matrix. Assume that the assumptions of lemma 1 hold for $g(x, \theta) = (\partial/\partial\theta_j)\psi_i(x, \theta)$, $1 \le i, j \le k$, and that (R2) and (R3) hold for $\varphi(x) = \psi_j(x, \theta_0)$, $1 \le j \le k$. Under (A1) and (A4), any sequence of M-estimates $\{\theta_n\}$ satisfying (1.2) and $\theta_n \xrightarrow{P} \theta_0$ is asymptotically normal with

$$n^{1/2}(\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma(\psi, F, G)),$$

where $\Sigma(\psi, F, G)$ is defined in (2.18).

Note that only one of the conditions in (A5) (i)–(v) needs to hold for

$$g(x, \theta) = \frac{\partial}{\partial\theta_j} \psi_i(x, \theta),$$

and the condition can vary for different $i$ and $j$. When $k = 1$ and (A5) (i) holds, theorem 5 is the censored version of th. B on p. 253 of Serfling (1980). Assumption (A5) (ii) originates in LeCam (1956) and relaxes Cramér's condition, which further requires $\psi$ to be twice differentiable. Note that (A5) (ii) is satisfied if $\psi(x, \theta)$ is twice continuously differentiable at $\theta = \theta_0$ for each $x$, and if for each $(i, j)$ there exists a function $H_{ij}(\cdot)$ such that

$$\left| \frac{\partial}{\partial\theta_j} \psi_i(x, \theta) \right| < H_{ij}(x) \quad \text{and} \quad \int H_{ij} \, dF < \infty.$$

The uniform convergence condition (A5) (v) is based on th. 4.3.8 of Wilks (1962, p. 105). This condition can be verified following the work of Stute (1976) and will not be elaborated here. We only note that such a uniform convergence condition needs to be confirmed even in the complete data case. This was overlooked in Wilks (1962, th. (12.3.3)), which was subsequently employed in Saunders & Myhre (1984).

The conditions (A5) (iii), (iv), which were implicit in earlier work on M-estimates for i.i.d. observations, replaced the variation norm assumption in theorem 4. It should be noted that (A5) (iii) or (A5) (iv), though imposed on the derivative of $\psi$, do not imply the assumptions on $\psi$ in theorem 4. In many cases, $(\partial/\partial\theta_j)\psi_i(\cdot, \theta)$ is of bounded variation for $\theta$ in a neighbourhood of $\theta_0$. If so, the condition $\|g(\cdot, \theta) - g(\cdot, \theta_0)\|_v = O(1)$ in (A5) (iv) is satisfied for $g = (\partial\psi_i/\partial\theta_j)$. Assumption (A5) (iii) trades the continuity of $\int g(x, \theta) \, dF(x)$ at $\theta = \theta_0$ in (A5) (iv) for the more stringent condition that $\|g(\cdot, \theta) - g(\cdot, \theta_0)\|_v = o(1)$.

3.3. Influence curve

In the course of deriving theorems 4 and 5, we have also obtained the influence curves of an M-estimate. We illustrate this via theorem 5.

It follows from the proof of theorem 5 and proposition 2 that

$$\theta_n - \theta_0 = -[\Lambda_F(\theta_0)]^{-1} \int \psi(x, \theta_0) \, d(F_n - F)(x) + o_p(n^{-1/2}) = -n^{-1} [\Lambda_F(\theta_0)]^{-1} \sum_{i=1}^{n} U_i + o_p(n^{-1/2}), \qquad (3.1)$$

where

$$U_i = \psi(Y_i, \theta_0)\gamma_0(Y_i)\delta_i + \gamma_1(Y_i)(1 - \delta_i) - \gamma_2(Y_i),$$

and $\gamma_1$, $\gamma_2$ are vectors in $R^k$ defined by (2.5) with $\varphi(\cdot) = \psi(\cdot, \theta_0)$. Thus the vectors of influence curves for the corresponding M-estimator are:

$$IC_1(t, \psi) = -[\Lambda_F(\theta_0)]^{-1} [\psi(t, \theta_0)\gamma_0(t) - \gamma_2(t)],$$

$$IC_0(t, \psi) = -[\Lambda_F(\theta_0)]^{-1} [\gamma_1(t) - \gamma_2(t)]. \qquad (3.2)$$

If $\psi$ is not differentiable but $\lambda_F$ is, then under the assumption of theorem 4, equation (3.1) holds with $\Lambda_F(\theta_0)$ replaced by $\lambda'_F(\theta_0)$. The influence curve is thus obtained via (3.2) by replacing $\Lambda_F(\theta_0)$ by $\lambda'_F(\theta_0)$.

4. Applications and examples

For non-censored data, several types of M-estimates have been studied extensively. We illustrate the corresponding M-estimates based on censored data in this section. The target parameters are the solution $\theta_0$ of the equation $\lambda_F(\theta) = 0$.

Before we proceed, a cautionary remark is in order. In the literature of robust estimation, the underlying distribution function $F$ is often assumed to be symmetric about a location parameter, and the corresponding $\psi$-function for an M-estimator is often chosen to be an odd function for a targeted location parameter, and an even function for a targeted scale parameter. Such choices of $\psi$-functions often lead to M-estimators with some optimal properties (cf. Huber, 1981; Hampel et al., 1986) and are thus desirable. However, those optimal choices of $\psi$-functions are often not suitable for censored lifetime data, as lifetime variables, in addition to being non-negative, are often skewed or do not belong to a location-scale family. One alternative is to consider, instead of the original lifetime variable $T$, a transformed lifetime variable (such as $\log T$) which is symmetric or belongs to a location-scale family so that standard robust methods are applicable. When such a transformation approach is not suitable one may then prefer to use a $\psi$-function that is neither odd nor even.

For example, it is well known that the Kaplan–Meier estimate $F_n(x)$ is unstable for large values of $x$. Thus, one may prefer to give those values less influence by choosing a $\psi$-function that assigns smaller values to large $x$ (cf. example 4.2 (b) and example 4.3 (a)). Thus, the choices of the $\psi$-function may differ from the classical choices for complete data. Other issues of concern for M-estimators based on censored data are discussed in section 5.


Example 4.1 (Approximate MLE). If the lifetime distribution $F$ is known to belong to a parametric family with cumulative distribution function $F(x, \theta)$ and density function $f(x, \theta)$, the MLE based on a complete (i.i.d.) sample $X_1, \ldots, X_n$ corresponds to $\psi(x, \theta) = \psi_1(x, \theta) = (\partial/\partial\theta) \log f(x, \theta)$, the score function. If this $\psi$-function is employed in definition 1 or definition 2 for a censored sample $(Y_i, \delta_i)$, $i = 1, \ldots, n$, the corresponding M-estimate, which was suggested in Oakes (1986), is no longer the MLE, which corresponds to solving

$$\sum_{i=1}^{n} \left\{ \delta_i \frac{\partial}{\partial\theta} \log f(Y_i, \theta) + (1 - \delta_i) \frac{\partial}{\partial\theta} \log(1 - F(Y_i, \theta)) \right\} = 0.$$

Oakes (1986) termed this M-estimate the "approximate MLE" since it mimics the actual MLE. Note that for a location and scale family with densities $\sigma^{-1} f((x - \mu)/\sigma)$, the corresponding approximate MLEs with $\psi(x) = -(d/dx) \log f(x)$ are solutions of the equations

$$\int \psi\left(\frac{t - \mu}{\sigma}\right) dF_n(t) = 0, \qquad \int \left[ \psi\left(\frac{t - \mu}{\sigma}\right) \frac{t - \mu}{\sigma} - 1 \right] dF_n(t) = 0.$$
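To make the distinction between the approximate MLE and the actual MLE concrete, the following Python sketch works through the exponential family $f(x, \theta) = \theta e^{-\theta x}$, an example chosen here for illustration and not taken from the paper. The score is $1/\theta - x$, so the approximate MLE solves $\int (1/\theta - x) \, dF_n(x) = 0$ with $F_n$ the Kaplan–Meier estimate, while the censored-data MLE has the closed form $\sum \delta_i / \sum Y_i$. The helper `km_weights` (product-limit jump sizes, with uncensored observations ordered before censored ones at ties) and the toy sample are assumptions of this sketch.

```python
def km_weights(y, delta):
    """Kaplan-Meier jump sizes attached to the ordered observations.

    Ties are ordered with uncensored observations (delta = 1) first.
    Returns the sorted observations and the mass the estimate puts on each.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    surv = 1.0  # Kaplan-Meier survival just before the current observation
    ys, ws = [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        ys.append(y[i])
        ws.append(surv * delta[i] / at_risk)
        surv *= 1.0 - delta[i] / at_risk
    return ys, ws


def approximate_mle_exponential(y, delta):
    """Root of the integral of (1/theta - x) against the Kaplan-Meier estimate."""
    ys, ws = km_weights(y, delta)
    return sum(ws) / sum(w * t for t, w in zip(ys, ws))


def censored_mle_exponential(y, delta):
    """Actual MLE under right censoring: number of failures over total time."""
    return sum(delta) / sum(y)


# Toy sample: the second observation is censored.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
theta_approx = approximate_mle_exponential(y, delta)
theta_mle = censored_mle_exponential(y, delta)
```

On this toy sample the two estimates differ, which reflects the statement above that the M-estimate based on the score function is no longer the MLE once censoring is present.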

Example 4.2 (Location estimates). If the parameter of interest is a location parameter $\theta$, the location M-estimate corresponds to $\psi(x, \theta) = \psi(x - \theta)$, where the function $\psi$ is often chosen to be odd when the underlying distribution $F$ is symmetric. Note that since the SLLN and CLT currently available for censored data are restricted to real-valued random variables, the more general M-estimators for vector lifetimes based on a multivariate version of Kaplan–Meier estimates require further investigation. We study a few one-dimensional location M-estimates here.

(a) $\psi_{2a}(x) = x$ yields the sample mean for a complete sample $X_1, \ldots, X_n$, and the Kaplan–Meier mean for a censored sample $(Y_i, \delta_i)$, $i = 1, \ldots, n$.

(b) For some constants $h, k > 0$,

$$\psi_{2b}(x) = \begin{cases} -h, & x < -h, \\ x, & -h \le x \le k, \\ k, & x > k, \end{cases}$$

yields a Winsorized mean or Winsorized Kaplan–Meier mean. When $h = k$ this is often referred to as Huber's $\psi$-function, which yields an efficient (or most robust) estimate for complete data in the sense of minimaxing the asymptotic variance (Huber, 1964). In the presence of censoring, the efficiency of the corresponding M-estimate remains to be confirmed. One may prefer to choose different values for $h$ and $k$, e.g. $h > k$, to further de-emphasize the influence of large observations, owing to the difficulty of estimating the upper tail of right-censored lifetimes. Also, since small values of lifetimes are less likely to be extreme for lifetime variables, one may even choose $h = \infty$, which yields a one-sided Winsorized mean for censored lifetimes.

(c) For some constants $h, k > 0$,

$$\psi_{2c}(x) = \begin{cases} x, & -h \le x \le k, \\ 0, & x < -h \text{ or } x > k, \end{cases}$$

yields a trimmed type of mean which has low gross-error sensitivity. Unlike the Winsorized mean which only restricts the influence of the outliers (and thus has infinite gross-error sensitivity), a trimmed mean eliminates the influence of the outliers completely. The choices of $h$ and $k$ can be similar to those in part (b). The choice of $h = \infty$ corresponds to a truncated mean which is often encountered in the literature of censored lifetime data. Note that this $\psi$-function is non-regular and requires particular care when dealing with the corresponding asymptotic results. Hampel (1974) suggests a compromise between the Winsorized mean and trimmed mean through the use of a redescending M-estimate. An extension of his estimate is given below in part (d).

(d) For some positive constants $h < k < m$,

$$\psi_{2d}(x) = \begin{cases} x, & 0 \le x \le h, \\ h, & h < x < k, \\ h \dfrac{m - x}{m - k}, & k \le x \le m, \\ 0, & x > m. \end{cases}$$

Hampel (1974) used an odd $\psi$-function, i.e. $\psi_{2d}(x) = -\psi_{2d}(-x)$ for $x < 0$, which for symmetric distributions yields a relatively efficient M-estimate with low gross-error sensitivity for complete data. If the life distributions are not symmetric, one may prefer to use a different but similarly shaped redescending $\psi$-function for $x < 0$. Several smooth versions of this $\psi$-function have been proposed, including Tukey's biweight function. We focus on $\psi_{2d}$ above as it is the least regular of all such redescending M-estimates. See Hampel et al. (1986, sect. 2.6) for more examples of redescending M-estimates.

(e) For $0 < p < 1$,

$$\psi_{2e}(x) = \begin{cases} -1, & x < 0, \\ 0, & x = 0, \\ p/(1 - p), & x > 0, \end{cases}$$

yields the sample $p$-quantile estimate for a complete sample and the Kaplan–Meier $p$-quantile estimate for a censored sample.
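The monotone cases (a), (b) and (e) can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: the helper `km_weights` (Kaplan–Meier jump sizes), the bisection solver `solve_location`, its bracketing interval and the toy sample are all choices made here, not constructions from the paper. The solver looks for the sign change of $\lambda_n(\theta) = \int \psi(x - \theta) \, dF_n(x)$, which is non-increasing in $\theta$ when $\psi$ is non-decreasing.

```python
def km_weights(y, delta):
    """Kaplan-Meier jump sizes; uncensored observations come first at ties."""
    n = len(y)
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    surv, ys, ws = 1.0, [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        ys.append(y[i])
        ws.append(surv * delta[i] / at_risk)
        surv *= 1.0 - delta[i] / at_risk
    return ys, ws


def psi_2a(x):
    return x


def psi_2b(h, k):
    """Winsorizing function with lower cut -h and upper cut k."""
    return lambda x: max(-h, min(k, x))


def psi_2e(p):
    """Quantile function: -1 below zero, 0 at zero, p / (1 - p) above zero."""
    return lambda x: -1.0 if x < 0 else (p / (1.0 - p) if x > 0 else 0.0)


def solve_location(psi, y, delta, tol=1e-10):
    """Bisection for the sign change of lambda_n(theta).

    Assumes psi is non-decreasing, positive for positive arguments and
    negative for negative arguments, so the bracket below is valid.
    """
    ys, ws = km_weights(y, delta)

    def lam(theta):
        return sum(w * psi(t - theta) for t, w in zip(ys, ws))

    lo, hi = min(ys) - 1.0, max(ys) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy sample: the second observation is censored.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
km_mean = solve_location(psi_2a, y, delta)
winsorized = solve_location(psi_2b(0.5, 0.5), y, delta)
km_median = solve_location(psi_2e(0.5), y, delta)
```

For the discontinuous function in (e) an exact root need not exist, so the sketch returns the point at which $\lambda_n$ changes sign, which is the usual convention for quantiles.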

Example 4.3 (Scale estimates). A scale M-estimate corresponds to $\psi(x, \theta) = \psi(x/\theta)$. The function $\psi$ is often chosen to be even ($\psi(-x) = \psi(x)$), but for lifetime distributions this may not be the case. Examples of several scale M-estimates are given here.

(a) For some constants $h, k > 0$,

$$\psi_{3a}(x) = \begin{cases} h^2 - \beta, & x < -h, \\ x^2 - \beta, & -h \le x \le k, \\ k^2 - \beta, & x > k, \end{cases}$$

yields a Winsorized variance estimate, where $\beta$ is a constant determined by $\int \psi_{3a} \, d\Phi(x) = 0$ and $\Phi$ is the standard normal c.d.f. This variance estimate was proposed by Huber (1964) with the choice $h = k$. One may choose different values for $h$ and $k$ to extend Huber's Winsorized variance estimate to a censored sample.

(b) The median absolute deviation (MAD) from zero for a complete sample corresponds to a $\psi$-function of the form

$$\psi_{3b}(x) = \mathrm{sign}(|x| - 1).$$

Its target parameter is the median of the absolute lifetime $|X|$. The corresponding M-estimate is a censored version of the MAD. This $\psi$-function, like $\psi_{2c}$ in example 4.2, is non-regular.
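The centring constant $\beta$ in part (a) has a closed form, since $\int z^2 \, d\Phi(z)$ over an interval can be written with $\Phi$ and the standard normal density. The Python sketch below evaluates it; the function names are hypothetical and the formula is derived here from the defining condition $\int \psi_{3a} \, d\Phi = 0$ rather than quoted from the paper.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def winsorized_variance_beta(h, k):
    """Constant beta making psi_3a integrate to zero under the standard normal.

    beta = E[min(Z, k)^2 for Z >= 0, min(-Z, h)^2 for Z < 0], i.e. the mean
    of Z^2 after Z has been clipped to the interval [-h, k].
    """
    # Integral of z^2 phi(z) over [-h, k], using d/dz [Phi(z) - z phi(z)] = z^2 phi(z).
    middle = (std_normal_cdf(k) - k * std_normal_pdf(k)) - (
        std_normal_cdf(-h) + h * std_normal_pdf(h)
    )
    # Contributions of the two clipped tails.
    tails = h * h * std_normal_cdf(-h) + k * k * (1.0 - std_normal_cdf(k))
    return middle + tails


beta_symmetric = winsorized_variance_beta(1.5, 1.5)
beta_one_sided = winsorized_variance_beta(10.0, 1.5)
```

As $h$ and $k$ grow the constant approaches 1, the variance of the standard normal distribution, so the untruncated case recovers the usual second-moment scale equation.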


Example 4.4 (Simultaneous estimation for multiparameters). Lifetime variables $X$ are often modelled by several parameters, e.g. scale and shape parameters. In such situations, one can either locate the M-estimator directly for these parameters, or consider a transformed lifetime, such as $\log X$ for Weibull or log-normal lifetime $X$, so that the resulting transformed lifetime belongs to a location and scale family. We now focus on a location and scale family with densities $\sigma^{-1} f((x - \mu)/\sigma)$ and $\theta = (\mu, \sigma)$. The approximate MLE in example 4.1 can be extended to simultaneous M-estimates of location and scale, which are solutions of the two equations:

$$\int \psi_{4a}\left(\frac{t - \mu}{\sigma}\right) dF_n(t) = 0, \qquad \int \psi_{4b}\left(\frac{t - \mu}{\sigma}\right) dF_n(t) = 0,$$

where $\psi_{4a}$ and $\psi_{4b}$ are chosen arbitrarily but properly. For a symmetric lifetime distribution $F$, often $\psi_{4a}$ will be an odd and $\psi_{4b}$ an even function. However, the distributions of the lifetimes or transformed lifetimes (such as the log of a Weibull lifetime) may not be symmetric. Thus many of the results for odd or even $\psi$-functions in the non-censored situation are not applicable here. For example, for non-symmetric (transformed) lifetime variables, the M-estimates for location with preliminary estimates of scale have tractable but complicated influence curve structure and asymptotic variance expressions. We skip the details here. Interested readers are referred to pp. 140–141 of Huber (1981).

The M-estimates corresponding to the $\psi$-functions in examples 4.1 and 4.4 are strongly consistent whenever $\psi_1$ is monotone or whenever $\psi_1$, $\psi_{4a}$ and $\psi_{4b}$ are bounded, continuous and satisfy the assumptions in theorem 3. As for the asymptotic normality, one needs to verify the assumptions in theorems 2, 4 or 5 depending on the particular form of the $\psi$-function. In particular, theorems 1 to 5 establish the strong consistency and asymptotic normality of the approximate MLE for the 2-parameter (scale and shape parameters) Gamma and Weibull families, and the log-normal distributions. The rest of this section deals with the specific $\psi$-functions in examples 4.2 and 4.3.

We shall assume that the requirements (R1)–(R3) for the SLLN and CLT in propositions 1 and 2 are satisfied by the life and censoring distributions $F$ and $G$ and the corresponding $\varphi$-functions involved. We shall also assume that assumptions (A1), (A1′), (A4) and the assumption in theorem 3 are all satisfied whenever needed. We thus focus on verifying the other assumptions that are needed for the strong consistency and asymptotic normality results.

First, we remind the reader again that, if the $\psi$-function involved is neither monotone nor continuous, more refined arguments than those based on theorems 1–5 are needed. For example, Huber's (1967) results for non-regular situations can be extended to censored data by properly replacing the classical SLLN and CLT for non-censored data by propositions 1 and 2. This is applicable, for example, to the non-regular case of $\psi_{2c}$ and $\psi_{3b}$, to obtain the strong consistency and asymptotic normality of the trimmed Kaplan–Meier mean and the censored MAD.

We now restrict our attention to the remaining regular cases in example 4.2 and example 4.3. Consider a monotone $\psi$-function first, which includes $\psi_{2a}$, $\psi_{2b}$ and $\psi_{2e}$. Theorem 1 implies the strong consistency of the corresponding M-estimates to their respective target parameters. The strong consistency of the M-estimates corresponding to $\psi_{2d}$ and $\psi_{3a}$ follows from theorem 3 since both $\psi$-functions are bounded and continuous.

As for asymptotic normality we start with $\psi_{2e}$. Since it is monotone but non-continuous, only theorem 2 or corollary 1 is applicable. As mentioned earlier, assumption (A3) can often be confirmed via (A3′). In this case (A3′) holds since $\theta_{z,n} - \theta_0 = O(n^{-1/2})$ and $\sup_x |F_n(x) - F(x)| = o_p(1)$ by formula (2.2) in proposition 1. For the same reason, (A3′) is satisfied by $\psi_{2a}$ and $\psi_{2b}$ or any monotone step function. In addition, conditions (C1) and (C2) are both satisfied by $\psi_{2a}$. Other than theorem 2, the asymptotic normality of the Kaplan–Meier mean, i.e. the M-estimate corresponding to $\psi_{2a}$, can also be derived via theorem 4 or theorem 5, since assumptions (A2), (A5) (i)–(v) are all satisfied. Also note that $\|\psi_{2a}(\cdot, \theta) - \psi_{2a}(\cdot, \theta_0)\|_v = 0$ and $\Lambda_F(\theta_0) = 1$. It now follows from theorem 5 and (2.13) that for continuous $H$ the asymptotic variance (2.18) for the Kaplan–Meier mean, $\int_{-\infty}^{\infty} x \, dF_n(x)$, is equal to

$$\int_{-\infty}^{\infty} [1 - H(x)]^{-2} \left\{ \int_x^{\infty} [1 - F(t)] \, dt \right\}^2 dH_1(x). \qquad (4.1)$$

A variance estimate can be obtained for (4.1) by replacing $H(x)$ by

$$H_n(x) = n^{-1} \sum_{j=1}^{n} I(Y_j \le x),$$

$F$ by the Kaplan–Meier estimate $F_n$, and $H_1$ by $H_{1n}$ in (2.4).

Assumption (A2) is satisfied for $\psi_{2b}$, $\psi_{2d}$ and $\psi_{3a}$ if the lifetime distribution $F$ is continuous. Now check that $\|\psi_{2b}(\cdot, \theta) - \psi_{2b}(\cdot, \theta_0)\|_v$ and $\|\psi_{2d}(\cdot, \theta) - \psi_{2d}(\cdot, \theta_0)\|_v$ are both of the order $O(|\theta - \theta_0|)$. Thus theorem 4 is also applicable to $\psi_{2b}$ and $\psi_{2d}$. As for $\psi_{3a}$, we have $\|\psi_{3a}(\cdot, \theta) - \psi_{3a}(\cdot, \theta_0)\|_v = O(1 - (\theta/\theta_0)^2)$, which also tends to zero as $\theta$ tends to $\theta_0$. Thus theorem 4 implies the asymptotic normality of the respective location and scale M-estimates corresponding to $\psi_{2b}$, $\psi_{2d}$ and $\psi_{3a}$. Note that theorem 5 is not applicable to any of $\psi_{2b}$, $\psi_{2d}$ and $\psi_{3a}$ due to non-differentiability. The asymptotic variance of the Winsorized Kaplan–Meier mean corresponding to $\psi_{2b}$ can be obtained from theorem 4 and (2.17) with $\lambda'_F(\theta_0) = F(\theta_0 - h) - F(\theta_0 + k)$, and

$$C(\psi, \theta_0, F, G) = h^2 \int_{-\infty}^{\theta_0 - h} \frac{dH_1(x)}{[1 - H(x)]^2} + \int_{\theta_0 - h}^{\theta_0 + k} \left\{ \int_x^{\theta_0 + k} [1 - F(t)] \, dt \right\}^2 \frac{dH_1(x)}{[1 - H(x)]^2}, \qquad (4.2)$$

where $\theta_0$ is the Winsorized mean of $F$ with $\int \psi_{2b}(x - \theta_0) \, dF(x) = 0$.

Theorem 2 and (2.17) also yield the asymptotic variance of the Kaplan–Meier $p$-quantile corresponding to $\psi_{2e}$. Here, $\lambda'_F(\theta_0) = F'(\theta_0)/(1 - p)$, where $F(\theta_0) = p$, and the asymptotic variance is

$$\frac{(1 - p)^2}{[F'(\theta_0)]^2} \int_{-\infty}^{\theta_0} \frac{dH_1(x)}{[1 - H(x)]^2}. \qquad (4.3)$$

Again, variance estimates can be obtained by replacing the unknown $H_1$, $H$ and $F$ by their empirical counterparts in (4.2) and (4.3). The variance estimate for the quantile estimate is more complicated as (4.3) requires density estimation at the $p$-quantile of $F$.

5. Further results and discussion

Since theorem 3 and theorem 5 cover the multiparameter situation, they can be applied to simultaneous M-estimates for location and scale parameters or to estimate the multiparameter $\theta$ in a parametric family of distributions $F_\theta$. The computation of the solutions for equation (1.3) in the multiparameter situation is more subtle than in the one-parameter case. Typically a closed form solution for equation (1.3) does not exist for censored data and some iterative procedures for non-linear equations are called for. If the initial estimate $\tilde\theta_n$ is some $n^{1/2}$-consistent estimate of $\theta_0$ (i.e. $n^{1/2}(\tilde\theta_n - \theta_0) = O_p(1)$), then, as in the non-censored case, one-step iteration suffices and the resulting one-step M-estimate has the same limiting distribution as the fully iterated M-estimate. This result is summarized in the next theorem.

Theorem 6

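Let $\tilde\theta_n$ be an $n^{1/2}$-consistent estimate of $\theta_0$, i.e.

$$n^{1/2}(\tilde\theta_n - \theta_0) = O_p(1), \qquad (5.1)$$

and let $\theta_n^{(1)}$ be its one-step M-estimate defined by

$$\theta_n^{(1)} = \tilde\theta_n - [\Lambda_{F_n}(\tilde\theta_n)]^{-1} \lambda_n(\tilde\theta_n). \qquad (5.2)$$

Under the assumption of theorem 5,

$$n^{1/2}(\theta_n^{(1)} - \theta_0) \xrightarrow{D} N(0, \Sigma(\psi, F, G)).$$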
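The one-step update (5.2) is a single Newton step on the estimating equation. The Python sketch below illustrates it for a scalar location parameter with $\psi(x, \theta) = \psi(x - \theta)$, so that $\Lambda_{F_n}(\theta) = -\int \psi'(x - \theta) \, dF_n(x)$. The helper names, the toy sample and the choice of $\tanh$ as a smooth $\psi$ are assumptions made for illustration; whether a given starting value is $n^{1/2}$-consistent has to be checked separately.

```python
import math


def km_weights(y, delta):
    """Kaplan-Meier jump sizes; uncensored observations come first at ties."""
    n = len(y)
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    surv, ys, ws = 1.0, [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        ys.append(y[i])
        ws.append(surv * delta[i] / at_risk)
        surv *= 1.0 - delta[i] / at_risk
    return ys, ws


def one_step_location(psi, dpsi, y, delta, theta_init):
    """One Newton step (5.2) for a location M-estimate with smooth psi.

    psi  : function of the residual x - theta
    dpsi : derivative of psi with respect to its argument
    """
    ys, ws = km_weights(y, delta)
    lam = sum(w * psi(t - theta_init) for t, w in zip(ys, ws))
    # Derivative of lambda_n in theta is minus the weighted sum of psi'.
    big_lam = -sum(w * dpsi(t - theta_init) for t, w in zip(ys, ws))
    return theta_init - lam / big_lam


# Toy sample: the second observation is censored.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]

# With psi(x) = x a single step from any start lands on the Kaplan-Meier mean.
theta_linear = one_step_location(lambda x: x, lambda x: 1.0, y, delta, 0.0)

# A smooth bounded psi, started from the Kaplan-Meier median of this sample.
theta_tanh = one_step_location(
    math.tanh, lambda x: 1.0 - math.tanh(x) ** 2, y, delta, 3.0
)
```

For a multiparameter $\theta$ the same step applies with $\Lambda_{F_n}$ a $k \times k$ matrix and the division replaced by a linear solve.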

One of the main motivations for studying M-estimators is to provide a general analytical framework for a large sample theory for a wide class of estimators based on estimating equations. Another is to achieve various robustness properties through proper choice of the $\psi$-function in (1.2) for an M-estimator. This paper accomplishes the first goal by extending theories for complete data to censored data, but leaves open most of the robustness issues. For example, classical choices of $\psi$-functions were geared towards symmetric distributions and the optimal choices of $\psi$-function for such distributions. Although there are examples in economics or environmental sciences (for detection limits) where standard, e.g. normal, random variables are censored, most of the applications for censored data are for lifetime variables which are often asymmetric. Some discussion on the choice of a $\psi$-function for censored lifetimes is given in section 4. While it may be difficult to retain optimality of a classical M-estimator for censored lifetimes, it is possible to achieve certain robustness features of M-estimators. Such robustness issues are beyond the scope of this paper. The asymptotic results and influence curves derived in this paper will facilitate such further analysis on M-estimators.

A final remark is on the two different extensions of M-estimators to censored data. This paper concentrates on Reid's extension of Huber's M-estimators. Another type of M-estimator, which is mentioned in section 1, is based on the cumulative hazard function. It was suggested by Hjort (1985) and further discussed and elaborated in Hjort (1992, sect. 5) and Andersen et al. (1993, sect. VI.2). The relation and a comparison between these two extensions are as follows.

Consider a parametric model with distribution function $F(x, \theta)$ and cumulative hazard function $A(x, \theta)$. Let $A_n$ denote the Nelson–Aalen cumulative hazard estimator, related to the Kaplan–Meier estimate $F_n$ by $dF_n(x) = (1 - F_n(x-)) \, dA_n(x)$. The estimating equation (1.3) may be rewritten as

$$\lambda_n(\theta) = \int \psi(x, \theta) [1 - F_n(x-)] \, dA_n(x) = 0. \qquad (5.3)$$

Hjort's extension of M-estimators solves, instead of (5.3), the following

$$\lambda_n(\theta) - \int \psi(x, \theta) [1 - F_n(x-)] J(x) \, dA(x, \theta) = 0, \qquad (5.4)$$

where $J(x) = 1_{\{Y_{(m)} \ge x\}}$. The subtraction of the second term on the left hand side of (5.4) is to ensure an unbiased estimating equation (cf. the discussion on the bias of the Kaplan–Meier integral right after (R2) and (R3) on p. 7), and thus may lead to better small sample properties than those based on (1.3) (or, equivalently, (5.3)). Thus, Hjort's extension (5.4) seems to be better suited for estimating the parameters in a parametric model than Reid's extension based on (1.3). However, M-estimators based on (5.4) may be more complicated to compute than those based on (1.3), and because the extra bias correction term in (5.4) involves the unknown cumulative hazard function $A(x, \theta)$, the use of such estimators for parametric models is actually quite restricted. The definition of M-estimators by (1.3) seems to be more general and, as illustrated in section 4, covers all the situations known from the complete data case.
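The rewriting of (1.3) as (5.3) rests on the identity $dF_n(x) = (1 - F_n(x-)) \, dA_n(x)$. The following Python sketch checks numerically that the two forms of $\lambda_n(\theta)$ agree on a small sample. It is an illustration only: the helper names, the toy data and the choice $\psi(x, \theta) = x - \theta$ are assumptions made here, and the bias-correction term of (5.4) is not implemented.

```python
def km_weights(y, delta):
    """Kaplan-Meier jump sizes; uncensored observations come first at ties."""
    n = len(y)
    order = sorted(range(n), key=lambda i: (y[i], -delta[i]))
    surv, ys, ws = 1.0, [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        ys.append(y[i])
        ws.append(surv * delta[i] / at_risk)
        surv *= 1.0 - delta[i] / at_risk
    return ys, ws


def lambda_via_km(psi, theta, y, delta):
    """lambda_n(theta) as an integral against the Kaplan-Meier estimate, as in (1.3)."""
    ys, ws = km_weights(y, delta)
    return sum(w * psi(t, theta) for t, w in zip(ys, ws))


def lambda_via_hazard(psi, theta, y, delta):
    """lambda_n(theta) in the form (5.3): psi * [1 - F_n(x-)] * dA_n(x)."""
    surv_left = 1.0  # 1 - F_n(x-), the Kaplan-Meier survival just before x
    total = 0.0
    for t in sorted(set(y)):
        at_risk = sum(1 for v in y if v >= t)
        failures = sum(d for v, d in zip(y, delta) if v == t)
        d_hazard = failures / at_risk  # Nelson-Aalen increment at t
        total += psi(t, theta) * surv_left * d_hazard
        surv_left *= 1.0 - d_hazard
    return total


def psi_linear(x, theta):
    return x - theta


# Toy sample: the second observation is censored.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
lam_km = lambda_via_km(psi_linear, 1.0, y, delta)
lam_hazard = lambda_via_hazard(psi_linear, 1.0, y, delta)
```

Both functions return the same value on this sample, which is what makes (1.3) and (5.3) interchangeable as estimating equations.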

Acknowledgement

The research of this paper is supported in part by two NSF grants DMS-9312170 and

DMS-9404906. The author would like to thank the referees for many insightful suggestions

and a careful reading of the paper.

References

Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N. (1993). Statistical models based on counting processes. Springer, New York.

Boos, D. D. & Serfling, R. J. (1980). A note on differentials and the CLT and LIL for statistical functions, with application to M-estimates. Ann. Statist. 8, 618–624.

Borgan, Ø. (1984). Maximum likelihood estimation in parametric counting process models with applications to censored failure time data. Scand. J. Statist. 11, 1–16.

Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press, Princeton.

Gijbels, I. & Veraverbeke, N. (1991). Almost sure asymptotic representation for a class of functionals of the Kaplan–Meier estimator. Ann. Statist. 19, 1457–1470.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69, 383–393.

Hampel, F. R., Rousseeuw, P. J., Ronchetti, E. M. & Stahel, W. A. (1986). Robust statistics. Wiley, New York.

Heesterman, C. C. & Gill, R. D. (1992). A central limit theorem for M-estimators by the von Mises method. Statist. Neerlandica 46, 165–177.

Hjort, N. L. (1985). Discussion of the paper by P. K. Andersen & Ø. Borgan. Scand. J. Statist. 12, 141–150.

Hjort, N. L. (1992). On inference in parametric survival data models. Int. Statist. Rev. 60, 355–387.

Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101.

Huber, P. J. (1967). The behaviour of maximum likelihood estimates under nonstandard conditions. Proc. 5th Berkeley Symp. Math. Statist. Probab. 1, 221–233.

Huber, P. J. (1981). Robust statistics. Wiley, New York.

James, I. R. (1986). On estimating equations with censored data. Biometrika 73, 35–42.

Lai, T. L. & Ying, Z. (1994). A missing information principle and M-estimators in regression analysis with censored and truncated data. Ann. Statist. 22, 1222–1255.

LeCam, L. (1956). On the asymptotic theory of estimation and testing hypotheses. Proc. 3rd Berkeley Symp. Math. Statist. Probab. 1, 129–156.

Oakes, D. (1986). An approximate likelihood procedure for censored data. Biometrics 42, 177–182.

Reid, N. (1981). Influence functions for censored data. Ann. Statist. 9, 78–92.

Saunders, S. C. & Myhre, J. M. (1984). On the behavior of certain maximum likelihood estimators from large, randomly censored samples. J. Amer. Statist. Assoc. 79, 294–301.

Serfling, R. J. (1980). Approximation theorems of mathematical statistics. Wiley, New York.

Stute, W. (1976). On a generalization of the Glivenko–Cantelli theorem. Z. Wahrsch. verw. Gebiete 35, 167–175.

Stute, W. (1994). The bias of Kaplan–Meier integrals. Scand. J. Statist. 21, 475–484.

Stute, W. (1995). The central limit theorem under random censorship. Ann. Statist. 23, 422–439.

Stute, W. & Wang, J.-L. (1993). The strong law under random censorship. Ann. Statist. 21, 1591–1607.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595–601.

Wang, J.-L. (1985). Strong consistency of approximate maximum likelihood estimators with applications in nonparametrics. Ann. Statist. 13, 932–946.

Wang, J.-L. (1995). M-estimators for censored data: strong consistency. Scand. J. Statist. 22, 197–206.

Wilks, S. (1962). Mathematical statistics. Wiley, New York.

Yang, S. (1994). A central limit theorem for functions of the Kaplan–Meier estimator. Statist. Probab. Lett. 21, 337–345.

Received October 1995, in final form April 1998

Jane-Ling Wang, Division of Statistics, University of California, Davis, CA 95616-8705, USA.

Appendix

Proof of theorem 2. Assume that $\psi(x, \theta)$ is non-increasing in $\theta$. Similar to the proof of th. A on p. 251 of Serfling (1980), it suffices to show that for $\theta_{z,n}$ defined in (A3),

$$\lim_{n \to \infty} P(\lambda_n(\theta_{z,n}) < 0) = \lim_{n \to \infty} P(\lambda_n(\theta_{z,n}) \le 0) = \Phi(z),$$

where $\Phi$ is the distribution function of $N(0, 1)$.

Equivalently, this reduces to showing that, for each $z$,

$$\lim_{n \to \infty} P\left\{ [C(\psi, \theta_{z,n}, F, G)]^{-1/2} n^{1/2} \int \psi(x, \theta_{z,n}) \, d(F_n - F)(x) \le -n^{1/2} [C(\psi, \theta_{z,n}, F, G)]^{-1/2} \lambda_F(\theta_{z,n}) \right\} = \Phi(z).$$

The assumption on $C(\psi, \theta, F, G)$ implies that $C(\psi, \theta_{z,n}, F, G) \to C(\psi, \theta_0, F, G) = V(\psi, F, G)[\lambda'_F(\theta_0)]^2$. Assumption (A2) implies that $n^{1/2} \lambda_F(\theta_{z,n}) \to [V(\psi, F, G)]^{-1/2} \lambda'_F(\theta_0) z$. Hence it suffices to show that (A3) holds.

Proof of theorem 3. (i) This is a multivariate extension of lem. 3 on p. 249 of Serfling (1980). The proof is similar to the complete data case, replacing the classical SLLN by proposition 1 whenever applicable.

(ii) The additional assumption ensures that any sequence of M-estimates $\{\theta_n\}$ eventually falls in a compact neighbourhood of $\theta_0$. Assumptions (A1), (R1) together with the SLLN in proposition 1 then ensure that $\{\theta_n\}$ is strongly consistent.

Proof of theorem 4. Let

$$h(\theta) = \begin{cases} [\lambda_F(\theta) - \lambda_F(\theta_0)]/(\theta - \theta_0), & \text{if } \theta \ne \theta_0, \\ \lambda'_F(\theta_0), & \text{if } \theta = \theta_0. \end{cases}$$

Then

$$\theta_n - \theta_0 + [\lambda'_F(\theta_0)/h(\theta_n)][\lambda_n(\theta_0)/\lambda'_F(\theta_0)] = -[h(\theta_n)]^{-1} \int [\psi(x, \theta_n) - \psi(x, \theta_0)] \, d[F_n(x) - F(x)].$$

Using the fact that $\theta_n \xrightarrow{P} \theta_0$ and applying integration by parts to the right-hand side, it follows that

$$n^{1/2} \{\theta_n - \theta_0 + [\lambda'_F(\theta_0)/h(\theta_n)][\lambda_n(\theta_0)/\lambda'_F(\theta_0)]\} = n^{1/2} o_p(\|F_n - F\|_\infty),$$

which tends to zero in probability.

Now apply the fact that $h(\theta_n) \to h(\theta_0) = \lambda'_F(\theta_0)$ in probability. The proof is thus completed by applying (2.11) in proposition 2, which yields

$$n^{1/2} \lambda_n(\theta_0) = n^{1/2} \int \psi(x, \theta_0) \, d(F_n - F)(x) \xrightarrow{D} N(0, C(\psi, \theta_0, F, G)).$$

Proof of lemma 1. Consider

$$\int g(x, \theta_n) \, dF_n(x) - \int g(x, \theta_0) \, dF(x) = \int [g(x, \theta_n) - g(x, \theta_0)] \, dF_n(x) + \int g(x, \theta_0) \, d(F_n - F)(x) = I_A + II_A. \qquad (A.1)$$

Proposition 1 implies that the second term $II_A$ in (A.1) tends to zero with probability one. The first term $I_A$ in (A.1) tends to zero in probability under (A5) (i) since $\|F_n\|_v \le 1$.

Now consider $I_A$ in (A.1) under assumption (A5) (ii). Note that for $|\theta_n - \theta_0| \le \delta$,

$$I_A \le \int \sup_{\{\theta: |\theta - \theta_0| \le \delta\}} |g(x, \theta) - g(x, \theta_0)| \, dF_n(x) \to h_\delta, \quad \text{with probability one as } n \to \infty.$$

Let $\delta \to 0$ and use the fact that $\theta_n \xrightarrow{P} \theta_0$ to obtain that $I_A \xrightarrow{P} 0$ under (A5) (ii).

Under (A5) (iii), using integration by parts, the first term $I_A$ in (A.1) also tends to zero in probability. We have thus shown that lemma 1 holds under (A5) (i)–(iii).

Next consider another decomposition,

$$\int g(x, \theta_n) \, dF_n(x) - \int g(x, \theta_0) \, dF(x) = \int g(x, \theta_n) \, d(F_n - F)(x) + \int [g(x, \theta_n) - g(x, \theta_0)] \, dF(x) = I_B + II_B. \qquad (A.2)$$

The second term $II_B$ in (A.2) tends to zero in probability by the continuity of $\int g(x, \theta) \, dF(x)$ at $\theta = \theta_0$. The uniform convergence in (A5) (v) implies that the first term $I_B$ in (A.2) tends to zero in probability. This completes the proof under (A5) (v).

As for (A5) (iv), further decompose the first term $I_B$ in (A.2) into

$$\int [g(x, \theta_n) - g(x, \theta_0)] \, d(F_n - F)(x) + \int g(x, \theta_0) \, d(F_n - F)(x) = III_A + III_B. \qquad (A.3)$$

The second term $III_B$ in (A.3) tends to zero with probability one by proposition 1. Under the assumption of (A5) (iv) and via integration by parts, the first term $III_A$ in (A.3) also tends to zero in probability by formula (2.2). Lemma 1 thus holds under (A5) (iv).

Proof of theorem 5. $\lambda_n(\theta)$ is differentiable in $\theta$ since $\psi$ is. The multivariate mean value theorem thus implies that

$$\lambda_n(\theta_n) - \lambda_n(\theta_0) = \Lambda_{F_n}(\xi_n)(\theta_n - \theta_0),$$

where $|\xi_n - \theta_0| \le |\theta_n - \theta_0|$ and $|\cdot|$ is the Euclidean norm. Since $\lambda_n(\theta_n) = 0$ and $\lambda_F(\theta_0) = 0$, we arrive at

$$\theta_n - \theta_0 = -[\Lambda_{F_n}(\xi_n)]^{-1} \left\{ \int \psi(x, \theta_0) \, d(F_n - F)(x) \right\}.$$

Under any one of assumptions (A5) (i)–(v), lemma 1 implies (with $g(x, \theta) = (\partial\psi_i/\partial\theta_j)(x, \theta)$) that the $(i, j)$ entry of $\Lambda_{F_n}(\xi_n)$ converges in probability to the $(i, j)$ entry of $\Lambda_F(\theta_0)$. The theorem now follows from proposition 2 and Slutsky's theorem.

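Proof of theorem 6.

$$\theta_n^{(1)} - \theta_0 = \tilde\theta_n - \theta_0 - [\Lambda_{F_n}(\tilde\theta_n)]^{-1} \lambda_n(\tilde\theta_n)$$

$$= -[\Lambda_{F_n}(\tilde\theta_n)]^{-1} \lambda_n(\theta_0) - [\Lambda_{F_n}(\tilde\theta_n)]^{-1} [\lambda_n(\tilde\theta_n) - \lambda_n(\theta_0) - \Lambda_{F_n}(\theta_0)(\tilde\theta_n - \theta_0)] + [\Lambda_{F_n}(\tilde\theta_n)]^{-1} [\Lambda_{F_n}(\tilde\theta_n) - \Lambda_{F_n}(\theta_0)](\tilde\theta_n - \theta_0)$$

$$= A_n + B_n + C_n.$$

Under the assumption of theorem 5 and by proposition 1,

$$\Lambda_{F_n}(\tilde\theta_n) \xrightarrow{P} \Lambda_F(\theta_0), \quad \text{and} \quad \Lambda_{F_n}(\theta_0) \to \Lambda_F(\theta_0) \text{ almost surely}.$$

Hence $n^{1/2} C_n \xrightarrow{P} 0$ by (5.1), and

$$n^{1/2} B_n = o_p(n^{1/2}(\tilde\theta_n - \theta_0)) = o_p(1).$$

Finally,

$$n^{1/2} A_n = -[\Lambda_{F_n}(\tilde\theta_n)]^{-1} n^{1/2} [\lambda_n(\theta_0) - \lambda_F(\theta_0)].$$

The theorem now follows from Slutsky's theorem and proposition 2 under the assumption of theorem 5.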
