Nonexistence of consistent estimates in a density estimation problem

ELSEVIER Statistics & Probability Letters 21 (1994) 141-145


X.R. Chen^{a,1}, Y. Wu^{b,*,2}

^a Graduate School, Academia Sinica, Beijing, China

^b Department of Mathematics and Statistics, York University, 4700 Keele Street, North York, Ont., Canada M3J 1P3

Received May 1993; revised November 1993

Abstract

In this paper, it is shown that there is no weakly uniformly convergent estimate of a probability density function which is bounded and continuous, but not uniformly continuous, over $(-\infty, \infty)$.

Keywords: Consistency; Density estimation

1. Introduction and result

While in the independent and identically distributed (i.i.d.) sample framework consistent estimates exist under very general conditions, there do exist cases in which no consistent estimate can be found. Sometimes the nonexistence of such estimates seems intuitively clear, but a formal proof is hard to obtain. An example is the estimation of the coefficient of the singular part in the representation of an unknown one-dimensional distribution function.

In this paper we shall consider such a problem arising in the theory of density estimation. It is well known that if an unknown density function $f$ is assumed to be uniformly continuous over $(-\infty, \infty)$, then there exists an estimate converging uniformly to $f$ over $(-\infty, \infty)$ with probability one. The problem is: Does this conclusion remain true if we drop the word "uniformly" in the assumption on $f$? We shall show that the answer is "no".

Denote by $\mathcal{F}$ the family of all one-dimensional densities bounded and continuous over $(-\infty, \infty)$. Let $x_1, \ldots, x_n$ be i.i.d. samples drawn from a population with density $f \in \mathcal{F}$, and let $f_n = f_n(x; x_1, \ldots, x_n)$ be an estimate of $f$. Write

$$d(f_n, f) = d(f_n, f; x_1, \ldots, x_n) = \sup_x |f_n(x; x_1, \ldots, x_n) - f(x)|.$$

*Corresponding author.

^1 Supported by National Science Foundation of China and Natural Sciences and Engineering Research Council of Canada.

^2 Supported by Natural Sciences and Engineering Research Council of Canada and Faculty Research Grant, Faculty of Arts, York University.

0167-7152/94/$7.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0167-7152(94)00016-2


Theorem 1.1. There exists no estimate $f_n$ such that

$$d(f_n, f) \to 0 \quad \text{in probability.}$$

2. Proof of Theorem 1.1

The idea of the proof is based on the following lemma. Let $\{P_\theta, \theta \in \Theta\}$ be a family of probability measures on the sample space $(\mathcal{X}, \mathcal{B})$. $\Theta$ is a metric space with distance function $d$, and $(\Theta, d)$ is a subspace of another metric space $(\Theta^*, d^*)$ (so $d(\theta_1, \theta_2) = d^*(\theta_1, \theta_2)$ if $\theta_1 \in \Theta$, $\theta_2 \in \Theta$). An estimate $T_n = T_n(X_1, \ldots, X_n)$ of $\theta$ taking values in $\Theta^*$ is said to be consistent if $d^*(T_n, \theta) \to 0$ in probability for any $\theta \in \Theta$.

Lemma 2.1. A necessary condition for the existence of a consistent estimate of $\theta$ is: For any given constant $r > 0$, a sequence of subsets $\Theta_k \uparrow \Theta$ can be found such that for any fixed $k$ and $\theta \in \Theta$ we have

$$\inf_{\varphi} \{\sup\{|P_\theta(A) - P_\varphi(A)| : A \in \mathcal{B}\} : \varphi \in \Theta_k,\ d(\theta, \varphi) \ge r\} > 0. \tag{2.1}$$

Proof. Denote

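$$(\mathcal{X}, \mathcal{B}, P_\theta) \times \cdots \times (\mathcal{X}, \mathcal{B}, P_\theta) \quad (k\text{-fold})$$

by $(\mathcal{X}^k, \mathcal{B}^k, P_\theta^k)$. Define

$$g_k(\theta, \varphi) = \sup\{|P_\theta^k(A) - P_\varphi^k(A)| : A \in \mathcal{B}^k\}, \quad \theta, \varphi \in \Theta,\ k = 1, 2, \ldots$$

and write $g$ for $g_1$. We proceed to show that

$$\lim_{i \to \infty} g(\theta, \varphi_i) = 0 \ \Rightarrow\ \lim_{i \to \infty} g_k(\theta, \varphi_i) = 0, \quad k \ge 1. \tag{2.2}$$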

Indeed, find a probability measure $\mu$ so that $P_\theta \ll \mu$, $P_{\varphi_i} \ll \mu$, $i \ge 1$, and write

$$f = dP_\theta/d\mu, \quad f_i = dP_{\varphi_i}/d\mu, \quad i \ge 1.$$

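By $g(\theta, \varphi_i) \to 0$ we have $\int |f_i - f|\,d\mu \to 0$. This together with the inequality

$$\left|\prod_{j=1}^{m+1} f_i(x_j) - \prod_{j=1}^{m+1} f(x_j)\right| \le f_i(x_{m+1}) \left|\prod_{j=1}^{m} f_i(x_j) - \prod_{j=1}^{m} f(x_j)\right| + \left(\prod_{j=1}^{m} f(x_j)\right) |f_i(x_{m+1}) - f(x_{m+1})|$$

entails, by induction, that for any $k \ge 1$

$$\lim_{i \to \infty} \int \cdots \int |f_i(x_1) \cdots f_i(x_k) - f(x_1) \cdots f(x_k)|\,d\mu(x_1) \cdots d\mu(x_k) = 0.$$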

This proves (2.2).
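The implication (2.2) can be checked numerically on a toy family. The sketch below is only an illustration under assumptions: it uses Bernoulli measures on $\{0, 1\}$ (not the densities of the theorem), the standard identity $\sup_A |P(A) - Q(A)| = \tfrac12 \int |f - f'|\,d\mu$, and helper names (`tv_distance`, `product_pmf`, `g_and_gk`) invented here. Integrating the inequality above and iterating gives the bound $g_k \le k\,g$, which the code compares against.

```python
from itertools import product


def tv_distance(p, q):
    """sup_A |P(A) - Q(A)| for two pmfs on the same finite set (half the L1 distance)."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)


def product_pmf(p, k):
    """pmf of k independent draws from p, indexed by k-tuples of outcomes."""
    out = {}
    for xs in product(sorted(p), repeat=k):
        weight = 1.0
        for x in xs:
            weight *= p[x]
        out[xs] = weight
    return out


def bernoulli(t):
    """pmf of a single {0, 1}-valued observation with success probability t."""
    return {0: 1.0 - t, 1: t}


def g_and_gk(theta, phi, k):
    """Return (g, g_k): the distance for one observation and for k observations."""
    g = tv_distance(bernoulli(theta), bernoulli(phi))
    gk = tv_distance(product_pmf(bernoulli(theta), k),
                     product_pmf(bernoulli(phi), k))
    return g, gk


if __name__ == "__main__":
    # As phi approaches theta = 0.3, both g and g_4 shrink, with g_4 <= 4 * g.
    for phi in (0.4, 0.31, 0.301):
        g, gk = g_and_gk(0.3, phi, k=4)
        print(f"phi={phi}: g={g:.4f}, g_4={gk:.4f}, bound 4*g={4 * g:.4f}")
```

The product measure is enumerated explicitly, so the sketch is only practical for small $k$ and small sample spaces.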

Now let $T_n = T_n(X_1, \ldots, X_n)$ be a consistent estimate of $\theta$. Then

$$\lim_{n \to \infty} P_\theta^n(d^*(T_n, \theta) \ge r/3) = 0, \quad \theta \in \Theta.$$

Put

$$\Theta_k = \{\theta : \theta \in \Theta;\ P_\theta^n(d^*(T_n, \theta) \ge r/3) < \tfrac{1}{4} \text{ for } n \ge k\}, \quad k \ge 1.$$


Then $\Theta_k \uparrow \Theta$. It remains to show (2.1). If (2.1) is not true, we can find $k$ and $\theta_0 \in \Theta$ such that

$$\inf_{\varphi} \{g(\theta_0, \varphi) : \varphi \in \Theta_k;\ d(\theta_0, \varphi) \ge r\} = 0. \tag{2.3}$$

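Choosing a larger $k$ if necessary, we can assume that $\theta_0 \in \Theta_k$. From (2.3) we can find $\varphi_i \in \Theta_k$, $i \ge 1$, with $d(\theta_0, \varphi_i) \ge r$, but $g(\theta_0, \varphi_i) \to 0$, hence

$$\lim_{i \to \infty} g_k(\theta_0, \varphi_i) = 0 \tag{2.4}$$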

as proved earlier. Put

$$A = \{(x_1, \ldots, x_k) : d^*(T_k(x_1, \ldots, x_k), \theta_0) \le r/3\},$$

$$A_i = \{(x_1, \ldots, x_k) : d^*(T_k(x_1, \ldots, x_k), \varphi_i) \le r/3\}, \quad i \ge 1.$$

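Then by the definition of $\Theta_k$ and the fact that $\theta_0 \in \Theta_k$, $\varphi_i \in \Theta_k$, we have $P_{\theta_0}^k(A) \ge \tfrac34$, $P_{\varphi_i}^k(A_i) \ge \tfrac34$. Since $d(\theta_0, \varphi_i) \ge r$, we have $A \cap A_i = \emptyset$, $i \ge 1$. On the other hand, from (2.4) it follows that

$$|P_{\theta_0}^k(A) - P_{\varphi_i}^k(A)| \le \tfrac14$$

for $i$ sufficiently large. Therefore, for such $i$ we have

$$P_{\varphi_i}^k(A) \ge P_{\theta_0}^k(A) - \tfrac14 \ge \tfrac34 - \tfrac14 = \tfrac12.$$

This leads to a contradiction:

$$P_{\varphi_i}^k(A \cup A_i) = P_{\varphi_i}^k(A) + P_{\varphi_i}^k(A_i) \ge \tfrac12 + \tfrac34 > 1$$

and the lemma is proved. □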

Turning to the proof of the theorem, we define a function $h_{a,b,c}(x)$ as follows:

$$h_{a,b,c}(x) = \begin{cases} a - a|x - b|/c, & b - c \le x \le b + c, \\ 0, & \text{otherwise}, \end{cases}$$

where $a$, $b$ and $c$ are constants with $c > 0$. For any strictly increasing positive integer sequence $m = \{m_1, m_2, \ldots\}$, define

$$f_m(x) = \sum_{i=1}^{\infty} \cdots \tag{2.5}$$

$f_m$ belongs to the family $\mathcal{F}$ defined in Section 1. Let $\mathcal{F}^*$ be the family of all one-dimensional densities bounded over $(-\infty, \infty)$. Introducing the uniform distance $d^*(f, g) = \sup_x |f(x) - g(x)|$ in $\mathcal{F}^*$, we have the framework described at the beginning of this section. We proceed to show that the necessary condition specified in Lemma 2.1 cannot be satisfied, and this will prove the theorem.

Suppose that $\mathcal{F}_1, \mathcal{F}_2, \ldots$ are subsets of $\mathcal{F}$ such that $\mathcal{F}_k \uparrow \mathcal{F}$. Since the set of density functions of the form (2.5) is not countable, a positive integer $t$ can be found such that $\mathcal{F}_t$ contains uncountably many densities of the form (2.5). Fix this $t$, and define

$$\mathcal{F}_t(c) = \{f : f \in \mathcal{F}_t;\ f \text{ has the form (2.5) with } m_1 = c\}, \quad c = 1, 2, \ldots.$$
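For readers who want to experiment with the building block of the construction, here is a minimal sketch of the tent function $h_{a,b,c}$, which equals $a - a|x - b|/c$ on $[b - c, b + c]$ and $0$ elsewhere. The function names `tent` and `tent_area` are assumptions made for this illustration; the numerical integral is included only to confirm the elementary fact that the area under the tent is $a \cdot c$.

```python
def tent(a, b, c, x):
    """h_{a,b,c}(x): value a at x = b, falling linearly to 0 at x = b - c and x = b + c."""
    if c <= 0:
        raise ValueError("c must be positive")
    if b - c <= x <= b + c:
        return a - a * abs(x - b) / c
    return 0.0


def tent_area(a, b, c, steps=10_000):
    """Midpoint-rule approximation of the integral of h_{a,b,c} (exact value: a * c)."""
    width = 2.0 * c / steps
    total = 0.0
    for j in range(steps):
        # Midpoint of the j-th cell of [b - c, b + c].
        total += tent(a, b, c, b - c + (j + 0.5) * width)
    return total * width


if __name__ == "__main__":
    print(tent(1.0, 0.0, 1.0, 0.0))          # peak value a
    print(tent(1.0, 0.0, 1.0, 1.0))          # value at the edge of the support
    print(tent_area(1.0, 5.0, 2.0 ** -5))    # close to a * c
```

A tall, narrow tent has a large sup-norm but a small integral, which is the feature the proof exploits.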


Then there exists a positive integer $m_1^0$ such that $\mathcal{F}_t(m_1^0)$ is not countable. Repeating this argument and by induction, we can find a sequence $m^0 = \{m_1^0, m_2^0, \ldots\}$ of strictly increasing positive integers such that for any $l \ge 1$ the set

$$\mathcal{F}_t(m_1^0, \ldots, m_l^0) = \{f : f \in \mathcal{F}_t;\ f \text{ has the form (2.5) with } m_i = m_i^0,\ i = 1, \ldots, l\}$$

is not countable. Consider $f^0 = f_{m^0}$. Find $t' > t$ such that $f^0 \in \mathcal{F}_{t'}$. As

$$\mathcal{F}_t(m_1^0, \ldots, m_l^0) \subset \mathcal{F}_t \subset \mathcal{F}_{t'}$$

and $\mathcal{F}_t(m_1^0, \ldots, m_l^0)$ contains an infinite number of elements, we can find positive integers $m_{l+1}, m_{l+2}, \ldots$, such that

$$m_1^0 < \cdots < m_l^0 < m_{l+1} < m_{l+2} < \cdots$$

and $m_i \ne m_i^0$ for at least one $i > l$, and $f^1 = f_{m^1}$, where $m^1 = \{m_1^0, \ldots, m_l^0, m_{l+1}, m_{l+2}, \ldots\}$, belongs to $\mathcal{F}_{t'}$. Now since $m^0 \ne m^1$, we have, by the definition of $f_{m^0}$, $f_{m^1}$ and the distance function $d^*$, that

$$d^*(f^0, f^1) = 1.$$

But

$$g(f^0, f^1) \le \int |f^0(x) - f^1(x)|\,dx \le 2 \sum_{i=l+1}^{\infty} \left(2^{-m_i^0} + 2^{-m_i}\right) < 2^{2-l} \to 0$$

as $l \to \infty$. This shows that condition (2.1) does not hold for $r = 1$, $k = t'$ and $\theta_0 = f^0$. The theorem is proved.
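The gap between the two distances at the end of the proof can be seen in a small numerical sketch. It is an illustration under stated assumptions only: it uses finite, unnormalized sums of tents of height 1 centred at the integers $m_i$ with half-width $2^{-m_i}$, a hypothetical stand-in that is not claimed to be the exact form (2.5), and the names `bump`, `bump_sum`, `sup_and_l1` and `example` are invented here. Two index sequences share their first $l$ terms and differ afterwards; the sup-distance stays at 1 while the $L_1$ distance shrinks as $l$ grows.

```python
def bump(m, x):
    """Tent of height 1 centred at the integer m with half-width 2**-m (area 2**-m)."""
    c = 2.0 ** -m
    return 1.0 - abs(x - m) / c if abs(x - m) <= c else 0.0


def bump_sum(ms, x):
    """Sum of the tents indexed by ms, evaluated at x (assumed form, see lead-in)."""
    return sum(bump(m, x) for m in ms)


def sup_and_l1(ms0, ms1):
    """Sup-distance and L1 distance between the two tent sums.

    The tents have disjoint supports, so the difference is a union of the tents
    whose index lies in exactly one sequence: its sup is attained at their
    centres and its L1 norm is the sum of their areas.
    """
    differing = set(ms0) ^ set(ms1)
    sup = max((abs(bump_sum(ms0, m) - bump_sum(ms1, m)) for m in differing),
              default=0.0)
    l1 = sum(2.0 ** -m for m in differing)
    return sup, l1


def example(l, tail=6):
    """Two sequences with common prefix 1..l and interleaved, disjoint tails."""
    prefix = list(range(1, l + 1))
    ms0 = prefix + [l + j for j in range(1, 2 * tail, 2)]       # l+1, l+3, ...
    ms1 = prefix + [l + j for j in range(2, 2 * tail + 1, 2)]   # l+2, l+4, ...
    return sup_and_l1(ms0, ms1)


if __name__ == "__main__":
    for l in (2, 5, 10):
        sup, l1 = example(l)
        print(f"l={l}: sup distance={sup}, L1 distance={l1:.6f}")
```

Only finitely many tents are summed, so the functions here are truncations; the qualitative picture (fixed sup-distance, vanishing $L_1$ distance) does not depend on the truncation length.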

3. Some remarks

(a) Lemma 2.1 concerns the necessary condition for the existence of consistent estimates. For a sufficient condition, see, for example, Ibragimov and Hasminski (1981, p. 31).

(b) Lemma 2.1 may be useful in other cases. The following is an example in which $\theta$ is an ordinary real parameter. Suppose that $\mathcal{X} = \{0, 1\}$, and

$$P_\theta(X = 1) = 1 - P_\theta(X = 0) = \begin{cases} \theta, & \theta \text{ is rational}, \\ 1 - \theta, & \theta \text{ is irrational}, \end{cases} \qquad 0 \le \theta \le 1. \tag{3.1}$$

Let $X_1, \ldots, X_n$ be i.i.d. samples drawn from this population. We proceed to show that a consistent estimate of $\theta$ cannot be found.

Indeed, take $r = \tfrac14$ in the lemma, and assume that $\Theta_k \uparrow \Theta = [0, 1]$ is a sequence of subsets of $\Theta$ satisfying (2.1). We deduce a contradiction, and this will show that the condition in the lemma cannot be fulfilled.

Denote by $I$ the interval [$, 21. We assert, under the assumption that (2.1) is true, that for any positive integer $k$ and rational number $h \in I$, there must exist an open interval $J \subset (0, 1)$ containing $h$ such that no irrational number $c \in J$ belongs to $\Theta_k$. If this is not the case, we can find a positive integer $k$, a rational number $h \in I$, and a sequence of irrational numbers $\{c_m\} \subset \Theta_k$ such that $\lim_{m \to \infty} c_m = h$. Since $\Theta_k \uparrow \Theta$, we have $1 - h \in \Theta_n$ for some $n > k$. As $1 - h$ is rational, $c_m$ is irrational and $c_m \to h$, it follows from (3.1) that

$$\lim_{m \to \infty} g(1 - h, c_m) = 0, \tag{3.2}$$


where $g$ is defined in Section 2. Since $1 - h \in \Theta_n$, $c_m \in \Theta_k \subset \Theta_n$, by $h \in I$ and $c_m \to h$, we have $d(1 - h, c_m) = |(1 - h) - c_m| > \tfrac14 = r$ for $m$ sufficiently large. From this and (3.2) it follows that (2.1) does not hold for $k = n$ and $\theta_0 = 1 - h$, which is contrary to our assumption. This proves the existence of the interval $J$ mentioned above.
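The mechanism behind (3.2) can be traced numerically. The following is a rough sketch under assumptions, not part of the argument: the value `h = 0.3`, the sequence $c_m = h + \sqrt{2}/10^m$ and all function names are hypothetical choices made for illustration, and since floating-point numbers cannot encode rationality, the rational/irrational status of each parameter is passed by hand as a flag. Under (3.1), the variation distance between two distributions on $\{0, 1\}$ is the absolute difference of their values of $P(X = 1)$.

```python
import math


def success_prob(theta, rational):
    """P_theta(X = 1) under (3.1): theta if theta is rational, 1 - theta otherwise."""
    return theta if rational else 1.0 - theta


def g(p, q):
    """Variation distance between two distributions on {0, 1} with P(X = 1) = p and q."""
    return abs(p - q)


def gap_along_sequence(h, steps=6):
    """Rows (m, g(1 - h, c_m), d(1 - h, c_m)) for c_m = h + sqrt(2) / 10**m."""
    theta0 = 1.0 - h                            # treated as a rational parameter
    rows = []
    for m in range(1, steps + 1):
        c_m = h + math.sqrt(2.0) / 10 ** m      # stands in for an irrational parameter
        dist_measures = g(success_prob(theta0, True), success_prob(c_m, False))
        dist_params = abs(theta0 - c_m)
        rows.append((m, dist_measures, dist_params))
    return rows


if __name__ == "__main__":
    for m, dist_measures, dist_params in gap_along_sequence(0.3):
        print(f"m={m}: g={dist_measures:.7f}, d={dist_params:.7f}")
```

The distance between the measures tends to 0 while the distance between the parameters tends to $|1 - 2h|$, which is what makes condition (2.1) fail for a suitable $r$.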

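Now denote by $H = \{h_1, h_2, \ldots\}$ the set of all rational numbers in the interval $I$. Then by what has already been proved, for any two positive integers $i$, $k$ there exists an open interval $J_{ik} \subset (0, 1)$ containing $h_i$ such that

$$(J_{ik} - H) \cap \Theta_k = \emptyset, \quad i, k = 1, 2, \ldots.$$

Therefore

$$\left(\bigcup_{i=1}^{\infty} J_{ik} - H\right) \cap \Theta_k = \emptyset, \quad k = 1, 2, \ldots. \tag{3.3}$$

Since $\Theta_k \uparrow [0, 1]$ and $J_{ik} \subset (0, 1)$, from (3.3) we have

$$\bigcap_{k=1}^{\infty} \bigcup_{i=1}^{\infty} J_{ik} - H = \emptyset.$$

On the other hand, $H \subset \bigcup_{i=1}^{\infty} J_{ik}$ for each $k \ge 1$. Hence

$$H = \bigcap_{k=1}^{\infty} \bigcup_{i=1}^{\infty} J_{ik}.$$

So $H$ is a set of $G_\delta$ type. But this is impossible, as $H$ is a countable dense subset of $I$. A contradiction is reached, which shows that (2.1) cannot be fulfilled, and a consistent estimate of $\theta$ does not exist.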

(c) Theorem 1.1 does not give a solution of the problem proposed at the beginning of Section 1: whether or not a consistent estimate of the coefficient of the singular part of an unknown distribution exists. It is easy to see that the condition of the lemma is satisfied in this problem, but it is doubtful that a consistent estimate exists.

It should be noted that in the lemma the sequence $\{\Theta_k\}$ usually depends on $r$. One may ask if the conclusion of the lemma is still true when $\{\Theta_k\}$ is required to be independent of $r$. The answer is negative. Without going into details we mention an example: the estimation of a density function which is assumed to be uniformly continuous over $(-\infty, \infty)$. It can be shown that no $\{\Theta_k\}$ independent of $r$ exists satisfying (2.1), whereas we know that a consistent estimate (in the sense specified in the theorem of Section 1) exists.

Reference

Ibragimov, I.A. and R.Z. Hasminski (1981), Statistical Estimation: Asymptotic Theory (Springer, Berlin).
