ELSEVIER Statistics & Probability Letters 21 (1994) 141-145
Nonexistence of consistent estimates in a density estimation problem
X.R. Chen$^{a,1}$, Y. Wu$^{b,*,2}$
$^a$ Graduate School, Academia Sinica, Beijing, China
$^b$ Department of Mathematics and Statistics, York University, 4700 Keele Street, North York, Ont., Canada M3J 1P3
Received May 1993; revised November 1993
Abstract
In this paper, it is shown that there is no weakly uniformly convergent estimate of a probability density function which is bounded and continuous, but not uniformly continuous, over $(-\infty, \infty)$.
Keywords: Consistency; Density estimation
1. Introduction and result
While in the independent and identically distributed (i.i.d.) sample framework consistent estimates exist under very general conditions, there are cases in which no consistent estimate can be found. Sometimes the nonexistence of such estimates seems intuitively clear, but a formal proof is hard to obtain. An example is the estimation of the coefficient of the singular part in the representation of an unknown one-dimensional distribution function.
In this paper we shall consider such a problem arising in the theory of density estimation. It is well known that if an unknown density function $f$ is assumed to be uniformly continuous over $(-\infty, \infty)$, then there exists an estimate converging uniformly to $f$ over $(-\infty, \infty)$ with probability one. The problem is: does this conclusion remain true if we drop the word "uniformly" in the assumption on $f$? We shall show that the answer is "no".
Denote by $\mathcal{F}$ the family of all one-dimensional densities bounded and continuous over $(-\infty, \infty)$. Let $X_1, \ldots, X_n$ be i.i.d. samples drawn from a population with density $f \in \mathcal{F}$, and let $f_n = f_n(x; X_1, \ldots, X_n)$ be an estimate of $f$. Write
$$d(f_n, f) = d(f_n, f; X_1, \ldots, X_n) = \sup_x |f_n(x; X_1, \ldots, X_n) - f(x)|.$$
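The loss $d(f_n, f)$ is a supremum over the whole real line. As a concrete illustration, the Python sketch below approximates it by a maximum over a finite grid (which only bounds the supremum from below) for a Gaussian kernel estimate of the $N(0,1)$ density. The kernel estimate, bandwidth, sample size and grid are assumptions made for this sketch. The paper does not single out any estimate, and Theorem 1.1 asserts that no estimate drives this quantity to 0 in probability for every $f \in \mathcal{F}$.

```python
import math
import random


def gaussian_kernel_estimate(sample, bandwidth):
    """Return the map x -> Gaussian kernel density estimate built from `sample` (illustrative estimate)."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f_n(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return f_n


def sup_distance(f_n, f, grid):
    """Maximum of |f_n(x) - f(x)| over a finite grid: a lower bound for d(f_n, f)."""
    return max(abs(f_n(x) - f(x)) for x in grid)


def std_normal_pdf(x):
    """Density of N(0, 1), which is uniformly continuous on the real line."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    grid = [-5.0 + 0.05 * k for k in range(201)]
    f_n = gaussian_kernel_estimate(sample, bandwidth=0.3)
    print(sup_distance(f_n, std_normal_pdf, grid))
```

For a uniformly continuous target such as $N(0,1)$ this grid error is small for moderate $n$; the point of the paper is that no analogous guarantee is available over all of $\mathcal{F}$.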
*Corresponding author.
$^1$ Supported by National Science Foundation of China and Natural Sciences and Engineering Research Council of Canada.
$^2$ Supported by Natural Sciences and Engineering Research Council of Canada and Faculty Research Grant, Faculty of Arts, York University.
0167-7152/94/$7.00 © 1994 Elsevier Science B.V. All rights reserved. SSDI 0167-7152(94)00016-2
Theorem 1.1. There exists no estimate $f_n$ such that $d(f_n, f) \to 0$ in probability for every $f \in \mathcal{F}$.
2. Proof of Theorem 1.1
The idea of the proof is based on the following lemma. Let $\{P_\theta, \theta \in \Theta\}$ be a family of probability measures on the sample space $(\mathcal{X}, \mathcal{B})$. Here $\Theta$ is a metric space with distance function $d$, and $(\Theta, d)$ is a subspace of another metric space $(\Theta^*, d^*)$ (so $d(\theta_1, \theta_2) = d^*(\theta_1, \theta_2)$ if $\theta_1, \theta_2 \in \Theta$). An estimate $T_n = T_n(X_1, \ldots, X_n)$ of $\theta$ taking values in $\Theta^*$ is said to be consistent if $d^*(T_n, \theta) \to 0$ in probability for any $\theta \in \Theta$.
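Lemma 2.1. A necessary condition for the existence of a consistent estimate of $\theta$ is: for any given constant $r > 0$, a sequence of subsets $\Theta_k \uparrow \Theta$ can be found such that for any fixed $k$ and $\theta \in \Theta$ we have
$$\inf_{\varphi}\big\{\sup\{|P_\theta(A) - P_\varphi(A)| : A \in \mathcal{B}\} : \varphi \in \Theta_k,\ d(\theta, \varphi) \ge r\big\} > 0. \qquad (2.1)$$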
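Proof. Denote
$$(\mathcal{X}, \mathcal{B}, P_\theta) \times \cdots \times (\mathcal{X}, \mathcal{B}, P_\theta) \quad (k\text{-fold})$$
by $(\mathcal{X}^k, \mathcal{B}^k, P_\theta^k)$. Define
$$g_k(\theta, \varphi) = \sup\{|P_\theta^k(A) - P_\varphi^k(A)| : A \in \mathcal{B}^k\}, \qquad \theta, \varphi \in \Theta,\ k = 1, 2, \ldots,$$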
and write $g$ for $g_1$. We proceed to show that
$$\lim_{i\to\infty} g(\theta, \varphi_i) = 0 \implies \lim_{i\to\infty} g_k(\theta, \varphi_i) = 0, \qquad k \ge 1. \qquad (2.2)$$
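Indeed, find a probability measure $\mu$ such that $P_\theta \ll \mu$ and $P_{\varphi_i} \ll \mu$, $i \ge 1$ (for instance $\mu = \tfrac12 P_\theta + \sum_{i \ge 1} 2^{-i-1} P_{\varphi_i}$), and write
$$f = dP_\theta / d\mu, \qquad f_i = dP_{\varphi_i} / d\mu, \quad i \ge 1.$$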
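Since $g(\theta, \varphi_i) = \tfrac12 \int |f_i - f| \, d\mu$, the assumption $g(\theta, \varphi_i) \to 0$ gives $\int |f_i - f| \, d\mu \to 0$. This, together with the inequality
$$\Big|\prod_{j=1}^{m+1} f_i(x_j) - \prod_{j=1}^{m+1} f(x_j)\Big| \le f_i(x_{m+1}) \Big|\prod_{j=1}^{m} f_i(x_j) - \prod_{j=1}^{m} f(x_j)\Big| + \Big(\prod_{j=1}^{m} f(x_j)\Big) |f_i(x_{m+1}) - f(x_{m+1})|,$$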
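entails, by induction, that for any $k \ge 1$
$$\lim_{i\to\infty} \int \cdots \int |f_i(x_1) \cdots f_i(x_k) - f(x_1) \cdots f(x_k)| \, d\mu(x_1) \cdots d\mu(x_k) = 0.$$
(Integrating the inequality over $x_1, \ldots, x_{m+1}$ shows that the $(m+1)$-fold integral is at most the $m$-fold one plus $\int |f_i - f| \, d\mu$, because $f_i$ and $f(x_1) \cdots f(x_m)$ integrate to 1.) Since $g_k(\theta, \varphi_i)$ is half of this $k$-fold integral, this proves (2.2).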
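On a finite sample space the supremum defining $g_k$ equals half the $\ell_1$ distance between probability mass functions, and the induction above in fact yields $g_k(\theta, \varphi_i) \le k\, g(\theta, \varphi_i)$. The following Python sketch checks this bound by brute-force enumeration of $\{0,1\}^k$ for Bernoulli laws. The Bernoulli family, the parameter values and the helper names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def bernoulli_pmf(p):
    """Probability mass function of a Bernoulli(p) law on {0, 1}."""
    return {0: 1 - p, 1: p}


def tv_distance(pmf_a, pmf_b):
    """sup_A |P(A) - Q(A)| on a finite space, i.e. half the l1 distance between the pmfs."""
    support = set(pmf_a) | set(pmf_b)
    return sum(abs(pmf_a.get(x, 0) - pmf_b.get(x, 0)) for x in support) / 2


def product_pmf(pmf, k):
    """pmf of the k-fold product measure, indexed by tuples (x_1, ..., x_k)."""
    result = {}
    for xs in product(pmf, repeat=k):
        prob = 1
        for x in xs:
            prob *= pmf[x]
        result[xs] = prob
    return result


def g_k(p, q, k):
    """g_k between Bernoulli(p) and Bernoulli(q): TV distance of their k-fold products."""
    return tv_distance(product_pmf(bernoulli_pmf(p), k), product_pmf(bernoulli_pmf(q), k))


if __name__ == "__main__":
    p = Fraction(1, 3)
    for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
        g1, g4 = g_k(p, p + eps, 1), g_k(p, p + eps, 4)
        print(eps, g1, g4, g4 <= 4 * g1)  # g4 shrinks with g1 and never exceeds 4 * g1
```

Exact fractions are used so that the comparison $g_4 \le 4 g_1$ is not blurred by rounding.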
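Now fix $r > 0$ and let $T_n = T_n(X_1, \ldots, X_n)$ be a consistent estimate of $\theta$. Then
$$\lim_{n\to\infty} P_\theta^n\big(d^*(T_n, \theta) \ge r/3\big) = 0, \qquad \theta \in \Theta.$$
Put
$$\Theta_k = \{\theta : \theta \in \Theta;\ P_\theta^n(d^*(T_n, \theta) \ge r/3) < 1/4 \text{ for all } n \ge k\}, \qquad k \ge 1.$$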
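Then $\Theta_k \uparrow \Theta$: the sets increase with $k$, and by consistency every $\theta \in \Theta$ belongs to $\Theta_k$ for all large $k$. It remains to show (2.1). If (2.1) is not true, we can find $k$ and $\theta_0 \in \Theta$ such that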
$$\inf\{g(\theta_0, \varphi) : \varphi \in \Theta_k,\ d(\theta_0, \varphi) \ge r\} = 0. \qquad (2.3)$$
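Choosing a larger $k$ if necessary (this keeps (2.3) valid, since enlarging $\Theta_k$ can only decrease the infimum), we can assume that $\theta_0 \in \Theta_k$. From (2.3) we can find $\varphi_i \in \Theta_k$, $i \ge 1$, with $d(\theta_0, \varphi_i) \ge r$ but $g(\theta_0, \varphi_i) \to 0$, hence
$$\lim_{i\to\infty} g_k(\theta_0, \varphi_i) = 0 \qquad (2.4)$$
as proved earlier. Put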
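$$A = \{(x_1, \ldots, x_k) : d^*(T_k(x_1, \ldots, x_k), \theta_0) < r/3\},$$
$$A_i = \{(x_1, \ldots, x_k) : d^*(T_k(x_1, \ldots, x_k), \varphi_i) < r/3\}, \qquad i \ge 1.$$
Then, by the definition of $\Theta_k$ and the fact that $\theta_0 \in \Theta_k$ and $\varphi_i \in \Theta_k$, we have $P_{\theta_0}^k(A) \ge 3/4$ and $P_{\varphi_i}^k(A_i) \ge 3/4$. Since $d(\theta_0, \varphi_i) \ge r$, the triangle inequality for $d^*$ gives $A \cap A_i = \emptyset$, $i \ge 1$. On the other hand, from (2.4) it follows that
$$|P_{\theta_0}^k(A) - P_{\varphi_i}^k(A)| \le 1/4$$
for $i$ sufficiently large. Therefore, for such $i$ we have
$$P_{\varphi_i}^k(A) \ge P_{\theta_0}^k(A) - 1/4 \ge 3/4 - 1/4 = 1/2.$$
This leads to a contradiction:
$$P_{\varphi_i}^k(A \cup A_i) = P_{\varphi_i}^k(A) + P_{\varphi_i}^k(A_i) \ge 1/2 + 3/4 > 1,$$
and the lemma is proved. □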
Turning to the proof of the theorem, we define a function $h_{a,b,c}(x)$ as follows:
$$h_{a,b,c}(x) = \begin{cases} a - a|x - b|/c, & b - c \le x \le b + c, \\ 0, & \text{otherwise}, \end{cases}$$
where $a$, $b$ and $c$ are constants with $c > 0$. For any strictly increasing positive integer sequence $m = \{m_1, m_2, \ldots\}$, define
$$f_m(x) = \sum_{i=1}^{\infty} \cdots \qquad (2.5)$$
$f_m$ belongs to the family $\mathcal{F}$ defined in Section 1. Let $\mathcal{F}^*$ be the family of all one-dimensional densities bounded over $(-\infty, \infty)$. Introducing the uniform distance $d^*(f, g) = \sup_x |f(x) - g(x)|$ in $\mathcal{F}^*$, we obtain the framework described at the beginning of this section. We proceed to show that the necessary condition specified in Lemma 2.1 cannot be satisfied, and this will prove the theorem.
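The function $h_{a,b,c}$ is a triangular tent of height $a$ centred at $b$ with half-width $c$, so $\int h_{a,b,c}(x)\,dx = ac$. The Python sketch below implements it and checks the integral with the trapezoidal rule. The helper names and the step count are illustrative assumptions, and the sketch does not attempt to reproduce (2.5).

```python
def tent(x, a, b, c):
    """h_{a,b,c}(x): equals a - a|x - b|/c on [b - c, b + c] and 0 elsewhere (c > 0)."""
    if c <= 0:
        raise ValueError("c must be positive")
    if b - c <= x <= b + c:
        return a - a * abs(x - b) / c
    return 0.0


def trapezoid(func, lo, hi, steps=10_000):
    """Composite trapezoidal rule on [lo, hi] with `steps` equal subintervals."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    total += sum(func(lo + j * width) for j in range(1, steps))
    return total * width


if __name__ == "__main__":
    a, b, c = 1.0, 3.0, 2.0 ** -3
    print(tent(b, a, b, c))  # 1.0, the peak value a
    # The kinks b - c, b, b + c fall on grid nodes, so the rule is exact up to rounding.
    print(trapezoid(lambda x: tent(x, a, b, c), b - 1.0, b + 1.0))  # approximately a * c = 0.125
```

A tent of fixed height whose half-width shrinks is what allows a bounded continuous density to fail uniform continuity.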
Suppose that $\mathcal{F}_1, \mathcal{F}_2, \ldots$ are subsets of $\mathcal{F}$ such that $\mathcal{F}_k \uparrow \mathcal{F}$. Since the set of density functions of the form (2.5) is uncountable, a positive integer $t$ can be found such that $\mathcal{F}_t$ contains uncountably many densities of the form (2.5). Fix this $t$, and define
$$\mathcal{F}_t(c) = \{f : f \in \mathcal{F}_t;\ f \text{ has the form (2.5) with } m_1 = c\}, \qquad c = 1, 2, \ldots.$$
Then there exists a positive integer $m_1^0$ such that $\mathcal{F}_t(m_1^0)$ is uncountable. Repeating this argument, by induction we can find a sequence $m^0 = \{m_1^0, m_2^0, \ldots\}$ of strictly increasing positive integers such that for any $l \ge 1$ the set
$$\mathcal{F}_t(m_1^0, \ldots, m_l^0) = \{f : f \in \mathcal{F}_t;\ f \text{ has the form (2.5) with } m_i = m_i^0,\ i = 1, \ldots, l\}$$
is uncountable. Consider $f^0 = f_{m^0}$, and find $t' > t$ such that $f^0 \in \mathcal{F}_{t'}$. As
$$\mathcal{F}_t(m_1^0, \ldots, m_l^0) \subset \mathcal{F}_t \subset \mathcal{F}_{t'}$$
and $\mathcal{F}_t(m_1^0, \ldots, m_l^0)$ contains infinitely many elements, we can find positive integers $m_{l+1}, m_{l+2}, \ldots$ such that
$$m_1^0 < \cdots < m_l^0 < m_{l+1} < m_{l+2} < \cdots,$$
$m_i \ne m_i^0$ for at least one $i > l$, and $f^1 = f_{m^1}$, where $m^1 = \{m_1^0, \ldots, m_l^0, m_{l+1}, m_{l+2}, \ldots\}$, belongs to $\mathcal{F}_{t'}$. Now, since $m^0 \ne m^1$, we have, by the definition of $f_{m^0}$, $f_{m^1}$ and the distance function $d^*$, that
$$d^*(f^0, f^1) = 1.$$
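But
$$g(f^0, f^1) \le \int |f^0(x) - f^1(x)| \, dx \le 2 \sum_{i=l+1}^{\infty} \big(2^{-m_i^0} + 2^{-m_i^1}\big) \le 2^{2-l} \to 0$$
as $l \to \infty$, where $m_i^1$ denotes the $i$th term of $m^1$; the last inequality holds because $m_i^0 \ge i$ and $m_i^1 \ge i$. This shows that condition (2.1) does not hold for $r = 1$, $k = t'$ and $\theta_0 = f^0$. The theorem is proved.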
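The final bound uses only that a strictly increasing sequence of positive integers satisfies $m_i \ge i$, so $\sum_{i>l} 2^{-m_i} \le \sum_{i>l} 2^{-i} = 2^{-l}$ for both $m^0$ and $m^1$. The following Python sketch checks this arithmetic with exact fractions on randomly generated sequences that share their first $l$ terms. Truncating the sequences to finitely many terms and the random generator are assumptions of the illustration; the helper names are hypothetical.

```python
import random
from fractions import Fraction


def random_increasing(length, max_gap, rng):
    """Return a strictly increasing list of positive integers (a finite truncation of m)."""
    seq, current = [], 0
    for _ in range(length):
        current += rng.randint(1, max_gap)
        seq.append(current)
    return seq


def tail_sum(m0, m1, l):
    """2 * sum_{i > l} (2^{-m0_i} + 2^{-m1_i}) over the indices present in both lists."""
    return 2 * sum(Fraction(1, 2 ** a) + Fraction(1, 2 ** b) for a, b in zip(m0[l:], m1[l:]))


if __name__ == "__main__":
    rng = random.Random(1994)
    for l in range(1, 8):
        m0 = random_increasing(40, 3, rng)
        # m1 agrees with m0 in its first l terms and stays strictly increasing afterwards.
        m1 = m0[:l] + [m0[l - 1] + t for t in random_increasing(40 - l, 3, rng)]
        value = tail_sum(m0, m1, l)
        assert value <= Fraction(4, 2 ** l)  # the bound 2^{2-l}
        print(l, float(value), 2.0 ** (2 - l))
```

Because the first $l$ terms coincide, only the tails contribute, which is why the bound depends on $l$ alone.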
3. Some remarks
(a) Lemma 2.1 concerns a necessary condition for the existence of consistent estimates. For a sufficient condition, see, for example, Ibragimov and Hasminski (1981, p. 31).

(b) Lemma 2.1 may be useful in other cases. The following is an example in which $\theta$ is an ordinary real parameter. Suppose that $\mathcal{X} = \{0, 1\}$, and
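$$P_\theta(X = 1) = 1 - P_\theta(X = 0) = \begin{cases} \theta, & \theta \text{ is rational}, \\ 1 - \theta, & \theta \text{ is irrational}, \end{cases} \qquad 0 \le \theta \le 1. \qquad (3.1)$$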
Let $X_1, \ldots, X_n$ be i.i.d. samples drawn from this population. We proceed to show that a consistent estimate of $\theta$ cannot be found.
Indeed, take $r = 1/4$ in the lemma, and assume that $\Theta_k \uparrow \Theta = [0, 1]$ is a sequence of subsets of $\Theta$ satisfying (2.1). We shall deduce a contradiction, which will show that the condition in the lemma cannot be fulfilled.
Denote by $I$ the interval $[1/8, 1/4]$. We assert, under the assumption that (2.1) is true, that for any positive integer $k$ and any rational number $h \in I$ there must exist an open interval $J \subset (0, 1)$ containing $h$ such that no irrational number $c \in J$ belongs to $\Theta_k$. If this is not the case, we can find a positive integer $k$, a rational number $h \in I$, and a sequence of irrational numbers $\{c_m\} \subset \Theta_k$ such that $\lim_{m\to\infty} c_m = h$. Since $\Theta_k \uparrow \Theta$, we have $1 - h \in \Theta_n$ for some $n \ge k$. As $1 - h$ is rational, $c_m$ is irrational and $c_m \to h$, it follows from (3.1), which gives $P_{1-h}(X = 1) = 1 - h$ and $P_{c_m}(X = 1) = 1 - c_m \to 1 - h$, that
$$\lim_{m\to\infty} g(1 - h, c_m) = 0, \qquad (3.2)$$
where $g$ is defined in Section 2. Since $1 - h \in \Theta_n$ and $c_m \in \Theta_k \subset \Theta_n$, and since $h \in I$ and $c_m \to h$, we have $d(1 - h, c_m) = |(1 - h) - c_m| > 1/4 = r$ for $m$ sufficiently large. From this and (3.2) it follows that (2.1) does not hold for $k = n$ and $\theta_0 = 1 - h$, which contradicts our assumption. This proves the existence of the interval $J$ mentioned above.
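For the two-point model (3.1), $g(\theta, \varphi) = |P_\theta(X = 1) - P_\varphi(X = 1)|$, which makes the mechanism above easy to see numerically: the rational parameter $1 - h$ and irrational parameters $c_m \to h$ give nearly identical laws while staying far apart in $d$. In the Python sketch below rationality is passed in as an explicit flag, since a floating-point number cannot certify it. The choice $h = 1/5$, the sequence $c_m = h + \sqrt{2}/10^m$ and the separation threshold $1/4$ are illustrative assumptions.

```python
import math


def prob_one(theta, is_rational):
    """P_theta(X = 1) under model (3.1): theta if theta is rational, 1 - theta otherwise."""
    return theta if is_rational else 1.0 - theta


def g(theta, theta_rational, phi, phi_rational):
    """sup_A |P_theta(A) - P_phi(A)| for two laws on {0, 1}."""
    return abs(prob_one(theta, theta_rational) - prob_one(phi, phi_rational))


if __name__ == "__main__":
    h = 0.2    # floating-point stand-in for the rational number 1/5
    r = 0.25   # illustrative separation threshold
    for m in range(1, 7):
        c_m = h + math.sqrt(2) / 10 ** m   # irrational, and c_m -> h
        distance = abs((1.0 - h) - c_m)     # d(1 - h, c_m) = |(1 - h) - c_m|
        print(m, g(1.0 - h, True, c_m, False), distance > r)
```

The printed $g$ values shrink like $\sqrt{2}/10^m$ while the parameter distance stays near $|1 - 2h| = 0.6$, which is exactly the failure of (2.1) exploited in the argument.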
Now denote by $H = \{h_1, h_2, \ldots\}$ the set of all rational numbers in the interval $I$. Then, by what has already been proved, for any two positive integers $i, k$ there exists an open interval $J_{ik} \subset (0, 1)$ containing $h_i$ such that
$$(J_{ik} - H) \cap \Theta_k = \emptyset, \qquad i, k = 1, 2, \ldots.$$
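Therefore
$$\Big(\bigcup_{i=1}^{\infty} J_{ik} - H\Big) \cap \Theta_k = \emptyset, \qquad k = 1, 2, \ldots. \qquad (3.3)$$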
Since $\Theta_k \uparrow [0, 1]$ and $J_{ik} \subset (0, 1)$, from (3.3) we have
$$\bigcap_{k=1}^{\infty} \bigcup_{i=1}^{\infty} J_{ik} - H = \emptyset.$$
On the other hand, $H \subset \bigcup_{i=1}^{\infty} J_{ik}$ for each $k \ge 1$. Hence
$$H = \bigcap_{k=1}^{\infty} \bigcup_{i=1}^{\infty} J_{ik}.$$
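So $H$ is a set of type $G_\delta$. But this is impossible: $H$ is countable and dense in the interval $I$, and by the Baire category theorem such a set cannot be a $G_\delta$. A contradiction is reached, which shows that (2.1) cannot be fulfilled, and a consistent estimate of $\theta$ does not exist.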
(c) Theorem 1.1 does not give a solution to the problem mentioned at the beginning of Section 1, namely whether or not a consistent estimate of the coefficient of the singular part of an unknown distribution exists. It is easy to see that the condition of the lemma is satisfied in this problem, but it is doubtful that a consistent estimate exists.
It should be noted that in the lemma the sequence $\{\Theta_k\}$ usually depends on $r$. One may ask whether the conclusion of the lemma remains true when $\{\Theta_k\}$ is required to be independent of $r$. The answer is negative. Without going into details, we mention an example: the estimation of a density function that is assumed to be uniformly continuous over $(-\infty, \infty)$. It can be shown that no $\{\Theta_k\}$ independent of $r$ satisfying (2.1) exists, whereas we know that a consistent estimate (in the sense specified in the theorem of Section 1) exists.
Reference
Ibragimov, I.A. and R.Z. Hasminski (1981). Statistical Estimation: Asymptotic Theory (Springer, Berlin).