

Likelihood-tuned Density Estimator

Yeojin Chung and Bruce G. Lindsay, The Pennsylvania State University

Nonparametric Maximum Likelihood Estimator (NPMLE) of a latent distribution

Let the likelihood function for an individual observation be
$$L_i(Q) = \int K_h(x_i - t)\, dQ(t),$$
and the objective function be
$$\ell(Q) = \sum_{i=1}^{n} \log L_i(Q).$$

Theorem ([5]) Suppose the $L_i$'s are nonnegative and bounded. Then there exists a maximum likelihood estimator $\hat Q$ that is a discrete distribution with no more than $D$ distinct points of support, where $D$ is the number of distinct $x_i$.
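As a minimal illustration (my sketch, not from the poster; the support points, weights, and bandwidth are all user-supplied assumptions), the objective $\ell(Q)$ of a candidate discrete mixing distribution can be evaluated directly with a normal kernel:

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(x, support, weights, h):
    """Log-likelihood l(Q) = sum_i log integral K_h(x_i - t) dQ(t)
    for a discrete mixing distribution Q with the given support points
    and weights, using a normal kernel with bandwidth h."""
    # L_i(Q) = sum_j w_j * phi_h(x_i - t_j)
    dens = norm.pdf(x[:, None], loc=support[None, :], scale=h) @ weights
    return np.log(dens).sum()

# e.g. the empirical distribution as Q (this is the kernel density
# estimator viewed as a mixture): support = x, weights = 1/n each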

We consider an improved density estimator which arises from treating the kernel density estimator as an element of the model consisting of all mixtures of the kernel, continuous or discrete. One can then "likelihood tune" the kernel density estimator by using it appropriately as the starting value in an EM algorithm. A single EM step then leads to a fitted density with higher likelihood than the kernel density estimator. The one-step EM estimator can be written explicitly, and its bias is one order of magnitude smaller than that of the kernel estimator, while its variance stays of the same order, so the asymptotic mean squared error can be reduced significantly. Compared with other important adaptive density estimators, we find that their biases are of the same order, but our estimator is still superior, particularly where the density is small.

IDEA
1. Start from the uniform density function.
2. Do several EM steps to update the latent density; the updates give information about where we need more mass (see the code sketch after the EM steps below).

EM steps for Tuning the Density Estimator

STEP 1 Update the latent density with
$$\pi^{new}(t) = \pi^{old}(t)\, \frac{1}{n}\sum_{i=1}^{n} \frac{K_h(x_i - t)}{f^{old}(x_i)},$$
where $f^{old}(x) = \int K_h(x - s)\,\pi^{old}(s)\,ds$.

STEP 2 Update the density estimator:
$$f^{new}(x) = \int K_h(x - t)\,\pi^{new}(t)\,dt.$$

The fixed kernel density estimator can itself be expressed using the empirical distribution $\hat F_n$ as the latent distribution of $f$:
$$\hat f_h(x) = \int K_h(x - t)\, d\hat F_n(t).$$
The ratio in STEP 1 describes the deviation from $\pi^{old}$ to $\pi^{new}$, reallocating mass toward regions where the current fit is too low.
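A minimal numerical sketch of these two steps (my illustration, not the poster's code), discretizing the latent density on a grid and using a normal kernel; the grid range, bandwidth, and number of steps are arbitrary choices:

```python
import numpy as np
from scipy.stats import norm

def em_step(x, grid, pi_old, h):
    """One EM step for the latent density pi on a fixed grid.
    Returns the updated latent density (STEP 1) and the updated
    density estimate on the grid (STEP 2)."""
    dt = grid[1] - grid[0]                                 # grid spacing
    K = norm.pdf(x[:, None], loc=grid[None, :], scale=h)   # K_h(x_i - t_j)
    f_old = K @ (pi_old * dt)                              # f_old(x_i)
    pi_new = pi_old * np.mean(K / f_old[:, None], axis=0)  # STEP 1
    Kg = norm.pdf(grid[:, None], loc=grid[None, :], scale=h)
    f_new = Kg @ (pi_new * dt)                             # STEP 2, on the grid
    return pi_new, f_new

# usage: start from a uniform latent density, as in the IDEA panel
x = np.random.default_rng(0).normal(size=100)
grid = np.linspace(-4, 4, 201)
pi = np.full(grid.size, 1 / 8.0)      # uniform density on [-4, 4]
for _ in range(5):                    # several EM steps
    pi, f_hat = em_step(x, grid, pi, h=0.4)
```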

With the normal kernel $\phi_h(u) = \phi(u/h)/h$, these updates have explicit forms when we start from the kernel density estimator, i.e. take $\pi^{old} = \hat f_h$. The first EM step gives
$$\pi^{new}(t) = \hat f_h(t)\, \frac{1}{n}\sum_{i=1}^{n} \frac{\phi_h(x_i - t)}{\hat f_{\sqrt{2}h}(x_i)},$$
and the second EM step gives
$$\tilde f(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{\phi_{\sqrt{2}h}(x - x_i)}{\hat f_{\sqrt{2}h}(x_i)}\, \hat f_{\sqrt{3/2}\,h}\!\left(\frac{x + x_i}{2}\right),$$
where $\hat f_b$ denotes the fixed kernel estimator with bandwidth $b$, and the bandwidths $\sqrt{2}h$ and $\sqrt{3/2}\,h$ arise from convolving normal kernels. We call $\tilde f$ the Likelihood Tuned Density Estimator.
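A direct implementation of this closed form, assuming the notation above (a sketch; the bandwidth and evaluation points are user choices):

```python
import numpy as np
from scipy.stats import norm

def kde(x_eval, data, b):
    """Fixed kernel density estimator with normal kernel, bandwidth b."""
    return norm.pdf(x_eval[:, None], loc=data[None, :], scale=b).mean(axis=1)

def likelihood_tuned(x_eval, data, h):
    """One-step likelihood-tuned estimator with normal kernel."""
    denom = kde(data, data, np.sqrt(2) * h)            # f_hat_{sqrt(2)h}(x_i)
    mid = (x_eval[:, None] + data[None, :]) / 2        # (x + x_i) / 2
    smooth = kde(mid.ravel(), data, np.sqrt(1.5) * h).reshape(mid.shape)
    w = norm.pdf(x_eval[:, None], loc=data[None, :], scale=np.sqrt(2) * h)
    return (w * smooth / denom[None, :]).mean(axis=1)
```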

Asymptotic bias of $\tilde f$: one order smaller than that of the fixed kernel estimator, $O(h^4)$ instead of $O(h^2)$.

Asymptotic variance of $\tilde f$: of the same order as the fixed kernel estimator, $O\big((nh)^{-1}\big)$.

• The Fixed Kernel Density Estimator
The simplest form of kernel density estimation is the fixed kernel density estimator

$$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i),$$
where $K_h(u) = K(u/h)/h$. The bandwidth of the kernel function is fixed for all data points and all points of estimation. With the normal kernel it has
Asymptotic bias: $\frac{h^2}{2}\, f''(x)$
Asymptotic variance: $\frac{f(x)}{2\sqrt{\pi}\, nh}$ [1]
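In code, the fixed kernel estimator with a normal kernel is a one-liner (a minimal sketch in the same convention as above):

```python
import numpy as np
from scipy.stats import norm

def fixed_kde(x_eval, data, h):
    """Fixed-bandwidth kernel density estimator, normal kernel:
    f_hat_h(x) = (1/n) * sum_i phi_h(x - x_i)."""
    return norm.pdf(x_eval[:, None], loc=data[None, :], scale=h).mean(axis=1)
```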

• The Adaptive Density Estimator
There have been many studies on improving kernel density estimators with adaptive bandwidths. Abramson [2] found that the adaptive density estimator
$$\hat f_A(x) = \frac{1}{n}\sum_{i=1}^{n} K_{h_i}(x - x_i),$$
with the square-root-law bandwidth $h_i = h\, f(x_i)^{-1/2}$ (in practice, with a pilot estimate in place of the unknown $f$), reduces the bias significantly. With the normal kernel it has
Asymptotic bias: $O(h^4)$ [3]
Asymptotic variance: $O\big((nh)^{-1}\big)$ [4]
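A sketch of Abramson's estimator; since the true $f$ in the square-root law is unknown, a fixed-bandwidth pilot estimate stands in for it here (the pilot choice is my assumption, not specified in the poster):

```python
import numpy as np
from scipy.stats import norm

def abramson_kde(x_eval, data, h):
    """Adaptive KDE with Abramson's square-root law h_i = h / sqrt(f(x_i)),
    where f is replaced by a fixed-bandwidth pilot estimate."""
    pilot = norm.pdf(data[:, None], loc=data[None, :], scale=h).mean(axis=1)
    h_i = h / np.sqrt(pilot)          # per-observation bandwidths
    return norm.pdf(x_eval[:, None], loc=data[None, :],
                    scale=h_i[None, :]).mean(axis=1)
```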

With 100 observations from N(0,1), the MISE of each density estimator is estimated from 500 replicates. Compared with the fixed kernel estimator and the adaptive density estimator with Abramson's bandwidth, the likelihood-tuned estimator has a smaller MISE, attained at a larger optimal bandwidth than the others.
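A sketch of such a Monte Carlo comparison (the grid-based approximation of the integrated squared error is my choice; `fixed_kde` and `likelihood_tuned` refer to the sketches above):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 401)
dx = grid[1] - grid[0]
true_f = norm.pdf(grid)               # true N(0,1) density

def mise(estimator, h, n=100, reps=500):
    """Approximate MISE by Monte Carlo: average integrated squared
    error of the estimator over replicated N(0,1) samples."""
    ise = 0.0
    for _ in range(reps):
        data = rng.normal(size=n)
        ise += np.sum((estimator(grid, data, h) - true_f) ** 2) * dx
    return ise / reps

# e.g. compare mise(fixed_kde, h=0.4) with mise(likelihood_tuned, h=0.6)
```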

The likelihood-tuned estimator has smaller MSE than the kernel estimator uniformly in x. Although the likelihood-tuned estimator has slightly larger MSE than the adaptive one near the points where the adaptive estimator's bias vanishes, it is still better in sparse regions, which is especially promising in the higher-dimensional case.


[1] M.P. Wand and M.C. Jones, Kernel Smoothing, Chapman & Hall/CRC, 1995.
[2] I.S. Abramson, On bandwidth variation in kernel estimates - a square root law, The Annals of Statistics 10 (1982), no. 4, 1217–1223.
[3] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC, 1986.
[4] G.R. Terrell and D.W. Scott, Variable kernel density estimation, The Annals of Statistics 20 (1992), no. 3, 1236–1265.
[5] B.G. Lindsay, Mixture Models: Theory, Geometry and Applications, Institute of Mathematical Statistics, 1995.

[Figure: pointwise MSE of the three estimators; annotations mark the points where the asymptotic bias of the kernel and likelihood-tuned estimators is 0 and where the asymptotic bias of the adaptive estimator is 0.]