Kernel Estimation of the Conditional
Mode for Fixed Design Models
Doaa A. ELhertaniy
Supervised by
Prof. Raid B. Salha
Professor of Mathematical Statistics
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree
of Master of Science in Mathematical Sciences
October 2017
The Islamic University – Gaza
Research and Postgraduate Affairs
Faculty of Science
Master of Mathematical Science
Declaration (translated from Arabic)

I, the undersigned, submitter of the thesis entitled:

Kernel Estimation of the Conditional Mode for Fixed Design Models

declare that the content of this thesis is the product of my own effort, except where otherwise indicated, and that this thesis as a whole, or any part of it, has not been submitted by others to obtain a degree or an academic or research title at any other educational or research institution.
Declaration
I understand the nature of plagiarism, and I am aware of the University’s policy on
this.
The work provided in this thesis, unless otherwise referenced, is the researcher's own
work, and has not been submitted by others elsewhere for any other degree or
qualification.
Student's Name: Doaa Abd ELrahman ELhertaniy
Signature: Doaa ELhertaniy
Date: 10/8/2017
Kernel Estimation of the Conditional
Mode for Fixed Design Models
Doaa Abd ELrahman ELhertaniy
October 8, 2017
Abstract
We are interested in the area of nonparametric prediction; therefore, we study the relationship between a current observation and previous observations, where the conditional density function plays an important role.
In this thesis, we study the problem of estimating nonparametrically the condi-
tional mode for fixed design models. We suppose the error random variables are
independent. The joint asymptotic normality of the conditional mode estimator at
different fixed design points is established under some regularity conditions.
We started our study by considering the mode of a sample of independent and identically distributed (i.i.d.) data. Also, we study some theoretical properties of the kernel estimator of the conditional density function. Moreover, we study some sufficient conditions under which the kernel estimator of the mode for random design models is asymptotically normally distributed. After that, we study the kernel estimation of the mode for fixed design models. We assume that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional pdf f(y|x), where the variable Y depends on a fixed design predictor x through a regression function m(x).
Finally, we used the proposed estimator in some applications using simulated and real data. The results indicate that the mode estimator performs well and efficiently, as measured by the MSE and the correlation coefficient.
Abstract in Arabic (translated)

This study is concerned with nonparametric prediction and with the conditional density function, which plays an important role in this study. In this study, the problem of estimating the nonparametric conditional mode for fixed design models was studied. We assumed that the errors in the random variables are independent, and the joint asymptotic normality of the conditional mode estimator at different fixed design points was established under some regularity conditions. We began our study with the mode when the data are independent and identically distributed; we also studied some theoretical properties of the kernel estimator of the conditional density function, then studied some sufficient conditions under which the kernel estimator of the mode for the random design model follows the asymptotic normal distribution. After that, we studied the kernel estimator of the mode for the fixed design model, assuming that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional density f(y|x), where the variable Y depends on the fixed design predictor x through the regression function m(x). At the end of the study, we used the proposed estimator in some applications using simulated and real data. The results showed that the mode estimator performs well and efficiently in terms of the MSE and the correlation coefficient.
Dedication
I am most grateful and thankful to Allah for giving me the knowledge, the power and the patience to complete this work, and to His prophet of mercy to mankind, the Prophet Mohammad (pbuh), who lightened and guided me to the right way.
To My Parents.
To My Brother and Sister.
To My Friends.
To all Knowledge Seekers.
Acknowledgements
After Allah, I reserve my greatest appreciation for my thesis supervisor, Professor Raid Salha, for suggesting the research topics and for his extremely helpful guidance, encouragement and patience throughout the course of my research work. I would like to express my sincere gratitude to my family, especially my parents, my brothers and my sisters, for giving me confidence and support and for helping me to reach this level of learning. I am also thankful to all my friends for their kind advice and encouragement.
Finally, I pray to Allah to accept this work.
Contents
Abstract i
Abstract in Arabic ii
Dedication iii
Acknowledgements iv
Contents v
List of Tables vii
List of Figures viii
List of Abbreviations ix
List of Symbols x
Introduction 1
1 Preliminaries 5
1.1 Basic definitions and notations . . . . . . . . . . . . . . . . . . . . . 6
1.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Properties of the Kernels Estimator . . . . . . . . . . . . . . . . . . 18
1.4 Bias and variance of the kernel density estimation . . . . . . . . . . 20
1.5 The MSE and MISE criteria . . . . . . . . . . . . . . . . . . . . . . 24
1.6 Optimal Binwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2 Kernel estimation of the mode for random design models 32
2.1 The kernel estimation of the mode . . . . . . . . . . . . . . . . . . . 34
2.1.1 Parzen’s Mode . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.2 Generalization of Parzen’s mode estimation. . . . . . . . . . 40
2.2 The kernel estimation of the conditional mode for random design
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2.1 Asymptotic properties of the conditional mode estimation . 48
3 Kernel estimation of the regression mode for fixed design models 53
3.1 Fixed design model . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Mode estimation for fixed design data . . . . . . . . . . . . . . . . . 59
4 Applications 71
4.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Bibliography 80
List of Tables
1.1 Some classical kernel functions . . . . . . . . . . . . . . . . . . . . . 19
4.1 The MSE and The Correlation Coefficient for the Simulation Study 1 74
4.2 The MSE and The Correlation Coefficient for the Simulation Study 2 75
4.3 The MSE and The Correlation Coefficient for the Simulation Study 3 76
4.4 The MSE and The Correlation Coefficient for the Simulation Study 4 77
List of Figures
1.1 Histograms with different binwidths for the same sample. . . . . . . 12
1.2 The effect of the choice of origin is displayed in the five histograms. 13
1.3 Naive estimate constructed from Old Faithful geyser data, h = 0.25. 15
1.4 Kernel density estimation based on 7 points. . . . . . . . . . . . . 17
1.5 Kernel density estimates based on different binwidths. . . . . . . . 29
4.1 A scatter plot of the first simulated data together with the perfect curve 74
4.2 A scatter plot of the second simulated data together with the perfect curve 75
4.3 A scatter plot of the third simulated data together with the perfect curve 76
4.4 A scatter plot of the fourth simulated data together with the perfect curve 77
4.5 Regression mode estimation of the ethanol data . . . . . . . . . . . 78
4.6 Regression mode estimation of the vapor Pressure of Liquid Water 79
List of Abbreviations
Abbreviation Description
AMISE Asymptotic mean integrated squared error.
cdf Cumulative distribution function.
Corr Correlation.
Cov Covariance.
i.i.d. Independent and identically distributed.
ISE Integrated squared error.
KDE Kernel density estimation.
MISE Mean integrated squared error.
MIAE Mean integrated absolute error.
MSE Mean squared error.
NW Nadaraya-Watson.
o Small oh.
O Big oh.
pdf Probability density function.
Var Variance.
SSE Sum of squared errors.
SSTO Total sum of squares.
List of Symbols

Symbol Description
f(x) Probability density function.
F(x) Cumulative distribution function.
f_n Kernel estimator of the function f.
h The bandwidth.
h_opt The optimal bandwidth.
P Probability set function.
µ The mean.
m(x) Regression mean function.
ε_i Independent random variables.
σ² The variance.
θ The population mode.
θ_n The mode estimator.
θ*_n A concurrent estimator of the mode.
E The expectation.
S The sample standard deviation.
N The sample size.
I_A The indicator function.
R The set of real numbers.
R²_{y,y_n} Correlation coefficient.
∏ Product.
K(·) The kernel function.
→_p Convergence in probability.
→_d Convergence in distribution.
→_{a.s.} Almost sure convergence.
Introduction
The probability density function is a fundamental concept in statistics. Suppose we have a set of observed data points assumed to be a sample from an unknown probability density function f; the construction of an estimate of the density function from the observed data is known as density estimation. Accordingly, probability
density estimators are generally broken down into two basic classes, parametric or
nonparametric estimators. Parametric estimators assume a functional form of the
density parametrized by a finite set of parameters, while nonparametric methods
consist of all other types of estimators. The main subject of this thesis is the kernel
estimation of the probability density function.
Parzen (1962) and Nadaraya (1965) have shown that under some regularity conditions the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. For
independent and identically distributed data, Samanta and Thavanesmaran (1990)
considered the problem of estimating the mode of a conditional pdf, for random de-
sign model, and they have shown under regularity conditions that the estimator of
the population conditional mode is strongly consistent and asymptotically normally
distributed. Salha and Ioanides (2007) considered the estimation of the conditional
mode under dependence conditions. We assume that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional pdf f(y|x), where the variable Y depends on a fixed design predictor x through a regression function m(x). In this thesis, we consider the problem of estimating the mode of the unknown conditional pdf f(y|x) for the fixed design model. The conditional density can be estimated by many methods, such as the histogram, the naive estimator, the kernel estimator, etc. In this thesis, we will study the kernel estimation of the conditional density function, then we will use it to estimate the conditional mode.
The conditional mode is denoted by θ(x), where

f(θ(x)|x) = max_y f(y|x).

The kernel estimator of the conditional mode is denoted by θ_n(x), where

f_n(θ_n(x)|x) = max_y f_n(y|x),

and where

f_n(y|x) = [Σ_{i=1}^{n} K_{h_n}(x − X_i) K_{h_n}(y − Y_i)] / [Σ_{i=1}^{n} K_{h_n}(x − X_i)].
Also, we will study some theoretical properties of the kernel estimator for the con-
ditional density function.
Let (x1, Y1), (x2, Y2), ..., (xn, Yn) be data from a fixed design, and let Y_i = m(x_i) + ε_i, i = 1, ..., n, be the regression model, where m(x) is an unknown regression function. The design points x1, x2, ..., xn, determined by the experimenter, are ordered, i.e. we have 0 ≤ x1 ≤ x2 ≤ ... ≤ xn ≤ 1. In the absence of other information, we will take x_i = i/n, i = 1, 2, ..., n. The ε_i are independent random variables with mean 0 and variance σ².
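As a concrete illustration of this setup (illustrative code, not part of the thesis), the following sketch simulates a fixed design model with x_i = i/n and a made-up regression function m(x) = sin(2πx), then estimates the conditional mode θ_n(x) by maximizing the kernel conditional density f_n(y|x) over a grid of candidate y values; the Gaussian kernel, the bandwidth and the sample size are all illustrative choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_mode(x, xs, ys, h, y_grid):
    """Maximize f_n(y|x) = sum_i K_h(x - x_i) K_h(y - Y_i) / sum_i K_h(x - x_i)
    over y_grid and return the maximizing y (the conditional mode estimate)."""
    wx = gaussian_kernel((x - xs) / h)                 # weights K((x - x_i)/h)
    dens = np.array([np.sum(wx * gaussian_kernel((y - ys) / h)) for y in y_grid])
    dens = dens / (h * np.sum(wx))                     # f_n(y|x) on the grid
    return y_grid[np.argmax(dens)]

# Fixed design: x_i = i/n, Y_i = m(x_i) + eps_i with illustrative m(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
n = 400
xs = np.arange(1, n + 1) / n
ys = np.sin(2 * np.pi * xs) + rng.normal(0.0, 0.2, size=n)

y_grid = np.linspace(-2.0, 2.0, 401)
theta = conditional_mode(0.25, xs, ys, h=0.05, y_grid=y_grid)
# m(0.25) = 1, so the estimate should lie near 1
```

Since the errors are symmetric around zero here, the conditional mode coincides with m(x), so the estimate at x = 0.25 should be close to sin(π/2) = 1.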
Finally, the performance of the proposed estimator is tested via applications using simulated and real-life data.
This thesis will consist of the following chapters:
Chapter 1. Preliminaries.
This chapter will contain some basic definitions, facts and notations that will be used in the thesis. Also, it will contain an introduction to kernel estimation.
Chapter 2. Kernel estimation of the mode for random design models.
In this chapter, we will study the mode estimation for random design models and
the conditional mode for independent data.
Chapter 3. Kernel estimation of the regression mode for fixed design
models.
This chapter is the main chapter of the thesis. In this chapter, we will study the
problem of estimating nonparametrically the conditional mode for fixed design mod-
els. We suppose the error random variables are independent. The joint asymptotic
normality of the conditional mode estimator at different fixed design points is es-
tablished under some regularity conditions.
Chapter 4. Applications.
In this chapter, the performance of the mode estimation will be tested using applications to real and simulated data. Then the thesis will be closed by some concluding remarks and suggestions for new research ideas.
Chapter 1
Preliminaries
Introduction:
In this chapter, we present some important methods to estimate the density function. The concept of estimation will be introduced and discussed. We will concentrate on nonparametric estimation and introduce three of the most common nonparametric estimation methods: the histogram, the naive estimator and the kernel estimator. The asymptotic properties of the kernel estimation of the density function will be discussed in detail.
This chapter is organized as follows.
In Section 1.1, we present some concepts and facts from statistics that we will need in this thesis. In Section 1.2, we present some well known estimators of the density function. In Section 1.3, we summarize some properties of the kernels and the kernel estimator. The bias and the variance of the kernel density estimator will be discussed in Section 1.4. In Section 1.5, we will present two types of error criteria, the MSE and the MISE, for the kernel density estimation. In Section 1.6, we present the optimal binwidth.
1.1 Basic definitions and notations
In this section, we will introduce some basic definitions and theorems from statistics that will help in the remainder of this thesis.
Definition 1.1.1. (Royden and Fitzpatrick, 1988) If A is any set, we define the indicator function I_A of the set A to be the function given by

I_A(x) = { 1 if x ∈ A,
           0 if x ∉ A.
Definition 1.1.2. (Hogg and Craig, 1995) (Converge in Probability)
Let Xn be a sequence of random variables and let X be a random variable defined
on a sample space. We say X_n converges in probability to X if for all ε > 0, we have

lim_{n→∞} P[|X_n − X| ≥ ε] = 0,   (1.1.1)

or equivalently,

lim_{n→∞} P[|X_n − X| < ε] = 1.   (1.1.2)

If so, we write

X_n →_p X.
Definition 1.1.3. (Hogg and Craig, 1995)(Converge in Distribution)
Let Xn be a sequence of random variables and let X be a random variable. Let
FXn and FX be, respectively, the cumulative distribution function (cdfs) of Xn and
X. Let C(F_X) denote the set of all points where F_X is continuous. We say that X_n converges in distribution to X if

lim_{n→∞} F_{X_n}(x) = F_X(x), for all x ∈ C(F_X).   (1.1.3)

We denote this convergence by

X_n →_d X.
Definition 1.1.4. (Yates and Goodman, 1999)(Almost Sure Convergence)
A sequence of random variables, X_1, X_2, ..., converges almost surely to a random variable X if, for every ε > 0,

P( lim_{n→∞} |X_n − X| < ε ) = 1.

We denote this convergence by

X_n →_{a.s.} X.
Theorem 1.1.5. (Hogg and Craig, 1995)
1. If X_n converges to X with probability 1, then X_n converges to X in probability.

2. If X_n converges to X in probability, then X_n converges to X in distribution.

3. Let X_n converge to X in probability and let g be a continuous function on R; then g(X_n) converges to g(X) in probability.
Theorem 1.1.6. (Sen, 1993) (Liapounov Theorem)
Let X_k, k ≥ 1, be independent random variables such that E X_k = µ_k and Var X_k = σ_k², and for some 0 < δ ≤ 1,

ν_k^{2+δ} = E|X_k − µ_k|^{2+δ} < ∞,  k ≥ 1.

Also, let

T_n = Σ_{k=1}^{n} X_k,   ξ_n = E T_n = Σ_{k=1}^{n} µ_k,

S_n² = Var T_n = Σ_{k=1}^{n} σ_k²,   Z_n = (T_n − ξ_n)/S_n,

and ρ_n = S_n^{−(2+δ)} Σ_{k=1}^{n} ν_k^{2+δ}. Then, if

lim_{n→∞} ρ_n = 0,

we have

Z_n →_d N(0, 1).
Definition 1.1.7. (Hansen, 2009) (Order Notation O And o)
Given two sequences {x_n} and {y_n} such that y_n ≥ 0 for all n:

• The sequence {x_n} is of big order of the sequence {y_n} if lim sup_{n→∞} |x_n / y_n| < ∞, denoted x_n = O(y_n) (read: x_n is big oh of y_n).

• The sequence {x_n} is of small order of the sequence {y_n} if lim_{n→∞} |x_n / y_n| = 0, denoted x_n = o(y_n) as n → ∞ (read: x_n is little oh of y_n).
Theorem 1.1.8. (Wand, 1995) (Taylor’s Theorem)
Suppose that f is a real valued function defined on R and let x ∈ R. Assume that f has p continuous derivatives in an interval (x − δ, x + δ) for some δ > 0. Then for any sequence α_n converging to zero,

f(x + α_n) = Σ_{j=0}^{p} (α_n^j / j!) f^{(j)}(x) + o(α_n^p).
Theorem 1.1.9. (Sen, 1993) (Cramer-Wold)
Let X, X_1, X_2, ... be random vectors in R^p. Then X_n →_d X if, and only if, for every fixed λ ∈ R^p, we have

λ^T X_n →_d λ^T X,

where λ^T denotes the transpose of λ.
1.2 Estimation
Estimation refers to the process by which one makes inferences about a population,
based on information obtained from a sample, and it is a rule for calculating an
estimate of a given quantity based on observed data.
The probability density function is a fundamental concept in statistics. Consider any random variable X that has probability density function f. Specifying the function f gives a natural description of the distribution of X, and allows probabilities associated with X to be found from the relation

P(a < X < b) = ∫_a^b f(x) dx,

for any real constants a and b with a < b.
If the observed data are drawn from a distribution with an unknown probability density function, then the construction of an estimator of the unknown density function is called density estimation (Silverman, 1986). Density estimation
has experienced a wide explosion of interest over the last 40 years. It has been ap-
plied in many fields, including archeology, chemistry, banking, climatology, genetics,
economics, hydrology and physiology.
Definition 1.2.1. ( Estimator) An estimator is any statistic from the sample data
which is used to give information about an unknown parameter in the population.
Definition 1.2.2. (Le Cam, 1953)( Unbiased estimator)
Let X be a random variable with a pdf with parameter θ. Let X_1, ..., X_n be a random sample from the distribution of X and let θ_n denote an estimator of θ. We say θ_n is an unbiased estimator of θ if E(θ_n) = θ.
If θ_n is not unbiased, we say that θ_n is a biased estimator of θ.
Definition 1.2.3. (Le Cam, 1953)( Asymptotic Unbiasedness)
Let X be a random variable with a pdf with parameter θ. Let X_1, ..., X_n be a random sample from the distribution of X and let θ_n denote an estimator of θ. We say θ_n is an asymptotically unbiased estimator of θ if its expected value converges to θ as n → ∞:

lim_{n→∞} E(θ_n) = θ.

If this is not true, then θ_n is asymptotically biased.
Definition 1.2.4. (Le Cam, 1953)(Consistent Estimators)
The estimator θ_n of a parameter θ is said to be a consistent estimator if for any positive ε

lim_{n→∞} P(|θ_n − θ| ≥ ε) = 0,

or equivalently,

lim_{n→∞} P(|θ_n − θ| < ε) = 1.

We say that θ_n converges in probability to θ.
If this is not true, then θ_n is inconsistent.
There are two types of density estimation :
• Parametric Estimation.
• Nonparametric Estimation.
Parametric Estimation
The parametric estimation assumes that the sample under study has a known distribution, such as the Gaussian or Gamma distribution, and then parametric estimation can be used to estimate the missing parameters of the distribution by using the method of moments, maximum likelihood estimators, Bayes estimators, chi-square estimators, etc. For example, if the data follow a normal distribution with mean µ and variance σ², we can estimate the parameters µ and σ², substitute them into the normal distribution formula, and then obtain the estimated density function, denoted by f_n(x). Simple examples of parametric estimation are:

1. capacity factor estimates.

2. equipment factor estimates.
Nonparametric Estimation
Nonparametric estimation methods are defined in (Racine, 2008) as“statistical
techniques that do not require a researcher to specify functional forms
for the objects being estimated”. There are many nonparametric statistical objects of potential interest, including density functions (univariate and multivariate), density derivatives, conditional density functions, conditional distribution functions, regression functions, median functions, quantile functions, and variance functions. Many nonparametric problems are generalizations of univariate density estimation. There are many methods for obtaining a nonparametric estimate of a pdf. Three common methods are:

1. The histogram.

2. The naive estimator.

3. Kernel density estimation.
Histogram
The oldest and most widely used density estimator is the histogram. The histogram was formalised by (Sturges, 1926); it requires two parameters to be defined: the binwidth and the starting position of the first bin. The data range is divided into a set of successive and non-overlapping intervals (bins), and the frequencies of occurrence in the bins are plotted against the discretized data range. Given an origin x0 and a binwidth h, we define the bins of the histogram to be the intervals [x0 + mh, x0 + (m + 1)h), for positive and negative integers m. The intervals have been chosen closed on the left and open on the right for definiteness.
Definition 1.2.5. (Wand, 1995)(Histogram Estimator)
Let X1, X2, ..., Xn be a random sample from an unknown pdf f(x). The histogram estimator of the density function f(x) is defined by:

f_n(x) = (1/(nh)) × (number of X_i in the same bin as x).   (1.2.1)
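To make Equation (1.2.1) concrete, here is a small sketch (illustrative code, not part of the thesis; the sample values are made up for the example). The bin containing x is located from the origin x0 and the binwidth h, and the estimate is the bin count divided by nh:

```python
import numpy as np

def histogram_density(x, data, x0=0.0, h=1.0):
    """Histogram estimator (1.2.1): f_n(x) = (1/(n*h)) * #{X_i in the same
    bin as x}, with bins [x0 + m*h, x0 + (m+1)*h)."""
    n = len(data)
    m = np.floor((x - x0) / h)                       # index of the bin containing x
    left, right = x0 + m * h, x0 + (m + 1) * h
    count = np.sum((data >= left) & (data < right))  # bins are closed-left, open-right
    return count / (n * h)

data = np.array([0.2, 0.4, 0.45, 1.1, 1.3, 2.7])     # made-up sample
fx = histogram_density(0.3, data, x0=0.0, h=1.0)     # 3 of the 6 points lie in [0, 1)
# fx = 3 / (6 * 1) = 0.5
```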
Figure 1.1: Histograms with different binwidths for the same sample.
Figure 1.1 shows three histogram estimates of f(x) for the same sample, for different binwidths. Note that the estimates are piecewise constant and that they are strongly influenced by the choice of binwidth; see (Härdle, 2012).
Figure 1.2: The effect of the choice of origin is displayed in the five histograms.
Figure 1.2 shows five histograms of the Buffalo snowfall data with the same binwidth h = 10, but with different origins x0 = 0, 2, 4, 6, 8, and the average shifted histogram built from these five histograms (Härdle, 2012).
The histogram is a very simple form of density estimation, but it has several weaknesses:

1. The density estimate depends on the starting position of the bins. For multivariate data, the density estimate is also affected by the orientation of the bins.

2. The discontinuities of the estimate are not due to the underlying density; they are only an artifact of the selected bin locations. These discontinuities make it very difficult to comprehend the structure of the data.
3. A much more serious problem is the curse of dimensionality, since the number
of bins grows exponentially with the number of dimensions. In high dimen-
sions we would require a very large number of examples or else most of the
bins would be empty.
4. These issues make the histogram unsuitable for most practical applications
except for quick visualizations in one or two dimensions.
5. Histogram depends on the binwidth h and the origin x0.
To overcome the weaknesses of the histogram method, there is another method, the naive estimator.
The naive estimator (Silverman, 1986)
A generalization of the histogram method is the naive estimator; it is equivalent to a histogram where the estimation point x is used as the center of the bin, and the binwidth is 2h. Therefore, the naive estimator is always globally a valid pdf, i.e., it is non-negative and integrates to one. The naive estimator was proposed by (Fix and Hodges Jr, 1951).
If the random variable X has density function f, then

f(x) = lim_{h→0} (1/(2h)) P(x − h < X < x + h).   (1.2.2)

For any given h, we can estimate P(x − h < X < x + h) by the proportion of the sample falling in the interval (x − h, x + h). Thus a natural estimator f_n(x) of the density function is given by choosing a small number h and setting

f_n(x) = (1/(2nh)) [number of X_1, ..., X_n falling in (x − h, x + h)].   (1.2.3)
This estimator is called the naive estimator.
Comparing the above estimator to Equation (1.2.1), we see that it is equivalent to
a histogram.
It can be informative to rewrite Equation (1.2.3) using the weighting function W:

W(x) = { 1/2 if |x| < 1,
         0   otherwise.

Using this notation, we can express the naive estimator as

f_n(x) = (1/n) Σ_{i=1}^{n} (1/h) W((x − X_i)/h),   (1.2.4)

where the X_i are the data samples.
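The counting form (1.2.3) and the weight-function form (1.2.4) give the same value, which the following sketch checks numerically (illustrative code with a made-up sample, not part of the thesis):

```python
import numpy as np

def naive_count_form(x, data, h):
    """Naive estimator in the counting form (1.2.3)."""
    n = len(data)
    count = np.sum((data > x - h) & (data < x + h))   # X_i falling in (x-h, x+h)
    return count / (2.0 * n * h)

def naive_weight_form(x, data, h):
    """Naive estimator in the weight-function form (1.2.4), W(u) = 1/2 for |u| < 1."""
    u = np.abs((x - data) / h)
    return np.mean(np.where(u < 1.0, 0.5, 0.0)) / h

data = np.array([0.2, 0.4, 0.45, 1.1, 1.3, 2.7])      # made-up sample
f1 = naive_count_form(0.5, data, h=0.25)              # 0.4 and 0.45 lie in (0.25, 0.75)
f2 = naive_weight_form(0.5, data, h=0.25)
# both equal 2 / (2 * 6 * 0.25) = 2/3
```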
Figure 1.3: Naive estimate constructed from Old Faithful geyser data, h = 0.25.
In Figure 1.3 the stepwise nature of the estimate is clear. The boxes used to construct the estimate are placed at the observed eruption lengths, with the same binwidth but different origins (Samanta and Thavanesmaran, 1990).
The naive estimator has some disadvantages:

• It is not continuous; it has jumps at the points X_i ± h.

• It has zero derivative everywhere else.

A refinement of the naive estimator is obtained by replacing the weight function W by a kernel function K.
The Kernel Estimator
The univariate kernel density estimation (KDE) is a non-parametric way to esti-
mate the pdf f(x) of a random variable X. It is a fundamental data smoothing
problem where inferences about the population are made, based on a finite data
sample. These techniques are widely used in various inference procedures such as
signal processing, data mining, and econometrics.
Definition 1.2.6. (Wand, 1995)(Kernel Estimator)
Let X1, X2, ..., Xn be a random sample from an unknown pdf f(x). The kernel estimator of the density function f(x) was defined by (Rosenblatt, 1956), who generalized the naive estimator to the kernel form

f_n(x) = (1/n) Σ_{i=1}^{n} K_h(x − X_i) = (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h).   (1.2.5)

Here h is called the binwidth and K(·) is a kernel function, considered to be symmetric and to satisfy

∫_{−∞}^{∞} K(x) dx = 1.
The kernel estimator can be viewed as a sum of bumps placed at the observations. The kernel function K determines the shape of the bumps, while the binwidth h determines their width; see the illustration in Figure 1.4 (Fan and Yao, 2003).
Figure 1.4: Kernel density estimation based on 7 points.
From Figure 1.4, we have:

• (1) The shape of each bump is defined by the kernel function.

• (2) The spread of each bump is determined by the binwidth h, which is analogous to the binwidth of a histogram.

That is, the value of the kernel estimate at the point x is the average of the n kernel ordinates at this point.
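The bumps-average reading of (1.2.5) can be sketched as follows (illustrative code; the seven data points are invented, since the data behind Figure 1.4 are not given in the text). With the Gaussian kernel the resulting estimate is non-negative and integrates to one:

```python
import numpy as np

def kde(x, data, h):
    """Kernel estimator (1.2.5) with the Gaussian kernel: the value at x is
    the average of the n kernel bumps (each scaled by 1/h) evaluated at x."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

data = np.array([-1.2, -0.5, 0.0, 0.3, 0.7, 1.1, 2.0])  # seven invented points
grid = np.linspace(-5.0, 6.0, 2201)
fx = np.array([kde(x, data, h=0.5) for x in grid])

area = np.sum(fx) * (grid[1] - grid[0])   # numerical integral; should be close to 1
```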
1.3 Properties of the Kernels Estimator
In this section, we will discuss some fundamental properties of the kernel estimator and give some examples of common kernel functions.
We consider some properties of the kernels:

1. The kernel is a piecewise continuous function, symmetric around zero, and integrating to one, i.e.

K(x) = K(−x),   ∫_{−∞}^{∞} K(x) dx = 1.

2. The kernel function need not have bounded support, and in most applications K is a positive probability density function.
Definition 1.3.1. (Hansen, 2009) A kernel function K is said to be of order p if its first nonzero moment is µ_p, i.e. if

µ_j(K) = 0, j = 1, 2, ..., p − 1,   µ_p(K) ≠ 0,

where

µ_j(K) = ∫_{−∞}^{∞} y^j K(y) dy.   (1.3.1)
Some examples of the kernel functions and their formula are given in Table 1.1,
where I is the indicator function.
Table 1.1: Some classical kernel functions

Kernel        Equation
Epanechnikov  K(t) = (3/4)(1 − t²/5)/√5 · I(|t| ≤ √5)
Biweight      K(t) = (15/16)(1 − t²)² · I(|t| ≤ 1)
Triangular    K(t) = (1 − |t|) · I(|t| ≤ 1)
Gaussian      K(t) = (1/√(2π)) exp(−t²/2)
Rectangular   K(t) = (1/2) · I(|t| ≤ 1)
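The entries of Table 1.1 can be checked numerically: each kernel should be symmetric and integrate to one. The sketch below (illustrative, not from the thesis) verifies both properties on a fine grid:

```python
import numpy as np

# The kernels of Table 1.1 as plain functions of t (the indicator I is
# written as a boolean factor).
kernels = {
    "Epanechnikov": lambda t: 0.75 * (1 - t**2 / 5) / np.sqrt(5) * (np.abs(t) <= np.sqrt(5)),
    "Biweight":     lambda t: (15 / 16) * (1 - t**2) ** 2 * (np.abs(t) <= 1),
    "Triangular":   lambda t: (1 - np.abs(t)) * (np.abs(t) <= 1),
    "Gaussian":     lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi),
    "Rectangular":  lambda t: 0.5 * (np.abs(t) <= 1),
}

t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
# for each kernel: (numerical integral of K, whether K(t) == K(-t))
checks = {name: (float(np.sum(K(t)) * dt), bool(np.allclose(K(t), K(-t))))
          for name, K in kernels.items()}
```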
From the definitions given in Table 1.1, we can see that the choice of h determines how many values are included in estimating the density at each point. This value is called the window width or binwidth. If the window width is not specified, it is determined as

m = min( √(variance of x), (interquartile range of x) / 1.349 ),

h = 0.9 m / n^{1/5},

where x is the variable for which we wish to estimate the kernel density and n is the number of observations. Most researchers agree that the choice of kernel is not as important as the choice of binwidth. There is a great deal of literature on choosing bandwidths under different conditions; see for example (Parzen, 1962).
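The rule-of-thumb choice above can be sketched in a few lines (illustrative code, not part of the thesis; the N(0,1) sample is made up for the example):

```python
import numpy as np

def rule_of_thumb_binwidth(x):
    """h = 0.9 * m / n^(1/5), where m = min(sample standard deviation,
    interquartile range / 1.349), as in the formula above."""
    n = len(x)
    sd = np.std(x, ddof=1)                              # sqrt of the sample variance
    iqr = np.percentile(x, 75) - np.percentile(x, 25)   # interquartile range
    m = min(sd, iqr / 1.349)
    return 0.9 * m / n ** (1 / 5)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1000)
h = rule_of_thumb_binwidth(x)
# for N(0,1) data, m is close to 1, so h is roughly 0.9 / 1000**0.2, about 0.23
```

The factor 1.349 is the interquartile range of the standard normal distribution, so for Gaussian data both candidates for m estimate the same scale.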
Now, we introduce some important properties of the kernel estimator. We consider the following conditions:

(i) The unknown density function f(x) has a continuous second derivative f''(x).

(ii) The binwidth h = h_n satisfies lim_{n→∞} h = 0 and lim_{n→∞} nh = ∞.

(iii) The kernel K is a bounded probability density function of order 2, symmetric about the origin.

(iv) ∫_{−∞}^{∞} zK(z) dz = 0 and ∫_{−∞}^{∞} z²K(z) dz < ∞.
1.4 Bias and variance of the kernel density estimation
In the light of the previous properties, the bias and the variance of the kernel density estimator will be derived.

Definition 1.4.1. (Hansen, 2009) The bias of an estimator f_n(x) of a density f(x) is the difference between the expected value of f_n(x) and f(x), that is

Bias(f_n(x)) = E(f_n(x)) − f(x),

where

E(f_n(x)) = (1/n) Σ_{i=1}^{n} (1/h) E K((x − X_i)/h)
          = (1/n) Σ_{i=1}^{n} (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt
          = (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt.

The transformation z = (x − t)/h, i.e. t = x − hz, |dz/dt| = 1/h, gives

E(f_n(x)) = ∫_{−∞}^{∞} K(z) f(x − hz) dz.
Since f has a continuous second derivative, expanding f(x − hz) in a Taylor series yields

f(x − hz) = Σ_{j=0}^{2} ((−hz)^j / j!) f^{(j)}(x) + o((hz)²)
          = f(x) + (−hz) f'(x) + (h²z²/2) f''(x) + o(h²z²)
          = f(x) − hz f'(x) + (1/2)(hz)² f''(x) + o(h²),

where o(h²) represents terms that converge to zero faster than h² as h approaches zero. Thus

E(f_n(x)) = ∫_{−∞}^{∞} K(z) { f(x) − hz f'(x) + (h²z²/2) f''(x) + o(h²) } dz
          = f(x) ∫_{−∞}^{∞} K(z) dz − h f'(x) ∫_{−∞}^{∞} zK(z) dz + (h²/2) f''(x) ∫_{−∞}^{∞} z²K(z) dz + o(h²)
          = f(x) + (h²/2) µ₂(K) f''(x) + o(h²).

Hence

Bias(f_n(x)) = (1/2) h² f''(x) µ₂(K) + o(h²),   (1.4.1)

where µ₂(K) = ∫_{−∞}^{∞} z²K(z) dz.
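The leading bias term in (1.4.1) can be checked numerically. The sketch below (illustrative, not from the thesis) takes f to be the N(0,1) density and K the Gaussian kernel (so µ₂(K) = 1), computes E f_n(x) = ∫ K(z) f(x − hz) dz by numerical integration, and compares the exact bias with (h²/2) µ₂(K) f''(x):

```python
import numpy as np

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)  # f = N(0,1) density
K = phi                                                  # Gaussian kernel, mu2(K) = 1
fpp = lambda t: (t**2 - 1) * phi(t)                      # f''(t) for the N(0,1) density

x, h = 0.5, 0.05
z = np.linspace(-8.0, 8.0, 160001)
dz = z[1] - z[0]

Efn = np.sum(K(z) * phi(x - h * z)) * dz   # E f_n(x) = integral of K(z) f(x - hz) dz
exact_bias = Efn - phi(x)
approx_bias = 0.5 * h**2 * 1.0 * fpp(x)    # leading term of (1.4.1)
# the two agree up to o(h^2); here f''(x) < 0, so the bias is negative (peak region)
```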
The variance of f_n(x) is given by

Var(f_n(x)) = Var( (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h) )
            = (1/(n²h²)) Σ_{i=1}^{n} Var( K((x − X_i)/h) ),

because the X_i, i = 1, 2, ..., n, are independently distributed. Now

Var( K((x − X_i)/h) ) = E( K((x − X_i)/h)² ) − ( E K((x − X_i)/h) )²
                      = ∫_{−∞}^{∞} K((x − t)/h)² f(t) dt − ( ∫_{−∞}^{∞} K((x − t)/h) f(t) dt )²,

so

Var(f_n(x)) = (1/n) ∫_{−∞}^{∞} (1/h²) K((x − t)/h)² f(t) dt − (1/n) ( (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt )²
            = (1/n) ∫_{−∞}^{∞} (1/h²) K((x − t)/h)² f(t) dt − (1/n) ( f(x) + Bias(f_n(x)) )².

Substituting z = (x − t)/h, one obtains

Var(f_n(x)) = (1/(nh)) ∫_{−∞}^{∞} K(z)² f(x − hz) dz − (1/n)( f(x) + o(h²) )².

Applying a Taylor approximation yields

Var(f_n(x)) = (1/(nh)) ∫_{−∞}^{∞} K(z)² ( f(x) − hz f'(x) + o(h) ) dz − (1/n)( f(x) + o(h²) )².

Note that if n becomes large and h becomes small, then the above expression becomes approximately

Var(f_n(x)) = (1/(nh)) f(x) R(K) + o((nh)^{−1}),   (1.4.2)

where R(K) = ∫_{−∞}^{∞} K²(z) dz. By the assumptions above, the result holds.
From the above we have some properties of the bias and the variance:

1. The bias is of order h², which implies that f_n(x) is an asymptotically unbiased estimator.

2. The bias is large whenever the absolute value of the second derivative |f''(x)| is large. This occurs for several densities at peaks, where the bias is negative, and valleys, where the bias is positive.

3. The variance is of order (nh)^{−1}, which means that the variance converges to zero by condition (ii).
Parzen (1962) studied the statistical properties of the kernel estimator. In addition to the above, he proved several other properties. He showed that f_n(x) is a consistent estimator of f(x) and that the sequence of estimates f_n(x) is asymptotically normally distributed. Also, he proved that if the probability density function f(x) is uniformly continuous, and if lim_{n→∞} nh_n² = ∞, then f_n(x) tends uniformly (in probability) to f(x), in the sense that the next equation holds:

lim_{n→∞} P( sup_{−∞<x<∞} |f_n(x) − f(x)| < ε ) = 1, ∀ε > 0.
1.5 The MSE and MISE criteria
The important role played by the kernel density estimator makes us concerned with its performance, its efficiency and its accuracy in estimating the true density. There are two types of error criteria:
1. The mean squared error (MSE)
This type of error criteria is used to measure the error when estimating the density
function at a single point. It is defined by:
MSE {fn(x)} = E {fn(x) – f(x)}2 . (1.5.1)
The MSE measures the average squared difference between the density estimator
and the true density. In general, any function of the absolute distance |fn(x) – f(x)|
(often called metric) would serve as a measurement of the goodness of an estimator.
But MSE metric has at least two advantages:
• it is tractable analytically.
• it has an interesting decomposition into variance and squared bias provided
f(x) is not random.
For estimation at a single point x, a natural measure of discrepancy is the mean
square error (MSE) defined by:
\begin{align*}
\mathrm{MSE}(f_n(x)) &= E[f_n(x) - f(x)]^2 \\
&= E[f_n^2(x) - 2f(x)f_n(x) + f^2(x)] \\
&= E f_n^2(x) - 2f(x)E f_n(x) + f^2(x) \\
&= (E f_n(x))^2 - 2f(x)E f_n(x) + f^2(x) + E f_n^2(x) - (E f_n(x))^2 \\
&= [E f_n(x) - f(x)]^2 + \operatorname{Var}(f_n(x)) \\
&= \mathrm{Bias}^2(f_n(x)) + \operatorname{Var}(f_n(x)).
\end{align*}
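The decomposition above is an exact algebraic identity when the empirical mean is used as the centering point, so it can be verified directly by simulation. The following sketch (illustrative choices of point, bandwidth, sample size, and number of replications) estimates the MSE of a Gaussian-kernel estimator at $x_0 = 0$ for $N(0,1)$ data by Monte Carlo and checks that it equals the squared bias plus the variance:

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

random.seed(1)
x0, h, n, reps = 0.0, 0.4, 200, 300
f_true = 1.0 / math.sqrt(2.0 * math.pi)   # true N(0,1) density at x0 = 0

# Repeatedly draw samples and record f_n(x0) for each replication.
estimates = [kde(x0, [random.gauss(0.0, 1.0) for _ in range(n)], h)
             for _ in range(reps)]
mean_est = sum(estimates) / reps
mse = sum((e - f_true) ** 2 for e in estimates) / reps
bias2 = (mean_est - f_true) ** 2
var = sum((e - mean_est) ** 2 for e in estimates) / reps
print(mse, bias2 + var)   # the two quantities coincide
```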
2. The mean integrated squared error (MISE)

This criterion measures the error when estimating the density over the whole real line. The best known criterion of this type is the mean integrated squared error, which was introduced by (Rosenblatt, 1956).

Definition 1.5.1. (Ouyang, 2005). An error criterion that measures the distance between $f_n(x)$ and $f(x)$ is the integrated squared error (ISE), given by
\[
\mathrm{ISE}\{f_n(x)\} = \int_{-\infty}^{\infty} (f_n(x) - f(x))^2\,dx.
\]
Note that the ISE, being a random quantity, is not an appropriate summary over all data sets, so we prefer to analyze its expected value.

Definition 1.5.2. (Ouyang, 2005). The expected value of the ISE, called the mean integrated squared error (MISE), is given by
\[
\mathrm{MISE}(f_n(x)) = E(\mathrm{ISE}\{f_n(x)\}) = E\int_{-\infty}^{\infty} \{f_n(x) - f(x)\}^2\,dx. \tag{1.5.2}
\]
We can write the MISE as the integral of the MSE, and hence as the sum of the integrated squared bias and the integrated variance:
\[
\mathrm{MISE}(f_n(x)) = \int_{-\infty}^{\infty} \mathrm{MSE}(f_n(x))\,dx = \int_{-\infty}^{\infty} \{E f_n(x) - f(x)\}^2\,dx + \int_{-\infty}^{\infty} \operatorname{Var}(f_n(x))\,dx. \tag{1.5.3}
\]
Equation (1.5.3) gives the MISE as the sum of the integrated squared bias and the integrated variance. Substituting (1.4.1) and (1.4.2), we conclude that
\[
\mathrm{MISE}(f_n(x)) = \mathrm{AMISE}(f_n(x)) + o\{h^4 + (nh)^{-1}\}, \tag{1.5.4}
\]
where AMISE is the asymptotic mean integrated squared error, given by
\[
\mathrm{AMISE}(f_n(x)) = \tfrac{1}{4}h^4\mu_2(K)^2 R(f'') + (nh)^{-1}R(K), \tag{1.5.5}
\]
see (Wand, 1995). Notice that the integrated squared bias is asymptotically proportional to $h^4$, so to reduce this quantity one needs to take $h$ small. On the other hand, taking a small $h$ increases the integrated variance, since it is proportional to $(nh)^{-1}$. Therefore, as $n$ increases, $h$ should vary in such a way that each of the components of the MISE becomes small. This is known as the variance-bias trade-off. The trade-off between bias and variance seems to be an intrinsic part of the performance of data-based binwidth selectors: less bias seems to entail more variance, and at some cost in bias, much less variance can be obtained.
Mean integrated absolute error (MIAE)

We could also work with other criteria, for example
\[
\mathrm{MIAE}\{f_n(\cdot, h)\} = E\int_{-\infty}^{\infty} |f_n(x, h) - f(x)|\,dx.
\]
The MIAE is always defined whenever $f_n(x, h)$ is a density, and it is invariant under monotone transformations, but it is analytically more complicated.
1.6 Optimal Binwidth

The problem of selecting the binwidth is very important in kernel density estimation. The choice of an appropriate binwidth is critical to the performance of most nonparametric density estimators. When the binwidth is very small, the estimate will be very close to the original data: it will be almost unbiased, but it will have large variation under repeated sampling. If the binwidth is very large, the estimate will be very smooth, lying close to the mean of all the data. Such an estimate will have small variance, but it will be highly biased. There are many rules for binwidth selection, for example normal scale rules, over-smoothed binwidth selection rules, least squares cross-validation, biased cross-validation, estimation of density functionals and plug-in binwidth selection. For more details see (Loeve, 1960), (Silverman, 1986) and (Wand, 1995). In this section, we briefly review methods for choosing a global value of the binwidth $h$.
The problem of choosing $h$ is crucial in density estimation:

(i) A large $h$ will over-smooth the density estimate and mask the structure of the data.

(ii) A small $h$ will yield a density estimate that is spiky and very hard to interpret.

(iii) We would like to find a value of $h$ that minimizes the error between the estimated density and the true density. A natural measure is the MSE at the estimation point $x$, defined by (1.5.1). The bias of an estimate is the systematic error incurred in the estimation; the variance of an estimate is the random error incurred in the estimation.

(iv) The bias-variance dilemma applied to binwidth selection simply means that a large binwidth will reduce the differences between the estimates $f_n(x)$ for different data sets (the variance), but it will increase the bias of $f_n(x)$ with respect to the true density $f(x)$. A small binwidth will reduce the bias of $f_n(x)$, at the cost of a larger variance in the estimates $f_n(x)$.
Subjective choice

The natural way of choosing $h$ is to plot out several curves and choose the estimate that best matches one's prior (subjective) ideas. However, this method is not practical in pattern recognition, since we typically have high-dimensional data.
Reference to a standard distribution

Assume a standard density function and find the value of the binwidth that minimizes the mean integrated squared error (MISE),
\[
h_{\mathrm{MISE}} = \arg\min_h E\left[\int (f_n(x) - f(x))^2\,dx\right]. \tag{1.6.1}
\]
If we assume that the true distribution is Gaussian and we use a Gaussian kernel, the bandwidth $h$ is computed using the following equation from (Silverman, 1986):
\[
h^* = 1.06\,S\,N^{-1/5}, \tag{1.6.2}
\]
where $S$ is the sample standard deviation and $N$ is the sample size.
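Rule (1.6.2) is simple enough to state as a one-line function; the following sketch applies it to an arbitrary synthetic sample (the $N(5, 2^2)$ data and seed are illustrative choices):

```python
import random
import statistics

def silverman_bandwidth(data):
    """Normal-reference rule h* = 1.06 * S * N^(-1/5) from (1.6.2)."""
    S = statistics.stdev(data)    # sample standard deviation S
    N = len(data)
    return 1.06 * S * N ** (-1.0 / 5.0)

random.seed(2)
sample = [random.gauss(5.0, 2.0) for _ in range(500)]
print(silverman_bandwidth(sample))   # roughly 1.06 * 2 * 500^(-1/5) ≈ 0.6
```

Note that the binwidth shrinks only at the slow rate $N^{-1/5}$: quadrupling the sample size reduces $h^*$ by a factor of about $4^{-1/5} \approx 0.76$.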
The AMISE (asymptotic MISE) has some useful advantages. Its simplicity as a mathematical expression makes it useful for large sample approximations. It also exhibits the important relationship between bias and variance known as the variance-bias trade-off, and it gives us an understanding of the role of the binwidth $h$.
Figure 1.5: Kernel density estimates based on different binwidths.
Figure 1.5 shows estimates based on three binwidths: $h = 0.25$ gives the solid curve, $h = 0.5$ the dashed curve, and $h = 0.75$ the dotted curve (Salha, 2014).
Corollary 1.6.1. The AMISE-optimal binwidth, $h_{\mathrm{AMISE}}$, has the closed form
\[
h_{\mathrm{opt}} = \left[\frac{R(K)}{\mu_2(K)^2 R(f'')\, n}\right]^{1/5}. \tag{1.6.3}
\]
Proof. By differentiating (1.5.5) with respect to $h$ and setting the derivative equal to zero, we can find the optimal binwidth:
\[
\frac{d}{dh}\{\mathrm{AMISE}(f_n(x))\} = -(nh^2)^{-1}R(K) + h^3\mu_2^2(K)R(f'') = 0,
\]
\[
h^5\mu_2^2(K)R(f'') = n^{-1}R(K),
\]
\[
h_{\mathrm{opt}} = \left\{\frac{R(K)}{n\,\mu_2^2(K)R(f'')}\right\}^{1/5}.
\]
Therefore, if we substitute (1.6.3) into (1.5.5), we obtain the smallest value of the AMISE for estimating $f$ using the kernel $K$:
\[
\inf_{h>0} \mathrm{AMISE}\{f_n\} = \frac{5}{4}\left\{\mu_2(K)^2 R(K)^4 R(f'')\right\}^{1/5} n^{-4/5}. \tag{1.6.4}
\]
Notice from (1.6.3) that the optimal binwidth depends on the unknown density being estimated, so we cannot use (1.6.3) directly to find the optimal binwidth $h_{\mathrm{opt}}$. Also from (1.6.3) we can draw the following useful conclusions:

• The optimal binwidth converges to zero as the sample size increases, but at a very slow rate.

• The optimal binwidth is inversely proportional to $R(f'')^{1/5}$. Since $R(f'')$ measures the curvature of $f$, this means that for a density function with little curvature, the optimal binwidth will be large. Conversely, if the density function has large curvature, the optimal binwidth will be small.
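For densities where $R(f'')$ is known in closed form, (1.6.3) can be evaluated directly. A sketch for the Gaussian kernel, where $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, applied to the standard normal density, where $R(f'') = 3/(8\sqrt{\pi})$; the algebra collapses to $h_{\mathrm{opt}} = (4/(3n))^{1/5} \approx 1.06\,n^{-1/5}$, recovering Silverman's constant in (1.6.2):

```python
import math

def h_opt(RK, mu2K, Rf2, n):
    """AMISE-optimal binwidth (1.6.3): [R(K) / (mu2(K)^2 R(f'') n)]^(1/5)."""
    return (RK / (mu2K ** 2 * Rf2 * n)) ** 0.2

RK = 1.0 / (2.0 * math.sqrt(math.pi))     # R(K) for the Gaussian kernel
mu2K = 1.0                                # mu2(K) for the Gaussian kernel
Rf2 = 3.0 / (8.0 * math.sqrt(math.pi))    # R(f'') for the standard normal density

print(h_opt(RK, mu2K, Rf2, 500))          # (4/(3*500))^(1/5), about 1.06 * 500^(-1/5)
```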
Summary:

In this chapter, we introduced some basic definitions and theorems that will be needed in this thesis. We studied the definition of estimation and its types, gave a précis of nonparametric estimation and its common methods, and then presented the kernel density estimation of the pdf and the properties of the kernel estimator. We also presented the MSE and MISE criteria and their roles, and studied the optimal binwidth and several rules for binwidth selection. In the next chapter, we will study the kernel estimation of the mode for random design models.
Chapter 2
Kernel Estimation of The Mode
for Random Design Models
Introduction:

The mode is one of the measures of central tendency; it identifies the most frequently occurring value, that is, the value where the density function attains its maximum.

Definition 2.0.1. (Salha and Ioanides, 2007) (A mode) of a probability density $f(t)$ is a value $\theta$ which maximizes $f$. The size of the mode is $f(\theta)$:
\[
\theta = \arg\max_x f(x). \tag{2.0.1}
\]
The mode is the most frequently occurring value among the observations; the data might have more than one mode, or no mode at all, in which case it cannot be calculated. The mode can be found by calculation or graphically, and it is not affected by irregular (extreme) values. It is important to note the relation between the mean, the median, and the mode. These are the measures of location most used by investigators, because they are easy to understand. They are equal when the curve is symmetric. When the curve is positively skewed, the mean is larger than the median, and both are larger than the mode. When the curve is negatively skewed, the mode is larger than the median, and both are larger than the mean.
Strengths of the mode (Le Cam, 1953)
• Very quick and easy to determine.
• Is an actual value of the data.
• Not affected by extreme scores.
Weaknesses of the mode
• Sometimes not very informative.
• Can change dramatically from sample to sample.
• Might be more than one.
The problem of estimating the mode of a pdf has received considerable attention in the literature. The study of nonparametric mode estimation is now four decades old, having roots in many papers. In the last few years, an increasing interest in this topic can be observed.
In this chapter, we will study the mode estimation for random design models and
the conditional mode for independent data. This chapter consists of two sections.
In Section 2.1, we introduce the kernel estimation of the mode. In the next section,
we present the kernel estimation of the conditional mode for random design models.
2.1 The kernel estimation of the mode

The problem of estimating the location and size of the mode is considered via kernel density estimates. In this section, we study Parzen's kernel mode estimator and its generalization.
2.1.1 Parzen’s Mode
We present the kernel estimation of the mode. Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with pdf $f(x)$. Assume that $f(x)$ is uniformly continuous in $x$. It follows that $f(x)$ possesses a mode $\theta$, which is defined by
\[
f(\theta) = \max_x f(x);
\]
we assume that $\theta$ is unique.

The classical procedure to estimate the mode is as follows. If $f(x)$ is the unknown function and $\theta$ is the mode of $f(x)$, then $\theta$ is estimated by the location $\theta_n$ that maximizes the estimator $f_n(x)$ of $f(x)$. Suppose that $f_n(x)$ is a continuous function and tends to $0$ as $|x|$ tends to $\infty$. Then there is a random variable $\theta_n$ such that
\[
\theta_n = \arg\max_x f_n(x), \tag{2.1.1}
\]
where
\[
f_n(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right). \tag{2.1.2}
\]
We call $\theta_n$ the sample mode.
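The sample mode (2.1.1) can be approximated in practice by maximizing $f_n$ over a finite grid, as discussed below. A minimal sketch, with an arbitrary Gaussian kernel, grid step, sample, seed, and bandwidth:

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def sample_mode(data, h, grid_step=0.05):
    """theta_n = argmax_x f_n(x) from (2.1.1), approximated over a finite grid."""
    lo, hi = min(data), max(data)
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    return max(grid, key=lambda x: kde(x, data, h))

random.seed(3)
sample = [random.gauss(2.0, 1.0) for _ in range(1000)]
print(sample_mode(sample, h=0.4))   # near the true mode 2.0
```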
(Parzen, 1962) considered $\theta_n$ as an estimator of the population mode $\theta$. He established conditions under which $\theta_n$ is a consistent estimate, in the sense that
\[
\lim_{n\to\infty} P\big[|\theta_n - \theta| \le \varepsilon\big] = 1, \quad \forall\, \varepsilon > 0. \tag{2.1.3}
\]
The estimator (2.1.1) is increasingly used, although it is difficult to calculate. Indeed, in addition to the calculation of $f_n$, it involves a numerical step for the computation of the arg max. As noticed by (Devroye, 1979), classical search methods for the arg max perform satisfactorily only when $f_n$ is sufficiently regular. Thus in practice, the arg max is usually computed over a finite grid, although this may affect the asymptotic properties of the estimator. Moreover, when the dimension of the sample space is large, or when accurate estimation is needed, the grid size, increasing exponentially with the dimension, leads to time-consuming computations.
Finally, the search grid should be located around high density areas. In high dimension, this is a difficult task, and the search grid usually includes low density areas. To solve this problem, (Abraham et al., 2003) proposed a concurrent estimator of the mode, $\theta_n^*$, defined by
\[
\theta_n^* = \arg\max_{x \in S_n} f_n(x), \tag{2.1.4}
\]
where $S_n = \{X_1, X_2, \ldots, X_n\}$ is a finite sample of $d$-dimensional data. The main advantage of using $\theta_n^*$ instead of $\theta_n$ is that the former is easily computed in a finite number of operations. Moreover, since the sample points are naturally concentrated in high density areas, the set $S_n$ can be regarded as the most natural random grid for approximating the mode.
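The estimator (2.1.4) replaces the search grid by the sample itself, so it needs no choice of grid at all. A minimal sketch (Gaussian kernel; the sample, seed, and bandwidth are illustrative):

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def sample_point_mode(data, h):
    """theta*_n = argmax_{x in S_n} f_n(x) from (2.1.4): maximize over the sample."""
    return max(data, key=lambda x: kde(x, data, h))

random.seed(4)
sample = [random.gauss(-1.0, 0.5) for _ in range(500)]
print(sample_point_mode(sample, h=0.3))   # near the true mode -1.0
```

Since each of the $n$ evaluations of $f_n$ costs $O(n)$, the whole computation is $O(n^2)$ but requires no tuning beyond the bandwidth.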
(Abraham et al., 2003) established the strong consistency of $\theta_n^*$ towards $\theta_n$ and provided an almost sure rate of convergence without any differentiability condition on $f$ around the mode. (Abraham et al., 2004) examined whether maximization over a finite sample alters the rate of convergence of the estimate $\theta_n^*$ compared to that of the estimate $\theta_n$. They proved that the two estimates have the same asymptotic behavior. Another use of computing $\theta_n^*$ is that it may be an appropriate choice for a starting value of an optimization algorithm to approximate $\theta_n$.
To achieve asymptotic normality of $\theta_n$, and therefore to be able to construct asymptotic confidence intervals for $\theta$, it is generally believed that rather heavy smoothing conditions are needed. (Parzen, 1962) gave conditions under which the sample mode $\theta_n$ is asymptotically normally distributed.
Theorem 2.1.1. (Parzen, 1962) (Asymptotic normality of the sample mode.)
Let $K(x)$ be a probability density with characteristic function $k(u)$, and let the density $f(x)$ have characteristic function $\varphi(u)$. If for some $r \ge 2$
\[
\lim_{u\to 0} \frac{1 - k(u)}{u^r} > 0,
\]
and if, for some $\delta$, $\tfrac{1}{2} < \delta < 1$,

1. $\int_{-\infty}^{\infty} |u|^{2+\delta}|\varphi(u)|\,du < \infty$,

2. $\int_{-\infty}^{\infty} |u|^{2+\delta}|k(u)|\,du < \infty$,

3. $\lim_{n\to\infty} nh^{5+2\delta} = 0$,

4. $\lim_{n\to\infty} nh^6 = \infty$,

then
\[
(nh^3)^{1/2}(\theta_n - \theta) \longrightarrow N\left(0,\ \frac{f(\theta)}{[f''(\theta)]^2}\int_{-\infty}^{\infty} (K'(y))^2\,dy\right). \tag{2.1.5}
\]
Proof. See (Parzen, 1962) theorem (5A).
In this section, the conditions under which the asymptotic properties of the kernel estimator of the pdf and of its mode are obtained are stated. For more details see (Parzen, 1962) and (Salha and Ioanides, 2007).

Condition 1

1. The density function $f(x)$ is uniformly continuous,

2. $\lim_{x\to\pm\infty} f(x) = 0$,

3. $f(x)$ has a continuous second derivative.

Condition 2

The bandwidth $h$ is a function of $n$ such that

1. $\lim_{n\to\infty} h = 0$,

2. $\lim_{n\to\infty} nh = \infty$,

3. $\lim_{n\to\infty} nh^2 = \infty$.

Condition 3

Let $K(x)$ be a Borel function satisfying the conditions

1. $K(x)$ is twice differentiable,

2. $\sup_{-\infty<x<\infty} |K(x)| < \infty$,

3. $\int_{-\infty}^{\infty} |K(x)|\,dx < \infty$,

4. $\lim_{x\to\infty} |xK(x)| = 0$,

5. $\int_{-\infty}^{\infty} K(x)\,dx = 1$,

6. $\lim_{x\to\infty} x^2 K(x) = c$.

Under the previous conditions, the kernel density estimator is asymptotically unbiased and consistent.
First, the following theorem (2.1.2) is important for proving the unbiasedness and consistency of the kernel density estimator.

Theorem 2.1.2. (Bochner's Theorem). Suppose $K(y)$ is a Borel function satisfying the conditions:

1. $\sup_{-\infty<y<\infty} |K(y)| < \infty$,

2. $\int_{-\infty}^{\infty} |K(y)|\,dy < \infty$,

3. $\lim_{|y|\to\infty} |yK(y)| = 0$,

and let $g(y)$ satisfy
\[
\int_{-\infty}^{\infty} |g(y)|\,dy < \infty.
\]
Let $h$ be a sequence of positive constants satisfying $\lim_{n\to\infty} h = 0$, and define
\[
g_n(x) = \frac{1}{h}\int_{-\infty}^{\infty} K\left(\frac{y}{h}\right) g(x - y)\,dy.
\]
Then at every point $x$ of continuity of $g(\cdot)$,
\[
\lim_{n\to\infty} g_n(x) = g(x)\int_{-\infty}^{\infty} K(y)\,dy.
\]
Proof. First note that
\[
g_n(x) - g(x)\int_{-\infty}^{\infty} K(y)\,dy = \int_{-\infty}^{\infty} \big(g(x - y) - g(x)\big)\frac{1}{h}K\left(\frac{y}{h}\right)dy.
\]
Let $\delta > 0$, and split the region of integration into the two regions $|y| \le \delta$ and $|y| > \delta$. Then
\begin{align*}
\left|g_n(x) - g(x)\int_{-\infty}^{\infty} K(y)\,dy\right|
&\le \max_{|y|\le\delta}|g(x - y) - g(x)| \int_{|z|\le \delta/h} |K(z)|\,dz \\
&\quad + \int_{|y|\ge\delta} |g(x - y)|\,\frac{1}{|y|}\left|\frac{y}{h}K\left(\frac{y}{h}\right)\right| dy
+ |g(x)|\int_{|y|\ge\delta} \frac{1}{h}\left|K\left(\frac{y}{h}\right)\right| dy \\
&\le \max_{|y|\le\delta}|g(x - y) - g(x)| \int_{-\infty}^{\infty} |K(z)|\,dz \\
&\quad + \frac{1}{\delta}\sup_{|z|\ge \delta/h}|zK(z)| \int_{-\infty}^{\infty} |g(y)|\,dy
+ |g(x)|\int_{|z|\ge \delta/h} |K(z)|\,dz,
\end{align*}
which tends to $0$ as $n$ tends to $\infty$; then let $\delta$ tend to $0$.
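Bochner's theorem can be illustrated numerically: with a kernel integrating to one, the smoothed function $g_n(x)$ approaches $g(x)$ as $h \to 0$. A sketch with an arbitrary choice $g(y) = e^{-|y|}$, a Gaussian kernel, and a simple Riemann-sum quadrature (step size and integration range are illustrative):

```python
import math

def K(y):
    """Gaussian kernel; integrates to 1."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def g(y):
    """An integrable continuous function; g(0) = 1."""
    return math.exp(-abs(y))

def g_n(x, h, step=0.001, half_width=10.0):
    """g_n(x) = (1/h) * integral K(y/h) g(x - y) dy, by a Riemann sum."""
    total, y = 0.0, -half_width
    while y <= half_width:
        total += K(y / h) * g(x - y) * step
        y += step
    return total / h

for h in (1.0, 0.5, 0.1):
    print(h, g_n(0.0, h))   # increases toward g(0) = 1 as h shrinks
```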
Theorem 2.1.3. Under the conditions of the last theorem (2.1.2), if $h$ is a function of $n$ satisfying
\[
\lim_{n\to\infty} nh^2 = \infty, \qquad \lim_{n\to\infty} E[f_n(x)] = f(x),
\]
and if the probability density function $f(x)$ is uniformly continuous, then for every $\varepsilon > 0$,
\[
P\big[\sup_x |f_n(x) - f(x)| < \varepsilon\big] \longrightarrow 1 \quad \text{as } n \longrightarrow \infty.
\]
Proof. To prove this theorem, we want to show that
\[
\lim_{n\to\infty} E^{1/2}\Big[\sup_{-\infty<x<\infty} |f_n(x) - f(x)|^2\Big] = 0. \tag{2.1.6}
\]
Since $\lim_{n\to\infty} E[f_n(x)] = f(x)$, it suffices to show that
\[
E^{1/2}\Big[\sup_{-\infty<x<\infty} |f_n(x) - E[f_n(x)]|^2\Big] \longrightarrow 0 \tag{2.1.7}
\]
as $n \to \infty$, since by theorem (2.1.2) it follows that
\[
\lim_{n\to\infty} \sup_{-\infty<x<\infty} |E[f_n(x)] - f(x)| = 0.
\]
Now
\[
f_n(x) = (2\pi)^{-1}\int_{-\infty}^{\infty} e^{-iux} k(uh)\varphi_n(u)\,du,
\]
where $k$ is the characteristic function of the kernel $K$ and $\varphi_n$ is the empirical characteristic function. Then
\begin{align*}
\sup_{-\infty<x<\infty} |f_n(x) - E[f_n(x)]|
&\le (2\pi)^{-1}\left|\int_{-\infty}^{\infty} e^{-iux}k(uh)\varphi_n(u)\,du - \int_{-\infty}^{\infty} e^{-iux}k(uh)E[\varphi_n(u)]\,du\right| \\
&\le (2\pi)^{-1}\int_{-\infty}^{\infty} \big|e^{-iux}k(uh)\{\varphi_n(u) - E[\varphi_n(u)]\}\big|\,du \\
&= (2\pi)^{-1}\int_{-\infty}^{\infty} |k(uh)|\,|\varphi_n(u) - E[\varphi_n(u)]|\,du \quad (\text{since } |e^{-iux}| = 1).
\end{align*}
Therefore, by Minkowski's inequality, the quantity in (2.1.7) is no greater than
\[
(2\pi)^{-1}\int_{-\infty}^{\infty} |k(uh)|\,\sigma[\varphi_n(u)]\,du \le (n^{1/2}h)^{-1}\int_{-\infty}^{\infty} |k(u)|\,du,
\]
which tends to $0$ since $nh^2 \to \infty$. The proof of this theorem is complete.
2.1.2 Generalization of Parzen’s mode estimation.
(Samanta, 1973) and (Konakov, 1974) have given multivariate versions of Parzen's results. The problem of estimating the mode of a pdf is a matter of both theoretical and practical interest. This problem was first considered by (Parzen, 1962) in the univariate situation. He showed that under certain regularity conditions the estimate of the population mode obtained by maximizing an estimate of the true pdf is consistent and asymptotically normal. The strongest result in this direction is due to (Nadaraya, 1965), who proved that under certain regularity conditions the estimate converges to the theoretical mode with probability one.

Let $X_j = (X_{1j}, X_{2j}, \ldots, X_{kj})$, $j = 1, 2, \ldots, n$, be $n$ i.i.d. $k$-dimensional random variables with an unknown common distribution function $F(x)$, which is assumed to be absolutely continuous with density function $f(x)$, where $x = (x_1, x_2, \ldots, x_k)$. Let $K_1, K_2, \ldots, K_k$ be $k$ univariate pdfs and $\{h\}$ be a sequence of positive numbers converging to zero. We consider an estimate $f_n(x)$ of $f(x)$ defined by

1. $f_n(x) = \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} h^{-k}\prod_{i=1}^{k} K_i\left(\dfrac{x_i - X_{ij}}{h}\right)$.

It is clear that $f_n(x)$ can be expressed as

2. $f_n(x) = \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h^{-k}\left\{\prod_{i=1}^{k} K_i\left(\dfrac{x_i - u_i}{h}\right)\right\} dF_n(u)$, where $u = (u_1, u_2, \ldots, u_k)$ and $F_n(u)$ is the sample distribution function defined by
\[
F_n(u) = \frac{1}{n}\sum_{j=1}^{n}\prod_{i=1}^{k} I(u_i - X_{ij}),
\]
where $I(x - y) = 1$ for $y \le x$ and vanishes for $y > x$.

3. $f^{(j_1, j_2, \ldots, j_k)}(x) = \dfrac{\partial^{\,j_1+j_2+\cdots+j_k} f(x)}{\partial x_1^{j_1}\,\partial x_2^{j_2}\cdots\partial x_k^{j_k}}$.

4. $f_n^{(j_1, j_2, \ldots, j_k)}(x) = \dfrac{\partial^{\,j_1+j_2+\cdots+j_k} f_n(x)}{\partial x_1^{j_1}\,\partial x_2^{j_2}\cdots\partial x_k^{j_k}}
= \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} h^{-(k+j_1+j_2+\cdots+j_k)}\prod_{i=1}^{k} K_i^{(j_i)}\left(\dfrac{x_i - X_{ij}}{h}\right)
= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h^{-(k+j_1+j_2+\cdots+j_k)}\left\{\prod_{i=1}^{k} K_i^{(j_i)}\left(\dfrac{x_i - u_i}{h}\right)\right\} dF_n(u)$.
Suppose the density function $f(x)$ is uniformly continuous in $x$. Then there exists a vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ such that $f(\theta) = \max_x f(x)$. We assume that $\theta$ is unique.
We next assume that for each $i$, $K_i(t)$ is continuous in $t$ and $K_i(t)$ approaches zero as $|t|$ goes to infinity. Therefore, there exists a $k$-dimensional random variable $\theta_n = (\theta_{1n}, \theta_{2n}, \ldots, \theta_{kn})$ such that $f_n(\theta_n) = \max_x f_n(x)$. If for each $i$, $K_i(t)$ is chosen to be twice differentiable, then
\[
f_n^{(1,0,0,\ldots,0)}(\theta_n) = f_n^{(0,1,0,\ldots,0)}(\theta_n) = \cdots = f_n^{(0,0,0,\ldots,1)}(\theta_n) = 0.
\]
It will be shown in Theorem (2.1.4) that under certain regularity conditions $\theta_n$ converges to $\theta$ with probability one. Furthermore, $(nh^{k+2})^{1/2}(\theta_n - \theta)$ converges in distribution to a $k$-dimensional normal random variable with mean vector $0$ and covariance matrix $C^{-1}\Lambda C^{-1}$, where $C = ((c_{rs}))$, $\Lambda = ((\lambda_{rs}))$, $c_{rs} = \dfrac{\partial^2 f(x)}{\partial x_r \partial x_s}\Big|_{x=\theta}$, and $C^{-1}$ is the inverse of $C$, with
\[
\lambda_{rs} =
\begin{cases}
f(\theta)\displaystyle\int_{-\infty}^{\infty}\{K_r'(t)\}^2\,dt \prod_{j=1,\, j\ne r}^{k}\int_{-\infty}^{\infty} K_j^2(t)\,dt, & \text{if } r = s, \\[2ex]
f(\theta)\displaystyle\left\{\int_{-\infty}^{\infty} K_r(t)K_r'(t)\,dt\right\}\left\{\int_{-\infty}^{\infty} K_s(t)K_s'(t)\,dt\right\}\prod_{j=1,\, j\ne r,s}^{k}\int_{-\infty}^{\infty} K_j^2(t)\,dt, & \text{if } r \ne s.
\end{cases}
\]
We now state the conditions under which the asymptotic properties of the estimate proposed in this section will be proved.

(i) $f^{(j_1, j_2, \ldots, j_k)}(x)$ are uniformly continuous for $0 \le j_1 + j_2 + \cdots + j_k \le 2$.

(ii) The vector $\theta$ defined by $f(\theta) = \max_x f(x)$ is unique.

(iii) $K_j^{(i)}$ are functions of bounded variation for $j = 1, 2, \ldots, k$ and $i = 0, 1, 2$.

(iv) $h = n^{-\delta}$, $0 < \delta < \dfrac{1}{2k+4}$.
Theorem 2.1.4. Under conditions (i), (ii), (iii), and (iv),
\[
\lim_{n\to\infty} \theta_n = \theta
\]
with probability one.

Theorem 2.1.5. Under conditions (i), (ii), (iii), and (iv), $\theta_n$ is asymptotically normally distributed with mean vector $\theta$ and covariance matrix $\dfrac{C^{-1}\Lambda C^{-1}}{nh^{k+2}}$.

Proof. See (Samanta, 1973) for more details.

We note that for $k = 1$, Theorem (2.1.5) reduces to Theorem (2.1.1) (Theorem (5A) of Parzen (1962)).
2.2 The kernel estimation of the conditional mode for random design models

Conditional density functions underlie many popular statistical objects of interest, though they are rarely modeled directly in the parametric setting and have perhaps received even less attention in the kernel setting. Nevertheless, as will be seen, they are extremely useful for a range of tasks, whether directly estimating the conditional density function, or modeling conditional quantiles via estimation of a conditional CDF. And, of course, regression analysis (i.e., modeling conditional means) depends directly on the conditional density function. Indeed, estimating the conditional density is actually much more informative, since it allows us not only to recover the conditional expected value $E(Y|X)$ and the conditional variance from the density, but also to describe the general shape of the conditional density. Several nonparametric methods can be proposed for estimating the conditional density function based on data $(X_i, Y_i)$. A class of kernel-type estimators, called the Nadaraya-Watson estimator, is one of the most widely known and used for estimating the conditional density function. A lot of research has been carried out on conditional distribution function estimation. Conditional density estimation was introduced by (Rosenblatt, 1956). (Fan, 1993) proposed a direct estimator based on local polynomial estimation. The NW estimator was created by the independent researchers (Watson, 1964) and (Nadaraya, 1964). In this section, we investigate a nonparametric method for estimating the conditional density function $f(y|x)$ of a random variable $Y$ given a random variable $X$, which is called the Nadaraya-Watson estimator.
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be $\mathbb{R}\times\mathbb{R}$-valued independent random variables with a common pdf $f(x, y)$. Also assume that $X$ admits a marginal density $g(x)$. Suppose that we are given $n$ observations of $(X, Y)$, denoted by $(X_1, Y_1), \ldots, (X_n, Y_n)$. First, we consider the following estimator of the joint density $f(x, y)$ of $(X, Y)$:
\[
f_n(x, y) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i),
\]
and define the estimator of the marginal pdf of $X$ as
\[
g_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i),
\]
where $K_h(x) = \frac{1}{h}K(x/h)$.
The Nadaraya-Watson estimator of the conditional density function $f(y|x)$ is given by
\[
f_n(y|x) = \frac{f_n(x, y)}{g_n(x)}
= \frac{n^{-1}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)}{n^{-1}\sum_{i=1}^{n} K_h(x - X_i)}
= \frac{\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)}{\sum_{i=1}^{n} K_h(x - X_i)}.
\]
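The ratio above translates directly into code. A minimal sketch with a Gaussian kernel and a single bandwidth $h$ for both variables; the synthetic model $Y = 2X + N(0, 0.5^2)$, the seed, and the evaluation points are illustrative choices, not from the thesis:

```python
import math
import random

def Kh(u, h):
    """Scaled kernel K_h(u) = K(u/h)/h with Gaussian K."""
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def nw_conditional_density(y, x, data, h):
    """f_n(y|x) = sum_i K_h(x-X_i)K_h(y-Y_i) / sum_i K_h(x-X_i)."""
    num = sum(Kh(x - xi, h) * Kh(y - yi, h) for xi, yi in data)
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return num / den

random.seed(5)
data = [(x, 2.0 * x + random.gauss(0.0, 0.5))
        for x in (random.random() for _ in range(1000))]
h = 0.15
# At x = 0.5 the conditional density of Y is centered near y = 1.0.
print(nw_conditional_density(1.0, 0.5, data, h))
```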
Now, to estimate $m(\cdot)$, we first compute an estimator of the joint density $f(x, y)$ of $(X, Y)$, and then integrate it according to the formula
\[
m(x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{\int_{-\infty}^{\infty} f(x, y)\,dy}. \tag{2.2.1}
\]
The conditional mean estimator of $Y$ given $X = x$, $m_n(x)$, is defined as follows:
\[
m_n(x) = \int_{-\infty}^{\infty} y f_n(y|x)\,dy = \frac{\int_{-\infty}^{\infty} y f_n(x, y)\,dy}{\int_{-\infty}^{\infty} f_n(x, y)\,dy}. \tag{2.2.2}
\]
We want to investigate the prediction of $Y$ by the mode function
\[
\theta(x) = \arg\max_{y\in\mathbb{R}} f(y|x), \quad x \in \mathbb{R}, \tag{2.2.3}
\]
where $f(y|x)$ denotes the conditional density of $Y$ given $X$. We call $\theta(x)$ the population conditional mode, or the mode function.

Definition 2.2.1. (Samanta and Thavanesmaran, 1990) (The estimated conditional mode).
The estimated conditional mode is defined as the maximizer of $f_n(y|x)$ over $y \in \mathbb{R}$:
\[
\theta_n(x) = \arg\max_{y\in\mathbb{R}} f_n(y|x), \quad x \in \mathbb{R}. \tag{2.2.4}
\]
We call $\theta_n(x)$ the sample conditional mode. (Samanta and Thavanesmaran, 1990) considered $\theta_n(x)$ as an estimate of $\theta(x)$. The corresponding result for the multivariate case, at distinct points $x_1, x_2, \ldots, x_k$, was discussed by (Salha and Ioanides, 2007).
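In practice, (2.2.4) is computed by maximizing $f_n(y|x)$ over a grid of $y$ values, just as for the unconditional mode. A minimal sketch (Gaussian kernels, one bandwidth for both variables; the model $Y = \sin(2\pi X) + N(0, 0.3^2)$, seed, grid step, and evaluation point are illustrative choices):

```python
import math
import random

def Kh(u, h):
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def f_cond(y, x, data, h):
    """Nadaraya-Watson estimate of the conditional density f(y|x)."""
    num = sum(Kh(x - xi, h) * Kh(y - yi, h) for xi, yi in data)
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return num / den

def conditional_mode(x, data, h, grid_step=0.02):
    """theta_n(x) = argmax_y f_n(y|x) from (2.2.4), over a grid of y values."""
    ys = [yi for _, yi in data]
    lo, hi = min(ys), max(ys)
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    return max(grid, key=lambda y: f_cond(y, x, data, h))

random.seed(6)
data = [(x, math.sin(2.0 * math.pi * x) + random.gauss(0.0, 0.3))
        for x in (random.random() for _ in range(800))]
# Near x = 0.25 the conditional density of Y peaks around sin(pi/2) = 1.
print(conditional_mode(0.25, data, h=0.1))
```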
Lemma 2.2.2. Under the formulas of $f_n(x, y)$ and $f(x, y)$, we have

(1) $\displaystyle\int_{-\infty}^{\infty} f_n(x, y)\,dy = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)$.

(2) $\displaystyle\int_{-\infty}^{\infty} y f_n(x, y)\,dy = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)Y_i$.
Proof. (1)
\begin{align*}
\int_{-\infty}^{\infty} f_n(x, y)\,dy
&= \int_{-\infty}^{\infty} \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)\,dy \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} K_h(y - Y_i)\,dy, \quad \text{since } K_h(u) = K(u/h)/h, \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} \frac{1}{h}K\left(\frac{y - Y_i}{h}\right)dy. \tag{2.2.5}
\end{align*}
Let $u = \dfrac{y - Y_i}{h}$; then $du = \dfrac{1}{h}dy$. Substituting in (2.2.5), we have
\begin{align*}
\int_{-\infty}^{\infty} f_n(x, y)\,dy
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i).
\end{align*}
Proof. (2)
\begin{align*}
\int_{-\infty}^{\infty} y f_n(x, y)\,dy
&= \int_{-\infty}^{\infty} y\,\frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)\,dy \tag{2.2.6} \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} y\,K_h(y - Y_i)\,dy. \tag{2.2.7}
\end{align*}
Let $u = y - Y_i$; then $du = dy$. Substituting in (2.2.7), we have
\begin{align*}
\int_{-\infty}^{\infty} y f_n(x, y)\,dy
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} (u + Y_i)K_h(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\left(\int_{-\infty}^{\infty} u\,K_h(u)\,du + Y_i\int_{-\infty}^{\infty} K_h(u)\,du\right) \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)Y_i.
\end{align*}
If we substitute these into the numerator and denominator of (2.2.2), we obtain the Nadaraya-Watson kernel estimator of $m(\cdot)$:
\[
m_n(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i)Y_i}{\sum_{i=1}^{n} K_h(x - X_i)} = \sum_{i=1}^{n} w_{ni}(x)Y_i,
\]
where
\[
w_{ni}(x) = \frac{K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}, \quad i = 1, \ldots, n,
\]
are the weight functions. The bandwidth $h$ determines the degree of smoothness of $m_n(\cdot)$. This can be immediately seen by considering the limits as $h$ tends to zero or to infinity, respectively.
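The weighted-average form of the Nadaraya-Watson regression estimator is a few lines of code. A minimal sketch (Gaussian kernel; the synthetic model $Y = X^2 + N(0, 0.1^2)$, seed, bandwidth, and evaluation point are illustrative choices):

```python
import math
import random

def Kh(u, h):
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def nw_regression(x, data, h):
    """m_n(x) = sum_i K_h(x-X_i) Y_i / sum_i K_h(x-X_i) = sum_i w_ni(x) Y_i."""
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return sum(Kh(x - xi, h) / den * yi for xi, yi in data)

random.seed(7)
data = [(x, x ** 2 + random.gauss(0.0, 0.1))
        for x in (random.random() for _ in range(1000))]
print(nw_regression(0.5, data, h=0.1))   # close to m(0.5) = 0.25
```

Note how the two limits mentioned above appear in the code: as $h \to 0$ the weights concentrate on the nearest $X_i$, reproducing the data, while as $h \to \infty$ all weights tend to $1/n$ and $m_n(x)$ tends to the sample mean of the $Y_i$.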
2.2.1 Asymptotic properties of the conditional mode estimation

We consider the following assumptions from (Samanta and Thavanesmaran, 1990) and (Samanta, 1973); the corresponding result for the multivariate case at distinct points was discussed by (Salha and Ioanides, 2007).
(A1) $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample of i.i.d. random variables with joint pdf $f(x, y)$, where the following hold:

(i) The marginal probability density function of $X$, $g(x)$, is uniformly continuous.

(ii) $f^{(i,j)}(x, y) = \dfrac{\partial^{i+j} f(x, y)}{\partial x^i \partial y^j}$ exist and are bounded for $1 \le i + j \le 4$.

(A2) The kernel $K$ is a Borel function and satisfies the following:

(i) $K(u)$ tends to zero as $u$ tends to $\pm\infty$.

(ii) $K(u)$ and its first two derivatives are functions of bounded variation.

(iii) $\lim_{|u|\to\infty} |u^2 K^{(i)}(u)| = 0$, $(i = 0, 1)$.

(iv) $\int_{-\infty}^{\infty} u^i K(u)\,du = 1$ if $i = 0$, and $\int_{-\infty}^{\infty} u^i K(u)\,du = 0$ if $i = 1, 2$.

(v) $\int_{-\infty}^{\infty} |u|^3 K(u)\,du < \infty$.

(A3) $h$ is a sequence of positive numbers tending to zero and satisfying
\[
\lim_{n\to\infty} nh^8 = \infty, \qquad \lim_{n\to\infty} nh^{10} = 0.
\]
Lemma 2.2.3. Under the assumptions (A1)(ii), (A2) and (A3), if $(x, y) \in C(f)$, then the following are true:

(i) $\lim_{n\to\infty} nh^4\,\operatorname{Var}\big[f_n^{(0,1)}(x, y)\big] = f(x, y)\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big(K(u)K^{(1)}(v)\big)^2\,du\,dv$.

(ii) $(nh^4)^{1/2}\big\{E f_n^{(0,1)}(x, y) - f^{(0,1)}(x, y)\big\} = o(1)$.
Lemma 2.2.4. Under the assumptions of Lemma (2.2.3), the following is true:
\[
\lim_{n\to\infty} (n^{-1}h^4)^{1+\delta/2}\left[\sum_{i=1}^{n} E\big|w_{ni} - E\{w_{ni}\}\big|^{2+\delta}\right] = 0.
\]
For fixed $x$, expanding $f_n^{(0,1)}(x, \theta_n(x))$ around $\theta(x)$, we obtain
\[
0 = f_n^{(0,1)}(x, \theta_n(x)) = f_n^{(0,1)}(x, \theta(x)) + (\theta_n(x) - \theta(x))f_n^{(0,2)}(x, \theta_n^*(x)),
\]
where $|\theta_n^*(x) - \theta(x)| < |\theta_n(x) - \theta(x)|$. Hence,
\[
\theta_n(x) - \theta(x) = -\frac{f_n^{(0,1)}(x, \theta(x))}{f_n^{(0,2)}(x, \theta_n^*(x))}. \tag{2.2.8}
\]
Lemma 2.2.5. Under the assumptions (A1), (A2)(ii), (iii) and (A3), if $g(x) > 0$, then $f_n^{(0,2)}(x, \theta_n^*(x))$ converges in probability to $f^{(0,2)}(x, \theta(x))$ as $n$ tends to infinity.
Theorem 2.2.6. Suppose that $x_1, x_2, \ldots, x_k$ are distinct points, where $f(x_i, y) > 0$ and $(x_i, y) \in C(f)$, $(i = 1, 2, \ldots, k)$. Then under the assumptions (A1), (A2)(ii), (iii), and (A3), the distribution of the vector
\[
(nh^4)^{1/2}\big\{f_n^{(0,1)}(x_1, y) - f^{(0,1)}(x_1, y),\ \ldots,\ f_n^{(0,1)}(x_k, y) - f^{(0,1)}(x_k, y)\big\}^T,
\]
where $T$ denotes the transpose, is asymptotically multivariate normal with mean zero vector and diagonal covariance matrix $\Gamma = [\gamma_{ij}]$, where
\[
\gamma_{ii} = f(x_i, y)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big\{K(u)K^{(1)}(v)\big\}^2\,du\,dv, \quad (i = 1, 2, \ldots, k),
\]
and $K^{(1)}(v)$ denotes the first derivative of $K(v)$.
Proof. Without loss of generality, we consider the special case $k = 2$; the same arguments apply in the general case. Before we start the proof of the theorem, we introduce some notation. For $i = 1, 2, \ldots, n$ and $s = 1, 2$ define:
\[
V_{ni}(x_s) = h^{-3}K\left(\frac{x_s - X_i}{h}\right)K^{(1)}\left(\frac{y - Y_i}{h}\right),
\]
\[
W_{ni}(x_s) = h^2\big(V_{ni}(x_s) - E V_{ni}(x_s)\big), \qquad W_n(x_s) = \sum_{i=1}^{n} W_{ni}(x_s),
\]
\[
Z_{ni} = (W_{ni}(x_1), W_{ni}(x_2))^T, \qquad Z_n = n^{-\frac{1}{2}}(W_n(x_1), W_n(x_2))^T,
\]
so that
\[
Z_n = (nh^4)^{\frac{1}{2}}\big(f_n^{(0,1)}(x_1, y) - E f_n^{(0,1)}(x_1, y),\ f_n^{(0,1)}(x_2, y) - E f_n^{(0,1)}(x_2, y)\big)^T. \tag{2.2.9}
\]
Let $A = [a_{rs}]$ be a $2\times 2$ diagonal matrix with
\[
a_{ss} = f(x_s, y)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(K(u)K^{(1)}(v)\big)^2\,du\,dv,
\]
and let $Z$ be a bivariate normally distributed random variable with mean zero vector and covariance matrix $A$.

First we will show that $Z_n$ converges in distribution to $Z$. To do that, we will use the multivariate version of the Cramer-Wold theorem: it is sufficient to prove that $CZ_n^T$ converges in distribution to $CZ^T$ for any constant $C = (c_1, c_2) \in \mathbb{R}^2$, $C \ne 0$. Note that
\[
CZ_n^T = \sum_{i=1}^{n} n^{-\frac{1}{2}}CZ_{ni}^T, \qquad E\big(n^{-\frac{1}{2}}CZ_{ni}^T\big) = 0.
\]
Let $\rho_{ni}^{2+\delta} = E|n^{-\frac{1}{2}}CZ_{ni}|^{2+\delta}$, $\rho_n^{2+\delta} = \sum_{i=1}^{n}\rho_{ni}^{2+\delta}$, and $\sigma_n^2 = \operatorname{Var}(CZ_n^T)$. Using Liapounov's condition, it will be sufficient to show that
\[
\lim_{n\to\infty} \frac{\rho_n^{2+\delta}}{\sigma_n^{2+\delta}} = 0. \tag{2.2.10}
\]
Now, the proof of the theorem will be completed via the following lemmas.
Lemma 2.2.7. Under conditions (A2)(ii), (iii), (iv), if $(x_s, y) \in C(f)$, then for $s = 1, 2$ and $r = 1, 2$, the following are true:

(a) $\lim_{n\to\infty} E\,W_{ni}^2(x_s) = f(x_s, y)\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\{K(u)K^{(1)}(v)\}^2\,du\,dv$.

(b) $\lim_{n\to\infty} E\,W_{ni}(x_s)W_{ni}(x_r) = 0$, $(r \ne s)$.

Proof: see (El-sayed, 2008).
Summary:

In this chapter, we gave a précis of the importance of the mode and discussed its advantages and disadvantages. We studied Parzen's mode estimator and its generalization, where (Samanta, 1973) and (Salha, 2014) have given multivariate versions of Parzen's results. We then presented the conditional mode function and studied it in the case of i.i.d. random variables. We also studied the asymptotic behavior of the mode and the conditional mode estimators. In the next chapter, we will study kernel estimation of the regression mode for fixed design models.
Chapter 3
Kernel Estimation of The Regression Mode for Fixed Design Models
Introduction:

The problem of estimating the mode of a probability density has received considerable attention in the literature. For a historical and mathematical survey, Parzen (1962) considered the problem of estimating the mode of a univariate pdf. Parzen (1962) and Nadaraya (1965) showed that under some regularity conditions the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. Samanta (1973) gave multivariate versions of Parzen's results. For independent and identically distributed data, Samanta and Thavanesmaran (1990) considered the problem of estimating the mode of a conditional pdf for the random design model, and showed under regularity conditions that the estimator of the population conditional mode is strongly consistent and asymptotically normally distributed. Salha and Ioanides (2004) generalized their work to the multivariate case. Salha and Ioanides (2007) considered the estimation of the conditional mode under dependence conditions.
In this chapter, we study the fixed design regression estimation problem, where we are given data $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$ with conditional pdf $f(y|x)$, where the variable $Y$ depends upon a fixed design predictor $x$ through a regression function $m(x)$. We consider the regression model satisfying $x_1, x_2, \ldots, x_n \in [0, 1]$ and
\[
Y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n,
\]
for a so-called regression function $m : [0, 1] \longrightarrow \mathbb{R}$ and some independent random variables $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ with mean zero and variance $\sigma^2$. The design points $x_1, x_2, \ldots, x_n$, determined by the experimenter, are ordered, i.e. we have $0 \le x_1 \le x_2 \le \cdots \le x_n \le 1$. In the absence of other information, we can take $x_i = \frac{i}{n}$, $i = 1, 2, \ldots, n$.
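The model above is easy to simulate. A minimal sketch generating a fixed design sample with the equispaced choice $x_i = i/n$; the regression function $m(x) = \sin(2\pi x)$, the noise level, and the seed are illustrative assumptions:

```python
import math
import random

def make_fixed_design(n, m, sigma, seed=0):
    """Fixed design data: x_i = i/n on [0,1], Y_i = m(x_i) + eps_i,
    with eps_i i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [i / n for i in range(1, n + 1)]
    ys = [m(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

m = lambda x: math.sin(2.0 * math.pi * x)
xs, ys = make_fixed_design(200, m, sigma=0.2)
print(xs[0], xs[-1], ys[0])
```

Unlike the random design setting of Chapter 2, the $x_i$ here are deterministic and ordered; only the responses $Y_i$ are random.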
Most investigations are concerned with the regression mean function $m(x)$, where
\[
m(x) = \int y f(y|x)\,dy.
\]
However, new insights about the underlying structure can be gained by considering other features of the conditional distribution $f(y|x)$ of $Y$ at a given value $x \in [0, 1]$, such as the regression quantiles and the regression mode. We consider the problem of estimating the mode of the unknown function $f(y|x)$.
We consider f_n(y|x) as an estimator of f(y|x), given by

f_n(y|x) = h^{-1} ∑_{i=1}^{n} u_{ni}(x) L((y − Y_i)/h),    (3.0.1)

where L is a kernel function satisfying some conditions and h is a sequence of positive
numbers converging to zero and satisfying some specific conditions. The weight

u_{ni}(x) = h^{-1} ∫_{s_{i−1}}^{s_i} K((x − u)/h) du

is the Gasser-Mueller function, where K is a kernel function satisfying some
conditions, and s_1, s_2, ..., s_n is a sequence of interpolating points such that x_{i−1} ≤ s_i ≤ x_i,
for i = 1, 2, ..., n. In general, we take s_i = (x_{i−1} + x_i)/2.
Consequently, there is a random variable M_n(x), called the sample re-
gression mode, such that

f_n(M_n(x)|x) = max_{−∞<y<∞} f_n(y|x).
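To make the construction concrete, the following minimal sketch implements estimator (3.0.1) and the grid-search sample mode. It is illustrative only: the rectangle kernel is fixed for K (so the Gasser-Mueller weights have a closed form), the Gaussian kernel for L, the bandwidth h, the regression function m(x) = sin(2πx), the noise level, and the boundary interpolating points s_0 = 0, s_n = 1 are all hypothetical choices, not prescribed by the text.

```python
import numpy as np

def gm_weights(x, xs, h):
    """Gasser-Mueller weights u_ni(x) = h^{-1} * int_{s_{i-1}}^{s_i} K((x-u)/h) du,
    in closed form for the rectangle kernel K(u) = 1/2 on [-1, 1].
    Interpolating points: midpoints of the design points, with s_0 = 0, s_n = 1."""
    s = np.concatenate(([0.0], (xs[:-1] + xs[1:]) / 2, [1.0]))
    lo = np.maximum(s[:-1], x - h)
    hi = np.minimum(s[1:], x + h)
    return np.clip(hi - lo, 0.0, None) / (2 * h)

def f_n(y, x, xs, Y, h):
    """Estimator (3.0.1) with L the Gaussian kernel."""
    z = (y - Y) / h
    return (gm_weights(x, xs, h) * np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)).sum() / h

def sample_mode(x, xs, Y, h, grid):
    """Sample regression mode M_n(x): grid maximiser of f_n(.|x)."""
    return grid[int(np.argmax([f_n(g, x, xs, Y, h) for g in grid]))]

# illustrative use with a hypothetical m(x) = sin(2*pi*x) and noise sd 0.2
rng = np.random.default_rng(0)
n, h = 200, 0.1
xs = np.arange(1, n + 1) / n
Y = np.sin(2 * np.pi * xs) + 0.2 * rng.standard_normal(n)
grid = np.linspace(Y.min(), Y.max(), 200)
print(sample_mode(0.25, xs, Y, h, grid))  # near m(0.25) = 1
```

Note that the weights u_{ni}(x) sum to one whenever [x − h, x + h] ⊂ [0, 1], which is why the estimate behaves like a locally weighted density in y.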
In this chapter, for distinct points x_1, x_2, ..., x_k we will establish conditions
under which (nh_n²)^{1/2} (M_n(x_1), M_n(x_2), ..., M_n(x_k))^T, where T denotes the transpose,
is an asymptotically multivariate normally distributed random vector.
This chapter is the main chapter of the thesis. In this chapter, we will study
the problem of estimating nonparametrically the regression mode for fixed design
models. We suppose the error random variables are independent. The joint asymp-
totic normality of the regression mode estimator at different fixed design points is
established under some regularity conditions. This chapter consists of two sections.
In Section 3.1, we introduce the fixed design model. In the next section, we present
mode estimation for fixed design data.
3.1 Fixed design model
A research design is the plan of a research study. The design of a study defines
the study kind (descriptive, correlational, semi-experimental, experimental, review,
meta-analytic) and sub-type (e.g., descriptive-longitudinal case study), research
problem, hypotheses, independent and dependent variables, experimental design,
and, if applicable, data collection methods and a statistical analysis plan. Research
design is the framework that has been created to find solutions to research questions.
Research designs are concerned with turning the research question into a testing
project. The best design depends on your research questions. Every design has
its positive and negative sides. The research design deals with at least four problems: what questions to study, what data are
relevant, what data to collect, and how to analyze the results (Krishna, 2013).
Research design can be divided into fixed and flexible research designs (Robson,
1993); others have referred to this distinction as quantitative versus qualitative
research designs.
In fixed designs, the design of the study is fixed before the main stage of data
collection takes place. Fixed designs are normally theory-driven, otherwise, it is
impossible to know in advance which variables need to be controlled and measured.
Often, these variables are measured quantitatively. Flexible designs allow for more
freedom during the data collection process. One reason for using a flexible research
design can be that the variable of interest is not quantitatively measurable, such as
culture. In other cases, theory might not be available before one starts the research.
Examples of fixed (quantitative) research designs (Krishna, 2013):
Experimental design:
In an experimental design, the researcher actively attempts to change the situa-
tion, circumstances, or experience of participants (manipulation), which may lead to
a change in behavior or outcomes for the participants of the study. The researcher
randomly assigns participants to different conditions, measures the variables of in-
terest and attempts to control for confounding variables. Therefore, experiments
are often highly fixed even before the data collection starts.
In a good experimental design, a few things are of great importance. First
of all, it is necessary to think of the best way to operationalize the variables that will
be measured, as well as which statistical methods would be most suitable to answer
the research question. Thus, the researcher should specify what the expectations
of the study are as well as how to analyze any potential results. Finally, in an
experimental design the researcher must think of the practical constraints, including
the availability of participants as well as how representative the participants are of
the target population. It is important to consider each of these factors before beginning
the experiment (Adèr and Adèr, 2008). Additionally, many researchers use
power analysis before they conduct an experiment, in order to determine how
large the sample must be to find an effect of a given size with a given design at the
desired probability of making a Type I or Type II error.
In statistical hypothesis testing, a type I error is the incorrect rejection of a true
null hypothesis (a ”false positive”), while a type II error is incorrectly retaining
a false null hypothesis (a ”false negative”). More simply stated, a type I error is
detecting an effect that is not present, while a type II error is failing to detect an
effect that is present.
In statistics, a null hypothesis is a statement that one seeks to nullify with evi-
dence to the contrary. Most commonly it is a statement that the phenomenon being
studied produces no effect or makes no difference. An example of a null hypothesis
is the statement ”This diet has no effect on people’s weight.” Usually, an experi-
menter frames a null hypothesis with the intent of rejecting it: that is, intending
to run an experiment which produces data that shows that the phenomenon un-
der study does make a difference (Sheskin, 2004). In some cases there is a specific
alternative hypothesis that is opposed to the null hypothesis, in other cases the
alternative hypothesis is not explicitly stated, or is simply ”the null hypothesis is
false”, in either event, this is a binary judgment, but the interpretation differs and
is a matter of significant dispute in statistics.
A type I error (or error of the first kind) is the incorrect rejection of a true null
hypothesis. Usually a type I error leads one to conclude that a supposed effect or
relationship exists when in fact it doesn’t. Examples of type I errors include a test
that shows a patient to have a disease when in fact the patient does not have the
disease, a fire alarm going off indicating a fire when in fact there is no fire, or an
experiment indicating that a medical treatment should cure a disease when in fact
it does not.
A type II error (or error of the second kind) is the failure to reject a false null
hypothesis. Examples of type II errors would be a blood test failing to detect
the disease it was designed to detect, in a patient who really has the disease, a
fire breaking out and the fire alarm does not ring; or a clinical trial of a medical
treatment failing to show that the treatment works when really it does (Peck and
Devore, 2011).
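To make the type I / type II distinction concrete, the following small Monte Carlo sketch (not from the thesis; the two-sided z-test, the sample size n = 30, the effect size 0.5, and α = 0.05 are all illustrative choices) estimates both error rates by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, n, reps = 0.05, 30, 20000
z_crit = 1.959963984540054  # two-sided 5% critical value of the standard normal

def rejects(mu):
    """One simulated experiment: two-sided z-test of H0: mean = 0, known sigma = 1."""
    sample = rng.normal(mu, 1.0, n)
    return abs(np.sqrt(n) * sample.mean()) > z_crit

# H0 true (mu = 0): rejections are type I errors; the rate should be close to alpha.
type1_rate = np.mean([rejects(0.0) for _ in range(reps)])
# H0 false (mu = 0.5): failures to reject are type II errors.
type2_rate = 1 - np.mean([rejects(0.5) for _ in range(reps)])
print(type1_rate, type2_rate)
```

This is exactly the kind of calculation a power analysis formalizes: the type I rate is pinned to α by the critical value, while the type II rate shrinks as the sample size or the effect size grows.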
Non-experimental research designs:
Non-experimental research designs do not involve a manipulation of the situa-
tion, circumstances or experience of the participants. Non-experimental research
designs can be broadly classified into three categories. First, in relational designs,
a range of variables are measured. These designs are also called correlation studies,
because correlation data are most often used in analysis. Since correlation does not
imply causation, such studies simply identify co-movements of variables. Correla-
tional designs are helpful in identifying the relation of one variable to another, and
seeing the frequency of co-occurrence in two natural groups. The second type is
comparative research. These designs compare two or more groups on one or more
variables, such as the effect of gender on grades. The third type of non-experimental
research is a longitudinal design. A longitudinal design examines variables such as
performance exhibited by a group or groups over time.
Quasi-experimental:
Quasi-experimental designs are research designs that follow the experimental procedure,
but do not randomly assign people to (treatment and comparison) groups.
3.2 Mode estimation for fixed design data
The mode is considered as one of the central tendency measures, the tendency of
data towards the central around a particular value. The mode is defined as the
common value or the most repetitive among those observation, and the data might
have more than one modes or nothing at all which can’t be calculated. The mode
can be found by calculation or drawing and it is not affected by the irregular values.
In this section, we will study the mode estimation for fixed design models and we
present the kernel estimation of the conditional mode for fixed design models.
We consider the fixed equally spaced design regression model

Y_i = m(x_i) + ε_i,    i = 1, 2, ..., n,

where x_i = i/n, i = 1, 2, ..., n.
In this section we consider the following assumptions from Samanta and Thavaneswaran (1990). This result was then discussed for the multivariate case by Salha and Ioanides (2007) and, for distinct points, by Salha (2014).
We assume the following conditions are satisfied.
Condition 1
1. m(x) is a bounded function on [0, 1].
2. ε_i are independent random variables with E(ε_i) = 0 and Var(ε_i) = σ², i = 1, 2, ..., n.
Condition 2
The partial derivatives

f^{(j)}(x, y) = ∂^j f(y|x) / ∂y^j,    j = 1, 2,

exist and are bounded.
Condition 3
1. The kernel function K(·) has support [−1, 1] with K(−1) = K(1) = 0.
2. ∫_{−1}^{1} K(u) du = 1 and ∫_{−1}^{1} u K(u) du = 0.
3. The kernel function L(·) is a symmetric pdf.
Condition 4
The bandwidth is chosen such that h = c n^{−δ}, c > 0, 0 < δ < 1, with

lim_{n→∞} nh^8 = ∞,    lim_{n→∞} nh^{10} = 0,

which together require 1/10 < δ < 1/8.
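To see why Condition 4 pins δ to a narrow window: with h = c n^{−δ}, nh^8 = c^8 n^{1−8δ} → ∞ forces δ < 1/8, while nh^{10} = c^{10} n^{1−10δ} → 0 forces δ > 1/10. A quick numerical check (δ = 0.115 and the values of n are arbitrary illustrative choices):

```python
# With h = c * n**(-delta): n*h**8 = c**8 * n**(1 - 8*delta) and
# n*h**10 = c**10 * n**(1 - 10*delta), so both limits hold iff 1/10 < delta < 1/8.
delta = 0.115  # arbitrary value inside (1/10, 1/8); c = 1 for simplicity
for n in (10 ** 3, 10 ** 6, 10 ** 9):
    h = n ** (-delta)
    print(n, n * h ** 8, n * h ** 10)  # first column grows, second shrinks
```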
To prove our main result we will use the following preliminary Lemmas from
Samanta and Thavaneswaran (1990).
Lemma 3.2.1. (Samanta, 1973) (Bochner Lemma) Suppose K_1(u) and K_2(u) are
real valued Borel measurable functions satisfying the following conditions:
(1) sup_{u∈R} |K_i(u)| < ∞, i = 1, 2.
(2) ∫_{−∞}^{∞} |K_i(u)| du < ∞, i = 1, 2.
(3) lim_{|u|→∞} u² |K_i(u)| = 0, i = 1, 2.
If f(x, y) ∈ C(f), the set of continuity points of f, then for any η ≥ 0,

lim_{n→∞} [h^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u/h) K_2(v/h)|^{1+η} f(y − v|x − u) dv du]
    = f(y|x) ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u) K_2(v)|^{1+η} dv du.
Define

f_n^{(j)}(y|x) = h^{−(j+1)} ∑_{i=1}^{n} u_{ni}(x) L^{(j)}((y − Y_i)/h),    j = 1, 2,

V_{ni}(y|x) = h^{−2} u_{ni}(x) L^{(1)}((y − Y_i)/h),    i = 1, 2, ..., n.
Lemma 3.2.2. Under the conditions (1) through (4), if f(x, y) ∈ C(f), then the
following are true:

1. lim_{n→∞} nh² [Var f_n^{(1)}(y|x)] = f(y|x) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.

2. (nh²)^{1/2} [E f_n^{(1)}(y|x) − f^{(1)}(y|x)] = o(1).
Lemma 3.2.3. Under the assumptions of Lemma (3.2.2), the following is true:

lim_{n→∞} (n^{−1}h²)^{1+δ/2} [∑_{i=1}^{n} E|V_{ni} − EV_{ni}|^{2+δ}] = 0.

Lemma 3.2.4. Under the conditions (1) through (4), if g(x) > 0, then f_n^{(2)}(M*_n(x)|x)
converges in probability to f^{(2)}(M(x)|x) as n tends to infinity, where |M*_n(x) − M(x)| <
|M_n(x) − M(x)|.
Theorem 3.2.5. Suppose that x_1, x_2, ..., x_k are distinct points, where f(y|x_i) > 0,
i = 1, 2, ..., k. Then under the conditions (1) through (4),

(nh²)^{1/2} (f_n^{(1)}(y|x_1), f_n^{(1)}(y|x_2), ..., f_n^{(1)}(y|x_k))^T,

where T denotes the transpose, is asymptotically multivariate normal with mean
vector zero and diagonal covariance matrix Γ = [γ_{ij}], with

γ_{ii} = f(y|x_i) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv,

where L^{(1)}(v) denotes the first derivative of L(v).
Proof:
Without loss of generality, we consider the special case k = 2; the method of
proof remains valid for the more general case. Firstly, we define for i = 1, 2, ..., n and
s = 1, 2 the following notations:

W_{ni}(x_s) = h (V_{ni}(y|x_s) − E V_{ni}(y|x_s)),

W_n(x_s) = ∑_{i=1}^{n} W_{ni}(x_s),

Z_{ni} = (W_{ni}(x_1), W_{ni}(x_2))^T,

Z_n = n^{−1/2} (W_n(x_1), W_n(x_2))^T
    = (nh^4)^{1/2} (f_n^{(1)}(y|x_1) − E f_n^{(1)}(y|x_1), f_n^{(1)}(y|x_2) − E f_n^{(1)}(y|x_2))^T.    (3.2.1)

Let A = [a_{rs}] be a 2 × 2 diagonal matrix with

a_{ss} = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv,

and let Z be a bivariate normally distributed random variable with mean zero vector
and covariance matrix A.
First we will show that Z_n converges in distribution to Z. To do that, we will use
the multivariate version of the Cramér–Wold theorem. It will be sufficient to prove
that C Z_n^T converges in distribution to C Z^T for any constant C = (c_1, c_2) ∈ R²,
C ≠ 0.
Note that

C Z_n^T = ∑_{i=1}^{n} n^{−1/2} C Z_{ni}^T,    E(n^{−1/2} C Z_{ni}^T) = 0.

Let ρ_{ni}^{2+δ} = E|n^{−1/2} C Z_{ni}|^{2+δ}, ρ_n^{2+δ} = ∑_{i=1}^{n} ρ_{ni}^{2+δ}, and σ_n² = Var(C Z_n^T).
Using Liapounov's condition, it will be sufficient to show that

lim_{n→∞} ρ_n^{2+δ} / σ_n² = 0.    (3.2.2)

Now, the proof of the theorem will be given via the following two lemmas.
Lemma 3.2.6. Under the conditions (1) through (4), if f(x, y) ∈ C(f), then for
s = 1, 2, r = 1, 2, the following are true:

a. lim_{n→∞} E W_{ni}²(x_s) = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.

b. lim_{n→∞} E W_{ni}(x_s) W_{ni}(x_r) = 0, (r ≠ s).
Proof: a. From the definition of W_{ni}(x_s), we have

E W_{ni}²(x_s) = h² [E V_{ni}²(y|x_s) − (E V_{ni}(y|x_s))²].    (3.2.3)

h² E V_{ni}²(y|x_s) = h² [h^{−4} ∫_{−∞}^{∞} ∫_{−1}^{1} (K((x_s − u)/h) L^{(1)}((y − v)/h))² f(v|u) du dv]
    = h^{−2} [∫_{−∞}^{∞} ∫_{−1}^{1} (K(u/h) L^{(1)}(v/h))² f(y − v|x_s − u) du dv].

Now, by an application of Lemma (3.2.1), we get that

lim_{n→∞} h² E V_{ni}²(y|x_s) = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.    (3.2.4)

h² (E V_{ni}(y|x_s))² = h² [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_s − u) du dv]².

By another application of Lemma (3.2.1), we get that

lim_{n→∞} h² (E V_{ni}(y|x_s))² = 0.    (3.2.5)

Now, by a combination of (3.2.3), (3.2.4) and (3.2.5), (a) is satisfied.

b. From the definition of W_{ni}(x), we have

E(W_{ni}(x_1) W_{ni}(x_2)) = h² (E V_{ni}(y|x_1) V_{ni}(y|x_2) − E V_{ni}(y|x_1) E V_{ni}(y|x_2)).    (3.2.6)
Suppose, without loss of generality, that x_2 > x_1, and let δ = x_2 − x_1 and δ_n = δ/h.

h² E V_{ni}(y|x_1) V_{ni}(y|x_2)
    = h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K((x_1 − u)/h) K((x_2 − u)/h) (L^{(1)}((y − v)/h))² f(v|u) du dv
    = ∫_{−∞}^{∞} ∫_{−1}^{1} K(u) K(δ_n + u) (L^{(1)}(v))² f(y − hv|x_1 − hu) du dv
    = ∫_{−∞}^{∞} (L^{(1)}(v))² f(y − hv|x_1 − hu) [∫_{−1}^{1} K(u) K(δ_n + u) du] dv.    (3.2.7)

Next, note that

∫_{−1}^{1} K(u) K(δ_n + u) du = ∫_{|u|<δ_n/2} K(u) K(δ_n + u) du + ∫_{|u|>δ_n/2} K(u) K(δ_n + u) du
    ≤ sup_{|u|<δ_n/2} K(δ_n + u) ∫_{−1}^{1} K(z) dz + sup_{|u|>δ_n/2} K(u) ∫_{−1}^{1} K(δ_n + z) dz
    ≤ sup_{|u|>δ_n/2} K(u) · O(1) + sup_{|u|>δ_n/2} K(u) · O(1)
    = 2 sup_{|u|>δ_n/2} K(u) · O(1) ≤ (4/δ_n) sup_{|u|>δ_n/2} |u K(u)| · O(1)
    = (4h/δ) sup_{|u|>δ_n/2} |u K(u)| · O(1) = O(h).    (3.2.8)
Finally, from (3.2.7) and (3.2.8), we have that

lim_{n→∞} h² (E V_{ni}(y|x_1) V_{ni}(y|x_2)) = 0.    (3.2.9)

h² E V_{ni}(y|x_1) E V_{ni}(y|x_2) = h² [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_1 − u) du dv]
    × [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_2 − u) du dv] −→ 0,    (3.2.10)

by an application of Lemma (3.2.1). Hence a combination of (3.2.6), (3.2.9),
and (3.2.10) gives the desired result.
Lemma 3.2.7. Under the conditions of Lemma (3.2.6), we have that

lim_{n→∞} σ_n² = C A C^T.

Proof:
σ_n² = Var(C Z_n^T), and by the definition of Z_n^T, we have

σ_n² = Var(n^{−1/2} c_1 W_n(x_1) + n^{−1/2} c_2 W_n(x_2))
    = n^{−1} c_1² Var(W_n(x_1)) + n^{−1} c_2² Var(W_n(x_2)) + 2 n^{−1} c_1 c_2 Cov(W_n(x_1), W_n(x_2))
    = n^{−1} c_1² ∑_{i=1}^{n} Var(W_{ni}(x_1)) + n^{−1} c_2² ∑_{i=1}^{n} Var(W_{ni}(x_2))
      + 2 n^{−1} c_1 c_2 Cov(∑_{i=1}^{n} W_{ni}(x_1), ∑_{i=1}^{n} W_{ni}(x_2))
    = c_1² Var(W_{ni}(x_1)) + c_2² Var(W_{ni}(x_2)) + 2 n^{−1} c_1 c_2 E(∑_{i=1}^{n} ∑_{j=1}^{n} W_{ni}(x_1) W_{nj}(x_2)).

An application of Lemma (3.2.6) implies that

lim_{n→∞} σ_n² = ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv [c_1² f(y|x_1) + c_2² f(y|x_2)] = C A C^T > 0.
Next, we have

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} E|Z_{ni}|^{2+δ} = n^{−(1+δ/2)} |C|^{2+δ} E|(W_{ni}(x_1), W_{ni}(x_2))|^{2+δ}
    ≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} max{E|W_{ni}(x_1)|^{2+δ}, E|W_{ni}(x_2)|^{2+δ}}.

Without loss of generality assume that E|W_{ni}(x_1)|^{2+δ} ≥ E|W_{ni}(x_2)|^{2+δ}. Then

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E|W_{ni}(x_1)|^{2+δ}
    = n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E|h (V_{ni}(x_1) − E V_{ni}(x_1))|^{2+δ}
    = 2^{2+δ} |C|^{2+δ} (n^{−1} h²)^{1+δ/2} E|V_{ni}(x_1) − E V_{ni}(x_1)|^{2+δ}.

Now, by an application of Lemma (3.2.3), we get that

ρ_n^{2+δ} = ∑_{i=1}^{n} ρ_{ni}^{2+δ} ≤ 2^{2+δ} |C|^{2+δ} (n^{−1} h²)^{1+δ/2} ∑_{i=1}^{n} E|V_{ni}(x_1) − E V_{ni}(x_1)|^{2+δ} −→ 0.
Hence the Liapounov’s condition (2) is satisfied. Therefore, we have that, CZTn
is asymptotically normally distributed with mean zero and variance CACT and
by the multivariate version of Cramr - World theorem, we have that Zn converges
in distribution to Z. Now an application of the second part of Lemma (3.2.2) in
conjunction with the last convergence gives the proof of the theorem.
Theorem 3.2.8. Suppose that x_1, x_2, ..., x_k are distinct points, where f(y|x_i) > 0,
i = 1, 2, ..., k. Then under the conditions (1) through (4), the random vector

(nh_n²)^{1/2} (M_n(x_1) − M(x_1), ..., M_n(x_k) − M(x_k))^T

is asymptotically multivariate normally distributed with mean vector zero and a
diagonal covariance matrix B = [b_{ij}], with

b_{ii} = [f(M(x_i)|x_i) / (f^{(2)}(M(x_i)|x_i))²] ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.
Proof: For a fixed x, taking the Taylor expansion of f_n^{(1)}(M_n(x)|x) around M(x)
implies

0 = f_n^{(1)}(M_n(x)|x) ≈ f_n^{(1)}(M(x)|x) + (M_n(x) − M(x)) f_n^{(2)}(M*_n(x)|x),

where |M*_n(x) − M(x)| < |M_n(x) − M(x)|. This implies that

M_n(x) − M(x) ≈ − f_n^{(1)}(M(x)|x) / f_n^{(2)}(M*_n(x)|x),

and therefore

(nh²)^{1/2} (M_n(x_1) − M(x_1), ..., M_n(x_k) − M(x_k))^T ≈
    − (nh²)^{1/2} [f_n^{(1)}(M(x_1)|x_1) / f_n^{(2)}(M*_n(x_1)|x_1), ..., f_n^{(1)}(M(x_k)|x_k) / f_n^{(2)}(M*_n(x_k)|x_k)]^T,

where |M*_n(x_i) − M(x_i)| < |M_n(x_i) − M(x_i)|, i = 1, 2, ..., k.
An application of Theorem (3.2.5) and Lemma (3.2.4) completes the proof of
the theorem.
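As a sanity check (not part of the thesis), the covariance factor ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv shared by Theorems 3.2.5 and 3.2.8 can be evaluated numerically for the kernels used later in Chapter 4 (rectangle K, Gaussian L). The integrand factorises, and for these kernels ∫K² = 1/2 and ∫(L^{(1)})² = 1/(4√π), so the constant should come out to 1/(8√π) ≈ 0.0705:

```python
import numpy as np

# Grids for simple Riemann-sum integration (grid sizes are arbitrary choices).
u, du = np.linspace(-1, 1, 4001, retstep=True)
v, dv = np.linspace(-8, 8, 8001, retstep=True)
K = np.full_like(u, 0.5)                             # rectangle kernel on [-1, 1]
L1 = -v * np.exp(-v ** 2 / 2) / np.sqrt(2 * np.pi)   # L'(v) for the Gaussian kernel

# The double integral factorises into a product of two single integrals.
constant = (K ** 2).sum() * du * (L1 ** 2).sum() * dv
print(constant)   # close to 1/(8*sqrt(pi)) ≈ 0.0705
```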
Summary:
In this chapter, we introduced the fixed design model and then studied the problem
of estimating nonparametrically the regression mode for fixed design models. We
supposed the error random variables are independent. The joint asymptotic normal-
ity of the regression mode estimator at different fixed design points was established
under some regularity conditions. This completes the theoretical phase, which is
needed for the practical phase that follows.
Chapter 4
Applications
Introduction:
This chapter contains three sections. The first section contains four simulation
studies. The second section contains applications using real data. The last section
contains some concluding remarks and suggestions.
4.1 Simulation
If f(y|x) is unknown conditional pdf, then its estimation can be done by estimating
the conditional density function fn(y|x). Several nonparametric methods have been
proposed for estimating the conditional density function f(y|x) based on a fixed
sample, we will simulate the data and compare the results from the simulated
studies with the theoretical results, and if it consistent with or not. In this section,
we computed the conditional mode function through applications using simulated
data. For each estimator, we computed the mean squared error (MSE) and the
correlation coefficients between y the predicated values and yn the actual values
71
R2y,yn , where
MSE =SSE
n,
where
SSE =n∑i=1
(yi – yni)2.
and,
R2y,yn = 1 –
SSE
SSTO,
where, SSTO =∑n
i=1(yi – y)2 where y denotes the mean of actual values, yi de-
notes the true value and yni denotes its predicted value. The binwidth is computed
using the equation (1.6.2).
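The two criteria above are straightforward to compute; a small helper (illustrative only, with made-up numbers in the example call) makes the definitions explicit:

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """MSE = SSE/n and R^2 = 1 - SSE/SSTO, exactly as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = ((y_true - y_pred) ** 2).sum()
    ssto = ((y_true - y_true.mean()) ** 2).sum()
    return sse / len(y_true), 1 - sse / ssto

# hypothetical numbers, purely for illustration
print(mse_and_r2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # ≈ (0.025, 0.98)
```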
We studied ten models and then chose the best four models.
Data of sizes 150, 200 and 300 were simulated from the following four models:

1. y = sin(2π(1 − x)²) + xe,
2. y = 1/(5x + 1) + sin(5x) + xe,
3. y = cos(π(1 − x)²) + xe,
4. y = 1 + 10 cos(x) + x² e,

where x has a uniform distribution on [0, 1] and e has a standard normal dis-
tribution. We used the following two kernel functions, the rectangle kernel

K(x) = 1/2,    |x| < 1,

and the Gaussian kernel

L(x) = (1/√(2π)) e^{−x²/2},    −∞ < x < ∞,

in equation (3.0.1) to estimate the fixed design mode.
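One such simulation run for model 1 can be sketched as follows. This is illustrative only: the bandwidth h = 0.15 is fixed by hand rather than computed from equation (1.6.2), the mode is located by a grid search, and boundary interpolating points s_0 = 0, s_n = 1 are assumed; the rectangle kernel is used for K and the Gaussian kernel for L, as above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 200, 0.15
x = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * (1 - x) ** 2) + x * rng.standard_normal(n)  # model 1

# interpolating points s_i (midpoints, with s_0 = 0 and s_n = 1 assumed)
s = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2, [1.0]))

def mode_at(x0, grid):
    # Gasser-Mueller weights in closed form for the rectangle kernel K(u) = 1/2
    w = np.clip(np.minimum(s[1:], x0 + h) - np.maximum(s[:-1], x0 - h), 0.0, None) / (2 * h)
    # f_n(g|x0) up to a constant factor, with Gaussian L; maximise over the grid
    dens = [(w * np.exp(-((g - y) / h) ** 2 / 2)).sum() for g in grid]
    return grid[int(np.argmax(dens))]

grid = np.linspace(y.min(), y.max(), 300)
y_hat = np.array([mode_at(x0, grid) for x0 in x])
sse = ((y - y_hat) ** 2).sum()
mse = sse / n
r2 = 1 - sse / ((y - y.mean()) ** 2).sum()
print(mse, r2)
```

Because the bandwidth rule is not reproduced here, the resulting MSE and correlation coefficient will differ somewhat from the tables below.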
For each simulation, we computed the MSE and the correlation coefficient.
The results are summarized in Table 4.1, Table 4.2, Table 4.3, and Table 4.4,
respectively.
Figure 4.1, Figure 4.2, Figure 4.3 and Figure 4.4 show the scatter plots of the
simulated data together with the perfect curve and the mode estimation.
The tables below show that as the sample size increases, the MSE
decreases and the correlation coefficient increases.
The results of the four simulation studies indicate that the performance of the
estimator is reasonably good.
Table 4.1: The MSE and The Correlation Coefficient for the Simulation Study 1
Sample size MSE Correlation Coefficient
150 0.1333399 0.6517507
200 0.09691356 0.7468916
300 0.07884304 0.7940889
Figure 4.1: A scatter plot of the first simulated data together with the perfect curve
Table 4.2: The MSE and The Correlation Coefficient for the Simulation Study 2
Sample size MSE Correlation Coefficient
150 0.1155162 0.8318199
200 0.07993088 0.8834305
300 0.03671442 0.9463647
Figure 4.2: A scatter plot of the second simulated data together with the perfect curve
Table 4.3: The MSE and The Correlation Coefficient for the Simulation Study 3
Sample size MSE Correlation Coefficient
150 0.1171055 0.7545829
200 0.09724039 0.7967533
300 0.0672099 0.8598906
Figure 4.3: A scatter plot of the third simulated data together with the perfect curve
Table 4.4: The MSE and The Correlation Coefficient for the Simulation Study 4
Sample size MSE Correlation Coefficient
150 0.1149359 0.9409647
200 0.08765359 0.9548517
300 0.07012372 0.9637793
Figure 4.4: A scatter plot of the fourth simulated data together with the perfect curve
4.2 Real data
In this section, we computed the conditional mode by kernel estimator through
applications using real data.
1- Ethanol data
We consider an application using a real-life data set that is built into the S-Plus program:
the ethanol data, which records 88 measurements from an experiment
in which ethanol was burned in a single-cylinder automobile test engine. We used
the first 71 observations of the variable E, which measures the
richness of the air/ethanol mixture, to estimate the last 17 observations. The mean
squared error is computed, MSE = 0.0436. Figure 4.5 shows the plot of the data
and the regression mode estimation.
Figure 4.5: Regression mode estimation of the ethanol data
2- Vapor Pressure of Liquid Water data
We consider an application using real-life data from (Lide, 2002): the Vapor
Pressure of Liquid Water data, which records 160 measurements from
an experiment in which the 100-degree fixed point of the international temperature
scale is defined as the temperature of condensing water vapor under the pressure of
one standard atmosphere. In actual practice, the usual procedure in thermometer
calibration is to observe the thermometer reading in condensing water vapor at the
prevailing atmospheric pressure, and to determine the actual temperature by ob-
servation of the pressure and use of the relation of vapor pressure to temperature.
The differences shown indicate the range of interpretations of the vapor pressure
data as affecting the practical calibration of thermometers. The mean squared er-
ror is computed, MSE = 0.000241. Figure 4.6 shows the plot of the data and the
regression mode estimation.
Figure 4.6: Regression mode estimation of the vapor Pressure of Liquid Water
4.3 Conclusion
In this thesis, the problem of estimating the regression mode for fixed design model
has been considered. The joint asymptotic normality of the regression mode esti-
mator at different fixed design points has been established under some regularity
conditions. The performance of the proposed estimator is tested via applications
using simulated and real life data. The results of the applications indicate that the
performance of the estimator is reasonably good.
The results of the applications can be slightly improved if the rectangle kernel
is replaced by the Epanechnikov kernel, which is the optimal kernel. We used the
rectangle kernel for computational reasons. The new estimator can be modified by
considering a new bandwidth selection technique that uses a variable bandwidth
that depends on the points at which the mode function is estimated rather than a
constant bandwidth. By considering some conditions, the results of this thesis can
be generalized to the case of dependent data under some mixing conditions and for
the case of time series data.
Bibliography
Abraham, C., Biau, G., and Cadre, B. (2003). Simple estimation of the mode of a
multivariate density. Canadian Journal of Statistics, 31(1):23–34.
Abraham, C., Biau, G., and Cadre, B. (2004). On the asymptotic properties of a
simple estimate of the mode. ESAIM: Probability and Statistics, 8:1–11.
Adèr, H. J. and Adèr, M. (2008). Advising on research methods: A consultant's
companion. Johannes van Kessel Publishing.
Devroye, L. (1979). Recursive estimation of the mode of a multivariate density. The
Canadian Journal of Statistics, pages 159–167.
El-sayed, H. (2008). The Asymptotic Distributions of The Kernel Estimations of
The Conditional Mode and Quantiles. Degree Of Master of Mathematics. The
Islamic University Of Gaza, Palestine.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies.
The Annals of Statistics, pages 196–216.
Fan, J. and Yao, Q. (2003). Nonlinear time series: Nonparametric and parametric
methods. Springer-Verlag, New York.
Fix, E. and Hodges Jr, J. L. (1951). Discriminatory analysis nonparametric dis-
crimination: consistency properties. Technical report, DTIC Document.
Hansen, B. E. (2009). Lecture notes on nonparametrics. Lecture notes.
Hogg, R. V. and Craig, A. T. (1995). Introduction to mathematical statistics.(5
edition). Upper Saddle River, New Jersey: Prentice Hall.
Härdle, W. (2012). Smoothing techniques: with implementation in S. Springer Science
and Business Media.
Konakov, V. D. (1974). On the asymptotic normality of the mode of multidimensional
distributions. Theory of Probability and its Applications, 19:794–799.
Krishna, S. (2013). Assignment on research design importance.
Le Cam, L. M. (1953). On some asymptotic properties of maximum likelihood
estimates and related Bayes estimates, volume 1. University of California press.
Lide, D. R. (2002). Crc handbook of chemistry and physics: A ready-reference book
of chemical and physical data Florida: CRC Press, cop.
Loeve, M. (1960). Probability Theory. 2nd Ed. Van Nostrand, Princeton.
Nadaraya, E. A. (1964). Some new estimates for distribution functions. Theory of
Probability and its Applications, 15:497–500.

Nadaraya, E. A. (1965). On nonparametric estimation of density functions and
regression curves. Theory of Probability and its Applications, 10:186–190.
Ouyang, Z. (2005). Univariate kernel density estimation. pdf (18-09-2011).
Parzen, E. (1962). On estimation of a probability density function and mode. The
Annals of Mathematical Statistics, 33:1065–1076.
Peck, R. and Devore, J. L. (2011). Statistics: The exploration and analysis of data.
Cengage Learning.
Racine, J. (2008). Nonparametric econometrics: a primer, volume 3. Now Pub.
Robson, C. (1993). Real-world research: A resource for social scientists and prac-
titioner researchers. Malden: Blackwell Publishing.
Rosenblatt, M. (1956). Remarks on some non-parametric estimates of the density
function, volume 27, pages 832–837. Annals Math. Statist.
Royden, H. L. and Fitzpatrick, P. (1988). Real analysis, volume 198. Macmillan
New York.
Salha, R. (2014). Kernel estimation of the regression mode for fixed design model.
Salha, R. and Ioanides, D. (2007). The asymptotic distribution of the estimated
conditional mode at a finite number of distinct points under dependence conditions.
The Islamic University Journal, 15:199–214.
Samanta, M. (1973). Nonparametric estimation of the mode of a multivariate
density. South African Statistical Journal, 7:109–117.

Samanta, M. and Thavaneswaran, A. (1990). Nonparametric estimation of the
conditional mode. Communications in Statistics – Theory and Methods, 19:4515–4524.
Sen, P. K. (1993). Large sample methods in statistics: an introduction with appli-
cations. New York.
Sheskin, D. J. (2004). Handbook of parametric and nonparametric statistical proce-
dures. CRC Press.
Silverman, B. W. (1986). Density estimation for statistics and data analysis, vol-
ume 26. CRC press.
Sturges, H. A. (1926). The choice of a class interval. Journal of the American
Statistical Association, 21(153):65–66.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of
Statistics, Series A, 26:359–372.
Yates, R. D. and Goodman, D. J. (1999). Probability and stochastic processes.
John Willey and Sons.