Kernel Estimation of the Conditional
Mode for Fixed Design Models
Doaa A. ELhertaniy
Supervised by
Prof. Raid B. Salha
Professor of Mathematical Statistics
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree
of Master of Science in Mathematical Sciences
October 2017
The Islamic University – Gaza
Research and Postgraduate Affairs
Faculty of Science
Master of Mathematical Science
Declaration (translated from Arabic)

I, the undersigned, submitter of the thesis entitled:

Kernel Estimation of the Conditional Mode for Fixed Design Models

declare that the content of this thesis is the product of my own effort, except where otherwise indicated, and that this thesis as a whole, or any part of it, has not been submitted by others to obtain a degree or an academic or research title at any other educational or research institution.
Declaration
I understand the nature of plagiarism, and I am aware of the University’s policy on
this.
The work provided in this thesis, unless otherwise referenced, is the researcher's own
work, and has not been submitted by others elsewhere for any other degree or
qualification.
Student's Name: Doaa Abd ELrahman ELhertaniy
Signature: Doaa ELhertaniy
Date: 10/8/2017
Kernel Estimation of the Conditional
Mode for Fixed Design Models
Doaa Abd ELrahman ELhertaniy
October 8, 2017
Abstract
We are interested in the area of nonparametric prediction; therefore, we study the relationship between a current observation and previous observations, where the conditional density function plays an important role.
In this thesis, we study the problem of estimating nonparametrically the condi-
tional mode for fixed design models. We suppose the error random variables are
independent. The joint asymptotic normality of the conditional mode estimator at
different fixed design points is established under some regularity conditions.
We started our study by considering the mode of a sample of independent and identically distributed (i.i.d.) data. Also, we study some theoretical properties of the kernel estimator of the conditional density function. Moreover, we study some sufficient conditions under which the kernel estimator of the mode for random design models is asymptotically normally distributed. After that, we study the kernel estimation of the mode for fixed design models. We assume that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional pdf f(y|x), where the variable Y depends on a fixed design predictor x through a regression function m(x).
Finally, we used the proposed estimator in some applications using simulated and real data. The results indicate that the mode estimator performs well and efficiently, as measured by the MSE and the correlation coefficient.
Abstract in Arabic (translated)

This study is concerned with nonparametric prediction and with the conditional density function, which plays an important role in this study. In this study, the problem of estimating the nonparametric conditional mode for fixed design models was studied. We assumed that the errors in the random variables are independent, and the joint asymptotic normality of the conditional mode estimator at different fixed design points was established under some regularity conditions. We began our study with the mode when the data are independent and identically distributed; we also studied some theoretical properties of the kernel estimator of the conditional density function, then studied some sufficient conditions under which the kernel estimator of the mode for the random design model follows the asymptotic normal distribution. After that, we studied the kernel estimator of the mode for the fixed design model, assuming that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional density f(y|x), where the variable Y depends on the fixed design predictor x through the regression function m(x). At the end of the study, we used the proposed estimator in some applications using simulated and real data. The results showed that the mode estimator performs well and efficiently in terms of the MSE and the correlation coefficient.
Dedication
I am most grateful and thankful to Allah for giving me the knowledge, the power and the patience to complete this work, and to His prophet of mercy to mankind, the Prophet Mohammad (pbuh), who lightened and guided me to the right way.
To My Parents.
To My Brother and Sister.
To My Friends.
To all Knowledge Seekers.
Acknowledgements
After Allah, I reserve my greatest appreciation for my thesis supervisor, Professor Raid Salha, for suggesting the research topics and for his extremely helpful guidance, encouragement and patience throughout the course of my research work. I would like to express my sincere gratitude to my family, especially my parents, my brothers and my sisters, for giving me confidence and support and for helping me to reach this level of learning. I am also thankful to all my friends for their kind advice and encouragement.
Finally, I pray to Allah to accept this work.
Contents
Abstract i
Abstract in Arabic ii
Dedication iii
Acknowledgements iv
Contents v
List of Tables vii
List of Figures viii
List of Abbreviations ix
List of Symbols x
Introduction 1
1 Preliminaries 5
1.1 Basic definitions and notations . . . . . . . . . . . . . . . . . . . . . 6
1.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Properties of the Kernels Estimator . . . . . . . . . . . . . . . . . . 18
1.4 Bias and variance of the kernel density estimation . . . . . . . . . . 20
1.5 The MSE and MISE criteria . . . . . . . . . . . . . . . . . . . . . . 24
1.6 Optimal Binwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2 Kernel estimation of the mode for random design models 32
2.1 The kernel estimation of the mode . . . . . . . . . . . . . . . . . . . 34
2.1.1 Parzen’s Mode . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.2 Generalization of Parzen’s mode estimation. . . . . . . . . . 40
2.2 The kernel estimation of the conditional mode for random design
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.2.1 Asymptotic properties of the conditional mode estimation . 48
3 Kernel estimation of the regression mode for fixed design models 53
3.1 Fixed design model . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Mode estimation for fixed design data . . . . . . . . . . . . . . . . . 59
4 Applications 71
4.1 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Real data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Bibliography 80
List of Tables
1.1 Some classical kernel functions . . . . . . . . . . . . . . . . . . . . . 19
4.1 The MSE and The Correlation Coefficient for the Simulation Study 1 74
4.2 The MSE and The Correlation Coefficient for the Simulation Study 2 75
4.3 The MSE and The Correlation Coefficient for the Simulation Study 3 76
4.4 The MSE and The Correlation Coefficient for the Simulation Study 4 77
List of Figures
1.1 Histograms with different binwidths for the same sample. . . . . . . 12
1.2 The effect of the choice of origin is displayed in the five histograms. 13
1.3 Naive estimate constructed from Old Faithful geyser data, h = 0.25. 15
1.4 Kernel density estimation based on 7 points. . . . . . . . . . . . . 17
1.5 Kernel density estimates based on different binwidths. . . . . . . . 29
4.1 A scatter plot of the first simulated data together with the perfect curve 74
4.2 A scatter plot of the second simulated data together with the perfect curve 75
4.3 A scatter plot of the third simulated data together with the perfect curve 76
4.4 A scatter plot of the fourth simulated data together with the perfect curve 77
4.5 Regression mode estimation of the ethanol data . . . . . . . . . . . 78
4.6 Regression mode estimation of the vapor Pressure of Liquid Water 79
List of Abbreviations
Abbreviation Description
AMISE Asymptotic mean integrated squared error.
cdf Cumulative distribution function.
Corr Correlation.
Cov Covariance.
i.i.d. Independent and identically distributed.
ISE Integrated squared error.
KDE Kernel density estimation.
MISE Mean integrated squared error.
MIAE Mean integrated absolute error.
MSE Mean squared error.
NW Nadaraya-Watson.
o Small oh.
O Big oh.
pdf Probability density function.
Var Variance.
SSE Sum of squared errors.
SSTO Total sum of squares.
List of Symbols

Symbol Description
f(x) Probability density function.
F(x) Cumulative distribution function.
f_n Kernel estimator of the function f.
h The bandwidth.
h_opt The optimal bandwidth.
P Probability set function.
µ The mean.
m(x) Regression mean function.
ε_i Independent random variables.
σ² The variance.
θ The population mode.
θ_n The mode estimator.
θ*_n A concurrent estimator of the mode.
E The expectation.
S The sample standard deviation.
N The sample size.
I_A The indicator function.
R The set of real numbers.
R²_{y,y_n} Correlation coefficient.
∏ Product.
K(·) The kernel function.
→_p Convergence in probability.
→_d Convergence in distribution.
→_{a.s.} Almost sure convergence.
Introduction
The probability density function is a fundamental concept in statistics. Suppose we have a set of observed data points assumed to be a sample from an unknown probability density function f; the construction of an estimate of the density function from the observed data is known as density estimation. Accordingly, probability
density estimators are generally broken down into two basic classes, parametric or
nonparametric estimators. Parametric estimators assume a functional form of the
density parametrized by a finite set of parameters, while nonparametric methods
consist of all other types of estimators. The main subject of this thesis is the kernel
estimation of the probability density function.
Parzen (1962) and Nadaraya (1965) have shown that under some regularity conditions the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. For
independent and identically distributed data, Samanta and Thavanesmaran (1990)
considered the problem of estimating the mode of a conditional pdf, for random de-
sign model, and they have shown under regularity conditions that the estimator of
the population conditional mode is strongly consistent and asymptotically normally
distributed. Salha and Ioanides (2007) considered the estimation of the conditional
mode under dependence conditions. We assume that (x1, Y1), (x2, Y2), ..., (xn, Yn) are i.i.d. random variables with conditional pdf f(y|x), where the variable Y depends on a fixed design predictor x through a regression function m(x). In this thesis, we consider the problem of estimating the mode of the unknown conditional pdf f(y|x) for the fixed design model. The conditional density can be estimated by many methods, such as the histogram, the naive estimator, the kernel estimator, etc. In this thesis, we will study the kernel estimation of the conditional density function, then we will use it to estimate the conditional mode.
The conditional mode is denoted by θ(x), where

f(θ(x)|x) = max_y f(y|x).

The kernel estimator of the conditional mode is denoted by θ_n(x), where

f_n(θ_n(x)|x) = max_y f_n(y|x),

and where

f_n(y|x) = [Σ_{i=1}^{n} K_{h_n}(x − X_i) K_{h_n}(y − Y_i)] / [Σ_{i=1}^{n} K_{h_n}(x − X_i)].
Also, we will study some theoretical properties of the kernel estimator for the con-
ditional density function.
Let (x1, Y1), (x2, Y2), ..., (xn, Yn) be data from a fixed design, and let Y_i = m(x_i) + ε_i, i = 1, ..., n, be the regression model, where m(x) is an unknown regression function. The design points x1, x2, ..., xn, determined by the experimenter, are ordered, i.e. we have 0 ≤ x1 ≤ x2 ≤ ... ≤ xn ≤ 1. In the absence of other information, we will take x_i = i/n, i = 1, 2, ..., n. The ε_i are independent random variables with mean 0 and variance σ².
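As a concrete illustration of this setup (illustrative code, not part of the thesis), the following sketch simulates a fixed design model with x_i = i/n and a made-up regression function m(x) = sin(2πx), then estimates the conditional mode θ_n(x) by maximizing the kernel conditional density f_n(y|x) over a grid of candidate y values; the Gaussian kernel, the bandwidth and the sample size are all illustrative choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_mode(x, xs, ys, h, y_grid):
    """Maximize f_n(y|x) = sum_i K_h(x - x_i) K_h(y - Y_i) / sum_i K_h(x - x_i)
    over y_grid and return the maximizing y (the conditional mode estimate)."""
    wx = gaussian_kernel((x - xs) / h)                 # weights K((x - x_i)/h)
    dens = np.array([np.sum(wx * gaussian_kernel((y - ys) / h)) for y in y_grid])
    dens = dens / (h * np.sum(wx))                     # f_n(y|x) on the grid
    return y_grid[np.argmax(dens)]

# Fixed design: x_i = i/n, Y_i = m(x_i) + eps_i with illustrative m(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
n = 400
xs = np.arange(1, n + 1) / n
ys = np.sin(2 * np.pi * xs) + rng.normal(0.0, 0.2, size=n)

y_grid = np.linspace(-2.0, 2.0, 401)
theta = conditional_mode(0.25, xs, ys, h=0.05, y_grid=y_grid)
# m(0.25) = 1, so the estimate should lie near 1
```

Since the errors are symmetric around zero here, the conditional mode coincides with m(x), so the estimate at x = 0.25 should be close to sin(π/2) = 1.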
Finally, the performance of the proposed estimator is tested via applications using simulated and real-life data.
This thesis will consist of the following chapters:
Chapter 1. Preliminaries.
This chapter will contain some basic definitions, facts and notations that will be used in the thesis. Also, it will contain an introduction to kernel estimation.
Chapter 2. Kernel estimation of the mode for random design models.
In this chapter, we will study the mode estimation for random design models and
the conditional mode for independent data.
Chapter 3. Kernel estimation of the regression mode for fixed design
models.
This chapter is the main chapter of the thesis. In this chapter, we will study the
problem of estimating nonparametrically the conditional mode for fixed design mod-
els. We suppose the error random variables are independent. The joint asymptotic
normality of the conditional mode estimator at different fixed design points is es-
tablished under some regularity conditions.
Chapter 4. Applications.
In this chapter, the performance of the mode estimation will be tested using applications to real and simulated data. Then the thesis will be closed by some concluding remarks and suggestions for new research ideas.
Chapter 1
Preliminaries
Introduction:
In this chapter, we present some important methods to estimate the density function. The concept of estimation will be introduced and discussed. We will concentrate on nonparametric estimation and introduce three of the most common nonparametric estimation methods: the histogram, the naive estimator and the kernel estimator. The asymptotic properties of the kernel estimation of the density function will be discussed in detail.
This chapter is organized as follows.
In Section 1.1, we present some concepts and facts from statistics that we will need in this thesis. In Section 1.2, we present some well known estimators of the density function. In Section 1.3, we summarize some properties of the kernels and the kernel estimator. The bias and the variance of the kernel density estimator will be discussed in Section 1.4. In Section 1.5, we will present two types of error criteria, the MSE and the MISE, for the kernel density estimation. In Section 1.6, we present the optimal binwidth.
1.1 Basic definitions and notations
In this section, we will introduce some basic definitions and theorems from statistics that will help in the remainder of this thesis.
Definition 1.1.1. (Royden and Fitzpatrick, 1988) If A is any set, we define the indicator function I_A of the set A to be the function given by

I_A(x) = { 1 if x ∈ A,
           0 if x ∉ A.
Definition 1.1.2. (Hogg and Craig, 1995) (Converge in Probability)
Let Xn be a sequence of random variables and let X be a random variable defined
on a sample space. We say X_n converges in probability to X if for all ε > 0, we have

lim_{n→∞} P[|X_n − X| ≥ ε] = 0,   (1.1.1)

or equivalently,

lim_{n→∞} P[|X_n − X| < ε] = 1.   (1.1.2)

If so, we write

X_n →_p X.
Definition 1.1.3. (Hogg and Craig, 1995)(Converge in Distribution)
Let Xn be a sequence of random variables and let X be a random variable. Let
FXn and FX be, respectively, the cumulative distribution function (cdfs) of Xn and
X. Let C(F_X) denote the set of all points where F_X is continuous. We say that X_n converges in distribution to X if

lim_{n→∞} F_{X_n}(x) = F_X(x), for all x ∈ C(F_X).   (1.1.3)

We denote this convergence by

X_n →_d X.
Definition 1.1.4. (Yates and Goodman, 1999)(Almost Sure Convergence)
A sequence of random variables, X_1, X_2, ..., converges almost surely to a random variable X if, for every ε > 0,

P( lim_{n→∞} |X_n − X| < ε ) = 1.

We denote this convergence by

X_n →_{a.s.} X.
Theorem 1.1.5. (Hogg and Craig, 1995)
1. If X_n converges to X with probability 1, then X_n converges to X in probability.

2. If X_n converges to X in probability, then X_n converges to X in distribution.

3. Let X_n converge to X in probability and let g be a continuous function on R; then g(X_n) converges to g(X) in probability.
Theorem 1.1.6. (Sen, 1993) (Liapounov Theorem)
Let X_k, k ≥ 1, be independent random variables such that E X_k = µ_k and Var X_k = σ_k², and for some 0 < δ ≤ 1,

ν_k^{2+δ} = E|X_k − µ_k|^{2+δ} < ∞,  k ≥ 1.

Also, let

T_n = Σ_{k=1}^{n} X_k,   ξ_n = E T_n = Σ_{k=1}^{n} µ_k,

S_n² = Var T_n = Σ_{k=1}^{n} σ_k²,   Z_n = (T_n − ξ_n)/S_n,

and ρ_n = S_n^{−(2+δ)} Σ_{k=1}^{n} ν_k^{2+δ}. Then, if

lim_{n→∞} ρ_n = 0,

we have

Z_n →_d N(0, 1).
Definition 1.1.7. (Hansen, 2009) (Order Notation O And o)
Given two sequences {x_n} and {y_n} such that y_n ≥ 0 for all n:

• The sequence {x_n} is of big order of the sequence {y_n} if lim sup_{n→∞} |x_n / y_n| < ∞, denoted x_n = O(y_n) (read: x_n is big oh of y_n).

• The sequence {x_n} is of small order of the sequence {y_n} if lim_{n→∞} |x_n / y_n| = 0, denoted x_n = o(y_n) as n → ∞ (read: x_n is little oh of y_n).
Theorem 1.1.8. (Wand, 1995) (Taylor’s Theorem)
Suppose that f is a real valued function defined on R and let x ∈ R. Assume that f has p continuous derivatives in an interval (x − δ, x + δ) for some δ > 0. Then for any sequence α_n converging to zero,

f(x + α_n) = Σ_{j=0}^{p} (α_n^j / j!) f^{(j)}(x) + o(α_n^p).
Theorem 1.1.9. (Sen, 1993) (Cramer-Wold)
Let X, X_1, X_2, ... be random vectors in R^p. Then X_n →_d X if, and only if, for every fixed λ ∈ R^p, we have

λ^T X_n →_d λ^T X,

where λ^T denotes the transpose of λ.
1.2 Estimation
Estimation refers to the process by which one makes inferences about a population,
based on information obtained from a sample, and it is a rule for calculating an
estimate of a given quantity based on observed data.
The probability density function is a fundamental concept in statistics. Consider any random variable X that has probability density function f. Specifying the function f gives a natural description of the distribution of X, and allows probabilities associated with X to be found from the relation

P(a < X < b) = ∫_a^b f(x) dx,

for any real constants a and b with a < b.
If the observed data are drawn from a distribution with an unknown probability density function, then the construction of an estimator of the unknown density function is called density estimation (Silverman, 1986). Density estimation
has experienced a wide explosion of interest over the last 40 years. It has been ap-
plied in many fields, including archeology, chemistry, banking, climatology, genetics,
economics, hydrology and physiology.
Definition 1.2.1. ( Estimator) An estimator is any statistic from the sample data
which is used to give information about an unknown parameter in the population.
Definition 1.2.2. (Le Cam, 1953)( Unbiased estimator)
Let X be a random variable with a pdf with parameter θ. Let X_1, ..., X_n be a random sample from the distribution of X and let θ_n denote an estimator of θ. We say θ_n is an unbiased estimator of θ if E(θ_n) = θ.
If θ_n is not unbiased, we say that θ_n is a biased estimator of θ.
Definition 1.2.3. (Le Cam, 1953)( Asymptotic Unbiasedness)
Let X be a random variable with a pdf with parameter θ. Let X_1, ..., X_n be a random sample from the distribution of X and let θ_n denote an estimator of θ. We say θ_n is an asymptotically unbiased estimator of θ if its expected value converges to θ as n → ∞:

lim_{n→∞} E(θ_n) = θ.

If this is not true, then θ_n is asymptotically biased.
Definition 1.2.4. (Le Cam, 1953)(Consistent Estimators)
The estimator θ_n of a parameter θ is said to be a consistent estimator if for any positive ε

lim_{n→∞} P(|θ_n − θ| ≥ ε) = 0,

or equivalently,

lim_{n→∞} P(|θ_n − θ| < ε) = 1.

We say that θ_n converges in probability to θ.
If this is not true, then θ_n is inconsistent.
There are two types of density estimation :
• Parametric Estimation.
• Nonparametric Estimation.
Parametric Estimation
The parametric estimation assumes that the sample under study has a known distribution, such as the Gaussian or Gamma distribution, and then parametric estimation can be used to estimate the missing parameters of the distribution by using the method of moments, maximum likelihood estimators, Bayes estimators, chi-square estimators, etc. For example, if the data follow a normal distribution with mean µ and variance σ², we can estimate the parameters µ and σ², substitute them into the normal distribution formula, and then obtain the estimated density function, denoted by f_n(x). Simple examples of parametric estimation are:

1. capacity factor estimates.

2. equipment factor estimates.
Nonparametric Estimation
Nonparametric estimation methods are defined in (Racine, 2008) as“statistical
techniques that do not require a researcher to specify functional forms
for the objects being estimated”. There are many nonparametric statistical objects of potential interest, including density functions (univariate and multivariate), density derivatives, conditional density functions, conditional distribution functions, regression functions, median functions, quantile functions, and variance functions. Many nonparametric problems are generalizations of univariate density estimation. There are many methods for obtaining a nonparametric estimate of a pdf. Three common methods are:

1. The histogram.

2. The naive estimator.

3. Kernel density estimation.
Histogram
The oldest and most widely used density estimator is the histogram. The histogram was formalised by (Sturges, 1926); it requires two parameters to be defined: the binwidth and the starting position of the first bin. The data range is divided into a set of successive and non-overlapping intervals (bins), and the frequencies of occurrence in the bins are plotted against the discretized data range. Given an origin x0 and a binwidth h, we define the bins of the histogram to be the intervals [x0 + mh, x0 + (m + 1)h), for positive and negative integers m. The intervals have been chosen closed on the left and open on the right for definiteness.
Definition 1.2.5. (Wand, 1995)(Histogram Estimator)
Let X1, X2, ..., Xn be a random sample from an unknown pdf f(x). The histogram estimator of the density function f(x) is defined by:

f_n(x) = (1/(nh)) × (number of X_i in the same bin as x).   (1.2.1)
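To make Equation (1.2.1) concrete, here is a small sketch (illustrative code, not part of the thesis; the sample values are made up for the example). The bin containing x is located from the origin x0 and the binwidth h, and the estimate is the bin count divided by nh:

```python
import numpy as np

def histogram_density(x, data, x0=0.0, h=1.0):
    """Histogram estimator (1.2.1): f_n(x) = (1/(n*h)) * #{X_i in the same
    bin as x}, with bins [x0 + m*h, x0 + (m+1)*h)."""
    n = len(data)
    m = np.floor((x - x0) / h)                       # index of the bin containing x
    left, right = x0 + m * h, x0 + (m + 1) * h
    count = np.sum((data >= left) & (data < right))  # bins are closed-left, open-right
    return count / (n * h)

data = np.array([0.2, 0.4, 0.45, 1.1, 1.3, 2.7])     # made-up sample
fx = histogram_density(0.3, data, x0=0.0, h=1.0)     # 3 of the 6 points lie in [0, 1)
# fx = 3 / (6 * 1) = 0.5
```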
Figure 1.1: Histograms with different binwidths for the same sample.
Figure 1.1 shows three histogram estimates of f(x) for the same sample, for different binwidths. Note that the estimates are piecewise constant and that they are strongly influenced by the choice of binwidth; see (Härdle, 2012).
Figure 1.2: The effect of the choice of origin is displayed in the five histograms.
Figure 1.2 shows five histograms of the Buffalo snowfall data with the same binwidth h = 10, but with different origins x0 = 0, 2, 4, 6, 8, and the average shifted histogram built from these five histograms (Härdle, 2012).
The histogram is a very simple form of density estimation, but it has several weaknesses:

1. The density estimate depends on the starting position of the bins. For multivariate data, the density estimate is also affected by the orientation of the bins.

2. The discontinuities of the estimate are not due to the underlying density; they are only an artifact of the selected bin locations. These discontinuities make it very difficult to comprehend the structure of the data.
3. A much more serious problem is the curse of dimensionality, since the number
of bins grows exponentially with the number of dimensions. In high dimen-
sions we would require a very large number of examples or else most of the
bins would be empty.
4. These issues make the histogram unsuitable for most practical applications
except for quick visualizations in one or two dimensions.
5. Histogram depends on the binwidth h and the origin x0.
To overcome the weaknesses of the histogram method, there is another method, the naive estimator.
The naive estimator (Silverman, 1986)
A generalization of the histogram method is the naive estimator; it is equivalent to a histogram where the estimation point x is used as the center of the bin, and the binwidth is 2h. Therefore, the naive estimator is always globally a valid pdf, i.e., it is non-negative and integrates to one. The naive estimator was proposed by (Fix and Hodges Jr, 1951).
If the random variable X has density function f, then

f(x) = lim_{h→0} (1/(2h)) P(x − h < X < x + h).   (1.2.2)

For any given h, we can estimate P(x − h < X < x + h) by the proportion of the sample falling in the interval (x − h, x + h). Thus a natural estimator f_n(x) of the density function is given by choosing a small number h and setting

f_n(x) = (1/(2nh)) [number of X_1, ..., X_n falling in (x − h, x + h)].   (1.2.3)
This estimator is called the naive estimator.
Comparing the above estimator to Equation (1.2.1), we see that it is equivalent to
a histogram.
It can be informative to rewrite Equation (1.2.3) using the weighting function W:

W(x) = { 1/2 if |x| < 1,
         0   otherwise.

Using this notation, we can express the naive estimator as

f_n(x) = (1/n) Σ_{i=1}^{n} (1/h) W((x − X_i)/h),   (1.2.4)

where the X_i are the data samples.
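The counting form (1.2.3) and the weight-function form (1.2.4) give the same value, which the following sketch checks numerically (illustrative code with a made-up sample, not part of the thesis):

```python
import numpy as np

def naive_count_form(x, data, h):
    """Naive estimator in the counting form (1.2.3)."""
    n = len(data)
    count = np.sum((data > x - h) & (data < x + h))   # X_i falling in (x-h, x+h)
    return count / (2.0 * n * h)

def naive_weight_form(x, data, h):
    """Naive estimator in the weight-function form (1.2.4), W(u) = 1/2 for |u| < 1."""
    u = np.abs((x - data) / h)
    return np.mean(np.where(u < 1.0, 0.5, 0.0)) / h

data = np.array([0.2, 0.4, 0.45, 1.1, 1.3, 2.7])      # made-up sample
f1 = naive_count_form(0.5, data, h=0.25)              # 0.4 and 0.45 lie in (0.25, 0.75)
f2 = naive_weight_form(0.5, data, h=0.25)
# both equal 2 / (2 * 6 * 0.25) = 2/3
```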
Figure 1.3: Naive estimate constructed from Old Faithful geyser data, h = 0.25.
In Figure 1.3 the stepwise nature of the estimate is clear. The boxes used to construct the estimate are placed at the observed eruption lengths, with the same binwidth but different origins (Samanta and Thavanesmaran, 1990).
The naive estimator has some disadvantages:

• It is not continuous; it has jumps at the points X_i ± h.

• It has zero derivative everywhere else.

A refinement of the naive estimator is obtained by replacing the weight function W by a kernel function K.
The Kernel Estimator
The univariate kernel density estimation (KDE) is a non-parametric way to esti-
mate the pdf f(x) of a random variable X. It is a fundamental data smoothing
problem where inferences about the population are made, based on a finite data
sample. These techniques are widely used in various inference procedures such as
signal processing, data mining, and econometrics.
Definition 1.2.6. (Wand, 1995)(Kernel Estimator)
Let X1, X2, ..., Xn be a random sample from an unknown pdf f(x). The kernel estimator of the density function f(x) was defined by (Rosenblatt, 1956), who generalized the naive estimator to the kernel form

f_n(x) = (1/n) Σ_{i=1}^{n} K_h(x − X_i) = (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h).   (1.2.5)

Here h is called the binwidth and K(·) is a kernel function, considered to be symmetric and to satisfy

∫_{−∞}^{∞} K(x) dx = 1.
The kernel estimator can be viewed as a sum of bumps placed at the observations. The kernel function K determines the shape of the bumps, while the binwidth h determines their width; see the illustration in Figure 1.4 (Fan and Yao, 2003).
Figure 1.4: Kernel density estimation based on 7 points.
From Figure 1.4, we have:

• (1) The shape of each bump is defined by the kernel function.

• (2) The spread of each bump is determined by the binwidth h, which is analogous to the binwidth of a histogram.

That is, the value of the kernel estimate at the point x is the average of the n kernel ordinates at this point.
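The bumps-average reading of (1.2.5) can be sketched as follows (illustrative code; the seven data points are invented, since the data behind Figure 1.4 are not given in the text). With the Gaussian kernel the resulting estimate is non-negative and integrates to one:

```python
import numpy as np

def kde(x, data, h):
    """Kernel estimator (1.2.5) with the Gaussian kernel: the value at x is
    the average of the n kernel bumps (each scaled by 1/h) evaluated at x."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

data = np.array([-1.2, -0.5, 0.0, 0.3, 0.7, 1.1, 2.0])  # seven invented points
grid = np.linspace(-5.0, 6.0, 2201)
fx = np.array([kde(x, data, h=0.5) for x in grid])

area = np.sum(fx) * (grid[1] - grid[0])   # numerical integral; should be close to 1
```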
1.3 Properties of the Kernels Estimator
In this section, we will discuss some fundamental properties of the kernel estimator and give some examples of common kernel functions.
We consider some properties of the kernels:

1. The kernel is a piecewise continuous function, symmetric around zero, and integrating to one, i.e.

K(x) = K(−x),   ∫_{−∞}^{∞} K(x) dx = 1.

2. The kernel function need not have bounded support, and in most applications K is a positive probability density function.
Definition 1.3.1. (Hansen, 2009) A kernel function K is said to be of order p if its first nonzero moment is µ_p, i.e. if

µ_j(K) = 0, j = 1, 2, ..., p − 1,   µ_p(K) ≠ 0,

where

µ_j(K) = ∫_{−∞}^{∞} y^j K(y) dy.   (1.3.1)
Some examples of the kernel functions and their formula are given in Table 1.1,
where I is the indicator function.
Table 1.1: Some classical kernel functions

Kernel        Equation
Epanechnikov  K(t) = (3/4)(1 − t²/5)/√5 · I(|t| ≤ √5)
Biweight      K(t) = (15/16)(1 − t²)² · I(|t| ≤ 1)
Triangular    K(t) = (1 − |t|) · I(|t| ≤ 1)
Gaussian      K(t) = (1/√(2π)) exp(−t²/2)
Rectangular   K(t) = (1/2) · I(|t| ≤ 1)
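The entries of Table 1.1 can be checked numerically: each kernel should be symmetric and integrate to one. The sketch below (illustrative, not from the thesis) verifies both properties on a fine grid:

```python
import numpy as np

# The kernels of Table 1.1 as plain functions of t (the indicator I is
# written as a boolean factor).
kernels = {
    "Epanechnikov": lambda t: 0.75 * (1 - t**2 / 5) / np.sqrt(5) * (np.abs(t) <= np.sqrt(5)),
    "Biweight":     lambda t: (15 / 16) * (1 - t**2) ** 2 * (np.abs(t) <= 1),
    "Triangular":   lambda t: (1 - np.abs(t)) * (np.abs(t) <= 1),
    "Gaussian":     lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi),
    "Rectangular":  lambda t: 0.5 * (np.abs(t) <= 1),
}

t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
# for each kernel: (numerical integral of K, whether K(t) == K(-t))
checks = {name: (float(np.sum(K(t)) * dt), bool(np.allclose(K(t), K(-t))))
          for name, K in kernels.items()}
```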
From the definitions given in Table 1.1, we can see that the choice of h determines how many values are included in estimating the density at each point. This value is called the window width or binwidth. If the window width is not specified, it is determined as

m = min( √(variance of x), (interquartile range of x) / 1.349 ),

h = 0.9 m / n^{1/5},

where x is the variable for which we wish to estimate the kernel density and n is the number of observations. Most researchers agree that the choice of kernel is not as important as the choice of binwidth. There is a great deal of literature on choosing bandwidths under different conditions; see for example (Parzen, 1962).
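The rule-of-thumb choice above can be sketched in a few lines (illustrative code, not part of the thesis; the N(0,1) sample is made up for the example):

```python
import numpy as np

def rule_of_thumb_binwidth(x):
    """h = 0.9 * m / n^(1/5), where m = min(sample standard deviation,
    interquartile range / 1.349), as in the formula above."""
    n = len(x)
    sd = np.std(x, ddof=1)                              # sqrt of the sample variance
    iqr = np.percentile(x, 75) - np.percentile(x, 25)   # interquartile range
    m = min(sd, iqr / 1.349)
    return 0.9 * m / n ** (1 / 5)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1000)
h = rule_of_thumb_binwidth(x)
# for N(0,1) data, m is close to 1, so h is roughly 0.9 / 1000**0.2, about 0.23
```

The factor 1.349 is the interquartile range of the standard normal distribution, so for Gaussian data both candidates for m estimate the same scale.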
Now, we introduce some important properties of the kernel estimator. We consider the following conditions:

(i) The unknown density function f(x) has a continuous second derivative f''(x).

(ii) The binwidth h = h_n satisfies lim_{n→∞} h = 0 and lim_{n→∞} nh = ∞.

(iii) The kernel K is a bounded probability density function of order 2, symmetric about the origin.

(iv) ∫_{−∞}^{∞} zK(z) dz = 0 and ∫_{−∞}^{∞} z²K(z) dz < ∞.
1.4 Bias and variance of the kernel density estimation
In the light of the previous properties, the bias and the variance of the kernel density estimator will be derived.

Definition 1.4.1. (Hansen, 2009) The bias of an estimator f_n(x) of a density f(x) is the difference between the expected value of f_n(x) and f(x), that is

Bias(f_n(x)) = E(f_n(x)) − f(x),

where

E(f_n(x)) = (1/n) Σ_{i=1}^{n} (1/h) E K((x − X_i)/h)
          = (1/n) Σ_{i=1}^{n} (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt
          = (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt.

The transformation z = (x − t)/h, i.e. t = x − hz, |dz/dt| = 1/h, gives

E(f_n(x)) = ∫_{−∞}^{∞} K(z) f(x − hz) dz.
Since f has a continuous second derivative, expanding f(x − hz) in a Taylor series yields

f(x − hz) = Σ_{j=0}^{2} ((−hz)^j / j!) f^{(j)}(x) + o((hz)²)
          = f(x) + (−hz) f'(x) + (h²z²/2) f''(x) + o(h²z²)
          = f(x) − hz f'(x) + (1/2)(hz)² f''(x) + o(h²),

where o(h²) represents terms that converge to zero faster than h² as h approaches zero. Thus

E(f_n(x)) = ∫_{−∞}^{∞} K(z) { f(x) − hz f'(x) + (h²z²/2) f''(x) + o(h²) } dz
          = f(x) ∫_{−∞}^{∞} K(z) dz − h f'(x) ∫_{−∞}^{∞} zK(z) dz + (h²/2) f''(x) ∫_{−∞}^{∞} z²K(z) dz + o(h²)
          = f(x) + (h²/2) µ₂(K) f''(x) + o(h²).

Hence

Bias(f_n(x)) = (1/2) h² f''(x) µ₂(K) + o(h²),   (1.4.1)

where µ₂(K) = ∫_{−∞}^{∞} z²K(z) dz.
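The leading bias term in (1.4.1) can be checked numerically. The sketch below (illustrative, not from the thesis) takes f to be the N(0,1) density and K the Gaussian kernel (so µ₂(K) = 1), computes E f_n(x) = ∫ K(z) f(x − hz) dz by numerical integration, and compares the exact bias with (h²/2) µ₂(K) f''(x):

```python
import numpy as np

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)  # f = N(0,1) density
K = phi                                                  # Gaussian kernel, mu2(K) = 1
fpp = lambda t: (t**2 - 1) * phi(t)                      # f''(t) for the N(0,1) density

x, h = 0.5, 0.05
z = np.linspace(-8.0, 8.0, 160001)
dz = z[1] - z[0]

Efn = np.sum(K(z) * phi(x - h * z)) * dz   # E f_n(x) = integral of K(z) f(x - hz) dz
exact_bias = Efn - phi(x)
approx_bias = 0.5 * h**2 * 1.0 * fpp(x)    # leading term of (1.4.1)
# the two agree up to o(h^2); here f''(x) < 0, so the bias is negative (peak region)
```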
The variance of f_n(x) is given by

Var(f_n(x)) = Var( (1/(nh)) Σ_{i=1}^{n} K((x − X_i)/h) )
            = (1/(n²h²)) Σ_{i=1}^{n} Var( K((x − X_i)/h) ),

because the X_i, i = 1, 2, ..., n, are independently distributed. Now

Var( K((x − X_i)/h) ) = E( K((x − X_i)/h)² ) − ( E K((x − X_i)/h) )²
                      = ∫_{−∞}^{∞} K((x − t)/h)² f(t) dt − ( ∫_{−∞}^{∞} K((x − t)/h) f(t) dt )²,

so

Var(f_n(x)) = (1/n) ∫_{−∞}^{∞} (1/h²) K((x − t)/h)² f(t) dt − (1/n) ( (1/h) ∫_{−∞}^{∞} K((x − t)/h) f(t) dt )²
            = (1/n) ∫_{−∞}^{∞} (1/h²) K((x − t)/h)² f(t) dt − (1/n) ( f(x) + Bias(f_n(x)) )².

Substituting z = (x − t)/h, one obtains

Var(f_n(x)) = (1/(nh)) ∫_{−∞}^{∞} K(z)² f(x − hz) dz − (1/n)( f(x) + o(h²) )².

Applying a Taylor approximation yields

Var(f_n(x)) = (1/(nh)) ∫_{−∞}^{∞} K(z)² ( f(x) − hz f'(x) + o(h) ) dz − (1/n)( f(x) + o(h²) )².

Note that if n becomes large and h becomes small, then the above expression becomes approximately

Var(f_n(x)) = (1/(nh)) f(x) R(K) + o((nh)^{−1}),   (1.4.2)

where R(K) = ∫_{−∞}^{∞} K²(z) dz. By the assumptions above, the result holds.
From the above we have some properties of the bias and the variance:

1. The bias is of order h², which implies that f_n(x) is an asymptotically unbiased estimator.

2. The bias is large whenever the absolute value of the second derivative |f''(x)| is large. This occurs for several densities at peaks, where the bias is negative, and valleys, where the bias is positive.

3. The variance is of order (nh)^{−1}, which means that the variance converges to zero by condition (ii).
Parzen (1962) studied the statistical properties of the kernel estimator. In addition to the above, he proved several other properties. He showed that f_n(x) is a consistent estimator of f(x) and that the sequence of estimates f_n(x) is asymptotically normally distributed. Also, he proved that if the probability density function f(x) is uniformly continuous, and if lim_{n→∞} nh_n² = ∞, then f_n(x) tends uniformly (in probability) to f(x), in the sense that the next equation holds:

lim_{n→∞} P( sup_{−∞<x<∞} |f_n(x) − f(x)| < ε ) = 1, ∀ε > 0.
1.5 The MSE and MISE criteria
The important role played by the kernel density estimator makes us concerned with its performance, its efficiency and its accuracy in estimating the true density. There are two types of error criteria:
1. The mean squared error (MSE)
This type of error criteria is used to measure the error when estimating the density
function at a single point. It is defined by:
MSE {fn(x)} = E {fn(x) – f(x)}2 . (1.5.1)
The MSE measures the average squared difference between the density estimator
and the true density. In general, any function of the absolute distance |fn(x) – f(x)|
(often called metric) would serve as a measurement of the goodness of an estimator.
But MSE metric has at least two advantages:
• it is tractable analytically.
• it has an interesting decomposition into variance and squared bias provided
f(x) is not random.
For estimation at a single point x, a natural measure of discrepancy is the mean
square error (MSE) defined by:
\begin{align*}
\mathrm{MSE}(f_n(x)) &= E[f_n(x) - f(x)]^2 \\
&= E[f_n^2(x) - 2f(x)f_n(x) + f^2(x)] \\
&= E f_n^2(x) - 2f(x)E f_n(x) + f^2(x) \\
&= (E f_n(x))^2 - 2f(x)E f_n(x) + f^2(x) + E f_n^2(x) - (E f_n(x))^2 \\
&= [E f_n(x) - f(x)]^2 + \operatorname{Var}(f_n(x)) \\
&= \mathrm{Bias}^2(f_n(x)) + \operatorname{Var}(f_n(x)).
\end{align*}
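The decomposition above is an exact algebraic identity when the empirical mean is used as the centering point, so it can be verified directly by simulation. The following sketch (illustrative choices of point, bandwidth, sample size, and number of replications) estimates the MSE of a Gaussian-kernel estimator at $x_0 = 0$ for $N(0,1)$ data by Monte Carlo and checks that it equals the squared bias plus the variance:

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

random.seed(1)
x0, h, n, reps = 0.0, 0.4, 200, 300
f_true = 1.0 / math.sqrt(2.0 * math.pi)   # true N(0,1) density at x0 = 0

# Repeatedly draw samples and record f_n(x0) for each replication.
estimates = [kde(x0, [random.gauss(0.0, 1.0) for _ in range(n)], h)
             for _ in range(reps)]
mean_est = sum(estimates) / reps
mse = sum((e - f_true) ** 2 for e in estimates) / reps
bias2 = (mean_est - f_true) ** 2
var = sum((e - mean_est) ** 2 for e in estimates) / reps
print(mse, bias2 + var)   # the two quantities coincide
```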
2. The mean integrated squared error (MISE)

This criterion measures the error when estimating the density over the whole real line. The best known criterion of this type is the mean integrated squared error, which was introduced by (Rosenblatt, 1956).

Definition 1.5.1. (Ouyang, 2005). An error criterion that measures the distance between $f_n(x)$ and $f(x)$ is the integrated squared error (ISE), given by
\[
\mathrm{ISE}\{f_n(x)\} = \int_{-\infty}^{\infty} (f_n(x) - f(x))^2\,dx.
\]
Note that the ISE, being a random quantity, is not an appropriate summary over all data sets, so we prefer to analyze its expected value.

Definition 1.5.2. (Ouyang, 2005). The expected value of the ISE, called the mean integrated squared error (MISE), is given by
\[
\mathrm{MISE}(f_n(x)) = E(\mathrm{ISE}\{f_n(x)\}) = E\int_{-\infty}^{\infty} \{f_n(x) - f(x)\}^2\,dx. \tag{1.5.2}
\]
We can write the MISE as the integral of the MSE, and hence as the sum of the integrated squared bias and the integrated variance:
\[
\mathrm{MISE}(f_n(x)) = \int_{-\infty}^{\infty} \mathrm{MSE}(f_n(x))\,dx = \int_{-\infty}^{\infty} \{E f_n(x) - f(x)\}^2\,dx + \int_{-\infty}^{\infty} \operatorname{Var}(f_n(x))\,dx. \tag{1.5.3}
\]
Equation (1.5.3) gives the MISE as the sum of the integrated squared bias and the integrated variance. Substituting (1.4.1) and (1.4.2), we conclude that
\[
\mathrm{MISE}(f_n(x)) = \mathrm{AMISE}(f_n(x)) + o\{h^4 + (nh)^{-1}\}, \tag{1.5.4}
\]
where AMISE is the asymptotic mean integrated squared error, given by
\[
\mathrm{AMISE}(f_n(x)) = \tfrac{1}{4}h^4\mu_2(K)^2 R(f'') + (nh)^{-1}R(K), \tag{1.5.5}
\]
see (Wand, 1995). Notice that the integrated squared bias is asymptotically proportional to $h^4$, so to reduce this quantity one needs to take $h$ small. On the other hand, taking a small $h$ increases the integrated variance, since it is proportional to $(nh)^{-1}$. Therefore, as $n$ increases, $h$ should vary in such a way that each of the components of the MISE becomes small. This is known as the variance-bias trade-off. The trade-off between bias and variance seems to be an intrinsic part of the performance of data-based binwidth selectors: less bias seems to entail more variance, and at some cost in bias, much less variance can be obtained.
Mean integrated absolute error (MIAE)

We could also work with other criteria, for example
\[
\mathrm{MIAE}\{f_n(\cdot, h)\} = E\int_{-\infty}^{\infty} |f_n(x, h) - f(x)|\,dx.
\]
The MIAE is always defined whenever $f_n(x, h)$ is a density, and it is invariant under monotone transformations, but it is analytically more complicated.
1.6 Optimal Binwidth

The problem of selecting the binwidth is very important in kernel density estimation. The choice of an appropriate binwidth is critical to the performance of most nonparametric density estimators. When the binwidth is very small, the estimate will be very close to the original data: it will be almost unbiased, but it will have large variation under repeated sampling. If the binwidth is very large, the estimate will be very smooth, lying close to the mean of all the data. Such an estimate will have small variance, but it will be highly biased. There are many rules for binwidth selection, for example normal scale rules, over-smoothed binwidth selection rules, least squares cross-validation, biased cross-validation, estimation of density functionals and plug-in binwidth selection. For more details see (Loeve, 1960), (Silverman, 1986) and (Wand, 1995). In this section, we briefly review methods for choosing a global value of the binwidth $h$.
The problem of choosing $h$ is crucial in density estimation:

(i) A large $h$ will over-smooth the density estimate and mask the structure of the data.

(ii) A small $h$ will yield a density estimate that is spiky and very hard to interpret.

(iii) We would like to find a value of $h$ that minimizes the error between the estimated density and the true density. A natural measure is the MSE at the estimation point $x$, defined by (1.5.1). The bias of an estimate is the systematic error incurred in the estimation; the variance of an estimate is the random error incurred in the estimation.

(iv) The bias-variance dilemma applied to binwidth selection simply means that a large binwidth will reduce the differences between the estimates $f_n(x)$ for different data sets (the variance), but it will increase the bias of $f_n(x)$ with respect to the true density $f(x)$. A small binwidth will reduce the bias of $f_n(x)$, at the cost of a larger variance in the estimates $f_n(x)$.
Subjective choice

The natural way of choosing $h$ is to plot out several curves and choose the estimate that best matches one's prior (subjective) ideas. However, this method is not practical in pattern recognition, since we typically have high-dimensional data.
Reference to a standard distribution

Assume a standard density function and find the value of the binwidth that minimizes the mean integrated squared error (MISE),
\[
h_{\mathrm{MISE}} = \arg\min_h E\left[\int (f_n(x) - f(x))^2\,dx\right]. \tag{1.6.1}
\]
If we assume that the true distribution is Gaussian and we use a Gaussian kernel, the bandwidth $h$ is computed using the following equation from (Silverman, 1986):
\[
h^* = 1.06\,S\,N^{-1/5}, \tag{1.6.2}
\]
where $S$ is the sample standard deviation and $N$ is the sample size.
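Rule (1.6.2) is simple enough to state as a one-line function; the following sketch applies it to an arbitrary synthetic sample (the $N(5, 2^2)$ data and seed are illustrative choices):

```python
import random
import statistics

def silverman_bandwidth(data):
    """Normal-reference rule h* = 1.06 * S * N^(-1/5) from (1.6.2)."""
    S = statistics.stdev(data)    # sample standard deviation S
    N = len(data)
    return 1.06 * S * N ** (-1.0 / 5.0)

random.seed(2)
sample = [random.gauss(5.0, 2.0) for _ in range(500)]
print(silverman_bandwidth(sample))   # roughly 1.06 * 2 * 500^(-1/5) ≈ 0.6
```

Note that the binwidth shrinks only at the slow rate $N^{-1/5}$: quadrupling the sample size reduces $h^*$ by a factor of about $4^{-1/5} \approx 0.76$.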
The AMISE (asymptotic MISE) has some useful advantages. Its simplicity as a mathematical expression makes it useful for large sample approximations. It also exhibits the important relationship between bias and variance known as the variance-bias trade-off, and it gives us an understanding of the role of the binwidth $h$.
Figure 1.5: Kernel density estimates based on different binwidths.
Figure 1.5 shows estimates based on three binwidths: $h = 0.25$ gives the solid curve, $h = 0.5$ the dashed curve, and $h = 0.75$ the dotted curve (Salha, 2014).
Corollary 1.6.1. The AMISE-optimal binwidth, $h_{\mathrm{AMISE}}$, has the closed form
\[
h_{\mathrm{opt}} = \left[\frac{R(K)}{\mu_2(K)^2 R(f'')\, n}\right]^{1/5}. \tag{1.6.3}
\]
Proof. By differentiating (1.5.5) with respect to $h$ and setting the derivative equal to zero, we can find the optimal binwidth:
\[
\frac{d}{dh}\{\mathrm{AMISE}(f_n(x))\} = -(nh^2)^{-1}R(K) + h^3\mu_2^2(K)R(f'') = 0,
\]
\[
h^5\mu_2^2(K)R(f'') = n^{-1}R(K),
\]
\[
h_{\mathrm{opt}} = \left\{\frac{R(K)}{n\,\mu_2^2(K)R(f'')}\right\}^{1/5}.
\]
Therefore, if we substitute (1.6.3) into (1.5.5), we obtain the smallest value of the AMISE for estimating $f$ using the kernel $K$:
\[
\inf_{h>0} \mathrm{AMISE}\{f_n\} = \frac{5}{4}\left\{\mu_2(K)^2 R(K)^4 R(f'')\right\}^{1/5} n^{-4/5}. \tag{1.6.4}
\]
Notice from (1.6.3) that the optimal binwidth depends on the unknown density being estimated, so we cannot use (1.6.3) directly to find the optimal binwidth $h_{\mathrm{opt}}$. Also from (1.6.3) we can draw the following useful conclusions:

• The optimal binwidth converges to zero as the sample size increases, but at a very slow rate.

• The optimal binwidth is inversely proportional to $R(f'')^{1/5}$. Since $R(f'')$ measures the curvature of $f$, this means that for a density function with little curvature, the optimal binwidth will be large. Conversely, if the density function has large curvature, the optimal binwidth will be small.
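For densities where $R(f'')$ is known in closed form, (1.6.3) can be evaluated directly. A sketch for the Gaussian kernel, where $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, applied to the standard normal density, where $R(f'') = 3/(8\sqrt{\pi})$; the algebra collapses to $h_{\mathrm{opt}} = (4/(3n))^{1/5} \approx 1.06\,n^{-1/5}$, recovering Silverman's constant in (1.6.2):

```python
import math

def h_opt(RK, mu2K, Rf2, n):
    """AMISE-optimal binwidth (1.6.3): [R(K) / (mu2(K)^2 R(f'') n)]^(1/5)."""
    return (RK / (mu2K ** 2 * Rf2 * n)) ** 0.2

RK = 1.0 / (2.0 * math.sqrt(math.pi))     # R(K) for the Gaussian kernel
mu2K = 1.0                                # mu2(K) for the Gaussian kernel
Rf2 = 3.0 / (8.0 * math.sqrt(math.pi))    # R(f'') for the standard normal density

print(h_opt(RK, mu2K, Rf2, 500))          # (4/(3*500))^(1/5), about 1.06 * 500^(-1/5)
```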
Summary:

In this chapter, we introduced some basic definitions and theorems that will be needed in this thesis. We studied the definition of estimation and its types, gave a précis of nonparametric estimation and its common methods, and then presented the kernel density estimation of the pdf and the properties of the kernel estimator. We also presented the MSE and MISE criteria and their roles, and studied the optimal binwidth and several rules for binwidth selection. In the next chapter, we will study the kernel estimation of the mode for random design models.
Chapter 2
Kernel Estimation of The Mode
for Random Design Models
Introduction:

The mode is one of the measures of central tendency; it identifies the most frequently occurring value, that is, the value where the density function attains its maximum.

Definition 2.0.1. (Salha and Ioanides, 2007) (A mode) of a probability density $f(t)$ is a value $\theta$ which maximizes $f$. The size of the mode is $f(\theta)$:
\[
\theta = \arg\max_x f(x). \tag{2.0.1}
\]
The mode is the most frequently occurring value among the observations; the data might have more than one mode, or no mode at all, in which case it cannot be calculated. The mode can be found by calculation or graphically, and it is not affected by irregular (extreme) values. It is important to note the relation between the mean, the median, and the mode. These are the measures of location most used by investigators, because they are easy to understand. They are equal when the curve is symmetric. When the curve is positively skewed, the mean is larger than the median, and both are larger than the mode. When the curve is negatively skewed, the mode is larger than the median, and both are larger than the mean.
Strengths of the mode (Le Cam, 1953)
• Very quick and easy to determine.
• Is an actual value of the data.
• Not affected by extreme scores.
Weaknesses of the mode
• Sometimes not very informative.
• Can change dramatically from sample to sample.
• Might be more than one.
The problem of estimating the mode of a pdf has received considerable attention in the literature. The study of nonparametric mode estimation is now four decades old, having roots in many papers. In the last few years, an increasing interest in this topic can be observed.
In this chapter, we will study the mode estimation for random design models and
the conditional mode for independent data. This chapter consists of two sections.
In Section 2.1, we introduce the kernel estimation of the mode. In the next section,
we present the kernel estimation of the conditional mode for random design models.
2.1 The kernel estimation of the mode

The problem of estimating the location and size of the mode is considered via kernel density estimates. In this section, we study Parzen's kernel mode estimator and its generalization.
2.1.1 Parzen’s Mode
We present the kernel estimation of the mode. Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with pdf $f(x)$. Assume that $f(x)$ is uniformly continuous in $x$. It follows that $f(x)$ possesses a mode $\theta$, which is defined by
\[
f(\theta) = \max_x f(x);
\]
we assume that $\theta$ is unique.

The classical procedure to estimate the mode is as follows. If $f(x)$ is the unknown function and $\theta$ is the mode of $f(x)$, then $\theta$ is estimated by the location $\theta_n$ that maximizes the estimator $f_n(x)$ of $f(x)$. Suppose that $f_n(x)$ is a continuous function and tends to $0$ as $|x|$ tends to $\infty$. Then there is a random variable $\theta_n$ such that
\[
\theta_n = \arg\max_x f_n(x), \tag{2.1.1}
\]
where
\[
f_n(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right). \tag{2.1.2}
\]
We call $\theta_n$ the sample mode.
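The sample mode (2.1.1) can be approximated in practice by maximizing $f_n$ over a finite grid, as discussed below. A minimal sketch, with an arbitrary Gaussian kernel, grid step, sample, seed, and bandwidth:

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def sample_mode(data, h, grid_step=0.05):
    """theta_n = argmax_x f_n(x) from (2.1.1), approximated over a finite grid."""
    lo, hi = min(data), max(data)
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    return max(grid, key=lambda x: kde(x, data, h))

random.seed(3)
sample = [random.gauss(2.0, 1.0) for _ in range(1000)]
print(sample_mode(sample, h=0.4))   # near the true mode 2.0
```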
(Parzen, 1962) considered $\theta_n$ as an estimator of the population mode $\theta$. He established conditions under which $\theta_n$ is a consistent estimate, in the sense that
\[
\lim_{n\to\infty} P\big[|\theta_n - \theta| \le \varepsilon\big] = 1, \quad \forall\, \varepsilon > 0. \tag{2.1.3}
\]
The estimator (2.1.1) is increasingly used, although it is difficult to calculate. Indeed, in addition to the calculation of $f_n$, it involves a numerical step for the computation of the arg max. As noticed by (Devroye, 1979), classical search methods for the arg max perform satisfactorily only when $f_n$ is sufficiently regular. Thus in practice, the arg max is usually computed over a finite grid, although this may affect the asymptotic properties of the estimator. Moreover, when the dimension of the sample space is large, or when accurate estimation is needed, the grid size, increasing exponentially with the dimension, leads to time-consuming computations.
Finally, the search grid should be located around high density areas. In high dimension, this is a difficult task, and the search grid usually includes low density areas. To solve this problem, (Abraham et al., 2003) proposed a concurrent estimator of the mode, $\theta_n^*$, defined by
\[
\theta_n^* = \arg\max_{x \in S_n} f_n(x), \tag{2.1.4}
\]
where $S_n = \{X_1, X_2, \ldots, X_n\}$ is a finite sample of $d$-dimensional data. The main advantage of using $\theta_n^*$ instead of $\theta_n$ is that the former is easily computed in a finite number of operations. Moreover, since the sample points are naturally concentrated in high density areas, the set $S_n$ can be regarded as the most natural random grid for approximating the mode.
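The estimator (2.1.4) replaces the search grid by the sample itself, so it needs no choice of grid at all. A minimal sketch (Gaussian kernel; the sample, seed, and bandwidth are illustrative):

```python
import math
import random

def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def sample_point_mode(data, h):
    """theta*_n = argmax_{x in S_n} f_n(x) from (2.1.4): maximize over the sample."""
    return max(data, key=lambda x: kde(x, data, h))

random.seed(4)
sample = [random.gauss(-1.0, 0.5) for _ in range(500)]
print(sample_point_mode(sample, h=0.3))   # near the true mode -1.0
```

Since each of the $n$ evaluations of $f_n$ costs $O(n)$, the whole computation is $O(n^2)$ but requires no tuning beyond the bandwidth.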
(Abraham et al., 2003) established the strong consistency of $\theta_n^*$ towards $\theta_n$ and provided an almost sure rate of convergence without any differentiability condition on $f$ around the mode. (Abraham et al., 2004) examined whether maximization over a finite sample alters the rate of convergence of the estimate $\theta_n^*$ compared to that of the estimate $\theta_n$. They proved that the two estimates have the same asymptotic behavior. Another use of computing $\theta_n^*$ is that it may be an appropriate choice for a starting value of an optimization algorithm to approximate $\theta_n$.
To achieve asymptotic normality of $\theta_n$, and therefore to be able to construct asymptotic confidence intervals for $\theta$, it is generally believed that rather heavy smoothing conditions are needed. (Parzen, 1962) gave conditions under which the sample mode $\theta_n$ is asymptotically normally distributed.
Theorem 2.1.1. (Parzen, 1962) (Asymptotic normality of the sample mode.)
Let $K(x)$ be a probability density with characteristic function $k(u)$, and let the density $f(x)$ have characteristic function $\varphi(u)$. If for some $r \ge 2$
\[
\lim_{u\to 0} \frac{1 - k(u)}{u^r} > 0,
\]
and if, for some $\delta$, $\tfrac{1}{2} < \delta < 1$,

1. $\int_{-\infty}^{\infty} |u|^{2+\delta}|\varphi(u)|\,du < \infty$,

2. $\int_{-\infty}^{\infty} |u|^{2+\delta}|k(u)|\,du < \infty$,

3. $\lim_{n\to\infty} nh^{5+2\delta} = 0$,

4. $\lim_{n\to\infty} nh^6 = \infty$,

then
\[
(nh^3)^{1/2}(\theta_n - \theta) \longrightarrow N\left(0,\ \frac{f(\theta)}{[f''(\theta)]^2}\int_{-\infty}^{\infty} (K'(y))^2\,dy\right). \tag{2.1.5}
\]
Proof. See (Parzen, 1962) theorem (5A).
In this section, the conditions under which the asymptotic properties of the kernel estimator of the pdf and of its mode are obtained are stated. For more details see (Parzen, 1962) and (Salha and Ioanides, 2007).

Condition 1

1. The density function $f(x)$ is uniformly continuous,

2. $\lim_{x\to\pm\infty} f(x) = 0$,

3. $f(x)$ has a continuous second derivative.

Condition 2

The bandwidth $h$ is a function of $n$ such that

1. $\lim_{n\to\infty} h = 0$,

2. $\lim_{n\to\infty} nh = \infty$,

3. $\lim_{n\to\infty} nh^2 = \infty$.

Condition 3

Let $K(x)$ be a Borel function satisfying the conditions

1. $K(x)$ is twice differentiable,

2. $\sup_{-\infty<x<\infty} |K(x)| < \infty$,

3. $\int_{-\infty}^{\infty} |K(x)|\,dx < \infty$,

4. $\lim_{x\to\infty} |xK(x)| = 0$,

5. $\int_{-\infty}^{\infty} K(x)\,dx = 1$,

6. $\lim_{x\to\infty} x^2 K(x) = c$.

Under the previous conditions, the kernel density estimator is asymptotically unbiased and consistent.
First, the following theorem (2.1.2) is important for proving the unbiasedness and consistency of the kernel density estimator.

Theorem 2.1.2. (Bochner's Theorem). Suppose $K(y)$ is a Borel function satisfying the conditions:

1. $\sup_{-\infty<y<\infty} |K(y)| < \infty$,

2. $\int_{-\infty}^{\infty} |K(y)|\,dy < \infty$,

3. $\lim_{|y|\to\infty} |yK(y)| = 0$,

and let $g(y)$ satisfy
\[
\int_{-\infty}^{\infty} |g(y)|\,dy < \infty.
\]
Let $h$ be a sequence of positive constants satisfying $\lim_{n\to\infty} h = 0$, and define
\[
g_n(x) = \frac{1}{h}\int_{-\infty}^{\infty} K\left(\frac{y}{h}\right) g(x - y)\,dy.
\]
Then at every point $x$ of continuity of $g(\cdot)$,
\[
\lim_{n\to\infty} g_n(x) = g(x)\int_{-\infty}^{\infty} K(y)\,dy.
\]
Proof. First note that
\[
g_n(x) - g(x)\int_{-\infty}^{\infty} K(y)\,dy = \int_{-\infty}^{\infty} \big(g(x - y) - g(x)\big)\frac{1}{h}K\left(\frac{y}{h}\right)dy.
\]
Let $\delta > 0$, and split the region of integration into the two regions $|y| \le \delta$ and $|y| > \delta$. Then
\begin{align*}
\left|g_n(x) - g(x)\int_{-\infty}^{\infty} K(y)\,dy\right|
&\le \max_{|y|\le\delta}|g(x - y) - g(x)| \int_{|z|\le \delta/h} |K(z)|\,dz \\
&\quad + \int_{|y|\ge\delta} |g(x - y)|\,\frac{1}{|y|}\left|\frac{y}{h}K\left(\frac{y}{h}\right)\right| dy
+ |g(x)|\int_{|y|\ge\delta} \frac{1}{h}\left|K\left(\frac{y}{h}\right)\right| dy \\
&\le \max_{|y|\le\delta}|g(x - y) - g(x)| \int_{-\infty}^{\infty} |K(z)|\,dz \\
&\quad + \frac{1}{\delta}\sup_{|z|\ge \delta/h}|zK(z)| \int_{-\infty}^{\infty} |g(y)|\,dy
+ |g(x)|\int_{|z|\ge \delta/h} |K(z)|\,dz,
\end{align*}
which tends to $0$ as $n$ tends to $\infty$; then let $\delta$ tend to $0$.
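Bochner's theorem can be illustrated numerically: with a kernel integrating to one, the smoothed function $g_n(x)$ approaches $g(x)$ as $h \to 0$. A sketch with an arbitrary choice $g(y) = e^{-|y|}$, a Gaussian kernel, and a simple Riemann-sum quadrature (step size and integration range are illustrative):

```python
import math

def K(y):
    """Gaussian kernel; integrates to 1."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def g(y):
    """An integrable continuous function; g(0) = 1."""
    return math.exp(-abs(y))

def g_n(x, h, step=0.001, half_width=10.0):
    """g_n(x) = (1/h) * integral K(y/h) g(x - y) dy, by a Riemann sum."""
    total, y = 0.0, -half_width
    while y <= half_width:
        total += K(y / h) * g(x - y) * step
        y += step
    return total / h

for h in (1.0, 0.5, 0.1):
    print(h, g_n(0.0, h))   # increases toward g(0) = 1 as h shrinks
```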
Theorem 2.1.3. Under the conditions of the last theorem (2.1.2), if $h$ is a function of $n$ satisfying
\[
\lim_{n\to\infty} nh^2 = \infty, \qquad \lim_{n\to\infty} E[f_n(x)] = f(x),
\]
and if the probability density function $f(x)$ is uniformly continuous, then for every $\varepsilon > 0$,
\[
P\big[\sup_x |f_n(x) - f(x)| < \varepsilon\big] \longrightarrow 1 \quad \text{as } n \longrightarrow \infty.
\]
Proof. To prove this theorem, we want to show that
\[
\lim_{n\to\infty} E^{1/2}\Big[\sup_{-\infty<x<\infty} |f_n(x) - f(x)|^2\Big] = 0. \tag{2.1.6}
\]
Since $\lim_{n\to\infty} E[f_n(x)] = f(x)$, it suffices to show that
\[
E^{1/2}\Big[\sup_{-\infty<x<\infty} |f_n(x) - E[f_n(x)]|^2\Big] \longrightarrow 0 \tag{2.1.7}
\]
as $n \to \infty$, since by theorem (2.1.2) it follows that
\[
\lim_{n\to\infty} \sup_{-\infty<x<\infty} |E[f_n(x)] - f(x)| = 0.
\]
Now
\[
f_n(x) = (2\pi)^{-1}\int_{-\infty}^{\infty} e^{-iux} k(uh)\varphi_n(u)\,du,
\]
where $k$ is the characteristic function of the kernel $K$ and $\varphi_n$ is the empirical characteristic function. Then
\begin{align*}
\sup_{-\infty<x<\infty} |f_n(x) - E[f_n(x)]|
&\le (2\pi)^{-1}\left|\int_{-\infty}^{\infty} e^{-iux}k(uh)\varphi_n(u)\,du - \int_{-\infty}^{\infty} e^{-iux}k(uh)E[\varphi_n(u)]\,du\right| \\
&\le (2\pi)^{-1}\int_{-\infty}^{\infty} \big|e^{-iux}k(uh)\{\varphi_n(u) - E[\varphi_n(u)]\}\big|\,du \\
&= (2\pi)^{-1}\int_{-\infty}^{\infty} |k(uh)|\,|\varphi_n(u) - E[\varphi_n(u)]|\,du \quad (\text{since } |e^{-iux}| = 1).
\end{align*}
Therefore, by Minkowski's inequality, the quantity in (2.1.7) is no greater than
\[
(2\pi)^{-1}\int_{-\infty}^{\infty} |k(uh)|\,\sigma[\varphi_n(u)]\,du \le (n^{1/2}h)^{-1}\int_{-\infty}^{\infty} |k(u)|\,du,
\]
which tends to $0$ since $nh^2 \to \infty$. The proof of this theorem is complete.
2.1.2 Generalization of Parzen’s mode estimation.
(Samanta, 1973) and (Konakov, 1974) have given multivariate versions of Parzen's results. The problem of estimating the mode of a pdf is a matter of both theoretical and practical interest. This problem was first considered by (Parzen, 1962) in the univariate situation. He showed that under certain regularity conditions the estimate of the population mode obtained by maximizing an estimate of the true pdf is consistent and asymptotically normal. The strongest result in this direction is due to (Nadaraya, 1965), who proved that under certain regularity conditions the estimate converges to the theoretical mode with probability one.

Let $X_j = (X_{1j}, X_{2j}, \ldots, X_{kj})$, $j = 1, 2, \ldots, n$, be $n$ i.i.d. $k$-dimensional random variables with an unknown common distribution function $F(x)$, which is assumed to be absolutely continuous with density function $f(x)$, where $x = (x_1, x_2, \ldots, x_k)$. Let $K_1, K_2, \ldots, K_k$ be $k$ univariate pdfs and $\{h\}$ be a sequence of positive numbers converging to zero. We consider an estimate $f_n(x)$ of $f(x)$ defined by

1. $f_n(x) = \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} h^{-k}\prod_{i=1}^{k} K_i\left(\dfrac{x_i - X_{ij}}{h}\right)$.

It is clear that $f_n(x)$ can be expressed as

2. $f_n(x) = \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h^{-k}\left\{\prod_{i=1}^{k} K_i\left(\dfrac{x_i - u_i}{h}\right)\right\} dF_n(u)$, where $u = (u_1, u_2, \ldots, u_k)$ and $F_n(u)$ is the sample distribution function defined by
\[
F_n(u) = \frac{1}{n}\sum_{j=1}^{n}\prod_{i=1}^{k} I(u_i - X_{ij}),
\]
where $I(x - y) = 1$ for $y \le x$ and vanishes for $y > x$.

3. $f^{(j_1, j_2, \ldots, j_k)}(x) = \dfrac{\partial^{\,j_1+j_2+\cdots+j_k} f(x)}{\partial x_1^{j_1}\,\partial x_2^{j_2}\cdots\partial x_k^{j_k}}$.

4. $f_n^{(j_1, j_2, \ldots, j_k)}(x) = \dfrac{\partial^{\,j_1+j_2+\cdots+j_k} f_n(x)}{\partial x_1^{j_1}\,\partial x_2^{j_2}\cdots\partial x_k^{j_k}}
= \dfrac{1}{n}\displaystyle\sum_{j=1}^{n} h^{-(k+j_1+j_2+\cdots+j_k)}\prod_{i=1}^{k} K_i^{(j_i)}\left(\dfrac{x_i - X_{ij}}{h}\right)
= \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h^{-(k+j_1+j_2+\cdots+j_k)}\left\{\prod_{i=1}^{k} K_i^{(j_i)}\left(\dfrac{x_i - u_i}{h}\right)\right\} dF_n(u)$.
Suppose the density function $f(x)$ is uniformly continuous in $x$. Then there exists a vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ such that $f(\theta) = \max_x f(x)$. We assume that $\theta$ is unique.
We next assume that for each $i$, $K_i(t)$ is continuous in $t$ and $K_i(t)$ approaches zero as $|t|$ goes to infinity. Therefore, there exists a $k$-dimensional random variable $\theta_n = (\theta_{1n}, \theta_{2n}, \ldots, \theta_{kn})$ such that $f_n(\theta_n) = \max_x f_n(x)$. If for each $i$, $K_i(t)$ is chosen to be twice differentiable, then
\[
f_n^{(1,0,0,\ldots,0)}(\theta_n) = f_n^{(0,1,0,\ldots,0)}(\theta_n) = \cdots = f_n^{(0,0,0,\ldots,1)}(\theta_n) = 0.
\]
It will be shown in Theorem (2.1.4) that under certain regularity conditions $\theta_n$ converges to $\theta$ with probability one. Furthermore, $(nh^{k+2})^{1/2}(\theta_n - \theta)$ converges in distribution to a $k$-dimensional normal random variable with mean vector $0$ and covariance matrix $C^{-1}\Lambda C^{-1}$, where $C = ((c_{rs}))$, $\Lambda = ((\lambda_{rs}))$, $c_{rs} = \dfrac{\partial^2 f(x)}{\partial x_r \partial x_s}\Big|_{x=\theta}$, and $C^{-1}$ is the inverse of $C$, with
\[
\lambda_{rs} =
\begin{cases}
f(\theta)\displaystyle\int_{-\infty}^{\infty}\{K_r'(t)\}^2\,dt \prod_{j=1,\, j\ne r}^{k}\int_{-\infty}^{\infty} K_j^2(t)\,dt, & \text{if } r = s, \\[2ex]
f(\theta)\displaystyle\left\{\int_{-\infty}^{\infty} K_r(t)K_r'(t)\,dt\right\}\left\{\int_{-\infty}^{\infty} K_s(t)K_s'(t)\,dt\right\}\prod_{j=1,\, j\ne r,s}^{k}\int_{-\infty}^{\infty} K_j^2(t)\,dt, & \text{if } r \ne s.
\end{cases}
\]
We now state the conditions under which the asymptotic properties of the estimate proposed in this section will be proved.

(i) $f^{(j_1, j_2, \ldots, j_k)}(x)$ are uniformly continuous for $0 \le j_1 + j_2 + \cdots + j_k \le 2$.

(ii) The vector $\theta$ defined by $f(\theta) = \max_x f(x)$ is unique.

(iii) $K_j^{(i)}$ are functions of bounded variation for $j = 1, 2, \ldots, k$ and $i = 0, 1, 2$.

(iv) $h = n^{-\delta}$, $0 < \delta < \dfrac{1}{2k+4}$.
Theorem 2.1.4. Under conditions (i), (ii), (iii), and (iv),
\[
\lim_{n\to\infty} \theta_n = \theta
\]
with probability one.

Theorem 2.1.5. Under conditions (i), (ii), (iii), and (iv), $\theta_n$ is asymptotically normally distributed with mean vector $\theta$ and covariance matrix $\dfrac{C^{-1}\Lambda C^{-1}}{nh^{k+2}}$.

Proof. See (Samanta, 1973) for more details.

We note that for $k = 1$, Theorem (2.1.5) reduces to Theorem (2.1.1) (Theorem (5A) of Parzen (1962)).
2.2 The kernel estimation of the conditional mode for random design models

Conditional density functions underlie many popular statistical objects of interest, though they are rarely modeled directly in the parametric setting and have perhaps received even less attention in the kernel setting. Nevertheless, as will be seen, they are extremely useful for a range of tasks, whether directly estimating the conditional density function, or modeling conditional quantiles via estimation of a conditional CDF. And, of course, regression analysis (i.e., modeling conditional means) depends directly on the conditional density function. Indeed, estimating the conditional density is actually much more informative, since it allows us not only to recover the conditional expected value $E(Y|X)$ and the conditional variance from the density, but also to describe the general shape of the conditional density. Several nonparametric methods can be proposed for estimating the conditional density function based on data $(X_i, Y_i)$. A class of kernel-type estimators, called the Nadaraya-Watson estimator, is one of the most widely known and used for estimating the conditional density function. A lot of research has been carried out on conditional distribution function estimation. Conditional density estimation was introduced by (Rosenblatt, 1956). (Fan, 1993) proposed a direct estimator based on local polynomial estimation. The NW estimator was created by the independent researchers (Watson, 1964) and (Nadaraya, 1964). In this section, we investigate a nonparametric method for estimating the conditional density function $f(y|x)$ of a random variable $Y$ given a random variable $X$, which is called the Nadaraya-Watson estimator.
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be $\mathbb{R}\times\mathbb{R}$-valued independent random variables with a common pdf $f(x, y)$. Also assume that $X$ admits a marginal density $g(x)$. Suppose that we are given $n$ observations of $(X, Y)$, denoted by $(X_1, Y_1), \ldots, (X_n, Y_n)$. First, we consider the following estimator of the joint density $f(x, y)$ of $(X, Y)$:
\[
f_n(x, y) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i),
\]
and define the estimator of the marginal pdf of $X$ as
\[
g_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i),
\]
where $K_h(x) = \frac{1}{h}K(x/h)$.
The Nadaraya-Watson estimator of the conditional density function $f(y|x)$ is given by
\[
f_n(y|x) = \frac{f_n(x, y)}{g_n(x)}
= \frac{n^{-1}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)}{n^{-1}\sum_{i=1}^{n} K_h(x - X_i)}
= \frac{\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)}{\sum_{i=1}^{n} K_h(x - X_i)}.
\]
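The ratio above translates directly into code. A minimal sketch with a Gaussian kernel and a single bandwidth $h$ for both variables; the synthetic model $Y = 2X + N(0, 0.5^2)$, the seed, and the evaluation points are illustrative choices, not from the thesis:

```python
import math
import random

def Kh(u, h):
    """Scaled kernel K_h(u) = K(u/h)/h with Gaussian K."""
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def nw_conditional_density(y, x, data, h):
    """f_n(y|x) = sum_i K_h(x-X_i)K_h(y-Y_i) / sum_i K_h(x-X_i)."""
    num = sum(Kh(x - xi, h) * Kh(y - yi, h) for xi, yi in data)
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return num / den

random.seed(5)
data = [(x, 2.0 * x + random.gauss(0.0, 0.5))
        for x in (random.random() for _ in range(1000))]
h = 0.15
# At x = 0.5 the conditional density of Y is centered near y = 1.0.
print(nw_conditional_density(1.0, 0.5, data, h))
```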
Now, to estimate $m(\cdot)$, we first compute an estimator of the joint density $f(x, y)$ of $(X, Y)$, and then integrate it according to the formula
\[
m(x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{\int_{-\infty}^{\infty} f(x, y)\,dy}. \tag{2.2.1}
\]
The conditional mean estimator of $Y$ given $X = x$, $m_n(x)$, is defined as follows:
\[
m_n(x) = \int_{-\infty}^{\infty} y f_n(y|x)\,dy = \frac{\int_{-\infty}^{\infty} y f_n(x, y)\,dy}{\int_{-\infty}^{\infty} f_n(x, y)\,dy}. \tag{2.2.2}
\]
We want to investigate the prediction of $Y$ by the mode function
\[
\theta(x) = \arg\max_{y\in\mathbb{R}} f(y|x), \quad x \in \mathbb{R}, \tag{2.2.3}
\]
where $f(y|x)$ denotes the conditional density of $Y$ given $X$. We call $\theta(x)$ the population conditional mode, or the mode function.

Definition 2.2.1. (Samanta and Thavanesmaran, 1990) (The estimated conditional mode).
The estimated conditional mode is defined as the maximizer of $f_n(y|x)$ over $y \in \mathbb{R}$:
\[
\theta_n(x) = \arg\max_{y\in\mathbb{R}} f_n(y|x), \quad x \in \mathbb{R}. \tag{2.2.4}
\]
We call $\theta_n(x)$ the sample conditional mode. (Samanta and Thavanesmaran, 1990) considered $\theta_n(x)$ as an estimate of $\theta(x)$. The corresponding result for the multivariate case, at distinct points $x_1, x_2, \ldots, x_k$, was discussed by (Salha and Ioanides, 2007).
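In practice, (2.2.4) is computed by maximizing $f_n(y|x)$ over a grid of $y$ values, just as for the unconditional mode. A minimal sketch (Gaussian kernels, one bandwidth for both variables; the model $Y = \sin(2\pi X) + N(0, 0.3^2)$, seed, grid step, and evaluation point are illustrative choices):

```python
import math
import random

def Kh(u, h):
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def f_cond(y, x, data, h):
    """Nadaraya-Watson estimate of the conditional density f(y|x)."""
    num = sum(Kh(x - xi, h) * Kh(y - yi, h) for xi, yi in data)
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return num / den

def conditional_mode(x, data, h, grid_step=0.02):
    """theta_n(x) = argmax_y f_n(y|x) from (2.2.4), over a grid of y values."""
    ys = [yi for _, yi in data]
    lo, hi = min(ys), max(ys)
    grid = [lo + i * grid_step for i in range(int((hi - lo) / grid_step) + 1)]
    return max(grid, key=lambda y: f_cond(y, x, data, h))

random.seed(6)
data = [(x, math.sin(2.0 * math.pi * x) + random.gauss(0.0, 0.3))
        for x in (random.random() for _ in range(800))]
# Near x = 0.25 the conditional density of Y peaks around sin(pi/2) = 1.
print(conditional_mode(0.25, data, h=0.1))
```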
Lemma 2.2.2. Under the formulas of $f_n(x, y)$ and $f(x, y)$, we have

(1) $\displaystyle\int_{-\infty}^{\infty} f_n(x, y)\,dy = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)$.

(2) $\displaystyle\int_{-\infty}^{\infty} y f_n(x, y)\,dy = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)Y_i$.
Proof. (1)
\begin{align*}
\int_{-\infty}^{\infty} f_n(x, y)\,dy
&= \int_{-\infty}^{\infty} \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)\,dy \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} K_h(y - Y_i)\,dy, \quad \text{since } K_h(u) = K(u/h)/h, \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} \frac{1}{h}K\left(\frac{y - Y_i}{h}\right)dy. \tag{2.2.5}
\end{align*}
Let $u = \dfrac{y - Y_i}{h}$; then $du = \dfrac{1}{h}dy$. Substituting in (2.2.5), we have
\begin{align*}
\int_{-\infty}^{\infty} f_n(x, y)\,dy
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i).
\end{align*}
Proof. (2)
\begin{align*}
\int_{-\infty}^{\infty} y f_n(x, y)\,dy
&= \int_{-\infty}^{\infty} y\,\frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)K_h(y - Y_i)\,dy \tag{2.2.6} \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} y\,K_h(y - Y_i)\,dy. \tag{2.2.7}
\end{align*}
Let $u = y - Y_i$; then $du = dy$. Substituting in (2.2.7), we have
\begin{align*}
\int_{-\infty}^{\infty} y f_n(x, y)\,dy
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\int_{-\infty}^{\infty} (u + Y_i)K_h(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)\left(\int_{-\infty}^{\infty} u\,K_h(u)\,du + Y_i\int_{-\infty}^{\infty} K_h(u)\,du\right) \\
&= \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)Y_i.
\end{align*}
If we substitute these into the numerator and denominator of (2.2.2), we obtain the Nadaraya-Watson kernel estimator of $m(\cdot)$:
\[
m_n(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i)Y_i}{\sum_{i=1}^{n} K_h(x - X_i)} = \sum_{i=1}^{n} w_{ni}(x)Y_i,
\]
where
\[
w_{ni}(x) = \frac{K_h(x - X_i)}{\sum_{i=1}^{n} K_h(x - X_i)}, \quad i = 1, \ldots, n,
\]
are the weight functions. The bandwidth $h$ determines the degree of smoothness of $m_n(\cdot)$. This can be immediately seen by considering the limits as $h$ tends to zero or to infinity, respectively.
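The weighted-average form of the Nadaraya-Watson regression estimator is a few lines of code. A minimal sketch (Gaussian kernel; the synthetic model $Y = X^2 + N(0, 0.1^2)$, seed, bandwidth, and evaluation point are illustrative choices):

```python
import math
import random

def Kh(u, h):
    z = u / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))

def nw_regression(x, data, h):
    """m_n(x) = sum_i K_h(x-X_i) Y_i / sum_i K_h(x-X_i) = sum_i w_ni(x) Y_i."""
    den = sum(Kh(x - xi, h) for xi, _ in data)
    return sum(Kh(x - xi, h) / den * yi for xi, yi in data)

random.seed(7)
data = [(x, x ** 2 + random.gauss(0.0, 0.1))
        for x in (random.random() for _ in range(1000))]
print(nw_regression(0.5, data, h=0.1))   # close to m(0.5) = 0.25
```

Note how the two limits mentioned above appear in the code: as $h \to 0$ the weights concentrate on the nearest $X_i$, reproducing the data, while as $h \to \infty$ all weights tend to $1/n$ and $m_n(x)$ tends to the sample mean of the $Y_i$.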
2.2.1 Asymptotic properties of the conditional mode estimation

We consider the following assumptions from (Samanta and Thavanesmaran, 1990) and (Samanta, 1973); the corresponding result for the multivariate case at distinct points was discussed by (Salha and Ioanides, 2007).
(A1) $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample of i.i.d. random variables with joint pdf $f(x, y)$, where the following hold:

(i) The marginal probability density function of $X$, $g(x)$, is uniformly continuous.

(ii) $f^{(i,j)}(x, y) = \dfrac{\partial^{i+j} f(x, y)}{\partial x^i \partial y^j}$ exist and are bounded for $1 \le i + j \le 4$.

(A2) The kernel $K$ is a Borel function and satisfies the following:

(i) $K(u)$ tends to zero as $u$ tends to $\pm\infty$.

(ii) $K(u)$ and its first two derivatives are functions of bounded variation.

(iii) $\lim_{|u|\to\infty} |u^2 K^{(i)}(u)| = 0$, $(i = 0, 1)$.

(iv) $\int_{-\infty}^{\infty} u^i K(u)\,du = 1$ if $i = 0$, and $\int_{-\infty}^{\infty} u^i K(u)\,du = 0$ if $i = 1, 2$.

(v) $\int_{-\infty}^{\infty} |u|^3 K(u)\,du < \infty$.

(A3) $h$ is a sequence of positive numbers tending to zero and satisfying
\[
\lim_{n\to\infty} nh^8 = \infty, \qquad \lim_{n\to\infty} nh^{10} = 0.
\]
Lemma 2.2.3. Under the assumptions (A1)(ii), (A2) and (A3), if $(x, y) \in C(f)$, then the following are true:

(i) $\lim_{n\to\infty} nh^4\,\operatorname{Var}\big[f_n^{(0,1)}(x, y)\big] = f(x, y)\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big(K(u)K^{(1)}(v)\big)^2\,du\,dv$.

(ii) $(nh^4)^{1/2}\big\{E f_n^{(0,1)}(x, y) - f^{(0,1)}(x, y)\big\} = o(1)$.
Lemma 2.2.4. Under the assumptions of Lemma (2.2.3), the following is true:
\[
\lim_{n\to\infty} (n^{-1}h^4)^{1+\delta/2}\left[\sum_{i=1}^{n} E\big|w_{ni} - E\{w_{ni}\}\big|^{2+\delta}\right] = 0.
\]
For fixed $x$, expanding $f_n^{(0,1)}(x, \theta_n(x))$ around $\theta(x)$, we obtain
\[
0 = f_n^{(0,1)}(x, \theta_n(x)) = f_n^{(0,1)}(x, \theta(x)) + (\theta_n(x) - \theta(x))f_n^{(0,2)}(x, \theta_n^*(x)),
\]
where $|\theta_n^*(x) - \theta(x)| < |\theta_n(x) - \theta(x)|$. Hence,
\[
\theta_n(x) - \theta(x) = -\frac{f_n^{(0,1)}(x, \theta(x))}{f_n^{(0,2)}(x, \theta_n^*(x))}. \tag{2.2.8}
\]
Lemma 2.2.5. Under the assumptions (A1), (A2)(ii), (iii) and (A3), if $g(x) > 0$, then $f_n^{(0,2)}(x, \theta_n^*(x))$ converges in probability to $f^{(0,2)}(x, \theta(x))$ as $n$ tends to infinity.
Theorem 2.2.6. Suppose that $x_1, x_2, \ldots, x_k$ are distinct points, where $f(x_i, y) > 0$ and $(x_i, y) \in C(f)$, $(i = 1, 2, \ldots, k)$. Then under the assumptions (A1), (A2)(ii), (iii), and (A3), the distribution of the vector
\[
(nh^4)^{1/2}\big\{f_n^{(0,1)}(x_1, y) - f^{(0,1)}(x_1, y),\ \ldots,\ f_n^{(0,1)}(x_k, y) - f^{(0,1)}(x_k, y)\big\}^T,
\]
where $T$ denotes the transpose, is asymptotically multivariate normal with mean zero vector and diagonal covariance matrix $\Gamma = [\gamma_{ij}]$, where
\[
\gamma_{ii} = f(x_i, y)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big\{K(u)K^{(1)}(v)\big\}^2\,du\,dv, \quad (i = 1, 2, \ldots, k),
\]
and $K^{(1)}(v)$ denotes the first derivative of $K(v)$.
Proof. Without loss of generality, we consider the special case $k = 2$; the same arguments apply in the general case. Before we start the proof of the theorem, we introduce some notation. For $i = 1, 2, \ldots, n$ and $s = 1, 2$ define:
\[
V_{ni}(x_s) = h^{-3}K\left(\frac{x_s - X_i}{h}\right)K^{(1)}\left(\frac{y - Y_i}{h}\right),
\]
\[
W_{ni}(x_s) = h^2\big(V_{ni}(x_s) - E V_{ni}(x_s)\big), \qquad W_n(x_s) = \sum_{i=1}^{n} W_{ni}(x_s),
\]
\[
Z_{ni} = (W_{ni}(x_1), W_{ni}(x_2))^T, \qquad Z_n = n^{-\frac{1}{2}}(W_n(x_1), W_n(x_2))^T,
\]
so that
\[
Z_n = (nh^4)^{\frac{1}{2}}\big(f_n^{(0,1)}(x_1, y) - E f_n^{(0,1)}(x_1, y),\ f_n^{(0,1)}(x_2, y) - E f_n^{(0,1)}(x_2, y)\big)^T. \tag{2.2.9}
\]
Let $A = [a_{rs}]$ be a $2\times 2$ diagonal matrix with
\[
a_{ss} = f(x_s, y)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(K(u)K^{(1)}(v)\big)^2\,du\,dv,
\]
and let $Z$ be a bivariate normally distributed random variable with mean zero vector and covariance matrix $A$.

First we will show that $Z_n$ converges in distribution to $Z$. To do that, we will use the multivariate version of the Cramer-Wold theorem: it is sufficient to prove that $CZ_n^T$ converges in distribution to $CZ^T$ for any constant $C = (c_1, c_2) \in \mathbb{R}^2$, $C \ne 0$. Note that
\[
CZ_n^T = \sum_{i=1}^{n} n^{-\frac{1}{2}}CZ_{ni}^T, \qquad E\big(n^{-\frac{1}{2}}CZ_{ni}^T\big) = 0.
\]
Let $\rho_{ni}^{2+\delta} = E|n^{-\frac{1}{2}}CZ_{ni}|^{2+\delta}$, $\rho_n^{2+\delta} = \sum_{i=1}^{n}\rho_{ni}^{2+\delta}$, and $\sigma_n^2 = \operatorname{Var}(CZ_n^T)$. Using Liapounov's condition, it will be sufficient to show that
\[
\lim_{n\to\infty} \frac{\rho_n^{2+\delta}}{\sigma_n^{2+\delta}} = 0. \tag{2.2.10}
\]
Now, the proof of the theorem will be completed via the following lemmas.
Lemma 2.2.7. Under conditions (A2)(ii), (iii), (iv), if $(x_s, y) \in C(f)$, then for $s = 1, 2$ and $r = 1, 2$, the following are true:

(a) $\lim_{n\to\infty} E\,W_{ni}^2(x_s) = f(x_s, y)\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\{K(u)K^{(1)}(v)\}^2\,du\,dv$.

(b) $\lim_{n\to\infty} E\,W_{ni}(x_s)W_{ni}(x_r) = 0$, $(r \ne s)$.

Proof: see (El-sayed, 2008).
Summary:

In this chapter, we gave a précis of the importance of the mode and discussed its advantages and disadvantages. We studied Parzen's mode estimator and its generalization, where (Samanta, 1973) and (Salha, 2014) have given multivariate versions of Parzen's results. We then presented the conditional mode function and studied it in the case of i.i.d. random variables. We also studied the asymptotic behavior of the mode and the conditional mode estimators. In the next chapter, we will study kernel estimation of the regression mode for fixed design models.
Chapter 3
Kernel Estimation of The Regression Mode for Fixed Design Models
Introduction:

The problem of estimating the mode of a probability density has received considerable attention in the literature. For a historical and mathematical survey, Parzen (1962) considered the problem of estimating the mode of a univariate pdf. Parzen (1962) and Nadaraya (1965) showed that under some regularity conditions the estimator of the population mode obtained by maximizing a kernel estimator of the pdf is strongly consistent and asymptotically normally distributed. Samanta (1973) gave multivariate versions of Parzen's results. For independent and identically distributed data, Samanta and Thavanesmaran (1990) considered the problem of estimating the mode of a conditional pdf for the random design model, and showed under regularity conditions that the estimator of the population conditional mode is strongly consistent and asymptotically normally distributed. Salha and Ioanides (2004) generalized their work to the multivariate case. Salha and Ioanides (2007) considered the estimation of the conditional mode under dependence conditions.
In this chapter, we study the fixed design regression estimation problem, where we are given data $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$ with conditional pdf $f(y|x)$, where the variable $Y$ depends upon a fixed design predictor $x$ through a regression function $m(x)$. We consider the regression model satisfying $x_1, x_2, \ldots, x_n \in [0, 1]$ and
\[
Y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n,
\]
for a so-called regression function $m : [0, 1] \longrightarrow \mathbb{R}$ and some independent random variables $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ with mean zero and variance $\sigma^2$. The design points $x_1, x_2, \ldots, x_n$, determined by the experimenter, are ordered, i.e. we have $0 \le x_1 \le x_2 \le \cdots \le x_n \le 1$. In the absence of other information, we can take $x_i = \frac{i}{n}$, $i = 1, 2, \ldots, n$.
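The model above is easy to simulate. A minimal sketch generating a fixed design sample with the equispaced choice $x_i = i/n$; the regression function $m(x) = \sin(2\pi x)$, the noise level, and the seed are illustrative assumptions:

```python
import math
import random

def make_fixed_design(n, m, sigma, seed=0):
    """Fixed design data: x_i = i/n on [0,1], Y_i = m(x_i) + eps_i,
    with eps_i i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [i / n for i in range(1, n + 1)]
    ys = [m(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

m = lambda x: math.sin(2.0 * math.pi * x)
xs, ys = make_fixed_design(200, m, sigma=0.2)
print(xs[0], xs[-1], ys[0])
```

Unlike the random design setting of Chapter 2, the $x_i$ here are deterministic and ordered; only the responses $Y_i$ are random.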
Most investigations are concerned with the regression mean function $m(x)$, where
\[
m(x) = \int y f(y|x)\,dy.
\]
However, new insights about the underlying structure can be gained by considering other features of the conditional distribution $f(y|x)$ of $Y$ at a given value $x \in [0, 1]$, such as the regression quantiles and the regression mode. We consider the problem of estimating the mode of the unknown function $f(y|x)$.
We consider f_n(y|x) as an estimator of f(y|x), given by

f_n(y|x) = h^{-1} ∑_{i=1}^{n} u_{ni}(x) L((y − Y_i)/h),    (3.0.1)

where L is a kernel function satisfying some conditions and h is a sequence of positive
numbers converging to zero and satisfying some specific conditions. The weight

u_{ni}(x) = h^{-1} ∫_{s_{i−1}}^{s_i} K((x − u)/h) du

is the Gasser-Mueller function, where K is a kernel function satisfying some
conditions, and s_1, s_2, ..., s_n is a sequence of interpolating points such that x_{i−1} ≤ s_i ≤ x_i,
for i = 1, 2, ..., n. In general, we take s_i = (x_{i−1} + x_i)/2.
Consequently, there is a random variable M_n(x), called the sample re-
gression mode, such that

f_n(M_n(x)|x) = max_{−∞<y<∞} f_n(y|x).
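To make the construction concrete, the following minimal sketch implements estimator (3.0.1) and the grid-search sample mode. It is illustrative only: the rectangle kernel is fixed for K (so the Gasser-Mueller weights have a closed form), the Gaussian kernel for L, the bandwidth h, the regression function m(x) = sin(2πx), the noise level, and the boundary interpolating points s_0 = 0, s_n = 1 are all hypothetical choices, not prescribed by the text.

```python
import numpy as np

def gm_weights(x, xs, h):
    """Gasser-Mueller weights u_ni(x) = h^{-1} * int_{s_{i-1}}^{s_i} K((x-u)/h) du,
    in closed form for the rectangle kernel K(u) = 1/2 on [-1, 1].
    Interpolating points: midpoints of the design points, with s_0 = 0, s_n = 1."""
    s = np.concatenate(([0.0], (xs[:-1] + xs[1:]) / 2, [1.0]))
    lo = np.maximum(s[:-1], x - h)
    hi = np.minimum(s[1:], x + h)
    return np.clip(hi - lo, 0.0, None) / (2 * h)

def f_n(y, x, xs, Y, h):
    """Estimator (3.0.1) with L the Gaussian kernel."""
    z = (y - Y) / h
    return (gm_weights(x, xs, h) * np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)).sum() / h

def sample_mode(x, xs, Y, h, grid):
    """Sample regression mode M_n(x): grid maximiser of f_n(.|x)."""
    return grid[int(np.argmax([f_n(g, x, xs, Y, h) for g in grid]))]

# illustrative use with a hypothetical m(x) = sin(2*pi*x) and noise sd 0.2
rng = np.random.default_rng(0)
n, h = 200, 0.1
xs = np.arange(1, n + 1) / n
Y = np.sin(2 * np.pi * xs) + 0.2 * rng.standard_normal(n)
grid = np.linspace(Y.min(), Y.max(), 200)
print(sample_mode(0.25, xs, Y, h, grid))  # near m(0.25) = 1
```

Note that the weights u_{ni}(x) sum to one whenever [x − h, x + h] ⊂ [0, 1], which is why the estimate behaves like a locally weighted density in y.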
In this chapter, for distinct points x_1, x_2, ..., x_k we will establish conditions
under which (nh_n²)^{1/2} (M_n(x_1), M_n(x_2), ..., M_n(x_k))^T, where T denotes the transpose,
is an asymptotically multivariate normally distributed random vector.
This chapter is the main chapter of the thesis. In this chapter, we will study
the problem of estimating nonparametrically the regression mode for fixed design
models. We suppose the error random variables are independent. The joint asymp-
totic normality of the regression mode estimator at different fixed design points is
established under some regularity conditions. This chapter consists of two sections.
In Section 3.1, we introduce the fixed design model. In the next section, we present
mode estimation for fixed design data.
3.1 Fixed design model
A research design is the plan of a research study. The design of a study defines
the study kind (descriptive, correlational, semi-experimental, experimental, review,
meta-analytic) and sub-type (e.g., descriptive-longitudinal case study), research
problem, hypotheses, independent and dependent variables, experimental design,
and, if applicable, data collection methods and a statistical analysis plan. Research
design is the framework that has been created to find solutions to research questions.
Research designs are concerned with turning the research question into a testing
project. The best design depends on your research questions. Every design has
its positive and negative sides. The research design deals with at least four problems: what questions to study, what data are
relevant, what data to collect, and how to analyze the results (Krishna, 2013).
Research design can be divided into fixed and flexible research designs (Robson,
1993); others have referred to this distinction as quantitative versus qualitative
research designs.
In fixed designs, the design of the study is fixed before the main stage of data
collection takes place. Fixed designs are normally theory-driven, otherwise, it is
impossible to know in advance which variables need to be controlled and measured.
Often, these variables are measured quantitatively. Flexible designs allow for more
freedom during the data collection process. One reason for using a flexible research
design can be that the variable of interest is not quantitatively measurable, such as
culture. In other cases, theory might not be available before one starts the research.
Examples of fixed (quantitative) research designs (Krishna, 2013):
Experimental design:
In an experimental design, the researcher actively attempts to change the situa-
tion, circumstances, or experience of participants (manipulation), which may lead to
a change in behavior or outcomes for the participants of the study. The researcher
randomly assigns participants to different conditions, measures the variables of in-
terest and attempts to control for confounding variables. Therefore, experiments
are often highly fixed even before the data collection starts.
In a good experimental design, a few things are of great importance. First
of all, it is necessary to think of the best way to operationalize the variables that will
be measured, as well as which statistical methods would be most suitable to answer
the research question. Thus, the researcher should specify what the expectations
of the study are as well as how to analyze any potential results. Finally, in an
experimental design the researcher must think of the practical constraints, including
the availability of participants as well as how representative the participants are of
the target population. It is important to consider each of these factors before beginning
the experiment (Adèr and Adèr, 2008). Additionally, many researchers use
power analysis before they conduct an experiment, in order to determine how
large the sample must be to find an effect of a given size with a given design at the
desired probability of making a Type I or Type II error.
In statistical hypothesis testing, a type I error is the incorrect rejection of a true
null hypothesis (a ”false positive”), while a type II error is incorrectly retaining
a false null hypothesis (a ”false negative”). More simply stated, a type I error is
detecting an effect that is not present, while a type II error is failing to detect an
effect that is present.
In statistics, a null hypothesis is a statement that one seeks to nullify with evi-
dence to the contrary. Most commonly it is a statement that the phenomenon being
studied produces no effect or makes no difference. An example of a null hypothesis
is the statement ”This diet has no effect on people’s weight.” Usually, an experi-
menter frames a null hypothesis with the intent of rejecting it: that is, intending
to run an experiment which produces data that shows that the phenomenon un-
der study does make a difference (Sheskin, 2004). In some cases there is a specific
alternative hypothesis that is opposed to the null hypothesis, in other cases the
alternative hypothesis is not explicitly stated, or is simply ”the null hypothesis is
false”, in either event, this is a binary judgment, but the interpretation differs and
is a matter of significant dispute in statistics.
A type I error (or error of the first kind) is the incorrect rejection of a true null
hypothesis. Usually a type I error leads one to conclude that a supposed effect or
relationship exists when in fact it doesn’t. Examples of type I errors include a test
that shows a patient to have a disease when in fact the patient does not have the
disease, a fire alarm going off indicating a fire when in fact there is no fire, or an
experiment indicating that a medical treatment should cure a disease when in fact
it does not.
A type II error (or error of the second kind) is the failure to reject a false null
hypothesis. Examples of type II errors would be a blood test failing to detect
the disease it was designed to detect, in a patient who really has the disease, a
fire breaking out and the fire alarm does not ring; or a clinical trial of a medical
treatment failing to show that the treatment works when really it does (Peck and
Devore, 2011).
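To make the type I / type II distinction concrete, the following small Monte Carlo sketch (not from the thesis; the two-sided z-test, the sample size n = 30, the effect size 0.5, and α = 0.05 are all illustrative choices) estimates both error rates by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, n, reps = 0.05, 30, 20000
z_crit = 1.959963984540054  # two-sided 5% critical value of the standard normal

def rejects(mu):
    """One simulated experiment: two-sided z-test of H0: mean = 0, known sigma = 1."""
    sample = rng.normal(mu, 1.0, n)
    return abs(np.sqrt(n) * sample.mean()) > z_crit

# H0 true (mu = 0): rejections are type I errors; the rate should be close to alpha.
type1_rate = np.mean([rejects(0.0) for _ in range(reps)])
# H0 false (mu = 0.5): failures to reject are type II errors.
type2_rate = 1 - np.mean([rejects(0.5) for _ in range(reps)])
print(type1_rate, type2_rate)
```

This is exactly the kind of calculation a power analysis formalizes: the type I rate is pinned to α by the critical value, while the type II rate shrinks as the sample size or the effect size grows.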
Non-experimental research designs:
Non-experimental research designs do not involve a manipulation of the situa-
tion, circumstances or experience of the participants. Non-experimental research
designs can be broadly classified into three categories. First, in relational designs,
a range of variables are measured. These designs are also called correlation studies,
because correlation data are most often used in analysis. Since correlation does not
imply causation, such studies simply identify co-movements of variables. Correla-
tional designs are helpful in identifying the relation of one variable to another, and
seeing the frequency of co-occurrence in two natural groups. The second type is
comparative research. These designs compare two or more groups on one or more
variables, such as the effect of gender on grades. The third type of non-experimental
research is a longitudinal design. A longitudinal design examines variables such as
performance exhibited by a group or groups over time.
Quasi-experimental:
Quasi-experimental designs are research designs that follow the experimental procedure,
but do not randomly assign people to (treatment and comparison) groups.
3.2 Mode estimation for fixed design data
The mode is considered as one of the central tendency measures, the tendency of
data towards the central around a particular value. The mode is defined as the
common value or the most repetitive among those observation, and the data might
have more than one modes or nothing at all which can’t be calculated. The mode
can be found by calculation or drawing and it is not affected by the irregular values.
In this section, we will study the mode estimation for fixed design models and we
present the kernel estimation of the conditional mode for fixed design models.
We consider the fixed equally spaced design regression model

Y_i = m(x_i) + ε_i,    i = 1, 2, ..., n,

where x_i = i/n, i = 1, 2, ..., n.
In this section we consider the following assumptions from Samanta and Thavaneswaran (1990). This result was then discussed for the multivariate case by Salha and Ioanides (2007) and, for distinct points, by Salha (2014).
We assume the following conditions are satisfied.
Condition 1
1. m(x) is a bounded function on [0, 1].
2. ε_i are independent random variables with E(ε_i) = 0 and Var(ε_i) = σ², i = 1, 2, ..., n.
Condition 2
The partial derivatives

f^{(j)}(x, y) = ∂^j f(y|x) / ∂y^j,    j = 1, 2,

exist and are bounded.
Condition 3
1. The kernel function K(·) has support [−1, 1] with K(−1) = K(1) = 0.
2. ∫_{−1}^{1} K(u) du = 1 and ∫_{−1}^{1} u K(u) du = 0.
3. The kernel function L(·) is a symmetric pdf.
Condition 4
The bandwidth is chosen such that h = c n^{−δ}, c > 0, 0 < δ < 1, with

lim_{n→∞} nh^8 = ∞,    lim_{n→∞} nh^{10} = 0,

which together require 1/10 < δ < 1/8.
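To see why Condition 4 pins δ to a narrow window: with h = c n^{−δ}, nh^8 = c^8 n^{1−8δ} → ∞ forces δ < 1/8, while nh^{10} = c^{10} n^{1−10δ} → 0 forces δ > 1/10. A quick numerical check (δ = 0.115 and the values of n are arbitrary illustrative choices):

```python
# With h = c * n**(-delta): n*h**8 = c**8 * n**(1 - 8*delta) and
# n*h**10 = c**10 * n**(1 - 10*delta), so both limits hold iff 1/10 < delta < 1/8.
delta = 0.115  # arbitrary value inside (1/10, 1/8); c = 1 for simplicity
for n in (10 ** 3, 10 ** 6, 10 ** 9):
    h = n ** (-delta)
    print(n, n * h ** 8, n * h ** 10)  # first column grows, second shrinks
```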
To prove our main result we will use the following preliminary Lemmas from
Samanta and Thavaneswaran (1990).
Lemma 3.2.1. (Samanta, 1973) (Bochner Lemma) Suppose K_1(u) and K_2(u) are
real valued Borel measurable functions satisfying the following conditions:
(1) sup_{u∈R} |K_i(u)| < ∞, i = 1, 2.
(2) ∫_{−∞}^{∞} |K_i(u)| du < ∞, i = 1, 2.
(3) lim_{|u|→∞} u² |K_i(u)| = 0, i = 1, 2.
If f(x, y) ∈ C(f), the set of continuity points of f, then for any η ≥ 0,

lim_{n→∞} [h^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u/h) K_2(v/h)|^{1+η} f(y − v|x − u) dv du]
    = f(y|x) ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u) K_2(v)|^{1+η} dv du.
Define

f_n^{(j)}(y|x) = h^{−(j+1)} ∑_{i=1}^{n} u_{ni}(x) L^{(j)}((y − Y_i)/h),    j = 1, 2,

V_{ni}(y|x) = h^{−2} u_{ni}(x) L^{(1)}((y − Y_i)/h),    i = 1, 2, ..., n.
Lemma 3.2.2. Under the conditions (1) through (4), if f(x, y) ∈ C(f), then the
following are true:

1. lim_{n→∞} nh² [Var f_n^{(1)}(y|x)] = f(y|x) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.

2. (nh²)^{1/2} [E f_n^{(1)}(y|x) − f^{(1)}(y|x)] = o(1).
Lemma 3.2.3. Under the assumptions of Lemma (3.2.2), the following is true:

lim_{n→∞} (n^{−1}h²)^{1+δ/2} [∑_{i=1}^{n} E|V_{ni} − EV_{ni}|^{2+δ}] = 0.

Lemma 3.2.4. Under the conditions (1) through (4), if g(x) > 0, then f_n^{(2)}(M*_n(x)|x)
converges in probability to f^{(2)}(M(x)|x) as n tends to infinity, where |M*_n(x) − M(x)| <
|M_n(x) − M(x)|.
Theorem 3.2.5. Suppose that x_1, x_2, ..., x_k are distinct points, where f(y|x_i) > 0,
i = 1, 2, ..., k. Then under the conditions (1) through (4),

(nh²)^{1/2} (f_n^{(1)}(y|x_1), f_n^{(1)}(y|x_2), ..., f_n^{(1)}(y|x_k))^T,

where T denotes the transpose, is asymptotically multivariate normal with mean
vector zero and diagonal covariance matrix Γ = [γ_{ij}], with

γ_{ii} = f(y|x_i) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv,

where L^{(1)}(v) denotes the first derivative of L(v).
Proof:
Without loss of generality, we consider the special case k = 2; the method of
proof remains valid for the more general case. Firstly, we define for i = 1, 2, ..., n and
s = 1, 2 the following notations:

W_{ni}(x_s) = h (V_{ni}(y|x_s) − E V_{ni}(y|x_s)),

W_n(x_s) = ∑_{i=1}^{n} W_{ni}(x_s),

Z_{ni} = (W_{ni}(x_1), W_{ni}(x_2))^T,

Z_n = n^{−1/2} (W_n(x_1), W_n(x_2))^T
    = (nh^4)^{1/2} (f_n^{(1)}(y|x_1) − E f_n^{(1)}(y|x_1), f_n^{(1)}(y|x_2) − E f_n^{(1)}(y|x_2))^T.    (3.2.1)

Let A = [a_{rs}] be a 2 × 2 diagonal matrix with

a_{ss} = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv,

and let Z be a bivariate normally distributed random variable with mean zero vector
and covariance matrix A.
First we will show that Z_n converges in distribution to Z. To do that, we will use
the multivariate version of the Cramér–Wold theorem. It will be sufficient to prove
that C Z_n^T converges in distribution to C Z^T for any constant C = (c_1, c_2) ∈ R²,
C ≠ 0.
Note that

C Z_n^T = ∑_{i=1}^{n} n^{−1/2} C Z_{ni}^T,    E(n^{−1/2} C Z_{ni}^T) = 0.

Let ρ_{ni}^{2+δ} = E|n^{−1/2} C Z_{ni}|^{2+δ}, ρ_n^{2+δ} = ∑_{i=1}^{n} ρ_{ni}^{2+δ}, and σ_n² = Var(C Z_n^T).
Using Liapounov's condition, it will be sufficient to show that

lim_{n→∞} ρ_n^{2+δ} / σ_n² = 0.    (3.2.2)

Now, the proof of the theorem will be given via the following two lemmas.
Lemma 3.2.6. Under the conditions (1) through (4), if f(x, y) ∈ C(f), then for
s = 1, 2, r = 1, 2, the following are true:

a. lim_{n→∞} E W_{ni}²(x_s) = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.

b. lim_{n→∞} E W_{ni}(x_s) W_{ni}(x_r) = 0, (r ≠ s).
Proof: a. From the definition of W_{ni}(x_s), we have

E W_{ni}²(x_s) = h² [E V_{ni}²(y|x_s) − (E V_{ni}(y|x_s))²].    (3.2.3)

h² E V_{ni}²(y|x_s) = h² [h^{−4} ∫_{−∞}^{∞} ∫_{−1}^{1} (K((x_s − u)/h) L^{(1)}((y − v)/h))² f(v|u) du dv]
    = h^{−2} [∫_{−∞}^{∞} ∫_{−1}^{1} (K(u/h) L^{(1)}(v/h))² f(y − v|x_s − u) du dv].

Now, by an application of Lemma (3.2.1), we get that

lim_{n→∞} h² E V_{ni}²(y|x_s) = f(y|x_s) ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.    (3.2.4)

h² (E V_{ni}(y|x_s))² = h² [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_s − u) du dv]².

By another application of Lemma (3.2.1), we get that

lim_{n→∞} h² (E V_{ni}(y|x_s))² = 0.    (3.2.5)

Now, by a combination of (3.2.3), (3.2.4) and (3.2.5), (a) is satisfied.

b. From the definition of W_{ni}(x), we have

E(W_{ni}(x_1) W_{ni}(x_2)) = h² (E V_{ni}(y|x_1) V_{ni}(y|x_2) − E V_{ni}(y|x_1) E V_{ni}(y|x_2)).    (3.2.6)
Suppose, without loss of generality, that x_2 > x_1, and let δ = x_2 − x_1 and δ_n = δ/h.

h² E V_{ni}(y|x_1) V_{ni}(y|x_2)
    = h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K((x_1 − u)/h) K((x_2 − u)/h) (L^{(1)}((y − v)/h))² f(v|u) du dv
    = ∫_{−∞}^{∞} ∫_{−1}^{1} K(u) K(δ_n + u) (L^{(1)}(v))² f(y − hv|x_1 − hu) du dv
    = ∫_{−∞}^{∞} (L^{(1)}(v))² f(y − hv|x_1 − hu) [∫_{−1}^{1} K(u) K(δ_n + u) du] dv.    (3.2.7)

Next, note that

∫_{−1}^{1} K(u) K(δ_n + u) du = ∫_{|u|<δ_n/2} K(u) K(δ_n + u) du + ∫_{|u|>δ_n/2} K(u) K(δ_n + u) du
    ≤ sup_{|u|<δ_n/2} K(δ_n + u) ∫_{−1}^{1} K(z) dz + sup_{|u|>δ_n/2} K(u) ∫_{−1}^{1} K(δ_n + z) dz
    ≤ sup_{|u|>δ_n/2} K(u) · O(1) + sup_{|u|>δ_n/2} K(u) · O(1)
    = 2 sup_{|u|>δ_n/2} K(u) · O(1) ≤ (4/δ_n) sup_{|u|>δ_n/2} |u K(u)| · O(1)
    = (4h/δ) sup_{|u|>δ_n/2} |u K(u)| · O(1) = O(h).    (3.2.8)
Finally, from (3.2.7) and (3.2.8), we have that

lim_{n→∞} h² (E V_{ni}(y|x_1) V_{ni}(y|x_2)) = 0.    (3.2.9)

h² E V_{ni}(y|x_1) E V_{ni}(y|x_2) = h² [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_1 − u) du dv]
    × [h^{−2} ∫_{−∞}^{∞} ∫_{−1}^{1} K(u/h) L^{(1)}(v/h) f(y − v|x_2 − u) du dv] −→ 0,    (3.2.10)

by an application of Lemma (3.2.1). Hence a combination of (3.2.6), (3.2.9),
and (3.2.10) gives the desired result.
Lemma 3.2.7. Under the conditions of Lemma (3.2.6), we have that

lim_{n→∞} σ_n² = C A C^T.

Proof:
σ_n² = Var(C Z_n^T), and by the definition of Z_n^T, we have

σ_n² = Var(n^{−1/2} c_1 W_n(x_1) + n^{−1/2} c_2 W_n(x_2))
    = n^{−1} c_1² Var(W_n(x_1)) + n^{−1} c_2² Var(W_n(x_2)) + 2 n^{−1} c_1 c_2 Cov(W_n(x_1), W_n(x_2))
    = n^{−1} c_1² ∑_{i=1}^{n} Var(W_{ni}(x_1)) + n^{−1} c_2² ∑_{i=1}^{n} Var(W_{ni}(x_2))
      + 2 n^{−1} c_1 c_2 Cov(∑_{i=1}^{n} W_{ni}(x_1), ∑_{i=1}^{n} W_{ni}(x_2))
    = c_1² Var(W_{ni}(x_1)) + c_2² Var(W_{ni}(x_2)) + 2 n^{−1} c_1 c_2 E(∑_{i=1}^{n} ∑_{j=1}^{n} W_{ni}(x_1) W_{nj}(x_2)).

An application of Lemma (3.2.6) implies that

lim_{n→∞} σ_n² = ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv [c_1² f(y|x_1) + c_2² f(y|x_2)] = C A C^T > 0.
Next, we have

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} E|Z_{ni}|^{2+δ} = n^{−(1+δ/2)} |C|^{2+δ} E|(W_{ni}(x_1), W_{ni}(x_2))|^{2+δ}
    ≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} max{E|W_{ni}(x_1)|^{2+δ}, E|W_{ni}(x_2)|^{2+δ}}.

Without loss of generality assume that E|W_{ni}(x_1)|^{2+δ} ≥ E|W_{ni}(x_2)|^{2+δ}. Then

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E|W_{ni}(x_1)|^{2+δ}
    = n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E|h (V_{ni}(x_1) − E V_{ni}(x_1))|^{2+δ}
    = 2^{2+δ} |C|^{2+δ} (n^{−1} h²)^{1+δ/2} E|V_{ni}(x_1) − E V_{ni}(x_1)|^{2+δ}.

Now, by an application of Lemma (3.2.3), we get that

ρ_n^{2+δ} = ∑_{i=1}^{n} ρ_{ni}^{2+δ} ≤ 2^{2+δ} |C|^{2+δ} (n^{−1} h²)^{1+δ/2} ∑_{i=1}^{n} E|V_{ni}(x_1) − E V_{ni}(x_1)|^{2+δ} −→ 0.
Hence the Liapounov’s condition (2) is satisfied. Therefore, we have that, CZTn
is asymptotically normally distributed with mean zero and variance CACT and
by the multivariate version of Cramr - World theorem, we have that Zn converges
in distribution to Z. Now an application of the second part of Lemma (3.2.2) in
conjunction with the last convergence gives the proof of the theorem.
Theorem 3.2.8. Suppose that x_1, x_2, ..., x_k are distinct points, where f(y|x_i) > 0,
i = 1, 2, ..., k. Then under the conditions (1) through (4), the random vector

(nh_n²)^{1/2} (M_n(x_1) − M(x_1), ..., M_n(x_k) − M(x_k))^T

is asymptotically multivariate normally distributed with mean vector zero and a
diagonal covariance matrix B = [b_{ij}], with

b_{ii} = [f(M(x_i)|x_i) / (f^{(2)}(M(x_i)|x_i))²] ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv.
Proof: For a fixed x, taking the Taylor expansion of f_n^{(1)}(M_n(x)|x) around M(x)
implies

0 = f_n^{(1)}(M_n(x)|x) ≈ f_n^{(1)}(M(x)|x) + (M_n(x) − M(x)) f_n^{(2)}(M*_n(x)|x),

where |M*_n(x) − M(x)| < |M_n(x) − M(x)|. This implies that

M_n(x) − M(x) ≈ − f_n^{(1)}(M(x)|x) / f_n^{(2)}(M*_n(x)|x),

and therefore

(nh²)^{1/2} (M_n(x_1) − M(x_1), ..., M_n(x_k) − M(x_k))^T ≈
    − (nh²)^{1/2} [f_n^{(1)}(M(x_1)|x_1) / f_n^{(2)}(M*_n(x_1)|x_1), ..., f_n^{(1)}(M(x_k)|x_k) / f_n^{(2)}(M*_n(x_k)|x_k)]^T,

where |M*_n(x_i) − M(x_i)| < |M_n(x_i) − M(x_i)|, i = 1, 2, ..., k.
An application of Theorem (3.2.5) and Lemma (3.2.4) completes the proof of
the theorem.
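As a sanity check (not part of the thesis), the covariance factor ∫_{−∞}^{∞} ∫_{−1}^{1} (K(u) L^{(1)}(v))² du dv shared by Theorems 3.2.5 and 3.2.8 can be evaluated numerically for the kernels used later in Chapter 4 (rectangle K, Gaussian L). The integrand factorises, and for these kernels ∫K² = 1/2 and ∫(L^{(1)})² = 1/(4√π), so the constant should come out to 1/(8√π) ≈ 0.0705:

```python
import numpy as np

# Grids for simple Riemann-sum integration (grid sizes are arbitrary choices).
u, du = np.linspace(-1, 1, 4001, retstep=True)
v, dv = np.linspace(-8, 8, 8001, retstep=True)
K = np.full_like(u, 0.5)                             # rectangle kernel on [-1, 1]
L1 = -v * np.exp(-v ** 2 / 2) / np.sqrt(2 * np.pi)   # L'(v) for the Gaussian kernel

# The double integral factorises into a product of two single integrals.
constant = (K ** 2).sum() * du * (L1 ** 2).sum() * dv
print(constant)   # close to 1/(8*sqrt(pi)) ≈ 0.0705
```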
Summary:
In this chapter, we introduced the fixed design model and then studied the problem
of estimating nonparametrically the regression mode for fixed design models. We
supposed the error random variables are independent. The joint asymptotic normal-
ity of the regression mode estimator at different fixed design points was established
under some regularity conditions. This completes the theoretical phase, which is
needed for the practical phase that follows.
Chapter 4
Applications
Introduction:
This chapter contains three sections. The first section contains four simulation
studies. The second section contains applications using real data. The last section
contains some concluding remarks and suggestions.
4.1 Simulation
If f(y|x) is unknown conditional pdf, then its estimation can be done by estimating
the conditional density function fn(y|x). Several nonparametric methods have been
proposed for estimating the conditional density function f(y|x) based on a fixed
sample, we will simulate the data and compare the results from the simulated
studies with the theoretical results, and if it consistent with or not. In this section,
we computed the conditional mode function through applications using simulated
data. For each estimator, we computed the mean squared error (MSE) and the
correlation coefficients between y the predicated values and yn the actual values
71
R2y,yn , where
MSE =SSE
n,
where
SSE =n∑i=1
(yi – yni)2.
and,
R2y,yn = 1 –
SSE
SSTO,
where, SSTO =∑n
i=1(yi – y)2 where y denotes the mean of actual values, yi de-
notes the true value and yni denotes its predicted value. The binwidth is computed
using the equation (1.6.2).
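The two criteria above are straightforward to compute; a small helper (illustrative only, with made-up numbers in the example call) makes the definitions explicit:

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """MSE = SSE/n and R^2 = 1 - SSE/SSTO, exactly as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = ((y_true - y_pred) ** 2).sum()
    ssto = ((y_true - y_true.mean()) ** 2).sum()
    return sse / len(y_true), 1 - sse / ssto

# hypothetical numbers, purely for illustration
print(mse_and_r2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))  # ≈ (0.025, 0.98)
```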
We studied ten models and then chose the best four models.
Data of sizes 150, 200 and 300 were simulated from the following four models:

1. y = sin(2π(1 − x)²) + xe,
2. y = 1/(5x + 1) + sin(5x) + xe,
3. y = cos(π(1 − x)²) + xe,
4. y = 1 + 10 cos(x) + x² e,

where x has a uniform distribution on [0, 1] and e has a standard normal dis-
tribution. We used the following two kernel functions, the rectangle kernel

K(x) = 1/2,    |x| < 1,

and the Gaussian kernel

L(x) = (1/√(2π)) e^{−x²/2},    −∞ < x < ∞,

in equation (3.0.1) to estimate the fixed design mode.
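One such simulation run for model 1 can be sketched as follows. This is illustrative only: the bandwidth h = 0.15 is fixed by hand rather than computed from equation (1.6.2), the mode is located by a grid search, and boundary interpolating points s_0 = 0, s_n = 1 are assumed; the rectangle kernel is used for K and the Gaussian kernel for L, as above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 200, 0.15
x = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * (1 - x) ** 2) + x * rng.standard_normal(n)  # model 1

# interpolating points s_i (midpoints, with s_0 = 0 and s_n = 1 assumed)
s = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2, [1.0]))

def mode_at(x0, grid):
    # Gasser-Mueller weights in closed form for the rectangle kernel K(u) = 1/2
    w = np.clip(np.minimum(s[1:], x0 + h) - np.maximum(s[:-1], x0 - h), 0.0, None) / (2 * h)
    # f_n(g|x0) up to a constant factor, with Gaussian L; maximise over the grid
    dens = [(w * np.exp(-((g - y) / h) ** 2 / 2)).sum() for g in grid]
    return grid[int(np.argmax(dens))]

grid = np.linspace(y.min(), y.max(), 300)
y_hat = np.array([mode_at(x0, grid) for x0 in x])
sse = ((y - y_hat) ** 2).sum()
mse = sse / n
r2 = 1 - sse / ((y - y.mean()) ** 2).sum()
print(mse, r2)
```

Because the bandwidth rule is not reproduced here, the resulting MSE and correlation coefficient will differ somewhat from the tables below.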
For each simulation, we computed the MSE and the correlation coefficient.
The results are summarized in Table 4.1, Table 4.2, Table 4.3, and Table 4.4,
respectively.
Figure 4.1, Figure 4.2, Figure 4.3 and Figure 4.4 show the scatter plots of the
simulated data together with the perfect curve and the mode estimation.
The tables below show that as the sample size increases, the MSE
decreases and the correlation coefficient increases.
The results of the four simulation studies indicate that the performance of the
estimator is reasonably good.
Table 4.1: The MSE and The Correlation Coefficient for the Simulation Study 1
Sample size MSE Correlation Coefficient
150 0.1333399 0.6517507
200 0.09691356 0.7468916
300 0.07884304 0.7940889
Figure 4.1: A scatter plot of the first simulated data together with the perfect curve
Table 4.2: The MSE and The Correlation Coefficient for the Simulation Study 2
Sample size MSE Correlation Coefficient
150 0.1155162 0.8318199
200 0.07993088 0.8834305
300 0.03671442 0.9463647
Figure 4.2: A scatter plot of the second simulated data together with the perfect curve
Table 4.3: The MSE and The Correlation Coefficient for the Simulation Study 3
Sample size MSE Correlation Coefficient
150 0.1171055 0.7545829
200 0.09724039 0.7967533
300 0.0672099 0.8598906
Figure 4.3: A scatter plot of the third simulated data together with the perfect curve
Table 4.4: The MSE and The Correlation Coefficient for the Simulation Study 4
Sample size MSE Correlation Coefficient
150 0.1149359 0.9409647
200 0.08765359 0.9548517
300 0.07012372 0.9637793
Figure 4.4: A scatter plot of the fourth simulated data together with the perfect curve
4.2 Real data
In this section, we computed the conditional mode by kernel estimator through
applications using real data.
1- Ethanol data
We consider an application using a real-life data set that is built into the S-Plus program:
the ethanol data, which records 88 measurements from an experiment
in which ethanol was burned in a single-cylinder automobile test engine. We used
the first 71 observations of the variable E, which measures the
richness of the air/ethanol mixture, to estimate the last 17 observations. The mean
squared error is computed, MSE = 0.0436. Figure 4.5 shows the plot of the data
and the regression mode estimation.
Figure 4.5: Regression mode estimation of the ethanol data
2- Vapor Pressure of Liquid Water data
We consider an application using real-life data from (Lide, 2002): the Vapor
Pressure of Liquid Water data, which records 160 measurements from
an experiment in which the 100-degree fixed point of the international temperature
scale is defined as the temperature of condensing water vapor under the pressure of
one standard atmosphere. In actual practice, the usual procedure in thermometer
calibration is to observe the thermometer reading in condensing water vapor at the
prevailing atmospheric pressure, and to determine the actual temperature by ob-
servation of the pressure and use of the relation of vapor pressure to temperature.
The differences shown indicate the range of interpretations of the vapor pressure
data as affecting the practical calibration of thermometers. The mean squared er-
ror is computed, MSE = 0.000241. Figure 4.6 shows the plot of the data and the
regression mode estimation.
Figure 4.6: Regression mode estimation of the vapor Pressure of Liquid Water
4.3 Conclusion
In this thesis, the problem of estimating the regression mode for fixed design model
has been considered. The joint asymptotic normality of the regression mode esti-
mator at different fixed design points has been established under some regularity
conditions. The performance of the proposed estimator is tested via applications
using simulated and real life data. The results of the applications indicate that the
performance of the estimator is reasonably good.
The results of the applications can be slightly improved if the rectangle kernel
is replaced by the Epanechnikov kernel, which is the optimal kernel. We used the
rectangle kernel for computational reasons. The new estimator can be modified by
considering a new bandwidth selection technique that uses a variable bandwidth
that depends on the points at which the mode function is estimated rather than a
constant bandwidth. By considering some conditions, the results of this thesis can
be generalized to the case of dependent data under some mixing conditions and for
the case of time series data.
Bibliography
Abraham, C., Biau, G., and Cadre, B. (2003). Simple estimation of the mode of a
multivariate density. Canadian Journal of Statistics, 31(1):23–34.
Abraham, C., Biau, G., and Cadre, B. (2004). On the asymptotic properties of a
simple estimate of the mode. ESAIM: Probability and Statistics, 8:1–11.
Adèr, H. J. and Adèr, M. (2008). Advising on research methods: A consultant's
companion. Johannes van Kessel Publishing.
Devroye, L. (1979). Recursive estimation of the mode of a multivariate density. The
Canadian Journal of Statistics, pages 159–167.
El-sayed, H. (2008). The Asymptotic Distributions of The Kernel Estimations of
The Conditional Mode and Quantiles. Degree Of Master of Mathematics. The
Islamic University Of Gaza, Palestine.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies.
The Annals of Statistics, pages 196–216.
Fan, J. and Yao, Q. (2003). Nonlinear time series: Nonparametric and parametric
methods. Springer-Verlag, New York.
Fix, E. and Hodges Jr, J. L. (1951). Discriminatory analysis nonparametric dis-
crimination: consistency properties. Technical report, DTIC Document.
Hansen, B. E. (2009). Lecture notes on nonparametrics. Lecture notes.
Hogg, R. V. and Craig, A. T. (1995). Introduction to mathematical statistics.(5
edition). Upper Saddle River, New Jersey: Prentice Hall.
Härdle, W. (2012). Smoothing techniques: with implementation in S. Springer Science
and Business Media.
Konakov, V. D. (1974). On the asymptotic normality of the mode of multidimensional
distributions. Theory of Probability and its Applications, 19:794–799.
Krishna, S. (2013). Assignment on research design importance.
Le Cam, L. M. (1953). On some asymptotic properties of maximum likelihood
estimates and related Bayes estimates, volume 1. University of California press.
Lide, D. R. (2002). Crc handbook of chemistry and physics: A ready-reference book
of chemical and physical data Florida: CRC Press, cop.
Loeve, M. (1960). Probability Theory. 2nd Ed. Van Nostrand, Princeton.
Nadaraya, E. A. (1964). Some new estimates for distribution functions. Theory of
Probability and its Applications, 15:497–500.

Nadaraya, E. A. (1965). On nonparametric estimation of density functions and
regression curves. Theory of Probability and its Applications, 10:186–190.
Ouyang, Z. (2005). Univariate kernel density estimation. pdf (18-09-2011).
Parzen, E. (1962). On estimation of a probability density function and mode. The
Annals of Mathematical Statistics, 33:1065–1076.
Peck, R. and Devore, J. L. (2011). Statistics: The exploration and analysis of data.
Cengage Learning.
Racine, J. (2008). Nonparametric econometrics: a primer, volume 3. Now Pub.
Robson, C. (1993). Real-world research: A resource for social scientists and prac-
titioner researchers. Malden: Blackwell Publishing.
Rosenblatt, M. (1956). Remarks on some non-parametric estimates of the density
function, volume 27, pages 832–837. Annals Math. Statist.
Royden, H. L. and Fitzpatrick, P. (1988). Real analysis, volume 198. Macmillan
New York.
Salha, R. (2014). Kernel estimation of the regression mode for fixed design model.
Salha, R. and Ioanides, D. (2007). The asymptotic distribution of the estimated
conditional mode at a finite number of distinct points under dependence conditions.
The Islamic University Journal, 15:199–214.
Samanta, M. (1973). Nonparametric estimation of the mode of a multivariate
density. South African Statistical Journal, 7:109–117.

Samanta, M. and Thavaneswaran, A. (1990). Nonparametric estimation of the
conditional mode. Communications in Statistics – Theory and Methods, 19:4515–4524.
Sen, P. K. (1993). Large sample methods in statistics: an introduction with appli-
cations. New York.
Sheskin, D. J. (2004). Handbook of parametric and nonparametric statistical proce-
dures. CRC Press.
Silverman, B. W. (1986). Density estimation for statistics and data analysis, vol-
ume 26. CRC press.
Sturges, H. A. (1926). The choice of a class interval. Journal of the American
Statistical Association, 21(153):65–66.
Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of
Statistics, Series A, 26:359–372.
Yates, R. D. and Goodman, D. J. (1999). Probability and stochastic processes.
John Willey and Sons.