
Page 1: (source: mima.sdu.edu.cn/Members/xinshunxu/Courses/ML/Chapter4.pdf)

MIMA Group

Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University

Chapter 4: Non-Parametric Estimation

M L D M

Page 2:

Contents

  Introduction
  Parzen Windows
  k-Nearest-Neighbor Estimation
  Classification Techniques
    The Nearest-Neighbor Rule (1-NN)
    The Nearest-Neighbor Rule (k-NN)
  Distance Metrics

Page 3:

Bayes Rule for Classification

P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i) P(\omega_i)}{p(\mathbf{x})}

To compute the posterior probability, we need to know the prior probability P(\omega_i) and the likelihood p(\mathbf{x} \mid \omega_i).

Case I: p(\mathbf{x} \mid \omega_i) has a known parametric form
  Maximum-Likelihood Estimation
  Bayesian Parameter Estimation

Problem: the assumed parametric form may not fit the ground-truth density encountered in practice, e.g., assumed parametric form: unimodal; ground truth: multimodal.

Page 4:

Non-Parametric Estimation

Case II: p(\mathbf{x} \mid \omega_i) does not have a known parametric form. How to estimate it?

Let the data speak for themselves!
  Parzen Windows
  k_n-Nearest-Neighbor

Page 5:

Goals

Estimate class-conditional densities p(\mathbf{x} \mid \omega_i).

Estimate posterior probabilities P(\omega_i \mid \mathbf{x}).

Page 6:

Density Estimation

Assume p(\mathbf{x}) is continuous and the region R is small. Fundamental fact: the probability that a vector \mathbf{x} falls into the region R is

P = \int_R p(\mathbf{x}')\,d\mathbf{x}' \approx p(\mathbf{x})\,V,

where V is the volume of R.

(figure: a region R, marked +\mathbf{x}, among n samples)

Given n i.i.d. samples \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}, let K denote the random variable counting the samples that fall into R. K follows a binomial distribution:

K \sim B(n, P), \qquad P(K = k) = \binom{n}{k} P^k (1 - P)^{n-k}.

Page 7:

Density Estimation (cont.)

P = \int_R p(\mathbf{x}')\,d\mathbf{x}' \approx p(\mathbf{x})\,V, \qquad E[K] = nP \;\Rightarrow\; p(\mathbf{x}) \approx \frac{E[K]/n}{V}.

Let k_R denote the actual number of samples observed in R. Substituting k_R for E[K] gives the estimate

p(\mathbf{x}) \approx \frac{k_R/n}{V}.

Page 8:

Density Estimation (cont.)

Use a subscript n to take the sample size into account:

p_n(\mathbf{x}) = \frac{k_n/n}{V_n}.

We hope that \lim_{n\to\infty} p_n(\mathbf{x}) = p(\mathbf{x}). For this, we should have

\lim_{n\to\infty} V_n = 0, \qquad \lim_{n\to\infty} k_n = \infty, \qquad \lim_{n\to\infty} k_n/n = 0.
Page 9:

Density Estimation (cont.)

In p_n(\mathbf{x}) = \frac{k_n/n}{V_n}, which quantities can be controlled, and how?

Fix V_n and determine k_n: Parzen Windows.
Fix k_n and determine V_n: k_n-Nearest-Neighbor.

Page 10:

Parzen Windows

Fix V_n and determine k_n in p_n(\mathbf{x}) = \frac{k_n/n}{V_n}.

Assume R_n is a d-dimensional hypercube whose edge length is h_n, so that

V_n = h_n^d.

Determine k_n with a window function, a.k.a. kernel function or potential function.

(photo: Emanuel Parzen, 1929–2016)

Page 11:

Window Function

\varphi(\mathbf{u}) = \begin{cases} 1 & |u_j| \le 1/2, \; j = 1, \ldots, d \\ 0 & \text{otherwise} \end{cases}

It defines a unit hypercube centered at the origin.

(figure: the unit hypercube, and a hypercube with edge length h_n)

Page 12:

Window Function (cont.)

\varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right) = \begin{cases} 1 & |x_j - x_{ij}| \le h_n/2, \; j = 1, \ldots, d \\ 0 & \text{otherwise} \end{cases}

\varphi((\mathbf{x} - \mathbf{x}_i)/h_n) = 1 means that \mathbf{x}_i falls within the hypercube of volume V_n centered at \mathbf{x}.

k_n, the number of samples inside the hypercube centered at \mathbf{x}, is therefore

k_n = \sum_{i=1}^{n} \varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right).

Page 13:

Parzen Window Estimation

Substituting k_n into p_n(\mathbf{x}) = \frac{k_n/n}{V_n} gives the Parzen pdf

p_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right).

\varphi(\cdot) is not limited to the hypercube window function defined previously. It can be any pdf, i.e., any function satisfying

\varphi(\mathbf{u}) \ge 0, \qquad \int \varphi(\mathbf{u})\,d\mathbf{u} = 1.
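The Parzen pdf can be sketched directly. A minimal illustration with the hypercube window (function and variable names are my own, not from the slides):

```python
def hypercube_kernel(u):
    # phi(u) = 1 if every coordinate satisfies |u_j| <= 1/2, else 0
    return 1.0 if all(abs(uj) <= 0.5 for uj in u) else 0.0

def parzen_estimate(x, data, h):
    # p_n(x) = (1/n) * sum_i (1/V_n) * phi((x - x_i)/h), with V_n = h^d
    d, n = len(x), len(data)
    V = h ** d
    k = sum(hypercube_kernel([(xj - xij) / h for xj, xij in zip(x, xi)])
            for xi in data)
    return k / (n * V)

# Three 1-d samples with window width h = 2: the window centered at 0.0
# spans [-1, 1] and captures the samples at -0.5 and 0.5 but not 3.0.
data = [[-0.5], [0.5], [3.0]]
print(parzen_estimate([0.0], data, h=2.0))   # (2/3)/2 = 1/3
```

Swapping `hypercube_kernel` for any other function that is nonnegative and integrates to one (e.g., a Gaussian) leaves the estimator structure unchanged.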

Page 14:

Parzen Window Estimation (cont.)

Is p_n(\mathbf{x}) a valid pdf? It is nonnegative, and setting \mathbf{u} = (\mathbf{x} - \mathbf{x}_i)/h_n (so that d\mathbf{x} = h_n^d\,d\mathbf{u} = V_n\,d\mathbf{u}) shows that it integrates to one:

\int p_n(\mathbf{x})\,d\mathbf{x} = \frac{1}{n} \sum_{i=1}^{n} \int \frac{1}{V_n} \varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right) d\mathbf{x} = \frac{1}{n} \sum_{i=1}^{n} \int \varphi(\mathbf{u})\,d\mathbf{u} = 1.

Ingredients of the Parzen pdf: the window function (itself a pdf), the window width h_n, and the training data.

Page 15:

Parzen Window Estimation (cont.)

Define the interpolation function

\delta_n(\mathbf{x}) = \frac{1}{V_n} \varphi\left(\frac{\mathbf{x}}{h_n}\right),

so that

p_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(\mathbf{x} - \mathbf{x}_i).

p_n(\mathbf{x}) is thus a superposition of n interpolation functions; each sample \mathbf{x}_i contributes to p_n(\mathbf{x}) according to its "distance" from \mathbf{x}.

What is the effect of h_n (the window width) on the Parzen pdf?

Page 16:

Parzen Window Estimation: the effect of h_n

\delta_n(\mathbf{x}) = \frac{1}{V_n} \varphi\left(\frac{\mathbf{x}}{h_n}\right) = \frac{1}{h_n^d} \varphi\left(\frac{\mathbf{x}}{h_n}\right)

h_n affects both the width (horizontal scale) and the amplitude (vertical scale) of \delta_n.

Page 17:

Parzen Window Estimation (cont.)

(figure: the shape of \delta_n(\mathbf{x}) = \frac{1}{V_n}\varphi(\mathbf{x}/h_n) for decreasing values of h_n, with \varphi(\cdot) a 2-d Gaussian pdf)

Page 18:

Parzen Window Estimation (cont.)

\delta_n(\mathbf{x}) = \frac{1}{V_n} \varphi\left(\frac{\mathbf{x}}{h_n}\right), \qquad p_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \delta_n(\mathbf{x} - \mathbf{x}_i)

When h_n is very large, \delta_n(\mathbf{x}) is broad with small amplitude: p_n(\mathbf{x}) is the superposition of n broad, slowly changing functions, i.e., smooth but with low resolution.

When h_n is very small, \delta_n(\mathbf{x}) is sharp with large amplitude: p_n(\mathbf{x}) is the superposition of n sharp pulses, i.e., variable/unstable but with high resolution.

Page 19:

Parzen Window Estimation (cont.)

(figure: Parzen window estimates for five samples, with \varphi(\cdot) a 2-d Gaussian pdf)

Page 20:

Parzen Window Estimation: convergence conditions

To ensure convergence, i.e.,

\lim_{n\to\infty} E[p_n(\mathbf{x})] = p(\mathbf{x}), \qquad \lim_{n\to\infty} \mathrm{Var}[p_n(\mathbf{x})] = 0,

we have the following additional constraints:

\sup_{\mathbf{u}} \varphi(\mathbf{u}) < \infty

\lim_{\|\mathbf{u}\|\to\infty} \varphi(\mathbf{u}) \prod_{i=1}^{d} u_i = 0

\lim_{n\to\infty} V_n = 0, \qquad \lim_{n\to\infty} n V_n = \infty

Page 21:

Illustrations: one-dimensional case, X \sim N(0, 1)

\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}, \qquad h_n = h_1/\sqrt{n}

p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} \varphi\left(\frac{x - x_i}{h_n}\right)

Page 22:

Illustrations: one-dimensional case (cont.)

(figure: further Parzen estimates with the same Gaussian kernel \varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2} and window widths h_n = h_1/\sqrt{n})

Page 23:

Illustrations: two-dimensional case

h_n = h_1/\sqrt{n}

p_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^2} \varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right)

Page 24:

Classification Example

(figure: decision regions with a smaller window vs. a larger window)

Page 25:

Choosing Window Function

V_n must approach zero as n \to \infty, but at a rate slower than 1/n, e.g.,

V_n = V_1/\sqrt{n}.

The value of the initial volume V_1 is important: a cell volume that is proper for one region may be unsuitable in a different region.

Page 26:

k_n-Nearest Neighbor

Fix k_n and then determine V_n in p_n(\mathbf{x}) = \frac{k_n/n}{V_n}.

To estimate p(\mathbf{x}), center a cell about \mathbf{x} and let it grow until it captures k_n samples, where k_n is some specified function of n; a principled rule is

k_n = \sqrt{n}.

Then \lim_{n\to\infty} k_n = \infty and \lim_{n\to\infty} V_n = 0.
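The growing-cell idea can be sketched in one dimension (a hypothetical illustration of my own; the interval whose radius equals the distance to the k-th nearest sample plays the role of the cell):

```python
import math

def knn_density(x, data, k):
    # Grow an interval around x until it contains the k nearest samples,
    # then estimate p(x) = (k/n)/V_n, where V_n = 2r and r is the
    # distance to the k-th nearest sample (1-d "volume" = length).
    n = len(data)
    r = sorted(abs(x - xi) for xi in data)[k - 1]
    V = 2 * r
    return (k / n) / V

# k_n = sqrt(n), as suggested above (here n = 4, so k_n = 2)
data = [0.0, 1.0, 2.0, 10.0]
k = int(math.sqrt(len(data)))
print(knn_density(1.0, data, k))   # r = 1, V = 2, p = (2/4)/2 = 0.25
```

Note how the cell automatically shrinks where samples are dense and expands where they are sparse, in contrast to the fixed-volume Parzen window.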

Page 27:

kn-Nearest Neighbor

(figure, left: eight points in one dimension (n = 8, d = 1); red curve: k_n = 3, black curve: k_n = 5)

(figure, right: thirty-one points in two dimensions (n = 31, d = 2); black surface: k_n = 5)

Page 28:

Estimation of the Posterior Probability

P_n(\omega_i \mid \mathbf{x}) = ?

Place a cell of volume V_n around \mathbf{x} and suppose it captures k samples, k_i of which belong to class \omega_i. Then

p_n(\mathbf{x}, \omega_i) = \frac{k_i/n}{V_n}, \qquad \sum_{j=1}^{c} p_n(\mathbf{x}, \omega_j) = \frac{k/n}{V_n},

so

P_n(\omega_i \mid \mathbf{x}) = \frac{p_n(\mathbf{x}, \omega_i)}{\sum_{j=1}^{c} p_n(\mathbf{x}, \omega_j)} = \frac{k_i}{k}.

Page 29:

Estimation of the Posterior Probability (cont.)

P_n(\omega_i \mid \mathbf{x}) = \frac{p_n(\mathbf{x}, \omega_i)}{\sum_{j=1}^{c} p_n(\mathbf{x}, \omega_j)} = \frac{k_i}{k}, \qquad p_n(\mathbf{x}, \omega_i) = \frac{k_i/n}{V_n}

The value of V_n or k_n can be determined based on the Parzen-window or the k_n-nearest-neighbor technique.

Page 30:

Nearest Neighbor Classifier

Store all training examples. Given a new example \mathbf{x} to be classified, search for the training example (\mathbf{x}_i, y_i) whose \mathbf{x}_i is most similar (closest) to \mathbf{x}, and predict y_i. (Lazy learning.)

This is the posterior estimate P_n(\omega_i \mid \mathbf{x}) = k_i/k with k = 1.
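A minimal sketch of the lazy 1-NN rule (using Euclidean distance; the data and names are illustrative, not from the slides):

```python
import math

def nearest_neighbor_predict(x, train):
    # Lazy learning: no training step -- just store the labeled examples
    # and, at query time, return the label of the closest stored point.
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    xi, yi = min(train, key=lambda ex: dist(x, ex[0]))
    return yi

train = [([0.0, 0.0], "red"), ([1.0, 1.0], "blue"), ([0.9, 0.0], "blue")]
print(nearest_neighbor_predict([0.1, 0.1], train))   # -> "red"
```

All the computation happens at query time, which is exactly why the memory and search-cost issues discussed later arise.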

Page 31:

Decision Boundaries

The Voronoi diagram: given a set of points, a Voronoi diagram describes the areas that are nearest to each given point. These areas can be viewed as zones of control.

Page 32:

Decision Boundaries (cont.)

The decision boundary is formed by retaining only those Voronoi line segments that separate different classes.

The more training examples we have stored, the more complex the decision boundaries can become.

Page 33:

Decision Boundaries (cont.)

With a large number of examples and noise in the labels, the decision boundary can become nasty!

Note the "islands" in the figure: they are formed by noisy examples. If the nearest neighbor happens to be a noisy point, the prediction will be incorrect.

How to deal with this?

Page 34:

Effect of k

Different k values give different results:

A larger k produces smoother boundaries; the impacts of individual label noises cancel one another out.

When k is too large, the boundaries become oversimplified; e.g., with k = N we always predict the majority class.

Page 35:

How to Choose k?

Can we choose k to minimize the mistakes we make on training examples (the training error)? What is the training error of the nearest-neighbor rule?

Can we choose k to minimize the mistakes we make on test examples (the test error)?

Page 36:

How to Choose k? (cont.)

How do training error and test error change as we change the value of k?

Page 37:

Model Selection

Choosing k for k-NN is just one of the many model selection problems we face in machine learning.

Model selection is about choosing among different models:
  Linear regression vs. quadratic regression
  k-NN vs. decision tree

It is heavily studied in machine learning and of crucial importance in practice. If we use training error to select models, we will always choose the more complex ones.


Page 39:

Model Selection (cont.)

We can keep part of the labeled data apart as validation data: evaluate different k values by their prediction accuracy on the validation data, and choose the k that minimizes the validation error.

Validation can be viewed as another name for testing, but the name "testing" is typically reserved for final evaluation, whereas "validation" is mostly used for model selection.

Page 40:

Model Selection: the impact of validation set size

If we reserve only one point for the validation set, should we trust the validation error as a reliable estimate of our classifier's performance?

The larger the validation set, the more reliable our model selection choices are. When the total labeled set is small, we might not be able to set aside a big enough validation set, leading to unreliable model selection decisions.

Page 41:

Model Selection: K-fold Cross Validation

Split the labeled data into K subsets and perform learning/testing K times: each time, reserve one subset as the validation set and train on the rest.

Special case: leave-one-out cross validation.
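Choosing k for k-NN by K-fold cross validation can be sketched as follows. The toy 1-d data set is my own construction: class "a" clusters near 0, class "b" near 10, with one mislabeled "b" point planted in the "a" cluster so that k = 3 beats k = 1.

```python
from collections import Counter

def knn_predict(x, train, k):
    # Majority vote among the k nearest training examples (1-d inputs).
    nearest = sorted(train, key=lambda ex: abs(x - ex[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def cross_val_error(data, k, folds=4):
    # K-fold cross validation: each fold in turn is the validation set,
    # the rest is the training set; return the mean error rate.
    size = len(data) // folds
    errors = 0
    for f in range(folds):
        val = data[f * size:(f + 1) * size]
        train = data[:f * size] + data[(f + 1) * size:]
        errors += sum(knn_predict(x, train, k) != y for x, y in val)
    return errors / (folds * size)

# Interleaved so each fold holds one point from each cluster;
# (0.3, "b") is the mislabeled (noisy) point.
data = [(0.0, "a"), (9.8, "b"), (0.2, "a"), (10.0, "b"),
        (0.3, "b"), (10.2, "b"), (0.4, "a"), (10.4, "b")]
best_k = min([1, 3], key=lambda k: cross_val_error(data, k))
print(best_k)   # -> 3
```

With k = 1, the noisy point flips the predictions of its neighbors; with k = 3, the vote smooths it out, so validation error picks the larger k.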

Page 42:

Other Issues of k-NN

It can be computationally expensive to find the nearest neighbors. Speed up the computation with smart data structures that quickly search for approximate solutions.

For large data sets, it requires a lot of memory. Remove unimportant examples.

Page 43:

Final Words on k-NN

k-NN is what we call lazy learning (vs. eager learning):
  Lazy: learning only occurs when you see the test example.
  Eager: learn a model before you see the test example; training examples can be thrown away after learning.

Advantages:
  Conceptually simple, easy to understand and explain
  Very flexible decision boundaries
  Not much learning at all!

Disadvantages:
  It can be hard to find a good distance measure
  Irrelevant features and noise can be very detrimental
  Typically cannot handle more than 30 attributes
  Computational cost: requires a lot of computation and memory

Page 44:

Distance Metrics

The distance measure is an important factor for a nearest-neighbor classifier, e.g., to achieve invariant pattern recognition and data mining results. Consider the effect of changing units.


Page 46:

Properties of a Distance Metric

Nonnegativity:       D(\mathbf{a}, \mathbf{b}) \ge 0
Reflexivity:         D(\mathbf{a}, \mathbf{b}) = 0 \iff \mathbf{a} = \mathbf{b}
Symmetry:            D(\mathbf{a}, \mathbf{b}) = D(\mathbf{b}, \mathbf{a})
Triangle inequality: D(\mathbf{a}, \mathbf{b}) + D(\mathbf{b}, \mathbf{c}) \ge D(\mathbf{a}, \mathbf{c})

Page 47:

Minkowski Metric (L_p Norm)

\|\mathbf{x}\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}

1. L_1 norm (Manhattan or city-block distance):
   L_1(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_1 = \sum_{i=1}^{d} |a_i - b_i|

2. L_2 norm (Euclidean distance):
   L_2(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 = \left( \sum_{i=1}^{d} |a_i - b_i|^2 \right)^{1/2}

3. L_\infty norm (chessboard distance):
   L_\infty(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_\infty = \lim_{p\to\infty} \left( \sum_{i=1}^{d} |a_i - b_i|^p \right)^{1/p} = \max_i |a_i - b_i|
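The three metrics can be sketched in a few lines (a minimal illustration; the sample points are my own):

```python
def minkowski(a, b, p):
    # L_p distance: (sum_i |a_i - b_i|^p)^(1/p)
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

def chebyshev(a, b):
    # L_inf distance: the limit p -> infinity is max_i |a_i - b_i|
    return max(abs(ai - bi) for ai, bi in zip(a, b))

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))   # L1 (Manhattan):   7.0
print(minkowski(a, b, 2))   # L2 (Euclidean):   5.0
print(chebyshev(a, b))      # L_inf (chessboard): 4.0
```

Plugging any of these in place of the distance function of a nearest-neighbor classifier changes which stored point is "closest", which is why the choice of metric (and of units) matters.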


Page 49:

Summary

Basic setting for non-parametric techniques:
  Let the data speak for themselves: no parametric form is assumed for the class-conditional pdf.
  Estimate the class-conditional pdf from training examples, and make predictions based on Bayes' theorem.

Fundamental result in density estimation:

p_n(\mathbf{x}) = \frac{k_n/n}{V_n}

Page 50:

Summary: Parzen Windows

Fix V_n and then determine k_n in p_n(\mathbf{x}) = \frac{k_n/n}{V_n}:

p_n(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\left(\frac{\mathbf{x} - \mathbf{x}_i}{h_n}\right)

Ingredients: the window function (itself a pdf), the window width h_n, and the training data.

Page 51:

Summary: k_n-Nearest-Neighbor

Fix k_n and then determine V_n. To estimate p(\mathbf{x}), center a cell about \mathbf{x} and let it grow until it captures k_n samples, where k_n is some specified function of n; a principled rule is

k_n = \sqrt{n},

so that \lim_{n\to\infty} k_n = \infty and \lim_{n\to\infty} V_n = 0.

Page 52:

Any Questions?