INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml
Lecture Slides for
CHAPTER 8:
Nonparametric Methods
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Nonparametric Estimation
Parametric (single global model), semiparametric (small number of local models)
Nonparametric: similar inputs have similar outputs
Functions (pdf, discriminant, regression) change smoothly
Keep the training data; "let the data speak for itself"
Given x, find a small number of closest training instances and interpolate from these
Aka lazy/memory-based/case-based/instance-based learning
Density Estimation
Given the training set X={xt}t drawn iid from p(x)
Divide data into bins of size h
Histogram:
$\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$
Naive estimator:
$\hat{p}(x) = \frac{\#\{x - h/2 < x^t \le x + h/2\}}{Nh}$
or equivalently
$\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)$ where $w(u) = 1$ if $|u| < 1/2$, $0$ otherwise
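As a sketch (not from the slides), the naive estimator can be implemented directly from its counting definition; the function name `naive_density` is hypothetical:

```python
import numpy as np

def naive_density(x, data, h):
    """Naive estimator: p_hat(x) = #{x - h/2 < x^t <= x + h/2} / (N h).

    Equivalent to summing the box kernel w(u) = 1 if |u| < 1/2.
    """
    data = np.asarray(data, dtype=float)
    # count training instances inside the width-h window centered at x
    count = np.sum((data > x - h / 2) & (data <= x + h / 2))
    return count / (len(data) * h)

# example: 3 of the 4 points fall in (0.2 - 0.25, 0.2 + 0.25]
p = naive_density(0.2, [0.0, 0.2, 0.4, 1.0], h=0.5)  # 3 / (4 * 0.5) = 1.5
```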
Kernel Estimator
Kernel function, e.g., Gaussian kernel:
$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{u^2}{2}\right]$
Kernel estimator (Parzen windows):
$\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$
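The Parzen-window estimator above is a weighted count; a minimal sketch with the Gaussian kernel (function names are illustrative, not from the slides):

```python
import numpy as np

def gaussian_kernel(u):
    # K(u) = exp(-u^2 / 2) / sqrt(2*pi)
    return np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

def parzen_density(x, data, h):
    # p_hat(x) = (1 / (N h)) * sum_t K((x - x^t) / h)
    data = np.asarray(data, dtype=float)
    return gaussian_kernel((x - data) / h).sum() / (len(data) * h)

# a single training point at 0 gives the standard normal density at 0
p0 = parzen_density(0.0, [0.0], h=1.0)  # 1 / sqrt(2*pi)
```

Unlike the naive estimator, every instance contributes a smooth bump, so the estimate is continuous in x.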
k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting how many instances fall inside, fix the number of nearest instances (neighbors) k and let the data determine the bin width
dk(x), distance to kth closest instance to x
$\hat{p}(x) = \frac{k}{2N d_k(x)}$
Multivariate Data
Kernel density estimator:
$\hat{p}(\mathbf{x}) = \frac{1}{Nh^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right)$
Multivariate Gaussian kernel:
spheric: $K(\mathbf{u}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left[-\frac{\|\mathbf{u}\|^2}{2}\right]$
ellipsoid: $K(\mathbf{u}) = \frac{1}{(2\pi)^{d/2}\,|\mathbf{S}|^{1/2}} \exp\!\left[-\frac{1}{2}\mathbf{u}^T \mathbf{S}^{-1} \mathbf{u}\right]$
Nonparametric Classification
Estimate p(x|Ci) and use Bayes’ rule
Kernel estimator:
$\hat{p}(\mathbf{x}|C_i) = \frac{1}{N_i h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t, \qquad \hat{P}(C_i) = \frac{N_i}{N}$
$g_i(\mathbf{x}) = \hat{p}(\mathbf{x}|C_i)\,\hat{P}(C_i) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}^t}{h}\right) r_i^t$
k-NN estimator:
$\hat{p}(\mathbf{x}|C_i) = \frac{k_i}{N_i V^k(\mathbf{x})}, \qquad \hat{P}(C_i|\mathbf{x}) = \frac{\hat{p}(\mathbf{x}|C_i)\,\hat{P}(C_i)}{\hat{p}(\mathbf{x})} = \frac{k_i}{k}$
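Since the posterior reduces to k_i/k, k-NN classification is a majority vote among the k nearest instances. A one-dimensional sketch (names are illustrative):

```python
import numpy as np

def knn_classify(x, data, labels, k):
    """Assign x the class with the most votes (largest k_i) among k neighbors."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    idx = np.argsort(np.abs(data - x))[:k]          # indices of k nearest
    classes, counts = np.unique(labels[idx], return_counts=True)
    return classes[np.argmax(counts)]               # argmax of k_i / k

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
knn_classify(0.05, data, labels, k=3)   # class 0
knn_classify(5.05, data, labels, k=3)   # class 1
```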
Condensed Nearest Neighbor
Time/space complexity of k-NN is O(N)
Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968)
$E'(\mathbf{Z}|\mathbf{X}) = E(\mathbf{X}|\mathbf{Z}) + \lambda|\mathbf{Z}|$
Condensed Nearest Neighbor
Incremental algorithm: Add instance if needed
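A minimal sketch of the incremental idea, assuming 1-NN and a 1-D input (function and variable names are hypothetical): an instance is added to the subset Z only if the current Z misclassifies it, and passes repeat until no additions occur.

```python
import numpy as np

def condense(data, labels):
    """Hart-style condensing sketch: keep only instances that the
    current subset cannot already classify correctly with 1-NN."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    z = [0]                      # start with the first instance
    changed = True
    while changed:               # repeat passes until Z stops growing
        changed = False
        for i in range(len(data)):
            if i in z:
                continue
            # nearest neighbor of instance i within the current subset
            nn = z[int(np.argmin(np.abs(data[z] - data[i])))]
            if labels[nn] != labels[i]:
                z.append(i)      # misclassified: add it to Z
                changed = True
    return z

# two well-separated classes need only one prototype each
condense([0, 1, 2, 10, 11, 12], [0, 0, 0, 1, 1, 1])  # [0, 3]
```

The result depends on the order in which instances are visited, which is why the algorithm finds *a* small consistent subset, not a minimal one.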
Nonparametric Regression
Aka smoothing models
Regressogram
$\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$
where $b(x, x^t) = 1$ if $x^t$ is in the same bin with $x$, $0$ otherwise
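A sketch of the regressogram, assuming bins of width h anchored at an origin (the names `regressogram` and `origin` are illustrative):

```python
import numpy as np

def regressogram(x, xs, rs, h, origin=0.0):
    """g_hat(x) = average of the r^t whose x^t falls in the same width-h bin as x."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    # b(x, x^t) = 1 iff x^t and x share a bin index
    same_bin = np.floor((xs - origin) / h) == np.floor((x - origin) / h)
    if not same_bin.any():
        return np.nan            # empty bin: estimate undefined
    return rs[same_bin].mean()

# bin [0, 0.5) holds x^t = 0.1 and 0.2, so g_hat(0.15) = (1 + 3) / 2 = 2
g = regressogram(0.15, [0.1, 0.2, 0.9], [1.0, 3.0, 10.0], h=0.5)
```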
Running Mean/Kernel Smoother
Running mean smoother:
$\hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)}$ where $w(u) = 1$ if $|u| < 1$, $0$ otherwise
Kernel smoother, where $K(\cdot)$ is Gaussian:
$\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}$
Running line smoother
Additive models (Hastie and Tibshirani, 1990)
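The kernel smoother is a weighted average of the outputs, with weights decaying with distance from x. A minimal sketch with a Gaussian kernel (the name `kernel_smoother` is illustrative):

```python
import numpy as np

def kernel_smoother(x, xs, rs, h):
    """g_hat(x) = sum_t K((x - x^t)/h) r^t / sum_t K((x - x^t)/h)."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    w = np.exp(-((x - xs) / h) ** 2 / 2)   # unnormalized Gaussian weights
    return (w * rs).sum() / w.sum()        # normalization cancels the 1/sqrt(2*pi)

# symmetric data: x = 0 sits midway between r = 0 and r = 2, so g_hat = 1
g = kernel_smoother(0.0, [-1.0, 1.0], [0.0, 2.0], h=1.0)
```

Because the kernel's normalizing constant cancels between numerator and denominator, only the shape of K matters here.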
How to Choose k or h?
When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): High complexity
As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing): Low complexity
Cross-validation is used to fine-tune k or h.
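One concrete way to tune h, sketched here with leave-one-out cross-validation of the kernel smoother (this particular procedure and the name `loo_cv_error` are illustrative, not prescribed by the slides):

```python
import numpy as np

def loo_cv_error(xs, rs, h):
    """Leave-one-out squared error of a Gaussian kernel smoother at bandwidth h."""
    xs = np.asarray(xs, dtype=float)
    rs = np.asarray(rs, dtype=float)
    err = 0.0
    for i in range(len(xs)):
        mask = np.arange(len(xs)) != i          # hold out instance i
        w = np.exp(-((xs[i] - xs[mask]) / h) ** 2 / 2)
        g = (w * rs[mask]).sum() / w.sum()      # predict x^i from the rest
        err += (rs[i] - g) ** 2
    return err / len(xs)

# pick h from a grid by minimizing the held-out error
xs = np.linspace(0, 1, 20)
rs = xs.copy()                                  # noiseless linear target
best_h = min([0.05, 0.1, 0.2, 0.5, 1.0, 5.0], key=lambda h: loo_cv_error(xs, rs, h))
```

An oversmoothing bandwidth (h much larger than the data range) averages everything toward the global mean and inflates the held-out error, which is exactly the bias the slide describes.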