Kernel methods - overview

Page 1: Kernel methods - overview

Kernel smoothers

Local regression

Kernel density estimation

Radial basis functions

Page 2: Introduction

Kernel methods are regression techniques used to estimate a response function

$$y = f(X) + \varepsilon, \qquad X \in \mathbb{R}^d$$

from noisy data

Properties:

• Different models are fitted at each query point, and only those observations close to that point are used to fit the model

• The resulting function is smooth

• The models require only a minimum of training

Page 3: A simple one-dimensional kernel smoother

$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K(x_0, x_i)}$$

where

$$K(x_0, x) = \begin{cases} 1, & \text{if } |x - x_0| \le \lambda \\ 0, & \text{otherwise} \end{cases}$$

[Figure: observed data and the fitted kernel-smoothed curve]
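As an illustration, here is a minimal numerical sketch of this smoother (not from the original slides): the indicator kernel simply averages the responses inside a window of half-width λ around each query point. The data below are synthetic.

```python
import numpy as np

def boxcar_smoother(x0, x, y, lam):
    """Kernel smoother with the indicator kernel K(x0, x) = 1 if |x - x0| <= lam:
    the fit at x0 is the average of the y-values inside the window."""
    w = (np.abs(x - x0) <= lam).astype(float)
    if w.sum() == 0:
        return np.nan                      # empty window: no estimate
    return (w * y).sum() / w.sum()

# Synthetic noisy data (for illustration only)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 25, 100))
y = 5.5 + 0.5 * np.sin(x / 4) + rng.normal(0, 0.1, size=100)

fitted = np.array([boxcar_smoother(x0, x, y, lam=2.0) for x0 in x])
```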

Page 4: Kernel methods, splines and ordinary least squares regression (OLS)

• OLS: A single model is fitted to all data

• Splines: Different models are fitted to different subintervals (cuboids) of the input domain

• Kernel methods: Different models are fitted at each query point

Page 5: Kernel-weighted averages and moving averages

The Nadaraya-Watson kernel-weighted average

$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$

where λ indicates the window size and the function D shows how the weights change with distance within this window

The estimated function is smooth!

K-nearest neighbours

$$\hat{f}(x) = \operatorname{Ave}\left(y_i \mid x_i \in N_k(x)\right)$$

The estimated function is piecewise constant!

Page 6: Examples of one-dimensional kernel smoothers

• Epanechnikov kernel

$$D(t) = \begin{cases} \tfrac{3}{4}(1 - t^2), & \text{if } |t| \le 1 \\ 0, & \text{otherwise} \end{cases}$$

• Tri-cube kernel

$$D(t) = \begin{cases} (1 - |t|^3)^3, & \text{if } |t| \le 1 \\ 0, & \text{otherwise} \end{cases}$$
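A direct transcription of these two kernels into code (a sketch; the vectorized form is an implementation choice):

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel: D(t) = (3/4)(1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1.0 - t**2), 0.0)

def tricube(t):
    """Tri-cube kernel: D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, (1.0 - np.abs(t)**3)**3, 0.0)

def kernel_weights(x0, x, lam, D=epanechnikov):
    """Weights K_lambda(x0, x_i) = D(|x_i - x0| / lambda) for all observations."""
    return D(np.abs(x - x0) / lam)
```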

Page 7: Issues in kernel smoothing

• The smoothing parameter λ has to be defined

• When there are ties at $x_i$: compute an average y value and introduce weights representing the number of points

• Boundary issues

• Varying density of observations:

– the bias is constant
– the variance is inversely proportional to the local density

Page 8: Boundary effects of one-dimensional kernel smoothers

Locally-weighted averages can be badly biased at the boundaries if the response function has a significant slope there; to correct for this, apply local linear regression

Page 9: Local linear regression

Find the intercept $\alpha(x_0)$ and slope $\beta(x_0)$ solving

$$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\left[y_i - \alpha(x_0) - \beta(x_0)\, x_i\right]^2$$

The solution is a linear combination of the $y_i$:

$$\hat{f}(x_0) = \alpha(x_0) + \beta(x_0)\, x_0 = \sum_{i=1}^{N} l_i(x_0)\, y_i$$

Page 10: Kernel smoothing vs local linear regression

Kernel smoothing

Solve the minimization problem

$$\min_{\alpha(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\left[y_i - \alpha(x_0)\right]^2$$

Local linear regression

Solve the minimization problem

$$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\left[y_i - \alpha(x_0) - \beta(x_0)\, x_i\right]^2$$
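The local linear problem has a closed-form weighted least-squares solution; a minimal sketch follows (the Epanechnikov weights are an arbitrary choice here, and the window is assumed to contain at least two distinct points):

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """Local linear regression at a query point x0: minimize
    sum_i K_lambda(x0, x_i) * (y_i - alpha - beta * x_i)^2
    by weighted least squares and return the fit alpha + beta * x0."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov weights
    B = np.column_stack([np.ones_like(x), x])      # rows (1, x_i)
    W = np.diag(w)
    # Weighted normal equations: (B' W B) theta = B' W y
    alpha, beta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return alpha + beta * x0
```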

Page 11: Properties of local linear regression

• Automatically modifies the kernel weights to correct for bias

• The bias depends only on terms of order higher than one in the Taylor expansion of f

Page 12: Local polynomial regression

• Fitting polynomials instead of straight lines

Behavior of the estimated response function: see the comparison with local linear regression on the next page

Page 13: Local polynomial vs local linear regression

Advantages:

• Reduces the "trimming of hills and filling of valleys"

Disadvantages:

• Higher variance (tails are more wiggly)

Page 14: Selecting the width of the kernel

Bias-Variance tradeoff:

Selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.

Page 15: Selecting the width of the kernel

1. Automatic selection (cross-validation)

2. Fixing the degrees of freedom

Because the fit is linear in the observed responses, $\hat{\mathbf{f}} = \mathbf{S}_\lambda \mathbf{y}$ with $\{\mathbf{S}_\lambda\}_{ij} = l_j(x_i)$, and the degrees of freedom can be defined as

$$df_\lambda = \operatorname{trace}(\mathbf{S}_\lambda)$$
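For the Nadaraya-Watson smoother, for instance, the rows of $\mathbf{S}_\lambda$ are just the normalized kernel weights, so the effective degrees of freedom can be computed directly. A sketch (again with an Epanechnikov kernel; grid and bandwidth are illustrative):

```python
import numpy as np

def nw_smoother_matrix(x, lam):
    """Smoother matrix of a Nadaraya-Watson fit: f_hat = S @ y,
    with S[i, j] = K_lambda(x_i, x_j) / sum_j K_lambda(x_i, x_j)."""
    t = np.abs(x[:, None] - x[None, :]) / lam
    K = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
    return K / K.sum(axis=1, keepdims=True)

x = np.linspace(0, 10, 50)
S = nw_smoother_matrix(x, lam=1.0)
df = np.trace(S)   # fixing df instead of lam inverts this relationship
```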

Page 16: Local regression in R^p

The one-dimensional approach is easily extended to p dimensions by

• Using the Euclidean norm as a measure of distance in the kernel.

• Modifying the polynomial basis, e.g. for a quadratic polynomial in two inputs:

$$b(X) = \left(1,\; X_1,\; X_2,\; X_1^2,\; X_2^2,\; X_1 X_2\right)$$
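A sketch of both modifications for p = 2 (the function names are illustrative):

```python
import numpy as np

def quadratic_basis(X):
    """b(X) = (1, X1, X2, X1^2, X2^2, X1*X2) for inputs X of shape (n, 2)."""
    X1, X2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), X1, X2, X1**2, X2**2, X1 * X2])

def kernel_weights_p(x0, X, lam):
    """Kernel weights in R^p, using the Euclidean norm as the distance."""
    t = np.linalg.norm(X - x0, axis=1) / lam
    return np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov
```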

Page 17: Local regression in R^p

"The curse of dimensionality"

• The fraction of points close to the boundary of the input domain increases with its dimension

• Observed data do not cover the whole input domain

Page 18: Structured local regression models

Structured kernels (e.g., standardize each variable): weight the coordinates with a matrix A,

$$K_{\lambda, A}(x_0, x) = D\!\left(\frac{(x - x_0)^{T} A\,(x - x_0)}{\lambda}\right)$$

Note: A is positive semidefinite

Page 19: Structured local regression models

Structured regression functions

• ANOVA decompositions (e.g., additive models)

Backfitting algorithms can be used

• Varying coefficient models (partition X): divide $X = (X_1, \dots, X_p)$ into a set $(X_1, \dots, X_q)$ and the remaining variables collected in $Z$, and let the coefficients vary with $Z$:

$$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \dots + \beta_q(Z)\, X_q$$

Page 20: Structured local regression models

Varying coefficient models (example)

Page 21: Local methods

• Assumption: the model is linear in a neighbourhood of $x_0$ → maximize the log-likelihood locally at $x_0$:

$$l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l\!\left(y_i, x_i^{T}\beta(x_0)\right)$$

• Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e_t$. Collecting the lags in $z_t = (1, y_{t-1}, \dots, y_{t-k})$ gives $y_t = z_t^{T}\beta + e_t$; fit by local least squares with kernel $K(z_0, z_t)$
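A sketch of the autoregressive case: build the lag vectors $z_t$, then solve a kernel-weighted least-squares problem around a query lag vector $z_0$. All concrete choices below (Epanechnikov weights, series, bandwidth) are illustrative.

```python
import numpy as np

def lagged_design(y, k):
    """Rewrite y_t = beta0 + beta1*y_{t-1} + ... + betak*y_{t-k} + e_t
    as y_t = z_t' beta + e_t with z_t = (1, y_{t-1}, ..., y_{t-k})."""
    n = len(y)
    Z = np.column_stack([np.ones(n - k)] +
                        [y[k - j:n - j] for j in range(1, k + 1)])
    return Z, y[k:]

def local_ar_predict(z0, Z, target, lam):
    """Local least squares: weight observation t by K(z0, z_t) and solve WLS."""
    t = np.linalg.norm(Z - z0, axis=1) / lam
    w = np.where(t <= 1, 0.75 * (1 - t**2), 0.0)   # Epanechnikov weights
    W = np.diag(w)
    beta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ target)
    return z0 @ beta

# Illustrative usage: one-step prediction from the most recent lag vector
y_series = np.sin(np.linspace(0, 20, 300)) + \
           0.1 * np.random.default_rng(2).normal(size=300)
Z, target = lagged_design(y_series, k=3)
pred = local_ar_predict(Z[-1], Z, target, lam=1.0)
```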

Page 22: Kernel density estimation

• Straightforward estimates of the density are bumpy

• Instead, Parzen's smooth estimate is preferred:

$$\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$

Normally, Gaussian kernels are used
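A sketch of the Parzen estimate with a Gaussian kernel, in plain NumPy (the standard normal density is written out explicitly):

```python
import numpy as np

def parzen_kde(x_query, x, lam):
    """Parzen density estimate
    f_hat(x0) = (1 / (N * lam)) * sum_i phi((x0 - x_i) / lam),
    where phi is the standard normal density."""
    u = (np.asarray(x_query)[:, None] - x[None, :]) / lam
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return phi.sum(axis=1) / (len(x) * lam)
```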

Page 23: Radial basis functions and kernels

Using the idea of basis expansions, we treat kernel functions as basis functions:

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$$

where $\xi_j$ is a prototype (location) parameter and $\lambda_j$ is a scale parameter

Page 24: Radial basis functions and kernels

Choosing the parameters:

• Estimate $\{\lambda_j, \xi_j\}$ separately from the $\beta_j$ (often by using the distribution of X alone), then estimate the $\beta_j$ by least squares (a sketch follows below).
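A sketch of this two-stage recipe with Gaussian radial basis functions: the prototypes $\xi_j$ are taken as quantiles of X (so they use the distribution of X alone), a common scale λ is fixed, and β is then found by ordinary least squares. All concrete choices below are illustrative.

```python
import numpy as np

def fit_rbf(x, y, centers, lam):
    """Least-squares fit of f(x) = sum_j beta_j * exp(-(x - xi_j)^2 / (2 lam^2)),
    with the prototypes xi_j and the scale lam held fixed."""
    Phi = np.exp(-((x[:, None] - centers[None, :])**2) / (2 * lam**2))
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta

# Synthetic data; prototypes chosen from the distribution of x alone
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.2, size=200)
centers = np.quantile(x, np.linspace(0.1, 0.9, 9))
beta = fit_rbf(x, y, centers, lam=1.0)
```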