
Regression

“A new perspective on freedom”

Classification

[Figure: animals plotted by Cleanliness vs. Size; predict a discrete label (Cat or Dog) for the new point marked "?"]

Regression

[Figure: cars plotted by Top speed (x) vs. Price (y, from $ to $$$$); predict a continuous value for a new point]

Regression

Data: $(x_i, y_i)_{i=1,\dots,n}$

Goal: given $x$, predict $y$, i.e. find a prediction function $y(x)$

Nearest neighbor

[Figure: nearest-neighbor fit to the sample 1-D data]

Nearest neighbor

• To predict at $x$ (see the code sketch below):
  – Find the data point $x_i$ closest to $x$
  – Choose $y = y_i$

+ No training

– Finding closest point can be expensive

– Overfitting
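A minimal NumPy sketch of this rule (the variable names and the brute-force search are illustrative assumptions, not from the slides):

import numpy as np

def nearest_neighbor_predict(x_train, y_train, x_query):
    """Predict y at x_query by copying the label of the closest training point."""
    # brute-force search over all training points (this is the "expensive" part)
    i = np.argmin(np.abs(x_train - x_query))
    return y_train[i]

# toy usage
x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = np.array([1.0, 0.5, 2.0, 3.5])
print(nearest_neighbor_predict(x_train, y_train, 2.2))   # -> 2.0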

Kernel Regression

• To predict at $x$ (see the code sketch after this list):
  – Give data point $x_i$ weight $m_i = k(x - x_i)$, where e.g. $k(x) = e^{-x^2 / 2\sigma^2}$
  – Normalize the weights: $m_i' = \dfrac{m_i}{\sum_{j=1}^n m_j}$
  – Let $y = \sum_{i=1}^n m_i' y_i$
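A minimal NumPy sketch of the three steps above with a Gaussian kernel (function and variable names are illustrative, not from the slides):

import numpy as np

def kernel_regression_predict(x_train, y_train, x_query, sigma=1.0):
    """Weighted average of the training labels, weights from a Gaussian kernel."""
    m = np.exp(-(x_query - x_train) ** 2 / (2 * sigma ** 2))  # m_i = k(x - x_i)
    m = m / m.sum()                                           # normalize: m_i'
    return np.dot(m, y_train)                                 # y = sum_i m_i' y_i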

Kernel Regression

[Figure: kernel regression fit to the sample 1-D data; Matlab demo]

Kernel Regression

+ No training

+ Smooth prediction

– Slower than nearest neighbor

– Must choose the width of $k$

$$y(x) = \frac{\sum_i y_i\, k(x_i - x)}{\sum_i k(x_i - x)}$$

Linear regression

[Figure: temperature measurements plotted over two input coordinates]

[start Matlab demo lecture2.m]

Given examples $(x_i, y_i)_{i=1,\dots,n}$, predict $y_{n+1}$ given a new point $x_{n+1}$.

[Figure: the same data with the query point $x_{n+1}$ and its predicted value $y_{n+1}$ marked]


Linear regression

Prediction (one input): $y_i = w_0 + w_1 x_i$

Prediction (two inputs):
$$y_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2}
     = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} \end{pmatrix}
       \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix}
     = X_i^\top w$$

Linear Regression

[Figure: the observation $y_i$ vs. the prediction $y = X_i^\top w$; the vertical gap is the error or "residual"]

$$X_i = \begin{pmatrix} 1 \\ x_{i,1} \\ x_{i,2} \end{pmatrix}$$

Sum squared error:
$$\sum_i (X_i^\top w - y_i)^2$$

Linear Regression

Let $X$ stack the $X_i^\top$ as rows (an $n \times d$ matrix):
$$X = \begin{pmatrix} -\, X_1^\top\, - \\ -\, X_2^\top\, - \\ \dots \end{pmatrix}$$

$$E = \sum_i (X_i^\top w - y_i)^2 = \|Xw - y\|_2^2
   = w^\top \underbrace{X^\top X}_{A}\, w - 2\, \underbrace{y^\top X}_{b^\top}\, w + \|y\|_2^2$$

$$\frac{\partial E}{\partial w} = 2Aw - 2b = 0 \quad\Longrightarrow\quad Aw = b$$

Solve the system (it's better not to invert the matrix).
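A minimal NumPy sketch of this solve (np.linalg.solve factors $A$ rather than inverting it; the toy data and names are illustrative):

import numpy as np

# toy data: two inputs per example, n = 4
x = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]])
y = np.array([1.0, 1.2, 3.1, 3.9])

X = np.hstack([np.ones((len(x), 1)), x])   # rows X_i^T = (1, x_i1, x_i2)
A = X.T @ X                                # A = X^T X
b = X.T @ y                                # b = X^T y
w = np.linalg.solve(A, b)                  # solve A w = b (no explicit inverse)
# equivalently: w, *_ = np.linalg.lstsq(X, y, rcond=None)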

LMS Algorithm (Least Mean Squares)

Online algorithm: the error decomposes over examples, so the gradient does too:
$$E = \sum_i (X_i^\top w - y_i)^2 = \sum_i E_i, \qquad
  \frac{\partial E}{\partial w} = \sum_i \frac{\partial E_i}{\partial w}$$

where
$$\frac{\partial E_i}{\partial w} = \frac{\partial}{\partial w}(X_i^\top w - y_i)^2 = 2\, X_i (X_i^\top w - y_i)$$

Update on example $i$, with step size $\alpha$:
$$w_{t+1} = w_t + \alpha\, X_i (y_i - X_i^\top w_t)$$

[Figure: each update moves $w$ toward the hyperplane $X_i^\top w = y_i$ along the direction $X_i$]
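A minimal NumPy sketch of the LMS update loop (the learning rate and epoch count are illustrative choices, not from the slides):

import numpy as np

def lms(X, y, alpha=0.01, epochs=100):
    """Least Mean Squares: one gradient step per example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for Xi, yi in zip(X, y):
            w = w + alpha * Xi * (yi - Xi @ w)   # w_{t+1} = w_t + alpha X_i (y_i - X_i^T w_t)
    return w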

Beyond lines and planes

$$y_i = w_0 + w_1 x_i + w_2 x_i^2$$

Everything is the same with
$$X_i = \begin{pmatrix} 1 \\ x_i \\ x_i^2 \end{pmatrix}$$
still linear in $w$ (see the code sketch after the summary slide below).

[Figure: quadratic fit to 1-D data]

Linear Regression [summary]

Given examples $(x_i, y_i)_{i=1,\dots,n}$.

Let $X_i^\top = \big(f_1(x_i)\ \ f_2(x_i)\ \ \dots\ \ f_d(x_i)\big)$, for example
$$X_i^\top = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} & x_{i,1}^2 & x_{i,2}^2 & x_{i,1}x_{i,2} \end{pmatrix}$$

Let
$$X = \begin{pmatrix} -\, X_1^\top\, - \\ -\, X_2^\top\, - \\ \dots \end{pmatrix} \ (n \times d),
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \dots \end{pmatrix}$$

Minimize $\|Xw - y\|_2^2$ by solving $\big(X^\top X\big)\, w = X^\top y$.

Predict $y_{n+1} = X_{n+1}^\top w$.
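A minimal NumPy sketch of this summary with a quadratic feature map for scalar inputs, which also covers the "Beyond lines and planes" slide (the feature choice, toy data, and names are illustrative):

import numpy as np

def features(x):
    """Rows X_i^T = (f_1(x_i), ..., f_d(x_i)); here a quadratic basis for scalar x."""
    x = np.asarray(x)
    return np.column_stack([np.ones_like(x), x, x ** 2])

def fit(x, y):
    X = features(x)
    # minimize ||Xw - y||^2 by solving (X^T X) w = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(w, x_new):
    return features(x_new) @ w   # y_{n+1} = X_{n+1}^T w

# toy usage
x = np.linspace(0, 20, 21)
y = 0.1 * (x - 10) ** 2 + np.random.randn(21)
w = fit(x, y)
print(predict(w, np.array([5.0, 15.0])))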

Probabilistic interpretation

Model: $y_i \mid x_i \sim \mathcal{N}(X_i^\top w,\ \sigma^2)$

[Figure: at each $x_i$, the observation $y_i$ is normally distributed around $X_i^\top w$]

Likelihood:
$$L = \prod_i \exp\!\Big(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\Big)
    = \exp\!\Big(-\frac{1}{2\sigma^2}\sum_i (X_i^\top w - y_i)^2\Big)
    = \exp\!\Big(-\frac{1}{2\sigma^2}\|Xw - y\|^2\Big)$$
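One step the slide leaves implicit: taking the negative log shows that maximizing this likelihood is exactly minimizing the sum of squared errors.

$$-\log L = \frac{1}{2\sigma^2}\,\|Xw - y\|_2^2 + \text{const}
\quad\Longrightarrow\quad
\hat{w}_{\mathrm{ML}} = \arg\min_w \|Xw - y\|_2^2$$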

Overfitting

[Figure: degree-15 polynomial fit to the data, illustrating overfitting; Matlab demo]

Ridge Regression (Regularization)

[Figure: effect of regularization on a degree-19 polynomial fit]

Minimize
$$\frac{1}{2}\|Xw - y\|_2^2 + \epsilon\,\|w\|_2^2$$
with "small" $\epsilon$.

Let $A = X^\top X$ and $b = X^\top y$. Solve
$$(A + \epsilon I)\, w = b$$
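A minimal NumPy sketch of the ridge solve (the value of epsilon and the names are illustrative):

import numpy as np

def ridge_fit(X, y, eps=1e-2):
    """Solve (X^T X + eps I) w = X^T y."""
    d = X.shape[1]
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A + eps * np.eye(d), b)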

Probabilistic interpretation

Likelihood: $y_i \mid x_i \sim \mathcal{N}(X_i^\top w,\ \sigma^2)$

Prior: $w \sim \mathcal{N}\!\Big(0,\ \dfrac{\sigma^2}{\epsilon}\Big)$

Posterior:
$$P(w \mid x_1, \dots, x_n) = \frac{P(w, x_1, \dots, x_n)}{P(x_1, \dots, x_n)} \propto P(w, x_1, \dots, x_n)$$

$$P(w, x_1, \dots, x_n)
  = \exp\!\Big(-\frac{\epsilon}{2\sigma^2}\|w\|_2^2\Big)\,
    \prod_i \exp\!\Big(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\Big)
  = \exp\!\Big(-\frac{1}{2\sigma^2}\Big[\epsilon\|w\|_2^2 + \sum_i (X_i^\top w - y_i)^2\Big]\Big)$$
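As with the likelihood slide, taking the negative log makes the connection explicit: maximizing this posterior is exactly the ridge objective from the previous slide.

$$-\log P(w, x_1, \dots, x_n)
  = \frac{1}{2\sigma^2}\Big[\epsilon\,\|w\|_2^2 + \|Xw - y\|_2^2\Big] + \text{const}$$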

Locally Linear Regression

[Figure: global temperature increase, 1840 to 2020; source: http://www.cru.uea.ac.uk/cru/data/temperature]

Locally Linear Regression

• To predict at $x_{n+1}$ (see the code sketch after the next slide):
  – Give data point $x_i$ weight $m_i = k(x_{n+1} - x_i)$, where e.g. $k(x) = e^{-x^2 / 2\sigma^2}$
  – Let $w = \operatorname*{argmin}_w \sum_{i=1}^n m_i (X_i^\top w - y_i)^2$
  – Let $y_{n+1} = X_{n+1}^\top w$

Locally Linear Regression

+ Good even at the boundary (more important in high dimension)
– Solve a linear system for each new prediction
– Must choose the width of $k$

To minimize
$$\sum_{i=1}^n m_i (X_i^\top w - y_i)^2$$
solve
$$\big(X^\top M X\big)\, w = X^\top M y,
\qquad \text{where } M = \begin{pmatrix} m_1 & & \\ & m_2 & \\ & & \ddots \end{pmatrix}$$

Predict $y_{n+1} = X_{n+1}^\top w$.
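A minimal NumPy sketch of one locally weighted prediction (the kernel width and names are illustrative):

import numpy as np

def locally_linear_predict(X, y, X_query, sigma=1.0):
    """Fit a weighted least-squares model around X_query, then predict there."""
    # m_i = k(X_query - X_i), Gaussian kernel on the feature rows
    m = np.exp(-np.sum((X - X_query) ** 2, axis=1) / (2 * sigma ** 2))
    M = np.diag(m)
    w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)   # (X^T M X) w = X^T M y
    return X_query @ w                              # y_{n+1} = X_{n+1}^T w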

Locally Linear Regression, Gaussian kernel

[Figure: locally linear fit with a Gaussian kernel to the global temperature data, 1840 to 2020; source: http://www.cru.uea.ac.uk/cru/data/temperature]

Locally Linear Regression, Laplacian kernel

[Figure: the same fit with a Laplacian kernel; source: http://www.cru.uea.ac.uk/cru/data/temperature]

L1 Regression

Sensitivity to outliers

[Figure: temperature at noon; a single outlying observation $y_i$ pulls the least-squares fit $x_i^\top w$ away from the rest of the data]

Squared error gives high weight to outliers:
$$E = \sum_i (x_i^\top w - y_i)^2 = \sum_i E_i$$

The influence function $\dfrac{\partial E_i}{\partial y_i}$ grows linearly with the residual, so a large outlier has a large influence on the fit.

L1 Regression

$$E' = \sum_i |x_i^\top w - y_i| = \sum_i E_i'$$

The influence function $\dfrac{\partial E_i'}{\partial y_i}$ is bounded, so outliers have limited influence.

This can be written as a linear program:
$$\min_{w,\,c}\ \sum_i c_i
\qquad \text{s.t.}\quad x_i^\top w - y_i \le c_i\ \ \forall i,
\qquad y_i - x_i^\top w \le c_i\ \ \forall i$$
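A minimal sketch of that linear program using scipy.optimize.linprog, with the decision variables stacked as $(w, c)$ (the library choice and variable layout are assumptions, not from the slides):

import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """min sum_i c_i  s.t.  X w - y <= c  and  y - X w <= c."""
    n, d = X.shape
    obj = np.concatenate([np.zeros(d), np.ones(n)])      # objective: 0*w + 1*c
    # X w - c <= y   and   -X w - c <= -y
    A_ub = np.block([[X, -np.eye(n)], [-X, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n         # w free, c >= 0
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]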

Spline Regression: regression on each interval

[Figure: independent fits on each interval of the data]

Spline Regression: with equality constraints

[Figure: the fits constrained to agree at the interval boundaries]

Spline Regression: with L1 cost

[Figure: the same spline fit using the L1 cost]

To learn more

• The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, Springer (recommended)