Regression “A new perspective on freedom”



Classification

[Figure: an unlabeled example “?” to be classified as Cat or Dog]

[Figure: examples plotted by Cleanliness and Size]

[Figure: an unlabeled example “?” to be assigned one of the price classes $, $$, $$$, $$$$]


Regression

[Figure: Price ($ to $$$$) plotted against Top speed; axes labeled x and y]

Regression

Data: $(x_i, y_i)_{i=1,\dots,n}$

Goal: given $x$, predict $y$, i.e. find a prediction function $y(x)$

Nearest neighbor

[Figure: nearest-neighbor prediction on example data]

Nearest neighbor

• To predict at $x$
  – Find the data point $x_i$ closest to $x$
  – Choose $y = y_i$

+ No training

– Finding the closest point can be expensive

– Overfitting
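Below is a minimal Python/NumPy sketch of the nearest-neighbor rule above. The function name `nearest_neighbor_predict` and the toy data are hypothetical. The sketch scans every training point for each query. That brute-force scan is the per-query cost listed above as a drawback.

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Predict y at query x by copying the label of the closest training point.

    X_train: (n, d) inputs, y_train: (n,) outputs, x: (d,) query point.
    """
    # Euclidean distance from x to every training point: O(n d) work per query.
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Hypothetical 1-D example.
X_train = np.array([[0.0], [1.0], [3.0]])
y_train = np.array([10.0, 20.0, 30.0])
print(nearest_neighbor_predict(X_train, y_train, np.array([0.9])))  # 20.0
print(nearest_neighbor_predict(X_train, y_train, np.array([2.5])))  # 30.0
```

The prediction jumps abruptly whenever the query crosses the midpoint between two training points. This is one way the method follows individual noisy points.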

Kernel Regression

• To predict at $x$
  – Give data point $x_i$ weight $m_i = k(x - x_i)$
  – Normalize weights: $m'_i = \frac{m_i}{\sum_{j=1}^n m_j}$
  – Let $y = \sum_{i=1}^n m'_i y_i$

Kernel $k(x)$, e.g. $k(x) = e^{-x^2/(2\sigma^2)}$

Kernel Regression

[Figure: kernel-regression prediction on example data]

[Matlab demo]

Kernel Regression

$$y(x) = \frac{\sum_i y_i\, k(x_i - x)}{\sum_i k(x_i - x)}$$

+ No training

+ Smooth prediction

– Slower than nearest neighbor

– Must choose width of $k$
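Below is a short sketch of the kernel-weighted average $y(x)$ for one-dimensional inputs. It assumes the Gaussian kernel $k(u) = e^{-u^2/(2\sigma^2)}$, with `sigma` as the kernel width. The function name `kernel_regression` and the data are hypothetical. The sketch does not guard against a query so far from every $x_i$ that all weights underflow to 0. In that case the normalization divides 0 by 0 and returns NaN.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, sigma):
    """y(x) = sum_i y_i k(x_i - x) / sum_i k(x_i - x) with a Gaussian kernel (1-D inputs)."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Differences x_i - x for every (query, training point) pair: shape (n_query, n_train).
    diff = x_train[None, :] - x_query[:, None]
    m = np.exp(-diff**2 / (2.0 * sigma**2))     # unnormalized weights m_i
    m_norm = m / m.sum(axis=1, keepdims=True)    # normalized weights m'_i
    return m_norm @ y_train                       # sum_i m'_i y_i for each query

# Hypothetical data.
x_train = np.array([0.0, 2.0, 4.0])
y_train = np.array([0.0, 2.0, 0.0])
print(kernel_regression(x_train, y_train, [1.0, 2.0], sigma=1.0))
```

Every prediction sums over all $n$ training points, which is why this method is slower than a single nearest-neighbor lookup. A smaller `sigma` moves the behavior toward nearest neighbor. A larger `sigma` averages over more points.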


Linear regression

Linear regression

[Figure: Temperature plotted over a two-dimensional input]

[start Matlab demo lecture2.m]

Given examples $(x_i, y_i)_{i=1,\dots,n}$

Predict $y_{n+1}$ given a new point $x_{n+1}$

[Figure: the same plot with a new point $x_{n+1}$ and its predicted Temperature $y_{n+1}$]

Linear regression

[Figure: Temperature plotted over the inputs, with a new point $x_{n+1}$ and its prediction $y_{n+1}$]

Prediction: $y_i = w_0 + w_1 x_i$

Prediction: $y_i = w_0 + w_1 x_{i,1} + w_2 x_{i,2} = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} \end{pmatrix} \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix} = X_i^\top w$

Linear Regression

[Figure: each Observation $y_i$, its Prediction $X_i^\top w$, and the Error or “residual” between them]

Prediction: $y = X_i^\top w$, where $X_i = \begin{pmatrix} 1 \\ x_{i,1} \\ x_{i,2} \end{pmatrix}$

Sum squared error: $\sum_i (X_i^\top w - y_i)^2$

Linear Regression

$X = \begin{pmatrix} - X_1^\top - \\ - X_2^\top - \\ \vdots \end{pmatrix}$ is an $n \times d$ matrix with one row per example.

$$E = \sum_i (X_i^\top w - y_i)^2 = \|Xw - y\|_2^2 = w^\top \underbrace{X^\top X}_{A}\, w - 2\, \underbrace{y^\top X}_{b^\top} w + \|y\|_2^2$$

$$\frac{\partial E}{\partial w} = 2Aw - 2b$$

Setting the gradient to zero, solve the system $Aw = b$ (it's better not to invert the matrix).
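Following the advice to solve $Aw = b$ rather than invert $A$, here is a minimal sketch using `np.linalg.solve`. It fits the plane $y = w_0 + w_1 x_1 + w_2 x_2$ from the slides. The helper name `fit_linear_regression` and the synthetic, noise-free data are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares weights from A w = b with A = X^T X and b = X^T y.

    X: (n, d) design matrix whose rows are X_i^T; y: (n,) targets.
    """
    A = X.T @ X
    b = X.T @ y
    # Solve the linear system directly instead of forming inv(A).
    return np.linalg.solve(A, b)

# Hypothetical data: y = 3 + 0.5 x1 - 2 x2, no noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 40.0, size=(50, 2))
X = np.column_stack([np.ones(len(x)), x])  # rows (1, x_i1, x_i2)
y = 3.0 + 0.5 * x[:, 0] - 2.0 * x[:, 1]

w = fit_linear_regression(X, y)
print(w)
```

Forming $X^\top X$ squares the condition number of $X$. When $X$ is badly conditioned, for example with high-degree polynomial features, `np.linalg.lstsq` is usually the safer choice because it works on $X$ directly.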

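LMS Algorithm (Least Mean Squares)

Online algorithm

$$E = \sum_i (X_i^\top w - y_i)^2 = \sum_i E_i, \qquad \frac{\partial E}{\partial w} = \sum_i \frac{\partial E_i}{\partial w}$$

where

$$\frac{\partial E_i}{\partial w} = \frac{\partial}{\partial w}(X_i^\top w - y_i)^2 = 2X_i(X_i^\top w - y_i)$$

Instead of a full gradient step $-\alpha \frac{\partial E}{\partial w}$, take a step on one example at a time. Each step moves $w$ toward the hyperplane $X_i^\top w = y_i$:

$$w_{t+1} = w_t + \alpha X_i (y_i - X_i^\top w_t)$$

(the factor 2 is absorbed into the step size $\alpha$)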
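Below is a sketch of the online LMS update, cycling through the examples in a fixed order. The step size `alpha=0.1`, the epoch count, and the noise-free line are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def lms(X, y, alpha=0.1, epochs=200):
    """Online least mean squares: w <- w + alpha * X_i (y_i - X_i^T w), one example at a time.

    The factor 2 from the gradient of (X_i^T w - y_i)^2 is folded into alpha.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for X_i, y_i in zip(X, y):
            w = w + alpha * X_i * (y_i - X_i @ w)
    return w

# Hypothetical noise-free data on the line y = 1 + 2x.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # rows (1, x_i)
y = 1.0 + 2.0 * x
w = lms(X, y)
print(w)
```

Each update touches only one example, so the method can process a data stream without storing $X$. The tradeoff is that `alpha` must be small enough for the iteration to converge.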

Beyond lines and planes

$$y_i = w_0 + w_1 x_i + w_2 x_i^2$$

is still linear in $w$, so everything is the same with

$$X_i = \begin{pmatrix} 1 \\ x_i \\ x_i^2 \end{pmatrix}$$

[Figure: quadratic fit to one-dimensional data]
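To show that a quadratic model changes only the rows $X_i$ and not the solver, here is a sketch on hypothetical noise-free data. The helper `quadratic_features` is an illustrative name.

```python
import numpy as np

def quadratic_features(x):
    """Map scalar inputs x_i to rows X_i^T = (1, x_i, x_i^2); the model stays linear in w."""
    return np.column_stack([np.ones_like(x), x, x**2])

# Hypothetical noise-free quadratic data.
x = np.linspace(0.0, 20.0, 30)
y = 1.0 + 0.5 * x + 0.1 * x**2
X = quadratic_features(x)
w = np.linalg.solve(X.T @ X, X.T @ y)  # the same normal equations as for a line
print(w)

# Predict at a new point by building its feature row the same way.
x_new = np.array([25.0])
print(quadratic_features(x_new) @ w)  # [76.]
```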

Linear Regression [summary]

Given examples $(x_i, y_i)_{i=1,\dots,n}$

Let $X_i^\top = \begin{pmatrix} f_1(x_i) & f_2(x_i) & \dots & f_d(x_i) \end{pmatrix}$

For example $X_i^\top = \begin{pmatrix} 1 & x_{i,1} & x_{i,2} & x_{i,1}^2 & x_{i,2}^2 & x_{i,1} x_{i,2} \end{pmatrix}$

Let $X = \begin{pmatrix} - X_1^\top - \\ - X_2^\top - \\ \vdots \end{pmatrix}$ ($n \times d$) and $y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix}$

Minimize $\|Xw - y\|_2^2$ by solving $(X^\top X)\, w = X^\top y$

Predict $y_{n+1} = X_{n+1}^\top w$

Probabilistic interpretation

$$y_i \mid x_i \sim \mathcal{N}(X_i^\top w, \sigma^2)$$

[Figure: the observation $y_i$ at input $x_i$, scattered around the prediction $X_i^\top w$]

Likelihood (up to a constant factor that does not depend on $w$):

$$L \propto \prod_i \exp\left(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\sum_i (X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\|Xw - y\|_2^2\right)$$
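Taking the logarithm of the likelihood makes the link to least squares explicit. The Gaussian normalization factor $(2\pi\sigma^2)^{-n/2}$ does not depend on $w$, and $\log$ is increasing. Maximizing $L$ is therefore the same as minimizing the sum squared error:

```latex
\log L = -\frac{1}{2\sigma^2}\,\|Xw - y\|_2^2 + \text{const}
\;\Longrightarrow\;
\hat{w}_{\mathrm{ML}} = \arg\max_w L = \arg\min_w \|Xw - y\|_2^2,
\quad\text{i.e.}\quad (X^\top X)\,\hat{w}_{\mathrm{ML}} = X^\top y
```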

Overfitting

[Figure: Degree 15 polynomial fit to the data]

[Matlab demo]

Ridge Regression (Regularization)

[Figure: Effect of regularization (degree 19)]

Minimize $\|Xw - y\|_2^2 + \epsilon\|w\|_2^2$ with “small” $\epsilon$

Let $A = X^\top X$, $b = X^\top y$

Solve $(A + \epsilon I)\, w = b$
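Below is a sketch of ridge regression that solves $(A + \epsilon I)w = b$. This system is the stationarity condition of $\|Xw - y\|_2^2 + \epsilon\|w\|_2^2$. The setup is hypothetical: noisy samples of a sine curve fit with a degree-9 polynomial, compared against the unregularized fit.

```python
import numpy as np

def ridge(X, y, eps):
    """Minimize ||Xw - y||^2 + eps ||w||^2 by solving (A + eps I) w = b."""
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A + eps * np.eye(A.shape[0]), b)

# Hypothetical data: noisy sine samples, degree-9 polynomial features.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
X = np.vander(x, 10, increasing=True)  # columns 1, x, ..., x^9

w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # unregularized fit (eps = 0)
w_ridge = ridge(X, y, eps=1e-3)
print(np.linalg.norm(w_ls), np.linalg.norm(w_ridge))
```

The regularized weights have a smaller norm and a slightly larger training error. The aim is a smoother curve that does not chase the noise. Choosing $\epsilon$ is left open here, as on the slide.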

Probabilistic interpretation

Likelihood: $y_i \mid x_i \sim \mathcal{N}(X_i^\top w, \sigma^2)$

Prior: $w \sim \mathcal{N}\left(0, \frac{\sigma^2}{\epsilon} I\right)$

Posterior:

$$P(w \mid x_1, \dots, x_n) = \frac{P(w, x_1, \dots, x_n)}{P(x_1, \dots, x_n)} \propto P(w, x_1, \dots, x_n)$$

$$P(w, x_1, \dots, x_n) \propto \exp\left\{-\frac{\epsilon}{2\sigma^2}\|w\|_2^2\right\} \prod_i \exp\left(-\frac{1}{2\sigma^2}(X_i^\top w - y_i)^2\right) = \exp\left(-\frac{1}{2\sigma^2}\left[\epsilon\|w\|_2^2 + \sum_i (X_i^\top w - y_i)^2\right]\right)$$
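Setting the gradient of the negative log-posterior to zero shows that the maximum a posteriori (MAP) weights solve exactly the ridge system $(A + \epsilon I)w = b$, with $A = X^\top X$ and $b = X^\top y$:

```latex
-\log P(w \mid x_1, \dots, x_n) = \frac{1}{2\sigma^2}\Big[\epsilon\|w\|_2^2 + \|Xw - y\|_2^2\Big] + \text{const}
\\
\nabla_w:\quad \frac{1}{\sigma^2}\Big[\epsilon\, w + X^\top (Xw - y)\Big] = 0
\;\Longrightarrow\; (X^\top X + \epsilon I)\, w = X^\top y
```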


Locally Linear Regression

[Figure: Global temperature increase, 1840 to 2020; source: http://www.cru.uea.ac.uk/cru/data/temperature]

Locally Linear Regression

• To predict at $x_{n+1}$
  – Give data point $x_i$ weight $m_i = k(x_{n+1} - x_i)$, e.g. $k(x) = e^{-x^2/(2\sigma^2)}$
  – Let $w = \arg\min_w \sum_{i=1}^n m_i (X_i^\top w - y_i)^2$
  – Let $y_{n+1} = X_{n+1}^\top w$

Locally Linear Regression

To minimize $\sum_{i=1}^n m_i (X_i^\top w - y_i)^2$, solve

$$(X^\top M X)\, w = X^\top M y, \quad \text{where } M = \begin{pmatrix} m_1 & & & \\ & m_2 & & \\ & & m_3 & \\ & & & \ddots \end{pmatrix}$$

Predict $y_{n+1} = X_{n+1}^\top w$

+ Good even at the boundary (more important in high dimension)

– Solve a linear system for each new prediction

– Must choose width of $k$
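Below is a sketch of locally linear regression at a single query point for one-dimensional inputs, with rows $X_i^\top = (1, x_i)$. It assumes a Gaussian kernel of width `sigma`. The function name and data are hypothetical. The sketch applies the diagonal matrix $M$ by broadcasting and never forms it explicitly.

```python
import numpy as np

def locally_linear_predict(x_train, y_train, x_new, sigma):
    """Solve (X^T M X) w = X^T M y with m_i = k(x_new - x_i), then return X_new^T w."""
    X = np.column_stack([np.ones_like(x_train), x_train])   # rows X_i^T = (1, x_i)
    m = np.exp(-(x_new - x_train) ** 2 / (2.0 * sigma**2))  # diagonal entries of M
    XtM = X.T * m                                            # X^T M via broadcasting
    w = np.linalg.solve(XtM @ X, XtM @ y_train)              # one solve per query
    return np.array([1.0, x_new]) @ w

# Hypothetical data on a line; the query x_new = 10 sits at the right boundary.
x_train = np.linspace(0.0, 10.0, 21)
y_train = 1.0 + 2.0 * x_train
print(locally_linear_predict(x_train, y_train, 10.0, sigma=1.0))  # 21.0 up to rounding
```

At the boundary, all the weight comes from points on one side of the query. A weighted average (kernel regression) is pulled toward those smaller values. The local line can follow the trend up to the edge.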

[Figure: Locally Linear Regression, Gaussian kernel, fit to the global temperature data; source: http://www.cru.uea.ac.uk/cru/data/temperature]

[Figure: Locally Linear Regression, Laplacian kernel, fit to the global temperature data; source: http://www.cru.uea.ac.uk/cru/data/temperature]


L1 Regression

Sensitivity to outliers

[Figure: Temperature at noon; an outlier $y_i$ lies far from its prediction $x_i^\top w$]

High weight given to outliers:

$$E = \sum_i (x_i^\top w - y_i)^2 = \sum_i E_i$$

Influence function: $\frac{\partial E_i}{\partial y_i} = 2(y_i - x_i^\top w)$, which grows without bound as $y_i$ moves away from $x_i^\top w$.

[Figure: $E_i$ and its influence function as functions of $y_i$ around $x_i^\top w$]

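L1 Regression

$$E' = \sum_i |x_i^\top w - y_i| = \sum_i E'_i$$

Influence function: $\frac{\partial E'_i}{\partial y_i} = \operatorname{sign}(y_i - x_i^\top w)$. Because this is bounded, a single outlier has limited influence.

[Figure: $E_i$ and $E'_i$ as functions of $y_i$ around $x_i^\top w$, with the influence function of $E'_i$]

Linear program:

$$\min_{w, c} \sum_i c_i \quad \text{s.t.} \quad x_i^\top w - y_i \le c_i \;\; \forall i, \qquad y_i - x_i^\top w \le c_i \;\; \forall i$$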
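Below is a sketch of L1 regression posed as the linear program above, over the variables $z = (w, c)$. It assumes SciPy is installed with a release that provides `scipy.optimize.linprog(..., method="highs")`. The helper `l1_regression` and the outlier data are hypothetical. For comparison, the least-squares fit on the same data is computed with the normal equations.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """minimize sum_i c_i  s.t.  x_i^T w - y_i <= c_i  and  y_i - x_i^T w <= c_i."""
    n, d = X.shape
    I = np.eye(n)
    cost = np.concatenate([np.zeros(d), np.ones(n)])  # only the c_i enter the objective
    A_ub = np.block([[X, -I], [-X, -I]])              # the two families of constraints
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n      # w is free, c_i >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d]

# Hypothetical data: points on y = 1 + 2x with one large outlier.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 100.0
X = np.column_stack([np.ones_like(x), x])

w_l1 = l1_regression(X, y)
w_l2 = np.linalg.solve(X.T @ X, X.T @ y)
print(w_l1, w_l2)
```

In this example the L1 fit recovers the line through the nine clean points. The least-squares slope is pulled far upward by the single outlier.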

Spline Regression

Regression on each interval

[Figure: a separate fit on each interval]

Spline Regression

With equality constraints

[Figure: spline fit with equality constraints]

Spline Regression

With L1 cost

[Figure: spline fit with L1 cost]
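Below is a sketch of a continuous piecewise-linear spline fit. It does not impose equality constraints between per-interval fits. Instead it swaps in a hinge (truncated power) basis $1, x, (x - k)_+$, which gives continuity at each knot $k$ by construction. Ordinary least squares is then run on that basis. The helper names, knots, and data are hypothetical. An L1-cost variant could reuse the linear program from the L1 Regression slide with this design matrix.

```python
import numpy as np

def hinge_basis(x, knots):
    """Columns 1, x, (x - k)_+ per knot: every weight vector gives a continuous piecewise-linear curve."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_linear_spline(x, y, knots):
    X = hinge_basis(x, knots)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares on the basis
    return w

# Hypothetical continuous target with slope changes at x = 3 and x = 6.
knots = [3.0, 6.0]
x = np.linspace(0.0, 10.0, 50)
y = np.where(x < 3, x, np.where(x < 6, 3.0 - 2.0 * (x - 3), -3.0 + 0.5 * (x - 6)))
w = fit_linear_spline(x, y, knots)
y_hat = hinge_basis(x, knots) @ w
print(w)  # intercept, base slope, then the slope change at each knot
```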


To learn more

• The Elements of Statistical Learning, Hastie, Tibshirani, Friedman, Springer