Regression Analysis

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

Regression

To express the relationship between two or more variables by a mathematical formula.

x : predictor (independent) variable

y : response (dependent) variable

Identify how y varies as a function of x.

y is also considered as a random variable.

Real-World Example:

Footwear impressions are commonly observed at crime scenes. While numerous forensic properties can be obtained from these impressions, one in particular is the shoe size. The detectives would like to estimate the height of the impression maker from the shoe size.

The relationship between shoe sizes and heights

Shoe Size vs. Height

[Figure: shoe size vs. height]

What is the predictor?

What is the response?

Can the height be accurately estimated from the shoe size?

If a shoe size is 11, what would you advise the police?

What if the size is 7 or 12.5?


General Regression Model

The systematic part m(x) is deterministic.

The error ε(x) is a random variable.

Measurement Error

Natural Variations

Additive

$$y(x) = m(x) + \varepsilon(x)$$

Example: Sin Function

$$y(x) = A \sin(\omega x + \varphi) + \varepsilon(x)$$
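The split into a deterministic systematic part and an additive random error can be illustrated with simulated data. The sketch below assumes a sine-shaped systematic part and Gaussian errors; the function name and the parameters `amplitude`, `omega`, `phase`, `noise_sd` and `seed` are hypothetical choices for illustration, not values from the lecture.

```python
import math
import random


def noisy_sine(xs, amplitude=1.0, omega=1.0, phase=0.0, noise_sd=0.2, seed=0):
    """Simulate y(x) = m(x) + eps(x) with m(x) = amplitude * sin(omega * x + phase).

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Systematic part: deterministic, the same on every run.
    systematic = [amplitude * math.sin(omega * x + phase) for x in xs]
    # Error part: random, assumed here to be zero-mean Gaussian noise.
    errors = [rng.gauss(0.0, noise_sd) for _ in xs]
    # The error enters additively.
    observed = [m + e for m, e in zip(systematic, errors)]
    return systematic, errors, observed


xs = [i * 0.1 for i in range(100)]
m, eps, y = noisy_sine(xs)
```

Setting `noise_sd=0.0` returns the systematic part unchanged, which makes the role of the error term easy to see.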

Standard Assumptions


A1


A2


A3


Back to Shoes


Simple Linear Regression

$$m(x) = \beta_0 + \beta_1 x$$

Model Parameters


Derivation

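$$R(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

$$\frac{\partial R}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x}$$

$$\frac{\partial R}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\sum_{i=1}^{n} x_i (y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i) = 0$$

$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$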
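The least-squares solution for a straight line has a closed form: the slope is (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²) and the intercept is ȳ − slope · x̄. A minimal sketch follows; the function name `fit_line` and the sample numbers are made up for illustration and are not data from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Made-up, noise-free data lying exactly on y = 2 + 3x.
xs = [7.0, 8.0, 9.5, 10.0, 11.0, 12.5]
ys = [2.0 + 3.0 * x for x in xs]
beta0, beta1 = fit_line(xs, ys)
```

With noise-free data the fit recovers the generating intercept and slope up to floating-point rounding; with real measurements the two estimates depend on the sample.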

Standard Deviations

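$$\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \varepsilon_i^2$$

$$\sigma_{\beta_0} = \sigma \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$

$$\sigma_{\beta_1} = \sigma \left( \frac{1}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \right)^{1/2}$$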
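The standard deviations of the two estimates can be computed from the residuals. The sketch below assumes the usual textbook formulas: the error variance is estimated as the residual sum of squares divided by n − 2, the slope has standard deviation σ / sqrt(Σ x_i² − n x̄²), and the intercept has standard deviation σ · sqrt(1/n + x̄² / (Σ x_i² − n x̄²)). The function name and the sample data are hypothetical.

```python
import math


def line_standard_errors(xs, ys):
    """Return (sigma, sd_beta0, sd_beta1) for a least-squares line fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the error variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residuals of the fitted line.
    residuals = [y - beta0 - beta1 * x for x, y in zip(xs, ys)]
    # Two parameters were estimated, hence the n - 2 divisor.
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    sd_beta0 = sigma * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    sd_beta1 = sigma * math.sqrt(1.0 / sxx)
    return sigma, sd_beta0, sd_beta1


# Made-up data for illustration only.
sigma, sd_beta0, sd_beta1 = line_standard_errors([0.0, 1.0, 2.0, 3.0],
                                                 [0.0, 1.0, 1.0, 2.0])
```

A wider spread of x values makes the denominator larger and both standard deviations smaller, which is one reason to sample the predictor over a broad range.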

Polynomial Terms

Modeling the data as a line is not always adequate.

Polynomial Regression

$$m(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p = \sum_{k=0}^{p} \beta_k x^k$$

This is still a linear model!

m(x) is a linear combination of the coefficients β.

Danger of Overfitting

Matrix Representation

$$y_i = \sum_{k=0}^{p} \beta_k x_i^k + \varepsilon_i$$

$$Y = X\beta + \varepsilon$$

$$R(\beta) = (Y - X\beta)^T (Y - X\beta) = Y^T Y - Y^T X \beta - \beta^T X^T Y + \beta^T X^T X \beta$$

$$\frac{\partial R}{\partial \beta} = -2 X^T Y + 2 X^T X \beta = 0 \;\Rightarrow\; X^T X \beta = X^T Y$$

$$\beta = (X^T X)^{-1} X^T Y$$
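In matrix form, a polynomial fit reduces to solving the normal equations XᵀX β = XᵀY, where row i of X holds the powers 1, x_i, ..., x_i^p. The sketch below uses only the Python standard library and solves the system by Gaussian elimination instead of forming the inverse explicitly. The helper names and the sample data are hypothetical; in practice a numerical library would normally be used.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [beta_0, ..., beta_degree] solving X^T X beta = X^T Y."""
    cols = degree + 1
    design = [[x ** k for k in range(cols)] for x in xs]  # rows of X
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(cols)]
           for i in range(cols)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(cols)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data from y = 1 - 2x + 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 - 2.0 * x + 0.5 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
```

For high degrees XᵀX becomes badly conditioned, so this direct approach is only a teaching sketch.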

Model Comparison

Total Sum of Squares:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Error Sum of Squares:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

R²

$$R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$$

$$R^2_{adj} = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)}$$
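Both measures follow directly from the two sums of squares: R² = 1 − SSE/SST, and the adjusted version divides SSE by n − (p + 1) and SST by n − 1 before taking the ratio. A minimal sketch follows; the function name, the argument `num_predictors` (standing for p) and the sample numbers are hypothetical.

```python
def r_squared(ys, predictions, num_predictors):
    """Return (R^2, adjusted R^2) for observed ys and model predictions."""
    n = len(ys)
    y_bar = sum(ys) / n
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total
    sse = sum((y - yhat) ** 2 for y, yhat in zip(ys, predictions))  # error
    if sst == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    dof = n - (num_predictors + 1)
    if dof <= 0:
        raise ValueError("not enough data points for this many predictors")
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / dof) / (sst / (n - 1))
    return r2, r2_adj


# Made-up observations and predictions from a one-predictor model.
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, r2_adj = r_squared(ys, predictions, num_predictors=1)
```

Adding polynomial terms never lowers R² on the training data, whereas the adjusted value penalizes each extra coefficient, which makes it more useful for comparing models of different size.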

Example

[Figure: Y versus X (X from 0 to 5, Y from -5 to 30), showing the data and two fitted curves]

Data: Y = X² + N(0,1)

Linear fit: Y = -3.6029 + 4.8802X, R² = 0.9131

Quadratic fit: Y = 0.7341 - 0.4303X + 1.0621X², R² = 0.9880

Summary

Regression is one of the oldest data mining techniques.

Probably the first thing that you want to try on a new data set.

No need to do programming! Matlab, Excel …

Quality of Regression

R²

Residual Plot

Cross Validation

What you should learn after class:

Confidence Interval

Multiple Regression

Nonlinear Regression
