Chapter 8. Simple Linear Regression
History:
● The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

Regression analysis:
● Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
● When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.
● When it is not clear which variable represents the response and which a predictor, correlation analysis is used to study the strength of the relationship.
Section 8.1. Simple Linear Regression
Model Assumption

Specific settings of the predictor variable (x) and corresponding values of the response variable (Y):

(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)

Assume y_i is the observed value of the random variable Y_i.

Simple Linear Regression:

Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n,

where the random errors \epsilon_i are iid N(0, \sigma^2).

Response: Y_i \sim N(\mu_i, \sigma^2), independent, with

E(Y_i) = \mu_i = \beta_0 + \beta_1 x_i, \quad Var(Y_i) = \sigma^2.
Example 1. (Tires Tread Wear vs. Mileage: Scatter Plot)
Least Squares Criterion (residual sum of squares):

Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2
Least Squares Estimate

The "best"-fitting straight line, in the sense of minimizing Q, gives the least squares (LS) estimate. One way to find the LS estimates \hat{\beta}_0 and \hat{\beta}_1 is via the partial derivatives

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]

\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)]

Setting these partial derivatives equal to zero and simplifying, we get the normal equations:

n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
Solving the normal equations, we get the least squares estimators of \beta_0 and \beta_1:

\hat{\beta}_0 = \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i^2\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
Maximum Likelihood Estimators (MLE) of \beta_0 and \beta_1

Under the normality assumption on the errors, i.e. given Y_i \sim ind N(\beta_0 + \beta_1 x_i, \sigma^2), the likelihood function of the parameters \beta_0 and \beta_1 given the (observed) responses is

L(\beta_0, \beta_1; y_1, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)
= \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left[-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right]
= (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\right]

Maximizing L over (\beta_0, \beta_1) is equivalent to minimizing \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2, so the MLEs coincide with the LSEs:

(\hat{\beta}_0, \hat{\beta}_1)_{MLE} = \arg\max_{\beta_0, \beta_1} L = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = (\hat{\beta}_0, \hat{\beta}_1)_{LSE}
To simplify expressions for the LS solution, we introduce

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2

S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2

Coefficient LSE: \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

We get the equation \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, known as the least squares line, which is an estimate of the true regression line.
Example 2 (Tire Tread Wear vs. Mileage: Least Squares Fit)

Find the equation of the LS line for the tire tread wear data. We have

\sum x_i = 144, \quad \sum y_i = 2197.32, \quad \sum x_i^2 = 3264, \quad \sum y_i^2 = 589{,}887.08, \quad \sum x_i y_i = 28{,}167.72

and n = 9, so \bar{x} = 16, \bar{y} = 244.15. From these we calculate

S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 28{,}167.72 - \frac{1}{9}(144 \times 2197.32) = -6989.40

S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 3264 - \frac{1}{9}(144)^2 = 960

The slope and intercept estimates are

\hat{\beta}_1 = \frac{-6989.40}{960} = -7.281 \quad \text{and} \quad \hat{\beta}_0 = 244.15 + 7.281 \times 16 = 360.64

Therefore, the equation of the LS line is

\hat{y} = 360.64 - 7.281 x.

Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.

Given a particular x, say x = 25, we find \hat{y} = 360.64 - 7.281 \times 25 = 178.62 mils, which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
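The computation above can be sketched in a few lines of Python (a minimal illustration using only the standard library; the nine (mileage, groove depth) pairs are the tire data tabulated later in these slides):

```python
# Least squares fit of groove depth (mils) on mileage (in 1000s of miles)
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sxy and Sxx via the computational shortcut formulas
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n

b1 = Sxy / Sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate

print(f"LS line: yhat = {b0:.2f} + ({b1:.3f}) x")
```

This reproduces \hat{\beta}_1 = -7.281 and \hat{\beta}_0 = 360.64 from Example 2.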
Goodness of Fit of the LS Line

Coefficient of Determination and Correlation

Fitted line: \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i

The residuals e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), i = 1, \ldots, n, are used to evaluate the goodness of fit of the LS line.

Decomposition of sums of squares: SST = SSR + SSE, where

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 (sum of squares total)

SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 (sum of squares due to regression)

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 (sum of squares of errors)
Coefficient of Determination

Define:

R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \quad 0 \le R^2 \le 1.

The ratio R^2 is called the coefficient of determination. It gives the proportion of the variation in the response variable (Y) that is accounted for by linear regression on the explanatory variable (x).

It can be shown that R^2 is the square of the sample linear correlation coefficient r, i.e.

R^2 = r^2 = \left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}\right]^2 = \left[\frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\right]^2
Example 3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)

For the tire tread wear data, calculate R^2 using the results from Example 2:

SST = S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 589{,}887.08 - \frac{1}{9}(2197.32)^2 = 53{,}418.73

With SSE = 2531.53, calculate

SSR = SST - SSE = 53{,}418.73 - 2531.53 = 50{,}887.20

Therefore

R^2 = \frac{50{,}887.20}{53{,}418.73} = 0.953 \quad \text{and} \quad r = -\sqrt{0.953} = -0.976,

where the sign of r follows from the sign of \hat{\beta}_1 = -7.281. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
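The same quantities can be checked numerically (a sketch on the tire data; SSE is computed directly from the residuals of the LS line rather than taken as given):

```python
# Goodness of fit for the tire data: SST, SSE, SSR, R^2, and r
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n

b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Residual sum of squares from the fitted line
sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))

sst = Syy
ssr = sst - sse
r2 = ssr / sst                    # coefficient of determination
r = Sxy / (Sxx * Syy) ** 0.5      # sample correlation coefficient

print(f"SST={sst:.2f}  SSE={sse:.2f}  SSR={ssr:.2f}  R^2={r2:.3f}  r={r:.3f}")
```

Note that r carries the sign of S_{xy} (and hence of \hat{\beta}_1) automatically.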
Estimation of \sigma^2

An unbiased estimate of \sigma^2 is given by

s^2 = MSE = \frac{SSE}{n-2}

Example 4 (Tire Tread Wear vs. Mileage: Estimate of \sigma^2)

Find the estimate of \sigma^2 for the tread wear data. Using the results from Example 3, we have SSE = 2531.53 and n - 2 = 7, therefore

\hat{\sigma}^2 = MSE = \frac{2531.53}{7} = 361.65,

which has d.f. = 7. The estimate of \sigma is

\hat{\sigma} = \sqrt{MSE} = \sqrt{361.65} = 19.02.
Statistical Inference on \beta_0 and \beta_1

Point estimators:

\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \quad \hat{\sigma}^2 = MSE, the mean square error.

Sampling distributions of \hat{\beta}_0 and \hat{\beta}_1:

\hat{\beta}_0 \sim N\left(\beta_0, \; \sigma^2 \frac{\sum x_i^2}{n S_{xx}}\right), \quad SE(\hat{\beta}_0) = \sqrt{MSE \cdot \frac{\sum x_i^2}{n S_{xx}}}

\hat{\beta}_1 \sim N\left(\beta_1, \; \frac{\sigma^2}{S_{xx}}\right), \quad SE(\hat{\beta}_1) = \sqrt{\frac{MSE}{S_{xx}}}
Statistical Inference on \beta_0 and \beta_1, Con't

Sampling distributions (parameter-free):

\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t_{n-2}, \quad \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-2}

Confidence intervals for \beta_0 and \beta_1:

\hat{\beta}_0 \pm t_{n-2,\,\alpha/2}\, SE(\hat{\beta}_0), \quad \hat{\beta}_1 \pm t_{n-2,\,\alpha/2}\, SE(\hat{\beta}_1)
Testing Hypotheses on \beta_0 and \beta_1

Hypothesis test: H_0: \beta_1 = \beta_1^0 vs. H_a: \beta_1 \ne \beta_1^0, or in particular H_0: \beta_1 = 0 vs. H_a: \beta_1 \ne 0.

-- Test statistic: T = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)}

-- At significance level \alpha, we reject H_0 in favor of H_a iff |T| > t_{n-2,\,\alpha/2}.

-- The test of H_0: \beta_1 = 0 can be used to show whether there is a linear relationship between x and Y.
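As an illustration on the tire data (a sketch; the critical value t_{7, 0.025} = 2.365 is quoted from standard t tables):

```python
# Test H0: beta1 = 0 for the tire data using T = b1 / SE(b1)
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n

sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))
mse = sse / (n - 2)             # estimate of sigma^2
se_b1 = (mse / Sxx) ** 0.5      # standard error of the slope

T = b1 / se_b1
t_crit = 2.365                  # t_{7, 0.025} from a t table

print(f"T = {T:.2f}; reject H0 at alpha = 0.05: {abs(T) > t_crit}")
```

Here |T| far exceeds the cutoff, and T^2 = 140.7 matches the F statistic in the ANOVA table below.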
Analysis of Variance (ANOVA), Con't

Mean square: a sum of squares divided by its d.f.

MSR = \frac{SSR}{1}, \quad MSE = \frac{SSE}{n-2}

F = \frac{MSR}{MSE} = \frac{SSR}{MSE} = \frac{\hat{\beta}_1^2 S_{xx}}{MSE} = \left[\frac{\hat{\beta}_1}{\sqrt{MSE/S_{xx}}}\right]^2 = \left[\frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\right]^2 = T^2 \sim F_{1,\,n-2} under H_0: \beta_1 = 0.
Analysis of Variance (ANOVA)

ANOVA Table:

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (d.f.)   Mean Square (MS)      F
Regression            SSR                   1                           MSR = SSR/1           F = MSR/MSE
Error                 SSE                   n - 2                       MSE = SSE/(n - 2)
Total                 SST                   n - 1

Example (tire tread wear data):

Source        SS          d.f.   MS          F
Regression    50,887.20   1      50,887.20   140.71
Error         2,531.53    7      361.65
Total         53,418.73   8
Regression Diagnostics

Checking the model assumptions:
● Checking for linearity
● Checking for constant variance
● Checking for normality
● Checking for independence
Checking for Linearity

Model Y = \beta_0 + \beta_1 x, with x_i = mileage, y_i = groove depth, \hat{y}_i = fitted value, e_i = residual.

i    x_i   y_i      \hat{y}_i   e_i
1    0     394.33   360.64      33.69
2    4     329.50   331.51      -2.01
3    8     291.00   302.39      -11.39
4    12    255.17   273.27      -18.10
5    16    229.33   244.15      -14.82
6    20    204.83   215.02      -10.19
7    24    179.00   185.90      -6.90
8    28    163.83   156.78      7.05
9    32    150.33   127.66      22.67

[Figure: scatterplot of e_i vs. x_i]
Checking for Normality

[Figure: normal probability plot of the residuals. Mean 3.95E-16, StDev 17.79, N 9, AD 0.514, p-value 0.138]
Checking for Constant Variance

[Figures: plots of residuals vs. fitted values, one where Var(Y) is not constant and a sample residual plot where Var(Y) is constant]
Checking for Independence

● Does not generally apply for the simple linear regression model.
● Applies only to time-series data.
Outliers and Influential Points

Checking for Outliers & Influential Observations

● What is an OUTLIER? Why is checking for outliers important? Mathematical definition.
● How to deal with them?
  Investigate (Data errors? Rare events? Can they be corrected?)
  Ways to accommodate outliers:
  1. Nonparametric methods (robust to outliers)
  2. Data transformations
  3. Deletion (or report model results both with and without the outliers or influential observations to see how much they change)
Data Transformations

Reasons:
● To achieve linearity
● To achieve homogeneity of variance
● To achieve normality or symmetry about the regression equation

Method of linearizing transformation:
● Use a mathematical operation, e.g. square root, power, log, exponential, etc.
● Only one variable needs to be transformed in simple linear regression. Which one? Predictor or response? Why?
Exponential transformation on Y: fit y = \gamma_0 e^{\gamma_1 x} via \log y = \log \gamma_0 + \gamma_1 x.

x_i   y_i      \log\hat{y}_i   \hat{y}_i = \exp(\log\hat{y}_i)   e_i
0     394.33   5.926           374.64                            19.69
4     329.50   5.807           332.58                            -3.08
8     291.00   5.688           295.24                            -4.24
12    255.17   5.569           262.09                            -6.92
16    229.33   5.450           232.67                            -3.34
20    204.83   5.331           206.54                            -1.71
24    179.00   5.211           183.36                            -4.36
28    163.83   5.092           162.77                            1.06
32    150.33   4.973           144.50                            5.83

[Figure: plot of residuals vs. x_i for the original fit and for the exponential fit]

[Figure: normal probability plots of e_i (original: Mean 3.95E-16, StDev 17.79, N 9, AD 0.514, P 0.138) and e_i with transformation (Mean 0.3256, StDev 8.142, N 9, AD 0.912, P 0.011)]
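A sketch of this transformed fit (assuming natural logs: regress \log y on x, then back-transform; this reproduces the fitted column of the table above):

```python
import math

# Exponential fit: regress log(y) on x, then back-transform
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
logy = [math.log(yi) for yi in y]

Sxy = sum(xi * li for xi, li in zip(x, logy)) - sum(x) * sum(logy) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
g1 = Sxy / Sxx                        # slope on the log scale
g0 = sum(logy) / n - g1 * sum(x) / n  # intercept = log(gamma_0)

fitted = [math.exp(g0 + g1 * xi) for xi in x]   # back-transformed fit
resid = [yi - fi for yi, fi in zip(y, fitted)]  # residuals on original scale

print(f"log-scale line: log(yhat) = {g0:.3f} + ({g1:.4f}) x")
```

The back-transformed residuals are smaller and flatter than those of the straight-line fit, which is the point of the transformation here.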
Residual Checks

Model checking by residual plots:
1. Residuals vs. fitted values (e_i vs. \hat{y}_i)
2. Residuals vs. explanatory variable (e_i vs. x_i)
3. Residuals vs. lagged residuals (e_i vs. e_{i-1})

Transformations:
1. Box-Cox transformation for skewed distributions
2. Log transformation
3. Square root transformation

Correlation of residuals:
1. Correlation coefficient
2. Durbin-Watson statistic (series)
Durbin-Watson Statistic

Lag 1 autocorrelation coefficient, estimating Corr(y_i, y_{i-1}) from the residuals:

r_1 = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}, \quad r_1 \sim N\left(0, \frac{1}{n}\right) approximately, under independence.

Lag k autocorrelation coefficient: Corr(y_i, y_{i-k}).

Durbin-Watson statistic:

D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - r_1)
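A quick sketch computing D for the tire tread wear residuals (the regression quantities are recomputed so the block is self-contained):

```python
# Durbin-Watson statistic for the tire tread wear residuals
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals, in x order

# D = sum of squared successive differences / sum of squared residuals
D = sum((e[t] - e[t - 1])**2 for t in range(1, n)) / sum(et**2 for et in e)

print(f"Durbin-Watson D = {D:.3f}")  # values near 2 suggest no lag-1 autocorrelation
```

Here D is about 0.75, well below 2, consistent with the positive lag-1 pattern (runs of same-signed residuals) visible in the residual plot.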
Correlation Analysis

Correlation: a measure of how closely two variables share a linear relationship.

corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}

Useful when it is not possible to determine which variable is the predictor and which is the response. Health vs. wealth: which is the predictor? Which is the response?
Linear Correlation Coefficient
Derivation of T

Are these equivalent?

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \overset{?}{=} \quad T = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}}

Substitute:

r = \hat{\beta}_1 \frac{s_x}{s_y} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{SST}}, \quad 1 - r^2 = \frac{SSE}{SST} = \frac{(n-2)s^2}{SST}

Then:

t = \frac{\hat{\beta}_1 \sqrt{S_{xx}/SST}\,\sqrt{n-2}}{\sqrt{(n-2)s^2/SST}} = \frac{\hat{\beta}_1 \sqrt{S_{xx}}}{s} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}

Yes, they are equivalent. Therefore, we can use

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}

as a statistic for testing the null hypothesis H_0: \beta_1 = 0, or equivalently for testing H_0: \rho = 0.
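A numeric check of this equivalence on the tire data (a sketch; both statistics are computed from scratch and should agree to floating-point precision):

```python
# Verify t = r*sqrt(n-2)/sqrt(1-r^2) equals T = b1/SE(b1) on the tire data
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n

b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))

# Slope-based statistic: T = b1 / sqrt(MSE / Sxx)
T = b1 / (sse / (n - 2) / Sxx) ** 0.5

# Correlation-based statistic: t = r*sqrt(n-2) / sqrt(1-r^2)
r = Sxy / (Sxx * Syy) ** 0.5
t = r * (n - 2) ** 0.5 / (1 - r * r) ** 0.5

print(f"T = {T:.4f}, t = {t:.4f}")
```

Both come out to about -11.86, the value of the slope test statistic found earlier.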