Chapter 8. Simple Linear Regression
History:
● The earliest form of linear regression was the method of least squares, published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the turn of the 20th century.

Regression analysis:
● Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.
● When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.
● When it is not clear which variable represents the response and which a predictor, correlation analysis is used to study the strength of the relationship.
Section 8.1. Simple Linear Regression
Model Assumption

Specific settings of the predictor variable (x) and corresponding values of the response variable (Y):

(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)

Assume y_i is the observed value of the random variable Y_i.

Simple Linear Regression:

Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n,

where the random errors \epsilon_i are iid N(0, \sigma^2).

Response: Y_i \sim N(\mu_i, \sigma^2), independent, with

E(Y_i) = \mu_i = \beta_0 + \beta_1 x_i, \quad Var(Y_i) = \sigma^2.
Example 1. (Tires Tread Wear vs. Mileage: Scatter Plot)
Least Squares Criterion (residual sum of squares):

Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2
Least Squares Estimate

The "best"-fitting straight line, in the sense of minimizing Q, gives the least squares (LS) estimate. One way to find the LS estimates \hat{\beta}_0 and \hat{\beta}_1 is via the partial derivatives

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]

\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)]

Setting these partial derivatives equal to zero and simplifying, we get the normal equations:

n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
Solving the normal equations, we get the least squares estimators of \beta_0 and \beta_1:

\hat{\beta}_0 = \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i^2\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}

\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
Maximum Likelihood Estimators (MLE) of \beta_0 and \beta_1

Under the normality assumption on the errors, i.e. given Y_i \sim ind N(\beta_0 + \beta_1 x_i, \sigma^2), the likelihood function of the parameters \beta_0 and \beta_1 given the (observed) responses is

L(\beta_0, \beta_1; y_1, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)
= \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left[-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right]
= (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\right]

Maximizing L over (\beta_0, \beta_1) is equivalent to minimizing \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2, so the MLEs coincide with the LSEs:

(\hat{\beta}_0, \hat{\beta}_1)_{MLE} = \arg\max_{\beta_0, \beta_1} L = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = (\hat{\beta}_0, \hat{\beta}_1)_{LSE}
To simplify expressions for the LS solution, we introduce

S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)

S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2

S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2

Coefficient LSE: \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

We get the equation \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, known as the least squares line, which is an estimate of the true regression line.
Example 2 (Tire Tread Wear vs. Mileage: Least Squares Fit)

Find the equation of the LS line for the tire tread wear data. We have

\sum x_i = 144, \quad \sum y_i = 2197.32, \quad \sum x_i^2 = 3264, \quad \sum y_i^2 = 589{,}887.08, \quad \sum x_i y_i = 28{,}167.72

and n = 9, so \bar{x} = 16, \bar{y} = 244.15. From these we calculate

S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 28{,}167.72 - \frac{1}{9}(144 \times 2197.32) = -6989.40

S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 3264 - \frac{1}{9}(144)^2 = 960

The slope and intercept estimates are

\hat{\beta}_1 = \frac{-6989.40}{960} = -7.281 \quad \text{and} \quad \hat{\beta}_0 = 244.15 + 7.281 \times 16 = 360.64

Therefore, the equation of the LS line is

\hat{y} = 360.64 - 7.281 x.

Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.

Given a particular x, say x = 25, we find \hat{y} = 360.64 - 7.281 \times 25 = 178.62 mils, which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
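The computation above can be sketched in a few lines of Python (a minimal illustration using only the standard library; the nine (mileage, groove depth) pairs are the tire data tabulated later in these slides):

```python
# Least squares fit of groove depth (mils) on mileage (in 1000s of miles)
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sxy and Sxx via the computational shortcut formulas
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n

b1 = Sxy / Sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate

print(f"LS line: yhat = {b0:.2f} + ({b1:.3f}) x")
```

This reproduces \hat{\beta}_1 = -7.281 and \hat{\beta}_0 = 360.64 from Example 2.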
Goodness of Fit of the LS Line

Coefficient of Determination and Correlation

Fitted line: \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i

The residuals e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), i = 1, \ldots, n, are used to evaluate the goodness of fit of the LS line.

Decomposition of sums of squares: SST = SSR + SSE, where

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 (sum of squares total)

SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 (sum of squares due to regression)

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 (sum of squares of errors)
Coefficient of Determination

Define:

R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \quad 0 \le R^2 \le 1.

The ratio R^2 is called the coefficient of determination. It gives the proportion of the variation in the response variable (Y) that is accounted for by linear regression on the explanatory variable (x).

It can be shown that R^2 is the square of the sample linear correlation coefficient r, i.e.

R^2 = r^2 = \left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}\right]^2 = \left[\frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}\right]^2
Example 3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)

For the tire tread wear data, calculate R^2 using the results from Example 2:

SST = S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 589{,}887.08 - \frac{1}{9}(2197.32)^2 = 53{,}418.73

With SSE = 2531.53, calculate

SSR = SST - SSE = 53{,}418.73 - 2531.53 = 50{,}887.20

Therefore

R^2 = \frac{50{,}887.20}{53{,}418.73} = 0.953 \quad \text{and} \quad r = -\sqrt{0.953} = -0.976,

where the sign of r follows from the sign of \hat{\beta}_1 = -7.281. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
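The same quantities can be checked numerically (a sketch on the tire data; SSE is computed directly from the residuals of the LS line rather than taken as given):

```python
# Goodness of fit for the tire data: SST, SSE, SSR, R^2, and r
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n

b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n

# Residual sum of squares from the fitted line
sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))

sst = Syy
ssr = sst - sse
r2 = ssr / sst                    # coefficient of determination
r = Sxy / (Sxx * Syy) ** 0.5      # sample correlation coefficient

print(f"SST={sst:.2f}  SSE={sse:.2f}  SSR={ssr:.2f}  R^2={r2:.3f}  r={r:.3f}")
```

Note that r carries the sign of S_{xy} (and hence of \hat{\beta}_1) automatically.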
Estimation of \sigma^2

An unbiased estimate of \sigma^2 is given by

s^2 = MSE = \frac{SSE}{n-2}

Example 4 (Tire Tread Wear vs. Mileage: Estimate of \sigma^2)

Find the estimate of \sigma^2 for the tread wear data. Using the results from Example 3, we have SSE = 2531.53 and n - 2 = 7, therefore

\hat{\sigma}^2 = MSE = \frac{2531.53}{7} = 361.65,

which has d.f. = 7. The estimate of \sigma is

\hat{\sigma} = \sqrt{MSE} = \sqrt{361.65} = 19.02.
Statistical Inference on \beta_0 and \beta_1

Point estimators:

\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \quad \hat{\sigma}^2 = MSE, the mean square error.

Sampling distributions of \hat{\beta}_0 and \hat{\beta}_1:

\hat{\beta}_0 \sim N\left(\beta_0, \; \sigma^2 \frac{\sum x_i^2}{n S_{xx}}\right), \quad SE(\hat{\beta}_0) = \sqrt{MSE \cdot \frac{\sum x_i^2}{n S_{xx}}}

\hat{\beta}_1 \sim N\left(\beta_1, \; \frac{\sigma^2}{S_{xx}}\right), \quad SE(\hat{\beta}_1) = \sqrt{\frac{MSE}{S_{xx}}}
Statistical Inference on \beta_0 and \beta_1, Con't

Sampling distributions (parameter-free):

\frac{\hat{\beta}_0 - \beta_0}{SE(\hat{\beta}_0)} \sim t_{n-2}, \quad \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \sim t_{n-2}

Confidence intervals for \beta_0 and \beta_1:

\hat{\beta}_0 \pm t_{n-2,\,\alpha/2}\, SE(\hat{\beta}_0), \quad \hat{\beta}_1 \pm t_{n-2,\,\alpha/2}\, SE(\hat{\beta}_1)
Testing Hypotheses on \beta_0 and \beta_1

Hypothesis test: H_0: \beta_1 = \beta_1^0 vs. H_a: \beta_1 \ne \beta_1^0, or in particular H_0: \beta_1 = 0 vs. H_a: \beta_1 \ne 0.

-- Test statistic: T = \frac{\hat{\beta}_1 - \beta_1^0}{SE(\hat{\beta}_1)}

-- At significance level \alpha, we reject H_0 in favor of H_a iff |T| > t_{n-2,\,\alpha/2}.

-- The test of H_0: \beta_1 = 0 can be used to show whether there is a linear relationship between x and Y.
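As an illustration on the tire data (a sketch; the critical value t_{7, 0.025} = 2.365 is quoted from standard t tables):

```python
# Test H0: beta1 = 0 for the tire data using T = b1 / SE(b1)
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n

sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))
mse = sse / (n - 2)             # estimate of sigma^2
se_b1 = (mse / Sxx) ** 0.5      # standard error of the slope

T = b1 / se_b1
t_crit = 2.365                  # t_{7, 0.025} from a t table

print(f"T = {T:.2f}; reject H0 at alpha = 0.05: {abs(T) > t_crit}")
```

Here |T| far exceeds the cutoff, and T^2 = 140.7 matches the F statistic in the ANOVA table below.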
Analysis of Variance (ANOVA), Con't

Mean square: a sum of squares divided by its d.f.

MSR = \frac{SSR}{1}, \quad MSE = \frac{SSE}{n-2}

F = \frac{MSR}{MSE} = \frac{SSR}{MSE} = \frac{\hat{\beta}_1^2 S_{xx}}{MSE} = \left[\frac{\hat{\beta}_1}{\sqrt{MSE/S_{xx}}}\right]^2 = \left[\frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\right]^2 = T^2 \sim F_{1,\,n-2} under H_0: \beta_1 = 0.
Analysis of Variance (ANOVA)

ANOVA Table:

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (d.f.)   Mean Square (MS)      F
Regression            SSR                   1                           MSR = SSR/1           F = MSR/MSE
Error                 SSE                   n - 2                       MSE = SSE/(n - 2)
Total                 SST                   n - 1

Example (tire tread wear data):

Source        SS          d.f.   MS          F
Regression    50,887.20   1      50,887.20   140.71
Error         2,531.53    7      361.65
Total         53,418.73   8
Regression Diagnostics

Checking the model assumptions:
● Checking for linearity
● Checking for constant variance
● Checking for normality
● Checking for independence
Checking for Linearity

Model Y = \beta_0 + \beta_1 x, with x_i = mileage, y_i = groove depth, \hat{y}_i = fitted value, e_i = residual.

i    x_i   y_i      \hat{y}_i   e_i
1    0     394.33   360.64      33.69
2    4     329.50   331.51      -2.01
3    8     291.00   302.39      -11.39
4    12    255.17   273.27      -18.10
5    16    229.33   244.15      -14.82
6    20    204.83   215.02      -10.19
7    24    179.00   185.90      -6.90
8    28    163.83   156.78      7.05
9    32    150.33   127.66      22.67

[Figure: scatterplot of e_i vs. x_i]
Checking for Normality

[Figure: normal probability plot of the residuals. Mean 3.95E-16, StDev 17.79, N 9, AD 0.514, p-value 0.138]
Checking for Constant Variance

[Figures: plots of residuals vs. fitted values, one where Var(Y) is not constant and a sample residual plot where Var(Y) is constant]
Checking for Independence

● Does not generally apply for the simple linear regression model.
● Applies only to time-series data.
Outliers and Influential Points

Checking for Outliers & Influential Observations

● What is an OUTLIER? Why is checking for outliers important? Mathematical definition.
● How to deal with them?
  Investigate (Data errors? Rare events? Can they be corrected?)
  Ways to accommodate outliers:
  1. Nonparametric methods (robust to outliers)
  2. Data transformations
  3. Deletion (or report model results both with and without the outliers or influential observations to see how much they change)
Data Transformations

Reasons:
● To achieve linearity
● To achieve homogeneity of variance
● To achieve normality or symmetry about the regression equation

Method of linearizing transformation:
● Use a mathematical operation, e.g. square root, power, log, exponential, etc.
● Only one variable needs to be transformed in simple linear regression. Which one? Predictor or response? Why?
Exponential transformation on Y: fit y = \gamma_0 e^{\gamma_1 x} via \log y = \log \gamma_0 + \gamma_1 x.

x_i   y_i      \log\hat{y}_i   \hat{y}_i = \exp(\log\hat{y}_i)   e_i
0     394.33   5.926           374.64                            19.69
4     329.50   5.807           332.58                            -3.08
8     291.00   5.688           295.24                            -4.24
12    255.17   5.569           262.09                            -6.92
16    229.33   5.450           232.67                            -3.34
20    204.83   5.331           206.54                            -1.71
24    179.00   5.211           183.36                            -4.36
28    163.83   5.092           162.77                            1.06
32    150.33   4.973           144.50                            5.83

[Figure: plot of residuals vs. x_i for the original fit and for the exponential fit]

[Figure: normal probability plots of e_i (original: Mean 3.95E-16, StDev 17.79, N 9, AD 0.514, P 0.138) and e_i with transformation (Mean 0.3256, StDev 8.142, N 9, AD 0.912, P 0.011)]
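A sketch of this transformed fit (assuming natural logs: regress \log y on x, then back-transform; this reproduces the fitted column of the table above):

```python
import math

# Exponential fit: regress log(y) on x, then back-transform
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)
logy = [math.log(yi) for yi in y]

Sxy = sum(xi * li for xi, li in zip(x, logy)) - sum(x) * sum(logy) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
g1 = Sxy / Sxx                        # slope on the log scale
g0 = sum(logy) / n - g1 * sum(x) / n  # intercept = log(gamma_0)

fitted = [math.exp(g0 + g1 * xi) for xi in x]   # back-transformed fit
resid = [yi - fi for yi, fi in zip(y, fitted)]  # residuals on original scale

print(f"log-scale line: log(yhat) = {g0:.3f} + ({g1:.4f}) x")
```

The back-transformed residuals are smaller and flatter than those of the straight-line fit, which is the point of the transformation here.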
Residual Checks

Model checking by residual plots:
1. Residuals vs. fitted values (e_i vs. \hat{y}_i)
2. Residuals vs. explanatory variable (e_i vs. x_i)
3. Residuals vs. lagged residuals (e_i vs. e_{i-1})

Transformations:
1. Box-Cox transformation for skewed distributions
2. Log transformation
3. Square root transformation

Correlation of residuals:
1. Correlation coefficient
2. Durbin-Watson statistic (series)
Durbin-Watson Statistic

Lag 1 autocorrelation coefficient, estimating Corr(y_i, y_{i-1}) from the residuals:

r_1 = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}, \quad r_1 \sim N\left(0, \frac{1}{n}\right) approximately, under independence.

Lag k autocorrelation coefficient: Corr(y_i, y_{i-k}).

Durbin-Watson statistic:

D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - r_1)
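A quick sketch computing D for the tire tread wear residuals (the regression quantities are recomputed so the block is self-contained):

```python
# Durbin-Watson statistic for the tire tread wear residuals
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # residuals, in x order

# D = sum of squared successive differences / sum of squared residuals
D = sum((e[t] - e[t - 1])**2 for t in range(1, n)) / sum(et**2 for et in e)

print(f"Durbin-Watson D = {D:.3f}")  # values near 2 suggest no lag-1 autocorrelation
```

Here D is about 0.75, well below 2, consistent with the positive lag-1 pattern (runs of same-signed residuals) visible in the residual plot.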
Correlation Analysis

Correlation: a measure of how closely two variables share a linear relationship.

corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}

Useful when it is not possible to determine which variable is the predictor and which is the response. Health vs. wealth: which is the predictor? Which is the response?
Linear Correlation Coefficient
Derivation of T

Are these equivalent?

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \overset{?}{=} \quad T = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}}

Substitute:

r = \hat{\beta}_1 \frac{s_x}{s_y} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{SST}}, \quad 1 - r^2 = \frac{SSE}{SST} = \frac{(n-2)s^2}{SST}

Then:

t = \frac{\hat{\beta}_1 \sqrt{S_{xx}/SST}\,\sqrt{n-2}}{\sqrt{(n-2)s^2/SST}} = \frac{\hat{\beta}_1 \sqrt{S_{xx}}}{s} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}

Yes, they are equivalent. Therefore, we can use

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}

as a statistic for testing the null hypothesis H_0: \beta_1 = 0, or equivalently for testing H_0: \rho = 0.
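A numeric check of this equivalence on the tire data (a sketch; both statistics are computed from scratch and should agree to floating-point precision):

```python
# Verify t = r*sqrt(n-2)/sqrt(1-r^2) equals T = b1/SE(b1) on the tire data
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83,
     179.00, 163.83, 150.33]
n = len(x)

Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi**2 for xi in x) - sum(x)**2 / n
Syy = sum(yi**2 for yi in y) - sum(y)**2 / n

b1 = Sxy / Sxx
b0 = sum(y) / n - b1 * sum(x) / n
sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))

# Slope-based statistic: T = b1 / sqrt(MSE / Sxx)
T = b1 / (sse / (n - 2) / Sxx) ** 0.5

# Correlation-based statistic: t = r*sqrt(n-2) / sqrt(1-r^2)
r = Sxy / (Sxx * Syy) ** 0.5
t = r * (n - 2) ** 0.5 / (1 - r * r) ** 0.5

print(f"T = {T:.4f}, t = {t:.4f}")
```

Both come out to about -11.86, the value of the slope test statistic found earlier.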