Simple Linear Regression
Chapter 2. Simple Linear Regression
Regression Analysis
Study of a functional relationship between variables:
- response variable $y$, $Y$ (dependent variable)
- explanatory variable $x$, $X$ (independent variable)
Goal: to explain the "variability" of $Y$.
Simple linear regression model (§2.4)
$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$  ($x_1, \dots, x_n$: non-random)
$\varepsilon_1, \dots, \varepsilon_n$: independent random errors with $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ $(i = 1, \dots, n)$
(additional assumption: $\varepsilon_i \sim N(0, \sigma^2)$, needed for inference)
Method of estimation (§2.5)
<Least Squares Method >
- minimize $\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$ w.r.t. $\beta_0$ and $\beta_1$
- normal equations:
  $\sum_{i=1}^n e_i = 0$, $\sum_{i=1}^n x_i e_i = 0$, where $e_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$ (residual)
($\Rightarrow \sum_{i=1}^n (x_i - \bar{x}) e_i = 0$)
- least squares estimates:
  $\hat\beta_1 = S_{xy} / S_{xx}$, $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
  where $S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2$
- least squares regression fit: $\hat{y} = \hat\beta_0 + \hat\beta_1 x = \bar{y} + \hat\beta_1 (x - \bar{x})$
<"Unbiased" estimation of $\sigma^2$> (§2.6)
$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2$, where $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
$= \frac{\mathrm{SSE}}{n-2}$  (SSE: residual sum of squares (error sum of squares))
($n-2$: degrees of freedom, df)
Example (Computer Repair Data, §2.3)
data (n=14)
scatter plot: “simple linear regression” seems O.K.
model setting : eq. (2.10)
estimated “l.s. line” eq. (2.19) with residuals (Table 2.7)
estimated error variance: eq. (2.23)
$\hat\beta_1 = 15.509$, $\hat\beta_0 = 4.162$, $\hat\sigma = 5.392$; fitted line: $\hat{y} = 4.162 + 15.509\,x$
Method of inference (properties of the estimators; confidence intervals and tests)
(1) Properties of estimates
i. $E(\hat\beta_1) = \beta_1$, $\mathrm{Var}(\hat\beta_1) = \sigma^2 / S_{xx}$
ii. $E(\hat\beta_0) = \beta_0$, $\mathrm{Var}(\hat\beta_0) = \sigma^2 (n^{-1} + \bar{x}^2 / S_{xx})$
iii. $\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = -\sigma^2 \bar{x} / S_{xx}$
iv. $E(\hat\sigma^2) = \sigma^2$; $\mathrm{Var}(e_i) = \sigma^2 (1 - p_{ii})$, $\mathrm{Cov}(e_i, e_j) = -\sigma^2 p_{ij}$ $(i \neq j)$,
and $E(e_i) = 0$, where $p_{ij} = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})(x_j - \bar{x})}{S_{xx}}$
(2) Inference under additional normality assumption
i. $\dfrac{\hat\beta_1 - \beta_1}{\mathrm{s.e.}(\hat\beta_1)} \sim t(n-2)$;  $\mathrm{s.e.}(\hat\beta_1) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)} = \hat\sigma / \sqrt{S_{xx}}$
- $100(1-\alpha)\%$ C.I.: $[\hat\beta_1 - t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\beta_1),\ \hat\beta_1 + t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\beta_1)]$
- Reject $H_0: \beta_1 = \beta_1^0$ in favor of $H_1: \beta_1 \neq \beta_1^0$
  iff $\dfrac{|\hat\beta_1 - \beta_1^0|}{\mathrm{s.e.}(\hat\beta_1)} > t(n-2;\alpha/2)$
- p-value
ii. $\dfrac{\hat\beta_0 - \beta_0}{\mathrm{s.e.}(\hat\beta_0)} \sim t(n-2)$;  $\mathrm{s.e.}(\hat\beta_0) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_0)} = \hat\sigma\,(n^{-1} + \bar{x}^2/S_{xx})^{1/2}$
- $100(1-\alpha)\%$ C.I.: $[\hat\beta_0 - t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\beta_0),\ \hat\beta_0 + t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\beta_0)]$
- Reject $H_0: \beta_0 = \beta_0^0$ in favor of $H_1: \beta_0 \neq \beta_0^0$
  iff $\dfrac{|\hat\beta_0 - \beta_0^0|}{\mathrm{s.e.}(\hat\beta_0)} > t(n-2;\alpha/2)$
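The $t$ statistic for $\beta_1$ can be reproduced on the computer repair data. A minimal sketch (in Python for illustration; the critical value $t(12; 0.025) = 2.179$ is taken from a $t$ table, since $n-2 = 12$ here):

```python
# Sketch: t statistic and 95% C.I. for beta_1 on the computer repair data.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(units, minutes))
sigma_hat = (sse / (n - 2)) ** 0.5

se_b1 = sigma_hat / s_xx ** 0.5   # s.e.(beta_1_hat) = sigma_hat / sqrt(S_xx)
t_stat = b1 / se_b1               # test of H0: beta_1 = 0
t_crit = 2.179                    # t(12; 0.025), from a t table
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(round(t_stat, 2))           # 30.71, matching Table 2.9
```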
iii. $\mu_0 = E(Y \mid x_0) = \beta_0 + \beta_1 x_0$
$\hat\mu_0 = \hat\beta_0 + \hat\beta_1 x_0$
$\dfrac{\hat\mu_0 - \mu_0}{\mathrm{s.e.}(\hat\mu_0)} \sim t(n-2)$;  $\mathrm{s.e.}(\hat\mu_0) = \sqrt{\widehat{\mathrm{Var}}(\hat\mu_0)} = \hat\sigma\,\big(n^{-1} + (x_0 - \bar{x})^2 / S_{xx}\big)^{1/2}$
- $100(1-\alpha)\%$ C.I.: $[\hat\mu_0 - t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\mu_0),\ \hat\mu_0 + t(n-2;\alpha/2)\,\mathrm{s.e.}(\hat\mu_0)]$
- Test (not given)
iv. Prediction for $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$  ($\varepsilon_0$: indep. of $\varepsilon_1, \dots, \varepsilon_n$)
$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0$
$\dfrac{y_0 - \hat{y}_0}{\mathrm{s.e.}(y_0 - \hat{y}_0)} \sim t(n-2)$;  $\mathrm{s.e.}(y_0 - \hat{y}_0) = \hat\sigma\,\big(1 + n^{-1} + (x_0 - \bar{x})^2 / S_{xx}\big)^{1/2}$
$100(1-\alpha)\%$ Prediction interval:
$[\hat{y}_0 - t(n-2;\alpha/2)\,\mathrm{s.e.}(y_0 - \hat{y}_0),\ \hat{y}_0 + t(n-2;\alpha/2)\,\mathrm{s.e.}(y_0 - \hat{y}_0)]$
** Note that $\hat\mu_0$ is identical to the predicted response $\hat{y}_0$ at any given $x_0$.
Example (computer repair data) (ct'd.)
① "Test of significance" (of explanatory variable)
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
- $t = \hat\beta_1 / \mathrm{s.e.}(\hat\beta_1) = 30.71$ (Table 2.9)
- p-value / meaning: "we have observed data that could hardly arise under $H_0$"
- We may reject $H_0$.
② 95% C.I. for $\beta_1$
③ 95% C.I. for $\mu_4 = \beta_0 + 4\beta_1$
④ 95% P.I. for $y_4 = \beta_0 + 4\beta_1 + \varepsilon$  ($\varepsilon$: indep. of $\varepsilon_1, \dots, \varepsilon_n$) (wider than ③)
- All of these are valid under "the model assumptions". Need to check them! (Chapter 4)
- Note that $H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \neq 0$ can't be rejected even at the 10% level (Table 2.9)
  Meaning: We may start with a "simpler" model $y_i = \beta_1 x_i + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
  Then, all the above inferences should be changed!
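Items ③ and ④ above can be sketched numerically. The block below (Python, for illustration; $t(12; 0.025) = 2.179$ taken from a $t$ table) computes the 95% C.I. for $\mu_4$ and the 95% P.I. for $y_4$ and shows the P.I. is wider:

```python
# Sketch of (3) and (4): 95% C.I. for mu_4 = E(Y | x=4) and 95% P.I.
# for a new response y_4, on the computer repair data.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / s_xx
b0 = y_bar - b1 * x_bar
sigma_hat = (sum((y - b0 - b1 * x) ** 2
                 for x, y in zip(units, minutes)) / (n - 2)) ** 0.5

x0 = 4
mu_hat = b0 + b1 * x0                  # fitted mean = point prediction
se_mean = sigma_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx) ** 0.5
se_pred = sigma_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx) ** 0.5
t_crit = 2.179                         # t(12; 0.025), from a t table

ci = (mu_hat - t_crit * se_mean, mu_hat + t_crit * se_mean)  # for mu_4
pi = (mu_hat - t_crit * se_pred, mu_hat + t_crit * se_pred)  # for y_4
# the P.I. is always wider than the C.I., since se_pred > se_mean
```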
Measuring the quality of fit
i. Decomposition of Sum of Squares:
deviation sums of squares:
$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$
$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$
SST = SSE + SSR
(d.f.)  $(n-1) = (n-2) + (1)$
(cross term: $2\sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 2\hat\beta_1 \sum_{i=1}^n (x_i - \bar{x})\,e_i = 0$, using $\hat{y}_i - \bar{y} = \hat\beta_1(x_i - \bar{x})$ and $\sum_i x_i e_i = \sum_i e_i = 0$)
(*) $\mathrm{SSR} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \hat\beta_1^2 \sum_{i=1}^n (x_i - \bar{x})^2 = \dfrac{S_{xy}^2}{S_{xx}}$
ii. Coefficient of determination (or squared multiple correlation coefficient)
$R^2 = \dfrac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \dfrac{\mathrm{SSE}}{\mathrm{SST}}$,  $0 \le R^2 \le 1$
$R^2$: "proportion of variation of $y$ explained by $x$"
Example (Computer Repair Data)

Source    s.s.         d.f.    $R^2$
Reg.      27419.500     1      0.987
Err.        348.848    12
Total     27768.348    13
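The decomposition $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$ and the $R^2$ in the table above can be verified directly (Python, for illustration):

```python
# Check of the sum-of-squares decomposition SST = SSR + SSE and of R^2
# on the computer repair data.
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

n = len(units)
x_bar, y_bar = sum(units) / n, sum(minutes) / n
s_xx = sum((x - x_bar) ** 2 for x in units)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in units]

sst = sum((y - y_bar) ** 2 for y in minutes)
sse = sum((y - f) ** 2 for y, f in zip(minutes, fitted))
ssr = sum((f - y_bar) ** 2 for f in fitted)

assert abs(sst - (ssr + sse)) < 1e-6   # decomposition holds
r_squared = ssr / sst
print(round(r_squared, 3))             # 0.987, as in the table
```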
Supplement I (ch.2)
(1) Geometry of Least Squares Method
- minimize $\sum_{i=1}^n \{ y_i - (\beta_0 + \beta_1 x_i) \}^2$ w.r.t. $\beta_0$ & $\beta_1$
- equivalently, minimize $\| \mathbf{y} - (\beta_0 \mathbf{1} + \beta_1 \mathbf{x}) \|^2$ w.r.t. $\beta_0$ & $\beta_1$,
  where $\mathbf{y} = (y_1, \dots, y_n)^T$, $\mathbf{1} = (1, \dots, 1)^T$, $\mathbf{x} = (x_1, \dots, x_n)^T$ ("column vectors")
- Example
  $(x, y) = (1,1), (1,2), (2,2)$: $\mathbf{1} = (1,1,1)^T$, $\mathbf{x} = (1,1,2)^T$, $\mathbf{y} = (1,2,2)^T$
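For the tiny example above, the projection solution $\hat{\boldsymbol\beta} = (X^T X)^{-1} X^T \mathbf{y}$ with $X = [\mathbf{1} \mid \mathbf{x}]$ can be worked out by hand. A dependency-free sketch (Python, solving the 2x2 normal equations directly):

```python
# Projection beta_hat = (X^T X)^{-1} X^T y on the tiny example,
# with X = [1 | x], solved by plain 2x2 linear algebra.
ones = [1, 1, 1]
x = [1, 1, 2]
y = [1, 2, 2]

# entries of X^T X and X^T y
a = sum(o * o for o in ones)                 # 1^T 1 = 3
b = sum(o * xi for o, xi in zip(ones, x))    # 1^T x = 4
c = sum(xi * xi for xi in x)                 # x^T x = 6
u = sum(y)                                   # 1^T y = 5
v = sum(xi * yi for xi, yi in zip(x, y))     # x^T y = 7

det = a * c - b * b              # det(X^T X) = 18 - 16 = 2
beta0 = (c * u - b * v) / det    # first component of (X^T X)^{-1} X^T y
beta1 = (a * v - b * u) / det    # second component

print(beta0, beta1)              # 1.0 0.5: fitted line y_hat = 1 + 0.5 x
```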
<perpendicular projection onto a vector>
minimize $\| \mathbf{y} - X\boldsymbol\beta \|^2$  $\Rightarrow$  $X^T(\mathbf{y} - X\hat{\boldsymbol\beta}) = \mathbf{0}$  $\Rightarrow$  $\hat{\boldsymbol\beta} = (X^T X)^{-1} X^T \mathbf{y}$
i.e. $X\hat{\boldsymbol\beta} = X (X^T X)^{-1} X^T \mathbf{y}$: projection of $\mathbf{y}$ onto $C(X)$ (the column space of $X$)
(*1) $\bar{y}\,\mathbf{1} = \mathbf{1}(\mathbf{1}^T \mathbf{1})^{-1} \mathbf{1}^T \mathbf{y}$  (projection of $\mathbf{y}$ onto $\mathbf{1}$)
(*2) $\hat\beta_1 (\mathbf{x} - \bar{x}\mathbf{1}) = (\mathbf{x} - \bar{x}\mathbf{1}) \{ (\mathbf{x} - \bar{x}\mathbf{1})^T (\mathbf{x} - \bar{x}\mathbf{1}) \}^{-1} (\mathbf{x} - \bar{x}\mathbf{1})^T \mathbf{y}$  ($\because (\mathbf{x} - \bar{x}\mathbf{1})^T \mathbf{1} = 0$)
$\hat{\mathbf{y}} = \bar{y}\,\mathbf{1} + \hat\beta_1 (\mathbf{x} - \bar{x}\mathbf{1}) = \hat\beta_0 \mathbf{1} + \hat\beta_1 \mathbf{x}$, where $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
<meaning of coefficient of determination>
$\mathrm{SST} = \sum_i (y_i - \bar{y})^2 = \|\mathbf{y} - \bar{y}\mathbf{1}\|^2$,  $\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2 = \|\mathbf{y} - \hat{\mathbf{y}}\|^2$,  $\mathrm{SSR} = \sum_i (\hat{y}_i - \bar{y})^2 = \|\hat{\mathbf{y}} - \bar{y}\mathbf{1}\|^2$
$\cos^2\theta = \mathrm{SSR}/\mathrm{SST} = R^2$;  $R^2 \to 1$ ($\theta \to 0$) as
$\mathbf{y}$ gets closer to the plane $C(\mathbf{1}, \mathbf{x})$, which is determined by $\mathbf{1}, \mathbf{x}$
(2) Properties of Variance & Covariance of random variables
$\mathrm{cov}(Y, Z) = E(Y - EY)(Z - EZ)$
① $\mathrm{cov}\Big( \sum_{i=1}^m a_i Y_i,\ \sum_{j=1}^n b_j Z_j \Big) = \sum_{i=1}^m \sum_{j=1}^n a_i b_j\, \mathrm{cov}(Y_i, Z_j)$
② $\mathrm{var}\Big( \sum_{i=1}^n a_i Y_i \Big) = \sum_{i=1}^n a_i^2\, \mathrm{var}(Y_i) + \sum_{i \neq j} a_i a_j\, \mathrm{cov}(Y_i, Y_j)$
③ $Y, Z$ indep. $\Rightarrow$ $\mathrm{cov}(Y, Z) = 0$;
   $Y_1, \dots, Y_n$ indep. $\Rightarrow$ $\mathrm{var}\Big( \sum_{i=1}^n a_i Y_i \Big) = \sum_{i=1}^n a_i^2\, \mathrm{var}(Y_i)$
(3) Expectation, Variance & Covariance of random vectors
For a random vector $\mathbf{Y} = (Y_1, \dots, Y_n)^T$ (column vector notation),
$E\mathbf{Y} = (EY_1, \dots, EY_n)^T$ (mean vector),
$\mathrm{var}(\mathbf{Y}) = \big( \mathrm{cov}(Y_i, Y_j) \big) = \begin{pmatrix} \mathrm{var}(Y_1) & \mathrm{cov}(Y_1, Y_2) & \cdots & \mathrm{cov}(Y_1, Y_n) \\ \vdots & \ddots & & \vdots \\ \mathrm{cov}(Y_n, Y_1) & \mathrm{cov}(Y_n, Y_2) & \cdots & \mathrm{var}(Y_n) \end{pmatrix}$  (variance-covariance matrix)
Note that (*1) $\mathrm{var}(\mathbf{Y}) = E(\mathbf{Y} - E\mathbf{Y})(\mathbf{Y} - E\mathbf{Y})^T$
(the $(i,j)$ entry of $(\mathbf{Y} - E\mathbf{Y})(\mathbf{Y} - E\mathbf{Y})^T$ is $(Y_i - EY_i)(Y_j - EY_j)$, whose expectation is $\mathrm{cov}(Y_i, Y_j)$; cf. $(\mathbf{a}\mathbf{a}^T)_{ij} = a_i a_j$)
① $E(A\mathbf{Y} + \mathbf{b}) = A\,E(\mathbf{Y}) + \mathbf{b}$;  $\mathrm{var}(A\mathbf{Y} + \mathbf{b}) = \mathrm{var}(A\mathbf{Y}) = A\, \mathrm{var}(\mathbf{Y})\, A^T$
(*2) $(A\mathbf{Y} + \mathbf{b})_i = \sum_{j=1}^n a_{ij} Y_j + b_i$ for $A = (a_{ij})$, $\mathbf{b} = (b_i)$: constants
$E(A\mathbf{Y} + \mathbf{b})_i = \sum_{j=1}^n a_{ij}\,EY_j + b_i$, i.e. $E(A\mathbf{Y} + \mathbf{b}) = A\,E\mathbf{Y} + \mathbf{b}$
$\mathrm{var}(A\mathbf{Y} + \mathbf{b}) = E\{A\mathbf{Y} + \mathbf{b} - E(A\mathbf{Y} + \mathbf{b})\}\{A\mathbf{Y} + \mathbf{b} - E(A\mathbf{Y} + \mathbf{b})\}^T$
$= E\{A(\mathbf{Y} - E\mathbf{Y})\}\{A(\mathbf{Y} - E\mathbf{Y})\}^T$
$= E\,A(\mathbf{Y} - E\mathbf{Y})(\mathbf{Y} - E\mathbf{Y})^T A^T$
$= A\,E(\mathbf{Y} - E\mathbf{Y})(\mathbf{Y} - E\mathbf{Y})^T A^T$
$= A\, \mathrm{var}(\mathbf{Y})\, A^T$
② In the simple (or multiple) linear regression model, $E(\mathbf{Y}) = X\boldsymbol\beta$, $\mathrm{var}(\mathbf{Y}) = \sigma^2 I_n$
(4) Gradient vector
① For $(n \times 1)$ vectors $\mathbf{x} = (x_1, \dots, x_n)^T$ and $\mathbf{c} = (c_1, \dots, c_n)^T$, write $\mathbf{c}^T \mathbf{x} = \sum_{i=1}^n c_i x_i$. The partial derivative of $\mathbf{c}^T \mathbf{x}$ w.r.t. $\mathbf{x}$ is
$\dfrac{\partial(\mathbf{c}^T\mathbf{x})}{\partial \mathbf{x}} = \left( \dfrac{\partial(\mathbf{c}^T\mathbf{x})}{\partial x_1}, \dots, \dfrac{\partial(\mathbf{c}^T\mathbf{x})}{\partial x_n} \right)^T = \mathbf{c}$.  Similarly, $\dfrac{\partial(\mathbf{x}^T\mathbf{c})}{\partial \mathbf{x}} = \mathbf{c}$.
② For any matrix $A$ and $\mathbf{y} = (y_1, \dots, y_n)^T$,
$\dfrac{\partial(\mathbf{y}^T A \mathbf{y})}{\partial \mathbf{y}} = (A + A^T)\mathbf{y}$.  When $A$ is symmetric, $\dfrac{\partial(\mathbf{y}^T A \mathbf{y})}{\partial \mathbf{y}} = 2A\mathbf{y}$.
($\because$ $\mathbf{y}^T A \mathbf{y} = \sum_{l=1}^n \sum_{k=1}^n y_l\, a_{lk}\, y_k$)
(termwise: $\dfrac{\partial(\mathbf{y}^T A \mathbf{y})}{\partial y_i} = \sum_{k=1}^n a_{ik} y_k + \sum_{l=1}^n y_l\, a_{li} = \big( (A + A^T)\mathbf{y} \big)_i$)
(5) Properties of Least Squares Estimates
$Y_i$: independent, $x_1, \dots, x_n$: constants,
$EY_i = \beta_0 + \beta_1 x_i$, $\mathrm{var}(Y_i) = \sigma^2$ $(i = 1, \dots, n)$
$\hat\beta_1 = \dfrac{S_{xY}}{S_{xx}} = \dfrac{1}{S_{xx}} \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^n \dfrac{x_i - \bar{x}}{S_{xx}}\, Y_i$  ($\because \sum_{i=1}^n (x_i - \bar{x})\,\bar{Y} = 0$)
$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x} = \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right) Y_i$
$e_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i = Y_i - \sum_{j=1}^n \left( \dfrac{1}{n} + \dfrac{(x_i - \bar{x})(x_j - \bar{x})}{S_{xx}} \right) Y_j$
i. $E(\hat\beta_1) = \sum_{i=1}^n \dfrac{x_i - \bar{x}}{S_{xx}}\, EY_i = \sum_{i=1}^n \dfrac{x_i - \bar{x}}{S_{xx}}\, (\beta_0 + \beta_1 x_i) = \beta_1$
($\because \sum_{i=1}^n (x_i - \bar{x}) = 0$ and $\sum_{i=1}^n (x_i - \bar{x})\,x_i = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) = S_{xx}$)
$\mathrm{var}(\hat\beta_1) = \sum_{i=1}^n \left( \dfrac{x_i - \bar{x}}{S_{xx}} \right)^2 \mathrm{var}(Y_i) = \dfrac{\sigma^2}{S_{xx}}$
($\because Y_1, \dots, Y_n$: indep., $\mathrm{var}(Y_i) = \sigma^2$)
ii. $E(\hat\beta_0) = \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right) EY_i = \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right) (\beta_0 + \beta_1 x_i) = \beta_0$
$\mathrm{var}(\hat\beta_0) = \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right)^2 \sigma^2 = \sigma^2 \left( \sum_{i=1}^n \dfrac{1}{n^2} - \dfrac{2\bar{x}}{n S_{xx}} \sum_{i=1}^n (x_i - \bar{x}) + \dfrac{\bar{x}^2}{S_{xx}^2} \sum_{i=1}^n (x_i - \bar{x})^2 \right) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)$
iii. $\mathrm{cov}(\hat\beta_0, \hat\beta_1) = \mathrm{cov}\left( \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right) Y_i,\ \sum_{j=1}^n \dfrac{x_j - \bar{x}}{S_{xx}}\, Y_j \right)$
$= \sum_{i=1}^n \left( \dfrac{1}{n} - \dfrac{(x_i - \bar{x})\,\bar{x}}{S_{xx}} \right) \dfrac{x_i - \bar{x}}{S_{xx}}\, \sigma^2 = -\dfrac{\sigma^2 \bar{x}}{S_{xx}}$  ($\because \mathrm{cov}(Y_i, Y_j) = 0$ for $i \neq j$, and $\sum_i (x_i - \bar{x}) = 0$)
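The sampling properties derived above can be spot-checked by Monte Carlo. A sketch (not from the text; Python, with illustrative values $\beta_0 = 4$, $\beta_1 = 15$, $\sigma = 5$ and the repair-data $x$'s):

```python
# Monte Carlo sanity check of E(beta1_hat) = beta_1 and
# var(beta1_hat) = sigma^2 / S_xx, under the normal-error model.
import random

random.seed(0)
x = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta0, beta1, sigma = 4.0, 15.0, 5.0   # illustrative true values

estimates = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    estimates.append(b1)

mean_b1 = sum(estimates) / len(estimates)
var_b1 = sum((b - mean_b1) ** 2 for b in estimates) / len(estimates)
# empirical mean should be near beta_1, empirical variance near sigma^2/S_xx
assert abs(mean_b1 - beta1) < 0.05
assert abs(var_b1 - sigma ** 2 / s_xx) < 0.05
```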
<SAS>
Computer Repair Data
1. Input Program
Data repair;
Input units minutes @@;
Cards;
1 23 2 29 3 49 4 64 4 74 5 87 6 96 6 97 7 109 8 119 9 149 9 145 10 154 10 166
;
run;
2. Scatter plot and Linear regression line
symbol1 interpol = RL c=black h=1 v=dot;
axis1 minor=none order=(0,40,80,120,160);
axis2 minor=none order=(0,2,4,6,8,10);
proc gplot data=repair;
plot minutes*units / haxis=axis2 vaxis=axis1;
run;
[Figure: scatter plot of minutes vs. units with the fitted regression line]
3. Regression Analysis
proc reg data=repair;
model minutes = units;
run;

<Regression without an intercept>
proc reg data=repair;
model minutes = units / noint;
run;
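What the NOINT option computes can be sketched by hand: minimizing $\sum_i (y_i - \beta_1 x_i)^2$ gives the through-the-origin slope $\hat\beta_1 = \sum_i x_i y_i / \sum_i x_i^2$. An illustrative check (Python) on the same repair data:

```python
# Through-the-origin least squares slope on the computer repair data:
# minimizing sum (y_i - beta_1 x_i)^2 gives beta1 = sum(x y) / sum(x^2).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

b1_noint = (sum(x * y for x, y in zip(units, minutes))
            / sum(x * x for x in units))
print(round(b1_noint, 3))   # slope of the no-intercept fit
```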
Anscombe’s Quartet