
STAT 540: Data Analysis and Regression

Wen Zhou

http://www.stat.colostate.edu/~riczw/

Email: [email protected]

Department of Statistics

Colorado State University

Fall 2015

W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 62


Contents

1 Multiple Linear Regression Model

2 Inference on Multiple Regression

3 Inference about Regression Parameters

4 Estimation and Prediction

5 Geometric View of Regression and Linear Models

6 Estimating Estimable Functions of the Coefficient


Multiple Linear Regression I

Multiple linear regression model

1 Multiple linear regression model in matrix terms

2 Estimation of regression coefficients

Inference

1 ANOVA results

2 Inference about regression parameters

3 Estimation of mean response and prediction of new observation

Inference about regression parameters

Estimation and prediction

Geometric interpretation of linear model and regression

Estimating estimable functions of the regression coefficient β


1 Multiple Linear Regression Model


Multiple Linear Regression

Example: # of predictor variables = 2.

Yi = β0 + β1Xi1 + β2Xi2 + εi, εi ∼ iid N(0, σ2),

for i = 1, . . . , n.

Response surface:

E(Yi) = β0 + β1Xi1 + β2Xi2

Example:

- Y = Pine bark beetle density
- X1 = Temperature
- X2 = Tree species


Interpretation of Coefficients

β0: Intercept. When the model scope includes X1 = X2 = 0,

- β0 is interpreted as the mean response E(Y) at X1 = X2 = 0.

βj: Slope in the direction of Xj (the effect of Xj).

- ∂E(Y)/∂Xj = βj
- E(Y | X1 = x1 + 1, X2 = x2) − E(Y | X1 = x1, X2 = x2) = β1

Interpreted as the change in the mean response E(Y) per unit increase in Xj, when the other predictors X−j are held constant.

What if Xj is qualitative?
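The slope interpretation can be checked numerically. The following is a minimal sketch; numpy and the simulated data are assumptions for illustration, not part of the slides:

```python
import numpy as np

# Simulate the two-predictor model Y = b0 + b1*X1 + b2*X2 + eps and check
# the slope interpretation: raising X1 by one unit (X2 fixed) changes the
# fitted mean by exactly beta1-hat.
rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * X1 - 0.7 * X2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X1, X2])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares fit

x1, x2 = 3.0, 2.0
delta = (beta_hat @ [1, x1 + 1, x2]) - (beta_hat @ [1, x1, x2])
print(np.isclose(delta, beta_hat[1]))  # True
```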


Multiple Linear Regression

A “general” linear regression model is, for i = 1, . . . , n,

Yi = β0 + Σ_{j=1}^{p} Xij βj + εi,   εi ∼ iid N(0, σ²).

Response surface:

E(Yi) = β0 + Σ_{j=1}^{p} Xij βj

Regression coefficients: β0, β1, . . . , βp−1, βp.

Predictor variables: X1, . . . , Xp are known constants/values.

The model is linear in the parameters, not necessarily in the shape of the

response surface.


Response Surface Examples

Polynomial regression

E(Y) = β0 + β1X + β2X² + β3X³.

Transformed variables

E(log(Y )) = β0 + β1X1 + β2√X2.

Interaction effects

E(Y ) = β0 + β1X1 + β2√X2 + β3X1X2.

- The change in the mean response corresponding to a unit change in X1 depends on X2, and vice versa.
- Testing whether β3 = 0 or not is very challenging in the high-dimensional setting (n = o(p)).
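Polynomial regression illustrates “linear in the parameters”: the response surface is curved, but the fit is ordinary least squares on the columns 1, X, X², X³. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Cubic polynomial regression is still a *linear* model, because
# E(Y) = b0 + b1*X + b2*X^2 + b3*X^3 is linear in the coefficients.
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 100)
y = 1 - 2 * x + 0.5 * x**2 + 0.25 * x**3 + rng.normal(0, 0.2, x.size)

X = np.column_stack([np.ones_like(x), x, x**2, x**3])  # cubic design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares
print(np.round(beta_hat, 2))
```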


Qualitative Predictor Variables

Example: Let Y = length of hospital stay, X1 = age, and X2 = gender: 0 for

male and 1 for female.

- An additive model is Yi = β0 + β1Xi1 + β2Xi2 + εi.
- Thus the response surface for males (Xi2 = 0) is E(Yi) = β0 + β1Xi1, and for females (Xi2 = 1) it is E(Yi) = (β0 + β2) + β1Xi1.
- β2 is the difference in mean response between females and males at any fixed age.

This kind of model is sometimes called an ANCOVA model.


Qualitative Predictor Variables

Interaction: the relationship between X1 and Y for a fixed value of X2 = x2

depends on x2.

An interaction model is Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + εi.

Thus the response surface for males (Xi2 = 0) is E(Yi) = β0 + β1Xi1, and for females (Xi2 = 1) it is E(Yi) = (β0 + β2) + (β1 + β3)Xi1.


Notation

n observations, one response variable, and p − 1 predictors, hence p regression coefficients (i.e. the intercept β0 is the pth).

Response variable: Yn×1 = (Y1, Y2, . . . , Yn)T .

The predictors are arranged in the design matrix

Xn×p =
[ 1  X11  X12  · · ·  X1,p−1 ]
[ 1  X21  X22  · · ·  X2,p−1 ]
[ ⋮   ⋮    ⋮           ⋮     ]
[ 1  Xn1  Xn2  · · ·  Xn,p−1 ]

Random error: εn×1 = (ε1, ε2, . . . , εn)T .

Regression coefficients: βp×1 = (β0, β1, . . . , βp−1)T .


Multiple Linear Regression Model in Matrix Terms

The multiple linear regression model can be written as

Y = Xβ + ε,

where, as we have seen before,

E(ε) = 0n×1,   Var{ε} = σ² In×n.

Thus, E(Y) = Xβ and Var{Y} = σ² I, and

Y ∼ N(Xβ, σ² I).


Least Squares Estimation

Consider the criterion:

Q = Σ_{i=1}^{n} (Yi − β0 − Σ_{j=1}^{p−1} βj Xij)² = (Y − Xβ)ᵀ(Y − Xβ).

The least squares estimate of β is

β̂ = (XᵀX)⁻¹XᵀY,

assuming that XᵀX is invertible.

- This is also the MLE.
- What condition on X do we need for XᵀX to be invertible?
- What if XᵀX is not invertible?
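The normal-equations formula can be computed directly and compared with a library solver. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Compute the least squares estimate beta-hat = (X'X)^{-1} X'y directly
# and compare it with np.linalg.lstsq, which is numerically preferred.
rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(0, 0.5, n)

beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # QR/SVD-based solver
print(np.allclose(beta_normal_eq, beta_lstsq))  # True
```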


Fitted Values and Residuals

Fitted values: Ŷ = Xβ̂ = X(XᵀX)⁻¹XᵀY = HY,

where the hat matrix is H = X(XᵀX)⁻¹Xᵀ.

Residuals: e = Y − Ŷ = (I − H)Y.
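The stated properties of the hat matrix can be verified numerically. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Build the hat matrix H = X (X'X)^{-1} X' and check the projection
# properties behind the fitted-values/residuals decomposition above.
rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
y_hat = H @ y                          # fitted values
e = y - y_hat                          # residuals = (I - H) y

print(np.allclose(H @ H, H))       # idempotent: HH = H -> True
print(np.allclose(H, H.T))         # symmetric -> True
print(np.isclose(e @ y_hat, 0.0))  # residuals orthogonal to fitted -> True
```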


2 Inference on Multiple Regression


Sums of Squares

We have the sums of squares in matrix form:

SSR = Σ_{i=1}^{n} (Ŷi − Ȳ)² = Yᵀ(H − (1/n)J)Y

SSE = Σ_{i=1}^{n} (Yi − Ŷi)² = Yᵀ(I − H)Y

SSTO = Σ_{i=1}^{n} (Yi − Ȳ)² = Yᵀ(I − (1/n)J)Y,

where J = 11ᵀ is the n × n matrix of ones.

Partitioning of the total sum of squares, and in particular of the degrees of freedom:

SSTO (df = n − 1) = SSR (df = p − 1) + SSE (df = n − p).
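The partition can be verified numerically. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Verify SSTO = SSR + SSE for a fitted multiple regression.
rng = np.random.default_rng(4)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

ssto = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)  # mean(y_hat) = mean(y) with an intercept
sse = np.sum((y - y_hat) ** 2)
print(np.isclose(ssto, ssr + sse))  # True
```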


Mean Squares

Define the mean squares

MSR = SSR/(p − 1),   MSE = SSE/(n − p).

It can be shown that E(MSE) = σ².

It can also be shown that

E(MSR) = σ² if βj = 0 for all j ≥ 1, and E(MSR) > σ² otherwise.


ANOVA Table

The ANOVA table is

Source       SS     df      MS                 F
Regression   SSR    p − 1   MSR = SSR/(p−1)    F = MSR/MSE
Error        SSE    n − p   MSE = SSE/(n−p)
Total        SSTO   n − 1

If β1 = · · · = βp−1 = 0, then

E(MSE) = E(MSR) = σ²,

in which case MSR/MSE ≈ 1.


Overall F Test for Regression Relation

Test

H0 : β1 = · · · = βp−1 = 0   vs.   Ha : not all βj (j ≥ 1) equal zero.

- It can be shown that under H0,

  F* = MSR/MSE ∼ F(p − 1, n − p).

- Thus we can perform an F-test at level α by the decision rule: reject H0 if F* > F(1 − α; p − 1, n − p).

Conditional on H0 being rejected, we may want to find

S = {j : βj ≠ 0}

(exactly or a.s.) – identification/selection.
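The F statistic is cheap to compute from the sums of squares. A minimal sketch (numpy and the simulated data are assumptions, not from the slides; the critical value would come from an F table):

```python
import numpy as np

# Compute the overall F statistic F* = MSR/MSE. Under H0 it follows
# F(p-1, n-p); here we only compute F* and compare it to an approximate
# table value F(0.95; 2, 57) ~ 3.16.
rng = np.random.default_rng(5)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
f_star = (ssr / (p - 1)) / (sse / (n - p))
print(f_star > 3.2)  # True: the strong beta1 signal makes F* very large
```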


Coefficient of Multiple Determination, R2

The coefficient of multiple determination is denoted by R2 and is defined as

R² = SSR/SSTO = 1 − SSE/SSTO

Interpretation: The proportion of variation in the Yi’s explained by the

regression relation.


More on R2

As more predictors are added to the model (p ↑), R2 must increase. Why?

- Recall SSTO = SSR + SSE. SSTO is fixed for Y, while SSE is the minimum of the unconstrained convex optimization problem β̂ = argmin SSE(β0, . . . , βp−1).
- Suppose we consider an extra predictor and thus consider SSE(β0, . . . , βp). The β that minimizes this SSE cannot be inferior to the previous minimizer, because βp = 0 is a special case within the new minimization problem that incorporates the previous one.


Adjusted R2

R² depends on p (even for p ≪ n); how do we remove that dependence?

The adjusted coefficient of multiple determination is denoted by R²a and is defined as

R²a = 1 − (SSE/(n − p)) / (SSTO/(n − 1)) = 1 − ((n − 1)/(n − p)) · (SSE/SSTO).

The adjusted coefficient of multiple determination R²a may decrease when more predictors are in the model.

Many other statistics, such as AIC, BIC, and Mallows' Cp, will be discussed; they are superior to R²a.
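The monotonicity of R² can be seen numerically by adding a pure-noise predictor. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# R^2 never decreases when a predictor is added, even a useless one;
# adjusted R^2 applies a degrees-of-freedom penalty.
rng = np.random.default_rng(6)
n = 40
x1 = rng.normal(size=n)
noise_pred = rng.normal(size=n)  # predictor unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)

def r2_and_adj(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    ssto = np.sum((y - y.mean()) ** 2)
    n, p = X.shape
    return 1 - sse / ssto, 1 - (n - 1) / (n - p) * sse / ssto

r2_s, adj_s = r2_and_adj(np.column_stack([np.ones(n), x1]), y)
r2_b, adj_b = r2_and_adj(np.column_stack([np.ones(n), x1, noise_pred]), y)
print(r2_b >= r2_s)  # R^2 cannot decrease
```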


3 Inference about Regression Parameters


Estimation of Regression Coefficients

The mean satisfies

E(β̂) = β.

That is, the LS estimate β̂ is an unbiased estimator of β.

Variance-covariance matrix:

Σβ̂ := Var{β̂} = σ² (XᵀX)⁻¹.

- (Σβ̂)kk = Var{β̂k}
- (Σβ̂)kl = Cov{β̂k, β̂l}, k ≠ l


Inference about Regression Coefficients

The estimated variance-covariance matrix is

Σ̂β̂ := s²{β̂} = MSE · (XᵀX)⁻¹ =

[ s²{β̂0}        s{β̂0, β̂1}    · · ·  s{β̂0, β̂p−1} ]
[ s{β̂1, β̂0}    s²{β̂1}        · · ·  s{β̂1, β̂p−1} ]
[ ⋮              ⋮                     ⋮            ]
[ s{β̂p−1, β̂0}  s{β̂p−1, β̂1}  · · ·  s²{β̂p−1}     ]

Under the multiple linear regression model, we have

(β̂k − βk)/s{β̂k} ∼ tn−p,   for k = 0, 1, . . . , p − 1.


Inference about Regression Coefficients

Thus the (1 − α) confidence interval for βk is

β̂k ± t1−α/2;n−p · s{β̂k}.

Test H0 : βk = βk0 versus Ha : βk ≠ βk0. Under H0, we have

t* = (β̂k − βk0)/s{β̂k} ∼ tn−p.

Thus we can perform a t-test at level α by the decision rule: reject H0 if |t*| > t1−α/2;n−p.
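Standard errors and t statistics follow directly from s²{β̂} = MSE·(XᵀX)⁻¹. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Standard errors and t statistics for each coefficient, using the
# estimated variance-covariance matrix MSE * (X'X)^{-1}.
rng = np.random.default_rng(7)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - p)
se = np.sqrt(mse * np.diag(XtX_inv))  # s{beta_k-hat}
t_stats = beta_hat / se               # t* for H0: beta_k = 0
print(np.round(t_stats, 1))
```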


4 Estimation and Prediction


Estimation of Mean Response–Hidden Extrapolation

Define Xh = (1, Xh1, . . . , Xh,p−1)T .

Caution about hidden extrapolations.

- The region (with respect to X0) defined by

  d(X0) = X0ᵀ(XᵀX)⁻¹X0 ≤ hmax,

  where hmax = maxᵢ hᵢᵢ, is an ellipsoid enclosing all data points, the “regressor variable hull” (RVH).
- A prediction at any X0 outside the RVH (i.e., with d(X0) > hmax) is a hidden extrapolation, at least to some degree.
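The RVH check is a one-line computation once the leverages are available. A minimal sketch (numpy and the simulated design are assumptions, not from the slides):

```python
import numpy as np

# Flag hidden extrapolation by comparing d(x0) = x0'(X'X)^{-1}x0 with
# h_max, the largest leverage h_ii among the observed data points.
rng = np.random.default_rng(8)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n), rng.uniform(0, 1, n)])

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii
h_max = h.max()

def hidden_extrapolation(x0):
    return x0 @ XtX_inv @ x0 > h_max

print(hidden_extrapolation(np.array([1.0, 0.5, 0.5])))   # near centroid: False
print(hidden_extrapolation(np.array([1.0, 5.0, -3.0])))  # far outside: True
```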


Estimation of Mean Response

The estimated mean response corresponding to Xh is Ŷh = Xhᵀβ̂.

- Mean: E(Ŷh) = Xhᵀβ = E(Yh).
- Variance: Var{Ŷh} = σ² Xhᵀ(XᵀX)⁻¹Xh.
- Estimated variance: s²{Ŷh} = MSE · Xhᵀ(XᵀX)⁻¹Xh.


Confidence Intervals for Mean Response

The (1 − α) confidence interval for E(Yh) is

Ŷh ± t1−α/2;n−p · s{Ŷh}.

The Working-Hotelling (1 − α) confidence band for the regression surface is

Ŷh ± W · s{Ŷh}, where W² = pF(1 − α; p, n − p).

The Bonferroni (1 − α) joint confidence intervals for g mean responses are

Ŷh ± B · s{Ŷh}, where B = t1−α/(2g);n−p.


Prediction of New Observation

The predicted new observation corresponding to Xh is Ŷh = Xhᵀβ̂, and

- Mean: E(Ŷh) = Xhᵀβ = E(Yh(new)).
- Prediction error variance: σ²pred = Var(Ŷh − Yh(new)) = σ² (1 + Xhᵀ(XᵀX)⁻¹Xh).
- Estimated prediction error variance: s²{pred} = MSE · (1 + Xhᵀ(XᵀX)⁻¹Xh).


Prediction Intervals for New Observation

The (1 − α) prediction interval for Yh(new) is

Ŷh ± t1−α/2;n−p · s{pred}.

The Scheffé (1 − α) joint prediction intervals for g new observations are

Ŷh ± S · s{pred}, where S² = gF(1 − α; g, n − p).

The Bonferroni (1 − α) joint prediction intervals for g new observations are

Ŷh ± B · s{pred}, where B = t1−α/(2g);n−p.
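The “+1” in the prediction-error variance is exactly the extra MSE term that widens prediction intervals relative to confidence intervals. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# The estimated prediction-error variance exceeds the estimated
# mean-response variance by exactly MSE (the "+1" term).
rng = np.random.default_rng(9)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
mse = np.sum((y - X @ beta_hat) ** 2) / (n - 2)

xh = np.array([1.0, 0.3])
var_mean = mse * (xh @ XtX_inv @ xh)      # s^2{Y_h-hat}
var_pred = mse * (1 + xh @ XtX_inv @ xh)  # s^2{pred}
print(np.isclose(var_pred - var_mean, mse))  # True
```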


5 Geometric View of Regression and Linear Models


Geometric Viewpoint: The Column Space of the Design Matrix

Xβ is a linear combination of the columns of X:

Xβ = [x1, . . . , xp] (β1, . . . , βp)ᵀ = β1x1 + · · · + βpxp

The set of all possible linear combinations of the columns of X is called the

column space of X and is denoted by

C(X) = {Xa : a ∈ Rp}

The Gauss-Markov linear model says y is a random vector whose mean is in

the column space of X and whose variance is σ2I for some positive real

number σ2, i.e.

E(y) ∈ C(X) and Var(y) = σ2I, σ2 ∈ R+


An Example Column Space

X = (1, 1)ᵀ ⇒ C(X) = {Xa : a ∈ R¹}
= { (1, 1)ᵀ a1 : a1 ∈ R }
= { a1 (1, 1)ᵀ : a1 ∈ R }
= { (a1, a1)ᵀ : a1 ∈ R }


Another Example Column Space

X =
[ 1 0 ]
[ 1 0 ]
[ 0 1 ]
[ 0 1 ]

⇒ C(X) = { X (a1, a2)ᵀ : a ∈ R² }
= { a1 (1, 1, 0, 0)ᵀ + a2 (0, 0, 1, 1)ᵀ : a1, a2 ∈ R }
= { (a1, a1, 0, 0)ᵀ + (0, 0, a2, a2)ᵀ : a1, a2 ∈ R }
= { (a1, a1, a2, a2)ᵀ : a1, a2 ∈ R }


Another Example Column Space

X1 =
[ 1 0 ]
[ 1 0 ]
[ 0 1 ]
[ 0 1 ]
,  X2 =
[ 1 1 0 ]
[ 1 1 0 ]
[ 1 0 1 ]
[ 1 0 1 ]

x ∈ C(X1) ⇒ x = X1 a for some a ∈ R²
⇒ x = X2 (0, a1, a2)ᵀ for some a = (a1, a2)ᵀ ∈ R²
⇒ x = X2 b for some b ∈ R³
⇒ x ∈ C(X2)

Thus

C(X1) ⊂ C(X2)


Another Example Column Space (continued)

x ∈ C(X2) ⇒ x = X2 a for some a ∈ R³

⇒ x = a1 (1, 1, 1, 1)ᵀ + a2 (1, 1, 0, 0)ᵀ + a3 (0, 0, 1, 1)ᵀ for some a ∈ R³

⇒ x = (a1 + a2, a1 + a2, a1 + a3, a1 + a3)ᵀ for some a1, a2, a3 ∈ R

⇒ x = X1 (a1 + a2, a1 + a3)ᵀ for some a1, a2, a3 ∈ R


Another Example Column Space (continued)

⇒ x = X1 (a1 + a2, a1 + a3)ᵀ for some a1, a2, a3 ∈ R
⇒ x = X1 b for some b ∈ R²
⇒ x ∈ C(X1)

Thus, C(X2) ⊂ C(X1), as we have shown C(X1) ⊂ C(X2). It follows that

C(X1) = C(X2).


Estimation of E(y)

A fundamental goal of linear model analysis is to estimate E(y).

We could, of course, use y itself to estimate E(y): y is obviously an unbiased estimator of E(y), but it is often not a very sensible estimator.

For example, suppose

(y1, y2)ᵀ = (1, 1)ᵀ μ + (ε1, ε2)ᵀ,

and we observe y = (6.1, 2.3)′.

Should we estimate E(y) = (μ, μ)′ by y = (6.1, 2.3)′?


Estimation of E(y)

The Gauss-Markov linear model says that E(y) ∈ C(X), so we should use that information when estimating E(y).

Consider estimating E(y) by the point in C(X) that is closest to y (as measured by the usual Euclidean distance).

This unique point is called the orthogonal projection of y onto C(X) and is denoted by ŷ (although it could be argued that Ê(y) might be better notation).

By definition, ‖y − ŷ‖ = min_{z ∈ C(X)} ‖y − z‖, where ‖a‖ = (Σ_{i=1}^{n} ai²)^{1/2}.


Geometric Viewpoint on Multiple Regression (and LM)

Geometrically, how do we minimize the distance between Y and C(X)?

- That point is Ŷ = Xβ̂ = HY, the orthogonal projection of Y onto C(X).
- The vector between Y and Xβ̂ is the residual e = Y − Xβ̂, and the distance is ‖e‖.

For R²: if we add another predictor, C(X) gains one more dimension, so ‖e‖ can only decrease.

- Note: if dim(C(X)) = n, then Ŷ = Y, e = 0, and R² = 1.


Orthogonal Projection Matrices

It can be shown that, as we did for least squares estimators, for all y ∈ Rⁿ, ŷ = PX y is optimal, i.e.

ŷ = PX y is the best estimator of E(y) in the class of linear unbiased estimators,

where the unique matrix PX = H is the hat matrix, called the orthogonal projection matrix. It satisfies

- HH = H (idempotent)
- H = H′ (symmetric)
- HX = X and X′H = X′ (Why? Intuitively, projecting the columns of X onto C(X) leaves them unchanged.)

If (X′X) is not invertible, we use a generalized inverse (X′X)⁻, where a generalized inverse A⁻ satisfies AA⁻A = A.

H is invariant to the choice of (X′X)⁻, which is itself not unique.

ŷ and y − ŷ are orthogonal. (Why?)
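The projection properties hold even for a rank-deficient design, using any generalized inverse. A minimal sketch (numpy's pseudoinverse supplies one particular generalized inverse; the design below is an assumption for illustration):

```python
import numpy as np

# With a rank-deficient design, H = X (X'X)^- X' is still the orthogonal
# projection onto C(X); np.linalg.pinv gives one generalized inverse.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])  # rank 2, not full column rank

G = np.linalg.pinv(X.T @ X)     # a generalized inverse of X'X
H = X @ G @ X.T

print(np.allclose(H @ H, H))  # idempotent -> True
print(np.allclose(H, H.T))    # symmetric -> True
print(np.allclose(H @ X, X))  # HX = X -> True
```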


An Example Orthogonal Projection

Suppose (y1, y2)ᵀ = (1, 1)ᵀ μ + (ε1, ε2)ᵀ, and we observe y = (6.1, 2.3)′. Then, with X = (1, 1)ᵀ,

X(X′X)⁻¹X′ = (1, 1)ᵀ [ (1, 1) (1, 1)ᵀ ]⁻¹ (1, 1)

= (1, 1)ᵀ [2]⁻¹ (1, 1)

= (1, 1)ᵀ (1/2) (1, 1)

= (1/2) ·
[ 1 1 ]
[ 1 1 ]

=
[ 1/2 1/2 ]
[ 1/2 1/2 ]


An Example Orthogonal Projection

Thus, the orthogonal projection of y = (6.1, 2.3)′ onto the column space of X = (1, 1)ᵀ is

PX y = Hy =
[ 1/2 1/2 ] [ 6.1 ]   [ 4.2 ]
[ 1/2 1/2 ] [ 2.3 ] = [ 4.2 ]
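The same computation can be reproduced in a few lines (numpy is an assumed tool; the numbers are the slides' own example):

```python
import numpy as np

# Project y = (6.1, 2.3)' onto C(X) for X = (1, 1)': the projection
# averages the two coordinates, giving (4.2, 4.2)'.
X = np.array([[1.0], [1.0]])
y = np.array([6.1, 2.3])

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix, all entries 1/2
y_proj = H @ y
print(np.allclose(y_proj, [4.2, 4.2]))  # True
```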


Geometric illustration

Suppose X = (1, 2)ᵀ and y = (2, 3/4)ᵀ.


The angle between ŷ and the residual y − ŷ is 90°, hence “orthogonal projection”.


6 Estimating Estimable Functions of the Coefficient


What if X is not full column rank?

If XᵀX is not invertible, then (XᵀX)⁻¹ has to be replaced by a generalized inverse (XᵀX)⁻.

If X is not of full column rank, then there are infinitely many vectors in the set {b : Xb = Xβ} for any fixed value of β. Thus, no matter what the value of E(y) is, there will be infinitely many vectors b such that Xb = E(y) when X is not of full column rank.

Our response vector y can help us learn about E(y) = Xβ, but when X is NOT of full column rank, there is NO hope of learning about β alone unless additional information about β is available.

However, we can still estimate estimable functions of β.


Treatment Effects Model

Researchers randomly assigned a total of six experimental units to two treatments

and measured a response of interest.

yij = µ+ τi + εij , i = 1, 2; j = 1, 2, 3

(y11, y12, y13, y21, y22, y23)′ = (μ + τ1, μ + τ1, μ + τ1, μ + τ2, μ + τ2, μ + τ2)′ + (ε11, ε12, ε13, ε21, ε22, ε23)′

Question: what are X and β?


Treatment Effects Model (continued)

In this case, it makes no sense to estimate β = [µ, τ1, τ2]′, because there are multiple (infinitely many, in fact) choices of β that define the same mean for y. For example,

(µ, τ1, τ2)′ = (5, −1, 1)′,  (0, 4, 6)′,  (999, −995, −993)′

all yield the same Xβ = E(y).

When multiple values for β define the same E(y), we say that β is

non-estimable.


Estimable Functions of β

A linear function of β, Cβ, is said to be estimable if there is a linear function of y, say Ay, that is an unbiased estimator of Cβ. If no such linear function exists, Cβ is non-estimable.

Note that Ay is an unbiased estimator of Cβ if and only if

E(Ay) = Cβ for all β ∈ Rᵖ  ⇔  AXβ = Cβ for all β  ⇔  AX = C.

This says that we can estimate Cβ as long as Cβ = AXβ = AE(y) for some A, i.e. as long as Cβ is a linear function of E(y).

The bottom line is that we can always estimate E(y) and all linear functions of E(y); all other linear functions of β are non-estimable.
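Estimability of c′β is equivalent to c′ lying in the row space of X, which is easy to check numerically via a rank test (the equivalence and the rank test are standard facts, stated here as an assumption rather than taken from the slides):

```python
import numpy as np

# c'b is estimable iff c' is in the row space of X, i.e. appending c'
# to X does not increase the rank. Check this for the treatment
# effects design used in the slides.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

def is_estimable(c):
    return np.linalg.matrix_rank(np.vstack([X, c])) == np.linalg.matrix_rank(X)

print(is_estimable(np.array([0.0, 1.0, -1.0])))  # tau1 - tau2: True
print(is_estimable(np.array([0.0, 1.0, 0.0])))   # tau1 alone: False
```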


Treatment Effects Model (continued)

Xβ =
[ 1 1 0 ]              [ µ + τ1 ]
[ 1 1 0 ]   [ µ  ]     [ µ + τ1 ]
[ 1 1 0 ]   [ τ1 ]  =  [ µ + τ1 ]
[ 1 0 1 ]   [ τ2 ]     [ µ + τ2 ]
[ 1 0 1 ]              [ µ + τ2 ]
[ 1 0 1 ]              [ µ + τ2 ]

so that

[1, 0, 0, 0, 0, 0] Xβ = [1, 1, 0] β = µ + τ1
[0, 0, 0, 1, 0, 0] Xβ = [1, 0, 1] β = µ + τ2
[1, 0, 0, −1, 0, 0] Xβ = [0, 1, −1] β = τ1 − τ2

are estimable functions of β.


Estimating Estimable Functions of β

If Cβ is estimable, then there exists a matrix A such that C = AX and

Cβ = AXβ = AE(y) for any β ∈ Rᵖ.

It makes sense to estimate Cβ by

Cβ̂ := Aŷ = APX y = AX(X′X)⁻X′y,

which is unbiased, since

E(Cβ̂) = AX(X′X)⁻X′Xβ = APX Xβ = AXβ = Cβ.

Cβ̂ is called the Ordinary Least Squares (OLS) estimator of Cβ.

Note that although the “hat” is on β, it is Cβ that we are estimating.

Invariance of Cβ̂ to the choice of β̂: although there are infinitely many solutions to the normal equations when X is not of full column rank, Cβ̂ is the same for all normal-equation solutions β̂ whenever Cβ is estimable (STAT 640).


Treatment Effects Model (continued)

Suppose our aim is to estimate τ1 − τ2. As noted before,

[1, 0, 0, −1, 0, 0] Xβ = [0, 1, −1] β = τ1 − τ2.

Thus, we can compute the OLS estimator of τ1 − τ2 as

[1, 0, 0, −1, 0, 0] ŷ = [0, 1, −1] β̂,

where β̂ is any solution to the normal equations.


Treatment Effects Model (continued)

The normal equations in this case are X′Xb = X′y, with X the 6 × 3 design matrix above. They reduce to

[ 6 3 3 ] [ b1 ]   [ y.. ]
[ 3 3 0 ] [ b2 ] = [ y1. ]
[ 3 0 3 ] [ b3 ]   [ y2. ]

where y.. = Σij yij is the grand total and y1. = Σj y1j, y2. = Σj y2j are the treatment totals.


Treatment Effects Model (continued)

β̂1 = ( ȳ.. , ȳ1. − ȳ.. , ȳ2. − ȳ.. )′   and   β̂2 = ( 0 , ȳ1. , ȳ2. )′

are both solutions to the normal equations (check this).

Thus, the OLS estimator of Cβ = [0, 1, −1]β = τ1 − τ2 is

Cβ̂1 = [0, 1, −1] ( ȳ.. , ȳ1. − ȳ.. , ȳ2. − ȳ.. )′ = ȳ1. − ȳ2. = [0, 1, −1] ( 0 , ȳ1. , ȳ2. )′ = Cβ̂2

HW: Can you find two different generalized inverses of (X′X), A1 and A2, with (X′X)Ai(X′X) = (X′X) (so that Ai = (X′X)⁻ for each i), which give you β̂1 and β̂2, respectively?
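The invariance claim can be checked directly: both solutions satisfy the normal equations and agree on the estimable contrast. A minimal sketch (numpy and the simulated data are assumptions, not from the slides):

```python
import numpy as np

# Two different normal-equation solutions for the treatment effects
# model give the same estimate of the estimable function tau1 - tau2.
rng = np.random.default_rng(10)
X = np.repeat(np.array([[1, 1, 0], [1, 0, 1]], dtype=float), 3, axis=0)
y = X @ np.array([5.0, -1.0, 1.0]) + rng.normal(0, 1, 6)

ybar1, ybar2 = y[:3].mean(), y[3:].mean()  # treatment means
ybar = y.mean()                            # grand mean
beta1 = np.array([ybar, ybar1 - ybar, ybar2 - ybar])  # solution 1
beta2 = np.array([0.0, ybar1, ybar2])                 # solution 2

# Both satisfy the normal equations X'X b = X'y ...
print(np.allclose(X.T @ X @ beta1, X.T @ y))  # True
print(np.allclose(X.T @ X @ beta2, X.T @ y))  # True
# ... and agree on the estimable contrast C = [0, 1, -1]:
c = np.array([0.0, 1.0, -1.0])
print(np.isclose(c @ beta1, c @ beta2))       # True
```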


The Gauss-Markov Theorem

Under the Gauss-Markov linear model, the OLS estimator c′β̂ of an estimable linear function c′β is the unique Best Linear Unbiased Estimator (BLUE), in the sense that Var(c′β̂) is strictly less than the variance of any other linear unbiased estimator of c′β, for all β ∈ Rᵖ and all σ² ∈ R⁺.

The Gauss-Markov Theorem says that if we want to estimate an estimable

linear function c′β using a linear estimator that is unbiased, we should always

use the OLS estimator.

In our simple example of the treatment effects model, we could have used y11 − y21 to estimate τ1 − τ2. It is easy to see that y11 − y21 is a linear estimator that is unbiased for τ1 − τ2, but its variance is clearly larger than the variance of the OLS estimator ȳ1. − ȳ2. (as guaranteed by the Gauss-Markov Theorem).
