MATH2831/2931
Linear Models / Higher Linear Models
August 19, 2013
Week 4 Lecture 3 - Last lecture:
Confidence Intervals for coefficients
Properties of multivariate Gaussian
Hypothesis testing for coefficients
Confidence intervals for the mean and prediction intervals.
Joint confidence regions.
Week 4 Lecture 3 - This lecture:
Decomposing variation
Introduction to the analysis of variance table
Sequential sums of squares.
Week 4 Lecture 3 - Decomposing variation
RECALL: Identity for simple linear regression
\[ \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \]
that is,
\[ SS_{total} = SS_{reg} + SS_{res}. \]
$SS_{total}$, the total sum of squares (the sum of squared deviations of the responses about their mean).
$SS_{reg}$, the regression sum of squares (the sum of squared deviations of the fitted values about their mean, which is $\bar{y}$).
$SS_{res}$, the residual sum of squares (the sum of squared deviations of the fitted values from the responses).
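The identity can be verified numerically; the following sketch fits a simple linear regression to simulated data (the data values and seed are illustrative only, not from the lecture):

```python
import numpy as np

# Simulated data for a simple linear regression (illustrative values only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)

# Least squares fit of y = b0 + b1 * x.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

ss_total = float(np.sum((y - y.mean()) ** 2))
ss_reg = float(np.sum((y_hat - y.mean()) ** 2))
ss_res = float(np.sum((y - y_hat) ** 2))

# SS_total = SS_reg + SS_res (up to floating point rounding).
print(ss_total, ss_reg + ss_res)
```

The two printed values agree to rounding error; the decomposition relies on the model containing an intercept, so that the fitted values have mean $\bar{y}$.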
Week 4 Lecture 3 - Decomposing variation
This identity, decomposing variation into a part explained by the model and a part unexplained, holds in the general linear model.
For simple linear regression, the partitioning of variation was presented in the analysis of variance (ANOVA) table.
The ANOVA table was also a way of organizing calculations in hypothesis testing.
Week 4 Lecture 3 - Adjusted R2
For simple linear regression
\[ R^2 = \frac{SS_{reg}}{SS_{total}}. \]
We also have the adjusted $R^2$, written as $\bar{R}^2$.
$R^2 = 0.748$ here (or 74.8 percent).
What is the definition of $\bar{R}^2$?
Rewrite $R^2$ as
\[ R^2 = 1 - \frac{SS_{res}}{SS_{total}}. \qquad (1) \]
Define $\bar{R}^2$ by replacing $SS_{res}$ in (1) by $\hat{\sigma}^2$ (which is $SS_{res}/(n-p)$) and replacing $SS_{total}$ by $SS_{total}/(n-1)$.
Week 4 Lecture 3 - Adjusted R2
\[ \bar{R}^2 = 1 - \frac{(n-1)\,SS_{res}}{(n-p)\,SS_{total}} \qquad (2) \]
or
\[ \bar{R}^2 = 1 - \frac{\hat{\sigma}^2 (n-1)}{SS_{total}}. \qquad (3) \]
In terms of $R^2$,
\[ \bar{R}^2 = 1 - \frac{n-1}{n-p}\,(1 - R^2). \]
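As a numerical check, the three equivalent forms of $\bar{R}^2$ can be evaluated side by side. The sums of squares below are hypothetical, chosen so that $R^2$ matches the 74.2% quoted later for the risk assessment example ($n = 25$, $p = 8$):

```python
# Hypothetical sums of squares (scaled so R^2 = 74.2%), with n = 25
# observations and p = 8 parameters, as in the risk assessment example.
n, p = 25, 8
ss_total, ss_res = 100.0, 25.8

r2 = 1 - ss_res / ss_total                                  # equation (1)
r2_adj_a = 1 - (n - 1) * ss_res / ((n - p) * ss_total)      # equation (2)
sigma2_hat = ss_res / (n - p)
r2_adj_b = 1 - sigma2_hat * (n - 1) / ss_total              # equation (3)
r2_adj_c = 1 - (n - 1) / (n - p) * (1 - r2)                 # in terms of R^2

print(r2, r2_adj_a, r2_adj_b, r2_adj_c)
```

All three expressions for $\bar{R}^2$ agree, and the adjusted value lands close to the R-Sq(adj) = 63.5% reported later (any small discrepancy reflects the rounded sums of squares assumed here).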
Week 4 Lecture 3 - Adjusted R2
What was the motivation for introducing $\bar{R}^2$?
$R^2$ is an easily interpreted measure of fit of a linear model: the proportion of total variation explained by the model.
One might be tempted to use $R^2$ as a basis for comparing models with different numbers of parameters.
IMPORTANT: $R^2$ is not helpful here: if a new predictor is added to a linear model, the residual sum of squares always decreases, and $R^2$ will increase.
Attempting to select a subset of good predictors from a set of possible predictors using $R^2$ results in the full model, even if many of the predictors are irrelevant.
$\bar{R}^2$ does not necessarily increase as new predictors are added to a model.
Week 4 Lecture 3 - Adjusted R2
Since
\[ \bar{R}^2 = 1 - \frac{\hat{\sigma}^2 (n-1)}{SS_{total}}, \]
$\bar{R}^2$ increases as $\hat{\sigma}^2$ decreases.
Ranking models using $\bar{R}^2$ is equivalent to ranking models based on $\hat{\sigma}^2$.
QUESTION: Does $\hat{\sigma}^2$ necessarily decrease as new predictors are added to the model, and hence must $\bar{R}^2$ increase?
Week 4 Lecture 3 - Adjusted R2
Recall
\[ \hat{\sigma}^2 = \frac{(y - Xb)'(y - Xb)}{n - p}. \]
Consider two models in which one model contains a subset of the predictors included in the other.
For the larger model, the numerator in the above expression (the residual sum of squares) is smaller, but the denominator will also be smaller, as $p$ is larger.
Any reduction in the residual sum of squares must be large enough to overcome the reduction in the denominator.
$\bar{R}^2$ doesn't necessarily increase as we make the model more complicated.
So $\bar{R}^2$ may be useful as a crude device for model comparison!
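A small simulation illustrates the contrast between $R^2$ and $\bar{R}^2$. The data here are simulated (not from the lecture): the response depends on one real predictor, and a second, pure-noise predictor is then added. $R^2$ is guaranteed not to decrease, while $\bar{R}^2$ typically falls when the extra predictor is irrelevant:

```python
import numpy as np

# Simulated data (not from the lecture): y depends on x1 only; "noise"
# is an irrelevant predictor added to the larger model.
rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=1.0, size=n)

def fit_stats(X, y):
    """Return (R^2, adjusted R^2) for a least squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_res = float(resid @ resid)
    ss_total = float(np.sum((y - y.mean()) ** 2))
    n_obs, p = X.shape
    r2 = 1 - ss_res / ss_total
    return r2, 1 - (n_obs - 1) / (n_obs - p) * (1 - r2)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
r2_small, adj_small = fit_stats(X_small, y)
r2_big, adj_big = fit_stats(X_big, y)

# R^2 can only go up when a predictor is added; adjusted R^2 need not.
print(r2_small, r2_big, adj_small, adj_big)
```

The adjusted value falls when the added predictor reduces the residual sum of squares by less than the loss of one residual degree of freedom warrants.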
Week 4 Lecture 3 - Analysis of variance table
Notation: $\beta = (\beta_0, \ldots, \beta_k)'$, $\beta = (\beta^{(1)}, \beta^{(2)})$ where $\beta^{(1)}$ is an $r \times 1$ subvector and $\beta^{(2)}$ is a $(p-r) \times 1$ subvector.
Week 4 Lecture 3 - Sequential sums of squares
Write $R(\beta^{(2)} \mid \beta^{(1)})$ for the increase in $SS_{reg}$ when the predictors corresponding to the parameters $\beta^{(2)}$ are added to a model involving the parameters $\beta^{(1)}$.
Think of $R(\beta^{(2)} \mid \beta^{(1)})$ as the variation explained by the term involving $\beta^{(2)}$ in the presence of the term involving $\beta^{(1)}$.
Define $R(\beta_1, \ldots, \beta_k \mid \beta_0)$ as $SS_{reg}$.
Week 4 Lecture 3 - Sequential sums of squares
The sequential sums of squares shown below the analysis of variance table are the values
$R(\beta_1 \mid \beta_0)$
$R(\beta_2 \mid \beta_0, \beta_1)$
$R(\beta_3 \mid \beta_0, \beta_1, \beta_2)$
...
$R(\beta_k \mid \beta_0, \ldots, \beta_{k-1})$.
These values add up to $R(\beta_1, \ldots, \beta_k \mid \beta_0) = SS_{reg}$.
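A sketch of how sequential sums of squares can be computed, by fitting the nested sequence of models and recording the increase in $SS_{reg}$ at each step (simulated data with hypothetical coefficients, not from the lecture):

```python
import numpy as np

# Simulated design: intercept plus three predictors (hypothetical coefficients).
rng = np.random.default_rng(2)
n = 30
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_full @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)

def ss_reg(X, y):
    """Regression sum of squares: squared deviations of fitted values about y-bar."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ b - y.mean()) ** 2))

# Sequential SS: R(b1|b0), R(b2|b0,b1), R(b3|b0,b1,b2).
# The intercept-only model has SS_reg = 0, so the running total starts at 0.
seq, prev = [], 0.0
for j in range(2, X_full.shape[1] + 1):
    cur = ss_reg(X_full[:, :j], y)
    seq.append(cur - prev)
    prev = cur

print(seq, sum(seq), ss_reg(X_full, y))
```

The sequential sums of squares add up to $SS_{reg}$ of the full model, and each entry is nonnegative since adding a predictor cannot reduce the explained variation.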
Week 4 Lecture 3 - Sequential sums of squares
Sequential sums of squares are useful when we have first ordered the variables in our model in a meaningful way (based on the underlying science or context).
They tell us how much a term contributes to explaining variation given all the previous terms in the table (but ignoring the terms which come after).
Week 4 Lecture 3 - Hypothesis testing
Simple linear regression model: t test (or equivalent F test) for examining the usefulness of a predictor.
General linear model: partial t test for the usefulness of a predictor in the presence of the other predictors.
Equivalent partial F test: the test statistic is the square of the partial t statistic.
Test for overall model adequacy: is the model including all the predictors better than the model containing just an intercept?
The F statistic in the analysis of variance table and the p-value relate to a test for overall model adequacy!
Week 4 Lecture 3 - Testing model adequacy
In the general linear model, if $\beta_1 = \cdots = \beta_k = 0$, then the statistic
\[ F = \frac{SS_{reg}/k}{SS_{res}/(n-p)} \]
has an $F_{k,n-p}$ distribution. This distributional result is the basis for a hypothesis test.
Week 4 Lecture 3 - Testing model adequacy
To test
\[ H_0 : \beta_1 = \cdots = \beta_k = 0 \]
versus
\[ H_1 : \text{not all } \beta_j = 0, \quad j = 1, \ldots, k, \]
we use the test statistic
\[ F = \frac{SS_{reg}/k}{SS_{res}/(n-p)}. \]
For a size $\alpha$ test the critical region is
\[ F \geq F_{\alpha;\,k,\,n-p}. \]
Alternatively, the p-value for the test is
\[ \Pr(F^* \geq F) \]
where $F^* \sim F_{k,n-p}$.
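Assuming SciPy is available, the test can be sketched numerically; the sums of squares and sample sizes below are made up for illustration:

```python
from scipy.stats import f as f_dist

# Hypothetical values for illustration: k predictors, n observations,
# and p = k + 1 parameters (including the intercept).
n, k = 30, 4
p_params = k + 1
ss_reg_val, ss_res_val = 60.0, 40.0

# Test statistic for H0: beta_1 = ... = beta_k = 0.
F = (ss_reg_val / k) / (ss_res_val / (n - p_params))

# p-value Pr(F* >= F) with F* ~ F_{k, n-p}, and the size-0.05 critical value.
p_value = f_dist.sf(F, k, n - p_params)
f_crit = f_dist.ppf(0.95, k, n - p_params)

print(F, p_value, f_crit)
```

By construction, $F$ exceeds the critical value exactly when the p-value is below the chosen size, so the two decision rules always agree.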
Week 4 Lecture 3 - Testing model adequacy
ANOVA table: columns show source of variation (Source), degrees of freedom (DF), sums of squares (SS), mean squares (MS), value of the F statistic for testing model adequacy (F) and corresponding p-value (P).

Source      DF     SS          MS                F                P
Regression  p-1    SS_reg      SS_reg/(p-1)      MS_reg/MS_res    p-value
Residual    n-p    SS_res      SS_res/(n-p)
Total       n-1    SS_total
Week 4 Lecture 3 - Model adequacy for risk assessment
Risk assessment data: the response is mean risk assessment, with seven accounting-determined measures of risk as predictors.
Week 4 Lecture 3 - Model adequacy for risk assessment
RECALL: we are testing
\[ H_0 : \beta_1 = \cdots = \beta_k = 0 \]
versus
\[ H_1 : \text{not all } \beta_j = 0, \quad j = 1, \ldots, k, \]
using the test statistic
\[ F = \frac{SS_{reg}/k}{SS_{res}/(n-p)}. \]
The F statistic for testing overall model adequacy is 6.97, and the associated p-value is
\[ p = \Pr(F^* \geq 6.97) \]
where $F^* \sim F_{7,17}$; $p = 0.001$ approximately.
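Assuming SciPy is available, the quoted p-value can be checked directly from the $F_{7,17}$ upper tail:

```python
from scipy.stats import f as f_dist

# Upper-tail probability Pr(F* >= 6.97) for F* ~ F_{7,17},
# i.e. the p-value quoted on the slide (approximately 0.001).
p = f_dist.sf(6.97, 7, 17)
print(p)
```

The computed tail probability is consistent with the slide's $p \approx 0.001$.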
Week 4 Lecture 3 - Model adequacy for risk assessment
RESULT: Reject the null hypothesis
\[ H_0 : \beta_1 = \cdots = \beta_k = 0 \]
in favour of the alternative
\[ H_1 : \text{not all } \beta_j = 0, \quad j = 1, \ldots, k. \]
What can we say about inclusion of predictors in the order we have selected?
Mean Risk Assessment = 2.19 + 0.443 Dividend Payout + 0.865 Current Ratio - 0.247 Asset Size + 1.96 Asset Growth + 3.59 Leverage + 0.135 Variability Earnings + 1.05 Covariability Earnings
We have from this ordering: $R(\beta_1 \mid \beta_0) = 18.42$; $R(\beta_2 \mid \beta_1, \beta_0) = 5.6042$; $R(\beta_3 \mid \beta_2, \beta_1, \beta_0) = 10.12$; $R(\beta_4 \mid \beta_3, \beta_2, \beta_1, \beta_0) = 1.64$; . . .
Week 4 Lecture 3 - Model adequacy for risk assessment
Under a different ordering: Mean Risk Assessment = 2.19 + 0.865 Current Ratio + 1.96 Asset Growth + 3.59 Leverage + 0.443 Dividend Payout - 0.247 Asset Size + 1.05 Covariability Earnings + 0.135 Variability Earnings
NOTE: The F test statistic for overall model adequacy does not change, and neither does the result of the hypothesis test!
NOTE: The estimates S = 0.981620, R-Sq = 74.2%, R-Sq(adj) = 63.5% are unchanged!
NOTE: $R(\beta_1 \mid \beta_0)$, $R(\beta_2 \mid \beta_1, \beta_0)$, ... clearly changed!
Week 4 Lecture 3 - Learning Expectations.
Be familiar with decomposing variation in the general linear model.
Understand sequential sums of squares and be able to interpret and calculate them.
Understand $R^2$ versus $\bar{R}^2$ (adjusted).