F Test and Goodness of Fit · Christopher Ting · QF302 Week 6 · February 10, 2017 · Lee Kong Chian School of Business, Singapore Management University

F Test and Goodness of Fit

Investment and Financial Data Analysis 2017

Christopher Ting
http://mysmu.edu.sg/faculty/christophert/

[email protected]

Lee Kong Chian School of Business, Singapore Management University

February 10, 2017



Table of Contents

1 Tutorial on Multiple Linear Regression

2 F Tests

3 Goodness of Fit


Tutorial on Multiple Linear Regression

Multiple Linear Regression and the Constant Term

• Model 2: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$. For $t = 1, 2, \ldots, T$,

$$y_t = \beta_1 + \beta_2 X_{2,t} + \beta_3 X_{3,t} + \cdots + \beta_K X_{K,t} + u_t.$$

There are $K$ parameters, $\beta_1, \beta_2, \ldots, \beta_K$.

• The first parameter is the y-intercept in Model 1 and the average of $y$ in Model 0, with $X_{1,t} = 1$ being a constant for all $t$:

$$\mathbf{X}_1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$



Model 1: Simple Linear Regression

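• If $K = 2$, we are back to Model 1:

$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}}_{T \times 1} = \underbrace{\begin{bmatrix} 1 & x_{2,1} \\ 1 & x_{2,2} \\ \vdots & \vdots \\ 1 & x_{2,T} \end{bmatrix}}_{T \times 2} \underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}}_{2 \times 1} + \underbrace{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}}_{T \times 1}$$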

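• Notice that the matrices written in this way are conformable: a $T \times 2$ matrix times a $2 \times 1$ vector gives a $T \times 1$ vector, which can be added to the $T \times 1$ vector $\mathbf{u}$.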
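A minimal NumPy sketch of these shapes, assuming a hypothetical sample of $T = 5$ observations with made-up values $x_{2,t} = 1, \ldots, 5$, $\beta_1 = 0.5$ and $\beta_2 = 2.0$ (none of these numbers come from the slides). The disturbances are set to zero so the result is easy to check by hand.

```python
import numpy as np

T = 5                                   # hypothetical sample size
x2 = np.arange(1.0, T + 1)              # hypothetical regressor values 1, 2, ..., 5
X = np.column_stack([np.ones(T), x2])   # T x 2: column of ones, then x_2
beta = np.array([[0.5], [2.0]])         # 2 x 1: hypothetical beta_1, beta_2
u = np.zeros((T, 1))                    # T x 1: disturbances set to zero here

# (T x 2)(2 x 1) gives T x 1, which is conformable with the T x 1 vector u
y = X @ beta + u
print(X.shape, beta.shape, u.shape, y.shape)  # (5, 2) (2, 1) (5, 1) (5, 1)
```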



Numerical Illustration: Data

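• When $K = 3$,

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + u_t$$

for $t = 1, 2, \ldots, 15$, with

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix}, \qquad \mathbf{X}'\mathbf{y} = \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix}$$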

• The residual sum of squares (RSS) is $\hat{\mathbf{u}}'\hat{\mathbf{u}} = 10.96$.



Calculations in Multiple Regression

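• To calculate the coefficients, multiply the matrix $(\mathbf{X}'\mathbf{X})^{-1}$ by the vector $\mathbf{X}'\mathbf{y}$ to obtain $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$:

$$\begin{bmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{bmatrix} \begin{bmatrix} -3.0 \\ 2.2 \\ 0.6 \end{bmatrix} = \begin{bmatrix} 1.10 \\ -4.40 \\ 19.88 \end{bmatrix}$$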
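The same multiplication in NumPy, using only the numbers given in the illustration (NumPy is an assumption here; any matrix library would do). With real data one would form $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{y}$ from the observations and typically call a solver such as `np.linalg.solve` rather than invert explicitly; here the inverse is already given.

```python
import numpy as np

# (X'X)^{-1} and X'y from the numerical illustration (K = 3, T = 15)
XtX_inv = np.array([[2.0, 3.5, -1.0],
                    [3.5, 1.0, 6.5],
                    [-1.0, 6.5, 4.3]])
Xty = np.array([-3.0, 2.2, 0.6])

# beta_hat = (X'X)^{-1} X'y: a 3 x 3 matrix times a 3-vector
beta_hat = XtX_inv @ Xty
print(np.round(beta_hat, 2))
```

The printed coefficients are approximately 1.10, -4.40 and 19.88, matching the hand calculation.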

• To calculate the standard errors, we need an estimate of $\sigma_u^2$:

$$s^2 = \frac{\text{RSS}}{T - K} = \frac{10.96}{15 - 3} \approx 0.91$$

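• The variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ is given by

$$s^2 (\mathbf{X}'\mathbf{X})^{-1} = 0.91\,(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 1.82 & 3.19 & -0.91 \\ 3.19 & 0.91 & 5.92 \\ -0.91 & 5.92 & 3.91 \end{bmatrix}$$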



Standard Errors and Estimated Model

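• The variances are on the leading diagonal:

$$\operatorname{var}(\hat\beta_1) = 1.82 \iff \operatorname{SE}(\hat\beta_1) = 1.35$$

$$\operatorname{var}(\hat\beta_2) = 0.91 \iff \operatorname{SE}(\hat\beta_2) = 0.95$$

$$\operatorname{var}(\hat\beta_3) = 3.91 \iff \operatorname{SE}(\hat\beta_3) = 1.98$$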

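• We write the estimated model with each standard error in parentheses beneath its coefficient:

$$\hat{y} = \underset{(1.35)}{1.10} - \underset{(0.95)}{4.40}\,x_2 + \underset{(1.98)}{19.88}\,x_3$$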
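A short NumPy sketch of the standard-error steps, again using only the numbers from the illustration. Rounding $s^2$ to 0.91 first, as the slides do, gives $\operatorname{SE}(\hat\beta_2) \approx 0.95$; carrying the unrounded $s^2 \approx 0.913$ through, as this code does, gives about 0.96. The other two standard errors agree either way.

```python
import numpy as np

XtX_inv = np.array([[2.0, 3.5, -1.0],
                    [3.5, 1.0, 6.5],
                    [-1.0, 6.5, 4.3]])
rss, T, K = 10.96, 15, 3     # values from the numerical illustration

s2 = rss / (T - K)           # estimate of sigma_u^2 (unrounded, about 0.913)
vcov = s2 * XtX_inv          # variance-covariance matrix of beta_hat
se = np.sqrt(np.diag(vcov))  # standard errors: square roots of the leading diagonal

print(round(s2, 4), np.round(se, 2))
```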


F Tests

Testing Multiple Hypotheses: The F-test

• We have used the t-test to test single hypotheses, i.e. hypotheses involving only one coefficient. But what if we want to test more than one coefficient simultaneously?

• Answer: the F-test, which involves estimating two regressions:

– The unrestricted regression is the one in which the coefficients are freely determined by the data, as we have done before.

– The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some parameters of $\boldsymbol{\beta}$.



An F-test Example

• The general regression is

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t$$

• We want to test the restriction that $\beta_3 + \beta_4 = 1$, as we have some theory suggesting this relationship.

• The unrestricted regression is the general regression above, but what is the restricted regression?

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t \quad \text{s.t.} \quad \beta_3 + \beta_4 = 1$$

• We substitute the restriction ($\beta_3 + \beta_4 = 1$) into the regression so that it is automatically imposed on the data:

$$\beta_3 + \beta_4 = 1 \;\Rightarrow\; \beta_4 = 1 - \beta_3$$



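The F-test: Forming the Restricted Regression

• Hence, the restricted regression is

$$\begin{aligned} y_t &= \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + (1 - \beta_3)\, x_{4,t} + u_t \\ \Rightarrow\; y_t &= \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + x_{4,t} - \beta_3 x_{4,t} + u_t \\ \Rightarrow\; (y_t - x_{4,t}) &= \beta_1 + \beta_2 x_{2,t} + \beta_3 (x_{3,t} - x_{4,t}) + u_t \end{aligned}$$

• For estimation, we create two new variables, $P_t$ and $Q_t$:

$$P_t = y_t - x_{4,t}, \qquad Q_t = x_{3,t} - x_{4,t}$$

• So

$$P_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 Q_t + u_t$$

is the restricted regression we actually estimate.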
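A minimal NumPy sketch of estimating the restricted regression through $P_t$ and $Q_t$, with $\hat\beta_4$ recovered afterwards from the restriction. The function name `restricted_fit` is hypothetical, and the data are synthetic and noise-free, chosen so that $\beta_3 + \beta_4 = 1$ holds exactly; with real data the coefficients would only be estimates.

```python
import numpy as np

def restricted_fit(y, x2, x3, x4):
    """Fit y = b1 + b2*x2 + b3*x3 + b4*x4 + u subject to b3 + b4 = 1
    by regressing P = y - x4 on a constant, x2 and Q = x3 - x4."""
    P = y - x4
    Q = x3 - x4
    Z = np.column_stack([np.ones_like(y), x2, Q])
    b, *_ = np.linalg.lstsq(Z, P, rcond=None)
    b1, b2, b3 = b
    b4 = 1.0 - b3                    # recover beta_4 from the restriction
    resid = P - Z @ b                # same residuals as in the original y equation
    return np.array([b1, b2, b3, b4]), resid @ resid  # coefficients, RRSS

# Synthetic, noise-free data (illustrative only): beta_3 + beta_4 = 0.3 + 0.7 = 1
rng = np.random.default_rng(42)
T = 30
x2, x3, x4 = rng.normal(size=(3, T))
y = 0.5 + 1.5 * x2 + 0.3 * x3 + 0.7 * x4

coef, rrss = restricted_fit(y, x2, x3, x4)
print(np.round(coef, 6), rrss)
```

Because the residuals of the $P_t$ equation equal $y_t$ minus the restricted fitted values, the RSS returned here is the RRSS needed for the F-test.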



Calculating the F-Test Statistic

• The test statistic is given by

$$\text{test statistic} = \frac{\text{RRSS} - \text{URSS}}{\text{URSS}} \times \frac{T - K}{m}$$

where
– URSS = RSS from the unrestricted regression
– RRSS = RSS from the restricted regression
– $m$ = number of restrictions
– $T$ = number of observations
– $K$ = number of regressors in the unrestricted regression, including a constant (i.e. the total number of parameters to be estimated).
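The formula translates directly into a small function. The name `f_statistic` and its argument order are hypothetical; the usage line plugs in the figures from the F-test example ($T = 144$, $K = 4$, $m = 2$, RRSS = 436.1, URSS = 397.2).

```python
def f_statistic(rrss: float, urss: float, T: int, K: int, m: int) -> float:
    """F = (RRSS - URSS) / URSS * (T - K) / m.

    rrss, urss: residual sums of squares of the restricted / unrestricted regressions
    T: number of observations
    K: number of parameters in the unrestricted regression (including the constant)
    m: number of restrictions
    """
    if urss <= 0 or rrss < urss:
        raise ValueError("expected 0 < URSS <= RRSS")
    if T <= K or m < 1:
        raise ValueError("need T > K and m >= 1")
    return (rrss - urss) / urss * (T - K) / m


print(round(f_statistic(436.1, 397.2, T=144, K=4, m=2), 2))  # 6.86
```

The check `rrss < urss` reflects the fact that imposing restrictions can never lower the RSS, so an RRSS below the URSS signals that the two regressions were mixed up.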



The F-Distribution

• Under the null hypothesis (and normally distributed errors), the test statistic follows the F-distribution, which has a pair of degrees-of-freedom parameters.

• The two degrees-of-freedom parameters are $m$ and $T - K$, respectively.

• Note that the order of the d.f. parameters is important: $F(m, T - K)$ is not the same distribution as $F(T - K, m)$.

• The F-distribution takes only non-negative values and is not symmetric.

• We therefore reject the null only if the test statistic exceeds the critical F-value.



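Determining the Number of Restrictions

• Examples:
– $H_0: \beta_1 + \beta_2 = 2$ has $m = 1$ restriction.
– $H_0: \beta_2 = 1$ and $\beta_3 = -1$ has $m = 2$ restrictions.
– $H_0: \beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$ has $m = 3$ restrictions.

• If the model is

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t,$$

then the null hypothesis

$$H_0: \beta_2 = 0 \text{ and } \beta_3 = 0 \text{ and } \beta_4 = 0$$

states that all of the coefficients except the intercept coefficient are zero. The alternative hypothesis is

$$H_1: \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \text{ or } \beta_4 \neq 0$$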



The Relationship between the t and the F-Distributions

• Any hypothesis that can be tested with a t-test can also be tested using an F-test, but not the other way around.

• For example, consider the hypothesis

$$H_0: \beta_2 = 0.5 \qquad H_1: \beta_2 \neq 0.5$$

We could test this using the usual t-test,

$$\text{test stat} = \frac{\hat\beta_2 - 0.5}{\operatorname{SE}(\hat\beta_2)},$$

or in the F-test framework above, with $m = 1$.

• The two tests (with a two-sided t-test) always give the same result, since the square of a t-distributed variable is F-distributed: if $Z \sim t_{T-K}$, then $Z^2 \sim F(1, T - K)$.
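A numerical check of this equivalence, assuming NumPy and a synthetic sample (the data-generating values and the seed are arbitrary). The squared t-statistic for $H_0: \beta_2 = 0.5$ matches the F-statistic for the same single restriction, where the restricted regression moves $0.5\,x_{2,t}$ to the left-hand side. The helper `ols_rss` is a hypothetical name.

```python
import numpy as np

def ols_rss(y, X):
    """OLS coefficients and residual sum of squares (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
T = 50
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.7 * x2 - 0.3 * x3 + rng.normal(size=T)

# Unrestricted regression: y on a constant, x2 and x3
X = np.column_stack([np.ones(T), x2, x3])
K = X.shape[1]
beta, urss = ols_rss(y, X)

# t-test of H0: beta_2 = 0.5
s2 = urss / (T - K)
se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = (beta[1] - 0.5) / se_b2

# F-test of the same restriction: regress (y - 0.5 x2) on a constant and x3
_, rrss = ols_rss(y - 0.5 * x2, np.column_stack([np.ones(T), x3]))
m = 1
f_stat = (rrss - urss) / urss * (T - K) / m

print(t_stat ** 2, f_stat)  # equal up to floating-point rounding
```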



F-test Example: Question

• Suppose a researcher wants to test whether the returns $y$ on a company stock show unit sensitivity to two factors (factor $x_2$ and factor $x_3$) among the three considered.

• The regression is carried out on 144 monthly observations.

• The regression is

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t.$$

– What are the restricted and unrestricted regressions?

– Given that the RSS of the restricted and unrestricted regressions are 436.1 and 397.2 respectively, perform the test.



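F-test Example: Solution

• Unit sensitivity implies $H_0: \beta_2 = 1$ and $\beta_3 = 1$.

• The unrestricted regression is the one in the question.

• The restricted regression is $(y_t - x_{2,t} - x_{3,t}) = \beta_1 + \beta_4 x_{4,t} + u_t$, or, letting $z_t = y_t - x_{2,t} - x_{3,t}$,

$$z_t = \beta_1 + \beta_4 x_{4,t} + u_t.$$

• In the F-test formula, $T = 144$, $K = 4$, $m = 2$, RRSS = 436.1 and URSS = 397.2.

• The F-test statistic is $\frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} = 6.86$. The 5% and 1% critical values of $F(2, 140)$ are about 3.06 and 4.76 (tables that stop at 120 denominator d.f. give 3.07 and 4.79).

• Conclusion: since 6.86 exceeds both critical values, reject $H_0$ at the 5% and 1% levels.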
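The critical values and a p-value for this example can be checked without tables. For $m = 2$ numerator degrees of freedom the upper tail of the F-distribution has a closed form, $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$ with $\nu = T - K$, so the critical value at level $\alpha$ is $(\nu/2)(\alpha^{-2/\nu} - 1)$. The sketch below relies on that special case only (the function names are hypothetical); for general $m$ a library routine such as `scipy.stats.f.ppf` would be needed instead.

```python
def f2_critical(alpha: float, nu: float) -> float:
    """Upper-alpha critical value of F(2, nu), from P(F > x) = (1 + 2x/nu)**(-nu/2)."""
    return (nu / 2) * (alpha ** (-2 / nu) - 1)


def f2_pvalue(x: float, nu: float) -> float:
    """P(F > x) for F ~ F(2, nu)."""
    return (1 + 2 * x / nu) ** (-nu / 2)


T, K, m = 144, 4, 2
rrss, urss = 436.1, 397.2
nu = T - K
F = (rrss - urss) / urss * nu / m

for alpha in (0.05, 0.01):
    print(f"{alpha:.0%} critical value of F(2, {nu}): {f2_critical(alpha, nu):.2f}")
print(f"F = {F:.2f}, p-value = {f2_pvalue(F, nu):.4f}")
```

This gives critical values of about 3.06 and 4.76 and a p-value of roughly 0.0014; evaluating `f2_critical` at $\nu = 120$ instead reproduces the tabulated 3.07 and 4.79.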


Goodness of Fit

Goodness of Fit Statistics

• The most common goodness of fit statistic is known as $R^2$. One way to define $R^2$ is as the square of the correlation coefficient between $y_t$ and $\hat{y}_t$ (this holds for OLS with a constant term).

• Recall that what we are interested in doing is explaining the variability of $y$ about its mean value $\bar{y}$, i.e., the total sum of squares, TSS:

$$\text{TSS} = \sum_t (y_t - \bar{y})^2$$

• We can split the TSS into two parts: the part which we have explained (known as the explained sum of squares, ESS), and the part which we did not explain using the model (RSS).



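Defining $R^2$

• That is,

$$\text{TSS} = \text{ESS} + \text{RSS}$$

$$\sum_t (y_t - \bar{y})^2 = \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2$$

• Our goodness of fit statistic is

$$R^2 = \frac{\text{ESS}}{\text{TSS}}$$

• But since TSS = ESS + RSS, we can also write

$$R^2 = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}$$

• $R^2$ must always lie between zero and one. To understand this, consider two extremes:

$$\text{RSS} = \text{TSS}, \text{ i.e. } \text{ESS} = 0, \text{ so } R^2 = \frac{\text{ESS}}{\text{TSS}} = 0$$

$$\text{ESS} = \text{TSS}, \text{ i.e. } \text{RSS} = 0, \text{ so } R^2 = \frac{\text{ESS}}{\text{TSS}} = 1$$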
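A minimal NumPy sketch that computes TSS, ESS, RSS and $R^2$ for an OLS fit with a constant and checks the identities above, including $R^2$ as the squared correlation between $y_t$ and $\hat{y}_t$. The helper name `fit_stats` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_stats(y, X):
    """TSS, ESS, RSS, R^2 and fitted values for an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    u_hat = y - y_hat
    tss = np.sum((y - y.mean()) ** 2)
    ess = np.sum((y_hat - y.mean()) ** 2)
    rss = u_hat @ u_hat
    return tss, ess, rss, 1 - rss / tss, y_hat

# Synthetic data; X includes a column of ones, which the decomposition requires
rng = np.random.default_rng(1)
T = 40
x2 = rng.normal(size=T)
y = 2.0 + 0.8 * x2 + rng.normal(size=T)
X = np.column_stack([np.ones(T), x2])

tss, ess, rss, r2, y_hat = fit_stats(y, X)
print(np.isclose(tss, ess + rss), np.isclose(r2, ess / tss),
      np.isclose(r2, np.corrcoef(y, y_hat)[0, 1] ** 2))  # True True True
```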



Problems with $R^2$ as a Goodness of Fit Measure

1 $R^2$ is defined in terms of variation about the mean of $y$, so if a model is reparameterised (rearranged) and the dependent variable changes, $R^2$ will change.

2 $R^2$ never falls if more regressors are added to the regression: OLS could always set the extra coefficient to zero, so the RSS cannot rise. For example, consider:

$$\text{Regression 1}: \quad y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + u_t$$

$$\text{Regression 2}: \quad y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t$$

$R^2$ will always be at least as high for Regression 2 as for Regression 1.
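To see the second problem numerically, the sketch below fits Regression 1 and Regression 2 on synthetic data in which $x_{4,t}$ is pure noise unrelated to $y_t$; $R^2$ still does not fall. The helper `r_squared`, the seed and the coefficients are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 = 1 - RSS/TSS for an OLS regression of y on X (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic data: y depends on x2 and x3 only; x4 is unrelated noise
rng = np.random.default_rng(7)
T = 25
x2, x3, x4 = rng.normal(size=(3, T))
y = 1.0 + 0.5 * x2 - 0.4 * x3 + rng.normal(size=T)

X1 = np.column_stack([np.ones(T), x2, x3])  # Regression 1
X2 = np.column_stack([X1, x4])              # Regression 2 adds x4
r2_1, r2_2 = r_squared(y, X1), r_squared(y, X2)
print(r2_1, r2_2)  # r2_2 is never below r2_1
```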



Adjusted $R^2$

• To get around the second problem, a modification is made to take into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:

$$\bar{R}^2 = 1 - \left[\frac{T - 1}{T - K}\,(1 - R^2)\right]$$

• So if we add an extra regressor, $K$ increases, and unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.

• But there are still problems with the criterion: there is no distribution for $R^2$ or $\bar{R}^2$, so neither can be used for a formal hypothesis test.
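A small sketch of the adjustment, with hypothetical figures: with $T = 15$, going from $K = 3$ to $K = 4$ while $R^2$ rises only from 0.80 to 0.81 makes $\bar{R}^2$ fall. The function name `adjusted_r2` is an assumption for illustration.

```python
def adjusted_r2(r2: float, T: int, K: int) -> float:
    """Adjusted R^2 = 1 - (T - 1) / (T - K) * (1 - R^2)."""
    if T <= K:
        raise ValueError("need T > K")
    return 1 - (T - 1) / (T - K) * (1 - r2)


# Hypothetical figures: a fourth parameter lifts R^2 from 0.80 to 0.81
before = adjusted_r2(0.80, T=15, K=3)
after = adjusted_r2(0.81, T=15, K=4)
print(round(before, 4), round(after, 4))  # 0.7667 0.7582
```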



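Akaike Information Criterion (1973)

• For i.i.d. normally distributed errors (and up to an additive constant),

$$\text{AIC} = T \ln\left(\frac{\text{RSS}}{T}\right) + 2K.$$

• For small sample sizes ($T/K \le 40$), use the second-order AIC:

$$\text{AIC}_c = \text{AIC} + \frac{2K(K + 1)}{T - K - 1}$$

• When comparing models fitted to the same data, prefer the one with the smallest AIC: the $2K$ penalty grows with the number of parameters, so a low AIC indicates a good fit without over-fitting the data.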
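Both criteria as small functions (hypothetical names), applied to the earlier numerical illustration ($\text{RSS} = 10.96$, $T = 15$, $K = 3$). There $T/K = 5$, so the second-order version applies. The values are only meaningful when compared with other models fitted to the same dependent variable and sample.

```python
import math


def aic(rss: float, T: int, K: int) -> float:
    """AIC = T ln(RSS / T) + 2K (i.i.d. normal errors, additive constants dropped)."""
    return T * math.log(rss / T) + 2 * K


def aicc(rss: float, T: int, K: int) -> float:
    """Second-order AIC: AIC + 2K(K + 1) / (T - K - 1)."""
    if T - K - 1 <= 0:
        raise ValueError("AICc needs T > K + 1")
    return aic(rss, T, K) + 2 * K * (K + 1) / (T - K - 1)


# Numerical illustration: RSS = 10.96, T = 15, K = 3
print(round(aic(10.96, 15, 3), 3), round(aicc(10.96, 15, 3), 3))
```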

