F Test and Goodness of Fit
Investment and Financial Data Analysis 2017
Christopher Ting
http://mysmu.edu.sg/faculty/christophert/
Lee Kong Chian School of Business, Singapore Management University
February 10, 2017
QF302 Week 6
Table of Contents
1 Tutorial on Multiple Linear Regression
2 F Tests
3 Goodness of Fit
Tutorial on Multiple Linear Regression
Multiple Linear Regression and the Constant Term
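- Model 2: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$. For $t = 1, 2, \ldots, T$,
$$ y_t = \beta_1 + \beta_2 X_{2,t} + \beta_3 X_{3,t} + \cdots + \beta_K X_{K,t} + u_t. $$
  There are $K$ parameters, $\beta_1, \beta_2, \ldots, \beta_K$.
- The first parameter is the y-intercept in Model 1, and the average of $y$ in Model 0, with $X_{1,t} = 1$ being a constant for all $t$:
$$ \mathbf{X}_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} $$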
Model 1: Simple Linear Regression
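- If $K = 2$, we are back to Model 1:
$$
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}}_{T \times 1}
=
\underbrace{\begin{pmatrix} 1 & x_{2,1} \\ 1 & x_{2,2} \\ \vdots & \vdots \\ 1 & x_{2,T} \end{pmatrix}}_{T \times 2}
\underbrace{\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}_{2 \times 1}
+
\underbrace{\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{pmatrix}}_{T \times 1}
$$
- Notice that the matrices written in this way are conformable: a $T \times 2$ matrix times a $2 \times 1$ vector gives a $T \times 1$ vector.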
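As a rough illustration of conformability, the sketch below builds the $T \times 2$ matrix for Model 1 and checks the dimensions of the product. The data values and the parameter vector are hypothetical, invented only for this example.

```python
# Hypothetical regressor and response values, invented purely for illustration.
x2 = [0.5, 1.2, -0.3, 2.0]
y = [1.1, 2.3, 0.2, 3.9]
T = len(y)

# Each row of X is (1, x_{2,t}); the leading 1 is the constant regressor X_{1,t}.
X = [[1.0, x2_t] for x2_t in x2]


def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


# Hypothetical 2 x 1 parameter vector (beta_1, beta_2)'.
beta = [[0.4], [1.7]]

# Matrix product X beta: (T x 2)(2 x 1) gives a T x 1 column.
X_beta = [[sum(X[t][k] * beta[k][0] for k in range(2))] for t in range(T)]

print(shape(X), shape(beta), shape(X_beta))
```

The inner dimensions (2 and 2) match, which is what makes the product defined.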
Numerical Illustration: Data
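- When $K = 3$,
$$ y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + u_t $$
  for $t = 1, 2, \ldots, 15$, suppose that
$$
(\mathbf{X}'\mathbf{X})^{-1} =
\begin{pmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{pmatrix},
\qquad
\mathbf{X}'\mathbf{y} = \begin{pmatrix} -3.0 \\ 2.2 \\ 0.6 \end{pmatrix}
$$
- The residual sum of squares (RSS) is $\hat{\mathbf{u}}'\hat{\mathbf{u}} = 10.96$.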
Calculations in Simple Regression
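- To calculate the coefficients, multiply the matrix $(\mathbf{X}'\mathbf{X})^{-1}$ by the vector $\mathbf{X}'\mathbf{y}$ to obtain $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$:
$$
\begin{pmatrix} 2.0 & 3.5 & -1.0 \\ 3.5 & 1.0 & 6.5 \\ -1.0 & 6.5 & 4.3 \end{pmatrix}
\begin{pmatrix} -3.0 \\ 2.2 \\ 0.6 \end{pmatrix}
=
\begin{pmatrix} 1.10 \\ -4.40 \\ 19.88 \end{pmatrix}
$$
- To calculate the standard errors, we need an estimate of $\sigma_u^2$:
$$ s^2 = \frac{\mathrm{RSS}}{T - K} = \frac{10.96}{15 - 3} = 0.91 $$
- The variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ is given by
$$
s^2 (\mathbf{X}'\mathbf{X})^{-1} = 0.91\,(\mathbf{X}'\mathbf{X})^{-1} =
\begin{pmatrix} 1.82 & 3.19 & -0.91 \\ 3.19 & 0.91 & 5.92 \\ -0.91 & 5.92 & 3.91 \end{pmatrix}
$$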
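The arithmetic of this numerical illustration can be retraced with a few lines of plain Python. This is only a sketch using the matrix, vector, RSS, $T$ and $K$ given on the slides; the variable names are chosen here for illustration. Note that the code keeps $s^2$ unrounded, whereas the slide rounds it to 0.91 before multiplying, so some entries of the covariance matrix can differ in the second decimal place.

```python
# Inputs from the numerical illustration (T = 15 observations, K = 3 parameters).
XtX_inv = [
    [2.0, 3.5, -1.0],
    [3.5, 1.0, 6.5],
    [-1.0, 6.5, 4.3],
]
Xty = [-3.0, 2.2, 0.6]
rss, T, K = 10.96, 15, 3

# Coefficient estimates: beta_hat = (X'X)^{-1} X'y, one dot product per row.
beta_hat = [sum(a * b for a, b in zip(row, Xty)) for row in XtX_inv]

# Estimate of the error variance, s^2 = RSS / (T - K).
s2 = rss / (T - K)

# Estimated variance-covariance matrix of beta_hat: s^2 (X'X)^{-1}.
cov_beta = [[s2 * a for a in row] for row in XtX_inv]

print([round(b, 2) for b in beta_hat])
print(round(s2, 2))
for row in cov_beta:
    print([round(v, 2) for v in row])
```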
Standard Errors and Estimated Model
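- The variances are on the leading diagonal:
$$
\begin{aligned}
\mathrm{var}(\hat{\beta}_1) &= 1.82 &\iff \mathrm{SE}(\hat{\beta}_1) &= 1.35 \\
\mathrm{var}(\hat{\beta}_2) &= 0.91 &\iff \mathrm{SE}(\hat{\beta}_2) &= 0.95 \\
\mathrm{var}(\hat{\beta}_3) &= 3.91 &\iff \mathrm{SE}(\hat{\beta}_3) &= 1.98
\end{aligned}
$$
- We write the estimated model with the standard errors in parentheses beneath the coefficients:
$$
\hat{y} = \underset{(1.35)}{1.10} \; - \; \underset{(0.95)}{4.40}\,x_2 \; + \; \underset{(1.98)}{19.88}\,x_3
$$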
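A short sketch of the last step: the standard errors are the square roots of the diagonal variances. The t-ratios for $H_0: \beta_j = 0$ are not reported on the slide; they are added here only as an illustration of how the standard errors are typically used.

```python
import math

beta_hat = [1.10, -4.40, 19.88]   # coefficient estimates from the slide
variances = [1.82, 0.91, 3.91]    # leading diagonal of s^2 (X'X)^{-1}

# Standard error = square root of the corresponding variance.
std_errors = [math.sqrt(v) for v in variances]

# t-ratio for H0: beta_j = 0 is the estimate divided by its standard error.
t_ratios = [b / se for b, se in zip(beta_hat, std_errors)]

for j, (b, se, t) in enumerate(zip(beta_hat, std_errors, t_ratios), start=1):
    print(f"beta_{j}: estimate={b:.2f}, SE={se:.2f}, t-ratio={t:.2f}")
```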
F Tests
Testing Multiple Hypotheses: The F-test
- We have used the t-test to test single hypotheses, i.e. hypotheses involving only one coefficient. But what if we want to test more than one coefficient simultaneously?
- Answer: the F-test, which involves estimating two regressions:
- The unrestricted regression is the one in which the coefficients are freely determined by the data, as we have done before.
- The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some parameters of $\boldsymbol{\beta}$.
An F-test Example
- The general regression is
$$ y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t $$
- We want to test the restriction that $\beta_3 + \beta_4 = 1$, as we have some theory suggesting this relationship.
- The unrestricted regression is the general regression above, but what is the restricted regression?
$$ y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t \quad \text{s.t. } \beta_3 + \beta_4 = 1 $$
- We substitute the restriction ($\beta_3 + \beta_4 = 1$) into the regression so that it is automatically imposed on the data:
$$ \beta_3 + \beta_4 = 1 \;\Rightarrow\; \beta_4 = 1 - \beta_3 $$
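The F-test: Forming the Restricted Regression
- Hence, the restricted regression is
$$
\begin{aligned}
y_t &= \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + (1 - \beta_3) x_{4,t} + u_t \\
\Longrightarrow \quad y_t &= \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + x_{4,t} - \beta_3 x_{4,t} + u_t \\
\Longrightarrow \quad (y_t - x_{4,t}) &= \beta_1 + \beta_2 x_{2,t} + \beta_3 (x_{3,t} - x_{4,t}) + u_t
\end{aligned}
$$
- For estimation, we create two new variables, $P_t$ and $Q_t$:
$$ P_t = y_t - x_{4,t}, \qquad Q_t = x_{3,t} - x_{4,t} $$
- So
$$ P_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 Q_t + u_t $$
  is the restricted regression we actually estimate.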
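The substitution can be checked numerically. The sketch below uses hypothetical data generated without an error term from coefficients that satisfy $\beta_3 + \beta_4 = 1$, builds $P_t$ and $Q_t$, and confirms that the restricted regression equation holds for every observation. All numbers are assumptions made for this example.

```python
# Hypothetical coefficients satisfying the restriction beta3 + beta4 = 1.
beta1, beta2, beta3 = 0.5, 1.5, 0.3
beta4 = 1.0 - beta3

# Hypothetical regressors; y is generated with u_t = 0 so the check is exact
# up to floating-point rounding.
x2 = [1.0, 2.0, 0.5, 3.0, 1.5]
x3 = [0.2, 1.1, 0.7, 2.4, 0.9]
x4 = [1.3, 0.4, 2.2, 0.8, 1.6]
y = [beta1 + beta2 * a + beta3 * b + beta4 * c for a, b, c in zip(x2, x3, x4)]

# Transformed variables used in the restricted regression.
P = [yt - x4t for yt, x4t in zip(y, x4)]
Q = [x3t - x4t for x3t, x4t in zip(x3, x4)]

# With u_t = 0, P_t should equal beta1 + beta2 * x2_t + beta3 * Q_t.
rhs = [beta1 + beta2 * a + beta3 * q for a, q in zip(x2, Q)]
max_gap = max(abs(p - r) for p, r in zip(P, rhs))
print(max_gap)
```

In practice one would regress `P` on a constant, `x2` and `Q` and record the resulting RSS as the restricted RSS.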
Calculating the F-Test Statistic
- The test statistic is given by
$$ \text{test statistic} = \frac{\mathrm{RRSS} - \mathrm{URSS}}{\mathrm{URSS}} \times \frac{T - K}{m} $$
  where
  URSS = RSS from the unrestricted regression
  RRSS = RSS from the restricted regression
  m = number of restrictions
  T = number of observations
  K = number of regressors in the unrestricted regression, including a constant (or the total number of parameters to be estimated).
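The formula translates directly into a small helper. This is a minimal sketch: the function name `f_statistic` and the input checks are choices made here, and the numbers in the usage line are hypothetical.

```python
def f_statistic(rrss, urss, T, K, m):
    """F-test statistic: (RRSS - URSS) / URSS * (T - K) / m."""
    if urss <= 0:
        raise ValueError("URSS must be positive")
    if m <= 0:
        raise ValueError("the number of restrictions m must be positive")
    if T <= K:
        raise ValueError("need more observations than parameters (T > K)")
    return (rrss - urss) / urss * (T - K) / m


# Hypothetical inputs: RRSS = 120, URSS = 100, T = 50, K = 4, m = 2.
example = f_statistic(120.0, 100.0, T=50, K=4, m=2)
print(round(example, 2))
```

Because imposing restrictions cannot lower the residual sum of squares, RRSS is at least as large as URSS and the statistic is non-negative.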
The F-Distribution
- The test statistic follows the F-distribution, which has a pair of degrees-of-freedom parameters.
- The values of the degrees-of-freedom parameters are $m$ and $T - K$, respectively.
- Note that the order of the d.f. parameters is important.
- The F-distribution takes only positive values and is not symmetrical.
- We therefore reject the null only if the test statistic exceeds the critical F-value.
Determining the Number of Restrictions
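- Examples:

  H0: hypothesis                                        Number of restrictions, m
  $\beta_1 + \beta_2 = 2$                               1
  $\beta_2 = 1$ and $\beta_3 = -1$                      2
  $\beta_2 = 0$, $\beta_3 = 0$ and $\beta_4 = 0$        3

- If the model is
$$ y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t, $$
  then the null hypothesis in the last example is
$$ H_0: \beta_2 = 0 \text{ and } \beta_3 = 0 \text{ and } \beta_4 = 0, $$
  i.e., all of the coefficients except the intercept coefficient are zero.
- Alternative hypothesis:
$$ H_1: \beta_2 \neq 0 \text{ or } \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 $$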
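For this last null hypothesis the F-statistic can be written in terms of $R^2$. The following is a sketch of that standard result, not something stated on the slides; it uses the test-statistic formula given earlier together with the definitions of TSS and $R^2$ from the Goodness of Fit section. Under $H_0$ the restricted regression is $y_t = \beta_1 + u_t$, whose OLS fit is $\bar{y}$, so RRSS equals TSS, and the number of restrictions is $m = K - 1$.

```latex
\begin{aligned}
F &= \frac{\mathrm{RRSS} - \mathrm{URSS}}{\mathrm{URSS}} \times \frac{T - K}{m}
   = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{RSS}} \times \frac{T - K}{K - 1} \\
  &= \frac{1 - \mathrm{RSS}/\mathrm{TSS}}{\mathrm{RSS}/\mathrm{TSS}} \times \frac{T - K}{K - 1}
   = \frac{R^2}{1 - R^2} \times \frac{T - K}{K - 1}
\end{aligned}
```

Here RSS denotes the residual sum of squares of the unrestricted regression, and the statistic is compared with an $F(K - 1, T - K)$ critical value.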
The Relationship between the t and the F-Distributions
- Any hypothesis which could be tested with a t-test could have been tested using an F-test, but not the other way around.
- For example, consider the hypothesis
$$ H_0: \beta_2 = 0.5, \qquad H_1: \beta_2 \neq 0.5 $$
  We could have tested this using the usual t-test:
$$ \text{test stat} = \frac{\hat{\beta}_2 - 0.5}{\mathrm{SE}(\hat{\beta}_2)} $$
  or it could be tested in the framework above for the F-test.
- Note that the two tests always give the same result, since the square of a t-distributed variable follows an F-distribution.
- For example, if we have some random variable $Z$ with $Z \sim t_{T-K}$, then also $Z^2 \sim F(1, T - K)$.
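The equivalence can be seen numerically in a simple regression. The sketch below uses a small hypothetical data set, tests $H_0: \beta_2 = 0.5$ both ways, and compares the squared t-statistic with the F-statistic. The data and variable names are assumptions made for illustration.

```python
# Hypothetical data for the simple regression y_t = beta1 + beta2 * x_t + u_t.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.4, 1.9, 3.2, 3.4, 4.1, 5.3]
T, K, m = len(y), 2, 1
beta2_null = 0.5

x_bar, y_bar = sum(x) / T, sum(y) / T
sxx = sum((xt - x_bar) ** 2 for xt in x)
sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# Unrestricted OLS fit and its residual sum of squares.
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
urss = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))

# t-test of H0: beta2 = 0.5, with SE(b2) = sqrt(s^2 / Sxx).
s2 = urss / (T - K)
t_stat = (b2 - beta2_null) / (s2 / sxx) ** 0.5

# Restricted fit: impose beta2 = 0.5 and regress (y_t - 0.5 x_t) on a constant.
z = [yt - beta2_null * xt for xt, yt in zip(x, y)]
z_bar = sum(z) / T
rrss = sum((zt - z_bar) ** 2 for zt in z)
f_stat = (rrss - urss) / urss * (T - K) / m

print(t_stat ** 2, f_stat)
```

The two printed numbers agree up to floating-point rounding, which is the sample counterpart of $Z^2 \sim F(1, T - K)$.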
F-test Example: Question
- Suppose a researcher wants to test whether the returns $y$ on a company stock show unit sensitivity to two factors (factor $x_2$ and factor $x_3$) among three considered.
- The regression is carried out on 144 monthly observations.
- The regression is
$$ y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t. $$
  - What are the restricted and unrestricted regressions?
  - Given that the two RSS are 436.1 and 397.2 respectively, perform the test.
F-test Example: Solution
- Unit sensitivity implies $H_0: \beta_2 = 1$ and $\beta_3 = 1$.
- The unrestricted regression is the one in the question.
- The restricted regression is $(y_t - x_{2,t} - x_{3,t}) = \beta_1 + \beta_4 x_{4,t} + u_t$ or, letting $z_t = y_t - x_{2,t} - x_{3,t}$, the restricted regression is
$$ z_t = \beta_1 + \beta_4 x_{4,t} + u_t $$
- In the F-test formula, $T = 144$, $K = 4$, $m = 2$, RRSS = 436.1, URSS = 397.2.
- The F-test statistic is
$$ \frac{436.1 - 397.2}{397.2} \times \frac{144 - 4}{2} = 6.86. $$
  The critical values for $F(2, 140)$ are 3.07 (5%) and 4.79 (1%).
- Conclusion: Reject $H_0$, since 6.86 exceeds both critical values.
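The solution can be retraced in a few lines. This sketch takes the RSS values, $T$, $K$, $m$ and the two critical values exactly as quoted on the slide; it does not compute the critical values itself.

```python
rrss, urss = 436.1, 397.2   # restricted and unrestricted RSS from the question
T, K, m = 144, 4, 2

f_stat = (rrss - urss) / urss * (T - K) / m

# Critical values as quoted on the slide for F(2, 140).
critical = {"5%": 3.07, "1%": 4.79}

for level, value in critical.items():
    decision = "reject H0" if f_stat > value else "do not reject H0"
    print(f"{level}: F = {f_stat:.2f} vs critical {value} -> {decision}")
```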
Goodness of Fit
Goodness of Fit Statistics
- The most common goodness of fit statistic is known as $R^2$. One way to define $R^2$ is to say that it is the square of the correlation coefficient between $y_t$ and $\hat{y}_t$.
- Recall that what we are interested in doing is explaining the variability of $y$ about its mean value, i.e., the total sum of squares, TSS:
$$ \mathrm{TSS} = \sum_t (y_t - \bar{y})^2 $$
- We can split the TSS into two parts: the part which we have explained, known as the explained sum of squares (ESS), and the part which we did not explain using the model (RSS).
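Defining $R^2$
- That is,
$$
\begin{aligned}
\mathrm{TSS} &= \mathrm{ESS} + \mathrm{RSS} \\
\sum_t (y_t - \bar{y})^2 &= \sum_t (\hat{y}_t - \bar{y})^2 + \sum_t \hat{u}_t^2
\end{aligned}
$$
- Our goodness of fit statistic is $R^2 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}}$.
- But since TSS = ESS + RSS, we can also write
$$ R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} $$
- For a regression that includes a constant, $R^2$ must always lie between zero and one. To understand this, consider two extremes:
  RSS = TSS, i.e. ESS = 0, so $R^2 = \mathrm{ESS}/\mathrm{TSS} = 0$
  ESS = TSS, i.e. RSS = 0, so $R^2 = \mathrm{ESS}/\mathrm{TSS} = 1$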
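The decomposition can be verified on a small example. The sketch below fits a simple regression with a constant to hypothetical data and checks that TSS = ESS + RSS and that the two expressions for $R^2$ coincide. The data are invented for illustration.

```python
# Hypothetical data for a simple regression with a constant.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.4, 1.9, 3.2, 3.4, 4.1, 5.3]
T = len(y)

x_bar, y_bar = sum(x) / T, sum(y) / T
sxx = sum((xt - x_bar) ** 2 for xt in x)
sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# OLS slope and intercept, then fitted values.
b2 = sxy / sxx
b1 = y_bar - b2 * x_bar
fitted = [b1 + b2 * xt for xt in x]

tss = sum((yt - y_bar) ** 2 for yt in y)                 # total sum of squares
ess = sum((ft - y_bar) ** 2 for ft in fitted)            # explained sum of squares
rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))   # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss
print(tss, ess + rss, r2_from_ess, r2_from_rss)
```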
Problems with $R^2$ as a Goodness of Fit Measure
1 $R^2$ is defined in terms of variation about the mean of $y$, so if a model is reparameterised (rearranged) and the dependent variable changes, $R^2$ will change.
2 $R^2$ never falls if more regressors are added to the regression. For example, consider:
   Regression 1: $y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + u_t$
   Regression 2: $y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + u_t$
   $R^2$ will always be at least as high for regression 2 as for regression 1.
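The second point can be illustrated with a small experiment. The sketch below fits Regression 1 and Regression 2 by solving the normal equations in plain Python and compares the two $R^2$ values. The data, the helper names `solve` and `r_squared`, and the choice of Gauss-Jordan elimination are all assumptions made for this example; a numerical library would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 from an OLS fit of y on a constant plus the given regressor columns."""
    T = len(y)
    X = [[1.0] + [col[t] for col in columns] for t in range(T)]
    k = len(X[0])
    XtX = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)]
           for i in range(k)]
    Xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / T
    rss = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    tss = sum((yt - y_bar) ** 2 for yt in y)
    return 1.0 - rss / tss


# Hypothetical data; x4 is deliberately unrelated to how y was constructed.
y = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [0.5, -1.0, 2.0, 0.3, -0.7, 1.5, -0.2, 0.9]
x4 = [2.0, 1.0, 0.0, 3.0, 1.0, 2.0, 0.0, 1.0]

r2_reg1 = r_squared([x2, x3], y)
r2_reg2 = r_squared([x2, x3, x4], y)
print(r2_reg1, r2_reg2)
```

Even though `x4` carries no intended information about `y`, the second $R^2$ is not lower than the first.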
Adjusted $R^2$
- In order to get around these problems, a modification is made to take into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
$$ \bar{R}^2 = 1 - \left[ \frac{T - 1}{T - K} \left( 1 - R^2 \right) \right] $$
- So if we add an extra regressor, $K$ increases, and unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.
- But there are still problems with the criterion: there is no distribution for $\bar{R}^2$ or $R^2$.
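A minimal sketch of the adjustment, with hypothetical numbers: adding a fourth parameter that raises $R^2$ only from 0.800 to 0.805 with $T = 15$ lowers the adjusted value. The function name `adjusted_r2` is chosen here for illustration.

```python
def adjusted_r2(r2, T, K):
    """Adjusted R^2 = 1 - (T - 1) / (T - K) * (1 - R^2)."""
    if T <= K:
        raise ValueError("need more observations than parameters (T > K)")
    return 1.0 - (T - 1) / (T - K) * (1.0 - r2)


# Hypothetical fits on T = 15 observations.
before = adjusted_r2(0.800, T=15, K=3)   # three parameters
after = adjusted_r2(0.805, T=15, K=4)    # one extra regressor, small R^2 gain

print(round(before, 4), round(after, 4))
```

Here the small gain in $R^2$ does not offset the lost degree of freedom, so the adjusted measure falls.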
Akaike Information Criterion (1973)
- For i.i.d. normally distributed errors,
$$ \mathrm{AIC} = T \ln\!\left( \frac{\mathrm{RSS}}{T} \right) + 2K. $$
- For small sample sizes ($T/K \le 40$), use the second-order AIC:
$$ \mathrm{AIC}_c = \mathrm{AIC} + \frac{2K(K + 1)}{T - K - 1} $$
- The smaller the AIC, the better the model is at fitting the data without over-fitting it.
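Both criteria are straightforward to compute. The sketch below applies them to the numbers from the earlier numerical illustration (RSS = 10.96, $T = 15$, $K = 3$); since $T/K = 5 \le 40$, the second-order version would be the one to use there. The function names are chosen here for illustration.

```python
import math


def aic(rss, T, K):
    """AIC = T * ln(RSS / T) + 2K, for i.i.d. normally distributed errors."""
    return T * math.log(rss / T) + 2 * K


def aicc(rss, T, K):
    """Second-order AIC: AIC + 2K(K + 1) / (T - K - 1)."""
    if T - K - 1 <= 0:
        raise ValueError("need T > K + 1 for the small-sample correction")
    return aic(rss, T, K) + 2 * K * (K + 1) / (T - K - 1)


rss, T, K = 10.96, 15, 3
print(round(aic(rss, T, K), 3), round(aicc(rss, T, K), 3))
```

When comparing candidate models fitted to the same data, the one with the smaller criterion value is preferred.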