
CHAPTER 4: Forecasting by Regression


Prof. Alan Wan


Table of contents

1. Revision of Linear Regression

2. Multicollinearity

3. Autocorrelation
   3.1 First-order Autocorrelation and the Durbin-Watson Test
   3.2 Correction for Autocorrelation

4. Seemingly Unrelated Regression Equations


Revision of Linear Regression

- One main purpose of regression is to forecast an outcome, also called the response variable or dependent variable, based on certain factors, also called explanatory variables or regressors.

- The outcome has to be quantitative, but the explanatory variables can be either quantitative or qualitative.

- Linear regression postulates a linear association between the response and each of the explanatory variables; simple regression deals with situations with one explanatory variable, whereas multiple regression tackles cases with more than one regressor.


- A multiple linear regression model may be expressed as:

  Yt = β0 + β1X1t + β2X2t + β3X3t + · · · + βkXkt + εt,

  where εt ∼ N(0, σ²). Hence

  E(Yt) = β0 + β1X1t + β2X2t + β3X3t + · · · + βkXkt

- The estimated sample multiple linear regression model is thus

  Ŷt = b0 + b1X1t + b2X2t + b3X3t + · · · + bkXkt,

  where b0, b1, · · · , bk are the ordinary least squares (O.L.S.) estimators of β0, β1, · · · , βk respectively, obtained by the criterion

  min ∑ₜ₌₁ⁿ et² = min ∑ₜ₌₁ⁿ (Yt − Ŷt)²
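In practice such a model is estimated with standard regression software. The sketch below shows a minimal PROC REG call of the kind used throughout this chapter; the data set name mydata and the variable names y, x1, x2 and x3 are hypothetical placeholders, not names taken from the examples that follow.

   /* Minimal O.L.S. fit with PROC REG (data set and variable names are    */
   /* hypothetical placeholders).                                          */
   proc reg data=mydata;
      model y = x1 x2 x3;   /* prints the ANOVA table, R-square and t-tests */
   run;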


- A slope coefficient represents the marginal change of Yt with respect to a one-unit change in the corresponding explanatory variable.

- The linear regression model assumes
  1. that there is a linear association between the response and each of the explanatory variables
  2. that E(εt) = 0 for all t, meaning that no relevant explanatory variable has been omitted
  3. that the disturbances are homoscedastic, i.e., var(εt) = σ² for all t
  4. that the disturbances are uncorrelated, i.e., cov(εt, εt+s) = 0 for all t and s ≠ 0
  5. the absence of perfect multicollinearity, i.e., no exact linear association exists among the explanatory variables
  6. normality of the εt's (this assumption is needed only when conducting inference)


- The O.L.S. estimator bj is a "linear" estimator of βj for j = 0, · · · , k, because each bj can be written as a linear combination of the Yt's weighted by a mixture of the values of the Xt's.

- When Assumptions 1.-5. are fulfilled, bj is the best linear unbiased estimator (B.L.U.E.) of βj, meaning that the linear estimator bj is unbiased (i.e., E(bj) = βj for j = 0, · · · , k) and bj has the smallest variance (and hence the highest average precision) of all linear unbiased estimators of βj.

- The theorem establishing this result is known as the Gauss-Markov Theorem.


- Common model diagnostics include
  1. t-tests of significance of individual coefficients
  2. F test of model significance
  3. R² and adjusted-R² for goodness of fit
  4. test of autocorrelation (usually for time series data)
  5. test of homoscedasticity (usually for cross-section data)
  6. test of autoregressive conditional heteroscedasticity (usually for financial time series data)
  7. detection of outliers
  8. test of normality of errors
  9. test of coefficient constancy (structural change)
  and others.


The following example with n = 34 annual observations is taken from Griffiths, Hill and Judge (1993). We are concerned with the area of sugarcane (A, in thousands of hectares) planted in a region of Bangladesh. By using area planted instead of quantity produced as the dependent variable, we eliminate yield uncertainty. It is thought that when farmers decide on an area for sugarcane production, their decision is largely determined by the price of sugarcane (PS, in taka/tonne) and that of its main substitute, jute (PJ, in taka/tonne). Assuming a log-linear functional form for constant elasticity, we specify the model as

  lnAt = β0 + β1 lnPSt + β2 lnPJt + εt
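The results reported next come from PROC REG. The sketch below indicates the kind of SAS code that could produce them; the data set name sugarcane and the raw variable names a, ps and pj are assumptions, while lna, lnps and lnpj are the variable names appearing in the output.

   /* Hypothetical data step creating the logged variables, then the O.L.S. fit. */
   /* Assumes a data set "sugarcane" containing a, ps and pj already exists.     */
   data sugarcane;
      set sugarcane;
      lna  = log(a);     /* log of area planted    */
      lnps = log(ps);    /* log of sugarcane price */
      lnpj = log(pj);    /* log of jute price      */
   run;

   proc reg data=sugarcane;
      model lna = lnps lnpj;
   run;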


PROC REG of SAS produces the following results:

The REG Procedure
Model: MODEL1
Dependent Variable: lna
Number of Observations Read    34
Number of Observations Used    34

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2           7.93746        3.96873      25.35    <.0001
Error              31           4.85254        0.15653
Corrected Total    33          12.79000

Root MSE          0.39564    R-Square    0.6206
Dependent Mean    4.66765    Adj R-Sq    0.5961
Coeff Var         8.47628

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               3.96334           2.96552       1.34      0.1911
lnps           1               0.99396           0.14244       6.98      <.0001
lnpj           1              -0.68526           0.45964      -1.49      0.1461


- The estimated regression equation is thus

  lnAt = 3.96334 + 0.99396 lnPSt − 0.68526 lnPJt
        (2.96552)  (0.14244)      (0.45964)

  with estimated standard errors shown in parentheses.

- A test of H0 : β1 = 0 vs. H1 : β1 ≠ 0 yields t = (b1 − 0)/s.e.(b1) = 0.99396/0.14244 = 6.98, with a p-value of < 0.0001. Hence β1 is significantly different from zero, and lnPS is therefore a significant explanatory variable.

- However, the same cannot be said about β2 or lnPJ.


- A test of H0 : β1 = β2 = 0 vs. H1 : otherwise by the F test:

  F = (RSS/k) / (ESS/(n − (k + 1))) = (7.93746/2) / (4.85254/31) = 25.35

  with a p-value of < 0.0001, confirming the overall significance of the model.

- Question: Why should we test the overall significance of the model in addition to testing individual regressors' significance?

- R² = 0.6206, meaning that the estimated regression can explain 62.06% of the variability of lnA in the sample; after adjusting for the model's d.o.f., the explanatory power of the model is 59.61%, as indicated by the adjusted-R².


Removing the insignificant lnPJ and re-running the regression yields:

The REG Procedure
Model: MODEL1
Dependent Variable: lna
Number of Observations Read    34
Number of Observations Used    34

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1           7.58954        7.58954      46.70    <.0001
Error              32           5.20046        0.16251
Corrected Total    33          12.79000

Root MSE          0.40313    R-Square    0.5934
Dependent Mean    4.66765    Adj R-Sq    0.5807
Coeff Var         8.63669

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1              -0.32540           0.73391      -0.44      0.6605
lnps           1               0.90903           0.13302       6.83      <.0001


- Note that R² decreases by 4.38% from 0.6206 to 0.5934, whereas adjusted-R² decreases by 2.58% from 0.5961 to 0.5807. Recall that when explanatory variables are dropped (added), R² always falls (rises), but adjusted-R² may rise or fall; when the model contains fewer (more) variables, adjusted-R² will rise (drop) if the increase (decrease) in d.o.f. due to the omission (addition) of variables outweighs the fall (rise) in the explanatory power of the regression.

- For the simple linear regression model, the t statistic for H0 : β1 = 0 is 6.834, which equals √F = √46.7 (6.834² ≈ 46.7). This result does not hold for multiple regression.


Multicollinearity

- There is another serious consequence of adding too many variables to a model besides depleting the model's d.o.f. If a model has several variables, it is likely that some of the variables will be strongly correlated. This problem, known as multicollinearity, can drastically alter the results from one model to another, making them harder to interpret.

- The most extreme form of multicollinearity is perfect multicollinearity. It refers to the situation where an explanatory variable can be expressed as an exact linear combination of some of the others. Under perfect multicollinearity, O.L.S. fails to produce estimates of the coefficients. A classic example of perfect multicollinearity is the "dummy variable trap".


- (Imperfect) multicollinearity is also known as near collinearity: the explanatory variables are linearly correlated but they do not obey an exact linear relationship.

- Consider the following three models explaining the relationship between HOUSING (number of housing starts, in thousands, in the U.S.) and POP (U.S. population in millions), GDP (U.S. Gross Domestic Product in billions of dollars) and INTRATE (new home mortgage interest rate) from 1963 to 1985:

  1) HOUSINGt = β0 + β1POPt + β2INTRATEt + εt
  2) HOUSINGt = β0 + β3GDPt + β2INTRATEt + εt
  3) HOUSINGt = β0 + β1POPt + β2INTRATEt + β3GDPt + εt
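A sketch of how these three specifications could be fitted in a single PROC REG step is given below; the data set name ushousing is an assumption, while the variable names housing, pop, gdp and intrate match those in the output that follows.

   /* Fit the three housing-start specifications (data set name assumed). */
   proc reg data=ushousing;
      model housing = pop intrate;        /* model 1 */
      model housing = gdp intrate;        /* model 2 */
      model housing = pop gdp intrate;    /* model 3 */
   run;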


- Results for the first model:

The REG Procedure
Model: MODEL1
Dependent Variable: housing
Number of Observations Read    23
Number of Observations Used    23

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2           1125359         562679       7.50    0.0037
Error              20           1500642          75032
Corrected Total    22           2626001

Root MSE           273.91987    R-Square    0.4285
Dependent Mean    1601.07826    Adj R-Sq    0.3714
Coeff Var           17.10846

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1          -3813.21672        1588.88417      -2.40      0.0263
pop            1             33.82138           9.37464       3.61      0.0018
intrate        1           -198.41880          51.29444      -3.87      0.0010


- Results for the second model:

The REG Procedure
Model: MODEL1
Dependent Variable: housing
Number of Observations Read    23
Number of Observations Used    23

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2           1134747         567374       7.61    0.0035
Error              20           1491254          74563
Corrected Total    22           2626001

Root MSE           273.06168    R-Square    0.4321
Dependent Mean    1601.07826    Adj R-Sq    0.3753
Coeff Var           17.05486

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1            687.92418         382.69637       1.80      0.0874
gdp            1              0.90543           0.24899       3.64      0.0016
intrate        1           -169.67320          43.83996      -3.87      0.0010


- Results from Models 1) and 2) both make sense: the coefficient estimates have the expected signs (β1 > 0, β2 < 0 and β3 > 0) and the coefficients are all highly significant.

- Consider the third model, which combines the regressors of the first and second models:

The REG Procedure
Model: MODEL1
Dependent Variable: housing
Number of Observations Read    23
Number of Observations Used    23

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3           1147699         382566       4.92    0.0108
Error              19           1478302          77805
Corrected Total    22           2626001

Root MSE           278.93613    R-Square    0.4371
Dependent Mean    1601.07826    Adj R-Sq    0.3482
Coeff Var           17.42177

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1          -1317.45317        4930.68042      -0.27      0.7922
pop            1             14.91398          36.55401       0.41      0.6878
gdp            1              0.52186           0.97391       0.54      0.5983
intrate        1           -184.77902          58.10610      -3.18      0.0049


- In the third model, POP and GDP become insignificant, although both are significant when they enter separately in the first and second models. This is because the three explanatory variables are strongly correlated. The pairwise sample correlations of the three variables are as follows: rGDP,POP = 0.99, rGDP,INTRATE = 0.88 and rPOP,INTRATE = 0.91.
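Pairwise correlations of this kind are easily obtained with PROC CORR; a minimal sketch, again assuming the data set name ushousing, is:

   /* Pairwise sample correlations among the regressors (data set name assumed). */
   proc corr data=ushousing;
      var pop gdp intrate;
   run;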


- Consider another example that relates EXPENSES, the cumulative expenditure on the maintenance of an automobile, to MILES, the cumulative mileage in thousands of miles, and WEEKS, the automobile's age in weeks since first purchase, for 57 automobiles. The following three models are considered:

  1) EXPENSESt = β0 + β1WEEKSt + εt
  2) EXPENSESt = β0 + β2MILESt + εt
  3) EXPENSESt = β0 + β1WEEKSt + β2MILESt + εt

- A priori, we expect β1 > 0 and β2 > 0; a car that is driven more should have a greater maintenance expense; similarly, the older the car, the greater the cost of maintaining it.


- Consider the results for the three models:

The REG Procedure
Model: MODEL1
Dependent Variable: expenses
Number of Observations Read    57
Number of Observations Used    57

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1          66744854       66744854     491.16    <.0001
Error              55           7474117         135893
Corrected Total    56          74218972

Root MSE           368.63674    R-Square    0.8993
Dependent Mean    1426.57895    Adj R-Sq    0.8975
Coeff Var           25.84061

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1            -626.35977         104.71371      -5.98      <.0001
weeks          1               7.34942           0.33162      22.16      <.0001


The REG Procedure
Model: MODEL1
Dependent Variable: expenses
Number of Observations Read    57
Number of Observations Used    57

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1          63715228       63715228     333.63    <.0001
Error              55          10503743         190977
Corrected Total    56          74218972

Root MSE           437.00933    R-Square    0.8585
Dependent Mean    1426.57895    Adj R-Sq    0.8559
Coeff Var           30.63338

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1            -796.19928         134.75770      -5.91      <.0001
miles          1              53.45246           2.92642      18.27      <.0001


The REG Procedure
Model: MODEL1
Dependent Variable: expenses
Number of Observations Read    57
Number of Observations Used    57

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2          70329066       35164533     488.16    <.0001
Error              54           3889906          72035
Corrected Total    56          74218972

Root MSE           268.39391    R-Square    0.9476
Dependent Mean    1426.57895    Adj R-Sq    0.9456
Coeff Var           18.81381

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               7.20143         117.81217       0.06      0.9515
weeks          1              27.58405           2.87875       9.58      <.0001
miles          1            -151.15752          21.42918      -7.05      <.0001


- It is interesting to note that even though the coefficient estimate for MILES is positive in the second model, it is negative in the third model. Thus there is a reversal in sign.

- The magnitude of the coefficient estimate for WEEKS also changes substantially.

- The t-statistics for MILES and WEEKS are also much lower in the third model, even though both variables are still significant.

- The problem is again high correlation between WEEKS and MILES.


- To explain, consider the model

  Yt = β0 + β1X1t + β2X2t + εt

  It can be shown that

  var(b1) = σ² / [(1 − r12²) ∑ₜ₌₁ⁿ (X1t − X̄1)²]   and   var(b2) = σ² / [(1 − r12²) ∑ₜ₌₁ⁿ (X2t − X̄2)²],

  where r12 is the sample correlation between X1t and X2t.


- The effects of increasing r12 on var(b2):

  r12      var(b2)
  0        σ² / ∑ₜ₌₁ⁿ (X2t − X̄2)² = V
  0.5      1.33 × V
  0.7      1.96 × V
  0.8      2.78 × V
  0.9      5.26 × V
  0.95     10.26 × V
  0.97     16.92 × V
  0.99     50.25 × V
  0.995    100 × V
  0.999    500 × V

- The sign reversal and the decrease in t values (in absolute terms) are caused by the inflated variances of the estimators.


- Common consequences of multicollinearity:
  - Wider confidence intervals.
  - Insignificant t statistics.
  - High R², and consequently F can convincingly reject H0 : β1 = β2 = · · · = βk = 0, yet few t values are significant.
  - O.L.S. estimates and their standard errors are very sensitive to small changes in the model.

- Multicollinearity is very much the norm in regression analysis involving non-experimental data. It can never be eliminated. The question is not about the existence or non-existence of multicollinearity, but how serious the problem is.


Identifying multicollinearity

How to identify multicollinearity?

- High R² (and a significant F value) but low values of t statistics.

- Coefficient estimates and standard errors are sensitive to small changes in the model specification.

- High pairwise correlations between the explanatory variables, but the converse need not be true. In other words, multicollinearity can still be a problem even though the correlation between two variables does not appear to be high. It is possible for three or more variables to be strongly correlated with low pairwise correlations.


- Variance inflation factor (VIF): the VIF for the variable Xj is

  VIFj = 1 / (1 − R²j),

  where R²j is the coefficient of determination of the regression of Xj on the remaining explanatory variables. The VIF is a measure of the strength of the relationship between each explanatory variable and all other explanatory variables.

- Relationship between R²j and VIFj:

  R²j     VIFj
  0       1
  0.9     10
  0.99    100


- Rule of thumb for using the VIF:
  - An individual VIFj larger than 10 indicates that multicollinearity may be seriously influencing the least squares estimates of the regression coefficients.
  - If the average of the VIFj's of the model exceeds 5, then multicollinearity is considered to be serious.
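In SAS, VIFs can be requested directly as a PROC REG model option; a minimal sketch (data set name assumed, variable names as in the housing example shown next) is:

   /* Request variance inflation factors for the housing regression. */
   proc reg data=ushousing;
      model housing = pop gdp intrate / vif;   /* adds a Variance Inflation column */
   run;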


- For the HOUSING example,

The REG Procedure
Model: MODEL1
Dependent Variable: housing
Number of Observations Read    23
Number of Observations Used    23

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3           1147699         382566       4.92    0.0108
Error              19           1478302          77805
Corrected Total    22           2626001

Root MSE           278.93613    R-Square    0.4371
Dependent Mean    1601.07826    Adj R-Sq    0.3482
Coeff Var           17.42177

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept      1          -1317.45317        4930.68042      -0.27      0.7922               0
pop            1             14.91398          36.55401       0.41      0.6878        87.97808
gdp            1              0.52186           0.97391       0.54      0.5983        64.66953
intrate        1           -184.77902          58.10610      -3.18      0.0049         7.42535


Solutions to multicollinearity

Solutions to multicollinearity:

- Benign neglect: if an analyst is less interested in interpreting individual coefficients and more interested in forecasting, then multicollinearity may not be a serious concern. Even with high correlations among the independent variables, if the regression coefficients are significant and have meaningful signs and magnitudes, one need not be too concerned with multicollinearity.

- Eliminating variables: removing the variable that is strongly correlated with the rest would generally improve the significance of the other variables. There is a danger, however, in removing too many variables from the model, because that would lead to bias in the estimates.


- Respecify the model: for example, in the housing regression, we can express the variables in per-capita terms rather than including population as an explanatory variable, leading to

  HOUSINGt/POPt = β0 + β1 GDPt/POPt + β2 INTRATEt + εt

The REG Procedure
Model: MODEL1
Dependent Variable: phousing
Number of Observations Read    23
Number of Observations Used    23

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2          26.33472       13.16736       7.66    0.0034
Error              20          34.38472        1.71924
Corrected Total    22          60.71944

Root MSE          1.31120    R-Square    0.4337
Dependent Mean    7.50743    Adj R-Sq    0.3771
Coeff Var        17.46531

Parameter Estimates
Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept      1               2.07920           3.34724       0.62      0.5415               0
pgdp           1               0.93567           0.36701       2.55      0.0191         3.45825
intrate        1              -0.69832           0.18640      -3.75      0.0013         3.45825
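A sketch of the kind of data step that would construct the per-capita variables is given below; the data set names are assumptions, while phousing and pgdp match the variable names in the output above.

   /* Construct per-capita variables, then refit the model with VIFs.      */
   data ushousing2;               /* data set names are assumed            */
      set ushousing;
      phousing = housing / pop;   /* per-capita housing starts             */
      pgdp     = gdp / pop;       /* per-capita GDP                        */
   run;

   proc reg data=ushousing2;
      model phousing = pgdp intrate / vif;
   run;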


- Increase the sample size if additional information is available.

- Use alternative estimation techniques such as ridge regression and principal component analysis (beyond the scope of this course).


First-order autocorrelation

- As described previously, the standard linear regression model is assumed to be such that εt and εt+k are uncorrelated for all k ≠ 0. When this assumption fails, the situation is known as autocorrelation or serial correlation.

- The interpretation of such a situation is that the disturbance at time t influences not only the current value of the dependent variable but also values of the dependent variable at other times.

- Many factors can cause autocorrelation, e.g., omitted explanatory variables, misspecification of functional form, measurement errors, and patterns of business cycles, to name a few.


- There are many possible specifications of correlations among disturbances. The simplest, and also the most common, type is first-order autocorrelation, by which the current disturbance depends linearly upon the immediate past disturbance plus another disturbance term that exhibits no autocorrelation over time, i.e.,

  εt = ρεt−1 + νt,

  where the νt's are uncorrelated and ρ is an autocorrelation coefficient.

- It is required that −1 < ρ < 1 to fulfill the assumption of "stationarity" (see Chapter 5).
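To see what such disturbances look like, the data step below simulates a first-order autocorrelated error series; the choice of ρ = 0.7, the seed and the series length are arbitrary illustrative values, not part of the course example.

   /* Simulate 100 disturbances following eps(t) = rho*eps(t-1) + nu(t).   */
   data ar1;
      rho = 0.7;                  /* illustrative autocorrelation coefficient */
      eps = 0;                    /* starting value                           */
      do t = 1 to 100;
         nu  = rannor(12345);     /* uncorrelated N(0,1) innovation           */
         eps = rho*eps + nu;      /* first-order recursion                    */
         output;
      end;
      keep t eps;
   run;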


Durbin-Watson Test

- The problem with O.L.S. under autocorrelation is that it leads to inefficient estimators of the coefficients and a biased estimator of the error variance. Alternative estimation strategies other than O.L.S. are typically used when disturbances are autocorrelated.

- How to test for first-order autocorrelation?

- The Durbin-Watson (DW) test is the most common test. The DW test statistic is given by

  DW = ∑ₜ₌₂ⁿ (et − et−1)² / ∑ₜ₌₁ⁿ et²,

  where et = Yt − Ŷt.
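In SAS, the DW statistic can be requested as a PROC REG model option; a minimal sketch, reusing the sugarcane regression from earlier as an example, is:

   /* Request the Durbin-Watson statistic for the residuals of the fitted model. */
   proc reg data=sugarcane;
      model lna = lnps lnpj / dw;
   run;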


Note that

  DW = ∑ₜ₌₂ⁿ (et − et−1)² / ∑ₜ₌₁ⁿ et²

     = [∑ₜ₌₂ⁿ et² + ∑ₜ₌₂ⁿ et−1² − 2∑ₜ₌₂ⁿ et et−1] / ∑ₜ₌₁ⁿ et²

     = [∑ₜ₌₁ⁿ et² − e1² + ∑ₜ₌₁ⁿ et² − en² − 2∑ₜ₌₂ⁿ et et−1] / ∑ₜ₌₁ⁿ et²

     = [2∑ₜ₌₁ⁿ et² − 2∑ₜ₌₂ⁿ et et−1 − (e1² + en²)] / ∑ₜ₌₁ⁿ et²

     = 2(1 − r) − (e1² + en²) / ∑ₜ₌₁ⁿ et²,

where r = ∑ₜ₌₂ⁿ et et−1 / ∑ₜ₌₁ⁿ et² is the sample autocorrelation coefficient.


- When the sample size is sufficiently large, DW ≈ 2(1 − r).

- If DW were based on the true εt's, then DW would tend in the limit to 2(1 − ρ) as n increases. This means
  - if ρ → 0, then DW → 2
  - if ρ → 1, then DW → 0
  - if ρ → −1, then DW → 4

- Therefore, a test of H0 : ρ = 0 can be based on whether or not DW is close to 2.

- Unfortunately, the critical values of DW depend on the values of the explanatory variables, and these vary from one data set to another.


- To get around this problem, Durbin and Watson established lower (dL) and upper (dU) bounds for the DW critical value.

- If DW > 4 − dL or DW < dL, then we reject H0. If dU < DW < 4 − dU, then we do not reject H0. If DW lies in neither of these two regions, then the test is inconclusive.

- See the DW table uploaded on the website. Note that dL and dU are tabulated in terms of n and k′ = k − 1 = the number of coefficients excluding the intercept.

- An intercept term must be present in order for the dL's and dU's to be valid.


Durbin-Watson Test

I To be more specific, for testing H0 : ρ = 0 vs. H1 : ρ > 0, the decision rule is to reject H0 if DW < dL and not to reject H0 if DW > dU; the test is inconclusive if dL < DW < dU.

I For testing H0 : ρ = 0 vs. H1 : ρ < 0, the decision rule is to reject H0 if DW > 4 − dL and not to reject H0 if DW < 4 − dU; the test is inconclusive if 4 − dU < DW < 4 − dL. A small SAS sketch applying both rules follows.
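I A minimal SAS sketch (not part of the original slides) that applies the two decision rules; the dw, dl and du values are placeholders, set here to the figures used in the sugarcane example below (DW = 1.16, dL = 1.33, dU = 1.58 for n = 34, k′ = 2):

data _null_;
   dw = 1.16; dl = 1.33; du = 1.58;                       /* placeholder inputs          */
   /* H0: rho = 0 vs H1: rho > 0 */
   if dw < dl then put 'rho > 0 test: reject H0';
   else if dw > du then put 'rho > 0 test: do not reject H0';
   else put 'rho > 0 test: inconclusive';
   /* H0: rho = 0 vs H1: rho < 0 */
   if dw > 4 - dl then put 'rho < 0 test: reject H0';
   else if dw < 4 - du then put 'rho < 0 test: do not reject H0';
   else put 'rho < 0 test: inconclusive';
run;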


I SAS calculates the DW statistic via the DW option in PROC REG.

I For example, for our previous sugarcane plant area example, one can calculate the DW statistic by

proc reg data=bangladesh;
  model lna=lnps lnpj / dw;
run;

I yielding the results

The REG Procedure
Model: MODEL1
Dependent Variable: lna

Durbin-Watson D               1.160
Number of Observations           34
1st Order Autocorrelation     0.412
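I As an aside (not shown on the slides), PROC REG also provides a DWPROB option that reports a p-value for the DW statistic, which avoids the inconclusive region of the bounds test; a minimal sketch, assuming the same bangladesh data set:

proc reg data=bangladesh;
  model lna=lnps lnpj / dw dwprob;   /* DWPROB requests p-values for the DW test */
run;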


I As r = 0.412 > 0, we test H0 : ρ = 0 vs. H1 : ρ > 0.

I DW = 1.16. For n = 34 and k′ = 2, at the 5% significance level, dL = 1.33 and dU = 1.58.

I Since DW = 1.16 < dL = 1.33, we reject H0 and conclude that there is significant (positive) first-order autocorrelation in the disturbances.


Correction for Autocorrelation

I Many alternative least squares procedures have been introduced for autocorrelation correction, e.g., the Cochrane-Orcutt procedure and the Prais-Winsten procedure.

I SAS uses the AUTOREG procedure, which augments the original regression model with the autocorrelated disturbance function. For example, in the case of the sugarcane plant area regression example, AUTOREG considers the following model:

lnAt = β0 + β1 lnPSt + β2 lnPJt + εt,
εt = −ζεt−1 + νt, where ζ = −ρ.

The procedure simultaneously estimates β0, β1, β2 and ρ.
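I For intuition (a standard textbook restatement, not spelled out on the slides), fitting a regression with an AR(1) error amounts to quasi-differencing the data: substituting εt = ρεt−1 + νt into the regression gives

\[
\ln A_t - \rho \ln A_{t-1} = \beta_0(1-\rho) + \beta_1(\ln PS_t - \rho \ln PS_{t-1}) + \beta_2(\ln PJ_t - \rho \ln PJ_{t-1}) + \nu_t ,
\]

in which the transformed disturbance νt is serially uncorrelated. The Cochrane-Orcutt and Prais-Winsten procedures iterate between estimating ρ from the residuals and re-estimating the β's from this transformed equation.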


I The SAS commands and outputs are as follows:

proc autoreg data=bangladesh;
  model lna=lnps lnpj / nlag=1;
run;

The AUTOREG Procedure

Estimates of Autoregressive Parameters
                          Standard
Lag     Coefficient          Error      t Value
  1       -0.412279       0.166336        -2.48

Yule-Walker Estimates
SSE             3.92945271    DFE                     30
MSE                0.13098    Root MSE           0.36191
SBC             37.4123083    AIC             31.3068662
MAE             0.27498619    AICC            32.6861766
MAPE            6.21460972    HQC             33.3889968
Durbin-Watson       1.8766    Regress R-Square    0.5304
                              Total R-Square      0.6928

Parameter Estimates
                                Standard                 Approx
Variable     DF    Estimate        Error     t Value    Pr > |t|
Intercept     1      5.3823       2.3813        2.26      0.0312
lnps          1      1.0735       0.1902        5.64      <.0001
lnpj          1     -0.9523       0.3465       -2.75      0.0100


I The DW value has increased to 1.8766, resulting in non-rejection of H0 : ρ = 0.

I The coefficient β2 changes from being insignificant (under O.L.S.) to significant.

I The estimated equation is

lnAt = 5.3823 + 1.0735 lnPSt − 0.9523 lnPJt + 0.412279 et−1
(standard errors: 2.3813, 0.1902, 0.3465 and 0.166336 respectively)

I The forecast of lnAt thus depends on et−1, the error in the last period. For out-of-sample forecasts more than one period ahead, et−1 is unknown and is set to zero, since E(et) = 0.
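I Concretely (restating the rule above in symbols; T denotes the last in-sample period, and future values of the regressors are assumed known or forecast separately):

\[
\widehat{\ln A}_{T+1} = 5.3823 + 1.0735\,\ln PS_{T+1} - 0.9523\,\ln PJ_{T+1} + 0.412279\, e_T ,
\]
\[
\widehat{\ln A}_{T+h} = 5.3823 + 1.0735\,\ln PS_{T+h} - 0.9523\,\ln PJ_{T+h}, \qquad h \ge 2 .
\]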


Seemingly Unrelated Regression Equations

I Sometimes different regression equations may be connected not because they interact, but because their error terms are related.

I For example,

I in demand studies, a system of demand equations is specified to explain the consumption of different commodities; potential correlations of the disturbances across the equations arise because a shock affecting the demand for one good may spill over and affect the demand for other goods.

I firms in the same branch of industry are likely subject to similar disturbances.


I The seemingly unrelated regression equations (S.U.R.E.) model pools the observations of the different regressions together and allows for "contemporaneous correlations" of the disturbances across the different equations.

I S.U.R.E. usually (but not always) leads to improved precision over O.L.S. applied to each equation separately.

I The equations are "seemingly unrelated" because they are related only through the disturbance terms.


I A standard two-equation S.U.R.E. model may be expressed as:

Yt = β0 + β1X1t + β2X2t + β3X3t + · · · + βkXkt + εt
Wt = γ0 + γ1Z1t + γ2Z2t + γ3Z3t + · · · + γkZkt + ut,

where

E(εt) = E(ut) = 0,
var(εt) = σ1², var(ut) = σ2²,
cov(εt, εt−j) = cov(ut, ut−j) = 0 for j ≠ 0,
cov(εt, ut) ≠ 0 and cov(εt, ut−j) = 0 for j ≠ 0.
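I In standard textbook matrix notation (added here for reference; the slides do not spell this out), stacking the two equations and letting Σ denote the 2×2 contemporaneous covariance matrix of (εt, ut), the S.U.R.E. estimator is the feasible GLS estimator

\[
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}, \quad
\hat{\beta}_{SURE} = \left[ X' \,(\hat{\Sigma}^{-1} \otimes I_T)\, X \right]^{-1} X' \,(\hat{\Sigma}^{-1} \otimes I_T)\, y ,
\]

where \(\hat{\Sigma}\) is usually estimated from the O.L.S. residuals of the individual equations.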


I Thus, the standard S.U.R.E. model rules out serial correlation and heteroscedasticity within an individual equation, as well as serial correlation across equations, but permits contemporaneous correlations across the equations.

I The standard S.U.R.E. model has been extended to allow for the non-standard features described above, as well as different numbers of explanatory variables in the equations, but these extensions are beyond the scope of our discussion here.


I To illustrate the S.U.R.E. technique, consider two firms, General Electric and Westinghouse, indexed by 1 and 2 respectively. Consider the following economic model describing the gross investment of the two firms:

I1t = β0 + β1V1t + β2K1t + εt

I2t = γ0 + γ1V2t + γ2K2t + ut

t = 1, ..., 20, where I, V and K are, respectively, the firm's annual gross investment, stock market value and capital stock at the beginning of the year. The data are taken from Griffiths, Hill and Judge (1993).


I As General Electric and Westinghouse are in similar lines of business, the unexplained disturbances that affect the two firms' investment decisions may be contemporaneously correlated (i.e., the unexplained factor that affects General Electric's investment at time t may be correlated with a similar factor that affects Westinghouse's at the same time).

I O.L.S. estimation of the individual regressions cannot capture this correlation. We therefore pool the 40 observations and treat the model as a two-equation system.


I The SAS commands for S.U.R.E. estimation of the above model are as follows. PROC SYSLIN first produces the O.L.S. results from estimating the equations separately, followed by the S.U.R.E. results from joint estimation:

data invest;
  input i1 v1 k1 i2 v2 k2;
  cards;
  ...
  ;
proc syslin sur;
  model i1=v1 k1;
  model i2=v2 k2;
run;

The SAS System

SYSLIN Procedure
Ordinary Least Squares Estimation

Model: I1    Dependent variable: I1

Analysis of Variance
                      Sum of         Mean
Source      DF       Squares       Square    F Value   Prob>F
Model        2   31632.03023  15816.01511     20.344   0.0001
Error       17   13216.58777    777.44634
C Total     19   44848.61800

Root MSE    27.88272    R-Square   0.7053
Dep Mean   102.29000    Adj R-SQ   0.6706
C.V.        27.25850

Parameter Estimates
                   Parameter      Standard    T for H0:
Variable    DF      Estimate         Error   Parameter=0   Prob > |T|
INTERCEP     1     -9.956306     28.925628        -0.344       0.7349
V1           1      0.026551      0.014351         1.850       0.0818
K1           1      0.151694      0.023698         6.401       0.0001

Model: I2    Dependent variable: I2

Analysis of Variance
                      Sum of         Mean
Source      DF       Squares       Square    F Value   Prob>F
Model        2    5165.55292   2582.77646     24.761   0.0001
Error       17    1773.23393    104.30788
C Total     19    6938.78686

Root MSE    10.21312    R-Square   0.7444
Dep Mean    42.89150    Adj R-SQ   0.7144
C.V.        23.81153

Parameter Estimates
                   Parameter      Standard    T for H0:
Variable    DF      Estimate         Error   Parameter=0   Prob > |T|
INTERCEP     1     -0.509390      7.389731        -0.069       0.9458
V2           1      0.052894      0.014481         3.653       0.0020
K2           1      0.092406      0.051721         1.787       0.0918


SYSLIN Procedure
Seemingly Unrelated Regression Estimation

Cross Model Correlation
Corr             I1               I2
I1               1                0.7289649707
I2               0.7289649707     1

Model: I1    Dependent variable: I1

Parameter Estimates
                   Parameter      Standard    T for H0:
Variable    DF      Estimate         Error   Parameter=0   Prob > |T|
INTERCEP     1    -27.719317     27.032828        -1.025       0.3195
V1           1      0.038310      0.013290         2.883       0.0103
K1           1      0.139036      0.023036         6.036       0.0001

Model: I2    Dependent variable: I2

Parameter Estimates
                   Parameter      Standard    T for H0:
Variable    DF      Estimate         Error   Parameter=0   Prob > |T|
INTERCEP     1     -1.251988      6.956347        -0.180       0.8593
V2           1      0.057630      0.013411         4.297       0.0005
K2           1      0.063978      0.048901         1.308       0.2082


I Hence O.L.S. estimation produces

I1t = −9.956306 + 0.026551 V1t + 0.151694 K1t
(standard errors: 28.925628, 0.014351 and 0.023698)

I2t = −0.509390 + 0.052894 V2t + 0.092406 K2t
(standard errors: 7.389731, 0.014481 and 0.051721)

whereas S.U.R.E. estimation yields

I1t = −27.719317 + 0.038310 V1t + 0.139036 K1t
(standard errors: 27.032828, 0.013290 and 0.023036)

I2t = −1.251988 + 0.057630 V2t + 0.063978 K2t
(standard errors: 6.956347, 0.013411 and 0.048901)


I S.U.R.E. estimation results in smaller standard errors of the estimates, and hence more precise estimates of the coefficients.

I S.U.R.E. estimation will result in no efficiency gain over O.L.S. if

1. cov(εt, ut) = 0, or
2. the equations contain identical explanatory variables, e.g., V1t = V2t and K1t = K2t for all t.

I In our example, the O.L.S. residuals from the two equations have a contemporaneous correlation of 0.729. One can formally test whether the disturbances are indeed correlated.
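I One standard choice for that test (not named on the slides) is the Breusch-Pagan Lagrange multiplier test of a diagonal disturbance covariance matrix. For two equations with T observations and residual correlation r12, the statistic is

\[
\lambda_{LM} = T\, r_{12}^{2} = 20 \times 0.729^{2} \approx 10.6 > \chi^{2}_{0.05}(1) = 3.84 ,
\]

so the hypothesis of no contemporaneous correlation would be rejected at the 5% level, consistent with the efficiency gain observed above.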
