
  • Econ107 Applied Econometrics

    Topics 2-4: discussed under the classical Assumptions 1-6 (or 1-7 when normality is needed for finite-sample inference)

    Question: what if some of the classical assumptions are violated?

    Topic 5: deals with violations of Assumption 1 (A1 hereafter). Topics 6-8: deal with three cases of violations of the classical assumptions: multicollinearity (A6), serial correlation (A4), and heteroskedasticity (A5). Questions to be addressed:
    • what is the nature of the problem?
    • what are the consequences of the problem?
    • how is the problem diagnosed?
    • how to remedy the problem?

  • 6 Multicollinearity (Studenmund, Chapter 8)

    6.1 The Nature of Multicollinearity

    6.1.1 Perfect Multicollinearity

    1. Definition: Perfect multicollinearity exists in the following regression

    Yi = β0 + β1X1i + · · · + βkXki + εi,    (1)

    if there exists a set of parameters λj (j = 0, 1, · · · , k, not all equal to zero) such that

    λ0X0i + λ1X1i + · · · + λkXki = 0,    (2)

    where X0i ≡ 1. Equation (2) must hold for all observations.

  • Alternatively, we could write an independent variable as an exact linear combination of the others, e.g., if λk ≠ 0, we can write (2) as

    Xki = −(λ0/λk)X0i − (λ1/λk)X1i − · · · − (λk−1/λk)Xk−1,i.    (3)

    The last expression says, essentially, that Xki is redundant: it contains no information beyond that already contained in X0i, X1i, · · · , Xk−1,i for explaining Yi.

  • Example. Consider the following regression model for a consumption function

    C = β1 + β2N + β3S + β4T + ε,

    where C is consumption, N is nonlabor income, S is salary, T is total income, and ε is the error term. Since T = N + S, it is not possible to separate the individual effects of the income components (N, S) from that of total income (T). According to the model,

    E (C) = β1 + β2N + β3S + β4T.

    But if we let c be any nonzero value and define β⁰2 = β2 − c, β⁰3 = β3 − c, and β⁰4 = β4 + c, then

    E(C) = β1 + β⁰2 N + β⁰3 S + β⁰4 T

    as well, for a different set of parameters. The same value of E(C) is therefore consistent with many different values of the parameters.
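    Illustration (simulated sketch; the data and coefficient values below are hypothetical). With T = N + S the regressor matrix is rank-deficient, so OLS cannot recover β2, β3, and β4 separately; dropping T, the short regression estimates only β2 + β4 and β3 + β4.

    # Minimal sketch (hypothetical data): T = N + S makes the regressor matrix
    # rank-deficient, so the separate coefficients are not identified.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    N = rng.uniform(0, 10, n)        # nonlabor income
    S = rng.uniform(20, 60, n)       # salary
    T = N + S                        # total income: an exact linear combination
    C = 5 + 0.3 * N + 0.6 * S + 0.1 * T + rng.normal(0, 1, n)   # beta2=0.3, beta3=0.6, beta4=0.1

    X = np.column_stack([np.ones(n), N, S, T])
    print(np.linalg.matrix_rank(X))  # 3, not 4: X'X is singular

    # Dropping T, the regression of C on (1, N, S) recovers beta2 + beta4 (about 0.4)
    # and beta3 + beta4 (about 0.7), not the separate coefficients.
    b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), N, S]), C, rcond=None)
    print(b)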

  • 2. Problems
    (1) Coefficients can't be estimated. Consider the regression:

    Yi = β0 + β1X1i + β2X2i + εi. (4)

    If X2i = λX1i (λ ≠ 0), the parameters β1 and β2 cannot be identified or estimated. To see why, define

    β∗1 = β1 + cλ and β∗2 = β2 − c,

    where c can be any constant. Then (4) is equivalent to

    Yi = β0 + β∗1X1i + β∗2X2i + εi.    (5)

    This means that there are an infinite number of c's that make (5) hold; in other words, there are an infinite number of pairs (β∗1, β∗2) such that (5) holds. We cannot separate the influence of X1i from that of X2i on Yi.

    The above analysis extends to a generic MLR model in which one regressor can be written as a linear combination of the others.

  • (2) Standard errors can't be estimated. In the regression model (4), the standard error of β̂1 can be written

    std(β̂1) = √{ σ² / [ Σⁿᵢ₌₁ (X1i − X̄1)² (1 − r12²) ] },

    where r12 is the sample correlation between X1i and X2i. In the case of perfect multicollinearity (e.g., X2i = λX1i + a), r12 = 1 or −1, so the denominator is zero. Thus std(β̂1) = ∞.

    Solution: The solution to perfect multicollinearity is trivial: drop one or several of the regressors. In the above example, we can drop either X2i or X1i, so that (4) can be written as

    Yi = β0 + (β1 + λβ2)X1i + εi,

    or

    Yi = β0 + (β2 + β1/λ)X2i + εi.


  • By regressing Yi on X1i, we are estimating β1 + λβ2. Analogously, by regressing Yi on X2i, we are estimating β2 + β1/λ. In either case, we cannot estimate β1 or β2.

    Remarks. Perfect multicollinearity is fairly easy to avoid. Econometricians almost never talk about perfect multicollinearity. Instead, when we use the word multicollinearity, we are really talking about severe imperfect multicollinearity.

  • 6.1.2 Imperfect Multicollinearity

    Imperfect multicollinearity can be defined as a linear functional relationship between two or more independent variables that is so strong that it can significantly affect the estimation of the coefficients of the variables.

    Definition: Imperfect multicollinearity exists in a k-variate regression if

    λ0X0i + λ1X1i + · · · + λkXki + vi = 0

    for some stochastic variable vi.

    Remarks.

    1) As Var(vi)→0, imperfect multicollinearity tends to perfect multicollinearity.


  • 2) Alternatively, we could write any particular independent variable as an “almost” exact linear function of the others. E.g., if λk ≠ 0, then

    Xki = −(λ0/λk)X0i − (λ1/λk)X1i − · · · − (λk−1/λk)Xk−1,i − vi/λk.    (6)

    The above equation implies that although the relationship between Xki and X0i, X1i, · · · , Xk−1,i might be fairly strong, it is not strong enough to allow Xki to be completely explained by X0i, X1i, · · · , Xk−1,i; some unexplained variation still remains.

    3) Imperfect multicollinearity indicates a strong linear relationship between the regressors. The stronger the relationship between the two or more regressors, the more likely it is that they will be considered significantly multicollinear.

  • 6.2 The Consequences of (Imperfect) Multicollinearity

    1. Coefficient estimators will remain unbiased. Imperfect multicollinearity does not violate the classical assumptions. If all the classical Assumptions 1-6 are met, we can estimate the coefficients, and the estimators of the β's will still be centered around the true values of the β's. Moreover, the OLS estimators are still unbiased and are BLUE.

    2. The variances/standard errors of the coefficient estimators “blow up”. They increase with the degree of multicollinearity. Since two or more of the regressors are strongly related, it becomes difficult to identify the separate effects of the multicollinear variables, and we are much more likely to make errors in estimating the coefficients than we were before we encountered multicollinearity. So imperfect multicollinearity reduces the “precision” of our coefficient estimates.

  • For example, in the 2-variate regression case

    std(β̂1) = √{ σ² / [ Σⁿᵢ₌₁ (X1i − X̄1)² (1 − r12²) ] }.

    As |r12| → 1, the standard error → ∞.

    Numerical example. Suppose σ² / Σⁿᵢ₌₁ (X1i − X̄1)² = 1, i.e., std(β̂1) = 1 when r12 = 0.
    If r12 = 0.10, then the standard error = 1.01.
    If r12 = 0.25, then the standard error = 1.03.
    If r12 = 0.50, then the standard error = 1.15.
    If r12 = 0.75, then the standard error = 1.51.
    If r12 = 0.90, then the standard error = 2.29.
    If r12 = 0.99, then the standard error = 7.09.
    The standard error increases at an increasing rate with the multicollinearity between the explanatory variables. As a result, we will have wider confidence intervals and possibly insignificant t-ratios on our coefficient estimates, because t1 = β̂1/se(β̂1).
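    Illustration (quick check). The numbers above follow from the inflation factor 1/√(1 − r12²):

    # Sketch: the factor by which std(beta1_hat) exceeds its r12 = 0 value is
    # 1/sqrt(1 - r12^2); this reproduces the table above.
    import numpy as np

    for r12 in [0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
        print(r12, round(1 / np.sqrt(1 - r12**2), 2))
    # 0.1 1.01, 0.25 1.03, 0.5 1.15, 0.75 1.51, 0.9 2.29, 0.99 7.09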

  • [Figure 1: A consequence of imperfect multicollinearity: the blow-up of standard errors. The figure plots std(β̂1) (vertical axis, from 1 to 8) against r12 (horizontal axis, from 0 to 1).]

  • 3. The computed t-ratios will fall. This means that we will have more difficulty rejecting the null hypothesis that a slope coefficient is equal to zero. This problem is closely related to the problem of a “small sample size”: in both cases, standard errors “blow up”. With a small sample size, the denominator is reduced by the lack of variation in the explanatory variable.

    4. Coefficient estimates become very sensitive to changes in specification and in the number of observations.
    • The coefficient estimates may be very sensitive to the addition of one or a small number of observations.
    • The coefficient estimates may be sensitive to the deletion of a statistically insignificant variable.
    • One may get very odd coefficient estimates, possibly with wrong signs, due to the high variance of the estimator.

    5. The overall fit of the model will be largely unaffected. Even though the individual t-ratios are often quite low in the case of imperfect multicollinearity, the overall fit of the equation (R² or adjusted R²) will not fall much.
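    Illustration (Monte Carlo sketch on simulated data). With a nearly collinear pair of regressors, the estimates remain unbiased on average, their sampling spread blows up by roughly the factor 1/√(1 − r12²), and the overall fit stays high:

    # Sketch (simulated data): unbiased coefficients, inflated variance, R^2 intact.
    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, beta1, beta2 = 50, 2000, 1.0, 1.0

    def simulate(rho):
        b1_hats, r2s = [], []
        for _ in range(reps):
            x1 = rng.normal(size=n)
            x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) ~ rho
            y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
            X = np.column_stack([np.ones(n), x1, x2])
            b = np.linalg.lstsq(X, y, rcond=None)[0]
            resid = y - X @ b
            b1_hats.append(b[1])
            r2s.append(1 - resid.var() / y.var())
        return np.mean(b1_hats), np.std(b1_hats), np.mean(r2s)

    print(simulate(0.0))   # mean of beta1_hat ~ 1, small spread
    print(simulate(0.99))  # mean still ~ 1 (unbiased), spread roughly 7 times larger, R^2 still high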

  • A hypothetical example. Suppose we want to estimate a student's consumption function. After some preliminary work, we come up with the following equation

    Ci = β0 + β1Yd,i + β2LAi + εi,

    where Ci = the annual consumption expenditures of the ith student,
    Yd,i = the annual disposable income (including gifts) of the ith student,
    LAi = the liquid assets (cash, savings, etc.) of the ith student.
    Please analyze the following regression outputs:

    Ĉi = −367.83 + 0.5133 Yd,i + 0.0427 LAi    (7)
                   (1.0307)      (0.0492)
    t:             [0.496]       [0.868]
    n = 9, R² = 0.835.

  • Ĉi = −471.43 + 0.9714 Yd,i    (8)
                   (0.157)
    t:             [6.187]
    n = 9, R² = 0.861.

    An empirical example: petroleum consumption. Suppose that we are interested in building a cross-sectional model of the demand for gasoline by state:

    Ci = β0 + β1Milei + β2Taxi + β3Regi + εi,

    where Ci = the petroleum consumption in the ith state,
    Milei = the urban highway miles within the ith state,
    Taxi = the gasoline tax rate in the ith state,
    Regi = the motor vehicle registrations in the ith state.

  • Please analyze the following regression outputs:

    Ĉi = 389.0 + 60.8 Milei − 36.5 Taxi − 0.061 Regi    (9)
                 (10.3)       (13.2)      (0.043)
    t:           [5.92]       [-2.77]     [-1.43]
    n = 50, R² = 0.919.

    Ĉi = 551.7 − 53.6 Taxi + 0.186 Regi    (10)
                 (16.9)      (0.012)
    t:           [-3.18]     [15.88]
    n = 50, R² = 0.861.

  • 6.3 The Detection of Multicollinearity

    It is worth mentioning that multicollinearity exists in almost all equations. It is virtually impossible in the real world to find a set of independent variables that are totally uncorrelated with each other. Our purpose is to learn to determine how much multicollinearity exists by using three general indicators or diagnostic tools.

    1. t-ratios versus R². Look for a ‘high’ R² but ‘few’ significant t-ratios.
    Remarks.
    (1) Common “rule of thumb”: we cannot reject the null hypotheses that the coefficients are individually equal to zero (t tests), but we can reject the null hypothesis that they are simultaneously equal to zero (F test).
    (2) This is not an exact test. What do we mean by “few” significant t-ratios, and a “high” R²? Too imprecise. It also depends on other factors, such as the sample size.
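    Illustration (simulated sketch, assuming statsmodels is available). The symptom can be read directly off a fitted regression:

    # Sketch (simulated data): high R^2 and a significant overall F-test, yet
    # individually insignificant t-ratios on the two collinear regressors.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 30
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
    y = 1 + x1 + x2 + rng.normal(size=n)

    res = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(res.rsquared)                      # high
    print(res.fvalue, res.f_pvalue)          # joint F test strongly rejects
    print(res.tvalues[1:])                   # individual t-ratios are typically small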

  • 2. Correlation matrix of regressors. Look for high pair-wise correlation coefficients. Look at the correlation matrix for the regressors.

    Remarks.
    (1) How high is high? As a rule of thumb, we can use 0.8: if a sample correlation exceeds 0.8 in absolute value, we should be concerned about multicollinearity.
    (2) Multicollinearity refers to a linear relationship among all or some of the regressors. No pair of independent variables may be highly correlated, and yet one variable may be a linear function of a number of the others. In a 2-variate regression, multicollinearity is simply the correlation between the 2 explanatory variables.
    (3) A high pairwise correlation is “... sufficient, but not a necessary condition for multicollinearity.” In other words, if you have a high pairwise correlation, you have a problem; however, the absence of high pairwise correlations is not conclusive evidence that multicollinearity is absent.
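    Illustration (simulated sketch; the column names are hypothetical, loosely patterned on the petroleum example). With the regressors in a pandas DataFrame, the pair-wise check is one line:

    # Sketch: inspect the correlation matrix of the regressors and flag any
    # off-diagonal entries above 0.8 in absolute value.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    mile = rng.uniform(100, 1000, 50)
    reg = 50 * mile + rng.normal(0, 2000, 50)        # strongly related to mile
    tax = rng.uniform(5, 15, 50)
    df = pd.DataFrame({"Mile": mile, "Tax": tax, "Reg": reg})

    corr = df.corr()
    print(corr.round(2))
    print((corr.abs() > 0.8) & (corr.abs() < 1.0))   # True marks a worrying pair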

  • 3. High variance inflation factors (VIFs).

    The variance inflation factor (VIF) is a method of detecting the degree of multicollinearity by looking at the extent to which a given explanatory variable can be explained by all the other explanatory variables in the equation. So there is a VIF for each regressor.

    Suppose we want to use VIF to detect multicollinearity in the following regression:

    Yi = β0 + β1X1i + · · · + βkXki + εi.    (11)

    Let β̂j denote the OLS estimator of βj in the above regression. We need to calculate k different VIFs, one for each Xji (j = 1, · · · , k).
    1) Run the following k regressions:

    X1i = γ0 + γ2X2i + · · · + γkXki + v1i
    · · ·
    Xki = α0 + α1X1i + · · · + αk−1Xk−1,i + vki

  • 2) Calculate the R² for each of the above k regressions, and denote by R²j the R² from the linear regression of Xji on all the other regressors in (11). The VIF for β̂j is defined by

    VIF(β̂j) = 1 / (1 − R²j).

    The higher VIF(β̂j), the higher the variance of β̂j (holding constant the variance of the error term), and the more severe the effects of multicollinearity.
    Remarks.
    1) How high is high? As a common rule of thumb, if VIF(β̂j) > 5 for some j, then the multicollinearity is severe.

    2) As the number of regressors increases, it makes sense to increase the above number (5) slightly.

    3) In EViews we can calculate VIF(β̂j) after the jth auxiliary regression (i.e., run Xji = α0 + α1X1i + · · · + αj−1Xj−1,i + αj+1Xj+1,i + · · · + αkXki + vji, and name the equation eqj after the regression) by typing in the command window

    scalar VIFj=1/(1-eqj.@R2)
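    Illustration (simulated sketch in Python, following steps 1)-2) above; statsmodels also provides a ready-made variance_inflation_factor in statsmodels.stats.outliers_influence):

    # Sketch: VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing X_j on
    # the other regressors plus a constant. Simulated, nearly collinear data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # closely tied to x1
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])

    def vif(X, j):
        others = np.delete(X, j, axis=1)
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        return 1.0 / (1.0 - r2)

    for j in range(X.shape[1]):
        print(j + 1, round(vif(X, j), 2))      # VIFs for x1 and x2 are well above 5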

    Summary: No single test for multicollinearity.


  • 6.4 Remedies for Multicollinearity

    Once we're convinced that multicollinearity is present, what can we do about it? The diagnosis of the ailment isn't clear-cut, and neither is the treatment. The appropriateness of the following remedial measures varies from one situation to another.

    Example. Estimating the labour supply of married women from 1950 to 1999:

    Hourst = β0 + β1Ww,t + β2Wm,t + εt, (12)

    where Hourst = average annual hours of work of married women,
    Ww,t = average wage rate for married women,
    Wm,t = average wage rate for married men.
    Suppose the regression output is

    Ĥourst = 733.71 + 48.37 Ww,t − 22.91 Wm,t
                      (34.97)      (29.01)
    n = 50, R² = 0.847

  • Multicollinearity is a problem here. The t-ratios are less than 1.5 and 1, respectively (insignificant at the 10% level). Yet R² is 0.847. It is easy to confirm multicollinearity in this case: the correlation between the two wage rates is as high as 0.99 over our sample period! Standard errors blow up. We can't separate the wage effects on the labour supply of married women.

    Possible Solutions?

  • 1. A Priori Information

    If we know the relationship between the slope coefficients, we can substitute this restriction into the regression and eliminate the multicollinearity. This relies heavily on economic theory.

    Example. If we use time series data to estimate the Cobb-Douglas production function, i.e., the elasticities of output (Y) with respect to capital (K) and labor (L), we may have a multicollinearity problem: as time evolves, both K and L increase, and they can be highly correlated. Suppose that we have constant returns to scale in the Cobb-Douglas production function Yt = A·Kt^β1·Lt^β2·e^εt (β1 + β2 = 1). We can impose the restriction β1 + β2 = 1 in the following regression:

    lnYt = β0 + β1 lnKt + β2 lnLt + εt

  • by plugging β2 = 1 − β1 into the above equation to obtain

    lnYt = β0 + β1 lnKt + (1 − β1) lnLt + εt
    ⇐⇒ lnYt − lnLt = β0 + β1 (lnKt − lnLt) + εt
    ⇐⇒ ln(Yt/Lt) = β0 + β1 ln(Kt/Lt) + εt.

    That is, we can estimate β1 by regressing ln(Yt/Lt) on a constant and ln(Kt/Lt). After we obtain the estimate β̂1 of β1, we can obtain an estimate of β2 by β̂2 = 1 − β̂1.
    Remarks. Unfortunately, such a priori information is extremely rare.
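    Illustration (simulated sketch; the series and the true values β1 = 0.3, β2 = 0.7 are hypothetical). The restricted regression of ln(Yt/Lt) on ln(Kt/Lt):

    # Sketch: impose constant returns to scale by regressing ln(Y/L) on ln(K/L),
    # then recover beta2 = 1 - beta1.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    t = 50
    lnL = np.linspace(4.0, 4.5, t) + 0.02 * rng.normal(size=t)   # trending labor
    lnK = np.linspace(6.0, 7.0, t) + 0.02 * rng.normal(size=t)   # trending capital
    lnY = 0.5 + 0.3 * lnK + 0.7 * lnL + 0.05 * rng.normal(size=t)

    res = sm.OLS(lnY - lnL, sm.add_constant(lnK - lnL)).fit()
    b1 = res.params[1]
    print(b1, 1 - b1)        # estimates of beta1 (~0.3) and beta2 = 1 - beta1 (~0.7)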

  • 2. Dropping a Variable
    In the example of the labour supply of married women, suppose we omit the wage of married men and estimate the following model

    Hourst = α0 + α1Ww,t + vt. (13)

    In this example, it seems natural to drop the variable Wm,t. In other cases, it may make no statistical difference which variable is dropped. One has to rely on the theoretical underpinnings of the model or on common sense.

    A cautionary note. Sometimes we have to be careful when we consider dropping a variable in the case of multicollinearity. If a variable belongs in the regression but we drop it, then we will encounter the problem of “omitted variable bias”. So we are substituting one problem for another; the remedy may be worse than the disease.
    Suppose that Wm,t should appear in (12). Then the OLS estimator α̂1 in (13) is likely to be biased for β1:

    E(α̂1) = β1 + β2 b12,

    where b12 is the slope coefficient from a regression of the omitted variable Wm,t on the included variable Ww,t, so it reflects the correlation between the two wage rates.
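    Illustration (simulated sketch; the coefficient values are hypothetical). A Monte Carlo check of the bias formula above, with b12 set by the way Wm,t is generated from Ww,t:

    # Sketch (simulated data): dropping the married-men wage biases the estimate
    # of beta1 toward beta1 + beta2*b12, where b12 is the slope of Wm on Ww.
    import numpy as np

    rng = np.random.default_rng(6)
    n, reps = 200, 2000
    beta1, beta2, b12 = 48.0, -23.0, 1.2

    a1_hats = []
    for _ in range(reps):
        ww = rng.normal(10, 1, n)
        wm = b12 * ww + rng.normal(0, 0.2, n)          # highly correlated wages
        hours = 700 + beta1 * ww + beta2 * wm + rng.normal(0, 30, n)
        a1_hats.append(np.polyfit(ww, hours, 1)[0])    # slope from the short regression

    print(np.mean(a1_hats), beta1 + beta2 * b12)       # both are about 20.4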

  • 3. Transformation of the Variables
    One of the simplest things to do with time-series regressions is to run the regression on the “first differenced” data. Start with the original specification at time t:

    Hourst = β0 + β1Ww,t + β2Wm,t + εt, (14)

    The same linear relationship holds for the previous period (t − 1) as well:

    Hourst−1 = β0 + β1Ww,t−1 + β2Wm,t−1 + εt−1.    (15)

    Subtracting (15) from (14) yields

    Hourst − Hourst−1 = β1(Ww,t − Ww,t−1) + β2(Wm,t − Wm,t−1) + (εt − εt−1),    (16)

    or

    ΔHourst = β1ΔWw,t + β2ΔWm,t + Δεt,    (17)

    where, e.g., ΔHourst = Hourst − Hourst−1.
    The advantage is that “changes” in the wage rates may not be as highly correlated as their “levels”.
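    Illustration (toy sketch with made-up numbers and hypothetical column names). In pandas, first-differencing is a one-liner:

    # Sketch: first-difference a small annual sample; the first year drops out.
    import pandas as pd

    df = pd.DataFrame(
        {"Hours": [1500, 1490, 1485, 1470],
         "Ww": [1.00, 1.10, 1.25, 1.40],
         "Wm": [1.50, 1.60, 1.80, 2.00]},
        index=[1950, 1951, 1952, 1953],
    )
    d = df.diff().dropna()   # columns are now the Deltas in (17); 1950 is lost
    print(d)
    # Regressing d["Hours"] on d["Ww"] and d["Wm"] (without a constant, as in (17))
    # then estimates beta1 and beta2 from the differenced model.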

  • The disadvantages are:
    (i) The number of observations is reduced (i.e., a loss of one degree of freedom). The sample period changes from 1950-1999 to 1951-1999, say.
    (ii) It may introduce serial correlation. Even if the εt are uncorrelated, the Δεt are not, because

    Cov(Δεt, Δεt−1) = Cov(εt − εt−1, εt−1 − εt−2)
    = Cov(εt, εt−1) − Cov(εt, εt−2) − Cov(εt−1, εt−1) + Cov(εt−1, εt−2)
    = 0 − 0 − Var(εt−1) + 0 = −Var(εt−1) ≠ 0.
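    Illustration (quick simulated check). Since Var(Δεt) = 2 Var(εt) when the εt are i.i.d., the covariance above implies a first-order autocorrelation of −0.5 for the differenced errors:

    # Sketch: with i.i.d. level errors, the differenced errors have first-order
    # autocorrelation of about -0.5.
    import numpy as np

    rng = np.random.default_rng(7)
    eps = rng.normal(size=200_000)
    d = np.diff(eps)
    print(np.corrcoef(d[1:], d[:-1])[0, 1])   # close to -0.5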

    Again, the cure may be worse than the disease. It violates one of the classical assumptions, and the new problem needs to be addressed (in a later topic).

  • 4. Get More Data. Two possibilities here:
    (1) Extend the data set. Multicollinearity is a “sample phenomenon”. Wage rates may be correlated over the period 1950-1999; add more years, for example by going back to 1940, and the correlation may be reduced. The problem is that more data may not be available, or the relationship among the variables may have changed (i.e., the regression function isn't stable over time). More likely, the data are simply not there; and if they are there, why not include them initially?

    (2) Change the Nature or Source of the Data. If possible, we can switch from time-series to cross-sectional analysis or to panel-data analysis.
    • The sample correlation in cross-sectional data is usually different from that in time-series data.
    • The use of panel data potentially reduces the multicollinearity in the total sample.

    For example, we can use a random sample of many households at a point in time; the degree of multicollinearity in wages may be relatively lower ‘between’ spouses. Or we can use a random sample of households over a number of years.

  • 5. Do Nothing (A Remedy!)
    • Multicollinearity is not a problem if the objective of the analysis is forecasting. It doesn't affect the overall explanatory power of the regression (i.e., R²).
    • It is a problem if the objective is to test the significance of individual coefficients, because of the inflated variances/standard errors.

    Multicollinearity is often given too much emphasis in the list of common problems with regression analysis. If it is imperfect multicollinearity, which is almost always going to be the case, then it doesn't violate the classical assumptions.

  • Exercise: Q8.11
    • Questions for Discussion: Example 8.5.2
    Example 8.5.2: Does the Pope's 1966 decision to allow Catholics to eat meat on non-Lent Fridays cause a shift in the demand function for fish? Consider the regression

    Ft = β0 + β1PFt + β2PBt + β3 lnYdt + β4Nt + β5Pt + εt,

    where Ft : average pounds of fish consumed per capita in year t,
    PFt : price index for fish in year t,
    PBt : price index for beef in year t,
    Ydt : real per capita disposable income in year t (in billions of dollars),
    Nt : the number of Catholics in the US in year t (tens of thousands),
    Pt : = 1 after the Pope's 1966 decision and 0 otherwise.
    Question 1: State the null and alternative hypotheses to test whether the Pope's decision plays a negative role in the consumption of fish.
    Question 2: Some economic theory suggests that as income rises, the portion of

    that extra income devoted to the consumption of fish will decrease. Is the choice of a semilog function to relate disposable income to the consumption of fish consistent with this theory?
    Question 3: Suppose the regression output is

    F̂t = −1.99 + 0.0395 PFt − 0.000777 PBt + 1.770 lnYdt − 0.000031 Nt − 0.355 Pt
                  (0.031)      (0.0202)       (1.87)        (0.000033)    (0.353)
    t:            [1.27]       [-0.0384]      [0.945]       [-0.958]      [-1.01]
    R² = 0.736, adjusted R² = 0.666, n = 25.

    Evaluate the above regression results.
    Question 4: Are there any signs of multicollinearity in the above regression model? How do you check for this by using simple correlation coefficients? What is the drawback of this approach? [Hint. To detect multicollinearity with simple correlation coefficients: after you run the regression, select Procs/Make Regressor Group on the equation window menu bar, then select View/Correlation/Common Sample on the group object menu bar.]

    Question 5: How do you check for the presence of multicollinearity by using the VIF? Verify that the VIFs for PFt and lnYdt are about 43.4 and 23.3, respectively. What does this suggest to us?
    Question 6: Given the high correlation between lnYdt and Nt, it is reasonable to drop one of them. Given that the logic behind including the number of Catholics in a per capita fish consumption equation is fairly weak, we can decide to drop Nt:

    F̂t = 7.961 + 0.028 PFt + 0.0047 PBt + 0.360 lnYdt − 0.124 Pt
                  (0.03)       (0.019)       (1.15)        (0.26)
    t:            [0.98]       [0.24]        [0.31]        [-0.48]
    R² = 0.723, adjusted R² = 0.667, n = 25.

    Does this solve the problem?
    Question 7: In the case of prices, both PFt and PBt are theoretically important, so it is not advisable to drop either one. As an alternative, the textbook author suggests using RPt = PFt/PBt to replace both price variables. Does it make any

  • sense to do so? If so, what is the expected sign of RPt? The regression output now becomes

    F̂t = −5.17 − 1.93 RPt + 2.71 lnYdt + 0.0052 Pt
                  (1.43)      (0.66)       (0.281)
    t:            [-1.35]     [4.13]       [0.019]
    R² = 0.640, adjusted R² = 0.588, n = 25.

    Question 8: Based on the last regression output, can you reject the null hypothesis in Question 1?
    Remark. To calculate VIF(β̂j) (j = 1, · · · , k) in EViews:
    Step 1: Run the regression of Xji on (1, X1i, · · · , Xj−1,i, Xj+1,i, · · · , Xki) and name the equation eqj, for example.
    Step 2: In the command window type

    scalar vifj=1/(1-eqj.@r2)

    or genr vifj=1/(1-eqj.@r2). The former generates a scalar value for vifj, whereas the latter generates a series of values for vifj.
