
# Autocorrelation


What is Autocorrelation?

• One of the important assumptions of the classical linear regression model (CLRM) is the absence of autocorrelation (serial correlation) among the error terms u_t of the population regression function.

• Assumption: in the CLRM it is assumed that, conditional on the X's, the disturbances in two different time periods are uncorrelated: Corr(u_t, u_s | X) = 0 for all t ≠ s.

• Alternatively, E(u_t u_s) = 0 when t ≠ s: the error term relating to one observation is not affected by the error relating to any other observation.

• When this assumption is violated, the errors are correlated across time; this is known as the problem of serial correlation.

• Suppose that when u_{t-1} > 0, on average u_t > 0 as well. Then the errors are positively serially correlated: Corr(u_t, u_{t-1}) > 0.


• Example: 3-month T-bill rates (i3_t) are regressed on the rate of inflation (inf_t) and the deficit as a percentage of GDP (def_t) (Wooldridge, 2003): i3_t = β0 + β1 inf_t + β2 def_t + u_t

• Assumption of homoskedasticity: conditional on X, the variance of u_t is the same for all t: Var(u_t | X) = Var(u_t) = σ², t = 1, 2, …, n.

• In this context, homoskedasticity implies that the unobservables (the u's) affecting interest rates have a constant variance over time.

• Autocorrelation in this context implies that if interest rates are unexpectedly high in period t-1 (u_{t-1} > 0), they are likely to be above average in period t as well.

• The no-autocorrelation assumption says nothing about the correlation of the X's across time; e.g. it is not concerned with whether inf_t is correlated across time (it generally is).


• Example: consider quarterly data on output, labour and capital. A labour strike in one quarter is expected to affect output in that quarter, but in the absence of autocorrelation the strike is not expected to affect output in the next quarter.

• In cross-sectional analysis, if we assume random sampling, then for two households i and h the errors u_i and u_h are independent. Serial correlation is therefore mainly a potential problem in time series data.

Why Serial Correlation?

• Inertia or sluggishness could result in serial correlation. Many time series data (GDP, employment, production) exhibit business cycles and successive observations are likely to be interdependent.

• Omitted variable bias: some important variables are not included in the model.

• Example: the demand for beef (Y_t) is a function of its price (X_{1t}), consumers' income (X_{2t}) and the price of chicken (X_{3t}):

Y_t = β0 + β1 X_{1t} + β2 X_{2t} + β3 X_{3t} + u_t

• If we run the regression without X_{3t}: Y_t = β0 + β1 X_{1t} + β2 X_{2t} + v_t

the error term becomes v_t = β3 X_{3t} + u_t. Because the omitted price of chicken is itself correlated over time, v_t is expected to exhibit a systematic pattern.


• Incorrect functional form: serial correlation can occur if the regression is not correctly specified. E.g. let the marginal cost (MC) model be specified as: MC_i = β0 + β1 Output_i + β2 Output_i² + u_i

If the squared term is not included, the estimated MC function will be linear and the new disturbance, v_i = β2 Output_i² + u_i, will incorporate the systematic effect of squared output on MC.

• Cobweb pattern: for agricultural goods, supply reacts to price with a lag of one period: S_t = β0 + β1 P_{t-1} + u_t. In such cases the u's are not random, because if farmers produce more in period t, they might produce less in period t+1.


• Models with lags: Cons_t = β0 + β1 Income_t + β2 Cons_{t-1} + u_t.

If the lagged term is not included in the model, the error term will capture the effect of the lagged variable.

• Manipulation of data: if we generate data through manipulation (dividing quarterly data by 3 to get monthly data, interpolation, extrapolation), this can introduce a systematic pattern into the disturbances.


Consequences: Key Issues

• OLS estimators are linear and unbiased but do not have the minimum variance among all of the linear unbiased estimators.

• OLS estimators are not BLUE in the presence of autocorrelation.

• OLS standard errors and t-statistics are no longer valid.

• The t-statistics will often be too large.

• The usual F test and LM test are invalid.


Gauss Markov in Time Series Analysis

• Assumption 1: linear in parameters: y_t = β0 + β1 x_{t1} + … + βk x_{tk} + u_t

• Assumption 2: for each t, the expected value of the error term, given the explanatory variables for all time periods, is zero: E(u_t | X) = 0, t = 1, …, n.

• Assumption 3: no perfect collinearity: no explanatory variable is constant or a perfect linear combination of the others.

• Assumption 4: homoskedasticity: the variance of u_t (conditional on the X's) is the same for all t: Var(u_t | X) = Var(u_t) = σ²

• Assumption 5: no serial correlation: conditional on X, the errors in two different time periods are uncorrelated: Corr(u_t, u_s | X) = 0 for all t ≠ s.

Different Ways Errors are Generated

• Assume that the disturbances can be characterized as: u_t = ρ u_{t-1} + e_t, where t = 1, 2, …, n and |ρ| < 1.

• ρ is known as the coefficient of autocovariance (the first-order autocorrelation coefficient).

• The e_t are uncorrelated random variables (satisfying the standard OLS assumptions) with zero mean and constant variance σ_e².

• This structure of the error term is known as a first-order autoregressive process, or AR(1).

• The shift in u_t can be divided into two parts: (i) ρ u_{t-1}, a systematic shift, and (ii) e_t, a random term.

• An AR(2) model has the form: u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + e_t
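
The AR(1) scheme above can be illustrated with a small simulation. This is only a sketch: the function names, the sample size, the seed and the choice ρ = 0.7 are assumptions made for illustration, not part of the slides. The first disturbance is drawn from the stationary distribution so that Var(u_t) is constant from the start.

```python
import numpy as np

def simulate_ar1(n, rho, sigma_e=1.0, seed=0):
    """Simulate u_t = rho * u_{t-1} + e_t with |rho| < 1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma_e, size=n)
    u = np.empty(n)
    # Start from the stationary distribution: Var(u_1) = sigma_e^2 / (1 - rho^2)
    u[0] = e[0] / np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u

def lag1_autocorr(u):
    """Sample correlation between u_t and u_{t-1}."""
    u = u - u.mean()
    return (u[1:] @ u[:-1]) / (u @ u)

u = simulate_ar1(5000, rho=0.7)
print("sample lag-1 autocorrelation:", lag1_autocorr(u))
```

With a long series the sample lag-1 autocorrelation should be close to the ρ used in the simulation, while setting ρ = 0 should give a value near zero.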


• If the disturbance term follows a first-order moving average scheme, MA(1): u_t = v_t + λ v_{t-1}, where v_t is a random disturbance term with the usual properties and |λ| < 1.

• In the case of ARMA(1,1): u_t = ρ u_{t-1} + v_t + λ v_{t-1}

• Higher-order schemes can be considered as well.

• Consider a bivariate model: y_t = β0 + β1 x_t + u_t

• Assume that the sample average of x_t is zero (x̄ = 0).

• Let us assume that the disturbances can be characterized as AR(1): u_t = ρ u_{t-1} + e_t

OLS Estimator in case of Autocorrelation

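• OLS estimator of β1:

$$\hat{\beta}_1 = \beta_1 + SST_x^{-1} \sum_{t=1}^{n} x_t u_t, \qquad \text{where } SST_x = \sum_{t=1}^{n} x_t^2$$

• Variance of the OLS estimator:

$$
\begin{aligned}
Var(\hat{\beta}_1) &= SST_x^{-2}\, Var\Big(\sum_{t=1}^{n} x_t u_t\Big) \\
&= SST_x^{-2} \Big[ \sum_{t=1}^{n} x_t^2\, Var(u_t) + 2 \sum_{t=1}^{n-1} \sum_{j=1}^{n-t} x_t x_{t+j}\, E(u_t u_{t+j}) \Big] \\
&= \sigma^2 / SST_x + 2\,(\sigma^2 / SST_x^2) \sum_{t=1}^{n-1} \sum_{j=1}^{n-t} \rho^{j} x_t x_{t+j}
\end{aligned}
$$

Here Var(u_t) = σ² and Cov(u_t, u_{t+j}) = ρ^j σ². The first term on the RHS is the usual OLS variance; it is the whole variance when ρ = 0.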

OLS Estimation in case of Autocorrelation

• If the usual variance formula is used, the second term is ignored.

• In most cases (i) ρ > 0, so ρ^j > 0; and (ii) the independent variable is positively correlated over time, so x_t x_{t+j} is positive for most pairs. As a result the second term on the RHS is positive and the usual OLS variance (σ²/SST_x) underestimates the true variance.

• If ρ < 0, then ρ^j > 0 when j is even and ρ^j < 0 when j is odd, so the sign of the second term is difficult to determine.

• In both cases, in the presence of serial correlation the usual variance estimator will be biased for Var(β̂1).
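
To make the size of the ignored second term concrete, the variance expression for the bivariate model with AR(1) errors can be evaluated numerically. The sketch below assumes a known ρ and σ² and a trending regressor; the function name and the example values are hypothetical choices for illustration only.

```python
import numpy as np

def ols_slope_variances(x, rho, sigma2=1.0):
    """Return (usual, true) variance of the OLS slope when u_t follows AR(1).

    usual = sigma^2 / SST_x
    true  = usual + 2 (sigma^2 / SST_x^2) * sum_t sum_j rho^j x_t x_{t+j}
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()              # the slides assume a zero sample mean
    n = x.size
    sst = x @ x
    usual = sigma2 / sst
    cross = 0.0
    for j in range(1, n):
        # sum over t of x_t * x_{t+j}, weighted by rho^j
        cross += rho ** j * (x[:-j] @ x[j:])
    true = usual + 2.0 * sigma2 / sst ** 2 * cross
    return usual, true

x = np.arange(20.0)               # a trending (positively autocorrelated) regressor
usual, true = ols_slope_variances(x, rho=0.5)
print("usual:", usual, "true:", true, "ratio:", true / usual)
```

For this positively autocorrelated regressor and ρ > 0 the true variance exceeds the usual one; with ρ = 0 the two coincide.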


• In addition, the standard error of β̂ is an estimate of the standard deviation of β̂; the usual OLS standard error is no longer valid in the presence of serial correlation.

• The t-statistics are also not valid: with positive autocorrelation the usual t-statistic will be artificially large.

• The F test and LM test are not valid either.

• Under certain assumptions (stationary, weakly dependent data) the goodness-of-fit measures are still valid.

• As in the cross-sectional case, the population R² = 1 − (σ_u²/σ_y²). With stationary data the variances of the error term and the dependent variable do not change over time, so this definition still applies.


Testing For Serial Correlation

• Graphical method: the no-autocorrelation assumption concerns the unknown population disturbance u_t, which cannot be observed. A plot of the OLS residuals û_t can often be used for an informal detection of autocorrelation.

• We can plot the residuals or the standardized residuals (û_t/σ̂) against time. If the residuals exhibit a pattern, then it is possible that u_t is non-random. We can also plot û_t against û_{t-1} and observe whether there is any systematic association. [insert/draw graph here and show it in STATA]

Testing for Serial Correlation: t test with Strictly Exogenous Regressors

• Our model is: y_t = β0 + β1 x_{t1} + … + βk x_{tk} + u_t

• Assume the regressors are strictly exogenous: the error term u_t is uncorrelated with the regressors in all time periods. This excludes models with lagged dependent variables. It is a large-sample test.

• Null hypothesis H0: ρ = 0

• Consider AR(1): u_t = ρ u_{t-1} + e_t, with E(u_t | X) = 0, t = 1, …, n.

• E(e_t | u_{t-1}, u_{t-2}, …) = 0 and Var(e_t | u_{t-1}) = Var(e_t) = σ_e².

• If u_t could be observed, we could estimate ρ from the regression of u_t on u_{t-1} and use a t-test for the significance of ρ̂.

• We do not observe u_t, so we replace it with the OLS residual û_t and use the t-test.


• Run an OLS regression of y_t on the X's and obtain û_t.

• Regress û_t on û_{t-1} and obtain ρ̂ and the corresponding t-statistic.

• Use this t-statistic to test the null H0: ρ = 0.

• We conclude on the basis of the t-statistic, at the 5% significance level, whether the null is rejected.

• In very large data sets, serial correlation is sometimes found to be statistically significant even when ρ̂ is small.

• Although designed for AR(1) errors, the test can detect other forms of serial correlation: any correlation between adjacent errors can be picked up.

• If adjacent errors are uncorrelated, however, this test fails to detect serial correlation.
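
The steps of this t-test can be sketched with NumPy alone. The data below are simulated and hypothetical (true ρ = 0.6), the helper names are invented for illustration, and the auxiliary regression is run without an intercept, which is one common variant rather than necessarily the one used in the lecture.

```python
import numpy as np

def ols_residuals(y, X):
    """OLS residuals of y on X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def ar1_t_test(resid):
    """Regress resid_t on resid_{t-1} (no intercept); return (rho_hat, t_stat)."""
    u_t, u_lag = resid[1:], resid[:-1]
    rho_hat = (u_lag @ u_t) / (u_lag @ u_lag)
    e = u_t - rho_hat * u_lag
    s2 = (e @ e) / (len(u_t) - 1)        # one estimated parameter
    se = np.sqrt(s2 / (u_lag @ u_lag))
    return rho_hat, rho_hat / se

# Hypothetical data: y_t = 1 + 2 x_t + u_t with AR(1) errors, rho = 0.6
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

rho_hat, t_stat = ar1_t_test(ols_residuals(y, X))
print("rho_hat:", rho_hat, "t:", t_stat)
```

A t-statistic well above conventional critical values leads to rejection of H0: ρ = 0, as in the procedure above.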


The Durbin-Watson (DW) Test

• The assumptions of the Durbin-Watson test are:

• The regression model includes an intercept term.

• The X's are non-stochastic, or fixed in repeated sampling.

• The disturbances are generated by an AR(1) process.

• The regression model does not include a lagged dependent variable (no autoregressive models).

• There are no missing observations in the data.

• The Durbin-Watson d statistic is defined as:

$$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$$

• This is the ratio of the sum of squared differences in successive residuals to the residual sum of squares.


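• Expanding the numerator we get:

$$d = \frac{\sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2 \sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}$$

• DW and the ρ̂ obtained in the t-test are linked: d ≈ 2(1 − ρ̂), where

$$\hat{\rho} = \frac{\sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}$$

and we assume that the adjacent sums of squares are approximately equal (Σû_t² ≈ Σû_{t-1}²).

• Since −1 ≤ ρ ≤ +1, we have 0 ≤ d ≤ 4.

• The exact distribution of the statistic is difficult to derive, so there is no unique critical value, only a lower bound dL and an upper bound dU. With H0: no positive autocorrelation and H0*: no negative autocorrelation, the decision regions are:

• 0 ≤ d < dL: reject H0 (positive autocorrelation)

• dL ≤ d ≤ dU: zone of indecision

• dU < d < 4−dU: do not reject H0 or H0* or both (no autocorrelation)

• 4−dU ≤ d ≤ 4−dL: zone of indecision

• 4−dL < d ≤ 4: reject H0* (negative autocorrelation)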

The Durbin-Watson Test

• Run the OLS regression and obtain the residuals û_t.

• Compute d with the formula.

• For the given sample size and number of explanatory variables, find dL and dU, and draw a conclusion about the null.

• One problem arises when d falls within the indecision region.

• There are different variants of the DW test.

• Example: static Phillips curve:

$$\widehat{inf}_t = 1.42 + 0.468\, unem_t$$

• Regressing û_t on û_{t-1} gives ρ̂ = 0.573 with t = 4.93.

• DW = 2(1 − ρ̂) = 0.854; with k = 1 and n = 50, dL = 1.32. Since d < dL, it falls in the rejection region, so we reject the null: there is autocorrelation.
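
The d statistic and its link to ρ̂ can be checked with a few lines of NumPy. This is a minimal sketch: the function names and the toy residual vector are hypothetical, and only the value ρ̂ = 0.573 comes from the example above.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2}^{n} (u_t - u_{t-1})^2 / sum_{t=1}^{n} u_t^2."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def rho_hat(resid):
    """rho_hat = sum u_t u_{t-1} / sum u_t^2."""
    resid = np.asarray(resid, dtype=float)
    return (resid[1:] @ resid[:-1]) / (resid @ resid)

# Toy residuals that alternate in sign (strong negative autocorrelation)
toy = np.array([1.0, -1.0, 1.0, -1.0])
print("d:", durbin_watson(toy), "approximation:", 2 * (1 - rho_hat(toy)))

# Approximation used in the static Phillips curve example
print("2(1 - 0.573) =", round(2 * (1 - 0.573), 3))
```

The approximation d ≈ 2(1 − ρ̂) differs from the exact d only by the end terms (û_1² + û_n²)/Σû_t², which become negligible as the sample grows.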


• Expectations-augmented Phillips curve:

$$\widehat{\Delta inf}_t = 3.03 - 0.543\, unem_t$$

• Regressing û_t on û_{t-1} gives ρ̂ = −0.036 with t = −0.297.

• DW = 2(1 − ρ̂) = 2.07; with k = 1 and n = 50, dU = 1.59. Since dU < d < 4 − dU, it falls in the non-rejection region, so we do not reject the null: no autocorrelation.

• Alternatively, in STATA: tsset the data with the time variable, run OLS, predict the residuals and their lag, regress û_t on û_{t-1} and obtain ρ̂.

• For the t-test, simply check the significance of ρ̂.

• For the DW test, type ‘estat dwatson’ to obtain the DW statistic.

• The manual and STATA results could differ slightly.

The Breusch-Godfrey Test for Higher Order Auto

• Suppose the disturbance term u_t is generated by a q-th order autoregressive scheme: u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + … + ρq u_{t-q} + e_t

• The null is H0: ρ1 = 0, ρ2 = 0, …, ρq = 0.

• Run the OLS regression of y_t on x_{t1}, x_{t2}, …, x_{tk} and obtain the residuals û_t for all t = 1, 2, …, n.

• Regress û_t on all the x's (x_{t1}, x_{t2}, …, x_{tk}) and the lagged residuals (û_{t-1}, û_{t-2}, …, û_{t-q}) for all t = (q+1), …, n.

• Compute the F-test for the joint significance of the lagged residuals.

• An alternative is to compute the Lagrange Multiplier (LM) statistic.

• LM = (n − q) R²_û, where R²_û is the R-squared obtained from the second-stage regression of û_t.


• Under the null, LM ~ χ² with q degrees of freedom. If (n − q)R² exceeds the critical χ² value, we reject the null, which means that at least one ρ is significantly different from zero.

• The explanatory variables can include lagged values of Y.

• It is applicable even if the errors follow a q-th order MA process: u_t = ε_t + λ1 ε_{t-1} + … + λq ε_{t-q}

• With first-order autoregression (q = 1) it is known as Durbin's m test.

• One problem is that the lag length q cannot be specified a priori.

• Example: estimate a model of imports of barium chloride on several controls (Wooldridge, 2003) by OLS:

$$\widehat{\log(chnimp)} = -17.80 + 3.12 \log(chempi) + 0.196 \log(gas) + 0.983 \log(rtwex) + 0.060\, befile6 - 0.032\, affile6 - 0.565\, afdec6$$

BG Test: Example


• In STATA: estimate the model by OLS and obtain the predicted residuals and their lags of order 1, 2 and 3.

• Run an OLS regression of û_t on all the x's and the lagged residuals (û_{t-1}, û_{t-2}, û_{t-3}).

• For the F test of joint significance of the lagged residuals, type ‘test ut-1-hat ut-2-hat ut-3-hat’ after the regression.

• F(3,118) = 5.12; Prob > F = .0023, where the null is no serial correlation. We reject the null, so there is evidence of AR(3) serial correlation.

• For the BG test: LM = (n − q)R² = (131 − 3) × .1159 = 14.835. Here the R-squared is obtained from the regression of the residuals on all the X's and the lagged residuals. The critical χ²(3) value is 12.838, so we reject the null of no serial correlation.

• Type ‘estat bgodfrey, lag(3)’: we get chi2 = 14.768; Prob > chi2 = .0020; we reject the null of no autocorrelation.
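
The Breusch-Godfrey LM computation can be sketched by hand as follows. The data are simulated and hypothetical (not the barium chloride data), the function name is invented for illustration, and the statistic uses the (n − q)R² form given in these slides; packaged routines may handle the initial observations differently, which is one reason manual and software results can differ slightly.

```python
import numpy as np

def breusch_godfrey_lm(y, X, q):
    """LM = (n - q) * R^2 from regressing u_hat on X and q lagged residuals.

    X must already contain a constant column.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    # Column j holds u_hat_{t-j} for t = q+1, ..., n
    lags = np.column_stack([u[q - j: n - j] for j in range(1, q + 1)])
    Z = np.column_stack([X[q:], lags])
    u_dep = u[q:]
    gamma, *_ = np.linalg.lstsq(Z, u_dep, rcond=None)
    e = u_dep - Z @ gamma
    tss = np.sum((u_dep - u_dep.mean()) ** 2)
    r2 = 1.0 - (e @ e) / tss
    return (n - q) * r2

# Hypothetical data with AR(1) errors, rho = 0.6
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

lm = breusch_godfrey_lm(y, X, q=3)
print("LM:", lm, "exceeds 5% chi2(3) critical value 7.815:", lm > 7.815)
```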

Remedial Measures: Quasi-Differenced Method

• Assume an AR(1) model with strictly exogenous regressors (no lagged dependent variable among the X's).

• u_t = ρ u_{t-1} + e_t for all t = 1, …, n (1)

• Assume Var(u_t) = σ_e²/(1 − ρ²).

• Consider the model: y_t = β0 + β1 x_t + u_t (2)

• For t ≥ 2: y_{t-1} = β0 + β1 x_{t-1} + u_{t-1} (3)

• Multiply (3) by ρ and subtract it from (2), using (1): y_t − ρ y_{t-1} = (1 − ρ)β0 + β1(x_t − ρ x_{t-1}) + e_t for t ≥ 2

y_t* = (1 − ρ)β0 + β1 x_t* + e_t for t ≥ 2 (4)

where y_t* = y_t − ρ y_{t-1} and x_t* = x_t − ρ x_{t-1}. These are known as quasi-differenced data (with ρ = 1 they would be differenced data); the errors e_t are not serially correlated (|ρ| < 1).

• When the structure of autocorrelation (ρ) is known, we could estimate eq. (4), which satisfies all the Gauss-Markov assumptions.


Quasi-Differenced Method

• The OLS estimators of (4) are not quite BLUE, because the first observation is not used, but they can easily be made so.

• y_1 = β0 + β1 x_1 + u_1 for t = 1 (5)

• If we add (5) to (4), we still have serially uncorrelated errors.

• But Var(u_1) = σ_e²/(1 − ρ²) > σ_e² = Var(e_t).

• If we multiply (5) by (1 − ρ²)^{1/2}, we get:

$$(1-\rho^2)^{1/2} y_1 = (1-\rho^2)^{1/2} \beta_0 + \beta_1 (1-\rho^2)^{1/2} x_1 + (1-\rho^2)^{1/2} u_1$$

or, in compact notation,

$$\tilde{y}_1 = (1-\rho^2)^{1/2} \beta_0 + \beta_1 \tilde{x}_1 + \tilde{u}_1 \qquad (6)$$

• Here the error term has variance (1 − ρ²)Var(u_1) = σ_e².

• We can use (6) together with (4); this gives BLUE estimators and satisfies the Gauss-Markov assumptions. This is a form of GLS.

• For a given ρ it is easy to transform the data and perform OLS.
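
For a known ρ, the transformation in (4) and (6) can be sketched as below. The function names and the simulated data are hypothetical; the sketch simply applies OLS to the transformed data, which should reproduce GLS with the AR(1) covariance structure.

```python
import numpy as np

def quasi_difference(y, X, rho):
    """Transform y and X (X includes the constant column) for a given rho.

    Row 1 is scaled by (1 - rho^2)^(1/2) as in (6);
    rows t >= 2 are quasi-differenced as in (4).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.sqrt(1.0 - rho ** 2)
    y_star = np.empty_like(y)
    X_star = np.empty_like(X)
    y_star[0] = w * y[0]
    X_star[0] = w * X[0]
    y_star[1:] = y[1:] - rho * y[:-1]
    X_star[1:] = X[1:] - rho * X[:-1]
    return y_star, X_star

def gls_ar1(y, X, rho):
    """OLS on the transformed data, i.e. GLS under AR(1) errors with known rho."""
    y_star, X_star = quasi_difference(y, X, rho)
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta

# Hypothetical data
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
print("GLS estimates (beta0, beta1):", gls_ar1(y, X, rho=0.5))
```

Note that the transformed constant column equals (1 − ρ²)^{1/2} in the first row and (1 − ρ) afterwards, so the coefficient on it is β0 itself.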

Feasible GLS with AR(1) Errors

• With GLS the problem is that we might not know the exact value of ρ, but we can obtain an estimate of it; this method is known as feasible GLS (FGLS).

• Regress the OLS residuals on the lagged residuals and obtain ρ̂.

• Use this estimate in place of the actual ρ to obtain the quasi-differenced variables.

• Apply OLS to:

$$\tilde{y}_t = \beta_0 \tilde{x}_{t0} + \beta_1 \tilde{x}_{t1} + \dots + \beta_k \tilde{x}_{tk} + error_t \qquad (7)$$

where x̃_{t0} = (1 − ρ̂²)^{1/2} for t = 1 and x̃_{t0} = (1 − ρ̂) for t ≥ 2.

• Thus the procedure is:

• Run an OLS regression of y_t on the x's and obtain û_t for t = 1, …, n.

• Regress the estimated residuals on their lag and obtain ρ̂.

• Run OLS on (7) and obtain the β's.


• When this procedure uses the estimated ρ and omits the first observation, it is known as Cochrane-Orcutt (CO) estimation.

• When the first observation is used, it is known as Prais-Winsten estimation.

• Both procedures are usually applied iteratively.

• Once the FGLS estimates are found with ρ̂, we can compute a new set of residuals, obtain a new estimate of ρ, transform the data with it and estimate (7) by OLS again. The procedure is repeated until the estimate of ρ changes very little from the previous iteration.

• The shortcoming of FGLS is that it lacks some desirable finite-sample properties.

• It is not unbiased (so not BLUE), but it is consistent under certain assumptions (weakly dependent data).
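
The iterative Cochrane-Orcutt scheme described above can be sketched as follows. The function name, the convergence tolerance, the iteration cap and the simulated data are all assumptions made for illustration; STATA's implementation may differ in details such as the stopping rule.

```python
import numpy as np

def cochrane_orcutt(y, X, tol=1e-6, max_iter=100):
    """Iterative Cochrane-Orcutt FGLS; X includes a constant column.

    Returns (beta, rho). The first observation is dropped in the transformed
    regression, which is what distinguishes CO from Prais-Winsten.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # step 1: plain OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ beta                            # residuals of the original equation
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        y_star = y[1:] - rho_new * y[:-1]           # quasi-differenced data, t >= 2
        X_star = X[1:] - rho_new * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho

# Hypothetical data: y_t = 1 + 2 x_t + u_t, AR(1) errors with rho = 0.7
rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

beta_co, rho_co = cochrane_orcutt(y, X)
print("beta:", beta_co, "rho_hat:", rho_co)
```

Because the constant column is quasi-differenced along with the other regressors, the first element of the returned coefficient vector estimates β0 directly.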


Example: CO Procedure

• The t and F statistics are only approximately t and F distributed because of the estimation error in ρ̂. In most cases this is not a serious problem unless the sample size is small.

• Estimate a model of imports of barium chloride on several controls (Wooldridge, 2003) by OLS:

$$\widehat{\log(chnimp)} = -17.80 + 3.12 \log(chempi) + 0.196 \log(gas) + 0.983 \log(rtwex) + 0.060\, befile6 - 0.032\, affile6 - 0.565\, afdec6$$

• In STATA: tsset the data with the time variable, run OLS, then type ‘prais dep.var. ind.vars., corc’ for the CO iterative method.

• For the significant variables the results do not differ much. But the CO standard errors are higher, as they account for serial correlation.

• The OLS standard errors understate the actual sampling variation and should be corrected when autocorrelation is present.

Differencing as a Remedy

• Consider the model: y_t = β0 + β1 x_t + u_t, t = 1, 2, …, n

• u_t follows an AR(1) scheme.

• With non-stationary data and random walk models, using OLS is misleading and differencing is often used.

• Differencing leads to: Δy_t = β1 Δx_t + Δu_t, t = 2, …, n

• First differencing is often a good strategy when ρ is positive and large, as it is expected to eliminate most of the serial correlation.

• Example:

$$\widehat{i3}_t = 1.25 + 0.613\, inf_t + 0.700\, def_t$$

• In STATA, obtain û_t from this regression along with its lagged value and estimate ρ̂ (= .529), which is significant at 1%, so there is serial correlation.

• Take first differences of all the variables and check the new ρ̂ (= .068), which is not significant; thus differencing has removed the autocorrelation.

Serial Correlation Robust SE after OLS

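• For a detailed discussion, see Wooldridge (2003).

• Basically we need to obtain a serial correlation-robust standard error for β̂1 (below, "se(β̂1)" in quotation marks is the usual, incorrect OLS standard error).

• Estimate the model by OLS and obtain "se(β̂1)", σ̂ and the residuals û_t.

• Compute the r̂_t (the residuals from regressing x_{t1} on the other explanatory variables) and form â_t. For a chosen truncation lag g, also compute v̂:

$$\hat{a}_t = \hat{r}_t \hat{u}_t$$

$$\hat{v} = \sum_{t=1}^{n} \hat{a}_t^2 + 2 \sum_{h=1}^{g} [1 - h/(g+1)] \Big( \sum_{t=h+1}^{n} \hat{a}_t \hat{a}_{t-h} \Big)$$

• Finally obtain se(β̂1) using the formula:

$$se(\hat{\beta}_1) = ["se(\hat{\beta}_1)" / \hat{\sigma}]^2 \sqrt{\hat{v}}$$

• Example: Puerto Rican wage data (Wooldridge, 2003).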
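
The serial correlation-robust standard error formula above can be sketched in NumPy. The function name, the simulated data (not the Puerto Rican wage data) and the truncation lag g = 8 are hypothetical choices for illustration.

```python
import numpy as np

def sc_robust_se(y, X, j, g):
    """Return (usual OLS se, serial-correlation-robust se) for coefficient j.

    X includes a constant column; g is the truncation lag.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                                  # OLS residuals u_hat
    sigma_hat = np.sqrt(u @ u / (n - k))
    se_ols = sigma_hat * np.sqrt(np.linalg.inv(X.T @ X)[j, j])
    # r_hat: residuals from regressing x_j on the other regressors
    others = np.delete(X, j, axis=1)
    gamma, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    r = X[:, j] - others @ gamma
    a = r * u                                         # a_hat_t = r_hat_t * u_hat_t
    v = a @ a
    for h in range(1, g + 1):
        v += 2.0 * (1.0 - h / (g + 1)) * (a[h:] @ a[:-h])
    return se_ols, (se_ols / sigma_hat) ** 2 * np.sqrt(v)

def ar1_series(n, rho, rng):
    """Hypothetical helper: AR(1) series started at zero."""
    s = np.zeros(n)
    for t in range(1, n):
        s[t] = rho * s[t - 1] + rng.normal()
    return s

rng = np.random.default_rng(5)
n = 2000
x = ar1_series(n, 0.8, rng)      # persistent regressor
u = ar1_series(n, 0.8, rng)      # persistent errors
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

se_ols, se_robust = sc_robust_se(y, X, j=1, g=8)
print("usual OLS se:", se_ols, "robust se:", se_robust)
```

With g = 0 the weighted sum vanishes and the expression reduces to a heteroskedasticity-robust standard error; larger g lets it pick up correlation between â_t and its lags.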