Econometrics
Lecture 9
Time Series Methods
Tak Wai Chau
Shanghai University of Finance and Economics
Spring 2014
1 / 82
Time Series Data
- Time series data are data observed for the same unit repeatedly over time.
- Examples: macro variables such as GDP, inflation, unemployment and exports, or financial data such as stock prices, bond prices and exchange rates.
- Denote the time of each observation as t = 1, 2, ..., T.
- The most notable difference of time series data is that there is a natural ordering according to time. The order in cross-sectional data is arbitrary.
- It may be helpful to plot the time series against time.
- There can be dependence across observations over time. It is not as reasonable as in cross-sectional data to assume independence across observations.
- The current value may depend on past values of itself and of other variables. We call these lagged variables.
Time Series Data
- We call a time series strongly stationary if the joint distribution of a segment of the series z_t, z_{t+1}, ..., z_{t+k} does not depend on t for any k = 0, 1, ....
- We call a time series weakly stationary (or covariance stationary) if the unconditional mean E(z_t), the variance Var(z_t), and all covariances cov(z_t, z_{t-k}) for k = 1, 2, ... do not depend on time t.
- If the series is not stationary, it may have a trend.
- Be cautious about relations between variables with trends.
- Two variables with upward or downward trends may seem to be strongly related in a regression (e.g. a very high R² and t statistics), but a closer look may show they are not closely related.
- If two variables have a causal relation, we would expect them to remain closely related even after taking away the trend.
Time Series Data
- Here I briefly discuss three aspects:
- 1) Static models: adjustments to the OLS assumptions under time series data, and the consequences if the error terms are serially correlated.
- 2) Models with lagged variables as regressors. We call these dynamic models (e.g. ARDL).
- 3) Stochastic trends, unit roots and cointegration.
Time Series Data
- Some useful terminology and transformations:
- We call y_{t-j} the j-th lag of y_t. Sometimes we write y_{t-j} = L^j y_t, where L is called the lag operator.
- We call Δy_t = y_t - y_{t-1} the first difference of y_t.
- Under the log transformation, the first difference in logs approximates the growth rate when the growth rate is not too large:

Δ ln y_t = ln y_t - ln y_{t-1} = ln(y_t / y_{t-1}) = ln((y_t - y_{t-1}) / y_{t-1} + 1) ≈ (y_t - y_{t-1}) / y_{t-1}

since ln(1 + x) ≈ x for small x.
- First differences, like growth rates, are less affected by the trend over time; e.g. growth at a fixed percentage becomes a constant.
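The log-difference approximation can be checked numerically; a minimal sketch with made-up numbers:

```python
import math

# Compare the exact growth rate with the log first difference for a
# hypothetical series growing at roughly 2% per period.
y = [100.0, 102.0, 104.1, 106.2]

growth = [(y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]
log_diff = [math.log(y[t]) - math.log(y[t - 1]) for t in range(1, len(y))]

# ln(1 + x) ≈ x for small x, so the two series are close
for g, d in zip(growth, log_diff):
    assert abs(g - d) < 1e-3
```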
Autocorrelation
- A useful concept for time series is autocorrelation.
- For a time series y_t, t = 1, 2, ..., T, the j-th autocovariance is cov(y_t, y_{t-j}), and so the j-th autocorrelation is

corr(y_t, y_{t-j}) = ρ_j = cov(y_t, y_{t-j}) / √(Var(y_t) Var(y_{t-j})) = cov(y_t, y_{t-j}) / Var(y_t)

where the last equality holds for weakly stationary series.
- The sample covariance can be obtained from a sample as

cov^(y_t, y_{t-j}) = (1/T) Σ_{t=j+1}^{T} (y_t - ȳ_{j+1,T})(y_{t-j} - ȳ_{1,T-j})

where ȳ_{s,t} is the sample average from period s to period t.
- The sample autocorrelation is given by

ρ^_j = cov^(y_t, y_{t-j}) / Var^(y_t)

where Var^(y_t) = cov^(y_t, y_{t-0}).
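As a sketch, the sample autocorrelation can be computed directly from its definition. The code below simulates a persistent series and uses the common simplification of a single full-sample mean in place of the segment means above; all numbers are made up.

```python
import random

random.seed(0)

# Simulate an AR(1)-style series with autocorrelation 0.5 at lag 1.
T = 2000
y = [0.0]
for _ in range(T - 1):
    y.append(0.5 * y[-1] + random.gauss(0, 1))

def sample_autocorr(series, j):
    """j-th sample autocorrelation, using the full-sample mean."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    cov = sum((series[t] - mean) * (series[t - j] - mean)
              for t in range(j, n)) / n
    return cov / var

print(sample_autocorr(y, 1))  # should be near 0.5
```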
Basic Assumptions

- Consider the regression model

y_t = x_t'β + ε_t

- Some assumptions can be maintained, but some need modification.
- These can be maintained:
- Assumption 1: The regression function is linear in parameters.
- Assumption 3: There is no perfect multicollinearity among the regressors.
- Assumption 5: The errors ε_t are homoscedastic.
- But the other assumptions should be adjusted to allow for broader situations.
Basic Assumptions
- Assumption 2 is originally about independence across observations or units.
- But in time series, the same unit is sampled repeatedly over time, so observations are generally not independent.
- e.g. When the unemployment rate is higher than normal this quarter, it is likely to be higher than normal next quarter as well.
- We may replace it with this:
- Assumption 2 (Stationary Time Series): {(y_t, x_t) : t = 1, ..., T} are stationary and weakly dependent.
- Weakly dependent roughly means that the dependence over time (e.g. the autocorrelation) dies down to zero as the time between observations increases to infinity.
- This guarantees that some forms of the LLN and CLT apply. Technical details are skipped.
Basic Assumptions
- Assumption 4 is the zero conditional mean assumption.
- Strict exogeneity: E(ε_t | x_1, ..., x_T) = 0, meaning the values of the regressors for all observations give us no extra information about the error of any period.
- This is required for the unbiasedness of the OLS estimator.
- But it may be too strict in some cases; e.g. it is violated when one of the regressors in x_t is a lagged dependent variable y_{t-1}.
- So weak exogeneity is usually assumed instead: E(ε_t | x_t) = 0.
- This is enough for consistency.
- For dynamic models, it is also reasonable to assume E(ε_t | x_{t-1}, ..., x_1) = 0, in which case the x_t are known as pre-determined variables. (x_t refers to the regressors in general, which can include y_{t-j}.)
- This means that past values of the regressors do not affect the errors now or in the future. We call such an ε_t a shock.
Basic Assumptions

- Before, in the basic assumptions, independent observations implied no correlation among the error terms.
- For time series, we need an extra assumption to cover this:
- Assumption 6: No serial correlation: cov(ε_t, ε_s) = 0 for all s ≠ t, s, t = 1, ..., T.
- But we can also allow this assumption to be violated.
- If we just want a static model, in which no lagged variables are included, then we can allow the errors to be serially correlated, reflecting the correlation of the unobserved variables over time.
- For a dynamic model, lagged dependent variables are included to capture the dynamics, and we use enough lags so that the error terms are uncorrelated.
Properties of OLS
- For unbiasedness, we need strict exogeneity E(ε_t | x_1, ..., x_T) = 0. If it fails, the OLS estimator is not unbiased.
- But with a reasonably large sample, consistency requires only weak exogeneity E(ε_t | x_t) = 0.
- So, with a lagged dependent variable as a regressor, we can only have consistency.
- Note that if we have both serial correlation in the error ε_t and a lagged dependent variable y_{t-1}, weak exogeneity is not satisfied, and OLS is inconsistent.
- But if there is no lagged dependent variable, serial correlation in the error term does not affect consistency.
- For efficiency, the Gauss-Markov Theorem applies only when the errors are homoscedastic and serially uncorrelated.
- We will look at how to improve efficiency by GLS in the case of serially correlated errors.
Properties of OLS

- For asymptotic properties, we need versions of the Law of Large Numbers and the Central Limit Theorem that hold when observations are dependent/correlated but only weakly so.
- Technical details are not covered here.
- The key is that if the dependence over time dies down fast enough that later draws carry enough new information into the sample mean, then versions of the LLN and CLT hold.
- If this is satisfied, the OLS estimator is consistent and asymptotically normal as before.
- For models with trends, the LLN and CLT involved are different and much more complicated. They are skipped in this class.
Serial Correlations
- Now consider a static regression model with stationary variables, where only variables of the same period t are included:

y_t = x_t'β + ε_t

- If cov(ε_t, ε_{t-k}) ≠ 0 for some k > 0, we say the errors are serially dependent, or that there is serial correlation in the error.
- The dependence may come from the dependence of unobservable factors over time.
- Sometimes we focus on long-run relations, and the deviation from the long-run relation is in the error term. The adjustment towards the long-run relation may take time, so the deviation can be serially correlated.
- The OLS estimator is still consistent if cov(x_t, ε_t) = 0.
- But since the derivation of the usual OLS variance formula requires uncorrelated errors, the variance formula must be adjusted under serial correlation.
Autocorrelation Robust Variance
- To obtain a valid variance formula, we deal with the variance of the term

Var( (1/T) Σ_{t=1}^{T} x_t ε_t )

which equals

E[ ( (1/T) Σ_{t=1}^{T} x_t ε_t ) ( (1/T) Σ_{t=1}^{T} x_t ε_t )' ] = E[ (1/T²) Σ_{s=1}^{T} Σ_{t=1}^{T} ε_t ε_s x_t x_s' ]

- We may want to estimate the left-hand expression directly, but when ε is replaced by the residuals e, the OLS first order condition implies

(1/T) Σ_{t=1}^{T} x_t e_t = 0

which makes the outer product identically zero.
Autocorrelation Robust Variance
- So, at the least, we have to impose some restrictions on the autocorrelations to obtain a valid variance estimator.
- The usual one is the Newey-West heteroscedasticity and autocorrelation consistent (HAC) estimator, where we assume that serial correlation only substantially affects the variance estimate when the two observations are separated by fewer than L periods.
- The middle term is

V^ = (1/T²) Σ_{t=1}^{T} e_t² x_t x_t' + (1/T²) Σ_{l=1}^{L} Σ_{t=l+1}^{T} w_l e_t e_{t-l} (x_t x_{t-l}' + x_{t-l} x_t')

where w_l = 1 - l/(L + 1), i.e. a lower weight for longer lags. (This ensures the estimated matrix is positive definite.)
- A suggestion is L ≈ T^{1/4}.
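For intuition, here is a minimal sketch of the middle term for the special case of a single scalar regressor, where x_t x_{t-l}' + x_{t-l} x_t' reduces to 2 x_t x_{t-l}; the inputs are hypothetical residuals and regressor values.

```python
# Newey-West middle term for a scalar regressor, with Bartlett weights
# w_l = 1 - l / (L + 1).
def newey_west_middle(x, e, L):
    T = len(x)
    # l = 0 term: the usual heteroscedasticity-robust part
    V = sum(e[t] ** 2 * x[t] ** 2 for t in range(T)) / T ** 2
    for l in range(1, L + 1):
        w = 1 - l / (L + 1)
        V += (2 * w / T ** 2) * sum(e[t] * e[t - l] * x[t] * x[t - l]
                                    for t in range(l, T))
    return V

v = newey_west_middle([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], 1)
```

With L = 0 this collapses to the usual heteroscedasticity-robust (White) middle term.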
Model for Serial Correlation
- Autoregressive process of order 1 (AR(1)):

ε_t = ρ ε_{t-1} + u_t

where u_t is stationary, zero mean and serially uncorrelated over time. We call such a series white noise.
- u_t is also uncorrelated with ε_{t-s} for s > 0, so it is an unpredictable shock.
- By successive substitution, we have

ε_t = ρ^s ε_{t-s} + ρ^{s-1} u_{t-s+1} + ρ^{s-2} u_{t-s+2} + ... + ρ u_{t-1} + u_t

- Thus

cov(ε_t, ε_{t-s}) = ρ^s cov(ε_{t-s}, ε_{t-s}) = ρ^s Var(ε_t)

so the autocorrelation at lag s is ρ^s.
- It goes to zero as s goes to infinity, as long as |ρ| < 1.
- The closer ρ is to 1, the more persistent the series.
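The successive-substitution identity above can be verified mechanically on an arbitrary shock sequence (the numbers are made up):

```python
# AR(1) errors: eps_t = rho * eps_{t-1} + u_t. Check that
# eps_t = rho^s * eps_{t-s} + sum_{k=0}^{s-1} rho^k * u_{t-k}.
rho = 0.7
u = [0.5, -1.2, 0.3, 0.9, -0.4, 1.1]

eps = [u[0]]
for t in range(1, len(u)):
    eps.append(rho * eps[-1] + u[t])

t, s = 5, 3
substituted = rho ** s * eps[t - s] + sum(rho ** k * u[t - k] for k in range(s))
assert abs(eps[t] - substituted) < 1e-12
```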
Model for Serial Correlation
- We may also consider an AR(p) process:

ε_t = ρ_1 ε_{t-1} + ρ_2 ε_{t-2} + ... + ρ_p ε_{t-p} + u_t

so that the current error depends directly on the error terms up to p periods before, plus a new shock.
- Another model for stationary time series is the moving average (MA) model. For MA(1):

ε_t = u_t + λ u_{t-1}

the shock of the previous period directly affects the current ε_t.

cov(ε_t, ε_{t-1}) = cov(u_t + λ u_{t-1}, u_{t-1} + λ u_{t-2}) = λ Var(u_{t-1})
cov(ε_t, ε_{t-s}) = cov(u_t + λ u_{t-1}, u_{t-s} + λ u_{t-s-1}) = 0 for s > 1

- So the covariance drops to zero at 2 or more lags.
- Similarly, MA(q) models allow the q most recent past shocks to matter.
Tests for Autocorrelation
- Suppose we suspect that the error term ε follows an AR(p).
- Lagrange Multiplier test, also known as the Breusch-Godfrey test: this test is based on a regression of the (OLS) residuals.
- First, obtain the residual series of the corresponding model, e_t = y_t - x_t'b.
- Second, run the following regression (for t ≥ p + 1):

e_t = x_t'γ + α_1 e_{t-1} + α_2 e_{t-2} + ... + α_p e_{t-p} + v_t

- LM = (T - p) R², which asymptotically follows a χ² distribution with p degrees of freedom under the null of NO correlation with the past p lags.
- One may also use an F or Wald test of α_1 = ... = α_p = 0.
- If the errors are correlated, this shows up as non-zero α_1, ..., α_p, and thus some explanatory power for e_t.
- Very often, we take p = 1.
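A schematic version of the idea with p = 1 and the x_t'γ terms omitted (so this is only a sketch, not the full Breusch-Godfrey regression): regress the residuals on their first lag and form LM = n R².

```python
import random

random.seed(1)

# Hypothetical residual series; white noise by construction, so the
# test should typically not reject.
e = [random.gauss(0, 1) for _ in range(200)]

y = e[1:]      # e_t
x = e[:-1]     # e_{t-1}
n = len(y)
mx, my = sum(x) / n, sum(y) / n
# simple OLS of e_t on a constant and e_{t-1}
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
    sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx
ss_res = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
ss_tot = sum((b - my) ** 2 for b in y)
r2 = 1 - ss_res / ss_tot

LM = n * r2   # compare with the chi2(1) 5% critical value, 3.84
```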
Tests for Autocorrelation
- A more traditional test is the Durbin-Watson test, which mainly tests for first order autocorrelation.
- The test statistic is

d = Σ_{t=2}^{T} (e_t - e_{t-1})² / Σ_{t=1}^{T} e_t²
  = (Σ_{t=1}^{T-1} e_t² + Σ_{t=2}^{T} e_t² - 2 Σ_{t=2}^{T} e_t e_{t-1}) / Σ_{t=1}^{T} e_t²
  ≈ (2 Σ_{t=2}^{T} e_t² - 2 Σ_{t=2}^{T} e_t e_{t-1}) / Σ_{t=1}^{T} e_t²
  ≈ 2(1 - r)

where r is the sample correlation coefficient between e_t and e_{t-1}.
- The differences between Σ_{t=1}^{T-1} e_t², Σ_{t=2}^{T} e_t² and Σ_{t=1}^{T} e_t² are negligible if T is large.
- So, if there is strong positive correlation, r tends to 1 and d tends to 0.
- If there is strong negative correlation, d tends to 4.
- If there is no correlation, d is close to 2.
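The approximation d ≈ 2(1 - r) is easy to confirm on simulated residuals:

```python
import random

random.seed(2)

# Serially uncorrelated residuals, so d should come out near 2.
e = [random.gauss(0, 1) for _ in range(500)]
T = len(e)

d = sum((e[t] - e[t - 1]) ** 2 for t in range(1, T)) / \
    sum(v ** 2 for v in e)

# lag-1 correlation (the residuals have mean ~0, so no demeaning here)
r = sum(e[t] * e[t - 1] for t in range(1, T)) / sum(v ** 2 for v in e)

assert abs(d - 2 * (1 - r)) < 0.1
```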
Tests for Autocorrelation
- There is a table of critical values (d_L, d_U), which depend on the sample size (T) and the number of regressors (K) in the original model.
- For positive correlation: if d < d_L, we reject the null of no serial correlation.
- If d > d_U, we fail to reject the null.
- In between, the test is inconclusive.
- For negative correlation, the critical values become (4 - d_U, 4 - d_L). (Reject if d > 4 - d_L.)
- Some software reports the Durbin-Watson statistic routinely, but nowadays the LM test is more commonly used as a formal test.
- The DW statistic can still give you a guide to how serious the autocorrelation is.
- The DW test should not be used if the regressors are not strictly exogenous (e.g. with y_{t-1} as a regressor).
Example

Consider the annual change in the log of gasoline consumption per capita in the US between 1953 and 2004. The regressors are the change in log gasoline price, the change in log disposable income per capita, and the changes in the log prices of new cars, old cars, public transport and consumer services.
Changes are used because the original variables contain upward trends.

. reg dlng_pop dlnpg dlny_pop dlnpnc dlnpuc dlppt dlps

Number of obs = 51; F(6, 44) = 9.43; Prob > F = 0.0000
Model SS .019121293 (df 6, MS .003186882); Residual SS .014872705 (df 44, MS .000338016); Total SS .033993998 (df 50)
R-squared = 0.5625; Adj R-squared = 0.5028; Root MSE = .01839

dlng_pop      Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
dlnpg        -.1287145    .0316817   -4.06    0.000    -.1925647   -.0648642
dlny_pop      .492014     .1388956    3.54    0.001     .2120884    .7719396
dlnpnc       -.0041783    .1797192   -0.02    0.982    -.3663786    .358022
dlnpuc       -.0103004    .0531744   -0.19    0.847    -.1174664    .0968656
dlppt         .0771696    .0934987    0.83    0.414    -.1112647    .2656039
dlps         -.1959603    .228362    -0.86    0.395    -.6561938    .2642731
_cons         .0158686    .0070341    2.26    0.029     .0016922    .030045
Example
Test of serial correlation: you may use the commands directly:

. predict eb, res
(1 missing value generated)

. estat dwatson
Durbin-Watson d-statistic(7, 51) = 1.277744

. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation
lags(p): 1   chi2 = 7.473   df = 1   Prob > chi2 = 0.0063
H0: no serial correlation

. estat bgodfrey, nom
Breusch-Godfrey LM test for autocorrelation
lags(p): 1   chi2 = 7.619   df = 1   Prob > chi2 = 0.0058
H0: no serial correlation

. estat bgodfrey, nom lag(3)
Breusch-Godfrey LM test for autocorrelation
lags(p): 3   chi2 = 10.372   df = 3   Prob > chi2 = 0.0157
H0: no serial correlation
Example
Verify by auxiliary regression:

. reg eb l(1/3).eb dlnpg dlny_pop dlnpnc dlnpuc dlppt dlps

Number of obs = 48; F(9, 38) = 1.16; Prob > F = 0.3451
R-squared = 0.2161; Adj R-squared = 0.0304; Root MSE = .01669

eb            Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
eb
  L1.         .2840365    .167714     1.69    0.099    -.0554826    .6235557
  L2.         .2595683    .158538     1.64    0.110    -.0613752    .5805118
  L3.         .0188572    .1636594    0.12    0.909    -.312454     .3501684
dlnpg        -.0095619    .029309    -0.33    0.746    -.0688948    .049771
dlny_pop     -.1284956    .1385036   -0.93    0.359    -.4088814    .1518903
dlnpnc       -.1209709    .185097    -0.65    0.517    -.4956802    .2537385
dlnpuc        .0671129    .0562779    1.19    0.240    -.0468158    .1810416
dlppt        -.0563952    .0913959   -0.62    0.541    -.2414166    .1286261
dlps          .076229     .2261988    0.34    0.738    -.3816866    .5341445
_cons         .0012807    .0068371    0.19    0.852    -.0125603    .0151218

. scalar NR2=e(N)*e(r2)
. dis NR2
10.37182
. dis chi2tail(3,NR2)
.01565611
Example

Newey-West standard errors:

. newey dlng_pop dlnpg dlny_pop dlnpnc dlnpuc dlppt dlps, lag(3)

Regression with Newey-West standard errors; maximum lag: 3
Number of obs = 51; F(6, 44) = 12.04; Prob > F = 0.0000

              Newey-West
dlng_pop      Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
dlnpg        -.1287145    .0257426   -5.00    0.000    -.1805952   -.0768337
dlny_pop      .492014     .1277589    3.85    0.000     .2345329    .7494951
dlnpnc       -.0041783    .1400511   -0.03    0.976    -.2864327    .2780761
dlnpuc       -.0103004    .0492629   -0.21    0.835    -.1095832    .0889825
dlppt         .0771696    .0898375    0.86    0.395    -.103886     .2582251
dlps         -.1959603    .2363491   -0.83    0.412    -.6722906    .28037
_cons         .0158686    .0079346    2.00    0.052    -.0001226    .0318597
Feasible GLS Estimation
- If you discover serial correlation, one option is to keep the OLS estimates and use the Newey-West (HAC) standard errors.
- If you do not want to use a dynamic model, you may also do feasible GLS to improve efficiency.
- The idea is to transform the model so that the error terms of different time periods are no longer correlated.
- Consider the case where the error is AR(1):

ε_t = ρ ε_{t-1} + u_t

- Substituting in the regression equation,

(y_t - x_t'β) = ρ(y_{t-1} - x_{t-1}'β) + u_t
y_t - ρ y_{t-1} = (x_t - ρ x_{t-1})'β + u_t

- If we have an estimate of ρ, we can run the above regression for t = 2, 3, ..., T. This is known as the Cochrane-Orcutt (C-O) procedure.
- An estimate of ρ is obtained by regressing e_t on e_{t-1} by OLS.
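The quasi-differencing step of the C-O procedure is simple to sketch; the data and the estimate of ρ below are hypothetical.

```python
# Cochrane-Orcutt transformation: given rho_hat, build
# y*_t = y_t - rho*y_{t-1} and x*_t = x_t - rho*x_{t-1} for t = 2..T,
# then regress y* on x* (and a transformed constant) by OLS.
rho_hat = 0.5
y = [2.0, 2.5, 3.1, 3.4, 4.0]
x = [1.0, 1.2, 1.5, 1.6, 2.0]

y_star = [y[t] - rho_hat * y[t - 1] for t in range(1, len(y))]
x_star = [x[t] - rho_hat * x[t - 1] for t in range(1, len(x))]
```

Note that the transformation drops the first observation; the Prais-Winsten variant keeps it by rescaling with √(1 - ρ²).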
Feasible GLS Estimation
- A slightly different procedure, the Prais-Winsten procedure, also makes use of the first observation.
- Note that if the process is stationary,

σ_ε² = ρ² σ_ε² + σ_u²
(1 - ρ²) σ_ε² = σ_u²

- First, the first observation's error ε_1 is not correlated with u_2, ..., u_T.
- Second, we want the regression to be homoscedastic.
- So, we transform the first observation as

√(1 - ρ²) y_1 = (√(1 - ρ²) x_1)'β + √(1 - ρ²) ε_1

so that Var(√(1 - ρ²) ε_1) = (1 - ρ²) σ_ε² = σ_u².
- Observations 2 to T are transformed as in the C-O procedure above.
Example
FGLS: Prais-Winsten

. prais dlng_pop dlnpg dlny_pop dlnpnc dlnpuc dlppt dlps

Iteration 0: rho = 0.0000
Iteration 1: rho = 0.3549
Iteration 2: rho = 0.5003
Iteration 3: rho = 0.5425
Iteration 4: rho = 0.5534
Iteration 5: rho = 0.5562
Iteration 6: rho = 0.5569
Iteration 7: rho = 0.5570
Iteration 8: rho = 0.5571
Iteration 9: rho = 0.5571
Iteration 10: rho = 0.5571
Iteration 11: rho = 0.5571

Prais-Winsten AR(1) regression, iterated estimates

Number of obs = 51; F(6, 44) = 11.90; Prob > F = 0.0000
R-squared = 0.6187; Adj R-squared = 0.5667; Root MSE = .01627

dlng_pop      Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
dlnpg        -.1181465    .0259353   -4.56    0.000    -.1704156   -.0658774
dlny_pop      .3208192    .1022699    3.14    0.003     .1147077    .5269307
dlnpnc       -.2736814    .1583755   -1.73    0.091    -.5928664    .0455035
dlnpuc        .0902902    .0463877    1.95    0.058    -.0031981    .1837785
dlppt         .0356913    .0834583    0.43    0.671    -.1325078    .2038903
dlps         -.197071     .2246826   -0.88    0.385    -.649889     .2557469
_cons         .0235941    .0091849    2.57    0.014     .0050831    .042105

rho = .5570968

Durbin-Watson statistic (original)    = 1.277744
Durbin-Watson statistic (transformed) = 2.159772

Use the option corc if you just want the C-O procedure.
Models with Lagged Variables

- We can also consider multivariate models in which we have other variables as regressors.
- Here, we consider putting past values of the regressors, of the dependent variable, or of both into the regression.
- We call such a model a dynamic regression model.
- Reasons to include lagged values:
1. The effect is dynamic in nature. Adjustments may take time, so the effect may appear some time after the cause changes; the effect takes time to fully realize.
2. There may be state dependence, or inertia: for example, once a variable gets to a low value, it takes time to rise back.
3. The response may be affected by expectations, which are formed from past values of the variables (e.g. the expectations-augmented Phillips curve).
Models with Lagged Variables
- Here we consider stationary series.
- If we include lags of the independent variable, we call the model a distributed lag model.
- Consider a model that also includes x one period before:

y_t = α + β_0 x_t + β_1 x_{t-1} + ε_t

- The contemporaneous (same period, immediate) effect of x_t on y_t is β_0.
- If x_{t-1} is one unit higher, holding x_t unchanged, the effect on y_t is β_1.
- So, a temporary (single period) one-unit increase in x_t raises y_t by β_0 and y_{t+1} by β_1, and changes nothing else. Thus, the total effect is β_0 + β_1.
- A permanent (all-period) one-unit increase in x changes y by β_0 + β_1 in every period, after one period of adjustment.
Models with Lagged Variables
- More generally, for a distributed lag model with p lags,

y_t = α + β_0 x_t + β_1 x_{t-1} + β_2 x_{t-2} + ... + β_p x_{t-p} + ε_t

- The contemporaneous effect (of a unit change in x_t) is β_0.
- The effect in the next period is β_1; the effect s periods later (s ≤ p) is β_s.
- For a temporary (single period) change in x, the total effect on y over time is Σ_{s=0}^{p} β_s.
- For a permanent change in x, the effect on y in every period is also Σ_{s=0}^{p} β_s after p periods of adjustment, since all of x and its lags on the RHS increase by 1.
- We can also think in terms of a steady state (y0, x0) where x and y no longer change, which may be taken as a long-run equilibrium after adjustment is complete:

y0 = α + β_0 x0 + ... + β_p x0 = α + (Σ_{s=0}^{p} β_s) x0
Models with Lagged Variables

[Figure: estimated lag coefficients β_s plotted against the lag s; here all the β_s are negative.]
Models with Lagged Variables
- If instead we include a lagged dependent variable, we call the model an autoregressive model:

y_t = α + β x_t + γ y_{t-1} + ε_t

where, as before, stationarity requires |γ| < 1.
- A one-unit increase in x_t leads to a contemporaneous change in y_t of β.
- However, since the change in y_t also enters the y_{t+1} equation, there is an effect on y_{t+1} of γβ.
- Similarly, the change in y_{t+1} enters the y_{t+2} equation, giving an effect on y_{t+2} of γ²β, and so on.
- So, the total effect on y of a temporary change in x is

β + γβ + γ²β + ... = β Σ_{s=0}^{∞} γ^s = β / (1 - γ)

- Similarly, this is also the long-run effect of a one-unit permanent change in x.
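The geometric-series claim can be checked numerically with hypothetical coefficients:

```python
# Cumulative effect of a one-unit temporary change in x:
# beta * (1 + gamma + gamma^2 + ...) -> beta / (1 - gamma) for |gamma| < 1.
beta, gamma = 0.8, 0.6

total = sum(beta * gamma ** s for s in range(200))
long_run = beta / (1 - gamma)
assert abs(total - long_run) < 1e-9
```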
Models with Lagged Variables

- Viewing the long-run effect another way: after all the adjustments, and neglecting the temporary shocks ε_t, the process attains a steady state y_{t-1} = y_t = y0.
- Also assume x is fixed over time.
- Then,

y0 = α + βx + γ y0
(1 - γ) y0 = α + βx
y0 = α / (1 - γ) + (β / (1 - γ)) x

- Thus a permanent one-unit change in x leads to a long-run change in y of β / (1 - γ).
- Note also that the constant term α in the equation implies a steady-state intercept of α / (1 - γ).
Models with Lagged Variables

- So, there are a few reasons to put lags of the dependent variable into the model:
1. As an approximation to an effect of x that dies down slowly in a similar pattern.
2. State dependence: when the value is low (after a negative shock), it takes some time to rise back; similarly for a positive shock.
3. Partial adjustment towards an optimal level. Suppose the optimal level is

y*_t = α + β x_t + ε_t

and the adjustment follows

y_t - y_{t-1} = γ(y*_t - y_{t-1})

Then the resulting equation becomes

y_t = αγ + (1 - γ) y_{t-1} + γβ x_t + γ ε_t
Models with Lagged Variables
- If y_{t-1} is a regressor, the past error ε_{t-1} is correlated with the current regressor y_{t-1}. Thus strict exogeneity cannot be satisfied, and the OLS estimator fails to be unbiased.
- But as stated before, weak exogeneity, which means the regressors of the current equation are uncorrelated with the current error, is enough for consistency.
- Here, we need E(ε_t | y_{t-1}, x_t) = 0.
- This usually holds if ε_t can be understood as a shock: something that cannot be predicted beforehand.
- Moreover, if the ε_t are serially correlated, say cov(ε_t, ε_{t-1}) ≠ 0, then models with a lagged dependent variable fail even to be consistent:

cov(y_{t-1}, ε_t) = cov(α + β x_{t-1} + γ y_{t-2} + ε_{t-1}, ε_t) = cov(ε_{t-1}, ε_t) ≠ 0

- In this case, one should use IV, or respecify the model with more lags of either x or y to capture this dependence.
Models with Lagged Variables
- In general, we can have the autoregressive distributed lag model (ARDL):

y_t = α + Σ_{s=1}^{p} γ_s y_{t-s} + Σ_{j=0}^{q} β_j x_{t-j} + ε_t

- This gives us various forms of dynamic effects for a change in x, α or ε.
- As before, these models can be estimated by OLS if weak exogeneity is satisfied.
- The long-run effect of a permanent change in x can be derived from the steady state

y0 = α + Σ_{s=1}^{p} γ_s y0 + Σ_{j=0}^{q} β_j x0

so

y0 = α / (1 - Σ_{s=1}^{p} γ_s) + (Σ_{j=0}^{q} β_j) / (1 - Σ_{s=1}^{p} γ_s) x0
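A small numerical check of the long-run formula with hypothetical ARDL coefficients; the steady state is also reached by iterating the difference equation.

```python
# ARDL(2, 2) with alpha = 0: the long-run multiplier is
# (sum of beta_j) / (1 - sum of gamma_s).
gammas = [0.4, 0.2]        # coefficients on y_{t-1}, y_{t-2}
betas = [0.5, 0.3, 0.1]    # coefficients on x_t, x_{t-1}, x_{t-2}

long_run = sum(betas) / (1 - sum(gammas))

# Iterate y_t = gamma_1*y_{t-1} + gamma_2*y_{t-2} + sum(betas) with x
# fixed at 1; y_t converges to the long-run value.
ys = [0.0, 0.0]
for _ in range(500):
    ys.append(gammas[0] * ys[-1] + gammas[1] * ys[-2] + sum(betas))

assert abs(ys[-1] - long_run) < 1e-9
```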
Selecting Specifications

A few concerns for selecting an appropriate specification, especially how many lags to use:

- Theoretical concerns: are there any suggestions from theory?
- The error term should be white noise (serially uncorrelated), so that any useful information from the past has already been captured by the model. The error should be unpredictable. (The model is then dynamically complete.)
- General to specific: include the maximum plausible number of lags, and test whether the coefficient on the last lag is statistically significant. If not, drop that lag, re-estimate without it, and again check the last lag retained. (The opposite, specific to general, is not recommended because there may be omitted variable bias.)
- Information criteria: use the model with the lowest AIC or BIC.
Forecasting
- In many situations, we may not be able to pin down the causal effect of a regressor, but instead we are interested in forecasting based on what we already know.
- In this case, we need a dynamically complete model with white noise errors, so that the model captures as much information up to time t as we can.
- For forecasting, we may NOT want to include the value of x of the same period:

y_t = α + Σ_{s=1}^{p} γ_s y_{t-s} + Σ_{j=1}^{q} β_j x_{t-j} + ε_t

- Then, the forecast of y_{T+1}, given the values of x and y at period T and before, is

y_{T+1} = α + Σ_{s=1}^{p} γ_s y_{T+1-s} + Σ_{j=1}^{q} β_j x_{T+1-j}

- Forecasts more periods ahead can be defined similarly, but we then have to find ways to predict x_{T+1} onwards if required.
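Multi-step forecasts feed earlier forecasts back into the equation; a sketch for a simple AR(1)-type model with hypothetical coefficients, as if they had been estimated:

```python
# Fitted model: y_t = a + g * y_{t-1}; forecast 1, 2 and 3 steps ahead
# from the last observed value y_T.
a, g = 0.5, 0.8
y_T = 3.0

forecasts = []
prev = y_T
for _ in range(3):
    prev = a + g * prev      # the h-step forecast uses the (h-1)-step one
    forecasts.append(prev)
```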
Granger Causality Test
- There is a test called the Granger causality test.
- The idea is that if, given the past of y, the past of x can still explain y, then x Granger-causes y.
- This means x moves before y, or x helps to forecast y.
- It may support the belief that x has a causal effect on y, but that depends on the context.
- You should not use it alone to prove causality!
- It is also possible that x Granger-causes y and y also Granger-causes x.
- Given an ARDL model with no current x,

y_t = α + Σ_{s=1}^{p} γ_s y_{t-s} + Σ_{j=1}^{q} β_j x_{t-j} + ε_t

the Granger causality test is the F-test on all the β_j. The null is that x does not Granger-cause y.
- If we reject the null that all β_j are zero, then x Granger-causes y.
Seasonality

- If your data are quarterly or monthly, there may be a seasonality issue:
- a pattern that repeats over the periods of a year.
- As mentioned before, you may include a dummy for each quarter (but leave one out) to control for the quarter-specific effects:

y_t = α + Σ γ_s y_{t-s} + Σ β_j x_{t-j} + Σ_{q=1}^{3} δ_q Q_{qt} + ε_t

- You may also consider lags one year before, say t - 4 for quarterly data, in order to capture relations specific to the same season a year earlier.
Example
- In economics, there is a proposition that inflation and unemployment are negatively related.
- The more sophisticated form is the expectations-augmented Phillips curve:

infl_t - E(infl_t | I_{t-1}) = β(u_t - u*) + ε_t = -β u* + β u_t + ε_t

where infl_t is the inflation rate, I_{t-1} is the information set at t - 1 (the information one has at t - 1), and u* is the natural (non-accelerating inflation) rate of unemployment.
- The simplest assumption is E(infl_t | I_{t-1}) = infl_{t-1}, so the left hand side becomes Δinfl_t (where Δy_t = y_t - y_{t-1}).
- Or, in general, we can build an ARDL model between the change in inflation and the unemployment rate.
- The data here are quarterly US data, 1957Q1-2005Q1.
Example

First you should tell Stata that the data are a time series by using
tsset time
Then one convenient feature of Stata is that you can use l.x and d.x to refer to the lag and the difference respectively.

*here, they use annualized percentage,
*so multiply by 400
gen infl=400*(punew-l.punew)/l.punew

*define difference (change) in inflation
gen dinfl=d.infl
Example
- When comparing AIC and BIC, it is better to hold the number of observations constant.
- I find that the lowest BIC is attained with 2 lags of dinfl and 2 lags of unemployment.

. regress dinfl L(1/2).dinfl L(0/2).lhur

Number of obs = 189; F(5, 183) = 19.29; Prob > F = 0.0000
R-squared = 0.3451; Adj R-squared = 0.3272; Root MSE = 1.4198

dinfl          Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
dinfl
  L1.         -.4676938    .0652519   -7.17    0.000    -.5964367   -.338951
  L2.         -.3943785    .0632786   -6.23    0.000    -.5192279   -.2695291
lhur
  --.         -.1081005    .3869622   -0.28    0.780    -.8715814    .6553805
  L1.         -1.771388    .6915626   -2.56    0.011    -3.135849   -.4069266
  L2.          1.654419    .3905678    4.24    0.000     .8838241    2.425014
_cons          1.342246    .4651096    2.89    0.004     .4245791    2.259913

. estat bgodfrey, lag(4)
Breusch-Godfrey LM test for autocorrelation
lags(p): 4   chi2 = 8.503   df = 4   Prob > chi2 = 0.0748
H0: no serial correlation
Example

. *test for Granger Causality using longer own lags
. regress dinfl L(1/5).dinfl L(1/5).lhur

Number of obs = 186; F(10, 175) = 11.34; Prob > F = 0.0000
R-squared = 0.3931; Adj R-squared = 0.3584; Root MSE = 1.3718

dinfl          Coef.       Std. Err.    t      P>|t|    [95% Conf. Interval]
dinfl
  L1.         -.4270276    .0735012   -5.81    0.000    -.5720906   -.2819647
  L2.         -.342112     .0797319   -4.29    0.000    -.4994719   -.1847521
  L3.         -.0414439    .083255    -0.50    0.619    -.205757     .1228692
  L4.         -.1408624    .0763557   -1.84    0.067    -.2915589    .0098342
  L5.         -.0302168    .0687626   -0.44    0.661    -.1659275    .1054938
lhur
  L1.         -2.486391    .3948785   -6.30    0.000    -3.265728   -1.707054
  L2.          2.90072     .7614096    3.81    0.000     1.397992    4.403447
  L3.          .0066039    .8214687    0.01    0.994    -1.614657    1.627865
  L4.         -1.968619    .7685626   -2.56    0.011    -3.485464   -.4517746
  L5.          1.359282    .4019793    3.38    0.001     .5659309    2.152634
_cons          1.112877    .4931173    2.26    0.025     .1396545    2.086099

. test l.lhur=l2.lhur=l3.lhur=l4.lhur=l5.lhur=0

( 1) L.lhur - L2.lhur = 0
( 2) L.lhur - L3.lhur = 0
( 3) L.lhur - L4.lhur = 0
( 4) L.lhur - L5.lhur = 0
( 5) L.lhur = 0

F(5, 175) = 11.27
Prob > F = 0.0000
Non-stationary time series

- Many economic variables have trends and are non-stationary.
- For example, GDP, prices and population usually grow at a certain percentage every year.
- Some other variables may have reached a new level and show no tendency to go back to the original level.
- Special attention should be paid when analyzing trended variables.
- Variables can easily be shown to have a strong relationship (high R² and t values) just because both variables have a trend.
- In econometrics, there are two types of trends: deterministic and stochastic.
Non-stationary time series

- One danger of running a regression with trended time series is that the trended variables seem to be related to each other because of the trend, when in fact there is no direct relationship.
- e.g. Nominal prices or sales of all goods tend to increase over time because of general inflation, but some real (relative) prices may have fallen relative to others. A positive relationship can thus be driven purely by the general price level rather than by deeper causes.
- So, to determine whether two variables have a deeper relationship, one way is to "detrend" the series.
- If we still find a relationship after removing the trend, the variables are more likely to be related directly.
Non-stationary time series

- One way to remove the trend is to use the first difference; for a log variable this becomes the growth rate:

Δy_t = β_0 + Δx_t'β_1 + v_t

- If the level trends upward and becomes larger and larger over time, the first difference or growth rate is more likely to be stable over time.
- If there is a relationship in the first differences or growth rates, it is more likely that the variables are related directly.
- e.g. We think in terms of the GDP growth rate or the inflation rate instead of the GDP level or the CPI directly, since the latter trend upward over time.
Deterministic Trends
- Deterministic trends are trends that can be captured by a deterministic function of time, usually a polynomial.
- A common way is to use a linear trend

y_t = α_0 + α_1 t + v_t

or sometimes a quadratic trend

y_t = α_0 + α_1 t + α_2 t² + v_t

or higher order terms.
- If both y_t and x_t have deterministic trends but you run the regression as usual,

y_t = β_0 + β_1 x_t + ε_t

then one possible problem is that, even if the two series are unrelated, the coefficient β_1 will capture the relation due to the trends.
- On a closer look, their relation around the trends can be nonexistent. This is a kind of spurious regression.
Deterministic Trends
I One way of modeling such a relation is to include t or its polynomial terms in the regression:

yt = xt′β + γ1t + γ2t² + εt

I By the FWL theorem, this is equivalent to first regressing y and x on the time trend (polynomial) respectively, and then regressing the residuals on the residuals to obtain β.
I In this way the deterministic trend is removed first, and we analyze the deviations from the trend.
I If the variables are closely related (e.g. a causal effect), the deviations from trend should also show a close relationship.
I If the variables have a roughly constant growth rate, one may consider a log form with a linear time trend:

ln yt = (ln xt)′β + γt + εt

I If the time trend is just linear, taking the first difference (∆yt = yt − yt−1) reduces the trend to a constant, so the trend is removed.
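The FWL detrending equivalence described above can be checked numerically (a Python sketch on simulated data; the exact equality of the two coefficients is the point):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 150
t = np.arange(T, dtype=float)
x = 0.4 * t + rng.normal(0.0, 2.0, size=T)
y = 1.5 * x + 0.2 * t + rng.normal(0.0, 2.0, size=T)

def ols(X, z):
    # Least-squares coefficients of z on the columns of X
    return np.linalg.lstsq(X, z, rcond=None)[0]

# (a) joint regression: y on [1, x, t]; take the coefficient on x
b_joint = ols(np.column_stack([np.ones(T), x, t]), y)[1]

# (b) FWL: detrend y and x on [1, t] first, then residual-on-residual
Z = np.column_stack([np.ones(T), t])
ry = y - Z @ ols(Z, y)
rx = x - Z @ ols(Z, x)
b_fwl = ols(rx.reshape(-1, 1), ry)[0]

# The two coefficients on x are numerically identical
assert abs(b_joint - b_fwl) < 1e-8
```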
52 / 82
53 / 82
Stochastic Trends
I Stochastic trends cannot be captured by a deterministic function; they are an accumulation of shocks.
I Here we mainly use a random walk model to capture a stochastic trend:

yt = yt−1 + εt

where εt is a white noise process (stationary, mean zero, with no serial correlation).
I Random walk: one starts from where one is, then steps forward or backward randomly.
I The best forecast of a random walk process is the value at the previous period:

E(yt+1 | yt, ..., y1) = yt

since E(εt+1 | yt, ..., y1) = 0.
I Thus, there is no tendency to go back to the unconditional mean.
54 / 82
Stochastic Trends
I This is in the form of an AR(1) but with ρ = 1:

yt = yt−1 + εt = yt−2 + εt−1 + εt = ... = y0 + ∑_{s=1}^{t} εs

Past shocks have an effect on all future periods (a permanent effect).
I In contrast, with a stationary AR(1) with |ρ| < 1,

yt = ρ^t y0 + ∑_{s=0}^{t−1} ρ^s εt−s

Thus the effect of a shock dies down as ρ^s decreases and approaches 0 when s increases, so the series tends to go back to its unconditional mean 0 as the effects of past shocks vanish.
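The contrast between a permanent and a dying-out shock can be seen from the impulse responses (a Python sketch; the ρ values are illustrative):

```python
import numpy as np

def impulse_response(rho, horizon):
    # Effect of a one-unit shock at time t on y_{t+s} in an AR(1): rho**s
    return rho ** np.arange(horizon)

rw = impulse_response(1.0, 50)  # random walk: the shock never dies out
ar = impulse_response(0.7, 50)  # stationary AR(1): the shock fades away

assert np.all(rw == 1.0)  # permanent effect
assert ar[-1] < 1e-7      # 0.7**49 is essentially zero
```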
55 / 82
Stochastic Trends
I If Var(εt) = σε², and the process starts at y0 = 0, then the unconditional variance is

Var(yt) = Var(∑_{s=1}^{t} εs) = ∑_{s=1}^{t} σε² = t·σε²

and it tends to infinity as t tends to infinity.
I As all shocks have a permanent effect on the series, its uncertainty grows (linearly) with time.
I The autocorrelation is then given by

ρh = cov(yt, yt−h) / √(var(yt)·var(yt−h)) = var(yt−h) / √(var(yt)·var(yt−h)) = √((t−h)/t)

thus it falls only very slowly.
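Both the linear variance growth and the slowly falling autocorrelation can be checked by simulation (a Python sketch; sample sizes and tolerances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, T = 5000, 400
# Many random-walk paths with y_0 = 0 and eps ~ N(0, 1)
y = rng.normal(size=(n_paths, T)).cumsum(axis=1)

# Var(y_t) ~ t * sigma^2: check at t = 100 and t = 400
assert abs(y[:, 99].var() - 100.0) < 10.0
assert abs(y[:, 399].var() - 400.0) < 40.0

# corr(y_t, y_{t-h}) ~ sqrt((t - h) / t): still close to 1 at h = 20
corr = np.corrcoef(y[:, 399], y[:, 379])[0, 1]
assert corr > 0.95
```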
56 / 82
Stochastic Trends
I More generally, if εt is serially correlated but weakly dependent, we call such a process a unit root process.
I Recall the lag operator: L^j yt = yt−j. Then,

yt − yt−1 = yt − Lyt = (1 − L)yt = εt

so the polynomial in L on the left has a root of 1.
I More generally, for an AR(p),

yt − φ1yt−1 − ... − φpyt−p = (1 − φ1L − φ2L² − ... − φpL^p)yt = εt

a unit root means the polynomial in L has a root of 1.
I (Stationarity requires all roots of this polynomial to be bigger than 1 in modulus, i.e. outside the unit circle if complex.)
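The root condition can be checked numerically (a Python sketch; `lag_poly_roots` is a hypothetical helper written for this illustration, not a standard function):

```python
import numpy as np

def lag_poly_roots(phis):
    # Roots of 1 - phi1*L - ... - phip*L^p (np.roots wants highest power first)
    coeffs = [-p for p in reversed(phis)] + [1.0]
    return np.roots(coeffs)

# Stationary AR(1) with rho = 0.7: root 1/0.7, outside the unit circle
r_stat = lag_poly_roots([0.7])
assert abs(r_stat[0] - 1.0 / 0.7) < 1e-10

# Random walk (rho = 1): the polynomial 1 - L has a root of exactly 1
r_unit = lag_poly_roots([1.0])
assert abs(r_unit[0] - 1.0) < 1e-10
```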
57 / 82
Four series of unit root processes, εt ∼ N(0, 1)

[Figure: four simulated random-walk series y1, y2, y3, y4 plotted against t = 0, ..., 200]
58 / 82
Unit root process
Regression results for four independently generated series

. reg y1 y2 y3 y4 t

Number of obs = 201, F(4, 196) = 126.80, Prob > F = 0.0000
R-squared = 0.7213, Adj R-squared = 0.7156, Root MSE = 3.453
Model SS = 6047.32241 (df 4), Residual SS = 2336.92922 (df 196), Total SS = 8384.25163 (df 200)

y1 | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
y2 | .3930941 | .0851424 | 4.62 | 0.000 | .2251812 .561007
y3 | .3099425 | .0831227 | 3.73 | 0.000 | .4738722 .1460129
y4 | .1166655 | .0948362 | 1.23 | 0.220 | .0703648 .3036959
t | .125111 | .0132507 | 9.44 | 0.000 | .1512432 .0989788
_cons | 3.933836 | .7938562 | 4.96 | 0.000 | 2.36824 5.499433

. reg y1 y2 y3 y4

Number of obs = 201, F(3, 197) = 96.27, Prob > F = 0.0000
R-squared = 0.5945, Adj R-squared = 0.5883, Root MSE = 4.1543
Model SS = 4984.39462 (df 3), Residual SS = 3399.857 (df 197), Total SS = 8384.25163 (df 200)

y1 | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
y2 | .0313003 | .0869976 | 0.36 | 0.719 | .2028664 .1402659
y3 | .2328477 | .0722317 | 3.22 | 0.001 | .0904011 .3752942
y4 | .8321876 | .0685979 | 12.13 | 0.000 | .6969072 .967468
_cons | 3.103069 | .9492054 | 3.27 | 0.001 | 1.231161 4.974977
59 / 82
Spurious Regression
I The results above show a phenomenon: two or more random walk series are quite easily found to have a statistically significant relationship, even when all the shocks εit are independent.
I This is known as spurious regression, which is one problem with regressions of trended variables.
I The two series can behave rather differently on a closer look; they appear related only because the beginning and the end of each series are quite different from each other, with no tendency to go back to the original level.
I In particular, the residual series of such a regression is non-stationary (another unit root process).
I The usual t test rejects the null of β = 0, when it is actually true, with much higher probability than the supposed significance level (e.g. 5%) of the test.
I This gets worse with larger T because permanent deviations from zero are more likely to have emerged over time.
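A small Monte Carlo along these lines (a Python sketch; the number of replications is arbitrary) shows the over-rejection:

```python
import numpy as np

rng = np.random.default_rng(4)

def t_stat_slope(y, x):
    # OLS t statistic for the slope in y = b0 + b1*x + e
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = (e @ e) / (T - 2)
    var_b1 = s2 * np.linalg.inv(X.T @ X)[1, 1]
    return b[1] / np.sqrt(var_b1)

T, reps = 200, 500
reject = 0
for _ in range(reps):
    y = rng.normal(size=T).cumsum()  # two independent random walks
    x = rng.normal(size=T).cumsum()
    if abs(t_stat_slope(y, x)) > 1.96:
        reject += 1

# Nominal size is 5%, but the spurious rejection rate is far higher
assert reject / reps > 0.5
```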
60 / 82
AR(1) with ρ = 0.7

[Figure: two simulated stationary AR(1) series y1 and y2 plotted against t = 0, ..., 200]
61 / 82
AR(1) with ρ = 0.7
Regression results for these series

. regress y1 y2 y3 y4

Number of obs = 201, F(3, 197) = 0.81, Prob > F = 0.4919
R-squared = 0.0121, Adj R-squared = 0.0029, Root MSE = 1.4219
Model SS = 4.88873273 (df 3), Residual SS = 398.28752 (df 197), Total SS = 403.176253 (df 200)

y1 | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
y2 | .0113832 | .0769657 | 0.15 | 0.883 | .1631656 .1403992
y3 | .0897198 | .079669 | 1.13 | 0.261 | .2468333 .0673938
y4 | .0865012 | .0752009 | 1.15 | 0.251 | .0618009 .2348033
_cons | .4671551 | .1051404 | 4.44 | 0.000 | .6745002 .25981
62 / 82
Stochastic Trends
I If we allow a constant term in the equation, it becomes a random walk with a drift:

yt = µ + yt−1 + εt

I Then,

yt = µ + (µ + yt−2 + εt−1) + εt = ... = y0 + tµ + ∑_{s=1}^{t} εs

I Thus, it includes a linear trend plus a stochastic trend.
I If a series only has a deterministic trend, it moves around the trend line and will eventually return to it as the effects of earlier big shocks die down.
I If a series has a stochastic trend, it will not go back to a trend line; any big shock has a permanent effect.
63 / 82
Stochastic Trends
I A symptom of a random walk sequence is that the sample autocorrelation is close to 1 and falls only very slowly.
64 / 82
Stochastic Trends
I A strategy to transform the series into a stationary one is to take the first difference: ∆yt = yt − yt−1.
I If a series becomes stationary after being differenced once, we call it integrated of order 1, or I(1).
I The random walk model is one example:

yt − yt−1 = ∆yt = µ + εt

with εt white noise.
I If ∆yt is still not stationary, but ∆²yt = ∆yt − ∆yt−1 is stationary, we call the series I(2).
I If we need to difference p times until it is stationary, we call the process integrated of order p, or I(p).
I If it is stationary without differencing, it is I(0).
I Generally, we can transform all variables to I(0) and use the techniques in Parts I and II.
I Later, we will look into a case where regression with I(1) variables can still be meaningful.
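A quick check that differencing an I(1) series yields an I(0) series (a Python sketch on a simulated random walk):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=2000).cumsum()  # I(1): a random walk
dy = np.diff(y)                     # first difference: white noise, I(0)

def acf1(z):
    # Sample lag-1 autocorrelation
    z = z - z.mean()
    return (z[1:] @ z[:-1]) / (z @ z)

assert acf1(y) > 0.95       # level: autocorrelation near 1
assert abs(acf1(dy)) < 0.1  # difference: essentially uncorrelated
```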
65 / 82
Test for Unit Root
I We may want to test whether a time series has a unit root.
I If we cannot reject the null of a unit root, it is safer to use first differences for building a dynamic model, and/or to investigate the long-run relationship with the cointegration technique introduced later.
I If we can reject the null that there is a unit root, then the series can be treated as stationary or as having a deterministic trend.
I The most common test is the Dickey-Fuller Test.
yt = µ + γyt−1 + εt
yt − yt−1 = µ + (γ − 1)yt−1 + εt = µ + δyt−1 + εt

I We now want to test H0 : δ = γ − 1 = 0 versus H1 : δ = γ − 1 < 0.
I We can obtain an estimator through OLS. However, the distribution of the OLS estimator is NOT normal, even asymptotically. (The usual CLT does NOT hold.)
66 / 82
Test for Unit Root
I Dickey and Fuller produced one-sided critical values for the t statistic of δ = 0 through simulations.
I The test is one-sided: we reject the null of a unit root if the calculated t ratio is smaller (more negative) than the critical value.
I The critical values are usually much more negative than those of the normal or t distributions.
I If the above model is not rich enough and εt is serially correlated, we should add more lagged differences until εt is serially uncorrelated, then test δ = 0:

∆yt = µ + δyt−1 + θ1∆yt−1 + ... + θp∆yt−p + εt

This is known as the Augmented Dickey-Fuller (ADF) test.
I The choice of lag length may rely on the AIC or t tests.
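The mechanics of this regression can be sketched in Python; `adf_t` is a hypothetical helper written for illustration (in practice one uses canned routines such as Stata's dfuller), and the value −2.86 in the comment is the standard 5% DF critical value for the with-constant case:

```python
import numpy as np

def adf_t(y, p=1):
    # t statistic on delta in: dy_t = mu + delta*y_{t-1} + theta_j*dy_{t-j} + e_t
    dy = np.diff(y)
    n = len(dy)
    cols = [np.ones(n - p), y[p:-1]]
    for j in range(1, p + 1):
        cols.append(dy[p - j:n - j])  # lagged differences
    X = np.column_stack(cols)
    yy = dy[p:]
    b = np.linalg.lstsq(X, yy, rcond=None)[0]
    e = yy - X @ b
    s2 = (e @ e) / (len(yy) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se

rng = np.random.default_rng(6)
rw = rng.normal(size=500).cumsum()  # unit root process
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()  # stationary AR(1)

# Compare with the DF (not normal) critical value, about -2.86 at 5%
assert adf_t(ar) < -2.86  # clearly reject the unit root null
assert adf_t(rw) > -5.0   # the random walk statistic is far less negative
```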
67 / 82
Test for Unit Root
I The regression can be done in a few forms, for which the critical values differ.
I Case 1: no constant (drift), no time trend.
I Case 2: with a constant (drift) but without a time trend in the regression.
I Case 3: with a constant (drift) and a trend variable t in the regression:

∆yt = µ + δyt−1 + αt + θ1∆yt−1 + ... + θp∆yt−p + εt

I Including an extra component that is not needed may lead to higher standard errors and lower power.
I But excluding necessary components may make the test invalid.
I Note that the estimators of µ and α, as well as δ, are not normally distributed, so a formal test requires knowledge of their special distributions.
I Adding lags of ∆yt does not change the critical values.
68 / 82
Test for Unit Root
69 / 82
Test for Unit Root
. dfuller infl, lag(3) regress

Augmented Dickey-Fuller test for unit root    Number of obs = 188

Interpolated Dickey-Fuller:
Z(t) test statistic = 2.575; critical values: 1% = 3.481, 5% = 2.884, 10% = 2.574
MacKinnon approximate p-value for Z(t) = 0.0984

D.infl | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
infl L1. | .1022547 | .0397166 | 2.57 | 0.011 | .180616 .0238933
LD. | .2223539 | .0768134 | 2.89 | 0.004 | .3739077 .0708001
L2D. | .2441502 | .0741649 | 3.29 | 0.001 | .3904785 .0978219
L3D. | .1801005 | .0728908 | 2.47 | 0.014 | .036286 .3239149
_cons | .4038087 | .1970179 | 2.05 | 0.042 | .01509 .7925274

. dfuller infl, lag(12) regress

Augmented Dickey-Fuller test for unit root    Number of obs = 179

Interpolated Dickey-Fuller:
Z(t) test statistic = 1.743; critical values: 1% = 3.484, 5% = 2.885, 10% = 2.575
MacKinnon approximate p-value for Z(t) = 0.4094

D.infl | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
infl L1. | .081604 | .0468313 | 1.74 | 0.083 | .1740699 .0108619
LD. | .2621823 | .0833081 | 3.15 | 0.002 | .4266697 .097695
L2D. | .2622137 | .0850041 | 3.08 | 0.002 | .4300497 .0943777
L3D. | .1444535 | .0862692 | 1.67 | 0.096 | .0258803 .3147873
L4D. | .0408638 | .0856101 | 0.48 | 0.634 | .2098962 .1281687
L5D. | .0813078 | .084167 | 0.97 | 0.335 | .0848754 .247491
L6D. | .0556598 | .0843976 | 0.66 | 0.510 | .1109786 .2222983
L7D. | .0967031 | .0847709 | 1.14 | 0.256 | .0706725 .2640788
L8D. | .1628853 | .0855675 | 1.90 | 0.059 | .3318337 .0060632
L9D. | .1209403 | .0858716 | 1.41 | 0.161 | .2904891 .0486085
L10D. | .097309 | .085964 | 1.13 | 0.259 | .2670403 .0724222
L11D. | .068308 | .0816735 | 0.84 | 0.404 | .2295678 .0929519
L12D. | .1488426 | .078397 | 1.90 | 0.059 | .3036331 .005948
_cons | .3430082 | .2266793 | 1.51 | 0.132 | .1045578 .7905743
70 / 82
Test for Unit Root
A modified version, dfgls, which is more powerful, is also available in Stata, with a lag-length comparison.

. dfgls infl, notrend

DF-GLS for infl    Number of obs = 177
Maxlag = 14 chosen by Schwert criterion

[lags] | DF-GLS mu Test Statistic | 1% Critical | 5% Critical | 10% Critical
1 | 3.009 | 2.588 | 2.041 | 1.727
2 | 2.091 | 2.588 | 2.036 | 1.723
3 | 2.477 | 2.588 | 2.030 | 1.718
4 | 2.440 | 2.588 | 2.025 | 1.713
5 | 2.437 | 2.588 | 2.019 | 1.707
6 | 2.459 | 2.588 | 2.013 | 1.702
7 | 2.866 | 2.588 | 2.006 | 1.696
8 | 2.457 | 2.588 | 2.000 | 1.690
9 | 2.157 | 2.588 | 1.993 | 1.683
10 | 2.087 | 2.588 | 1.986 | 1.677
11 | 1.969 | 2.588 | 1.979 | 1.670
12 | 1.696 | 2.588 | 1.971 | 1.663
13 | 1.674 | 2.588 | 1.964 | 1.656
14 | 1.622 | 2.588 | 1.956 | 1.649

Opt Lag (Ng-Perron seq t) = 12 with RMSE 1.44414
Min SC = .9356207 at lag 3 with RMSE 1.505798
Min MAIC = .9250933 at lag 12 with RMSE 1.44414
71 / 82
For the unemployment rate
. dfgls lhur, notrend

DF-GLS for lhur    Number of obs = 178
Maxlag = 14 chosen by Schwert criterion

[lags] | DF-GLS mu Test Statistic | 1% Critical | 5% Critical | 10% Critical
1 | 2.172 | 2.588 | 2.040 | 1.727
2 | 2.020 | 2.588 | 2.035 | 1.722
3 | 1.875 | 2.588 | 2.030 | 1.717
4 | 1.659 | 2.588 | 2.024 | 1.712
5 | 1.724 | 2.588 | 2.018 | 1.707
6 | 1.807 | 2.588 | 2.012 | 1.701
7 | 1.680 | 2.588 | 2.006 | 1.695
8 | 1.414 | 2.588 | 1.999 | 1.689
9 | 1.564 | 2.588 | 1.993 | 1.683
10 | 1.760 | 2.588 | 1.986 | 1.677
11 | 1.579 | 2.588 | 1.978 | 1.670
12 | 1.342 | 2.588 | 1.971 | 1.663
13 | 1.344 | 2.588 | 1.964 | 1.656
14 | 1.341 | 2.588 | 1.956 | 1.649

Opt Lag (Ng-Perron seq t) = 12 with RMSE .2305078
Min SC = 2.723621 at lag 1 with RMSE .2488458
Min MAIC = 2.773879 at lag 12 with RMSE .2305078
72 / 82
Test for Unit Root
Another series, the price of gasoline (in logs, lnpg), allowing for a trend.

. dfuller lnpg, lag(5) trend regress

Augmented Dickey-Fuller test for unit root    Number of obs = 46

Interpolated Dickey-Fuller:
Z(t) test statistic = 2.464; critical values: 1% = 4.187, 5% = 3.516, 10% = 3.190
MacKinnon approximate p-value for Z(t) = 0.3460

D.lnpg | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
lnpg L1. | .1920357 | .0779265 | 2.46 | 0.018 | .3497897 .0342817
LD. | .4822171 | .1529168 | 3.15 | 0.003 | .1726532 .7917809
L2D. | .182164 | .1720071 | 1.06 | 0.296 | .5303742 .1660462
L3D. | .2311099 | .1743362 | 1.33 | 0.193 | .1218153 .5840351
L4D. | .0297323 | .1723872 | 0.17 | 0.864 | .3192473 .3787119
L5D. | .2349681 | .1769224 | 1.33 | 0.192 | .1231924 .5931287
_trend | .0087553 | .0036412 | 2.40 | 0.021 | .0013841 .0161264
_cons | .4923857 | .1927009 | 2.56 | 0.015 | .1022832 .8824883
73 / 82
Test for Unit Root
. dfgls lnpg

DF-GLS for lnpg    Number of obs = 41
Maxlag = 10 chosen by Schwert criterion

[lags] | DF-GLS tau Test Statistic | 1% Critical | 5% Critical | 10% Critical
1 | 2.097 | 3.762 | 3.223 | 2.916
2 | 1.650 | 3.762 | 3.176 | 2.873
3 | 1.895 | 3.762 | 3.120 | 2.822
4 | 1.957 | 3.762 | 3.059 | 2.764
5 | 2.216 | 3.762 | 2.993 | 2.702
6 | 2.081 | 3.762 | 2.926 | 2.637
7 | 2.078 | 3.762 | 2.859 | 2.571
8 | 1.952 | 3.762 | 2.795 | 2.506
9 | 2.052 | 3.762 | 2.735 | 2.443
10 | 2.315 | 3.762 | 2.681 | 2.384

Opt Lag (Ng-Perron seq t) = 1 with RMSE .0967121
Min SC = 4.490884 at lag 1 with RMSE .0967121
Min MAIC = 4.441592 at lag 2 with RMSE .0950793
74 / 82
Cointegration
I In some cases, integrated series do have a close relationship: they share the same stochastic trend.
I Suppose

yt − xt′β ∼ I(0)

that is, the error series is stationary; then yt and xt are cointegrated.
I In this case, the relationship and the β coefficients can be meaningful (e.g. a long-run equilibrium relation).
I The most straightforward way to obtain β is by OLS, then to test whether we can reject a unit root for the residual et = yt − xt′b.
I But the critical values are different from those of the usual DF tests, so you should compare with the critical values of the Engle-Granger ADF test.
I There are better ways to find the cointegrating vector and perform cointegration tests, but they are beyond the scope of this brief introduction.
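The two-step (Engle-Granger) idea can be sketched with simulated cointegrated series (a Python sketch; parameters are illustrative, and the residual unit-root test is only indicated by a crude autocorrelation check rather than proper Engle-Granger critical values):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500
x = rng.normal(size=T).cumsum()   # I(1) regressor
y = 2.0 * x + rng.normal(size=T)  # shares x's stochastic trend

# Step 1: OLS of y on x estimates the cointegrating coefficient
X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b  # deviation from the long-run relation

# OLS is (super-)consistent for beta under cointegration
assert abs(b[1] - 2.0) < 0.2

# Step 2 (indicated only): the residual should look stationary;
# its lag-1 autocorrelation is far below 1 here
r = e - e.mean()
assert (r[1:] @ r[:-1]) / (r @ r) < 0.9
```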
75 / 82
Cointegration
76 / 82
Error Correction Model
I If variables are cointegrated, we may also add this information to the dynamic model of first-differenced stationary variables.
I In particular,

∆yt = α0 + ∑_{s=0}^{q} α1s ∆xt−s + ∑_{j=1}^{p} α2j ∆yt−j + δ(yt−1 − xt−1′b) + ut

so yt − xt′b is the deviation from the long-run equilibrium, and the adjustment in the next period tends to make the gap smaller. Thus, we would expect −1 ≤ δ ≤ 0.
I Notice that all variables in the equation are stationary.
I This is known as the error correction model (ECM), and δ(yt−1 − xt−1′b) is the error correction term.
I Here, although b is estimated, we do not need to adjust the standard errors; just add this extra variable to the equation.
I Again, there are more advanced methods for estimating ECM models, but they are beyond the scope of this introduction.
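An ECM estimation along these lines can be sketched on simulated data (a Python sketch; the true adjustment speed δ = −0.3 is built into the simulated process):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 500
x = rng.normal(size=T).cumsum()  # I(1) driver
y = np.zeros(T)
for t in range(1, T):
    # y adjusts toward the long-run relation y = 2x at speed delta = -0.3
    y[t] = y[t - 1] - 0.3 * (y[t - 1] - 2.0 * x[t - 1]) + rng.normal()

# Step 1: long-run OLS, save the error-correction term
b = np.linalg.lstsq(np.column_stack([np.ones(T), x]), y, rcond=None)[0]
ec = y - b[0] - b[1] * x

# Step 2: ECM in differences with the lagged EC term as a regressor
dy, dx = np.diff(y), np.diff(x)
Z = np.column_stack([np.ones(T - 1), dx, ec[:-1]])
a = np.linalg.lstsq(Z, dy, rcond=None)[0]

# The estimated adjustment coefficient lies in (-1, 0)
assert -1.0 < a[2] < 0.0
```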
77 / 82
. reg lng_pop lnpg lny_pop t

Number of obs = 52, F(3, 48) = 374.03, Prob > F = 0.0000
R-squared = 0.9590, Adj R-squared = 0.9564, Root MSE = .04986
Model SS = 2.78926007 (df 3), Residual SS = .119317255 (df 48), Total SS = 2.90857732 (df 51)

lng_pop | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
lnpg | .1301604 | .0322937 | 4.03 | 0.000 | .1950913 .0652295
lny_pop | 1.739224 | .1624758 | 10.70 | 0.000 | 1.412544 2.065903
t | .0196034 | .0038165 | 5.14 | 0.000 | .027277 .0119298
_cons | 28.06216 | 1.465105 | 19.15 | 0.000 | 31.00795 25.11637

. predict ub, res

. dfuller ub, lag(4) regress

Augmented Dickey-Fuller test for unit root    Number of obs = 47

Interpolated Dickey-Fuller:
Z(t) test statistic = 3.481; critical values: 1% = 3.600, 5% = 2.938, 10% = 2.604
MacKinnon approximate p-value for Z(t) = 0.0085

D.ub | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
ub L1. | .4444283 | .1276796 | 3.48 | 0.001 | .7022824 .1865742
LD. | .0643524 | .1458979 | 0.44 | 0.661 | .2302945 .3589993
L2D. | .3176426 | .1449472 | 2.19 | 0.034 | .0249158 .6103693
L3D. | .1194632 | .1509424 | 0.79 | 0.433 | .1853711 .4242975
L4D. | .2062598 | .1337696 | 1.54 | 0.131 | .0638933 .4764129
_cons | .0030884 | .004144 | 0.75 | 0.460 | .0052806 .0114574
78 / 82
Using a user-written SSC command.

. egranger lng_pop lnpg lny_pop, regress trend lag(4)

Augmented Engle-Granger test for cointegration    N (1st step) = 52, N (test) = 47
Number of lags = 4; 1st step includes linear trend

Z(t) test statistic = 3.418; critical values: 1% = 4.644, 5% = 3.972, 10% = 3.638
Critical values from MacKinnon (1990, 2010)

Engle-Granger 1st-step regression
lng_pop | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
lnpg | .1301604 | .0322937 | 4.03 | 0.000 | .1950913 .0652295
lny_pop | 1.739224 | .1624758 | 10.70 | 0.000 | 1.412544 2.065903
_trend | .0196034 | .0038165 | 5.14 | 0.000 | .027277 .0119298
_cons | 28.08177 | 1.468709 | 19.12 | 0.000 | 31.03481 25.12873

Engle-Granger test regression
D._egresid | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
_egresid L1. | .4233945 | .1238605 | 3.42 | 0.001 | .6733551 .1734339
LD. | .057986 | .1448748 | 0.40 | 0.691 | .2343831 .3503551
L2D. | .3108009 | .1438885 | 2.16 | 0.037 | .020422 .6011797
L3D. | .1156157 | .1500536 | 0.77 | 0.445 | .1872047 .418436
L4D. | .2103555 | .1329474 | 1.58 | 0.121 | .0579432 .4786542
79 / 82
Suppose we do have a cointegrating relation (the test may have failed to reject only because of weak power); we can then look at the ECM model.

. *suppose there is a cointegrating relationship (power of test may be bad)
. *consider the one with trend

. reg dlng_pop l(1).dlng_pop dlnpg dlny_pop l.ub

Number of obs = 50, F(4, 45) = 21.46, Prob > F = 0.0000
R-squared = 0.6561, Adj R-squared = 0.6255, Root MSE = .01612
Model SS = .022300781 (df 4), Residual SS = .011689396 (df 45), Total SS = .033990177 (df 49)

dlng_pop | Coef. | Std. Err. | t | P>|t| | [95% Conf. Interval]
dlng_pop L1. | .2455005 | .0922497 | 2.66 | 0.011 | .0597001 .4313009
dlnpg | .131232 | .0243812 | 5.38 | 0.000 | .1803382 .0821257
dlny_pop | .5519896 | .1230212 | 4.49 | 0.000 | .3042121 .799767
ub L1. | .10335 | .0575364 | 1.80 | 0.079 | .2192342 .0125342
_cons | .0047133 | .0039887 | 1.18 | 0.244 | .0033204 .012747
80 / 82
81 / 82
Summary on non-stationary time series
I Since non-stationarity, especially a unit root, can ruin the usual properties of linear regression, it is a common strategy to first test whether each series has a unit root, versus being stationary or having a deterministic trend.
I If a series has a unit root, take the first difference and test again for a unit root; difference again if it still has one, and so on.
I Use the appropriately differenced variables to build an ARMA or ARDL model.
I Test for cointegration to see whether the original variables have a long-term relationship.
I One may plug the deviation from the long-run relationship back in to build an error-correction model (ECM).
82 / 82