Lectures on Structural Change
Eric Zivot
Department of Economics, University of Washington
April 4, 2001
1 Overview of Testing for and Estimating Structural Change in Econometric Models
2 Some Preliminary Asymptotic Theory
Reference: Stock, J.H. (1994). "Unit Roots, Structural Breaks and Trends," in Handbook of Econometrics, Vol. IV.
3 Tests of Parameter Constancy in Linear Models
3.1 Rolling Regression
Reference: Zivot, E. and J. Wang (2002). Modeling Financial Time Series with S-PLUS. Springer-Verlag, New York.

For a window of width n, with k < n < T, the rolling linear regression model is
  y_t(n) = X_t(n)β_t(n) + ε_t(n),  t = n, ..., T

where y_t(n) is (n×1), X_t(n) is (n×k), β_t(n) is (k×1) and ε_t(n) is (n×1).
• Observations in y_t(n) and X_t(n) are the n most recent values, from times t − n + 1 to t
• OLS estimates are computed for sliding windows of width n and incrementm
• Poor man’s time varying regression model
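The sliding-window computation above can be sketched in a few lines of Python; this is a minimal illustration assuming numpy, and the function name and interface are my own, not from the text.

```python
import numpy as np

def rolling_ols(y, X, width, step=1):
    """OLS coefficients over sliding windows of the given width.

    Returns the 1-based window end points (t = n, ..., T in the text's
    notation) and a (number-of-windows x k) array of estimates."""
    ends, betas = [], []
    for t in range(width, len(y) + 1, step):
        b, *_ = np.linalg.lstsq(X[t - width:t], y[t - width:t], rcond=None)
        ends.append(t)
        betas.append(b)
    return np.array(ends), np.array(betas)

# Stable-coefficient data: every window should recover beta close to (0, 1)
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 1.0 * x + 0.5 * rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
ends, betas = rolling_ols(y, X, width=24)
```

Plotting the rows of `betas` against `ends` gives the usual rolling-coefficient diagnostic plot.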
3.1.1 Application: Simulated Data
Consider the linear regression model
  y_t = α + βx_t + ε_t,  t = 1, ..., T = 200
  x_t ~ iid N(0, 1)
  ε_t ~ iid N(0, σ²)
No structural change parameterization: α = 0, β = 1, σ = 0.5

Structural change cases:

• Break in intercept: α = 1 for t > 100
• Break in slope: β = 3 for t > 100
• Break in variance: σ = 0.25 for t > 100
• Random walk in slope: β = β_t = β_{t−1} + η_t, η_t ~ iid N(0, 0.1), with β_0 = 1
(show simulated data)
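The four designs can be generated as follows; this is a sketch assuming numpy, with the case labels my own and N(0, 0.1) read as variance 0.1 for the random walk innovations.

```python
import numpy as np

def simulate(case, T=200, seed=0):
    """Simulate y_t = alpha_t + beta_t*x_t + eps_t under the designs above."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T)
    alpha = np.zeros(T)                # intercept path
    beta = np.ones(T)                  # slope path
    sigma = np.full(T, 0.5)            # error s.d. path
    if case == "mean":                 # break in intercept at t = 100
        alpha[100:] = 1.0
    elif case == "slope":              # break in slope at t = 100
        beta[100:] = 3.0
    elif case == "var":                # break in error s.d. at t = 100
        sigma[100:] = 0.25
    elif case == "rw":                 # random walk slope, var(eta) = 0.1
        beta = 1.0 + np.cumsum(rng.normal(0.0, np.sqrt(0.1), T))
    y = alpha + beta * x + sigma * rng.standard_normal(T)
    return y, x
```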
3.1.2 Application: Exchange rate regressions
References:
1. Baillie, R. and T. Bollerslev (???)
2. Sakoulis, G. and E. Zivot (2001). "Time Variation and Structural Change in the Forward Discount: Implications for the Forward Rate Unbiasedness Hypothesis," unpublished manuscript, Department of Economics, University of Washington.
Let
  s_t = log of spot exchange rate in month t
  f_t = log of forward exchange rate in month t
The forward rate unbiasedness hypothesis is typically investigated using the so-called differences regression

  Δs_{t+1} = α + β(f_t − s_t) + ε_{t+1}

  f_t − s_t = forward discount
If the forward rate f_t is an unbiased forecast of the future spot rate s_{t+1} then we should find

  α = 0 and β = 1
The forward discount is often modeled as an AR(1) process:

  f_t − s_t = δ + φ(f_{t−1} − s_{t−1}) + u_t
Statistical Issues
• Δs_{t+1} is close to a random walk with large variance
• f_t − s_t behaves like a highly persistent AR(1) with small variance
• f_t − s_t appears to be unstable over time

Tasks:

• compute rolling regressions for 24-month windows incremented by 1 month
• compute rolling regressions for 48-month windows incremented by 12 months
(insert rolling regression graphs here)
3.2 Chow Forecast Test
Reference: Chow, G.C. (1960). "Tests of Equality between Sets of Coefficients in Two Linear Regressions," Econometrica, 28, 591-605.

Consider the linear regression model with k variables
  y_t = x_t'β + u_t,  u_t ~ (0, σ²),  t = 1, ..., n

  y = Xβ + u
Parameter constancy hypothesis
H0 : β is constant
Chow forecast test intuition
• If parameters are constant then out-of-sample forecasts should be unbiased (forecast errors have mean zero)
Chow forecast test construction:
• Split the sample into n1 > k and n2 = n − n1 observations:

  y = (y1', y2')',  X = (X1', X2')'

where y1 (n1×1) and X1 (n1×k) contain the first n1 observations and y2 (n2×1) and X2 (n2×k) contain the last n2 observations.

• Fit the model using the first n1 observations:

  β̂1 = (X1'X1)⁻¹X1'y1
  û1 = y1 − X1β̂1
  σ̂1² = û1'û1/(n1 − k)

• Use β̂1 and X2 to predict y2 over the next n2 observations:

  ŷ2 = X2β̂1

• Compute the out-of-sample prediction errors:

  û2 = y2 − ŷ2 = y2 − X2β̂1
Under H0 : β is constant,

  û2 = u2 − X2(β̂1 − β)

and

  E[û2] = 0
  var(û2) = σ²(I_{n2} + X2(X1'X1)⁻¹X2')

Further, if the errors u are Gaussian then

  û2 ~ N(0, var(û2))
  û2' var(û2)⁻¹ û2 ~ χ²(n2)
  (n1 − k)σ̂1²/σ² ~ χ²(n1 − k)

This motivates the Chow forecast test statistic

  ChowFCST(n2) = û2'(I_{n2} + X2(X1'X1)⁻¹X2')⁻¹û2 / (n2σ̂1²) ~ F(n2, n1 − k)
Remarks:
• The test is a general specification test for unbiased forecasts
• Implementation requires an a priori split of the data into fit and forecast samples
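The construction above translates directly into code. The following is a numpy-only sketch; the function name and simulated data are illustrative, not from the text, and the returned statistic is to be compared with F(n2, n1 − k) critical values.

```python
import numpy as np

def chow_forecast(y, X, n1):
    """Chow forecast statistic: fit on the first n1 observations, then
    assess the out-of-sample prediction errors for the remaining n2."""
    n, k = X.shape
    n2 = n - n1
    X1, y1, X2, y2 = X[:n1], y[:n1], X[n1:], y[n1:]
    XtX1 = X1.T @ X1
    b1 = np.linalg.solve(XtX1, X1.T @ y1)            # beta_hat_1
    u1 = y1 - X1 @ b1
    s1sq = u1 @ u1 / (n1 - k)                        # sigma_hat_1^2
    u2 = y2 - X2 @ b1                                # prediction errors
    V = np.eye(n2) + X2 @ np.linalg.solve(XtX1, X2.T)
    return u2 @ np.linalg.solve(V, u2) / (n2 * s1sq)

rng = np.random.default_rng(1)
x = rng.standard_normal(150)
X = np.column_stack([np.ones(150), x])
y_stable = x + 0.5 * rng.standard_normal(150)             # constant beta
y_break = y_stable + 10.0 * (np.arange(150) >= 100)       # intercept break
F0 = chow_forecast(y_stable, X, n1=100)
F1 = chow_forecast(y_break, X, n1=100)
```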
3.2.1 Application: Exchange rate regression cont’d
(show Eviews output for Chow forecast test)
3.3 CUSUM and CUSUMSQ Tests
Reference: Brown, R.L., J. Durbin and J.M. Evans (1975). "Techniques for Testing the Constancy of Regression Relationships over Time," Journal of the Royal Statistical Society, Series B, 37, 149-192.
3.3.1 Recursive least squares estimation
The recursive least squares (RLS) estimates of β are based on estimating

  y_t = β_t'x_t + ξ_t,  t = 1, ..., n

by least squares recursively for t = k + 1, ..., n, giving n − k RLS estimates (β̂_{k+1}, ..., β̂_n).
• RLS estimates may be efficiently computed using the Kalman filter
• If β is constant over time then β̂_t should quickly settle down near a common value
• If some of the elements in β are not constant then the corresponding RLS estimates should show instability. Hence, a simple graphical technique for uncovering parameter instability is to plot the RLS estimates β̂_{it} (i = 1, 2) and look for instability in the plots.
3.3.2 Recursive residuals
Formal tests for structural stability of the regression coefficients may be computed from the standardized 1-step ahead recursive residuals

  w_t = v_t/√f_t = (y_t − β̂_{t−1}'x_t)/√f_t

  f_t = σ̂²[1 + x_t'(X_{t−1}'X_{t−1})⁻¹x_t]
Intuition:
• If β changes in the next period then the forecast error will not have mean zero
3.3.3 CUSUM statistic
The CUSUM statistic of Brown, Durbin and Evans (1975) is

  CUSUM_t = Σ_{j=k+1}^t ŵ_j/σ̂_w

  σ̂_w² = (1/(n − k)) Σ_{t=k+1}^n (ŵ_t − w̄)²

Under the null hypothesis that β is constant, CUSUM_t has mean zero and variance that is proportional to t − k − 1.
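A sketch of the recursive residuals and the CUSUM path in Python, assuming numpy and that the one-step-ahead forecast of y_t uses OLS on observations 1, ..., t−1; the function names are illustrative.

```python
import numpy as np

def recursive_residuals(y, X):
    """Standardized one-step-ahead recursive residuals w_t, t = k+1, ..., n."""
    n, k = X.shape
    w = []
    for t in range(k, n):                     # first k obs just identify beta
        XtX_inv = np.linalg.inv(X[:t].T @ X[:t])
        b = XtX_inv @ X[:t].T @ y[:t]         # beta_hat_{t-1}
        f = 1.0 + X[t] @ XtX_inv @ X[t]
        w.append((y[t] - X[t] @ b) / np.sqrt(f))
    return np.array(w)                        # n - k residuals

def cusum_path(w):
    """Cumulative sums scaled by the estimated s.d. of the residuals."""
    return np.cumsum(w) / w.std(ddof=1)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(200)   # stable parameters
w = recursive_residuals(y, X)
path = cusum_path(w)
```

Under parameter constancy the path should wander around zero; under a break it drifts systematically in one direction.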
3.3.4 Application: Simulated Data
(insert graphs here)
Remarks
• Ploberger and Kramer (1990) show the CUSUM test can be constructed with OLS residuals instead of recursive residuals
• The CUSUM test is essentially a test to detect instability in the intercept alone
• The CUSUM test has power only in the direction of the mean regressor
• The CUSUMSQ test does not have good power for changing coefficients but has power for a changing variance
3.3.5 Application: Exchange Rate Regression cont’d
(show recursive estimates, CUSUM and CUSUMSQ plots)
3.4 Nyblom’s Parameter Stability Test
Reference: Nyblom, J. (1989). "Testing for the Constancy of Parameters Over Time," Journal of the American Statistical Association, 84(405), 223-230.

Consider the linear regression model with k variables
  y_t = x_t'β + ε_t
The time varying parameter (TVP) model assumes
  β = β_t = β_{t−1} + η_t,  η_t ~ (0, σ_η²)

Note: the TVP model nests the single break model by setting η_t = γ for t = r + 1 and η_t = 0 otherwise.

The hypotheses of interest are then

  H0 : β is constant ⇔ σ_η² = 0
  H1 : σ_η² > 0
Nyblom (1989) derives the locally best invariant test as the Lagrange multiplier test, which rejects for large values of

  L = (1/(n²σ̂_ε²)) tr[ S_n⁻¹ Σ_{j=1}^n (Σ_{k=j}^n x_k ε̂_k)(Σ_{k=j}^n x_k ε̂_k)' ]

where

  ε̂_t = y_t − x_t'β̂,  β̂ = (X'X)⁻¹X'y
  S_n = n⁻¹X'X
Under mild assumptions regarding the behavior of the regressors, the limiting distribution of L under the null is a Cramér-von Mises distribution:

  L ⇒ ∫₀¹ B_k^μ(λ)'B_k^μ(λ) dλ

  B_k^μ(λ) = W_k(λ) − λW_k(1)
  W_k(λ) = k-dimensional Brownian motion
Remarks:
• The test statistic is a sum of cumulative sums of weighted full-sample residuals
• The test is for constancy of all parameters
• The test is applicable to models estimated by methods other than OLS
• The distribution of L is different if x_t is non-stationary (unit root, deterministic trend)
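The cumulative sums above run from each j to the end of the sample, so they can be computed as reversed cumulative sums. The following sketch assumes numpy; the 1/(n²σ̂²) scaling is my reading of the statistic (it makes L O(1) under the null, matching the Cramér-von Mises limit).

```python
import numpy as np

def nyblom_L(y, X):
    """Sketch of Nyblom's L statistic from full-sample OLS residuals."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                              # full-sample OLS residuals
    s2 = e @ e / n
    Sn = X.T @ X / n
    xe = X * e[:, None]                        # rows x_t * e_t
    s = np.cumsum(xe[::-1], axis=0)[::-1]      # row j holds sum_{t=j}^n x_t e_t
    M = s.T @ s                                # = sum_j s_j s_j'
    return np.trace(np.linalg.solve(Sn, M)) / (n**2 * s2)

rng = np.random.default_rng(3)
x = rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
y0 = 0.5 + x + 0.5 * rng.standard_normal(200)   # stable parameters
y1 = y0 + 5.0 * (np.arange(200) >= 100)         # large mean shift
L0, L1 = nyblom_L(y0, X), nyblom_L(y1, X)
```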
3.4.1 Application: Simulated Data
Nyblom Test

  Model         Lc
  No SC         .332
  Mean shift    13.14***
  Slope shift   14.13***
  Var shift     .351
  RW slope      9.77***
3.4.2 Application: Exchange rate regression cont’d
Nyblom Test

  Model      Lc
  AR(1)      1.27***
  Diff reg   .413
3.5 Hansen’s Parameter Stability Tests
References
1. Hansen, B.E. (1992). "Testing for Parameter Instability in Linear Models," Journal of Policy Modeling, 14(4), 517-533.
2. Hansen, B.E. (1992). "Tests for Parameter Instability in Regressions with I(1) Processes," Journal of Business and Economic Statistics, 10, 321-336.
Same set-up as for Nyblom's LM test. Under the null of constant parameters, the score vector from the linear model with Gaussian errors is based on the normal equations

  Σ_{t=1}^n x_{it}ε̂_t = 0,  i = 1, ..., k

  Σ_{t=1}^n (ε̂_t² − σ̂²) = 0

where

  ε̂_t = y_t − x_t'β̂,  σ̂² = n⁻¹Σ_{t=1}^n ε̂_t²

Define

  f_it = x_{it}ε̂_t     for i = 1, ..., k
  f_{k+1,t} = ε̂_t² − σ̂²

  S_it = Σ_{j=1}^t f_ij,  i = 1, ..., k + 1
Note that

  Σ_{t=1}^n f_it = 0,  i = 1, ..., k + 1
Hansen's LM test for

  H0 : β_i is constant,  i = 1, ..., k

and for

  H0 : σ² is constant

is

  L_i = (1/(nV_i)) Σ_{t=1}^n S_it²,  i = 1, ..., k + 1

  V_i = Σ_{t=1}^n f_it²

Under the null of no structural change

  L_i ⇒ ∫₀¹ B_1^μ(λ)² dλ
For testing the joint hypothesis

  H0 : β and σ² are constant

define the (k + 1)×1 vectors

  f_t = (f_1t, ..., f_{k+1,t})'
  S_t = (S_1t, ..., S_{k+1,t})'

Hansen's LM statistic for testing the constancy of all parameters is

  L_c = (1/n) Σ_{t=1}^n S_t'V⁻¹S_t = (1/n) tr( V⁻¹ Σ_{t=1}^n S_tS_t' )

  V = Σ_{t=1}^n f_tf_t'

Under the null of no structural change

  L_c ⇒ ∫₀¹ B_{k+1}^μ(λ)'B_{k+1}^μ(λ) dλ
Remarks
• Tests are very easy to compute and are robust to heteroskedasticity
• The null distribution is non-standard and depends upon the number of parameters tested for stability
• The 5% critical value for the individual tests is 0.470
• The tests are not informative about the date of structural change
• Hansen's L_1 test for constancy of the intercept is analogous to the CUSUM test
• Hansen's L_{k+1} test for constancy of the variance is analogous to the CUSUMSQ test
• Hansen's L_c test for constancy of all parameters is similar to Nyblom's test
• The distribution of the tests is different if the data are nonstationary (unit root, deterministic trend) - see Hansen (1992), JBES.
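Because the scores and their partial sums are simple transformations of the OLS residuals, the L_i and L_c statistics take only a few lines; this is a sketch assuming numpy, with the last score column being the variance score f_{k+1,t} = ε̂_t² − σ̂².

```python
import numpy as np

def hansen_stability(y, X):
    """Sketch of Hansen's individual L_i and joint L_c statistics."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / n
    f = np.column_stack([X * e[:, None], e**2 - s2])    # (n, k+1) scores
    S = np.cumsum(f, axis=0)                            # partial sums S_t
    Li = (S**2).sum(axis=0) / (n * (f**2).sum(axis=0))  # individual tests
    V = f.T @ f
    Lc = np.trace(np.linalg.solve(V, S.T @ S)) / n      # joint test
    return Li, Lc

rng = np.random.default_rng(4)
x = rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
y = 0.5 + x + 0.5 * rng.standard_normal(200)            # stable parameters
Li, Lc = hansen_stability(y, X)
```

Each `Li` entry is compared to the 0.470 individual critical value; `Lc` is compared to the joint critical value for k + 1 parameters.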
3.5.1 Application: Simulated Data
Hansen Tests

  Model         α          β          σ²        Joint
  No SC         .179       .134       .248      .503
  Mean shift    13.19***   .234       .064      13.3***
  Slope shift   .588       5.11***    .067      5.25***
  Var shift     .226       .119       .376*     .736
  RW slope      .253       4.08***    .196      4.4***
3.5.2 Application: Exchange rate regression cont’d
Hansen Tests

  Model      intercept   slope   variance   Joint
  AR(1)      .382        .147    2.94***    3.90***
  Diff reg   .104        .153    .186       .520
4 Tests for Single Structural Change
Consider the linear regression model with k variables
  y_t = x_t'β_t + ε_t,  t = 1, ..., n

No structural change null hypothesis

  H0 : β_t = β
Single break date alternative hypothesis
  H1 : β_t = β for t ≤ m (the break date);  β_t = β + γ for t > m, with γ ≠ 0

  k < m < n − k

  λ = m/n = break fraction
Remarks:
• Under the no-break null, γ = 0.
• Pure structural change model: all coefficients change (γ_i ≠ 0 for i = 1, ..., k)
• Partial structural change model: some coefficients change (γ_i ≠ 0 for some i)
4.1 Chow’s Test with Known Break Date
Assume: m or λ is known.

For a data interval [r, ..., s] such that s − r > k, define
• β̂_{r,s} = OLS estimate of β
• ε̂_{r,s} = OLS residual vector
• SSR_{r,s} = ε̂_{r,s}'ε̂_{r,s} = sum of squared residuals
Chow's breakpoint test for testing H0 vs. H1 with m known is

  F_n(m/n) = F_n(λ) = [(SSR_{1,n} − (SSR_{1,m} + SSR_{m+1,n}))/k] / [(SSR_{1,m} + SSR_{m+1,n})/(n − 2k)]

The Chow test may also be computed as the F-statistic for testing γ = 0 in the dummy variable regression

  y_t = x_t'β + D_t x_t'γ + ε_t
  D_t = 1 if t > m; 0 otherwise

Under H0 : γ = 0 with m known,

  k·F_n(λ) →d χ²(k)
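The breakpoint statistic is just a comparison of restricted and split-sample sums of squared residuals; a numpy-only sketch, with function names and simulated data of my own:

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals on a data interval."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    return u @ u

def chow_breakpoint(y, X, m):
    """Chow breakpoint F statistic for a known break after observation m."""
    n, k = X.shape
    restricted = ssr(y, X)                          # SSR_{1,n}
    split = ssr(y[:m], X[:m]) + ssr(y[m:], X[m:])   # SSR_{1,m} + SSR_{m+1,n}
    return ((restricted - split) / k) / (split / (n - 2 * k))

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
y0 = x + 0.5 * rng.standard_normal(200)             # no break
y1 = y0 + 5.0 * (np.arange(200) >= 100)             # intercept break at m = 100
F0 = chow_breakpoint(y0, X, 100)
F1 = chow_breakpoint(y1, X, 100)
```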
4.1.1 Application: Simulated Data cont’d
(insert table here)
4.2 Quandt’s LR Test with Unknown Break Date
References:
1. Quandt, R.E. (1960). "Tests of the Hypothesis that a Linear Regression System Obeys Two Separate Regimes," Journal of the American Statistical Association, 55, 324-330.

2. Davies, R.B. (1977). "Hypothesis Testing When a Nuisance Parameter is Present only Under the Alternative," Biometrika, 64, 247-254.

3. Kim, H.-J., and D. Siegmund (1989). "The Likelihood Ratio Test for a Change-Point in Simple Linear Regression," Biometrika, 76(3), 409-423.
Assume: m or λ is unknown.

Quandt considered the LR statistic for testing H0 : γ = 0 vs. H1 : γ ≠ 0 when m is unknown. This turns out to be the maximal F_n(λ) statistic over a range of break dates m0, ..., m1:

  QLR = max_{m∈[m0,m1]} F_n(m/n) = max_{λ∈[λ0,λ1]} F_n(λ)

  λ_i = m_i/n,  i = 0, 1
Remarks
• Implicitly, the break date m and break fraction λ are estimated using

  m̂ = argmax_m F_n(m/n),  λ̂ = argmax_λ F_n(λ)

• Under the null, the break date m defined under the alternative is not identified. This is an example of the "Davies problem".
• Davies (1977) showed that if parameters are unidentified under the null, standard χ² inference does not obtain.
Under H0 : γ = 0, Kim and Siegmund (1989) showed

  QLR ⇒ sup_{λ∈[λ0,λ1]} B_k^μ(λ)'B_k^μ(λ) / (λ(1 − λ))

  B_k^μ(λ) = W_k(λ) − λW_k(1) = Brownian bridge
Remarks
• The distribution of QLR is non-standard and depends on the number of variables k and the trimming parameters λ0 and λ1
• Critical values computed by simulation are given in Andrews (1993), and are larger than the χ²(k) critical values:

5% critical values

  k     χ²(k)   QLR
  1     3.84    8.85
  10    18.3    27.03
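Maximizing the breakpoint F over a trimmed range of break dates gives the QLR statistic; the sketch below, assuming numpy, re-runs OLS on each split (O(T) regressions) and uses 15% trimming by default.

```python
import numpy as np

def qlr_stat(y, X, trim=0.15):
    """Sup-F (QLR) statistic and the implied break date estimate."""
    n, k = X.shape

    def ssr(yy, XX):
        b, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        u = yy - XX @ b
        return u @ u

    full = ssr(y, X)
    m_lo = max(int(trim * n), k + 1)          # trimmed admissible range
    m_hi = min(int((1 - trim) * n), n - k - 1)
    best_F, best_m = -np.inf, None
    for m in range(m_lo, m_hi + 1):
        split = ssr(y[:m], X[:m]) + ssr(y[m:], X[m:])
        F = ((full - split) / k) / (split / (n - 2 * k))
        if F > best_F:
            best_F, best_m = F, m
    return best_F, best_m

rng = np.random.default_rng(6)
x = rng.standard_normal(200)
X = np.column_stack([np.ones(200), x])
y = x + 0.5 * rng.standard_normal(200) + 5.0 * (np.arange(200) >= 100)
F, m = qlr_stat(y, X)                          # break at observation 100
```

The statistic is compared to the Andrews (1993) critical values above, not to the χ²(k) values.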
4.2.1 Application: Simulated Data
Construction of F-statistics
(insert graphs here)
4.3 Andrews' Tests with Unknown Break Date
Reference: Andrews, D.W.K. (1993). "Tests for Parameter Instability and Structural Change with Unknown Change Point," Econometrica, 61, 821-856.
4.3.1 Application: AR(1) process with change in AR coefficient cont'd
4.4 Empirical Application
Reference: Stock and Watson (199?), “ ” Journal of Business and EconomicStatistics.
5 Day 2: Estimation of Models with One Structural Change
Reference: Bai, J. (1994). "Least Squares Estimation of a Shift in Linear Processes," Journal of Time Series Analysis.
5.1 Abstract
• This paper considers a mean shift with an unknown shift point in a linear process and estimates the unknown shift point by the method of least squares.
• Pre-shift and post-shift means are estimated concurrently with the change point.
• The consistency and the rate of convergence of the estimated change point are established.
• The asymptotic distribution of the change point estimator is obtained when the magnitude of the shift is small. It is shown that serial correlation affects the variance of the change point estimator via the sum of the coefficients of the linear process.
5.2 The Mean Shift Model
Consider a time series Y_t that undergoes a mean shift at an unknown time:

  Y_t = μ_t + X_t,  t = ..., −2, −1, 0, 1, 2, ...

  X_t = a(L)ε_t
  a(L) = Σ_{j=0}^∞ a_j L^j,  a(1) ≠ 0

where

  μ_t = μ1 if t ≤ k0;  μ_t = μ2 if t > k0

and μ1, μ2 and k0 are unknown parameters, with k0 the change point.

Estimation problem: estimate μ1, μ2 and k0 given T observations on Y_t. If X_t has an ARMA representation then it is also of interest to estimate the ARMA parameters. Define

  k0 = [τT],  0 < τ < 1
  λ = μ2 − μ1
5.3 Results for Model with Known Break Date
If k = k0 were known, then μ1 and μ2 could be consistently estimated by least squares, ignoring the dynamics in X_t, by solving

  min_{μ1,μ2} { Σ_{t=1}^{k0} (Y_t − μ1)² + Σ_{t=k0+1}^T (Y_t − μ2)² }

Then μ̂1 and μ̂2 are consistent with limiting distributions

  T^{1/2}(μ̂1 − μ1) →d N(0, τ⁻¹a(1)²σ²)
  T^{1/2}(μ̂2 − μ2) →d N(0, (1 − τ)⁻¹a(1)²σ²)
5.4 Least Squares Estimation
The least squares (LS) estimator k̂ of the change point k0, ignoring the dynamics in X_t, is defined as

  k̂ = argmin_k [ min_{μ1,μ2} { Σ_{t=1}^k (Y_t − μ1)² + Σ_{t=k+1}^T (Y_t − μ2)² } ]

  τ̂ = k̂/T

i.e., the shift point is estimated by minimizing the sum of squared residuals over all possible sample splits, ignoring the dynamics in X_t. The LS residual is defined as

  X̂_t = Y_t − μ̂1 − (μ̂2 − μ̂1)I[t > k̂]

where I[·] is the indicator function. The following assumptions are made:

• Assumption A: ε_t ~ iid(0, σ²) or ε_t ~ mds(0, σ²)
• Assumption B: Σ_{j=0}^∞ j|a_j| < ∞
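The LS change point estimator is a one-dimensional grid search over sample splits; a sketch in Python assuming numpy, with AR(1) noise standing in for the linear process X_t (the dynamics are deliberately ignored in estimation, as in the text).

```python
import numpy as np

def ls_mean_break(Y):
    """LS change point for a mean shift: minimize over all sample splits."""
    T = len(Y)
    best_k, best_ssr = None, np.inf
    for k in range(1, T):                       # all admissible splits
        m1, m2 = Y[:k].mean(), Y[k:].mean()
        s = ((Y[:k] - m1) ** 2).sum() + ((Y[k:] - m2) ** 2).sum()
        if s < best_ssr:
            best_ssr, best_k = s, k
    return best_k, Y[:best_k].mean(), Y[best_k:].mean()

# Mean shift of size lambda = 3 at k0 = 50 with AR(1) noise
rng = np.random.default_rng(7)
e = rng.standard_normal(100)
Xt = np.zeros(100)
for t in range(1, 100):
    Xt[t] = 0.5 * Xt[t - 1] + e[t]
Y = 3.0 * (np.arange(100) >= 50) + Xt
k_hat, mu1_hat, mu2_hat = ls_mean_break(Y)
```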
5.4.1 Consistency of τ̂

Theorem. Under Assumptions A and B, the estimator τ̂ satisfies

  |τ̂ − τ| = Op(1/(Tλ²))

so that τ̂ is a consistent estimator of τ. Since k̂ = [τ̂T], it follows that

  k̂ − k0 = Op(λ⁻²)

so that k̂ is not a consistent estimator of k0.

5.5 Limiting Distribution of Break Fraction
• To construct a manageable asymptotic distribution for τ̂ it is necessary to assume that λ depends on T and diminishes as T increases.
• If λ is kept fixed, independent of T, then the limiting distribution of τ̂ depends on the distribution of ε_t and on λ in a complicated way.
• The asymptotic distribution of τ̂ is used to construct an asymptotic confidence interval for τ or k0.

The dependence of λ on T is formalized as

Assumption C:

  λ_T → 0,  T^{1/2}λ_T/(log T)^{1/2} → ∞
Theorem. Under Assumptions A, B and C,

  Tλ_T²(τ̂ − τ) →d σ²a(1)² argmax_v { W(v) − |v|/2 }

where W(v) is a two-sided Brownian motion on ℝ.
Remarks:
• τ̂ converges to τ at rate T ⇒ τ̂ is super-consistent
• The asymptotic distribution of τ̂ is non-standard (not normal)
• The scale of the limiting distribution depends on the autocorrelation in X_t through σ²a(1)²
• Confidence intervals for τ may be computed from the limiting distribution
• The asymptotic distribution may not be accurate if λ = μ2 − μ1 is large
5.6 Limiting Distribution of Estimated Shift Coefficients
Theorem.

  T^{1/2}(μ̂1 − μ1) →d N(0, τ⁻¹a(1)²σ²)
  T^{1/2}(μ̂2 − μ2) →d N(0, (1 − τ)⁻¹a(1)²σ²)
• Since τ̂ converges at rate T, the limiting distributions of μ̂1 and μ̂2 are the same as if k0 were known
5.7 Simulation Results
The simulation model is

  Y_t = μ + λI[t ≥ k0] + X_t,  t = 1, ..., T
  X_t = ρX_{t−1} + ε_t + θε_{t−1}
  ε_t ~ iid N(0, 1)

where T = 100, k0 = 0.5T, λ = 2, ρ = −0.6, 0.0, 0.6 and θ = −0.5, 0.5. The main results, based on N = 100 simulations, are:

• For fixed θ, the range of k̂ becomes larger as ρ varies from −0.6 to 0.6.
• The range of k̂ is smaller for θ = −0.5 than for θ = 0.5 for every given ρ, as predicted by theory.
• The ignored dynamics of the error do not have much effect on the point estimates of the break date.
6 Estimation of Models with Multiple Structural Change
6.1 Estimating Multiple Breaks One-at-a-Time
Reference: Bai, J. (1997). "Estimating Multiple Breaks One at a Time," Econometric Theory.
• Sequential (one-by-one) rather than simultaneous estimation of multiple breaks is investigated.
• The number of least squares regressions required to compute all of the break points is of order T, the sample size.
• Each estimated break point is shown to be consistent for one of the true ones despite underspecification of the number of breaks.
• The estimated break points are shown to be T-consistent, the same rate as for simultaneous estimation.
• Unlike simultaneous estimation, the limiting distributions are generally not symmetric and are influenced by the regression parameters of all regimes.
• A simple method is introduced to obtain break point estimators that have the same limiting distributions as those obtained via simultaneous estimation.
• A procedure is proposed to consistently estimate the number of breaks.
6.1.1 The Model
All of the results can be determined from the simple two-break model

  Y_t = μ1 + X_t   if t ≤ k_1^0
  Y_t = μ2 + X_t   if k_1^0 + 1 ≤ t ≤ k_2^0
  Y_t = μ3 + X_t   if k_2^0 + 1 ≤ t ≤ T

where μ_i is the mean of regime i, X_t = a(L)ε_t is a linear process in martingale differences, and k_1^0 and k_2^0 are the unknown break points.
6.1.2 Estimation of One Break
The idea of sequential estimation is to consider one break at a time: the model is treated as if there were only one break point, and this break point is estimated using least squares. The break point is estimated by minimizing the sum of squared residuals allowing for one break,

  S_T(k) = Σ_{t=1}^k (Y_t − Ȳ_k)² + Σ_{t=k+1}^T (Y_t − Ȳ*_k)²

where Ȳ_k is the mean of the first k observations and Ȳ*_k is the mean of the last T − k observations. The minimization is achieved by searching over all possible break points and picking the break point that gives the smallest S_T(k). The break point estimator is denoted k̂.

In addition to the break date, the break fraction τ = k/T is of interest. The estimated break fraction is τ̂ = k̂/T.

6.1.3 Asymptotic Results
• In a multiple break model, τ̂ is T-consistent for one of the true breaks τ_i^0 = k_i^0/T.
• However, k̂ is not consistent for any k_i^0.
• If the first break dominates in terms of the relative span of regimes and the magnitude of shifts, i.e.

  (τ_1^0/τ_2^0)(μ1 − μ2)² > ((1 − τ_2^0)/(1 − τ_1^0))(μ2 − μ3)²,

then τ̂ → τ_1^0. Otherwise, τ̂ → τ_2^0.
• If the first and second breaks are equivalent in terms of the relative span of regimes and the magnitude of shifts then τ̂ converges to a random variable with mass only on τ_1^0 and τ_2^0.
6.1.4 Sequential Estimation
• When τ̂ = k̂/T is consistent for τ_1^0, an estimate of τ_2^0 can be obtained by applying the same technique to the subsample [k̂, T]. Let k̂_2 denote the resulting estimator. Then τ̂_2 = k̂_2/T is T-consistent for τ_2^0 because in the subsample [k̂, T], k_2^0 is the dominating break, and the limiting distribution of k̂_2 is the same as in the single break model.
• For fixed magnitudes of shifts, the limiting distributions depend on the unknown distribution of the data and on the unknown magnitudes of the shifts. To get limiting distributions that are invariant, the shifts need to shrink as the sample size increases.
• The limiting distribution from sequential estimation is not symmetric about zero in general. In particular, if μ2 − μ1 and μ3 − μ2 have the same sign then the distribution of k̂ will have a heavy right tail, reflecting a tendency to overestimate the break point relative to simultaneous estimation.
6.1.5 Repartition
In general the sequential estimation method has a tendency to either under- or over-estimate the true location of a break point when there are multiple breaks. A simple re-estimation method called repartition can eliminate this asymmetry from the asymptotic distribution and works as follows:

• Start with T-consistent estimates of the k_i, such as those from sequential estimation. Call these k̂_i.
• Re-estimate k_1 using the subsample [1, k̂_2] and re-estimate k_2 using the subsample [k̂_1, T]. Denote the resulting estimators k̂_1* and k̂_2*.
• Because k̂_i is close to k_i^0, the subsample [k_{i−1}^0, k_{i+1}^0] is effectively used to estimate k_i^0, and so k̂_i* is T-consistent with a limiting distribution equivalent to that for a single break model (or for a model with multiple breaks estimated by the simultaneous method).
6.1.6 More than Two Breaks
The general multiple break model with m breaks is

  Y_t = μ1 + X_t       if t ≤ k_1^0
  Y_t = μ2 + X_t       if k_1^0 + 1 ≤ t ≤ k_2^0
  ...
  Y_t = μ_{m+1} + X_t  if k_m^0 + 1 ≤ t ≤ T
• A subsample [k, l] is said to contain a nontrivial break point if both k and l are bounded away from a break point by a positive fraction of the observations. Assuming knowledge of the number of breaks as well as the existence of a nontrivial break in a given subsample, all the breaks can be identified and all the estimated break fractions are T-consistent.
• In practice, a problem arises immediately as to whether a subsample contains a nontrivial break, which is clearly related to the determination of the number of breaks. It is suggested that the decision be made by testing the hypothesis of parameter constancy for the subsample using a sup-Wald test. Such a decision rule leads to a consistent estimate of the number of breaks.
• The number of breaks is determined using a sequential estimation procedure coupled with hypothesis testing.
— First test the entire sample for parameter constancy using the sup-Wald test.
— If parameter constancy is rejected, identify the first break as the value that maximizes the sup-Wald statistic. When the first break is identified, the whole sample is divided into two subsamples, with the first subsample consisting of the first k̂ observations and the second subsample consisting of the rest of the observations.
— Then use the sup-Wald test on the two subsamples and estimate a break date on any subsample where the test rejects. Divide the corresponding subsample in two at the new break date and continue with the process.
— Stop when the sup-Wald test does not reject on any subsample. The number of break points is equal to the number of subsamples minus 1.
• Bai and Perron (1997) suggest an alternative but related procedure to determine the number of breaks.
• Yet another procedure to select the number of breaks is to use the Schwarz BIC. This has been advocated by Yao (1988), Kim and Maddala (1991), Nunes et al. (1997), Liu, Wu and Zidek (2000), and Wang and Zivot (2001).
6.1.7 Simulation Results
The basic simulation model is the three break (4 regime) model in the mean:

  Y_t = μ1 + X_t,  t ≤ k_1^0
  Y_t = μ2 + X_t,  k_1^0 + 1 ≤ t ≤ k_2^0
  Y_t = μ3 + X_t,  k_2^0 + 1 ≤ t ≤ k_3^0
  Y_t = μ4 + X_t,  k_3^0 + 1 ≤ t ≤ T

The design parameters are

  μ = (1.0, 2.0, 1.0, 0.0)':   design 1
  μ = (1.0, 2.0, −1.0, 1.0)':  design 2
  μ = (1.0, 2.0, 3.0, 4.0)':   design 3

  T = 160
  k = (40, 80, 120)'
  X_t ~ N(0, 1)

and the number of simulations is 5,000. In design 1, the magnitudes of the breaks are the same. In design 2, the middle break is the largest.

Experiment 1: Estimate the break points, assuming the number of breaks is known, using the sequential, repartition and simultaneous estimation methods. Use designs I and II only.
• For design I, the distributions of the three break points are similar since the magnitudes of the breaks are identical. The distribution of the sequential estimates is slightly asymmetric, whereas the distributions of the repartition and simultaneous estimates are symmetric and almost identical.
• For design II, the distributions of the three break points are not identical since the middle break is the most pronounced. For all estimation methods, the middle break has the most concentrated distribution, followed by the third and then the first break. For the sequential methods, the first and third breaks have the same distribution as under simultaneous estimation. The middle break has an asymmetric distribution for the sequential method, and the asymmetry is removed by repartition.

Experiment 2: Determine the number of breaks using the sequential method with hypothesis testing and the Schwarz BIC model selection criterion.

• The sequential method has a tendency to underestimate the true number of breaks. The problem is caused in part by the inconsistent estimation of the error variance (for the no structural change test) in the presence of multiple breaks. When multiple breaks exist and only one is allowed for in estimation, the error variance cannot be consistently estimated; it is biased upward and thus decreases the power of the structural change test.
• The bias in the estimation of the error variance can be overcome using a two-step method. In the first step, a consistent (or less biased) estimate of the error variance is obtained by allowing more breaks. Let m (= 4 in the simulations) denote the fixed number of breaks imposed for the purpose of estimating the error variance. The error variance can be estimated via the simultaneous or the "one additional break" sequential procedure. In the second step, the number of breaks is determined by the sequential procedure coupled with hypothesis testing, where the structural change test uses the first-step estimate of the error variance.
• For design I, BIC does better at determining the true number of breaks than the two-step method, and the two-step method often finds only 1 break.
• For design II, the two methods are comparable.
• For design III, the two-step method outperforms BIC.
• The two-step method can be improved by using the Bai and Perron sup F(l)-test for testing multiple breaks instead of the Andrews sup-Wald test for a single break.
6.2 Estimating Linear Models with Multiple Structural Change
Reference: Bai, J. (1997). "Estimation of a Change Point in Multiple Regression," Review of Economics and Statistics.
• This paper studies the least squares estimation of change points in linear regression models.
• Extends the results of Bai (1997), Econometric Theory, to linear regression models.
• The model allows for lagged dependent variables and trending regressors.
• The error process can be dependent and heteroskedastic.
• For nonstationary regressors or disturbances the asymptotic distribution is shown to be skewed.
• The analysis applies to both pure and partial changes.
• A sequential method for estimating multiple breaks is described, as well as methods for constructing confidence intervals.
6.3 Estimation of Multiple Breaks Simultaneously
References:
1. Bai, J. and P. Perron (1998). “Estimating and Testing Linear Models withMultiple Structural Changes,” Econometrica, 66, 47-78
2. Bai, J. and P. Perron (2002). "Computation and Analysis of Multiple Structural Change Models," Journal of Applied Econometrics.
6.3.1 Application: Exchange Rate Regressions
Reference: Sakoulis, G. and E. Zivot (2001). "Time-Variation and Structural Change in the Forward Discount: Implications for the Forward Rate Unbiasedness Hypothesis," unpublished manuscript, Department of Economics, University of Washington.
7 Estimation of Time Varying Parameter Models
References:
1. Durbin, J. and S.-J. Koopman (2001). Time Series Analysis by StateSpace Methods. Oxford University Press, Oxford
2. Koopman, S.-J., N. Shephard, and J.A. Doornik (1999). "Statistical Algorithms for State Space Models Using SsfPack 2.2," Econometrics Journal, 2, 113-166.
7.1 Linear Gaussian State Space Models
  α_{t+1} = d_t + T_t α_t + H_t η_t
  y_t = c_t + Z_t α_t + G_t ε_t

where t = 1, ..., n, α_t is the m×1 state vector, y_t is the N×1 observation vector, d_t (m×1) and c_t (N×1) are intercepts, T_t (m×m) and Z_t (N×m) are system matrices, H_t (m×r) and G_t (N×N) scale the errors η_t (r×1) and ε_t (N×1), and

  α_1 ~ N(a, P),  η_t ~ iid N(0, I_r),  ε_t ~ iid N(0, I_N)
  E[ε_t η_t'] = 0

Compact notation used by SsfPack:

  ( α_{t+1} )
  (  y_t    ) = δ_t + Φ_t α_t + u_t

with δ_t and u_t of dimension (m+N)×1 and Φ_t of dimension (m+N)×m,

  α_1 ~ N(a, P)
  u_t ~ iid N(0, Ω_t)

where

  δ_t = ( d_t ),  Φ_t = ( T_t ),  u_t = ( H_t η_t ),  Ω_t = ( H_tH_t'    0     )
        ( c_t )         ( Z_t )         ( G_t ε_t )          (   0     G_tG_t' )

Initial value parameters

  Σ = ( P  )
      ( a' )

Note: For multivariate models, i.e. N > 1, G_tG_t' is assumed to be diagonal.
7.2 Initial Conditions
The initial state variance is assumed to be of the form

  P = P* + κP∞,  κ = 10⁷

where P* is for the stationary state components and P∞ is for the non-stationary state components.

Note: When some elements of the state vector are nonstationary, the SsfPack algorithms implement an "exact diffuse prior" approach.
7.3 Regression Model with Time Varying Parameters
Time varying parameter regression:

  y_t = β_{0,t} + β_{1,t}x_t + ν_t,  ν_t ~ N(0, σ_ν²)
  β_{0,t+1} = β_{0,t} + ξ_t,  ξ_t ~ N(0, σ_ξ²)
  β_{1,t+1} = β_{1,t} + ς_t,  ς_t ~ N(0, σ_ς²)

Let α_t = (β_{0,t}, β_{1,t})', x_t = (1, x_t)', H = diag(σ_ξ, σ_ς) and G = σ_ν. The state space form is

  ( α_{t+1} )   ( I_2  )         ( Hη_t )
  (  y_t    ) = ( x_t' ) α_t +   ( Gε_t )

and has parameters

  Φ_t = ( I_2  ),   Ω = ( σ_ξ²  0     0    )
        ( x_t' )        ( 0     σ_ς²  0    )
                        ( 0     0     σ_ν² )

The initial state matrix is

  Σ = ( −1   0 )
      (  0  −1 )
      (  0   0 )
7.4 Kalman Filter and Smoother
The Kalman filter is a recursive algorithm for evaluating the moments of the normally distributed state vector α_{t+1} conditional on the observed data Y_t = (y_1, ..., y_t) and the state space model parameters. Let a_t = E[α_t|Y_{t−1}] and P_t = var(α_t|Y_{t−1}).

• The filtering or updating equations compute

  a_{t|t} = E[α_t|Y_t]
  P_{t|t} = var(α_t|Y_t)
  v_t = y_t − c_t − Z_t a_t   (prediction error)
  F_t = var(v_t)   (prediction error variance)

• The prediction equations of the Kalman filter compute a_{t+1} and P_{t+1}
The Kalman smoothing algorithm is a backward recursion which computes the mean and variance of specific conditional distributions based on the full data set Y_n = (y_1, ..., y_n).

• The smoothed estimates of the state vector α_t and its variance matrix are denoted

  α̂_t = a_{t|n} = E[α_t|Y_n]
  P_{t|n} = var(α_t|Y_n)
The smoothed estimate α̂_t is the optimal estimate of α_t using all available information Y_n.

• The smoothed estimate of the response y_t and its variance are computed using

  ŷ_t = c_t + Z_t α̂_t
  var(ŷ_t|Y_n) = Z_t var(α_t|Y_n) Z_t'

• The smoothed disturbance estimates are the estimates of ε_t and η_t based on all available information Y_n, and are denoted

  ε̂_t = ε_{t|n} = E[ε_t|Y_n]
  η̂_t = η_{t|n} = E[η_t|Y_n]
7.5 Prediction Error Decomposition of Log-Likelihood
The prediction error decomposition of the log-likelihood function for the unknown parameters ϕ of a state space model is

  ln L(ϕ|Y_n) = Σ_{t=1}^n ln f(y_t|Y_{t−1}; ϕ)
              = −(nN/2) ln(2π) − (1/2) Σ_{t=1}^n ( ln|F_t| + v_t'F_t⁻¹v_t )

where f(y_t|Y_{t−1}; ϕ) is the conditional Gaussian density implied by the state space model, and the prediction errors v_t and prediction error variance matrices F_t are computed from the Kalman filter recursions.
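A sketch of the Kalman filter for the time varying parameter regression of Section 7.3, returning the filtered states and the prediction error decomposition of the log-likelihood. This assumes numpy, approximates the diffuse prior by κ = 10⁷ rather than handling it exactly, and uses an interface of my own rather than SsfPack's.

```python
import numpy as np

def kalman_tvp(y, x, sig_xi, sig_ss, sig_nu):
    """Filter the random-walk-coefficient regression y_t = b0_t + b1_t*x_t + nu_t."""
    n = len(y)
    a = np.zeros(2)                   # a_t = E[alpha_t | Y_{t-1}]
    P = 1e7 * np.eye(2)               # P = kappa * I for the diffuse states
    Q = np.diag([sig_xi**2, sig_ss**2])
    filtered = np.zeros((n, 2))
    loglik = 0.0
    for t in range(n):
        Z = np.array([1.0, x[t]])
        v = y[t] - Z @ a              # prediction error v_t
        F = Z @ P @ Z + sig_nu**2     # prediction error variance F_t
        K = P @ Z / F
        a = a + K * v                 # update: a_{t|t}
        P = P - np.outer(K, Z @ P)    # update: P_{t|t}
        filtered[t] = a
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(F) + v**2 / F)
        P = P + Q                     # predict: alpha_{t+1} = alpha_t + eta_t
    return filtered, loglik

# Constant-coefficient data: the filtered states should settle near (0.5, 2)
rng = np.random.default_rng(9)
x = rng.standard_normal(200)
y = 0.5 + 2.0 * x + 0.5 * rng.standard_normal(200)
states, ll = kalman_tvp(y, x, 1e-4, 1e-4, 0.5)
```

Maximizing `ll` over the three standard deviations gives the maximum likelihood estimates of the TVP model's variance parameters.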
7.6 Example: Testing the forward rate unbiasedness hypothesis