Conditional Threshold Autoregression (CoTAR)
Kaiji Motegi∗ John W. Dennis† Shigeyuki Hamori‡
Kobe University IDA Kobe University
November 10, 2021
Abstract
We propose a new time series model where the threshold is specified as an empirical quantile of recent observations of a threshold variable. The resulting conditional threshold traces the fluctuation of the threshold variable, which can enhance the fit and interpretation of the model. In the proposed conditional threshold autoregressive (CoTAR) model, the existence of threshold effects can be tested by wild-bootstrap tests which incorporate all possible values of the nuisance parameters. The estimation and hypothesis testing of the CoTAR model satisfy desirable statistical properties in both large and small samples. We fit the CoTAR model to new confirmed COVID-19 cases in the U.S. and Japan. Significant conditional threshold effects are detected for both countries, and the implied persistence structures are consistent with the fact that the number of new confirmed cases in the U.S. is larger than in Japan.
JEL codes: C22, C24, C51.
Keywords: COVID-19, Nonlinear time series analysis, Profiling estimation, Regime
switch, Self-exciting threshold autoregression (SETAR), Wild bootstrap.
∗Corresponding author. Graduate School of Economics, Kobe University. Address: 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected]
†Institute for Defense Analyses (IDA). Research results and conclusions expressed are those of the authors and do not necessarily reflect the views of IDA. E-mail: [email protected]
‡Graduate School of Economics, Kobe University. E-mail: [email protected]
1 Introduction
It is well known that a time series often has heterogeneous properties below versus
above a certain threshold. Such nonlinearities are referred to as threshold effects, and
there is a vast literature on modeling and testing them. One of the most well-known
models in this field is the threshold autoregression (TAR) proposed by Tong (1978).
In the TAR model, a target series y follows AR(p) processes with coefficients being
different across regimes, and a regime switch is triggered when a threshold variable
x crosses a constant threshold parameter µ.1 Hansen (2000) proposed the threshold
regression (TR) model to unify various forms of threshold effects. The TR model is
designed for both cross-section and time series data, and includes TAR as a special
case. The threshold parameter of TR is specified as a constant as in TAR.
This paper proposes an alternative model where the threshold is specified as an
empirical quantile of recent observations of x. More precisely, the regime at time t is
determined by whether xt−d < µt−d−1(c) or xt−d ≥ µt−d−1(c), where the conditional
threshold µt(c) is the 100c-percentile of {xt, xt−1, . . . , xt−m+1}. The memory size m is
chosen by the researcher, while the delay parameter d and the percentile parameter c
can be estimated from data. The conditional threshold represents a “normal” level of
recent x if c is around 0.5 and an “abnormal” level if c is far from 0.5. The conditional
threshold traces the fluctuation of x, which can enhance the fit and interpretation of
the model.
Time-varying or state-dependent threshold models can be categorized into five groups, depending on the specifications of thresholds. First, Bessec (2003) generalizes the self-exciting TAR model of Balke and Fomby (1997) by replacing constant thresholds with time-dependent but deterministic thresholds. Second, the threshold may be specified as a linear combination of observed covariates; the TR model is extended in this direction by Seo and Linton (2007) and Yu and Fan (2021); the smooth transition autoregressive (STAR) model (e.g., Granger and Terasvirta, 1993) is extended by Dueker, Psaradakis, Sola, and Spagnolo (2013); the regression kink (RK) model of Hansen (2017) is extended by Yang and Su (2018). Third, the threshold may be
assumed to follow a latent AR process; STAR is extended in this direction by Dueker,
Owyang, and Sola (2010); TR is extended by Zhu, Chen, and Lin (2019). Fourth,
Yang, Lee, and Chen (2021) generalize TR by applying a Fourier approximation to the
1 See Tong and Lim (1980), Chen and Tsay (1991), and Liu and Susko (1992) for early contributions on the statistical properties of TAR. See Chen, So, and Liu (2011), Hansen (2011), Tong (2011, 2015), and Tsay and Chen (2019) for extensive surveys on TAR.
threshold. Fifth, Motegi, Cai, Hamori, and Xu (2020, MCHX2020) add time-varying
threshold effects to the heterogeneous autoregressive (HAR) model of Corsi (2009),
where the thresholds are sample averages of recent observations of x for each sampling frequency. A novel contribution of MCHX2020 is that the empirical conditional
moment of x is used as a threshold for the first time in the literature.
Of these existing models, the conditional average approach of MCHX2020 is closest to the proposed CoTAR approach. An advantage of our specification is that the threshold can be an "abnormal" level of x by setting c close to 0 or 1, whereas the threshold of MCHX2020 is restricted to be an average level of x. Further, our framework is more general because we do not impose the HAR restriction. We also establish asymptotic properties of the estimation and hypothesis testing of the CoTAR model, while MCHX2020 reported simulation and empirical evidence only.
Statistical inference of CoTAR is analogous to that of TAR; a key issue is the identification of the nuisance parameters γ = (d, c)⊤. The CoTAR model contains regression parameters (β1, β2), where βr signifies the vector of the intercept and AR parameters in regime r ∈ {1, 2}. When β1 = β2, threshold effects are absent and γ is not identified. When β1 ≠ β2, threshold effects are present and γ is identified. We adopt a two-step method called profiling to estimate (β1, β2) and γ. Asymptotic properties of the profiling estimator and the hypothesis testing of regression parameters (e.g., testing an individual zero restriction) depend crucially on whether γ is identified or not.2
Testing the null hypothesis of no threshold effects (i.e., β1 = β2) requires a non-standard testing procedure, since γ is not identified under the null. To establish asymptotically valid tests, we apply Hansen's (1996) wild-bootstrap tests, which incorporate all possible values of γ.3 Using the bootstrap test of the no-threshold-effect hypothesis as a pre-test, we test a general linear parametric restriction as a post-test.
To control the size of the post-test, we adopt the identification category selection
procedure of Andrews and Cheng (2012). Under some regularity conditions, both the
pre-test and the post-test attain asymptotic validity. Further, we show via Monte
Carlo simulations that the proposed tests have sharp size and high power in small
samples.
2 See Chan (1993) and Chan and Tsay (1998) for early contributions on the estimation of TAR.
3 Also see Davies (1977, 1987), Andrews and Ploberger (1994), Stinchcombe and White (1998), Hansen (2000), Gonzalo and Wolf (2005), Andrews and Cheng (2012, 2013), Elliott, Müller, and Watson (2015), McCloskey (2017), and Hill (2021).
We fit the CoTAR model to daily new confirmed COVID-19 cases per million people in the United States (U.S.) and Japan. The threshold variable is chosen to be the
target variable itself, making it a self-exciting CoTAR model. Significant conditional
threshold effects are detected for both countries, and the implied persistence structures explain, from a time series standpoint, why the number of new confirmed
cases per million people in the U.S. is larger than in Japan. On average, the decelera-
tion regime lasts one day shorter in the U.S. than in Japan, while the duration of the
acceleration regime is roughly the same between the two countries. In view of these
empirical findings, a potential policy implication requiring further investigation is that
the U.S. could more efficiently combat the pandemic by more strongly encouraging
safety when the pandemic is decelerating.
The rest of this paper is organized as follows. In Section 2, the CoTAR model is
proposed formally. In Section 3, the procedures of estimation and hypothesis testing
are described. In Section 4, the asymptotic properties of the proposed methods are
derived. In Section 5, we evaluate the finite sample performance of CoTAR via Monte
Carlo simulations. In Section 6, the empirical application on daily new confirmed
COVID-19 cases is presented. Some concluding remarks are provided in Section 7.
Omitted technical details are collected in Appendices. Extra simulation results are
provided in the separate supplemental material.
2 Conditional threshold autoregression
2.1 Motivation and specification
Let yt and xt be a target variable and a threshold variable at time t ∈ {1, . . . , n}, respectively. As a benchmark, consider a threshold autoregressive (TAR) model with two regimes:

yt = α1 + ∑_{k=1}^{p} ϕ1k yt−k + ut if xt−d < µ,
yt = α2 + ∑_{k=1}^{p} ϕ2k yt−k + ut if xt−d ≥ µ, (1)
where (αr, ϕr1, . . . , ϕrp) are regression parameters in regime r ∈ {1, 2}; d is the delay
parameter; µ is the threshold parameter. Usual conditions on the error term ut
include a martingale difference sequence (i.e., E(ut |ut−1, ut−2, . . . ) = 0) and finite
second moment (i.e., E(u2t ) = σ2 < ∞); the assumptions on ut are defined more
precisely in Section 4. When the threshold variable is the target variable itself (i.e.,
xt = yt), (1) is called the self-exciting TAR (SETAR) model.
A key feature of (1) is that y has different autocorrelation structures below versus
above the unconditional threshold µ. The term "unconditional" means that µ is time-independent and chosen from the entire memory X_1^n = {x1, . . . , xn}. We propose an alternative approach of replacing µ with a conditional threshold µt, which is time-dependent and chosen from a local memory X_{t−m+1}^t = {xt−m+1, . . . , xt}. Specifically,
we propose the conditional threshold autoregressive (CoTAR) model:
yt = α1 + ∑_{k=1}^{p} ϕ1k yt−k + ut if xt−d < µt−d−1(c),
yt = α2 + ∑_{k=1}^{p} ϕ2k yt−k + ut if xt−d ≥ µt−d−1(c), (2)
where the conditional threshold µt(c) is the mc-th smallest value (or equivalently the 100c-percentile) of X_{t−m+1}^t; m = #X_{t−m+1}^t is the size of the local memory; and c ∈ {1/m, 2/m, . . . , 1} signifies the relevant percentile, where the possible values of c are restricted to be discrete so that mc ∈ {1, . . . , m}. When xt = yt, we call (2) the self-exciting CoTAR (SE-CoTAR) model.
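To make the conditional threshold concrete, the following minimal Python sketch computes µt(c) as the mc-th smallest value of the local memory; the function name, data, and 0-based indexing are our own illustrative choices, not the paper's notation.

```python
# Illustrative sketch (not from the paper): mu_t(c) is the (m*c)-th
# smallest value of the local memory {x_{t-m+1}, ..., x_t}.

def conditional_threshold(x, t, m, c):
    """Return mu_t(c), the 100c-percentile of the last m observations
    of x ending at 0-based index t; m*c must be an integer in {1,...,m}."""
    window = x[t - m + 1 : t + 1]   # local memory X_{t-m+1}^t
    k = round(m * c)                # mu_t(c) is the k-th smallest value
    assert 1 <= k <= m
    return sorted(window)[k - 1]

# Example: m = 5 and c = 3/5 pick the 3rd smallest value of the window.
x = [2.0, 9.0, 4.0, 7.0, 1.0, 5.0]
print(conditional_threshold(x, t=5, m=5, c=3/5))  # window [9,4,7,1,5] -> prints 5.0
```

Setting c near 1/m or 1 makes the threshold a local extreme rather than a central level of the memory.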
A unique feature of CoTAR lies in the specification of the conditional threshold µt−d−1(c). If m = 1, then c = 1 by construction and hence µt−d−1(c) = xt−d−1. In this special case, (2) reduces to (1) with the threshold variable ∆xt−d = xt−d − xt−d−1 and the known threshold µ = 0. If m is an odd number and c = (2m)⁻¹(m+1), then µt−d−1(c) coincides with the median of X_{t−d−m}^{t−d−1}. In this case, regime 1 arises when x is below the "normal" level given the local memory, while regime 2 arises when x is above it. A similar interpretation holds when m is an even number, although µt−d−1(c) would not be exactly equal to the median of X_{t−d−m}^{t−d−1}.
If the value of c is close to the lower bound 1/m or the upper bound 1, then a
regime switch is triggered by a rare event of x crossing an “abnormal” level given
the local memory. Suppose, for example, that a SE-CoTAR model with m = 14 and
c = 11/14 = 0.786 is fitted to daily changes in new confirmed COVID-19 cases, where
weekday effects are assumed to be smoothed out. Then, the conditional threshold
is the 78.6-percentile of the recent 14-day observations of the changes in new cases,
hence regime 2 represents an extremely serious phase of pandemic (in the relative
sense). The conditional threshold approach is intuitively reasonable, since individuals
seem to evaluate the current status of pandemic relative to the recent past, not to a
constant cut-off value. Similar arguments might well apply to other variables such as
asset price, economic growth, and public debt.
One might be tempted to specify the conditional threshold as a conditional average of x (i.e., µt = m⁻¹ ∑_{ℓ=0}^{m−1} xt−ℓ); indeed, MCHX2020 took this approach in
the HAR framework. This is an intuitively plausible specification, and there is a
computational advantage that the percentile parameter c disappears. A possible limitation, however, is that the threshold cannot represent an "abnormal" level, unlike the conditional quantile specification with c away from 0.5.
A more general specification than the simple average would be a weighted average: µt = ∑_{ℓ=0}^{m−1} wℓ xt−ℓ with wℓ ≥ 0 and ∑_{ℓ=0}^{m−1} wℓ = 1. If the entire weighting scheme w = (w0, . . . , wm−1)⊤ is estimated without any restrictions, then parameter proliferation
is likely to occur. If w is given or tightly parameterized, then there is a higher
risk of misspecification. Further, whether w is estimated or not, an identification
problem arises since two distinct weighting schemes can lead to an identical profile
of regimes for all time periods. Hence, the proposed conditional quantile approach is
better balanced between flexible specification and practical implementation than the
conditional average approaches.
2.2 Matrix representation
To elaborate the statistical properties of CoTAR, it is useful to rewrite (2) in a
matrix form. First, stack the parameters as
β1 = (α1, ϕ11, . . . , ϕ1p)⊤, β2 = (α2, ϕ21, . . . , ϕ2p)⊤, β = (β1⊤, β2⊤)⊤ (K × 1), γ = (d, c)⊤, θ = (β⊤, γ⊤)⊤,
where K = 2(p+1). The target parameter vector θ is partitioned into the regression
parameters β and the nuisance parameters γ. One or both elements of γ could be
pre-specified by the researcher, but this paper estimates both of them in order to avoid
misspecification. To focus on the estimation of θ, we sidestep lag selection issues by
assuming that (p,m) are known.
Second, define binary variables which determine the regime:
I1t(c) = 1 {xt < µt−1(c)} , I2t(c) = 1 {xt ≥ µt−1(c)} , (3)
where 1(A) is the indicator function which equals 1 if event A occurs and 0 otherwise.
Trivially, ∑_{r=1}^{2} Irt(c) = 1 for any c and t. Using (3), stack the regressors as

zt−1 = (1, yt−1, . . . , yt−p)⊤ ((p+1) × 1), Zt−1(γ) = (zt−1⊤ I1,t−d(c), zt−1⊤ I2,t−d(c))⊤ (K × 1). (4)
Then, (2) can be rewritten as a single equation:
yt = Zt−1(γ)⊤β + ut. (5)
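As a concrete illustration of (3)-(5), the sketch below assembles the regime indicators and the stacked regressor vector Zt−1(γ) for one period in the self-exciting case; all names and the toy data are illustrative assumptions of ours.

```python
# Illustrative sketch (not from the paper) of the indicators (3) and
# the stacked regressor vector (4) for a single time index t.

def conditional_threshold(x, t, m, c):
    window = sorted(x[t - m + 1 : t + 1])   # local memory X_{t-m+1}^t
    return window[round(m * c) - 1]         # mc-th smallest value

def stacked_regressors(y, x, t, p, d, c, m):
    """Build Z_{t-1}(gamma) = (z_{t-1} I_{1,t-d}, z_{t-1} I_{2,t-d})."""
    mu = conditional_threshold(x, t - d - 1, m, c)     # mu_{t-d-1}(c)
    i1 = 1.0 if x[t - d] < mu else 0.0                 # I_{1,t-d}(c)
    i2 = 1.0 - i1                                      # indicators sum to 1
    z = [1.0] + [y[t - k] for k in range(1, p + 1)]    # z_{t-1}, a (p+1)-vector
    return [v * i1 for v in z] + [v * i2 for v in z]   # K = 2(p+1) entries

y = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
Z = stacked_regressors(y, y, t=5, p=1, d=1, c=0.5, m=2)  # self-exciting: x = y
print(Z)  # regime 2 is active here -> [0.0, 0.0, 1.0, 0.4]
```

Equation (5) is then the inner product Zt−1(γ)⊤β plus the error ut.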
3 Statistical inference of CoTAR
In this section, we describe the procedures of the estimation and testing of CoTAR.
In Section 3.1, the profiling estimation of β and γ is proposed. In Section 3.2, wild-
bootstrap tests for general linear restrictions of β are described. In Section 3.3, we
propose sequential tests where the no-threshold-effect hypothesis is tested at the pre-test and then other linear restrictions (e.g., the individual zero restriction of β) are
tested at the post-test.
3.1 Profiling estimation
The CoTAR model (2) can be estimated in a similar manner with the TAR model
(1). Let B ⊆ RK be the space of β. The space of the delay parameter d is defined as
D = {d, . . . , d}, where the lower and upper bounds are pre-specified by the researcher.
Taking the memory size m as given, the largest possible space of c is given by C = {1/m, 2/m, . . . , 1}. Let δr(c) = n⁻¹ ∑_{t=1}^{n} Irt(c) be the share of regime r ∈ {1, 2} in the entire sample. For some c ∈ C, δr(c) may be too small to identify both regimes in finite samples. A practical compromise often made in the TAR literature is to restrict the parameter space so that both regimes account for at least 15% of the entire sample:4

C̄ = {c ∈ C | min{δ1(c), δ2(c)} > 0.15}. (6)

The space of γ = (d, c)⊤ is defined as Γ = D × C̄, where × signifies the Cartesian product. The dimension of Γ is finite by construction, which simplifies the derivation of asymptotic properties. Finally, the space of θ is given by Θ = B × Γ.
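The 15% restriction in (6) is easy to operationalize. The sketch below, with invented inputs and names, filters the candidate percentiles; δr(c) is computed from precomputed regime-1 indicators.

```python
# Illustrative sketch of the restriction (6): keep only percentiles c
# for which each regime covers more than 15% of the sample.

def admissible_c(i1_by_c, cutoff=0.15):
    """i1_by_c maps each candidate c to its list of I_{1t}(c) values;
    return the restricted space C-bar."""
    keep = []
    for c, i1 in i1_by_c.items():
        d1 = sum(i1) / len(i1)   # delta_1(c): share of regime 1
        d2 = 1.0 - d1            # shares sum to 1
        if min(d1, d2) > cutoff:
            keep.append(c)
    return keep

# Toy example: c = 0.25 puts 10% of observations in regime 1 (dropped),
# while c = 0.5 puts 40% in regime 1 (kept).
i1_by_c = {0.25: [1] * 1 + [0] * 9, 0.5: [1] * 4 + [0] * 6}
print(admissible_c(i1_by_c))  # -> [0.5]
```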
Define the quadratic loss function L(θ) = ∑_{t=1}^{n} {yt − Eθ(yt | yt−1, . . . , y1)}², where
4 The practice of using the cut-off value 15% originates in a suggestion by Andrews (1993).
Eθ(yt | yt−1, . . . , y1) is the conditional expectation of yt given θ and the past observations {yt−1, . . . , y1}. The least squares (LS) estimator for θ is defined as the minimizer of the quadratic loss function:

θ̂ = argmin_{θ∈Θ} L(θ). (7)
The LS estimator θ̂ can be computed via a two-step approach called profiling. Fixing γ ∈ Γ, the regressors Zt−1(γ) in (5) can be computed from data; hence the LS estimator for β conditional on γ can easily be computed as

β̂(γ) = {∑_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}⁻¹ {∑_{t=1}^{n} Zt−1(γ)yt}. (8)
The resulting residual is given by

ût(γ) = yt − Zt−1(γ)⊤β̂(γ). (9)
Since Γ is a finite parameter space by construction, the optimization problem (7)
is equivalent to a two-step optimization problem which minimizes the conditional
quadratic loss function given γ and then chooses an optimal γ that delivers the
smallest conditional loss function. Specifically, the LS estimator for γ coincides with
γ̂ = argmin_{γ∈Γ} ∑_{t=1}^{n} ût(γ)². (10)

Substituting (10) into (8) yields β̂ = β̂(γ̂) and hence the LS estimator θ̂ = (β̂⊤, γ̂⊤)⊤.
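A minimal end-to-end sketch of the profiling steps (8)-(10), specialized to intercept-only regimes (p = 0) so that β̂(γ) reduces to the two regime means; the grid, memory size, data, and indexing conventions are invented for illustration.

```python
# Illustrative profiling sketch (not the paper's implementation):
# for each gamma = (d, c) on a finite grid, fit conditional LS and
# keep the gamma with the smallest sum of squared residuals.

def profile_fit(y, x, grid, m):
    """Return (beta1, beta2, gamma) minimizing the SSR over the grid."""
    best = None
    for d, c in grid:
        r1, r2 = [], []
        for t in range(m + d, len(y)):
            window = sorted(x[t - d - m : t - d])      # memory ending at t-d-1
            mu = window[round(m * c) - 1]              # mu_{t-d-1}(c)
            (r1 if x[t - d] < mu else r2).append(y[t])
        if not r1 or not r2:
            continue                                   # one regime is empty: skip
        b1, b2 = sum(r1) / len(r1), sum(r2) / len(r2)  # conditional LS, eq. (8)
        ssr = sum((v - b1) ** 2 for v in r1) + sum((v - b2) ** 2 for v in r2)
        if best is None or ssr < best[0]:
            best = (ssr, b1, b2, (d, c))               # profile over gamma, eq. (10)
    return best[1:]

y = [1.0, 5.0, 1.2, 5.1, 0.9, 5.2, 1.1, 5.0, 1.0, 5.1]
b1, b2, gamma = profile_fit(y, y, grid=[(1, 0.5), (1, 1.0)], m=2)
print(round(b1, 2), round(b2, 2), gamma)  # -> 4.28 1.0 (1, 1.0)
```

With p > 0 the inner step becomes a regime-wise OLS regression, but the outer grid search over γ is unchanged.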
3.2 Bootstrap tests for linear restrictions
Consider testing linear restrictions with respect to the regression parameters β in (5):
H0 : Rβ = q, H1 : Rβ ≠ q, (11)
where R is an h×K pre-specified selection matrix of full row rank and q is an h× 1
pre-specified vector; h is the number of restrictions. An important special case of (11)
is the no-threshold-effect hypothesis:

H∗0 : β1 = β2, H∗1 : β1 ≠ β2. (12)
Clearly, H∗0 is a special case of H0 with the following choice of (R, q):

R = (Ip+1, −Ip+1) ≡ R∗ ((p+1) × K), q = 0_{(p+1)×1} ≡ q∗, (13)

where Ip+1 is the identity matrix of dimension p + 1. Under H∗0, (5) reduces to the single-regime AR(p), which does not depend on the threshold variable x. Hence, γ = (d, c)⊤ is not identified under H∗0. In fact, H∗0 is the only case where γ is not identified; under H∗1, γ is always identifiable.
In this section, we describe the procedure of wild-bootstrap tests for the general linear restrictions H0, which may or may not coincide with the no-threshold-effect hypothesis H∗0. (In general, the researcher does not know whether H∗0 is true or false, and the asymptotic validity of the bootstrap tests for H0 depends crucially on the truth of H∗0. Sequential tests which address this dilemma are proposed in Section 3.3.) To proceed, it is useful to define some key quantities conditional on γ ∈ Γ. The regression score associated with (5) is given by

st(γ) = Zt−1(γ)ut. (14)
Let ũt be the LS residual from (5) with H0 imposed; the estimated regression score under H0 is then

s̃t(γ) = Zt−1(γ)ũt. (15)

The estimated regression score under H1 is

ŝt(γ) = Zt−1(γ)ût(γ), (16)

where ût(γ) is defined in (9).
The Wald test statistic with respect to (11) is given by

Wn(γ) = n{Rβ̂(γ) − q}⊤{RV̂n(γ)R⊤}⁻¹{Rβ̂(γ) − q}, (17)

where β̂(γ) is defined in (8). The heteroscedasticity-robust covariance matrix estimator is given by

V̂n(γ) = M̂n(γ)⁻¹ Ŝn(γ) M̂n(γ)⁻¹, (18)
where

Ŝn(γ) = n⁻¹ ∑_{t=1}^{n} ŝt(γ)ŝt(γ)⊤, M̂n(γ) = n⁻¹ ∑_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤. (19)
Similarly, the Lagrange multiplier (LM) test statistic is given by

LMn(γ) = n{Rβ̂(γ) − q}⊤{RṼn(γ)R⊤}⁻¹{Rβ̂(γ) − q}, (20)

where

Ṽn(γ) = M̂n(γ)⁻¹ S̃n(γ) M̂n(γ)⁻¹, S̃n(γ) = n⁻¹ ∑_{t=1}^{n} s̃t(γ)s̃t(γ)⊤. (21)
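For intuition about (17)-(19), consider the simplest case p = 0, where Zt−1(γ) = (I1t, I2t)⊤ makes M̂n and Ŝn diagonal and the no-threshold-effect Wald statistic collapses to a scalar. The sketch below codes this special case; it is an illustrative specialization under our own naming, not a general implementation.

```python
# Illustrative sketch: heteroscedasticity-robust Wald statistic for
# H0: beta1 = beta2 in the intercept-only case, where R = (1, -1) and
# W_n = n * (b1 - b2)^2 / ([V_n]_11 + [V_n]_22).

def wald_no_threshold(y, i1):
    """y: observations; i1: regime-1 indicators (1 or 0), one per observation."""
    n = len(y)
    s1 = [v for v, i in zip(y, i1) if i]        # regime-1 observations
    s2 = [v for v, i in zip(y, i1) if not i]    # regime-2 observations
    b1, b2 = sum(s1) / len(s1), sum(s2) / len(s2)
    m1, m2 = len(s1) / n, len(s2) / n           # diagonal of M_n
    v1 = sum((v - b1) ** 2 for v in s1) / n / m1 ** 2   # [V_n]_11 = S_11 / m1^2
    v2 = sum((v - b2) ** 2 for v in s2) / n / m2 ** 2   # [V_n]_22 = S_22 / m2^2
    return n * (b1 - b2) ** 2 / (v1 + v2)

y  = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]
i1 = [1, 1, 1, 0, 0, 0]
print(wald_no_threshold(y, i1))  # large value (about 360): regimes differ sharply
```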
All possible values of γ in Wn(γ) can be incorporated in at least three common ways:

supWn ≡ sup_{γ∈Γ} Wn(γ) = max_{γ∈Γ} Wn(γ), (22)

aveWn ≡ (#Γ)⁻¹ ∑_{γ∈Γ} Wn(γ), (23)

expWn ≡ ln[(#Γ)⁻¹ ∑_{γ∈Γ} exp{Wn(γ)/2}]. (24)
The asymptotic distributions of the test statistics (22)-(24) are non-standard in general; hence, we adopt the wild bootstrap of Hansen (1996). Let g(Wn) denote supWn, aveWn, or expWn; then proceed as follows.
Step 1 For each b ∈ {1, . . . , B}, generate ξt(b) i.i.d. ∼ N(0, 1) for t ∈ {1, . . . , n}.
Step 2 Compute a bootstrap test statistic g{Wn(b)}, where

Wn(b)(γ) = vn(b)(γ)⊤ M̂n(γ)⁻¹ R⊤{RV̂n(γ)R⊤}⁻¹ R M̂n(γ)⁻¹ vn(b)(γ); (25)

vn(b)(γ) = (1/√n) ∑_{t=1}^{n} ŝt(γ)ξt(b); (26)

R, ŝt(γ), V̂n(γ), and M̂n(γ) are defined in (11), (16), (18), and (19), respectively.
Step 3 Repeat Steps 1-2 independently, resulting in g{Wn(1)}, . . . , g{Wn(B)}.
Step 4 Compute the bootstrap p-value:

pBn(H0) = B⁻¹ ∑_{b=1}^{B} 1[g{Wn(b)} ≥ g(Wn)]. (27)

Reject H0 at the 100a% level if pBn(H0) < a, where a ∈ (0, 1) is the nominal size.
The testing procedure is analogous when the Wald test is replaced with the LM test. Use (15) and (21) to compute s̃t(γ) and Ṽn(γ), respectively. The transformed LM test statistics are obtained via (22)-(24), with Wn(γ) replaced by LMn(γ) from (20). Steps 1-4 are executed with (25) and (26) replaced by

LMn(b)(γ) = ṽn(b)(γ)⊤ M̂n(γ)⁻¹ R⊤{RṼn(γ)R⊤}⁻¹ R M̂n(γ)⁻¹ ṽn(b)(γ),

ṽn(b)(γ) = (1/√n) ∑_{t=1}^{n} s̃t(γ)ξt(b).
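Steps 1-4 can be condensed into a generic routine. In this sketch the statistic computed from one draw of multipliers is abstracted as a callable, so the same loop serves the Wald and LM variants; `random.gauss` supplies the N(0, 1) draws, and the toy usage at the end is an invented example, not the paper's application.

```python
import random

def bootstrap_pvalue(observed, stat, n, B, seed=0):
    """Wild-bootstrap p-value (27): the share of bootstrap draws whose
    statistic g{W^(b)} weakly exceeds the observed g(W_n)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(B):                                  # Steps 1-3
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]    # multipliers xi_t^(b)
        if stat(xi) >= observed:
            exceed += 1
    return exceed / B                                   # Step 4

# Toy usage: with constant scores, stat(xi) is a squared scaled sum of
# N(0,1) multipliers, so the p-value for observed = 1.0 is moderate.
scores = [1.0] * 20
stat = lambda xi: (sum(s * e for s, e in zip(scores, xi)) / len(scores) ** 0.5) ** 2
p = bootstrap_pvalue(observed=1.0, stat=stat, n=20, B=500)
print(0.0 <= p <= 1.0)  # prints True
```

In the actual procedure, `stat` would build vn(b)(γ) from the estimated scores for every γ ∈ Γ and return the chosen transformation of the quadratic form (25).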
3.3 Sequential tests for linear restrictions
In this section, consider testing general linear restrictions H0 which are distinct from the no-threshold-effect hypothesis H∗0. In this case, we have that (R, q) ≠ (R∗, q∗), where (R, q) are defined in (11) and (R∗, q∗) are defined in (13).
If we knew that H∗0 were true, then H0 could be tested via the bootstrap test as described in Section 3.2. If we knew that H∗0 were false, then H0 could be tested via the asymptotic χ² test. Taking the Wald test as an example, substitute (10) into (17) to get the test statistic:

Wn ≡ Wn(γ̂) = n(Rβ̂ − q)⊤(RV̂nR⊤)⁻¹(Rβ̂ − q), (28)

where V̂n = V̂n(γ̂). Reject H0 at the 100a% level if Wn exceeds the upper 100a% point of the chi-squared distribution χ²_h, where the degrees of freedom coincide with the number of restrictions, h.
In practice, whether H∗0 is true or false is unknown to the researcher. Since the
asymptotic distribution of β depends on whether γ is identified or not, asymptotically
valid tests of H0 must take into account both possibilities of identified and unidentified γ. Andrews and Cheng (2012) propose Identification Category Selection (ICS)
procedures in which a pre-test is performed to determine the identification category
of nuisance parameters (see also Andrews and Cheng, 2013, Hill, 2021). Adopting
the ICS procedure, we first perform the bootstrap test for H∗0 , and then test H0 in a
way that depends on the result of the pre-test. The following outlines this sequential
procedure.
Pre-test for the no-threshold-effect hypothesis H∗0. In view of (13) and (17), the conditional Wald test statistic corresponding to H∗0 is given by

W∗n(γ) = n{R∗β̂(γ) − q∗}⊤{R∗V̂n(γ)(R∗)⊤}⁻¹{R∗β̂(γ) − q∗}. (29)
Incorporate all values of γ using (22), (23), or (24) to calculate supW∗n, aveW∗n, or expW∗n, respectively. Recall that the asymptotic distributions of these test statistics are non-standard under H∗0; hence, we perform the bootstrap test outlined in Section 3.2, replacing R in (25) with R∗ defined in (13), and replacing Wn in (27) with W∗n defined in (29). Reject H∗0 at the 100a1% level if pBn(H∗0) < a1, where a1 ∈ (0, 1) is the nominal size of the pre-test.
Post-test for linear restrictions H0. Perform the post-test for H0 as follows.

Case 1 If H∗0 is not rejected at the 100a1% level by the pre-test, assume that γ is not identified and perform the bootstrap test for H0 as described in Section 3.2. Following (27), denote the p-value associated with this test as pBn(H0). Reject H0 at the 100a2% level if pBn(H0) < a2, where a2 ∈ (0, 1) is the nominal size of the post-test.

Case 2 If H∗0 is rejected at the 100a1% level by the pre-test, do the following:

1. Compute pBn(H0) as in Case 1.

2. Perform the asymptotic χ² test for H0, assuming that γ0 is identified; see (28) for the Wald test statistic. Denote the p-value associated with this test as pn,χ²(H0).

3. Compute the least favorable p-value: pn,lf(H0) = max{pBn(H0), pn,χ²(H0)}. Reject H0 at the 100a2% level if pn,lf(H0) < a2.
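The decision logic of the pre-test and post-test fits in a few lines. The sketch below takes the three p-values as inputs (they would come from the bootstrap and χ² tests described above); the function name and default levels are illustrative assumptions.

```python
# Illustrative sketch of the sequential (ICS-style) decision rule:
# pre-test H0* (no threshold effects), then post-test H0 with either
# the bootstrap p-value alone (Case 1) or the least favorable p-value
# combining bootstrap and chi-squared tests (Case 2).

def sequential_test(p_pre, p_boot, p_chi2, a1=0.05, a2=0.05):
    """Return (H0* rejected?, H0 rejected?)."""
    if p_pre >= a1:                      # Case 1: H0* not rejected
        return False, p_boot < a2        # bootstrap post-test only
    p_lf = max(p_boot, p_chi2)           # Case 2: least favorable p-value
    return True, p_lf < a2

# Case 2: H0 is rejected only if BOTH the bootstrap and chi2 tests reject.
print(sequential_test(p_pre=0.01, p_boot=0.02, p_chi2=0.30))   # -> (True, False)
print(sequential_test(p_pre=0.01, p_boot=0.02, p_chi2=0.001))  # -> (True, True)
```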
An intuition behind this algorithm is as follows. Under some regularity conditions,
the pre-test attains asymptotically correct size and power approaching 1 against any
deviation from H∗0 ; see Sections 4.2 and 4.3 for a formal proof. Given these asymptotic
properties, a non-rejection of H∗0 unambiguously motivates the use of the bootstrap
test for H0 (Case 1).
A rejection of H∗0, by contrast, leaves two possibilities (Case 2). The first possibility is that H∗0 is true but rejected; this is the type-I error of the pre-test, which
is guaranteed to occur with probability approaching a1. In this case, the use of the
bootstrap test for H0 is motivated. The second possibility is that H∗0 is false and
rejected, a correct decision which is guaranteed to occur with probability approaching
1. In this case, the use of the χ2 test for H0 is motivated. Since the researcher does
not know which of the two possibilities is true, the least favorable approach which
combines both the bootstrap test and the χ2 test is adopted.
4 Asymptotic theory
In this section, we derive key asymptotic properties of the estimation and hypothesis
testing of CoTAR. First, define xt(γ) = xt(d, c) = xt−d − µt−d−1(c) so that regime 1
arises at time t if xt(γ) < 0 and regime 2 arises if xt(γ) ≥ 0. A virtue of the CoTAR
specification is that xt(γ) is a measurable function of {xt−d−m, . . . , xt−d} and hence
the existing asymptotic theory of TAR can be applied almost directly (e.g., Chan,
1993, Hansen, 1996).
To proceed, some notation is introduced. Let →p denote convergence in probability, →d denote convergence in distribution, and ⇒ denote weak convergence. Let zt = max_{γ∈Γ}[tr{Zt(γ)⊤Zt(γ)}]^{1/2} be the norm of the regressor matrix given in (4). Define some population quantities:

V(γ1, γ2) = M(γ1)⁻¹ S(γ1, γ2) M(γ2)⁻¹, (30)

S(γ1, γ2) = E{st(γ1)st(γ2)⊤}, M(γ) = E{Zt(γ)Zt(γ)⊤}, (31)

where st(γ) = Zt−1(γ)ut as defined in (14). We will sometimes abbreviate V(γ) = V(γ, γ) and S(γ) = S(γ, γ) when appropriate; they are the population counterparts of V̂n(γ) and Ŝn(γ) defined in (18) and (19), respectively. Impose some basic assumptions, analogous to Assumption 1 of Hansen (1996).
Assumption 1. (i) {yt, xt, ut} are strictly stationary and absolutely regular with mixing coefficients η(k) = O(ν^{−k}) for some ν > 1. (ii) For some ι > ν, E|zt|^{4ι} < ∞ and E|ut|^{4ι} < ∞. (iii) inf_{γ∈Γ} det{M(γ)} > 0. (iv) xt(γ) has density function f(x) such that sup_x f(x) < ∞.
Assumption 1 is standard in TAR literature. The mixing condition in item (i) controls
the degree of serial dependence. The rate of the mixing condition is set to be faster
than in Hansen (1996) so that the conditions of Chan (1993) are satisfied. Chan (1993)
shows that item (i) implies a strong form of geometric ergodicity for TAR processes;
the same implication holds for CoTAR processes.
Sufficient conditions for Assumption 1 include iid {ut} with finite 4ι-th moment for some ι > 1 and the regime-specific stability condition (i.e., the roots of the characteristic equation λ^p − ∑_{k=1}^{p} λ^{p−k}ϕrk = 0 lie strictly inside the unit circle for each r ∈ {1, 2}). Such a regime-specific stability condition is generally stronger than needed to ensure the ergodicity of TAR processes (Chan and Tong, 1985, Chen and Tsay, 1991). Further, Hansen (1996) notes that a martingale difference condition is likely sufficient in place of the iid condition. These observations apply to CoTAR as well.
4.1 Profiling estimation
Let β0 and γ0 be the true values of β and γ, respectively. Assume that β0 is an interior
point of B and that γ0 ∈ Γ, where B and Γ are the parameter spaces constructed in
Section 3.1. Under H∗0, β̂(γ) is consistent for β0 and asymptotically normal for any fixed γ ∈ Γ. Further, under H∗0, γ0 is not identified and γ̂ is not consistent for γ0; consequently, β̂ = β̂(γ̂) is consistent for β0 but not asymptotically normal. Under H∗1, γ0 is identified and γ̂ is super-consistent for γ0; consequently, β̂ is consistent for β0 and asymptotically normal. These results are summarized in Theorem 1.

Theorem 1. If Assumption 1 holds, then the following are true: i) Under H∗0, √n{β̂(γ) − β0} ⇒ G(γ), where G(γ) is a mean zero Gaussian process with covariance kernel V(γ1, γ2) defined in (30). ii) Under H∗0, β̂(γ) →p β0 uniformly over γ ∈ Γ. iii) Under H∗1, γ̂ − γ0 = Op(n⁻¹) and √n(β̂ − β0) →d N{0, V(γ0)}.
The proof of Theorem 1 is provided in Appendix A.1. Our Assumption 1 implies
the assumptions of Chan (1993), hence the theorems of Chan (1993) can be used to
establish Theorem 1.(iii).
Different assumptions or estimators can result in different asymptotic distributions for γ̂ under H∗1 (e.g., Chan, 1993, Hansen, 2000, Yang, Lee, and Chen, 2021). Instead of deriving the asymptotic distribution of γ̂, we focus on proving the asymptotic validity of the proposed tests for linear parametric restrictions; see Sections 4.2
and 4.3.
4.2 Bootstrap tests for linear restrictions
Consider the wild-bootstrap test for the linear-restriction hypothesis H0 described in Section 3.2, where H0 may or may not be distinct from H∗0. When γ0 is not identified, a key condition for the asymptotic validity of this test is the uniform convergence of {β̂(γ)}_{γ∈Γ} to β0, which has been established in Theorem 1.(ii). While we can establish the asymptotic validity of the wild-bootstrap test when H∗0 is true, it is generally not true that this test for H0 is asymptotically valid when H∗0 is false. Theorem 2 summarizes these results for the Wald and LM tests with any of the three transformations (22)-(24).

Theorem 2. If Assumption 1 holds, then the following are true: i) If H∗0 and H0 are both true, the bootstrap p-value pBn(H0) defined in (27) is asymptotically uniform on [0, 1]. ii) Under H1, pBn(H0) →p 0 as n → ∞ and B → ∞, whether H∗0 is true or false.
The proof of Theorem 2 is provided in Appendix A.2. An intuitive reason why Theorem 2 holds is as follows. Under H∗0, the conditional Wald test statistic Wn(γ) in (17) converges weakly to a chi-squared process over γ ∈ Γ; this is a direct implication of the uniform asymptotic normality of β̂(γ) established in Theorem 1.(i). In general, the asymptotic distributions of the test statistics supWn, aveWn, and expWn under H0 and H∗0 must be computed by simulation. Let vn(γ) = n^{−1/2} ∑_{t=1}^{n} ŝt(γ)ξt, with ξt an iid standard normal random variable; then vn(γ) converges weakly in probability to a mean zero Gaussian process with covariance kernel S(γ1, γ2) in the sense of Giné and Zinn (1990), where S(γ1, γ2) appears in (31). This implies convergence in probability of pBn(H0) to the true p-value under H0 when H∗0 is true.
4.3 Sequential tests for linear restrictions
In this section, consider testing H0 which is distinct from H∗0. In this case, we have that (R, q) ≠ (R∗, q∗), where (R, q) are defined in (11) and (R∗, q∗) are defined in (13). We establish the asymptotic validity of the sequential test proposed in Section 3.3.
The asymptotic validity of the pre-test for H∗0 follows as a special case of Theorem 2 by setting H0 to be identical to H∗0.

Corollary 3. If Assumption 1 holds, then the following are true: i) Under H∗0, the bootstrap p-value pBn(H∗0) defined in (27) is asymptotically uniform on [0, 1]. ii) Under H∗1, pBn(H∗0) →p 0 as n → ∞ and B → ∞.
When H∗0 is not rejected by the pre-test, our post-test uses only the bootstrap test (Case 1). This testing strategy is justified by two facts. First, Corollary 3.(ii) guarantees that the pre-test rejects H∗0 with probability approaching 1 when H∗0 is false; hence, the non-rejection of H∗0 during the pre-test provides overwhelming evidence for H∗0. Second, Theorem 2 guarantees that the post-test for H0 has the desired asymptotic properties when H∗0 is true.
Next, we elaborate on Case 2, where H∗0 is rejected in the pre-test. This case leaves two conflicting possibilities: (1) a correct rejection of H∗0 or (2) an incorrect rejection of H∗0 (i.e., the type-I error of the pre-test). We begin with the first possibility. When H∗1 is true, the bootstrap test for H0 is not correctly sized in general. However, the asymptotic normality of β̂ under H∗1 established in Theorem 1.(iii) can be used to our advantage; this motivates incorporating the asymptotic χ² test into the post-test when we have rejected H∗0. The asymptotic properties of the χ² test are summarized in Lemma 4, where pn,χ²(H0) is the associated p-value.
Lemma 4. If Assumption 1 holds, then the following are true: i) If both H1* and H0 are true, then p_{n,χ2}(H0) is asymptotically uniform on [0, 1]. ii) If H1 is true, then p_{n,χ2}(H0) →p 0 whether H0* is true or false.

The proof of Lemma 4 is provided in Appendix A.3. Lemma 4.(i) and Lemma 4.(ii) with H0* being false are almost direct implications of Theorem 1.(iii). Lemma 4.(ii) with H0* being true follows from Theorem 1.(i) and (ii).
Now consider the second possibility: a type-I error of the pre-test. In general, the asymptotic χ2 test is not correctly sized when H0* is true. Theorem 2, however, guarantees the asymptotic validity of the bootstrap test for H0 when H0* is true. The combination of Theorem 2.(i) and Lemma 4.(i) motivates the least favorable test, which bounds the rejection frequencies of the bootstrap and χ2 tests.
Theorem 5 establishes that the sequential testing procedure bounds the asymptotic type-I error rate at the nominal level under H0 and maintains consistency against the alternative H1. Let a1 denote the significance level of the pre-test and a2 denote the significance level of the post-test. Let p_n(H0) be the p-value associated with the post-test in the sequential procedure, and recall the decision rule that H0 is rejected when p_n(H0) < a2. We suppress the dependence of p_n(H0) on B, the number of bootstrap iterations, for convenience.

Theorem 5. If Assumption 1 holds, then the following are true: i) Under H0, lim_{B→∞} limsup_{n→∞} Pr{p_n(H0) < a2} ≤ a2. ii) Under H1, p_n(H0) →p 0 as n → ∞ and B → ∞.
Theorem 5 is proven in Appendix A.4. Note that Theorem 5.(i) does not guarantee that the nominal size a2 is exactly attained by the sequential test for H0; instead, it guarantees that a2 is not exceeded asymptotically.
The intuition behind Theorem 5 is as follows. Consider each scenario in the pre-test:

a) Correct failure to reject H0*: γ0 is not identified, and the bootstrap test, which assumes non-identification of γ0, is solely used in the post-test.

b) Incorrect rejection of H0* (type-I error): γ0 is not identified, and the least favorable test is used.

c) Correct rejection of H0*: γ0 is identified, and the least favorable test is used.

d) Incorrect failure to reject H0* (type-II error): γ0 is identified, but the bootstrap test is solely used.

Corollary 3 indicates that, asymptotically, (a) occurs with probability 1 − a1, (b) occurs with probability a1, (c) occurs with probability 1, and (d) occurs with probability 0. By Theorem 2, the post-test for H0 in scenario (a) is asymptotically valid. By Theorem 2 and Lemma 4, the post-test in scenarios (b) and (c) is asymptotically valid. Thus, the proposed sequential test is asymptotically valid whether H0* is true or false.
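The pre-test/post-test logic above can be condensed into a small decision function. This is an illustrative sketch, not the authors' code; `pretest_pvalue`, `boot_pvalue`, and `chi2_pvalue` are hypothetical inputs standing for the bootstrap pre-test p-value and the two candidate post-test p-values.

```python
def sequential_pvalue(pretest_pvalue, boot_pvalue, chi2_pvalue, a1=0.05):
    """Post-test p-value of the sequential procedure.

    If the pre-test fails to reject H0* (no threshold effects), only the
    bootstrap test is used.  If the pre-test rejects H0*, the least favorable
    test takes the larger of the bootstrap and asymptotic chi-square p-values,
    which bounds the rejection frequency under either identification scenario.
    """
    if pretest_pvalue >= a1:                  # Case 1: H0* not rejected
        return boot_pvalue
    return max(boot_pvalue, chi2_pvalue)      # Case 2: least favorable test

def reject_h0(pretest_pvalue, boot_pvalue, chi2_pvalue, a1=0.05, a2=0.05):
    """Decision rule: reject H0 when the sequential p-value falls below a2."""
    return sequential_pvalue(pretest_pvalue, boot_pvalue, chi2_pvalue, a1) < a2
```

Taking the maximum of the two p-values in Case 2 is exactly what keeps the asymptotic size at or below a2 regardless of which test is valid for the realized identification scenario.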
5 Monte Carlo simulation

To evaluate the finite-sample performance of the proposed methods, we conduct Monte Carlo simulations. The DGP is SE-CoTAR with p = 1:

yt = α10 + ϕ10 yt−1 + ϵt  if yt−d0 < µt−d0−1(c0),
yt = α20 + ϕ20 yt−1 + ϵt  if yt−d0 ≥ µt−d0−1(c0),    (32)

where α10 = α20 = 0, d0 = 1, c0 = 0.5, and ϵt ~ i.i.d. N(0, 1). The conditional threshold µt(c0) takes the m0 c0-th smallest value (i.e., approximately the median) of {yt, yt−1, . . . , yt−m0+1}. The memory size m0 ∈ {6, 18} is assumed to be known. The AR(1) parameters are set to ϕ10 = 0.2 and ϕ20 ∈ {0.2, 0.8}. When ϕ20 = 0.2, threshold effects are absent and (32) reduces to the one-regime AR(1) process yt = 0.2 yt−1 + ϵt for all t ∈ {1, . . . , n}. When ϕ20 = 0.8, threshold effects are present and (32) does not degenerate.
The sample size is set to n ∈ {100, 500, 1000}, which resembles typical empirical applications in economics and finance. The case with (n, m0) = (100, 6), for example, can be thought of as quarterly data with a sample period of 25 years and a memory size of one and a half years. The case with (n, m0) = (500, 18) can be thought of as business-daily data with a sample period of approximately 2 years and a memory size of slightly less than 1 month.
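The DGP in (32) can be simulated with a short script. This is a sketch under the stated parameterization (zero intercepts, d0 = 1; with m0 = 6 and c0 = 0.5 the threshold is the 3rd-smallest of the last 6 observations); the function and variable names are our own.

```python
import numpy as np

def simulate_se_cotar(n, phi1=0.2, phi2=0.8, d=1, m=6, c=0.5,
                      burn=200, seed=0):
    """Simulate the SE-CoTAR(1) DGP of (32) with zero intercepts.

    The conditional threshold mu_{t-d-1}(c) is the (m*c)-th smallest value
    of the local memory {y_{t-d-1}, ..., y_{t-d-m}}.
    """
    rng = np.random.default_rng(seed)
    k = int(round(m * c))                 # order statistic: m*c-th smallest
    total = n + burn + m + d
    y = np.zeros(total)
    for t in range(m + d, total):
        window = y[t - d - m:t - d]       # local memory of size m
        mu = np.sort(window)[k - 1]       # conditional threshold
        phi = phi1 if y[t - d] < mu else phi2
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y[-n:]                         # drop the burn-in

y = simulate_se_cotar(500)
```

The burn-in discards the influence of the zero initial values, so the retained sample of length n is approximately drawn from the stationary distribution of the process.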
Given this set-up, we inspect the performance of the profiling estimation in Section 5.1, the bootstrap tests for the no-threshold-effect hypothesis in Section 5.2, and the sequential tests for the individual zero restriction of the regression parameters in Section 5.3.
5.1 Profiling estimation

The SE-CoTAR model with p = 1 is specified as

yt = α1 + ϕ1 yt−1 + ut  if yt−d < µt−d−1(c),
yt = α2 + ϕ2 yt−1 + ut  if yt−d ≥ µt−d−1(c).    (33)

The space of the delay parameter d is D = {1, 2}. The space of the percentile parameter c is given by (6) so that each regime accounts for at least 15% of the entire sample. We fit (33) to each of J = 1000 Monte Carlo samples generated from (32), and estimate the regression parameters β = (α1, ϕ1, α2, ϕ2)⊤ and the nuisance parameters γ = (d, c)⊤ via profiling.
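Profiling over the nuisance parameters amounts to a grid search: for each candidate (d, c), split the sample by the regime indicator, run OLS in each regime, and keep the pair with the smallest total sum of squared residuals. The sketch below illustrates this idea for model (33); it is not the authors' implementation, and the function and variable names are our own.

```python
import numpy as np
from itertools import product

def profile_se_cotar(y, delays=(1, 2), m=6, min_share=0.15):
    """Profiling estimator for the SE-CoTAR(1) model (33) via grid search."""
    y = np.asarray(y)
    n = len(y)
    percentiles = [k / m for k in range(1, m + 1)]     # c in {1/m, ..., 1}
    best = None
    for d, c in product(delays, percentiles):
        k = int(round(m * c))
        start = m + d                                   # need a full window
        # conditional threshold mu_{t-d-1}(c) for t = start, ..., n-1
        mu = np.array([np.sort(y[t - d - m:t - d])[k - 1]
                       for t in range(start, n)])
        lower = y[start - d:n - d] < mu                 # regime-1 indicator
        ssr, betas = 0.0, []
        for mask in (lower, ~lower):
            if mask.mean() < min_share:                 # enforce regime share
                ssr = np.inf
                break
            X = np.column_stack([np.ones(mask.sum()),
                                 y[start - 1:n - 1][mask]])
            yy = y[start:][mask]
            b, *_ = np.linalg.lstsq(X, yy, rcond=None)  # per-regime OLS
            ssr += np.sum((yy - X @ b) ** 2)
            betas.append(b)
        if best is None or ssr < best[0]:
            best = (ssr, d, c, betas)
    return best  # (SSR, d_hat, c_hat, [beta_regime1, beta_regime2])
```

Because β enters linearly given (d, c), the inner step is plain OLS, and the grid over (d, c) is small, so the whole profiling search is cheap.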
We report the bias, standard deviation, and root mean squared error (RMSE) for each element of θ = (β⊤, γ⊤)⊤. The results under ϕ20 ∈ {0.2, 0.8} are summarized in Tables 1 and 2, respectively. In Table 1, threshold effects do not exist since ϕ10 = ϕ20 = 0.2. In this case, γ0 is not identified and γ̂ is inconsistent. The simulation results in Table 1 are in line with this fact. Focus on the percentile parameter c with the memory size m0 = 6, for example. For n ∈ {100, 500, 1000}, the bias is {0.077, 0.084, 0.090} and the RMSE is {0.260, 0.259, 0.275}, respectively.
Table 1: Simulation results on the profiling estimation (γ0 is not identified)
m0 = 6
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.018 0.381 0.382 −0.003 0.142 0.142 −0.003 0.110 0.110
ϕ1 −0.036 0.305 0.308 −0.008 0.121 0.121 −0.007 0.088 0.088
α2 0.017 0.337 0.338 0.005 0.149 0.149 0.000 0.108 0.108
ϕ2 −0.033 0.286 0.288 −0.011 0.123 0.123 0.000 0.089 0.089
d 0.595 0.491 0.771 0.631 0.483 0.794 0.621 0.485 0.788
c 0.077 0.248 0.260 0.084 0.245 0.259 0.090 0.260 0.275
m0 = 18
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.043 0.503 0.505 −0.016 0.194 0.195 −0.006 0.145 0.145
ϕ1 −0.056 0.366 0.370 −0.017 0.149 0.150 −0.008 0.108 0.108
α2 0.070 0.506 0.510 0.001 0.190 0.190 0.003 0.138 0.138
ϕ2 −0.057 0.386 0.390 −0.006 0.148 0.148 −0.005 0.103 0.103
d 0.619 0.486 0.787 0.638 0.481 0.799 0.625 0.484 0.791
c 0.022 0.243 0.244 0.028 0.252 0.253 0.021 0.253 0.254
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = ϕ20 = 0.2, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Since ϕ10 = ϕ20, γ0 is not identified. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimator for (α1, ϕ1, α2, ϕ2, d, c) is computed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. The bias, standard deviation, and RMSE across J = 1000 Monte Carlo samples are reported.
The results in Table 1 are also in line with the fact that β̂ →p β0; recall Theorem 1. Focus on ϕ1 with m0 = 6, for example. For each n ∈ {100, 500, 1000}, the bias is {−0.036, −0.008, −0.007} and the RMSE is {0.308, 0.121, 0.088}. The consistency of β̂ is also observed for m0 = 18, but the standard deviation increases for each n. For ϕ1 with m0 = 18, the RMSE is {0.370, 0.150, 0.108}. This result suggests that a larger value of m0 has an adverse effect on the small-sample performance of the point estimation of β, probably due to the larger choice set of c.
In Table 2, threshold effects exist since ϕ10 ≠ ϕ20. In this case, γ0 is identified and γ̂ →p γ0; recall Theorem 1.(iii). The simulation results in Table 2 are consistent with this fact. Focus again on c with m0 = 6, for example. For each n, the bias is
Table 2: Simulation results on the profiling estimation (γ0 is identified)
m0 = 6
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.005 0.240 0.240 −0.003 0.076 0.076 −0.002 0.054 0.054
ϕ1 −0.029 0.217 0.219 −0.007 0.071 0.072 −0.005 0.049 0.050
α2 0.085 0.361 0.371 0.014 0.090 0.092 0.008 0.063 0.063
ϕ2 −0.056 0.193 0.201 −0.012 0.056 0.057 −0.005 0.039 0.040
d 0.212 0.409 0.460 0.002 0.045 0.045 0.000 0.000 0.000
c 0.052 0.209 0.215 0.001 0.044 0.044 −0.000 0.016 0.016
m0 = 18
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 0.009 0.368 0.368 −0.008 0.088 0.089 −0.004 0.057 0.057
ϕ1 −0.041 0.312 0.315 −0.019 0.091 0.093 −0.007 0.059 0.060
α2 0.219 0.540 0.583 0.026 0.126 0.129 0.015 0.083 0.085
ϕ2 −0.122 0.252 0.280 −0.018 0.068 0.070 −0.010 0.046 0.047
d 0.332 0.471 0.576 0.007 0.083 0.084 0.000 0.000 0.000
c 0.021 0.207 0.208 −0.003 0.071 0.071 −0.000 0.034 0.034
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 = 0.8, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Since ϕ10 ≠ ϕ20, γ0 is identified. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimator for (α1, ϕ1, α2, ϕ2, d, c) is computed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. The bias, standard deviation, and RMSE across J = 1000 Monte Carlo samples are reported.
{0.052, 0.001, −0.000} and the RMSE is {0.215, 0.044, 0.016}. The results in Table 2 are also consistent with the fact that β̂ →p β0. In summary, the simulation results indicate that the profiling estimation performs well in finite samples.
5.2 Testing the no-threshold-effect hypothesis

Consider testing the no-threshold-effect hypothesis H0*: (α10, ϕ10) = (α20, ϕ20) against the alternative hypothesis H1*: (α10, ϕ10) ≠ (α20, ϕ20). Since γ0 is not identified under H0*, we implement the bootstrap tests described in Section 3.2. For comparison, the sup-, ave-, and exp-Wald tests as well as their LM counterparts are all implemented.
The rejection frequencies of each test are computed, where the nominal size is a = 0.05 and the number of bootstrap samples is B = 500. Since α10 = α20 = 0 by assumption, the rejection frequencies correspond to empirical size when ϕ10 = ϕ20 and empirical power when ϕ10 ≠ ϕ20.
In Table 3, the rejection frequencies of each test are reported. The LM tests
achieve sharp empirical size even for the smallest sample size of n = 100. Taking the
ave-LM test with m0 = 6 as an example, the empirical size is {0.037, 0.038, 0.049} for
n ∈ {100, 500, 1000}, respectively. These results are consistent with Corollary 3.(i).
The sup-LM and exp-LM tests are slightly more conservative than the ave-LM test.
The empirical size of each LM test is almost unchanged when the memory size m0 changes from 6 to 18, indicating that the tests are robust to the choice of m0.
The Wald tests are over-sized when n = 100 but correctly sized when n ≥ 500. Taking the ave-Wald test as an example, the empirical size is {0.126, 0.048, 0.054} for m0 = 6. When n = 100, the sup-Wald and exp-Wald tests exhibit even worse size distortions than the ave-Wald test. Hence, we advise using the LM tests, especially the ave-LM test, to control the empirical size in small samples.
The empirical power of each test approaches 1 as n → ∞, confirming Corollary 3.(ii). The empirical power of the ave-LM test is {0.504, 1.000, 1.000} for m0 = 6 and {0.419, 0.999, 1.000} for m0 = 18. The sup-LM and exp-LM tests are less powerful than the ave-LM test when n = 100, reflecting the fact that they are conservative under H0*. Their empirical power, however, reaches 1 when n = 500. In summary, the ave-LM test performs remarkably well under both H0* and H1*.
5.3 Sequential tests for the individual zero restriction

Consider testing the zero restriction of an arbitrary element of β = (α1, ϕ1, α2, ϕ2)⊤. We perform the sequential test developed in Section 3.3. The no-threshold-effect hypothesis H0* is pre-tested by the bootstrap ave-LM test at the 5% level; the ave-LM statistic is selected since it has the best finite-sample performance in Section 5.2. The individual zero restriction is post-tested at the 5% level as follows: (i) if H0* is not rejected by the pre-test, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the asymptotic χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. The Wald statistic and the sup-LM statistic are selected since they have the best finite-sample performance in extra simulations not reported here; all test statistics are carefully compared in the supplemental material.
Table 3: Rejection frequencies of the bootstrap tests for the no-threshold-effect hypothesis H0*
ϕ10 = 0.2 and ϕ20 = 0.2 (empirical size)
m0 = 6 m0 = 18
Statistic n = 100 n = 500 n = 1000 n = 100 n = 500 n = 1000
sup-Wald 0.188 0.045 0.062 0.238 0.073 0.067
ave-Wald 0.126 0.048 0.054 0.138 0.062 0.055
exp-Wald 0.179 0.046 0.063 0.205 0.069 0.063
sup-LM 0.028 0.027 0.051 0.020 0.044 0.049
ave-LM 0.037 0.038 0.049 0.040 0.045 0.049
exp-LM 0.028 0.031 0.049 0.024 0.034 0.050
ϕ10 = 0.2 and ϕ20 = 0.8 (empirical power)
m0 = 6 m0 = 18
Statistic n = 100 n = 500 n = 1000 n = 100 n = 500 n = 1000
sup-Wald 0.730 1.000 1.000 0.642 0.999 1.000
ave-Wald 0.715 1.000 1.000 0.631 0.999 1.000
exp-Wald 0.741 1.000 1.000 0.655 0.999 1.000
sup-LM 0.325 1.000 1.000 0.215 0.999 1.000
ave-LM 0.504 1.000 1.000 0.419 0.999 1.000
exp-LM 0.389 1.000 1.000 0.292 0.999 1.000
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 ∈ {0.2, 0.8}, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimation is executed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. We report the rejection frequencies of the wild-bootstrap Wald and LM tests for H0*: (α1, ϕ1) = (α2, ϕ2), where the nominal size is a = 0.05.
In Table 4, we report rejection frequencies of the sequential test across J = 1000 Monte Carlo samples. The rejection frequencies are interpreted as empirical size for (α1, α2) and empirical power for (ϕ1, ϕ2), since α10 = α20 = 0 while ϕ10 ≠ 0 and ϕ20 ≠ 0. As in Section 5.2, the number of bootstrap samples is B = 500. In this section, the memory size is fixed at m0 = 6 for brevity; qualitatively similar results obtain when m0 = 18.
In view of Table 4, the empirical size of the sequential test is well controlled in all cases considered, confirming Theorem 5.(i). When H0* is true with ϕ20 = 0.2, the empirical size with respect to α1 is {0.027, 0.038, 0.046} for n ∈ {100, 500, 1000},
Table 4: Rejection frequencies of the sequential tests for the individual zero restriction
ϕ20 = 0.2 (H0* is true)    ϕ20 = 0.8 (H0* is false)
Size Power Size Power
n α1 α2 ϕ1 ϕ2 α1 α2 ϕ1 ϕ2
100 0.027 0.034 0.168 0.151 0.088 0.025 0.591 0.988
500 0.038 0.034 0.896 0.901 0.045 0.041 0.778 1.000
1000 0.046 0.054 0.986 0.987 0.044 0.063 0.972 1.000
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 ∈ {0.2, 0.8}, d0 = 1, m0 = 6, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the (m0 c0)-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The no-threshold-effect hypothesis, H0*: (α1, ϕ1) = (α2, ϕ2), is pre-tested by the bootstrap ave-LM test at the 5% level. The zero restriction of each regression parameter is post-tested at the 5% level as follows: (i) if H0* is not rejected by the pre-test, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. This table reports the rejection frequencies across J = 1000 Monte Carlo samples, which are interpreted as empirical size for (α1, α2) and empirical power for (ϕ1, ϕ2).
respectively. When H0* is false with ϕ20 = 0.8, the empirical size with respect to α1 is {0.088, 0.045, 0.044}. Similar results are observed for α2.
Another implication of Table 4 is that the empirical power of the sequential test is sufficiently high, which confirms Theorem 5.(ii). When H0* is true, the empirical power with respect to ϕ1 is {0.168, 0.896, 0.986} for n ∈ {100, 500, 1000}. When H0* is false, the empirical power is {0.591, 0.778, 0.972}. Similar results are observed for ϕ2. In summary, the sequential test of the individual zero restriction achieves sharp size and high power whether threshold effects are present or absent.
6 Empirical application

Modelling and predicting the spread of the novel coronavirus is one of the most urgent research topics in modern global society. There is a rapidly growing literature in which time series methods are employed to model and predict COVID-19 data (see, e.g., Chimmula and Zhang, 2020; Zeroual, Harrou, Dairi, and Sun, 2020). In particular, Aidoo, Ampofo, Awashie, Appiah, and Adebanji (2021) fitted STAR models to daily new confirmed COVID-19 cases in the African sub-region, detecting
nonlinear effects. Indeed, the number of new confirmed cases is likely an explosive process, since individuals are prompted to get tested in response to an emerging pandemic. Such a built-in acceleration mechanism could produce time-varying threshold effects, motivating the use of SE-CoTAR. In Section 6.1, we describe our data and perform some preliminary analysis. In Section 6.2, we perform the main analysis using SE-CoTAR.
6.1 Data and preliminary analysis
The target series is daily new confirmed COVID-19 cases per million people in the
U.S. and Japan. The data are publicly available at Our World in Data (OWID).
As is well known, the raw data (new_cases_per_million) have strong weekday
effects (e.g., the number of new confirmed cases tends to be smaller on weekends
and holidays due to the smaller number of tests). A seasonally adjusted version
(new_cases_smoothed_per_million) is also available at OWID, in which the week-
day effects are smoothed out. We analyze the seasonally adjusted version, denoted by {wt}_{t=1}^{n}, from April 4, 2020 through June 23, 2021 (n = 446 days). The start date
of the sample period roughly matches the time when the first wave of the pandemic
began in the U.S. and Japan.
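The preprocessing just described can be sketched with pandas. The column names `location`, `date`, and `new_cases_smoothed_per_million` follow the OWID dataset cited above; the exact file name and any further filtering are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def load_log_diff(df, country):
    """Extract the seasonally adjusted series w_t for one country and
    compute the log series and the log-difference y_t = ln w_t - ln w_{t-1}."""
    s = (df.loc[df["location"] == country]
           .set_index("date")["new_cases_smoothed_per_million"]
           .loc["2020-04-04":"2021-06-23"])        # sample period, n = 446
    log_w = np.log(s)
    return log_w, log_w.diff().dropna()

# Usage (the file name is an assumption; OWID distributes the data as CSV):
# df = pd.read_csv("owid-covid-data.csv")
# log_w_us, y_us = load_log_diff(df, "United States")
```

Slicing by the ISO date strings works because the date index is sorted, and `diff().dropna()` yields the log-differenced series analyzed below.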
In Figure 1, the log series ln wt and the log-differenced series yt = ∆ ln wt = ln wt − ln wt−1 are plotted. The log series move smoothly due to the seasonal adjustment, and we clearly observe several waves of the pandemic. The number of new cases in the U.S. has been decreasing dramatically since May 2021, while the log series of Japan seems to have a moderate upward trend throughout the sample period. The count for the U.S., however, has been far larger than that of Japan. Indeed, the number of new cases per million people on June 23, 2021 (i.e., the end date of the sample) is wn = 34.14 for the U.S. and wn = 11.40 for Japan, almost a three-fold difference. The log-difference of the new confirmed cases, yt, exhibits rather complex fluctuations combining persistent swings and temporary noise, which suggests the presence of nonlinear effects.
Figure 1: The number of daily new confirmed COVID-19 cases per million people

[Four panels: a) ln w, United States; b) ∆ ln w, United States; c) ln w, Japan; d) ∆ ln w, Japan.]

wt = the smoothed number of new confirmed COVID-19 cases per million people on day t. This figure plots ln wt and ∆ ln wt = ln wt − ln wt−1 for the U.S. and Japan. Sample period: April 4, 2020 – June 23, 2021.
6.2 Main analysis and discussions

The SE-CoTAR model with p = 3 and m = 14 is fitted to the daily change in the number of new confirmed COVID-19 cases:

yt = α1 + Σ_{k=1}^{3} ϕ1k yt−k + ut  if yt−d < µt−d−1(c),
yt = α2 + Σ_{k=1}^{3} ϕ2k yt−k + ut  if yt−d ≥ µt−d−1(c).

Regime 1 represents a deceleration phase where the change in new confirmed cases is small relative to the local memory of size m = 14 days (i.e., 2 weeks). Regime 2
represents an acceleration phase where the change is relatively large. The space of the delay parameter d is D = {1, . . . , 14}. The choice of 14 for both m and the maximum delay accords with the common perception that the present status of infection is an outcome of people's activities approximately 2 weeks earlier. The space of the percentile parameter c is given by (6) so that each regime accounts for at least 15% of the entire sample. The AR lag length p = 3 is subjectively selected to balance model fit and parsimony; a data-driven selection of p is left as a future task.
Let βr = (αr, ϕr1, ϕr2, ϕr3)⊤ be the vector of regression parameters in regime r ∈ {1, 2}, and let γ = (d, c)⊤ be the vector of nuisance parameters. We conduct profiling to estimate β = (β1⊤, β2⊤)⊤ and γ. Our primary interest lies in testing the no-threshold-effect hypothesis H0*: β1 = β2. To this end, the wild-bootstrap ave-LM test with B = 5000 iterations is implemented; the ave-LM test statistic is chosen since it has the best finite-sample performance in Section 5.2.
Taking the ave-LM test of H0* as the pre-test, the individual zero restriction of each element of β is post-tested as follows: (i) if H0* is not rejected by the pre-test at the 5% level, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the asymptotic χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. The Wald statistic and the sup-LM statistic are chosen since they perform best in small samples (Section 5.3).
Further, we compute several key quantities that help us contrast the deceleration and acceleration regimes. First, the estimated share of regime r is δ̂r(ĉ) = n^{-1} Σ_{t=1}^{n} Irt(ĉ), where ĉ is the profiling estimator of c and Irt(c) is defined in (3). Second, the empirical transition probability from regime r′ to regime r is computed as

δ̂rr′(ĉ) = Σ_{t=1}^{n} Irt(ĉ) Ir′,t−1(ĉ) / Σ_{t=1}^{n} Ir′,t−1(ĉ),  r, r′ ∈ {1, 2}.
Third, the average duration of regime r, denoted D̂r(ĉ), is computed for each r ∈ {1, 2}.

In Figure 2, the estimated conditional threshold µt(ĉ) is plotted for each country.
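Given a fitted regime-indicator series, the three summary quantities above can be computed directly. The sketch below is our own illustration; the input `regime` is a 0/1 array marking the regime of each observation.

```python
import numpy as np

def regime_summaries(regime):
    """Regime shares, empirical transition probabilities, and average
    durations from a 0/1 regime-indicator series."""
    regime = np.asarray(regime)
    # Share of each regime: fraction of observations falling in it
    shares = np.array([(regime == r).mean() for r in (0, 1)])
    # Transition probabilities P(regime_t = r | regime_{t-1} = r')
    trans = np.zeros((2, 2))
    for r_prev in (0, 1):
        nxt = regime[1:][regime[:-1] == r_prev]
        for r in (0, 1):
            trans[r_prev, r] = np.mean(nxt == r)
    # Average duration of each regime: mean length of its consecutive runs
    durations = []
    for r in (0, 1):
        runs, count = [], 0
        for v in regime:
            if v == r:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:
            runs.append(count)
        durations.append(float(np.mean(runs)) if runs else 0.0)
    return shares, trans, np.array(durations)
```

The average duration is tied to the transition probabilities: under a first-order Markov approximation, the expected duration of regime r is roughly 1 / (1 − the probability of staying in regime r).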
We observe that µt(ĉ) traces the persistent swings of yt strikingly well, which highlights the prominent feature of SE-CoTAR. A key question is whether the persistence structure of yt is homogeneous or heterogeneous across the regimes. This question can be addressed by testing the no-threshold-effect hypothesis H0*. As reported in Table 5, the bootstrap p-value of the ave-LM test for H0* is 0.049 for the U.S. and 0.002 for Japan. Hence, H0* is rejected marginally at the 5% level for the U.S., and at any conventional level for Japan. We therefore conclude that conditional threshold effects are present for both countries.
Figure 2: Estimated conditional threshold given the SE-CoTAR model

[Two panels: a) United States; b) Japan.]

The target series, the log-difference of the daily new confirmed COVID-19 cases, is plotted as the blue, noisier line. The estimated conditional threshold based on SE-CoTAR, {µt(ĉ)}, is plotted as the red, smoother line. Sample period: April 4, 2020 – June 23, 2021.
A number of other interesting implications can be drawn from Table 5. In view of the profiling estimates of the AR parameters and their least favorable p-values, the persistence structure differs considerably across regimes and countries. At the 5% level, (ϕ11, ϕ12) and (ϕ22, ϕ23) are significantly positive in the U.S., while (ϕ11, ϕ12) and ϕ21 are significantly positive in Japan. The estimated delay parameter d̂ is 7 days for the U.S. and 10 days for Japan, confirming the existence of a 1- to 2-week delay. The estimated percentile parameter ĉ is 0.500 for the U.S. and 0.643 for Japan, indicating that the thresholds are located around the median of the 14-day local memory.
The estimated share of the deceleration regime, δ̂1(ĉ), is 0.484 for the U.S. and 0.574 for Japan. This contrast is consistent with the fact that the number of new confirmed cases is larger in the U.S. than in Japan. In view of the empirical transition probabilities, the U.S. and Japan have similar persistence structures in the acceleration regime; the probability of switching from regime 2 to regime 1, δ̂12(ĉ), is 0.342 for the U.S. and 0.355 for Japan. The persistence structures in the deceleration regime, however, differ across the two countries; the probability of switching from regime 1
Table 5: Empirical results of the SE-CoTAR model on the COVID-19 cases
United States
α1 ϕ11 ϕ12 ϕ13 α2 ϕ21 ϕ22 ϕ23
0.004 0.266 0.439 0.026 −0.006 0.127 0.508 0.281
(0.014) (0.006) (0.000) (0.771) (0.045) (0.293) (0.000) (0.005)
δ1(c) δ2(c) δ11(c) δ21(c) δ12(c) δ22(c) D1(c) D2(c)
0.484 0.516 0.636 0.364 0.342 0.658 2.750 2.896
d c p(H∗0 ) - - - - -
7 0.500 0.049 - - - - -
Japan
α1 ϕ11 ϕ12 ϕ13 α2 ϕ21 ϕ22 ϕ23
−0.004 0.349 0.299 0.227 0.010 0.420 0.112 0.049
(0.180) (0.000) (0.017) (0.343) (0.129) (0.000) (0.205) (0.604)
δ1(c) δ2(c) δ11(c) δ21(c) δ12(c) δ22(c) D1(c) D2(c)
0.574 0.426 0.734 0.266 0.355 0.645 3.758 2.788
d c p(H∗0 ) - - - - -
10 0.643 0.002 - - - - -
The SE-CoTAR model with p = 3 and m = 14 is fitted to the log-difference of the daily new confirmed COVID-19 cases per million people in the U.S. and Japan. Regime 1: yt−d < µt−d−1(c). Regime 2: yt−d ≥ µt−d−1(c). Sample period: April 4, 2020 – June 23, 2021 (n = 446). The spaces of the nuisance parameters are d ∈ {1, . . . , 14} and c ∈ {1/m, . . . , 1}. This table reports the profiling estimates of the regression parameters with their least favorable p-values in parentheses; the share of regime r, δ̂r(ĉ); the empirical transition probability from regime r′ to regime r, δ̂rr′(ĉ); the average duration of regime r, D̂r(ĉ); the profiling estimates of the nuisance parameters; and the wild-bootstrap p-value for the no-threshold-effect hypothesis H0*.
to regime 2, δ̂21(ĉ), is 0.364 for the U.S. and 0.266 for Japan. These results suggest that the U.S. has a stronger tendency than Japan to switch from the deceleration regime to the acceleration regime. Indeed, the average duration of the deceleration regime, D̂1(ĉ), is 2.750 days for the U.S. and 3.758 days for Japan. On average, the deceleration regime lasts one day less in the U.S. than in Japan.
These empirical results bring new insight into why the pandemic is more serious in the U.S. than in Japan. In the acceleration regime, the two countries are homogeneous in the time series sense, perhaps because it is hard or even impossible to control the pandemic once it is accelerating. What makes the difference between the U.S. and Japan is the duration of the deceleration regime. Given these empirical findings, a possible policy implication requiring further analysis is that the U.S. could combat the pandemic more efficiently by more strongly encouraging safety measures when the pandemic seems to be slowing down (e.g., staying at home or wearing a mask for just another day when the danger seems past).
7 Conclusion

We have proposed the conditional threshold autoregression (CoTAR), a novel time series model where the threshold is specified as an empirical quantile of the local memory of a threshold variable x. The resulting conditional threshold traces the fluctuation of x, which can enhance the fit and interpretation of the model. The parameters of CoTAR consist of (β1, β2, γ), where βr is the vector of regression parameters in regime r ∈ {1, 2} and γ is the vector of nuisance parameters. All parameters can be estimated via profiling, and the asymptotic properties of the profiling estimator depend on whether γ is identifiable. A key insight is that γ is unidentified if and only if there are no threshold effects (i.e., H0*: β1 = β2).
To test H0*, we have proposed wild-bootstrap tests which incorporate all possible values of γ. Using the bootstrap test for H0* as a pre-test, any linear constraint on the regression parameters, such as the individual zero restriction, can be tested by the proposed sequential test. The construction of the pre-test is inspired by Hansen (1996), and the construction of the post-test is inspired by the identification category selection procedure of Andrews and Cheng (2012). We have proven that both the pre-test and the post-test are asymptotically valid. Furthermore, we have shown via Monte Carlo simulation that both tests achieve sharp size and high power in finite samples.
We have analyzed the daily new confirmed COVID-19 cases per million people in the U.S. and Japan by fitting the self-exciting CoTAR model. Significant conditional threshold effects have been detected for both countries, demonstrating the practical usefulness of the CoTAR model. The implied persistence structures are consistent with the fact that the number of new confirmed cases in the U.S. is larger than in Japan. In particular, the deceleration regime in the U.S. is approximately one day shorter on average than in Japan. This empirical result suggests that a potentially effective measure for the U.S. to combat the pandemic would be to more strongly encourage safety measures when the pandemic is decelerating.
Acknowledgements
We thank Yasumasa Matsuda and participants at the 3rd Hosoya Prize Lecture and
the 6th Annual International Conference on Applied Econometrics in Hawaii for helpful comments and discussions. The first author, Kaiji Motegi, is grateful for the
financial support of Ishii Memorial Securities Research Promotion Foundation and
the Organization for Advanced and Integrated Research (OAIR), Kobe University.
The third author, Shigeyuki Hamori, is grateful for the financial support of JSPS
KAKENHI Grant Number (A) 17H00983 and OAIR.
References
Aidoo, E. N., R. T. Ampofo, G. E. Awashie, S. K. Appiah, and A. O. Ade-banji (2021): “Modelling COVID-19 incidence in the African sub-region usingsmooth transition autoregressive model,” Modeling Earth Systems and Environ-ment, https://doi.org/10.1007/s40808-021-01136-1.
Andrews, D. W. K. (1993): “Tests for Parameter Instability and Structural Changewith Unknown Change Point,” Econometrica, 61, 821–856.
Andrews, D. W. K., and X. Cheng (2012): “Estimation and Inference withWeak, Semi-Strong, and Strong Identification,” Econometrica, 80(5), 2153–2211.
(2013): “Maximum likelihood estimation and uniform inference with spo-radic identification failure,” Journal of Econometrics, 173, 36–56.
Andrews, D. W. K., and W. Ploberger (1994): “Optimal tests when a nuisanceparameter is present only under the alternative,” Econometrica, 62(6), 1383–1414.
Balke, N. S., and T. B. Fomby (1997): “Threshold Cointegration,” InternationalEconomic Review, 38, 627–645.
Bessec, M. (2003): “The asymmetric exchange rate dynamics in the EMS: a time-varying threshold test,” European Review of Economics and Finance, 2, 3–40.
Bradley, R. C. (2005): “Basic properties of strong mixing conditions: A surveyand some open questions,” Probability Surveys, 2, 107–144.
Chan, K. S. (1993): “Consistency and limiting distribution of the least squaresestimator of a threshold autoregressive model,” The Annals of Statistics, 21, 520–533.
Chan, K. S., and H. Tong (1985): “On the use of the deterministic Lyapunovfunction for the ergodicity of stochastic difference equations,” Advances in AppliedProbability, 17, 666–678.
Chan, K. S., and R. S. Tsay (1998): “Limiting properties of the least squares estimator of a continuous threshold autoregressive model,” Biometrika, 85, 413–426.

Chen, C. W. S., M. K. P. So, and F.-C. Liu (2011): “A review of threshold time series models in finance,” Statistics and Its Interface, 4, 167–181.

Chen, R., and R. S. Tsay (1991): “On the ergodicity of TAR(1) processes,” The Annals of Applied Probability, 1, 613–634.

Chimmula, V. K. R., and L. Zhang (2020): “Time series forecasting of COVID-19 transmission in Canada using LSTM networks,” Chaos, Solitons and Fractals, 135, #109864.

Corsi, F. (2009): “A Simple Approximate Long-Memory Model of Realized Volatility,” Journal of Financial Econometrics, 7, 174–196.

Davies, R. B. (1977): “Hypothesis testing when a nuisance parameter is present only under the alternative,” Biometrika, 64(2), 247–254.

——— (1987): “Hypothesis testing when a nuisance parameter is present only under the alternative,” Biometrika, 74(1), 33–43.

Dueker, M., M. T. Owyang, and M. Sola (2010): “A Time-Varying Threshold STAR Model of Unemployment and the Natural Rate,” Working Paper 2010-029A, Federal Reserve Bank of St. Louis.

Dueker, M. J., Z. Psaradakis, M. Sola, and F. Spagnolo (2013): “State-Dependent Threshold Smooth Transition Autoregressive Models,” Oxford Bulletin of Economics and Statistics, 75, 835–854.

Elliott, G., U. K. Muller, and M. W. Watson (2015): “Nearly optimal tests when a nuisance parameter is present under the null hypothesis,” Econometrica, 83, 771–811.

Gine, E., and J. Zinn (1990): “Bootstrapping general empirical measures,” The Annals of Probability, 18, 851–869.

Gonzalo, J., and M. Wolf (2005): “Subsampling inference in threshold autoregressive models,” Journal of Econometrics, 127, 201–224.

Granger, C. W. J., and T. Terasvirta (1993): Modelling Nonlinear Economic Relationships. Oxford University Press.

Hansen, B. E. (1996): “Inference when a nuisance parameter is not identified under the null hypothesis,” Econometrica, 64(2), 413–430.

——— (2000): “Sample splitting and threshold estimation,” Econometrica, 68, 575–603.
——— (2011): “Threshold autoregression in economics,” Statistics and Its Interface, 4, 123–127.

——— (2017): “Regression Kink With an Unknown Threshold,” Journal of Business & Economic Statistics, 35, 228–240.

Hill, J. B. (2021): “Weak-identification robust wild bootstrap applied to a consistent model specification test,” Econometric Theory, 37, 409–463.

Liu, J., and E. Susko (1992): “On Strict Stationarity and Ergodicity of a Non-Linear ARMA Model,” Journal of Applied Probability, 29, 363–373.

McCloskey, A. (2017): “Bonferroni-based size-correction for nonstandard testing problems,” Journal of Econometrics, 200, 17–35.

Motegi, K., X. Cai, S. Hamori, and H. Xu (2020): “Moving average threshold heterogeneous autoregressive (MAT-HAR) models,” Journal of Forecasting, 39, 1035–1042.

Seo, M. H., and O. Linton (2007): “A smoothed least squares estimator for threshold regression models,” Journal of Econometrics, 141, 704–735.

Stinchcombe, M. B., and H. White (1998): “Consistent specification testing with nuisance parameters present only under the alternative,” Econometric Theory, 14, 295–325.

Tong, H. (1978): “On a threshold model,” in Pattern Recognition and Signal Processing, ed. by C. H. Chen. Sijthoff and Noordhoff, Amsterdam.

——— (2011): “Threshold models in time series analysis — 30 years on,” Statistics and Its Interface, 4, 107–118.

——— (2015): “Threshold models in time series analysis — Some reflections,” Journal of Econometrics, 189, 485–491.

Tong, H., and K. S. Lim (1980): “Threshold autoregression, limit cycles and cyclical data,” Journal of the Royal Statistical Society, Series B (Methodological), 42, 245–292.

Tsay, R. S., and R. Chen (2019): Nonlinear Time Series Analysis. John Wiley & Sons, Inc.

Yang, L., C. Lee, and I. Chen (2021): “Threshold model with a time-varying threshold based on Fourier approximation,” Journal of Time Series Analysis, 42, 406–430.

Yang, L., and J.-J. Su (2018): “Debt and growth: Is there a constant tipping point?,” Journal of International Money and Finance, 87, 133–143.
Yu, P., and X. Fan (2021): “Threshold Regression With a Threshold Boundary,” Journal of Business & Economic Statistics, 39, 953–971.

Zeroual, A., F. Harrou, A. Dairi, and Y. Sun (2020): “Deep learning methods for forecasting COVID-19 time-series data: A comparative study,” Chaos, Solitons and Fractals, 140, #110121.

Zhu, Y., H. Chen, and M. Lin (2019): “Threshold models with time-varying threshold values and their application in estimating regime-sensitive Taylor rules,” Studies in Nonlinear Dynamics & Econometrics, 23, #20170114.
Appendices
In these appendices, we prove Theorem 1, Theorem 2, Lemma 4, and Theorem 5. (Corollary 3 is an immediate consequence of Theorem 2.) To this end, we recall and introduce some notation. Throughout the appendices, →p denotes convergence in probability; →d denotes convergence in distribution; ⇒ denotes weak convergence; and ⇒p denotes weak convergence in probability as defined by Gine and Zinn (1990).

In (11), the general linear parametric restrictions are specified as H0 : Rβ = q, and the alternative hypothesis is specified as H1 : Rβ ≠ q. In particular, the no-threshold-effect hypothesis H∗0 : β1 = β2 and its alternative H∗1 : β1 ≠ β2 are expressed in (13) with R∗ = (Ip+1, −Ip+1) and q∗ = 0(p+1)×1. In (14), the regression score conditional on the nuisance parameter γ is given by st(γ) = Zt−1(γ)ut. In (16), the estimated regression score under H1 conditional on γ is given by ŝt(γ) = Zt−1(γ)ût(γ).
For γ1, γ2 ∈ Γ, define the following matrices conditional on the sample:

  Vn(γ1,γ2) = Mn(γ1)−1 Sn(γ1,γ2) Mn(γ2)−1,
  Sn(γ1,γ2) = (1/n) Σ_{t=1}^{n} st(γ1)st(γ2)⊤,
  Mn(γ) = (1/n) Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤.

We will sometimes abbreviate Vn(γ) = Vn(γ,γ) and Sn(γ) = Sn(γ,γ) when appropriate, recovering (18) and (19). The population versions of these matrices, denoted V(γ1,γ2), S(γ1,γ2), and M(γ), are defined in (30) and (31). We will sometimes abbreviate V(γ) = V(γ,γ) and S(γ) = S(γ,γ) when appropriate. The following lemma is useful for proving Theorem 1, Theorem 2, Lemma 4, and Theorem 5.
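These sample matrices are straightforward to compute. A minimal numerical sketch at a fixed γ, with illustrative variable names of our own choosing, is:

```python
import numpy as np

def vn_at_gamma(Z, u):
    """At a fixed gamma: scores s_t(gamma) = Z_{t-1}(gamma) u_t, then
    S_n(gamma) = n^{-1} sum_t s_t s_t',  M_n(gamma) = n^{-1} sum_t Z Z',
    and the sandwich V_n(gamma) = M_n^{-1} S_n M_n^{-1}."""
    n = Z.shape[0]
    s = Z * u[:, None]              # s_t(gamma) = Z_{t-1}(gamma) u_t, row by row
    S = s.T @ s / n                 # S_n(gamma)
    M = Z.T @ Z / n                 # M_n(gamma)
    Minv = np.linalg.inv(M)
    return M, S, Minv @ S @ Minv    # V_n(gamma)

# toy inputs standing in for the regressors Z_{t-1}(gamma) and errors u_t
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
u = rng.standard_normal(100)
M, S, V = vn_at_gamma(Z, u)
```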
Lemma A.1. If Assumption 1 holds, then the following are true: (i) Sn(γ1,γ2) →p S(γ1,γ2) uniformly over γ1, γ2 ∈ Γ; (ii) Mn(γ) →p M(γ) uniformly over γ ∈ Γ; (iii) Vn(γ1,γ2) →p V(γ1,γ2) uniformly over γ1, γ2 ∈ Γ; and (iv) n−1/2 Σ_{t=1}^{n} st(γ) ⇒ G(γ), where G(γ) is a mean zero Gaussian process with covariance kernel S(γ1,γ2).
When Γ has finite support, the proof of Lemma A.1 follows directly from Assumption 1 by applying the law of large numbers and the central limit theorem. Note that uniform convergence is necessary for the results to hold under H∗0. Assumption 1 allows uniform convergence to hold even when the support of Γ is not finite (see, e.g., Hansen, 1996, Theorems 1 and 3).
A.1 Proof of Theorem 1
i) Recall that the CoTAR model is given in (5) and the conditional least squares estimator β̂(γ) is given in (8). Substituting (5) into (8) and rearranging, we get

  β̂(γ) = {Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}−1 [Σ_{t=1}^{n} Zt−1(γ){Zt−1(γ)⊤β0 + ut}]
       = β0 + (1/√n) {(1/n) Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}−1 {(1/√n) Σ_{t=1}^{n} Zt−1(γ)ut}
       = β0 + (1/√n) Mn(γ)−1 {(1/√n) Σ_{t=1}^{n} st(γ)}.

Hence, we have that

  √n{β̂(γ) − β0} = Mn(γ)−1 {(1/√n) Σ_{t=1}^{n} st(γ)}.   (A.1)

The desired result follows from application of Lemma A.1.
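As a concrete illustration of the profiling step behind (8) and (A.1), the sketch below computes β̂(γ) by OLS on regime-split regressors for a toy two-regime AR(1). The simulated design, the function name, and all variable names are ours, not the paper's.

```python
import numpy as np

def cls_beta(y, X, q, gamma):
    """Conditional least squares beta(gamma) for a two-regime threshold
    regression: stack regime-split regressors Z_{t-1}(gamma) and run OLS
    of y on Z (a sketch of the estimator in (8))."""
    low = (q <= gamma)[:, None]           # regime indicator 1{q_t <= gamma}
    Z = np.hstack([X * low, X * ~low])    # Z_{t-1}(gamma): regime-split regressors
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, Z

# toy data: TAR(1) with regime-specific intercept and slope, threshold 0
rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(1, n):
    if y[t - 1] <= 0.0:
        y[t] = 0.5 + 0.3 * y[t - 1] + rng.normal()
    else:
        y[t] = -0.5 + 0.7 * y[t - 1] + rng.normal()
X = np.column_stack([np.ones(n - 1), y[:-1]])   # regressors (1, y_{t-1})
beta_hat, Z = cls_beta(y[1:], X, q=y[:-1], gamma=0.0)
```

Profiling over a grid of candidate γ values and keeping the minimizer of the sum of squared residuals would then recover the threshold estimate.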
ii) Theorem 1.(ii) follows directly from arguments in the proof of Theorem 1.(i) and
application of Lemma A.1.
iii) It is sufficient to verify Conditions 1-4 of Chan (1993). By applying Theorem 3.7
of Bradley (2005) to our Assumption 1, Condition 1 of Chan (1993) can be verified.
Conditions 2 and 3 follow directly from Assumption 1, and H∗1 implies Condition 4.
A.2 Proof of Theorem 2
i) Impose both H∗0 and H0. Let ψ(γ) be a mean zero Gaussian process with covariance kernel V(γ1,γ2). In view of (A.1), √n{β̂(γ) − β0} ⇒ ψ(γ) by Theorem 1.(i). Let W(γ) = ψ(γ)⊤R⊤{RV(γ)R⊤}−1Rψ(γ) and incorporate all possible values of γ in W(γ) as follows:

  supW ≡ sup_{γ∈Γ} W(γ) = max_{γ∈Γ} W(γ),   (A.2)
  aveW ≡ ∫_Γ W(γ) dµ∗(γ),   (A.3)
  expW ≡ ln[∫_Γ exp{W(γ)/2} dµ∗(γ)],   (A.4)

where some subset of Γ has positive measure with respect to µ∗ (see, e.g., Davies, 1977, 1987; Andrews and Ploberger, 1994). Equations (A.2)-(A.4) are the asymptotic counterparts of (22)-(24), respectively.
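On a finite grid approximating Γ, and taking µ∗ to be the uniform measure on that grid (an assumption made only for this sketch), the three functionals can be computed as:

```python
import numpy as np

def sup_ave_exp(W):
    """Given W(gamma) evaluated on a finite grid of Gamma, return the
    sup, ave, and exp functionals of (A.2)-(A.4), with mu* taken to be
    the uniform measure on the grid (an assumption of this sketch)."""
    W = np.asarray(W, dtype=float)
    supW = W.max()                           # (A.2): sup over the grid
    aveW = W.mean()                          # (A.3): integral w.r.t. uniform mu*
    expW = np.log(np.mean(np.exp(W / 2.0)))  # (A.4)
    return supW, aveW, expW

s, a, e = sup_ave_exp([1.0, 4.0, 9.0])
```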
Let g(W) denote either supW, aveW, or expW. Observe that g(·) is a continuous functional of the Gaussian process ψ(γ). Let F(·) denote the distribution function of g(W), and define pn = 1 − F(gn), where gn = g(Wn) is the sample statistic based on the Wald process Wn(γ) in (17). Let {ξt}_{t=1}^{n} be iid standard normal random variables, and define the bootstrap counterparts

  W∗n(γ) = v∗n(γ)⊤Mn(γ)−1R⊤{RVn(γ)R⊤}−1RMn(γ)−1v∗n(γ),
  v∗n(γ) = (1/√n) Σ_{t=1}^{n} st(γ)ξt.

Note that, conditional on the sample, v∗n(γ) is a mean zero Gaussian process with covariance kernel Sn(γ1,γ2). Let g∗n = g(W∗n), and let F∗n denote the distribution function of g∗n conditional on the sample. Let p∗n = 1 − F∗n(gn); then Theorem 2 of Hansen (1996) implies that p∗n ⇒p 1 − F(g). Finally, apply the Glivenko-Cantelli theorem to see that pBn(H0) →p p∗n as n → ∞ and B → ∞.
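In practice, the bootstrap approximation pBn(H0) is obtained by redrawing the multipliers ξt B times and recomputing the statistic on each draw. The code below is an illustrative simplification with scalar scores on a grid of Γ and a sup-type statistic; all names are ours.

```python
import numpy as np

def wild_bootstrap_pvalue(scores, stat_fn, g_n, B=999, seed=0):
    """Multiplier (wild) bootstrap p-value: redraw iid N(0,1) multipliers
    xi_t, form v*_n(gamma) = n^{-1/2} sum_t s_t(gamma) xi_t on a grid of
    Gamma, map it to a scalar statistic via stat_fn, and compare with the
    sample statistic g_n.  `scores` is an (n x G) array of scalar scores
    over G grid points -- a simplification of the paper's setup."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    g_star = np.empty(B)
    for b in range(B):
        xi = rng.standard_normal(n)
        v_star = scores.T @ xi / np.sqrt(n)  # v*_n(gamma) on the grid
        g_star[b] = stat_fn(v_star)          # e.g. a sup-type functional
    return np.mean(g_star >= g_n)            # bootstrap p-value p^B_n

# toy illustration: scalar scores, studentized sup statistic
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 5))       # n = 200 obs, G = 5 grid points
S = scores.T @ scores / 200                  # S_n(gamma) variances on the diagonal
stat = lambda v: np.max(v**2 / np.diag(S))
p = wild_bootstrap_pvalue(scores, stat, g_n=stat(scores.sum(0) / np.sqrt(200)))
```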
ii) Impose H1 so that Rβ0 − q ≠ 0. Recall from (17) that, conditional on γ, the Wald test statistic with respect to H0 is given by

  Wn(γ) = n{Rβ̂(γ) − q}⊤{RVn(γ)R⊤}−1{Rβ̂(γ) − q}.

It is sufficient to prove that g(Wn) →p ∞. Since g(Wn) denotes either supWn, aveWn, or expWn, it is sufficient to prove that there exists γ ∈ Γ such that Wn(γ) →p ∞. By Lemma A.1.(iii), RVn(γ)R⊤ converges uniformly in probability over γ ∈ Γ to RV(γ)R⊤, and infγ∈Γ det{V(γ)} > 0.

If H∗0 is true, then by Theorem 1.(ii), β̂(γ) →p β0 uniformly over γ ∈ Γ. Hence, Wn(γ) →p ∞ for any γ ∈ Γ. If H∗1 is true, then β̂(γ0) →p β0 and hence Wn(γ0) →p ∞. Thus, g(Wn) →p ∞ under H1, whether H∗0 is true or false.
A.3 Proof of Lemma 4
i) Recall from (28) that the Wald test statistic associated with H0 is given by Wn = n(Rβ̂ − q)⊤(RV̂nR⊤)−1(Rβ̂ − q), where V̂n = Vn(γ̂). If H∗1 is true, then we have by Theorem 1.(iii) that

  γ̂ − γ0 = Op(n−1),   (A.5)
  √n(β̂ − β0) →d N{0, V(γ0)}.   (A.6)

If H0 is additionally true, then by (A.6), √n(Rβ̂ − q) →d N{0, RV(γ0)R⊤} and hence n(Rβ̂ − q)⊤{RV(γ0)R⊤}−1(Rβ̂ − q) →d χ²h. By Lemma A.1.(iii) and (A.5), V̂n →p V(γ0). Hence, Wn →d χ²h when both H∗1 and H0 are true. Now let F(·) be the cumulative distribution function of a χ²h random variable, so that pn,χ2(H0) = 1 − F(Wn). Since F is strictly increasing and continuous, and F is the asymptotic cdf of Wn under H0 when H∗1 is true, item i) follows.
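As a concrete check of the χ² post-test ingredients, the Wald statistic in (28) and its p-value can be sketched as follows. The closed-form survival function below is valid only for even degrees of freedom and is used purely to keep the sketch self-contained; in practice one would call a chi-squared survival function from a statistics library. All names are ours.

```python
import math
import numpy as np

def wald_stat(beta_hat, V_hat, R, q, n):
    """W_n = n (R b - q)' {R V R'}^{-1} (R b - q), as in (28)."""
    d = R @ beta_hat - q
    return float(n * d @ np.linalg.solve(R @ V_hat @ R.T, d))

def chi2_sf_even(w, h):
    """Survival function of a chi-squared with EVEN df h, via the
    closed form exp(-w/2) * sum_{j < h/2} (w/2)^j / j! (Erlang tail);
    enough for this illustration."""
    z = w / 2.0
    return math.exp(-z) * sum(z**j / math.factorial(j) for j in range(h // 2))

# toy check: with R = I_2 and beta_hat = q the statistic is zero
W = wald_stat(np.array([0.5, 0.3]), np.eye(2), np.eye(2), np.array([0.5, 0.3]), n=100)
p = chi2_sf_even(W, h=2)
```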
ii) Let H1 be true. It is sufficient to prove that Wn →p ∞. First, if H∗1 is true, then Rβ̂ − q converges in probability to the nonzero vector Rβ0 − q by Theorem 1.(iii), and Lemma A.1 implies that RV̂nR⊤ →p RV(γ0)R⊤, which is a positive definite matrix. Hence, Wn →p ∞ under H1 and H∗1.

Second, if H∗0 is true, then Rβ̂(γ) − q converges uniformly in probability to the nonzero vector Rβ0 − q by Theorem 1.(ii), and Lemma A.1 implies that RVn(γ)R⊤ →p RV(γ)R⊤ uniformly over γ ∈ Γ. Since infγ∈Γ det{V(γ)} > 0, Wn →p ∞ under H1 and H∗0. Thus, we conclude that Wn →p ∞ under H1, whether H∗0 is true or false.
A.4 Proof of Theorem 5
Let pn,χ2(H0) be the p-value of the χ² post-test in Lemma 4, let pBn(H0) be the p-value of the bootstrap test for H0, and let pn(H0) be the p-value of the sequential test for H0. Recall that the sequential test proceeds as follows: (1) if we fail to reject H∗0 at the pre-test, then pn(H0) = pBn(H0); (2) if we reject H∗0 at the pre-test, then pn(H0) = max{pBn(H0), pn,χ2(H0)}. Given this procedure, Theorem 5 is a direct implication of Theorem 2 and Lemma 4.
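The sequential rule just recalled reduces to a few lines of code. In the sketch below, the pre-test level a1, its default value, and the function name are illustrative choices of ours.

```python
def sequential_pvalue(p_pre, p_boot, p_chi2, a1=0.05):
    """Sequential-test p-value for H0: if the pre-test fails to reject
    H0* at level a1, report the bootstrap p-value; otherwise report the
    max of the bootstrap and chi-squared post-test p-values."""
    if p_pre >= a1:               # (1) fail to reject H0* at the pre-test
        return p_boot
    return max(p_boot, p_chi2)    # (2) reject H0* at the pre-test
```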
i) Impose H0, and let a2 be the significance level of the post-test. Since failure to reject a false H∗0 occurs with probability approaching 0 by Corollary 3.(ii), we need only consider the following three cases:
a) H∗0 is true, and we fail to reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ limn→∞ Pr{pBn(H0) < a2} = a2 by Theorem 2.(i).

b) H∗0 is true, but we reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ lim supn→∞ Pr[max{pBn(H0), pn,χ2(H0)} < a2] ≤ limB→∞ limn→∞ Pr{pBn(H0) < a2} = a2 by Theorem 2.(i).

c) H∗0 is false, and we reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ lim supn→∞ Pr[max{pBn(H0), pn,χ2(H0)} < a2] ≤ limn→∞ Pr{pn,χ2(H0) < a2} = a2 by Lemma 4.(i).

Thus, limB→∞ lim supn→∞ Pr{pn(H0) < a2} ≤ a2 under H0.
ii) Impose H1. Theorem 2.(ii) proves that pBn(H0) →p 0, and Lemma 4.(ii) proves that pn,χ2(H0) →p 0. Hence, pn(H0) →p 0 under H1, whether H∗0 is true or false.