Conditional Threshold Autoregression (CoTAR)
Kaiji Motegi∗ John W. Dennis† Shigeyuki Hamori‡
Kobe University IDA Kobe University
November 10, 2021
Abstract
We propose a new time series model where the threshold is specified as an empirical quantile of recent observations of a threshold variable. The resulting conditional threshold traces the fluctuation of the threshold variable, which can enhance the fit and interpretation of the model. In the proposed conditional threshold autoregressive (CoTAR) model, the existence of threshold effects can be tested by wild-bootstrap tests which incorporate all possible values of the nuisance parameters. The estimation and hypothesis testing of the CoTAR model satisfy desirable statistical properties in both large and small samples. We fit the CoTAR model to new confirmed COVID-19 cases in the U.S. and Japan. Significant conditional threshold effects are detected for both countries, and the implied persistence structures are consistent with the fact that the number of new confirmed cases in the U.S. is larger than in Japan.
JEL codes: C22, C24, C51.
Keywords: COVID-19, Nonlinear time series analysis, Profiling estimation, Regime
switch, Self-exciting threshold autoregression (SETAR), Wild bootstrap.
∗Corresponding author. Graduate School of Economics, Kobe University. Address: 2-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501 Japan. E-mail: [email protected]
†Institute for Defense Analyses (IDA). Research results and conclusions expressed are those of the authors and do not necessarily reflect the views of IDA. E-mail: [email protected]
‡Graduate School of Economics, Kobe University. E-mail: [email protected]
1 Introduction
It is well known that a time series often has heterogeneous properties below versus
above a certain threshold. Such nonlinearities are referred to as threshold effects, and
there is a vast literature on modeling and testing them. One of the most well-known
models in this field is the threshold autoregression (TAR) proposed by Tong (1978).
In the TAR model, a target series y follows AR(p) processes with coefficients being
different across regimes, and a regime switch is triggered when a threshold variable
x crosses a constant threshold parameter µ.1 Hansen (2000) proposed the threshold
regression (TR) model to unify various forms of threshold effects. The TR model is
designed for both cross-section and time series data, and includes TAR as a special
case. The threshold parameter of TR is specified as a constant as in TAR.
This paper proposes an alternative model where the threshold is specified as an
empirical quantile of recent observations of x. More precisely, the regime at time t is
determined by whether xt−d < µt−d−1(c) or xt−d ≥ µt−d−1(c), where the conditional
threshold µt(c) is the 100c-percentile of {xt, xt−1, . . . , xt−m+1}. The memory size m is
chosen by the researcher, while the delay parameter d and the percentile parameter c
can be estimated from data. The conditional threshold represents a “normal” level of
recent x if c is around 0.5 and an “abnormal” level if c is far from 0.5. The conditional
threshold traces the fluctuation of x, which can enhance the fit and interpretation of
the model.
Time-varying or state-dependent threshold models can be categorized into five groups, depending on the specifications of thresholds. First, Bessec (2003) generalizes the self-exciting TAR model of Balke and Fomby (1997) by replacing constant thresholds with time-dependent but deterministic thresholds. Second, the threshold may be specified as a linear combination of observed covariates; the TR model is extended in this direction by Seo and Linton (2007) and Yu and Fan (2021); the smooth transition autoregressive (STAR) model (e.g., Granger and Terasvirta, 1993) is extended by Dueker, Psaradakis, Sola, and Spagnolo (2013); the regression kink (RK) model of Hansen (2017) is extended by Yang and Su (2018). Third, the threshold may be
assumed to follow a latent AR process; STAR is extended in this direction by Dueker,
Owyang, and Sola (2010); TR is extended by Zhu, Chen, and Lin (2019). Fourth,
Yang, Lee, and Chen (2021) generalize TR by applying a Fourier approximation to the
1 See Tong and Lim (1980), Chen and Tsay (1991), and Liu and Susko (1992) for early contributions on the statistical properties of TAR. See Chen, So, and Liu (2011), Hansen (2011), Tong (2011, 2015), and Tsay and Chen (2019) for extensive surveys on TAR.
threshold. Fifth, Motegi, Cai, Hamori, and Xu (2020, MCHX2020) add time-varying
threshold effects to the heterogeneous autoregressive (HAR) model of Corsi (2009),
where the thresholds are sample averages of recent observations of x for each sampling frequency. A novel contribution of MCHX2020 is that the empirical conditional
moment of x is used as a threshold for the first time in the literature.
Of these existing models, the conditional average approach of MCHX2020 is closest to the proposed CoTAR approach. An advantage of our specification is that the threshold can be an "abnormal" level of x by setting c close to 0 or 1, whereas the threshold of MCHX2020 is restricted to be an average level of x. Further, our framework is more general because we do not impose the HAR restriction. We also establish asymptotic properties of the estimation and hypothesis testing of the CoTAR model, while MCHX2020 reported simulation and empirical evidence only.
Statistical inference of CoTAR is analogous to that of TAR; a key issue is the identification of the nuisance parameters γ = (d, c)⊤. The CoTAR model contains regression parameters (β1, β2), where βr signifies the vector of the intercept and AR parameters in regime r ∈ {1, 2}. When β1 = β2, threshold effects are absent and γ is not identified. When β1 ≠ β2, threshold effects are present and γ is identified. We adopt a two-step method called profiling to estimate (β1, β2) and γ. Asymptotic properties of the profiling estimator and the hypothesis testing of regression parameters (e.g., testing an individual zero restriction) depend crucially on whether γ is identified or not.2
Testing the null hypothesis of no threshold effects (i.e., β1 = β2) requires a non-standard testing procedure, since γ is not identified under the null. To establish asymptotically valid tests, we apply Hansen's (1996) wild-bootstrap tests, which incorporate all possible values of γ.3 Using the bootstrap test of the no-threshold-effect hypothesis as a pre-test, we test a general linear parametric restriction as a post-test.
To control the size of the post-test, we adopt the identification category selection
procedure of Andrews and Cheng (2012). Under some regularity conditions, both the
pre-test and the post-test attain asymptotic validity. Further, we show via Monte
Carlo simulations that the proposed tests have sharp size and high power in small
samples.
2 See Chan (1993) and Chan and Tsay (1998) for early contributions on the estimation of TAR.
3 Also see Davies (1977, 1987), Andrews and Ploberger (1994), Stinchcombe and White (1998), Hansen (2000), Gonzalo and Wolf (2005), Andrews and Cheng (2012, 2013), Elliott, Müller, and Watson (2015), McCloskey (2017), and Hill (2021).
We fit the CoTAR model to daily new confirmed COVID-19 cases per million people in the United States (U.S.) and Japan. The threshold variable is chosen to be the
target variable itself, making it a self-exciting CoTAR model. Significant conditional
threshold effects are detected for both countries, and the implied persistence structures explain, from a time series standpoint, why the number of new confirmed
cases per million people in the U.S. is larger than in Japan. On average, the decelera-
tion regime lasts one day shorter in the U.S. than in Japan, while the duration of the
acceleration regime is roughly the same between the two countries. In view of these
empirical findings, a potential policy implication requiring further investigation is that
the U.S. could more efficiently combat the pandemic by more strongly encouraging
safety when the pandemic is decelerating.
The rest of this paper is organized as follows. In Section 2, the CoTAR model is
proposed formally. In Section 3, the procedures of estimation and hypothesis testing
are described. In Section 4, the asymptotic properties of the proposed methods are
derived. In Section 5, we evaluate the finite sample performance of CoTAR via Monte
Carlo simulations. In Section 6, the empirical application on daily new confirmed
COVID-19 cases is presented. Some concluding remarks are provided in Section 7.
Omitted technical details are collected in Appendices. Extra simulation results are
provided in the separate supplemental material.
2 Conditional threshold autoregression
2.1 Motivation and specification
Let yt and xt be a target variable and a threshold variable at time t ∈ {1, . . . , n}, respectively. As a benchmark, consider a threshold autoregressive (TAR) model with two regimes:

yt = α1 + ∑_{k=1}^{p} ϕ1k yt−k + ut if xt−d < µ,
yt = α2 + ∑_{k=1}^{p} ϕ2k yt−k + ut if xt−d ≥ µ, (1)
where (αr, ϕr1, . . . , ϕrp) are regression parameters in regime r ∈ {1, 2}; d is the delay
parameter; µ is the threshold parameter. Usual conditions on the error term ut
include a martingale difference sequence (i.e., E(ut |ut−1, ut−2, . . . ) = 0) and finite
second moment (i.e., E(u2t ) = σ2 < ∞); the assumptions on ut are defined more
precisely in Section 4. When the threshold variable is the target variable itself (i.e.,
xt = yt), (1) is called the self-exciting TAR (SETAR) model.
A key feature of (1) is that y has different autocorrelation structures below versus
above the unconditional threshold µ. The term "unconditional" means that µ is time-independent and chosen from the entire memory X_1^n = {x1, . . . , xn}. We propose an alternative approach of replacing µ with a conditional threshold µt, which is time-dependent and chosen from a local memory X_{t−m+1}^t = {xt−m+1, . . . , xt}. Specifically,
we propose the conditional threshold autoregressive (CoTAR) model:
yt = α1 + ∑_{k=1}^{p} ϕ1k yt−k + ut if xt−d < µt−d−1(c),
yt = α2 + ∑_{k=1}^{p} ϕ2k yt−k + ut if xt−d ≥ µt−d−1(c), (2)
where the conditional threshold µt(c) is the mc-th smallest value (or equivalently the 100c-percentile) of X_{t−m+1}^t; m = #X_{t−m+1}^t is the size of the local memory; and c ∈ {1/m, 2/m, . . . , 1} signifies the relevant percentile, where the possible values of c are restricted to be discrete so that mc ∈ {1, . . . , m}. When xt = yt, we call (2) the self-exciting CoTAR (SE-CoTAR) model.
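To make the conditional threshold concrete, the following minimal Python sketch computes µt(c) as the mc-th smallest value of the local memory; the function name, data, and 0-based indexing are our own illustrative choices, not the paper's notation.

```python
# Illustrative sketch (not from the paper): mu_t(c) is the (m*c)-th
# smallest value of the local memory {x_{t-m+1}, ..., x_t}.

def conditional_threshold(x, t, m, c):
    """Return mu_t(c), the 100c-percentile of the last m observations
    of x ending at 0-based index t; m*c must be an integer in {1,...,m}."""
    window = x[t - m + 1 : t + 1]   # local memory X_{t-m+1}^t
    k = round(m * c)                # mu_t(c) is the k-th smallest value
    assert 1 <= k <= m
    return sorted(window)[k - 1]

# Example: m = 5 and c = 3/5 pick the 3rd smallest value of the window.
x = [2.0, 9.0, 4.0, 7.0, 1.0, 5.0]
print(conditional_threshold(x, t=5, m=5, c=3/5))  # window [9,4,7,1,5] -> prints 5.0
```

Setting c near 1/m or 1 makes the threshold a local extreme rather than a central level of the memory.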
A unique feature of CoTAR lies in the specification of the conditional threshold µt−d−1(c). If m = 1, then c = 1 by construction and hence µt−d−1(c) = xt−d−1. In this special case, (2) reduces to (1) with the threshold variable ∆xt−d = xt−d − xt−d−1 and the known threshold µ = 0. If m is an odd number and c = (2m)⁻¹(m+1), then µt−d−1(c) coincides with the median of X_{t−d−m}^{t−d−1}. In this case, regime 1 arises when x is below the "normal" level given the local memory, while regime 2 arises when x is above it. A similar interpretation holds when m is an even number, although µt−d−1(c) would not be exactly equal to the median of X_{t−d−m}^{t−d−1}.
If the value of c is close to the lower bound 1/m or the upper bound 1, then a
regime switch is triggered by a rare event of x crossing an “abnormal” level given
the local memory. Suppose, for example, that a SE-CoTAR model with m = 14 and
c = 11/14 = 0.786 is fitted to daily changes in new confirmed COVID-19 cases, where
weekday effects are assumed to be smoothed out. Then, the conditional threshold
is the 78.6-percentile of the recent 14-day observations of the changes in new cases,
hence regime 2 represents an extremely serious phase of pandemic (in the relative
sense). The conditional threshold approach is intuitively reasonable, since individuals
seem to evaluate the current status of pandemic relative to the recent past, not to a
constant cut-off value. Similar arguments might well apply to other variables such as
asset price, economic growth, and public debt.
One might be tempted to specify the conditional threshold as a conditional average of x (i.e., µt = m⁻¹ ∑_{ℓ=0}^{m−1} xt−ℓ); indeed, MCHX2020 took this approach in
the HAR framework. This is an intuitively plausible specification, and there is a
computational advantage that the percentile parameter c disappears. A possible limitation, however, is that the threshold cannot represent an "abnormal" level, unlike the conditional quantile specification with c away from 0.5.
A more general specification than the simple average would be a weighted average: µt = ∑_{ℓ=0}^{m−1} wℓ xt−ℓ with wℓ ≥ 0 and ∑_{ℓ=0}^{m−1} wℓ = 1. If the entire weighting scheme w = (w0, . . . , wm−1)⊤ is estimated without any restrictions, then parameter proliferation
is likely to occur. If w is given or tightly parameterized, then there is a higher
risk of misspecification. Further, whether w is estimated or not, an identification
problem arises since two distinct weighting schemes can lead to an identical profile
of regimes for all time periods. Hence, the proposed conditional quantile approach is
better balanced between flexible specification and practical implementation than the
conditional average approaches.
2.2 Matrix representation
To elaborate the statistical properties of CoTAR, it is useful to rewrite (2) in a
matrix form. First, stack the parameters as
β1 = (α1, ϕ11, . . . , ϕ1p)⊤, β2 = (α2, ϕ21, . . . , ϕ2p)⊤, β = (β1⊤, β2⊤)⊤ (K × 1), γ = (d, c)⊤, θ = (β⊤, γ⊤)⊤,
where K = 2(p+1). The target parameter vector θ is partitioned into the regression
parameters β and the nuisance parameters γ. One or both elements of γ could be
pre-specified by the researcher, but this paper estimates both of them in order to avoid
misspecification. To focus on the estimation of θ, we sidestep lag selection issues by
assuming that (p,m) are known.
Second, define binary variables which determine the regime:
I1t(c) = 1 {xt < µt−1(c)} , I2t(c) = 1 {xt ≥ µt−1(c)} , (3)
where 1(A) is the indicator function which equals 1 if event A occurs and 0 otherwise.
Trivially, ∑_{r=1}^{2} Irt(c) = 1 for any c and t. Using (3), stack the regressors as

zt−1 = (1, yt−1, . . . , yt−p)⊤ ((p+1) × 1), Zt−1(γ) = (zt−1⊤ I1,t−d(c), zt−1⊤ I2,t−d(c))⊤ (K × 1). (4)
Then, (2) can be rewritten as a single equation:
yt = Zt−1(γ)⊤β + ut. (5)
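As a concrete illustration of (3)-(5), the sketch below assembles the regime indicators and the stacked regressor vector Zt−1(γ) for one period in the self-exciting case; all names and the toy data are illustrative assumptions of ours.

```python
# Illustrative sketch (not from the paper) of the indicators (3) and
# the stacked regressor vector (4) for a single time index t.

def conditional_threshold(x, t, m, c):
    window = sorted(x[t - m + 1 : t + 1])   # local memory X_{t-m+1}^t
    return window[round(m * c) - 1]         # mc-th smallest value

def stacked_regressors(y, x, t, p, d, c, m):
    """Build Z_{t-1}(gamma) = (z_{t-1} I_{1,t-d}, z_{t-1} I_{2,t-d})."""
    mu = conditional_threshold(x, t - d - 1, m, c)     # mu_{t-d-1}(c)
    i1 = 1.0 if x[t - d] < mu else 0.0                 # I_{1,t-d}(c)
    i2 = 1.0 - i1                                      # indicators sum to 1
    z = [1.0] + [y[t - k] for k in range(1, p + 1)]    # z_{t-1}, a (p+1)-vector
    return [v * i1 for v in z] + [v * i2 for v in z]   # K = 2(p+1) entries

y = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
Z = stacked_regressors(y, y, t=5, p=1, d=1, c=0.5, m=2)  # self-exciting: x = y
print(Z)  # regime 2 is active here -> [0.0, 0.0, 1.0, 0.4]
```

Equation (5) is then the inner product Zt−1(γ)⊤β plus the error ut.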
3 Statistical inference of CoTAR
In this section, we describe the procedures of the estimation and testing of CoTAR.
In Section 3.1, the profiling estimation of β and γ is proposed. In Section 3.2, wild-
bootstrap tests for general linear restrictions of β are described. In Section 3.3, we
propose sequential tests where the no-threshold-effect hypothesis is tested at the pre-test and then other linear restrictions (e.g., the individual zero restriction of β) are
tested at the post-test.
3.1 Profiling estimation
The CoTAR model (2) can be estimated in a similar manner with the TAR model
(1). Let B ⊆ RK be the space of β. The space of the delay parameter d is defined as
D = {d, . . . , d}, where the lower and upper bounds are pre-specified by the researcher.
Taking the memory size m as given, the largest possible space of c is given by C = {1/m, 2/m, . . . , 1}. Let δr(c) = n⁻¹ ∑_{t=1}^{n} Irt(c) be the share of regime r ∈ {1, 2} in the entire sample. For some c ∈ C, δr(c) may be too small to identify both regimes in finite samples. A practical compromise often made in the TAR literature is to restrict the parameter space so that both regimes account for at least 15% of the entire sample:4

C̄ = {c ∈ C | min{δ1(c), δ2(c)} > 0.15}. (6)

The space of γ = (d, c)⊤ is defined as Γ = D × C̄, where × signifies the Cartesian product. The dimension of Γ is finite by construction, which simplifies the derivation of asymptotic properties. Finally, the space of θ is given by Θ = B × Γ.
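The 15% restriction in (6) is easy to operationalize. The sketch below, with invented inputs and names, filters the candidate percentiles; δr(c) is computed from precomputed regime-1 indicators.

```python
# Illustrative sketch of the restriction (6): keep only percentiles c
# for which each regime covers more than 15% of the sample.

def admissible_c(i1_by_c, cutoff=0.15):
    """i1_by_c maps each candidate c to its list of I_{1t}(c) values;
    return the restricted space C-bar."""
    keep = []
    for c, i1 in i1_by_c.items():
        d1 = sum(i1) / len(i1)   # delta_1(c): share of regime 1
        d2 = 1.0 - d1            # shares sum to 1
        if min(d1, d2) > cutoff:
            keep.append(c)
    return keep

# Toy example: c = 0.25 puts 10% of observations in regime 1 (dropped),
# while c = 0.5 puts 40% in regime 1 (kept).
i1_by_c = {0.25: [1] * 1 + [0] * 9, 0.5: [1] * 4 + [0] * 6}
print(admissible_c(i1_by_c))  # -> [0.5]
```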
Define the quadratic loss function L(θ) = ∑_{t=1}^{n} {yt − Eθ(yt | yt−1, . . . , y1)}², where
4 The practice of using the cut-off value 15% originates in a suggestion by Andrews (1993).
Eθ(yt | yt−1, . . . , y1) is the conditional expectation of yt given θ and the past observations {yt−1, . . . , y1}. The least squares (LS) estimator for θ is defined as the minimizer of the quadratic loss function:

θ̂ = argmin_{θ∈Θ} L(θ). (7)
The LS estimator θ̂ can be computed via a two-step approach called profiling. Fixing γ ∈ Γ, the regressors Zt−1(γ) in (5) can be computed from data; hence the LS estimator for β conditional on γ can easily be computed as

β̂(γ) = {∑_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}⁻¹ {∑_{t=1}^{n} Zt−1(γ)yt}. (8)
The resulting residual is given by

ût(γ) = yt − Zt−1(γ)⊤β̂(γ). (9)
Since Γ is a finite parameter space by construction, the optimization problem (7)
is equivalent to a two-step optimization problem which minimizes the conditional
quadratic loss function given γ and then chooses an optimal γ that delivers the
smallest conditional loss function. Specifically, the LS estimator for γ coincides with
γ̂ = argmin_{γ∈Γ} ∑_{t=1}^{n} ût(γ)². (10)

Substituting (10) into (8) yields β̂ = β̂(γ̂) and hence the LS estimator θ̂ = (β̂⊤, γ̂⊤)⊤.
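A minimal end-to-end sketch of the profiling steps (8)-(10), specialized to intercept-only regimes (p = 0) so that β̂(γ) reduces to the two regime means; the grid, memory size, data, and indexing conventions are invented for illustration.

```python
# Illustrative profiling sketch (not the paper's implementation):
# for each gamma = (d, c) on a finite grid, fit conditional LS and
# keep the gamma with the smallest sum of squared residuals.

def profile_fit(y, x, grid, m):
    """Return (beta1, beta2, gamma) minimizing the SSR over the grid."""
    best = None
    for d, c in grid:
        r1, r2 = [], []
        for t in range(m + d, len(y)):
            window = sorted(x[t - d - m : t - d])      # memory ending at t-d-1
            mu = window[round(m * c) - 1]              # mu_{t-d-1}(c)
            (r1 if x[t - d] < mu else r2).append(y[t])
        if not r1 or not r2:
            continue                                   # one regime is empty: skip
        b1, b2 = sum(r1) / len(r1), sum(r2) / len(r2)  # conditional LS, eq. (8)
        ssr = sum((v - b1) ** 2 for v in r1) + sum((v - b2) ** 2 for v in r2)
        if best is None or ssr < best[0]:
            best = (ssr, b1, b2, (d, c))               # profile over gamma, eq. (10)
    return best[1:]

y = [1.0, 5.0, 1.2, 5.1, 0.9, 5.2, 1.1, 5.0, 1.0, 5.1]
b1, b2, gamma = profile_fit(y, y, grid=[(1, 0.5), (1, 1.0)], m=2)
print(round(b1, 2), round(b2, 2), gamma)  # -> 4.28 1.0 (1, 1.0)
```

With p > 0 the inner step becomes a regime-wise OLS regression, but the outer grid search over γ is unchanged.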
3.2 Bootstrap tests for linear restrictions
Consider testing linear restrictions with respect to the regression parameters β in (5):
H0 : Rβ = q, H1 : Rβ ≠ q, (11)
where R is an h×K pre-specified selection matrix of full row rank and q is an h× 1
pre-specified vector; h is the number of restrictions. An important special case of (11)
is the no-threshold-effect hypothesis:

H∗0 : β1 = β2, H∗1 : β1 ≠ β2. (12)
Clearly, H∗0 is a special case of H0 with the following choice of (R, q):

R = (Ip+1, −Ip+1) ≡ R∗ ((p+1) × K), q = 0_{(p+1)×1} ≡ q∗, (13)

where Ip+1 is the identity matrix of dimension p + 1. Under H∗0, (5) reduces to the single-regime AR(p), which does not depend on the threshold variable x. Hence, γ = (d, c)⊤ is not identified under H∗0. In fact, H∗0 is the only case where γ is not identified; under H∗1, γ is always identifiable.
In this section, we describe the procedure of wild-bootstrap tests for the general linear restrictions H0, which may or may not coincide with the no-threshold-effect hypothesis H∗0. (In general, the researcher does not know whether H∗0 is true or false, and the asymptotic validity of the bootstrap tests for H0 depends crucially on the truth of H∗0. Sequential tests which address this dilemma are proposed in Section 3.3.) To proceed, it is useful to define some key quantities conditional on γ ∈ Γ. The regression score associated with (5) is given by

st(γ) = Zt−1(γ)ut. (14)
Let ũt be the LS residual from (5) with H0 imposed; the estimated regression score under H0 is then

s̃t(γ) = Zt−1(γ)ũt. (15)

The estimated regression score under H1 is

ŝt(γ) = Zt−1(γ)ût(γ), (16)

where ût(γ) is defined in (9).
The Wald test statistic with respect to (11) is given by

Wn(γ) = n{Rβ̂(γ) − q}⊤{RV̂n(γ)R⊤}⁻¹{Rβ̂(γ) − q}, (17)

where β̂(γ) is defined in (8). The heteroscedasticity-robust covariance matrix estimator is given by

V̂n(γ) = M̂n(γ)⁻¹ Ŝn(γ) M̂n(γ)⁻¹, (18)
where

Ŝn(γ) = n⁻¹ ∑_{t=1}^{n} ŝt(γ)ŝt(γ)⊤, M̂n(γ) = n⁻¹ ∑_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤. (19)
Similarly, the Lagrange multiplier (LM) test statistic is given by

LMn(γ) = n{Rβ̂(γ) − q}⊤{RṼn(γ)R⊤}⁻¹{Rβ̂(γ) − q}, (20)

where

Ṽn(γ) = M̂n(γ)⁻¹ S̃n(γ) M̂n(γ)⁻¹, S̃n(γ) = n⁻¹ ∑_{t=1}^{n} s̃t(γ)s̃t(γ)⊤. (21)
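For intuition about (17)-(19), consider the simplest case p = 0, where Zt−1(γ) = (I1t, I2t)⊤ makes M̂n and Ŝn diagonal and the no-threshold-effect Wald statistic collapses to a scalar. The sketch below codes this special case; it is an illustrative specialization under our own naming, not a general implementation.

```python
# Illustrative sketch: heteroscedasticity-robust Wald statistic for
# H0: beta1 = beta2 in the intercept-only case, where R = (1, -1) and
# W_n = n * (b1 - b2)^2 / ([V_n]_11 + [V_n]_22).

def wald_no_threshold(y, i1):
    """y: observations; i1: regime-1 indicators (1 or 0), one per observation."""
    n = len(y)
    s1 = [v for v, i in zip(y, i1) if i]        # regime-1 observations
    s2 = [v for v, i in zip(y, i1) if not i]    # regime-2 observations
    b1, b2 = sum(s1) / len(s1), sum(s2) / len(s2)
    m1, m2 = len(s1) / n, len(s2) / n           # diagonal of M_n
    v1 = sum((v - b1) ** 2 for v in s1) / n / m1 ** 2   # [V_n]_11 = S_11 / m1^2
    v2 = sum((v - b2) ** 2 for v in s2) / n / m2 ** 2   # [V_n]_22 = S_22 / m2^2
    return n * (b1 - b2) ** 2 / (v1 + v2)

y  = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]
i1 = [1, 1, 1, 0, 0, 0]
print(wald_no_threshold(y, i1))  # large value (about 360): regimes differ sharply
```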
All possible values of γ in Wn(γ) can be incorporated in at least three common ways:

supWn ≡ sup_{γ∈Γ} Wn(γ) = max_{γ∈Γ} Wn(γ), (22)

aveWn ≡ (#Γ)⁻¹ ∑_{γ∈Γ} Wn(γ), (23)

expWn ≡ ln[(#Γ)⁻¹ ∑_{γ∈Γ} exp{Wn(γ)/2}]. (24)
The asymptotic distributions of the test statistics (22)-(24) are non-standard in general; hence, we adopt the wild bootstrap of Hansen (1996). Let g(Wn) denote supWn, aveWn, or expWn; then proceed as follows.
Step 1 For each b ∈ {1, . . . , B}, generate ξt(b) i.i.d. ∼ N(0, 1) for t ∈ {1, . . . , n}.
Step 2 Compute a bootstrap test statistic g{Wn(b)}, where

Wn(b)(γ) = vn(b)(γ)⊤ M̂n(γ)⁻¹ R⊤{RV̂n(γ)R⊤}⁻¹ R M̂n(γ)⁻¹ vn(b)(γ); (25)

vn(b)(γ) = (1/√n) ∑_{t=1}^{n} ŝt(γ)ξt(b); (26)

R, ŝt(γ), V̂n(γ), and M̂n(γ) are defined in (11), (16), (18), and (19), respectively.
Step 3 Repeat Steps 1-2 independently, resulting in g{Wn(1)}, . . . , g{Wn(B)}.
Step 4 Compute the bootstrap p-value:

pBn(H0) = B⁻¹ ∑_{b=1}^{B} 1[g{Wn(b)} ≥ g(Wn)]. (27)

Reject H0 at the 100a% level if pBn(H0) < a, where a ∈ (0, 1) is the nominal size.
The testing procedure is analogous when the Wald test is replaced with the LM test. Use (15) and (21) to compute s̃t(γ) and Ṽn(γ), respectively. The transformed LM test statistics are obtained via (22)-(24), with Wn(γ) replaced by LMn(γ) from (20). Steps 1-4 are executed with (25) and (26) replaced by

LMn(b)(γ) = ṽn(b)(γ)⊤ M̂n(γ)⁻¹ R⊤{RṼn(γ)R⊤}⁻¹ R M̂n(γ)⁻¹ ṽn(b)(γ),

ṽn(b)(γ) = (1/√n) ∑_{t=1}^{n} s̃t(γ)ξt(b).
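Steps 1-4 can be condensed into a generic routine. In this sketch the statistic computed from one draw of multipliers is abstracted as a callable, so the same loop serves the Wald and LM variants; `random.gauss` supplies the N(0, 1) draws, and the toy usage at the end is an invented example, not the paper's application.

```python
import random

def bootstrap_pvalue(observed, stat, n, B, seed=0):
    """Wild-bootstrap p-value (27): the share of bootstrap draws whose
    statistic g{W^(b)} weakly exceeds the observed g(W_n)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(B):                                  # Steps 1-3
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]    # multipliers xi_t^(b)
        if stat(xi) >= observed:
            exceed += 1
    return exceed / B                                   # Step 4

# Toy usage: with constant scores, stat(xi) is a squared scaled sum of
# N(0,1) multipliers, so the p-value for observed = 1.0 is moderate.
scores = [1.0] * 20
stat = lambda xi: (sum(s * e for s, e in zip(scores, xi)) / len(scores) ** 0.5) ** 2
p = bootstrap_pvalue(observed=1.0, stat=stat, n=20, B=500)
print(0.0 <= p <= 1.0)  # prints True
```

In the actual procedure, `stat` would build vn(b)(γ) from the estimated scores for every γ ∈ Γ and return the chosen transformation of the quadratic form (25).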
3.3 Sequential tests for linear restrictions
In this section, consider testing general linear restrictions H0 which are distinct from the no-threshold-effect hypothesis H∗0. In this case, we have that (R, q) ≠ (R∗, q∗), where (R, q) are defined in (11) and (R∗, q∗) are defined in (13).
If we knew that H∗0 were true, then H0 could be tested via the bootstrap test as described in Section 3.2. If we knew that H∗0 were false, then H0 could be tested via the asymptotic χ² test. Taking the Wald test as an example, substitute (10) into (17) to get the test statistic:

Wn ≡ Wn(γ̂) = n(Rβ̂ − q)⊤(RV̂nR⊤)⁻¹(Rβ̂ − q), (28)

where V̂n = V̂n(γ̂). Reject H0 at the 100a% level if Wn exceeds the upper 100a% point of the chi-squared distribution χ²_h, where the degrees of freedom coincide with the number of restrictions, h.
In practice, whether H∗0 is true or false is unknown to the researcher. Since the
asymptotic distribution of β depends on whether γ is identified or not, asymptotically
valid tests of H0 must take into account both possibilities of identified and unidentified γ. Andrews and Cheng (2012) propose Identification Category Selection (ICS)
procedures in which a pre-test is performed to determine the identification category
of nuisance parameters (see also Andrews and Cheng, 2013, Hill, 2021). Adopting
the ICS procedure, we first perform the bootstrap test for H∗0 , and then test H0 in a
way that depends on the result of the pre-test. The following outlines this sequential
procedure.
Pre-test for the no-threshold-effect hypothesis H∗0. In view of (13) and (17), the conditional Wald test statistic corresponding to H∗0 is given by

W∗n(γ) = n{R∗β̂(γ) − q∗}⊤{R∗V̂n(γ)(R∗)⊤}⁻¹{R∗β̂(γ) − q∗}. (29)
Incorporate all values of γ using (22), (23), or (24) to calculate supW∗n, aveW∗n, or expW∗n, respectively. Recall that the asymptotic distributions of these test statistics are non-standard under H∗0; hence, we perform the bootstrap test outlined in Section 3.2, replacing R in (25) with R∗ defined in (13), and replacing Wn in (27) with W∗n defined in (29). Reject H∗0 at the 100a1% level if pBn(H∗0) < a1, where a1 ∈ (0, 1) is the nominal size of the pre-test.
Post-test for linear restrictions H0. Perform the post-test for H0 as follows.

Case 1 If H∗0 is not rejected at the 100a1% level by the pre-test, assume that γ is not identified and perform the bootstrap test for H0 as described in Section 3.2. Following (27), denote the p-value associated with this test as pBn(H0). Reject H0 at the 100a2% level if pBn(H0) < a2, where a2 ∈ (0, 1) is the nominal size of the post-test.

Case 2 If H∗0 is rejected at the 100a1% level by the pre-test, do the following:

1. Compute pBn(H0) as in Case 1.

2. Perform the asymptotic χ² test for H0, assuming that γ0 is identified; see (28) for the Wald test statistic. Denote the p-value associated with this test as pn,χ²(H0).

3. Compute the least favorable p-value: pn,lf(H0) = max{pBn(H0), pn,χ²(H0)}. Reject H0 at the 100a2% level if pn,lf(H0) < a2.
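The decision logic of the pre-test and post-test fits in a few lines. The sketch below takes the three p-values as inputs (they would come from the bootstrap and χ² tests described above); the function name and default levels are illustrative assumptions.

```python
# Illustrative sketch of the sequential (ICS-style) decision rule:
# pre-test H0* (no threshold effects), then post-test H0 with either
# the bootstrap p-value alone (Case 1) or the least favorable p-value
# combining bootstrap and chi-squared tests (Case 2).

def sequential_test(p_pre, p_boot, p_chi2, a1=0.05, a2=0.05):
    """Return (H0* rejected?, H0 rejected?)."""
    if p_pre >= a1:                      # Case 1: H0* not rejected
        return False, p_boot < a2        # bootstrap post-test only
    p_lf = max(p_boot, p_chi2)           # Case 2: least favorable p-value
    return True, p_lf < a2

# Case 2: H0 is rejected only if BOTH the bootstrap and chi2 tests reject.
print(sequential_test(p_pre=0.01, p_boot=0.02, p_chi2=0.30))   # -> (True, False)
print(sequential_test(p_pre=0.01, p_boot=0.02, p_chi2=0.001))  # -> (True, True)
```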
An intuition behind this algorithm is as follows. Under some regularity conditions,
the pre-test attains asymptotically correct size and power approaching 1 against any
deviation from H∗0 ; see Sections 4.2 and 4.3 for a formal proof. Given these asymptotic
properties, a non-rejection of H∗0 unambiguously motivates the use of the bootstrap
test for H0 (Case 1).
A rejection of H∗0, by contrast, leaves two possibilities (Case 2). The first possibility is that H∗0 is true but rejected; this is the type-I error of the pre-test, which
is guaranteed to occur with probability approaching a1. In this case, the use of the
bootstrap test for H0 is motivated. The second possibility is that H∗0 is false and
rejected, a correct decision which is guaranteed to occur with probability approaching
1. In this case, the use of the χ2 test for H0 is motivated. Since the researcher does
not know which of the two possibilities is true, the least favorable approach which
combines both the bootstrap test and the χ2 test is adopted.
4 Asymptotic theory
In this section, we derive key asymptotic properties of the estimation and hypothesis
testing of CoTAR. First, define xt(γ) = xt(d, c) = xt−d − µt−d−1(c) so that regime 1
arises at time t if xt(γ) < 0 and regime 2 arises if xt(γ) ≥ 0. A virtue of the CoTAR
specification is that xt(γ) is a measurable function of {xt−d−m, . . . , xt−d} and hence
the existing asymptotic theory of TAR can be applied almost directly (e.g., Chan,
1993, Hansen, 1996).
To proceed, some notation is introduced. Let →p denote convergence in probability, →d denote convergence in distribution, and ⇒ denote weak convergence. Let zt = max_{γ∈Γ}[tr{Zt(γ)⊤Zt(γ)}]^{1/2} be the norm of the regressor matrix given in (4). Define some population quantities:

V(γ1, γ2) = M(γ1)⁻¹ S(γ1, γ2) M(γ2)⁻¹, (30)

S(γ1, γ2) = E{st(γ1)st(γ2)⊤}, M(γ) = E{Zt(γ)Zt(γ)⊤}, (31)

where st(γ) = Zt−1(γ)ut as defined in (14). We will sometimes abbreviate V(γ) = V(γ, γ) and S(γ) = S(γ, γ) when appropriate; they are the population counterparts of V̂n(γ) and Ŝn(γ) defined in (18) and (19), respectively. Impose some basic assumptions, analogous to Assumption 1 of Hansen (1996).
Assumption 1. (i) {yt, xt, ut} are strictly stationary and absolutely regular with mixing coefficients η(k) = O(ν^{−k}) for some ν > 1. (ii) For some ι > ν, E|zt|^{4ι} < ∞ and E|ut|^{4ι} < ∞. (iii) inf_{γ∈Γ} det{M(γ)} > 0. (iv) xt(γ) has density function f(x) such that sup_x f(x) < ∞.
Assumption 1 is standard in TAR literature. The mixing condition in item (i) controls
the degree of serial dependence. The rate of the mixing condition is set to be faster
than in Hansen (1996) so that the conditions of Chan (1993) are satisfied. Chan (1993)
shows that item (i) implies a strong form of geometric ergodicity for TAR processes;
the same implication holds for CoTAR processes.
Sufficient conditions for Assumption 1 include iid {ut} with finite 4ι-th moment for some ι > 1 and the regime-specific stability condition (i.e., the roots of the characteristic equation λ^p − ∑_{k=1}^{p} λ^{p−k}ϕrk = 0 lie strictly inside the unit circle for each r ∈ {1, 2}). Such a regime-specific stability condition is generally stronger than needed to ensure the ergodicity of TAR processes (Chan and Tong, 1985, Chen and Tsay, 1991). Further, Hansen (1996) notes that a martingale difference condition is likely sufficient in place of the iid condition. These observations apply to CoTAR as well.
4.1 Profiling estimation
Let β0 and γ0 be the true values of β and γ, respectively. Assume that β0 is an interior
point of B and that γ0 ∈ Γ, where B and Γ are the parameter spaces constructed in
Section 3.1. Under H∗0, β̂(γ) is consistent for β0 and asymptotically normal for any fixed γ ∈ Γ. Further, under H∗0, γ0 is not identified and γ̂ is not consistent for γ0; consequently, β̂ = β̂(γ̂) is consistent for β0 but not asymptotically normal. Under H∗1, γ0 is identified and γ̂ is super-consistent for γ0; consequently, β̂ is consistent for β0 and asymptotically normal. These results are summarized in Theorem 1.

Theorem 1. If Assumption 1 holds, then the following are true: i) Under H∗0, √n{β̂(γ) − β0} ⇒ G(γ), where G(γ) is a mean zero Gaussian process with covariance kernel V(γ1, γ2) defined in (30). ii) Under H∗0, β̂(γ) →p β0 uniformly over γ ∈ Γ. iii) Under H∗1, γ̂ − γ0 = Op(n⁻¹) and √n(β̂ − β0) →d N{0, V(γ0)}.
The proof of Theorem 1 is provided in Appendix A.1. Our Assumption 1 implies
the assumptions of Chan (1993), hence the theorems of Chan (1993) can be used to
establish Theorem 1.(iii).
Different assumptions or estimators can result in different asymptotic distributions for γ̂ under H∗1 (e.g., Chan, 1993, Hansen, 2000, Yang, Lee, and Chen, 2021). Instead of deriving the asymptotic distribution of γ̂, we focus on proving the asymptotic validity of the proposed tests for linear parametric restrictions; see Sections 4.2
and 4.3.
4.2 Bootstrap tests for linear restrictions
Consider the wild-bootstrap test for the linear-restriction hypothesis H0 described in Section 3.2, where H0 may or may not be distinct from H∗0. When γ0 is not identified, a key condition for the asymptotic validity of this test is the uniform convergence of {β̂(γ)}_{γ∈Γ} to β0, which has been established in Theorem 1.(ii). While we can establish the asymptotic validity of the wild-bootstrap test when H∗0 is true, it is generally not true that this test for H0 is asymptotically valid when H∗0 is false. Theorem 2 summarizes these results for the Wald and LM tests with any of the three transformations (22)-(24).

Theorem 2. If Assumption 1 holds, then the following are true: i) If H∗0 and H0 are both true, the bootstrap p-value pBn(H0) defined in (27) is asymptotically uniform on [0, 1]. ii) Under H1, pBn(H0) →p 0 as n → ∞ and B → ∞, whether H∗0 is true or false.
The proof of Theorem 2 is provided in Appendix A.2. An intuitive reason why Theorem 2 holds is as follows. Under H∗0, the conditional Wald test statistic Wn(γ) in (17) converges weakly to a chi-squared process over γ ∈ Γ; this is a direct implication of the uniform asymptotic normality of β̂(γ) established in Theorem 1.(i). In general, the asymptotic distributions of the test statistics supWn, aveWn, and expWn under H0 and H∗0 must be computed by simulation. Let vn(γ) = n^{−1/2} ∑_{t=1}^{n} ŝt(γ)ξt, with ξt an iid standard normal random variable; then vn(γ) converges weakly in probability to a mean zero Gaussian process with covariance kernel S(γ1, γ2) in the sense of Giné and Zinn (1990), where S(γ1, γ2) appears in (31). This implies convergence in probability of pBn(H0) to the true p-value under H0 when H∗0 is true.
4.3 Sequential tests for linear restrictions
In this section, consider testing H0 which is distinct from H∗0. In this case, we have that (R, q) ≠ (R∗, q∗), where (R, q) are defined in (11) and (R∗, q∗) are defined in (13). We establish the asymptotic validity of the sequential test proposed in Section 3.3.
The asymptotic validity of the pre-test for H∗0 follows as a special case of Theorem 2 by setting H0 to be identical to H∗0.

Corollary 3. If Assumption 1 holds, then the following are true: i) Under H∗0, the bootstrap p-value pBn(H∗0) defined in (27) is asymptotically uniform on [0, 1]. ii) Under H∗1, pBn(H∗0) →p 0 as n → ∞ and B → ∞.
When H∗0 is not rejected by the pre-test, our post-test uses only the bootstrap test (Case 1). This testing strategy is justified by two facts. First, Corollary 3.(ii) guarantees that the pre-test rejects H∗0 with probability approaching 1 when H∗0 is false; hence, the non-rejection of H∗0 during the pre-test provides overwhelming evidence for H∗0. Second, Theorem 2 guarantees that the post-test for H0 has the desired asymptotic properties when H∗0 is true.
Next, we elaborate on Case 2, where H∗0 is rejected in the pre-test. This case leaves two conflicting possibilities: (1) a correct rejection of H∗0 or (2) an incorrect rejection of H∗0 (i.e., the type-I error of the pre-test). We begin with the first possibility. When H∗1 is true, the bootstrap test for H0 is not correctly sized in general. However, the asymptotic normality of β̂ under H∗1 established in Theorem 1.(iii) can be used to our advantage; this motivates incorporating the asymptotic χ² test into the post-test when we have rejected H∗0. The asymptotic properties of the χ² test are summarized in Lemma 4, where pn,χ²(H0) is the associated p-value.
Lemma 4. If Assumption 1 holds, then the following are true: i) If both H1* and H0 are true, then p_{n,χ2}(H0) is asymptotically uniform on [0, 1]. ii) If H1 is true, then p_{n,χ2}(H0) →p 0 whether H0* is true or false.

The proof of Lemma 4 is provided in Appendix A.3. Lemma 4.(i) and Lemma 4.(ii) with H0* being false are almost direct implications of Theorem 1.(iii). Lemma 4.(ii) with H0* being true follows from Theorem 1.(i) and (ii).
Now consider the second possibility: a type-I error of the pre-test. In general, the asymptotic χ2 test is not correctly sized when H0* is true. Theorem 2, however, guarantees the asymptotic validity of the bootstrap test for H0 when H0* is true. The combination of Theorem 2.(i) and Lemma 4.(i) motivates the least favorable test, which bounds the rejection frequencies of the bootstrap and χ2 tests.
Theorem 5 establishes that the sequential testing procedure bounds the asymptotic type-I error rate at the nominal level under H0 and maintains consistency against the alternative H1. Let a1 denote the significance level of the pre-test and a2 denote the significance level of the post-test. Let p_n(H0) be the p-value associated with the post-test in the sequential procedure, and recall the decision rule that H0 is rejected when p_n(H0) < a2. We suppress the dependence of p_n(H0) on B, the number of bootstrap iterations, for convenience.

Theorem 5. If Assumption 1 holds, then the following are true: i) Under H0, lim_{B→∞} limsup_{n→∞} Pr{p_n(H0) < a2} ≤ a2. ii) Under H1, p_n(H0) →p 0 as n → ∞ and B → ∞.
Theorem 5 is proven in Appendix A.4. Note that Theorem 5.(i) does not guarantee that the nominal size a2 is exactly attained by the sequential test for H0; instead, it guarantees that a2 is not exceeded asymptotically.
The intuition behind Theorem 5 is as follows. Consider each scenario in the pre-test:

a) Correct failure to reject H0*: γ0 is not identified, and the bootstrap test, which assumes non-identification of γ0, is solely used in the post-test.

b) Incorrect rejection of H0* (type-I error): γ0 is not identified, and the least favorable test is used.

c) Correct rejection of H0*: γ0 is identified, and the least favorable test is used.

d) Incorrect failure to reject H0* (type-II error): γ0 is identified, but the bootstrap test is solely used.

Corollary 3 indicates that, asymptotically, (a) occurs with probability 1 − a1, (b) occurs with probability a1, (c) occurs with probability 1, and (d) occurs with probability 0. By Theorem 2, the post-test for H0 in scenario (a) is asymptotically valid. By Theorem 2 and Lemma 4, the post-test in scenarios (b) and (c) is asymptotically valid. Thus, the proposed sequential test is asymptotically valid whether H0* is true or false.
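The pre-test/post-test logic above can be condensed into a small decision function. This is an illustrative sketch, not the authors' code; `pretest_pvalue`, `boot_pvalue`, and `chi2_pvalue` are hypothetical inputs standing for the bootstrap pre-test p-value and the two candidate post-test p-values.

```python
def sequential_pvalue(pretest_pvalue, boot_pvalue, chi2_pvalue, a1=0.05):
    """Post-test p-value of the sequential procedure.

    If the pre-test fails to reject H0* (no threshold effects), only the
    bootstrap test is used.  If the pre-test rejects H0*, the least favorable
    test takes the larger of the bootstrap and asymptotic chi-square p-values,
    which bounds the rejection frequency under either identification scenario.
    """
    if pretest_pvalue >= a1:                  # Case 1: H0* not rejected
        return boot_pvalue
    return max(boot_pvalue, chi2_pvalue)      # Case 2: least favorable test

def reject_h0(pretest_pvalue, boot_pvalue, chi2_pvalue, a1=0.05, a2=0.05):
    """Decision rule: reject H0 when the sequential p-value falls below a2."""
    return sequential_pvalue(pretest_pvalue, boot_pvalue, chi2_pvalue, a1) < a2
```

Taking the maximum of the two p-values in Case 2 is exactly what keeps the asymptotic size at or below a2 regardless of which test is valid for the realized identification scenario.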
5 Monte Carlo simulation

To evaluate the finite-sample performance of the proposed methods, we conduct Monte Carlo simulations. The DGP is SE-CoTAR with p = 1:

yt = α10 + ϕ10 yt−1 + ϵt  if yt−d0 < µt−d0−1(c0),
yt = α20 + ϕ20 yt−1 + ϵt  if yt−d0 ≥ µt−d0−1(c0),    (32)

where α10 = α20 = 0, d0 = 1, c0 = 0.5, and ϵt ~ i.i.d. N(0, 1). The conditional threshold µt(c0) takes the m0 c0-th smallest value (i.e., approximately the median) of {yt, yt−1, . . . , yt−m0+1}. The memory size m0 ∈ {6, 18} is assumed to be known. The AR(1) parameters are set to ϕ10 = 0.2 and ϕ20 ∈ {0.2, 0.8}. When ϕ20 = 0.2, threshold effects are absent and (32) reduces to the one-regime AR(1) process yt = 0.2 yt−1 + ϵt for all t ∈ {1, . . . , n}. When ϕ20 = 0.8, threshold effects are present and (32) does not degenerate.
The sample size is set to n ∈ {100, 500, 1000}, which resembles typical empirical applications in economics and finance. The case with (n, m0) = (100, 6), for example, can be thought of as quarterly data with a sample period of 25 years and a memory size of one and a half years. The case with (n, m0) = (500, 18) can be thought of as business-daily data with a sample period of approximately 2 years and a memory size of slightly less than 1 month.
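The DGP in (32) can be simulated with a short script. This is a sketch under the stated parameterization (zero intercepts, d0 = 1; with m0 = 6 and c0 = 0.5 the threshold is the 3rd-smallest of the last 6 observations); the function and variable names are our own.

```python
import numpy as np

def simulate_se_cotar(n, phi1=0.2, phi2=0.8, d=1, m=6, c=0.5,
                      burn=200, seed=0):
    """Simulate the SE-CoTAR(1) DGP of (32) with zero intercepts.

    The conditional threshold mu_{t-d-1}(c) is the (m*c)-th smallest value
    of the local memory {y_{t-d-1}, ..., y_{t-d-m}}.
    """
    rng = np.random.default_rng(seed)
    k = int(round(m * c))                 # order statistic: m*c-th smallest
    total = n + burn + m + d
    y = np.zeros(total)
    for t in range(m + d, total):
        window = y[t - d - m:t - d]       # local memory of size m
        mu = np.sort(window)[k - 1]       # conditional threshold
        phi = phi1 if y[t - d] < mu else phi2
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y[-n:]                         # drop the burn-in

y = simulate_se_cotar(500)
```

The burn-in discards the influence of the zero initial values, so the retained sample of length n is approximately drawn from the stationary distribution of the process.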
Given this set-up, we inspect the performance of the profiling estimation in Section 5.1, the bootstrap tests for the no-threshold-effect hypothesis in Section 5.2, and the sequential tests for the individual zero restriction of the regression parameters in Section 5.3.
5.1 Profiling estimation

The SE-CoTAR model with p = 1 is specified as

yt = α1 + ϕ1 yt−1 + ut  if yt−d < µt−d−1(c),
yt = α2 + ϕ2 yt−1 + ut  if yt−d ≥ µt−d−1(c).    (33)

The space of the delay parameter d is D = {1, 2}. The space of the percentile parameter c is given by (6) so that each regime accounts for at least 15% of the entire sample. We fit (33) to each of J = 1000 Monte Carlo samples generated from (32), and estimate the regression parameters β = (α1, ϕ1, α2, ϕ2)⊤ and the nuisance parameters γ = (d, c)⊤ via profiling.
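Profiling over the nuisance parameters amounts to a grid search: for each candidate (d, c), split the sample by the regime indicator, run OLS in each regime, and keep the pair with the smallest total sum of squared residuals. The sketch below illustrates this idea for model (33); it is not the authors' implementation, and the function and variable names are our own.

```python
import numpy as np
from itertools import product

def profile_se_cotar(y, delays=(1, 2), m=6, min_share=0.15):
    """Profiling estimator for the SE-CoTAR(1) model (33) via grid search."""
    y = np.asarray(y)
    n = len(y)
    percentiles = [k / m for k in range(1, m + 1)]     # c in {1/m, ..., 1}
    best = None
    for d, c in product(delays, percentiles):
        k = int(round(m * c))
        start = m + d                                   # need a full window
        # conditional threshold mu_{t-d-1}(c) for t = start, ..., n-1
        mu = np.array([np.sort(y[t - d - m:t - d])[k - 1]
                       for t in range(start, n)])
        lower = y[start - d:n - d] < mu                 # regime-1 indicator
        ssr, betas = 0.0, []
        for mask in (lower, ~lower):
            if mask.mean() < min_share:                 # enforce regime share
                ssr = np.inf
                break
            X = np.column_stack([np.ones(mask.sum()),
                                 y[start - 1:n - 1][mask]])
            yy = y[start:][mask]
            b, *_ = np.linalg.lstsq(X, yy, rcond=None)  # per-regime OLS
            ssr += np.sum((yy - X @ b) ** 2)
            betas.append(b)
        if best is None or ssr < best[0]:
            best = (ssr, d, c, betas)
    return best  # (SSR, d_hat, c_hat, [beta_regime1, beta_regime2])
```

Because β enters linearly given (d, c), the inner step is plain OLS, and the grid over (d, c) is small, so the whole profiling search is cheap.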
We report the bias, standard deviation, and root mean squared error (RMSE) for each element of θ = (β⊤, γ⊤)⊤. The results under ϕ20 ∈ {0.2, 0.8} are summarized in Tables 1 and 2, respectively. In Table 1, threshold effects do not exist since ϕ10 = ϕ20 = 0.2. In this case, γ0 is not identified and γ̂ is inconsistent. The simulation results in Table 1 are in line with this fact. Focus on the percentile parameter c with the memory size m0 = 6, for example. For n ∈ {100, 500, 1000}, the bias is {0.077, 0.084, 0.090} and the RMSE is {0.260, 0.259, 0.275}, respectively.
Table 1: Simulation results on the profiling estimation (γ0 is not identified)
m0 = 6
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.018 0.381 0.382 −0.003 0.142 0.142 −0.003 0.110 0.110
ϕ1 −0.036 0.305 0.308 −0.008 0.121 0.121 −0.007 0.088 0.088
α2 0.017 0.337 0.338 0.005 0.149 0.149 0.000 0.108 0.108
ϕ2 −0.033 0.286 0.288 −0.011 0.123 0.123 0.000 0.089 0.089
d 0.595 0.491 0.771 0.631 0.483 0.794 0.621 0.485 0.788
c 0.077 0.248 0.260 0.084 0.245 0.259 0.090 0.260 0.275
m0 = 18
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.043 0.503 0.505 −0.016 0.194 0.195 −0.006 0.145 0.145
ϕ1 −0.056 0.366 0.370 −0.017 0.149 0.150 −0.008 0.108 0.108
α2 0.070 0.506 0.510 0.001 0.190 0.190 0.003 0.138 0.138
ϕ2 −0.057 0.386 0.390 −0.006 0.148 0.148 −0.005 0.103 0.103
d 0.619 0.486 0.787 0.638 0.481 0.799 0.625 0.484 0.791
c 0.022 0.243 0.244 0.028 0.252 0.253 0.021 0.253 0.254
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = ϕ20 = 0.2, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Since ϕ10 = ϕ20, γ0 is not identified. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimator for (α1, ϕ1, α2, ϕ2, d, c) is computed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. The bias, standard deviation, and RMSE across J = 1000 Monte Carlo samples are reported.
The results in Table 1 are also in line with the fact that β̂ →p β0; recall Theorem 1. Focus on ϕ1 with m0 = 6, for example. For each n ∈ {100, 500, 1000}, the bias is {−0.036, −0.008, −0.007} and the RMSE is {0.308, 0.121, 0.088}. The consistency of β̂ is also observed for m0 = 18, but the standard deviation increases for each n. For ϕ1 with m0 = 18, the RMSE is {0.370, 0.150, 0.108}. This result suggests that a larger value of m0 has an adverse effect on the small-sample performance of the point estimation of β, probably due to the larger choice set of c.
In Table 2, threshold effects exist since ϕ10 ≠ ϕ20. In this case, γ0 is identified and γ̂ →p γ0; recall Theorem 1.(iii). The simulation results in Table 2 are consistent with this fact. Focus again on c with m0 = 6, for example. For each n, the bias is
Table 2: Simulation results on the profiling estimation (γ0 is identified)
m0 = 6
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 −0.005 0.240 0.240 −0.003 0.076 0.076 −0.002 0.054 0.054
ϕ1 −0.029 0.217 0.219 −0.007 0.071 0.072 −0.005 0.049 0.050
α2 0.085 0.361 0.371 0.014 0.090 0.092 0.008 0.063 0.063
ϕ2 −0.056 0.193 0.201 −0.012 0.056 0.057 −0.005 0.039 0.040
d 0.212 0.409 0.460 0.002 0.045 0.045 0.000 0.000 0.000
c 0.052 0.209 0.215 0.001 0.044 0.044 −0.000 0.016 0.016
m0 = 18
n = 100 n = 500 n = 1000
Bias Stdev RMSE Bias Stdev RMSE Bias Stdev RMSE
α1 0.009 0.368 0.368 −0.008 0.088 0.089 −0.004 0.057 0.057
ϕ1 −0.041 0.312 0.315 −0.019 0.091 0.093 −0.007 0.059 0.060
α2 0.219 0.540 0.583 0.026 0.126 0.129 0.015 0.083 0.085
ϕ2 −0.122 0.252 0.280 −0.018 0.068 0.070 −0.010 0.046 0.047
d 0.332 0.471 0.576 0.007 0.083 0.084 0.000 0.000 0.000
c 0.021 0.207 0.208 −0.003 0.071 0.071 −0.000 0.034 0.034
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 = 0.8, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Since ϕ10 ≠ ϕ20, γ0 is identified. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimator for (α1, ϕ1, α2, ϕ2, d, c) is computed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. The bias, standard deviation, and RMSE across J = 1000 Monte Carlo samples are reported.
{0.052, 0.001, −0.000} and the RMSE is {0.215, 0.044, 0.016}. The results in Table 2 are also consistent with the fact that β̂ →p β0. In summary, the simulation results indicate that the profiling estimation performs well in finite samples.
5.2 Testing the no-threshold-effect hypothesis

Consider testing the no-threshold-effect hypothesis H0*: (α10, ϕ10) = (α20, ϕ20) against the alternative hypothesis H1*: (α10, ϕ10) ≠ (α20, ϕ20). Since γ0 is not identified under H0*, we implement the bootstrap tests described in Section 3.2. For comparison, the sup-, ave-, and exp-Wald tests as well as their LM counterparts are all implemented.
The rejection frequencies of each test are computed, where the nominal size is a = 0.05 and the number of bootstrap samples is B = 500. Since α10 = α20 = 0 by assumption, the rejection frequencies correspond to empirical size when ϕ10 = ϕ20 and empirical power when ϕ10 ≠ ϕ20.
In Table 3, the rejection frequencies of each test are reported. The LM tests
achieve sharp empirical size even for the smallest sample size of n = 100. Taking the
ave-LM test with m0 = 6 as an example, the empirical size is {0.037, 0.038, 0.049} for
n ∈ {100, 500, 1000}, respectively. These results are consistent with Corollary 3.(i).
The sup-LM and exp-LM tests are slightly more conservative than the ave-LM test.
The empirical size of each LM test is almost unchanged when the memory size m0 changes from 6 to 18, indicating that the tests are robust to the choice of m0.
The Wald tests are over-sized when n = 100 but correctly sized when n ≥ 500. Taking the ave-Wald test as an example, the empirical size is {0.126, 0.048, 0.054} for m0 = 6. When n = 100, the sup-Wald and exp-Wald tests exhibit even worse size distortions than the ave-Wald test. Hence, we advise using the LM tests, especially the ave-LM test, to control the empirical size in small samples.
The empirical power of each test approaches 1 as n → ∞, confirming Corollary 3.(ii). The empirical power of the ave-LM test is {0.504, 1.000, 1.000} for m0 = 6 and {0.419, 0.999, 1.000} for m0 = 18. The sup-LM and exp-LM tests are less powerful than the ave-LM test when n = 100, reflecting the fact that they are conservative under H0*. Their empirical power, however, reaches 1 when n = 500. In summary, the ave-LM test performs remarkably well under both H0* and H1*.
5.3 Sequential tests for the individual zero restriction

Consider testing the zero restriction of an arbitrary element of β = (α1, ϕ1, α2, ϕ2)⊤. We perform the sequential test developed in Section 3.3. The no-threshold-effect hypothesis H0* is pre-tested by the bootstrap ave-LM test at the 5% level; the ave-LM statistic is selected since it has the best finite-sample performance in Section 5.2. The individual zero restriction is post-tested at the 5% level as follows: (i) if H0* is not rejected by the pre-test, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the asymptotic χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. The Wald statistic and the sup-LM statistic are selected since they have the best finite-sample performance in extra simulations not reported here; all test statistics are carefully compared in the supplemental material.
Table 3: Rejection frequencies of the bootstrap tests for the no-threshold-effect hypothesis H0*
ϕ10 = 0.2 and ϕ20 = 0.2 (empirical size)
m0 = 6 m0 = 18
Statistic n = 100 n = 500 n = 1000 n = 100 n = 500 n = 1000
sup-Wald 0.188 0.045 0.062 0.238 0.073 0.067
ave-Wald 0.126 0.048 0.054 0.138 0.062 0.055
exp-Wald 0.179 0.046 0.063 0.205 0.069 0.063
sup-LM 0.028 0.027 0.051 0.020 0.044 0.049
ave-LM 0.037 0.038 0.049 0.040 0.045 0.049
exp-LM 0.028 0.031 0.049 0.024 0.034 0.050
ϕ10 = 0.2 and ϕ20 = 0.8 (empirical power)
m0 = 6 m0 = 18
Statistic n = 100 n = 500 n = 1000 n = 100 n = 500 n = 1000
sup-Wald 0.730 1.000 1.000 0.642 0.999 1.000
ave-Wald 0.715 1.000 1.000 0.631 0.999 1.000
exp-Wald 0.741 1.000 1.000 0.655 0.999 1.000
sup-LM 0.325 1.000 1.000 0.215 0.999 1.000
ave-LM 0.504 1.000 1.000 0.419 0.999 1.000
exp-LM 0.389 1.000 1.000 0.292 0.999 1.000
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 ∈ {0.2, 0.8}, d0 = 1, m0 ∈ {6, 18}, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the m0 c0-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The profiling estimation is executed with the choice sets d ∈ {1, 2} and c ∈ {1/m0, . . . , 1}. We report the rejection frequencies of the wild-bootstrap Wald and LM tests for H0*: (α1, ϕ1) = (α2, ϕ2), where the nominal size is a = 0.05.
In Table 4, we report rejection frequencies of the sequential test across J = 1000 Monte Carlo samples. The rejection frequencies are interpreted as empirical size for (α1, α2) and empirical power for (ϕ1, ϕ2), since α10 = α20 = 0 while ϕ10 ≠ 0 and ϕ20 ≠ 0. As in Section 5.2, the number of bootstrap samples is B = 500. In this section, the memory size is fixed at m0 = 6 for brevity; qualitatively similar results obtain when m0 = 18.
In view of Table 4, the empirical size of the sequential test is well controlled in all cases considered, confirming Theorem 5.(i). When H0* is true with ϕ20 = 0.2, the empirical size with respect to α1 is {0.027, 0.038, 0.046} for n ∈ {100, 500, 1000},
Table 4: Rejection frequencies of the sequential tests for the individual zero restriction
ϕ20 = 0.2 (H0* is true)    ϕ20 = 0.8 (H0* is false)
Size Power Size Power
n α1 α2 ϕ1 ϕ2 α1 α2 ϕ1 ϕ2
100 0.027 0.034 0.168 0.151 0.088 0.025 0.591 0.988
500 0.038 0.034 0.896 0.901 0.045 0.041 0.778 1.000
1000 0.046 0.054 0.986 0.987 0.044 0.063 0.972 1.000
DGP: yt = ϕ10 yt−1 + ϵt if yt−d0 < µt−d0−1(c0) and yt = ϕ20 yt−1 + ϵt if yt−d0 ≥ µt−d0−1(c0), where ϕ10 = 0.2, ϕ20 ∈ {0.2, 0.8}, d0 = 1, m0 = 6, c0 = 0.5, ϵt ~ i.i.d. N(0, 1), and µt(c0) is the (m0 c0)-th smallest value of {yt, yt−1, . . . , yt−m0+1}. Model: yt = α1 + ϕ1 yt−1 + ut if yt−d < µt−d−1(c) and yt = α2 + ϕ2 yt−1 + ut if yt−d ≥ µt−d−1(c). The no-threshold-effect hypothesis, H0*: (α1, ϕ1) = (α2, ϕ2), is pre-tested by the bootstrap ave-LM test at the 5% level. The zero restriction of each regression parameter is post-tested at the 5% level as follows: (i) if H0* is not rejected by the pre-test, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. This table reports the rejection frequencies across J = 1000 Monte Carlo samples, which are interpreted as empirical size for (α1, α2) and empirical power for (ϕ1, ϕ2).
respectively. When H0* is false with ϕ20 = 0.8, the empirical size with respect to α1 is {0.088, 0.045, 0.044}. Similar results are observed for α2.
Another implication of Table 4 is that the empirical power of the sequential test is sufficiently high, which confirms Theorem 5.(ii). When H0* is true, the empirical power with respect to ϕ1 is {0.168, 0.896, 0.986} for n ∈ {100, 500, 1000}. When H0* is false, the empirical power is {0.591, 0.778, 0.972}. Similar results are observed for ϕ2. In summary, the sequential test of the individual zero restriction achieves sharp size and high power whether threshold effects are present or absent.
6 Empirical application

Modelling and predicting the spread of the novel coronavirus is one of the most urgent research topics in modern global society. There is a rapidly growing literature in which time series methods are employed to model and predict COVID-19 data (see, e.g., Chimmula and Zhang, 2020; Zeroual, Harrou, Dairi, and Sun, 2020). In particular, Aidoo, Ampofo, Awashie, Appiah, and Adebanji (2021) fitted STAR models to daily new confirmed COVID-19 cases in the African sub-region, detecting
nonlinear effects. Indeed, the number of new confirmed cases is likely an explosive process, since individuals are prompted to get tested in response to an emerging pandemic. Such a built-in acceleration mechanism could produce time-varying threshold effects, motivating the use of SE-CoTAR. In Section 6.1, we describe our data and perform some preliminary analysis. In Section 6.2, we perform the main analysis using SE-CoTAR.
6.1 Data and preliminary analysis
The target series is daily new confirmed COVID-19 cases per million people in the
U.S. and Japan. The data are publicly available at Our World in Data (OWID).
As is well known, the raw data (new_cases_per_million) have strong weekday
effects (e.g., the number of new confirmed cases tends to be smaller on weekends
and holidays due to the smaller number of tests). A seasonally adjusted version
(new_cases_smoothed_per_million) is also available at OWID, in which the week-
day effects are smoothed out. We analyze the seasonally adjusted version, denoted by {wt}_{t=1}^{n}, from April 4, 2020 through June 23, 2021 (n = 446 days). The start date
of the sample period roughly matches the time when the first wave of the pandemic
began in the U.S. and Japan.
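The preprocessing just described can be sketched with pandas. The column names `location`, `date`, and `new_cases_smoothed_per_million` follow the OWID dataset cited above; the exact file name and any further filtering are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def load_log_diff(df, country):
    """Extract the seasonally adjusted series w_t for one country and
    compute the log series and the log-difference y_t = ln w_t - ln w_{t-1}."""
    s = (df.loc[df["location"] == country]
           .set_index("date")["new_cases_smoothed_per_million"]
           .loc["2020-04-04":"2021-06-23"])        # sample period, n = 446
    log_w = np.log(s)
    return log_w, log_w.diff().dropna()

# Usage (the file name is an assumption; OWID distributes the data as CSV):
# df = pd.read_csv("owid-covid-data.csv")
# log_w_us, y_us = load_log_diff(df, "United States")
```

Slicing by the ISO date strings works because the date index is sorted, and `diff().dropna()` yields the log-differenced series analyzed below.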
In Figure 1, the log series ln wt and the log-differenced series yt = ∆ ln wt = ln wt − ln wt−1 are plotted. The log series move smoothly due to the seasonal adjustment, and we clearly observe several waves of the pandemic. The number of new cases in the U.S. has been decreasing dramatically since May 2021, while the log series of Japan seems to have a moderate upward trend throughout the sample period. The count for the U.S., however, has been far larger than that of Japan. Indeed, the number of new cases per million people on June 23, 2021 (i.e., the end date of the sample) is wn = 34.14 for the U.S. and wn = 11.40 for Japan, almost a three-fold difference. The log-difference of the new confirmed cases, yt, exhibits rather complex fluctuations combining persistent swings and temporary noise, which suggests the presence of nonlinear effects.
Figure 1: The number of daily new confirmed COVID-19 cases per million people

[Four panels: a) ln w, United States; b) ∆ ln w, United States; c) ln w, Japan; d) ∆ ln w, Japan.]

wt = the smoothed number of new confirmed COVID-19 cases per million people on day t. This figure plots ln wt and ∆ ln wt = ln wt − ln wt−1 for the U.S. and Japan. Sample period: April 4, 2020 – June 23, 2021.
6.2 Main analysis and discussions

The SE-CoTAR model with p = 3 and m = 14 is fitted to the daily change in the number of new confirmed COVID-19 cases:

yt = α1 + Σ_{k=1}^{3} ϕ1k yt−k + ut  if yt−d < µt−d−1(c),
yt = α2 + Σ_{k=1}^{3} ϕ2k yt−k + ut  if yt−d ≥ µt−d−1(c).

Regime 1 represents a deceleration phase where the change in new confirmed cases is small relative to the local memory of size m = 14 days (i.e., 2 weeks). Regime 2
represents an acceleration phase where the change is relatively large. The space of the delay parameter d is D = {1, . . . , 14}. The choice of 14 for both m and the maximum delay accords with the common perception that the present status of infection is an outcome of people's activities approximately 2 weeks earlier. The space of the percentile parameter c is given by (6) so that each regime accounts for at least 15% of the entire sample. The AR lag length p = 3 is subjectively selected to balance model fit and parsimony; a data-driven selection of p is left as a future task.
Let βr = (αr, ϕr1, ϕr2, ϕr3)⊤ be the vector of regression parameters in regime r ∈ {1, 2}, and let γ = (d, c)⊤ be the vector of nuisance parameters. We conduct profiling to estimate β = (β1⊤, β2⊤)⊤ and γ. Our primary interest lies in testing the no-threshold-effect hypothesis H0*: β1 = β2. To this end, the wild-bootstrap ave-LM test with B = 5000 iterations is implemented; the ave-LM test statistic is chosen since it has the best finite-sample performance in Section 5.2.
Taking the ave-LM test of H0* as the pre-test, the individual zero restriction of each element of β is post-tested as follows: (i) if H0* is not rejected by the pre-test at the 5% level, then the bootstrap sup-LM test is performed; (ii) if H0* is rejected, then both the Wald test based on the asymptotic χ2 distribution and the bootstrap sup-LM test are performed, and the larger of the two p-values is taken. The Wald statistic and the sup-LM statistic are chosen since they perform best in small samples (Section 5.3).
Further, we compute several key quantities that help us contrast the deceleration and acceleration regimes. First, the estimated share of regime r is δ̂r(ĉ) = n^{-1} Σ_{t=1}^{n} Irt(ĉ), where ĉ is the profiling estimator of c and Irt(c) is defined in (3). Second, the empirical transition probability from regime r′ to regime r is computed as

δ̂rr′(ĉ) = Σ_{t=1}^{n} Irt(ĉ) Ir′,t−1(ĉ) / Σ_{t=1}^{n} Ir′,t−1(ĉ),  r, r′ ∈ {1, 2}.
Third, the average duration of regime r, denoted D̂r(ĉ), is computed for each r ∈ {1, 2}.

In Figure 2, the estimated conditional threshold µt(ĉ) is plotted for each country.
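Given a fitted regime-indicator series, the three summary quantities above can be computed directly. The sketch below is our own illustration; the input `regime` is a 0/1 array marking the regime of each observation.

```python
import numpy as np

def regime_summaries(regime):
    """Regime shares, empirical transition probabilities, and average
    durations from a 0/1 regime-indicator series."""
    regime = np.asarray(regime)
    # Share of each regime: fraction of observations falling in it
    shares = np.array([(regime == r).mean() for r in (0, 1)])
    # Transition probabilities P(regime_t = r | regime_{t-1} = r')
    trans = np.zeros((2, 2))
    for r_prev in (0, 1):
        nxt = regime[1:][regime[:-1] == r_prev]
        for r in (0, 1):
            trans[r_prev, r] = np.mean(nxt == r)
    # Average duration of each regime: mean length of its consecutive runs
    durations = []
    for r in (0, 1):
        runs, count = [], 0
        for v in regime:
            if v == r:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:
            runs.append(count)
        durations.append(float(np.mean(runs)) if runs else 0.0)
    return shares, trans, np.array(durations)
```

The average duration is tied to the transition probabilities: under a first-order Markov approximation, the expected duration of regime r is roughly 1 / (1 − the probability of staying in regime r).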
We observe that µt(ĉ) traces the persistent swings of yt strikingly well, which highlights the prominent feature of SE-CoTAR. A key question is whether the persistence structure of yt is homogeneous or heterogeneous across the regimes. This question can be addressed by testing the no-threshold-effect hypothesis H0*. As reported in Table 5, the bootstrap p-value of the ave-LM test for H0* is 0.049 for the U.S. and 0.002 for Japan. Hence, H0* is rejected marginally at the 5% level for the U.S., and at any conventional level for Japan. We therefore conclude that conditional threshold effects are present for both countries.
Figure 2: Estimated conditional threshold given the SE-CoTAR model

[Two panels: a) United States; b) Japan.]

The target series, the log-difference of the daily new confirmed COVID-19 cases, is plotted as the blue, noisier line. The estimated conditional threshold based on SE-CoTAR, {µt(ĉ)}, is plotted as the red, smoother line. Sample period: April 4, 2020 – June 23, 2021.
A number of other interesting implications can be drawn from Table 5. In view of the profiling estimates of the AR parameters and their least favorable p-values, the persistence structure differs considerably across regimes and countries. At the 5% level, (ϕ11, ϕ12) and (ϕ22, ϕ23) are significantly positive in the U.S., while (ϕ11, ϕ12) and ϕ21 are significantly positive in Japan. The estimated delay parameter d̂ is 7 days for the U.S. and 10 days for Japan, confirming the existence of a 1- to 2-week delay. The estimated percentile parameter ĉ is 0.500 for the U.S. and 0.643 for Japan, indicating that the thresholds are located around the median of the 14-day local memory.
The estimated share of the deceleration regime, δ̂1(ĉ), is 0.484 for the U.S. and 0.574 for Japan. This contrast is consistent with the fact that the number of new confirmed cases is larger in the U.S. than in Japan. In view of the empirical transition probabilities, the U.S. and Japan have similar persistence structures in the acceleration regime; the probability of switching from regime 2 to regime 1, δ̂12(ĉ), is 0.342 for the U.S. and 0.355 for Japan. The persistence structures in the deceleration regime, however, differ across the two countries; the probability of switching from regime 1
Table 5: Empirical results of the SE-CoTAR model on the COVID-19 cases
United States
α1 ϕ11 ϕ12 ϕ13 α2 ϕ21 ϕ22 ϕ23
0.004 0.266 0.439 0.026 −0.006 0.127 0.508 0.281
(0.014) (0.006) (0.000) (0.771) (0.045) (0.293) (0.000) (0.005)
δ1(c) δ2(c) δ11(c) δ21(c) δ12(c) δ22(c) D1(c) D2(c)
0.484 0.516 0.636 0.364 0.342 0.658 2.750 2.896
d c p(H∗0 ) - - - - -
7 0.500 0.049 - - - - -
Japan
α1 ϕ11 ϕ12 ϕ13 α2 ϕ21 ϕ22 ϕ23
−0.004 0.349 0.299 0.227 0.010 0.420 0.112 0.049
(0.180) (0.000) (0.017) (0.343) (0.129) (0.000) (0.205) (0.604)
δ1(c) δ2(c) δ11(c) δ21(c) δ12(c) δ22(c) D1(c) D2(c)
0.574 0.426 0.734 0.266 0.355 0.645 3.758 2.788
d c p(H∗0 ) - - - - -
10 0.643 0.002 - - - - -
The SE-CoTAR model with p = 3 and m = 14 is fitted to the log-difference of the daily new confirmed COVID-19 cases per million people in the U.S. and Japan. Regime 1: yt−d < µt−d−1(c). Regime 2: yt−d ≥ µt−d−1(c). Sample period: April 4, 2020 – June 23, 2021 (n = 446). The spaces of the nuisance parameters are d ∈ {1, . . . , 14} and c ∈ {1/m, . . . , 1}. This table reports the profiling estimates of the regression parameters with their least favorable p-values in parentheses; the share of regime r, δ̂r(ĉ); the empirical transition probability from regime r′ to regime r, δ̂rr′(ĉ); the average duration of regime r, D̂r(ĉ); the profiling estimates of the nuisance parameters; and the wild-bootstrap p-value for the no-threshold-effect hypothesis H0*.
to regime 2, δ̂21(ĉ), is 0.364 for the U.S. and 0.266 for Japan. These results suggest that the U.S. has a stronger tendency than Japan to switch from the deceleration regime to the acceleration regime. Indeed, the average duration of the deceleration regime, D̂1(ĉ), is 2.750 days for the U.S. and 3.758 days for Japan. On average, the deceleration regime lasts one day less in the U.S. than in Japan.
These empirical results bring new insight into why the pandemic is more serious in the U.S. than in Japan. In the acceleration regime, the two countries are homogeneous in the time series sense, perhaps because it is hard or even impossible to control the pandemic once it is accelerating. What makes the difference between the U.S. and Japan is the duration of the deceleration regime. Given these empirical findings, a possible policy implication requiring further analysis is that the U.S. could combat the pandemic more efficiently by more strongly encouraging safety measures when the pandemic seems to be slowing down (e.g., staying at home or wearing a mask for just another day when the danger seems past).
7 Conclusion

We have proposed the conditional threshold autoregression (CoTAR), a novel time series model where the threshold is specified as an empirical quantile of the local memory of a threshold variable x. The resulting conditional threshold traces the fluctuation of x, which can enhance the fit and interpretation of the model. The parameters of CoTAR consist of (β1, β2, γ), where βr is the vector of regression parameters in regime r ∈ {1, 2} and γ is the vector of nuisance parameters. All parameters can be estimated via profiling, and the asymptotic properties of the profiling estimator depend on whether γ is identifiable. A key insight is that γ is unidentified if and only if there are no threshold effects (i.e., H0*: β1 = β2).
To test H0*, we have proposed wild-bootstrap tests which incorporate all possible values of γ. Using the bootstrap test for H0* as a pre-test, any linear constraint on the regression parameters, such as the individual zero restriction, can be tested by the proposed sequential test. The construction of the pre-test is inspired by Hansen (1996), and the construction of the post-test is inspired by the identification category selection procedure of Andrews and Cheng (2012). We have proven that both the pre-test and the post-test are asymptotically valid. Furthermore, we have shown via Monte Carlo simulation that both tests achieve sharp size and high power in finite samples.
We have analyzed the daily new confirmed COVID-19 cases per million people in the U.S. and Japan by fitting the self-exciting CoTAR model. Significant conditional threshold effects have been detected for both countries, demonstrating the practical usefulness of the CoTAR model. The implied persistence structures are consistent with the fact that the number of new confirmed cases in the U.S. is larger than in Japan. In particular, the deceleration regime in the U.S. is approximately one day shorter on average than in Japan. This empirical result suggests that a potentially effective measure for the U.S. to combat the pandemic would be to more strongly encourage safety measures when the pandemic is decelerating.
Acknowledgements
We thank Yasumasa Matsuda and participants at the 3rd Hosoya Prize Lecture and
the 6th Annual International Conference on Applied Econometrics in Hawaii for helpful comments and discussions. The first author, Kaiji Motegi, is grateful for the
financial support of Ishii Memorial Securities Research Promotion Foundation and
the Organization for Advanced and Integrated Research (OAIR), Kobe University.
The third author, Shigeyuki Hamori, is grateful for the financial support of JSPS
KAKENHI Grant Number (A) 17H00983 and OAIR.
References
Aidoo, E. N., R. T. Ampofo, G. E. Awashie, S. K. Appiah, and A. O. Ade-banji (2021): “Modelling COVID-19 incidence in the African sub-region usingsmooth transition autoregressive model,” Modeling Earth Systems and Environ-ment, https://doi.org/10.1007/s40808-021-01136-1.
Andrews, D. W. K. (1993): “Tests for Parameter Instability and Structural Changewith Unknown Change Point,” Econometrica, 61, 821–856.
Andrews, D. W. K., and X. Cheng (2012): “Estimation and Inference withWeak, Semi-Strong, and Strong Identification,” Econometrica, 80(5), 2153–2211.
(2013): “Maximum likelihood estimation and uniform inference with spo-radic identification failure,” Journal of Econometrics, 173, 36–56.
Andrews, D. W. K., and W. Ploberger (1994): “Optimal tests when a nuisanceparameter is present only under the alternative,” Econometrica, 62(6), 1383–1414.
Balke, N. S., and T. B. Fomby (1997): “Threshold Cointegration,” InternationalEconomic Review, 38, 627–645.
Bessec, M. (2003): “The asymmetric exchange rate dynamics in the EMS: a time-varying threshold test,” European Review of Economics and Finance, 2, 3–40.
Bradley, R. C. (2005): “Basic properties of strong mixing conditions: A surveyand some open questions,” Probability Surveys, 2, 107–144.
Chan, K. S. (1993): “Consistency and limiting distribution of the least squaresestimator of a threshold autoregressive model,” The Annals of Statistics, 21, 520–533.
Chan, K. S., and H. Tong (1985): “On the use of the deterministic Lyapunovfunction for the ergodicity of stochastic difference equations,” Advances in AppliedProbability, 17, 666–678.
Chan, K. S., and R. S. Tsay (1998): “Limiting properties of the least squares estimator of a continuous threshold autoregressive model,” Biometrika, 85, 413–426.

Chen, C. W. S., M. K. P. So, and F.-C. Liu (2011): “A review of threshold time series models in finance,” Statistics and Its Interface, 4, 167–181.

Chen, R., and R. S. Tsay (1991): “On the ergodicity of TAR(1) processes,” The Annals of Applied Probability, 1, 613–634.

Chimmula, V. K. R., and L. Zhang (2020): “Time series forecasting of COVID-19 transmission in Canada using LSTM networks,” Chaos, Solitons and Fractals, 135, #109864.

Corsi, F. (2009): “A Simple Approximate Long-Memory Model of Realized Volatility,” Journal of Financial Econometrics, 7, 174–196.

Davies, R. B. (1977): “Hypothesis testing when a nuisance parameter is present only under the alternative,” Biometrika, 64(2), 247–254.

——— (1987): “Hypothesis testing when a nuisance parameter is present only under the alternative,” Biometrika, 74(1), 33–43.

Dueker, M., M. T. Owyang, and M. Sola (2010): “A Time-Varying Threshold STAR Model of Unemployment and the Natural Rate,” Working Paper 2010-029A, Federal Reserve Bank of St. Louis.

Dueker, M. J., Z. Psaradakis, M. Sola, and F. Spagnolo (2013): “State-Dependent Threshold Smooth Transition Autoregressive Models,” Oxford Bulletin of Economics and Statistics, 75, 835–854.

Elliott, G., U. K. Muller, and M. W. Watson (2015): “Nearly optimal tests when a nuisance parameter is present under the null hypothesis,” Econometrica, 83, 771–811.

Gine, E., and J. Zinn (1990): “Bootstrapping general empirical measures,” The Annals of Probability, 18, 851–869.

Gonzalo, J., and M. Wolf (2005): “Subsampling inference in threshold autoregressive models,” Journal of Econometrics, 127, 201–224.

Granger, C. W. J., and T. Terasvirta (1993): Modelling Nonlinear Economic Relationships. Oxford University Press.

Hansen, B. E. (1996): “Inference when a nuisance parameter is not identified under the null hypothesis,” Econometrica, 64(2), 413–430.

——— (2000): “Sample splitting and threshold estimation,” Econometrica, 68, 575–603.
——— (2011): “Threshold autoregression in economics,” Statistics and Its Interface, 4, 123–127.

——— (2017): “Regression Kink With an Unknown Threshold,” Journal of Business & Economic Statistics, 35, 228–240.

Hill, J. B. (2021): “Weak-identification robust wild bootstrap applied to a consistent model specification test,” Econometric Theory, 37, 409–463.

Liu, J., and E. Susko (1992): “On Strict Stationarity and Ergodicity of a Non-Linear ARMA Model,” Journal of Applied Probability, 29, 363–373.

McCloskey, A. (2017): “Bonferroni-based size-correction for nonstandard testing problems,” Journal of Econometrics, 200, 17–35.

Motegi, K., X. Cai, S. Hamori, and H. Xu (2020): “Moving average threshold heterogeneous autoregressive (MAT-HAR) models,” Journal of Forecasting, 39, 1035–1042.

Seo, M. H., and O. Linton (2007): “A smoothed least squares estimator for threshold regression models,” Journal of Econometrics, 141, 704–735.

Stinchcombe, M. B., and H. White (1998): “Consistent specification testing with nuisance parameters present only under the alternative,” Econometric Theory, 14, 295–325.

Tong, H. (1978): “On a threshold model,” in Pattern Recognition and Signal Processing, ed. by C. H. Chen. Sijthoff and Noordhoff, Amsterdam.

——— (2011): “Threshold models in time series analysis — 30 years on,” Statistics and Its Interface, 4, 107–118.

——— (2015): “Threshold models in time series analysis — Some reflections,” Journal of Econometrics, 189, 485–491.

Tong, H., and K. S. Lim (1980): “Threshold autoregression, limit cycles and cyclical data,” Journal of the Royal Statistical Society, Series B (Methodological), 42, 245–292.

Tsay, R. S., and R. Chen (2019): Nonlinear Time Series Analysis. John Wiley & Sons, Inc.

Yang, L., C. Lee, and I. Chen (2021): “Threshold model with a time-varying threshold based on Fourier approximation,” Journal of Time Series Analysis, 42, 406–430.

Yang, L., and J.-J. Su (2018): “Debt and growth: Is there a constant tipping point?,” Journal of International Money and Finance, 87, 133–143.
Yu, P., and X. Fan (2021): “Threshold Regression With a Threshold Boundary,” Journal of Business & Economic Statistics, 39, 953–971.

Zeroual, A., F. Harrou, A. Dairi, and Y. Sun (2020): “Deep learning methods for forecasting COVID-19 time-series data: A comparative study,” Chaos, Solitons and Fractals, 140, #110121.

Zhu, Y., H. Chen, and M. Lin (2019): “Threshold models with time-varying threshold values and their application in estimating regime-sensitive Taylor rules,” Studies in Nonlinear Dynamics & Econometrics, 23, #20170114.
Appendices
In these appendices, we prove Theorem 1, Theorem 2, Lemma 4, and Theorem 5. (Corollary 3 is an immediate consequence of Theorem 2.) To this end, we recall and introduce some notation. Throughout the appendices, →p denotes convergence in probability; →d denotes convergence in distribution; ⇒ denotes weak convergence; and ⇒p denotes weak convergence in probability as defined by Gine and Zinn (1990).

In (11), the general linear parametric restrictions are specified as H0 : Rβ = q, and the alternative hypothesis is specified as H1 : Rβ ≠ q. In particular, the no-threshold-effect hypothesis H∗0 : β1 = β2 and its alternative H∗1 : β1 ≠ β2 are expressed in (13) with R∗ = (Ip+1, −Ip+1) and q∗ = 0(p+1)×1. In (14), the regression score conditional on the nuisance parameter γ is given by st(γ) = Zt−1(γ)ut. In (16), the estimated regression score under H1 conditional on γ is given by ŝt(γ) = Zt−1(γ)ût(γ).
For γ1, γ2 ∈ Γ, define the following matrices conditional on the sample:

  Vn(γ1,γ2) = Mn(γ1)−1 Sn(γ1,γ2) Mn(γ2)−1,
  Sn(γ1,γ2) = (1/n) Σ_{t=1}^{n} st(γ1)st(γ2)⊤,
  Mn(γ) = (1/n) Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤.

We will sometimes abbreviate Vn(γ) = Vn(γ,γ) and Sn(γ) = Sn(γ,γ) when appropriate, recovering (18) and (19). The population versions of these matrices, denoted V(γ1,γ2), S(γ1,γ2), and M(γ), are defined in (30) and (31). We will sometimes abbreviate V(γ) = V(γ,γ) and S(γ) = S(γ,γ) when appropriate. The following lemma is useful for proving Theorem 1, Theorem 2, Lemma 4, and Theorem 5.
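These sample matrices are straightforward to compute. A minimal numerical sketch at a fixed γ, with illustrative variable names of our own choosing, is:

```python
import numpy as np

def vn_at_gamma(Z, u):
    """At a fixed gamma: scores s_t(gamma) = Z_{t-1}(gamma) u_t, then
    S_n(gamma) = n^{-1} sum_t s_t s_t',  M_n(gamma) = n^{-1} sum_t Z Z',
    and the sandwich V_n(gamma) = M_n^{-1} S_n M_n^{-1}."""
    n = Z.shape[0]
    s = Z * u[:, None]              # s_t(gamma) = Z_{t-1}(gamma) u_t, row by row
    S = s.T @ s / n                 # S_n(gamma)
    M = Z.T @ Z / n                 # M_n(gamma)
    Minv = np.linalg.inv(M)
    return M, S, Minv @ S @ Minv    # V_n(gamma)

# toy inputs standing in for the regressors Z_{t-1}(gamma) and errors u_t
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
u = rng.standard_normal(100)
M, S, V = vn_at_gamma(Z, u)
```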
Lemma A.1. If Assumption 1 holds, then the following are true: (i) Sn(γ1,γ2) →p S(γ1,γ2) uniformly over γ1, γ2 ∈ Γ; (ii) Mn(γ) →p M(γ) uniformly over γ ∈ Γ; (iii) Vn(γ1,γ2) →p V(γ1,γ2) uniformly over γ1, γ2 ∈ Γ; and (iv) n−1/2 Σ_{t=1}^{n} st(γ) ⇒ G(γ), where G(γ) is a mean zero Gaussian process with covariance kernel S(γ1,γ2).
When Γ has finite support, the proof of Lemma A.1 follows directly from Assumption 1 by applying the law of large numbers and the central limit theorem. Note that uniform convergence is necessary for the results to hold under H∗0. Assumption 1 allows uniform convergence to hold even when the support of Γ is not finite (see, e.g., Hansen, 1996, Theorems 1 and 3).
A.1 Proof of Theorem 1
i) Recall that the CoTAR model is given in (5) and the conditional least squares estimator β̂(γ) is given in (8). Substituting (5) into (8) and rearranging, we get

  β̂(γ) = {Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}−1 [Σ_{t=1}^{n} Zt−1(γ){Zt−1(γ)⊤β0 + ut}]
       = β0 + (1/√n) {(1/n) Σ_{t=1}^{n} Zt−1(γ)Zt−1(γ)⊤}−1 {(1/√n) Σ_{t=1}^{n} Zt−1(γ)ut}
       = β0 + (1/√n) Mn(γ)−1 {(1/√n) Σ_{t=1}^{n} st(γ)}.

Hence, we have that

  √n{β̂(γ) − β0} = Mn(γ)−1 {(1/√n) Σ_{t=1}^{n} st(γ)}.   (A.1)

The desired result follows from application of Lemma A.1.
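As a concrete illustration of the profiling step behind (8) and (A.1), the sketch below computes β̂(γ) by OLS on regime-split regressors for a toy two-regime AR(1). The simulated design, the function name, and all variable names are ours, not the paper's.

```python
import numpy as np

def cls_beta(y, X, q, gamma):
    """Conditional least squares beta(gamma) for a two-regime threshold
    regression: stack regime-split regressors Z_{t-1}(gamma) and run OLS
    of y on Z (a sketch of the estimator in (8))."""
    low = (q <= gamma)[:, None]           # regime indicator 1{q_t <= gamma}
    Z = np.hstack([X * low, X * ~low])    # Z_{t-1}(gamma): regime-split regressors
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, Z

# toy data: TAR(1) with regime-specific intercept and slope, threshold 0
rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(1, n):
    if y[t - 1] <= 0.0:
        y[t] = 0.5 + 0.3 * y[t - 1] + rng.normal()
    else:
        y[t] = -0.5 + 0.7 * y[t - 1] + rng.normal()
X = np.column_stack([np.ones(n - 1), y[:-1]])   # regressors (1, y_{t-1})
beta_hat, Z = cls_beta(y[1:], X, q=y[:-1], gamma=0.0)
```

Profiling over a grid of candidate γ values and keeping the minimizer of the sum of squared residuals would then recover the threshold estimate.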
ii) Theorem 1.(ii) follows directly from arguments in the proof of Theorem 1.(i) and
application of Lemma A.1.
iii) It is sufficient to verify Conditions 1-4 of Chan (1993). By applying Theorem 3.7
of Bradley (2005) to our Assumption 1, Condition 1 of Chan (1993) can be verified.
Conditions 2 and 3 follow directly from Assumption 1, and H∗1 implies Condition 4.
A.2 Proof of Theorem 2
i) Impose both H∗0 and H0. Let ψ(γ) be a mean zero Gaussian process with covariance kernel V(γ1,γ2). In view of (A.1), √n{β̂(γ) − β0} ⇒ ψ(γ) by Theorem 1.(i). Let W(γ) = ψ(γ)⊤R⊤{RV(γ)R⊤}−1Rψ(γ) and incorporate all possible values of γ in W(γ) as follows:

  supW ≡ sup_{γ∈Γ} W(γ) = max_{γ∈Γ} W(γ),   (A.2)
  aveW ≡ ∫_Γ W(γ) dµ∗(γ),   (A.3)
  expW ≡ ln[∫_Γ exp{W(γ)/2} dµ∗(γ)],   (A.4)

where some subset of Γ has positive measure with respect to µ∗ (see, e.g., Davies, 1977, 1987; Andrews and Ploberger, 1994). Equations (A.2)-(A.4) are the asymptotic counterparts of (22)-(24), respectively.
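On a finite grid approximating Γ, and taking µ∗ to be the uniform measure on that grid (an assumption made only for this sketch), the three functionals can be computed as:

```python
import numpy as np

def sup_ave_exp(W):
    """Given W(gamma) evaluated on a finite grid of Gamma, return the
    sup, ave, and exp functionals of (A.2)-(A.4), with mu* taken to be
    the uniform measure on the grid (an assumption of this sketch)."""
    W = np.asarray(W, dtype=float)
    supW = W.max()                           # (A.2): sup over the grid
    aveW = W.mean()                          # (A.3): integral w.r.t. uniform mu*
    expW = np.log(np.mean(np.exp(W / 2.0)))  # (A.4)
    return supW, aveW, expW

s, a, e = sup_ave_exp([1.0, 4.0, 9.0])
```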
Let g(W) denote either supW, aveW, or expW. Observe that g(·) is a continuous functional of the Gaussian process ψ(γ). Let F(·) denote the distribution function of g(W), and define pn = 1 − F(gn), where gn = g(Wn) is the sample statistic based on the Wald process Wn(γ) in (17). Let {ξt}_{t=1}^{n} be iid standard normal random variables, and define the bootstrap counterparts

  W∗n(γ) = v∗n(γ)⊤Mn(γ)−1R⊤{RVn(γ)R⊤}−1RMn(γ)−1v∗n(γ),
  v∗n(γ) = (1/√n) Σ_{t=1}^{n} st(γ)ξt.

Note that, conditional on the sample, v∗n(γ) is a mean zero Gaussian process with covariance kernel Sn(γ1,γ2). Let g∗n = g(W∗n), and let F∗n denote the distribution function of g∗n conditional on the sample. Let p∗n = 1 − F∗n(gn); then Theorem 2 of Hansen (1996) implies that p∗n ⇒p 1 − F(g). Finally, apply the Glivenko-Cantelli theorem to see that pBn(H0) →p p∗n as n → ∞ and B → ∞.
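In practice, the bootstrap approximation pBn(H0) is obtained by redrawing the multipliers ξt B times and recomputing the statistic on each draw. The code below is an illustrative simplification with scalar scores on a grid of Γ and a sup-type statistic; all names are ours.

```python
import numpy as np

def wild_bootstrap_pvalue(scores, stat_fn, g_n, B=999, seed=0):
    """Multiplier (wild) bootstrap p-value: redraw iid N(0,1) multipliers
    xi_t, form v*_n(gamma) = n^{-1/2} sum_t s_t(gamma) xi_t on a grid of
    Gamma, map it to a scalar statistic via stat_fn, and compare with the
    sample statistic g_n.  `scores` is an (n x G) array of scalar scores
    over G grid points -- a simplification of the paper's setup."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    g_star = np.empty(B)
    for b in range(B):
        xi = rng.standard_normal(n)
        v_star = scores.T @ xi / np.sqrt(n)  # v*_n(gamma) on the grid
        g_star[b] = stat_fn(v_star)          # e.g. a sup-type functional
    return np.mean(g_star >= g_n)            # bootstrap p-value p^B_n

# toy illustration: scalar scores, studentized sup statistic
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 5))       # n = 200 obs, G = 5 grid points
S = scores.T @ scores / 200                  # S_n(gamma) variances on the diagonal
stat = lambda v: np.max(v**2 / np.diag(S))
p = wild_bootstrap_pvalue(scores, stat, g_n=stat(scores.sum(0) / np.sqrt(200)))
```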
ii) Impose H1 so that Rβ0 − q ≠ 0. Recall from (17) that, conditional on γ, the Wald test statistic with respect to H0 is given by

  Wn(γ) = n{Rβ̂(γ) − q}⊤{RVn(γ)R⊤}−1{Rβ̂(γ) − q}.

It is sufficient to prove that g(Wn) →p ∞. Since g(Wn) denotes either supWn, aveWn, or expWn, it is sufficient to prove that there exists γ ∈ Γ such that Wn(γ) →p ∞. By Lemma A.1.(iii), RVn(γ)R⊤ converges uniformly in probability over γ ∈ Γ to RV(γ)R⊤, and infγ∈Γ det{V(γ)} > 0.

If H∗0 is true, then by Theorem 1.(ii), β̂(γ) →p β0 uniformly over γ ∈ Γ. Hence, Wn(γ) →p ∞ for any γ ∈ Γ. If H∗1 is true, then β̂(γ0) →p β0 and hence Wn(γ0) →p ∞. Thus, g(Wn) →p ∞ under H1, whether H∗0 is true or false.
A.3 Proof of Lemma 4
i) Recall from (28) that the Wald test statistic associated with H0 is given by Wn = n(Rβ̂ − q)⊤(RV̂nR⊤)−1(Rβ̂ − q), where V̂n = Vn(γ̂). If H∗1 is true, then we have by Theorem 1.(iii) that

  γ̂ − γ0 = Op(n−1),   (A.5)
  √n(β̂ − β0) →d N{0, V(γ0)}.   (A.6)

If H0 is additionally true, then by (A.6), √n(Rβ̂ − q) →d N{0, RV(γ0)R⊤} and hence n(Rβ̂ − q)⊤{RV(γ0)R⊤}−1(Rβ̂ − q) →d χ²h. By Lemma A.1.(iii) and (A.5), V̂n →p V(γ0). Hence, Wn →d χ²h when both H∗1 and H0 are true. Now let F(·) be the cumulative distribution function of a χ²h random variable, so that pn,χ2(H0) = 1 − F(Wn). Since F is strictly increasing and continuous, and F is the asymptotic cdf of Wn under H0 when H∗1 is true, item i) follows.
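As a concrete check of the χ² post-test ingredients, the Wald statistic in (28) and its p-value can be sketched as follows. The closed-form survival function below is valid only for even degrees of freedom and is used purely to keep the sketch self-contained; in practice one would call a chi-squared survival function from a statistics library. All names are ours.

```python
import math
import numpy as np

def wald_stat(beta_hat, V_hat, R, q, n):
    """W_n = n (R b - q)' {R V R'}^{-1} (R b - q), as in (28)."""
    d = R @ beta_hat - q
    return float(n * d @ np.linalg.solve(R @ V_hat @ R.T, d))

def chi2_sf_even(w, h):
    """Survival function of a chi-squared with EVEN df h, via the
    closed form exp(-w/2) * sum_{j < h/2} (w/2)^j / j! (Erlang tail);
    enough for this illustration."""
    z = w / 2.0
    return math.exp(-z) * sum(z**j / math.factorial(j) for j in range(h // 2))

# toy check: with R = I_2 and beta_hat = q the statistic is zero
W = wald_stat(np.array([0.5, 0.3]), np.eye(2), np.eye(2), np.array([0.5, 0.3]), n=100)
p = chi2_sf_even(W, h=2)
```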
ii) Let H1 be true. It is sufficient to prove that Wn →p ∞. First, if H∗1 is true, then Rβ̂ − q converges in probability to the nonzero vector Rβ0 − q by Theorem 1.(iii), and Lemma A.1 implies that RV̂nR⊤ →p RV(γ0)R⊤, which is a positive definite matrix. Hence, Wn →p ∞ under H1 and H∗1.

Second, if H∗0 is true, then Rβ̂(γ) − q converges uniformly in probability to the nonzero vector Rβ0 − q by Theorem 1.(ii), and Lemma A.1 implies that RVn(γ)R⊤ →p RV(γ)R⊤ uniformly over γ ∈ Γ. Since infγ∈Γ det{V(γ)} > 0, Wn →p ∞ under H1 and H∗0. Thus, we conclude that Wn →p ∞ under H1, whether H∗0 is true or false.
A.4 Proof of Theorem 5
Let pn,χ2(H0) be the p-value of the χ² post-test in Lemma 4, let pBn(H0) be the p-value of the bootstrap test for H0, and let pn(H0) be the p-value of the sequential test for H0. Recall that the sequential test proceeds as follows: (1) if we fail to reject H∗0 at the pre-test, then pn(H0) = pBn(H0); (2) if we reject H∗0 at the pre-test, then pn(H0) = max{pBn(H0), pn,χ2(H0)}. Given this procedure, Theorem 5 is a direct implication of Theorem 2 and Lemma 4.
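The sequential rule just recalled reduces to a few lines of code. In the sketch below, the pre-test level a1, its default value, and the function name are illustrative choices of ours.

```python
def sequential_pvalue(p_pre, p_boot, p_chi2, a1=0.05):
    """Sequential-test p-value for H0: if the pre-test fails to reject
    H0* at level a1, report the bootstrap p-value; otherwise report the
    max of the bootstrap and chi-squared post-test p-values."""
    if p_pre >= a1:               # (1) fail to reject H0* at the pre-test
        return p_boot
    return max(p_boot, p_chi2)    # (2) reject H0* at the pre-test
```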
i) Impose H0, and let a2 be the significance level of the post-test. Since failure to reject a false H∗0 occurs with probability approaching 0 by Corollary 3.(ii), we need only consider the following three cases:
a) H∗0 is true, and we fail to reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ limn→∞ Pr{pBn(H0) < a2} = a2 by Theorem 2.(i).

b) H∗0 is true, but we reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ lim supn→∞ Pr[max{pBn(H0), pn,χ2(H0)} < a2] ≤ limB→∞ limn→∞ Pr{pBn(H0) < a2} = a2 by Theorem 2.(i).

c) H∗0 is false, and we reject H∗0: limB→∞ lim supn→∞ Pr{pn(H0) < a2} = limB→∞ lim supn→∞ Pr[max{pBn(H0), pn,χ2(H0)} < a2] ≤ limn→∞ Pr{pn,χ2(H0) < a2} = a2 by Lemma 4.(i).

Thus, limB→∞ lim supn→∞ Pr{pn(H0) < a2} ≤ a2 under H0.
ii) Impose H1. Theorem 2.(ii) proves that pBn(H0) →p 0, and Lemma 4.(ii) proves that pn,χ2(H0) →p 0. Hence, pn(H0) →p 0 under H1, whether H∗0 is true or false.