Panel Approaches to Econometric Analysis of Bubble Behaviour€¦ · Bubble Behaviour Yanbo Liu Singapore Management University December 9, 2019 ... homogeneity over cross-sectional

Panel Approaches to Econometric Analysis ofBubble Behaviour∗

Yanbo LiuSingapore Management University

December 9, 2019

Abstract

This study proposes novel mechanisms for identifying explosive bubbles in panelautoregressions with a latent group structure. Two post-classification panel data ap-proaches are employed to test the explosiveness in time-series data. The first approachapplies a recursive k-means clustering algorithm to explosive panel autoregressions.The second approach applies a modified k-means clustering algorithm to mixed-rootpanel autoregressions. We establish the uniform consistency of the two classificationalgorithms. The k-means procedures achieve the oracle properties so that the post-classification estimators are asymptotically equivalent to the infeasible estimators thatuse the true group identities. Two right-tailed t-statistics, based on post-classificationestimators, are introduced to detect explosiveness. A panel recursive procedure isproposed to estimate the origination date of explosiveness. Asymptotic theory is de-veloped for concentration inequalities, clustering algorithms, and right-tailed t-testsbased on mixed-root panels. Extensive Monte Carlo simulations provide strong evi-dence that the proposed panel approaches lead to substantial power gains comparedwith the time-series approach. Finally, the proposed method is applied to China’s realestate market and generates some interesting empirical results.

JEL classification: C22, C33, C51, G01Keywords: k-means; Incidental parameter; Classification; Parameter heterogeneity;Bubble detection; Concentration inequality; Explosiveness.

1 Introduction

Financial bubbles, such as the dot-com bubble, are well recognized as explosive deviationsof asset prices from their fundamental values. According to the present value model,

Pt =∞∑i=0

(1

1 + rf

)iEt (Dt+i) +Bt, (1)

∗Yanbo Liu, School of Economics, Singapore Management University, 90 Stamford Rd, Singapore178903. Email: [email protected].

1

where at time t, Pt is the price of an asset, Dt is the payoff of the asset, rf is the risk-freeinterest rate, and Bt represents the bubble component which satisfies the submartingaleproperty:

Et (Bt+1) = (1 + rf )Bt.

When there are no bubbles (i.e., Bt = 0), the asset price is completely determined bydividends and unobserved fundamentals. If {Dt+i} contains a unit root (i.e., I(1)), thenthe asset prices {Pt} cannot be explosive. However, if there is a bubble (i.e., Bt 6= 0), {Bt}and hence, {Pt} must be explosive. This outcome is the economic reason why econometricanalysis of bubble behaviour has been focused on doing right-tailed unit root tests onasset prices adjusted by fundamentals; see, for example, Phillips, Shi, and Yu (2015a)(PSY hereafter).

Conventional econometric methods for bubble detection, including the Dickey-Fuller(DF) test and the augmented DF (ADF) test of Diba and Grossman (1988), the SADFtest of Phillips, Wu and Yu (2011) (PWY hereafter) and Phillips and Yu (2009, 2011),and the GSADF test of PSY (2015a, b), are always based on single time-series. After abubble is found, the time series method is then used to estimate the bubble originationand termination dates. For example, PWY (2011) used the ADF test and the first-crossingprinciple to timestamp a bubble whereas, PSY (2015a) used the maximum of ADF testand the first-crossing principle to timestamp each bubble when there are multiple bubblesin the sample.

Unfortunately, the time-series methods may not have good powers, especially whena bubble is short-lived or when a bubble grows slowly. To demonstrate the low-powerproblem of the DF test when a bubble is short-lived or when a bubble grows slowly,we design two experiments. In both experiments, we simulate data from the followingexplosive AR(1) model,

yt = ρyt−1 + ut, y0 = 0, ut ∼ N(0, 1), t = 1, 2, ..., T, (2)

and use the DF statistic (ρ− 1) /s.e.(ρ) to test H0 : ρ = 1 against H1 : ρ > 1, where

ρ =

T∑t=1

(yt − 1

T

T∑t=1

yt

)(yt−1 − 1

T

T∑t=1

yt−1

)/

(yt−1 − 1

T

T∑t=1

yt−1

)2

is the least-squares

(LS) estimator of ρ and s.e.(ρ) is the standard error of ρ. In the first experiment, wetake the empirical estimate of ρ as found in PWY as the true value of ρ (i.e., ρ = 1.033

and ρ = 1.040) but set T = 10, 20, 30. In this experiment, the bubble is short-lived butempirically realistic, judged by the empirical results reported in PSY (2015a). Table 1reports the powers (i.e., relative frequency out of 10,000 replications) of the right-tailedDF test rejecting the null hypothesis. Essentially, when the bubble is short-lived, thepower of the right-tailed DF test is deficient, ranging from 0.0916 to 0.2334. In the secondexperiment, we take the empirical estimates of ρ as found in Section 6 as the true value ofρ (i.e., ρ = 1.0017 and ρ = 1.0068). In this experiment, the bubble grows very slowly, butthese growth rates are empirically reasonable judged by the empirical results that will be

2

reported later. Once again, when the bubble grows slowly, the power of the right-tailedDF test is very low, ranging from 0.0598 to 0.2892.

T 10 20 30 10 20 30ρ 1.033 1.040

Power 0.0916 0.1202 0.1732 0.0975 0.1403 0.2334

Table 1: Power of the right-tailed DF test when a bubble is short-lived.

T 50 100 200 50 100 200ρ 1.0017 1.0068

Power 0.0598 0.0604 0.0766 0.0791 0.1095 0.2892

Table 2: Power of the right-tailed DF test when the bubble grows slowly

In practice, prices of multiple (say N) assets are often available over the same time,leading to the availability of panel data. In this paper, we propose to use panel datamodels to improve the power in bubble detection. Intuitively, as long as there is somehomogeneity over cross-sectional units within groups and the group structure is known,panel data models based on pooling cross-sectional data within the same group shouldsharpen the statistical inferences on the common explosive root and hence, deliver betterpower performance than the method based on single time series.

Unfortunately, in almost all practically relevant cases, the true group structure islatent and has to be estimated from the panel data. In this paper, to identify the latentmembership, we consider several grouping strategies. To determine the number of groups,we use the Bayesian information criterion (BIC).

Two different specifications for the panel data model are considered in this paper.The first specification is the explosive panel autoregressive model. We use the recursivek-means algorithm of Bonhomme and Manresa (2016) to identify the group structure.The second specification is the mixed-root panel autoregressive model in which explosive,stationary, and unit root time series are mixed together. For bubble detection, it perhapstoo restrictive to impose homogeneous explosive roots. Thus, it is more realistic to arguethat explosive roots exist in a proportion of individuals. For the case of mixed roots, weapply the modified k-means algorithm of Lin and Ng (2012) to identify the group structure.We show the uniform consistency of both classification algorithms.

We derive the oracle property of two post-classification estimators under the joint as-ymptotic scheme, that is, n → ∞ and T → ∞. The oracle property reveals that thedistance between the post-classification estimators and the oracle-within estimators isdiminishing. The diminishing distance verifies the optimality of two the k-means classifi-cation procedures from the estimation perspective.

After the classification of groups is made, we provide two right-tailed t-statistics forthe detection of explosiveness. Under the null hypothesis of a unit root in a specific group,the proposed statistics converge to a standard normal distribution. They diverge under

3

the alternative hypothesis of explosive roots. Our panel t-statistics are superior to theADF test in two aspects. First, our panel data based t-tests are more powerful than thetime series based DF test. Our asymptotic theory shows that the panel t-statistics divergeat a faster rate than the ADF statistic. Extensive Monte Carlo simulations demonstratethat the empirical power of the panel t-statistics is much higher than the ADF statistic.Second, unlike the ADF statistic whose limiting distribution is non-standard, the panel t-statistics have the standard asymptotically normal distribution under the null hypothesis.Hence, it is easier to implement the proposed tests than the ADF test.

Based on the panel t-statistics, we then propose a real-time estimate for the bubbleorigination date and develop the asymptotic distribution of the estimator. In particular,based on the k-means classification, we employ the forward recursive right-tailed panelt-test to estimate the starting date of the bubble. This real-time estimate is shown to beconsistent and used to date stamp the origination of the housing bubbles in China’s realestate market.

Our paper makes three contributions. First, it contributes to the literature on bubbledetection. Based on the uniformly consistent classification of individuals, the proposedpanel procedure greatly enhances the power of methods based on a single time series.

Second, our paper contributes to the literature on data-driven classification. Althoughthere are several existing classification algorithms, most of the methods are developed forthe stationary case only. An exception is the LASSO algorithm in Huang et al. (2019),which, however, does not directly apply to the mixed-root panel autoregression. To thebest of our knowledge, our study is the first attempt to extend classification algorithmsto mixed-root panels.

Finally, our paper contributes to the application of concentration inequality. In theclassification literature, concentration inequalities are the cornerstones of classificationalgorithms. They measure misclassification errors by controlling the decaying rates ofsample moments at tails. In the stationary case, sample moments converge to their popu-lation counterparts. The uniform upper bounds of the sample moments of the stationaryindividuals depend on their population counterparts. However, sample moments of explo-sive processes converge to a random variable. The uniform upper bound for the samplemoments of the explosive individuals relies on the asymptotic distance between sample mo-ments and population moments. Therefore, we have to introduce a novel decompositionof explosive sample moments to accommodate concentration inequality for the mixed-rootcase.

The rest of this paper is organized as follows. Section 2 discusses the model setup.Section 3 introduces the two-stage method consisting of both classifications in the firststage and inferences in the second stage. Section 4 derives the asymptotic properties ofthe two-stage procedure. Section 5 reports simulated results. Section 6 applies our methodto the data of China’s real estate markets. Section 7 concludes.

We use the following notations throughout the study. Id and 0d×d denote the d × didentity matrix and a d×d matrix of zeros, respectively. p→, d→, and⇒ denote convergence

4

in probability, convergence in distribution, and convergence in functional space, respec-tively. Correspondingly, (n, T ) → ∞ denotes the joint limit, and (n, T )seq → ∞ denotesthe sequential limit, where T →∞ followed by n→∞. Similarly, (T, n)seq →∞ denotesthe sequential limit, where n→∞ followed by T →∞. We denote T1 := T − 1. A >> B

implies B/A = op (1) as (n, T )→∞. A ∼ B implies B/A = Op (1) as (n, T )→∞

2 Model Setup

The generic panel autoregressive model we consider is

yit = µi + ρiyi,t−1 + uit,

yi0 = 0, ρi = 1 +ciT γ

, i = 1, 2, ..., n, t = 1, 2, ..., T, (3)

where {uit} is a martingale difference sequence with a conditional second moment σ2 (i.e.,E(u2it|Fi,t−1

)= σ2 for any i and t, where Fi,t−1 := σ {ui,t−1, ui,t−2, ...}) and finite qth

moments with q ≥ 4 for all i and t. We assume γ ∈ (0, 1). In this model, µi is anindividual fixed effect for each i with µi = Op (T−γ) whose magnitude depends on thesample size and is diminishing asymptotically.

For the AR coeffi cient, we assume there exist positive values c and c such that ci ∈(−c,−c) ∪ {0} ∪ (c, c). The boundedness imposes an identification condition for parame-ters. Otherwise, there exists an identification problem. Generally, we assume a knownand homogeneous scaling parameter γ and unknown heterogeneous distance parameters,{ci}ni=1. As γ can be estimated consistently, it is natural to consider it as a known value.For consistent estimation on the scaling parameter γ from single time series, see Phillips(2013).

In this study, we adopt a setup that lies between the homogeneous panel (ci = c) andthe heterogeneous panel (ci 6= cj for any i 6= j). In particular, we assume the followinggroup structure as

ci =K0∑g=1

α0g1{i ∈ G0

g

}, (4)

where α0g 6= α0

l for any g 6= l,⋃K0g=1G

0g = {1, 2, ..., n} , and G0

g ∩G0l = ∅ for any j 6= g. Let

ng := #G0g represents the cardinality of the true group G

0g. Let A be a set of arbitrary

K0×1 vectors α (:= (α1, α2, ..., αK0)), and C be a set of group-specific distance parameters,so that cn (:= (c1n, c2n, ..., cK0n)) ∈ C. Within the same market sector or convergenceclub, all cross-sectional units share the identical distance parameter α0

g. To obtain theasymptotic properties of classification and inference, we first assume that the true groupnumber, K0, is known while the true memberships are latent and unknown. We thenpropose to use the BIC to estimate the number of groups.

The true group-specific parameters are defined as c0n :=

(c0

1n, c02n, ..., c

0K0n

)∈ C, α0 :=

(α01, ..., α

0K0) ∈ A and c0 := (c0

1, ..., c0n) ∈ Φn, where Φ := (−c,−c) ∪ {0} ∪ (c, c). The

true group membership variable{g0i

}ni=1

maps individual units into groups. For each

5

i = 1, 2, ..., n, and g = 1, 2, ...,K0, the event ‘g0i = g’ is equivalent to ‘i ∈ G0

g’. Withany estimator {gi}ni=1, the event ‘gi = g’is equivalent to ‘i ∈ Gg’for each i = 1, 2, ..., n,and g = 1, 2, ...,K0. We denote δ := (g1, g2, ..., gn) ∈ ∆K0 as a particular grouping of n,where ∆K0 unit is the set of all groupings of {1, 2, ..., n} into at most K0 groups. Forthe gth group, we define the AR coeffi cients and their estimators as ρ0

gn

(:= exp

(α0g/T

γ)),

ρ0gn

(:= exp

(α0g/T

γ)), ρgn

(:= exp

(αg/T

γ)). For simplicity, we write ρ0

i

(:= exp

(c0i /T

γ))

as ρi.Two kinds of panel autoregressive models are considered in this paper. The first one is

the pure explosive panel, while the second one is the mixed-root panel. For the explosivepanel, its group structure follows,

Group 1 : α01 > 0

Group 2 : α02 > 0

......

Group K0 : α0K0 > 0

.

For the mixed-root panel autoregressive model, three potential classes of groups are consid-ered: (1) explosive roots (α0

g > 0); (2) unity roots (α0g = 0); (3) stationary roots (α0

g < 0).For the mixed-root panel, the group structure of the mixed-root panels follows as

Explosive Groups :

Group 1 : α0

1 > 0Group 2 : α0

2 > 0...

...Group k : α0

k > 0

Unit Root Group : Group : (k + 1) α0(k+1) = 0

Stationary Groups :

Group (k + 2) : α0

(k+2) < 0

Group (k + 3) : α0(k+3) < 0

......

Group K0 : α0K0 < 0

.

As it is highly restrictive to assume all individual assets have bubbles, the mixed-rootpanels accommodate explosive roots in a proportion of cross-sectional units.

3 A Two-stage Approach

For explosive analysis, we apply classification methods in the first stage. Based on theestimated group structures in the first stage, we build post-classification estimators andtesting statistics in the second stage. We consider two inference procedures for explosiveanalysis. The first is to detect the existence of explosive root using the right-tailed t-test.The second is to estimate the bubble origination dates using a recursive algorithm. Bothinference approaches rely on estimated group identities.

In the first panel model, we employ the recursive k-means classification proposed inBonhomme and Manresa (2016). In the second panel model, we consider the modifiedk-means classification proposed in Lin and Ng (2012).

6

3.1 Stage 1: classification

3.1.1 Recursive k-means classification for explosive panel

In this subsection, we consider using the recursive k-means classification algorithm toidentify the group structure in the explosive panel autoregressive model. When mem-berships are unobserved, two types of parameters are considered: the parameter vector{ci}ni=1 ⊆ [c, c], and the group membership variable {gi}ni=1, which maps cross-sectionalunits into groups. Note that group-specific distance parameters, {ci}ni=1, are well sepa-rated with minimum distance c∗ > 0; otherwise, we cannot correctly allocate individualsinto the true groups.

The grouped estimators of{c0i

}ni=1,{g0i

}ni=1

in (3) are defined as the solution to thefollowing optimization problem:

(c, δ) = arg min(c,δ)∈Φn×∆K0

1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

(yit − yi,t−1 exp

(cginT γ

))2, (5)

where δ = {g1, g2, ..., gn} groups n units into K0 groups. We employ {ρi}ni=1 as thecollection of least squares (LS) estimates for each individual time series. To eliminatefixed effects, we employ a demeaned process as yit := yit− yi and yi,t−1 := yit− yi,−1. For

given values of {cgn}K0

g=1, the optimal group classification for each i = 1, 2, ..., n is

gi(c) = arg ming∈{1,2,...,K0}

1

T 2γρ2Ti

T∑t=1

[yit − yi,t−1exp

(cgnT γ

)]2, (6)

where the minimum g optimizes a k-means classification problem. The estimator {ci}ni=1

of (5) optimizes the following objective function:

c = arg minc∈Φn

1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

(yit − yi,t−1exp

(cgi(c)nT γ

))2

, (7)

where gi(c) is derived by (6). The classification estimates of{g0i

}ni=1

are simply gi(c).The following algorithm summarizes the recursive k-means procedure to minimize (5)

in the following steps.Step 1: Let

{c

(0)i

}ni=1

be the collection of individual LS estimates for all i ∈ {1, 2, ..., n} ;

Step 2: For any i = 1, 2, ..., n, compute

g(s+1)i = arg min

g∈{1,2,...,K0}

1

T 2γρ2Ti

T∑t=1


(c

(s)gn

T γ

))2

; (8)

Step 3: Compute

c(s+1)gn = arg min

c∈Φn

1

n

n∑i=1

1

T 2γρ2Ti

T∑t=1


(cg(s+1)i n

T γ

))2

; (9)

7

Step 4: Set s = s+ 1 and go to Step 2 (until numerical convergence).This computation algorithm consists of two iterated steps, ‘assignment’as in Step 2

and ‘update’as in Step 3. In the ‘assignment’step, each cross-section unit i is assignedto the nearest group gi based on the distance defined in (8). In the ‘assignment’step, tore-allocate centres of groups {gi}ni=1, we compute {ci}

ni=1 by minimizing (9).

3.1.2 Modified k-means classification for mixed-root panel

If (3) incorporates explosive, stationary, and unit roots, the recursive k-means classifica-tion algorithm fails. Because of heterogeneity in adjustment rates, the sample momentsof stationary individuals are asymptotically unstable. This instability leads to severe mis-classification. That is why we introduce the modified k-means classification approach. Wefollow the clustering approach in Lin and Ng (2012).

We summarize the algorithm in the following steps.Step 1: We derive LS estimates {ci}ni=1 as

ci − ci = T γ∑T

t=1 yi,t−1uit∑Tt=1 y

2i,t−1

= T γ∑T

t=1

(yi,t−1 − yi,−1

)(uit − ui)∑T

t=1

(yi,t−1 − yi,−1

)2 ; (10)

Step 2: To recover latent memberships, we apply the k-means cluster algorithm for{ci}ni=1. Specifically, with α = (α1, α2, ..., αK0) ∈ A being any arbitrary K0 × 1 vector forα1, α2, ..., αK0 , we define

Qn (α) =1

n

n∑i=1

min1≤l≤k

(ci − αl)2 ,

and α = (α1, α2, ..., αK0) with α := arg minA Qn (α). Therefore, we further compute theestimated cluster identity as

gi = arg min1≤l≤K

|ci − αl| ,

where if there are multiple l’s that achieve the minimum, gi takes the value of the smallestone.

When c0i > 0, we have ci−c0

i = Op

(1ρTi

). When c0

i < 0, we have ci−c0i = Op

(1

T1−γ2

).

However, a pointwise convergence rate is insuffi cient for theoretical derivations on theclassifications. Classification algorithms require controlling the uniform upper bound ofsample moments. Therefore, we need to verify the uniform convergence rate for ci overi = 1, 2, ..., n.

3.2 Stage 2: post-classification estimation, bubble detection and bubbletimestamping

Based on classifications, we consider two pooled LS estimators for α0, c0n and c

0: oraclewithin estimator and post-classification within estimator. The oracle within estimator for

8

the AR coeffi cient in the gth group is

αg − α0g = T γ

∑i∈G0g

∑Tt=1 yi,t−1ui,t∑

i∈G0g∑T

t=1 y2i,t−1

. (11)

Similarly, the post-classification within estimator for the gth group is

αg − α0g = T γ

∑i∈Gg

∑Tt=1 yi,t−1ui,t∑

i∈Gg∑T

t=1 y2i,t−1

, (12)

where we define{Gg

}K0

g=1as any consistent classification estimates on

{G0g

}K0

g=1.

Under the model (3) with latent memberships, we can estimate σ2g consistently for

each g = 1, 2, ...K0. Define

σ2g =

1

2ngT

∑i∈Gg

T∑t=1

u2

it, (13)

where uit = yit − ρgnyi,t−1, ng = #Gg, and ρgn = exp(αgT γ

). Denote αg as any post-

classification estimate based on either the recursive k-means algorithm or the modifiedk-means algorithm. Since we assume homoskedasticity over i = 1, 2, ..., n, we have

σ2 =1

2ngT

∑i∈Gg

T∑t=1

u2

it, (14)

for any g = 1, 2, ...K0.

Based on (14), we can establish the inference procedure to justify the accuracy of ourpost-classification estimator. To test the null hypothesis H0 : Rρ0

g = r, we propose thefollowing t-statistic:

tg =(Rρgn − r)

√Dg,nT

Rσg, (15)

where Dg,nT =∑

i∈Gg∑T

t=1 y2i,t−1, R 6= 0 and r are two scalars. If we let R = r = 1, the

statistic (15) can be employed to test the existence of bubbles. To test the null hypothesisH0 : ρ0

g = 1, we choose the following the panel statistic as

tg =

(ρgn − 1

)√Dg,nT

σg. (16)

Under the alternative H0 : ρ0g > 1, the fact that the statistics (16) diverge faster than

the time-series statistic of Diba and Grossman (1988) illustrates the superiority of ourpanel approach. Obviously, the statistic tg of (16) corresponds to the full-sample statisticand can detect the signal of bubbles. However, the full-sample statistics cannot date theorigination of bubbles.

9

To consistently estimate the origination of explosive subperiod, we propose the follow-ing subsample statistic:

tg (r) =

(ρgn (r)− 1

)√Dg,nT (r)

σg(r), (17)

where ρgn (r) is the post-classification within estimator of ρ0gn based on the first τ = [Tr]

observations in the gth estimated group, and σ2g(r) is the estimator of σ

2g based on the

first τ = [Tr] observations in the gth estimated group. Dg,nT (r) is the sample moment onthe first τ = [Tr] observations in the gth estimated group. The sub-sample statistic (17)is the foundation for the estimate of the bubble origination date.

Our inference procedure extends the framework of PWY and PSY. In PSY, the recur-sive approach is proposed to detect the explosive behaviour and date-stamp the originationof bubbles. The regression model used in PSY is

yt = α+ βyt−1 + ut,

where ut is the equation residual. The parameter β = 0 under the null hypothesis of nobubble and β > 0 under the alternative hypothesis of bubbles. Therefore, the standardDF test under the null hypothesis H0 : β = 0 is

DF =β

se(β) ,

where

β =T∑T

t=1 ∆ytyt−1 −∑T

t=1 ∆yt∑T

t=1 yt−1

T∑T

t=1 y2t−1 −

(∑Tt=1 yt−1

)2 ,

se(β)

= σy

T∑t=1

y2t−1 −

1

T

(T∑t=1

yt−1

)2− 1

2

,

σ2y =

1

T

T∑t=1

(∆yt − α− βyt−1

)2, α =

1

T

(∆yt − βyt−1

).

The DF statistics obtained from these subsample (starting from r1 and ending at r2)regressions are represented in the sequence {DFr1,r2}. The detection for the existence ofbubbles relies on the supreme statistics as,

PSYr = supr1∈[0,r−rmin],r2=r

{DFr1,r2} ,

where τmin = [Trmin] is the minimum sample size required to initiate the procedure.The origination date of a bubble is defined to be the first observation where the supremestatistic exceeds the diverging critical value as,

r = infs≥rmin

{s : PSYs > cvβTn

},

10

where cvβTn is the critical value with significance level βTn → 0.Following PSY, we propose a recursive algorithm to date the bubble origination. Since

we only consider a single bubble case, we employ the t-statistic rather than the supremet-test. Based on the recursive procedure of the panel t-test, we use the classified groups ofpanels to data-stamp the origination. We date the origination of an explosive episode as

reg = infs≥r0

{s : tg (s) > cvβTn

}, (18)

andreg = inf

s≥r0

{s : tg (s) > cvβTn

}, (19)

where cvβTn is the right-side 100βTn% critical value of the limiting distribution of tg andtg statistics based on τ s = [Ts] time horizon, and βTn is the size of the one-sided statistics.The parameter r0 is the minimum sample size required to initiate the regression. We allowβTn → 0 as (n, T ) → ∞ because, in this event, cvβTn → ∞. This recursive method canapply in the same way to the PWY procedure based on the ADF statistic.

3.3 Estimation of K

To estimate the true number of groups, we rely on the Bayesian information criterion(BIC) which is defined as:

BIC(K) =1

nT

n∑i=1

T∑t=1

(yit − yi,t−1ρ

(K)gin

)2+ σ2K + n

nTln(nT ),

where(ρ

(K)gin

)ni=1

is the post-classification estimator based on K groups and σ2 is the

variance estimator (14). Post-classification estimators based on both the recursive k-means and the modified k-means are applicable. The estimation of the group number isachieved by choosing the optimal K which minimize the BIC, that is,

K = arg minK=1,2,...,Kmax

BIC(K), (20)

where Kmax is a generic upper bound of K. The BIC represents a balance between modelfitness and penalty of over-fitness. In addition to the BIC function considered in Bai andNg (2002), Bai (2003, 2009), the Deviance information criterion (DIC) in Spiegelhalter etal. (2002) and Li, Yu, and Zeng (2019) also can be employed to select K.

4 Asymptotic Theory

In this section, we study the asymptotic properties of the k-means algorithms under therespective panel models. We will show that the classification algorithms can recover thelatent group structure consistently. Given this consistency, we can demonstrate that thefeasible estimators of the AR coeffi cients are asymptotically equivalent to the oracle es-timators that are derived as if the true group structure was known. We also provide

11

the asymptotic distributions of the estimators of the AR coeffi cients. In this section, wejustify the consistency of explosiveness statistics under the alternative hypothesis of explo-sive roots. We also show our recursive method can date the origination dates of explosiveepisodes with better accuracy than PWY. At last, we demonstrate that the BIC estimatoron K0 is consistent.

To demonstrate the asymptotic theory of our two-stage procedure, we impose twoassumptions.

Assumption 1 (i) For each i, the individual fixed effect µi = Op (T−γ) where γ ∈ (0, 1).(ii) The error process {uit} is a martingale difference sequence with a homogeneous

conditional second moment σ2 (E(u2it|Fi,t−1

)= σ2 for all i and t, where Fi,t−1 :=

σ {ui,t−1, ui,t−2, ...}) and finite qth moments with q ≥ 4 for all i and t.(iii) Initial conditions: Assume that yi0 = 0 almost surely for all i, yi0 is independent

of uit for all i and t, and uis = 0 for all s ≤ 0.(iv) There exist c and c such that for each i ∈ {1, 2, ..., n} so that we have 0 < c <∣∣c0

i

∣∣ < c <∞ or c0i = 0.

(ii) There exists a constant c∗ ∈ (0,∞) such that inf1≤g≤g′≤K0

∣∣∣α0g − α0

g′

∣∣∣ ≥ c∗.Assumption 1(i) provides restrictions on individual fixed effects. Assumption 1(ii) as-

sumes the martingale property for innovations. Assumption 1(iii) imposes restrictions onthe initial conditions. Assumption 1(iv) imposes an identification condition for c0. As-sumption 1(v) gives another identification condition where the group-specific parametersare well separated from each other.

Assumption 2 (i) Let {ng}K0

g=1 denote the cardinality of latent groups. For each g ∈{1, 2, ...,K0

}, ngn → πg <∞. Moreover, inf1≤g≤K0 πg ≥M for some M > 0.

(ii) The following rate restrictions hold:√n/ log(n) = o

(T

3γ−12

), n = o

(T 1−γ) ,

n14 / log(n) = o

(T

14

).

(iii) The relationship δnT < M(c∗)2

15c holds for each n and T, where δnT is defined inLemma 8.11.

Assumption 2(i) implies that each group size increases proportionally to the dimen-sion of individuals. Assumption 2(ii) imposes rate restrictions so that the concentrationinequality is applicable. Assumption 2(iii) provides necessary conditions for the uniformconsistency of the modified k-means algorithm.

We find that incidental parameters do not show up in the asymptotics of samplemoments. The reason why this happens is that for each i = 1, 2, ..., n, the fixed effect isasymptotically diminishing and dominated by noise. To illustrate this point clearly, notethat,

yit =ρti − 1

ρi − 1µi +

t∑s=1

ρt−si uis, (21)

12

Since ρti−1ρi−1µi = O

(ρti)and

∑ts=1 ρ

t−si uis = Op

(ρtiT

γ), yit is asymptotically equivalent to∑t

s=1 ρt−si uis. Moreover, noise, not fixed effects, determines the asymptotics uniformly for

all i = 1, 2, ..., n. Therefore, to determine uniform bounds of sample moments, we needonly consider innovations.

4.1 Recursive k-means algorithm

We establish the consistency of the coeffi cient estimates in the explosive panel autoregres-sions.

Theorem 4.1 (Individual Consistency of Classification) Let Assumption 1 and 2 holdand c0

i > 0 for each i = 1, 2, ..., n. With joint convergence (n, T )→∞,

Pr

(max

1≤i≤n

∣∣gi − g0i

∣∣ > 0

)= o(1).

Theorem 4.1 justifies the recursive k-means classification algorithm asymptotically. Itis similar to Theorem 2 of Bonhonmme and Manresa (2016). This theorem states thatunder the joint convergence framework (n, T ) → ∞, we can correctly recover the groupstructure. From the Theorem 4.1, we observe that correct classifications strongly rely onAssumption 1(ii). In our discussion, as long as the AR coeffi cients are separated acrossgroups, the misclassification errors are asymptotically negligible.

The following theorem indicates that the post-classification estimator αg is asymptoti-cally equivalent to the oracle estimator αg for each g ∈

{1, 2, ...K0

}. With the classification

consistency of the recursive k-means algorithm, we can verify the asymptotic equivalencebetween our parameter estimate and the true value:

√ng(ρ0gn

)T (αg − α0

g

)=√ng(ρ0gn

)T (αg − α0

g

)+ op(1).

Therefore, the post-classification estimator αg share the identical limiting distribution asthe oracle estimator. The following theorem shows the limiting distribution of αg.

Theorem 4.2 Suppose Assumption 1 and 2 hold and c0i > 0 for i = 1, 2, ..., n. Under

joint convergence (n, T )→∞ and rate restriction nT 2−2γ = o (1) ,

√ng(ρ0gn

)T(αg − α0

g)d→ N(0, 2

(α0g

)).

Remark 4.1 Let αWYg be the estimator minimizing the objective function (5). Although

αWYg is a consistent estimator for each g, it does not enjoy the oracle property. Thereason is because the distance between the oracle estimator αg and the true value α0

g is

αg − α0g = Op

(1√

n(ρ0gn)T

), and the misclassification error (the distance between αWY

g

and αg) decays at a slower rate as Op

(1√

n(ρ0gn)T

). Therefore, the misclassification error

dominates the component of the limiting normality and ruins the oracle property.

13

Remark 4.2 We prefer the within estimator to the first-differenced estimator. We definethe first-differenced LS estimate for homogeneous panels with explosive roots as ρFD where

ρFD =

∑ni=1

∑Tt=1 (yit − yi,t−1) (yi,t−1 − yi,t−2)∑ni=1

∑Tt=1 (yi,t−1 − yi,t−2)2

.

Under (n, T )→∞ and√nTρT

= o (1) , we have

√n(ρ0)T (

ρFD − ρ0)

p→ 0. (22)

When√nTρT

= o (1) restriction fails, we have

(ρ0)2TT

(ρFD − ρ0

)p→ −4. (23)

Due to the dominance of the exponential rate, (22) is usually applicable. Since the first-differenced estimator leads to degenerate distributions, no valid inference procedure is avail-able. The degenerate distribution that is shown here explains why we choose the withinestimators in this paper.

Remark 4.3 To test for explosive roots, we need to accommodate both the grouped ex-plosive roots and the unit root. The new data generating mechanism follows (3) and thegroup structure as

Explosive Groups:

Group 1 : α0

1 > 0Group 2 : α0

2 > 0...

...Group

(K0 − 1

): α0

(K0−1) > 0

Unit Root Group: Group K0 : α0K0 = 0

.

To apply the recursive k-means approach, we can modify the objective function (5) withthe self-normalized adjustment rate as

(c, δ) = arg min(c,δ)∈Φn×∆K0

1

nT 2γ

n∑i=1

1

ΥiT

T∑t=1


(cginT γ

))2, (24)

where ΥiT :=∑T

t=1 y2i,t−1. Basically, the optimization of (24) follows the identical algo-

rithm in (6) and (7) except that we just need to replace ρ2Ti by ΥiT . However, due to the

assumption of latent group structures, self-normalization rates do not disturb our theoret-ical derivations and codings. Therefore, the consistency of classifications and the oracleproperties of parameter estimates still hold. For the technical details, kindly check theproof of Theorem 4.1.

14

4.2 Modified k-means algorithm

We now establish the consistency of the coeffi cient estimates in the mixed-root panelautoregressive model.

Theorem 4.3 (Individual Consistency of Classification) Suppose Assumptions 1 and 2hold and 3

13 < γ < 23 . When (n, T )→∞,

sup1≤i≤n

1{gi 6= g0

i

}= op (1) .

Theorem 4.3 shows that the modified k-means algorithm consistently recovers latentmemberships for the mixed-root panel autoregressive model. This machinery incorporatesmore general cases than the recursive classification algorithm. The following theoremreports the asymptotic distribution of αg, the parameter estimate based on the modifiedk-means algorithm.

Theorem 4.4 Assume 313 < γ < 2

3 and n/T2−2γ = o (1) and that Assumptions 1 and 2

hold. When (n, T )→∞,

√ng(ρ0gn

)T(αg − α0

g)d→ N(0, 2

(α0g

)).

Based on the modified k-means algorithms, we also obtain the oracle property of thepost-classification estimator. Therefore, we can develop reliable tests for explosivenessin the context of mixed-root panels. The significant advantage of the post-classificationestimates is that they employ both time-series and cross-sectional asymptotics.

Remark 4.4 The modified k-means algorithm can also consistently identify the latentmembership with structural breaks in the following model.

Stationary-to-Explosive Groups:

Group 1 : α0

1,1 < 0 α01,2 > 0

Group 2 : α01,2 < 0 α0

2,2 > 0...

......

Group k1 : α01,k1

< 0 α02,k1

> 0

Unity-to-Explosive Groups:

Group (k1 + 1) : α0

1,k1+1 = 0 α01,k1+1 > 0

Group (k1 + 2) : α01,k1+2 = 0 α0

2,k1+2 > 0...

......

Group k2 : α01,k2

= 0 α02,k2

> 0

Unit Root Group: Group (k2 + 1) : α0k2+1 = 0

Stationary Groups:

Group (k2 + 2) : α0

k2+2 < 0

Group (k2 + 3) : α0k2+3 < 0

......

Group K0 : α0K0 < 0

.

(25)

15

There are four types of latent groups in the panel: Case (1), Stationary-to-ExplosiveGroups (Before the time point τ eg, α

01,g < 0; After the time point τ eg, α

02,g > 0); Case

(2), Unity-to-Explosive Groups (Before the time point τ eg, α01,g = 0; After the time point

τ eg, α02,g > 0); Case (3), Unit Root Group (α0

g = 0); Case (4), Stationary Groups (α0g < 0).

For both Case (1) and (2), when i = 1, 2, ..., n and reg0i< 1, the time-series LS estimate

α2,g0iis convergent to α0

2,g0i. This fact shows that the signal of the explosive subperiods

dominates those of unit roots and stationarity. For technical details, kindly check theproof of the Theorem 4.7. As the validity of the modified k-means algorithm relies onthe consistency of LS estimates, we can still recover the latent membership with structuralbreaks in slopes. The classification consistency for the parameter instability case lays downthe foundation for bubble analysis in the forthcoming discussions.

4.3 Test statistic for bubbles

We provide the consistency of both (13) and (14) in the following lemma.

Lemma 4.1 Suppose Assumptions 1 and 2 hold, n/T 2−2γ = o (1), and 313 < γ < 2

3 . Forany g = 1, 2, ...,K0, when (n, T )→∞

σ2g

p→ σ2.

Consistency of σ2g is essential for valid inference procedures as we can properly scale

the statistic. We collect the details of (15) in the following theorem.

Theorem 4.5 Suppose Assumptions 1 and 2 hold, n/T 2−2γ = o (1), and 313 < γ < 2

3 .Under H0 : ρ0

gn = 1 and when (n, T )→∞,

tgd→ N(0, 1),

Under H1 : ρ0gn > 1 and when (n, T )→∞,

tg = Op

((ρ0gn

)T √n).

By comparison, the DF statistic under the alternative hypothesis diverges at the rateOp((ρ0gn)T

). The divergence rate is slower than that in (16), leading to a lower power

than the panel-based test.

Remark 4.5 When the slopes are time-invariant as in (3), Lemma 4.1 illustrates the con-sistency of the estimates (14) and (13). However, under the case of parametric instabilityof Remark 4.4, the estimates (14) and (13) are asymptotically unstable. For instance,under Case II where the regime switches from the unit root to the explosive root, (13) isasymptotically unbounded: for each g = 1, 2, ...,K0, the variance estimate follows

σ2g = Op

((ρ0gn

)2TT 2

).

The group-specific estimate diverges due to the dominance of exponential rates. Details ofthe derivations are collected in the proof of the Theorem 4.7.

16

4.4 Estimate of bubble origination date

When τ = [Tr]→∞ for all r ∈ [r0, 1], we have

tg (r)d→ N (0, 1) ,

under the null hypothesis of unit root in the gth group. For a bubble to have a meaningfulorigination date, we assume the following DGP for at least one group (say G0

g) and alli ∈ G0

g,yit = µi + ρityi,t−1 + uit, t = 1, 2, ..., T, and i = 1, 2, ..., n, (26)

where uit is a martingale difference sequences with a conditional second moment σ2 and afinite qth moment with q ≥ 4. The initialization of the process is yi0 = 0 for i = 1, 2, ..., n.The fixed effect is µi = Op

(1T γ

). The AR coeffi cient is ρit = 1 + cit

T γ with time-varying cit.Following PWY and Phillips and Yu (2009), we assume

cit = c01i1{t < τ eg0i

}+ c0

2i1{t ≥ τ eg0i

}, with c0

1i ≤ 0 and c02i > 0. (27)

Hence, model (26) allows for two regimes: a unit root or stationary regime and an explosiveregime with a bubble originating at τ e

g0i. If individual i does not contain any explosive

root, then reg0i

= 1.By Remark 4.4, under the case of parameter instability, we can still apply the mod-

ified k-means algorithm to consistently recover the true membership, showing that theestimated groups are equivalent to the true ones. Therefore we can directly employ thefeasible estimator on the origination date reg rather than r

eg. Based on r

eg and r

eg, we es-

tablish a limit theory for dating the origination of an explosive root under the case of nobubbles.

Theorem 4.6 Suppose Assumptions 1 and 2 hold. Under the null hypothesis of no episodeof explosiveness (α0

g = 0 for each g = 1, 2, ...,K0), and provided that cvβTn → ∞, theprobability of detecting the origination of a bubble using tg is zero as (n, T )→∞,

Pr(reg∈ [0, 1]

)→ 0, and Pr

(reg∈ [0, 1]

)→ 0,

where reg is defined in (18) and reg in (19).

Next, we show the consistency of the estimator under the case of a single bubble forg = 1, 2, ...,K0.

Theorem 4.7 Suppose Assumptions 1 and 2 hold. If 1cvβTn

+cvβTnPTn

→ 0 with PTn =√nT

2−γ2 , under the model (26) and (27),

regp→ reg, and r

eg

p→ reg.

17

When we include observations from the explosive regimes, the signals from explosiveroots dominate those from non-explosive regimes (unit root or/and stationary regimes).In this case, we can show that the statistics diverge at the rate of Op (PTn). When thecritical value cvβTn increases at a slower speed than PTn, we can obtain the consistencyof the origination estimate.

Remark 4.6 From practical implementation, we set the critical value sequences vβTn ac-cording to a rule of thumb such as cvβTn = 2

3 log log2 nT . The critical value diverges at aslower speed than PTn. Illustrated by our numerical experiment, there is only a small riskof choosing the explosive alternative using this critical value.

4.5 Estimate of K

We show the consistency of the estimator on K0 using the BIC.

Theorem 4.8 Suppose Assumptions 1 and 2 hold, T 2−2γ = o (n) and 313 < γ < 2

3 . When(n, T )→∞,

Kp→ K0

where K is defined in (20).

Remark 4.7 In the above analysis, we derive all the results, assuming that the numberof groups K0 is known. In practice, the researcher has to determine the K0 from databefore conducting inferences because when K0 deviates from the true value, we are unableto study the asymptotic properties of the post-classification estimates. For this reason, inthis subsection and previous discussions, we propose to determine the group number usingthe BIC-type function.

5 Monte Carlo Studies

We design several Monte Carlo experiments to check the finite sample performance ofthe recursive k-means algorithm and the modified k-means classification algorithm. Thenumber of replications is always set to 1,000.

5.1 Recursive k-means algorithm

To verify the classification accuracy of the recursive k-means classification, we consideronly a DGP on explosive roots. We simulate µi ∼ N(0, 1) and ui

iid∼ N(0, 2) over i andt. Data are simulated from a DGP with three groups (K0 = 3) with n1 : n2 : n3 = 1

3 :13 : 1

3 . The following settings are considered (DGP 1): n = 30, 60, T = 25, 50, 100,c = 0.2, 0.9, 1.6, γ = 0.5.

First, we check the performance of the BIC for estimating the number of groups. Table3 reports the relative frequency that K (K = 1, 2, 3, 4, 5) is selected by the recursive k-means classification algorithm. When T is larger than 100, the BIC can successfully choosethe true group number.

18

DGP 1N T K=1 K=2 K=3 K=4 K=530 25 0 0.01 0.92 0.05 0.0230 50 0 0 0.97 0.02 0.0130 100 0 0 1 0 060 25 0 0 0.94 0.04 0.0260 50 0 0 0.98 0.01 0.0160 100 0 0 1 0 0

Table 3: Empirical Frequency of BIC by Applying Recursive k-means under DGP 1

Second, with correctly selected group numbers, we show the classification consistency ofthe recursive k-means procedure and provide the numerical results of its post-classificationestimator. Tables 4 shows the classification errors, RMSE, asymptotic bias, and probabilitycoverage of the post-classification estimator and its oracle estimator, the infeasible withinestimator when informed of the true group membership. Here, we provide several criticaldefinitions before detailed discussions. The classification error follows,

1

n

K0∑g=1

∑i∈Gg

1{gi 6= g0

i

}.

The RMSE is the square root of mean-squared error on the post-classification estima-tor. Bias is the averaged difference between our estimator and the true parameter. Theprobability coverage follows,

1

n

K0∑g=1

∑i∈Gg

1

∣∣∣∣∣∣ ρi − ρ

0i

σ

√√√√ n∑i=1

T∑t=1

(yi,t−1)2

∣∣∣∣∣∣ < 1.96

,

where σ is the proposed consistent estimator for the standard deviations of the errorprocess. As one comparison, we show the results for the oracle within estimator. Classifi-cation errors and probability coverage of oracle estimators similarly follow the abovemen-tioned definitions. We only replace Gg with G0

g, since we can observe the true membership.Tables 4 shows extensive discussions on explosive roots. As shown here, the classifica-

tion error approaches zero as the time horizon increases, and the RMSE and bias of theoracle estimator are smaller than the post-classification estimator. For post-classificationestimators, the RMSE and bias generally decrease when T → ∞. With T ≥ 100, theasymptotic difference between the oracle estimator and the post-classification estimator isalmost negligible. The diminishing distance verifies the asymptotic equivalence betweenthese two estimators. The diminishing distance is due to the uniform consistency of ourrecursive k-means classification technology. The recursive k-means induces much betterfinite sample performance than the estimation and classification results of the modifiedk-means algorithm shown later. This finding is not surprising, as the increase of the cross-sectional dimension helps improve the accuracy of the recursive algorithm while magnifying

19

DGP 1 Post-Classification OracleN T Cluster Error RMSE Bias Coverage RMSE Bias Coverage

Group 1 30 25 0.0436 0.176 -0.0081 0.874 0.1031 -0.0363 0.866330 50 0.0081 0.0886 -0.0151 0.938 0.0458 -0.0149 0.933830 100 0.0018 0.1162 -0.0157 0.93 0.024 -0.0069 0.929960 25 0.0451 0.1616 0.0065 0.886 0.0697 -0.024 0.874360 50 0.0073 0.0574 -0.006 0.954 0.0269 -0.009 0.948560 100 0.0006 0.0665 -0.0073 0.97 0.0143 -0.0044 0.968

Group 2 30 25 0.0436 0.1515 -0.0286 0.934 0.0095 -0.0022 0.932630 50 0.0081 0.0505 -0.0033 0.94 0.0011 -0.0001 0.943930 100 0.0018 0.0001 0 0.94 0.0001 0 0.9460 25 0.0451 0.1528 -0.0298 0.914 0.0068 -0.0021 0.912160 50 0.0073 0.0516 -0.0034 0.936 0.0008 -0.0001 0.934960 100 0.0006 0 0 0.944 0 0 0.944

Group 3 30 25 0.0436 0.0005 -0.0001 0.954 0.0005 -0.0001 0.95430 50 0.0081 0 0 0.954 0 0 0.95430 100 0.0018 0 0 0.952 0 0 0.95260 25 0.0451 0.0004 -0.0001 0.924 0.0004 -0.0001 0.92460 50 0.0073 0 0 0.944 0 0 0.94460 100 0.0006 0 0 0.954 0 0 0.954

Table 4: Classification and Estimation by Recursive k-means Algorithm under DGP 1

the classification error of the modified k-means approach. The main explanation for thisphenomenon is that once we have larger n, the uniform convergence rate of individualleast squares estimators slows down.

Last, we evaluate the performance of explosive detection statistics. The nominal levelis set to 5%. We evaluate these tests with the correct number of groups of K0 = 3. Todemonstrate the superiority over the right-tailed DF test, we choose n = 30, 60, 90, andT = 25, 50, 100. With three groups, we assume π1 = π2 = π3 = 1

3 . We present thesimulated results in Tables 5.

The sizes of panel explosiveness tests are well controlled around nominal levels. Thepower of the test is greatly improved by the additional degree of cross-sectional asymptot-ics, n, when compared with the DF statistic. The power improvement corresponds to ouraim to create power-enhanced statistics for explosiveness detection. Based on the panelt-statistics, we can construct a recursive real-time algorithm, like PWY.

20

n Statistics Sizec1=0(c2=1.5, c3=3),γ=0.5 T=25 T=50 T=1001 DF 0.07 0.054 0.068

10(30) Post-Classification 0.048 0.044 0.05420(60) Post-Classification 0.068 0.52 0.04830(90) Post-Classification 0.066 0.056 0.05n Statistics Power

c1=0.3(c2 = 1.5, c3 = 3),γ=0.5 T=25 T=50 T=1001 DF 0.236 0.384 0.514

10(30) Post-Classification 0.84 0.978 0.99820(60) Post-Classification 0.984 1 130(90) Post-Classification 0.994 1 1n Statistics Powerc1=0.5(c2=1.5, c3=3),γ=0.5 T=25 T=50 T=1001 DF 0.492 0.63 0.816

10(30) Post-Classification 0.998 1 120(60) Post-Classification 1 1 130(90) Post-Classification 1 1 1

Table 5: Sizes and Powers of DF and Panel tests on Explosive Roots Detection (Recursivek-means, c1=0, 0.3, 0.5; c2=1.5, c3=3; γ=0.5)

5.2 Modified k-means classification

To verify the classification accuracy of the modified k-means classification, we consider aDGP on both explosive and stationary roots. We simulate µi ∼ N(0, 1) and ui

iid∼ N(0, 2)

over i and t. Data are simulated from a DGP with three groups (K0 = 3) with n1 : n2 :

n3 = 13 : 1

3 : 13 . The following settings are considered (DGP 2): n = 30, 60, T = 25, 50,

100, c = −1, 0.5, 1.5, γ = 0.5.The consistency of classifications and estimations relies heavily on correctly determin-

ing the exact group numbers, and the following numerical simulations verify the asymptoticbehaviour of our proposed BIC procedure. We pick up the set of K that minimizes theBIC objective function. Tables 6 demonstrates the performance of the identical infor-mation criterion under the modified k-means classification algorithm. Moreover, Table6 presents cases accommodating both explosive roots and stationary roots. Under themixed roots case, we observe rare selection errors in choosing group numbers when timeobservations are 50. When T is larger than 100, the BIC objective function can select thereal group number with no classification errors. This result is not surprising since adjust-ment rates for stationary and explosive roots are different and provide a better referencefor differentiation.

Second, with correctly selected group numbers, we next show the classification consis-tency of the modified k-means procedure and provide the numerical results of the post-classification estimator based on the modified k-means procedure. Tables 7 shows the clas-

21

DGPN T K=1 K=2 K=3 K=4 K=530 25 0 0 0.997 0.002 0.00130 50 0 0 0.999 0.001 030 100 0 0 1 0 060 25 0 0.001 0.998 0.001 060 50 0 0 1 0 060 100 0 0 1 0 0

Table 6: Empirical Frequency of BIC by Applying Modified k-means

sification errors, RMSE, asymptotic bias, and probability coverage of the post-classificationestimator and its oracle estimator.

DGP 2 Post-Classification OracleN T Cluster Error RMSE Bias Coverage RMSE Bias Coverage

Group 1 30 25 0.113 0.5056 -0.0372 0.838 0.2378 -0.0312 0.809130 50 0.0944 0.3712 -0.028 0.878 0.1991 -0.0284 0.869230 100 0.0841 0.2867 -0.0197 0.874 0.1769 -0.0026 0.879760 25 0.1288 0,435 -0.0059 0.766 0.16 -0.0049 0.755760 50 0.1183 0.3493 -0.021 0.806 0.1382 -0.0173 0.809960 100 0.0935 0.2653 -0.0204 0.834 0.1284 -0.0048 0.8523

Group 2 30 25 0.113 1.3086 0.1195 0.764 0.0138 -0.0021 0.734230 50 0.0944 1.3991 0.1288 0.768 0.0029 -0.0002 0.781430 100 0.0841 1.0227 0.213 0.792 0.001 -0.0001 0.810960 25 0,1288 1.1999 0.2093 0.723 0.0094 -0.0012 0.711760 50 0.1183 1.3589 0.2134 0.742 0.0019 -0.0001 0.734360 100 0.0935 0.87 0.2523 0.784 0.0006 0 0.7952

Group 3 30 25 0.113 4.0159 -1.7779 0.942 0 0 0.94230 50 0.0944 5.1474 -2.208 0.95 0 0 0.9530 100 0.0841 5.9771 -2.5076 0.952 0 0 0.95260 25 0.1288 4.3123 -2.0501 0.948 0 0 0.94860 50 0.1183 5.755 -2.76 0.976 0 0 0.97660 100 0.0935 6.1775 -2.6785 0.964 0 0 0.964

Table 7: Classification and Estimation by Modified k-means Algorithm under DGP 2

Tables 7 provides results on mixed-root cases, in which both explosive and stationaryroots appear. As illustrated in the numerical simulations, the classification errors ap-proach zero as the time horizon increases, and the RMSE and bias of the oracle estimatorare much smaller than those of the post-classification estimator based on the modifiedk-means classifications. For post-classification estimators, the RMSE and bias generallydecrease and get closer to the RMSE and bias of the oracle estimators as T → ∞. Thedecreasing RMSE and bias demonstrate that as the time horizon diverges, the asymptoticdifference between the post-classification and oracle estimators is asymptotically diminish-

22

ing. The diminishing distance is due to the uniform consistency of our modified k-meansclassification technology. Unfortunately, since classification errors of the modified k-meansalgorithm diminish at a slower rate than the recursive k-means algorithm, the RMSE andbias of post-classification estimators are reduced more slowly. Therefore the misclassifica-tion errors contribute to the low probability coverages in Group 1 and 2.

Next, we investigate the performance of the panel t-statistics to detect the existenceof explosive roots. The significance level is set to 5%. We evaluate these tests with thecorrect number of groups of K0 = 3. We examine the performance of the proposed testsunder two mixture cases of both explosive and stationary roots. The sample sizes overthe cross-sectional dimension and time horizon are chosen as n = 30, 60, 90, and T = 25,50, 100, respectively. With three groups, we assume π1 = π2 = π3 = 1

3 . We present thedetailed results in Tables 8.

n Statistics Sizec1=0(c2=-1,c3=1.5), γ=0.5 T=25 T=50 T=1001 DF 0.066 0.054 0.044

10(30) Post-Classification 0.074 0.064 0.06820(60) Post-Classification 0.68 0.058 0.0630(90) Post-Classification 0.61 0.05 0.056n Statistics Power

c1=0.3(c2=-1,c3=1.5), γ=0.5 T=25 T=50 T=1001 DF 0.33 0.566 0.774

10(30) Post-Classification 0.966 0.996 120(60) Post-Classification 0.998 1 130(90) Post-Classification 1 1 1n Statistics Power

c1=0.5(c2=-1,c3=1.5), γ=0.5 T=25 T=50 T=1001 DF 0.64 0.828 0.918

10(30) Post-Classification 0.998 1 120(60) Post-Classification 1 1 130(90) Post-Classification 1 1 1

Table 8: Sizes and Powers of DF and Panel Tests on Explosive Roots Detection(Modifiedk-means, c1=0, 0.3, 0.5; c2=-1; c3=1.5; γ=0.5)

The sizes of the proposed panel t-test are well controlled under mixed roots. Asillustrated here, empirical rejection frequency under the null hypothesis is very close tothe nominal level in the finite sample. The most interesting observations are the significantpower improvements brought by an additional degree of cross-sectional asymptotics. Forexample, in Tables 8, under the alternative hypothesis of the distance parameter as smallas 0.3, the rejection rate of the panel t-test is almost unity when n = 50, T = 25.However, the counterpart, the DF test, can detect explosive roots only with around a 50%chance. The power deficiency shows the high priority of the panel t-test on explosivenessidentifications.

23

5.3 Detection of bubble origination

This subsection reports some brief simulations examining the performance of the datingestimation procedure and the accuracy of the asymptotic theory. We employ the panelrecursive dating estimator reg based on tg for g = 1, 2, ...,K0. We also conduct the time-

series recursive estimate re proposed in PWY. We simulate µi ∼ N(0, 1) and uiiid∼ N(0, 2)

over i and t. Data are simulated from a DGP with three groups (K0 = 3) with n1 : n2 :

n3 = 13 : 1

3 : 13 . The following settings are considered: n = 60, T = 50, 100, 150; c1 = 0,

c2 = −1, c3 = −2.5, γ = 0.5 when r ≤ reg (:= 0.5); c1 = 0.2, 0.4, 0.6, 0.8, c2 = −1,c3 = −2.5, γ = 0.5 when r > reg (:= 0.5) . We use critical values as 2

3 log log2 (nT ) for thepanel recursive algorithm and 2

3 log log2(T ) for PWY procedure.

(c2=-1, c3=-2.5) c1=0.2 c2=0.4 c3=0.6 c4=0.8N=30, T=50Detector Panel TS Panel TS Panel TS Panel TSMean 0.5385 0.5929 0.5315 0.5746 0.5261 0.5644 0.5229 0.5565Std 0.0088 0.0567 0.0071 0.0419 0.0066 0.0372 0.006 0.034RMSE 0.6834 1.8841 0.5594 1.4822 0.4667 1.2879 0.4096 1.1414



Table 9: Results on estimations of bubble origination date based on panel recursive algo-rithm and time-series recursive algorithm. Innovations are normal distributions. Dimen-sion of cross-sectional units is 60. Time horizons are 50, 100 and 150

Tables 9 report results for both the panel recursive algorithm and the time-seriesrecursive algorithm, giving means, standard errors, and RMSE for reg. We can observethe following four patterns. First, the panel recursive estimate reg can estimate the truebubble origination date with high accuracy, reflected by small bias and a small standarderror. When T = 150, for cases, the mean of reg is very close to 0.5, the true value, withsmall standard errors. Second, the panel recursive estimate reg converges to the true valuefaster than the time-series estimate re. For almost all cases, the means of reg are closer to0.5 with smaller standard errors and RMSE. Especially when the bubble signal is small(c1 = 0.5) or the bubble period is short (T = 50), the bias of reg is much smaller than

24

0.04 while the bias of re can be bigger than 0.09. Third, when the explosive signal (c1) isstronger, it is easier to estimate the true origination date for both algorithms. In this case,bias, standard error, and RMSE of both recursive estimates are smaller. Fourth, when thesample size increases, it is easier to estimate re. Both the bias and standard errors becomesmaller, corroborating the consistency results. Specifically, the consistency of the panelrecursive estimate benefits from the increase in both n and T . However, the time-seriesrecursive estimate benefits only from the increase in T .

6 Empirical Illustrations

We apply our analysis to the ratio between the monthly real estate price index (PI) andthe monthly urban disposable income index for China’s large and medium-sized cities. Thehousing data are obtained from Fang et al. (2016). Fang et al. (2016) calculate the urbandisposable income index based on the urban disposable income per capita (10,000 yuan).We use the urban disposable income index to approximate the housing rent (fundamentalvalues). Our sample covers the period from March 2003 to March 2013. It covers 113cities, each of which has 123 monthly observations.

Chen and Wen (2017) argue that China’s housing prices have been snowballing overthe past two decades and suggest the existence of housing bubbles due to the high capitalreturns driven by resource misallocations. But until now, there is no rigorous justificationfor such a hypothesis. We, therefore, apply our panel procedure to the data set from Fanget al. (2016) and try to provide a reliable answer. As various nonstationary phenomenacan exist in the empirical data, we apply our procedures based on the modified k-meansclassifications.

K 1 2 3 4 5 6BIC 8.6493 8.2803 7.9571 7.782 7.7884 7.7937

Table 10: BIC for Group Number

Table 10 reports the BIC values for six different values ofK. The BIC suggestsK0 = 4.The modified k-means algorithm provides the classified group structure as in Table 11.Figure 1 gives the time series plots for each group.

For each identified group, we obtain the panel within estimator of ρ and calculate thet-statistic for the unit root null hypothesis. Table 6 reports the post-classification estimateand the full-sample t-statistics for explosive detection for each group. Two groups, namelyGroup 2 and Group 4, have the AR coeffi cient larger than one, suggesting evidence ofbubble in these two groups. Group 2 has a slightly stronger explosive root than Group3. Two groups, namely Group 1 and Group 3, have the AR coeffi cient smaller than one,suggesting evidence of stationarity in these two groups. Group 4 has a much smaller rootthan Group 1. Furthermore, we apply the panel recursive estimates of bubble originationdates to Groups 1-4. We present the results in Figure 2. For Groups 1 and 3, the statisticsare below the critical value, illustrating that we cannot reject the null hypothesis of unit

25

Group 1 Anshan, Baoding, Changzhou, Chengdu, Chuzhou, Dalian, Dongguan, Fos-han, Fuzhou, Hangzhou, Harbin, Hefei, Heyuan, Huaian, Huizhou, Huzhou,Jiaxing, Jieyang, Jinan, Jinhua, Jiujiang, Kaifeng, Kunming, Langfang, Le-shan, Luoyang, Mianyang, Nanjing, Nanning, Nantong, Nanyang, Ningbo,Qingdao, Qingyuan, Qinhuangdao, Quanzhou, Rizhao, Shanghai, Shaox-ing, Shijiazhuang, Songyuan, Suzhou, Tangshan, Tianjin, Tieling, Wenzhou,Wuhan, Wuhu, Wuxi, Xiamen, Xi’an, Xingtai, Xuancheng, Yingkou, Za-ozhuang, Zhaoqing, Zhenjiang, Zhongshan;

Group 2 Anqing, Baotou, Beijing, Bengbu, Chaoyang, Changchun, Changde,Changji, Changsha, Chongqing, Dandong, Deyang, Fuzhou1, Guangzhou,Haikou, Hohhot, Jiangmen, Jiangyan, Jingdezhen, Luzhou, Luohe, Nan-chang, Nanchong, Ningde, Pingxiang, Shangrao, Shantou, Shanwei,Shaoguan, Shenyang, Shenzhen, Suqian, Taizhou, Urumq, Wuludao, Xining,Xinxiang, Xinyu, Xuchang, Xuzhou, Yancheng, Yangzhou, Yichun, Zhangji-akou, Zhangzhou, Zhengzhou, Zhumadian;

Group 3 Huangshan, Puyang;Group 4 Dezhou, Jilin, Lianyungang, Nanyang, Xilingol, Yangjiang.

Table 11: Group Structure of Housing Prices on China’s Large and Medium-Sized Cities

Membership G1 G2 G3 G4ρi 0.9962 1.0068 0.881 1.0017

t-statistic -271.097 486.4522 -28.9456 3.2209

Table 12: Estimate of ρ and t-statistic for each group

26

Figure 1: Classified Membership Using Modified K-means Algorithm

root behaviour. In Group 2, the supreme statistics are larger than the critical value afterSeptember 2009. In Group 4, the supreme statistics are larger than the critical value afterFebruary 2011. These observations show the existence of bubbles and the starting datesof bubbles at the same time.

To demonstrate the robustness of post-classification procedures, we apply the time-series recursive algorithm in PWY. We select four cities from Group 1, four cities fromGroup 2, two cities from Group 3, and two cities from Group 4. We plot the sequences ofsupreme statistics for the selected cities in Figures 3—5. The time-series supreme statisticsverify the existence of bubbles in Groups 2 and 4, coinciding with the panel recursivealgorithms. The above analysis helps demonstrate the existence of explosive roots inGroup 2 and 4 using PWY. All in all, PWY demonstrate the robustness of our panelprocedures.

Moreover, our panel procedure illustrates the existence of a housing bubble as a rationalbubble component and corresponds the founding in Chen and Wen (2017)

27

Figure 2: Panel Recursive Bubble Detector for Each Group

Figure 3: Robustness Check for Group 1

28

Figure 4: Robustness Check for Group 2

Figure 5: Robustness Check for Group 3 and 4

29

7 Conclusions

Explosiveness in time series is related to asset price bubbles in economics. That is whyright-tailed unit root tests have been widely used to detect asset price bubbles in theliterature. For example, PWY develop the sup ADF statistic for the presence of bubblesand the sequential algorithm for real-time dating of the origination of a bubble. Thisprocedure relies on the recursive right-tailed unit root tests. PSY generalize the methodsof PWY to deal with multiple bubble episodes. Like PWY, PSY only use a single timeseries. We show that when the bubble duration is short or when the bubble grows slowly,these tests have low power to identify bubbles.

In practice, panel data may be available where multiple time series have explosivebehaviour. In this paper, we propose to use panel data models to improve the powerin bubble detection. When there is homogeneity over cross-sectional units within groups,pooling cross-sectional data within the same group should sharpen the statistical inferenceson the common explosive root.

However, it may be too strong to assume all of the time series in the panel data to havethe same explosive root. In many applications, it may be too strong to consider all of thetime series in the panel data to be explosive. That is why, in this paper, we introduce twopanel models with a latent group structure. In the first model, we consider all of the timeseries are explosive, allowing individual heterogeneity through latent group structures. Inthe second model, we assume time series have mixed roots, some with stationary roots,some unit roots, and some explosive roots, again allowing individual heterogeneity throughlatent group structures. We propose a two-stage algorithm to detect the presence of theexplosive behaviour and to estimate the origination date of the explosive period. In the firststage, we apply the k-means classifications to recover group structures. In the second stage,we establish the post-classification estimates and tests based on the estimated groups.

Both model specifications are in the form of panel autoregressions, where we modelindividual heterogeneity through latent group structures. The group patterns representthe homogeneous slope coeffi cient within the identical group and heterogeneous autore-gression coeffi cients across different groups. Under explosive panel autoregressions withlatent group structures, we apply the recursive k-means classification and illustrate theconsistency of the group clustering algorithm. Similarly, within the mixed-root panel au-toregressions, a modified k-means clustering algorithm is implemented with consistency.Therefore, in the first stage of the computing algorithm, we successfully recover the groupidentities.

With estimated group structures, we can furthermore build post-classification esti-mates and inferences in the second stage. The post-classification estimators based onboth k-means algorithms are asymptotically equivalent to the oracle estimates. Based onpost-classification estimates, we offer two consistent t-tests on explosiveness detections.By applying the recursive t-statistics, we demonstrate a consistent estimate for bubbleorigination dates. Compared to PWY, the additional asymptotics can provide betterasymptotic behaviours.

30

It is possible to extend our model specifications. For example, an empirically restrictiveassumption for the error processes is martingale difference sequences with cross-sectionalindependence. For another example, allowing cross-sectionally and intertemporally de-pendent noise is more realistic in empirical analysis. These extensions will be consideredin future work.

8 Appendix

We collect technical proofs for classifications, estimations and inferences on the recursivek-means classifications and the modified k-means classifications.

We denote gi := gi (c) as any k-means classification estimates for g0i for each i =

1, 2, ...n. To demonstrate consistency of the recursive k-means classifications, we intend toestablish the consistency of c in terms of the Hausdorff distance. The Hausdorff distancemeasures how far two compact subsets of a metric space are separated from each other:

dH(a, b) =

{max

g∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(ag − bg

)2), maxg∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(ag − bg

))2}.

Lemma 8.1 If Assumption 1 and 2 hold,

sup(c,δ)∈Φn×∆K0

T 2γ∣∣∣QnT (c, δ)− QnT (c, δ)

∣∣∣ = op(1),

where

QnT (c, δ) :=1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

(yit − yi,t−1ρgin

)2,

and

QnT (c, δ) :=1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

(yi,t−1ρgin

)2+

1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

u2it,

and {ρi}ni=1 are the collection of individual least squares estimates and ρgin := exp( cginT γ

).

The Proof of Lemma 8.1: Define ρ0g0i n

:= exp(c0g0i n/T γ

)and ρgin := exp

( cginT γ

).

Observe the following argument as,

T 2γ[QnT (c, δ)− QnT (c, δ)

]=

2

n

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uit

(ρ0g0i n− ρgin

)=

2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uit

(c0g0i n− cgin

)=

2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uitc0g0i n

− 2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uitcgin.

31

We have

2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uitc0g0i n

=K0∑g=1

1{g0i = g

} 1

n

n∑i=1

1

ρ2Ti T γ

T∑t=1

yi,t−1uitc0gn.

For any g ∈{

1, 2, ...,K0}, we then have

1{g0i = g

} 1

n

n∑i=1

1

ρ2Ti T γ

T∑t=1

yi,t−1uitcgn ≤∣∣∣c0gn

∣∣∣1{g0i = g

} 1

n

n∑i=1

1

ρ2Ti T γ

T∑t=1

yi,t−1uit

= Op

(1

ρTgn√n

),

since∣∣cgn∣∣ ≤ c due to the compact support of distance parameters. Define ρgn :=

exp( cgnT γ

). Therefore

2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uitc0g0i n

= Op

(1

ρT√n

), (28)

where we define ρ := exp( cT γ

).

Similar argument can be applied to the term 2nT γ

∑ni=1

1ρ2Ti

∑Tt=1 yi,t−1uitcgin and we

can justify

2

nT γ

n∑i=1

1

ρ2Ti

T∑t=1

yi,t−1uitcgin = Op

(1

ρT√n

). (29)

By combining above results (28) and (29), we have

sup(c,δ)∈Φn×∆K0

T 2γ∣∣∣QnT (c, δ)− QnT (c, δ)

∣∣∣ = Op

(1

ρT√n

)= op(1).

�

Lemma 8.2 (Estimation Consistency) Assumption 1 and 2 hold, and c0i > 0 for each i.

Under (n, T )→∞,dH(c0, c)

p→ 0.

By Lemma 8.2, there is one permutation τ : {1, 2, ...K0} → {1, 2, ...,K0} such thatparameter estimates converge to the true values,(

cτ(g)n − c0gn

)2 p→ 0.

By relabelling c, we take τ(g) = g. For any η > 0, we define

Nη :={cn ∈ [c, c]K

0

:(c0gn − cgn

)2< η,∀g ∈

{1, 2, ...,K0

}}. (30)

32

Let gi(c) = arg ming∈{1,2,...,K0}∑T

t=1


(cgnT γ

))2. After verifying the consis-

tency of c for c0, we provide the individual and uniform consistency of recursive k-meansclassifications in the following theorems.

Proof of Lemma 8.2: Define ρ0g0i n

:= exp(c0g0i n/T γ

)and ρgin := exp

( cginT γ

). Observe

the following argument as,

QnT (c, δ) = QnT (c, δ) + op(T−2γ) ≤ QnT (c0, δ0) + op(T

−2γ) = QnT (c0, δ0) + op(T−2γ),

where the equalities come from Lemma 8.1. Because QnT (c, δ) is minimized at c = c0 andδ = δ0, we have

QnT (c, δ)− QnT (c0, δ0) = op(T−2γ).

On the other hand, for any c, we have

op(T−2γ) = QnT (c, δ)− QnT (c0, δ0)

=1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

(yi,t−1(ρ0

g0i n− ρgin)

)2

= (ρ0gn − ρgn)2

[1

n

n∑i=1

1

T 2γρ2Ti

T∑t=1

y2i,t−1

]

≥ (ρ0gn − ρgn)2 σ

2

4c2 + op

(1

T 2γ

)≥ σ2

4c2

1

T 2γmin

g∈{1,2,...,K0}

(c0gn − cgn

)2+ op

(1

T 2γ

)≥ σ2

4c2

1

T 2γmax

g∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(c0gn − cgn

)2)+ op

(1

T 2γ

).

Note that σ2

4c2is bounded away from zeros by Assumption 1 and 2. As a result, we have

maxg∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(ρ0gn − ρgn

)2)= op

(T−2γ

), (31)

or

maxg∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(c0gn − cgn

)2)= op (1) . (32)

Letτ(g) = arg min

g∈{1,2,...,K0}

(c0gn − cgn

)2.

Then we have for g 6= g,

T 2γ

(1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

y2i,t−1(ρτ(g)n − ρτ(g)n)2

) 12

≥ T 2γ

(1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

y2i,t−1(ρ0

gn − ρ0gn)2

) 12

33

−T 2γ

(1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

y2i,t−1(ρτ(g)n − ρ0

gn)2

) 12

−T 2γ

(1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

y2i,t−1(ρτ(g)n − ρ0

gn)2

) 12

. (33)

The first term on the right hand side of (33) is bounded away from zero since

1

nT 2γ

n∑i=1

1

ρ2Ti

T∑t=1

y2i,t−1 ≥

σ2

4c2 + op

(1

T 2γ

),

and (c0gn − c0

gn)2≥ (c∗)2 > 0 as g 6= g. Due to (31) or (32), we have the 2nd and 3rd term of(33) are op(1). Therefore, we have τ(g) 6= τ(g) with probability approaching one. We notethat asymptotically τ is not only an onto mapping but also one-to-one mapping. Hence τhas the inverse mapping denoted as τ−1. Then we have

ming∈{1,2,...,K0}

(ρ0gn − ρgn

)2 ≥ (ρ0τ−1(g)n − ρgn

)2= min

h∈{1,2,...,K0}

(ρ0τ−1(h)n − ρhn

)2= op

(T−2γ

),

where the last equality comes from (31) or (32). So we have

maxg∈{1,2,...,K0}

(min

g∈{1,2,...,K0}

(ρ0gn − ρgn

)2)= op(T

−2γ). (34)

In all, by combining the results for two terms (31) and (34), we complete the proof. �

Lemma 8.3 If Assumption 1 and 2 hold. For any M > 0,

max1≤i≤n

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥M)

= o

(1

n

)The Proof of Lemma 8.3: Due to the uniform dominance of innovations over the

fixed effect in (3), we consider the following decomposition as,

1

ρ2Ti T γ

T∑t=1

yi,t−1uit =1

ρ2Ti T γ

T∑t=1

yi,t−1uit −1

ρ2Ti T γ−1

yi,−1ui. (35)

It is equivalent to consider the decomposition of (35) in the model (3) with µi = 0 for alli = 1, 2, ..., n. Fix M > 0. For the first term of (35), we apply the Markov inequality andderive the following

n max1≤i≤n

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥M2)≤ n max

1≤i≤n

4E(∑T

t=1 yi,t−1uit

)2

ρ4Ti T 2γM2

= n max1≤i≤n

4σ4

ρ2Ti M2c2

i

≤ 4σ4

ρ2TM2c2

34

= O

(n

ρ2T

)= o(1), (36)

with the dominance of exponential rates. Define ρ := exp( cT γ

). For the second term of

(35), Markov inequality is applied as,

n max1≤i≤n

Pr

(1

ρ2Ti T γ−1

∣∣yi,−1ui∣∣ ≥M

2

)≤ n max

1≤i≤n

E(yi,−1ui

)ρ2Ti T γ−1

≤ n max1≤i≤n

[4σ2

ρTi T1−γc2

iM2

+ o

(1

ρTi T1−γ

)]= O

(n

T 1−γρT

)= o(1), (37)

with the dominance of exponential rates. The second last equality of (37) is due to thefollowing fact,

E(yi,−1ui

)= E

(1

T 2

T∑t=1

t−1∑s=1

(t−1∑l=1

ρt−1−li uil

)uis

)= E

(1

T 2

T∑t=1

t−1∑l=1

ρt−1−li u2

il

)

=σ2ρTi

ρiT2−2γc2

i

− σ2

ρiT2−2γc2

i

− σ2

T 2−2γci.

By combining above results (36) and (37), we derive

max1≤i≤n

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥M)

= o

(1

n

).

�

Lemma 8.4 If Assumption 1 and 2 hold. For arbitrary M ≥ 5σ2

2c2,

max1≤i≤n

Pr

(1

ρ2Ti T γ

T∑t=1

y2i,t−1 ≥ M

)= o

(1

n

)The Proof of Lemma 8.4: Due to the uniform dominance of innovations over the

fixed effect in (3), we consider the following decomposition as,

1

ρ2Ti T 2γ

T∑t=1

y2i,t−1 =

1

2ci

{ρ−2Ti

T γy2i,T −

2ρ−2T+1i

T γ

T∑t=1

yi,t−1uit −ρ−2Ti

T γ

T∑t=1

u2it

}− 1

ρ2Ti T 2γ−1

y2i,−1.

It is equivalent to consider the decomposition in (3) with µi = 0 for all i = 1, 2, ..., n.Therefore we can derive the following uniform upper bound as

n max1≤i≤n

Pr

(1

ρ2Ti T 2γ

T∑t=1

y2i,t−1 ≥ M

)≤ n max

1≤i≤nPr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ(y2i,T − Ey2

i,T

)∣∣∣∣∣ ≥ M

5

)

35

+n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

2ρ−2T+1i

T γ

T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ M

5

)

+n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ

T∑t=1

u2it

∣∣∣∣∣ ≥ M

5

)

+n max1≤i≤n

Pr

(∣∣∣∣ 1

2ci

1

ρ2Ti T 2γ−1

y2i,−1

∣∣∣∣ ≥ M

5

)

+n max1≤i≤n

Pr

(1

2ci

ρ−2Ti

T γEy2

i,T ≥M

5

). (38)

For the first term of (38), we can bound this term using the exponential inequalityillustrated in Freedman (1975). By the martingale difference error process, we have

n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ(y2i,T − Ey2

i,T

)∣∣∣∣∣ ≥ M

5

)≤ 2n max

1≤i≤nPr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ

T∑t=1

ρ2(T−t)i

(u2it − Eu2

it

)∣∣∣∣∣ ≥ M

10

).

(39)Note the fact that for each i = 1, 2, ...n, {u2

it−Eu2it}Tt=1 is a martingale difference sequence

adaptive to the filtration Fi,t−1 := σ{ui,t−1, ui,t−2, ui,t−3, ...} as E{u2it − Eu2

it|Fi,t−1} = 0

∀t = 1, 2, ...T. Define zit := ρ2(T−t)i

(u2it − Eu2

it

)=: z1it + z2it − E{z2it|Fi,t−1} with z1it :=

zit1it − E{zit1it|Fi,t−1}, 1it := 1{|zit| ≤ θi,nT } with θi,nT := ρ2Ti T

14n

14 , 1it := 1− 1it, and

z2it := zit1it. Let ViT :=∑T

t=1E[z2

1it

]and vi,nT := ρ4T

i T1+γ2√n as one truncation of ViT .

E[V 2iT

]= E

(T∑t=1

E[z2

1it|Fi,t−1

])2

≤ TT∑t=1

E[{E[z2

1i,t−1|Fi,t−1

]}2]≤ T

T∑t=1

E |z1it|4

≤ 16TT∑t=1

E |zit|4 ≤ DT 1+γρ8Ti ,

where some positive constant D > 0. For the term related to z1it, with any constant d > 0,

n max1≤i≤n

Pr

(∣∣∣∣∣ρ−2Ti

T γ

T∑t=1

z1it

∣∣∣∣∣ ≥ d)

= n max1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ T γρ2Ti d, ViT ≤ vi,nT

)+ n max

1≤i≤nPr (ViT > vi,nT ) (40)

≤ max1≤i≤n

exp

(−ρ4T

i T 2γd2 + 2 (log n) vi,nT + 2 (log n) ρ2Ti T γdθi,nT

2vi,nT + 2ρ2Ti T γdθi,nT

)+ max

1≤i≤no(nv−2

i,nTT1+γρ8T

i

).

The arguments for terms related to z2it and E{z2it|Fi,t−1} are identical. Without losinggenerality, we just discuss z2it.The uniform upper bound of z2it is as,

n max1≤i≤n

Pr

(∣∣∣∣∣ρ−2Ti

T γ

T∑t=1

z2it

∣∣∣∣∣ ≥ d)≤ n max

1≤i≤nPr

(max

1≤t≤T|zit| ≥ θi,nT

)

36

≤ nT

θ4i,nT

max1≤i≤n

max1≤t≤T

E[|zit|4 1{|zit| ≥ θi,nT }

]= max

1≤i≤no(θ−4i,nTTnρ

8Ti

), (41)

where the first inequality is due to the more relaxed restrictions of truncations and thesecond inequality is because of the Markov inequality. To make sure the three terms inboth (40) and (41) are asymptotically negligible, we need the following restrictions: Foreach i = 1, 2, ..., n,

ρ4Ti T 2γ/ log n >> vi,nT , ρ

2Ti T γ/ log n >> θi,nT , vi,nT >> T

1+γ2 ρ8T

i

√n, θi,nT >> ρ2T

i T14n

14 .

Therefore the following rate restrictions are on request as, for each i = 1, 2, ..., n,

T3γ−12 >>

√n/ log n, ρTT γ−

14 >> n

14 log(n). (42)

Rate restrictions (42) are ensured by Assumption 1 and the dominance of exponentialrates. Therefore the choice of d > 0 is arbitrary, and we need M to be any positive valueto make sure

n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ(y2i,T − Ey2

i,T

)∣∣∣∣∣ ≥ M

5

).

For the second term of (38), the Markov inequality is applied as,

n max1≤i≤n

Pr

(∣∣∣∣∣ρ−2T+1i

T γ

T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ Mci5

)≤ n max

1≤i≤n

25E(∑T

t=1 yi,t−1uit

)2

ρ4Ti T 2γM2c2

i

≤ O(n

ρ2T

)= o(1),

due to the dominance of exponential rates. To make sure that the second term of (38) isconverging to zero, M can be any positive constant.

For the third term of (38), similarly, by applying the Markov inequality, we have

n max1≤i≤n

Pr

(∣∣∣∣∣ρ−2T+1i

T γ

T∑t=1

u2it

∣∣∣∣∣ ≥ 2Mci5

)≤ n max

1≤i≤n

5E(∑T

t=1 u2it

)2ρ2T

i T γM ci≤ O

(n

ρ2TT γ−1

)= o(1),

and M can be any positive constant. The asymptotically negligibility is implied by thedominant exponential rates.

For the fourth term of (38), note that yi,t−1 is not a martingale difference sequence.Therefore Freedman exponential inequality is not directly applicable for 1

T

∑Tt=1 yi,t−1.

The following transformation is readily justified as,

T∑t=1

yi,t−1 =T∑t=1

t−1∑s=1

ρt−1−si uis =

T∑s=1

(T∑

t=s+1

ρt−1−si

)uis

=T γ

ci

T∑s=1

ρT−2s−2i uis −

T γ

ci

T∑s=1

ρ−1−si uis. (43)

37

The two terms of (43) are summations of martingale difference sequence. The exponentialinequality for martingale is applicable to the decomposition of (43). By (43), we have

n max1≤i≤n

Pr

(∣∣∣∣∣ρ−2T+1i

T 2γ−1y2i,−1

∣∣∣∣∣ ≥ 2ciM

5

)

≤ n max1≤i≤n

Pr

∣∣∣∣∣ ρ−TiT γ√T−1

yi,−1

∣∣∣∣∣ ≥√

2ciM

5

≤ n max

1≤i≤nPr

∣∣∣∣∣ 1

ρTi√T

T∑s=1

ρT−2s−2i uis

∣∣∣∣∣ ≥ ci2

√2ciM

5

+n max

1≤i≤nPr

∣∣∣∣∣ 1

ρTi√T

T∑s=1

ρ−s−2i uis

∣∣∣∣∣ ≥ ci2

√2ciM

5

. (44)

The second term of (44) is trivially bounded by Markov inequality: DefineM∗ := ci2

√2ciM

5 ,and we have

n max1≤i≤n

Pr

(∣∣∣∣∣ 1

ρTi√T

T∑s=1

ρ−3i uis

∣∣∣∣∣ ≥M∗)≤ max

1≤i≤n

nE(∑T

s=1 ρ−s−2i uis

)2

ρ2Ti T (M∗)2

≤ O(

n

ρ2TT 1−γ

)= o(1).

For nmax1≤i≤n Pr

(∣∣∣ 1√T

∑Ts=1 ρ

−2s−2i uis

∣∣∣ ≥ ci2

√2ciM

5

)we apply martingale exponential

inequality in Freedman (1975). Note the fact that for each i = 1, 2, ...n, the process{ρT−2s−2

i uit}Tt=1 is a martingale difference sequence adaptive to the filtration Fi,t−1 :=

σ{ui,t−1, ui,t−2, ui,t−3, ...} as E{ρT−2s−2i uit|Fi,t−1} = 0 for any t = 1, 2, ...T. Define xit :=

ρT−2s−2i uit =: x1it + x2it − E{x2it|Fi,t−1} with x1it := xit1it − E{xit1it|Fi,t−1}, 1it :=

1{|xit| ≤ ϕnT } with ϕnT := n14T

14 , 1it := 1 − 1it, and x2it := xit1it. Let WiT :=∑T

t=1E[x2

1it

]and wnT := T

1+γ2√n is one truncation of WiT .

E[W 2iT

]= E

(T∑t=1

E[x2

1it|Fi,t−1

])2

≤ TT∑t=1

E[{E[x2

1i,t−1|Fi,t−1

]}2]≤ T

T∑t=1

E |x1it|4

≤ 16T

T∑t=1

E |xit|4 ≤ BT 1+γ ,

with some positive B > 0. For the term related to x1it, there exist some constant η > 0,

n max1≤i≤n

Pr

(∣∣∣∣∣ 1√T

T∑t=1

x1it

∣∣∣∣∣ ≥ η)

= n max1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ √Tη, WiT ≤ wnT

)+ n max

1≤i≤nPr (WiT > wnT ) (45)

38

≤ max1≤i≤n

exp

(−Tη2 + 2 (log n)wnT + 2 (log n)

√TηϕnT

2wnT + 2√TηϕnT

)+ max

1≤i≤no(nw−2

nTT1+γ).

The arguments for terms related to x2it and E{x2it|Fi,t−1} are identical. Withoutlosing generality, we only discuss the case of x2it as

n max1≤i≤n

Pr

(∣∣∣∣∣ 1√T

T∑t=1

x2it

∣∣∣∣∣ ≥ ηnT)≤ n max

1≤i≤nPr

(max

1≤t≤T|xit| ≥ ϕnT

)≤ nT

ϕ4nT

max1≤i≤n

max1≤t≤T

E[|xit|4 1{|xit| ≥ ϕnT }

]= max

1≤i≤no(ϕ−4nTTn

), (46)

where the first inequality is due to the more relaxed restrictions of truncations and thesecond inequality is because of the Markov inequality. In order to make sure terms inboth (45) and (46) are asymptotically negligible, the following several restrictions are onrequest: For each i = 1, 2, ..., n,

T/ log n >> wnT ,√T/ log n >> ϕnT , wnT >> T

1+γ2√n, ϕnT >> T

14n

14 .

These rate restrictions are ensured by Assumption 1. In order to make sure the fourth

term of (38) is asymptotically negligible, just make sure that c2

√2cM

5 > η. Since η is an

arbitrarily value, M can be any positive value.For the fifth term of (38), if M > 5σ2

2c2or M < 5σ2

8c2,

n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γEy2

i,T

∣∣∣∣∣ ≥ M

5

)= 0.

All in all, in order to make sure

max1≤i≤n

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

y2i,t−1

∣∣∣∣∣ ≥ M)

= o

(1

n

),

we just need to make sure M is large enough as M > 5σ2

2c2. �

Lemma 8.5 If Assumption 1 and 2 hold, for any M satisfying M ≤ 5σ2

8c2,

max1≤i≤n

Pr

(1

ρ2Ti T γ

T∑t=1

y2i,t−1 ≤M

)= o

(1

n

)The Proof of Lemma 8.5: The proof follows the fashion in Lemma 8.3 and accom-

modates the following decomposition

n max1≤i≤n

Pr

(1

ρ2Ti T 2γ

T∑t=1

y2i,t−1 ≤M

)≤ n max

1≤i≤nPr

(1

ρ2Ti T 2γ

T∑t=1

(y2i,t−1−Ey2

i,t−1

)≤ −M

)

39

+n max1≤i≤n

Pr

(1

ρ2Ti T 2γ

T∑t=1

Ey2i,t−1 ≤ 2M

)

≤ n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ(y2i,T − Ey2

i,T

)∣∣∣∣∣ ≥ M

4

)

+n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

2ρ−2T+1i

T γ

T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ M

4

)

+n max1≤i≤n

Pr

(∣∣∣∣∣ 1

2ci

ρ−2Ti

T γ

T∑t=1

u2it

∣∣∣∣∣ ≥ M

4

)

+n max1≤i≤n

Pr

(∣∣∣∣ 1

2ci

1

ρ2Ti T 2γ−1

y2i,−1

∣∣∣∣ ≥ M

4

)+n max

1≤i≤nPr

(1

2ci

ρ−2Ti

T γEy2

i,T ≤ 2M

). (47)

For any positive constant M > 0, the first four terms in (47) are o(1). The asymptoticnegligibility is due to Lemma 8.3. If we define 2M < 5σ2

8c2, then

n max1≤i≤n

Pr

(1

2ci

ρ−2Ti

T γEy2

i,T ≤ 2M

)= 0.

In all, we complete the proof. �

Lemma 8.6 Suppose Assumption 1 and 2 hold. For some η > 0, under joint convergence(n, T )→∞,

supc∈Nη

1

n

n∑i=1

1{gi(c) 6= g0

i

} p→ 0.

The Proof of Lemma 8.6: For the definition of gi(·), we have for all g ∈{

1, 2, ...,K0},

1 {gi(c) = g} ≤ 1{

T∑t=1

(yit − yi,t−1ρgn)2 ≤T∑t=1

(yit − yi,t−1ρg0i n)2

}.

Define ρg0i n := exp(cg0i n

/T γ)and ρgin := exp

( cginT γ

). For simplicity, we write ρ0

i

(:= exp

(c0iT γ

))as ρi. We derive the following transformation

1

n

n∑i=1

1 {gi(c) = g} =

K0∑g=1

1

n

n∑i=1

1{g0i 6= g

}1 {gi(c) = g}

≤K0∑g=1

1

n

n∑i=1

1{g0i 6= g

}1

{T∑t=1

(yit − yi,t−1ρgn)2 ≤T∑t=1

(yit − yi,t−1ρg0i n)2

}

= :K0∑g=1

1

n

n∑i=1

Zig(c),

40

where Zig(c) := 1{g0i 6= g

}1{∑T

t=1(yit − yi,t−1ρgn)2 ≤∑T

t=1(yit − yi,t−1ρg0i n)2}. We in-

tend to bound Zig(c) for all c ∈ Nη by the quantity irrelevant to c. Therefore for all i, ithas

Zig(c) ≤ maxg 6=g

1

{T∑t=1

(yit − yi,t−1ρgn

)2 ≤ T∑t=1

(yit − yi,t−1ρgn

)2}

= maxg 6=g

1

{T∑t=1

yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ

0gn + 2uit − yi,t−1

(ρgn + ρgn

))≤ 0

}.

Let us define

HT : =

∣∣∣∣∣∣∑T

t=1 yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ


(ρgn + ρgn

))−∑T

t=1 yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ


(ρ0gn + ρ0

gn

)) ∣∣∣∣∣∣≤

∣∣∣∣∣2T∑t=1

(ρgn − ρgn)yi,t−1uit

∣∣∣∣∣+

∣∣∣∣∣2T∑t=1

(ρ0gn − ρ0

gn)yi,t−1uit

∣∣∣∣∣+

∣∣∣∣∣∣∑T

t=1 yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ

0gn − yi,t−1

(ρgn + ρgn

))−∑T

t=1 yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ

0gn − yi,t−1

(ρ0gn + ρ0

gn

)) ∣∣∣∣∣∣= : H1T +H2T +H3T ,

where H1T :=∣∣∣2∑T

t=1(ρgn − ρgn)yi,t−1uit

∣∣∣, H2T :=∣∣∣2∑T

t=1(ρ0gn − ρ0

gn)yi,t−1uit

∣∣∣ andH3T :=

∣∣∣∣∣∣∑T

t=1 yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ

0gn − yi,t−1

(ρgn + ρgn

))−∑T

t=1 yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ

0gn − yi,t−1

(ρ0gn + ρ0

gn

)) ∣∣∣∣∣∣ .By the compactness of the parameter support, we have

H1T =

∣∣∣∣∣2T∑t=1

(ρgn − ρgn)yi,t−1uit

∣∣∣∣∣ ≤ 2∣∣ρgn − ρgn∣∣

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≤ B1

T γ√η

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ,where B1 is a constant independent of η and T . We have the above argument by thedefinition of η. Similarly we can justify that H2T ≤ B2

T γ√η∣∣∣∑T

t=1 yi,t−1uit

∣∣∣ where B2 is aconstant independent of η and T . For H3T , we have with B3 as a constant independentof η and T ,

H3T =

∣∣∣∣∣∣∑T

t=1 yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ

0gn − yi,t−1

(ρgn + ρgn

))−∑T

t=1 yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ

0gn − yi,t−1

(ρ0gn + ρ0

gn

)) ∣∣∣∣∣∣=

∣∣∣∣∣T∑t=1

y2i,t−1

∣∣∣∣∣∣∣∣∣ρ0gn

(ρgn − ρ0

gn − ρgn + ρ0gn

)+

1

2

((ρ0gn)

2 − ρ2gn + ρ2

gn−(ρ0gn

)2)∣∣∣∣41

≤ B3

T γ√η

∣∣∣∣∣T∑t=1

y2i,t−1

∣∣∣∣∣ .By combining available results, we obtain the following,

Zig(c) ≤ maxg 6=g

1

{T∑t=1

yi,t−1

(ρgn − ρgn

) (2yi,t−1ρ


(ρgn + ρgn

))≤ 0

}

≤ maxg 6=g

1

∑T

t=1 yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ


(ρ0gn + ρ0

gn

))≤ B1+B2

T γ√η∣∣∣∑T

t=1 yi,t−1uit

∣∣∣+ B3T γ√η∣∣∣∑T

t=1 y2i,t−1

∣∣∣ .

Based on the following fact that

T∑t=1

yi,t−1

(ρ0gn − ρ0

gn

)(2yi,t−1ρ

0gn − yi,t−1

(ρ0gn + ρ0

gn

))=

T∑t=1

y2i,t−1

(ρ0gn − ρ0

gn

)2,

we define the following argument as

Zig := maxg 6=g

1

(ρ0gn − ρ0

gn

)2∑Tt=1 y

2i,t−1 + 2

(ρ0gn − ρ0

gn

)∑Tt=1 yi,t−1uit

≤ B1+B2T γ√η∣∣∣∑T

t=1 yi,t−1uit

∣∣∣+ B3T γ√η∣∣∣∑T

t=1 y2i,t−1

∣∣∣ .

Consequently, we can bound Zig(c) by supc∈Nη Zig(c) ≤ Zig. Note that

supc∈Nη

1

n

n∑i=1

{gi(c) 6= g0

i

}≤ 1

n

n∑i=1

K0∑g=1

Zig.

For i = 1, 2, ..., n, g0i = g 6= g, we have equivalent representations ρi = ρgn. For all

g ∈ {1, 2, ...K0}, we have

Pr(Zig = 1

)≤

∑g 6=g

Pr

(ρ0gn − ρ0

gn

)2∑Tt=1 y

2i,t−1 + 2

(ρ0gn − ρ0

gn

)2∑Tt=1 yi,t−1uit

≤ B1+B2T γ√η∣∣∣∑T

t=1 yi,t−1uit

∣∣∣+ B3T γ√η∣∣∣∑T

t=1 y2i,t−1

∣∣∣

≤∑g 6=g

Pr

2(ρ0gn − ρ0

gn

)∑Tt=1 yi,t−1uit ≤ −

(ρ0gn − ρ0

gn

)2∑Tt=1 y

2i,t−1

+B1+B2T γ√η∣∣∣∑T

t=1 yi,t−1uit

∣∣∣+ B3T γ√η∣∣∣∑T

t=1 y2i,t−1

∣∣∣

≤∑g 6=g

Pr

2

(c0gn−c

0gn

)T γ

∑Tt=1 yi,t−1uit ≤ −

(c0gn − c0

gn

)2 (ρ0gn

)2TM

+(B1 +B2)(ρ0gn

)2T √ηM +B3T

γ(ρ0gn

)2T √ηM

+∑g 6=g

Pr

(1

ρ2Ti T γ

T∑t=1

y2i,t−1 ≤M

)

+∑g 6=g

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥M)

+∑g 6=g

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

y2i,t−1

∣∣∣∣∣ ≥ M)

42

≤∑g 6=g

Pr

2 c∗

T γ∑T

t=1 yi,t−1uit ≤ − (c∗)2(ρ0gn

)2TM

+(B1 +B2)(ρ0gn

)2T √ηM +B3T

γ(ρ0gn

)2T √ηM

+∑g 6=g

Pr

(1

ρ2Ti T γ

T∑t=1

y2i,t−1 ≤M

)

+∑g 6=g

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥M)

+∑g 6=g

Pr

(1

ρ2Ti T γ

∣∣∣∣∣T∑t=1

y2i,t−1

∣∣∣∣∣ ≥ M). (48)

Based on Lemma 8.3, 8.4 and 8.5 we can argue the 2nd, 3rd and 4th terms of (48) are

o(

1n

). We restrict η as η ≤ (c∗)2M

2B3T γM. For example, we can set η = (c∗)2M

4B3T γM. Therefore we

have

∑g 6=g

Pr

(2 c∗

T γ∑T

t=1 yi,t−1uit ≤ − (c∗)2 ρ2TgnM

+(B1 +B2)ρ2Tgn

√ηM +B3T

γρ2Tgn

√ηM

)=

∑g 6=g

Pr

2 c∗

T γ(ρ0gn

)2T ∑Tt=1 yi,t−1uit ≤ − (c∗)2M

+(B1 +B2)√ηM +B3T

γ√ηM

≤

∑g 6=g

Pr

(2

c∗

T γρ2Ti

T∑t=1

yi,t−1uit ≤− (c∗)2M

2

)

≤∑g 6=g

Pr

(1

T γρ2Ti

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ c∗M

4

)

= o

(1

n

).

The last equality is due to Lemma 8.3. Combining the above results, we obtain Pr(Zig = 1

)=

o(

1n

). This implies that

supc∈Nη

1

n

n∑i=1

1{gi(c) 6= g0

i

}≤ 1

n

K0∑g=1

n∑i=1

EZig =1

n

K0∑g=1

n∑i=1

Pr(Zig = 1

)= K0

(K0 − 1

)o

(1

n

)= o

(1

n

).

�The Proof of Theorem 4.1: (i) Note the definition of gi, and we have

Pr

(max

1≤i≤n

∣∣gi(c)− g0i

∣∣ > 0

)≤ Pr (c /∈Nη) + E

[supc∈Nη

Pr

(max

1≤i≤n

∣∣gi(c)− g0i

∣∣ > 0

)].

Based on the proof of Lemma 8.6, we know that η = O(

1T γ

)asymptotically. The con-

vergence rates of our estimates are fast enough to satisfy this condition as cg − c0g =

Op

(1√nρT

)= op

(1T γ

)for g = 1, 2, ...,K0. Therefore we derive the following argument as,

Pr (c /∈Nη) = o(1).

Besides,

supc∈Nη

Pr

(max

1≤i≤n

∣∣gi(c)− g0i

∣∣ > 0

)≤ n sup

c∈Nηmax

1≤i≤nPr(∣∣gi(c)− g0

i

∣∣ > 0)

43

= n max1≤i≤n

supc∈Nη

Pr(∣∣gi(c)− g0

i

∣∣ > 0)

= n · o(

1

n

)= o (1) .

We completed the proof.�Assume that Gg represents the g-th estimated classification group and G0

g denotes theg-th true group in the population. To rigorously state the uniform consistency of therecursive k-means classifications, we define the following sequences of events as

Eg,i :={i /∈ Gg|i ∈ G0

g

}and Fg,i :=

{i /∈ G0

g|i ∈ Gg}, (49)

for g = 1, 2, ...,K0 and i = 1, 2, ..., n. Let Eg,nT :=⋃i∈G0g Eg,i and Fg,nT :=

⋃i∈Gg Fg,i.

Eg,nT demonstrates the error event of not classifying the individual unit of G0g into Gg as

a type-I classification error; Eg,nT demonstrates the error event of not clustering the cross-sectional unit of Gg into G0

g as a type-II classification error. Furthermore, we establishthe uniform consistency of the recursive k-means classifier.

Lemma 8.7 (Uniform Consistency of Classification) Let Assumption 1 hold and c0i > 0

for each i = 1, 2, ..., n. Under joint convergence (n, T )→∞,(i) Pr

(⋃K0

g=1 Eg,nT

)≤∑K0

g=1 Pr(Eg,nT

)→ 0; and

(ii) Pr(⋃K0

g=1 Fg,nT

)≤∑K0

g=1 Pr(Fg,nT

)→ 0.

Theorem 8.7 illustrates that for all g ∈{

1, 2, ...,K0}all cross-sectional units belonging

to group G0g are clustered into the same clustered group Gg asymptotically. Meanwhile, all

cross-sectional agents classified into the same group Gg for all g ∈{

1, 2, ...,K0}belong to

the same group G0g in the probability limit. These observations show that the summarized

classification errors are diminishing asymptotically.The Proof of Lemma 8.7: For the uniform consistency, observe that Pr

(⋃K0

g=1 Eg,nT

)≤∑K0

g=1 Pr(Eg,nT

)≤∑K0

g=1

∑i∈G0g Pr

(Eg,i

). We have

K0∑g=1

∑i∈G0g

Pr(Eg,i

)≤ n max

1≤i≤nE1{gi(c) 6= g0

i } = n max1≤i≤n

Pr{∣∣gi(c)− g0

i

∣∣ > 0}

≤ n max1≤i≤n

supc∈Nη

Pr{∣∣gi(c)− g0

i

∣∣ > 0}+ n max1≤i≤n

Pr{∣∣ci − c0

i

∣∣ > η}

= o(1) + n max1≤i≤n

Pr{∣∣ci − c0

i

∣∣ > η}. (50)

It remains to show that the second term of (50) is asymptotically negligible,

n max1≤i≤n

Pr{∣∣ci − c0

i

∣∣ > η}

= n max1≤i≤n

Pr

∣∣∣∣∣∣[

T∑t=1

y2i,t−1

]−1 [ T∑t=1

yi,t−1uit

]∣∣∣∣∣∣ ≥ η

44

≤ n max1≤i≤n

Pr

{1

ρ2Ti T 2γ

T∑t=1

y2i,t−1 ≤

5σ2

8c2

}

+n max1≤i≤n

Pr

{∣∣∣∣∣ 1

ρ2Ti T γ

T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ ηT γ 5σ2

8c2

}= o(1).

where the last equality is based on Lemma 8.3, 8.4 and 8.5. Note the fact that η = O(

1T γ

).

Then we successfully justify the argument of (i);For (ii), we basically follow the derivations in Su et al. (2016). By Baye’s theorem, we

have the following decomposition as,

Pr(Fg,i

)= 1− Pr

(i ∈ G0

g|i ∈ Gg)

=

∑K0

l=1,l 6=g Pr(i ∈ Gg|i ∈ G0

g

)Pr(i ∈ G0

l

)Pr(i ∈ Gg|i ∈ G0

g

)Pr(i ∈ G0

g

)+∑K0

l=1,l 6=g Pr(i ∈ Gg|i ∈ G0

l

)Pr(i ∈ G0

l

) .We have

K0∑l=1,l 6=g

Pr(i ∈ Gg|i ∈ G0

l

)Pr(i ∈ G0

l

)≤ (K0 − 1)

K0∑l=1

Pr(i /∈ Gl|i ∈ G0

l

)= o(1).

here the last equality is for the uniform consistency of (i). Note the fact that Pr(i ∈ Gg|i ∈ G0

g

)=

1− Pr(i /∈ Gg|i ∈ G0

g

)= 1− o(1) uniformly in i and g by (i), and we have


g

)Pr(i ∈ G0

g

)+

K0∑l=1,l 6=g


l

)Pr(i ∈ G0

l

)≥ Pr

(i ∈ G0

g

)/2,

asymptotically. Therefore we have the following results

Pr

K0⋃g=1

Fg,nT

≤K0∑g=1

Pr(Fg,nT

)≤

K0∑g=1

∑i∈Gg

Pr(Fg,i

)

≤

∑K0

l=1,l 6=g Pr(i ∈ Gg|i ∈ G0

g

)Pr(i ∈ G0

l

)min1≤i≤n min1≤g≤K0 Pr

(i ∈ G0

g

)/2

=o(1)

min1≤g≤K0 τ g= o(1),

based on the fact that ngn → πg for g = 1, 2, ...,K0. �

The Proof of Theorem 4.2: Based on the definition of the post-classification esti-mator, we have for each g ∈

{1, 2, ...,K0

}and cgn > 0

√ngT

γ(ρ0gn)T

(ρgn − ρ0

gn

)=

1√ngT γ(ρ0gn)T

∑i∈Gg

∑Tt=1 yi,t−1uit

1ngT 2γ(ρ0gn)2T

∑i∈Gg

∑Tt=1 y

2i,t−1

.

45

The numerator and denominator can be decomposed as

1√ngT γ(ρ0

gn)T

∑i∈Gg

T∑t=1

yi,t−1uit

=1

√ngT γ(ρ0

gn)T

∑i∈G0g

T∑t=1

yi,t−1uit +1

√ngT γ(ρ0

gn)T

K0∑g=1,g 6=g

∑i∈Gg\G0g

i∈G0g ,α0g<α

0g

T∑t=1

yi,t−1uit

+1

√ngT γ(ρ0

gn)T

K0∑g=1,g 6=g

∑i∈Gg\G0g

i∈G0g ,α0g≥α

0g

T∑t=1

yi,t−1uit −1

√ngT γ(ρ0

gn)T

∑i∈G0g\Gg

T∑t=1

yi,t−1uit,

and

1

ngT 2γ(ρ0gn)2T

∑i∈Gg

T∑t=1

y2i,t−1

=1

ngT 2γ(ρ0gn)2T

∑i∈G0g

T∑t=1

y2i,t−1 +

1

ngT 2γ(ρ0gn)2T

K0∑g=1,g 6=g

∑i∈Gg\G0g

i∈G0g ,α0g<α

0g

T∑t=1

y2i,t−1

+1

ngT 2γ(ρ0gn)2T

K0∑g=1,g 6=g

∑i∈Gg\G0g

i∈G0g ,α0g≥α

0g

T∑t=1

y2i,t−1 −

1

ngT 2γ(ρ0gn)2T

∑i∈G0g\Gg

T∑t=1

y2i,t−1.

Therefore it remains to justify these arguments:(i) For any g = 1, 2, ...,K0,

1√ngT (ρ0

gn)T

∑i∈Gg\G0g

i∈G0g ,α0g<α

0g

T∑t=1

yi,t−1uit = op(1),

1

ngT 2γ(ρ0gn)2T

∑i∈Gg\G0g

i∈G0g ,α0g<α

0g

T∑t=1

y2i,t−1 = op(1);

(ii) For any g = 1, 2, ...,K0,

1√ngT γ(ρ0

gn)T

∑i∈Gg\G0g

i∈G0g ,α0g≥α

0g

T∑t=1


1

ngT 2γ(ρ0gn)2T

∑i∈Gg\G0g

i∈G0g ,α0g≥α

0g

T∑t=1

y2i,t−1 = op(1);

46

(iii) For any g = 1, 2, ...,K0,

1√ngT γ(ρ0

gn)T

∑i∈G0g\Gg

T∑t=1


1

ngT 2γ(ρ0gn)2T

∑i∈G0g\Gg

T∑t=1

y2i,t−1 = op(1).

For (i) (ii) and (iii), second terms can be proved identically as the first ones. Withoutlosing generality we just focus on the first terms. For (iii), we have for any ε > 0, and forany g = 1, 2, ...,K0

Pr

∣∣∣∣∣∣∣

1√ngT γ(ρ0

gn)T

∑i∈G0g\Gg

T∑t=1

yi,t−1uit

∣∣∣∣∣∣∣ > ε

≤ Pr(Eg,nT

)→ 0,

under joint asymptotics. For (i), for any g 6= g = 1, 2, ...,K0 and for any ε > 0, we have

Pr

∣∣∣∣∣∣∣∣∣∣

1√ngT γ(ρ0

gn)T

∑i∈Gg\G0g

i∈G0g ,α0g<α

0g

T∑t=1

yi,t−1uit

∣∣∣∣∣∣∣∣∣∣> ε

≤ Pr

K0∑g=1,g 6=g

Fg,nT

→ 0,

under joint asymptotic framework. For (ii), for any g 6= g = 1, 2, ...,K0 and for any ε > 0,

we have

Pr

∣∣∣∣∣∣∣∣∣∣

1√ngT γ(ρ0

gn)T

K0∑g=1,g 6=g

∑i∈Gg\G0g

i∈G0g ,α0g≥α

0g

T∑t=1

yi,t−1uit

∣∣∣∣∣∣∣∣∣∣> ε

≤ Pr

K0∑g=1,g 6=g

Fg,nT

→ 0.

Summarizing above results, we have

1√ngT γ(ρ0

gn)T

∑i∈Gg

T∑t=1

yi,t−1uit =1

√ngT γ(ρ0

gn)T

∑i∈G0g

T∑t=1

yi,t−1uit + op(1).

and1

ngT 2γ(ρ0gn)2T

∑i∈Gg

T∑t=1

y2i,t−1 =

1

ngT 2γ(ρ0gn)2T

∑i∈G0g

T∑t=1

y2i,t−1 + op(1).

The asymptotics for the post-classification estimator is asymptotically equivalent to theinfeasible within estimator. The limiting behaviours of oracle estimators are establishedin the online supplement.�

We also intend to establish the asymptotic properties of the mixed-root panel au-toregressions. We collect the individual least squares {ci}ni=1 and discuss their uniformbound.

47

Lemma 8.8 If γ > 313 and Assumption 1, 2 hold,

sup1≤i≤n,c0i>0

∣∣ci − c0i

∣∣ = Op

((ρ)−T T

3−3γ4

),

where ρ := exp(cT γ

).

The Proof of Lemma 8.8: If ci > 0, there is one explosive root. The time seriesestimator is as,

ci − c0i = T γ

(T∑t=1

y2i,t−1

)−1( T∑t=1

yi,t−1uit

),

and

sup1≤i≤n

∣∣ci − c0i

∣∣ = T γ

(inf

1≤i≤n

T∑t=1

y2i,t−1

)−1(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣)

≤ T γ

(inf

1≤i≤n

T∑t=1

Ey2i,t−1− sup

1≤i≤n

T∑t=1

(Ey2

i,t−1 − y2i,t−1

))−1(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣).

(i) To justify the uniform upper bound for∣∣∣∑T

t=1 yi,t−1uit

∣∣∣, we apply the exponentialinequality of Freedman (1975) as

Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥MnT

)

≤ Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

2

)+ Pr

(sup

1≤i≤n

∣∣Tyi,−1ui∣∣ ≥ MnT

2

)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

2

)︸︷︷︸

(A.1)

+n sup1≤i≤n

Pr

(∣∣Tyi,−1ui∣∣ ≥ MnT

2

)︸︷︷︸

(A.2)

,

where the innovations uniformly dominate the fixed effect. For (A.1) we apply the expo-nential inequality of Freedman (1975). The process {yi,t−1uit}Tt=1 is a martingale differencesequence as E (yi,t−1uit|Fi,t−1) = 0 with Fi,t−1 := σ {ui,t−1, ui,t−2, ...} .We set a truncationrate di,nT >> ρTi n

14T

14 . We define zit := yi,t−1uit and make the following decomposition

T∑t=1

zit =T∑t=1

z1it +T∑t=1

z2it −T∑t=1E [z2it|Fi,t−1] . Define z1it := zit1it − E [zit1it|Fi,t−1] and

z2it := zit1it. Define 1it := 1 {|zit| ≤ di,nt} and 1it = 1− 1it. It suffi ces to show that

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

6

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

6

)= o (1) ,

48

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [z2it|Fi,t−1]

∣∣∣∣∣ ≥ MnT

6

)= o (1) . (51)

The second and third arguments of (51) share identical derivations, and without losinggenerality we only focus on the second term. We define ViT :=

∑Tt=1 E

[z2

1it|Fi,t−1

], and

vi,nT >> ρ2Ti n

12T

12

+ γ2 as a truncation rate for ViT .

E[V 2iT

]= E

[T∑t=1

E[z2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[z4

1it

]≤ 16T

T∑t=1

E[z4it

]≤ CT

T∑t=1

ρ4ti = Op

(T 1+γρ4T

i

).

By Proposition 2.1 in Freedman (1975), we have

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

6

)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

6, ViT ≤ vi,nT

)+ n sup

1≤i≤nPr (ViT > vi,nT )

≤ sup1≤i≤n

exp

(−M2

nT /36 + 2vi,nT log (n) + 4MnTdi,nT log (n) /6

2vi,nT + 4MnTdi,nT /6

)+ sup

1≤i≤no(nT 1+γρ4T

i v−2i,nT

)= o(1). (52)

To show asymptotic negligibility of (52), we need MnT >> sup1≤i≤n ρTi n

14T

1+γ4

√log (n)

andMnT >> sup1≤i≤n ρTi n

14T

14 log (n) . The assumption for vi,nT ensures o

(nT 1+γρ4T

i v−2i,nT

)=

o(1). For the second term of (51), we have

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

6

)

≤ n sup1≤i≤n

Pr

(max

1≤t≤T|zit| ≥ di,nT

)≤ nT

d4i,nT

sup1≤i≤n

max1≤t≤T

E[|zit|4 1 {|zit| > di,nT }

]= sup

1≤i≤nmax

1≤t≤To

(nT

d4i,nT

ρ4ti

)= o (1) ,

which is guaranteed by our assumption for di,nT . It is easily justified that sup1≤i≤n ρTi n

14T

1+γ4

√log (n) >>

sup1≤i≤n ρTi n

14T

14 log (n) . Therefore, for (A.1) we need MnT >> ρTn

14T

1+γ4

+ε√

log (n) forany ε > 0.

For (A.2) term, the uniform upper bound follows decompositions as

sup1≤i≤n

∣∣Tyi,−1ui∣∣ ≤ T sup

1≤i≤n

∣∣yi,−1

∣∣ sup1≤i≤n

|ui| .

49

For sup1≤i≤n |ui| term, the exponential inequality of Freedman (1975) is easily applied,and shows sup1≤i≤n |ui| = Op

(T−

12n

14√

log n). For sup1≤i≤n

∣∣yi,−1

∣∣, we firstly apply thefollowing transformation as

sup1≤i≤n

∣∣yi,−1

∣∣ = sup1≤i≤n

∣∣∣∣∣ 1

T

T∑t=1

(t−1∑s=0

ρt−1−si uis

)∣∣∣∣∣ = sup1≤i≤n

∣∣∣∣∣ 1

T

T−1∑s=0

(T∑

t=s+1

ρt−1−si

)uis

∣∣∣∣∣ ,where the innovations dominate the fixed effects. Hence the exponential inequality for1T

∑T−1s=0

(∑Tt=s+1 ρ

t−1−si

)uis is applicable and demonstrates that

sup1≤i≤n

∣∣yi,−1

∣∣ = Op

(ρTn

14T

7γ−14

√log n

).

Combining above results, we have

sup1≤i≤n

∣∣Tyi,−1ui∣∣ = Op

(ρTn

12T

7γ+14 log n

)>> ρTn

14T

1+γ4

√log (n) .

Based on derivations for (A.1) and (A.2), for any ε > 0 we have

sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(ρTn

12T

7γ+14

+ε log n). (53)

(ii) To justify the uniform lower bound for the denominator inf1≤i≤n∑T

t=1 y2i,t−1, we employ

the following decomposition as

inf1≤i≤n

T∑t=1

y2i,t−1 = inf

1≤i≤n

T∑t=1

Ey2i,t−1︸︷︷︸

(B.1)

− sup1≤i≤n

T∑t=1

(Ey2

i,t−1 − y2i,t−1

)︸︷︷︸

(B.2)

.

For (B.1) term, expectation operator removes randomness. We have

inf1≤i≤n

T∑t=1

Ey2i,t−1 = O

(ρ2TT 2γ

).

For (B.2) term, the martingale exponential inequality is not directly applicable. Therefore,we employ the following decomposition,

Pr

(sup

1≤i≤n

T∑t=1

(Ey2

i,t−1 − y2i,t−1

)≥ MnT

)

≤ n sup1≤i≤n

Pr

(T∑t=1

(Ey2

i,t−1 − y2i,t−1

)≥ MnT

)

≤ n sup1≤i≤n

Pr

(∣∣∣∣T γ2ci

(y2i,T − Ey2

i,T

)∣∣∣∣ ≥ MnT

)︸︷︷︸

(B.2.1)

+n sup1≤i≤n

Pr

(∣∣∣∣∣ρTi T γ2ci

T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)︸︷︷︸

(B.2.2)

50

+n sup1≤i≤n

Pr

(∣∣∣∣∣T γ2ci

T∑t=1

u2it

∣∣∣∣∣ ≥ MnT

)︸︷︷︸

(B.2.3)

+n sup1≤i≤n

Pr

(∣∣∣∣T γ+1

2ciy2i,−1

∣∣∣∣ ≥ MnT

)︸︷︷︸

(B.2.4)

,

where the innovations dominate the fixed effect uniformly. For (B.2.1) term, we have

n sup1≤i≤n

Pr

(∣∣∣∣T γ2ci

(y2i,T − Ey2

i,T

)∣∣∣∣ ≥ MnT

)≤ n sup

1≤i≤nPr

(∣∣∣∣∣T γ2ci

T∑t=1

ρ2(T−t)i

(u2it − Eu2

it

)∣∣∣∣∣ ≥ MnT

).

Note that{ρ

2(T−t)i

(u2it − Eu2

it

)}Tt=1

is a martingale difference sequence as E(ρ

2(T−t)i

(u2it − Eu2

it

)|Fi,t−1

)=

0 with Fi,t−1 := σ {ui,t−1, ui,t−2, ...} . We set a truncation rate φi,nT >> ρ2Ti n

14T

14

+γ .

We define xit := ρ2(T−t)i

(u2it − Eu2

it

)and make the following decomposition

T∑t=1

xit =

T∑t=1

xit1it +T∑t=1

x2it −T∑t=1E [x2it|Fi,t−1] . Define x1it := xit1it − E [xit1it|Fi,t−1] and x2it :=

xit1it. Define 1it := 1{|xit| ≤ φi,nt

}and 1it = 1− 1it. It suffi ces to show that

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x

1it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x2it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [x2it|Fi,t−1]

∣∣∣∣∣ ≥ MnT

)= o (1) . (54)


∑Tt=1 E

[x2

1it|Fi,t−1

], and

vi,nT >> ρ4Ti n

12T

12

+ γ2 is a truncation rate for ViT . With some constant C > 0, we have

E[V 2iT

]= E

[T∑t=1

E[x2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[x4

1it

]≤ 16T

T∑t=1

E[x4it

]≤ CT

T∑t=1

ρ8ti = Op

(T 1+γρ8T

i

).


n sup1≤i≤n

Pr

(T γ

2ci

∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥ MnT

)

≤ n sup1≤i≤n

Pr

(T γ

2ci

∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥ MnT , ViT ≤ vi,nT

)+ n sup

1≤i≤nPr(ViT > vi,nT

)≤ sup

1≤i≤nexp

(−4c2

i M2nT /T

2γ + 2vi,nT log (n) + 4ciMnTφi,nt log (n) /T γ

2vi,nT + 4ciMnTφi,nt/Tγ

)+ sup

1≤i≤no(nT 1+γρ8T

i v−2i,nT

)51

= o(1). (55)

By our assumptions for vi,nT , we have sup1≤i≤n o(nT 1+γρ8T

i v−2i,nT

)= o(1). The asymptotic

negligibility of the exponential term in (55) follows

MnT >> sup1≤i≤n

T γ√

log(n)T1+γ4 ρ2T

i n14 ,

MnT >> sup1≤i≤n

T γ log(n)T14 ρ2Ti n

14 .

Since sup1≤i≤n Tγ√

log(n)T1+γ4 ρ2T

i n14 >> sup1≤i≤n T

γ log(n)T14 ρ2Ti n

14 , we only require

MnT >> sup1≤i≤n Tγ√

log(n)T1+γ4 ρ2T

i n14 . For the second term of (54), we show

n sup1≤i≤n

Pr

(T γ

2ci

∣∣∣∣∣T∑t=1

x2it

∣∣∣∣∣ ≥ MnT

)≤ n sup

1≤i≤nPr

(max

1≤t≤T|xit| ≥ φi,nt

)≤ nT

φ4i,nT

sup1≤i≤n

max1≤t≤T

E[|xit|4 1

{|xit| > φi,nT

}]= sup

1≤i≤no

(nT

φ4i,nT

ρ8Ti

).

The assumption for φi,nt ensures that sup1≤i≤n o

(nTφ4i,nT

ρ8Ti

)= o(1).

Therefore, for (B.2.1) term, we have

sup1≤i≤n

∣∣∣∣T γ2ci

(y2i,T − Ey2

i,T

)∣∣∣∣ = Op

(sup

1≤i≤nT γ+ε

√log(n)T

1+γ4 ρ2T

i n14

),

for any ε > 0.

For (B.2.2) term, we have, for any ε > 0,

sup1≤i≤n

∣∣∣∣∣ρTi T γ2ci

T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(T γρ2Tn

14T

1+γ4

+ε√

log (n)),

based on our derivations for the numerator.For (B.2.3) term, we have

sup1≤i≤n

T γ

2ci

T∑t=1

u2it = sup

1≤i≤n

T γ

2ci

T∑t=1

Eu2it − sup

1≤i≤n

T γ

2ci

T∑t=1

(Eu2

it − u2it

).

We easily show sup1≤i≤nT γ

2ci

∑Tt=1Eu2

it = O(T 1+γ

). Similarly by the law of iterated loga-

rithm, we show sup1≤i≤nT γ

2ci

∑Tt=1

(Eu2

it − u2it

)= op

(T 1+γ

). Therefore we show

sup1≤i≤n

T γ

2ci

T∑t=1

u2it = Op

(T 1+γ

),

52

dominated by exponential rates.For (B.2.4) term, we have the following decomposition as

n sup1≤i≤n

Pr

(∣∣∣∣T γ+1

2ciy2i,−1

∣∣∣∣ ≥MnT

)≤ n sup

1≤i≤nPr

(∣∣∣∣∣Tγ+12

√2ci

yi,−1

∣∣∣∣∣ ≥√MnT

)

= n sup1≤i≤n

Pr

(∣∣∣∣∣Tγ−12

√2ci

T∑t=1

(t−1∑s=0

ρt−1−si uis

)∣∣∣∣∣ ≥√MnT

)

= n sup1≤i≤n

Pr

(∣∣∣∣∣Tγ−12

√2ci

T−1∑s=0

(T∑

t=s+1

ρt−1−si

)uis

∣∣∣∣∣ ≥√MnT

),

where innovations dominate the fixed effect. Note that{(∑T

t=s+1ρt−1−si

)uis

}Ts=1

is a

martingale difference sequence as E((∑T

t=s+1ρt−1−si

)uis|Fi,s−1

)= 0 with Fi,s−1 :=

σ {ui,s−1, ui,s−2, ...} . We set a truncation rate oi,nT >> ρTi n14T

14

+γ . We define xit :=(∑Tt=s+1ρ

t−1−si

)uis and make the following decomposition

T∑t=1

xit =T∑t=1

x1it +T∑t=1

x2it −T∑t=1E [x2it|Fi,t−1] . Define x1it := xit1it − E [xit1it|Fi,t−1] and x2it := xit1it. Define 1it :=

1 {|xit| ≤ oi,nt} and 1it = 1− 1it. It suffi ces to show that

n sup1≤i≤n

Pr

(Tγ+12

√2ci

∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥√MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x2it

∣∣∣∣∣ ≥√MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [x2it|Fi,t−1]

∣∣∣∣∣ ≥√MnT

)= o (1) . (56)

The second and third arguments of (56) share identical derivations, and without losinggenerality we only focus on the second term. We define V iT :=

∑Tt=1 E

[x2

1it|Fi,t−1

], and

vi,nT >> ρ2Ti n

12T

12

+ 5γ2 is a truncation rate for V iT . With some constant C > 0, we have

E[V

2iT

]= E

[T∑t=1

E[x2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[x4

1it

]≤ 16T

T∑t=1

E[x4it

]≤ CT 1+4γ

T∑t=1

ρ4ti = Op

(T 1+5γρ4T

i

).


n sup1≤i≤n

Pr

(Tγ−12

√2ci

∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥√MnT

)

53

≤ n sup1≤i≤n

Pr

(Tγ−12

√2ci

∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥√MnT , V iT ≤ vi,nT

)+ n sup

1≤i≤nPr(V iT > vi,nT

)≤ sup

1≤i≤nexp

(−2ciMnTT

1−γ + 2vi,nT log (n) + 2√

2ci√MnT oi,nt log (n)T 1−γ

2vi,nT + 2√

2ci√MnT oi,ntT 1−γ

)+ sup

1≤i≤no(nT 1+5γρ4T

i v−2i,nT

)= o(1). (57)

By our assumptions for vi,nT , we have sup1≤i≤n o(nT 1+5γρ4T

i v−2i,nT

)= o(1). The asymp-

totic negligibility of the exponential term (57) is ensured by the following facts,

MnT >> sup1≤i≤n

T γ−1 (log(n))2 T12

+2γρ2Ti n

12 ,

MnT >> sup1≤i≤n

T γ−1 log(n)T5γ+12 ρ2T

i n12 .

Since sup1≤i≤n Tγ−1 log(n)T

5γ+12 ρ2T

i n12 >> sup1≤i≤n T

γ−1 (log(n))2 T12

+2γρ2Ti n

12 , we only

require MnT >> sup1≤i≤n Tγ−1 log(n)T

5γ+12 ρ2T

i n12 . For the second term of (56), we show

n sup1≤i≤n

Pr

(T

1−γ2

2ci

∣∣∣∣∣T∑t=1

x2it

∣∣∣∣∣ ≥√MnT

)≤ n sup

1≤i≤nPr

(max

1≤t≤T|xit| ≥ oi,nt

)≤ nT

o4i,nT

sup1≤i≤n

max1≤t≤T

E[|xit|4 1 {|xit| > oi,nT }

]= sup

1≤i≤no

(nT

o4i,nT

ρ4Ti T 4γ

).

The assumption for oi,nt ensures that sup1≤i≤n o

(nTo4i,nT

ρ4Ti T 4γ

)= o(1).

Combining results of (B.1), (B.2.1), (B.2.2), (B.2.3), and (B.2.4), for any ε > 0 andγ > 3

13 , we have

inf1≤i≤n

T∑t=1

y2i,t−1 =Op

(T γ−1+ε log(n)T

5γ+12 ρ2Tn

12

). (58)

All in all, based on (58) and (53), we have

sup1≤i≤nc0i>0

∣∣ci − c0i

∣∣ = Op

((ρ)−T T

3−3γ4

).

This concludes our proof.�

Lemma 8.9 If γ < 23 and Assumption 1, 2 hold,

sup1≤i≤n,c0i<0

∣∣ci − c0i

∣∣ = Op

(n14

+ε√

log(n)

Tγ2

+ 34

),

with arbitrary ε > 0.

54

The Proof of Lemma 8.9: Similarly, we demonstrate the uniform upper boundsand uniform lower bound of several terms as

inf1≤i≤n

T∑t=1

Ey2i,t−1,︸︷︷︸

(C.1)

sup1≤i≤n

T∑t=1

∣∣Ey2i,t−1 − y2

i,t−1

∣∣︸︷︷︸

(C.2)

, sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣︸︷︷︸(C.3)

.

For the term of (C.1), we have

inf1≤i≤n

T∑t=1

Ey2i,t−1 = O

(T 1+γ

),

since the expectation operator removes randomness. For (C.2), by applying the exponen-tial inequality of Freedman (1975) or as in Su et al. (2016), we can readily show

sup1≤i≤n

T∑t=1

∣∣Ey2i,t−1 − y2

i,t−1

∣∣ = op(T 1+γ

).

Therefore we show

inf1≤i≤n

T∑t=1

y2i,t−1 = inf

1≤i≤n

T∑t=1

Ey2i,t−1− sup

1≤i≤n

T∑t=1

(Ey2

i,t−1 − y2i,t−1

)= Op

(T 1+γ

).

For the term of (C.3), we accommodates the following decomposition as,

Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)

≤ Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)+ Pr

(sup

1≤i≤n


)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)︸︷︷︸

(C.3.1)

+n sup1≤i≤n

Pr(∣∣Tyi,−1ui

∣∣ ≥ MnT

)︸︷︷︸

(C.3.2)

,

where the innovations dominate the fixed effect uniformly. For (C.3.1) we apply the expo-nential inequality of Freedman (1975). The process {yi,t−1uit}Tt=1 is a martingale differencesequence as E (yi,t−1uit|Fi,t−1) = 0 with Fi,t−1 := σ {ui,t−1, ui,t−2, ...} .We set a truncationrate di,nT >> n

14T

1+γ4 . We define zit := yi,t−1uit and make the following decomposition

T∑t=1

zit =T∑t=1

z1it +T∑t=1


z2it := zit1it. Define 1it := 1{|zit| ≤ di,nT


n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

55

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [z2it|Fi,t−1]

∣∣∣∣∣ ≥ MnT

)= o (1) . (59)


∑Tt=1 E

[z2

1it|Fi,t−1

], and

vi,nT >> n12T

12

+γ is a truncation rate for ViT .

E[V 2iT

]= E

[T∑t=1

E[z2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[z4

1it

]≤ 16T

T∑t=1

E[z4it

]≤ CT 2γ+1 = Op

(T 2γ+1

).


n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it


)+ n sup


)≤ sup

1≤i≤nexp

(−M2

nT + 2vi,nT log (n) + 4MnT di,nT log (n)

2vi,nT + 4MnT di,nT

)+ sup

1≤i≤no(nT 1+2γ v−2

i,nT

)= o(1). (60)

To show asymptotic negligibility of (60), we need MnT >> n14T

1+2γ4

√log (n) and MnT >>

n14T

1+γ4 log (n) . Our assumption for vi,nT ensures sup1≤i≤n o

(nT 1+2γ v−2

i,nT

)= o(1). For

the second term of (59), we have

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

)

≤ n sup1≤i≤n

Pr

(max


)≤ nT

d4i,nT

sup1≤i≤n

max1≤t≤T

E[|zit|4 1

{|zit| > di,nT

}]= sup

1≤i≤nmax

1≤t≤To

(nT 1+γ

d4i,nT

)= o (1) ,

which is guaranteed by our assumption for di,nT . Since n14T

1+2γ4

√log (n) >> n

14T

1+γ4 log (n) ,

we need MnT >> n14T

1+2γ4

√log (n). Therefore we have, for any ε > 0,

sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(n14T

1+2γ4

+ε√

log (n)).

56

For (C.3.2) term, we have

sup1≤i≤n

∣∣Tyi,−1ui∣∣ = T sup

1≤i≤n

∣∣yi,−1

∣∣ sup1≤i≤n

|ui| .

By applying exponential inequality, we have sup1≤i≤n |Tui| = Op

(√log nn

14T

12

). Note

that{(∑T

t=s+1ρt−1−si

)uis

}Ts=1

is a martingale difference sequence as

E

((T∑

t=s+1

ρt−1−si

)uis|Fi,s−1

)= 0,

with Fi,s−1 := σ {ui,s−1, ui,s−2, ...} .We set a truncation rate oi,nT >> n14T

14

+γ .We define

xit :=(∑T

t=s+1ρt−1−si

)uis and make the following decomposition

T∑t=1

xit =T∑t=1

x1it +

T∑t=1

x2it−T∑t=1E [x2it|Fi,t−1] . Define x1it := xit1it−E [xit1it|Fi,t−1] and x2it := xit1it. Define

1it := 1 {|xit| ≤ oi,nt} and 1it = 1− 1it. It suffi ces to show that

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥ T fnT)

= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x2it

∣∣∣∣∣ ≥ T fnT)

= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [x2it|Fi,t−1]

∣∣∣∣∣ ≥ T fnT)

= o (1) . (61)

The second and third arguments of (61) share identical derivations, and without losinggenerality we only focus on the second term. We define WiT :=

∑Tt=1 E

[x2

1it|Fi,t−1

], and

wi,nT >> n12T

12

+ 5γ2 is a truncation rate for WiT . With some constant C > 0, we have

E[W 2iT

]= E

[T∑t=1

E[x2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[x4

1it

]≤ 16T

T∑t=1

E[x4it

]≤ CT 1+4γ

T∑t=1

ρ4ti = Op

(T 1+5γ

).


n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥ T fnT)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

x1it

∣∣∣∣∣ ≥ T fnT , WiT ≤ wi,nT

)+ n sup

1≤i≤nPr(WiT > wi,nT

)

57

≤ sup1≤i≤n

exp

(−T 2f2

nT + 2wi,nT log (n) + 4T fnT oi,nt log (n)

2wi,nT + 4T fnT oi,nt

)+ sup

1≤i≤no(nT 1+5γw−2

i,nT

)= o(1). (62)

To show asymptotic negligibility of (62), we need fnT >> n14T

5γ−34

√log (n) and fnT >>

n14T γ−

34 log (n) . Our assumption for wi,nT ensures sup1≤i≤n o

(nT 1+5γw−2

i,nT

)= o(1). For

the second term of (61), we have

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ T fnT)

≤ n sup1≤i≤n

Pr

(max

1≤t≤T|xit| ≥ oi,nt

)≤ nT

o4i,nT

sup1≤i≤n

max1≤t≤T

E[|xit|4 1 {|xit| > oi,nt}

]= sup

1≤i≤nmax

1≤t≤To

(nT 1+4γ

o4i,nT

)= o (1) ,

which is guaranteed by our assumption for oi,nt. Since n14T

5γ−34

√log (n) >> n

14T γ−

34 log (n) ,

we need fnT >> n14T

5γ−34

√log (n). This shows

sup1≤i≤n


1≤i≤n

∣∣yi,−1

∣∣ sup1≤i≤n

|ui| = Op

((log n)n

12T

5γ−14

).

With rate restriction γ < 23 , we have

sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(n14

+εT1+2γ4

√log (n)

),

and

sup1≤i≤n,c0i<0

∣∣ci − c0i

∣∣ = Op

(n14

+ε√

log(n)

Tγ2

+ 34

),

with arbitrary ε > 0.�

Lemma 8.10 If Assumption 1 and 2 hold,

sup1≤i≤n,c0i=0

∣∣ci − c0i

∣∣ = Op

(n14

+ε√

log(n)

T

),

with arbitrary ε > 0.

The Proof of Lemma 8.10: Similarly, we demonstrate the uniform upper boundsand uniform lower bound of several terms as

inf1≤i≤n

T∑t=1

Ey2i,t−1,︸︷︷︸

(D.1)

sup1≤i≤n

T∑t=1

∣∣Ey2i,t−1 − y2

i,t−1

∣∣︸︷︷︸

(D.2)

, sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣︸︷︷︸(D.3)

.

58

For the term of (D.1), we have

inf1≤i≤n

T∑t=1

Ey2i,t−1 = O

(T 2),

since the expectation operator removes randomness. For (D.2), by applying the exponen-tial inequality of Freedman (1975) or as in Su et al. (2016), we can readily show

sup1≤i≤n

T∑t=1

∣∣Ey2i,t−1 − y2

i,t−1

∣∣ = op(T 2).

Therefore we show

inf1≤i≤n

T∑t=1

y2i,t−1 = inf

1≤i≤n

T∑t=1

Ey2i,t−1− sup

1≤i≤n

T∑t=1

(Ey2

i,t−1 − y2i,t−1

)= Op

(T 2).

For the term of (D.3), we accommodates the following decomposition as,

Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)

≤ Pr

(sup

1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)+ Pr

(sup

1≤i≤n


)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ ≥ MnT

)︸︷︷︸

(D.3.1)

+n sup1≤i≤n

Pr(∣∣Tyi,−1ui

∣∣ ≥ MnT

)︸︷︷︸

(D.3.2)

,

where the innovations dominate the fixed effect uniformly. For (D.3.1) we apply the expo-nential inequality of Freedman (1975). The process {yi,t−1uit}Tt=1 is a martingale differencesequence as E (yi,t−1uit|Fi,t−1) = 0 with Fi,t−1 := σ {ui,t−1, ui,t−2, ...} . We set a trunca-tion rate di,nT >> n

14T

34 .We define zit := yi,t−1uit and make the following decomposition

T∑t=1

zit =T∑t=1

z1it +T∑t=1


z2it := zit1it. Define 1it := 1{|zit| ≤ di,nT


n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

)= o (1) ,

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

E [z2it|Fi,t−1]

∣∣∣∣∣ ≥ MnT

)= o (1) . (63)

59


∑Tt=1 E

[z2

1it|Fi,t−1

], and

vi,nT >> n12T 2 is a truncation rate for ViT .

E[V 2iT

]= E

[T∑t=1

E[z2

1it|Fi,t−1

]]2

≤ TT∑t=1

E[z4

1it

]≤ 16T

T∑t=1

E[z4it

]≤ CT 4 = Op

(T 4).


n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it

∣∣∣∣∣ ≥ MnT

)

≤ n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z1it


)+ n sup


)≤ sup

1≤i≤nexp

(−M2

nT + 2vi,nT log (n) + 4MnT di,nT log (n)

2vi,nT + 4MnT di,nT

)+ sup

1≤i≤no(nT 4v−2

i,nT

)= o(1). (64)

To show asymptotic negligibility of (64), we need MnT >> n14T√

log (n) and MnT >>

n14T

34 log (n) . Our assumption for vi,nT ensures sup1≤i≤n o

(nT 4v−2

i,nT

)= o(1). For the

second term of (63), we have

n sup1≤i≤n

Pr

(∣∣∣∣∣T∑t=1

z2it

∣∣∣∣∣ ≥ MnT

)

≤ n sup1≤i≤n

Pr

(max


)≤ nT

d4i,nT

sup1≤i≤n

max1≤t≤T

E[|zit|4 1

{|zit| > di,nT

}]= sup

1≤i≤nmax

1≤t≤To

(nT 3

d4i,nT

)= o (1) ,

which is guaranteed by our assumption for di,nT . Since n14T√

log (n) >> n14T

34 log (n) ,

we need MnT >> n14T√

log (n). Therefore we have, for any ε > 0,

sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(n14T√

log (n)).

For (C.3.2) term, we have

sup1≤i≤n

∣∣Tyi,−1ui∣∣ = T sup

1≤i≤n

∣∣yi,−1

∣∣ sup1≤i≤n

|ui| .

By applying exponential inequality, we have sup1≤i≤n |Tui| = Op

(√log nn

14T

12

). Simi-

larly, by applying exponential inequality, we can show that T sup1≤i≤n∣∣yi,−1

∣∣ = Op

(√log nn

14T

34

).

60

Therefore we have

sup1≤i≤n


1≤i≤n

∣∣yi,−1

∣∣ sup1≤i≤n

|ui| = Op

((log n)n

12T

14

).

and

sup1≤i≤n

∣∣∣∣∣T∑t=1

yi,t−1uit

∣∣∣∣∣ = Op

(n14

+εT√

log (n)).

The uniform upper bound for individual least squares is

sup1≤i≤n,c0i<0

∣∣ci − c0i

∣∣ = Op

(n14

+ε√

log(n)

T

),

with arbitrary ε > 0.�

Lemma 8.11 If 313 < γ < 2

3 and suppose Assumption 1 and 2 hold,

sup1≤i≤n

∣∣ci − c0i

∣∣ = max

{Op

(n14

+ε√

log (n)

T34

+ γ2

), Op

(n14

+ε√

log(n)

T

)}=: Op(δnT ),

for arbitrary ε > 0.

The Proof of Lemma 8.11: The uniform convergence rate of individual leastsquare estimator is shown as

sup1≤i≤n

∣∣ci − c0i

∣∣ = max

{sup

1≤i≤n,c0i>0

∣∣ci − c0i

∣∣ , sup1≤i≤n,c0i<0

∣∣ci − c0i

∣∣ , sup1≤i≤n,c0i=0

∣∣ci − c0i

∣∣}

= max

{Op

(1

ρT

), Op

(n14

+ε√

log (n)

T34

+ γ2

), Op

(n14

+ε√

log(n)

T

)}

= max

{Op

(n14

+ε√

log (n)

T34

+ γ2

), Op

(n14

+ε√

log(n)

T

)}= Op(δnT ),

by Lemma 8.8, Lemma 8.9, Lemma 8.10 and dominance of exponential rates. We concludeour proof. �

Lemma 8.12 Assume rate restriction 313 < γ < 2

3 . If Assumption 1 holds,

dH(c0n, α

)= O

(δ12nT

),

where δnT is defined in Lemma 8.10.

61

When n grows more slowly than T , the Hausdorff distance between c0n and α is asymp-

totically diminishing. This lemma shows the uniform consistency of parameter estimations,based on which the following theorems establish the individual and uniform consistencyof the modified k-means algorithm for mixed-root panel autoregressions.

The Proof of Lemma 8.12: (This proof follows derivations in Su et al. (2019)).Let Qn (α) =

∑K0

g=1 min1≤l≤K(c0gn − αl

)2πg.

(i) Firstly we derive the convergence rate of Qn (α)−Qn (α) uniformly over α ∈M :={(α1, ..., αK0) : sup1≤g≤K0 |αg| ≤ 2c

}for some upper bound c. LetRn = sup1≤i≤n

∣∣∣ci − c0g0i n

∣∣∣.Then by assumptions, we have

Rn = sup1≤i≤n


∣∣∣ ≤ Op(δnT ) ≤ c a.s.,

by our previous derivations. In addition, we have,

(ci − αl)2 ≥(c0g0i n− αl

)2− 2

∣∣∣(c0g0i n− ci

)(c0g0i n− αl

)∣∣∣− (c0g0i n− ci

)2

≥(c0g0i n− αl

)2− 2

∣∣∣(c0g0i n− ci

)(c0g0i n− αl

)∣∣∣− ( sup1≤i≤n

(c0g0i n− ci

))2

≥(c0g0i n− αl

)2− 2

(sup

1≤i≤n

∣∣∣c0g0i n− ci

∣∣∣)(∣∣∣c0g0i n− αl

∣∣∣)− ( sup1≤i≤n

(c0g0i n− ci

))2

≥(c0g0i n− αl

)2− 2

(sup

1≤i≤n


∣∣∣)(∣∣∣c0g0i n

∣∣∣+ |αl|)−(

sup1≤i≤n

(c0g0i n− ci

))2

.

Taking min1≤l≤K0 on both sides and take average over i, we have

1

n

n∑i=1

min1≤l≤K0

(ci − αl)2 ≥ 1

n

n∑i=1

min1≤l≤K0

(c0g0i n− αl

)2− 2δnT (2c+ c)− cδnT ,

where(

sup1≤i≤n

(c0g0i n− ci

))2≤ cδnT is of lower order term. Therefore we have Qn (α) ≥

Qn (α)− 7cδnT . Similarly we have

(ci − αl)2 ≤(c0g0i n− αl

)2+ 2

(sup

1≤i≤n


∣∣∣)(∣∣∣c0g0i n

∣∣∣+ |αl|)

+

(sup

1≤i≤n

(c0g0i n− ci

))2

.

Hence we derive Qn (α) ≤ Q (α) + 7cδnT and supα∈M

∣∣∣Qn (α)−Qn (α)∣∣∣ ≤ 7cδnT .

(ii) Secondly, we show that α ∈M. Denote α := {α1, α1, ..., αK0} . By our assumption,we have

sup1≤i≤n

|ci| ≤ sup1≤i≤n


∣∣∣+ sup1≤i≤n

∣∣∣c0g0i n

∣∣∣ ≤ 2c.

Denote In (g) := {i : g = arg min1≤g≤K |ci − αl|} for some g ≤ K0. Here we use contradic-tions to demonstrate our results:

(ii.1) If |αg| > 2c and In (g) = ∅, then we choose α′

:={α1, ..., αg−1, α

′g, αg+1..., αK0

},

with α′g = ci for each i ∈ {1, 2, ..., n} . Therefore we can get

∣∣∣α′g∣∣∣ ≤ 2c < |αg| and

Qn

(α′)< Qn (α) . This demonstrates a contradiction.

62

(ii.2) If |αg| > 2c and In (g) 6= ∅, then we choose α′

:={α1, ..., αg−1, α

′g, αg+1..., αK

},

with α′g = 1

|In(g)|∑

i∈In(g) ci for each i ∈ {1, 2, ..., n} . Here |In (g)| denotes the cardinality

of In (g) . This shows∣∣∣α′g∣∣∣ ≤ 2c <

∣∣αg∣∣ and Qn (α′) < Qn (α) . This is a contradiction too.

Based on (ii.1) and (ii.2),∣∣αg∣∣ ≤ 2c for each g ∈

{1, 2, ...,K0

}.

(iii) We show for any η > 0 infα:dH(α,c0n)>η Qn (α) ≥ M min{η2, (c∗)2

}where c0

n :={c0

1n, c02n, ..., c

0K0n

}and M ≤ πg ≤ 1 for g = 1, 2, ...,K0. If there exists some lo ∈{

1, 2, ...,K0}and two indexes g1 and g2 such that

lo = arg min1≤l≤K

∣∣c0g1n − αlo

∣∣ = arg min1≤l≤K

∣∣c0g2n − αlo

∣∣ ,then we have

Qn (α) ≥ πg1

(c0g1n− αlo

)2+ πg2

(c0g2n− αlo

)2

≥ M(∣∣∣c0

g1n− αlo

∣∣∣+∣∣∣c0g2n− αlo

∣∣∣)2≥M

(c0g1n− c0

g2n

)2

≥ M (c∗)2 .

Besides, if there does not exist such an lo, then there is one-to-one mapping h:{

1, 2, ...,K0}→{

1, 2, ...,K0}such that

h (g) = arg min1≤l≤K

∣∣c0gn − αl

∣∣ .Thus Qn (α) =

∑K0

g=1 πg(c0gn − αh(g)

)2 ≥ (inf1≤g≤K0 πg)d2H

(α, c0

n

)≥Mη2. Then we show

(iii).(iv) Lastly, we show that dH

(α, c0

n

)≤√

15cδnT√M

for some constant D > 0 and arbitrary

ε > 0. We have

Pr

(dH(α, c0

n

)≥√

15cδnT√M

)= Pr

(dH(α, c0

n

)≥√

15cδnT√M

,Qn (α) ≥ Qn(c0n

)+ min

{M (c∗)2 , 15cδnT

})≤ Pr

(Qn (α) ≥ Qn

(c0n

)+ min

{M (c∗)2 , 15cδnT

})≤ Pr

(Qn (α) +Rn ≥ Qn

(c0n

)−Rn + min

{M (c∗)2 , 15cδnT

})= Pr

(2Rn ≥ Qn

(c0n

)− Qn (α) + min

{M (c∗)2 , 15cδnT

})≤ Pr

(2Rn ≥ min

{M (c∗)2 , 15cδnT

})= o (1) ,

since Qn(c0n

)− Qn (α) ≥ 0. Note the fact that for large enough n and T , we have

2Rn ≤ 2 · 7cδnT < 15cδnT < M (c∗)2 .

We complete the whole proof.�

63

The Proof of Theorem 4.3: (This proof follows derivations in Su et al. (2019)).By Assumption 1, under joint convergence framework (n, T ) → ∞, there is one-to-onemapping Fn :

{1, 2, ...,K0

}→{

1, 2, ...,K0}, such that

supg

∣∣∣αg − c0Fn(g)n

∣∣∣ ≤ Op(√15cδnT√M

)= Op

(δ12nT

).

Without losing generality, we can assume that Fn (g) = g such that

Rn = supg

∣∣∣αg − c0Fn(g)n

∣∣∣ = supg

∣∣αg − c0gn

∣∣ ≤ Op(√15cδnT√M

)= Op

(δ12nT

).

If gi 6= g0i , then

∣∣ci − αgi∣∣ ≤ ∣∣∣ci − αg0i ∣∣∣ . This, in conjunction with triangle inequality,implies that ∣∣∣αgi − αg0i ∣∣∣− ∣∣∣ci − αg0i ∣∣∣ ≤ ∣∣αgi − ci∣∣ ≤ ∣∣∣ci − αg0i ∣∣∣ .It follows that

∣∣∣ci − αg0i ∣∣∣ ≥ 12

∣∣∣αgi − αg0i ∣∣∣ . Therefore we have,δnT +Rn ≥


∣∣∣+∣∣∣c0g0i n− αg0i

∣∣∣ ≥ ∣∣∣ci − αg0i ∣∣∣ ≥ 1

2

∣∣∣αgi − αg0i ∣∣∣≥ 1

2

∣∣∣(cgin − c0g0i n

)+(αgi − c

0gin

)+(c0g0i n− αg0i

)∣∣∣≥ 1

2


)∣∣∣− 1

2

∣∣∣(αgi − c0gin

)∣∣∣− 1

2

∣∣∣(c0g0i n− αg0i

)∣∣∣≥ 1

2


)∣∣∣− supg

∣∣(αg − c0gn

)∣∣=

1

2


)∣∣∣−Rn ≥ c∗

2−Rn.

This implies that 1{gi 6= g0

i

}≤ 1

{2Rn + δnT ≥ c∗

2

}. Noting that the right hand side of

the above term is independent of i, we have

Pr

{sup

1≤i≤n1{gi 6= g0

i

}> 0

}≤ Pr

{2Rn + δnT ≥

c∗

2

}≤ Pr

{δnT + 2δ

12nT ≥

c∗

2

}p→ 0,

as (n, T )→∞. This concludes our proof.�Similarly, to verify the oracle property on the post-classification estimator, we provide

the uniform consistency of our modified k-means classification algorithm.

Lemma 8.13 (Uniform Consistency of Classification) Assumption 1 and 313 < γ < 2

3

hold. Under joint convergence (n, T )→∞,(i) Pr

(⋃K0

g=1 Eg,nT

)≤∑K0

g=1 Pr(Eg,nT

)→ 0; and

(ii) Pr(⋃K0

g=1 Fg,nT

)≤∑K0

g=1 Pr(Fg,nT

)→ 0.

64

Once we attain the uniform consistency of the modified k-means classification, we canderive the asymptotic oracle property of (12).

The Proof of Lemma 8.13: For uniform consistency, it is suffi cient to show

K0∑g=1

Pr(Eg,nT

)= o(1).

The∑K0

g=1 Pr(Fg,nT

)= o(1) follows the exact derivations as in recursive k-means. We

know∣∣ci − αgi∣∣ ≤ ∣∣∣ci − αg0i ∣∣∣. This implies that

1{gi 6= g0

i

}≤ 1

{δnT + 2Rn ≥

c∗

2

},

where RnT =√

15cδnT√M

. Since we intend to show n sup1≤i≤n Pr(∣∣gi − g0

i

∣∣ > 0)

= o (1), we

need to show

n sup1≤i≤n

Pr(∣∣gi − g0

i

∣∣ > 0)≤ n sup

1≤i≤nPr

(δnT + 2Rn ≥

c∗

2

)= o(1).

Indeed we have

n sup1≤i≤n

Pr(∣∣gi − g0

i

∣∣ > 0)

≤ n sup1≤i≤n

Pr

(δnT≥

c∗

4

)+ n sup

1≤i≤nPr

(Rn ≥

c∗

8

)= o(1),

under the joint convergence (n, T ) → ∞. The reason why we can have the asymptoticnegligibility is for the properties of deterministic trends. Hence, the type I error of modifiedk-means classification is asymptotically diminishing as

K0∑g=1

Pr(Eg,nT

)=

K0∑g=1

∑i∈G0g

Pr(EgnT,i

)≤

K0∑g=1

∑i∈G0g

Pr(i /∈ Gg|i ∈ G0

g

)≤ n sup

1≤i≤nPr(∣∣gi − g0

i

∣∣ > 0)

= o (1) .

The proof for∑K0

g=1 Pr(Fg,nT

)= o (1) is identical to the recursive k-means. Then we

complete the proof.�The Proof of Theorem 4.4: For mixed-root panel with latent groups, the differ-

ence between the post-classification estimator and the oracle estimator is asymptoticallydiminishing. For the explosive group, the proof here is identical to Theorem 4.2. For thestationary and unit root group, derivations follow similar procedures. These results arealso shown in Phillips (2014).�

We provide the proofs for inference procedures and model selection on the true groupnumbers.

65

The Proof of Lemma 4.1: For any g = 1, 2, ...,K0 with α0g > 0, we have

uit = yit − ρgnyi,t−1,

uit = yit − ρ0gnyi,t−1.

Therefore we have

1

2ngT

∑i∈Gg

T∑t=1

u2

it =1

2ngT

∑i∈Gg

T∑t=1

u2it + (ρgn − ρ0

gn)2 1

2ngT

∑i∈Gg

T∑t=1

y2i,t−1

−(ρgn − ρ0gn)

1

2ngT

∑i∈Gg

T∑t=1

yi,t−1uit

=1

2ngT

∑i∈G0g

T∑t=1

u2it +

1

2ngT

∑i∈Gg\G0g

T∑t=1

u2it −

1

2ngT

∑i∈G0g\Gg

T∑t=1

u2it

+(ρgn − ρ0gn)2 1

2ngT

∑i∈G0g

T∑t=1

y2i,t−1 + (ρgn − ρ0

gn)2 1

2ngT

∑i∈Gg\G0g

T∑t=1

y2i,t−1

−(ρgn − ρ0gn)2 1

2ngT

∑i∈G0g\Gg

T∑t=1

y2i,t−1 − (ρgn − ρ0

gn)1

2ngT

∑i∈G0g

T∑t=1

yi,t−1uit

−(ρgn − ρ0g)

1

2ngT

∑i∈Gg\G0g

T∑t=1

yi,t−1uit + (ρgn − ρ0gn)

1

2ngT

∑i∈G0g\Gg

T∑t=1

yi,t−1uit(65)

To demonstrate the dominance of the first term in (65), we first show that ng−ng = op(1).

Note the fact that 1{i ∈ Gg

}−1{i ∈ G0

g

}= 1

{i ∈ Gg\G0

g

}−1{i ∈ G0

g\Gg}. ByMarkov

inequality, for any ε > 0, we have

Pr(∣∣ng − ng∣∣ > 2ε

)≤ Pr

(n∑i=1

1{i ∈ Gg\G0

g

}≥ ε)

+ Pr

(n∑i=1

1{i ∈ G0

g\Gg}≥ ε)

≤ 1

ε

n∑i=1

Pr(Fg,i

)+

1

ε

n∑i=1

Pr(Eg,i

)

=1

ε

K0∑g=1

∑i∈G0g

Pr(Fg,i

)+

1

ε

K0∑g=1

∑i∈G0g

Pr(Eg,i

)= o(1)

The asymptotic negligibility of 12ngT

∑i∈G0g\Gg

∑Tt=1 u

2it, (ρgn−ρ0

gn)2 12ngT

∑i∈G0g\Gg

∑Tt=1 y

2i,t−1

and (ρgn − ρ0gn) 1

2ngT

∑i∈G0g\Gg

∑Tt=1 yi,t−1uit follows the same techniques as in (iii) of the

Theorem 4.2. The asymptotic negligibility of 12ngT

∑i∈Gg\G0g

∑Tt=1 u

2it, (ρgn−ρ0

gn)2 12ngT

∑i∈Gg\G0g

∑Tt=1 y

2i,t−1

and (ρgn − ρ0gn) 1

2ngT

∑i∈Gg\G0g

∑Tt=1 yi,t−1uit follows the same techniques as in (i) (ii) of

66

the Theorem 4.2. The asymptotic negligibility of (ρgn − ρ0gn)2 1

2ngT

∑i∈G0g

∑Tt=1 y

2i,t−1 and

(ρgn−ρ0gn) 1

2ngT

∑i∈G0g

∑Tt=1 yi,t−1uit follows identically as the proof of the oracle estimator.

Therefore, under the joint convergence (n, T )→∞, we have

1

2ngT

∑i∈Gg

T∑t=1

u2

it =1

2ngT

∑i∈G0g

T∑t=1

u2it + op(1).

For the stationary and unit root group, the consistency of the estimator follows Phillips(2014).�

The Proof of Theorem 4.5: Under the joint convergence (n, T ) → ∞ and thenull hypothesis, the tg statistics follows chi-square distribution. Under the alternativehypothesis of explosive roots, we have(

ρgn − 1) (√∑n

i=1

∑Tt=1 y

2i,t−1

)σ

=

(ρgn − ρ0

gn

)−(1− ρ0

gn

)σ

√√√√ n∑i=1

T∑t=1

y2i,t−1

=

(ρgn − ρ0

gn

) (√∑ni=1

∑Tt=1 y

2i,t−1

)σ

−

(1− ρ0

gn

) (√∑ni=1

∑Tt=1 y

2i,t−1

)σ

= Op (1) +Op

(√n(ρ0gn

)T).

Therefore we conclude our proof. �The Proof of Theorem 4.6: As (n, T ) → ∞, we have βTn → 0 and cvβTn → ∞.

Since tgd→ N(0, 1) under the null hypothesis of no bubble episode for g = 1, 2, ...K0, we

havelim

(n,T )→∞Pr(tg (·) > cvβTn

)= Pr (N (0, 1) =∞) = 0,

where cvβTn →∞. Hence, with joint convergence (n, T )→∞, no origination point for anexplosive model in the data will be detected under the null hypothesis.�

The Proof of Theorem 4.7: We fix g = 1, 2, ...,K0. Due to the uniform consistencyof the modified k-means classifiers, Gg is equivalent to G0

g, and gi is equivalent to g0i

asymptotically. For all individual in G0g, the time series data is sampled from

yit = µi + ρityi,t−1 + uit, i = 1, 2, ..., n; i = 1, 2, ..., n, (66)

where ρit = 1 + citT γ , and cit = c0

1i1{t < τ e

g0i

}+ c0

2i1{t ≥ τ e

g0i

}. For the g-th group, the

true distance parameter of explosive episode is α02g. The true value of the slope in the g-th

group is ρ02,gn, and we write as ρ2,gn for simplicity. The true value of the explosive slope

in the i-th individual is ρ02i, and we write as ρ2i for simplicity. For any i ∈ G0

g, we haveg0i = g. For rg < reg, we have the convergence in distribution,

tg (rg)d→ N (0, 1) ,

under the model of (66). Therefore, under the alternative hypothesis, we have for rg < reg,Pr(reg < reg

)→ 0 as (n, T )→∞.

67

Next suppose the data is sampled over t = 1, 2, ..., τ g0i=[Trg0i

], for each i ∈ G0

g. In

this case, the data{yit : t = τ e

g0i, ..., τ g0i

}satisfies

yit = µi + ρ2iyi,t−1 + uit =

t−τeg0i∑

j=0

ρj2iui,t−j +

t−τeg0i∑

j=0

ρj2iµi + ρt−τe

g0i

+1

2i yi,τeg0i

−1.

As t− τ eg0i→∞, the following asymptotic theory holds,

1

Tγ2

t−τeg0i∑

j=0

ρj−(t−τe

g0i

)2i ui,t−j

d→ N

(0,

σ2

2c02i

),

where 1√Tyi,τe

g0i

−1 ⇒ Bi

(τ eg0i

)by the functional central limit theorem. Then as t− τ e

g0i→

∞, we have

ρ−(t−τe

g0i

)2i √

Tyi,t =

1

T1−γ2

1√T γ

t−τeg0i∑

j=0

ρ−(t−τe

g0i

−j)

2i ui,t−j +ρ2i√Tyi,τe

g0i

−1

+1

T1−γ2

1√T γ

t−τeg0i∑

j=0

ρ−(t−τe

g0i

−j)

2i µi

⇒ Bi

(reg0i

).

So that for each i ∈ G0g, we have yit ∼

√Tρ

(t−τe

g0i

)2i Bi

(reg0i

), for all t− τ e

g0i→∞.

Now consider the centered quantities yit = yit− 1τ

∑τj=1 yi,j . Note the fact that g

0i = g.

For τ g = [Trg] (or τ g0i =[Trg0i

]) and reg < rg (or reg0i

< rg0i), we have

1

τ g0i

√T

τg0i∑

j=1

yij =1

τ g0i

√T

τg0i∑

j=τeg0i

yij +τ eg0i

τ g0i

1

τ eg0i

τeg0i∑

j=1

yij ∼1

τ g0i

√T

τg0i∑

j=τeg0i

yij +reg0i

rg0i

∫ 1

0Bi (s) ds,

and

1

τ g0i

τg0i∑

j=τeg0i

yij =1

τ g0i

τg0i∑

j=τeg0i

ρ(j−τe

g0i

)

2i

(ρ−(j−τe

g0i

)

2i yij

)=yi,τe

g0i

τ g0i

t−τeg0i∑

k=0

ρk2i (1 + o(1))

=yi,τe

g0i

τ g0i

ρt−τe

g0i

+1

2i − 1

τ g0i(ρ2i − 1)

(1 + o (1))

68

=yi,τe

g0i

τ g0i

T γρt−τe

g0i

+1

2i

τ g0ic0

2i

(1 + o (1)) .

It follows that

yit = yit −1

τ g0i

τg0i∑

j=1

yi,j =

ρ(t−τe

g0i

)2i − T γρ

(t−τe

g0i

)2i

τ g0ic0

2i

yi,τeg0i

{1 + o (1)} .

Therefore we have the following asymptotics for the sample moment in the post-classificationestimator as, ∑

i∈G0g

τeg0i∑

j=1

y2it =

nT 2γτ egρ2(t−τeg)2,gn

τ2g

(α0

2g

)2 Ey2i,τeg{1 + o (1)} .

Using these results in conjuction with the standard unit root limit theory, we have

∑i∈G0g

τg0i∑

j=1

y2i,j−1 =

∑i∈G0g

τeg0i

−1∑j=1

y2i,j−1 +

∑i∈G0g

τg0i∑

j=τeg0i

y2i,j−1 ∼

∑i∈G0g

τg0i∑

j=τeg0i

y2i,j−1, (67)

and

∑i∈G0g

τg0i∑

j=1

yi,t−1 (yi,j − ρ2iyi,j−1) =∑i∈G0g

τeg0i

−1∑j=1

yi,t−1

(ui,j −

c02i

T 2γyi,j−1

)+∑i∈G0g

τeg0i

−1∑j=1

yi,t−1ui,j (1 + o (1)) .

(68)Explicit probability limits of (67) and (68) are as,

∑i∈G0g

τg0i∑

j=τeg0i

y2i,j−1 =

∑i∈G0g

τg0i∑

j=τeg0i

ρ2

(j−τe

g0i

)2i y2

i,τeg0i

(1 + o (1)) =nρ

2(t−τeg+1)2,gn

ρ22,gn − 1

Ey2i,τeg{1 + o (1)}

=nT γρ

2(t−τeg)2,gn

2α02g

Ey2i,τeg{1 + o (1)} ,

which dominates∑

i∈G0g∑τe

g0i

j=1 y2i,j−1. The above derivations are due to the uniform con-

sistency of the modified k-means classifiers.Besides, we have for each i,

τg0i∑

j=τeg0i

yi,j−1ui,j =

τg0i∑

j=τeg0i

ρj−1−τe

g0i

2i yi,τeg0i

ui,j (1 + op (1))

= Tγ2 ρ

τg0i−τe

g0i

2i yi,τeg0i

1

Tγ2

τg0i∑

j=τeg0i

ρ−(τg0i−j+1

)2i ui,j

(1 + op (1))

69

∼ Tγ+12 ρ

τg−τeg2,gn ·N

(0,

σ2

2α02g

)·N(0, σ2

).

The terms 1√Tyi,τe

g0i

and

1

Tγ2

∑τg0i

j=τeg0i

ρ−(τg0i−j+1

)2i ui,j

are uncorrelated due to the mar-tingale difference property of innovations. Therefore the two limiting Guassian processesare independent. Under the joint convergence (n, T )→∞, we have

∑i∈G0g

τg0i∑

j=τeg0i

yi,j−1ui,j ∼√nT

γ+12 ρ

τg−τeg2,gn ·N

(0,

σ4

2α02g

).

It follows that for τ g = [Trg] and rg > reg we have

√nT

1+γ2 ρ

τg−τeg2,gn

2α02g

(ρ2,gn − ρ2,gn

) d→ N(0, 2α0

2g

),

where under (n, T )→∞, we have

ρ−(τg−τeg)2,gn√nT

1+γ2

∑i∈G0g

τg0i∑

j=τeg0i

yi,j−1ui,jd→ N

(0,

σ4

2α02g

),

and

2α02gρ−2(τg−τeg)2,gn

nT 1+γ

∑i∈G0g

τg0i∑

j=τeg0i

y2i,j−1

p→ σ2.

The regression residuals variance estimate is as, for any rg ∈ [r0, 1] , we have

σ2g (rg) =

1

2ngτ g

∑i∈G0g

τg∑j=1

(yi,j − ρ2i (rg) yi,j−1)2

=1

2ngτ g

∑i∈G0g

τg∑j=1

(ui,j − (ρ2i (rg)− ρ2i) yi,j−11

{j ≥ τ eg0i

}− (ρ2i (rg)− 1) yi,j−11

{j < τ eg0i

})2

∼τ egρ

2(τg−τeg)2,gn

2(α0

2g

)2τ3g

Ey2i,τeg

(1 + op (1)) ,

due to the fact that 12ngτg

∑i∈G0g

∑τgj=1 y

2i,j−1 =

τegρ2(τg−τeg)2,gn

2(α02g)2τ3Ey2

i,τeg(1 + op (1)) .

Collecting all results, the tg (rg) statistics for any rg ∈[reg, 1

], is as

tg (rg) =

(ρ2,gn − 1

)√∑i∈Gg

∑τgj=1 y

2i,j−1

σg (rg)

70

=

(ρ2,gn − 1

)τ g√

1τ2g

∑i∈G0g

∑τgj=1 y

2i,j−1

σg (rg)

=T 1−1γ

(rg · α0

2,g

)(1 + op (1))

√1τ2g

∑i∈G0g

∑τgj=1 y

2i,j−1

σg (rg)

=

T 1−γ (rg · α02,g

)(1 + op (1))

√nT γρ

2(τg−τeg)2,gn

2τ2gα02g

Ey2i,τeg√

regρ2(τg−τeg)2,gn

2rg(α02g)2τ2gEy2

i,τeg(1 + op (1))

=

√nT γα0

2,grgT1−γ (rg · α0

2,g

)(1 + op (1))√

reg= O

(√nT 1− γ

2

)= O (PTn) .

This argument shows that the statistics is consistent under the alternative hypothesis. �The Proof of Theorem 4.8: Based on Theorem 4.4 and Lemma 4.1, we have

BIC(K0) = ln(σ2Gg(K0)

)+ σ2

Gg(K0)

K0 + n

nTln(nT )

= ln

1

nT

K0∑g=1

∑i∈Gg(K0)

T∑t=1

(yit − ρGg(K)n

yi,t−1

)2

+ o(1)

→ σ2.

Moreover, under the under-fitted model with K∗ < K0, note that

σ2G(K∗)

= ln

1

nT

K∗∑g=1

∑i∈Gg(K∗)

T∑t=1

(yit − ρGg(K∗)nyi,t−1

)2

≥ min

1≤K∗<K0inf

G(K∗)

1

nT

K∗∑g=1

∑i∈Gg(K∗)

T∑t=1

(yit − ρGg(K∗)nyi,t−1

)2

= min1≤K∗<K0

infG(K∗)

σ2G(K∗).

Under Assumptions 1 and 2, we have

min1≤K∗<K

BIC (K∗) ≥ min1≤K∗<K

infG(K∗)∈G(K∗)

ln(σ2G(K∗)

)+ σ2

G(K∗)K0 + n

nTln (nT )

p→ ln(σ2)

> ln(σ2),

and it follows that

Pr

(min

1≤K∗<KBIC (K∗) > BIC (K)

)→ 1.

Lastly, under the overfitted moel, and in this case K0 < K∗ ≤ Kmax,

Pr

(min

K0<K∗≤Kmax

BIC (K∗) > BIC (K)

)71

= Pr

(min

K0<K∗≤Kmax

nT ln(σ2G(K∗)/σ

2G(K)

)+(K∗ −K0

)ln (nT ) > 0

)= Pr

(min

K0<K∗≤Kmax

nT ln

(σ2G(K∗) − σ

2G(K)

σ2G(K)

)+(K∗ −K0

)ln (nT ) > 0

)→ 1,

as (n, T )→∞.�

References

Bai, J., & Ng, S. (2002). Determining the number of factors in approximate factormodels. Econometrica, 70(1), 191-221.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica,71(1), 135-171.

Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4),1229-1279.

Blanchard, O. J., & Watson, M. W. (1982). Bubbles, rational expectations and financialmarkets. NBER working paper.

Bonhomme, S., & Manresa, E. (2015). Grouped patterns of heterogeneity in panel data.Econometrica, 83(3), 1147-1184.

Chen, K., & Wen, Y. (2017). The great housing boom of China. American EconomicJournal: Macroeconomics, 9(2), 73-114.

Diba, B. T., & Grossman, H. I. (1987). On the inception of rational bubbles. TheQuarterly Journal of Economics, 102(3), 697-700.

Diba, B. T., & Grossman, H. I. (1988). Explosive rational bubbles in stock prices?. TheAmerican Economic Review, 78(3), 520-530.

Fang, H., Gu, Q., Xiong, W., & Zhou, L. A. (2016). Demystifying the Chinese housingboom. NBER macroeconomics annual, 30(1), 105-166.

Freedman, D. A. (1975). On tail probabilities for martingales. the Annals of Probability,3(1), 100-118.

Grout, P. A. (2003). Public and private sector discount rates in public—private partner-ships. The Economic Journal, 113(486), C62-C68.

Hahn, J., & Kuersteiner, G. (2002). Asymptotically unbiased inference for a dynamicpanel model with fixed effects when both n and T are large. Econometrica, 70(4),1639-1657.

Harrison, G. W., Lau, M. I., & Williams, M. B. (2002). Estimating individual discountrates in Denmark: A field experiment. American economic review, 92(5), 1606-1617.

Harvey, D. I., Leybourne, S. J., & Sollis, R. (2013). Recursive right-tailed unit roottests for an explosive asset price bubble. Journal of Financial Econometrics, 13(1),166-187.

72

Harvey, D. I., Leybourne, S. J., Sollis, R., & Taylor, A. R. (2016). Tests for explosivefinancial bubbles in the presence of non-stationary volatility. Journal of EmpiricalFinance, 38, 548-574.

Homm, U., & Breitung, J. (2012). Testing for speculative bubbles in stock markets: acomparison of alternative methods. Journal of Financial Econometrics, 10(1), 198-231.

Huang, W., S. Jin, and L. Su. (2019). Panel Cointegration with Latent Group Structuresand an Application to the PPP Theory. Econometric Theory, forthcoming.

Ke, Z. T., Fan, J., & Wu, Y. (2015). Homogeneity pursuit. Journal of the AmericanStatistical Association, 110(509), 175-194.

Lin, C. C., & Ng, S. (2012). Estimation of panel data models with parameter hetero-geneity when group membership is unknown. Journal of Econometric Methods, 1(1),42-55.

Li, Y., Yu, J., & Zeng, T. (2019). Deviance Information Criterion for Latent VariableModels and Misspecified Models. Journal of Econometrics, forthcoming.

Narayan, P. K., Mishra, S., Sharma, S., & Liu, R. (2013). Determinants of stock pricebubbles. Economic Modelling, 35, 661-667.

Pavlidis, E., Yusupova, A., Paya, I., Peel, D., Martínez-García, E., Mack, A., & Gross-man, V. (2016). Episodes of exuberance in housing markets: in search of the smokinggun. The Journal of Real Estate Finance and Economics, 53(4), 419-449.

Phllips. P. C. (2013). Estimation of the Localizing Rate for Mildly Integrated and MildlyExplosive Processes. Technical Report: Working Paper, Yale University.

Phillips, P. C., & Magdalinos, T. (2007). Limit theory for moderate deviations from aunit root. Journal of Econometrics, 136(1), 115-130.

Phillips, P. C., & Moon, H. R. (1999). Linear regression limit theory for nonstationarypanel data. Econometrica, 67(5), 1057-1111.

Phillips, P. C. (2014). Dynamic panel GMM with roots near unity. Technical Report:Working Paper, Yale University.

Phillips, P. C., Shi, S., & Yu, J. (2015a). Testing for multiple bubbles: Historical episodesof exuberance and collapse in the S&P 500. International economic review, 56(4),1043-1078.

Phillips, P. C., Shi, S., & Yu, J. (2015b). Testing for multiple bubbles: Limit theory ofreal-time detectors. International Economic Review, 56(4), 1079-1134.

Phillips, P. C., Wu, Y., & Yu, J. (2011). Explosive behavior in the 1990s Nasdaq: Whendid exuberance escalate asset values?. International economic review, 52(1), 201-226.

Phillips, P. C., & Yu, J. (2009). Limit theory for dating the origination and collapseof mildly explosive periods in time series data. Research Collection School Of Eco-nomics 2009-09.

Phillips, P. C., & Yu, J. (2011). Dating the timeline of financial bubbles during thesubprime crisis. Quantitative Economics, 2(3), 455-491.

73

Santos, M. S., & Woodford, M. (1997). Rational asset pricing bubbles. Econometrica:Journal of the Econometric Society, 19—57.

Sarafidis, V., & Weber, N. (2015). A partially heterogeneous framework for analyzingpanel data. Oxford Bulletin of Economics and Statistics, 77(2), 274-296.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesianmeasures of model complexity and fit. Journal of the royal statistical society: Seriesb (statistical methodology), 64(4), 583-639.

Shi, S. (2017). Speculative bubbles or market fundamentals? An investigation of USregional housing markets. Economic Modelling, 66, 101-111.

Su, L., Shi, Z., & Phillips, P. C. (2016). Identifying latent structures in panel data.Econometrica, 84(6), 2215-2264.

Su, L., Wang, W., & Zhang, Y. (2019). Strong consistency of spectral clustering forstochastic block models. IEEE Transactions on Information Theory, forthcoming.

74

Documents

Panel Approaches to Econometric Analysis of Bubble Behaviour€¦ · Bubble Behaviour Yanbo Liu Singapore Management University December 9, 2019 ... homogeneity over cross-sectional