
Goodness-of-fit tests for ARMA models
with uncorrelated errors∗

Christian Francq†   Roch Roy‡   Jean-Michel Zakoïan§

CRM-2925

July 2003 (This version: April 2004)

∗This work was supported by grants to the second named author from the Natural Science and Engineering Research Council of Canada, the Network of Centres of Excellence on The Mathematics of Information Technology and Complex Systems (MITACS) and the Fonds FQRNT (Government of Quebec).
†Université Lille III, GREMARS, BP 149, 59653 Villeneuve d'Ascq cedex, France; [email protected]
‡Université de Montréal, Département de mathématiques et de statistique et Centre de recherches mathématiques, C.P. 6128, succ. Centre-ville, Montréal, QC H3C 3J7, Canada; [email protected]
§GREMARS and CREST, 3 Avenue Pierre Larousse, 92245 Malakoff Cedex, France; [email protected]


Abstract

We consider tests for lack of fit in ARMA models with non-independent innovations. In this framework, the standard Box-Pierce and Ljung-Box portmanteau tests can perform poorly. Specifically, the usual textbook formulas for asymptotic distributions are based on strong assumptions and should not be applied without careful consideration. In this paper, we derive the asymptotic covariance matrix Σ_{ρ̂_m} of a vector of autocorrelations for residuals of ARMA models under weak assumptions on the noise. The asymptotic distribution of the portmanteau statistics follows. A consistent estimator of Σ_{ρ̂_m} and a modification of the portmanteau tests are proposed. This makes it possible to construct valid asymptotic significance limits for the residual autocorrelations, and (asymptotically) valid goodness-of-fit tests, when the underlying noise process is assumed to be uncorrelated rather than independent or a martingale difference. A set of Monte Carlo experiments, and an application to the Standard & Poor's 500 returns, illustrate the practical relevance of our theoretical results.

Keywords: Residual Autocorrelations, Approximate Significance Limits, Portmanteau Tests, Weak ARMA Models, GARCH.


1 Introduction

Since the papers by Box and Pierce (1970) and Ljung and Box (1978), portmanteau tests have been popular diagnostic checking tools in the ARMA modelling of time series. Based on the residual empirical autocorrelations ρ̂(h), the Box-Pierce and Ljung-Box statistics (BP and LB hereafter) are defined by

\[
  Q_m = n \sum_{h=1}^{m} \hat{\rho}^2(h)
  \quad\text{and}\quad
  \tilde{Q}_m = n(n+2) \sum_{h=1}^{m} \frac{\hat{\rho}^2(h)}{n-h},
  \tag{1.1}
\]

where n is the length of the series and m is a fixed integer. The standard test procedure consists in rejecting the null hypothesis of an ARMA(p, q) model if Q_m > χ²_{m−(p+q)}(1−α) (or Q̃_m > χ²_{m−(p+q)}(1−α)), where m > p+q and χ²_ℓ(1−α) denotes the (1−α)-quantile of a χ² distribution with ℓ degrees of freedom. Box and Pierce (1970) noted that if the noise sequence is independent and identically distributed (iid), the level of the Q_m test is approximately α when both m and n are large. The reader is referred to their paper, and to McLeod (1978), for a mathematical basis for this statement. The statistics Q_m and Q̃_m have the same asymptotic distribution, but the LB statistic has the reputation of performing better for small or medium sample sizes (see Ljung and Box (1978) or Davies, Triggs and Newbold (1977)).
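As a concrete illustration, the two statistics in (1.1) can be computed with a few lines of numpy. This is a minimal sketch, not the authors' code; the function name and the simulated input are my own, and the only assumption is that the residual series is available as a one-dimensional array.

```python
import numpy as np

def portmanteau(resid, m):
    """Box-Pierce Q_m and Ljung-Box Q~_m of (1.1), computed from the
    first m sample autocorrelations of the residual series."""
    e = np.asarray(resid, float) - np.mean(resid)
    n = len(e)
    gamma0 = e @ e / n
    rho = np.array([(e[:-h] @ e[h:]) / n / gamma0 for h in range(1, m + 1)])
    q_bp = n * np.sum(rho ** 2)                                        # Q_m
    q_lb = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))  # Q~_m
    return q_bp, q_lb

# Toy input: an iid (strong) white noise, for which both statistics are
# approximately chi-squared with m degrees of freedom when p = q = 0.
rng = np.random.default_rng(0)
q_bp, q_lb = portmanteau(rng.standard_normal(500), m=10)
```

Since each LB weight n(n+2)/(n−h) exceeds n, Q̃_m is always slightly larger than Q_m in finite samples.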

In the last decade, the time series literature has been characterized by a growing interest in nonlinear models. On the one hand, these models provide useful characterizations of nonlinearities that are present in the mean or the variance of economic and financial time series. On the other hand, nonlinear models appear difficult to handle. Many practitioners still use linear (ARMA) models, even when there is evidence of nonlinearities. Actually, this practice is not meaningless because many important classes of nonlinear processes admit ARMA representations. The representations are called weak because the innovations are dependent, though uncorrelated, and therefore constitute a weak white noise. However, some caution should be exercised in fitting ARMA models to nonlinear time series. The time series packages currently available for linear ARMA building (e.g. GAUSS, RATS, SAS, SPSS) rely on strong assumptions on the noise process, such as independence or the martingale difference property, that are typically not satisfied for linear representations of nonlinear processes. In recent years, a large part of the time-series and econometric literature has been devoted to weakening the noise dependence assumptions; see for instance the papers by Francq and Zakoïan (1998) or Romano and Thombs (1996). In this paper we focus on the validation step of the standard time series procedures. This validation stage is not only based on portmanteau tests, but also on the examination of the autocorrelation function of the residuals.

It is a common practice to draw the sample autocorrelations of the observed time series and their 5%-significance limits, computed on the basis of asymptotic results obtained for an iid process. Romano and Thombs (1996) show that these significance limits can be quite misleading if the underlying process is uncorrelated rather than independent. They also show that the moving block bootstrap does a very good job in estimating the asymptotic variance of the lag 1 sample autocorrelation.

The above-mentioned significance limits are also used extensively as a diagnostic check on the residuals of a fitted ARMA model. In this article we show that these limits, as well as the BP and LB test procedures, are not (asymptotically) valid when the noise sequence is only uncorrelated, and we propose valid procedures. To this aim we study the behaviour of the residual autocorrelations, and of the portmanteau tests, in the framework of ARMA models with non-independent error terms. By establishing the asymptotic distribution of a vector of m residual autocorrelations, we are able to provide the exact asymptotic distribution of the portmanteau statistics Q_m and Q̃_m without requiring that the noise be independent or a martingale difference. The aim of this work is therefore to considerably widen the application area of these adequacy tests.

Several papers in the recent time-series literature consider goodness-of-fit tests. We briefly review the most significant contributions. Chen and Deo (2003) propose a generalized portmanteau test based on the discrete spectral average estimator, in the framework of linear processes (allowing for long memory). The asymptotic distribution is derived under an iid assumption on the noise process. Extending the work of


Durlauf (1991), Deo (2000) considers testing the martingale difference hypothesis in the presence of conditional heteroskedasticity. Hong and Lee (2003) consider nonlinear time series models of a general form (including ARMA models) and propose a test based on a spectral density estimator. They derive the asymptotic distribution of their test under the assumption that the noise is a martingale difference. Lobato, Nankervis and Savin (2001, 2002) address the problem of testing the null hypothesis that a time series is uncorrelated up to some fixed order K and propose an extension of the Box-Pierce statistic. For the same problem, Lobato (2001) proposes an alternative test statistic that is asymptotically distribution-free under the null. None of these tests can be applied to the residuals of weak ARMA models because the underlying noise process is (i) neither iid nor a martingale difference, and (ii) not observable.

As noted by Romano and Thombs (1996), although the distinction between uncorrelated and iid noise may appear superficial, the implications in terms of statistical inference are broad. Many examples of ARMA processes in which the noise is not a martingale difference are given in Section 2. The rest of the paper is organized as follows. Section 3 introduces notations and provides the asymptotic distribution of a vector of sample autocorrelations of ARMA residuals. Then we obtain the limiting distribution of the portmanteau statistics. The noise is not required to be a martingale difference. Important particular cases are considered in Section 4, showing huge discrepancies between the true asymptotic distribution and the commonly used χ² approximation. These examples justify the need for adequate modifications of the standard tests. To this aim, the asymptotic covariance matrix of the residual autocorrelation vector is estimated in Section 5. The method consists in estimating the spectral density of a multivariate process by means of auxiliary autoregressive models. The performance of the corrected portmanteau tests is evaluated in Section 6 through Monte Carlo experiments. We concentrate on the LB test since it is more widely used than the BP test. Section 7 is devoted to an application to the Standard & Poor's 500 returns. Section 8 concludes. Technical lemmas and proofs are collected in an appendix.

2 Examples of processes with weak ARMA representations

A second-order stationary process (X_t)_{t∈Z} is said to satisfy an ARMA(p, q) representation if, for all t ∈ Z,

\[
  X_t = \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon_t - \sum_{i=1}^{q} b_i \varepsilon_{t-i},
  \tag{2.1}
\]

where the ε_t are error terms with zero mean and common variance σ² > 0, and where the polynomials φ(z) = 1 − a₁z − ··· − a_p z^p and ψ(z) = 1 − b₁z − ··· − b_q z^q have all their zeros outside the unit disk and have no zero in common. It is said that (2.1) is a weak ARMA model when the ε_t are only supposed to be uncorrelated. We will say that the representation (2.1) is semi-strong when (ε_t) is a martingale difference, and that (2.1) is a strong ARMA representation when (ε_t) is an iid sequence.

These noise assumptions are of crucial importance for the interpretation of Model (2.1) in terms of predictions. The linear predictor of X_t given X_{t−1}, X_{t−2}, ..., is

\[
  P(X_t \mid X_{t-1}, \dots) := \sum_{i=1}^{p} a_i X_{t-i} - \sum_{i=1}^{q} b_i \frac{\phi(B)}{\psi(B)}\, X_{t-i},
\]

where B denotes the backshift operator. Therefore, ε_t = X_t − P(X_t | X_{t−1}, ...) is the linear innovation of X_t. By elementary properties of projection mappings, the linear innovation process of a stationary process is always a weak white noise (i.e. a stationary sequence of centered and uncorrelated random variables), but it is not always a martingale difference. Illustrations will be provided below. In fact, in (2.1), (ε_t) is a martingale difference if and only if the best predictor of X_t is linear, that is, E(X_t | X_{t−1}, ...) = P(X_t | X_{t−1}, ...), where E(X_t | X_{t−1}, ...) denotes the conditional expectation of X_t given the σ-field generated by {X_u, u < t}. Since the assumption that the best predictor of X_t is a linear function of its past values is questionable, weak ARMA models seem more realistic (at least as approximations) than semi-strong ones.


Note that (2.1) is just a model, not a data generating process (DGP). Many DGPs can be compatible with a weak ARMA representation. DGPs admitting weak ARMA representations can be obtained as i) certain transformations of strong ARMA processes, ii) causal representations of non-causal ARMA processes, iii) linear representations of nonlinear processes, or iv) approximations of the so-called Wold decomposition. In the following examples, (η_t) is a sequence of iid random variables with E(η_t) = 0 and Eη_t² = 1.

2.1 Transformations of strong ARMA processes

Consider a process (X_t) satisfying an ARMA model, and an aggregated process (Y_t) of the form Y_t = c₁X_{mt} + c₂X_{mt+1} + ··· + c_mX_{mt+m−1}. There exist concrete situations in which only (Y_t) is observed. This is the case, for instance, when only low-frequency data (X_{mt})_{1≤t≤n} from a high-frequency strong ARMA process are available. The aggregation properties of ARMA models are well known (see e.g. Amemiya and Wu, 1972, Harvey and Pierse, 1984, Palm and Nijman, 1984). The aggregated process (Y_t) also satisfies an ARMA model. It is perhaps less well known that, in general, this ARMA representation is neither strong nor semi-strong, even when (X_t) satisfies a strong ARMA representation.

As a simple example, consider the strong ARMA(1,1) process

\[
  X_t - aX_{t-1} = \eta_t - b\eta_{t-1}, \qquad a \neq b, \quad a, b \in (-1, 1).
\]

Consider the process (Y_t) = (X_{2t}). We have Y_t − a²Y_{t−1} = u_t := η_{2t} + (a−b)η_{2t−1} − ab η_{2t−2}. Since Eu_t² = 1 + (a−b)² + a²b², Eu_tu_{t−1} = −ab, and Eu_tu_{t−h} = 0 for all h > 1, it is seen that (u_t) is an MA(1) of the form u_t = ε_t − θε_{t−1}, where (ε_t) is a weak white noise with variance Eε_t² = Eu_t²/(1+θ²) and θ ∈ (−1, 1) is such that θ/(1+θ²) = −Eu_tu_{t−1}/Eu_t². We have ε_t = −ab η_{2t−2} + θu_{t−1} + θ²η_{2t−4} + R_t, where

\[
  R_t = \eta_{2t} + (a-b)\eta_{2t-1} + \theta^2\{(a-b)\eta_{2t-5} - ab\,\eta_{2t-6}\} + \sum_{i\ge 3} \theta^i u_{t-i}
\]

is centered and independent of u_{t−1}, which is a function of (η_{2t−2}, η_{2t−3}, η_{2t−4}). Therefore, provided μ₃ := Eη_t³ exists,

\[
  E\varepsilon_t u_{t-1}^2 = -ab\,E\eta_{2t-2}u_{t-1}^2 + \theta\,Eu_{t-1}^3 + \theta^2 E\eta_{2t-4}u_{t-1}^2
  = \mu_3\bigl[-ab + \theta\{1 + (a-b)^3 - a^3b^3\} + a^2b^2\theta^2\bigr],
\]

which is generally not equal to 0 when μ₃ ≠ 0. In this case, since u_{t−1} belongs to the σ-field generated by {ε_u, u < t}, we have Eε_tu²_{t−1} = E{u²_{t−1}E(ε_t | ε_{t−1}, ...)} ≠ 0. This shows that (ε_t) is not a martingale difference. Hence, (Y_t) satisfies a weak ARMA(1,1) representation, which is generally not semi-strong.

Other weak ARMA representations are obtained by considering components of strong multivariate ARMA processes. More generally, processes of the form Y_t = c′X_t, where c ∈ R^d and (X_t) is a d-variate ARMA process, admit ARMA representations (see Lütkepohl, 1991, Chapter 6; Nsiri and Roy, 1993), and these representations are generally weak.
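The second-order structure of u_t claimed above can be checked by simulation. A small Monte-Carlo sketch (the parameter values a = 0.5, b = −0.4, the sample size, and the seed are my own illustrative choices):

```python
import numpy as np

# Monte-Carlo check of the second-order structure of
# u_t = eta_{2t} + (a - b) eta_{2t-1} - a b eta_{2t-2}:
# E u_t u_{t-1} = -ab and E u_t u_{t-h} = 0 for h > 1.
rng = np.random.default_rng(1)
a, b, n = 0.5, -0.4, 200_000
eta = rng.standard_normal(2 * n + 2)
t = np.arange(1, n + 1)
u = eta[2 * t] + (a - b) * eta[2 * t - 1] - a * b * eta[2 * t - 2]
cov1 = float(np.mean(u[1:] * u[:-1]))   # should be near -ab = 0.2
cov2 = float(np.mean(u[2:] * u[:-2]))   # should be near 0
```

The lag-1 estimate settles near −ab while higher lags vanish, consistent with the MA(1) structure of (u_t).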

2.2 Causal representations of non causal ARMA processes

Let X_t = η_t − φη_{t−1}, where |φ| > 1. The process (X_t) is a non-causal MA(1). Now let ε_t = Σ_{i≥0} φ^{−i} X_{t−i}. The process (ε_t) is centered and uncorrelated, and we have X_t = ε_t − φ^{−1}ε_{t−1}, which is the causal MA(1) representation of (X_t). Obviously E(ε_tX_{t−1}) = 0 because ε_t is the linear innovation of X_t. However, straightforward computations show that E(ε_tX²_{t−1}) = Eη_t³ (1−φ²)(1+φ^{−1}) and E(ε_tX³_{t−1}) = (Eη_t⁴ − 3)(1−φ²)²/φ. Thus the process (ε_t) is not a martingale difference in general (for instance if Eη_t³ ≠ 0 or Eη_t⁴ ≠ 3).
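A quick simulation illustrates the computation above: with a skewed iid noise (a standardised exponential, so Eη_t³ = 2 ≠ 0) and φ = 2, the truncated filter ε_t ≈ Σ_{i<50} φ^{−i}X_{t−i} gives E(ε_tX_{t−1}) ≈ 0, while E(ε_tX²_{t−1}) is clearly nonzero: the formula gives 2·(1−4)·(1+1/2) = −9. The noise choice, parameter values and truncation order are my own:

```python
import numpy as np

# Simulation for the non-causal MA(1): X_t = eta_t - phi*eta_{t-1}, |phi| > 1,
# with a skewed iid noise (standardised exponential: E eta = 0, E eta^2 = 1,
# E eta^3 = 2), and eps_t = sum_{i>=0} phi^{-i} X_{t-i} truncated at i < 50.
rng = np.random.default_rng(2)
phi, n, trunc = 2.0, 400_000, 50
eta = rng.exponential(1.0, n + trunc + 1) - 1.0
x = eta[1:] - phi * eta[:-1]                  # x[k] = X_k, length n + trunc
w = phi ** -np.arange(trunc)                  # truncated filter phi^{-i}
eps = np.convolve(x, w, mode="valid")         # eps[j] = eps_t with t = j + trunc - 1
xlag = x[trunc - 2:-1]                        # aligned X_{t-1}
m1 = float(np.mean(eps * xlag))               # ~ E(eps_t X_{t-1}) = 0
m2 = float(np.mean(eps * xlag ** 2))          # ~ E eta^3 (1-phi^2)(1+1/phi) = -9
```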

2.3 Nonlinear processes

Nonlinear processes are often opposed to ARMA models, though these classes can be compatible. There exist numerous examples of nonlinear processes admitting ARMA representations: bilinear processes (see Pham, 1985), GARCH processes and their powers (Francq and Zakoïan, 2003), Hidden Markov models, Markov-switching ARMA and GARCH processes (Francq and Zakoïan, 2001), autoregressive conditional duration processes (Engle, 1998), logarithms of stochastic volatility processes (Ruiz, 1994). Other examples of ARMA models with innovations that are only uncorrelated are given by Romano and Thombs (1996). For a nonlinear process (X_t), the best predictor is generally not linear. In other words, E(X_t | X_{t−1}, ...) ≠


P(X_t | X_{t−1}, ...). If (X_t) admits an ARMA representation of the form (2.1), then the σ-fields generated by {X_u, u < t} and {ε_u, u < t} coincide. Therefore, E(ε_t | ε_{t−1}, ...) = E{X_t − P(X_t | X_{t−1}, ...) | ε_{t−1}, ...} = E(X_t | X_{t−1}, ...) − P(X_t | X_{t−1}, ...) ≠ 0, which shows that, when it exists, the ARMA representation of a nonlinear process is only weak.

For the sake of concreteness, let us consider the simple bilinear model X_t = φX_{t−1} + η_t + bX_{t−1}η_{t−2}. If φ² + b² < 1, it can be shown that this model has a unique nonanticipative (i.e. X_t is a function of the η_{t−i}, i ≥ 0) strictly stationary solution with a finite second-order moment. For this solution, with u_t = φ + bη_t, we have the expansion

\[
  X_t = \eta_t + \eta_{t-1}u_{t-2} + \sum_{k=2}^{\infty} u_{t-2}u_{t-3}\cdots u_{t-k}\,\eta_{t-k}\,u_{t-k-1},
\]

from which the second-order structure of (X_t) can be derived. Tedious calculations show that E(X_t) = bφ/(1−φ) and that, for h > 3, γ(h) = φγ(h−1), where γ(h) = Cov(X_t, X_{t−h}). It follows that X_t admits an ARMA(1,3) representation of the form

\[
  X_t - \phi X_{t-1} = b\phi + \varepsilon_t - \alpha_1\varepsilon_{t-1} - \alpha_2\varepsilon_{t-2} - \alpha_3\varepsilon_{t-3}.
\]

The values of the coefficients α_i can be obtained from the first four autocovariances of X_t. In particular, in the pure bilinear case when φ = 0, it can be shown that α₁ = α₂ = 0. The coefficient α₃ is obtained by solving α₃/(1+α₃²) = b²(1−b²)/(1−b⁴+b⁴Eη_t⁴), |α₃| < 1. It is clear that for this class of bilinear processes, the ARMA representations are only weak ones. This can be seen by comparing the optimal predictions derived from the DGP and those derived from the ARMA equations. For instance, in the case φ = 0, the optimal linear prediction of X_t given its past takes the form α₃X_{t−3} − α₃²X_{t−6} + α₃³X_{t−9} + ···, whereas the optimal prediction is bX_{t−1}X_{t−2} − b²X_{t−1}X_{t−3}X_{t−4} + b³X_{t−1}X_{t−3}X_{t−5}X_{t−6} + ···, provided this expansion exists.
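The pure bilinear case (φ = 0) is easy to simulate: sample autocorrelations should then vanish except at lag 3, consistent with the representation with α₁ = α₂ = 0 described above (in fact γ(3) = b² here). An illustrative sketch with b = 0.5, my own choice:

```python
import numpy as np

# Pure bilinear case: X_t = eta_t + b X_{t-1} eta_{t-2} with phi = 0 and
# b^2 < 1.  Sample autocorrelations should be negligible at lags 1 and 2
# and clearly positive at lag 3 (rho(3) = b^2 / gamma(0)).
rng = np.random.default_rng(3)
b, n = 0.5, 200_000
eta = rng.standard_normal(n + 2)
x = np.zeros(n + 2)
for t in range(2, n + 2):
    x[t] = eta[t] + b * x[t - 1] * eta[t - 2]
x = x[2:] - np.mean(x[2:])
rho = [float(x[:-h] @ x[h:] / (x @ x)) for h in (1, 2, 3)]
```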

2.4 Approximations of the Wold decomposition

Wold (1938) has shown that any purely nondeterministic, second-order stationary process (X_t) admits an infinite MA representation of the form

\[
  X_t = \varepsilon_t + \sum_{i=1}^{\infty} c_i \varepsilon_{t-i},
  \tag{2.2}
\]

where (ε_t) is the linear innovation process of X and Σ_i c_i² < ∞. Defining the MA(q) process X_t(q) = ε_t + Σ_{i=1}^{q} c_i ε_{t−i}, it is straightforward that

\[
  \|X_t(q) - X_t\|_2^2 = E\varepsilon_t^2 \sum_{i>q} c_i^2 \to 0, \quad\text{when } q \to \infty.
\]

Therefore any purely nondeterministic, second-order stationary process is the limit, in the mean-square sense, of weak finite-order ARMA processes.
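The mean-square convergence X_t(q) → X_t is immediate to illustrate numerically for a concrete square-summable sequence, say c_i = 0.8^i with Eε_t² = 1 (both illustrative choices of mine):

```python
import numpy as np

# Truncation error ||X_t(q) - X_t||_2^2 = E eps^2 * sum_{i>q} c_i^2 for the
# illustrative square-summable sequence c_i = 0.8**i, with E eps^2 = 1.
c = 0.8 ** np.arange(1, 2001)          # c_1, ..., c_2000 (tail beyond is negligible)
err = [float(np.sum(c[q:] ** 2)) for q in (0, 5, 10, 20)]
# err decreases monotonically towards 0 as the MA order q grows
```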

The examples discussed above show that weak ARMA models can arise in various situations. Making strong or semi-strong assumptions on the noise process precludes most of these DGPs, as well as many others. The linear model (2.2), which consists of the ARMA models and their limits, is arguably very general under noise uncorrelatedness alone, but can be restrictive if stronger assumptions are made.

3 Limiting distributions

In this section we derive the limiting distribution of the residual autocorrelations and that of the portmanteau statistics, in the framework of weak ARMA models. We first recall asymptotic results concerning the estimation of weak ARMA models.


3.1 ARMA estimation under non-iid innovations

Let (X_t)_{t∈Z} be an ARMA(p, q) process, i.e. a second-order stationary solution of Model (2.1). Without loss of generality, assume that a_p and b_q are not both equal to zero (by convention a₀ = b₀ = 1). When p and q are not both equal to 0, let θ₀ = (a₁, ..., a_p, b₁, ..., b_q)′, θ = (θ₁, ..., θ_p, θ_{p+1}, ..., θ_{p+q})′, and denote by Θ the parameter space

Θ = {θ ∈ R^{p+q} : φ_θ(z) = 1 − θ₁z − ··· − θ_p z^p and ψ_θ(z) = 1 − θ_{p+1}z − ··· − θ_{p+q}z^q have all their zeros outside the unit disk}.

For all θ ∈ Θ, let ε_t(θ) = ψ_θ^{−1}(B)φ_θ(B)X_t. Note that ε_t = ε_t(θ₀).

For simplicity, we will omit the notation θ in all quantities taken at the true value θ₀. Given a realization X₁, X₂, ..., X_n, the variable ε_t(θ) can be approximated, for 0 < t ≤ n, by e_t(θ) defined recursively by

\[
  e_t(\theta) = X_t - \sum_{i=1}^{p} \theta_i X_{t-i} + \sum_{i=1}^{q} \theta_{p+i}\, e_{t-i}(\theta),
\]

where the unknown starting values are set to zero: e₀(θ) = e_{−1}(θ) = ··· = e_{−q+1}(θ) = X₀ = X_{−1} = ··· = X_{−p+1} = 0.

Let Θ* be a compact subset of Θ such that θ₀ is in the interior of Θ*. The random variable θ̂ is called a least squares estimator if it satisfies, almost surely,

\[
  O_n(\hat\theta) = \min_{\theta\in\Theta^*} O_n(\theta),
  \quad\text{where}\quad
  O_n(\theta) = \frac{1}{n}\sum_{t=1}^{n} e_t^2(\theta).
\]
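The recursion defining e_t(θ) and the criterion O_n can be sketched directly. The following is an illustrative implementation of mine (not the authors' code), checked on a simulated strong ARMA(1,1):

```python
import numpy as np

def e_resid(x, theta, p, q):
    """Recursive residuals e_t(theta) of an ARMA(p, q) candidate, with the
    unknown starting values set to zero as in the text."""
    th_ar, th_ma = theta[:p], theta[p:]
    e = np.zeros(len(x))
    for t in range(len(x)):
        ar = sum(th_ar[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(th_ma[i] * e[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        e[t] = x[t] - ar + ma
    return e

def On(x, theta, p, q):
    """Least squares criterion O_n(theta) = (1/n) sum e_t(theta)^2."""
    return float(np.mean(e_resid(x, theta, p, q) ** 2))

# Simulated strong ARMA(1,1) with (a, b) = (0.5, 0.3): the criterion is
# close to E eps^2 = 1 at the true parameter and larger at a wrong one.
rng = np.random.default_rng(4)
n, a, b = 5000, 0.5, 0.3
eps = rng.standard_normal(n + 1)
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = a * x[t - 1] + eps[t] - b * eps[t - 1]
on_true = On(x[1:], [a, b], 1, 1)
on_wrong = On(x[1:], [0.9, -0.5], 1, 1)
```

In practice the minimization over Θ* would be carried out by a numerical optimizer; the comparison above only illustrates that the criterion discriminates between parameter values.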

We denote by (α_X(k))_{k∈N*} the sequence of strong mixing coefficients of any process (X_t)_{t∈Z}. We consider the following assumption.

Assumption 1. (X_t) is strictly stationary, satisfies the ARMA(p, q) model (2.1), E|X_t|^{4+2ν} < ∞ and Σ_{k=0}^{∞} {α_X(k)}^{ν/(2+ν)} < ∞ for some ν > 0.

Francq and Zakoïan (1998) showed that under this assumption, θ̂ is strongly consistent and asymptotically normal:

\[
  \hat\theta \to \theta_0 \ \text{a.s.}, \quad\text{and}\quad
  \sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} \mathcal{N}(0, J^{-1}IJ^{-1}), \ \text{as } n \to \infty,
  \tag{3.1}
\]

where I = I(θ₀) and J = J(θ₀) with

\[
  I(\theta) = \lim_{n\to\infty} \operatorname{var}\Bigl\{\sqrt{n}\,\frac{\partial}{\partial\theta}O_n(\theta)\Bigr\}
  \quad\text{and}\quad
  J(\theta) = \lim_{n\to\infty} \frac{\partial^2}{\partial\theta\,\partial\theta'}O_n(\theta) \ \text{a.s.}
\]

Notice that Assumption 1 requires neither independence of the noise nor that it be a martingale difference. The mixing condition is valid for large classes of processes (see Pham (1986), Carrasco and Chen (2002)) for which the (stronger) condition of β-mixing with exponential decay can be shown. However, this condition could be replaced by any weak dependence condition allowing the application of a central limit theorem. The moment condition is mild since EX_t⁴ < ∞ is required for the existence of I and J. Before turning to the residual autocovariances, it will be useful to consider the joint asymptotic behaviour of the estimator θ̂ and the sample autocovariances of the (non-observed) noise ε_t(θ₀).

3.2 Joint distribution of θ̂ and the noise empirical autocovariances

Let, for ℓ ≥ 0,

\[
  \gamma(\ell) = \frac{1}{n}\sum_{t=1}^{n-\ell} \varepsilon_t\varepsilon_{t+\ell}
  \quad\text{and}\quad
  \rho(\ell) = \frac{\gamma(\ell)}{\gamma(0)}
\]

denote the white noise "empirical" autocovariances and autocorrelations. Notice that these quantities are not statistics (unless p = q = 0) since they depend on the unknown parameter θ₀. They are introduced as a device to facilitate the forthcoming derivations.


For any fixed m ≥ 1, let γ_m = (γ(1), ..., γ(m))′ and ρ_m = (ρ(1), ..., ρ(m))′, and let

\[
  \Gamma(\ell,\ell') = \sum_{h=-\infty}^{\infty} E(\varepsilon_t\varepsilon_{t+\ell}\varepsilon_{t+h}\varepsilon_{t+h+\ell'}),
\]

for (ℓ, ℓ′) ≠ (0, 0). Remark that Γ(ℓ, ℓ′) = Γ(ℓ′, ℓ) = Γ(ℓ, −ℓ′). The existence of Γ(ℓ, ℓ′) is justified in the appendix (see Lemma A.1).

A few additional notations are required. We denote by φ*_i and ψ*_i the coefficients defined by

\[
  \phi^{-1}(z) = \sum_{i=0}^{\infty} \phi_i^* z^i, \qquad
  \psi^{-1}(z) = \sum_{i=0}^{\infty} \psi_i^* z^i, \qquad |z| \le 1,
\]

and take φ*_i = ψ*_i = 0 when i < 0. For m, m′ = 1, ..., ∞, let Γ_{m,m′} = (Γ(ℓ, ℓ′))_{1≤ℓ≤m, 1≤ℓ′≤m′}. Let λ_i = (−φ*_{i−1}, ..., −φ*_{i−p}, ψ*_{i−1}, ..., ψ*_{i−q})′ ∈ R^{p+q} and let the (p+q) × m matrix

\[
  \Lambda_m = (\lambda_1\ \lambda_2\ \cdots\ \lambda_m) =
  \begin{pmatrix}
    -1     & -\phi_1^* & \cdots & \cdots & \cdots & -\phi_{m-1}^* \\
    0      & -1        & \ddots &        &        & \vdots        \\
    \vdots &           & \ddots & \ddots &        & \vdots        \\
    0      & \cdots    & 0      & -1     & \cdots & -\phi_{m-p}^* \\
    1      & \psi_1^*  & \cdots & \cdots & \cdots & \psi_{m-1}^*  \\
    0      & 1         & \ddots &        &        & \vdots        \\
    \vdots &           & \ddots & \ddots &        & \vdots        \\
    0      & \cdots    & 0      & 1      & \cdots & \psi_{m-q}^*
  \end{pmatrix}.
  \tag{3.2}
\]

It will be convenient to denote the matrix Σ_{i=1}^{+∞} λ_iλ_i′ by Λ_∞Λ_∞′. This matrix is well defined because the components of the λ_i, defined through the series expansions of φ^{−1} and ψ^{−1}, decrease exponentially fast to zero as i goes to infinity. Similarly we define Λ_∞Γ_{∞,∞}Λ_∞′ = Σ_{ℓ,ℓ′=1}^{+∞} λ_ℓ Γ(ℓ, ℓ′) λ_{ℓ′}′, and Γ_{m,∞}Λ_∞′ as the m × (p+q) matrix whose ℓ-th row, ℓ = 1, ..., m, is Σ_{ℓ′=1}^{+∞} Γ(ℓ, ℓ′) λ_{ℓ′}′. We will show in the appendix (Lemma A.1) that |Γ(ℓ, ℓ′)| ≤ K max(ℓ, ℓ′) for some constant K, which is sufficient to ensure the existence of these matrices.

Theorem 3.1. Assume p > 0 or q > 0. Under Assumption 1, √n(θ̂ − θ₀, γ_m)′ →d N(0, Σ_{θ̂,γ_m}), where

\[
  \Sigma_{\hat\theta,\gamma_m} =
  \begin{pmatrix}
    \{\sigma^2\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_\infty\Gamma_{\infty,\infty}\Lambda_\infty'\{\sigma^2\Lambda_\infty\Lambda_\infty'\}^{-1}
      & -\{\sigma^2\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_\infty\Gamma_{\infty,m} \\
    -\Gamma_{m,\infty}\Lambda_\infty'\{\sigma^2\Lambda_\infty\Lambda_\infty'\}^{-1}
      & \Gamma_{m,m}
  \end{pmatrix}.
\]

Obviously, the top-left block of this covariance matrix is the matrix J^{−1}IJ^{−1} of (3.1). In this theorem it is explicitly given in terms of the AR and MA polynomials (through the matrix Λ_∞), and in terms of the noise variance and fourth-order structure (through the matrix Γ_{∞,∞}).

3.3 Limiting distribution of residual autocorrelations

We now turn to the residuals. Let ε̂_t = e_t(θ̂) when p > 0 or q > 0, and let ε̂_t = ε_t = X_t when p = q = 0. The residual autocovariances and autocorrelations are defined by

\[
  \hat\gamma(\ell) = \frac{1}{n}\sum_{t=1}^{n-\ell} \hat\varepsilon_t\hat\varepsilon_{t+\ell}
  \quad\text{and}\quad
  \hat\rho(\ell) = \frac{\hat\gamma(\ell)}{\hat\gamma(0)}.
  \tag{3.3}
\]

Let ρ̂_m = (ρ̂(1), ..., ρ̂(m))′. Let R(ℓ, ℓ′) = Γ(ℓ, ℓ′)/σ⁴ and R_{i,j} = (R(ℓ, ℓ′))_{1≤ℓ≤i, 1≤ℓ′≤j} for i, j = 1, ..., ∞.


Theorem 3.2. Under Assumption 1, √n ρ̂_m →d N(0, Σ_{ρ̂_m}), where

Σ_{ρ̂_m} = R_{m,m} when p = q = 0,

and, when p > 0 or q > 0,

\[
  \Sigma_{\hat\rho_m} = R_{m,m}
  + \Lambda_m'\{\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_\infty R_{\infty,\infty}\Lambda_\infty'\{\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_m
  - \Lambda_m'\{\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_\infty R_{\infty,m}
  - R_{m,\infty}\Lambda_\infty'\{\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_m.
\]

To validate an ARMA(p, q) model, the most basic technique is to examine the autocorrelation function of the residuals. Theorem 3.2 can be used to obtain asymptotic significance limits for the residual autocorrelations.

Example 3.1. Romano and Thombs (1996) derived the asymptotic distribution of a vector of empirical autocorrelations for processes satisfying Assumption 1. As an example they considered the weak white noise ε_t = η_tη_{t−1}, where (η_t) is iid N(0, 1). Applying the first part of Theorem 3.2 to this weak white noise, we find that Var_as{√n ρ̂(1)} = R_{1,1} = Γ_{1,1} = 3, which is the result obtained by Romano and Thombs. For a strong white noise, recall that Var_as{√n ρ̂(1)} = 1.
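Example 3.1 is easy to verify by Monte Carlo: simulating many independent series ε_t = η_tη_{t−1} and computing the empirical variance of √n ρ̂(1) should give a value near 3 rather than 1. Sample size, replication count, and seed are my own choices:

```python
import numpy as np

# Monte-Carlo check of Example 3.1: for eps_t = eta_t * eta_{t-1} with
# eta_t iid N(0, 1), Var_as{sqrt(n) rho_hat(1)} = 3 (it would be 1 for a
# strong white noise).
rng = np.random.default_rng(5)
n, reps = 1000, 3000
stats = np.empty(reps)
for r in range(reps):
    eta = rng.standard_normal(n + 1)
    e = eta[1:] * eta[:-1]
    e = e - e.mean()
    stats[r] = np.sqrt(n) * float(e[:-1] @ e[1:] / (e @ e))
v = float(stats.var())                  # should be close to 3
```

Using the iid-based ±1.96/√n bands for this series would therefore understate the true significance limits by a factor of √3.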

Example 3.2. We now consider the MA(1) model X_t = ε_t − bε_{t−1}, where (ε_t) is the weak white noise defined in Example 3.1. We have λ_i = ψ*_{i−1} = b^{i−1} for all i ≥ 1, R(i+1, i+1) = 1 for all i ≥ 1, R(1, 1) = 3, and R(i, j) = 0 when i ≠ j. Thus Λ_∞Λ_∞′ = 1/(1−b²), Λ_∞R_{∞,∞}Λ_∞′ = 3 + b²/(1−b²), and Λ_∞R_{∞,1} = 3. Applying the second part of Theorem 3.2 to the weak MA(1) model, we find that Var_as{√n ρ̂(1)} = b²(1+2b²). For a strong MA(1) model X_t = η_t − bη_{t−1}, with (η_t) iid (0, 1), we find that Var_as{√n ρ̂(1)} = b², which is the result obtained by Box and Pierce (1970).
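The value Var_as{√n ρ̂(1)} = b²(1+2b²) can be recovered numerically from the formula of Theorem 3.2 by truncating the infinite sums, here at order M = 200 with b = 0.5 (both my own choices):

```python
import numpy as np

# Numerical check of Example 3.2 (p = 0, q = 1, m = 1): truncate the
# infinite sums of Theorem 3.2 and compare with b^2 (1 + 2 b^2).
b, M = 0.5, 200
lam = b ** np.arange(M)                 # lambda_i = psi*_{i-1} = b^(i-1)
R = np.eye(M)                           # R(i, i) = 1 for i >= 2, R(i, j) = 0 otherwise
R[0, 0] = 3.0                           # R(1, 1) = 3 for this weak noise
A = float(lam @ lam)                    # Lambda_inf Lambda_inf' ~ 1/(1 - b^2)
sigma_rho1 = float(R[0, 0] + (lam @ R @ lam) / A ** 2 - 2 * (lam @ R[:, 0]) / A)
```

With b = 0.5 both the truncated formula and the closed form give 0.375.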

The previous example shows that the significance limits of the residual autocorrelations can be very different for ARMA models with iid noise and for ARMA models with a noise that is only uncorrelated (see also Figures 1 and 2 below). Of course, in practice Σ_{ρ̂_m} has to be estimated (see Section 5).

Remark 3.1. It is clear from Theorem 3.2 that, for a given ARMA model, the asymptotic distribution of the residual autocorrelations depends on the noise distribution through the quantities Γ(ℓ, ℓ′) only. It is also worth noting that this asymptotic distribution depends on the ARMA parameters through the products λ_iλ_j′ only. So the asymptotic distribution of √n ρ̂_m will be the same for the AR(p) model φ(B)X_t = ε_t and for the MA(p) model X_t = φ(B)ε_t. In the former case we have λ_i = (−φ*_{i−1}, ..., −φ*_{i−p})′ and in the latter case λ_i = (φ*_{i−1}, ..., φ*_{i−p})′.

3.4 Limiting distribution of the portmanteau statistics

Portmanteau statistics, as defined in (1.1) and (3.3), are generally used to test the null hypothesis

H0: (X_t) satisfies an ARMA(p, q) representation⁵

against the alternative

H1: (X_t) does not admit an ARMA representation, or admits an ARMA(p′, q′) representation with p′ > p or q′ > q.

⁵A more formal notation and formulation of the null hypothesis is as follows:

H0(p, q): there exist polynomials φ and ψ of orders p and q, with roots outside the unit disk and no common root, such that (ψ^{−1}(L)φ(L)X_t)_t is a centered and uncorrelated stationary sequence,

where L denotes the standard lag operator.


Under H0, the residuals should resemble a realization of a weak white noise. In other words, if the fitted ARMA(p, q) model is appropriate, the residual autocorrelations should be close to zero, and the portmanteau statistics should not take too large values. Consequently, the adequacy of the ARMA model with (canonical) orders (p, q) is usually rejected when the portmanteau statistics take too large values. Of course, portmanteau statistics, like any statistic based on a finite number m of autocorrelations, are unable to detect all departures from H0. Practitioners generally use several values of m (in the ARIMA procedure of SAS, by default the LB portmanteau tests are provided for m = 6, 12, 18 and 24).

It is not possible to derive the asymptotic distribution of Q_m and Q̃_m under H0 only. The standard limit distribution of those statistics, and hence the standard tests, is obtained under the stronger assumption that (ε_t) is iid. The main result of this paper is a direct consequence of Theorem 3.2. It gives the exact asymptotic distribution of the standard portmanteau statistics under weak dependence assumptions.

Theorem 3.3. Let Assumption 1, in particular H0, hold. Then the statistics Q_m and Q̃_m defined in (1.1) converge in distribution, as n → ∞, to

\[
  Z_m(\xi_m) := \sum_{i=1}^{m} \xi_{i,m} Z_i^2,
\]

where ξ_m = (ξ_{1,m}, ..., ξ_{m,m})′ is the vector of eigenvalues of the matrix Σ_{ρ̂_m} defined in Theorem 3.2, and Z₁, ..., Z_m are independent N(0, 1) variables.

Consider the strong ARMA case, i.e. the case when (ε_t) in (2.1) is iid. In this case, for ℓ, ℓ′ > 0, Γ(ℓ, ℓ′) = 0 when ℓ ≠ ℓ′ and Γ(ℓ, ℓ) = σ⁴. Thus R_{m,m} = I_m, where I_m denotes the m × m identity matrix, and Λ_∞R_{∞,m} = Λ_m. We deduce that

\[
  \Sigma_{\hat\rho_m} = I_m - \Lambda_m'\{\Lambda_\infty\Lambda_\infty'\}^{-1}\Lambda_m,
  \tag{3.4}
\]

which is the result obtained by McLeod (1978). When m is large, Σ_{ρ̂_m} ≈ I_m − Λ_m′{Λ_mΛ_m′}^{−1}Λ_m is close to a projection matrix with m − (p+q) eigenvalues equal to 1 and (p+q) eigenvalues equal to 0, and we retrieve the result given in Box and Pierce (1970). Therefore, in the strong ARMA case, the asymptotic distribution of the BP and LB statistics can be approximated by a χ²_{m−(p+q)} distribution. In the next section, we will see that this approximation is no longer valid when the innovations (ε_t) are not iid.
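The eigenvalue structure invoked here is easy to exhibit numerically in the strong AR(1) case, where (3.4) reduces to I_m − (1−a²)Λ_m′Λ_m with Λ_m = −(1, a, ..., a^{m−1}): m−1 eigenvalues equal 1 and the remaining one equals a^{2m}, essentially 0 for moderate m. A sketch with illustrative values a = 0.5 and m = 20:

```python
import numpy as np

# Spectrum of (3.4) for a strong AR(1): Sigma = I_m - (1 - a^2) Lam' Lam
# with Lam = -(1, a, ..., a^(m-1)) has m - 1 unit eigenvalues and one
# eigenvalue a^(2m), essentially zero for moderate m.
a, m = 0.5, 20
lam = -(a ** np.arange(m))
Sigma = np.eye(m) - (1 - a ** 2) * np.outer(lam, lam)
eig = np.sort(np.linalg.eigvalsh(Sigma))
```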

Remark 3.2. The distribution of the quadratic form Z_m(ξ_m) can be computed using the algorithm of Imhof (1961). An alternative approach would consist in modifying the standard statistics. This approach was followed by Lobato (2001) for testing that a dependent process is uncorrelated. It seems natural to consider the statistic n ρ̂_m′ Σ̂_{ρ̂_m}^{−1} ρ̂_m for some consistent estimator Σ̂_{ρ̂_m} of Σ_{ρ̂_m}. Indeed, it is clear that Theorem 3.2 implies

\[
  n\,\hat\rho_m'\,\hat\Sigma_{\hat\rho_m}^{-1}\,\hat\rho_m \xrightarrow{d} \chi^2_m, \qquad n \to \infty,
\]

provided Σ_{ρ̂_m} is nonsingular. Unfortunately, when p + q ≠ 0, the matrix Σ_{ρ̂_m} may be singular: for example when an AR(1) model is fitted to a strong white noise (see Corollary 4.1 below with a = 0). In other cases Σ_{ρ̂_m} may be close to singular: for strong ARMA(p, q) models we have already seen that, when m is large, p + q eigenvalues of Σ_{ρ̂_m} are very close to 0. For these reasons we do not follow this approach.

Remark 3.3 It can be shown that the previous results remain valid when the moment and mixing assumptions are made on the noise rather than on the observed process. Namely, Assumption 1 can be replaced by

Assumption 1′ (Xt)t∈Z is strictly stationary, satisfies the ARMA(p, q) model (2.1), E|εt|^{4+2ν} < ∞ and Σ_{k=0}^∞ {αε(k)}^{ν/(2+ν)} < ∞ for some ν > 0.

Note however that Assumptions 1 and 1′ are not equivalent.


4 Examples

In the first example, the results of the previous section are particularized to the AR(1) and MA(1) cases. The second example concerns an ARMA model with martingale difference innovations. The third example is an ARMA model with innovations that are only uncorrelated.

4.1 AR(1) or MA(1)

In the AR(1) and MA(1) cases, Theorem 3.2 takes a simpler form. For instance, in the AR(1) case we have, with a = a1,

λi = −a^{i−1},  Λi = −(1, a, ..., a^{i−1}),  Λ∞Λ′∞ = 1/(1 − a²).   (4.1)

The next result follows.

Corollary 4.1 Let p = 1, q = 0. Under Assumption 1, √n ρ̂m →d N(0, Σρm), where, for i, j = 1, ..., m,

Σρm(i, j) = lim_{n→∞} n Cov{ρ̂(i), ρ̂(j)}
          = R(i, j) + (1 − a²)² a^{i+j−2} Σ_{ℓ,ℓ′=1}^∞ a^{ℓ+ℓ′−2} R(ℓ, ℓ′)
            − (1 − a²) Σ_{ℓ=1}^∞ {a^{ℓ+i−2} R(ℓ, j) + a^{ℓ+j−2} R(ℓ, i)}.

When p = 0, q = 1, the same result holds with a replaced by b = b1.
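The corollary lends itself to a direct numerical check (an illustrative numpy sketch of our own; the function name and the truncation level are arbitrary): with iid errors, R(ℓ, ℓ′) = 1_{ℓ=ℓ′}, the formula must collapse to the strong-case matrix Im − (1 − a²)vv′ of (3.4)/(4.4).

```python
import numpy as np

# Corollary 4.1 with the infinite sums truncated at L = R.shape[0] terms
# (harmless: the weights decay geometrically in a).
def sigma_rho_ar1(a, m, R):
    L = R.shape[0]
    w = a ** np.arange(L)                      # w[l-1] = a^(l-1)
    quad = w @ R @ w                           # sum_{l,l'} a^(l+l'-2) R(l,l')
    colsum = R.T @ w                           # colsum[j-1] = sum_l a^(l-1) R(l,j)
    ai = a ** np.arange(m)
    return (R[:m, :m]
            + (1 - a**2) ** 2 * np.outer(ai, ai) * quad
            - (1 - a**2) * (np.outer(ai, colsum[:m]) + np.outer(colsum[:m], ai)))

# iid noise: R is the identity, so the result must equal the strong-case matrix.
a, m, L = 0.5, 4, 300
S = sigma_rho_ar1(a, m, np.eye(L))
v = a ** np.arange(m)
S_strong = np.eye(m) - (1 - a**2) * np.outer(v, v)
```

Algebraically, with R = I the quadratic sum equals 1/(1 − a²) and both cross sums equal a^{i+j−2}, so the three correction terms combine to −(1 − a²)a^{i+j−2}, as in the strong case.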

4.2 ARMA models with GARCH errors

Assume that in the ARMA(p, q) equation (2.1), the innovation process (εt) is a GARCH(P, Q) process, solution of the model

εt = σt ηt,   σ²t = ω + Σ_{i=1}^Q αi ε²_{t−i} + Σ_{j=1}^P βj σ²_{t−j},   (4.2)

where (ηt) is a sequence of iid centered variables with unit variance, the αi and βj are nonnegative constants and ω is a positive constant. Under the assumption that Σ_{i=1}^Q αi + Σ_{j=1}^P βj < 1, there exists a unique strictly stationary and nonanticipative solution process (εt), and this process has a finite variance. Under more restrictive conditions on the coefficients and if E(η⁴t) < ∞, then E(ε⁴t) < ∞ (see e.g. Ling and McAleer 2002). If we further assume that the variables ηt have a symmetric distribution, then it can be shown that, for ℓ, ℓ′ > 0,

E(εt εt+ℓ εt+h εt+h+ℓ′) = E(ε²t ε²t+ℓ) if h = 0 and ℓ = ℓ′, and 0 otherwise.

Hence Γ(ℓ, ℓ′) = 0 when ℓ ≠ ℓ′ and Γ(ℓ, ℓ) = E(ε²t ε²t+ℓ).

To compute the autocovariance structure of (ε²t), we can use the fact that (ε²t) is solution of the ARMA(P ∨ Q, P) model

ε²t = ω + Σ_{i=1}^{P∨Q} (αi + βi) ε²_{t−i} + νt − Σ_{j=1}^P βj νt−j,   (4.3)

where P ∨ Q = max(P, Q) and νt = ε²t − σ²t is the innovation of ε²t.


For example, consider the case of an ARCH(1), i.e. P = 0, Q = 1. Let κ = E(η⁴t) and assume κ > 1 and 0 < κα²1 < 1. We have

E(ε²t) = σ² = ω/(1 − α1),

Var(ε²t) = γε²(0) = ω²{(α1 + 1)κ/((1 − κα²1)(1 − α1)) − 1/(1 − α1)²} = σ⁴(κ − 1)/(1 − κα²1),

Cov(ε²t, ε²t+ℓ) = γε²(ℓ) = α1^ℓ γε²(0),  ℓ ≥ 0,

Γ(ℓ, ℓ) = σ⁴{1 + (κ − 1)α1^ℓ/(1 − κα²1)}.

It follows that the matrix Rm,m of Theorem 3.2 is given by

Rm,m = Im + ((κ − 1)/(1 − κα²1)) diag(α1, α²1, ..., α1^m) := Im + R̃m,m.

For simplicity, let us consider the case of an AR(1) process (Xt), endowed with a sequence of ARCH(1) errors (εt). Denote by Σ⁰ρm the asymptotic variance obtained in (3.4) for the strong AR(1), i.e. when α1 = 0. We have

Σ⁰ρm = Im − (1 − a²) (1, a, ..., a^{m−1})′ (1, a, ..., a^{m−1}).   (4.4)

In view of (4.1) and the above expression for R̃m,m,

Λ∞R̃∞,∞Λ′∞ = α1(κ − 1)/((1 − a²α1)(1 − κα²1)),   Λ′mΛ∞R̃∞,m = ( α1^j a^{i+j−2} (κ − 1)/(1 − κα²1) )_{i,j},

from which we deduce

Σρm = Σ⁰ρm + ((κ − 1)/(1 − κα²1)) ( α1^i 1_{i=j} + (1 − a²) a^{i+j−2} { α1(1 − a²)/(1 − α1a²) − α1^i − α1^j } )_{i,j}.   (4.5)

The square roots of the diagonal terms of Σρm correspond to the asymptotic standard deviations of the residual autocorrelations (multiplied by √n). Figure 1 displays these asymptotic standard deviations for AR(1) models with ARCH(1) innovations. These quantities can be very different from those of strong AR models (which are close to 1 when the lag is large enough).

For instance, when m = 2, κ = 3 and a = 0.5 we have:

α1 = 0:    Σρ2 = (  0.25    −0.375
                   −0.375    0.8125 ),   eigenvalues (ξ1,2, ξ2,2) = (0.0625, 1),     Z2(ξ2) = 0.0625χ²₁ + χ²₁

α1 = 0.5:  Σρ2 = (  0.8214  −1.3393
                   −1.3393   2.7054 ),   eigenvalues (ξ1,2, ξ2,2) = (0.1260, 3.4008),  Z2(ξ2) = 0.13χ²₁ + 3.40χ²₁

(the two χ²₁ variables in each sum being independent).

It is clear that for α1 = 0.5, the BP approximation by a χ²₁ distribution will be disastrous. More precisely, with the LB portmanteau test applied to the residuals, the adequacy of the AR(1) model is rejected at level α = 5% if

Q̃m > χ²_{m−1}(0.95).   (4.6)

Table 1 gives the true asymptotic level of the test defined by the rejection region (4.6), for several values of m and a.⁶ As expected, the true asymptotic size of the test is far from the 5% nominal level when the ARCH coefficient α1 ≠ 0. We also note that the difference between the true level and the nominal level


Table 1: Exact asymptotic level (in %), corresponding to the LB 5% nominal level test (4.6), for the AR(1) model Xt = aXt−1 + εt, with ARCH(1) innovations εt = √(ω + α1ε²_{t−1}) ηt, (ηt) iid N(0,1). The nominal level is obtained from the χ²_{m−1} approximation, where m is the number of autocorrelations used in the statistics.

  m    (a, α1):  (0.0, 0.0)  (0.0, 0.2)  (0.5, 0.2)  (0.9, 0.2)  (0.9, 0.4)
  2              5.00        6.06        7.17        15.12       29.17
  3              5.00        5.84        6.62        10.97       23.83
  4              5.00        5.68        6.29         9.35       20.97
 12              5.00        5.32        5.60         6.53       13.04
 24              5.00        5.20        5.39         5.91       10.14
 36              5.00        5.16        5.30         5.71        8.96

increases when a → 1, i.e. when the DGP is close to a unit root model. The case a = 0 (a pure ARCH series) was studied by Diebold (1986).

4.3 An AR(1) model with weak innovations

Markov switching models have been found to be appropriate for series displaying sudden breaks, modelled through the change of state of a finite state space Markov chain (∆t). Following Hamilton (1989) and Hamilton and Susmel (1994), several authors have considered ARMA models in which the noise is subject to Markov switching. For instance, consider the AR(1) model with Markov-switching innovations

Xt = aXt−1 + εt,   εt = ηt + cηt−1 if ∆t = 0,  εt = ηt − cηt−1 if ∆t = 1,   (4.7)

where |a| < 1, c ∈ ℝ, (ηt) is iid N(0,1), and (∆t) is a stationary Markov chain independent of (ηt), with state space {0, 1} and transition probabilities 0 < p := P(∆t = 1 | ∆t−1 = 0) = P(∆t = 0 | ∆t−1 = 1) < 1. It is not difficult to see that (εt) is a white noise of variance σ² = 1 + c², but its fourth-order cross-moment structure is more complicated than in the previous example. From Broze, Francq and Zakoïan (2002), we have

Γ(1, 1) = 1 + 4c² + c⁴ + 4c²(1 − 2p) + c²(1 − 2p)²/p,
Γ(ℓ, ℓ) = (1 + c²)² + 2c²(1 − 2p)^ℓ,   ∀ℓ ≥ 2,
Γ(ℓ, ℓ+1) = 0,   ∀ℓ ≥ 1,
Γ(ℓ, ℓ+2) = c²(1 − 2p)^{ℓ+1},   ∀ℓ ≥ 1,
Γ(ℓ, ℓ′) = 0,   ∀ℓ ≥ 1, ℓ′ > ℓ + 2.
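The white noise property of (4.7) can be checked by a small simulation; the sketch below is our own illustration, with arbitrary parameter choices (c = 1, p = 0.5) and a fixed seed. The sample variance should be close to 1 + c² and the lag-one autocorrelation close to zero, even though the process is not a martingale difference.

```python
import numpy as np

# Simulate the Markov-switching noise of (4.7): eps_t = eta_t +/- c*eta_{t-1},
# the sign being driven by a two-state chain that switches with probability p.
rng = np.random.default_rng(1)
n, c, p = 200_000, 1.0, 0.5
eta = rng.standard_normal(n + 1)
flips = (rng.random(n) < p).astype(int)
delta = np.empty(n, dtype=int)
delta[0] = 0
for t in range(1, n):
    delta[t] = delta[t - 1] ^ flips[t]        # XOR: switch state w.p. p
sign = np.where(delta == 0, 1.0, -1.0)
eps = eta[1:] + c * sign * eta[:-1]
var_hat = eps.var()                           # should be near 1 + c^2 = 2
rho1_hat = np.corrcoef(eps[1:], eps[:-1])[0, 1]
```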

The numerical computation of Σρm follows. Table 2 displays the exact asymptotic level of the LB test defined by (4.6), for c = 1 and several values of a, p and m. This table shows that, for this weak AR(1) model, the standard χ²_{m−1} approximation is not a valid approximation of the asymptotic distribution of Q̃m, even when m is large and/or a is close to zero. The most dramatic differences between the true asymptotic level and the nominal 5% level are observed when p is close to 0. In this case, the noise is likely to stay in the same regime, the first MA(1) regime εt = ηt + ηt−1 or the second MA(1) regime εt = ηt − ηt−1, for a very long time before switching to the other regime. Consequently, the AR(1) model (4.7) is likely to be confused with an ARMA(1,1)

⁶ The level was obtained from Theorem 3.3, by computing the eigenvalues of the matrix given by (4.5) and using the algorithm of Imhof (1961).


model, which could explain the poor performance of any adequacy test in this situation. Concerning the LB portmanteau test, the high probability of incorrectly rejecting the AR(1) model is explained by the fact that, when p is small, most of the eigenvalues of Σρm are much larger than 1. Surprisingly, the exact asymptotic level does not decrease with m, and is not even monotone. Note also that for certain values of the parameters, the true asymptotic level can be less than the nominal 5% level. The Monte Carlo experiments of Section 6.1 below are in agreement with the values in this table.

Table 2: Exact asymptotic level (in %), corresponding to the LB 5% nominal level test (4.6), for the AR(1) model (4.7) with c = 1. The nominal level is obtained from the χ²_{m−1} approximation, where m is the number of autocorrelations used in the statistics.

  m    (a, p):  (0.0, 0.95)  (0.6, 0.95)  (0.6, 0.05)  (0.9, 0.5)  (0.99, 0.95)
  2             9.82         8.45         25.29        14.61       16.92
  3             5.84         4.50         24.83        10.50        9.74
  4             7.85         6.78         24.60         8.96       11.23
 12             6.47         5.91         23.43         6.36        7.72
 24             5.84         5.47         20.24         5.80        6.41
 36             5.60         5.31         17.51         5.62        5.89

In ARMA modelling, the validation stage does not only consist in using portmanteau tests. Another basic technique is to examine the residual autocorrelations. Figure 2 is the same as Figure 1, but for weak AR(1) models of the form (4.7). The right-hand graph shows an oscillatory behaviour of the significance bands which is impossible to obtain with a strong AR(1). It is also seen that the autocorrelations at odd lags h have a smaller asymptotic variance than the same autocorrelations for the strong AR(1) model.

5 Modifying the standard tests

To be able to use the statistics Qm and Q̃m under more general assumptions on the noise process than independence or the martingale difference property, it is clear from the previous sections that the standard portmanteau tests require correction.

5.1 Modified portmanteau tests

First assume that a given consistent estimator Σ̂ρm of Σρm is available. Denote by ξ̂m = (ξ̂m,1, ..., ξ̂m,m)′ the vector of eigenvalues of Σ̂ρm. The next result, which is a straightforward consequence of Theorem 3.3, provides a critical region of asymptotic level α ∈ ]0, 1[.

Theorem 5.1 Let Assumption 1, in particular H0, hold. Then

lim_{n→∞} P{Qm > ẑm(1 − α)} = lim_{n→∞} P{Q̃m > ẑm(1 − α)} = α,   (5.1)

where ẑm(1 − α) is such that P{Zm(ξ̂m) > ẑm(1 − α)} = α.

This test is not consistent against every form of the alternative H1, just as standard portmanteau tests are not consistent against all violations of the strong ARMA(p, q) assumption. We illustrate the influence of m on the consistency of the portmanteau tests by means of a simple example.

Example 5.1 Let us consider the MA(1) DGP Xt = εt − bεt−1, where b ≠ 0 and (εt) is an ergodic weak white noise with variance σ². Assume that the fitted model is an AR(1). The least squares estimator of the AR coefficient is the first-order empirical autocorrelation of X, namely â = ρ̂X(1). By the ergodic theorem, â converges a.s. to the first-order autocorrelation ρX(1) of X, and it is easily seen that et(â) = Xt − âXt−1 converges a.s. to

Zt = Xt − ρX(1)Xt−1 = Xt + (b/(1 + b²))Xt−1.

Consequently, for k > 0, a.s.,

ρ̂(k) → Corr(Zt, Zt−k) = {1 + ρ²X(1)}ρX(k) − ρX(1){ρX(k − 1) + ρX(k + 1)}
     = 0 when k > 2,  −ρ²X(1) when k = 2,  ρ³X(1) when k = 1.

Therefore, for any m ≥ 1, lim_{n→∞} Q̃m = lim_{n→∞} Qm ≥ lim_{n→∞} nρ⁶X(1) = ∞ a.s., which proves that the tests are consistent:

lim_{n→∞} P{Qm > ẑm(1 − α)} = 1  and  lim_{n→∞} P{Q̃m > ẑm(1 − α)} = 1,

for any m ≥ 1. If, instead of an MA(1), the DGP is an MA(12), similar computations show that the tests are consistent for m ≥ 12 only.
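Example 5.1 can be illustrated by simulation (our own sketch; b, n and the seed are arbitrary): after fitting an AR(1) to a strong MA(1), the first two residual autocorrelations stay bounded away from zero, while higher lags vanish, so a portmanteau test based on any m ≥ 1 detects the misspecification.

```python
import numpy as np

# Fit an AR(1), via a_hat = rho_hat_X(1), to a simulated strong MA(1) and
# inspect the residual autocorrelations.
rng = np.random.default_rng(2)
n, b = 200_000, 0.8
e = rng.standard_normal(n + 1)
x = e[1:] - b * e[:-1]                        # MA(1) DGP

def acorr(z, k):
    z = z - z.mean()
    return (z[k:] * z[:-k]).sum() / (z * z).sum()

a_hat = acorr(x, 1)                           # least squares AR(1) estimate
res = x[1:] - a_hat * x[:-1]                  # AR(1) residuals
r = [acorr(res, k) for k in (1, 2, 3, 4)]     # lags 1 and 2 nonzero, 3 and 4 ~ 0
```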

This example might lead one to think that large values of m are recommended in portmanteau tests. This is of course not the case, since it is well known that choosing m too large alters the power in finite samples. This point is not particular to our approach; it also holds for the standard tests.

Evaluation of the critical value ẑm(1 − α) can be done using the Imhof algorithm. To implement the modified test, we have to estimate the eigenvalues of Σρm. There are two main approaches for estimating this matrix. A model-dependent method assumes a model for the noise (e.g. iid, GARCH, Markov switching) such that Σρm can be explicitly computed (as in (4.5) for instance). This model is then fitted to the residuals, and a plug-in approach leads, under appropriate assumptions, to a consistent estimator Σ̂ρm. The second approach, which we will use, is fully nonparametric: no specific time series model is assumed for the noise.

Nonparametric estimators of Γm,m = σ⁴Rm,m rely on the fact that this matrix can be interpreted as the spectral density at frequency zero of a multivariate process (Υt). Kernel-based procedures, as proposed by e.g. Newey and West (1987), Andrews (1991), and Andrews and Monahan (1992), use a weighted sum of autocovariances to estimate Γm,m, where the weights are determined by the kernel and the bandwidth parameter. Another estimator of Γm,m, which we will consider in the next section, is obtained by fitting to (Υt) an auxiliary AR model of large order, and by taking an empirical estimate of the spectral density of the auxiliary model.

5.2 Estimation of Γm,m

Note that Γm,m = Σ_{h=−∞}^{+∞} Cov(Υt, Υt+h), where

Υt = εt(εt+1, ..., εt+m)′.   (5.2)

The stationary process (Υt) admits the Wold decomposition

Υt = ut + Σ_{i=1}^∞ Bi ut−i,

where (ut) is an m-variate weak white noise. Assume that the covariance matrix Σu,u := Var(ut) is non-singular, that Σ_{i=1}^∞ ‖Bi‖ < ∞, where ‖·‖ denotes any norm on the space of real m × m matrices, and that det(Im + Σ_{i=1}^∞ Bi z^i) ≠ 0 if |z| ≤ 1. Then (Υt) admits an AR(∞) representation of the form

A(B)Υt := Υt − Σ_{i=1}^∞ Ai Υt−i = ut,   (5.3)


such that Σ_{i=1}^∞ ‖Ai‖ < ∞ and det(A(z)) ≠ 0 if |z| ≤ 1. Interpreting (2π)⁻¹Γm,m as the spectral density of (Υt) evaluated at frequency zero (see Brockwell and Davis (1991), p. 459), we obtain

Γm,m = A(1)⁻¹ Σu,u A′(1)⁻¹.   (5.4)

In the framework of univariate linear processes with independent innovations, Berk (1974) showed that the spectral density can be consistently estimated by fitting autoregressive models of order r = r(n), whenever r → ∞ and r³/n → 0 as n → ∞. We will show that this result remains valid for the multivariate linear process (Υt), though its innovation (ut) is not an independent process. Another difference with Berk (1974) is that the construction of the estimator requires replacing the non-observable vector Υt by the statistic Υ̂t = ε̂t(ε̂t+1, ..., ε̂t+m)′.

We make the following additional assumptions.

Assumption 2 The innovation process (εt) of the ARMA(p, q) model (2.1) is such that the process (Υt) defined by (5.2) admits the AR(∞) representation (5.3), where ‖Ai‖ = o(i⁻²) as i → ∞, the roots of det(A(z)) = 0 are outside the unit disk, and Σu,u = Var(ut) is non-singular. Moreover, E|εt|^{8+4ν} < ∞ and Σ_{k=0}^∞ {αε(k)}^{ν/(2+ν)} < ∞ for some ν > 0.

Note that the moment condition required by Berk (1974), for the univariate linear AR(∞) model (5.3) with (ut) iid, was EΥ⁴t < ∞. By comparison, our moment condition on (εt) entails that E‖Υt‖^{4+2ν} < ∞. If the mixing coefficients tend to zero at an exponential rate, which is the case for a wide class of processes, ν can be chosen arbitrarily small. Therefore our moment assumption is only slightly stronger than the one made by Berk.

Let Âr(z) = Im − Σ_{i=1}^r Âr,i z^i, where Âr,1, ..., Âr,r are the coefficients of the multivariate regression of Υ̂t on Υ̂t−1, ..., Υ̂t−r for t = 1, ..., n − m (by convention, Υ̂t = 0 when t ≤ 0). We denote by Σ̂ur,ur the empirical estimator of the variance of the error term of this regression. The matrices Âr,1, ..., Âr,r and Σ̂ur,ur may be computed recursively using the algorithm given by Whittle (1963).

We are now in a position to state the main result of this section.

Theorem 5.2 Under Assumptions 1 and 2,

Γ̂m,m = Âr(1)⁻¹ Σ̂ur,ur Â′r(1)⁻¹ → Γm,m = A(1)⁻¹ Σu,u A′(1)⁻¹

in probability when r = r(n) → ∞ and r³/n → 0 as n → ∞.

The proof of Theorem 5.2 is given in the appendix.

5.3 Estimation of Σρm

In the asymptotic covariance matrix involved in Theorem 3.2, the infinite-dimensional matrices R∞,∞ and R∞,m have to be approximated by finite-dimensional ones. We therefore consider the matrix

ΣMρm = Rm,m + Λ′m{Λ∞Λ′∞}⁻¹ ΛM RM,M Λ′M {Λ∞Λ′∞}⁻¹ Λm
       − Λ′m{Λ∞Λ′∞}⁻¹ ΛM RM,m − Rm,M Λ′M {Λ∞Λ′∞}⁻¹ Λm   (5.5)

for some M ≥ m. Write

Λ∞ = (ΛM, ΛM+),   R∞,m = [ RM,m
                           RM+,m ],   R∞,∞ = [ RM,M   RM,M+
                                               RM+,M  RM+,M+ ].

We have

Σρm − ΣMρm = Λ′m{Λ∞Λ′∞}⁻¹{Λ∞R∞,∞Λ′∞ − ΛM RM,M Λ′M}{Λ∞Λ′∞}⁻¹Λm
             − Λ′m{Λ∞Λ′∞}⁻¹{Λ∞R∞,m − ΛM RM,m}
             − {Rm,∞Λ′∞ − Rm,M Λ′M}{Λ∞Λ′∞}⁻¹Λm.   (5.6)


For any multiplicative norm, the norm of the first term on the right-hand side is bounded by

‖Λ′m{Λ∞Λ′∞}⁻¹‖ ‖Λ∞R∞,∞Λ′∞ − ΛM RM,M Λ′M‖ ‖{Λ∞Λ′∞}⁻¹Λm‖
  ≤ K(‖ΛM+ RM+,M Λ′M‖ + ‖ΛM RM,M+ Λ′M+‖ + ‖ΛM+ RM+,M+ Λ′M+‖).   (5.7)

We have φ*i = O(ρ^i) and ψ*i = O(ρ^i) for some ρ ∈ [0, 1[. Consequently, any element of the j-th column of ΛM is of order O(ρ^j), and any element of the j-th column of ΛM+ is of order O(ρ^{M+j}). Arguing as in the proof of Lemmas A.1 and A.3, it can be shown that sup_{ℓ,ℓ′>0} |R(ℓ, ℓ′)| < ∞. It follows that any element of RM+,MΛ′M is bounded uniformly in M, and that any element of ΛM+RM+,MΛ′M is bounded by Kρ^M, where K is independent of M. Thus the first norm on the right-hand side of (5.7) is of order O(ρ^M). The same bound is obtained for the two other norms. The other terms on the right-hand side of (5.6) can be handled in the same way. Thus

‖Σρm − ΣMρm‖ = O(ρ^M).

Now note that Λm = Λm(θ) and Λ∞ = Λ∞(θ) are continuous functions of θ. Therefore Λ̂m = Λm(θ̂) and Λ̂∞ = Λ∞(θ̂) are consistent estimates of Λm and Λ∞ defined in (3.2). The following algorithm was used in the numerical experiments presented below.

Step 1: Fit an ARMA(p, q) model (2.1) to the time series X1, ..., Xn. Let θ̂ be the least squares estimate of θ, let ε̂t, t = 1, ..., n, be the residuals, and let σ̂² = n⁻¹ Σ_{t=1}^n ε̂²t be an estimator of σ². Let Λ̂k = Λk(θ̂) for k ∈ {m, M, ∞}.

Step 2: Given a small number τ > 0, define an integer M ≥ m such that ‖ΛM+(θ̂)‖ ≤ τ. If for instance (p, q) = (1, 0), then, by (4.1), ‖ΛM+(θ̂)‖ = ‖(â1^M, â1^{M+1}, ...)‖ = {â1^{2M}/(1 − â1²)}^{1/2}, and we can therefore choose M = max{m, [{ln τ²(1 − â1²)}/ln â1²] + 1}, where [·] denotes the integer part.

Step 3: For r = 0, ..., rmax, fit an M-variate AR(r) model to Υ̂1, ..., Υ̂n−M, where Υ̂t = (ε̂tε̂t+1, ..., ε̂tε̂t+M)′. This can be done very rapidly using the multivariate version of the Durbin-Levinson algorithm (see Brockwell and Davis (1991), p. 422). An order r0 ∈ {0, ..., rmax} can be selected by minimizing an information criterion. In view of Theorem 5.2, we define

Γ̂M,M = Âr0(1)⁻¹ Σ̂ur0,ur0 Â′r0(1)⁻¹,

and we denote by Γ̂m,m and Γ̂m,M the corresponding sub-matrices of Γ̂M,M.

Step 4: An estimate Σ̂ρm of Σρm is obtained by replacing, in (5.5), Ri,j and Λk by σ̂⁻⁴Γ̂i,j and Λ̂k, for i ∈ {m, M}, j ∈ {m, M} and k ∈ {m, M, ∞}.

For the numerical illustrations presented in this paper, we used τ² = 0.1 in Step 2 to select M. To save computation time, we also imposed M ≤ m + 20. For the selection of r0 in Step 3, we used the BIC criterion with rmax = 5. In the numerical experiments we made, the AIC criterion led to very similar results. The parameters τ² and rmax affect the computation time. Experiments on a small number of replications lead us to believe that the results presented in Section 6 below can be slightly improved with a smaller τ² and a larger rmax. In practice, rmax should be chosen as large as possible, and τ² as small as possible, given the available computing resources.

6 Monte Carlo results

In this section, we investigate the finite-sample properties of the test introduced in this paper. For illustrative purposes, we also consider the standard LB portmanteau test, and the recent test proposed by Hong and Lee (2003) (HL hereafter). Other tests, already mentioned, could have been considered as well, but our goal is not to make a comparative study. Although the three tests considered were developed for checking the hypothesis that the noise of a time series model is uncorrelated, it is important to point out that their asymptotic properties were derived under very different hypotheses. The null hypotheses tested are, for our test, H0 (as defined in Section 3.4), for the HL test,


H ′0: (Xt) satisfies a semi-strong ARMA(p, q) model,

and for the standard LB test,

H ′′0 : (Xt) satisfies a strong ARMA(p, q) model.

These hypotheses are the main assumptions under which the asymptotic levels of these tests are valid but, of course, other regularity conditions (such as our Assumption 1′) are required. To our knowledge, none of the tests in the literature is really designed for H0.

Monte Carlo experiments have been conducted for the AR(1)-ARCH(1) and the Markov-switching models considered in Section 4.

6.1 Empirical size of the modified LB test

First we generate N = 1, 000 replications of size n = 1, 000 of the AR(1)-ARCH(1) model

Xt = aXt−1 + εt,   εt = √(1 + α1ε²_{t−1}) ηt,   (ηt) iid N(0,1),

for different values of the parameters a and α1. Recall that this model is a semi-strong AR(1).

Concerning the LB test, we compute, for each replication, and for different values of m, a and α1, the statistic Q̃m. The model is rejected when Q̃m is greater than χ²_{m−1}(0.95). This corresponds to a nominal level α = 5% in the strong case. The rejection frequencies are presented in the first array of Table 3 and are similar to those of Table 1. With N = 1,000 replications, the standard error of the relative rejection frequencies is at most 1.58%. Therefore, the finite-sample sizes of the tests should be at most ±3.1% from the relative rejection frequencies with probability greater than 0.95. Comparing Table 1 with the first array of Table 3, the asymptotic behaviour of the LB portmanteau test and its behaviour for the sample size n = 1,000 do not appear significantly different. Most of the relative rejection frequencies are considerably greater than the theoretical 5%, and the conclusion drawn from the asymptotic study remains valid for this Monte Carlo experiment: the standard test should not be used to test H0.

The HL test statistic is of the form

M(m) = {n Σ_{h=1}^{n−1} k²(h/m) ρ̂²(h) − C(m)}/√(D(m)),

where C(m) and D(m) are centering and scaling factors, m = mn = cn^{1/5}, and k(·) is the Parzen kernel (other choices of the parameters are possible). Under some regularity conditions, M(m) is asymptotically N(0,1) distributed when the ρ̂(h) are the residual autocorrelations of a time series model with martingale difference errors. In our Monte Carlo experiment, the test defined by the rejection region {M(m) > 1.645} should therefore be approximately of size 5%. For N = 1,000 independent replications of a 5%-level test, the standard error of the relative rejection frequencies is 0.69%, and the latter should vary between 3.65% and 6.35% with approximate probability 0.95. The second array of Table 3 indicates that, in this example, the HL test is slightly undersized when a = 0.9 and considerably more undersized when a = 0.0 or 0.5. Indeed, in the latter case, the only relative rejection frequency between the significance limits is when (a, α1) = (0, 0) and m = 39. This is in agreement with Chen and Deo (2003), who found that the finite-sample distributions of statistics of the form M(m) are generally heavily skewed, and who proposed level corrections.

Now consider the test of the present paper. We reject the model when Q̃m is greater than ẑm(0.95) (see (5.1)). The relative rejection frequencies, presented in the third array of Table 3, are almost always within the 5% significance limits 3.65% and 6.35%. We conclude that, for this set of experiments, the modified LB portmanteau test satisfactorily controls the type I error for n = 1,000. For n = 100 (the results are not reported here), our test still accurately controls the type I error when m ≤ 12, but the results are less satisfactory for m = 24 and m = 36.

The same experiments have been repeated for the Markov-switching model (4.7). Recall that this model is only a weak AR(1). The results are presented in Table 4. The first array reports the relative rejection frequencies of the standard LB test, which are similar to the asymptotic probabilities of type I error displayed in Table 2. The second array reports the relative rejection frequencies of the HL test. The most significant finding is that, as for the standard LB tests, numerous relative rejection frequencies are


very far from 5%. Since the length of the simulated series is n = 10,000, it is very likely that this poor performance under H0 is not due to a small-sample effect. This experiment leads us to think that tests designed for H′0 should not be used to test H0. The third array concerns the modified version of the LB test. For (a, p) = (0.6, 0.05) the test has a slight tendency to over-reject. This is probably due to the complex nature of this model when p is close to zero (see Section 4.3). However, most of the relative rejection frequencies are satisfactorily close to the theoretical 5%.

6.2 Empirical power

In this section we examine the power of the tests for the null hypothesis of an AR(1) against the ARMA(1,1)-ARCH(1) alternative

Xt = aXt−1 + εt + bεt−1,   εt = √(1 + α1ε²_{t−1}) ηt,   (ηt) iid N(0,1).

We generate N = 1,000 replications of size n = 1,000 of this model and fit an AR(1) for each replication. Since the standard LB test suffers from a serious size distortion, its critical values have been adjusted so that the relative rejection frequencies are 5% under the null (i.e. for each value of a and α1, we take as critical value the 95%-quantile of the Q̃m distribution over 1,000 replications of the model with b = 0). Note however that in practice, when the data-generating process (DGP) is unknown, it is not possible to do this adjustment for the type I error. The results are reported in Table 5. It is seen that the tests have high power, except in the case a = 0. The lack of power in the latter case may be explained by the fact that the MA(1) process is very close to an AR(1), as can be seen from its causal representation: Xt = εt + 0.2Xt−1 − 0.04Xt−2 + 0.008Xt−3 + ··· (The causal representation when a = 0.5 is Xt = εt + 0.7Xt−1 − 0.14Xt−2 + 0.028Xt−3 + ···, and when a = 0.9 it is Xt = εt + 1.1Xt−1 − 0.49Xt−2 + 0.368Xt−3 + ···.) The power of the three tests generally decreases as m increases. This is not surprising because, for this DGP, it is mainly in the first autocorrelations that the lack of AR(1) structure can be detected. The decrease of power as m increases is moderate for the HL test, which gives decreasing weights to higher-lag empirical autocorrelations. Of course, for other DGPs, high values of m may help detect departures from the null hypothesis. The modified and (adjusted) standard versions of the LB test have very similar powers. This is not surprising because the tests are based on the same statistic and reject the null when this statistic is large. In the case a = 0, the HL test is slightly outperformed by the LB tests, especially for small values of m.

7 Standard & Poor’s index

We now consider an application to the daily returns of the Standard & Poor’s 500 index, from January

3, 1979 to December 31, 2001. The length of the series is n = 5804. First, we apply portmanteau tests

for checking the hypothesis that the S&P returns constitute a white noise (the standard economic theory

asserts that such stock indices should be martingale differences, though they are not generally independent

sequences). Table 6 displays the p-values of the standard and modified LB tests. Since the p-values of the

standard test are very small, the strong white noise hypothesis is rejected. The weak white noise hypothesis

is not rejected, since for the modified test, the p-values are far from zero. This is in accordance with other

works devoted to the analysis of stock-market returns (see Lobato, Nankervis and Savin (2001)).

Next, we fit an ARMA(1,1) model to the squares of the S&P returns. Denoting by (Xt) the mean-corrected series of the squared returns, we obtain the model Xt − 0.832Xt−1 = εt − 0.727εt−1, Var(εt) = 0.526 × 10⁻⁶. Table 7 displays the p-values of the standard and modified LB tests. The strong ARMA(1,1) model is

rejected, but a weak ARMA(1,1) model is not rejected. Figure 3 displays the residual autocorrelations and

their 5% significance limits under the strong ARMA(1,1) assumption (left graph) and under the assumption

of a weak ARMA(1,1) model (right graph). This figure confirms the conclusions drawn from Table 7.

Note that, in view of (4.3), the first and second-order structures we found for the S&P returns, namely

a weak white noise for the returns and a weak ARMA(1,1) model for the squares of the returns, are

compatible with a GARCH(1,1) model.

8 Conclusion

At the model criticism stage of the Box-Jenkins methodology, residual autocorrelations are plotted and

portmanteau tests are performed in order to detect departures from the hypothesis that the disturbances of

the specified ARMA(p, q) model are uncorrelated. However, the standard significance limits for the residual

autocorrelations are not valid and the portmanteau tests do not control the Type I error well when the

actual underlying process is an ARMA(p, q) with uncorrelated but non-independent disturbances. Thus,

when the standard BP and LB portmanteau tests reject the specified ARMA(p0, q0) model and/or many

residual autocorrelations are outside the non-significance band, the practitioner does not know whether (i)

the ARMA(p0, q0) model is rejected because the disturbances are not uncorrelated, or (ii) the ARMA(p0, q0)

model is rejected because the disturbances are not independent, though they are uncorrelated. The same

problem occurs with all recent extensions of the LB test because, as we have seen, these extensions are

designed for testing the adequacy of models with independent or martingale difference errors. The issue

of distinguishing between weak and (semi-)strong ARMA models is important because, in case (i), one should search for another ARMA(p, q) model with (p, q) ≠ (p0, q0), whereas in case (ii) the actual underlying process displays nonlinearities but the orders of the weak ARMA(p0, q0) representation are well selected.

In this paper we have partially solved the problem. The modifications proposed for the significance limits and the portmanteau tests lead to rejection of a given ARMA(p0, q0) model in case (i) but not in case (ii).

A computer program implementing the method is available on request.

Acknowledgements

The authors gratefully acknowledge the quick and careful reading of the manuscript by the Editor, the

Associate Editor and two referees. Their detailed comments led to a greatly improved presentation.


Appendix: Proofs and technical results

Proof of Theorem 3.1. A standard expansion of the derivative of $O_n$ about $\theta_0$, taken at $\hat\theta$, yields

$$0 = \frac{\partial}{\partial\theta}O_n(\theta_0) + \left\{\frac{\partial^2}{\partial\theta\,\partial\theta'}O_n(\theta_n^*)\right\}(\hat\theta - \theta_0),$$

where $\theta_n^*$ is between $\theta_0$ and $\hat\theta$. Thus, by standard arguments,

$$\hat\theta - \theta_0 = J^{-1}Y_n + O_P(1/n),$$

where $Y_n = -\frac{2}{n}\sum_{t=1}^{n}\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}$. It is easily seen (McLeod, 1978) that the noise derivatives can be represented as

$$\frac{\partial\epsilon_t(\theta)}{\partial\theta} = \bigl(v_{t-1}(\theta),\ldots,v_{t-p}(\theta),\,u_{t-1}(\theta),\ldots,u_{t-q}(\theta)\bigr)',$$

where

$$v_t(\theta) = -\phi_\theta^{-1}(B)\epsilon_t(\theta), \qquad u_t(\theta) = \psi_\theta^{-1}(B)\epsilon_t(\theta),$$

and

$$\epsilon_t(\theta) = \psi_\theta^{-1}(B)\phi_\theta(B)X_t.$$

Hence, at $\theta_0$,

$$\frac{\partial\epsilon_t}{\partial\theta} = \sum_{i\ge1}\epsilon_{t-i}\lambda_i.$$

The asymptotic normality of $\sqrt{n}\,(J^{-1}Y_n,\gamma_m)'$ can be established along the same lines as in Francq and Zakoïan (1998). The detailed proof is omitted. It is easily shown that, for $\ell,\ell'\ge1$,

$$\operatorname{Cov}\bigl(\sqrt{n}\,\gamma(\ell),\sqrt{n}\,\gamma(\ell')\bigr) = \frac{1}{n}\sum_{t=1}^{n-\ell}\sum_{t'=1}^{n-\ell'}E(\epsilon_t\epsilon_{t+\ell}\epsilon_{t'}\epsilon_{t'+\ell'}) \to \Gamma(\ell,\ell')\quad\text{as } n\to\infty,$$

$$\operatorname{Cov}\bigl(\sqrt{n}\,J^{-1}Y_n,\sqrt{n}\,\gamma(\ell)\bigr) = -J^{-1}\frac{2}{n}\sum_{t=1}^{n}\sum_{t'=1}^{n-\ell}E\Bigl(\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}\epsilon_{t'}\epsilon_{t'+\ell}\Bigr)$$
$$\to -2J^{-1}\sum_{h=-\infty}^{\infty}E\Bigl(\epsilon_t\frac{\partial\epsilon_t}{\partial\theta}\epsilon_{t+h}\epsilon_{t+h+\ell}\Bigr) = -2J^{-1}\sum_{h=-\infty}^{\infty}\sum_{i\ge1}E(\epsilon_t\epsilon_{t-i}\epsilon_{t+h}\epsilon_{t+h+\ell})\lambda_i = -2J^{-1}\sum_{i\ge1}\Gamma(i,\ell)\lambda_i \quad\text{as } n\to\infty,$$

$$\operatorname{Var}_{as}\bigl(\sqrt{n}\,J^{-1}Y_n\bigr) = J^{-1}IJ^{-1} = 4J^{-1}\Bigl\{\sum_{i,j\ge1}\Gamma(i,j)\lambda_i\lambda_j'\Bigr\}J^{-1}, \qquad J = 2\sigma^2\sum_{i\ge1}\lambda_i\lambda_i',$$

from which the asymptotic covariance matrix of Theorem 3.1 can be deduced.

□

Proof of Theorem 3.2. We have, for $\ell = 1,\ldots,m$,

$$\hat\gamma(\ell) = \gamma(\ell) + \frac{1}{n}\sum_{t=1}^{n-\ell}\left(\epsilon_{t+\ell}\frac{\partial\epsilon_t}{\partial\theta'} + \epsilon_t\frac{\partial\epsilon_{t+\ell}}{\partial\theta'}\right)_{\theta=\theta_n^*}(\hat\theta-\theta_0) + O_P(1/n)$$
$$= \gamma(\ell) + E\left(\epsilon_t\frac{\partial\epsilon_{t+\ell}}{\partial\theta'}\right)_{\theta=\theta_0}(\hat\theta-\theta_0) + O_P(1/n) = \gamma(\ell) + \sigma^2\lambda_\ell'(\hat\theta-\theta_0) + O_P(1/n),$$

where $\theta_n^*$ is between $\theta_0$ and $\hat\theta$. Then $\hat\gamma_m := (\hat\gamma(1),\ldots,\hat\gamma(m))' = \gamma_m + \sigma^2\Lambda_m'(\hat\theta-\theta_0) + O_P(1/n)$. Hence, by Theorem 3.1, the asymptotic distribution of $\sqrt{n}\,\hat\gamma_m$ is normal, with mean zero and covariance matrix

$$\operatorname{Var}_{as}(\sqrt{n}\,\hat\gamma_m) = \operatorname{Var}_{as}(\sqrt{n}\,\gamma_m) + \sigma^4\Lambda_m'\operatorname{Var}_{as}(\sqrt{n}\,\hat\theta)\Lambda_m + \sigma^2\Lambda_m'\operatorname{Cov}_{as}(\sqrt{n}\,\hat\theta,\sqrt{n}\,\gamma_m) + \sigma^2\operatorname{Cov}_{as}(\sqrt{n}\,\gamma_m,\sqrt{n}\,\hat\theta)\Lambda_m.$$

Finally, we have

$$n\left\{\frac{\hat\gamma(\ell)}{\hat\gamma(0)} - \frac{\hat\gamma(\ell)}{\sigma^2}\right\} = \sqrt{n}\,\hat\gamma(\ell)\,\frac{\sqrt{n}\{\sigma^2-\hat\gamma(0)\}}{\sigma^2\hat\gamma(0)},$$

so that

$$\hat\rho_m = \hat\gamma_m/\sigma^2 + O_P(1/n).$$

Theorem 3.2 now follows from standard arguments. □

The proof of Theorem 5.2 is based on a series of lemmas. We first justify the existence of the $\Gamma(\ell,\ell')$ in the following result.

Lemma A.1 Under Assumption 1,

$$\sum_{h=-\infty}^{\infty}\bigl|E(\epsilon_t\epsilon_{t+\ell}\epsilon_{t+h}\epsilon_{t+h+\ell'})\bigr| < \infty \quad\text{for } (\ell,\ell')\ne(0,0).$$

Proof. Note that, for all $h\in\mathbb{Z}$ and all $(\ell,\ell')\ne(0,0)$, $E(\epsilon_t\epsilon_{t+\ell}\epsilon_{t+h}\epsilon_{t+h+\ell'}) = \operatorname{Cov}(\epsilon_t\epsilon_{t+\ell},\,\epsilon_{t+h}\epsilon_{t+h+\ell'})$. Note also that the assumption made on $\psi_{\theta_0}(\cdot)$ entails

$$\epsilon_t = \sum_{i=0}^{+\infty}\pi_iX_{t-i} \quad\text{with } \pi_0 = 1 \text{ and } |\pi_i|\le K\rho^i,\ 0<K<\infty,\ \rho\in(0,1).$$


Using the Davydov (1968) inequality, we deduce that

$$\sum_{h=-\infty}^{\infty}\bigl|E(\epsilon_t\epsilon_{t+\ell}\epsilon_{t+h}\epsilon_{t+h+\ell'})\bigr| \le \sum_{h=-\infty}^{\infty}\ \sum_{i_1,i_2,i_3,i_4=0}^{\infty}|\pi_{i_1}\pi_{i_2}\pi_{i_3}\pi_{i_4}|\,\bigl|\operatorname{Cov}(X_{t-i_1}X_{t+\ell-i_2},\,X_{t+h-i_3}X_{t+h+\ell'-i_4})\bigr|$$
$$\le K_0K^4\sum_{k=1}^{4}\ \sum_{(i_1,i_2,i_3,i_4)\in S_k}\ \sum_{h=-\infty}^{\infty}\rho^{i_1+i_2+i_3+i_4}\,\|X_{t-i_1}X_{t+\ell-i_2}\|_{2+\nu}\,\|X_{t+h-i_3}X_{t+h+\ell'-i_4}\|_{2+\nu}\,\bigl\{\alpha\bigl(X_{t-i_1}X_{t+\ell-i_2},\,X_{t+h-i_3}X_{t+h+\ell'-i_4}\bigr)\bigr\}^{\frac{\nu}{2+\nu}},$$

where $\|X\|_p = (E|X|^p)^{1/p}$ denotes the usual $L_p$-norm, $\alpha(X,Y)$ denotes the strong mixing coefficient between the $\sigma$-field generated by the random variable $X$ and that generated by $Y$, $K_0$ is a universal constant, and

$$S_1 = \bigl\{(i_1,i_2,i_3,i_4)\in\{0,1,\ldots\}^4 : i_1\ge i_2-\ell,\ i_4\le i_3+\ell'\bigr\},$$
$$S_2 = \bigl\{(i_1,i_2,i_3,i_4)\in\{0,1,\ldots\}^4 : i_1\ge i_2-\ell,\ i_4\ge i_3+\ell'\bigr\},$$
$$S_3 = \bigl\{(i_1,i_2,i_3,i_4)\in\{0,1,\ldots\}^4 : i_1\le i_2-\ell,\ i_4\le i_3+\ell'\bigr\},$$
$$S_4 = \bigl\{(i_1,i_2,i_3,i_4)\in\{0,1,\ldots\}^4 : i_1\le i_2-\ell,\ i_4\ge i_3+\ell'\bigr\}.$$

For $(i_1,i_2,i_3,i_4)\in S_1$, we have $t-i_1\le t+\ell-i_2$, $t+h-i_3\le t+h+\ell'-i_4$, and

$$\alpha\bigl(X_{t-i_1}X_{t+\ell-i_2},\,X_{t+h-i_3}X_{t+h+\ell'-i_4}\bigr) \le \alpha_X(h-i_3-\ell+i_2), \quad\forall h\ge i_3+\ell-i_2,$$
$$\alpha\bigl(X_{t-i_1}X_{t+\ell-i_2},\,X_{t+h-i_3}X_{t+h+\ell'-i_4}\bigr) \le \alpha_X(-i_1-h-\ell'+i_4), \quad\forall h\le i_4-\ell'-i_1,$$
$$\alpha\bigl(X_{t-i_1}X_{t+\ell-i_2},\,X_{t+h-i_3}X_{t+h+\ell'-i_4}\bigr) \le \alpha_X(0)\le 1/4, \quad\forall h = i_4-\ell'-i_1,\ldots,i_3+\ell-i_2.$$

Note also that, by the Hölder inequality,

$$\|X_{t-i_1}X_{t+\ell-i_2}\|_{2+\nu} \le \|X_t\|_{4+2\nu}^2 := K_1 < \infty.$$

Therefore

$$\sum_{(i_1,i_2,i_3,i_4)\in S_1}\ \sum_{h=-\infty}^{\infty}\rho^{i_1+i_2+i_3+i_4}\,\|X_{t-i_1}X_{t+\ell-i_2}\|_{2+\nu}\,\|X_{t+h-i_3}X_{t+h+\ell'-i_4}\|_{2+\nu}\,\{\alpha(\cdot,\cdot)\}^{\frac{\nu}{2+\nu}}$$
$$\le K_1^2\sum_{(i_1,i_2,i_3,i_4)\in S_1}\rho^{i_1+i_2+i_3+i_4}\left\{i_3+\ell-i_2-i_4+\ell'+i_1+1+\sum_{k=0}^{\infty}\{\alpha_X(k)\}^{\nu/(2+\nu)}\right\} < \infty.$$

Continuing in this way, the conclusion follows. □


We now turn to the estimation of $\Gamma_{m,m}$ given by (5.4). We need to introduce additional notation. Consider the regression of $\Upsilon_t$ on $\Upsilon_{t-1},\ldots,\Upsilon_{t-r}$ defined by

$$\Upsilon_t = \sum_{i=1}^{r}A_{r,i}\Upsilon_{t-i} + u_{r,t}, \qquad u_{r,t}\perp\{\Upsilon_{t-1},\ldots,\Upsilon_{t-r}\}. \tag{A.1}$$

If $\Upsilon_1,\ldots,\Upsilon_n$ were observed, the least squares estimators of $A_r = (A_{r,1}\cdots A_{r,r})$ and $\Sigma_{u_r,u_r} = \operatorname{Var}(u_{r,t})$ would be given by

$$\bar A_r = \bar\Sigma_{\Upsilon,\Upsilon_r}\bar\Sigma_{\Upsilon_r,\Upsilon_r}^{-1} \quad\text{and}\quad \bar\Sigma_{u_r,u_r} = \frac{1}{n}\sum_{t=1}^{n}\bigl(\Upsilon_t-\bar A_r\Upsilon_{r,t}\bigr)\bigl(\Upsilon_t-\bar A_r\Upsilon_{r,t}\bigr)',$$

where $\Upsilon_{r,t} = (\Upsilon_{t-1}'\cdots\Upsilon_{t-r}')'$,

$$\bar\Sigma_{\Upsilon,\Upsilon_r} = \frac{1}{n-r}\sum_{t=1}^{n}\Upsilon_t\Upsilon_{r,t}', \qquad \bar\Sigma_{\Upsilon_r,\Upsilon_r} = \frac{1}{n-r}\sum_{t=1}^{n}\Upsilon_{r,t}\Upsilon_{r,t}',$$

with the convention $\Upsilon_t = 0$ when $t\le0$, and assuming that $\bar\Sigma_{\Upsilon_r,\Upsilon_r}$ is non-singular.⁷

Actually, we only observe $X_1,\ldots,X_n$. The residuals $\hat\epsilon_t = e_t(\hat\theta)$ are then available for $t = 1,\ldots,n$, and the vectors $\hat\Upsilon_t$ are available for $t = 1,\ldots,n-m$. We therefore define the estimators

$$\hat A_r = \hat\Sigma_{\Upsilon,\Upsilon_r}\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1} \quad\text{and}\quad \hat\Sigma_{u_r,u_r} = \frac{1}{n}\sum_{t=1}^{n-m}\bigl(\hat\Upsilon_t-\hat A_r\hat\Upsilon_{r,t}\bigr)\bigl(\hat\Upsilon_t-\hat A_r\hat\Upsilon_{r,t}\bigr)',$$

where $\hat\Upsilon_{r,t} = (\hat\Upsilon_{t-1}'\cdots\hat\Upsilon_{t-r}')'$,

$$\hat\Sigma_{\Upsilon,\Upsilon_r} = \frac{1}{n-r}\sum_{t=1}^{n-m}\hat\Upsilon_t\hat\Upsilon_{r,t}', \qquad \hat\Sigma_{\Upsilon_r,\Upsilon_r} = \frac{1}{n-r}\sum_{t=1}^{n-m}\hat\Upsilon_{r,t}\hat\Upsilon_{r,t}'.$$

Thus we have $\hat A_r(z) = I_m - \sum_{i=1}^{r}\hat A_{r,i}z^i$, where $\hat A_r = (\hat A_{r,1}\cdots\hat A_{r,r})$.
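To fix ideas, the infeasible least-squares fit of the long autoregression (A.1) on observed $\Upsilon_t$ can be sketched as follows. This is a hedged illustration under our own naming (`fit_long_var` is not from the paper); the feasible estimator additionally replaces $\Upsilon_t$ by the residual-based $\hat\Upsilon_t$:

```python
import numpy as np

def fit_long_var(Y, r):
    """Least-squares fit of Y_t = sum_{i=1}^r A_i Y_{t-i} + u_t.

    Y : (n, m) array of observations of the vector Upsilon_t.
    Returns A_hat of shape (m, m*r), stacking (A_1 ... A_r), and the
    residual covariance estimate Sigma_u_hat.
    """
    n, m = Y.shape
    # regressor Y_{r,t} = (Y_{t-1}', ..., Y_{t-r}')' for t = r, ..., n-1
    X = np.hstack([Y[r - i:n - i] for i in range(1, r + 1)])  # (n-r, m*r)
    target = Y[r:]                                            # (n-r, m)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef.T, resid.T @ resid / len(resid)

# With white-noise input, the fitted coefficients should be near zero.
rng = np.random.default_rng(2)
A_hat, Sigma_u = fit_long_var(rng.standard_normal((4000, 2)), r=3)
```

Fitting lag by lag via the sample moments, as in the displayed formulas, gives the same solution up to the normalization of the cross-moment matrices.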

In the sequel we employ the multiplicative matrix norm defined by $\|A\| = \sup_{\|x\|\le1}\|Ax\| = \rho^{1/2}(A'A)$, where $A$ is a $d_1\times d_2$ matrix, $\|x\|$ is the Euclidean norm of the vector $x\in\mathbb{R}^{d_2}$, and $\rho(\cdot)$ denotes the spectral radius. This norm satisfies

$$\|A\|^2 \le \sum_{i,j}a_{i,j}^2, \tag{A.2}$$

with obvious notation. This choice of norm is crucial for the following lemma to hold (with, e.g., the Euclidean norm, the result is not valid). Let

$$\Sigma_{\Upsilon,\Upsilon_r} = E\Upsilon_t\Upsilon_{r,t}', \quad \Sigma_{\Upsilon,\Upsilon} = E\Upsilon_t\Upsilon_t', \quad \Sigma_{\Upsilon_r,\Upsilon_r} = E\Upsilon_{r,t}\Upsilon_{r,t}', \quad \bar\Sigma_{\Upsilon,\Upsilon} = \frac{1}{n-r}\sum_{t=1}^{n-m}\Upsilon_t\Upsilon_t'.$$
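As a quick numerical sanity check (our own illustration, not part of the paper), the norm $\|A\| = \rho^{1/2}(A'A)$ is the largest singular value of $A$, and inequality (A.2) is the bound by the squared Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))

# ||A|| = rho^{1/2}(A'A): square root of the largest eigenvalue of A'A
spec_norm = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
# right-hand side of inequality (A.2): sum of squared entries
frob_sq = np.sum(A ** 2)
```

Here `spec_norm` coincides with `np.linalg.norm(A, 2)` and satisfies `spec_norm**2 <= frob_sq`.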

Lemma A.2 Under Assumption 2,

$$\sup_{r\ge1}\max\bigl\{\|\Sigma_{\Upsilon,\Upsilon_r}\|,\ \|\Sigma_{\Upsilon_r,\Upsilon_r}\|,\ \|\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\|\bigr\} < \infty.$$

⁷This condition means that there exists no $\lambda\ne0$ such that $\lambda'\Upsilon_t = 0$ for $t = 1,\ldots,n$. In view of the non-singularity of $\Sigma_{u,u}$, this condition holds true, at least asymptotically.


Proof. We readily have

$$\|\Sigma_{\Upsilon_r,\Upsilon_r}x\| \le \|\Sigma_{\Upsilon_{r+1},\Upsilon_{r+1}}(x',0_m')'\| \quad\text{and}\quad \|\Sigma_{\Upsilon,\Upsilon_r}x\| \le \|\Sigma_{\Upsilon_{r+1},\Upsilon_{r+1}}(0_m',x')'\|$$

for any $x\in\mathbb{R}^{mr}$ and $0_m = (0,\ldots,0)'\in\mathbb{R}^m$. Therefore

$$0 < \|\operatorname{Var}(\Upsilon_t)\| = \|\Sigma_{\Upsilon_1,\Upsilon_1}\| \le \|\Sigma_{\Upsilon_2,\Upsilon_2}\| \le \cdots \quad\text{and}\quad \|\Sigma_{\Upsilon,\Upsilon_r}\| \le \|\Sigma_{\Upsilon_{r+1},\Upsilon_{r+1}}\|.$$

Let $f(\lambda)$ be the spectral density of $\Upsilon_t$. Because the autocovariance function of $\Upsilon_t$ is absolutely summable (clearly, Lemma A.1 holds when Assumption 1 is replaced by Assumption 2), $\|f(\lambda)\|$ is bounded by a finite constant $M$, say. Denoting by $\delta = (\delta_1',\ldots,\delta_r')'$ an eigenvector of $\Sigma_{\Upsilon_r,\Upsilon_r}$ associated with its largest eigenvalue, such that $\|\delta\| = 1$ and $\delta_i\in\mathbb{R}^m$ for $i = 1,\ldots,r$, we have

$$\|\Sigma_{\Upsilon_r,\Upsilon_r}\| = \rho^{1/2}\bigl(\Sigma_{\Upsilon_r,\Upsilon_r}^2\bigr) = \rho(\Sigma_{\Upsilon_r,\Upsilon_r}) = \delta'\Sigma_{\Upsilon_r,\Upsilon_r}\delta = \sum_{j,k=1}^{r}\delta_j'\int_{-\pi}^{\pi}e^{i(k-j)\lambda}f(\lambda)\,d\lambda\ \delta_k \le 2\pi M.$$

By similar arguments, the smallest eigenvalue of $\Sigma_{\Upsilon_r,\Upsilon_r}$ is greater than a positive constant independent of $r$. Since $\|\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\|$ is equal to the inverse of the smallest eigenvalue of $\Sigma_{\Upsilon_r,\Upsilon_r}$, the proof is complete. □

Lemma A.3 Let $\epsilon = (\epsilon_t)$ be a sequence of centered and uncorrelated variables, with $E|\epsilon_t|^{8+4\nu} < \infty$ and $\sum_{k=0}^{\infty}\{\alpha_\epsilon(k)\}^{\nu/(2+\nu)} < \infty$ for some $\nu > 0$. Let $\epsilon_{t,\ell}^{(2)} = \epsilon_t\epsilon_{t+\ell}$. Then, for $m_1,m_2\ge1$,

$$\sup_{s\in\mathbb{Z}}\ \sum_{h=-\infty}^{\infty}\Bigl|\operatorname{Cov}\bigl(\epsilon_{1,m_1}^{(2)}\epsilon_{1+s,m_2}^{(2)},\ \epsilon_{1+h,m_1}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr)\Bigr| < \infty.$$

Proof. Without loss of generality, we can take the supremum over the integers $s > 0$, and consider the sum for positive $h$. Let $m_0 = m_1\wedge m_2$. Using again the Davydov (1968) inequality, we have

$$\sum_{h=s+m_0}^{\infty}\Bigl|\operatorname{Cov}\bigl(\epsilon_{1,m_1}^{(2)}\epsilon_{1+s,m_2}^{(2)},\ \epsilon_{1+h,m_1}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr)\Bigr| \le K_0\sum_{h=s+m_0}^{\infty}\|\epsilon_t\|_{8+4\nu}^{8}\,\{\alpha_\epsilon(h-s-m_0)\}^{\frac{\nu}{2+\nu}},$$

which is bounded by a constant independent of $s$. To deal with the terms obtained for $h < s+m_0$, we write

$$\operatorname{Cov}\bigl(\epsilon_{1,m_1}^{(2)}\epsilon_{1+s,m_2}^{(2)},\ \epsilon_{1+h,m_1}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr) = \operatorname{Cov}\bigl(\epsilon_{1,m_1}^{(2)}\epsilon_{1+h,m_1}^{(2)},\ \epsilon_{1+s,m_2}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr)$$
$$+\ E\bigl\{\epsilon_{1,m_1}^{(2)}\epsilon_{1+h,m_1}^{(2)}\bigr\}E\bigl\{\epsilon_{1+s,m_2}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr\} - E\bigl\{\epsilon_{1,m_1}^{(2)}\epsilon_{1+s,m_2}^{(2)}\bigr\}E\bigl\{\epsilon_{1+h,m_1}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr\}. \tag{A.3}$$


With the convention $\alpha_\epsilon(k) = 1/4$ for $k\le0$, we have

$$\sum_{h=0}^{s+m_0-1}\Bigl|\operatorname{Cov}\bigl(\epsilon_{1,m_1}^{(2)}\epsilon_{1+h,m_1}^{(2)},\ \epsilon_{1+s,m_2}^{(2)}\epsilon_{1+h+s,m_2}^{(2)}\bigr)\Bigr| \le K_0\sum_{i=0}^{s+m_0-1}\|\epsilon_t\|_{8+4\nu}^{8}\,\{\alpha_\epsilon(i+1-m_0-m_1)\}^{\frac{\nu}{2+\nu}},$$

$$\sum_{h=0}^{s+m_0-1}\Bigl|E\bigl\{\epsilon_{1,m_1}^{(2)}\epsilon_{1+h,m_1}^{(2)}\bigr\}\Bigr| \le K_0\sum_{h=0}^{s+m_0-1}\|\epsilon_t\|_{4+2\nu}^{4}\,\{\alpha_\epsilon(h-m_1)\}^{\frac{\nu}{2+\nu}},$$

$$\sum_{h=0}^{s+m_0-1}\Bigl|E\bigl\{\epsilon_{1,m_1}^{(2)}\epsilon_{1+s,m_2}^{(2)}\bigr\}\Bigr| \le (s+m_0)K_0\|\epsilon_t\|_{4+2\nu}^{4}\,\{\alpha_\epsilon(s-m_1)\}^{\frac{\nu}{2+\nu}}.$$

The right-hand sides of the first two inequalities are clearly bounded by constants independent of $s$. The same is true for the right-hand side of the third inequality because it can be shown that $\sup_{h\ge1}h\{\alpha_\epsilon(h)\}^{\frac{\nu}{2+\nu}} < \infty$. Finally, the expectations in (A.3) are bounded, in absolute value, by $E\epsilon_t^4$. Hence the lemma is proved. □

Lemma A.4 Under Assumptions 1 and 2, $\sqrt{r}\,\|\hat\Sigma_{\Upsilon_r,\Upsilon_r}-\Sigma_{\Upsilon_r,\Upsilon_r}\|$, $\sqrt{r}\,\|\hat\Sigma_{\Upsilon,\Upsilon}-\Sigma_{\Upsilon,\Upsilon}\|$, and $\sqrt{r}\,\|\hat\Sigma_{\Upsilon,\Upsilon_r}-\Sigma_{\Upsilon,\Upsilon_r}\|$ tend to zero in probability as $n\to\infty$ when $r = o(n^{1/3})$.

Proof. For $1\le m_1,m_2\le m$ and $1\le r_1,r_2\le r$, the element in the $\{(r_1-1)m+m_1\}$-th row and $\{(r_2-1)m+m_2\}$-th column of $\bar\Sigma_{\Upsilon_r,\Upsilon_r}$ is of the form $\frac{1}{n-r}\sum_{t=r_0+1}^{n}\epsilon_t^{(4)}$, where $\epsilon_t^{(4)} = \epsilon_t^{(4)}(m_1,m_2,r_1,r_2) = \epsilon_{t-r_1}\epsilon_{t-r_1+m_1}\epsilon_{t-r_2}\epsilon_{t-r_2+m_2}$ and $r_0 = r_1\wedge r_2$. By stationarity of $(\epsilon_t^{(4)})_t$, we have

$$E\left\{\frac{1}{n-r}\sum_{t=r_0+1}^{n}\epsilon_t^{(4)} - E\epsilon_t^{(4)}\right\}^2 = E\left\{\frac{1}{n-r}\sum_{t=r_0+1}^{n}\bigl(\epsilon_t^{(4)}-E\epsilon_t^{(4)}\bigr)\right\}^2 + \left(\frac{r-r_0}{n-r}\right)^2\bigl(E\epsilon_t^{(4)}\bigr)^2$$
$$= \frac{1}{(n-r)^2}\sum_{h=-(n-r_0-1)}^{n-r_0-1}(n-r_0-|h|)\operatorname{Cov}\bigl(\epsilon_t^{(4)},\epsilon_{t-h}^{(4)}\bigr) + \left(\frac{r-r_0}{n-r}\right)^2\bigl(E\epsilon_t^{(4)}\bigr)^2$$
$$\le \frac{n}{(n-r)^2}\sum_{h=-\infty}^{\infty}\Bigl|\operatorname{Cov}\bigl(\epsilon_t^{(4)},\epsilon_{t-h}^{(4)}\bigr)\Bigr| + \left(\frac{r-r_0}{n-r}\right)^2\bigl(E\epsilon_t^{(4)}\bigr)^2 \le K\left(\frac{n}{(n-r)^2}+\frac{r^2}{(n-r)^2}\right), \tag{A.4}$$

for some constant $K$ independent of $r_1,r_2,m_1,m_2$ and of $r,n$. The last inequality holds because, by Lemma A.3, $\sum_{h=-\infty}^{\infty}|\operatorname{Cov}(\epsilon_t^{(4)},\epsilon_{t-h}^{(4)})|$ is bounded uniformly in $r_1,r_2$. In view of (A.2) and (A.4), we have

$$E\bigl\{r\|\bar\Sigma_{\Upsilon,\Upsilon}-\Sigma_{\Upsilon,\Upsilon}\|^2\bigr\} \le E\bigl\{r\|\bar\Sigma_{\Upsilon,\Upsilon_r}-\Sigma_{\Upsilon,\Upsilon_r}\|^2\bigr\} \le E\bigl\{r\|\bar\Sigma_{\Upsilon_r,\Upsilon_r}-\Sigma_{\Upsilon_r,\Upsilon_r}\|^2\bigr\} \le Km^2r^3\left(\frac{n}{(n-r)^2}+\frac{r^2}{(n-r)^2}\right) = o(1)$$


as $n\to\infty$ when $r = r(n) = o(n^{1/3})$. Hence, when $r = o(n^{1/3})$,

$$\sqrt{r}\,\|\bar\Sigma_{\Upsilon_r,\Upsilon_r}-\Sigma_{\Upsilon_r,\Upsilon_r}\| = o_P(1), \quad \sqrt{r}\,\|\bar\Sigma_{\Upsilon,\Upsilon}-\Sigma_{\Upsilon,\Upsilon}\| = o_P(1), \quad \sqrt{r}\,\|\bar\Sigma_{\Upsilon,\Upsilon_r}-\Sigma_{\Upsilon,\Upsilon_r}\| = o_P(1). \tag{A.5}$$

Let us introduce additional notation. Recall that $\hat\epsilon_t = e_t(\hat\theta)$. We will first show that replacing $\hat\epsilon_t$ by $\tilde\epsilon_t = \epsilon_t(\hat\theta)$ does not modify the asymptotic behaviour of the estimators. Then we will show that $\tilde\epsilon_t$ can be replaced by $\epsilon_t = \epsilon_t(\theta_0)$. Let $\tilde\Sigma_{\Upsilon_r,\Upsilon_r}$ and $\tilde\Sigma_{\Upsilon,\Upsilon_r}$ be the matrices obtained by replacing $\hat\Upsilon_t$ by $\tilde\Upsilon_t = \tilde\epsilon_t(\tilde\epsilon_{t+1},\ldots,\tilde\epsilon_{t+m})'$ in $\hat\Sigma_{\Upsilon_r,\Upsilon_r}$ and $\hat\Sigma_{\Upsilon,\Upsilon_r}$. Similarly, we define $\hat\epsilon_t^{(4)}$ and $\tilde\epsilon_t^{(4)}$, obtained by replacing the variables $\epsilon$ by $e(\hat\theta)$ and $\epsilon(\hat\theta)$, respectively, in $\epsilon_t^{(4)}$. It can easily be shown that, almost surely, there exist constants $K>0$ and $\rho\in(0,1)$ such that $\sup_{\theta\in\Theta^*}|\epsilon_t(\theta)-e_t(\theta)|\le K\rho^t$. Hence, we have for $t\ge r_0$,

$$\bigl|\tilde\epsilon_t^{(4)}-\hat\epsilon_t^{(4)}\bigr| \le K\rho^{t-r_1}\,|\tilde\epsilon_{t-r_1+m_1}\tilde\epsilon_{t-r_2}\tilde\epsilon_{t-r_2+m_2}| + K\rho^{t-r_1+m_1}\,|\hat\epsilon_{t-r_1}\tilde\epsilon_{t-r_2}\tilde\epsilon_{t-r_2+m_2}|$$
$$+\ K\rho^{t-r_2}\,|\hat\epsilon_{t-r_1}\hat\epsilon_{t-r_1+m_1}\tilde\epsilon_{t-r_2+m_2}| + K\rho^{t-r_2+m_2}\,|\hat\epsilon_{t-r_1}\hat\epsilon_{t-r_1+m_1}\hat\epsilon_{t-r_2}|$$
$$\le K\rho^{t-r_0}\bigl\{|\hat\epsilon_{t-r_1+m_1}|+K\rho^{t-r_0}\bigr\}\bigl\{|\hat\epsilon_{t-r_2}|+K\rho^{t-r_0}\bigr\}\bigl\{|\hat\epsilon_{t-r_2+m_2}|+K\rho^{t-r_0}\bigr\}$$
$$+\ K\rho^{t-r_0}\,|\hat\epsilon_{t-r_1}|\bigl\{|\hat\epsilon_{t-r_2}|+K\rho^{t-r_0}\bigr\}\bigl\{|\hat\epsilon_{t-r_2+m_2}|+K\rho^{t-r_0}\bigr\}$$
$$+\ K\rho^{t-r_0}\,|\hat\epsilon_{t-r_1}\hat\epsilon_{t-r_1+m_1}|\bigl\{|\hat\epsilon_{t-r_2+m_2}|+K\rho^{t-r_0}\bigr\} + K\rho^{t-r_0}\,|\hat\epsilon_{t-r_1}\hat\epsilon_{t-r_1+m_1}\hat\epsilon_{t-r_2}|$$
$$\le K\rho^{t-r_0}\left\{K^3\rho^{3(t-r_0)} + \sum_{i=1}^{3}K^{3-i}\rho^{(3-i)(t-r_0)}\sum_{t_1,\ldots,t_i\in T_i}\prod_{k=1}^{i}|\hat\epsilon_{t_k}|\right\},$$

where $T_i = T_i(t,r_1,r_2,m_1,m_2)$ denotes a set of indices $t_1,\ldots,t_i$ such that

$$t_k\in\{t-r_1,\ t-r_1+m_1,\ t-r_2,\ t-r_2+m_2\} \quad\text{for } 1\le k\le i.$$

By the Hölder and Lyapunov inequalities,

$$E\prod_{k=1}^{i}|\hat\epsilon_{t_k}|^2 \le E\sup_{\theta\in\Theta^*}|\epsilon_t(\theta)|^6 < \infty.$$

Consequently, for some finite constant $K^*$ independent of $t,r_1,r_2,m_1$ and $m_2$, we have

$$\bigl\|\tilde\epsilon_t^{(4)}-\hat\epsilon_t^{(4)}\bigr\|_2 \le K^*\rho^{t-r_0}$$

and, when $r = o(n)$,

$$\left\|\frac{1}{n-r}\sum_{t=r_0+1}^{n-m}\bigl(\tilde\epsilon_t^{(4)}-\hat\epsilon_t^{(4)}\bigr)\right\|_2 \le \frac{1}{n-r}K^*\sum_{k=1}^{\infty}\rho^k = O\bigl(n^{-1}\bigr). \tag{A.6}$$

From (A.6) we deduce that, when $r = o(n^{2/3})$,

$$\lim_{n\to\infty}\sqrt{r}\,\|\hat\Sigma_{\Upsilon_r,\Upsilon_r}-\tilde\Sigma_{\Upsilon_r,\Upsilon_r}\| = \lim_{n\to\infty}\sqrt{r}\,\|\hat\Sigma_{\Upsilon,\Upsilon}-\tilde\Sigma_{\Upsilon,\Upsilon}\| = \lim_{n\to\infty}\sqrt{r}\,\|\hat\Sigma_{\Upsilon,\Upsilon_r}-\tilde\Sigma_{\Upsilon,\Upsilon_r}\| = 0. \tag{A.7}$$

A Taylor expansion about $\theta_0$ yields

$$\bigl|\tilde\epsilon_t-\epsilon_t\bigr| \le \epsilon_t^*\bigl\|\hat\theta-\theta_0\bigr\|, \qquad \epsilon_t^* = \Bigl\|\frac{\partial}{\partial\theta'}\epsilon_t(\theta^*)\Bigr\|,$$

where $\theta^* = \theta^*(t,n)$ is between $\hat\theta$ and $\theta_0$. Thus, for $t\ge r_0$,

$$\bigl|\tilde\epsilon_t^{(4)}-\epsilon_t^{(4)}\bigr| \le \bigl\|\hat\theta-\theta_0\bigr\|\Bigl\{\epsilon_{t-r_1}^*\,|\tilde\epsilon_{t-r_1+m_1}\tilde\epsilon_{t-r_2}\tilde\epsilon_{t-r_2+m_2}| + \epsilon_{t-r_1+m_1}^*\,|\epsilon_{t-r_1}\tilde\epsilon_{t-r_2}\tilde\epsilon_{t-r_2+m_2}|$$
$$+\ \epsilon_{t-r_2}^*\,|\epsilon_{t-r_1}\epsilon_{t-r_1+m_1}\tilde\epsilon_{t-r_2+m_2}| + \epsilon_{t-r_2+m_2}^*\,|\epsilon_{t-r_1}\epsilon_{t-r_1+m_1}\epsilon_{t-r_2}|\Bigr\}.$$

Note that, in the previous inequality, the $L^2$-norm of the terms within braces is bounded, uniformly in $t,n,r_1,r_2,m_1$ and $m_2$, because

$$E|\epsilon_t^*|^8 \le E\sup_{\theta\in\Theta^*}\left\|\frac{\partial\epsilon_t}{\partial\theta'}(\theta)\right\|^8 < \infty, \qquad E|\tilde\epsilon_t|^8 < \infty, \qquad E|\epsilon_t|^8 < \infty.$$

Using the Jensen inequality, it follows that

$$\left\{\frac{1}{n-r}\sum_{t=r_0+1}^{n}\bigl(\tilde\epsilon_t^{(4)}-\epsilon_t^{(4)}\bigr)\right\}^2 \le \bigl\|\hat\theta-\theta_0\bigr\|^2D_{n,r_1,m_1,r_2,m_2},$$

where $E|D_{n,r_1,m_1,r_2,m_2}| \le K^*$ for some constant $K^*$ independent of $n,r_1,r_2,m_1$ and $m_2$. Thus $r\|\tilde\Sigma_{\Upsilon_r,\Upsilon_r}-\bar\Sigma_{\Upsilon_r,\Upsilon_r}\|^2$, $r\|\tilde\Sigma_{\Upsilon,\Upsilon}-\bar\Sigma_{\Upsilon,\Upsilon}\|^2$, and $r\|\tilde\Sigma_{\Upsilon,\Upsilon_r}-\bar\Sigma_{\Upsilon,\Upsilon_r}\|^2$ are bounded by $r^3\|\hat\theta-\theta_0\|^2O_P(1)$, $r\|\hat\theta-\theta_0\|^2O_P(1)$, and $r^2\|\hat\theta-\theta_0\|^2O_P(1)$, respectively. Since $\|\hat\theta-\theta_0\| = O_P(n^{-1/2})$, we obtain for $r = o(n^{1/3})$

$$\sqrt{r}\,\|\tilde\Sigma_{\Upsilon_r,\Upsilon_r}-\bar\Sigma_{\Upsilon_r,\Upsilon_r}\| = o_P(1), \quad \sqrt{r}\,\|\tilde\Sigma_{\Upsilon,\Upsilon}-\bar\Sigma_{\Upsilon,\Upsilon}\| = o_P(1), \quad \sqrt{r}\,\|\tilde\Sigma_{\Upsilon,\Upsilon_r}-\bar\Sigma_{\Upsilon,\Upsilon_r}\| = o_P(1). \tag{A.8}$$

The proof of the lemma follows from (A.5), (A.7) and (A.8). □

Write $A_r^* = (A_1\cdots A_r)$, where the $A_i$'s are defined by (5.3).

Lemma A.5 Under Assumption 2, $\sqrt{r}\,\|A_r^*-A_r\| \to 0$ as $r\to\infty$.

Proof. Recall that, by (5.3) and (A.1),

$$\Upsilon_t = A_r\Upsilon_{r,t} + u_{r,t} = A_r^*\Upsilon_{r,t} + \sum_{i=r+1}^{\infty}A_i\Upsilon_{t-i} + u_t := A_r^*\Upsilon_{r,t} + u_{r,t}^*.$$

Hence, using the orthogonality conditions in (5.3) and (A.1),

$$A_r^* - A_r = -\Sigma_{u_r^*,\Upsilon_r}\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}, \tag{A.9}$$

where $\Sigma_{u_r^*,\Upsilon_r} = Eu_{r,t}^*\Upsilon_{r,t}'$. By the Cauchy-Schwarz inequality and (A.2),

$$\bigl\|\operatorname{Cov}\bigl(\Upsilon_{t-r-h},\Upsilon_{r,t}\bigr)\bigr\| \le r^{1/2}m\|\epsilon_t\|_4^4.$$

Thus,

$$\|\Sigma_{u_r^*,\Upsilon_r}\| = \Bigl\|\sum_{i=r+1}^{\infty}A_iE\Upsilon_{t-i}\Upsilon_{r,t}'\Bigr\| \le \sum_{h=1}^{\infty}\|A_{r+h}\|\,\bigl\|\operatorname{Cov}\bigl(\Upsilon_{t-r-h},\Upsilon_{r,t}\bigr)\bigr\| = O(1)\,r^{1/2}\sum_{h=1}^{\infty}\|A_{r+h}\|. \tag{A.10}$$

Note that the assumption $\|A_i\| = o(i^{-2})$ entails $r\sum_{h=1}^{\infty}\|A_{r+h}\| = o(1)$ as $r\to\infty$. The lemma therefore follows from (A.9), (A.10) and Lemma A.2. □

The following lemma is similar to Lemma 3 in Berk (1974).

Lemma A.6 Under Assumptions 1 and 2,

$$\sqrt{r}\,\|\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}-\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\| = o_P(1)$$

as $n\to\infty$ when $r = o(n^{1/3})$ and $r\to\infty$.

Proof. Writing $\hat\Sigma = \hat\Sigma_{\Upsilon_r,\Upsilon_r}$ and $\Sigma = \Sigma_{\Upsilon_r,\Upsilon_r}$ for brevity, we have

$$\|\hat\Sigma^{-1}-\Sigma^{-1}\| = \bigl\|\bigl\{\hat\Sigma^{-1}-\Sigma^{-1}+\Sigma^{-1}\bigr\}\bigl\{\Sigma-\hat\Sigma\bigr\}\Sigma^{-1}\bigr\| \le \|\Sigma^{-1}\|\sum_{i=1}^{\infty}\|\hat\Sigma-\Sigma\|^i\,\|\Sigma^{-1}\|^i.$$

Thus, for every $\varepsilon>0$,

$$P\Bigl(\sqrt{r}\,\|\hat\Sigma^{-1}-\Sigma^{-1}\| > \varepsilon\Bigr) \le P\left(\frac{\sqrt{r}\,\|\Sigma^{-1}\|^2\,\|\hat\Sigma-\Sigma\|}{1-\|\hat\Sigma-\Sigma\|\,\|\Sigma^{-1}\|} > \varepsilon \ \text{ and }\ \|\hat\Sigma-\Sigma\|\,\|\Sigma^{-1}\| < 1\right) + P\Bigl(\sqrt{r}\,\|\hat\Sigma-\Sigma\|\,\|\Sigma^{-1}\| \ge 1\Bigr)$$
$$\le P\left(\sqrt{r}\,\|\hat\Sigma-\Sigma\| > \frac{\varepsilon}{\|\Sigma^{-1}\|^2+\varepsilon r^{-1/2}\|\Sigma^{-1}\|}\right) + P\Bigl(\sqrt{r}\,\|\hat\Sigma-\Sigma\| \ge \|\Sigma^{-1}\|^{-1}\Bigr) = o(1)$$

by Lemmas A.4 and A.2. This establishes Lemma A.6. □

Lemma A.7 Under Assumptions 1 and 2,

$$\sqrt{r}\,\|\hat A_r-A_r\| = o_P(1)$$

as $r\to\infty$ and $r = o(n^{1/3})$.

Proof. By the triangle inequality and Lemmas A.2 and A.6, we have

$$\|\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\| \le \|\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}-\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\| + \|\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\| = O_P(1) \tag{A.11}$$

when $r = o(n^{1/3})$. Note that the orthogonality conditions in (A.1) entail that $A_r = \Sigma_{\Upsilon,\Upsilon_r}\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}$. This, Lemmas A.2, A.4, A.6, and (A.11) give

$$\sqrt{r}\,\|\hat A_r-A_r\| = \sqrt{r}\,\bigl\|\hat\Sigma_{\Upsilon,\Upsilon_r}\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}-\Sigma_{\Upsilon,\Upsilon_r}\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\bigr\| = \sqrt{r}\,\bigl\|\bigl(\hat\Sigma_{\Upsilon,\Upsilon_r}-\Sigma_{\Upsilon,\Upsilon_r}\bigr)\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1} + \Sigma_{\Upsilon,\Upsilon_r}\bigl(\hat\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}-\Sigma_{\Upsilon_r,\Upsilon_r}^{-1}\bigr)\bigr\| = o_P(1).\ \square$$

Proof of Theorem 5.2. It suffices to show that $\hat A_r(1)\to A(1)$ and $\hat\Sigma_{u_r,u_r}\to\Sigma_{u,u}$ in probability. Let $E_r$ denote the $rm\times m$ matrix $E_r = (I_m,\ldots,I_m)'$. Using (A.2) and Lemmas A.5 and A.7, we obtain

$$\|\hat A_r(1)-A(1)\| \le \Bigl\|\sum_{i=1}^{r}\hat A_{r,i}-A_{r,i}\Bigr\| + \Bigl\|\sum_{i=1}^{r}A_{r,i}-A_i\Bigr\| + \Bigl\|\sum_{i=r+1}^{\infty}A_i\Bigr\|$$
$$= \bigl\|\bigl(\hat A_r-A_r\bigr)E_r\bigr\| + \bigl\|\bigl(A_r^*-A_r\bigr)E_r\bigr\| + \Bigl\|\sum_{i=r+1}^{\infty}A_i\Bigr\| \le \sqrt{m}\sqrt{r}\,\bigl\{\|\hat A_r-A_r\| + \|A_r^*-A_r\|\bigr\} + \Bigl\|\sum_{i=r+1}^{\infty}A_i\Bigr\| = o_P(1).$$

Now note that

$$\hat\Sigma_{u_r,u_r} = \hat\Sigma_{\Upsilon,\Upsilon} - \hat A_r\hat\Sigma_{\Upsilon,\Upsilon_r}'$$

and, by (5.3),

$$\Sigma_{u,u} = Eu_tu_t' = Eu_t\Upsilon_t' = E\Bigl\{\Bigl(\Upsilon_t-\sum_{i=1}^{\infty}A_i\Upsilon_{t-i}\Bigr)\Upsilon_t'\Bigr\} = \Sigma_{\Upsilon,\Upsilon} - \sum_{i=1}^{\infty}A_iE\Upsilon_{t-i}\Upsilon_t' = \Sigma_{\Upsilon,\Upsilon} - A_r^*\Sigma_{\Upsilon,\Upsilon_r}' - \sum_{i=r+1}^{\infty}A_iE\Upsilon_{t-i}\Upsilon_t'.$$


Thus,

$$\|\hat\Sigma_{u_r,u_r}-\Sigma_{u,u}\| = \Bigl\|\hat\Sigma_{\Upsilon,\Upsilon}-\Sigma_{\Upsilon,\Upsilon} - \bigl(\hat A_r-A_r^*\bigr)\hat\Sigma_{\Upsilon,\Upsilon_r}' - A_r^*\bigl(\hat\Sigma_{\Upsilon,\Upsilon_r}'-\Sigma_{\Upsilon,\Upsilon_r}'\bigr) + \sum_{i=r+1}^{\infty}A_iE\Upsilon_{t-i}\Upsilon_t'\Bigr\|$$
$$\le \|\hat\Sigma_{\Upsilon,\Upsilon}-\Sigma_{\Upsilon,\Upsilon}\| + \bigl\|\bigl(\hat A_r-A_r^*\bigr)\bigl(\hat\Sigma_{\Upsilon,\Upsilon_r}'-\Sigma_{\Upsilon,\Upsilon_r}'\bigr)\bigr\| + \bigl\|\bigl(\hat A_r-A_r^*\bigr)\Sigma_{\Upsilon,\Upsilon_r}'\bigr\| + \bigl\|A_r^*\bigl(\hat\Sigma_{\Upsilon,\Upsilon_r}'-\Sigma_{\Upsilon,\Upsilon_r}'\bigr)\bigr\| + \Bigl\|\sum_{i=r+1}^{\infty}A_iE\Upsilon_{t-i}\Upsilon_t'\Bigr\|. \tag{A.12}$$

In the right-hand side of this inequality, the first norm is $o_P(1)$ by Lemma A.4. By Lemmas A.5 and A.7, we have $\|\hat A_r-A_r^*\| = o_P(r^{-1/2}) = o_P(1)$, and by Lemma A.4, $\|\hat\Sigma_{\Upsilon,\Upsilon_r}'-\Sigma_{\Upsilon,\Upsilon_r}'\| = o_P(r^{-1/2}) = o_P(1)$. Therefore the second norm in the right-hand side of (A.12) tends to zero in probability. The third norm tends to zero in probability because $\|\hat A_r-A_r^*\| = o_P(1)$ and, by Lemma A.2, $\|\Sigma_{\Upsilon,\Upsilon_r}'\| = O(1)$. The fourth norm tends to zero in probability because, in view of Lemma A.4, $\|\hat\Sigma_{\Upsilon,\Upsilon_r}'-\Sigma_{\Upsilon,\Upsilon_r}'\| = o_P(1)$, and, in view of (A.2), $\|A_r^*\|^2 \le \sum_{i=1}^{\infty}\operatorname{Tr}(A_iA_i') < \infty$. Clearly, the last norm tends to zero, which completes the proof. □

References

Amemiya, T. and Wu, R. Y. (1972) The Effect of Aggregation on Prediction in the Autoregressive Model. J. Amer. Statist. Assoc., 67, 628-632.

Andrews, D. W. K. (1991) Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation. Econometrica, 59, 817-858.

Andrews, D. W. K. and Monahan, J. C. (1992) An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator. Econometrica, 60, 953-966.

Berk, K. N. (1974) Consistent Autoregressive Spectral Estimates. Ann. Statist., 2, 489-502.

Bollerslev, T. (1986) Generalized Autoregressive Conditional Heteroskedasticity. J. Econometrics, 31, 307-327.

Box, G. E. P. and Pierce, D. A. (1970) Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. J. Amer. Statist. Assoc., 65, 1509-1526.

Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods, 2nd edition. Springer-Verlag, New York.

Broze, L., Francq, C. and Zakoïan, J.-M. (2002) Efficient Use of Higher-Lag Autocorrelations for Estimating Autoregressive Processes. J. Time Series Analysis, 23, 287-312.

Carrasco, M. and Chen, X. (2002) Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models. Econometric Theory, 18, 17-39.

Chen, W. W. and Deo, R. S. (2003) A Generalized Portmanteau Goodness-of-fit Test for Time Series Models. Econometric Theory, forthcoming.

Chen, W. W. and Deo, R. S. (2003) Power Transformations to Induce Normality and Their Applications. J. Roy. Statist. Soc. B, forthcoming.

Davies, N., Triggs, C. M. and Newbold, P. (1977) Significance Levels of the Box-Pierce Portmanteau Statistic in Finite Samples. Biometrika, 64, 517-522.

Davydov, Y. A. (1968) Convergence of Distributions Generated by Stationary Stochastic Processes. Theor. Probab. Appl., 13, 691-696.

Deo, R. S. (2000) Spectral Tests of the Martingale Hypothesis under Conditional Heteroskedasticity. J. Econometrics, 99, 291-315.

Diebold, F. X. (1986) Testing for Serial Correlation in the Presence of ARCH. 1986 Proceedings of the Business and Economics Statistics Section, American Statistical Association, 323-328.

Durlauf, S. N. (1991) Spectral Based Testing of the Martingale Hypothesis. J. Econometrics, 50, 355-376.

Engle, R. F. (1982) Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of the United Kingdom Inflation. Econometrica, 50, 987-1007.

Engle, R. F. (1998) Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data. Econometrica, 66, 1127-1162.

Francq, C. and Zakoïan, J.-M. (1998) Estimating Linear Representations of Nonlinear Processes. J. Statist. Plan. Infer., 68, 145-165.

Francq, C. and Zakoïan, J.-M. (2001) Stationarity of Multivariate Markov-Switching ARMA Models. J. Econometrics, 102, 339-364.

Francq, C. and Zakoïan, J.-M. (2003) The L2-Structures of Standard and Switching-Regime GARCH Models. Unpublished document, CREST.

Hamilton, J. D. (1989) A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle. Econometrica, 57, 357-384.

Hamilton, J. D. (1994) Time Series Analysis. Princeton University Press.

Hamilton, J. D. and Susmel, R. (1994) Autoregressive Conditional Heteroskedasticity and Changes in Regime. J. Econometrics, 64, 307-333.

Harvey, A. C. and Pierse, R. G. (1984) Estimating Missing Observations in Economic Time Series. J. Amer. Statist. Assoc., 79, 125-131.

Hong, Y. (1996) Consistent Testing for Serial Correlation of Unknown Form. Econometrica, 64, 837-864.

Hong, Y. and Lee, Y. J. (2003) Consistent Testing for Serial Correlation of Unknown Form under General Conditional Heteroskedasticity. Preprint, Cornell University.

Imhof, J. P. (1961) Computing the Distribution of Quadratic Forms in Normal Variables. Biometrika, 48, 419-426.

Ling, S. and McAleer, M. (2002) Necessary and Sufficient Moment Conditions for the GARCH(r, s) and Asymmetric Power GARCH(r, s) Models. Econometric Theory, 18, 722-729.

Ljung, G. M. and Box, G. E. P. (1978) On a Measure of Lack of Fit in Time Series Models. Biometrika, 65, 297-303.

Lobato, I. N. (2001) Testing That a Dependent Process Is Uncorrelated. J. Amer. Statist. Assoc., 96, 1066-1076.

Lobato, I. N., Nankervis, J. C. and Savin, N. E. (2001) Testing for Autocorrelation Using a Modified Box-Pierce Q Test. Inter. Econ. Review, 42, 187-205.

Lobato, I. N., Nankervis, J. C. and Savin, N. E. (2002) Testing for Zero Autocorrelation in the Presence of Statistical Dependence. Econometric Theory, 18, 730-743.

Lütkepohl, H. (1991) Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.

McLeod, A. I. (1978) On the Distribution of Residual Autocorrelations in Box-Jenkins Models. J. Roy. Statist. Soc. B, 40, 296-302.

Newey, W. K. and West, K. D. (1987) A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.

Nsiri, S. and Roy, R. (1993) On the Invertibility of Multivariate Linear Processes. J. Time Ser. Anal., 14, 305-316.

Palm, F. C. and Nijman, T. E. (1984) Missing Observations in the Dynamic Regression Model. Econometrica, 52, 1415-1435.

Pham, D. T. (1986) The Mixing Property of Bilinear and Generalized Random Coefficient Autoregressive Models. Stochastic Process. Appl., 23, 291-300.

Romano, J. L. and Thombs, L. A. (1996) Inference for Autocorrelations under Weak Assumptions. J. Amer. Statist. Assoc., 91, 590-600.

Ruiz, E. (1994) Quasi-Maximum Likelihood Estimation of Stochastic Volatility Models. J. Econometrics, 63, 289-306.

Whittle, P. (1963) On the Fitting of Multivariate Autoregressions and the Approximate Canonical Factorization of a Spectral Density Matrix. Biometrika, 50, 129-134.


Figure 1: Asymptotic significance limits for the residual autocorrelations in the AR(1)-ARCH(1) model $X_t = aX_{t-1}+\epsilon_t$, with $\epsilon_t = \sqrt{\omega+\alpha_1\epsilon_{t-1}^2}\,\eta_t$, $(\eta_t)$ iid $\mathcal{N}(0,1)$. Each panel plots $\pm\lim_{n\to\infty}\sqrt{n\operatorname{Var}\hat\rho(h)}$ against $h = 1,\ldots,20$. Dotted lines correspond to the limits obtained from Theorem 3.2; full lines to the corresponding limits for a strong AR(1) model. Left panel: $a = 0.9$, $\alpha_1 = 0.55$; right panel: $a = 0.2$, $\alpha_1 = 0.55$. (Figure graphics not reproduced.)

Figure 2: Asymptotic significance limits for the residual autocorrelations in the Markov-switching AR(1) model (4.7). Each panel plots $\pm\lim_{n\to\infty}\sqrt{n\operatorname{Var}\hat\rho(h)}$ against $h = 1,\ldots,20$. Dotted lines correspond to the limits obtained from Theorem 3.2; full lines to the corresponding limits for a strong AR(1) model. Left panel: $a = 0.9$, $c = 1$, $p = 0.01$; right panel: $a = 0.99$, $c = 1$, $p = 0.95$. (Figure graphics not reproduced.)


Table 3: Relative rejection frequencies (in %) at the 5% nominal level of the standard LB, the modified LB and the HL tests applied to the AR(1) model with ARCH(1) innovations $\epsilon_t = \sqrt{1+\alpha_1\epsilon_{t-1}^2}\,\eta_t$, $(\eta_t)$ iid $\mathcal{N}(0,1)$. The sample size is n = 1,000 and the number of replications is N = 1,000.

Standard LB test
                               (a, α1)
  m   (0.0,0.0)  (0.0,0.2)  (0.5,0.2)  (0.9,0.2)  (0.9,0.4)
  2      5.30       7.20       7.70      15.40      26.30
  3      5.80       5.60       6.40      11.60      21.40
  4      5.60       4.20       5.90       9.20      20.00
 12      4.60       5.90       5.10       6.40      12.20
 24      5.60       4.50       5.10       7.40      10.70
 36      5.40       5.60       3.60       4.70       8.20

HL test
                               (a, α1)
  m   (0.0,0.0)  (0.0,0.2)  (0.5,0.2)  (0.9,0.2)  (0.9,0.4)
  7      1.20       0.20       0.20       3.80       3.80
 15      2.10       1.10       1.30       4.50       3.70
 23      2.90       1.30       2.30       3.10       3.80
 31      3.10       1.50       2.60       3.80       4.20
 39      3.70       1.60       3.00       4.00       4.20

Modified LB test
                               (a, α1)
  m   (0.0,0.0)  (0.0,0.2)  (0.5,0.2)  (0.9,0.2)  (0.9,0.4)
  2      5.20       5.80       5.80       5.70       5.10
  3      5.60       4.80       5.10       5.10       4.90
  4      5.20       3.70       4.30       5.20       5.60
 12      4.30       5.00       3.80       5.00       5.30
 24      4.90       3.50       3.90       5.00       5.70
 36      4.90       4.60       2.80       4.00       4.70


Table 4: Relative rejection frequencies (in %) at the 5% nominal level of the standard LB, the modified LB and the HL tests applied to the Markov switching AR(1) model (4.7). The sample size is n = 10,000 and the number of replications is N = 1,000.

Standard LB test
                                (a, p)
  m   (0.0,0.95)  (0.6,0.95)  (0.6,0.05)  (0.9,0.5)  (0.99,0.95)
  2      10.00        9.60       25.20       16.90       15.30
  3       6.70        4.00       26.00       10.60       10.30
  4       7.30        6.50       25.50        9.70       11.00
 12       6.30        5.60       23.70        6.10        9.60
 24       6.00        4.80       19.40        6.10        5.90
 36       6.60        5.80       15.80        5.80        6.20

HL test
                                (a, p)
  m   (0.0,0.95)  (0.6,0.95)  (0.6,0.05)  (0.9,0.5)  (0.99,0.95)
 12       1.70        1.50       14.00        4.50        3.00
 25       2.60        2.60       17.50        4.20        4.60
 37       3.70        4.60       19.00        4.00        4.40
 50       3.20        3.00       19.20        4.70        6.70
 63       3.80        3.20       20.30        4.30        4.90

Modified LB test
                                (a, p)
  m   (0.0,0.95)  (0.6,0.95)  (0.6,0.05)  (0.9,0.5)  (0.99,0.95)
  2       5.50        5.10        8.40        6.00        3.60
  3       5.90        4.60        7.50        5.00        4.80
  4       5.00        5.30        9.50        4.50        3.70
 12       4.60        4.40        7.70        5.40        5.80
 24       4.70        3.80        7.30        5.70        3.60
 36       5.30        4.70        6.70        5.20        3.80


Table 5: Empirical power (in %) at the 5% nominal level of the standard LB test, the modified LB test and the HL test for goodness of fit of an AR(1) model. The DGP is an ARMA(1,1) process: $X_t = aX_{t-1}+\epsilon_t+b\epsilon_{t-1}$, $\epsilon_t = \sqrt{1+\alpha_1\epsilon_{t-1}^2}\,\eta_t$, $(\eta_t)$ iid $\mathcal{N}(0,1)$. The sample size is n = 1,000 and the number of replications is N = 1,000.

Standard LB test (adjusted for a correct Type I error)
                                  (a, b, α1)
  m   (0.0,0.2,0.0)  (0.0,0.2,0.2)  (0.5,0.2,0.2)  (0.9,0.2,0.2)  (0.9,0.2,0.4)
  2         27.00          19.10          90.70          99.70          96.70
  3         16.50          20.20          89.80          99.50          96.00
  4         19.30          17.20          89.70          99.30          96.20
 12         11.80          10.10          76.20          97.70          92.50
 24          9.90           9.60          56.80          91.30          86.10
 36          6.90          10.50          47.80          88.70          81.40

HL test
                                  (a, b, α1)
  m   (0.0,0.2,0.0)  (0.0,0.2,0.2)  (0.5,0.2,0.2)  (0.9,0.2,0.2)  (0.9,0.2,0.4)
  7         10.30           4.10          82.30          99.60          95.90
 15          9.70           8.60          83.30          99.40          95.00
 23         11.80           8.80          83.10          99.20          95.00
 31         10.30           8.30          80.70          99.10          93.90
 39         10.10           7.80          77.30          98.40          92.60

Modified LB test
                                  (a, b, α1)
  m   (0.0,0.2,0.0)  (0.0,0.2,0.2)  (0.5,0.2,0.2)  (0.9,0.2,0.2)  (0.9,0.2,0.4)
  2         27.80          23.10          91.70          99.60          95.90
  3         16.30          20.00          90.50          99.50          95.90
  4         18.00          15.70          89.50          99.20          95.00
 12         11.00           9.10          73.50          97.50          91.80
 24          8.20           8.10          55.90          91.60          85.00
 36          7.50           7.50          47.30          89.00          80.60

Table 6: Standard and modified LB white noise tests on the S&P returns.

lag                        1       2       3       4       5       6       7       8       9      12      18      24      30      36
LB statistic            8.04   16.11   22.35   24.55   24.57   24.59   25.51   26.59   26.61   35.34   38.43   43.93   60.46   78.11
p-value (standard)    0.0046  0.0003  0.0001  0.0001  0.0002  0.0004  0.0006  0.0008  0.0016  0.0004  0.0034  0.0078  0.0008  0.0001
p-value (modified)    0.2688  0.3242  0.3104  0.3139  0.4432  0.4872  0.5057  0.5351  0.5839  0.5042  0.6173  0.6862  0.5527  0.4376


Table 7: Standard and modified LB tests for goodness-of-fit of an ARMA(1,1) model for the squares of the S&P returns.

lag                        1       2       3       4       5       6       7       8       9      12      18      24      30      36
LB statistic            2.25   15.30   16.60   37.26   84.12   88.31   94.21   95.84   95.94   97.39   98.32  100.34  106.60  111.01
p-value (standard)    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
p-value (modified)    0.5827  0.5190  0.5652  0.4285  0.3796  0.3788  0.3735  0.3733  0.3755  0.3759  0.3765  0.3752  0.3702  0.3655

Figure 3: Autocorrelation $\hat\rho(h)$, $h = 1,\ldots,36$, of the ARMA(1,1) residuals for the squares of the S&P returns (two panels; figure graphics not reproduced).
