See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/223237237 Goodness-of-fit testing under long memory ARTICLE in JOURNAL OF STATISTICAL PLANNING AND INFERENCE · DECEMBER 2010 Impact Factor: 0.68 · DOI: 10.1016/j.jspi.2010.04.039 CITATIONS 8 READS 26 2 AUTHORS: Hira Lal Koul Michigan State University 125 PUBLICATIONS 2,447 CITATIONS SEE PROFILE Donatas Surgailis Vilnius University 139 PUBLICATIONS 2,327 CITATIONS SEE PROFILE Available from: Donatas Surgailis Retrieved on: 04 February 2016

Goodness-of-fit testing under long memory



Goodness-of-fit testing under long memory1

Hira L. Koul and Donatas Surgailis

Michigan State University and Vilnius Institute of Mathematics & Informatics

Abstract

This paper discusses the problem of fitting a distribution function to the marginal distribution of a long memory moving average process. Because of the uniform reduction principle, classical tests based on the empirical process are relatively easy to implement, unlike in the i.i.d. set up. More importantly, we discuss fitting the marginal distribution of the error process in location, scale and linear regression models. An interesting observation is that in the location model, or more generally in linear regression models with a non-zero intercept parameter, the null weak limit of the first order difference between the residual empirical process and the null model is degenerate at zero, and hence it cannot be used to asymptotically distinguish between two marginal distributions that differ only in their means. This finding is in sharp contrast to a recent claim of Chan and Ling (2008) that the null weak limit of such a process is a continuous Gaussian process. This note also proposes some tests based on the second order difference for the location case. Another finding is that residual empirical process tests in the scale problem are robust against not knowing the scale parameter.

1 Introduction

The problem of fitting a parametric family of distributions to a probability distribution,

known as the goodness-of-fit testing problem, is well studied in the literature when the

underlying observations are i.i.d. See, for example, Durbin (1973, 1975), Khmaladze (1979,

1981), D’Agostino and Stephens (1986), among others.

A discrete time stationary stochastic process with finite variance is said to have long memory if its autocorrelations tend to zero hyperbolically in the lag parameter, as the lag tends to infinity, but their sum diverges. The importance of these processes in econometrics, hydrology and other physical sciences is abundantly demonstrated in the works of Beran (1992, 1994), Baillie (1996), Dehling, Mikosch and Sorensen (2002), and Doukhan, Oppenheim and Taqqu (2003), and the references therein.

1 AMS Classification: Primary 62M09; secondary 62M10, 62M99. Research of the first author was supported in part by NSF DMS grant 0704130, while that of the second author was supported in part by the bilateral France-Lithuania scientific project Gilibert and the Lithuanian State Science and Studies Foundation grant T-70/09. Corresponding author: Koul. October 9, 2009. Key words and phrases: moving averages, residual empirical process, long memory.


We model long memory via moving averages. Let Z := {0, ±1, ···}. We suppose for the time being that the observable process is

  ε_j = ∑_{k=0}^∞ b_k ζ_{j−k},  j ∈ Z,   (1.1)

where ζ_s, s ∈ Z, are i.i.d. with zero mean and unit variance. The constants b_j are assumed to satisfy b_k = 0, k < 0, b_0 = 1, and

  b_j ∼ c_0 j^{−(1−d)}, as j → ∞, for some 0 < c_0 < ∞ and 0 < d < 1/2.   (1.2)

Let B(a, b) := ∫_0^1 x^{a−1}(1 − x)^{b−1} dx, a > 0, b > 0. One can directly verify that, as k → ∞,

  Cov(ε_0, ε_k) = ∑_{j=0}^∞ b_j b_{j+k} ∼ c_0² ∫_0^∞ y^{d−1}(1 + y)^{d−1} dy · k^{−(1−2d)} = c_0² B(d, 1−2d) k^{−(1−2d)},

so that the process ε_j, j ∈ Z, has long memory.
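To fix ideas, the model (1.1)-(1.2) is easy to simulate, and the covariance asymptotics above can be checked numerically. The following sketch (Python with NumPy) is purely illustrative and not part of the paper's development: the truncation lag J and the particular choice b_k = (k+1)^{d−1} (so that b_0 = 1 and c_0 = 1) are assumptions made here for the example. It compares the exact coefficient sum ∑_j b_j b_{j+k} with the limit c_0² B(d, 1−2d) k^{−(1−2d)}.

```python
import math
import numpy as np

def ma_coeffs(d, J):
    """Coefficients b_k = (k+1)^(d-1), k = 0..J-1: b_0 = 1 and b_k ~ k^(-(1-d))."""
    return (np.arange(J) + 1.0) ** (d - 1.0)

def simulate_lm(n, d, J=1000, rng=None):
    """Simulate eps_1..eps_n from (1.1), with the MA series truncated at lag J."""
    rng = np.random.default_rng(rng)
    b = ma_coeffs(d, J)
    zeta = rng.standard_normal(n + J)              # i.i.d. innovations, mean 0, var 1
    # eps_j = sum_k b_k zeta_{j-k}; np.convolve in 'valid' mode computes exactly this
    return np.convolve(zeta, b, mode="valid")[:n]

d = 0.3
b = ma_coeffs(d, 1_000_000)
k = 100
exact = float(np.dot(b[:-k], b[k:]))               # sum_j b_j b_{j+k}
# B(d, 1-2d) = Gamma(d) Gamma(1-2d) / Gamma(1-d)
limit = math.gamma(d) * math.gamma(1 - 2*d) / math.gamma(1 - d) * k ** (2*d - 1)
# exact/limit -> 1 as k grows, though slowly (the summand is singular near 0)
ratio = exact / limit

eps = simulate_lm(2000, d, rng=0)                  # one simulated long memory path
```

The slow approach of `ratio` to 1 is itself a reminder of how slowly long memory asymptotics kick in.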

Now, let ε denote a copy of ε_1, let F denote the marginal d.f. of ε, and let F_0 be a known d.f. The problem of interest here is to test the simple hypothesis

  H_0 : F = F_0,  vs.  H_1 : F ≠ F_0.

This problem is of interest in applications. For example, in value at risk analysis, cf. Tsay (2002), various probability calculations are based on the assumption that the underlying process is Gaussian. If one were to reject the null hypothesis of a Gaussian marginal distribution, then such an analysis would be suspect.

In this note we shall discuss the asymptotic behavior of some omnibus tests for H_0 based on the empirical d.f. F_n(x) := n^{−1} ∑_{i=1}^n I(ε_i ≤ x) of ε_i, 1 ≤ i ≤ n.

A bit more interesting, and at the same time surprisingly challenging, problem is to test

  H_0^loc : F(x) = F_0(x − μ), ∀ x ∈ R, for some μ ∈ R;  vs.  H_1^loc : H_0^loc is not true.

This problem is equivalent to testing for H_0 based on the observations

  Y_i = μ + ε_i, i = 1, ··· , n, for some μ ∈ R,   (1.3)

where now ε_i, i ∈ Z, are the unobservable moving average errors of (1.1). Here tests would be based on the residual empirical d.f.

  F̂_n(x) := n^{−1} ∑_{i=1}^n I(Y_i − Ȳ_n ≤ x),  Ȳ_n := n^{−1} ∑_{i=1}^n Y_i,  x ∈ R.


Sometimes we may be interested in testing the equivalence of F to F_0 up to a scale parameter, i.e., in testing

  H_0^sc : F(x) = F_0(x/σ), ∀ x ∈ R, for some σ > 0;  vs.  H_1^sc : H_0^sc is not true.

This problem is equivalent to testing for H_0 based on the observations

  Y_i = σ ε_i, i = 1, ··· , n, for some σ > 0,   (1.4)

where again ε_i, i ∈ Z, are the unobservable moving average errors of (1.1). Here tests will be based on

  F̃_n(x) := n^{−1} ∑_{i=1}^n I(Y_i/σ̂_n ≤ x),  σ̂_n² := n^{−1} ∑_{i=1}^n Y_i²,  x ∈ R.

Another interesting problem is to fit a distribution up to unknown location and scale parameters, i.e., to test the hypothesis

  H̄_0 : F(x) = F_0((x − μ)/σ), ∀ x ∈ R and for some μ ∈ R, σ > 0;  vs.  H̄_1 : H̄_0 is not true.

This is equivalent to testing for H_0 based on the observations

  Y_i = μ + σ ε_i, i = 1, ··· , n, for some μ ∈ R, σ > 0,   (1.5)

where again ε_i, i ∈ Z, are unobservable as in (1.1). Here tests will be based on

  F̄_n(x) := n^{−1} ∑_{i=1}^n I(Y_i − Ȳ_n ≤ x s_n),  s_n² := n^{−1} ∑_{i=1}^n (Y_i − Ȳ_n)²,  x ∈ R.

Next, consider the problem of fitting the marginal error d.f. in the linear regression model, where one is given an array of p × 1 design vectors x_{ni}, 1 ≤ i ≤ n, and one observes an array of random variables {Y_{ni}; 1 ≤ i ≤ n} from the model

  Y_{ni} = x′_{ni} β + ε_i, 1 ≤ i ≤ n, for some β ∈ R^p,   (1.6)

with ε_i, i ∈ Z, as in (1.1). Now F denotes the marginal d.f. of the error process. Consider the problem of testing the above H_0 vs. H_1 based on Y_{ni}, 1 ≤ i ≤ n. Let β̂_n be the least squares estimator (LSE) of β, and let F̌_n denote the empirical d.f. of the residuals ε̂_i := Y_{ni} − x′_{ni} β̂_n, 1 ≤ i ≤ n.

In the next section we discuss tests for the first problem based on F_n, while Section 3 pertains to tests for H_0^loc, H_0^sc and H̄_0 based on the residual empirical d.f.'s F̂_n, F̃_n and F̄_n, respectively. It is observed that the first order differences n^{1/2−d}(F̂_n − F_0) and n^{1/2−d}(F̄_n − F_0) cannot be used to test for H_0^loc and H̄_0, respectively, while tests based on n^{1/2−d}(F̃_n − F_0) for testing H_0^sc have the same large sample behavior as in the case of known σ.

Tests for fitting an error d.f. in the linear regression set up, based on the first order difference n^{1/2−d}(F̌_n − F_0), are discussed in Section 4. This process also converges to zero in probability under H_0 if there is a non-zero intercept parameter in (1.6), and thus cannot be used to test for H_0 asymptotically. Section 5 suggests some tests of H_0^loc based on the second order difference n^{1−2d}(F̂_n − F_0).

The recent paper of Chan and Ling (2008) discussed the asymptotics of residual empirical processes and goodness-of-fit tests for long-memory errors. Although the first order approximations in that paper are similar to ours, some of its conclusions for hypothesis testing are incorrect (see Remark 3.1 below). Moreover, Chan and Ling (2008) do not discuss tests based on second order differences.

2 Tests for simple hypothesis

Here we shall analyze the asymptotic behavior of the first order process n^{1/2−d}(F_n − F_0) when ε_i, i ∈ Z, is an observable process. The Kolmogorov test based on the supremum statistic

  K_n := sup_{x∈R} |F_n(x) − F_0(x)|

will be discussed in some detail.

To proceed, we need to assume, for some C < ∞ and δ > 0,

  |E e^{iuζ_0}| ≤ C(1 + |u|)^{−δ},  E|ζ_0|³ < ∞.   (2.1)

Let c(θ) := c_0² B(d, 1−2d)/{d(1+2d)}, θ := (c_0, d)′, and ε̄_n := n^{−1} ∑_{i=1}^n ε_i. Then, from Giraitis, Koul and Surgailis (1996) (GKS) and Koul and Surgailis (2002), we obtain that under H_0, F_0 is infinitely differentiable with smooth and bounded Lebesgue density f_0, and

  sup_{x∈R} |n^{1/2−d}(F_n(x) − F_0(x)) + f_0(x) n^{1/2−d} ε̄_n| = o_p(1), (H_0).   (2.2)

Dehling and Taqqu (1989) proved an analog of (2.2) when ε_i, i ∈ Z, is a stationary Gaussian process, and coined the phrase uniform reduction principle (URP) for this type of result. Koul and Mukherjee (1993) established the URP when ε_i, i ∈ Z, is subordinated to a Gaussian process, and GKS proved it when ε_i, i ∈ Z, is a long memory moving average process, under a higher moment assumption on ζ_0. Koul and Surgailis (2002) obtained the above expansion under the third moment assumption as in (2.1).

In the sequel, u_p(1) denotes a sequence of stochastic processes indexed by x ∈ R and tending to zero, uniformly over x ∈ R, in probability, and =⇒ (→_D) denotes the weak convergence of a sequence of stochastic processes in the Skorohod space D(R̄) with the sup-topology (of r.v.'s), where R̄ := [−∞, ∞]. For any smooth function g from R to R, g′ and g″ denote its first and second derivatives, respectively.

Now, by a result of Davydov (1970),

  c^{−1/2}(θ) n^{1/2−d} ε̄_n →_D N(0, 1).   (2.3)

Hence, from (2.2) we obtain

  n^{1/2−d}(F_n(x) − F_0(x)) =⇒ −c^{1/2}(θ) f_0(x) Z,   (2.4)

where Z is a N(0, 1) r.v. Consequently, with ‖f_0‖_∞ := sup_{x∈R} f_0(x),

  n^{1/2−d} K_n →_D c^{1/2}(θ) |Z| ‖f_0‖_∞,

and we readily obtain that under H_0,

  K_n(θ) := n^{1/2−d} K_n / (c^{1/2}(θ) ‖f_0‖_∞) →_D |Z|.

Implementation of this test requires a consistent estimator of c_0 and a log(n)-consistent estimator of d. Dalla et al. (2006) show that the semi-parametric local Whittle estimators ĉ_0 and d̂ of c_0 and d satisfy these conditions. Let θ̂ := (ĉ_0, d̂)′. The proposed test would reject H_0 whenever K_n(θ̂) is large.

Clearly, the determination of the asymptotic critical values of this test is simple compared to its analog in the i.i.d. set up. For example, the test that rejects H_0 whenever K_n(θ̂) > z_{α/2} is of asymptotic size α, 0 < α < 1, where z_α is the 100(1−α)th percentile of the standard normal distribution.
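For concreteness, the statistic K_n(θ̂) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: the plug-in values for ĉ_0 and d̂ are assumed to come from some local Whittle routine (not implemented here), and the toy data are arbitrary.

```python
import math

def c_theta(c0, d):
    """c(theta) = c0^2 B(d, 1-2d) / (d(1+2d)), with B(a,b) the beta function."""
    B = math.gamma(d) * math.gamma(1 - 2*d) / math.gamma(1 - d)
    return c0**2 * B / (d * (1 + 2*d))

def ks_sup(data, F0):
    """K_n = sup_x |F_n(x) - F0(x)|, evaluated at the jump points of F_n."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - F0(x), F0(x) - i / n) for i, x in enumerate(xs))

def kolmogorov_stat(data, F0, f0_sup, c0_hat, d_hat):
    """K_n(theta-hat) = n^(1/2-d) K_n / (c(theta-hat)^(1/2) ||f0||_inf).

    Under H0 this converges to |Z|, Z ~ N(0,1); reject H0 when it exceeds z_{alpha/2}.
    """
    n = len(data)
    return n ** (0.5 - d_hat) * ks_sup(data, F0) / (
        math.sqrt(c_theta(c0_hat, d_hat)) * f0_sup)

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard normal F0
stat = kolmogorov_stat([-1.0, 0.0, 1.0], Phi, 1 / math.sqrt(2 * math.pi),
                       c0_hat=1.0, d_hat=0.25)           # assumed plug-in estimates
```

Note how the limit |Z| makes the critical value a plain normal percentile, in contrast to the Brownian-bridge functionals needed in the i.i.d. case.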

Needless to say, similar statements apply to any other test based on a continuous functional of the first order difference n^{1/2−d}(F_n − F_0). For example, the Cramér-von Mises test that rejects H_0 whenever C_n(θ̂) > χ²_α is also of asymptotic size α, where χ²_α is the 100(1−α)th percentile of the chi-square distribution with 1 degree of freedom. Here

  C_n := ∫ [F_n(x) − F_0(x)]² dF_0(x),  C_n(θ̂) := n^{1−2d̂} C_n / (c(θ̂) ∫ f_0³(x) dx).

These findings are thus in complete contrast to the results available in the i.i.d. set up, where one must use the full knowledge of the distribution of the Brownian bridge on [0, 1] to implement tests based on the first order difference n^{1/2}(F_n − F_0) in large samples, cf. Durbin (1973, 1975).

We shall now briefly analyze the asymptotic power of the K_n test. Let F ≠ F_0 be another marginal d.f. of ε such that (2.1) is satisfied and the local Whittle estimators ĉ_0, d̂ are consistent and log(n)-consistent for c_0, d. Then, arguing as above, F has a smooth Lebesgue density f, and

  K_n(θ̂) = sup_{x∈R} |−c^{1/2}(θ) f(x) Z + n^{1/2−d}(F(x) − F_0(x))| / (c^{1/2}(θ) ‖f_0‖_∞) + o_p(1).

From this we readily see that the above Kolmogorov test is consistent at this F. It has trivial asymptotic power against sequences of alternatives for which sup_x n^{1/2−d}|F(x) − F_0(x)| → 0. Thus this test cannot distinguish the n^{1/2}-neighborhoods of F_0, i.e., it has asymptotic power α against those {F} in the class of d.f.'s satisfying the above assumed conditions for which n^{1/2} sup_x |F(x) − F_0(x)| = O(1). Similar conclusions hold for other tests based on n^{1/2−d}(F_n − F_0).

The asymptotic power of the K_n(θ̂) test against the sequence of local alternatives F = F_0 + n^{−(1/2−d)}∆, where ∆ is absolutely continuous with bounded a.e. derivative ∆′, equals

  P( sup_x |−f_0(x) c^{1/2}(θ) Z + ∆(x)| > z_{α/2} c^{1/2}(θ) ‖f_0‖_∞ ).

3 Testing for H_0^loc, H_0^sc, and H̄_0

First, consider testing for H_0^loc. Let now ε̄_n := ∑_{i=1}^n (Y_i − μ)/n. Recall that here F̂_n(x) = F_n(x + ε̄_n).

Proposition 3.1 (URP for the residual empirical process F̂_n.) Assume (1.2) and (2.1) hold. Then,

  sup_{x∈R} n^{1/2−d} |F̂_n(x) − F_0(x)| = o_p(1).   (3.1)

Proof. From (2.2), (2.4) and the mean value theorem one obtains

  n^{1/2−d}(F̂_n(x) − F_0(x)) = n^{1/2−d}[F_n(x + ε̄_n) − F_0(x + ε̄_n) + ε̄_n f_0(x + ε̄_n) + ∫_x^{x+ε̄_n} (f_0(u) − f_0(x + ε̄_n)) du]
    = u_p(1) + O_p(‖f_0′‖_∞ n^{1/2−d} |ε̄_n|²) = u_p(1).

This completes the proof. □

According to Proposition 3.1, the first order difference D_n(x) := n^{1/2−d}(F̂_n(x) − F_0(x)) cannot distinguish between two marginal distributions of a long memory moving average process that differ only in their means. This finding is in sharp contrast to what is available in the case of i.i.d. errors, where the first order difference n^{1/2}[F̂_n(x) − F_0(x)] converges weakly to a time transformed Brownian bridge with a drift under this hypothesis, cf. Durbin (1973).


Remark 3.1 Proposition 3.1 contradicts Corollary 3.1 of Chan and Ling (2008), which claims that D_n(x), x ∈ R, converges weakly to a continuous Gaussian process under H_0^loc. This corollary is incorrectly derived from the first order expansion in Theorem 2.1 of Chan and Ling (2008), which is correct and agrees with the expansion in (4.5) below.
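The degeneracy behind Proposition 3.1 has an exact finite-sample counterpart: the residual empirical d.f. of the Y_i − Ȳ_n is invariant to the true location μ, so it carries no first order information about the mean. A small Python sketch, with arbitrary illustrative data values:

```python
def residual_edf(y, x):
    """F-hat_n(x) = n^-1 sum_i I(Y_i - Ybar_n <= x)."""
    n = len(y)
    ybar = sum(y) / n
    return sum(1 for yi in y if yi - ybar <= x) / n

eps = [0.3, -1.2, 0.7, 0.1, -0.5]      # stand-in errors (illustrative only)
y0 = eps                                # observations with mu = 0
y5 = [e + 5.0 for e in eps]             # mu = 5: the residuals Y_i - Ybar_n are the same
same = all(residual_edf(y0, x) == residual_edf(y5, x)
           for x in [-1.0, 0.0, 0.5, 2.0])
```

Here `same` is True for any μ: shifting every observation shifts Ȳ_n by the same amount, so F̂_n is unchanged, which is why only higher order differences can detect a mean shift.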

Next, consider the scale problem. Here ε_i = Y_i/σ, and

  F̃_n(x) = n^{−1} ∑_{i=1}^n I(Y_i/σ̂_n ≤ x) = F_n(x σ̂_n/σ).

Proposition 3.2 (URP for the residual empirical process F̃_n.) Assume (1.2) and (2.1) hold and Eζ_0⁴ < ∞. Then,

  sup_{x∈R} |n^{1/2−d}[F̃_n(x) − F_0(x)] + n^{1/2−d} ε̄_n f_0(x)| = o_p(1).

Proof. Without loss of generality, assume Eε_t² = ∑_{j=0}^∞ b_j² = 1. We have

  E(σ̂_n² − σ²)² = σ⁴ n^{−2} E(∑_{i=1}^n (ε_i² − 1))² = σ⁴ n^{−2} ∑_{i,j=1}^n Cov(ε_i², ε_j²).

Let χ_4 := Eζ_0⁴ − 3 be the 4th cumulant of ζ_0. By an elementary computation (see Giraitis and Surgailis (1990, (2.3))),

  Cov(ε_i², ε_j²) = 2(∑_{k=0}^∞ b_k b_{k+i−j})² + χ_4 ∑_{k=0}^∞ b_k² b²_{k+i−j} = 2(Cov(ε_i, ε_j))² + O(|i−j|^{−2(1−d)}),

implying Cov(ε_i², ε_j²) = O((Cov(ε_i, ε_j))²) = O(|i−j|^{−2(1−2d)}), as |i−j| → ∞. Whence it follows that

  σ̂_n/σ − 1 = O_p(n^{−1/2}),          0 < d < 1/4,   (3.2)
             = O_p((log(n)/n)^{1/2}),  d = 1/4,
             = O_p(n^{−(1−2d)}),       1/4 < d < 1/2.

Now, let ∆_n := σ̂_n/σ − 1. Recall from Lemma 5.1 of Koul and Surgailis (2002) (KS) that under (2.1), F_0 is infinitely smooth, with density f_0 satisfying

  sup_{x∈R} (1 + x²)(f_0(x) + |f_0′(x)|) < ∞.


This bound and (5.12) of Lemma 5.2 of KS imply

  |F_0(x + x∆_n) − F_0(x)| = |∫_0^{x∆_n} f_0(x + u) du| ≤ C(1 + x²)^{−1}(|x∆_n| + x²∆_n²) ≤ C(|∆_n| + ∆_n²),   (3.3)

  |f_0(x + x∆_n) − f_0(x)| = |∫_0^{x∆_n} f_0′(x + u) du| ≤ C(1 + x²)^{−1}(|x∆_n| + x²∆_n²) ≤ C(|∆_n| + ∆_n²).

Hence,

  n^{1/2−d} |F̃_n(x) − F_0(x) + ε̄_n f_0(x)|
    ≤ n^{1/2−d} |F_n(x σ̂_n/σ) − F_0(x σ̂_n/σ) + ε̄_n f_0(x σ̂_n/σ)|
      + n^{1/2−d} |ε̄_n| |f_0(x) − f_0(x + x∆_n)| + n^{1/2−d} |F_0(x + x∆_n) − F_0(x)|
    ≤ u_p(1) + C n^{1/2−d}(|∆_n| + ∆_n²).

The statement of the proposition now follows from this bound, (3.2), and the fact that 0 < d < 1/2. □

Note that the URP for F̃_n is precisely the same as that for F_n given in (2.2). In other words, the asymptotic null distribution of tests based on n^{1/2−d}(F̃_n − F_0) for testing H_0^sc is the same as that of tests based on n^{1/2−d}(F_n − F_0) for testing H_0. This is a kind of robustness property of these tests against the unknown error variance. It is also unlike the above situation in the location model, and unlike the situation in the i.i.d. set up, where n^{1/2}(F̃_n − F_0) converges weakly to a Brownian bridge with a drift, cf., e.g., Durbin (1973).
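As in the location case, part of this robustness is an exact finite-sample identity: the studentized observations Y_i/σ̂_n, and hence F̃_n itself, do not depend on the true σ at all. A short Python sketch with illustrative data:

```python
def studentized_edf(y, x):
    """F-tilde_n(x) = n^-1 sum_i I(Y_i / sigma-hat_n <= x), sigma-hat_n^2 = n^-1 sum Y_i^2."""
    n = len(y)
    s = (sum(yi * yi for yi in y) / n) ** 0.5
    return sum(1 for yi in y if yi / s <= x) / n

eps = [0.3, -1.2, 0.7, 0.1, -0.5]      # stand-in errors (illustrative only)
y1 = eps                                # sigma = 1
y3 = [3.0 * e for e in eps]             # sigma = 3: Y_i / sigma-hat_n is unchanged
same = all(studentized_edf(y1, x) == studentized_edf(y3, x)
           for x in [-2.0, -0.3, 0.4, 1.5])
```

The invariance explains why not knowing σ costs nothing asymptotically here, while not knowing μ in the location model destroys the first order term altogether.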

Location-Scale Problem. Now consider the problem of testing for H̄_0. Assume, as in the scale problem, that Eζ_0⁴ < ∞. Let δ_n := (s_n − σ)/σ, ε_i := (Y_i − μ)/σ, and ε̄_n := (Ȳ_n − μ)/σ. Then, with F̂_n the same as in Proposition 3.1,

  F̄_n(x) = n^{−1} ∑_{i=1}^n I(Y_i − Ȳ_n ≤ x s_n) = n^{−1} ∑_{i=1}^n I(ε_i ≤ x + xδ_n + ε̄_n) = F̂_n(x + xδ_n).

Also note that the bound (3.2) continues to hold with σ̂_n replaced by s_n under the current set up. Using this fact, (3.1), and the argument used to derive the bound in (3.3), we readily obtain

  n^{1/2−d} sup_x |F̄_n(x) − F_0(x)| ≤ n^{1/2−d} sup_x |F̂_n(x) − F_0(x)| + n^{1/2−d} sup_x |F_0(x + xδ_n) − F_0(x)|
    = o_p(1) + C n^{1/2−d}(|δ_n| + δ_n²) = o_p(1), under H̄_0.

Thus, here also the location case dominates, in the sense that the first order difference n^{1/2−d}[F̄_n(x) − F_0(x)], x ∈ R, is not useful for fitting a marginal d.f. up to unknown location and scale parameters.

4 Fitting an error d.f. in a regression model

Recall the linear regression model (1.6) and the definition of the residual empirical d.f. F̌_n from Section 1, viz.,

  F̌_n(x) = n^{−1} ∑_{i=1}^n I(Y_{ni} − x′_{ni}β̂_n ≤ x) = n^{−1} ∑_{i=1}^n I(ε_i ≤ x + x′_{ni}(β̂_n − β)),

where β̂_n is the LSE of β ∈ R^p.

Proposition 4.1 (URP for the residual empirical process F̌_n.) Assume the same conditions on the errors as in Proposition 3.1. Assume the matrix X_n, whose ith row is x′_{ni}, 1 ≤ i ≤ n, is of full rank, and let D_n := (X′_n X_n)^{1/2}. Assume additionally that

  n^{1/2} max_{1≤i≤n} ‖D_n^{−1} x_{ni}‖ = O(1).   (4.1)

Then,

  sup_{x∈R} |n^{1/2−d}[F̌_n(x) − F_0(x)] − f_0(x) n^{−1/2−d} ∑_{i=1}^n [x′_{ni}(β̂_n − β) − ε_i]| = o_p(1).   (4.2)

Proof. To begin with, we have the uniform linearity of the residual empirical processes: for any nonrandom array {ξ_{ni}, 1 ≤ i ≤ n} of real numbers such that max_{1≤i≤n} |ξ_{ni}| = O(1),

  sup_{x∈R} |n^{−1/2−d} ∑_{i=1}^n [I(ε_i ≤ x + ξ_{ni} n^{d−1/2}) − I(ε_i ≤ x) − n^{d−1/2} ξ_{ni} f_0(x)]| = o_p(1).   (4.3)

This result was proved in GKS (Theorem 1, (1.9)) under a higher moment assumption on ζ_0. Under the current third moment assumption it follows from (3.16) of Koul and Surgailis (2003) and an argument as in GKS.

The result (4.3) entails the following fact. For any 0 < b < ∞ and any nonrandom array {c_{ni}, 1 ≤ i ≤ n, n ≥ 1}, c_{ni} ∈ R^p, with max_{1≤i≤n} ‖c_{ni}‖ = O(1),

  sup_{x∈R, ‖s‖≤b} n^{−1/2−d} |∑_{i=1}^n [I(ε_i ≤ x + c′_{ni} s n^{d−1/2}) − I(ε_i ≤ x) − c′_{ni} s n^{d−1/2} f_0(x)]| = o_p(1).   (4.4)

This is proved using arguments as in Koul (2002). For the sake of completeness, we reproduce this argument in the Appendix below.

Let v_{ni} := n^{1/2} D_n^{−1} x_{ni}, and write x′_{ni}(β̂_n − β) = v′_{ni} n^{d−1/2} · n^{−d} D_n(β̂_n − β). Recall that

  D_n(β̂_n − β) = D_n^{−1} ∑_{i=1}^n x_{ni} ε_i = n^{−1/2} ∑_{i=1}^n v_{ni} ε_i.

In view of (1.2), |Eε_i ε_j| ≤ C|i−j|^{2d−1} for all i ≠ j. Hence, in view of (4.1),

  E‖D_n(β̂_n − β)‖² = n^{−1} ∑_{i,j=1}^n v′_{ni} v_{nj} Eε_i ε_j ≤ C n^{−1} ∑_{i,j=1}^n |Eε_i ε_j| ≤ C n^{2d},

so that n^{−d} D_n(β̂_n − β) = O_p(1). Now (4.4), applied with c_{ni} = v_{ni} and s = n^{−d} D_n(β̂_n − β), and a routine argument yield

  sup_{x∈R} |n^{1/2−d}[F̌_n(x) − F_n(x)] − n^{−d−1/2} ∑_{i=1}^n x′_{ni}(β̂_n − β) f_0(x)| = o_p(1).   (4.5)

Write ∆_n for the l.h.s. of (4.2). From (4.5) and (2.2) we obtain

  ∆_n = sup_{x∈R} |n^{1/2−d}[F̌_n(x) − F_n(x)] − f_0(x) n^{−d−1/2} ∑_{i=1}^n x′_{ni}(β̂_n − β)
          + n^{1/2−d}[F_n(x) − F_0(x)] + f_0(x) n^{−d−1/2} ∑_{i=1}^n ε_i| = o_p(1),

proving (4.2). □

Now, let v̄_n := n^{−1} ∑_{i=1}^n v_{ni}, and

  Z_n := n^{−1/2−d} ∑_{i=1}^n [x′_{ni}(β̂_n − β) − ε_i] = n^{−1/2−d} ∑_{i=1}^n [v̄′_n v_{ni} − 1] ε_i.

Note that if the regression model (1.6) is the one sample location model (1.3), i.e., if in (1.6) p = 1, x_{ni} ≡ 1, β = μ, then v_{ni} ≡ 1, the LSE is Ȳ_n, F̌_n(x) = F̂_n(x) and Z_n ≡ 0, so that we again obtain the conclusion (3.1).

More generally, Z_n = 0 for all n ≥ 1, w.p. 1, whenever there is a non-zero intercept parameter in the model (1.6). To see this, and to keep the exposition transparent, consider the model (1.6) with p = 2 and x′_{ni} = (1, a_i), where a_1, ··· , a_n are known constants with σ_a² := ∑_{i=1}^n (a_i − ā)² > 0 and ā := n^{−1} ∑_{i=1}^n a_i. Then,

  v̄′_n v_{ni} = n (1, ā)(X′_n X_n)^{−1} (1, a_i)′
             = (n/σ_a²) (1, ā) [ n^{−1}∑_{j=1}^n a_j²   −ā
                                  −ā                     1 ] (1, a_i)′ = 1,  ∀ i = 1, ··· , n.

Thus, as in the location problem, as long as there is a non-zero intercept parameter present in the regression model (1.6), the first order difference n^{1/2−d}(F̌_n − F_0) cannot be used to test for H_0 in these cases.

Next, if in (1.6) ∑_{i=1}^n x_{ni} = 0, then Z_n = −n^{−1/2−d} ∑_{i=1}^n ε_i, and by (2.3) we again obtain the analog of (2.4) for F̌_n. In other words, if the design vectors corresponding to the slope parameters are orthogonal to the n × 1 vector of 1's, then the asymptotic null distribution of n^{1/2−d}(F̌_n − F_0) is not affected by not knowing the slope parameters, and is the same as in (2.4). This fact has nothing to do with long memory; an analogous fact is available in the i.i.d. errors set up also, cf. Ch. 6 in Koul (2002).

In general, Z_n is a weighted sum of a long memory moving average process. The following lemma gives a CLT for such r.v.'s.

Lemma 4.1 Let ε_i, i ∈ Z, be a linear process as in (1.1) and (1.2). Let {c_{ni} ∈ R^p, 1 ≤ i ≤ n} be uniformly bounded nonrandom weights, i.e., sup_{n≥1} sup_{1≤i≤n} |c_{ni}| < ∞, and let

  Σ_n := Cov(n^{−d−1/2} ∑_{i=1}^n c_{ni} ε_i).

Assume that the limit lim_{n→∞} Σ_n =: Σ exists and is positive definite. Then,

  n^{−d−1/2} ∑_{i=1}^n c_{ni} ε_i →_D N_p(0, Σ).

The proof of Lemma 4.1 is given in the Appendix. Now consider the weighted sum Z_n. Clearly, under (4.1), the weights c_{ni} ≡ v̄′_n v_{ni} − 1 are uniformly bounded. So one needs only the existence of the limit of Σ_n = Var(Z_n) in order to apply the above CLT to Z_n.

As an example of a design where this limit exists and the above lemma is applicable to Z_n, consider the first degree polynomial regression through the origin, where p = 1 and x_{ni} ≡ i/n. Here (4.1) is satisfied, and

  a_n := n^{1/2} D_n^{−1} = n^{1/2} (∑_{i=1}^n (i/n)²)^{−1/2} ∼ 3^{1/2},  v_{ni} := (i/n) a_n,  v̄_n ∼ 3^{1/2}/2,

and

  Σ_n ≡ Σ_n(θ) ∼ 2c_0² B(d, 1−2d) n^{−(1+2d)} ∑_{1≤i<j≤n} [v̄_n (i/n) a_n − 1][v̄_n (j/n) a_n − 1] (j − i)^{−(1−2d)}
    ∼ 2c_0² B(d, 1−2d) [(3/2) B(2, 2d) − 1/(2d)][3/(2(2+2d)) − 1/(2d+1)] =: Σ(θ) ≡ Σ.

Hence we can apply the above lemma to Z_n to conclude that Z_n →_D N(0, Σ(θ)). The role of c(θ) in the simple hypothesis case is now played by Σ(θ). Consequently, here the analog of the Kolmogorov test for testing that the error d.f. is F_0 would be based on

  K_n := n^{1/2−d̂} sup_x |F̌_n(x) − F_0(x)| / (Σ(θ̂)^{1/2} ‖f_0‖_∞),

where now θ̂ is the local Whittle estimator of θ based either on the Y_{ni}'s or on the residuals Y_{ni} − x′_{ni}β̂_n. Consistency of θ̂ for θ under (4.1) follows from Dalla et al. (2006). Other tests based on F̌_n may be modified in a similar fashion. Needless to say, these conclusions remain valid for any finite degree polynomial regression model.

5 Tests for H_0^loc based on the second order difference

In view of (3.1), it is natural to ask whether there is a higher order difference of F̂_n − F_0 that will provide a reasonable test of H_0^loc. In order to attempt to answer this question, we need to recall a result from Ho and Hsing (1996) and Koul and Surgailis (2002). Let ε_j^{(0)} := 1, ε_j^{(1)} := ε_j of (1.1), and let

  ε_j^{(k)} := ∑_{s_k<···<s_1≤j} b_{j−s_1} ··· b_{j−s_k} ζ_{s_1} ··· ζ_{s_k}

be a polynomial of order k ≥ 2 in the i.i.d. random variables ζ_s, s ∈ Z. This series converges in mean square for each k ≥ 1, under the conditions ∑_{k=0}^∞ b_k² < ∞ and Eζ_0² < ∞ alone. Moreover, for each k ≥ 1, the process ε_j^{(k)}, j ∈ Z, is strictly stationary with zero mean and covariance

  E(ε_0^{(k)} ε_j^{(k)}) = ∑_{0≤i_1<···<i_k} b_{j+i_1} b_{i_1} ··· b_{j+i_k} b_{i_k},   (5.1)

and, for any integers j, i, Eε_j^{(k)} ε_i^{(ℓ)} = 0 (k ≠ ℓ; k, ℓ = 0, 1, ···). It follows from (1.2) and (5.1) that, for each k ≥ 1,

  |E(ε_0^{(k)} ε_j^{(k)})| ≤ (1/k!)(∑_{i=1}^∞ |b_{j+i} b_i|)^k = O(j^{−k(1−2d)}), j → ∞,

  E(∑_{j=1}^n ε_j^{(k)})² = O(n^{2−k(1−2d)}), k(1−2d) < 1,
                          = O(n log(n)),      k(1−2d) = 1,
                          = O(n),             k(1−2d) > 1, n → ∞.

Let

  Z_n^{(k)} := n^{k(1/2−d)−1} ∑_{j=1}^n ε_j^{(k)}, k ≥ 1.

Note that

  Z_n^{(1)} = n^{1/2−d} ε̄_n,  Z_n^{(2)} = n^{−2d} ∑_{j=1}^n ∑_{s_2<s_1≤j} b_{j−s_1} b_{j−s_2} ζ_{s_1} ζ_{s_2}.   (5.2)

Assume 1/(1−2d) is not an integer, and let k* denote the greatest integer in 1/(1−2d), i.e., k* := [1/(1−2d)]. Introduce the multiple Wiener-Itô integral

  Z^{(k)} := (c_0^k/k!) ∫_{R^k} {∫_0^1 ∏_{j=1}^k (u − s_j)_+^{−(1−d)} du} W(ds_1) ··· W(ds_k)

w.r.t. a Gaussian white noise W(ds), E(W(ds))² = ds, which is well-defined for 1 ≤ k ≤ k*. From Surgailis (2003a) we obtain, under the conditions (1.2) and Eζ_0² < ∞,

  (Z_n^{(k)}, 0 ≤ k ≤ k*) →_D (Z^{(k)}, 0 ≤ k ≤ k*).   (5.3)

Note that Z^{(1)} is a Gaussian r.v., while Z^{(2)} equals the Rosenblatt process at 1, cf. Taqqu (1975).

We are now ready to state the following theorem, giving a higher order expansion of the empirical process, due to Ho and Hsing (1996).

Theorem 5.1 Let {ε_j} be a long memory moving average satisfying (1.1) and (1.2). Suppose the d.f. of ζ_0 is k* + 3 times differentiable with bounded, continuous and integrable derivatives, and E|ζ_0|⁴ < ∞. Then, under H_0,

  F_n(x) − F_0(x) = ∑_{1≤k≤k*} (−1)^k n^{−k(1−2d)/2} Z_n^{(k)} d^{k−1}f_0(x)/dx^{k−1} + n^{−1/2} Q_n(x),

with

  sup_{x∈R} |Q_n(x)| = O_p(n^δ), ∀ δ > 0.   (5.4)

An open (although technical) problem is to show the existence of a Gaussian limit of the remainder term Q_n(x) in Theorem 5.1 (see the discussion in Koul and Surgailis (2002, pp. 220-221)). This problem is solved in Koul and Surgailis (2002, Theorem 2.1) in the important special case of Gaussian errors {ε_j}. Below, we formulate this result for k* = 1, i.e., 0 < d < 1/4, only. Let

  R_i(x) := I(ε_i ≤ x) − F_0(x) + f_0(x) ε_i, i ∈ Z, x ∈ R.

Proposition 5.1 Let {ε_j} be a Gaussian long memory moving average satisfying (1.1) and (1.2), with 0 < d < 1/4. Then

  Q_n(x) := n^{1/2}{F_n(x) − F_0(x) + f_0(x) ε̄_n} =⇒ Q(x),   (5.5)

where {Q(x), x ∈ R} is a Gaussian process with zero mean and covariance

  Cov(Q(x), Q(y)) = ∑_{j∈Z} Cov(R_0(x), R_j(y)).

Proposition 5.2 (i) Assume the same conditions as in Theorem 5.1, and let 1/4 < d < 1/2. Then,

  sup_x |n^{1−2d}{F̂_n(x) − F_0(x)} − f_0′(x)[Z_n^{(2)} − 2^{−1}(Z_n^{(1)})²]| = o_p(1),   (5.6)

and

  n^{1−2d}{F̂_n(x) − F_0(x)} =⇒ f_0′(x) Y,   (5.7)

where Y := Z^{(2)} − 2^{−1}(Z^{(1)})². Moreover,

  EY = −2^{−1} c(θ),   (5.8)

  EY² = (c_0⁴ B²(d, 1−2d)/(2d)) { 1/(2(4d−1)) + 3/(2d(1+2d)²) − 1/(d(4d+1)) − B(1+2d, 1+2d)/d }.

(ii) Assume the conditions of Proposition 5.1, with 0 < d < 1/4. Then

  n^{1/2}{F̂_n(x) − F_0(x)} =⇒ Q(x),   (5.9)

where Q(x), x ∈ R, is the Gaussian process in (5.5).

Proof. (i) Note that d > 1/4 implies 2d − 1/2 > 0 and k* ≥ 2. Combine these facts with Theorem 5.1 and (5.4), choosing δ < 2d − 1/2, to obtain the second order expansion under H_0:

  sup_{x∈R} |n^{1−2d}{F_n(x) − F_0(x) + f_0(x) ε̄_n} − f_0′(x) Z_n^{(2)}| = o_p(1).   (5.10)

The decomposition

  F̂_n(x) − F_0(x) = F_n(x + ε̄_n) − F_0(x + ε̄_n) + F_0(x + ε̄_n) − F_0(x)

and (5.10) yield

  sup_{x∈R} |n^{1−2d}{F̂_n(x) − F_0(x) + ε̄_n f_0(x + ε̄_n)} − f_0′(x + ε̄_n) Z_n^{(2)} − n^{1−2d}{F_0(x + ε̄_n) − F_0(x)}| = o_p(1).

Using Taylor expansions of f_0 and f_0′ and the boundedness of f_0″, we thus obtain

  n^{1−2d}{F̂_n(x) − F_0(x)}
    = −n^{1−2d} ε̄_n f_0(x + ε̄_n) + f_0′(x + ε̄_n) Z_n^{(2)} + n^{1−2d}{F_0(x + ε̄_n) − F_0(x)} + u_p(1)
    = −n^{1−2d} ε̄_n [f_0(x) + ε̄_n f_0′(x) + ((ε̄_n)²/2) f_0″(x + ξ_n)] + [f_0′(x) + ε̄_n f_0″(x + ξ_n)] Z_n^{(2)}
      + n^{1−2d} [ε̄_n f_0(x) + ((ε̄_n)²/2) f_0′(x) + ((ε̄_n)³/6) f_0″(x + ξ_n)] + u_p(1)
    = f_0′(x) [Z_n^{(2)} − ((ε̄_n)²/2) n^{1−2d}] + O_p(n^{1−2d} |ε̄_n|³) + u_p(1),

where ξ_n is a sequence of r.v.'s with |ξ_n| ≤ |ε̄_n|, (ε̄_n)² n^{1−2d} = (Z_n^{(1)})² →_D (Z^{(1)})², and n^{1−2d} |ε̄_n|³ = O_p(n^{d−1/2}) = o_p(1). This proves (5.6). Relation (5.7) follows from (5.6) and (5.3).

(ii) Using the definition of Q_n(x) in (5.5), rewrite n^{1/2}(F̂_n(x) − F_0(x)) = Q_n(x + ε̄_n) + U_n(x), where U_n(x) := n^{1/2}(F_0(x + ε̄_n) − F_0(x) − f_0(x + ε̄_n) ε̄_n). By a Taylor expansion,

  sup_x |U_n(x)| ≤ n^{1/2} (ε̄_n)² sup_x |f_0′(x)| = O_p(n^{2d−1/2}) = o_p(1).

Therefore (5.9) follows from (5.5) and

  sup_x |Q_n(x + ε̄_n) − Q_n(x)| = o_p(1).   (5.11)

For any δ_1, δ_2 > 0,

  P(sup_x |Q_n(x + ε̄_n) − Q_n(x)| > δ_1) ≤ P(|ε̄_n| ≥ δ_2) + P(sup_{|x−y|<δ_2} |Q_n(x) − Q_n(y)| > δ_1).

Since the sequence Q_n, n ≥ 1, is tight with respect to the sup-topology on R̄, the last probability can be made arbitrarily small, uniformly in all large n, by choosing δ_2 small enough. Since ε̄_n = o_p(1), this proves (5.11), and the proposition too. □

Proof of (5.8). From the definition of Z^{(k)} and the diagram formula for Wiener-Itô integrals (Surgailis (2003b)), one obtains

  EZ^{(2)} = 0,

  E(Z^{(1)})² = c_0² ∫_R {∫_0^1 (u − s)_+^{d−1} du}² ds = c_0² B(d, 1−2d)/(d(1+2d)) = c(θ),

  E(Z^{(1)})⁴ = 3(E(Z^{(1)})²)² = 3c²(θ),

  E(Z^{(2)})² = 2^{−1} c_0⁴ ∫_{R²} {∫_0^1 (u − s_1)_+^{d−1} (u − s_2)_+^{d−1} du}² ds_1 ds_2 = c_0⁴ B²(d, 1−2d)/(4d(4d−1)),

and

  EZ^{(2)}(Z^{(1)})² = c_0⁴ ∫_R ∫_R ds_1 ds_2 ∫_0^1 (u − s_1)_+^{d−1}(u − s_2)_+^{d−1} du ∫_0^1 (v − s_1)_+^{d−1} dv ∫_0^1 (w − s_2)_+^{d−1} dw
    = c_0⁴ B²(d, 1−2d) ∫_0^1 ∫_0^1 ∫_0^1 |u − v|^{2d−1} |u − w|^{2d−1} du dv dw
    = c_0⁴ B²(d, 1−2d) ∫_0^1 {(2d)^{−1}(u^{2d} + (1 − u)^{2d})}² du
    = (c_0⁴ B²(d, 1−2d)/(2d²)) (1/(4d+1) + B(2d+1, 2d+1)).

This completes the proof of Proposition 5.2. □

Now, let d̂ be a log(n)-consistent estimator of d, and let

  D_n := sup_{x∈R} n^{1−2d̂} |F̂_n(x) − F_0(x)|.

From Proposition 5.2 we readily obtain that the asymptotic null distribution of D_n is the same as that of ‖f_0′‖_∞ |Y|, with ‖f_0′‖_∞ := sup_{x∈R} |f_0′(x)|. Consequently, the critical values of the test that rejects H_0^loc whenever D_n/‖f_0′‖_∞ is large may be determined from the distribution of Y. Unfortunately, this distribution depends on c_0, d in a complicated fashion and is not easy to track. However, one may use a Monte Carlo method to simulate the distribution of this r.v. with c_0 = ĉ_0 and d = d̂, where ĉ_0 is a consistent estimator of c_0.
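Such a Monte Carlo step can use the finite-n quantities in (5.2) directly: writing k = j − s, the double sum satisfies the identity ε_j^{(2)} = (ε_j² − ∑_k b_k² ζ_{j−k}²)/2, so Z_n^{(2)}, and hence Z_n^{(2)} − (Z_n^{(1)})²/2 ≈ Y, is cheap to compute by convolution. The sketch below (Python/NumPy) is illustrative only: the truncation lag J, the sample sizes, and the choice b_k = c_0 (k+1)^{d−1} are assumptions made for the example, not prescriptions from the paper.

```python
import numpy as np

def draw_Y(n, d, c0=1.0, J=1000, rng=None):
    """One draw of Z_n^(2) - (Z_n^(1))^2 / 2, approximating Y for 1/4 < d < 1/2."""
    rng = np.random.default_rng(rng)
    b = c0 * (np.arange(J) + 1.0) ** (d - 1.0)        # b_0 = c0, b_k ~ c0 k^(d-1)
    zeta = rng.standard_normal(n + J)
    eps = np.convolve(zeta, b, mode="valid")[:n]       # eps_j = sum_k b_k zeta_{j-k}
    eta = np.convolve(zeta**2, b**2, mode="valid")[:n] # eta_j = sum_k b_k^2 zeta_{j-k}^2
    eps2 = (eps**2 - eta) / 2.0                        # eps_j^(2) via the identity above
    Z1 = n ** (-0.5 - d) * eps.sum()                   # = n^(1/2-d) eps-bar_n
    Z2 = n ** (-2.0 * d) * eps2.sum()
    return Z2 - 0.5 * Z1**2

# brute-force check of the identity eps^(2) = (eps^2 - eta)/2 on a tiny example
b = np.array([1.0, 0.5, 0.25])
z = np.array([0.3, -1.1, 0.8])                         # zeta_j, zeta_{j-1}, zeta_{j-2}
eps_j = float(np.dot(b, z))
eta_j = float(np.dot(b**2, z**2))
double = sum(b[k1] * b[k2] * z[k1] * z[k2]
             for k1 in range(3) for k2 in range(k1 + 1, 3))

ys = [draw_Y(2000, d=0.35, rng=seed) for seed in range(30)]
# the empirical mean of ys should settle near EY = -c(theta)/2 as the draws grow
```

Quantiles of such draws, with (ĉ_0, d̂) plugged in, would then serve as simulated critical values for D_n/‖f_0′‖_∞.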

Higher order moments of the limiting r.v. Y can be computed using the diagram formula for Wiener-Itô integrals, see e.g. Major (1981, Corollary 5.4) and Surgailis (2003b, Prop. 5.1), although the resulting formulas are rather cumbersome. The characteristic function of the r.v. Z^{(2)}, and its representation as an infinite weighted sum of chi-squares, were obtained by Rosenblatt (1961). See also Taqqu (1975, (6.2)).

6 Appendix

Proof of (4.4). For s ∈ R^p, z, y ∈ R, put

  W(s, z; y) := n^{−1/2−d} ∑_{i=1}^n [I(ε_i ≤ y + (c′_{ni}s + ‖c_{ni}‖z) n^{d−1/2}) − I(ε_i ≤ y) − f_0(y)(c′_{ni}s + ‖c_{ni}‖z) n^{d−1/2}].

Fact (4.3), applied with ξ_{ni} = c′_{ni}s + ‖c_{ni}‖z, implies

  sup_y |W(s, z; y)| = o_p(1), ∀ s ∈ R^p, z ∈ R.   (6.1)


Note that W(s; y) := W(s, 0; y) is the empirical process on the left hand side of (4.4). It suffices to show that for any ε > 0 there exists an N_ε ≥ 1 such that for any n > N_ε and for every 0 < b < ∞,

  P( sup_{y∈R, ‖t‖≤b} |W(t; y)| > 2ε ) < 2ε.   (6.2)

Fix a 0 < b < ∞ and an ε > 0. Because the ball K_b := {s ∈ R^p; ‖s‖ ≤ b} is compact, for any δ > 0 there exist k = k(δ, b) points s_1, ··· , s_k in K_b such that for any t ∈ K_b, ‖t − s_j‖ < δ for some j = 1, 2, ··· , k. Hence,

  P( sup_{y∈R, ‖t‖≤b} |W(t; y)| > 2ε )   (6.3)
    ≤ ∑_{j=1}^k P( sup_{t∈K_b, ‖t−s_j‖<δ, y∈R} |W(t; y) − W(s_j; y)| > ε ) + ∑_{j=1}^k P( sup_y |W(s_j; y)| > ε ).

Note that for any t ∈ R^p such that ‖t − s_j‖ < δ,

  c′_{ni}s_j − ‖c_{ni}‖δ ≤ c′_{ni}t ≤ c′_{ni}s_j + ‖c_{ni}‖δ, ∀ 1 ≤ i ≤ n.

Let δ_1 := ε/(18C‖f_0‖_∞), where C is such that max_i ‖c_{ni}‖ ≤ C. The above inequality and the monotonicity of the indicator function imply that, for all δ < δ_1 and for all ‖t − s_j‖ < δ,

  |W(t; y) − W(s_j; y)| ≤ |W(s_j, δ_1; y)| + |W(s_j, −δ_1; y)| + n^{−1} ∑_{i=1}^n ‖c_{ni}‖ f_0(y)(2δ_1 + δ)
    ≤ |W(s_j, δ_1; y)| + |W(s_j, −δ_1; y)| + 3C f_0(y) δ_1.

Introduce the events

  A_{nj} := { sup_{‖t−s_j‖<δ; y∈R} |W(t; y) − W(s_j; y)| > ε },
  B_{nj} := { sup_y |W(s_j, δ_1; y)| + sup_y |W(s_j, −δ_1; y)| > ε/2 }.

Then, by (6.1) applied with z = ±δ_1, there is an N_{j,ε} such that for all n > N_{j,ε},

  P(B_{nj}) ≤ ε/k, for each j = 1, ··· , k.

Let B^c_{nj} denote the complement of B_{nj}. On B^c_{nj},

  sup_{‖t−s_j‖<δ; y∈R} |W(t; y) − W(s_j; y)| ≤ ε/2 + 3C‖f_0‖_∞ δ_1 = ε/2 + ε/6 = 2ε/3 < ε,

for all j = 1, ··· , k. Hence, for all n > N_ε := max_{1≤j≤k} N_{j,ε},

  P(A_{nj}) = P(A_{nj} ∩ B_{nj}) + P(A_{nj} ∩ B^c_{nj}) ≤ ε/k, j = 1, ··· , k.


Similarly, by (6.1) applied with z = 0, there is an M_ε such that for all n > M_ε,

  P( sup_y |W(s_j; y)| > ε ) < ε/k, j = 1, ··· , k.

Hence, for all n > N_ε ∨ M_ε,

  P( sup_{y∈R, ‖t‖≤b} |W(t; y)| > 2ε ) ≤ ∑_{j=1}^k P(A_{nj}) + ∑_{j=1}^k P( sup_y |W(s_j; y)| > ε ) < 2ε,

proving (6.2) and hence (4.4). □

Proof of Lemma 4.1. Recall Eζ_0^2 = 1. For a given a ∈ R^p, introduce a normal r.v. U ∼ N(0, a′Σa). In (1.2), define b_j = 0 for j < 0, and let
(6.4) b_{nj} := n^{−d−1/2} Σ_{i=1}^{n} a′c_{ni} b_{i−j},    U_n := n^{−d−1/2} Σ_{i=1}^{n} a′c_{ni} ε_i = Σ_{j∈Z} b_{nj} ζ_j.

Then Var(U_n) = a′Σ_n a → a′Σa = Var(U), and the statement of the lemma translates to U_n →_D U.

Introduce the truncated i.i.d. r.v.'s ζ_{i,K} := ζ_i I(|ζ_i| ≤ K) − Eζ_i I(|ζ_i| ≤ K), i ∈ Z, and let U_{n,K} be defined as in (6.4) with the ζ_j's replaced by the ζ_{j,K}'s. Then E(U_n − U_{n,K})^2 = Var(ζ_0 − ζ_{0,K}) Var(U_n) → 0 as K → ∞, uniformly in n ≥ 1. Therefore, it suffices to show the asymptotic normality of U_{n,K} instead of U_n, for each fixed K. Since the ζ_{i,K} are bounded r.v.'s, it thus suffices to prove the claimed result under the assumption that the ζ_i's are bounded, with all moments finite. Then U_n →_D U follows if we show that

(6.5) Cum_k(U_n) = o(1), ∀ k = 3, 4, …,

where Cum_k(Y) denotes the kth order cumulant of the r.v. Y. Let χ_k := Cum_k(ζ_0). By the multilinearity and additivity properties of cumulants,
|Cum_k(U_n)| = |χ_k| |Σ_{j∈Z} b_{nj}^k| ≤ |χ_k| sup_{j∈Z} |b_{nj}|^{k−2} Var(U_n),
since Σ_{j∈Z} b_{nj}^2 = Var(U_n); see (6.4). Therefore, (6.5) follows from

(6.6) sup_{j∈Z} |b_{nj}|^{k−2} = o(1), ∀ k = 3, 4, … .

But
|b_{nj}| ≤ C n^{−d−1/2} Σ_{i=1}^{n} (i − j)_+^{d−1} = O(n^{−1/2}) = o(1), uniformly in j ∈ Z.
This proves (6.6) and completes the proof of Lemma 4.1. □
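The final O(n^{−1/2}) bound can be illustrated numerically in a hypothetical special case (our choices, for illustration only): p = 1, a = 1, c_{ni} ≡ 1 (so C = 1), moving-average weights b_m = (m + 1)^{d−1} for m ≥ 0 and b_m = 0 for m < 0, with d = 0.25. The claim is that √n · sup_{j} |b_{nj}| stays bounded as n grows, while sup_{j} |b_{nj}| itself shrinks.

```python
import math

d = 0.25  # long memory parameter, 0 < d < 1/2 (hypothetical choice)

def b_n(n, j):
    """b_{nj} = n^(-d-1/2) * sum_{i=1}^{n} b_{i-j}, with b_m = (m+1)^(d-1), m >= 0."""
    s = sum((i - j + 1) ** (d - 1.0) for i in range(max(1, j), n + 1))
    return n ** (-d - 0.5) * s

def sup_b(n):
    """sup_j |b_{nj}|, scanned over a j-range wide enough to contain the maximum
    (b_{nj} increases in j up to j = 1 and decreases thereafter)."""
    return max(abs(b_n(n, j)) for j in range(-2 * n, n + 1))

s100, s400 = sup_b(100), sup_b(400)

assert s400 < s100                                           # sup_j |b_{nj}| shrinks with n
assert math.sqrt(400) * s400 < 1.25 * math.sqrt(100) * s100  # sqrt(n)-scaled sups comparable
assert math.sqrt(400) * s400 < 1.0 / d + 1.0                 # bounded, as the O(n^{-1/2}) bound predicts
```

Since sup_j |b_{nj}| → 0 while Σ_j b_{nj}^2 = Var(U_n) stays bounded, no single innovation ζ_j dominates the sum — which is exactly why the higher-order cumulants in (6.5) vanish.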


Acknowledgements

The authors are grateful to the two referees for valuable comments and careful reading of

the manuscript.

References

Baillie, R.T., 1996. Long memory processes and fractional integration in econometrics. J. Econometrics 73, 5–59.

Beran, J., 1992. Statistical methods for data with long-range dependence (with discussion). Statistical Science 7, 404–427.

Beran, J., 1994. Statistics for Long Memory Processes. Monographs on Statistics and Applied Probability, vol. 61. Chapman and Hall, New York.

Chan, N.H., Ling, S., 2008. Residual empirical processes for long and short memory time series. Ann. Statist. 36, 2453–2470.

D'Agostino, R.B., Stephens, M. (Eds.), 1986. Goodness-of-fit Techniques. Statistics: Textbooks and Monographs, vol. 68. Marcel Dekker, New York.

Dalla, V., Giraitis, L., Hidalgo, J., 2006. Consistent estimation of the memory parameter for nonlinear time series. J. Time Ser. Anal. 27, 211–251.

Davydov, Y., 1970. The invariance principle for stationary processes. Theory Probab. Appl. 15, 145–180.

Dehling, H., Taqqu, M.S., 1989. The empirical process of some long-range dependent sequences with an application to U-statistics. Ann. Statist. 17, 1767–1783.

Dehling, H., Mikosch, T., Sørensen, M. (Eds.), 2002. Empirical Process Techniques for Dependent Data. Birkhauser, Boston.

Doukhan, P., Oppenheim, G., Taqqu, M.S. (Eds.), 2003. Theory and Applications of Long-Range Dependence. Birkhauser, Boston.

Durbin, J., 1973. Distribution Theory for Tests Based on the Sample Distribution Function. SIAM, Philadelphia.

Durbin, J., 1975. Kolmogorov–Smirnov tests when parameters are estimated with applications to tests of exponentiality and tests on spacings. Biometrika 62, 5–22.

Giraitis, L., Surgailis, D., 1990. A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotic normality of Whittle's estimate. Probab. Theory Related Fields 86, 87–104.

Giraitis, L., Koul, H.L., Surgailis, D., 1996. Asymptotic normality of regression estimators with long memory errors. Statist. Probab. Lett. 29, 317–335.

Ho, H.-C., Hsing, T., 1996. On the asymptotic expansion of the empirical process of long memory moving averages. Ann. Statist. 24, 992–1024.

Khmaladze, E.V., 1979. The use of ω² tests for testing parametric hypotheses. Theory Probab. Appl. 24, 283–301.

Khmaladze, E.V., 1981. Martingale approach in the theory of goodness-of-fit tests. Theory Probab. Appl. 26, 240–257.

Koul, H.L., 2002. Weighted Empirical Processes in Dynamic Nonlinear Models, second ed. Lecture Notes in Statistics, vol. 166. Springer, New York.

Koul, H.L., Mukherjee, K., 1993. Asymptotics of R-, MD- and LAD-estimators in linear regression models with long range dependent errors. Probab. Theory Related Fields 95, 535–553.

Koul, H.L., Surgailis, D., 2002. Asymptotic expansion of the empirical process of long memory moving averages. In: Dehling, H., Mikosch, T., Sørensen, M. (Eds.), Empirical Process Techniques for Dependent Data. Birkhauser, Boston, pp. 213–239.

Koul, H.L., Surgailis, D., 2003. Robust estimators in regression models with long memory errors. In: Doukhan, P., Oppenheim, G., Taqqu, M.S. (Eds.), Theory and Applications of Long-Range Dependence. Birkhauser, Boston, pp. 339–353.

Major, P., 1981. Multiple Wiener–Ito Integrals. Lecture Notes in Mathematics, vol. 849. Springer, Berlin.

Rosenblatt, M., 1961. Independence and dependence. In: Proc. 4th Berkeley Symp. Math. Statist. Probab. University of California Press, Berkeley, pp. 411–443.

Surgailis, D., 2003a. Non-CLTs: U-statistics, multinomial formula and approximations of multiple Ito–Wiener integrals. In: Doukhan, P., Oppenheim, G., Taqqu, M.S. (Eds.), Theory and Applications of Long-Range Dependence. Birkhauser, Boston, pp. 129–142.

Surgailis, D., 2003b. CLTs for polynomials of linear sequences: diagram formula with illustrations. In: Doukhan, P., Oppenheim, G., Taqqu, M.S. (Eds.), Theory and Applications of Long-Range Dependence. Birkhauser, Boston, pp. 111–127.

Taqqu, M.S., 1975. Weak convergence to fractional Brownian motion and to the Rosenblatt process. Z. Wahrsch. Verw. Gebiete 31, 287–302.

Tsay, R.S., 2002. Analysis of Financial Time Series. Wiley Series in Probability and Statistics. Wiley, New York.


Hira L. Koul

Department of Statistics & Probability

Michigan State University

East Lansing, MI 48824-1027, U. S. A.

[email protected]

Donatas Surgailis

Vilnius Institute of Mathematics

& Informatics

08663 Vilnius, Lithuania

[email protected]
