Understanding Regressions with Observations Collected …koho/english/events/20140315/park.pdf · Understanding Regressions with Observations Collected ... on the unit interval [0;1]

Understanding Regressions with Observations Collected

at High Frequency over Long Span

Yoosoon Chang

Department of Economics, Indiana University

Joon Y. Park

Department of Economics, Indiana University & Sungkyunkwan University

Abstract

In this paper, we consider regressions with observations collected at small time

interval over long period of time. For the formal asymptotic analysis, we as-

sume that samples are obtained from continuous time stochastic processes, and

let the sampling interval δ shrink down to zero and the sample span T diverge

up to infinity. In this set-up, we show that the standard Wald statistic always

diverges to infinity as long as δ → 0 sufficiently fast relative to T → ∞, and

regressions become spurious. This is indeed well expected from our asymp-

totics which shows that, in such a set-up, samples from any continuous time

process become strongly dependent with their serial correlation approaching to

unity, and regressions become spurious exactly as in the conventional spurious

regression. However, as we show in the paper, the spuriousness of Wald test

disappears if we account for strong persistency adequately using an appropriate

longrun variance estimate. The empirical illustrations in the paper provide a

strong and unambiguous support for the practical relevancy of our asymptotic

theory.

Keywords: high frequency regression, spurious regression, continuous time model, asymp-

totics

1

Background and PreliminariesModel and Assumptions

Spuriousness at High FrequencyModified Tests

Understanding Regressions with ObservationsCollected at High Frequency over Long Span

Joon Y. Park

Department of Economics

Indiana University

Conference on Econometrics for Macroeconomics and Finance

Hitotsubashi University

15-16 March 2014

1 / 80 Joon Y. Park Regressions at High Frequency



Objective

To provide a formal analysis of the effect of sampling frequency onstatistical inference to help with the choice of sampling frequencywhen observations are available at many different frequencies.

It is widely, but rather vaguely, believed that there are bothpositive and negative aspects of using high frquency observations.In particular, high frequency observations are thought to give us

I more information

on the one hand, but we have to deal with

I excessive noise and erratic volatility

in high frequency observations.




References

Main Contents

I Chang and Park (2014), Understanding Regressions withObservations Collected at High Frequency over Long Span.

Background Material

I Jeong and Park (2011), Asymptotic Theory of MaximumLikelihood Estimation for Diffusion Model.

I Jeong and Park (2014), Asymptotic Theory of MaximumLikelihood Estimation for Jump Diffusion Model.

I Kim and Park (2014), Asymptotics for Recurrent Diffusionswith Application to High Frequency Regression.

I Lu and Park (2014), Estimation of Longrun Variance ofContinuous Time Stochastic Process Using Discrete Sample.




Contents

Background and PreliminariesEmpirical IllustrationsLimit Theories in Continuous Time

Model and AssumptionsContinuous Time RegressionTechnical Assumptions

Spuriousness at High FrequencyAsymptotics of High Frequency RegressionFurther Analysis of Spuriousness

Modified TestsAsymptotics for Tests with HAC EstimatorsBandwidth Selection in HAC Estimation




Empirical IllustrationsLimit Theories in Continuous Time

Background and Preliminaries





Illustrative Regression and Test

We consider the simple regression model

yi = α+ βxi + ui

for i = 1, . . . , n, and use samples at varying frequencies, collectedat sampling interval δ, to test the null hypothesis

α = 0 or β = 1

using the standard t-statistic.





Models

We consider the following four models

Model (yi) and (xi)

I 20-year T-bond rates and 3-month T-bill ratesII 3-month eurodollar rates and 3-month T-bill ratesIII log US/UK exchange rates forward and spotIV log SP500 index futures and index

as illustrative examples.





t-Test for α in Model I





t-Test for α in Model II





t-Test for α in Model III





t-Test for α in Model IV





t-Test for β in Model I





t-Test for β in Model II





t-Test for β in Model III





t-Test for β in Model IV





Summary

Our test results have some common features, which we maysummarize as

I Test results are heavily dependent upon sampling intervals.

I Test values fluctuate significantly but without any trend untilthe sampling interval reaches one month, and start to increaserather sharply as the sampling interval further decreases.

I At the daily frequency, our tests strongly and unambiguouslyreject both α = 0 and β = 1 in all models.

Clearly, it is not sensible to use daily observations for the t-test inthese models.





Data Plot for Model I





Data Plot for Model II





Data Plot for Model III





Data Plot for Model IV





Summary

We may summarize the sample characteristics of our observationsas

I All time series in our models are either nonstationary ornear-nonstationary. Stock prices and exchange rates areclearly nonstationary. Interest rates are more stationary, butshow some evidence of nonstationarity.

I Existing nonstationarity in stock prices and exchange ratescannot be fully removed by simple detrending ortransformations. However, we may detrend or transform themat least to make them recurrent.

In sum, it seems that recurrent nonstationary or near-nonstationaryprocesses are of most relevant characteristics of our samples.





Stationary Limit Theory

For a wide class of mean zero stationary processes in continuoustime, the continuous time version of invariance principle holds.That is, if we define, for a mean zero stationary process V = (Vt),the normalized process V T = (V T

t ) on the unit interval [0, 1] by

V Tt =

1√T

∫ tT

0Vsds,

we have V T →d V as T →∞, where V is Brownian motion

with variance

Π = limT→∞

1

TE(∫ T

0Vtdt

)(∫ T

0Vtdt

)′> 0,

which is assumed to exist.





Nonstationary Limit Theory

For a general nonstationary process X = (Xt) in continuous time,the normalized process XT = (XT

t ) defined on the unit interval[0, 1] by

XTt = Λ−1T XTt

with some normalizing sequence (ΛT ) of matrices such that‖ΛT ‖ → ∞ as T →∞ is expected to yield functional limit theory

XT →d X

as T →∞ for a well defined limit process X.





Limit Theory for Nonstationary Diffusion

Let X be a diffusion in natural scale with speed density m, whichis regularly varying with index r > −1, and define XT on [0, 1] by

XTt = T−1/(r+2)XTt.

Then we haveXT →d X

for a generalized diffusion X as T →∞ in the space C[0, 1] ofcontinuous functions defined on [0, 1].





Limit Process

The limit process X is defined by

X = B A

with

At = inf

s

∣∣∣∣∫Dm(x)L(s, x)dx > t

for 0 ≤ t ≤ 1, where B is Brownian motion with local time L, andm is the limit homogeneous function of m given by

m(x) =(a1x ≥ 0+ b1x < 0

)|x|r with a and b are

nonnegative constants such that a+ b > 0.





Special Cases

The scaled limit process[(a

r + 1

) 1r+2

+

(b

r + 1

) 1r+2

]X

is a skew Bessel process in natural scale of dimension2(r + 1)/(r + 2), which reduces to

I Bessel process if b = 0

I skew Brownian motion if r = 0, and

I Brownian motion if r = 0 and a = b





Summary

Asymptotics for diffusions and jump diffusions are now fullyavailable in

I Jeong and Park (2011): diffusion asymptotics with propermoment conditions.

I Kim and Park (2014): more general diffusion asymptoticsallowing for nonexisting moments.

I Jeong and Park (2014): standard asymptotics for jumpdiffusion models.

The existing asymptotics for diffusions may well be extended tomore general class of models.




Continuous Time RegressionTechnical Assumptions

Model and Assumptions





Model

We assume that the regression

yi = x′iβ + ui

for i = 1, . . . , n is generated from the underlying continuous timeregression

Yt = X ′tβ + Ut

over 0 ≤ t ≤ T , where Y,X and U are continuous time stochasticprocesses, and

yi = Yiδ, xi = Xiδ, ui = Uiδ,

where δ is the sampling interval and T = nδ is the sample span.





Wald Statistic

For the general linear hypothesis

Rβ = r,

we consider the standard Wald statistic F (β) that is given by

F (β) = (Rβ − r)′R( n∑

i=1

xix′i

)−1R′

−1 (Rβ − r)/σ2,

where β is the OLS estimator for β and σ2 is the usual estimatorfor the error variance obtained from the OLS residuals (ui).





Modified Statistics

In the presence of serial correlation in (ui), the Wald test isgenerally not applicable. In this case, modified versions of theWald statistic such as

G(β) = (Rβ − r)′R( n∑

i=1

xix′i

)−1R′

−1 (Rβ − r)/ω2,

where ω2 is a consistent estimator for the longrun variance of (ui)based on (ui), or

H(β) = (Rβ − r)′R( n∑

i=1

xix′i

)−1Ω

(n∑i=1

xix′i

)−1R′

−1 (Rβ − r),

where Ω is a consistent estimator for the longrun variance of (xiui)based on (xiui).





Asymptotic Framework

Our asymptotics are developed as

δ → 0 and T →∞

jointly. In particular, we assume that δ → 0 sufficiently fast relativeto T →∞, and therefore, our asymptotics yield the same resultsas the sequential asymptotics relying on δ → followed by T →∞.

However, our asymptotics are fundamentally different from thesequential asymptotics in that our asymptotics approximate thefinite sample distributions for all combinations of δ and T , as longas T is large and δ is small relative to T .





Assumption A

Let Z be any element in U,XX ′ or XU . We have∑0≤t≤T

E|∆Zt| = O(T ).

Moreover, if we define ∆δ,T (Z) = sup0≤s,t≤T sup|t−s|≤δ |Zct − Zcs |,then

max

(δ,δ

Tsup

0≤t≤T|Zt|

)= O

(∆δ,T (Z)

)as δ → 0 and T →∞.





Assumption B

We assume that1

T

∫ T

0U2t dt→p σ

2 > 0

as T →∞ for some σ2 > 0.





Assumption C1

We assume that1

T

∫ T

0XtX

′tdt→p M

as T →∞ for some nonrandom matrix M > 0, and we have

1√T

∫ T

0XtUtdt→d N(0,Π)

as T →∞, where

Π = limT→∞

1

TE(∫ T

0XtUtdt

)(∫ T

0XtUtdt

)′> 0,

which is assumed to exist.





Assumption C2

For XT defined on [0, 1] with an appropriately defined nonsingularnormalizing sequence (ΛT ) of matrices as

XTt = Λ−1T XTt,

we assume that XT →d X in the product space of D[0, 1] as

T →∞ with linearly independent limit process X, and if wedefine UT on [0, 1] as

UTt = T−1/2∫ Tt

0Usds,

then UT →d U in D[0, 1] as T →∞, where U is Brownian

motion with variance π2 = limT→∞ T−1E

(∫ T0 Utdt

)2> 0, which

is assumed to exist.36 / 80 Joon Y. Park Regressions at High Frequency




Assumptions D1 and D2

Assumption D1: ∆δ,T (U),∆δ,T (‖XX ′‖)→ 0 and√T∆δ,T (‖XU‖)→ 0 as δ → 0 and T →∞.

Assumption D2: ∆δ,T (U), ‖ΛT ‖2∆δ,T (‖XX ′‖)→ 0 and√T‖ΛT ‖∆δ,T (‖XU‖)→ 0 as δ → 0 and T →∞.





Assumption E

We let U c, the continuous part of U , be a semimartingale given byU c = A+B, where A and B are respectively the boundedvariation and martingale components of U c satisfying

sup0≤s,t≤T

|At −As||t− s|

= Op(aT ) and sup0≤s,t≤T

∣∣[B]t − [B]s∣∣

|t− s|= Op(bT ),

and aT∆δ,T (U)→ 0 and (bT /√T )∆δ,T (U)→ 0 with δ = ∆2

δ,T (U)as δ → 0 and T →∞. Moreover, we assume that∑

0≤t≤T E(∆Ut)4 = O(T ) and T−1[U ]T →p τ

2 for some τ2 > 0 asT →∞.




Asymptotics of High Frequency RegressionFurther Analysis of Spuriousness

Spuriousness at High Frequency





Lemma

Let Assumption A hold. If we define Z = U,XX ′ or XU andzi = Ziδ for i = 1, . . . , n, we have

1

n

n∑i=1

zi =1

T

∫ T

0Ztdt+Op

(∆δ,T (‖Z‖)

)for all small δ and large T .





Theorem 1

Assume Rβ = r and let Assumptions A and B hold. UnderAssumption C1, we have

√T (β − β)→d N

δF (β)→d N′R′(RM−1R′

)−1RN/σ2,

where N =d N(0,M−1ΠM−1

), as δ → 0 and T →∞ jointly

satisfying Assumption D1.





Theorem 2

Assume Rβ = r and let Assumptions A and B hold. UnderAssumptions C2, we have

√TΛ′T (β − β)→d P

δF (β)→d P′R′(RQ−1R′

)−1RP/σ2

where P =(∫ 1

0 XtX′t dt

)−1 ∫ 10 X

t dU

t and Q =

∫ 10 X

tX′t dt, as

δ → 0 and T →∞ jointly satisfying Assumption D2.





Pitfall

Note thatβ →p β

as δ → 0 and T →∞. However, we have

F (β)→p ∞

as δ → 0 and T →∞. Therefore, we are led to falsely reject thenull hypothesis when it is true, if δ is small relative to T .





LLN at High Frequency

The LLN holds for discrete samples, since

1

n

n∑i=1

ui =1

T

N∑i=1

δUiδ

≈ 1

T

∫ T

0Utdt→p 0

as δ → 0 and T →∞.





CLT at High Frequency

However, the CLT for discrete samples fails to hold, since

1√n

n∑i=1

ui =1√δ

1√T

n∑i=1

δUiδ

≈ 1√δ

(1√T

∫ T

0Utdt

)→p ∞

as δ → 0 and T →∞. This is because

cor (ui, ui−1) = cor (Uiδ, U(i−1)δ)→ 1

as δ → 0 and the dependency becomes too strong for the CLT tohold.





Residual Autoregression

To further analyze the serial dependency in (ui), we considerAR(1) regression

ui = ρui−1 + εi,

which is called the residual autoregression, and obtain theasymptotics of the OLS estimator ρ of the AR coefficient ρ.

Given the consistency of the OLS estimator β of β, it isstraightforward to show that the estimated residual AR coefficientρ based on (ui) is asymptotically equivalent to the estimatedresidual AR coefficient ρ, say, based on the OLS residual (ui).





Lemma

Under Assumption E, we have

ρ = 1− τ2δ

σ2+ op(δ)

as δ → 0 and T →∞.

In particular, we haveρ→p 1

as δ → 0 and T →∞. Therefore, (ui) becomes stronglydependent.





Remarks

The speed at which ρ→ 1 depends upon the ratio τ2/σ2. Notethat

1

T[U ]T →p τ

2 and1

T

∫ T

0U2t dt→p σ

2,

which may be interpreted respectively as the mean local variationand the mean global variation. If the local variation is too largerelative to the global variation, ρ approaches to unity very slowly,or it even will not converge to unity at all if the local variation isexcessively large and τ2 =∞ while σ2 <∞.





Estimated Residual AR Coefficient for Model I





Estimated Residual AR Coefficient for Model II





Estimated Residual AR Coefficient for Model III





Estimated Residual AR Coefficient for Model IV





Summary

We may summarize our empirical results as follows.

I In Models I-III, estimated residual AR(1) coefficient ρconverges to unity, evidently and rapidly, as δ → 0.

I In Model IV, estimated residual AR(1) coefficient ρ convergesvery slowly to unity, or may even not converge to unity, asδ → 0. For this model, it seems that the local variation isrelatively too big compared with the magnitude of globalvariation.

Our asymptotic theory very well explains what we observe in ourempirical illustrations.





Conventional Spurious Regression

The conventional spurious regression refers to the regressionbetween two independent random walks, or more generally, anyregression with the regression error that has a unit root and isstrongly dependent. It is well known that, in the conventionalspurious regression, the standard Wald statistic diverges as thesample size increases and yields spurious test results.

The high frequency regressions we consider yield spurious resultsfor the same reason as the conventional spurious regression.However, our high frequency regressions are authentic andrepresent truthful relationships, whereas the conventional spuriousregression has no meaningful interpretation. Therefore, thespuriousness in our regressions is rectifiable.




Asymptotics for Tests with HAC EstimatorsBandwidth Selection in HAC Estimation

HAC Estimation of Longrun Variance

The commonly used longrun variance estimators ω2 and Ω of (ui)and (xiui) may be written as

ω2 =∑|j|≤m

K

(j

m

)γ(j), Ω =

∑|j|≤m

K

(j

m

)Γ(j)

where K is the kernel function, γ(j) = T−1∑

i uiui−j andΓ(j) = T−1

∑i xiuix

′i−jui−j are the sample autocovariances, and

m is the bandwidth parameter that determines the number ofsample autocovariances included in the estimator. In practice, weuse ω2 and Ω defined with (ui) and (xiui) from the OLS residual(ui), which are asymptotically equivalent to ω2 and Ω.





HAC Estimators in Discrete and Continuous Times

If we let V = U or XU , and vi = Viδ, i = 1, . . . , n, then

√δ√n

n∑i=1

vi ≈1√T

∫ T

0Vtdt

as shown earlier, and therefore, we may expect that

δω2 ≈ π2 and δΩ ≈ Π,

where π2 and Π are the longrun variances of U and XUintroduced in Assumptions C1 and C2.





Estimation of Continuous Time Longrun Variance

We may estimate the continuous time longrun variances π2 and Πof U and XU , respectively, using the longrun variance estimatorsof their discrete samples (ui) and (xiui). In fact, Lu and Park(2014) show that

Assumption F: If mδ →∞ and m/n→ 0 as δ → 0 and T →∞,then we have δω2 →p π

2 and δΩ→p Π as δ → and T →∞.

under very mild conditions. Here we just use their result as anassumption. Note in particular that ω2 and Ω do not converge.





Theorem

Assume Rβ = r and let Assumptions A, B and F hold.(a) Under Assumption C1, we have

H(β)→d χ2q

as δ → 0 and T →∞ satisfying Assumption D1.(b) Under Assumption C2, we have

G(β)→d P′R′(RQ−1R′

)−1RP

where P =(∫ 1

0 XtX′t dt

)−1 ∫ 10 X

t dU

t with U = πU

and

Q =∫ 10 X

tX′t dt, as δ → 0 and T →∞ satisfying Assumption D2.





Possible Failure in Bandwidth Choice

Assumption F, which is crucial to have our asymptotics for G(β)and H(β) tests, may not hold if the bandwidth parameter m iscareless chosen in the usual discrete time set-up. If, for instance,we let

m = nκ

for some 0 < κ < 1, Assumption F does not hold. This is becausewe have

mδ = nκδ = T κδ1−κ → 0,

if δ → 0 fast enough so that δ = o(T−κ/(1−κ)). In this case, wehave δω2 →p 0 and δΩ→p 0, and consequently, G(β)→p ∞ and

H(β)→p ∞, exactly as for the standard Wald statistic F (β).





Optimal Bandwidth

The optimal bandwidth parameter m∗, which minimizes theasymptotic mean squared error variance, is given by

m∗ν =

(νK2

νC2ν∫

K(x)2dxn

)1/(2ν+1)

,

where ν is the so-called characteristic exponent,

Kν = limx→0

(1−K(x))/|x|ν , Cν =∑|j|νγ(j)/

∑γ(j).

We have ν = 1 for the Bartlett kernel, and ν = 2 for all othercommonly used kernels such as Parzen, Tukey-Hanning andQuadratic Spectral kernels. Note that ν,Kν , as well as∫K(x)2dx, are determined entirely by the choice of kernel

function K, whereas Cν is given by the covariance structure of theunderlying random sequence (ui).





Andrews’ Automatic Bandwidth

If we assume (ui) is AR(1) with the autoregressive coefficient ρfollowing the suggestion of Andrews (1991), then we have

C21 =

4ρ2

(1− ρ)2(1 + ρ)2, C2

2 =4ρ2

(1− ρ)4

for ν = 1, 2. Therefore, once we fit AR(1) model to (ui) and obtainthe OLS estimate ρ of the residual AR coefficient ρ, we may use itto estimate the constant Cν , and get an estimate for the optimalbandwidth parameter m∗ν from this estimate of Cν , ν = 1, 2.





Asymptotics of Andrews’ Automatic Bandwidth

From our previous lemma on the residual AR coefficient, we mayreadily derive that

C21 =

τ4

σ4δ2+ op(δ

−2), C22 =

4τ8

σ8δ4+ op(δ

−4),

and

m∗1 =

(τ4K2

1

σ4∫K(x)2dx

n

)1/3

δ−2/3(

1 + op(1))

m∗2 =

(8τ8K2

2

σ8∫K(x)2dx

n

)1/5

δ−4/5(

1 + op(1)),

as δ → and T →∞.





Asymptotics of Modified Tests with Automatic Bandwidth

For the optimal bandwidth with the Andrews’ data dependentprocedure, we have

m∗1δ =

(τ4K2

1

σ4∫K(x)2dx

T

)1/3

(1 + op(1))

m∗2δ =

(8τ8K2

2

σ8∫K(x)2dx

T

)1/5

(1 + op(1))

as δ → 0 and T →∞. Consequently, Assumption F holds and wehave proper limit distributions for G(β) and H(β) tests. Note thatthe limit distribution of G(β) test is nonnormal unless theregressor and regression error are asymptotically independent.





Estimated Bandwidth Parameter for Model I





Estimated Bandwidth Parameter for Model II





Estimated Bandwidth Parameter for Model III





Estimated Bandwidth Parameter for Model IV





H-Test for α in Model I





G-Test for α in Model I





H-Test for α in Model II





G-Test for α in Model II





G-Test for α in Model III





G-Test for α in Model IV





H-Test for β in Model I





G-Test for β in Model I





H-Test for β in Model II





G-Test for β in Model II





G-Test for β in Model III





G-Test for β in Model IV





Summary

The test results from the modified tests can be summarized as

I The tests do not reject both hypotheses α = 0 and β = 1 inmost cases. The H-test reject the hypothesis for α in Model Iand for β in Model II. However, it may be due to thenonstationarity of interest rates. If the G-test is used instead,the hypotheses are not rejected.

I We have a good incentive to use high frequency observationsand use modified tests with appropriate choices of bandwidthparameter. Test results become very stable as the samplinginterval decreases and become robust once the samplingfrequency crosses the threshold level.

For fully efficient tests, we should use the optimal bandwidthderived specifically for the underlying process in continuous time.


Documents

Understanding Regressions with Observations Collected …koho/english/events/20140315/park.pdf · Understanding Regressions with Observations Collected ... on the unit interval [0;1]