Asymptotic Inference about Predictive Accuracy
using High Frequency Data∗
Jia Li
Department of Economics
Duke University
Andrew J. Patton
Department of Economics
Duke University
This Version: June 3, 2017
Abstract
This paper provides a general framework that enables many existing inference methods for
predictive accuracy to be used in applications that involve forecasts of latent target variables.
Such applications include the forecasting of volatility, correlation, beta, quadratic variation,
jump variation, and other functionals of an underlying continuous-time process. We provide
primitive conditions under which a “negligibility” result holds, and thus the asymptotic size
of standard predictive accuracy tests, implemented using a high-frequency proxy for the latent
variable, is controlled. An extensive simulation study verifies that the asymptotic results apply
in a range of empirically relevant applications, and an empirical application to correlation
forecasting is presented.
Keywords: Forecast evaluation, realized variance, volatility, jumps, semimartingale.
JEL Codes: C53, C22, C58, C52, C32.
∗We wish to thank Francis Diebold, Dobrislav Dobrev, Ronald Gallant, Silvia Goncalves, Michael McCracken,
Nour Meddahi, Benoit Perron, Dacheng Xiu, conference and seminar participants at the 2013 NBER summer insti-
tute, 2013 Symposium on Econometric Theory and Applications (SETA), 2013 Econometric Society Asian Meeting,
Cambridge, UCL and University of Pennsylvania for helpful comments and discussions. All errors are ours. Financial
support from the NSF under grants SES-1227448 and SES-1326819 (Li) is gratefully acknowledged. Contact address:
Department of Economics, Duke University, 213 Social Sciences Building, Box 90097, Durham NC 27708-0097, USA.
Email: [email protected] and [email protected].
1 Introduction
A central problem in time series analysis is the forecasting of economic variables. In financial
applications, the variables to be forecast are often risk measures, such as volatility, beta, correlation,
and jump characteristics (see Andersen et al. (2006) for a survey). Since the seminal work of Engle
(1982), numerous models have been proposed to forecast risk measures, and these forecasts are
of fundamental importance in financial decisions. The problem of evaluating the performance
of these forecasts is complicated by the fact that many risk measures, although well-defined in
models, are not observable even ex post. A large literature (see West (2006) for a survey) has
evolved presenting methods for (pseudo) out-of-sample inference for predictive accuracy; however,
existing work typically relies on the observability of the forecast target. The goal of the current
paper is to provide a general methodology for extending the applicability of forecast evaluation
methods to settings with unobservable forecast target variables.
Inspired by Andersen and Bollerslev (1998), we propose to evaluate competing forecasts with
respect to a proxy of the latent target variable, computed from high-frequency (intraday) data,
when applying forecast evaluation methods. Prima facie, such inference is
not of direct economic interest, in that a good forecast for the proxy may not be a good forecast
of the latent target variable. The gap, formally speaking, arises from the fact that hypotheses
concerning the proxy (which we label “proxy hypotheses”) are not the same as those concerning
the true target variable (i.e., “true hypotheses”). To fill this gap, we consider an asymptotic setting
in which the proxy is constructed using data sampled from asymptotically-increasing frequencies.
Under this setting, the proxy hypotheses can be considered as “local” to the true hypotheses, and
we provide both high-level and primitive sufficient conditions under which the moments that specify
the proxy hypotheses converge sufficiently fast to their counterparts in the true hypotheses. This
convergence leads to an asymptotic negligibility result: forecast evaluation methods using proxies
have the same asymptotic size and power properties under the proxy hypotheses as under the true
hypotheses.
The strategy of using high-frequency proxies to conduct inference has proven successful in
prior work on the estimation of stochastic volatility models. Bollerslev and Zhou (2002) estimate
stochastic volatility models treating the realized variance as the unobserved integrated variance.
Corradi and Distaso (2006) and Todorov (2009) generalize this approach by considering additional
realized measures for the integrated variance using the generalized method of moments (GMM)
of Hansen (1982). These authors provide theoretical justifications for this approach by provid-
ing conditions that ensure the asymptotic negligibility of the proxy error in GMM inference for
stochastic volatility models. Realized measures for other volatility functionals have also been used
for parametric and nonparametric estimation of stochastic volatility models: for example, Todorov
et al. (2011) use the realized Laplace transform of volatility (Todorov and Tauchen (2012)) for
estimating parametric stochastic volatility models; Reno (2006), Kanaya and Kristensen (2016)
and Bandi and Reno (2016) consider nonparametric estimation of stochastic volatility models us-
ing spot volatility estimates (Foster and Nelson (1996), Comte and Renault (1998), Kristensen
(2010)).
Our asymptotic negligibility result shares the same nature as that in the important work of
Corradi and Distaso (2006), among others. However, the focus of the current paper is distinct
from the aforementioned work in two important aspects. First, compared with (in-sample) GMM
estimation, the out-of-sample forecast evaluation problem has unique complications in the econo-
metric structure. Indeed, even in the case with ex post observable forecast targets, it is well known
that forecast evaluation procedures can be drastically different from each other depending on how
unknown parameters in a forecast model are estimated and updated, on whether the competing
forecast models are nested or nonnested, and on how critical values of tests are computed (e.g.,
via direct estimation or bootstrap); see, for example, Diebold and Mariano (1995), West (1996),
White (2000), McCracken (2000), Hansen (2005), Giacomini and White (2006) and McCracken
(2007), as well as the comprehensive review of West (2006). The apparent idiosyncrasies of these
methods present a nontrivial challenge for designing a general theoretical framework for solving
the latent-target problem for a broad range of evaluation methods. Second, while prior work used
proxies of the volatility or its integrated functionals such as integrated volatility and the volatil-
ity Laplace transform for estimating stochastic volatility models, forecasting applications often
concern a much broader set of risk factors, such as beta, correlation, total quadratic variation,
semivariance and jump variations. The broad practical scope of financial forecasting thus calls for
an extensive analysis on a wide spectrum of risk measures and proxies.
The main contribution of the current paper is to address these two issues in a general and
compact framework. We achieve generality by using two (sets of) high-level conditions that are
designed for bridging two large literatures: forecast evaluation and high-frequency econometrics.
The first set of conditions posits an abstract structure on the forecast evaluation methods; we
show that these conditions are readily verified for many inference methods proposed in the existing
literature, including all of the evaluation methods cited above, and can be readily extended
to stepwise testing procedures such as Romano and Wolf (2005) and Hansen et al. (2011). The
second condition concerns the approximation accuracy of the high-frequency proxy relative to the
latent target variable. The main technical contribution of this paper is to verify this condition
under primitive conditions for general classes of high-frequency based estimators of volatility and
jump risk measures in a general Ito semimartingale model for asset prices. In particular, we al-
low for realistic features such as the leverage effect and (active) price and volatility jumps. Our
results cover many existing estimators as special cases, such as realized variation (Andersen et al.
(2003)), truncated variation (Mancini (2001)), bipower variation (Barndorff-Nielsen and Shephard
(2004b)), realized covariation, beta and correlation (Barndorff-Nielsen and Shephard (2004a)),
realized Laplace transform (Todorov and Tauchen (2012)), general integrated volatility function-
als (Jacod and Protter (2012), Jacod and Rosenbaum (2013)), realized skewness, kurtosis and
their extensions (Lepingle (1976), Jacod (2008), Amaya et al. (2011)), and realized semivariance
(Barndorff-Nielsen et al. (2010) and Patton and Sheppard (2013)). These technical results may be
useful for other applications as well (e.g., Corradi and Distaso (2006) and Todorov (2009)).
The existing literature includes some work on forecast evaluation for latent target variables
using proxy variables. In their seminal work, Andersen and Bollerslev (1998) advocated using
realized variance as a proxy for evaluating volatility forecast models; see also Andersen et al.
(2003) and Andersen et al. (2005). A theoretical justification for this approach was proposed by
Hansen and Lunde (2006) and Patton (2011), based on restrictions on the loss function used for
comparison and the availability of conditionally unbiased proxies. Their unbiasedness condition
must hold in finite samples, which is hard to verify except for certain cases: it may be plausible
for realized variance in some applications, but is unlikely to hold for other realized measures (such
as jump-robust measures of volatility like bipower variation, or ratios of measures like realized
correlation). In contrast, our framework extends the insight of prior work with an asymptotic
argument and is applicable for most known high-frequency based estimators.
We note that our asymptotic negligibility result reflects a simple and robust intuition: the
approximation error in the high-frequency proxy will be negligible when it is small in comparison
with the “intrinsic” sampling variability for forecast evaluation that would arise even in situations
with observable targets. Since the ex post measurement of latent risks is generally much easier than
their ex ante prediction, this intuition and, hence, our asymptotic formalization, should be relevant
in many empirical settings. To judge the performance of the asymptotic results, we conduct three
distinct and realistically calibrated Monte Carlo studies. The Monte Carlo evidence is supportive
for our theory.
We illustrate the usefulness of our approach in an empirical example for evaluating forecasts
of the conditional correlation between stock returns. Correlation forecasting is of substantial
importance in practice (Engle (2008)) but existing evaluation methods (see, e.g., Hansen and
Lunde (2006), Patton (2011)) are silent on how rigorous forecast evaluation can be conducted.
We consider four forecasting methods, starting with the popular dynamic conditional correlation
(DCC) model of Engle (2002). We then extend this model to include an asymmetric term, as in
Cappiello et al. (2006), which allows correlations to rise more following joint negative shocks than
other shocks, and to include the lagged realized correlation matrix, which enables the model to
exploit higher frequency data, in the spirit of Noureldin et al. (2012). We find evidence, across a
range of correlation proxies, that including high frequency information in the forecast model leads
to out-of-sample gains in accuracy, while the inclusion of an asymmetric term does not lead to
such gains.
This paper is organized as follows. Section 2 presents the econometric setting. Section 3
presents the asymptotic properties of generic forecast evaluation methods using proxies under a
high-level condition, and in Section 4 we provide results for verifying the high-level condition for
a variety of high-frequency proxies under primitive conditions. We discuss further extensions to
other evaluation methods in Section 5. Monte Carlo results and an empirical application are in
Sections 6 and 7, respectively. All proofs are in the appendix.
All limits below are for $T \to \infty$. We use $\xrightarrow{P}$ to denote convergence in probability and $\xrightarrow{d}$ to denote convergence in distribution. All vectors are column vectors. For any matrix $A$, we denote its transpose by $A^\intercal$ and its $(i,j)$ component by $A_{ij}$. The $(i,j)$ component of a matrix-valued stochastic process $A_t$ is denoted by $A_{ij,t}$. We write $(a, b)$ in place of $(a^\intercal, b^\intercal)^\intercal$. The $j$th component of a vector $x$ is denoted by $x_j$. For $x, y \in \mathbb{R}^q$, $q \ge 1$, we write $x \le y$ if and only if $x_j \le y_j$ for every $j \in \{1, \ldots, q\}$. For a generic variable $X$ taking values in a finite-dimensional space, we use $\kappa_X$ to denote its dimensionality; the letter $\kappa$ is reserved for such use. We use $\|\cdot\|$ to denote the Euclidean norm of a vector, where a matrix is identified as its vectorized version. For each $p \ge 1$, $\|\cdot\|_p$ denotes the $L_p$ norm. We use $\odot$ to denote the Hadamard product between two identically sized matrices, which is computed simply by element-by-element multiplication. The notation $\otimes$ stands for the Kronecker product. For two sequences of strictly positive real numbers $a_t$ and $b_t$, $t \ge 1$, we write $a_t \asymp b_t$ if and only if the sequences $a_t/b_t$ and $b_t/a_t$ are both bounded.
2 The setting
2.1 A motivating example
We start with a simple motivating example that concerns one-period-ahead volatility forecasting
and use this to illustrate the key concepts of our framework, and in the next section we present
the general setting that we use for the remainder of the paper.
Let $(\sigma_t)_{t\ge0}$ denote the stochastic volatility process of an asset and normalize the unit of time to be one day. Since Andersen and Bollerslev (1998), the integrated volatility $IV_t \equiv \int_{t-1}^{t}\sigma_s^2\,ds$ has been widely used as a model-free measure of volatility. Although the IV is defined in continuous time, it
is typical in practice to construct forecasts for it by using discrete-time models. One leading choice
is the classical GARCH(1,1) model (Bollerslev (1986)) estimated using quasi maximum likelihood
on daily returns $\{r_t : t = 1, 2, \ldots\}$:

$$\text{Model 1:}\qquad r_t = s_t\varepsilon_t,\quad (\varepsilon_t)_{t\ge1}\ \text{i.i.d.}\ N(0,1),\qquad s_t^2 = \omega + \gamma s_{t-1}^2 + \alpha r_{t-1}^2, \tag{2.1}$$
and the resulting volatility forecast at time $t+1$ is $F_{1,t+1} = s_{t+1}^2$. Another popular forecasting model is the heterogeneous autoregressive (HAR) model (Corsi (2009)) estimated via ordinary least squares using realized variances (RV), that is,

$$\text{Model 2:}\qquad RV_t^{5min} = b_0 + b_1 RV_{t-1}^{5min} + b_2 \sum_{k=1}^{5} RV_{t-k}^{5min} + b_3 \sum_{k=1}^{22} RV_{t-k}^{5min} + e_t, \tag{2.2}$$

where $RV_t^{5min}$ denotes the RV formed as the sum of squared 5-minute returns within day $t$ and the volatility forecast at time $t+1$ is $F_{2,t+1} = \widehat{RV}_{t+1}^{5min}$. Our goal in this example is to compare the predictive accuracy of the two competing IV forecast series, $F_{1,t+1}$ and $F_{2,t+1}$, where $t$ ranges over the out-of-sample period $\{R, \ldots, T\}$.
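To make these two forecasting recipes concrete, here is a minimal sketch in Python (NumPy only; all function and variable names are ours, and the simulated RV series is a stand-in, not calibrated to any data) of how $RV_t^{5min}$ and the HAR forecast $F_{2,t+1}$ in (2.2) can be computed.

```python
import numpy as np

def rv_from_intraday(returns_5min):
    """RV_t^{5min}: sum of squared 5-minute returns within each day.
    `returns_5min` has shape (T, m), with m intraday returns per day."""
    return (returns_5min ** 2).sum(axis=1)

def har_one_step_forecast(rv):
    """Fit the HAR model (2.2) by OLS on the available RV series and
    return the one-day-ahead forecast F_{2,t+1}."""
    T = len(rv)
    X, y = [], []
    for t in range(22, T):  # 22 lagged days are needed for the monthly term
        X.append([1.0, rv[t - 1], rv[t - 5:t].sum(), rv[t - 22:t].sum()])
        y.append(rv[t])
    b, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    x_next = np.array([1.0, rv[-1], rv[-5:].sum(), rv[-22:].sum()])
    return float(x_next @ b)

# Illustration on a simulated stand-in RV series.
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(-9.0, 0.5, size=500))
print(har_one_step_forecast(rv))
```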
Before discussing the evaluation problem, we make two remarks on these forecasts. Firstly, we
stress that we shall be agnostic about the underlying true dynamics of the volatility process and we
do not assume these forecasting models to be correctly specified. After all, potentially misspecified
models can still produce good forecasts and are widely used in practice. Given this, we are not
interested in the model parameters per se, but instead our focus is on comparing the forecasts that
these models generate. Therefore, our focus is very different from the semiparametric estimation
problems in stochastic volatility models studied by Corradi and Distaso (2006), Todorov (2009),
Todorov et al. (2011), Kanaya and Kristensen (2016) and Bandi and Reno (2016), among others.
Secondly, we treat the data that enters into the forecasts “as is,” and do not attempt to consider
what the data sets might converge to. For example, when forming forecasts using Model 2, we
treat the 5-minute RV simply as an observed time series (which forms the conditioning information
set when making forecasts), instead of as an approximation to the unobserved quadratic variation
of the asset price. In other words, we do not aim to evaluate the infeasible forecasts that could be
generated using Model 2 but with the 5-minute RV replaced by the limiting (unobserved) quadratic
variation. Doing so allows us to compare the forecasts from Model 2 with those formed similarly,
say, by fitting the HAR model using the 1-minute RV. This type of comparison is relevant in
applications and, from a theoretical point of view, can only be done meaningfully by treating
these forecasts as two distinct series, instead of as two approximations of the same infeasible
forecast (because the latter would lead to the trivial conclusion that these forecasts are the same
asymptotically).
We now turn to the forecast evaluation problem. Clearly, the inference would be standard
if one could observe the forecast target (i.e., IVt+1); the evaluation methods mentioned in the
Introduction could be directly applied in this case. As an example, consider the Diebold–Mariano
test for equal predictive ability under the absolute deviation loss, that is, a test for the null
hypothesis
$$H_0^\dagger:\ E\big[|IV_{t+1} - F_{1,t+1}|\big] = E\big[|IV_{t+1} - F_{2,t+1}|\big], \quad t \in \{R, \ldots, T\}; \tag{2.3}$$
here, we use † to highlight the dependency on the latent forecast target. If IVt+1 were observable,
one could estimate the expected loss differential between the two forecasts using its sample analogue
$$DM_T^\dagger \equiv \frac{1}{P}\sum_{t=R}^{T}\big(|IV_{t+1} - F_{1,t+1}| - |IV_{t+1} - F_{2,t+1}|\big), \tag{2.4}$$
where P = T − R + 1 is the length of the testing sample. Under some mild weak-dependence
assumptions on the series of loss differentials, we have
$$\sqrt{P}\big(DM_T^\dagger - E[DM_T^\dagger]\big) \xrightarrow{d} N(0, S^\dagger), \tag{2.5}$$

where $S^\dagger$ denotes the long-run variance of the loss differential series $|IV_{t+1} - F_{1,t+1}| - |IV_{t+1} - F_{2,t+1}|$.
Under the null hypothesis $H_0^\dagger$, $E[DM_T^\dagger] = 0$ holds, which can be tested by examining whether $DM_T^\dagger$ is statistically different from zero.
The complication here, of course, is that IVt+1 is not directly observed; hence, the testing
procedure above is infeasible. As suggested by Andersen and Bollerslev (1998), a feasible alter-
native to the above procedure is to use an observable proxy Yt+1 in place of IVt+1 for evaluating
the forecasts. Possible choices of the proxy include the truncated variation of Mancini (2001) or
the bipower variation of Barndorff-Nielsen and Shephard (2004b) constructed from high-frequency
data, because these realized measures are known to estimate the IV while being robust to the
presence of jumps. The feasible counterpart of $DM_T^\dagger$ in (2.4) is then given by

$$DM_T = \frac{1}{P}\sum_{t=R}^{T}\big(|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|\big).$$
Applying a central limit theorem on the series $|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|$, we have

$$\sqrt{P}\big(DM_T - E[DM_T]\big) \xrightarrow{d} N(0, S), \tag{2.6}$$
where S denotes the associated long-run variance. In view of (2.6), we can implement a two-sided
t-test at level α which rejects the null hypothesis of
$$H_0:\ E[DM_T] \equiv \frac{1}{P}\sum_{t=R}^{T} E\big[|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|\big] = 0 \tag{2.7}$$

if $\sqrt{P}\,|DM_T| > z_{\alpha/2} S_T^{1/2}$, where $S_T$ is a consistent estimator of $S$ and $z_{\alpha/2}$ is the $\alpha/2$ upper quantile of the standard normal distribution.
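This feasible test is simple to code. The following is a minimal sketch (Python/NumPy; the Bartlett-kernel long-run variance and the bandwidth choice are our illustrative defaults, not prescriptions of the paper) of the two-sided t-test of (2.7) under the absolute deviation loss.

```python
import numpy as np
from scipy.stats import norm

def bartlett_lrv(x, n_lags):
    """Newey-West (Bartlett-kernel) long-run variance of a scalar series."""
    x = x - x.mean()
    s = (x ** 2).mean()
    for l in range(1, n_lags + 1):
        s += 2.0 * (1.0 - l / (n_lags + 1)) * (x[l:] * x[:-l]).mean()
    return s

def feasible_dm_test(proxy, f1, f2, n_lags=5):
    """Two-sided test of (2.7): `proxy`, `f1`, `f2` hold Y_{t+1}, F_{1,t+1}
    and F_{2,t+1} over the out-of-sample period t = R, ..., T."""
    d = np.abs(proxy - f1) - np.abs(proxy - f2)   # proxy loss differentials
    P = len(d)
    t_stat = np.sqrt(P) * d.mean() / np.sqrt(bartlett_lrv(d, n_lags))
    p_value = 2.0 * (1.0 - norm.cdf(abs(t_stat)))
    return t_stat, p_value
```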
Although this feasible test is readily implementable, we stress that the hypothesis being tested
(i.e., (2.7)) is different from the original one (i.e., (2.3)), because the former concerns the relative
closeness of the forecasts to the proxy rather than to the true forecast target. To differentiate (2.3)
from (2.7), we refer to them as the true hypothesis and the proxy hypothesis, respectively.
Work in the forecast evaluation literature has established conditions under which the feasible
test has desirable size and power properties under the proxy hypothesis. Our goal is to provide a
set of sufficient conditions under which the feasible test also attains the same rejection probabilities
under the true hypothesis. In the current simple example, a sufficient condition is
$$\sqrt{P}\big(E[DM_T] - E[DM_T^\dagger]\big) = o(1). \tag{2.8}$$

Indeed, under the condition in (2.8), equation (2.6) implies that

$$\sqrt{P}\big(DM_T - E[DM_T^\dagger]\big) \xrightarrow{d} N(0, S). \tag{2.9}$$
Hence the feasible test has the same rejection probabilities under the true hypothesis as well, which formally justifies the use of the feasible test for testing the true hypothesis.
Intuitively, condition (2.8) requires that replacing the forecast target with its proxy leads to an asymptotically negligible difference in the expected loss differential. Since the true and the
proxy hypotheses only depend on the expected loss differentials, we refer to condition (2.8) as a
“convergence-of-hypotheses” condition. This high-level condition is closely related to the conver-
gence rate of the high-frequency proxy Yt+1 towards the target IVt+1 as the sampling interval ∆
of the high-frequency data goes to zero. Section 4 is devoted to providing primitive conditions to
ensure that (2.8) holds.
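To preview the argument (a back-of-the-envelope version of what Proposition 3.1 formalizes): the absolute deviation loss is 1-Lipschitz in the target, so the feasible and infeasible loss differentials differ by at most $2|Y_{t+1} - IV_{t+1}|$. If the proxy satisfies an $L_1$ bound $E|Y_{t+1} - IV_{t+1}| \le K\Delta^\theta$ for some $\theta > 0$, then

$$\sqrt{P}\,\big|E[DM_T] - E[DM_T^\dagger]\big| \le 2\sqrt{P}\,\max_{R\le t\le T} E|Y_{t+1} - IV_{t+1}| \le 2K\sqrt{P}\,\Delta^{\theta},$$

which is $o(1)$ whenever $P\Delta^{2\theta} \to 0$; since $P$ grows with $T$ at the same rate in our setting (see Section 2.2), the typical case $\theta = 1/2$ yields the condition $T\Delta \to 0$ discussed in Section 3.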
There are two building blocks underlying the above example, as well as the general theory that
follows. Firstly, we establish (drawing on the large literature on forecast evaluation) the properties
of the feasible test under the proxy hypothesis. In the example above, this relates to equation (2.6).
Secondly, we ensure that the proxy hypothesis is indistinguishable, up to statistical precision, from the true hypothesis (relating to condition (2.8) above), making them effectively the same for computing rejection probabilities.
Below we substantially generalize each of the two building blocks. Firstly, we accommodate
essentially all leading forecast evaluation procedures in the econometrics literature. Unlike the
Diebold–Mariano test considered above, which arguably has the simplest econometric structure
among such procedures, other evaluation methods can be much more involved as we describe
in Section 3. For example, the asymptotic behavior of the test statistic may depend on how the
forecasts are updated (e.g., using fixed, rolling, or recursive windows); their asymptotic distribution
may be nonstandard; the inference may be done by bootstrap; and the long-run variance estimator
may be inconsistent. Despite these complications, we show that the econometric structure of most
evaluation methods can be cast under a high-level condition that generalizes (2.6) in the above
example and plays the same role in our general framework.
Secondly, we consider a broad class of forecast targets and proxies, beyond the simple realized
variance considered above. This generalization is relevant in practice because researchers are
interested in forecasting not only the IV but also general functionals of volatility (e.g., beta,
correlation and idiosyncratic variance) as well as functionals of jumps (e.g., power variations of
jumps). To this end, we need to characterize, in a proper sense, the proxy accuracy of various
high-frequency estimators so as to verify the convergence-of-hypotheses condition under general
primitive conditions. These results are collected in Section 4.
2.2 True hypotheses and proxy hypotheses in a general setting
We now describe the setting for the general framework. Let $(Y_t^\dagger)_{t\ge1}$ be the time series to be forecast, which takes values in $\mathcal{Y} \subseteq \mathbb{R}^{\kappa_Y}$. We stress at the outset that $Y_t^\dagger$ is not observable, but a proxy $Y_t$ is available. At time $t$, the forecaster uses data $\mathcal{D}_t \equiv \{D_s : 1 \le s \le t\}$ to form a forecast of $Y_{t+\tau}^\dagger$, where the horizon $\tau \ge 1$ is fixed throughout the paper. We consider $k$ competing sequences of forecasts of $Y_{t+\tau}^\dagger$, collected by $F_{t+\tau} \equiv (F_{1,t+\tau}, \ldots, F_{k,t+\tau})$. In practice, $F_{t+\tau}$ is often
constructed from forecast models that involve some parameter β (e.g., β = (ω, γ, α, b0, b1, b2, b3)
for the example in Section 2.1). We write Ft+τ (β) to emphasize such dependence and refer to the
function $F_{t+\tau}(\cdot): \beta \mapsto F_{t+\tau}(\beta)$ as the forecast model. Let $\hat\beta_t$ be an estimator constructed using (possibly a subset of) the dataset $\mathcal{D}_t$ and $\beta^*$ be its "population" analogue. We do not require the forecast model to be correctly specified, so we treat $\beta^*$ as a pseudo-true parameter (White (1982)).
Two types of forecasts have been considered in the literature: the actual forecast $\hat F_{t+\tau} = F_{t+\tau}(\hat\beta_t)$ and the population forecast $F_{t+\tau}(\beta^*)$. While our motivating example in the previous
subsection concerns the evaluation of the actual forecasts, it is also possible to use them to make
inference about Ft+τ (β∗), that is, an inference concerning the forecast model (see, e.g., West
(1996)). Of course, if the researcher is interested in assessing the performance of the actual
forecasts in Ft+τ , he/she can treat the actual forecast as an observable sequence (see, e.g., Diebold
and Mariano (1995) and Giacomini and White (2006)), which amounts to setting β∗ to be empty.
With this convention, we can use the notation Ft+τ (β∗) in the study of the inference for actual
forecasts without conceptual ambiguity.
Given the target $Y_{t+\tau}^\dagger$, the performance of the competing forecasts is measured by $f_{t+\tau}^{\dagger*} \equiv f_{t+\tau}(Y_{t+\tau}^\dagger, \beta^*)$, where $f_{t+\tau}(y, \beta) \equiv f(y, F_{t+\tau}(\beta))$ for some known measurable $\mathbb{R}^{\kappa_f}$-valued function $f(\cdot)$. The function $f(\cdot)$ plays the role of an evaluation measure. Typically, $f(\cdot)$ computes the loss differential between competing forecasts: for example, $f(y, (F_1, F_2)) = |y - F_1| - |y - F_2|$ corresponds to the absolute deviation loss that is used in Section 2.1. The proxy of $f_{t+\tau}^{\dagger*}$ is given by $f_{t+\tau}^{*} \equiv f_{t+\tau}(Y_{t+\tau}, \beta^*)$, which in turn can be estimated by $\hat f_{t+\tau} \equiv f_{t+\tau}(Y_{t+\tau}, \hat\beta_t)$. We then set

$$f_T^{\dagger*} \equiv P^{-1}\sum_{t=R}^{T} f_{t+\tau}^{\dagger*}, \qquad f_T^{*} \equiv P^{-1}\sum_{t=R}^{T} f_{t+\tau}^{*}, \qquad \hat f_T \equiv P^{-1}\sum_{t=R}^{T} \hat f_{t+\tau}, \tag{2.10}$$
where $T$ is the size of the full sample, $P = T - R + 1$ is the size of the prediction sample and $R$ is the size of the estimation sample.1 In the sequel, we always assume $P \asymp T$ as $T \to \infty$ without further mention, while $R$ may be fixed or diverge to $\infty$, depending on the application.

1 The notations $P_T$ and $R_T$ may be used in place of $P$ and $R$. We follow the literature and suppress the dependence on $T$. The estimation and prediction samples are often called the in-sample and (pseudo-) out-of-sample periods.
We now turn to the hypotheses of interest. We consider two classical testing problems for
forecast evaluation: testing for equal predictive ability (one-sided or two-sided) and testing for
superior predictive ability. Formally, we consider the following hypotheses: for some user-specified
constant $\chi \in \mathbb{R}^{\kappa_f}$,

$$\text{Equal Predictive Ability (EPA):}\quad
\begin{cases}
H_0^\dagger: E[f_T^{\dagger*}] = \chi,\\[2pt]
\text{vs. } H_{1a}^\dagger: \liminf_{T\to\infty} E[f_{j,T}^{\dagger*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},\\[2pt]
\text{or } H_{2a}^\dagger: \liminf_{T\to\infty} \|E[f_T^{\dagger*}] - \chi\| > 0,
\end{cases} \tag{2.11}$$

$$\text{Superior Predictive Ability (SPA):}\quad
\begin{cases}
H_0^\dagger: E[f_T^{\dagger*}] \le \chi,\\[2pt]
\text{vs. } H_{a}^\dagger: \liminf_{T\to\infty} E[f_{j,T}^{\dagger*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},
\end{cases} \tag{2.12}$$
where $H_{1a}^\dagger$ (resp. $H_{2a}^\dagger$) in (2.11) is the one-sided (resp. two-sided) alternative. In practice, the constant $\chi$ is usually set to be zero.2 Note that despite their assigned labels, these hypotheses can also be used to test for forecast encompassing and forecast rationality by setting the function $f(\cdot)$ properly; see, for example, West (2006).

2 Allowing $\chi$ to be nonzero incurs no additional cost in our derivations. This flexibility is particularly useful in the design of Monte Carlo experiments that examine the finite-sample performance of the asymptotic theory below; see Section 6 for details.
Since the hypotheses in (2.11) and (2.12) rely on the true forecast target Y †t , we refer to them
as the true hypotheses. These hypotheses allow for data heterogeneity and are cast in the same
fashion as in Giacomini and White (2006). Under (mean) stationarity, these hypotheses coincide
with those considered by Diebold and Mariano (1995), West (1996) and White (2000), among
others. Clearly, if Y †t were observable, these existing inference methods could be applied to test
the true hypotheses by forming test statistics based on ft+τ (Y †t+τ , βt). However, the latency of Y †t
renders these inference methods infeasible.
Feasible versions of these tests can be implemented with Y †t+τ replaced by Yt+τ . However, as
we illustrated in the previous section, the hypotheses underlying the feasible inference procedure
are proxy hypotheses given by
$$\text{Proxy Equal Predictive Ability (PEPA):}\quad
\begin{cases}
H_0: E[f_T^{*}] = \chi,\\[2pt]
\text{vs. } H_{1a}: \liminf_{T\to\infty} E[f_{j,T}^{*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},\\[2pt]
\text{or } H_{2a}: \liminf_{T\to\infty} \|E[f_T^{*}] - \chi\| > 0,
\end{cases} \tag{2.13}$$
$$\text{Proxy Superior Predictive Ability (PSPA):}\quad
\begin{cases}
H_0: E[f_T^{*}] \le \chi,\\[2pt]
\text{vs. } H_{a}: \liminf_{T\to\infty} E[f_{j,T}^{*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\}.
\end{cases} \tag{2.14}$$
These hypotheses are not of immediate economic relevance, because economic agents are, by as-
sumption, interested in forecasting the true target Y †t+τ , rather than its proxy.
Below, we provide conditions under which the moments that define the proxy hypotheses
converge “sufficiently fast” to their equivalents under the true hypotheses, and we show that tests
which are valid under the former are also valid under the latter.
3 Forecast evaluation methods with proxies
In this section, we present the asymptotic properties of the feasible evaluation methods using
proxies. In Section 3.1, we focus on testing proxy hypotheses and introduce high-level conditions
that link many apparently distinct tests of predictive accuracy into a unified framework. Doing
so greatly simplifies the presentation in Section 3.2, where we show that the feasible tests using
proxies are also asymptotically valid under the true hypotheses. This result relies on a high-level
“convergence-of-hypotheses” condition, which can be verified under primitive conditions using the
convergence rate results that we develop in Section 4.
3.1 Conditions on evaluation methods based on proxies
In this subsection, we introduce an abstract econometric structure that we show is common to
most forecast evaluation procedures with an observable forecast target, the role of which is played
by the proxy Yt in the setting of the current paper. These conditions speak to the proxy hypotheses
PEPA and PSPA, but not the true hypotheses. We link these conditions to the true hypotheses
in Section 3.2.
We consider a test statistic of the form

$$\varphi_T \equiv \varphi\big(a_T(\hat f_T - \chi),\ a_T' S_T\big) \tag{3.1}$$

for some measurable function $\varphi: \mathbb{R}^{\kappa_f} \times \mathcal{S} \mapsto \mathbb{R}$, where $a_T \to \infty$ and $a_T'$ are known deterministic sequences, $\hat f_T$ is defined in (2.10) and $S_T$ is a sequence of $\mathcal{S}$-valued estimators that is mainly used for studentization.3 In almost all cases, $a_T = P^{1/2}$ and $a_T' \equiv 1$; recall that $P$ increases with $T$. An exception is given by Example 3.4 below. In many applications, $S_T$ plays the role of an estimator of some asymptotic variance, which may or may not be consistent (see Example 3.2 below); $\mathcal{S}$ is then the space of positive definite matrices.

3 The space $\mathcal{S}$ changes across applications, but is always implicitly assumed to be a Polish space.
Let $\alpha \in (0, 1)$ be the significance level of a test. We consider a (nonrandomized) test of the form $\phi_T = 1\{\varphi_T > z_{T,1-\alpha}\}$, that is, we reject the null hypothesis when the test statistic $\varphi_T$ is
greater than some critical value zT,1−α. We now introduce some high-level assumptions.
Assumption A1: $(a_T(\hat f_T - E[f_T^*]),\ a_T' S_T) \xrightarrow{d} (\xi, S)$ for some deterministic sequences $a_T \to \infty$ and $a_T'$, and random variables $(\xi, S)$. Here, $(a_T, a_T')$ may be chosen differently under the null and the alternative hypotheses, but $\varphi_T$ is invariant to such choice.
Assumption A1 mainly posits that $\hat f_T$ is centered at $E[f_T^*]$ with a well-behaved asymptotic distribution. Since $E[f_T^*]$ characterizes the proxy hypotheses (recall (2.13) and (2.14)), Assumption A1 concerns an evaluation problem with the observed proxy instead of the latent true target. This assumption can be verified for many existing methods that involve observable forecast targets; for example, (2.6) in our motivating example is a special case of Assumption A1. In this basic case,
Assumption A1 is verified by using a (feasible) central limit theorem on the observed time series
of proxy loss differentials for which general primitive conditions are well known in econometrics.
Below we first discuss a generalized version of it, and then introduce a battery of additional
examples that involve various complications that arise in forecast evaluation problems, and describe
how to verify Assumption A1 in each of them.
Example 3.1: Giacomini and White (2006) consider tests for equal predictive ability between
two sequences of actual forecasts, or “forecast methods” in their terminology, assuming R fixed.
In this case, $f(Y_t, (F_{1,t}, F_{2,t})) = L(Y_t, F_{1,t}) - L(Y_t, F_{2,t})$ for some loss function $L(\cdot,\cdot)$. Moreover, one can set $\beta^*$ to be empty and treat each actual forecast as an observed sequence, so $\hat f_T = f_T^*$. Using a CLT for heterogeneous weakly dependent data, one can take $a_T = P^{1/2}$ and verify $a_T(\hat f_T - E[\hat f_T]) \xrightarrow{d} \xi$, where $\xi$ is centered Gaussian with long-run variance denoted by $\Sigma$. We then set $S = \Sigma$ and $a_T' \equiv 1$, and let $S_T$ be a heteroskedasticity and autocorrelation consistent (HAC) estimator of $S$ (Newey and West (1987), Andrews (1991)). Assumption A1 then follows
from Slutsky’s lemma. Diebold and Mariano (1995) intentionally treat the actual forecasts as
primitives without introducing the forecast model (and hence β∗); their setting is also covered by
Assumption A1 by the same reasoning.
Example 3.2: Consider the same setting as in Example 3.1, but let ST be an inconsistent
long-run variance estimator of Σ as considered by, for example, Kiefer and Vogelsang (2005). Using
their theory, we verify $(P^{1/2}(\hat f_T - E[\hat f_T]),\ S_T) \xrightarrow{d} (\xi, S)$, where $S$ is a (nondegenerate) random matrix and the joint distribution of $\xi$ and $S$ is known, up to the unknown parameter $\Sigma$, but is nonstandard.
Example 3.3: West (1996) considers inference for nonnested forecast models in a setting with
$R \to \infty$. West's Theorem 4.1 shows that $P^{1/2}(\hat f_T - E[f_T^*]) \xrightarrow{d} \xi$, where $\xi$ is centered Gaussian with its variance-covariance matrix denoted here by $S$, which captures both the sampling variability of the forecast error and the discrepancy between $\hat\beta_t$ and $\beta^*$. We can set $S_T$ to be the consistent estimator of $S$ as proposed in West's comment 6 to Theorem 4.1. Assumption A1 is then verified by using Slutsky's lemma for $a_T = P^{1/2}$ and $a_T' \equiv 1$. West's theory relies on the differentiability
of the function ft+τ (·) with respect to β and concerns βt in the recursive scheme. Similar results
allowing for a nondifferentiable ft+τ (·) function can be found in McCracken (2000). Giacomini and
Rossi (2009) generalize West’s theory to settings without covariance stationarity. Assumption A1
can be verified similarly in these more general settings.
Example 3.4: McCracken (2007) considers inference on nested forecast models allowing for
recursive, rolling, and fixed estimation schemes, all with R→∞. The evaluation measure ft+τ is
the difference between the quadratic losses of the nesting and the nested models. For his OOS-t test, McCracken proposes using a normalizing factor $\hat\Omega_T = P^{-1}\sum_{t=R}^{T}(\hat f_{t+\tau} - \hat f_T)^2$ and considers the test statistic $\varphi_T \equiv \varphi(P\hat f_T, P\hat\Omega_T)$, where $\varphi(u, s) = u/\sqrt{s}$. Implicitly in his proof of Theorem 3.1, it is shown that under the null hypothesis of equal predictive ability, $(P(\hat f_T - E[f_T^*]),\ P\hat\Omega_T) \xrightarrow{d} (\xi, S)$, where the joint distribution of $(\xi, S)$ is nonstandard and is specified as a function of a multivariate Brownian motion. Assumption A1 is verified with $a_T = P$, $a_T' \equiv P$ and $S_T = \hat\Omega_T$. The nonstandard rate arises as a result of the degeneracy between correctly specified nesting models. Under the alternative hypothesis, it can be shown that Assumption A1 holds for $a_T = P^{1/2}$ and $a_T' \equiv 1$, as in West (1996). Clearly, the OOS-t test statistic is invariant to the change of $(a_T, a_T')$, that is, $\varphi_T = \varphi(P^{1/2}\hat f_T, \hat\Omega_T)$ holds. Assumption A1 can also be verified for various extensions of
McCracken (2007); see, for example, Inoue and Kilian (2004), Clark and McCracken (2005) and
Hansen and Timmermann (2012).
Example 3.5: White (2000) considers a setting similar to West (1996), with an emphasis on
considering a large number of competing forecasts, but uses a test statistic without studentization.
Assumption A1 is verified similarly as in Example 3.3, but with ST and S being empty.
Assumption A2: $\varphi(\cdot, \cdot)$ is continuous almost everywhere under the law of $(\xi, S)$.

Assumption A2 is satisfied by all standard test statistics used in forecast evaluation: for simple pair-wise forecast comparisons, the test statistic usually takes the form of a t-statistic, that is, $\varphi_{\text{t-stat}}(\xi, S) = \xi/\sqrt{S}$. For joint tests it may take the form of a Wald-type statistic, $\varphi_{\text{Wald}}(\xi, S) = \xi^\intercal S^{-1}\xi$, or a maximum over individual (possibly studentized) test statistics, $\varphi_{\text{Max}}(\xi, S) = \max_i \xi_i$ or $\varphi_{\text{StuMax}}(\xi, S) = \max_i \xi_i/\sqrt{S_{ii}}$.
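For concreteness, these four test functions, together with the generic rejection rule $\phi_T = 1\{\varphi_T > z_{T,1-\alpha}\}$ from above, can be sketched as follows (Python/NumPy; the argument conventions, e.g., passing a vector of diagonal variances to the studentized maximum, are our illustrative choices).

```python
import numpy as np

def phi_tstat(xi, S):
    """t-statistic for a scalar evaluation measure; S is a scalar variance."""
    return xi / np.sqrt(S)

def phi_wald(xi, S):
    """Wald-type statistic; S is a kappa_f x kappa_f covariance matrix."""
    return float(xi @ np.linalg.solve(S, xi))

def phi_max(xi, S=None):
    """Maximum over individual statistics (S unused)."""
    return float(np.max(xi))

def phi_stumax(xi, S):
    """Maximum over studentized statistics; S holds the diagonal variances."""
    return float(np.max(xi / np.sqrt(S)))

def reject(phi, f_bar, S_T, chi, a_T, a_prime_T, z_crit):
    """Generic feasible test: 1{phi(a_T (f_bar - chi), a'_T S_T) > z_crit}."""
    return phi(a_T * (f_bar - chi), a_prime_T * S_T) > z_crit
```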
Assumption A2 imposes continuity on ϕ (·, ·) in order to facilitate the use of the continuous
mapping theorem for studying the asymptotics of the test statistic ϕT . More specifically, under
the null hypothesis of PEPA, which is also the null least favorable to the alternative in PSPA
(White (2000), Hansen (2005)), Assumption A1 implies that $(a_T(\hat f_T - \chi),\ a_T' S_T) \xrightarrow{d} (\xi, S)$. By
the continuous mapping theorem, Assumption A2 then implies that the asymptotic distribution of
ϕT under this null is ϕ(ξ, S). The critical value of a test at nominal level α is given by the 1− α
quantile of ϕ(ξ, S), on which we impose the following condition.
Assumption A3: The distribution function of $\varphi(\xi, S)$ is continuous at its $1-\alpha$ quantile $z_{1-\alpha}$. Moreover, the sequence $z_{T,1-\alpha}$ of critical values satisfies $z_{T,1-\alpha} \xrightarrow{P} z_{1-\alpha}$.
The first condition in Assumption A3 is very mild. Assumption A3 is mainly concerned with the availability of a consistent estimator of the $1-\alpha$ quantile $z_{1-\alpha}$. This assumption is slightly
stronger than what we actually need. Indeed, we only need the convergence to hold under the null
hypothesis, while, under the alternative, we only need the sequence zT,1−α to be tight.
Below, we discuss examples for which Assumption A3 can be verified.
Example 3.6: In many cases, the limit distribution of ϕT under the null of PEPA is standard
normal or chi-square with some known number of degrees of freedom. Examples include tests
considered by Diebold and Mariano (1995), West (1996) and Giacomini and White (2006). In
the setting of Example 3.2 or 3.4, ϕT is a t-statistic or Wald-type statistic, with an asymptotic
distribution that is nonstandard but pivotal, with quantiles tabulated in the original papers.4
Assumption A3 for these examples can be verified by simply taking $z_{T,1-\alpha}$ as the known quantile of the limit distribution.

4 One caveat is that the OOS-t statistic in McCracken (2007) is asymptotically pivotal only under the somewhat restrictive condition that the forecast errors form a conditionally homoskedastic martingale difference sequence. In the presence of conditional heteroskedasticity or serial correlation in the forecast errors, the null distribution generally depends on a nuisance parameter (Clark and McCracken (2005)). Nevertheless, the critical values can be consistently estimated via a bootstrap (Clark and McCracken (2005)) or plug-in method (Hansen and Timmermann (2012)).
Example 3.7: White (2000) considers tests for superior predictive ability. Under the null
least favorable to the alternative, White’s test statistic is not asymptotically pivotal, as it depends
on the unknown covariance matrix of the limit variable ξ. White suggests computing the critical
value via either simulation or the stationary bootstrap (Politis and Romano (1994)), corresponding
respectively to his “Monte Carlo reality check” and “bootstrap reality check” methods. In particu-
lar, under stationarity, White shows that the bootstrap critical value consistently estimates $z_{1-\alpha}$.5
Hansen (2005) considers test statistics with studentization and shows the validity of a refined
bootstrap critical value, under stationarity. The validity of the stationary bootstrap holds in more
general settings allowing for moderate heterogeneity (Goncalves and White (2002), Goncalves and
de Jong (2003)). We hence conjecture that the bootstrap results of White (2000) and Hansen
(2005) can be extended to a setting with moderate heterogeneity, although a formal discussion is
beyond the scope of the current paper. In these cases, the simulation- or bootstrap-based critical
value can be used as zT,1−α in order to verify Assumption A3.
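A minimal sketch of the bootstrap reality check critical value (Python/NumPy) is given below; the mean block length, number of resamples, and recentering at the sample means are our illustrative choices, and refinements such as Hansen's (2005) studentization are omitted.

```python
import numpy as np

def stationary_bootstrap_indices(P, mean_block, rng):
    """One index resample from the stationary bootstrap of Politis and Romano (1994)."""
    p = 1.0 / mean_block
    idx = np.empty(P, dtype=int)
    idx[0] = rng.integers(P)
    for i in range(1, P):
        idx[i] = rng.integers(P) if rng.random() < p else (idx[i - 1] + 1) % P
    return idx

def reality_check_cv(loss_diffs, alpha=0.05, n_boot=999, mean_block=10, seed=0):
    """Bootstrap 1-alpha critical value for max_j sqrt(P) * mean(d_j), with the
    resampled means recentered at the sample means (a sketch of White's method).
    `loss_diffs` is a P x k matrix of proxy loss differentials."""
    rng = np.random.default_rng(seed)
    P, k = loss_diffs.shape
    means = loss_diffs.mean(axis=0)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = stationary_bootstrap_indices(P, mean_block, rng)
        stats[b] = np.sqrt(P) * np.max(loss_diffs[idx].mean(axis=0) - means)
    return float(np.quantile(stats, 1.0 - alpha))
```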
Finally, we need two alternative sets of assumptions on the test function ϕ (·, ·) for one-sided
and two-sided tests, respectively.
Assumption B1: For any $s \in \mathcal{S}$, we have (i) $\varphi(u, s) \le \varphi(u', s)$ whenever $u \le u'$, where $u, u' \in \mathbb{R}^{\kappa_f}$; (ii) $\varphi(u_n, s_n) \to \infty$ whenever $u_{j,n} \to \infty$ for some $1 \le j \le \kappa_f$ and $s_n \to s$.

Assumption B2: For any $s \in \mathcal{S}$, $\varphi(u_n, s_n) \to \infty$ whenever $\|u_n\| \to \infty$ and $s_n \to s$.
Assumption B1(i) imposes monotonicity on the test statistic as a function of the evaluation
measure, and is used for size control in the PSPA setting. Assumption B1(ii) concerns the consis-
tency of the test against the one-sided alternative and is easily verified for commonly used one-sided
test statistics, such as ϕt-stat, ϕMax and ϕStuMax described in the comment following Assumption
A2. Assumption B2 serves a similar purpose for two-sided tests, and is also easily verifiable.
5 White (2000) shows the validity of the bootstrap critical value in a setting where the sampling error in $\hat\beta_t$ is asymptotically irrelevant (West (1996), West (2006)). Corradi and Swanson (2007) propose a bootstrap critical value in the general setting of West (1996), without imposing asymptotic irrelevance.
3.2 Asymptotic properties of the feasible inference procedure
In this subsection, we show that the feasible tests described in Section 3.1 are asymptotically
valid under the true hypotheses. Similar to condition (2.8) in our basic example, we need a
convergence-of-hypotheses condition so as to bridge the gap between the proxy hypotheses and the
true hypotheses. In the general setting, this condition is formalized as follows:
Assumption C: $a_T(E[f_T^*] - E[f_T^{\dagger*}]) \to 0$, where $a_T$ is given by Assumption A1.
Assumption C is closely related to the approximation accuracy of the proxies. Since we are
interested in proxies constructed using high-frequency data, this condition is mainly related to the
convergence rate of high-frequency estimators, together with the growth rates of the time-series
sample span and the high-frequency sampling frequency. In Section 4, we consider broad classes
of high-frequency proxies $Y_t$ and forecast targets $Y_t^\dagger$, and show under primitive conditions that

$$\|Y_t - Y_t^\dagger\|_p \le K d_t^\theta, \quad \text{for all } t, \tag{3.2}$$

for some constants $K > 0$ and $\theta \in (0, 1/2]$, where $d_t$ denotes the sampling mesh of the high-frequency data in day $t$ and $\|\cdot\|_p$ denotes the $L_p$-norm for $p \ge 1$. Given the convergence rate
condition (3.2), Assumption C mainly requires that the sequence (dt)t≥1 of sampling meshes goes
to zero sufficiently fast relative to T → ∞, provided that the evaluation measure f(·) is smooth
in the target variable. Proposition 3.1, below, formalizes this statement and is useful for verifying
Assumption C.
Proposition 3.1. Suppose (i) condition (3.2) holds for some $K > 0$ and $\theta \in (0, 1/2]$; (ii) there exist a constant $h \in (0, 1]$ and a sequence $(m_t)_{t\ge0}$ of random variables such that for each $t$, $\|f(Y_t, F_t(\beta^*)) - f(Y_t^\dagger, F_t(\beta^*))\| \le m_t\|Y_t - Y_t^\dagger\|^h$; and (iii) $\sup_t \|m_t\|_{p/(p-1)} < \infty$. The following statements hold:

(a) $E[f_T^*] - E[f_T^{\dagger*}] = O\big(T^{-1}\sum_{t=1}^{T} d_t^{\theta h}\big)$.

(b) If, in addition, $a_T \asymp T^k$ for some $k > 0$ and $\sum_{t\ge1} t^{k-1} d_t^{\theta h} < \infty$, then Assumption C holds.
Comments. (i) Part (a) of Proposition 3.1 characterizes the rate at which the proxy hypothesis converges to the true hypothesis. If the sampling mesh $d_t$ does not change across days, so that $d_t = \Delta$ identically, then the convergence rate is simply $\Delta^{\theta h}$. Typically, the evaluation function $f(\cdot,\cdot)$ is stochastically Lipschitz in the forecast target, so $h = 1$. In addition, as shown in Section 4, a majority of high-frequency proxies satisfy (3.2) with $\theta = 1/2$. Hence, the "typical" rate of convergence of the proxy hypotheses is $\Delta^{1/2}$.

(ii) As shown in Section 3.1, most (but not all) evaluation methods are associated with $a_T \asymp T^{1/2}$. In view of the comment above, the "typical" sufficient condition for Assumption C is $T\Delta \to 0$.

(iii) More generally, part (b) shows that Assumption C holds if the summability condition $\sum_{t\ge1} t^{k-1} d_t^{\theta h} < \infty$ holds. This requires that the sampling mesh goes to zero sufficiently fast. A sufficient condition is $d_T = O(T^{-k/(\theta h)}(\log T)^{-1/(\theta h)-\eta})$ for some $\eta > 0$ that is arbitrarily small but fixed; see Theorem 2.31 in Davidson (1994).
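As a worked instance of comment (iii), take the typical values $k = 1/2$ (so that $a_T \asymp T^{1/2}$) and $\theta h = 1/2$. The summability condition then reads

$$\sum_{t\ge1} t^{-1/2} d_t^{1/2} < \infty,$$

which holds if, say, $d_t = O\big(t^{-1}(\log t)^{-2-\eta}\big)$ for some $\eta > 0$: in that case $t^{-1/2} d_t^{1/2} = O\big(t^{-1}(\log t)^{-1-\eta/2}\big)$, which is summable. This matches the displayed sufficient condition with $k/(\theta h) = 1$ and $1/(\theta h) = 2$.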
(iv) Corradi et al. (2011) derived convergence-rate results like (3.2) in the case when high-frequency data are contaminated by microstructure noise. These authors show that when the proxy $Y_t$ is the two-scale RV estimator (Zhang et al. (2005)), the multi-scale RV estimator (Zhang (2006)) or the realized kernel estimator (Barndorff-Nielsen et al. (2008)), (3.2) is satisfied with $\theta = 1/6$, $1/4$ or $1/4$, respectively, where $Y_t^\dagger$ is the IV.
In order to facilitate applications, we now illustrate the use of Proposition 3.1 for verifying
Assumption C in concrete examples. Here, we take condition (3.2) as given (see Section 4 for
results on this), and mainly illustrate how to verify condition (ii) in Proposition 3.1. We remind
the reader that P T is a maintained assumption.
Example 3.8: Consider a forecast comparison setting with the evaluation measure being the
loss differential of two competing forecasts, that is, $f(Y_t, (F_{1,t}, F_{2,t})) = L(Y_t - F_{1,t}) - L(Y_t - F_{2,t})$, where $L(\cdot)$ is a loss function. If $L(\cdot)$ is Lipschitz (e.g., Lin-Lin loss), then $|f(Y_t, (F_{1,t}, F_{2,t})) - f(Y_t^\dagger, (F_{1,t}, F_{2,t}))| \le K\|Y_t^\dagger - Y_t\|$, so that the sequence $m_t$ in Proposition 3.1 can be taken to be a constant.
Example 3.9: Non-Lipschitz loss functions can also be accommodated. Consider the same setting as in Example 3.8 but with $L(\cdot)$ being the quadratic loss (i.e., $L(x) = x^2$). We have $f(Y_t, (F_{1,t}, F_{2,t})) - f(Y_t^\dagger, (F_{1,t}, F_{2,t})) = 2(Y_t - Y_t^\dagger)(F_{2,t} - F_{1,t})$. If $\sup_{t\ge1}(\|F_{1,t}\|_q + \|F_{2,t}\|_q) < \infty$ for $q = p/(p-1)$,6 then the conditions in Proposition 3.1 are verified for $m_t = 2|F_{2,t} - F_{1,t}|$, by the $C_r$ inequality.

6 Uniform boundedness of moments is commonly used for deriving asymptotic results for heterogeneous data; see, for example, White (2001). This condition is trivially satisfied if the forecasts $F_{1,t}$ and $F_{2,t}$ are bounded (e.g., forecasts for correlations).
Example 3.10: Consider correlation forecasting for a bivariate asset price process $X_t = (X_{1t}, X_{2t})$. Let $Y_t^\dagger = \int_{t-1}^{t} c_s\,ds$ be the integrated covariance matrix and $Y_t$ be a proxy of it (see, e.g., Theorems 4.1 and 4.2). Following Barndorff-Nielsen and Shephard (2004a), we use the integrated correlation as a model-free correlation measure, which is defined as $H(Y_t^\dagger) \equiv Y_{12,t}^\dagger/\sqrt{Y_{11,t}^\dagger Y_{22,t}^\dagger}$. For an evaluation problem under the absolute deviation loss, the associated evaluation measure is $f(Y_t, (F_{1,t}, F_{2,t})) = |H(Y_t) - F_{1,t}| - |H(Y_t) - F_{2,t}|$. By the mean-value theorem and the Cauchy–Schwarz inequality, condition (ii) in Proposition 3.1 is verified for $m_t \equiv 2\|\nabla H(\tilde Y_t)\|$, where $\tilde Y_t$ is some mean value between $Y_t$ and $Y_t^\dagger$. By Jensen's inequality and Hölder's inequality, we see that a sufficient condition for condition (iii) of Proposition 3.1 is that the variables $(Y_{12,t}, 1/Y_{11,t}, 1/Y_{22,t}, Y_{12,t}^\dagger, 1/Y_{11,t}^\dagger, 1/Y_{22,t}^\dagger)$ have bounded $q$th moments, $q = 3p/(p-1)$.
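For instance, taking the proxy $Y_t$ to be the realized covariance matrix computed from intraday returns (in the spirit of Barndorff-Nielsen and Shephard (2004a)), the proxy correlation $H(Y_t)$ can be sketched as follows (names are illustrative):

```python
import numpy as np

def realized_corr(intraday_returns):
    """Proxy correlation H(Y_t) from one day's n x 2 matrix of intraday returns,
    with Y_t the realized covariance matrix (sum of outer products of returns)."""
    Y = intraday_returns.T @ intraday_returns
    return float(Y[0, 1] / np.sqrt(Y[0, 0] * Y[1, 1]))
```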
Finally, under the conditions discussed in Section 3.1 and Assumption C above, Proposition 3.2 shows that the feasible test $\phi_T$ is valid under the true hypotheses.

Proposition 3.2. The following statements hold under Assumptions A1–A3 and C.

(a) Under the EPA setting (2.11), $E\phi_T \to \alpha$ under $H_0^\dagger$. If Assumption B1(ii) (resp. B2) holds in addition, we have $E\phi_T \to 1$ under $H_{1a}^\dagger$ (resp. $H_{2a}^\dagger$).

(b) Under the SPA setting (2.12) and Assumption B1, we have $\limsup_{T\to\infty} E\phi_T \le \alpha$ under $H_0^\dagger$ and $E\phi_T \to 1$ under $H_a^\dagger$.
Comments. (i) It can be shown that the test φT satisfies the same asymptotic level and power
properties under the proxy hypotheses, without requiring Assumption C. Assumption C is needed
for deriving asymptotic properties of φT under the true hypotheses. In particular, Proposition
3.2 shows that the level and power properties of the test are the same for the true and the proxy
hypotheses. In this sense, the proxy error is negligible for asymptotic inference about predictive accuracy.
(ii) Similar to our negligibility result, West (1996) defines cases exhibiting “asymptotic irrel-
evance” as those in which valid inference about predictive accuracy can be made while ignoring
the presence of parameter estimation error $\hat\beta_t - \beta^*$. Our negligibility result is very distinct from
West's result: here, the unobservable quantity is a latent stochastic process $(Y_t^\dagger)_{t\ge1}$ whose span grows as $T \to \infty$, while in West's setting it is a fixed deterministic and finite-dimensional parameter
β∗. Unlike West’s (1996) case, where a correction can be applied when the asymptotic irrelevance
condition (w.r.t. β∗) is not satisfied, no such correction (w.r.t. Y †t ) is readily available in our
application, nor in that of Corradi and Distaso (2006), among others.
3.3 Discussion of an alternative approach
The theoretical framework that we develop above is based on the “convergence-of-hypotheses”
approach. We have shown that if the proxy only results in an asymptotically negligible difference
in the expected evaluation measure (i.e., Assumption C), then the feasible tests based on the proxy
have desirable rejection probabilities under the true hypotheses (see Proposition 3.2).
We stress that the convergence-of-hypotheses approach is the natural choice here because we are
interested in hypothesis testing. This is thus very different from prior work that derives asymptotic
negligibility results involving high-frequency proxies in estimation problems of stochastic volatility
models; see, for example, Corradi and Distaso (2006), Todorov (2009), Todorov et al. (2011),
Kanaya and Kristensen (2016) and Bandi and Reno (2016). The approach used in these papers
can be regarded as one with “convergence-of-statistics.” That is, these authors show that certain
feasible proxy-based statistics (e.g., the estimator of the parameter of interest and that of the
asymptotic variance) have asymptotically negligible differences from their infeasible counterparts
(which are constructed using latent variables such as the IV).
It is possible to use the convergence-of-statistics approach as an alternative proof strategy to
show the validity of the feasible tests. For concreteness, we use the example in Section 2.1 to
illustrate how this can be done. In this example, the feasible statistics include DMT and the
long-run variance estimator $S_T$. To fix ideas, we suppose that $S_T$ is the Newey–West estimator
given by
$$S_T = \sum_{l=-h_T}^{h_T} w(l, h_T)\,\Gamma_{l,T},$$

where $\Gamma_{l,T}$ is a sample autocovariance function of the loss differential series $|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|$ at lag $l$, $w(\cdot)$ is a kernel function and $h_T$ is a bandwidth parameter. The associated infeasible statistics are $DM_T^\dagger$ and $S_T^\dagger$, where $S_T^\dagger$ is constructed similarly to $S_T$ but with $Y_t$ replaced by $Y_t^\dagger$. The feasible and infeasible t-statistics are then given by, respectively,

$$\varphi_T \equiv \sqrt{P}\,DM_T/\sqrt{S_T}, \qquad \varphi_T^\dagger \equiv \sqrt{P}\,DM_T^\dagger/\sqrt{S_T^\dagger}.$$
The convergence-of-statistics approach amounts to seeking sufficient conditions that ensure $\varphi_T - \varphi_T^\dagger = o_p(1)$, for which it suffices to have

$$\sqrt{P}(DM_T - DM_T^\dagger) = o_p(1), \qquad S_T - S_T^\dagger = o_p(1). \tag{3.3}$$
In contrast, the convergence-of-hypotheses condition in this example is given by (2.8), which is recalled below for the reader's convenience:

$$\sqrt{P}\big(E[DM_T] - E[DM_T^\dagger]\big) = o(1).$$
Immediately, we observe that this condition is very similar to the first part of (3.3). In fact, both are implied by $E|DM_T - DM_T^\dagger| = o(P^{-1/2})$, which underlies the proof of Proposition 3.1. However, unlike (3.3), the convergence-of-hypotheses approach does not require the additional negligibility result concerning the long-run variance estimators, that is, $S_T - S_T^\dagger = o_p(1)$.
It is of course technically possible to show (3.3) in this simple example.7 However, our key
observation is that this effort is unnecessary, because our “end goal” is not to recover the infeasible
test statistic (i.e., ϕ†T ), but to ensure that the feasible test has the desired asymptotic rejection
probabilities under the true hypotheses (which we have shown in Proposition 3.2). The former
clearly implies the latter, but not without cost. In this sense, the convergence-of-hypotheses
approach offers a “shorter path” for proving the validity of the feasible test than the convergence-
of-statistics alternative.
More generally, the additional cost of recovering the infeasible test statistic can be much higher
than establishing a negligibility result for the long-run variance. Indeed, as shown in the examples
in Section 3.1, the test statistics may have non-standard asymptotic distributions, the long-run
variance may not be consistently estimated, and the inference may be done via bootstrap. These
idiosyncrasies would require a method-by-method calculation (together with additional method-
specific regularity conditions) for showing the negligible difference between the feasible and the
infeasible test statistics. This would defeat our goal of establishing a concise but general framework
for studying a broad range of proxy-based forecast evaluation problems. This is the main reason
why we have developed the convergence-of-hypotheses approach in the current study.
7 Let $\Gamma_{l,T}^\dagger$ denote the infeasible counterpart of $\Gamma_{l,T}$. Suppose that the variables $Y_t$, $Y_t^\dagger$, $F_{1,t}$ and $F_{2,t}$ have bounded $q$th moments for $q = p/(p-1)$ and (3.2) holds. Then by the triangle inequality and Hölder's inequality, $E|\Gamma_{l,T} - \Gamma_{l,T}^\dagger| \le KT^{-1}\sum_{t=R}^{T} d_t^\theta$. Since the kernel function $w(\cdot)$ is bounded, $S_T - S_T^\dagger = O_p(h_T T^{-1}\sum_{t=R}^{T} d_t^\theta)$. In the special case with regular sampling (i.e., $d_t = \Delta$ identically), $h_T\Delta^\theta \to 0$ implies $S_T - S_T^\dagger = o_p(1)$.

4 High-frequency proxies and their accuracy

In this section, we introduce high-frequency proxies $Y_t$ for various risk measures $Y_t^\dagger$ and verify the high-level Assumption C, invoked in the previous section, under primitive conditions in a range of cases. Section 4.1 introduces the setting for the high-frequency data. Sections 4.2–4.4 consider
three general classes of proxies for various volatility and jump functionals and Section 4.5 considers
some additional important examples. In Section 4.6, we compare these results with existing ones
in the literature and summarize our technical contribution.
4.1 The underlying asset price process
In this subsection, we describe the setting for constructing proxies using high-frequency data. Our
basic assumption is that the log price process (Xt)t≥0 is a d-dimensional Ito semimartingale defined
on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$ with the following form

$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t\!\int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|\le1\}}\,\tilde\mu(ds,dz) + \int_0^t\!\int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|>1\}}\,\mu(ds,dz), \tag{4.1}$$

where $b_t$ is a $d$-dimensional càdlàg adapted process, $W_t$ is a $d'$-dimensional standard Brownian motion, $\sigma_t$ is a $d \times d'$ stochastic volatility process, $\delta: \Omega \times \mathbb{R}_+ \times \mathbb{R} \mapsto \mathbb{R}^d$ is a predictable function, $\mu$ is a Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}$ with compensator $\nu(ds, dz) = ds \otimes \lambda(dz)$ for some $\sigma$-finite measure $\lambda$, and $\tilde\mu \equiv \mu - \nu$. Ito semimartingales are widely used for modeling asset prices in
financial economics and econometrics; see, for example, Duffie (2001), Singleton (2006) and Jacod
and Protter (2012).
The diffusive risk and the jump risk in $X_t$ are respectively captured by the spot covariance matrix $c_t \equiv \sigma_t\sigma_t^\intercal$ and the jump process $\Delta X_t \equiv X_t - X_{t-}$, where $X_{t-} \equiv \lim_{s\uparrow t} X_s$. In practice, these risks are often summarized as various functionals of the processes $c_t$ and $\Delta X_t$, which play the role of the latent forecast target $Y_t^\dagger$ in our analysis.
To simplify the discussion, we normalize the unit of time to be one day. For each day $t$, the process $X$ is sampled at deterministic discrete times $t - 1 = \tau(t, 0) < \cdots < \tau(t, n_t) = t$, where $n_t$ is the number of intraday returns. We denote the returns and sampling durations by, respectively,

$$\Delta_{t,i}X \equiv X_{\tau(t,i)} - X_{\tau(t,i-1)}, \qquad d_{t,i} = \tau(t,i) - \tau(t,i-1),$$

and denote the sampling mesh by $d_t = \max_{1\le i\le n_t} d_{t,i}$. The basic assumption on the sampling
scheme is that dt should be “small” in the prediction sample, as formalized below.
Assumption S: $d_T \to 0$ and $d_T = O(n_T^{-1})$ as $T \to \infty$.
Assumption S posits that the sampling mesh and the sample span T respectively go to 0 and
∞ in a joint, rather than a sequential, way. Under this condition, we characterize the rate of
convergence of various high-frequency proxies in Section 4. This sampling scheme is essentially the
same as the “double asymptotic” setting considered by Corradi and Distaso (2006) and Todorov
(2009), among others. Indeed, the latter amounts to setting dt,i to be a constant ∆, so that
Assumption S posits ∆ → 0 and T → ∞ asymptotically. Allowing for time-varying sampling
incurs no additional cost in our derivation, but is conceptually desirable in practice. As the
trading activity has grown substantially over the past two decades, later samples have a much
larger number of, and less noisy, intradaily observations than those in earlier samples, so it may be
more efficient to sample more frequently in later samples (Aıt-Sahalia et al. (2005), Zhang et al.
(2005), Bandi and Russell (2008)). This setting is also aligned naturally with the focal point of our
approximation argument: we are interested in using the proxy Yt+τ to approximate the true target
Y †t+τ in the prediction sample (i.e., t ∈ R, . . . , T) for evaluation, while being agnostic about the
regression sample (i.e., t < R).
We need the following regularity condition for the process $X_t$.

Assumption HF: Suppose that the following conditions hold for constants $r \in (0,2]$, $k \ge 2$ and $C > 0$.

(i) The process $\sigma_t$ is a $d\times d'$ Itô semimartingale of the form
\[
\sigma_t = \sigma_0 + \int_0^t \tilde{b}_s\,ds + \int_0^t \tilde{\sigma}_s\,dW_s + \int_0^t\!\int_{\mathbb{R}} \tilde{\delta}(s,z)\,\tilde{\mu}(ds,dz), \tag{4.2}
\]
where $\tilde{b}$ is a $d\times d'$ càdlàg adapted process, $\tilde{\sigma}$ is a $d\times d'\times d'$ càdlàg adapted process and $\tilde{\delta}(\cdot)$ is a $d\times d'$ predictable function on $\Omega\times\mathbb{R}_+\times\mathbb{R}$.

(ii) For some nonnegative deterministic functions $\Gamma(\cdot)$ and $\tilde{\Gamma}(\cdot)$ on $\mathbb{R}$, we have $\|\delta(\omega,s,z)\| \le \Gamma(z)$ and $\|\tilde{\delta}(\omega,s,z)\| \le \tilde{\Gamma}(z)$ for all $(\omega,s,z) \in \Omega\times\mathbb{R}_+\times\mathbb{R}$, and
\[
\int_{\mathbb{R}} \big(\Gamma(z)^r \wedge 1\big)\,\lambda(dz) + \int_{\mathbb{R}} \Gamma(z)^k 1_{\{\Gamma(z)>1\}}\,\lambda(dz) < \infty, \qquad \int_{\mathbb{R}} \big(\tilde{\Gamma}(z)^2 + \tilde{\Gamma}(z)^k\big)\,\lambda(dz) < \infty. \tag{4.3}
\]

(iii) Let $b'_s = b_s - \int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|\le 1\}}\,\lambda(dz)$ if $r \in (0,1]$ and $b'_s = b_s$ if $r \in (1,2]$. We have, for all $s \ge 0$,
\[
\mathbb{E}\|b'_s\|^k + \mathbb{E}\|\sigma_s\|^k + \mathbb{E}\|\tilde{b}_s\|^k + \mathbb{E}\|\tilde{\sigma}_s\|^k \le C. \tag{4.4}
\]
Assumption HF(i) posits that the stochastic volatility process $\sigma_t$ is also an Itô semimartingale. For the results below, we allow for volatility jumps of arbitrary activity. For this reason, we do not need to distinguish small jumps from big jumps in volatility, so we group them together as a purely discontinuous local martingale in (4.2). Assumption HF(ii) imposes a type of dominance condition on the random jump size for the price and the volatility. The constant $r$ provides an upper bound for the generalized Blumenthal–Getoor index, or "activity," of price jumps in $X$. The assumption is weaker when $r$ is larger, in which case it is more difficult to separate jumps from the diffusive component of $X_t$. The $k$th-order integrability of $\Gamma(\cdot)$ and $\tilde{\Gamma}(\cdot)$ restricts the jump tails and facilitates the derivation of bounds via sufficiently high moments. Assumption HF(iii) imposes integrability conditions that serve the same purpose.
In the subsections below, we establish (3.2) under these primitive conditions. We stress that, unlike the existing convergence rate results obtained under the fixed-$T$ setting (see, e.g., Jacod and Protter (2012)), we consider rate results that are valid in the large-$T$ setting, which demands different conditions and proofs; see Section 4.6 for a detailed discussion. Throughout the rest of this section, we maintain Assumption S without further mention.
4.2 Generalized realized variations for continuous processes

We start with the basic setting in which $X$ is continuous; the continuity condition will be relaxed in later subsections. That said, we allow for general volatility jumps as described in Assumption HF. We consider the following general class of estimators: for any measurable function $g: \mathbb{R}^d \mapsto \mathbb{R}$, we set
\[
\hat{I}_t(g) \equiv \sum_{i=1}^{n_t} g\big(\Delta_{t,i}X / d_{t,i}^{1/2}\big)\, d_{t,i}.
\]
We also associate $g$ with the following function: for any $d\times d$ positive semidefinite matrix $A$, we set $\rho(A;g) \equiv \mathbb{E}[g(U)]$ for $U \sim \mathcal{N}(0,A)$, provided that the expectation is well defined. Theorem 4.1 below provides a bound for the approximation error between the proxy $Y_t = \hat{I}_t(g)$ and the target variable $Y^\dagger_t = I_t(g) \equiv \int_{t-1}^t \rho(c_s; g)\,ds$.

In many applications, the function $\rho(\cdot\,;g)$ and, hence, $I_t(g)$ can be expressed in closed form. For example, in the scalar case (i.e., $d = 1$), if we take $g(x) = |x|^a/m_a$ for some $a \ge 2$, where $m_a$ is the $a$th absolute moment of a standard normal variable, then $I_t(g) = \int_{t-1}^t c_s^{a/2}\,ds$; the integrated variance is a special case with $a = 2$. Another univariate example is to take $g(x) = \cos(\sqrt{2u}\,x)$, $u > 0$, yielding $I_t(g) = \int_{t-1}^t \exp(-u c_s)\,ds$. In this case, $\hat{I}_t(g)$ is the realized Laplace transform of volatility (Todorov and Tauchen (2012)) and $I_t(g)$ is the Laplace transform of the volatility occupation density, which captures the distributional information of volatility. A simple bivariate example is $g(x_1,x_2) = x_1 x_2$, which leads to $I_t(g) = \int_{t-1}^t c_{12,s}\,ds$, that is, the integrated covariance between the two components of $X_t$; see Barndorff-Nielsen and Shephard (2004a).
Theorem 4.1. Let $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $X_t$ is continuous; (ii) $g(\cdot)$ and $\rho(\cdot\,;g)$ are continuously differentiable and, for some $q \ge 0$, $\|\partial_x g(x)\| \le C(1 + \|x\|^q)$ and $\|\partial_A \rho(A;g)\| \le C(1 + \|A\|^{q/2})$; (iii) Assumption HF holds with $k \ge \max\{2qp/(2-p),\,4\}$; (iv) $\mathbb{E}[\rho(c_s; g^2)] \le C$ for all $s \ge 0$. Then $\|\hat{I}_t(g) - I_t(g)\|_p \le K d_t^{1/2}$ for some constant $K > 0$ and all $t$.
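To fix ideas, the proxy $\hat{I}_t(g)$ is a simple plug-in sum over duration-normalized intraday returns. The following Python sketch is our own illustration (the function name, toy data and parameter values are hypothetical, not part of the paper's formal development); it computes $\hat{I}_t(g)$ for one day of returns, specialized to the integrated-variance and realized-Laplace-transform examples above.

```python
import numpy as np

def generalized_realized_variation(returns, durations, g):
    """Proxy I_t(g)-hat = sum_i g(dX_i / d_i^{1/2}) * d_i for one day of
    intraday returns dX_i with sampling durations d_i (in day units)."""
    z = returns / np.sqrt(durations)               # normalized returns
    return np.sum(g(z) * durations)

# Toy example with regular one-minute sampling and constant spot variance 0.1.
rng = np.random.default_rng(0)
n = 390
d = np.full(n, 1.0 / n)
r = np.sqrt(0.1 * d) * rng.standard_normal(n)
iv_proxy = generalized_realized_variation(r, d, lambda x: x**2)              # realized variance
rlt_proxy = generalized_realized_variation(r, d,
                                           lambda x: np.cos(np.sqrt(2 * 2.0) * x))  # Laplace transform, u = 2
```

Note that with $g(x) = x^2$ the normalization cancels and the proxy reduces to the usual realized variance $\sum_i (\Delta_{t,i}X)^2$.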
4.3 Jump-robust proxies for integrated volatility functionals
We now turn to a general setting in which $X_t$ may have jumps. In this subsection, we consider jump-robust proxies for risk measures of the form $I^*_t(g) = \int_{t-1}^t g(c_s)\,ds$, where $g: \mathbb{R}^{d\times d} \mapsto \mathbb{R}$ is a twice continuously differentiable function with at most polynomial growth. This class of risk measures is quite general: integrated variance and covariance, integrated quarticity, and volatility Laplace and Fourier transforms are special cases. The estimation of, and inference for, this class of integrated volatility functionals has been studied by Kristensen (2010) in a case without price or volatility jumps.
In order to construct a jump-robust proxy for $I^*_t(g)$, we first nonparametrically recover the spot covariance process using the spot truncated covariation estimator given by⁸
\[
\hat{c}_{\tau(t,i)} = \frac{1}{k_t} \sum_{j=1}^{k_t} d_{t,i+j}^{-1}\, \Delta_{t,i+j}X\, \Delta_{t,i+j}X^{\mathsf{T}}\, 1_{\{\|\Delta_{t,i+j}X\| \le \alpha d_{t,i+j}^{\varpi}\}}, \tag{4.5}
\]
where $\alpha > 0$ and $\varpi \in (0,1/2)$ are constant tuning parameters, and $k_t$ is an integer that specifies the local window for the spot covariance estimation and may vary across days. We consider the sample analogue of $I^*_t(g)$ as its proxy, that is, $\hat{I}^*_t(g) = \sum_{i=0}^{n_t-k_t} g(\hat{c}_{\tau(t,i)})\, d_{t,i}$.

⁸ Spot variance estimators date back to Foster and Nelson (1996) and Comte and Renault (1998); also see Kristensen (2010) and references therein. The truncation technique was proposed by Mancini (2001) for the estimation of integrated variance. The spot truncated covariation estimator appeared in Chapter 9 of Jacod and Protter (2012), and versions of it have been considered as auxiliary results in other contexts (see, e.g., Aït-Sahalia and Jacod (2009)).

Theorem 4.2. Let $q \ge 2$, $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $g$ is twice continuously differentiable and $\|\partial^j_x g(x)\| \le C(1 + \|x\|^{q-j})$ for $j \in \{0,1,2\}$; (ii) $k_t \asymp d_t^{-1/2}$; (iii) Assumption HF holds with $k \ge \max\{4q,\, 4p(q-1)/(2-p),\, (1-\varpi r)/(1/2-\varpi)\}$ and $r \in (0,2)$. Set $\theta_1 = 1/(2p)$ in the general case and $\theta_1 = 1/2$ if we further assume that $\sigma_t$ is continuous. Also set $\theta_2 = \min\{1 - \varpi r + q(2\varpi - 1),\, 1/r - 1/2\}$. Then $\|\hat{I}^*_t(g) - I^*_t(g)\|_p \le K d_t^{\theta_1 \wedge \theta_2}$ for some constant $K$ and all $t$.
Comments. (i) The rate exponent $\theta_1$ is associated with the contribution from the continuous component of $X_t$. The exponent $\theta_2$ captures the approximation error due to the elimination of jumps. The larger these exponents, the faster the proxy hypotheses converge to the true ones; recall Proposition 3.1. If we further impose $r < 1$ and $\varpi \in [(q-1/2)/(2q-r),\, 1/2)$, then $\theta_2 \ge 1/2 \ge \theta_1$. That is, the presence of "inactive" jumps does not affect the rate of convergence, provided that the jumps are properly truncated.

(ii) Jacod and Rosenbaum (2013) characterize the limit distribution of $\hat{I}^*_t(g)$ under the in-fill asymptotic setting with a fixed time span, under the assumption that $g$ is three times continuously differentiable and $r < 1$. The same rate is attained by Kristensen (2010) in the fixed-$T$ setting for twice continuously differentiable functions in the case without price or volatility jumps. Here, we obtain the same rate of convergence under the $L_1$ norm, and under the $L_p$ norm if $\sigma_t$ is continuous, in a setting with $d_T \to 0$ and $T \to \infty$. Our results also cover the case with active jumps, that is, the setting with $r \ge 1$.
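The construction in (4.5) and the plug-in proxy $\hat{I}^*_t(g)$ are straightforward to implement. The following Python sketch is our own illustration under simplifying assumptions (0-based indexing, so the local window for index $i$ uses returns $i+1,\ldots,i+k_t$ in the paper's 1-based notation, and the Riemann weight attached to index $i$ is approximated by the adjacent duration); the function names and default tuning values are hypothetical.

```python
import numpy as np

def spot_truncated_cov(returns, durations, k_t, alpha=3.0, varpi=0.49):
    """Spot covariance estimates as in (4.5): local averages over the next k_t
    duration-normalized outer products of returns, with returns exceeding the
    threshold alpha * d^varpi zeroed out (truncated). `returns` is (n, d)."""
    keep = np.linalg.norm(returns, axis=1) <= alpha * durations ** varpi
    outer = returns[:, :, None] * returns[:, None, :]            # (n, d, d)
    outer = outer * keep[:, None, None] / durations[:, None, None]
    n = len(returns)
    return np.array([outer[i:i + k_t].mean(axis=0) for i in range(n - k_t + 1)])

def truncated_functional_proxy(returns, durations, k_t, g, **kwargs):
    """Plug-in proxy for the integrated volatility functional int g(c_s) ds:
    a Riemann sum of g evaluated at the spot estimates."""
    c_hat = spot_truncated_cov(returns, durations, k_t, **kwargs)
    return sum(g(c) * durations[i] for i, c in enumerate(c_hat))
```

For instance, `truncated_functional_proxy(r, d, k, lambda c: c[0, 0] ** 2)` would give a jump-robust proxy for the integrated quarticity of the first asset, in line with Theorem 4.2's choice $k_t \asymp d_t^{-1/2}$ for the window length.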
4.4 Functionals of price jumps

In this subsection, we consider jump risk measures. The target variable of interest takes the form $J_t(g) \equiv \sum_{t-1<s\le t} g(\Delta X_s)$ for some function $g: \mathbb{R}^d \mapsto \mathbb{R}$. The proxy is the sample-analogue estimator $\hat{J}_t(g) \equiv \sum_{i=1}^{n_t} g(\Delta_{t,i}X)$. Basic examples include jump power variations such as the unnormalized realized skewness ($g(x) = x^3$), kurtosis ($g(x) = x^4$), coskewness ($g(x_1,x_2) = x_1^2 x_2$) and cokurtosis ($g(x_1,x_2) = x_1^2 x_2^2$). See Amaya et al. (2011) for applications of these risk factors.

Theorem 4.3. Let $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $g$ is twice continuously differentiable; (ii) for some $q_2 \ge q_1 \ge 3$, we have $\|\partial^j_x g(x)\| \le C(\|x\|^{q_1-j} + \|x\|^{q_2-j})$ for all $x \in \mathbb{R}^d$ and $j \in \{0,1,2\}$; (iii) Assumption HF holds with $k \ge \max\{2q_2,\, 4p/(2-p)\}$. Then $\|\hat{J}_t(g) - J_t(g)\|_p \le K d_t^{1/2}$ for some constant $K$ and all $t$.
Comment. The polynomial $\|x\|^{q_1-j}$ in condition (ii) bounds the growth of $g(\cdot)$ and its derivatives near zero. This condition ensures that the contribution of the continuous part of $X$ to the approximation error is dominated by that of the jump part of $X$; it can be relaxed at the cost of a more complicated expression for the rate. The polynomial $\|x\|^{q_2-j}$ controls the growth of $g(\cdot)$ near infinity so as to tame the effect of big jumps.
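The sample analogue $\hat{J}_t(g)$ requires no tuning parameters at all. As a minimal illustration of ours (the function name is hypothetical), in Python:

```python
import numpy as np

def jump_functional_proxy(returns, g):
    """Sample-analogue proxy J_t(g)-hat = sum_i g(Delta_{t,i} X) over one day
    of intraday returns. With g(x) = x**3 or x**4 this gives the unnormalized
    realized skewness or kurtosis, respectively."""
    return np.sum(g(returns))

# e.g., for a univariate intraday return vector r:
# skew = jump_functional_proxy(r, lambda x: x ** 3)
# kurt = jump_functional_proxy(r, lambda x: x ** 4)
```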
4.5 Additional special examples

We now consider a few special examples that are not covered by Theorems 4.1–4.3. In the first example, the true target is the daily quadratic covariation matrix $QV_t$ of the process $X$, that is, $QV_t \equiv \int_{t-1}^t c_s\,ds + \sum_{t-1<s\le t} \Delta X_s\, \Delta X_s^{\mathsf{T}}$. The associated proxy is the realized covariation matrix
\[
RV_t \equiv \sum_{i=1}^{n_t} \Delta_{t,i}X\, \Delta_{t,i}X^{\mathsf{T}}. \tag{4.6}
\]

Theorem 4.4. Let $p \in [1,2)$. Suppose that Assumption HF holds with $k \ge \max\{2p/(2-p),\,4\}$. Then $\|RV_t - QV_t\|_p \le K d_t^{1/2}$ for some $K$ and all $t$.

Comment. In the case without price jumps, Corradi and Distaso (2006) established a similar result; see their Proposition 1. Theorem 4.4 holds in the general case with price jumps, without any restriction on their activity.
Second, we consider the bipower variation of Barndorff-Nielsen and Shephard (2004b) for univariate $X$, defined as
\[
BV_t = \frac{n_t}{n_t - 1}\cdot\frac{\pi}{2} \sum_{i=1}^{n_t-1} \big|d_{t,i}^{-1/2}\Delta_{t,i}X\big|\,\big|d_{t,i+1}^{-1/2}\Delta_{t,i+1}X\big|\, d_{t,i}. \tag{4.7}
\]
This estimator serves as a proxy for the integrated variance $\int_{t-1}^t c_s\,ds$.

Theorem 4.5. Let $p$ and $p'$ be constants such that $1 \le p < p' \le 2$. Suppose that Assumption HF holds with $d = 1$ and $k \ge \max\{pp'/(p'-p),\,4\}$. Then, for some $K$ and all $t$: (a) $\|BV_t - \int_{t-1}^t c_s\,ds\|_p \le K d_t^{(1/r)\wedge(1/p') - 1/2}$; (b) if, in addition, $X$ is continuous, then $\|BV_t - \int_{t-1}^t c_s\,ds\|_p \le K d_t^{1/2}$.
Comment. Part (b) shows that, when $X$ is continuous, the approximation error of the bipower variation achieves the $\sqrt{n_t}$ rate. Part (a) provides a bound on the rate of convergence in the case with jumps; this rate is slower than in the continuous case. The constant $p'$ arises as a technical device in our proofs and should be chosen close to $p$ so that the bound in part (a) is sharper. We note that the rate in part (a) is sharper when $r$ is smaller. In particular, with $r \le 1$ and $p'$ close to 1, the bound in the jump case can be made arbitrarily close to $O(d_t^{1/2})$, at the cost of assuming higher-order moments to be finite (i.e., larger $k$). The slower rate in the jump case is in line with the known fact that the bipower variation estimator does not admit a central limit theorem when $X$ is discontinuous.⁹

⁹ See p. 313 in Jacod and Protter (2012) and Vetter (2010).
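A direct transcription of (4.7), as a minimal sketch of ours (the function name is hypothetical):

```python
import numpy as np

def bipower_variation(returns, durations):
    """Bipower variation (4.7) for one day of univariate intraday returns:
    products of adjacent duration-normalized absolute returns, scaled by
    (n/(n-1)) * (pi/2)."""
    n = len(returns)
    z = np.abs(returns) / np.sqrt(durations)      # |d^{-1/2} * return|
    return (n / (n - 1)) * (np.pi / 2) * np.sum(z[:-1] * z[1:] * durations[:-1])
```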
Finally, we consider the realized semivariance estimator proposed by Barndorff-Nielsen et al. (2010) for univariate $X$. Let $x_+$ and $x_-$ denote the positive and negative parts of $x \in \mathbb{R}$, respectively. The upside ($+$) and downside ($-$) realized semivariances are defined as $\widehat{SV}^{\pm}_t = \sum_{i=1}^{n_t} (\Delta_{t,i}X)_{\pm}^2$, which serve as proxies for $SV^{\pm}_t = \frac{1}{2}\int_{t-1}^t c_s\,ds + \sum_{t-1<s\le t} (\Delta X_s)_{\pm}^2$.

Theorem 4.6. Let $p$ and $p'$ be constants such that $1 \le p < p' \le 2$. Suppose that Assumption HF holds with $d = 1$, $r \in (0,1]$ and $k \ge \max\{pp'/(p'-p),\,4\}$. Then, for some $K$ and all $t$: (a) $\|\widehat{SV}^{\pm}_t - SV^{\pm}_t\|_p \le K d_t^{1/p' - 1/2}$; (b) if, in addition, $X$ is continuous, then $\|\widehat{SV}^{\pm}_t - SV^{\pm}_t\|_p \le K d_t^{1/2}$.
Comment. Part (b) shows that, when $X$ is continuous, the approximation error of the semivariance achieves the $\sqrt{n_t}$ rate, which agrees with the rate shown in Barndorff-Nielsen et al. (2010) under the fixed-span setting. Part (a) provides a bound on the rate of convergence in the case with jumps. The constant $p'$ arises as a technical device in the proof; one should choose it small so as to achieve a better rate, subject to the regularity condition $k \ge pp'/(p'-p)$. In particular, the rate can be made close to that of the continuous case when $p'$, and hence $p$, are close to 1. Barndorff-Nielsen et al. (2010) do not consider rate results in the case with price or volatility jumps.
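The semivariance proxies are one-liners; the following Python sketch (ours, with a hypothetical function name) computes both:

```python
import numpy as np

def realized_semivariance(returns):
    """Upside and downside realized semivariances: sums of squared positive
    and squared negative intraday returns, respectively."""
    up = np.sum(np.maximum(returns, 0.0) ** 2)
    down = np.sum(np.minimum(returns, 0.0) ** 2)
    return up, down
```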
4.6 Discussion: comparison with existing results
The high-frequency proxies studied in this section have been proposed in the literature; see Jacod
and Protter (2012) for a comprehensive review. However, the convergence rate results in this
section are distinct from existing work in two important dimensions.
First, with a few exceptions, prior work in the high-frequency literature mainly concerns an asymptotic setting in which the sampling interval goes to zero but the sample span $T$ is fixed. Indeed, this is the setting of Jacod and Protter (2012). In contrast, we consider a high-frequency long-span (double) asymptotic setting in which $T \to \infty$. Consequently, existing rate results developed in the fixed-$T$ setting cannot be directly invoked here. To be more specific, we note that in the fixed-$T$ setting it is routine to apply the localization argument (see Section 4.4.1 in Jacod and Protter (2012)), so that one can assume the underlying processes to be uniformly bounded without loss of generality. We cannot invoke localization in the current long-span setting; hence, we need to design a different set of regularity conditions (see Assumption HF) and use different technical arguments. We note that the $L_p$-bounds on the proxy errors derived in the above subsections hold for the original process $X$ at every $t$, instead of for a localized process (as is done in the fixed-$T$ setting).
Second, we are interested only in the rate of convergence, rather than in proving a central limit theorem. We thus provide direct proofs of the rates, which are, not surprisingly, notably different from proofs of central limit theorems for high-frequency estimators. Indeed, we derive rate results even in cases where no central limit theorem is known. Examples include the semivariance, the realized (co)skewness, and jump-robust estimators of integrated volatility functionals in the setting with active jumps.
The high-frequency long-span setting has also been used in the literature on proxy-based semiparametric estimation of stochastic volatility models; see Corradi and Distaso (2006), Todorov (2009), Todorov et al. (2011), Kanaya and Kristensen (2016) and Bandi and Renò (2016). Corradi et al. (2009, 2011) further consider the problem of nonparametric density estimation. These papers use proxies such as the realized variance, bipower variation, truncated variation, volatility Laplace transform and spot variance estimates, and establish asymptotic negligibility results for them. Our Theorems 4.1 and 4.2 cover these volatility functionals as special cases (although our econometric interest in forecast evaluation is very different from the prior work on estimation). Indeed, instead of focusing on specific volatility functionals (such as the integrated variance), we state our results for general transformations of the volatility, so that other risk measures, such as beta, correlation, idiosyncratic variance, volatility beta and eigenvalues, can be readily incorporated into our forecast evaluation setting.
The convergence rate results in this section do not cover high-frequency proxies that are robust to microstructure noise. The current literature on noise mainly focuses on the estimation of integrated variance and covariance. Corradi et al. (2011) have established convergence rate results under the $L_p$-norm for several popular noise-robust estimators in their Lemma 1. As mentioned in comment (iv) of Proposition 3.1, their results can be used to verify the high-level condition. Generalizing the results in this section to the noisy setting is beyond the scope of the current paper.
5 Extensions: additional forecast evaluation methods

In this section we discuss several extensions of our baseline result (Proposition 3.2). We first consider tests for instrumented conditional moment equalities, as in Giacomini and White (2006). We then consider stepwise evaluation procedures, including the multiple testing method of Romano and Wolf (2005) and the model confidence set of Hansen et al. (2011). Our purpose is twofold: one is to facilitate the application of these methods in the context of forecasting latent risk measures; the other is to demonstrate the generalizability of the framework presented above through known, but distinct, examples. The stepwise procedures (Romano and Wolf (2005), Hansen et al. (2011)) each involve some method-specific aspects that are not used elsewhere in the paper; hence, for the sake of readability, we only briefly discuss the results here and present the details (assumptions, algorithms and formal results) in Supplemental Appendix B to this paper.
5.1 Tests for instrumented conditional moment equalities

Many interesting forecast evaluation problems can be stated as a test of the conditional moment equality
\[
H^\dagger_0: \quad \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \,\big|\, \mathcal{H}_t\big] = 0, \quad \text{all } t \ge 0, \tag{5.1}
\]
where $\mathcal{H}_t$ is a sub-$\sigma$-field that represents the forecast evaluator's information set at day $t$, and $g(\cdot,\cdot): \mathcal{Y}\times\mathcal{Y}^k \mapsto \mathbb{R}^{\kappa_g}$ is a measurable function. Specific examples are given below. Let $h_t$ denote an $\mathcal{H}_t$-measurable $\mathbb{R}^{\kappa_h}$-valued data sequence that serves as an instrument. The conditional moment equality (5.1) implies the following unconditional moment equality:
\[
H^\dagger_{0,h}: \quad \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \otimes h_t\big] = 0, \quad \text{all } t \ge 0. \tag{5.2}
\]
We cast (5.2) in the setting of Section 2 by setting $f_{t+\tau}(y,\beta) \equiv g(y, F_{t+\tau}(\beta)) \otimes h_t$. The theory in Section 3 can then be applied without further change. In particular, Proposition 3.2 implies that the two-sided PEPA test (with $\chi = 0$) using the proxy has a valid asymptotic level under $H^\dagger_0$ and is consistent against the alternative
\[
H^\dagger_{2a,h}: \quad \liminf_{T\to\infty} \big\| \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \otimes h_t\big] \big\| > 0. \tag{5.3}
\]
Examples include tests of conditional predictive accuracy and tests of conditional forecast rationality. To simplify the discussion, we consider only scalar forecasts, so $\kappa_Y = 1$. Below, let $L(\cdot,\cdot): \mathcal{Y}\times\mathcal{Y} \mapsto \mathbb{R}$ be a loss function, with its first and second arguments being the target and the forecast, respectively.
Example 5.1: Giacomini and White (2006) consider two-sided tests of conditional equal predictive ability for two sequences of actual forecasts $F_{t+\tau} = (F_{1,t+\tau}, F_{2,t+\tau})$. The null hypothesis of interest is (5.1) with $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = L(Y^\dagger_{t+\tau}, F_{1,t+\tau}(\beta^*)) - L(Y^\dagger_{t+\tau}, F_{2,t+\tau}(\beta^*))$. Since Giacomini and White (2006) are concerned with actual forecasts, we set $\beta^*$ to be empty and treat $F_{t+\tau} = (F_{1,t+\tau}, F_{2,t+\tau})$ as an observable sequence. Primitive conditions for Assumptions A1 and A3 can be found in Giacomini and White (2006); they involve standard regularity conditions for weak convergence and HAC estimation. The test statistic is of Wald type and readily verifies Assumptions A2 and B2. As noted by Giacomini and White (2006), their test is consistent against the alternative (5.3), and its power generally depends on the choice of $h_t$.
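To make the mechanics concrete, the following Python sketch (our own illustration, not the authors' code; the function name and the Newey–West weighting choice are ours) forms the instrumented moment sequence $g_t \otimes h_t$ from proxy-based loss differentials and computes a Wald-type statistic with a Bartlett-kernel long-run variance estimate.

```python
import numpy as np

def gw_wald_stat(g, h, bandwidth):
    """Wald-type statistic for H0: E[g_t (kron) h_t] = 0, where g is a length-T
    vector of (proxy-based) loss differentials and h is a (T, kappa_h) matrix
    of instruments; approximately chi^2(kappa_h) under H0."""
    z = g[:, None] * h                         # moment sequence, (T, kappa_h)
    T, k = z.shape
    zbar = z.mean(axis=0)
    u = z - zbar
    omega = u.T @ u / T                        # lag-0 term of the LRV estimate
    for l in range(1, bandwidth + 1):          # Bartlett (Newey-West) weights
        gam = u[l:].T @ u[:-l] / T
        omega += (1.0 - l / (bandwidth + 1.0)) * (gam + gam.T)
    return T * zbar @ np.linalg.solve(omega, zbar)
```

With $h_t \equiv 1$ this reduces to a squared $t$-statistic for the unconditional equal-predictive-ability test.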
Example 5.2: The population forecast $F_{t+\tau}(\beta^*)$, which is also the actual forecast if $\beta^*$ is empty, is rational with respect to the information set $\mathcal{H}_t$ if it solves $\min_{F\in\mathcal{H}_t} \mathbb{E}[L(Y^\dagger_{t+\tau}, F)\,|\,\mathcal{H}_t]$ almost surely. Suppose that $L(y,F)$ is differentiable in $F$ for almost every $y \in \mathcal{Y}$ under the conditional law of $Y^\dagger_{t+\tau}$ given $\mathcal{H}_t$, with the partial derivative denoted by $\partial_F L(\cdot,\cdot)$. As shown in Patton and Timmermann (2010), a test of conditional rationality can be carried out by testing the first-order condition of the minimization problem, that is, by testing the null hypothesis (5.1) with $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = \partial_F L(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*))$. The variable $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*))$ is the generalized forecast error (Granger (1999)). In particular, when $L(y,F) = (F-y)^2/2$, that is, under quadratic loss, we have $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = F_{t+\tau}(\beta^*) - Y^\dagger_{t+\tau}$; in this case, the test of conditional rationality reduces to a test of conditional unbiasedness. Tests of unconditional rationality and unbiasedness are special cases of their conditional counterparts, with $\mathcal{H}_t$ being the degenerate information set.
5.2 Stepwise multiple testing procedure for superior predictive accuracy

In the context of forecast evaluation, the multiple testing problem of Romano and Wolf (2005) consists of $k$ individual testing problems of pairwise comparison for superior predictive accuracy. Let $F_{0,t+\tau}(\cdot)$ be the benchmark forecast model and let $f^{\dagger*}_{j,t+\tau} = L(Y^\dagger_{t+\tau}, F_{0,t+\tau}(\beta^*)) - L(Y^\dagger_{t+\tau}, F_{j,t+\tau}(\beta^*))$, $1 \le j \le k$, be the performance of forecast $j$ relative to the benchmark. As before, $f^{\dagger*}_{j,t+\tau}$ is defined using the true target variable $Y^\dagger_{t+\tau}$. We consider $k$ pairs of hypotheses (multiple SPA):
\[
H^\dagger_{j,0}: \ \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \le 0 \text{ for all } t \ge 1, \qquad \text{versus} \qquad H^\dagger_{j,a}: \ \liminf_{T\to\infty} \mathbb{E}[f^{\dagger*}_{j,T}] > 0, \qquad 1 \le j \le k. \tag{5.4}
\]
These hypotheses concern the true target variable and are stated in a way that allows for data heterogeneity.

Romano and Wolf (2005) propose a stepwise multiple (StepM) testing procedure that makes decisions on the individual testing problems while asymptotically controlling the familywise error rate (FWE), that is, the probability that any null hypothesis is falsely rejected. The StepM procedure
relies on the observability of the forecast target. By imposing the condition on proxy accuracy
(Assumption C), we can show that the StepM procedure, when applied to the proxy, asymptotically
controls the FWE for the hypotheses (5.4) that concern the latent target. The details are in
Supplemental Appendix B.1.
5.3 Model confidence sets

The model confidence set (MCS) proposed by Hansen et al. (2011), henceforth HLN, can be specialized to the forecast evaluation context to construct confidence sets for superior forecasts. To fix ideas, let $f^{\dagger*}_{j,t+\tau}$ denote the performance (e.g., the negative loss) of forecast $j$ with respect to the true target variable. The set of superior forecasts is defined as
\[
\mathcal{M}^\dagger \equiv \Big\{ j \in \{1,\ldots,k\} : \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \ge \mathbb{E}[f^{\dagger*}_{l,t+\tau}] \text{ for all } 1 \le l \le k \text{ and } t \ge 1 \Big\}.
\]
That is, $\mathcal{M}^\dagger$ collects the forecasts that are superior to the others when evaluated using the true target variable. Similarly, the set of asymptotically inferior forecasts is defined as
\[
\underline{\mathcal{M}}^\dagger \equiv \Big\{ j \in \{1,\ldots,k\} : \liminf_{T\to\infty} \big( \mathbb{E}[f^{\dagger*}_{l,t+\tau}] - \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \big) > 0 \text{ for some (and hence any) } l \in \mathcal{M}^\dagger \Big\}.
\]
We are interested in constructing a sequence $\widehat{\mathcal{M}}_{T,1-\alpha}$ of MCS's for $\mathcal{M}^\dagger$ with nominal level $1-\alpha$, so that
\[
\liminf_{T\to\infty} \mathbb{P}\big( \mathcal{M}^\dagger \subseteq \widehat{\mathcal{M}}_{T,1-\alpha} \big) \ge 1 - \alpha, \qquad \mathbb{P}\big( \widehat{\mathcal{M}}_{T,1-\alpha} \cap \underline{\mathcal{M}}^\dagger = \emptyset \big) \to 1. \tag{5.5}
\]
That is, $\widehat{\mathcal{M}}_{T,1-\alpha}$ has valid (pointwise) asymptotic coverage and asymptotic power one against fixed alternatives.
HLN's theory for the MCS is not directly applicable here due to the latency of the forecast target. Following the general strategy of the current paper, we propose a feasible version of HLN's algorithm that uses the proxy in place of the associated latent target. Under Assumption C, we can show that this feasible version achieves (5.5). The details are in Supplemental Appendix B.2.
6 Monte Carlo analysis
6.1 Simulation designs
We consider three simulation designs which are intended to cover some of the most common and
important applications of high-frequency data in forecasting: (A) forecasting univariate volatility
in the absence of price jumps; (B) forecasting univariate volatility in the presence of price jumps;
and (C) forecasting correlation. In each design, we consider the EPA hypotheses, equation (2.11),
under the quadratic loss for two competing one-day-ahead forecasts using the method of Giacomini
and White (2006), with the function $\varphi(\cdot,\cdot)$ corresponding to the $t$-statistic. In addition, the proxies used below satisfy (3.2) with $\theta = 1/2$ and, in view of comments (i) and (ii) of Proposition 3.1, Assumption C is implied by $T\Delta \to 0$, where $\Delta$ is the sampling interval.
Each forecast is formed using a rolling scheme with window size $R \in \{250, 500, 1000\}$ days. The prediction sample contains $P \in \{500, 1000, 2000\}$ days. The high-frequency data are simulated using the Euler scheme at every second, and proxies are computed using sampling intervals $\Delta = 5$ seconds, 1 minute, 5 minutes, or 30 minutes. As on the New York Stock Exchange, each day is assumed to contain 6.5 trading hours. There are 1,000 Monte Carlo trials in each experiment and all tests are at the 5% nominal level.
We now describe the simulation designs. Simulation A concerns forecasting the logarithm of the quadratic variation of a continuous price process. Following one of the simulation designs in Andersen et al. (2005), we simulate the logarithmic price $X_t$ and the spot variance process $\sigma_t^2$ according to the following stochastic differential equations:
\[
\begin{cases}
dX_t = 0.0314\,dt + \sigma_t\big({-0.5760}\,dW_{1,t} + \sqrt{1 - 0.5760^2}\,dW_{2,t}\big) + dJ_t, \\
d\log\sigma_t^2 = -0.0136\,(0.8382 + \log\sigma_t^2)\,dt + 0.1148\,dW_{1,t},
\end{cases} \tag{6.1}
\]
where $W_1$ and $W_2$ are independent Brownian motions and the jump process $J$ is set to be identically zero. The values of the parameters of this process are taken from Andersen et al. (2005). The target variable to be forecast is $\log IV_t \equiv \log\int_{t-1}^t \sigma_s^2\,ds$ and the proxy is $\log RV^{\Delta}_t$, where $RV^{\Delta}_t$ is defined by (4.6) for data sampled at $\Delta = 5$ seconds, 1 minute, 5 minutes, or 30 minutes.
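For readers who wish to replicate this design, the following Python sketch (ours; the function name, seeding and loop structure are illustrative assumptions, not the paper's code) applies a second-by-second Euler discretization of (6.1) over one trading day.

```python
import numpy as np

def simulate_day_A(log_sig2, n_steps=23400, rng=None):
    """Euler discretization of (6.1) over one 6.5-hour day (one step per
    second), with the jump term J set to zero. Returns the intraday log-price
    increments and the end-of-day log spot variance. Time unit: one day."""
    rng = rng if rng is not None else np.random.default_rng()
    dt = 1.0 / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((2, n_steps))
    dx = np.empty(n_steps)
    for i in range(n_steps):
        sig = np.exp(0.5 * log_sig2)
        dx[i] = (0.0314 * dt
                 + sig * (-0.5760 * dW[0, i] + np.sqrt(1.0 - 0.5760**2) * dW[1, i]))
        log_sig2 += -0.0136 * (0.8382 + log_sig2) * dt + 0.1148 * dW[0, i]
    return dx, log_sig2
```

Using the same Brownian increment $dW_{1}$ in both equations reproduces the leverage-type dependence between returns and volatility built into (6.1).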
The first forecast model in Simulation A is a GARCH(1,1) model (Bollerslev (1986)) estimated by quasi maximum likelihood on daily returns:
\[
\text{Model A1:} \qquad
\begin{cases}
r_t = X_t - X_{t-1} = \sigma_t \varepsilon_t, \qquad \varepsilon_t \,|\, \mathcal{F}_{t-1} \sim \mathcal{N}(0,1), \\
\sigma_t^2 = \omega + \beta\sigma_{t-1}^2 + \alpha r_{t-1}^2.
\end{cases} \tag{6.2}
\]
The second model is a heterogeneous autoregressive (HAR) model (Corsi (2009)) for $RV^{5\min}_t$, estimated by ordinary least squares:
\[
\text{Model A2:} \qquad RV^{5\min}_t = \beta_0 + \beta_1 RV^{5\min}_{t-1} + \beta_2 \sum_{k=1}^{5} RV^{5\min}_{t-k} + \beta_3 \sum_{k=1}^{22} RV^{5\min}_{t-k} + e_t. \tag{6.3}
\]
The logarithm of the one-day-ahead forecast of $\sigma^2_{t+1}$ (resp. $RV^{5\min}_{t+1}$) from the GARCH (resp. HAR) model is taken as a forecast of $\log IV_{t+1}$.
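Estimating Model A2 amounts to one OLS regression per rolling window. A minimal Python sketch of ours (the function name is hypothetical; the regressors follow (6.3) literally, i.e., sums rather than averages of lags):

```python
import numpy as np

def har_fit_forecast(rv):
    """OLS fit of the HAR model (6.3) on a window of daily realized variances
    (most recent observation last) and the implied one-day-ahead forecast.
    Regressors: intercept, RV_{t-1}, and the sums of lags 1-5 and 1-22."""
    X = np.array([[1.0, rv[t - 1], rv[t - 5:t].sum(), rv[t - 22:t].sum()]
                  for t in range(22, len(rv))])
    y = rv[22:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    x_next = np.array([1.0, rv[-1], rv[-5:].sum(), rv[-22:].sum()])
    return beta, x_next @ beta
```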
In Simulation B we again set the forecast target to be $\log IV_t$, but consider a more complicated setting with price jumps. We simulate $X_t$ and $\sigma_t^2$ according to (6.1) and, following Huang and Tauchen (2005), we specify $J_t$ as a compound Poisson process with intensity $\lambda = 0.05$ per day and jump size distribution $\mathcal{N}(0.2, 1.4^2)$. (These parameter values are in the middle of the ranges considered by Huang and Tauchen (2005).) The proxy for $IV_t$ is the bipower variation $BV^{\Delta}_t$, where $BV^{\Delta}_t$ is defined by (4.7) for data sampled with observation mesh $\Delta$.
The competing forecast sequences in Simulation B are as follows. The first forecast is based on a simple random walk model, applied to the 5-minute bipower variation $BV^{5\min}_t$:
\[
\text{Model B1:} \qquad BV^{5\min}_t = BV^{5\min}_{t-1} + \varepsilon_t, \quad \text{where } \mathbb{E}[\varepsilon_t \,|\, \mathcal{F}_{t-1}] = 0. \tag{6.4}
\]
The second model is a HAR model for $BV^{1\min}_t$:
\[
\text{Model B2:} \qquad BV^{1\min}_t = \beta_0 + \beta_1 BV^{1\min}_{t-1} + \beta_2 \sum_{k=1}^{5} BV^{1\min}_{t-k} + \beta_3 \sum_{k=1}^{22} BV^{1\min}_{t-k} + e_t. \tag{6.5}
\]
The logarithm of the one-day-ahead forecast of $BV^{5\min}_{t+1}$ (resp. $BV^{1\min}_{t+1}$) from the random walk (resp. HAR) model is taken as a forecast of $\log IV_{t+1}$.
Finally, we consider correlation forecasting in Simulation C. This simulation exercise is of particular interest as our empirical application in Section 7 concerns a similar forecasting problem. We adopt the bivariate stochastic volatility model used in the simulation study of Barndorff-Nielsen and Shephard (2004a). Let $W_t = (W_{1,t}, W_{2,t})$. The bivariate logarithmic price process $X_t$ is given by
\[
dX_t = \sigma_t\,dW_t, \qquad \sigma_t\sigma_t^{\mathsf{T}} = \begin{pmatrix} \sigma_{1,t}^2 & \rho_t\sigma_{1,t}\sigma_{2,t} \\ \bullet & \sigma_{2,t}^2 \end{pmatrix}.
\]
Let $B_{j,t}$, $j = 1,\ldots,4$, be Brownian motions that are independent of each other and of $W_t$. The process $\sigma_{1,t}^2$ follows a two-factor stochastic volatility model: $\sigma_{1,t}^2 = v_t + \bar{v}_t$, where
\[
\begin{cases}
dv_t = -0.0429\,(v_t - 0.1110)\,dt + 0.2788\,\sqrt{v_t}\,dB_{1,t}, \\
d\bar{v}_t = -3.7400\,(\bar{v}_t - 0.3980)\,dt + 2.6028\,\sqrt{\bar{v}_t}\,dB_{2,t}.
\end{cases} \tag{6.6}
\]
The process $\sigma_{2,t}^2$ is specified as a GARCH diffusion:
\[
d\sigma_{2,t}^2 = -0.0350\,(\sigma_{2,t}^2 - 0.6360)\,dt + 0.2360\,\sigma_{2,t}^2\,dB_{3,t}. \tag{6.7}
\]
The correlation process $\rho_t$ is specified as a GARCH diffusion for the inverse Fisher transformation of the correlation:
\[
\rho_t = \frac{e^{2y_t} - 1}{e^{2y_t} + 1}, \qquad dy_t = -0.0300\,(y_t - 0.6400)\,dt + 0.1180\,y_t\,dB_{4,t}. \tag{6.8}
\]
(The parameter values used here are the same as in Barndorff-Nielsen and Shephard (2004a), although our notation differs slightly.) In this simulation design we take the target variable to be the daily integrated correlation, defined as
\[
IC_t \equiv \frac{QV_{12,t}}{\sqrt{QV_{11,t}}\sqrt{QV_{22,t}}}. \tag{6.9}
\]
The proxy is given by the realized correlation computed using returns sampled at frequency $\Delta$:
\[
RC^{\Delta}_t \equiv \frac{RV^{\Delta}_{12,t}}{\sqrt{RV^{\Delta}_{11,t}}\sqrt{RV^{\Delta}_{22,t}}}. \tag{6.10}
\]
The first forecasting model is a GARCH(1,1)–DCC(1,1) model (Engle (2002)) applied to daily returns $r_t = X_t - X_{t-1}$:
\[
\text{Model C1:} \qquad
\begin{cases}
r_{j,t} = \sigma_{j,t}\varepsilon_{j,t}, \qquad \sigma_{j,t}^2 = \omega_j + \beta_j\sigma_{j,t-1}^2 + \alpha_j r_{j,t-1}^2, \quad j = 1,2, \\[4pt]
\rho^{\varepsilon}_t \equiv \mathbb{E}[\varepsilon_{1,t}\varepsilon_{2,t} \,|\, \mathcal{F}_{t-1}] = \dfrac{Q_{12,t}}{\sqrt{Q_{11,t} Q_{22,t}}}, \qquad Q_t = \begin{pmatrix} Q_{11,t} & Q_{12,t} \\ \bullet & Q_{22,t} \end{pmatrix}, \\[4pt]
Q_t = \bar{Q}\,(1 - a - b) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}}, \qquad \varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t}).
\end{cases} \tag{6.11}
\]
The forecast of $IC_{t+1}$ is the one-day-ahead forecast of $\rho^{\varepsilon}_{t+1}$. The second forecasting model extends Model C1 by adding the lagged 30-minute realized correlation to the evolution of $Q_t$:
\[
\text{Model C2:} \qquad Q_t = \bar{Q}\,(1 - a - b - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + g\,RC^{30\min}_{t-1}. \tag{6.12}
\]
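The correlation forecast is obtained by iterating the $Q_t$ recursion forward one step. A minimal Python sketch of ours (the function name and the initialization of $Q$ at $\bar{Q}$ are illustrative assumptions) covering both (6.11) and (6.12):

```python
import numpy as np

def dcc_correlation_forecast(eps, a, b, Q_bar, g=0.0, rc=None):
    """One-day-ahead DCC correlation forecast from standardized residuals
    `eps` (T x 2), iterating the recursion in (6.11); with g > 0 and a series
    of lagged realized correlation matrices `rc` (T x 2 x 2), this follows the
    augmented recursion (6.12) of Model C2 instead."""
    Q = Q_bar.copy()
    for t in range(len(eps)):
        Q = Q_bar * (1.0 - a - b - g) + b * Q + a * np.outer(eps[t], eps[t])
        if g > 0.0:
            Q += g * rc[t]
    return Q[0, 1] / np.sqrt(Q[0, 0] * Q[1, 1])
```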
In each simulation, we set the evaluation function $f(\cdot)$ to be the loss of Model 1 less that of Model 2 and conduct the one-sided EPA test (see equation (2.11)). We note that the competing forecasts are not engineered to have the same mean-squared error (MSE). Therefore, for the purpose of examining the size properties of the tests, the hypotheses to be imposed are those in (2.11) with $\chi$ equal to the population MSE of Model 1 less that of Model 2. We remind the reader that the population MSE is computed using the true latent target variable, whereas the feasible tests are conducted using proxies. The goal of this simulation study is to determine whether our feasible tests have finite-sample rejection rates similar to those of the infeasible tests (i.e., tests based on true target variables) and, moreover, whether these tests have satisfactory size properties under the true null hypothesis.¹⁰

                                    GW–NW                           GW–KV
Proxy $RV^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.07     0.07      0.06        0.01     0.01      0.01
  ∆ = 30 min                  0.07     0.06      0.06        0.01     0.01      0.01
R = 500
  True $Y^{\dagger}_{t+1}$    0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 5 sec                   0.08     0.08      0.06        0.01     0.02      0.01
  ∆ = 1 min                   0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 5 min                   0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 30 min                  0.06     0.07      0.05        0.01     0.02      0.01
R = 1000
  True $Y^{\dagger}_{t+1}$    0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 5 sec                   0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 1 min                   0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 5 min                   0.08     0.07      0.06        0.03     0.01      0.01
  ∆ = 30 min                  0.07     0.06      0.05        0.02     0.01      0.01

Table 1: Giacomini–White test rejection frequencies for Simulation A. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
                                    GW–NW                           GW–KV
Proxy $BV^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.08     0.09      0.07        0.02     0.01      0.01
  ∆ = 5 sec                   0.08     0.09      0.07        0.02     0.01      0.01
  ∆ = 1 min                   0.08     0.09      0.06        0.02     0.01      0.01
  ∆ = 5 min                   0.07     0.07      0.06        0.02     0.01      0.01
  ∆ = 30 min                  0.04     0.04      0.04        0.01     0.01      0.01
R = 500
  True $Y^{\dagger}_{t+1}$    0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.08     0.07      0.05        0.01     0.02      0.02
  ∆ = 30 min                  0.04     0.03      0.03        0.01     0.01      0.01
R = 1000
  True $Y^{\dagger}_{t+1}$    0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.06     0.07      0.07        0.02     0.01      0.01
  ∆ = 30 min                  0.03     0.03      0.04        0.01     0.01      0.01

Table 2: Giacomini–White test rejection frequencies for Simulation B. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
6.2 Results
The results for Simulations A, B and C are presented in Tables 1, 2 and 3, respectively. In the top
row of each panel are the results for the infeasible tests that are implemented with the true target
variable, and in the other rows are the results for feasible tests based on proxies.
¹⁰ Due to the complexity of the data generating processes and volatility models we consider, computing the population MSE analytically for each forecast sequence is difficult. We instead compute the population MSE by simulation, using a Monte Carlo sample of 500,000 days. Similarly, it is difficult to construct data generating processes under which two forecast sequences have identical population MSE, which motivates our considering a nonzero $\chi$ in the null hypothesis, equation (2.11), of our simulation design. Doing so enables us to use realistic data generating processes and reasonably sophisticated forecasting models that mimic those used in prior empirical work.
                                    GW–NW                           GW–KV
Proxy $RC^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.25     0.22      0.21        0.07     0.04      0.04
  ∆ = 5 sec                   0.25     0.22      0.21        0.07     0.04      0.04
  ∆ = 1 min                   0.25     0.23      0.20        0.07     0.04      0.04
  ∆ = 5 min                   0.24     0.23      0.20        0.06     0.05      0.04
  ∆ = 30 min                  0.24     0.21      0.19        0.07     0.05      0.04
R = 500
  True $Y^{\dagger}_{t+1}$    0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 5 sec                   0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 1 min                   0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 5 min                   0.29     0.28      0.24        0.12     0.06      0.05
  ∆ = 30 min                  0.30     0.26      0.23        0.12     0.07      0.05
R = 1000
  True $Y^{\dagger}_{t+1}$    0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 5 sec                   0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 1 min                   0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 5 min                   0.27     0.23      0.19        0.14     0.07      0.06
  ∆ = 30 min                  0.27     0.23      0.19        0.14     0.07      0.06

Table 3: Giacomini–White test rejection frequencies for Simulation C. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
We consider two implementations of the Giacomini–White (GW) test: the first is based on a Newey–West estimate
of the long-run variance and critical values from the standard normal distribution. The second is
based on the “fixed-b” asymptotics of Kiefer and Vogelsang (2005), using the Bartlett kernel. We
denote these two implementations as NW and KV, respectively. The KV method is of interest here
because of the well-known size distortion problem for inference procedures based on the standard
HAC estimation theory; see Müller (2012) and references therein. We set the truncation lag to be
$3P^{1/3}$ for NW and $0.5P$ for KV.¹¹

¹¹ In the KV case, the one-sided critical value for the $t$-statistic at the 5% level is 2.774 when the truncation lag is $0.5P$; see Table 1 in Kiefer and Vogelsang (2005).
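Both implementations use the same $t$-statistic form and differ only in the bandwidth and the reference distribution. The following Python sketch (ours; the function name is hypothetical, and we take the 2.774 critical value from Table 1 of Kiefer and Vogelsang (2005) as cited above) illustrates the KV variant.

```python
import numpy as np

def kv_tstat(d, b=0.5):
    """t-statistic for the mean of a loss-differential series `d`, using a
    Bartlett-kernel long-run variance with truncation lag b*P. Under fixed-b
    asymptotics, compare with 2.774 for a one-sided 5% test when b = 0.5."""
    P = len(d)
    u = d - d.mean()
    M = int(b * P)
    lrv = u @ u / P
    for l in range(1, M + 1):
        lrv += 2.0 * (1.0 - l / (M + 1.0)) * (u[l:] @ u[:-l]) / P
    return np.sqrt(P) * d.mean() / np.sqrt(lrv)
```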
Overall, we find that the rejection rates of the feasible tests based on proxies are generally
very close to those of the infeasible tests using the true forecast target, and thus that our negligibility result holds well in a range of realistic simulation scenarios. The standard GW–NW method has reasonable size control in Simulations A and B, but has nontrivial size distortion in Simulation C.¹² This size distortion occurs even when the true target variable is used, and is not exacerbated by the use of proxies. The GW–KV method has better size control in these simulation scenarios, being somewhat conservative in Simulations A and B, and having good rejection rates in Simulation C for $P = 1000$ and $P = 2000$. Supplemental Appendix S.C presents results confirming that these findings are robust to the choice of the truncation lag in the estimation of the long-run variance, along with some additional results on the disagreement between the feasible and the infeasible tests. We caution, though, that our simulation study is clearly not exhaustive (for example, we focus on one-step-ahead forecasts and use the quadratic loss function). Future researchers may find it advisable to conduct further simulations if their application differs greatly from those considered here.
7 Application: Comparing correlation forecasts
7.1 Data and model description
We now illustrate the use of our method with an empirical application to forecasting the integrated correlation between two assets. Correlation forecasts are critical in financial decisions such as portfolio construction and risk management; see Engle (2008) for example. Standard forecast evaluation methods do not directly apply here due to the latency of the target variable, and methods that rely on an unbiased proxy for the target variable (e.g., Hansen and Lunde (2006) and Patton (2011)) cannot be used either, due to the absence of any such proxy.¹³ This is thus an ideal example with which to illustrate the usefulness of the method proposed in the current paper.
¹² The reason for the large size distortion of the NW method in Simulation C appears to be the relatively high persistence in the quadratic loss differentials. In Simulations A and B, the autocorrelations of the loss differential sequence essentially vanish by about the 50th and 30th lags, respectively, whereas in Simulation C they remain non-negligible even at the 100th lag.

¹³ When based on relatively sparse sampling frequencies, it may be considered plausible that the realized covariance matrix is finite-sample unbiased for the true quadratic covariation matrix; however, as the correlation involves a ratio of the elements of this matrix, this property is lost.
Our sample consists of two pairs of stocks: (i) Procter & Gamble (NYSE: PG) and General Electric (NYSE: GE), and (ii) Microsoft (NASDAQ: MSFT) and Apple (NASDAQ: AAPL). The sample period ranges from January 2000 to December 2010, comprising 2,733 trading days, and we obtain our data from the TAQ database. As in Simulation C of the previous section, we take the proxy to be the realized correlation $RC^{\Delta}_t$ formed using intraday returns with sampling interval $\Delta$.¹⁴ We consider $\Delta$ ranging from 1 minute to 130 minutes, which covers the sampling intervals typically employed in empirical work.
We compare four forecasting models, all of which share the following specification for the conditional mean and variance: for stock $i$, $i = 1$ or 2, the daily logarithmic return $r_{it}$ follows
\[
\begin{cases}
r_{it} = \mu_i + \sigma_{it}\varepsilon_{it}, \\
\sigma_{it}^2 = \omega_i + \beta_i\sigma_{i,t-1}^2 + \alpha_i\sigma_{i,t-1}^2\varepsilon_{i,t-1}^2 + \delta_i\sigma_{i,t-1}^2\varepsilon_{i,t-1}^2\, 1_{\{\varepsilon_{i,t-1} \le 0\}} + \gamma_i RV^{1\min}_{i,t-1}.
\end{cases} \tag{7.1}
\]
That is, we assume a constant conditional mean, and a GJR-GARCH (Glosten et al. (1993)) volatility model augmented with the lagged one-minute RV.
The baseline correlation model is Engle's (2002) DCC model as considered in Simulation C; see equation (6.11). The other three models are extensions of the baseline model. The first extension is the asymmetric DCC (A-DCC) model of Cappiello et al. (2006), which is designed to capture asymmetric reactions of correlation to the sign of past shocks:
\[
Q_t = \bar{Q}\,(1 - a - b - d) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + d\,\eta_{t-1}\eta_{t-1}^{\mathsf{T}}, \quad \text{where } \eta_t \equiv \varepsilon_t \odot 1_{\{\varepsilon_t \le 0\}}. \tag{7.2}
\]
The second extension (R-DCC) augments the DCC model with the 65-minute realized correlation matrix. This extension is in the same spirit as Noureldin et al. (2012), and is designed to exploit high-frequency information about the current correlation:
\[
Q_t = \bar{Q}\,(1 - a - b - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + g\,RC^{65\min}_{t-1}. \tag{7.3}
\]
The third extension (AR-DCC) encompasses both A-DCC and R-DCC with the specification
\[
Q_t = \bar{Q}\,(1 - a - b - d - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + d\,\eta_{t-1}\eta_{t-1}^{\mathsf{T}} + g\,RC^{65\min}_{t-1}. \tag{7.4}
\]
We conduct pairwise comparisons, under the quadratic loss function, of forecasts based on these four models, which include both nested and nonnested cases. We use the framework of Giacomini and White (2006), so that nested and nonnested models can be treated in a unified manner. Each one-day-ahead forecast is constructed in a rolling scheme with fixed estimation sample size $R = 1500$ and prediction sample size $P = 1233$.

¹⁴ For all sampling intervals we use the "subsample-and-average" estimator of Zhang et al. (2005), with five subsamples when $\Delta = 5$ seconds, and with ten equally spaced subsamples for the other choices of sampling frequency.
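The subsample-and-average construction of the realized correlation proxy (see footnote 14) is simple to implement. The following Python sketch is a minimal illustration of ours (the function name is hypothetical, and for simplicity the subgrid offsets are taken to be consecutive):

```python
import numpy as np

def subsampled_realized_correlation(log_prices, step, n_sub=10):
    """Subsample-and-average realized correlation for an (m x 2) matrix of
    intraday log prices: realized covariance matrices are computed on n_sub
    shifted subgrids (every `step`-th observation) and averaged before the
    correlation is formed."""
    cov = np.zeros((2, 2))
    for s in range(n_sub):
        r = np.diff(log_prices[s::step], axis=0)   # returns on the shifted grid
        cov += r.T @ r
    cov /= n_sub
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

Averaging over shifted subgrids before taking the ratio uses more of the data at a given sampling interval than a single sparse grid would.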
7.2 Results
Table 4 presents results for comparisons of each of the three generalized models against the baseline DCC model, using both the GW–NW and GW–KV tests; this amounts to conducting a one-sided $t$-test of (2.11) with $\chi$ set to zero. The results in the first and fourth columns indicate that the A-DCC model does not improve predictive accuracy relative to the baseline DCC model. The GW–KV tests reveal that the loss of the A-DCC forecast is not statistically different from that of DCC. The GW–NW tests, on the other hand, report statistically significant outperformance of the A-DCC model relative to the DCC for some proxies; however, this finding should be interpreted with care, as the GW–NW test was found to over-reject in finite samples in Simulation C of the previous section. Interestingly, for the MSFT–AAPL pair, the more general A-DCC model actually underperforms the baseline model, though the difference is not significant. The next columns reveal that the R-DCC model outperforms the DCC model, particularly for the MSFT–AAPL pair, where the finding is highly significant and robust to the choice of proxy. Finally, we find that the AR-DCC model outperforms the DCC model; however, the statistical significance of this outperformance depends on the testing method. In view of the over-rejection problem of the GW–NW test, we conclude conservatively that the AR-DCC is not significantly better than the baseline DCC model.
Table 5 presents results from pairwise comparisons among the generalized models. Consistent
with the results in Table 4, we find that the A-DCC forecast underperforms those of R-DCC and
AR-DCC, and significantly so for MSFT–AAPL. The comparison between R-DCC and AR-DCC
yields mixed, but statistically insignificant, findings across the two pairs of stocks.
Overall, we find that augmenting the DCC model with lagged realized correlation significantly
improves its predictive ability, while adding an asymmetric term to the DCC model generally does
not improve, and sometimes hurts, its forecasting performance. These findings are robust to the
choice of proxy.
                                       GW–NW                                 GW–KV
Proxy $RC^{\Delta}_{t+1}$   DCC vs    DCC vs    DCC vs      DCC vs    DCC vs    DCC vs
                            A-DCC     R-DCC     AR-DCC      A-DCC     R-DCC     AR-DCC

Panel A. PG–GE Correlation
  ∆ = 1 min                 1.603     3.130*    2.929*      1.947     1.626     1.745
  ∆ = 5 min                 1.570     2.932*    2.724*      1.845     2.040     2.099
  ∆ = 15 min                1.892*    2.389*    2.373*      2.047     1.945     1.962
  ∆ = 30 min                2.177*    1.990*    2.206*      2.246     1.529     1.679
  ∆ = 65 min                1.927*    0.838     1.089       1.642     0.828     0.947
  ∆ = 130 min               0.805     0.835     0.688       0.850     0.830     0.655

Panel B. MSFT–AAPL Correlation
  ∆ = 1 min                -0.916     2.647*    1.968*     -1.024     4.405*    3.712*
  ∆ = 5 min                -1.394     3.566*    2.310*     -1.156     4.357*    2.234
  ∆ = 15 min               -1.391     3.069*    1.927*     -1.195     4.279*    2.116
  ∆ = 30 min               -1.177     3.011*    2.229*     -1.055     3.948*    2.289
  ∆ = 65 min               -1.169     2.634*    2.071*     -1.168     3.506*    2.222
  ∆ = 130 min              -1.068     1.825*    1.280      -1.243     3.342*    1.847

Table 4: T-statistics for out-of-sample forecast comparisons of correlation forecasting models. In the comparison of "A vs B," a positive t-statistic indicates that B outperforms A. The 95% critical values for one-sided tests of the null are 1.645 (GW–NW, in the left panel) and 2.774 (GW–KV, in the right panel). Test statistics that are greater than the critical value are marked with an asterisk.
8 Concluding remarks
This paper proposes a simple but general framework for the problem of testing predictive ability when the target variable is unobservable. We consider an array of popular forecast evaluation methods, including, for example, Diebold and Mariano (1995), West (1996), White (2000), Giacomini and White (2006) and McCracken (2007), in cases where the latent target variable is replaced by a proxy computed using high-frequency (intraday) data. We derive convergence rate results for general classes of high-frequency-based estimators of volatility and jump functionals, which cover a majority of existing estimators as special cases, such as realized (co)variance, truncated (co)variation, bipower variation, realized correlation, realized beta, jump power variation, realized semivariance, realized Laplace transform, and realized skewness and kurtosis. Based on these results, we provide conditions under which the moments that define the proxy hypotheses converge sufficiently quickly to their counterparts under the true hypotheses, so that the feasible tests based on
                                       GW–NW                                 GW–KV
Proxy $RC^{\Delta}_{t+1}$   A-DCC vs  A-DCC vs  R-DCC vs    A-DCC vs  A-DCC vs  R-DCC vs
                            R-DCC     AR-DCC    AR-DCC      R-DCC     AR-DCC    AR-DCC

Panel A. PG–GE Correlation
  ∆ = 1 min                 2.231*    2.718*    0.542       1.231     1.426     0.762
  ∆ = 5 min                 2.122*    2.430*    0.355       1.627     1.819     0.517
  ∆ = 15 min                1.564     1.969*    0.888       1.470     1.703     1.000
  ∆ = 30 min                0.936     1.561     1.282       0.881     1.271     0.486
  ∆ = 65 min               -0.110     0.391     1.039      -0.153     0.413     0.973
  ∆ = 130 min               0.503     0.474    -0.024       0.688     0.516    -0.031

Panel B. MSFT–AAPL Correlation
  ∆ = 1 min                 3.110*    3.365*   -1.239       3.134*    3.657*   -1.580
  ∆ = 5 min                 4.005*    4.453*   -1.554       4.506*    6.323*   -1.586
  ∆ = 15 min                3.616*    4.053*   -1.307       4.044*    5.449*   -1.441
  ∆ = 30 min                3.345*    3.770*   -0.834       4.635*    7.284*   -0.882
  ∆ = 65 min                2.999*    3.215*   -0.542       6.059*    7.868*   -0.635
  ∆ = 130 min               2.223*    2.357*   -1.039       3.392*    5.061*   -1.582

Table 5: T-statistics for out-of-sample forecast comparisons of correlation forecasting models. In the comparison of "A vs B," a positive t-statistic indicates that B outperforms A. The 95% critical values for one-sided tests of the null are 1.645 (GW–NW, in the left panel) and 2.774 (GW–KV, in the right panel). Test statistics that are greater than the critical value are marked with an asterisk.
proxies are valid under not only the former, but also the latter. In so doing, we bridge the vast
literature on forecast evaluation and the burgeoning literature on high-frequency time series. The
theoretical framework is structured in a way to facilitate further extensions in both directions.
We verify that the asymptotic results perform well in three distinct and realistically calibrated
Monte Carlo studies, though it is possible that finite-sample adjustments may be employed in
specific applications for further improvement. The results in this paper may serve as a general
benchmark for future work along this line. Our empirical application uses these results to reveal
the out-of-sample predictive gains from augmenting the widely-used DCC model (Engle (2002))
with high-frequency estimates of correlation.
References
Aït-Sahalia, Y. and J. Jacod (2009). Testing for jumps in a discretely observed process. Annals of Statistics 37, 184–222.

Aït-Sahalia, Y., P. A. Mykland, and L. Zhang (2005). How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.

Amaya, D., P. Christoffersen, K. Jacobs, and A. Vasquez (2011). Do realized skewness and kurtosis predict the cross-section of equity returns? Technical report, University of Toronto.

Andersen, T. G. and T. Bollerslev (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.

Andersen, T. G., T. Bollerslev, P. Christoffersen, and F. X. Diebold (2006). Volatility and correlation forecasting. In G. Elliott, C. W. J. Granger, and A. Timmermann (Eds.), Handbook of Economic Forecasting, Volume 1. Oxford: Elsevier.

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003). Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625.

Andersen, T. G., T. Bollerslev, and N. Meddahi (2005). Correcting the errors: Volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica 73 (1), 279–296.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.

Bandi, F. and R. Renò (2016). Price and volatility co-jumps. Journal of Financial Economics 119, 107–146.

Bandi, F. M. and J. R. Russell (2008). Microstructure noise, realized volatility and optimal sampling. Review of Economic Studies 75, 339–369.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76 (6), 1481–1536.

Barndorff-Nielsen, O. E., S. Kinnebrouck, and N. Shephard (2010). Measuring downside risk: Realised semivariance. In T. Bollerslev, J. Russell, and M. Watson (Eds.), Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle, pp. 117–136. Oxford University Press.

Barndorff-Nielsen, O. E. and N. Shephard (2004a). Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica 72 (3), 885–925.

Barndorff-Nielsen, O. E. and N. Shephard (2004b). Power and bipower variation with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2, 1–48.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

Bollerslev, T. and H. Zhou (2002). Estimating stochastic volatility diffusions using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65.

Cappiello, L., R. F. Engle, and K. Sheppard (2006). Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics 4, 537–572.

Clark, T. E. and M. W. McCracken (2005). Evaluating direct multistep forecasts. Econometric Reviews 24, 369–404.

Comte, F. and E. Renault (1998). Long memory in continuous-time stochastic volatility models. Mathematical Finance 8, 291–323.

Corradi, V. and W. Distaso (2006). Semi-parametric comparison of stochastic volatility models using realized measures. Review of Economic Studies 73 (3), 635–667.

Corradi, V., W. Distaso, and N. R. Swanson (2009). Predictive density estimators for daily volatility based on the use of realized measures. Journal of Econometrics 150, 119–138.

Corradi, V., W. Distaso, and N. R. Swanson (2011). Predictive inference for integrated volatility. Journal of the American Statistical Association 106, 1496–1512.

Corradi, V. and N. R. Swanson (2007). Nonparametric bootstrap procedures for predictive inference based on recursive estimation schemes. International Economic Review 48 (1), 67–109.

Corsi, F. (2009). A simple approximate long memory model of realized volatility. Journal of Financial Econometrics 7, 174–196.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.

Diebold, F. X. and R. S. Mariano (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.

Duffie, D. (2001). Dynamic Asset Pricing Theory (Third ed.). Princeton University Press.

Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987–1008.

Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20 (3), 339–350.

Engle, R. F. (2008). Anticipating Correlations. Princeton, New Jersey: Princeton University Press.

Foster, D. and D. B. Nelson (1996). Continuous record asymptotics for rolling sample variance estimators. Econometrica 64, 139–174.

Giacomini, R. and B. Rossi (2009). Detecting and predicting forecast breakdowns. Review of Economic Studies 76 (2), 669–705.

Giacomini, R. and H. White (2006). Tests of conditional predictive ability. Econometrica 74 (6), 1545–1578.

Gonçalves, S. and R. de Jong (2003). Consistency of the stationary bootstrap under weak moment conditions. Economics Letters 81, 273–278.

Gonçalves, S. and H. White (2002). The bootstrap of the mean for dependent heterogeneous arrays. Econometric Theory 18 (6), 1367–1384.

Granger, C. (1999). Outline of forecast theory using generalized cost functions. Spanish Economic Review 1, 161–173.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business & Economic Statistics 23 (4), 365–380.

Hansen, P. R. and A. Lunde (2006). Consistent ranking of volatility models. Journal of Econometrics 131, 97–121.

Hansen, P. R., A. Lunde, and J. M. Nason (2011). The model confidence set. Econometrica 79 (2), 453–497.

Hansen, P. R. and A. Timmermann (2012). Choice of sample split in out-of-sample forecast evaluation. Technical report, European University Institute.

Huang, X. and G. T. Tauchen (2005). The relative contribution of jumps to total price variance. Journal of Financial Econometrics 4, 456–499.

Inoue, A. and L. Kilian (2004). In-sample or out-of-sample tests of predictability: Which one should we use? Econometric Reviews 23, 371–402.

Jacod, J. (2008). Asymptotic properties of realized power variations and related functionals of semimartingales. Stochastic Processes and their Applications 118, 517–559.

Jacod, J. and P. Protter (2012). Discretization of Processes. Springer.

Jacod, J. and M. Rosenbaum (2013). Quarticity and other functionals of volatility: Efficient estimation. Annals of Statistics 41, 1462–1484.

Kanaya, S. and D. Kristensen (2016). Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32, 861–916.

Kiefer, N. M. and T. J. Vogelsang (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21, 1130–1164.

Kristensen, D. (2010). Nonparametric filtering of the realized spot volatility: A kernel-based approach. Econometric Theory 26 (1), 60–93.

Lépingle, D. (1976). La variation d'ordre p des semi-martingales. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 36, 295–316.

Mancini, C. (2001). Disentangling the jumps of the diffusion in a geometric jumping Brownian motion. Giornale dell'Istituto Italiano degli Attuari LXIV, 19–47.

McCracken, M. W. (2000). Robust out-of-sample inference. Journal of Econometrics 99, 195–223.

McCracken, M. W. (2007). Asymptotics for out of sample tests of Granger causality. Journal of Econometrics 140, 719–752.

Müller, U. (2012). HAC corrections for strongly autocorrelated time series. Technical report, Princeton University.

Newey, W. K. and K. D. West (1987). A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.

Noureldin, D., N. Shephard, and K. Sheppard (2012). Multivariate high-frequency-based volatility (HEAVY) models. Journal of Applied Econometrics 27, 907–933.

Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160 (1), 246–256.

Patton, A. J. and K. Sheppard (2013). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics. Forthcoming.

Patton, A. J. and A. Timmermann (2010). Generalized forecast errors, a change of measure, and forecast optimality conditions. In Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle. Oxford University Press.

Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89, 1303–1313.

Renò, R. (2006). Nonparametric estimation of stochastic volatility models. Economics Letters 90, 390–395.

Romano, J. P. and M. Wolf (2005). Stepwise multiple testing as formalized data snooping. Econometrica 73 (4), 1237–1282.

Singleton, K. J. (2006). Empirical Dynamic Asset Pricing: Model Specification and Econometric Assessment. Princeton University Press.

Todorov, V. (2009). Estimation of continuous-time stochastic volatility models with jumps using high-frequency data. Journal of Econometrics 148, 131–148.

Todorov, V. and G. Tauchen (2012). The realized Laplace transform of volatility. Econometrica 80, 1105–1127.

Todorov, V., G. Tauchen, and I. Grynkiv (2011). Realized Laplace transforms for estimation of jump diffusive volatility models. Journal of Econometrics 164, 367–381.

Vetter, M. (2010). Limit theorems for bipower variation of semimartingales. Stochastic Processes and their Applications 120, 22–38.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica 64 (5), 1067–1084.

West, K. D. (2006). Forecast evaluation. In Handbook of Economic Forecasting. Amsterdam: North Holland Press.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.

White, H. (2000). A reality check for data snooping. Econometrica 68 (5), 1097–1126.

White, H. (2001). Asymptotic Theory for Econometricians. Academic Press.

Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli 12, 1019–1043.

Zhang, L., P. A. Mykland, and Y. Aït-Sahalia (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.