Asymptotic Inference about Predictive Accuracy
using High Frequency Data∗
Jia Li
Department of Economics
Duke University
Andrew J. Patton
Department of Economics
Duke University
This Version: June 3, 2017
Abstract
This paper provides a general framework that enables many existing inference methods for
predictive accuracy to be used in applications that involve forecasts of latent target variables.
Such applications include the forecasting of volatility, correlation, beta, quadratic variation,
jump variation, and other functionals of an underlying continuous-time process. We provide
primitive conditions under which a “negligibility” result holds, and thus the asymptotic size
of standard predictive accuracy tests, implemented using a high-frequency proxy for the latent
variable, is controlled. An extensive simulation study verifies that the asymptotic results apply
in a range of empirically relevant applications, and an empirical application to correlation
forecasting is presented.
Keywords: Forecast evaluation, realized variance, volatility, jumps, semimartingale.
JEL Codes: C53, C22, C58, C52, C32.
∗We wish to thank Francis Diebold, Dobrislav Dobrev, Ronald Gallant, Silvia Goncalves, Michael McCracken,
Nour Meddahi, Benoit Perron, Dacheng Xiu, conference and seminar participants at the 2013 NBER summer insti-
tute, 2013 Symposium on Econometric Theory and Applications (SETA), 2013 Econometric Society Asian Meeting,
Cambridge, UCL and University of Pennsylvania for helpful comments and discussions. All errors are ours. Financial
support from the NSF under grants SES-1227448 and SES-1326819 (Li) is gratefully acknowledged. Contact address:
Department of Economics, Duke University, 213 Social Sciences Building, Box 90097, Durham NC 27708-0097, USA.
Email: [email protected] and [email protected].
1 Introduction
A central problem in time series analysis is the forecasting of economic variables. In financial
applications, the variables to be forecast are often risk measures, such as volatility, beta, correlation,
and jump characteristics (see Andersen et al. (2006) for a survey). Since the seminal work of Engle
(1982), numerous models have been proposed to forecast risk measures, and these forecasts are
of fundamental importance in financial decisions. The problem of evaluating the performance
of these forecasts is complicated by the fact that many risk measures, although well-defined in
models, are not observable even ex post. A large literature (see West (2006) for a survey) has
evolved presenting methods for (pseudo) out-of-sample inference for predictive accuracy; however,
existing work typically relies on the observability of the forecast target. The goal of the current
paper is to provide a general methodology for extending the applicability of forecast evaluation
methods to settings with unobservable forecast target variables.
Inspired by Andersen and Bollerslev (1998), we propose to evaluate competing forecasts with
respect to a proxy of the latent target variable, computed from high-frequency (intraday) data,
when applying forecast evaluation methods. Prima facie, such inference is
not of direct economic interest, in that a good forecast for the proxy may not be a good forecast
of the latent target variable. The gap, formally speaking, arises from the fact that hypotheses
concerning the proxy (which we label “proxy hypotheses”) are not the same as those concerning
the true target variable (i.e., “true hypotheses”). To fill this gap, we consider an asymptotic setting
in which the proxy is constructed using data sampled from asymptotically-increasing frequencies.
Under this setting, the proxy hypotheses can be considered as “local” to the true hypotheses, and
we provide both high-level and primitive sufficient conditions under which the moments that specify
the proxy hypotheses converge sufficiently fast to their counterparts in the true hypotheses. This
convergence leads to an asymptotic negligibility result: forecast evaluation methods using proxies
have the same asymptotic size and power properties under the proxy hypotheses as under the true
hypotheses.
The strategy of using high-frequency proxies to conduct inference has proven successful in
prior work on the estimation of stochastic volatility models. Bollerslev and Zhou (2002) estimate
stochastic volatility models treating the realized variance as the unobserved integrated variance.
Corradi and Distaso (2006) and Todorov (2009) generalize this approach by considering additional
realized measures for the integrated variance using the generalized method of moments (GMM)
of Hansen (1982). These authors provide theoretical justifications for this approach by provid-
ing conditions that ensure the asymptotic negligibility of the proxy error in GMM inference for
stochastic volatility models. Realized measures for other volatility functionals have also been used
for parametric and nonparametric estimation of stochastic volatility models: for example, Todorov
et al. (2011) use the realized Laplace transform of volatility (Todorov and Tauchen (2012)) for
estimating parametric stochastic volatility models; Reno (2006), Kanaya and Kristensen (2016)
and Bandi and Reno (2016) consider nonparametric estimation of stochastic volatility models us-
ing spot volatility estimates (Foster and Nelson (1996), Comte and Renault (1998), Kristensen
(2010)).
Our asymptotic negligibility result shares the same nature as that in the important work of
Corradi and Distaso (2006), among others. However, the focus of the current paper is distinct
from the aforementioned work in two important aspects. First, compared with (in-sample) GMM
estimation, the out-of-sample forecast evaluation problem has unique complications in the econo-
metric structure. Indeed, even in the case with ex post observable forecast targets, it is well known
that forecast evaluation procedures can be drastically different from each other depending on how
unknown parameters in a forecast model are estimated and updated, on whether the competing
forecast models are nested or nonnested, and on how critical values of tests are computed (e.g.,
via direct estimation or bootstrap); see, for example, Diebold and Mariano (1995), West (1996),
White (2000), McCracken (2000), Hansen (2005), Giacomini and White (2006) and McCracken
(2007), as well as the comprehensive review of West (2006). The apparent idiosyncrasies of these
methods present a nontrivial challenge for designing a general theoretical framework for solving
the latent-target problem for a broad range of evaluation methods. Second, while prior work used
proxies of the volatility or its integrated functionals such as integrated volatility and the volatil-
ity Laplace transform for estimating stochastic volatility models, forecasting applications often
concern a much broader set of risk factors, such as beta, correlation, total quadratic variation,
semivariance and jump variations. The broad practical scope of financial forecasting thus calls for
an extensive analysis on a wide spectrum of risk measures and proxies.
The main contribution of the current paper is to address these two issues in a general and
compact framework. We achieve generality by using two (sets of) high-level conditions that are
designed for bridging two large literatures: forecast evaluation and high-frequency econometrics.
The first set of conditions posits an abstract structure on the forecast evaluation methods; we
show that these conditions are readily verified for many inference methods proposed in the existing
literature, including all of the evaluation methods cited above, and can be readily extended
to stepwise testing procedures such as Romano and Wolf (2005) and Hansen et al. (2011). The
second condition concerns the approximation accuracy of the high-frequency proxy relative to the
latent target variable. The main technical contribution of this paper is to verify this condition
under primitive conditions for general classes of high-frequency based estimators of volatility and
jump risk measures in a general Ito semimartingale model for asset prices. In particular, we al-
low for realistic features such as the leverage effect and (active) price and volatility jumps. Our
results cover many existing estimators as special cases, such as realized variation (Andersen et al.
(2003)), truncated variation (Mancini (2001)), bipower variation (Barndorff-Nielsen and Shephard
(2004b)), realized covariation, beta and correlation (Barndorff-Nielsen and Shephard (2004a)),
realized Laplace transform (Todorov and Tauchen (2012)), general integrated volatility function-
als (Jacod and Protter (2012), Jacod and Rosenbaum (2013)), realized skewness, kurtosis and
their extensions (Lepingle (1976), Jacod (2008), Amaya et al. (2011)), and realized semivariance
(Barndorff-Nielsen et al. (2010) and Patton and Sheppard (2013)). These technical results may be
useful for other applications as well (e.g., Corradi and Distaso (2006) and Todorov (2009)).
The existing literature includes some work on forecast evaluation for latent target variables
using proxy variables. In their seminal work, Andersen and Bollerslev (1998) advocated using
realized variance as a proxy for evaluating volatility forecast models; see also Andersen et al.
(2003) and Andersen et al. (2005). A theoretical justification for this approach was proposed by
Hansen and Lunde (2006) and Patton (2011), based on restrictions on the loss function used for
comparison and the availability of conditionally unbiased proxies. Their unbiasedness condition
must hold in finite samples, which is hard to verify except for certain cases: it may be plausible
for realized variance in some applications, but is unlikely to hold for other realized measures (such
as jump-robust measures of volatility like bipower variation, or ratios of measures like realized
correlation). In contrast, our framework extends the insight of prior work with an asymptotic
argument and is applicable for most known high-frequency based estimators.
We note that our asymptotic negligibility result reflects a simple and robust intuition: the
approximation error in the high-frequency proxy will be negligible when it is small in comparison
with the “intrinsic” sampling variability for forecast evaluation that would arise even in situations
with observable targets. Since the ex post measurement of latent risks is generally much easier than
their ex ante prediction, this intuition and, hence, our asymptotic formalization, should be relevant
in many empirical settings. To judge the performance of the asymptotic results, we conduct three
distinct and realistically calibrated Monte Carlo studies. The Monte Carlo evidence is supportive
for our theory.
We illustrate the usefulness of our approach in an empirical example for evaluating forecasts
of the conditional correlation between stock returns. Correlation forecasting is of substantial
importance in practice (Engle (2008)) but existing evaluation methods (see, e.g., Hansen and
Lunde (2006), Patton (2011)) are silent on how rigorous forecast evaluation can be conducted.
We consider four forecasting methods, starting with the popular dynamic conditional correlation
(DCC) model of Engle (2002). We then extend this model to include an asymmetric term, as in
Cappiello et al. (2006), which allows correlations to rise more following joint negative shocks than
other shocks, and to include the lagged realized correlation matrix, which enables the model to
exploit higher frequency data, in the spirit of Noureldin et al. (2012). We find evidence, across a
range of correlation proxies, that including high frequency information in the forecast model leads
to out-of-sample gains in accuracy, while the inclusion of an asymmetric term does not lead to
such gains.
This paper is organized as follows. Section 2 presents the econometric setting. Section 3
presents the asymptotic properties of generic forecast evaluation methods using proxies under a
high-level condition, and in Section 4 we provide results for verifying the high-level condition for
a variety of high-frequency proxies under primitive conditions. We discuss further extensions to
other evaluation methods in Section 5. Monte Carlo results and an empirical application are in
Sections 6 and 7, respectively. All proofs are in the appendix.
All limits below are for $T \to \infty$. We use $\xrightarrow{P}$ to denote convergence in probability and $\xrightarrow{d}$ to denote convergence in distribution. All vectors are column vectors. For any matrix $A$, we denote its transpose by $A^\intercal$ and its $(i,j)$ component by $A_{ij}$. The $(i,j)$ component of a matrix-valued stochastic process $A_t$ is denoted by $A_{ij,t}$. We write $(a, b)$ in place of $(a^\intercal, b^\intercal)^\intercal$. The $j$th component of a vector $x$ is denoted by $x_j$. For $x, y \in \mathbb{R}^q$, $q \ge 1$, we write $x \le y$ if and only if $x_j \le y_j$ for every $j \in \{1, \ldots, q\}$. For a generic variable $X$ taking values in a finite-dimensional space, we use $\kappa_X$ to denote its dimensionality; the letter $\kappa$ is reserved for such use. We use $\|\cdot\|$ to denote the Euclidean norm of a vector, where a matrix is identified as its vectorized version. For each $p \ge 1$, $\|\cdot\|_p$ denotes the $L_p$ norm. We use $\odot$ to denote the Hadamard product between two identically sized matrices, which is computed simply by element-by-element multiplication. The notation $\otimes$ stands for the Kronecker product. For two sequences of strictly positive real numbers $a_t$ and $b_t$, $t \ge 1$, we write $a_t \asymp b_t$ if and only if the sequences $a_t/b_t$ and $b_t/a_t$ are both bounded.
2 The setting
2.1 A motivating example
We start with a simple motivating example that concerns one-period-ahead volatility forecasting
and use this to illustrate the key concepts of our framework, and in the next section we present
the general setting that we use for the remainder of the paper.
Let $(\sigma_t)_{t\ge0}$ denote the stochastic volatility process of an asset and normalize the unit of time to be one day. Since Andersen and Bollerslev (1998), the integrated volatility $IV_t \equiv \int_{t-1}^{t}\sigma_s^2\,ds$ has been widely used as a model-free measure of volatility. Although the IV is defined in continuous time, it
is typical in practice to construct forecasts for it by using discrete-time models. One leading choice
is the classical GARCH(1,1) model (Bollerslev (1986)) estimated using quasi maximum likelihood
on daily returns $\{r_t : t = 1, 2, \ldots\}$:

$$\text{Model 1:}\qquad r_t = s_t\varepsilon_t,\quad (\varepsilon_t)_{t\ge1}\ \text{i.i.d.}\ N(0,1),\qquad s_t^2 = \omega + \gamma s_{t-1}^2 + \alpha r_{t-1}^2, \tag{2.1}$$
and the resulting volatility forecast at time $t+1$ is $F_{1,t+1} = s_{t+1}^2$. Another popular forecasting model is the heterogeneous autoregressive (HAR) model (Corsi (2009)) estimated via ordinary least squares using realized variances (RV), that is,

$$\text{Model 2:}\qquad RV_t^{5min} = b_0 + b_1 RV_{t-1}^{5min} + b_2 \sum_{k=1}^{5} RV_{t-k}^{5min} + b_3 \sum_{k=1}^{22} RV_{t-k}^{5min} + e_t, \tag{2.2}$$

where $RV_t^{5min}$ denotes the RV formed as the sum of squared 5-minute returns within day $t$ and the volatility forecast at time $t+1$ is $F_{2,t+1} = \widehat{RV}_{t+1}^{5min}$. Our goal in this example is to compare the predictive accuracy of the two competing IV forecast series, $F_{1,t+1}$ and $F_{2,t+1}$, where $t$ ranges over the out-of-sample period $\{R, \ldots, T\}$.
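To make these two forecasting recipes concrete, here is a minimal sketch in Python (NumPy only; all function and variable names are ours, and the simulated RV series is a stand-in, not calibrated to any data) of how $RV_t^{5min}$ and the HAR forecast $F_{2,t+1}$ in (2.2) can be computed.

```python
import numpy as np

def rv_from_intraday(returns_5min):
    """RV_t^{5min}: sum of squared 5-minute returns within each day.
    `returns_5min` has shape (T, m), with m intraday returns per day."""
    return (returns_5min ** 2).sum(axis=1)

def har_one_step_forecast(rv):
    """Fit the HAR model (2.2) by OLS on the available RV series and
    return the one-day-ahead forecast F_{2,t+1}."""
    T = len(rv)
    X, y = [], []
    for t in range(22, T):  # 22 lagged days are needed for the monthly term
        X.append([1.0, rv[t - 1], rv[t - 5:t].sum(), rv[t - 22:t].sum()])
        y.append(rv[t])
    b, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    x_next = np.array([1.0, rv[-1], rv[-5:].sum(), rv[-22:].sum()])
    return float(x_next @ b)

# Illustration on a simulated stand-in RV series.
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(-9.0, 0.5, size=500))
print(har_one_step_forecast(rv))
```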
Before discussing the evaluation problem, we make two remarks on these forecasts. Firstly, we
stress that we shall be agnostic about the underlying true dynamics of the volatility process and we
do not assume these forecasting models to be correctly specified. After all, potentially misspecified
models can still produce good forecasts and are widely used in practice. Given this, we are not
interested in the model parameters per se, but instead our focus is on comparing the forecasts that
these models generate. Therefore, our focus is very different from the semiparametric estimation
problems in stochastic volatility models studied by Corradi and Distaso (2006), Todorov (2009),
Todorov et al. (2011), Kanaya and Kristensen (2016) and Bandi and Reno (2016), among others.
Secondly, we treat the data that enters into the forecasts “as is,” and do not attempt to consider
what the data sets might converge to. For example, when forming forecasts using Model 2, we
treat the 5-minute RV simply as an observed time series (which forms the conditioning information
set when making forecasts), instead of as an approximation to the unobserved quadratic variation
of the asset price. In other words, we do not aim to evaluate the infeasible forecasts that could be
generated using Model 2 but with the 5-minute RV replaced by the limiting (unobserved) quadratic
variation. Doing so allows us to compare the forecasts from Model 2 with those formed similarly,
say, by fitting the HAR model using the 1-minute RV. This type of comparison is relevant in
applications and, from a theoretical point of view, can only be done meaningfully by treating
these forecasts as two distinct series, instead of as two approximations of the same infeasible
forecast (because the latter would lead to the trivial conclusion that these forecasts are the same
asymptotically).
We now turn to the forecast evaluation problem. Clearly, the inference would be standard
if one could observe the forecast target (i.e., IVt+1); the evaluation methods mentioned in the
Introduction could be directly applied in this case. As an example, consider the Diebold–Mariano
test for equal predictive ability under the absolute deviation loss, that is, a test for the null
hypothesis
$$H_0^\dagger:\ E\big[|IV_{t+1} - F_{1,t+1}|\big] = E\big[|IV_{t+1} - F_{2,t+1}|\big], \quad t \in \{R, \ldots, T\}; \tag{2.3}$$
here, we use † to highlight the dependency on the latent forecast target. If IVt+1 were observable,
one could estimate the expected loss differential between the two forecasts using its sample analogue
$$DM_T^\dagger \equiv \frac{1}{P}\sum_{t=R}^{T}\big(|IV_{t+1} - F_{1,t+1}| - |IV_{t+1} - F_{2,t+1}|\big), \tag{2.4}$$
where P = T − R + 1 is the length of the testing sample. Under some mild weak-dependence
assumptions on the series of loss differentials, we have
$$\sqrt{P}\big(DM_T^\dagger - E[DM_T^\dagger]\big) \xrightarrow{d} N(0, S^\dagger), \tag{2.5}$$

where $S^\dagger$ denotes the long-run variance of the loss differential series $|IV_{t+1} - F_{1,t+1}| - |IV_{t+1} - F_{2,t+1}|$.
Under the null hypothesis $H_0^\dagger$, $E[DM_T^\dagger] = 0$ holds, which can be tested by examining whether $DM_T^\dagger$ is statistically different from zero.
The complication here, of course, is that IVt+1 is not directly observed; hence, the testing
procedure above is infeasible. As suggested by Andersen and Bollerslev (1998), a feasible alter-
native to the above procedure is to use an observable proxy Yt+1 in place of IVt+1 for evaluating
the forecasts. Possible choices of the proxy include the truncated variation of Mancini (2001) or
the bipower variation of Barndorff-Nielsen and Shephard (2004b) constructed from high-frequency
data, because these realized measures are known to estimate the IV while being robust to the
presence of jumps. The feasible counterpart of $DM_T^\dagger$ in (2.4) is then given by

$$DM_T = \frac{1}{P}\sum_{t=R}^{T}\big(|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|\big).$$
Applying a central limit theorem on the series $|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|$, we have

$$\sqrt{P}\big(DM_T - E[DM_T]\big) \xrightarrow{d} N(0, S), \tag{2.6}$$
where S denotes the associated long-run variance. In view of (2.6), we can implement a two-sided
t-test at level α which rejects the null hypothesis of
$$H_0:\ E[DM_T] \equiv \frac{1}{P}\sum_{t=R}^{T} E\big[|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|\big] = 0 \tag{2.7}$$

if $\sqrt{P}\,|DM_T| > z_{\alpha/2} S_T^{1/2}$, where $S_T$ is a consistent estimator of $S$ and $z_{\alpha/2}$ is the $\alpha/2$ upper quantile of the standard normal distribution.
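This feasible test is simple to code. The following is a minimal sketch (Python/NumPy; the Bartlett-kernel long-run variance and the bandwidth choice are our illustrative defaults, not prescriptions of the paper) of the two-sided t-test of (2.7) under the absolute deviation loss.

```python
import numpy as np
from scipy.stats import norm

def bartlett_lrv(x, n_lags):
    """Newey-West (Bartlett-kernel) long-run variance of a scalar series."""
    x = x - x.mean()
    s = (x ** 2).mean()
    for l in range(1, n_lags + 1):
        s += 2.0 * (1.0 - l / (n_lags + 1)) * (x[l:] * x[:-l]).mean()
    return s

def feasible_dm_test(proxy, f1, f2, n_lags=5):
    """Two-sided test of (2.7): `proxy`, `f1`, `f2` hold Y_{t+1}, F_{1,t+1}
    and F_{2,t+1} over the out-of-sample period t = R, ..., T."""
    d = np.abs(proxy - f1) - np.abs(proxy - f2)   # proxy loss differentials
    P = len(d)
    t_stat = np.sqrt(P) * d.mean() / np.sqrt(bartlett_lrv(d, n_lags))
    p_value = 2.0 * (1.0 - norm.cdf(abs(t_stat)))
    return t_stat, p_value
```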
Although this feasible test is readily implementable, we stress that the hypothesis being tested
(i.e., (2.7)) is different from the original one (i.e., (2.3)), because the former concerns the relative
closeness of the forecasts to the proxy rather than to the true forecast target. To differentiate (2.3)
from (2.7), we refer to them as the true hypothesis and the proxy hypothesis, respectively.
Work in the forecast evaluation literature has established conditions under which the feasible
test has desirable size and power properties under the proxy hypothesis. Our goal is to provide a
set of sufficient conditions under which the feasible test also attains the same rejection probabilities
under the true hypothesis. In the current simple example, a sufficient condition is
$$\sqrt{P}\big(E[DM_T] - E[DM_T^\dagger]\big) = o(1). \tag{2.8}$$

Indeed, under the condition in (2.8), equation (2.6) implies that

$$\sqrt{P}\big(DM_T - E[DM_T^\dagger]\big) \xrightarrow{d} N(0, S). \tag{2.9}$$
Hence the feasible test has the same rejection probabilities under the true hypothesis as well, which formally justifies the use of the feasible test for testing the true hypothesis.
Intuitively, condition (2.8) requires that replacing the forecast target with its proxy leads to an asymptotically negligible difference in the expected loss differential. Since the true and the
proxy hypotheses only depend on the expected loss differentials, we refer to condition (2.8) as a
“convergence-of-hypotheses” condition. This high-level condition is closely related to the conver-
gence rate of the high-frequency proxy Yt+1 towards the target IVt+1 as the sampling interval ∆
of the high-frequency data goes to zero. Section 4 is devoted to providing primitive conditions to
ensure that (2.8) holds.
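To preview the argument (a back-of-the-envelope version of what Proposition 3.1 formalizes): the absolute deviation loss is 1-Lipschitz in the target, so the feasible and infeasible loss differentials differ by at most $2|Y_{t+1} - IV_{t+1}|$. If the proxy satisfies an $L_1$ bound $E|Y_{t+1} - IV_{t+1}| \le K\Delta^\theta$ for some $\theta > 0$, then

$$\sqrt{P}\,\big|E[DM_T] - E[DM_T^\dagger]\big| \le 2\sqrt{P}\,\max_{R\le t\le T} E|Y_{t+1} - IV_{t+1}| \le 2K\sqrt{P}\,\Delta^{\theta},$$

which is $o(1)$ whenever $P\Delta^{2\theta} \to 0$; since $P$ grows with $T$ at the same rate in our setting (see Section 2.2), the typical case $\theta = 1/2$ yields the condition $T\Delta \to 0$ discussed in Section 3.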
There are two building blocks underlying the above example, as well as the general theory that
follows. Firstly, we establish (drawing on the large literature on forecast evaluation) the properties
of the feasible test under the proxy hypothesis. In the example above, this relates to equation (2.6).
Secondly, we ensure that the proxy hypothesis is indistinguishable, up to statistical precision, from the true hypothesis (relating to condition (2.8) above), making them effectively the same for computing rejection probabilities.
Below we substantially generalize each of the two building blocks. Firstly, we accommodate
essentially all leading forecast evaluation procedures in the econometrics literature. Unlike the
Diebold–Mariano test considered above, which arguably has the simplest econometric structure
among such procedures, other evaluation methods can be much more involved as we describe
in Section 3. For example, the asymptotic behavior of the test statistic may depend on how the
forecasts are updated (e.g., using fixed, rolling, or recursive windows); their asymptotic distribution
may be nonstandard; the inference may be done by bootstrap; and the long-run variance estimator
may be inconsistent. Despite these complications, we show that the econometric structure of most
evaluation methods can be cast under a high-level condition that generalizes (2.6) in the above
example and plays the same role in our general framework.
Secondly, we consider a broad class of forecast targets and proxies, beyond the simple realized
variance considered above. This generalization is relevant in practice because researchers are
interested in forecasting not only the IV but also general functionals of volatility (e.g., beta,
correlation and idiosyncratic variance) as well as functionals of jumps (e.g., power variations of
jumps). To this end, we need to characterize, in a proper sense, the proxy accuracy of various
high-frequency estimators so as to verify the convergence-of-hypotheses condition under general
primitive conditions. These results are collected in Section 4.
2.2 True hypotheses and proxy hypotheses in a general setting
We now describe the setting for the general framework. Let $(Y_t^\dagger)_{t\ge1}$ be the time series to be forecast, which takes values in $\mathcal{Y} \subseteq \mathbb{R}^{\kappa_Y}$. We stress at the outset that $Y_t^\dagger$ is not observable, but a proxy $Y_t$ is available. At time $t$, the forecaster uses data $\mathcal{D}_t \equiv \{D_s : 1 \le s \le t\}$ to form a forecast of $Y_{t+\tau}^\dagger$, where the horizon $\tau \ge 1$ is fixed throughout the paper. We consider $k$ competing sequences of forecasts of $Y_{t+\tau}^\dagger$, collected by $F_{t+\tau} \equiv (F_{1,t+\tau}, \ldots, F_{k,t+\tau})$. In practice, $F_{t+\tau}$ is often
constructed from forecast models that involve some parameter β (e.g., β = (ω, γ, α, b0, b1, b2, b3)
for the example in Section 2.1). We write Ft+τ (β) to emphasize such dependence and refer to the
function $F_{t+\tau}(\cdot): \beta \mapsto F_{t+\tau}(\beta)$ as the forecast model. Let $\hat\beta_t$ be an estimator constructed using (possibly a subset of) the dataset $\mathcal{D}_t$ and $\beta^*$ be its "population" analogue. We do not require the forecast model to be correctly specified, so we treat $\beta^*$ as a pseudo-true parameter (White (1982)).
Two types of forecasts have been considered in the literature: the actual forecast $\hat F_{t+\tau} = F_{t+\tau}(\hat\beta_t)$ and the population forecast $F_{t+\tau}(\beta^*)$. While our motivating example in the previous
subsection concerns the evaluation of the actual forecasts, it is also possible to use them to make
inference about Ft+τ (β∗), that is, an inference concerning the forecast model (see, e.g., West
(1996)). Of course, if the researcher is interested in assessing the performance of the actual
forecasts in Ft+τ , he/she can treat the actual forecast as an observable sequence (see, e.g., Diebold
and Mariano (1995) and Giacomini and White (2006)), which amounts to setting β∗ to be empty.
With this convention, we can use the notation Ft+τ (β∗) in the study of the inference for actual
forecasts without conceptual ambiguity.
Given the target $Y_{t+\tau}^\dagger$, the performance of the competing forecasts is measured by $f_{t+\tau}^{\dagger*} \equiv f_{t+\tau}(Y_{t+\tau}^\dagger, \beta^*)$, where $f_{t+\tau}(y, \beta) \equiv f(y, F_{t+\tau}(\beta))$ for some known measurable $\mathbb{R}^{\kappa_f}$-valued function $f(\cdot)$. The function $f(\cdot)$ plays the role of an evaluation measure. Typically, $f(\cdot)$ computes the loss differential between competing forecasts: for example, $f(y, (F_1, F_2)) = |y - F_1| - |y - F_2|$ corresponds to the absolute deviation loss that is used in Section 2.1. The proxy of $f_{t+\tau}^{\dagger*}$ is given by $f_{t+\tau}^{*} \equiv f_{t+\tau}(Y_{t+\tau}, \beta^*)$, which in turn can be estimated by $\hat f_{t+\tau} \equiv f_{t+\tau}(Y_{t+\tau}, \hat\beta_t)$. We then set

$$f_T^{\dagger*} \equiv P^{-1}\sum_{t=R}^{T} f_{t+\tau}^{\dagger*}, \qquad f_T^{*} \equiv P^{-1}\sum_{t=R}^{T} f_{t+\tau}^{*}, \qquad \hat f_T \equiv P^{-1}\sum_{t=R}^{T} \hat f_{t+\tau}, \tag{2.10}$$
where $T$ is the size of the full sample, $P = T - R + 1$ is the size of the prediction sample and $R$ is the size of the estimation sample.1 In the sequel, we always assume $P \asymp T$ as $T \to \infty$ without further mention, while $R$ may be fixed or diverge to $\infty$, depending on the application.

1 The notations $P_T$ and $R_T$ may be used in place of $P$ and $R$. We follow the literature and suppress the dependence on $T$. The estimation and prediction samples are often called the in-sample and (pseudo-) out-of-sample periods.
We now turn to the hypotheses of interest. We consider two classical testing problems for
forecast evaluation: testing for equal predictive ability (one-sided or two-sided) and testing for
superior predictive ability. Formally, we consider the following hypotheses: for some user-specified
constant $\chi \in \mathbb{R}^{\kappa_f}$,

$$\text{Equal Predictive Ability (EPA):}\quad
\begin{cases}
H_0^\dagger: E[f_T^{\dagger*}] = \chi,\\[2pt]
\text{vs. } H_{1a}^\dagger: \liminf_{T\to\infty} E[f_{j,T}^{\dagger*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},\\[2pt]
\text{or } H_{2a}^\dagger: \liminf_{T\to\infty} \|E[f_T^{\dagger*}] - \chi\| > 0,
\end{cases} \tag{2.11}$$

$$\text{Superior Predictive Ability (SPA):}\quad
\begin{cases}
H_0^\dagger: E[f_T^{\dagger*}] \le \chi,\\[2pt]
\text{vs. } H_{a}^\dagger: \liminf_{T\to\infty} E[f_{j,T}^{\dagger*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},
\end{cases} \tag{2.12}$$
where $H_{1a}^\dagger$ (resp. $H_{2a}^\dagger$) in (2.11) is the one-sided (resp. two-sided) alternative. In practice, the constant $\chi$ is usually set to be zero.2 Note that despite their assigned labels, these hypotheses can also be used to test for forecast encompassing and forecast rationality by setting the function $f(\cdot)$ properly; see, for example, West (2006).

2 Allowing $\chi$ to be nonzero incurs no additional cost in our derivations. This flexibility is particularly useful in the design of Monte Carlo experiments that examine the finite-sample performance of the asymptotic theory below; see Section 6 for details.
Since the hypotheses in (2.11) and (2.12) rely on the true forecast target Y †t , we refer to them
as the true hypotheses. These hypotheses allow for data heterogeneity and are cast in the same
fashion as in Giacomini and White (2006). Under (mean) stationarity, these hypotheses coincide
with those considered by Diebold and Mariano (1995), West (1996) and White (2000), among
others. Clearly, if Y †t were observable, these existing inference methods could be applied to test
the true hypotheses by forming test statistics based on ft+τ (Y †t+τ , βt). However, the latency of Y †t
renders these inference methods infeasible.
Feasible versions of these tests can be implemented with Y †t+τ replaced by Yt+τ . However, as
we illustrated in the previous section, the hypotheses underlying the feasible inference procedure
are proxy hypotheses given by
$$\text{Proxy Equal Predictive Ability (PEPA):}\quad
\begin{cases}
H_0: E[f_T^{*}] = \chi,\\[2pt]
\text{vs. } H_{1a}: \liminf_{T\to\infty} E[f_{j,T}^{*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\},\\[2pt]
\text{or } H_{2a}: \liminf_{T\to\infty} \|E[f_T^{*}] - \chi\| > 0,
\end{cases} \tag{2.13}$$
$$\text{Proxy Superior Predictive Ability (PSPA):}\quad
\begin{cases}
H_0: E[f_T^{*}] \le \chi,\\[2pt]
\text{vs. } H_{a}: \liminf_{T\to\infty} E[f_{j,T}^{*}] > \chi_j \text{ for some } j \in \{1,\ldots,\kappa_f\}.
\end{cases} \tag{2.14}$$
These hypotheses are not of immediate economic relevance, because economic agents are, by as-
sumption, interested in forecasting the true target Y †t+τ , rather than its proxy.
Below, we provide conditions under which the moments that define the proxy hypotheses
converge “sufficiently fast” to their equivalents under the true hypotheses, and we show that tests
which are valid under the former are also valid under the latter.
3 Forecast evaluation methods with proxies
In this section, we present the asymptotic properties of the feasible evaluation methods using
proxies. In Section 3.1, we focus on testing proxy hypotheses and introduce high-level conditions
that link many apparently distinct tests of predictive accuracy into a unified framework. Doing
so greatly simplifies the presentation in Section 3.2, where we show that the feasible tests using
proxies are also asymptotically valid under the true hypotheses. This result relies on a high-level
“convergence-of-hypotheses” condition, which can be verified under primitive conditions using the
convergence rate results that we develop in Section 4.
3.1 Conditions on evaluation methods based on proxies
In this subsection, we introduce an abstract econometric structure that we show is common to
most forecast evaluation procedures with an observable forecast target, the role of which is played
by the proxy Yt in the setting of the current paper. These conditions speak to the proxy hypotheses
PEPA and PSPA, but not the true hypotheses. We link these conditions to the true hypotheses
in Section 3.2.
We consider a test statistic of the form

$$\varphi_T \equiv \varphi\big(a_T(\hat f_T - \chi),\ a_T' S_T\big) \tag{3.1}$$

for some measurable function $\varphi: \mathbb{R}^{\kappa_f} \times \mathcal{S} \mapsto \mathbb{R}$, where $a_T \to \infty$ and $a_T'$ are known deterministic sequences, $\hat f_T$ is defined in (2.10) and $S_T$ is a sequence of $\mathcal{S}$-valued estimators that is mainly used for studentization.3 In almost all cases, $a_T = P^{1/2}$ and $a_T' \equiv 1$; recall that $P$ increases with $T$. An exception is given by Example 3.4 below. In many applications, $S_T$ plays the role of an estimator of some asymptotic variance, which may or may not be consistent (see Example 3.2 below); $\mathcal{S}$ is then the space of positive definite matrices.

3 The space $\mathcal{S}$ changes across applications, but is always implicitly assumed to be a Polish space.
Let $\alpha \in (0, 1)$ be the significance level of a test. We consider a (nonrandomized) test of the form $\phi_T = 1\{\varphi_T > z_{T,1-\alpha}\}$, that is, we reject the null hypothesis when the test statistic $\varphi_T$ is
greater than some critical value zT,1−α. We now introduce some high-level assumptions.
Assumption A1: $(a_T(\hat f_T - E[f_T^*]),\ a_T' S_T) \xrightarrow{d} (\xi, S)$ for some deterministic sequences $a_T \to \infty$ and $a_T'$, and random variables $(\xi, S)$. Here, $(a_T, a_T')$ may be chosen differently under the null and the alternative hypotheses, but $\varphi_T$ is invariant to such choice.
Assumption A1 mainly posits that $\hat f_T$ is centered at $E[f_T^*]$ with a well-behaved asymptotic distribution. Since $E[f_T^*]$ characterizes the proxy hypotheses (recall (2.13) and (2.14)), Assumption A1 concerns an evaluation problem with the observed proxy instead of the latent true target. This assumption can be verified for many existing methods that involve observable forecast targets; for example, (2.6) in our motivating example is a special case of Assumption A1. In this basic case,
Assumption A1 is verified by using a (feasible) central limit theorem on the observed time series
of proxy loss differentials for which general primitive conditions are well known in econometrics.
Below we first discuss a generalized version of it, and then introduce a battery of additional
examples that involve various complications that arise in forecast evaluation problems, and describe
how to verify Assumption A1 in each of them.
Example 3.1: Giacomini and White (2006) consider tests for equal predictive ability between
two sequences of actual forecasts, or “forecast methods” in their terminology, assuming R fixed.
In this case, $f(Y_t, (F_{1,t}, F_{2,t})) = L(Y_t, F_{1,t}) - L(Y_t, F_{2,t})$ for some loss function $L(\cdot,\cdot)$. Moreover, one can set $\beta^*$ to be empty and treat each actual forecast as an observed sequence, so $\hat f_T = f_T^*$. Using a CLT for heterogeneous weakly dependent data, one can take $a_T = P^{1/2}$ and verify $a_T(\hat f_T - E[\hat f_T]) \xrightarrow{d} \xi$, where $\xi$ is centered Gaussian with long-run variance denoted by $\Sigma$. We then set $S = \Sigma$ and $a_T' \equiv 1$, and let $S_T$ be a heteroskedasticity and autocorrelation consistent (HAC) estimator of $S$ (Newey and West (1987), Andrews (1991)). Assumption A1 then follows
from Slutsky’s lemma. Diebold and Mariano (1995) intentionally treat the actual forecasts as
primitives without introducing the forecast model (and hence β∗); their setting is also covered by
Assumption A1 by the same reasoning.
Example 3.2: Consider the same setting as in Example 3.1, but let ST be an inconsistent
long-run variance estimator of Σ as considered by, for example, Kiefer and Vogelsang (2005). Using
their theory, we verify $(P^{1/2}(\hat f_T - E[\hat f_T]),\ S_T) \xrightarrow{d} (\xi, S)$, where $S$ is a (nondegenerate) random matrix and the joint distribution of $\xi$ and $S$ is known, up to the unknown parameter $\Sigma$, but is nonstandard.
Example 3.3: West (1996) considers inference for nonnested forecast models in a setting with
$R \to \infty$. West's Theorem 4.1 shows that $P^{1/2}(\hat f_T - E[f_T^*]) \xrightarrow{d} \xi$, where $\xi$ is centered Gaussian with its variance-covariance matrix denoted here by $S$, which captures both the sampling variability of the forecast error and the discrepancy between $\hat\beta_t$ and $\beta^*$. We can set $S_T$ to be the consistent estimator of $S$ as proposed in West's comment 6 to Theorem 4.1. Assumption A1 is then verified by using Slutsky's lemma for $a_T = P^{1/2}$ and $a_T' \equiv 1$. West's theory relies on the differentiability
of the function ft+τ (·) with respect to β and concerns βt in the recursive scheme. Similar results
allowing for a nondifferentiable ft+τ (·) function can be found in McCracken (2000). Giacomini and
Rossi (2009) generalize West’s theory to settings without covariance stationarity. Assumption A1
can be verified similarly in these more general settings.
Example 3.4: McCracken (2007) considers inference on nested forecast models allowing for
recursive, rolling, and fixed estimation schemes, all with R→∞. The evaluation measure ft+τ is
the difference between the quadratic losses of the nesting and the nested models. For his OOS-t test, McCracken proposes using a normalizing factor $\hat\Omega_T = P^{-1}\sum_{t=R}^{T}(\hat f_{t+\tau} - \hat f_T)^2$ and considers the test statistic $\varphi_T \equiv \varphi(P\hat f_T, P\hat\Omega_T)$, where $\varphi(u, s) = u/\sqrt{s}$. Implicitly in his proof of Theorem 3.1, it is shown that under the null hypothesis of equal predictive ability, $(P(\hat f_T - E[f_T^*]),\ P\hat\Omega_T) \xrightarrow{d} (\xi, S)$, where the joint distribution of $(\xi, S)$ is nonstandard and is specified as a function of a multivariate Brownian motion. Assumption A1 is verified with $a_T = P$, $a_T' \equiv P$ and $S_T = \hat\Omega_T$. The nonstandard rate arises as a result of the degeneracy between correctly specified nesting models. Under the alternative hypothesis, it can be shown that Assumption A1 holds for $a_T = P^{1/2}$ and $a_T' \equiv 1$, as in West (1996). Clearly, the OOS-t test statistic is invariant to the change of $(a_T, a_T')$, that is, $\varphi_T = \varphi(P^{1/2}\hat f_T, \hat\Omega_T)$ holds. Assumption A1 can also be verified for various extensions of
McCracken (2007); see, for example, Inoue and Kilian (2004), Clark and McCracken (2005) and
Hansen and Timmermann (2012).
Example 3.5: White (2000) considers a setting similar to West (1996), with an emphasis on
considering a large number of competing forecasts, but uses a test statistic without studentization.
Assumption A1 is verified similarly as in Example 3.3, but with ST and S being empty.
Assumption A2: $\varphi(\cdot, \cdot)$ is continuous almost everywhere under the law of $(\xi, S)$.

Assumption A2 is satisfied by all standard test statistics used in forecast evaluation: for simple pair-wise forecast comparisons, the test statistic usually takes the form of a t-statistic, that is, $\varphi_{\text{t-stat}}(\xi, S) = \xi/\sqrt{S}$. For joint tests it may take the form of a Wald-type statistic, $\varphi_{\text{Wald}}(\xi, S) = \xi^\intercal S^{-1}\xi$, or a maximum over individual (possibly studentized) test statistics, $\varphi_{\text{Max}}(\xi, S) = \max_i \xi_i$ or $\varphi_{\text{StuMax}}(\xi, S) = \max_i \xi_i/\sqrt{S_{ii}}$.
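For concreteness, these four test functions, together with the generic rejection rule $\phi_T = 1\{\varphi_T > z_{T,1-\alpha}\}$ from above, can be sketched as follows (Python/NumPy; the argument conventions, e.g., passing a vector of diagonal variances to the studentized maximum, are our illustrative choices).

```python
import numpy as np

def phi_tstat(xi, S):
    """t-statistic for a scalar evaluation measure; S is a scalar variance."""
    return xi / np.sqrt(S)

def phi_wald(xi, S):
    """Wald-type statistic; S is a kappa_f x kappa_f covariance matrix."""
    return float(xi @ np.linalg.solve(S, xi))

def phi_max(xi, S=None):
    """Maximum over individual statistics (S unused)."""
    return float(np.max(xi))

def phi_stumax(xi, S):
    """Maximum over studentized statistics; S holds the diagonal variances."""
    return float(np.max(xi / np.sqrt(S)))

def reject(phi, f_bar, S_T, chi, a_T, a_prime_T, z_crit):
    """Generic feasible test: 1{phi(a_T (f_bar - chi), a'_T S_T) > z_crit}."""
    return phi(a_T * (f_bar - chi), a_prime_T * S_T) > z_crit
```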
Assumption A2 imposes continuity on ϕ (·, ·) in order to facilitate the use of the continuous
mapping theorem for studying the asymptotics of the test statistic ϕT . More specifically, under
the null hypothesis of PEPA, which is also the null least favorable to the alternative in PSPA
(White (2000), Hansen (2005)), Assumption A1 implies that $(a_T(\hat f_T - \chi),\ a_T' S_T) \xrightarrow{d} (\xi, S)$. By
the continuous mapping theorem, Assumption A2 then implies that the asymptotic distribution of
ϕT under this null is ϕ(ξ, S). The critical value of a test at nominal level α is given by the 1− α
quantile of ϕ(ξ, S), on which we impose the following condition.
Assumption A3: The distribution function of $\varphi(\xi, S)$ is continuous at its $1-\alpha$ quantile $z_{1-\alpha}$. Moreover, the sequence $z_{T,1-\alpha}$ of critical values satisfies $z_{T,1-\alpha} \xrightarrow{P} z_{1-\alpha}$.
The first condition in Assumption A3 is very mild. Assumption A3 is mainly concerned with the availability of a consistent estimator of the $1-\alpha$ quantile $z_{1-\alpha}$. This assumption is slightly
stronger than what we actually need. Indeed, we only need the convergence to hold under the null
hypothesis, while, under the alternative, we only need the sequence zT,1−α to be tight.
Below, we discuss examples for which Assumption A3 can be verified.
Example 3.6: In many cases, the limit distribution of ϕT under the null of PEPA is standard
normal or chi-square with some known number of degrees of freedom. Examples include tests
considered by Diebold and Mariano (1995), West (1996) and Giacomini and White (2006). In
the setting of Example 3.2 or 3.4, ϕT is a t-statistic or Wald-type statistic, with an asymptotic
distribution that is nonstandard but pivotal, with quantiles tabulated in the original papers.4
Assumption A3 for these examples can be verified by simply taking $z_{T,1-\alpha}$ as the known quantile of the limit distribution.

4 One caveat is that the OOS-t statistic in McCracken (2007) is asymptotically pivotal only under the somewhat restrictive condition that the forecast errors form a conditionally homoskedastic martingale difference sequence. In the presence of conditional heteroskedasticity or serial correlation in the forecast errors, the null distribution generally depends on a nuisance parameter (Clark and McCracken (2005)). Nevertheless, the critical values can be consistently estimated via a bootstrap (Clark and McCracken (2005)) or plug-in method (Hansen and Timmermann (2012)).
Example 3.7: White (2000) considers tests for superior predictive ability. Under the null
least favorable to the alternative, White’s test statistic is not asymptotically pivotal, as it depends
on the unknown covariance matrix of the limit variable ξ. White suggests computing the critical
value via either simulation or the stationary bootstrap (Politis and Romano (1994)), corresponding
respectively to his “Monte Carlo reality check” and “bootstrap reality check” methods. In particu-
lar, under stationarity, White shows that the bootstrap critical value consistently estimates $z_{1-\alpha}$.5
Hansen (2005) considers test statistics with studentization and shows the validity of a refined
bootstrap critical value, under stationarity. The validity of the stationary bootstrap holds in more
general settings allowing for moderate heterogeneity (Goncalves and White (2002), Goncalves and
de Jong (2003)). We hence conjecture that the bootstrap results of White (2000) and Hansen
(2005) can be extended to a setting with moderate heterogeneity, although a formal discussion is
beyond the scope of the current paper. In these cases, the simulation- or bootstrap-based critical
value can be used as zT,1−α in order to verify Assumption A3.
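A minimal sketch of the bootstrap reality check critical value (Python/NumPy) is given below; the mean block length, number of resamples, and recentering at the sample means are our illustrative choices, and refinements such as Hansen's (2005) studentization are omitted.

```python
import numpy as np

def stationary_bootstrap_indices(P, mean_block, rng):
    """One index resample from the stationary bootstrap of Politis and Romano (1994)."""
    p = 1.0 / mean_block
    idx = np.empty(P, dtype=int)
    idx[0] = rng.integers(P)
    for i in range(1, P):
        idx[i] = rng.integers(P) if rng.random() < p else (idx[i - 1] + 1) % P
    return idx

def reality_check_cv(loss_diffs, alpha=0.05, n_boot=999, mean_block=10, seed=0):
    """Bootstrap 1-alpha critical value for max_j sqrt(P) * mean(d_j), with the
    resampled means recentered at the sample means (a sketch of White's method).
    `loss_diffs` is a P x k matrix of proxy loss differentials."""
    rng = np.random.default_rng(seed)
    P, k = loss_diffs.shape
    means = loss_diffs.mean(axis=0)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = stationary_bootstrap_indices(P, mean_block, rng)
        stats[b] = np.sqrt(P) * np.max(loss_diffs[idx].mean(axis=0) - means)
    return float(np.quantile(stats, 1.0 - alpha))
```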
Finally, we need two alternative sets of assumptions on the test function ϕ (·, ·) for one-sided
and two-sided tests, respectively.
Assumption B1: For any $s \in \mathcal{S}$, we have (i) $\varphi(u, s) \le \varphi(u', s)$ whenever $u \le u'$, where $u, u' \in \mathbb{R}^{\kappa_f}$; (ii) $\varphi(u_n, s_n) \to \infty$ whenever $u_{j,n} \to \infty$ for some $1 \le j \le \kappa_f$ and $s_n \to s$.

Assumption B2: For any $s \in \mathcal{S}$, $\varphi(u_n, s_n) \to \infty$ whenever $\|u_n\| \to \infty$ and $s_n \to s$.
Assumption B1(i) imposes monotonicity on the test statistic as a function of the evaluation
measure, and is used for size control in the PSPA setting. Assumption B1(ii) concerns the consis-
tency of the test against the one-sided alternative and is easily verified for commonly used one-sided
test statistics, such as ϕt-stat, ϕMax and ϕStuMax described in the comment following Assumption
A2. Assumption B2 serves a similar purpose for two-sided tests, and is also easily verifiable.
5 White (2000) shows the validity of the bootstrap critical value in a setting where the sampling error in $\hat\beta_t$ is asymptotically irrelevant (West (1996), West (2006)). Corradi and Swanson (2007) propose a bootstrap critical value in the general setting of West (1996), without imposing asymptotic irrelevance.
3.2 Asymptotic properties of the feasible inference procedure
In this subsection, we show that the feasible tests described in Section 3.1 are asymptotically
valid under the true hypotheses. Similar to condition (2.8) in our basic example, we need a
convergence-of-hypotheses condition so as to bridge the gap between the proxy hypotheses and the
true hypotheses. In the general setting, this condition is formalized as follows:
Assumption C: $a_T(E[f_T^*] - E[f_T^{\dagger*}]) \to 0$, where $a_T$ is given by Assumption A1.
Assumption C is closely related to the approximation accuracy of the proxies. Since we are
interested in proxies constructed using high-frequency data, this condition is mainly related to the
convergence rate of high-frequency estimators, together with the growth rates of the time-series
sample span and the high-frequency sampling frequency. In Section 4, we consider broad classes
of high-frequency proxies $Y_t$ and forecast targets $Y_t^\dagger$, and show under primitive conditions that

$$\|Y_t - Y_t^\dagger\|_p \le K d_t^\theta, \quad \text{for all } t, \tag{3.2}$$

for some constants $K > 0$ and $\theta \in (0, 1/2]$, where $d_t$ denotes the sampling mesh of the high-frequency data in day $t$ and $\|\cdot\|_p$ denotes the $L_p$-norm for $p \ge 1$. Given the convergence rate
condition (3.2), Assumption C mainly requires that the sequence (dt)t≥1 of sampling meshes goes
to zero sufficiently fast relative to T → ∞, provided that the evaluation measure f(·) is smooth
in the target variable. Proposition 3.1, below, formalizes this statement and is useful for verifying
Assumption C.
Proposition 3.1. Suppose (i) condition (3.2) holds for some $K > 0$ and $\theta \in (0, 1/2]$; (ii) there exist a constant $h \in (0, 1]$ and a sequence $(m_t)_{t\ge0}$ of random variables such that for each $t$, $\|f(Y_t, F_t(\beta^*)) - f(Y_t^\dagger, F_t(\beta^*))\| \le m_t\|Y_t - Y_t^\dagger\|^h$; and (iii) $\sup_t \|m_t\|_{p/(p-1)} < \infty$. The following statements hold:

(a) $E[f_T^*] - E[f_T^{\dagger*}] = O\big(T^{-1}\sum_{t=1}^{T} d_t^{\theta h}\big)$.

(b) If, in addition, $a_T \asymp T^k$ for some $k > 0$ and $\sum_{t\ge1} t^{k-1} d_t^{\theta h} < \infty$, then Assumption C holds.
Comments. (i) Part (a) of Proposition 3.1 characterizes the rate at which the proxy hypothesis converges to the true hypothesis. If the sampling mesh $d_t$ does not change across days, so that $d_t = \Delta$ identically, then the convergence rate is simply $\Delta^{\theta h}$. Typically, the evaluation function $f(\cdot,\cdot)$ is stochastically Lipschitz in the forecast target, so $h = 1$. In addition, as shown in Section 4, a majority of high-frequency proxies satisfy (3.2) with $\theta = 1/2$. Hence, the "typical" rate of convergence of the proxy hypotheses is $\Delta^{1/2}$.

(ii) As shown in Section 3.1, most (but not all) evaluation methods are associated with $a_T \asymp T^{1/2}$. In view of the comment above, the "typical" sufficient condition for Assumption C is $T\Delta \to 0$.

(iii) More generally, part (b) shows that Assumption C holds if the summability condition $\sum_{t\ge1} t^{k-1} d_t^{\theta h} < \infty$ holds. This requires that the sampling mesh goes to zero sufficiently fast. A sufficient condition is $d_T = O(T^{-k/(\theta h)}(\log T)^{-1/(\theta h)-\eta})$ for some $\eta > 0$ that is arbitrarily small but fixed; see Theorem 2.31 in Davidson (1994).
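As a worked instance of comment (iii), take the typical values $k = 1/2$ (so that $a_T \asymp T^{1/2}$) and $\theta h = 1/2$. The summability condition then reads

$$\sum_{t\ge1} t^{-1/2} d_t^{1/2} < \infty,$$

which holds if, say, $d_t = O\big(t^{-1}(\log t)^{-2-\eta}\big)$ for some $\eta > 0$: in that case $t^{-1/2} d_t^{1/2} = O\big(t^{-1}(\log t)^{-1-\eta/2}\big)$, which is summable. This matches the displayed sufficient condition with $k/(\theta h) = 1$ and $1/(\theta h) = 2$.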
(iv) Corradi et al. (2011) derived convergence-rate results like (3.2) in the case when high-frequency data are contaminated by microstructure noise. These authors show that when the proxy $Y_t$ is the two-scale RV estimator (Zhang et al. (2005)), the multi-scale RV estimator (Zhang (2006)) or the realized kernel estimator (Barndorff-Nielsen et al. (2008)), (3.2) is satisfied with $\theta = 1/6$, $1/4$ or $1/4$, respectively, where $Y_t^\dagger$ is the IV.
In order to facilitate applications, we now illustrate the use of Proposition 3.1 for verifying
Assumption C in concrete examples. Here, we take condition (3.2) as given (see Section 4 for
results on this), and mainly illustrate how to verify condition (ii) in Proposition 3.1. We remind
the reader that P T is a maintained assumption.
Example 3.8: Consider a forecast comparison setting with the evaluation measure being the
loss differential of two competing forecasts, that is, $f(Y_t, (F_{1,t}, F_{2,t})) = L(Y_t - F_{1,t}) - L(Y_t - F_{2,t})$, where $L(\cdot)$ is a loss function. If $L(\cdot)$ is Lipschitz (e.g., Lin-Lin loss), then $|f(Y_t, (F_{1,t}, F_{2,t})) - f(Y_t^\dagger, (F_{1,t}, F_{2,t}))| \le K\|Y_t^\dagger - Y_t\|$, so that the sequence $m_t$ in Proposition 3.1 can be taken to be a constant.
Example 3.9: Non-Lipschitz loss functions can also be accommodated. Consider the same setting as in Example 3.8 but with $L(\cdot)$ being the quadratic loss (i.e., $L(x) = x^2$). We have $f(Y_t, (F_{1,t}, F_{2,t})) - f(Y_t^\dagger, (F_{1,t}, F_{2,t})) = 2(Y_t - Y_t^\dagger)(F_{2,t} - F_{1,t})$. If $\sup_{t\ge1}(\|F_{1,t}\|_q + \|F_{2,t}\|_q) < \infty$ for $q = p/(p-1)$,6 then the conditions in Proposition 3.1 are verified for $m_t = 2|F_{2,t} - F_{1,t}|$, by the $C_r$ inequality.

6 Uniform boundedness of moments is commonly used for deriving asymptotic results for heterogeneous data; see, for example, White (2001). This condition is trivially satisfied if the forecasts $F_{1,t}$ and $F_{2,t}$ are bounded (e.g., forecasts for correlations).
Example 3.10: Consider correlation forecasting for a bivariate asset price process $X_t = (X_{1t}, X_{2t})$. Let $Y_t^\dagger = \int_{t-1}^{t} c_s\,ds$ be the integrated covariance matrix and $Y_t$ be a proxy of it (see, e.g., Theorems 4.1 and 4.2). Following Barndorff-Nielsen and Shephard (2004a), we use the integrated correlation as a model-free correlation measure, which is defined as $H(Y_t^\dagger) \equiv Y_{12,t}^\dagger/\sqrt{Y_{11,t}^\dagger Y_{22,t}^\dagger}$. For an evaluation problem under the absolute deviation loss, the associated evaluation measure is $f(Y_t, (F_{1,t}, F_{2,t})) = |H(Y_t) - F_{1,t}| - |H(Y_t) - F_{2,t}|$. By the mean-value theorem and the Cauchy–Schwarz inequality, condition (ii) in Proposition 3.1 is verified for $m_t \equiv 2\|\nabla H(\tilde Y_t)\|$, where $\tilde Y_t$ is some mean value between $Y_t$ and $Y_t^\dagger$. By Jensen's inequality and Hölder's inequality, we see that a sufficient condition for condition (iii) of Proposition 3.1 is that the variables $(Y_{12,t}, 1/Y_{11,t}, 1/Y_{22,t}, Y_{12,t}^\dagger, 1/Y_{11,t}^\dagger, 1/Y_{22,t}^\dagger)$ have bounded $q$th moments, $q = 3p/(p-1)$.
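For instance, taking the proxy $Y_t$ to be the realized covariance matrix computed from intraday returns (in the spirit of Barndorff-Nielsen and Shephard (2004a)), the proxy correlation $H(Y_t)$ can be sketched as follows (names are illustrative):

```python
import numpy as np

def realized_corr(intraday_returns):
    """Proxy correlation H(Y_t) from one day's n x 2 matrix of intraday returns,
    with Y_t the realized covariance matrix (sum of outer products of returns)."""
    Y = intraday_returns.T @ intraday_returns
    return float(Y[0, 1] / np.sqrt(Y[0, 0] * Y[1, 1]))
```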
Finally, under the conditions discussed in Section 3.1 and Assumption C above, Proposition 3.2 shows that the feasible test $\phi_T$ is valid under the true hypotheses.

Proposition 3.2. The following statements hold under Assumptions A1–A3 and C.

(a) Under the EPA setting (2.11), $E\phi_T \to \alpha$ under $H_0^\dagger$. If Assumption B1(ii) (resp. B2) holds in addition, we have $E\phi_T \to 1$ under $H_{1a}^\dagger$ (resp. $H_{2a}^\dagger$).

(b) Under the SPA setting (2.12) and Assumption B1, we have $\limsup_{T\to\infty} E\phi_T \le \alpha$ under $H_0^\dagger$ and $E\phi_T \to 1$ under $H_a^\dagger$.
Comments. (i) It can be shown that the test φT satisfies the same asymptotic level and power
properties under the proxy hypotheses, without requiring Assumption C. Assumption C is needed
for deriving asymptotic properties of φT under the true hypotheses. In particular, Proposition
3.2 shows that the level and power properties of the test are the same for the true and the proxy
hypotheses. In this sense, the proxy error is negligible for asymptotic inference about predictive accuracy.
(ii) Similar to our negligibility result, West (1996) defines cases exhibiting “asymptotic irrel-
evance” as those in which valid inference about predictive accuracy can be made while ignoring
the presence of parameter estimation error $\hat\beta_t - \beta^*$. Our negligibility result is very distinct from
West's result: here, the unobservable quantity is a latent stochastic process $(Y_t^\dagger)_{t\ge1}$ whose span grows as $T \to \infty$, while in West's setting it is a fixed deterministic and finite-dimensional parameter
β∗. Unlike West’s (1996) case, where a correction can be applied when the asymptotic irrelevance
condition (w.r.t. β∗) is not satisfied, no such correction (w.r.t. Y †t ) is readily available in our
application, nor in that of Corradi and Distaso (2006), among others.
3.3 Discussion of an alternative approach
The theoretical framework that we develop above is based on the “convergence-of-hypotheses”
approach. We have shown that if the proxy only results in an asymptotically negligible difference
in the expected evaluation measure (i.e., Assumption C), then the feasible tests based on the proxy
have desirable rejection probabilities under the true hypotheses (see Proposition 3.2).
We stress that the convergence-of-hypotheses approach is the natural choice here because we are
interested in hypothesis testing. This is thus very different from prior work that derives asymptotic
negligibility results involving high-frequency proxies in estimation problems of stochastic volatility
models; see, for example, Corradi and Distaso (2006), Todorov (2009), Todorov et al. (2011),
Kanaya and Kristensen (2016) and Bandi and Reno (2016). The approach used in these papers
can be regarded as one with “convergence-of-statistics.” That is, these authors show that certain
feasible proxy-based statistics (e.g., the estimator of the parameter of interest and that of the
asymptotic variance) have asymptotically negligible differences from their infeasible counterparts
(which are constructed using latent variables such as the IV).
It is possible to use the convergence-of-statistics approach as an alternative proof strategy to
show the validity of the feasible tests. For concreteness, we use the example in Section 2.1 to
illustrate how this can be done. In this example, the feasible statistics include DMT and the
long-run variance estimator $S_T$. To fix ideas, we suppose that $S_T$ is the Newey–West estimator
given by
$$S_T = \sum_{l=-h_T}^{h_T} w(l, h_T)\,\Gamma_{l,T},$$

where $\Gamma_{l,T}$ is a sample autocovariance function of the loss differential series $|Y_{t+1} - F_{1,t+1}| - |Y_{t+1} - F_{2,t+1}|$ at lag $l$, $w(\cdot)$ is a kernel function and $h_T$ is a bandwidth parameter. The associated infeasible statistics are $DM_T^\dagger$ and $S_T^\dagger$, where $S_T^\dagger$ is constructed similarly to $S_T$ but with $Y_t$ replaced by $Y_t^\dagger$. The feasible and infeasible t-statistics are then given by, respectively,

$$\varphi_T \equiv \sqrt{P}\,DM_T/\sqrt{S_T}, \qquad \varphi_T^\dagger \equiv \sqrt{P}\,DM_T^\dagger/\sqrt{S_T^\dagger}.$$
The convergence-of-statistics approach amounts to seeking sufficient conditions that ensure $\varphi_T - \varphi_T^\dagger = o_p(1)$, for which it suffices to have

$$\sqrt{P}(DM_T - DM_T^\dagger) = o_p(1), \qquad S_T - S_T^\dagger = o_p(1). \tag{3.3}$$
In contrast, the convergence-of-hypotheses condition in this example is given by (2.8), which is recalled below for the reader's convenience:

$$\sqrt{P}\big(E[DM_T] - E[DM_T^\dagger]\big) = o(1).$$
Immediately, we observe that this condition is very similar to the first part of (3.3). In fact, both are implied by $E|DM_T - DM_T^\dagger| = o(P^{-1/2})$, which underlies the proof of Proposition 3.1. However, unlike (3.3), the convergence-of-hypotheses approach does not require the additional negligibility result concerning the long-run variance estimators, that is, $S_T - S_T^\dagger = o_p(1)$.
It is of course technically possible to show (3.3) in this simple example.7 However, our key
observation is that this effort is unnecessary, because our “end goal” is not to recover the infeasible
test statistic (i.e., ϕ†T ), but to ensure that the feasible test has the desired asymptotic rejection
probabilities under the true hypotheses (which we have shown in Proposition 3.2). The former
clearly implies the latter, but not without cost. In this sense, the convergence-of-hypotheses
approach offers a “shorter path” for proving the validity of the feasible test than the convergence-
of-statistics alternative.
More generally, the additional cost of recovering the infeasible test statistic can be much higher
than establishing a negligibility result for the long-run variance. Indeed, as shown in the examples
in Section 3.1, the test statistics may have non-standard asymptotic distributions, the long-run
variance may not be consistently estimated, and the inference may be done via bootstrap. These
idiosyncrasies would require a method-by-method calculation (together with additional method-
specific regularity conditions) for showing the negligible difference between the feasible and the
infeasible test statistics. This would defeat our goal of establishing a concise but general framework
for studying a broad range of proxy-based forecast evaluation problems. This is the main reason
why we have developed the convergence-of-hypotheses approach in the current study.
7 Let $\Gamma_{l,T}^\dagger$ denote the infeasible counterpart of $\Gamma_{l,T}$. Suppose that the variables $Y_t$, $Y_t^\dagger$, $F_{1,t}$ and $F_{2,t}$ have bounded $q$th moments for $q = p/(p-1)$ and (3.2) holds. Then by the triangle inequality and Hölder's inequality, $E|\Gamma_{l,T} - \Gamma_{l,T}^\dagger| \le KT^{-1}\sum_{t=R}^{T} d_t^\theta$. Since the kernel function $w(\cdot)$ is bounded, $S_T - S_T^\dagger = O_p(h_T T^{-1}\sum_{t=R}^{T} d_t^\theta)$. In the special case with regular sampling (i.e., $d_t = \Delta$ identically), $h_T\Delta^\theta \to 0$ implies $S_T - S_T^\dagger = o_p(1)$.

4 High-frequency proxies and their accuracy

In this section, we introduce high-frequency proxies $Y_t$ for various risk measures $Y_t^\dagger$ and verify the high-level Assumption C, invoked in the previous section, under primitive conditions in a range of cases. Section 4.1 introduces the setting for the high-frequency data. Sections 4.2–4.4 consider
three general classes of proxies for various volatility and jump functionals and Section 4.5 considers
some additional important examples. In Section 4.6, we compare these results with existing ones
in the literature and summarize our technical contribution.
4.1 The underlying asset price process
In this subsection, we describe the setting for constructing proxies using high-frequency data. Our
basic assumption is that the log price process (Xt)t≥0 is a d-dimensional Ito semimartingale defined
on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$ with the following form

$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t\!\int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|\le1\}}\,\tilde\mu(ds,dz) + \int_0^t\!\int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|>1\}}\,\mu(ds,dz), \tag{4.1}$$

where $b_t$ is a $d$-dimensional càdlàg adapted process, $W_t$ is a $d'$-dimensional standard Brownian motion, $\sigma_t$ is a $d \times d'$ stochastic volatility process, $\delta: \Omega \times \mathbb{R}_+ \times \mathbb{R} \mapsto \mathbb{R}^d$ is a predictable function, $\mu$ is a Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}$ with compensator $\nu(ds, dz) = ds \otimes \lambda(dz)$ for some $\sigma$-finite measure $\lambda$, and $\tilde\mu \equiv \mu - \nu$. Ito semimartingales are widely used for modeling asset prices in
financial economics and econometrics; see, for example, Duffie (2001), Singleton (2006) and Jacod
and Protter (2012).
The diffusive risk and the jump risk in $X_t$ are respectively captured by the spot covariance matrix $c_t \equiv \sigma_t\sigma_t^\intercal$ and the jump process $\Delta X_t \equiv X_t - X_{t-}$, where $X_{t-} \equiv \lim_{s\uparrow t} X_s$. In practice, these risks are often summarized as various functionals of the processes $c_t$ and $\Delta X_t$, which play the role of the latent forecast target $Y_t^\dagger$ in our analysis.
To simplify the discussion, we normalize the unit of time to be one day. For each day $t$, the process $X$ is sampled at deterministic discrete times $t - 1 = \tau(t, 0) < \cdots < \tau(t, n_t) = t$, where $n_t$ is the number of intraday returns. We denote the returns and sampling durations by, respectively,

$$\Delta_{t,i}X \equiv X_{\tau(t,i)} - X_{\tau(t,i-1)}, \qquad d_{t,i} = \tau(t,i) - \tau(t,i-1),$$

and denote the sampling mesh by $d_t = \max_{1\le i\le n_t} d_{t,i}$. The basic assumption on the sampling
scheme is that dt should be “small” in the prediction sample, as formalized below.
Assumption S: $d_T \to 0$ and $d_T = O(n_T^{-1})$ as $T \to \infty$.
Assumption S posits that the sampling mesh and the sample span T respectively go to 0 and
∞ in a joint, rather than a sequential, way. Under this condition, we characterize the rate of
convergence of various high-frequency proxies in Section 4. This sampling scheme is essentially the
same as the “double asymptotic” setting considered by Corradi and Distaso (2006) and Todorov
(2009), among others. Indeed, the latter amounts to setting dt,i to be a constant ∆, so that
Assumption S posits ∆ → 0 and T → ∞ asymptotically. Allowing for time-varying sampling
incurs no additional cost in our derivation, but is conceptually desirable in practice. As the
trading activity has grown substantially over the past two decades, later samples have a much
larger number of, and less noisy, intradaily observations than those in earlier samples, so it may be
more efficient to sample more frequently in later samples (Aıt-Sahalia et al. (2005), Zhang et al.
(2005), Bandi and Russell (2008)). This setting is also aligned naturally with the focal point of our
approximation argument: we are interested in using the proxy Yt+τ to approximate the true target
Y †t+τ in the prediction sample (i.e., t ∈ R, . . . , T) for evaluation, while being agnostic about the
regression sample (i.e., t < R).
We need the following regularity condition for the process $X_t$.

Assumption HF: Suppose that the following conditions hold for constants $r \in (0,2]$, $k \ge 2$ and $C > 0$.

(i) The process $\sigma_t$ is a $d\times d'$ Itô semimartingale of the form
\[
\sigma_t = \sigma_0 + \int_0^t \tilde{b}_s\,ds + \int_0^t \tilde{\sigma}_s\,dW_s + \int_0^t\!\int_{\mathbb{R}} \tilde{\delta}(s,z)\,\tilde{\mu}(ds,dz), \tag{4.2}
\]
where $\tilde{b}$ is a $d\times d'$ càdlàg adapted process, $\tilde{\sigma}$ is a $d\times d'\times d'$ càdlàg adapted process and $\tilde{\delta}(\cdot)$ is a $d\times d'$ predictable function on $\Omega\times\mathbb{R}_+\times\mathbb{R}$.

(ii) For some nonnegative deterministic functions $\Gamma(\cdot)$ and $\tilde{\Gamma}(\cdot)$ on $\mathbb{R}$, we have $\|\delta(\omega,s,z)\| \le \Gamma(z)$ and $\|\tilde{\delta}(\omega,s,z)\| \le \tilde{\Gamma}(z)$ for all $(\omega,s,z) \in \Omega\times\mathbb{R}_+\times\mathbb{R}$, and
\[
\int_{\mathbb{R}} \big(\Gamma(z)^r \wedge 1\big)\,\lambda(dz) + \int_{\mathbb{R}} \Gamma(z)^k 1_{\{\Gamma(z)>1\}}\,\lambda(dz) < \infty, \qquad \int_{\mathbb{R}} \big(\tilde{\Gamma}(z)^2 + \tilde{\Gamma}(z)^k\big)\,\lambda(dz) < \infty. \tag{4.3}
\]

(iii) Let $b'_s = b_s - \int_{\mathbb{R}} \delta(s,z)\,1_{\{\|\delta(s,z)\|\le 1\}}\,\lambda(dz)$ if $r \in (0,1]$ and $b'_s = b_s$ if $r \in (1,2]$. We have, for all $s \ge 0$,
\[
\mathbb{E}\|b'_s\|^k + \mathbb{E}\|\sigma_s\|^k + \mathbb{E}\|\tilde{b}_s\|^k + \mathbb{E}\|\tilde{\sigma}_s\|^k \le C. \tag{4.4}
\]
Assumption HF(i) posits that the stochastic volatility process $\sigma_t$ is also an Itô semimartingale. For the results below, we allow for volatility jumps of arbitrary activity. For this reason, we do not need to distinguish small jumps from big jumps in volatility, so we group them together as a purely discontinuous local martingale in (4.2). Assumption HF(ii) imposes a type of dominance condition on the random jump size for the price and the volatility. The constant $r$ provides an upper bound for the generalized Blumenthal–Getoor index, or "activity," of price jumps in $X$. The assumption is weaker when $r$ is larger, in which case it is more difficult to separate jumps from the diffusive component of $X_t$. The $k$th-order integrability of $\Gamma(\cdot)$ and $\tilde{\Gamma}(\cdot)$ restricts the jump tails and facilitates the derivation of bounds via sufficiently high moments. Assumption HF(iii) imposes integrability conditions that serve the same purpose.
In the subsections below, we establish (3.2) under these primitive conditions. We stress that, unlike the existing convergence rate results obtained under the fixed-$T$ setting (see, e.g., Jacod and Protter (2012)), we consider rate results that are valid in the large-$T$ setting, which demands different conditions and proofs; see Section 4.6 for a detailed discussion. Throughout the rest of this section, we maintain Assumption S without further mention.
4.2 Generalized realized variations for continuous processes

We start with the basic setting in which $X$ is continuous; the continuity condition will be relaxed in later subsections. That said, we allow for general volatility jumps as described in Assumption HF. We consider the following general class of estimators: for any measurable function $g: \mathbb{R}^d \mapsto \mathbb{R}$, we set
\[
\hat{I}_t(g) \equiv \sum_{i=1}^{n_t} g\big(\Delta_{t,i}X / d_{t,i}^{1/2}\big)\, d_{t,i}.
\]
We also associate $g$ with the following function: for any $d\times d$ positive semidefinite matrix $A$, we set $\rho(A;g) \equiv \mathbb{E}[g(U)]$ for $U \sim \mathcal{N}(0,A)$, provided that the expectation is well defined. Theorem 4.1 below provides a bound for the approximation error between the proxy $Y_t = \hat{I}_t(g)$ and the target variable $Y^\dagger_t = I_t(g) \equiv \int_{t-1}^t \rho(c_s; g)\,ds$.

In many applications, the function $\rho(\cdot\,;g)$ and, hence, $I_t(g)$ can be expressed in closed form. For example, in the scalar case (i.e., $d = 1$), if we take $g(x) = |x|^a/m_a$ for some $a \ge 2$, where $m_a$ is the $a$th absolute moment of a standard normal variable, then $I_t(g) = \int_{t-1}^t c_s^{a/2}\,ds$; the integrated variance is a special case with $a = 2$. Another univariate example is to take $g(x) = \cos(\sqrt{2u}\,x)$, $u > 0$, yielding $I_t(g) = \int_{t-1}^t \exp(-u c_s)\,ds$. In this case, $\hat{I}_t(g)$ is the realized Laplace transform of volatility (Todorov and Tauchen (2012)) and $I_t(g)$ is the Laplace transform of the volatility occupation density, which captures the distributional information of volatility. A simple bivariate example is $g(x_1,x_2) = x_1 x_2$, which leads to $I_t(g) = \int_{t-1}^t c_{12,s}\,ds$, that is, the integrated covariance between the two components of $X_t$; see Barndorff-Nielsen and Shephard (2004a).
Theorem 4.1. Let $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $X_t$ is continuous; (ii) $g(\cdot)$ and $\rho(\cdot\,;g)$ are continuously differentiable and, for some $q \ge 0$, $\|\partial_x g(x)\| \le C(1 + \|x\|^q)$ and $\|\partial_A \rho(A;g)\| \le C(1 + \|A\|^{q/2})$; (iii) Assumption HF holds with $k \ge \max\{2qp/(2-p),\,4\}$; (iv) $\mathbb{E}[\rho(c_s; g^2)] \le C$ for all $s \ge 0$. Then $\|\hat{I}_t(g) - I_t(g)\|_p \le K d_t^{1/2}$ for some constant $K > 0$ and all $t$.
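To fix ideas, the proxy $\hat{I}_t(g)$ is a simple plug-in sum over duration-normalized intraday returns. The following Python sketch is our own illustration (the function name, toy data and parameter values are hypothetical, not part of the paper's formal development); it computes $\hat{I}_t(g)$ for one day of returns, specialized to the integrated-variance and realized-Laplace-transform examples above.

```python
import numpy as np

def generalized_realized_variation(returns, durations, g):
    """Proxy I_t(g)-hat = sum_i g(dX_i / d_i^{1/2}) * d_i for one day of
    intraday returns dX_i with sampling durations d_i (in day units)."""
    z = returns / np.sqrt(durations)               # normalized returns
    return np.sum(g(z) * durations)

# Toy example with regular one-minute sampling and constant spot variance 0.1.
rng = np.random.default_rng(0)
n = 390
d = np.full(n, 1.0 / n)
r = np.sqrt(0.1 * d) * rng.standard_normal(n)
iv_proxy = generalized_realized_variation(r, d, lambda x: x**2)              # realized variance
rlt_proxy = generalized_realized_variation(r, d,
                                           lambda x: np.cos(np.sqrt(2 * 2.0) * x))  # Laplace transform, u = 2
```

Note that with $g(x) = x^2$ the normalization cancels and the proxy reduces to the usual realized variance $\sum_i (\Delta_{t,i}X)^2$.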
4.3 Jump-robust proxies for integrated volatility functionals
We now turn to a general setting in which $X_t$ may have jumps. In this subsection, we consider jump-robust proxies for risk measures of the form $I^*_t(g) = \int_{t-1}^t g(c_s)\,ds$, where $g: \mathbb{R}^{d\times d} \mapsto \mathbb{R}$ is a twice continuously differentiable function with at most polynomial growth. This class of risk measures is quite general: integrated variance and covariance, integrated quarticity, and volatility Laplace and Fourier transforms are special cases. The estimation of, and inference for, this class of integrated volatility functionals has been studied by Kristensen (2010) in a case without price or volatility jumps.
In order to construct a jump-robust proxy for $I^*_t(g)$, we first nonparametrically recover the spot covariance process using the spot truncated covariation estimator given by⁸
\[
\hat{c}_{\tau(t,i)} = \frac{1}{k_t} \sum_{j=1}^{k_t} d_{t,i+j}^{-1}\, \Delta_{t,i+j}X\, \Delta_{t,i+j}X^{\mathsf{T}}\, 1_{\{\|\Delta_{t,i+j}X\| \le \alpha d_{t,i+j}^{\varpi}\}}, \tag{4.5}
\]
where $\alpha > 0$ and $\varpi \in (0,1/2)$ are constant tuning parameters, and $k_t$ is an integer that specifies the local window for the spot covariance estimation and may vary across days. We consider the sample analogue of $I^*_t(g)$ as its proxy, that is, $\hat{I}^*_t(g) = \sum_{i=0}^{n_t-k_t} g(\hat{c}_{\tau(t,i)})\, d_{t,i}$.

⁸ Spot variance estimators date back to Foster and Nelson (1996) and Comte and Renault (1998); also see Kristensen (2010) and references therein. The truncation technique was proposed by Mancini (2001) for the estimation of integrated variance. The spot truncated covariation estimator appeared in Chapter 9 of Jacod and Protter (2012), and versions of it have been considered as auxiliary results in other contexts (see, e.g., Aït-Sahalia and Jacod (2009)).

Theorem 4.2. Let $q \ge 2$, $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $g$ is twice continuously differentiable and $\|\partial^j_x g(x)\| \le C(1 + \|x\|^{q-j})$ for $j \in \{0,1,2\}$; (ii) $k_t \asymp d_t^{-1/2}$; (iii) Assumption HF holds with $k \ge \max\{4q,\, 4p(q-1)/(2-p),\, (1-\varpi r)/(1/2-\varpi)\}$ and $r \in (0,2)$. Set $\theta_1 = 1/(2p)$ in the general case and $\theta_1 = 1/2$ if we further assume that $\sigma_t$ is continuous. Also set $\theta_2 = \min\{1 - \varpi r + q(2\varpi - 1),\, 1/r - 1/2\}$. Then $\|\hat{I}^*_t(g) - I^*_t(g)\|_p \le K d_t^{\theta_1 \wedge \theta_2}$ for some constant $K$ and all $t$.
Comments. (i) The rate exponent $\theta_1$ is associated with the contribution from the continuous component of $X_t$. The exponent $\theta_2$ captures the approximation error due to the elimination of jumps. The larger these exponents, the faster the proxy hypotheses converge to the true ones; recall Proposition 3.1. If we further impose $r < 1$ and $\varpi \in [(q-1/2)/(2q-r),\, 1/2)$, then $\theta_2 \ge 1/2 \ge \theta_1$. That is, the presence of "inactive" jumps does not affect the rate of convergence, provided that the jumps are properly truncated.

(ii) Jacod and Rosenbaum (2013) characterize the limit distribution of $\hat{I}^*_t(g)$ under the in-fill asymptotic setting with a fixed time span, under the assumption that $g$ is three times continuously differentiable and $r < 1$. The same rate is attained by Kristensen (2010) in the fixed-$T$ setting for twice continuously differentiable functions in the case without price or volatility jumps. Here, we obtain the same rate of convergence under the $L_1$ norm, and under the $L_p$ norm if $\sigma_t$ is continuous, in a setting with $d_T \to 0$ and $T \to \infty$. Our results also cover the case with active jumps, that is, the setting with $r \ge 1$.
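The construction in (4.5) and the plug-in proxy $\hat{I}^*_t(g)$ are straightforward to implement. The following Python sketch is our own illustration under simplifying assumptions (0-based indexing, so the local window for index $i$ uses returns $i+1,\ldots,i+k_t$ in the paper's 1-based notation, and the Riemann weight attached to index $i$ is approximated by the adjacent duration); the function names and default tuning values are hypothetical.

```python
import numpy as np

def spot_truncated_cov(returns, durations, k_t, alpha=3.0, varpi=0.49):
    """Spot covariance estimates as in (4.5): local averages over the next k_t
    duration-normalized outer products of returns, with returns exceeding the
    threshold alpha * d^varpi zeroed out (truncated). `returns` is (n, d)."""
    keep = np.linalg.norm(returns, axis=1) <= alpha * durations ** varpi
    outer = returns[:, :, None] * returns[:, None, :]            # (n, d, d)
    outer = outer * keep[:, None, None] / durations[:, None, None]
    n = len(returns)
    return np.array([outer[i:i + k_t].mean(axis=0) for i in range(n - k_t + 1)])

def truncated_functional_proxy(returns, durations, k_t, g, **kwargs):
    """Plug-in proxy for the integrated volatility functional int g(c_s) ds:
    a Riemann sum of g evaluated at the spot estimates."""
    c_hat = spot_truncated_cov(returns, durations, k_t, **kwargs)
    return sum(g(c) * durations[i] for i, c in enumerate(c_hat))
```

For instance, `truncated_functional_proxy(r, d, k, lambda c: c[0, 0] ** 2)` would give a jump-robust proxy for the integrated quarticity of the first asset, in line with Theorem 4.2's choice $k_t \asymp d_t^{-1/2}$ for the window length.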
4.4 Functionals of price jumps

In this subsection, we consider jump risk measures. The target variable of interest takes the form $J_t(g) \equiv \sum_{t-1<s\le t} g(\Delta X_s)$ for some function $g: \mathbb{R}^d \mapsto \mathbb{R}$. The proxy is the sample-analogue estimator $\hat{J}_t(g) \equiv \sum_{i=1}^{n_t} g(\Delta_{t,i}X)$. Basic examples include jump power variations such as the unnormalized realized skewness ($g(x) = x^3$), kurtosis ($g(x) = x^4$), coskewness ($g(x_1,x_2) = x_1^2 x_2$) and cokurtosis ($g(x_1,x_2) = x_1^2 x_2^2$). See Amaya et al. (2011) for applications of these risk factors.

Theorem 4.3. Let $p \in [1,2)$ and $C > 0$ be constants. Suppose (i) $g$ is twice continuously differentiable; (ii) for some $q_2 \ge q_1 \ge 3$, we have $\|\partial^j_x g(x)\| \le C(\|x\|^{q_1-j} + \|x\|^{q_2-j})$ for all $x \in \mathbb{R}^d$ and $j \in \{0,1,2\}$; (iii) Assumption HF holds with $k \ge \max\{2q_2,\, 4p/(2-p)\}$. Then $\|\hat{J}_t(g) - J_t(g)\|_p \le K d_t^{1/2}$ for some constant $K$ and all $t$.
Comment. The polynomial $\|x\|^{q_1-j}$ in condition (ii) bounds the growth of $g(\cdot)$ and its derivatives near zero. This condition ensures that the contribution of the continuous part of $X$ to the approximation error is dominated by that of the jump part of $X$; it can be relaxed at the cost of a more complicated expression for the rate. The polynomial $\|x\|^{q_2-j}$ controls the growth of $g(\cdot)$ near infinity so as to tame the effect of big jumps.
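The sample analogue $\hat{J}_t(g)$ requires no tuning parameters at all. As a minimal illustration of ours (the function name is hypothetical), in Python:

```python
import numpy as np

def jump_functional_proxy(returns, g):
    """Sample-analogue proxy J_t(g)-hat = sum_i g(Delta_{t,i} X) over one day
    of intraday returns. With g(x) = x**3 or x**4 this gives the unnormalized
    realized skewness or kurtosis, respectively."""
    return np.sum(g(returns))

# e.g., for a univariate intraday return vector r:
# skew = jump_functional_proxy(r, lambda x: x ** 3)
# kurt = jump_functional_proxy(r, lambda x: x ** 4)
```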
4.5 Additional special examples

We now consider a few special examples that are not covered by Theorems 4.1–4.3. In the first example, the true target is the daily quadratic covariation matrix $QV_t$ of the process $X$, that is, $QV_t \equiv \int_{t-1}^t c_s\,ds + \sum_{t-1<s\le t} \Delta X_s\, \Delta X_s^{\mathsf{T}}$. The associated proxy is the realized covariation matrix
\[
RV_t \equiv \sum_{i=1}^{n_t} \Delta_{t,i}X\, \Delta_{t,i}X^{\mathsf{T}}. \tag{4.6}
\]

Theorem 4.4. Let $p \in [1,2)$. Suppose that Assumption HF holds with $k \ge \max\{2p/(2-p),\,4\}$. Then $\|RV_t - QV_t\|_p \le K d_t^{1/2}$ for some $K$ and all $t$.

Comment. In the case without price jumps, Corradi and Distaso (2006) established a similar result; see their Proposition 1. Theorem 4.4 holds in the general case with price jumps, without any restriction on their activity.
Second, we consider the bipower variation of Barndorff-Nielsen and Shephard (2004b) for univariate $X$, defined as
\[
BV_t = \frac{n_t}{n_t - 1}\cdot\frac{\pi}{2} \sum_{i=1}^{n_t-1} \big|d_{t,i}^{-1/2}\Delta_{t,i}X\big|\,\big|d_{t,i+1}^{-1/2}\Delta_{t,i+1}X\big|\, d_{t,i}. \tag{4.7}
\]
This estimator serves as a proxy for the integrated variance $\int_{t-1}^t c_s\,ds$.

Theorem 4.5. Let $p$ and $p'$ be constants such that $1 \le p < p' \le 2$. Suppose that Assumption HF holds with $d = 1$ and $k \ge \max\{pp'/(p'-p),\,4\}$. Then, for some $K$ and all $t$: (a) $\|BV_t - \int_{t-1}^t c_s\,ds\|_p \le K d_t^{(1/r)\wedge(1/p') - 1/2}$; (b) if, in addition, $X$ is continuous, then $\|BV_t - \int_{t-1}^t c_s\,ds\|_p \le K d_t^{1/2}$.
Comment. Part (b) shows that, when $X$ is continuous, the approximation error of the bipower variation achieves the $\sqrt{n_t}$ rate. Part (a) provides a bound on the rate of convergence in the case with jumps; this rate is slower than in the continuous case. The constant $p'$ arises as a technical device in our proofs and should be chosen close to $p$ so that the bound in part (a) is sharper. We note that the rate in part (a) is sharper when $r$ is smaller. In particular, with $r \le 1$ and $p'$ close to 1, the bound in the jump case can be made arbitrarily close to $O(d_t^{1/2})$, at the cost of assuming higher-order moments to be finite (i.e., larger $k$). The slower rate in the jump case is in line with the known fact that the bipower variation estimator does not admit a central limit theorem when $X$ is discontinuous.⁹

⁹ See p. 313 in Jacod and Protter (2012) and Vetter (2010).
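A direct transcription of (4.7), as a minimal sketch of ours (the function name is hypothetical):

```python
import numpy as np

def bipower_variation(returns, durations):
    """Bipower variation (4.7) for one day of univariate intraday returns:
    products of adjacent duration-normalized absolute returns, scaled by
    (n/(n-1)) * (pi/2)."""
    n = len(returns)
    z = np.abs(returns) / np.sqrt(durations)      # |d^{-1/2} * return|
    return (n / (n - 1)) * (np.pi / 2) * np.sum(z[:-1] * z[1:] * durations[:-1])
```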
Finally, we consider the realized semivariance estimator proposed by Barndorff-Nielsen et al. (2010) for univariate $X$. Let $x_+$ and $x_-$ denote the positive and negative parts of $x \in \mathbb{R}$, respectively. The upside ($+$) and downside ($-$) realized semivariances are defined as $\widehat{SV}^{\pm}_t = \sum_{i=1}^{n_t} (\Delta_{t,i}X)_{\pm}^2$, which serve as proxies for $SV^{\pm}_t = \frac{1}{2}\int_{t-1}^t c_s\,ds + \sum_{t-1<s\le t} (\Delta X_s)_{\pm}^2$.

Theorem 4.6. Let $p$ and $p'$ be constants such that $1 \le p < p' \le 2$. Suppose that Assumption HF holds with $d = 1$, $r \in (0,1]$ and $k \ge \max\{pp'/(p'-p),\,4\}$. Then, for some $K$ and all $t$: (a) $\|\widehat{SV}^{\pm}_t - SV^{\pm}_t\|_p \le K d_t^{1/p' - 1/2}$; (b) if, in addition, $X$ is continuous, then $\|\widehat{SV}^{\pm}_t - SV^{\pm}_t\|_p \le K d_t^{1/2}$.
Comment. Part (b) shows that, when $X$ is continuous, the approximation error of the semivariance achieves the $\sqrt{n_t}$ rate, which agrees with the rate shown in Barndorff-Nielsen et al. (2010) under the fixed-span setting. Part (a) provides a bound on the rate of convergence in the case with jumps. The constant $p'$ arises as a technical device in the proof; one should choose it small so as to achieve a better rate, subject to the regularity condition $k \ge pp'/(p'-p)$. In particular, the rate can be made close to that of the continuous case when $p'$, and hence $p$, are close to 1. Barndorff-Nielsen et al. (2010) do not consider rate results in the case with price or volatility jumps.
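The semivariance proxies are one-liners; the following Python sketch (ours, with a hypothetical function name) computes both:

```python
import numpy as np

def realized_semivariance(returns):
    """Upside and downside realized semivariances: sums of squared positive
    and squared negative intraday returns, respectively."""
    up = np.sum(np.maximum(returns, 0.0) ** 2)
    down = np.sum(np.minimum(returns, 0.0) ** 2)
    return up, down
```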
4.6 Discussion: comparison with existing results
The high-frequency proxies studied in this section have been proposed in the literature; see Jacod
and Protter (2012) for a comprehensive review. However, the convergence rate results in this
section are distinct from existing work in two important dimensions.
First, with a few exceptions, prior work in the high-frequency literature mainly concerns an asymptotic setting in which the sampling interval goes to zero but the sample span $T$ is fixed. Indeed, this is the setting of Jacod and Protter (2012). In contrast, we consider a high-frequency long-span (double) asymptotic setting in which $T \to \infty$. Consequently, existing rate results developed in the fixed-$T$ setting cannot be directly invoked here. To be more specific, we note that in the fixed-$T$ setting it is routine to apply the localization argument (see Section 4.4.1 in Jacod and Protter (2012)), so that one can assume the underlying processes to be uniformly bounded without loss of generality. We cannot invoke localization in the current long-span setting; hence, we need to design a different set of regularity conditions (see Assumption HF) and use different technical arguments. We note that the $L_p$-bounds on the proxy errors derived in the above subsections hold for the original process $X$ at every $t$, instead of for a localized process (as is done in the fixed-$T$ setting).
Second, we are interested only in the rate of convergence, rather than in proving a central limit theorem. We thus provide direct proofs of the rates, which are, not surprisingly, notably different from proofs of central limit theorems for high-frequency estimators. Indeed, we derive rate results even in cases where no central limit theorem is known. Examples include the semivariance, the realized (co)skewness, and jump-robust estimators of integrated volatility functionals in the setting with active jumps.
The high-frequency long-span setting has also been used in the literature on proxy-based semiparametric estimation of stochastic volatility models; see Corradi and Distaso (2006), Todorov (2009), Todorov et al. (2011), Kanaya and Kristensen (2016) and Bandi and Renò (2016). Corradi et al. (2009, 2011) further consider the problem of nonparametric density estimation. These papers use proxies such as the realized variance, bipower variation, truncated variation, volatility Laplace transform and spot variance estimates, and establish asymptotic negligibility results for them. Our Theorems 4.1 and 4.2 cover these volatility functionals as special cases (although our econometric interest in forecast evaluation is very different from the prior work on estimation). Indeed, instead of focusing on specific volatility functionals (such as the integrated variance), we state our results for general transformations of the volatility, so that other risk measures, such as beta, correlation, idiosyncratic variance, volatility beta and eigenvalues, can be readily incorporated into our forecast evaluation setting.
The convergence rate results in this section do not cover high-frequency proxies that are robust to microstructure noise. The current literature on noise mainly focuses on the estimation of integrated variance and covariance. Corradi et al. (2011) have established convergence rate results under the $L_p$-norm for several popular noise-robust estimators in their Lemma 1. As mentioned in comment (iv) of Proposition 3.1, their results can be used to verify the high-level condition. Generalizing the results in this section to the noisy setting is beyond the scope of the current paper.
5 Extensions: additional forecast evaluation methods

In this section we discuss several extensions of our baseline result (Proposition 3.2). We first consider tests for instrumented conditional moment equalities, as in Giacomini and White (2006). We then consider stepwise evaluation procedures, including the multiple testing method of Romano and Wolf (2005) and the model confidence set of Hansen et al. (2011). Our purpose is twofold: one is to facilitate the application of these methods in the context of forecasting latent risk measures; the other is to demonstrate the generalizability of the framework presented above through known, but distinct, examples. The stepwise procedures (Romano and Wolf (2005), Hansen et al. (2011)) each involve some method-specific aspects that are not used elsewhere in the paper; hence, for the sake of readability, we only briefly discuss the results here and present the details (assumptions, algorithms and formal results) in Supplemental Appendix B to this paper.
5.1 Tests for instrumented conditional moment equalities

Many interesting forecast evaluation problems can be stated as a test of the conditional moment equality
\[
H^\dagger_0: \quad \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \,\big|\, \mathcal{H}_t\big] = 0, \quad \text{all } t \ge 0, \tag{5.1}
\]
where $\mathcal{H}_t$ is a sub-$\sigma$-field that represents the forecast evaluator's information set at day $t$, and $g(\cdot,\cdot): \mathcal{Y}\times\mathcal{Y}^k \mapsto \mathbb{R}^{\kappa_g}$ is a measurable function. Specific examples are given below. Let $h_t$ denote an $\mathcal{H}_t$-measurable $\mathbb{R}^{\kappa_h}$-valued data sequence that serves as an instrument. The conditional moment equality (5.1) implies the following unconditional moment equality:
\[
H^\dagger_{0,h}: \quad \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \otimes h_t\big] = 0, \quad \text{all } t \ge 0. \tag{5.2}
\]
We cast (5.2) in the setting of Section 2 by setting $f_{t+\tau}(y,\beta) \equiv g(y, F_{t+\tau}(\beta)) \otimes h_t$. The theory in Section 3 can then be applied without further change. In particular, Proposition 3.2 implies that the two-sided PEPA test (with $\chi = 0$) using the proxy has a valid asymptotic level under $H^\dagger_0$ and is consistent against the alternative
\[
H^\dagger_{2a,h}: \quad \liminf_{T\to\infty} \big\| \mathbb{E}\big[g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) \otimes h_t\big] \big\| > 0. \tag{5.3}
\]
Examples include tests of conditional predictive accuracy and tests of conditional forecast rationality. To simplify the discussion, we consider only scalar forecasts, so $\kappa_Y = 1$. Below, let $L(\cdot,\cdot): \mathcal{Y}\times\mathcal{Y} \mapsto \mathbb{R}$ be a loss function, with its first and second arguments being the target and the forecast, respectively.
Example 5.1: Giacomini and White (2006) consider two-sided tests of conditional equal predictive ability for two sequences of actual forecasts $F_{t+\tau} = (F_{1,t+\tau}, F_{2,t+\tau})$. The null hypothesis of interest is (5.1) with $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = L(Y^\dagger_{t+\tau}, F_{1,t+\tau}(\beta^*)) - L(Y^\dagger_{t+\tau}, F_{2,t+\tau}(\beta^*))$. Since Giacomini and White (2006) are concerned with actual forecasts, we set $\beta^*$ to be empty and treat $F_{t+\tau} = (F_{1,t+\tau}, F_{2,t+\tau})$ as an observable sequence. Primitive conditions for Assumptions A1 and A3 can be found in Giacomini and White (2006); they involve standard regularity conditions for weak convergence and HAC estimation. The test statistic is of Wald type and readily verifies Assumptions A2 and B2. As noted by Giacomini and White (2006), their test is consistent against the alternative (5.3), and its power generally depends on the choice of $h_t$.
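To make the mechanics concrete, the following Python sketch (our own illustration, not the authors' code; the function name and the Newey–West weighting choice are ours) forms the instrumented moment sequence $g_t \otimes h_t$ from proxy-based loss differentials and computes a Wald-type statistic with a Bartlett-kernel long-run variance estimate.

```python
import numpy as np

def gw_wald_stat(g, h, bandwidth):
    """Wald-type statistic for H0: E[g_t (kron) h_t] = 0, where g is a length-T
    vector of (proxy-based) loss differentials and h is a (T, kappa_h) matrix
    of instruments; approximately chi^2(kappa_h) under H0."""
    z = g[:, None] * h                         # moment sequence, (T, kappa_h)
    T, k = z.shape
    zbar = z.mean(axis=0)
    u = z - zbar
    omega = u.T @ u / T                        # lag-0 term of the LRV estimate
    for l in range(1, bandwidth + 1):          # Bartlett (Newey-West) weights
        gam = u[l:].T @ u[:-l] / T
        omega += (1.0 - l / (bandwidth + 1.0)) * (gam + gam.T)
    return T * zbar @ np.linalg.solve(omega, zbar)
```

With $h_t \equiv 1$ this reduces to a squared $t$-statistic for the unconditional equal-predictive-ability test.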
Example 5.2: The population forecast $F_{t+\tau}(\beta^*)$, which is also the actual forecast if $\beta^*$ is empty, is rational with respect to the information set $\mathcal{H}_t$ if it solves $\min_{F\in\mathcal{H}_t} \mathbb{E}[L(Y^\dagger_{t+\tau}, F)\,|\,\mathcal{H}_t]$ almost surely. Suppose that $L(y,F)$ is differentiable in $F$ for almost every $y \in \mathcal{Y}$ under the conditional law of $Y^\dagger_{t+\tau}$ given $\mathcal{H}_t$, with the partial derivative denoted by $\partial_F L(\cdot,\cdot)$. As shown in Patton and Timmermann (2010), a test of conditional rationality can be carried out by testing the first-order condition of the minimization problem, that is, by testing the null hypothesis (5.1) with $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = \partial_F L(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*))$. The variable $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*))$ is the generalized forecast error (Granger (1999)). In particular, when $L(y,F) = (F-y)^2/2$, that is, under quadratic loss, we have $g(Y^\dagger_{t+\tau}, F_{t+\tau}(\beta^*)) = F_{t+\tau}(\beta^*) - Y^\dagger_{t+\tau}$; in this case, the test of conditional rationality reduces to a test of conditional unbiasedness. Tests of unconditional rationality and unbiasedness are special cases of their conditional counterparts, with $\mathcal{H}_t$ being the degenerate information set.
5.2 Stepwise multiple testing procedure for superior predictive accuracy

In the context of forecast evaluation, the multiple testing problem of Romano and Wolf (2005) consists of $k$ individual testing problems of pairwise comparison for superior predictive accuracy. Let $F_{0,t+\tau}(\cdot)$ be the benchmark forecast model and let $f^{\dagger*}_{j,t+\tau} = L(Y^\dagger_{t+\tau}, F_{0,t+\tau}(\beta^*)) - L(Y^\dagger_{t+\tau}, F_{j,t+\tau}(\beta^*))$, $1 \le j \le k$, be the performance of forecast $j$ relative to the benchmark. As before, $f^{\dagger*}_{j,t+\tau}$ is defined using the true target variable $Y^\dagger_{t+\tau}$. We consider $k$ pairs of hypotheses (multiple SPA):
\[
H^\dagger_{j,0}: \ \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \le 0 \text{ for all } t \ge 1, \qquad \text{versus} \qquad H^\dagger_{j,a}: \ \liminf_{T\to\infty} \mathbb{E}[f^{\dagger*}_{j,T}] > 0, \qquad 1 \le j \le k. \tag{5.4}
\]
These hypotheses concern the true target variable and are stated in a way that allows for data heterogeneity.

Romano and Wolf (2005) propose a stepwise multiple (StepM) testing procedure that makes decisions on the individual testing problems while asymptotically controlling the familywise error rate (FWE), that is, the probability that any null hypothesis is falsely rejected. The StepM procedure
relies on the observability of the forecast target. By imposing the condition on proxy accuracy
(Assumption C), we can show that the StepM procedure, when applied to the proxy, asymptotically
controls the FWE for the hypotheses (5.4) that concern the latent target. The details are in
Supplemental Appendix B.1.
5.3 Model confidence sets

The model confidence set (MCS) proposed by Hansen et al. (2011), henceforth HLN, can be specialized to the forecast evaluation context to construct confidence sets for superior forecasts. To fix ideas, let $f^{\dagger*}_{j,t+\tau}$ denote the performance (e.g., the negative loss) of forecast $j$ with respect to the true target variable. The set of superior forecasts is defined as
\[
\mathcal{M}^\dagger \equiv \Big\{ j \in \{1,\ldots,k\} : \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \ge \mathbb{E}[f^{\dagger*}_{l,t+\tau}] \text{ for all } 1 \le l \le k \text{ and } t \ge 1 \Big\}.
\]
That is, $\mathcal{M}^\dagger$ collects the forecasts that are superior to the others when evaluated using the true target variable. Similarly, the set of asymptotically inferior forecasts is defined as
\[
\underline{\mathcal{M}}^\dagger \equiv \Big\{ j \in \{1,\ldots,k\} : \liminf_{T\to\infty} \big( \mathbb{E}[f^{\dagger*}_{l,t+\tau}] - \mathbb{E}[f^{\dagger*}_{j,t+\tau}] \big) > 0 \text{ for some (and hence any) } l \in \mathcal{M}^\dagger \Big\}.
\]
We are interested in constructing a sequence $\widehat{\mathcal{M}}_{T,1-\alpha}$ of MCS's for $\mathcal{M}^\dagger$ with nominal level $1-\alpha$, so that
\[
\liminf_{T\to\infty} \mathbb{P}\big( \mathcal{M}^\dagger \subseteq \widehat{\mathcal{M}}_{T,1-\alpha} \big) \ge 1 - \alpha, \qquad \mathbb{P}\big( \widehat{\mathcal{M}}_{T,1-\alpha} \cap \underline{\mathcal{M}}^\dagger = \emptyset \big) \to 1. \tag{5.5}
\]
That is, $\widehat{\mathcal{M}}_{T,1-\alpha}$ has valid (pointwise) asymptotic coverage and asymptotic power one against fixed alternatives.
HLN's theory for the MCS is not directly applicable here due to the latency of the forecast target. Following the general strategy of the current paper, we propose a feasible version of HLN's algorithm that uses the proxy in place of the associated latent target. Under Assumption C, we can show that this feasible version achieves (5.5). The details are in Supplemental Appendix B.2.
6 Monte Carlo analysis
6.1 Simulation designs
We consider three simulation designs which are intended to cover some of the most common and
important applications of high-frequency data in forecasting: (A) forecasting univariate volatility
in the absence of price jumps; (B) forecasting univariate volatility in the presence of price jumps;
and (C) forecasting correlation. In each design, we consider the EPA hypotheses, equation (2.11),
under the quadratic loss for two competing one-day-ahead forecasts using the method of Giacomini
and White (2006), with the function $\varphi(\cdot,\cdot)$ corresponding to the $t$-statistic. In addition, the proxies used below satisfy (3.2) with $\theta = 1/2$ and, in view of comments (i) and (ii) of Proposition 3.1, Assumption C is implied by $T\Delta \to 0$, where $\Delta$ is the sampling interval.
Each forecast is formed using a rolling scheme with window size $R \in \{250, 500, 1000\}$ days. The prediction sample contains $P \in \{500, 1000, 2000\}$ days. The high-frequency data are simulated using the Euler scheme at every second, and proxies are computed using sampling intervals $\Delta = 5$ seconds, 1 minute, 5 minutes, or 30 minutes. As on the New York Stock Exchange, each day is assumed to contain 6.5 trading hours. There are 1,000 Monte Carlo trials in each experiment and all tests are at the 5% nominal level.
We now describe the simulation designs. Simulation A concerns forecasting the logarithm of the quadratic variation of a continuous price process. Following one of the simulation designs in Andersen et al. (2005), we simulate the logarithmic price $X_t$ and the spot variance process $\sigma_t^2$ according to the following stochastic differential equations:
\[
\begin{cases}
dX_t = 0.0314\,dt + \sigma_t\big({-0.5760}\,dW_{1,t} + \sqrt{1 - 0.5760^2}\,dW_{2,t}\big) + dJ_t, \\
d\log\sigma_t^2 = -0.0136\,(0.8382 + \log\sigma_t^2)\,dt + 0.1148\,dW_{1,t},
\end{cases} \tag{6.1}
\]
where $W_1$ and $W_2$ are independent Brownian motions and the jump process $J$ is set to be identically zero. The values of the parameters of this process are taken from Andersen et al. (2005). The target variable to be forecast is $\log IV_t \equiv \log\int_{t-1}^t \sigma_s^2\,ds$ and the proxy is $\log RV^{\Delta}_t$, where $RV^{\Delta}_t$ is defined by (4.6) for data sampled at $\Delta = 5$ seconds, 1 minute, 5 minutes, or 30 minutes.
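For readers who wish to replicate this design, the following Python sketch (ours; the function name, seeding and loop structure are illustrative assumptions, not the paper's code) applies a second-by-second Euler discretization of (6.1) over one trading day.

```python
import numpy as np

def simulate_day_A(log_sig2, n_steps=23400, rng=None):
    """Euler discretization of (6.1) over one 6.5-hour day (one step per
    second), with the jump term J set to zero. Returns the intraday log-price
    increments and the end-of-day log spot variance. Time unit: one day."""
    rng = rng if rng is not None else np.random.default_rng()
    dt = 1.0 / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((2, n_steps))
    dx = np.empty(n_steps)
    for i in range(n_steps):
        sig = np.exp(0.5 * log_sig2)
        dx[i] = (0.0314 * dt
                 + sig * (-0.5760 * dW[0, i] + np.sqrt(1.0 - 0.5760**2) * dW[1, i]))
        log_sig2 += -0.0136 * (0.8382 + log_sig2) * dt + 0.1148 * dW[0, i]
    return dx, log_sig2
```

Using the same Brownian increment $dW_{1}$ in both equations reproduces the leverage-type dependence between returns and volatility built into (6.1).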
The first forecast model in Simulation A is a GARCH(1,1) model (Bollerslev (1986)) estimated by quasi maximum likelihood on daily returns:
\[
\text{Model A1:} \qquad
\begin{cases}
r_t = X_t - X_{t-1} = \sigma_t \varepsilon_t, \qquad \varepsilon_t \,|\, \mathcal{F}_{t-1} \sim \mathcal{N}(0,1), \\
\sigma_t^2 = \omega + \beta\sigma_{t-1}^2 + \alpha r_{t-1}^2.
\end{cases} \tag{6.2}
\]
The second model is a heterogeneous autoregressive (HAR) model (Corsi (2009)) for $RV^{5\min}_t$, estimated by ordinary least squares:
\[
\text{Model A2:} \qquad RV^{5\min}_t = \beta_0 + \beta_1 RV^{5\min}_{t-1} + \beta_2 \sum_{k=1}^{5} RV^{5\min}_{t-k} + \beta_3 \sum_{k=1}^{22} RV^{5\min}_{t-k} + e_t. \tag{6.3}
\]
The logarithm of the one-day-ahead forecast of $\sigma^2_{t+1}$ (resp. $RV^{5\min}_{t+1}$) from the GARCH (resp. HAR) model is taken as a forecast of $\log IV_{t+1}$.
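Estimating Model A2 amounts to one OLS regression per rolling window. A minimal Python sketch of ours (the function name is hypothetical; the regressors follow (6.3) literally, i.e., sums rather than averages of lags):

```python
import numpy as np

def har_fit_forecast(rv):
    """OLS fit of the HAR model (6.3) on a window of daily realized variances
    (most recent observation last) and the implied one-day-ahead forecast.
    Regressors: intercept, RV_{t-1}, and the sums of lags 1-5 and 1-22."""
    X = np.array([[1.0, rv[t - 1], rv[t - 5:t].sum(), rv[t - 22:t].sum()]
                  for t in range(22, len(rv))])
    y = rv[22:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    x_next = np.array([1.0, rv[-1], rv[-5:].sum(), rv[-22:].sum()])
    return beta, x_next @ beta
```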
In Simulation B we again set the forecast target to be $\log IV_t$, but consider a more complicated setting with price jumps. We simulate $X_t$ and $\sigma_t^2$ according to (6.1) and, following Huang and Tauchen (2005), we specify $J_t$ as a compound Poisson process with intensity $\lambda = 0.05$ per day and jump size distribution $\mathcal{N}(0.2, 1.4^2)$. (These parameter values are in the middle of the ranges considered by Huang and Tauchen (2005).) The proxy for $IV_t$ is the bipower variation $BV^{\Delta}_t$, where $BV^{\Delta}_t$ is defined by (4.7) for data sampled with observation mesh $\Delta$.
The competing forecast sequences in Simulation B are as follows. The first forecast is based on a simple random walk model, applied to the 5-minute bipower variation $BV^{5\min}_t$:
\[
\text{Model B1:} \qquad BV^{5\min}_t = BV^{5\min}_{t-1} + \varepsilon_t, \quad \text{where } \mathbb{E}[\varepsilon_t \,|\, \mathcal{F}_{t-1}] = 0. \tag{6.4}
\]
The second model is a HAR model for $BV^{1\min}_t$:
\[
\text{Model B2:} \qquad BV^{1\min}_t = \beta_0 + \beta_1 BV^{1\min}_{t-1} + \beta_2 \sum_{k=1}^{5} BV^{1\min}_{t-k} + \beta_3 \sum_{k=1}^{22} BV^{1\min}_{t-k} + e_t. \tag{6.5}
\]
The logarithm of the one-day-ahead forecast of $BV^{5\min}_{t+1}$ (resp. $BV^{1\min}_{t+1}$) from the random walk (resp. HAR) model is taken as a forecast of $\log IV_{t+1}$.
Finally, we consider correlation forecasting in Simulation C. This simulation exercise is of particular interest as our empirical application in Section 7 concerns a similar forecasting problem. We adopt the bivariate stochastic volatility model used in the simulation study of Barndorff-Nielsen and Shephard (2004a). Let $W_t = (W_{1,t}, W_{2,t})$. The bivariate logarithmic price process $X_t$ is given by
\[
dX_t = \sigma_t\,dW_t, \qquad \sigma_t\sigma_t^{\mathsf{T}} = \begin{pmatrix} \sigma_{1,t}^2 & \rho_t\sigma_{1,t}\sigma_{2,t} \\ \bullet & \sigma_{2,t}^2 \end{pmatrix}.
\]
Let $B_{j,t}$, $j = 1,\ldots,4$, be Brownian motions that are independent of each other and of $W_t$. The process $\sigma_{1,t}^2$ follows a two-factor stochastic volatility model: $\sigma_{1,t}^2 = v_t + \bar{v}_t$, where
\[
\begin{cases}
dv_t = -0.0429\,(v_t - 0.1110)\,dt + 0.2788\,\sqrt{v_t}\,dB_{1,t}, \\
d\bar{v}_t = -3.7400\,(\bar{v}_t - 0.3980)\,dt + 2.6028\,\sqrt{\bar{v}_t}\,dB_{2,t}.
\end{cases} \tag{6.6}
\]
The process $\sigma_{2,t}^2$ is specified as a GARCH diffusion:
\[
d\sigma_{2,t}^2 = -0.0350\,(\sigma_{2,t}^2 - 0.6360)\,dt + 0.2360\,\sigma_{2,t}^2\,dB_{3,t}. \tag{6.7}
\]
The correlation process $\rho_t$ is specified as a GARCH diffusion for the inverse Fisher transformation of the correlation:
\[
\rho_t = \frac{e^{2y_t} - 1}{e^{2y_t} + 1}, \qquad dy_t = -0.0300\,(y_t - 0.6400)\,dt + 0.1180\,y_t\,dB_{4,t}. \tag{6.8}
\]
(The parameter values used here are the same as in Barndorff-Nielsen and Shephard (2004a), although our notation differs slightly.) In this simulation design we take the target variable to be the daily integrated correlation, defined as
\[
IC_t \equiv \frac{QV_{12,t}}{\sqrt{QV_{11,t}}\sqrt{QV_{22,t}}}. \tag{6.9}
\]
The proxy is given by the realized correlation computed using returns sampled at frequency $\Delta$:
\[
RC^{\Delta}_t \equiv \frac{RV^{\Delta}_{12,t}}{\sqrt{RV^{\Delta}_{11,t}}\sqrt{RV^{\Delta}_{22,t}}}. \tag{6.10}
\]
The first forecasting model is a GARCH(1,1)–DCC(1,1) model (Engle (2002)) applied to daily returns $r_t = X_t - X_{t-1}$:
\[
\text{Model C1:} \qquad
\begin{cases}
r_{j,t} = \sigma_{j,t}\varepsilon_{j,t}, \qquad \sigma_{j,t}^2 = \omega_j + \beta_j\sigma_{j,t-1}^2 + \alpha_j r_{j,t-1}^2, \quad j = 1,2, \\[4pt]
\rho^{\varepsilon}_t \equiv \mathbb{E}[\varepsilon_{1,t}\varepsilon_{2,t} \,|\, \mathcal{F}_{t-1}] = \dfrac{Q_{12,t}}{\sqrt{Q_{11,t} Q_{22,t}}}, \qquad Q_t = \begin{pmatrix} Q_{11,t} & Q_{12,t} \\ \bullet & Q_{22,t} \end{pmatrix}, \\[4pt]
Q_t = \bar{Q}\,(1 - a - b) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}}, \qquad \varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t}).
\end{cases} \tag{6.11}
\]
The forecast of $IC_{t+1}$ is the one-day-ahead forecast of $\rho^{\varepsilon}_{t+1}$. The second forecasting model extends Model C1 by adding the lagged 30-minute realized correlation to the evolution of $Q_t$:
\[
\text{Model C2:} \qquad Q_t = \bar{Q}\,(1 - a - b - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + g\,RC^{30\min}_{t-1}. \tag{6.12}
\]
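The correlation forecast is obtained by iterating the $Q_t$ recursion forward one step. A minimal Python sketch of ours (the function name and the initialization of $Q$ at $\bar{Q}$ are illustrative assumptions) covering both (6.11) and (6.12):

```python
import numpy as np

def dcc_correlation_forecast(eps, a, b, Q_bar, g=0.0, rc=None):
    """One-day-ahead DCC correlation forecast from standardized residuals
    `eps` (T x 2), iterating the recursion in (6.11); with g > 0 and a series
    of lagged realized correlation matrices `rc` (T x 2 x 2), this follows the
    augmented recursion (6.12) of Model C2 instead."""
    Q = Q_bar.copy()
    for t in range(len(eps)):
        Q = Q_bar * (1.0 - a - b - g) + b * Q + a * np.outer(eps[t], eps[t])
        if g > 0.0:
            Q += g * rc[t]
    return Q[0, 1] / np.sqrt(Q[0, 0] * Q[1, 1])
```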
In each simulation, we set the evaluation function $f(\cdot)$ to be the loss of Model 1 less that of Model 2 and conduct the one-sided EPA test (see equation (2.11)). We note that the competing forecasts are not engineered to have the same mean-squared error (MSE). Therefore, for the purpose of examining the size properties of the tests, the hypotheses to be imposed are those in (2.11) with $\chi$ equal to the population MSE of Model 1 less that of Model 2. We remind the reader that the population MSE is computed using the true latent target variable, whereas the feasible tests are conducted using proxies. The goal of this simulation study is to determine whether our feasible tests have finite-sample rejection rates similar to those of the infeasible tests (i.e., tests based on true target variables) and, moreover, whether these tests have satisfactory size properties under the true null hypothesis.¹⁰

                                    GW–NW                           GW–KV
Proxy $RV^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.07     0.07      0.06        0.01     0.01      0.01
  ∆ = 30 min                  0.07     0.06      0.06        0.01     0.01      0.01
R = 500
  True $Y^{\dagger}_{t+1}$    0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 5 sec                   0.08     0.08      0.06        0.01     0.02      0.01
  ∆ = 1 min                   0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 5 min                   0.07     0.08      0.06        0.01     0.02      0.01
  ∆ = 30 min                  0.06     0.07      0.05        0.01     0.02      0.01
R = 1000
  True $Y^{\dagger}_{t+1}$    0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 5 sec                   0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 1 min                   0.09     0.07      0.06        0.02     0.01      0.01
  ∆ = 5 min                   0.08     0.07      0.06        0.03     0.01      0.01
  ∆ = 30 min                  0.07     0.06      0.05        0.02     0.01      0.01

Table 1: Giacomini–White test rejection frequencies for Simulation A. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
                                    GW–NW                           GW–KV
Proxy $BV^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.08     0.09      0.07        0.02     0.01      0.01
  ∆ = 5 sec                   0.08     0.09      0.07        0.02     0.01      0.01
  ∆ = 1 min                   0.08     0.09      0.06        0.02     0.01      0.01
  ∆ = 5 min                   0.07     0.07      0.06        0.02     0.01      0.01
  ∆ = 30 min                  0.04     0.04      0.04        0.01     0.01      0.01
R = 500
  True $Y^{\dagger}_{t+1}$    0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.08     0.07      0.05        0.01     0.02      0.02
  ∆ = 30 min                  0.04     0.03      0.03        0.01     0.01      0.01
R = 1000
  True $Y^{\dagger}_{t+1}$    0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 5 sec                   0.09     0.08      0.07        0.01     0.01      0.01
  ∆ = 1 min                   0.08     0.07      0.07        0.01     0.01      0.01
  ∆ = 5 min                   0.06     0.07      0.07        0.02     0.01      0.01
  ∆ = 30 min                  0.03     0.03      0.04        0.01     0.01      0.01

Table 2: Giacomini–White test rejection frequencies for Simulation B. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
6.2 Results
The results for Simulations A, B and C are presented in Tables 1, 2 and 3, respectively. In the top
row of each panel are the results for the infeasible tests that are implemented with the true target
variable, and in the other rows are the results for feasible tests based on proxies.
¹⁰ Due to the complexity of the data generating processes and volatility models we consider, computing the population MSE analytically for each forecast sequence is difficult. We instead compute the population MSE by simulation, using a Monte Carlo sample of 500,000 days. Similarly, it is difficult to construct data generating processes under which two forecast sequences have identical population MSE, which motivates our considering a nonzero $\chi$ in the null hypothesis, equation (2.11), of our simulation design. Doing so enables us to use realistic data generating processes and reasonably sophisticated forecasting models that mimic those used in prior empirical work.
                                    GW–NW                           GW–KV
Proxy $RC^{\Delta}_{t+1}$   P = 500  P = 1000  P = 2000    P = 500  P = 1000  P = 2000

R = 250
  True $Y^{\dagger}_{t+1}$    0.25     0.22      0.21        0.07     0.04      0.04
  ∆ = 5 sec                   0.25     0.22      0.21        0.07     0.04      0.04
  ∆ = 1 min                   0.25     0.23      0.20        0.07     0.04      0.04
  ∆ = 5 min                   0.24     0.23      0.20        0.06     0.05      0.04
  ∆ = 30 min                  0.24     0.21      0.19        0.07     0.05      0.04
R = 500
  True $Y^{\dagger}_{t+1}$    0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 5 sec                   0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 1 min                   0.29     0.27      0.24        0.12     0.06      0.05
  ∆ = 5 min                   0.29     0.28      0.24        0.12     0.06      0.05
  ∆ = 30 min                  0.30     0.26      0.23        0.12     0.07      0.05
R = 1000
  True $Y^{\dagger}_{t+1}$    0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 5 sec                   0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 1 min                   0.27     0.23      0.20        0.14     0.07      0.06
  ∆ = 5 min                   0.27     0.23      0.19        0.14     0.07      0.06
  ∆ = 30 min                  0.27     0.23      0.19        0.14     0.07      0.06

Table 3: Giacomini–White test rejection frequencies for Simulation C. The nominal size is 0.05, R is the length of the estimation sample, P is the length of the prediction sample, and ∆ is the sampling frequency for the proxy. The left panel shows results based on a Newey–West estimate of the long-run variance; the right panel shows results based on Kiefer and Vogelsang's "fixed-b" asymptotics.
We consider two implementations of the Giacomini–White (GW) test: the first is based on a Newey–West estimate
of the long-run variance and critical values from the standard normal distribution. The second is
based on the “fixed-b” asymptotics of Kiefer and Vogelsang (2005), using the Bartlett kernel. We
denote these two implementations as NW and KV, respectively. The KV method is of interest here
because of the well-known size distortion problem for inference procedures based on the standard
HAC estimation theory; see Müller (2012) and references therein. We set the truncation lag to be
$3P^{1/3}$ for NW and $0.5P$ for KV.¹¹

¹¹ In the KV case, the one-sided critical value for the $t$-statistic at the 5% level is 2.774 when the truncation lag is $0.5P$; see Table 1 in Kiefer and Vogelsang (2005).
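Both implementations use the same $t$-statistic form and differ only in the bandwidth and the reference distribution. The following Python sketch (ours; the function name is hypothetical, and we take the 2.774 critical value from Table 1 of Kiefer and Vogelsang (2005) as cited above) illustrates the KV variant.

```python
import numpy as np

def kv_tstat(d, b=0.5):
    """t-statistic for the mean of a loss-differential series `d`, using a
    Bartlett-kernel long-run variance with truncation lag b*P. Under fixed-b
    asymptotics, compare with 2.774 for a one-sided 5% test when b = 0.5."""
    P = len(d)
    u = d - d.mean()
    M = int(b * P)
    lrv = u @ u / P
    for l in range(1, M + 1):
        lrv += 2.0 * (1.0 - l / (M + 1.0)) * (u[l:] @ u[:-l]) / P
    return np.sqrt(P) * d.mean() / np.sqrt(lrv)
```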
Overall, we find that the rejection rates of the feasible tests based on proxies are generally
very close to those of the infeasible tests using the true forecast target, and thus that our negligibility result holds well in a range of realistic simulation scenarios. The standard GW–NW method has reasonable size control in Simulations A and B, but has nontrivial size distortion in Simulation C.¹² This size distortion occurs even when the true target variable is used, and is not exacerbated by the use of proxies. The GW–KV method has better size control in these simulation scenarios, being somewhat conservative in Simulations A and B, and having good rejection rates in Simulation C for $P = 1000$ and $P = 2000$. Supplemental Appendix S.C presents results confirming that these findings are robust to the choice of the truncation lag in the estimation of the long-run variance, along with some additional results on the disagreement between the feasible and the infeasible tests. We caution, though, that our simulation study is clearly not exhaustive (for example, we focus on one-step-ahead forecasts and use the quadratic loss function). Future researchers may find it advisable to conduct further simulations if their application differs greatly from those considered here.
7 Application: Comparing correlation forecasts
7.1 Data and model description
We now illustrate the use of our method with an empirical application to forecasting the integrated correlation between two assets. Correlation forecasts are critical in financial decisions such as portfolio construction and risk management; see Engle (2008) for example. Standard forecast evaluation methods do not directly apply here due to the latency of the target variable, and methods that rely on an unbiased proxy for the target variable (e.g., Hansen and Lunde (2006) and Patton (2011)) cannot be used either, due to the absence of any such proxy.¹³ This is thus an ideal example with which to illustrate the usefulness of the method proposed in the current paper.
¹² The reason for the large size distortion of the NW method in Simulation C appears to be the relatively high persistence in the quadratic loss differentials. In Simulations A and B, the autocorrelations of the loss differential sequence essentially vanish by about the 50th and 30th lags, respectively, whereas in Simulation C they remain non-negligible even at the 100th lag.

¹³ When based on relatively sparse sampling frequencies, it may be considered plausible that the realized covariance matrix is finite-sample unbiased for the true quadratic covariation matrix; however, as the correlation involves a ratio of the elements of this matrix, this property is lost.
Our sample consists of two pairs of stocks: (i) Procter & Gamble (NYSE: PG) and General Electric (NYSE: GE), and (ii) Microsoft (NASDAQ: MSFT) and Apple (NASDAQ: AAPL). The sample period ranges from January 2000 to December 2010, comprising 2,733 trading days, and we obtain our data from the TAQ database. As in Simulation C of the previous section, we take the proxy to be the realized correlation $RC^{\Delta}_t$ formed using intraday returns with sampling interval $\Delta$.¹⁴ We consider $\Delta$ ranging from 1 minute to 130 minutes, which covers the sampling intervals typically employed in empirical work.
We compare four forecasting models, all of which share the following specification for the conditional mean and variance: for stock $i$, $i = 1$ or 2, the daily logarithmic return $r_{it}$ follows
\[
\begin{cases}
r_{it} = \mu_i + \sigma_{it}\varepsilon_{it}, \\
\sigma_{it}^2 = \omega_i + \beta_i\sigma_{i,t-1}^2 + \alpha_i\sigma_{i,t-1}^2\varepsilon_{i,t-1}^2 + \delta_i\sigma_{i,t-1}^2\varepsilon_{i,t-1}^2\, 1_{\{\varepsilon_{i,t-1} \le 0\}} + \gamma_i RV^{1\min}_{i,t-1}.
\end{cases} \tag{7.1}
\]
That is, we assume a constant conditional mean, and a GJR-GARCH (Glosten et al. (1993)) volatility model augmented with the lagged one-minute RV.
The baseline correlation model is Engle's (2002) DCC model as considered in Simulation C; see equation (6.11). The other three models are extensions of the baseline model. The first extension is the asymmetric DCC (A-DCC) model of Cappiello et al. (2006), which is designed to capture asymmetric reactions of correlation to the sign of past shocks:
\[
Q_t = \bar{Q}\,(1 - a - b - d) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + d\,\eta_{t-1}\eta_{t-1}^{\mathsf{T}}, \quad \text{where } \eta_t \equiv \varepsilon_t \odot 1_{\{\varepsilon_t \le 0\}}. \tag{7.2}
\]
The second extension (R-DCC) augments the DCC model with the 65-minute realized correlation matrix. This extension is in the same spirit as Noureldin et al. (2012), and is designed to exploit high-frequency information about the current correlation:
\[
Q_t = \bar{Q}\,(1 - a - b - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + g\,RC^{65\min}_{t-1}. \tag{7.3}
\]
The third extension (AR-DCC) encompasses both A-DCC and R-DCC with the specification
\[
Q_t = \bar{Q}\,(1 - a - b - d - g) + b\,Q_{t-1} + a\,\varepsilon_{t-1}\varepsilon_{t-1}^{\mathsf{T}} + d\,\eta_{t-1}\eta_{t-1}^{\mathsf{T}} + g\,RC^{65\min}_{t-1}. \tag{7.4}
\]
We conduct pairwise comparisons, under the quadratic loss function, of forecasts based on these four models, which include both nested and nonnested cases. We use the framework of Giacomini and White (2006), so that nested and nonnested models can be treated in a unified manner. Each one-day-ahead forecast is constructed in a rolling scheme with fixed estimation sample size $R = 1500$ and prediction sample size $P = 1233$.

¹⁴ For all sampling intervals we use the "subsample-and-average" estimator of Zhang et al. (2005), with five subsamples when $\Delta = 5$ seconds, and with ten equally spaced subsamples for the other choices of sampling frequency.
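The subsample-and-average construction of the realized correlation proxy (see footnote 14) is simple to implement. The following Python sketch is a minimal illustration of ours (the function name is hypothetical, and for simplicity the subgrid offsets are taken to be consecutive):

```python
import numpy as np

def subsampled_realized_correlation(log_prices, step, n_sub=10):
    """Subsample-and-average realized correlation for an (m x 2) matrix of
    intraday log prices: realized covariance matrices are computed on n_sub
    shifted subgrids (every `step`-th observation) and averaged before the
    correlation is formed."""
    cov = np.zeros((2, 2))
    for s in range(n_sub):
        r = np.diff(log_prices[s::step], axis=0)   # returns on the shifted grid
        cov += r.T @ r
    cov /= n_sub
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

Averaging over shifted subgrids before taking the ratio uses more of the data at a given sampling interval than a single sparse grid would.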
7.2 Results
Table 4 presents results for comparisons of each of the three generalized models against the baseline DCC model, using both the GW–NW and GW–KV tests; this amounts to conducting a one-sided $t$-test of (2.11) with $\chi$ set to zero. The results in the first and fourth columns indicate that the A-DCC model does not improve predictive accuracy relative to the baseline DCC model. The GW–KV tests reveal that the loss of the A-DCC forecast is not statistically different from that of DCC. The GW–NW tests, on the other hand, report statistically significant outperformance of the A-DCC model relative to the DCC for some proxies; however, this finding should be interpreted with care, as the GW–NW test was found to over-reject in finite samples in Simulation C of the previous section. Interestingly, for the MSFT–AAPL pair, the more general A-DCC model actually underperforms the baseline model, though the difference is not significant. The next columns reveal that the R-DCC model outperforms the DCC model, particularly for the MSFT–AAPL pair, where the finding is highly significant and robust to the choice of proxy. Finally, we find that the AR-DCC model outperforms the DCC model; however, the statistical significance of this outperformance depends on the testing method. In view of the over-rejection problem of the GW–NW test, we conclude conservatively that the AR-DCC is not significantly better than the baseline DCC model.
Table 5 presents results from pairwise comparisons among the generalized models. Consistent
with the results in Table 4, we find that the A-DCC forecast underperforms those of R-DCC and
AR-DCC, and significantly so for MSFT–AAPL. The comparison between R-DCC and AR-DCC
yields mixed, but statistically insignificant, findings across the two pairs of stocks.
Overall, we find that augmenting the DCC model with lagged realized correlation significantly
improves its predictive ability, while adding an asymmetric term to the DCC model generally does
not improve, and sometimes hurts, its forecasting performance. These findings are robust to the
choice of proxy.
                                       GW–NW                                 GW–KV
Proxy $RC^{\Delta}_{t+1}$   DCC vs    DCC vs    DCC vs      DCC vs    DCC vs    DCC vs
                            A-DCC     R-DCC     AR-DCC      A-DCC     R-DCC     AR-DCC

Panel A. PG–GE Correlation
  ∆ = 1 min                 1.603     3.130*    2.929*      1.947     1.626     1.745
  ∆ = 5 min                 1.570     2.932*    2.724*      1.845     2.040     2.099
  ∆ = 15 min                1.892*    2.389*    2.373*      2.047     1.945     1.962
  ∆ = 30 min                2.177*    1.990*    2.206*      2.246     1.529     1.679
  ∆ = 65 min                1.927*    0.838     1.089       1.642     0.828     0.947
  ∆ = 130 min               0.805     0.835     0.688       0.850     0.830     0.655

Panel B. MSFT–AAPL Correlation
  ∆ = 1 min                -0.916     2.647*    1.968*     -1.024     4.405*    3.712*
  ∆ = 5 min                -1.394     3.566*    2.310*     -1.156     4.357*    2.234
  ∆ = 15 min               -1.391     3.069*    1.927*     -1.195     4.279*    2.116
  ∆ = 30 min               -1.177     3.011*    2.229*     -1.055     3.948*    2.289
  ∆ = 65 min               -1.169     2.634*    2.071*     -1.168     3.506*    2.222
  ∆ = 130 min              -1.068     1.825*    1.280      -1.243     3.342*    1.847

Table 4: T-statistics for out-of-sample forecast comparisons of correlation forecasting models. In the comparison of "A vs B," a positive t-statistic indicates that B outperforms A. The 95% critical values for one-sided tests of the null are 1.645 (GW–NW, in the left panel) and 2.774 (GW–KV, in the right panel). Test statistics that are greater than the critical value are marked with an asterisk.
8 Concluding remarks
This paper proposes a simple but general framework for the problem of testing predictive ability when the target variable is unobservable. We consider an array of popular forecast evaluation methods, including, for example, Diebold and Mariano (1995), West (1996), White (2000), Giacomini and White (2006) and McCracken (2007), in cases where the latent target variable is replaced by a proxy computed using high-frequency (intraday) data. We derive convergence rate results for general classes of high-frequency-based estimators of volatility and jump functionals, which cover a majority of existing estimators as special cases, such as realized (co)variance, truncated (co)variation, bipower variation, realized correlation, realized beta, jump power variation, realized semivariance, realized Laplace transform, and realized skewness and kurtosis. Based on these results, we provide conditions under which the moments that define the proxy hypotheses converge sufficiently quickly to their counterparts under the true hypotheses, so that the feasible tests based on
                                       GW–NW                                 GW–KV
Proxy $RC^{\Delta}_{t+1}$   A-DCC vs  A-DCC vs  R-DCC vs    A-DCC vs  A-DCC vs  R-DCC vs
                            R-DCC     AR-DCC    AR-DCC      R-DCC     AR-DCC    AR-DCC

Panel A. PG–GE Correlation
  ∆ = 1 min                 2.231*    2.718*    0.542       1.231     1.426     0.762
  ∆ = 5 min                 2.122*    2.430*    0.355       1.627     1.819     0.517
  ∆ = 15 min                1.564     1.969*    0.888       1.470     1.703     1.000
  ∆ = 30 min                0.936     1.561     1.282       0.881     1.271     0.486
  ∆ = 65 min               -0.110     0.391     1.039      -0.153     0.413     0.973
  ∆ = 130 min               0.503     0.474    -0.024       0.688     0.516    -0.031

Panel B. MSFT–AAPL Correlation
  ∆ = 1 min                 3.110*    3.365*   -1.239       3.134*    3.657*   -1.580
  ∆ = 5 min                 4.005*    4.453*   -1.554       4.506*    6.323*   -1.586
  ∆ = 15 min                3.616*    4.053*   -1.307       4.044*    5.449*   -1.441
  ∆ = 30 min                3.345*    3.770*   -0.834       4.635*    7.284*   -0.882
  ∆ = 65 min                2.999*    3.215*   -0.542       6.059*    7.868*   -0.635
  ∆ = 130 min               2.223*    2.357*   -1.039       3.392*    5.061*   -1.582

Table 5: T-statistics for out-of-sample forecast comparisons of correlation forecasting models. In the comparison of "A vs B," a positive t-statistic indicates that B outperforms A. The 95% critical values for one-sided tests of the null are 1.645 (GW–NW, in the left panel) and 2.774 (GW–KV, in the right panel). Test statistics that are greater than the critical value are marked with an asterisk.
proxies are valid under not only the former, but also the latter. In so doing, we bridge the vast
literature on forecast evaluation and the burgeoning literature on high-frequency time series. The
theoretical framework is structured in a way to facilitate further extensions in both directions.
We verify that the asymptotic results perform well in three distinct and realistically calibrated
Monte Carlo studies, though it is possible that finite-sample adjustments may be employed in
specific applications for further improvement. The results in this paper may serve as a general
benchmark for future work along this line. Our empirical application uses these results to reveal
the out-of-sample predictive gains from augmenting the widely-used DCC model (Engle (2002))
with high-frequency estimates of correlation.
References
Aït-Sahalia, Y. and J. Jacod (2009). Testing for jumps in a discretely observed process. Annals of Statistics 37, 184–222.

Aït-Sahalia, Y., P. A. Mykland, and L. Zhang (2005). How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.

Amaya, D., P. Christoffersen, K. Jacobs, and A. Vasquez (2011). Do realized skewness and kurtosis predict the cross-section of equity returns? Technical report, University of Toronto.

Andersen, T. G. and T. Bollerslev (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.

Andersen, T. G., T. Bollerslev, P. Christoffersen, and F. X. Diebold (2006). Volatility and correlation forecasting. In G. Elliott, C. W. J. Granger, and A. Timmermann (Eds.), Handbook of Economic Forecasting, Volume 1. Oxford: Elsevier.

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys (2003). Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625.

Andersen, T. G., T. Bollerslev, and N. Meddahi (2005). Correcting the errors: Volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica 73 (1), 279–296.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.

Bandi, F. and R. Renò (2016). Price and volatility co-jumps. Journal of Financial Economics 119, 107–146.

Bandi, F. M. and J. R. Russell (2008). Microstructure noise, realized volatility and optimal sampling. Review of Economic Studies 75, 339–369.

Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76 (6), 1481–1536.

Barndorff-Nielsen, O. E., S. Kinnebrouck, and N. Shephard (2010). Measuring downside risk: Realised semivariance. In T. Bollerslev, J. Russell, and M. Watson (Eds.), Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle, pp. 117–136. Oxford University Press.

Barndorff-Nielsen, O. E. and N. Shephard (2004a). Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica 72 (3), 885–925.

Barndorff-Nielsen, O. E. and N. Shephard (2004b). Power and bipower variation with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2, 1–48.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

Bollerslev, T. and H. Zhou (2002). Estimating stochastic volatility diffusions using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65.

Cappiello, L., R. F. Engle, and K. Sheppard (2006). Asymmetric dynamics in the correlations of global equity and bond returns. Journal of Financial Econometrics 4, 537–572.

Clark, T. E. and M. W. McCracken (2005). Evaluating direct multistep forecasts. Econometric Reviews 24, 369–404.

Comte, F. and E. Renault (1998). Long memory in continuous-time stochastic volatility models. Mathematical Finance 8, 291–323.

Corradi, V. and W. Distaso (2006). Semi-parametric comparison of stochastic volatility models using realized measures. Review of Economic Studies 73 (3), 635–667.

Corradi, V., W. Distaso, and N. R. Swanson (2009). Predictive density estimators for daily volatility based on the use of realized measures. Journal of Econometrics 150, 119–138.

Corradi, V., W. Distaso, and N. R. Swanson (2011). Predictive inference for integrated volatility. Journal of the American Statistical Association 106, 1496–1512.

Corradi, V. and N. R. Swanson (2007). Nonparametric bootstrap procedures for predictive inference based on recursive estimation schemes. International Economic Review 48 (1), 67–109.

Corsi, F. (2009). A simple approximate long memory model of realized volatility. Journal of Financial Econometrics 7, 174–196.

Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.

Diebold, F. X. and R. S. Mariano (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.

Duffie, D. (2001). Dynamic Asset Pricing Theory (Third ed.). Princeton University Press.

Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987–1008.

Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics 20 (3), 339–350.

Engle, R. F. (2008). Anticipating Correlations. Princeton, New Jersey: Princeton University Press.

Foster, D. and D. B. Nelson (1996). Continuous record asymptotics for rolling sample variance estimators. Econometrica 64, 139–174.

Giacomini, R. and B. Rossi (2009). Detecting and predicting forecast breakdowns. Review of Economic Studies 76 (2), 669–705.

Giacomini, R. and H. White (2006). Tests of conditional predictive ability. Econometrica 74 (6), 1545–1578.

Gonçalves, S. and R. de Jong (2003). Consistency of the stationary bootstrap under weak moment conditions. Economics Letters 81, 273–278.

Gonçalves, S. and H. White (2002). The bootstrap of the mean for dependent heterogeneous arrays. Econometric Theory 18 (6), 1367–1384.

Granger, C. (1999). Outline of forecast theory using generalized cost functions. Spanish Economic Review 1, 161–173.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business & Economic Statistics 23 (4), 365–380.

Hansen, P. R. and A. Lunde (2006). Consistent ranking of volatility models. Journal of Econometrics 131, 97–121.

Hansen, P. R., A. Lunde, and J. M. Nason (2011). The model confidence set. Econometrica 79 (2), 453–497.

Hansen, P. R. and A. Timmermann (2012). Choice of sample split in out-of-sample forecast evaluation. Technical report, European University Institute.

Huang, X. and G. T. Tauchen (2005). The relative contribution of jumps to total price variance. Journal of Financial Econometrics 4, 456–499.

Inoue, A. and L. Kilian (2004). In-sample or out-of-sample tests of predictability: Which one should we use? Econometric Reviews 23, 371–402.

Jacod, J. (2008). Asymptotic properties of realized power variations and related functionals of semimartingales. Stochastic Processes and their Applications 118, 517–559.

Jacod, J. and P. Protter (2012). Discretization of Processes. Springer.

Jacod, J. and M. Rosenbaum (2013). Quarticity and other functionals of volatility: Efficient estimation. Annals of Statistics 41, 1462–1484.

Kanaya, S. and D. Kristensen (2016). Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32, 861–916.

Kiefer, N. M. and T. J. Vogelsang (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21, 1130–1164.

Kristensen, D. (2010). Nonparametric filtering of the realized spot volatility: A kernel-based approach. Econometric Theory 26 (1), 60–93.

Lépingle, D. (1976). La variation d'ordre p des semi-martingales. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 36, 295–316.

Mancini, C. (2001). Disentangling the jumps of the diffusion in a geometric jumping Brownian motion. Giornale dell'Istituto Italiano degli Attuari LXIV, 19–47.

McCracken, M. W. (2000). Robust out-of-sample inference. Journal of Econometrics 99, 195–223.

McCracken, M. W. (2007). Asymptotics for out of sample tests of Granger causality. Journal of Econometrics 140, 719–752.

Müller, U. (2012). HAC corrections for strongly autocorrelated time series. Technical report, Princeton University.

Newey, W. K. and K. D. West (1987). A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.

Noureldin, D., N. Shephard, and K. Sheppard (2012). Multivariate high-frequency-based volatility (HEAVY) models. Journal of Applied Econometrics 27, 907–933.

Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160 (1), 246–256.

Patton, A. J. and K. Sheppard (2013). Good volatility, bad volatility: Signed jumps and the persistence of volatility. Review of Economics and Statistics. Forthcoming.

Patton, A. J. and A. Timmermann (2010). Generalized forecast errors, a change of measure, and forecast optimality conditions. In Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle. Oxford University Press.

Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89, 1303–1313.

Renò, R. (2006). Nonparametric estimation of stochastic volatility models. Economics Letters 90, 390–395.

Romano, J. P. and M. Wolf (2005). Stepwise multiple testing as formalized data snooping. Econometrica 73 (4), 1237–1282.

Singleton, K. J. (2006). Empirical Dynamic Asset Pricing: Model Specification and Econometric Assessment. Princeton University Press.

Todorov, V. (2009). Estimation of continuous-time stochastic volatility models with jumps using high-frequency data. Journal of Econometrics 148, 131–148.

Todorov, V. and G. Tauchen (2012). The realized Laplace transform of volatility. Econometrica 80, 1105–1127.

Todorov, V., G. Tauchen, and I. Grynkiv (2011). Realized Laplace transforms for estimation of jump diffusive volatility models. Journal of Econometrics 164, 367–381.

Vetter, M. (2010). Limit theorems for bipower variation of semimartingales. Stochastic Processes and their Applications 120, 22–38.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica 64 (5), 1067–1084.

West, K. D. (2006). Forecast evaluation. In Handbook of Economic Forecasting. Amsterdam: North Holland Press.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.

White, H. (2000). A reality check for data snooping. Econometrica 68 (5), 1097–1126.

White, H. (2001). Asymptotic Theory for Econometricians. Academic Press.

Zhang, L. (2006). Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach. Bernoulli 12, 1019–1043.

Zhang, L., P. A. Mykland, and Y. Aït-Sahalia (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.