Heteroskedasticity/Autocorrelation Consistent
Standard Errors and the Reliability of Inference
Aris Spanos
Department of Economics,
Virginia Tech, USA
James J. Reade
Department of Economics,
University of Reading, UK
April 2015 [First draft]
Abstract
The primary aim of the paper is to investigate the error-reliability of F-tests that use Heteroskedasticity-Consistent Standard Errors (HCSE) and Heteroskedasticity and Autocorrelation-Consistent Standard Errors (HACSE) using Monte Carlo simulations. For the design of the appropriate simulation experiments, a broader perspective on departures from the homoskedasticity and autocorrelation assumptions is proposed, to avoid an internally inconsistent set of probabilistic assumptions. Viewing regression models as based on the first two conditional moments of the same conditional distribution brings out the role of the other probabilistic assumptions, such as Normality and Linearity, and provides a more coherent framework. The simulation results under the best case scenario for these tests show that all the HCSE/HACSE-based tests exhibit major size and power distortions. The results seriously call into question their use in practice as ways to robustify the OLS-based F-test.
1 Introduction
The basic objective of the paper is to revisit certain aspects of the traditional strategies for dealing with departures from the homoskedasticity and autocorrelation assumptions in the context of the Linear Regression (LR) model. In particular, the paper aims to appraise the error-reliability of Heteroskedasticity-Consistent Standard Errors (HCSE) (White, 1980) and their extension to Heteroskedasticity and Autocorrelation-Consistent Standard Errors (HACSE); see Newey and West (1987), Andrews (1991), Hansen (1992), Robinson (1998), Kiefer et al (2005).
The conventional wisdom relating to the use of HCSE has been articulated aptly
by Hansen (1999) in the form of the following recommendation to practitioners:
“... omit the tests of normality and conditional heteroskedasticity, and replace all conventional standard errors and covariance matrices with heteroskedasticity-robust versions.” (p. 195)
The heteroskedasticity-robust versions of the conventional standard errors and covariance matrices refer to HCSE/HACSE as they pertain to testing hypotheses concerning the unknown regression coefficients.
The question raised by the above recommendation is how one should evaluate the
robustness of these procedures. It is argued that the proper way to do that is in
terms of the error-reliability of the inference procedures it gives rise to. In the case
of estimation one needs to consider the possibility that the robustness strategy gives
rise to non-optimal (e.g. inconsistent) estimators. In the case of testing, robustness
should be evaluated in terms of the discrepancy between the relevant actual and
nominal (assumed) error probabilities. As argued by Phillips (2005a):
“Although the generality that HAC estimation lends to inference is appealing, our enthusiasm for such procedures needs to be tempered by knowledge that finite sample performance can be very unsatisfactory. Distortions in test size and low power in testing are both very real problems that need to be acknowledged in empirical work and on which further theoretical work is needed.” (p. 12)
The key problem is that any form of statistical misspecification, including the presence of heteroskedasticity/autocorrelation, is likely to induce a discrepancy between the nominal and actual error probabilities of any test procedure, and when this discrepancy is sizeable, inferences are likely to be erroneous. The surest way to an unreliable inference is to use a .05 significance level test whose actual type I error is closer to .90. Such discrepancies can easily arise in practice even with what are often considered 'minor' departures from certain probabilistic assumptions; see Spanos and McGuirk (2001). Hence the need to evaluate such discrepancies associated with the robustified procedures using HCSE/HACSE, to ensure that they are reliable enough for inference purposes.
The primary aim of the discussion that follows is twofold. First, to place the traditional recommendation of ignoring non-Normality and using HCSE/HACSE for inference in conjunction with the OLS estimators in a broader perspective that sheds light on what is being assumed for the traditional account to hold and what possible errors might undermine it. Second, to evaluate the extent to which this strategy addresses the unreliability-of-inference problem raised by the presence of heteroskedasticity/autocorrelation. Section 2 revisits the robustification of inference procedures when dealing with departures from Normality and homoskedasticity, and calls into question some of the presumptions underlying the use of the HCSE for inference purposes. It is argued that specifying the probabilistic structure of the LR model in terms of the error term provides a very narrow perspective on the various facets of modeling and inference, including specification (specifying the original model), misspecification testing (evaluating the validity of the model's assumptions) and respecification (respecifying the original model when found wanting).
Section 3 presents a broader probabilistic perspective that views statistical models as parameterizations of the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\}$. When viewed in this broader context, the probabilistic assumptions specifying the LR model are shown to be interrelated, and thus relaxing one at a time can be misleading in practice. In addition, the traditional perspective on HCSE/HACSE is shown to involve several implicit assumptions that might be inappropriate in practice. For example, the use of the HACSE raises the possibility that the OLS estimator might be inconsistent when certain (implicit) restrictions imposed on the data are invalid. Section 4 uses Monte Carlo simulations to investigate the effectiveness of the HCSE strategies vis-a-vis the reliability of inference pertaining to linear restrictions on the regression coefficients. The simulation results are extended in section 5 to include the error-reliability of testing procedures that use the HACSE. The simulation results call into question the widespread use of the HCSE/HACSE-based tests because they are shown to have serious size and power distortions that would lead inferences astray in practice.
2 Revisiting non-Normality/Heteroskedasticity
2.1 The traditional perspective on Linear Regression
The traditional Linear Regression (LR) model (table 1) is specified in terms of the probabilistic assumptions (1)-(4) pertaining to the error term process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$, where $\mathbf{x}_t$ denotes the observed values of the regressors $\mathbf{X}_t$.

Table 1 - The Linear Regression (LR) Model

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad t\in\mathbb{N}:=(1,2,\ldots,n,\ldots)$

(1) Normality: $(u_t|X_t=\mathbf{x}_t)\sim \mathsf{N}(\cdot,\cdot)$
(2) Zero mean: $E(u_t|X_t=\mathbf{x}_t)=0$
(3) Homoskedasticity: $E(u_t^2|X_t=\mathbf{x}_t)=\sigma^2$
(4) No autocorrelation: $E(u_t u_s|X_t=\mathbf{x}_t)=0$ for $t\neq s$
For inference purposes the LR model is framed in terms of the data:

$\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\},\quad \mathbf{Z}_0:=(\mathbf{y}:\mathbf{X}_1),\quad \mathbf{y}:(n\times 1),\ \mathbf{X}_1:(n\times k),$

and expressed in matrix notation:

$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{u},$

where $\mathbf{X}:=(\mathbf{1}_n:\mathbf{X}_1)$ is an $(n\times[k+1])$ matrix, $\mathbf{1}_n$ is a column of 1's and $\boldsymbol{\beta}^{\top}:=\left(\beta_0:\boldsymbol{\beta}_1^{\top}\right)$.

To secure the data information adequacy, the above assumptions are supplemented with an additional assumption:

(5) No collinearity: $\mathrm{Rank}(\mathbf{X})=k+1$.

The cornerstone of the traditional LR model is the Gauss-Markov theorem for the 'optimality' of the OLS estimator:

$\widehat{\boldsymbol{\beta}}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$

as the Best Linear Unbiased Estimator (BLUE) of $\boldsymbol{\beta}$ under assumptions (2)-(5), i.e., $\widehat{\boldsymbol{\beta}}$ has the smallest variance (relative efficiency) within the class of linear and unbiased estimators.
The above recommendation to ignore any departures from assumptions (1) and (3) and use HCSE stems from four key presumptions.

The first presumption, stemming from the Gauss-Markov theorem, is that the Normality assumption does not play a key role in securing the optimality of the OLS estimator for inference purposes.

The second presumption is that all inferences based on the OLS estimators $\widehat{\boldsymbol{\theta}}:=(\widehat{\boldsymbol{\beta}},s^2)$, where $s^2=\frac{1}{n-k-1}\widehat{\mathbf{u}}^{\top}\widehat{\mathbf{u}}$, $\widehat{\mathbf{u}}=\mathbf{y}-\mathbf{X}\widehat{\boldsymbol{\beta}}$, of $\boldsymbol{\theta}:=(\boldsymbol{\beta},\sigma^2)$ can be based on the asymptotic approximations:

$\widehat{\boldsymbol{\beta}}\overset{a}{\sim}\mathsf{N}\left(\boldsymbol{\beta},\ \sigma^2(\mathbf{X}^{\top}\mathbf{X})^{-1}\right),\qquad s^2\overset{P}{\to}\sigma^2,\qquad (1)$

where '$\overset{a}{\sim}$' denotes 'asymptotically distributed' and '$\overset{P}{\to}$' denotes 'convergence in probability'.
The third presumption is that when the homoskedasticity assumption in (3) is invalid, and instead:

(3)* $\quad E(\mathbf{u}\mathbf{u}^{\top})=\boldsymbol{\Lambda}=\mathrm{diag}(\sigma_1^2,\sigma_2^2,\ldots,\sigma_n^2)\neq\sigma^2\mathbf{I}_n,\qquad (2)$

the OLS estimator $\widehat{\boldsymbol{\beta}}$ remains unbiased and consistent, but is less efficient relative to the GLS estimator $\widetilde{\boldsymbol{\beta}}=\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{y}$, since:

$\mathrm{Cov}(\widehat{\boldsymbol{\beta}})=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\boldsymbol{\Lambda}\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\ \geq\ \mathrm{Cov}(\widetilde{\boldsymbol{\beta}})=\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{X}\right)^{-1}.$

In light of that, one could proceed to draw inferences using $\widehat{\boldsymbol{\beta}}$ if only a consistent estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$ could be found. White (1980) argued that, although estimating $\boldsymbol{\Lambda}$ suffers from the incidental parameter problem (the number of unknown parameters $(\sigma_1^2,\sigma_2^2,\ldots,\sigma_n^2)$ increases with $n$), estimating $\mathbf{G}=\frac{1}{n}\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}\mathbf{X}\right)$ does not, since it involves only $\frac{1}{2}k(k+1)$ unknown terms. He proposed a consistent estimator of $\mathbf{G}$, $\widehat{\boldsymbol{\Gamma}}(0)=\frac{1}{n}\sum_{t=1}^{n}\widehat{u}_t^2(\mathbf{x}_t\mathbf{x}_t^{\top})$, that gives rise to Heteroskedasticity-Consistent Standard Errors (HCSE) for $\widehat{\boldsymbol{\beta}}$, by replacing $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$ in (1) with:

$\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\left(\sum\nolimits_{t=1}^{n}\widehat{u}_t^2\,\mathbf{x}_t\mathbf{x}_t^{\top}\right)\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}.\qquad (3)$
The fourth presumption is that one can validate the LR model one probabilistic assumption at a time. This presumption is more subtle because it is often insufficiently realized that assumptions (1)-(4) are interrelated, and thus any form of model validation should take that into account. The misleading impression that departures from these assumptions can be viewed individually stems partly from the fact that the probabilistic structure is specified in terms of the error process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}:=(1,2,\ldots,n,\ldots)\}$. However, the probabilistic structure that matters for modeling and inference is that of the observable stochastic process $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\}$. This is because the latter defines the distribution of the sample, as well as the likelihood function, that provide the cornerstones of both frequentist and Bayesian approaches to inference.
2.2 How narrow is the traditional perspective?
All four presumptions behind the recommendation to ignore non-Normality and use HCSE for testing hypotheses pertaining to $\boldsymbol{\beta}$, such as:

$H_0:\ \boldsymbol{\beta}=\mathbf{0}\quad \text{vs.}\quad H_1:\ \boldsymbol{\beta}\neq\mathbf{0},\qquad (4)$

can be called into question in terms of being vulnerable to several potential errors that can undermine the reliability of inference.

To begin with, the Gauss-Markov theorem is of very little value for inference purposes, for several reasons. First, the 'linearity' of $\widehat{\boldsymbol{\beta}}$ is a phony property, unbiasedness ($E(\widehat{\boldsymbol{\beta}})=\boldsymbol{\beta}$) without consistency is dubious, and relative efficiency within an artificially restricted class of estimators is of very limited value. In addition, despite the fact that the theorem pertains to the finite sample properties of $\widehat{\boldsymbol{\beta}}$, it cannot be used as a reliable basis for inference because it yields an unknown sampling distribution for $\widehat{\boldsymbol{\beta}}$, i.e.

$\widehat{\boldsymbol{\beta}}\sim \mathsf{D}\left(\boldsymbol{\beta},\ \sigma^2(\mathbf{X}^{\top}\mathbf{X})^{-1}\right),\quad \mathsf{D}(\cdot)\ \text{unknown}.\qquad (5)$

This provides a poor basis for any form of finite sample inference that involves error probabilities calling for the evaluation of tail areas. The problem was demonstrated by Bahadur and Savage (1956) in the context of the simple Normal model:

$y_t\sim \mathrm{NIID}(\mu,\sigma^2),\quad \boldsymbol{\theta}:=(\mu,\sigma^2)\in\mathbb{R}\times\mathbb{R}_{+},\quad t\in\mathbb{N}.$
They showed that when the Normality assumption is replaced with the existence of the first two moments, no unbiased or consistent t-type test for $H_0:\ \mu=0$ vs. $H_1:\ \mu\neq 0$ exists:

“It is shown that there is neither an effective test of the hypothesis that $\mu=0$, nor an effective confidence interval for $\mu$, nor an effective point estimate of $\mu$. These conclusions concerning $\mu$ flow from the fact that $\mu$ is sensitive to the tails of the population distribution; parallel conclusions hold for other sensitive parameters, and they can be established by the same methods as are here used for $\mu$.” (p. 1115)

That is, the existence of the first two moments (or even all moments) provides insufficient information to pin down the tail areas of the relevant distributions to evaluate the type I and II error probabilities with any accuracy.
The Bahadur-Savage result explains why practitioners who otherwise praise the Gauss-Markov theorem ignore it when it comes to inference, and instead appeal to the asymptotic distributions of $\widehat{\boldsymbol{\theta}}:=(\widehat{\boldsymbol{\beta}},s^2)$. To justify the asymptotic approximations in (1), however, calls for supplementing the Gauss-Markov assumptions (2)-(5) with additional restrictions on the underlying vector stochastic process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$, such as:

$\lim_{n\to\infty}\tfrac{1}{n}(\mathbf{X}^{\top}\mathbf{X})=\boldsymbol{\Sigma}_{22}>0,\qquad \lim_{n\to\infty}\tfrac{1}{n}(\mathbf{X}^{\top}\mathbf{y})=\boldsymbol{\sigma}_{21}\neq\mathbf{0}.$
Lastly, despite the impression given by the Gauss-Markov theorem that the Normality assumption has a very limited role to play in inferences pertaining to $\boldsymbol{\theta}:=(\boldsymbol{\beta},\sigma^2)$, the error-reliability of inference concerning $H_0$ in (4) is actually adversely affected not only by departures from the Normality assumption in (1), but also by the 'non-Normality' (especially any form of skewness) of the $\{\mathbf{X}_t,\ t\in\mathbb{N}\}$ process; see Ali and Sharma (1996).

The third presumption is also questionable, for two reasons. First, the claim that the GLS estimator is relatively more efficient than the OLS estimator depends on the covariance matrix $\boldsymbol{\Lambda}$ being known, which is never the case in practice. Second, it relies on asymptotics without any reason to believe that the use of the HCSE will alleviate the unreliability-of-inference problem, evaluated in terms of the discrepancy between the actual and nominal error probabilities for a given $n$. Indeed, there are several potential errors that can give rise to sizeable discrepancies.
The fourth presumption is potentially the most questionable, because what is often insufficiently appreciated is that the error process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$ provides a much narrower perspective on the specification, misspecification and respecification facets of inference because, by definition, $u_t=y_t-\beta_0-\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$, and thus the error process retains the systematic component $E(y_t|X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$. This was first demonstrated by Sargan (1964) by contrasting the modeling of temporal dependence using the error term with modeling it using the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$. He showed that the LR model with an AR(1) error term:

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad u_t=\rho u_{t-1}+\varepsilon_t\ \ \Rightarrow\ \ y_t=\beta_0(1-\rho)+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+\rho y_{t-1}-\rho\boldsymbol{\beta}_1^{\top}\mathbf{x}_{t-1}+\varepsilon_t,$

is a special case of a Dynamic Linear Regression (DLR) model:

$y_t=\alpha_0+\boldsymbol{\alpha}_1^{\top}\mathbf{x}_t+\alpha_2 y_{t-1}+\boldsymbol{\alpha}_3^{\top}\mathbf{x}_{t-1}+\varepsilon_t,\qquad (6)$

subject to the (non-linear) common factor restrictions:

$\boldsymbol{\alpha}_3+\boldsymbol{\alpha}_1\alpha_2=\mathbf{0}.\qquad (7)$

The DLR model in (6) arises naturally by modeling the Markov dependence directly in terms of the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ via:

$y_t=E(y_t\,|\,X_t=\mathbf{x}_t,\ \mathbf{Z}_{t-1})+\varepsilon_t,\quad t\in\mathbb{N}.$

In relation to (7), it is important to note that the traditional claim that the OLS estimator $\widehat{\boldsymbol{\beta}}$ retains its unbiasedness and consistency despite departures from the no-autocorrelation assumption presumes the validity of the common factor restrictions; see Spanos (1986). The latter is extremely important because without it $\widehat{\boldsymbol{\beta}}$ will be an inconsistent estimator.
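The inconsistency point can be illustrated with a small simulation (a sketch under assumed coefficient values, not the paper's design): data are generated from a DLR of the form (6) whose coefficients violate (7), and the static OLS slope then fails to converge to the contemporaneous coefficient.

```python
import numpy as np

# Illustrative simulation (assumed setup): data from a DLR model of form (6)
# with coefficients that violate the common factor restrictions (7),
# i.e. a3 != -a1*a2.
rng = np.random.default_rng(0)
n = 50_000
a0, a1, a2, a3 = 1.0, 0.5, 0.6, 0.4       # (7) would require a3 = -0.3

x = np.zeros(n)
y = np.zeros(n)
y[0] = a0 / (1 - a2)                       # start near the stationary mean
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + np.sqrt(1 - 0.7**2) * rng.standard_normal()
    y[t] = a0 + a1 * x[t] + a2 * y[t - 1] + a3 * x[t - 1] + rng.standard_normal()

# static LR fit y_t = b0 + b1*x_t + u_t: the slope settles near 1.3,
# far from a1 = 0.5, illustrating the inconsistency when (7) is invalid
Xmat = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
```

The bias arises because, with persistent regressors, $x_t$ is correlated with the omitted dynamics $y_{t-1}$ and $\mathbf{x}_{t-1}$; only when (7) holds does the static parameterization survive.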
A strong case can be made that the probabilistic assumptions (1)-(4) pertaining to the error term provide an inadequate perspective for all three facets of modeling: specification, misspecification testing and respecification. To illustrate that, consider the case where potential departures from assumption (4), i.e.

$E(u_t u_s|X_t=\mathbf{x}_t)\neq 0\ \ \text{for}\ t\neq s,\qquad (8)$

are being probed using the Durbin-Watson (D-W) misspecification test based on:

$H_0:\ \rho=0\quad \text{vs.}\quad H_1:\ \rho\neq 0,$

in the context of the Autocorrelation-Corrected (A-C) LR model:

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad u_t=\rho u_{t-1}+\varepsilon_t,\quad t\in\mathbb{N}.\qquad (9)$

In the case where the D-W test rejects $H_0$, this is traditionally interpreted as providing evidence for the (A-C) LR model in (9).
Hence, the traditional respecification takes the form of adopting (9) and declaring that the misspecification has been accounted for by replacing the original OLS estimator with a Generalized Least Squares (GLS) type estimator. This strategy, however, constitutes a classic example of the fallacy of rejection: misinterpreting rejection of $H_0$ [evidence against $H_0$] as evidence for a particular $H_1$. A rejection of $H_0$ by a D-W test provides evidence against $H_0$ and for the presence of generic temporal dependence in (8), but does not provide any evidence for the particular form assumed by $H_1$:

$H_1:\ E(u_t u_s|X_t=\mathbf{x}_t)=\left(\tfrac{\rho^{|t-s|}}{1-\rho^2}\right)\sigma_{\varepsilon}^2,\quad t,s=1,2,\ldots,n,\qquad (10)$

which stems from the AR(1) model for the error term; (10) represents one of an infinite number of dependence forms that (8) allows for. To have evidence for (9) one needs to validate all the assumptions of the (A-C) LR model anew, including the validity of the common factor restrictions in (7); see Mayo and Spanos (2004).
The broader and more coherent vantage point for all three facets of modeling stems from viewing the Linear Regression model as specified in terms of the first two moments of the same conditional distribution $D(y_t|\mathbf{X}_t;\boldsymbol{\theta})$, i.e., the regression and skedastic functions:

$E(y_t|X_t=\mathbf{x}_t)=h(\mathbf{x}_t),\qquad \mathrm{Var}(y_t|X_t=\mathbf{x}_t)=g(\mathbf{x}_t),\qquad \mathbf{x}_t\in\mathbb{R}^k,\qquad (11)$

where the functional forms $h(\cdot)$ and $g(\cdot)$ are determined by the joint distribution $D(y_t,\mathbf{X}_t;\boldsymbol{\varphi})$. From this perspective, departures from assumptions pertaining to one of the two functions are also likely to affect the other. As argued next, the above claims pertaining to the properties of $\widehat{\boldsymbol{\beta}}$ under (3)* depend crucially on retaining some of the other Gauss-Markov assumptions, especially the linearity of the regression function stemming from assumption (2). The broader perspective indicates that relaxing one assumption at a time might not be a good general strategy because the model assumptions are often interrelated.
Having said that, the focus of the discussion in this paper will be the best case scenario for the traditional textbook perspective on non-Normality and heteroskedasticity/autocorrelation. This is the case where all the other assumptions of the Linear Regression model, apart from Normality and homoskedasticity, are retained, and the departures from the latter are narrowed down to accord with the traditional perspective.
The question which naturally arises at this stage is: what are the probabilistic assumptions of the LR model whose validity is retained by this best case scenario? To answer that question unambiguously one needs to specify the LR model in terms of a complete set of probabilistic assumptions pertaining to the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0$. Choosing an arbitrary form of heteroskedasticity and simulating the LR model does not render the combination a coherent statistical model.
3 A broader probabilistic perspective
Viewing the LR model as a parameterization of the observable vector stochastic process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0$ has numerous advantages, including (i) providing a complete and internally consistent set of testable probabilistic assumptions, (ii) providing well-defined statistical parameterizations for the model parameters, (iii) bringing out the interrelationship among the probabilistic assumptions specifying the LR model, and (iv) enabling the modeler to distinguish between the statistical and the substantive premises of inference; see Spanos (2006).

This is achieved by relating the LR model to the joint distribution of the observable process $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ via the sequential conditioning:

$D(\mathbf{Z}_1,\ldots,\mathbf{Z}_n;\boldsymbol{\phi})=\prod_{t=1}^{n}D(\mathbf{Z}_t;\boldsymbol{\varphi})=\prod_{t=1}^{n}D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)\cdot D(\mathbf{X}_t;\boldsymbol{\psi}_2).\qquad (12)$
The joint distribution $D(\mathbf{Z}_t;\boldsymbol{\varphi})$ takes the form:

$\begin{pmatrix} y_t \\ \mathbf{X}_t \end{pmatrix}\sim \mathsf{N}\left(\begin{pmatrix} \mu_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{21}^{\top} \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right),\qquad (13)$

which then gives rise to the conditional and marginal distributions, $D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)$ and $D(\mathbf{X}_t;\boldsymbol{\psi}_2)$:

$(y_t\,|\,X_t=\mathbf{x}_t)\sim \mathsf{N}\left(\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t,\ \sigma^2\right),\qquad \mathbf{X}_t\sim \mathsf{N}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22}).$
The LR model, as a purely probabilistic construct, is specified exclusively in terms of the conditional distribution $D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)$ due to weak exogeneity; see Engle et al (1983). To be more specific, the LR model comprises the statistical Generating Mechanism (GM) in conjunction with the probabilistic assumptions [1]-[5] (table 2). This purely probabilistic construal of the Normal, Linear Regression model brings out the relevant parameterizations of interest in $\boldsymbol{\theta}$ in terms of the primary parameters $\boldsymbol{\varphi}:=(\mu_1,\boldsymbol{\mu}_2,\sigma_{11},\boldsymbol{\sigma}_{21},\boldsymbol{\Sigma}_{22})$, as well as the relationships between the probabilistic assumptions of $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t^{\top})^{\top},\ t\in\mathbb{N}\}$ and the model assumptions (table 3).

Table 2: The Normal, Linear Regression Model

Statistical GM: $y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\ t\in\mathbb{N}$
[1] Normality: $(y_t\,|\,X_t=\mathbf{x}_t)\sim \mathsf{N}(\cdot,\cdot)$
[2] Linearity: $E(y_t\,|\,X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$, linear in $\mathbf{x}_t$
[3] Homoskedasticity: $\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=\sigma^2$, free of $\mathbf{x}_t\in\mathbb{R}^k$
[4] Independence: $\{(y_t\,|\,X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$ is an independent process
[5] t-invariance: $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ do not change with $t$

where $\beta_0=\mu_1-\boldsymbol{\beta}_1^{\top}\boldsymbol{\mu}_2$, $\boldsymbol{\beta}_1=\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$, $\sigma^2=\sigma_{11}-\boldsymbol{\sigma}_{21}^{\top}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$.
Table 3: Reduction vs. model assumptions

Reduction: $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$   Model: $\{(y_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$
Normal (N) $\longrightarrow$ [1]-[3]
Independent (I) $\longrightarrow$ [4]
Identically Distributed (ID) $\longrightarrow$ [5]
These relationships are particularly important for guiding Mis-Specification (M-S) testing (probing the validity of the model assumptions [1]-[5]) as well as respecification (choosing an alternative model when any of the assumptions [1]-[5] are found to be invalid for data $\mathbf{Z}_0$).
Of particular interest in this paper is the connection between the [1] Normality and [3] Homoskedasticity assumptions, which stems from the joint Normality of $\mathbf{Z}_t$. In relation to heteroskedasticity, it is very important to emphasize that the traditional assumption in (2) does not distinguish between heteroskedasticity and conditional variance heterogeneity:

$\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=\sigma^2(t),\quad \text{for}\ \mathbf{x}_t\in\mathbb{R}^k,\ t\in\mathbb{N},$

where $\sigma^2(t)$ is an unknown function of the index $t$. This distinction is crucial because the sources of the two departures are very different: heteroskedasticity stems from the non-Normality of $\mathbf{Z}_t$, but heterogeneity stems from the covariance heterogeneity of $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ (table 3).
3.1 Viewing heteroskedasticity in a broader perspective
The recommendation to ignore any departures from Normality and homoskedasticity and instead use HCSE for inferences pertaining to $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ takes for granted that all the other assumptions [2], [4]-[5] are valid for data $\mathbf{Z}_0$. This is unlikely to be the case in practice because of the interrelationships among the model assumptions. It is obvious that in cases where other model assumptions, in addition to [1] and [3], are invalid, the above recommendation will be a terrible strategy for practitioners. Hence, for the purposes of an objective evaluation of this recommendation it is important to focus on the best case scenario, which assumes that the model assumptions [2], [4]-[5] are valid for data $\mathbf{Z}_0$.
Due to the interrelationship between the Normality of $\mathbf{Z}_t$ and model assumptions [1]-[3], it is important to focus on joint distributions that retain the linearity assumption in [2], including the underlying parameterizations of $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$. Such a scenario can be accommodated into a coherent PR specification by broadening the joint Normal distribution in (13) to the Elliptically Symmetric (ES) family (Kelker, 1970), which can be specified in terms of the first two moments of $\mathbf{Z}_t$ via:

$\begin{pmatrix} y_t \\ \mathbf{X}_t \end{pmatrix}\sim \mathrm{ES}\left(\begin{pmatrix} \mu_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{21}^{\top} \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix};\ q(\cdot)\right).\qquad (14)$
For different functional forms $q(\cdot)$ this family includes the Normal, the Student's t, the Pearson type II, the Logistic and other distributions; see Fang et al (1990). The appropriateness of the ES family of distributions for the best case scenario stems from the fact that:

(i) All the distributions are bell-shape symmetric, resembling the Normal in terms of the shape of the density function, but its members are either platykurtic ($\alpha_4<3$) or leptokurtic ($\alpha_4>3$), with the Normal being the only mesokurtic ($\alpha_4=3$) distribution.

(ii) The regression function for all the members of the ES family of distributions is identical to that of the Normal (Nimmo-Smith, 1979):

$E(y_t\,|\,X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t,\quad \mathbf{x}_t\in\mathbb{R}^k.$

(iii) The statistical parameterization of $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ (table 2) is retained by all members of the ES family. This ensures that the OLS estimators of $(\beta_0,\boldsymbol{\beta}_1)$ will be unbiased and consistent.

(iv) Homoskedasticity characterizes the Normal distribution within the ES family, in the sense that all the other distributions have a heteroskedastic conditional variance; see Spanos (1995) for further discussion.
For all the other members of the ES family the skedastic function takes the generic form:

$\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=g(\mathbf{x}_t)=\sigma^2\left[\frac{\int_{\delta_t^2}^{\infty}q(u)\,du}{f(\mathbf{x}_t)}\right],\quad \mathbf{x}_t\in\mathbb{R}^k,$

where $\delta_t^2=\frac{1}{2}(\mathbf{X}_t-\boldsymbol{\mu}_2)^{\top}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_t-\boldsymbol{\mu}_2)$ and $f(\cdot)$ is the marginal density of $\mathbf{x}_t$; see Chu (1973). For instance, in the case where (14) is Student's t with $\nu$ degrees of freedom (Spanos, 1994):

$\mathrm{Var}(y_t\,|\,\sigma(\mathbf{X}_t))=\left(\tfrac{\nu\sigma^2}{\nu+k-2}\right)\left(1+\tfrac{2\delta_t^2}{\nu}\right).\qquad (15)$
Interestingly, the above form of heteroskedasticity in (15) is not unrelated to the misspecification test for homoskedasticity proposed by White (1980), which is based on testing the hypotheses:

$H_0:\ \boldsymbol{\gamma}_1=\mathbf{0}\quad \text{vs.}\quad H_1:\ \boldsymbol{\gamma}_1\neq\mathbf{0},$

in the context of the auxiliary regression:

$\widehat{u}_t^2=\gamma_0+\boldsymbol{\gamma}_1^{\top}\boldsymbol{\psi}_t+v_t,$

where $v_t$ is a white-noise error and $\boldsymbol{\psi}_t:=\{(x_{it}\cdot x_{jt}),\ i\geq j,\ i,j=1,2,\ldots,k\}$. It is often claimed that the White test is very general as a misspecification test for departures from the homoskedasticity assumption; see Greene (2011). The fact of the matter is that the squares and cross-products of the regressors in $\boldsymbol{\psi}_t$ stem primarily from retaining the linearity and the other model assumptions, and relaxing only the homoskedasticity assumption. The broader perspective stemming from viewing the LR model as based on the regression and skedastic functions in (11) brings out the fact that the above quadratic form of heteroskedasticity is likely to be the exception, not the rule, when the underlying joint distribution is non-Normal; see Spanos (1999), ch. 7.
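The auxiliary-regression form of the White test can be sketched as follows (the helper name and the illustrative data-generating process are ours; a common implementation refers $n R^2$ from the auxiliary regression to a chi-square distribution):

```python
import numpy as np
from scipy import stats

def white_test(X1, u_hat):
    """White-type homoskedasticity test (a sketch): regress squared OLS
    residuals on levels, squares and cross-products of the regressors
    and refer n*R^2 to chi-square. X1: (n, k) regressors, no constant."""
    n, k = X1.shape
    cols = [np.ones(n)] + [X1[:, i] for i in range(k)]
    cols += [X1[:, i] * X1[:, j] for i in range(k) for j in range(i, k)]
    Psi = np.column_stack(cols)
    y2 = u_hat ** 2
    coef = np.linalg.lstsq(Psi, y2, rcond=None)[0]
    resid = y2 - Psi @ coef
    r2 = 1.0 - resid @ resid / np.sum((y2 - y2.mean()) ** 2)
    lm, df = n * r2, Psi.shape[1] - 1          # LM statistic and its df
    return lm, stats.chi2.sf(lm, df)           # statistic and p-value

# illustration: strongly heteroskedastic data should yield a tiny p-value
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 1.0 + x + np.sqrt(1.0 + 2.0 * x**2) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
lm, pval = white_test(x[:, None], u)
```

Note that the quadratic terms in `Psi` mirror the $\boldsymbol{\psi}_t$ above, which is exactly the point in the text: the test's generality is tied to the quadratic form of heteroskedasticity.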
4 Monte Carlo Simulations 1: HCSE
To investigate the reliability of inference in terms of any discrepancies between actual and nominal error probabilities, we focus on linear restrictions on the coefficients $\boldsymbol{\beta}$ based on the hypotheses of interest:

$H_0:\ (\mathbf{R}\boldsymbol{\beta}-\mathbf{r})=\mathbf{0}\quad \text{vs.}\quad H_1:\ (\mathbf{R}\boldsymbol{\beta}-\mathbf{r})\neq\mathbf{0},\quad \mathrm{rank}(\mathbf{R})=q.$

It is well-known that the optimal test under assumptions [1]-[5] (table 2) is the F-test defined by the test statistic:

$F(\mathbf{y})=\frac{(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})}{q\,s^2}\overset{H_0}{\sim}\mathsf{F}(q,\ n-k-1),$

where $\widehat{\boldsymbol{\beta}}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$, $s^2=\widehat{\mathbf{u}}^{\top}\widehat{\mathbf{u}}/(n-k-1)$, $\widehat{\mathbf{u}}=\mathbf{y}-\mathbf{X}\widehat{\boldsymbol{\beta}}$, in conjunction with the rejection region:

$C_1=\{\mathbf{y}:\ F(\mathbf{y})>c_{\alpha}\}.$

The best case scenario in the context of which the error-reliability of the above F-test will be investigated is that the simulated data are generated using the Student's t Linear/Heteroskedastic Regression model in table 4. The choice of this model is based on the fact that, in addition to satisfying conditions (i)-(iv) above, the Student's t distribution approximates the Normal distribution as the number of degrees of freedom ($\nu$) increases.
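The classical F-test above can be sketched in code (illustrative helper and data-generating process, not the paper's simulation design):

```python
import numpy as np
from scipy import stats

def f_test_linear(X, y, R, r):
    """Classical F-test for H0: R beta = r in the LR model (a sketch).

    F = (Rb - r)'[R (X'X)^-1 R']^-1 (Rb - r) / (q s^2),
    referred to F(q, n - k - 1) under assumptions [1]-[5].
    """
    n, m = X.shape                                # m = k + 1 (incl. constant)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                         # OLS estimator
    u = y - X @ b
    s2 = (u @ u) / (n - m)
    q = R.shape[0]
    d = R @ b - r
    F = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / (q * s2)
    return F, stats.f.sf(F, q, n - m)             # statistic and p-value

# illustration on a hypothetical design: y = 1 + 2x + u, u ~ N(0,1)
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
R = np.array([[0.0, 1.0]])
F_true, p_true = f_test_linear(X, y, R, np.array([2.0]))    # true null
F_false, p_false = f_test_linear(X, y, R, np.array([0.0]))  # false null
```

The HCSE variants examined below keep the numerator but replace $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$ with a robustified covariance estimator.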
Table 4: Student's t, Linear/Heteroskedastic Regression model

Statistical GM: $y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{X}_t+u_t,\ t\in\mathbb{N}$
[1] Student's t: $D(y_t,\mathbf{X}_t;\boldsymbol{\theta})$ is Student's t with $\nu$ degrees of freedom
[2] Linearity: $E(y_t\,|\,\sigma(\mathbf{X}_t))=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{X}_t$, where $\mathbf{X}_t:(k\times 1)$
[3] Heteroskedasticity: $\mathrm{Var}(y_t\,|\,\sigma(\mathbf{X}_t))=\left(\tfrac{\nu\sigma^2}{\nu+k-2}\right)\left(1+\tfrac{1}{\nu}(\mathbf{X}_t-\boldsymbol{\mu}_2)^{\top}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_t-\boldsymbol{\mu}_2)\right)$
[4] Independence: $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ is an independent process
[5] t-invariance: $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2,\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22})$ are t-invariant

where $\beta_0=\mu_1-\boldsymbol{\beta}_1^{\top}\boldsymbol{\mu}_2$, $\boldsymbol{\beta}_1=\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$, $\sigma^2=\sigma_{11}-\boldsymbol{\sigma}_{21}^{\top}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$.
Maximization of the log-likelihood function for the above Student's t model in table 4 yields MLEs of $\boldsymbol{\beta}^{\top}:=(\beta_0,\boldsymbol{\beta}_1^{\top})$ that take an estimated Generalized Least Squares (GLS) form:

$\widehat{\boldsymbol{\beta}}_{ML}=(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{y},\qquad \widehat{\boldsymbol{\Omega}}=\mathrm{diag}(\widehat{\omega}_1,\widehat{\omega}_2,\ldots,\widehat{\omega}_n),$

$\widehat{\omega}_t=\frac{\nu\widehat{\sigma}^2}{\nu+k-2}\left(1+\tfrac{1}{\nu}(\mathbf{X}_t-\widehat{\boldsymbol{\mu}}_2)^{\top}\widehat{\boldsymbol{\Sigma}}_{22}^{-1}(\mathbf{X}_t-\widehat{\boldsymbol{\mu}}_2)\right),\ t=1,\ldots,n,\qquad \widehat{\sigma}^2=\left(\tfrac{\nu+k}{\nu}\right)\left(\tfrac{\widehat{\mathbf{u}}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\widehat{\mathbf{u}}}{n}\right),$

with $\widehat{\boldsymbol{\mu}}_2$ and $\widehat{\boldsymbol{\Sigma}}_{22}$ denoting the MLEs of $\boldsymbol{\mu}_2$ and $\boldsymbol{\Sigma}_{22}$, where all the above estimators are derived using numerical optimization; see Spanos (1994).
The F-type test statistic based on the above MLEs takes the form:

$F_{ML}(\mathbf{y})=\frac{(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})}{q}\overset{H_0}{\approx}\mathsf{F}(q,\ n-k-1),$

where '$\overset{H_0}{\approx}$' denotes 'distributed approximately under $H_0$'. The asymptotic form of this test statistic is:

F-MLE: $\ F_{\infty}(\mathbf{y})=(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})\overset{H_0}{\sim}\chi^2(q).\qquad (16)$

For simplicity we consider the case where $\mathbf{R}=\mathbf{I}:=\mathrm{diag}(1,1,\ldots,1)$ for the simulations that follow.
4.1 Monte Carlo Experiment 1
This design will be based on the same mean and covariance matrix for both the Normal and the Student's t distributions:

$\begin{pmatrix} y_t \\ X_{1t} \\ X_{2t} \end{pmatrix}\sim \mathrm{ES}\left(\begin{pmatrix} 3 \\ 1.8 \\ 0.8 \end{pmatrix},\ \begin{pmatrix} 1.79375 & 0.7 & -0.4 \\ 0.7 & 1 & 0.2 \\ -0.4 & 0.2 & 1 \end{pmatrix}\right),$

giving rise to the following true parameter values:

$\beta_0=1.9875,\qquad \boldsymbol{\beta}_1^{\top}=(0.8125,\ -0.5625),\qquad \sigma^2=1.$
However, in order to evaluate the error-reliability of the different inference procedures, we will allow different values for the degrees of freedom $\nu$, to investigate their reliability as $\nu$ increases; as $\nu$ increases, the Student's t approximates the Normal distribution better.
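As a quick sanity check, the 'true' parameter values above follow from the table 2 parameterizations applied to the design's mean and covariance:

```python
import numpy as np

# Check that the 'true' values of experiment 1 follow from the table 2
# parameterizations applied to the mean and covariance of the design.
mu1 = 3.0
mu2 = np.array([1.8, 0.8])
s11 = 1.79375
s21 = np.array([0.7, -0.4])
S22 = np.array([[1.0, 0.2],
                [0.2, 1.0]])

beta1 = np.linalg.solve(S22, s21)     # beta1 = Sigma22^-1 sigma21
beta0 = mu1 - beta1 @ mu2             # beta0 = mu1 - beta1' mu2
sigma2 = s11 - s21 @ beta1            # sigma2 = s11 - s21' Sigma22^-1 s21
# beta0 = 1.9875, beta1 = (0.8125, -0.5625), sigma2 = 1.0
```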
In principle one can generate samples of size $n$ using the statistical Generating Mechanism (GM):

$\mathbf{y}^{(i)}=\mathbf{1}_n\beta_0^{*}+\mathbf{x}^{(i)}\boldsymbol{\beta}_1^{*}+\sqrt{g(\mathbf{x}^{(i)})}\,\boldsymbol{\varepsilon}^{(i)},\quad i=1,2,\ldots,N,$

$g(\mathbf{x}_t)=\left(\tfrac{\nu\sigma^{*2}}{\nu+k-2}\right)\left(1+\tfrac{1}{\nu}(\mathbf{x}_t-\boldsymbol{\mu}_2^{*})^{\top}\boldsymbol{\Sigma}_{22}^{*-1}(\mathbf{x}_t-\boldsymbol{\mu}_2^{*})\right),$

to generate the 'artificial' data realizations $\left(\mathbf{y}^{(1)},\mathbf{y}^{(2)},\ldots,\mathbf{y}^{(N)}\right)$, where $\mathbf{y}^{(i)}:=(y_1,y_2,\ldots,y_n)^{\top}$, using pseudo-random numbers for $\boldsymbol{\varepsilon}^{(i)}\sim \mathrm{St}(\mathbf{0},\mathbf{I}_n;\nu+k)$ and $\mathbf{X}^{(i)}\sim \mathrm{St}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22};\nu)$. However, the simulations are more accurate when they are based on the multivariate Student's t distribution:

$\mathbf{Z}_t\sim \mathrm{St}(\boldsymbol{\mu},\boldsymbol{\Sigma};\nu),\quad t\in\mathbb{N},\qquad (17)$

and then $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ are reparameterized to estimate the unknown parameters $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$. Let $\mathbf{U}$ be an $m$-dimensional Normal random vector, $\mathbf{U}\sim \mathsf{N}(\mathbf{0},\mathbf{I}_m)$. Using the transformation $\mathbf{Z}=\boldsymbol{\mu}+\left(\sqrt{s/\nu}\right)^{-1}\mathbf{L}^{\top}\mathbf{U}$, where $\mathbf{L}^{\top}\mathbf{L}=\boldsymbol{\Sigma}$ and $s\sim\chi^2(\nu)$ is a chi-square distributed random variable with $\nu$ degrees of freedom, with $\mathbf{U}$ and $s$ independent, one can generate data from a Student's t random vector (17); see Johnson (1987).
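The transformation above can be sketched in code (hypothetical helper name; we use a Cholesky factor $\mathbf{A}$ with $\mathbf{A}\mathbf{A}^{\top}=\boldsymbol{\Sigma}$, which plays the role of $\mathbf{L}$):

```python
import numpy as np

def rmvt(mu, Sigma, df, size, rng):
    """Draws from a multivariate Student's t via the scale-mixture transform:
    Z = mu + A U / sqrt(s/df), A A' = Sigma, U ~ N(0, I), s ~ chi2(df),
    with U and s independent (a sketch)."""
    m = len(mu)
    A = np.linalg.cholesky(Sigma)               # A A' = Sigma
    U = rng.standard_normal((size, m))
    s = rng.chisquare(df, size=size)
    return mu + (U @ A.T) / np.sqrt(s / df)[:, None]

# draws for the experiment-1 design (mean/covariance as in section 4.1)
rng = np.random.default_rng(3)
mu = np.array([3.0, 1.8, 0.8])
Sigma = np.array([[1.79375, 0.7, -0.4],
                  [0.7,     1.0,  0.2],
                  [-0.4,    0.2,  1.0]])
Z = rmvt(mu, Sigma, df=8, size=100_000, rng=rng)
# sample mean ~ mu; sample covariance ~ Sigma * df/(df - 2)
```

Note that $\boldsymbol{\Sigma}$ here is the scale matrix, so the covariance of the draws is $\boldsymbol{\Sigma}\cdot\nu/(\nu-2)$, which is why a reparameterization step is needed before estimation.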
The aim of experiment 1 is to investigate the empirical error probabilities (size and power) of the different estimators of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$ that give rise to the HCSE F-type test:

$F_{HC}(\mathbf{y})=(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})^{\top}[\mathbf{R}\,\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})\,\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})\overset{H_0}{\sim}\chi^2(q),\qquad (18)$

where $\mathbf{R}=\mathbf{I}:=\mathrm{diag}(1,1,\ldots,1)$ and $\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})$ is given in (3). For the probing of the relevant error probabilities we consider six scenarios:

(a) two different values for the degrees of freedom parameter ($\nu$): $\nu=4$ and $\nu=8$;
(b) sample sizes $n=50$, $n=100$, $n=200$.
The idea is that increasing $\nu$ renders the tails of the underlying Student's t distribution less leptokurtic and closer to the Normal. For comparison purposes, the standard Normal [$\mathsf{N}(0,1)$] is plotted together with the Student's t densities [$\mathrm{St}(0,1;\nu)$] for $\nu=4$ and $\nu=8$ in the figure below. Note that the $\mathrm{St}(0,1;\nu)$ densities are rescaled to ensure that their variance is also unity, and not $\tfrac{\nu}{\nu-2}$, to render them comparable to the $\mathsf{N}(0,1)$. The rescaled Student's t density is:

$f(x;0,1,\nu)=c\cdot\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{z^2}{\nu}\right)^{-\left(\frac{\nu+1}{2}\right)},\quad \text{for}\ z=c\,x,\ \ c=\sqrt{\tfrac{\nu}{\nu-2}},$

yielding $E(x)=0$, $\mathrm{Var}(x)=1$.
[Figure: densities of the standard Normal N(0,1) and the rescaled Student's t densities St(0,1;4) and St(0,1;8)]
To get some idea about the differences in terms of the tail area:

$P(|X|>2)=\begin{cases} 0.0455 & \text{for } \mathsf{N}(0,1) \\ 0.0805 & \text{for } \mathrm{St}(8) \\ 0.1161 & \text{for } \mathrm{St}(4) \end{cases}$
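These tail areas can be reproduced directly from the Normal and Student's t survival functions (assuming scipy is available):

```python
from scipy import stats

# Tail areas P(|X| > 2) for the distributions compared in the text
# (sf is the survival function, P(X > x)).
p_norm = 2 * stats.norm.sf(2)        # N(0,1):  ~ 0.0455
p_t8 = 2 * stats.t.sf(2, df=8)       # St(8):   ~ 0.0805
p_t4 = 2 * stats.t.sf(2, df=4)       # St(4):   ~ 0.1161
```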
The Normal distribution is mesokurtic, with kurtosis coefficient $\alpha_4=3$, but the Student's t is leptokurtic:

$\alpha_4=\frac{E[(y_t-\mu)^4]}{[\mathrm{Var}(y_t)]^2}=3+\frac{6}{\nu-4},\quad \nu>4,$

which brings out the role of the degrees of freedom parameter $\nu$. It is important to note that the degrees of freedom of the conditional distribution associated with the Student's t Linear Regression model in table 4 increase with the number of regressors, i.e. $\nu+k$.
The different sample sizes are chosen to provide a guide as to how large $n$ needs to be for the asymptotic results to be approximately acceptable. These choices for $(\nu,n)$ are based on shedding maximum light on the issues involved using the smallest number of different scenarios. Numerous other choices were tried before reducing the scenarios to the six that adequately represent the simulation results.

The relevant error probabilities of the HCSE-based tests are compared to those of the F-test in (16), based on the Maximum Likelihood Estimators (MLEs) of the model in table 4.
Actual type I error probability (size) of HCSE-based tests.

Figure 1 summarizes the size properties of the following forms of the F-test:

(i) OLS: $\widehat{\boldsymbol{\beta}}$ in conjunction with $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$;
(ii) HCSE-HW: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Hansen-White estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$;
(iii) HCSE-NW: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Newey-West estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$;
(iv) HCSE-A: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Andrews estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$; and
(v) F-MLE: the F-test based on the MLEs of both $\boldsymbol{\beta}$ and its covariance matrix for the Student's t Linear Model in table 4, using its asymptotic version in (16) to ensure comparability with the F-type tests (i)-(iv).
Fig. 1: Size of F-HCSE-based tests for different $(\nu, n)$
The general conclusion is that all the heteroskedasticity-robustified F-type tests
exhibit serious discrepancies between the actual and nominal type I error of .05. The
actual type I error is considerably greater than the nominal for ν=4 and all sample
sizes from n=50 to n=200, ranging from .12 to .37. The only case where the size
distortions are smaller is n=200 and ν=8, but even then the discrepancies
will give rise to unreliable inferences. In contrast, the F-MLE has excellent size
properties for all values of n and ν. What is also noticeable is that the test in (i) is not
uniformly worse than the other HCSE-based tests in terms of its size. Any attempt
to rank the HCSE-based F-tests in terms of size distortion makes little sense because
they all exhibit serious discrepancies from the nominal size.
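The actual type I error reported here is, in essence, an empirical rejection frequency. As a minimal sketch of how such a size is estimated by simulation (an illustrative design, not the paper's exact one: the sample size, number of regressors, replication count and t(4) errors are all assumptions), consider:

```python
import numpy as np
from scipy import stats

def empirical_size(n=50, k=3, reps=1000, alpha=0.05, seed=0):
    """Estimate the actual type I error of the OLS-based F-test of
    H0: all slope coefficients are zero, when H0 is in fact true."""
    rng = np.random.default_rng(seed)
    crit = stats.f.ppf(1 - alpha, k, n - k - 1)   # nominal critical value
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        y = 1.0 + rng.standard_t(df=4, size=n)    # leptokurtic errors, H0 true
        Xc = np.column_stack([np.ones(n), X])
        beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
        rss1 = ((y - Xc @ beta) ** 2).sum()       # unrestricted RSS
        rss0 = ((y - y.mean()) ** 2).sum()        # restricted RSS (intercept only)
        F = ((rss0 - rss1) / k) / (rss1 / (n - k - 1))
        rejections += F > crit
    return rejections / reps
```

A rejection frequency well above the nominal α is precisely the size distortion reported in figure 1.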
The power properties of the HCSE-based tests.
The general conclusion is that the power curves of the HCSE-based tests (i)-(iv)
are dominated by the F-MLE test by serious margins for all scenarios with
ν=4, indicating that the degree of leptokurtosis has serious effects not only on the
size but also on the power of HCSE-based tests; see figures 2-3, 6-7, 10-11. The power
domination of the F-MLE test becomes less pronounced, but remains significant, for
scenarios with ν=8 as n increases; see figures 4-5, 8-9 and 12-13.
Figure 2 depicts the power functions of the HCSE-based tests (i)-(iv) for n=50
and ν=4 and different discrepancies from the null. The plot indicates serious power
distortions for all tests in (i)-(iv). These distortions are brought out more clearly in
the size-corrected power given in figure 3. For instance, for a .75 discrepancy
from the null the HCSE-HW has power .47 while the F-MLE has power equal to .94!
The best of the HCSE-based tests has power less than .74; that is a huge loss of
power for n=50.
Fig. 2: Power of F-HCSE-based tests for n=50, ν=4
Fig. 3: Size-adjusted power of F-HCSE-based tests for n=50, ν=4
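Size-adjusted (size-corrected) power of the kind plotted in these figures replaces the nominal critical value with the empirical (1−α) quantile of the test statistic simulated under the null. A sketch under an illustrative design (the regressors, the t(4) errors, and the discrepancy parameter `delta` are assumptions, not the paper's):

```python
import numpy as np

def f_stat(X, y):
    """F statistic for H0: all slope coefficients are zero."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    rss1 = ((y - Xc @ beta) ** 2).sum()
    rss0 = ((y - y.mean()) ** 2).sum()
    return ((rss0 - rss1) / k) / (rss1 / (n - k - 1))

def size_adjusted_power(delta, n=50, k=3, reps=1000, alpha=0.05, seed=1):
    """Power against a slope discrepancy `delta`, using the empirical
    null quantile as the critical value (size correction)."""
    rng = np.random.default_rng(seed)
    null_F, alt_F = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        u = rng.standard_t(df=4, size=n)
        null_F.append(f_stat(X, 1.0 + u))                    # H0 true
        alt_F.append(f_stat(X, 1.0 + delta * X[:, 0] + u))   # H0 false
    c = np.quantile(null_F, 1 - alpha)   # empirical critical value
    return np.mean(np.array(alt_F) > c)
```

By construction the size-adjusted power at delta=0 is close to α, which makes power comparisons across tests with different size distortions meaningful.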
Fig. 4: Power of F-HCSE-based tests for n=50, ν=8
Fig. 5: Size-adjusted power of F-HCSE-based tests for n=50, ν=8
What is worth noting in the case (n=50, ν=8) is that the robust versions of the
covariance matrix are worse than OLS with the traditional covariance matrix
in terms of the error-reliability of the F-type test. This calls into question any claims
that the use of the HCSE is always judicious and carries no penalty. For instance,
for a .5 discrepancy from the null the best of these tests has power .63 but the F-MLE
test has power .96.
Fig. 6: Power of F-HCSE-based tests for n=100, ν=4
Fig. 7: Size-adjusted power of F-HCSE-based tests for n=100, ν=4
In the case (n=100, ν=4) there is a significant loss of power associated with the
HCSE-based F-tests: once again the robust versions of the covariance matrix are worse
than OLS with the traditional covariance matrix in terms of the error-reliability
of the F-type test, calling into question any claims that the use of the HCSE is
always judicious and carries no penalty.
Fig. 8: Power of F-HCSE-based tests for n=100, ν=8
Fig. 9: Size-adjusted power of F-HCSE-based tests for n=100, ν=8
The HCSE-based tests show smaller power distortions in the scenario (n=100, ν=8)
than in the previous scenarios, but their size distortions render them unreliable enough
to be of questionable value in practice.
Fig. 10: Power of F-HCSE-based tests for n=200, ν=4
Fig. 11: Size-adjusted power of F-HCSE-based tests for n=200, ν=4
Despite the increase in the sample size in the case (n=200, ν=4), there are signif-
icant power distortions associated with the HCSE-based F-tests, stemming primarily
from the leptokurtosis of the underlying distribution. This calls into question the
pertinence of the recommendation to ignore departures from Normality.
Fig. 12: Power of F-HCSE-based tests for n=200, ν=8
Fig. 13: Size-adjusted power of F-HCSE-based tests for n=200, ν=8
The scenario (n=200, ν=8) represents the best case in terms of the power distortions
associated with the HCSE-based F-tests, but even in this case the size
distortions are significant enough (at least double the nominal size) to render the
reliability of these tests questionable.
5 Monte Carlo Experiment 2: HACSE
5.1 Traditional perspective
This is the case where the error term assumptions (3)-(4), E(uu⊤ | X)=σ²Iₙ, are
relaxed together by replacing this assumption with:

(3)*-(4)*: E(uu⊤ | X)=V,  (19)

where V:=[σₜₛ], t,s=1,...,n, σₜₛ=Cov(uₜ, uₛ | X). In this case the relevant Gₙ=(1/n)X⊤VX is:

Gₙ = (1/n)Σ_{t=1}^{n} E(uₜ² xₜxₜ⊤) + (1/n)Σ_{j=1}^{n−1} Σ_{t=j+1}^{n} E(uₜuₜ₋ⱼ(xₜ₋ⱼxₜ⊤ + xₜxₜ₋ⱼ⊤))  (20)
When only the assumption of homoskedasticity (3) is relaxed, only the first term is
relevant, but when both are relaxed the other term represents the temporal depen-
dence that stems from allowing for error autocorrelation.
An obvious extension of the White (1980) estimator

Γ̂(0) = (1/n)Σ_{t=1}^{n} ûₜ² xₜxₜ⊤

of G=X⊤VX to the case where there are departures from both homoskedasticity
and autocorrelation is to use the additional term in (20) that involves cross-products
of the residuals with their lags:

Γ̂(j) = (1/n)Σ_{t=j+1}^{n} ûₜûₜ₋ⱼ xₜxₜ₋ⱼ⊤,  j=1, 2, ..., m.

Hansen (1992) extended the White estimator to derive an estimator of G=X⊤VX:

Hansen-White: Ĝ = Γ̂(0) + Σ_{j=1}^{m} [Γ̂(j) + Γ̂⊤(j)],  m > 0.

Newey and West (1987) used a weighted sum (based on the Bartlett kernel) of these
matrices to estimate G:

Newey-West: Ĝ = Γ̂(0) + Σ_{j=1}^{m} (1 − j/(m+1))[Γ̂(j) + Γ̂⊤(j)],  m > 0.

Andrews (1991) proposed different weights wⱼ, j=1, 2, ..., m:

Andrews: Ĝ = Γ̂(0) + Σ_{j=1}^{m} wⱼ[Γ̂(j) + Γ̂⊤(j)].

Using any one of these estimators Ĝ of G yields a Heteroskedasticity-Autocorrelation
Consistent (HAC) estimator of the covariance of β̂:

Cov̂(β̂) = (1/n)[(1/n)X⊤X]⁻¹ Ĝ [(1/n)X⊤X]⁻¹.
5.2 A broader probabilistic perspective for HACSE
How does one place the above departures from the error assumptions (3)-(4), per-
taining to homoskedasticity and non-autocorrelation, into a broader but coherent per-
spective? The above estimators of G=X⊤VX make it clear that the temporal
dependence in data Z₀ is modeled via the error term by indirectly imposing a certain
probabilistic structure on the latter, such as p-correlation (MA(p)) or some form of
asymptotic non-correlation; see White (1999), pp. 147-164.
Is β̂ unbiased and consistent under (4)*? Viewing the testing procedures
using the HAC standard errors from the broader perspective of the LR model, based
on the regression and skedastic functions in (11), what is most surprising is that the
original regression function is retained in its static form but the skedastic function is
extended to include some form of error autocorrelation. How could that be justified?
The traditional argument for retaining the static form in the presence of error
autocorrelation is that the OLS estimator β̂=(X⊤X)⁻¹X⊤y retains its unbiasedness
and consistency. This claim, however, depends crucially on the validity of the common
factor restrictions in (7). When these restrictions are invalid for data Z₀, β̂ is both
biased and inconsistent, which will give rise to seriously unreliable inferences; see
Spanos (1986). To bring out how restrictive the traditional strategy of adopting the
HACSE robustification is, we will consider four different scenarios for the simulations
that follow.
Scenario 1: no common factor restrictions are imposed and the modeler
estimates the Dynamic Linear Regression (DLR) model.
Scenario 2: no common factor restrictions are imposed and the modeler
estimates the Linear Regression (LR) model, ignoring the dynamics in the
autoregressive function.
Scenario 3: common factor restrictions are imposed and the modeler
estimates the Dynamic Linear Regression (DLR) model.
Scenario 4: common factor restrictions are imposed and the modeler
estimates the Linear Regression (LR) model.
The broader probabilistic perspective suggests that both the regression and skedas-
tic functions are likely to be affected by the presence of any form of temporal de-
pendence in data Z₀. Hence, the natural way to account for such dependence is to
respecify the original model by expanding the relevant conditioning information set
to include the past history of the observable process {Zₜ:=(yₜ, Xₜ⊤)⊤, t ∈ N}:

E(yₜ | Xₜ, Zₜ₋₁) = h(Xₜ, Zₜ₋₁),  Var(yₜ | Xₜ, Zₜ₋₁) = g(xₜ, zₜ₋₁).  (21)

That is, any form of temporal dependence in Z₀ should be modeled directly in terms
of the stochastic process {Zₜ, t ∈ N}. This gives rise to an extension of the Student's t
Linear/Heteroskedastic model that includes lags of this process. The Student's t
Dynamic Linear Regression (DLR) model (table 5) can be viewed as a particular
parameterization of the vector ((m+1)×1) stochastic process {Zₜ, t ∈ N}, assumed
to be Student's t with ν degrees of freedom (df), Markov and stationary. The
probabilistic reduction takes the form:

D(Z₁, ..., Zₙ | Z₀; φ) = ∏_{t=1}^{n} D(Zₜ | Zₜ₋₁; ϕ) = ∏_{t=1}^{n} D(yₜ | Xₜ, Zₜ₋₁; ψ₁)·D(Xₜ | Zₜ₋₁; ψ₂),

with D(yₜ | Xₜ, Zₜ₋₁; ψ₁) underlying the specification of the St-DLR model in table 5.
Note that it is trivial to increase the number of lags to ℓ ≥ 1 by replacing Markovness
with Markovness of order ℓ.
Table 5: Student's t, Dynamic Linear/Heteroskedastic model
Statistical GM: yₜ = β₀ + β₁⊤Xₜ + β₂⊤Zₜ₋₁ + uₜ, t ∈ N,
[1] Student's t: D(Z₁, ..., Zₙ; ϕ) is Student's t with ν df,
[2] Linearity: E(yₜ | σ(Xₜ, Zₜ₋₁)) = β₀ + β₁⊤Xₜ + β₂⊤Zₜ₋₁,
[3] Heteroskedasticity: Var(yₜ | σ(Xₜ, Zₜ₋₁)) = (νσ₀²/(ν+m*−2))(1 + q²₂ₜ), m* = 2m+1,
    q²₂ₜ = (1/ν)[(Xₜ−μ₂)⊤Q₁(Xₜ−μ₂) + (Zₜ₋₁−μ)⊤Q₂(Zₜ₋₁−μ)],
[4] Markov: {Zₜ, t ∈ N} is a Markov process,
[5] t-invariance: θ:=(β₀, β₁, β₂, σ₀², μ, Q₁, Q₂) are t-invariant.

The estimation of the above model is based on the same log-likelihood function (see
Appendix) as the model in table 2, but now the regressor vector is Xₜ* := (Xₜ⊤, Zₜ₋₁⊤)⊤,
whose number of elements is m* = 2m + 1.
5.3 Scenario 1 for HACSE
For this case, no common factor restrictions are imposed and the modeler accounts for
the temporal structure in the regression function by estimating the Dynamic Linear
Regression (DLR) model.
This design is based on the same mean and covariance matrix for both the
Normal and the Student's t distributions:

(yₜ, X₁ₜ, X₂ₜ, yₜ₋₁, X₁,ₜ₋₁, X₂,ₜ₋₁)⊤ ~ ES(μ, Σ; ν),  μ = (3, 18, 8, 3, 18, 8)⊤,

Σ =
[ 1.20   0.70  −0.40   0.80   0.50  −0.38 ]
[ 0.70   1.00   0.20   0.50   0.70   0.10 ]
[−0.40   0.20   1.00  −0.38   0.10   0.75 ]
[ 0.80   0.50  −0.38   1.20   0.70  −0.40 ]
[ 0.50   0.70   0.10   0.70   1.00   0.20 ]
[−0.38   0.10   0.75  −0.40   0.20   1.00 ]   (22)

that gives rise to the true model parameters:

β₀ = 1.159,  β₁⊤ = (.803, −.453, .424, −.336, .116),  σ₀² = .330.

In the case of Normality, this gives rise to the following Dynamic Linear Regression
model (DLR(1)):

yₜ = 1.159 + .803X₁ₜ − .453X₂ₜ + .424yₜ₋₁ − .336X₁,ₜ₋₁ + .116X₂,ₜ₋₁ + uₜ, t ∈ N.  (23)
The initial conditions are: σ₀²=.33, y₀ ~ N(3, 1.2), X₁₀ ~ N(18, 1), and X₂₀ ~ N(8, 1).
In the case of a Student's t distribution with ν degrees of freedom, (23) remains
the same but the conditional variance hₜ := Var(yₜ | σ(Xₜ, Zₜ₋₁)) takes the form:

hₜ = (νσ₀²/(ν+2m−1))(1 + (1/ν) wₜ⊤ Q wₜ),
wₜ := (X₁ₜ−18, X₂ₜ−8, yₜ₋₁−3, X₁,ₜ₋₁−18, X₂,ₜ₋₁−8)⊤,

Q =
[ 2.23   −.822  −.011  −1.61    .712 ]
[−.822   2.623   .229    .533  −1.90 ]
[−.011    .229  2.48   −1.20   1.22  ]
[−1.61    .533  −1.20   3.83  −1.81  ]
[  .712  −1.90   1.22  −1.81   3.20  ]
The simulations aim to investigate the discrepancies between the actual (empir-
ical) error probabilities and the nominal ones (size and power) associated with the
different estimators of Cov(β̂) based on G=X⊤VX (Ĝ_HW, Ĝ_NW, Ĝ_A, with truncation lag m=3) under:
(a) varying values of the degrees of freedom parameter ν: ν=4, ν=8,
(b) varying the sample size: n=50, n=100, n=200.
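Generating the Student's t samples required by this design can be done via the standard normal/chi-square mixture representation of the multivariate t. A minimal sketch (the 3×3 block shown is the contemporaneous block of the design covariance, used purely as an illustration; the function name is ours):

```python
import numpy as np

def rmvt(rng, mu, Sigma, nu, size):
    """Draw `size` vectors from a multivariate Student's t with location mu,
    scale Sigma and nu df (so Cov = nu/(nu-2) * Sigma), using the
    normal / chi-square mixture representation."""
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=size)
    g = rng.chisquare(nu, size=size) / nu          # mixing variable
    return mu + z / np.sqrt(g)[:, None]

# Contemporaneous block of the design covariance, for illustration:
mu = np.array([3.0, 18.0, 8.0])
Sigma = np.array([[1.20, 0.70, -0.40],
                  [0.70, 1.00, 0.20],
                  [-0.40, 0.20, 1.00]])
```

Each simulated sample drawn this way has the elliptical Student's t structure underlying the DLR design, with the conditional variance hₜ then following the formula above.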
Actual type I error probability (size) of HACSE-based tests.
Figure 14 summarizes the size properties of the following forms of the F-test:
(i) OLS: β̂ in conjunction with s²(X⊤X)⁻¹,
(ii) HACSE-HW: β̂ in conjunction with the Hansen-White estimator of G=X⊤VX,
(iii) HACSE-NW: β̂ in conjunction with the Newey-West estimator of G=X⊤VX,
(iv) HACSE-A: β̂ in conjunction with the Andrews estimator of G=X⊤VX,
(v) F-MLE: the F-test based on the MLEs of both β and its covariance matrix for
the Dynamic Student's t Linear Regression model in table 5.
It is important to emphasize at the outset that the scenario used for the sim-
ulations that follow includes estimating the DLR(1) in (23), because ignoring the
dynamics and just estimating (β₀ = μ₁ − β₁⊤μ₂, β₁ = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₂₁⊤Σ₂₂⁻¹σ₂₁) will
give rise to inconsistent estimators, and thus practically useless F-tests with or with-
out the robustification; see Spanos and McGuirk (2001).
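The inconsistency caused by ignoring the dynamics can be illustrated with a small simulation; the DLR(1) parameter values below are hypothetical (chosen so that the common factor restrictions are violated), not those of (23):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Hypothetical DLR(1): y_t = 0.5 x_t + 0.6 y_{t-1} + 0.3 x_{t-1} + eps_t,
# with an autocorrelated regressor; the common factor restriction
# (coefficient on x_{t-1} equal to -0.6 * 0.5) is deliberately violated.
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    y[t] = 0.5 * x[t] + 0.6 * y[t - 1] + 0.3 * x[t - 1] + rng.normal()

# Static LR, ignoring the dynamics:
Xs = np.column_stack([np.ones(n - 1), x[1:]])
b_static = np.linalg.lstsq(Xs, y[1:], rcond=None)[0][1]
# DLR(1), including y_{t-1} and x_{t-1}:
Xd = np.column_stack([np.ones(n - 1), x[1:], y[:-1], x[:-1]])
b_dlr = np.linalg.lstsq(Xd, y[1:], rcond=None)[0][1]
# b_dlr recovers the true slope 0.5; b_static converges elsewhere.
```

No amount of HAC robustification of the standard errors repairs the static estimate, since the point estimator itself is inconsistent.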
The general impression conveyed by fig. 14 is that all the HACSE-robustified
F-type tests exhibit even more serious discrepancies between the actual and nominal
type I error of .05 than those in figure 1. The actual type I error is considerably
greater than the nominal for ν=4 and all sample sizes from n=50 to n=200, ranging
from .15 to .61. The only case where the size distortions are somewhat smaller is
n=200 and ν=8, but even then the discrepancies will lead to seriously unreliable
inferences. In contrast, the F-MLE has excellent size properties for all values of n
and ν. What is also noticeable is that the test in (i) is not uniformly worse than the other
HACSE-based tests in terms of its size. Indeed, for ν=8 and n=50, n=100 the OLS-
based F-test has smaller size distortions than the robustified alternative F-tests. Any
attempt to rank the HACSE-based F-tests in terms of size distortion makes little
sense because they all exhibit serious discrepancies from the nominal size.
Fig. 14: Size of F-HACSE-based tests for different (n, ν)
The power properties of the HACSE-based tests.
Figure 15 shows the power functions of all the HACSE-based F-tests for n=50
and ν=4 and different discrepancies from the null. The plot indicates even more
serious power distortions for all the HACSE-based F-tests than those in figure 2.
These distortions are brought out more clearly in the size-corrected power given in
figure 16. For instance, for a .75 discrepancy from the null the HACSE-HW
has power .44 while the F-MLE has power equal to .99! Worse, the OLS-based F-test
does better in terms of power than any of the HACSE-based F-tests. Indeed, the
HACSE-HW is dominated by all the other tests in terms of power.
The power curves for all the HACSE-based F-tests are dominated by the
F-MLE test by serious margins for all scenarios with ν=4, indicating that the degree
of leptokurtosis has serious effects not only on the size but also on the power of HACSE-
based tests; see figures 19-20, 23-24. For scenarios with ν=8, as n increases the
power domination of the F-MLE test becomes less pronounced, but remains significant;
see figures 17-18, 21-22 and 25-26.
Fig. 15: Power of F-HACSE-based tests for n=50, ν=4
Fig. 16: Size-adjusted power of F-HACSE-based tests for n=50, ν=4
Fig. 17: Power of F-HACSE-based tests for n=50, ν=8
Fig. 18: Size-adjusted power of F-HACSE-based tests for n=50, ν=8
Despite the increase in the degrees of freedom parameter to ν=8, all the HACSE-
based F-tests for n=50 suffer from serious power distortions, with the OLS-based
F-test dominating them in terms of both size and power; see figures 17-18.
Fig. 19: Power of F-HACSE-based tests for n=100, ν=4
Fig. 20: Size-adjusted power of F-HACSE-based tests for n=100, ν=4
The serious distortions in both size and power observed above continue to exist
in the scenario (n=100, ν=4), suggesting that the leptokurtosis has a serious effect
on both; see figures 19-20. This particular departure from Normality can be ignored
only at the expense of the reliability of inference.
Fig. 21: Power of F-HACSE-based tests for n=100, ν=8
Fig. 22: Size-adjusted power of F-HACSE-based tests for n=100, ν=8
When the degrees of freedom increase to ν=8, the distortions in size and power
are moderated, but they are still serious enough to call into question the reliability of any
inference based on the HACSE-based F-tests; see figures 21-22.
Fig. 23: Power of F-HACSE-based tests for n=200, ν=4
Fig. 24: Size-adjusted power of F-HACSE-based tests for n=200, ν=4
The scenario (n=200, ν=4) brings out the adverse effects of leptokurtosis on both
the size and the power of the HACSE-based F-tests, despite the large sample size.
Fig. 25: Power of F-HACSE-based tests for n=200, ν=8
Fig. 26: Size-adjusted power of F-HACSE-based tests for n=200, ν=8
5.4 Scenario 2 for HACSE
No common factor restrictions are imposed and the modeler estimates the Linear
Regression (LR) model, ignoring the dynamics in the autoregressive function, i.e.
estimation and the F-test focus exclusively on the static LR model.
5.5 Scenario 3 for HACSE
For this case, common factor restrictions are imposed and the modeler takes into
account the temporal structure by estimating the Dynamic Linear Regression (DLR)
model.
As first shown by Sargan (1964), modeling the Markov dependence of the process
{Zₜ:=(yₜ, Xₜ⊤)⊤, t ∈ N} using an AR(1) error term:

yₜ = β₀ + β₁⊤xₜ + uₜ,  uₜ = ρuₜ₋₁ + εₜ,

constitutes a special case of the DLR model in (6) subject to the common factor restric-
tions α₃ + α₁α₂ = 0. It can be shown (McGuirk and Spanos, 2009) that these parameter
restrictions imply a highly unappetizing temporal structure for {Zₜ, t ∈ N}.
The common factor restrictions transform the unrestricted Σ into Σ*:

Cov(Zₜ, Zₜ₋₁) = Σ =
[ σ₁₁(0)   σ₂₁⊤(0)   σ₁₁(1)   σ₂₁⊤(1) ]
[ σ₂₁(0)   Σ₂₂(0)    σ₂₁(1)   Σ₂₂⊤(1) ]
[ σ₁₁(1)   σ₂₁⊤(1)   σ₁₁(0)   σ₂₁⊤(0) ]
[ σ₂₁(1)   Σ₂₂(1)    σ₂₁(0)   Σ₂₂(0)  ]

Σ* =
[ β⊤Σ₂₂(0)β + σ²/(1−ρ²)    β⊤Σ₂₂(0)   β⊤Σ₂₂(1)β + ρσ²/(1−ρ²)   β⊤Σ₂₂(1) ]
[ Σ₂₂(0)β                  Σ₂₂(0)     Σ₂₂(1)⊤β                 Σ₂₂(1)   ]
[ β⊤Σ₂₂(1)β + ρσ²/(1−ρ²)   Σ₂₂(1)⊤β   β⊤Σ₂₂(0)β + σ²/(1−ρ²)    β⊤Σ₂₂(0) ]
[ Σ₂₂(1)β                  Σ₂₂(1)     Σ₂₂(0)β                  Σ₂₂(0)   ]

The nature of the common factor restrictions can be best appreciated in the context
of a VAR(1) model:

Zₜ = A⊤Zₜ₋₁ + Eₜ,  Eₜ ~ NIID(0, Ω),  t ∈ N,  (24)

entailed by the restricted Σ*, which takes the form:

A⊤ = [ ρ   ((D − ρI)β)⊤ ]      Ω = [ σ² + β⊤Λβ   β⊤Λ ]
     [ 0        D       ],         [ Λβ          Λ   ],

D = Σ₂₂(0)⁻¹Σ₂₂(1),  Λ = Σ₂₂(0) − Σ₂₂(1)⊤Σ₂₂(0)⁻¹Σ₂₂(1).  (25)
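The common factor structure can be verified by simulation: generate data from the AR(1)-error model and estimate the unrestricted DLR(1); the estimated coefficients should then satisfy a₃ + a₂a₁ ≈ 0. The parameter values below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
beta0, beta1, rho = 1.0, 0.8, 0.5
x = rng.normal(size=n)                 # static regressor, iid for simplicity
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
y = beta0 + beta1 * x + u              # LR with AR(1) error

# Unrestricted DLR(1): y_t on (1, x_t, y_{t-1}, x_{t-1})
Xd = np.column_stack([np.ones(n - 1), x[1:], y[:-1], x[:-1]])
a0, a1, a2, a3 = np.linalg.lstsq(Xd, y[1:], rcond=None)[0]
# Implied by the common factor restrictions:
# a2 ~ rho, a1 ~ beta1, a3 ~ -rho*beta1, i.e. a3 + a2*a1 ~ 0.
```

This is the sense in which the AR(1)-error model is a restricted DLR(1): it constrains the coefficient on the lagged regressor to be the (negated) product of the other two.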
The question that naturally arises at this stage is whether the error-reliability
of the F-test will improve by imposing such a highly restrictive temporal structure
on the process {Zₜ, t ∈ N}. To ensure the validity of these restrictions the
variance-covariance matrix of scenario 1 needs to be modified slightly to:

(yₜ, X₁ₜ, X₂ₜ, yₜ₋₁, X₁,ₜ₋₁, X₂,ₜ₋₁)⊤ ~ ES(μ, Σ; ν),  μ = (3, 18, 8, 3, 18, 8)⊤,

Σ =
[ 1.20      .700    −.400     .818     .5125  −.34063 ]
[  .700    1.00      .200     .5125    .700     .100  ]
[ −.400     .200    1.00     −.34063   .100     .750  ]
[  .818     .5125  −.34063   1.289     .700    −.400  ]
[  .5125    .700     .100     .700    1.00      .200  ]
[ −.34063   .100     .750    −.400     .200    1.00   ]   (26)
This gives rise to true model parameters that do satisfy the common factor
restrictions:

β₀ = 1.1448,  β₁⊤ = (.81249, −.56248, .424, −.3445, .23848),  σ₀² = .317.

In the case of Normality, this gives rise to the following Dynamic Linear Regression
model (DLR(1)):

yₜ = 1.145 + .812X₁ₜ − .562X₂ₜ + .424yₜ₋₁ − .345X₁,ₜ₋₁ + .238X₂,ₜ₋₁ + uₜ, t ∈ N.  (27)
5.6 Scenario 4 for HACSE
Common factor restrictions are imposed and the modeler estimates the Linear Regres-
sion (LR) model, ignoring the dynamics in the autoregressive function, i.e. estimation
and the F-test focus exclusively on the static LR model.
6 Summary and conclusions
The primary aim of the paper has been to investigate the error-reliability of HCSE/HACSE-
based tests using Monte Carlo simulations. For the design of the appropriate simula-
tion experiments it was important to view the departures from the homoskedasticity
and autocorrelation assumptions from the broader perspective of regression models
based on the first two conditional moments of the same distribution. It was argued
that viewing such models from the error term perspective provides a narrow and often
misleading view of these departures and the ways they can be accounted for.
The simulation results call into question the conventional wisdom of viewing
HCSE/HACSE as robustified versions of the OLS covariances. Robustness in this
case can only be evaluated in terms of how closely the actual error probabilities
approximate the nominal ones. In terms of the latter, it is shown that the various
HCSE and HACSE-based F-tests give rise to major size and power distortions. These
distortions are particularly pernicious in the presence of leptokurtosis. Although
further Monte Carlo simulations will be needed to get a more general picture
of the error-reliability of these tests, the results in this paper call into question their
widespread use in practice because they do not, in general, give rise to reliable test-
ing results. Hence, the recommendation to ignore departures from Normality and
account for departures from Homoskedasticity/Autocorrelation using HCSE/HACSE
is highly misleading.
In conclusion, it is important to emphasize that the above simulation results are
based on the best case scenario for these HCSE and HACSE-based F-tests where the
probabilistic assumptions of [2] linearity, [5] t-invariance and [1] bell-shape symmetry
of the underlying conditional distribution are retained. When any of these assump-
tions are invalid for the particular data, the discrepancy between actual and nominal
error probabilities for the HCSE and HACSE-based F-tests is likely to be much worse
than the above results indicate.
References
[1] Ali, M.M. and S.C. Sharma (1996), “Robustness to nonnormality of regression
F-tests,” Journal of Econometrics, 71: 175—205.
[2] Andrews, D.W.K. (1991), “Heteroskedasticity and Autocorrelation Consistent
Covariance matrix estimation”, Econometrica, 59: 817-854.
[3] Bahadur, R.R. and L.J. Savage (1956), “The Nonexistence of Certain Statistical
Procedures in Nonparametric Problems,” The Annals of Mathematical Statistics,
27: 1115-1122.
[4] Chu, K.-C. (1973), “Estimation and decision for linear systems with elliptical
random processes”, IEEE Transactions on Automatic Control, 18: 499-505.
[5] Engle, R.F., D.F. Hendry and J.F. Richard (1983), “Exogeneity”, Econometrica,
51: 277-304.
[6] Fang, K.-T., S. Kotz and K-W. Ng. (1990), Symmetric Multivariate and Related
Distributions, Chapman and Hall, London.
[7] Hansen, B.E. (1992), “Consistent covariance matrix estimation for dependent
heterogeneous processes”, Econometrica, 60: 967-972.
[8] Hansen, B.E. (1999), “Discussion of ‘Data mining reconsidered’”, The Econo-
metrics Journal, 2: 192-201.
[9] Johnson, M.E. (1987), Multivariate Statistical Simulation, Wiley, NY.
[10] Kelker, D. (1970), “Distribution theory of spherical distributions and a location-
scale parameter”, Sankhya A, 32: 419-430.
[11] Kiefer, N.M. and T.J. Vogelsang (2005), “A new asymptotic theory for
heteroskedasticity-autocorrelation robust tests”, Econometric Theory, 21: 1130-
1164.
[12] Mayo, D.G. and A. Spanos. (2006), “Severe Testing as a Basic Concept in a
Neyman-Pearson Philosophy of Induction,” The British Journal for the Philos-
ophy of Science, 57: 323-357.
[13] McGuirk, A. and A. Spanos (2009), “Revisiting Error Autocorrelation Correc-
tion: Common Factor Restrictions and Granger Non-Causality,” Oxford Bulletin
of Economics and Statistics, 71: 273-294.
[14] Newey, W.K. and K.D. West (1987), “A simple, positive semi-definite, het-
eroskedasticity and autocorrelation consistent covariance matrix”, Econometrica,
55: 703-708.
[15] Nimmo-Smith, I. (1979), “Linear regressions and sphericity”, Biometrika, 66:
390-392.
[16] Phillips, P.C.B. (2005a), “Automated Discovery in Econometrics”, Econometric
Theory, 21: 3-20.
[17] Phillips, P.C.B. (2005b), “Automated Inference and the Future of Econometrics”,
Econometric Theory, 21: 116-142.
[18] Robinson, P. (1998), “Inference without smoothing in the presence of nonpara-
metric autocorrelation”, Econometrica, 66: 1163-1182.
[19] Sargan, J.D. (1964), “Wages and Prices in the U.K.: A Study in Econometric
Methodology”, in P. Hart, G. Mills and J.K. Whitaker (eds.), Econometric
Analysis for National Economic Planning, vol. 16 of Colston Papers,
Butterworths, London, 25-54.
[20] Spanos, A., (1986), Statistical Foundations of Econometric Modelling, Cam-
bridge University Press, Cambridge.
[21] Spanos, A. (1999), Probability Theory and Statistical Inference: econometric
modeling with observational data, Cambridge University Press, Cambridge.
[22] Spanos, A. (1994), “On modeling heteroskedasticity: the Student’s t and Ellip-
tical linear regression models,” Econometric Theory, 10: 386-415.
[23] Spanos, A. (1995), “On Normality and the Linear Regression Model”, Econo-
metric Reviews, 14: 195-203.
[24] Spanos, A. (2006), “Revisiting the Omitted Variables Argument: Substantive
vs. Statistical Adequacy,” Journal of Economic Methodology, 13: 179—218.
[25] White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estima-
tor and Direct Test for Heteroskedasticity”, Econometrica, 48: 817-838.
[26] White, H. (1999), Asymptotic Theory for Econometricians, revised edition, Aca-
demic Press, London.
7 Appendix: Student's t log-likelihood function
Let Z:=(Z₁, Z₂, ..., Zₘ₊₁)⊤ be an (m+1)-dimensional vector. Z is said to have an
(m+1)-variate Student's t distribution with degrees of freedom (df) ν, location vector μ
and scaling matrix Σ, denoted by:

Z ~ St(μ, Σ; ν),

when for θ=(μ, Σ), E(Z)=μ, Cov(Z)=(ν/(ν−2))Σ, the joint probability density func-
tion is:

f(z; θ) = [Γ((ν+m+1)/2) / ((νπ)^((m+1)/2) Γ(ν/2) |Σ|^(1/2))] [1 + (1/ν)(z−μ)⊤Σ⁻¹(z−μ)]^(−(ν+m+1)/2).  (28)
For Zₜ := (yₜ, Xₜ⊤)⊤, the reduction in (12) yields:

f(yₜ|xₜ; θ₁) = [Γ((ν+m+1)/2) hₜ^(−1/2) / ((νπ)^(1/2) Γ((ν+m)/2))] [1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ)]^(−(ν+m+1)/2),

hₜ = σ²(1 + (Xₜ−μ₂)⊤Q(Xₜ−μ₂)),  Q = (1/ν)Σ₂₂⁻¹,

f(xₜ; θ₂) = [Γ((ν+m)/2)|Q|^(1/2) / (π^(m/2) Γ(ν/2))] [1 + (xₜ−μ₂)⊤Q(xₜ−μ₂)]^(−(ν+m)/2).

In light of the fact that the parameters of the conditional and marginal densities, θ₁
and θ₂ respectively, are not variation free (see Spanos, 1994), the relevant likelihood
for the estimation of θ₁ is the product of the two densities:

L(θ₁, θ₂; z) ∝ ∏_{t=1}^{n} [hₜ^(−1/2) (1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ))^(−(ν+m+1)/2)] [|Q|^(1/2) (1 + (xₜ−μ₂)⊤Q(xₜ−μ₂))^(−(ν+m)/2)].
Hence, the log-likelihood function takes the form:

ln L(θ₁, θ₂; z) ∝ −(1/2)Σ_{t=1}^{n} ln hₜ − ((ν+m+1)/2)Σ_{t=1}^{n} ln[1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ)] +
+ (n/2)ln(det(Q)) − ((ν+m)/2)Σ_{t=1}^{n} ln[1 + (xₜ−μ₂)⊤Q(xₜ−μ₂)].

Maximization of ln L(θ₁, θ₂; z) with respect to θ₁ does not yield closed-form solutions
for these parameters, but one can express the MLE of β⊤:=(β₀, β₁⊤) in a Generalized
Least Squares form:

β̂ = (X⊤Ω̂⁻¹X)⁻¹X⊤Ω̂⁻¹y,

where Ω̂ takes the form Ω̂ = diag(ĥ₁, ĥ₂, ..., ĥₙ), with

ĥₜ = (νσ̂²/(ν+m−2))(1 + (1/ν)(Xₜ−μ̂₂)⊤Σ̂₂₂⁻¹(Xₜ−μ̂₂)),  t=1, ..., n,

where μ̂₂ and Σ̂₂₂ denote the MLEs of μ₂ and Σ₂₂; see Spanos (1994).
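The GLS form above can be sketched as a one-step feasible GLS. In the sketch below (an illustration, not the paper's estimator) moment estimators stand in for the MLEs of μ₂ and Σ₂₂, ν is treated as known, and the σ̂² factor is omitted from the weights since it cancels in the GLS formula:

```python
import numpy as np

def student_t_gls(y, X, nu):
    """One-step feasible GLS form of the Student's t LR estimator:
    the weights h_t depend only on the regressors, via the marginal
    Student's t structure of X (moment estimators for mu_2, Sigma_22)."""
    n, m = X.shape
    mu2 = X.mean(axis=0)
    S22 = np.cov(X.T, bias=True) * (nu - 2) / nu   # scale matrix estimate
    d = X - mu2
    q = np.einsum('ti,ij,tj->t', d, np.linalg.inv(np.atleast_2d(S22)), d)
    h = 1.0 + q / nu                               # h_t up to a constant factor
    Xc = np.column_stack([np.ones(n), X])          # add the intercept
    XtW = Xc.T * (1.0 / h)                         # GLS weighting
    return np.linalg.solve(XtW @ Xc, XtW @ y)
```

Because the weights depend only on X, the estimator is a weighted least squares fit that downweights observations far from the center of the regressor distribution.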