Heteroskedasticity/Autocorrelation Consistent
Standard Errors and the Reliability of Inference
Aris Spanos
Department of Economics,
Virginia Tech, USA
James J. Reade
Department of Economics,
University of Reading, UK
April 2015 [First draft]
Abstract
The primary aim of the paper is to investigate the error-reliability of F-tests that use Heteroskedasticity-Consistent Standard Errors (HCSE) and Heteroskedasticity and Autocorrelation-Consistent Standard Errors (HACSE) using Monte Carlo simulations. For the design of the appropriate simulation experiments, a broader perspective on departures from the homoskedasticity and autocorrelation assumptions is proposed, to avoid an internally inconsistent set of probabilistic assumptions. Viewing regression models as based on the first two conditional moments of the same conditional distribution brings out the role of the other probabilistic assumptions, such as Normality and Linearity, and provides a more coherent framework. The simulation results under the best case scenario for these tests show that all the HCSE/HACSE-based tests exhibit major size and power distortions. The results seriously call into question their use in practice as ways to robustify the OLS-based F-test.
1 Introduction
The basic objective of the paper is to revisit certain aspects of the traditional strategies for dealing with departures from the homoskedasticity and autocorrelation assumptions in the context of the Linear Regression (LR) model. In particular, the paper aims to appraise the error-reliability of Heteroskedasticity-Consistent Standard Errors (HCSE) (White, 1980) and their extension to Heteroskedasticity and Autocorrelation-Consistent Standard Errors (HACSE); see Newey and West (1987), Andrews (1991), Hansen (1992), Robinson (1998), Kiefer et al (2005).
The conventional wisdom relating to the use of HCSE has been articulated aptly
by Hansen (1999) in the form of the following recommendation to practitioners:
“... omit the tests of normality and conditional heteroskedasticity, and replace all conventional standard errors and covariance matrices with heteroskedasticity-robust versions.” (p. 195)
The heteroskedasticity-robust versions of the conventional standard errors and covariance matrices refer to HCSE/HACSE as they pertain to testing hypotheses concerning the unknown regression coefficients.
The question raised by the above recommendation is how one should evaluate the
robustness of these procedures. It is argued that the proper way to do that is in
terms of the error-reliability of the inference procedures it gives rise to. In the case
of estimation one needs to consider the possibility that the robustness strategy gives
rise to non-optimal (e.g. inconsistent) estimators. In the case of testing, robustness
should be evaluated in terms of the discrepancy between the relevant actual and
nominal (assumed) error probabilities. As argued by Phillips (2005a):
“Although the generality that HAC estimation lends to inference is appealing, our enthusiasm for such procedures needs to be tempered by knowledge that finite sample performance can be very unsatisfactory. Distortions in test size and low power in testing are both very real problems that need to be acknowledged in empirical work and on which further theoretical work is needed.” (p. 12)
The key problem is that any form of statistical misspecification, including the presence of heteroskedasticity/autocorrelation, is likely to induce a discrepancy between the nominal and actual error probabilities of any test procedure, and when this discrepancy is sizeable, inferences are likely to be erroneous. The surest way to an unreliable inference is to use a .05 significance level test whose actual type I error is closer to .90. Such discrepancies can easily arise in practice even with what are often considered 'minor' departures from certain probabilistic assumptions; see Spanos and McGuirk (2001). Hence the need to evaluate such discrepancies associated with the robustified procedures using HCSE/HACSE, to ensure that they are reliable enough for inference purposes.
The primary aim of the discussion that follows is twofold. First, to place the traditional recommendation of ignoring non-Normality and using HCSE/HACSE for inference in conjunction with the OLS estimators in a broader perspective that sheds light on what is being assumed for the traditional account to hold and what possible errors might undermine it. Second, to evaluate the extent to which this strategy addresses the unreliability-of-inference problem raised by the presence of heteroskedasticity/autocorrelation. Section 2 revisits the robustification of inference procedures when dealing with departures from Normality and homoskedasticity, and calls into question some of the presumptions underlying the use of the HCSE for inference purposes. It is argued that specifying the probabilistic structure of the LR model in terms of the error term provides a very narrow perspective on the various facets of modeling and inference, including specification (specifying the original model), misspecification testing (evaluating the validity of the model's assumptions) and respecification (respecifying the original model when found wanting).
Section 3 presents a broader probabilistic perspective that views statistical models as parameterizations of the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\}$. When viewed in this broader context, the probabilistic assumptions specifying the LR model are shown to be interrelated, and thus relaxing one at a time can be misleading in practice. In addition, the traditional perspective on HCSE/HACSE is shown to involve several implicit assumptions that might be inappropriate in practice. For example, the use of the HACSE raises the possibility that the OLS estimator might be inconsistent when certain (implicit) restrictions imposed on the data are invalid. Section 4 uses Monte Carlo simulations to investigate the effectiveness of the HCSE strategies vis-a-vis the reliability of inference pertaining to linear restrictions on the regression coefficients. The simulation results are extended in section 5 to include the error-reliability of testing procedures that use the HACSE. The simulation results call into question the widespread use of the HCSE/HACSE-based tests because they are shown to have serious size and power distortions that would lead inferences astray in practice.
2 Revisiting non-Normality/Heteroskedasticity
2.1 The traditional perspective on Linear Regression
The traditional Linear Regression (LR) model (table 1) is specified in terms of the probabilistic assumptions (1)-(4) pertaining to the error term process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$, where $\mathbf{x}_t$ denotes the observed values of the regressors $\mathbf{X}_t$.

Table 1 - The Linear Regression (LR) Model

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad t\in\mathbb{N}:=(1,2,\ldots,n,\ldots)$

(1) Normality: $(u_t|X_t=\mathbf{x}_t)\sim \mathsf{N}(\cdot,\cdot)$
(2) Zero mean: $E(u_t|X_t=\mathbf{x}_t)=0$
(3) Homoskedasticity: $E(u_t^2|X_t=\mathbf{x}_t)=\sigma^2$
(4) No autocorrelation: $E(u_t u_s|X_t=\mathbf{x}_t)=0$ for $t\neq s$
For inference purposes the LR model is framed in terms of the data:

$\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\},\quad \mathbf{Z}_0:=(\mathbf{y}:\mathbf{X}_1),\quad \mathbf{y}:(n\times 1),\ \mathbf{X}_1:(n\times k),$

and expressed in matrix notation:

$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{u},$

where $\mathbf{X}:=(\mathbf{1}_n:\mathbf{X}_1)$ is an $(n\times[k+1])$ matrix, $\mathbf{1}_n$ is a column of 1's and $\boldsymbol{\beta}^{\top}:=\left(\beta_0:\boldsymbol{\beta}_1^{\top}\right)$.

To secure the data information adequacy, the above assumptions are supplemented with an additional assumption:

(5) No collinearity: $\mathrm{Rank}(\mathbf{X})=k+1$.

The cornerstone of the traditional LR model is the Gauss-Markov theorem for the 'optimality' of the OLS estimator:

$\widehat{\boldsymbol{\beta}}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$

as the Best Linear Unbiased Estimator (BLUE) of $\boldsymbol{\beta}$ under assumptions (2)-(5), i.e., $\widehat{\boldsymbol{\beta}}$ has the smallest variance (relative efficiency) within the class of linear and unbiased estimators.
The above recommendation to ignore any departures from assumptions (1) and (3) and use HCSE stems from four key presumptions.

The first presumption, stemming from the Gauss-Markov theorem, is that the Normality assumption does not play a key role in securing the optimality of the OLS estimator for inference purposes.

The second presumption is that all inferences based on the OLS estimators $\widehat{\boldsymbol{\theta}}:=(\widehat{\boldsymbol{\beta}},s^2)$, where $s^2=\frac{1}{n-k-1}\widehat{\mathbf{u}}^{\top}\widehat{\mathbf{u}}$, $\widehat{\mathbf{u}}=\mathbf{y}-\mathbf{X}\widehat{\boldsymbol{\beta}}$, of $\boldsymbol{\theta}:=(\boldsymbol{\beta},\sigma^2)$ can be based on the asymptotic approximations:

$\widehat{\boldsymbol{\beta}}\overset{a}{\sim}\mathsf{N}\left(\boldsymbol{\beta},\ \sigma^2(\mathbf{X}^{\top}\mathbf{X})^{-1}\right),\qquad s^2\overset{P}{\to}\sigma^2,\qquad (1)$

where '$\overset{a}{\sim}$' denotes 'asymptotically distributed' and '$\overset{P}{\to}$' denotes 'convergence in probability'.
The third presumption is that when the homoskedasticity assumption in (3) is invalid, and instead:

(3)* $\quad E(\mathbf{u}\mathbf{u}^{\top})=\boldsymbol{\Lambda}=\mathrm{diag}(\sigma_1^2,\sigma_2^2,\ldots,\sigma_n^2)\neq\sigma^2\mathbf{I}_n,\qquad (2)$

the OLS estimator $\widehat{\boldsymbol{\beta}}$ remains unbiased and consistent, but is less efficient relative to the GLS estimator $\widetilde{\boldsymbol{\beta}}=\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{y}$, since:

$\mathrm{Cov}(\widehat{\boldsymbol{\beta}})=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\boldsymbol{\Lambda}\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\ \geq\ \mathrm{Cov}(\widetilde{\boldsymbol{\beta}})=\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}^{-1}\mathbf{X}\right)^{-1}.$

In light of that, one could proceed to draw inferences using $\widehat{\boldsymbol{\beta}}$ if only a consistent estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$ could be found. White (1980) argued that, although estimating $\boldsymbol{\Lambda}$ suffers from the incidental parameter problem (the number of unknown parameters $(\sigma_1^2,\sigma_2^2,\ldots,\sigma_n^2)$ increases with $n$), estimating $\mathbf{G}=\frac{1}{n}\left(\mathbf{X}^{\top}\boldsymbol{\Lambda}\mathbf{X}\right)$ does not, since it involves only $\frac{1}{2}k(k+1)$ unknown terms. He proposed a consistent estimator of $\mathbf{G}$, $\widehat{\boldsymbol{\Gamma}}(0)=\frac{1}{n}\sum_{t=1}^{n}\widehat{u}_t^2(\mathbf{x}_t\mathbf{x}_t^{\top})$, that gives rise to Heteroskedasticity-Consistent Standard Errors (HCSE) for $\widehat{\boldsymbol{\beta}}$, by replacing $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$ in (1) with:

$\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\left(\sum\nolimits_{t=1}^{n}\widehat{u}_t^2\,\mathbf{x}_t\mathbf{x}_t^{\top}\right)\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}.\qquad (3)$
The fourth presumption is that one can validate the LR model one probabilistic assumption at a time. This presumption is more subtle because it is often insufficiently realized that assumptions (1)-(4) are interrelated, and thus any form of model validation should take that into account. The misleading impression that departures from these assumptions can be viewed individually stems partly from the fact that the probabilistic structure is specified in terms of the error process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}:=(1,2,\ldots,n,\ldots)\}$. However, the probabilistic structure that matters for modeling and inference is that of the observable stochastic process $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0:=\{(y_t,\mathbf{x}_t),\ t=1,2,\ldots,n\}$. This is because the latter defines the distribution of the sample, as well as the likelihood function, that provide the cornerstones of both frequentist and Bayesian approaches to inference.
2.2 How narrow is the traditional perspective?
All four presumptions behind the recommendation to ignore non-Normality and use HCSE for testing hypotheses pertaining to $\boldsymbol{\beta}$, such as:

$H_0:\ \boldsymbol{\beta}=\mathbf{0}\quad \text{vs.}\quad H_1:\ \boldsymbol{\beta}\neq\mathbf{0},\qquad (4)$

can be called into question in terms of being vulnerable to several potential errors that can undermine the reliability of inference.

To begin with, the Gauss-Markov theorem is of very little value for inference purposes, for several reasons. First, the 'linearity' of $\widehat{\boldsymbol{\beta}}$ is a phony property, unbiasedness ($E(\widehat{\boldsymbol{\beta}})=\boldsymbol{\beta}$) without consistency is dubious, and relative efficiency within an artificially restricted class of estimators is of very limited value. In addition, despite the fact that the theorem pertains to the finite sample properties of $\widehat{\boldsymbol{\beta}}$, it cannot be used as a reliable basis for inference because it yields an unknown sampling distribution for $\widehat{\boldsymbol{\beta}}$, i.e.

$\widehat{\boldsymbol{\beta}}\sim \mathsf{D}\left(\boldsymbol{\beta},\ \sigma^2(\mathbf{X}^{\top}\mathbf{X})^{-1}\right),\quad \mathsf{D}(\cdot)\ \text{unknown}.\qquad (5)$

This provides a poor basis for any form of finite sample inference that involves error probabilities calling for the evaluation of tail areas. The problem was demonstrated by Bahadur and Savage (1956) in the context of the simple Normal model:

$y_t\sim \mathrm{NIID}(\mu,\sigma^2),\quad \boldsymbol{\theta}:=(\mu,\sigma^2)\in\mathbb{R}\times\mathbb{R}_{+},\quad t\in\mathbb{N}.$
They showed that when the Normality assumption is replaced with the existence of the first two moments, no unbiased or consistent t-type test for $H_0:\ \mu=0$ vs. $H_1:\ \mu\neq 0$ exists:

“It is shown that there is neither an effective test of the hypothesis that $\mu=0$, nor an effective confidence interval for $\mu$, nor an effective point estimate of $\mu$. These conclusions concerning $\mu$ flow from the fact that $\mu$ is sensitive to the tails of the population distribution; parallel conclusions hold for other sensitive parameters, and they can be established by the same methods as are here used for $\mu$.” (p. 1115)

That is, the existence of the first two moments (or even all moments) provides insufficient information to pin down the tail areas of the relevant distributions to evaluate the type I and II error probabilities with any accuracy.
The Bahadur-Savage result explains why practitioners who otherwise praise the Gauss-Markov theorem ignore it when it comes to inference, and instead appeal to the asymptotic distributions of $\widehat{\boldsymbol{\theta}}:=(\widehat{\boldsymbol{\beta}},s^2)$. To justify the asymptotic approximations in (1), however, calls for supplementing the Gauss-Markov assumptions (2)-(5) with additional restrictions on the underlying vector stochastic process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$, such as:

$\lim_{n\to\infty}\tfrac{1}{n}(\mathbf{X}^{\top}\mathbf{X})=\boldsymbol{\Sigma}_{22}>0,\qquad \lim_{n\to\infty}\tfrac{1}{n}(\mathbf{X}^{\top}\mathbf{y})=\boldsymbol{\sigma}_{21}\neq\mathbf{0}.$
Lastly, despite the impression given by the Gauss-Markov theorem that the Normality assumption has a very limited role to play in inferences pertaining to $\boldsymbol{\theta}:=(\boldsymbol{\beta},\sigma^2)$, the error-reliability of inference concerning $H_0$ in (4) is actually adversely affected not only by departures from the Normality assumption in (1), but also by the 'non-Normality' (especially any form of skewness) of the $\{\mathbf{X}_t,\ t\in\mathbb{N}\}$ process; see Ali and Sharma (1996).

The third presumption is also questionable, for two reasons. First, the claim that the GLS estimator is relatively more efficient than the OLS estimator depends on the covariance matrix $\boldsymbol{\Lambda}$ being known, which is never the case in practice. Second, it relies on asymptotics without any reason to believe that the use of the HCSE will alleviate the unreliability-of-inference problem, evaluated in terms of the discrepancy between the actual and nominal error probabilities for a given $n$. Indeed, there are several potential errors that can give rise to sizeable discrepancies.
The fourth presumption is potentially the most questionable, because what is often insufficiently appreciated is that the error process $\{(u_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$ provides a much narrower perspective on the specification, misspecification and respecification facets of inference because, by definition, $u_t=y_t-\beta_0-\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$, and thus the error process retains the systematic component $E(y_t|X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$. This was first demonstrated by Sargan (1964) by contrasting the modeling of temporal dependence using the error term with modeling it using the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$. He showed that the LR model with an AR(1) error term:

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad u_t=\rho u_{t-1}+\varepsilon_t\ \ \Rightarrow\ \ y_t=\beta_0(1-\rho)+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+\rho y_{t-1}-\rho\boldsymbol{\beta}_1^{\top}\mathbf{x}_{t-1}+\varepsilon_t,$

is a special case of a Dynamic Linear Regression (DLR) model:

$y_t=\alpha_0+\boldsymbol{\alpha}_1^{\top}\mathbf{x}_t+\alpha_2 y_{t-1}+\boldsymbol{\alpha}_3^{\top}\mathbf{x}_{t-1}+\varepsilon_t,\qquad (6)$

subject to the (non-linear) common factor restrictions:

$\boldsymbol{\alpha}_3+\boldsymbol{\alpha}_1\alpha_2=\mathbf{0}.\qquad (7)$

The DLR model in (6) arises naturally by modeling the Markov dependence directly in terms of the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ via:

$y_t=E(y_t\,|\,X_t=\mathbf{x}_t,\ \mathbf{Z}_{t-1})+\varepsilon_t,\quad t\in\mathbb{N}.$

In relation to (7), it is important to note that the traditional claim that the OLS estimator $\widehat{\boldsymbol{\beta}}$ retains its unbiasedness and consistency despite departures from the no-autocorrelation assumption presumes the validity of the common factor restrictions; see Spanos (1986). The latter is extremely important because without it $\widehat{\boldsymbol{\beta}}$ will be an inconsistent estimator.
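The inconsistency point can be illustrated with a small simulation (a sketch under assumed coefficient values, not the paper's design): data are generated from a DLR of the form (6) whose coefficients violate (7), and the static OLS slope then fails to converge to the contemporaneous coefficient.

```python
import numpy as np

# Illustrative simulation (assumed setup): data from a DLR model of form (6)
# with coefficients that violate the common factor restrictions (7),
# i.e. a3 != -a1*a2.
rng = np.random.default_rng(0)
n = 50_000
a0, a1, a2, a3 = 1.0, 0.5, 0.6, 0.4       # (7) would require a3 = -0.3

x = np.zeros(n)
y = np.zeros(n)
y[0] = a0 / (1 - a2)                       # start near the stationary mean
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + np.sqrt(1 - 0.7**2) * rng.standard_normal()
    y[t] = a0 + a1 * x[t] + a2 * y[t - 1] + a3 * x[t - 1] + rng.standard_normal()

# static LR fit y_t = b0 + b1*x_t + u_t: the slope settles near 1.3,
# far from a1 = 0.5, illustrating the inconsistency when (7) is invalid
Xmat = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(Xmat, y, rcond=None)[0]
```

The bias arises because, with persistent regressors, $x_t$ is correlated with the omitted dynamics $y_{t-1}$ and $\mathbf{x}_{t-1}$; only when (7) holds does the static parameterization survive.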
A strong case can be made that the probabilistic assumptions (1)-(4) pertaining to the error term provide an inadequate perspective for all three facets of modeling: specification, misspecification testing and respecification. To illustrate that, consider the case where potential departures from assumption (4), i.e.

$E(u_t u_s|X_t=\mathbf{x}_t)\neq 0\ \ \text{for}\ t\neq s,\qquad (8)$

are being probed using the Durbin-Watson (D-W) misspecification test based on:

$H_0:\ \rho=0\quad \text{vs.}\quad H_1:\ \rho\neq 0,$

in the context of the Autocorrelation-Corrected (A-C) LR model:

$y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\quad u_t=\rho u_{t-1}+\varepsilon_t,\quad t\in\mathbb{N}.\qquad (9)$

In the case where the D-W test rejects $H_0$, this is traditionally interpreted as providing evidence for the (A-C) LR model in (9).
Hence, the traditional respecification takes the form of adopting (9) and declaring that the misspecification has been accounted for by replacing the original OLS estimator with a Generalized Least Squares (GLS) type estimator. This strategy, however, constitutes a classic example of the fallacy of rejection: misinterpreting rejection of $H_0$ [evidence against $H_0$] as evidence for a particular $H_1$. A rejection of $H_0$ by a D-W test provides evidence against $H_0$ and for the presence of generic temporal dependence in (8), but does not provide any evidence for the particular form assumed by $H_1$:

$H_1:\ E(u_t u_s|X_t=\mathbf{x}_t)=\left(\tfrac{\rho^{|t-s|}}{1-\rho^2}\right)\sigma_{\varepsilon}^2,\quad t,s=1,2,\ldots,n,\qquad (10)$

which stems from the AR(1) model for the error term; (10) represents one of an infinite number of dependence forms that (8) allows for. To have evidence for (9) one needs to validate all the assumptions of the (A-C) LR model anew, including the validity of the common factor restrictions in (7); see Mayo and Spanos (2004).
The broader and more coherent vantage point for all three facets of modeling stems from viewing the Linear Regression model as specified in terms of the first two moments of the same conditional distribution $D(y_t|\mathbf{X}_t;\boldsymbol{\theta})$, i.e., the regression and skedastic functions:

$E(y_t|X_t=\mathbf{x}_t)=h(\mathbf{x}_t),\qquad \mathrm{Var}(y_t|X_t=\mathbf{x}_t)=g(\mathbf{x}_t),\qquad \mathbf{x}_t\in\mathbb{R}^k,\qquad (11)$

where the functional forms $h(\cdot)$ and $g(\cdot)$ are determined by the joint distribution $D(y_t,\mathbf{X}_t;\boldsymbol{\varphi})$. From this perspective, departures from assumptions pertaining to one of the two functions are also likely to affect the other. As argued next, the above claims pertaining to the properties of $\widehat{\boldsymbol{\beta}}$ under (3)* depend crucially on retaining some of the other Gauss-Markov assumptions, especially the linearity of the regression function stemming from assumption (2). The broader perspective indicates that relaxing one assumption at a time might not be a good general strategy because the model assumptions are often interrelated.
Having said that, the focus of the discussion in this paper will be the best case scenario for the traditional textbook perspective on non-Normality and heteroskedasticity/autocorrelation. This is the case where all the other assumptions of the Linear Regression model, apart from Normality and homoskedasticity, are retained, and the departures from the latter are narrowed down to accord with the traditional perspective.
The question which naturally arises at this stage is: what are the probabilistic assumptions of the LR model whose validity is retained by this best case scenario? To answer that question unambiguously one needs to specify the LR model in terms of a complete set of probabilistic assumptions pertaining to the observable process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0$. Choosing an arbitrary form of heteroskedasticity and simulating the LR model does not render the combination a coherent statistical model.
3 A broader probabilistic perspective
Viewing the LR model as a parameterization of the observable vector stochastic process $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t),\ t\in\mathbb{N}\}$ underlying the data $\mathbf{Z}_0$ has numerous advantages, including (i) providing a complete and internally consistent set of testable probabilistic assumptions, (ii) providing well-defined statistical parameterizations for the model parameters, (iii) bringing out the interrelationship among the probabilistic assumptions specifying the LR model, and (iv) enabling the modeler to distinguish between the statistical and the substantive premises of inference; see Spanos (2006).

This is achieved by relating the LR model to the joint distribution of the observable process $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ via the sequential conditioning:

$D(\mathbf{Z}_1,\ldots,\mathbf{Z}_n;\boldsymbol{\phi})=\prod_{t=1}^{n}D(\mathbf{Z}_t;\boldsymbol{\varphi})=\prod_{t=1}^{n}D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)\cdot D(\mathbf{X}_t;\boldsymbol{\psi}_2).\qquad (12)$
The joint distribution $D(\mathbf{Z}_t;\boldsymbol{\varphi})$ takes the form:

$\begin{pmatrix} y_t \\ \mathbf{X}_t \end{pmatrix}\sim \mathsf{N}\left(\begin{pmatrix} \mu_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{21}^{\top} \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}\right),\qquad (13)$

which then gives rise to the conditional and marginal distributions, $D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)$ and $D(\mathbf{X}_t;\boldsymbol{\psi}_2)$:

$(y_t\,|\,X_t=\mathbf{x}_t)\sim \mathsf{N}\left(\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t,\ \sigma^2\right),\qquad \mathbf{X}_t\sim \mathsf{N}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22}).$
The LR model, as a purely probabilistic construct, is specified exclusively in terms of the conditional distribution $D(y_t|\mathbf{X}_t;\boldsymbol{\psi}_1)$ due to weak exogeneity; see Engle et al (1983). To be more specific, the LR model comprises the statistical Generating Mechanism (GM) in conjunction with the probabilistic assumptions [1]-[5] (table 2). This purely probabilistic construal of the Normal, Linear Regression model brings out the relevant parameterizations of interest in $\boldsymbol{\theta}$ in terms of the primary parameters $\boldsymbol{\varphi}:=(\mu_1,\boldsymbol{\mu}_2,\sigma_{11},\boldsymbol{\sigma}_{21},\boldsymbol{\Sigma}_{22})$, as well as the relationships between the probabilistic assumptions of $\{\mathbf{Z}_t:=(y_t,\mathbf{X}_t^{\top})^{\top},\ t\in\mathbb{N}\}$ and the model assumptions (table 3).

Table 2: The Normal, Linear Regression Model

Statistical GM: $y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t+u_t,\ t\in\mathbb{N}$
[1] Normality: $(y_t\,|\,X_t=\mathbf{x}_t)\sim \mathsf{N}(\cdot,\cdot)$
[2] Linearity: $E(y_t\,|\,X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t$, linear in $\mathbf{x}_t$
[3] Homoskedasticity: $\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=\sigma^2$, free of $\mathbf{x}_t\in\mathbb{R}^k$
[4] Independence: $\{(y_t\,|\,X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$ is an independent process
[5] t-invariance: $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ do not change with $t$

where $\beta_0=\mu_1-\boldsymbol{\beta}_1^{\top}\boldsymbol{\mu}_2$, $\boldsymbol{\beta}_1=\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$, $\sigma^2=\sigma_{11}-\boldsymbol{\sigma}_{21}^{\top}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$.
Table 3: Reduction vs. model assumptions

Reduction: $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$   Model: $\{(y_t|X_t=\mathbf{x}_t),\ t\in\mathbb{N}\}$
Normal (N) $\longrightarrow$ [1]-[3]
Independent (I) $\longrightarrow$ [4]
Identically Distributed (ID) $\longrightarrow$ [5]
These relationships are particularly important for guiding Mis-Specification (M-S) testing (probing the validity of the model assumptions [1]-[5]) as well as respecification (choosing an alternative model when any of the assumptions [1]-[5] are found to be invalid for data $\mathbf{Z}_0$).
Of particular interest in this paper is the connection between the [1] Normality and [3] Homoskedasticity assumptions, which stems from the joint Normality of $\mathbf{Z}_t$. In relation to heteroskedasticity, it is very important to emphasize that the traditional assumption in (2) does not distinguish between heteroskedasticity and conditional variance heterogeneity:

$\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=\sigma^2(t),\quad \text{for}\ \mathbf{x}_t\in\mathbb{R}^k,\ t\in\mathbb{N},$

where $\sigma^2(t)$ is an unknown function of the index $t$. This distinction is crucial because the sources of the two departures are very different: heteroskedasticity stems from the non-Normality of $\mathbf{Z}_t$, but heterogeneity stems from the covariance heterogeneity of $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ (table 3).
3.1 Viewing heteroskedasticity in a broader perspective
The recommendation to ignore any departures from Normality and homoskedasticity and instead use HCSE for inferences pertaining to $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ takes for granted that all the other assumptions [2], [4]-[5] are valid for data $\mathbf{Z}_0$. This is unlikely to be the case in practice because of the interrelationships among the model assumptions. It is obvious that in cases where other model assumptions, in addition to [1] and [3], are invalid, the above recommendation will be a terrible strategy for practitioners. Hence, for the purposes of an objective evaluation of this recommendation it is important to focus on the best case scenario, which assumes that the model assumptions [2], [4]-[5] are valid for data $\mathbf{Z}_0$.
Due to the interrelationship between the Normality of $\mathbf{Z}_t$ and model assumptions [1]-[3], it is important to focus on joint distributions that retain the linearity assumption in [2], including the underlying parameterizations of $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$. Such a scenario can be accommodated into a coherent PR specification by broadening the joint Normal distribution in (13) to the Elliptically Symmetric (ES) family (Kelker, 1970), which can be specified in terms of the first two moments of $\mathbf{Z}_t$ via:

$\begin{pmatrix} y_t \\ \mathbf{X}_t \end{pmatrix}\sim \mathrm{ES}\left(\begin{pmatrix} \mu_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \sigma_{11} & \boldsymbol{\sigma}_{21}^{\top} \\ \boldsymbol{\sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix};\ q(\cdot)\right).\qquad (14)$
For different functional forms $q(\cdot)$ this family includes the Normal, the Student's t, the Pearson type II, the Logistic and other distributions; see Fang et al (1990). The appropriateness of the ES family of distributions for the best case scenario stems from the fact that:

(i) All the distributions are bell-shape symmetric, resembling the Normal in terms of the shape of the density function, but its members are either platykurtic ($\alpha_4<3$) or leptokurtic ($\alpha_4>3$), with the Normal being the only mesokurtic ($\alpha_4=3$) distribution.

(ii) The regression function for all the members of the ES family of distributions is identical to that of the Normal (Nimmo-Smith, 1979):

$E(y_t\,|\,X_t=\mathbf{x}_t)=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{x}_t,\quad \mathbf{x}_t\in\mathbb{R}^k.$

(iii) The statistical parameterization of $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$ (table 2) is retained by all members of the ES family. This ensures that the OLS estimators of $(\beta_0,\boldsymbol{\beta}_1)$ will be unbiased and consistent.

(iv) Homoskedasticity characterizes the Normal distribution within the ES family, in the sense that all the other distributions have a heteroskedastic conditional variance; see Spanos (1995) for further discussion.
For all the other members of the ES family the skedastic function takes the generic form:

$\mathrm{Var}(y_t\,|\,X_t=\mathbf{x}_t)=g(\mathbf{x}_t)=\sigma^2\left[\frac{\int_{\delta_t^2}^{\infty}q(u)\,du}{f(\mathbf{x}_t)}\right],\quad \mathbf{x}_t\in\mathbb{R}^k,$

where $\delta_t^2=\frac{1}{2}(\mathbf{X}_t-\boldsymbol{\mu}_2)^{\top}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_t-\boldsymbol{\mu}_2)$ and $f(\cdot)$ is the marginal density of $\mathbf{x}_t$; see Chu (1973). For instance, in the case where (14) is Student's t with $\nu$ degrees of freedom (Spanos, 1994):

$\mathrm{Var}(y_t\,|\,\sigma(\mathbf{X}_t))=\left(\tfrac{\nu\sigma^2}{\nu+k-2}\right)\left(1+\tfrac{2\delta_t^2}{\nu}\right).\qquad (15)$
Interestingly, the above form of heteroskedasticity in (15) is not unrelated to the misspecification test for homoskedasticity proposed by White (1980), which is based on testing the hypotheses:

$H_0:\ \boldsymbol{\gamma}_1=\mathbf{0}\quad \text{vs.}\quad H_1:\ \boldsymbol{\gamma}_1\neq\mathbf{0},$

in the context of the auxiliary regression:

$\widehat{u}_t^2=\gamma_0+\boldsymbol{\gamma}_1^{\top}\boldsymbol{\psi}_t+v_t,$

where $v_t$ is a white-noise error and $\boldsymbol{\psi}_t:=\{(x_{it}\cdot x_{jt}),\ i\geq j,\ i,j=1,2,\ldots,k\}$. It is often claimed that the White test is very general as a misspecification test for departures from the homoskedasticity assumption; see Greene (2011). The fact of the matter is that the squares and cross-products of the regressors in $\boldsymbol{\psi}_t$ stem primarily from retaining the linearity and the other model assumptions, and relaxing only the homoskedasticity assumption. The broader perspective stemming from viewing the LR model as based on the regression and skedastic functions in (11) brings out the fact that the above quadratic form of heteroskedasticity is likely to be the exception, not the rule, when the underlying joint distribution is non-Normal; see Spanos (1999), ch. 7.
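The auxiliary-regression form of the White test can be sketched as follows (the helper name and the illustrative data-generating process are ours; a common implementation refers $n R^2$ from the auxiliary regression to a chi-square distribution):

```python
import numpy as np
from scipy import stats

def white_test(X1, u_hat):
    """White-type homoskedasticity test (a sketch): regress squared OLS
    residuals on levels, squares and cross-products of the regressors
    and refer n*R^2 to chi-square. X1: (n, k) regressors, no constant."""
    n, k = X1.shape
    cols = [np.ones(n)] + [X1[:, i] for i in range(k)]
    cols += [X1[:, i] * X1[:, j] for i in range(k) for j in range(i, k)]
    Psi = np.column_stack(cols)
    y2 = u_hat ** 2
    coef = np.linalg.lstsq(Psi, y2, rcond=None)[0]
    resid = y2 - Psi @ coef
    r2 = 1.0 - resid @ resid / np.sum((y2 - y2.mean()) ** 2)
    lm, df = n * r2, Psi.shape[1] - 1          # LM statistic and its df
    return lm, stats.chi2.sf(lm, df)           # statistic and p-value

# illustration: strongly heteroskedastic data should yield a tiny p-value
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 1.0 + x + np.sqrt(1.0 + 2.0 * x**2) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
lm, pval = white_test(x[:, None], u)
```

Note that the quadratic terms in `Psi` mirror the $\boldsymbol{\psi}_t$ above, which is exactly the point in the text: the test's generality is tied to the quadratic form of heteroskedasticity.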
4 Monte Carlo Simulations 1: HCSE
To investigate the reliability of inference in terms of any discrepancies between actual and nominal error probabilities, we focus on linear restrictions on the coefficients $\boldsymbol{\beta}$ based on the hypotheses of interest:

$H_0:\ (\mathbf{R}\boldsymbol{\beta}-\mathbf{r})=\mathbf{0}\quad \text{vs.}\quad H_1:\ (\mathbf{R}\boldsymbol{\beta}-\mathbf{r})\neq\mathbf{0},\quad \mathrm{rank}(\mathbf{R})=q.$

It is well-known that the optimal test under assumptions [1]-[5] (table 2) is the F-test defined by the test statistic:

$F(\mathbf{y})=\frac{(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})}{q\,s^2}\overset{H_0}{\sim}\mathsf{F}(q,\ n-k-1),$

where $\widehat{\boldsymbol{\beta}}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$, $s^2=\widehat{\mathbf{u}}^{\top}\widehat{\mathbf{u}}/(n-k-1)$, $\widehat{\mathbf{u}}=\mathbf{y}-\mathbf{X}\widehat{\boldsymbol{\beta}}$, in conjunction with the rejection region:

$C_1=\{\mathbf{y}:\ F(\mathbf{y})>c_{\alpha}\}.$

The best case scenario in the context of which the error-reliability of the above F-test will be investigated is that the simulated data are generated using the Student's t Linear/Heteroskedastic Regression model in table 4. The choice of this model is based on the fact that, in addition to satisfying conditions (i)-(iv) above, the Student's t distribution approximates the Normal distribution as the number of degrees of freedom ($\nu$) increases.
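The classical F-test above can be sketched in code (illustrative helper and data-generating process, not the paper's simulation design):

```python
import numpy as np
from scipy import stats

def f_test_linear(X, y, R, r):
    """Classical F-test for H0: R beta = r in the LR model (a sketch).

    F = (Rb - r)'[R (X'X)^-1 R']^-1 (Rb - r) / (q s^2),
    referred to F(q, n - k - 1) under assumptions [1]-[5].
    """
    n, m = X.shape                                # m = k + 1 (incl. constant)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                         # OLS estimator
    u = y - X @ b
    s2 = (u @ u) / (n - m)
    q = R.shape[0]
    d = R @ b - r
    F = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / (q * s2)
    return F, stats.f.sf(F, q, n - m)             # statistic and p-value

# illustration on a hypothetical design: y = 1 + 2x + u, u ~ N(0,1)
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
R = np.array([[0.0, 1.0]])
F_true, p_true = f_test_linear(X, y, R, np.array([2.0]))    # true null
F_false, p_false = f_test_linear(X, y, R, np.array([0.0]))  # false null
```

The HCSE variants examined below keep the numerator but replace $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$ with a robustified covariance estimator.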
Table 4: Student's t, Linear/Heteroskedastic Regression model

Statistical GM: $y_t=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{X}_t+u_t,\ t\in\mathbb{N}$
[1] Student's t: $D(y_t,\mathbf{X}_t;\boldsymbol{\theta})$ is Student's t with $\nu$ degrees of freedom
[2] Linearity: $E(y_t\,|\,\sigma(\mathbf{X}_t))=\beta_0+\boldsymbol{\beta}_1^{\top}\mathbf{X}_t$, where $\mathbf{X}_t:(k\times 1)$
[3] Heteroskedasticity: $\mathrm{Var}(y_t\,|\,\sigma(\mathbf{X}_t))=\left(\tfrac{\nu\sigma^2}{\nu+k-2}\right)\left(1+\tfrac{1}{\nu}(\mathbf{X}_t-\boldsymbol{\mu}_2)^{\top}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{X}_t-\boldsymbol{\mu}_2)\right)$
[4] Independence: $\{\mathbf{Z}_t,\ t\in\mathbb{N}\}$ is an independent process
[5] t-invariance: $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2,\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22})$ are t-invariant

where $\beta_0=\mu_1-\boldsymbol{\beta}_1^{\top}\boldsymbol{\mu}_2$, $\boldsymbol{\beta}_1=\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$, $\sigma^2=\sigma_{11}-\boldsymbol{\sigma}_{21}^{\top}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\sigma}_{21}$.
Maximization of the log-likelihood function for the above Student's t model in table 4 yields MLEs of $\boldsymbol{\beta}^{\top}:=(\beta_0,\boldsymbol{\beta}_1^{\top})$ that take an estimated Generalized Least Squares (GLS) form:

$\widehat{\boldsymbol{\beta}}_{ML}=(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{y},\qquad \widehat{\boldsymbol{\Omega}}=\mathrm{diag}(\widehat{\omega}_1,\widehat{\omega}_2,\ldots,\widehat{\omega}_n),$

$\widehat{\omega}_t=\frac{\nu\widehat{\sigma}^2}{\nu+k-2}\left(1+\tfrac{1}{\nu}(\mathbf{X}_t-\widehat{\boldsymbol{\mu}}_2)^{\top}\widehat{\boldsymbol{\Sigma}}_{22}^{-1}(\mathbf{X}_t-\widehat{\boldsymbol{\mu}}_2)\right),\ t=1,\ldots,n,\qquad \widehat{\sigma}^2=\left(\tfrac{\nu+k}{\nu}\right)\left(\tfrac{\widehat{\mathbf{u}}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\widehat{\mathbf{u}}}{n}\right),$

with $\widehat{\boldsymbol{\mu}}_2$ and $\widehat{\boldsymbol{\Sigma}}_{22}$ denoting the MLEs of $\boldsymbol{\mu}_2$ and $\boldsymbol{\Sigma}_{22}$, where all the above estimators are derived using numerical optimization; see Spanos (1994).
The F-type test statistic based on the above MLEs takes the form:

$F_{ML}(\mathbf{y})=\frac{(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})}{q}\overset{H_0}{\approx}\mathsf{F}(q,\ n-k-1),$

where '$\overset{H_0}{\approx}$' denotes 'distributed approximately under $H_0$'. The asymptotic form of this test statistic is:

F-MLE: $\ F_{\infty}(\mathbf{y})=(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})^{\top}[\mathbf{R}(\mathbf{X}^{\top}\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}_{ML}-\mathbf{r})\overset{H_0}{\sim}\chi^2(q).\qquad (16)$

For simplicity we consider the case where $\mathbf{R}=\mathbf{I}:=\mathrm{diag}(1,1,\ldots,1)$ for the simulations that follow.
4.1 Monte Carlo Experiment 1
This design will be based on the same mean and covariance matrix for both the Normal and the Student's t distributions:

$\begin{pmatrix} y_t \\ X_{1t} \\ X_{2t} \end{pmatrix}\sim \mathrm{ES}\left(\begin{pmatrix} 3 \\ 1.8 \\ 0.8 \end{pmatrix},\ \begin{pmatrix} 1.79375 & 0.7 & -0.4 \\ 0.7 & 1 & 0.2 \\ -0.4 & 0.2 & 1 \end{pmatrix}\right),$

giving rise to the following true parameter values:

$\beta_0=1.9875,\qquad \boldsymbol{\beta}_1^{\top}=(0.8125,\ -0.5625),\qquad \sigma^2=1.$
However, in order to evaluate the error-reliability of the different inference procedures, we will allow different values for the degrees of freedom $\nu$, to investigate their reliability as $\nu$ increases; as $\nu$ increases, the Student's t approximates the Normal distribution better.
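As a quick sanity check, the 'true' parameter values above follow from the table 2 parameterizations applied to the design's mean and covariance:

```python
import numpy as np

# Check that the 'true' values of experiment 1 follow from the table 2
# parameterizations applied to the mean and covariance of the design.
mu1 = 3.0
mu2 = np.array([1.8, 0.8])
s11 = 1.79375
s21 = np.array([0.7, -0.4])
S22 = np.array([[1.0, 0.2],
                [0.2, 1.0]])

beta1 = np.linalg.solve(S22, s21)     # beta1 = Sigma22^-1 sigma21
beta0 = mu1 - beta1 @ mu2             # beta0 = mu1 - beta1' mu2
sigma2 = s11 - s21 @ beta1            # sigma2 = s11 - s21' Sigma22^-1 s21
# beta0 = 1.9875, beta1 = (0.8125, -0.5625), sigma2 = 1.0
```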
In principle one can generate samples of size $n$ using the statistical Generating Mechanism (GM):

$\mathbf{y}^{(i)}=\mathbf{1}_n\beta_0^{*}+\mathbf{x}^{(i)}\boldsymbol{\beta}_1^{*}+\sqrt{g(\mathbf{x}^{(i)})}\,\boldsymbol{\varepsilon}^{(i)},\quad i=1,2,\ldots,N,$

$g(\mathbf{x}_t)=\left(\tfrac{\nu\sigma^{*2}}{\nu+k-2}\right)\left(1+\tfrac{1}{\nu}(\mathbf{x}_t-\boldsymbol{\mu}_2^{*})^{\top}\boldsymbol{\Sigma}_{22}^{*-1}(\mathbf{x}_t-\boldsymbol{\mu}_2^{*})\right),$

to generate the 'artificial' data realizations $\left(\mathbf{y}^{(1)},\mathbf{y}^{(2)},\ldots,\mathbf{y}^{(N)}\right)$, where $\mathbf{y}^{(i)}:=(y_1,y_2,\ldots,y_n)^{\top}$, using pseudo-random numbers for $\boldsymbol{\varepsilon}^{(i)}\sim \mathrm{St}(\mathbf{0},\mathbf{I}_n;\nu+k)$ and $\mathbf{X}^{(i)}\sim \mathrm{St}(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22};\nu)$. However, the simulations are more accurate when they are based on the multivariate Student's t distribution:

$\mathbf{Z}_t\sim \mathrm{St}(\boldsymbol{\mu},\boldsymbol{\Sigma};\nu),\quad t\in\mathbb{N},\qquad (17)$

and then $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ are reparameterized to estimate the unknown parameters $\boldsymbol{\theta}:=(\beta_0,\boldsymbol{\beta}_1,\sigma^2)$. Let $\mathbf{U}$ be an $m$-dimensional Normal random vector, $\mathbf{U}\sim \mathsf{N}(\mathbf{0},\mathbf{I}_m)$. Using the transformation $\mathbf{Z}=\boldsymbol{\mu}+\left(\sqrt{s/\nu}\right)^{-1}\mathbf{L}^{\top}\mathbf{U}$, where $\mathbf{L}^{\top}\mathbf{L}=\boldsymbol{\Sigma}$ and $s\sim\chi^2(\nu)$ is a chi-square distributed random variable with $\nu$ degrees of freedom, with $\mathbf{U}$ and $s$ independent, one can generate data from a Student's t random vector (17); see Johnson (1987).
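The transformation above can be sketched in code (hypothetical helper name; we use a Cholesky factor $\mathbf{A}$ with $\mathbf{A}\mathbf{A}^{\top}=\boldsymbol{\Sigma}$, which plays the role of $\mathbf{L}$):

```python
import numpy as np

def rmvt(mu, Sigma, df, size, rng):
    """Draws from a multivariate Student's t via the scale-mixture transform:
    Z = mu + A U / sqrt(s/df), A A' = Sigma, U ~ N(0, I), s ~ chi2(df),
    with U and s independent (a sketch)."""
    m = len(mu)
    A = np.linalg.cholesky(Sigma)               # A A' = Sigma
    U = rng.standard_normal((size, m))
    s = rng.chisquare(df, size=size)
    return mu + (U @ A.T) / np.sqrt(s / df)[:, None]

# draws for the experiment-1 design (mean/covariance as in section 4.1)
rng = np.random.default_rng(3)
mu = np.array([3.0, 1.8, 0.8])
Sigma = np.array([[1.79375, 0.7, -0.4],
                  [0.7,     1.0,  0.2],
                  [-0.4,    0.2,  1.0]])
Z = rmvt(mu, Sigma, df=8, size=100_000, rng=rng)
# sample mean ~ mu; sample covariance ~ Sigma * df/(df - 2)
```

Note that $\boldsymbol{\Sigma}$ here is the scale matrix, so the covariance of the draws is $\boldsymbol{\Sigma}\cdot\nu/(\nu-2)$, which is why a reparameterization step is needed before estimation.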
The aim of experiment 1 is to investigate the empirical error probabilities (size and power) of the different estimators of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$ that give rise to the HCSE F-type test:

$F_{HC}(\mathbf{y})=(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})^{\top}[\mathbf{R}\,\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})\,\mathbf{R}^{\top}]^{-1}(\mathbf{R}\widehat{\boldsymbol{\beta}}-\mathbf{r})\overset{H_0}{\sim}\chi^2(q),\qquad (18)$

where $\mathbf{R}=\mathbf{I}:=\mathrm{diag}(1,1,\ldots,1)$ and $\widehat{\mathrm{Cov}}(\widehat{\boldsymbol{\beta}})$ is given in (3). For the probing of the relevant error probabilities we consider six scenarios:

(a) two different values for the degrees of freedom parameter ($\nu$): $\nu=4$ and $\nu=8$;
(b) sample sizes $n=50$, $n=100$, $n=200$.
The idea is that increasing $\nu$ renders the tails of the underlying Student's t distribution less leptokurtic and closer to the Normal. For comparison purposes, the standard Normal [$\mathsf{N}(0,1)$] is plotted together with the Student's t densities [$\mathrm{St}(0,1;\nu)$] for $\nu=4$ and $\nu=8$ in the figure below. Note that the $\mathrm{St}(0,1;\nu)$ densities are rescaled to ensure that their variance is also unity, and not $\tfrac{\nu}{\nu-2}$, to render them comparable to the $\mathsf{N}(0,1)$. The rescaled Student's t density is:

$f(x;0,1,\nu)=c\cdot\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{z^2}{\nu}\right)^{-\left(\frac{\nu+1}{2}\right)},\quad \text{for}\ z=c\,x,\ \ c=\sqrt{\tfrac{\nu}{\nu-2}},$

yielding $E(x)=0$, $\mathrm{Var}(x)=1$.
[Figure: densities of the standard Normal N(0,1) and the rescaled Student's t densities St(0,1;4) and St(0,1;8)]
To get some idea about the differences in terms of the tail area:

$P(|X|>2)=\begin{cases} 0.0455 & \text{for } \mathsf{N}(0,1) \\ 0.0805 & \text{for } \mathrm{St}(8) \\ 0.1161 & \text{for } \mathrm{St}(4) \end{cases}$
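These tail areas can be reproduced directly from the Normal and Student's t survival functions (assuming scipy is available):

```python
from scipy import stats

# Tail areas P(|X| > 2) for the distributions compared in the text
# (sf is the survival function, P(X > x)).
p_norm = 2 * stats.norm.sf(2)        # N(0,1):  ~ 0.0455
p_t8 = 2 * stats.t.sf(2, df=8)       # St(8):   ~ 0.0805
p_t4 = 2 * stats.t.sf(2, df=4)       # St(4):   ~ 0.1161
```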
The Normal distribution is mesokurtic, with kurtosis coefficient $\alpha_4=3$, but the Student's t is leptokurtic:

$\alpha_4=\frac{E[(y_t-\mu)^4]}{[\mathrm{Var}(y_t)]^2}=3+\frac{6}{\nu-4},\quad \nu>4,$

which brings out the role of the degrees of freedom parameter $\nu$. It is important to note that the degrees of freedom of the conditional distribution associated with the Student's t Linear Regression model in table 4 increase with the number of regressors, i.e. $\nu+k$.
The different sample sizes are chosen to provide a guide as to how large $n$ needs to be for the asymptotic results to be approximately acceptable. These choices for $(\nu,n)$ are based on shedding maximum light on the issues involved using the smallest number of different scenarios. Numerous other choices were tried before reducing the scenarios to the six that adequately represent the simulation results.

The relevant error probabilities of the HCSE-based tests are compared to those of the F-test in (16), based on the Maximum Likelihood Estimators (MLEs) of the model in table 4.
Actual type I error probability (size) of HCSE-based tests.

Figure 1 summarizes the size properties of the following forms of the F-test:

(i) OLS: $\widehat{\boldsymbol{\beta}}$ in conjunction with $s^2(\mathbf{X}^{\top}\mathbf{X})^{-1}$;
(ii) HCSE-HW: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Hansen-White estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$;
(iii) HCSE-NW: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Newey-West estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$;
(iv) HCSE-A: $\widehat{\boldsymbol{\beta}}$ in conjunction with the Andrews estimator of $\mathrm{Cov}(\widehat{\boldsymbol{\beta}})$; and
(v) F-MLE: the F-test based on the MLEs of both $\boldsymbol{\beta}$ and its covariance matrix for the Student's t Linear Model in table 4, using its asymptotic version in (16) to ensure comparability with the F-type tests (i)-(iv).
Fig. 1: Size of F-HCSE-based tests for different $(\nu, n)$
The general conclusion is that all the heteroskedasticity-robustified F-type tests
exhibit serious discrepancies between the actual and nominal type I error of .05. The
actual type I error is considerably greater than the nominal for ν=4 and all sample
sizes from n=50 to n=200, ranging from .12 to .37. The only case where the size
distortions are smaller is n=200 and ν=8, but even then the discrepancies
will give rise to unreliable inferences. In contrast, the F-MLE has excellent size
properties for all values of n and ν. What is also noticeable is that the test in (i) is not
uniformly worse than the other HCSE-based tests in terms of its size. Any attempt
to rank the HCSE-based F-tests in terms of size distortion makes little sense because
they all exhibit serious discrepancies from the nominal size.
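The actual type I error reported here is, in essence, an empirical rejection frequency. As a minimal sketch of how such a size is estimated by simulation (an illustrative design, not the paper's exact one: the sample size, number of regressors, replication count and t(4) errors are all assumptions), consider:

```python
import numpy as np
from scipy import stats

def empirical_size(n=50, k=3, reps=1000, alpha=0.05, seed=0):
    """Estimate the actual type I error of the OLS-based F-test of
    H0: all slope coefficients are zero, when H0 is in fact true."""
    rng = np.random.default_rng(seed)
    crit = stats.f.ppf(1 - alpha, k, n - k - 1)   # nominal critical value
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        y = 1.0 + rng.standard_t(df=4, size=n)    # leptokurtic errors, H0 true
        Xc = np.column_stack([np.ones(n), X])
        beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
        rss1 = ((y - Xc @ beta) ** 2).sum()       # unrestricted RSS
        rss0 = ((y - y.mean()) ** 2).sum()        # restricted RSS (intercept only)
        F = ((rss0 - rss1) / k) / (rss1 / (n - k - 1))
        rejections += F > crit
    return rejections / reps
```

A rejection frequency well above the nominal α is precisely the size distortion reported in figure 1.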
The power properties of the HCSE-based tests.
The general conclusion is that the power curves of the HCSE-based tests (i)-(iv)
are dominated by the F-MLE test by serious margins for all scenarios with
ν=4, indicating that the degree of leptokurtosis has serious effects not only on the
size but also on the power of HCSE-based tests; see figures 2-3, 6-7, 10-11. The power
domination of the F-MLE test becomes less pronounced, but remains significant, for
scenarios with ν=8 as n increases; see figures 4-5, 8-9 and 12-13.
Figure 2 depicts the power functions of the HCSE-based tests (i)-(iv) for n=50
and ν=4 and different discrepancies from the null. The plot indicates serious power
distortions for all tests in (i)-(iv). These distortions are brought out more clearly in
the size-corrected power given in figure 3. For instance, for a .75 discrepancy
from the null the HCSE-HW has power .47 while the F-MLE has power equal to .94!
The best of the HCSE-based tests has power less than .74; that is a huge loss of
power for n=50.
Fig. 2: Power of F-HCSE-based tests for n=50, ν=4
Fig. 3: Size-adjusted power of F-HCSE-based tests for n=50, ν=4
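Size-adjusted (size-corrected) power of the kind plotted in these figures replaces the nominal critical value with the empirical (1−α) quantile of the test statistic simulated under the null. A sketch under an illustrative design (the regressors, the t(4) errors, and the discrepancy parameter `delta` are assumptions, not the paper's):

```python
import numpy as np

def f_stat(X, y):
    """F statistic for H0: all slope coefficients are zero."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    rss1 = ((y - Xc @ beta) ** 2).sum()
    rss0 = ((y - y.mean()) ** 2).sum()
    return ((rss0 - rss1) / k) / (rss1 / (n - k - 1))

def size_adjusted_power(delta, n=50, k=3, reps=1000, alpha=0.05, seed=1):
    """Power against a slope discrepancy `delta`, using the empirical
    null quantile as the critical value (size correction)."""
    rng = np.random.default_rng(seed)
    null_F, alt_F = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        u = rng.standard_t(df=4, size=n)
        null_F.append(f_stat(X, 1.0 + u))                    # H0 true
        alt_F.append(f_stat(X, 1.0 + delta * X[:, 0] + u))   # H0 false
    c = np.quantile(null_F, 1 - alpha)   # empirical critical value
    return np.mean(np.array(alt_F) > c)
```

By construction the size-adjusted power at delta=0 is close to α, which makes power comparisons across tests with different size distortions meaningful.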
Fig. 4: Power of F-HCSE-based tests for n=50, ν=8
Fig. 5: Size-adjusted power of F-HCSE-based tests for n=50, ν=8
What is worth noting in the case (n=50, ν=8) is that the robust versions of the
covariance matrix are worse than OLS with the traditional covariance matrix
in terms of the error-reliability of the F-type test. This calls into question any claims
that the use of the HCSE is always judicious and carries no penalty. For instance,
for a .5 discrepancy from the null the best of these tests has power .63 but the F-MLE
test has power .96.
Fig. 6: Power of F-HCSE-based tests for n=100, ν=4
Fig. 7: Size-adjusted power of F-HCSE-based tests for n=100, ν=4
In the case (n=100, ν=4) there is a significant loss of power associated with the
HCSE-based F-tests: once again the robust versions of the covariance matrix are worse
than OLS with the traditional covariance matrix in terms of the error-reliability
of the F-type test, calling into question any claims that the use of the HCSE is
always judicious and carries no penalty.
Fig. 8: Power of F-HCSE-based tests for n=100, ν=8
Fig. 9: Size-adjusted power of F-HCSE-based tests for n=100, ν=8
The HCSE-based tests show smaller power distortions in the scenario (n=100, ν=8)
than in the previous scenarios, but their size distortions render them unreliable enough
to be of questionable value in practice.
Fig. 10: Power of F-HCSE-based tests for n=200, ν=4
Fig. 11: Size-adjusted power of F-HCSE-based tests for n=200, ν=4
Despite the increase in the sample size in the case (n=200, ν=4), there are signif-
icant power distortions associated with the HCSE-based F-tests, stemming primarily
from the leptokurtosis of the underlying distribution. This calls into question the
pertinence of the recommendation to ignore departures from Normality.
Fig. 12: Power of F-HCSE-based tests for n=200, ν=8
Fig. 13: Size-adjusted power of F-HCSE-based tests for n=200, ν=8
The scenario (n=200, ν=8) represents the best case in terms of the power distortions
associated with the HCSE-based F-tests, but even in this case the size
distortions are significant enough (at least double the nominal size) to render the
reliability of these tests questionable.
5 Monte Carlo Experiment 2: HACSE
5.1 Traditional perspective
This is the case where the error term assumptions (3)-(4), E(uu⊤ | X)=σ²Iₙ, are
relaxed together by replacing this assumption with:

(3)*-(4)*: E(uu⊤ | X)=V,  (19)

where V:=[σₜₛ], t,s=1,...,n, σₜₛ=Cov(uₜ, uₛ | X). In this case the relevant Gₙ=(1/n)X⊤VX is:

Gₙ = (1/n)Σ_{t=1}^{n} E(uₜ² xₜxₜ⊤) + (1/n)Σ_{j=1}^{n−1} Σ_{t=j+1}^{n} E(uₜuₜ₋ⱼ(xₜ₋ⱼxₜ⊤ + xₜxₜ₋ⱼ⊤))  (20)
When only the assumption of homoskedasticity (3) is relaxed, only the first term is
relevant, but when both are relaxed the other term represents the temporal depen-
dence that stems from allowing for error autocorrelation.
An obvious extension of the White (1980) estimator

Γ̂(0) = (1/n)Σ_{t=1}^{n} ûₜ² xₜxₜ⊤

of G=X⊤VX to the case where there are departures from both homoskedasticity
and autocorrelation is to use the additional term in (20) that involves cross-products
of the residuals with their lags:

Γ̂(j) = (1/n)Σ_{t=j+1}^{n} ûₜûₜ₋ⱼ xₜxₜ₋ⱼ⊤,  j=1, 2, ..., m.

Hansen (1992) extended the White estimator to derive an estimator of G=X⊤VX:

Hansen-White: Ĝ = Γ̂(0) + Σ_{j=1}^{m} [Γ̂(j) + Γ̂⊤(j)],  m > 0.

Newey and West (1987) used a weighted sum (based on the Bartlett kernel) of these
matrices to estimate G:

Newey-West: Ĝ = Γ̂(0) + Σ_{j=1}^{m} (1 − j/(m+1))[Γ̂(j) + Γ̂⊤(j)],  m > 0.

Andrews (1991) proposed different weights wⱼ, j=1, 2, ..., m:

Andrews: Ĝ = Γ̂(0) + Σ_{j=1}^{m} wⱼ[Γ̂(j) + Γ̂⊤(j)].

Using any one of these estimators Ĝ of G yields a Heteroskedasticity-Autocorrelation
Consistent (HAC) estimator of the covariance of β̂:

Cov̂(β̂) = (1/n)[(1/n)X⊤X]⁻¹ Ĝ [(1/n)X⊤X]⁻¹.
5.2 A broader probabilistic perspective for HACSE
How does one place the above departures from the error assumptions (3)-(4), per-
taining to homoskedasticity and non-autocorrelation, into a broader but coherent per-
spective? The above estimators of G=X⊤VX make it clear that the temporal
dependence in data Z₀ is modeled via the error term by indirectly imposing a certain
probabilistic structure on the latter, such as p-correlation (MA(p)) or some form of
asymptotic non-correlation; see White (1999), pp. 147-164.
Is β̂ unbiased and consistent under (4)*? Viewing the testing procedures
using the HAC standard errors from the broader perspective of the LR model, based
on the regression and skedastic functions in (11), what is most surprising is that the
original regression function is retained in its static form but the skedastic function is
extended to include some form of error autocorrelation. How could that be justified?
The traditional argument for retaining the static form in the presence of error
autocorrelation is that the OLS estimator β̂=(X⊤X)⁻¹X⊤y retains its unbiasedness
and consistency. This claim, however, depends crucially on the validity of the common
factor restrictions in (7). When these restrictions are invalid for data Z₀, β̂ is both
biased and inconsistent, which will give rise to seriously unreliable inferences; see
Spanos (1986). To bring out how restrictive the traditional strategy of adopting the
HACSE robustification is, we will consider four different scenarios for the simulations
that follow.
Scenario 1: no common factor restrictions are imposed and the modeler
estimates the Dynamic Linear Regression (DLR) model.
Scenario 2: no common factor restrictions are imposed and the modeler
estimates the Linear Regression (LR) model, ignoring the dynamics in the
autoregressive function.
Scenario 3: common factor restrictions are imposed and the modeler
estimates the Dynamic Linear Regression (DLR) model.
Scenario 4: common factor restrictions are imposed and the modeler
estimates the Linear Regression (LR) model.
The broader probabilistic perspective suggests that both the regression and skedas-
tic functions are likely to be affected by the presence of any form of temporal de-
pendence in data Z₀. Hence, the natural way to account for such dependence is to
respecify the original model by expanding the relevant conditioning information set
to include the past history of the observable process {Zₜ:=(yₜ, Xₜ⊤)⊤, t ∈ N}:

E(yₜ | Xₜ, Zₜ₋₁) = h(Xₜ, Zₜ₋₁),  Var(yₜ | Xₜ, Zₜ₋₁) = g(xₜ, zₜ₋₁).  (21)

That is, any form of temporal dependence in Z₀ should be modeled directly in terms
of the stochastic process {Zₜ, t ∈ N}. This gives rise to an extension of the Student's t
Linear/Heteroskedastic model that includes lags of this process. The Student's t
Dynamic Linear Regression (DLR) model (table 5) can be viewed as a particular
parameterization of the vector ((m+1)×1) stochastic process {Zₜ, t ∈ N}, assumed
to be Student's t with ν degrees of freedom (df), Markov and stationary. The
probabilistic reduction takes the form:

D(Z₁, ..., Zₙ | Z₀; φ) = ∏_{t=1}^{n} D(Zₜ | Zₜ₋₁; ϕ) = ∏_{t=1}^{n} D(yₜ | Xₜ, Zₜ₋₁; ψ₁)·D(Xₜ | Zₜ₋₁; ψ₂),

with D(yₜ | Xₜ, Zₜ₋₁; ψ₁) underlying the specification of the St-DLR model in table 5.
Note that it is trivial to increase the number of lags to ℓ ≥ 1 by replacing Markovness
with Markovness of order ℓ.
Table 5: Student's t, Dynamic Linear/Heteroskedastic model
Statistical GM: yₜ = β₀ + β₁⊤Xₜ + β₂⊤Zₜ₋₁ + uₜ, t ∈ N,
[1] Student's t: D(Z₁, ..., Zₙ; ϕ) is Student's t with ν df,
[2] Linearity: E(yₜ | σ(Xₜ, Zₜ₋₁)) = β₀ + β₁⊤Xₜ + β₂⊤Zₜ₋₁,
[3] Heteroskedasticity: Var(yₜ | σ(Xₜ, Zₜ₋₁)) = (νσ₀²/(ν+m*−2))(1 + q²₂ₜ), m* = 2m+1,
    q²₂ₜ = (1/ν)[(Xₜ−μ₂)⊤Q₁(Xₜ−μ₂) + (Zₜ₋₁−μ)⊤Q₂(Zₜ₋₁−μ)],
[4] Markov: {Zₜ, t ∈ N} is a Markov process,
[5] t-invariance: θ:=(β₀, β₁, β₂, σ₀², μ, Q₁, Q₂) are t-invariant.

The estimation of the above model is based on the same log-likelihood function (see
Appendix) as the model in table 2, but now the regressor vector is Xₜ* := (Xₜ⊤, Zₜ₋₁⊤)⊤,
whose number of elements is m* = 2m + 1.
5.3 Scenario 1 for HACSE
For this case, no common factor restrictions are imposed and the modeler accounts for
the temporal structure in the regression function by estimating the Dynamic Linear
Regression (DLR) model.
This design is based on the same mean and covariance matrix for both the
Normal and the Student's t distributions:

(yₜ, X₁ₜ, X₂ₜ, yₜ₋₁, X₁,ₜ₋₁, X₂,ₜ₋₁)⊤ ~ ES(μ, Σ; ν),  μ = (3, 18, 8, 3, 18, 8)⊤,

Σ =
[ 1.20   0.70  −0.40   0.80   0.50  −0.38 ]
[ 0.70   1.00   0.20   0.50   0.70   0.10 ]
[−0.40   0.20   1.00  −0.38   0.10   0.75 ]
[ 0.80   0.50  −0.38   1.20   0.70  −0.40 ]
[ 0.50   0.70   0.10   0.70   1.00   0.20 ]
[−0.38   0.10   0.75  −0.40   0.20   1.00 ]   (22)

that gives rise to the true model parameters:

β₀ = 1.159,  β₁⊤ = (.803, −.453, .424, −.336, .116),  σ₀² = .330.

In the case of Normality, this gives rise to the following Dynamic Linear Regression
model (DLR(1)):

yₜ = 1.159 + .803X₁ₜ − .453X₂ₜ + .424yₜ₋₁ − .336X₁,ₜ₋₁ + .116X₂,ₜ₋₁ + uₜ, t ∈ N.  (23)
The initial conditions are: σ₀²=.33, y₀ ~ N(3, 1.2), X₁₀ ~ N(18, 1), and X₂₀ ~ N(8, 1).
In the case of a Student's t distribution with ν degrees of freedom, (23) remains
the same but the conditional variance hₜ := Var(yₜ | σ(Xₜ, Zₜ₋₁)) takes the form:

hₜ = (νσ₀²/(ν+2m−1))(1 + (1/ν) wₜ⊤ Q wₜ),
wₜ := (X₁ₜ−18, X₂ₜ−8, yₜ₋₁−3, X₁,ₜ₋₁−18, X₂,ₜ₋₁−8)⊤,

Q =
[ 2.23   −.822  −.011  −1.61    .712 ]
[−.822   2.623   .229    .533  −1.90 ]
[−.011    .229  2.48   −1.20   1.22  ]
[−1.61    .533  −1.20   3.83  −1.81  ]
[  .712  −1.90   1.22  −1.81   3.20  ]
The simulations aim to investigate the discrepancies between the actual (empir-
ical) error probabilities and the nominal ones (size and power) associated with the
different estimators of Cov(β̂) based on G=X⊤VX (Ĝ_HW, Ĝ_NW, Ĝ_A, with truncation lag m=3) under:
(a) varying values of the degrees of freedom parameter ν: ν=4, ν=8,
(b) varying the sample size: n=50, n=100, n=200.
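Generating the Student's t samples required by this design can be done via the standard normal/chi-square mixture representation of the multivariate t. A minimal sketch (the 3×3 block shown is the contemporaneous block of the design covariance, used purely as an illustration; the function name is ours):

```python
import numpy as np

def rmvt(rng, mu, Sigma, nu, size):
    """Draw `size` vectors from a multivariate Student's t with location mu,
    scale Sigma and nu df (so Cov = nu/(nu-2) * Sigma), using the
    normal / chi-square mixture representation."""
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(len(mu)), Sigma, size=size)
    g = rng.chisquare(nu, size=size) / nu          # mixing variable
    return mu + z / np.sqrt(g)[:, None]

# Contemporaneous block of the design covariance, for illustration:
mu = np.array([3.0, 18.0, 8.0])
Sigma = np.array([[1.20, 0.70, -0.40],
                  [0.70, 1.00, 0.20],
                  [-0.40, 0.20, 1.00]])
```

Each simulated sample drawn this way has the elliptical Student's t structure underlying the DLR design, with the conditional variance hₜ then following the formula above.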
Actual type I error probability (size) of HACSE-based tests.
Figure 14 summarizes the size properties of the following forms of the F-test:
(i) OLS: β̂ in conjunction with s²(X⊤X)⁻¹,
(ii) HACSE-HW: β̂ in conjunction with the Hansen-White estimator of G=X⊤VX,
(iii) HACSE-NW: β̂ in conjunction with the Newey-West estimator of G=X⊤VX,
(iv) HACSE-A: β̂ in conjunction with the Andrews estimator of G=X⊤VX,
(v) F-MLE: the F-test based on the MLEs of both β and its covariance matrix for
the Dynamic Student's t Linear Regression model in table 5.
It is important to emphasize at the outset that the scenario used for the sim-
ulations that follow includes estimating the DLR(1) in (23), because ignoring the
dynamics and just estimating (β₀ = μ₁ − β₁⊤μ₂, β₁ = Σ₂₂⁻¹σ₂₁, σ² = σ₁₁ − σ₂₁⊤Σ₂₂⁻¹σ₂₁) will
give rise to inconsistent estimators, and thus practically useless F-tests with or with-
out the robustification; see Spanos and McGuirk (2001).
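The inconsistency caused by ignoring the dynamics can be illustrated with a small simulation; the DLR(1) parameter values below are hypothetical (chosen so that the common factor restrictions are violated), not those of (23):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
# Hypothetical DLR(1): y_t = 0.5 x_t + 0.6 y_{t-1} + 0.3 x_{t-1} + eps_t,
# with an autocorrelated regressor; the common factor restriction
# (coefficient on x_{t-1} equal to -0.6 * 0.5) is deliberately violated.
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
    y[t] = 0.5 * x[t] + 0.6 * y[t - 1] + 0.3 * x[t - 1] + rng.normal()

# Static LR, ignoring the dynamics:
Xs = np.column_stack([np.ones(n - 1), x[1:]])
b_static = np.linalg.lstsq(Xs, y[1:], rcond=None)[0][1]
# DLR(1), including y_{t-1} and x_{t-1}:
Xd = np.column_stack([np.ones(n - 1), x[1:], y[:-1], x[:-1]])
b_dlr = np.linalg.lstsq(Xd, y[1:], rcond=None)[0][1]
# b_dlr recovers the true slope 0.5; b_static converges elsewhere.
```

No amount of HAC robustification of the standard errors repairs the static estimate, since the point estimator itself is inconsistent.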
The general impression conveyed by fig. 14 is that all the HACSE-robustified
F-type tests exhibit even more serious discrepancies between the actual and nominal
type I error of .05 than those in figure 1. The actual type I error is considerably
greater than the nominal for ν=4 and all sample sizes from n=50 to n=200, ranging
from .15 to .61. The only case where the size distortions are somewhat smaller is
n=200 and ν=8, but even then the discrepancies will lead to seriously unreliable
inferences. In contrast, the F-MLE has excellent size properties for all values of n
and ν. What is also noticeable is that the test in (i) is not uniformly worse than the other
HACSE-based tests in terms of its size. Indeed, for ν=8 and n=50, n=100 the OLS-
based F-test has smaller size distortions than the robustified alternative F-tests. Any
attempt to rank the HACSE-based F-tests in terms of size distortion makes little
sense because they all exhibit serious discrepancies from the nominal size.
Fig. 14: Size of F-HACSE-based tests for different (n, ν)
The power properties of the HACSE-based tests.
Figure 15 shows the power functions of all the HACSE-based F-tests for n=50
and ν=4 and different discrepancies from the null. The plot indicates even more
serious power distortions for all the HACSE-based F-tests than those in figure 2.
These distortions are brought out more clearly in the size-corrected power given in
figure 16. For instance, for a .75 discrepancy from the null the HACSE-HW
has power .44 while the F-MLE has power equal to .99! Worse, the OLS-based F-test
does better in terms of power than any of the HACSE-based F-tests. Indeed, the
HACSE-HW is dominated by all the other tests in terms of power.
The power curves for all the HACSE-based F-tests are dominated by the
F-MLE test by serious margins for all scenarios with ν=4, indicating that the degree
of leptokurtosis has serious effects not only on the size but also on the power of HACSE-
based tests; see figures 19-20, 23-24. For scenarios with ν=8, as n increases the
power domination of the F-MLE test becomes less pronounced, but remains significant;
see figures 17-18, 21-22 and 25-26.
Fig. 15: Power of F-HACSE-based tests for n=50, ν=4
Fig. 16: Size-adjusted power of F-HACSE-based tests for n=50, ν=4
Fig. 17: Power of F-HACSE-based tests for n=50, ν=8
Fig. 18: Size-adjusted power of F-HACSE-based tests for n=50, ν=8
Despite the increase in the degrees of freedom parameter to ν=8, all the HACSE-
based F-tests for n=50 suffer from serious power distortions, with the OLS-based
F-test dominating them in terms of both size and power; see figures 17-18.
Fig. 19: Power of F-HACSE-based tests for n=100, ν=4
Fig. 20: Size-adjusted power of F-HACSE-based tests for n=100, ν=4
The serious distortions in both size and power observed above continue to exist
in the scenario (n=100, ν=4), suggesting that the leptokurtosis has a serious effect
on both; see figures 19-20. This particular departure from Normality can be ignored
only at the expense of the reliability of inference.
Fig. 21: Power of F-HACSE-based tests for n=100, ν=8
Fig. 22: Size-adjusted power of F-HACSE-based tests for n=100, ν=8
When the degrees of freedom increase to ν=8, the distortions in size and power
are moderated, but they are still serious enough to call into question the reliability of any
inference based on the HACSE-based F-tests; see figures 21-22.
Fig. 23: Power of F-HACSE-based tests for n=200, ν=4
Fig. 24: Size-adjusted power of F-HACSE-based tests for n=200, ν=4
The scenario (n=200, ν=4) brings out the adverse effects of leptokurtosis on both
the size and the power of the HACSE-based F-tests, despite the large sample size.
Fig. 25: Power of F-HACSE-based tests for n=200, ν=8
Fig. 26: Size-adjusted power of F-HACSE-based tests for n=200, ν=8
5.4 Scenario 2 for HACSE
No common factor restrictions are imposed and the modeler estimates the Linear
Regression (LR) model, ignoring the dynamics in the autoregressive function, i.e.
estimation and the F-test focus exclusively on the static LR model.
5.5 Scenario 3 for HACSE
For this case, common factor restrictions are imposed and the modeler takes into
account the temporal structure by estimating the Dynamic Linear Regression (DLR)
model.
As first shown by Sargan (1964), modeling the Markov dependence of the process
{Zₜ:=(yₜ, Xₜ⊤)⊤, t ∈ N} using an AR(1) error term:

yₜ = β₀ + β₁⊤xₜ + uₜ,  uₜ = ρuₜ₋₁ + εₜ,

constitutes a special case of the DLR model in (6) subject to the common factor restric-
tions α₃ + α₁α₂ = 0. It can be shown (McGuirk and Spanos, 2009) that these parameter
restrictions imply a highly unappetizing temporal structure for {Zₜ, t ∈ N}.
The common factor restrictions transform the unrestricted Σ into Σ*:

Cov(Zₜ, Zₜ₋₁) = Σ =
[ σ₁₁(0)   σ₂₁⊤(0)   σ₁₁(1)   σ₂₁⊤(1) ]
[ σ₂₁(0)   Σ₂₂(0)    σ₂₁(1)   Σ₂₂⊤(1) ]
[ σ₁₁(1)   σ₂₁⊤(1)   σ₁₁(0)   σ₂₁⊤(0) ]
[ σ₂₁(1)   Σ₂₂(1)    σ₂₁(0)   Σ₂₂(0)  ]

Σ* =
[ β⊤Σ₂₂(0)β + σ²/(1−ρ²)    β⊤Σ₂₂(0)   β⊤Σ₂₂(1)β + ρσ²/(1−ρ²)   β⊤Σ₂₂(1) ]
[ Σ₂₂(0)β                  Σ₂₂(0)     Σ₂₂(1)⊤β                 Σ₂₂(1)   ]
[ β⊤Σ₂₂(1)β + ρσ²/(1−ρ²)   Σ₂₂(1)⊤β   β⊤Σ₂₂(0)β + σ²/(1−ρ²)    β⊤Σ₂₂(0) ]
[ Σ₂₂(1)β                  Σ₂₂(1)     Σ₂₂(0)β                  Σ₂₂(0)   ]

The nature of the common factor restrictions can be best appreciated in the context
of a VAR(1) model:

Zₜ = A⊤Zₜ₋₁ + Eₜ,  Eₜ ~ NIID(0, Ω),  t ∈ N,  (24)

entailed by the restricted Σ*, which takes the form:

A⊤ = [ ρ   ((D − ρI)β)⊤ ]      Ω = [ σ² + β⊤Λβ   β⊤Λ ]
     [ 0        D       ],         [ Λβ          Λ   ],

D = Σ₂₂(0)⁻¹Σ₂₂(1),  Λ = Σ₂₂(0) − Σ₂₂(1)⊤Σ₂₂(0)⁻¹Σ₂₂(1).  (25)
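The common factor structure can be verified by simulation: generate data from the AR(1)-error model and estimate the unrestricted DLR(1); the estimated coefficients should then satisfy a₃ + a₂a₁ ≈ 0. The parameter values below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
beta0, beta1, rho = 1.0, 0.8, 0.5
x = rng.normal(size=n)                 # static regressor, iid for simplicity
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
y = beta0 + beta1 * x + u              # LR with AR(1) error

# Unrestricted DLR(1): y_t on (1, x_t, y_{t-1}, x_{t-1})
Xd = np.column_stack([np.ones(n - 1), x[1:], y[:-1], x[:-1]])
a0, a1, a2, a3 = np.linalg.lstsq(Xd, y[1:], rcond=None)[0]
# Implied by the common factor restrictions:
# a2 ~ rho, a1 ~ beta1, a3 ~ -rho*beta1, i.e. a3 + a2*a1 ~ 0.
```

This is the sense in which the AR(1)-error model is a restricted DLR(1): it constrains the coefficient on the lagged regressor to be the (negated) product of the other two.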
The question that naturally arises at this stage is whether the error-reliability
of the F-test will improve by imposing such a highly restrictive temporal structure
on the process {Zₜ, t ∈ N}. To ensure the validity of these restrictions the
variance-covariance matrix of scenario 1 needs to be modified slightly to:

(yₜ, X₁ₜ, X₂ₜ, yₜ₋₁, X₁,ₜ₋₁, X₂,ₜ₋₁)⊤ ~ ES(μ, Σ; ν),  μ = (3, 18, 8, 3, 18, 8)⊤,

Σ =
[ 1.20      .700    −.400     .818     .5125  −.34063 ]
[  .700    1.00      .200     .5125    .700     .100  ]
[ −.400     .200    1.00     −.34063   .100     .750  ]
[  .818     .5125  −.34063   1.289     .700    −.400  ]
[  .5125    .700     .100     .700    1.00      .200  ]
[ −.34063   .100     .750    −.400     .200    1.00   ]   (26)
This gives rise to true model parameters that do satisfy the common factor
restrictions:

β₀ = 1.1448,  β₁⊤ = (.81249, −.56248, .424, −.3445, .23848),  σ₀² = .317.

In the case of Normality, this gives rise to the following Dynamic Linear Regression
model (DLR(1)):

yₜ = 1.145 + .812X₁ₜ − .562X₂ₜ + .424yₜ₋₁ − .345X₁,ₜ₋₁ + .238X₂,ₜ₋₁ + uₜ, t ∈ N.  (27)
5.6 Scenario 4 for HACSE
Common factor restrictions are imposed and the modeler estimates the Linear Regres-
sion (LR) model, ignoring the dynamics in the autoregressive function, i.e. estimation
and the F-test focus exclusively on the static LR model.
6 Summary and conclusions
The primary aim of the paper has been to investigate the error-reliability of HCSE/HACSE-
based tests using Monte Carlo simulations. For the design of the appropriate simula-
tion experiments it was important to view the departures from the homoskedasticity
and autocorrelation assumptions from the broader perspective of regression models
based on the first two conditional moments of the same distribution. It was argued
that viewing such models from the error term perspective provides a narrow and often
misleading view of these departures and the ways they can be accounted for.
The simulation results call into question the conventional wisdom of viewing
HCSE/HACSE as robustified versions of the OLS covariances. Robustness in this
case can only be evaluated in terms of how closely the actual error probabilities
approximate the nominal ones. In terms of the latter, it is shown that the various
HCSE and HACSE-based F-tests give rise to major size and power distortions. These
distortions are particularly pernicious in the presence of leptokurtosis. Although
further Monte Carlo simulations will be needed to get a more general picture
of the error-reliability of these tests, the results in this paper call into question their
widespread use in practice because they do not, in general, give rise to reliable test-
ing results. Hence, the recommendation to ignore departures from Normality and
account for departures from Homoskedasticity/Autocorrelation using HCSE/HACSE
is highly misleading.
In conclusion, it is important to emphasize that the above simulation results are
based on the best case scenario for these HCSE and HACSE-based F-tests where the
probabilistic assumptions of [2] linearity, [5] t-invariance and [1] bell-shape symmetry
of the underlying conditional distribution are retained. When any of these assump-
tions are invalid for the particular data, the discrepancy between actual and nominal
error probabilities for the HCSE and HACSE-based F-tests is likely to be much worse
than the above results indicate.
References
[1] Ali, M.M. and S.C. Sharma (1996), “Robustness to nonnormality of regression
F-tests,” Journal of Econometrics, 71: 175—205.
[2] Andrews, D.W.K. (1991), “Heteroskedasticity and Autocorrelation Consistent
Covariance matrix estimation”, Econometrica, 59: 817-854.
[3] Bahadur, R.R. and L.J. Savage (1956), “The Nonexistence of Certain Statistical
Procedures in Nonparametric Problems,” The Annals of Mathematical Statistics,
27: 1115-1122.
[4] Chu, K.-C. (1973), “Estimation and decision for linear systems with elliptical
random processes”, IEEE Transactions on Automatic Control, 18: 499-505.
[5] Engle, R.F., D.F. Hendry and J.F. Richard (1983), “Exogeneity”, Econometrica,
51: 277-304.
[6] Fang, K.-T., S. Kotz and K-W. Ng. (1990), Symmetric Multivariate and Related
Distributions, Chapman and Hall, London.
[7] Hansen, B.E. (1992), “Consistent covariance matrix estimation for dependent
heterogeneous processes”, Econometrica, 60: 967-972.
[8] Hansen, B.E. (1999), “Discussion of ‘Data mining reconsidered’”, The Econo-
metrics Journal, 2: 192-201.
[9] Johnson, M.E. (1987), Multivariate Statistical Simulation, Wiley, NY.
[10] Kelker, D. (1970), “Distribution theory of spherical distributions and a location-
scale parameter”, Sankhya A, 32: 419-430.
[11] Kiefer, N.M. and T.J. Vogelsang (2005), “A new asymptotic theory for
heteroskedasticity-autocorrelation robust tests”, Econometric Theory, 21: 1130-
1164.
[12] Mayo, D.G. and A. Spanos. (2006), “Severe Testing as a Basic Concept in a
Neyman-Pearson Philosophy of Induction,” The British Journal for the Philos-
ophy of Science, 57: 323-357.
[13] McGuirk, A. and A. Spanos (2009), “Revisiting Error Autocorrelation Correc-
tion: Common Factor Restrictions and Granger Non-Causality,” Oxford Bulletin
of Economics and Statistics, 71: 273-294.
[14] Newey, W.K. and K.D. West (1987), “A simple, positive semi-definite, het-
eroskedasticity and autocorrelation consistent covariance matrix”, Econometrica,
55: 703-708.
[15] Nimmo-Smith, I. (1979), “Linear regressions and sphericity”, Biometrika, 66:
390-392.
[16] Phillips, P.C.B. (2005a), “Automated Discovery in Econometrics”, Econometric
Theory, 21: 3-20.
[17] Phillips, P.C.B. (2005b), “Automated Inference and the Future of Econometrics”,
Econometric Theory, 21: 116-142.
[18] Robinson, P. (1998), “Inference without smoothing in the presence of nonpara-
metric autocorrelation”, Econometrica, 66: 1163-1182.
[19] Sargan, J.D. (1964), “Wages and Prices in the U.K.: A Study in Econometric
Methodology”, in P. Hart, G. Mills and J.K. Whitaker (eds.), Econometric
Analysis for National Economic Planning, vol. 16 of Colston Papers,
Butterworths, London, 25-54.
[20] Spanos, A., (1986), Statistical Foundations of Econometric Modelling, Cam-
bridge University Press, Cambridge.
[21] Spanos, A. (1999), Probability Theory and Statistical Inference: econometric
modeling with observational data, Cambridge University Press, Cambridge.
[22] Spanos, A. (1994), “On modeling heteroskedasticity: the Student’s t and Ellip-
tical linear regression models,” Econometric Theory, 10: 386-415.
[23] Spanos, A. (1995), “On Normality and the Linear Regression Model”, Econo-
metric Reviews, 14: 195-203.
[24] Spanos, A. (2006), “Revisiting the Omitted Variables Argument: Substantive
vs. Statistical Adequacy,” Journal of Economic Methodology, 13: 179—218.
[25] White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estima-
tor and Direct Test for Heteroskedasticity”, Econometrica, 48: 817-838.
[26] White, H. (1999), Asymptotic Theory for Econometricians, revised edition, Aca-
demic Press, London.
7 Appendix: Student's t log-likelihood function
Let Z:=(Z₁, Z₂, ..., Zₘ₊₁)⊤ be an (m+1)-dimensional vector. Z is said to have an
(m+1)-variate Student's t distribution with degrees of freedom (df) ν, location vector μ
and scaling matrix Σ, denoted by:

Z ~ St(μ, Σ; ν),

when for θ=(μ, Σ), E(Z)=μ, Cov(Z)=(ν/(ν−2))Σ, the joint probability density func-
tion is:

f(z; θ) = [Γ((ν+m+1)/2) / ((νπ)^((m+1)/2) Γ(ν/2) |Σ|^(1/2))] [1 + (1/ν)(z−μ)⊤Σ⁻¹(z−μ)]^(−(ν+m+1)/2).  (28)
For Zₜ := (yₜ, Xₜ⊤)⊤, the reduction in (12) yields:

f(yₜ|xₜ; θ₁) = [Γ((ν+m+1)/2) hₜ^(−1/2) / ((νπ)^(1/2) Γ((ν+m)/2))] [1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ)]^(−(ν+m+1)/2),

hₜ = σ²(1 + (Xₜ−μ₂)⊤Q(Xₜ−μ₂)),  Q = (1/ν)Σ₂₂⁻¹,

f(xₜ; θ₂) = [Γ((ν+m)/2)|Q|^(1/2) / (π^(m/2) Γ(ν/2))] [1 + (xₜ−μ₂)⊤Q(xₜ−μ₂)]^(−(ν+m)/2).

In light of the fact that the parameters of the conditional and marginal densities, θ₁
and θ₂ respectively, are not variation free (see Spanos, 1994), the relevant likelihood
for the estimation of θ₁ is the product of the two densities:

L(θ₁, θ₂; z) ∝ ∏_{t=1}^{n} [hₜ^(−1/2) (1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ))^(−(ν+m+1)/2)] [|Q|^(1/2) (1 + (xₜ−μ₂)⊤Q(xₜ−μ₂))^(−(ν+m)/2)].
Hence, the log-likelihood function takes the form:

ln L(θ₁, θ₂; z) ∝ −(1/2)Σ_{t=1}^{n} ln hₜ − ((ν+m+1)/2)Σ_{t=1}^{n} ln[1 + (yₜ−β₀−β₁⊤Xₜ)²/(νhₜ)] +
+ (n/2)ln(det(Q)) − ((ν+m)/2)Σ_{t=1}^{n} ln[1 + (xₜ−μ₂)⊤Q(xₜ−μ₂)].

Maximization of ln L(θ₁, θ₂; z) with respect to θ₁ does not yield closed-form solutions
for these parameters, but one can express the MLE of β⊤:=(β₀, β₁⊤) in a Generalized
Least Squares form:

β̂ = (X⊤Ω̂⁻¹X)⁻¹X⊤Ω̂⁻¹y,

where Ω̂ takes the form Ω̂ = diag(ĥ₁, ĥ₂, ..., ĥₙ), with

ĥₜ = (νσ̂²/(ν+m−2))(1 + (1/ν)(Xₜ−μ̂₂)⊤Σ̂₂₂⁻¹(Xₜ−μ̂₂)),  t=1, ..., n,

where μ̂₂ and Σ̂₂₂ denote the MLEs of μ₂ and Σ₂₂; see Spanos (1994).
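The GLS form above can be sketched as a one-step feasible GLS. In the sketch below (an illustration, not the paper's estimator) moment estimators stand in for the MLEs of μ₂ and Σ₂₂, ν is treated as known, and the σ̂² factor is omitted from the weights since it cancels in the GLS formula:

```python
import numpy as np

def student_t_gls(y, X, nu):
    """One-step feasible GLS form of the Student's t LR estimator:
    the weights h_t depend only on the regressors, via the marginal
    Student's t structure of X (moment estimators for mu_2, Sigma_22)."""
    n, m = X.shape
    mu2 = X.mean(axis=0)
    S22 = np.cov(X.T, bias=True) * (nu - 2) / nu   # scale matrix estimate
    d = X - mu2
    q = np.einsum('ti,ij,tj->t', d, np.linalg.inv(np.atleast_2d(S22)), d)
    h = 1.0 + q / nu                               # h_t up to a constant factor
    Xc = np.column_stack([np.ones(n), X])          # add the intercept
    XtW = Xc.T * (1.0 / h)                         # GLS weighting
    return np.linalg.solve(XtW @ Xc, XtW @ y)
```

Because the weights depend only on X, the estimator is a weighted least squares fit that downweights observations far from the center of the regressor distribution.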