
ARTICLE IN PRESS

Journal of Econometrics 133 (2006) 421–441

0304-4076/$ - see front matter © 2005 Elsevier B.V. All rights reserved.

doi:10.1016/j.jeconom.2005.06.002

*Corresponding author. Tel.: +1 613 533 2293; fax: +1 613 533 2257.

E-mail addresses: [email protected] (R. Davidson), [email protected] (J.G. MacKinnon).

www.elsevier.com/locate/jeconom

The power of bootstrap and asymptotic tests

Russell Davidson (a,b), James G. MacKinnon (c,*)

a Department of Economics, McGill University, Montreal, Que., Canada, H3A 2T7
b GREQAM, Centre de la Vieille Charité, 2 rue de la Charité, 13002 Marseille, France

c Department of Economics, Queen's University, Kingston, Ont., Canada, K7L 3N6

Available online 1 August 2005

Abstract

The bootstrap discrepancy measures the difference in rejection probabilities between a

bootstrap test and one based on the true distribution. The order of magnitude of the bootstrap

discrepancy is the same under the null hypothesis and under non-null processes described by

Pitman drift. If the test statistic is not an exact pivot, critical values depend on which data-

generating process (DGP) is used to determine the null distribution. We propose using the

DGP which minimizes the bootstrap discrepancy. We also show that, under an asymptotic

independence condition, the power of both bootstrap and asymptotic tests can be estimated

cheaply by simulation.

© 2005 Elsevier B.V. All rights reserved.

JEL classification: C12; C15

Keywords: Hypothesis test; Pitman drift; Bootstrap DGP; KLIC; Test power

1. Introduction

In recent years, it has become common to use the bootstrap to perform hypothesis tests in econometrics. Its use for this purpose has been advocated by Horowitz (1994, 1997), Davidson and MacKinnon (1999), and several others. If the bootstrap is to



benefit from asymptotic refinements, the original test statistic must be asymptotically pivotal under the null hypothesis; that is, its asymptotic distribution under the null must not depend on any unknown features of the data-generating process (DGP). With such a test statistic, the errors committed by using the bootstrap are generally of an order lower by a factor of either n^{-1/2} or n^{-1}, where n is the sample size, than the errors committed by relying on asymptotic theory; see Beran (1988), Hall (1992, Section 3.12), or Davidson and MacKinnon (1999).

A convenient way to perform bootstrap inference is to compute bootstrap P values. After computing a test statistic, say t, in the usual way, one uses a random bootstrap DGP, denoted by μ* and constructed so as to satisfy the null hypothesis under test, to generate B bootstrap samples, each of which is used to compute a bootstrap test statistic t*_j, j = 1, ..., B. The bootstrap P value may then be estimated by the proportion of bootstrap statistics that are more extreme than t. As B → ∞, this estimated bootstrap P value will tend to the "ideal" bootstrap P value p*(t), which is defined as

p*(t) ≡ Pr_{μ*}(Rej(t)),

where Rej(t) is the rejection region for a test for which the critical value is t. For a one-tailed test that rejects in the upper tail, for instance, Rej(t) is just the set of real numbers greater than t. In this paper, we ignore the fact that p*(t) has to be estimated. The effect of the estimation error can be made as small as desired by appropriate choice of B; see Davidson and MacKinnon (2000).
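In code, the estimated bootstrap P value is just a proportion. The sketch below is our own minimal illustration, not taken from the paper: the statistic (a one-tailed t test for a zero mean) and the parametric bootstrap DGP μ* (normal with mean zero and estimated scale) are stand-ins for whatever null model is actually under test.

```python
import numpy as np

rng = np.random.default_rng(12345)

def t_stat(y):
    # One-tailed t statistic for H0: the population mean is zero.
    n = len(y)
    return np.sqrt(n) * y.mean() / y.std(ddof=1)

def bootstrap_p_value(y, B=999):
    # mu*: parametric bootstrap DGP imposing the null (mean zero) but
    # using the estimated scale, so it is constructed from the data.
    sigma_hat = y.std(ddof=1)
    t = t_stat(y)
    t_star = np.array([t_stat(rng.normal(0.0, sigma_hat, size=len(y)))
                       for _ in range(B)])
    # Estimated bootstrap P value: proportion of t*_j more extreme than t.
    return np.mean(t_star > t)

y = rng.normal(0.3, 1.0, size=50)   # data from some (here non-null) DGP mu
p_star = bootstrap_p_value(y)
```

As B grows, the proportion converges to the ideal bootstrap P value p*(t) for the given μ*.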

If the original data are generated by a DGP μ, the "true" P value p(t) ≡ Pr_μ(Rej(t)), which is just a deterministic function of t, is by construction a drawing from the uniform distribution U(0,1). But, since the bootstrap DGP μ* is a function of the data, the bootstrap P value p*(t) is in general drawn from a different distribution. Consequently, the rejection probability (RP) of a bootstrap test at nominal level α is in general different from α, even when μ satisfies the null hypothesis under test.

It is natural to ask whether bootstrapping a test has any effect on its power. Answering this question is complicated by the fact that asymptotic tests often suffer from substantial size distortion. In simulation studies, it is common to adjust for this distortion by using critical values for which the RP under some DGP μ₀ that satisfies the null hypothesis is exactly equal to the desired nominal level. With statistics that are not exactly pivotal, the adjustment depends on the specific choice of μ₀.

Conventional asymptotic power analysis relies on the notion of a drifting DGP, which, as the sample size tends to infinity, drifts to μ₀. In order to study the difference between the power of a bootstrap test and the adjusted power of the asymptotic test on which it is based, which we call the bootstrap discrepancy, a suitable drifting DGP must be chosen. We demonstrate in Section 2 that, for any choice of drifting DGP, the bootstrap discrepancy may be of either sign and is, in general, of the same order in n as the size distortion of the bootstrap test.

In Section 3, we consider how best to choose the drifting DGP. We argue that the objective should be minimization of the bootstrap discrepancy, and we show that this is feasible only if t and μ* are asymptotically independent in a sense that we


make precise. In Davidson and MacKinnon (1999), we showed that asymptotic independence of this sort leads to a reduction in the order of bootstrap size distortion. We characterize a class of drifting DGPs that serves to extend this result to the bootstrap discrepancy.

In Section 4, we propose an extension to power analysis of a procedure given in Davidson and MacKinnon (2001) for estimating the RP of a bootstrap test by simulation. This procedure, which is conceptually simple and computationally inexpensive, allows one to estimate the power of bootstrap and asymptotic tests in Monte Carlo experiments at low cost. In Section 5, we present some Monte Carlo results for tests of omitted variables in a logit model. Section 6 concludes.

2. Power of bootstrap and asymptotic tests

Suppose that a test statistic t has a fixed known asymptotic distribution under the null hypothesis, represented by a probability measure P_∞ defined on the real line and absolutely continuous with respect to Lebesgue measure. It is convenient to replace t

by another test statistic, which we denote by τ, of which the nominal asymptotic distribution is uniform on [0,1]. This is most conveniently done by replacing t by its asymptotic P value, so that τ is given by P_∞(Rej(t)), the probability mass in the part of the asymptotic distribution that is more extreme than t. The asymptotic test based on τ rejects the null hypothesis at level α whenever τ < α. In the remainder of this section, without loss of generality, we consider statistics in this P-value form.

To discuss power, we must consider DGPs that do not satisfy the null hypothesis. The asymptotic theory of power makes use of nonnull drifting DGPs, which are determined by a DGP belonging to the null, plus a perturbation that is usually O(n^{-1/2}); see Davidson and MacKinnon (1993, Chapter 12). The appropriate rate, usually n^{-1/2}, at which a nonnull DGP μ drifts towards the null is chosen so that the RP of the test associated with τ tends neither to 0 nor to 1 as n → ∞ for levels α different from 0 or 1. Note that, even if the test is associated with a specific alternative hypothesis, an asymptotic power analysis does not require that a nonnull drifting DGP should belong to it; see Davidson and MacKinnon (1987).¹

In what follows, we limit ourselves to parametric null hypotheses. By this, we mean that the set of DGPs M₀ that satisfy the null are in one-to-one correspondence with the elements of a k-dimensional parameter space Θ. We assume that the drifting DGP of interest to us can be embedded in a (k+1)-dimensional model, comprising the DGPs in the set M₁, parametrized by Θ × U, where U is an interval in R with the origin as an interior point. The DGP that corresponds to the parameter vector (θ, δ) ∈ Θ × U belongs to the null model if and only if δ = 0. The drifting DGP itself is such that, for sample size n, it is characterized by the parameters (θ₀ + n^{-1/2}t, n^{-1/2}δ₀), for some θ₀ ∈ Θ and δ₀ ∈ U, and some k-vector t. This

¹ In that paper, we did not use the term "drifting DGP". Rather, we spoke of a "sequence of local DGPs". We much prefer the newer terminology, which has the advantage of making clear the link with Pitman drift.


drifting DGP drifts towards the DGP in M₀, denoted μ₀, that corresponds to the parameters (θ₀, 0) for all n. Such a DGP, where the parameters are independent of n, will be called a fixed DGP.

The parametric bootstrap DGP μ* is the DGP whose parameters are given by an estimator θ̂ that is consistent under the null, and δ = 0, so that μ* ∈ M₀ by construction. The maximum likelihood estimator of the model M₀ is asymptotically efficient, and so it is a sensible choice for θ̂, but other consistent estimators, for instance the MLE for a model that represents the alternative hypothesis for the test, can be used without affecting the results to be developed in this section. Since θ̂

depends on n, so does μ*, which is thus a drifting DGP that drifts entirely within M₀. Under a fixed DGP μ ∈ M₀ with parameter vector θ₀, plim θ̂ = θ₀, and so μ* drifts towards μ₀. In fact, under weak regularity conditions to be specified later, θ̂ also has a plim of θ₀ under DGPs that drift to θ₀, from which it follows that μ* drifts to μ₀ in this case as well.

We can analyze the performance of a test based on τ for a given sample size n by use of two functions that depend on the nominal level α of the test and the DGP μ. The first of these is the rejection probability function, or RPF. This function, which gives the true rejection probability under μ of a test at nominal level α, is defined as

R(α, μ) ≡ Pr_μ(τ ≤ α).   (1)

In this definition, everything except α should properly be indexed by n, but we prefer an uncluttered notation without this explicit indexing. Throughout, we assume that, under any DGP μ we consider, the distribution of τ has support [0,1] and is absolutely continuous with respect to the uniform distribution on that interval.

For given μ and n, R(α, μ) is just the cumulative distribution function, or CDF, of τ evaluated at α. The inverse of the RPF is the critical value function, or CVF, which is defined implicitly by the equation

Pr_μ(τ ≤ Q(α, μ)) = α.   (2)

It is clear from (2) that Q(α, μ) is the α quantile of the distribution of τ under μ. In addition, the definitions (1) and (2) imply that

R(Q(α, μ), μ) = Q(R(α, μ), μ) = α   (3)

for all α and μ. As n → ∞, R(α, μ) and Q(α, μ) both tend to α for all DGPs μ ∈ M₀. If an asymptotic test is exact in finite samples, then we have R(α, μ) = α and Q(α, μ) = α for all α, for all n, and for all μ in the null.
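For a concrete DGP, both functions can be approximated by simulation, since (1) and (2) identify R(·, μ) and Q(·, μ) as the CDF and quantile function of τ under μ. The following sketch, with an illustrative statistic and DGP of our own choosing (not from the paper), also lets one check the identity (3) up to simulation error.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def tau(y):
    # Asymptotic P value of a one-tailed t test for H0: mean = 0
    # (illustrative choice of statistic): tau = 1 - Phi(t).
    n = len(y)
    t = sqrt(n) * y.mean() / y.std(ddof=1)
    return 1.0 - 0.5 * (1.0 + erf(t / sqrt(2.0)))

# Draw M realizations of tau under a fixed DGP mu (here N(0,1) data, n = 40).
M = 20000
taus = np.array([tau(rng.normal(0.0, 1.0, 40)) for _ in range(M)])

def R(alpha):            # RPF: empirical CDF of tau at alpha under mu
    return np.mean(taus <= alpha)

def Q(alpha):            # CVF: empirical alpha quantile of tau under mu
    return np.quantile(taus, alpha)

rp, cv = R(0.05), Q(0.05)
# The identity (3) holds up to simulation error: R(Q(alpha)) is close to alpha.
```

With M replications, R(Q(α)) reproduces α to within roughly 1/M, which is one way to sanity-check a simulation design before studying power.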

The bootstrap critical value for τ at nominal level α is Q(α, μ*). This is a random variable which is asymptotically nonrandom and equal to α, since, whether or not the true DGP belongs to the null hypothesis, the parametric bootstrap DGP μ* does so. Any size distortion of the bootstrap test under a DGP μ in the null arises from the possibility that, in a finite sample, Q(α, μ*) ≠ Q(α, μ).

A bootstrap test based on τ rejects at nominal level α if τ < Q(α, μ*). Therefore, applying the increasing transformation R(·, μ*) to both sides and using (3), we see that the bootstrap test rejects whenever

R(τ, μ*) < R(Q(α, μ*), μ*) = α.   (4)


Thus the bootstrap P value is just R(τ, μ*). This can be interpreted as a bootstrap test statistic. The probability under μ that the bootstrap test rejects at nominal level α is

Pr_μ(τ < Q(α, μ*)) = Pr_μ(R(τ, μ*) < α).   (5)

For all sample sizes, and for all DGPs, fixed or drifting, in M₁, let the random variable p be defined by

p = R(τ, μ).   (6)

Since R(·, μ) is the CDF of τ under μ, p is distributed as U(0,1) for all n and for all μ. Further, define the random variable q as

q = R(Q(α, μ*), μ) − R(Q(α, μ₀), μ),   (7)

where, if μ is a fixed DGP in M₀, μ₀ = μ. If instead μ is a drifting DGP, then μ₀ is the fixed DGP in M₀ to which it drifts. Instead of applying the transformation R(·, μ*) to both sides of the inequality τ < Q(α, μ*), as we did to obtain (4), we can apply the transformation R(·, μ) to both sides of this inequality. When we do this and use (6), we see that rejection by the bootstrap test is equivalent to the inequality

p < R(Q(α, μ₀), μ) + q.   (8)

The first term on the right-hand side of (8) is the RP under μ of the asymptotic test when the true α-level critical value of the DGP μ₀ is used. If μ = μ₀, it is equal to α. If not, it is what would usually be called the size-corrected, or level-adjusted,² power of the asymptotic test under μ at level α.

From Eq. (7), it is clear that q is just the difference in the RPs under μ according to whether the bootstrap critical value or the critical value correct for μ₀ is used. Since μ* converges to μ₀ as n → ∞, it follows that q tends to zero asymptotically. The rate at which q → 0 depends on the extent of bootstrap refinements.

In the analysis that follows, we abbreviate R(Q(α, μ₀), μ) to just R. The marginal distribution of p under μ is, by construction, just U(0,1). Let F(q | p) denote the CDF of q conditional on p. The RP of the bootstrap test under μ is then

Pr_μ(p < R + q) = E_μ(Pr_μ(q > p − R | p))
               = E_μ(1 − F(p − R | p))
               = 1 − ∫₀¹ F(p − R | p) dp,   (9)

where the last line follows because p ~ U(0,1). The values of R(·, ·) must belong to the interval [0,1], and so the support of the random variable q is the interval [−R, 1−R]. This implies that, for any p ∈ [0,1],

F(−R | p) = 0 and F(1 − R | p) = 1.   (10)

² The former term is probably more common in the econometrics literature, even if its use of the word "size" is incorrect in most contexts. Horowitz and Savin (2000) use the expression "Type I critical value" to refer to Q(α, μ₀).


On integrating by parts in (9) and changing variables, we find using (10) that the RP of the bootstrap test is

1 − [p F(p − R | p)]₀¹ + ∫₀¹ p dF(p − R | p) = ∫_{−R}^{1−R} (x + R) dF(x | R + x)

                                             = R + ∫_{−1}^{1} x dF(x | R + x).   (11)

We refer to the integral in (11) as the bootstrap discrepancy. When μ belongs to the null, the bootstrap discrepancy is the error in rejection probability (ERP) of the bootstrap test, and so it tends to zero as n → ∞ at least as fast as q. When μ is a nonnull drifting DGP, the bootstrap discrepancy is the difference between the RP of the bootstrap test at nominal level α and that of the level-adjusted asymptotic test.

The following theorem shows that the bootstrap discrepancy tends to zero as n → ∞ at the same rate under drifting DGPs as under the null.

Theorem 1. Let τ be a test statistic with asymptotic distribution U(0,1) under all DGPs in a finite-dimensional null hypothesis model M₀, with parameter space Θ ⊆ R^k. Let M₁ be a (k+1)-dimensional model with parameter space Θ × U, where U ⊆ R contains the origin as an interior point, for which the set of DGPs characterized by the parameters (θ, 0) are the DGPs of M₀. Let θ̂ be an estimator of θ ∈ Θ that is root-n consistent under the null. Under regularity conditions specified in the appendix, the bootstrap discrepancy, as defined above in (11), for a parametric bootstrap test based on θ̂, has the same rate of convergence to zero as the sample size n tends to infinity for all levels α and for all drifting DGPs in M₁ characterized by the sequence of parameters (θ₀ + n^{-1/2}t, n^{-1/2}δ₀), for some θ₀ ∈ Θ and δ₀ ∈ U, and some k-vector t.

All proofs are found in the appendix.

Remarks. 1. The bootstrap discrepancy, for a nonnull DGP, is the difference between the RP of the bootstrap test at nominal level α and the RP of the level-adjusted asymptotic test. A theoretical comparison of the two tests might better be based on the RP of the level-adjusted bootstrap test. Let D(α, μ) denote the bootstrap discrepancy at level α for a DGP μ that drifts to μ₀ ∈ M₀. The nominal level α₀ at which the RP of the bootstrap test is exactly α under μ₀ satisfies the equation α₀ + D(α₀, μ₀) = α. Thus

α₀ − α = −D(α₀, μ₀)   (12)

and so α₀ − α is of the same order as the bootstrap discrepancy. The bootstrap RP at nominal level α₀ is, by (11), R(Q(α₀, μ₀), μ) + D(α₀, μ). Thus the difference between this RP and the level-adjusted RP of the asymptotic test, which is R(Q(α, μ₀), μ), is

D(α₀, μ) + (R(Q(α₀, μ₀), μ) − R(Q(α, μ₀), μ)).   (13)

In the regularity conditions for Theorem 1, we assume that both R and Q arecontinuously differentiable with respect to their first argument. It therefore followsfrom (12) that the two terms in expression (13) are of the same order, namely, that ofthe bootstrap discrepancy.


2. The preceding remark implies that three quantities all tend to zero at the same rate as n → ∞. They are (i) the ERP of the bootstrap test under the null, (ii) the difference under a nonnull drifting DGP between the power of the bootstrap test at nominal level α and the level-adjusted power of the asymptotic test, and (iii) the difference between the level-adjusted power of the bootstrap test and that of the asymptotic test. Just what the common rate of convergence is, expressed as a negative power of n, depends on the extent of bootstrap refinement.

A result similar to part of this was obtained by Horowitz (1994), who showed that, if R(·, μ) converges to the asymptotic distribution of τ at rate n^{-j/2} for DGPs μ in the null, then the difference R(α, μ*) − R(α, μ₀) is of order n^{-(j+1)/2} in probability. This is normally, but not always, also the order of the ERP of the bootstrap test.

3. Explicit expressions for the bootstrap discrepancy to leading order can often be obtained with the aid of Edgeworth expansions. See Hall (1988, 1992) for background. An explicit example is found in Abramovitch and Singh (1985), where the statistic is the t statistic for the mean of an IID sample. These authors express the bootstrap discrepancy under a nonnull drifting DGP indirectly in terms of the Hodges–Lehmann deficiency.

4. Although the bootstrap discrepancy is usually difficult to compute, it has an intuitive interpretation. Because the density of q is very small except in a short interval around 0, the second term in (11) can be approximated by ∫_{−1}^{1} x dF(x | R), that is, the expectation of q conditional on p being equal to R. By (6), p = R is equivalent to τ = Q(α, μ₀), that is, to the condition that the statistic is at the margin between rejection and nonrejection using the critical value correct for μ₀. Taylor expansion of (7) around μ₀ then shows that the bootstrap discrepancy is approximately the bias of the bootstrap critical value, Q(α, μ*), thought of as an estimator of the critical value Q(α, μ₀), conditional on being at the margin of rejection, scaled by the sensitivity of the RP to the critical value.

5. In general, neither the bootstrap discrepancy nor the difference (13) in level-adjusted powers can be signed. Thus, in any particular finite-sample case, either theasymptotic test or the bootstrap test could turn out to be more powerful.

6. If the statistic τ is exactly pivotal under the null hypothesis for any sample size, then Q(α, μ) does not depend on μ if μ is in the null. Since the bootstrap DGP μ* is by construction in the null, it follows that the random variable q of (7) is identically zero in this case, and so, therefore, are the three quantities of Remark 2.

3. Choice of a drifting DGP

We are usually interested in practice in a particular sample size, say N, and we conduct asymptotic analysis as an approximation to what happens for a DGP μ_N


defined just for that sample size. The drifting DGP used in asymptotic power analysis is a theoretical construct, but, as we will see in the simulations presented in Section 5, the bootstrap discrepancy can vary greatly with the specific choice of drifting DGP.

The parametrization (θ, δ) of the extended model M₁ is not necessarily well adapted to the estimator θ̂, since this estimator is not in general consistent for θ except for DGPs in M₀. We therefore introduce the following reparametrization. For each fixed DGP μ ∈ M₁, let φ = plim_μ θ̂. The model M₁ is now to be parametrized by φ and δ. By construction, θ̂ is consistent for φ over the full extended model M₁. For the null model M₀, δ = 0, and the θ and φ parametrizations coincide.

Consider a drifting DGP μ constructed as follows. We start with a DGP μ_N ∈ M₁ defined for sample size N. We require first that the parameters of μ for sample size N should be those, say (φ, δ), that characterize μ_N in the new parametrization. Then, for any sample size n, the parameters are specified as

(φ, (n/N)^{-1/2} δ).   (14)

An important property of such a drifting DGP is given in the following theorem:

Theorem 2. Assume the regularity conditions of Theorem 1. Under a drifting DGP with parameters as specified in (14), the asymptotic distribution of n^{1/2}(θ̂ − φ) is normal, and is the same as under the fixed DGP with parameters (φ, 0). There is a positive integer j such that, for the random variable q of (7), n^{(j+1)/2} q is asymptotically normal with asymptotic expectation of zero.

Davidson and MacKinnon (1999) show that, if, under a DGP μ₀ ∈ M₀ associated with the parameter vector θ₀, the estimator θ̂ which determines the bootstrap DGP μ* is such that n^{1/2}(θ̂ − θ₀) and τ are independent under their joint asymptotic distribution, then the ERP of the bootstrap test, which is the bootstrap discrepancy under the null, converges to zero at a rate faster by at least n^{-1/2} than when this asymptotic independence does not hold. It is natural to enquire whether this more rapid convergence extends to drifting DGPs. The next theorem shows that it does if the drifting DGP is of the type given by (14).

Theorem 3. Under the regularity conditions of Theorem 1, if n^{1/2}(θ̂ − θ₀) and τ are independent under their joint asymptotic distribution for all DGPs in M₀, then the rate of convergence to zero of the bootstrap discrepancy as the sample size n → ∞ is faster than that of the random variable q defined in (7) by a factor of n^{-1/2} or better for all DGPs in M₀ and for all drifting DGPs in M₁ of type (14).

A special case of particular interest arises if the estimator θ̂ is the MLE for model M₀. In this case, it is well known that, under any DGP in M₀, θ̂ is asymptotically independent of any classical test statistic that tests whether M₀ is correctly specified.

It is possible in this case to give a meaningful characterization of the (φ, δ) parametrization of the extended model M₁. Consider a DGP μ₁ ∈ M₁ associated with parameters (θ, δ). Let (φ, δ) be the corresponding parameters in the reparametrization induced by the MLE for M₀. Thus φ is the probability limit under μ₁ of the quasi-maximum likelihood estimator (QMLE) for M₀. The


Kullback–Leibler information criterion (KLIC) is an asymmetric measure of the distance from one DGP, defined for a given sample size, to another. Let the first DGP be μ₁ for some sample size n, and consider the problem of minimizing the KLIC from μ₁ to a DGP in M₀ for the same n. By definition of the KLIC, the parameters of the minimizing DGP maximize the expectation under μ₁ of the loglikelihood function of the null model for sample size n. Let these parameters be φ_n. The reason for this notation is that White (1982) showed that plim φ_n = φ.

The parameters φ are usually called the pseudo-true parameters for μ₁. We refer to the fixed DGP μ₀ ∈ M₀ with parameters φ as the pseudo-true DGP. Theorem 3 tells us that the more rapid convergence to zero of the bootstrap discrepancy for classical test statistics, resulting from their asymptotic independence of the MLE for the null model, extends to DGPs that drift from a given nonnull DGP for sample size N to the corresponding pseudo-true DGP according to the drift (14).
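A practical way to approximate the pseudo-true parameters φ follows from White's (1982) result: the QMLE of the null model converges to φ under μ₁. The sketch below is our own construction, loosely echoing the logit example of Section 5; all numerical values, the omitted-regressor design, and the Newton fitting routine are illustrative assumptions, not taken from the paper. It generates one very large sample from a nonnull logit DGP (an omitted regressor plays the role of δ ≠ 0) and fits the null logit model by quasi-ML.

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_logit(X, y, steps=25):
    # ML estimation of a logit model by Newton's method.
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        g = X.T @ (y - p)                    # score vector
        H = (X * (p * (1 - p))[:, None]).T @ X   # information matrix
        beta = beta + np.linalg.solve(H, g)
    return beta

# Nonnull DGP mu1: y depends on x1 and on an omitted regressor x2
# (coefficient 0.5 on x2 plays the role of delta != 0).
n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
index = 0.4 + 0.8 * x1 + 0.5 * x2
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-index))).astype(float)

# Null model M0 uses only a constant and x1; its QMLE on a very large
# sample approximates the pseudo-true (KLIC-minimizing) parameters phi.
X0 = np.column_stack([np.ones(n), x1])
phi_hat = fit_logit(X0, y)
```

The slope in `phi_hat` is attenuated relative to the true coefficient 0.8, illustrating that the pseudo-true DGP is generally not the "true" null restriction of μ₁'s parameters.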

A slight modification of the scheme (14) leads to a drifting DGP starting from μ₁ for sample size N with the property that the endpoint μ₀ is the pseudo-true DGP not only for sample size N but for all sample sizes. Quite generally, let the pseudo-true parameters associated with the DGP with parameters (θ, δ) be (P_n(θ, δ), 0) for sample size n. Clearly, P_n(θ, 0) = θ for all n. Let F_n(θ, δ) be the inverse of P_n for given δ, so that P_n(F_n(θ, δ), δ) = θ and F_n(P_n(θ, δ), δ) = θ. Then, for sample size n, the drifting DGP has parameters

(F_n(φ, (n/N)^{-1/2} δ), (n/N)^{-1/2} δ),   (15)

where (φ, δ) are the parameters in the φ parametrization for μ₁ at the reference sample size N. It is clear that, as n → ∞, (15) drifts towards (φ, 0).

Although the drifting DGP (15) does not follow (14), the following corollary shows that the result of Theorem 3 continues to hold for (15). Moreover, the bootstrap discrepancy is the same to leading order for (15) and for the DGP that drifts from μ₁ for sample size N to μ₀ according to (14). Drifting DGPs for which the bootstrap discrepancy is the same to leading order will be called asymptotically equivalent.

Corollary. The bootstrap discrepancy is the same to leading order for the drifting DGP with φ parameters (φ, n^{-1/2}δ) for sample size n and for drifting DGPs for which the parameters are (φ + n^{-1/2} p_n, n^{-1/2}δ), if p_n tends to zero as n → ∞.

The question of what null DGP μ₀ is most appropriate for the level adjustment of either a bootstrap or an asymptotic test does not seem to have an unambiguous answer in general. If, for a given nominal level α, there exists a μ₀ ∈ M₀ which maximizes the RP of the test based on τ, then there are good arguments for basing level adjustment on this μ₀, in which case one can legitimately speak of "size adjustment." However, as pointed out by Horowitz and Savin (2000), such a μ₀ may not exist, or, if it does, it may be intractable to compute its parameters, or it may lead to a size-adjusted power no greater than the size. In addition, μ₀ will in general depend on α, μ₁, and n, thereby making such size adjustment essentially impossible in practice.


Study of the power of asymptotic tests is usually based on Monte Carlo experiments. As Horowitz and Savin (2000) point out, it is common for such studies to perform some sort of level adjustment, but most do so on the basis of an essentially arbitrary choice of the null DGP μ₀ used to generate critical values. Horowitz and Savin are critical of level adjustment in Monte Carlo experiments based on anything other than the μ₀ ∈ M₀ with parameters given by the plim of θ̂ under the DGP μ₁ for which power is to be studied. The thrust of their argument is that, since only the bootstrap offers any hope of performing level adjustment with any reasonable accuracy in practice, level adjustment in Monte Carlo experiments, to be meaningful, should in the large-sample limit coincide with the bootstrap level adjustment. This is the case for a parametric bootstrap based on θ̂ if μ₁ is thought of as a fixed DGP, since then the parameters of the bootstrap DGP converge as n → ∞ to those of μ₀.

It is illuminating to examine this argument in the light of the results of this paper.

Asymptotic analysis of power is not feasible with fixed nonnull DGPs, which is why we have considered drifting DGPs. But if all that is required of these is that they start at μ₁ for a given sample size, and drift to some DGP in the null, then the bootstrap DGP will also drift to that null DGP, which might therefore seem to be indicated for level adjustment. Such a conclusion would clearly be unsatisfactory.

The bootstrap is usually the best way to do level adjustment in practice. Therefore, if Monte Carlo experiments on level-adjusted power are to be informative, we should try to do level adjustment in experiments using a null DGP μ₀ that in some sense minimizes the bootstrap discrepancy for DGPs that drift to it from μ₁. In this way, one would minimize the difference between the rejection probability R(Q(α, μ₀), μ₁), which can be estimated with arbitrary accuracy by simulation for any given μ₁ and μ₀, and the RP of the bootstrap test. Finite-sample simulation results would then be as close as possible to the actual behavior of the bootstrap.

It is through the random variable q of (7) that the bootstrap discrepancy depends on μ₀. To leading order, the discrepancy is the expectation of q conditional on the statistic τ being at the margin of rejection. Although Theorem 2 shows that n^{(j+1)/2} q has asymptotic expectation of 0 under drifting DGPs of type (14), the conditional expectation is different from 0 unless n^{(j+1)/2} q and τ are asymptotically uncorrelated, and is not in general smaller for DGPs of type (14) than for other drifting DGPs.

If n^{(j+1)/2} q and τ are asymptotically uncorrelated, then, by Theorem 3, the bootstrap discrepancy is an order of magnitude smaller under (14) than under other drifting DGPs. However, the DGP μ₀ used for level adjustment is still dependent on the specific estimator θ̂ used to define the bootstrap DGP. If θ̂ is not asymptotically equivalent to the MLE for M₀, then μ₀ is not the null DGP that minimizes the KLIC from μ₁. It is still possible that τ is asymptotically independent of an asymptotically inefficient θ̂, in which case μ₀ minimizes the bootstrap discrepancy, and so should certainly be used in Monte Carlo experiments. There is, however, no unique choice of μ₀ that minimizes the discrepancy for all root-n consistent estimators θ̂. The story is clean only when θ̂ is the MLE for M₀, or is asymptotically equivalent to it. Then μ₀ is uniquely defined in a way that is independent of the parametrization of M₀, since the KLIC and the KLIC-minimizing DGP are parametrization independent.

ARTICLE IN PRESS

R. Davidson, J.G. MacKinnon / Journal of Econometrics 133 (2006) 421–441 431

Many applications of the bootstrap use, not the parametric bootstrap DGP we have considered, but a DGP that is at least partially nonparametric, based on some sort of resampling. Although we conjecture that much of the analysis of this section applies as well to the nonparametric bootstrap, there are technical difficulties in the way of proving this. For Theorem 1, these arise in connection with the LAN property (see the appendix) when DGPs in the null hypothesis are not uniquely characterized by a finite-dimensional parameter, and for Theorem 2, there are analogous difficulties with the LAE property. Beran (1997) makes use of an ingenious construction to sidestep the problem, and it seems likely that a similar technique would work here. For Theorem 3, the main difficulty is that asymptotic independence of $t$ and $\mu^*$ cannot, in general, be achieved simply by using a classical test statistic and an asymptotically efficient estimator under the null.

4. Approximate bootstrap rejection probabilities

The quantity $R(Q(\alpha,\mu_0),\mu)$, which is the power of an asymptotic test based on $t$ against the DGP $\mu$ at level $\alpha$ when level adjustment is based on the null DGP $\mu_0$, can be straightforwardly estimated by simulation. For each of $M$ replications, compute two test statistics, one of them generated by $\mu$ and the other by $\mu_0$. Find the critical value $t_c$ such that the rejection frequency in the $M$ replications under $\mu_0$ is $\alpha$; $t_c$ is our estimate of $Q(\alpha,\mu_0)$. $R(Q(\alpha,\mu_0),\mu)$ is then estimated by the rejection frequency under $\mu$ with critical value $t_c$. If desired, we can study how power depends on level using a "size-power curve," as suggested by Davidson and MacKinnon (1998).
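As a concrete sketch, the two-statistics-per-replication procedure just described might be coded as follows. Everything here is an illustrative assumption of ours, not part of the paper: the function names, the convention that larger statistics are more extreme, and the toy $\chi^2_1$ example used to exercise the routine.

```python
import numpy as np

rng = np.random.default_rng(42)

def estimate_adjusted_power(draw_stat_null, draw_stat_alt, alpha=0.05, M=100_000):
    """Estimate R(Q(alpha, mu0), mu) by simulation.

    draw_stat_null(rng) and draw_stat_alt(rng) each return one test statistic,
    generated under mu0 and under mu respectively (larger = more extreme).
    """
    t_null = np.array([draw_stat_null(rng) for _ in range(M)])
    t_alt = np.array([draw_stat_alt(rng) for _ in range(M)])
    # Critical value t_c: rejection frequency under mu0 equals alpha.
    t_c = np.quantile(t_null, 1.0 - alpha)      # estimate of Q(alpha, mu0)
    # Level-adjusted power: rejection frequency under mu at critical value t_c.
    return np.mean(t_alt > t_c)

# Toy illustration (our own example): statistic ~ chi^2(1) under mu0 and
# noncentral chi^2(1) with NCP lambda = 4 under mu, i.e. (Z + 2)^2.
power = estimate_adjusted_power(
    lambda r: r.standard_normal() ** 2,
    lambda r: (r.standard_normal() + 2.0) ** 2,
)
```

For this toy example the exact level-adjusted power is about 0.52, so the simulation estimate should land close to that value.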

Exactly this sort of simulation experiment was suggested by Horowitz (1994) to estimate expression (5), the power of a bootstrap test. However, this ignores the bootstrap discrepancy. The obvious, but computationally expensive, way to estimate (5) taking account of the bootstrap discrepancy is to generate $M$ sets of data, indexed by $m$, from $\mu$, and for each to compute a test statistic $t_m$ and a bootstrap DGP $\mu^*_m$. For each $m$, generate $B$ statistics from $\mu^*_m$ and find the critical value $Q(\alpha,\mu^*_m)$ such that a fraction $\alpha$ of the bootstrap statistics are more extreme than it. Then the estimate of (5) is

$$\frac{1}{M}\sum_{m=1}^{M} I\bigl(t_m \in \mathrm{Rej}(Q(\alpha,\mu^*_m))\bigr), \qquad (16)$$

the fraction of the $M$ replications for which the bootstrap statistic leads to rejection. Horowitz also performed some simulations of this sort.

The procedure just described, which, of course, does not require the statistic $t$ to have an approximate $U(0,1)$ distribution, involves the calculation of $M(B+1)$ test statistics. Since the power of bootstrap tests is increasing in $B$ (see Davidson and MacKinnon, 2000), we probably do not want to use $B$ less than about 399, in which case this procedure is roughly 200 times as expensive as the one described above for an asymptotic test.
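The brute-force estimator (16) can be sketched as follows. The model is our own toy stand-in, not the paper's logit application: a one-sided $t$-test of a zero mean in an $N(\mu, 1)$ sample, with a parametric bootstrap DGP $N(0, s^2)$ whose variance is estimated under the null. All names and defaults are illustrative.

```python
import numpy as np

def bootstrap_power(mu, n=50, M=2000, B=399, alpha=0.05, seed=0):
    """Brute-force estimate (16) of bootstrap test power against the DGP mu.

    Toy setup (our assumption): one-sided t-test of a zero mean in an
    N(mu, 1) sample; the bootstrap DGP is N(0, s2_null), with s2_null the
    variance estimate under the null (mean restricted to zero).
    Cost: M * (B + 1) test statistics.
    """
    rng = np.random.default_rng(seed)

    def tstat(x):
        return np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

    rejections = 0
    for _ in range(M):
        x = rng.normal(mu, 1.0, n)
        t_m = tstat(x)
        s2_null = np.mean(x**2)      # null-restricted variance estimate
        # B bootstrap samples from the bootstrap DGP mu*_m = N(0, s2_null)
        xb = rng.normal(0.0, np.sqrt(s2_null), (B, n))
        tb = np.sqrt(n) * xb.mean(axis=1) / xb.std(axis=1, ddof=1)
        # Reject when t_m exceeds the bootstrap critical value Q(alpha, mu*_m)
        if t_m > np.quantile(tb, 1.0 - alpha):
            rejections += 1
    return rejections / M
```

Note that the null-restricted variance estimate makes the bootstrap DGP play the role of the estimated null $\mu^*_m$: under non-null $\mu$ it is not the naive null, in the spirit of the pseudo-true DGP discussed above.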


We now propose a method for estimating the power of bootstrap tests that takes (approximate) account of the bootstrap discrepancy at computational cost similar to that required for the level-adjusted power of an asymptotic test. The conditions of Theorem 3 are assumed, namely, that the parameters of the null DGP $\mu_0$ are given by the plim of $\hat\theta$ under $\mu$, and that $t$ and $n^{(j+1)/2}q$ are asymptotically independent. The method has the further very considerable advantage that it does not require calculation of the parameters of $\mu_0$, which can be extremely difficult for many models. It is a slight modification of a method proposed in Davidson and MacKinnon (2001) for approximating the RP of a bootstrap test under the null.

From (11) it can be seen that the RP of a bootstrap test under $\mu$ is $R(Q(\alpha,\mu_0),\mu)$ plus the bootstrap discrepancy, which to leading order is just the expectation of $q$ under the asymptotic independence assumption. Thus, using (7), the definition of $q$, the RP of the bootstrap test is $E_\mu\bigl(R(Q(\alpha,\mu^*),\mu)\bigr)$ to leading order. Davidson and MacKinnon (2001) proposed estimating this quantity as follows. For each replication, compute $t_m$ and $\mu^*_m$ as before, but now generate just one bootstrap sample and use it to compute a single test statistic, $t^*_m$. Then calculate $Q^*(\alpha)$, the $\alpha$ quantile of the $t^*_m$. The approximate rejection probability is then

$$\widehat{RP}_A \equiv \frac{1}{M}\sum_{m=1}^{M} I\bigl(t_m \in \mathrm{Rej}(Q^*(\alpha))\bigr). \qquad (17)$$

The only difference between (16) and (17) is that the $t_m$ are compared to different estimated critical values inside the indicator functions. If the $t_m$ are independent of the $\mu^*_m$, it should not make any difference whether we estimate the $\alpha$ quantile $Q(\alpha,\mu^*_m)$ separately for each $m$, or use the $\alpha$ quantile $Q^*(\alpha)$ of all the $t^*_m$ taken together. How well (17) approximates (16) in finite samples depends on how close $t$ is to being independent of $\mu^*$. We do not claim that $\widehat{RP}_A$ works well in all circumstances. However, numerous simulation experiments, some of which are discussed in the next section, suggest that it often works very well in practice.

The amount of computation required to compute $\widehat{RP}_A$ is very similar to that required for an asymptotic test: once again, we have to calculate $2M$ test statistics. But the $\widehat{RP}_A$ procedure is often a good deal simpler, because there is no need to calculate the parameters of $\mu_0$ explicitly. When that is difficult, we can use Horowitz's (1994) argument in reverse, and claim that, with error no greater than the order of the bootstrap discrepancy, $\widehat{RP}_A$ estimates the level-adjusted power of the asymptotic test.
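The cheap estimator (17) differs from the brute-force one only in drawing a single bootstrap statistic per replication and pooling them for one critical value. A sketch in the same toy one-sided $t$-test setup assumed earlier (again, all names and the model are our illustrative choices):

```python
import numpy as np

def rp_approx(mu, n=50, M=5000, alpha=0.05, seed=1):
    """Approximate bootstrap RP via (17): one bootstrap draw per replication.

    Toy setup (our assumption): one-sided t-test of a zero mean in an
    N(mu, 1) sample, bootstrap DGP N(0, s2_null).
    Cost: 2M test statistics instead of M(B + 1).
    """
    rng = np.random.default_rng(seed)
    t = np.empty(M)
    t_star = np.empty(M)
    for m in range(M):
        x = rng.normal(mu, 1.0, n)
        t[m] = np.sqrt(n) * x.mean() / x.std(ddof=1)
        s2_null = np.mean(x**2)                      # null-restricted variance
        xb = rng.normal(0.0, np.sqrt(s2_null), n)    # a single bootstrap sample
        t_star[m] = np.sqrt(n) * xb.mean() / xb.std(ddof=1)
    # Pooled critical value Q*(alpha) from all the bootstrap statistics together
    q_star = np.quantile(t_star, 1.0 - alpha)
    return np.mean(t > q_star)
```

When $t$ and the bootstrap DGP are close to independent, the pooled quantile is a valid substitute for the replication-specific ones, which is exactly the argument in the text.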

5. Testing for omitted variables in a logit model

In this section, we present the results of several Monte Carlo experiments, which deal with Lagrange multiplier tests for omitted variables in the logit model. We chose to examine the logit model for several reasons: it is not a regression model; the results of Horowitz (1994) and Davidson and MacKinnon (1998) suggest that, for information matrix tests in the closely related probit model, bootstrapping may greatly improve the finite-sample properties of one form of the LM test; and, in contrast to many other models, it is easy to calculate the pseudo-true DGP for this one.

The logit model that we are dealing with may be written as

$$E(y_t \mid X_t, Z_t) = F(X_t\beta + Z_t\gamma) \equiv \bigl(1 + \exp(-X_t\beta - Z_t\gamma)\bigr)^{-1}, \qquad (18)$$

where $y_t$ is an observation on a 0–1 dependent variable, $X_t$ and $Z_t$ are, respectively, a $1\times k$ vector and a $1\times r$ vector of regressors, and $\beta$ and $\gamma$ are corresponding vectors of unknown parameters. The null hypothesis that $\gamma = 0$ may be tested in several ways. Two of the easiest are tests based on artificial regressions. The first of these is the outer product of the gradient, or OPG, variant of the LM test, and the second is the efficient score, or ES, variant. No sensible person would use the OPG variant in preference to the ES variant without bootstrapping, since the asymptotic form of the OPG variant has considerably worse finite-sample properties under the null (Davidson and MacKinnon, 1984). However, in the related context of information matrix tests for probit models, Horowitz (1994) found that the OPG variant worked well when bootstrapped, although he did not compare it with the ES variant.

Suppose that we estimate the logit model (18) under the null hypothesis to obtain restricted ML estimates $\tilde\beta$ and use them to calculate $\tilde F_t \equiv F(X_t\tilde\beta)$ and $\tilde f_t \equiv f(X_t\tilde\beta)$, where $f(\cdot)$ is the first derivative of $F(\cdot)$. Then the OPG test statistic is $n$ minus the sum of squared residuals from the artificial regression with typical observation

$$1 = \frac{\tilde f_t (y_t - \tilde F_t)}{\tilde F_t (1 - \tilde F_t)} \left( \sum_{i=1}^{k} X_{ti} b_i + \sum_{i=1}^{r} Z_{ti} g_i \right) + \text{residual}, \qquad (19)$$

and the ES test statistic is the explained sum of squares from the artificial regression with typical observation

$$\frac{y_t - \tilde F_t}{\bigl(\tilde F_t(1 - \tilde F_t)\bigr)^{1/2}} = \frac{\tilde f_t}{\bigl(\tilde F_t(1 - \tilde F_t)\bigr)^{1/2}} \left( \sum_{i=1}^{k} X_{ti} b_i + \sum_{i=1}^{r} Z_{ti} g_i \right) + \text{residual}. \qquad (20)$$

In regressions (19) and (20), the $b_i$ and $g_i$ are parameters to be estimated, and the first factors on the right-hand side are weights that multiply all the regressors.
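A minimal implementation of the two artificial regressions might look like the following sketch. The Newton routine for the restricted MLE, the helper names, and the use of least squares for the auxiliary regressions are our own implementation choices; for the logistic CDF, $f = F(1-F)$, so the OPG weight in (19) collapses to $y_t - \tilde F_t$.

```python
import numpy as np

def logit_lm_tests(y, X, Z):
    """OPG and ES LM statistics for omitted variables in a logit model,
    via the artificial regressions (19) and (20).

    Sketch under our own choices: restricted MLE by Newton's method;
    F(x) = 1/(1 + exp(-x)), whose derivative is f = F(1 - F).
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(50):                      # Newton iterations for beta_tilde
        F = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - F)
        H = (X * (F * (1.0 - F))[:, None]).T @ X
        step = np.linalg.solve(H, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    F = 1.0 / (1.0 + np.exp(-X @ beta))
    f = F * (1.0 - F)                        # logistic density at X beta_tilde
    W = np.hstack([X, Z])
    # OPG regression (19): regress 1 on the score contributions;
    # the statistic is n minus the sum of squared residuals.
    G = (f * (y - F) / (F * (1.0 - F)))[:, None] * W    # = (y - F) * W here
    resid = 1.0 - G @ np.linalg.lstsq(G, np.ones(n), rcond=None)[0]
    opg = n - resid @ resid
    # ES regression (20): weighted regressand and regressors;
    # the statistic is the explained sum of squares.
    w = np.sqrt(F * (1.0 - F))
    r_t = (y - F) / w
    R = (f / w)[:, None] * W
    fitted = R @ np.linalg.lstsq(R, r_t, rcond=None)[0]
    es = fitted @ fitted
    return opg, es
```

Both statistics are asymptotically $\chi^2_r$ under the null, with $r$ the number of columns of $Z$.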

In order to level-adjust the tests, it is necessary to compute the parameters of the pseudo-true DGP that corresponds to whatever nonnull DGP actually generated the data. If $\log h(y,\beta)$ denotes the loglikelihood evaluated at the parameters of a DGP in the null, and $\log g(y,\beta_1,\gamma_1)$ the loglikelihood at the parameters of the nonnull DGP, then the KLIC is

$$E\bigl(\log g(y,\beta_1,\gamma_1) - \log h(y,\beta)\bigr), \qquad (21)$$

where the expectation is computed under the nonnull DGP. The parameter vector for the pseudo-true DGP is the $\beta_0$ that minimizes (21) with respect to $\beta$. For the model (18), this is just the vector that maximizes the expectation of


$\log h(y,\beta)$, namely,

$$\sum_{t=1}^{n} \Bigl( F(X_t\beta_1 + Z_t\gamma_1)\log F(X_t\beta) + \bigl(1 - F(X_t\beta_1 + Z_t\gamma_1)\bigr)\log\bigl(1 - F(X_t\beta)\bigr) \Bigr). \qquad (22)$$

Here we have used the fact that $E(y_t \mid X_t, Z_t) = F(X_t\beta_1 + Z_t\gamma_1)$. The first-order conditions for maximizing expression (22) are

$$\sum_{t=1}^{n} \frac{f(X_t\beta_0)\, X_t}{F(X_t\beta_0)\bigl(1 - F(X_t\beta_0)\bigr)} \bigl(F(X_t\beta_0) - F(X_t\beta_1 + Z_t\gamma_1)\bigr) = 0. \qquad (23)$$

These equations give us the relationship between the nonnull DGP, which is characterized by $\beta = \beta_1$ and $\gamma = \gamma_1$, the pseudo-true DGP, which is characterized by $\beta = \beta_0$ and $\gamma = 0$, and what we may call the naive null DGP, which is characterized by $\beta = \beta_1$ and $\gamma = 0$. Given $\gamma_1$ and either $\beta_0$ from the pseudo-true DGP or $\beta_1$ from the naive null, we can solve Eqs. (23) for the other $\beta$ vector.
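Solving Eqs. (23) for $\beta_0$ is straightforward numerically: since (22) is the expected loglikelihood under the nonnull DGP, it is equivalent to a logit fit of the "probabilities" $p_t = F(X_t\beta_1 + Z_t\gamma_1)$ on $X_t$ alone. The Newton iteration and all names in this sketch are our own choices; for the logistic CDF, $f = F(1-F)$, so the weights in (23) cancel.

```python
import numpy as np

def pseudo_true_beta(beta1, gamma1, X, Z):
    """Solve Eqs. (23) for the pseudo-true parameters beta_0, given the
    nonnull DGP (beta_1, gamma_1).

    For the logistic CDF, f = F(1 - F), so (23) reduces to
    sum_t X_t (F(X_t beta_0) - p_t) = 0 with p_t = F(X_t beta_1 + Z_t gamma_1).
    Newton's method is our implementation choice.
    """
    F = lambda x: 1.0 / (1.0 + np.exp(-x))
    p = F(X @ beta1 + Z @ gamma1)            # E(y_t) under the nonnull DGP
    beta0 = np.asarray(beta1, dtype=float).copy()
    for _ in range(100):
        F0 = F(X @ beta0)
        grad = X.T @ (p - F0)                # Eqs. (23), up to sign
        H = (X * (F0 * (1.0 - F0))[:, None]).T @ X
        step = np.linalg.solve(H, grad)
        beta0 += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta0
```

With $\gamma_1 = 0$ the solution is $\beta_0 = \beta_1$, i.e. the pseudo-true and naive null DGPs coincide, exactly as the text says happens when the nonnull DGP is in the null.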

In our experiments, the vector $X_t$ consisted of a constant term and two regressors that were distributed as $N(0,1)$, and the vector $Z_t$ consisted of eight regressors that were also $N(0,1)$. The two non-constant regressors in $X_t$ were uncorrelated, and the regressors in $Z_t$ were correlated with one or the other of these, with correlation coefficient $\rho$. The number of regressors in $Z_t$ was chosen to be quite large in order to make the tests, especially the OPG test, work relatively poorly. In order to allow us to plot one-dimensional power functions, we set $\gamma_1 = \delta\iota$, where $\iota$ is a vector of 1s. Thus the only parameter of the DGP that we changed was $\delta$. Because the pseudo-true DGP depends on the regressors, we used a single set of regressors in all the experiments, and all our experimental results are conditional on them.

In the experiments we report, we held the pseudo-true DGP constant and used Eqs. (23) to vary $\beta_1$ as $\delta$, and hence $\gamma_1$, varied. This procedure makes it relatively inexpensive to level-adjust the bootstrap test, since there is only one pseudo-true DGP to be concerned with. In a second set of experiments, of more conventional design, we held the naive null DGP constant and varied $\beta_0$ as $\delta$ varied. Both sets of experiments yielded similar results, and we therefore do not report those from the second set.

In both sets of experiments, we set $\rho = -0.8$. By making $\rho$ fairly large in absolute value, we ensure that the pseudo-true and naive null DGPs will be quite different, except when the nonnull DGP is very close to the null hypothesis. Changing the sign of $\rho$ would change the results in a predictable way: the figures would be roughly mirror images of the ones presented here. In our experiments, the constant term in either $\beta_0$ or $\beta_1$ is set to 0, and the two slope coefficients are set to 1. Thus, under the null hypothesis, approximately half of the $y_t$ would be 0 and half would be 1. The sample size was 125. This relatively large sample size was used in order to avoid having more than a few replications for which the logit routine failed to converge. Nonconvergence, which is caused by perfect classifiers, is more of a problem for smaller sample sizes and for larger values of $\delta$.


Figs. 1 and 2 show estimated power functions for asymptotic and bootstrap tests at the 0.05 level, the former for the ES tests, and the latter for the OPG tests. These power functions are based on 200,000 replications for a large number of values of $\delta$ between $-0.8$ and $0.8$ at intervals of 0.025. The unadjusted power function (the solid line) shows the power of the asymptotic test at the nominal 0.05 level. The two adjusted power functions (the dotted lines) show adjusted test power, calculated in two different ways. For all values of $\delta$, the naive adjustment method uses test performance for $(\beta_1, 0)$ as a benchmark. In contrast, the pseudo-true adjustment method uses test performance for $(\beta_0, 0)$ as a benchmark. The power function estimated by the $\widehat{RP}_A$ procedure is shown as a dashed line.

Figs. 1 and 2 also show the results of nine Monte Carlo experiments for bootstrap tests. Each experiment involved 50,000 replications. We generated the data in exactly the same way as before, using the DGP with parameters $(\beta_1, \delta\iota)$, computed both test statistics, and then used the parametric bootstrap based on 399 bootstrap samples to estimate a $P$ value for each of them. The bootstrap DGP was simply the logit model with parameters $\beta = \tilde\beta$ and $\gamma = 0$, and the bootstrap test rejected the null hypothesis whenever the estimated $P$ value was less than 0.05. The bullets in the figures show the proportion of replications for which this procedure led to rejection. Fig. 2 also shows the level-adjusted power of the bootstrap OPG test, based on the pseudo-true DGP.

From Fig. 1, we see that the ES test works so well as an asymptotic test that there is no need to bootstrap it. There is essentially no difference between any of the power functions, which suggests that the ES test statistic is nearly pivotal in this case.

In contrast, from Fig. 2, we see that the OPG test statistic is far from pivotal. As the theory predicts, the $\widehat{RP}_A$ estimated power function is very similar to the one adjusted using the pseudo-true DGP. However, both of these are quite different from the power function adjusted using the naive null. This confirms that the issues raised in Section 3 are empirically relevant: the null DGP used for level adjustment can have a substantial effect, and it is the pseudo-true DGP which yields a power function close to that of the bootstrap test. Use of the naive null in fact leads to the misleading appearance of greater power for the OPG test for some negative values of $\delta$. It is misleading because the ES and OPG statistics are based on the same empirical scores, but the latter uses a noisier estimate of the information matrix, which should reduce its power.

The theory is also confirmed by the fact that, when the power of the bootstrap test is level-adjusted (because the test underrejects slightly), the correspondence of the power function with the $\widehat{RP}_A$ function is not as good as with no adjustment. This is as expected, since $\widehat{RP}_A$ estimates the nominal power of the bootstrap test.

[Fig. 1. Power functions for logit ES tests. Horizontal axis: $\delta$ from $-0.80$ to $0.80$; vertical axis: power from 0.00 to 1.00. Curves shown: unadjusted; adjusted, naive null; adjusted, pseudo-true null; $\widehat{RP}_A$; bootstrap (bullets).]

[Fig. 2. Power functions for logit OPG tests. Horizontal axis: $\delta$ from $-0.80$ to $0.80$; vertical axis: power from 0.00 to 1.00. Curves shown: unadjusted; adjusted, naive null; adjusted, pseudo-true null; $\widehat{RP}_A$; bootstrap; adjusted bootstrap.]

6. Conclusions

Level adjustment of the power of tests based on nonpivotal statistics yields results that depend on the DGP in the null hypothesis used to provide a critical value. For a given choice of this null DGP, we show that the power of a bootstrap test differs from the level-adjusted power of the asymptotic test on which it is based by an amount that we call the bootstrap discrepancy. This discrepancy is of the same order, in the sample size $n$, as the size distortion of the bootstrap test itself.

Since the bootstrap constitutes the best way to do level adjustment in practice, it makes sense to use critical values from a null DGP that minimizes the bootstrap discrepancy when doing level adjustment in simulation experiments. In this way, power as measured by simulation in finite samples is a good approximation to the power of a bootstrap test. The rate of convergence to zero of the bootstrap discrepancy as the sample size tends to infinity is analyzed in connection with different drifting DGPs, and we show that convergence is fastest when the test statistic is asymptotically independent of the bootstrap DGP and when a particular sort of drift towards a particular null DGP is used. This result extends to the analysis of power a previous result whereby the ERP of a bootstrap test is of lower order under asymptotic independence.

Level-adjusted power can be estimated efficiently by simulation if the appropriate null DGP for providing critical values can readily be characterized. However, this is often not the case. We propose a new approximate method that requires no such calculation and yields better estimates of the power of bootstrap tests by taking account of the bootstrap discrepancy.

Our theoretical results are confirmed and illustrated, for the case of tests for omitted variables in logit models, by simulation results which show that level adjustment of our preferred type leads to power estimates close to the power of bootstrap tests, while a cruder form of level adjustment may give quite different results.

Acknowledgements

This research was supported, in part, by grants from the Social Sciences and Humanities Research Council of Canada. We are grateful to Don Andrews, Joel Horowitz, two referees, the editors, and numerous seminar participants for comments on earlier versions.

Appendix

We state the regularity conditions required for the proof of Theorem 1. The first is an assumption about the $(k+1)$-dimensional model $M_1$ that contains the null hypothesis model $M_0$.

Assumption 1. The model $M_1$, parametrized by $\Theta \times \Delta$, is locally asymptotically normal (LAN) at all fixed DGPs $\mu \in M_1$.

Local asymptotic normality was introduced by Le Cam (1960). See also Beran (1997) for a more modern version of the definition. What is required is that, for any sample size $n$ and for all $\gamma = (\theta, \delta) \in \Theta \times \Delta$, the difference $\ell_n(a, \gamma)$ between the log of the joint density of the dependent variables under the DGP corresponding to parameters $\gamma + n^{-1/2}a_n$, where the sequence $\{a_n\}$ converges to $a$, and that under the DGP $\gamma$ (the loglikelihood ratio) should take the form

$$\ell_n(a, \gamma) = a^\top g_n(\gamma) - \tfrac{1}{2}\, a^\top I(\gamma)\, a + o_p(1)$$

for all $a \in \mathbb{R}^{k+1}$, where the $(k+1)\times(k+1)$ matrix $I(\gamma)$ is the information matrix at $\gamma$, and the $(k+1)$-dimensional random vectors $g_n(\gamma)$ are such that

$$g_n(\gamma + n^{-1/2}a_n) = g_n(\gamma) - I(\gamma)\, a + o_p(1).$$

In addition, the expectation of $g_n(\gamma)$ is zero under the DGP $\gamma$, and, as $n \to \infty$, it tends in distribution to $N(0, I(\gamma))$. As the name suggests, LAN models have the regularity needed for the usual properties of the MLE, including asymptotic normality.

Assumption 2. The estimator $\hat\theta$ is LAE at all fixed DGPs $\mu \in M_0$.

The definition of the LAE property is taken from Beran (1997). For a DGP $\mu_0 \in M_0$ with parameter vector $\theta_0$, consider a drifting DGP $\mu$ with parameters, in the $(\phi, \delta)$ reparametrization introduced in Section 3, given by the sequence $\{(\theta_0 + n^{-1/2}t_n,\; n^{-1/2}\delta_n)\}$, where $t_n$ converges to a fixed $k$-vector $t$, and $\delta_n$ converges to $\delta \in \mathbb{R}$. The $(\phi, \delta)$ parametrization is used because $\hat\theta$ is consistent for $\phi$ in the extended model $M_1$. The LAE property requires that the random vectors $n^{1/2}(\hat\theta - \theta_0 - n^{-1/2}t_n)$ converge in distribution under this drifting DGP to the asymptotic distribution of $n^{1/2}(\hat\theta - \theta_0)$ under $\mu_0$, namely, $N(0, I(\theta_0, 0))$.

The LAE property is a condition which guarantees the usual desirable properties of the parametric bootstrap distribution of the estimator $\hat\theta$ and excludes the possibility of bootstrap failure, as explained by Beran. It is likely that a weaker condition would suffice for our needs, where $\hat\theta$ itself is not bootstrapped but simply serves to define the bootstrap distribution.

In the next assumption, we extend the LAN property to cover the alternative hypothesis against which the test statistic $\tau$ has maximal power.

Assumption 3. The test statistic $\tau$ of which $t$ is the asymptotic $P$ value is a statistic in either standard normal or $\chi^2$ form, asymptotically equivalent to a classical test statistic (LR, LM, or Wald) of the hypothesis represented by the model $M_0$ against an alternative hypothesis represented by a $(k+r)$-dimensional LAN model $M_2$ that includes the model $M_0$ as a subset. Here $r$ is the number of degrees of freedom of the test. Under any DGP $\mu \in M_1$, the CDF $R(\alpha, \mu)$ of $t$ is a continuously differentiable function of $\alpha$.

This assumption allows us to make use of results in Davidson and MacKinnon (1987), where it is shown that, under weak regularity conditions ensured by the LAN property, test statistics in standard normal or $\chi^2$ form can always be associated with a model like $M_2$, for which they are asymptotically equivalent to a classical test of $M_0$ against $M_2$.

Our final assumption is needed in order to be able to speak concretely about rates of convergence.


Assumption 4. For sample size $n$, the critical value function $Q(\alpha, \mu)$, defined in Eq. (2), can be expressed for all DGPs $\mu \in M_0$ as

$$Q(\alpha, \mu) = \alpha + n^{-j/2}\, g(\alpha, \mu), \qquad (24)$$

where $j$ is a positive integer, and the function $g$ is $O(1)$ as $n \to \infty$ and continuously differentiable with respect to the parameters of the DGP $\mu$.

Since we assume that the statistic $t$ is expressed in asymptotic $P$ value form, its asymptotic distribution is $U(0,1)$ for all DGPs in $M_0$. It follows that, for $\mu \in M_0$, $Q(\alpha, \mu) = \alpha + o(1)$. The relation (24) specifies the actual rate of convergence to zero of the remainder term.

Assumption 4 is precisely the assumption made by Beran (1988) in the analysis of the RP of bootstrap tests under the null. Actually, Beran makes the assumption about the function we denote as $R(\alpha, \mu)$, but, since $R$ and $Q$ are inverse functions, the assumption can equivalently be made about one or the other. It would have been possible to devise more primitive conditions that, along with the other assumptions, would imply Assumption 4, but the clarity of the latter seems preferable.

Proof of Theorem 1. By Assumption 3, the test statistic $\tau$ of which $t$ is the asymptotic $P$ value has a noncentral $\chi^2$ asymptotic distribution under DGPs in $M_1$ that drift to $M_0$; this is the conclusion of the Theorem on page 1317 of Davidson and MacKinnon (1987). This distribution is completely characterized by the number $r$ of degrees of freedom of the test and a scalar noncentrality parameter (NCP) $\lambda$ that depends on the drifting DGP. Thus, for such a DGP $\mu$, $R(\alpha, \mu)$ tends as $n \to \infty$ to $P(\alpha, \lambda)$, the probability mass in the tail of the $\chi^2_r(\lambda)$ distribution beyond the critical value for a test at level $\alpha$ as defined by the central $\chi^2_r$ distribution.

If $\mu$ is a fixed DGP in $M_0$, then, by Assumption 4,

$$R(\alpha, \mu) = \alpha + n^{-j/2}\, r(\alpha, \mu)$$

for some $O(1)$ function $r(\alpha, \mu)$ that is continuously differentiable with respect to $\alpha$ by Assumption 3. Thus the sequence $\{t_n\}$ of test statistics for finite sample sizes $n$ converges to a random variable with distribution $U(0,1)$ under DGPs $\mu \in M_0$.

The model $M_1$ that contains the drifting DGPs of interest to us is LAN, by Assumption 1. A consequence of this is that the probability measures defined on $[0,1]$ by the sequence $\{t_n\}$ under a DGP $\mu_0 \in M_0$ are contiguous to those defined by $\{t_n\}$ under a DGP that drifts to $\mu_0$; see Roussas (1972, Chapter 1) for a discussion of contiguity. Consequently, the sequence $\{t_n\}$ converges to the same limiting random variable under $\mu_0$ and under DGPs that drift to $\mu_0$. By the argument in the first paragraph of the proof, under drifting DGPs with NCP $\lambda$, this variable has CDF $P(\alpha, \lambda)$.

By a slight abuse of notation, we write $Q(\alpha, \theta)$ for $Q(\alpha, \mu)$ when $\mu \in M_0$ and $\theta$ is the parameter vector associated with $\mu$, and similarly for $g(\alpha, \theta)$. Recall that the old ($\theta$) and new ($\phi$) parametrizations coincide on $M_0$. Since the bootstrap DGP $\mu^*$ is in $M_0$ and is characterized by the parameter vector $\hat\theta$, we have that $Q(\alpha, \mu^*) = \alpha + n^{-j/2}\, g(\alpha, \hat\theta)$. Then, from the definition (7), since the fixed DGP $\mu_0$ is also in $M_0$, the random variable $q$ can be expressed as

$$q = R\bigl(\alpha + n^{-j/2} g(\alpha, \hat\theta),\, \mu\bigr) - R\bigl(\alpha + n^{-j/2} g(\alpha, \theta_0),\, \mu\bigr). \qquad (25)$$

Since the function $R(\cdot, \mu)$ is continuously differentiable with respect to $\alpha$, we may perform a Taylor expansion of (25) to obtain

$$q = n^{-j/2}\bigl(P'(\alpha, \lambda)\bigl(g(\alpha, \hat\theta) - g(\alpha, \theta_0)\bigr) + o_p(1)\bigr),$$

where $\lambda$ is the NCP for $\mu$, and $P'(\alpha, \lambda)$ is the derivative of $P(\alpha, \lambda)$ with respect to $\alpha$. Since $g(\alpha, \theta)$ is continuously differentiable with respect to $\theta$ by Assumption 4, Taylor's Theorem gives

$$g(\alpha, \hat\theta) - g(\alpha, \theta_0) = D_\theta g(\alpha, \theta_0)(\hat\theta - \theta_0) + o_p(n^{-1/2}).$$

It follows that $n^{(j+1)/2} q$ is a linear combination of the components of $n^{1/2}(\hat\theta - \theta_0)$ plus a variable that tends to zero in $\mu_0$-probability and, by contiguity, also in $\mu$-probability. By Assumption 2, $n^{1/2}(\hat\theta - \theta_0)$ has an asymptotically normal distribution, with finite variance, with mean zero under $\mu_0$ and finite mean under $\mu$.

The statistic $\tau$, if in $\chi^2$ form, is a quadratic form in $r$ asymptotically normal variables, with finite means and variances, that have an asymptotically normal distribution jointly with $n^{1/2}(\hat\theta - \theta_0)$; again see Davidson and MacKinnon (1987) for details. If $r = 1$, $\tau$ is itself an asymptotically normal variable. Thus, to leading order asymptotically under the drifting DGP $\mu$, the joint distribution of the $r$ variables used to construct $\tau$ and of $n^{(j+1)/2} q$ is multivariate normal. It follows that the distribution of $n^{(j+1)/2} q$ conditional on $\tau$, and so also on $t$ and on $p$, which are deterministic functions of $\tau$, is asymptotically normal with finite mean and variance.

Let the CDF of $n^{(j+1)/2} q$ conditional on $p$ be denoted as $G(z \mid p)$. As $n \to \infty$, this tends to a normal CDF with finite mean and variance under the drifting DGP $\mu$. By performing the change of variable $x = n^{-(j+1)/2} z$ in the expression for the bootstrap discrepancy given by (11), it can be seen that the discrepancy is

$$n^{-(j+1)/2} \int_{-\infty}^{\infty} z \, \mathrm{d}G\bigl(z \mid R + n^{-(j+1)/2} z\bigr),$$

which is of order $n^{-(j+1)/2}$ under both the drifting DGP $\mu$ and the fixed null DGP $\mu_0$. This completes the proof. $\square$

Proof of Theorem 2. Since the $\theta$ and $\phi$ parametrizations coincide on $M_0$, the $\phi$ in (14) can be identified with the parameters $\theta_0$ of the null DGP to which (14) drifts. Asymptotic normality of $n^{1/2}(\hat\theta - \theta_0)$ was already shown in the proof of Theorem 1. That its asymptotic distribution is the same under the drifting DGP (14) as under the fixed DGP to which it drifts is then an immediate consequence of Assumption 2.

It was shown in the proof of Theorem 1 that, to leading order, $n^{(j+1)/2} q$ is a linear combination of the components of $n^{1/2}(\hat\theta - \theta_0)$. It follows that $n^{(j+1)/2} q$ is asymptotically normal with expectation zero under (14). $\square$

Proof of Theorem 3. The theorem supposes that, for any $\mu_0 \in M_0$ with parameters $\theta_0$, the statistic $\tau$ and $n^{1/2}(\hat\theta - \theta_0)$ are independent under their joint asymptotic distribution. This independence holds also under DGPs that drift to $M_0$, since, by contiguity, the joint asymptotic distribution of $n^{1/2}(\hat\theta - \theta_0)$ and the $r$ asymptotically normal variables on which $\tau$ depends differs under drifting DGPs from what it is under DGPs in $M_0$ only in its expectation, not its covariance matrix.

If the conditional CDF $F(q \mid p)$ is independent of $p$ to leading order, then, from (11), the bootstrap discrepancy is to that order just the asymptotic expectation of $q$. The conclusion of the theorem now follows immediately from Theorem 2. $\square$

Proof of Corollary. The bootstrap discrepancy is determined by the joint distribution of $t$ and $q$, and to leading order by the joint asymptotic distribution of $t$ and $n^{(j+1)/2} q$. The latter is determined by the joint asymptotic distribution of $\tau$ and $n^{1/2}(\hat\theta - \theta_0)$, which, by the LAE property, is the same for all drifting DGPs with parameters $(\phi + n^{-1/2} p_n,\; n^{-1/2}\delta)$ provided that $p_n$ converges to zero. $\square$

References

Abramovitch, L., Singh, K., 1985. Edgeworth corrected pivotal statistics and the bootstrap. Annals of Statistics 13, 116–132.

Beran, R., 1988. Prepivoting test statistics: a bootstrap view of asymptotic refinements. Journal of the American Statistical Association 83, 687–697.

Beran, R., 1997. Diagnosing bootstrap success. Annals of the Institute of Statistical Mathematics 49, 1–24.

Davidson, R., MacKinnon, J.G., 1984. Convenient specification tests for logit and probit models. Journal of Econometrics 25, 241–262.

Davidson, R., MacKinnon, J.G., 1987. Implicit alternatives and the local power of test statistics. Econometrica 55, 1305–1329.

Davidson, R., MacKinnon, J.G., 1993. Estimation and Inference in Econometrics. Oxford University Press, New York.

Davidson, R., MacKinnon, J.G., 1998. Graphical methods for investigating the size and power of hypothesis tests. The Manchester School 66, 1–26.

Davidson, R., MacKinnon, J.G., 1999. The size distortion of bootstrap tests. Econometric Theory 15, 361–376.

Davidson, R., MacKinnon, J.G., 2000. Bootstrap tests: how many bootstraps? Econometric Reviews 19, 55–68.

Davidson, R., MacKinnon, J.G., 2001. Improving the reliability of bootstrap tests. Queen's Institute for Economic Research, Discussion Paper no. 995, revised.

Hall, P., 1988. Theoretical comparison of bootstrap confidence intervals. Annals of Statistics 16, 927–953.

Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.

Horowitz, J.L., 1994. Bootstrap-based critical values for the information matrix test. Journal of Econometrics 61, 395–411.

Horowitz, J.L., 1997. Bootstrap methods in econometrics: theory and numerical performance. In: Kreps, D.M., Wallis, K.F. (Eds.), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress. Cambridge University Press, Cambridge.

Horowitz, J.L., Savin, N.E., 2000. Empirically relevant critical values for hypothesis tests. Journal of Econometrics 95, 375–389.

Le Cam, L., 1960. Locally asymptotically normal families of distributions. University of California Publications in Statistics 3, 27–98.

Roussas, G.G., 1972. Contiguity of Probability Measures. Cambridge University Press, Cambridge.

White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50, 1–26.