
Chapter 3. GMM: Selected Topics

Contents

1 Optimal Instruments
  1.1 The issue of interest
  1.2 Optimal Instruments under the i.i.d. assumption
    1.2.1 The basic result
    1.2.2 Illustrative examples
  1.3 Optimal Instruments under the martingale difference assumption
  1.4 Optimal Instruments under general dependence

2 Finite sample properties
  2.1 Introduction
  2.2 The size of GMM-based Wald tests
  2.3 Bootstrap under the martingale difference assumption
    2.3.1 Bootstrapping the GMM estimator
    2.3.2 Bootstrapping the test statistics
  2.4 Bootstrap under general serial dependence

3 Weak identification: a pitfall and some ways around
  3.1 Introduction
  3.2 Some consequences of weak instruments in linear models: a scary regression
  3.3 Robust Inference with weak instruments in linear models
  3.4 Robust Inference in nonlinear models


1. Optimal Instruments

1.1. The issue of interest

When applying GMM in macroeconomics, a typical but crucial step is to transform conditional restrictions into unconditional ones.

Specifically, suppose a model delivers the following conditional moment restrictions:
$$E(d_t(\theta_0) \mid z_t) = 0, \quad t = 1, 2, \ldots \tag{1}$$
where $d_t(\theta_0)$ and $z_t$ are finite dimensional random vectors. (Here we have simplified the matter by restricting $z_t$ to be finite dimensional.) This implies that, for any measurable and integrable function $f(\cdot)$, we always have
$$E\left(f(z_t)\,d_t(\theta_0)\right) = 0, \quad t = 1, 2, \ldots \tag{2}$$
We end up with an infinite number of valid instruments or, equivalently, an infinite number of unconditional moment restrictions.

Question: which of these instruments should we use if the goal is to minimize the asymptotic variance of the GMM estimator?

1.2. Optimal Instruments under the i.i.d. assumption

1.2.1. The basic result

• Assumption 1: $(d_t(\theta_0)', z_t')$ are independently and identically distributed over $t = 1, \ldots, T$.

Proposition 1. Assume (1) and Assumption 1 hold. Suppose $\theta_0$ is an unknown $q \times 1$ parameter vector.

1. An optimal choice of the instruments (i.e., minimizing the asymptotic variance among all GMM estimators) in (2) is given by
$$z_t^* = K\,E\!\left(\frac{\partial d_t(\theta_0)'}{\partial \theta}\,\Big|\,z_t\right)\Sigma^{*-1}, \tag{3}$$
where $K$ is any $q \times q$ nonsingular matrix of finite constants, and
$$\Sigma^* = E\!\left(d_t(\theta_0)\,d_t(\theta_0)' \mid z_t\right). \tag{4}$$

2. The resulting GMM estimator solves
$$\frac{1}{T}\sum_{t=1}^{T} z_t^*\,d_t(\hat\theta^*) = 0,$$
whose asymptotic covariance matrix is given by
$$V^* = \left[E\left\{E\!\left(\frac{\partial d_t(\theta_0)'}{\partial \theta}\,\Big|\,z_t\right)\Sigma^{*-1}E\!\left(\frac{\partial d_t(\theta_0)}{\partial \theta'}\,\Big|\,z_t\right)\right\}\right]^{-1}.$$

1.2.2. Illustrative examples

Example 1. Consider the linear model
$$y_t = x_t\theta + u_t,$$
where $x_t$ is a scalar random variable with $E(x_t u_t) \neq 0$. Suppose $z_t$ is a set of instruments satisfying
$$E(u_t \mid z_t) = 0, \qquad \mathrm{var}(u_t \mid z_t) = \sigma^2,$$
and $(y_t, x_t, z_t)$ are i.i.d. Then, applying (3) and (4),
$$z_t^* = k\,E\!\left(\frac{\partial (y_t - x_t\theta)}{\partial \theta}\,\Big|\,z_t\right)\frac{1}{\sigma^2} = -\frac{k}{\sigma^2}E(x_t \mid z_t),$$
where $k$ is an arbitrary constant. Since $k$ can take any value, we are free to set $k = -\sigma^2$, in which case the optimal instrument reduces to
$$z_t^* = E(x_t \mid z_t).$$

Now we generalize the model to allow for heteroskedasticity, i.e., we assume
$$\mathrm{var}(u_t \mid z_t) = \sigma_t^2,$$
but leave all other aspects of the specification the same. In this case, the optimal instrument takes the form
$$z_t^* = k\,E\!\left(\frac{\partial (y_t - x_t\theta)}{\partial \theta}\,\Big|\,z_t\right)\frac{1}{\sigma_t^2} = -\frac{k}{\sigma_t^2}E(x_t \mid z_t).$$
In this case, it is not possible to eliminate $\sigma_t^2$ by a judicious choice of $k$, although we can set $k = -1$ to remove the minus sign. This gives
$$z_t^* = \frac{1}{\sigma_t^2}E(x_t \mid z_t).$$

Example 2. Consider the following linear two-equation system:
$$y_{1,t} = x_{1,t}\theta_1 + u_{1,t},$$
$$y_{2,t} = x_{2,t}\theta_2 + u_{2,t},$$
where $x_{1,t}$ and $x_{2,t}$ are scalar random variables. Let
$$u_t = (u_{1,t}, u_{2,t})'.$$
Suppose $z_t$ is a set of variables satisfying
$$E(u_t \mid z_t) = 0, \qquad \mathrm{var}(u_t \mid z_t) = \Sigma,$$
and $(y_t, x_t, z_t)$ are i.i.d. Then, applying (3) and (4),
$$z_t^* = K\,E\!\left(\begin{bmatrix}\frac{\partial (y_{1,t} - x_{1,t}\theta_1)}{\partial \theta_1} & 0\\[4pt] 0 & \frac{\partial (y_{2,t} - x_{2,t}\theta_2)}{\partial \theta_2}\end{bmatrix}\,\Big|\,z_t\right)\Sigma^{-1} = -K\begin{bmatrix}E(x_{1,t} \mid z_t) & 0\\ 0 & E(x_{2,t} \mid z_t)\end{bmatrix}\Sigma^{-1},$$
where $K$ is a matrix of constants. In this case, the matrix $\Sigma^{-1}$ weights $E(x_{1,t} \mid z_t)$ and $E(x_{2,t} \mid z_t)$ to account for the correlation between the two equations.

While the Proposition characterizes the optimal instruments, it does not fully resolve the problem of instrument selection. The function $z_t^*$ depends on $E\left(\partial d_t(\theta_0)'/\partial\theta \mid z_t\right)$ and, in most cases, on $\Sigma^*$ as well (see the above examples); neither of these functions is typically part of the specification of the underlying economic/statistical model. One natural solution is to estimate the components $E\left(\partial d_t(\theta_0)'/\partial\theta \mid z_t\right)$ and $\Sigma^*$ from the data. In some cases this works in a straightforward manner, while in general it is complicated. We now present one example in which the proposal works.

Example 3. Consider again the linear model studied in the first example, with the further assumption that $x_t$ is generated by a linear model
$$x_t = z_t'\pi + v_t. \tag{5}$$
With this specification, an optimal instrument is
$$E(x_t \mid z_t) = z_t'\pi.$$
Hence, the optimal GMM estimator solves
$$\sum_{t=1}^{T} z_t'\pi\,(y_t - x_t\theta) = 0.$$
Since $\pi$ is unknown, we replace it by the OLS estimate based on equation (5):
$$\hat\pi = (Z'Z)^{-1}Z'X.$$
Substituting in, we solve
$$\sum_{t=1}^{T} z_t'\hat\pi\,(y_t - x_t\hat\theta) = 0.$$
Explicitly,
$$\hat\theta = \left(\sum_{t=1}^{T} x_t z_t'\hat\pi\right)^{-1}\sum_{t=1}^{T} y_t z_t'\hat\pi = (X'P_Z X)^{-1}X'P_Z y,$$
where $P_Z = Z(Z'Z)^{-1}Z'$. This is precisely the two-stage least squares (2SLS) estimator. Therefore, the 2SLS estimator can be interpreted as the feasible optimal GMM estimator within this model.
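The equivalence just derived is easy to verify numerically. Below is a minimal simulation sketch (not part of the original notes; the DGP and all variable names are illustrative) showing that the feasible optimal GMM estimator built from the fitted first stage coincides with the textbook 2SLS formula.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 500, 3

# Simulate first stage (5): x_t = z_t' pi + v_t, with corr(u, v) != 0 so x is endogenous.
Z = rng.normal(size=(T, k))
pi = np.array([1.0, 0.5, -0.5])
u = rng.normal(size=T)
v = 0.8 * u + rng.normal(size=T)
x = Z @ pi + v
theta0 = 2.0
y = theta0 * x + u

# Feasible optimal instrument: z_t' pihat, with pihat the OLS estimate from (5).
pihat = np.linalg.solve(Z.T @ Z, Z.T @ x)
zstar = Z @ pihat
theta_gmm = (zstar @ y) / (zstar @ x)   # solves sum_t (z_t' pihat)(y_t - x_t theta) = 0

# Textbook 2SLS: (X' P_Z X)^{-1} X' P_Z y, with P_Z = Z (Z'Z)^{-1} Z'.
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
theta_2sls = (x @ PZ @ y) / (x @ PZ @ x)

print(theta_gmm, theta_2sls)            # agree up to floating point error
```

The two estimates agree exactly because the fitted instrument $Z\hat\pi$ equals $P_Z X$, so both formulas solve the same equation.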


In the preceding example, the construction of the optimal instruments rests crucially on the assumption that $E(x_t \mid z_t)$ is linear. This specification may be natural in some contexts, such as the linear simultaneous equations model, but may not be so appropriate in others. In the general case, we can use nonparametric methods to estimate $E(x_t \mid z_t)$, e.g., approximate the conditional expectation by a polynomial. For further discussion in this direction, see Newey (1990, 1993).
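As a rough illustration of the nonparametric route (our own sketch, not Newey's procedure; the DGP and names are assumptions for the example), one can approximate $E(x_t \mid z_t)$ by a low-order polynomial in a scalar instrument and use the fitted values as instruments:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
z = rng.normal(size=T)
u = rng.normal(size=T)
v = 0.6 * u + rng.normal(size=T)
x = np.sin(z) + 0.5 * z**2 + v              # E(x|z) is nonlinear in z
y = 1.5 * x + u

# Series first stage: regress x on (1, z, z^2, z^3) to approximate E(x|z).
W = np.vander(z, N=4, increasing=True)
coef = np.linalg.lstsq(W, x, rcond=None)[0]
xhat = W @ coef

theta_hat = (xhat @ y) / (xhat @ x)         # IV using the series-fitted instrument
print(theta_hat)                            # close to 1.5 in large samples
```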

1.3. Optimal Instruments under the martingale difference assumption

We now relax Assumption 1 and present a similar result for dynamic models. Suppose the solution of a model delivers the following moment restrictions:
$$E(d_t(\theta_0) \mid I_{t-1}) = 0, \tag{6}$$
where $I_{t-1}$ is the information set at $t-1$.

Corollary 1. Assume (6) holds. Suppose $\theta_0$ is of dimension $q$.

1. Then an optimal choice of instruments in (2) is given by
$$z_t^* = K\,E\!\left(\frac{\partial d_t(\theta_0)'}{\partial \theta}\,\Big|\,I_{t-1}\right)\Sigma^{*-1},$$
where
$$\Sigma^* = E\!\left(d_t(\theta_0)\,d_t(\theta_0)' \mid I_{t-1}\right),$$
and $K$ is any $q \times q$ nonsingular matrix of finite constants.

2. This choice leads to a GMM estimator with asymptotic variance matrix
$$V^* = \left[E\left\{E\!\left(\frac{\partial d_t(\theta_0)'}{\partial \theta}\,\Big|\,I_{t-1}\right)\Sigma^{*-1}E\!\left(\frac{\partial d_t(\theta_0)}{\partial \theta'}\,\Big|\,I_{t-1}\right)\right\}\right]^{-1}.$$

1.4. Optimal Instruments under general dependence

Once serial correlation is introduced, which is the case if we have moment conditions
$$E(d_t(\theta_0) \mid I_{t-m}) = 0 \quad \text{with } m > 1,$$
then the form of the optimal instrument will change, although the basic idea remains the same. The result is of limited use from a practical point of view, so we omit the details. You can refer to Hall (2005, pp. 247-251).

2. Finite sample properties

2.1. Introduction

Our discussions so far have been asymptotic in nature. In practice, we always face a finite sample. The following two issues are therefore of particular importance:

1. Finite sample properties of the GMM estimator, e.g., finite sample bias and MSE;

2. Finite sample properties of GMM-based inference procedures.

Here we focus on the second issue. We will first examine the finite sample properties of the Wald tests when asymptotic critical values are used for inference. Then, we will discuss a bootstrap procedure, which can improve the inference under some circumstances.

2.2. The size of GMM-based Wald tests

The discussion below is based on the simulation analysis in Burnside and Eichenbaum (1996, henceforth BE). They asked the following questions:

1. Does the size of the tests closely approximate their asymptotic size?

2. Do joint tests of several restrictions perform as well as, or worse than, tests of simple hypotheses, and what is responsible for size distortions?

3. How can modelling assumptions, or restrictions imposed by the hypotheses themselves, be used to improve the performance of these tests?

4. What practical advice can be given to the practitioner?


BE considered two simulation experiments. In the first, the data are generated by Gaussian vector white noise, and in the second the DGP is taken from Burnside and Eichenbaum (1994). The findings from the two experiments are similar; we focus on the first experiment due to its simplicity.

• DGP: $X_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, with $n = 20$, $T = 100$, $\sigma_1^2 = \cdots = \sigma_n^2 = 1$.

• Parameters: The econometrician knows $E(X_{it}) = 0$ and is interested in estimating $\sigma_i^2 \equiv \mathrm{Var}(X_{it})$.

• Moment Conditions: $E(X_{it}^2 - \sigma_i^2) = 0$, $i = 1, \ldots, n$.

• GMM estimates: $\hat\sigma_i = \left(T^{-1}\sum_{t=1}^{T} X_{it}^2\right)^{1/2}$.

• Hypotheses of interest: $H_M: \sigma_1 = \cdots = \sigma_M = 1$, $M \le n$. BE considered $M \in \{1, 2, 5, 10, 20\}$.

• Wald tests:
$$W_{MT} = T(\hat\sigma - 1)'A'\left(A\hat V_T A'\right)^{-1}A(\hat\sigma - 1), \tag{7}$$
where $A = (I_M \;\; 0_{M \times (n-M)})$, $\hat\sigma = (\hat\sigma_1, \ldots, \hat\sigma_n)'$, and $\hat V_T$ denotes a generic estimator of the asymptotic variance-covariance matrix of $\sqrt{T}(\hat\sigma - 1)$, i.e.,
$$\lim_{T\to\infty}\hat V_T = \left(G_0'S_0^{-1}G_0\right)^{-1}.$$
Note that the $i$-th element of $G_0$ is
$$E\,\frac{\partial(X_{it}^2 - \sigma_i^2)}{\partial \sigma_i} = -2\sigma_i,$$
and the $ij$-th element of $S_0$ is
$$E(X_{it}^2 - \sigma_i^2)(X_{jt}^2 - \sigma_j^2).$$
Also note that
$$W_{MT} \to_d \chi_M^2 \quad \text{under } H_M.$$

• Alternative covariance matrix estimators $\hat V_T$:

1. Allow the data to be dependent, and estimate $S_0$ using the Newey and West (1987) estimator with bandwidth $B_T = 4$;

2. Allow the data to be dependent, and estimate $S_0$ using the Newey and West (1987) estimator with bandwidth $B_T = 2$;

3. Allow the data to be dependent, and estimate $S_0$ using the Newey and West (1987) estimator, with the bandwidth determined using Andrews (1991);

4. Exploit the assumption that the data are serially uncorrelated. Thus, the $ij$-th element of $S_0$ is estimated by $T^{-1}\sum_{t=1}^{T}(X_{it}^2 - \hat\sigma_i^2)(X_{jt}^2 - \hat\sigma_j^2)$;

5. Exploit the assumption that the data are serially uncorrelated and mutually independent. The $ii$-th element of $S_0$ is estimated by $T^{-1}\sum_{t=1}^{T}(X_{it}^2 - \hat\sigma_i^2)^2$; the off-diagonal elements are zero;

6. Impose Gaussianity. The $ii$-th element of $S_0$ is estimated by $2\hat\sigma_i^4$; the off-diagonal elements are zero;

7. Impose the null hypotheses on $S_0$. The $ii$-th element of $S_0$ is 2 for $i \le n$; the off-diagonal elements are zero;

8. Impose the null hypotheses on $S_0$ and $G_0$. The $ii$-th element of $S_0$ is 2 for $i \le n$; the off-diagonal elements are zero. The $i$-th element of $G_0$ is $-2$ for $i \le n$.

The results are reported in Table 1.
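Since Table 1 itself is not reproduced here, the following Monte Carlo sketch (our own, using estimator 5 from the list above; all names are illustrative) gives a feel for the finding: the joint test with $M = 20$ typically over-rejects relative to the nominal 5% level.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, T, M, reps = 20, 100, 20, 2000
crit = chi2.ppf(0.95, df=M)

rej = 0
for _ in range(reps):
    X = rng.normal(size=(n, T))                  # sigma_i^2 = 1, so H_M holds
    s2 = (X**2).mean(axis=1)
    sig = np.sqrt(s2)
    S = ((X**2 - s2[:, None])**2).mean(axis=1)   # estimator 5: diagonal of S_0
    G = -2.0 * sig                               # i-th element of G_0
    V = S / G**2                                 # diagonal of (G_0' S_0^{-1} G_0)^{-1}
    W = T * np.sum((sig[:M] - 1.0)**2 / V[:M])   # Wald statistic (7) with A = (I_M 0)
    rej += W > crit

print("empirical size:", rej / reps, "(nominal 0.05)")
```

Because everything is diagonal here, the Wald statistic reduces to a sum of squared studentized deviations, which keeps the sketch short.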

Hence, the conclusions and suggestions to practitioners are:

1. The small sample size of the Wald tests tends to exceed the asymptotic size. The problem becomes dramatically worse as the dimension of the joint tests being considered increases;

2. The bulk of the problem has to do with the difficulty of estimating the variance-covariance matrix $S_0$. In their second simulation experiment, BE further document that the bias in estimating $m_T(\theta_0)$ and the correlation between $m_T(\hat\theta)$ and $\hat S_T$ are not the main contributors to the size distortions;


3. In practice, to improve the size properties, it is useful to impose a priori information when estimating $S_0$. Two important sources of such information are the economic theory being investigated and the null hypothesis being tested.

2.3. Bootstrap under the martingale difference assumption

The bootstrap is an alternative way to approximate the sampling distribution of an estimator or a test statistic.

Suppose we have the following moment restrictions:
$$E(m(X_t, \theta_0)) = 0, \quad t = 1, 2, \ldots, T,$$
and assume $m(X_t, \theta_0)$ is serially uncorrelated.

We now show how to use the bootstrap to approximate the sampling distribution of the two-step GMM estimator and related test statistics.

2.3.1. Bootstrapping the GMM estimator

Recall
$$\hat\theta = \arg\min_\theta\left(\frac{1}{T}\sum_{t=1}^{T}m(X_t, \theta)\right)'S_T(\hat\theta_1)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}m(X_t, \theta)\right),$$
where
$$S_T(\hat\theta_1) = \frac{1}{T}\sum_{t=1}^{T}m(X_t, \hat\theta_1)\,m(X_t, \hat\theta_1)'$$
and $\hat\theta_1$ is some preliminary GMM estimator, say, the GMM estimator using an identity weighting matrix.

Then the bootstrap approximation to the sampling distribution of $\hat\theta$ can be obtained as follows.

• Step 1: Draw a sample of size $T$ with replacement from the observed sample $\{X_1, \ldots, X_T\}$; denote the sample of draws as $\{X_1^*, \ldots, X_T^*\}$.

• Step 2: Compute the GMM estimator using the random sample, i.e.,
$$\hat\theta^* = \arg\min_\theta\left(\frac{1}{T}\sum_{t=1}^{T}m^*(X_t^*, \theta)\right)'S_T^*(\hat\theta_1^*)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}m^*(X_t^*, \theta)\right),$$
where
$$m^*(X_t^*, \theta) = m(X_t^*, \theta) - \frac{1}{T}\sum_{t=1}^{T}m(X_t, \hat\theta),$$
$$S_T^*(\theta) = \frac{1}{T}\sum_{t=1}^{T}m^*(X_t^*, \theta)\,m^*(X_t^*, \theta)',$$
and $\hat\theta_1^*$ is some preliminary GMM estimator, say, the GMM estimator with an identity weighting matrix.

• Step 3: Repeat Steps 1 and 2 many times (say $B$ times) to obtain a set of estimates; call them $\hat\theta^{*(1)}, \ldots, \hat\theta^{*(B)}$. We then use the distribution of $\sqrt{T}\left(\hat\theta^{*(j)} - \hat\theta\right)$ as an approximation to the sampling distribution of $\sqrt{T}(\hat\theta - \theta_0)$.
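A compact sketch of Steps 1-3 (our own illustration, with a deliberately simple overidentified moment vector for a scalar mean parameter; the DGP and names are assumptions, not part of the notes):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
T, B, mu0 = 200, 499, 1.0
X = rng.normal(mu0, 1.0, size=T)

def m(x, mu):
    # Illustrative overidentified moments: E(X - mu) = 0 and E(X^2 - mu^2 - 1) = 0.
    return np.column_stack([x - mu, x**2 - mu**2 - 1.0])

def two_step(x, recenter=0.0):
    def obj(mu, W):
        mbar = (m(x, mu) - recenter).mean(axis=0)
        return mbar @ W @ mbar
    # First step: preliminary estimator with an identity weighting matrix.
    mu1 = minimize_scalar(lambda mu: obj(mu, np.eye(2)), bounds=(-5, 5), method="bounded").x
    g = m(x, mu1) - recenter
    W = np.linalg.inv(g.T @ g / len(x))          # inverse of S_T(mu1)
    # Second step: optimal weighting.
    return minimize_scalar(lambda mu: obj(mu, W), bounds=(-5, 5), method="bounded").x

muhat = two_step(X)
center = m(X, muhat).mean(axis=0)                # recentering term (1/T) sum_t m(X_t, muhat)

draws = np.empty(B)
for b in range(B):
    Xb = rng.choice(X, size=T, replace=True)     # Step 1: resample with replacement
    draws[b] = two_step(Xb, recenter=center)     # Step 2: bootstrap estimator with m*
# Step 3: the law of sqrt(T)(muhat* - muhat) approximates that of sqrt(T)(muhat - mu0).
print(np.percentile(np.sqrt(T) * (draws - muhat), [2.5, 97.5]))
```

The recentering in Step 2 matters only in overidentified settings such as this one; in a just-identified problem the sample moments are exactly zero at $\hat\theta$ and the correction vanishes.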

2.3.2. Bootstrapping the test statistics

We can approximate the sampling distributions of commonly used test statistics using the bootstrap. We focus on the t, the Wald, and the J statistics.

The t-statistic. Recall that the t-statistic for testing that the $r$-th component of $\theta_0$ equals some constant is given by
$$\frac{\sqrt{T}(\hat\theta - \theta_0)_r}{\sqrt{V_T(\hat\theta)_{r,r}}}, \tag{8}$$
where $(\hat\theta - \theta_0)_r$ denotes the $r$-th component of $\hat\theta - \theta_0$ and $V_T(\hat\theta)_{r,r}$ denotes the $(r,r)$-th component of $V_T$, with
$$V_T(\hat\theta) = \left[G_T(\hat\theta)'S_T(\hat\theta)^{-1}G_T(\hat\theta)\right]^{-1}. \tag{9}$$
The distribution of (8) can be approximated by the empirical distribution of
$$\frac{\sqrt{T}(\hat\theta^* - \hat\theta)_r}{\sqrt{V_T(\hat\theta^*)_{r,r}}},$$
where the formula for $V_T(\hat\theta^*)_{r,r}$ is given in (9), with $\hat\theta$ replaced by $\hat\theta^*$.

The Wald statistic. (I will leave this as an exercise.) As another exercise, bootstrap the statistic (7) and compare with Table 1 (d to f only).

The J statistic. Simply compute
$$J^* = T\left(\frac{1}{T}\sum_{t=1}^{T}m^*(X_t^*, \hat\theta^*)\right)'S_T^*(\hat\theta_1^*)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}m^*(X_t^*, \hat\theta^*)\right)$$
and repeat. Use the resulting empirical distribution to approximate the distribution of $J$.

2.4. Bootstrap under general serial dependence

The extension to this case is not straightforward. The available procedures are complicated and do not work very satisfactorily in practice. Interested readers can consult Hall, P., and Horowitz, J. L. (1996), "Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators," Econometrica, 64, 891-916.

3. Weak identification: a pitfall and some ways around

3.1. Introduction

Recall that for the linear instrumental variable regression to work, we need a set of instruments that are both valid (uncorrelated with the errors) and relevant (correlated with the endogenous regressors).

For GMM, we say instruments $z_t$ are valid (or exogenous) if they satisfy the moment restrictions $E(z_t d_t(\theta_0)) = 0$. The requirement of "instrument relevance" is replaced by "identification", which is satisfied if
$$E(z_t d_t(\theta)) \neq 0 \quad \text{for } \theta \neq \theta_0.$$
"Weak instruments" or "weak identification" will arise if the instruments are only weakly correlated with the included endogenous variables. This poses considerable challenges to inference using GMM and IV methods.

Below, we

1. Use a linear model to illustrate such consequences;

2. Discuss how to conduct inference for linear models with potentially weak instruments;

3. Briefly discuss how to conduct inference for nonlinear models with weak identification.

3.2. Some consequences of weak instruments in linear models: a scary regression

Many papers have been written trying to measure the return to years of education. The setting usually involves estimating a wage equation of the form
$$y_i = Y_i\beta + x_{2,i}'\gamma + \varepsilon_i,$$
where $y_i$ is some measure of the income of individual $i$, $Y_i$ is years of education, and $x_{2,i}$ is a vector of covariates. The difficulty is that the educational attainment $Y_i$ is endogenous. A popular solution is to use instruments that generate variation in $Y_i$ but are otherwise uncorrelated with $y_i$.

Angrist and Krueger (1991) is a famous example. They argued that the quarter of birth is a valid instrument. The idea is that the compulsory school attendance law requires a student to start first grade in the fall of the calendar year in which he or she turns age 6 and to continue attending school until he or she turns 16. Thus an individual born in the early months of the year will usually enter first grade when he or she is close to age 7 and will reach age 16 in the middle of tenth grade. An individual born in the third or fourth quarter will typically start school either just before or just after turning age 6 and will finish tenth grade before reaching age 16. They presented several tabulations showing that individuals born in the early months of the year on average have fewer years of education. They estimated the wage equation and concluded that educational attainment has a significant effect on earnings, with a magnitude similar to the one estimated with OLS.

While it can be argued that the quarter of birth itself may have a direct effect on earnings, a more relevant concern is that the relationship between quarter of birth and educational attainment may be very weak. Bound, Jaeger and Baker (1995) show this may well be the case (see their Table 1). Hence, the question is how misleading the results can be in such a situation.

To address this problem, Bound, Jaeger and Baker (1995) did something very clever (following a suggestion of Krueger). They replaced each individual's real quarter of birth by a fake quarter of birth, randomly generated by a computer. What they found was amazing: it did not matter whether one used the real quarter of birth or the fake one as the instrument; 2SLS gave basically the same answer! The detailed results are reported in their paper.

The intuition behind the results can be illustrated by a simple example:¹
$$y_i = x_i\beta + \varepsilon_i,$$
$$x_i = z_i\gamma + v_i,$$
where the $z_i$ are fixed and $v_i$ is a zero mean error term. For simplicity, assume the variables $\varepsilon_i$ and $v_i$ are normally distributed and that $\varepsilon_i$ is independent of $v_i$. We have
$$\hat\beta_{IV} - \beta_0 = \frac{\sum_{i=1}^{n}z_i\varepsilon_i}{\sum_{i=1}^{n}z_ix_i}.$$
Now, suppose the instrument and the endogenous variable are not correlated, i.e., $\gamma = 0$. Then, for a given $n$,
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_i\varepsilon_i \sim N_1 = N\left(0, E(z_i^2\varepsilon_i^2)\right),$$
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_ix_i = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_iv_i \sim N_2 = N\left(0, E(z_i^2v_i^2)\right),$$
and therefore
$$\hat\beta_{IV} - \beta_0 \sim \frac{N_1}{N_2}.$$
The above result holds for any $n$; it also holds when $n \to \infty$. The distribution is drastically different from the standard normal approximation, and standard inference is invalid. In particular, in the presence of identification failure, $\sqrt{n}(\hat\beta_{IV} - \beta_0)$ diverges, hence severe over-rejection of the null hypothesis will occur if standard critical values are used.

¹The example is taken from Hansen (2006), with some changes.
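The over-rejection is easy to see in a small simulation (our own construction, in the spirit of the fake-instrument exercise; the DGP and names are illustrative): with $\gamma = 0$ and substantial endogeneity, the usual IV t-test rejects far more often than its nominal 5% level.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 5000
beta0, gamma = 0.0, 0.0                 # gamma = 0: the instrument is irrelevant

t_stats = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=n)
    eps = rng.normal(size=n)
    v = 0.8 * eps + 0.6 * rng.normal(size=n)   # corr(eps, v) > 0: x is endogenous
    x = gamma * z + v
    y = beta0 * x + eps
    b_iv = (z @ y) / (z @ x)
    resid = y - b_iv * x
    se = np.sqrt(np.sum(z**2 * resid**2)) / np.abs(z @ x)  # robust IV standard error
    t_stats[r] = (b_iv - beta0) / se

# Under standard asymptotics this should be close to 0.05; here it is typically far larger.
print("rejection rate of |t| > 1.96:", np.mean(np.abs(t_stats) > 1.96))
```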


3.3. Robust Inference with weak instruments in linear models

A partial solution to the above problem is to use test statistics that are not sensitive to the strength of the instruments (this excludes the t and F statistics). Three statistics have attracted wide attention: the Anderson-Rubin (AR) statistic, Kleibergen's LM statistic, and Moreira's conditional likelihood ratio statistic. Asymptotically, these statistics all have null distributions that do not depend on the strength of the instruments.

We consider a linear regression model with a single endogenous regressor and no included exogenous variables,
$$y = Y\beta + u$$
with
$$Y = Z\Pi + v,$$
where $y$ and $Y$ are $T \times 1$ vectors of observations on the endogenous variables and $Z$ is a $T \times k$ matrix of instruments. It is useful to define the following two quantities:
$$S = \frac{(Z'Z)^{-1/2}Z'\underline{Y}b_0}{\sqrt{b_0'\Omega b_0}} \tag{10}$$
and
$$T = \frac{(Z'Z)^{-1/2}Z'\underline{Y}\Omega^{-1}a_0}{\sqrt{a_0'\Omega^{-1}a_0}}, \tag{11}$$
where
$$\underline{Y} = [y, Y], \quad b_0 = [1, -\beta_0]', \quad a_0 = [\beta_0, 1]',$$
and $\Omega$ is the variance of the reduced form errors. (11) is a sufficient statistic for $\Pi$. Let $\hat T$ and $\hat S$ denote $T$ and $S$ evaluated with $\hat\Omega = \underline{Y}'M_Z\underline{Y}/(T-k)$ replacing $\Omega$.

The Anderson-Rubin statistic. Anderson and Rubin (1949) proposed testing the null hypothesis $\beta = \beta_0$ using the statistic
$$AR(\beta_0) = \frac{(y - Y\beta_0)'P_Z(y - Y\beta_0)/k}{(y - Y\beta_0)'M_Z(y - Y\beta_0)/(T-k)} = \frac{\hat S'\hat S}{k}.$$
With fixed instruments and normal errors, the quadratic forms in the numerator and denominator are independent chi-squared random variables under the null hypothesis, and $AR(\beta_0)$ has an exact $F_{k,T-k}$ null distribution. Dropping the Gaussian assumption, we have
$$AR(\beta_0) \to_d \chi_k^2/k.$$
Because the numerator and denominator of the Anderson-Rubin statistic are evaluated at the true parameter value, it has an asymptotic chi-square distribution even if the unknown parameters are poorly identified.

Kleibergen's Statistic. Kleibergen (2002) proposed the statistic
$$K(\beta_0) = \frac{\left(\hat S'\hat T\right)^2}{\hat T'\hat T}.$$
If $k = 1$, then $K(\beta_0) = AR(\beta_0)$. Kleibergen showed that under either conventional or weak-instrument asymptotics, $K(\beta_0)$ has $\chi_1^2$ as its null distribution.

Moreira's Statistic. Moreira (2003) proposed testing $\beta = \beta_0$ using the conditional likelihood ratio test statistic
$$M(\beta_0) = \frac{1}{2}\left(\hat S'\hat S - \hat T'\hat T + \sqrt{\left(\hat S'\hat S + \hat T'\hat T\right)^2 - 4\left[\left(\hat S'\hat S\right)\left(\hat T'\hat T\right) - \left(\hat S'\hat T\right)^2\right]}\right).$$
The (weak instruments) asymptotic distribution of $M(\beta_0)$ is non-standard. However, conditional on the value of $\hat T$, it does not depend on the strength of the instruments, and the null distribution can be obtained by Monte Carlo simulation.

Remark 1. Due to the duality between hypothesis tests and confidence sets, these tests can be used to construct confidence sets robust to weak instruments. For example, a fully robust 95% confidence set can be constructed as the set of $\beta_0$ for which the AR statistic, $AR(\beta_0)$, fails to reject at the 5% significance level.
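As an illustration of Remark 1, here is a minimal sketch (ours; the DGP, grid, and names are illustrative assumptions) that inverts the AR test over a grid of candidate values of $\beta_0$:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
T, k = 200, 4
Z = rng.normal(size=(T, k))
Pi = np.full(k, 0.2)                        # a fairly weak first stage
u = rng.normal(size=T)
v = 0.7 * u + rng.normal(size=T)
Y = Z @ Pi + v
beta0 = 1.0
y = beta0 * Y + u

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)

def AR(b):
    # AR(b) = [e' P_Z e / k] / [e' M_Z e / (T - k)] with e = y - Y b.
    e = y - Y * b
    quad = e @ PZ @ e
    return (quad / k) / ((e @ e - quad) / (T - k))

crit = f.ppf(0.95, k, T - k)                # exact F critical value
grid = np.linspace(-3.0, 5.0, 801)
conf_set = grid[np.array([AR(b) <= crit for b in grid])]
if conf_set.size:                           # the set can be wide, even unbounded, with weak instruments
    print("AR 95% confidence set on the grid: [%.2f, %.2f]" % (conf_set.min(), conf_set.max()))
```

With weak instruments the inverted set is often much wider than a conventional Wald interval, which is exactly the honest reflection of the lack of identification.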

3.4. Robust Inference in nonlinear models

Nonlinear Anderson-Rubin Statistic. Recall that because the numerator and denominator of the Anderson-Rubin statistic are evaluated at the true parameter value, it has an asymptotic chi-square distribution even if the unknown parameters are poorly identified. This observation suggests tests of $\theta = \theta_0$ based on the nonlinear analog of the AR statistic, which is the so-called continuous-updating GMM objective function, in which the weight matrix is evaluated at the same parameter value as the numerator:
$$J_{CU}(\theta_0) = \left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}m(X_t, \theta_0)\right)'S(\theta_0)^{-1}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}m(X_t, \theta_0)\right).$$
If there is no serial correlation, then
$$S(\theta_0) = \frac{1}{T}\sum_{t=1}^{T}\tilde m(X_t, \theta_0)\,\tilde m(X_t, \theta_0)', \quad \text{with } \tilde m(X_t, \theta_0) = m(X_t, \theta_0) - T^{-1}\sum_{t=1}^{T}m(X_t, \theta_0).$$
If $m(X_t, \theta_0)$ is serially correlated, then $S(\theta_0)$ is replaced by an estimate of the long run variance using some kernel based method. Under the null hypothesis, $J_{CU}(\theta_0)$ has a $\chi_K^2$ limiting distribution, where $K$ is the number of moment restrictions.

Notice that we need to re-center $m(X_t, \theta_0)$ when estimating $S(\theta_0)$; otherwise $S(\theta_0)$ diverges under the alternative hypothesis and the test does not have power.
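A short sketch (ours; the moment function is just an example, not part of the notes) computing $J_{CU}(\theta_0)$ with the re-centered variance estimate and comparing it with the $\chi_K^2$ critical value:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
T, theta0 = 200, 1.0
X = rng.normal(theta0, 1.0, size=T)

def J_CU(theta, x):
    m = np.column_stack([x - theta, x**2 - theta**2 - 1.0])  # K = 2 moments
    mbar = m.mean(axis=0)
    mt = m - mbar                       # re-center before estimating S(theta)
    S = mt.T @ mt / len(x)              # no serial correlation assumed
    g = np.sqrt(len(x)) * mbar
    return g @ np.linalg.solve(S, g)

stat = J_CU(theta0, X)
print(stat, chi2.ppf(0.95, df=2))       # reject theta = theta0 if stat exceeds the critical value
```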

Kleibergen's Statistic. Kleibergen (2005) proposed testing the hypothesis $\theta = \theta_0$ using a generalization of $K(\beta_0)$ and showed that the proposed statistic has a chi-square limiting distribution. You can refer to his paper for details.


References

[1] Anderson, T. W., and Rubin, H. (1949), "Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations," Annals of Mathematical Statistics, 20, 46-63.

[2] Andrews, D. W. K. (1999), "Consistent Moment Selection Procedures for Generalized Method of Moments Estimation," Econometrica, 67(3), 543-564.

[3] Angrist, J. D., and Krueger, A. B. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?," Quarterly Journal of Economics, 106, 979-1014.

[4] Bound, J., Jaeger, D. A., and Baker, R. (1995), "Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variables Is Weak," Journal of the American Statistical Association, 90, 443-450.

[5] Burnside, C., and Eichenbaum, M. (1996), "Small-Sample Properties of GMM-Based Wald Tests," Journal of Business and Economic Statistics, 14, 294-308.

[6] Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.

[7] Hall, A. R. (2005), Generalized Method of Moments, Oxford University Press.

[8] Kleibergen, F. (2002), "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70(5), 1781-1803.

[9] Kleibergen, F. (2005), "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73(4), 1103-1123.

[10] Moreira, M. J. (2003), "A Conditional Likelihood Ratio Test for Structural Models," Econometrica, 71(4), 1027-1048.

[11] Newey, W. K. (1990), "Efficient Instrumental Variables Estimation of Nonlinear Models," Econometrica, 58, 809-837.

[12] Newey, W. K. (1993), "Efficient Estimation of Models with Conditional Moment Restrictions," in G. S. Maddala, C. R. Rao, and H. D. Vinod, eds., Handbook of Statistics, Volume 11: Econometrics. Amsterdam: North-Holland.

[13] Stock, J. H., Wright, J. H., and Yogo, M. (2002), "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business & Economic Statistics, 20, 518-529.


Appendix

Proof of Proposition 1. Let $\hat\theta^*$ denote the GMM estimator using $z_t^*$ as instruments and $V^*$ its asymptotic variance. Let $\hat\theta$ denote an alternative GMM estimator using $x_t$ as instruments, where $x_t = f(z_t)$ for some vector-valued function $f(\cdot)$, and let $V$ denote the asymptotic variance of $\hat\theta$. It suffices to show that $(V - V^*)$ is a positive semi-definite matrix.

Write
$$\hat\theta = \hat\theta^* + (\hat\theta - \hat\theta^*).$$
Then,
$$\mathrm{Var}(\sqrt{T}\hat\theta) = \mathrm{Var}(\sqrt{T}\hat\theta^*) + \mathrm{Var}(\sqrt{T}(\hat\theta - \hat\theta^*)) + \mathrm{Cov}\left(\sqrt{T}\hat\theta^*, \sqrt{T}[\hat\theta - \hat\theta^*]\right) + \mathrm{Cov}\left(\sqrt{T}[\hat\theta - \hat\theta^*], \sqrt{T}\hat\theta^*\right).$$
Therefore,
$$\mathrm{Var}(\sqrt{T}\hat\theta) - \mathrm{Var}(\sqrt{T}\hat\theta^*) = \mathrm{Var}(\sqrt{T}(\hat\theta - \hat\theta^*)) + \mathrm{Cov}\left(\sqrt{T}\hat\theta^*, \sqrt{T}[\hat\theta - \hat\theta^*]\right) + \mathrm{Cov}\left(\sqrt{T}[\hat\theta - \hat\theta^*], \sqrt{T}\hat\theta^*\right).$$
Because the first term on the right hand side is positive semi-definite, the proof will be complete if we can show
$$\lim_{T\to\infty}\mathrm{Cov}(\sqrt{T}\hat\theta^*, \sqrt{T}(\hat\theta - \hat\theta^*)) = 0,$$
or, equivalently,
$$\lim_{T\to\infty}\mathrm{Cov}(\sqrt{T}\hat\theta^*, \sqrt{T}\hat\theta) = \lim_{T\to\infty}\mathrm{Var}(\sqrt{T}\hat\theta^*). \tag{A.1}$$

To establish (A.1), explicit formulae for $\hat\theta^*$ and $\hat\theta$ are needed. First consider $\hat\theta^*$. It satisfies
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}z_t^*d_t(\hat\theta^*) = 0.$$
Taking a first order Taylor expansion around the true value $\theta_0$,
$$0 = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}z_t^*d_t(\theta_0) + \left(\frac{1}{T}\sum_{t=1}^{T}z_t^*\frac{\partial d_t(\theta_0)}{\partial\theta'}\right)\sqrt{T}(\hat\theta^* - \theta_0) + o_p(1).$$
Because
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}z_t^*\frac{\partial d_t(\theta_0)}{\partial\theta'} = E\left(z_t^*\frac{\partial d_t(\theta_0)}{\partial\theta'}\right) = E\left(E\left(z_t^*\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)\right) = K\,E\left(E\left(\frac{\partial d_t(\theta_0)'}{\partial\theta}\,\Big|\,z_t\right)\Sigma^{*-1}E\left(\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)\right) \equiv KD^*,$$
where we have defined
$$D^* = E\left(E\left(\frac{\partial d_t(\theta_0)'}{\partial\theta}\,\Big|\,z_t\right)\Sigma^{*-1}E\left(\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)\right),$$
we have
$$\sqrt{T}(\hat\theta^* - \theta_0) = -D^{*-1}K^{-1}\,T^{-1/2}\sum_{t=1}^{T}z_t^*d_t(\theta_0) + o_p(1).$$
Therefore,
$$\lim_{T\to\infty}\mathrm{Var}(\sqrt{T}(\hat\theta^* - \theta_0)) = D^{*-1}K^{-1}\,\mathrm{Var}\left(z_t^*d_t(\theta_0)\right)(K')^{-1}D^{*-1} = D^{*-1}.$$
Note that the last equality follows because
$$\mathrm{Var}\left(z_t^*d_t(\theta_0)\right) = \mathrm{Var}\left(K\,E\left(\frac{\partial d_t(\theta_0)'}{\partial\theta}\,\Big|\,z_t\right)\Sigma^{*-1}d_t(\theta_0)\right) = K\,E\left(E\left(\frac{\partial d_t(\theta_0)'}{\partial\theta}\,\Big|\,z_t\right)\Sigma^{*-1}d_t(\theta_0)\,d_t(\theta_0)'\,\Sigma^{*-1}E\left(\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)\right)K' = K\,E\left(E\left(\frac{\partial d_t(\theta_0)'}{\partial\theta}\,\Big|\,z_t\right)\Sigma^{*-1}E\left(\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)\right)K' = KD^*K'.$$

Now consider $\hat\theta$ and apply similar arguments. We have
$$\hat\theta = \arg\min_\theta\left(\frac{1}{T}\sum_{t=1}^{T}x_td_t(\theta)\right)'\hat S^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}x_td_t(\theta)\right), \tag{A.2}$$
where $\hat S$ is a consistent estimate of $S_0$, so that the weighting matrix converges to the optimal choice $S_0^{-1}$, with
$$S_0 = \mathrm{Var}(x_td_t(\theta_0)).$$
The first order condition of (A.2) implies
$$\left(\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\hat\theta)}{\partial\theta'}\right)'\hat S^{-1}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}x_td_t(\hat\theta)\right) = 0.$$
Taking a first order Taylor expansion of $T^{-1}\sum_{t=1}^{T}x_td_t(\hat\theta)$ around the true value $\theta_0$, we have
$$0 = \left(\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\hat\theta)}{\partial\theta'}\right)'\hat S^{-1}\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}x_td_t(\theta_0)\right) + \left(\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\hat\theta)}{\partial\theta'}\right)'\hat S^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\theta_0)}{\partial\theta'}\right)\sqrt{T}(\hat\theta - \theta_0) + o_p(1).$$
Because $\hat\theta \to_p \theta_0$, we have
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\hat\theta)}{\partial\theta'} = \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}x_t\frac{\partial d_t(\theta_0)}{\partial\theta'} \equiv D$$
and
$$\lim_{T\to\infty}\hat S^{-1} = S_0^{-1}.$$
Therefore,
$$\sqrt{T}(\hat\theta - \theta_0) = -(D'S_0^{-1}D)^{-1}D'S_0^{-1}\,T^{-1/2}\sum_{t=1}^{T}x_td_t(\theta_0) + o_p(1).$$
And
$$\lim_{T\to\infty}\mathrm{Cov}(\sqrt{T}(\hat\theta - \theta_0), \sqrt{T}(\hat\theta^* - \theta_0)) = \lim_{T\to\infty}E\left((D'S_0^{-1}D)^{-1}D'S_0^{-1}\,\frac{1}{T}\sum_{t=1}^{T}x_td_t(\theta_0)\left(\sum_{t=1}^{T}z_t^*d_t(\theta_0)\right)'(K^{-1})'D^{*-1}\right),$$
which, due to the i.i.d. assumption, equals
$$E\left((D'S_0^{-1}D)^{-1}D'S_0^{-1}\,x_td_t(\theta_0)\,d_t(\theta_0)'z_t^{*\prime}\,(K^{-1})'D^{*-1}\right) = E\left((D'S_0^{-1}D)^{-1}D'S_0^{-1}\,x_t\,E\left(d_t(\theta_0)d_t(\theta_0)' \mid z_t\right)z_t^{*\prime}\,(K^{-1})'D^{*-1}\right). \tag{A.3}$$
For the term in the middle,
$$x_t\,E\left(d_t(\theta_0)d_t(\theta_0)' \mid z_t\right)z_t^{*\prime} = x_t\,\Sigma^*\Sigma^{*-1}E\left(\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)K' = E\left(x_t\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)K'.$$
Hence, (A.3) equals
$$E\left((D'S_0^{-1}D)^{-1}D'S_0^{-1}E\left(x_t\frac{\partial d_t(\theta_0)}{\partial\theta'}\,\Big|\,z_t\right)K'(K^{-1})'D^{*-1}\right) = (D'S_0^{-1}D)^{-1}D'S_0^{-1}E\left(x_t\frac{\partial d_t(\theta_0)}{\partial\theta'}\right)D^{*-1} = (D'S_0^{-1}D)^{-1}D'S_0^{-1}D\,D^{*-1} = D^{*-1},$$
which equals $\lim_{T\to\infty}\mathrm{Var}(\sqrt{T}\hat\theta^*)$. This establishes (A.1) and completes the proof.