
8/6/2019 Eco No Metric Analysis of Cross Section and Panel Data

http://slidepdf.com/reader/full/eco-no-metric-analysis-of-cross-section-and-panel-data 1/5

Econometric Analysis of Cross Section and Panel Data-Wooldridge, J. (2002)

4 The Single-Equation Linear Model and OLS Estimation

4.1 Overview of the Single-Equation Linear Model

•  Goldberger (1972) defines a structural model as one representing a causal relationship, as opposed to a relationship that simply captures statistical associations. The error term $u$ can consist of a variety of things, including omitted variables and measurement error. The parameters $\beta_j$ hopefully correspond to the parameters of interest, that is, the parameters in an underlying structural model.

•  An explanatory variable $x_j$ is said to be endogenous in equation (4.1) if it is correlated with $u$. You should not rely too much on the meaning of "endogenous" from other branches of economics. In traditional usage, a variable is endogenous if it is determined within the context of a model. The usage in econometrics, while related to traditional definitions, is broader: it describes any situation where an explanatory variable is correlated with the disturbance. If $x_j$ is uncorrelated with $u$, then $x_j$ is said to be exogenous in equation (4.1).

•  In applied econometrics, endogeneity usually arises in one of three ways:

Omitted Variables

Measurement Error

Simultaneity

The distinctions among the three possible forms of endogeneity are not always sharp. In fact, an equation can have more than one source of endogeneity. For an illuminating discussion of the three kinds of endogeneity as they arise in a particular field, see Deaton's (1995) survey chapter on econometric issues in development economics.
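The omitted-variables case can be illustrated with a small simulation. This is a minimal sketch, not taken from the text: the data-generating process, the coefficient values, and the name `omitted_variable_demo` are all assumptions chosen for illustration.

```python
import numpy as np

def omitted_variable_demo(n=100_000, seed=0):
    """Compare OLS slopes with and without an omitted factor q (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n)              # omitted variable
    x = 0.8 * q + rng.normal(size=n)    # regressor correlated with q
    u = 1.5 * q + rng.normal(size=n)    # error term absorbs q, so Cov(x, u) != 0
    y = 1.0 + 2.0 * x + u               # true slope on x is 2.0

    X_short = np.column_stack([np.ones(n), x])      # q left in the error
    X_long = np.column_stack([np.ones(n), x, q])    # q controlled for
    b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
    b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]
    return b_short[1], b_long[1]

slope_short, slope_long = omitted_variable_demo()
print(f"slope with q omitted:  {slope_short:.3f}")
print(f"slope with q included: {slope_long:.3f}")
```

In this design the slope from the short regression converges to 2 + Cov(x, u)/Var(x) = 2 + 1.2/1.64, roughly 2.73, while the regression that includes q recovers a slope near 2.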

•  As with Assumption OLS.1, Assumption OLS.2 is an assumption about the population. Since $E(x'x)$ is a symmetric $K \times K$ matrix, Assumption OLS.2 is equivalent to assuming that $E(x'x)$ is positive definite. Since $x_1 \equiv 1$, Assumption OLS.2 is also equivalent to saying that the (population) variance matrix of the $K-1$ nonconstant elements in $x$ is nonsingular. This is a standard assumption, which


fails if and only if at least one of the regressors can be written as a linear function of the other regressors (in the population). Under Assumptions OLS.1 and OLS.2, the parameter vector $\beta$ is identified. In the context of models that are linear in the parameters under random sampling, identification of $\beta$ simply means that $\beta$ can be written in terms of population moments in observable variables.
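The identification claim can be made explicit with a short derivation. The sketch below assumes the model is written in vector form as $y = x\beta + u$, with $x$ a $1 \times K$ row vector; that notation is an assumption here, since the notes do not display the equation.

```latex
\begin{align*}
y &= x\beta + u \\
x'y &= x'x\beta + x'u \\
E(x'y) &= E(x'x)\beta + E(x'u) = E(x'x)\beta && \text{(Assumption OLS.1)} \\
\beta &= [E(x'x)]^{-1} E(x'y) && \text{(Assumption OLS.2)}
\end{align*}
```

Both $E(x'x)$ and $E(x'y)$ are moments of observable variables, which is what identification requires in this setting.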

•  The key assumption for OLS to consistently estimate $\beta$ is the population orthogonality condition:

Assumption OLS.1: $E(x'u) = 0$.

Because $x$ contains a constant, Assumption OLS.1 is equivalent to saying that $u$ has mean zero and is uncorrelated with each regressor. Assumption OLS.1 is implied by, but weaker than, the zero conditional mean assumption $E(u \mid x) = 0$. The other assumption needed for consistency of OLS is that the expected outer product matrix of $x$ has full rank, so that there are no exact linear relationships among the regressors in the population. This is stated succinctly as follows:

Assumption OLS.2: $\operatorname{rank} E(x'x) = K$.

Assumption OLS.3 (homoskedasticity): $E(u^2 x'x) = \sigma^2 E(x'x)$, where $\sigma^2 \equiv E(u^2)$.

Further, WLS is generally inconsistent if $E(u \mid x) \neq 0$ but Assumption OLS.1 holds, so WLS is inappropriate for estimating linear projections. Especially with large sample sizes, the presence of heteroskedasticity need not affect one's ability to perform accurate inference using OLS, provided that standard errors and test statistics are computed in a way that is robust to heteroskedasticity.
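One common way to compute such standard errors is the White (HC0) "sandwich" estimator. The sketch below is illustrative only: the function name `ols_robust` and the simulated design are assumptions, and HC0 is named here as one standard choice rather than the specific formula the notes have in mind.

```python
import numpy as np

def ols_robust(X, y):
    """OLS coefficients with usual and heteroskedasticity-robust (White/HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Usual variance estimate, valid under homoskedasticity (Assumption OLS.3)
    sigma2 = resid @ resid / (n - k)
    se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Robust sandwich: (X'X)^{-1} [sum_i uhat_i^2 x_i'x_i] (X'X)^{-1}
    meat = (X * (resid ** 2)[:, None]).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_robust

# Simulated data (hypothetical design): the error variance grows with x^2
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
u = np.abs(x) * rng.normal(size=n)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
beta, se_usual, se_robust = ols_robust(X, y)
print("coefficients:", beta)
print("usual SE:    ", se_usual)
print("robust SE:   ", se_robust)
```

In this design the OLS slope estimate stays close to 0.5, but the usual standard error for the slope understates the sampling variability relative to the robust one.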

5 Instrumental Variables Estimation of Single-Equation Linear Models

•  We explicitly allow the unobservable error to be correlated with the explanatory variables.

5.1 Instrumental Variables and Two-Stage Least Squares

Equation (5.1):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u$$

where $x_K$ might be correlated with $u$. In other words, the explanatory variables $x_1, x_2, \dots, x_{K-1}$ are exogenous, but $x_K$ is potentially endogenous in equation (5.1). OLS estimation of equation (5.1) generally results in inconsistent estimators of all the $\beta_j$ if $\mathrm{Cov}(x_K, u) \neq 0$.

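•  The method of instrumental variables (IV) provides a general solution to the problem of an endogenous explanatory variable. To use the IV approach with $x_K$ endogenous, we need an observable variable, $z_1$, not in equation (5.1), that satisfies two conditions. First, $z_1$ must be uncorrelated with $u$: $\mathrm{Cov}(z_1, u) = 0$. Second, $z_1$ must be partially correlated with $x_K$ once the other exogenous variables have been netted out. That is, in the linear projection of $x_K$ onto all the exogenous variables,

Equation (5.4):

$$x_K = \delta_0 + \delta_1 x_1 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K$$

the coefficient on $z_1$ must be nonzero: $\theta_1 \neq 0$.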
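The two instrument conditions can be seen at work in a simulated example. This is a sketch under assumed values: the data-generating process, the coefficients, and the helper `iv_estimator` are hypothetical. The estimator used is the standard just-identified IV formula, which solves $(Z'X)b = Z'y$.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Just-identified IV estimator: solves (Z'X) b = Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.normal(size=n)              # exogenous regressor
z1 = rng.normal(size=n)              # instrument, excluded from the structural equation
r = rng.normal(size=n)               # reduced-form error for xK
u = 0.7 * r + rng.normal(size=n)     # structural error, correlated with r but not with z1
xK = 0.5 * x1 + 1.0 * z1 + r         # theta_1 = 1.0, so z1 is partially correlated with xK
y = 1.0 + 0.5 * x1 + 2.0 * xK + u    # true beta_K = 2.0

X = np.column_stack([np.ones(n), x1, xK])
Z = np.column_stack([np.ones(n), x1, z1])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = iv_estimator(Z, X, y)
print("OLS estimate of beta_K:", b_ols[2])
print("IV estimate of beta_K: ", b_iv[2])
```

Here OLS is pulled away from 2.0 because $x_K$ is correlated with $u$ through `r`, while the IV estimate stays close to the true value.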

•  The linear projection in equation (5.4) is called a reduced form equation for the endogenous explanatory variable $x_K$. In the context of single-equation linear models, a reduced form always involves writing an endogenous variable as a linear projection onto all exogenous variables.

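•  From the structural equation (5.1) and the reduced form for $x_K$, we obtain a reduced form for $y$ by plugging equation (5.4) into equation (5.1) and rearranging:

Equation (5.6):

$$y = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_{K-1} x_{K-1} + \lambda_1 z_1 + v$$

where $v = u + \beta_K r_K$ is the reduced form error, $\alpha_j = \beta_j + \beta_K \delta_j$, and $\lambda_1 = \beta_K \theta_1$, with $\delta_j$, $\theta_1$, and $r_K$ denoting the coefficients and error of the reduced form (5.4).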

•  Estimates of the reduced form parameters are sometimes of interest in their own right, but estimating the structural parameters is generally more useful.
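Substituting the reduced form for $x_K$ into the structural equation implies that the coefficient on $z_1$ in the reduced form for $y$ equals $\beta_K$ times the coefficient on $z_1$ in the reduced form for $x_K$. With a single instrument, the ratio of the two estimated reduced-form coefficients therefore reproduces the IV estimate of $\beta_K$. The sketch below checks this numerically. The simulated design and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
r = rng.normal(size=n)
u = 0.7 * r + rng.normal(size=n)     # makes xK endogenous
xK = 0.5 * x1 + 1.0 * z1 + r
y = 1.0 + 0.5 * x1 + 2.0 * xK + u    # true beta_K = 2.0

Z = np.column_stack([np.ones(n), x1, z1])   # all exogenous variables
X = np.column_stack([np.ones(n), x1, xK])   # structural regressors

# Reduced forms: regress xK and y on all exogenous variables
pi_hat = np.linalg.lstsq(Z, xK, rcond=None)[0]     # last entry estimates the z1 coefficient for xK
gamma_hat = np.linalg.lstsq(Z, y, rcond=None)[0]   # last entry estimates the z1 coefficient for y

ratio = gamma_hat[2] / pi_hat[2]                   # ratio of reduced-form coefficients on z1
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)        # just-identified IV estimator
print("ratio of reduced-form coefficients:", ratio)
print("IV estimate of beta_K:             ", beta_iv[2])
```

The two numbers agree up to floating-point error. This is an algebraic identity in the just-identified case, not a feature of this particular simulation.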

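•  Under homoskedasticity, the two-stage least squares (2SLS) estimator is the most efficient IV estimator among those whose instruments are linear combinations of $z$. To illustrate the method of 2SLS, define the vector of exogenous variables again by $z \equiv (1, x_1, \dots, x_{K-1}, z_1, \dots, z_M)$, where $z_1, \dots, z_M$ are the instruments excluded from equation (5.1). Out of all possible linear combinations of $z$ that can be used as an instrument for $x_K$, the method of 2SLS chooses that which is most highly correlated with $x_K$. If $x_K$ were exogenous, then this choice would imply that the best instrument for $x_K$ is simply itself. Ruling this case out, the linear combination of $z$ most highly correlated with $x_K$ is given by the linear projection of $x_K$ on $z$. Write the reduced form for $x_K$ as

Equation (5.14):

$$x_K = \delta_0 + \delta_1 x_1 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M + r_K$$

where, by definition, $r_K$ has zero mean and is uncorrelated with each right-hand-side variable. Since any linear combination of $z$ is uncorrelated with $u$,

$$x_K^* \equiv \delta_0 + \delta_1 x_1 + \dots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \dots + \theta_M z_M$$

is uncorrelated with $u$. In fact, $x_K^*$ is often interpreted as the part of $x_K$ that is uncorrelated with $u$. If $x_K$ is endogenous, it is because $r_K$ is correlated with $u$. If we could observe $x_K^*$, we would use it as an instrument for $x_K$ in equation (5.1) and use the IV estimator from the previous subsection. Since the $\delta_j$ and $\theta_j$ are population parameters, $x_K^*$ is not a usable instrument. However, as long as we make the standard assumption that there are no exact linear dependencies among the exogenous variables, we can consistently estimate the parameters in equation (5.14) by OLS. The sample analogues of the $x_{iK}^*$ for each observation $i$ are simply the OLS fitted values:

$$\hat{x}_{iK} = \hat{\delta}_0 + \hat{\delta}_1 x_{i1} + \dots + \hat{\delta}_{K-1} x_{i,K-1} + \hat{\theta}_1 z_{i1} + \dots + \hat{\theta}_M z_{iM}$$

•  Now, for each observation $i$, define the vector $\hat{x}_i \equiv (1, x_{i1}, \dots, x_{i,K-1}, \hat{x}_{iK})$, $i = 1, 2, \dots, N$. Using $\hat{x}_i$ as the instruments for $x_i$ gives the IV estimator

$$\hat{\beta} = \left( \sum_{i=1}^{N} \hat{x}_i' x_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{x}_i' y_i \right) = (\hat{X}'X)^{-1} \hat{X}'Y$$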
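The two steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the function name `two_stage_least_squares`, its argument layout, and the simulated data with two excluded instruments are all hypothetical, and only the case of one endogenous regressor is handled.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z_excl):
    """2SLS with one endogenous regressor.

    X_exog  : (n, K-1) exogenous regressors, including the constant
    x_endog : (n,)     endogenous regressor x_K
    Z_excl  : (n, M)   instruments excluded from the structural equation
    """
    Z = np.column_stack([X_exog, Z_excl])              # all exogenous variables z
    # First stage: OLS of x_K on z gives the fitted values
    first_stage = np.linalg.lstsq(Z, x_endog, rcond=None)[0]
    x_hat = Z @ first_stage
    X = np.column_stack([X_exog, x_endog])
    X_hat = np.column_stack([X_exog, x_hat])
    # IV estimator that uses the rows of X_hat as instruments for the rows of X
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    return beta, X_hat

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
r = rng.normal(size=n)
u = 0.7 * r + rng.normal(size=n)                # correlated with r, so xK is endogenous
xK = 0.5 * x1 + 0.8 * z1 + 0.6 * z2 + r
y = 1.0 + 0.5 * x1 + 2.0 * xK + u               # true beta_K = 2.0

X_exog = np.column_stack([np.ones(n), x1])
b_2sls, X_hat = two_stage_least_squares(y, X_exog, xK, np.column_stack([z1, z2]))
b_second_stage = np.linalg.lstsq(X_hat, y, rcond=None)[0]   # OLS of y on the fitted regressors
print("2SLS estimates:        ", b_2sls)
print("second-stage OLS coefs:", b_second_stage)
```

The coefficients from regressing `y` on the fitted regressors match the IV formula, which is where the name "two-stage" comes from. The standard errors reported by that second-stage OLS regression are not valid for 2SLS, so they are deliberately not computed here.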

•  Part (b) of Assumption 2SLS.2 is the crucial rank condition for identification. In a precise sense it means that $z$ is sufficiently linearly related to $x$ so that $E(z'x)$ has full column rank, that is, $\operatorname{rank} E(z'x) = K$.

•  Necessary for the rank condition is the order condition, $L \geq K$, where $L$ is the number of elements in $z$. In other words, we must have at least as many instruments as we have explanatory variables. If we do not have as many instruments as right-hand-side variables, then $\beta$ is not identified. However, $L \geq K$ is no guarantee that Assumption 2SLS.2b holds: the elements of $z$ might not be appropriately correlated with the elements of $x$.
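A tiny population example shows how the order condition can hold while the rank condition fails. The setup is an assumption made for illustration: $x = (1, x_K)$ and $z = (1, z_1)$ with $E(z_1) = 0$, so that $L = K = 2$ and $E(z'x)$ can be written down directly from two moments. The function name `population_moment_rank` is hypothetical.

```python
import numpy as np

def population_moment_rank(mean_xK, cov_z1_xK):
    """Rank of E(z'x) for x = (1, x_K), z = (1, z_1), assuming E(z_1) = 0.

    With E(z_1) = 0, E(z_1 x_K) equals Cov(z_1, x_K), so
    E(z'x) = [[1, E(x_K)], [0, Cov(z_1, x_K)]].
    """
    Ezx = np.array([[1.0, mean_xK],
                    [0.0, cov_z1_xK]])
    return int(np.linalg.matrix_rank(Ezx))

rank_relevant = population_moment_rank(mean_xK=1.0, cov_z1_xK=0.5)
rank_irrelevant = population_moment_rank(mean_xK=1.0, cov_z1_xK=0.0)
print("rank with a relevant instrument:   ", rank_relevant)
print("rank with an irrelevant instrument:", rank_irrelevant)
```

In both cases $L = K$, so the order condition is satisfied, but when $z_1$ is uncorrelated with $x_K$ the matrix loses rank and $\beta$ is not identified. The check uses population moments on purpose: a sample analogue of $E(z'x)$ will almost always have full numerical rank because of sampling noise.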

6 Additional Single-Equation Topics
