Applied Microeconometrics #01 Introduction

Shigeki Kano (OPU), August 2019
Section 1
Division of labor in econometrics
The two branches of econometrics
Macroeconometrics: Aggregate time series data on macroeconomic and financial variables. ⇒ Economic time series analysis, financial econometrics.
Microeconometrics: Cross-sectional data on individual units, e.g., households, firms, and products.
Features of microeconometrics
1 individual choice,
2 causal inference,
3 discreteness of variables,
4 large number of observations.
Individual choice: Directly analyze the choice behaviour of individual agents.
Example 1: For a population of women, consider two mutually exclusive statuses: (1) working or (2) not working. ⇒ How does the probability of (1) change if she has a child?
Example 2: To travel from City A to City B, commuters can choose (1) train, (2) bus, or (3) car. ⇒ How would the demand for (2) and (3) change if the train fare (1) were discounted?
Causal inference: Evaluate the impact of a policy or an event on the behaviour and outcomes of individuals.
Example 1: Does participation in a government-subsidized vocational training program increase the earnings of the participants?
Example 2: Does cutting class size, say from 30 to 20 students per class, improve students' test scores?
Discreteness of variables: Disaggregated data are more discrete than aggregate data.
Example 1: Individual unemployment status, recorded as a dummy
$$y_i = \begin{cases} 1 & \text{(unemployed)} \\ 0 & \text{(otherwise)} \end{cases}, \quad i = 1,2,\ldots,N, \tag{1}$$
is discrete. ⇒ The unemployment rate $r_t = \frac{1}{N}\sum_{i=1}^{N} y_i$ at period $t$ is virtually a real number, $r_t \in [0,1]$; continuous.
Example 2: Let $y_i$ be the number of patents firm $i$ applied for this year, a non-negative integer, $y_i = 0,1,2,\ldots$ ⇒ The average number of patent applications $s_t = \frac{1}{N}\sum_{i=1}^{N} y_i$ is virtually a real number, $s_t \in [0,\infty)$; continuous.
For discrete $y_i$, the linear regression with normal errors,
$$y_i = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \mid x_i \sim N(0,\sigma^2), \tag{2}$$
may not be appropriate.
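To see why eq. (2) can be unsuitable for a discrete outcome, the following Python sketch (assuming NumPy) simulates a binary $y_i$ from a hypothetical logistic model and fits the linear regression by OLS. The data-generating process and its parameter values are assumptions chosen only for illustration; the point is that some fitted values fall outside $[0,1]$, which a probability cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5_000
x = rng.normal(size=N)
# Hypothetical DGP (assumption): P(y = 1 | x) follows a logistic curve
p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.uniform(size=N) < p).astype(float)

# OLS of the binary y on (1, x), i.e. the linear regression of eq. (2)
X = np.column_stack([np.ones(N), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# Share of fitted "probabilities" that are not valid probabilities
share_outside = np.mean((fitted < 0) | (fitted > 1))
print("OLS coefficients:", beta_hat)
print("share of fitted values outside [0, 1]:", share_outside)
```

Note also that, given $x_i$, the OLS error $y_i - x_i'\beta$ can take only two values here, so it cannot be normally distributed.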
Large number of observations: Usually the number of observations (sample size), N, is large in microeconometrics.
Example: Survey data on N = 5,000 households by random sampling.
Asymptotic theory based on N→ ∞ is relevant.
Note: The data are not necessarily independent. For example, firm a in industry A may be independent of firm b in industry B but dependent on firm a′ in industry A (clustered data).
Course outline
We will study the following items.
Core methods: (a) OLS, IV/2SLS, and GMM for linear regressions, (b) ML and quasi-ML. ⇒ LN#01 – LN#04.
Panel data: (a) Panel data under strict exogeneity (FE and FD), (b) RE approach for linear/nonlinear panels, (c) sequential exogeneity including dynamic panels. ⇒ LN#05 – LN#07.
Program evaluation: (a) Regression adjustment and PSM, (b) doubly robust estimation, (c) matching. ⇒ LN#08 – LN#12.
Program evaluation under selection-on-unobservables: (a) LATE, (b) DID and RD design. ⇒ LN#13 – LN#14.
Many important topics will not be covered: Censored and corner solution outcomes, handling missing data, sample selection, multinomial choices, duration data, simulation-based inference, specification tests, etc. ⇒ See Cameron and Trivedi (2005) and Wooldridge (2010).
Selected reading list
Prerequisites (first-year graduate econometrics): Goldberger (1991), Greene (2012).
Popular textbooks covering many practical methods and ideas: Cameron and Trivedi (2005), Angrist and Pischke (2008), Wooldridge (2010).
Theoretical foundations: Amemiya (1985), White (1996), Lee (2009).
Asymptotic theory: Rao (1973), Newey and McFadden (1994), White (2000).
NBER Summer Institute 2007 Method Lectures: “What’s New in Econometrics”. https://www.nber.org/minicourse3.html
Population regression function
$$m(x) = E(y|x) = \int_y y\, f(y|x)\,dy, \tag{3}$$
is called the population regression function, where $f(y|x)$ is the conditional density of $y$ given $x$.
We call the scalar $y$ the outcome and the $K$-dimensional vector $x$ the regressor.
Generally, the functional form of $m(x)$ is determined by the conditional density $f(y|x)$, which is in turn derived from the joint density $f(x,y)$.
Example: If $f(x,y)$ is a multivariate normal density, $m(x)$ is linear; $m(x) = x'\beta$, say.
On the other hand, the conditional variance function (CVF),
$$v(x) = Var(y|x) = E\{[y - m(x)]^2 \mid x\} = \int_y [y - m(x)]^2 f(y|x)\,dy, \tag{4}$$
is called the population heteroskedastic function.
Example: If $f(x,y)$ is a multivariate normal density, $v(x)$ is constant; $v(x) = \sigma^2$, say.
In most cases, we are more interested in the regression function than in the heteroskedastic function.
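As a numerical check of eqs. (3) and (4), the sketch below computes $m(x)$ and $v(x)$ by midpoint-rule integration over $y$. It assumes the joint density $f(x,y) = x + y$ on the unit square (the density used in Example 1, eq. (24)); the function names and the grid size are illustrative choices, not part of the lecture.

```python
import numpy as np

def f_joint(x, y):
    # Assumed joint density f(x, y) = x + y on 0 < x < 1, 0 < y < 1
    return x + y

def cond_moments(x, n_grid=100_000):
    """Return (m(x), v(x)) = (E(y|x), Var(y|x)) via midpoint-rule integration over y."""
    y = (np.arange(n_grid) + 0.5) / n_grid        # midpoints of a uniform grid on (0, 1)
    dy = 1.0 / n_grid
    joint = f_joint(x, y)
    f_x = joint.sum() * dy                        # marginal density f(x) = integral of f(x, y) dy
    f_y_given_x = joint / f_x                     # conditional density f(y|x)
    m = np.sum(y * f_y_given_x) * dy              # eq. (3)
    v = np.sum((y - m) ** 2 * f_y_given_x) * dy   # eq. (4)
    return m, v

for x0 in (0.0, 0.5, 1.0):
    m, v = cond_moments(x0)
    # Closed form for this density: m(x) = (3x + 2) / (6x + 3)
    print(f"x = {x0:.1f}: m(x) = {m:.4f}, (3x+2)/(6x+3) = {(3*x0+2)/(6*x0+3):.4f}, v(x) = {v:.4f}")
```

Here $v(x)$ varies with $x$, so this distribution is heteroskedastic, unlike the multivariate normal case.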
Let $h(x)$ be a generic predictor of $y$, i.e., a function of $x$ used to predict $y$. ⇒ Evaluate its prediction error by the mean squared error (MSE)
$$q(h) = E\{[y - h(x)]^2\}, \tag{5}$$
where we can choose $h(\cdot)$.
Proposition 1 (Best predictor)
The MSE $q(h)$ is minimized by choosing $h(\cdot) = m(\cdot)$.
Proof. See Chapter 5 of Goldberger (1991).
In this sense $m(x)$ is the best predictor of $y$: it is optimal to use $m(x)$ for predicting $y$ after observing $x$.
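A quick Monte Carlo illustration of Proposition 1, under an assumed data-generating process $y = \exp(x) + u$ with $x \sim U(0,2)$ and $u \sim N(0,1)$, so that $m(x) = \exp(x)$ and $v(x) = 1$. The competing predictors are arbitrary hypothetical choices; in a large sample the MSE of $m(x)$ should be the smallest, close to $E[v(x)] = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.uniform(0.0, 2.0, size=N)
m = np.exp(x)                    # assumed true regression m(x) = E(y|x)
y = m + rng.normal(size=N)       # so Var(y|x) = 1

# Hypothetical competing predictors h(x)
predictors = {
    "m(x) = exp(x)": m,
    "m(x) + 0.3": m + 0.3,
    "1 + 3x": 1.0 + 3.0 * x,
    "1.2 * m(x)": 1.2 * m,
}
# Sample analogue of q(h) = E{[y - h(x)]^2}, eq. (5)
mse = {name: np.mean((y - h) ** 2) for name, h in predictors.items()}
for name, q in mse.items():
    print(f"{name:>15}: MSE = {q:.4f}")
```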
Models of regression
In econometrics, instead of starting with the conditional density $f(y|x)$, we usually model the regression function directly. Popular forms are:
1 linear,
$$m(x) = E(y|x) = x'\beta, \tag{6}$$
2 exponential, for $y \ge 0$ (including count data $y = 0,1,2,\ldots$),
$$m(x) = E(y|x) = \exp(x'\gamma), \tag{7}$$
3 logistic, for $0 \le y \le 1$ (including binary $y \in \{0,1\}$),
$$m(x) = E(y|x) = \frac{\exp(x'\eta)}{1 + \exp(x'\eta)}. \tag{8}$$
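In general the derivative (gradient vector) of $m(x)$ with respect to $x$ depends on $x$. ⇒ To see the quantitative impact of a change in $x$ on $m(x)$, compute the marginal effects (partial effects) at a specific point $x_*$ (usually the mean of $x$),
$$MA(x_*) = \left.\frac{\partial m(x)}{\partial x}\right|_{x = x_*}. \tag{9}$$
1 Linear;
$$MA(x_*) = \beta. \tag{10}$$
2 Exponential;
$$MA(x_*) = \exp(x_*'\gamma)\,\gamma = m(x_*)\,\gamma. \tag{11}$$
3 Logistic;
$$MA(x_*) = \frac{\exp(x_*'\eta)}{[1 + \exp(x_*'\eta)]^2}\,\eta = [1 - m(x_*)]\,m(x_*)\,\eta. \tag{12}$$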
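The closed-form marginal effects for the exponential and logistic models can be checked against central finite differences. In this sketch the evaluation point $x_*$ and the parameter vectors are hypothetical values; the exponential formula used is $\partial \exp(x'\gamma)/\partial x = \exp(x'\gamma)\gamma$ and the logistic one is $[1 - m(x_*)]m(x_*)\eta$.

```python
import numpy as np

def m_exp(x, gamma):
    # Exponential regression, eq. (7)
    return np.exp(x @ gamma)

def m_logit(x, eta):
    # Logistic regression, eq. (8)
    z = x @ eta
    return np.exp(z) / (1.0 + np.exp(z))

def numerical_gradient(func, x_star, theta, h=1e-6):
    """Central finite-difference gradient of func(x, theta) with respect to x at x_star."""
    grad = np.zeros_like(x_star)
    for k in range(x_star.size):
        e = np.zeros_like(x_star)
        e[k] = h
        grad[k] = (func(x_star + e, theta) - func(x_star - e, theta)) / (2 * h)
    return grad

# Hypothetical evaluation point and parameters
x_star = np.array([1.0, 0.4, -1.2])
gamma = np.array([0.2, 0.5, -0.3])
eta = np.array([-0.5, 1.0, 0.8])

me_exp_analytic = m_exp(x_star, gamma) * gamma                               # m(x*) * gamma
me_logit_analytic = (1 - m_logit(x_star, eta)) * m_logit(x_star, eta) * eta  # [1 - m(x*)] m(x*) eta

print(me_exp_analytic, numerical_gradient(m_exp, x_star, gamma))
print(me_logit_analytic, numerical_gradient(m_logit, x_star, eta))
```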
Linear projection
We already know that $m(x)$ is the best predictor of $y$. ⇒ Restrict our scope to linear predictors
$$\ell(x) = x'b. \tag{13}$$
Define a new MSE,
$$q(b) = E\{[y - x'b]^2\}, \tag{14}$$
where all we can do is choose $b$.
Hereafter we assume that $E(xx')$ is positive definite,
$$E(xx') > 0. \tag{15}$$
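The gradient (first derivative) and Hessian (second derivative) of $q(b)$ with respect to $b$ are given by
$$g(b) = \frac{\partial q(b)}{\partial b} = -2E[x(y - x'b)], \tag{16}$$
$$H(b) = \frac{\partial g(b)}{\partial b'} = 2E(xx'). \tag{17}$$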
First order condition: Solve the first order condition $g(b) = 0$,
$$E[x(y - x'b)] = 0 \iff E(xx')b = E(xy), \tag{18}$$
and get
$$b = E(xx')^{-1}E(xy). \tag{19}$$
Second order condition: Because $E(xx') > 0$ by assumption,
$$H(b) > 0. \tag{20}$$
Proposition 2 (Best linear predictor)
The MSE in eq.(14) is minimized by setting
$$b = E(xx')^{-1}E(xy). \tag{21}$$
Proof. See above.
So we have the best linear predictor (BLP), or linear projection (LP), of $y$ on $x$,
$$L(y|x) = x'b, \quad b = E(xx')^{-1}E(xy). \tag{22}$$
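In a sample, $b$ is estimated by replacing $E(xx')$ and $E(xy)$ with sample averages, which is exactly OLS. The sketch below assumes a hypothetical nonlinear data-generating process $y = \exp(x_1) + u$ with regressors $x = (1, x_1)'$; the fitted residuals satisfy the sample version of the first order condition $E[x(y - x'b)] = 0$ up to rounding error, even though the true regression is not linear.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
x1 = rng.uniform(0.0, 2.0, size=N)
y = np.exp(x1) + rng.normal(size=N)     # hypothetical DGP with a nonlinear regression
X = np.column_stack([np.ones(N), x1])   # x = (1, x1)'

Exx = X.T @ X / N                       # sample analogue of E(xx')
Exy = X.T @ y / N                       # sample analogue of E(xy)
b_hat = np.linalg.solve(Exx, Exy)       # b = E(xx')^{-1} E(xy)

resid = y - X @ b_hat
print("b_hat:", b_hat)
print("sample moment of x * residual:", X.T @ resid / N)  # first order condition
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual numerically safer choice.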
If the regression (conditional expectation) $m(x)$ is intrinsically linear, then
$$m(x) = E(y|x) = L(y|x). \tag{23}$$
Note: Even if the regression is nonlinear, the BLP is well defined as long as $E(xx')$ is nonsingular.
Example 1
Consider the bivariate distribution of Goldberger (1991) with joint density
$$f(x,y) = x + y, \quad 0 < x < 1,\; 0 < y < 1. \tag{24}$$
For this joint density, the regression (CEF) and BLP are given by
$$E(y|x) = \frac{3x + 2}{6x + 3}, \tag{25}$$
$$L(y|x) = \frac{7}{11} - \frac{1}{11}x, \tag{26}$$
where the regression is nonlinear in $x$.
Table 1: Both are similar; the BLP appears to approximate the true regression line well.
Figure 1: CEF (regression, red) vs. BLP (blue) from the bivariate distribution in eq. (24); horizontal axis from 0.0 to 1.0, vertical axis from 0.55 to 0.65.
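The BLP in Example 1 can be reproduced numerically from the joint density alone. This sketch approximates the required moments with a midpoint rule on a 1000 × 1000 grid (an arbitrary choice) and regresses on $(1, x)$, so the slope is $Cov(x,y)/Var(x)$ and the intercept is $E(y) - \text{slope} \cdot E(x)$. Analytically, $E(x) = E(y) = 7/12$, $Var(x) = 11/144$ and $Cov(x,y) = -1/144$, which give intercept $7/11$ and slope $-1/11$.

```python
import numpy as np

n = 1_000
grid = (np.arange(n) + 0.5) / n                  # midpoints on (0, 1)
xx, yy = np.meshgrid(grid, grid, indexing="ij")
w = (xx + yy) / n**2                             # f(x, y) dx dy on each grid cell, f = x + y

E_x, E_y = np.sum(w * xx), np.sum(w * yy)
Var_x = np.sum(w * xx**2) - E_x**2
Cov_xy = np.sum(w * xx * yy) - E_x * E_y

slope = Cov_xy / Var_x
intercept = E_y - slope * E_x
print(f"L(y|x) = {intercept:.4f} + ({slope:.4f}) x")

# Compare the CEF (3x + 2)/(6x + 3) with the BLP at a few points
for x0 in (0.0, 0.5, 1.0):
    cef = (3 * x0 + 2) / (6 * x0 + 3)
    print(f"x = {x0}: CEF = {cef:.4f}, BLP = {intercept + slope * x0:.4f}")
```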
Section 3
Regression analysis under exogeneity
Consider the linear model
$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1,2,\ldots,N, \tag{27}$$
for $N$ independent observations, where $\varepsilon_i$ is an error term.
Equation (27) merely describes a linear relationship among $(x_i, y_i, \varepsilon_i)$ and has no structural meaning. ⇒ Assume that $x_i$ is exogenous:
$$E(\varepsilon_i|x_i) = 0. \tag{28}$$
Then
$$E(y_i|x_i) = x_i'\beta + E(\varepsilon_i|x_i) = x_i'\beta. \tag{29}$$
∴ Equation (27) with an exogenous regressor $x_i$ implies a linear regression, so $\beta$ is the vector of partial effects.
Conversely, if we first assume the linear regression $E(y_i|x_i) = x_i'\beta$, then
$$E(\varepsilon_i|x_i) = E(y_i - x_i'\beta \mid x_i) = E(y_i|x_i) - x_i'\beta = 0. \tag{30}$$
So exogeneity can be stated as either
$$E(\varepsilon_i|x_i) = 0 \quad \text{or} \quad E(y_i|x_i) = x_i'\beta. \tag{31}$$
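Let $m(x_i;\theta)$ be a general (linear or nonlinear) parametric model of the regression and define the residual $\varepsilon_i = y_i - m(x_i;\theta)$. Then
$$E(y_i|x_i) = m(x_i;\theta) \iff E(\varepsilon_i|x_i) = E(y_i|x_i) - m(x_i;\theta) = 0, \tag{32}$$
$$E(y_i|x_i) \neq m(x_i;\theta) \iff E(\varepsilon_i|x_i) = E(y_i|x_i) - m(x_i;\theta) \neq 0. \tag{33}$$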
In the first case, we have correctly specified m(xi;θ).
In the second case, m(xi;θ) involves some misspecification.
For example, suppose we assume $m(x_i;\beta) = x_i'\beta$ but actually $E(y_i|x_i) = \exp(x_i'\gamma)$. Then
$$E(y_i|x_i) \neq x_i'\beta \iff E(\varepsilon_i|x_i) = E(y_i|x_i) - x_i'\beta \neq 0. \tag{34}$$
Another typical case of misspecification is due to omitted variables.
Suppose that we assume $m(x_i;\beta) = x_i'\beta$ but $\varepsilon_i$ contains omitted variables correlated with $x_i$, which results in $E(\varepsilon_i|x_i) = x_i'\gamma$.
In this case,
$$E(y_i|x_i) = x_i'\beta + x_i'\gamma = x_i'\eta, \quad \eta = \beta + \gamma, \tag{35}$$
which differs from our assumption. ⇒ $\gamma$ is called the omitted variables bias, a version of the endogeneity bias studied in LN#02.
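A small simulation of omitted variables bias, under a hypothetical model $y_i = \beta x_i + \delta w_i + u_i$ with $w_i = \pi x_i + v_i$. Omitting $w_i$ puts it into the error, so $E(\varepsilon_i|x_i) = \delta\pi x_i$, i.e., $\gamma = \delta\pi$ in the notation of eq. (35). The parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
beta, delta, pi = 1.0, 0.5, 0.8            # hypothetical parameter values

x = rng.normal(size=N)
w = pi * x + rng.normal(size=N)            # omitted variable, correlated with x
y = beta * x + delta * w + rng.normal(size=N)
# If w is omitted, eps = delta*w + u, so E(eps|x) = delta*pi*x = x*gamma
gamma = delta * pi

X_short = np.column_stack([np.ones(N), x])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]   # w omitted
X_long = np.column_stack([np.ones(N), x, w])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]     # w included

print("slope omitting w: ", b_short[1], "  beta + gamma =", beta + gamma)
print("slope including w:", b_long[1], "  beta =", beta)
```

The short regression recovers $\eta = \beta + \gamma$, not $\beta$, exactly as eq. (35) predicts.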
Regression analysis under orthogonality
Exogeneity requires perfect knowledge of the form of the regression function, which is too demanding. ⇒ Instead, let the regressor $x_i$ be orthogonal to the error $\varepsilon_i$:
$$E(x_i\varepsilon_i) = 0. \tag{36}$$
By inserting $\varepsilon_i = y_i - x_i'\beta$ into the orthogonality condition, we have
$$E[x_i(y_i - x_i'\beta)] = 0 \iff E(x_ix_i')\beta = E(x_iy_i) \tag{37}$$
and, provided that $E(x_ix_i')$ is nonsingular,
$$\beta = E(x_ix_i')^{-1}E(x_iy_i), \tag{38}$$
identical to condition (19).
Important remark: By the law of iterated expectations,
$$E(x_i\varepsilon_i) = E[E(x_i\varepsilon_i|x_i)] = E[x_i E(\varepsilon_i|x_i)], \tag{39}$$
which, if we assume the exogeneity of eq. (28), becomes
$$E(x_i\varepsilon_i) = E(x_i \cdot 0) = 0. \tag{40}$$
∴ Exogeneity implies orthogonality:
$$\underbrace{E(\varepsilon_i|x_i) = 0}_{\text{exogeneity}} \;\Longrightarrow\; \underbrace{E(x_i\varepsilon_i) = 0}_{\text{orthogonality}}, \tag{41}$$
but not vice versa. ⇒ In this sense the orthogonality assumption is weaker than exogeneity.
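Orthogonality without exogeneity can be illustrated with a hypothetical error $\varepsilon_i = x_i^2 - 1/3 + u_i$, where $x_i \sim U(-1,1)$ and $u_i$ is independent noise. Then $E(\varepsilon_i) = 0$ and $E(x_i\varepsilon_i) = E(x_i^3) - E(x_i)/3 = 0$, so $\varepsilon_i$ is orthogonal to $(1, x_i)'$, yet $E(\varepsilon_i|x_i) = x_i^2 - 1/3 \neq 0$. The simulation below is an illustrative sketch, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000
x = rng.uniform(-1.0, 1.0, size=N)
# Hypothetical error: E(eps|x) = x**2 - 1/3 is nonzero, but E(eps) = E(x*eps) = 0
eps = x**2 - 1.0 / 3.0 + rng.normal(scale=0.1, size=N)

print("E(x*eps)              ~", np.mean(x * eps))                    # orthogonality holds
print("E(eps)                ~", np.mean(eps))
print("E(eps | |x| < 0.1)    ~", np.mean(eps[np.abs(x) < 0.1]))     # about -1/3
print("E(eps | |x| > 0.9)    ~", np.mean(eps[np.abs(x) > 0.9]))     # clearly positive
```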
The BLP interpretation of linear regression has become popular because it
1 is based on a weaker assumption,
2 is legitimate even for nonlinear regressions, and
3 approximates the (possibly complex, nonlinear) true regression function $m(x_i)$ well in that
$$b = \arg\min_b q(b), \quad q(b) = E\{[m(x_i) - x_i'b]^2\}. \tag{42}$$
(The proof is left as a quiz.)
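Without giving away the quiz, eq. (42) can be checked numerically: in a large sample, projecting $y$ on $x$ and projecting $m(x)$ on $x$ give nearly the same coefficients. The data-generating process $y = \exp(x_1) + u$ below is a hypothetical choice with known $m(x) = \exp(x_1)$; this is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500_000
x1 = rng.uniform(0.0, 2.0, size=N)
m = np.exp(x1)                               # hypothetical nonlinear m(x)
y = m + rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])        # x = (1, x1)'

b_y = np.linalg.lstsq(X, y, rcond=None)[0]   # sample BLP of y on x
b_m = np.linalg.lstsq(X, m, rcond=None)[0]   # best linear approximation to m(x)
print("project y on x:   ", b_y)
print("project m(x) on x:", b_m)
```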
References
Amemiya, T. (1985). Advanced Econometrics. Harvard University Press.
Angrist, J. D. and J.-S. Pischke (2008). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Cameron, A. C. and P. K. Trivedi (2005). Microeconometrics: Methods and Applications. Cambridge University Press.
Goldberger, A. S. (1991). A Course in Econometrics. Harvard University Press.
Greene, W. H. (2012). Econometric Analysis (seventh ed.). Pearson Education.
Lee, M.-j. (2009). Micro-econometrics: Methods of Moments and Limited Dependent Variables. Springer.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics 4, 2111–2245.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications (second ed.). Wiley.
White, H. (1996). Estimation, Inference and Specification Analysis. Cambridge University Press.
White, H. (2000). Asymptotic Theory for Econometricians. Academic Press.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (second ed.). MIT Press.