Applied Microeconometrics #01 Introduction
Shigeki Kano, Osaka Prefecture University, August 2019

Section 1
Division of labor in econometrics
The two branches of econometrics
Macroeconometrics: Aggregate time series data on macroeconomic and financial variables. ⇒ Economic time series analysis, financial econometrics.
Microeconometrics: Cross-sectional data on individual units, e.g., households, firms, and products.
Features of microeconometrics
1 individual choice,
2 causal inference,
3 discreteness of variables,
4 large number of observations.
Individual choice: Directly analyze choice behaviours of individual agents.
Example 1: Among women, consider two mutually exclusive statuses, (1) working or (2) not working. ⇒ How does the probability of (1) change if a woman has a child?
Example 2: To travel from City A to City B, commuters can choose (1) train, (2) bus, or (3) car. ⇒ How does the demand for (2) and (3) change if the train fare in (1) is discounted?
Causal inference: Evaluate the impact of policy or event on the behaviour and outcome of individuals.
Example 1: Does participation in a government-subsidized vocational training program increase the participants' earnings?
Example 2: Does a reduction in class size, say from 30 to 20 students per class, improve students' test scores?
Discreteness of variables: Disaggregated data are more often discrete.
Example 1: Individual unemployment status, recorded as the dummy
$$ y_i = \begin{cases} 1 & \text{(unemployed)} \\ 0 & \text{(otherwise)} \end{cases}, \quad i = 1, 2, \ldots, N, \tag{1} $$
is discrete. ⇒ The unemployment rate $r_t = \frac{1}{N}\sum_{i=1}^{N} y_i$ at period $t$ is virtually a real number, $r_t \in [0,1]$; continuous.
Example 2: Let $y_i$ be the number of patents firm $i$ applied for this year, a nonnegative integer, $y_i = 0, 1, 2, \ldots$. ⇒ The average number of patent applications $s_t = \frac{1}{N}\sum_{i=1}^{N} y_i$ is virtually a real number, $s_t \in [0,\infty)$; continuous.
For discrete $y_i$, linear regression with a normal error,
$$ y_i = x_i'\beta + \varepsilon_i, \quad \varepsilon_i | x_i \sim N(0, \sigma^2), \tag{2} $$
may not be appropriate.
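One way to see why a linear fit can be inappropriate for a discrete outcome is the following minimal sketch. It assumes hypothetical, deterministic data (a binary outcome that switches from 0 to 1 at x = 0); the data and variable names are illustrative and do not come from the lecture.

```python
import numpy as np

# Hypothetical deterministic data: a binary outcome that switches from 0 to 1 at x = 0.
x = np.linspace(-3.0, 3.0, 601)
y = (x > 0).astype(float)

# OLS of y on a constant and x (a linear fit to a binary outcome).
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# Fitted values at the extremes of x fall outside the unit interval,
# so they cannot be read as probabilities.
print("coefficients:", beta_hat)
print("min fitted:", fitted.min(), "max fitted:", fitted.max())
```

In this design the fitted line leaves [0, 1] at both ends of the support of x, which is one practical symptom of the problem noted above.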
Large number of observations: Usually the number of observations (sample size), N, is large in microeconometrics.
Example: Survey data on N = 5,000 households by random sampling.
Asymptotic theory based on N→ ∞ is relevant.
Note: The observations are not necessarily independent. For example, firm a from industry A may be independent of firm b from industry B but dependent on firm a′ from the same industry A (clustered data).
Course outline
We will study the following items.
Core methods: (a) OLS, IV/2SLS, GMM of linear regressions, (b) ML and quasi ML. ⇒ LN#01 – LN#04.
Panel data: (a) Panel data under strict exogeneity (FE and FD), (b) RE approach for linear/nonlinear panels, (c) sequential exogeneity including dynamic panels. ⇒ LN#05 – LN#07.
Program evaluation: (a) Regression adjustment and PSM, (b) doubly robust estimation, (c) matching. ⇒ LN#08 – LN#12.
Program evaluation under selection-on-unobservables: (a) LATE, (b) DID and RD design. ⇒ LN#13 – LN#14.
Many important topics will not be covered: Censored and corner solution outcomes, handling missing data, sample selection, multinomial choices, duration data, simulation-based inference, specification tests, etc. ⇒ See Cameron and Trivedi (2005) and Wooldridge (2010).
Selected reading list
Preliminaries (first-year graduate econometrics): Goldberger (1991), Greene (2012).
Popular textbooks covering many practical methods and ideas: Cameron and Trivedi (2005), Angrist and Pischke (2008), Wooldridge (2010).
Theoretical foundations: Amemiya (1985), White (1996), Lee (2009).
Asymptotic theory: Rao (1973), Newey and McFadden (1994), White (2000).
NBER Summer Institute 2007 Method Lectures: “What’s New in Econometrics”. https://www.nber.org/minicourse3.html
Population regression function
The conditional expectation function (CEF) of y given x,
$$ m(x) = E(y|x) = \int y\, f(y|x)\, dy, \tag{3} $$
is called a population regression function, where $f(y|x)$ is the conditional density of y given x.
Call the scalar y the outcome and the K-dimensional vector x the regressor.
Generally, the functional form of m(x) is determined by the conditional density f(y|x), which is in turn derived from the joint density f(x,y).
Example: If f(x,y) is a multivariate normal density, m(x) is linear; m(x) = x′β, say.
On the other hand, the conditional variance function (CVF),
$$ v(x) = \mathrm{Var}(y|x) = E\{[y - m(x)]^2 | x\} = \int [y - m(x)]^2 f(y|x)\, dy, \tag{4} $$
is called a population heteroskedastic function.
Example: If f(x,y) is a multivariate normal density, v(x) is constant; $v(x) = \sigma^2$, say.
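As a worked instance of the two normal examples, consider the bivariate case with a scalar x. The moment symbols $\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \sigma_{xy}$ are notation introduced here for illustration, not taken from the lecture.

```latex
\begin{aligned}
(x, y)' &\sim N\!\left(
  \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},
  \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}
\right), \\
m(x) &= E(y|x) = \mu_y + \frac{\sigma_{xy}}{\sigma_x^2}\,(x - \mu_x), \\
v(x) &= \mathrm{Var}(y|x) = \sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}.
\end{aligned}
```

The regression is linear in x and the conditional variance does not depend on x, as the examples state.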
In most cases, we are more interested in a regression function and less in a heteroskedastic function.
Let h(x) be a generic predictor of y, i.e., a function of x used to predict y. ⇒ Evaluate its prediction error by the mean squared error (MSE)
$$ q(h) = E\{[y - h(x)]^2\}, \tag{5} $$
where we can choose $h(\cdot)$.
Proposition 1 (Best predictor)
The MSE q(h) is minimized by choosing h(·) = m(·).
Proof. See Chapter 5 of Goldberger (1991).
In this sense m(x) is the best predictor for y.
It is optimal to use m(x) to predict y after observing x.
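The reasoning behind Proposition 1 can be sketched with one decomposition. This is the standard argument, offered as a sketch rather than a reproduction of the proof in Goldberger (1991).

```latex
\begin{aligned}
q(h) &= E\{[y - m(x) + m(x) - h(x)]^2\} \\
     &= E\{[y - m(x)]^2\}
        + 2\,E\{[y - m(x)]\,[m(x) - h(x)]\}
        + E\{[m(x) - h(x)]^2\} \\
     &= E\{[y - m(x)]^2\} + E\{[m(x) - h(x)]^2\} \;\ge\; q(m).
\end{aligned}
```

The cross term vanishes by the law of iterated expectations, because $E[y - m(x) | x] = 0$, so the lower bound is attained at h = m.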
Models of regression
In econometrics, instead of starting with the conditional density f(y|x), we usually model the regression function directly. Popular forms are:
1 linear,
$$ m(x) = E(y|x) = x'\beta, \tag{6} $$
2 exponential; for $y \ge 0$ (including count $y = 0, 1, 2, \ldots$),
$$ m(x) = E(y|x) = \exp(x'\gamma), \tag{7} $$
3 logistic; for $0 \le y \le 1$ (including binary $y \in \{0,1\}$),
$$ m(x) = E(y|x) = \frac{\exp(x'\eta)}{1 + \exp(x'\eta)}. \tag{8} $$
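In general the derivative (gradient vector) of m(x) w.r.t. x depends on x. ⇒ To see the quantitative impact of a change in x on m(x), compute the marginal effects or partial effects at a specific point $x_*$ (usually the mean of x),
$$ MA(x_*) = \left.\frac{\partial m(x)}{\partial x}\right|_{x = x_*}. \tag{9} $$
1 Linear;
$$ MA(x_*) = \beta. \tag{10} $$
2 Exponential;
$$ MA(x_*) = \exp(x_*'\gamma)\,\gamma = m(x_*)\,\gamma. \tag{11} $$
3 Logistic;
$$ MA(x_*) = \frac{\exp(x_*'\eta)}{[1 + \exp(x_*'\eta)]^2}\,\eta = [1 - m(x_*)]\,m(x_*)\,\eta. \tag{12} $$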
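The marginal-effect formulas for the three mean functions can be checked numerically. The sketch below is illustrative only: the function names, the evaluation point, and the coefficient values are hypothetical choices, and the analytic gradients are compared with central finite differences.

```python
import numpy as np

def m_linear(x, beta):
    return x @ beta

def m_exponential(x, gamma):
    return np.exp(x @ gamma)

def m_logistic(x, eta):
    z = x @ eta
    return np.exp(z) / (1.0 + np.exp(z))

def ma_linear(x_star, beta):
    # Gradient of x'beta does not depend on x_star.
    return beta

def ma_exponential(x_star, gamma):
    # Gradient of exp(x'gamma) is m(x_star) * gamma.
    return m_exponential(x_star, gamma) * gamma

def ma_logistic(x_star, eta):
    # Gradient of the logistic mean is [1 - m(x_star)] * m(x_star) * eta.
    p = m_logistic(x_star, eta)
    return (1.0 - p) * p * eta

def numerical_gradient(m, x_star, coef, step=1e-6):
    """Central finite-difference gradient of m(., coef) at x_star."""
    grad = np.zeros_like(x_star)
    for k in range(x_star.size):
        e = np.zeros_like(x_star)
        e[k] = step
        grad[k] = (m(x_star + e, coef) - m(x_star - e, coef)) / (2.0 * step)
    return grad

# Hypothetical evaluation point and coefficients, chosen only for illustration.
x_star = np.array([0.5, -0.2, 1.5])
coef = np.array([0.3, -0.4, 0.8])

models = [("linear", m_linear, ma_linear),
          ("exponential", m_exponential, ma_exponential),
          ("logistic", m_logistic, ma_logistic)]
for name, m, ma in models:
    print(name, ma(x_star, coef), numerical_gradient(m, x_star, coef))
```

Only in the linear model is the marginal effect the coefficient itself; in the other two it is the coefficient scaled by a factor that depends on the evaluation point.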
Linear projection
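We already know that m(x) is the best predictor for y. ⇒ Restrict our scope to the linear predictor
$$ \ell(x) = x'b. \tag{13} $$
Define a new MSE,
$$ q(b) = E[(y - x'b)^2], \tag{14} $$
where all we can do is choose b.
Hereafter we assume that $E(xx')$ is positive definite,
$$ E(xx') > 0. \tag{15} $$
The gradient (first derivative) and Hessian (second derivative) of q(b) w.r.t. b are given by
$$ g(b) = \frac{\partial q(b)}{\partial b} = -2\,E[x(y - x'b)], \tag{16} $$
$$ H(b) = \frac{\partial g(b)}{\partial b'} = 2\,E(xx'). \tag{17} $$
First order condition: Solve the first order condition
$$ E[x(y - x'b)] = 0 \;\Leftrightarrow\; E(xx')\,b = E(xy) \tag{18} $$
and get
$$ b = E(xx')^{-1} E(xy). \tag{19} $$
Second order condition: Because $E(xx') > 0$ by assumption,
$$ H(b) > 0. \tag{20} $$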
Proposition 2 (Best linear predictor)
The MSE in eq.(14) is minimized by setting
$$ b = E(xx')^{-1} E(xy). \tag{21} $$
Proof. See above.
So we have the best linear predictor (BLP) or linear projection (LP) of y on x,
$$ L(y|x) = x'b, \quad b = E(xx')^{-1} E(xy). \tag{22} $$
If the regression (conditional expectation) m(x) is intrinsically linear, it coincides with the BLP,
$$ m(x) = E(y|x) = L(y|x). \tag{23} $$
Note: Even if the regression is nonlinear, the BLP is defined as long as $E(xx')$ is nonsingular.
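The BLP coefficient in eq. (22) has a direct sample analog: replace the population moments with sample averages. The following is a minimal sketch under an assumed, hypothetical data-generating process with a nonlinear regression function; the simulated design and variable names are not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical design with a nonlinear CEF, E(y|x) = exp(0.5 * x1).
N = 100_000
x1 = rng.normal(size=N)
y = np.exp(0.5 * x1) + rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])  # regressor vector x_i = (1, x1_i)'

# Sample analogs of E(xx') and E(xy).
Sxx = X.T @ X / N
Sxy = X.T @ y / N

# b = E(xx')^{-1} E(xy), solved without forming the inverse explicitly.
b_hat = np.linalg.solve(Sxx, Sxy)
print("sample BLP coefficients:", b_hat)
```

The coefficients are well defined even though the regression is nonlinear, because the sample version of E(xx′) is nonsingular here. Solving the linear system rather than inverting the moment matrix is a numerical design choice, not part of the definition.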
Example 1
Consider the bivariate distribution on the unit square of Goldberger (1991);
$$ f(x,y) = x + y, \quad 0 < x < 1, \; 0 < y < 1. \tag{24} $$
For this joint density, the regression (CEF) and BLP are given by
$$ E(y|x) = \frac{3x + 2}{6x + 3}, \tag{25} $$
$$ L(y|x) = \frac{7}{11} - \frac{1}{11}\,x, \tag{26} $$
where the regression is nonlinear in x.
Table 1: Both are similar; the BLP seems to approximate the true regression function well.
Figure 1: CEF (regression, red) vs. BLP (blue) for the joint density in eq. (24); the horizontal axis runs over x from 0.0 to 1.0.
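The BLP coefficients for the density in eq. (24) can be recomputed exactly from its moments. The sketch below uses exact rational arithmetic; the helper names `moment`, `cef`, and `blp` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def moment(a, b):
    """Exact E(x^a y^b) under f(x, y) = x + y on the unit square."""
    # Integral of x^(a+1) y^b plus integral of x^a y^(b+1) over (0,1) x (0,1).
    return Fraction(1, (a + 2) * (b + 1)) + Fraction(1, (a + 1) * (b + 2))

Ex, Ey = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - Ex**2
cov_xy = moment(1, 1) - Ex * Ey

# With a constant and one regressor, the BLP slope is Cov(x, y) / Var(x).
slope = cov_xy / var_x
intercept = Ey - slope * Ex

def cef(x):
    return (3 * x + 2) / (6 * x + 3)

def blp(x):
    return intercept + slope * x

print("BLP intercept:", intercept, "slope:", slope)
grid = [Fraction(k, 5) for k in range(6)]
for x in grid:
    print(float(x), round(float(cef(x)), 4), round(float(blp(x)), 4))
```

The exact intercept and slope are 7/11 and -1/11, and on this grid the two functions differ by at most about 0.03, which is the closeness the figure conveys.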
Section 3
Regression analysis under exogeneity
Consider the linear model
$$ y_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, \ldots, N, \tag{27} $$
for N independent observations, where $\varepsilon_i$ is an error term.
The above merely describes a linear relationship among $(x_i, y_i, \varepsilon_i)$; it carries no structural meaning. ⇒ Assume that $x_i$ is exogenous:
$$ E(\varepsilon_i | x_i) = 0, \tag{28} $$
so that
$$ E(y_i | x_i) = x_i'\beta + E(\varepsilon_i | x_i) = x_i'\beta. \tag{29} $$
∴ Equation (27) with an exogenous regressor $x_i$ implies a linear regression, so β is the partial effect.
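Conversely, if we first assume the linear regression $E(y_i|x_i) = x_i'\beta$, then
$$ E(\varepsilon_i|x_i) = E(y_i - x_i'\beta \,|\, x_i) = E(y_i|x_i) - x_i'\beta = 0. \tag{30} $$
So the exogeneity can be stated as either
$$ E(\varepsilon_i|x_i) = 0 \quad \text{or} \quad E(y_i|x_i) = x_i'\beta. \tag{31} $$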
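Let $m(x_i;\theta)$ be a general (linear or nonlinear) parametric model of the regression and define the residual $\varepsilon_i = y_i - m(x_i;\theta)$. Then
$$ E(y_i|x_i) = m(x_i;\theta) \;\Leftrightarrow\; E(\varepsilon_i|x_i) = E(y_i|x_i) - m(x_i;\theta) = 0, \tag{32} $$
$$ E(y_i|x_i) \neq m(x_i;\theta) \;\Leftrightarrow\; E(\varepsilon_i|x_i) = E(y_i|x_i) - m(x_i;\theta) \neq 0. \tag{33} $$
In the first case, we have correctly specified $m(x_i;\theta)$.
In the second case, $m(x_i;\theta)$ involves some misspecification.
For example, suppose we assume $m(x_i;\beta) = x_i'\beta$ but in actuality $E(y_i|x_i) = \exp(x_i'\gamma)$. Then
$$ E(y_i|x_i) \neq x_i'\beta \;\Leftrightarrow\; E(\varepsilon_i|x_i) = E(y_i|x_i) - x_i'\beta \neq 0. \tag{34} $$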
Another typical case of misspecification is due to omitted variables.
Suppose that we assume $m(x_i;\beta) = x_i'\beta$ but $\varepsilon_i$ involves omitted variables correlated with $x_i$, which results in $E(\varepsilon_i|x_i) = x_i'\gamma$.
In this case,
$$ E(y_i|x_i) = x_i'\beta + x_i'\gamma = x_i'\eta, \quad \eta = \beta + \gamma, \tag{35} $$
which differs from our assumption. ⇒ γ is called an omitted variables bias, a version of the endogeneity bias studied in LN#02.
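The omitted variables case can be illustrated with a small simulation. This is a sketch under an assumed, hypothetical design (scalar regressor x, one omitted variable w with E(w|x) = 0.5x); the parameter values and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical design: scalar regressor x and an omitted variable w with E(w|x) = 0.5 * x.
N = 200_000
beta, delta = 1.0, 2.0
x = rng.normal(size=N)
w = 0.5 * x + rng.normal(size=N)
u = rng.normal(size=N)

# The error absorbs the omitted variable: eps = delta * w + u, so E(eps|x) = gamma * x.
eps = delta * w + u
y = beta * x + eps
gamma = delta * 0.5

# Regressing y on x alone estimates eta = beta + gamma rather than beta.
eta_hat = (x @ y) / (x @ x)
print("beta:", beta, "gamma:", gamma, "estimate from the short regression:", eta_hat)
```

With these values the short regression centers on η = β + γ = 2 instead of β = 1, which is the bias described above.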
Regression analysis under orthogonality
Exogeneity requires perfect knowledge of the functional form of the regression, which is too demanding. ⇒ Instead, let the regressor $x_i$ be orthogonal to the error $\varepsilon_i$:
$$ E(x_i\varepsilon_i) = 0. \tag{36} $$
By inserting $\varepsilon_i = y_i - x_i'\beta$ into the orthogonality condition, we have
$$ E[x_i(y_i - x_i'\beta)] = 0 \;\Leftrightarrow\; E(x_ix_i')\,\beta = E(x_iy_i) \tag{37} $$
and, provided that $E(x_ix_i')$ is nonsingular,
$$ \beta = E(x_ix_i')^{-1} E(x_iy_i), \tag{38} $$
identical to the BLP coefficient in eq. (19).
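Important remark: By the law of iterated expectations,
$$ E(x_i\varepsilon_i) = E[E(x_i\varepsilon_i|x_i)] = E[x_i\,E(\varepsilon_i|x_i)], \tag{39} $$
which, if we assume the exogeneity of eq.(28), becomes
$$ E(x_i\varepsilon_i) = E(x_i \cdot 0) = 0. \tag{40} $$
∴ Exogeneity implies orthogonality:
$$ \underbrace{E(\varepsilon_i|x_i) = 0}_{\text{exogeneity}} \;\Rightarrow\; \underbrace{E(x_i\varepsilon_i) = 0}_{\text{orthogonality}}, \tag{41} $$
but not vice versa. ⇒ In this sense the orthogonality assumption is weaker than exogeneity.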
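A standard counterexample, not taken from the lecture, shows why the converse fails. Assume a scalar regressor $x_i \sim N(0,1)$ and the hypothetical error $\varepsilon_i = x_i^2 - 1$:

```latex
\begin{aligned}
E(\varepsilon_i | x_i) &= x_i^2 - 1 \neq 0
  && \text{(exogeneity fails)}, \\
E(\varepsilon_i) &= E(x_i^2) - 1 = 0, \\
E(x_i \varepsilon_i) &= E(x_i^3) - E(x_i) = 0 - 0 = 0
  && \text{(orthogonality holds)}.
\end{aligned}
```

The error is uncorrelated with both the constant and $x_i$, yet its conditional mean varies with $x_i$.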
The BLP interpretation of linear regression has become popular because it
1 is based on a weaker assumption,
2 is legitimate even for nonlinear regressions, and
3 mimics the (possibly complex, nonlinear) true regression function $m(x_i)$ well in that
$$ b = \arg\min_b q(b), \quad q(b) = E\{[m(x_i) - x_i'b]^2\}. \tag{42} $$
(The proof is left as a quiz.)
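Without giving the proof, the third property can be checked numerically: projecting y on x and projecting the regression function m(x) on x should give nearly the same coefficients in a large sample. The sketch assumes a hypothetical nonlinear CEF and simulated data, neither of which comes from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

# Hypothetical nonlinear CEF m(x) = exp(0.5 * x1) with additive noise.
N = 200_000
x1 = rng.normal(size=N)
m = np.exp(0.5 * x1)
y = m + rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])

# Least squares of y on x, and of the CEF m(x) on x.
b_from_y, *_ = np.linalg.lstsq(X, y, rcond=None)
b_from_m, *_ = np.linalg.lstsq(X, m, rcond=None)

print("projection of y on x:   ", b_from_y)
print("projection of m(x) on x:", b_from_m)
```

The two coefficient vectors differ only by sampling noise, which is consistent with the BLP being the best linear approximation to the regression function.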