Maximum Likelihood Estimation
Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009
April 13, 2009
Consider the following data set
The explained variable is binary.
Admission to grad school depending on GRE score, families sending kids to school as a function of income, etc.
This case is a good candidate for a non-linear model
How to search for a model and estimate its parameters?
Consider the following non-linear model
$$E(y|x) = F(\beta_1 + \beta_2 x), \qquad F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$

For example, positive values for $\beta_1$ may produce a function that looks like the one in the previous graph (we will get $\hat\beta_1 = 2.8619$ and $\hat\beta_2 = 6.7612$). This is a truly non-linear model (in both parameters and variables).
This is the probit model.
In order to search for an estimation strategy for the parameters, we will exploit the following fact.
Since y|x is a binary variable
E(y|x) = Pr(y = 1|x)
so, by being specific about the form of E(y|x) we are being specific about Pr(y = 1|x).
Likelihood and Basic Concepts
Let $Z \sim f(z; \theta_0)$, where $\theta_0 \in \Theta \subset \mathbb{R}^K$ is the true value of a $K \times 1$ parameter vector. The likelihood function is the density seen as a function of the parameter: $L(\theta; z) \equiv f(z; \theta)$.
Example: $Z \sim N(\mu, \sigma^2)$. Here $\theta = (\mu, \sigma^2)$, and $K = 2$.

Note:

$$f(z; \theta): \mathbb{R} \to \mathbb{R}, \qquad z \mapsto \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right]$$

a function of $z$ with $\theta$ fixed, while

$$L(\theta; z): \Theta \to \mathbb{R}, \qquad \theta \mapsto \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right]$$

is the same expression, now a function of $\theta$ with $z$ fixed.
Maximum Likelihood
The maximum-likelihood estimator $\hat\theta_n$ is defined as

$$\hat\theta_n \equiv \operatorname*{argmax}_{\theta \in \Theta} L(\theta; z)$$
It is kind of a reverse engineering process: to generate random numbers from a certain distribution you first set parameter values and then get realizations. This is doing the reverse process: first set the realizations and try to get the parameters that are most likely to have generated them.
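A minimal numerical sketch of this "reverse engineering" idea, assuming NumPy and SciPy are available (the sample size and parameter values are illustrative): first generate realizations from a normal with known parameters, then recover those parameters by maximizing the log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# "Forward" step: fix the parameters and generate realizations.
mu_true, sigma_true = 2.0, 1.5
z = rng.normal(mu_true, sigma_true, size=500)

# "Reverse" step: given the realizations, look for the (mu, sigma)
# most likely to have generated them.
def neg_loglik(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf
    return -np.sum(-np.log(sigma) - 0.5 * np.log(2 * np.pi)
                   - 0.5 * ((z - mu) / sigma) ** 2)

res = minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)  # close to (2.0, 1.5)
```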
Some normalizations
$$\hat\theta_n \equiv \operatorname*{argmax}_{\theta \in \Theta} L(\theta; z)$$

Note that

$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, \ln L(\theta; z), \qquad \ln L(\theta; z) = \sum_{i=1}^{n} \ln L(\theta; z_i) \equiv \sum_{i=1}^{n} l(\theta; z_i)$$

Also

$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, \frac{1}{n}\sum_{i=1}^{n} l(\theta; z_i)$$
We will use whichever one is more convenient.
If $l(\theta; z)$ is differentiable and has a local maximum at an interior point, then the FOCs for the problem are

$$\left.\frac{\partial l(\theta; z)}{\partial \theta}\right|_{\theta = \hat\theta_n} = \sum_{i=1}^{n} \left.\frac{\partial l(\theta; z_i)}{\partial \theta}\right|_{\theta = \hat\theta_n} = 0.$$
This is a system of $K$ possibly non-linear equations with $K$ unknowns, which define $\hat\theta_n$ implicitly.

Even if we can guarantee that a solution to this problem exists, we do not have enough information to solve for $\hat\theta_n$ explicitly.
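In practice the system is handed to a numerical equation solver. A sketch using the normal example (the data are simulated; searching over $\ln\sigma^2$ is just a device to keep the variance positive during the search):

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
z = rng.normal(2.0, 1.5, size=500)
n = len(z)

# Sample score for the N(mu, s2) model: K = 2 equations in K = 2 unknowns.
def score(theta):
    mu, log_s2 = theta
    s2 = np.exp(log_s2)
    return [np.sum(z - mu) / s2,
            -n / (2 * s2) + np.sum((z - mu) ** 2) / (2 * s2 ** 2)]

sol = root(score, x0=[0.0, 0.0])
print(sol.x[0], np.exp(sol.x[1]))  # sample mean and (1/n) sum (z - zbar)^2
```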
The discrete case
When $Y$ is a discrete random variable, the likelihood function will be directly the probability function, that is,

$$L(Y; \theta) = f(y; \theta)$$

where $f(y; \theta)$ is now $\Pr(Y = y; \theta)$.
Conditional likelihood
Suppose $f(y, x; \theta)$ is the joint density function of two variables $X$ and $Y$. Then it can be decomposed as

$$f(y, x; \theta) = f(y \mid x; \theta_1)\, f(x; \theta_2)$$

Suppose we are interested in estimating $\theta_1$: if $\theta_1$ and $\theta_2$ are functionally unrelated, then maximizing the joint likelihood is achieved by maximizing separately the conditional and the marginal likelihoods. The MLE of $\theta_1$ also maximizes the conditional likelihood, so we can obtain ML estimates by specifying the conditional likelihood only.
Three Examples
Poisson Distribution: $Y \sim \text{Poisson}(\lambda)$ if it takes non-negative integer values and:

$$f(y) = \Pr(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}$$
For an iid sample:
$$L(\lambda; Y) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}$$
its log is:
$$l(\lambda; Y) = \sum_{i=1}^{n}\left[-\lambda + y_i \ln\lambda - \ln y_i!\right] = -n\lambda + \ln\lambda \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln y_i!$$
FOCs are

$$-n + \frac{1}{\hat\lambda} \sum_{i=1}^{n} y_i = 0$$

so the MLE of $\lambda$ is:

$$\hat\lambda = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$$
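A quick numerical check of this closed form (a sketch with simulated data, assuming NumPy and SciPy): maximizing the Poisson log-likelihood numerically returns the sample mean. The $\ln y_i!$ term is dropped since it does not involve $\lambda$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.poisson(lam=3.0, size=1000)

# Poisson log-likelihood, dropping the ln(y_i!) term (constant in lambda).
def neg_loglik(lam):
    return -(-len(y) * lam + np.log(lam) * y.sum())

opt = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
print(opt.x, y.mean())  # the numerical MLE agrees with the sample mean
```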
Probit Model:
$Y|X \sim \text{Bernoulli}(p)$, $p \equiv \Pr(Y = 1|x) = F(x'\beta)$, where $F(z)$ is the standard normal CDF.
The sample (conditional) likelihood function will be:
$$L(\beta; Y) = \prod_{i:\, y_i = 1} p_i \prod_{i:\, y_i = 0} (1 - p_i) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
Then
$$l(\beta; Y) = \sum_{i=1}^{n} \left[y_i \ln F(x_i'\beta) + (1 - y_i)\ln\left(1 - F(x_i'\beta)\right)\right]$$

FOCs for a local maximum are:

$$\sum_{i=1}^{n} \frac{(y_i - F_i)\, f_i\, x_i}{F_i (1 - F_i)} = 0$$

with $F_i \equiv F(x_i'\beta)$, $f_i \equiv f(x_i'\beta)$. This is a system of $K$ non-linear equations with $K$ unknowns. Moreover, it is not possible to solve for $\hat\beta$ and obtain an explicit solution.
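Since no explicit solution exists, $\hat\beta$ is obtained numerically. A minimal sketch with simulated data (the design, true coefficients, and starting values are illustrative, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # intercept + regressor
beta_true = np.array([-1.0, 2.0])
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

def neg_loglik(beta):
    F = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard the logs
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(beta_hat)  # close to beta_true
```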
Gaussian regression model:
$$y = x'\beta_0 + u, \quad u|x \sim N(0, \sigma^2)$$

or, alternatively, $y|x \sim N(x'\beta_0, \sigma^2)$, with conditional density

$$f(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y - x'\beta_0}{\sigma}\right)^2\right]$$
Then
$$l(\beta, \sigma^2; Y) = -n \ln \sigma - n \ln \sqrt{2\pi} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i'\beta)^2$$
Any idea what the MLE of $\beta_0$ will be?
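A numerical hint (a sketch with simulated data; names and values are illustrative): for any $\sigma$, maximizing $l$ over $\beta$ amounts to minimizing $\sum_i (y_i - x_i'\beta)^2$, so the ML estimate of $\beta$ coincides with OLS.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

# For any sigma, maximizing l over beta means minimizing the sum of
# squared residuals, so the ML estimate of beta is the OLS estimate.
def neg_loglik(theta):
    beta, s = theta[:2], theta[2]
    if s <= 0:
        return np.inf
    return n * np.log(s) + np.sum((y - X @ beta) ** 2) / (2 * s ** 2)

mle = minimize(neg_loglik, x0=[0.0, 0.0, 1.0], method="Nelder-Mead").x
ols = np.linalg.solve(X.T @ X, X.T @ y)
print(mle[:2], ols)  # the two estimates of beta coincide
```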
Asymptotic Properties
We will follow a similar path as we did with other estimation strategies:
Consistency
Asymptotic Normality
Estimation of the asymptotic variance
Asymptotic efficiency
Invariance (this is new)
But first we need to agree on some regularity conditions
Setup and Regularity Conditions
At this stage we will simply state them, and discuss them as we go along.
Some are purely technical, but some of them have important intuitive meaning.
1. $Z_i$, $i = 1, \ldots, n$, iid with density $f(z_i; \theta_0)$.
2. $\theta \neq \theta_0 \Rightarrow f(z_i; \theta) \neq f(z_i; \theta_0)$ (identification).
3. $\theta_0 \in \Theta$, a compact set.
4. $\ln f(z_i; \theta)$ is continuous at each $\theta \in \Theta$ w.p.1.
5. $E\left[\sup_{\theta \in \Theta} |\ln f(z; \theta)|\right] < \infty$.
In addition, for asymptotic normality we will add the following:
6. $\theta_0$ is an interior point of $\Theta$.
7. $f(z; \theta)$ is twice continuously differentiable and strictly positive in a neighborhood $N$ of $\theta_0$.
8. $\int \sup_{\theta \in N} \|\nabla_\theta f(z; \theta)\|\, dz < \infty$ and $\int \sup_{\theta \in N} \|\nabla_{\theta\theta'} f(z; \theta)\|\, dz < \infty$.
Quick detour: on bounds.
Recall that:
$$|E(X)| \leq E(|X|)$$

Then by bounding $E(|X|)$ we are guaranteeing that $-\infty < E(X) < \infty$.
Consistency
Our starting point is the following normalized version of the MLE:
$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, Q_n(\theta), \qquad Q_n(\theta) \equiv \frac{1}{n} \sum_{i=1}^{n} l(\theta; Z_i)$$
For consistency we need to establish the following three results
1. $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta) \equiv E[l(\theta; Z)]$.
2. $Q_0(\theta)$ has a unique maximum at $\theta_0$.
3. $Q_n(\theta) \stackrel{up}{\to} Q_0(\theta) \;\Rightarrow\; \operatorname{argmax} Q_n(\theta) = \hat\theta_n \stackrel{p}{\to} \operatorname{argmax} Q_0(\theta) = \theta_0$.
Intuition
$\hat\theta_n$ maximizes $Q_n(\theta)$ (definition of MLE).
$Q_n(\theta) \to Q_0(\theta)$ (the ML problem is well defined at $\infty$).
$\theta_0$ maximizes $Q_0(\theta)$ (the true value solves the problem at $\infty$).
By maximizing $Q_n(\theta)$ we end up maximizing $Q_0(\theta)$: convergence of the sequence of functions guarantees convergence of the maximizers (this is the difficult step).
If $\operatorname{argmax} Q_n(\theta)$ is seen as a function defined on functions, what property is implied by 3)?
1) $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta) \equiv E[l(\theta; Z)]$

If we fix $\theta$ at any point, then

$$Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} l(\theta; Z_i) \stackrel{p}{\to} E[l(\theta; Z)]$$

since by our assumptions (which ones?), $l(\theta; Z_i)$ is a sequence of rvs that satisfies Kolmogorov's LLN.

This establishes pointwise convergence of $Q_n(\theta)$ to $Q_0(\theta)$. But our strategy requires uniform convergence.
Uniform Convergence: a sequence of real-valued functions $f_n$ defined on a set $S \subset \mathbb{R}$ converges uniformly to a function $f$ on $S$ if for each $\epsilon > 0$ there is a number $N$ such that

$$n > N \;\Rightarrow\; |f_n(x) - f(x)| < \epsilon \;\text{ for all } x \in S$$

Intuition: we are using the same $\epsilon$ for the whole domain, that is, eventually we can put $f_n$ inside the strip $f \pm \epsilon$. It can be shown that uniform convergence is equivalent to

$$\sup_{x \in S} |f(x) - f_n(x)| \to 0$$

(see Ross (1980, p. 137)).

Uniform convergence in probability: $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta)$ means $\sup_{\theta \in \Theta} |Q_n(\theta) - Q_0(\theta)| \stackrel{p}{\to} 0$.
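A small simulation that illustrates the definition (a sketch, assuming a $N(\mu_0, 1)$ model, where $Q_0$ has the closed form $Q_0(\mu) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}[1 + (\mu_0 - \mu)^2]$; the grid plays the role of the compact set $\Theta$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu0 = 0.0
grid = np.linspace(-2.0, 2.0, 201)  # a grid over the compact set Theta

def Qn(z, mu):  # sample average log-likelihood of the N(mu, 1) model
    return -0.5 * np.log(2 * np.pi) - 0.5 * np.mean((z[:, None] - mu) ** 2, axis=0)

def Q0(mu):     # its pointwise limit E[l(mu, Z)] when Z ~ N(mu0, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (1.0 + (mu0 - mu) ** 2)

for n in [10, 100, 1000, 10000]:
    z = rng.normal(mu0, 1.0, size=n)
    print(n, np.max(np.abs(Qn(z, grid) - Q0(grid))))  # the sup gap shrinks
```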
Uniform LLN: if $Z_i$ are iid, $\Theta$ is compact, $a(Z_i, \theta)$ is continuous at each $\theta \in \Theta$ w.p.1, and there is $d(Z)$ with $\|a(z, \theta)\| \leq d(Z)$ for all $\theta \in \Theta$ and $E[d(Z)] < \infty$, then $E[a(Z, \theta)]$ is continuous in $\theta$ and $\sup_{\theta \in \Theta} \left\| \frac{1}{n}\sum_{i=1}^{n} a(Z_i, \theta) - E[a(Z, \theta)] \right\| \stackrel{p}{\to} 0$.
2) $Q_0(\theta)$ has a unique maximum at $\theta_0$.

Information inequality: if $\theta \neq \theta_0 \Rightarrow f(z; \theta) \neq f(z; \theta_0)$ and $E[|l(\theta; Z)|] < \infty$, then $Q_0(\theta)$ has a unique maximum at $\theta_0$. For any $\theta \neq \theta_0$:

$$Q_0(\theta_0) - Q_0(\theta) = -E\left[\ln \frac{f(Z; \theta)}{f(Z; \theta_0)}\right] > -\ln E\left[\frac{f(Z; \theta)}{f(Z; \theta_0)}\right] \qquad \text{(Jensen's inequality!)}$$

$$= -\ln \int \frac{f(z; \theta)}{f(z; \theta_0)}\, f(z; \theta_0)\, dz = -\ln \int f(z; \theta)\, dz = -\ln 1 = 0$$
Important: check where the argument breaks down if the assumptions do not hold.
3) 1) and 2) imply consistency.
Pick any $\epsilon > 0$. Let us get three inequalities that hold wpa1 (with probability approaching one).
$\hat\theta_n$ maximizes $Q_n(\theta)$, so

a) $Q_n(\hat\theta_n) > Q_n(\theta_0) - \epsilon/3$

$Q_n(\theta)$ converges uniformly to $Q_0(\theta)$, so $|Q_n(\hat\theta_n) - Q_0(\hat\theta_n)| < \epsilon/3$, hence

b) $Q_0(\hat\theta_n) > Q_n(\hat\theta_n) - \epsilon/3$

and $|Q_0(\theta_0) - Q_n(\theta_0)| < \epsilon/3$, hence

c) $Q_n(\theta_0) > Q_0(\theta_0) - \epsilon/3$
Now start with b):

$$Q_0(\hat\theta_n) > Q_n(\hat\theta_n) - \epsilon/3 > Q_n(\theta_0) - 2\epsilon/3 \qquad \text{(subtract } \epsilon/3 \text{ from both sides of a))}$$

$$\text{d)}\quad Q_0(\hat\theta_n) > Q_0(\theta_0) - \epsilon \qquad \text{(subtract } 2\epsilon/3 \text{ from both sides of c))}$$
Let $N$ be an open subset of $\Theta$ containing $\theta_0$. $N$ open implies $\Theta \cap N^c$ closed and bounded: compact in our case. Since $Q_0(\theta)$ is continuous (the uniform limit of a sequence of continuous functions is continuous), then:

$$\sup_{\theta \in \Theta \cap N^c} Q_0(\theta) = Q_0(\theta^*)$$

for some $\theta^* \in \Theta \cap N^c$ (continuous functions over compact sets achieve their maximum).
Now since $\theta_0$ is the unique maximizer of $Q_0(\theta)$,

$$Q_0(\theta^*) < Q_0(\theta_0)$$

Now pick $\epsilon = Q_0(\theta_0) - Q_0(\theta^*)$; then by inequality d), wpa1

$$Q_0(\hat\theta_n) > Q_0(\theta^*)$$

so $\hat\theta_n \in N$ wpa1 (if not, $Q_0(\theta^*)$ would not be the sup). Then, using the definition of convergence in probability, since $N$ was chosen arbitrarily,

$$\hat\theta_n \stackrel{p}{\to} \theta_0$$
Asymptotic normality
Asymptotic normality is a bit easier to establish, since we will follow a strategy very similar to what we did with all previous estimators.
But first we need to establish some notation and results.
Score and hessian.
Score equality
Information matrix
Information matrix equality
Score, Hessian and Information
Score: $s(\theta; Z_i) \equiv \nabla_\theta\, l(\theta; Z_i)$, a $K \times 1$ vector.
Sample score: $s(\theta; \mathbf{Z}) = \nabla_\theta\, l(\theta; \mathbf{Z}) = \sum_{i=1}^{n} s(\theta; Z_i)$

Hessian: $H(\theta; Z_i) \equiv \nabla_{\theta\theta'}\, l(\theta; Z_i)$, a $K \times K$ matrix.
Sample hessian: $H(\theta; \mathbf{Z}) = \nabla_{\theta\theta'}\, l(\theta; \mathbf{Z}) = \sum_{i=1}^{n} H(\theta; Z_i)$

Information matrix: $J \equiv E\left[s(\theta_0; Z)\, s(\theta_0; Z)'\right]$, a $K \times K$ matrix.

Score equality: $E[s(\theta_0; Z)] = 0$ (it is kind of a FOC of the information inequality: $\theta_0$ maximizes $E[l(\theta; Z)]$).

Information equality: $E[H(\theta_0; Z)] = -J$

Note that this implies $V[s(\theta_0; Z)] = J$.
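Both equalities are easy to check by simulation. A sketch for the Poisson model, where $l(\lambda; y) = -\lambda + y\ln\lambda - \ln y!$, so $s(\lambda; y) = y/\lambda - 1$ and $H(\lambda; y) = -y/\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
lam0 = 3.0
y = rng.poisson(lam0, size=200000)

s = y / lam0 - 1.0    # per-observation score at the true value
H = -y / lam0 ** 2    # per-observation hessian at the true value

print(s.mean())                    # ~ 0 (score equality)
print((s ** 2).mean(), -H.mean())  # both ~ 1/lam0 = J (information equality)
```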
Proof of score equality (the continuous case):
For any $\theta$:

$$\int f(z; \theta)\, dz = 1$$

Taking derivatives on both sides:

$$\frac{d\left[\int f(z; \theta)\, dz\right]}{d\theta} = 0$$

If it is possible to interchange differentiation and integration:

$$\int \frac{d f(z; \theta)}{d\theta}\, dz = 0$$

The score is a log-derivative, so

$$s(\theta; z) = \frac{d \ln f(z; \theta)}{d\theta} = \frac{d f(z; \theta)/d\theta}{f(z; \theta)}$$
hence

$$\frac{d f(z; \theta)}{d\theta} = s(\theta; z)\, f(z; \theta)$$

Replacing above:

$$\int s(\theta; z)\, f(z; \theta)\, dz = 0$$

When $\theta = \theta_0$:

$$\int s(\theta_0; z)\, f(z; \theta_0)\, dz = E[s(\theta_0; Z)]$$

so

$$E[s(\theta_0; Z)] = 0$$
Aside: Interchanging differentiation and integration
Source: Bartle, R., 1966, The Elements of Integration, Wiley, New York
Proof of information equality (the continuous case):
From the previous result:

$$\int s(\theta; z)\, f(z; \theta)\, dz = 0$$

Take derivatives on both sides, use the product rule, and omit arguments in functions to simplify notation:

$$\int (\nabla s\, f + s\, \nabla f)\, dz = 0 \;\Rightarrow\; \int \nabla s\, f\, dz + \int s\, \nabla f\, dz = 0$$

From the score equality, $\nabla f = s f$; replacing $\nabla f$ above:

$$\int H(\theta; z)\, f(z; \theta)\, dz + \int s(\theta; z)\, s(\theta; z)'\, f(z; \theta)\, dz = 0$$

When $\theta = \theta_0$:

$$E[H(\theta_0; Z)] + E\left[s(\theta_0; Z)\, s(\theta_0; Z)'\right] = 0 \;\Rightarrow\; E[H(\theta_0; Z)] + J = 0$$

which implies the desired result.
Asymptotic normality
Under our assumptions, wpa1 the MLE satisfies the FOCs

$$s(\hat\theta_n; \mathbf{Z}) = 0$$

Take a first-order Taylor expansion around $\theta_0$:

$$s(\hat\theta_n; \mathbf{Z}) = s(\theta_0; \mathbf{Z}) + H(\bar\theta; \mathbf{Z})(\hat\theta_n - \theta_0) = 0$$

where $\bar\theta$ is a mean value located between $\hat\theta_n$ and $\theta_0$. (Note that consistency implies $\bar\theta \stackrel{p}{\to} \theta_0$.)

Now solve:

$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) = \left(-\frac{H(\bar\theta; \mathbf{Z})}{n}\right)^{-1} \left(\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}}\right)$$

Now we are back in familiar territory: we will show that the first factor does not explode, and that the second is asymptotically normal.
First we will show:

$$\left(-n^{-1} H(\bar\theta; \mathbf{Z})\right)^{-1} \stackrel{p}{\to} J^{-1}$$

Preliminary result: if $g_n(\theta)$ is a sequence of random functions that converges uniformly in probability to $g_0(\theta)$ for all $\theta$ in a compact set $\Theta$, and $g_0(\theta)$ is continuous, then $\theta_n \stackrel{p}{\to} \theta_0$ implies $g_n(\theta_n) \stackrel{p}{\to} g_0(\theta_0)$ (see Ruud (2000, p. 326)).

According to our assumptions, $n^{-1} H(\theta; \mathbf{Z}) = n^{-1} \sum_{i=1}^{n} H(\theta; Z_i)$ converges uniformly in probability to $E[H(\theta; Z)]$, which, by the previous results, is continuous in $\theta$.

Hence, by the preliminary result, since $\bar\theta \stackrel{p}{\to} \theta_0$,

$$n^{-1} H(\bar\theta; \mathbf{Z}) \stackrel{p}{\to} E[H(\theta_0; Z)] = -J$$
Now we show:

$$\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}} \stackrel{d}{\to} N(0, J)$$

Start with

$$\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}} = \frac{\sqrt{n}\, s(\theta_0; \mathbf{Z})}{n} = \sqrt{n}\, \frac{\sum_{i=1}^{n} s(\theta_0; Z_i)}{n}$$

In order to apply the CLT we check:

$E[s(\theta_0; Z_i)] = 0$, by the score equality.
$V[s(\theta_0; Z_i)] = J$.
Collecting results:
$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) = \underbrace{\left(-\frac{H(\bar\theta; \mathbf{Z})}{n}\right)^{-1}}_{\stackrel{p}{\to}\, J^{-1}} \underbrace{\left(\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}}\right)}_{\stackrel{d}{\to}\, N(0,\, J)}$$

Then by Slutsky's theorem and linearity,

$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) \stackrel{d}{\to} N\left(0,\, J^{-1} J J^{-1}\right) = N\left(0,\, J^{-1}\right)$$
Variance estimation
The asymptotic variance of $\hat\theta_n$ is $J^{-1}$, with $J = V[s(\theta_0; Z)]$. We will propose three consistent estimators:

1. Inverse of the empirical minus hessian: $\left[-\frac{1}{n}\sum_{i=1}^{n} H(\hat\theta_n; Z_i)\right]^{-1}$
2. Inverse of the empirical variance of the score (OPG): $\left[\frac{1}{n}\sum_{i=1}^{n} s(\hat\theta_n; Z_i)\, s(\hat\theta_n; Z_i)'\right]^{-1}$
3. Inverse of the empirical information matrix: $\left[J(\hat\theta_n)\right]^{-1}$
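A sketch of the three estimators in the Poisson model, where all of them are available in closed form ($s$ and $H$ as derived above; $J(\lambda) = 1/\lambda$). The three standard errors for $\hat\lambda$ should be close to each other and to $\sqrt{\lambda_0/n}$:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.poisson(3.0, size=2000)
n, lam_hat = len(y), y.mean()   # the Poisson MLE

s = y / lam_hat - 1.0           # per-observation score at lam_hat
H = -y / lam_hat ** 2           # per-observation hessian at lam_hat

J1 = -H.mean()                  # 1: empirical minus hessian
J2 = (s ** 2).mean()            # 2: empirical variance of the score (OPG)
J3 = 1.0 / lam_hat              # 3: information matrix evaluated at lam_hat

# Three standard errors for lam_hat, all estimating sqrt(lam0 / n):
print([np.sqrt(1.0 / (n * J)) for J in (J1, J2, J3)])
```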
Invariance
Invariance: let $\lambda = g(\theta)$, where $g(\cdot)$ is a one-to-one function. Let $\theta_0$ denote the true parameter, so $\lambda_0 = g(\theta_0)$ is the true parameter under the new reparametrization. Then, if $\hat\theta$ is the MLE of $\theta_0$, $\hat\lambda = g(\hat\theta)$ is the MLE of $\lambda_0$.
Example: if $\hat\lambda$ is the MLE of $\ln(\theta_0)$, how can we get the MLE of $\theta_0$?
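By invariance the answer is to apply the inverse transformation, $\exp(\hat\lambda)$. A numerical sketch with the Poisson model (maximizing over $\gamma = \ln\lambda$ and comparing $\exp(\hat\gamma)$ with the direct MLE $\bar y$; data simulated):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
y = rng.poisson(3.0, size=1000)

# Reparametrize the Poisson model with gamma = ln(lambda) and maximize.
def neg_loglik(gamma):
    lam = np.exp(gamma)
    return -(-len(y) * lam + np.log(lam) * y.sum())

gamma_hat = minimize_scalar(neg_loglik, bounds=(-5.0, 5.0), method="bounded").x
print(np.exp(gamma_hat), y.mean())  # exp(MLE of ln lambda) = MLE of lambda
```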
Proof:
Since $\hat\theta$ is the MLE,

$$l(\hat\theta; z) \geq l(\theta; z)$$

for every $\theta \in \Theta$. Since $\lambda = g(\theta)$ is one-to-one,

$$l(g^{-1}(\hat\lambda); z) \geq l(g^{-1}(\lambda); z)$$

so $\hat\lambda = g(\hat\theta)$ maximizes the reparametrized log-likelihood.
MLE and unbiasedness
The invariance property makes the MLE estimator very likely to be biased in many relevant cases.
Consider the following intuition. Suppose $\hat\theta$ is the MLE for $\theta_0$ and suppose it is unbiased, so

$$E(\hat\theta) = \theta_0$$

By invariance, $g(\hat\theta)$ is the MLE of $g(\theta_0)$. Is $g(\hat\theta)$ unbiased? In general,

$$E(g(\hat\theta)) \neq g(E(\hat\theta)) = g(\theta_0)$$

so if the MLE is unbiased for one parametrization, it is very likely to be biased for most other parametrizations.
MLE and Efficiency
Let $\tilde\theta$ be any unbiased estimator of $\theta_0$. An important result is the following.

Cramér-Rao Inequality: $V(\tilde\theta) - (nJ)^{-1}$ is psd.
This provides a lower bound for unbiased estimators.
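A simulation sketch in the Poisson model, where $\bar y$ is unbiased for $\lambda_0$ and $J = 1/\lambda_0$, so the bound $(nJ)^{-1} = \lambda_0/n$ is actually attained:

```python
import numpy as np

rng = np.random.default_rng(9)
lam0, n, reps = 3.0, 50, 20000

ybar = rng.poisson(lam0, size=(reps, n)).mean(axis=1)  # unbiased for lam0

print(ybar.var())  # simulated variance of the estimator
print(lam0 / n)    # the Cramer-Rao bound (nJ)^{-1} with J = 1/lam0
```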
Proof: the single parameter case (K = 1).
For any two random variables $X$ and $Y$,

$$\mathrm{Cov}(X, Y)^2 \leq V(X)\, V(Y)$$

since the squared correlation is less than 1. Then

$$V(X) \geq \frac{\mathrm{Cov}(X, Y)^2}{V(Y)}$$
We will take $X = \tilde\theta$ and $Y = s(\theta_0; \mathbf{Z})$. It is immediate to check that

$$V(s(\theta_0; \mathbf{Z})) = V\left(\sum_{i=1}^{n} s(\theta_0; Z_i)\right) = n J$$
So...what do we need to show to finish the proof?
We have

$$V(\tilde\theta) \geq \frac{\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big)^2}{nJ}$$

so we need to show that $\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big) = 1$.

(Sketch:) Since $E(s(\theta_0; \mathbf{Z})) = 0$, $\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big) = E\big(\tilde\theta\, s(\theta_0; \mathbf{Z})\big)$:

$$E(\tilde\theta\, s) = \int \tilde\theta\, s\, f(z; \theta)\, dz = \int \tilde\theta\, \frac{d f(z; \theta)/d\theta}{f(z; \theta)}\, f(z; \theta)\, dz = \int \tilde\theta\, \frac{d f(z; \theta)}{d\theta}\, dz = \left.\frac{d}{d\theta} \int \tilde\theta\, f(z; \theta)\, dz\,\right|_{\theta = \theta_0} = \left.\frac{d\, E(\tilde\theta)}{d\theta}\right|_{\theta = \theta_0} = 1$$

since $E(\tilde\theta) = \theta$ for every $\theta$ (unbiasedness), so its derivative with respect to $\theta$ equals 1.
MLE and efficiency
The CR bound applies to unbiased estimators. The MLE is likely to be biased.

MLE estimators are asymptotically normal, centered around the true parameter with normalized variance equal to the CR lower bound for unbiased estimators.

Problem: the class of consistent, asymptotically normal estimators includes some extreme (and highly unusual) cases that can improve upon the CR bound (the so-called superefficient estimators).

Rao (1963): the MLE estimator is efficient (minimum variance) in the class of consistent and uniformly asymptotically normal (CUAN) estimators.

CUAN estimators: $\hat\theta$ is CUAN for $\theta_0$ if it is consistent and $\sqrt{n}(\hat\theta - \theta_0)$ converges in distribution to a normal rv uniformly over compact subsets of $\Theta$.