Maximum Likelihood Estimation
Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009
April 13, 2009
Consider the following data set
The explained variable is binary.
Admission to grad school depending on GRE score, families sending kids to school as a function of income, etc.
This case is a good candidate for a non-linear model
How to search for a model and estimate its parameters?
Consider the following non-linear model
$$E(y|x) = F(\beta_1 + \beta_2 x), \qquad F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$

For example, positive values for $\beta_1$ may produce a function that looks like the one in the previous graph (we will get $\hat\beta_1 = 2.8619$ and $\hat\beta_2 = 6.7612$). This is a truly non-linear model (in both parameters and variables).
This is the probit model.
In order to search for an estimation strategy for the parameters, we will exploit the following fact.
Since y|x is a binary variable
E(y|x) = Pr(y = 1|x)
so, by being specific about the form of E(y|x) we are being specific about Pr(y = 1|x).
Likelihood and Basic Concepts
Let $Z \sim f(z; \theta_0)$, where $\theta_0 \in \Theta \subset \mathbb{R}^K$ is the true value of a $K \times 1$ parameter vector. The likelihood function is the density seen as a function of the parameter: $L(\theta; z) \equiv f(z; \theta)$.
Example: $Z \sim N(\mu, \sigma^2)$. Here $\theta = (\mu, \sigma^2)$, and $K = 2$.

Note:

$$f(z; \theta): \mathbb{R} \to \mathbb{R}, \qquad z \mapsto \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right]$$

a function of $z$ with $\theta$ fixed, while

$$L(\theta; z): \Theta \to \mathbb{R}, \qquad \theta \mapsto \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(z-\mu)^2}{2\sigma^2}\right]$$

is the same expression, now a function of $\theta$ with $z$ fixed.
Maximum Likelihood
The maximum-likelihood estimator $\hat\theta_n$ is defined as

$$\hat\theta_n \equiv \operatorname*{argmax}_{\theta \in \Theta} L(\theta; z)$$
It is kind of a reverse engineering process: to generate random numbers from a certain distribution you first set parameter values and then get realizations. This is doing the reverse process: first set the realizations and try to get the parameters that are most likely to have generated them.
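A minimal numerical sketch of this "reverse engineering" idea, assuming NumPy and SciPy are available (the sample size and parameter values are illustrative): first generate realizations from a normal with known parameters, then recover those parameters by maximizing the log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# "Forward" step: fix the parameters and generate realizations.
mu_true, sigma_true = 2.0, 1.5
z = rng.normal(mu_true, sigma_true, size=500)

# "Reverse" step: given the realizations, look for the (mu, sigma)
# most likely to have generated them.
def neg_loglik(theta):
    mu, sigma = theta
    if sigma <= 0:
        return np.inf
    return -np.sum(-np.log(sigma) - 0.5 * np.log(2 * np.pi)
                   - 0.5 * ((z - mu) / sigma) ** 2)

res = minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)  # close to (2.0, 1.5)
```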
Some normalizations
$$\hat\theta_n \equiv \operatorname*{argmax}_{\theta \in \Theta} L(\theta; z)$$

Note that

$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, \ln L(\theta; z), \qquad \ln L(\theta; z) = \sum_{i=1}^{n} \ln L(\theta; z_i) \equiv \sum_{i=1}^{n} l(\theta; z_i)$$

Also

$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, \frac{1}{n}\sum_{i=1}^{n} l(\theta; z_i)$$
We will use whichever one is more convenient.
If $l(\theta; z)$ is differentiable and has a local maximum at an interior point, then the FOCs for the problem are

$$\left.\frac{\partial l(\theta; z)}{\partial \theta}\right|_{\theta = \hat\theta_n} = \sum_{i=1}^{n} \left.\frac{\partial l(\theta; z_i)}{\partial \theta}\right|_{\theta = \hat\theta_n} = 0.$$
This is a system of $K$ possibly non-linear equations with $K$ unknowns, which define $\hat\theta_n$ implicitly.

Even if we can guarantee that a solution to this problem exists, we do not have enough information to solve for $\hat\theta_n$ explicitly.
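In practice the system is handed to a numerical equation solver. A sketch using the normal example (the data are simulated; searching over $\ln\sigma^2$ is just a device to keep the variance positive during the search):

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
z = rng.normal(2.0, 1.5, size=500)
n = len(z)

# Sample score for the N(mu, s2) model: K = 2 equations in K = 2 unknowns.
def score(theta):
    mu, log_s2 = theta
    s2 = np.exp(log_s2)
    return [np.sum(z - mu) / s2,
            -n / (2 * s2) + np.sum((z - mu) ** 2) / (2 * s2 ** 2)]

sol = root(score, x0=[0.0, 0.0])
print(sol.x[0], np.exp(sol.x[1]))  # sample mean and (1/n) sum (z - zbar)^2
```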
The discrete case
When $Y$ is a discrete random variable, the likelihood function will be directly the probability function, that is,

$$L(Y; \theta) = f(y; \theta)$$

where $f(y; \theta)$ is now $\Pr(Y = y; \theta)$.
Conditional likelihood
Suppose $f(y, x; \theta)$ is the joint density function of two variables $X$ and $Y$. Then it can be decomposed as

$$f(y, x; \theta) = f(y \mid x; \theta_1)\, f(x; \theta_2)$$

Suppose we are interested in estimating $\theta_1$: if $\theta_1$ and $\theta_2$ are functionally unrelated, then maximizing the joint likelihood is achieved by maximizing separately the conditional and the marginal likelihoods. The MLE of $\theta_1$ also maximizes the conditional likelihood, so we can obtain ML estimates by specifying the conditional likelihood only.
Three Examples
Poisson Distribution: $Y \sim \text{Poisson}(\lambda)$ if it takes non-negative integer values and:

$$f(y) = \Pr(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}$$
For an iid sample:
$$L(\lambda; Y) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!}$$
its log is:
$$l(\lambda; Y) = \sum_{i=1}^{n}\left[-\lambda + y_i \ln\lambda - \ln y_i!\right] = -n\lambda + \ln\lambda \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln y_i!$$
FOCs are

$$-n + \frac{1}{\hat\lambda} \sum_{i=1}^{n} y_i = 0$$

so the MLE of $\lambda$ is:

$$\hat\lambda = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$$
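A quick numerical check of this closed form (a sketch with simulated data, assuming NumPy and SciPy): maximizing the Poisson log-likelihood numerically returns the sample mean. The $\ln y_i!$ term is dropped since it does not involve $\lambda$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.poisson(lam=3.0, size=1000)

# Poisson log-likelihood, dropping the ln(y_i!) term (constant in lambda).
def neg_loglik(lam):
    return -(-len(y) * lam + np.log(lam) * y.sum())

opt = minimize_scalar(neg_loglik, bounds=(1e-6, 20.0), method="bounded")
print(opt.x, y.mean())  # the numerical MLE agrees with the sample mean
```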
Probit Model:
$Y|X \sim \text{Bernoulli}(p)$, $p \equiv \Pr(Y = 1|x) = F(x'\beta)$, where $F(z)$ is the standard normal CDF.
The sample (conditional) likelihood function will be:
$$L(\beta; Y) = \prod_{i:\, y_i = 1} p_i \prod_{i:\, y_i = 0} (1 - p_i) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
Then
$$l(\beta; Y) = \sum_{i=1}^{n} \left[y_i \ln F(x_i'\beta) + (1 - y_i)\ln\left(1 - F(x_i'\beta)\right)\right]$$

FOCs for a local maximum are:

$$\sum_{i=1}^{n} \frac{(y_i - F_i)\, f_i\, x_i}{F_i (1 - F_i)} = 0$$

with $F_i \equiv F(x_i'\beta)$, $f_i \equiv f(x_i'\beta)$. This is a system of $K$ non-linear equations with $K$ unknowns. Moreover, it is not possible to solve for $\hat\beta$ and obtain an explicit solution.
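Since no explicit solution exists, $\hat\beta$ is obtained numerically. A minimal sketch with simulated data (the design, true coefficients, and starting values are illustrative, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # intercept + regressor
beta_true = np.array([-1.0, 2.0])
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

def neg_loglik(beta):
    F = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard the logs
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(beta_hat)  # close to beta_true
```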
Gaussian regression model:
$$y = x'\beta_0 + u, \quad u|x \sim N(0, \sigma^2)$$

or, alternatively, $y|x \sim N(x'\beta_0, \sigma^2)$, with conditional density

$$f(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y - x'\beta_0}{\sigma}\right)^2\right]$$
Then
$$l(\beta, \sigma^2; Y) = -n \ln \sigma - n \ln \sqrt{2\pi} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i'\beta)^2$$
Any idea what the MLE of $\beta_0$ will be?
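A numerical hint (a sketch with simulated data; names and values are illustrative): for any $\sigma$, maximizing $l$ over $\beta$ amounts to minimizing $\sum_i (y_i - x_i'\beta)^2$, so the ML estimate of $\beta$ coincides with OLS.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

# For any sigma, maximizing l over beta means minimizing the sum of
# squared residuals, so the ML estimate of beta is the OLS estimate.
def neg_loglik(theta):
    beta, s = theta[:2], theta[2]
    if s <= 0:
        return np.inf
    return n * np.log(s) + np.sum((y - X @ beta) ** 2) / (2 * s ** 2)

mle = minimize(neg_loglik, x0=[0.0, 0.0, 1.0], method="Nelder-Mead").x
ols = np.linalg.solve(X.T @ X, X.T @ y)
print(mle[:2], ols)  # the two estimates of beta coincide
```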
Asymptotic Properties
We will follow a similar path as we did with other estimation strategies:
Consistency
Asymptotic Normality
Estimation of the asymptotic variance
Asymptotic efficiency
Invariance (this is new)
But first we need to agree on some regularity conditions
Setup and Regularity Conditions
At this stage we will simply state them, and discuss them as we go along.
Some are purely technical, but some of them have important intuitive meaning.
1. $Z_i$, $i = 1, \ldots, n$, iid with density $f(z_i; \theta_0)$.
2. $\theta \neq \theta_0 \Rightarrow f(z_i; \theta) \neq f(z_i; \theta_0)$ (identification).
3. $\theta_0 \in \Theta$, a compact set.
4. $\ln f(z_i; \theta)$ is continuous at each $\theta \in \Theta$ w.p.1.
5. $E\left[\sup_{\theta \in \Theta} |\ln f(z; \theta)|\right] < \infty$.
In addition, for asymptotic normality we will add the following:
6. $\theta_0$ is an interior point of $\Theta$.
7. $f(z; \theta)$ is twice continuously differentiable and strictly positive in a neighborhood $N$ of $\theta_0$.
8. $\int \sup_{\theta \in N} \|\nabla_\theta f(z; \theta)\|\, dz < \infty$ and $\int \sup_{\theta \in N} \|\nabla_{\theta\theta'} f(z; \theta)\|\, dz < \infty$.
Quick detour: on bounds.
Recall that:
$$|E(X)| \leq E(|X|)$$

Then by bounding $E(|X|)$ we are guaranteeing that $-\infty < E(X) < \infty$.
Consistency
Our starting point is the following normalized version of the MLE:
$$\hat\theta_n = \operatorname*{argmax}_{\theta}\, Q_n(\theta), \qquad Q_n(\theta) \equiv \frac{1}{n} \sum_{i=1}^{n} l(\theta; Z_i)$$
For consistency we need to establish the following three results
1. $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta) \equiv E[l(\theta; Z)]$.
2. $Q_0(\theta)$ has a unique maximum at $\theta_0$.
3. $Q_n(\theta) \stackrel{up}{\to} Q_0(\theta) \;\Rightarrow\; \operatorname{argmax} Q_n(\theta) = \hat\theta_n \stackrel{p}{\to} \operatorname{argmax} Q_0(\theta) = \theta_0$.
Intuition
$\hat\theta_n$ maximizes $Q_n(\theta)$ (definition of MLE).
$Q_n(\theta) \to Q_0(\theta)$ (the ML problem is well defined at $\infty$).
$\theta_0$ maximizes $Q_0(\theta)$ (the true value solves the problem at $\infty$).
By maximizing $Q_n(\theta)$ we end up maximizing $Q_0(\theta)$: convergence of the sequence of functions guarantees convergence of the maximizers (this is the difficult step).
If $\operatorname{argmax} Q_n(\theta)$ is seen as a function defined on functions, what property is implied by 3)?
1) $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta) \equiv E[l(\theta; Z)]$

If we fix $\theta$ at any point, then

$$Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} l(\theta; Z_i) \stackrel{p}{\to} E[l(\theta; Z)]$$

since by our assumptions (which ones?), $l(\theta; Z_i)$ is a sequence of rvs that satisfies Kolmogorov's LLN.

This establishes pointwise convergence of $Q_n(\theta)$ to $Q_0(\theta)$. But our strategy requires uniform convergence.
Uniform Convergence: a sequence of real-valued functions $f_n$ defined on a set $S \subset \mathbb{R}$ converges uniformly to a function $f$ on $S$ if for each $\epsilon > 0$ there is a number $N$ such that

$$n > N \;\Rightarrow\; |f_n(x) - f(x)| < \epsilon \;\text{ for all } x \in S$$

Intuition: we are using the same $\epsilon$ for the whole domain, that is, eventually we can put $f_n$ inside the strip $f \pm \epsilon$. It can be shown that uniform convergence is equivalent to

$$\sup_{x \in S} |f(x) - f_n(x)| \to 0$$

(see Ross (1980, p. 137)).

Uniform convergence in probability: $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta)$ means $\sup_{\theta \in \Theta} |Q_n(\theta) - Q_0(\theta)| \stackrel{p}{\to} 0$.
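A small simulation that illustrates the definition (a sketch, assuming a $N(\mu_0, 1)$ model, where $Q_0$ has the closed form $Q_0(\mu) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}[1 + (\mu_0 - \mu)^2]$; the grid plays the role of the compact set $\Theta$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu0 = 0.0
grid = np.linspace(-2.0, 2.0, 201)  # a grid over the compact set Theta

def Qn(z, mu):  # sample average log-likelihood of the N(mu, 1) model
    return -0.5 * np.log(2 * np.pi) - 0.5 * np.mean((z[:, None] - mu) ** 2, axis=0)

def Q0(mu):     # its pointwise limit E[l(mu, Z)] when Z ~ N(mu0, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (1.0 + (mu0 - mu) ** 2)

for n in [10, 100, 1000, 10000]:
    z = rng.normal(mu0, 1.0, size=n)
    print(n, np.max(np.abs(Qn(z, grid) - Q0(grid))))  # the sup gap shrinks
```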
Uniform LLN: if $Z_i$ are iid, $\Theta$ is compact, $a(Z_i, \theta)$ is continuous at each $\theta \in \Theta$ w.p.1, and there is $d(Z)$ with $\|a(z, \theta)\| \leq d(Z)$ for all $\theta \in \Theta$ and $E[d(Z)] < \infty$, then $E[a(Z, \theta)]$ is continuous in $\theta$ and $\sup_{\theta \in \Theta} \left\| \frac{1}{n}\sum_{i=1}^{n} a(Z_i, \theta) - E[a(Z, \theta)] \right\| \stackrel{p}{\to} 0$.
2) $Q_0(\theta)$ has a unique maximum at $\theta_0$.

Information inequality: if $\theta \neq \theta_0 \Rightarrow f(z; \theta) \neq f(z; \theta_0)$ and $E[|l(\theta; Z)|] < \infty$, then $Q_0(\theta)$ has a unique maximum at $\theta_0$. For any $\theta \neq \theta_0$:

$$Q_0(\theta_0) - Q_0(\theta) = -E\left[\ln \frac{f(Z; \theta)}{f(Z; \theta_0)}\right] > -\ln E\left[\frac{f(Z; \theta)}{f(Z; \theta_0)}\right] \qquad \text{(Jensen's inequality!)}$$

$$= -\ln \int \frac{f(z; \theta)}{f(z; \theta_0)}\, f(z; \theta_0)\, dz = -\ln \int f(z; \theta)\, dz = -\ln 1 = 0$$
Important: check where the argument breaks down if the assumptions do not hold.
3) 1) and 2) imply consistency.
Pick any $\epsilon > 0$. Let us get three inequalities that hold wpa1 (with probability approaching one).
$\hat\theta_n$ maximizes $Q_n(\theta)$, so

a) $Q_n(\hat\theta_n) > Q_n(\theta_0) - \epsilon/3$

$Q_n(\theta)$ converges uniformly to $Q_0(\theta)$, so $|Q_n(\hat\theta_n) - Q_0(\hat\theta_n)| < \epsilon/3$, hence

b) $Q_0(\hat\theta_n) > Q_n(\hat\theta_n) - \epsilon/3$

and $|Q_0(\theta_0) - Q_n(\theta_0)| < \epsilon/3$, hence

c) $Q_n(\theta_0) > Q_0(\theta_0) - \epsilon/3$
Now start with b):

$$Q_0(\hat\theta_n) > Q_n(\hat\theta_n) - \epsilon/3 > Q_n(\theta_0) - 2\epsilon/3 \qquad \text{(subtract } \epsilon/3 \text{ from both sides of a))}$$

$$\text{d)}\quad Q_0(\hat\theta_n) > Q_0(\theta_0) - \epsilon \qquad \text{(subtract } 2\epsilon/3 \text{ from both sides of c))}$$
Let $N$ be an open subset of $\Theta$ containing $\theta_0$. $N$ open implies $\Theta \cap N^c$ closed and bounded: compact in our case. Since $Q_0(\theta)$ is continuous (the uniform limit of a sequence of continuous functions is continuous), then:

$$\sup_{\theta \in \Theta \cap N^c} Q_0(\theta) = Q_0(\theta^*)$$

for some $\theta^* \in \Theta \cap N^c$ (continuous functions over compact sets achieve their maximum).
Now since $\theta_0$ is the unique maximizer of $Q_0(\theta)$,

$$Q_0(\theta^*) < Q_0(\theta_0)$$

Now pick $\epsilon = Q_0(\theta_0) - Q_0(\theta^*)$; then by inequality d), wpa1

$$Q_0(\hat\theta_n) > Q_0(\theta^*)$$

so $\hat\theta_n \in N$ wpa1 (if not, $Q_0(\theta^*)$ would not be the sup). Then, using the definition of convergence in probability, since $N$ was chosen arbitrarily,

$$\hat\theta_n \stackrel{p}{\to} \theta_0$$
Asymptotic normality
Asymptotic normality is a bit easier to establish, since we will follow a strategy very similar to what we did with all previous estimators.
But first we need to establish some notation and results.
Score and hessian.
Score equality
Information matrix
Information matrix equality
Score, Hessian and Information
Score: $s(\theta; Z_i) \equiv \nabla_\theta\, l(\theta; Z_i)$, a $K \times 1$ vector.
Sample score: $s(\theta; \mathbf{Z}) = \nabla_\theta\, l(\theta; \mathbf{Z}) = \sum_{i=1}^{n} s(\theta; Z_i)$

Hessian: $H(\theta; Z_i) \equiv \nabla_{\theta\theta'}\, l(\theta; Z_i)$, a $K \times K$ matrix.
Sample hessian: $H(\theta; \mathbf{Z}) = \nabla_{\theta\theta'}\, l(\theta; \mathbf{Z}) = \sum_{i=1}^{n} H(\theta; Z_i)$

Information matrix: $J \equiv E\left[s(\theta_0; Z)\, s(\theta_0; Z)'\right]$, a $K \times K$ matrix.

Score equality: $E[s(\theta_0; Z)] = 0$ (it is kind of a FOC of the information inequality: $\theta_0$ maximizes $E[l(\theta; Z)]$).

Information equality: $E[H(\theta_0; Z)] = -J$

Note that this implies $V[s(\theta_0; Z)] = J$.
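Both equalities are easy to check by simulation. A sketch for the Poisson model, where $l(\lambda; y) = -\lambda + y\ln\lambda - \ln y!$, so $s(\lambda; y) = y/\lambda - 1$ and $H(\lambda; y) = -y/\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
lam0 = 3.0
y = rng.poisson(lam0, size=200000)

s = y / lam0 - 1.0    # per-observation score at the true value
H = -y / lam0 ** 2    # per-observation hessian at the true value

print(s.mean())                    # ~ 0 (score equality)
print((s ** 2).mean(), -H.mean())  # both ~ 1/lam0 = J (information equality)
```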
Proof of score equality (the continuous case):
For any $\theta$:

$$\int f(z; \theta)\, dz = 1$$

Taking derivatives on both sides:

$$\frac{d\left[\int f(z; \theta)\, dz\right]}{d\theta} = 0$$

If it is possible to interchange differentiation and integration:

$$\int \frac{d f(z; \theta)}{d\theta}\, dz = 0$$

The score is a log-derivative, so

$$s(\theta; z) = \frac{d \ln f(z; \theta)}{d\theta} = \frac{d f(z; \theta)/d\theta}{f(z; \theta)}$$
hence

$$\frac{d f(z; \theta)}{d\theta} = s(\theta; z)\, f(z; \theta)$$

Replacing above:

$$\int s(\theta; z)\, f(z; \theta)\, dz = 0$$

When $\theta = \theta_0$:

$$\int s(\theta_0; z)\, f(z; \theta_0)\, dz = E[s(\theta_0; Z)]$$

so

$$E[s(\theta_0; Z)] = 0$$
Aside: Interchanging differentiation and integration
Source: Bartle, R., 1966, The Elements of Integration, Wiley, New York
Proof of information equality (the continuous case):
From the previous result:

$$\int s(\theta; z)\, f(z; \theta)\, dz = 0$$

Take derivatives on both sides, use the product rule, and omit arguments in functions to simplify notation:

$$\int (\nabla s\, f + s\, \nabla f)\, dz = 0 \;\Rightarrow\; \int \nabla s\, f\, dz + \int s\, \nabla f\, dz = 0$$

From the score equality, $\nabla f = s f$; replacing $\nabla f$ above:

$$\int H(\theta; z)\, f(z; \theta)\, dz + \int s(\theta; z)\, s(\theta; z)'\, f(z; \theta)\, dz = 0$$

When $\theta = \theta_0$:

$$E[H(\theta_0; Z)] + E\left[s(\theta_0; Z)\, s(\theta_0; Z)'\right] = 0 \;\Rightarrow\; E[H(\theta_0; Z)] + J = 0$$

which implies the desired result.
Asymptotic normality
Under our assumptions, wpa1 the MLE satisfies the FOCs

$$s(\hat\theta_n; \mathbf{Z}) = 0$$

Take a first-order Taylor expansion around $\theta_0$:

$$s(\hat\theta_n; \mathbf{Z}) = s(\theta_0; \mathbf{Z}) + H(\bar\theta; \mathbf{Z})(\hat\theta_n - \theta_0) = 0$$

where $\bar\theta$ is a mean value located between $\hat\theta_n$ and $\theta_0$. (Note that consistency implies $\bar\theta \stackrel{p}{\to} \theta_0$.)

Now solve:

$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) = \left(-\frac{H(\bar\theta; \mathbf{Z})}{n}\right)^{-1} \left(\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}}\right)$$

Now we are back in familiar territory: we will show that the first factor does not explode, and that the second is asymptotically normal.
First we will show:

$$\left(-n^{-1} H(\bar\theta; \mathbf{Z})\right)^{-1} \stackrel{p}{\to} J^{-1}$$

Preliminary result: if $g_n(\theta)$ is a sequence of random functions that converges uniformly in probability to $g_0(\theta)$ for all $\theta$ in a compact set $\Theta$, and $g_0(\theta)$ is continuous, then $\theta_n \stackrel{p}{\to} \theta_0$ implies $g_n(\theta_n) \stackrel{p}{\to} g_0(\theta_0)$ (see Ruud (2000, p. 326)).

According to our assumptions, $n^{-1} H(\theta; \mathbf{Z}) = n^{-1} \sum_{i=1}^{n} H(\theta; Z_i)$ converges uniformly in probability to $E[H(\theta; Z)]$, which, by the previous results, is continuous in $\theta$.

Hence, by the preliminary result, since $\bar\theta \stackrel{p}{\to} \theta_0$,

$$n^{-1} H(\bar\theta; \mathbf{Z}) \stackrel{p}{\to} E[H(\theta_0; Z)] = -J$$
Now we show:

$$\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}} \stackrel{d}{\to} N(0, J)$$

Start with

$$\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}} = \frac{\sqrt{n}\, s(\theta_0; \mathbf{Z})}{n} = \sqrt{n}\, \frac{\sum_{i=1}^{n} s(\theta_0; Z_i)}{n}$$

In order to apply the CLT we check:

$E[s(\theta_0; Z_i)] = 0$, by the score equality.
$V[s(\theta_0; Z_i)] = J$.
Collecting results:
$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) = \underbrace{\left(-\frac{H(\bar\theta; \mathbf{Z})}{n}\right)^{-1}}_{\stackrel{p}{\to}\, J^{-1}} \underbrace{\left(\frac{s(\theta_0; \mathbf{Z})}{\sqrt{n}}\right)}_{\stackrel{d}{\to}\, N(0,\, J)}$$

Then by Slutsky's theorem and linearity,

$$\sqrt{n}\left(\hat\theta_n - \theta_0\right) \stackrel{d}{\to} N\left(0,\, J^{-1} J J^{-1}\right) = N\left(0,\, J^{-1}\right)$$
Variance estimation
The asymptotic variance of $\hat\theta_n$ is $J^{-1}$, with $J = V[s(\theta_0; Z)]$. We will propose three consistent estimators:

1. Inverse of the empirical minus hessian: $\left[-\frac{1}{n}\sum_{i=1}^{n} H(\hat\theta_n; Z_i)\right]^{-1}$
2. Inverse of the empirical variance of the score (OPG): $\left[\frac{1}{n}\sum_{i=1}^{n} s(\hat\theta_n; Z_i)\, s(\hat\theta_n; Z_i)'\right]^{-1}$
3. Inverse of the empirical information matrix: $\left[J(\hat\theta_n)\right]^{-1}$
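A sketch of the three estimators in the Poisson model, where all of them are available in closed form ($s$ and $H$ as derived above; $J(\lambda) = 1/\lambda$). The three standard errors for $\hat\lambda$ should be close to each other and to $\sqrt{\lambda_0/n}$:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.poisson(3.0, size=2000)
n, lam_hat = len(y), y.mean()   # the Poisson MLE

s = y / lam_hat - 1.0           # per-observation score at lam_hat
H = -y / lam_hat ** 2           # per-observation hessian at lam_hat

J1 = -H.mean()                  # 1: empirical minus hessian
J2 = (s ** 2).mean()            # 2: empirical variance of the score (OPG)
J3 = 1.0 / lam_hat              # 3: information matrix evaluated at lam_hat

# Three standard errors for lam_hat, all estimating sqrt(lam0 / n):
print([np.sqrt(1.0 / (n * J)) for J in (J1, J2, J3)])
```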
Invariance
Invariance: let $\lambda = g(\theta)$, where $g(\cdot)$ is a one-to-one function. Let $\theta_0$ denote the true parameter, so $\lambda_0 = g(\theta_0)$ is the true parameter under the new reparametrization. Then, if $\hat\theta$ is the MLE of $\theta_0$, $\hat\lambda = g(\hat\theta)$ is the MLE of $\lambda_0$.
Example: if $\hat\lambda$ is the MLE of $\ln(\theta_0)$, how can we get the MLE of $\theta_0$?
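By invariance the answer is to apply the inverse transformation, $\exp(\hat\lambda)$. A numerical sketch with the Poisson model (maximizing over $\gamma = \ln\lambda$ and comparing $\exp(\hat\gamma)$ with the direct MLE $\bar y$; data simulated):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
y = rng.poisson(3.0, size=1000)

# Reparametrize the Poisson model with gamma = ln(lambda) and maximize.
def neg_loglik(gamma):
    lam = np.exp(gamma)
    return -(-len(y) * lam + np.log(lam) * y.sum())

gamma_hat = minimize_scalar(neg_loglik, bounds=(-5.0, 5.0), method="bounded").x
print(np.exp(gamma_hat), y.mean())  # exp(MLE of ln lambda) = MLE of lambda
```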
Proof:
Since $\hat\theta$ is the MLE,

$$l(\hat\theta; z) \geq l(\theta; z)$$

for every $\theta \in \Theta$. Since $\lambda = g(\theta)$ is one-to-one,

$$l(g^{-1}(\hat\lambda); z) \geq l(g^{-1}(\lambda); z)$$

so $\hat\lambda = g(\hat\theta)$ maximizes the reparametrized log-likelihood.
MLE and unbiasedness
The invariance property makes the MLE estimator very likely to be biased in many relevant cases.
Consider the following intuition. Suppose $\hat\theta$ is the MLE for $\theta_0$ and suppose it is unbiased, so

$$E(\hat\theta) = \theta_0$$

By invariance, $g(\hat\theta)$ is the MLE of $g(\theta_0)$. Is $g(\hat\theta)$ unbiased? In general,

$$E(g(\hat\theta)) \neq g(E(\hat\theta)) = g(\theta_0)$$

so if the MLE is unbiased for one parametrization, it is very likely to be biased for most other parametrizations.
MLE and Efficiency
Let $\tilde\theta$ be any unbiased estimator of $\theta_0$. An important result is the following.

Cramér-Rao Inequality: $V(\tilde\theta) - (nJ)^{-1}$ is psd.
This provides a lower bound for unbiased estimators.
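A simulation sketch in the Poisson model, where $\bar y$ is unbiased for $\lambda_0$ and $J = 1/\lambda_0$, so the bound $(nJ)^{-1} = \lambda_0/n$ is actually attained:

```python
import numpy as np

rng = np.random.default_rng(9)
lam0, n, reps = 3.0, 50, 20000

ybar = rng.poisson(lam0, size=(reps, n)).mean(axis=1)  # unbiased for lam0

print(ybar.var())  # simulated variance of the estimator
print(lam0 / n)    # the Cramer-Rao bound (nJ)^{-1} with J = 1/lam0
```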
Proof: the single parameter case (K = 1).
For any two random variables $X$ and $Y$,

$$\mathrm{Cov}(X, Y)^2 \leq V(X)\, V(Y)$$

since the squared correlation is less than 1. Then

$$V(X) \geq \frac{\mathrm{Cov}(X, Y)^2}{V(Y)}$$
We will take $X = \tilde\theta$ and $Y = s(\theta_0; \mathbf{Z})$. It is immediate to check that

$$V(s(\theta_0; \mathbf{Z})) = V\left(\sum_{i=1}^{n} s(\theta_0; Z_i)\right) = n J$$
So...what do we need to show to finish the proof?
We have

$$V(\tilde\theta) \geq \frac{\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big)^2}{nJ}$$

so we need to show that $\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big) = 1$.

(Sketch:) Since $E(s(\theta_0; \mathbf{Z})) = 0$, $\mathrm{Cov}\big(\tilde\theta,\, s(\theta_0; \mathbf{Z})\big) = E\big(\tilde\theta\, s(\theta_0; \mathbf{Z})\big)$:

$$E(\tilde\theta\, s) = \int \tilde\theta\, s\, f(z; \theta)\, dz = \int \tilde\theta\, \frac{d f(z; \theta)/d\theta}{f(z; \theta)}\, f(z; \theta)\, dz = \int \tilde\theta\, \frac{d f(z; \theta)}{d\theta}\, dz = \left.\frac{d}{d\theta} \int \tilde\theta\, f(z; \theta)\, dz\,\right|_{\theta = \theta_0} = \left.\frac{d\, E(\tilde\theta)}{d\theta}\right|_{\theta = \theta_0} = 1$$

since $E(\tilde\theta) = \theta$ for every $\theta$ (unbiasedness), so its derivative with respect to $\theta$ equals 1.
MLE and efficiency
The CR bound applies to unbiased estimators. The MLE is likely to be biased.

MLE estimators are asymptotically normal, centered around the true parameter with normalized variance equal to the CR lower bound for unbiased estimators.

Problem: the class of consistent, asymptotically normal estimators includes some extreme (and highly unusual) cases that can improve upon the CR bound (the so-called superefficient estimators).

Rao (1963): the MLE estimator is efficient (minimum variance) in the class of consistent and uniformly asymptotically normal (CUAN) estimators.

CUAN estimators: $\hat\theta$ is CUAN for $\theta_0$ if it is consistent and $\sqrt{n}(\hat\theta - \theta_0)$ converges in distribution to a normal rv uniformly over compact subsets of $\Theta$.