Maximum Likelihood Estimation

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009
April 13, 2009

Outline: Motivation: a non-linear model; Likelihood and Maximum-Likelihood; Asymptotic properties



  • Consider the following data set

    The explained variable is binary.

    Admission to grad school depending on GRE score, families send
    kids to school as a function of income, etc.

  • This case is a good candidate for a non-linear model

  • How to search for a model and estimate its parameters?

    Consider the following non-linear model

    E(y|x) = F(β₁ + β₂x),  F(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−s²/2} ds

    For example, positive values for β₁ may produce a function
    that looks like the one in the previous graph (we will get
    β̂₁ = 2.8619 and β̂₂ = 6.7612). This is a truly non-linear model
    (in both parameters and variables).

    This is the probit model.

  • In order to search for an estimation strategy for the parameters, we
    will exploit the following fact.

    Since y|x is a binary variable,

    E(y|x) = Pr(y = 1|x)

    so, by being specific about the form of E(y|x), we are being
    specific about Pr(y = 1|x).

  • Likelihood and Basic Concepts

    Let Z be a random variable with density f(z; θ₀), where θ₀ ∈ Θ ⊆ R^K
    is the true parameter value.

  • Example: Z ∼ N(μ, σ²). Here θ = (μ, σ²), and K = 2.

    Note:

    f(z; θ) = (1/(σ√(2π))) exp[−(z − μ)²/(2σ²)], −∞ < z < ∞,
    is the density, seen as a function of z for given θ.

    L(θ; z) is the same expression seen as a function of θ for a fixed
    realization z: the likelihood function.

  • Maximum Likelihood

    The maximum-likelihood estimator θ̂ₙ is defined as

    θ̂ₙ ≡ argmax_θ L(θ; z)

    It is kind of a reverse engineering process: to generate random numbers from a
    certain distribution you first set parameter values and then get realizations.
    This is doing the reverse: first take the realizations as given, then find the
    parameters that are most likely to have generated them.
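The "reverse engineering" idea above can be sketched numerically. The following minimal Python illustration (not from the slides; the data are made up) fixes some Bernoulli realizations and brute-force searches the parameter space for the value most likely to have generated them:

```python
import math

# A minimal "reverse engineering" sketch: given fixed Bernoulli realizations,
# pick the parameter p most likely to have generated them, by brute-force
# grid search over the parameter space.
z = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]     # hypothetical realizations: 7 ones in 10

def log_likelihood(p, data):
    # l(p; z) = sum_i [z_i ln p + (1 - z_i) ln(1 - p)]
    return sum(zi * math.log(p) + (1 - zi) * math.log(1 - p) for zi in data)

grid = [i / 1000 for i in range(1, 1000)]          # candidate values in (0, 1)
p_hat = max(grid, key=lambda p: log_likelihood(p, z))
print(p_hat)                                       # -> 0.7, the sample mean
```

The grid maximizer lands on the sample mean, anticipating the analytical results for the examples below.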

  • Some normalizations

    θ̂ₙ ≡ argmax_θ L(θ; z)

    Note

    θ̂ₙ = argmax_θ ln L(θ; z)

    and, for an iid sample,

    ln L(θ; z) = Σ_{i=1}^n ln f(zᵢ; θ) ≡ Σ_{i=1}^n l(θ; zᵢ)

    Also

    θ̂ₙ = argmax_θ (1/n) Σ_{i=1}^n l(θ; zᵢ)

    We will use whichever one is more convenient.

  • If l(θ; z) is differentiable and has a local maximum at an interior
    point, then the FOCs for the problem are

    ∂l(θ; z)/∂θ |_{θ̂ₙ} = Σ_{i=1}^n ∂l(θ; zᵢ)/∂θ |_{θ̂ₙ} = 0

    This is a system of K possibly non-linear equations with K
    unknowns, which defines θ̂ₙ implicitly.

    Even if we can guarantee that a solution to this problem
    exists, we do not have enough information to solve for θ̂ₙ explicitly.

  • The discrete case

    When Y is a discrete random variable, the likelihood function will
    be directly the probability function, that is,

    L(θ; Y) = f(y; θ)

    where f(y; θ) is now Pr(Y = y; θ).

  • Conditional likelihood

    Suppose f(y, x; θ, γ) is the joint density function of two variables X
    and Y. Then it can be decomposed as

    f(y, x; θ, γ) = f(y|x; θ) f(x; γ)

    Suppose we are interested in estimating θ: if θ and γ are
    functionally unrelated, then maximizing the joint likelihood is
    achieved by maximizing separately the conditional and the
    marginal likelihoods. The MLE of θ also maximizes the conditional
    likelihood, so we can obtain ML estimates by specifying the
    conditional likelihood only.

  • Three Examples

    Poisson distribution: Y ∼ Poisson(λ) if Y takes non-negative integer
    values (including zero) and:

    f(y) = Pr(Y = y) = e^{−λ} λ^y / y!

    For an iid sample:

    L(λ; Y) = ∏_{i=1}^n e^{−λ} λ^{yᵢ} / yᵢ!

    its log is:

    l(λ; Y) = Σ_{i=1}^n [−λ + yᵢ ln λ − ln yᵢ!]
            = −nλ + ln λ Σ_{i=1}^n yᵢ − Σ_{i=1}^n ln yᵢ!

  • FOCs are

    −n + (1/λ) Σ_{i=1}^n yᵢ = 0

    so the MLE of λ is:

    λ̂ = (1/n) Σ_{i=1}^n yᵢ = ȳ
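A quick numerical check of the Poisson result (a sketch, not from the slides; the counts are made up): the FOC holds exactly at λ = ȳ, and ȳ beats nearby parameter values in the log-likelihood.

```python
import math

# Verify the Poisson FOC -n + (1/lam) sum(y_i) = 0 at lam = ybar, and that
# ybar beats nearby values of the log-likelihood.
y = [2, 0, 3, 1, 4, 2, 2, 1]           # hypothetical iid Poisson counts
n = len(y)
ybar = sum(y) / n                      # candidate MLE: the sample mean

def loglik(lam):
    # l(lam; y) = -n lam + ln(lam) sum(y_i) - sum(ln y_i!)
    return (-n * lam + math.log(lam) * sum(y)
            - sum(math.log(math.factorial(yi)) for yi in y))

foc = -n + sum(y) / ybar
print(foc)                             # -> 0.0
print(loglik(ybar) > loglik(ybar - 0.1), loglik(ybar) > loglik(ybar + 0.1))  # -> True True
```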

  • Probit model:

    Y|X ∼ Bernoulli(p), p ≡ Pr(Y = 1|x) = F(x′β), where F(z) is
    the standard normal CDF.

    The sample (conditional) likelihood function will be:

    L(β; Y) = ∏_{i: yᵢ=1} pᵢ ∏_{i: yᵢ=0} (1 − pᵢ) = ∏_{i=1}^n pᵢ^{yᵢ} (1 − pᵢ)^{1−yᵢ}

  • Then

    l(β; Y) = Σ_{i=1}^n [yᵢ ln F(xᵢ′β) + (1 − yᵢ) ln(1 − F(xᵢ′β))]

    FOCs for a local maximum are:

    Σ_{i=1}^n (yᵢ − Fᵢ) fᵢ xᵢ / (Fᵢ(1 − Fᵢ)) = 0

    with Fᵢ ≡ F(xᵢ′β) and fᵢ ≡ f(xᵢ′β). This is a system of K non-linear
    equations with K unknowns. Moreover, it is not possible to solve
    for β and obtain an explicit solution.
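Since no explicit solution exists, the probit FOC has to be solved numerically. Below is a sketch (not from the slides; the data and the slope-only specification are made up for illustration) for a one-parameter probit, Pr(y = 1|x) = Φ(xβ). The log-likelihood is concave in β, so the score crosses zero once and simple bisection on the score finds the MLE:

```python
import math

def Phi(z):   # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):   # standard normal pdf
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# Hypothetical data for a one-parameter probit: Pr(y = 1|x) = Phi(x * beta)
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 1, 1]

def score(beta):
    # The probit FOC: sum_i (y_i - F_i) f_i x_i / (F_i (1 - F_i))
    total = 0.0
    for xi, yi in zip(x, y):
        F, f = Phi(xi * beta), phi(xi * beta)
        total += (yi - F) * f * xi / (F * (1.0 - F))
    return total

# Concave log-likelihood => the score is decreasing and crosses zero once;
# bracket the root and bisect.
lo, hi = 0.0, 3.0            # score(lo) > 0 > score(hi) for this data
for _ in range(100):
    mid = (lo + hi) / 2.0
    if score(mid) > 0.0:
        lo = mid
    else:
        hi = mid
beta_hat = (lo + hi) / 2.0
print(beta_hat)
```

In practice Newton-type iterations are used instead of bisection, but the point is the same: the estimator is defined implicitly by the FOC.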

  • Gaussian regression model:

    y = x′β₀ + u, with u|x ∼ N(0, σ²)

    or, alternatively,

    y|x ∼ N(x′β₀, σ²), with density

    f(y|x; β, σ²) = (1/(σ√(2π))) exp[−(1/2)((y − x′β)/σ)²]

    Then

    l(β, σ²; Y) = −n ln σ − n ln √(2π) − (1/(2σ²)) Σ_{i=1}^n (yᵢ − xᵢ′β)²

    Any idea what the MLE of β₀ will be?
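A sketch answering that question numerically (not from the slides; the data are made up): for given σ², the log-likelihood is maximized in β by minimizing Σ(yᵢ − xᵢ′β)², so the ML estimate of β is the OLS estimator, and σ̂² is the sum of squared residuals over n (no degrees-of-freedom correction).

```python
# Gaussian regression MLE: beta_hat minimizes the sum of squared residuals
# (i.e., it is OLS), and sigma2_hat = SSR / n.
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # single regressor, no intercept (hypothetical)
y = [1.1, 2.3, 2.8, 4.2, 4.9]

beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
sigma2_hat = sum((yi - xi * beta_hat) ** 2 for xi, yi in zip(x, y)) / len(y)

# Likelihood FOC in beta: sum_i x_i (y_i - x_i beta) = 0 at beta_hat
foc = sum(xi * (yi - xi * beta_hat) for xi, yi in zip(x, y))
print(beta_hat, sigma2_hat, abs(foc) < 1e-9)
```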

  • Asymptotic Properties

    We will follow a similar path as we did with other estimation
    strategies:

    Consistency

    Asymptotic normality

    Estimation of the asymptotic variance

    Asymptotic efficiency

    Invariance (this is new)

    But first we need to agree on some regularity conditions.

  • Setup and Regularity Conditions

    At this stage we will simply state them, and discuss them as we go along.

    Some are purely technical, but some of them have important intuitive meaning.

    1. Zᵢ, i = 1, …, n, iid with density f(zᵢ; θ₀).
    2. θ ≠ θ₀ ⇒ f(zᵢ; θ) ≠ f(zᵢ; θ₀) (identification).
    3. θ₀ ∈ Θ, a compact set.
    4. ln f(zᵢ; θ) is continuous at each θ ∈ Θ w.p. 1.
    5. E[sup_{θ∈Θ} |ln f(z; θ)|] < ∞.

  • In addition, for asymptotic normality we will add the following:

    6. θ₀ is an interior point of Θ.
    7. f(z; θ) is twice continuously differentiable and strictly
       positive in a neighborhood N of θ₀.
    8. ∫ sup_{θ∈N} ‖∇_θ f(z; θ)‖ dz < ∞ and ∫ sup_{θ∈N} ‖∇²_{θθ} f(z; θ)‖ dz < ∞,
       so that differentiation and integration can be interchanged.

  • Quick detour: on bounds.

    Recall that:

    |E(X)| ≤ E(|X|)

    Then by bounding E(|X|) we are guaranteeing that −∞ < E(X) < ∞,
    that is, that the expectation exists and is finite.

  • Consistency

    Our starting point is the following normalized version of the MLE:

    θ̂ₙ = argmax_θ Qₙ(θ),  Qₙ(θ) ≡ (1/n) Σ_{i=1}^n l(θ; Zᵢ)

    For consistency we need to establish the following three results:

    1. Qₙ(θ) converges uniformly in probability to Q₀(θ) ≡ E[l(θ; Z)].

    2. Q₀(θ) has a unique maximum at θ₀.

    3. These two facts imply consistency:

    Qₙ(θ) →ᵘᵖ Q₀(θ)  ⇒  θ̂ₙ = argmax_θ Qₙ(θ) →ᵖ argmax_θ Q₀(θ) = θ₀
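This scheme can be illustrated by simulation (a sketch, not from the slides; the Bernoulli setup and true value p₀ = 0.6 are made up): as n grows, the maximizer of Qₙ over a grid on a compact parameter space settles near the true value.

```python
import math, random

# Consistency illustration: for Bernoulli(p0) data,
# Q_n(p) = (1/n) sum_i [z_i ln p + (1 - z_i) ln(1 - p)],
# and its grid maximizer approaches the true p0 as n grows.
random.seed(1)
p0 = 0.6
grid = [i / 1000 for i in range(1, 1000)]   # compact parameter space inside (0, 1)

def p_hat(n):
    z = [1 if random.random() < p0 else 0 for _ in range(n)]
    zbar = sum(z) / n
    Qn = lambda p: zbar * math.log(p) + (1 - zbar) * math.log(1 - p)
    return max(grid, key=Qn)

estimates = {n: p_hat(n) for n in (10, 100, 10000)}
print(estimates)   # the n = 10000 estimate should sit very close to 0.6
```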

  • Intuition

    θ̂ₙ maximizes Qₙ(θ) (definition of MLE).

    Qₙ(θ) → Q₀(θ) (the MLE problem is well defined in the limit).

    θ₀ maximizes Q₀(θ) (the true value solves the limit problem).

    By maximizing Qₙ(θ) we end up maximizing Q₀(θ):
    convergence of the sequence of functions guarantees
    convergence of the maximizers (this is the difficult step).

    If argmax Qₙ(θ) is seen as a function defined on functions, what property is
    implied by 3)?

  • 1) Qₙ(θ) converges uniformly in probability to Q₀(θ) ≡ E[l(θ; Z)]

    If we fix θ at any point, then

    Qₙ(θ) = (1/n) Σ_{i=1}^n l(θ; Zᵢ) →ᵖ E[l(θ; Z)]

    since, by our assumptions (which ones?), l(θ; Zᵢ) is a sequence of
    rvs that satisfies Kolmogorov's LLN.

    This establishes pointwise convergence of Qₙ(θ) to Q₀(θ). But our
    strategy requires uniform convergence.

  • Uniform convergence: a sequence of real-valued functions fₙ
    defined on a set S ⊆ R converges uniformly to a function f on S if
    for each ε > 0 there is a number N such that

    n > N  ⇒  |fₙ(x) − f(x)| < ε for all x ∈ S

    Intuition: we are using the same ε for the whole domain; that is,
    eventually we can put fₙ inside the strip f ± ε.

    It can be shown that uniform convergence is equivalent to

    sup_{x∈S} |f(x) − fₙ(x)| → 0

    (see Ross (1980), p. 137).

    Uniform convergence in probability: Qₙ(θ) converges uniformly in
    probability to Q₀(θ) means sup_{θ∈Θ} |Qₙ(θ) − Q₀(θ)| →ᵖ 0.
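The classic textbook example fₙ(x) = xⁿ (not from the slides, added as a quick illustration) shows why the sup criterion and the choice of domain matter: fₙ → 0 pointwise on [0, 1), but the sup of |fₙ| stays near 1 there, so convergence is not uniform; on the smaller set [0, 0.9] the sup is 0.9ⁿ → 0, so convergence is uniform.

```python
# Pointwise vs uniform convergence for f_n(x) = x**n:
# on [0, 1) the sup of |f_n - 0| does not vanish; on [0, 0.9] it does.
grid = [i / 1000 for i in range(1000)]         # grid over [0, 0.999]

def sup_gap(n, upper):
    # sup over the grid of |f_n(x) - 0| restricted to x <= upper
    return max(x ** n for x in grid if x <= upper)

for n in (10, 100, 1000):
    print(n, sup_gap(n, 0.999), sup_gap(n, 0.9))
```

This is also a hint of why compactness of Θ plays a role in the uniform LLN below.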

  • Uniform LLN: if Zᵢ are iid, Θ is compact, a(Zᵢ, θ) is a
    continuous function at each θ ∈ Θ w.p. 1, and there is d(Z) with
    |a(z, θ)| ≤ d(Z) for all θ ∈ Θ and E[d(Z)] < ∞, then
    (1/n) Σ_{i=1}^n a(Zᵢ, θ) converges uniformly in probability to
    E[a(Z, θ)], which is continuous in θ.

  • 2) Q₀(θ) has a unique maximum at θ₀.

    Information inequality: if θ ≠ θ₀ ⇒ f(z; θ) ≠ f(z; θ₀), then for θ ≠ θ₀

    Q₀(θ) − Q₀(θ₀) = E[ln(f(Z; θ)/f(Z; θ₀))]
                   < ln E[f(Z; θ)/f(Z; θ₀)]        (Jensen's inequality!)
                   = ln ∫ (f(z; θ)/f(z; θ₀)) f(z; θ₀) dz
                   = ln ∫ f(z; θ) dz
                   = ln 1 = 0

    Important: check where the argument breaks down if the assumptions do not hold.

  • 3) 1) and 2) imply consistency.

    Pick any ε > 0. Let us get three inequalities, each holding wpa 1.

    θ̂ₙ maximizes Qₙ(θ), so

    a) Qₙ(θ̂ₙ) > Qₙ(θ₀) − ε/3

    Qₙ(θ) converges uniformly to Q₀(θ), so |Qₙ(θ̂ₙ) − Q₀(θ̂ₙ)| < ε/3,
    hence

    b) Q₀(θ̂ₙ) > Qₙ(θ̂ₙ) − ε/3

    and |Q₀(θ₀) − Qₙ(θ₀)| < ε/3, hence

    c) Qₙ(θ₀) > Q₀(θ₀) − ε/3

  • Now start with b):

    Q₀(θ̂ₙ) > Qₙ(θ̂ₙ) − ε/3
            > Qₙ(θ₀) − 2ε/3      (subtracting ε/3 on both sides of a)
            > Q₀(θ₀) − ε         (subtracting 2ε/3 on both sides of c)

    d) Q₀(θ̂ₙ) > Q₀(θ₀) − ε

    Let N be an open subset of Θ containing θ₀. N open implies
    Θ ∩ Nᶜ is closed and bounded: compact in our case.

    Since Q₀(θ) is continuous (the uniform limit of continuous
    functions is continuous), then

    sup_{θ ∈ Θ∩Nᶜ} Q₀(θ) = Q₀(θ*)

    for some θ* ∈ Θ ∩ Nᶜ (continuous functions on compact sets
    achieve their maximum).

  • Now, since θ₀ is the unique maximizer of Q₀(θ),

    Q₀(θ*) < Q₀(θ₀)

    Now pick ε = Q₀(θ₀) − Q₀(θ*); then by inequality d), wpa 1

    Q₀(θ̂ₙ) > Q₀(θ*)

    so θ̂ₙ ∈ N wpa 1 (if not, Q₀(θ*) would not be the sup over Θ ∩ Nᶜ).
    Then, using the definition of convergence in probability, since N
    was chosen arbitrarily,

    θ̂ₙ →ᵖ θ₀

  • Asymptotic normality

    Asymptotic normality is a bit easier to establish, since we will follow
    a strategy very similar to what we did with all previous
    estimators.

    But first we need to establish some notation and results:

    Score and Hessian

    Score equality

    Information matrix

    Information matrix equality

  • Score, Hessian and Information

    Score: s(θ; Z) ≡ ∂l(θ; Z)/∂θ, a K × 1 vector.
    Sample score: s(θ; Z) ≡ ∂ ln L(θ; Z)/∂θ = Σ_{i=1}^n s(θ; Zᵢ)

    Hessian: H(θ; Z) ≡ ∂²l(θ; Z)/∂θ∂θ′, a K × K matrix.
    Sample Hessian: H(θ; Z) = Σ_{i=1}^n H(θ; Zᵢ)

    Information matrix: J ≡ E[s(θ₀; Z) s(θ₀; Z)′], a K × K matrix.

    Score equality: E[s(θ₀; Z)] = 0 (it is kind of a FOC for the information inequality).

    Information equality: E[H(θ₀; Z)] = −J

    Note that this implies V[s(θ₀; Z)] = J.
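Both equalities can be checked exactly, by direct computation, in the Poisson model (a sketch, not from the slides): there s(λ; y) = −1 + y/λ and H(λ; y) = −y/λ², so E[s] = 0 and −E[H] = J = 1/λ.

```python
import math

# Exact numerical check of the score and information equalities for the
# Poisson model at lam0 = 2.5; here J should equal 1/lam0 = 0.4.
lam0 = 2.5

E_s = J = E_H = 0.0
p = math.exp(-lam0)                    # Pr(Y = 0)
for y in range(200):                   # the tail beyond 200 is negligible
    s = -1.0 + y / lam0                # score contribution
    E_s += s * p
    J += s * s * p
    E_H += (-y / lam0 ** 2) * p        # Hessian contribution
    p *= lam0 / (y + 1)                # Pr(Y = y+1) = Pr(Y = y) * lam0/(y+1)

print(round(E_s, 12), round(J, 6), round(-E_H, 6))
# score equality: E[s] = 0; information equality: -E[H] = J = 0.4
```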

  • Proof of the score equality (the continuous case):

    For any θ,

    ∫ f(z; θ) dz = 1

    Taking derivatives on both sides,

    d[∫ f(z; θ) dz]/dθ = 0

    If it is possible to interchange differentiation and integration:

    ∫ df(z; θ)/dθ dz = 0

    The score is a log-derivative, so

    s(θ; z) = d ln f(z; θ)/dθ = (df(z; θ)/dθ) / f(z; θ)

  • Hence

    df(z; θ)/dθ = s(θ; z) f(z; θ)

    Replacing above:

    ∫ s(θ; z) f(z; θ) dz = 0

    When θ = θ₀,

    ∫ s(θ₀; z) f(z; θ₀) dz = E[s(θ₀; Z)]

    so

    E[s(θ₀; Z)] = 0

  • Aside: interchanging differentiation and integration

    [Statement of the differentiation-under-the-integral theorem, shown as an image on the slide]

    Source: Bartle, R., 1966, The Elements of Integration, Wiley, New York

  • Proof of the information equality (the continuous case):

    From the previous result:

    ∫ s(θ; z) f(z; θ) dz = 0

    Take derivatives on both sides, use the product rule, and omit arguments
    in functions to simplify notation:

    ∫ (∇s f + s ∇f′) dz = 0
    ∫ ∇s f dz + ∫ s ∇f′ dz = 0

    From the score equality, ∇f′ = s′f. Replacing ∇f′ above:

    ∫ H(θ; z) f(z; θ) dz + ∫ s(θ; z) s(θ; z)′ f(z; θ) dz = 0

    When θ = θ₀,

    E[H(θ₀; Z)] + E[s(θ₀; Z) s(θ₀; Z)′] = 0
    E[H(θ₀; Z)] + J = 0

    which implies the desired result.

  • Asymptotic normality

    Under our assumptions, wpa 1 the MLE satisfies the FOCs

    s(θ̂ₙ; Z) = 0

    Take a first-order Taylor expansion around θ₀:

    s(θ̂ₙ; Z) = s(θ₀; Z) + H(θ̄; Z)(θ̂ₙ − θ₀) = 0

    where θ̄ is a mean value located between θ̂ₙ and θ₀ (note that
    consistency implies θ̄ →ᵖ θ₀).

    Now solve:

    √n (θ̂ₙ − θ₀) = (−H(θ̄; Z)/n)⁻¹ (s(θ₀; Z)/√n)

    Now we are back in familiar territory: we will show that the first factor
    does not explode, and that the second is asymptotically normal.

  • First we will show:

    (−n⁻¹ H(θ̄; Z))⁻¹ →ᵖ J⁻¹

    Preliminary result: if gₙ(θ) is a sequence of random functions that converges uniformly
    in probability to g₀(θ) for all θ in a compact set Θ, and g₀(θ) is continuous, then θ̂ₙ →ᵖ θ₀
    implies gₙ(θ̂ₙ) →ᵖ g₀(θ₀) (see Ruud (2000), p. 326).

    According to our assumptions, n⁻¹ H(θ; Z) = n⁻¹ Σ_{i=1}^n H(θ; Zᵢ)
    converges uniformly in probability to E[H(θ; Z)], which, by the
    previous results, is continuous in θ.

    Hence, by the preliminary result, since θ̄ →ᵖ θ₀,

    −n⁻¹ H(θ̄; Z) →ᵖ −E[H(θ₀; Z)] = J

  • Now we show:

    s(θ₀; Z)/√n →ᵈ N(0, J)

    Start with

    s(θ₀; Z)/√n = √n (s(θ₀; Z)/n) = √n ((1/n) Σ_{i=1}^n s(θ₀; Zᵢ))

    In order to apply the CLT we check:

    E[s(θ₀; Zᵢ)] = 0, by the score equality.

    V[s(θ₀; Zᵢ)] = J, by the information equality.

    Then the CLT for iid sequences gives the result.

  • Collecting results:

    √n (θ̂ₙ − θ₀) = (−H(θ̄; Z)/n)⁻¹ (s(θ₀; Z)/√n)

    where the first factor →ᵖ J⁻¹ and the second →ᵈ N(0, J).

    Then, by Slutsky's theorem and linearity,

    √n (θ̂ₙ − θ₀) →ᵈ N(0, J⁻¹ J J⁻¹) = N(0, J⁻¹)
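A Monte Carlo sketch of this result (not from the slides; the Bernoulli setup with p₀ = 0.3 is made up): for the Bernoulli model J = 1/(p₀(1 − p₀)), so √n(p̂ − p₀) should be approximately N(0, p₀(1 − p₀)) = N(0, 0.21).

```python
import random, statistics

# Simulate the sampling distribution of sqrt(n) * (p_hat - p0) for
# Bernoulli(p0) data; its variance should approach J^{-1} = p0 (1 - p0).
random.seed(0)
p0, n, reps = 0.3, 400, 2000
draws = []
for _ in range(reps):
    p_hat = sum(1 for _ in range(n) if random.random() < p0) / n
    draws.append(n ** 0.5 * (p_hat - p0))

print(round(statistics.mean(draws), 3), round(statistics.variance(draws), 3))
# mean should be near 0 and variance near p0 * (1 - p0) = 0.21
```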

  • Variance estimation

    The asymptotic variance of θ̂ₙ is J⁻¹, with J = V(s(θ₀; Z)). We
    will propose three consistent estimators:

    1. Inverse of the empirical minus-Hessian:

       [−(1/n) Σ_{i=1}^n H(θ̂ₙ; Zᵢ)]⁻¹

    2. Inverse of the empirical variance of the score (OPG):

       [(1/n) Σ_{i=1}^n s(θ̂ₙ; Zᵢ) s(θ̂ₙ; Zᵢ)′]⁻¹

    3. Inverse of the information matrix evaluated at θ̂ₙ:

       J(θ̂ₙ)⁻¹
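The three estimators can be compared in the Poisson example (a sketch, not from the slides; the counts are made up), where s(λ; y) = −1 + y/λ, H(λ; y) = −y/λ² and J(λ) = 1/λ:

```python
# Three estimators of the asymptotic variance J^{-1} for the Poisson MLE
# lam_hat = ybar.
y = [2, 0, 3, 1, 4, 2, 2, 1]          # hypothetical iid Poisson counts
n = len(y)
lam_hat = sum(y) / n

v_hessian = 1.0 / (sum(yi / lam_hat ** 2 for yi in y) / n)        # [-(1/n) sum H]^{-1}
v_opg = 1.0 / (sum((-1.0 + yi / lam_hat) ** 2 for yi in y) / n)   # OPG
v_info = lam_hat                                                  # J(lam_hat)^{-1} = lam_hat

print(v_hessian, v_opg, v_info)
```

In this particular model estimators 1 and 3 coincide exactly at λ̂ = ȳ, while the OPG estimator differs in finite samples; all three are consistent for J⁻¹.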

  • Invariance

    Invariance: let λ = g(θ), where g(·) is a one-to-one function. Let
    θ₀ denote the true parameter, so λ₀ = g(θ₀) is the true parameter
    under the new reparametrization. Then, if θ̂ is the MLE of θ₀,
    λ̂ = g(θ̂) is the MLE of λ₀.

    Example: if λ̂ is the MLE of ln(θ₀), how can we get the MLE of θ₀?
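A numerical sketch of the example (not from the slides; the Poisson data are made up): if ψ̂ is the MLE of ψ₀ = ln(θ₀), then by invariance the MLE of θ₀ is exp(ψ̂). For Poisson data, maximizing the log-likelihood directly in ψ = ln(λ) should therefore reproduce the known MLE of λ, the sample mean:

```python
import math

# Maximize the Poisson log-likelihood in psi = ln(lam) directly and compare
# exp(psi_hat) with the MLE of lam (the sample mean ybar).
y = [2, 0, 3, 1, 4, 2, 2, 1]
n, sy = len(y), sum(y)
ybar = sy / n

def loglik_psi(psi):
    # l(psi; y) = -n e^psi + psi * sum(y_i), dropping the ln y_i! constant
    return -n * math.exp(psi) + psi * sy

grid = [i / 10000 for i in range(-20000, 20000)]    # psi values in [-2, 2)
psi_hat = max(grid, key=loglik_psi)
print(math.exp(psi_hat), ybar)    # the two should agree up to grid resolution
```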

  • Proof:

    Since θ̂ is the MLE,

    l(θ̂; z) ≥ l(θ; z)

    for every θ. Since λ = g(θ) is one-to-one,

    l(g⁻¹(λ̂); z) ≥ l(g⁻¹(λ); z)

    for every λ; then λ̂ = g(θ̂) maximizes the reparametrized log-likelihood.

  • MLE and unbiasedness

    The invariance property makes the MLE very likely to be
    biased in many relevant cases.

    Consider the following intuition. Suppose θ̂ is the MLE of θ₀ and
    suppose it is unbiased, so

    E(θ̂) = θ₀

    By invariance, g(θ̂) is the MLE of g(θ₀). Is g(θ̂) unbiased? In
    general,

    E(g(θ̂)) ≠ g(E(θ̂)) = g(θ₀)

    so if the MLE is unbiased for one parametrization, it is very likely
    to be biased for most other parametrizations.

  • MLE and efficiency

    Let θ̃ be any unbiased estimator of θ₀. An important result is the
    following.

    Cramér-Rao inequality: V(θ̃) − (nJ)⁻¹ is psd.

    This provides a lower bound for the variance of unbiased estimators.
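A Monte Carlo sketch of the bound (not from the slides; the Bernoulli setup is made up): for Bernoulli(p₀), J = 1/(p₀(1 − p₀)), so the Cramér-Rao bound for an unbiased estimator is (nJ)⁻¹ = p₀(1 − p₀)/n; the sample mean is unbiased and attains it.

```python
import random, statistics

# Compare the simulated variance of the (unbiased) sample mean with the
# Cramer-Rao bound p0 (1 - p0) / n for Bernoulli(p0) data.
random.seed(2)
p0, n, reps = 0.4, 50, 5000
estimates = [sum(1 for _ in range(n) if random.random() < p0) / n
             for _ in range(reps)]
mc_var = statistics.variance(estimates)
cr_bound = p0 * (1 - p0) / n
print(round(mc_var, 5), round(cr_bound, 5))   # the two should be close
```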

  • Proof: the single-parameter case (K = 1).

    For any two random variables X and Y,

    Cov(X, Y)² ≤ V(X) V(Y)

    since the squared correlation is at most 1. Then

    V(X) ≥ Cov(X, Y)² / V(Y)

    We will take X = θ̃ and Y = s(θ₀; Z). It is immediate to check that

    V(s(θ₀; Z)) = V(Σ_{i=1}^n s(θ₀; Zᵢ)) = nJ

    So... what do we need to show to finish the proof?

  • We have

    V(θ̃) ≥ Cov(θ̃, s(θ₀; Z))² / (nJ)

    so we need to show Cov(θ̃, s(θ₀; Z)) = 1.

    (Sketch:) Since E[s(θ₀; Z)] = 0,

    Cov(θ̃, s(θ₀; Z)) = E(θ̃ s(θ₀; Z))

    and

    E(θ̃ s) = ∫ θ̃ s f(z; θ) dz |_{θ=θ₀}
            = ∫ θ̃ ((∂f(z; θ)/∂θ) / f(z; θ)) f(z; θ) dz |_{θ=θ₀}
            = ∫ θ̃ ∂f(z; θ)/∂θ dz |_{θ=θ₀}
            = ∂[∫ θ̃ f(z; θ) dz]/∂θ |_{θ=θ₀}
            = ∂E(θ̃)/∂θ |_{θ=θ₀}
            = 1

    since E(θ̃) = θ by unbiasedness.

  • MLE and efficiency

    The CR bound applies to unbiased estimators. The MLE is likely to be
    biased.

    MLE estimators are asymptotically normal, centered around the true
    parameter, with normalized variance equal to the CR lower bound for
    unbiased estimators.

    Problem: the class of consistent AN estimators includes some
    extreme (and highly unusual) cases that can improve upon the CR
    bound (the so-called superefficient estimators).

    Rao (1963): the MLE is efficient (minimum variance) in
    the class of consistent and uniformly asymptotically normal (CUAN)
    estimators.

    CUAN estimators: θ̃ is CUAN for θ₀ if it is consistent and
    √n(θ̃ − θ₀) converges in distribution to a normal rv uniformly over
    compact subsets of Θ.
