

Chapter 2: Maximum Likelihood Estimation
Advanced Econometrics - HEC Lausanne

Christophe Hurlin

University of Orléans

December 9, 2013


Section 1

Introduction


1. Introduction

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a model. It is one of the most widely used estimation methods.

The method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data.

Maximum likelihood estimation gives a unified approach to estimation.


2. The Principle of Maximum Likelihood

What are the main properties of the maximum likelihood estimator?
- Is it asymptotically unbiased?
- Is it asymptotically efficient? Under which condition(s)?
- Is it consistent?
- What is the asymptotic distribution?

How to apply the maximum likelihood principle to the multiple linear regression model, to the probit/logit models, etc.?

... All of these questions are answered in this lecture...


1. Introduction

The outline of this chapter is the following:

Section 2: The principle of the maximum likelihood estimation

Section 3: The likelihood function

Section 4: Maximum likelihood estimator

Section 5: Score, Hessian and Fisher information

Section 6: Properties of maximum likelihood estimators


1. Introduction

References

Amemiya T. (1985), Advanced Econometrics, Harvard University Press.

Greene W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall.

Pelgrin F. (2010), Lecture Notes on Advanced Econometrics, HEC Lausanne (a special thank).

Ruud P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.

Zivot E. (2001), Maximum Likelihood Estimation, lecture notes.


Section 2

The Principle of Maximum Likelihood


2. The Principle of Maximum Likelihood

Objectives

In this section, we present a simple example in order:

1. To introduce the notations.
2. To introduce the notions of likelihood and log-likelihood.
3. To introduce the concept of maximum likelihood estimator.
4. To introduce the concept of maximum likelihood estimate.


2. The Principle of Maximum Likelihood

Example. Suppose that X_1, X_2, .., X_N are i.i.d. discrete random variables, such that X_i ~ Pois(θ), with a pmf (probability mass function) defined as:

$$\Pr(X_i = x_i) = \frac{\exp(-\theta)\,\theta^{x_i}}{x_i!}$$

where θ is an unknown parameter to estimate.


2. The Principle of Maximum Likelihood

Question: What is the probability of observing the particular sample {x_1, x_2, .., x_N}, assuming that a Poisson distribution with as yet unknown parameter θ generated the data?

This probability is equal to:

$$\Pr\left((X_1 = x_1) \cap \dots \cap (X_N = x_N)\right)$$


2. The Principle of Maximum Likelihood

Since the variables X_i are i.i.d., this joint probability is equal to the product of the marginal probabilities:

$$\Pr\left((X_1 = x_1) \cap \dots \cap (X_N = x_N)\right) = \prod_{i=1}^{N} \Pr(X_i = x_i)$$

Given the pmf of the Poisson distribution, we have:

$$\Pr\left((X_1 = x_1) \cap \dots \cap (X_N = x_N)\right) = \prod_{i=1}^{N} \frac{\exp(-\theta)\,\theta^{x_i}}{x_i!} = \exp(-\theta N)\,\frac{\theta^{\sum_{i=1}^{N} x_i}}{\prod_{i=1}^{N} x_i!}$$


2. The Principle of Maximum Likelihood

Definition. This joint probability is a function of θ (the unknown parameter) and corresponds to the likelihood of the sample {x_1, .., x_N}, denoted by:

$$L_N(\theta; x_1, .., x_N) = \Pr\left((X_1 = x_1) \cap \dots \cap (X_N = x_N)\right)$$

with

$$L_N(\theta; x_1, .., x_N) = \exp(-\theta N) \times \theta^{\sum_{i=1}^{N} x_i} \times \frac{1}{\prod_{i=1}^{N} x_i!}$$


2. The Principle of Maximum Likelihood

Example. Let us assume that for N = 10 we have a realization of the sample equal to {5, 0, 1, 1, 0, 3, 2, 3, 4, 1}; then:

$$L_N(\theta; x_1, .., x_N) = \Pr\left((X_1 = x_1) \cap \dots \cap (X_N = x_N)\right) = \frac{e^{-10\theta}\,\theta^{20}}{207{,}360}$$


2. The Principle of Maximum Likelihood

Question: What value of θ would make this sample most probable?


2. The Principle of Maximum Likelihood

The figure below plots the function L_N(θ; x) for various values of θ. It has a single mode at θ = 2, which would be the maximum likelihood estimate, or MLE, of θ.

[Figure: L_N(θ; x) plotted for θ between 0 and 4, on a vertical scale of 0 to 1.2 × 10⁻⁸; the curve peaks at θ = 2.]



2. The Principle of Maximum Likelihood

Consider maximizing the likelihood function L_N(θ; x_1, .., x_N) with respect to θ. Since the log function is monotonically increasing, we usually maximize ln L_N(θ; x_1, .., x_N) instead. In this case:

$$\ln L_N(\theta; x_1, .., x_N) = -\theta N + \ln(\theta) \sum_{i=1}^{N} x_i - \ln\left(\prod_{i=1}^{N} x_i!\right)$$

$$\frac{\partial \ln L_N(\theta; x_1, .., x_N)}{\partial \theta} = -N + \frac{1}{\theta} \sum_{i=1}^{N} x_i$$

$$\frac{\partial^2 \ln L_N(\theta; x_1, .., x_N)}{\partial \theta^2} = -\frac{1}{\theta^2} \sum_{i=1}^{N} x_i < 0$$


2. The Principle of Maximum Likelihood

Under suitable regularity conditions, the maximum likelihood estimate (estimator) is defined as:

$$\hat{\theta} = \arg\max_{\theta \in \mathbb{R}^+} \ln L_N(\theta; x_1, .., x_N)$$

FOC:

$$\left. \frac{\partial \ln L_N(\theta; x_1, .., x_N)}{\partial \theta} \right|_{\hat{\theta}} = -N + \frac{1}{\hat{\theta}} \sum_{i=1}^{N} x_i = 0 \iff \hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

SOC:

$$\left. \frac{\partial^2 \ln L_N(\theta; x_1, .., x_N)}{\partial \theta^2} \right|_{\hat{\theta}} = -\frac{1}{\hat{\theta}^2} \sum_{i=1}^{N} x_i < 0$$

Hence $\hat{\theta}$ is a maximum.


2. The Principle of Maximum Likelihood

The maximum likelihood estimate (realization) is:

$$\hat{\theta} \equiv \hat{\theta}(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Given the sample {5, 0, 1, 1, 0, 3, 2, 3, 4, 1}, we have $\hat{\theta}(x) = 2$.

The maximum likelihood estimator (random variable) is:

$$\hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} X_i$$
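As a numerical cross-check (a sketch added for illustration, not part of the original slides), the following Python snippet evaluates the Poisson log-likelihood derived above on this sample and maximizes it numerically; the maximizer agrees with the closed-form MLE, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln  # gammaln(x + 1) = ln(x!)

x = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])

def neg_log_lik(theta):
    # -ln L_N(theta; x) = theta*N - ln(theta)*sum(x_i) + sum(ln(x_i!))
    return theta * len(x) - np.log(theta) * x.sum() + gammaln(x + 1).sum()

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
print(res.x)     # ~2.0: the numerical maximizer
print(x.mean())  # 2.0: the closed-form MLE (sample mean)
```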


2. The Principle of Maximum Likelihood

Continuous variables

The reference to the probability of observing the given sample is not exact in a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same.

The likelihood function then corresponds to the pdf associated to the joint distribution of (X_1, X_2, .., X_N) evaluated at the point (x_1, x_2, .., x_N):

$$L_N(\theta; x_1, .., x_N) = f_{X_1, .., X_N}(x_1, x_2, .., x_N; \theta)$$


2. The Principle of Maximum Likelihood

Continuous variables

If the random variables {X_1, X_2, .., X_N} are i.i.d., then we have:

$$L_N(\theta; x_1, .., x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$

where f_X(x_i; θ) denotes the pdf of the marginal distribution of X (or X_i, since all the variables have the same distribution).

The values of the parameters that maximize L_N(θ; x_1, .., x_N) or its log are the maximum likelihood estimates, denoted $\hat{\theta}(x)$.


Section 3

The Likelihood Function

Definitions and Notations


3. The Likelihood Function

Objectives

1. Introduce the notations for an estimation problem that deals with a marginal distribution or a conditional distribution (model).
2. Define the likelihood and the log-likelihood functions.
3. Introduce the concept of conditional log-likelihood.
4. Propose various applications.


3. The Likelihood Function

Notations

Let us consider a continuous random variable X, with a pdf denoted f_X(x; θ), for x ∈ ℝ.

θ = (θ_1 .. θ_K)' is a K × 1 vector of unknown parameters. We assume that θ ∈ Θ ⊆ ℝ^K.

Let us consider a sample {X_1, .., X_N} of i.i.d. random variables with the same arbitrary distribution as X.

The realisation of {X_1, .., X_N} (the data set) is denoted {x_1, .., x_N}, or x for simplicity.


3. The Likelihood Function

Example (Normal distribution). If X ~ N(m, σ²), then:

$$f_X(z; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - m)^2}{2\sigma^2}\right) \quad \forall z \in \mathbb{R}$$

with K = 2 and

$$\theta = \begin{pmatrix} m \\ \sigma^2 \end{pmatrix}$$


3. The Likelihood Function

Definition (Likelihood Function). The likelihood function is defined to be:

$$L_N : \Theta \times \mathbb{R}^N \to \mathbb{R}^+$$

$$(\theta; x_1, .., x_N) \longmapsto L_N(\theta; x_1, .., x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$


3. The Likelihood Function

Definition (Log-Likelihood Function). The log-likelihood function is defined to be:

$$\ell_N : \Theta \times \mathbb{R}^N \to \mathbb{R}$$

$$(\theta; x_1, .., x_N) \longmapsto \ell_N(\theta; x_1, .., x_N) = \sum_{i=1}^{N} \ln f_X(x_i; \theta)$$


3. The Likelihood Function

Remark: the (log-)likelihood function depends on two types of arguments: the parameter vector θ and the data (the sample realisation x).


3. The Likelihood Function

Notations: In the rest of the chapter, I will use the following alternative notations:

$$L_N(\theta; x) \equiv L(\theta; x_1, .., x_N) \equiv L_N(\theta)$$

$$\ell_N(\theta; x) \equiv \ln L_N(\theta; x) \equiv \ln L(\theta; x_1, .., x_N) \equiv \ln L_N(\theta)$$


3. The Likelihood Function

Example (Sample of Normal Variables). We consider an N.i.d.(m, σ²) sample {Y_1, .., Y_N} and denote the realisation by {y_1, .., y_N}, or y. Let us define θ = (m σ²)'; then we have:

$$L_N(\theta; y) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_i - m)^2}{2\sigma^2}\right) = \left(\sigma^2 2\pi\right)^{-N/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - m)^2\right)$$

$$\ell_N(\theta; y) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - m)^2$$
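As an illustrative sketch (not from the slides; the simulated sample is an assumption added here), the Gaussian log-likelihood above can be coded directly and cross-checked against scipy's normal density:

```python
import numpy as np
from scipy.stats import norm

def gaussian_loglik(theta, y):
    # theta = (m, sigma2); the closed-form expression derived above
    m, sigma2 = theta
    N = len(y)
    return (-N / 2 * np.log(sigma2) - N / 2 * np.log(2 * np.pi)
            - np.sum((y - m) ** 2) / (2 * sigma2))

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)

print(gaussian_loglik((1.0, 4.0), y))            # closed form
print(norm.logpdf(y, loc=1.0, scale=2.0).sum())  # same value from scipy
```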


3. The Likelihood Function

Definition (Likelihood of one observation). We can also define the (log-)likelihood of one observation x_i:

$$L_i(\theta; x) = f_X(x_i; \theta) \quad \text{with} \quad L_N(\theta; x) = \prod_{i=1}^{N} L_i(\theta; x)$$

$$\ell_i(\theta; x) = \ln f_X(x_i; \theta) \quad \text{with} \quad \ell_N(\theta; x) = \sum_{i=1}^{N} \ell_i(\theta; x)$$


3. The Likelihood Function

Example (Exponential Distribution). Suppose that D_1, D_2, .., D_N are i.i.d. positive random variables (durations, for instance), with D_i ~ Exp(θ), θ > 0, and:

$$L_i(\theta; d_i) = f_D(d_i; \theta) = \frac{1}{\theta} \exp\left(-\frac{d_i}{\theta}\right)$$

$$\ell_i(\theta; d_i) = \ln\left(f_D(d_i; \theta)\right) = -\ln(\theta) - \frac{d_i}{\theta}$$

Then we have:

$$L_N(\theta; d) = \theta^{-N} \exp\left(-\frac{1}{\theta} \sum_{i=1}^{N} d_i\right) \qquad \ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta} \sum_{i=1}^{N} d_i$$
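Setting ∂ℓ_N/∂θ = 0 here gives θ̂ = (1/N) Σ d_i, the sample mean of the durations. A minimal numerical sketch (with simulated durations, an assumption added for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
d = rng.exponential(scale=3.0, size=1000)  # true theta = 3

def neg_loglik(theta):
    # -l_N(theta; d) = N*ln(theta) + sum(d_i)/theta
    return len(d) * np.log(theta) + d.sum() / theta

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100), method="bounded")
print(res.x, d.mean())  # the numerical maximizer matches the sample mean
```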


3. The Likelihood Function

Remark: The (log-)likelihood and the maximum likelihood estimator are always based on an assumption (a bet?) about the distribution of Y:

$$Y_i \sim \text{Distribution with pdf } f_Y(y; \theta) \implies L_N(\theta; y) \text{ and } \ell_N(\theta; y)$$

In practice, we generally have no idea about the true distribution of Y_i. A solution: the quasi-maximum likelihood estimator.


3. The Likelihood Function

Remark: We can also use the MLE to estimate the parameters of a model (with a dependent variable and explanatory variables) such that:

$$y = g(x; \theta) + \varepsilon$$

where θ denotes the vector of parameters, X a set of explanatory variables, ε an error term, and g(.) the link function.

In this case, we generally consider the conditional distribution of Y given X, which is equivalent to the unconditional distribution of the error term ε:

$$Y \mid X \sim \mathcal{D} \iff \varepsilon \sim \mathcal{D}$$


3. The Likelihood Function

Notations (model)

Let us consider two continuous random variables Y and X.

We assume that Y has a conditional distribution given X = x, with a pdf denoted f_{Y|x}(y; θ), for y ∈ ℝ.

θ = (θ_1 .. θ_K)' is a K × 1 vector of unknown parameters. We assume that θ ∈ Θ ⊆ ℝ^K.

Let us consider a sample {X_i, Y_i}_{i=1}^{N} of i.i.d. random variables and a realisation {x_i, y_i}_{i=1}^{N}.


3. The Likelihood Function

Definition (Conditional likelihood function). The (conditional) likelihood function is defined to be:

$$L_N(\theta; y \mid x) = \prod_{i=1}^{N} f_{Y|X}(y_i \mid x_i; \theta)$$

where f_{Y|X}(y_i | x_i; θ) denotes the conditional pdf of Y_i given X_i.

Remark: The conditional likelihood function is the joint conditional density of the data, in which the unknown parameter is θ.


3. The Likelihood Function

Definition (Conditional log-likelihood function). The (conditional) log-likelihood function is defined to be:

$$\ell_N(\theta; y \mid x) = \sum_{i=1}^{N} \ln f_{Y|X}(y_i \mid x_i; \theta)$$

where f_{Y|X}(y_i | x_i; θ) denotes the conditional pdf of Y_i given X_i.


3. The Likelihood Function

Remark: The conditional probability density function (pdf) can be denoted by:

$$f_{Y|X}(y \mid x; \theta) \equiv f_Y(y \mid X = x; \theta) \equiv f_Y(y \mid X = x)$$


3. The Likelihood Function

Example (Linear Regression Model). Consider the following linear regression model:

$$y_i = x_i^\top \beta + \varepsilon_i$$

where x_i is a K × 1 vector of random variables and β = (β_1 .. β_K)' is a K × 1 vector of parameters. We assume that the ε_i are i.i.d. with ε_i ~ N(0, σ²). Then the conditional distribution of Y_i given X_i = x_i is:

$$Y_i \mid x_i \sim \mathcal{N}\left(x_i^\top \beta,\; \sigma^2\right)$$

$$L_i(\theta; y \mid x) = f_{Y|x}(y_i \mid x_i; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(y_i - x_i^\top \beta\right)^2}{2\sigma^2}\right)$$

where θ = (β' σ²)' is a (K+1) × 1 vector.


3. The Likelihood Function

Example (Linear Regression Model, cont'd). Then, if we consider an i.i.d. sample {y_i, x_i}_{i=1}^{N}, the corresponding conditional (log-)likelihood is defined to be:

$$L_N(\theta; y \mid x) = \prod_{i=1}^{N} f_{Y|X}(y_i \mid x_i; \theta) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(y_i - x_i^\top \beta\right)^2}{2\sigma^2}\right) = \left(\sigma^2 2\pi\right)^{-N/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2\right)$$

$$\ell_N(\theta; y \mid x) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2$$


3. The Likelihood Function

Remark: Given this principle, we can derive the (conditional) likelihood and the log-likelihood functions associated to a specific sample for any type of econometric model in which the conditional distribution of the dependent variable is known:

- Dichotomic models: probit, logit, etc.
- Censored regression models: Tobit, etc.
- Time series models: AR, ARMA, VAR, etc.
- GARCH models
- ...


3. The Likelihood Function

Example (Probit/Logit Models). Let us consider a dichotomic variable Y_i such that Y_i = 1 if firm i is in default and 0 otherwise. X_i = (X_{i1} ... X_{iK})' denotes a K × 1 vector of individual characteristics. We assume that the conditional probability of default is defined as:

$$\Pr(Y_i = 1 \mid X_i = x_i) = F\left(x_i^\top \beta\right)$$

where β = (β_1 .. β_K)' is a vector of parameters and F(.) is a cdf (cumulative distribution function):

$$Y_i = \begin{cases} 1 & \text{with probability } F\left(x_i^\top \beta\right) \\ 0 & \text{with probability } 1 - F\left(x_i^\top \beta\right) \end{cases}$$


3. The Likelihood Function

Remark: Given the choice of the link function F(.), we get a probit or a logit model.


3. The Likelihood Function

Definition (Probit Model). In a probit model, the conditional probability of the event Y_i = 1 is:

$$\Pr(Y_i = 1 \mid X_i = x_i) = \Phi\left(x_i^\top \beta\right) = \int_{-\infty}^{x_i^\top \beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du$$

where Φ(.) denotes the cdf of the standard normal distribution.


3. The Likelihood Function

Definition (Logit Model). In a logit model, the conditional probability of the event Y_i = 1 is:

$$\Pr(Y_i = 1 \mid X_i = x_i) = \Lambda\left(x_i^\top \beta\right) = \frac{1}{1 + \exp\left(-x_i^\top \beta\right)}$$

where Λ(.) denotes the cdf of the logistic distribution.


3. The Likelihood Function

Example (Probit/Logit Models, cont'd). What is the (conditional) log-likelihood of the sample {y_i, x_i}_{i=1}^{N}? Whatever the choice of F(.), the conditional distribution of Y_i given X_i = x_i is a Bernoulli distribution, since:

$$Y_i = \begin{cases} 1 & \text{with probability } F\left(x_i^\top \beta\right) \\ 0 & \text{with probability } 1 - F\left(x_i^\top \beta\right) \end{cases}$$

Then, for θ = β, we have:

$$L_i(\theta; y \mid x) = f_{Y|x}(y_i \mid x_i; \theta) = \left[F\left(x_i^\top \beta\right)\right]^{y_i} \left[1 - F\left(x_i^\top \beta\right)\right]^{1 - y_i}$$

where f_{Y|x}(y_i | x_i; θ) denotes the conditional probability mass function (pmf) of Y_i.


3. The Likelihood Function

Example (Probit/Logit Models, cont'd). The (conditional) likelihood and log-likelihood of the sample {y_i, x_i}_{i=1}^{N} are defined to be:

$$L_N(\theta; y \mid x) = \prod_{i=1}^{N} f_{Y|x}(y_i \mid x_i; \theta) = \prod_{i=1}^{N} \left[F\left(x_i^\top \beta\right)\right]^{y_i} \left[1 - F\left(x_i^\top \beta\right)\right]^{1 - y_i}$$

$$\ell_N(\theta; y \mid x) = \sum_{i=1}^{N} y_i \ln\left[F\left(x_i^\top \beta\right)\right] + \sum_{i=1}^{N} (1 - y_i) \ln\left[1 - F\left(x_i^\top \beta\right)\right] = \sum_{i:\, y_i = 1} \ln F\left(x_i^\top \beta\right) + \sum_{i:\, y_i = 0} \ln\left[1 - F\left(x_i^\top \beta\right)\right]$$

where f_{Y|x}(y_i | x_i; θ) denotes the conditional probability mass function (pmf) of Y_i.
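To make this concrete, here is a minimal sketch (assuming simulated data and the logistic link Λ; neither the data nor the optimizer choice comes from the slides) that maximizes the Bernoulli log-likelihood above numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic cdf Lambda(.)

rng = np.random.default_rng(2)
N = 1000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant + one regressor
beta_true = np.array([0.5, -1.0])
y = (rng.uniform(size=N) < expit(X @ beta_true)).astype(float)

def neg_loglik(beta):
    F = expit(X @ beta)
    # l_N = sum_{y_i=1} ln F(x_i'b) + sum_{y_i=0} ln(1 - F(x_i'b))
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(beta_hat)  # close to beta_true for large N
```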


3. The Likelihood Function

Key Concepts

1. Likelihood (of a sample) function
2. Log-likelihood (of a sample) function
3. Conditional likelihood and log-likelihood function
4. Likelihood and log-likelihood of one observation


Section 4

Maximum Likelihood Estimator


4. Maximum Likelihood Estimator

Objectives

1. This section will be concerned with obtaining estimates of the parameters θ.
2. We will define the maximum likelihood estimator (MLE).
3. Before we begin that study, we consider the question of whether estimation of the parameters is possible at all: the question of identification.
4. We will introduce the invariance principle.


4. Maximum Likelihood Estimator

Definition (Identification). The parameter vector θ is identified (estimable) if, for any other parameter vector θ* ≠ θ and for some data y, we have:

$$L_N(\theta; y) \neq L_N(\theta^*; y)$$


4. Maximum Likelihood Estimator

Example. Let us consider a latent (continuous and unobservable) variable Y_i* such that:

$$Y_i^* = X_i^\top \beta + \varepsilon_i$$

with β = (β_1 .. β_K)' and X_i = (X_{i1} ... X_{iK})', and where the error term ε_i is i.i.d. such that E(ε_i) = 0 and V(ε_i) = σ². The distribution of ε_i is symmetric around 0, and we denote by G(.) the cdf of the standardized error term ε_i/σ. We assume that this cdf does not depend on σ or β. Example: ε_i/σ ~ N(0, 1).


4. Maximum Likelihood Estimator

Example (cont'd). We observe a dichotomic variable Y_i such that:

$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

Problem: are the parameters θ = (β' σ²)' identifiable?


4. Maximum Likelihood Estimator

Solution:

To answer this question, we have to compute the (log-)likelihood of the sample of observed data {y_i, x_i}_{i=1}^{N}. We have:

$$\Pr(Y_i = 1 \mid X_i = x_i) = \Pr(Y_i^* > 0 \mid X_i = x_i) = \Pr\left(\varepsilon_i > -x_i^\top \beta\right) = 1 - \Pr\left(\varepsilon_i \leq -x_i^\top \beta\right) = 1 - \Pr\left(\frac{\varepsilon_i}{\sigma} \leq -x_i^\top \frac{\beta}{\sigma}\right)$$

If we denote by G(.) the cdf associated to the distribution of ε_i/σ, since this distribution is symmetric around 0, then we have:

$$\Pr(Y_i = 1 \mid X_i = x_i) = G\left(x_i^\top \frac{\beta}{\sigma}\right)$$


4. Maximum Likelihood Estimator

Solution (cont'd): For θ = (β' σ²)', we have:

$$\ell_N(\theta; y \mid x) = \sum_{i=1}^{N} y_i \ln\left[G\left(x_i^\top \frac{\beta}{\sigma}\right)\right] + \sum_{i=1}^{N} (1 - y_i) \ln\left[1 - G\left(x_i^\top \frac{\beta}{\sigma}\right)\right]$$

This log-likelihood depends only on the ratio β/σ. So, for θ = (β' σ²)' and θ* = (kβ' kσ)' with k ≠ 1:

$$\ell_N(\theta; y \mid x) = \ell_N(\theta^*; y \mid x)$$

The parameters β and σ² cannot be identified. We can only identify the ratio β/σ.


4. Maximum Likelihood Estimator

Remark:

In this latent model, only the ratio β/σ can be identified, since:

$$\Pr(Y_i = 1 \mid X_i = x_i) = \Pr\left(\frac{\varepsilon_i}{\sigma} < x_i^\top \frac{\beta}{\sigma}\right) = G\left(x_i^\top \frac{\beta}{\sigma}\right)$$

The choice of a logit or probit model implies a normalisation on the variance of ε_i/σ, and then on σ²:

$$\text{probit:} \quad \Pr(Y_i = 1 \mid X_i = x_i) = \Phi\left(x_i^\top \tilde{\beta}\right) \quad \text{with } \tilde{\beta} = \beta/\sigma, \quad \mathbb{V}\left(\frac{\varepsilon_i}{\sigma}\right) = 1$$


4. Maximum Likelihood Estimator

Definition (Maximum Likelihood Estimator). A maximum likelihood estimator $\hat{\theta}$ of θ ∈ Θ is a solution to the maximization problem:

$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell_N(\theta; y \mid x)$$

or, equivalently,

$$\hat{\theta} = \arg\max_{\theta \in \Theta} L_N(\theta; y \mid x)$$


4. Maximum Likelihood Estimator

Remarks

1. Do not confuse the maximum likelihood estimator $\hat{\theta}$ (which is a random variable) and the maximum likelihood estimate $\hat{\theta}(x)$, which corresponds to the realisation of $\hat{\theta}$ on the sample x.

2. Generally, it is easier to maximise the log-likelihood than the likelihood (especially for the distributions that belong to the exponential family).

3. When we consider an unconditional likelihood, the MLE is defined by:

$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell_N(\theta; x)$$


4. Maximum Likelihood Estimator

Definition (Likelihood equations). Under suitable regularity conditions, a maximum likelihood estimator (MLE) of θ is defined to be the solution of the first-order conditions (FOC):

$$\left. \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat{\theta}} = \underset{(K,1)}{0} \quad \text{or} \quad \left. \frac{\partial L_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat{\theta}} = \underset{(K,1)}{0}$$

These conditions are generally called the likelihood or log-likelihood equations.


4. Maximum Likelihood Estimator

Notations

The first derivative (gradient) of the (conditional) log-likelihood evaluated at the point $\hat{\theta}$ satisfies:

$$\left. \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat{\theta}} \equiv \frac{\partial \ell_N\left(\hat{\theta}; y \mid x\right)}{\partial \theta} = g\left(\hat{\theta}; y \mid x\right) = 0$$


4. Maximum Likelihood Estimator

Remark

The log-likelihood equations correspond to a linear/nonlinear system of K equations with K unknown parameters θ_1, .., θ_K:

$$\left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta} \right|_{\hat{\theta}} = \begin{pmatrix} \left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_1} \right|_{\hat{\theta}} \\ \vdots \\ \left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_K} \right|_{\hat{\theta}} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$


4. Maximum Likelihood Estimator

Definition (Second Order Conditions). Second order condition (SOC) of the likelihood maximisation problem: the Hessian matrix evaluated at $\hat{\theta}$ must be negative definite:

$$\left. \frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} \text{ is negative definite} \quad \text{or} \quad \left. \frac{\partial^2 L_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} \text{ is negative definite}$$


4. Maximum Likelihood Estimator

Remark:

The Hessian matrix (realisation) is a K × K matrix (arguments (θ; y | x) omitted inside the matrix for readability):

$$\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top} = \begin{pmatrix} \frac{\partial^2 \ell_N}{\partial \theta_1^2} & \frac{\partial^2 \ell_N}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 \ell_N}{\partial \theta_1 \partial \theta_K} \\ \frac{\partial^2 \ell_N}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 \ell_N}{\partial \theta_2^2} & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \ell_N}{\partial \theta_K \partial \theta_1} & \cdots & \cdots & \frac{\partial^2 \ell_N}{\partial \theta_K^2} \end{pmatrix}$$


4. Maximum Likelihood Estimator

Reminders

A negative definite matrix is a symmetric (Hermitian, if there are complex entries) matrix all of whose eigenvalues are negative.

The n × n Hermitian matrix M is said to be negative definite if:

$$x^\top M x < 0$$

for all non-zero x in ℝⁿ.


4. Maximum Likelihood Estimator

Example (MLE problem with one parameter). Let us consider a real-valued random variable X with a pdf given by:

$$f_X\left(x; \sigma^2\right) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty)$$

where σ² is an unknown parameter. Let us consider a sample {X_1, .., X_N} of i.i.d. random variables with the same arbitrary distribution as X.

Problem: What is the maximum likelihood estimator (MLE) of σ²?


4. Maximum Likelihood Estimator

Solution:

We have:

$$\ln f_X\left(x; \sigma^2\right) = -\frac{x^2}{2\sigma^2} + \ln(x) - \ln\left(\sigma^2\right)$$

So the log-likelihood of the sample {x_1, .., x_N} is:

$$\ell_N\left(\sigma^2; x\right) = \sum_{i=1}^{N} \ln f_X\left(x_i; \sigma^2\right) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} \ln(x_i) - N \ln\left(\sigma^2\right)$$


4. Maximum Likelihood Estimator

Solution (cont'd): The maximum likelihood estimator $\hat{\sigma}^2$ of σ² ∈ ℝ⁺ is a solution to the maximization problem:

$$\hat{\sigma}^2 = \arg\max_{\sigma^2 \in \mathbb{R}^+} \ell_N\left(\sigma^2; x\right) = \arg\max_{\sigma^2 \in \mathbb{R}^+} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} \ln(x_i) - N \ln\left(\sigma^2\right) \right)$$

$$\frac{\partial \ell_N\left(\sigma^2; x\right)}{\partial \sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^{N} x_i^2 - \frac{N}{\sigma^2}$$

FOC (log-likelihood equation):

$$\left. \frac{\partial \ell_N\left(\sigma^2; x\right)}{\partial \sigma^2} \right|_{\hat{\sigma}^2} = \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{N} x_i^2 - \frac{N}{\hat{\sigma}^2} = 0 \iff \hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} x_i^2$$


4. Maximum Likelihood Estimator

Solution (cont'd): Check that $\hat{\sigma}^2$ is a maximum:

$$\frac{\partial \ell_N\left(\sigma^2; x\right)}{\partial \sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^{N} x_i^2 - \frac{N}{\sigma^2} \qquad \frac{\partial^2 \ell_N\left(\sigma^2; x\right)}{\partial \sigma^4} = -\frac{1}{\sigma^6} \sum_{i=1}^{N} x_i^2 + \frac{N}{\sigma^4}$$

SOC:

$$\left. \frac{\partial^2 \ell_N\left(\sigma^2; x\right)}{\partial \sigma^4} \right|_{\hat{\sigma}^2} = -\frac{1}{\hat{\sigma}^6} \sum_{i=1}^{N} x_i^2 + \frac{N}{\hat{\sigma}^4} = -\frac{2N\hat{\sigma}^2}{\hat{\sigma}^6} + \frac{N}{\hat{\sigma}^4} \quad \text{since } \hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} x_i^2$$

$$= -\frac{N}{\hat{\sigma}^4} < 0$$


4. Maximum Likelihood Estimator

Conclusion:

The maximum likelihood estimator (MLE) of the parameter σ² is defined by:

$$\hat{\sigma}^2 = \frac{1}{2N} \sum_{i=1}^{N} X_i^2$$

The maximum likelihood estimate of the parameter σ² is equal to:

$$\hat{\sigma}^2(x) = \frac{1}{2N} \sum_{i=1}^{N} x_i^2$$
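As a sanity check (a sketch; this pdf is that of a Rayleigh distribution parameterized by σ², and the simulated sample is an assumption added here), compare the closed-form estimate with a numerical maximizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
sigma2_true = 2.0
x = rng.rayleigh(scale=np.sqrt(sigma2_true), size=5000)

def neg_loglik(s2):
    # -l_N(s2; x), dropping sum(ln x_i), which does not depend on s2
    return np.sum(x ** 2) / (2 * s2) + len(x) * np.log(s2)

print(minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded").x)
print(np.sum(x ** 2) / (2 * len(x)))  # closed-form MLE, ~2.0
```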


4. Maximum Likelihood Estimator

Example (Sample of normal variables). We consider an N.i.d.(m, σ²) sample {Y_1, .., Y_N}. Problem: what are the MLE of m and σ²?

Solution: Let us define θ = (m σ²)'. Then:

$$\hat{\theta} = \arg\max_{\sigma^2 \in \mathbb{R}^+, \, m \in \mathbb{R}} \ell_N(\theta; y)$$

with

$$\ell_N(\theta; y) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - m)^2$$


4. Maximum Likelihood Estimator

Solution (cont'd):

$$\ell_N(\theta; y) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - m)^2$$

The first derivative of the log-likelihood function is defined by:

$$\frac{\partial \ell_N(\theta; y)}{\partial \theta} = \begin{pmatrix} \frac{\partial \ell_N(\theta; y)}{\partial m} \\ \frac{\partial \ell_N(\theta; y)}{\partial \sigma^2} \end{pmatrix}$$

$$\frac{\partial \ell_N(\theta; y)}{\partial m} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - m) \qquad \frac{\partial \ell_N(\theta; y)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} (y_i - m)^2$$


4. Maximum Likelihood Estimator

Solution (cont'd): FOC (log-likelihood equations):

$$\left. \frac{\partial \ell_N(\theta; y)}{\partial \theta} \right|_{\hat{\theta}} = \begin{pmatrix} \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N} (y_i - \hat{m}) \\ -\frac{N}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{N} (y_i - \hat{m})^2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

So the MLE corresponds to the empirical mean and variance:

$$\hat{\theta} = \begin{pmatrix} \hat{m} \\ \hat{\sigma}^2 \end{pmatrix} \quad \text{with} \quad \hat{m} = \frac{1}{N} \sum_{i=1}^{N} Y_i \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \left(Y_i - \overline{Y}_N\right)^2$$


4. Maximum Likelihood Estimator

Solution (cont'd): From

$$\frac{\partial \ell_N(\theta; y)}{\partial m} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - m) \qquad \frac{\partial \ell_N(\theta; y)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} (y_i - m)^2$$

the Hessian matrix (realization) is:

$$\frac{\partial^2 \ell_N(\theta; y)}{\partial \theta \partial \theta^\top} = \begin{pmatrix} \frac{\partial^2 \ell_N(\theta; y)}{\partial m^2} & \frac{\partial^2 \ell_N(\theta; y)}{\partial m \partial \sigma^2} \\ \frac{\partial^2 \ell_N(\theta; y)}{\partial \sigma^2 \partial m} & \frac{\partial^2 \ell_N(\theta; y)}{\partial \sigma^4} \end{pmatrix} = \begin{pmatrix} -\frac{N}{\sigma^2} & -\frac{1}{\sigma^4} \sum_{i=1}^{N} (y_i - m) \\ -\frac{1}{\sigma^4} \sum_{i=1}^{N} (y_i - m) & \frac{N}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{N} (y_i - m)^2 \end{pmatrix}$$


4. Maximum Likelihood Estimator

Solution (cont'd): SOC:

$$\left. \frac{\partial^2 \ell_N(\theta; y)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} = \begin{pmatrix} -\frac{N}{\hat{\sigma}^2} & -\frac{1}{\hat{\sigma}^4} \sum_{i=1}^{N} (y_i - \hat{m}) \\ -\frac{1}{\hat{\sigma}^4} \sum_{i=1}^{N} (y_i - \hat{m}) & \frac{N}{2\hat{\sigma}^4} - \frac{1}{\hat{\sigma}^6} \sum_{i=1}^{N} (y_i - \hat{m})^2 \end{pmatrix} = \begin{pmatrix} -\frac{N}{\hat{\sigma}^2} & 0 \\ 0 & \frac{N}{2\hat{\sigma}^4} - \frac{N\hat{\sigma}^2}{\hat{\sigma}^6} \end{pmatrix}$$

since $N\hat{m} = \sum_{i=1}^{N} y_i$ and $N\hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{m})^2$. Hence:

$$\left. \frac{\partial^2 \ell_N(\theta; y)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} = \begin{pmatrix} -\frac{N}{\hat{\sigma}^2} & 0 \\ 0 & -\frac{N}{2\hat{\sigma}^4} \end{pmatrix} \quad \text{is negative definite.}$$


4. Maximum Likelihood Estimator

Example (Linear Regression Model). Consider the linear regression model:

$$y_i = x_i^\top \beta + \varepsilon_i$$

where x_i = (x_{i1} ... x_{iK})' and β = (β_1 .. β_K)' are K × 1 vectors. We assume that the ε_i are N.i.d.(0, σ²). Then the (conditional) log-likelihood of the observations (x_i, y_i) is given by:

$$\ell_N(\theta; y \mid x) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2$$

where θ = (β' σ²)' is a (K+1) × 1 vector. Question: what are the MLE of β and σ²?


4. Maximum Likelihood Estimator

Notation 1: The derivative of a scalar y with respect to a K × 1 vector x = (x_1 ... x_K)' is a K × 1 vector:

$$\frac{\partial y}{\partial x} = \begin{pmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_K} \end{pmatrix}$$

Notation 2: If x and β are two K × 1 vectors, then:

$$\frac{\partial \left(x^\top \beta\right)}{\partial \beta} = \underset{(K,1)}{x}$$


4. Maximum Likelihood Estimator

Solution:

$$\hat{\theta} = \arg\max_{\beta \in \mathbb{R}^K, \, \sigma^2 \in \mathbb{R}^+} \left( -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2 \right)$$

The first derivative of the log-likelihood function is a (K+1) × 1 vector:

$$\frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} = \begin{pmatrix} \frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta} \\ \frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta_1} \\ \vdots \\ \frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta_K} \\ \frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2} \end{pmatrix}$$


4. Maximum Likelihood Estimator

Solution (cont'd): The two blocks of the gradient are:

$$\underset{(K,1)}{\frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta}} = \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i \left(y_i - x_i^\top \beta\right) \qquad \underset{(1,1)}{\frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2}} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2$$


4. Maximum Likelihood Estimator

Solution (cont'd): FOC (log-likelihood equations):

$$\left. \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat{\theta}} = \begin{pmatrix} \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N} x_i \left(y_i - x_i^\top \hat{\beta}\right) \\ -\frac{N}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{N} \left(y_i - x_i^\top \hat{\beta}\right)^2 \end{pmatrix} = \begin{pmatrix} 0_K \\ 0 \end{pmatrix}$$

So the MLE is defined by:

$$\hat{\theta} = \begin{pmatrix} \hat{\beta} \\ \hat{\sigma}^2 \end{pmatrix} \quad \text{with} \quad \hat{\beta} = \left( \sum_{i=1}^{N} X_i X_i^\top \right)^{-1} \left( \sum_{i=1}^{N} X_i Y_i \right) \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \left(Y_i - X_i^\top \hat{\beta}\right)^2$$
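A compact numerical sketch (simulated data, an assumption for illustration): the MLE of β coincides with the OLS estimator (X'X)⁻¹X'y, and σ̂² is the average squared residual (dividing by N, not N − K):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=N)

# beta_hat = (sum_i x_i x_i')^{-1} (sum_i x_i y_i) = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N  # the MLE divides by N

print(beta_hat, sigma2_hat)     # close to beta_true and 1.5**2
```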


4. Maximum Likelihood Estimator

Solution (cont'd): The Hessian is a (K+1) × (K+1) matrix:

$$\underset{(K+1) \times (K+1)}{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top}} = \begin{pmatrix} \underset{(K \times K)}{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \beta \partial \beta^\top}} & \underset{(K \times 1)}{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \beta \partial \sigma^2}} \\ \underset{(1 \times K)}{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \sigma^2 \partial \beta^\top}} & \underset{(1 \times 1)}{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \sigma^4}} \end{pmatrix}$$


4. Maximum Likelihood Estimator

Solution (cont'd): From the gradient components

$$\frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta} = \frac{1}{\sigma^2} \sum_{i=1}^{N} x_i \left(y_i - x_i^\top \beta\right) \qquad \frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2$$

the Hessian matrix (realization) is equal to:

$$\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top} = \begin{pmatrix} -\frac{1}{\sigma^2} \sum_{i=1}^{N} x_i x_i^\top & -\frac{1}{\sigma^4} \sum_{i=1}^{N} x_i \left(y_i - x_i^\top \beta\right) \\ -\frac{1}{\sigma^4} \sum_{i=1}^{N} x_i^\top \left(y_i - x_i^\top \beta\right) & \frac{N}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2 \end{pmatrix}$$


4. Maximum Likelihood Estimator

Solution (cont'd): Second order conditions (SOC):

$$\left. \frac{\partial^2 \ell_N(\theta)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} = \begin{pmatrix} -\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N} x_i x_i^\top & -\frac{1}{\hat{\sigma}^4} \sum_{i=1}^{N} x_i \left(y_i - x_i^\top \hat{\beta}\right) \\ -\frac{1}{\hat{\sigma}^4} \sum_{i=1}^{N} x_i^\top \left(y_i - x_i^\top \hat{\beta}\right) & \frac{N}{2\hat{\sigma}^4} - \frac{1}{\hat{\sigma}^6} \sum_{i=1}^{N} \left(y_i - x_i^\top \hat{\beta}\right)^2 \end{pmatrix}$$

Since $\sum_{i=1}^{N} x_i^\top \left(y_i - x_i^\top \hat{\beta}\right) = 0$ (FOC) and $N\hat{\sigma}^2 = \sum_{i=1}^{N} \left(y_i - x_i^\top \hat{\beta}\right)^2$:

$$\left. \frac{\partial^2 \ell_N(\theta)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} = \begin{pmatrix} -\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N} x_i x_i^\top & 0 \\ 0 & \frac{N}{2\hat{\sigma}^4} - \frac{N\hat{\sigma}^2}{\hat{\sigma}^6} \end{pmatrix}$$


4. Maximum Likelihood Estimator

Solution (cont'd): Second order conditions (SOC):

$$\left. \frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \partial \theta^\top} \right|_{\hat{\theta}} = \begin{pmatrix} -\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{N} x_i x_i^\top & 0 \\ 0 & -\frac{N}{2\hat{\sigma}^4} \end{pmatrix} \quad \text{is negative definite}$$

Since $\sum_{i=1}^{N} x_i x_i^\top$ is positive definite (assumption), the Hessian matrix is negative definite and $\hat{\theta}$ is the MLE of the parameters θ.


4. Maximum Likelihood Estimator

Theorem (Equivariance or Invariance Principle). Under suitable regularity conditions, the maximum likelihood estimator of a function g(.) of the parameter θ is $g(\hat{\theta})$, where $\hat{\theta}$ is the maximum likelihood estimator of θ.


4. Maximum Likelihood Estimator

Invariance Principle

The MLE is invariant to one-to-one transformations of θ. Any transformation that is not one-to-one either renders the model inestimable if it is one-to-many, or imposes restrictions if it is many-to-one.

For the practitioner, this result is extremely useful. For example, when a parameter appears in a likelihood function in the form 1/θ, it is usually worthwhile to reparameterize the model in terms of γ = 1/θ.

Example: Olsen (1978) and the reparametrisation of the likelihood function of the Tobit model.


4. Maximum Likelihood Estimator

Example (Invariance Principle). Suppose that the normal log-likelihood in the previous example is parameterized in terms of the precision parameter, γ² = 1/σ². The log-likelihood

$$\ell_N\left(m, \sigma^2; y\right) = -\frac{N}{2}\ln\left(\sigma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - m)^2$$

becomes

$$\ell_N\left(m, \gamma^2; y\right) = \frac{N}{2}\ln\left(\gamma^2\right) - \frac{N}{2}\ln(2\pi) - \frac{\gamma^2}{2} \sum_{i=1}^{N} (y_i - m)^2$$


4. Maximum Likelihood Estimator

Example (Invariance Principle, cont'd). The MLE for m is clearly still $\overline{Y}_N$. But the likelihood equation for γ² is now:

$$\frac{\partial \ell_N\left(m, \gamma^2; y\right)}{\partial \gamma^2} = \frac{N}{2\gamma^2} - \frac{1}{2} \sum_{i=1}^{N} (y_i - m)^2$$

and the MLE for γ² is now defined by:

$$\hat{\gamma}^2 = \frac{N}{\sum_{i=1}^{N} \left(Y_i - \hat{m}\right)^2} = \frac{1}{\hat{\sigma}^2}$$

as expected.


Key Concepts

1. Identification.
2. Maximum likelihood estimator.
3. Maximum likelihood estimate.
4. Log-likelihood equations.
5. Equivariance or invariance principle.
6. Gradient vector and Hessian matrix (deterministic elements).


Section 5

Score, Hessian and Fisher Information


5. Score, Hessian and Fisher Information

Objectives

We aim at introducing the following concepts:

1. Score vector and gradient
2. Hessian matrix
3. Fisher information matrix of the sample
4. Fisher information matrix of one observation, for marginal and conditional distributions
5. Average Fisher information matrix of one observation


5. Score, Hessian and Fisher Information

Definition (Score Vector). The (conditional) score vector is a K × 1 vector defined by:

$$\underset{(K,1)}{s_N(\theta; Y \mid x)} \equiv s(\theta) = \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}$$


5. Score, Hessian and Fisher Information

Remarks:

The score s_N(θ; Y | x) is a vector of random elements, since it depends on the random variables Y_1, .., Y_N.

For an unconditional log-likelihood ℓ_N(θ; x), the score is denoted by s_N(θ; X) = ∂ℓ_N(θ; X)/∂θ.

The score is a K × 1 vector such that:

$$s_N(\theta; Y \mid x) = \begin{pmatrix} \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_K} \end{pmatrix}$$


5. Score, Hessian and Fisher Information

Corollary. By definition, the score vector satisfies:

$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = 0_K$$

where E_θ means the expectation with respect to the conditional distribution Y | X = x.


5. Score, Hessian and Fisher Information

Remark: If we consider a variable X with a pdf f_X(x; θ), ∀x ∈ ℝ, then E_θ(.) means the expectation with respect to the distribution of X:

$$\mathbb{E}_\theta\left(s_N(\theta; X)\right) = \int_{-\infty}^{\infty} s_N(\theta; x)\, f_X(x; \theta)\, dx = 0$$

Remark: If we consider a variable Y with a conditional pdf f_{Y|x}(y; θ), ∀y ∈ ℝ, then E_θ(.) means the expectation with respect to the distribution of Y | X = x:

$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = \int_{-\infty}^{\infty} s_N(\theta; y \mid x)\, f_{Y|x}(y; \theta)\, dy = 0$$


5. Score, Hessian and Fisher Information

Proof.
If we consider a variable $X$ with a pdf $f_X(x;\theta)$, $\forall x \in \mathbb{R}$, then:
$$\mathbb{E}_\theta\left(s_N(\theta; X)\right) = \int s_N(\theta; x)\, f_X(x;\theta)\, dx = N \int \frac{\partial \ln f_X(x;\theta)}{\partial\theta}\, f_X(x;\theta)\, dx$$
$$= N \int \frac{1}{f_X(x;\theta)}\, \frac{\partial f_X(x;\theta)}{\partial\theta}\, f_X(x;\theta)\, dx = N\, \frac{\partial}{\partial\theta} \int f_X(x;\theta)\, dx = N\, \frac{\partial 1}{\partial\theta} = 0$$
where the next-to-last step interchanges differentiation and integration, as permitted by the regularity conditions.

Example (Exponential Distribution)
Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d. positive random variables with $D_i \sim \mathrm{Exp}(\theta)$ and $\mathbb{E}(D_i) = \theta > 0$:
$$f_D(d;\theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$
$$\ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^N d_i$$
The score (scalar) is equal to:
$$s_N(\theta; D) = -\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^N D_i$$

Example (Exponential Distribution, cont'd)
By definition:
$$\mathbb{E}_\theta\left(s_N(\theta; D)\right) = \mathbb{E}_\theta\left(-\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^N D_i\right) = -\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^N \mathbb{E}_\theta(D_i) = -\frac{N}{\theta} + \frac{N\theta}{\theta^2} = 0 \quad \blacksquare$$
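Illustration (not part of the original slides): a minimal Monte Carlo sketch, assuming only numpy, that checks $\mathbb{E}_\theta(s_N(\theta_0; D)) = 0$ for this exponential example; the true value $\theta_0$, the sample size $N$ and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, N, R = 2.0, 100, 20_000  # true parameter, sample size, replications (arbitrary)

def score(theta, d):
    # score of the exponential sample: s_N(theta; d) = -N/theta + sum(d)/theta^2
    return -d.size / theta + d.sum() / theta**2

# average the score over many samples simulated at theta0
scores = [score(theta0, rng.exponential(theta0, N)) for _ in range(R)]
print(np.mean(scores))  # close to 0: E_theta[ s_N(theta0; D) ] = 0
```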


Example (Linear Regression Model)
Let us consider the previous linear regression model $y_i = x_i^\top\beta + \varepsilon_i$. The score is defined by:
$$s_N(\theta; Y \mid x) = \begin{pmatrix} \frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(Y_i - x_i^\top\beta\right) \\[1ex] -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \left(Y_i - x_i^\top\beta\right)^2 \end{pmatrix}$$
Then, we have:
$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = \mathbb{E}_\theta\begin{pmatrix} \frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(Y_i - x_i^\top\beta\right) \\[1ex] -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \left(Y_i - x_i^\top\beta\right)^2 \end{pmatrix}$$

Example (Linear Regression Model, cont'd)
We know that $\mathbb{E}_\theta(Y_i \mid x) = x_i^\top\beta$. So, we have:
$$\mathbb{E}_\theta\left(\frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(Y_i - x_i^\top\beta\right)\right) = \frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(\mathbb{E}_\theta(Y_i \mid x) - x_i^\top\beta\right) = \frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(x_i^\top\beta - x_i^\top\beta\right) = 0_K$$

Example (Linear Regression Model, cont'd)
$$\mathbb{E}_\theta\left(-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \left(Y_i - x_i^\top\beta\right)^2\right) = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \mathbb{E}_\theta\left(\left(Y_i - x_i^\top\beta\right)^2\right)$$
$$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \mathbb{E}_\theta\left(\left(Y_i - \mathbb{E}_\theta(Y_i \mid x)\right)^2\right) = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \mathbb{V}_\theta(Y_i \mid x) = -\frac{N}{2\sigma^2} + \frac{N\sigma^2}{2\sigma^4} = 0 \quad \blacksquare$$

Definition (Gradient)
The gradient vector associated to the log-likelihood function is a $K \times 1$ vector defined by:
$$\underset{(K,1)}{g_N(\theta; y \mid x)} \equiv g(\theta) = \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta}$$

Remarks

1 The gradient $g_N(\theta; y \mid x)$ is a vector of deterministic entries since it depends on the realisations $y_1, \ldots, y_N$.

2 For an unconditional log-likelihood, the gradient is defined by $g_N(\theta; x) = \partial\ell_N(\theta; x)/\partial\theta$.

3 The gradient is a $K \times 1$ vector such that:
$$g_N(\theta; y \mid x) = \begin{pmatrix} \partial\ell_N(\theta; y \mid x)/\partial\theta_1 \\ \vdots \\ \partial\ell_N(\theta; y \mid x)/\partial\theta_K \end{pmatrix}$$

Corollary
By definition of the FOC, the gradient vector satisfies:
$$g_N\left(\hat\theta; y \mid x\right) = 0_K$$
where $\hat\theta = \hat\theta(x)$ is the maximum likelihood estimate of $\theta$.

Example (Linear regression model)
In the linear regression model, the gradient associated to the log-likelihood function is defined to be:
$$g_N(\theta; y \mid x) = \begin{pmatrix} \frac{1}{\sigma^2}\sum_{i=1}^N x_i\left(y_i - x_i^\top\beta\right) \\[1ex] -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^N \left(y_i - x_i^\top\beta\right)^2 \end{pmatrix}$$
Given the FOC, we have:
$$g_N\left(\hat\theta; y \mid x\right) = \begin{pmatrix} \frac{1}{\hat\sigma^2}\sum_{i=1}^N x_i\left(y_i - x_i^\top\hat\beta\right) \\[1ex] -\frac{N}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\sum_{i=1}^N \left(y_i - x_i^\top\hat\beta\right)^2 \end{pmatrix} = \begin{pmatrix} 0_K \\ 0 \end{pmatrix}$$
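Illustration (not part of the original slides): a short numpy sketch, on simulated data with arbitrary design and parameter values, showing that the gradient evaluated at the ML estimates is numerically zero.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # x_i with a constant, K = 3
beta0, sigma2_0 = np.array([1.0, -0.5, 2.0]), 0.25
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=N)

# ML estimates: OLS for beta, mean squared residual for sigma^2
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = e @ e / N

# gradient of the Gaussian log-likelihood at (beta_hat, sigma2_hat)
g_beta = X.T @ e / sigma2_hat
g_sigma2 = -N / (2 * sigma2_hat) + e @ e / (2 * sigma2_hat**2)
print(g_beta, g_sigma2)  # both numerically zero: g_N(theta_hat; y | x) = 0
```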


Definition (Hessian Matrix)
The Hessian matrix (deterministic) is defined to be:
$$H_N(\theta; y \mid x) = \frac{\partial^2\ell_N(\theta; y \mid x)}{\partial\theta\,\partial\theta^\top}$$
Remark: the matrix $\partial^2\ell_N(\theta; Y \mid x)/\partial\theta\,\partial\theta^\top$ is also called the Hessian matrix, but do not confuse the two matrices $\partial^2\ell_N(\theta; Y \mid x)/\partial\theta\,\partial\theta^\top$ (random) and $\partial^2\ell_N(\theta; y \mid x)/\partial\theta\,\partial\theta^\top$ (deterministic).

Random variable versus constant:
- Score vector $\partial\ell_N(\theta; Y \mid x)/\partial\theta$ (random) versus gradient vector $\partial\ell_N(\theta; y \mid x)/\partial\theta$ (constant)
- Hessian matrix $\partial^2\ell_N(\theta; Y \mid x)/\partial\theta\,\partial\theta^\top$ (random) versus Hessian matrix $\partial^2\ell_N(\theta; y \mid x)/\partial\theta\,\partial\theta^\top$ (constant)

Definition (Fisher Information Matrix)
The (conditional) Fisher information matrix associated to the sample $\{Y_1, \ldots, Y_N\}$ is the variance-covariance matrix of the score vector:
$$\underbrace{\mathcal{I}_N(\theta)}_{K \times K} = \mathbb{V}_\theta\left(s_N(\theta; Y \mid x)\right)$$
or equivalently:
$$\mathcal{I}_N(\theta) = \mathbb{V}_\theta\left(\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\right)$$
where $\mathbb{V}_\theta$ means the variance with respect to the conditional distribution $Y \mid X$.

Corollary
Since by definition $\mathbb{E}_\theta(s_N(\theta; Y \mid x)) = 0$, an alternative definition of the Fisher information matrix of the sample $\{Y_1, \ldots, Y_N\}$ is:
$$\underbrace{\mathcal{I}_N(\theta)}_{K \times K} = \mathbb{E}_\theta\Big(\underbrace{s_N(\theta; Y \mid x)}_{K \times 1} \times \underbrace{s_N(\theta; Y \mid x)^\top}_{1 \times K}\Big)$$

Definition (Fisher Information Matrix)
The (conditional) Fisher information matrix of the sample $\{Y_1, \ldots, Y_N\}$ is also given by:
$$\mathcal{I}_N(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_N(\theta; Y \mid x)}{\partial\theta\,\partial\theta^\top}\right) = \mathbb{E}_\theta\left(-H_N(\theta; Y \mid x)\right)$$

Definition (Fisher Information Matrix, summary)
The (conditional) Fisher information matrix of the sample $\{Y_1, \ldots, Y_N\}$ can alternatively be defined by:
$$\mathcal{I}_N(\theta) = \mathbb{V}_\theta\left(s_N(\theta; Y \mid x)\right)$$
$$\mathcal{I}_N(\theta) = \mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\, s_N(\theta; Y \mid x)^\top\right)$$
$$\mathcal{I}_N(\theta) = \mathbb{E}_\theta\left(-H_N(\theta; Y \mid x)\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the mean and the variance with respect to the conditional distribution $Y \mid X$, and where $s_N(\theta; Y \mid x)$ denotes the score vector and $H_N(\theta; Y \mid x)$ the Hessian matrix.

Definition (Fisher Information Matrix, summary)
In terms of the log-likelihood derivatives, the (conditional) Fisher information matrix of the sample $\{Y_1, \ldots, Y_N\}$ can alternatively be defined by:
$$\mathcal{I}_N(\theta) = \mathbb{V}_\theta\left(\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\right)$$
$$\mathcal{I}_N(\theta) = \mathbb{E}_\theta\left(\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\left(\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\right)^\top\right)$$
$$\mathcal{I}_N(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_N(\theta; Y \mid x)}{\partial\theta\,\partial\theta^\top}\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the mean and the variance with respect to the conditional distribution $Y \mid X$.

Remarks

1 There are three equivalent definitions of the Fisher information matrix and, as a consequence, three different consistent estimators of the Fisher information matrix (see below).

2 The Fisher information matrix associated to the sample $\{Y_1, \ldots, Y_N\}$ can also be defined from the Fisher information matrix for observation $i$.

Definition (Fisher Information Matrix)
The (conditional) Fisher information matrix associated to the $i$th individual can be defined by:
$$\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(\frac{\partial\ell_i(\theta; Y_i \mid x_i)}{\partial\theta}\right)$$
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(\frac{\partial\ell_i(\theta; Y_i \mid x_i)}{\partial\theta}\,\frac{\partial\ell_i(\theta; Y_i \mid x_i)^\top}{\partial\theta}\right)$$
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_i(\theta; Y_i \mid x_i)}{\partial\theta\,\partial\theta^\top}\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the expectation and variance with respect to the true conditional distribution $Y_i \mid X_i$.

Definition (Fisher Information Matrix)
The (conditional) Fisher information matrix associated to the $i$th individual can alternatively be defined by:
$$\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; Y_i \mid x_i)\right)$$
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(s_i(\theta; Y_i \mid x_i)\, s_i(\theta; Y_i \mid x_i)^\top\right)$$
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the expectation and variance with respect to the true conditional distribution $Y_i \mid X_i$.

Theorem
The Fisher information matrix associated to the sample $\{Y_1, \ldots, Y_N\}$ is equal to the sum of the individual Fisher information matrices:
$$\mathcal{I}_N(\theta) = \sum_{i=1}^N \mathcal{I}_i(\theta)$$

Remarks:

1 In the case of a marginal log-likelihood, the Fisher information matrix associated to the variable $X_i$ is the same for all observations $i$:
$$\mathcal{I}_i(\theta) = \mathcal{I}(\theta) \quad \forall i = 1, \ldots, N$$

2 In the case of a conditional log-likelihood, the Fisher information matrix associated to the variable $Y_i$ given $X_i = x_i$ depends on the observation $i$:
$$\mathcal{I}_i(\theta) \neq \mathcal{I}_j(\theta) \quad \forall i \neq j$$

Example (Exponential marginal distribution)
Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d. positive random variables with $D_i \sim \mathrm{Exp}(\theta)$:
$$\mathbb{E}(D_i) = \theta \qquad \mathbb{V}(D_i) = \theta^2$$
$$f_D(d;\theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$
$$\ell_i(\theta; d_i) = -\ln(\theta) - \frac{d_i}{\theta}$$
Question: what is the Fisher information number (scalar) associated to $D_i$?

Solution
$$\ell_i(\theta; d_i) = -\ln(\theta) - \frac{d_i}{\theta}$$
The score of the observation $D_i$ is defined by:
$$s_i(\theta; D_i) = \frac{\partial\ell_i(\theta; D_i)}{\partial\theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$
Let us use the three definitions of the information quantity $\mathcal{I}_i(\theta)$:
$$\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; D_i)\right) = \mathbb{E}_\theta\left(s_i(\theta; D_i)^2\right) = \mathbb{E}_\theta\left(-H_i(\theta; D_i)\right)$$

Solution, cont'd
$$s_i(\theta; D_i) = \frac{\partial\ell_i(\theta; D_i)}{\partial\theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$
First definition:
$$\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; D_i)\right) = \mathbb{V}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = \frac{1}{\theta^4}\,\mathbb{V}_\theta(D_i) = \frac{1}{\theta^2}$$
Conclusion: $\mathcal{I}_i(\theta) = \mathcal{I}(\theta)$ does not depend on $i$.

Solution, cont'd
Second definition:
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(s_i(\theta; D_i)^2\right) = \mathbb{E}_\theta\left(\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right)^2\right) = \mathbb{V}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = \frac{1}{\theta^2}$$
since $\mathbb{E}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = 0$.
Conclusion: $\mathcal{I}_i(\theta) = \mathcal{I}(\theta)$ does not depend on $i$.

Solution, cont'd
$$H_i(\theta; D_i) = \frac{\partial^2\ell_i(\theta; D_i)}{\partial\theta^2} = \frac{1}{\theta^2} - \frac{2D_i}{\theta^3}$$
Third definition:
$$\mathcal{I}_i(\theta) = \mathbb{E}_\theta\left(-H_i(\theta; D_i)\right) = \mathbb{E}_\theta\left(-\left(\frac{1}{\theta^2} - \frac{2D_i}{\theta^3}\right)\right) = -\frac{1}{\theta^2} + \frac{2}{\theta^3}\,\mathbb{E}_\theta(D_i) = -\frac{1}{\theta^2} + \frac{2}{\theta^3}\,\theta = \frac{1}{\theta^2}$$
Conclusion: $\mathcal{I}_i(\theta) = \mathcal{I}(\theta)$ does not depend on $i$.

Example (Linear regression model)
We have shown that:
$$\frac{\partial^2\ell_i(\theta; Y_i \mid x_i)}{\partial\theta\,\partial\theta^\top} = \begin{pmatrix} -\frac{1}{\sigma^2}\underbrace{x_i}_{K \times 1}\underbrace{x_i^\top}_{1 \times K} & -\frac{1}{\sigma^4}x_i\left(Y_i - x_i^\top\beta\right) \\[1ex] -\frac{1}{\sigma^4}x_i^\top\left(Y_i - x_i^\top\beta\right) & \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}\left(Y_i - x_i^\top\beta\right)^2 \end{pmatrix}$$
Question: what is the Fisher information matrix associated to the observation $Y_i$?

Solution
The information matrix is then defined by:
$$\underbrace{\mathcal{I}_i(\theta)}_{(K+1) \times (K+1)} = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_i(\theta; Y_i \mid x_i)}{\partial\theta\,\partial\theta^\top}\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$
where $\mathbb{E}_\theta$ means the expectation with respect to the conditional distribution $Y_i \mid X_i = x_i$:
$$\mathcal{I}_i(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}x_i x_i^\top & \frac{1}{\sigma^4}x_i\left(\mathbb{E}_\theta(Y_i) - x_i^\top\beta\right) \\[1ex] \frac{1}{\sigma^4}x_i^\top\left(\mathbb{E}_\theta(Y_i) - x_i^\top\beta\right) & -\frac{1}{2\sigma^4} + \frac{1}{\sigma^6}\,\mathbb{E}_\theta\left(\left(Y_i - x_i^\top\beta\right)^2\right) \end{pmatrix}$$

Solution (cont'd)
Given that $\mathbb{E}_\theta(Y_i) = x_i^\top\beta$ and $\mathbb{E}_\theta\left(\left(Y_i - x_i^\top\beta\right)^2\right) = \sigma^2$, we have:
$$\mathcal{I}_i(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}x_i x_i^\top & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$
Conclusion: $\mathcal{I}_i(\theta)$ depends on $x_i$ and $\mathcal{I}_i(\theta) \neq \mathcal{I}_j(\theta)$ for $i \neq j$.

Definition (Average Fisher information matrix)
For a conditional model, the average Fisher information matrix for one observation is defined by:
$$\mathcal{I}(\theta) = \mathbb{E}_X\left(\mathcal{I}_i(\theta)\right)$$
where $\mathbb{E}_X$ denotes the expectation with respect to $X$ (the conditioning variable).

Summary: For a conditional model (and only for a conditional model), we have:
$$\mathcal{I}(\theta) = \mathbb{E}_X\left(\mathbb{V}_\theta\left(\frac{\partial\ell_i(\theta; Y_i \mid X_i)}{\partial\theta}\right)\right) = \mathbb{E}_X\left(\mathbb{V}_\theta\left(s_i(\theta; Y_i \mid X_i)\right)\right)$$
$$\mathcal{I}(\theta) = \mathbb{E}_X\mathbb{E}_\theta\left(\frac{\partial\ell_i(\theta; Y_i \mid X_i)}{\partial\theta}\,\frac{\partial\ell_i(\theta; Y_i \mid X_i)^\top}{\partial\theta}\right) = \mathbb{E}_X\mathbb{E}_\theta\left(s_i(\theta; Y_i \mid X_i)\, s_i(\theta; Y_i \mid X_i)^\top\right)$$
$$\mathcal{I}(\theta) = \mathbb{E}_X\mathbb{E}_\theta\left(-\frac{\partial^2\ell_i(\theta; Y_i \mid X_i)}{\partial\theta\,\partial\theta^\top}\right) = \mathbb{E}_X\mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid X_i)\right)$$

Summary: For a marginal distribution, we have:
$$\mathcal{I}(\theta) = \mathbb{V}_\theta\left(\frac{\partial\ell_i(\theta; Y_i)}{\partial\theta}\right) = \mathbb{V}_\theta\left(s_i(\theta; Y_i)\right)$$
$$\mathcal{I}(\theta) = \mathbb{E}_\theta\left(\frac{\partial\ell_i(\theta; Y_i)}{\partial\theta}\,\frac{\partial\ell_i(\theta; Y_i)^\top}{\partial\theta}\right) = \mathbb{E}_\theta\left(s_i(\theta; Y_i)\, s_i(\theta; Y_i)^\top\right)$$
$$\mathcal{I}(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_i(\theta; Y_i)}{\partial\theta\,\partial\theta^\top}\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i)\right)$$

Example (Linear Regression Model)
In the linear model, the individual Fisher information matrix is equal to:
$$\mathcal{I}_i(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}x_i x_i^\top & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$
and the average Fisher information matrix for one observation is defined by:
$$\mathcal{I}(\theta) = \mathbb{E}_X\left(\mathcal{I}_i(\theta)\right) = \begin{pmatrix} \frac{1}{\sigma^2}\,\mathbb{E}_X\left(X_i X_i^\top\right) & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$

Summary: in order to compute the average information matrix $\mathcal{I}(\theta)$ for one observation (see the sketch after this list):

Step 1: Compute the Hessian matrix or the score vector for one observation:
$$H_i(\theta; Y_i \mid x_i) = \frac{\partial^2\ell_i(\theta; Y_i \mid x_i)}{\partial\theta\,\partial\theta^\top} \qquad s_i(\theta; Y_i \mid x_i) = \frac{\partial\ell_i(\theta; Y_i \mid x_i)}{\partial\theta}$$

Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_i \mid X_i = x_i$:
$$\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; Y_i \mid x_i)\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$

Step 3: Take the expectation with respect to the conditioning variable $X$:
$$\mathcal{I}(\theta) = \mathbb{E}_X\left(\mathcal{I}_i(\theta)\right)$$

Theorem
In a sampling model (with i.i.d. observations), one has:
$$\mathcal{I}_N(\theta) = N\,\mathcal{I}(\theta)$$

Marginal distribution versus conditional distribution (model), for one observation:
- pdf: $f_{X_i}(\theta; x_i)$ versus $f_{Y_i \mid x_i}(\theta; y \mid x)$
- Score vector: $s_i(\theta; X_i)$ versus $s_i(\theta; Y_i \mid x_i)$
- Hessian matrix: $H_i(\theta; X_i)$ versus $H_i(\theta; Y_i \mid x_i)$
- Information matrix: $\mathcal{I}_i(\theta) = \mathcal{I}(\theta)$ versus $\mathcal{I}_i(\theta)$
- Average information matrix: $\mathcal{I}(\theta) = \mathcal{I}_i(\theta)$ versus $\mathcal{I}(\theta) = \mathbb{E}_X\left(\mathcal{I}_i(\theta)\right)$

with $\mathcal{I}_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; Y_i \mid x_i)\right) = \mathbb{E}_\theta\left(s_i(\theta; Y_i \mid x_i)\, s_i(\theta; Y_i \mid x_i)^\top\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$.

How to estimate the average Fisher information matrix?

This matrix is particularly important, since we will see that it corresponds to the asymptotic variance-covariance matrix of the MLE.

Assuming that we have a consistent estimator $\hat\theta$ of the parameter $\theta$, how can we estimate the average Fisher information matrix?

Definition (Estimators of the average Fisher Information Matrix)
If $\hat\theta$ converges in probability to $\theta_0$ (true value), then:
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N \widehat{\mathcal{I}}_i\left(\hat\theta\right)$$
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N \left(\left.\frac{\partial\ell_i(\theta; y_i \mid x_i)}{\partial\theta}\right|_{\hat\theta} \left.\frac{\partial\ell_i(\theta; y_i \mid x_i)}{\partial\theta}\right|_{\hat\theta}^\top\right)$$
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N \left(-\left.\frac{\partial^2\ell_i(\theta; y_i \mid x_i)}{\partial\theta\,\partial\theta^\top}\right|_{\hat\theta}\right)$$
are three consistent estimators of the average Fisher information matrix.

1 The first estimator corresponds to the average of the $N$ Fisher information matrices (for $Y_1, \ldots, Y_N$) evaluated at the estimated value $\hat\theta$. This estimator will rarely be available in practice.

2 The second estimator corresponds to the average of the outer products of the individual score vectors evaluated at $\hat\theta$. It is known as the BHHH (Berndt, Hall, Hall, and Hausman, 1974) estimator or OPG (outer product of gradients) estimator:
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N g_i\left(\hat\theta; y_i \mid x_i\right) g_i\left(\hat\theta; y_i \mid x_i\right)^\top$$

3 The third estimator corresponds to the opposite of the average of the Hessian matrices evaluated at $\hat\theta$:
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N \left(-H_i\left(\hat\theta; y_i \mid x_i\right)\right)$$

Problem
These three estimators are asymptotically equivalent, but they can give different results in finite samples. Available evidence suggests that in small or moderately sized samples the Hessian-based estimator is preferable (Greene, 2007). However, in most cases, the BHHH estimator will be the easiest to compute.
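Illustration (not part of the original slides): the three estimators for the exponential example, evaluated at $\hat\theta$ on one simulated sample (numpy, arbitrary $\theta_0$ and $N$). Note that here the first and third estimators coincide exactly because of the FOC $\hat\theta = \bar{D}$.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, N = 2.0, 200
d = rng.exponential(theta0, N)
theta_hat = d.mean()                            # MLE of theta

s_i = -1 / theta_hat + d / theta_hat**2         # individual scores at theta_hat
h_i = 1 / theta_hat**2 - 2 * d / theta_hat**3   # individual Hessians at theta_hat

I1 = 1 / theta_hat**2   # average of the individual information matrices
I2 = np.mean(s_i**2)    # BHHH / OPG estimator
I3 = np.mean(-h_i)      # minus the average Hessian (= I1 here, by the FOC)
print(I1, I2, I3)       # asymptotically equivalent, not identical in finite samples
```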


Example (CAPM)
The empirical analogue of the CAPM is given by:
$$\widetilde{r}_{it} = \alpha_i + \beta_i\,\widetilde{r}_{mt} + \varepsilon_t$$
where $\widetilde{r}_{it} = r_{it} - r_{ft}$ is the excess return of security $i$ at time $t$, $\widetilde{r}_{mt} = r_{mt} - r_{ft}$ is the market excess return at time $t$, and $\varepsilon_t$ is an i.i.d. error term with:
$$\mathbb{E}(\varepsilon_t) = 0 \qquad \mathbb{V}(\varepsilon_t) = \sigma^2 \qquad \mathbb{E}\left(\varepsilon_t \mid \widetilde{r}_{mt}\right) = 0$$

Example (CAPM, cont'd)
Data (data file: capm.xls): Microsoft, SP500 and Tbill (closing prices) from 11/1/1993 to 04/03/2003.

[Figure: scatter plot of the Microsoft excess return (RMSFT) against the SP500 excess return (RSP500), and time series of RSP500 and RMSFT.]

Example (CAPM, cont'd)
We consider the CAPM model rewritten as follows:
$$\widetilde{r}_{it} = x_t^\top\beta + \varepsilon_t \qquad t = 1, \ldots, T$$
where $x_t = (1 \;\; \widetilde{r}_{mt})^\top$ is a $2 \times 1$ vector of random variables, $\theta = \left(\alpha_i : \beta_i : \sigma^2\right)^\top = \left(\beta^\top : \sigma^2\right)^\top$ is a $3 \times 1$ vector of parameters, and where the error term $\varepsilon_t$ satisfies $\mathbb{E}(\varepsilon_t) = 0$, $\mathbb{V}(\varepsilon_t) = \sigma^2$ and $\mathbb{E}\left(\varepsilon_t \mid \widetilde{r}_{mt}\right) = 0$.

Example (CAPM, cont'd)
Question: Compute three alternative estimators of the asymptotic variance-covariance matrix of the MLE $\hat\theta = \left(\hat\alpha_i \;\; \hat\beta_i \;\; \hat\sigma^2\right)^\top$, with:
$$\hat\beta = \begin{pmatrix}\hat\alpha_i \\ \hat\beta_i\end{pmatrix} = \left(\sum_{t=1}^T x_t x_t^\top\right)^{-1}\left(\sum_{t=1}^T x_t\,\widetilde{r}_{it}\right)$$
$$\hat\sigma^2 = \frac{1}{T}\sum_{t=1}^T\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right)^2$$

Solution
The ML estimator is defined by:
$$\hat\theta = \arg\max_{\beta\in\mathbb{R}^2,\,\sigma^2\in\mathbb{R}^+}\; -\frac{T}{2}\ln\left(\sigma^2\right) - \frac{T}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{t=1}^T\left(\widetilde{r}_{it} - x_t^\top\beta\right)^2$$
The problem is regular, so we have:
$$\sqrt{T}\left(\hat\theta - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right)$$
or equivalently $\hat\theta \overset{asy}{\sim} \mathcal{N}\left(\theta_0, \frac{1}{T}\mathcal{I}^{-1}(\theta_0)\right)$. The asymptotic variance-covariance matrix of $\hat\theta$ is:
$$\mathbb{V}\left(\hat\theta\right) = \frac{1}{T}\mathcal{I}^{-1}(\theta_0)$$

Solution (cont'd)
First estimator: The information matrix at time $t$ is defined by (third definition):
$$\mathcal{I}_t(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2\ell_t\left(\theta; \widetilde{R}_{it} \mid x_t\right)}{\partial\theta\,\partial\theta^\top}\right) = \mathbb{E}_\theta\left(-H_t\left(\theta; \widetilde{R}_{it} \mid x_t\right)\right)$$
where $\mathbb{E}_\theta$ means the expectation with respect to the conditional distribution $\widetilde{R}_{it} \mid X_t = x_t$:
$$\mathcal{I}_t(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}x_t x_t^\top & \frac{1}{\sigma^4}x_t\left(\mathbb{E}_\theta\left(\widetilde{R}_{it}\right) - x_t^\top\beta\right) \\[1ex] \frac{1}{\sigma^4}x_t^\top\left(\mathbb{E}_\theta\left(\widetilde{R}_{it}\right) - x_t^\top\beta\right) & -\frac{1}{2\sigma^4} + \frac{1}{\sigma^6}\,\mathbb{E}_\theta\left(\left(\widetilde{R}_{it} - x_t^\top\beta\right)^2\right) \end{pmatrix}$$

Solution (cont'd)
First estimator: Given that $\mathbb{E}_\theta\left(\widetilde{R}_{it}\right) = x_t^\top\beta$ and $\mathbb{E}_\theta\left(\left(\widetilde{R}_{it} - x_t^\top\beta\right)^2\right) = \sigma^2$, we have:
$$\mathcal{I}_t(\theta) = \begin{pmatrix} \frac{1}{\sigma^2}x_t x_t^\top & 0_{2\times 1} \\ 0_{1\times 2} & \frac{1}{2\sigma^4} \end{pmatrix}$$

Solution (cont'd)
First estimator: An estimator of the asymptotic variance-covariance matrix of $\hat\theta$ is then given by:
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right)$$
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T \mathcal{I}_t\left(\hat\theta\right) = \begin{pmatrix} \frac{1}{T\hat\sigma^2}\sum_{t=1}^T x_t x_t^\top & 0_{2\times 1} \\ 0_{1\times 2} & \frac{1}{2\hat\sigma^4} \end{pmatrix}$$

Solution (cont'd)
Second definition (BHHH):
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right) \qquad \widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T\left(\left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta} \left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta}^\top\right)$$
with
$$\left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta} = \begin{pmatrix} \frac{1}{\hat\sigma^2}x_t\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right) \\[1ex] -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right)^2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\hat\sigma^2}x_t\hat\varepsilon_t \\[1ex] -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2 \end{pmatrix}$$

Solution (cont'd)
Second definition (BHHH):
$$\left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta} \left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta}^\top = \begin{pmatrix} \frac{1}{\hat\sigma^2}x_t\hat\varepsilon_t \\[1ex] -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2 \end{pmatrix}\begin{pmatrix} \frac{1}{\hat\sigma^2}x_t^\top\hat\varepsilon_t & -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2 \end{pmatrix}$$
$$= \begin{pmatrix} \frac{1}{\hat\sigma^4}x_t x_t^\top\hat\varepsilon_t^2 & \frac{1}{\hat\sigma^2}x_t\hat\varepsilon_t\left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right) \\[1ex] \frac{1}{\hat\sigma^2}x_t^\top\hat\varepsilon_t\left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right) & \left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right)^2 \end{pmatrix}$$

Solution (cont'd)
Second definition (BHHH): so we have
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right)$$
with
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T \begin{pmatrix} \frac{1}{\hat\sigma^4}x_t x_t^\top\hat\varepsilon_t^2 & \frac{1}{\hat\sigma^2}x_t\hat\varepsilon_t\left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right) \\[1ex] \frac{1}{\hat\sigma^2}x_t^\top\hat\varepsilon_t\left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right) & \left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\hat\varepsilon_t^2\right)^2 \end{pmatrix}$$

Solution (cont'd)
Third definition (inverse of the Hessian): we know that
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right) \qquad \widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T\left(-H_t\left(\hat\theta; \widetilde{r}_{it} \mid x_t\right)\right)$$
$$H_t\left(\hat\theta; \widetilde{r}_{it} \mid x_t\right) = \begin{pmatrix} -\frac{1}{\hat\sigma^2}x_t x_t^\top & -\frac{1}{\hat\sigma^4}x_t\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right) \\[1ex] -\frac{1}{\hat\sigma^4}x_t^\top\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right) & \frac{1}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6}\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right)^2 \end{pmatrix}$$

Solution (cont'd)
Third definition (inverse of the Hessian): Given the FOC (log-likelihood equations), $\sum_{t=1}^T x_t\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right) = 0$ and $\sum_{t=1}^T\left(\widetilde{r}_{it} - x_t^\top\hat\beta\right)^2 = T\hat\sigma^2$. So:
$$\sum_{t=1}^T H_t\left(\hat\theta; \widetilde{r}_{it} \mid x_t\right) = \begin{pmatrix} -\frac{1}{\hat\sigma^2}\sum_{t=1}^T x_t x_t^\top & 0_{2\times 1} \\ 0_{1\times 2} & -\frac{T}{2\hat\sigma^4} \end{pmatrix}$$

Solution (cont'd)
Third definition (inverse of the Hessian): So, in this case, the third estimator of $\widehat{\mathcal{I}}\left(\hat\theta\right)$ coincides with the first one:
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right) \qquad \widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T\left(-H_t\left(\hat\theta; \widetilde{r}_{it} \mid x_t\right)\right) = \begin{pmatrix} \frac{1}{T\hat\sigma^2}\sum_{t=1}^T x_t x_t^\top & 0_{2\times 1} \\ 0_{1\times 2} & \frac{1}{2\hat\sigma^4} \end{pmatrix}$$

Solution (cont'd)
These three estimates of the asymptotic variance-covariance matrix are asymptotically equivalent, but can be quite different in finite samples:
$$\widehat{\mathbb{V}}_{asy}\left(\hat\theta\right) = \frac{1}{T}\widehat{\mathcal{I}}^{-1}\left(\hat\theta\right)$$
with
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T \mathcal{I}_t\left(\hat\theta\right)$$
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T\left(\left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta} \left.\frac{\partial\ell_t(\theta; \widetilde{r}_{it} \mid x_t)}{\partial\theta}\right|_{\hat\theta}^\top\right)$$
$$\widehat{\mathcal{I}}\left(\hat\theta\right) = \frac{1}{T}\sum_{t=1}^T\left(-H_t\left(\hat\theta; \widetilde{r}_{it} \mid x_t\right)\right)$$
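Illustration (not part of the original slides): the capm.xls data are not reproduced here, so the numpy sketch below uses simulated CAPM-style data with arbitrary parameter values to compute the analytical/Hessian estimator and the BHHH estimator of the asymptotic standard errors.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 300
x = np.column_stack([np.ones(T), rng.normal(0, 0.02, T)])  # x_t = (1, market excess return)'
r = x @ np.array([0.001, 1.2]) + rng.normal(0, 0.03, T)    # simulated security excess returns

b = np.linalg.solve(x.T @ x, x.T @ r)   # ML = OLS for (alpha_i, beta_i)
e = r - x @ b
s2 = e @ e / T                          # ML estimate of sigma^2

# estimator 1 (= estimator 3 here): block-diagonal analytical information
I1 = np.zeros((3, 3))
I1[:2, :2] = (x.T @ x) / (T * s2)
I1[2, 2] = 1 / (2 * s2**2)

# estimator 2: BHHH, the average outer product of the individual gradients
g = np.column_stack([x * (e / s2)[:, None], -1 / (2 * s2) + e**2 / (2 * s2**2)])
I2 = g.T @ g / T

for I in (I1, I2):
    print(np.sqrt(np.diag(np.linalg.inv(I) / T)))  # asymptotic std errors of theta_hat
```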


Key Concepts

1 Gradient and Hessian Matrix (deterministic elements).

2 Score Vector (random elements).

3 Hessian Matrix (random elements).

4 Fisher information matrix associated to the sample.

5 (Average) Fisher information matrix for one observation.


Section 6

Properties of Maximum Likelihood Estimators


6. Properties of Maximum Likelihood Estimators

Objectives

Is the MLE a good estimator? Under which conditions is the MLE unbiased, consistent, and the BUE (Best Unbiased Estimator)? => regularity conditions

Is the MLE consistent?

Is the MLE optimal or efficient?

What is the asymptotic distribution of the MLE? The magic of the MLE...

Definition (Regularity conditions)
Greene (2007) identifies three regularity conditions:

R1. The first three derivatives of $\ln f_X(\theta; x_i)$ with respect to $\theta$ are continuous and finite for almost all $x_i$ and for all $\theta$. This condition ensures the existence of a certain Taylor series approximation and the finite variance of the derivatives of $\ell_i(\theta; x_i)$.

R2. The conditions necessary to obtain the expectations of the first and second derivatives of $\ln f_X(\theta; X_i)$ are met.

R3. For all values of $\theta$, $\left|\partial^3\ln f_X(\theta; x_i)/\partial\theta_i\,\partial\theta_j\,\partial\theta_k\right|$ is less than a function that has a finite expectation. This condition allows us to truncate the Taylor series.

Definition (Regularity conditions, Zivot 2001)
A pdf $f_X(\theta; x)$ is regular if and only if:

R1. The support of the random variable $X$, $S_X = \{x : f_X(\theta; x) > 0\}$, does not depend on $\theta$.

R2. $f_X(\theta; x)$ is at least three times differentiable with respect to $\theta$, and these derivatives are continuous.

R3. The true value of $\theta$ lies in a compact set $\Theta$.

Under these regularity conditions, the maximum likelihood estimator $\hat\theta$ possesses many appealing properties:

1 The maximum likelihood estimator is consistent.

2 The maximum likelihood estimator is asymptotically normal (the magic of the MLE...).

3 The maximum likelihood estimator is asymptotically optimal or efficient.

4 The maximum likelihood estimator is equivariant: if $\hat\theta$ is an estimator of $\theta$ then $g(\hat\theta)$ is an estimator of $g(\theta)$.

Theorem (Consistency)
Under regularity conditions, the maximum likelihood estimator is consistent:
$$\hat\theta \xrightarrow[N\to\infty]{p} \theta_0$$
or equivalently:
$$\operatorname*{plim}_{N\to\infty} \hat\theta = \theta_0$$
where $\theta_0$ denotes the true value of the parameter $\theta$.

Sketch of the proof (Greene, 2007)
Because $\hat\theta$ is the MLE, in any finite sample, for any $\theta \neq \hat\theta$ (including the true $\theta_0$) it must be true that:
$$\ln L_N\left(\hat\theta; y \mid x\right) \geq \ln L_N(\theta; y \mid x)$$
Consider, then, the random variable $L_N(\theta; Y \mid x)/L_N(\theta_0; Y \mid x)$. Because the log function is strictly concave, from Jensen's inequality we have:
$$\mathbb{E}_\theta\left(\ln\left(\frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)}\right)\right) \leq \ln\left(\mathbb{E}_\theta\left(\frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)}\right)\right)$$

Sketch of the proof, cont'd
The expectation on the right-hand side is exactly equal to one, as:
$$\mathbb{E}_\theta\left(\frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)}\right) = \int \left(\frac{L_N(\theta; y \mid x)}{L_N(\theta_0; y \mid x)}\right) L_N(\theta_0; y \mid x)\, dy = \int L_N(\theta; y \mid x)\, dy = 1$$
which is simply the integral of a joint density.

Sketch of the proof, cont'd
So we have:
$$\mathbb{E}_\theta\left(\ln\left(\frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)}\right)\right) \leq \ln\left(\mathbb{E}_\theta\left(\frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)}\right)\right) = \ln(1) = 0$$
Divide both sides of this inequality by $N$ to produce:
$$\mathbb{E}_\theta\left(\frac{1}{N}\ln L_N(\theta; Y \mid x)\right) \leq \mathbb{E}_\theta\left(\frac{1}{N}\ln L_N(\theta_0; Y \mid x)\right)$$
This produces a central result:

Theorem (Likelihood Inequality)
The expected value of the log-likelihood is maximized at the true value of the parameters. For any $\theta$, including $\hat\theta$:
$$\mathbb{E}_\theta\left(\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i)\right) \geq \mathbb{E}_\theta\left(\frac{1}{N}\ell_N(\theta; Y_i \mid x_i)\right)$$
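Illustration (not part of the original slides): the likelihood inequality can be visualised for the exponential example by estimating $\mathbb{E}_\theta\left(N^{-1}\ell_N(\theta)\right)$ on a grid of $\theta$ values (a hedged numpy sketch with arbitrary values).

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, N, R = 2.0, 50, 5_000
samples = rng.exponential(theta0, size=(R, N))

# Monte Carlo estimate of E[(1/N) ell_N(theta; D)] = -ln(theta) - E(D_bar)/theta
grid = np.linspace(0.5, 5.0, 500)
expected = -np.log(grid) - samples.mean() / grid
print(grid[np.argmax(expected)])  # ~ theta0 = 2: the expected log-likelihood peaks at the truth
```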

Sketch of the proof, cont'd
Notice that:
$$\frac{1}{N}\ell_N(\theta; Y_i \mid x_i) = \frac{1}{N}\sum_{i=1}^N \ell_i(\theta; Y_i \mid x_i)$$
where the elements $\ell_i(\theta; Y_i \mid x_i)$ for $i = 1, \ldots, N$ are i.i.d. So, using a law of large numbers, we get:
$$\frac{1}{N}\ell_N(\theta; Y_i \mid x_i) \xrightarrow[N\to\infty]{p} \mathbb{E}_\theta\left(\frac{1}{N}\ell_N(\theta; Y_i \mid x_i)\right)$$

Sketch of the proof, cont'd
The likelihood inequality for $\theta = \hat\theta$ implies:
$$\mathbb{E}_\theta\left(\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i)\right) \geq \mathbb{E}_\theta\left(\frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right)\right)$$
with
$$\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i) \xrightarrow[N\to\infty]{p} \mathbb{E}_\theta\left(\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i)\right) \qquad \frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right) \xrightarrow[N\to\infty]{p} \mathbb{E}_\theta\left(\frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right)\right)$$
and thus:
$$\lim_{N\to\infty} \Pr\left(\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i) \geq \frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right)\right) = 1$$

Sketch of the proof, cont'd
So we have two results:
$$\lim_{N\to\infty} \Pr\left(\frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i) \geq \frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right)\right) = 1$$
$$\frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right) \geq \frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i) \quad \forall N$$
It necessarily implies that:
$$\frac{1}{N}\ell_N\left(\hat\theta; Y_i \mid x_i\right) \xrightarrow[N\to\infty]{p} \frac{1}{N}\ell_N(\theta_0; Y_i \mid x_i)$$
If $\theta$ is a scalar, we have immediately:
$$\hat\theta \xrightarrow[N\to\infty]{p} \theta_0$$
For the more general case with $\dim(\theta) = K$, see a formal proof in Amemiya (1985).

Amemiya T. (1985), Advanced Econometrics. Harvard University Press.

Remark
The proof of the consistency of the MLE is much easier when we have an explicit expression for the maximum likelihood estimator $\hat\theta$:
$$\hat\theta = \hat\theta(X_1, \ldots, X_N)$$

Example
Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d. positive random variables with $D_i \sim \mathrm{Exp}(\theta_0)$, with:
$$f_D(d;\theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$
$$\mathbb{E}_\theta(D_i) = \theta_0 \qquad \mathbb{V}_\theta(D_i) = \theta_0^2$$
where $\theta_0$ is the true value of $\theta$. Question: show that the MLE is consistent.

Solution
The log-likelihood function associated to the sample $\{d_1, \ldots, d_N\}$ is defined by:
$$\ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^N d_i$$
We admit that the maximum likelihood estimator corresponds to the sample mean:
$$\hat\theta = \frac{1}{N}\sum_{i=1}^N D_i$$

Solution, cont'd
Then, we have:
$$\mathbb{E}_\theta\left(\hat\theta\right) = \frac{1}{N}\sum_{i=1}^N \mathbb{E}_\theta(D_i) = \theta \quad (\hat\theta \text{ is unbiased})$$
$$\mathbb{V}_\theta\left(\hat\theta\right) = \frac{1}{N^2}\sum_{i=1}^N \mathbb{V}_\theta(D_i) = \frac{\theta^2}{N}$$
As a consequence:
$$\mathbb{E}_\theta\left(\hat\theta\right) = \theta \qquad \lim_{N\to\infty} \mathbb{V}_\theta\left(\hat\theta\right) = 0$$
and, since convergence in mean square implies convergence in probability:
$$\hat\theta \xrightarrow[N\to\infty]{p} \theta$$
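Illustration (not part of the original slides): a minimal numpy simulation of this consistency result, with arbitrary $\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(7)
theta0 = 2.0
for N in (10, 100, 1_000, 10_000, 100_000):
    print(N, rng.exponential(theta0, N).mean())  # the MLE collapses onto theta0 as N grows
```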

Lemma
Under stronger conditions, the maximum likelihood estimator converges almost surely to $\theta_0$:
$$\hat\theta \xrightarrow[N\to\infty]{a.s.} \theta_0 \implies \hat\theta \xrightarrow[N\to\infty]{p} \theta_0$$

1 If we restrict ourselves to the class of unbiased estimators (linear and nonlinear), then we define the best estimator as the one with the smallest variance.

2 With linear estimators (next chapter), the Gauss-Markov theorem tells us that the ordinary least squares (OLS) estimator is best (BLUE).

3 When we expand the class of estimators to include linear and nonlinear estimators, it turns out that we can establish an absolute lower bound on the variance of any unbiased estimator $\hat\theta$ of $\theta$ under certain conditions.

4 Then, if an unbiased estimator $\hat\theta$ has a variance that is equal to the lower bound, we have found the best unbiased estimator (BUE).

Definition (Cramer-Rao or FDCR bound)
Let $X_1, \ldots, X_N$ be an i.i.d. sample with pdf $f_X(\theta; x)$. Let $\hat\theta$ be an unbiased estimator of $\theta$, i.e. $\mathbb{E}_\theta\left(\hat\theta\right) = \theta$. If $f_X(\theta; x)$ is regular, then:
$$\mathbb{V}_\theta\left(\hat\theta\right) \geq \mathcal{I}_N^{-1}(\theta_0) \quad \text{(FDCR or Cramer-Rao bound)}$$
where $\mathcal{I}_N(\theta_0)$ denotes the Fisher information number for the sample evaluated at the true value $\theta_0$.

Remarks

1 Hence, the Cramer-Rao bound is the inverse of the information matrix associated to the sample. Reminder: there are three definitions of $\mathcal{I}_N(\theta_0)$:
$$\mathcal{I}_N(\theta_0) = \mathbb{V}_\theta\left(\left.\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\right|_{\theta_0}\right)$$
$$\mathcal{I}_N(\theta_0) = \mathbb{E}_\theta\left(\left.\frac{\partial\ell_N(\theta; Y \mid x)}{\partial\theta}\right|_{\theta_0} \left.\frac{\partial\ell_N(\theta; Y \mid x)^\top}{\partial\theta}\right|_{\theta_0}\right)$$
$$\mathcal{I}_N(\theta_0) = \mathbb{E}_\theta\left(-\left.\frac{\partial^2\ell_N(\theta; Y \mid x)}{\partial\theta\,\partial\theta^\top}\right|_{\theta_0}\right)$$

2 If $\theta$ is a vector, then $\mathbb{V}_\theta\left(\hat\theta\right) \geq \mathcal{I}_N^{-1}(\theta_0)$ means that $\mathbb{V}_\theta\left(\hat\theta\right) - \mathcal{I}_N^{-1}(\theta_0)$ is positive semi-definite.

Theorem (Efficiency)
Under regularity conditions, the maximum likelihood estimator is asymptotically efficient and attains the FDCR (Frechet - Darmois - Cramer - Rao) or Cramer-Rao bound:
$$\mathbb{V}_\theta\left(\hat\theta\right) = \mathcal{I}_N^{-1}(\theta_0)$$
where $\mathcal{I}_N(\theta_0)$ denotes the Fisher information matrix associated to the sample evaluated at the true value $\theta_0$.

Example (Exponential Distribution)
Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d. positive random variables with $D_i \sim \mathrm{Exp}(\theta_0)$, with:
$$f_D(d;\theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$
$$\mathbb{E}_\theta(D_i) = \theta_0 \qquad \mathbb{V}_\theta(D_i) = \theta_0^2$$
where $\theta_0$ is the true value of $\theta$. Question: show that the MLE is efficient.

Solution
We have shown that the maximum likelihood estimator corresponds to the sample mean:
$$\hat\theta = \frac{1}{N}\sum_{i=1}^N D_i \qquad \mathbb{E}_\theta\left(\hat\theta\right) = \theta_0 \qquad \mathbb{V}_\theta\left(\hat\theta\right) = \frac{\theta_0^2}{N}$$

Solution, cont'd
The log-likelihood function is:
$$\ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^N d_i$$
The score vector is defined by:
$$s_N(\theta; D) = \frac{\partial\ell_N(\theta; D)}{\partial\theta} = -\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^N D_i$$

Solution, cont'd
Let us use one of the three definitions of the information quantity $\mathcal{I}_N(\theta)$:
$$\mathcal{I}_N(\theta) = \mathbb{V}_\theta\left(\frac{\partial\ell_N(\theta; D)}{\partial\theta}\right) = \mathbb{V}_\theta\left(-\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^N D_i\right) = \frac{1}{\theta^4}\sum_{i=1}^N \mathbb{V}_\theta(D_i) = \frac{N\theta^2}{\theta^4} = \frac{N}{\theta^2}$$
Then, $\hat\theta$ is efficient and attains the Cramer-Rao bound:
$$\mathbb{V}_\theta\left(\hat\theta\right) = \mathcal{I}_N^{-1}(\theta_0) = \frac{\theta_0^2}{N} \quad \blacksquare$$
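Illustration (not part of the original slides): the empirical variance of $\hat\theta$ over many replications can be compared with the Cramer-Rao bound $\theta_0^2/N$ (numpy sketch, arbitrary values).

```python
import numpy as np

rng = np.random.default_rng(8)
theta0, N, R = 2.0, 50, 100_000
theta_hat = rng.exponential(theta0, size=(R, N)).mean(axis=1)  # R replications of the MLE

print(theta_hat.var())  # empirical variance of theta_hat
print(theta0**2 / N)    # Cramer-Rao bound I_N^{-1}(theta0) = theta0^2 / N
```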

Theorem (Convergence of the MLE)
Under suitable regularity conditions, the MLE is asymptotically normally distributed:
$$\sqrt{N}\left(\hat\theta - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, \mathcal{I}^{-1}(\theta_0)\right)$$
where $\theta_0$ denotes the true value of the parameter and $\mathcal{I}(\theta_0)$ the (average) Fisher information matrix for one observation.

Corollary
Another way to write this result is to say that, for a large sample size $N$, the MLE $\hat\theta$ is approximately distributed according to a normal distribution:
$$\hat\theta \overset{asy}{\sim} \mathcal{N}\left(\theta_0, N^{-1}\mathcal{I}^{-1}(\theta_0)\right)$$
or equivalently:
$$\hat\theta \overset{asy}{\sim} \mathcal{N}\left(\theta_0, \mathcal{I}_N^{-1}(\theta_0)\right)$$
where $\mathcal{I}_N(\theta_0) = N\,\mathcal{I}(\theta_0)$ denotes the Fisher information matrix associated to the sample.

Definition (Asymptotic Variance)
The asymptotic variance of the MLE is defined by:
$$\mathbb{V}_{asy}\left(\hat\theta\right) = \mathcal{I}_N^{-1}(\theta_0)$$
where $\mathcal{I}_N(\theta_0)$ denotes the Fisher information matrix associated to the sample. This asymptotic variance of the MLE corresponds to the Cramer-Rao or FDCR bound.

The magic of the MLE
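Illustration (not part of the original slides): a numpy sketch of "the magic", simulating $\sqrt{N}(\hat\theta - \theta_0)$ for the exponential example and checking that it behaves like $\mathcal{N}(0, \theta_0^2)$.

```python
import numpy as np

rng = np.random.default_rng(9)
theta0, N, R = 2.0, 200, 50_000
theta_hat = rng.exponential(theta0, size=(R, N)).mean(axis=1)

z = np.sqrt(N) * (theta_hat - theta0)      # should be ~ N(0, I^{-1}(theta0)) = N(0, theta0^2)
print(z.mean(), z.var(), theta0**2)        # mean ~ 0, variance ~ theta0^2
print(np.mean(np.abs(z) > 1.96 * theta0))  # ~ 0.05, as for a Gaussian tail
```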


Proof (MLE convergence)

At the maximum likelihood estimator, the gradient of the log-likelihood (a $K \times 1$ vector) equals zero (FOC):

$$g_N\left(\hat{\theta}\right) \equiv g_N\left(\hat{\theta}; y \mid x\right) = \left. \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat{\theta}} = 0_K$$

where $\hat{\theta} = \hat{\theta}(x)$ denotes here the ML estimate. Expand this set of equations in a Taylor series around the true parameters $\theta_0$. We will use the mean value theorem to truncate the Taylor series at the second term:

$$g_N\left(\hat{\theta}\right) = g_N(\theta_0) + H_N\left(\bar{\theta}\right)\left(\hat{\theta} - \theta_0\right) = 0$$

The Hessian is evaluated at a point $\bar{\theta}$ that lies between $\hat{\theta}$ and $\theta_0$, for instance $\bar{\theta} = \omega\hat{\theta} + (1-\omega)\theta_0$ for some $0 < \omega < 1$.
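As a quick sanity check on the FOC (a sketch with assumed values, not part of the original slides; it relies on NumPy and SciPy), one can maximize the exponential log-likelihood numerically and verify that the maximizer coincides with the closed-form MLE, the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: solve the FOC g_N(theta) = 0 numerically for the exponential model.
rng = np.random.default_rng(6)
D = rng.exponential(scale=2.0, size=500)   # assumed true theta_0 = 2
N = len(D)

def neg_loglik(t):
    # -l_N(theta) = N*log(theta) + sum(D_i)/theta
    return N * np.log(t) + D.sum() / t

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, D.mean())   # the two values should agree closely
```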


Proof (MLE convergence, cont'd)

We then rearrange this equation and multiply the result by $\sqrt{N}$ to obtain:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) = \left(-H_N\left(\bar{\theta}\right)\right)^{-1}\left(\sqrt{N}\, g_N(\theta_0)\right)$$

By dividing $H_N\left(\bar{\theta}\right)$ and $g_N(\theta_0)$ by N, we obtain:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) = \left(-\frac{1}{N} H_N\left(\bar{\theta}\right)\right)^{-1}\left(\sqrt{N}\, \frac{1}{N}\, g_N(\theta_0)\right) = \left(-\frac{1}{N} H_N\left(\bar{\theta}\right)\right)^{-1}\left(\sqrt{N}\, \bar{g}(\theta_0)\right)$$

where $\bar{g}(\theta_0)$ denotes the sample mean of the individual gradient vectors:

$$\bar{g}(\theta_0) = \frac{1}{N}\, g_N(\theta_0) = \frac{1}{N}\sum_{i=1}^N g_i(\theta_0; y_i \mid x_i)$$


Proof (MLE convergence, cont'd)

Let us now consider the same expression in terms of random variables: $\hat{\theta}$ now denotes the ML estimator, $H_N\left(\bar{\theta}\right) = H_N\left(\bar{\theta}; Y \mid x\right)$, and $\bar{s}(\theta_0; Y \mid x)$ the mean score vector. We have:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) = \left(-\frac{1}{N} H_N\left(\bar{\theta}; Y \mid x\right)\right)^{-1}\left(\sqrt{N}\, \bar{s}(\theta_0; Y \mid x)\right)$$

where the score vectors associated to the variables $Y_i$ are i.i.d.:

$$\bar{s}(\theta_0; Y \mid x) = \frac{1}{N}\sum_{i=1}^N s_i(\theta_0; Y_i \mid x_i)$$


Proof (MLE convergence, cont'd)

Let us consider the first element:

$$\bar{s}(\theta_0) = \frac{1}{N}\sum_{i=1}^N s_i(\theta_0; Y_i \mid x_i)$$

The individual scores $s_i(\theta_0; Y_i \mid x_i)$ are i.i.d. with:

$$E_\theta\left(s_i(\theta_0; Y_i \mid x_i)\right) = 0 \qquad E_X V_\theta\left(s_i(\theta_0; Y_i \mid x_i)\right) = E_X\left(I_i(\theta_0)\right) = I(\theta_0)$$

By using the Lindeberg-Levy central limit theorem, we have:

$$\sqrt{N}\, \bar{s}(\theta_0) \xrightarrow{d} \mathcal{N}\left(0, I(\theta_0)\right)$$


Proof (MLE convergence, cont'd)

We know that:

$$-\frac{1}{N} H_N\left(\bar{\theta}; Y \mid x\right) = -\frac{1}{N}\sum_{i=1}^N H_i\left(\bar{\theta}; Y_i \mid x_i\right)$$

where the Hessian matrices $H_i\left(\bar{\theta}; Y_i \mid x_i\right)$ are i.i.d. Besides, because $\operatorname{plim}\left(\hat{\theta} - \theta_0\right) = 0$, $\operatorname{plim}\left(\bar{\theta} - \theta_0\right) = 0$ as well. By applying a law of large numbers, we get:

$$-\frac{1}{N} H_N\left(\bar{\theta}; Y \mid x\right) \xrightarrow{p} E_X E_\theta\left(-H_i(\theta_0; Y_i \mid x_i)\right)$$

with:

$$E_X E_\theta\left(-H_i(\theta_0; Y_i \mid x_i)\right) = E_X E_\theta\left(-\frac{\partial^2 \ell_i(\theta_0; Y_i \mid x_i)}{\partial \theta\, \partial \theta^\top}\right) = I(\theta_0)$$
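This law-of-large-numbers step is easy to check numerically. The sketch below (illustrative values only, not from the slides) averages the individual Hessians of the exponential model at $\theta_0$ and compares the result to $I(\theta_0) = 1/\theta_0^2$:

```python
import numpy as np

# Sketch: for the exponential model, l_i(theta) = -log(theta) - D_i/theta,
# so H_i = 1/theta^2 - 2*D_i/theta^3. Check that -(1/N)*H_N -> I(theta_0).
rng = np.random.default_rng(1)
theta0, N = 2.0, 200_000                   # assumed true value, large N
D = rng.exponential(scale=theta0, size=N)

H_i = 1.0 / theta0**2 - 2.0 * D / theta0**3
print(-H_i.mean())       # close to I(theta0) = 1/theta0**2 = 0.25
print(1.0 / theta0**2)
```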


Reminder:

If $X_N$ (a $K \times K$ matrix) and $Y_N$ (a $K \times 1$ vector) verify:

$$X_N \xrightarrow{p} X \qquad Y_N \xrightarrow{d} \mathcal{N}\left(0, \Sigma\right)$$

then:

$$X_N Y_N \xrightarrow{d} \mathcal{N}\left(0,\; X\, \Sigma\, X^\top\right)$$


Proof (MLE convergence, cont'd)

Here we have:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) = \left(-\frac{1}{N} H_N\left(\bar{\theta}; Y \mid x\right)\right)^{-1}\left(\sqrt{N}\, \bar{s}(\theta_0; Y \mid x)\right)$$

with:

$$\left(-\frac{1}{N} H_N\left(\bar{\theta}; Y \mid x\right)\right)^{-1} \xrightarrow{p} I^{-1}(\theta_0) \;\; \text{(a symmetric matrix)} \qquad \sqrt{N}\, \bar{s}(\theta_0) \xrightarrow{d} \mathcal{N}\left(0, I(\theta_0)\right)$$

Then, we get:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0,\; I^{-1}(\theta_0)\, I(\theta_0)\, I^{-1}(\theta_0)\right)$$


Proof (MLE convergence, cont'd)

And finally:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(\theta_0)\right)$$

The magic of the MLE...


Example (Exponential Distribution)
Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d. positive random variables with $D_i \sim \operatorname{Exp}(\theta_0)$, with:

$$f_D(d; \theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$

$$E_\theta(D_i) = \theta_0 \qquad V_\theta(D_i) = \theta_0^2$$

where $\theta_0$ is the true value of $\theta$. Question: what is the asymptotic distribution of the MLE? Propose a consistent estimator of the asymptotic variance of $\hat{\theta}$.


Solution

We have shown that $\hat{\theta} = (1/N)\sum_{i=1}^N D_i$ and:

$$s_i(\theta; D_i) = \frac{\partial \ell_i(\theta; D_i)}{\partial \theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$

The (average) Fisher information matrix associated to $D_i$ is:

$$I(\theta) = V_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = \frac{1}{\theta^4}\, V_\theta(D_i) = \frac{1}{\theta^2}$$

Then, the asymptotic distribution of $\hat{\theta}$ is:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, \theta_0^2\right)$$

or equivalently:

$$\hat{\theta} \overset{asy}{\sim} \mathcal{N}\left(\theta_0, \frac{\theta_0^2}{N}\right)$$
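This asymptotic distribution can be checked by Monte Carlo simulation. A minimal sketch (illustrative parameter values and seed, not part of the original slides):

```python
import numpy as np

# Sketch: draw R exponential samples, compute the MLE for each, and compare
# the moments of sqrt(N)*(theta_hat - theta_0) to the asymptotic N(0, theta_0^2).
rng = np.random.default_rng(2)
theta0, N, R = 2.0, 500, 10_000

theta_hat = rng.exponential(scale=theta0, size=(R, N)).mean(axis=1)  # MLE = sample mean
z = np.sqrt(N) * (theta_hat - theta0)

print(z.mean(), z.var())   # close to 0 and theta0**2 = 4.0
```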


Solution, cont'd

The asymptotic variance of $\hat{\theta}$ is:

$$V_{asy}\left(\hat{\theta}\right) = \frac{\theta_0^2}{N}$$

A consistent estimator of $V_{asy}\left(\hat{\theta}\right)$ is simply defined by:

$$\widehat{V}_{asy}\left(\hat{\theta}\right) = \frac{\hat{\theta}^2}{N} \quad \blacksquare$$
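In practice, this estimator delivers approximate standard errors and confidence intervals for $\hat{\theta}$. A minimal sketch (illustrative values; the 1.96 critical value gives an approximate 95% interval):

```python
import numpy as np

# Sketch: the MLE, its estimated asymptotic variance theta_hat^2 / N,
# and an approximate 95% confidence interval.
rng = np.random.default_rng(3)
theta0, N = 2.0, 500
D = rng.exponential(scale=theta0, size=N)

theta_hat = D.mean()                    # MLE of theta
se = np.sqrt(theta_hat**2 / N)          # estimated asymptotic standard error
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(theta_hat, se, ci)                # the interval should cover theta0 ~95% of the time
```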


Example (Linear Regression Model)

Let us consider the previous linear regression model $y_i = x_i^\top \beta + \varepsilon_i$, with $\varepsilon_i \sim \mathcal{N}.i.d.\left(0, \sigma^2\right)$. Let us denote by $\theta$ the $(K+1) \times 1$ vector defined by $\theta = \left(\beta^\top \;\; \sigma^2\right)^\top$. The ML estimator of $\theta$ is defined by:

$$\hat{\theta} = \begin{pmatrix} \hat{\beta} \\ \hat{\sigma}^2 \end{pmatrix} \qquad \hat{\beta} = \left(\sum_{i=1}^N X_i X_i^\top\right)^{-1}\left(\sum_{i=1}^N X_i Y_i\right) \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N\left(Y_i - X_i^\top \hat{\beta}\right)^2$$

Question: what is the asymptotic distribution of $\hat{\theta}$? Propose an estimator of the asymptotic variance.


Solution

This model satisfies the regularity conditions. We have shown that the average Fisher information matrix is equal to:

$$I(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2}\, E_X\left(X_i X_i^\top\right) & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}$$

From the MLE convergence theorem, we get immediately:

$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, I^{-1}(\theta_0)\right)$$

where $\theta_0$ is the true value of $\theta$.


Solution, cont'd

The asymptotic variance-covariance matrix of $\hat{\theta}$ is equal to:

$$V_{asy}\left(\hat{\theta}\right) = N^{-1} I^{-1}(\theta_0) = I_N^{-1}(\theta_0)$$

with:

$$I_N(\theta) = \begin{pmatrix} \dfrac{N}{\sigma^2}\, E_X\left(X_i X_i^\top\right) & 0 \\ 0 & \dfrac{N}{2\sigma^4} \end{pmatrix}$$


Solution, cont'd

A consistent estimator of $I_N(\theta)$ is:

$$\widehat{I}_N(\theta) = \widehat{V}_{asy}^{-1}\left(\hat{\theta}\right) = \begin{pmatrix} \dfrac{N}{\hat{\sigma}^2}\, \widehat{Q}_X & 0 \\ 0 & \dfrac{N}{2\hat{\sigma}^4} \end{pmatrix} \qquad \text{with } \widehat{Q}_X = \frac{1}{N}\sum_{i=1}^N x_i x_i^\top$$


Solution, cont'd

Thus we get:

$$\hat{\beta} \overset{asy}{\sim} \mathcal{N}\left(\beta_0,\; \hat{\sigma}^2\left(\sum_{i=1}^N x_i x_i^\top\right)^{-1}\right) \qquad \hat{\sigma}^2 \overset{asy}{\sim} \mathcal{N}\left(\sigma_0^2,\; \frac{2\hat{\sigma}^4}{N}\right)$$
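These formulas are straightforward to apply on data. The sketch below (simulated data with assumed design, parameter values, and seed; not from the original slides) computes the MLE and the estimated asymptotic standard errors:

```python
import numpy as np

# Sketch: MLE of (beta, sigma^2) in the Gaussian linear model and the
# estimated asymptotic standard errors derived above.
rng = np.random.default_rng(4)
N, K = 1_000, 3
beta0, sigma2_0 = np.array([1.0, -0.5, 2.0]), 0.5

X = rng.normal(size=(N, K))
y = X @ beta0 + rng.normal(scale=np.sqrt(sigma2_0), size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # OLS = MLE of beta
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / N                  # MLE of sigma^2 (divisor N, not N-K)

se_beta = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
se_sigma2 = np.sqrt(2.0 * sigma2_hat**2 / N)
print(beta_hat, se_beta, sigma2_hat, se_sigma2)
```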


Summary

Under regularity conditions:

1. The MLE is consistent.
2. The MLE is asymptotically efficient and its variance attains the FDCR or Cramer-Rao bound.
3. The MLE is asymptotically normally distributed.


But finite sample properties can be very different from large sample properties:

1. The maximum likelihood estimator is consistent but can be severely biased in finite samples.
2. The estimated variance-covariance matrix can be seriously unreliable in finite samples.


Theorem (Equivariance)

Under regularity conditions, if $g(\cdot)$ is a continuously differentiable function of $\theta$ defined from $\mathbb{R}^K$ to $\mathbb{R}^P$, then:

$$g\left(\hat{\theta}\right) \xrightarrow{p} g(\theta_0) \qquad \sqrt{N}\left(g\left(\hat{\theta}\right) - g(\theta_0)\right) \xrightarrow{d} \mathcal{N}\left(0,\; G(\theta_0)\, I^{-1}(\theta_0)\, G(\theta_0)^\top\right)$$

where $\theta_0$ is the true value of the parameters and the $P \times K$ matrix $G(\theta_0)$ is defined by:

$$G(\theta) = \frac{\partial g(\theta)}{\partial \theta^\top}$$
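As an illustration of this delta-method result, consider the exponential example with $g(\theta) = 1/\theta$ (the rate), so that $G(\theta) = -1/\theta^2$ and the asymptotic variance of $g(\hat{\theta})$ is $G(\theta_0)\, I^{-1}(\theta_0)\, G(\theta_0)^\top / N = 1/(\theta_0^2 N)$. A sketch with assumed values (not from the original slides):

```python
import numpy as np

# Sketch: Monte Carlo check of the delta method for g(theta) = 1/theta
# in the exponential model.
rng = np.random.default_rng(5)
theta0, N, R = 2.0, 500, 10_000

theta_hat = rng.exponential(scale=theta0, size=(R, N)).mean(axis=1)  # MLE per sample
g_hat = 1.0 / theta_hat

print(g_hat.var())               # close to G * I^{-1} * G / N = 1 / (theta0**2 * N)
print(1.0 / (theta0**2 * N))     # = 5e-4 with these values
```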


Key Concepts of Chapter 2

1. Likelihood and log-likelihood function
2. Maximum likelihood estimator (MLE) and maximum likelihood estimate
3. Gradient and Hessian matrix (deterministic elements)
4. Score vector and Hessian matrix (random elements)
5. Fisher information matrix associated to the sample
6. (Average) Fisher information matrix for one observation
7. FDCR or Cramer-Rao bound: the notion of efficiency
8. Asymptotic distribution of the MLE
9. Asymptotic variance of the MLE
10. Estimator of the asymptotic variance of the MLE


End of Chapter 2

Christophe Hurlin (University of Orléans)
