Hessian Matrices in Statistics

Ferris Jumah, David Schlueter, Matt Vance

MTH 327 Final Project

December 7, 2011

Topic Introduction

Today we are going to talk about:
- Introduce the Hessian matrix
- Brief description of relevant statistics
- Maximum Likelihood Estimation (MLE)
- Fisher information and applications

The Hessian Matrix

Recall the Hessian matrix:

H(f) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}  (1)
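As a quick illustration (ours, not from the slides), the matrix in equation (1) can be approximated numerically with central finite differences; the test function and step size below are arbitrary choices.

```python
# Numerical Hessian of f : R^n -> R via central finite differences.
def hessian(f, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpm = list(x); xmp = list(x); xmm = list(x)
            xpp[i] += h; xpp[j] += h   # x + h e_i + h e_j
            xpm[i] += h; xpm[j] -= h   # x + h e_i - h e_j
            xmp[i] -= h; xmp[j] += h   # x - h e_i + h e_j
            xmm[i] -= h; xmm[j] -= h   # x - h e_i - h e_j
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# Example: f(x, y) = x^2 + 3xy, whose exact Hessian is [[2, 3], [3, 0]].
H = hessian(lambda v: v[0]**2 + 3 * v[0] * v[1], [1.0, 2.0])
```

For a quadratic f the central difference is exact up to floating-point rounding, which makes this a convenient sanity check.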

Statistics: Some things to recall

Now, let's talk a bit about inferential statistics.
- Parameters
- Random variables. Definition: a random variable X is a function X : \Omega \to \mathbb{R}
- Each r.v. follows a distribution that has an associated probability function f(x|\theta). E.g.,

f(x|\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]  (2)

- What is a random sample? X_1, \ldots, X_n i.i.d.; the outputs of these r.v.s are our sample data.
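The density in equation (2) is a direct transcription away from code; a minimal standard-library sketch (our own, parameterized by the variance σ²):

```python
import math

# Density of N(mu, sigma^2), transcribed from equation (2).
def normal_pdf(x, mu, sigma2):
    return (1.0 / math.sqrt(2 * math.pi * sigma2)) * \
        math.exp(-(x - mu)**2 / (2 * sigma2))

# At x = mu the density attains its maximum, 1 / sqrt(2*pi*sigma^2).
p = normal_pdf(0.0, 0.0, 1.0)  # standard normal evaluated at its mode
```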

Stats cont.

Estimators (\hat{\theta}) of population parameters
- Definition: an estimator is often a formula for calculating an estimate of a parameter \theta from sample data
- There are many estimators, but which is the best?
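To make the idea concrete, here is a small sketch (our own example; the data are arbitrary) of two textbook estimators computed from sample data:

```python
# The sample mean estimates mu; the (unbiased) sample variance
# estimates sigma^2. Both are formulas applied to the sample data.
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m = sample_mean(data)      # 5.0
v = sample_variance(data)  # 32/7
```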

Maximum Likelihood Estimation (MLE)

Key concept: maximum likelihood estimation
- GOAL: to determine the best estimate of a parameter \theta from a sample

Likelihood function
- We obtain a data vector x = (x_1, \ldots, x_n)
- Since the random sample is i.i.d., we express the probability of our observed data given \theta as

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1|\theta) \cdot f(x_2|\theta) \cdots f(x_n|\theta)  (3)

f_n(x|\theta) = \prod_{i=1}^{n} f(x_i|\theta)  (4)

- Implication of maximizing the likelihood function
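In practice one maximizes the logarithm of the product in (4), turning it into a sum. A minimal sketch (ours, assuming a normal model with known σ² = 1): the log-likelihood is largest at the closed-form MLE, the sample mean.

```python
import math

# Log of equation (4) for an i.i.d. N(mu, 1) sample.
def log_likelihood(mu, data):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu)**2
               for x in data)

data = [1.2, 0.8, 1.5, 0.9, 1.1]
mle = sum(data) / len(data)  # for this model the MLE of mu is the sample mean
```

Comparing `log_likelihood(mle, data)` against nearby candidate values of μ confirms that the sample mean is the maximizer.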

Example of MLE

Example: Gaussian (normal) linear regression
- Recall least squares regression
- We wish to determine the weight vector w
- The likelihood function is given by

P(y|x, w) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{\sum_i (y_i - w^T x_i)^2}{2\sigma^2}\right]  (5)

- We need to minimize

\sum_{i=1}^{n} (y_i - w^T x_i)^2 = (y - Aw)^T (y - Aw)  (6)

where A is the design matrix of our data.

Example of MLE cont.

Following the standard optimization procedure, we compute the gradient of the sum of squares S:

\nabla S = -A^T y + A^T A w  (7)

Notice the linear combination of the weights and the columns of A^T A. Setting the gradient to zero, our resulting critical point is

w = (A^T A)^{-1} A^T y,  (8)

which we recognize as the normal equations!
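A minimal numerical check of (8) (our own sketch, using NumPy; the data are an exactly linear toy set): solving the normal equations recovers the generating weights.

```python
import numpy as np

# Design matrix A: a column of ones (intercept) plus one feature.
x = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x  # exactly linear data, so w should be (2, 3)

# Normal equations, w = (A^T A)^{-1} A^T y, as in equation (8).
w = np.linalg.solve(A.T @ A, A.T @ y)
```

Note the design choice: `np.linalg.solve` on the system (A^T A) w = A^T y is preferable to forming the explicit inverse, which is slower and less numerically stable.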

Computing the Hessian Matrix

We compute the Hessian in order to show that this critical point is a minimum. Writing Aw as a linear combination of the columns of A,

\frac{\partial}{\partial w_k}(Aw) = \frac{\partial}{\partial w_k}\left[ w_1 \begin{pmatrix} x_{1,1} \\ \vdots \\ x_{n,1} \end{pmatrix} + \cdots + w_k \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix} + \cdots + w_n \begin{pmatrix} x_{1,n} \\ \vdots \\ x_{n,n} \end{pmatrix} \right] = \begin{pmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{pmatrix}

so the k-th column of the Hessian of S, \partial/\partial w_k \, \nabla S, is A^T times the k-th column of A. Therefore,

H = A^T A  (9)

which is positive semi-definite. Therefore our critical point minimizes the sum of squares, and our estimate for w maximizes the likelihood function.
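The positive semi-definiteness of H = A^T A in (9) is easy to verify numerically (our own sketch; the random design matrix is arbitrary): its eigenvalues are non-negative, and the quadratic form w^T H w = ||Aw||^2 cannot be negative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # an arbitrary "design matrix"
H = A.T @ A                      # the Hessian from equation (9)

eigvals = np.linalg.eigvalsh(H)  # eigvalsh exploits the symmetry of H
w = rng.standard_normal(3)
quad = w @ H @ w                 # w^T H w = ||A w||^2 >= 0
```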

MLE cont.

Advantages and disadvantages
- Larger samples give better estimates: as n \to \infty, \hat{\theta}_n \to \theta
- Other advantages
- Disadvantages: uniqueness, existence, reliance upon distributional fit
- This raises the question: how much information about a parameter can be gathered from sample data?

Fisher Information

Key concept: Fisher information
- We determine the amount of information about a parameter from a sample using the Fisher information, defined by

I(\theta) = -E\left[\frac{\partial^2 \ln f(x|\theta)}{\partial \theta^2}\right].  (10)

- Intuitive appeal: more data provides more information about the population parameter
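A sanity check of definition (10) (our own sketch): for the normal mean with σ² known, the second derivative of ln f with respect to µ is the constant −1/σ², so the expectation is trivial and I(µ) = 1/σ². A finite difference in µ reproduces this.

```python
import math

def log_pdf(x, mu, sigma2):
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu)**2 / (2 * sigma2)

def second_deriv_mu(x, mu, sigma2, h=1e-4):
    # Central finite difference of ln f in mu.
    return (log_pdf(x, mu + h, sigma2) - 2 * log_pdf(x, mu, sigma2)
            + log_pdf(x, mu - h, sigma2)) / h**2

sigma2 = 2.0
# d^2 ln f / d mu^2 = -1/sigma^2 regardless of x, so I(mu) = 1/sigma^2.
info = -second_deriv_mu(x=0.7, mu=0.0, sigma2=sigma2)
```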

Fisher information example

Example: finding the Fisher information for the normal distribution N(\mu, \sigma^2).

The log-likelihood function is

\ln f(x|\theta) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}  (11)

where the parameter vector \theta = (\mu, \sigma^2).

The gradient of the log-likelihood is

\left(\frac{\partial \ln f(x|\theta)}{\partial \mu}, \frac{\partial \ln f(x|\theta)}{\partial \sigma^2}\right) = \left(\frac{x-\mu}{\sigma^2}, \frac{(x-\mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right)  (12)
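The gradient in (12) can be cross-checked against central finite differences of (11) (our own sketch; the values of x, µ, and σ² are arbitrary):

```python
import math

def log_pdf(x, mu, sigma2):
    # Log-likelihood from equation (11).
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu)**2 / (2 * sigma2)

x, mu, sigma2, h = 1.3, 0.5, 2.0, 1e-6

# Analytic gradient from equation (12).
d_mu = (x - mu) / sigma2
d_s2 = (x - mu)**2 / (2 * sigma2**2) - 1 / (2 * sigma2)

# Central finite differences in mu and sigma^2 should agree closely.
fd_mu = (log_pdf(x, mu + h, sigma2) - log_pdf(x, mu - h, sigma2)) / (2 * h)
fd_s2 = (log_pdf(x, mu, sigma2 + h) - log_pdf(x, mu, sigma2 - h)) / (2 * h)
```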

Fisher information example continued

We now compute the Hessian matrix that will lead us to our Fisher information matrix:

\frac{\partial^2 \ln f(x|\theta)}{\partial \theta^2} =
\begin{bmatrix}
\frac{\partial^2 \ln f(x|\theta)}{\partial \mu^2} & \frac{\partial^2 \ln f(x|\theta)}{\partial \mu \, \partial \sigma^2} \\
\frac{\partial^2 \ln f(x|\theta)}{\partial \mu \, \partial \sigma^2} & \frac{\partial^2 \ln f(x|\theta)}{\partial (\sigma^2)^2}
\end{bmatrix}
=
\begin{bmatrix}
-\frac{1}{\sigma^2} & -\frac{x-\mu}{\sigma^4} \\
-\frac{x-\mu}{\sigma^4} & \frac{1}{2\sigma^4} - \frac{(x-\mu)^2}{\sigma^6}
\end{bmatrix}  (13)

We now compute our Fisher information matrix. Using E[x - \mu] = 0 and E[(x - \mu)^2] = \sigma^2, we see that

I(\theta) = -E\left(\frac{\partial^2 \ln f(x|\theta)}{\partial \theta^2}\right)  (14)

= \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{bmatrix}  (15)
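Taking the expectation in (14) needs only the moments E[x − µ] = 0 and E[(x − µ)²] = σ². Substituting these into the entries of (13) (our own sketch; σ² = 2 is an arbitrary choice) reproduces the matrix in (15):

```python
sigma2 = 2.0

# Expected Hessian entries from equation (13), with E[x - mu] = 0
# and E[(x - mu)^2] = sigma^2 substituted in.
e_mu_mu = -1 / sigma2                              # E[d^2/d mu^2]
e_mu_s2 = -0.0 / sigma2**2                         # -E[x - mu]/sigma^4 = 0
e_s2_s2 = 1 / (2 * sigma2**2) - sigma2 / sigma2**3 # = -1/(2 sigma^4)

# Fisher information matrix I(theta) = -E[Hessian], equation (15):
# diag(1/sigma^2, 1/(2 sigma^4)).
I = [[-e_mu_mu, -e_mu_s2],
     [-e_mu_s2, -e_s2_s2]]
```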

Applications of Fisher information

Fisher information is used in the calculation of...
- The lower bound of Var(\hat{\theta}) for an estimator \hat{\theta}, given by

Var(\hat{\theta}) \ge \frac{1}{I(\theta)}  (16)

- The Wald test: comparing a proposed value \theta_0 of \theta against the MLE \hat{\theta}. The test statistic is given by

W = \frac{\hat{\theta} - \theta_0}{s.e.(\hat{\theta})}  (17)

where

s.e.(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}  (18)
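A sketch of the Wald statistic (17)-(18) for the normal mean (our own example; the data and the assumption that σ² is known are ours, and for n i.i.d. observations the whole-sample information is I(µ̂) = n/σ²):

```python
import math

data = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 4.7]
sigma2 = 0.04                      # assume sigma^2 is known, for simplicity
theta0 = 5.0                       # hypothesized value of mu

theta_hat = sum(data) / len(data)  # MLE of mu: the sample mean
info = len(data) / sigma2          # Fisher information of the whole sample
se = 1 / math.sqrt(info)           # equation (18)
W = (theta_hat - theta0) / se      # equation (17)
```

|W| is then compared against a standard normal quantile, e.g. 1.96 at the 5% level.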