Charles University
FSV UK
Faculty of Social Sciences
Institute of Economic Studies

STAKAN III

Econometrics
Third Lecture
Jan Ámos Víšek
Tuesday, 14.00 – 15.20

http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010/
Schedule of today's talk

Recalling OLS and the definition of a linear estimator.
Discussion of the restrictions that linearity imposes on estimators and on models.
Proof of the theorem given at the end of the last lecture.
Definition of the best (linear unbiased) estimator.
Under normality of disturbances, OLS is the best unbiased estimator.
Ordinary Least Squares (odhad metodou nejmenších čtverců)

$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = \beta^0 + (X^TX)^{-1}X^T\varepsilon$

Definition
An estimator $\tilde\beta = \tilde\beta(Y,X) = LY$, where $L = L(X)$ is a $(p \times n)$ matrix, is called a linear estimator.

$\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ for $i \neq j$.
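To make the notation concrete, here is a minimal numerical sketch (simulated data; the sizes $n = 200$, $p = 3$ and the coefficient values are arbitrary choices, not from the lecture) showing that $\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY$ is the linear estimator $LY$ with $L = (X^TX)^{-1}X^T$ depending on $X$ only:

```python
import numpy as np

# Illustrative (made-up) data: Y = X beta0 + eps with standard normal noise.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, -2.0, 0.5])
Y = X @ beta0 + rng.normal(size=n)

L = np.linalg.solve(X.T @ X, X.T)   # L = (X^T X)^{-1} X^T, a (p x n) matrix
beta_hat = L @ Y                    # the linear estimator L Y

# The same result as the usual least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```

Note that $L$ is computed from $X$ alone, so $\hat\beta^{(OLS,n)}$ is linear in the response $Y$ in the sense of the definition above.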
Theorem

Assumptions: Let $\{\varepsilon_i\}_{i=1}^\infty$ be a sequence of r.v.'s with $E\varepsilon_i = 0$ and $E\varepsilon_i\varepsilon_j = \sigma^2\delta_{ij}$, $\sigma^2 > 0$.

Assertions: Then $\hat\beta^{(OLS,n)}$ is the best linear unbiased estimator.

Assumptions: If moreover $X^TX = O(n)$, $(X^TX)^{-1} = O(n^{-1})$ and the $\varepsilon_i$'s are independent,

Assertions: then $\hat\beta^{(OLS,n)}$ is consistent.

Assumptions: If further $\lim_{n\to\infty}\frac{1}{n}X^TX = Q$, $Q$ a regular matrix,

Assertions: then $\mathcal{L}\left(\sqrt{n}\,(\hat\beta^{(OLS,n)} - \beta^0)\right) \to N(0, \sigma^2Q^{-1})$, where $\lim_{n\to\infty}\mathrm{cov}\left\{\sqrt{n}\,(\hat\beta^{(OLS,n)} - \beta^0)\right\} = \sigma^2Q^{-1}$.

Proof
$\hat\beta^{(OLS,n)}$ is linear:
$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = LY$. Remember that we have denoted $L = (X^TX)^{-1}X^T$.

$\hat\beta^{(OLS,n)}$ is unbiased:
$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon$, hence $E\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^TE\varepsilon = \beta^0$.
Definition
The estimator $\hat\beta$ is the best one in a given class $G$ of estimators if for any other $\tilde\beta \in G$ the matrix $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}$ is positive semidefinite, i.e. for any $\lambda \in R^p$ we have $\lambda^T\left(\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}\right)\lambda \geq 0$.

Recalling that $\mathrm{cov}\{Z\} = E\{(Z - EZ)(Z - EZ)^T\}$, we obtain $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2LL^T$.
$\hat\beta^{(OLS,n)}$ is the best in the class of unbiased linear estimators:

Let $\tilde\beta = \bar{L}Y$ be any unbiased linear estimator. Unbiasedness for every $\beta^0 \in R^p$ gives $E\bar{L}Y = \bar{L}X\beta^0 = \beta^0$, i.e. $\bar{L}X = I$ (unit matrix). Then

$\mathrm{cov}\{\tilde\beta\} = E\{(\bar{L}Y - \beta^0)(\bar{L}Y - \beta^0)^T\} = E\{(\bar{L}Y - \bar{L}X\beta^0)(\bar{L}Y - \bar{L}X\beta^0)^T\} = \bar{L}\,E\{(Y - X\beta^0)(Y - X\beta^0)^T\}\,\bar{L}^T = \bar{L}\{\sigma^2I\}\bar{L}^T = \sigma^2\bar{L}\bar{L}^T,$

while $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2LL^T$. Moreover,

$\bar{L}L^T = \bar{L}X(X^TX)^{-1} = (X^TX)^{-1} = LL^T,$ and hence $(\bar{L} - L)L^T = 0$.

Therefore

$\bar{L}\bar{L}^T = \left(L + (\bar{L} - L)\right)\left(L + (\bar{L} - L)\right)^T = LL^T + (\bar{L} - L)(\bar{L} - L)^T,$

so that $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(\bar{L} - L)(\bar{L} - L)^T$, which is positive semidefinite.
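The decisive identity $\mathrm{cov}\{\bar LY\} - \mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(\bar L - L)(\bar L - L)^T \geq 0$ can be checked numerically. The sketch below uses an arbitrary simulated design; the competitor $\bar L = L + D$ with $DX = 0$ is just a generic way to build some other unbiased linear estimator:

```python
import numpy as np

# Hypothetical design matrix (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
L = np.linalg.solve(X.T @ X, X.T)                # OLS: L = (X^T X)^{-1} X^T

# Build Lbar = L + D with D X = 0, so Lbar X = I still holds (unbiasedness).
M = rng.normal(size=(p, n))
D = M - (M @ X) @ np.linalg.solve(X.T @ X, X.T)  # project rows of M off col(X)
Lbar = L + D
assert np.allclose(Lbar @ X, np.eye(p))

sigma2 = 2.0
diff = sigma2 * (Lbar @ Lbar.T - L @ L.T)        # covariance difference
eigvals = np.linalg.eigvalsh(diff)
print(eigvals.min() >= -1e-10)                   # positive semidefinite
```

All eigenvalues of the covariance difference are (numerically) nonnegative, exactly as the proof predicts.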
$\hat\beta^{(OLS,n)}$ is consistent:

$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon = \beta^0 + \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{n}X^T\varepsilon.$

Denote $Z^{(n)} = \frac{1}{n}X^T\varepsilon \in R^p$ and put $Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^nX_{ik}\varepsilon_i$. Then $E\{X_{ik}\varepsilon_i\} = 0$ and $\mathrm{var}\{X_{ik}\varepsilon_i\} = \sigma^2X_{ik}^2$.
Lemma – law of large numbers
Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover

$\frac{1}{n^2}\sum_{i=1}^n\mathrm{var}\{\xi_i\} \to 0 \quad \text{as } n \to \infty.$

Then $\frac{1}{n}\sum_{i=1}^n(\xi_i - \mu_i) \to 0$ in probability.

Proof: For any $\epsilon > 0$, the Chebyshev inequality gives

$P\left(\left|\frac{1}{n}\sum_{i=1}^n(\xi_i - \mu_i)\right| > \epsilon\right) \leq \frac{1}{\epsilon^2n^2}\sum_{i=1}^n\mathrm{var}\{\xi_i\} \to 0.$

Applying the lemma to $\xi_i = X_{ik}\varepsilon_i$ (with $E\{X_{ik}\varepsilon_i\} = 0$ and $\mathrm{var}\{X_{ik}\varepsilon_i\} = \sigma^2X_{ik}^2$): since $X^TX = O(n)$,

$\frac{1}{n^2}\sum_{i=1}^n\mathrm{var}\{X_{ik}\varepsilon_i\} = \frac{\sigma^2}{n^2}\sum_{i=1}^nX_{ik}^2 \to 0,$

hence $Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^nX_{ik}\varepsilon_i \to 0$ in probability for $k = 1, \dots, p$. Since $\left(\frac{1}{n}X^TX\right)^{-1} = O(1)$, it follows that $\hat\beta^{(OLS,n)} - \beta^0 \to 0$ in probability, i.e. $\hat\beta^{(OLS,n)}$ is consistent.
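The consistency claim can be illustrated by a short simulation (the design, true coefficients and seed are arbitrary choices, not from the lecture): the estimation error shrinks roughly like $1/\sqrt{n}$ as $n$ grows.

```python
import numpy as np

# Simulated illustration of consistency of OLS (made-up model y = 1 - 0.5 x).
rng = np.random.default_rng(2)
beta0 = np.array([1.0, -0.5])

def ols_error(n):
    """Norm of (beta_hat - beta0) for one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta0 + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    return np.linalg.norm(beta_hat - beta0)

errs = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
print(errs)   # errors shrink as n grows
```

The error at $n = 10^6$ is orders of magnitude smaller than at $n = 100$, in line with $Z^{(n)} \to 0$ in probability.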
$\hat\beta^{(OLS,n)}$ is asymptotically normal:

Central Limit Theorem – Feller-Lindeberg
Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover

$C_n^2 = \sum_{i=1}^n\mathrm{var}\{\xi_i\} \quad \text{and} \quad Z_n = C_n^{-1}\sum_{i=1}^n(\xi_i - \mu_i).$

Then

$\mathcal{L}(Z_n) \to N(0,1) \quad \text{and} \quad \lim_{n\to\infty}\max_{1\leq i\leq n}\sigma_iC_n^{-1} = 0$

if and only if for any $\epsilon > 0$

$\lim_{n\to\infty}C_n^{-2}\sum_{i=1}^n\int_{|z-\mu_i| > \epsilon C_n}(z - \mu_i)^2\,dF_i(z) = 0.$
$\hat\beta^{(OLS,n)}$ is asymptotically normal (continued):

Varadarajan theorem
Let $\{Z^{(n)}\}_{n=1}^\infty = \{(Z_1^{(n)}, Z_2^{(n)}, \dots, Z_p^{(n)})\}_{n=1}^\infty$ be a sequence of random vectors from $R^p$ with d.f.'s $F^{(n)}$. Further, for any $\lambda \in R^p$ let $F_\lambda^{(n)}$ be the d.f. of $\lambda_1Z_1^{(n)} + \lambda_2Z_2^{(n)} + \dots + \lambda_pZ_p^{(n)}$. Moreover, let $F$ be the d.f. of $(Z_1, Z_2, \dots, Z_p)$ and $F_\lambda$ be the d.f. of $\lambda_1Z_1 + \lambda_2Z_2 + \dots + \lambda_pZ_p$. If $F_\lambda^{(n)} \to F_\lambda$ for any $\lambda \in R^p$, then $F^{(n)} \to F$.
$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon$, hence

$\sqrt{n}\left(\hat\beta^{(OLS,n)} - \beta^0\right) = \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{\sqrt{n}}X^T\varepsilon.$

Firstly we verify the conditions of the Feller-Lindeberg theorem for $\frac{1}{\sqrt{n}}\lambda^TX^T\varepsilon$ for arbitrary $\lambda \in R^p$, and secondly we apply the Varadarajan theorem. Then we transform the asymptotically normally distributed vector $\frac{1}{\sqrt{n}}X^T\varepsilon$ by the matrix $\left(\frac{1}{n}X^TX\right)^{-1}$.
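The limit law can be illustrated by Monte Carlo (all numbers below are arbitrary simulation choices): for a fixed design, the covariance of $\sqrt{n}(\hat\beta^{(OLS,n)} - \beta^0)$ matches $\sigma^2Q^{-1}$ with $Q = \frac{1}{n}X^TX$.

```python
import numpy as np

# Monte Carlo illustration of asymptotic normality (simulated fixed design).
rng = np.random.default_rng(3)
n, p, reps, sigma = 500, 2, 2000, 1.0
beta0 = np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Q = X.T @ X / n

draws = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta0 + sigma * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    draws[r] = np.sqrt(n) * (beta_hat - beta0)

emp_cov = np.cov(draws, rowvar=False)       # empirical covariance over reps
theory = sigma**2 * np.linalg.inv(Q)        # sigma^2 Q^{-1}
print(np.max(np.abs(emp_cov - theory)))     # small Monte Carlo error
```

With 2000 replications the empirical covariance agrees with $\sigma^2Q^{-1}$ up to Monte Carlo noise of a few percent.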
REMARK

Normal equations:

$\sum_{i=1}^nX_i\left(Y_i - X_i^T\hat\beta\right) = 0, \quad \text{i.e.} \quad \sum_{i=1}^nX_{ij}\left(Y_i - X_i^T\hat\beta\right) = 0, \quad j = 1, 2, \dots, p.$

If either $(Y_{i_1} - X_{i_1}^T\hat\beta)^2$ for some $i_1$ or $X_{i_2j}^2$ for some $i_2$ are large (see the next slides!), it may cause serious problems when solving the normal equations, and the solution can be rather strange.
Outlier [figure: the solution given by OLS vs. a "reasonable" model neglecting the outlier]

Leverage point [figure: the solution given by OLS vs. a "reasonable" model neglecting the leverage point]
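The leverage-point effect shown in the figures can be reproduced with a few lines of simulation (entirely made-up data): one extreme observation drags the OLS slope far away from the slope fitted on the clean points.

```python
import numpy as np

# Made-up clean data on a line y = 1 + 2x with small noise.
rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=30)
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=30)

def ols_slope(x, y):
    """OLS slope of a simple regression with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

slope_clean = ols_slope(x, y)
# Add one leverage point far to the right, lying well below the true line.
x_bad = np.append(x, 10.0)
y_bad = np.append(y, 0.0)
slope_bad = ols_slope(x_bad, y_bad)
print(slope_clean, slope_bad)   # slope collapses because of one point
```

A single point with an extreme $x$-value is enough to pull the fitted slope from roughly 2 down to well below 1, exactly the pathology pictured above.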
Conclusion I
The solution given by OLS may be different from that expected by common sense. One reason is that $\hat\beta^{(OLS,n)}$ is the best only among linear estimators. Drawing the data from the previous slide on the screen of a PC, common sense proposes to reject the leverage point and then apply OLS. We then obtain a "reasonable" model, but it cannot be written as $LY$, where $Y$ is the response vector for all of the data. So this estimator is not linear.

Conclusion II
The restriction to linear estimators can appear to be drastic!!
Conclusion III
The restriction to a linear regression model is not substantial.

And what does the restriction to a linear model represent? Remember, we have considered the model

Time total = -3.62 + 1.27 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile.

But it is easy to test whether the model

Time total = -3.62 + 1.27 * Weight + a * Weight² - 0.53 * Puls + b * Puls³ - 0.51 * Strength + c * log(Strength) + 3.90 * Time per ¼-mile

is not a better one.

Weierstrass approximation theorem
The system of all polynomials is dense in the space of continuous functions on a compact space.
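A short sketch (with a made-up variable standing in for Weight, not the lecture's data) of why the model restriction is mild: a specification with squared and log-transformed regressors is nonlinear in the covariate but still linear in the parameters, so plain OLS applies to the augmented design matrix.

```python
import numpy as np

# Hypothetical covariate w and response built from transformed terms of w.
rng = np.random.default_rng(5)
w = rng.uniform(1, 3, size=200)
y = 0.5 + 1.2 * w - 0.8 * w**2 + 0.3 * np.log(w) + 0.01 * rng.normal(size=200)

# The design is nonlinear in w, yet the model stays linear in beta,
# so the ordinary normal equations still apply.
X = np.column_stack([np.ones_like(w), w, w**2, np.log(w)])
coef = np.linalg.solve(X.T @ X, X.T @ y)
rms_resid = np.sqrt(np.mean((X @ coef - y) ** 2))
print(rms_resid)   # close to the noise level
```

The fit recovers the generating curve essentially up to the noise level, even though the relationship between $y$ and $w$ is far from a straight line.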
What is the mutual relation between the linearity of the estimator of the regression coefficients and the linearity of the regression model? The answer is simpler than one would expect: NONE.

Conclusion IV
We should find the conditions under which OLS is the best estimator among all unbiased estimators (and use OLS only under these conditions).
And why did OLS become so popular?

Firstly, it has a simple geometric interpretation, implying the existence of a solution together with an easy proof of its properties.

Secondly, there is a simple formula for evaluating it, although the evaluation need not be straightforward. Nowadays, however, there are a lot of implementations which are safe against numerical difficulties.
Maximum Likelihood Estimator (maximálně věrohodný odhad)

Recalling the definition: Let $\mathcal{L}(\varepsilon_i) = F$ and let $f(z)$ be the density of the distribution $F$. Then

$\hat\beta^{(ML,n)} = \mathrm{argmax}_{\beta\in R^p}\prod_{i=1}^nf\left(Y_i - X_i^T\beta\right).$

Theorem

Assumptions: Let $\{\varepsilon_i\}_{i=1}^\infty$ be iid r.v.'s, $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$, $\sigma^2 > 0$.

Assertions: Then $\hat\beta^{(ML,n)} = \hat\beta^{(OLS,n)}$, and $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, i.e. $\hat\beta^{(OLS,n)}$ is the best unbiased estimator (not only BLUE).

Assumptions: If, on the other hand, $\hat\beta^{(OLS,n)}$ is the best unbiased estimator attaining the Rao-Cramér lower bound of variance,

Assertions: then $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$.
Maximum Likelihood Estimator under the assumption of normality of disturbances:

$\hat\beta^{(ML,n)} = \mathrm{argmax}_{\beta\in R^p}\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(a monotone transformation doesn't change the location of an extreme, so take logarithms)

$= \mathrm{argmax}_{\beta\in R^p}\sum_{i=1}^n\left\{-\log(\sqrt{2\pi}\,\sigma) - \frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(the first term is a constant with respect to $\beta$)

$= \mathrm{argmax}_{\beta\in R^p}\sum_{i=1}^n\left\{-\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(the change of sign changes "max" to "min")

$= \mathrm{argmin}_{\beta\in R^p}\sum_{i=1}^n\left(Y_i - X_i^T\beta\right)^2 = \hat\beta^{(OLS,n)}.$
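The chain of equalities can be verified numerically (simulated data; $\sigma$ is treated as known for simplicity): the closed-form OLS solution minimizes the Gaussian negative log-likelihood.

```python
import numpy as np

# Simulated regression with normal disturbances (arbitrary illustrative sizes).
rng = np.random.default_rng(6)
n, p, sigma = 300, 3, 1.0
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, 0.0, -2.0])
Y = X @ beta0 + sigma * rng.normal(size=n)

def neg_loglik(b):
    """Gaussian negative log-likelihood of beta with sigma known."""
    r = Y - X @ b
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + (r @ r) / (2 * sigma**2)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
best = neg_loglik(beta_ols)
# Random perturbations of the OLS solution all have a higher criterion value.
perturbed = [neg_loglik(beta_ols + 0.1 * rng.normal(size=p)) for _ in range(20)]
print(all(best < v for v in perturbed))
```

Because the Gaussian log-likelihood is a (negated, shifted) sum of squared residuals, its minimizer coincides with $\hat\beta^{(OLS,n)}$, as the derivation shows.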
Recalling the Rao-Cramér lower bound of the variance of an unbiased estimator

Denote the joint density of the disturbances by $f^n(y,X,\beta)$; for brevity, write $f(y,\beta)$ instead of $f^n(y,X,\beta)$. If $\hat\beta$ is unbiased, then $\int\hat\beta_j(y)f(y,\beta^{(1)})\,dy = \beta_j^{(1)}$ and $\int\hat\beta_j(y)f(y,\beta^{(2)})\,dy = \beta_j^{(2)}$, hence

$\int\hat\beta_j(y)\left\{f(y,\beta^{(1)}) - f(y,\beta^{(2)})\right\}dy = \beta_j^{(1)} - \beta_j^{(2)}.$

Let us divide both sides by $\beta_k^{(1)} - \beta_k^{(2)}$. Assume that $\beta_l^{(1)} = \beta_l^{(2)}$ for $l = 1, \dots, k-1, k+1, \dots, p$ and $\beta_k^{(1)} - \beta_k^{(2)} \neq 0$; then let $\beta_k^{(2)} \to \beta_k^{(1)}$. So we have

$\int\hat\beta_j(y)\frac{\partial\log f(y,\beta)}{\partial\beta_k}f(y,\beta)\,dy = \delta_{jk}.$

In matrix form:

$\int\hat\beta(y)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^Tf(y,\beta)\,dy = I.$

$\beta^{(1)}$ was arbitrary, hence write $\beta$ instead of it. Multiply the identity by $\mu^T$ from the left-hand side and by $\lambda$ from the right one. So we have, for any $\mu, \lambda \in R^p$,

$\int\mu^T\hat\beta(y)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = \mu^T\lambda.$
Intermediate considerations

$\int\frac{\partial\log f(y,\beta)}{\partial\beta}f(y,\beta)\,dy = \int\frac{\partial f(y,\beta)}{\partial\beta}\,dy = \frac{d}{d\beta}\int f(y,\beta)\,dy = \frac{d}{d\beta}1 = 0.$

But then, subtracting $\mu^T\beta\int\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = 0$, we have for any $\mu, \lambda \in R^p$

$\int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = \mu^T\lambda.$
Further intermediate considerations

Finally, write $f(y,\beta)$ as $\sqrt{f(y,\beta)}\cdot\sqrt{f(y,\beta)}$. So we have, for any $\mu, \lambda \in R^p$,

$\mu^T\lambda = \int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\sqrt{f(y,\beta)}\cdot\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,\sqrt{f(y,\beta)}\,dy.$

Applying the Cauchy-Schwarz inequality

$\left(\int g(x)h(x)\,dx\right)^2 \leq \int g^2(x)\,dx\cdot\int h^2(x)\,dx,$

we obtain

$(\mu^T\lambda)^2 \leq \int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)^2f(y,\beta)\,dy\cdot\int\left(\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\right)^2f(y,\beta)\,dy.$
So we have, for any $\mu, \lambda \in R^p$ (notice that both random variables are scalars!),

$(\mu^T\lambda)^2 \leq \mathrm{var}\{\mu^T\hat\beta\}\cdot\mathrm{var}\left\{\lambda^T\frac{\partial\log f(y,\beta)}{\partial\beta}\right\} = \mu^T\mathrm{cov}\{\hat\beta\}\mu\cdot\lambda^T\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\lambda.$

Assuming regularity of $\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}$, select $\lambda = \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu$. Then $\lambda^T\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\lambda = \mu^T\left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu = \mu^T\lambda$, and hence

$(\mu^T\lambda)^2 \leq \mu^T\mathrm{cov}\{\hat\beta\}\mu\cdot\mu^T\lambda, \quad \text{i.e.} \quad \mu^T\mathrm{cov}\{\hat\beta\}\mu \geq \mu^T\left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu.$

Since it holds for any $\mu \in R^p$, we have

$\mathrm{cov}\{\hat\beta\} - \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1} \geq 0$

(the inequality is in the sense of positive semidefinite matrices). This is the Rao-Cramér lower bound.
We would like to reach equality! The Cauchy-Schwarz inequality has been applied on

$\int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\sqrt{f(y,\beta)}\cdot\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,\sqrt{f(y,\beta)}\,dy,$

and the equality is reached iff the two factors are proportional, i.e. iff $\frac{\partial\log f^n(y,X,\beta)}{\partial\beta}$ is a linear function of $\hat\beta(y,X) - \beta$:

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = B(\beta)\left(\hat\beta(y,X) - \beta\right),$

where $B(\beta)$ is a $(p\times p)$ matrix and $\beta \in R^p$.

Remember that the joint density of the disturbances is

$f^n(y,X,\beta) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(y_i - X_i^T\beta)^2}{2\sigma^2}\right\},$

hence

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = \frac{1}{\sigma^2}\sum_{i=1}^nX_i\left(y_i - X_i^T\beta\right).$
Hence

$\hat\beta(y,X) = B^{-1}(\beta)\frac{1}{\sigma^2}\sum_{i=1}^nX_i\left(y_i - X_i^T\beta\right) + \beta = B^{-1}(\beta)\frac{1}{\sigma^2}X^Ty - B^{-1}(\beta)\frac{1}{\sigma^2}X^TX\beta + \beta.$

So $\hat\beta(y,X) = \Gamma X^Ty + a$ with $\Gamma = \frac{1}{\sigma^2}B^{-1}(\beta)$ and $a = \beta - \Gamma X^TX\beta \in R^p$. But $\hat\beta(y,X)$ cannot depend on $\beta$ and is to be unbiased, i.e. $E\hat\beta(y,X) = \Gamma X^TX\beta + a = \beta$ for any $\beta \in R^p$. Hence $\Gamma X^TX = I$, and so $\Gamma = (X^TX)^{-1}$ and $a = 0$.

Finally, $\hat\beta(Y,X) = (X^TX)^{-1}X^TY = \hat\beta^{(OLS,n)}$.
The proof of the opposite direction.

If $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, then the equality in the Cauchy-Schwarz inequality is reached, and hence (writing $\hat\beta^{(OLS,n)}$ instead of $\hat\beta(y,X)$)

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = B(\beta)\left(\hat\beta^{(OLS,n)} - \beta\right) = B(\beta)(X^TX)^{-1}X^Ty - B(\beta)\beta.$

Since this has to hold for any $\beta \in R^p$ and any matrix $X$ of type $(n\times p)$, there is a $\sigma^2 > 0$ so that $B(\beta) = \frac{1}{\sigma^2}X^TX$, and hence

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = \frac{1}{\sigma^2}X^Ty - \frac{1}{\sigma^2}X^TX\beta.$

Integrating with respect to $\beta$ (notice that after integration the "constant" $U(y)$ may depend on $y$ but not on $\beta$),

$\log f^n(y,X,\beta) = \frac{1}{\sigma^2}\beta^TX^Ty - \frac{1}{2\sigma^2}\beta^TX^TX\beta + U(y),$

i.e.

$f^n(y,X,\beta) = \exp\left\{\frac{1}{\sigma^2}\beta^TX^Ty - \frac{1}{2\sigma^2}\beta^TX^TX\beta\right\}\tilde{U}(y) = \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\}\tilde{\tilde{U}}(y).$

This we only rewrote from the previous expression, absorbing the factor $\exp\{\frac{1}{2\sigma^2}y^Ty\}$ into $\tilde{\tilde{U}}(y)$. Imposing the marginal conditions

$\int f^n(y,X,\beta)\,dy = 1 \quad \text{and} \quad \int(X^TX)^{-1}X^Ty\,f^n(y,X,\beta)\,dy = \beta,$

we obtain finally

$f^n(y,X,\beta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\},$

i.e. the disturbances are iid with $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$.
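A small numerical restatement of the theorem's forward direction (simulated design, arbitrary $\sigma$): under normal disturbances the Fisher information for $\beta$ is $X^TX/\sigma^2$, and $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(X^TX)^{-1}$ attains its inverse exactly.

```python
import numpy as np

# Hypothetical design; under N(0, sigma^2) disturbances the Fisher
# information for beta is X^T X / sigma^2, and the OLS covariance
# sigma^2 (X^T X)^{-1} equals the Rao-Cramer bound (its inverse) exactly.
rng = np.random.default_rng(7)
n, p, sigma = 100, 3, 1.5
X = rng.normal(size=(n, p))

fisher = X.T @ X / sigma**2                    # Fisher information matrix
cov_ols = sigma**2 * np.linalg.inv(X.T @ X)    # exact covariance of OLS
print(np.allclose(cov_ols, np.linalg.inv(fisher)))
```

The equality holds exactly (up to floating-point round-off), which is precisely what "OLS attains the Rao-Cramér lower bound" means under normality.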
What is to be learnt from this lecture for the exam?

Linearity of the estimator and of the model – what advantages and restrictions do they represent?
What does "The estimator is the best in the class of …" mean?
OLS is the best unbiased estimator – the condition(s) for it.

All that you need is on http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010