Charles University
FSV UK
Faculty of Social Sciences
Institute of Economic Studies

STAKAN III

Econometrics
Third Lecture
Jan Ámos Víšek
Tuesday, 14.00 – 15.20

http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010/
Schedule of today's talk

Recalling OLS and the definition of a linear estimator.
Discussion of the restrictions that linearity imposes on estimators and on models.
Proof of the theorem given at the end of the last lecture.
Definition of the best (linear unbiased) estimator.
Under normality of disturbances, OLS is the best unbiased estimator.
Ordinary Least Squares (odhad metodou nejmenších čtverců)

$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = \beta^0 + (X^TX)^{-1}X^T\varepsilon$

Definition
An estimator $\tilde\beta = \tilde\beta(Y,X) = LY$, where $L = L(X)$ is a $(p \times n)$ matrix, is called a linear estimator.

$\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ for $i \neq j$.
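To make the notation concrete, here is a minimal numerical sketch (simulated data; the sizes $n = 200$, $p = 3$ and the coefficient values are arbitrary choices, not from the lecture) showing that $\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY$ is the linear estimator $LY$ with $L = (X^TX)^{-1}X^T$ depending on $X$ only:

```python
import numpy as np

# Illustrative (made-up) data: Y = X beta0 + eps with standard normal noise.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, -2.0, 0.5])
Y = X @ beta0 + rng.normal(size=n)

L = np.linalg.solve(X.T @ X, X.T)   # L = (X^T X)^{-1} X^T, a (p x n) matrix
beta_hat = L @ Y                    # the linear estimator L Y

# The same result as the usual least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```

Note that $L$ is computed from $X$ alone, so $\hat\beta^{(OLS,n)}$ is linear in the response $Y$ in the sense of the definition above.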
Theorem

Assumptions: Let $\{\varepsilon_i\}_{i=1}^\infty$ be a sequence of r.v.'s with $E\varepsilon_i = 0$ and $E\varepsilon_i\varepsilon_j = \sigma^2\delta_{ij}$, $\sigma^2 > 0$.

Assertions: Then $\hat\beta^{(OLS,n)}$ is the best linear unbiased estimator.

Assumptions: If moreover $X^TX = O(n)$, $(X^TX)^{-1} = O(n^{-1})$ and the $\varepsilon_i$'s are independent,

Assertions: then $\hat\beta^{(OLS,n)}$ is consistent.

Assumptions: If further $\lim_{n\to\infty}\frac{1}{n}X^TX = Q$, $Q$ a regular matrix,

Assertions: then $\mathcal{L}\left(\sqrt{n}\,(\hat\beta^{(OLS,n)} - \beta^0)\right) \to N(0, \sigma^2Q^{-1})$, where $\lim_{n\to\infty}\mathrm{cov}\left\{\sqrt{n}\,(\hat\beta^{(OLS,n)} - \beta^0)\right\} = \sigma^2Q^{-1}$.

Proof
$\hat\beta^{(OLS,n)}$ is linear:
$\hat\beta^{(OLS,n)} = (X^TX)^{-1}X^TY = LY$. Remember that we have denoted $L = (X^TX)^{-1}X^T$.

$\hat\beta^{(OLS,n)}$ is unbiased:
$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon$, hence $E\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^TE\varepsilon = \beta^0$.
Definition
The estimator $\hat\beta$ is the best one in a given class $G$ of estimators if for any other $\tilde\beta \in G$ the matrix $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}$ is positive semidefinite, i.e. for any $\lambda \in R^p$ we have $\lambda^T\left(\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta\}\right)\lambda \geq 0$.

Recalling that $\mathrm{cov}\{Z\} = E\{(Z - EZ)(Z - EZ)^T\}$, we obtain $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2LL^T$.
$\hat\beta^{(OLS,n)}$ is the best in the class of unbiased linear estimators:

Let $\tilde\beta = \bar{L}Y$ be any unbiased linear estimator. Unbiasedness for every $\beta^0 \in R^p$ gives $E\bar{L}Y = \bar{L}X\beta^0 = \beta^0$, i.e. $\bar{L}X = I$ (unit matrix). Then

$\mathrm{cov}\{\tilde\beta\} = E\{(\bar{L}Y - \beta^0)(\bar{L}Y - \beta^0)^T\} = E\{(\bar{L}Y - \bar{L}X\beta^0)(\bar{L}Y - \bar{L}X\beta^0)^T\} = \bar{L}\,E\{(Y - X\beta^0)(Y - X\beta^0)^T\}\,\bar{L}^T = \bar{L}\{\sigma^2I\}\bar{L}^T = \sigma^2\bar{L}\bar{L}^T,$

while $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2LL^T$. Moreover,

$\bar{L}L^T = \bar{L}X(X^TX)^{-1} = (X^TX)^{-1} = LL^T,$ and hence $(\bar{L} - L)L^T = 0$.

Therefore

$\bar{L}\bar{L}^T = \left(L + (\bar{L} - L)\right)\left(L + (\bar{L} - L)\right)^T = LL^T + (\bar{L} - L)(\bar{L} - L)^T,$

so that $\mathrm{cov}\{\tilde\beta\} - \mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(\bar{L} - L)(\bar{L} - L)^T$, which is positive semidefinite.
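The decisive identity $\mathrm{cov}\{\bar LY\} - \mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(\bar L - L)(\bar L - L)^T \geq 0$ can be checked numerically. The sketch below uses an arbitrary simulated design; the competitor $\bar L = L + D$ with $DX = 0$ is just a generic way to build some other unbiased linear estimator:

```python
import numpy as np

# Hypothetical design matrix (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
L = np.linalg.solve(X.T @ X, X.T)                # OLS: L = (X^T X)^{-1} X^T

# Build Lbar = L + D with D X = 0, so Lbar X = I still holds (unbiasedness).
M = rng.normal(size=(p, n))
D = M - (M @ X) @ np.linalg.solve(X.T @ X, X.T)  # project rows of M off col(X)
Lbar = L + D
assert np.allclose(Lbar @ X, np.eye(p))

sigma2 = 2.0
diff = sigma2 * (Lbar @ Lbar.T - L @ L.T)        # covariance difference
eigvals = np.linalg.eigvalsh(diff)
print(eigvals.min() >= -1e-10)                   # positive semidefinite
```

All eigenvalues of the covariance difference are (numerically) nonnegative, exactly as the proof predicts.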
$\hat\beta^{(OLS,n)}$ is consistent:

$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon = \beta^0 + \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{n}X^T\varepsilon.$

Denote $Z^{(n)} = \frac{1}{n}X^T\varepsilon \in R^p$ and put $Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^nX_{ik}\varepsilon_i$. Then $E\{X_{ik}\varepsilon_i\} = 0$ and $\mathrm{var}\{X_{ik}\varepsilon_i\} = \sigma^2X_{ik}^2$.
Lemma – law of large numbers
Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover

$\frac{1}{n^2}\sum_{i=1}^n\mathrm{var}\{\xi_i\} \to 0 \quad \text{as } n \to \infty.$

Then $\frac{1}{n}\sum_{i=1}^n(\xi_i - \mu_i) \to 0$ in probability.

Proof: For any $\epsilon > 0$, the Chebyshev inequality gives

$P\left(\left|\frac{1}{n}\sum_{i=1}^n(\xi_i - \mu_i)\right| > \epsilon\right) \leq \frac{1}{\epsilon^2n^2}\sum_{i=1}^n\mathrm{var}\{\xi_i\} \to 0.$

Applying the lemma to $\xi_i = X_{ik}\varepsilon_i$ (with $E\{X_{ik}\varepsilon_i\} = 0$ and $\mathrm{var}\{X_{ik}\varepsilon_i\} = \sigma^2X_{ik}^2$): since $X^TX = O(n)$,

$\frac{1}{n^2}\sum_{i=1}^n\mathrm{var}\{X_{ik}\varepsilon_i\} = \frac{\sigma^2}{n^2}\sum_{i=1}^nX_{ik}^2 \to 0,$

hence $Z_k^{(n)} = \frac{1}{n}\sum_{i=1}^nX_{ik}\varepsilon_i \to 0$ in probability for $k = 1, \dots, p$. Since $\left(\frac{1}{n}X^TX\right)^{-1} = O(1)$, it follows that $\hat\beta^{(OLS,n)} - \beta^0 \to 0$ in probability, i.e. $\hat\beta^{(OLS,n)}$ is consistent.
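The consistency claim can be illustrated by a short simulation (the design, true coefficients and seed are arbitrary choices, not from the lecture): the estimation error shrinks roughly like $1/\sqrt{n}$ as $n$ grows.

```python
import numpy as np

# Simulated illustration of consistency of OLS (made-up model y = 1 - 0.5 x).
rng = np.random.default_rng(2)
beta0 = np.array([1.0, -0.5])

def ols_error(n):
    """Norm of (beta_hat - beta0) for one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta0 + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    return np.linalg.norm(beta_hat - beta0)

errs = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
print(errs)   # errors shrink as n grows
```

The error at $n = 10^6$ is orders of magnitude smaller than at $n = 100$, in line with $Z^{(n)} \to 0$ in probability.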
$\hat\beta^{(OLS,n)}$ is asymptotically normal:

Central Limit Theorem – Feller-Lindeberg
Let $\{\xi_i\}_{i=1}^\infty$ be a sequence of independent r.v.'s with finite means $\mu_i$ and positive variances $\sigma_i^2$, $i = 1, 2, \dots$. Let moreover

$C_n^2 = \sum_{i=1}^n\mathrm{var}\{\xi_i\} \quad \text{and} \quad Z_n = C_n^{-1}\sum_{i=1}^n(\xi_i - \mu_i).$

Then

$\mathcal{L}(Z_n) \to N(0,1) \quad \text{and} \quad \lim_{n\to\infty}\max_{1\leq i\leq n}\sigma_iC_n^{-1} = 0$

if and only if for any $\epsilon > 0$

$\lim_{n\to\infty}C_n^{-2}\sum_{i=1}^n\int_{|z-\mu_i| > \epsilon C_n}(z - \mu_i)^2\,dF_i(z) = 0.$
$\hat\beta^{(OLS,n)}$ is asymptotically normal (continued):

Varadarajan theorem
Let $\{Z^{(n)}\}_{n=1}^\infty = \{(Z_1^{(n)}, Z_2^{(n)}, \dots, Z_p^{(n)})\}_{n=1}^\infty$ be a sequence of random vectors from $R^p$ with d.f.'s $F^{(n)}$. Further, for any $\lambda \in R^p$ let $F_\lambda^{(n)}$ be the d.f. of $\lambda_1Z_1^{(n)} + \lambda_2Z_2^{(n)} + \dots + \lambda_pZ_p^{(n)}$. Moreover, let $F$ be the d.f. of $(Z_1, Z_2, \dots, Z_p)$ and $F_\lambda$ be the d.f. of $\lambda_1Z_1 + \lambda_2Z_2 + \dots + \lambda_pZ_p$. If $F_\lambda^{(n)} \to F_\lambda$ for any $\lambda \in R^p$, then $F^{(n)} \to F$.
$\hat\beta^{(OLS,n)} = \beta^0 + (X^TX)^{-1}X^T\varepsilon$, hence

$\sqrt{n}\left(\hat\beta^{(OLS,n)} - \beta^0\right) = \left(\frac{1}{n}X^TX\right)^{-1}\frac{1}{\sqrt{n}}X^T\varepsilon.$

Firstly we verify the conditions of the Feller-Lindeberg theorem for $\frac{1}{\sqrt{n}}\lambda^TX^T\varepsilon$ for arbitrary $\lambda \in R^p$, and secondly we apply the Varadarajan theorem. Then we transform the asymptotically normally distributed vector $\frac{1}{\sqrt{n}}X^T\varepsilon$ by the matrix $\left(\frac{1}{n}X^TX\right)^{-1}$.
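The limit law can be illustrated by Monte Carlo (all numbers below are arbitrary simulation choices): for a fixed design, the covariance of $\sqrt{n}(\hat\beta^{(OLS,n)} - \beta^0)$ matches $\sigma^2Q^{-1}$ with $Q = \frac{1}{n}X^TX$.

```python
import numpy as np

# Monte Carlo illustration of asymptotic normality (simulated fixed design).
rng = np.random.default_rng(3)
n, p, reps, sigma = 500, 2, 2000, 1.0
beta0 = np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Q = X.T @ X / n

draws = np.empty((reps, p))
for r in range(reps):
    Y = X @ beta0 + sigma * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    draws[r] = np.sqrt(n) * (beta_hat - beta0)

emp_cov = np.cov(draws, rowvar=False)       # empirical covariance over reps
theory = sigma**2 * np.linalg.inv(Q)        # sigma^2 Q^{-1}
print(np.max(np.abs(emp_cov - theory)))     # small Monte Carlo error
```

With 2000 replications the empirical covariance agrees with $\sigma^2Q^{-1}$ up to Monte Carlo noise of a few percent.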
REMARK

Normal equations:

$\sum_{i=1}^nX_i\left(Y_i - X_i^T\hat\beta\right) = 0, \quad \text{i.e.} \quad \sum_{i=1}^nX_{ij}\left(Y_i - X_i^T\hat\beta\right) = 0, \quad j = 1, 2, \dots, p.$

If either $(Y_{i_1} - X_{i_1}^T\hat\beta)^2$ for some $i_1$ or $X_{i_2j}^2$ for some $i_2$ are large (see the next slides!), it may cause serious problems when solving the normal equations, and the solution can be rather strange.
Outlier [figure: the solution given by OLS vs. a "reasonable" model neglecting the outlier]

Leverage point [figure: the solution given by OLS vs. a "reasonable" model neglecting the leverage point]
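The leverage-point effect shown in the figures can be reproduced with a few lines of simulation (entirely made-up data): one extreme observation drags the OLS slope far away from the slope fitted on the clean points.

```python
import numpy as np

# Made-up clean data on a line y = 1 + 2x with small noise.
rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=30)
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=30)

def ols_slope(x, y):
    """OLS slope of a simple regression with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

slope_clean = ols_slope(x, y)
# Add one leverage point far to the right, lying well below the true line.
x_bad = np.append(x, 10.0)
y_bad = np.append(y, 0.0)
slope_bad = ols_slope(x_bad, y_bad)
print(slope_clean, slope_bad)   # slope collapses because of one point
```

A single point with an extreme $x$-value is enough to pull the fitted slope from roughly 2 down to well below 1, exactly the pathology pictured above.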
Conclusion I
The solution given by OLS may be different from that expected by common sense. One reason is that $\hat\beta^{(OLS,n)}$ is the best only among linear estimators. Drawing the data from the previous slide on the screen of a PC, common sense proposes to reject the leverage point and then apply OLS. We then obtain a "reasonable" model, but it cannot be written as $LY$, where $Y$ is the response vector for all of the data. So this estimator is not linear.

Conclusion II
The restriction to linear estimators can appear to be drastic!!
Conclusion III
The restriction to a linear regression model is not substantial.

And what does the restriction to a linear model represent? Remember, we have considered the model

Time total = -3.62 + 1.27 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile.

But it is easy to test whether the model

Time total = -3.62 + 1.27 * Weight + a * Weight² - 0.53 * Puls + b * Puls³ - 0.51 * Strength + c * log(Strength) + 3.90 * Time per ¼-mile

is not a better one.

Weierstrass approximation theorem
The system of all polynomials is dense in the space of continuous functions on a compact space.
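A short sketch (with a made-up variable standing in for Weight, not the lecture's data) of why the model restriction is mild: a specification with squared and log-transformed regressors is nonlinear in the covariate but still linear in the parameters, so plain OLS applies to the augmented design matrix.

```python
import numpy as np

# Hypothetical covariate w and response built from transformed terms of w.
rng = np.random.default_rng(5)
w = rng.uniform(1, 3, size=200)
y = 0.5 + 1.2 * w - 0.8 * w**2 + 0.3 * np.log(w) + 0.01 * rng.normal(size=200)

# The design is nonlinear in w, yet the model stays linear in beta,
# so the ordinary normal equations still apply.
X = np.column_stack([np.ones_like(w), w, w**2, np.log(w)])
coef = np.linalg.solve(X.T @ X, X.T @ y)
rms_resid = np.sqrt(np.mean((X @ coef - y) ** 2))
print(rms_resid)   # close to the noise level
```

The fit recovers the generating curve essentially up to the noise level, even though the relationship between $y$ and $w$ is far from a straight line.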
What is the mutual relation between the linearity of the estimator of the regression coefficients and the linearity of the regression model? The answer is simpler than one would expect: NONE.

Conclusion IV
We should find the conditions under which OLS is the best estimator among all unbiased estimators (and use OLS only under these conditions).
And why did OLS become so popular?

Firstly, it has a simple geometric interpretation, implying the existence of a solution together with an easy proof of its properties.

Secondly, there is a simple formula for evaluating it, although the evaluation need not be straightforward. Nowadays, however, there are a lot of implementations which are safe against numerical difficulties.
Maximum Likelihood Estimator (maximálně věrohodný odhad)

Recalling the definition: Let $\mathcal{L}(\varepsilon_i) = F$ and let $f(z)$ be the density of the distribution $F$. Then

$\hat\beta^{(ML,n)} = \mathrm{argmax}_{\beta\in R^p}\prod_{i=1}^nf\left(Y_i - X_i^T\beta\right).$

Theorem

Assumptions: Let $\{\varepsilon_i\}_{i=1}^\infty$ be iid r.v.'s, $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$, $\sigma^2 > 0$.

Assertions: Then $\hat\beta^{(ML,n)} = \hat\beta^{(OLS,n)}$, and $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, i.e. $\hat\beta^{(OLS,n)}$ is the best unbiased estimator (not only BLUE).

Assumptions: If, on the other hand, $\hat\beta^{(OLS,n)}$ is the best unbiased estimator attaining the Rao-Cramér lower bound of variance,

Assertions: then $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$.
Maximum Likelihood Estimator under the assumption of normality of disturbances:

$\hat\beta^{(ML,n)} = \mathrm{argmax}_{\beta\in R^p}\prod_{i=1}^n\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(a monotone transformation doesn't change the location of an extreme, so take logarithms)

$= \mathrm{argmax}_{\beta\in R^p}\sum_{i=1}^n\left\{-\log(\sqrt{2\pi}\,\sigma) - \frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(the first term is a constant with respect to $\beta$)

$= \mathrm{argmax}_{\beta\in R^p}\sum_{i=1}^n\left\{-\frac{(Y_i - X_i^T\beta)^2}{2\sigma^2}\right\}$

(the change of sign changes "max" to "min")

$= \mathrm{argmin}_{\beta\in R^p}\sum_{i=1}^n\left(Y_i - X_i^T\beta\right)^2 = \hat\beta^{(OLS,n)}.$
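The chain of equalities can be verified numerically (simulated data; $\sigma$ is treated as known for simplicity): the closed-form OLS solution minimizes the Gaussian negative log-likelihood.

```python
import numpy as np

# Simulated regression with normal disturbances (arbitrary illustrative sizes).
rng = np.random.default_rng(6)
n, p, sigma = 300, 3, 1.0
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, 0.0, -2.0])
Y = X @ beta0 + sigma * rng.normal(size=n)

def neg_loglik(b):
    """Gaussian negative log-likelihood of beta with sigma known."""
    r = Y - X @ b
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + (r @ r) / (2 * sigma**2)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
best = neg_loglik(beta_ols)
# Random perturbations of the OLS solution all have a higher criterion value.
perturbed = [neg_loglik(beta_ols + 0.1 * rng.normal(size=p)) for _ in range(20)]
print(all(best < v for v in perturbed))
```

Because the Gaussian log-likelihood is a (negated, shifted) sum of squared residuals, its minimizer coincides with $\hat\beta^{(OLS,n)}$, as the derivation shows.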
Recalling the Rao-Cramér lower bound of the variance of an unbiased estimator

Denote the joint density of the disturbances by $f^n(y,X,\beta)$; for brevity, write $f(y,\beta)$ instead of $f^n(y,X,\beta)$. If $\hat\beta$ is unbiased, then $\int\hat\beta_j(y)f(y,\beta^{(1)})\,dy = \beta_j^{(1)}$ and $\int\hat\beta_j(y)f(y,\beta^{(2)})\,dy = \beta_j^{(2)}$, hence

$\int\hat\beta_j(y)\left\{f(y,\beta^{(1)}) - f(y,\beta^{(2)})\right\}dy = \beta_j^{(1)} - \beta_j^{(2)}.$

Let us divide both sides by $\beta_k^{(1)} - \beta_k^{(2)}$. Assume that $\beta_l^{(1)} = \beta_l^{(2)}$ for $l = 1, \dots, k-1, k+1, \dots, p$ and $\beta_k^{(1)} - \beta_k^{(2)} \neq 0$; then let $\beta_k^{(2)} \to \beta_k^{(1)}$. So we have

$\int\hat\beta_j(y)\frac{\partial\log f(y,\beta)}{\partial\beta_k}f(y,\beta)\,dy = \delta_{jk}.$

In matrix form:

$\int\hat\beta(y)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^Tf(y,\beta)\,dy = I.$

$\beta^{(1)}$ was arbitrary, hence write $\beta$ instead of it. Multiply the identity by $\mu^T$ from the left-hand side and by $\lambda$ from the right one. So we have, for any $\mu, \lambda \in R^p$,

$\int\mu^T\hat\beta(y)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = \mu^T\lambda.$
Intermediate considerations

$\int\frac{\partial\log f(y,\beta)}{\partial\beta}f(y,\beta)\,dy = \int\frac{\partial f(y,\beta)}{\partial\beta}\,dy = \frac{d}{d\beta}\int f(y,\beta)\,dy = \frac{d}{d\beta}1 = 0.$

But then, subtracting $\mu^T\beta\int\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = 0$, we have for any $\mu, \lambda \in R^p$

$\int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,f(y,\beta)\,dy = \mu^T\lambda.$
Further intermediate considerations

Finally, write $f(y,\beta)$ as $\sqrt{f(y,\beta)}\cdot\sqrt{f(y,\beta)}$. So we have, for any $\mu, \lambda \in R^p$,

$\mu^T\lambda = \int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\sqrt{f(y,\beta)}\cdot\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,\sqrt{f(y,\beta)}\,dy.$

Applying the Cauchy-Schwarz inequality

$\left(\int g(x)h(x)\,dx\right)^2 \leq \int g^2(x)\,dx\cdot\int h^2(x)\,dx,$

we obtain

$(\mu^T\lambda)^2 \leq \int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)^2f(y,\beta)\,dy\cdot\int\left(\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\right)^2f(y,\beta)\,dy.$
So we have, for any $\mu, \lambda \in R^p$ (notice that both random variables are scalars!),

$(\mu^T\lambda)^2 \leq \mathrm{var}\{\mu^T\hat\beta\}\cdot\mathrm{var}\left\{\lambda^T\frac{\partial\log f(y,\beta)}{\partial\beta}\right\} = \mu^T\mathrm{cov}\{\hat\beta\}\mu\cdot\lambda^T\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\lambda.$

Assuming regularity of $\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}$, select $\lambda = \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu$. Then $\lambda^T\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\lambda = \mu^T\left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu = \mu^T\lambda$, and hence

$(\mu^T\lambda)^2 \leq \mu^T\mathrm{cov}\{\hat\beta\}\mu\cdot\mu^T\lambda, \quad \text{i.e.} \quad \mu^T\mathrm{cov}\{\hat\beta\}\mu \geq \mu^T\left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1}\mu.$

Since it holds for any $\mu \in R^p$, we have

$\mathrm{cov}\{\hat\beta\} - \left[\mathrm{cov}\left\{\frac{\partial\log f(y,\beta)}{\partial\beta}\right\}\right]^{-1} \geq 0$

(the inequality is in the sense of positive semidefinite matrices). This is the Rao-Cramér lower bound.
We would like to reach equality! The Cauchy-Schwarz inequality has been applied on

$\int\left(\mu^T\hat\beta(y) - \mu^T\beta\right)\sqrt{f(y,\beta)}\cdot\left(\frac{\partial\log f(y,\beta)}{\partial\beta}\right)^T\lambda\,\sqrt{f(y,\beta)}\,dy,$

and the equality is reached iff the two factors are proportional, i.e. iff $\frac{\partial\log f^n(y,X,\beta)}{\partial\beta}$ is a linear function of $\hat\beta(y,X) - \beta$:

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = B(\beta)\left(\hat\beta(y,X) - \beta\right),$

where $B(\beta)$ is a $(p\times p)$ matrix and $\beta \in R^p$.

Remember that the joint density of the disturbances is

$f^n(y,X,\beta) = \prod_{i=1}^n\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(y_i - X_i^T\beta)^2}{2\sigma^2}\right\},$

hence

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = \frac{1}{\sigma^2}\sum_{i=1}^nX_i\left(y_i - X_i^T\beta\right).$
Hence

$\hat\beta(y,X) = B^{-1}(\beta)\frac{1}{\sigma^2}\sum_{i=1}^nX_i\left(y_i - X_i^T\beta\right) + \beta = B^{-1}(\beta)\frac{1}{\sigma^2}X^Ty - B^{-1}(\beta)\frac{1}{\sigma^2}X^TX\beta + \beta.$

So $\hat\beta(y,X) = \Gamma X^Ty + a$ with $\Gamma = \frac{1}{\sigma^2}B^{-1}(\beta)$ and $a = \beta - \Gamma X^TX\beta \in R^p$. But $\hat\beta(y,X)$ cannot depend on $\beta$ and is to be unbiased, i.e. $E\hat\beta(y,X) = \Gamma X^TX\beta + a = \beta$ for any $\beta \in R^p$. Hence $\Gamma X^TX = I$, and so $\Gamma = (X^TX)^{-1}$ and $a = 0$.

Finally, $\hat\beta(Y,X) = (X^TX)^{-1}X^TY = \hat\beta^{(OLS,n)}$.
The proof of the opposite direction.

If $\hat\beta^{(OLS,n)}$ attains the Rao-Cramér lower bound, then the equality in the Cauchy-Schwarz inequality is reached, and hence (writing $\hat\beta^{(OLS,n)}$ instead of $\hat\beta(y,X)$)

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = B(\beta)\left(\hat\beta^{(OLS,n)} - \beta\right) = B(\beta)(X^TX)^{-1}X^Ty - B(\beta)\beta.$

Since this has to hold for any $\beta \in R^p$ and any matrix $X$ of type $(n\times p)$, there is a $\sigma^2 > 0$ so that $B(\beta) = \frac{1}{\sigma^2}X^TX$, and hence

$\frac{\partial\log f^n(y,X,\beta)}{\partial\beta} = \frac{1}{\sigma^2}X^Ty - \frac{1}{\sigma^2}X^TX\beta.$

Integrating with respect to $\beta$ (notice that after integration the "constant" $U(y)$ may depend on $y$ but not on $\beta$),

$\log f^n(y,X,\beta) = \frac{1}{\sigma^2}\beta^TX^Ty - \frac{1}{2\sigma^2}\beta^TX^TX\beta + U(y),$

i.e.

$f^n(y,X,\beta) = \exp\left\{\frac{1}{\sigma^2}\beta^TX^Ty - \frac{1}{2\sigma^2}\beta^TX^TX\beta\right\}\tilde{U}(y) = \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\}\tilde{\tilde{U}}(y).$

This we only rewrote from the previous expression, absorbing the factor $\exp\{\frac{1}{2\sigma^2}y^Ty\}$ into $\tilde{\tilde{U}}(y)$. Imposing the marginal conditions

$\int f^n(y,X,\beta)\,dy = 1 \quad \text{and} \quad \int(X^TX)^{-1}X^Ty\,f^n(y,X,\beta)\,dy = \beta,$

we obtain finally

$f^n(y,X,\beta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)\right\},$

i.e. the disturbances are iid with $\mathcal{L}(\varepsilon_i) = N(0,\sigma^2)$.
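A small numerical restatement of the theorem's forward direction (simulated design, arbitrary $\sigma$): under normal disturbances the Fisher information for $\beta$ is $X^TX/\sigma^2$, and $\mathrm{cov}\{\hat\beta^{(OLS,n)}\} = \sigma^2(X^TX)^{-1}$ attains its inverse exactly.

```python
import numpy as np

# Hypothetical design; under N(0, sigma^2) disturbances the Fisher
# information for beta is X^T X / sigma^2, and the OLS covariance
# sigma^2 (X^T X)^{-1} equals the Rao-Cramer bound (its inverse) exactly.
rng = np.random.default_rng(7)
n, p, sigma = 100, 3, 1.5
X = rng.normal(size=(n, p))

fisher = X.T @ X / sigma**2                    # Fisher information matrix
cov_ols = sigma**2 * np.linalg.inv(X.T @ X)    # exact covariance of OLS
print(np.allclose(cov_ols, np.linalg.inv(fisher)))
```

The equality holds exactly (up to floating-point round-off), which is precisely what "OLS attains the Rao-Cramér lower bound" means under normality.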
What is to be learnt from this lecture for the exam?

Linearity of the estimator and of the model – what advantages and restrictions do they represent?
What does "The estimator is the best in the class of …" mean?
OLS is the best unbiased estimator – the condition(s) for it.

All that you need is on http://samba.fsv.cuni.cz/~visek/Econometrics_Up_To_2010