MATH2831/2931 Linear Models/ Higher Linear Models. August 19, 2013

Week 4 Lecture 1


  • MATH2831/2931

    Linear Models/ Higher Linear Models.

    August 19, 2013

  • Week 4 Lecture 1 - Last lecture:

    Properties of the least squares estimator of $\beta$.

  • Week 4 Lecture 1 - This lecture:

    Comparing estimators: this lecture we revisit the concept of properties of estimators (recall the ideas of Cramer-Rao lower bounds).

    Gauss-Markov theorem.

    Fisher-Neyman Factorization

    Maximum likelihood estimation.

    Estimation of the error variance.

    Interval estimation of coefficients in the general linear model.

  • Week 4 Lecture 1 - Properties of Estimators Revisited!

    We have seen some general asymptotic results for estimators (Cramer-Rao, efficiency, optimality) and we considered these results in the context of maximum likelihood estimators (MLE).

    We have seen that MLEs are asymptotically efficient, in the sense that they asymptotically attain the Cramer-Rao lower bound, and that they are asymptotically optimal.

    What can be said about Least Squares Estimators ???

    Now we restrict attention to a sub-class of estimators (linear unbiased estimators) and consider the properties of the least squares estimator (LSE) in more detail.

  • Week 4 Lecture 1 - Gauss-Markov Theorem

    The ability to compute $\mathrm{Var}(b)$ provides a way of comparing $b$ with other estimators.

    Linear estimator: an estimator of the form $Ay$ for some $p \times n$ matrix $A$ (setting $A = (X'X)^{-1}X'$ gives the least squares estimator). NOTE: we already saw this result in Week 2, Lecture 1.

  • Week 4 Lecture 1 - Gauss-Markov Theorem

    Gauss-Markov Theorem: if $\tilde b$ is any unbiased linear estimator of $\beta$, and if $b$ is the least squares estimator, then $\mathrm{Var}(b_i) \le \mathrm{Var}(\tilde b_i)$, $i = 1, \ldots, p$.

    We say that $b$ is the best linear unbiased estimator (BLUE) of $\beta$.

    This is like a restricted version of the result on Cramer-Rao lower bounds we saw in Week 2, Lecture 1.

    It restricts the class of estimators to those that are linear and unbiased!

  • Week 4 Lecture 1 - Gauss-Markov theorem

    Proof of Gauss-Markov Theorem:

    Let $\tilde b$ be a linear unbiased estimator of $\beta$:

    $$\tilde b = \big((X'X)^{-1}X' + B\big)y$$

    for some $p \times n$ matrix $B$ (NOTE: $B = 0$ gives the least squares estimator).

    The expectation of this linear estimator is given by

    $$E(\tilde b) = \big((X'X)^{-1}X' + B\big)X\beta = \beta + BX\beta.$$

    Since $\tilde b$ is unbiased for every $\beta$, we must have $BX = 0$.

  • Week 4 Lecture 1 - Gauss-Markov theorem

    What about the variance?

    Hint: use the rules of variance from last lecture!

    $$
    \begin{aligned}
    \mathrm{Var}(\tilde b) &= \mathrm{Var}\big(((X'X)^{-1}X' + B)y\big) \\
    &= \big((X'X)^{-1}X' + B\big)\,\sigma^2 I\,\big((X'X)^{-1}X' + B\big)' \\
    &= \sigma^2\big((X'X)^{-1}X' + B\big)\big(X(X'X)^{-1} + B'\big) \\
    &= \sigma^2\big((X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'B' + BX(X'X)^{-1} + BB'\big).
    \end{aligned}
    $$

  • Week 4 Lecture 1 - Gauss-Markov theorem

    Using $BX = 0$,

    $$\mathrm{Var}(\tilde b) = \sigma^2\big((X'X)^{-1} + BB'\big).$$

    So the variance of $\tilde b_i$ is the variance of $b_i$ plus $\sigma^2$ times the $i$th diagonal element of $BB'$. But

    $$(BB')_{ii} = \sum_{j=1}^{n} B_{ij}^2,$$

    a non-negative quantity.

    CONCLUSION: The variance of any linear unbiased estimator is at least as big as the variance of the least squares estimator.
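    As a small numerical illustration (my own sketch, not from the lecture): any $B$ of the form $B = C(I - X(X'X)^{-1}X')$ for an arbitrary matrix $C$ satisfies $BX = 0$, so it defines a linear unbiased estimator whose coordinate variances can be compared with those of the LSE. The design matrix, $C$, and $\sigma^2$ below are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 20, 3, 2.0

X = rng.normal(size=(n, p))                # made-up full-rank design matrix
XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                      # hat matrix

# Any B = C (I - H) satisfies B X = 0, so ((X'X)^{-1}X' + B) y is unbiased.
C = rng.normal(size=(p, n))
B = C @ (np.eye(n) - H)

var_lse   = sigma2 * XtX_inv               # Var(b)
var_other = sigma2 * (XtX_inv + B @ B.T)   # Var(b~) = sigma^2 ((X'X)^{-1} + BB')

print(np.diag(var_lse))
print(np.diag(var_other))
print(np.all(np.diag(var_other) >= np.diag(var_lse) - 1e-12))  # expected: True
```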

  • Week 4 Lecture 1 - Fisher-Neyman Factorization Theorem and Sufficiency

    RECALL: we considered the property of asymptotically efficient estimators: the variance of the estimator achieves the Cramer-Rao lower bound (in the single-parameter unbiased estimator case, this means the variance of the estimator is asymptotically given by the inverse of the Fisher information).

    Sufficiency of an estimator:

    Interpretation 1: crudely speaking, we say a statistic is sufficient for the parameter it is estimating if it exhausts all the useful information about the parameter contained in the observations.

    Interpretation 2: the statistic $T(Y_{1:n})$ condenses the sample of observations in such a way that no information about the parameter being estimated is lost.

  • Week 4 Lecture 1 - Fisher-Neyman Factorization Theorem and Sufficiency

    Fisher-Neyman Factorization Theorem: Let Y denote arandom variable with density f (Y ; ). Furthermore let Y1:n be arandom sample drawn from the distribution with joint density

    f (y1, . . . , yn; ). Then the statistic T (Y1:n) is sufficient forparameter if and only if:

    f (y1, . . . , yn; ) = g [T (Y1:n); ] h (y1:n)

    where function g depends on the ys only through estimator

    T (Y1:n) and function h does not depend on parameter .

  • Week 4 Lecture 1 - Fisher-Neyman Factorization Theorem and Sufficiency

    The Fisher-Neyman Factorization theorem helps us to recognise a sufficient statistic from knowledge of the form of the joint distribution of the sample.

    Interpretation: the theorem states that the statistic $T$ is sufficient for $\theta$ if and only if the joint density of the sample can be factored into two components:

    the first dependent only on the statistic and the parameter

    the second independent of the parameter.
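    As a quick worked example (my own, not from the slides), take a random sample $Y_1,\ldots,Y_n$ from $N(\mu,1)$ and use the identity $\sum_i (y_i-\mu)^2 = \sum_i (y_i-\bar y)^2 + n(\bar y-\mu)^2$:

    $$
    f(y_1,\ldots,y_n;\mu)
    = (2\pi)^{-n/2}\exp\Big(-\tfrac{1}{2}\sum_{i=1}^{n}(y_i-\mu)^2\Big)
    = \underbrace{\exp\Big(-\tfrac{n}{2}(\bar y-\mu)^2\Big)}_{g[\bar y;\,\mu]}\;
      \underbrace{(2\pi)^{-n/2}\exp\Big(-\tfrac{1}{2}\sum_{i=1}^{n}(y_i-\bar y)^2\Big)}_{h(y_{1:n})},
    $$

    so by the theorem $T(Y_{1:n}) = \bar Y$ is sufficient for $\mu$.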

  • Week 4 Lecture 1 - Least squares in the general linear model

    General linear model in matrix form:

    $$y = X\beta + \epsilon$$

    with $X$ an $n \times (k+1) = n \times p$ matrix of full rank and $\beta$ a $(k+1) \times 1$ vector of unknown parameters.

    Least squares estimator of $\beta$:

    $$b = (X'X)^{-1}X'y.$$

    (The inverse of $X'X$ exists when $X$ has full rank.) The least squares estimator of $\beta$ is the maximum likelihood estimator assuming normality. The maximum likelihood estimator $\hat\sigma^2$ of $\sigma^2$ is ???
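    A minimal numerical sketch of the formula $b = (X'X)^{-1}X'y$ (my own illustration with a made-up design matrix and coefficients; in practice a least squares solver is preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3                                     # n observations, p = k + 1 parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
beta_true = np.array([1.0, 2.0, -0.5])           # made-up "true" coefficients
y = X @ beta_true + rng.normal(size=n)

b_formula = np.linalg.inv(X.T @ X) @ X.T @ y     # b = (X'X)^{-1} X'y
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically stable alternative

print(b_formula)
print(np.allclose(b_formula, b_lstsq))           # expected: True
```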

  • Week 4 Lecture 1 - Maximum likelihood estimators

    RECALL: in the simple linear regression model, the least squares and maximum likelihood estimators coincide under normality assumptions.

    Consider the general linear model!

    Likelihood in the general linear model:

    $$
    \begin{aligned}
    L(\beta,\sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{1}{2\sigma^2}\big(y_i - (X\beta)_i\big)^2\Big) \\
    &= (2\pi\sigma^2)^{-n/2}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - (X\beta)_i\big)^2\Big).
    \end{aligned}
    $$

    But

    $$\sum_{i=1}^{n}\big(y_i - (X\beta)_i\big)^2 = (y - X\beta)'(y - X\beta),$$

    and so we can rewrite $L(\beta,\sigma^2)$ as

    $$(2\pi\sigma^2)^{-n/2}\exp\Big(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\Big).$$

  • Week 4 Lecture 1 - Maximum likelihood estimators

    Log-likelihood $\ell(\beta,\sigma^2)$:

    $$\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

    AGAIN: maximizing the likelihood is equivalent to maximizing the log-likelihood.

    The log-likelihood is maximized with respect to $\beta$, regardless of the value of $\sigma^2$, by minimizing $(y - X\beta)'(y - X\beta)$ with respect to $\beta$.

    RESULT: Maximum likelihood and least squares coincide under the normality assumption.
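    To see this claim numerically, the sketch below (my own, using simulated data of the same kind as above) evaluates the log-likelihood and checks that the least squares $b$ gives a value at least as large as nearby perturbed coefficient vectors, for an arbitrary fixed $\sigma^2$:

```python
import numpy as np

def loglik(beta, sigma2, y, X):
    # l(beta, sigma^2) = -(n/2) log(2 pi) - (n/2) log(sigma^2) - RSS / (2 sigma^2)
    n = len(y)
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2) - 0.5 * (resid @ resid) / sigma2

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # made-up design
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = 1.0                                                   # any fixed sigma^2 will do
trials = [b + 0.1 * rng.normal(size=b.size) for _ in range(5)]
print(all(loglik(b, sigma2, y, X) >= loglik(bp, sigma2, y, X) for bp in trials))
# expected: True, since b minimizes (y - X beta)'(y - X beta)
```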

  • Week 4 Lecture 1 - Maximum likelihood estimation

    DERIVE: the maximum likelihood estimator of $\sigma^2$. Differentiating with respect to $\sigma^2$,

    $$\frac{\partial}{\partial\sigma^2}\,\ell(\beta,\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta).$$

    Write $b$ for the least squares estimator of $\beta$ and $\hat\sigma^2$ for the maximum likelihood estimator of $\sigma^2$. Setting the derivative to zero at $\beta = b$ gives

    $$\frac{n}{2\hat\sigma^2} = \frac{1}{2\hat\sigma^4}(y - Xb)'(y - Xb),$$

    so that

    $$\hat\sigma^2 = \frac{1}{n}(y - Xb)'(y - Xb).$$
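    Continuing the same kind of simulated example (my own sketch, not from the slides), $\hat\sigma^2 = (y-Xb)'(y-Xb)/n$ can be computed directly, and one can check that it makes the derivative above vanish:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # made-up design
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)
n = len(y)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = (y - X @ b) @ (y - X @ b)
sigma2_mle = rss / n                                           # MLE of sigma^2

# Score: d l / d sigma^2 = -n/(2 sigma^2) + RSS/(2 sigma^4); should be ~0 at the MLE.
score = -n / (2 * sigma2_mle) + rss / (2 * sigma2_mle**2)
print(sigma2_mle, score)                                       # score is ~0
```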

  • Week 4 Lecture 1 - Bias of MLE of variance

    What is $E(\hat\sigma^2)$ in the general linear model case? Since $b = (X'X)^{-1}X'y$, we have that $E(\hat\sigma^2)$ is

    $$\frac{1}{n}E\Big(\big(y - X(X'X)^{-1}X'y\big)'\big(y - X(X'X)^{-1}X'y\big)\Big)$$

    or

    $$\frac{1}{n}E\Big(y'\big(I - X(X'X)^{-1}X'\big)'\big(I - X(X'X)^{-1}X'\big)y\Big).$$

    I claim that (our condition of idempotency; note the matrix is also symmetric)

    $$\big(I - X(X'X)^{-1}X'\big)^2 = I - X(X'X)^{-1}X'.$$

    Hence

    $$E(\hat\sigma^2) = \frac{1}{n}E\Big(y'\big(I - X(X'X)^{-1}X'\big)y\Big).$$

    A matrix $A$ with $A^2 = A$ is called idempotent.

  • Week 4 Lecture 1 - Matrix traces

    Some useful matrix identities! (You are expected to know and use these.)

    For a $k \times k$ matrix $X$, the trace of $X$ is

    $$\mathrm{tr}(X) = \sum_{i=1}^{k} X_{ii}.$$

    Theorem:
    (i) Let $c$ be a real number. Then $\mathrm{tr}(cX) = c\,\mathrm{tr}(X)$.
    (ii) $\mathrm{tr}(X + Y) = \mathrm{tr}(X) + \mathrm{tr}(Y)$.
    (iii) If $X$ is an $n \times p$ matrix and $Y$ is a $p \times n$ matrix, then $\mathrm{tr}(XY) = \mathrm{tr}(YX)$.
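    A quick numerical check of the three trace identities (my own sketch, with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
U = rng.normal(size=(5, 3))    # n x p
V = rng.normal(size=(3, 5))    # p x n
c = 2.5

print(np.isclose(np.trace(c * A), c * np.trace(A)))              # (i)
print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))    # (ii)
print(np.isclose(np.trace(U @ V), np.trace(V @ U)))              # (iii) tr(XY) = tr(YX)
```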

  • Week 4 Lecture 1 - Matrix traces

    Lemma: Let $y$ be a $k \times 1$ random vector with $E(y) = \mu$ and $\mathrm{Var}(y) = V$. Let $A$ be a $k \times k$ matrix of real numbers. Then

    $$E(y'Ay) = \mathrm{tr}(AV) + \mu'A\mu.$$

    Use this result to show:

    $$E(\hat\sigma^2) = \frac{n-p}{n}\,\sigma^2.$$
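    A Monte Carlo check of the lemma (my own sketch, with an arbitrary mean vector, covariance, and matrix $A$): the sample average of $y'Ay$ should be close to $\mathrm{tr}(AV) + \mu'A\mu$.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4
mu = rng.normal(size=k)                      # arbitrary mean vector
L = rng.normal(size=(k, k))
V = L @ L.T                                  # arbitrary positive definite covariance
A = rng.normal(size=(k, k))                  # arbitrary k x k matrix

ys = rng.multivariate_normal(mu, V, size=200_000)
mc = np.mean(np.einsum('ij,jk,ik->i', ys, A, ys))   # Monte Carlo average of y'Ay
exact = np.trace(A @ V) + mu @ A @ mu
print(mc, exact)                             # should agree closely
```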

  • Week 4 Lecture 1 - Matrix traces

    Proof:

    $$
    \begin{aligned}
    E(\hat\sigma^2) &= \frac{1}{n}E\Big(y'\big(I - X(X'X)^{-1}X'\big)y\Big) \\
    &= \frac{1}{n}\Big(\sigma^2\,\mathrm{tr}\big(I - X(X'X)^{-1}X'\big) + (X\beta)'\big(I - X(X'X)^{-1}X'\big)(X\beta)\Big) \\
    &= \frac{1}{n}\Big(\sigma^2\,\mathrm{tr}(I) - \sigma^2\,\mathrm{tr}\big(X(X'X)^{-1}X'\big) + \beta'X'\big(I - X(X'X)^{-1}X'\big)X\beta\Big) \\
    &= \frac{1}{n}\Big(\sigma^2 n - \sigma^2\,\mathrm{tr}\big(X'X(X'X)^{-1}\big) + \beta'X'X\beta - \beta'X'X(X'X)^{-1}X'X\beta\Big) \\
    &= \frac{1}{n}\big(\sigma^2 n - \sigma^2 p + 0\big) = \sigma^2\,\frac{n-p}{n},
    \end{aligned}
    $$

    where we used $\mathrm{tr}\big(X'X(X'X)^{-1}\big) = \mathrm{tr}(I_p) = p$ and the cancellation of the $\beta$ terms.

  • Week 4 Lecture 1 - Estimation of error variance

    Since

    $$E(\hat\sigma^2) = \sigma^2\,\frac{n-p}{n},$$

    we clearly have that

    $$\frac{n}{n-p}\,E(\hat\sigma^2) = \sigma^2,$$

    and so $\dfrac{n}{n-p}\,\hat\sigma^2$ is an unbiased estimator of $\sigma^2$. We have

    $$\frac{n}{n-p}\,\hat\sigma^2 = \frac{n}{n-p}\cdot\frac{1}{n}\,(y - Xb)'(y - Xb) = \frac{(y - Xb)'(y - Xb)}{n-p},$$

    so $\dfrac{(y - Xb)'(y - Xb)}{n-p}$ is an unbiased estimator of $\sigma^2$.
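    To illustrate the bias correction numerically (my own simulation sketch, with a made-up design and $\sigma^2 = 4$): averaged over repeated samples, $\mathrm{RSS}/n$ should come out near $\sigma^2(n-p)/n$, while $\mathrm{RSS}/(n-p)$ should come out near $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma2 = 30, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # made-up design
beta = np.array([1.0, 2.0, -0.5])
H = X @ np.linalg.inv(X.T @ X) @ X.T                             # hat matrix

mle, unbiased = [], []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    rss = y @ (np.eye(n) - H) @ y          # equals (y - Xb)'(y - Xb)
    mle.append(rss / n)
    unbiased.append(rss / (n - p))

print(np.mean(mle), sigma2 * (n - p) / n)  # ~3.6 = sigma^2 (n-p)/n
print(np.mean(unbiased), sigma2)           # ~4.0 = sigma^2
```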

  • Week 2 Lecture 2 - Learning Expectations.

    Be familiar with the matrix formulation of the linear regression model.

    Be able to apply basic matrix manipulations to obtain the least squares estimate and its mean and variance.

    Understand the significance of properties of estimators and be able to derive and state properties of different estimators.