MATH2831/2931 Linear Models / Higher Linear Models. September 13, 2013


  • Week 7 Lecture 3 - Last lecture:

    DFFITS

    DFBETAS

    COVRATIO

  • Week 7 Lecture 3 - This lecture:

    Variance stabilizing transformations.

    Examples: snow geese, inflation rates and central bank independence.

    Weighted regression.

    Transformations for modelling non-linear relationships between a response and a single predictor.

  • Week 7 Lecture 3 - Transformations

    Diagnostic measures detect violations of assumptions: how do we deal with these violations?

    One approach which is sometimes applicable: use of transformations.

    The appropriateness of a transformation, and the type of transformation used, depends on the nature of the violations of assumptions.

    Are there problems with: constant variance, independence of the noise random variables εᵢ, the distributional assumption for the noise, the mean...

  • Week 7 Lecture 3 - Transformations

    This lecture: variance stabilizing transformations (appropriate when constancy of the error variance is violated).

    Applying a transformation to fix one violation of assumptions may cause another violation which did not appear on the original scale.

    What is the effect of a transformation on the model for the errors?

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Write y for a typical response. Common variance stabilizing transformations:

    Square root transformation, √y: appropriate when the error variance is proportional to the mean.

    Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Write y for a typical response.

    First order Taylor series expansion of √y about the expected value E(y) of y: √y is approximately

    √E(y) + (1 / (2√E(y))) (y − E(y)).

    From this, the variance of √y is approximately

    Var(y) / (4E(y)).

  • Week 7 Lecture 3 - Variance stabilizing transformations

    So if the variance of y is proportional to the mean, the variance of √y is approximately constant.

    The square root transformation is often used with count data:

    if a Poisson distribution is an appropriate model for the counts, then the variance is equal to the mean (a property of the Poisson distribution):

    E(N) = Var(N) = λ.
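The variance-stabilizing effect of the square root on Poisson counts is easy to check by simulation. A minimal sketch (hypothetical mean levels chosen for illustration, numpy assumed available): Var(y) grows with the mean, while Var(√y) stays close to the value 1/4 predicted by the Taylor expansion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Poisson counts at several mean levels (illustrative only).
means = [4, 16, 64, 256]
raw_var, sqrt_var = [], []
for mu in means:
    y = rng.poisson(mu, size=20000)
    raw_var.append(y.var())            # grows with the mean: Var(y) = mu
    sqrt_var.append(np.sqrt(y).var())  # roughly constant, near 1/4
```

The lowest mean level shows the roughest agreement with 1/4, as expected for a first-order approximation.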

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Write y for a typical response. Other common variance stabilizing transformations:

    log y: appropriate when the standard deviation is proportional to the mean.

    Exercise: demonstrate this result by taking an appropriate 1st order Taylor series expansion of the response.

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Other common transformations.

    Log transformation: log y is approximately

    log E(y) + (y − E(y)) / E(y),

    giving the variance of log y approximately

    Var(y) / E(y)².

    If the standard deviation is proportional to the mean, the log transformation stabilizes the variance.
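This case can also be checked by simulation. A minimal sketch with made-up data in which sd(y) is proportional to E(y) by construction: the raw standard deviation grows with the mean, while the standard deviation of log y stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positive responses with sd proportional to the mean:
# y = mu * exp(e), e ~ N(0, 0.1^2), so sd(y) is approximately 0.1 * mu.
means = [1.0, 10.0, 100.0]
raw_sd, log_sd = [], []
for mu in means:
    y = mu * np.exp(rng.normal(0.0, 0.1, size=20000))
    raw_sd.append(y.std())          # grows roughly in proportion to mu
    log_sd.append(np.log(y).std())  # roughly constant (about 0.1)
```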

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Inverse transformation 1/y: useful when the error standard deviation is proportional to the square of the mean.

    All the transformations we've considered are only appropriate with a positive response.

    When zeros occur, we might use log(y + 1) or 1/(y + 1) instead of the log and inverse transformations.

  • Week 7 Lecture 3 - Variance stabilizing transformations

    Freeman-Tukey transformation √y + √(y + 1): useful when the variance of y is proportional to the mean and some of the responses are zero or very small.

    sin⁻¹(√y): useful when the variance of y is proportional to E(y)(1 − E(y)), such as binomial proportions where 0 ≤ y ≤ 1.
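For binomial proportions, the delta method gives Var(sin⁻¹(√p̂)) ≈ 1/(4n), free of p. A minimal simulation sketch (hypothetical sample size and success probabilities, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical binomial proportions p_hat = X/n for several true p.
# Delta method: Var(arcsin(sqrt(p_hat))) is about 1/(4n), free of p.
n = 50
raw_var, asin_var = [], []
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    phat = rng.binomial(n, p, size=20000) / n
    raw_var.append(phat.var())                       # p(1-p)/n, depends on p
    asin_var.append(np.arcsin(np.sqrt(phat)).var())  # roughly 1/(4n)
```

The raw variances differ by a factor of almost 3 across the p values, while the transformed variances are nearly equal.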

  • Week 7 Lecture 3 - Comparing models for transformations of the response

    If we're building a model for predictive purposes, we're usually interested in predictions on the original scale.

    We can't compare models for a response y and a transformation z = f(y) (f invertible) of that response by looking at R² or σ̂² for the two different models.

    How should we compare models then?

  • Week 7 Lecture 3 - Comparing models for transformations of the response

    Develop a statistic for the model for the transformed response z which can be compared with the PRESS statistic for the model for y.

    Write ẑ_{i,−i} for the prediction of z_i based on the fit to all the data with the ith observation deleted.

    Prediction of y_i (original scale): f⁻¹(ẑ_{i,−i}).

  • Week 7 Lecture 3 - Comparing models for transformations of the response

    Analogue of the PRESS residual on the original scale: y_i − f⁻¹(ẑ_{i,−i}).

    Compare

    Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²

    with the PRESS statistic, or

    Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|

    with the sum of absolute PRESS residuals.

    The same idea can be used to compare models for two different transformations of the response.
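The procedure above can be sketched directly: fit the model to the transformed response leaving one observation out at a time, back-transform each deleted-observation prediction, and accumulate the squared and absolute errors on the original scale. The function and data below are a hypothetical illustration (not the course datasets), assuming numpy and a simple-linear-regression model.

```python
import numpy as np

def press_on_original_scale(x, y, transform, inverse):
    """Leave-one-out sums of squared and absolute prediction errors,
    fitting on the transform(y) scale but measuring errors on the
    original y scale via the inverse transformation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    sq = ab = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        X = np.column_stack([np.ones(n - 1), x[keep]])
        beta, *_ = np.linalg.lstsq(X, transform(y[keep]), rcond=None)
        zhat = beta[0] + beta[1] * x[i]   # prediction of z_i, obs i deleted
        err = y[i] - inverse(zhat)        # PRESS-type residual, original scale
        sq += err ** 2
        ab += abs(err)
    return sq, ab

# Hypothetical count-like data; add 1 so the log transform is defined.
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, 40)
y = rng.poisson(5.0 * x) + 1.0
sq_sqrt, ab_sqrt = press_on_original_scale(x, y, np.sqrt, np.square)
sq_log, ab_log = press_on_original_scale(x, y, np.log, np.exp)
```

Comparing `sq_sqrt` with `sq_log` (and `ab_sqrt` with `ab_log`) mirrors the table comparisons on the next slides.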

  • Week 7 Lecture 3 - Snow Geese

    Aerial survey counting methods to estimate Snow Geese numbers (Hudson Bay, Canada).

    Test reliability of expert counters.

    Record the expert's estimated number of birds, compare to exact photo counts.

    A common model for count data is the Poisson distribution (mean = variance).

    Square root transformation expected.

    Also consider the log-transformation.

  • Week 7 Lecture 3 - Snow Geese

    Transformation is an improvement. Still concern that variance increases with the mean - try log?

  • Week 7 Lecture 3 - Snow Geese

    The log-transform appears better than the square root transform. However, we may work with the square root for interpretability (data are counts).

  • Week 7 Lecture 3 - Snow Geese

    Comparing the PRESS statistics on each scale:

                                       Original    Square root    Log-transformed
    Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²   172,738     137,603        122,704
    Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|    1,475.55    1,295.89       1,257.81

    The two statistics are consistent (not always the case; see next example).

    The log-transformation again seems to give the better fit.

  • Week 7 Lecture 3 - Inflation rates

  • Week 7 Lecture 3 - Inflation rates

    Comparing the PRESS statistics on each scale:

                                       Original    Log-transformed
    Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²   16,071      21,611
    Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|    433.84      431.19

    The conflict between the two statistics is due to an outlier.

    The log-transformation is preferred on the squared-PRESS statistic if the outlier is removed.

  • Week 7 Lecture 3 - Weighted regression

    If the constant error variance assumption seems to be violated, transformation is one approach to fixing the problem.

    Another approach: change the model to allow a variance which is not constant.

    Errors εᵢ, i = 1, ..., n. Suppose we know Var(εᵢ) = σᵢ² (variance of errors not necessarily constant), or suppose we know weights wᵢ such that Var(εᵢ) = σ²wᵢ where σ² is unknown.

    Write V for the covariance matrix of the errors. V is a diagonal matrix with diagonal elements σᵢ² or σ²wᵢ. V = σ²W, where W is the diagonal matrix of weights in the second situation.

  • Week 7 Lecture 3 - Weighted regression

    Maximum likelihood estimator of β:

    β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y

    where X is the design matrix and y is the vector of responses.

    Covariance matrix of β̂:

    (XᵀV⁻¹X)⁻¹.

    When V = σ²W, we have

    β̂ = (XᵀW⁻¹X)⁻¹XᵀW⁻¹y

    and the covariance matrix is

    σ²(XᵀW⁻¹X)⁻¹.
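The weighted estimator is a couple of lines of linear algebra. A minimal sketch on made-up heteroscedastic data (the weight function and parameter values are assumptions for illustration, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical heteroscedastic data: Var(eps_i) = sigma^2 * w_i, w_i known.
n = 200
x = rng.uniform(0.0, 10.0, n)
w = 1.0 + x                               # known weights (illustrative choice)
y = 2.0 + 3.0 * x + rng.normal(0.0, np.sqrt(0.5 * w))

X = np.column_stack([np.ones(n), x])
Winv = np.diag(1.0 / w)

# Weighted (maximum likelihood) estimator: (X' W^-1 X)^-1 X' W^-1 y
beta_w = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)

# Ordinary least squares for comparison (unbiased here, but less efficient).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Using `solve` rather than forming the inverse explicitly is the standard numerically safer choice.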

  • Week 7 Lecture 3 - Weighted regression

    β̂ minimizes

    Σ_{i=1}^n wᵢ⁻¹ (yᵢ − xᵢᵀβ)²,

    a weighted least squares type criterion. Observations with large variance are less reliable and get less weight.

  • Week 7 Lecture 3 - Weighted regression

    When suitable weights wᵢ are known, or when σᵢ² is known, weighted regression may be preferable to a variance stabilizing transformation.

    Much of the theory of linear models for the constant variance case can be carried over.

    We may be able to estimate weights from the data: if we have multiple observations for each combination of predictor values, variances can be estimated from the data.

    Sometimes it may be natural to take the weights wᵢ to be proportional to one of the predictors.

  • Week 7 Lecture 3 - Example: Transfer efficiency data

    Model equipment efficiency (response) as a function of air velocity and voltage.

    Experiment conducted: for each of 2 levels of air velocity and 2 levels of voltage (4 combinations in all), ten observations per combination.

    So we can estimate a variance for each of the 4 predictor combinations.

    Perform weighted regression.

  • Week 7 Lecture 3 - Example: Transfer efficiency data

    Unweighted Regression:
    The regression equation is
    Efficiency = 143 - 0.927 Voltage - 0.138 AirVelocity
    S = 5.40890   R-Sq = 79.2%   R-Sq(adj) = 78.1%

    Weighted Regression:
    The regression equation is
    Efficiency = 142 - 0.924 Voltage - 0.124 AirVelocity
    S = 0.983881   R-Sq = 87.3%   R-Sq(adj) = 86.6%
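The estimate-weights-from-replicates idea can be sketched on simulated data mimicking the 2x2-with-10-replicates design (all numbers below are made up, not the real transfer efficiency data): compute a sample variance per cell, then weight each observation by the reciprocal of its cell's variance.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical 2x2 factorial, 10 replicates per cell, cell-specific noise.
levels_v = [200.0, 300.0]   # voltage
levels_a = [50.0, 90.0]     # air velocity
rows = []
for v in levels_v:
    for a in levels_a:
        mu = 140.0 - 0.09 * v - 0.12 * a   # illustrative true model
        sd = 0.01 * v                      # noise sd differs by cell
        for yval in rng.normal(mu, sd, 10):
            rows.append((v, a, yval))

data = np.array(rows)
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
y = data[:, 2]

# Estimate a variance for each of the 4 predictor combinations
# from the replicates; use reciprocals as weights.
var_hat = {}
for v in levels_v:
    for a in levels_a:
        cell = (data[:, 0] == v) & (data[:, 1] == a)
        var_hat[(v, a)] = y[cell].var(ddof=1)
w = np.array([var_hat[(v, a)] for v, a, _ in rows])

Winv = np.diag(1.0 / w)
beta_wls = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
```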

  • Week 7 Lecture 3 - Transformations and nonlinearity

    Previously: transformations of the response to stabilize the error variance.

    Transformations can also be helpful when there is evidence of a nonlinear relationship between the response and predictors.

    Simplest case: interested in describing a relationship between a response y and a single predictor x.

    Different nonlinear relationships can be captured by transforming y, and incorporating transformation(s) of x into a linear model.

    Common nonlinear relationships: parabolic, hyperbolic, exponential, inverse exponential, power.

  • Week 7 Lecture 3 - Transformations and nonlinearity

    For the moment ignore the random component of the model. Assume y is positive.

    Parabolic relationship between y and x:

    y = β₀ + β₁x + β₂x².

    Introduce a new predictor x² (a transformation of x).

    Hyperbolic relationship between y and x:

    y = x / (β₀ + β₁x).

    Then

    1/y = β₁ + β₀ (1/x).

    Transform the response to 1/y, use predictor 1/x.
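The hyperbolic case can be sketched numerically: generate data from y = x/(β₀ + β₁x) (with noise added on the transformed scale, an assumption made so the transformed model is exactly linear), then fit an ordinary regression of 1/y on 1/x. The intercept estimates β₁ and the slope estimates β₀.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical hyperbolic data: 1/y = b1 + b0 * (1/x) + noise.
b0, b1 = 2.0, 0.5
x = rng.uniform(1.0, 10.0, 100)
inv_y = b1 + b0 / x + rng.normal(0.0, 0.05, 100)
y = 1.0 / inv_y

# Linear model on the transformed scale: regress 1/y on 1/x.
X = np.column_stack([np.ones(100), 1.0 / x])
coef, *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)
b1_hat, b0_hat = coef[0], coef[1]
```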

  • Week 7 Lecture 3 - Transformations and nonlinearity

    Exponential relationship:

    y = β₀ exp(β₁x).

    Take logarithms:

    log y = log β₀ + β₁x.

    Inverse exponential relationship:

    y = β₀ exp(β₁/x).

    Take logarithms:

    log y = log β₀ + β₁/x.

    So transform y to log y, use 1/x as predictor.

    Power relationship:

    y = β₀ x^β₁.

    Take logarithms:

    log y = log β₀ + β₁ log x.
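The power relationship is the classic log-log fit. A minimal sketch (made-up parameters, multiplicative noise assumed so the log-log model is exactly linear): regress log y on log x; the slope estimates β₁ and exp(intercept) estimates β₀.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical power-law data y = b0 * x^b1 with multiplicative noise.
b0, b1 = 3.0, 0.7
x = rng.uniform(1.0, 50.0, 200)
y = b0 * x ** b1 * np.exp(rng.normal(0.0, 0.1, 200))

# Regress log y on log x: intercept estimates log(b0), slope estimates b1.
X = np.column_stack([np.ones(200), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0_hat, b1_hat = np.exp(coef[0]), coef[1]
```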

  • Week 7 Lecture 3 - Transformations and model error structure

    The effect of transformations of the response on the model error structure needs to be carefully considered.

    Suppose

    yᵢ = β₀ exp(β₁xᵢ) + εᵢ

    where the εᵢ are zero mean errors with constant variance.

    Taking logarithms of the mean of yᵢ gives something linear in the unknown parameters. But taking logarithms of both sides above does not give a model of the form

    log yᵢ = β₀ + β₁xᵢ + εᵢ

    where the εᵢ are zero mean errors with constant variance.

    The effect of a transformation on the model errors needs to be considered. It may be better to work with the original nonlinear model: nonlinear regression (will be covered in later statistics courses).
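This point can be seen by simulation. With additive constant-variance errors on the original scale (parameter values below are made up), the error spread of log y is roughly sd(ε)/E(y), which depends on x, so the log-transformed model is not homoscedastic.

```python
import numpy as np

rng = np.random.default_rng(8)

# Additive, constant-variance errors on the ORIGINAL scale:
#   y = b0 * exp(b1 * x) + eps,  eps ~ N(0, 0.2^2)
b0, b1 = 1.0, 0.5
sd_logy = []
for xval in [2.0, 6.0]:
    y = b0 * np.exp(b1 * xval) + rng.normal(0.0, 0.2, 50000)
    sd_logy.append(np.log(y).std())

# sd_logy shrinks sharply as exp(b1 * x) grows: the spread of log y
# at x = 2 is several times the spread at x = 6.
```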

  • Week 7 Lecture 3 - Learning Expectations

    Understand the use of variance stabilizing transformations.

    Be able to perform and work with weighted regression (including estimation of weights).

    Appreciate the role of transformations for modelling non-linear relationships between a response and a single predictor.