MATH2831/2931
Linear Models / Higher Linear Models
September 13, 2013
Week 7 Lecture 3 - Last lecture:
DFFITS
DFBETAS
COVRATIO
Week 7 Lecture 3 - This lecture:
Variance stabilizing transformations.
Examples: snow geese, inflation rates and central bank independence.
Weighted regression.
Transformations for modelling non-linear relationships between a response and a single predictor.
Week 7 Lecture 3 - Transformations
Diagnostic measures detect violations of assumptions: how do we deal with these violations?
One approach which is sometimes applicable: use of transformations.
The appropriateness of a transformation and the type of transformation used depend on the nature of the violations of assumptions.
Are there problems with: constant variance, independence of the noise random variables ε_i, the distributional assumption for the noise, the mean...?
Week 7 Lecture 3 - Transformations
This lecture: variance stabilizing transformations (appropriate when constancy of the error variance is violated).
Applying a transformation to fix one violation of assumptions may cause another violation which did not appear on the original scale.
What is the effect of the transformation on the model for the errors?
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response. Common variance stabilizing transformations:
Square root transformation, √y: appropriate when the error variance is proportional to the mean.
Exercise: demonstrate this result by taking an appropriate first order Taylor series expansion of the response.
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response.
First order Taylor series expansion of √y about the expected value E(y) of y: √y is approximately

    √E(y) + (1 / (2√E(y))) (y − E(y)).

From this, the variance of √y is approximately

    Var(y) / (4E(y)).
Week 7 Lecture 3 - Variance stabilizing transformations
So if the variance of y is proportional to the mean, the variance of √y is approximately constant.
The square root transformation is often used with count data:
if a Poisson distribution is an appropriate model for the counts, then the variance is equal to the mean (a property of the Poisson distribution):

    E(N) = Var(N) = λ.
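This mean-variance argument is easy to check numerically. A minimal sketch on simulated Poisson counts (not data from the lectures): the raw variance grows with the mean, while the variance of √y stays near 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson counts: Var(y) = E(y), so the spread grows with the mean,
# but by the Taylor argument Var(sqrt(y)) ~ Var(y) / (4 E(y)) = 1/4.
for mu in [5, 20, 80, 320]:
    y = rng.poisson(mu, size=20_000)
    print(mu, round(y.var(), 1), round(np.sqrt(y).var(), 3))
```

The printed second column tracks the mean, while the third column is roughly constant, as the approximation predicts.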
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response. Other common variance stabilizing transformations:
log y: appropriate when the standard deviation is proportional to the mean.
Exercise: demonstrate this result by taking an appropriate first order Taylor series expansion of the response.
Week 7 Lecture 3 - Variance stabilizing transformations
Other common transformations.
Log transformation: log y is approximately

    log E(y) + (y − E(y)) / E(y),

giving the variance of log y as approximately

    Var(y) / E(y)².

If the standard deviation is proportional to the mean, the log transformation stabilizes the variance.
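The same kind of numerical check works for the log transformation, assuming simulated data in which the standard deviation is proportional to the mean (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# sd(y) proportional to E(y): multiplicative errors y = mu * exp(eps).
# Then Var(log y) ~ Var(y) / E(y)^2 is the same at every mean level.
for mu in [2.0, 10.0, 50.0]:
    y = mu * np.exp(rng.normal(0.0, 0.2, size=20_000))
    print(mu, round(y.std() / mu, 3), round(np.log(y).var(), 3))
```

The ratio sd(y)/mean and the variance of log y are both stable across mean levels, which is exactly the stabilization claimed above.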
Week 7 Lecture 3 - Variance stabilizing transformations
Inverse transformation 1/y: useful when the error standard deviation is proportional to the square of the mean.
All the transformations we've considered are only appropriate with a positive response.
When zeros occur, we might use log(y + 1) or 1/(y + 1) instead of the log and inverse transformations.
Week 7 Lecture 3 - Variance stabilizing transformations
Freeman-Tukey transformation √y + √(y + 1): useful when the variance of y is proportional to the mean and some of the responses are zero or very small.
sin⁻¹(√y): useful when the variance of y is proportional to E(y)(1 − E(y)), such as binomial proportions, where 0 ≤ y ≤ 1.
Week 7 Lecture 3 - Comparing models for transformations of the response
If we're building a model for predictive purposes, we're usually interested in predictions on the original scale.
We can't compare models for a response y and a transformation z = f(y) (f invertible) of that response by looking at R² or σ̂² for the two different models.
How should we compare the models then?
Week 7 Lecture 3 - Comparing models for transformations of the response
Develop a statistic for the model for the transformed response z which can be compared with the PRESS statistic for the model for y.
Write ẑ_{i,−i} for the prediction of z_i based on the fit to all the data with the ith observation deleted.
Prediction of y_i (original scale): f⁻¹(ẑ_{i,−i}).
Week 7 Lecture 3 - Comparing models for transformations of the response
Analogue of the PRESS residual on the original scale:

    y_i − f⁻¹(ẑ_{i,−i}).

Compare

    Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²

with the PRESS statistic, or

    Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|

with the sum of absolute PRESS residuals.
The same idea can be used to compare models for two different transformations of the response.
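A sketch of this comparison for a straight-line fit and a log transformation. The function name and the simulated data are illustrative, not from the course:

```python
import numpy as np

def loo_predictions(x, y):
    """Deleted-case (leave-one-out) predictions from a straight-line
    least squares fit: the i-th prediction uses all data but case i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X = np.column_stack([np.ones(n - 1), x[keep]])
        b, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        preds[i] = b[0] + b[1] * x[i]
    return preds

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=40)
y = np.exp(0.3 * x + rng.normal(0.0, 0.2, size=40))  # a positive response

# PRESS for the model on the original scale.
press_y = np.sum((y - loo_predictions(x, y)) ** 2)

# Model for z = log(y): back-transform each deleted-case prediction
# with f^{-1} = exp, then sum squared errors on the original scale.
press_z = np.sum((y - np.exp(loo_predictions(x, np.log(y)))) ** 2)

print(press_y, press_z)  # the smaller value indicates the better model
```

Because the back-transformed predictions and the raw predictions are errors on the same (original) scale, the two sums are directly comparable.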
Week 7 Lecture 3 - Snow Geese
Aerial survey counting methods to estimate snow geese numbers (Hudson Bay, Canada).
Test the reliability of expert counters.
Record the expert's estimated number of birds and compare to the exact photo counts.
A common model for count data is the Poisson distribution (mean = variance).
A square root transformation is therefore expected to be appropriate.
Also consider the log transformation.
Week 7 Lecture 3 - Snow Geese
The transformation is an improvement. There is still concern that the variance increases with the mean - try the log?
Week 7 Lecture 3 - Snow Geese
The log transform appears better than the square root transform. However, we may work with the square root for interpretability (the data are counts).
Week 7 Lecture 3 - Snow Geese
Comparing the PRESS statistics on each scale:

                                      Original    Square root    Log-transformed
Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²      172,738     137,603        122,704
Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|       1,475.55    1,295.89       1,257.81

The two statistics are consistent (not always the case - see the next example).
The log transformation again seems to give the better fit.
Week 7 Lecture 3 - Inflation rates
Comparing the PRESS statistics on each scale:

                                      Original    Log-transformed
Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²      16,071      21,611
Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|       433.84      431.19

The conflict between the two statistics is due to an outlier.
The log transformation is preferred on the squared-PRESS statistic if the outlier is removed.
Week 7 Lecture 3 - Weighted regression
If the constant error variance assumption seems to be violated, transformation is one approach to fixing the problem.
Another approach: change the model to allow a variance which is not constant.
Errors ε_i, i = 1, ..., n. Suppose we know Var(ε_i) = σ_i² (the error variances are not necessarily constant), or suppose we know weights w_i such that Var(ε_i) = σ²w_i, where σ² is unknown.
Write V for the covariance matrix of the errors. V is a diagonal matrix with diagonal elements σ_i² or σ²w_i; in the second situation V = σ²W, where W is the diagonal matrix of weights.
Week 7 Lecture 3 - Weighted regression
Maximum likelihood estimator of β:

    β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y,

where X is the design matrix and y is the vector of responses.
Covariance matrix of β̂:

    (XᵀV⁻¹X)⁻¹.

When V = σ²W, we have

    β̂ = (XᵀW⁻¹X)⁻¹XᵀW⁻¹y,

and the covariance matrix is

    σ²(XᵀW⁻¹X)⁻¹.
Week 7 Lecture 3 - Weighted regression
β̂ minimizes

    Σ_{i=1}^n w_i⁻¹ (y_i − x_iᵀβ)²,

a weighted least squares type criterion. Observations with large variance are less reliable and get less weight.
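A sketch of the weighted estimator under the assumption Var(ε_i) = σ²w_i with known weights. The simulated data and parameter values are illustrative only:

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solves (X' W^-1 X) beta = X' W^-1 y
    for W = diag(w), i.e. minimizes sum w_i^{-1} (y_i - x_i' beta)^2."""
    Winv = np.diag(1.0 / w)
    return np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)

rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=100)
X = np.column_stack([np.ones_like(x), x])

# Error variance proportional to the predictor, so take w_i = x_i.
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=100) * np.sqrt(x)

print(wls(X, y, w=x))  # estimates of the intercept and slope
```

With all weights equal, the estimator reduces to ordinary least squares, which is a quick sanity check on the implementation.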
Week 7 Lecture 3 - Weighted regression
When suitable weights w_i are known, or when the σ_i² are known, weighted regression may be preferable to a variance stabilizing transformation.
Much of the theory of linear models for the constant variance case can be carried over.
We may be able to estimate weights from the data: if we have multiple observations for each combination of predictor values, the variances can be estimated from the data.
Sometimes it may be natural to take the weights w_i to be proportional to one of the predictors.
Week 7 Lecture 3 - Example: Transfer efficiency data
Model equipment efficiency (the response) as a function of air velocity and voltage.
Experiment conducted: for each of 2 levels of air velocity and 2 levels of voltage (4 combinations in all), ten observations per combination.
So we can estimate a variance for each of the 4 predictor combinations.
Perform weighted regression.
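This recipe (estimate a variance per cell from the replicates, then weight by it) can be sketched as follows. The 2 x 2 design loosely mirrors the setup above, but all coefficients, levels and replicate data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical 2 x 2 design (voltage, air velocity) with 10 replicates
# per cell; these values are invented, not the transfer efficiency data.
cells = [(40.0, 60.0), (40.0, 90.0), (70.0, 60.0), (70.0, 90.0)]
rows, ys, wts = [], [], []
for v, a in cells:
    obs = 120.0 - 0.9 * v - 0.1 * a + rng.normal(0.0, 0.02 * v, size=10)
    s2 = obs.var(ddof=1)            # variance estimated from the replicates
    for yi in obs:
        rows.append([1.0, v, a])
        ys.append(yi)
        wts.append(s2)              # weight w_i = estimated cell variance

X, y, w = np.array(rows), np.array(ys), np.array(wts)

# Dividing each row by sqrt(w_i) turns the weighted criterion
# sum w_i^{-1} (y_i - x_i' beta)^2 into ordinary least squares.
beta, *_ = np.linalg.lstsq(X / np.sqrt(w)[:, None], y / np.sqrt(w), rcond=None)
print(beta)  # estimates of the intercept and the two slopes
```

The rescaling trick is a standard way to reuse an ordinary least squares routine for weighted regression.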
Week 7 Lecture 3 - Example: Transfer efficiency data

Unweighted regression:
The regression equation is
Efficiency = 143 - 0.927 Voltage - 0.138 AirVelocity
S = 5.40890   R-Sq = 79.2%   R-Sq(adj) = 78.1%

Weighted regression:
The regression equation is
Efficiency = 142 - 0.924 Voltage - 0.124 AirVelocity
S = 0.983881   R-Sq = 87.3%   R-Sq(adj) = 86.6%
Week 7 Lecture 3 - Transformations and nonlinearity
Previously: transformations of the response to stabilize the error variance.
Transformations can also be helpful when there is evidence of a nonlinear relationship between the response and the predictors.
Simplest case: interested in describing a relationship between a response y and a single predictor x.
Different nonlinear relationships can be captured by transforming y and incorporating transformation(s) of x into a linear model.
Common nonlinear relationships: parabolic, hyperbolic, exponential, inverse exponential, power.
Week 7 Lecture 3 - Transformations and nonlinearity
For the moment ignore the random component of the model. Assume y is positive.
Parabolic relationship between y and x:

    y = β₀ + β₁x + β₂x².

Introduce the new predictor x² (a transformation of x).
Hyperbolic relationship between y and x:

    y = x / (β₀ + β₁x).

Then

    1/y = β₁ + β₀(1/x).

Transform the response to 1/y and use the predictor 1/x.
Week 7 Lecture 3 - Transformations and nonlinearity
Exponential relationship:

    y = β₀ exp(β₁x).

Take logarithms:

    log y = log β₀ + β₁x.

Inverse exponential relationship:

    y = β₀ exp(β₁/x).

Take logarithms:

    log y = log β₀ + β₁/x.

So transform y to log y and use 1/x as the predictor.
Power relationship:

    y = β₀x^β₁.

Take logarithms:

    log y = log β₀ + β₁ log x.
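A sketch of the power case: simulate y = β₀x^β₁ with multiplicative noise (the parameter values are illustrative) and recover the parameters from a straight-line fit on the log scale.

```python
import numpy as np

rng = np.random.default_rng(4)

b0, b1 = 2.0, 1.5                       # illustrative true parameters
x = rng.uniform(1.0, 10.0, size=200)
y = b0 * x**b1 * np.exp(rng.normal(0.0, 0.05, size=200))

# log y = log b0 + b1 log x: a straight line in log x.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print(np.exp(intercept), slope)  # estimates of b0 and b1
```

The multiplicative noise is what makes the log scale appropriate here: on that scale the errors become additive, so an ordinary straight-line fit applies.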
Week 7 Lecture 3 - Transformations and model error structure
The effect of transformations of the response on the model error structure needs to be carefully considered.
Suppose

    y_i = β₀ exp(β₁x_i) + ε_i,

where the ε_i are zero mean errors with constant variance.
Taking logarithms of the mean of y_i gives something linear in the unknown parameters. But taking logarithms of both sides above does not give a model of the form

    log y_i = β₀ + β₁x_i + ε_i,

where the ε_i are zero mean errors with constant variance.
The effect of the transformation on the model errors needs to be considered. It may be better to work with the original nonlinear model: nonlinear regression (will be covered in later statistics courses).
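The distinction can be seen numerically. Assuming additive constant-variance errors on the original scale (all values illustrative), the variance of log y is far from constant:

```python
import numpy as np

rng = np.random.default_rng(6)

b0, b1, sigma = 2.0, 0.3, 0.3           # illustrative values
for xi in [1.0, 4.0, 7.0, 10.0]:
    # y = b0 exp(b1 x) + eps, with constant Var(eps) on the original scale.
    y = b0 * np.exp(b1 * xi) + rng.normal(0.0, sigma, size=20_000)
    # Var(log y) ~ Var(y) / E(y)^2 shrinks as the mean grows.
    print(xi, round(np.log(y).var(), 5))
```

So additive errors on the original scale become heteroscedastic after the log transformation, which is why the transformed model cannot simply inherit the usual error assumptions.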
Week 7 Lecture 3 - Learning Expectations
Understand the use of variance stabilizing transformations.
Be able to perform and work with weighted regression (including estimation of weights).
Appreciate the role of transformations for modelling non-linear relationships between a response and a single predictor.