MATH2831/2931
Linear Models / Higher Linear Models
September 13, 2013
Week 7 Lecture 3 - Last lecture:
DFFITS
DFBETAS
COVRATIO
Week 7 Lecture 3 - This lecture:
Variance stabilizing transformations.
Examples: snow geese, inflation rates and central bank independence.
Weighted regression.
Transformations for modelling non-linear relationships between a response and a single predictor.
Week 7 Lecture 3 - Transformations
Diagnostic measures detect violations of assumptions: how do we deal with these violations?
One approach which is sometimes applicable: use of transformations.
The appropriateness of a transformation and the type of transformation used depend on the nature of the violations of assumptions.
Are there problems with: constant variance, independence of the noise random variables ε_i, the distributional assumption for the noise, the mean...?
Week 7 Lecture 3 - Transformations
This lecture: variance stabilizing transformations (appropriate when constancy of the error variance is violated).
Applying a transformation to fix one violation of assumptions may cause another violation which did not appear on the original scale.
What is the effect of the transformation on the model for the errors?
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response. Common variance stabilizing transformations:
Square root transformation, √y: appropriate when the error variance is proportional to the mean.
Exercise: demonstrate this result by taking an appropriate first order Taylor series expansion of the response.
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response.
First order Taylor series expansion of √y about the expected value E(y) of y: √y is approximately

    √E(y) + (1 / (2√E(y))) (y − E(y)).

From this, the variance of √y is approximately

    Var(y) / (4E(y)).
Week 7 Lecture 3 - Variance stabilizing transformations
So if the variance of y is proportional to the mean, the variance of √y is approximately constant.
The square root transformation is often used with count data:
if a Poisson distribution is an appropriate model for the counts, then the variance is equal to the mean (a property of the Poisson distribution):

    E(N) = Var(N) = λ.
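This mean-variance argument is easy to check numerically. A minimal sketch on simulated Poisson counts (not data from the lectures): the raw variance grows with the mean, while the variance of √y stays near 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson counts: Var(y) = E(y), so the spread grows with the mean,
# but by the Taylor argument Var(sqrt(y)) ~ Var(y) / (4 E(y)) = 1/4.
for mu in [5, 20, 80, 320]:
    y = rng.poisson(mu, size=20_000)
    print(mu, round(y.var(), 1), round(np.sqrt(y).var(), 3))
```

The printed second column tracks the mean, while the third column is roughly constant, as the approximation predicts.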
Week 7 Lecture 3 - Variance stabilizing transformations
Write y for a typical response. Other common variance stabilizing transformations:
log y: appropriate when the standard deviation is proportional to the mean.
Exercise: demonstrate this result by taking an appropriate first order Taylor series expansion of the response.
Week 7 Lecture 3 - Variance stabilizing transformations
Other common transformations.
Log transformation: log y is approximately

    log E(y) + (y − E(y)) / E(y),

giving the variance of log y as approximately

    Var(y) / E(y)².

If the standard deviation is proportional to the mean, the log transformation stabilizes the variance.
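The same kind of numerical check works for the log transformation, assuming simulated data in which the standard deviation is proportional to the mean (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# sd(y) proportional to E(y): multiplicative errors y = mu * exp(eps).
# Then Var(log y) ~ Var(y) / E(y)^2 is the same at every mean level.
for mu in [2.0, 10.0, 50.0]:
    y = mu * np.exp(rng.normal(0.0, 0.2, size=20_000))
    print(mu, round(y.std() / mu, 3), round(np.log(y).var(), 3))
```

The ratio sd(y)/mean and the variance of log y are both stable across mean levels, which is exactly the stabilization claimed above.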
Week 7 Lecture 3 - Variance stabilizing transformations
Inverse transformation 1/y: useful when the error standard deviation is proportional to the square of the mean.
All the transformations we've considered are only appropriate with a positive response.
When zeros occur, we might use log(y + 1) or 1/(y + 1) instead of the log and inverse transformations.
Week 7 Lecture 3 - Variance stabilizing transformations
Freeman-Tukey transformation √y + √(y + 1): useful when the variance of y is proportional to the mean and some of the responses are zero or very small.
sin⁻¹(√y): useful when the variance of y is proportional to E(y)(1 − E(y)), such as binomial proportions, where 0 ≤ y ≤ 1.
Week 7 Lecture 3 - Comparing models for transformations of the response
If we're building a model for predictive purposes, we're usually interested in predictions on the original scale.
We can't compare models for a response y and a transformation z = f(y) (f invertible) of that response by looking at R² or σ̂² for the two different models.
How should we compare the models then?
Week 7 Lecture 3 - Comparing models for transformations of the response
Develop a statistic for the model for the transformed response z which can be compared with the PRESS statistic for the model for y.
Write ẑ_{i,−i} for the prediction of z_i based on the fit to all the data with the ith observation deleted.
Prediction of y_i (original scale): f⁻¹(ẑ_{i,−i}).
Week 7 Lecture 3 - Comparing models for transformations of the response
Analogue of the PRESS residual on the original scale:

    y_i − f⁻¹(ẑ_{i,−i}).

Compare

    Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²

with the PRESS statistic, or

    Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|

with the sum of absolute PRESS residuals.
The same idea can be used to compare models for two different transformations of the response.
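A sketch of this comparison for a straight-line fit and a log transformation. The function name and the simulated data are illustrative, not from the course:

```python
import numpy as np

def loo_predictions(x, y):
    """Deleted-case (leave-one-out) predictions from a straight-line
    least squares fit: the i-th prediction uses all data but case i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X = np.column_stack([np.ones(n - 1), x[keep]])
        b, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        preds[i] = b[0] + b[1] * x[i]
    return preds

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=40)
y = np.exp(0.3 * x + rng.normal(0.0, 0.2, size=40))  # a positive response

# PRESS for the model on the original scale.
press_y = np.sum((y - loo_predictions(x, y)) ** 2)

# Model for z = log(y): back-transform each deleted-case prediction
# with f^{-1} = exp, then sum squared errors on the original scale.
press_z = np.sum((y - np.exp(loo_predictions(x, np.log(y)))) ** 2)

print(press_y, press_z)  # the smaller value indicates the better model
```

Because the back-transformed predictions and the raw predictions are errors on the same (original) scale, the two sums are directly comparable.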
Week 7 Lecture 3 - Snow Geese
Aerial survey counting methods to estimate snow geese numbers (Hudson Bay, Canada).
Test the reliability of expert counters.
Record the expert's estimated number of birds and compare to the exact photo counts.
A common model for count data is the Poisson distribution (mean = variance).
A square root transformation is therefore expected to be appropriate.
Also consider the log transformation.
Week 7 Lecture 3 - Snow Geese
The transformation is an improvement. There is still concern that the variance increases with the mean - try the log?
Week 7 Lecture 3 - Snow Geese
The log transform appears better than the square root transform. However, we may work with the square root for interpretability (the data are counts).
Week 7 Lecture 3 - Snow Geese
Comparing the PRESS statistics on each scale:

                                      Original    Square root    Log-transformed
Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²      172,738     137,603        122,704
Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|       1,475.55    1,295.89       1,257.81

The two statistics are consistent (not always the case - see the next example).
The log transformation again seems to give the better fit.
Week 7 Lecture 3 - Inflation rates
Comparing the PRESS statistics on each scale:

                                      Original    Log-transformed
Σ_{i=1}^n (y_i − f⁻¹(ẑ_{i,−i}))²      16,071      21,611
Σ_{i=1}^n |y_i − f⁻¹(ẑ_{i,−i})|       433.84      431.19

The conflict between the two statistics is due to an outlier.
The log transformation is preferred on the squared-PRESS statistic if the outlier is removed.
Week 7 Lecture 3 - Weighted regression
If the constant error variance assumption seems to be violated, transformation is one approach to fixing the problem.
Another approach: change the model to allow a variance which is not constant.
Errors ε_i, i = 1, ..., n. Suppose we know Var(ε_i) = σ_i² (the error variances are not necessarily constant), or suppose we know weights w_i such that Var(ε_i) = σ²w_i, where σ² is unknown.
Write V for the covariance matrix of the errors. V is a diagonal matrix with diagonal elements σ_i² or σ²w_i; in the second situation V = σ²W, where W is the diagonal matrix of weights.
Week 7 Lecture 3 - Weighted regression
Maximum likelihood estimator of β:

    β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y,

where X is the design matrix and y is the vector of responses.
Covariance matrix of β̂:

    (XᵀV⁻¹X)⁻¹.

When V = σ²W, we have

    β̂ = (XᵀW⁻¹X)⁻¹XᵀW⁻¹y,

and the covariance matrix is

    σ²(XᵀW⁻¹X)⁻¹.
Week 7 Lecture 3 - Weighted regression
β̂ minimizes

    Σ_{i=1}^n w_i⁻¹ (y_i − x_iᵀβ)²,

a weighted least squares type criterion. Observations with large variance are less reliable and get less weight.
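A sketch of the weighted estimator under the assumption Var(ε_i) = σ²w_i with known weights. The simulated data and parameter values are illustrative only:

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solves (X' W^-1 X) beta = X' W^-1 y
    for W = diag(w), i.e. minimizes sum w_i^{-1} (y_i - x_i' beta)^2."""
    Winv = np.diag(1.0 / w)
    return np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)

rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=100)
X = np.column_stack([np.ones_like(x), x])

# Error variance proportional to the predictor, so take w_i = x_i.
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=100) * np.sqrt(x)

print(wls(X, y, w=x))  # estimates of the intercept and slope
```

With all weights equal, the estimator reduces to ordinary least squares, which is a quick sanity check on the implementation.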
Week 7 Lecture 3 - Weighted regression
When suitable weights w_i are known, or when the σ_i² are known, weighted regression may be preferable to a variance stabilizing transformation.
Much of the theory of linear models for the constant variance case can be carried over.
We may be able to estimate weights from the data: if we have multiple observations for each combination of predictor values, the variances can be estimated from the data.
Sometimes it may be natural to take the weights w_i to be proportional to one of the predictors.
Week 7 Lecture 3 - Example: Transfer efficiency data
Model equipment efficiency (the response) as a function of air velocity and voltage.
Experiment conducted: for each of 2 levels of air velocity and 2 levels of voltage (4 combinations in all), ten observations per combination.
So we can estimate a variance for each of the 4 predictor combinations.
Perform weighted regression.
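This recipe (estimate a variance per cell from the replicates, then weight by it) can be sketched as follows. The 2 x 2 design loosely mirrors the setup above, but all coefficients, levels and replicate data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical 2 x 2 design (voltage, air velocity) with 10 replicates
# per cell; these values are invented, not the transfer efficiency data.
cells = [(40.0, 60.0), (40.0, 90.0), (70.0, 60.0), (70.0, 90.0)]
rows, ys, wts = [], [], []
for v, a in cells:
    obs = 120.0 - 0.9 * v - 0.1 * a + rng.normal(0.0, 0.02 * v, size=10)
    s2 = obs.var(ddof=1)            # variance estimated from the replicates
    for yi in obs:
        rows.append([1.0, v, a])
        ys.append(yi)
        wts.append(s2)              # weight w_i = estimated cell variance

X, y, w = np.array(rows), np.array(ys), np.array(wts)

# Dividing each row by sqrt(w_i) turns the weighted criterion
# sum w_i^{-1} (y_i - x_i' beta)^2 into ordinary least squares.
beta, *_ = np.linalg.lstsq(X / np.sqrt(w)[:, None], y / np.sqrt(w), rcond=None)
print(beta)  # estimates of the intercept and the two slopes
```

The rescaling trick is a standard way to reuse an ordinary least squares routine for weighted regression.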
Week 7 Lecture 3 - Example: Transfer efficiency data

Unweighted regression:
The regression equation is
Efficiency = 143 - 0.927 Voltage - 0.138 AirVelocity
S = 5.40890   R-Sq = 79.2%   R-Sq(adj) = 78.1%

Weighted regression:
The regression equation is
Efficiency = 142 - 0.924 Voltage - 0.124 AirVelocity
S = 0.983881   R-Sq = 87.3%   R-Sq(adj) = 86.6%
Week 7 Lecture 3 - Transformations and nonlinearity
Previously: transformations of the response to stabilize the error variance.
Transformations can also be helpful when there is evidence of a nonlinear relationship between the response and the predictors.
Simplest case: interested in describing a relationship between a response y and a single predictor x.
Different nonlinear relationships can be captured by transforming y and incorporating transformation(s) of x into a linear model.
Common nonlinear relationships: parabolic, hyperbolic, exponential, inverse exponential, power.
Week 7 Lecture 3 - Transformations and nonlinearity
For the moment ignore the random component of the model. Assume y is positive.
Parabolic relationship between y and x:

    y = β₀ + β₁x + β₂x².

Introduce the new predictor x² (a transformation of x).
Hyperbolic relationship between y and x:

    y = x / (β₀ + β₁x).

Then

    1/y = β₁ + β₀(1/x).

Transform the response to 1/y and use the predictor 1/x.
Week 7 Lecture 3 - Transformations and nonlinearity
Exponential relationship:

    y = β₀ exp(β₁x).

Take logarithms:

    log y = log β₀ + β₁x.

Inverse exponential relationship:

    y = β₀ exp(β₁/x).

Take logarithms:

    log y = log β₀ + β₁/x.

So transform y to log y and use 1/x as the predictor.
Power relationship:

    y = β₀x^β₁.

Take logarithms:

    log y = log β₀ + β₁ log x.
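A sketch of the power case: simulate y = β₀x^β₁ with multiplicative noise (the parameter values are illustrative) and recover the parameters from a straight-line fit on the log scale.

```python
import numpy as np

rng = np.random.default_rng(4)

b0, b1 = 2.0, 1.5                       # illustrative true parameters
x = rng.uniform(1.0, 10.0, size=200)
y = b0 * x**b1 * np.exp(rng.normal(0.0, 0.05, size=200))

# log y = log b0 + b1 log x: a straight line in log x.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print(np.exp(intercept), slope)  # estimates of b0 and b1
```

The multiplicative noise is what makes the log scale appropriate here: on that scale the errors become additive, so an ordinary straight-line fit applies.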
Week 7 Lecture 3 - Transformations and model error structure
The effect of transformations of the response on the model error structure needs to be carefully considered.
Suppose

    y_i = β₀ exp(β₁x_i) + ε_i,

where the ε_i are zero mean errors with constant variance.
Taking logarithms of the mean of y_i gives something linear in the unknown parameters. But taking logarithms of both sides above does not give a model of the form

    log y_i = β₀ + β₁x_i + ε_i,

where the ε_i are zero mean errors with constant variance.
The effect of the transformation on the model errors needs to be considered. It may be better to work with the original nonlinear model: nonlinear regression (will be covered in later statistics courses).
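The distinction can be seen numerically. Assuming additive constant-variance errors on the original scale (all values illustrative), the variance of log y is far from constant:

```python
import numpy as np

rng = np.random.default_rng(6)

b0, b1, sigma = 2.0, 0.3, 0.3           # illustrative values
for xi in [1.0, 4.0, 7.0, 10.0]:
    # y = b0 exp(b1 x) + eps, with constant Var(eps) on the original scale.
    y = b0 * np.exp(b1 * xi) + rng.normal(0.0, sigma, size=20_000)
    # Var(log y) ~ Var(y) / E(y)^2 shrinks as the mean grows.
    print(xi, round(np.log(y).var(), 5))
```

So additive errors on the original scale become heteroscedastic after the log transformation, which is why the transformed model cannot simply inherit the usual error assumptions.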
Week 7 Lecture 3 - Learning Expectations
Understand the use of variance stabilizing transformations.
Be able to perform and work with weighted regression (including estimation of weights).
Appreciate the role of transformations for modelling non-linear relationships between a response and a single predictor.