Chapter 5: Transformations and Weighting to Correct Model Inadequacies
Ray-Bing Chen, Institute of Statistics, National University of Kaohsiung
Slide 2
5.1 Introduction
Recall several implicit assumptions:
1. The model errors have mean zero and constant variance and are uncorrelated.
2. The model errors have a normal distribution.
3. The form of the model, including the specification of the regressors, is correct.
Plots of residuals are very powerful methods for detecting violations of these basic regression assumptions.
Slide 3
In this chapter, we focus on methods and procedures for building regression models when some of the above assumptions are violated. We place considerable emphasis on data transformation. The method of weighted least squares is also useful for building regression models in situations where some of the underlying assumptions are violated.
Slide 4
5.2 Variance-Stabilizing Transformations
The assumption of constant variance is a basic requirement of regression analysis. A common reason for the violation of this assumption is that the response variable y follows a probability distribution in which the variance is functionally related to the mean. For example, if y is a Poisson random variable, then Var(y) = E(y): the variance equals the mean, and the square-root transformation y* = √y approximately stabilizes the variance.
Slide 5
Several commonly used variance-stabilizing transformations are given in Table 5.1.
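To illustrate the square-root entry of Table 5.1, here is a minimal sketch with simulated Poisson responses (the mean levels and sample sizes are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson-like responses: Var(y) = E(y), so the variance grows with the mean.
mean_levels = [2.0, 8.0, 32.0]
samples = [rng.poisson(m, size=5000) for m in mean_levels]

# Raw variances track the means...
raw_vars = [s.var() for s in samples]

# ...but after the square-root transformation y* = sqrt(y) the variances are
# roughly constant (about 1/4, by a delta-method argument).
stab_vars = [np.sqrt(s).var() for s in samples]
```

The raw variances span more than an order of magnitude while the transformed variances stay within a small factor of each other, which is exactly the behavior Table 5.1 exploits.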
Slide 6
The strength of a transformation depends on the amount of curvature that it induces. Sometimes prior experience or theoretical considerations can guide us in selecting an appropriate transformation; otherwise, the transformation must be selected empirically. If we do not correct a nonconstant error variance problem, the least-squares estimators will still be unbiased, but they will no longer have the minimum-variance property. That is, the regression coefficients will have larger standard errors than necessary.
Slide 7
When the response variable has been reexpressed, the predicted values are in the transformed scale and must be converted back to the original units; the same applies to the endpoints of confidence or prediction intervals.
Example 5.1 The Electric Utility Data: Develop a model relating peak-hour demand (y) to total energy usage during the month (x). Data (Table 5.2): 53 residential customers for the month of August 1979. Figure 5.1 shows the scatter plot of the data.
Slide 8
Slide 9
A simple linear regression model y = β0 + β1x + ε is assumed; the ANOVA is shown in Table 5.3. A plot of the R-student residuals versus the fitted values is shown in Figure 5.2. The residuals form an outward-opening funnel, indicating that the error variance increases as energy consumption increases.
Slide 10
Slide 11
This suggests the variance-stabilizing transformation y* = √y. The model y* = β0 + β1x + ε is fitted by least squares, and the R-student values from this fit are plotted against the new fitted values in Figure 5.3. Figure 5.3 suggests that the transformation has stabilized the variance.
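A sketch of fitting the square-root-transformed response, using synthetic data standing in for Table 5.2 (the coefficients and noise model below are hypothetical, chosen only to reproduce the funnel pattern):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 53 customers of Table 5.2: the error spread
# grows with energy usage x, mimicking the funnel in Figure 5.2.
x = rng.uniform(500.0, 4000.0, size=53)
y = 0.001 * x + rng.normal(0.0, 0.5, size=53) * np.sqrt(0.001 * x)
y = np.clip(y, 0.05, None)            # keep the response positive

# Fit sqrt(y) = beta0 + beta1 * x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, np.sqrt(y), rcond=None)
residuals = np.sqrt(y) - X @ beta     # plot these against the fitted values
```

Predictions in the original units are obtained by squaring the fitted values on the transformed scale.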
Slide 12
Slide 13
5.3 Transformations to Linearize the Model
Linear regression assumes a linear relationship between y and the regressors. Nonlinearity may be detected via the lack-of-fit test, the scatter diagram, the matrix of scatterplots, or residual plots such as the partial regression plot. Some nonlinear models are called intrinsically or transformably linear if the corresponding nonlinear function can be linearized by a suitable transformation.
Slide 14
Figure 5.4 and Table 5.4 show several linearizable functions and the corresponding linearizing transformations.
Slide 15
Slide 16
Example 5.2 The Windmill Data
A research engineer is investigating the use of a windmill to generate electricity. He has collected data on the DC output (y) and the corresponding wind velocity (x). The data are listed in Table 5.5, and Figure 5.5 is the scatter diagram.
Slide 17
Slide 18
From Figure 5.5, the relationship between y and x may be nonlinear. Assume the simple linear regression model y = β0 + β1x + ε; the summary statistics for this model are R² = 0.8745, MS_Res = 0.0557, and F0 = 160.26. A plot of the residuals versus the fitted values is shown in Figure 5.6; clearly some other model form should be considered.
Slide 19
Slide 20
Because the DC output appears to approach an upper limit as wind velocity increases, the new model is assumed to be y = β0 + β1(1/x) + ε. Figure 5.7 is a scatter diagram with the transformed variable x' = 1/x. The new fitted regression model (in terms of x') has summary statistics R² = 0.9800, MS_Res = 0.0089, and F0 = 1128.43. Figure 5.8 shows the R-student residuals from the transformed model versus the fitted values, and Figure 5.9 shows the normal probability plot (heavy tails).
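The comparison between the straight-line fit and the reciprocal transformation can be sketched as follows (synthetic windmill-like data with hypothetical coefficients, since Table 5.5 is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical wind-velocity / DC-output pairs shaped like the windmill data:
# output rises toward an upper limit as velocity grows.
x = rng.uniform(1.0, 10.0, size=25)
y = 3.0 - 7.0 / x + rng.normal(0.0, 0.1, size=25)

def r_squared(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

ones = np.ones_like(x)
r2_linear = r_squared(np.column_stack([ones, x]), y)        # y = b0 + b1*x
r2_recip = r_squared(np.column_stack([ones, 1.0 / x]), y)   # y = b0 + b1*(1/x)
```

On data of this shape the reciprocal model fits markedly better, mirroring the jump from R² = 0.8745 to R² = 0.9800 in the example.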
Slide 21
Slide 22
Slide 23
Slide 24
Slide 25
5.4 Analytical Methods for Selecting a Transformation
While in many instances transformations are selected empirically, more formal, objective techniques can be applied to help specify an appropriate transformation.
5.4.1 Transformations on y: The Box-Cox Method
We want to transform y to correct nonnormality and/or nonconstant variance. Consider the family of power transformations y^λ.
Slide 26
Box and Cox (1964) show how the transformation parameter λ and the parameters of the regression model can be estimated simultaneously using the method of maximum likelihood. Use
  y^(λ) = (y^λ − 1) / (λ y_g^(λ−1)),  λ ≠ 0
  y^(λ) = y_g ln y,  λ = 0
where y_g = (Π y_i)^(1/n) is the geometric mean of the observations, and fit the model y^(λ) = Xβ + ε. The divisor y_g^(λ−1) is related to the Jacobian of the transformation converting the response variable y into y^(λ).
Slide 27
Computational procedure: choose λ to minimize SS_Res(λ). Use 10-20 values of λ to compute SS_Res(λ), plot SS_Res(λ) versus λ, and read the value of λ that minimizes SS_Res(λ) from the graph. A second iteration can be performed using a finer mesh of values if desired. We cannot select λ by directly comparing residual sums of squares from regressions of y^λ on x, because for each λ the responses are on a different scale; the scaled y^(λ) makes the sums of squares comparable. Once λ is selected, the analyst is free to fit the model using y^λ (λ ≠ 0) or ln y (λ = 0).
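The grid-search procedure can be sketched as follows (hypothetical data generated so that λ near 0.5 should win; `boxcox_ss_res` is an illustrative helper, not a library routine):

```python
import numpy as np

def boxcox_ss_res(y, X, lambdas):
    """SS_Res of y^(lambda) regressed on X, with the geometric-mean scaling
    that makes the residual sums of squares comparable across lambda."""
    gm = np.exp(np.mean(np.log(y)))          # geometric mean of y
    out = []
    for lam in lambdas:
        if abs(lam) < 1e-12:
            z = gm * np.log(y)               # limiting case lambda = 0
        else:
            z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ beta
        out.append(resid @ resid)
    return np.array(out)

# Hypothetical data whose variance is stabilized near lambda = 0.5:
# sqrt(y) is linear in x with constant error variance.
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=200)
y = (1.0 + 0.5 * x + rng.normal(0.0, 0.1, size=200)) ** 2
X = np.column_stack([np.ones_like(x), x])

lambdas = np.linspace(-1.0, 2.0, 31)
ss = boxcox_ss_res(y, X, lambdas)
best = lambdas[np.argmin(ss)]
```

A finer mesh around `best` would then be used for the second iteration mentioned above.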
Slide 28
An Approximate Confidence Interval for λ
The confidence interval can be useful in selecting the final value of λ. For example, if 0.596 is the value of λ minimizing SS_Res(λ) but 0.5 is inside the confidence interval, we would prefer to choose λ = 0.5; if 1 is inside the interval, then no transformation may be necessary. Maximizing the log-likelihood L(λ), an approximate 100(1 − α)% confidence interval for λ consists of those λ whose log-likelihood lies within (1/2)χ²(α, 1) of the maximum.
Slide 29
In terms of residual sums of squares, the interval consists of those λ for which SS_Res(λ) ≤ SS*, where the critical sum of squares SS* can be approximated by
  SS* = SS_Res(λ̂) · exp(t(α/2, ν)² / ν) ≈ SS_Res(λ̂) · (1 + t(α/2, ν)² / ν),
where λ̂ is the minimizing value and ν is the number of residual degrees of freedom. The approximation is based on exp(x) = 1 + x + x²/2! + …
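Reading the approximate confidence interval off the SS_Res(λ) profile can be sketched as below (hypothetical data; the t critical value is hardcoded for ν = 98 rather than looked up from a table):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data stabilized near lambda = 0.5, as before.
x = rng.uniform(1.0, 10.0, size=100)
y = (1.0 + 0.5 * x + rng.normal(0.0, 0.1, size=100)) ** 2
X = np.column_stack([np.ones_like(x), x])
gm = np.exp(np.mean(np.log(y)))

def ss_res(lam):
    z = gm * np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return r @ r

lambdas = np.linspace(-1.0, 2.0, 301)
ss = np.array([ss_res(lam) for lam in lambdas])
lam_hat = lambdas[np.argmin(ss)]

nu = len(y) - X.shape[1]           # residual degrees of freedom (98 here)
t_crit = 1.984                     # roughly t(0.025, 98); about 1.96 for large nu
ss_star = ss.min() * (1.0 + t_crit ** 2 / nu)

inside = lambdas[ss <= ss_star]    # grid values inside the approximate 95% CI
ci = (inside.min(), inside.max())
```

The interval `ci` plays the same role as the [0.26, 0.80] interval found for the electric utility data in Example 5.3.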
Slide 30
Example 5.3 The Electric Utility Data
Use the Box-Cox procedure to select a variance-stabilizing transformation. The values of SS_Res(λ) for various values of λ are shown in Table 5.7, and a graph of the residual sum of squares versus λ is shown in Figure 5.10. The optimal value is λ = 0.5. For an approximate 95% confidence interval, the critical sum of squares SS* is 104.23, giving the interval [0.26, 0.80].
Slide 31
Slide 32
5.4.2 Transformations on the Regressor Variables
Suppose that the relationship between y and one or more of the regressor variables is nonlinear but that the usual assumptions of normally and independently distributed responses with constant variance are at least approximately satisfied. We want to select an appropriate transformation of the regressor variables so that the relationship between y and the transformed regressor is as simple as possible. Box and Tidwell (1962) describe an analytical procedure for determining the form of the transformation on x.
Slide 33
Assume that y is related to a power of the regressor, ξ = x^α for α ≠ 0 and ξ = ln x for α = 0, through the model E(y) = β0 + β1ξ.
Slide 34
(The iterative Box-Tidwell estimation of α is derived on the original slide: expand x^α about the initial guess α0 = 1, fit the expanded model containing the extra term x ln x, and update α from the ratio of its coefficient to the slope of the first-stage fit.)
Slide 35
Box and Tidwell (1962) note that this procedure usually converges quite rapidly, and often the first-stage result α1 is a satisfactory estimate of α. Convergence problems may be encountered when the error standard deviation is large or when the range of the regressor is very small compared to its mean.
Example 5.4 The Windmill Data
Figure 5.5 suggests that the relationship between y and x is not a straight line!
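The iterative procedure can be sketched for a single regressor as follows (a hypothetical example with true exponent α = 0.5; `box_tidwell` is an illustrative helper, not a library routine):

```python
import numpy as np

def box_tidwell(x, y, iters=4):
    """One-regressor Box-Tidwell sketch: estimate alpha in E(y) = b0 + b1 * x^alpha."""
    alpha = 1.0
    for _ in range(iters):
        w = x ** alpha
        X1 = np.column_stack([np.ones_like(w), w])
        b = np.linalg.lstsq(X1, y, rcond=None)[0]        # first-stage fit: y on w
        X2 = np.column_stack([np.ones_like(w), w, w * np.log(w)])
        g = np.linalg.lstsq(X2, y, rcond=None)[0]        # augmented fit adds w*ln(w)
        alpha *= g[2] / b[1] + 1.0                       # update the exponent
    return alpha

# Hypothetical data with true exponent alpha = 0.5.
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 10.0, size=200)
y = 2.0 + 3.0 * np.sqrt(x) + rng.normal(0.0, 0.05, size=200)
alpha_hat = box_tidwell(x, y)
```

As Box and Tidwell observe, the first iteration already lands close to the true exponent here, and later iterations only refine it.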
Slide 36
Slide 37
5.5 Generalized and Weighted Least Squares
Linear regression models with nonconstant error variance can also be fitted by the method of weighted least squares. Choose the weights w_i proportional to 1/Var(ε_i). For simple linear regression, the weighted least-squares function is S(β0, β1) = Σ w_i (y_i − β0 − β1 x_i)², and differentiating with respect to β0 and β1 gives the normal equations.
Slide 38
5.5.1 Generalized Least Squares
Model: y = Xβ + ε, where E(ε) = 0 and Var(ε) = σ²V. Since σ²V is a covariance matrix, V must be nonsingular and positive definite, so there exists a nonsingular symmetric matrix K such that KK = V; K is called the square root of V. Premultiplying the model by K^-1 gives the new model z = Bβ + g, where z = K^-1 y, B = K^-1 X, and g = K^-1 ε.
Slide 39
Then E(g) = 0 and Var(g) = σ²I, so the transformed model satisfies the usual assumptions. The least-squares function becomes S(β) = (y − Xβ)' V^-1 (y − Xβ), and minimizing it gives the generalized least-squares estimator (X' V^-1 X)^-1 X' V^-1 y.
Slide 40
(The GLS normal equations, the estimator, and its covariance matrix σ²(X' V^-1 X)^-1 are shown on the original slide.)
Slide 41
5.5.2 Weighted Least Squares
Assume that V is diagonal, so the errors are uncorrelated but have unequal variances. The estimation procedure is then usually called weighted least squares. W = V^-1 is also a diagonal matrix with diagonal elements (weights) w_1, …, w_n.
Slide 42
The normal equations are (X'WX)β = X'Wy, so the weighted least-squares estimator is (X'WX)^-1 X'Wy. Equivalently, multiplying each observation (including the intercept column) by √w_i produces a transformed set of data to which ordinary least squares applies.
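Both forms of the weighted least-squares estimator can be sketched as follows (hypothetical heteroscedastic data with Var(ε_i) proportional to x_i):

```python
import numpy as np

rng = np.random.default_rng(7)

# Heteroscedastic data: Var(eps_i) = sigma^2 * x_i, so the weights are w_i = 1/x_i.
n = 100
x = rng.uniform(1.0, 20.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, size=n) * np.sqrt(x)
X = np.column_stack([np.ones(n), x])
w = 1.0 / x

# Normal-equations form: (X' W X) beta = X' W y with W = diag(w).
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Transformed-data form: scale every row by sqrt(w_i), then ordinary least squares.
sw = np.sqrt(w)
beta_trans = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
```

The two routes are algebraically identical; the transformed-data view is handy because any OLS software can then be used unchanged.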
Slide 43
5.5.3 Some Practical Issues
To use weighted least squares, the weights w_i must be known! Sometimes prior knowledge, experience, or information from a theoretical model can be used to determine the weights. Alternatively, residual analysis may indicate that the variance of the errors is a function of one of the regressors, say Var(ε_i) = σ²x_ij, in which case w_i = 1/x_ij. In some cases y_i is actually an average of n_i observations at x_i; if all the original observations have constant variance σ², then Var(y_i) = Var(ε_i) = σ²/n_i, and w_i = n_i.
Slide 44
Another possibility is weights inversely proportional to the variances of the measurement errors. Several iterations may be needed: guess at the weights, perform the analysis, and then re-estimate the weights from the results. When Var(ε) = σ²V with V ≠ I, the ordinary least-squares estimator is still unbiased, but its covariance matrix is σ²(X'X)^-1 X'VX (X'X)^-1. This estimator is no longer a minimum-variance estimator, because the covariance matrix of the generalized least-squares estimator gives smaller variances for the regression coefficients.
Slide 45
Example 5.5 Weighted Least Squares
Thirty restaurants: average monthly food sales (y) versus annual advertising expenses (x) (Table 5.9). Using ordinary least squares, Figure 5.11 plots the residuals versus the fitted values; this figure indicates a violation of the constant-variance assumption. Consider near-neighbors as approximate repeat points for estimating the error variance.
Slide 46
Slide 47
The variance estimates from the near-neighbor groups are regressed against the corresponding average responses, and the fitted values from this equation are taken as the inverses of the weights. After the weighted least-squares fit, plot the weighted residuals (√w_i e_i) versus the weighted fitted values; see Figure 5.12. With several regressors it is not easy to identify the near-neighbors, so check carefully whether the weighting procedure is reasonable!
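A simplified version of this weighting scheme can be sketched as follows; squared OLS residuals regressed on x stand in for the near-neighbor variance estimates of Example 5.5, and all data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical sales-versus-advertising style data with variance growing in x.
n = 120
x = rng.uniform(1.0, 10.0, size=n)
y = 5.0 + 3.0 * x + rng.normal(0.0, 0.5, size=n) * np.sqrt(x)
X = np.column_stack([np.ones(n), x])

# Step 1: ordinary least squares.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_ols

# Step 2: model the error variance by regressing squared residuals on x
# (a crude stand-in for the near-neighbor variance estimates).
var_fit = np.linalg.lstsq(X, resid ** 2, rcond=None)[0]
var_hat = np.clip(X @ var_fit, 1e-6, None)     # guard against negative fits

# Step 3: weights are inverses of the fitted variances; refit by WLS.
w = 1.0 / var_hat
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Weighted residuals sqrt(w_i) * e_i should now show roughly constant spread.
w_resid = np.sqrt(w) * (y - X @ beta_wls)
```

In practice one or two rounds of re-estimating the weights (as slide 44 suggests) are usually enough, but the final weighted-residual plot should always be inspected.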