ETM 620 - 09U1
Multiple regression
More than one indicator variable may be responsible for the variation we see in the response.
  Gas mileage is a function of weight, horsepower, use of air conditioning, etc.
  Metal fatigue in airplanes is a function of number of takeoffs and landings, climbout speed, landing speed, etc.
  Incidence of heart attack is a function of age, BMI, cholesterol levels, etc.
If the function that defines the relationship between the indicator variables and the response is linear, then we have multiple linear regression, i.e.,
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$$
If a polynomial relationship between indicators and response is the best fit, then we have polynomial regression, e.g.,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon$$
ETM 620 - 09U2
Multiple linear regression: Matrix approach
The viscosity of slurry is believed to be a function of the temperature and the feed rate. A number of readings were taken with the following results:

Temp   Feed Rate   Viscosity
80     8           2256
93     9           2340
100    10          2426
82     13          2293
90     11          2330
99     8           2368
81     8           2250
96     10          2409
94     12          2364
93     11          2379
97     13          2440
95     11          2364
100    8           2404
85     12          2317
86     9           2309
87     12          2328

Hypothesize the relationship,
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
and calculate the estimate,
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
ETM 620 - 09U3
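Matrix form of the equation
Define the matrices:
$$
Y = \begin{bmatrix} 2256 \\ 2340 \\ 2426 \\ 2293 \\ 2330 \\ \vdots \\ 2328 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & 80 & 8 \\ 1 & 93 & 9 \\ 1 & 100 & 10 \\ 1 & 82 & 13 \\ 1 & 90 & 11 \\ \vdots & \vdots & \vdots \\ 1 & 87 & 12 \end{bmatrix}, \quad
b = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
$$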
ETM 620 - 09U4
General Matrix Form
We obtain the least squares estimates (b0, b1, b2) of (β0, β1, β2) by solving the matrix equation
$$X^T X \, b = X^T Y$$
for b, or
$$b = (X^T X)^{-1} X^T Y$$
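A short sketch of where this matrix equation comes from, assuming the usual least squares criterion. The symbol SS_E for the sum of squared residuals is introduced here for illustration and does not appear on the slide.

```latex
\begin{aligned}
SS_E(b) &= (Y - Xb)^T (Y - Xb) = Y^T Y - 2\, b^T X^T Y + b^T X^T X\, b \\
\frac{\partial SS_E}{\partial b} &= -2\, X^T Y + 2\, X^T X\, b = 0 \\
\Rightarrow \quad X^T X\, b &= X^T Y
\end{aligned}
```

Setting the gradient to zero gives the normal equations; they have a unique solution when X^T X is invertible.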
ETM 620 - 09U5
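From Excel,
$$
X^T X = \begin{bmatrix} 16 & 1458 & 165 \\ 1458 & 133560 & 15028 \\ 165 & 15028 & 1751 \end{bmatrix}, \quad
(X^T X)^{-1} = \begin{bmatrix} 14.519 & -0.1327 & -0.2291 \\ -0.1327 & 0.0014 & 0.0002 \\ -0.2291 & 0.0002 & 0.0203 \end{bmatrix}
$$
$$
X^T Y = \begin{bmatrix} 37577 \\ 3429550 \\ 387855 \end{bmatrix}, \quad
b = \begin{bmatrix} 1560.67 \\ 7.73 \\ 8.11 \end{bmatrix}
$$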
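The same calculation can be sketched outside Excel. The following is a minimal illustration in plain Python, assuming the viscosity table from the earlier slide; the helper `solve` is a hypothetical name written for this sketch (a small Gauss-Jordan elimination) and is not part of the original material.

```python
# Viscosity data from the slides: (temperature, feed rate, viscosity)
data = [
    (80, 8, 2256), (93, 9, 2340), (100, 10, 2426), (82, 13, 2293),
    (90, 11, 2330), (99, 8, 2368), (81, 8, 2250), (96, 10, 2409),
    (94, 12, 2364), (93, 11, 2379), (97, 13, 2440), (95, 11, 2364),
    (100, 8, 2404), (85, 12, 2317), (86, 9, 2309), (87, 12, 2328),
]

# Design matrix X (leading column of ones) and response vector Y
X = [[1.0, temp, feed] for temp, feed, _ in data]
Y = [visc for _, _, visc in data]
p = len(X[0])

# Form X'X (p x p) and X'Y (length p)
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]


def solve(A, rhs):
    """Solve A b = rhs by Gauss-Jordan elimination with partial pivoting
    (hypothetical helper for this sketch)."""
    n = len(rhs)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Least squares estimates (b0, b1, b2) from the normal equations
b = solve(XtX, XtY)
print("X'X =", XtX)
print("X'Y =", XtY)
print("b   =", [round(v, 2) for v in b])
```

Solving the normal equations directly avoids forming the inverse explicitly, which is usually the numerically safer choice; the inverse is still useful later for standard errors.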
ETM 620 - 09U6
Or, using regression analysis on Excel

Regression Statistics
Multiple R      0.962059425
R Square        0.925558337
Adjusted R²     0.914105774
Std. Error      16.51595592
Observations    16

ANOVA
             df   SS         MS       F       Significance F
Regression    2   44089.84   22045    80.82   4.64306E-08
Residual     13   3546.098   272.78
Total        15   47635.94

             Coefficients   Std Err    t Stat   P-value   Lower 95%     Upper 95%
Intercept    1560.66788     62.93201   24.799   2E-12     1424.711536   1696.624225
Temp         7.728104211    0.624881   12.367   1E-08     6.378130266   9.078078155
Feed Rate    8.113563481    2.350936   3.4512   0.004     3.034676023   13.19245094
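As a rough check on the ANOVA table, the sums of squares can be recomputed from the data. This sketch assumes the viscosity table from the earlier slide and plugs in the coefficients as reported by Excel; the variable names are chosen for illustration only.

```python
# Viscosity data from the slides: (temperature, feed rate, viscosity)
data = [
    (80, 8, 2256), (93, 9, 2340), (100, 10, 2426), (82, 13, 2293),
    (90, 11, 2330), (99, 8, 2368), (81, 8, 2250), (96, 10, 2409),
    (94, 12, 2364), (93, 11, 2379), (97, 13, 2440), (95, 11, 2364),
    (100, 8, 2404), (85, 12, 2317), (86, 9, 2309), (87, 12, 2328),
]

# Coefficients as reported in the Excel output (b0, b1, b2)
b0, b1, b2 = 1560.66788, 7.728104211, 8.113563481

n = len(data)   # number of observations
p = 3           # number of estimated parameters, including the intercept

y = [visc for _, _, visc in data]
y_hat = [b0 + b1 * temp + b2 * feed for temp, feed, _ in data]
y_bar = sum(y) / n

# Total, residual, and regression sums of squares
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ssr = sst - sse

# Mean squares and the overall F statistic
msr = ssr / (p - 1)
mse = sse / (n - p)
f_stat = msr / mse

print(f"SSR = {ssr:.2f}, SSE = {sse:.3f}, SST = {sst:.2f}")
print(f"MSR = {msr:.1f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
```

The degrees of freedom (2, 13, 15) follow from p - 1, n - p, and n - 1 with n = 16 and p = 3.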
ETM 620 - 09U7
How do we interpret these results?
R² – the degree to which the variability of the data is accounted for in the model
  will naturally increase as the number of regressor variables increases
adjusted R² – adjusted to reflect how well the addition of new regressors improves the ability of the model to account for the variability in the data.
  adjusted R² increases and stays close to R² if the new term significantly decreases MSE (it never exceeds R²)
  adjusted R² << R² if the new term is not significant
In our example,
  R² = _______________ ; adj R² = ________________
Interpretation?
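The two statistics can be sketched directly from the ANOVA table, assuming the standard definitions R² = 1 - SSE/SST and adjusted R² = 1 - [SSE/(n - p)] / [SST/(n - 1)], where p counts the intercept. The numbers below are the sums of squares reported by Excel for the viscosity example.

```python
# Sums of squares from the Excel ANOVA table (viscosity example)
sse = 3546.098    # residual sum of squares
sst = 47635.94    # total sum of squares
n = 16            # observations
p = 3             # estimated parameters, including the intercept

# R^2: share of total variability accounted for by the model
r2 = 1 - sse / sst

# Adjusted R^2: the same ratio using mean squares, which penalizes extra terms
adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

Because the adjustment divides by degrees of freedom, a new regressor raises adjusted R² only when it lowers MSE.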
ETM 620 - 09U8
Confidence intervals around β values …
Calculated by,
$$\hat{\beta}_j \pm t_{\alpha/2,\,n-p} \sqrt{\hat{\sigma}^2 C_{jj}}$$
Given in the regression results …

             Coefficients   Std Err    t Stat   P-value   Lower 95%   Upper 95%
Intercept    1560.66788     62.93201   24.799   2E-12     1424.7115   1696.6242
Temp         7.728104211    0.624881   12.367   1E-08     6.3781303   9.078078
Feed Rate    8.113563481    2.350936   3.4512   0.004     3.0346760   13.19245

Interpretation?
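A minimal sketch of the interval calculation, assuming C_jj denotes the j-th diagonal element of (X^T X)^{-1}, so that the square-root term is the "Std Err" column of the Excel output. The critical value t(0.025, 13) ≈ 2.1604 is taken from a t table and is an assumption of this sketch, as are the variable names.

```python
# Coefficients and standard errors as reported in the Excel output
estimates = {
    "Intercept": (1560.66788, 62.93201),
    "Temp": (7.728104211, 0.624881),
    "Feed Rate": (8.113563481, 2.350936),
}

# Two-sided 95% critical value for n - p = 16 - 3 = 13 degrees of freedom
# (assumed from a t table, rounded to four decimals)
t_crit = 2.1604

intervals = {}
for name, (coef, std_err) in estimates.items():
    half_width = t_crit * std_err
    intervals[name] = (coef - half_width, coef + half_width)

for name, (lower, upper) in intervals.items():
    print(f"{name:10s} {lower:12.4f} {upper:12.4f}")
```

None of the three intervals contains zero, which agrees with the small P-values in the table.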
ETM 620 - 09U9
A trickier example…
The gas mileage for a passenger automobile is believed to be a function of the weight of the car and the horsepower of the engine. Several cars were tested with the following results:

MPG, y   Wt., x1   HP, x2
26       3.4       169
31       2.5       106
20       3.8       304
31       2.8       155
24       3.6       211
29       3.3       140
20       3.3       210
23       3.9       255
24       4.1       255
26       3.3       164
ETM 620 - 09U10
Regression results from Excel …

Regression Statistics
Multiple R          0.84976
R Square            0.72209
Adjusted R Square   0.64269
Standard Error      2.39433
Observations        10

ANOVA
             df   SS      MS      F       Significance F
Regression    2   104.3   52.14   9.094   0.0113149
Residual      7   40.13   5.733
Total         9   144.4

             Coefficients   Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept    36.744         7.046     5.215    0.001     20.081785   53.406268
Wt., x1      -0.1917        3.169     -0.06    0.953     -7.686279   7.3029757
HP, x2       -0.0543        0.025     -2.15    0.069     -0.114019   0.0054114
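One possible reason this example is "trickier" (an interpretation, not stated on the slide) is that weight and horsepower are strongly related to each other, so the model has difficulty separating their effects. A quick sketch of the pairwise correlations, using the data from the slide and a hypothetical helper `pearson_r`:

```python
import math

# Gas mileage data from the slides: (mpg, weight, horsepower)
cars = [
    (26, 3.4, 169), (31, 2.5, 106), (20, 3.8, 304), (31, 2.8, 155),
    (24, 3.6, 211), (29, 3.3, 140), (20, 3.3, 210), (23, 3.9, 255),
    (24, 4.1, 255), (26, 3.3, 164),
]


def pearson_r(u, v):
    """Sample correlation coefficient between two equal-length sequences
    (hypothetical helper for this sketch)."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


mpg = [row[0] for row in cars]
wt = [row[1] for row in cars]
hp = [row[2] for row in cars]

r_wt_hp = pearson_r(wt, hp)     # correlation between the two regressors
r_mpg_wt = pearson_r(mpg, wt)   # each regressor against the response
r_mpg_hp = pearson_r(mpg, hp)

print(f"r(wt, hp)  = {r_wt_hp:.3f}")
print(f"r(mpg, wt) = {r_mpg_wt:.3f}")
print(f"r(mpg, hp) = {r_mpg_hp:.3f}")
```

A high correlation between the regressors would be consistent with the large P-value on the weight coefficient even though weight alone is clearly related to mileage.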
ETM 620 - 09U11
Let’s try it in Minitab …
What do the residuals look like?
What does the output of the regression tell us?
What do we get if we try “Stepwise Regression”?
ETM 620 - 09U12
Polynomial regression …
Example: The expected yield of a crop of marigolds is hypothesized to be a function of the days after the first bloom. Yield (in number of blooms) from a given plot was counted in one growing season with the results as given in the data file.
Step 1: plot the data …
ETM 620 - 09U13
Plot of the data …
[Figure: scatter plot titled "Marigold Yields"; vertical axis from 0 to 4500, horizontal axis from 14 to 49]
ETM 620 - 09U14
Fitting the polynomial …
Hypothesize the model,
$$y = \beta_0 + \beta_1 (\text{day}) + \beta_2 (\text{day})^2 + \varepsilon$$
In Excel,
In Minitab,
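A quadratic model is still linear in its coefficients, so it can be fitted with the same normal equations by adding a day² column to the design matrix. The sketch below uses SYNTHETIC data generated from an assumed quadratic, since the marigold data file is not reproduced here; the day values, the generating coefficients, and the helper `solve` are all assumptions made for illustration.

```python
# SYNTHETIC data for illustration only (not the marigold data file):
# days follow the tick marks of the plot axis, and yields come from an
# assumed noise-free quadratic y = -4000 + 400*day - 5*day**2.
days = [14, 19, 24, 29, 34, 39, 44, 49]
yields = [-4000 + 400 * d - 5 * d ** 2 for d in days]

# Design matrix with columns 1, day, day^2
X = [[1.0, d, d ** 2] for d in days]
p = len(X[0])

# Normal equations: X'X b = X'Y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
XtY = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(p)]


def solve(A, rhs):
    """Solve A b = rhs by Gauss-Jordan elimination with partial pivoting
    (hypothetical helper for this sketch)."""
    n = len(rhs)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


b0, b1, b2 = solve(XtX, XtY)
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, b2 = {b2:.4f}")
```

With real, noisy data the fitted coefficients would not match any generating formula exactly, and centering the day variable before squaring is a common way to improve numerical conditioning.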
ETM 620 - 09U15
Indicator variables
Allow us to include qualitative factors in regression analysis …
  machine type
  grade of fuel
  operator
Example,
In addition to SAT scores, an admissions officer is concerned that whether or not a student attended private high school might affect the freshman GPA. Data from 20 students are given in the data file.
Conduct the analysis and interpret the results …
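One way the model for this example might be written, assuming a 0/1 coding for the school type. The coding direction (1 = private) and the symbol names are assumptions made for illustration; the data file may code it differently.

```latex
\text{GPA} = \beta_0 + \beta_1\,\text{SAT} + \beta_2\,x_2 + \varepsilon,
\qquad
x_2 =
\begin{cases}
1 & \text{attended private high school} \\
0 & \text{otherwise}
\end{cases}
```

Under this coding, the two groups share the slope β1 and differ only in intercept: β0 for x2 = 0 and β0 + β2 for x2 = 1, so β2 is the expected GPA difference at the same SAT score.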
ETM 620 - 09U16
Problems in multiple regression
Multicollinearity
Influential observations
Autocorrelation