8/4/2019 SDA 3E Chapter 6 (2)
2007 Pearson Education
Chapter 6: Regression Analysis
Part 2: Multiple Regression
Model Form
Multiple linear regression model:
Y = b0 + b1 X1 + b2 X2 + ... + bk Xk + e
Predicted model:
Ŷ = b0 + b1 X1 + b2 X2 + ... + bk Xk
The b's are called partial regression coefficients.
Example: 2000 NFL Data
Games Won = b0 + b1 Yards Gained + b2 Takeaways + b3 Giveaways + b4 Yards Allowed + b5 Points Scored + e
Games Won = 8.29 + 0.00074 Yards Gained + 0.1001 Takeaways - 0.0839 Giveaways - 0.0018 Yards Allowed + 0.0138 Points Scored
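To illustrate how the fitted equation is used, the sketch below evaluates it for one team. The coefficients are the ones on the slide; the function name `predict_games_won` and the team statistics are made up for illustration and do not come from the 2000 NFL data.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = 8.29
COEFFS = {
    "yards_gained": 0.00074,
    "takeaways": 0.1001,
    "giveaways": -0.0839,
    "yards_allowed": -0.0018,
    "points_scored": 0.0138,
}


def predict_games_won(team):
    """Return b0 + sum(b_j * x_j) for one team's statistics."""
    return INTERCEPT + sum(COEFFS[name] * team[name] for name in COEFFS)


# Hypothetical team statistics (not taken from the data set).
team = {
    "yards_gained": 5000,
    "takeaways": 30,
    "giveaways": 25,
    "yards_allowed": 5000,
    "points_scored": 350,
}
print(round(predict_games_won(team), 2))
```

Each coefficient is a partial effect: holding the other four variables fixed, one more takeaway adds about 0.1 predicted wins.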
Excel Tool Results
Interpreting Results
Regression statistics are similar to the single independent variable case.
R Square (coefficient of multiple determination): the value .779 indicates that about 78% of the variation in games won can be explained by the variation in the independent variables.
Adjusted R^2 accounts for sample size and number of independent variables.
ANOVA Results
Significance of regression:
H0: b1 = b2 = ... = bk = 0
H1: at least one bj is not 0
Note: df for residual is n - k - 1; df for regression is k
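The degrees of freedom in the note feed directly into the F statistic for this test, F = (SSR/k) / (SSE/(n - k - 1)). A minimal sketch follows; the sums of squares and the sample size are hypothetical values chosen for illustration, not the Excel output.

```python
def f_statistic(ssr, sse, n, k):
    """Return (F, df_regression, df_residual) for the overall regression test."""
    df_reg = k
    df_res = n - k - 1
    msr = ssr / df_reg  # mean square for regression
    mse = sse / df_res  # mean square for residual
    return msr / mse, df_reg, df_res


# Hypothetical sums of squares with n = 31 observations and k = 5 variables.
f_value, df_reg, df_res = f_statistic(ssr=77.9, sse=22.1, n=31, k=5)
print(df_reg, df_res, round(f_value, 2))
```

A large F relative to the F distribution with (k, n - k - 1) degrees of freedom leads to rejecting H0.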
Residual Plots
Test for Individual Coefficients
H0: bj = 0 vs. H1: bj ≠ 0
t = bj / standard error, with n - k - 1 df
Confidence intervals: bj ± t(n-k-1) × s.e.
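The t statistic and the confidence interval can be sketched as below. The standard error is a hypothetical value, and the critical value 2.060 is the approximate two-sided 5% point of the t distribution with 25 df, supplied by hand rather than computed.

```python
def t_test_and_ci(b_j, std_err, t_crit):
    """Return the t statistic and the interval b_j +/- t_crit * std_err."""
    t_stat = b_j / std_err
    margin = t_crit * std_err
    return t_stat, (b_j - margin, b_j + margin)


# b_j = 0.1001 is the Takeaways coefficient; std_err = 0.04 is made up.
t_stat, (low, high) = t_test_and_ci(b_j=0.1001, std_err=0.04, t_crit=2.060)
print(round(t_stat, 4), round(low, 4), round(high, 4))
```

If the interval excludes 0, the coefficient is significant at the corresponding level, which is the same conclusion as comparing |t| with the critical value.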
Building Good Models
Include only significant independent variables. Use the fewest necessary to permit adequate interpretation of the dependent variable.
With 10 variables there are potentially 2^10 = 1024 models!
As you add more explanatory variables to a model, R^2 never decreases (even if the variables are irrelevant). However, the Adjusted R^2 could either increase or decrease, thus providing information about the value of additional variables.
$\text{Adjusted } R^2 = 1 - \dfrac{SSE}{SST} \cdot \dfrac{n - 1}{n - k - 1}$
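Because SSE/SST = 1 - R^2, the adjusted value can be computed from R^2, n, and k alone. The sketch below uses the R^2 of .779 reported earlier; the sample size n = 31 and the second scenario (a sixth variable that nudges R^2 up to .780) are assumptions for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (SSE/SST) * (n - 1) / (n - k - 1), with SSE/SST = 1 - R^2."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Five variables, R^2 = 0.779 (n = 31 is an assumed sample size).
print(round(adjusted_r2(0.779, n=31, k=5), 4))
# Hypothetical sixth variable: R^2 rises slightly, adjusted R^2 falls.
print(round(adjusted_r2(0.780, n=31, k=6), 4))
```

The second call shows the penalty at work: a variable that adds almost nothing to R^2 lowers the adjusted figure.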
Model After Dropping Yards Gained
Adjusted R^2 increases slightly.
Model After Dropping Takeaways
Best Subsets Regression
Evaluates all possible models, or those containing a fixed number of independent variables, to identify the best.
Selects appropriate models based on Cp.
PHStat output
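To give a sense of what "all possible models" means, the sketch below enumerates every subset of the five NFL variables. It only lists candidate models; it does not fit them or compute Cp, and the helper name `all_subsets` is hypothetical.

```python
from itertools import combinations

variables = ["Yards Gained", "Takeaways", "Giveaways", "Yards Allowed", "Points Scored"]


def all_subsets(names):
    """Yield every subset of names, from the empty model up to the full model."""
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield subset


models = list(all_subsets(variables))
print(len(models))  # 2**5 candidate models
print(sum(1 for _ in all_subsets(range(10))))  # 2**10 candidate models
```

The count doubles with each added variable, which is why an exhaustive search quickly becomes expensive.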
Stepwise Regression
Best subsets is not always practical.
Stepwise regression is a search process that adds or deletes variables at each step until no changes can improve the model.
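The search idea can be sketched as a simple forward selection. This version uses adjusted R^2 as the criterion, which is an assumption made here for brevity; the criteria PHStat applies may differ. The data are synthetic and NumPy is assumed to be available.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit y on the columns of X plus an intercept; return adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1 - (sse / sst) * (n - 1) / (n - k - 1)


def forward_selection(X, y):
    """Greedily add the column that most improves adjusted R^2; stop when none does."""
    remaining = list(range(X.shape[1]))
    chosen, best = [], -np.inf
    while remaining:
        score, j = max((adjusted_r2(X[:, chosen + [j]], y), j) for j in remaining)
        if score <= best:
            break  # no remaining variable improves the model
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen, best


# Synthetic data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 2 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=40)

chosen, best = forward_selection(X, y)
print(chosen, round(best, 3))
```

Backward elimination works the same way in reverse: start from the full model and drop the variable whose removal helps most.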
PHStat Tool: Stepwise Regression
PHStat menu > Regression > Stepwise Regression
Enter variable ranges
Select stepwise criteria
Select type of method to use: general, forward selection, or backward elimination
Stepwise Regression Results
Forward Selection
First model
Second model
Final model
Multicollinearity
Multicollinearity occurs when two or more independent variables contain high levels of the same information.
The independent variables predict each other better than they predict the dependent variable, making it difficult to interpret the regression coefficients and leading to poor statistical conclusions.
Effects: Estimates of the regression coefficients are unstable depending on which variables are present, signs may be opposite of expectations, and p-values can be inflated.
Correlation Matrix
The correlation between Points Scored and Yards Gained is larger than any correlation between Games Won and other independent variables.
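Measuring Multicollinearity
Variance Inflation Factor: $\mathrm{VIF}_j = \dfrac{1}{1 - r_j^2}$, where $r_j^2$ is the R^2 obtained by regressing the j-th independent variable on the other independent variables.
Option in PHStat routine (be sure to check the box).
If there is no multicollinearity, VIF = 1.
Researchers suggest that VIF should be no greater than 5.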
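A quick way to read the VIF guideline is to tabulate VIF against r_j^2, the R^2 from regressing one independent variable on the others. The values below are illustrative inputs, not results from the NFL data.

```python
def vif(r2_j):
    """Variance inflation factor for a variable whose R^2 against the others is r2_j."""
    return 1.0 / (1.0 - r2_j)


for r2_j in (0.0, 0.5, 0.8, 0.9):
    print(r2_j, round(vif(r2_j), 2))
```

The threshold of 5 corresponds to r_j^2 = 0.8: once the other variables explain more than 80% of a variable's variation, its VIF exceeds the suggested limit.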
Variance Inflation Factors
Models with Categorical Independent Variables
Examples:
Gender (male, female)
College graduate (no, 2-year degree, 4-year degree, postgraduate degree)
Own home (yes, no)
Example
How do age and MBA degree affect employee salaries?
Y = b0 + b1 X1 + b2 X2 + e
where
Y = salary
X1 = age
X2 = MBA indicator (0 = No; 1 = Yes)
Results
Model
Salary = 893.59 + 1044.15 Age + 14767.23 MBA
No MBA: Salary = 893.59 + 1044.15 Age
MBA: Salary = 15660.82 + 1044.15 Age
The models suggest that the rate of salary increase for age is the same for both groups.
However, individuals with MBAs might earn relatively higher salaries as they get older. In other words, the slope of Age may depend on the value of MBA. Such a dependence is called an interaction.
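The indicator variable shifts only the intercept, which a short sketch makes concrete. The coefficients are the ones on the slide; the function name `salary` and the age of 30 are chosen for illustration.

```python
def salary(age, mba):
    """Fitted model without interaction; mba is 0 (No) or 1 (Yes)."""
    return 893.59 + 1044.15 * age + 14767.23 * mba


# The MBA intercept is the base intercept plus the MBA coefficient.
print(round(salary(0, 1), 2))  # intercept of the MBA line

# At any age the gap between the two groups is the same constant.
print(round(salary(30, 1) - salary(30, 0), 2))
print(round(salary(50, 1) - salary(50, 0), 2))
```

A constant gap at every age is exactly what the model without an interaction term assumes: two parallel lines.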
Interaction Model
Y = b0 + b1 Age + b2 MBA + b3 Age*MBA + e
Results With Interaction Term
Final Model
Model Results
Salary = 3323.11 + 984.25 Age + 425.58 MBA*Age
No MBA: Salary = 3323.11 + 984.25 Age + 425.58 (0)*Age = 3323.11 + 984.25 Age
MBA: Salary = 3323.11 + 984.25 Age + 425.58 (1)*Age = 3323.11 + 1409.83 Age
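With the interaction term the two groups share an intercept but have different slopes. The sketch below uses the slide's coefficients; the function names and the ages are illustrative choices.

```python
def salary(age, mba):
    """Final model with the Age*MBA interaction; mba is 0 (No) or 1 (Yes)."""
    return 3323.11 + 984.25 * age + 425.58 * mba * age


def age_slope(mba):
    """Change in predicted salary per additional year of age."""
    return salary(1, mba) - salary(0, mba)


print(round(age_slope(0), 2), round(age_slope(1), 2))

# Unlike the additive model, the MBA gap now grows with age.
print(round(salary(30, 1) - salary(30, 0), 2))
print(round(salary(50, 1) - salary(50, 0), 2))
```

The gap between the groups is 425.58 times age, so it widens as employees get older.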
Categorical Variables With More Than Two Levels
For a categorical variable with k > 2 levels, add k - 1 indicator variables.
Example: The Excel file Surface Finish.xls provides measurements of the surface finish of 35 parts produced on a lathe, along with the revolutions per minute (RPM) of the spindle and one of four types of cutting tools used. The engineer who collected the data is interested in predicting the surface finish as a function of RPM and type of tool.
Model
Y = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4 + e
where
Y = surface finish
X1 = RPM
X2 = tool type B
X3 = tool type C
X4 = tool type D

Tool Type  X2  X3  X4
A          0   0   0
B          1   0   0
C          0   1   0
D          0   0   1

Tool Type A: Y = b0 + b1 X1 + e
Tool Type B: Y = b0 + b1 X1 + b2 + e
Tool Type C: Y = b0 + b1 X1 + b3 + e
Tool Type D: Y = b0 + b1 X1 + b4 + e
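The coding scheme for the tool types can be generated mechanically. In this sketch the first level in the list is treated as the baseline (all indicators 0); the function name `dummy_code` is hypothetical.

```python
def dummy_code(level, levels):
    """Return k - 1 indicators for `level`; the first entry of `levels` is the baseline."""
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return [1 if level == other else 0 for other in levels[1:]]


tool_types = ["A", "B", "C", "D"]
for tool in tool_types:
    print(tool, dummy_code(tool, tool_types))
```

Using k - 1 indicators rather than k avoids a column that is an exact sum of the others plus the intercept, which would make the coefficients impossible to estimate uniquely.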
Regression Results
Results
Surface Finish = 24.49 + 0.098 RPM - 13.31 Type B - 20.49 Type C - 26.04 Type D
Tool A: Surface Finish = 24.49 + 0.098 RPM - 13.31(0) - 20.49(0) - 26.04(0) = 24.49 + 0.098 RPM
Tool B: Surface Finish = 24.49 + 0.098 RPM - 13.31(1) - 20.49(0) - 26.04(0) = 11.18 + 0.098 RPM
Tool C: Surface Finish = 24.49 + 0.098 RPM - 13.31(0) - 20.49(1) - 26.04(0) = 4.00 + 0.098 RPM
Tool D: Surface Finish = 24.49 + 0.098 RPM - 13.31(0) - 20.49(0) - 26.04(1) = -1.55 + 0.098 RPM
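The four tool-specific equations differ only in their intercepts, as the following sketch shows. The coefficients come from the slide; the function names and the RPM value of 225 are assumptions for illustration.

```python
BASE_INTERCEPT = 24.49
RPM_SLOPE = 0.098
# Intercept shift for each tool relative to the baseline, tool A.
TOOL_SHIFT = {"A": 0.0, "B": -13.31, "C": -20.49, "D": -26.04}


def intercept(tool):
    """Intercept of the fitted line for one tool type."""
    return BASE_INTERCEPT + TOOL_SHIFT[tool]


def surface_finish(rpm, tool):
    """Predicted surface finish at a given spindle speed and tool type."""
    return intercept(tool) + RPM_SLOPE * rpm


for tool in "ABCD":
    print(tool, round(intercept(tool), 2), round(surface_finish(225, tool), 2))
```

Each tool coefficient is read as a difference from tool A at the same RPM.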
Nonlinear Models
Interaction terms (X1*X2) or nonlinear variables (X2^2) do not make a model nonlinear; linear regression still applies because the model is linear in the parameters:
Y = b0 + b1 X1 + b2 X2 + b3 X1 X2 + b4 X3^2 + e
However, if the model is nonlinear in the parameters (Y = aX^b), then you must try to transform the model or use a nonlinear regression technique.
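For Y = aX^b, taking logarithms gives ln Y = ln a + b ln X, which is linear in ln a and b. The sketch below fits that transformed model with the simple least-squares formulas on made-up, noise-free data; the function name `fit_power_model` is hypothetical.

```python
import math


def fit_power_model(xs, ys):
    """Fit Y = a * X**b by least squares on ln Y = ln a + b * ln X."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)  # back-transform the intercept
    return a, b


# Made-up data generated exactly from Y = 2 * X**1.5.
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 1.5 for x in xs]
a, b = fit_power_model(xs, ys)
print(round(a, 6), round(b, 6))
```

The transformation requires positive X and Y, and least squares on the log scale minimizes relative rather than absolute errors.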
Example: Energy Imports
Delete explainable variation due to oil embargo
Residual Plot
Residual plot suggests nonlinearity
Alternative Models
Linear model (R^2 = 0.944):
Total Imports = -938764498.6 + 481246.5728*Year
2nd order polynomial: Y = b0 + b1 X + b2 X^2 + e (R^2 = 0.985)
Total Imports = 3292116394 - 33822316.2*Year + 8687.66*Year^2
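The comparison between a linear and a second-order fit can be sketched with NumPy's `polyfit`. The series below is synthetic, not the energy imports data, and the year is centered before fitting, a choice made here to keep the coefficients on a manageable scale.

```python
import numpy as np


def r_squared(y, fitted):
    """Coefficient of determination, 1 - SSE/SST."""
    sse = np.sum((y - fitted) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1 - sse / sst


# Synthetic series with curvature (made up for illustration).
year = np.arange(1970, 1990)
t = year - year.mean()  # centered year
y = 10 + 0.5 * t + 0.04 * t ** 2

linear = np.polyfit(t, y, 1)
quadratic = np.polyfit(t, y, 2)
r2_linear = r_squared(y, np.polyval(linear, t))
r2_quadratic = r_squared(y, np.polyval(quadratic, t))
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

When the raw year is used instead, Year^2 is in the millions, so the fitted coefficients become very large and nearly cancel, and rounding them can noticeably change predictions.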
Exponential Model
Y = a e^(bX) (R^2 = 0.973)
Transformation: Ln Y = Ln a + bX
Model: Ln Y = -87.14 + 0.052*Year
Original variables: Y = 1.43E-38 e^(0.052X)
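The step from the fitted log model back to the original variables is just exponentiating the intercept, which the sketch below checks. The coefficients are the rounded values from the slide; the function name `predict_imports` is hypothetical.

```python
import math

LN_A = -87.14  # fitted intercept on the log scale
B = 0.052      # fitted coefficient of Year

a = math.exp(LN_A)  # back-transformed multiplier
print(f"{a:.2e}")


def predict_imports(year):
    """Prediction in original units, Y = a * e^(b * Year)."""
    return a * math.exp(B * year)
```

Because Year is close to 2000, a rounding error of 0.0005 in the Year coefficient shifts Ln Y by about 1, so predictions from these rounded coefficients are only rough.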