Upload
bree-brewer
View
53
Download
0
Embed Size (px)
DESCRIPTION
Welcome to Chapter 14 MBA 541. B ENEDICTINE U NIVERSITY Regression and Correlation Multiple Regression and Correlation Analysis Chapter 14. Chapter 14. Please, Read Chapter 14 in Lind before viewing this presentation. Statistical Techniques in Business & Economics Lind. Goals. - PowerPoint PPT Presentation
Citation preview
14-1
Welcome to Chapter 14 MBA 541
BENEDICTINE UNIVERSITY
• Regression and Correlation
• Multiple Regression and Correlation Analysis
• Chapter 14
14-2
Chapter 14
Please, Read Chapter 14 in
Lind before viewing this presentation.
StatisticalTechniques in
Business &Economics
Lind
14-3
Goals
When you have completed this chapter, you will be able to:
• ONE– Describe the relationship between two or more
independent variables and the dependent variable using a multiple regression equation.
• TWO– Compute and interpret the multiple standard error of
estimate and the coefficient of determination.• THREE
– Interpret a correlation matrix.
14-4
Goals
When you have completed this chapter, you will be able to:
• FOUR– Set up and interpret an ANOVA table.
• FIVE– Conduct a test of hypothesis to determine if any of the
set of regression coefficients differ from zero.• SIX
– Conduct a test of hypothesis on each of the regression coefficients.
14-5
Multiple Regression Analysis
• The general multiple regression with k independent variables is given by:
where: a is the Y-intercept, andX1 to Xk are the independent
variables.
• Greek letters are used for a (α) and b (β) when denoting population parameters.
1 1 2 2 k kY ' a b X b X ... b X
14-6
Multiple Regression Analysis
• In the general multiple regression equation, bj is the net change in Y for each unit change in Xj (holding all other values constant) where j = 1 to k.
• Each bj is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.
• The least squares criterion is used to develop the multiple regression equation.
• Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB is recommended.
14-7
Multiple Standard Errorof Estimate
• The Multiple Standard Error of Estimate is a measure of the effectiveness of the regression equation.
• The formula is:
• It is measured in the same units as the dependent variable.
2
y.12...k
Y Y 's
n k 1
14-8
Multiple Regression and Correlation Assumptions
• Assumptions about Multiple Regression and Correlation:– The independent variables and the dependent variable
have a linear relationship.– The dependent variable is continuous and at least interval
scale.– The variation in (Y-Y’) must be approximately the same
for all values of Y’. When this is the case, differences exhibit homoscedasticity.
– The residuals should follow the normal distribution with a mean of 0.
– Successive values of the dependent variable must be uncorrelated.
14-9
ANOVA Table
• SSR = Regression variation. Variation accounted for by the set of independent variables.
• SSE = Error variation. Variation not accounted for by the independent variables.
• SS Total = Total Variation.
ANOVA TABLE
Source df SS MS
Regression k SSR SSR / k
Error n-(k+1) SSE SSE / [n-(k+1)]
Total n-1 SS Total
2
SSR Y' Y
2SSE Y Y '
2
SST Y Y
14-10
Correlation Matrix• A Correlation Matrix is used to show all the correlation
coefficients between all pairs of the variables.
• The matrix is useful for locating correlated independent variables.
• It shows how strongly each independent variable is correlated with the dependent variable.
CORRELATION MATRIX
CorrelationCoefficients
Cars Advertising Sales Force
Cars 1.000
Advertising 0.808 1.000
Sales Force 0.872 0.537 1.000
14-11
Global Test
• The Global Test is used to investigate whether any of the independent variables have significant coefficients.
• The hypotheses are:H0: β1 = β2 = … = βk = 0
H1: Not all βs are equal to 0
• The test statistic is the F distribution.– The numerator degrees of freedom is k, the number of
independent variables.– The denominator degrees of freedom is n-(k+1), where n
is the sample size.
14-12
Test for Individual Variables
• The Test of Individual Variables is used to determine which independent variables have nonzero regression coefficients.
• The variables that have zero regression coefficients are usually dropped from the analysis.
• The test statistic is the t distribution with n-(k+1) degrees of freedom.
j
j
b
b 0t
s
14-13
Example 1
• A market researcher for Super Dollar Super Markets is studying the yearly amount that families of four or more spend on food.
• Three independent variables are thought to be related to yearly food expenditures (Food).
• These variables are: total family income (Income) in $00, size of the family (Size), and whether the family has children in college (College).
14-14
Example 1 (Continued)
• Notice that the variable, College, is called a dummy or indicator variable. It can take only one of two possible outcomes. That is, a child is a college student or not.
• We usually code one value of the dummy variable as “1” and the other “0”.
• Other examples of dummy variables include: gender, the part is acceptable or unacceptable, the voter will or will not vote for the incumbent governor.
1 2 3Food a b Income b Size b College
14-15
Example 1 (Continued)Family Food Income Size College
1 3900 376 4 0
2 5300 515 5 1
3 4300 516 4 0
4 4900 468 5 0
5 6400 538 6 1
6 7300 626 7 1
7 4900 543 5 0
8 5300 437 4 0
9 6100 608 5 1
10 6400 513 6 1
11 7400 493 6 1
12 5800 563 5 0
14-16
Example 1 (Continued)• Use a computer software package, such as MINITAB or Excel,
to develop a correlation matrix.
• From the analysis provided by MINITAB (as presented on the next slide), write out the regression equation:
• This regression equation can be used to estimate the food expenditures for families of different sizes, with or without college students, and with various incomes.
1 2 3Y ' 954 1.09X 748X 565X
Food $954 $1.09 Income $748 Size $565 College
14-17
Example 1 (Continued)The regression equation isFood = 954 + 1.09 Income + 748 Size + 565 College
Predictor Coef SE Coef T PConstant 954 1581 0.60 0.563Income 1.092 3.153 0.35 0.738Size 748.4 303.0 2.47 0.039Student 564.5 495.1 1.14 0.287
S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1%
Analysis of Variance
Source DF SS MS F PRegression 3 10762903 3587634 10.94 0.003Residual Error 8 2623764 327970Total 11 13386667
14-18
Example 1 (Continued)
• This regression equation indicates that:– Each additional $100 of income per year will increase
the amount spent on food by $1.09 per year.– An additional family member will increase the amount
spent per year on food by $748.– A family with a college student will spend $565 more per
year on food than those without a college student.
• For a family of 4, with no college students, and an income of $50,000 (which is input as 500), this regression equation predicts the food expenditure will be $4,491.
Food $954 $1.09 Income $748 Size $565 College
Food $954 $1.09 500 $748 4 $565 0 $4,491
14-19
Example 1 (Continued)
• Regarding the Coefficient of Determination:
• From the regression output (shown on Slide 17), we note that the coefficient of determination is 80.4%.
• This means that more than 80% of the variation in the amount spent on food is accounted for by the variables of income, family size, and student status.
14-20
Example 1 (Continued)
• Regarding the Correlation Matrix:
• The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.
• None of the correlations among the independent variables should cause problems. All are between -0.70 and +0.70.
Food Income Size College
Food 1.000
Income 0.587 1.000
Size 0.876 0.609 1.000
College 0.773 0.491 0.743 1.000
14-21
Example 1 (Continued)
• Conduct a Global Test of Hypothesis to determine if any of the regression coefficients are not zero.
• The hypotheses are:H0: β1 = β2 = β3 = 0
H1: At least one β ≠ 0
• The test statistic is the F distribution.
– H0 is rejected if F > 4.07.
– From the MINITAB output, the computed value of F is 10.94.
– Decision: H0 is rejected.
– Not all the regression coefficients are zero.
14-22
Example 1 (Continued)
• Conduct an Individual Test to determine which coefficients are not zero.
• For the independent variable, family size, the hypotheses are:H0: β2 = 0
H1: β2 ≠ 0
• Using the 5% level of significance, reject H0 if the p-value < 0.05.
• From the MINITAB output, the only significant variable is Size (family size) using the p-values. The other variables can be omitted from the model.
14-23
Example 1 (Continued)
• We rerun the analysis using only the significant independent variable, family size.
• The new regression equation is:
• As shown on the next slide, which presents the MINITAB output, the coefficient of determination is 76.8%.
• We dropped two independent variables, and the R-squared term was reduced by only 3.6%.
2Y ' 340 1031X
14-24
Example 1 (Continued)Regression Analysis: Food versus Size
The regression equation isFood = 340 + 1031 Size
Predictor Coef SE Coef T PConstant 339.7 940.7 0.36 0.726Size 1031.0 179.4 5.75 0.000
S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4%
Analysis of Variance
Source DF SS MS F PRegression 1 10275977 10275977 33.03 0.000Residual Error 10 3110690 311069Total 11 13386667
14-25
Analysis of Residuals
• A Residual is the difference between the actual value of Y and the predicted value Y’.
• Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful in checking this requirement.
• A plot of the residuals and their corresponding Y’ values is used for showing that there are no trends or patterns in the residuals.
14-26
Residual Plot
Residual Plot Against Estimated Values of Y
4500 75006000
0
-500
500
1000
Y’
Resi
du
als
14-27
Histogram of Residuals
-600 -200 200 600 1000
876543210
Freq
uen
cy
Residuals