27
14-1 Welcome to Chapter 14 MBA 541 BENEDICTINE UNIVERSITY Regression and Correlation Multiple Regression and Correlation Analysis Chapter 14

Welcome to Chapter 14 MBA 541

Embed Size (px)

DESCRIPTION

Welcome to Chapter 14 MBA 541. B ENEDICTINE U NIVERSITY Regression and Correlation Multiple Regression and Correlation Analysis Chapter 14. Chapter 14. Please, Read Chapter 14 in Lind before viewing this presentation. Statistical Techniques in Business & Economics Lind. Goals. - PowerPoint PPT Presentation

Citation preview

Page 1: Welcome to Chapter 14 MBA 541

14-1

Welcome to Chapter 14 MBA 541

BENEDICTINE UNIVERSITY

• Regression and Correlation

• Multiple Regression and Correlation Analysis

• Chapter 14

Page 2: Welcome to Chapter 14 MBA 541

14-2

Chapter 14

Please, Read Chapter 14 in

Lind before viewing this presentation.

StatisticalTechniques in

Business &Economics

Lind

Page 3: Welcome to Chapter 14 MBA 541

14-3

Goals

When you have completed this chapter, you will be able to:

• ONE– Describe the relationship between two or more

independent variables and the dependent variable using a multiple regression equation.

• TWO– Compute and interpret the multiple standard error of

estimate and the coefficient of determination.• THREE

– Interpret a correlation matrix.

Page 4: Welcome to Chapter 14 MBA 541

14-4

Goals

When you have completed this chapter, you will be able to:

• FOUR– Set up and interpret an ANOVA table.

• FIVE– Conduct a test of hypothesis to determine if any of the

set of regression coefficients differ from zero.• SIX

– Conduct a test of hypothesis on each of the regression coefficients.

Page 5: Welcome to Chapter 14 MBA 541

14-5

Multiple Regression Analysis

• The general multiple regression with k independent variables is given by:

where: a is the Y-intercept, andX1 to Xk are the independent

variables.

• Greek letters are used for a (α) and b (β) when denoting population parameters.

1 1 2 2 k kY ' a b X b X ... b X

Page 6: Welcome to Chapter 14 MBA 541

14-6

Multiple Regression Analysis

• In the general multiple regression equation, bj is the net change in Y for each unit change in Xj (holding all other values constant) where j = 1 to k.

• Each bj is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

• The least squares criterion is used to develop the multiple regression equation.

• Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB is recommended.

Page 7: Welcome to Chapter 14 MBA 541

14-7

Multiple Standard Errorof Estimate

• The Multiple Standard Error of Estimate is a measure of the effectiveness of the regression equation.

• The formula is:

• It is measured in the same units as the dependent variable.

2

y.12...k

Y Y 's

n k 1

Page 8: Welcome to Chapter 14 MBA 541

14-8

Multiple Regression and Correlation Assumptions

• Assumptions about Multiple Regression and Correlation:– The independent variables and the dependent variable

have a linear relationship.– The dependent variable is continuous and at least interval

scale.– The variation in (Y-Y’) must be approximately the same

for all values of Y’. When this is the case, differences exhibit homoscedasticity.

– The residuals should follow the normal distribution with a mean of 0.

– Successive values of the dependent variable must be uncorrelated.

Page 9: Welcome to Chapter 14 MBA 541

14-9

ANOVA Table

• SSR = Regression variation. Variation accounted for by the set of independent variables.

• SSE = Error variation. Variation not accounted for by the independent variables.

• SS Total = Total Variation.

ANOVA TABLE

Source df SS MS

Regression k SSR SSR / k

Error n-(k+1) SSE SSE / [n-(k+1)]

Total n-1 SS Total

2

SSR Y' Y

2SSE Y Y '

2

SST Y Y

Page 10: Welcome to Chapter 14 MBA 541

14-10

Correlation Matrix• A Correlation Matrix is used to show all the correlation

coefficients between all pairs of the variables.

• The matrix is useful for locating correlated independent variables.

• It shows how strongly each independent variable is correlated with the dependent variable.

CORRELATION MATRIX

CorrelationCoefficients

Cars Advertising Sales Force

Cars 1.000

Advertising 0.808 1.000

Sales Force 0.872 0.537 1.000

Page 11: Welcome to Chapter 14 MBA 541

14-11

Global Test

• The Global Test is used to investigate whether any of the independent variables have significant coefficients.

• The hypotheses are:H0: β1 = β2 = … = βk = 0

H1: Not all βs are equal to 0

• The test statistic is the F distribution.– The numerator degrees of freedom is k, the number of

independent variables.– The denominator degrees of freedom is n-(k+1), where n

is the sample size.

Page 12: Welcome to Chapter 14 MBA 541

14-12

Test for Individual Variables

• The Test of Individual Variables is used to determine which independent variables have nonzero regression coefficients.

• The variables that have zero regression coefficients are usually dropped from the analysis.

• The test statistic is the t distribution with n-(k+1) degrees of freedom.

j

j

b

b 0t

s

Page 13: Welcome to Chapter 14 MBA 541

14-13

Example 1

• A market researcher for Super Dollar Super Markets is studying the yearly amount that families of four or more spend on food.

• Three independent variables are thought to be related to yearly food expenditures (Food).

• These variables are: total family income (Income) in $00, size of the family (Size), and whether the family has children in college (College).

Page 14: Welcome to Chapter 14 MBA 541

14-14

Example 1 (Continued)

• Notice that the variable, College, is called a dummy or indicator variable. It can take only one of two possible outcomes. That is, a child is a college student or not.

• We usually code one value of the dummy variable as “1” and the other “0”.

• Other examples of dummy variables include: gender, the part is acceptable or unacceptable, the voter will or will not vote for the incumbent governor.

1 2 3Food a b Income b Size b College

Page 15: Welcome to Chapter 14 MBA 541

14-15

Example 1 (Continued)Family Food Income Size College

1 3900 376 4 0

2 5300 515 5 1

3 4300 516 4 0

4 4900 468 5 0

5 6400 538 6 1

6 7300 626 7 1

7 4900 543 5 0

8 5300 437 4 0

9 6100 608 5 1

10 6400 513 6 1

11 7400 493 6 1

12 5800 563 5 0

Page 16: Welcome to Chapter 14 MBA 541

14-16

Example 1 (Continued)• Use a computer software package, such as MINITAB or Excel,

to develop a correlation matrix.

• From the analysis provided by MINITAB (as presented on the next slide), write out the regression equation:

• This regression equation can be used to estimate the food expenditures for families of different sizes, with or without college students, and with various incomes.

1 2 3Y ' 954 1.09X 748X 565X

Food $954 $1.09 Income $748 Size $565 College

Page 17: Welcome to Chapter 14 MBA 541

14-17

Example 1 (Continued)The regression equation isFood = 954 + 1.09 Income + 748 Size + 565 College

Predictor Coef SE Coef T PConstant 954 1581 0.60 0.563Income 1.092 3.153 0.35 0.738Size 748.4 303.0 2.47 0.039Student 564.5 495.1 1.14 0.287

S = 572.7 R-Sq = 80.4% R-Sq(adj) = 73.1%

Analysis of Variance

Source DF SS MS F PRegression 3 10762903 3587634 10.94 0.003Residual Error 8 2623764 327970Total 11 13386667

Page 18: Welcome to Chapter 14 MBA 541

14-18

Example 1 (Continued)

• This regression equation indicates that:– Each additional $100 of income per year will increase

the amount spent on food by $1.09 per year.– An additional family member will increase the amount

spent per year on food by $748.– A family with a college student will spend $565 more per

year on food than those without a college student.

• For a family of 4, with no college students, and an income of $50,000 (which is input as 500), this regression equation predicts the food expenditure will be $4,491.

Food $954 $1.09 Income $748 Size $565 College

Food $954 $1.09 500 $748 4 $565 0 $4,491

Page 19: Welcome to Chapter 14 MBA 541

14-19

Example 1 (Continued)

• Regarding the Coefficient of Determination:

• From the regression output (shown on Slide 17), we note that the coefficient of determination is 80.4%.

• This means that more than 80% of the variation in the amount spent on food is accounted for by the variables of income, family size, and student status.

Page 20: Welcome to Chapter 14 MBA 541

14-20

Example 1 (Continued)

• Regarding the Correlation Matrix:

• The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

• None of the correlations among the independent variables should cause problems. All are between -0.70 and +0.70.

Food Income Size College

Food 1.000

Income 0.587 1.000

Size 0.876 0.609 1.000

College 0.773 0.491 0.743 1.000

Page 21: Welcome to Chapter 14 MBA 541

14-21

Example 1 (Continued)

• Conduct a Global Test of Hypothesis to determine if any of the regression coefficients are not zero.

• The hypotheses are:H0: β1 = β2 = β3 = 0

H1: At least one β ≠ 0

• The test statistic is the F distribution.

– H0 is rejected if F > 4.07.

– From the MINITAB output, the computed value of F is 10.94.

– Decision: H0 is rejected.

– Not all the regression coefficients are zero.

Page 22: Welcome to Chapter 14 MBA 541

14-22

Example 1 (Continued)

• Conduct an Individual Test to determine which coefficients are not zero.

• For the independent variable, family size, the hypotheses are:H0: β2 = 0

H1: β2 ≠ 0

• Using the 5% level of significance, reject H0 if the p-value < 0.05.

• From the MINITAB output, the only significant variable is Size (family size) using the p-values. The other variables can be omitted from the model.

Page 23: Welcome to Chapter 14 MBA 541

14-23

Example 1 (Continued)

• We rerun the analysis using only the significant independent variable, family size.

• The new regression equation is:

• As shown on the next slide, which presents the MINITAB output, the coefficient of determination is 76.8%.

• We dropped two independent variables, and the R-squared term was reduced by only 3.6%.

2Y ' 340 1031X

Page 24: Welcome to Chapter 14 MBA 541

14-24

Example 1 (Continued)Regression Analysis: Food versus Size

The regression equation isFood = 340 + 1031 Size

Predictor Coef SE Coef T PConstant 339.7 940.7 0.36 0.726Size 1031.0 179.4 5.75 0.000

S = 557.7 R-Sq = 76.8% R-Sq(adj) = 74.4%

Analysis of Variance

Source DF SS MS F PRegression 1 10275977 10275977 33.03 0.000Residual Error 10 3110690 311069Total 11 13386667

Page 25: Welcome to Chapter 14 MBA 541

14-25

Analysis of Residuals

• A Residual is the difference between the actual value of Y and the predicted value Y’.

• Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful in checking this requirement.

• A plot of the residuals and their corresponding Y’ values is used for showing that there are no trends or patterns in the residuals.

Page 26: Welcome to Chapter 14 MBA 541

14-26

Residual Plot

Residual Plot Against Estimated Values of Y

4500 75006000

0

-500

500

1000

Y’

Resi

du

als

Page 27: Welcome to Chapter 14 MBA 541

14-27

Histogram of Residuals

-600 -200 200 600 1000

876543210

Freq

uen

cy

Residuals