MATH 533 Part C - Regression and Correlation Analysis

MATH 533: Applied Managerial Statistics
Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on CREDIT BALANCE (Y) and SIZE (X) by answering the following.

1. Generate a scatterplot for CREDIT BALANCE vs. SIZE, including the graph of the "best fit" line. Interpret.

[Figure: Scatterplot of Credit Balance ($) vs Size, with the "best fit" line]

The scatterplot of Credit Balance ($) versus Size shows that the "best fit" line slopes upward (positive slope), which indicates that Credit Balance varies directly with Size: as Size increases, Credit Balance tends to increase, and as Size decreases, Credit Balance tends to decrease.
Correct

MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Size

The regression equation is
Credit Balance($) = 2591 + 403 Size

Predictor   Coef     SE Coef   T      P
Constant    2591.4   195.1     13.29  0.000
Size        403.22   50.95     7.91   0.000

S = 620.162 R-Sq = 56.6% R-Sq(adj) = 55.7%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

Predicted Values for New Observations

NewObs  Fit     SE Fit  95% CI            95% PI
1       4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations

NewObs  Size
1       5.00

2. Determine the equation of the "best fit" line, which describes the relationship between CREDIT BALANCE and SIZE.

The equation of the best fit line describing the relationship between Credit Balance and Size is
Credit Balance ($) = 2591.4 + 403.22 Size
Each additional household member is associated with an increase of about $403 in the estimated credit balance.
Correct
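As a quick check on the equation, plugging a household size into the fitted line should reproduce MINITAB's fitted value. The sketch below is a minimal example assuming the unrounded coefficients from the Predictor table (2591.4 and 403.22); the function name `predict_credit_balance` is hypothetical.

```python
# Coefficients copied from the MINITAB Predictor table for Credit Balance($) versus Size.
INTERCEPT = 2591.4   # b0: estimated Credit Balance ($) when Size = 0
SLOPE = 403.22       # b1: estimated change in Credit Balance ($) per additional household member


def predict_credit_balance(size: float) -> float:
    """Return the fitted Credit Balance ($) for a given household size."""
    return INTERCEPT + SLOPE * size


fit_at_5 = predict_credit_balance(5)
print(f"Fitted credit balance for Size = 5: {fit_at_5:.1f}")  # 4607.5, matching MINITAB's Fit
```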

3. Determine the coefficient of correlation. Interpret.

The coefficient of correlation is r = 0.752. The positive sign indicates a direct relationship, and the magnitude indicates a fairly strong linear association between Credit Balance and Size. The p-value of 0.000 is extremely low: if Credit Balance and Size were uncorrelated in the population, a sample correlation this large would be very unlikely to occur by chance, so the correlation is statistically significant.
Correct
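The p-value for the correlation comes from a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below assumes n = 50 (the ANOVA table's total DF of 49 plus one) and uses the rounded r = 0.752 from the output. The result, about 7.90, agrees with the slope's t-value of 7.91 up to rounding, as expected: in simple regression, testing rho = 0 and testing beta1 = 0 are the same test.

```python
import math

# Values taken from the MINITAB output; n is the total DF (49) plus one.
r = 0.752   # Pearson correlation of Credit Balance ($) and Size
n = 50      # number of customers in the sample

# t statistic for H0: rho = 0, compared against a t distribution with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"t = {t_stat:.2f} on {n - 2} df")
```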

MINITAB OUTPUT:
Pearson correlation of Credit Balance ($) and Size = 0.752
P-Value = 0.000

4. Determine the coefficient of determination. Interpret.

The coefficient of determination is R-Sq = 0.566. It is the proportion of the variability in the dependent variable that is accounted for by the regression model: here, 56.6% of the variation in Credit Balance is explained by its linear relationship with Size, and the remaining 43.4% is due to other factors.
Correct
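The coefficient of determination can be recovered directly from the ANOVA table as SS(Regression) / SS(Total), and in simple regression the correlation is its square root carrying the sign of the slope. The sketch below uses only the sums of squares and the slope reported in the output.

```python
import math

# Sums of squares from the simple-regression ANOVA table
ss_regression = 24092210
ss_total = 42553062

r_squared = ss_regression / ss_total               # share of variation in Credit Balance explained by Size
r = math.copysign(math.sqrt(r_squared), 403.22)    # r takes the sign of the slope b1 = 403.22

print(f"R-Sq = {r_squared:.1%}, r = {r:.3f}")
```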

MINITAB OUTPUT:
S = 620.162   R-Sq = 56.6%   R-Sq(adj) = 55.7%

5. Test the utility of this regression model (use a two tail test with α = .05). Interpret your results, including the p-value.

The null hypothesis, Ho, states that there is no linear correlation between Credit Balance and Size (ρ = 0, equivalently the slope β1 = 0); the alternative, Ha, is ρ ≠ 0.
Significance level: α = 0.05

Decision Rule: Reject Ho if p-value < 0.05

From the Analysis of Variance table, the p-value is 0.000, which is much less than 0.05. Therefore, I reject the null hypothesis of no correlation and conclude, according to the overall test of significance, that Size is a significant predictor of Credit Balance and the regression model is useful.
Correct
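In simple regression the ANOVA F-test and the t-test on the slope are equivalent (F = t^2), which is why both report the same p-value of 0.000. The sketch below recomputes F from the sums of squares and degrees of freedom in the ANOVA table, compares it with the square of the slope's t-value, and recovers S as the square root of the residual mean square; small differences come from MINITAB's rounding.

```python
import math

# Simple-regression ANOVA table entries
ss_regression, df_regression = 24092210, 1
ss_error, df_error = 18460853, 48

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error            # reported as 384601; its square root is S
f_stat = ms_regression / ms_error

t_slope = 403.22 / 50.95                  # Coef / SE Coef for Size
print(f"F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}, S = {math.sqrt(ms_error):.3f}")
```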

MINITAB OUTPUT:
Analysis of Variance

Source          DF  SS        MS        F      P
Regression       1  24092210  24092210  62.64  0.000
Residual Error  48  18460853    384601
Total           49  42553062

6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT BALANCE? Explain.

Based on my findings, Size is a good predictor of Credit Balance. The two variables are positively correlated (r = 0.752), the relationship is statistically significant (p-value = 0.000), and Size explains 56.6% of the variation in Credit Balance. As household size increases, the credit balance of the household tends to increase as well.
Correct

7. Compute the 95% confidence interval for β1. Interpret this interval.

N/A

8. Using an interval, estimate the average credit balance for customers that have a household size of 5. Interpret this interval.

We are 95% confident that the average credit balance of all customers with a household size of 5 lies between $4,368.2 and $4,846.9. This is the 95% confidence interval estimate for the mean credit balance of customers with a household size of 5.
Correct
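Questions 7 and 8 both call for an interval of the form estimate ± t(α/2, n - 2) × standard error, with 48 degrees of freedom. The sketch below assumes t(0.025, 48) ≈ 2.0106, taken from a t table rather than from the MINITAB output, and the helper name `t_interval` is hypothetical. For the slope it gives roughly (300.8, 505.7): with 95% confidence, each additional household member is associated with an increase of about $301 to $506 in the mean credit balance. For the mean response at Size = 5 it reproduces MINITAB's 95% CI up to the rounding of SE Fit.

```python
T_CRIT = 2.0106  # assumed t(0.025, 48) from a t table; not part of the MINITAB output


def t_interval(estimate: float, std_error: float, t_crit: float = T_CRIT) -> tuple[float, float]:
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Question 7: slope of Size (Coef and SE Coef from the Predictor table)
slope_ci = t_interval(403.22, 50.95)
# Question 8: mean credit balance at Size = 5 (Fit and SE Fit from the prediction output)
mean_ci = t_interval(4607.5, 119.0)

print(f"95% CI for beta1: ({slope_ci[0]:.1f}, {slope_ci[1]:.1f})")
print(f"95% CI for mean at Size = 5: ({mean_ci[0]:.1f}, {mean_ci[1]:.1f})")
```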

MINITAB OUTPUT:

Predicted Values for New Observations

NewObs  Fit     SE Fit  95% CI            95% PI
1       4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations

NewObs  Size
1       5.00

9. Using an interval, predict the credit balance for a customer that has a household size of 5. Interpret this interval.

We are 95% confident that the credit balance of an individual customer with a household size of 5 lies between $3,337.9 and $5,877.2. This 95% prediction interval is wider than the confidence interval in question 8 because it must account for how much a single customer varies around the mean, not just the uncertainty in the estimated mean.
Correct
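The prediction interval is centered on the same fitted value but uses a larger standard error, sqrt(S^2 + SE Fit^2), because it must cover an individual customer rather than the mean. The sketch below assumes t(0.025, 48) ≈ 2.0106 from a t table and takes S, Fit and SE Fit from the output; it reproduces MINITAB's 95% PI up to rounding.

```python
import math

T_CRIT = 2.0106      # assumed t(0.025, 48) from a t table
S = 620.162          # standard error of the regression (from the output)
FIT = 4607.5         # fitted value at Size = 5
SE_FIT = 119.0       # standard error of the fitted mean at Size = 5

# One customer's prediction adds the scatter of individuals (S) to the uncertainty in the mean (SE Fit)
se_pred = math.sqrt(S ** 2 + SE_FIT ** 2)
pi_low, pi_high = FIT - T_CRIT * se_pred, FIT + T_CRIT * se_pred
print(f"95% PI at Size = 5: ({pi_low:.1f}, {pi_high:.1f})")
```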

MINITAB OUTPUT:

Predicted Values for New Observations

NewObs  Fit     SE Fit  95% CI            95% PI
1       4607.5  119.0   (4368.2, 4846.9)  (3337.9, 5877.2)

Values of Predictors for New Observations

NewObs  Size
1       5.00

10. What can we say about the credit balance for a customer that has a household size of 10? Explain your answer.

We cannot reliably say anything about the credit balance for a customer with a household size of 10. The largest household size in the data used to fit the regression model is 7, so Size = 10 lies well outside the observed range. Using the model there would be extrapolation: the linear relationship has not been observed beyond Size = 7, so the model cannot be used to estimate the credit balance for a household size of 10 accurately.
Correct
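One way to make the extrapolation rule explicit is to have the prediction refuse inputs outside the range of Size used to fit the model. The sketch below is hypothetical (the function name and error message are illustrative); the bounds 1 and 7 are the smallest and largest household sizes in the data.

```python
# Observed range of Size in the data used to fit the model (1 to 7 household members)
SIZE_MIN, SIZE_MAX = 1, 7
INTERCEPT, SLOPE = 2591.4, 403.22


def predict_within_range(size: float) -> float:
    """Return the fitted Credit Balance ($), refusing to extrapolate beyond the observed Size range."""
    if not SIZE_MIN <= size <= SIZE_MAX:
        raise ValueError(f"Size = {size} is outside the observed range {SIZE_MIN}-{SIZE_MAX}; "
                         "the fitted line is not supported there")
    return INTERCEPT + SLOPE * size


print(predict_within_range(5))  # inside the range: the fitted value is returned
try:
    predict_within_range(10)
except ValueError as err:
    print(err)                  # Size = 10 is rejected as extrapolation
```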

In an attempt to improve the model, we fit a multiple regression model predicting CREDIT BALANCE based on INCOME, SIZE and YEARS.

11. Using MINITAB run the multiple regression analysis using the variables INCOME, SIZE and YEARS to predict CREDIT BALANCE. State the equation for this multiple regression model.

MINITAB OUTPUT:
Regression Analysis: Credit Balance($) versus Income ($1000), Size, Years

The regression equation is
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years

Predictor        Coef     SE Coef  T     P
Constant         1276.0   273.6    4.66  0.000
Income ($1000)   32.272   4.348    7.42  0.000
Size             346.85   36.03    9.63  0.000
Years            7.88     12.34    0.64  0.526

S = 424.715 R-Sq = 80.5% R-Sq(adj) = 79.2%

Analysis of Variance

Source          DF  SS        MS        F      P
Regression       3  34255444  11418481  63.30  0.000
Residual Error  46   8297619    180383
Total           49  42553062

Source          DF  Seq SS
Income ($1000)   1  16703393
Size             1  17478430
Years            1     73620

Unusual Observations

     Income   Credit
Obs  ($1000)  Balance($)  Fit     SE Fit  Residual  St Resid
3    32.0     5100.0      3830.1  93.7    1269.9    3.07R
5    31.0     1864.0      3001.7  139.3   -1137.7   -2.84R
11   25.0     4208.0      3210.1  103.3   997.9     2.42R
17   55.0     4412.0      5250.3  116.3   -838.3    -2.05R

R denotes an observation with a large standardized residual.

The multiple regression equation is:
Credit Balance($) = 1276 + 32.3 Income ($1000) + 347 Size + 7.9 Years
Correct
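Each coefficient in the multiple regression is the expected change in Credit Balance for a one-unit change in that variable while the other two are held constant; for example, an extra household member adds about $347 when Income and Years are fixed. The sketch below evaluates the equation for a hypothetical customer (Income = 50, i.e. $50,000, Size = 5, Years = 10); these inputs are illustrative and do not come from the data set, and the function name is hypothetical. Income must be entered in thousands of dollars to match the units of its coefficient.

```python
# Coefficients from the MINITAB Predictor table for the multiple regression
COEFS = {"Income ($1000)": 32.272, "Size": 346.85, "Years": 7.88}
CONSTANT = 1276.0


def predict_multiple(income_k: float, size: float, years: float) -> float:
    """Fitted Credit Balance ($) from Income (in $1000), household Size and Years."""
    return (CONSTANT
            + COEFS["Income ($1000)"] * income_k
            + COEFS["Size"] * size
            + COEFS["Years"] * years)


# Hypothetical customer: income $50,000, household of 5, 10 years (illustrative values only)
print(f"{predict_multiple(50, 5, 10):.2f}")
```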

12. Perform the Global Test for Utility (F-Test). Explain your conclusion.

The null hypothesis, Ho, states that none of the independent variables is linearly related to Credit Balance: β1 = β2 = β3 = 0. The alternative, Ha, is that at least one βi ≠ 0.
Significance level: α = 0.05
Decision Rule: Reject Ho if p-value < 0.05

From the Analysis of Variance table, the p-value (0.000) is much less than 0.05. Therefore, we reject the null hypothesis and conclude that at least one of Income, Size and Years is a significant predictor of Credit Balance; according to the overall test of significance, the multiple regression model is useful.
Correct
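The global F statistic is the ratio of the regression mean square to the residual mean square. The sketch below recomputes it from the multiple-regression ANOVA table, together with R-Sq and the adjusted R-Sq, which penalizes additional predictors. Adjusted R-Sq rising from 55.7% with Size alone to about 79.2% suggests the added variables improve the fit by more than the extra terms alone would.

```python
# Multiple-regression ANOVA table entries
ss_regression, df_regression = 34255444, 3
ss_error, df_error = 8297619, 46
ss_total, n = 42553062, 50
k = df_regression  # number of independent variables

f_stat = (ss_regression / df_regression) / (ss_error / df_error)
r_squared = ss_regression / ss_total
r_squared_adj = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"F = {f_stat:.2f}, R-Sq = {r_squared:.1%}, R-Sq(adj) = {r_squared_adj:.1%}")
```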

MINITAB OUTPUT:

Test for Equal Variances: Credit Balance($) versus Income ($1000)

95% Bonferroni confidence intervals for standard deviations

Income($1000)  N  Lower    StDev    Upper
21             2  267.855  830.85   344720
22             2  188.069  583.36   242037
23             1  *        *        *
25             1  *        *        *
26             1  *        *        *
27             2  101.215  313.96   130260
29             1  *        *        *
30             3  123.736  309.43   7053
31             1  *        *        *
32             1  *        *        *
33             1  *        *        *
34             1  *        *        *
35             1  *        *        *
37             2  328.265  1018.23  422465
39             2  276.062  856.31   355281
40             1  *        *        *
41             1  *        *        *
42             1  *        *        *
44             1  *        *        *
46             1  *        *        *
48             2  80.471   249.61   103563
50             2  259.193  803.98   333571
51             1  *        *        *
52             1  *        *        *
54             3  396.622  991.86   22607
55             4  290.865  647.76   5780
61             1  *        *        *
62             2  221.807  688.01   285457
63             1  *        *        *
64             1  *        *        *
65             1  *        *        *
66             2  87.765   272.24   112951
67             2  70.212   217.79   90361

Bartlett's Test (Normal Distribution)
Test statistic = 5.59, p-value = 0.935

Levene's Test (Any Continuous Distribution)
Test statistic = 1.01, p-value = 0.479

Test for Equal Variances: Credit Balance($) versus Size

95% Bonferroni confidence intervals for standard deviations

Size  N   Lower    StDev    Upper
1     5   137.540  271.807  1303.27
2     15  459.836  698.998  1337.23
3     8   193.542  336.323  943.85
4     9   415.251  701.689  1796.00
5     5   340.696  673.284  3228.28
6     5   360.277  711.981  3413.83
7     3   150.085  356.267  5956.16

Bartlett's Test (Normal Distribution)
Test statistic = 8.07, p-value = 0.233

Levene's Test (Any Continuous Distribution)
Test statistic = 1.12, p-value = 0.369

Test for Equal Variances: Credit Balance($) versus Years

95% Bonferroni confidence intervals for standard deviations

Years  N  Lower    StDev    Upper
1      2  541.930  1714.03  875261
2      1  *        *        *
4      2  452.950  1432.60  731550
5      2  130.788  413.66   211232
6      1  *        *        *
7      2  78.920   249.61   127462
8      1  *        *        *
9      2  76.013   240.42   122768
10     2  135.483  428.51   218815
11     4  204.115  461.26   4413
12     4  348.641  787.86   7538
13     4  167.957  379.55   3631
14     5  584.321  1221.32  7236
15     3  232.333  590.58   14935
16     4  231.705  523.61   5010
17     2  111.114  351.43   179457
18     5  452.721  946.25   5607
19     2  121.398  383.96   196067
20     2  540.589  1709.78  873094

Bartlett's Test (Normal Distribution)
Test statistic = 13.77, p-value = 0.543

Levene's Test (Any Continuous Distribution)
Test statistic = 2.23, p-value = 0.029

Conclusion: the p-values of Bartlett's test (which assumes normality) are all greater than 0.05 (0.935 for Income, 0.233 for Size and 0.543 for Years), so I am unable to reject the null hypothesis of equal variances. Levene's test, which does not assume normality, also fails to reject equal variances for Income (p = 0.479) and Size (p = 0.369); for Years, however, its p-value of 0.029 is below 0.05, which suggests the variances are not equal across the Years groups.

13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, which independent variables should we keep and which should be discarded.

Test the significance of the individual coefficients of the independent variables. The null hypothesis, Ho, states that the independent variable is not significant, i.e., its coefficient βi = 0.
Decision Rule: Reject Ho if p-value