
Statistics 578 Assignment 5 Homework



578 Assignment 5 (Chs. 13 and 14), solutions. Due by midnight on Sunday, December 2nd, 2012 (drop box 4). 70 points

True/False (One point each)

Chapter 13

1. The standard error of the estimate (standard error) is the estimated standard deviation of the distribution of the independent variable (X).
FALSE. It is the estimate of the standard deviation of the error term.

2. In a simple linear regression model, the coefficient of determination only indicates the strength of the relationship between the independent and dependent variable, but does not show whether the relationship is positive or negative.
TRUE. R² is always greater than or equal to 0, so it carries no sign.

3. When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.
FALSE. The strong correlation could be negative, and correlation alone does not establish causation.

4. The error term is the difference between an individual value of the dependent variable and the corresponding mean value of the dependent variable.
FALSE. It is the difference between an individual value of the dependent variable and the corresponding predicted value (not the mean value). The residual and the error term are treated as the same thing here.

5. In bi-variate regression the Coefficient of Determination is always equal to the square of the correlation coefficient. TRUE

6. In Regression Analysis, if the variance of the error term is constant, we call it the Heteroscedasticity property.
FALSE. Constant error variance is the Homoscedasticity property (Instruction pages 10-11).


Chapter 14

7. When the F test is used to test the overall significance of a multiple regression model, if the null hypothesis is rejected, it can be concluded that all of the independent variables X1, X2, ..., Xk are significantly related to the dependent variable Y.
FALSE. We can conclude that at least one (not all)….

8. An application of the multiple regression model generated the following results involving the F test of the overall regression model: p-value = .0012, R² = .67 and s = .076. Thus, the null hypothesis, which states that none of the independent variables are significantly related to the dependent variable, should be rejected even at the .01 level of significance.
TRUE, since the p-value (.0012) is less than .01.

9. High Multicollinearity problem occurs when the Independent variables are highly correlated with the Dependent variable. FALSE It occurs when there is high linear relation among the Independent variables.

10. The assumption of independent error terms in regression analysis is often violated when using time series data and is called the problem of Autocorrelation. TRUE see Instructions

11. Homoscedasticity problem occurs when the assumption of constant error variance is violated. FALSE. This problem is called Heteroscedasticity and frequently occurs in cross-sectional data.

Multiple Choice (Two points each)

Chapter 13

1. All of the following are assumptions of the error terms in the simple linear regression model except:
A. Errors are normally distributed
B. Error terms have a mean of zero
C. Error terms have a constant variance
D. Error terms depend on the explanatory variable
Answer: D (Instruction pages 10-11, Book page 530)


2. The point estimate of the variance in a regression model is
A. SSE
B. MSE
C. se
D. b1
Answer: B

3. The least squares regression line minimizes the
A. Sum of differences between actual and predicted Y values
B. Sum of squared differences between actual and predicted X values
C. Sum of absolute deviations between actual and predicted X values
D. Sum of absolute deviations between actual and predicted Y values
E. Sum of squared differences between actual and predicted Y values
Answer: E

4. The ___________ the R² and the __________ the s (standard error), the stronger the relationship between the dependent variable and the independent variable.
A. Higher, lower
B. Lower, higher
C. Lower, lower
D. Higher, higher
Answer: A

5. In simple bivariate regression analysis, if the correlation coefficient is a positive value, then
A. The Y intercept must also be a positive value.
B. The coefficient of determination can be either positive or negative, depending on the value of the slope.
C. The least squares regression equation could either have a positive or a negative slope.
D. The standard error of estimate can either have a positive or a negative value.
E. The slope of the regression line must also be positive.
Answer: E (The slope coefficient and the correlation coefficient have the same sign in bivariate regression. This is also obvious from the interpretation of the slope in the Instructions. Note, however, that the relation could be weak or strong: a positive sign only shows the direction, not the magnitude.)

6. A researcher wants to explore the relationship between the grades students receive on their Midterm test and their Final test score. The following data present the Midterm and Final scores for ten students. What is the correlation coefficient?

Mid   Fin
180   280
195   280
210   300
225   316
240   320
255   350
255   370
264   320
265   400
290   350

A. 0.556
B. 0.645
C. 0.738
D. 0.802
E. 0.905
Answer: D

The MegaStat result is given below:

Correlation Matrix

       Mid     Fin
Mid    1.000
Fin    .802    1.000

10 sample size
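The MegaStat figure can be checked by hand. The following is a minimal Python sketch, assuming the ten Midterm/Final pairs listed in question 6; the function name pearson_r is illustrative and not part of the assignment.

```python
from math import sqrt

# Midterm and Final scores for the ten students in question 6.
mid = [180, 195, 210, 225, 240, 255, 255, 264, 265, 290]
fin = [280, 280, 300, 316, 320, 350, 370, 320, 400, 350]


def pearson_r(x, y):
    """Sample correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(mid, fin)
print(f"r = {r:.3f}")
```

The result should agree with the .802 entry in the correlation matrix, up to rounding.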

Chapter 14

7. Which is not an assumption of a multiple regression model?
A. Positive autocorrelation of error terms
B. Normality of error terms
C. Independence of error terms
D. Constant variation of error terms
E. Independence of error terms with X variables
Answer: A (see Instructions)

8. A multiple regression analysis with 22 observations on each of four independent variables and the dependent variable would yield ______ and _______ degrees of freedom respectively for regression (explained) and error.
A. 3, 17
B. 4, 20
C. 4, 18
D. 3, 20
E. 4, 17
Answer: E. df for regression = k = 4 and df for error = n-k-1 = 22-4-1 = 17

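9. Consider the following partial computer output for a multiple regression model.

[Computer output not reproduced.]

What is R²?
A. 31.308%
B. 76.95%
C. 77.72%
D. 72.63%
E. 23.1%
Answer: B. R² = SSR/SST, where the denominator is SST. With the SSE = 9.378 and SST = 40.686 that appear in the solution to question 10, R² = 1 - 9.378/40.686 = .7695.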


10. Consider the following partial computer output for a multiple regression model.

[Computer output not reproduced.]

What is adjusted R²?
A. 31.308%
B. 76.95%
C. 87.72%
D. 72.63%
E. 23.1%
Answer: D. Adjusted R² = 1 - [SSE/(n-k-1)]/[SST/(n-1)] = 1 - (9.378/16)/(40.686/19) = .7263
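The R² and adjusted R² arithmetic for questions 9 and 10 can be reproduced as below. This is a sketch under stated assumptions: the computer output itself is not shown, so n = 20 and k = 3 are inferred from the divisors 16 = n-k-1 and 19 = n-1 used in the solution, and the function names are illustrative.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE/SST (equivalently SSR/SST)."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# SSE and SST as quoted in the solution to question 10.
sse, sst = 9.378, 40.686
# Assumption: inferred from the degrees of freedom 16 and 19 in the solution.
n, k = 20, 3

r2 = r_squared(sse, sst)
adj_r2 = adjusted_r_squared(sse, sst, n, k)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

Adjusted R² is always at most R², because the adjustment penalizes each additional independent variable.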

11. In multiple regression analysis, the mean square regression divided by mean square error yields the:
A. Standard error
B. F statistic
C. R²
D. Adjusted R²
E. t statistic
Answer: B (see Instructions)

12. A particular multiple regression model has 3 independent variables, the sum of the squared error is 7680 and the total number of observations is 34. What is the value of the standard error of estimate?
A. 256
B. 232.72
C. 225.89
D. 16
E. 15.03
Answer: D. The df for error = 34-3-1 = 30 and the standard error of estimate is √MSE = √(7680/30) = 16.

Essay Type (Five points each)

Chapter 13

1. Use the following results obtained from a simple linear regression analysis with 12 observations.

Ŷ = 37.2895 - (1.2024)X
r² = 0.6744, sb1 = 0.2934

Test to determine if there is a significant negative relationship between the independent and dependent variable at α = .05 and .01.

Reject H0. There is a significant negative relationship between the dependent and independent variable.

H0: β1 ≥ 0 and Ha: β1 < 0 (one-tailed test)

b1 = -1.2024; therefore t(hat) = b1/sb1 = -1.2024/0.2934 = -4.09816; df = 12-2 = 10

The table t-values (one-tailed) are 1.812 and 2.764 for the 5% and 1% significance levels.

Comparing the absolute value of t(hat) with the critical values in the table, we conclude that b1 is highly significant, that is, significantly different from zero (and therefore negative here), even at the 1% significance level (or with 99% confidence).
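The one-tailed test above can be sketched in Python as follows. The critical values are copied from the t table quoted in the solution rather than computed, and the variable names are illustrative.

```python
# Values quoted in essay question 1.
b1 = -1.2024     # estimated slope
s_b1 = 0.2934    # standard error of the slope
n = 12           # number of observations

t_stat = b1 / s_b1   # test statistic for H0: slope >= 0
df = n - 2           # simple regression estimates two parameters

# One-tailed critical values for df = 10, copied from the t table above.
critical_values = {0.05: 1.812, 0.01: 2.764}

# Left-tailed test: reject H0 when t falls below the negative critical value.
decisions = {alpha: t_stat < -t_crit for alpha, t_crit in critical_values.items()}

print(f"t = {t_stat:.4f}, df = {df}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} H0")
```

Because the alternative hypothesis is that the slope is negative, the comparison is made against the negative of each critical value rather than its absolute value in both tails.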

2. A local tire dealer wants to predict the number of tires sold each month. He believes that the number of tires sold is a linear function of the amount of money invested in advertising. He randomly selects 6 months of data consisting of tire sales (in thousands of tires) and advertising expenditures (in thousands of dollars). Based on the data set with 6 observations, the simple linear regression model yielded the following results (X is advertising expenditure in thousand dollars and Y is tires sold in thousands): ∑X = 24; ∑Y = 42; ∑X² = 124; ∑Y² = 338; ∑XY = 196

Find the intercept and slope and write the regression equation. Also predict the number of tires (in thousand tires) sold when the money invested in advertising is 5 thousand dollars. Calculate the correlation coefficient and the coefficient of determination. Check whether there is a relation between the correlation coefficient and the coefficient of determination. Calculate SSE, MSE and the standard error of the slope coefficient.

SSxy = 196 - (24*42)/6 = 28
SSxx = 124 - (24²)/6 = 28
SSyy = SST = 338 - (42²)/6 = 44

b1 = SSxy/SSxx = 28/28 = 1 and b0 = Ȳ - b1*X̄ = 7 - 4 = 3.

The estimated regression equation is Ŷ = 3 + 1*X, or simply Ŷ = 3 + X. Since advertising is measured in thousand dollars, we enter 5 for 5 thousand dollars in the equation for prediction. At X = 5, Ŷ = 3 + 5 = 8, or 8 thousand tires sold.

SSR = SSxy²/SSxx = 28²/28 = 28
SSE = SST - SSR = 44 - 28 = 16

R² = SSR/SST = 28/44 = 0.6364: the model explains 63.64% of the variation in tire sales.

Correlation coefficient r = SSxy/√(SSxx*SSyy) = 28/√(28*44) = 0.7977

We see that √.6364 = .7977. Thus it is verified that the coefficient of determination is the square of the correlation coefficient.

MSE = SSE/(6-2) = 16/4 = 4.

Standard error of estimate = se = √4 = 2
sb1 = se/√SSxx = 2/√28 = 0.378
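The whole calculation depends only on the five sums given in the question, so it can be sketched as a short Python script. The variable names are illustrative, and the numbers are those of the tire dealer problem.

```python
from math import sqrt

# Summary statistics given in essay question 2.
n = 6
sum_x, sum_y = 24, 42
sum_x2, sum_y2, sum_xy = 124, 338, 196

# Corrected sums of squares and cross products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n   # this is also SST

# Least squares estimates and the prediction at X = 5 (thousand dollars).
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n
y_hat_5 = b0 + b1 * 5

# Explained and unexplained variation.
ssr = ss_xy ** 2 / ss_xx
sse = ss_yy - ssr
r2 = ssr / ss_yy
r = ss_xy / sqrt(ss_xx * ss_yy)

# Error variance, standard error of estimate, standard error of the slope.
mse = sse / (n - 2)
s_e = sqrt(mse)
s_b1 = s_e / sqrt(ss_xx)

print(f"Y-hat = {b0:.0f} + {b1:.0f} X, prediction at X = 5: {y_hat_5:.0f}")
print(f"R^2 = {r2:.4f}, r = {r:.4f}, SSE = {sse:.0f}, MSE = {mse:.0f}")
print(f"s_e = {s_e:.0f}, s_b1 = {s_b1:.3f}")
```

Working from the sums rather than the raw observations mirrors the calculator method used in the solution.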

3. Consumer Reports provided extensive testing and ratings for more than 100 HDTVs. An overall score, based primarily on picture quality, was developed for each model. In general, a higher overall score indicates better performance. The following data show the price and overall score for the ten 42-inch plasma televisions (Consumer Reports data slightly changed here):

Brand       Price   Score
Dell        2800    60
Hisense     2800    55
Hitachi     2700    45
JVC         3500    50
LG          3300    55
Maxent      2000    38
Panasonic   4000    67
Phillips    3000    56
Proview     2500    32
Samsung     3000    40

Use the above data to develop an estimated regression equation. Compute the Coefficient of Determination and the correlation coefficient and show their relation. Interpret the explanatory power of the model. Estimate the overall score for a 42-inch plasma television with a price of $3400.

The MegaStat Regression result is shown below:

Regression Analysis

R²           0.491     n           10
r            0.701     k           1
Std. Error   8.243     Dep. Var.   Score

ANOVA table

Source       SS           df   MS         F      p-value
Regression   524.0257     1    524.0257   7.71   .0240
Residual     543.5743     8    67.9468
Total        1,067.6000   9

Regression output (with 95% confidence interval)

variables   coefficients   std. error   t (df=8)   p-value   95% lower   95% upper   std. coeff.
Intercept   8.8950         14.9582      0.595      .5685     -25.5987    43.3888     0.000
Price       0.0138         0.0050       2.777      .0240     0.0023      0.0253      0.701

Observation   Score   Predicted   Residual
1             60.0    47.6        12.4
2             55.0    47.6        7.4
3             45.0    46.2        -1.2
4             50.0    57.3        -7.3
5             55.0    54.5        0.5
6             38.0    36.5        1.5
7             67.0    64.2        2.8
8             56.0    50.4        5.6
9             32.0    43.4        -11.4
10            40.0    50.4        -10.4

Estimated regression equation: Ŷ = 8.895 + 0.0138X
Coefficient of determination: R² = 0.491
Correlation coefficient: r = 0.701
Relation: r² = 0.701² = 0.491 = R²

The regression model can explain 49.1% of the variation in the dependent variable (overall score).
Estimated overall score when X = 3400: Ŷ = 8.895 + 0.0138*3400 = 55.82

Calculator based:

Brand       Score=Y   Price=X   xi = Xi-X̄   yi = Yi-Ȳ   xi*yi    xi²       yi²
Dell        60        2800      -160        10.2        -1632    25600     104.04
Hisense     55        2800      -160        5.2         -832     25600     27.04
Hitachi     45        2700      -260        -4.8        1248     67600     23.04
JVC         50        3500      540         0.2         108      291600    0.04
LG          55        3300      340         5.2         1768     115600    27.04
Maxent      38        2000      -960        -11.8       11328    921600    139.24
Panasonic   67        4000      1040        17.2        17888    1081600   295.84
Phillips    56        3000      40          6.2         248      1600      38.44
Proview     32        2500      -460        -17.8       8188     211600    316.84
Samsung     40        3000      40          -9.8        -392     1600      96.04
Sum         498       29600     0           0           37920    2744000   1067.6
                                                        (SSxy)   (SSxx)    (SSyy)

X̄ = 29600/10 = 2960
Ȳ = 498/10 = 49.8
b1 = SSxy/SSxx = 37920/2744000 = 0.0138
b0 = 49.8 - (.0138*2960) = 8.895
Ŷ = 8.895 + 0.0138X

If X = 3400, Ŷ = 8.8950 + (.0138*3400) = 55.82
SST = 1067.6
SSR = .0138*37920 = 524
SSE = 1067.6 - 524 = 543.6
R² = SSR/SST = 524/1067.6 = 0.491
r = SSxy/√(SSxx*SSyy) = 37920/√(2744000*1067.6) = 0.701

r = √.491 = 0.701. The relationship between r and R² is verified.
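As a cross-check on both the MegaStat output and the calculator method, the following Python sketch recomputes the regression from the raw price and score data. The variable names are illustrative. Note one assumption about rounding: the script keeps the unrounded slope, so its prediction at X = 3400 comes out near 55.88, whereas the hand solution rounds the slope to 0.0138 first and reports 55.82.

```python
from math import sqrt

# Price (X) and overall score (Y) for the ten televisions in essay question 3.
prices = [2800, 2800, 2700, 3500, 3300, 2000, 4000, 3000, 2500, 3000]
scores = [60, 55, 45, 50, 55, 38, 67, 56, 32, 40]

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(scores) / n

# Sums of squares and cross products of the deviations from the means.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, scores))
ss_xx = sum((x - x_bar) ** 2 for x in prices)
ss_yy = sum((y - y_bar) ** 2 for y in scores)   # SST

b1 = ss_xy / ss_xx            # slope
b0 = y_bar - b1 * x_bar       # intercept

ssr = b1 * ss_xy              # explained variation
sse = ss_yy - ssr             # unexplained variation
r2 = ssr / ss_yy
r = ss_xy / sqrt(ss_xx * ss_yy)
std_error = sqrt(sse / (n - 2))   # standard error of estimate

y_hat_3400 = b0 + b1 * 3400   # uses the unrounded slope

print(f"Y-hat = {b0:.3f} + {b1:.4f} X")
print(f"R^2 = {r2:.3f}, r = {r:.3f}, std. error = {std_error:.3f}")
print(f"prediction at X = 3400: {y_hat_3400:.2f}")
```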

Chapter 14

4. A member of the state legislature has expressed concern about the differences in the mathematics test scores of high school freshmen across the state. She asks her research assistant to conduct a study to investigate what factors could account for the differences. The research assistant looked at a random sample of school districts across the state and used the factors of percentage of mathematics teachers in each district with a degree in mathematics, the average age of mathematics teachers and the average salary of mathematics teachers.

[Regression output and Analysis of Variance table not reproduced.]

Write the least squares prediction equation. What is the number of observations in the sample? Based on the multiple regression model given above, estimate the mathematics test score and calculate the value of the residual, if the percentage of teachers with a mathematics degree is 50.0, the average age is 43 and the average salary is 48,300 (48.3). If the actual mathematics test score for these factors is 68.50, what is the error for this observation?

Ŷ = 35.178 + 0.22073X1 + 0.3353X2 + 0.0930X3
n = 3 + 32 + 1 = 36

Estimated test score = 65.12 and residual = 3.38
Ŷ = 35.178 + 0.22073(50) + 0.3353(43) + 0.0930(48.3) = 65.12
e = 68.50 - 65.12 = 3.38
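The prediction and residual can be sketched in a few lines of Python. The coefficients are those in the prediction equation above; the dictionary keys are hypothetical labels for the three factors, since the original output is not shown.

```python
# Coefficients from the least squares prediction equation in question 4.
intercept = 35.178
coefficients = {
    "pct_math_degree": 0.22073,   # X1: percentage of teachers with a math degree
    "avg_age": 0.3353,            # X2: average age of math teachers
    "avg_salary": 0.0930,         # X3: average salary, in thousands
}

# Values of the factors for the observation in the question.
observation = {"pct_math_degree": 50.0, "avg_age": 43, "avg_salary": 48.3}
actual_score = 68.50

# Predicted score: intercept plus the sum of coefficient * value.
predicted = intercept + sum(coefficients[name] * observation[name] for name in coefficients)
residual = actual_score - predicted   # actual minus predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the observed district scored higher than the model predicts.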

5. For the above equation (question # 4) answer the following: What is the total sum of squares?  What is the explained variation? What is the mean square error? 

SS Total = SST = 1053.09 + 1858.50 = 2911.59
SSR = explained variation = 1053.09
SSE = 2911.59 - 1053.09 = 1858.50 and MSE = 1858.50/32 = 58.08

6. For the above equation (question # 4), calculate the Coefficient of Determination and the Adjusted coefficient of Determination and Test for the overall usefulness of the model using F-Statistic at 5% and 1% significance levels.

R² = 1053.09/2911.59 = 0.3617
Adjusted R² = (0.3617 - (3/35))(35/32) = (0.3617 - 0.0857)(1.0938) = 0.3019
MSR = 1053.09/3 = 351.03
MSE = 1858.50/32 = 58.08
F = MSR/MSE = 351.03/58.08 = 6.04
F.01,3,32 = 4.51. Since 6.04 > 4.51, reject H0.

The coefficient of determination is highly significant and the model is useful. (Since F exceeds the critical value at 1%, we don't need to test at 5%.) However, the model explains only 36.17% of the variation (R²) and 30.19% of the variance (adjusted R²) in the dependent variable.
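The overall F test can be sketched as follows. The sums of squares and degrees of freedom are those used in questions 5 and 6, and the critical value 4.51 is copied from the solution rather than computed; the variable names are illustrative.

```python
# ANOVA quantities from questions 5 and 6.
ssr = 1053.09      # explained variation
sse = 1858.50      # unexplained variation
k = 3              # number of independent variables (regression df)
df_error = 32      # error df = n - k - 1

n = k + df_error + 1   # number of observations
msr = ssr / k          # mean square regression
mse = sse / df_error   # mean square error
f_stat = msr / mse

# Critical value at the 1% level, as quoted in the solution.
f_crit_01 = 4.51
reject_h0 = f_stat > f_crit_01

print(f"n = {n}, MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("reject H0 at 1%" if reject_h0 else "do not reject H0 at 1%")
```

Rejecting at the 1% level implies rejection at the 5% level as well, since the 5% critical value is smaller.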

7. For the above regression (question # 4), test the usefulness (or significance) of the three independent variables using the t-test at the 5% and 1% significance levels.

t1 = 0.22073/0.07131 = 3.095
t2 = 0.3353/0.1901 = 1.764
t3 = 0.0930/0.1675 = 0.555

The two-tailed t-values for 32 df are 2.037 and 2.738 for 5% and 1%, respectively.

Thus, only the first slope is statistically significant at the 1% level, and the other two are insignificant even at 5%. In other words, statistically speaking, only the percentage of teachers with a Math degree is significant (that is, useful) in explaining the changes in Math test scores given the current sample results. This is not surprising considering the low value of the coefficient of determination.
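The three individual t tests follow the same pattern, so they can be sketched as a loop. The coefficients and standard errors are those used in the solution, the critical values are copied from the quoted t table, and the labels X1, X2, X3 follow the prediction equation in question 4.

```python
# (coefficient, standard error) for each independent variable, from question 7.
slopes = {
    "X1": (0.22073, 0.07131),   # percentage of teachers with a math degree
    "X2": (0.3353, 0.1901),     # average age
    "X3": (0.0930, 0.1675),     # average salary
}

# Two-tailed critical values for 32 df, as quoted in the solution.
t_crit_05 = 2.037
t_crit_01 = 2.738

results = {}
for name, (coef, std_err) in slopes.items():
    t_stat = coef / std_err
    results[name] = {
        "t": t_stat,
        # Two-tailed test: compare the absolute value with the critical value.
        "significant_05": abs(t_stat) > t_crit_05,
        "significant_01": abs(t_stat) > t_crit_01,
    }

for name, res in results.items():
    print(f"{name}: t = {res['t']:.3f}, "
          f"5%: {res['significant_05']}, 1%: {res['significant_01']}")
```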
