FREC 409 Assignment 2 answer.pdf · Assignment 2 Be sure to: • Put your name and the Assignment # on the front • Answer as completely as you can. All I can go on is what you give

FREC 409 Assignment 2

Be sure to:
• Put your name and the Assignment # on the front
• Answer as completely as you can. All I can go on is what you give me, so show your work.
• Upload the file back to Sakai
• This assignment is worth 6 points toward your final grade. Be sure to give complete answers and give a narrative explaining what you did when asked.

Problem. The Newark Coffee Company sells coffee and related products through a large number of stores. Some of those stores are located in a college town and some have a drive-through service. We are involved in a study to determine the effect that square footage, location in a college town, and the presence of a drive-through have on sales. We randomly select 53 stores and conduct a study using regression analysis. The file Newark Coffee.JMP can be found on the web site. The variables are:

Dependent: Weekly Sales in dollars for the week the store was sampled

Independent:
• SqFt: the square footage of the store
• College: a nominal variable that indicates whether a college is located nearby
• Drivethru: a nominal variable that indicates whether the store has a drive-thru service

A. Use JMP to generate basic descriptive statistics for each of the variables in the analysis. Briefly describe each variable in words so that the reader has a good sense of the weekly sales and square footage of the sampled stores, as well as the proportion of stores that are in a college town or have a drive-through window.

The average weekly sales for the 53 stores is $22,462.85. The median, $22,956.40, is close to the mean. The histogram shows a symmetric, mound-shaped distribution that looks approximately normal. The standard deviation ($2,340.49) is small relative to the mean, and the CV is only 10.42%.

The average store size was 925.81 sq. ft. The distribution is skewed right, but without any extreme outliers. The median is 939 sq. ft. The spread of the data is small, with a CV of only 13.17%.

22.64% of the stores in the sample were located in college towns.

Only 13.21% of the stores in the sample had a drive thru window service.
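The CVs and percentages above follow directly from the summary statistics. Below is a minimal Python check, assuming only values reported in this answer: the SqFt standard deviation of 121.9 is the one used in Extra Credit 1, and the counts of 12 college-town stores and 7 drive-thru stores are back-calculated from the percentages (0.2264 × 53 ≈ 12, 0.1321 × 53 ≈ 7). They are not read from the JMP output.

```python
import math

N = 53  # stores in the sample

def cv_percent(sd: float, mean: float) -> float:
    """Coefficient of variation in percent: 100 * sd / mean."""
    return 100.0 * sd / mean

def dummy_sd(count: int, n: int) -> float:
    """Sample (n - 1) standard deviation of a 0/1 variable with `count` ones."""
    return math.sqrt(count * (n - count) / (n * (n - 1)))

# Weekly Sales: mean and SD as reported in Part A
print(round(cv_percent(2340.49, 22462.85), 2))

# SqFt: the SD of 121.9 is the value used in Extra Credit 1
print(round(cv_percent(121.9, 925.81), 2))

# Hypothetical counts, back-calculated from the reported percentages
COLLEGE_YES, DRIVETHRU_YES = 12, 7
print(round(100 * COLLEGE_YES / N, 2), round(100 * DRIVETHRU_YES / N, 2))

# SD of College Dum, which Extra Credit 1 uses as s2
print(round(dummy_sd(COLLEGE_YES, N), 4))
```

Run as written, this reproduces the 10.42% and 13.17% CVs, the 22.64% and 13.21% proportions, and the College Dum standard deviation of 0.4225 used later as s2.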

B. Use JMP to run the correlations for all the variables. I would list Weekly Sales first. I have already created the dummy variables for College and Drivethru. To generate correlations in JMP, use: Analyze; Multivariate Methods; Multivariate. Then add the variables of interest. For correlations you must have continuous variables, so use College Dum and Drivethru Dum rather than the nominal versions of College and Drivethru. You can work with just the correlations. Briefly describe the correlation of each independent variable with Weekly Sales, as well as any high correlations between the independent variables.

The highest correlation with Weekly Sales is SqFt at .8101: there is a strong positive correlation between the size of the store and weekly sales. There is also a moderate positive correlation (.6804) between sales and whether the store is located in a college town; stores in a college town have higher sales on average. A moderate to strong negative correlation exists between sales and whether the store has a drive-thru window service: stores with a drive-thru window have lower weekly sales on average. Both College Dum and Drivethru Dum are moderately correlated with SqFt. Stores in college towns are larger on average (.5281), while stores with a drive-thru are smaller on average (-.580).
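Because College Dum and Drivethru Dum are coded 0/1, their correlations with Weekly Sales are point-biserial correlations, and they can be recovered from the group means reported in Part C: r = (mean1 - mean0) / sY * sqrt(n1 * n0 / (n * (n - 1))), where sY is the sample standard deviation of Weekly Sales. A minimal sketch, assuming group sizes of 12/41 (College) and 7/46 (Drivethru) that are back-calculated from the Part A percentages rather than taken from JMP:

```python
import math

S_Y = 2340.49  # sample SD of Weekly Sales (Part A)

def point_biserial(mean_1: float, mean_0: float, n_1: int, n_0: int, s_y: float) -> float:
    """Pearson correlation between Y and a 0/1 dummy, computed from group means.

    mean_1 / n_1: mean and size of the group coded 1
    mean_0 / n_0: mean and size of the group coded 0
    s_y: sample (n - 1) standard deviation of Y over all n = n_1 + n_0 cases
    """
    n = n_1 + n_0
    return (mean_1 - mean_0) / s_y * math.sqrt(n_1 * n_0 / (n * (n - 1)))

# College Dum: group means from Part C; the 12 / 41 split is back-calculated
r_college = point_biserial(25378.488, 21609.495, 12, 41, S_Y)
# Drivethru Dum: group means from Part C; the 7 / 46 split is back-calculated
r_drive = point_biserial(18228.60, 23107.20, 7, 46, S_Y)

print(f"r(Weekly Sales, College Dum)   = {r_college:.4f}")
print(f"r(Weekly Sales, Drivethru Dum) = {r_drive:.4f}")
```

The College value comes out at about 0.680, in line with the .6804 above. The Drivethru value, about -0.71, is derived here rather than read from JMP, and it is consistent with the moderate-to-strong negative correlation described.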

C. Look at the bivariate relationships of the independent variables with Weekly Sales. I suggest the following Fit Y by X commands. Note that I am suggesting using the nominal variables College and Drivethru for the bivariate comparisons.
• Analyze; Fit Y by X; Weekly Sales and SqFt. Ask to fit a line and interpret the bivariate regression.
• Analyze; Fit Y by X; Weekly Sales and College. Ask for Means/ANOVA/Pooled t. Interpret the difference of means test for sales of stores in a college town and those that are not.
• Analyze; Fit Y by X; Weekly Sales and Drivethru. Ask for Means/ANOVA/Pooled t. Interpret the difference of means test for sales of stores with a drive-through and those without.

1. Weekly Sales and SqFt

As expected from the correlation, there is a moderately strong positive relationship between the square footage of the store and weekly sales. Each additional square foot of space is associated with an increase of $15.55 in weekly sales. R² for this bivariate model is .656, indicating that SqFt accounts for 65.6% of the variability in weekly sales.
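In a bivariate regression the slope and R² follow directly from the correlation: b1 = r * sY / sX and R² = r². A minimal sketch using the correlation from Part B, the Weekly Sales SD from Part A, and the SqFt SD of 121.9 that appears in Extra Credit 1:

```python
R_Y1 = 0.8101     # correlation of Weekly Sales with SqFt (Part B)
S_Y = 2340.49     # SD of Weekly Sales (Part A)
S_X1 = 121.9      # SD of SqFt (as used in Extra Credit 1)

slope = R_Y1 * S_Y / S_X1   # simple-regression slope: b1 = r * sY / sX
r_squared = R_Y1 ** 2       # with one predictor, R^2 is the squared correlation

print(f"slope = {slope:.2f} dollars per sq. ft., R^2 = {r_squared:.3f}")
```

This gives a slope of about 15.55 and an R² of about .656, matching the JMP output up to rounding.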

2. Weekly Sales and College

There is a significant difference in weekly sales between college towns and non-college towns. College-town stores have average weekly sales of $25,378.50, while non-college-town stores average $21,609.50. The difference of the means is $3,768.99 per week and is highly significant (p < .001). Stores in college towns seem to have less variability in weekly sales (based on the box plots), though the difference does not appear to be large.

3. Weekly Sales and Drivethru

Stores with a drive-thru have lower average weekly sales ($18,228.60) than stores without one ($23,107.20). The difference of means is -$4,878.60 per week, a substantial amount, and it is statistically significant (p < .001).
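The pooled t statistics can be reconstructed from the summary numbers alone. The between-group sum of squares is n1 * n0 / n * (mean1 - mean0)², the within-group sum of squares is what remains of the total (n - 1) * sY², and t = (mean1 - mean0) / (sp * sqrt(1/n1 + 1/n0)) on n - 2 degrees of freedom. A minimal sketch, assuming the back-calculated group sizes 12/41 and 7/46. It does not need the group SDs, which are not reported here.

```python
import math

N, S_Y = 53, 2340.49  # sample size and SD of Weekly Sales (Part A)

def pooled_t(mean_1: float, mean_0: float, n_1: int, n_0: int, s_y: float) -> float:
    """Pooled two-sample t statistic rebuilt from group means and the overall SD."""
    n = n_1 + n_0
    ss_total = (n - 1) * s_y ** 2                            # total sum of squares
    ss_between = n_1 * n_0 / n * (mean_1 - mean_0) ** 2      # between-group sum of squares
    s_pooled = math.sqrt((ss_total - ss_between) / (n - 2))  # pooled within-group SD
    return (mean_1 - mean_0) / (s_pooled * math.sqrt(1 / n_1 + 1 / n_0))

# Group sizes are back-calculated from the Part A percentages (hypothetical)
t_college = pooled_t(25378.488, 21609.495, 12, 41, S_Y)
t_drive = pooled_t(18228.60, 23107.20, 7, 46, S_Y)

print(f"t(College)   = {t_college:.2f} on {N - 2} df")
print(f"t(Drivethru) = {t_drive:.2f} on {N - 2} df")
```

Both |t| values come out above 6 on 51 degrees of freedom, consistent with the reported p < .001. This is the same test as checking whether the correlation with the 0/1 dummy is zero.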

D. Now we are ready for the full model. Use the following model: Analyze; Fit Model; Weekly Sales is Y; SqFt, College Dum, and Drivethru Dum are the independent variables. In your write-up, interpret the overall fit of the model, the conclusion of the F-test, the conclusions of the individual t-tests for the coefficients, and the residuals. Next, look at the regression equation and interpret the results of the analysis, including the relative importance of each variable in the model. Pay attention to whether any of the bivariate relationships changed much in a multivariate framework. Summarize the overall findings in terms of the impact of having a store in a college town and having a drive-through window.

The overall fit of the model is quite good, with an R² of .8617: 86.17% of the variability in weekly sales is explained by square footage and by knowing whether the store was located in a college town and whether it had a drive-thru window. The residual plot also suggests a good fit to the data, with no visible patterns.

The F-test is significant (p < .001), indicating that at least one of the coefficients for the independent variables in the model is significantly different from zero.
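The overall F statistic is a function of R², the sample size n = 53, and the number of predictors k = 3: F = (R² / k) / ((1 - R²) / (n - k - 1)). A minimal sketch. Because R² is rounded to four digits here, treat the result as an approximation of the value JMP reports.

```python
R2, N, K = 0.8617, 53, 3  # full-model R^2, sample size, number of predictors

# Overall F test of H0: all slope coefficients are zero, on (K, N - K - 1) df
f_stat = (R2 / K) / ((1 - R2) / (N - K - 1))
# Adjusted R^2 penalizes R^2 for the number of predictors
adj_r2 = 1 - (1 - R2) * (N - 1) / (N - K - 1)

print(f"F({K}, {N - K - 1}) = {f_stat:.1f}, adjusted R^2 = {adj_r2:.4f}")
```

This gives F(3, 49) of roughly 102 and an adjusted R² of about .853, consistent with p < .001.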

The coefficient for SqFt is 6.714, which is considerably smaller than the bivariate value of 15.55. Controlling for College Dum and Drivethru Dum reduced the size of this coefficient. However, it is still statistically significant at p < .001.

The coefficient for College Dum is 2250.506 and is statistically significant at p < .001. This value is also less than the bivariate difference of means ($3,768.99) and indicates that stores in college towns still have higher weekly sales, but the difference is reduced once we control for the other factors in the model. The reduction in the value of this coefficient might be due to the correlation between College Dum and SqFt: stores in college towns tend to be bigger. However, there still remains a unique effect of a college-town store that is positive, substantial, and statistically significant.

The coefficient for Drivethru Dum is -2902.717 and is statistically significant at p < .001. This value is smaller in absolute terms than the bivariate difference of means (-$4,878.60) and indicates that stores with drive-thru windows have lower weekly sales, independent of SqFt and of whether the store is in a college town. The reduction in the value of this coefficient might be due to the correlation between Drivethru Dum and SqFt: stores with drive-thru windows tend to be smaller. However, there still remains a unique effect of a drive-thru window that is negative, substantial, and statistically significant.

Using the standardized coefficients, we can see that the largest effect in the model is Drivethru Dum (-.424), followed by College Dum (.406) and SqFt (.350). The Variance Inflation Factors (VIFs) are low, so collinearity is not a concern in this model.
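Standardized coefficients rescale each slope by the ratio of standard deviations, beta_j = b_j * s_j / sY, so all predictors are compared in SD units. A minimal sketch, assuming the SqFt SD of 121.9 from Extra Credit 1 and dummy SDs computed from back-calculated counts (12 and 7 of 53 stores):

```python
import math

S_Y = 2340.49  # SD of Weekly Sales

def dummy_sd(count: int, n: int) -> float:
    """Sample (n - 1) SD of a 0/1 variable with `count` ones out of n."""
    return math.sqrt(count * (n - count) / (n * (n - 1)))

# Unstandardized full-model coefficients paired with each predictor's SD
predictors = {
    "SqFt": (6.714, 121.9),
    "College Dum": (2250.506, dummy_sd(12, 53)),    # 12 of 53: back-calculated
    "Drivethru Dum": (-2902.717, dummy_sd(7, 53)),  # 7 of 53: back-calculated
}

# Standardized coefficient: beta_j = b_j * s_j / s_Y
std_coefs = {name: b * s / S_Y for name, (b, s) in predictors.items()}

# List predictors from largest to smallest absolute standardized effect
for name, beta in sorted(std_coefs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:14s} {beta:+.3f}")
```

The sketch ranks Drivethru Dum (-0.424), College Dum (+0.406), and SqFt (+0.350), matching the standardized estimates above.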

EXTRA CREDIT

If you so choose, you can receive up to 1 or up to 1.5 extra-credit points by answering one, but not both, of these questions. Pick one if you would like an extra challenge.

Extra Credit 1. Up to 1 point (6+1). Using the formulas from the book and the lecture, show how the covariances or the correlations between Y, X1, and X2 can be used to estimate the regression coefficients. Show all the work to get the estimated coefficients for B0, B1, and B2. For your answer, use the following variables:
• Y = Weekly Sales
• X1 = SqFt
• X2 = College Dum

$$B_1 = \frac{r_{Y1} - r_{12}\, r_{Y2}}{1 - r_{12}^2} \cdot \frac{s_Y}{s_1} = \frac{0.8101 - 0.5281 \times 0.6804}{1 - 0.5281^2} \times \frac{2340.5}{121.9} = 12.003$$

$$B_2 = \frac{r_{Y2} - r_{12}\, r_{Y1}}{1 - r_{12}^2} \cdot \frac{s_Y}{s_2} = \frac{0.6804 - 0.5281 \times 0.8101}{1 - 0.5281^2} \times \frac{2340.5}{0.4225} = 1940.4$$

$$B_0 = E(Y) - B_1 E(X_1) - B_2 E(X_2) = 22462.852 - 12.003 \times 925.81 - 1940.4 \times 0.2264 = 10911.1$$

So the fitted model is:

$$\hat{Y} = 10911.1 + 12.003\, X_1 + 1940.4\, X_2$$

We can also compare this with the JMP output below:
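The hand calculation can be scripted. Below is a minimal Python sketch of the same two-predictor formulas, using the rounded correlations, standard deviations, and means shown above as inputs.

```python
# Inputs from Extra Credit 1 (Y = Weekly Sales, X1 = SqFt, X2 = College Dum)
R_Y1, R_Y2, R_12 = 0.8101, 0.6804, 0.5281            # pairwise correlations
S_Y, S_1, S_2 = 2340.5, 121.9, 0.4225                 # standard deviations
MEAN_Y, MEAN_1, MEAN_2 = 22462.852, 925.81, 0.2264    # means

def two_predictor_coefs(r_y1, r_y2, r_12, s_y, s_1, s_2, m_y, m_1, m_2):
    """OLS coefficients for Y on X1 and X2 from correlations, SDs, and means."""
    denom = 1 - r_12 ** 2                         # shared denominator 1 - r12^2
    b1 = (r_y1 - r_12 * r_y2) / denom * s_y / s_1
    b2 = (r_y2 - r_12 * r_y1) / denom * s_y / s_2
    b0 = m_y - b1 * m_1 - b2 * m_2                # the fitted plane passes through the means
    return b0, b1, b2

b0, b1, b2 = two_predictor_coefs(R_Y1, R_Y2, R_12, S_Y, S_1, S_2, MEAN_Y, MEAN_1, MEAN_2)
print(f"Y-hat = {b0:.1f} + {b1:.3f} * X1 + {b2:.1f} * X2")
```

It returns B1 of about 12.002, B2 of about 1940.4, and B0 of about 10911.6. The small differences from the hand-computed 12.003 and 10911.1 come from rounding B1 before using it to compute B0.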

Extra Credit 2. Up to 1.5 points (6+1.5). JMP, like other statistical software, will allow you to put nominal variables into a regression equation. JMP does not treat them exactly as dummy variables. It does something similar, but different: it uses a different form of coding. I want you to figure out what the coding is. To compare the two approaches, use the Fit Model option:
• Model1: Fit Model; Y = Weekly Sales; Model Effects: College Dum
• Model2: Fit Model; Y = Weekly Sales; Model Effects: College

You will see that the two models generate the same R², F-test, and sums of squares. The only difference is the estimated coefficients. The JMP help files will also help, but the best thing to do is to see what the model predicts under both approaches.

Model1 Model Effects: College Dum

The intercept in this model is the mean of the reference group: weekly sales in stores that are not in college towns. The coefficient for the dummy variable is the difference of means between stores in college towns and noncollege towns. That difference is $3,768.992.

Model2 Model Effects: College

The intercept in this model is the average of the mean level for college towns (25,378.50) and the mean level for noncollege towns (21,609.50):

Average = (25,378.488 + 21,609.495)/2 = 23,493.992

This is called an unweighted average. It is not adjusted for the difference in the number of stores in college towns versus not in college towns; each mean is given the same weight. The coefficient for College (in this case, No) is the difference between the mean for stores not in a college town and the unweighted mean. This value is -$1,884.496: noncollege-town stores average $1,884.50 below the unweighted mean.

With a nominal variable in a model, JMP uses a different coding scheme than the one used for a dummy variable: a -1, 1 coding scheme. The same result would have happened if we had coded the College variable as 1 for not in a college town and -1 for in a college town. If we preferred the result to be expressed for college towns, we would have needed to change the value order and put Yes first (an option when you click on the variable header in JMP). The different coding scheme changes the coefficients under the parameter estimates, but not the model fit or the overall F-test. It is simply a different way to model the nominal variable for whether a store is located in a college town. This is what the JMP manual says: "When an x variable has the nominal modeling type, the response can have separate prediction coefficients fit to each of its levels. The prediction coefficients are modeled to represent differences from each level to the average response across all the nominal values."

This coding approach can have advantages when using stepwise regression methods (letting some criteria decide which variables are entered into a model and in what order). It also allows for a single F-test for all the levels of the nominal variable. Finally, it allows you to enter nominal variables into a model without creating dummy variables. You will get the same basic result (sort of!!!).
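The two codings can be compared numerically. The sketch below fits both by ordinary least squares in plain Python. With a single two-level predictor, the coefficients depend on the data only through the group means and group sizes, so it uses a hypothetical stand-in data set in which every store's sales equal its group mean. The group sizes of 12 and 41 are back-calculated from Part A. This stand-in reproduces the coefficients but not R², and it assumes the -1/1 coding described above (No = 1, Yes = -1); it does not inspect JMP's internals.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Group means from Part C; group sizes are back-calculated (hypothetical)
MEAN_YES, MEAN_NO = 25378.488, 21609.495
N_YES, N_NO = 12, 41
college = ["Yes"] * N_YES + ["No"] * N_NO
sales = [MEAN_YES if c == "Yes" else MEAN_NO for c in college]  # stand-in data

# Model1: dummy coding, Yes = 1 and No = 0 (No is the reference group)
dummy = [1 if c == "Yes" else 0 for c in college]
# Model2: effect coding, No = 1 and Yes = -1
effect = [1 if c == "No" else -1 for c in college]

results = {}
for label, x in (("dummy", dummy), ("effect", effect)):
    results[label] = simple_ols(x, sales)
    b0, b1 = results[label]
    fit_no = b0 + b1 * x[college.index("No")]    # prediction for a noncollege store
    fit_yes = b0 + b1 * x[college.index("Yes")]  # prediction for a college-town store
    print(f"{label:6s} b0 = {b0:10.3f}  b1 = {b1:9.3f}  "
          f"fitted No = {fit_no:.3f}  fitted Yes = {fit_yes:.3f}")
```

Both codings give the same fitted values (the two group means). Only the parameters differ. The dummy fit returns the reference mean and the $3,768.99 difference. The effect-coded fit returns the unweighted average of 23,493.99 and a coefficient of -1,884.50 for No.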