
Pam and Susan’s: Locating New Stores

Situation

Pam and Susan’s is a chain of discount department stores.* The original store was opened in the South in the mid-1950s by Pam and Susan’s father. About 10 years ago, Pam and Susan took over operational control of the stores, working together under a joint power-sharing arrangement. The unusual management arrangement and consensus decision making by the two women, for which they have received a great deal of publicity, have contributed in part to sales growth and to the recent upsurge in new store openings. Fundamentally, however, their success is based on an uncanny ability to stock stores appropriately and underprice competitors. State-of-the-art business processes are at the core of their low price structure.

There are currently 250 Pam and Susan’s stores, mostly located throughout the South. Expansion has been incremental, growing from its Southern base into the Border States and increasingly into the Southwest. Identification of the most appropriate sites for new stores is becoming an issue of increasing strategic importance.

Store location decisions are based on estimates of sales potential. The traditional process for estimating sales potential starts with demographic analyses, site visits, and studies by the company’s real estate experts (augmented by input from local experts). The demographic data judged relevant for a given store location are those for people within the store’s estimated “trading zone,” usually operationalized as the census tracts within a 15-minute drive of the store. Planners in the real estate department consider current and expected future competition, ease of highway access, costs of the site, planned square footage of the store, and estimates of average sales per square foot based on data from all existing stores. They judgmentally combine the demographic information, site information, and overall sales rates to come up with an estimate of sales for a new store. Pam and Susan’s stores have primarily targeted lower-middle-class to poorer neighborhoods and trading zones.

Increasingly, actual store sales at new locations have deviated from the estimates provided by the real estate department. Pam and Susan want to develop better methods for estimating sales potential. A consultant (you) has been hired to explore using the census data for stores’ trading zones, along with data on individual stores, to construct regression models that help with location decisions.

To explore this option, a number of variables derived from the most recent census were compiled for the trading zone of each of the 250 stores (there is no overlap in the trading zones of the 250 stores).

Demographic Variables:

Population: % black; % Spanish speaking
% in each of the following family income categories (000s): 0-10; 10+-14; 14+-20; 20+-30; 30+-50; 50+-100; >100
Median yearly family income
Median rent per month
Median home value
% home owners
% with no cars
% with one car
% households with TV
% households with washer
% households with dryer
% households with dishwasher
% households with air conditioner
% households with freezer
% households with second home
% adults (over age 25) with the following years of education: 0-8; 9-11; 12+
Total population
Average family size

Store Data (collected on each store)

Square feet of selling area (000s)
Annual sales (000s of $)
% hard goods

Competitive Types

Type 1: Densely populated areas, particularly good store sites with relatively little direct competition
Type 2: Good locations in relatively high-income areas, with little direct competition
Type 3: Locations near major shopping centers
Type 4: Stores in downtown areas of suburbs
Type 5: Stores with competition from discounters only (not from department stores)
Type 6: Stores in shopping centers
Type 7: Old stores located along the sides of major roads

Proposed New Site Locations

                              Site A      Site B
Store size:
  Gross square feet           170,000     160,000
  Selling square feet         125,000     120,000
Competitive group             1           5
Population:
  Black                       40.0%       13.8%
  Spanish speaking            10.8%       6.6%
Family income (000):
  0-10                        26.6%       19.2%
  10+-14                      14.0%       13.0%
  14+-20                      19.9%       22.2%
  20+-30                      23.9%       27.1%
  30+-50                      13.3%       15.7%
  50+-100                     2.0%        2.5%
  >100                        0.3%        0.3%
Median yearly income          $16,838     $18,802
Median rent per month         $160        $166
Median home value             $46,790     $36,058
% homeowners                  10.1        10.7
% no cars                     57.0        44.0
% 1 car                       36.6        45.7
% TV                          90.0        93.6
% washer                      41.8        53.6
% dryer                       9.0         12.2
% dishwasher                  6.0         4.6
% air conditioner             17.9        39.3
% freezer                     6.1         5.0
% second home                 1.6         4.6
Education (%):
  0-8                         37.4        40.1
  9-11                        24.1        23.5
  12                          29.0        25.5
  13-15                       5.6         5.2
  16 plus                     3.9         5.7
Total population              955,000     431,285
Average family size           3.7         3.5

Report requirements:

1. Using statistical analysis, including regression, pivot tables, etc., briefly describe what you discovered.

We have extensive data on the trading zones of the 250 existing stores, which we want to use to build regression models that predict sales. We will then use the best model to decide which location (Site A or Site B) would be the better choice.

I will do the calculations in SPSS; if you prefer another statistical program, you should be able to follow the same steps in that software.

The dependent variable (the one we want to predict) will be “sales”.

2. Before you run regressions, you should do scatterplots of each independent variable against the dependent variable. In the interest of time, don’t actually do all of the scatterplots. Assume they all show either a linear or no relationship with the dependent variable. That said, since we know it’s important to do this step, do a few scatterplots and identify one that shows a positive or negative linear relationship and one that shows no relationship. Include these as exhibits.

To choose which scatterplots to look at, I first looked at the correlations between the independent variables and sales. The table with the Pearson correlation coefficients is below:

To summarize the correlations (p < .05), the following variables are positively correlated with sales: %black, %spanishsp, %inc0-10, %inc10-14, %inc14-20, %nocars, %sch0-8, populat, sqrft.

The following variables are negatively correlated with sales: %inc20-30, %inc30-50, medianinc, medianrent, %owners, %washers, %dryers, %dishw, %aircond, %freezer, %sechome, %sch12, %sch12+, famsize, comtype. (Comtype is a category code from 1 to 7, so its correlation should be read with caution.)

The following variables are not significantly correlated with sales: %inc50-100, %inc100+, medianhome, %1car, %tvs, %sch9-11, perhard.
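The grouping above can be reproduced mechanically from the correlation table. The sketch below is a hypothetical Python helper rather than part of the SPSS workflow. It copies the Pearson r and two-tailed p-values from the table and sorts each variable by sign and significance, assuming a 0.05 cutoff. It also includes the t statistic that underlies each p-value, t = r·sqrt((n − 2)/(1 − r²)), with n = 250.

```python
import math

# (Pearson r, two-tailed p) for each variable against SALES, copied from the
# correlation table (N = 250 for every variable).
CORR = {
    "%black": (.275, .000), "%spanishsp": (.547, .000), "%inc0-10": (.615, .000),
    "%inc10-14": (.614, .000), "%inc14-20": (.265, .000), "%inc20-30": (-.310, .000),
    "%inc30-50": (-.404, .000), "%inc50-100": (-.107, .092), "%inc100+": (.010, .870),
    "medianinc": (-.325, .000), "medianrent": (-.394, .000), "medianhome": (.030, .638),
    "%owners": (-.690, .000), "%nocars": (.701, .000), "%1car": (.010, .876),
    "%tvs": (-.058, .358), "%washers": (-.562, .000), "%dryers": (-.657, .000),
    "%dishw": (-.491, .000), "%aircond": (-.290, .000), "%freezer": (-.639, .000),
    "%sechome": (-.287, .000), "%sch0-8": (.486, .000), "%sch9-11": (.008, .903),
    "%sch12": (-.238, .000), "%sch12+": (-.218, .001), "populat": (.600, .000),
    "famsize": (-.280, .000), "sqrft": (.349, .000), "perhard": (.016, .805),
    "comtype": (-.660, .000),
}

def t_statistic(r, n=250):
    """t statistic for testing zero correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def classify(corr, alpha=0.05):
    """Split variables into positive, negative and not-significant groups."""
    groups = {"positive": [], "negative": [], "not significant": []}
    for name, (r, p) in corr.items():
        if p >= alpha:
            groups["not significant"].append(name)
        elif r > 0:
            groups["positive"].append(name)
        else:
            groups["negative"].append(name)
    return groups

groups = classify(CORR)
for label, names in groups.items():
    print(f"{label}: {', '.join(names)}")
print(f"t for %inc50-100: {t_statistic(-.107):.2f}")
```

The groups match the three lists above. Every significant correlation has p ≤ .001 and every non-significant one has p ≥ .092, so the split does not depend on the exact cutoff chosen.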

Below are some sample scatterplots (for sales against populat, medianinc, and medianhome):

Correlations (each variable with SALES)

Variable      Pearson Correlation   Sig. (2-tailed)   N
%black         .275                 .000              250
%spanishsp     .547                 .000              250
%inc0-10       .615                 .000              250
%inc10-14      .614                 .000              250
%inc14-20      .265                 .000              250
%inc20-30     -.310                 .000              250
%inc30-50     -.404                 .000              250
%inc50-100    -.107                 .092              250
%inc100+       .010                 .870              250
medianinc     -.325                 .000              250
medianrent    -.394                 .000              250
medianhome     .030                 .638              250
%owners       -.690                 .000              250
%nocars        .701                 .000              250
%1car          .010                 .876              250
%tvs          -.058                 .358              250
%washers      -.562                 .000              250
%dryers       -.657                 .000              250
%dishw        -.491                 .000              250
%aircond      -.290                 .000              250
%freezer      -.639                 .000              250
%sechome      -.287                 .000              250
%sch0-8        .486                 .000              250
%sch9-11       .008                 .903              250
%sch12        -.238                 .000              250
%sch12+       -.218                 .001              250
POPULAT        .600                 .000              250
FAMSIZE       -.280                 .000              250
SQRFT          .349                 .000              250
SALES         1                     .                 250
PERHARD        .016                 .805              250
COMTYPE       -.660                 .000              250

3. Using the available variables (but NOT using comtype), develop the best model you can to predict store sales. How good is this model? Justify your conclusion with appropriate numbers. Based on this model, how would you describe the nature of location sites that are likely to have higher sales?

Tip: remember that you need to remove one of the income variables and one of the schooling variables to avoid a problem with multicollinearity.

After developing your model, describe how you would test the remaining regression assumptions (don’t actually do the tests, just state what they are).
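The tip exists because the seven income shares in each trading zone add up to 100%, and so do the schooling shares. When the model has an intercept, one share is then an exact linear combination of the others, so the regression cannot separate their effects. The sketch below illustrates this with made-up shares rather than the case data, using NumPy's `matrix_rank`. With all seven income categories plus an intercept, the design matrix is one rank short. Dropping one category restores full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 250
# Hypothetical income shares: 7 categories per trading zone that sum to 100%.
raw = rng.random((n, 7))
shares = 100 * raw / raw.sum(axis=1, keepdims=True)

X_all = np.column_stack([np.ones(n), shares])          # intercept + all 7 shares
X_drop = np.column_stack([np.ones(n), shares[:, 1:]])  # intercept + 6 shares

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)
print(X_all.shape[1], rank_all)    # 8 columns, but rank 7: perfectly collinear
print(X_drop.shape[1], rank_drop)  # 7 columns, full rank 7
```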

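To find a model, I used stepwise regression. This method starts by fitting every model with one independent variable and keeping the best one. It then tries adding each remaining variable, keeps the best addition, and so on. At each step it also checks whether any variable already in the model has stopped contributing significantly and, if so, removes it. Here a variable enters when the p-value of its F statistic is .05 or less and is removed when that p-value reaches .10 or more. To do this in SPSS, go to Analyze > Regression > Linear and choose “Stepwise” as the method.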
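For readers without SPSS, the sketch below shows the same forward-and-backward idea in plain NumPy. It is an illustration, not SPSS's exact algorithm. Instead of SPSS's probability criteria, it uses fixed partial-F thresholds: 3.84 to enter and 2.71 to remove, which are roughly the F values at p = .05 and p = .10 when the residual degrees of freedom are large. It runs on synthetic data in which only columns 0 and 2 actually drive y.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares from an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X, y, small, big):
    """Partial F for the single extra column in `big` compared with `small`."""
    sse_big = sse(X, y, big)
    df_resid = len(y) - len(big) - 1
    return (sse(X, y, small) - sse_big) / (sse_big / df_resid)

def stepwise(X, y, f_enter=3.84, f_remove=2.71, max_steps=50):
    """Forward selection with a backward removal check after every step."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry: add the candidate with the largest partial F if it clears f_enter.
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if candidates:
            f_add = {c: partial_f(X, y, selected, selected + [c]) for c in candidates}
            best = max(f_add, key=f_add.get)
            if f_add[best] >= f_enter:
                selected.append(best)
                changed = True
        # Removal: drop the weakest selected column if its partial F is below f_remove.
        if selected:
            f_keep = {c: partial_f(X, y, [s for s in selected if s != c], selected)
                      for c in selected}
            worst = min(f_keep, key=f_keep.get)
            if f_keep[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

rng = np.random.default_rng(0)
n = 250
X = rng.normal(size=(n, 5))  # five synthetic candidate predictors
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)
chosen = stepwise(X, y)
print("selected columns:", chosen)
```

The removal check mirrors what happened in the SPSS run, where %nocars and %washers entered early and were later removed.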

First, we only consider variables that are significantly correlated with sales. For all of these analyses, we therefore exclude %inc50-100, %inc100+, medianhome, %1car, %tvs, %sch9-11, and perhard (as well as comtype) as independent variables. Note that this step removes two of the income variables and one of the schooling variables, as the tip suggests. If a collinearity problem remains, we can fix it later.

Using stepwise regression, SPSS fitted 13 models in sequence. We will use the last one (Model 13). To save space, I removed the results for all but the last model from the ANOVA and Coefficients tables. The SPSS output is below:

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed
1       %nocars             .
2       POPULAT             .
3       %owners             .
4       %spanishsp          .
5       .                   %nocars
6       SQRFT               .
7       %washers            .
8       %sch12+             .
9       %freezer            .
10      %sch12              .
11      %aircond            .
12      %inc20-30           .
13      .                   %washers

Method (all models): Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: SALES

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .701      .491       .489                3896.527
2       .728      .530       .526                3753.759
3       .746      .557       .551                3651.691
4       .763      .582       .576                3552.129
5       .760      .578       .573                3563.505
6       .770      .593       .586                3507.485
7       .777      .604       .596                3465.771
8       .782      .612       .603                3437.322
9       .787      .620       .609                3409.492
10      .793      .629       .617                3374.883
11      .799      .638       .625                3339.110
12      .805      .647       .633                3305.013
13      .802(m)   .643       .630                3316.813

m. Predictors: (Constant), POPULAT, %owners, %spanishsp, SQRFT, %sch12+, %freezer, %sch12, %aircond, %inc20-30

Coefficients, Model 13

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.

Variable      B           Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)    11952.685   2409.388             4.961    .000
POPULAT       4.360E-03   .001         .224    4.418    .000   .579        1.728
%owners       -55.347     13.294       -.278   -4.163   .000   .333        3.007
%spanishsp    304.772     63.871       .244    4.772    .000   .569        1.757
SQRFT         23.313      7.749        .127    3.008    .003   .838        1.194
%sch12+       104.378     27.620       .213    3.779    .000   .468        2.136
%freezer      -158.047    44.337       -.230   -3.565   .000   .357        2.799
%sch12        105.814     38.160       .147    2.773    .006   .526        1.900
%aircond      -57.095     18.248       -.139   -3.129   .002   .749        1.335
%inc20-30     -81.547     32.225       -.107   -2.531   .012   .825        1.212


ANOVA (n)

Model 13     Sum of Squares   df    Mean Square     F        Sig.
Regression   4.76E+09         9     529096554.2     48.094   .000(m)
Residual     2.64E+09         240   11001247.30
Total        7.40E+09         249

m. Predictors: (Constant), POPULAT, %owners, %spanishsp, SQRFT, %sch12+, %freezer, %sch12, %aircond, %inc20-30
n. Dependent Variable: SALES
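The fit statistics in the Model Summary can be recovered from the ANOVA table, which is a useful check when reading SPSS output. The sketch below uses the Model 13 mean squares and degrees of freedom from the table. It rebuilds the sums of squares as mean square × df because the table rounds them to three significant digits.

```python
import math

# Model 13 ANOVA values copied from the SPSS output.
ms_regression, df_regression = 529096554.2, 9
ms_residual, df_residual = 11001247.30, 240

ss_regression = ms_regression * df_regression
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual  # n - 1 = 249, so n = 250 stores

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)  # "Std. Error of the Estimate"

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, standard error = {std_error:.1f}")
```

These reproduce the Model 13 row of the Model Summary (R Square .643, adjusted R Square .630, standard error about 3316.8) and the F value of 48.094.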

The model is:

sales = 11952.685 + 0.00436(population) – 55.347(%owners) + 304.772(%spanishsp) + 23.313(sqrft) + 104.378(%sch12+) – 158.047(%freezer) + 105.814(%sch12) – 57.095(%aircond) – 81.547(%inc20-30)

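The coefficients come from the “Coefficients” table above. All of the coefficients are significantly different from 0 (note the small p-values in the Sig. column). I also requested collinearity statistics. A low tolerance (close to 0) or a high VIF (variance inflation factor; a common rule of thumb flags values above 10) would indicate a collinearity problem. The smallest tolerance is .333 and the largest VIF is 3.007, both for %owners, so we do not need to remove any variables from the model. Stepwise selection also tends to leave out variables that add little beyond those already in the model, which includes highly collinear ones.

Model 13 explains about 64% of the variation in store sales (R Square = .643, adjusted R Square = .630), and the overall model is highly significant (F = 48.094, p < .001). The model suggests that, holding the other variables constant, sites with higher sales have the following characteristics:

- larger populations
- a larger share of Spanish-speaking residents
- more selling space
- more adults with 12 or more years of schooling
- fewer homeowners
- fewer households with freezers or air conditioning
- a smaller share of families earning $20,000-$30,000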
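Tolerance and VIF are easy to compute outside SPSS. For each predictor, regress it on all the other predictors. Tolerance is 1 − R² from that auxiliary regression, and VIF is its reciprocal. For example, %owners has tolerance .333 and VIF of about 3.0 in the table above. The sketch below uses synthetic data, not the store data, in which column c is deliberately built to be nearly a copy of column a.

```python
import numpy as np

def collinearity_stats(X):
    """Return a (tolerance, VIF) pair for each column of X.

    tolerance_j = 1 - R^2_j, where R^2_j comes from regressing column j on the
    other columns plus an intercept; VIF_j = 1 / tolerance_j.
    """
    n, k = X.shape
    stats = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        centered = X[:, j] - X[:, j].mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        tolerance = 1 - r2
        stats.append((tolerance, 1 / tolerance))
    return stats

rng = np.random.default_rng(2)
n = 250
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.3 * rng.normal(size=n)  # c is built to be nearly a copy of a
X = np.column_stack([a, b, c])
stats = collinearity_stats(X)
for name, (tol, vif) in zip("abc", stats):
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.2f}")
```

Columns a and c get large VIFs because each is largely predictable from the other, while b stays close to 1.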

We still need to check the remaining regression assumptions: the relationship is linear, and the residuals are independent, normally distributed, and have constant variance. The assignment only asks us to state these assumptions, but I checked them anyway.

I ran the regression again, this time saving the unstandardized residuals and the standardized predicted values.

- Linearity and constant variance: a scatterplot of the residuals against the predicted values shows no trend (slope of 0). The spread of the residuals is roughly constant, although it may be slightly wider toward the right of the graph. A flat trend in this plot is guaranteed by least squares, so it does not by itself establish independence.
- Independence: for cross-sectional store data, independence rests mainly on the study design. Here the trading zones of the 250 stores do not overlap. A Durbin-Watson statistic can be checked if the stores have a meaningful order.
- Normality: a Q-Q plot of the residuals is close to linear except for a few points near the top, so the residuals are approximately normal. In practice, you might want to identify those outlying points, remove them, and rerun the analysis.

All the assumptions appear to be reasonably well met.
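The graphical checks can be supplemented with simple numeric statistics. The sketch below runs on synthetic data rather than the store data and computes three common ones with NumPy:

- The Durbin-Watson statistic tests independence. Values near 2 suggest no first-order autocorrelation, which is meaningful only when the rows have a natural order.
- The n × R² (Koenker) form of the Breusch-Pagan statistic tests for constant variance. It regresses the squared residuals on the predictors and compares the result with a chi-square distribution.
- The Jarque-Bera statistic tests normality, based on the skewness and kurtosis of the residuals.

The second dataset is deliberately built so that the error spread grows with x.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS fit of y on an intercept plus the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def durbin_watson(e):
    # Sum of squared successive differences divided by the residual sum of squares.
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def breusch_pagan(X, e):
    # LM = n * R^2 from regressing squared residuals on the predictors;
    # compare with chi-square(k), e.g. 3.84 at the 5% level for k = 1.
    e2 = e ** 2
    u = ols_residuals(X, e2)
    r2 = 1 - np.sum(u ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return float(len(e) * r2)

def jarque_bera(e):
    # Compare with chi-square(2): 5.99 at the 5% level.
    z = (e - e.mean()) / e.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    return float(len(e) / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4))

rng = np.random.default_rng(3)
n = 250
x = rng.uniform(0, 10, size=n)
y_const = 2 + 1.5 * x + rng.normal(size=n)      # constant error variance
y_hetero = 2 + 1.5 * x + x * rng.normal(size=n)  # error spread grows with x

e_const = ols_residuals(x, y_const)
e_hetero = ols_residuals(x, y_hetero)
print(f"Durbin-Watson: {durbin_watson(e_const):.2f}")
print(f"Breusch-Pagan LM: constant {breusch_pagan(x, e_const):.1f}, "
      f"growing {breusch_pagan(x, e_hetero):.1f}")
print(f"Jarque-Bera: {jarque_bera(e_const):.2f}")
```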

4. A group within the planning department has developed a more subjective approach in which potential sites are classified according to an assessment of the “competitive type” of the trading zone. How well does a model just using the “competitive type” variables predict sales compared to the model you developed in Step 1? (NOTE: to do this you must create and use dummy variables for COMTYPE. Remember to use only six of the seven dummy variables in your regression).

A model using only comtype looks like this:

sales = 19247.746 – 1931.466(comtype)

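This model is not as good as the previous one: its R Square is 0.443, compared with 0.643 for Model 13. Note that it enters COMTYPE as a single numeric code rather than as the six dummy variables the question calls for. It therefore assumes that each step from one competitive type to the next changes sales by the same amount, and a dummy-variable model could fit differently. Its ANOVA table also shows 229 total degrees of freedom, so it was fit on 230 stores rather than all 250, which makes the comparison with Model 13 approximate.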

Model Summary (b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .666(a)   .443       .441                4190.734

a. Predictors: (Constant), COMTYPE
b. Dependent Variable: SALES

ANOVA

Model 1      Sum of Squares   df    Mean Square    F         Sig.
Regression   3.19E+09         1     3188817820     181.572   .000(a)
Residual     4.00E+09         228   17562250.53
Total        7.19E+09         229

a. Predictors: (Constant), COMTYPE

Coefficients

Model 1      B           Std. Error   Beta    t         Sig.   Tolerance   VIF
(Constant)   19247.746   586.447              32.821    .000
COMTYPE      -1931.466   143.338      -.666   -13.475   .000   1.000       1.000
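The question asks for COMTYPE to be entered as dummy variables rather than as a single number. The sketch below shows one way to do that with NumPy on made-up data; the sales values and group sizes are hypothetical, not the store data. Type 1 is the baseline, and six 0/1 columns flag Types 2 to 7. With only these dummies in the model, the intercept equals the baseline group's mean sales, and each dummy coefficient equals that type's mean minus the baseline mean.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical data: competitive type (1-7) and sales ($000s) for 70 stores.
comtype = np.repeat(np.arange(1, 8), 10)
sales = 20000 - 1500 * (comtype - 1) + rng.normal(0, 2000, size=comtype.size)

# Six dummies for Types 2..7; Type 1 is the omitted (baseline) category.
# Using all seven alongside an intercept would be perfectly collinear.
dummies = (comtype[:, None] == np.arange(2, 8)).astype(float)
X = np.column_stack([np.ones(comtype.size), dummies])

coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
group_means = np.array([sales[comtype == t].mean() for t in range(1, 8)])

print("intercept (Type 1 mean):", round(coef[0], 1))
for t, b in zip(range(2, 8), coef[1:]):
    print(f"Type {t} vs Type 1: {b:+.1f}")
```

Fitting this design on the real data would show whether treating the types as unordered categories improves on the fit of the numeric-comtype model.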





5. If you are allowed to use any of the variables (the variables used in Step 1 above and the competitive type variables used in Step 2), can you build a better model than that developed in Step 1? Does this new model change how you would describe locations likely to have higher sales?

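A regression using the variables from Model 13 plus comtype produces a better-fitting model. Its R Square is 0.721 (adjusted R Square 0.709), and there is still no collinearity problem (all VIFs are below 3.6). However, two variables, SQRFT (p = .124) and %inc20-30 (p = .125), are no longer significant at the 0.05 level once comtype is included. This model, like the comtype-only model, was fit on 230 stores (229 total degrees of freedom in its ANOVA table).

sales = 14030.898 – 811.404(comtype) + 0.005331(populat) – 35.680(%owners) + 228.047(%spanishsp) + 11.570(sqrft) + 83.933(%sch12+) – 149.536(%freezer) + 101.193(%sch12) – 47.460(%aircond) – 48.020(%inc20-30)

The remaining coefficients have the same signs as in Model 13, so the description of promising sites is largely unchanged. Because comtype is entered as a number with a negative coefficient, the model also predicts higher sales for lower-numbered competitive types, such as Type 1 (densely populated areas with relatively little direct competition).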

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .849(a)   .721       .709                3025.670

a. Predictors: (Constant), %inc20-30, %sch12+, SQRFT, %aircond, POPULAT, COMTYPE, %sch12, %spanishsp, %freezer, %owners

ANOVA

Model 1      Sum of Squares   df    Mean Square    F        Sig.
Regression   5.19E+09         10    518813570.9    56.672   .000(a)
Residual     2.00E+09         219   9154681.423
Total        7.19E+09         229

a. Predictors: (Constant), %inc20-30, %sch12+, SQRFT, %aircond, POPULAT, COMTYPE, %sch12, %spanishsp, %freezer, %owners

Coefficients

Model 1      B           Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   14030.898   2313.457             6.065    .000
COMTYPE      -811.404    147.710      -.280   -5.493   .000   .491        2.037
POPULAT      5.331E-03   .001         .269    5.743    .000   .582        1.718
%owners      -35.680     13.995       -.172   -2.550   .011   .281        3.557
%spanishsp   228.047     59.576       .184    3.828    .000   .548        1.823
SQRFT        11.570      7.492        .063    1.544    .124   .769        1.300
%sch12+      83.933      27.063       .170    3.101    .002   .423        2.361
%freezer     -149.536    41.720       -.211   -3.584   .000   .367        2.725
%sch12       101.193     36.468       .139    2.775    .006   .511        1.958
%aircond     -47.460     18.462       -.108   -2.571   .011   .718        1.393
%inc20-30    -48.020     31.213       -.062   -1.538   .125   .778        1.286





6. Two sites, A and B, are currently under consideration for the next new store opening. Which site would you recommend? Justify your choice, using the model you like best from those you have developed.

Pick the model you like best and plug the data for Sites A and B into it; the site with the higher predicted sales should “win.” I prefer the last model (which includes comtype), so I’ll use that one.

To avoid doing the calculations by hand, I entered the data for Sites A and B into the SPSS file, leaving sales blank since these stores do not exist yet. I then reran the same regression, saving the unstandardized predicted values. This gives predicted sales for every store, including Sites A and B.

Predicted sales are $22,682,300 for Site A and $14,366,040 for Site B (the model predicts sales in thousands of dollars), so I would choose Site A.
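The same predictions can be checked by plugging each site's values directly into the fitted equations. The sketch below is a hypothetical helper, not part of the SPSS workflow. It uses the rounded coefficients reported for Model 13 and for the final model, and takes SQRFT as selling square feet in thousands. It also assumes that %sch12+ is the 13-15 and 16-plus education shares added together. Because the coefficients are rounded, the results can differ from SPSS's saved predictions by a few tenths of a thousand dollars.

```python
# Coefficients copied (rounded) from the SPSS output; sales are in $000s.
MODEL_13 = {
    "const": 11952.685, "populat": 0.00436, "owners": -55.347,
    "spanishsp": 304.772, "sqrft": 23.313, "sch12plus": 104.378,
    "freezer": -158.047, "sch12": 105.814, "aircond": -57.095,
    "inc20_30": -81.547,
}
FINAL_MODEL = {
    "const": 14030.898, "comtype": -811.404, "populat": 0.005331,
    "owners": -35.680, "spanishsp": 228.047, "sqrft": 11.570,
    "sch12plus": 83.933, "freezer": -149.536, "sch12": 101.193,
    "aircond": -47.460, "inc20_30": -48.020,
}

# Site data from the "Proposed New Site Locations" table.
# Assumption: sch12plus = % with 13-15 years + % with 16 plus years.
SITES = {
    "A": {"comtype": 1, "populat": 955000, "owners": 10.1, "spanishsp": 10.8,
          "sqrft": 125, "sch12plus": 5.6 + 3.9, "freezer": 6.1, "sch12": 29.0,
          "aircond": 17.9, "inc20_30": 23.9},
    "B": {"comtype": 5, "populat": 431285, "owners": 10.7, "spanishsp": 6.6,
          "sqrft": 120, "sch12plus": 5.2 + 5.7, "freezer": 5.0, "sch12": 25.5,
          "aircond": 39.3, "inc20_30": 27.1},
}

def predict(model, site):
    """Linear prediction: intercept plus the sum of coefficient * site value."""
    return model["const"] + sum(b * site[k] for k, b in model.items() if k != "const")

for name, site in SITES.items():
    final = predict(FINAL_MODEL, site)
    m13 = predict(MODEL_13, site)
    print(f"Site {name}: final model ${final * 1000:,.0f}, Model 13 ${m13 * 1000:,.0f}")
```

Both models rank Site A ahead of Site B. Model 13 gives roughly $21.9 million for A and $16.6 million for B, so the recommendation does not hinge on which model is used.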

Report Specifications

Prepare a report for a non-technical manager answering the questions above. The text of the report should explain the process you followed from your initial exploration of the data to the development and assessment of the final model. Exhibits (graphs, tables, charts, regression output) should be constructed, labeled and captioned so they can be understood by a non-technical person and your conclusions from each exhibit should be described in the text of the report.

Communicate your findings and conclusions clearly, both technically and managerially. A reasonable guideline is about three to four pages of text plus exhibits.

Use the answers to the questions above to write the report.
