Burns05 Im 1s9


  • 7/28/2019 Burns05 Im 1s9


    CHAPTER 19

    REGRESSION ANALYSIS IN MARKETING RESEARCH

    LEARNING OBJECTIVES

    To understand the basic concept of prediction

    To learn how marketing researchers use regression analysis

    To learn how marketing researchers use bivariate regression analysis

    To see how multiple regression differs from bivariate regression

    To appreciate various types of stepwise regression, how they are applied, and the interpretation of their findings

    To learn how to obtain and interpret regression analyses with SPSS

    CHAPTER OUTLINE

    UNDERSTANDING PREDICTION

    Two Approaches to Prediction
    How to Determine the Goodness of Your Predictions

    BIVARIATE LINEAR REGRESSION ANALYSIS

    Basic Procedure in Bivariate Regression Analysis
    Independent and Dependent Variables
    Computing the Slope and the Intercept
    The Hobbit's Choice Restaurant Survey: How to Run and Interpret Bivariate Regression Analysis on SPSS
    Testing for Statistical Significance of the Intercept and the Slope
    Making a Prediction and Accounting for Error

    MULTIPLE REGRESSION ANALYSIS

    An Underlying Conceptual Model
    Multiple Regression Analysis Described
    Basic Assumptions in Multiple Regression
    The Hobbit's Choice Restaurant Survey: How to Run and Interpret Multiple Regression Analysis on SPSS


    Using Results to Make a Prediction
    Special Uses of Multiple Regression Analysis
    Using a Dummy Independent Variable
    Using Standardized Betas to Compare the Importance of Independent Variables
    Using Multiple Regression as a Screening Device

    STEPWISE MULTIPLE REGRESSION

    How to Do Stepwise Multiple Regression with SPSS

    THREE WARNINGS REGARDING MULTIPLE REGRESSION ANALYSIS

    KEY TERMS

    Prediction
    Extrapolation
    Predictive model
    Analysis of residuals
    Bivariate regression analysis
    Intercept
    Slope
    Dependent variable
    Independent variable
    Least squares criterion
    Standard error of the estimate
    Outlier
    General conceptual model
    Multiple regression analysis
    Regression plane
    Additivity
    Independence assumption
    Multicollinearity
    Variance inflation factor (VIF)
    Dummy independent variable
    Standardized beta coefficient
    Screening device
    Stepwise multiple regression

    TEACHING SUGGESTIONS

    1. Students may need some additional help understanding the difference between extrapolation and building a predictive model. A way to help them comprehend the difference is to note that extrapolation always relies on some pattern that is seen over time, while prediction requires the use of a factor other than time. Extrapolation uses the average change in the focal variable per relevant time period, while prediction uses the average change in the focal variable per relevant unit of the other variable.

    You can use sales and marketing variables as an example. If sales have increased at 10% per year for the past 5 years, you can extrapolate that they will increase 10% in the coming year. However, if sales have increased by 20% for every 10% decrease in price, you can predict that they will increase by 20% if the price is decreased by 10%. In both cases, however, all other variables are assumed to have the same influence as in the past.

    2. The analysis of residuals underpins assessment of the goodness of a predictive model, and it is an important foundational concept. To help students understand analysis of residuals, consider the following in-class exercise.

    Show students the following number series, and ask them what straight line formula will correctly predict the next number.

    15, 20, 25, 30, ?

    To find the intercept, use y = a + bx, and set x = 0, or

    a + b(0) = 15
    a = 15

    Next, experiment with different values of b, and look at how close the results are to the given series.

    Series (y)    15, 20, 25, 30, ?

    Let x =       0   1   2   3   4     Residual (sum: 0-3)
    15 + 1x       15, 16, 17, 18, 19    24
    15 + 2x       15, 17, 19, 21, 23    18
    15 + 3x       15, 18, 21, 24, 27    12
    15 + 4x       15, 19, 23, 27, 31    6
    15 + 5x       15, 20, 25, 30, 35    0

    The residual (sum: 0-3) is the sum of the differences between the predicted values of each equation and the series for x = 0 through 3. The 15 + 5x equation has the lowest residual, so it is the best predictive model; its residual is 0, and its prediction of 35 for x = 4 is correct.
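    Instructors who want to show the exercise live could use a short script such as the following sketch. It recomputes the residual column of the table for the same five candidate slopes; the function and variable names are illustrative and do not come from the textbook.

```python
# Observed series for x = 0..3; the value at x = 4 is the one to predict.
series = [15, 20, 25, 30]
intercept = 15  # the series value at x = 0


def residual_sum(slope):
    """Sum of absolute differences between the series and intercept + slope * x."""
    return sum(abs(y - (intercept + slope * x)) for x, y in enumerate(series))


# Try the same candidate slopes (1 through 5) as the in-class table.
residuals = {b: residual_sum(b) for b in range(1, 6)}
best_slope = min(residuals, key=residuals.get)
next_value = intercept + best_slope * 4

for b, r in residuals.items():
    print(f"15 + {b}x -> residual sum {r}")
print("best slope:", best_slope, "| prediction for x = 4:", next_value)
```

    The slope with the smallest residual sum is selected, which mirrors the least squares idea of choosing the line that leaves the least unexplained difference.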

    3. Use of the Novartis data to illustrate bivariate regression is intentional as it explicitly ties regression to correlation. The text notes that the same data is used, but it is worthwhile to point out the connection to students who may have skipped over this point or otherwise overlooked it.

    4. There are many nuances to regression analysis not treated in this chapter's introduction to the topic. The intent is to describe the basic concepts and to have students identify their related values on a printout. SPSS, on the other hand, does provide for a number of statistical options that are beyond the scope of the chapter, particularly in the case of multiple regression. Some instructors who desire more in-depth coverage of this technique may do so with their own materials and rely on SPSS to accommodate this deeper coverage.

    5. Regression analysis is complicated and difficult for undergraduate students to understand. To help with the comprehension of regression analysis, we have provided a number of regression application examples. If one's students relate well to concrete examples, it may be beneficial to use these examples in class or to go over them in more detail than the examples in earlier chapters.

    6. The section on the underlying conceptual model for multiple regression analysis has two pedagogical benefits.

    First, it can be used to help students understand the distinction between independent and dependent variables. The independent variables come from the constructs that are on the outside of the diagram that have arrows pointing toward the center.

    Dependent variables emanate from the center of the diagram, and the diagram implies that the central variables (dependent) are affected or influenced by the surrounding variables (independent).

    Second, the abbreviated lists of examples of variables for each circle in the diagram should help students to identify the specific variables (such as demographic variables) that would or could be used in the multiple regression model.

    7. In earlier editions of the textbook, Chapter 18 included a section on time series analysis. This topic was deleted, and the section on multiple regression was expanded in response to what the authors perceived to be a low level of interest in time series analysis by adopting instructors. The Student Version of SPSS does have time series (exponential smoothing) analysis capabilities as well as graphing procedures for time series data. Instructors who wish to teach time series analysis concepts can still do so using SPSS; however, they will need to draw from sources other than the textbook for reading or study materials for their students.

    8. Because of the many assumptions of regression analysis that can be easily violated with a tool such as SPSS, we emphasize caution when unleashing students on multiple regression analysis. We have provided some readable references in the endnotes (13 and 14) that we list below in case the instructor wants his or her students to be exposed to practitioner-oriented literature on this topic. (The Quirk's Marketing Research Review articles are available at www.quirks.com.)

    See, for example, Kennedy, Peter (2005, Winter), "Oh No! I got the wrong sign! What should I do?" Journal of Economic Education, Vol. 36, No. 1, 77-92.

    For readable treatments of problems encountered in multiple regression applied to marketing research, see Mullet, Gary (1994, October), "Regression, regression," Quirk's Marketing Research Review, electronic archive; Mullet, Gary (1998, June), "Have you ever wondered," Quirk's Marketing Research Review, electronic archive; Mullet, Gary (2003, February), "Data abuse," Quirk's Marketing Research Review, electronic archive.

    ACTIVE LEARNING EXERCISES

    Perform a Bivariate Regression with SPSS

    Using the clickstream and annotated SPSS output in Figures 19.2 and 19.3, respectively, and the Hobbit's Choice Restaurant survey dataset provided to you, perform bivariate regression analysis using the amount respondents expect to pay, on average, for an evening meal entree item in the new restaurant. When you have determined the results, make a prediction of how much a person who earns an income of $100,000 per year expects to pay for this entree.

    Students should use the recoded income variable (using midpoints of the ranges in thousands) to perform this bivariate regression.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .754a   .569       .567                $6.46485

    a. Predictors: (Constant), Recoded income to $1,000s using midpoints of questionnaire ranges

    ANOVAb

    Model 1       Sum of Squares   df    Mean Square   F         Sig.
    Regression    18616.303        1     18616.303     445.427   .000a
    Residual      14126.474        338   41.794
    Total         32742.776        339

    a. Predictors: (Constant), Recoded income to $1,000s using midpoints of questionnaire ranges
    b. Dependent Variable: What would you expect an average evening meal entree item alone to be priced?


    Coefficientsa

                                                                         Unstandardized         Standardized
                                                                         Coefficients           Coefficients
    Model 1                                                              B       Std. Error     Beta            t        Sig.
    (Constant)                                                           5.932   .705                           8.417    .000
    Recoded income to $1,000s using midpoints of questionnaire ranges    .148    .007           .754            21.105   .000

    a. Dependent Variable: What would you expect an average evening meal entree item alone to be priced?

    The equation is

    Amount expected to pay = $5.93 + .148 times (income level in $1,000s)

    So, for an income level of $100,000 (that is, 100 in $1,000s)

    Amount expected to pay = $5.93 + .148 times (100)
                           = $5.93 + $14.80
                           = $20.73

    To apply 95% confidence intervals, students must use 1.96 times $6.46, or $12.66, so the boundaries are $8.07 to $33.39.
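    The arithmetic above can be checked with a short script. This is a minimal sketch that uses the rounded values quoted in the solution ($5.93, .148, and $6.46) rather than the full-precision SPSS output; the function names are illustrative only.

```python
# Rounded values as quoted in the solution text (not full SPSS precision).
intercept = 5.93   # (Constant) B, in dollars
slope = 0.148      # B for recoded income, dollars per $1,000 of income
std_error = 6.46   # standard error of the estimate, in dollars


def predict_price(income_thousands):
    """Predicted entree price for an income expressed in $1,000s."""
    return intercept + slope * income_thousands


def interval_95(prediction):
    """Prediction plus or minus 1.96 standard errors of the estimate."""
    margin = 1.96 * std_error
    return prediction - margin, prediction + margin


prediction = predict_price(100)  # $100,000 income is 100 in $1,000s
low, high = interval_95(prediction)
print(f"predicted price: ${prediction:.2f}")
print(f"95% boundaries: ${low:.2f} to ${high:.2f}")
```

    Using the unrounded coefficients from the tables would shift the boundaries by about a cent, which is why the rounded inputs are kept here.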

    The General Conceptual Model for Intentions to Patronize the Hobbit's Choice Restaurant

    What is the general conceptual model apparent in the Hobbit's Choice Restaurant survey dataset?

    The central dependent variable is "How likely would it be for you to patronize this restaurant (new upscale restaurant)?"

    Demographics are:

    Year born

    What is your highest level of education?

    What is your marital status?

    Including children under 18 living with you, what is your family size?

    Please check the letter that includes the Zip Code in which you live (coded by letter).

    Which of the following categories best describes your before tax household income?

    What is your gender?

    Attitudes are preferences for various restaurant features:

    Prefer Waterfront View

    Prefer Drive Less than 30 Minutes


    Prefer Formal Waitstaff Wearing Tuxedos

    Prefer Unusual Desserts

    Prefer Large Variety of Entrees

    Prefer Unusual Entrees

    Prefer Simple Decor

    Prefer Elegant Decor

    Prefer String Quartet

    Prefer Jazz Combo

    Also, an attitude is: "What would you expect an average evening meal entree item alone to be priced?"

    Media habits are:

    Would you describe yourself as one who listens to the radio?

    To which type of radio programming do you most often listen?

    Would you describe yourself as a viewer of TV local news?

    Which newscast do you watch most frequently?

    Do you read the newspaper?

    Which section of the local newspaper would you say you read most frequently?

    Do you subscribe to City Magazine?

    Past behavior is "Do you eat at this type of restaurant at least once every two weeks?" (qualifier, so not to be included in any analysis), and "How many total dollars do you spend per month in restaurants (for your meals only)?"

    Comment on the usefulness of this general conceptual model to Jeff Dean. That is, assuming that the regression results are significant, what marketing strategy implications will become apparent?

    Jeff will gain market segmentation implications from the demographics and the past behavior (amount spent on restaurants per month), promotional strategy implications from the media habits, restaurant design implications from the preferences, and pricing implications from the average price expected to pay variable.

    Segmentation Associates, Inc.

    Note: this was an end-of-chapter case in the fourth edition. The full case solution is provided below even though the Active Learning Exercise only asks students about questions 1 and 2. Instructors may want to use questions 3-5 for class discussion.

    Case Objective


    This case requires students to interpret the results of multiple regression and to apply them to market segmentation and target marketing considerations using the underlying conceptual model concept described in the chapter. It also illustrates the use of multiple regression to identify market segment differences.

    Answers to Case Questions

    1. What is the underlying conceptual model used by Segmentation Associates that is apparent in these three sets of findings?

    One can refer to the general conceptual model presented in Figure 19.6 and pick out those variables that are apparent in the Segmentation Associates example.

    The dependent variable is type of automobile purchased (compact, sports car, or luxury car). In order to satisfy multiple regression assumptions, the dependent variable(s) would be metric, so they could be preference measures (such as 1 = do not prefer and 5 = greatly prefer) as to automobile type. The independent variables are demographic and life style measures. Thus, the conceptual model is that demographics and life styles predict automobile type preference.

    2. What are the segmentation variables that distinguish compact automobile buyers and in what ways do they distinguish them?

    The relevant section of the table is reproduced here. Recall that the cell entries are standardized beta coefficients that are statistically significant. Compact automobile buyers have strong family values, are not cosmopolitan, have larger families, take pride in America, but do not embrace change. They are younger, financially insecure, and they have less education and earn less income.

    Segmentation Variable      Compact Automobile Buyers

    Demographics
      Age                      -.28
      Education                -.12
      Family size              +.39
      Income                   -.15

    Life Style/Values
      Active
      American pride           +.30
      Bargain hunter           +.45
      Conservative
      Cosmopolitan             -.40
      Embrace change           -.30
      Family values            +.69
      Financially secure       -.28
      Optimistic

    3. What are the segmentation variables that distinguish sports car buyers and in what ways do they distinguish them?

    Segmentation Variable      Sports Car Buyers

    Demographics
      Age                      -.15
      Education                +.38
      Family size              -.35
      Income                   +.25

    Life Style/Values
      Active                   +.59
      American pride
      Bargain hunter           -.33
      Conservative             -.38
      Cosmopolitan             +.68
      Embrace change           +.65
      Family values
      Financially secure       +.21
      Optimistic               +.71

    Sports car buyers are optimistic and cosmopolitan, and they embrace change. They also lead active lives. They are not conservative, and they are not bargain hunters. Demographically, sports car buyers have more education and more income, are younger, and represent smaller families.

    4. What are the segmentation variables that distinguish luxury automobile buyers and in what ways do they distinguish them?

    Segmentation Variable      Luxury Automobile Buyers

    Demographics
      Age                      +.59
      Education
      Family size
      Income                   +.68

    Life Style/Values
      Active                   -.39
      American pride           +.24
      Bargain hunter
      Conservative             +.54
      Cosmopolitan
      Embrace change
      Family values            +.21
      Financially secure       +.50
      Optimistic               +.37

    Luxury car buyers are older with higher incomes. They are conservative, financially secure, and optimistic. They do not lead active lives, and they believe in family values and American pride.

    5. Contrast the segmentation variable classification differences among the three types of automobile buyers.

    Three differences are apparent. First, not all segmentation variables are statistically significant for all three car buyer segments. Second, the standardized beta coefficients are different from segment to segment; in fact, the signs are different in some cases. Third, the relative importance (absolute values of the standardized betas) differs between market segments.
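    The second and third differences can be illustrated with a small script. The sketch below copies the statistically significant standardized betas from the three tables above, ranks them by absolute value within each segment, and lists the variables whose signs flip between two segments. The helper names are illustrative and are not part of the case.

```python
# Statistically significant standardized betas, copied from the three tables above.
betas = {
    "Compact": {
        "Age": -0.28, "Education": -0.12, "Family size": 0.39, "Income": -0.15,
        "American pride": 0.30, "Bargain hunter": 0.45, "Cosmopolitan": -0.40,
        "Embrace change": -0.30, "Family values": 0.69, "Financially secure": -0.28,
    },
    "Sports car": {
        "Age": -0.15, "Education": 0.38, "Family size": -0.35, "Income": 0.25,
        "Active": 0.59, "Bargain hunter": -0.33, "Conservative": -0.38,
        "Cosmopolitan": 0.68, "Embrace change": 0.65, "Financially secure": 0.21,
        "Optimistic": 0.71,
    },
    "Luxury": {
        "Age": 0.59, "Income": 0.68, "Active": -0.39, "American pride": 0.24,
        "Conservative": 0.54, "Family values": 0.21, "Financially secure": 0.50,
        "Optimistic": 0.37,
    },
}


def rank_by_importance(segment_betas):
    """Variables ordered by the absolute value of their standardized beta."""
    return sorted(segment_betas, key=lambda v: abs(segment_betas[v]), reverse=True)


def sign_flips(first, second):
    """Variables significant in both segments whose betas have opposite signs."""
    return [v for v in first if v in second and first[v] * second[v] < 0]


top_three = {segment: rank_by_importance(values)[:3] for segment, values in betas.items()}
flips = sign_flips(betas["Compact"], betas["Sports car"])

for segment, variables in top_three.items():
    print(segment, "->", variables)
print("Opposite signs, compact vs. sports car:", flips)
```

    Comparing absolute values is what makes standardized betas useful here: the sign shows the direction of the relationship, while the magnitude shows relative importance within a segment.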

    ANSWERS TO END-OF-CHAPTER QUESTIONS

    1. Construct and explain a reasonable simple predictive model for each of the following cases:

    Application question. Students must use the predictive model concept introduced in the chapter to suggest reasonable relationships.

    A reasonable model is described under each case.

    a. What is the relationship between gasoline prices and distance traveled for family automobile touring vacations?

    As gasoline prices at the pump increase, the number of automobile touring miles for family vacations decreases.

    b. How do hurricane force warnings relate to purchases of flashlight batteries in the expected landfall area?

    The severity of the forecasted hurricane will be positively related to purchases of flashlight batteries because hurricanes commonly cause electricity blackouts.

    c. What do florists do with regard to inventory of flowers for the week prior to and the week after Mother's Day?

    During the week prior, they stock up, so inventories will increase to peak on the day before Mother's Day, but the demand falls significantly in the next week, so stocks fall to their normal levels.


    2. Indicate what the scatter diagram and probable regression line would look like for two variables that are correlated in each of the following ways. In each instance, assume a negative intercept.

    Application question. To answer each item, students must understand how to apply scatter diagrams in the context of a correlation coefficient's information.

    The scatter diagram appearance and regression line are described after each correlation.

    a. -.89

    The scatter diagram would be a well-defined and narrow ellipse with a negative slope. The regression line would begin at the (negative) intercept and trace the midpoint line of the ellipse from end to end.

    b. +.48

    The scatter diagram would be an ill-defined and wide ellipse with a positive slope. The regression line would begin at the (negative) intercept and cut through the middle of the ellipse from end to end.

    c. -.10

    The scatter diagram would be definitionless. The regression line would begin at the (negative) intercept and move down to the right. The precise angle of the regression line could not be determined from the scatter diagram.

    3. Circle K runs a contest inviting customers to fill out a registration card. In exchange, they are eligible for a grand prize drawing of a trip to Alaska. The card asks for the customer's age, education, gender, estimated weekly purchases (in dollars) at the Circle K, and approximate distance the Circle K is from his or her home. Identify each of the following if a multiple regression analysis was to be performed.

    a. Independent variables
    b. Dependent variable
    c. Dummy variable

    Application question. Students will need to comprehend and apply these basic regression concepts.

    The dependent variable is estimated weekly purchases, while the independent variables are age, education, gender, and distance from his or her home. Gender is the dummy variable: although it is categorical (male or female), it can be coded 0 and 1 for use as an independent variable.


    4. Explain what is meant by the independence assumption in multiple regression. How can you examine your data for independence, and what statistics are issued by most statistical analysis programs? How is this statistic interpreted? That is, what would indicate the presence of multicollinearity, and what would you do to eliminate it?

    Review question. In order to answer this question correctly, students must be conversant in the notion of independence as it pertains to multiple regression analysis.

    The independence assumption refers to the necessity for the independent variables in a multiple regression to be statistically independent, or uncorrelated. This is a fundamental assumption of multiple regression. The statistic referred to is the variance inflation factor, or VIF. A rule of thumb is that as long as VIF is less than 10, multicollinearity is not a concern. With a VIF of greater than 10 associated with any independent variable in the multiple regression equation, the researcher should remove that variable from the independent variable set and rerun the multiple regression. This iterative process is used until only independent variables that are statistically significant and that have acceptable VIFs are in the final multiple regression equation.
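    For instructors who want to show where the rule of thumb comes from, the standard definition of the variance inflation factor can be written as below. The symbol R_j^2 is notation introduced here, not in the answer above: it denotes the R square obtained when the j-th independent variable is regressed on all of the other independent variables.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = 10 \iff R_j^2 = 0.90
```

    Under this definition, the cutoff of 10 corresponds to an independent variable that shares 90% of its variance with the other independent variables.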

    5. What is multiple regression? Specifically, what is "multiple" about it, and how does the formula for multiple regression appear? In your indication of the formula, identify the various terms and also indicate the signs (positive or negative) that they may take on.

    Review question. This is a test of a student's basic knowledge of the underlying model in multiple regression analysis.

    Multiple regression is conceptually identical to bivariate regression except that more than one, and perhaps several, independent variables are used. The general equation for multiple regression is as follows:

    $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$

    where $y$ is the dependent variable, $a$ is the intercept, the $b_i$ are the regression coefficients or betas, and the $x_i$ are the values of the independent variables. There are $n$ independent variables in the equation.

    6. If one uses the "enter" method for multiple regression analysis, what statistics on an SPSS for Windows output should be examined to assess the result? Indicate how you would determine each of the following:

    a. Variance explained in the dependent variable by the independent variables
    b. Statistical significance of each of the independent variables
    c. Relative importance of the independent variables in predicting the dependent variable

    Review question. Students will need to be familiar with SPSS multiple regression output and how its various statistics relate to multiple regression model assumptions/results.

    For a., inspect the adjusted R square; for b., inspect the significance of the t value reported for each independent variable's beta; and for c., look at the standardized beta coefficients.

    7. Explain what is meant by the notion of "trimming" a multiple regression result. Use the following example to illustrate your understanding of this concept.

    A bicycle manufacturer maintains records over 20 years of the following measured in appropriate units per year: unit sales (dependent variable), average retail price in dollars, cooperative advertising amount in dollars, competitors' average retail price in dollars, number of retail locations selling the bicycle manufacturer's brand, and whether or not the winner of the Tour de France was riding the manufacturer's brand (coded as a dummy variable where 0 = no and 1 = yes).

    The initial multiple regression result determines the following:

    Variable                                        Significance Level
    Average retail price in dollars                 .001
    Cooperative advertising amount in dollars       .202
    Competitors' average retail price in dollars    .028
    Number of retail locations                      .591
    Tour de France                                  .032

    Using the "enter" method, what would be the trimming steps you would expect to undertake to identify the significant multiple regression result? Explain your reasoning.

    Application question. This question simulates the first result of an SPSS multiple regression model analysis and requires students to apply the notion of trimming.

    Trimming refers to the iterative process of eliminating nonsignificant independent variables and rerunning the multiple regression until only statistically significant ones remain. In the bicycle example, the process would be to next eliminate the number of retail locations for the next multiple regression run. If cooperative advertising...is again nonsignificant and the largest significance value, eliminate it and rerun the trimmed multiple regression. Continue until only statistically significant betas are left.

    The logic is based on the null hypothesis that says that any nonsignificant independent variable in the multiple regression equation has a beta of zero. Thus, in order to make it take on a zero beta, the independent variable must be removed. Otherwise, the statistical analysis will still compute a nonzero value for that independent variable.
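    The selection rule behind each trimming step can be sketched in a few lines. The example below uses the significance levels from the question and assumes a .05 cutoff, which the question does not state. It also assumes, for illustration only, that the remaining significance levels stay the same after a variable is removed; in practice the regression must be rerun and the new values inspected.

```python
ALPHA = 0.05  # assumed cutoff for significance; not stated in the question

# Significance levels from the initial multiple regression result.
significance = {
    "Average retail price in dollars": 0.001,
    "Cooperative advertising amount in dollars": 0.202,
    "Competitors' average retail price in dollars": 0.028,
    "Number of retail locations": 0.591,
    "Tour de France": 0.032,
}


def next_to_trim(sig_levels, alpha=ALPHA):
    """Return the least significant variable above alpha, or None if none remain."""
    nonsignificant = {v: p for v, p in sig_levels.items() if p > alpha}
    if not nonsignificant:
        return None
    return max(nonsignificant, key=nonsignificant.get)


first_removed = next_to_trim(significance)

# Illustration only: pretend the rerun left the other significance levels unchanged.
after_first = {v: p for v, p in significance.items() if v != first_removed}
second_removed = next_to_trim(after_first)

after_second = {v: p for v, p in after_first.items() if v != second_removed}
third_removed = next_to_trim(after_second)  # None: every remaining beta is significant

print("remove first:", first_removed)
print("remove second (if still nonsignificant):", second_removed)
print("remaining:", list(after_second))
```

    Removing one variable at a time matters because the significance levels of the remaining variables can change on each rerun.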


    8. Using the bicycle example in question 7, what do you expect would be the elimination of variables sequence using stepwise multiple regression? Explain your reasoning with respect to the operation of this technique.

    Application question. This question tests a student's comprehension of how the stepwise feature eliminates nonsignificant independent variables from the regression equation.

    With stepwise regression, the first independent variable to be included is the one that is most significant and explains the most variance. The multiple regression equation is then recomputed with the remaining independent variables, and the most significant one in that set is added. This process is repeated until only those independent variables with statistically significant coefficients are in the equation.

    There is no explained variance information in question 7, so students must work with the significance levels. Assuming that the significance levels do not differ with each iteration from those given in the bicycle example, the order of entry would be (1) average retail price in dollars, (2) competitors' average retail price in dollars, and (3) Tour de France.

    At this point, the stepwise procedure would stop and compute a multiple regression result for these three independent variables.
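    Under the same simplifying assumption used in the answer (significance levels that do not change between iterations), the order of entry is simply the significant variables sorted from most to least significant. The sketch below shows that ordering; the .05 entry cutoff is an assumption, and this is not a reproduction of the SPSS stepwise algorithm, which recomputes the statistics at every step.

```python
ALPHA = 0.05  # assumed entry cutoff; not stated in the question

# Significance levels given in question 7.
significance = {
    "Average retail price in dollars": 0.001,
    "Cooperative advertising amount in dollars": 0.202,
    "Competitors' average retail price in dollars": 0.028,
    "Number of retail locations": 0.591,
    "Tour de France": 0.032,
}


def entry_order(sig_levels, alpha=ALPHA):
    """Variables that meet the cutoff, listed from most to least significant."""
    eligible = [v for v, p in sig_levels.items() if p <= alpha]
    return sorted(eligible, key=sig_levels.get)


order = entry_order(significance)
for step, variable in enumerate(order, start=1):
    print(f"step {step}: enter {variable}")
```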

    9. Using SPSS graphical capabilities, diagram the regression plane for the following variables.

    Number of gallons of      Miles commuted for    Number of riders
    gasoline used per week    work per week         in carpool
    5                         50                    4
    10                        125                   3
    15                        175                   2
    20                        250                   0
    25                        300                   0

    Application question. Students must use the scatter diagram graphing capabilities of SPSS for Windows to create these graphs.

    Enterprising students may attempt to create a three-dimensional graph, but the data range and number of data points are too few to create a good-looking graph.
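    Although the question asks only for graphs, the plane itself can be estimated directly from the five observations. The following is a minimal sketch, written in plain Python rather than SPSS, that fits gasoline used on miles commuted and number of riders by ordinary least squares using the normal equations. The helper function and variable names are illustrative.

```python
# Data from the table in question 9.
gas = [5, 10, 15, 20, 25]          # gallons of gasoline used per week (dependent)
miles = [50, 125, 175, 250, 300]   # miles commuted for work per week
riders = [4, 3, 2, 0, 0]           # number of riders in carpool


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Design matrix with a leading column of 1s for the intercept.
X = [[1.0, float(m), float(r)] for m, r in zip(miles, riders)]

# Normal equations: (X'X) b = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, gas)) for i in range(3)]
a, b_miles, b_riders = solve(xtx, xty)

predicted = [a + b_miles * m + b_riders * r for m, r in zip(miles, riders)]
residuals = [y - p for y, p in zip(gas, predicted)]
mean_gas = sum(gas) / len(gas)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_gas) ** 2 for y in gas)
r_square = 1 - ss_res / ss_tot

print(f"gas = {a:.3f} + {b_miles:.4f} * miles + {b_riders:.4f} * riders")
print(f"R square = {r_square:.4f}")
```

    With only five observations and two predictors that move closely together, the coefficients are best read as an illustration of the mechanics rather than a dependable model.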


    [Two scatter diagrams: GAS (0 to 30) against WORK (0 to 300), and GAS (0 to 30) against CARPOOL (-1 to 5)]

    10. The Maximum Amount is a company that specializes in making fashionable clothes in large sizes for large people. Among its customers are Sinbad and Shaquille O'Neal. A survey was performed for the Maximum Amount, and a regression analysis was run on some of the data. Of interest in this analysis was the possible relationship between self-esteem (dependent variable) and number of Maximum Amount articles purchased last year (independent variable). Self-esteem was measured on a 7-point scale where 1 signifies very low and 7 indicates very high self-esteem. Following are some items that have been taken from the output.


    Pearson product moment correlation = +.63
    Intercept = 3.5
    Slope = +0.2
    Standard Error = 1.5
    All statistical tests significant at the .01 level or less

    What is the correct interpretation of these findings?

    Application question. This question tests students' comprehension of regression findings.

    This is a bivariate regression, so the square of the correlation will indicate the amount of variance explained by the regression, which is .63 squared, or about .40. The intercept suggests that for those people who buy no Maximum Amount articles, their self-esteem is 3.5 on the average. Self-esteem increases by the slope (.2) with each article purchased. The standard error reveals, however, that there is a considerable range in any prediction: at the 95% level of confidence, the confidence interval is plus or minus 1.5 times 1.96, or about 3.0. Essentially, prediction is very imprecise.
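    To make the imprecision concrete, the sketch below applies the reported intercept, slope, and standard error to one buyer. The choice of 5 articles is a hypothetical value picked for illustration; it does not come from the question.

```python
# Values reported in the question.
correlation = 0.63
intercept = 3.5
slope = 0.2
std_error = 1.5

r_square = correlation ** 2   # share of variance explained
margin = 1.96 * std_error     # half-width of the 95% confidence interval


def predict_self_esteem(articles):
    """Predicted self-esteem (7-point scale) for a number of articles purchased."""
    return intercept + slope * articles


articles = 5  # hypothetical number of articles, chosen only for illustration
predicted = predict_self_esteem(articles)
low, high = predicted - margin, predicted + margin

print(f"R square: {r_square:.4f}")
print(f"predicted self-esteem for {articles} articles: {predicted:.1f}")
print(f"95% interval: {low:.2f} to {high:.2f} on a scale of 1 to 7")
```

    An interval roughly six points wide on a 7-point scale covers almost the entire range of possible answers, which is the sense in which the prediction is very imprecise.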

    11. Wayne LaTorte is a safety engineer who works for the U.S. Postal Service. For most of his life, Wayne has been fascinated by UFOs. He has kept records of UFO sightings in the desert areas of Arizona, California, and New Mexico over the past 15 years, and he has correlated them with earthquake tremors. A fellow engineer suggests that Wayne use regression analysis as a means of determining the relationship. Wayne does this and finds a "constant" of 30 separate earth tremor events and a slope of 5 events per UFO sighting. Wayne then writes an article for the UFO Observer claiming that earthquakes are largely caused by the subsonic vibrations emitted by UFOs as they enter the Earth's atmosphere. What is your reaction to Wayne's article?

    Application question. Can students spot this misuse of regression?

    Although Wayne has determined a statistical relationship, he cannot claim to have found a causal relationship with this data. There is only an association, and regression is a means of making a prediction, but it is inappropriate to infer a causal relationship. (Students may point out that the R square and standard error are unknown, so it is impossible to assess the goodness of Wayne's model, but the larger point is that he is using regression to substantiate his causal analysis.)


    CASE SOLUTIONS

    Case 19.1 Don't You Hate it When Part IV

    Case Objective

    This case is another one where Josh fails to do the analysis correctly, and Marsha must dothe work. It involves understanding of the two-step, trimming process and interpretationof standardized beta coefficients.

    Answers to Case Questions

    1. Describe the two-step process and trimming approach that Josh should have used in running his three multiple regression analyses with the Pets, Pets, & Pets data.

    The two-step process is (1) determine if there is a linear relationship by examining the significance level for the F-test, and if this test is significant, (2) inspect the coefficients of the independent variables in the multiple regression for significance. Trimming is a process of iteratively eliminating the independent variables that are not statistically significant and rerunning the regression.
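    The two-step check and the trimming loop can be sketched as follows. This is only an outline of the control flow: `fit` is a hypothetical callable standing in for whatever runs the regression (SPSS in this chapter), and dropping the single least significant variable on each pass is an assumption about how the iteration is carried out. The stand-in `fake_fit` simply returns significance levels copied from the Case 19.2 output later in this chapter.

```python
def trim(predictors, fit, alpha=0.05):
    """Two-step check followed by iterative trimming.

    `fit` is a hypothetical callable: given a list of predictor names it
    returns (f_sig, {name: coefficient_sig}).
    """
    kept = list(predictors)
    while kept:
        f_sig, coef_sig = fit(kept)
        # Step 1: is the overall regression (F-test) significant?
        if f_sig > alpha:
            return []                    # no linear relationship to interpret
        # Step 2: find the least significant independent variable.
        worst = max(kept, key=lambda name: coef_sig[name])
        if coef_sig[worst] <= alpha:
            return kept                  # every remaining slope is significant
        kept.remove(worst)               # drop it and rerun the regression
    return kept


def fake_fit(names):
    # Stand-in for a real regression run; values copied from Case 19.2.
    if "GENDER" in names:
        return 0.000, {"GENDER": 0.776, "CERTS": 0.052, "AGE": 0.001}
    return 0.000, {"CERTS": 0.034, "AGE": 0.001}


print(trim(["GENDER", "CERTS", "AGE"], fake_fit))
```

    With these stand-in values, gender is trimmed on the first pass and the loop stops once both remaining slopes are significant.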

    2. Assume that the independent variables reported in each of Josh's three tables are the result of correctly using the two-step process and trimming the nonsignificant independent variables. Describe the relationships revealed in each table, and indicate the implications of these relationships for Pets, Pets & Pets' marketing strategy.

    The interpretations are provided in the following tables.

    Table 1: Times visited PPP in past year

    Independent Variable(s)                                              Standardized Beta
    I usually purchase pet supplies from the same company.*                    0.31
    Pets, Pets & Pets helps me stretch my wallet.*                            -0.25
    Buying pet supplies at Pets, Pets & Pets gives me time to do
      more important things.*                                                  0.25
    My pet is a large part of my life.*                                        0.30
    I am pleased with my pet right now.*                                       0.13
    I enjoy taking care of my pet.*                                            0.15
    How many miles do you live from Pets, Pets & Pets?                        -0.19
    Indicate your gender (1=male, 2=female)                                   -0.18

    *Based on a scale where 1=strongly disagree and 5=strongly agree

    The most important variables are "I usually purchase pet supplies from the same company" and "My pet is a large part of my life," with "PPP helps me stretch my wallet" (negative) and "Buying pet supplies at PPP gives me time to do more important things" slightly less important.

    The profile of the PPP frequent visitor:

    Company loyal

    Pet is large part of the family

    Use PPP to save time

    Does not use PPP to save money

    Lives close to PPP

    Male

    Enjoys caring for pet

    Pleased with life

    Table 2: How likely to buy at PPP next time (1-7 scale)

    Independent Variable(s)                                              Standardized Beta
    Wide variety of pet supplies at Pets, Pets & Pets*                        -0.25
    Good values at Pets, Pets & Pets*                                          0.49
    Helpful employees at Pets, Pets & Pets*                                    0.33

    *Based on a scale where 1=strongly disagree and 5=strongly agree

    The most important variable is good values, with helpful employees next, and narrow variety (negative sign) of PPP supplies third in importance.

    Why patrons are likely to buy at PPP next time:

    Good values

    Helpful employees

    Narrow variety (PPP is a specialty store)
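    Reading importance off standardized betas amounts to sorting by absolute size and then using the sign for direction. The sketch below does this for the Table 1 values; the shortened labels and the function name are illustrative assumptions, not part of the case.

```python
# Standardized betas copied from Table 1 (labels shortened for illustration).
table1 = {
    "Buys from same company": 0.31,
    "PPP stretches my wallet": -0.25,
    "PPP saves me time": 0.25,
    "Pet is large part of life": 0.30,
    "Pleased with pet right now": 0.13,
    "Enjoys taking care of pet": 0.15,
    "Miles from PPP": -0.19,
    "Gender (1=male, 2=female)": -0.18,
}

def rank_by_importance(betas):
    """Sort predictors by the absolute size of the standardized beta."""
    return sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True)

for name, beta in rank_by_importance(table1):
    direction = "positive" if beta > 0 else "negative"
    print(f"{abs(beta):.2f}  {direction:8}  {name}")
```

    A negative sign on a coded variable is read against its coding: for gender (1=male, 2=female), a negative beta points toward the lower code, which is why the profile lists "Male."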

    Table 3: Amount spent at PPP last time

    Independent Variable(s)                                              Standardized Beta
    Number of pets owned                                                      -0.17
    Recall seeing a PPP newspaper ad in the past month? (1=yes, 2=no)         -0.21
    Family income level                                                        0.38

    Most important is family income level, next is recall of seeing a PPP newspaper ad (the sign is negative and the coding is 1=yes, 2=no, so those who recall the ad spend more), and third is (few) number of pets owned.

    The profile of high spenders at PPP:

    High income level

    Recall seeing PPP newspaper ads

    Few pets owned

    Case 19.2 Sales Training Associates, Inc.

    Case Objective

    This case requires students to perform bivariate and multiple regression analyses and to interpret the results.

    Answers to Case Questions

    1. Using SPSS for Windows, perform a series of bivariate regressions using the sales performance measure as the dependent variable, and each of the other factors in the table as independent measures. What did you find, and how do you interpret these findings?

    The bivariate regression outputs are listed in order, each with an interpretation. For convenience, confidence intervals for predictions are set at 95%.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .723a   .523       .497                4.07

    a. Predictors: (Constant), TRAINHRS

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   326.264           1   326.264       19.738   .000a
    Residual     297.536          18    16.530
    Total        623.800          19

    a. Predictors: (Constant), TRAINHRS
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t       Sig.
    (Constant)   3.357      2.127                                          1.578   .132
    TRAINHRS     6.45E-02   .015               .723                        4.443   .000

    a. Dependent Variable: RATING

    Interpretation. The bivariate regression is significant (Sig. of F = .000), and the Adjusted R Square indicates that training hours explains about 50% of the variance in the ratings. The slope is significant (.000) and .065 in size. The constant is treated as zero because it is nonsignificant (Sig. = .132). The equation is: rating = 0 + .065 times number of training hours. Confidence intervals for predictions at the 95% level of confidence will be ±1.96 times 4.07.
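    The pieces of this output are tied together by simple arithmetic, which the following sketch reproduces from the ANOVA figures for TRAINHRS. The function name and the example of 100 training hours are hypothetical; treating the constant as zero follows the interpretation above.

```python
# Figures copied from the TRAINHRS ANOVA table.
ss_regression, ss_residual, ss_total = 326.264, 297.536, 623.800
df_regression, df_residual = 1, 18

r_square = ss_regression / ss_total              # about .523
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual          # about 16.530
f_value = ms_regression / ms_residual            # about 19.738
std_error = ms_residual ** 0.5                   # about 4.07

def predict_rating(train_hours, slope=0.0645, intercept=0.0):
    """Point prediction with a 95% range; the constant is treated as zero."""
    estimate = intercept + slope * train_hours
    margin = 1.96 * std_error
    return estimate, estimate - margin, estimate + margin

# Hypothetical salesperson with 100 training hours.
estimate, low, high = predict_rating(100)
print(f"{estimate:.2f} (95% range {low:.2f} to {high:.2f})")
```

    The width of the range relative to the point estimate shows how much error remains even with a significant slope.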

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .612a   .374       .339                4.66

    a. Predictors: (Constant), CERTS

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   233.297           1   233.297       10.754   .004a
    Residual     390.503          18    21.695
    Total        623.800          19

    a. Predictors: (Constant), CERTS
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t       Sig.
    (Constant)   5.595      2.187                                          2.558   .020
    CERTS        1.314      .401               .612                        3.279   .004

    a. Dependent Variable: RATING

    Interpretation. The bivariate regression is significant (Sig. of F = .004), and both the slope of number of certificates and the constant are significantly different from zero (.004 and .020, respectively). The equation is: rating = 5.59 + 1.31 times number of certificates earned. Confidence intervals for predictions at the 95% level of confidence will be ±1.96 times 4.66. The R square is lower than in the previous bivariate regression, indicating a weaker linear relationship.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .759a   .576       .553                3.83

    a. Predictors: (Constant), AGE

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   359.386           1   359.386       24.465   .000a
    Residual     264.414          18    14.690
    Total        623.800          19

    a. Predictors: (Constant), AGE
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t       Sig.
    (Constant)   -2.124     2.962                                          -.717   .483
    AGE          .353       .071               .759                        4.946   .000

    a. Dependent Variable: RATING

    Interpretation. The bivariate regression is significant (Sig. of F = .000), and the slope for age is significantly different from zero (.000), but the constant is not (.483). The regression equation is: rating = 0 + .35 times years of age. Confidence intervals for predictions at the 95% level of confidence will be ±1.96 times 3.83. The R square value is comparable to the first regression and larger than the second one.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .659a   .435       .403                4.43

    a. Predictors: (Constant), COMPYRS

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   271.130           1   271.130       13.838   .002a
    Residual     352.670          18    19.593
    Total        623.800          19

    a. Predictors: (Constant), COMPYRS
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t       Sig.
    (Constant)   6.906      1.668                                          4.140   .001
    COMPYRS      .401       .108               .659                        3.720   .002

    a. Dependent Variable: RATING

    Interpretation. The bivariate regression is significant (Sig. of F = .002), and both the slope and intercept are significantly different from zero. The regression equation is: rating = 6.91 + .40 times number of years with company. Confidence intervals for predictions at the 95% level of confidence will be ±1.96 times 4.43.

    Note: The independent variable of gender (coded 1 or 2) is a nominally scaled variable, and it should not be used in a regression analysis as this analysis assumes metric data for the independent and dependent variables.

    2. Use multiple regression to determine the relationship of the various factors to self-evaluated sales performance for last year. What did you find, and what are the implications of the findings for STA?

    Preliminary analysis involves inspecting the correlations between the various independent variables to spot multicollinearity problems. The correlation matrix is found as follows.

    Correlations

                                      TRAINHRS   COMPYRS   GENDER   CERTS     AGE
    TRAINHRS   Pearson Correlation    1.000      .605**    -.339    .873**    .558*
               Sig. (2-tailed)        .          .005      .144     .000      .011
               N                      20         20        20       20        20
    COMPYRS    Pearson Correlation    .605**     1.000     -.436    .590**    .867**
               Sig. (2-tailed)        .005       .         .055     .006      .000
               N                      20         20        20       20        20
    GENDER     Pearson Correlation    -.339      -.436     1.000    -.306     -.200
               Sig. (2-tailed)        .144       .055      .        .189      .398
               N                      20         20        20       20        20
    CERTS      Pearson Correlation    .873**     .590**    -.306    1.000     .428
               Sig. (2-tailed)        .000       .006      .189     .         .060
               N                      20         20        20       20        20
    AGE        Pearson Correlation    .558*      .867**    -.200    .428      1.000
               Sig. (2-tailed)        .011       .000      .398     .060      .
               N                      20         20        20       20        20

    **. Correlation is significant at the 0.01 level (2-tailed).
    *. Correlation is significant at the 0.05 level (2-tailed).

    High correlations exist between trainhrs and certs (.873) and between age and compyrs (.867). It is reasonable to drop trainhrs and use certs because the number of certificates earned in STA programs is more managerially relevant than the total training hours. Also, STA would probably like to advertise that salespersons can benefit from its training programs regardless of age, so drop compyrs and keep age for the multiple regression that follows. Note that gender is a dummy independent variable.
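    Screening the correlation matrix for multicollinearity can be sketched as a scan for pairs of independent variables with a large absolute correlation. The correlations below are copied from the matrix above; the cutoff of .80 and the function name are assumptions chosen for illustration, since the answer does not state a numeric threshold.

```python
from itertools import combinations

names = ["TRAINHRS", "COMPYRS", "GENDER", "CERTS", "AGE"]
# Pearson correlations copied from the matrix (upper triangle only).
corr = {
    ("TRAINHRS", "COMPYRS"): 0.605, ("TRAINHRS", "GENDER"): -0.339,
    ("TRAINHRS", "CERTS"): 0.873, ("TRAINHRS", "AGE"): 0.558,
    ("COMPYRS", "GENDER"): -0.436, ("COMPYRS", "CERTS"): 0.590,
    ("COMPYRS", "AGE"): 0.867, ("GENDER", "CERTS"): -0.306,
    ("GENDER", "AGE"): -0.200, ("CERTS", "AGE"): 0.428,
}

def flag_collinear(names, corr, cutoff=0.80):
    """Return predictor pairs whose |r| meets the (assumed) cutoff."""
    return [(a, b, corr[(a, b)])
            for a, b in combinations(names, 2)
            if abs(corr[(a, b)]) >= cutoff]

print(flag_collinear(names, corr))
```

    Which member of each flagged pair to drop is a managerial judgment, as the answer explains, not something the scan can decide.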

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .824a   .679       .618                3.54

    a. Predictors: (Constant), AGE, GENDER, CERTS

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   423.327           3   141.109       11.262   .000a
    Residual     200.473          16    12.530
    Total        623.800          19

    a. Predictors: (Constant), AGE, GENDER, CERTS
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t       Sig.
    (Constant)   -2.098     4.085                                          -.513   .615
    GENDER       -.507      1.749              -.043                       -.290   .776
    CERTS        .729       .348               .340                        2.097   .052
    AGE          .282       .073               .605                        3.848   .001

    a. Dependent Variable: RATING

    Interpretation. The multiple regression is significant (Sig. of F = .000). The slopes for certs and age are significantly different from zero (albeit marginally for certificates, Sig. = .052), but this is not the case for gender. Drop gender and perform a trimmed model regression. The output follows:
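    The t values that drive these significance judgments are each coefficient divided by its standard error. A minimal sketch using the B and Std. Error columns from the coefficients table above (the dictionary layout and function name are illustrative assumptions):

```python
# (B, Std. Error, reported t) copied from the coefficients table.
coefficients = {
    "(Constant)": (-2.098, 4.085, -0.513),
    "GENDER": (-0.507, 1.749, -0.290),
    "CERTS": (0.729, 0.348, 2.097),
    "AGE": (0.282, 0.073, 3.848),
}

def t_value(b, std_error):
    """t statistic for testing whether a coefficient differs from zero."""
    return b / std_error

for name, (b, se, reported_t) in coefficients.items():
    print(f"{name:10}  computed t = {t_value(b, se):6.3f}   reported t = {reported_t:6.3f}")
```

    Small differences between computed and reported values come from rounding in the printed B and Std. Error columns.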

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .823a   .677       .639                3.44

    a. Predictors: (Constant), AGE, CERTS

    ANOVAb

    Model 1      Sum of Squares   df   Mean Square   F        Sig.
    Regression   422.273           2   211.137       17.811   .000a
    Residual     201.527          17    11.855
    Total        623.800          19

    a. Predictors: (Constant), AGE, CERTS
    b. Dependent Variable: RATING

    Coefficientsa

                 Unstandardized Coefficients   Standardized Coefficients
    Model 1      B          Std. Error         Beta                        t        Sig.
    (Constant)   -2.970     2.686                                          -1.106   .284
    CERTS        .754       .328               .351                        2.303    .034
    AGE          .283       .071               .609                        3.993    .001

    a. Dependent Variable: RATING

    Interpretation. The multiple regression is significant (Sig. of F = .000), and the slopes of both independent variables are significantly different from zero (.034 and .001), while the constant is not. The high multiple R of .823 implies a strong linear relationship, with much of the ratings variance explained by the two independent variables. The regression model is: rating = 0 + .75 times number of certificates earned + .28 times the person's age in years. Using the 95% level of confidence, predictions can be made with confidence intervals of ±1.96 times 3.44 (standard error). Age is about twice as important as the number of certificates earned according to the standardized beta coefficients. Age can be interpreted as a general indicator of sales experience, operating independently of training gained in the STA certification programs.
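    Using the trimmed model to make a prediction can be sketched as below. The coefficients and standard error come from the trimmed output; the function name and the example salesperson (3 certificates, 40 years old) are hypothetical. The last line computes the ratio of the two standardized betas, which is about 1.7 and is what the interpretation rounds to "about twice."

```python
def predict_rating(certs, age, b_certs=0.754, b_age=0.283,
                   constant=0.0, std_error=3.44, z=1.96):
    """Prediction from the trimmed model; the constant is treated as zero
    because it was not significant."""
    estimate = constant + b_certs * certs + b_age * age
    margin = z * std_error
    return estimate, estimate - margin, estimate + margin

# Hypothetical salesperson: 3 certificates, 40 years old.
estimate, low, high = predict_rating(certs=3, age=40)
print(f"{estimate:.2f} (95% range {low:.2f} to {high:.2f})")

# Relative importance of age versus certificates from the standardized betas.
beta_ratio = 0.609 / 0.351
print(round(beta_ratio, 2))
```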

    Case 19.3 The Hobbit's Choice Restaurant Survey Predictive Analysis

    Case Objective

    Students must apply predictive analysis on the SPSS integrated case data set and interpret the findings.

    Answers to Case Questions

    1. What is the demographic target market definition for the Hobbit's Choice Restaurant?

    The dependent variable to use is the likelihood of patronizing the Hobbit's Choice Restaurant. Independent variables must be metric or dichotomous (dummy). The recoded income level is metric; family size is metric; year born (or age) is metric; and gender can be used as a dummy variable. The multiple regression findings follow.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .776a   .602       .598                .785

    a. Predictors: (Constant), Recoded income to $10,000's using midpoints of questionnaire ranges, Including children under 18 living with you, what is your family size?, What is your gender?, Year Born

    ANOVAb

    Model 1      Sum of Squares   df    Mean Square   F         Sig.
    Regression   367.834            4   91.958        149.379   .000a
    Residual     243.164          395     .616
    Total        610.998          399

    a. Predictors: (Constant), Recoded income to $10,000's using midpoints of questionnaire ranges, Including children under 18 living with you, what is your family size?, What is your gender?, Year Born
    b. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    Coefficientsa

                                                   Unstandardized Coefficients   Standardized Coefficients
    Model 1                                        B        Std. Error           Beta                        t        Sig.
    (Constant)                                     9.103    9.308                                            .978     .329
    Recoded income to $1,000s using midpoints
      of questionnaire ranges                      .018     .001                 .762                        20.957   .000
    Including children under 18 living with
      you, what is your family size?               .019     .029                 .021                        .664     .507
    What is your gender?                           .038     .079                 .016                        .490     .624
    Year Born                                      -.004    .005                 -.030                       -.816    .415

    a. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    As can be seen, the only significant independent variable is the recoded income level. Trimming and rerunning the regression results in the following:

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .775a   .601       .600                .783

    a. Predictors: (Constant), Recoded income to $10,000's using midpoints of questionnaire ranges

    ANOVAb

    Model 1      Sum of Squares   df    Mean Square   F         Sig.
    Regression   366.920            1   366.920       598.309   .000a
    Residual     244.078          398     .613
    Total        610.998          399

    a. Predictors: (Constant), Recoded income to $10,000's using midpoints of questionnaire ranges
    b. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    Coefficientsa

                                                   Unstandardized Coefficients   Standardized Coefficients
    Model 1                                        B        Std. Error           Beta                        t        Sig.
    (Constant)                                     1.622    .069                                             23.625   .000
    Recoded income to $1,000s using midpoints
      of questionnaire ranges                      .018     .001                 .775                        24.460   .000

    a. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    The target market definition is very simple: The Hobbit's Choice Restaurant target market is upper income residents.
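    One way to see that trimming cost nothing is to compare Adjusted R Square before and after, since it penalizes R Square for the number of independent variables. The sketch below uses the standard adjustment formula with the R Square values from the two outputs; n = 400 is inferred from the total degrees of freedom (399), and the function name is an illustrative assumption.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R Square for n cases and k independent variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

n = 400   # total df of 399 in the ANOVA tables implies 400 respondents

full = adjusted_r_square(0.602, n, k=4)      # four demographic predictors
trimmed = adjusted_r_square(0.601, n, k=1)   # income only
print(round(full, 3), round(trimmed, 3))
```

    The trimmed model explains essentially the same share of variance with one predictor instead of four, which matches the Adjusted R Square values SPSS reports (.598 and .600).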

    2. What is the restaurant spending behavior target market definition for the Hobbit's Choice Restaurant?

    This question refers to restaurant spending behavior as the independent variables. There are two questions on the survey that pertain to restaurant spending: total dollars spent per month in restaurants and expected average price for an evening meal entrée item alone.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .848a   .720       .718                .549

    a. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, How many total dollars do you spend per month in restaurants (for your meals only)?

    ANOVAb

    Model 1      Sum of Squares   df    Mean Square   F         Sig.
    Regression   260.859            2   130.430       432.827   .000a
    Residual     101.553          337     .301
    Total        362.412          339

    a. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, How many total dollars do you spend per month in restaurants (for your meals only)?
    b. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    Coefficientsa

                                                Unstandardized Coefficients   Standardized Coefficients                    Collinearity Statistics
    Model 1                                     B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
    (Constant)                                  1.512    .072                                             20.871   .000
    How many total dollars do you spend per
      month in restaurants (for your meals
      only)?                                    .004     .001                 .275                        4.567    .000    .229        4.361
    What would you expect an average evening
      meal entree item alone to be priced?      .063     .006                 .597                        9.908    .000    .229        4.361

    a. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    The significance level of both independent variables is .000, and there is no problem with multicollinearity as no VIF value exceeds 10. Both total dollars spent in restaurants per month and the expected average price for an evening meal entrée predict the likelihood of patronizing the Hobbit's Choice Restaurant. The standardized beta coefficients reveal that the average price variable is twice as important as total dollars spent on restaurants in predicting this likelihood.

    The target market definition for the Hobbit's Choice Restaurant is: (1) people who patronize restaurants in general, and (2) those who expect to spend more for an evening entrée (i.e., bigger spenders).
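    The collinearity check in this answer rests on the VIF, which is the reciprocal of the tolerance reported beside it. A minimal sketch, using the tolerance of .229 from the coefficients table and the VIF cutoff of 10 stated in the answer (function names are illustrative assumptions):

```python
def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance

def has_multicollinearity(vifs, cutoff=10.0):
    """Apply the rule of thumb used in the answer (any VIF above 10)."""
    return any(v > cutoff for v in vifs)

tolerance = 0.229                     # reported for both predictors
vif = vif_from_tolerance(tolerance)   # close to the reported 4.361
print(round(vif, 2), has_multicollinearity([vif, vif]))
```

    The small gap between the computed value and the reported 4.361 is consistent with SPSS working from an unrounded tolerance.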

    3. Develop a general conceptual model of market segmentation for the Hobbit's Choice Restaurant. Test it using multiple regression analysis and interpret your findings for Jeff Dean.

    The general conceptual model should be based on the variables in the survey. There are three classes of variables: (1) demographics, (2) restaurant patronage, and (3) restaurant feature preferences. The media usage variables are categorical and not suited to regression analysis.

    Following is the result of stepwise multiple regression.

    Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .838a   .702       .702                .565
    2       .852b   .725       .724                .543
    3       .855c   .731       .729                .538

    a. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?
    b. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, Prefer Formal Waitstaff wearing Tuxedos
    c. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, Prefer Formal Waitstaff wearing Tuxedos, Year Born

    ANOVAd

    Model                Sum of Squares   df    Mean Square   F         Sig.
    1    Regression      254.574            1   254.574       797.917   .000a
         Residual        107.838          338     .319
         Total           362.412          339
    2    Regression      262.880            2   131.440       445.038   .000b
         Residual         99.531          337     .295
         Total           362.412          339
    3    Regression      265.085            3    88.362       305.049   .000c
         Residual         97.327          336     .290
         Total           362.412          339

    a. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?
    b. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, Prefer Formal Waitstaff wearing Tuxedos
    c. Predictors: (Constant), What would you expect an average evening meal entree item alone to be priced?, Prefer Formal Waitstaff wearing Tuxedos, Year Born
    d. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    Coefficientsa

                                                     Unstandardized Coefficients   Standardized Coefficients                    Collinearity Statistics
    Model                                            B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
    1    (Constant)                                  1.663    .066                                             25.080   .000
         What would you expect an average evening
           meal entree item alone to be priced?      .088     .003                 .838                        28.247   .000    1.000       1.000
    2    (Constant)                                  1.593    .065                                             24.446   .000
         What would you expect an average evening
           meal entree item alone to be priced?      .069     .005                 .652                        14.428   .000    .399        2.508
         Prefer Formal Waitstaff Wearing Tuxedos     .163     .031                 .240                        5.303    .000    .399        2.508
    3    (Constant)                                  40.085   13.953                                           2.873    .004
         What would you expect an average evening
           meal entree item alone to be priced?      .065     .005                 .618                        13.323   .000    .371        2.696
         Prefer Formal Waitstaff Wearing Tuxedos     .107     .037                 .157                        2.918    .004    .276        3.629
         Year Born                                   -.020    .007                 -.136                       -2.759   .006    .331        3.021

    a. Dependent Variable: How likely would it be for you to patronize this restaurant (new upscale restaurant)?

    The analysis has determined three significant variables: expected average evening entrée price, preference for formal waitstaff with tuxedos, and year born. Because the coefficient for year born is negative, likelihood is higher among older respondents. Jeff Dean should therefore target older, big spenders and offer a formal waitstaff attired in tuxedos in his upscale restaurant.
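    How much each stepwise step added can be read from the regression sums of squares in the ANOVA table. The sketch below recomputes R Square after each step and the gain from the variable just entered; the shortened variable labels and the function name are illustrative assumptions.

```python
ss_total = 362.412
# Regression sum of squares after each step, copied from the ANOVA table.
steps = [
    ("Expected entree price", 254.574),
    ("Prefer formal waitstaff wearing tuxedos", 262.880),
    ("Year born", 265.085),
]

def r_square_gains(steps, ss_total):
    """R Square after each step and the gain from the variable just entered."""
    rows, previous = [], 0.0
    for name, ss_regression in steps:
        r_square = ss_regression / ss_total
        rows.append((name, r_square, r_square - previous))
        previous = r_square
    return rows

for name, r_square, gain in r_square_gains(steps, ss_total):
    print(f"{name:42} R Square = {r_square:.3f}  gain = {gain:.3f}")
```

    Expected entrée price carries most of the explained variance; the waitstaff preference and year born are statistically significant but add only a few points of R Square between them.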