
PROCEEDINGS OF THE SAWTOOTH SOFTWARE CONFERENCE

September 2001


Copyright 2002

All rights reserved. This electronic document may be copied or printed for personal use only. Copies or reprints may not be sold without permission in writing from Sawtooth Software, Inc.


FOREWORD

The ninth Sawtooth Software Conference, held in Victoria, BC on September 12-14, 2001, will be forever remembered due to the tragic events of the prior day, September 11, 2001. It is ironic that we had moved the date of the conference to September 12th to avoid an earlier scheduled WTO meeting in Victoria, for fear that protests might somehow disrupt transportation to the conference. Instead, in response to the terrorist attacks on the World Trade Center, the Pentagon, and another hijacking that ended in a fatal crash in Pennsylvania, the FAA grounded all commercial flights in North America. About half our attendees had already made it to Victoria, but many were unable to leave home, and so many others were stranded in airports around the world.

We were impressed by the professionalism of those who spent those difficult days with us in Victoria. On breaks between sessions, we were riveted to the television set in the registration room, collectively shaking our heads. Marooned together, we went ahead with a successful conference. Technology came to the rescue for some speakers unable to make it, as we piped their voices into the ballroom via phone line and advanced the PowerPoint slides on the screen as they spoke. A few planned speakers were unable to deliver a presentation at all.

Due to the circumstances, no best paper award was given this year. Many were deserving, as you will discover. The papers presented in this volume are in the words of the authors, and we have performed very little copy editing. We wish to express our sincere thanks to the authors and discussants whose dedication and efforts made this very unusual 2001 Conference a success.

Some of the papers presented at this and previous conferences are available in electronic form at our Technical Papers Library on our home page: http://www.sawtoothsoftware.com.

Sawtooth Software
February 2002





CONTENTS

KNOWLEDGE AS OUR DISCIPLINE
Chuck Chakrapani, Ph.D., Standard Research Systems / McMaster University, Toronto, Canada

PARADATA: A TOOL FOR QUALITY IN INTERNET INTERVIEWING
Ray Poynter, The Future Place, and Deb Duncan, Millward Brown IntelliQuest

WEB INTERVIEWING: WHERE ARE WE IN 2001?
Craig V. King and Patrick Delana, POPULUS

USING CONJOINT ANALYSIS IN ARMY RECRUITING
Todd M. Henry, United States Military Academy, and Claudia G. Beach, United States Army Recruiting Command

DEFENDING DOMINANT SHARE: USING MARKET SEGMENTATION AND CUSTOMER RETENTION MODELING TO MAINTAIN MARKET LEADERSHIP
Michael G. Mulhern, Ph.D., Mulhern Consulting

ACA/CVA IN JAPAN: AN EXPLORATION OF THE DATA IN A CULTURAL FRAMEWORK
Brent Soo Hoo, Gartner/Griggs-Anderson, Nakaba Matsushima and Kiyoshi Fukai, Nikkei Research

A METHODOLOGICAL STUDY TO COMPARE ACA WEB AND ACA WINDOWS INTERVIEWING
Aaron Hill and Gary Baker, Sawtooth Software, Inc., and Tom Pilon, TomPilon.com

INCREASING THE VALUE OF CHOICE-BASED CONJOINT WITH “BUILD YOUR OWN” CONFIGURATION QUESTIONS
David G. Bakken, Ph.D., and Len Bayer, Harris Interactive

APPLIED PRICING RESEARCH
Jay L. Weiner, Ph.D., Ipsos North America

RELIABILITY AND COMPARABILITY OF CHOICE-BASED MEASURES: ONLINE AND PAPER-AND-PENCIL METHODS OF ADMINISTRATION
Thomas W. Miller, A.C. Nielsen Center, School of Business, University of Wisconsin-Madison, David Rake, Reliant Energy, Takashi Sumimoto, Harris Interactive, and Peggy S. Hollman, General Mills

TRADE-OFF STUDY SAMPLE SIZE: HOW LOW CAN WE GO?
Dick McCullough, MACRO Consulting, Inc.



THE EFFECTS OF DISAGGREGATION WITH PARTIAL PROFILE CHOICE EXPERIMENTS
Jon Pinnell and Lisa Fridley, MarketVision Research

ONE SIZE FITS ALL OR CUSTOM TAILORED: WHICH HB FITS BETTER?
Keith Sentis and Lihua Li, Pathfinder Strategies

MODELING CONSTANT SUM DEPENDENT VARIABLES WITH MULTINOMIAL LOGIT: A COMPARISON OF FOUR METHODS
Keith Chrzan, ZS Associates, and Sharon Alberg, Maritz Research

DEPENDENT CHOICE MODELING OF TV VIEWING BEHAVIOR
Maarten Schellekens, McKinsey & Company / Intomart BV

ALTERNATIVE SPECIFICATIONS TO ACCOUNT FOR THE “NO-CHOICE” ALTERNATIVE IN CONJOINT CHOICE EXPERIMENTS
Rinus Haaijer, MuConsult, Michel Wedel, University of Groningen and Michigan, and Wagner Kamakura, Duke University

HISTORY OF ACA
Richard M. Johnson, Sawtooth Software, Inc.

A HISTORY OF CHOICE-BASED CONJOINT
Joel Huber, Duke University

RECOMMENDATIONS FOR VALIDATION OF CHOICE MODELS
Terry Elrod, University of Alberta



SUMMARY OF FINDINGS

We distilled some of the key points and findings from each presentation below.

Knowledge as Our Discipline (Chuck Chakrapani): Chuck observed that many market researchers have become simply “order takers” rather than having real influence within organizations. He claims that very early on marketing research made the mistake of defining its role too narrowly. Broadening the scope of influence includes helping managers ask the right questions and becoming more knowledgeable about the businesses market researchers consult for. As evidence of the poor state of marketing research, Chuck showed how many management and marketing texts virtually ignore the marketing research function as important to the business process.

Chuck argued that, as opposed to other sciences, market researchers have not mutually developed a core set of knowledge about the law-like relationships within their discipline. The reasons include a lack of immediate rewards for compiling such knowledge and an over-concern for confidentiality within organizations. Chuck decried “black-box” approaches to market research. The details of “black-box” approaches are confidential, and therefore the validity of such approaches cannot be truly challenged or tested. He argued that the widespread practice of “Sonking” (Scientification of Non-Knowledge), in the form of sophisticated-looking statistical models devoid of substantial empirical content, has obscured true fact-finding and ultimately lessened market researchers’ value and influence.

Paradata: A Tool for Quality in Internet Interviewing (Ray Poynter and Deb Duncan): Ray and Deb showed how Paradata (information about the process) can be used for fine-tuning and improving on-line research. Time to complete the interview, the number of abandoned interviews at each question in the survey, and internal fit statistics are all examples of Paradata. The authors reported that complex “grid” style questions, constant-sum questions, and open-ended questions that required respondents to type a certain number of characters resulted in many more drop-outs within on-line surveys.
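
As an illustration of how drop-out paradata might be tabulated (the survey, field names, and records below are invented, not from the paper), here is a minimal Python sketch:

```python
from collections import Counter

# Hypothetical paradata: one record per respondent listing the questions answered, in order.
survey_questions = ["Q1", "Q2_grid", "Q3_constant_sum", "Q4_open_end", "Q5"]
interviews = [
    ["Q1", "Q2_grid", "Q3_constant_sum", "Q4_open_end", "Q5"],   # completed
    ["Q1", "Q2_grid"],                                           # abandoned at Q3
    ["Q1"],                                                      # abandoned at Q2
    ["Q1", "Q2_grid", "Q3_constant_sum"],                        # abandoned at Q4
]

# Count, for each question, how many respondents quit immediately before answering it.
drop_outs = Counter()
for answered in interviews:
    if len(answered) < len(survey_questions):
        drop_outs[survey_questions[len(answered)]] += 1

started = len(interviews)
for q in survey_questions:
    print(f"{q}: {drop_outs[q]} drop-outs ({drop_outs[q] / started:.0%} of starters)")
```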

In addition to observing the respondent’s answers, the authors pointed out that much information can be learned by “asking” the respondent’s browser questions. Ray and Deb called this “invisible” data. Examples include current screen resolution, browser version, operating system, and whether Java is enabled. Finally, the authors suggested that researchers pay close attention to privacy issues: posting privacy policies on their sites and faithfully abiding by those guidelines.

Web Interviewing: Where are We in 2001? (Craig King and Patrick Delana): Craig and Patrick reported their experiences with Web interviewing (over 230,000 interviews over the last two years). Most of their research has involved employee interviews at large companies, for which they report about a 50% response rate. The authors have also been involved in more traditional market research studies, for which they often find Web response rates of about 20% after successful qualification by a phone screener, but less than 5% for “targeted” lists of IT professionals. They suggested that the best way to improve response rates is by giving cash to each respondent, though they noted that this is more expensive to process than cash drawings.



The authors reported findings from other research suggesting that paper-and-pencil and Web interviews usually produce quite similar findings. They reported on a split-sample study they conducted which demonstrated virtually no difference between the response patterns to a 47-question battery of satisfaction questions for Web and conventional mail surveys. The authors also shared some nuts-and-bolts advice, such as to be careful about asking respondents to type alpha-numeric passwords where there can be confusion. Examples include 0, o, and O; vv and WW; and the number “1” vs. a lowercase “L”.
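
A small illustrative sketch of one way to act on that advice, generating respondent passwords that avoid the confusable characters listed above; the character set and length are assumptions, not the authors':

```python
import secrets
import string

# Characters that are easy to misread when typed from a letter or screen:
# 0/o/O, 1/l/I, and v/w combinations. Excluding them avoids most typing errors.
CONFUSABLE = set("0oO1lIvVwW")
SAFE_CHARS = [c for c in string.ascii_letters + string.digits if c not in CONFUSABLE]

def make_password(length: int = 8) -> str:
    """Return a random password drawn only from unambiguous characters."""
    return "".join(secrets.choice(SAFE_CHARS) for _ in range(length))

print(make_password())  # e.g. 'xK7Qp4tZ' (varies each run)
```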

Using Conjoint Analysis in Army Recruiting (Todd Henry and Claudia Beach): The Army has found it increasingly difficult to recruit 17- to 22-year-olds. Three reasons are a low unemployment rate, a decrease in the propensity among youth to serve, and an increase in the number of young people attending college. As a result, the Army has had to offer increased incentives to entice people to enlist. The authors described how they used the results of a CBC study of potential enlistees to allocate enlistment incentives across different military career paths and enlistment periods. They used the CBC utilities within a goal program aimed at allocating resources to minimize deviations from the recruiting quotas for each occupational specialty. They found that their model did a good job of estimating recruits at the shorter terms of service, but over-estimated recruit preference for longer terms of service. They discussed some of the challenges of using the conjoint data to predict enlistment rates. One in particular is that not all career paths are available to every enlistee, as they are in the CBC interview; enlistees must meet certain requirements to be accepted into many of the specialties.
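
The authors' actual goal program is not reproduced in this summary, but the sketch below illustrates the general structure under invented numbers: incentive spending by specialty is chosen to minimize deviations from quotas, with predicted enlistments assumed (purely for illustration) to be a linear function of spending, subject to a budget.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical inputs for three occupational specialties.
base = np.array([400.0, 250.0, 150.0])    # enlistments expected with no incentives
uplift = np.array([0.8, 1.2, 0.5])        # extra enlistments per $1k of incentives (assumed linear)
quota = np.array([500.0, 300.0, 250.0])   # recruiting quotas
cost = np.array([1.0, 1.0, 1.0])          # $k per unit of incentive spending
budget = 250.0                            # total $k available

J = len(quota)
# Decision vector z = [x_1..x_J, under_1..under_J, over_1..over_J]
# Goal constraints: base_j + uplift_j * x_j + under_j - over_j = quota_j
A_eq = np.hstack([np.diag(uplift), np.eye(J), -np.eye(J)])
b_eq = quota - base
# Budget constraint: cost . x <= budget
A_ub = np.hstack([cost.reshape(1, -1), np.zeros((1, 2 * J))])
b_ub = [budget]
# Objective: minimize total under- and over-achievement of the quotas (unit weights)
c = np.concatenate([np.zeros(J), np.ones(J), np.ones(J)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (3 * J))
print("incentive spending by specialty ($k):", np.round(res.x[:J], 1))
print("predicted shortfalls by specialty:", np.round(res.x[J:2 * J], 1))
```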

Defending Dominant Share: Using Market Segmentation and Customer Retention Modeling to Maintain Market Leadership (Mike Mulhern): Mike provided a case study demonstrating how segmentation followed by customer retention modeling could help a firm maintain market leadership. One of Mike’s most important points was his choice of intention to re-purchase, rather than customer satisfaction, as the dependent variable. He argued that intention to re-purchase was better linked to behavior than satisfaction.

Mike described the process he used to build the retention models for each segment. After selecting logistic regression as his primary modeling tool, Mike discussed how he evaluated and improved the models. In this research, improvements to the models were made by managing multicollinearity with factor analysis, recoding the dependent variable to ensure variation, testing the independent variables for construct validity, and employing regression diagnostics. The diagnostic measures improved the model by identifying outliers and cases that had excessive influence. Examples from the research were used to illustrate how these diagnostic measures helped improve model quality.
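
A minimal sketch of that kind of workflow, with invented data: a simple variance-inflation-factor screen stands in for the factor-analysis step Mike described, followed by a logistic regression on a binary repurchase-intention flag.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical segment data: three drivers and a 0/1 repurchase-intention flag.
n = 300
X = rng.normal(size=(n, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=n)        # deliberately collinear driver
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2 of column regressed on the others)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

print("VIFs:", np.round(vif(X), 1))        # the collinear driver shows a large VIF

# Drop the redundant driver before fitting the retention model.
model = LogisticRegression().fit(X[:, :2], y)
print("coefficients:", np.round(model.coef_, 2))
```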

ACA/CVA in Japan: An Exploration of the Data in a Cultural Framework (Brent Soo Hoo, Nakaba Matsushima, and Kiyoshi Fukai): Brent and his co-authors cautioned researchers to pay attention to cultural differences prior to using conjoint analysis across countries. As one example, they pointed out some characteristics of Japanese respondents that are distinctive to that country and might affect conjoint results. For example, the Japanese people’s reluctance to be outspoken (using the center rather than the extreme points on scales) might result in lower quality conjoint data.



They tested the hypothesis that Japanese respondents tend to use the center part of the 9-point graded comparison scale in ACA and CVA. They found at least some evidence for this behavior, but did not find evidence that the resulting ACA utilities were less valid than those from countries whose respondents use the full breadth of the scale to a greater extent.
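
As a rough illustration of the scale-use comparison involved (the responses below are invented), the middle-of-scale share for two samples can be compared with a simple two-proportion z-test:

```python
from math import sqrt

def middle_share_test(resp_a, resp_b, low=4, high=6):
    """Two-proportion z-test for the share of 9-point ratings falling in the scale's middle."""
    m1 = sum(low <= x <= high for x in resp_a)
    m2 = sum(low <= x <= high for x in resp_b)
    n1, n2 = len(resp_a), len(resp_b)
    p1, p2 = m1 / n1, m2 / n2
    pooled = (m1 + m2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1, p2, z

# Invented graded-pair responses (1-9) for two country samples.
sample_japan = [5, 4, 5, 6, 5, 4, 6, 5, 7, 5, 4, 6, 5, 5, 3, 5, 6, 4, 5, 5]
sample_other = [1, 9, 2, 8, 5, 7, 3, 9, 1, 6, 8, 2, 7, 4, 9, 1, 5, 8, 2, 7]
print(middle_share_test(sample_japan, sample_other))   # large positive z => more centering in sample A
```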

A Methodological Study to Compare ACA Web and ACA Windows Interviewing (Aaron Hill, Gary Baker, and Tom Pilon): Aaron and his co-authors undertook a pilot research study among 120 college students to test whether the results of two new software systems (ACA for Windows and ACA for Web) were equivalent. They configured the two computerized interviews to look nearly identical (fonts, colors, and scales) for the self-explicated priors section and the pairs section. The ACA for Windows interview provided greater flexibility in the design of its calibration concept questions, so a new slider scale was tested.

The authors found no substantial differences among the utilities for the two approaches, suggesting that researchers can employ mixed modality studies with ACA (Web/Windows) and simply combine the results. Respondents were equally comfortable with either survey method, and took equal time to complete them. The authors suggested that respondents more comfortable completing Web surveys could be given a Web-based interview, whereas others might be sent a disk in the mail, be invited to a central site, or could be visited by an interviewer carrying a laptop.

As seen in many other studies, HB improved the results over traditional ACA utility estimation. Other tentative findings were as follows: self-explicated utilities alone did quite well in predicting individuals’ choices in holdout tasks—but the addition of pairs and HB estimation further improved the predictive accuracy of the utilities; the calibration concept question can be skipped if the researcher uses HB and does not need to run purchase likelihood simulations; and the slider scale for calibration concepts may result in more reliable purchase likelihood scaling among respondents comfortable with using the mouse.

Increasing the Value of Choice-Based Conjoint with “Build Your Own” Configuration Questions (David Bakken and Len Bayer): David and Len showed how computerized questionnaires can include a “Build Your Own” (BYO) product question. In the BYO question, respondents can configure the product that they are most likely to buy by choosing a level from each attribute. Each level is associated with an incremental price, and the total price is re-calculated each time a new feature is selected. Even though clients tend to like BYO questions a great deal, David and Len suggest that the actual data from the BYO task may be of limited value.
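
A tiny sketch of the BYO mechanics described above; the attributes, levels, and prices are hypothetical, and a real implementation would sit behind an interactive screen:

```python
# Hypothetical BYO menu: each attribute offers levels with incremental prices.
BYO_MENU = {
    "processor": {"standard": 0, "fast": 150, "fastest": 300},
    "memory":    {"256 MB": 0, "512 MB": 80},
    "warranty":  {"1 year": 0, "3 years": 120},
}
BASE_PRICE = 799

def total_price(selections: dict) -> int:
    """Re-compute the configured product's price from the current selections."""
    return BASE_PRICE + sum(BYO_MENU[attr][level] for attr, level in selections.items())

chosen = {"processor": "fast", "memory": "512 MB", "warranty": "1 year"}
print(total_price(chosen))   # 799 + 150 + 80 + 0 = 1029
```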

The authors presented the results of a study that compared traditional Choice-Based Conjoint results to BYO questions. They found only a loose relationship between the information obtained from the two methods. They concluded that a BYO question may serve a good purpose for product categories in which buyers truly purchase the product in a BYO fashion, but that larger sample sizes than for traditional conjoint are needed. Furthermore, experimental treatments (e.g., variations in price for each feature) might be needed either within or between subjects to improve the value of the BYO task. Between-subjects designs would increase sample size demands. David and Len pointed out that the BYO exercise focuses respondents on trading off each feature versus price rather than trading features off against one another. The single trade-off versus price may reflect a different cognitive process than the multi-attribute trade-off that characterizes a choice experiment.

Applied Pricing Research (Jay Weiner): Jay reviewed the common approaches to pricing research: willingness-to-pay questions, monadic designs, the van Westendorp technique, conjoint analysis, and discrete choice. Jay argued that most products exhibit a range of inelasticity—and finding that range is one of the main goals of pricing research. Demand may fall as price rises, but total revenue can increase over those limited ranges.
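
To illustrate the point about ranges of inelasticity (the demand schedule below is invented, not Jay's data), the sketch computes arc elasticities between adjacent price points and shows revenue rising even as unit demand falls, so long as demand is inelastic:

```python
# Invented demand schedule: units sold at each test price.
prices = [1.00, 1.25, 1.50, 1.75, 2.00]
units  = [1000,  950,  880,  760,  600]

for (p0, q0), (p1, q1) in zip(zip(prices, units), zip(prices[1:], units[1:])):
    # Arc (midpoint) elasticity between adjacent price points.
    elasticity = ((q1 - q0) / ((q0 + q1) / 2)) / ((p1 - p0) / ((p0 + p1) / 2))
    rev0, rev1 = p0 * q0, p1 * q1
    print(f"{p0:.2f} -> {p1:.2f}: elasticity {elasticity:5.2f}, "
          f"revenue {rev0:7.0f} -> {rev1:7.0f} ({'up' if rev1 > rev0 else 'down'})")
# Revenue keeps rising until |elasticity| exceeds 1, which marks the edge of the inelastic range.
```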

Jay compared the results of monadic concept tests and the van Westendorp technique. He concluded that the van Westendorp technique did a reasonable job of predicting actual trial for a number of FMCG categories. Even though he didn’t present data on the subject, he suggested that the fact that CBC offers a competitive context may improve the results relative to other pricing methods.
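
A rough sketch of how two commonly reported van Westendorp points can be located from the four price questions; the responses are invented and the point definitions used here are the simplified, commonly cited ones (optimal price point where "too cheap" crosses "too expensive", indifference point where "cheap" crosses "expensive"), not necessarily Jay's exact procedure:

```python
import numpy as np

# Invented van Westendorp answers (in dollars) for ten respondents: the price at which the
# product would be too cheap, a bargain, getting expensive, and too expensive.
too_cheap = np.array([1.0, 1.5, 2.0, 1.0, 2.5, 1.5, 2.0, 1.0, 1.5, 2.0])
bargain   = np.array([2.5, 3.0, 3.5, 2.0, 4.0, 3.0, 3.5, 2.5, 3.0, 3.0])
expensive = np.array([4.0, 5.0, 5.5, 4.5, 6.0, 5.0, 5.5, 4.0, 4.5, 5.0])
too_exp   = np.array([6.0, 7.0, 7.5, 6.0, 8.0, 7.0, 7.5, 5.5, 6.5, 7.0])

grid = np.linspace(1.0, 8.0, 701)
pct_too_cheap = np.array([(too_cheap >= p).mean() for p in grid])   # falls as price rises
pct_too_exp   = np.array([(too_exp   <= p).mean() for p in grid])   # rises with price
pct_cheap     = np.array([(bargain   >= p).mean() for p in grid])
pct_expensive = np.array([(expensive <= p).mean() for p in grid])

opp = grid[np.argmin(np.abs(pct_too_cheap - pct_too_exp))]
ipp = grid[np.argmin(np.abs(pct_cheap - pct_expensive))]
print(f"optimal price point ~ ${opp:.2f}, indifference price point ~ ${ipp:.2f}")
```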

Reliability and Comparability of Choice-Based Measures: Online and Paper-and-Pencil Methods of Administration (Tom Miller, David Rake, Takashi Sumimoto, and Peggy Hollman): Tom and his co-authors presented evidence that the usage of on-line surveys is expected to grow significantly in the near future. They also pointed out that some studies, particularly those comparing Web interviewing with telephone research, show that different methods of interviewing respondents may yield different results. These differences may be partly due to social desirability issues, since telephone respondents are communicating with a human rather than a computer.

Tom and his co-authors reported on a carefully designed split-sample study that compared the reliability of online and paper-and-pencil discrete choice analysis. Student respondents from the University of Wisconsin were divided into eight design cells. Respondents completed both paper-and-pencil and CBC tasks, in different orders. The CBC interview employed a fixed design in which respondents saw each task twice, permitting a test-retest condition for each task. The authors found no significant differences between paper-and-pencil administration and on-line CBC. Tom and his colleagues concluded that for populations in which respondents were comfortable with on-line technology, either method should produce equivalent results.
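
A small sketch (with invented choices) of the test-retest tabulation such a fixed design permits: each task appears twice, and we count how often a respondent repeats the same choice.

```python
# Invented CBC answers: for each respondent, the concept chosen (1-4) on the
# first and second showing of the same five fixed tasks.
first_pass  = {"r1": [1, 3, 2, 4, 1], "r2": [2, 2, 2, 1, 3], "r3": [4, 3, 1, 1, 2]}
second_pass = {"r1": [1, 3, 2, 2, 1], "r2": [2, 2, 3, 1, 3], "r3": [4, 3, 1, 1, 2]}

matches = total = 0
for rid in first_pass:
    for a, b in zip(first_pass[rid], second_pass[rid]):
        matches += (a == b)
        total += 1
print(f"test-retest agreement: {matches}/{total} = {matches / total:.0%}")
```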

Trade-Off Study Sample Size: How Low Can We Go? (Dick McCullough): In market research, the decision regarding sample size is often one of the thorniest. Clients have a certain budget and often a sample size in mind based on past experience. Different conjoint analysis methods provide varying degrees of precision given a certain sample size. Dick compared the stability of conjoint information as one reduces the sample size. He compared Adaptive Conjoint Analysis (ACA), traditional ratings-based conjoint (CVA), and Choice-Based Conjoint (CBC). Both traditional and Hierarchical Bayes analyses were tested. Dick used actual data sets with quite large sample sizes (N>400). He randomly chose subsets of the sample for analysis, and compared the results each time to the full sample. The criterion for fit was how well the utilities from the sub-sample matched the utilities for the entire sample, and how well market simulations for hypothetical market scenarios for the sub-sample matched the entire sample.
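
A sketch of the resampling logic described, using random stand-in utilities rather than Dick's data sets: draw subsamples of decreasing size and compare their mean part-worths with the full-sample means.

```python
import numpy as np

rng = np.random.default_rng(1)
full = rng.normal(size=(400, 12))          # stand-in utilities: 400 respondents x 12 part-worths
full_means = full.mean(axis=0)

for n in (200, 100, 50, 30):
    corrs, maes = [], []
    for _ in range(200):                    # repeat each draw to average out sampling luck
        sub = full[rng.choice(len(full), size=n, replace=False)]
        sub_means = sub.mean(axis=0)
        corrs.append(np.corrcoef(sub_means, full_means)[0, 1])
        maes.append(np.abs(sub_means - full_means).mean())
    print(f"n={n:3d}: mean r = {np.mean(corrs):.3f}, mean abs. error = {np.mean(maes):.3f}")
```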

Because the data sets were not specifically designed for this research, Dick faced challenges in drawing firm conclusions regarding the differences in conjoint approaches and sample size.



Despite the limitations, Dick’s research suggests that ACA data are more stable than CBC data (given the same sample size). His findings also suggest that conjoint researchers may be able to significantly reduce sample sizes without great losses in information. Especially for preliminary exploratory research, sample sizes as small as 30 or even less may yield valid insights into the population of interest. In the discussion following the presentation, Greg Allenby of Ohio State (considered the foremost expert in applying HB to marketing research problems) suggested that HB should work better than traditional estimation even with extremely small samples —even sample sizes of fewer than 10 people.

The Effects of Disaggregation with Partial-Profile Choice Experiments (Jon Pinnell and Lisa Fridley): Jon and Lisa’s research picked up where Jon’s previous Sawtooth Software Conference paper (from 2000) had left off. In the 2000 conference, Jon examined six commercial CBC data sets and found that Hierarchical Bayes (HB) estimation almost universally improved the accuracy of individual-level predictions for holdout choice tasks relative to aggregate main-effects logit. The one exception was a partial-profile choice experiment in which respondents only saw a subset of the total number of attributes within each choice task. Jon and Lisa decided this year to focus the investigation on just partial-profile choice data sets to see if that finding would generalize.

After studying nine commercial partial-profile data sets, Jon found that for four of the data sets simple aggregate logit utilities fit individual holdout choices better than individual estimates under HB. Jon could not conclusively determine which factors caused this to happen, but he surmised that the following may hurt HB’s performance with partial-profile CBC data sets: 1) low heterogeneity among respondents, and 2) a large number of parameters to be estimated relative to the amount of information available at the individual level. Specifically related to point 2, Jon noted that experiments with few choice concepts per task performed less well for HB than experiments with more concepts per task. Later discussion by Keith Sentis suggested that the inability to obtain good estimates at the individual level may be exacerbated as the ratio of attributes present per task versus total attributes in the design becomes smaller. Jon also suggested that the larger scale parameter previously reported for partial-profile data sets relative to full-profile data might in part be due to overfitting, rather than a true reduction in noise for the partial-profile data.

One-Size-Fits-All or Custom Tailored: Which HB Fits Better? (Keith Sentis and Lihua Li): Keith began his presentation by describing a concern he has had over the last few years with Sawtooth Software’s HB software: its assumption of a single multivariate normal distribution to describe the population. Keith and Lihua wondered whether that assumption negatively affected the estimated utilities if segments existed with quite different utilities.

The authors studied seven actual CBC data sets, systematically excluding some of the tasks to serve as holdouts for internal validation. They estimated the utilities in four ways: 1) by using the entire sample within the same HB estimation routine, 2) by segmenting respondents according to industry sectors and estimating HB utilities within each segment, 3) by segmenting respondents using a K-means clustering procedure on HB utilities and then re-estimating within each segment using HB, and 4) by segmenting respondents using Latent Class and then estimating HB utilities within each segment.
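
As an illustration of the segmentation step in approach 3 (the utilities and the two-segment structure below are simulated stand-ins, not the authors' data), respondents can be clustered on their HB part-worths with K-means before re-estimating within each cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Stand-in HB part-worths for 300 respondents and 10 attribute levels,
# built from two latent segments so that clustering has something to find.
seg_centers = rng.normal(scale=2.0, size=(2, 10))
labels_true = rng.integers(0, 2, size=300)
utilities = seg_centers[labels_true] + rng.normal(size=(300, 10))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(utilities)
segments = kmeans.labels_
for s in range(2):
    print(f"segment {s}: {np.sum(segments == s)} respondents")
# Each segment's respondents would then be passed to a separate HB estimation run.
```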



Keith and Lihua found that whether one ran HB on the entire sample, or whether one segmented first prior to estimating utilities, the upper-level model assumption in HB of normality did not decrease the fit of the estimated utilities to the holdouts. It seemed unnecessary to segment first before running HB. In his discussion of Keith’s paper, Rich Johnson suggested that Keith’s research supports the notion that clean segmentation may not be present in most data sets. Subsequent discussion highlighted that there seemed to be enough data at the individual level (each respondent received usually about 14 to 20 choice tasks) that respondents’ utilities could be fit reasonably well to their own data while being only moderately tempered by the assumptions of a multivariate normal population distribution. Greg Allenby (considered the foremost expert on applying HB to marketing problems) chimed in that Keith’s findings were not a surprise to him. He has found that extending HB to accommodate multiple distributions leads to only minimal gains in predictive accuracy.

Modeling Constant Sum Dependent Variables with Multinomial Logit: A Comparison of Four Methods (Keith Chrzan and Sharon Alberg): Keith and Sharon used aggregate multinomial logit to analyze three constant-sum CBC data sets under different coding procedures. In typical CBC data sets, respondents choose just one favored concept from a set of concepts. With constant-sum (allocation) data, respondents allocate, say, 10 points among the alternatives to express their relative preferences/probabilities of choice. The first approach the authors tested was to simply convert the allocations to a discrete choice (winner takes all for the best alternative). Another approach coded the 10-point allocation as if it were 10 independent discrete choice events: each alternative receiving points was treated as the winner of its own copy of the task, weighted by the number of points allocated to it. Keith noted that this was the method used by Sawtooth Software’s HB-Sum software. Another approach involved making the allocation task look like a series of interrelated choice sets, the first showing that the alternative with the most points was preferred to all others; the second showing that the second most preferred alternative was preferred to the remaining concepts (not including the “first choice”), and so on. The last approach was the same as the previous one, but with each task weighted by the allocation for the chosen concept.
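
A sketch of the weighted expansion described for the second approach; the allocation below is invented and this is only an illustration of the coding idea, not Sawtooth Software's implementation:

```python
# One constant-sum task: a respondent allocates 10 points across four concepts.
task_id = 7
allocation = {"concept_A": 6, "concept_B": 3, "concept_C": 1, "concept_D": 0}

# Expand into weighted discrete-choice records: one pseudo-task per concept that
# received points, with that concept marked as chosen and its points as the weight.
records = []
for concept, points in allocation.items():
    if points > 0:
        records.append({"task": task_id, "chosen": concept, "weight": points})

for r in records:
    print(r)
# The weighted records can then be stacked across respondents and tasks and fed to a
# standard MNL (or HB) estimation routine that accepts case weights.
```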

Using the Swait-Louviere test for equivalence of parameters and scale, Keith and Sharon found that the different models were equivalent in their parameters for all three data sets, but not equivalent for scale for one of the data sets. Keith noted that the difference in scale could indeed affect the results of choice simulations. He suggested that for logit simulations this difference was of little concern, since the researcher would likely adjust for scale to best fit holdouts anyway. Keith concluded that it was comforting that different methods provided quite similar results and recommended the coding strategy as used with HB-Sum, as it did not discard information and seemed to be the easiest for his group to program whether using SAS, SPSS, LIMDEP or LOGIT.

Dependent Choice Modeling of TV Viewing Behavior (Maarten Schellekens): Maarten described the modeling challenges involved with studying TV viewing behavior. Given a programming grid with competing networks offering different programming selections, respondents indicate which programs they would watch in each time slot, or whether they would not watch at all. As opposed to traditional CBC modeling, where it is assumed that a respondent’s choice within a given task is independent of selections made in previous tasks, with TV viewing there are obvious interdependencies. For example, respondents who have chosen to watch a news program in an earlier time slot may be less likely to choose another news program later on. Maarten discussed two ways to model these dependencies. One method is to treat the choice of programming in each time slot as a separate choice task, and to treat the availability of competing alternatives within the same time slots as context effects. Another approach is to treat the “path” through the time slots and channels as a single choice. The number of choice alternatives per task dramatically increases with this method, but Maarten argued that the results may in some cases be superior.
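
A small sketch of why the "single path" formulation grows so quickly; the programming grid is hypothetical:

```python
from itertools import product

# Hypothetical programming grid: the options available in each of three time slots
# (three channels plus the option of not watching).
slots = [
    ["news_A", "sitcom_B", "film_C", "off"],
    ["drama_A", "news_B", "film_C", "off"],
    ["talk_A", "sports_B", "film_C", "off"],
]

paths = list(product(*slots))
print(f"{len(paths)} path alternatives, e.g. {paths[0]}")   # 4 * 4 * 4 = 64 alternatives
# Modeled this way, one choice task has 64 alternatives instead of three tasks of four,
# which is why the alternative count increases so dramatically.
```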

Alternative Specifications to Account for the “No-Choice” Alternative in Conjoint Choice Experiments (Rinus Haaijer, Michel Wedel, and Wagner Kamakura): Rinus and his co-authors addressed the pros and cons of including No-Choice (None, or constant) alternatives in choice tasks. The advantages of the None alternative are that it makes the choice situation more realistic, it might be used as a proxy for market penetration, and it promotes a common scaling of utilities across choice tasks. The disadvantages are that it provides an “escape” option for respondents to use when a choice seems difficult, that a None choice provides less information than the choice of another alternative, and that potential IIA violations may result when modeling the None.

Rinus provided evidence that some strategies that have been reported in the literature for coding the None within Choice-Based Conjoint can lead to biased parameters and poor model fit—particularly if some attributes are linearly coded. He found that the None alternative should be explicitly accounted for as a separate dummy code (or as one of the coded alternatives of an attribute) rather than simply left as the “zero-state” of all columns. The coding strategy that Rinus and his co-authors validated is the same one that has been used within Sawtooth Software’s CBC software for nearly a decade.
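
A toy sketch contrasting the two codings discussed (the attributes and prices are invented): leaving the None row as all zeros versus giving it an explicit dummy column.

```python
import numpy as np

# Toy choice set: two product concepts coded on [price, brand_B_dummy], plus the None alternative.
concepts = np.array([
    [2.50, 1.0],   # brand B at $2.50
    [3.00, 0.0],   # brand A at $3.00
])

# "Zero-state" coding: None is simply a row of zeros in the same columns.
# With a linearly coded price, this implicitly treats None like a $0.00 product.
zero_state = np.vstack([concepts, [0.0, 0.0]])

# Explicit coding: add a None dummy column; product rows get 0, the None row gets 1.
explicit = np.hstack([
    np.vstack([concepts, [0.0, 0.0]]),
    np.array([[0.0], [0.0], [1.0]]),
])

print("zero-state design:\n", zero_state)
print("explicit None dummy design:\n", explicit)
```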

History of ACA (Rich Johnson): Rich described how in the early ‘70s he developed a system of pair-wise trade-off matrices for estimating utilities for industry-driven problems having well over 20 attributes. Rich was unaware of the work of Paul Green and colleagues regarding full-profile conjoint analysis, which he noted would have been of immense help. He noted that practitioners like himself had much less interaction with academics than they do today. Rich discovered that trade-off matrices worked fairly well, but they were difficult for respondents to complete reliably. About that same time, small computers were being developed, and Rich recognized that these might be used to administer trade-off questions. He also figured that if respondents were asked to provide initial rank-orders within attributes, many of the trade-offs could be assumed rather than explicitly asked; the same information could be obtained for main-effects estimation with many fewer questions. These developments marked the beginning of what became the computerized conjoint method Adaptive Conjoint Analysis (ACA).

Rich founded Sawtooth Software in the early ‘80s, and the first commercial ACA system was released in 1985. ACA has benefited over the years from interactions with users and academics. Most recently, hierarchical Bayes methods have improved ACA’s utility estimation over the previous OLS standards. Rich suggested that ACA users may be falsely content with their current OLS results and should use HB estimation whenever possible.



A History of Choice-Based Conjoint (Joel Huber): Joel described the emergence of Choice-Based Conjoint (discrete choice) methods and the challenges that researchers have faced in modeling choice-based data. He pointed to early work in the ‘70s by McFadden, which laid the groundwork for multinomial logit. A later paper by Louviere and Woodworth (1983) kicked off Choice-Based Conjoint within the marketing profession. Joel discussed the Red-Bus/Blue Bus problem and how HB helps analysts avoid the pitfalls of IIA. Propelled by the recent boost offered by individual-level HB estimation, Joel predicted that Choice-Based Conjoint would eventually overtake Adaptive Conjoint Analysis as the most widely used conjoint-related method.

Recommendations for Validation of Choice Models (Terry Elrod): Terry criticized two common practices for validating conjoint models and proposed a remedy for each. First, he criticized using hit rates to identify the better of several models because they discard too much information: hit rate calculations consider only which choice was predicted as most likely by a model and ignore the predicted probability of that choice. He pointed out that the likelihood criterion, which is used to estimate models, is easily calculated for holdout choices. He prefers this measure because it uses all available information to determine which model is best. It is also more valid than hit rates because it penalizes models for inaccurate predictions of aggregate shares.

Second, Terry warned against the common practice of using the same respondents for utility estimation and validation. He showed that this practice artificially favors utility estimation techniques that over-fit respondent heterogeneity. For example, it understates the true superiority of hierarchical Bayes estimation (which attenuates respondent heterogeneity) relative to individual-level estimation. He suggested a four-fold holdout procedure as a proper and practical alternative. This approach involves estimating a model four times, each time using a different one-fourth of the respondents as holdouts and the other three-fourths for estimation. A model’s validation score is simply the product of the four holdout likelihoods.
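
A sketch of the four-fold procedure with everything invented, including a deliberately crude stand-in "model": fit on three quarters of the respondents, score the predicted probabilities of the held-out quarter's actual choices, and combine the four holdout likelihoods (summing logs rather than multiplying raw likelihoods).

```python
import numpy as np

rng = np.random.default_rng(3)
n_resp, n_alts = 200, 3
choices = rng.integers(0, n_alts, size=n_resp)        # stand-in holdout-task choices

def fit_and_predict(train_idx, test_idx):
    """Stand-in 'model': aggregate choice shares from the training respondents, used as
    predicted probabilities for every test respondent (with a tiny Laplace adjustment)."""
    shares = (np.bincount(choices[train_idx], minlength=n_alts) + 1) / (len(train_idx) + n_alts)
    return np.tile(shares, (len(test_idx), 1))

folds = np.array_split(rng.permutation(n_resp), 4)
log_lik = 0.0
for k in range(4):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(4) if j != k])
    probs = fit_and_predict(train, test)
    log_lik += np.log(probs[np.arange(len(test)), choices[test]]).sum()

print(f"summed holdout log-likelihood across the four folds: {log_lik:.1f}")
# The product of the four holdout likelihoods corresponds to the sum of their logs.
```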



KNOWLEDGE AS OUR DISCIPLINE

Chuck Chakrapani, Ph.D.
Standard Research Systems / McMaster University

Toronto, Canada

Traditionally, marketing research has been considered a discipline that uses scientific methods to collect, analyze, and interpret data relevant to the marketing of goods and services. The acceptance of this definition has prevented marketing researchers from being meaningful partners in the decision-making process. The way marketing research has been positioned and practiced over the years appears to be at odds with the new information age and management decision requirements. There seems to be an immediate need to redefine our discipline and our role in management decision making.

In 1961, the American Marketing Association defined marketing research as “the systematic gathering, recording, and analyzing of data about problems relating to the marketing of goods and services”. Implied in this definition is the idea that marketing researchers have no direct involvement in the process of marketing decision making. Their role is to support the real decision makers by supplying the information they ask for.

Academics readily accepted the AMA definition and its implications, as evidenced by the following typical quote from a textbook: “Marketing research is the systematic process of purchasing relevant information for marketing decision making” (Cox and Evans 1972, p. 22; emphasis added). Authors such as Kinnear and Taylor (1979) went a step further and explicitly made the point that only the decision maker had a clear perspective with regard to information requirements: “Only the manager has a clear perspective as to the character and specificity of the information needed to reduce the uncertainty surrounding the decision situation” (Kinnear and Taylor 1979, p. 25; emphasis added). By inference, marketing researchers have nothing to contribute to the character and specificity of the information needed to reduce the uncertainty surrounding the decision situation; only the manager has a clear perspective on these matters. This idealized version of the decision maker as someone who has a clear perspective on what he or she needs to make sound decisions is as much a myth as the concept of the “rational man” of the economic disciplines of yesteryear. Both these romanticized portraits—the decision maker with a clear perspective and the rational man who optimizes his well-being/returns—sound enticing and plausible in theory but are seldom found in practice.

Yet experienced marketing researchers know that they have a lot to contribute to the character and specificity of the information needed to reduce the uncertainty surrounding the decision. In fact, it is one of the most important bases on which a good researcher is distinguished from a mediocre one.

By defining marketing research in terms of its narrow functional roles rather than by its broad overall goals, we have acutely limited the growth of marketing research as a serious discipline striving to create a core body of knowledge. To define ourselves by our functional roles rather than by our overall goals is similar to a university defining itself as a collection of buildings with employees whose job it is to publish papers and lecture students who pay. This narrow functional definition has led to tunnel vision in our profession. Its consequences have been far-reaching, and not in positive ways. At the dawn of the twenty-first century, marketing research stands at the threshold of irrelevance, as the following facts indicate:

• In 1997, Financial Times of London published a 678-page book, The Complete MBA Companion, with the assistance of three major international business schools: IMD, Wharton, and the London Business School. The book had 20 modules that covered a wide range of subjects relevant to management. Marketing research was not one of them; the term “marketing research” is not even mentioned in the index.

• In 1999, Financial Times published another book, Mastering Marketing Management, this time with the assistance of four major international business schools: Kellogg, INSEAD, Wharton, and the London Business School. The book had 10 modules and covered the entire field of marketing. Again, marketing research was not among them; no module discussed the topic directly. Rather, there were some indirect and sporadic references to the uses of marketing research in marketing. Apparently, the field of marketing can be mastered without having even a passing familiarity with marketing research.

• The following books, many of them business bestsellers, completely ignore marketing research: Peters and Waterman’s In Search of Excellence, Peters and Austin’s A Passion for Excellence, Porter’s Competitive Advantage, Rapp and Collins’s MaxiMarketing, and Burgelman and Sayles’s Inside Corporate Innovation (Gibson 2000).

There is more. The advent of new technologies—the Internet, data mining, and the like—brought with it a host of other specialists who started encroaching upon the territory traditionally held by marketing researchers and minimizing their importance even further.

All this reduced marketing researchers to the role of order takers.

Yet, beneath the surface, things have been changing for a while. In 1987, the AMA revised its definition of marketing research and stated that:

“Marketing research is the use of scientific methods to identify and define marketing opportunities and problems; generate, refine, and evaluate marketing actions; monitor marketing performance; and improve our understanding of marketing as a process” (Marketing News 1987, p. 1).



This extended definition acknowledged that information is used to identify and define marketing opportunities and problems; generate, refine, and evaluate marketing actions; monitor marketing performance; and improve understanding of marketing as a process. Marketing research is the function that links the consumer, customer, and public to the marketer through information.

Thirteen years before the dawn of the third millennium, AMA acknowledged that marketing research is much more than collecting and analyzing data at the behest of the “decision makers”; it is a discipline in its own right and is involved in improving our understanding of marketing as a process. With this new definition, marketing research is not a content-free discipline that merely concerns itself (using methods heavily borrowed from other disciplines) with eliciting information with no thought given to accumulating, codifying, or generalizing the information so elicited, but a discipline with content that is relevant to marketing as a process.

As one of the prescriptions for reviving the role of research, Mahajan and Wind (1999) completely reversed the earlier view that “only the manager has a clear perspective as to the character and specificity of the information” (Kinnear and Taylor 1979, p. 22) and stated that the “biggest potential in the use of marketing research is … in helping management ask the right strategic questions.” They went on to suggest that “marketing researchers need to give it a more central role by connecting it more closely to strategy processes and information technology initiatives” (Mahajan and Wind 1999, pp. 11–12). Herb Baum, the president and CEO of Hasbro, echoed this view: “[Market research] could improve productivity, if the department were to run with the permission, so to speak, to initiate projects rather than be order takers. . . I think they [market researchers] would be more productive if they were more a part of the total process, as opposed to being called in cafeteria style to do a project” (Fellman, 1999).

But how do we reclaim our relevance to marketing decision processes? Before we answer this question, let’s ask ourselves another question: What makes us marketing researchers? There must be an underlying premise to our discipline. There must be a point of view that defines our interests. There must be an underlying theme that motivates us and makes us define ourselves as practitioners of the profession.

That underlying theme cannot simply be the collection, analysis, and interpretation of data. This theme has not served us well in the past and has led us to a place where we are already discussing how we can effectively continue to exist as a profession and reestablish our relevance without facing imminent professional extinction.

The underlying theme that propels a marketing researcher, it seems to me, should be much more than the collection, analysis, and interpretation of data. I would like to propose that it is the search for marketing knowledge.



FROM DATA TO A CORE BODY OF KNOWLEDGE

The quest for knowledge is not unique to marketing researchers. It is common to all researchers. Many scientific paradigms are based on inductive reasoning, followed by deductive verification of the hypotheses generated by induction. It is no different in marketing research, where we also have the opportunity to follow the scientific process of accumulating data in order to derive lawlike relationships. Marketing researchers seek knowledge at various levels of abstraction. Consider the following marketing questions, which move from very specific to very generalized information:

• How many consumers say that they intend to buy brand X next month?

• How many consumers say that they intend to buy the product category?

• How many of the consumers who say that they intend to buy the brand are likely to do so?

• Can the intention–behavior relationship be generalized for the product category?

• Can it be generalized across all product categories?

• Can we derive any lawlike relationships that will enable us to make predictions about consumer behavior in different contexts?

Clearly, these questions require different degrees of generalization. We go from data to information to lawlike relationships to arrive at knowledge (Ehrenberg and Bound, 2000). Because our discussion is focused on deriving lawlike relationships that lead to knowledge, we review these concepts briefly here.

Information

The term information refers to an understanding of relationships in a limited context. For example, correlational analysis may show that loyal customers of firm x are also more profitable customers. This is information because, though it is more useful than raw data, it has limited applicability. We don’t know whether this finding is applicable to other firms or even to the same firm at a different point in time.

Lawlike Relationships

By increasing the conditions of applicability of information, we arrive at lawlike relationships. In the preceding example, if it can be shown that loyalty of customers is related to profitability of a firm across different product categories and across different geographic regions, we have what is known as a lawlike relationship. The other characteristics of lawlike relationships (Ehrenberg 1982) are that they are

1. General, but not universal. We can establish under what conditions a lawlike relationship holds. Exceptions do not minimize the value of lawlike relationships;

2. Approximate. Absolute precision is not a requirement of lawlike relationships;

3. Broadly descriptive and not necessarily causal. In our example, our lawlike relationship does not say that customer loyalty leads to profits; and



4. Of limited applicability and may not lend themselves to extrapolation and prediction. Lawlike relationships cannot be assumed to hold in all contexts; they have to be verified separately under each context.

Knowledge

Accumulation of lawlike relationships leads to knowledge. In the words of Ehrenberg and Bound (2000, pp. 24–25), “Knowledge implies having a wider context and some growing understanding. For example, under what different circumstances has the attitude–behavior relationship been found to hold; that is, how does it vary according to the circumstances? With knowledge of this kind, we can begin successfully to predict what will happen when circumstances change or when we experimentally and deliberately change the circumstances. This tends to be called ‘science,’ at least when it involves empirically grounded predictions that are routinely successful.”

Unlike in many other disciplines, the gathering of lawlike relationships has been less vigorously pursued in market research. Whenever we are asked by non-research professionals how some marketing variables work (for example, the relationship between advertising expenditure and sales, the relationship between attitude and behavior), we are painfully reminded how little accumulated knowledge we really have on the subject despite our professional credentials.

“Core Body of Knowledge”

In general usage, the phrase “core body of knowledge” also refers to the skill set a person is expected to have to qualify for the title market researcher. However, in this paper, the term refers to marketing knowledge derived through the use of marketing research techniques.

Practical Uses of Lawlike Relationships With Examples

As practitioners of an applied discipline, we can argue that it is the method itself (which may include several aspects such as sampling, statistics, data collection and analysis methods, and mathematical model building in a limited context), not the knowledge generated by the method, that should be of concern to us. Such an argument has some intrinsic validity. However, this view is shortsighted, because it is wasteful and ignores the feedback that knowledge can potentially provide to strengthen the method.

Consider the lawlike relationship that attribute ratings of different brands roughly follow the market share. In general, larger brands are rated more highly on practically all brand attributes, and smaller brands are rated low on practically all brand attributes (Ehrenberg, Goodhardt, and Barwise 1990; McPhee 1963). A lawlike relationship such as this will enable the researcher to understand and interpret the data much better. For example, while comparing two brands with very different market shares, the researcher may realize that it may be unproductive to carry out significance tests between the two brands because practically all attributes favor the larger brand. Instead, the researcher may look at mean-corrected scores to assess whether there are meaningful differences, after accounting for the differences between the two brands that may be due to their market share.
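
A small sketch of the kind of mean-correction described, with invented ratings: removing each brand's mean and each attribute's mean strips out the general size-of-brand effect so that genuine deviations stand out.

```python
import numpy as np

# Invented % agreeing that each brand has each attribute (rows: brands, columns: attributes).
ratings = np.array([
    [64, 58, 61, 40],   # large brand: high on almost everything
    [35, 30, 33, 45],   # small brand: low on almost everything, except attribute 4
], dtype=float)

# Double-center: remove brand means and attribute means, adding back the grand mean.
corrected = (ratings
             - ratings.mean(axis=1, keepdims=True)
             - ratings.mean(axis=0, keepdims=True)
             + ratings.mean())

print(np.round(corrected, 1))
# Large positive or negative cells flag attributes that depart from the
# "big brands score high on everything" pattern, e.g. attribute 4.
```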



When we use lawlike relationships, our analysis becomes more focused. In the preceding example, we would be more interested in those attributes that do not conform to the lawlike relationship (e.g., the large brand being rated low on a given attribute). Because lawlike relationships are a summary of prior knowledge, using them forces us to take into account what is already known.

As another example, if we can uncover a lawlike relationship between advertising and sales in different media, then we can use this to build further models and simplify and strengthen data analysis. Confirmed theories become prior knowledge when we analyze new data. As these examples show, knowledge can play an important role in marketing research, not just from a theoretical point of view, but from a practical point of view as well.

It is also true that the more we know about how marketing variables actually work, the more focused our data collection will be. From a practical point of view, firms do not have to pay for irrelevant data, information that does not in any way contribute to marketing decision making.

Braithwaite (1955, p. 1) explains the importance of formulating lawlike relationships this way: “The function of science … is to establish general laws covering the behaviors of the empirical events or objects with which the science in question is concerned, and thereby to enable us to connect together our knowledge of separately known events, and to make reliable predictions of events as yet unknown.” We need to connect what we know through “separately known events” to derive knowledge that we can then use to predict what is not yet known. Deriving knowledge and using it to predict (e.g., success of new products, the relationship between advertising and sales, the relationship between price and quality) seem to be worthy goals for marketing researchers.

Knowledge, therefore, is not a luxury but a powerful tool that can contribute to the collection of relevant data, avoidance of irrelevant data, and more focused analysis of the data so collected. Because knowledge leads to more relevant data and more focused analysis, we can lower the cost of research while increasing its usefulness to decision making. Knowledge contributes to more relevant and cost-efficient research.

WHY DON’T WE HAVE A CORE BODY OF KNOWLEDGE?

Given all the benefits of having a core body of knowledge, we must ask ourselves why we have failed to develop it, despite the fact that every other science has been steadfastly accumulating lawlike relationships. How is it that even the presence of some of the brightest minds that any profession can boast of—from Alfred Politz to George Gallup, from Hans Zeisel to Andrew Ehrenberg, from Louis Harris to Paul Green—failed to propel us into rethinking our role in decision making until now? There are many reasons, including

• The way marketing research has been taught,

• The way marketing research has been perceived over the years,

• The divergent preoccupations of academics (quantitative methods and model building) and practitioners (solving the problem at hand as quickly and cost-effectively as possible),

• Lack of immediate rewards, and

• An over-concern about confidentiality.

All of these reasons, except for the last two, are expounded in different places in this paper. Lack of immediate rewards and over-concern about confidentiality are discussed next.

Lack of Immediate Rewards

To be a market researcher, one does not necessarily have to generalize knowledge. In many other disciplines, this is not the case. Specific information is of use only to the extent that it contributes to a more generalized understanding of the observed phenomenon. In marketing research, information collected at a given level of specificity does not need to lead to any generalization for it to be useful. It can be useful at the level at which it is collected. For example, the question, “How many consumers say that they intend to buy brand X next month?” is a legitimate one, even if it does not lead to any generalization about consumers, brands, products, or timelines. It does not even have to be a building block to a higher level of understanding. For marketing researchers, information is not necessarily the gateway to knowledge. Information itself is often the knowledge sought by many marketing researchers.

Maybe because of the way we defined ourselves, no one seems to expect us to have a core body of knowledge. If no one expects this of us, if there is considerable work but not commensurate reward for it, why bother? Because there is no tangible reward, there is no immediacy about it either. When there is never immediacy, it is only natural to expect that things will not get accomplished.

Over-Concern About Confidentiality

The bulk of all market research data is paid for by commercial firms that rightfully believe that the data exclusively belong to them and do not want to share them with anyone, especially with their competition. This belief has such intrinsic validity that it clouds the fact that, in most cases, confidentiality is not particularly relevant.

Let us consider data that are a few years old and do not reflect current market conditions. In such cases, the data are of little use to competitors—the data are dated, and the market structure has changed. It is because of such concerns that many firms repeat similar studies year after year. Although old data are of little use to current marketing decision making, they could be of considerable use to researchers trying to identify the underlying marketing relationships. Yet, as experience shows, it is extremely difficult to get an organization to release past research data, even when concerns of confidentiality have no basis in fact. The concept of confidentiality is so axiomatic and so completely taken for granted in many businesses that it is not even open for discussion. It is as though the research data come with a permanent, indelible label “confidential” attached to them. A case can be made that non-confidential data held confidential do not promote the well-being of an organization but simply deprive it of the generalized knowledge that can be built on such data.


WHAT TAKES THE PLACE OF KNOWLEDGE?

Our not having a core body of knowledge has led to at least two undesirable developments: the development of models without facts (including “sonking”), and proprietary black box models.

Models Without Facts

Because we have defined ourselves as essentially collectors and interpreters of data, all we are left with are facts. So those who do realize the importance of models, especially in academia, have attempted to build models with sparse data sets and have substituted advanced statistical analysis for empirical observation. It is not uncommon to find research papers that attempt to develop elaborate marketing models based on just a single set of data supported by complex mathematics. Unfortunately, no amount of mathematics, no number of formulas, and no degree of theorizing can compensate for the underlying weakness: lack of data. When we use models without facts, we do not end up with a core body of knowledge but with, as Marder (2000, p. 47) points out, a “premature adoption of the trappings of science without essential substance.”

Sonking

A particular variation of models without facts is sonking, or the scientification of non-knowledge. It is the art of building scientific-looking models with no empirical support, so that although they look generalizable, they are not.

The use of sonking is sharply illustrated by Ehrenberg (2001):

“…one minute the analyst does not know how Advertising (A) causes Sales (S); the next minute his computer has S=5.39A+14.56 as the ‘best’ answer. The label on the least squares SPSS/SAS regression bottle says so …”

Applying multiple regression analysis to a single set of data and using the resultant coefficients to make multimillion-dollar product decisions is an example of sonking. The regression coefficient derived from a single study depends on many factors (e.g., the number and type of variables in the equation, the particular data points, the presence of collinear variables, and special factors affecting the dataset, to name a few). Even if we knew all the variables that could potentially affect the dependent variable, entering all of them would result in overfitting the model. The problems are further complicated by the fact that, in marketing research, multiple regression is in most cases applied to survey data (as opposed to experimental data), where there may be other confounding variables. The only way we can establish the validity of the relationship is through a number of replications. In sonking, the inductive method of science is replaced by scientific-looking equations supported by tests of significance, which assure the marketer that the results are valid within some specified margin. Yet tests of significance were never designed to be a shortcut to empirical knowledge.
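As a hedged illustration of why a coefficient estimated from one dataset deserves caution, the following sketch (synthetic data and hypothetical variable names, not from any real study) fits the same advertising-sales regression on bootstrap resamples of a single small dataset; each fit reports a precise-looking number, yet the numbers spread widely across resamples.

```python
import numpy as np

# Synthetic, purely illustrative data: one "study" of 50 observations.
rng = np.random.default_rng(0)
advertising = rng.uniform(10, 100, size=50)
sales = 15 + 5 * advertising + rng.normal(0, 60, size=50)  # a noisy linear relationship

def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept term)."""
    design = np.column_stack([np.ones_like(x), x])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients[1]

print("Slope from the single dataset:", round(ols_slope(advertising, sales), 2))

# Refit the same model on bootstrap resamples of the same dataset.
slopes = []
for _ in range(1000):
    idx = rng.integers(0, len(sales), size=len(sales))
    slopes.append(ols_slope(advertising[idx], sales[idx]))

print("Middle 95% of bootstrap slopes:", np.round(np.percentile(slopes, [2.5, 97.5]), 2))
```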

This is not just a theoretical concern. In a recent JMR paper, Henard & Szymanski (2001) reviewed a number of studies in an attempt to understand why some new products are more successful than others. Their analysis compared correlations between different attributes and product performance obtained in 41 different studies. Some of the correlation ranges are reported in Exhibit 1.


Exhibit 1: Predictors of new product performance (range of correlations obtained in different studies)

                                      Low     High
Product advantage                    -0.31   +0.81
Product innovativeness               -0.62   +0.81
Technological synergy                -0.73   +0.68
Likelihood of competitive response   -0.60   +0.05
Competitive response intensity       -0.72   +0.63
Dedicated resources                  -0.19   +1.00
Customer input                       -0.21   +0.81
Senior management support            -0.07   +0.46

What is surprising about this exhibit is that for every single predictor variable, the reported coefficients are both negative and positive. For instance, how does product advantage affect performance? According to the data, it sometimes aids product performance and at other times hinders it. And this is true not just of this variable but of practically all the variables studied (the exhibit shows only a few to illustrate the point). It is difficult to believe that there are no consistent relationships between any of these variables and product performance; there is not even clear directionality. We can conclude one of two things:

1. Each study was done so differently, with different definitions and variables, that it produced no generalizable knowledge. After reviewing 41 studies, we know as little about the relationship between the different independent variables and product performance as we ever did; or

2. Different relationships are applicable in different contexts, and we have no idea what those contexts might be.

The problem with a lack of generalized knowledge is that we have no way of checking the validity of new information or even of our own analysis. If, through a programming error, we obtain a negative coefficient instead of a positive one, we have nothing in our repertoire that will alert us to this potentially harmful error. Anything is possible, and this leads to further sonking. In the absence of empirical knowledge, who is to say something is nonsensical (as long as it is “statistically significant”)?
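A minimal sketch of the kind of check that prior knowledge makes possible: if accumulated evidence tells us the expected direction of each relationship, a few lines of code can flag estimates whose sign contradicts it. The variable names, expected signs, and coefficients below are hypothetical and purely illustrative.

```python
# Hypothetical expected directions drawn from prior, replicated studies.
expected_sign = {"product_advantage": +1, "customer_input": +1, "price_premium": -1}

# Hypothetical coefficients from a new, single-study regression.
new_estimates = {"product_advantage": -0.42, "customer_input": 0.31, "price_premium": -0.18}

for variable, estimate in new_estimates.items():
    if estimate * expected_sign[variable] < 0:
        print(f"Check {variable}: estimate {estimate:+.2f} contradicts the expected direction.")
```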

Many analysts seem to be willing to analyze data with purely statistical methods. Colombo et al. (2000) asked 21 experts to analyze a set of brand-switching tables. All analysts were provided the same set of questions. These 21 experts collectively used 18 different techniques to understand a set of straightforward contingency tables, and their conclusions did not converge either. This is not to be critical of the analysts, but to point out that without prior knowledge, without an idea as to what to look for, even experts cannot provide us with an answer whose validity we can be comfortable with.

Market researchers who are serious about developing a core body of knowledge do not, and should not, believe in premature theorizing and model building. Unfortunately, as Marder (2000) points out, academics who mainly develop theories and models do not have the vast resources needed to test them; practitioners who have access to data are not necessarily interested in theory building.


As it stands, we have the two solitudes: facts without models and models without facts. However, the basis of all sciences is empirical data. If our models and theories are not supported by extensive empirical data, then we don’t have a core body of knowledge.

BLACK BOX MODELS

Another outcome of the lack of a strong core body of knowledge is the proliferation of black box models. These are “proprietary” models developed by commercial vendors that are claimed to predict the success of different marketing activities. For example, there are black box models that purport to predict future sales volumes of new products or forecast the success of a commercial yet to be aired. Because the mechanics of the models are not revealed to the buyer, the buyer necessarily has to rely on the persuasion skills and promises of the vendor. Black box models do not allow buyers to evaluate for themselves the reasonableness and correctness of the assumptions and procedures involved.

Black box models implicitly claim to have uncovered some lawlike relationship or a precise way of identifying how (and which) independent variables relate to a dependent variable. Yet such models are of unknown validity and are suspect because “the necessarily simple concepts behind a good method can hardly be kept a secret for long” (Ehrenberg and Bound 2000, p. 40). Claims of precise predictability by proponents of black box models can at times stretch one’s credulity. If such precise knowledge were possible, it is only reasonable to assume that it would not have eluded the thousands of other researchers who work this rather narrow field of inquiry. We can of course never be sure, because there is no way to subject these models to scientific scrutiny. If the proponents of a model cannot prove its validity, neither can we disprove it. This leads us to our next point.

We are uncomfortable with black box models not necessarily because there may be less to them than meets the eye, but because they lack the hallmark of scientific models: refutability (Popper 1992b). As Kuhn (1962) argues, even science is not protected from error. One main reason the scientific method is accepted in all disciplines is that science is self-correcting. Its models are open, and anyone can refute them with logic or empirical evidence. Any knowledge worth having is worth scrutinizing. The opaqueness of black box models makes them impervious to objective scrutiny. Consequently, in spite of their scientific aura and claims to proprietary knowledge, black box models contribute little to our core body of knowledge. Unfortunately, the less we know about marketing processes, the more marketers will be dependent on black boxes of unknown validity.

A SLOW MARCH TOWARD A CORE BODY OF KNOWLEDGE

In a way, perhaps we have known all along that we need a core body of knowledge. We can assume that academics who attempted to build models, even though they did not have access to large volumes of data, did so in an attempt to develop a core body of knowledge. But not enough has been done, and we continue to lack a core body of knowledge.


In marketing research, lawlike relationships are orphans. Applied researchers tend to concentrate on gathering and interpreting information, whereas academics tend to concentrate on methodology and techniques. Academics are mainly concerned with the how well (techniques and methods), whereas applied researchers are mainly concerned with the how to (implementation and interpretation). Academics act as enablers, whereas applied researchers use the techniques to solve day-to-day problems. But, as we have been discussing, a mature discipline should be more than a collection of techniques and agglomeration of facts. It should lead to generalizable observations. It should lead to knowledge.

We can think of knowledge as our discipline. Not just information, not just techniques. Information and skills imparted through techniques are not only ends in themselves (though they often can be), but also means to an end. That end is knowledge, and therefore, knowledge is our discipline. Marketing research can be thought of as a collection of techniques, an approach to solving problems as well as a means of uncovering lawlike relationships, which is our final goal.

From an overall perspective, we need to merge facts with models, theory with practice. Models that cannot be empirically verified have no place in an applied discipline like marketing research. To arrive at knowledge, we convert data into information and information into lawlike relationships and merge empirical facts with verifiable theory.

I think the last point is worth emphasizing. Marketing research should be more than just an approach to solving problems—it should result in an uncovering of lawlike relationships in marketing. It should not be just a collection of tools—it should also be what these tools have produced over a period of time that is of enduring value. Market researchers should be able to say not only that they have the tools to solve a variety of problems, but also that they have developed a core body of knowledge using these tools. In short, marketing research is not just a “means” discipline, but an “ends” discipline as well.

KNOWLEDGE AS OUR DISCIPLINE

Developing lawlike relationships and creating a core body of knowledge are marketing research’s contribution to the understanding of marketing as a process. We cannot overemphasize that marketing research does not exist solely to provide information to decision makers, but also to develop a core body of marketing knowledge. Marketing research does not simply provide input to decision makers, but is also a part of the decision-making process. Marketing research is not peripheral to, but an integral part of, marketing decision making. To treat it otherwise impoverishes both marketing and marketing research.

To develop a core body of knowledge, we need to reexamine and perhaps discard many currently held beliefs, such as

• The sole purpose of marketing research is to collect and analyze data,
• Marketing researchers cannot directly participate in the decision-making processes,
• It is acceptable for decision makers to have an implicit faith in models that cannot be put through transparent validity checks (proprietary black boxes),
• Marketing researchers can be an effective part of the decision-making process without possessing a core body of knowledge,
• All data are forever confidential, and
• Lawlike relationships can be uncovered with the sheer strength of mathematical and statistical techniques without our first having to build a strong empirical base.

I believe that we need to work consciously toward creating a core body of knowledge. We need to deliberately share information and data. We need to discourage secret knowledge, sonking, and unwarranted confidentiality. We should not be content with just immediate solutions to marketing problems.

For those of us who have long believed that marketing research is more than a glorified clerical function, it has been obvious that a substantial part of marketing research should be concerned with developing knowledge that contributes to our understanding of marketing as a process.

For anyone who accepts this premise, it is self-evident that knowledge is our discipline. If it has not been so in the past, it should be so in the future, if we are to fulfill our promise.


BIBLIOGRAPHY

American Marketing Association (1961), Report of Definitions Committee of the American Marketing Association. Chicago: American Marketing Association.

——— (2000), “The Leadership Imperative,” Attitude and Behavioral Research Conference, Phoenix, AZ (January 23–26).

Braithwaite, R. (1955), Scientific Explanation. Cambridge: Cambridge University Press.

Colombo, Richard, Andrew Ehrenberg, and Darius Sabavala (2000), “Diversity in Analyzing Brand-Switching Tables: The Car Challenge,” Canadian Journal of Marketing Research, 19, 23–36.

Cox, Keith K. and Ben M. Evans (1972), The Marketing Research Process. Santa Monica, CA: Goodyear Publishing Company.

Ehrenberg, A.S.C. (1982), A Primer in Data Reduction. Chichester, England: John Wiley & Sons.

——— (2001), “Marketing: Romantic or Realist,” Marketing Research, (Summer), 40–42.

——— and John Bound (2000), “Turning Data into Knowledge,” in Marketing Research: State-of-the-Art Perspectives, Chuck Chakrapani, ed. Chicago: American Marketing Association, 23–46.

———, G.J. Goodhardt, and T.P. Barwise (1990), “Double Jeopardy Revisited,” Journal of Marketing, 54 (3), 82–89.

Fellman, Michelle Wirth (1999), “Marketing Research is ‘Critical’,” Marketing Research: A Magazine of Management & Applications, 11 (3), 4–5.

Financial Times (1997), The Complete MBA Companion. London: Pitman Publishing.

——— (1999), Mastering Marketing. London: Pitman Publishing.

Gibson, Larry (2000), “Quo Vadis, Marketing Research?” working paper, Eric Marder Associates, New York.

Henard, David H. and David M. Szymanski (2001), “Why Some New Products Are More Successful Than Others,” Journal of Marketing Research, 38 (August), 362–375.

Kinnear, Thomas C. and James Taylor (1979), Marketing Research: An Applied Approach. New York: McGraw-Hill.


Kuhn, Thomas (1962), The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

Mahajan, Vijay and Jerry Wind (1999), “Rx for Marketing Research,” Marketing Research: A Magazine of Management & Applications, 11 (3), 7–14.

Marder, Eric (2000), “At the Threshold of Science,” in Marketing Research: State-of-the-Art Perspectives, Chuck Chakrapani, ed. Chicago: American Marketing Association, 47–71.

Marketing News (1987), “New Marketing Definition Approved,” (January 2), 1, 14.

McPhee, William N. (1963), Formal Theories of Mass Behaviour. Glencoe, NY: The Free Press.

Popper, Karl (1992a [reprint]), The Logic of Scientific Discovery. London: Routledge.

——— (1992b [reprint]), Conjectures and Refutations: The Growth of Scientific Knowledge, 5th ed. London: Routledge.


PARADATA: A TOOL FOR QUALITY IN INTERNET INTERVIEWING

Ray Poynter
The Future Place

Deb Duncan
Millward Brown IntelliQuest

What is the biggest difference between Internet interviewing and face-to-face interviewing? What is the biggest difference between Internet interviewing and CATI? On the face of it the biggest differences might appear to be the presence of the new technologies, the presence of an international medium, the presence of a 24/7 resource. However, the biggest difference is the absence of the interviewer.

Historically, the interviewer has acted as the ears and eyes of the researchers. The interviewer has also operated as an advocate for the project, for example by persuading respondents to start and then to persevere and finish surveys. If something is going awry with the research there is a good chance that the interviewer will notice it and alert us. All of this is lost when the interviewer is replaced with the world’s biggest machine, namely the Web.

Although we have lost the interviewer, we have gained an enhanced ability to measure and monitor the process. This measuring and monitoring is termed Paradata. Paradata is one of the newest disciplines of market research, although like most new ideas it includes and reflects many activities that have been around for a long time. Paradata is data about the process; for example, Paradata includes interview length, number of keystrokes, and details about the browser and the user’s Internet settings. The introduction of the term Paradata is credited to Mick Couper from the Institute of Social Research, University of Michigan. His initial work (1994 and 1997) was conducted with CAPI systems, looking at the keystrokes of interviewers.

This measuring and monitoring, this Paradata, enables the researcher to put some intelligence back into the system, to regain control, and to improve quality.

PARADATA: A TOOL FOR EXPLORATION

The first person to popularize Paradata as a means of understanding more about the online interviewing process and experience was Andrew Jeavons (1999). Jeavons analyzed a large number of server log files to see what was happening. He noted the number of errors, backtracks, corrections, and abandoned interviews. He discovered that higher levels of mistakes and confusion were associated with higher rates of abandoned interviews (not a surprising outcome). In particular, he found that grid questions, and questions where respondents had to type in numbers that had to sum to a given total, caused more errors, corrections, and abandoned interviews. This finding led Jeavons to advise against these types of questions, or at least to avoid correcting the respondent when they failed to fill in the questions in the pattern desired by the researcher.


Jeavons took this analysis further in his 2001 paper, which associated the term Paradata with his explorations and identified a number of phenomena that occur in interviews. One such phenomenon is cruising, where the respondent adopts some strategy to complete the interview quickly, for example a respondent who always selects the first option in any list. In his paper Jeavons started to explore how problems could be identified.

Jeavons identified three uses of Paradata:

• Questionnaire Optimization — Paradata can be used to help us write the questions in the best way possible. Confusion, delays, missing data all indicate that the questionnaire could be improved.

• Quality Control — Paradata can help us identify cases where respondents are making mistakes (e.g. sums not adding to a specified figure), and cases where respondents may be short-cutting the interview (e.g. selecting the first item from each list), a process Jeavons terms cruising (a minimal check along these lines is sketched after this list).

• Adaptive Scripting — Jeavons raises the prospect of using Paradata to facilitate adaptive scripting. For example, we might consider asking respondents with faster connections and faster response times more questions. Borrowing from adaptive approaches such as ACA, it may be possible to route respondents to the sort of interview that best suits their aptitudes.
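As a hedged illustration of the quality-control use, the sketch below flags possible cruising and implausibly fast completions from simple per-interview Paradata records; the field names and thresholds are assumptions chosen only for illustration.

```python
# Each record is per-interview Paradata: seconds taken and the option index chosen per question.
interviews = [
    {"id": "r001", "seconds": 612, "choices": [2, 1, 4, 3, 2, 5]},
    {"id": "r002", "seconds": 95,  "choices": [1, 1, 1, 1, 1, 1]},  # fast, and always the first option
]

MIN_SECONDS = 180              # assumed plausibility threshold, not from the paper
MAX_FIRST_OPTION_SHARE = 0.8   # assumed cruising threshold

for interview in interviews:
    choices = interview["choices"]
    first_option_share = choices.count(1) / len(choices)
    flags = []
    if interview["seconds"] < MIN_SECONDS:
        flags.append("completed suspiciously quickly")
    if first_option_share >= MAX_FIRST_OPTION_SHARE:
        flags.append("possible cruising (first option chosen in most questions)")
    if flags:
        print(interview["id"], "->", "; ".join(flags))
```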

Paradata with CAPI and CATI

With 20/20 hindsight we can re-classify many aspects of CAPI and CATI as Paradata. Key CAPI items include: the date of data collection, the length of the interview, and with many systems the length of time spent on each question is also recorded. Advanced systems such as Sawtooth Software’s ACA capture statistics such as the correlation between the scores on calibration concepts and the conjoint utilities – this coefficient provides an estimate of how well the interview process has captured the person’s values.

CATI systems collect data about the respondent’s interaction with the interview (e.g., date, length, etc.) and also about the interviewer’s interaction with the software. Paradata elements that relate to the interviewer include statistics about inter-call gaps, rejection rates, probing, and a wide variety of QA characteristics.

The main difference between CAPI/CATI Paradata and the growing practice of Internet Paradata is accessibility. In the CAPI/CATI context most Paradata is never seen beyond the programming or DP department. By contrast, in the Internet context researchers have shown a growing interest in the information that can be unleashed by the use of Paradata.

Optimization in Practice

At the moment the main practical role for Paradata is in optimizing questionnaires. This optimization process can be conducted in an ongoing fashion, using Paradata to define and refine ongoing guidelines. It can also be used on a project-by-project basis, to optimize individual studies.


For example, after 30 interviews the Paradata can be examined to consider:

• how long the interview took

• how long people thought the interview took

• how the open-ends are working

• how many questions have missing data

• how many interviews were abandoned

• user comments

When this approach is adopted it is important that this analysis is carried out on all the people who reached the questionnaire, not just those who completed it. Try also to identify whether the system crashed for any respondents.

From this Paradata you can identify whether any emergency corrections need to be made (a simple check along these lines is sketched below). Often the fix may be as simple as improving the instructions; on other occasions it may mean more significant alterations.
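The following sketch shows how such a mid-field check on the first interviews might be computed from per-respondent Paradata; the record layout is hypothetical and the figures are illustrative only.

```python
import statistics

# Hypothetical per-respondent Paradata for everyone who reached the survey.
records = [
    {"completed": True,  "minutes": 14.0, "missing_answers": 0},
    {"completed": True,  "minutes": 22.5, "missing_answers": 3},
    {"completed": False, "minutes": 4.0,  "missing_answers": 11},  # abandoned
]

total = len(records)
completes = [r for r in records if r["completed"]]

print(f"Reached the survey:   {total}")
print(f"Completed:            {len(completes)} ({len(completes) / total:.0%})")
print(f"Abandoned:            {total - len(completes)}")
print(f"Median length (mins): {statistics.median(r['minutes'] for r in completes):.1f}")
print(f"Completes with any missing answers: "
      f"{sum(1 for r in completes if r['missing_answers'] > 0)}")
```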

Tail-End Checks for Optimization

As part of the Paradata process it is useful to add questions at the end of the survey. It should be noted that these check questions can be asked of a sub-set of the respondents; there is no need to ask everybody. For example, a survey could have three tail-end questions, with each respondent being asked just one of them. A suitable configuration would be:

• an open-end question asking about the overall impression of the interview;

• a five-point satisfaction scale for the survey itself;

• a question asking how long the respondent felt the survey took to complete. A good interview is one where people underestimate how long the survey took.

AN OLYMPIC CASE STUDY

In 2000 Millward Brown UK conducted a number of online and offline research projects thematically connected with the Olympics. These tests were funded by Millward Brown and allowed a range of quantitative and qualitative techniques to be compared in the context of a single event. This section looks at one of the quantitative projects and the learning that was acquired. The survey was hosted on the web and respondents were invited to take part by e-mail, shortly after the Olympics finished.

368 people reached the first question:

“Thinking about the Olympic games, which of the following statements best describes your level of interest in the Olympic games?” (a closed question with 5 breaks)


• 18 people dropped out at this stage, which was 27% of all dropouts and 4% of those who were offered the question.
• This is a very typical pattern; in most studies a number of people will visit the survey to see what it is like and then drop out at the first question.

350 people reached the second question:

“When you think about the Olympic games what thoughts, feelings, memories and images come to mind?” (an open-ended question)

• The key feature of this question is that it is very ‘touchy feely’, and in this test it was compulsory (i.e. the respondent had to type something).
• 23 people dropped out: 38% of all those who dropped out, and 7% of those who reached the question.
• This level of dropout on the second question is much higher than we would expect, and would appear to be due to asking a softer open-end, near the beginning, in a compulsory way.

327 people reached the third question:

“How would you rate this year's Olympic games in terms of your personal enjoyment of the games?” (a closed question with 5 breaks)

• Just 3 people dropped out, 5% of all dropouts, and a dropout rate of 1%.
• This is what we would expect to see.

324 people reached the fourth question:

“What stands out in your mind from this year’s Olympic games? Please tell us why it stands out for you?” (an open-ended question, again non-trivial, again compulsory)

• 10 people dropped out, which was 17% of all dropouts, and a dropout rate of 3%.
• This result appears to confirm that it is the softer, compulsory open-ends that are causing the higher dropout rates.

314 people reached the fifth question, which was another closed question.

• Just 3 people dropped out.
• Over the remaining nine questions only 3 more respondents dropped out (the per-question dropout arithmetic used above is sketched below).
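The per-question dropout arithmetic used above can be reproduced from the reach counts alone. The sketch below uses the counts reported in this case study; the percentages it prints are computed directly and so may differ slightly from the rounded figures quoted in the text.

```python
# Number of respondents who reached each of the first five questions in the Olympic case study.
reached = [368, 350, 327, 324, 314]

for i in range(len(reached) - 1):
    dropped = reached[i] - reached[i + 1]
    dropout_rate = dropped / reached[i]
    print(f"Q{i + 1}: {dropped:2d} dropped out ({dropout_rate:.1%} of those who reached it)")
```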

Previous research by us and others had identified that simple, directed open-ends work very well in Internet interviewing. These Olympic results suggest that we should try to avoid the more open style of question, in favor of directed questions. If the softer, more open type of question is needed then it should appear later in the interview, and it should not be compulsory – if that is possible within the remit of the project.

This study also looked at response rates and incentives. The original invitations were divided into three groups: a prize draw of £100, a donation of £1 per completed interview to charity, and a group with no incentive.

Response rates:
Prize draw          6.7%
Charity donation    5.5%
No incentive        6.3%

The data suggested that small incentives did not work well. Other research has suggested that larger incentives do produce a statistically significant effect, but not necessarily a commercially significant one.
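For readers who want to check whether response-rate differences of this size could plausibly be chance, a minimal sketch follows. The paper does not report the number of invitations per group, so the group size used here is an assumption purely for illustration.

```python
from scipy.stats import chi2_contingency

INVITES_PER_GROUP = 1000  # assumed; the paper does not report the split

rates = {"Prize draw": 0.067, "Charity donation": 0.055, "No incentive": 0.063}

# Build a responders / non-responders table for the three incentive groups.
table = [
    [round(rate * INVITES_PER_GROUP), INVITES_PER_GROUP - round(rate * INVITES_PER_GROUP)]
    for rate in rates.values()
]

chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```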

CIMS – A CASE STUDY

Millward Brown IntelliQuest’s Computer Industry Media Study (CIMS) is a major annual survey that measures US readership of computer and non-computer publications as well as the viewing of broadcast media. Prior to 1999, CIMS employed a two-phase interviewing procedure: telephone screening qualified business and household computer purchase decision-makers, to whom Millward Brown IntelliQuest then mailed a survey packet. While part of CIMS was administered on a floppy disk, the media section had remained on paper to accommodate a large publication list with graphical logos.

Millward Brown IntelliQuest conducted a large-scale study in the summer of 1999 to explore the feasibility of data collection via the Internet and to compare the response rates and survey results obtained using three alternative Web-based questionnaires and the customary paper questionnaire.

We obtained a sample of technology influencers from a database of those who had registered a computer or computer-related product in the past year. IQ2.net, a database marketing firm that until recently was owned by Millward Brown IntelliQuest, provided a list of 27,848 records selected at random from their computer software and hardware registration database.

The people in this database were then screened by phone to find those who were available, who had email and Internet access, and who were prepared to take the survey. In total 2,760 people were recruited in the time available.

Those who agreed to cooperate were then assigned at random to receive one of four questionnaire versions, yielding approximately 690 completed phone interviews per version. One cell used the pre-existing paper questionnaire; the other three cells were assigned to three different Web approaches.

Regardless of whether the questionnaire was Web-based or paper-based, the respondent had to supply an email address during phone screening. When attempting to contact these respondents electronically, 18% of their email addresses turned out to be invalid. In order to ensure comparable samples, all recipients of the paper questionnaire were also sent an email to verify their email address. Those who failed such validation were removed from the respondent base so that all four groups were known to have valid email addresses.

Questionnaire 1: The Paper Format

The paper-based questionnaire represented the current version of the CIMS media measurement component. The questionnaire began with an evaluation of 94 publications, followed by television viewership, product ownership and usage, and concluding with demographic profiling. The format used to evaluate readership is shown below in Figure 1.

Figure 1: Paper Format

Given the onerous task of filling out the readership section (four questions by 94 publications), we experimented with three alternative designs to determine the impact on response rate and readership estimates.

Questionnaire 2: The Horizontal Web Format

The horizontal Web version represented the closest possible visual representation of the paper-based media questionnaire. Respondents viewed all four readership questions across the screen for each publication. Respondents had to provide a "yes" or "no" to the six-month screening question, and were expected to provide reading frequency and two qualitative evaluations (referred to collectively as "follow-up questions") for each publication screened in. An example of the Web-based horizontal version appears in Figure 2.

Due to design considerations inherent to the Web modality, the horizontal Web version differed from the paper questionnaire in a few ways. First, only 7 publications appeared on screen at a time, compared with the 21 shown on each page of the paper version.

Second, unlike the paper version, the horizontal Web version did not allow inconsistent or incomplete responses. This meant that the follow-up questions could not be left blank, even where a respondent claimed not to have read the publication in the past six months!

Figure 2: Horizontal Web Format


Questionnaire 3: The Modified Horizontal Web Format

The modified horizontal Web version was identical to the horizontal version except that it assumed an answer of "no" for the six-month screen. This allowed respondents to move past publications they had not read in the last six months, similar to the way respondents typically fill out paper questionnaires. As in the Web-based horizontal version, only seven publications appeared on each screen.

Questionnaire 4: The Vertical Web Format

The vertical Web version differed most from the original paper-based horizontal format. Respondents were first shown a six-month screen, using only black-and-white logo reproductions. After all 94 publications were screened, respondents received follow-up questions (frequency of reading and the two qualitative questions) for those titles screened in. It was assumed that hiding the presence of follow-up questions on the Web would lead to higher average screen-ins, similar to the findings of Appel and Pinnell (1995) using a disk-based format and Bain et al. (1997) using computer-assisted self-interviewing. An example of the Vertical Format is given in Figure 3.


Figure 3: The Vertical Web Format

Response Rates

The initial response rates ranged from 54% for the paper format to 37% for the Horizontal Web format. Among the Web formats, the values ranged from 48% for the Vertical format to 37% for the Horizontal format. These data are shown in Figure 4.

Figure 4: The Response Rates For The Four Cells

However, when we inspected how many people commenced each type of interview, a different pattern emerged. For example, 11% of those invited to complete the Web Horizontal version started the interview but did not complete it (a dropout rate of about 23%, i.e. 11 of the 48 percentage points who started). When we add the incomplete interviews to the completed interviews, as in Figure 4, we see that the initial response rate for the three Web formats was very similar.

[Chart not reproduced: completed and incomplete interview percentages for the Paper, Web Vertical, Web Modified Horizontal, and Web Horizontal cells.]


Publications Read

When we reviewed the number of publications that respondents claimed to have read (Figure 5), we saw that the Vertical Web format elicited significantly higher numbers than any of the other formats. The combination of seeing the titles in groups of 7, and of not seeing the ‘penalty’ in terms of additional questions, produced much higher recognition figures. This was something the smaller-circulation publications liked, but it was clearly an artifact of the design, discovered by the Paradata analysis process.

Figure 5: Publications Claimed

Mean number of publications claimed:
Paper                      8.6 **
Web Vertical              12.2 *
Web Modified Horizontal    7.9 **
Web Horizontal            10.0 *

* Significantly different from all others at 95% confidence.
** Significantly different from Web Vertical and Web Horizontal at 95% confidence.

B2B Probing

Millward Brown IntelliQuest conducts a great many B2B online interviews, and there is always great interest in how far the boundaries can be pushed in terms of utilizing the latest facilities, for example Flash, Java, and Rich Media.

The following data are a small example of what can be obtained by querying the browsers of respondents who were completing a panel interview in early 2001; the countries were the UK, France, and Germany.

Operating Systems              Browsers
Apple          2%              IE 4        9%
Windows 95    14%              IE 5       54%
Windows 98    43%              IE 5.5     27%
Windows NT    41%              Other/DK   10%
UNIX         0.2%


These data contrasted strongly with our consumer data, where Windows NT was hardly present. The data allowed us to optimize settings and to demonstrate that there were not enough UNIX users to warrant a separate segment.

Invisible Processing

Invisible processing is data collected about the respondent without asking direct questions. For example, we can use cookies to identify how often somebody visits a site. We can query their browser to find out their screen settings. We can time them to see how long the interview took to complete.
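As a rough, modern illustration of how a few such invisible items might be captured, the sketch below uses a Flask-style handler on the survey server; the endpoint, field names, and cookie name are assumptions for illustration, not a description of the systems discussed here.

```python
import time
from flask import Flask, request

app = Flask(__name__)
page_log = []  # in a real system this would be written to a database

@app.route("/survey/page/<int:page_number>", methods=["POST"])
def record_page(page_number):
    # Invisible Paradata captured without asking the respondent anything.
    page_log.append({
        "page": page_number,
        "received_at": time.time(),                       # used later to derive per-page timings
        "user_agent": request.headers.get("User-Agent"),  # browser and operating system hints
        "visit_cookie": request.cookies.get("visit_id"),  # e.g. to spot repeat visits
    })
    return "", 204

# Screen resolution and similar client-side settings would need a small
# JavaScript snippet in the page to send them along with the answers.
```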

Not all invisible processing is Paradata (for example the web site data collected by tracking companies such as Engage is primary data collection, not Paradata). Not all Paradata is invisible, e.g. asking people what they thought about the interview. Nevertheless, there is a large area of overlap between invisible processing and Paradata, and most of the guidelines and ethical considerations about Paradata stem from observations about invisible processing.

Figure 6 shows an example of the sort of information that is available from the respondent’s browser, without asking the respondent an explicit question.

Figure 6: [image not reproduced; an example of the information available from the respondent’s browser]

Paradata and Ethics

The overarching principle of 21st Century research is informed consent. If you are collecting invisible data you should be informing respondents. The normal practice is to include a short note in the survey header with a link to a fuller Privacy Policy statement.

It is not possible to list all of the invisible processing that might happen, since the researcher will not normally be aware of all of the options. Therefore, the Privacy Policy should highlight the broad types of data that are collected and how they will be used. If cookies are being used, they should be declared, along with their type and longevity.


If you will not be linking the Paradata to individual data, you should say so. If, however, you plan to use Paradata to identify respondents who are considered to be cheating or to be lacking in proper care, and to exclude them from the incentive, you should say so. This only needs to be done at a general, indicative level: for example, the policy might state that interview metrics will be used to evaluate whether the interview appears to have been completed correctly, and that failure to fall within these metrics may result in the respondent’s data being removed from the survey and the respondent being excluded from the incentive. Researchers will often exclude an interview that was completed too quickly, for example, but it would not be advisable to warn the respondent of the specifics of this quality check.

CONCLUSIONS

The key observation is that all projects are experiments; all projects provide data that allow us to improve our understanding and our ability to improve future research.

Millward Brown have found that Paradata has allowed us to better understand the way our research instruments are performing and is an additional tool in the quality assurance program.

Amongst other uses, we find Paradata useful in:

• Optimizing questionnaires

• Avoiding rogue interviews

• Minimizing unwanted questionnaire effects

• Maximizing opportunities.


REFERENCES

Appel, V. and Pinnell, J. (1995). How Computerized Interviewing Eliminates the Screen-In Bias of Follow-Up Questions. Proceedings of the Worldwide Readership Research Symposium 7, p. 117.

Bain, J., Arpin, D., and Appel, V. (1995). Using CASI-Audio (Computer-Assisted Self Interview with Audio) in Readership Measurements. Proceedings of the Worldwide Readership Research Symposium 7, p. 21.

Couper, M.P., Sadosky, S.A., and Hansen, S.E. (1994), "Measuring Interviewer Behaviour Using CAPI," Proceedings of the Survey Research Methods Section, American Statistical Association, 845–850.

Couper, M.P., Hansen, S.E., and Sadosky, S.A. (1997), "Evaluating Interviewer Use of CAPI Technology," in Survey Measurement and Process Quality, Lyberg, L. et al., eds. Wiley, 267–285.

Jeavons, Andrew (1999). "Ethology and the Web. Observing Respondent Behaviour in Web Surveys". Proceedings of the ESOMAR Worldwide Internet Conference Net Effects 2 London 1999.

Jeavons, Andrew (2001). "Paradata: Concepts and Applications." Proceedings of the ESOMAR Worldwide Internet Conference Net Effects 4, Barcelona, 2001.


WEB INTERVIEWING: WHERE ARE WE IN 2001?

Craig V. King & Patrick Delana
POPULUS

POPULUS has been conducting web-based data collection since early in 1999. To date we have collected approximately 230,000 completed interviews, with the vast majority consisting of employee attitudinal and behavioral (adaptive conjoint analysis; ACA) studies. Response rates across all companies involved in the employee-based research (n ~130) average 51%, with a range of 19%–93%. Sample sizes for employee-based surveys vary from 140 to ~83,000. These surveys have also been translated into six foreign languages: Spanish, German, French, Portuguese, Dutch, and Japanese.

RESPONSE RATES

One reason response rates tend to be high in these employee studies is the company-wide support of the research initiative. Organizations that participate in the research notify the affected employees that the research will be taking place, identify the vendor, and communicate support for the research. The organization provides the vendor with a spreadsheet containing names, email addresses, and other relevant information. We then send out an email invitation to participate in the research. Key elements of the invitation include the URL – generally as a hyperlink – and a password. The invitations are sent using a plain text format, not HTML, and they do not include any attachments, because some email systems do not have the capability to manage attachments. In most instances, a reminder notification is sent 7–21 days following the initial notification. Reminders are effective in maximizing response rates and are highly recommended (King & Delana, 2001). The reminder notifications include the URL, the individual’s password, and two notifications of the deadline.
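A minimal sketch of composing such a plain-text invitation with Python's standard library follows; the addresses, URL, password, and wording are placeholders rather than the ones used in these studies.

```python
from email.message import EmailMessage
import smtplib

def build_invitation(recipient, url, password):
    """Plain-text invitation: URL and password in the body, no attachments."""
    msg = EmailMessage()
    msg["From"] = "surveys@example.com"          # placeholder sender
    msg["To"] = recipient
    msg["Subject"] = "Invitation to take part in our employee survey"
    msg.set_content(
        f"You are invited to take part in a short survey.\n\n"
        f"Survey link: {url}\n"
        f"Your password: {password}\n\n"
        f"The survey closes in two weeks."
    )
    return msg

message = build_invitation("respondent@example.com", "https://example.com/survey", "k7m3t9")
print(message)
# with smtplib.SMTP("mail.example.com") as server:   # placeholder mail server
#     server.send_message(message)
```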

Non-employee research on the Web has primarily consisted of targeted business consumer-based research projects. We have conducted several studies in different industries to evaluate product attributes, preference, need, and purchase intent using conjoint and non-conjoint approaches. In one study, assessing company desires for enhanced product features, a telephone list was generated targeting businesses of specific sizes and industry types. Telephone calls screened potential respondents and then assessed interest in completing the research. Email addresses were captured with the understanding that respondents would receive an invitation to participate in the research via email. Response rates for this study varied depending on the incentive offered. Initially, respondents who agreed over the telephone to participate in the research were offered a charity donation in their name. For this incentive, there was a 20% response rate among people who had agreed to participate in the survey and had provided their email address to the telephone interviewer. Response rates increased when different incentives were offered. Specifically, when electronic cash redeemable at several on-line businesses was offered, response rates increased to approximately 35%.


In an adaptive conjoint study targeting IT professionals, we sent invitations through purchased email lists of IT professionals. Response rates varied from 0.5% to 3% depending on the source of the list. Company-supplied email lists generally result in much better response rates to Web surveys. On average, we have seen about a 35% response rate when we have email addresses of current customers supplied by the client.

SAMPLING

Internet-based research has several advantages over traditional research methods (Miller, 2001). One of the main advantages is the speed at which data can be collected: for a consumer-based survey a client could easily obtain all necessary respondents over a weekend. Emails to a panel or a list can be sent on Friday, with all respondents completing the survey by Monday. Another advantage is the complete lack of interviewer bias. A computer lacks any personal attributes that might affect response patterns in any systematic way.

Internet data collection has been touted as being cheaper than any other methodology. In many instances, especially when a good list of email addresses is readily available, the cost per interview for Web data collection is very low. However, in studies where email addresses are not easily obtained, a traditional telephone survey may actually be cheaper. If the survey is brief and the topic is of interest, it is probably less expensive to collect data over the telephone than it would be to collect it over the Internet.

The biggest disadvantage of Web data collection is that the Web sample may not be representative of the population of interest. Approximately 50% of households have Internet access, while approximately 70% of the US population has access to the Internet. If people who do not have Internet access are potential or current customers, there is a possibility they will behave differently from people with Internet access. While weighting procedures may be developed to adjust for differences, it is difficult to determine the exact algorithm needed to effectively weight for non-response due to lack of Web access.
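One common starting point, sketched below, is basic cell weighting: respondents are weighted so that the demographic mix of the achieved sample matches known population shares. This is not the authors' procedure, and the cells and shares shown are hypothetical.

```python
# Hypothetical population shares and achieved-sample shares by age group.
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
sample_share     = {"18-34": 0.45, "35-54": 0.40, "55+": 0.15}  # Web sample skews younger

# Cell weight = population share / sample share; respondents in under-represented
# cells count for more. This corrects composition only, not behavioral differences
# between those with and without Web access.
weights = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

for cell, weight in weights.items():
    print(f"{cell}: weight {weight:.2f}")
```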

Another challenge in collecting data via the Web has to do with obtaining email addresses. Email addresses are difficult to obtain, and there is no systematic way of generating email addresses as there is for telephone numbers. If a list is not readily available from the client, access to opt-in lists is an option; however, response rates may be very low. A panel of respondents solves some of the problems, but panels are expensive to develop and maintain.

INCENTIVES

While there was some discussion of incentives earlier in the paper, it is worth expanding on that discussion in more detail. We have seen improved response rates from the same sample list when higher amounts of cash were offered. Specifically, we slowly increased the incentive from a drawing to a fixed dollar amount for each respondent. When dollar amounts exceeded $10, response rates increased dramatically. The disadvantage of offering cash is that it can quickly become very expensive to pay all respondents for their time. Additionally, it is expensive and time-consuming to process many individual respondent payments. Drawings for cash and prizes may or may not be an effective incentive. The challenge is to determine the optimal amount of the reward that will be effective without spending too much money. Drawings may not be as effective for business-based research as they are for consumer research.

TECHNICAL CONSIDERATIONS

There are several factors to consider when developing a survey for the Web. The variability in computer configurations is large and issues such as screen size and resolution should always be considered when developing a survey. Respondents with low resolution or small monitors may have difficulty viewing the entire question or set of responses without scrolling vertically or horizontally. Anecdotal evidence suggests that respondents will quickly become frustrated with the survey if they are required to constantly scroll in order to view the question.

Another limitation is related to Web browsers. Certain browsers do not support all possible Web interviewing features. We recommend that researchers carefully review the survey on several different computers with various browsers and screen resolutions to ensure that the desired visual effect is preserved across the various configurations.

Many home computer users are connecting to the Web with a 56k modem. These connections are slower than DSL or T1 connections and can become very slow when the survey contains more than simple text. Weighing the balance between design and respondent capabilities is important.

Many people simply do not have Web access and thus are not viable candidates for a Web interview. Others, who might have Web access, may not have the technical skills needed to complete a survey on-line. While people are becoming increasingly more proficient with computers, there are many who still do not understand some basic features, such as scroll bars and “next page” buttons. We recommend that respondents be provided easy access to a “FAQ” page to assist with simple technical problems. Our surveys contain a FAQ page link in the footer of each page.

Another technical consideration has to do with user access to the survey. If a researcher is inviting respondents to take an on-line survey, we recommend using passwords that control access to the survey. Password access will ensure that only targeted respondents will complete the survey. Additionally, this approach prevents respondents from accessing the survey more than once, thereby increasing validity of the responses. However, when assigning passwords it is critical to keep a few general rules in mind. Mixing alpha and numeric characters will increase the difficulty of unauthorized access; however, it does pose additional problems. Specifically, characters such as the numeral “1”, an upper case letter “I” (i), and a lower case letter “l” (L) appear very similar and are easily confused. Other potential problems include the use of the numeral “0” (zero) and the upper case letter “O”, as well as the lower case letter “w” and two lower case letters “vv” (v + v) in passwords.
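A minimal sketch of generating passwords that avoid these look-alike characters follows; the alphabet and length are illustrative choices, not a prescription.

```python
import secrets
import string

# Exclude the look-alike characters discussed above: 0/O, 1/l/I, and w/v ("vv" vs. "w").
AMBIGUOUS = set("0O1lIwWvV")
ALPHABET = [c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS]

def make_password(length=8):
    """Return a random survey password drawn from the unambiguous alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

passwords = [make_password() for _ in range(5)]
print(passwords)
```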


We recommend using automated on-line technical support programs as well. All our surveys have an email address to request help; each person who sends an email will receive an automated response with information contained in the FAQ page as well as login instructions.

MEASUREMENT

It has been demonstrated that methodology does affect response patterns (Kraut, 1999), and several studies have compared the data integrity of Web surveys to other methodologies (Hezlett, 2000; Magnan, Lundby, & Fenlason, 2000; Miller, 2001; Spera, 2000; Yost & Homer, 1998). Miller reported a greater tendency for respondents to use end points when using telephone surveys than Web surveys. Specifically, there appears to be a general trend toward higher acquiescence when using a telephone compared to either mail or Web surveys (Kraut, 1999; Miller 2001). Miller believes that acquiescence associated with telephone interviews may be a function of interviewer bias; whereas, respondents taking Web surveys may have a greater feeling of anonymity and feel more comfortable providing responses that are more reflective of their true feelings and less influenced by a desire to please the interviewer.

While there is no apparent difference in overall response patterns when comparing pencil-and-paper (P&P) surveys to Web surveys, there are a few subtle differences that seem to be emerging. When comparing overall mean responses to a question across Web and P&P, the results are mixed. Some studies find tendencies toward slightly higher means using the Web, while others find the opposite. The same holds true for missing data. Fenlason (2000) and Yost and Homer (1998) reported higher amounts of missing data (incomplete surveys) for Web surveys, while our experience indicates very low levels of missing data. Spera (2000) reported that response rates were lower for the Web than for P&P. However, there was a consistent finding that open-ended responses were longer and richer (i.e., more detailed and specific) when they were provided on the Web than when they were provided on P&P surveys (Fenlason, 2000; Yost & Homer, 1998).

COMPARISON OF WEB SURVEY TO TRADITIONAL MAIL SURVEYS

In an effort to determine if medium affects response patterns, a study was conducted comparing responses from traditional mail surveys to responses from a Web survey. Respondents were current customers who had provided both valid mail and email addresses. The sample was randomly divided into two groups: an email recruit to a Web site and a traditional mail survey. The surveys were identical except that one group had a P&P mail survey and the other group received an email recruiting them to the survey site. The recruiting letter was identical in both situations except when referring to the medium.

There were no significant differences in demographics between the two recruiting methods for those who completed the survey. Additionally, the overall response patterns were similar for both samples. However, as can be seen in Figure 1, there was a general tendency for the Web sample to have a slightly lower mean for each question, although only 3 of the 47 were significantly different.


Figure 1: A Comparison of Web and Mail Response Patterns

[Chart: “Comparison of E-mail and Conventional Mail Responses.” Mean satisfaction (5 = Very Satisfied) is plotted for each of the 47 questions, with separate series for the mail-back and e-mail samples; significantly different items are flagged.]

SAWTOOTH INTERNET SOFTWARE (CIW)

Sawtooth Software developed their adaptive conjoint analysis (ACA) module for the Web as a response to a request from POPULUS in 1999. The ACA module was based on earlier Internet software developed by Sawtooth that could accommodate a simple questionnaire on the Web. We were pre-beta and beta testers of the ACA Web interviewing software prior to the final release.

The Sawtooth Web interviewing software continues to evolve and many improvements have been added since its initial release. POPULUS continues to utilize this software for all of its Web data collection needs.

Sawtooth Software Strengths

In our opinion, the biggest strength of CiW is the enabling of Web ACA. Web ACA has many advantages over DOS ACA (Windows ACA is being released soon). Specifically, Web ACA:

• Allows for the inclusion of more than five levels for a given attribute

• Accommodates longer text labels, thus allowing more detail

• Accommodates typographical devices such as italics, bolding, and underlining at the text level

• Accommodates many languages, including Japanese

• Allows respondents to view their attribute importance scores on a single page at the completion of the survey.

Other strengths of CiW include compatibility with most older Web browsers. This is important because many respondents do not have computers that support current Web browsers, which would otherwise eliminate them from a study. Additional strengths include:

• The ability to randomize answer choices within a question, or to randomize a question within a block of questions on a page

• Automatic progress bar indicating real time progress to the respondent

• Password protection allowing only those people who are invited to complete the survey

• Stop-and-restart capability

• The ability to incorporate simple skip patterns

• Next page button above footer, which does not force respondents to scroll to the very bottom of the page before advancing

• The addition of a free form question that allows programmers greater flexibility in customizing the instrument

Sawtooth Software Weaknesses

As with virtually any product, there are opportunities for improving CiW. In our opinion, the biggest weakness in the current release is the inability to build lists. Some list building is possible, but it has to be accomplished by programming many “if statements,” which become extremely cumbersome and require many pages and questions. Another area that could be improved is skip patterns: complex skip patterns are difficult to program and require the creation of many pages to accomplish the goal.

A general weakness of Web interviewing software is the click-and-scroll navigation required to move through the questionnaire. Perhaps this is more a function of our age and our preference for using keystrokes rather than a mouse to move through a survey and record responses.

In the current version, there is no feature to assist in managing quotas or to force quota sampling in real time. The only way to manage quotas is to constantly monitor the data collection and then to reprogram the questionnaire to terminate interviews that meet the quotas.


A couple of weaknesses that are probably more specific to our needs have to do with obtaining real-time results. We often have multiple samples taking the same survey and each participating company requires regular updates regarding response rates. It is not possible for us to obtain this information without first downloading the data and processing it. And finally, data downloads can be very time consuming, mainly because we often have files with thousands or tens of thousands of respondents. In most instances, data download times will not be a problem because of smaller numbers of interviews.

CONCLUSION

The Web is a constantly changing world. Software and Web browsers are updated regularly and it is important to keep abreast of these changes if one desires to conduct Web interviews. Internet penetration continues to grow at a rapid pace. According to a recent poll reported in the Wall Street Journal, approximately 50% of households now have Internet access. This is roughly a 50% increase in penetration over the past three years. People are becoming more computer literate as well. With higher penetrations of personal computer usage and an increase in Web activity, it is becoming easier for the researcher to obtain valid information through the use of computers and the Web.

In our opinion, the greatest challenge in the Web interviewing world is to find respondents. Panels appear to be effective for most consumer research; however, for targeted business-to-business needs it may be somewhat difficult to conveniently gain access to the desired respondents.

We predict that Web interviews will continue to gain share of interviews. The speed and ease at which information can be obtained makes the Web an ideal medium for many data collection needs.


REFERENCES

Fenlason, K. J. (2000, April). Multiple data collection methods in 360-feedback programs: Implication for use and interpretation. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Hezlett, S. A. (2000, April). Employee attitude surveys in multinational organizations: An investigation of measurement equivalence. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

King, C. V., & Delana, P. (2001). Web Data Collection: Reminders and Their Effects on Response Rates. Unpublished Manuscript.

Kraut, A. I. (1999, April). Want favorable replies? Just call! Telephone versus self-administered surveys. Paper presented at the 14th annual conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.

Magnan, S. M., Lundby, K. M., & Fenlason, K. J. (2000, April). Dual media: The art and science of paper and internet employee survey implementation. Presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Miller, T. W. (2001). Can we trust the data of online research? Marketing Research: A Magazine of Management and Application, 13(2).

Spera, S. D. (2000, April). Transitioning to Web survey methods: Lessons from a cautious adapter. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Yost, P. R. & Homer, L. E. (1998, April). Electronic versus paper surveys: Does the medium affect the response? Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology, Dallas, TX.


USING CONJOINT ANALYSIS IN ARMY RECRUITING

Todd M. Henry United States Military Academy

Claudia G. Beach United States Army Recruiting Command

ABSTRACT

The Army’s past efforts to structure recruiting incentives ignored its prime market’s preferences. This study extends conjoint utilities by converting them to probabilities for use as primary input to a goal program. The final product determines incentive levels for Army career field and term of service combinations, and the total incentive budget requirements for a fiscal year.

INTRODUCTION

In recent years, the Armed Forces of the United States, particularly the Army, Navy, and Air Force, have faced the increasingly difficult task of attracting and recruiting the required number of enlistees. Recruiting becomes harder when the U.S. economy is strong, because a strong economy reduces the number of individuals who might otherwise enlist. The extremely strong U.S. economy of recent years has played a major role in reducing the number of 17 to 22 year-olds who would consider enlisting in the Armed Forces; 17 to 22 year-olds comprise the prime market segment in the U.S. for recruitment into entry-level occupational specialties. The three factors that have made Army recruiting more difficult are:

• An extremely low unemployment rate among the prime market segment

• A decrease in the propensity to serve, as tracked by the Youth Attitude Tracking Survey (YATS)

• An increase in the number of young people attending 2-year and 4-year colleges.

As a result, the Army and the U.S. Army Recruiting Command (USAREC) are faced with offering enlistment incentives to entice those who would not otherwise serve in the Army to enlist. The problem is which incentives to offer, when, and to which occupational specialties. These questions are addressed during the Enlisted Incentive Review Board (EIRB). The current method of assigning enlistment incentives does not consider recruit preferences for incentives and thus can neither predict the number of enlistments for a given incentive nor evaluate the effects of new incentives. The EIRB requires a quantitative decision support tool that will accurately predict incentive effects and calculate the total cost for offering these incentives. This paper describes the methodology used to create such a decision support tool, known as the Enlisted Bonus Distribution Model.

EFFECTIVE NEEDS ANALYSIS

The Enlisted Incentive Review Board requires a flexible decision support tool to do the following:

• Predict the number of individuals who will enlist into a given occupational specialty for a given incentive and time of service

• Determine the optimal mix of incentives to offer and to which occupational specialties

• Determine the total cost for offering these incentives

• Minimize the deviation from the recruiting goals for each occupational specialty

Predicting numbers of individuals who will enlist for a certain incentive package requires data on recruit preferences. A choice-based conjoint (CBC) analysis will provide this preference data. A Microsoft Excel®-based integer goal program model has the characteristics required to solve for the optimal mix of incentives to offer. The model is integer-based because the decision is whether or not to offer a certain incentive to a given occupational specialty for a certain term of service. In addition, a goal program realistically models the incentive environment, since every occupational specialty has an annual recruitment goal. For these reasons a binary integer goal program model was selected as the best alternative.

THE CHOICE-BASED CONJOINT STUDY

MarketVision Research® was contracted to complete a market survey of the Army’s target population to assess the effectiveness of different enlistment incentives. MarketVision used choice-based conjoint analysis in its assessment. Relevant attributes for the CBC study included career field, term of service and incentives. There are approximately 194 entry-level occupational specialties organized into 26 career fields. Although occupational specialty data is required for the problem, career field data was obtained to reduce the number of choice tasks required for each respondent. Table 1 shows the career fields (military positions) included in the CBC study.


Table 1: Career Fields

Military Intelligence
Military Police
Psychological Operations
Administration
Aviation Operations
Medical
Transportation
Public Affairs/Journalism
Electronic Warfare/Intercept Systems Maintenance
Automatic Data Processing/Computers
Ammunition
Signal Operations
Supply and Services
Visual Information/Signal
Air Defense Artillery
Infantry
Armor
Combat Engineering
Electronic Maintenance and Calibration
Field Artillery
Topographic Engineering
Aircraft Maintenance
Mechanical Maintenance
Electronic Warfare/Cryptologic Operations
General Engineering/Construction
Petroleum and Water

Terms of service from two to six years were included in the study. Incentives evaluated in the study were the Army College Fund (from $26,500 to $75,000), enlistment bonus (from $1,000 to $24,000) and a combination of both the Army College Fund and the enlistment bonus. Appendix A shows the terms of service and incentive levels tested in the study.

A total of 506 intercept interviews were conducted in malls at ten locations throughout the U.S. Respondents were intercepted at random and presented with 20 partial-profile choice tasks, as shown in Figure 1.

Figure 1: Choice Task

Which of these three enlistment scenarios would you choose?

1. Aircraft Maintenance; 3-year enlistment; $8,000 Enlistment Bonus and $49,000 Army College Fund
2. Transportation; 5-year enlistment; $8,000 Enlistment Bonus and $19,626 GI Bill
3. Petroleum and Water; 4-year enlistment; $1,000 Enlistment Bonus and $50,000 Army College Fund
4. If these were my only options I would not enlist


Using statistical analysis, MarketVision Research calculated utilities, representing respondents’ preferences, for each of the attributes tested: career field (military position), term of service, Army College Fund, and enlistment bonus. Table 2 shows the utilities.

Table 2: CBC Utilities

Military Position                                    Utility
Military Intelligence                                 1.5446
Military Police                                       1.0058
Psychological Operations                              0.8385
Administration                                        0.4037
Aviation Operations                                   0.3262
Medical                                               0.2913
Transportation                                        0.2699
Public Affairs/Journalism                             0.2505
Electronic Warfare/Intercept Systems Maintenance      0.2144
Automatic Data Processing/Computers                   0.1446
Ammunition                                            0.1358
Signal Operations                                     0.0617
Supply and Services                                   0.0313
Visual Information/Signal                            -0.0535
Air Defense Artillery                                -0.1251
Infantry                                             -0.1379
Armor                                                -0.1410
Combat Engineering                                   -0.1996
Electronic Maintenance and Calibration               -0.2006
Field Artillery                                      -0.2512
Topographic Engineering                              -0.5162
Aircraft Maintenance                                 -0.5431
Mechanical Maintenance                               -0.5907
Electronic Warfare/Cryptologic Operations            -0.7675
General Engineering/Construction                     -0.7881
Petroleum and Water                                  -1.2649

Enlistment Period     Utility
2-year                 0.4320
3-year                 0.3035
4-year                 0.0370
5-year                -0.2434
6-year                -0.5291

Incentive             Utility (per $1,000)
Enlistment bonus       0.0576
Army College Fund      0.0237

The CBC utilities will be converted to probabilities and serve as a major input to the Enlisted Bonus Distribution Model.

ENLISTED BONUS DISTRIBUTION MODEL

The Enlisted Bonus Distribution Model is a binary integer goal program that minimizes the deviations from the recruiting goals for each military occupational specialty (MOS) while remaining within the recruiting budget. Figure 2 describes the inputs and outputs of the model:


Figure 2: Model Description

[Diagram: Inputs (probabilities of selection, recruiting goals per MOS, 17-22 year-old population, incentive costs, FY recruiting budget) feed a binary integer goal program; outputs are the incentives to offer each MOS and the total cost.]

The probabilities of selection are obtained from the CBC utilities calculated by MarketVision Research®. Each career field, term of service, and incentive combination is considered a product. The total product utility, U_ijk, is given as the sum of the career field (i) utility, term of service (j) utility, and incentive (k) utility:

$$U_{ijk} = \text{utility}_{\text{career field}} + \text{utility}_{\text{term of service}} + \text{utility}_{\text{incentive}}$$

This product utility represents the odds in favor of a positive response to the product. These odds must then be converted to probabilities of positive response. The probability of a positive response to the product is given by (Joles et al., 1998):

$$p_{ijk} = \frac{e^{U_{ijk}}}{1 + e^{U_{ijk}}}$$

The estimated fraction, P_ijk, of the 17-22 year-old population that would enlist for a specific career field (i), term of service (j), and incentive (k) is given by (Joles et al., 1998):

$$P_{ijk} = \frac{p_{ijk}}{\sum_i \sum_j \sum_k p_{ijk}}$$

This gives us the fraction of the population who would enlist into a certain career field given a term of service and incentive. The model requires the estimated fraction of the population who would enlist for a certain occupational specialty (m), term of service (j), and incentive (k), P_mjk. This requires that P_ijk be divided among all occupational specialties within the career field based on the percentage fill of the occupational specialty within the career field. For instance, MOS 13C has a recruiting goal of 108 out of 3350 for the field artillery career field. Therefore, P_mjk for 13C is given as:

$$P_{mjk} = P_{ijk} \times \%(\text{of career field}) = P_{ijk} \times (108/3350)$$

Incentive policy allows for only one incentive level to be offered to each occupational specialty for a certain term of service. For example, occupational specialty 11X may be offered the following incentives for a 2-year term of service:

$2,000 enlistment bonus

or $26,500 Army College Fund

or $1,000 enlistment bonus plus $26,500 Army College Fund

Both a $1,000 and $2,000 enlistment bonus could not be offered to 11X for a 2-year term of service. This affects the fraction of the population calculations above. Because only one incentive type can be offered, we must assume that a higher incentive level will also attract those persons who would enlist for a lower incentive level. For instance, if a $2,000 enlistment bonus is offered to occupational specialty 11X for a 2-year term of service, we will also attract those who would enlist into occupational specialty 11X for a 2-year term of service given a $1,000 enlistment bonus.

The expected number of recruits, R_mjk, to enlist into occupational specialty (m), for term of service (j), given incentive (k) is given by:

$$R_{mjk} = P_{mjk} \times (\text{17-22 year-old population})$$
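As a rough illustration of the calculation chain from utilities to expected recruits, the sketch below works through U_ijk, p_ijk, P_ijk, P_mjk, and R_mjk in Python. The utilities are taken from Table 2; the enumerated incentive packages and the 17-22 year-old population figure are assumptions for illustration only, not values from the study.

```python
import math

# Utilities from Table 2 (two career fields shown for brevity)
career_util = {"Field Artillery": -0.2512, "Military Police": 1.0058}
tos_util = {2: 0.4320, 3: 0.3035, 4: 0.0370, 5: -0.2434, 6: -0.5291}
EB_UTIL_PER_1000 = 0.0576    # enlistment bonus utility per $1,000
ACF_UTIL_PER_1000 = 0.0237   # Army College Fund utility per $1,000


def total_utility(career, tos_years, eb_dollars, acf_dollars):
    """U_ijk = career-field utility + term-of-service utility + incentive utility."""
    incentive = (EB_UTIL_PER_1000 * eb_dollars + ACF_UTIL_PER_1000 * acf_dollars) / 1000
    return career_util[career] + tos_util[tos_years] + incentive


def logit_prob(u):
    """p_ijk = exp(U_ijk) / (1 + exp(U_ijk))."""
    return math.exp(u) / (1 + math.exp(u))


# Enumerate an assumed product space: career field x term of service x incentive package
packages = [(0, 0), (10_000, 0), (0, 49_000), (4_000, 33_000)]   # (bonus $, college fund $)
p = {(c, t, eb, acf): logit_prob(total_utility(c, t, eb, acf))
     for c in career_util for t in tos_util for (eb, acf) in packages}

# P_ijk: each product's share of the total across all enumerated products
p_total = sum(p.values())
P = {key: val / p_total for key, val in p.items()}

# Split the career-field fraction down to one MOS by its share of the field's goal,
# then scale by the prime-market population to get expected recruits R_mjk.
POPULATION_17_22 = 23_000_000                                   # assumed population size
P_13C = P[("Field Artillery", 3, 10_000, 0)] * (108 / 3350)     # 13C fills 108 of 3,350 seats
R_13C = P_13C * POPULATION_17_22
print(f"Expected 13C recruits for a 3-year, $10,000-bonus product: {R_13C:,.0f}")
```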

The model then uses R_mjk to determine the optimal mix of incentives to offer occupational specialty (m) to meet its recruiting goal.

The decision variables in the binary integer goal program are which incentives to offer each occupational specialty. The benefits and costs for offering each of the incentives are evaluated and considered globally throughout the entire solution space. That is, the effects of each incentive are evaluated with regard to their impact on the model as a whole.


There are four categories of constraints in the model (besides the binary constraint on the decision variables). The first category is the recruiting goal for each occupational specialty. These are goal constraints with the left-hand side as the summation of all the expected recruits from all the offered incentives and the right-hand side is the recruiting goal for that individual occupational specialty.

The second type of constraint is the budget constraint. Because the budget is for all incentives and for all occupational specialties there is no budget constraint for each individual occupational specialty. The left-hand side of the budget constraint is the summation of the costs for all the incentives offered multiplied by the number of individuals who select those incentives for all occupational specialties. The right-hand side is the fiscal year recruiting budget.

The third category of constraint is that only one level of each incentive (enlistment bonus, Army College Fund, and a combination of both) can be offered to each occupational specialty for a given term of service. For instance, occupational specialty 11X could be offered a $4,000 enlistment bonus for a 3-year term of service; a $50,000 Army College Fund for a 4-year term of service; and a $1,000 enlistment bonus plus a $40,000 Army College Fund for a 5-year term of service. For each occupational specialty there are fifteen of these constraints (one constraint for each type of incentive and each possible term of service).

The final type of constraint is on the minimum term of service required for each occupational specialty. Each occupational specialty is assigned a minimum term of service. No incentives can be offered for terms of service less than the minimum.

The objective function for the goal program minimizes the deviations from the recruiting goals for each occupational specialty. The objective function follows:

$$\text{MINIMIZE} \quad \sum_{\text{MOSs}} \sum_{\text{TOS}} \sum_{\text{Incentives}} \big( W_U \cdot under_{MOS} + W_O \cdot over_{MOS} \big)$$

W_U represents the weight assigned for an occupational specialty attracting below/under the recruiting goal and W_O represents the weight assigned for an occupational specialty attracting above/over. These weights allow the user to identify critical occupational specialties. under_MOS and over_MOS are the under and over deviation variables for each of the occupational specialties included in the model. This objective function allows solutions that may include local overages or shortages in order to find a global solution that minimizes the deviations over all occupational specialties.
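The authors built and solved the model in Microsoft Excel with Frontline Systems' solver. Purely as an illustration of the same structure, here is a toy sketch using the open-source PuLP package; all data (goals, expected recruits, per-recruit costs, weights, budget) are hypothetical, and the one-incentive-level-per-type-and-term constraints are omitted for brevity.

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

# Hypothetical inputs
mos_goal = {"13B1": 1513, "13C1": 108}                            # recruiting goal per MOS
incentives = ["3yr10EB", "3yr49ACF", "3yr4EB33ACF"]
expected = {                                                      # R_mjk if the incentive is offered
    ("13B1", "3yr10EB"): 700, ("13B1", "3yr49ACF"): 500, ("13B1", "3yr4EB33ACF"): 400,
    ("13C1", "3yr10EB"): 60,  ("13C1", "3yr49ACF"): 55,  ("13C1", "3yr4EB33ACF"): 40,
}
unit_cost = {"3yr10EB": 10_000, "3yr49ACF": 12_000, "3yr4EB33ACF": 11_000}   # cost per recruit
budget = 30_000_000
w_under = {"13B1": 2, "13C1": 1}   # heavier penalty for missing a critical MOS goal
w_over = {"13B1": 1, "13C1": 1}

prob = LpProblem("enlisted_bonus_distribution", LpMinimize)
offer = {(m, k): LpVariable(f"offer_{m}_{k}", cat=LpBinary) for m in mos_goal for k in incentives}
under = {m: LpVariable(f"under_{m}", lowBound=0) for m in mos_goal}
over = {m: LpVariable(f"over_{m}", lowBound=0) for m in mos_goal}

# Objective: weighted under- and over-goal deviations across all MOSs
prob += lpSum(w_under[m] * under[m] + w_over[m] * over[m] for m in mos_goal)

# Goal constraints: expected recruits from offered incentives + under - over = goal
for m, goal in mos_goal.items():
    prob += lpSum(expected[m, k] * offer[m, k] for k in incentives) + under[m] - over[m] == goal

# Single budget constraint across all MOSs and incentives
prob += lpSum(unit_cost[k] * expected[m, k] * offer[m, k]
              for m in mos_goal for k in incentives) <= budget

prob.solve()
print([f"{m}: {k}" for (m, k), var in offer.items() if var.varValue == 1])
```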


MODEL RESULTS

The model contains over 17,500 decision variables and over 13,600 constraints. This size requires use of an Extended Large-Scale Solver Engine from Frontline Systems, Inc. Table 3 shows a small portion of model results.

Table 3: Model Results

MOS     Target        # Recruits,      Offer         # Recruits    Total Cost
        (Acc seats)   No Incentives    Incentives    Expected
00B1          48            24          Yes                48          354,744
11X1        8534          4267          Yes              6153       18,234,412
12B1         288           144          Yes               288          783,080
12C1          80            40          Yes                80          191,426
13B1        1513           756          Yes              1513        7,015,586
13C1         108            54          Yes               108          497,587
13D1         162            81          Yes               162          746,630
13E1         289           144          Yes               289        1,337,601
13F1         393           196          Yes               393        1,823,214
13M1         336           168          Yes               336        1,548,851

For MOS 13B1, with a goal (target) of 1513, the model has determined the optimal mix of incentives to offer to achieve this goal. The cost for enlisting 1513 into 13B1 is $7,015,586. Table 4 below shows the incentive report generated by the model. To meet the recruiting goal for 13B1 we should offer a $10,000 enlistment bonus for a 3-year term of service, $49,000 Army College Fund for a 3-year term of service, and a $4,000 enlistment bonus plus a $33,000 Army College Fund for a 3-year term of service.

Table 4: Incentive Report

Total Budget Required:   $100,000,000
FY01 Recruiting Budget:  $100,000,000
Total # Recruits:        30,708

MOS    Target   Min TOS   # Recruits Expected   Total Cost    Incentives to Offer
13B1   1,513       3            1,513           $7,015,586    3yr10EB, 3yr49ACF, 3yr4EB33ACF
13C1     108       3              108             $497,587    3yr10EB, 3yr49ACF
13D1     162       3              162             $746,630    3yr10EB, 3yr49ACF
13E1     289       3              289           $1,337,601    3yr10EB, 3yr49ACF, 3yr4EB33ACF
13F1     393       3              393           $1,823,214    3yr10EB, 3yr49ACF, 3yr4EB33ACF
13M1     336       3              336           $1,548,851    3yr10EB, 3yr49ACF
13P1     338       3              338           $1,559,387    3yr10EB, 3yr49ACF, 3yr4EB33ACF
13R1      86       3               86             $395,114    3yr10EB, 3yr49ACF


RESULTS

Model verification involved comparing the model’s predicted recruits into an occupational specialty for a specific incentive against fiscal year recruiting data for 2000. Figure 3 shows the results for 11X.

Figure 3: Verification Results

[Chart: “Actual vs. Predicted Recruits for 11X.” Actual and predicted recruit counts (0-2,000) are plotted for each incentive and term-of-service combination offered to 11X (3yr2EB33ACF, 3yr4EB, 4yr50ACF, 4yr5EB40ACF, 4yr8EB, 5yr2EB50ACF, 5yr10EB, 6yr4EB40ACF, 6yr12EB); p-value = .363.]

Figure 3 shows that the model does a good job of estimating recruits at the shorter terms of service but over-estimates recruit preference for longer terms of service. Performing a simple analysis of variance for the actual versus predicted recruit data over all occupational specialties and incentives results in a p-value of .363. This p-value suggests that the data may be statistically similar, although not strongly so. The over-estimation may be caused by study subjects being willing to accept longer periods of commitment than they would if they were actually signing a contract. Recruiting policies may also affect the results: not all incentives are offered throughout the entire year. For example, the $10,000 enlistment bonus for a 5-year term of service offered to 11X may have been available for only a brief period of time, which would effectively reduce the portion of the 17-22 year-old population that was aware of this incentive. Also, not every recruit who enters a recruiting station is able to enlist into every occupational specialty. A battery of tests determines his enlistment choices, and only the incentives offered to that small number of occupational specialties are made known to the recruit. So not all “products” are available to and known by all recruits, which is contrary to the conjoint study assumption that all products are offered to everyone.


CONCLUSION

Choice-based conjoint study utilities adequately portray prime market preferences. Current recruiting policies somewhat diminish the validity of these utilities. These utilities can easily be converted to fractions of the population who would enlist for a certain recruiting product. This fraction of the population can then be used in a binary integer goal program, Enlisted Bonus Distribution Model, to determine the optimal mix of incentives to offer each occupational specialty to ensure it meets its recruiting goal.

Future research should center on modeling changes in the U.S. economy on recruitment. Specifically, the CBC study utilities will change as the U.S. economy changes. Finding a relationship between economic factors and CBC utilities would allow the model data to change as economic factors change.

REFERENCES

Army College Fund Cost-effectiveness Study. Systems Research and Applications Corporation and Economic Research Laboratory, Inc., November 1990.

Asch, Beth and Dertouzos, James. Educational Benefits Versus Enlistment Bonuses: A Comparison of Recruiting Options. Rand Corporation, 1994.

Asch, Beth et al. Recent Trends and Their Implications: Preliminary Analysis. Rand Corporation, MR-549-A/OSD. 1994.

Clark, Captain Charles G. Jr. et al. The Impact of Desert Shield/Desert Storm and Force Reductions on Army Recruiting and Retention. Department of Systems Engineering, United States Military Academy, May 1991.

Curry-White, Brenda et al. U.S. Army Incentives Choice-based Conjoint Study. University of Louisville Urban Studies Institute, TCN 96-162, April 1997.

Dolan, Robert J. Conjoint Analysis: A Manager’s Guide. President & Fellows of Harvard College, Harvard Business School, 1990.

Enlisted Bonus Distribution Conjoint Study. MarketVision Research, 2000.

Joles, Major Jeffery et al. An Enlisted Bonus Distribution Model. Department of Systems Engineering, United States Military Academy, February 1998.

Perspectives of Surveyed Service Members in Retention Critical Specialties. United States General Accounting Office, GAO/NSDIA-99-197BR, August 1999.

Pinnell, Jon. Conjoint Analysis: An Introduction. MarketVision Research, 1997.

Solver User’s Guide. Frontline Systems, Inc., 1996.


Appendix A

Terms of Service and Incentive Levels Tested

2-Year Enlistment Incentives
No Enlistment Bonus
$1,000
$3,000
$5,000
$7,000
$9,000
$26,500 Army College Fund
$39,000 Army College Fund
$1,000 and $26,500 Army College Fund
$4,000 and $26,500 Army College Fund
$2,000 and $39,000 Army College Fund
$8,000 and $39,000 Army College Fund

3-Year Enlistment Incentives
No Enlistment Bonus
$1,000
$2,000
$4,000
$6,000
$8,000
$10,000
$33,000 Army College Fund
$49,000 Army College Fund
$1,000 and $33,000 Army College Fund
$4,000 and $33,000 Army College Fund
$2,000 and $49,000 Army College Fund
$8,000 and $49,000 Army College Fund

4 and 5-Year Enlistment Incentives
No Enlistment Bonus
$2,000
$4,000
$8,000
$12,000
$16,000
$20,000
$40,000 Army College Fund
$50,000 Army College Fund
$60,000 Army College Fund
$75,000 Army College Fund
$1,000 and $40,000 Army College Fund
$4,000 and $40,000 Army College Fund
$2,000 and $60,000 Army College Fund
$8,000 and $60,000 Army College Fund
$1,000 and $50,000 Army College Fund
$4,000 and $50,000 Army College Fund
$2,000 and $75,000 Army College Fund
$8,000 and $75,000 Army College Fund

6-Year Enlistment Incentives
No Enlistment Bonus
$2,000
$4,000
$8,000
$12,000
$18,000
$24,000
$40,000 Army College Fund
$50,000 Army College Fund
$60,000 Army College Fund
$75,000 Army College Fund
$2,000 and $40,000 Army College Fund
$2,000 and $50,000 Army College Fund
$8,000 and $40,000 Army College Fund
$8,000 and $50,000 Army College Fund
$4,000 and $60,000 Army College Fund
$4,000 and $75,000 Army College Fund
$12,000 and $60,000 Army College Fund
$12,000 and $75,000 Army College Fund


DEFENDING DOMINANT SHARE: USING MARKET SEGMENTATION AND CUSTOMER RETENTION MODELING TO MAINTAIN MARKET LEADERSHIP

Michael G. Mulhern, Ph.D.

Mulhern Consulting

ABSTRACT

Regardless of their market’s growth rate or competitive intensity, market leaders expend considerable resources defending and improving their market position. This paper presents a case study from the wireless communications industry that incorporates two widely used marketing strategy elements – market segmentation and customer retention.

After using customer information files to derive behaviorally based customer segments, qualitative and quantitative research was conducted to gather performance and switching propensity data. After an extensive pre-test, scale reliability was evaluated with coefficient Alpha. Once survey data were collected, scale items were tested for convergent and discriminant validity via correlation analysis. Finally, retention models using logistic regression were developed for each segment.

Recommendations were made to maintain and enhance the firm’s market leadership position. Specifically, we identified the strategy elements that would likely retain customers in high revenue, heavy usage segments and lower usage segments management wanted to grow.

BACKGROUND

This paper focuses on a portion of the segmentation and retention process undertaken by a North American provider of wireless telecommunication services. The need for the study arose when my client, the dominant player with 60+% market share, recognized two major structural changes taking place. First, its major competitor had recently allied itself with a major international telecom provider having virtually unlimited resources. Secondly, two new competitors had entered the market in the past year with the potential for more new entrants to follow. Consequently, senior management decided to segment their customer base using their customer information file of transaction data. Once segmented, primary research would offer ways to retain their existing customers. The segmentation and retention process encompassed several major stages.

When contacted by the client, we learned that behavioral segmentation had already been implemented. Eleven segments had been derived from the customer information file using factor analysis and CHAID. Management identified the most relevant variable on each factor and these variables were submitted to a CHAID analysis. CHAID is a large sample exploratory segmentation procedure. Bases for segmentation included revenue and usage variables. In this case, 11 segments were derived: four high usage and seven light usage.

MAKING MODELING STRATEGIC: SETTING STUDY OBJECTIVES

When Mulhern Consulting was contacted to bid on this project, management sought strategic and operational guidance on the drivers of satisfaction in each customer segment. We recommended that the focus of the effort should be to determine how performance affected profitability. A compromise was reached whereby the determinants of satisfaction, dissatisfaction, and retention were investigated. This paper will focus on the retention modeling.

RETENTION MODELING

Data Sources

Qualitative Research

Once the segments were identified, primary research was undertaken. During this stage, qualitative research preceded quantitative research. The qualitative work consisted of 22 focus groups, two with each segment. The primary purpose of the groups was to help identify the attributes employed by customers to select and assess the performance of a wireless service provider.

Quantitative Research

Once the attribute list was created and the survey developed, an extensive pre-test was conducted. In addition to clarifying question wording and flow, the pre-test results were used to test the reliability of the scale items. Cronbach’s alpha was the reliability test applied. Reliability scores were quite high so no modifications to the scale items were made.

Reliability is a necessary component of properly constructing an attitude scale but it alone is not sufficient. To meet the sufficiency criterion, validity testing must also be performed. Consequently, once fielding was completed, construct validity was investigated. Both convergent and discriminant validity, two key components of construct validity, were assessed using Pearson’s correlation. For all pairs of attributes, both convergent and discriminant validity were statistically significant.

Structuring the Modeling Problem

Dependent Variable: Retention

The dependent variable, retention, was conceptualized as a constant sum measure (Green, 1997). Retention was measured with a constant sum scale that asked respondents to assume that their contract was about to expire and then allocate ten points across the four competitors based upon the odds (or likelihood) of selecting each provider. Since these data were collected via survey research, retention is actually a repurchase intention score rather than behaviorally based retention.

Independent Variables: Retention

The independent variables were performance compared to expectations on 50 scale items in five business areas: sales, network, service, pricing, and billing. Both the business area definitions and the performance attributes evolved from the qualitative research.

Two approaches were taken to identifying independent variables to include in the models. First, a factor analysis was run to assess the correlation among variables. Since each factor is orthogonal, the highest loading variable on each factor was considered a candidate independent variable.

The second approach implements an idea proposed by Allenby (1995). He proposed that respondents at the extremes of a distribution were more valuable when developing marketing strategy than the bulk of individuals at the center of the distribution. We modified the idea and applied it to attributes in the survey. The idea was that the attributes that had “extreme” scores (i.e. the highest and lowest within a business area) might be significant predictors of retention. Essentially, we developed a set of independent variables where the competitors scored highest (i.e. best performing attributes) and the attributes where the competitors scored lowest (i.e. worst performing attributes) within each business area. Models employing each set of independent variables were constructed separately.
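A minimal sketch of that “extreme attribute” selection, assuming the survey data sit in a pandas DataFrame; the column names, the area mapping, and the simulated ratings are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical performance-versus-expectations ratings (1-5) for six attributes
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 6, size=(200, 6)),
    columns=["sales_1", "sales_2", "network_1", "network_2", "billing_1", "billing_2"],
)
area_of = {col: col.split("_")[0] for col in df.columns}   # attribute -> business area

means = df.mean().rename("mean_score").to_frame()
means["area"] = means.index.map(area_of)

best_set = list(means.groupby("area")["mean_score"].idxmax())    # best-performing attribute per area
worst_set = list(means.groupby("area")["mean_score"].idxmin())   # worst-performing attribute per area
print("Best-attribute candidates:", best_set)
print("Worst-attribute candidates:", worst_set)
```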

MODELING APPROACHES CONSIDERED

Three modeling approaches were considered for this study. The first, ratings-based conjoint, was eliminated primarily due to the difficulty of determining objective levels for the attributes of interest. Further, since modeling by segment was required, many designs would have had to be developed, at a cost that was prohibitive to the client. Finally, the large number of attributes made ratings-based conjoint a less than optimal technique.

Structural equation modeling was also evaluated. Given the time and budget constraints of the project, we doubted that acceptable models by segment could be derived since management needed operational as well as strategic guidance.

A regression-based approach was selected because it could handle many attributes or independent variables while maintaining individual level data, no level specification was required, and it could provide both strategic and operational guidance to management.

Modeling Goals

Once a regression-based approach was selected, three modeling goals were identified. First, we wanted to create a retention variable with sufficient variation in response across individuals. Since the goal of regression is to assess the degree of variation in a dependent variable explained by one or more independent variables, more rather than less variation in the dependent variable was preferred. This required the repurchase intent question to be recoded. Secondly, we wanted to develop valid and reliable indicators of the constructs measured. To accomplish this goal, construct validity was assessed. The final goal was to build models that explained as well as predicted repurchase intent. Investigating logistic regression diagnostics and modifying the models based on what we learned helped achieve this goal. Each of these goals required actions to be taken with regard to both the data and its analysis.

Goal 1: Ensuring Variation in the Dependent Variable

Initial assessment of the frequency distribution for the repurchase intention variable indicated that 48% of the respondents allocated all 10 points to their current provider. In retrospect, this is not surprising since the sample consisted only of the client’s current customers. However, to ensure variation, the repurchase intention scores were recoded into these categories: Likely to repurchase=10, Unlikely to repurchase=0-6, Missing=7-9. After recoding, 52% of the remaining respondents fell into the likely to repurchase category and 48% were categorized as unlikely to repurchase.
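A sketch of that recode, assuming the 0-10 allocation to the current provider sits in a pandas column (the column name is hypothetical):

```python
import numpy as np
import pandas as pd


def recode_retention(points: pd.Series) -> pd.Series:
    """Map the 0-10 constant-sum allocation to a dichotomous retention variable."""
    out = pd.Series(np.nan, index=points.index)
    out[points == 10] = 1   # likely to repurchase
    out[points <= 6] = 0    # unlikely to repurchase
    return out              # allocations of 7-9 remain missing


df = pd.DataFrame({"points_to_current": [10, 8, 3, 10, 6, 7, 0]})
df["retain"] = recode_retention(df["points_to_current"])
print(df.dropna(subset=["retain"]))   # cases coded 7-9 are excluded from the models
```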

Recoding impacted the modeling in two ways. First, it ensured that sufficient variation existed; second, it required that logistic regression be used, since the dependent variable was now dichotomous.

Goal 2: Developing Valid and Reliable Attitude Scales with Construct Validity

Psychometric Theory

When attempting to measure attitudinal constructs, psychometric theory suggests that constructs must be both valid and reliable. Construct validity has been defined in various ways, including:

• The extent to which an operationalization measures the concept it purports to measure (Zaltman et al., 1973), and

• Discriminant and convergent validity (Campbell and Fiske, 1959)

Bagozzi (1980) expands these definitions to include theoretical and observational meaningfulness of concepts, internal consistency of operationalizations, and nomological validity. He also explores several more extensive methodologies for testing construct validity: multitrait-multimethod and causal modeling. In this paper, we will focus on reliability (internal consistency of operationalizations), convergent validity, and discriminant validity.

Reliability

Two approaches to internal consistency of interval-level, cross-sectional data are widely used in marketing research: split-half reliability and Cronbach’s alpha. Because the manner in which separate measures of the same construct can be split is arbitrary, and the value of the reliability coefficient is contingent upon this division, pointed criticism has been levied against the split-half method. A widely accepted alternative is Cronbach’s alpha. Alpha overcomes the arbitrary nature of the splitting decision by estimating the mean reliability coefficient for all possible ways of splitting a set of items in half. Alpha estimates reliability based upon the observed correlations or covariances of the scale items with each other. Alpha indicates how much correlation we can expect between our scale and all other scales that could be used to measure the same underlying construct. Consequently, higher values indicate greater reliability. In the early stages of exploratory research, Churchill (1979) notes that alpha values of .5-.6 are acceptable. In more confirmatory settings, scores should be higher. A general rule of thumb suggests that values of .70 and above are considered acceptable. It should be noted, however, that alpha assumes equal units of measurement in each scale item and no measurement error (Bagozzi, 1980).
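For reference, a minimal numpy sketch of Cronbach’s alpha for one business-area scale; the simulated data below stand in for the study’s pre-test ratings.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x scale-items matrix of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed scale
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


rng = np.random.default_rng(1)
construct = rng.normal(size=(77, 1))                      # shared construct, 77 pre-test respondents
scale = construct + rng.normal(scale=0.7, size=(77, 5))   # five correlated scale items
print(f"alpha = {cronbach_alpha(scale):.2f}")
```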

Table 1: Reliability Results for the 77 Pre-Test Respondents

Measurement Scale (Business Area)    Alpha Score
Sales Process                            .94
Cellular Network                         .88
Customer Service                         .92
Pricing                                  .89
Billing                                  .86

Since the reliability scores were uniformly high, no modifications were made to the attributes.

Convergent and Discriminant Validity

By using inter-item correlation analysis, we can determine if the attributes designed to measure a single construct are highly correlated (i.e. converge) while having a low correlation with attributes that purportedly measure a different concept (i.e. discriminate). For example, the attributes or scale items that are developed to measure customer service attributes should have high correlations with each other, while these same scale items should have low correlations with attributes designed to measure other constructs (e.g. billing, pricing, sales).


Tables 2 & 3: Selected Results for Convergent and Discriminant Validity

Convergent Validity

            Network1   Network2   Network3   Network4
Network1      1.00
Network2       .41       1.00
Network3       .51        .53       1.00
Network4       .48        .52        .55       1.00

Discriminant Validity

            Network1   Network2   Network3   Network4
Billing1       .23        .27        .27        .26
Billing2       .22        .26        .27        .26
Billing3       .24        .24        .27        .26
Billing4       .22        .19        .20        .21

Note: All results are significant at P = .00. All pairs were tested, and the results were similar to those in the tables above.
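A sketch of the inter-item correlation check with simulated data and hypothetical column names: the mean within-construct correlation (convergent) should be noticeably higher than the mean cross-construct correlation (discriminant).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
network = rng.normal(size=(500, 1)) + rng.normal(scale=0.8, size=(500, 4))   # four network items
billing = rng.normal(size=(500, 1)) + rng.normal(scale=0.8, size=(500, 4))   # four billing items
cols = [f"network_{i}" for i in range(1, 5)] + [f"billing_{i}" for i in range(1, 5)]
df = pd.DataFrame(np.hstack([network, billing]), columns=cols)

corr = df.corr()
net_cols, bill_cols = cols[:4], cols[4:]

within = corr.loc[net_cols, net_cols].to_numpy()
convergent = within[~np.eye(4, dtype=bool)].mean()                # off-diagonal, same construct
discriminant = corr.loc[net_cols, bill_cols].to_numpy().mean()    # across constructs
print(f"mean within-network correlation:     {convergent:.2f}")
print(f"mean network-vs-billing correlation: {discriminant:.2f}")
```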

Goal 3: Improving Explanation and Prediction

Evaluating Model Quality: Overall Model Fit

A set of goodness-of-fit statistics was used to evaluate model quality. Pearson’s Chi Square was employed to determine how well the fitted values represented the observed values. Secondly, R-Square_L assesses the degree to which the inclusion of a set of independent variables reduces badness of fit by calculating the proportionate reduction in log likelihood.

With respect to prediction, Lambda_p, a measure of the proportionate change in error, was employed. Also, the classification or confusion matrix was used to determine the proportion of cases correctly classified by the model. Following Menard (1995), the following decision rules guided our model assessment:


Table 4: “Goodness of Fit” and Predictive Ability

Statistical Test        Decision Rule
Goodness of Fit
  Chi Square            P > .05
  R-Square_L            High: .3 or higher; Moderate: .2-.3; Low: less than .2
Predictive Ability
  Lambda_p              High: .3 or higher; Moderate: .2-.3; Low: less than .2
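As an illustration of these statistics, here is a sketch using statsmodels’ Logit on simulated data. The 0.5 classification cutoff and the form of Lambda_p shown (reduction in error relative to always predicting the modal category) are common conventions, not necessarily the exact choices made in the study.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: three predictors plus a constant
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(300, 3)))
true_logit = 0.8 * X[:, 1] - 0.5 * X[:, 2]
y = (rng.random(300) < 1 / (1 + np.exp(-true_logit))).astype(int)

res = sm.Logit(y, X).fit(disp=False)

# R-Square_L: proportionate reduction in log likelihood relative to the null model
r2_L = 1 - res.llf / res.llnull

# Lambda_p: proportionate reduction in classification error versus predicting the mode
pred = (res.predict() >= 0.5).astype(int)
base_error = min(y.mean(), 1 - y.mean())
model_error = (pred != y).mean()
lambda_p = 1 - model_error / base_error

print(f"R-Square_L = {r2_L:.2f}, Lambda_p = {lambda_p:.2f}, "
      f"correctly classified = {(pred == y).mean():.0%}")
```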

IMPROVING MODEL QUALITY: LOGISTIC REGRESSION DIAGNOSTICS

Since modeling is an iterative process, logistic regression diagnostics were used to obtain an initial assessment of model quality. The purpose was to identify those cases where the model worked poorly as well as cases that had a great deal of influence on the model’s parameter estimates. Following Menard’s recommendations (pp. 77-79), the Studentized residual was used to identify those instances where the model worked poorly. The Studentized residual estimates the change in deviance if a case is excluded. Deviance is the contribution of each case to poorness of fit.

With respect to identifying cases that had a large influence on the model’s parameter estimates, two statistics were evaluated: leverage and DfBeta. Leverage assesses the impact of an observed Y on a predicted Y, and DfBeta measures the change in the logistic regression coefficient when a case is deleted.
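The study relied on the diagnostic statistics produced by its logistic regression software. As a rough stand-in, the brute-force sketch below refits the model with each case deleted and records the change in deviance and the shift in each coefficient (an exact, if slow, analogue of DfBeta), using simulated data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = (rng.random(100) < 1 / (1 + np.exp(-X[:, 1]))).astype(int)

full = sm.Logit(y, X).fit(disp=False)
full_deviance = -2 * full.llf

delta_dev, dfbeta = [], []
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    refit = sm.Logit(y[keep], X[keep]).fit(disp=False)
    delta_dev.append(full_deviance - (-2 * refit.llf))   # deviance change when case i is dropped
    dfbeta.append(full.params - refit.params)            # shift in each coefficient

delta_dev = np.array(delta_dev)
dfbeta = np.vstack(dfbeta)
print("case with the largest deviance change:  ", int(np.argmax(np.abs(delta_dev))))
print("case with the largest coefficient shift:", int(np.argmax(np.abs(dfbeta).max(axis=1))))
```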

For several of the segments, these diagnostics identified cases which had undue influence on the model or cases for which the model did not fit very well. In general, these cases were few in number but, when deleted from the database, had a major impact on the explanatory and predictive capability of the segment retention models. Although a variety of other measures were undertaken to improve model quality, these were most effective in this study. The tables below illustrate this phenomenon for Segment H2.

Table 5: Impact of Diagnostics on Statistical Measures

                                   With Influential Cases   Without Influential Cases
Sample Size                                  99                        96
Goodness of Fit
  Model Chi Square                          32.0                      44.3
  R-Square_L                                0.24                      0.35
Predictive Ability
  Lambda_p                                  0.39                      0.56
  Percent Correctly Classified              76%                       83%


Table 6: Impact of Diagnostics on Independent Variables (Selected disguised results)

Segment   Indep. Variable Set   Business Area   Odds Ratio
L1        Best                  Pricing            2.3
L1        Worst                 Pricing            1.7
L1        Worst                 Sales              1.7
H1        Factor                Sales              2.0
H1        Factor                Pricing            1.6
H4        Best                  Network            4.0
H4        Best                  Billing            3.1

Variables Included in Model   With Influential Cases   Without Influential Cases
X1                            Significant @ 0.01       Significant @ 0.00
X2                            Significant @ 0.08       Significant @ 0.02
X3                            Not Significant          Significant @ 0.00

Note: All results statistically significant at the P < .05 level. The odds ratio indicates the impact of a one-unit change in the independent variable on retention.

LESSONS LEARNED: RESEARCH

Modeling Flexibility

Logistic regression offers modeling flexibility by allowing for dependent variable modification. In this study the measurement scale of the dependent variable was changed from interval to nominal.

Scale Validation

Construct validity provides confidence that we are actually measuring what we think we are measuring.

Advantages of Using Regression Diagnostics

Diagnostics can improve the statistical indicators of model quality dramatically, especially where certain cases fit the model poorly or if there are influential cases. Further, diagnostics can suggest substantive changes to the model.


LESSONS LEARNED: MANAGEMENT

Model Retention, Not Satisfaction

The retention models statistically outperformed the satisfaction and dissatisfaction models in the majority of segments. Therefore, in this study, it was more appropriate to model retention.

Segmentation Basis Variables: Revenue vs. Profit

Revenue was selected by management as the initial basis variable for behavioral segmentation. However, simply because a customer generates substantial revenue does not necessarily imply s/he is highly profitable. A customer profitability score or index may be a more appropriate variable upon which to base the behavioral segmentation.

Evaluate Behavioral Segmentation Solution Based on Criteria for Effective Segmentation

Several criteria for effective segmentation were violated in this study.

Substantial

Both qualitative and quantitative primary research suggested the segments were not as homogeneous as management expected. Although models were built to identify drivers in each low usage segment, management chose not to develop strategies for several low usage segments due to their small size.

Actionable

In addition, some of the retention drivers cut across segments, so that improvements could not be segment specific. This violates the actionability criterion for segmentation. As an example, dropped calls were particularly relevant for several high usage segments. However, solving this problem requires a system-wide network upgrade. Any upgrade affects all users, not only those whose retention is driven by this attribute.

This reinforces the need to match the criteria for effective segmentation with the behavioral segmentation solution. Had this been accomplished prior to the retention modeling, resources would have been allocated more efficiently.

Optimizing Segmentation: Combining Behavioral with Attitudinal Segmentation

Combining behavioral with attitudinal segmentation may have enhanced the segmentation scheme by identifying the attitudinal rationale for the behavior. Management decided not to pursue this alternative.


RETROSPECTIVE

If asked to replicate this study today, the following should be considered:

Barriers to Switching

Users with no ability to switch (e.g., individuals locked into their employer's service plan) were excluded from the sample. However, for those included in the respondent database, barriers to switching may have affected propensities to switch. These barriers could be modeled as independent variables.

Response Bias in Satisfaction and Retention Research

After this research was completed, Mittal and Kamakura (2000) found that repurchase intentions could be affected by response bias in the form of demographic characteristics. When predicting actual behavior among automobile purchasers, they found that the interaction between satisfaction and demographics captured the response bias among respondents. Given the time and budget, the models within each segment could be tested for these covariates.

Investigate Linkages Among Satisfaction, Retention, and Profitability

Since the assumption is that increased satisfaction leads to enhanced retention and greater profitability, attention needs to be paid to empirically testing these assumptions. Kamakura et al. (2001) have begun working on this.

Consider Hierarchical Bayes as a Modeling Tool

Hierarchical Bayes analysis can estimate the parameters of a random-coefficients regression model. Early indications are that HB could provide superior estimates of coefficients for each segment by assuming that the segments are all drawn from an underlying population distribution. Since there was only one observation per respondent, it would be necessary to pool individuals within segment. However, HB is capable of estimating both the variance within segment, including that due to unrecognized heterogeneity, and the heterogeneity among segments (Sawtooth Software, 1999).
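As an illustration of the pooling idea, the sketch below implements a minimal Gibbs sampler for a random-coefficients regression in which segment-level coefficients are drawn from a common population distribution. It is emphatically not Sawtooth Software's HB-Reg; the conjugate priors, hyperparameters (a0, b0), and data layout (X, y, and an integer segment index) are assumptions chosen only to make the example self-contained.

```python
# A minimal Gibbs-sampler sketch of a hierarchical (random-coefficients) regression,
# illustrating the pooling idea described above.  This is NOT Sawtooth Software's
# HB-Reg; the priors, hyperparameters (a0, b0) and data layout are assumptions.
import numpy as np

def hb_regression(X, y, seg, n_draws=2000, a0=2.0, b0=1.0, seed=0):
    """X: (n, p) design matrix, y: (n,) response, seg: (n,) segment ids 0..S-1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S = int(seg.max()) + 1
    beta = np.zeros((S, p))      # segment-level coefficients
    mu = np.zeros(p)             # population mean of the segment coefficients
    tau2, sig2 = 1.0, 1.0        # between-segment and residual variances
    draws = []
    for it in range(n_draws):
        # 1) beta_s | rest: conjugate normal update, shrinking each segment toward mu
        for s in range(S):
            Xs, ys = X[seg == s], y[seg == s]
            cov = np.linalg.inv(Xs.T @ Xs / sig2 + np.eye(p) / tau2)
            mean = cov @ (Xs.T @ ys / sig2 + mu / tau2)
            beta[s] = rng.multivariate_normal(mean, cov)
        # 2) mu | rest: flat prior -> normal around the average of the segment betas
        mu = rng.multivariate_normal(beta.mean(axis=0), np.eye(p) * tau2 / S)
        # 3) tau2 | rest: inverse gamma (heterogeneity among segments)
        ss_b = ((beta - mu) ** 2).sum()
        tau2 = 1.0 / rng.gamma(a0 + S * p / 2.0, 1.0 / (b0 + ss_b / 2.0))
        # 4) sig2 | rest: inverse gamma (variance within segment)
        resid = y - np.einsum('ij,ij->i', X, beta[seg])
        sig2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + (resid ** 2).sum() / 2.0))
        if it >= n_draws // 2:   # keep the second half of the chain as posterior draws
            draws.append((beta.copy(), mu.copy(), tau2, sig2))
    return draws
```

Posterior means of the retained beta draws would serve as the segment-level estimates, with the spread of the tau2 draws summarizing heterogeneity among segments.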


REFERENCES

Allenby, Greg and J. L. Ginter (1995) "Using Extremes to Design Products and Segment Markets," Journal of Marketing Research, 32 (November), 392-403.

Bagozzi, Richard P. (1980) Causal Models in Marketing, New York: John Wiley & Sons.

Campbell, D. T. and D. W. Fiske (1959) "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix," Psychological Bulletin, 56, 81-105.

Churchill, Gilbert A. (1979) "A Paradigm for Better Measures of Marketing Constructs," Journal of Marketing Research, XVI (February), 64-73.

_______ and J. P. Peter (1984) "Research Design Effects on the Reliability of Rating Scales: A Meta-Analysis," Journal of Marketing Research, XXI (November), 360-75.

Cronbach, Lee (1951) "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, 16:3 (September), 297-334.

Green, Paul (1997) "VOICE: A Customer Satisfaction Model With an Optimal Effort Allocation Feature," Paper presented at the American Marketing Association's Advanced Research Techniques Forum.

Hanemann, W. Michael and B. Kanninen (1998) "The Statistical Analysis of Discrete Response CV Data," University of California at Berkeley, Department of Agricultural and Resource Economics and Policy, Working Paper No. 798.

Hosmer, David W. and S. Lemeshow (1989) Applied Logistic Regression. New York: Wiley Interscience.

Kamakura, Wagner, V. Mittal, F. deRosa, and J. Mazzon (2001) "Producing Profitable Customer Satisfaction and Retention," Paper presented at the American Marketing Association's Advanced Research Techniques Forum.

Menard, Scott (1995) Applied Logistic Regression Analysis. Sage Series in Quantitative Applications in the Social Sciences: 106. Thousand Oaks, CA: Sage.

Mittal, Vikas (1997) "The Non-linear and Asymmetric Nature of the Satisfaction and Repurchase Behavior Link," Paper presented at the American Marketing Association's Advanced Research Techniques Forum.


_________ and W. Kamakura (2000) "Satisfaction and Customer Retention: An Empirical Investigation," Paper presented at the American Marketing Association's Advanced Research Techniques Forum.

Mulhern, Michael G. (1999) "Assessing the Impact of Satisfaction and Dissatisfaction on Repurchase Intentions," AMA Advanced Research Techniques Conference, Poster Session, Santa Fe, NM.

_______ and Lynd Bacon (1997) "Improving Measurement Quality in Marketing Research: The Role of Reliability," Working Paper.

________ and Douglas MacLachlan (1992) "Using Analysis of Residuals and Logarithmic Transformations to Improve Regression Modeling of Business Service Usage," Sawtooth Software Conference Proceedings, Sun Valley, ID.

Myers, James H. (1999) Measuring Customer Satisfaction: Hot Buttons and Other Measurement Issues. Chicago: American Marketing Association.

Norusis, Marija/SPSS Inc. (1994) Advanced Statistics 6.1. Chicago: SPSS Inc.

Sawtooth Software (1999) "HB-Reg for Hierarchical Bayes Regression," Technical paper accessible at www.sawtoothsoftware.com.

Zaltman, G., Pinson, C. R. A., and R. Angelmar (1973) Metatheory and Consumer Research, New York: Holt, Rinehart, and Winston.


ACA/CVA IN JAPAN: AN EXPLORATION OF THE DATA IN A CULTURAL FRAMEWORK

Brent Soo Hoo Research Analyst, Gartner/Griggs-Anderson

Nakaba Matsushima Senior Researcher, Nikkei Research

Kiyoshi Fukai Senior Researcher, Nikkei Research

BACKGROUND

Ray Poynter asserted in his 2000 Sawtooth Conference paper, "Creating Test Data to Objectively Assess Conjoint and Choice Algorithms," that "The author's experience is that different cultures tend to have larger or smaller proportions of respondents who answer in this simplified way. For example, the author would assert that Japan has fewer extreme raters and Germany has more" (2000 Sawtooth Conference Proceedings, p. 150). Poynter called for more research data, and we believe we have some.

HYPOTHESIS

One of the big questions about using Sawtooth's ACA or CVA conjoint programs in Japan is the probable cultural problem of selecting from the centroid (e.g., rating "4," "5" or "6" from a standard ACA/CVA nine-point scale) on a pairwise comparison rating scale. This is what Ray Poynter calls the "simplified way." Does this exist or not? It is argued that this is not really a problem at all, but an artifact of the society in which the data is gathered, and that it leads to imprecise utility estimations that are still usable. Based on Japanese cultural homogeneity, our hypothesis is that Japanese people make different choices than Western people due to the cultural desire not to confront or be outspoken. Brand data is especially affected by this cultural desire. We have access to current data from Japan, which we analyzed for this potential problem. This is a phenomenon that has been widely acknowledged in Japan by various researchers and Sawtooth Software, but we can now examine some real data sets for this effect.

Data

We were fortunate to have access to eight recent ACA/CVA data sets conducted by Nikkei Research and/or Gartner/Griggs-Anderson in Japan (in Japanese characters) from 1998 to 2000. All of these studies had at least 85 respondents each, with a few of the studies having up to 300 total respondents. Four of these studies contained brand as an attribute. Nikkei Research contends that the role of brand in Japan is understated by exercises such as conjoint due to the fact that the conjoint does not properly address the external factors such as brand strength, marketing/name awareness and distribution. Why do foreign (Western) companies that have high-quality products have difficulty succeeding/selling in Japan?


Analysis Plan

We compared the utilities derived from the pairs to the utilities derived from the priors. There were differences in the results, although not with brand included as an attribute. The results were charted/graphed to show comparisons across the data available. The issue of branding was also analyzed by contrasting the data sets without brand as an attribute and the data sets with brand as an attribute. This contrast may help researchers design studies that are not culturally biased by the introduction of branding in the conjoint exercises. We examined the available holdout data sets from our Japanese data (termed "catalog choices" by Nikkei Research) and compared hit rate accuracy for predicting the holdout or catalog choices with the ACA/CVA simulator. Then, we used the stronger analysis routines contained in Hierarchical Bayes (HB) to improve the accuracy of predicting the holdout or catalog choices with HB-generated attribute utilities. Using this analysis could help make future ACA simulators more accurate and less influenced by "external factors." The result would be more accurate results from the data, which would please both clients and researchers.

In the cultural issues section, we examined the Japanese research with the researchers from Japan’s Nikkei Research. It is important to understand how Japanese culture and its homogeneous societal pressures affect branding when doing research in Japan. We will attempt to answer the question of why the data says what it does and how Japan is culturally different from the Western world. Many Western corporations are wondering how to penetrate the Japanese market. Ask Anheuser-Busch what happened to Budweiser beer in Japan; it did not fare well against the native Japanese competitive brands. We will look at some historic case studies from Japan of Western products’ inability to penetrate the market. What is strange is that in Japan there is much adoration of American cultural items/products, yet few of the big American brands/products are strong in Japan.

Finally, in relation to the use of ACA/CVA in Japan, we have unique access to a particular study fielded in Japan in November 2000. Data collection staff filled out a research methods observational worksheet on each respondent. Quantitative and qualitative feedback from the respondents on the ACA exercise and administration were gathered. This paper contains rich feedback data that will help the reader understand the Japanese mindset and reaction to choice exercises of this sort.

CENTROID ANALYSIS

We looked at ratings from the nine-point bidirectional scale from ACA and CVA pairs comparisons. A full profile CVA study was included to compare against the ACA partial profile routines. Does full profile (CVA) make a difference when compared to partial profile (ACA)?


OUR HYPOTHESIS

We believe that Japanese people select answers from the centroid (4, 5, 6) rather than from the outliers (1, 2, 3 or 7, 8, 9). Well-known Japan expert/researchers, George Fields, Hotaka Katahira and Jerry Wind, also addressed this subject in Leveraging Japan:

“Japanese survey respondents tend to answer in a narrower range than their Western peers. Early in his career in Japan, George Fields was involved in a project in which more than 30 concepts were to be screened, using the same techniques used in the United States, the United Kingdom, Germany and Australia. In each case, the most critical measure was a “top of the box” rating on a certain scale. On this basis, all concepts failed in Japan. Worse yet, the data were non-discriminating—that is, the figures were similar for all concepts. The client was warned of this possibility. However, the experience of all the other countries could not be ignored, and the client had the prerogative to conduct the exercise. The data were discriminating to some extent when the two top boxes were combined, but this raised a quandary in that one could not be sure whether the discrimination simply indicated differences between mediocrities (some are not as bad as others) or superiority of some over others.”1

1 Fields, George and Katahira, Hotaka and Wind, Jerry, Leveraging Japan, San Francisco, Jossey-Bass Inc., 2000, p. 283.


Fields et al. continue:

“The ‘theory,’ simply stated, is that when confronted with a verbal seven-point scale, or a range from ‘excellent’ to ‘terrible,’ Westerners tend to start from the extremes. They consider a position quickly and then perhaps modify it, which means working from both ends and moving toward the center. The Japanese, on the other hand, tend to take a neutral position or start from the midpoint and move outward, seldom reaching the extremes, hence the low top-of-the-box ratings.”2

This is not a one-time occurrence. In another study, "A company that screened advertising concepts in both Japan and the United States found marked differences in responses to the survey. The verbal scale was far less sensitive in Japan. Far fewer said, 'I like it very much,' or 'I dislike it very much.' They clung to the middle. About 20 percent of the American respondents chose the highest box compared to just 7 percent of Japanese."3 Fields et al. conclude, "Using even number scales, or those without a strict midpoint, doesn't really resolve the issue. While this explanation is a little pat, it fits our usual observations of the Japanese being very cautious to take up a fixed position before all known facts and consequences are weighed. How, then, can one give a clear opinion on a product or an advertisement in a single-shot or short-term exposure? Here, time is the substance rather than the number of exposures."4 Many researchers have observed this phenomenon; isn't it time we tried to quantify it?

OUR CRITERIA

Defining the "centroid" people: 50% or more of a respondent's pairs answers fall in the range 4, 5, 6.

Defining the "outlier" people: 50% or more of a respondent's pairs answers fall in the range 1, 2, 3 or 7, 8, 9.

(A sketch of this classification rule follows the results table below.)

OUR STUDIES/DETAILS

Study 1: Nikkei Japanese Finance Study (July 2000). Tokyo and Osaka (n=307), six attributes with 30 total attribute levels, 37 pairs shown, ACA 3.0 converted to ACA 4.0 for analysis purposes, no holdouts/catalog, no brand attribute.

Study 2: Gartner Networking Product Study (November 2000). n=123, 14 attributes with 40 total attribute levels (the client designed the ACA design, not Gartner/Griggs-Anderson), 35 pairs shown, ACA 3.1 for data collection converted to ACA 4.0 for analysis purposes, has holdouts, no brand attribute; respondent observation worksheet and respondent comments collected by interviewers.

Study 3: Nikkei Japanese Business Car Study (February 1998). n=107, five attributes with 15 total attribute levels, ACA 3.0, 12 pairs shown, no holdouts/catalog, has brand attribute.

2 Ibid. Fields et al., p. 283.
3 Ibid. Fields et al., p. 283.
4 Ibid. Fields et al., p. 283-284.


Study 4: Nikkei Japanese Passenger Car Study (February 1998). n=185, five attributes with 13 total attribute levels, ACA 3.0, eight pairs shown, no holdouts/catalog, has brand attribute.

Study 5: Nikkei Japanese Software Study (March 1999). n=85, five attributes with 19 total attribute levels, ACA 3.0, 16 pairs shown, no brand attribute, has holdouts/catalog.

Study 6: Nikkei/Gartner Japan/America Cross-Cultural Research Study (July 2001), Japanese segment. n=228, eight attributes with 33 total attribute levels, ACAWeb with CiW for demographics, 30 pairs shown, has holdouts/catalog, has brand attribute.

Study 7: Nikkei/Gartner Japan/America Cross-Cultural Research Study (July 2001), American segment. n=85, eight attributes with 33 total attribute levels, ACAWeb with CiW for demographics, 30 pairs shown, has brand attribute, has holdouts/catalog.

Study 8: Gartner Japanese Networking Product Study (spring and summer 1998). n=250, eight attributes with 25 total attribute levels, CVA 2.0 full profile design using Japanese Ci3, 30 pairs shown, has brand attribute, no holdouts/catalog.

DATA PROCESSING

It required extensive formatting and data processing to get Japanese ACA 3.x format data into usable ACA 4.0 and ACA/HB formats. This work was done using basic text editors and individually sifting through the ACD interview audit trail file for each study. ACA/HB does not work with ACA 3.x format ACD files. Additional statistical work was done in SPSS and SAS. Due to client confidentiality, some studies/attributes will be heavily masked. We have shown which attributes and levels go together when the exact attribute names and levels cannot be disclosed. Our hope is that this information will be useful even in its masked form. We have identified at least one attribute as much as possible in each data set, to give the reader an idea of how the brand or price attribute reacted. The results of the centroid/outlier analysis follow.

Centroid vs. Outliers: Percentage of Total Respondents

Study      Centroid   Outlier
1          24.8%      75.2%
2          24.4%      75.6%
3          51.4%      48.6%
4          57.8%      42.2%
5          28.2%      71.8%
6 (PCJ)    38.6%      61.4%
7 (PCE)    22.4%      77.6%
8 (CVA)    36.8%      63.2%

Note: Different research topics, respondent types and different studies can yield different results. If the two Car studies (Studies 3 and 4) and the PCE study (Study 7) are excluded, the result is an average of 30.56% centroid and 69.44% outlier.
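As noted under OUR CRITERIA, here is a minimal sketch of the classification rule behind these percentages. The input layout (a respondent-by-pairs matrix of 1-9 ratings) is an assumption; a real ACA export would need reshaping first, and ties at exactly 50% are counted as centroid, which the criteria leave ambiguous.

```python
# A minimal sketch of the centroid/outlier rule defined under OUR CRITERIA.  `pairs` is
# assumed to be an (n_respondents, n_pairs) array of 1-9 ratings; a real ACA export
# would need reshaping first.  Ties at exactly 50% are counted as centroid here.
import numpy as np

def classify_respondents(pairs):
    pairs = np.asarray(pairs)
    in_centroid = (pairs >= 4) & (pairs <= 6)   # answers of 4, 5 or 6
    share = in_centroid.mean(axis=1)            # share of each respondent's pairs in 4-6
    centroid = share >= 0.5                     # 50% or more -> "centroid" person
    return centroid, ~centroid                  # the rest are "outlier" people

# Example usage:
# centroid, outlier = classify_respondents(pairs_matrix)
# print(f"Centroid: {centroid.mean():.1%}, Outlier: {outlier.mean():.1%}")
```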


DATA TO COMPARE: A STUDY WITH BOTH JAPANESE AND ENGLISH SEGMENTS

Japan/America Cross-Cultural Research Study. Data collection occurred in July 2001. Nikkei Research hosted and programmed the survey from its Japanese Web server (web.nikkei-r.co.jp/pcj or pce), and fielded dual studies running in Japanese and English at approximately the same time. The research topic was notebook computers. Brand was included in the attribute set. The software used was ACA/Web with CiW. The study included holdouts, which were the same for all respondents (Japanese/American). For the Japanese segment, the Nikkei Research Panel was used as the list in Japan; 3000 e-mail invitations were sent, with no reminder e-mails. For the American segment, Gartner Panel invitees were used as the list source; 1921 e-mail invitations were sent and 1660 reminder e-mails were sent. To qualify, respondents had to be planning to purchase a notebook computer in the next year. This question was asked within the first four questions to disqualify anyone not planning to purchase during this time period. We spent a lot of time analyzing the data from these two studies, since they are the only pair that were done in both countries. Since this was not academic research, we did not have the luxury of an American or European segment for the other jobs. The cost of doing research in Japan is high, and clients cannot always, or do not always want to, conduct studies in multiple countries.

Diagnostics

Japanese completes were 228, plus 758 who tried to participate but were disqualified: a 7.6% response rate for qualified respondents completing the study, and a 32.86% response rate overall. American completes were 85, plus 91 who tried to participate but were disqualified: a 4.4% response rate for qualified respondents completing the study, and a 9.16% response rate overall. Japanese part-way terminations were n=35; American part-way terminations were n=40. Based on the ratio of completed interviews to part-way terminations, the Japanese are more likely to finish what they start. The ACA task is one that we have found generally takes Japanese people more time than Westerners. We hypothesize that this is because Japanese people are very detail-oriented and read the attributes in the pair-wise comparisons closely.

Respondent Co-Ops

There was a co-op prize drawing of 1000 Yen (US$8) for 50 respondents in Japan (US$400 total) and a co-op prize drawing for a unique Japanese item worth $10 for 25 respondents in America (approximately US$250 total). Comments from American respondents were positive for the unique omiyage or gift.


Demographics

Purchase Intention   Japan (n=228)   Japan %    America (n=85)   America %
1-6 months           102             44.74%     42               49.41%
7-12 months          126             55.26%     43               50.59%

Gender               Japan (n=228)   Japan %    America (n=85)   America %
Male                 119             52.19%     68               80.00%
Female               109             47.81%     17               20.00%

Age Distribution

[Bar chart: Age Distribution, Japan (n=228) vs. America (n=85); values are given in the table below.]

Age        Japan (n=228)   America (n=85)
0-19       2.63%           1.18%
20-24      9.65%           0.00%
25-29      16.67%          8.24%
30-34      25.00%          8.24%
35-39      23.68%          20.00%
40-44      13.60%          20.00%
45-49      4.82%           23.53%
50-54      2.63%           8.24%
55-59      0.88%           7.06%
60+        0.44%           2.35%
Refused    0.00%           1.18%


Holdout Specifications

Attribute      Holdout 1   Holdout 2   Holdout 3   Holdout 4   Holdout 5   Holdout 6
Brand          IBM         Sony        Toshiba     NEC         COMPAQ      Dell
CPU Brand      Pent. III   Celeron     Celeron     Athlon      Athlon      Pent. III
Proc. Speed    600 MHz     600 MHz     800 MHz     800 MHz     1.0 GHz     1.0 GHz
RAM            128 MB      64 MB       128 MB      128 MB      128 MB      64 MB
Display Size   12.1"       12.1"       14.1"       14.1"       14.1"       14.1"
Extra Drive    None        DVD         DVD         CD-RW       CD-RW       CD-ROM
Weight         4.4 lbs.    4.4 lbs.    6.6 lbs.    8.8 lbs.    6.6 lbs.    6.6 lbs.
Price          $1,500      $1,500      $2,000      $1,500      $2,000      $1,000

Holdout Results (averages), 0-to-100 purchase likelihood scale

Holdout               Japan (n=228)   America (n=85)
Holdout 1 (IBM)       36.39           31.32
Holdout 2 (Sony)      47.00           38.11
Holdout 3 (Toshiba)   36.73           45.55
Holdout 4 (NEC)       38.74           44.50
Holdout 5 (Compaq)    30.91           51.11
Holdout 6 (Dell)      30.91           61.97

As our discussant Ray Poynter noted, we put the Japanese in a tough choice situation. By marrying the better features to the least preferred brand in the Japanese segment (Dell), the desirability of Holdout 6 was dampened. The exact opposite problem occurs with Holdout 2, the Sony product, which is the most preferred Japanese brand yet it has a poor specification on the other attributes. The problem doesn’t exist to that extent on the American holdouts, although there is not a really strong holdout overall with Holdout 6 leading at only a 61.97 average out of a possible 100 points. Obviously, lack of differentiation in the holdouts can lead to lower hit rate on the predictions. We discussed this with the conference attendees to explore ideas on holdout formulation. It is not an easy choice, since a clearly superior holdout may predict better, but may not accurately represent the marketplace.


ACA Holdout Prediction

Study      Sample size   # of accurate predictions (ACA)   % of accurate predictions
2          n=123         84                                67.74%
5          n=85          24                                28.23%
6 (PCJ)    n=228         97                                42.54%
7 (PCE)    n=85          49                                57.64%

Notes: Study 2 used client-generated holdouts; three "bad" options led most respondents to one "good" option. Study 5 had no brand in the ACA exercise; the study was on existing products, brand was shown in the holdouts, and strong brands affect purchasing. For Studies 6 and 7 (PCJ/PCE), Nikkei Research created holdouts based on actual products in catalogs; however, there are no clear winners, as shown in the holdout preferences, and there is pressure between priors and pairs. This was a hard task for Japanese respondents, since the best brand was coupled with poor specs and the worst brand was coupled with better specs.

HB Holdout Prediction

Study      Sample size   # of accurate predictions (HB)   % of accurate predictions
2          n=123         81                               65.85%
5          n=85          26                               30.58%
6 (PCJ)    n=228         96                               42.10%
7 (PCE)    n=85          49                               57.64%

Notes: Study 2 used client-generated holdouts; three "bad" options led most respondents to one "good" option; HB did not help. Study 5 had no brand in the ACA exercise; the study was on existing products, brand was shown in the holdouts, and strong brands affect purchasing; there was slight improvement with HB. For Studies 6 and 7 (PCJ/PCE), Nikkei Research created holdouts based on actual products in catalogs; however, there are no clear winners, as shown in the holdout preferences, and there is pressure between priors and pairs. This was a hard task for Japanese respondents, since the best brand was coupled with poor specs and the worst brand was coupled with better specs. HB did not help.
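The hit rates above count respondents whose predicted first choice among the six holdouts matched the holdout they actually rated highest. A hedged sketch of that calculation follows; the inputs (summed part-worth totals per holdout profile and the 0-100 holdout ratings) are assumptions about layout, not the ACA/HB file formats.

```python
# A hedged sketch of the first-choice hit-rate calculation behind these tables.  The
# input layout is assumed, not taken from the ACA/HB file formats.
import numpy as np

def hit_rate(predicted_totals, observed_ratings):
    """Both arguments: (n_respondents, n_holdouts) arrays."""
    pred_choice = np.argmax(predicted_totals, axis=1)   # holdout the utilities favor
    obs_choice = np.argmax(observed_ratings, axis=1)    # holdout the respondent rated highest
    return float((pred_choice == obs_choice).mean())

# hit_rate(aca_totals, holdout_answers) would return about 0.4254 if 97 of 228
# respondents match, as reported for Study 6 (PCJ) above.
```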


PRIORS VS. PAIRS UTILITIES

Question: Is there a difference between the self-explicated priors utilities and the pairs utilities?

Hypothesis: There will be more Japanese in the centroid because they are culturally less likely to stand out/be outspoken.

ACA/HB was used to calculate the pairs utilities, since ACA 3.x does not compute these. Nikkei Research programmed a routine to calculate the priors utilities from the raw ACD logfiles based on Sawtooth Software's methods from ACA 4.0. ACA PTS files were generated, showing just the mean utilities for comparison. Significant differences are flagged at the 95% confidence level. Shown below are the priors/pairs utilities for Studies 6 and 7; all other priors/pairs utilities are in Appendix A. (A sketch of the significance comparison follows the tables.)

Priors vs. Pairs Utilities

Study 6 (PCJ), n=228:

Attribute Level        Priors    Pairs
PC Brand - Sony        36.19*    39.50*
PC Brand - Dell        12.80     11.42
PC Brand - IBM         25.05*    28.16*
PC Brand - NEC         23.77*    29.94*
PC Brand - Toshiba     21.27*    26.59
PC Brand - Compaq      11.66     12.40
CPU Pentium III        37.26*    33.38*
CPU Crusoe             13.75*    9.13*
CPU Celeron            10.67     8.88
CPU Athlon             7.06      7.35
CPU Speed 600 MHz      0.00*     2.39*
CPU Speed 800 MHz      23.70*    16.69*
CPU Speed 1.0 GHz      47.41*    25.35*
64 Mb RAM              0.00*     0.71*
128 Mb RAM             26.93*    24.02*
256 Mb RAM             53.86*    39.29*
12.1" display          7.55      6.58
13.3" display          15.00     15.47
14.1" display          27.21*    21.26*
15.0" display          33.19*    25.88*
No extra drive         2.50*     0.57*
CD-ROM drive           30.24*    33.71*
DVD-ROM drive          36.86*    41.34*
CD-RW drive            40.59*    56.33*
CD-RW/DVD-ROM          44.46*    67.22*
4.4 lbs                47.14*    35.33*
6.6 lbs                20.97     22.14
8.8 lbs                0.45*     1.53*
$1,000                 56.18     55.99
$1,500                 42.14*    45.64*
$2,000                 28.09*    33.80*
$2,500                 14.05*    19.83*
$3,000                 0.00*     2.19*

* Significant difference at 95% confidence level.

Study 7 (PCE), n=85:

Attribute Level        Priors    Pairs
PC Brand - Sony        25.05     24.28
PC Brand - Dell        26.72*    19.84*
PC Brand - IBM         23.72     22.81
PC Brand - NEC         7.08      7.83
PC Brand - Toshiba     23.03     23.83
PC Brand - Compaq      19.31     18.84
CPU Pentium III        40.42     41.50
CPU Crusoe             9.03*     12.99*
CPU Celeron            8.55      6.74
CPU Athlon             19.22*    26.08*
CPU Speed 600 MHz      0.00*     1.15*
CPU Speed 800 MHz      24.33*    14.84*
CPU Speed 1.0 GHz      48.66*    21.33*
64 Mb RAM              0.00      0.17
128 Mb RAM             27.53     27.09
256 Mb RAM             55.06*    42.81*
12.1" display          0.27      0.71
13.3" display          13.22*    16.97*
14.1" display          33.68*    30.45*
15.0" display          43.92     40.25
No extra drive         0.55      0.15
CD-ROM drive           38.92     35.60
DVD-ROM drive          36.01*    41.91*
CD-RW drive            37.10*    52.27*
CD-RW/DVD-ROM          44.86*    71.76*
4.4 lbs                42.25*    30.95*
6.6 lbs                22.09     19.47
8.8 lbs                0.50      0.52
$1,000                 51.56     55.27
$1,500                 38.67     41.32
$2,000                 25.78*    31.15*
$2,500                 12.89*    17.50*
$3,000                 0.00*     1.61*

* Significant difference at 95% confidence level.
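As noted before the tables, the paper does not state which test produced the 95%-confidence flags. One plausible reading is a paired comparison of individual-level priors and pairs utilities for each attribute level; the sketch below uses a paired t-test for illustration, so treat the function name and data layout as assumptions rather than the authors' documented method.

```python
# One plausible reading of the 95%-confidence flags above: a paired comparison of
# individual-level priors vs. pairs utilities for each attribute level.  The paired
# t-test, function name and data layout are illustrative assumptions.
import numpy as np
from scipy import stats

def flag_significant_levels(priors, pairs, alpha=0.05):
    """priors, pairs: (n_respondents, n_levels) arrays of individual-level utilities."""
    flags = []
    for j in range(priors.shape[1]):
        res = stats.ttest_rel(priors[:, j], pairs[:, j])   # paired t-test per level
        flags.append(res.pvalue < alpha)                   # True -> mark the level with '*'
    return np.array(flags)
```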

DISCUSSANT ISSUES

Ray Poynter gave us some valuable feedback on the shape of the data in comparison to the absolute values of the data. In his ad hoc analysis of the aggregate data, there was a lot of shape consistency. We did a frequency distribution of the answer range (one through nine) for Studies 6 and 7. There were approximately twice as many pairs answers in the "5" midpoint position for the Japanese.

Frequency Distribution of Study 6 (PCJ), 30 pairs x 228 respondents:
1 - 7.7%   2 - 5.2%   3 - 15.5%   4 - 10.0%   5 - 18.4%   6 - 12.7%   7 - 17.1%   8 - 5.1%   9 - 8.3%

Frequency Distribution of Study 7 (PCE), 30 pairs x 85 respondents:
1 - 6.8%   2 - 7.5%   3 - 16.2%   4 - 10.4%   5 - 9.3%   6 - 11.8%   7 - 20.7%   8 - 7.5%   9 - 9.6%
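To put a number on the difference between the two distributions above, the sketch below runs a chi-square test of homogeneity using counts reconstructed from the reported percentages. This is an illustration we added, not an analysis from the paper, and it ignores the clustering of answers within respondents, so the result is only indicative.

```python
# An added illustration (not an analysis from the paper): a chi-square test of
# homogeneity on the two response distributions above, using counts reconstructed
# from the reported percentages (30 pairs x 228 and 30 pairs x 85 answers).
import numpy as np
from scipy.stats import chi2

pcj_pct = np.array([7.7, 5.2, 15.5, 10.0, 18.4, 12.7, 17.1, 5.1, 8.3])  # Study 6, ratings 1-9
pce_pct = np.array([6.8, 7.5, 16.2, 10.4, 9.3, 11.8, 20.7, 7.5, 9.6])   # Study 7, ratings 1-9

obs = np.vstack([np.round(pcj_pct / 100 * 30 * 228),   # ~6840 Japanese pair answers
                 np.round(pce_pct / 100 * 30 * 85)])    # ~2550 American pair answers

expected = obs.sum(axis=1, keepdims=True) @ obs.sum(axis=0, keepdims=True) / obs.sum()
stat = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
print(f"chi-square = {stat:.1f}, df = {dof}, p = {chi2.sf(stat, dof):.3g}")
```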

SHAPE VS. ABSOLUTE VALUES

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
PC Brand - Sony        36.19     39.50        25.05     24.28
PC Brand - Dell        12.80     11.42        26.72     19.84
PC Brand - IBM         25.05     28.16        23.72     22.81
PC Brand - NEC         23.77     29.94        7.08      7.83
PC Brand - Toshiba     21.27     26.59        23.03     23.83
PC Brand - Compaq      11.66     12.40        19.31     18.84
Imp                    24.53     28.08        19.64     16.45
rSquare                0.95                   0.84
distance               9.46                   7.08

Note: Brand is more important in Japan; top American brands are not worth much in Japan.


                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
CPU Pentium III        37.26     33.38        40.42     41.50
CPU Crusoe             13.75     9.13         9.03      12.99
CPU Celeron            10.67     8.88         8.55      6.74
CPU Athlon             7.06      7.35         19.22     26.08
Imp                    30.20     26.03        31.87     34.76
rSquare                0.98                   0.94
distance               6.30                   8.20

Note: CPU chip is less of an issue in Japan; only Pentium will do. America shows a plus for both Pentium and Athlon.

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
CPU Speed 600 MHz      0.00      2.39         0.00      1.15
CPU Speed 800 MHz      23.70     16.69        24.33     14.84
CPU Speed 1.0 GHz      47.41     25.35        48.66     21.33
Imp                    47.41     22.96        48.66     20.18
rSquare                0.98                   0.96
distance               23.27                  28.95

Note: CPU speed shows the weakness of priors; the attribute looks important until it is traded off. Identical results in Japan/America.

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
64 Mb RAM              0.00      0.71         0.00      0.17
128 Mb RAM             26.93     24.02        27.53     27.09
256 Mb RAM             53.86     39.29        55.06     42.81
Imp                    53.86     38.58        55.06     42.64
rSquare                0.99                   0.98
distance               14.87                  12.26

Note: Similar weakness of priors, but not as pronounced; the attribute looks important until it is traded off. Identical results in Japan/America.

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
12.1" display          7.55      6.58         0.27      0.71
13.3" display          15.00     15.47        13.22     16.97
14.1" display          27.21     21.26        33.68     30.45
15.0" display          33.19     25.88        43.92     40.25
Imp                    25.64     19.30        43.65     39.54
rSquare                0.96                   0.98
distance               9.49                   6.18

Note: Screen size is less important in Japan; footprint/size is a hidden attribute, and its importance in Japan pulls opposite to large screen sizes.


                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
No extra drive         2.50      0.57         0.55      0.15
CD-ROM drive           30.24     33.71        38.92     35.60
DVD-ROM drive          36.86     41.34        36.01     41.91
CD-RW drive            40.59     56.33        37.10     52.27
Combo drive            44.46     67.22        44.86     71.76
Imp                    41.96     66.65        44.31     71.61
rSquare                0.94                   0.84
distance               28.31                  31.62

Note: The opposite of the priors problem occurs; the priors cannot fully express the importance of the attribute in the pairs. The machine must be able to read CDs, a "no brainer."

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
4.4 lbs                47.14     35.33        42.25     30.95
6.6 lbs                20.97     22.14        22.09     19.47
8.8 lbs                0.45      1.53         0.50      0.52
Imp                    46.69     33.80        41.75     30.43
rSquare                0.96                   0.99
distance               11.92                  11.60

Note: Weight is less relevant when assessed in pairs comparisons.

                       Study 6 (PCJ)          Study 7 (PCE)
                       Priors    Pairs        Priors    Pairs
$1,000                 56.18     55.99        51.56     55.27
$1,500                 42.14     45.64        38.67     41.32
$2,000                 28.09     33.80        25.78     31.15
$2,500                 14.05     19.83        12.89     17.50
$3,000                 0.00      2.19         0.00      1.61
Imp                    56.18     53.80        51.56     53.66
rSquare                0.99                   1.00
distance               9.12                   8.57

Note: Price is important, but behind the drive attribute. Similar results occur in Japan and the U.S.
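The rSquare and distance lines in these tables can be reproduced as follows: rSquare is the squared correlation between the priors and pairs values, and the distance figure appears to be the Euclidean distance between the two vectors (this reproduces, for example, 9.46 for PC Brand and 6.30 for CPU in Study 6). The paper does not define the distance measure explicitly, so treat that reading as an inference; the sketch below is ours, not the authors' code.

```python
# A sketch (ours, not the authors' code) of how the rSquare and distance lines can be
# reproduced.  The Euclidean-distance reading of "distance" is an inference that
# happens to reproduce the published figures.
import numpy as np

def shape_vs_absolute(priors, pairs):
    priors, pairs = np.asarray(priors, float), np.asarray(pairs, float)
    r_square = np.corrcoef(priors, pairs)[0, 1] ** 2   # agreement in shape
    distance = np.linalg.norm(priors - pairs)          # disagreement in absolute values
    return r_square, distance

# PC Brand, Study 6 (PCJ): returns approximately (0.95, 9.46), matching the table above.
print(shape_vs_absolute([36.19, 12.80, 25.05, 23.77, 21.27, 11.66],
                        [39.50, 11.42, 28.16, 29.94, 26.59, 12.40]))
```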


BRAND ISSUES

We looked at "brand" and "no brand" in the pairs and the centroid/outlier issue to determine whether the presence or absence of a brand attribute drove answers to the centroid or the outliers. This analysis was only possible when brand was included in the attributes (four studies). Surprisingly, brand did not seem to drive answers to the centroid or the outliers. Perhaps the Japanese are not as brand-conscious as previously thought; the literature notes that this brand awareness and loyalty is fading.

Study      No brand   % centroid   Brand    % centroid
3          n=55       51.40%       n=57     53.27%
4          n=120      64.86%       n=95     51.35%
6 (PCJ)    n=87       38.15%       n=86     37.71%
7 (PCE)    n=20       23.52%       n=17     20.0%

Qualitative Feedback: From Study 2's Respondent Observation Worksheet

• 36% of respondents asked the interviewer(s) five or more questions during the ACA process.

• 48% of respondents looked at the product definition/glossary sheet only once. (It is a good idea to include a product definition/glossary sheet that goes over all the attributes and levels used in the conjoint study, especially when researching technical products.) In U.S. studies, Gartner's experience has generally seen more usage of the product definition/glossary sheet.

• 36% of respondents took approximately 10 seconds to make a choice in pairs at the start of the pairs section.

• 33% of respondents took approximately five seconds to make a choice in pairs at the end of the pairs section.

• 18% of Japanese respondents did not seem to be comfortable with the ACA tasks.

• 36% of Japanese respondents said unprompted that ACA was a hard task.

Many comments came back from Japanese research participants on ACA Study 2. The top five comments follow:

• "Seems like a psychological exam/analysis."

• "Too long. Too many questions/pairs."

• "Difficult choices. Exhausting task."


• “I don’t know if I’m being consistent with answers from start to finish. Inconsistency.”

• “I’m curious about total results and analysis to come—how the data/research design is analyzed.”

CULTURAL ISSUES

Japan is the second largest consumer market in the world.5 Japan was ranked second in Gross Domestic Product (GDP) at 15.8% of the total world economy, compared to the U.S.'s 25.5%. This ranking is even more impressive considering the smaller size of the population of Japan in comparison to the U.S., and the rest of the world for that matter. Even with the crash of the Asian markets in the past few years, Japan is an economic power and a market worth pursuing. If anything, the recent economic downturns have increased opportunities for Western companies in Japan. Japan's number two automobile manufacturer, Nissan, had to sell 36.8% of the company to French automobile manufacturer Renault in order to keep afloat.6 Nissan has returned to profitability since that happened. Historically "sky high" real estate prices have evened out, approaching levels for similar space in the U.S.7 As real estate and currency have equalized, more and more Western companies have gone to Japan in search of profits from this lucrative market.

Japan is culturally a hard place to break into due to cultural homogeneity/insularity. Being an island nation with a unique language and a strong work ethic has kept foreign labor and immigrants to a minimum. The Japanese people did not need foreign workers to work the farms as other countries did. The history of Japan has made things happen a certain way. In the famous management book, Theory Z, William Ouchi states that "this characteristic style of living paints the picture of a nation of people who are homogenous with respect to race, history, language, religion and culture. For centuries and generations these people have lived in the same village next door to the same neighbors. Living in close proximity and in dwellings which gave little privacy, the Japanese survived through their capacity to work together in harmony. In this situation, it was inevitable that the one most central social value which emerged, the one value without which the society could not continue, was that an individual does not matter."8 Ouchi discusses the basis for teamwork which came from the need to cooperate to farm. He makes comparisons of cooperative clusters of Japanese households around Japanese farms and the space separation of homesteads in the U.S.9 The very basis of how the Japanese have been raised in an environment of teamwork and cooperation stands against the American ideals of self-sufficiency and independence. Japanese society is built on this environment of teamwork and working things out.

One thing that is not as pronounced in Japan is the large number of lawyers and lawsuits that are part of the everyday culture in the U.S. "In 1998 Japan had one lawyer for every 6600 people, the lowest per capita figure amongst major industrialized nations. In the U.S. at the time

5 Ibid. Fields et al., p. 9-11. 6 Kageyama, Yuri, “Bargain Shopping in Japan,” The Oregonian, 8/15/2001, Section B, p.1-2. 7 Ibid. Kageyama, p. 1-2, and Fields, p. 3. 8 Ouchi, William G., Theory Z, New York, Avon Addison-Wesley Publishing Company, 1981, p.54-55. 9 Ibid. Ouchi, p.54-55.


there was one lawyer for every 300 people and one for every 650 people in the U.K."10 The implicit cultural message is to avoid confrontation. For the Japanese, this goes hand in hand with the desire for long-term relationships. These types of lawyer statistics "clearly reflect the American concern for legal rights, which sharply define many relationships in the U.S., both professional and personal. The Japanese, on the other hand, are less concerned about legal rights and duties and what's legally mine or yours than about the quality of a relationship in terms of longevity and mutual supportiveness."11

Societal politeness in Japan is something that Westerners are not very familiar with. Our discussant Ray Poynter commented that of the Western nations, perhaps the U.K. is the most familiar with the levels of societal politeness as a result of an awareness of class structure and royalty in the U.K. From the measured way that one introduces oneself in a business setting to the tone/vocabulary of the Japanese language in precise situations, Japan is unlike Western countries. Anyone who has done business in Japan can tell you of the formal presentation of meishi or business cards.12 Entire books and chapters in books about Japan are devoted to the differences in communication styles.13 In the West, we work in polarized black and white distinctions. In Japan, it is more shades of gray. It would be hard to get agreements immediately in Japanese business. The answers given seem to be noncommittal to Westerners. Part of this is attributable to the need for a group decision. The people you may be meeting with need to take the proposal back to their team for a group decision and consensus. The lack of an immediate answer could also be met with just a smile or silence from a Japanese person. Instead of giving you a strict “no” and causing you a loss of face, they would rather be ambiguous.

The concept of “face” is also very foreign to Westerners. The basis of Japanese society on social surface harmony calls for not embarrassing anyone. “In tandem with the Japanese desire for saving face and maintaining surface harmony is their aversion to the word ‘no.’”14 The Japanese person will probably say something like “cho muzukashii” or “it’s a little bit difficult” when pressed for an answer that may be uncomfortable to either the questioner or respondent.

Although this is changing with the most recent generations of Japanese people, there is a societal unwillingness to stand out from the crowd. Until the past 10 or so years, it was unheard of to dye hair in Japan; now hair dye is a market estimated at 53 billion Yen (approximately $530 million).15 However, general working-class Japanese strive simply to fit in. Workers are often supplied with company uniforms, and almost anyone who has visited Japan on business is familiar with the dark suit of the salary man. Work units are set up in teams, with management working elbow to elbow with lower echelon employees. In a Japanese office, there is a lack of Western cubicles and private space. People sit across from each other at communal, undivided tables. This adds to the teamwork in the workplace that Japanese people are famous for. In the U.S., the cubicles are individually oriented. A Web commentary from the Asia Pacific

10 Melville, Ian, Marketing in Japan, Butterworth-Heinemann, 1999, p.60-61. 11 Deutsch, Mitchell F., Doing Business with the Japanese, New York, Mentor NAL, 1983, p.68. 12 Rowland, Diana, Japanese Business Etiquette, New York, Warner Books, 1985, p.11-17. See also commentary on translation tone/context in Melville p. 121. See also commentary on translation/market research in Christopher, Robert C., American Companies in Japan, New York, Fawcett Columbine, 1986, p.128. 13 Shelley, Rex, Culture Shock! Japan, Kuperard, 1993, p. 116-135. 14 Ibid. Rowland, p.31, Melville, p.107, and Deutsch, p. 80-83. 15 Ibid. Fields et al. p. 12.


Management Forum describes the Japanese concept of "true heart" or magokoro, which describes the Japanese psyche: "a person who exhibits all the characteristics that have traditionally been attributed to the ideal Japanese—one who meticulously follows the dictates of etiquette, is scrupulously truthful and honest, can be trusted to fulfill all his obligations, and will make any sacrifice necessary to protect the interests of friends or business partners."16 That is not a common Western ideal, where interest in individual needs takes precedence. The word for 'individualism' in Japanese, kojinshugi, is noted by Japan-based American journalist Robert Whiting as having a negative connotation. "The U.S. is a land where the hard individualist is honored…In Japan, however, kojinshugi is almost a dirty word."17

Japan has on average 2.21 more levels in its product distribution structure than do Western countries.18 This leads to lower profit margins at each step and potentially higher prices. The basis of these relationships goes back to the keiretsu, or aligned business group. The large companies of Japan have been around for hundreds of years; they have built business relationships that cross into many product areas and distribution nets. The alignment of these companies as suppliers, consumers and distributors of component goods and products ties all large-scale Japanese businesses together. From Dodwell's Industrial Groupings in Japan chart, the multitude of various complex relationships can be seen.19

16 De Mente, Boyd Lafayette, "Monthly Column," Asia Pacific Management Forum, September 2000, p. 1-3; www.apmforum.com/columns/boye42.htm. See also Deutsch, p. 139. 17 Whiting, Robert, You Gotta Have Wa, New York, Vintage Books, 1989, p. 66. According to De Mente, "wa" incorporates mutual trust between management and labor, harmonious relations among employees on all levels, unstinting loyalty to the company (or team), mutual responsibility, job security, freedom from competitive pressure from other employees, and collective responsibility for both decisions and results: www.apmforum.com/columns/boye11.htm, p. 2. 18 Anderson UCLA Grad School Web resource, www.anderson.ucla.edu/research/japan/t4/sup2art.htm. 19 Czinkota, Michael R. and Woronoff, Jon, Unlocking Japan's Markets, Rutland (VT), Charles E. Tuttle Company, 1991, p. 34.


[Figure: Industrial groupings in Japan, showing the six major industrial groups, two medium industrial groups led by leading banks, and vertically integrated groups (names in the chart include Mitsui, Mitsubishi, Sumitomo, Fuyo, DKB, Sanwa, Tokai, IBJ, Nippon Steel, Toyota, Nissan, Toshiba, Hitachi, Matsushita, Tokyu, Seibu Saison, Morimura, Oji, Kawasaki, Furukawa, Meiji) and the relationships among them. Credit: Dodwell Marketing Consultants, Industrial Groupings in Japan, 1988, p. 5.]

Long-standing relationships are what Japanese society is built upon, from how businesses recruit and keep employees to how business suppliers/manufacturers are selected. "An overriding concern for many Japanese businesses is loyalty and, related thereto, stability. It is therefore regarded as advantageous to create a more structured framework for business activities. This spawns the long-standing and time-honored relationships many Japanese take pride in."20 Over and over, the literature on Japan quotes successful Western businessmen as saying that it will take "five to 10 years to turn a profit."21 These relationships go all the way down to the personal level, a sort of business contact net. The Japanese term for this is jinmyaku. In Japanese business, it is often who you know that drives success, not how good your product is. This is changing somewhat now, but these relationships are contrary to similar business relationships in Western countries.

20 Ibid. Czinkota and Woronoff, p. 35-36. See also ”Long-Term Perspective,” Czinkota and Woronoff, p. 179-180. 21 Ibid. Deutsch, p. 155.


Cultural differences in product formulation are evident. We missed a crucial attribute in our PCJ study in Japan by omitting "footprint/size." Because of the limited space in Japanese offices and homes, the size of an appliance or piece of equipment is very important. This can be seen all the way down to the miniature size of personal electronics in Japan. Wireless communications giant DoCoMo has been able to drive the development of tiny wireless Internet phones, working with major manufacturers but branding them with the DoCoMo brand name.22 The packaging of a product in Japan is so important because of the gift-giving culture on both a personal and corporate level. Since many items are given as gifts, an attractive package with no flaws is perhaps more important than it would be for similar goods in the West.23 Czinkota and Woronoff cover the following Japanese product needs: the quality imperative, the importance of price, product holism, reliability and service, and on-time delivery.

Another issue that needs to be addressed is tailoring the product correctly for Japan. Examples of how equipment needs to be adjusted for Japanese body sizes abound. A company must not use a "one size fits all" mentality with Japan. The sophistication of the consumer in spotting design or manufacturing flaws is well-documented. This speaks to the issue of product holism: in Japan the product has to work on all levels, from the way it looks, to pleasing colors, to being well-designed and well-supported by customer service, warranties, repair and parts. The best product in terms of features does not always win. The durability and reliability of a product are more of a concern in Japan, and customer service has to match these high standards and expectations. In a department store in Japan, store employees will inundate the customer with offers of assistance; in a department store in the West, the customer may have to actively search for a store employee. The Japanese expectation of service and support is very high. "Japanese customers and channel members expect major service backup, no matter what the cost, with virtually no time delay. Service or sa-bi-su is a highly regarded component of a product and is expected throughout the lifetime of a business relationship."24

Defective goods are a source of shame; no Japanese company would want to bring such dishonor on the company.

Market research in Japan is a topic that we hope this paper will shed some light upon. Japan accounts for 9% of total worldwide spending on market research, compared to the U.K.'s 9% and the U.S.'s 36%. The biggest spender in worldwide market research dollars is the European Union (including Germany and the U.K.), with approximately 46%.25 Melville gives some information on comparative market research costs by country: "Japan's 9% is rather low, considering that research in Japan is more expensive than in any other country. On average, it is about twice the cost of research in Western Europe, and 18% higher than in the U.S."26 When doing work in Japan, translation is an under-appreciated item. The American Electronics Association Japan (AEA Japan) reinforces a stance on tone/formality/vocabulary in translation found consistently in the literature in the following passage:

22 Rose, Frank, "Pocket Monster," Wired, September 2001, p. 126-135.
23 Ibid. Melville, p. 134. See also De Mente on Japanese design aesthetics: the words shibui = restrained/refined, wabi = simple/quiet/tranquil, sabi = the beauty of age, yugen = mystery/subtlety. p. 3; www.apmforum.com/columns/boye14.htm.
24 Ibid. Czinkota and Woronoff, p. 178.
25 Ibid. Melville, p. 161.
26 Ibid. Melville, p. 161.


"If your technology is leading-edge, you do not want to go with a discount translation company; they will produce only egregiously embarrassing, nonsensical garbage that will tarnish your company image."27

Case studies like Budweiser beer's experience in Japan are fodder for many MBA marketing programs. Judging success is a relative measure: would Anheuser-Busch rather be the number one imported beer in Japan, or would it rather have a bigger share of the domestic Japanese market, which is approximately one billion cases annually (from 1993 numbers)? The Budweiser case study goes back to the licensing agreement it had with Suntory. The problem was that Suntory did not have as strong a distribution network as the other domestic Japanese beers. The market share of Budweiser in 1993 was 1.2% of the total Japanese beer market, which equates to approximately 10.1 million cases. In comparison, Budweiser carried a hefty 21.6% market share in the U.S. beer market, which equates to approximately 330 million cases. In 1993, Anheuser-Busch dismissed Suntory and launched a joint venture with larger brewer Kirin. This was an effort on Anheuser-Busch's part to gain more control of the distribution system and to use Kirin's stronger distribution network. The control of advertising was also a point of contention, because in the old agreement all advertising was controlled by Suntory, and Suntory did little in the way of advertising and promotion of Budweiser. With the new Kirin joint venture, Anheuser-Busch controlled the important media submissions and had more freedom to experiment in this market.28 Only time will tell how well Budweiser's efforts in Japan turn out. Many foreign companies have failed in Japan for various reasons. The main reasons have been failure to understand the market, impatience, product marketing, long-term business relationships, distribution systems, and language/cultural barriers.

Procter & Gamble (P&G) almost failed in Japan, but had the guts and long-term commitment to stay in the market. Bob McDonald, the president of P&G Japan, conveyed the following message to the UC Berkeley Haas business school Japan tour in 1999: "The execution of P&G Japan's Japanese market entry was a rocky road. At one point, they seriously had to consider writing off their investment and retiring from the Japanese market. They chose to stay and compete with the local competitors, even though it meant having to develop a precise understanding of Japanese consumer needs, their way of doing business, and especially their distribution infrastructure."29 P&G could leverage its brand name much like the local keiretsu could. Koichi Sonoda, Senior Manager of Corporate Communications at Dentsu (the number one Japanese ad agency), said, "Without reputation or trust of its corporate brand, a company like Procter and Gamble would have experienced hardship not only in sales performance of their products, but also in dealing with wholesalers, retailers, or even financial institutions. Building trust or a favorable corporate image is the most important factor for success in Japan, whereas in the U.S., when the product brand is well accepted, the corporate brand matters less."30 P&G did the necessary research with its consumers to identify opportunities. The Joy brand dish detergent that was introduced in 1995 as a concentrated grease-fighting soap is a great example of adapting

27 Business in Japan Magazine, "Q&A with the AEA," p. 3; www.japan-magazine.com/1998/sep/zashi/dm5.htm. See also commentary on translation tone/context in Melville p. 121.
28 Ono, Yumiko, "King of Beers Wants to Rule More of Japan," The Wall Street Journal, October 28, 1993, p. B1-B6. See also www.eus.wsu.edu/ddp/courses/guides/mktg467x/lesson5.html.
29 McDonald, Bob, "Consumer Market Entry in Japan," Haas Summer Course; www.haas.berkeley.edu/courses/summer1999/e296-3/trip/japan/pandg.htm. Another fact lost on some companies going into Japan is that there are domestic Japanese products that must be considered competition.
30 Moffa, Michael, "Ad-apting to Japan: a Guide for Foreign Advertisers," Business Insight Japan Magazine, November 1999, p. 2; www.japan-magazine.com/1999/november/ad-apting.htm.


There are computer usage issues in Japan. The Japanese need more help using computers because using computers is a fairly recently learned skill. Computers were in up to 25% of homes as of 1998.32 Japanese people older than 45 years are not likely to use computers, and the aging population will not be computer literate like the younger users are. There has been an operating system bias, since it takes time to localize systems and programs into Japanese. Typing on a keyboard is not common; it is harder since the characters are multistroke. Handwriting is more common. There are still double-byte operating system issues. Japanese researchers are still using ACA 3.0 versions which only work on the proprietary NEC operating system (“pain in the NEC”). Lack of double-byte system support for different software packages is a continuing problem. Although the Internet is to some degree replacing the need for some of this localization, according to recent Japanese Internet usage demographics, the growing 19.4 million “wired” citizens represent approximately 25% of the entire population.33

Surprisingly, Japanese schools are not well-equipped or trained to teach computer usage and the Internet. “In Japan, fewer than 70% of the schools actively use the three or so PCs that are connected to the Internet. Of these three, one is used by teachers, and the remaining two by students. If you consider the fact that an average elementary school holds up to 1000 students, the schools are not doing enough in computer education. Some people even joke that teachers lock up the PCs so they won’t have to teach the Internet, because kids may learn faster than the teachers.”34 Statistics reveal that only 20% of Japanese teachers know how to use PCs. The experiences of Japanese Exchange and Teaching (JET) assistant English teachers also reveal some limitations in the way that school is taught. The strictness, old-school methods, and regimentation of Japanese education are well-documented.

31 Ibid. Fields et al., p. 18-19.
32 Ibid. Melville, p. 175.
33 Japan Inc Magazine, “Selling to Japan Online,” p. 2; www.japaninc.net/mag/comp/2000/10/print/oct2000p_japan.html.
34 Sekino, Nicky, “Compaq’s Murai Challenges Info Age Japan,” Business Insight Japan Magazine, 1998, p. 3; www.japan-magazine.com/1998/nov/bijapan-www/zashi/dm4.htm.

INTERPRETATIONS/CONCLUSIONS

These were real research projects conducted in Japan. They are not academically configured with corresponding research projects in America or Western countries. Based on the Japan/America Cross-Cultural Research Study, there is evidence that Japanese people tend to select answers from the centroid rather than from the outliers. To similar and lesser degrees this is backed up by the other Japanese-only studies conducted. The tendency to select from the centroid leads to significant differences (in absolute values) between the priors utilities and the pairs utilities. However, the shape of Japanese data based on Euclidean distances is more consistent.


In the U.S., respondents know what they like and don’t like in a matter of absolutes, but not as much on the points in between (a basic confrontational American way of thinking). In Japan, respondents put more careful thought into the choices and don’t change their minds. The pairs match the priors in terms of shape.

This does not mean that ACA is not effective. Additional cluster analysis showed no significant differences in final utilities between centroid people and outlier people. ACA combines final utilities using data/weighting from the priors and the pairs.

For studies conducted in Japan alone, utilities will hold up fine. In comparing ACA data from Japan to ACA data from Western countries, the researcher should be aware of the centroid issue and process the data with this in mind. Weighting methods and using holdouts will provide ways to tune the data to work across all countries.

Surprisingly, for brand/no brand in the pairs comparisons, we did not find any consistent difference in the assignment of centroid or outlier based on the answers to the paired comparisons. Once again, these studies did not all include brand as an attribute, so we only have the data we have shown. On holdouts prediction, we feel that brand does play a part in the holdout selections/predictions we have seen. However, Nikkei Research cautions that so much more goes into the product choice in Japan (packaging, color, size, distribution method, etc.).

This study was a very good learning and teamwork experience for us as researchers as representatives from both the U.S. and Japan. The Japanese reluctance to address cultural issues and make bold statements on many occasions is contrasted by the U.S. proclivity to do just that. This is a basic difference between our cultures and important to keep in mind when undertaking any research project in Japan.
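To make the centroid/outlier distinction concrete, the aggregate tendency described above can be examined respondent by respondent. The following is a minimal sketch, assuming each respondent’s pairs answers are available as integers on the 9-point scale; the 4–6 “centroid” band and the 50% cutoff are illustrative assumptions, not rules taken from these studies.

```python
# Minimal sketch: label respondents as "centroid" or "outlier" responders based on
# how often their ACA pairs ratings (1-9 scale) fall in the middle of the scale.
# The 4-6 band and the 50% threshold are illustrative assumptions.

def centroid_share(ratings):
    """Proportion of a respondent's pairs ratings that fall in the 4-6 band."""
    middle = sum(1 for r in ratings if 4 <= r <= 6)
    return middle / len(ratings)

def classify(ratings, threshold=0.5):
    """Label a respondent by where most of their ratings fall."""
    return "centroid" if centroid_share(ratings) >= threshold else "outlier"

# Hypothetical respondents: one hugging the middle of the scale, one using the ends.
respondents = {
    "R001": [5, 4, 6, 5, 5, 4, 6, 5, 4, 5, 6, 5],
    "R002": [1, 9, 2, 8, 1, 9, 1, 8, 2, 9, 1, 9],
}

for rid, ratings in respondents.items():
    print(rid, classify(ratings), round(centroid_share(ratings), 2))
```

In practice the cutoffs would be chosen against the observed rating distributions, such as those reported in Appendix B.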


Appendix A

Priors vs. Pairs Utilities: We’ve tried to show, where possible, the brand and pricing attributes.

Study 1 (n=307)                    Priors    Pairs
Attribute 1 Level 1                14.54*    2.40*
Attribute 1 Level 2                19.35*    16.07*
Attribute 1 Level 3                28.71*    23.64*
Attribute 1 Level 4                21.24     22.73
Attribute 1 Level 5                17.17*    24.99*
Attribute 2 Level 1                11.29*    4.71*
Attribute 2 Level 2                19.13*    14.54*
Attribute 2 Level 3                26.89*    17.44*
Attribute 2 Level 4                18.78*    14.22*
Attribute 2 Level 5                16.35*    10.40*
Attribute 3 Level 1                16.58*    11.52*
Attribute 3 Level 2                14.35*    4.35*
Attribute 3 Level 3                20.33*    11.78*
Attribute 3 Level 4                19.28*    11.13*
Attribute 3 Level 5                21.30*    16.02*
Attribute 4 Level 1                23.48*    50.08*
Attribute 4 Level 2                29.43*    50.37*
Attribute 4 Level 3                31.32*    48.11*
Attribute 4 Level 4                23.72*    35.21*
Attribute 4 Level 5                12.75*    16.72*
Attribute 4 Level 6                2.54      2.75
Attribute 5 Level 1                15.37*    4.26*
Attribute 5 Level 2                21.86*    8.12*
Attribute 5 Level 3                20.06*    7.93*
Attribute 5 Level 4                18.26*    9.62*
Attribute 6 “Cost” Level 1         36.71*    55.90*
Attribute 6 “Cost” Level 2         36.16*    47.97*
Attribute 6 “Cost” Level 3         27.59*    37.77*
Attribute 6 “Cost” Level 4         14.04*    19.10*
Attribute 6 “Cost” Level 5         1.41*     0.16*
* Significant difference at 95% confidence level.

Study 2 (n=123)                    Priors    Pairs
Attribute 1 “Cost” Level 1         38.25     44.54
Attribute 1 “Cost” Level 2         33.01*    47.50*
Attribute 1 “Cost” Level 3         27.59*    36.79*
Attribute 1 “Cost” Level 4         14.63*    8.93*
Attribute 2 Level 1                18.87*    4.76*
Attribute 2 Level 2                42.87*    26.00*
Attribute 3 Level 1                11.48*    2.22*
Attribute 3 Level 2                55.52*    35.29*
Attribute 4 Level 1                3.20*     0.34*
Attribute 4 Level 2                66.88*    54.37*
Attribute 5 Level 1                11.64     8.67
Attribute 5 Level 2                27.25     28.19
Attribute 5 Level 3                52.94*    79.44*
Attribute 5 Level 4                67.53*    93.55*
Attribute 6 Level 1                8.16      6.17
Attribute 6 Level 2                38.05     38.36
Attribute 6 Level 3                46.92*    32.92*
Attribute 6 Level 4                50.77     45.15
Attribute 7 Level 1                5.22*     17.37*
Attribute 7 Level 2                27.77*    21.13*
Attribute 7 Level 3                43.70*    19.85*
Attribute 7 Level 4                59.06*    41.25*
Attribute 8 Level 1                14.19*    9.48*
Attribute 8 Level 2                56.04*    29.65*
Attribute 9 Level 1                6.39      4.88
Attribute 9 Level 2                31.00*    38.00*
Attribute 9 Level 3                52.18*    37.42*
Attribute 9 Level 4                54.23*    46.71*
Attribute 10 Level 1               1.65      0.82
Attribute 10 Level 2               57.89*    49.27*
Attribute 11 Level 1               2.19      0.48
Attribute 11 Level 2               55.78*    83.94*
Attribute 11 Level 3               62.48*    94.62*
Attribute 12 Level 1               1.79      0.70
Attribute 12 Level 2               43.98*    75.05*
Attribute 12 Level 3               69.89*    95.50*
Attribute 13 Level 1               3.45*     0.31*
Attribute 13 Level 2               71.34*    90.78*
Attribute 14 Level 1               5.03      2.88
Attribute 14 Level 2               59.20*    46.71*
* Significant difference at 95% confidence level.

Study 3 (n=107)                    Priors    Pairs
Attribute 1 “Brand” Level 1        67.42*    45.11*
Attribute 1 “Brand” Level 2        43.28*    25.06*
Attribute 1 “Brand” Level 3        76.10*    46.68*
Attribute 1 “Brand” Level 4        30.92*    16.71*
Attribute 2 Level 1                7.70*     17.17*
Attribute 2 Level 2                35.22*    50.12*
Attribute 3 Level 1                20.91     24.62
Attribute 3 Level 2                14.89*    40.26*
Attribute 4 Level 1                36.94     40.48
Attribute 4 Level 2                20.59*    47.49*
Attribute 4 Level 3                16.77     13.72
Attribute 5 Level 1                38.99     41.65
Attribute 5 Level 2                30.44*    22.38*
Attribute 5 Level 3                39.50*    26.30*
Attribute 5 Level 4                27.21*    35.38*
* Significant difference at 95% confidence level.

Study 4 (n=185)                    Priors    Pairs
Attribute 1 “Brand” Level 1        36.53     43.69
Attribute 1 “Brand” Level 2        28.34*    40.41*
Attribute 1 “Brand” Level 3        51.03*    88.62*
Attribute 1 “Brand” Level 4        28.14*    75.70*
Attribute 2 Level 1                17.73*    5.12*
Attribute 2 Level 2                57.40     48.80
Attribute 3 Level 1                36.38*    19.55*
Attribute 3 Level 2                51.16*    27.15*
Attribute 3 Level 3                28.99*    31.37*
Attribute 4 Level 1                3.58*     6.54*
Attribute 4 Level 2                77.10*    40.22*
Attribute 5 Level 1                6.93*     1.50*
Attribute 5 Level 2                76.70     71.33
* Significant difference at 95% confidence level.

Study 5 (n=85)                     Priors    Pairs
Attribute 1 Level 1                26.07*    7.65*
Attribute 1 Level 2                28.89*    22.38*
Attribute 2 Level 1                33.76     34.35
Attribute 2 Level 2                9.69*     1.14*
Attribute 2 Level 3                32.82     29.90
Attribute 3 Level 1                21.94     18.74
Attribute 3 Level 2                20.24*    6.12*
Attribute 3 Level 3                33.92*    19.99*
Attribute 4 Level 1                24.29*    34.38*
Attribute 4 Level 2                5.53*     1.45*
Attribute 4 Level 3                36.78*    48.52*
Attribute 4 Level 4                42.97*    68.34*
Attribute 5 “Price” Level 1        9.58      12.00
Attribute 5 “Price” Level 2        22.52     18.68
Attribute 5 “Price” Level 3        32.60*    14.25*
Attribute 5 “Price” Level 4        38.77*    24.13*
Attribute 5 “Price” Level 5        37.27*    30.38*
Attribute 5 “Price” Level 6        27.48*    42.83*
Attribute 5 “Price” Level 7        14.87*    64.76*
* Significant difference at 95% confidence level.
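The asterisks above flag priors/pairs differences that are significant at the 95% confidence level. As a rough illustration of the kind of test involved, the sketch below runs a paired t test on hypothetical respondent-level utilities for one attribute level; the actual test used for these tables is not described in detail, and the individual-level data are assumed.

```python
# Minimal sketch of a paired t test on respondent-level priors vs. pairs utilities
# for a single attribute level. The utility values are hypothetical, and the 1.96
# cutoff is a large-sample approximation of the 95% confidence level.
import math

def paired_t(priors, pairs):
    """Return the paired t statistic for two equal-length utility lists."""
    diffs = [a - b for a, b in zip(priors, pairs)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

priors = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.2, 14.7, 15.8, 14.0]
pairs  = [2.1, 3.4, 2.8, 4.0, 1.9, 3.1, 2.5, 2.2, 3.6, 2.9]

t = paired_t(priors, pairs)
print("t =", round(t, 2), "significant at ~95%" if abs(t) > 1.96 else "not significant")
```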


Appendix B

Additional frequency distributions from the other studies. Obviously this is aggregate-level analysis and does not address the possibility of individual respondents being centroid oriented or outlier oriented. Also, different product categories could have an effect on the results and shape of the data.

Frequency Distribution of Study 1 (37 pairs x 333 respondents): 1 - 15.4%  2 - 7.4%  3 - 12.7%  4 - 10.5%  5 - 14.5%  6 - 10.1%  7 - 11.5%  8 - 6.4%  9 - 11.5%

Frequency Distribution of Study 2 (35 pairs x 123 respondents): 1 - 5.8%  2 - 8.9%  3 - 14.3%  4 - 12.7%  5 - 12.3%  6 - 13.2%  7 - 15.8%  8 - 8.5%  9 - 5.3%

Frequency Distribution of Study 3 (12 pairs x 107 respondents): 1 - 4.2%  2 - 5.9%  3 - 13.1%  4 - 15.0%  5 - 16.3%  6 - 15.2%  7 - 16.9%  8 - 9.6%  9 - 3.6%

Frequency Distribution of Study 4 (8 pairs x 185 respondents): 1 - 3.8%  2 - 8.4%  3 - 12.6%  4 - 14.7%  5 - 17.8%  6 - 15.7%  7 - 14.9%  8 - 7.0%  9 - 5.0%

Frequency Distribution of Study 5 (16 pairs x 85 respondents): 1 - 19.2%  2 - 5.3%  3 - 7.8%  4 - 10.1%  5 - 13.7%  6 - 9.8%  7 - 9.6%  8 - 5.4%  9 - 19.0%
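A tabulation like the ones above can be produced directly from raw pairs responses. The sketch below is illustrative only; the nested lists of ratings are hypothetical stand-ins for the actual study data.

```python
# Minimal sketch of tabulating the aggregate frequency distribution of 9-point
# pairs ratings across all respondents and all pairs questions. Data are hypothetical.
from collections import Counter

pairs_responses = [
    [5, 4, 6, 3, 5, 7],   # respondent 1
    [1, 9, 5, 5, 2, 8],   # respondent 2
    [4, 5, 5, 6, 4, 5],   # respondent 3
]

counts = Counter(r for resp in pairs_responses for r in resp)
total = sum(counts.values())
for point in range(1, 10):
    share = 100.0 * counts.get(point, 0) / total
    print(f"{point} - {share:.1f}%")
```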

Thanks To:
Nikkei Research
Clients for data sets
Chuck Neilsen
Thunderbird American Graduate School of International Management
Aaron Tam
Don Milford
Teri Watanabe
Hiroyuki Iwamoto


A METHODOLOGICAL STUDY TO COMPARE ACA WEB AND ACA WINDOWS INTERVIEWING

Aaron Hill & Gary Baker
Sawtooth Software, Inc.

Tom Pilon
TomPilon.com

Over the past two decades, Adaptive Conjoint Analysis (ACA) has gained popularity and acceptance in the market research community. Due to the interactive nature of the questionnaire, ACA surveys must be conducted using computers. In the past, many researchers have relied on Sawtooth Software’s DOS-based ACA program.

With Sawtooth Software’s introduction of two new ACA products (ACA/Web, released in December of 2000, and ACA version 5.0 for Windows, released November 2001), researchers can now conduct bimodal ACA studies, with respondents self-selecting into either a disk-by-mail (CAPI) or an online survey implementation. This development could allow researchers to conduct ACA surveys in a greater number of settings, reaching more respondents with more convenient survey tools and at lower costs.

However, researchers conducting bimodal surveys must be confident that the results from the two survey methods are comparable, and that one method does not introduce error or bias not present in the other method. There is an inherent risk that the two survey methods would differ in ways that would make combining and analyzing the resulting data sets unacceptable.

While both of the new Sawtooth Software survey instruments were developed using the same underlying programming code, differences exist that could potentially introduce some bias. Theoretically, results for both methods should be almost exactly the same with respect to both implementation and results. This pilot research project suggests that any differences between the two survey methods are minimal and that it is acceptable to combine the results from the two modalities.

BACKGROUND

Adaptive Conjoint Analysis was first developed by Richard Johnson in 1985 as a way for researchers to study a larger number of attributes than was generally thought prudent with standard conjoint analysis methods. By collecting initial data from respondents, tradeoff tasks could be designed that efficiently collected information regarding the most relevant tradeoffs for each respondent. By combining a self-explicated priors section with customized tradeoff questions, ACA is able to capture a great deal of information in a short period of time. Over the years, Johnson has added new capabilities to his original process, leading to better methods for combining priors and pairs utilities. Additionally, Johnson’s recent application of hierarchical Bayes estimation for ACA has improved the predictive accuracy of ACA utilities even further.

The ACA conjoint model has steadily gained favor over the years due to its effectiveness in researching products and services with large numbers of attributes or with high customer involvement. Additionally, respondents in one survey indicated that ACA studies seem to take less time and are more enjoyable than comparable full profile conjoint studies (Huber et al.,


1991). ACA is now “…clearly, the most used commercial conjoint model…” (Green et al., 2000).

ACA has other important characteristics as well. The individual-level utility calculations allow researchers to segment respondent populations and generally draw more precise conclusions about market preferences than conjoint methods based on aggregate utility estimation. This is particularly true in cases where a high degree of heterogeneity exists. ACA also tends to be less prone to the “number of levels effect,” where attributes represented on more levels tend to be biased upward in importance relative to attributes represented on fewer levels. Finally, ACA is generally robust in the face of level prohibitions within the paired tradeoff section and significantly reduces troublesome utility reversals (e.g. a higher price having a higher utility than a lower price) relative to traditional conjoint methods.

ACA is not necessarily the best conjoint method for all situations, however. The interview format forces respondents to spread their attention over many attributes, resulting in a tendency to “flatten out” relative importance scores when many attributes are studied. ACA interviews also require respondents to keep an “all else being equal” mindset when viewing only a subset of the full array of attributes. Both of these factors tend to reduce ACA’s effectiveness in measuring price attributes, often leading to substantially lower importance (price elasticity) estimates than are actually present in the market. ACA surveys also must be administered using computers, which rules out its use in some interviewing situations. It has no compelling advantages over other conjoint methods when smaller numbers of attributes (about six or seven or fewer) are being studied.

SURVEY EXPERIMENTAL DESIGN

Objective

The primary purpose of our study was to validate the hypothesis that there are no substantial differences between the Web and Windows versions of ACA that would make utilities from one incompatible with utilities from the other. That is, all other things held equal, the method used to implement an ACA survey (either Web or disk) should have no impact on the final utilities or results.

This hypothesis is based on several underlying assumptions. The first assumption is that the design algorithm used to create each individual survey is fundamentally the same between methods. Secondly, any differences in survey look and feel can be minimized. Finally, the hypothesis assumes that there is minimal “survey modality” self-selection bias.

Our secondary reason for conducting this research was to see how respondents reacted to the different methods. We were particularly interested in finding whether perceptions differed with regard to how long the surveys took, whether respondents felt that they were able to express their opinions, and whether one was more enjoyable or more realistic than the other.

Design Considerations

To test the primary hypothesis, Web and Windows versions of the same ACA survey were developed and administered to 121 students at three universities. Each respondent was randomly assigned to take one version or the other, so that the numbers of respondents for each mode were roughly equal at each university. The first version (“Web”) was created and administered using


the ACA/Web program, with respondents gaining access to the survey over the Internet through a password-protected site. The second version (“Windows”) was created using ACA/Windows and administered, wrapped within a Ci3 questionnaire, using a CAPI approach. Students completed the survey on PCs and returned the disks to their instructor.

The survey asked about features of notebook computers, with utilities measured for two to six levels for each of ten attributes (Appendix 1). Seven of the ten attributes were a priori ordered, while the remaining three had no natural order of preference. Notebook computers were chosen because we believed that the majority of the respondents would be familiar enough with the subject to complete the survey and because notebook computers involve a large number of attributes that could affect purchase decisions.

To control for the underlying assumptions, consideration was given to several survey design and implementation factors. First, both versions of the questionnaire were evaluated and tested prior to launch to ensure that they generated logical surveys and reasonable utilities. After collecting the field data, the joint frequency distributions between levels were measured to confirm that each attribute/level combination occurred with roughly equal frequency.

One unexpected benefit of this testing was that we found, and were subsequently able to correct, a small error in the way that the Web software generated the tradeoffs in the pairs section. Incorrect random number seeding led to similarity in the initial pairs questions for different respondents, which led to undesirable level repetition patterns across (but not within) respondents. Fortunately, there is no evidence that this problem had any practical effect.

We also went to great lengths to make both surveys look and feel similar. Wherever possible, question and instruction text was written using exactly the same wording, and the font and color schemes were standardized across interfaces. This ensured that we minimized differences that were not interface-specific. Self-selection bias was controlled by randomly assigning participants to one version or the other, a practice that would not normally occur in real-world research situations.

In situations where researchers are interested in performing bimodal surveys such as this, it is highly recommended that they take appropriate steps to ensure that the two surveys are as similar as possible. Even seemingly minor differences can lead to negative consequences, as is the case if rating scales are dissimilar, if instruction text is slightly different, or if survey design is not exactly the same (i.e. one survey specifies an a priori ordered attribute where the other does not).

SURVEY VALIDATION

Holdout Tasks

Results from the study were validated through the use of four partial-profile holdout choice tasks. Three product concepts described using three of the ten attributes were displayed per choice task. Across the four choice tasks, all attributes were displayed. The four holdout choice tasks were repeated (with concept order rotated within each task) to measure test/retest reliability (see Appendix 4).

Holdouts were designed based on initial utility estimates derived from a small test study. The first task was designed to be relatively easy (predictable), with one choice dominating the others. Two of the remaining three tasks were designed with varying degrees of choice difficulty


(predictability), while the fourth task had three choices with roughly equal average utilities, making the choice task more difficult for respondents to answer (and for us to predict).
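One way to operationalize these varying degrees of difficulty is to score each candidate holdout task with the preliminary utility estimates and examine how far apart the concept totals are. The sketch below uses hypothetical levels and utility values, not the actual estimates from the pretest.

```python
# Minimal sketch of gauging how "balanced" a candidate holdout task is: sum the
# part-worths for each concept and look at the spread of the totals. The attribute
# levels and utility values are hypothetical placeholders.

prelim_utils = {
    "IBM": 20.0, "Compaq": -1.0, "Dell": 15.0,
    '15" Screen': 39.0, '14" Screen': 14.0, '12" Screen': -28.0,
    "10 GB Hard Drive": -50.0, "20 GB Hard Drive": 4.0, "30 GB Hard Drive": 45.0,
}

holdout_task = [
    ["IBM", '14" Screen', "10 GB Hard Drive"],      # Concept A
    ["Compaq", '15" Screen', "30 GB Hard Drive"],   # Concept B
    ["Dell", '12" Screen', "20 GB Hard Drive"],     # Concept C
]

totals = [sum(prelim_utils[level] for level in concept) for concept in holdout_task]
print("concept totals:", totals)
print("utility spread (max - min):", max(totals) - min(totals))  # smaller = more balanced
```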

It is important for researchers to incorporate holdout tasks in surveys. Well-constructed holdout tasks are extremely useful for a variety of reasons. In our study, these holdouts were used to control for differences in respondent test/retest reliability between the two design treatments, determine the correct scaling “exponent” in the market simulator, measure the

ation techniques, and to identify inconsistent respondents. sks to test specific product concepts of interest to clients,

rovide concrete validation of the conjoint analysis process, identify potential errors in survey des

ppendix 5.)

Met

Hierarchical Bayes). This allowed us to examine whether different techniques might identify the two survey methods. Again, our hypothesis was that the results

from

for ere able to

successfully predict his or her actual responses to the holdouts. The hit rates were then compared to each individua respondents were able to give the same answe it rates to account for the internal consistency of the respondents within each group by dividing each respondent’s actual hit rate by his or her retest reliability score. This “adjusted hit rate” represents the ratio of the number of times a choice was accurately predicted by the utilities to the number of times a respondent was able to accurately repeat his or her previous choices. Since our choice tasks consisted of three alternatives, we would expect random responses to achieve a mean test/retest reliability of 33%.

To validate the results of the survey, we first compared the adjusted hit rates within each utility calculation method to determine how well the different survey methods were able to predict the individual holdout choices. We then performed significance testing (t-test) to determine if small amounts of hit rate variation between the two methods were statistically significant for any of the utility calculation methods. A lack of statistically significant differences would seem to support our hypothesis that the Web and Windows versions would produce comparable results.

o see whether the two versions produced similar results when used in the

for each alternative represented in the holdout tasks. By averaging the actual holdout choices for

accuracy of the various utility estimResearchers can also use holdout tap

ign or data processing, and to determine which utility estimation method results in the most accurate predictions for a specific data set.

(Note: For additional information on designing holdout tasks, please refer to A

parison hod ComWe began our analysis by computing utilities for the Web and Windows respondent groups

using several different estimation techniques (including priors only, OLS regression and

incongruencies between the two survey methods should be the same regardless of analysis technique.

Using these sets of utilities and the results of the holdout tasks, we calculated “hit rates”each respondent. These hit rates measured how often an individual’s utilities w

l’s test/retest reliability, a measure of how oftenr to a repeated holdout task. We adjusted the h

We also wanted tmarket simulator. In other words, would the difference between predicted and actual shares of choice for the holdouts be similar between survey methods? To determine whether the two methods produced similar results, we computed the mean absolute error (MAE) between the aggregate share of preference reported by the market simulator and the actual choice probability


both groups, comparisons between groups can be made using a standard benchmark. Using the market simulator, we computed shares of preference for each holdout task and tuned the “exponent” to minimize the mean absolute error between the estimated share of preference and the average holdout probabilities. It is critical to tune the exponent to minimize the MAE so that the true predictive ability of the utilities can be assessed independent of the differences caused by utility scaling.

FINDINGS AND CONCLUSIONS

It should be noted that this project is only intended as a pilot study. Further research would be useful to confirm our results, particularly since the sample size was relatively small and the target population was less than representative of typical market research subjects.

Finding #1: There were no significant differences in how the two groups reacted to the survey process.

When asked about the survey experience, respondents indicated that both versions of the survey were equal in every respect. There were no significant differences in ratings on the items reported in Table 1, and open-ended responses were very similar between the two versions.

Qualitative Measures

Respondents were asked to rate their respective surveys on seven descriptive characteristics. In both cases, significant differences between the two methods were nonexistent, and were comparable to a previous study involving ACA for DOS (Huber et al., 1991).

Table 1: Qualitative ratings of survey methods

“How much do you agree or disagree that this survey:”       Web    Windows   ’91 Huber Study   T-test*
“…was enjoyable”                                             5.8    6.0       5.8               -0.6
“…was easy”                                                  6.7    6.6       6.6               0.1
“…was realistic”                                             6.8    6.6       6.3               0.7
“…allowed me to express my opinions”                         6.4    6.3       6.4               0.2
“…took too long”                                             4.4    4.0       3.9               0.8
“…was frustrating”                                           3.2    3.2       3.3               0.0
“…made me feel like clicking answers just to get done”       3.6    3.9       3.4               -0.6
* T-test compares results between the Web and Windows versions.

Time to Complete

The perceived time was significantly lower in both cases than the actual time to complete the survey, a finding that has been illustrated in previous studies. One of the potential drawbacks to using a Web interface is the potential for slow connections to significantly increase the amount


of time it takes to complete a survey. In this study, there was not a significant difference in the survey completion times between the two methods (both took about 18 minutes) (See Table 2).

Table 2: Time to complete interview

Time Differences                       Web     Windows   T-test
Perceived time to complete survey      15.2    13.7      1.7
Actual time to complete survey         17.5    18.1      -0.3

Finding #2: There did not appear to be a significant difference between the Web utilities and the Windows utilities.

Utilities were calculated using the standard Ordinary Least Squares (OLS) estimation procedure included with ACA, and then normalized as “zero-centered diffs.” Significance testing of these utilities revealed that there were no statistically significant differences (at the 95% confidence level) between the importance scores for the two methods (Appendix 2). Although some of the differences in average utility values for the two methods appear to be different (Appendix 3), they are few in number (6 out of 36 with t>2.0) and their true significance is not easily assessed because of their lack of independence. More telling, the correlation between the resulting average utilities was 0.99. This leads us to conclude that for all practical purposes there seems to be no difference between the utilities estimated by the two methods.

Finding #3: Priors (“self-explicated”) utility values did as well as OLS utilities with respect to hit rates for this data set.

One of the more interesting findings in this study was that the self-explicated section of ACA appeared to do just as well at predicting holdouts (in terms of hit rates) as the ACA Ordinary Least Squares utilities, which incorporated the additional information gathered in the pairs section. Normally, one would expect that the additional pairs information would increase the likelihood of accurately predicting holdout tasks. However, in this case, it appears that the pairs section did not improve the hit rates achieved using the OLS utilities.

It should also be noted that utilities estimated using an ACA/Hierarchical Bayes run that incorporated just the pairs information (not the default HB method) achieved adjusted hit rates that were very similar to the self-explicated results. This suggests that each section alone contributes approximately equal information, while the addition of the other section marginally adds to the predictive ability of the model.

Finding #4: For both groups, Hierarchical Bayes estimation achieved significantly higher adjusted hit rates.

Using ACA/HB, we used the pairs data to estimate the utilities and constrained the estimates to conform to the ordinal relationships determined in the priors section. With both data sets, the Hierarchical Bayes utilities outperformed both the OLS utilities and the priors only utilities, achieving an adjusted hit rate of 81% for the Web and 91% for Windows (See Table 3).
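Table 3 below reports these adjusted hit rates. As a minimal sketch of the bookkeeping involved (not the authors’ code), the example below predicts each holdout choice from summed utilities, computes the raw hit rate, and divides by test/retest reliability; all utilities, tasks, and choices are hypothetical.

```python
# Minimal sketch of hit rates and adjusted hit rates: a "hit" means the concept
# with the highest summed utilities matches the respondent's actual choice, and
# the adjusted hit rate divides the raw hit rate by test/retest reliability.
# All data below are hypothetical.

def predicted_choice(utilities, task):
    """Index of the concept with the highest total utility."""
    totals = [sum(utilities[level] for level in concept) for concept in task]
    return totals.index(max(totals))

def adjusted_hit_rate(utilities, tasks, choices, retest_choices):
    hits = sum(predicted_choice(utilities, t) == c for t, c in zip(tasks, choices))
    hit_rate = hits / len(tasks)
    reliability = sum(a == b for a, b in zip(choices, retest_choices)) / len(choices)
    return hit_rate / reliability if reliability > 0 else float("nan")

utilities = {"A1": 10.0, "A2": -10.0, "B1": 5.0, "B2": -5.0}
tasks = [[["A1", "B1"], ["A1", "B2"], ["A2", "B1"]],
         [["A2", "B2"], ["A1", "B2"], ["A2", "B1"]]]
choices = [0, 1]          # concept index picked in each original holdout
retest_choices = [0, 1]   # concept picked when the task was repeated

print(round(adjusted_hit_rate(utilities, tasks, choices, retest_choices), 2))
```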


Table 3: Adjusted Hit Rates for Different Utility Estimation Techniques

                              Web      Win      t-test
Test-retest reliability       84.5%    86.1%    -0.5

Adjusted hit rates
ACA OLS analysis              72.8%    82.7%    -1.9
Prior utilities only          76.0%    85.0%    -1.5
HB (constrained pairs)        80.9%    90.6%    -1.5
HB (unconstrained pairs)      74.9%    84.1%    -1.5

Finding #5: This study (and many others) suggests that well-designed self-explicated surveys may perform quite well in some situations.

In the past several years, researchers have debated the value of asking paired tradeoff questions in ACA when seemingly good results can often be obtained using a well-designed self-explicated questionnaire. This study seems to lend some credibility to the use of self-explicated models, but this finding comes with several important caveats.

First, past comparative studies between the two models have generally compared ACA with well-developed self-explicated questionnaires, which are more suited for stand-alone use than the simple question flow used in the priors section of ACA. For instance, ACA allows researchers to specify a priori attribute orders, which assume linearity in the priors. In a self-explicated survey, researchers would want to use a rating-type scale to estimate the differences in preference between these levels. As a result, one cannot necessarily conclude that a study consisting only of ACA priors is a reasonable substitute for a well-designed self-explicated exercise.

Additionally, researchers cannot know until after the fact whether a “self-explicated only” survey would have produced acceptable results. It is often only after the ACA results are analyzed that the appropriate survey mechanism can be determined and the value of the pairs assessed.

While the pairs questions in our survey may have contributed only marginally to the results achieved through OLS estimation, they were essential in achieving the superior results obtained through HB analysis. This leads us to conclude that if using ACA, one should always include the paired tradeoff section unless budgetary or time constraints prevent their use. In these cases, we would recommend the use of a more rigorous form of self-explicated questionnaire than is available in the priors section of ACA.

Finding #6: Now that we have Hierarchical Bayes estimation for ACA, the inclusion of the Calibration Concepts section may be a wasted effort in many ACA studies.

Another significant finding confirms past observations regarding the value of the calibration concepts section of ACA. Calibration concepts are used to scale individual utilities for use in purchase likelihood simulation models. They are also used to identify inconsistent respondents.


They have been an integral part of ACA since its inception, but are often regarded by respondents as being confusing.

In the market simulation validation described previously, the uncalibrated utilities had lower mean absolute errors than the calibrated utilities (See Table 4). Additionally, the calibrated utilities required significantly large exponents to minimize the error in share of preference simulations. In most cases where calibrated utilities are used in market simulations, our experience indicates that the exponent (scale factor) will need to be set at around five to maximize the fit of holdout choice tasks.

With the introduction of ACA/Hierarchical Bayes, calibration concepts will be needed in fewer ACA studies. Hierarchical Bayes analysis automatically scales the utilities for use in most market simulation models, although additional tuning of the exponent may still be required. It also calculates a goodness of fit measure that is probably a better estimate of respondent consistency than ACA’s correlation figure, which is based on the calibration concepts. Unless the researcher requires the purchase likelihood model, there is no need to ask the calibration concept questions.

Finding #7: The Windows and Web ACA groups did not differ either in terms of the holdout “hit rates” or the predictive accuracy of holdout shares.

The final finding, and probably the most important, is that we did not find any significant differences in the final results between the two versions. Both versions had very similar adjusted hit rates within utility estimation treatments. In each case, adjusted holdout hit rates improved when Hierarchical Bayes utility estimation was used (see Table 3).

We also examined the mean absolute errors (MAE) of each method when comparing market simulation results with actual choice probabilities from the holdout tasks. While the mean absolute errors varied between utility estimation methods, the MAEs of each survey method were very similar for each estimation technique (See Table 4). Once again, we found that the Hierarchical Bayes utility estimates provided improvements over other utility estimation methods, achieving the lowest MAEs.

Table 4: Mean Absolute Errors between Market Simulation Share of Preference and Observed Choices from Holdout Tasks

                             Web MAE    Windows MAE
ACA Utility Run (OLS)        7.15       6.60
Prior Utilities              7.06       7.14
HB Utilities                 5.86       6.45
HB Utilities (Calibrated)    6.21       6.51
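As a rough illustration of the validation behind Table 4, the sketch below converts concept utilities into logit-style shares of preference with a scale exponent and computes the MAE against observed holdout probabilities. The share model and all numbers are assumptions for illustration; the simulator actually used in the study may differ in detail.

```python
# Minimal sketch: logit shares of preference with a scale "exponent", plus the
# mean absolute error against observed holdout choice probabilities.
# Concept utilities and observed shares are hypothetical.
import math

def shares_of_preference(concept_utils, exponent=1.0):
    expu = [math.exp(exponent * u) for u in concept_utils]
    s = sum(expu)
    return [e / s for e in expu]

def mean_absolute_error(predicted, observed):
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

concept_utils = [0.8, 0.1, -0.9]        # average utilities for concepts A, B, C
observed = [0.55, 0.33, 0.12]           # observed holdout choice probabilities

for exponent in (0.5, 1.0, 2.0):        # tuning the exponent to minimize the MAE
    pred = shares_of_preference(concept_utils, exponent)
    print(exponent, round(mean_absolute_error(pred, observed), 4))
```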


CONCLUSIONS

Our study suggests that dual mode surveys, using both ACA/Web and ACA for Windows, are an acceptable option for conducting research. The results from the two seem not to differ substantially, meaning that you should be able to run dual mode studies and combine the final utility data.

Secondary findings seem to mirror previous research with respect to the value of self-explicated utilities, and the additional predictive ability added when using HB.


Appendix 1: Attributes & Levels Included in Notebook Survey

Brand: IBM / Compaq / Toshiba / Dell / Sony / Winbook
Weight (a priori – best to worst): Weighs 6.3 pounds / Weighs 7.0 pounds / Weighs 7.7 pounds / Weighs 8.4 pounds
Screen Size: 15.7" Screen Size / 15.0" Screen Size / 14.0" Screen Size / 12.0" Screen Size / 10.4" Screen Size
Battery (a priori – worst to best): 2 Hour Battery Life / 3 Hour Battery Life / 4 Hour Battery Life
Processor (a priori – worst to best): 600 Mhz Processor Speed / 700 Mhz Processor Speed / 800 Mhz Processor Speed
Hard Drive (a priori – worst to best): 10 GB Hard Drive / 20 GB Hard Drive / 30 GB Hard Drive
Memory (a priori – worst to best): 64 MB RAM / 128 MB RAM / 192 MB RAM / 256 MB RAM
Pointing Device: Touchpad Pointing Device / Eraser Head Pointing Device / Touchpad & Eraser Head Pointing Device
Warranty (a priori – worst to best): 1 yr. Parts & Labor Warranty / 2 yr. Parts & Labor Warranty / 3 yr. Parts & Labor Warranty
Support (a priori – best to worst): 24 hrs/day 7 days/week Toll Free Telephone Support / 13 hrs/day 6 days/week Toll Free Telephone Support

Appendix 2: Importance Scores Derived from Web and Windows Utilities

Importance         Web      Windows   t-stat
RAM                13.2%    12.7%     0.81
Screen             13.2%    12.9%     0.38
Brand              11.3%    12.6%     -1.80
Battery            10.4%    9.3%      1.64
Weight             10.2%    9.7%      0.57
Speed              9.9%     8.5%      1.87
Hard Drive         9.7%     8.8%      1.34
Warranty           9.2%     10.1%     -1.09
Pointing Device    7.2%     8.3%      -1.33
Support            5.7%     7.1%      -1.83
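Importance scores such as those in Appendix 2 are commonly derived from the range of each attribute’s utilities, normalized to sum to 100%. The sketch below shows that calculation for hypothetical utilities; the exact procedure used for the table above is not spelled out in the paper.

```python
# Minimal sketch: attribute importances as normalized utility ranges.
# The utilities below are hypothetical placeholders.

utilities = {
    "Brand":  {"IBM": 20.0, "Compaq": -1.0, "Winbook": -40.0},
    "RAM":    {"64 MB": -67.0, "128 MB": -17.0, "256 MB": 62.0},
    "Screen": {'15.7"': 36.0, '12"': -28.0, '10.4"': -60.0},
}

ranges = {attr: max(levels.values()) - min(levels.values())
          for attr, levels in utilities.items()}
total = sum(ranges.values())

for attr, rng in ranges.items():
    print(f"{attr}: {100.0 * rng / total:.1f}%")
```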


Appendix 3: Average Utility Scores for Web and Windows Versions of ACA
(Represented as “zero-centered diffs”)

                        Web      Win      t-stat
Brand
  IBM                   19.9     22.0     -0.3
  Compaq                -1.3     2.1      -0.5
  Toshiba               -6.6     -4.8     -0.3
  Dell                  14.8     12.2     0.3
  Sony                  13.3     21.5     -1.1
  Winbook               -40.1    -53.1    2.3
Weight
  6.3 lbs               43.5     41.6     0.4
  7 lbs                 23.0     11.3     3.2
  7.7 lbs               -16.9    -8.5     -2.3
  8.4 lbs               -49.5    -44.3    -1.0
Screen Size
  15.7" screen          35.9     46.2     -1.4
  15" screen            38.8     33.5     1.0
  14" screen            13.7     14.2     -0.1
  12" screen            -28.3    -32.5    0.7
  10.4" screen          -60.1    -61.4    0.2
Battery Life
  2 hr. battery         -52.5    -43.7    -2.2
  3 hr. battery         1.8      -2.0     1.2
  4 hr. battery         50.8     45.7     1.2
Processor Speed
  600 Mhz processor     -48.4    -40.5    -2.0
  700 Mhz processor     -1.2     -0.1     -0.3
  800 Mhz processor     49.5     40.6     2.1
Hard Drive Capacity
  10 GB hard drive      -49.6    -43.6    -1.5
  20 GB hard drive      4.3      3.3      0.3
  30 GB hard drive      45.4     40.3     1.3
Random Access Memory (RAM)
  64 MB RAM             -67.3    -64.1    -0.7
  128 MB RAM            -17.4    -15.2    -0.6
  192 MB RAM            22.4     22.4     0.0
  256 MB RAM            62.3     56.9     1.3
Pointing Device
  Touchpad              -8.0     -1.4     -1.0
  Eraser head           -10.4    -18.1    1.1
  Both                  18.5     19.5     -0.2
Warranty
  1 year warranty       -43.1    -49.5    1.3
  2 year warranty       -3.9     1.6      -1.6
  3 year warranty       47.0     47.9     -0.2
Technical Support
  24/7 tech support     27.7     34.7     -1.7
  13/6 tech support     -27.7    -34.7    1.7
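The t-stat column above compares the Web and Windows group means level by level. The sketch below shows a two-sample t statistic of that general form, computed on hypothetical respondent-level utilities (the published table contains only the group means and t values).

```python
# Minimal sketch: unequal-variance two-sample t statistic for one attribute level,
# comparing the Web and Windows groups. Respondent-level values are hypothetical.
import math

def two_sample_t(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

web_utils = [41.0, 45.5, 40.2, 44.8, 43.1, 46.0, 42.3, 44.0]
win_utils = [39.8, 42.1, 40.5, 41.9, 43.0, 40.7, 41.2, 42.4]

print("t =", round(two_sample_t(web_utils, win_utils), 2))
```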


Appendix 4: Holdout Choice Task Design

(Each row lists Concept A / Concept B / Concept C.)

Holdout 1:   Toshiba / Sony / Winbook
             15" Screen / 14" Screen / 12" Screen
             10 GB Hard Drive / 30 GB Hard Drive / 20 GB Hard Drive

Holdout 2:   IBM / Compaq / Dell
             600 Mhz Processor / 700 Mhz Processor / 800 Mhz Processor
             128 MB RAM / 256 MB RAM / 64 MB RAM

Holdout 3:   2 hour battery / 3 hour battery / 4 hour battery
             Touchpad & Eraserhead / Eraserhead / Touchpad
             3 year warranty / 2 year warranty / 1 year warranty

Holdout 4:   Weighs 6.3 lbs. / Weighs 8.4 lbs. / Weighs 7.7 lbs.
             10 GB Hard Drive / 30 GB Hard Drive / 20 GB Hard Drive
             13/6 tech support / 13/6 tech support / 24/7 tech support

Holdout 5*:  Winbook / Toshiba / Sony
             12" Screen / 15" Screen / 14" Screen
             20 GB Hard Drive / 10 GB Hard Drive / 30 GB Hard Drive

Holdout 6*:  Dell / IBM / Compaq
             800 Mhz Processor / 600 Mhz Processor / 700 Mhz Processor
             64 MB RAM / 128 MB RAM / 256 MB RAM

Holdout 7*:  4 hour battery / 2 hour battery / 3 hour battery
             Touchpad / Touchpad & Eraserhead / Eraserhead
             1 year warranty / 3 year warranty / 2 year warranty

Holdout 8*:  Weighs 7.7 lbs. / Weighs 6.3 lbs. / Weighs 8.4 lbs.
             20 GB Hard Drive / 10 GB Hard Drive / 30 GB Hard Drive
             24/7 tech support / 13/6 tech support / 13/6 tech support

* Holdout 5 = Holdout 1 (shifted); Holdout 6 = Holdout 2 (shifted); etc.

Appendix 5: Guidelines for Creating Effective Holdout Tasks

For years, Sawtooth Software has recommended that researchers include holdout tasks in conjoint studies. Holdout tasks, when designed appropriately, can provide invaluable information as researchers implement studies and analyze conjoint results.

Holdout tasks provide an effective tool for:

♦ identifying/removing unreliable respondents
♦ measuring respondent test/retest reliability
♦ checking that split samples indeed reflect a similar composition of reliable/unreliable respondents (allowing researchers to adjust, by weighting respondents or the samples’ reliability, by indexing hit rates to within-sample test-retest reliability)
♦ comparing the predictive ability of different utility estimation models (ACA Windows vs. Web, HB versus OLS, etc.)
♦ testing specific product concepts of particular interest
♦ identifying potential errors in conjoint design or data processing
♦ tuning market simulation models


Designing Holdout Tasks

For methodological research, we suggest that you include four or more holdouts, repeated at some appropriate point in the interview. For commercial research, two or three holdout tasks may be adequate. Below are some suggestions for designing effective holdout tasks.

It’s a good idea to pretest your conjoint survey before fielding your questionnaire. In addition to identifying problem areas, these pretests can also be used to gather preliminary utility estimates. These preliminary estimates are useful in designing well-balanced holdout choice tasks.

Holdout choice tasks should be constructed with the goal of providing the maximum amount of useful information. Use preliminary utility estimates to design holdout tasks based on the following principles:

♦ If the choice task is very obvious (nearly everybody chooses same concept), the holdout task is less valuable because the responses are so predictable.

♦ If the choice task is too well balanced (nearly equal probabilities of choice for concepts), the holdout task is less valuable because even random utilities will predict preference shares well.

♦ The best strategy is to have multiple holdouts with varying degrees of utility balance. This allows you to test extreme conditions (very easy and very difficult), and provides a richer source by which to test the success of various utility estimation techniques, simulation models (including tuning) or experimental treatments.

♦ Choice tasks should include a variety of degrees of product differentiation/similarity to ensure that the simulation technique can adequately account for appropriate degrees of competitive or substitution effects.

In this study, we designed holdouts to include relatively dominated to relatively balanced sets. The table below shows the actual choice probabilities for the sample (n=121).

Choice probabilities (test and retest collapsed)*

                 Holdout 1   Holdout 2   Holdout 3   Holdout 4
Alternative A    13%         18%         31%         18%
Alternative B    85%         72%         38%         25%
Alternative C    2%          10%         31%         57%

* Actual respondents chose the same concept 85% of the time between test and retest (Windows group: 86%; Web group: 84%).


REFERENCES

Huber, Joel C., Dick R. Wittink, John A. Fiedler, and Richard L. Miller (1991), “An Empirical Comparison of ACA and Full Profile Judgments,” in Sawtooth Software Conference Proceedings, Sun Valley, ID: Sawtooth Software, Inc., April, 189-202.

Green, Paul E. (2000), “Thirty Years of Conjoint Analysis: Reflections and Prospects,” forthcoming in Interfaces.


INCREASING THE VALUE OF CHOICE-BASED CONJOINT WITH “BUILD YOUR OWN” CONFIGURATION QUESTIONS

David Bakken, Ph.D.
Senior Vice President, Harris Interactive

Leon Bayer
Executive Vice President and Chief Scientist, Harris Interactive

ABSTRACT

We describe a method for capturing individual customer preferences for product features using a “build your own product” question. Results from this question are compared to choice-based conjoint results using the same attributes and levels and gathered from the same individuals.

BACKGROUND AND INTRODUCTION

Choice-based conjoint analysis has become one of the more popular market research techniques for estimating the value or utility that consumers place on different product features or attributes. Choice-based conjoint is particularly valuable when the goal of the research is the design of an optimal product or service, where “optimal” is usually defined in terms of customer preferences, but also may be defined in terms of projected revenues or projected profits. Choice simulators are used to identify these optimum product configurations. In most cases, the goal is to find the one “best” level of each feature or attribute to include in the product. While it is technically possible to simulate the impact of offering multiple products from the same manufacturer with different options, with more than a few features and levels, the task becomes daunting. Moreover, under conditions in which IIA applies, the simulations may give a misleading picture of the preference shares for the overall product line. As a result, choice-based conjoint models are not especially well-suited for products that involve “mass customization.”

For manufacturers, mass customization offers the prospect of satisfying diverse customer needs while reducing the costs associated with greater product variety. However, from the standpoint of profitability, many companies give customers far too many choices. Pine et al (1993) cite as an example the fact that Nissan reportedly offered 87 different types of steering wheels. While choice-based conjoint potentially can identify the relative value of each different type of steering wheel, this technique is less suited for identifying the best mix of different wheel types to offer.

Choice-based conjoint faces another problem in products where the levels of the various attributes or features have significant cost and pricing implications. While one might calculate a separate price for every concept in the design based on the specific attribute levels, this creates a few problems. In the first place, price is no longer uncorrelated with utilities of the different


attribute levels. The negative correlation of utilities between price and features may result in model failure, as Johnson et al. (1989) have demonstrated. Moreover, if the incremental cost for a given feature level is the same across all concepts, it will be impossible to estimate the price sensitivity for that level. Conditional pricing and alternative-specific designs offer limited capability for incorporating feature-specific pricing.

For some types of products (e.g., automobiles, personal computers), the diversity of consumer preferences has led most manufacturers to offer a wide variety of feature options. Many of these companies now offer “design your own product” functions on their websites. These sites allow customers to configure products by selecting the options they wish to include. While we know of no instance where companies are using these sites as a market research tool, the format of these design-your-own features is well suited to Internet survey administration.

Taking the “design your own product” website functions as a model, we developed an application to enable us to insert a design your own product question into a web-survey.

STUDY OBJECTIVES

The objective of the research described in this paper was to determine the value of combining a design your own product question with a choice-based conjoint exercise. We sought to answer these questions:

• How do the results from a design your own question compare to preference shares estimated from a discrete choice model?
• Can design your own questions be used to validate results from choice-based conjoint?
• Can design your own questions be used in place of choice experiments?

In answering these questions, we hoped to determine if the design your own results might be used to create individual level “holdout” tasks, particularly for partial profile designs. We also hoped to investigate the possibility of estimating utilities from design your own questions.

IMPLEMENTATION OF THE BUILD YOUR OWN QUESTION

After a respondent completes the choice experiment, a follow-up question presents the same attributes, one at a time, and asks for the most preferred level. Each level has a price attached (which may or may not be revealed), and the screen displays a “total price” based on the specific features selected so far. The respondent can move back and forth between the attributes until he or she arrives at a configuration and price that is acceptable. Figure 1 presents an example of a build your own screen.


Figure 1
[Screen shot of an example build your own question.]

At the top of the screen we display the total price for the product as configured. Below this, the screen is divided into four sections. The upper left section lists the different features. Highlighting a feature will display additional information about that feature in the upper right section. At the same time, the available levels of the feature will be displayed in the lower left section. Highlighting a level will display any descriptive information (including graphics) about the level in the lower right section.

The respondent selects a feature and level, and then moves to another feature. Any feature-specific changes in price are reflected in the price displayed at the top. When the respondent is satisfied with the levels for each feature and the total price, clicking on the "finish" button will submit the data to the survey engine. Respondents must select a level for each feature.

CASE STUDIES

We conducted two different studies incorporating the build your own question with a discrete choice experiment. Both studies involved consumer products. The data were collected through Internet surveys of Harris Poll Online (HPOL) panelists.

First Study: Consumer Electronics Product

The first study concerned a new consumer electronics product. The overall objective of the study was to determine the optimum configuration for the product. A secondary objective was to determine the extent to which specific optional features could command a higher price. The product was defined by nine attributes. Eight of the attributes consisted of three levels each. The ninth was a six-level attribute that resulted from combining two three-level attributes in order to eliminate improbable (from an engineering standpoint) combinations from the design. An orthogonal and balanced fractional factorial design was created using a design table. The design incorporated conditional pricing based on one of the features. A total of four price points were employed in the study, but only the lower three price points appeared at one level of the conditional attribute, and the upper three price points appeared with the other levels of this attribute.

The build your own question1 followed the choice experiment in the survey. There are two important differences between the choice experiment and the build your own question. First, price is not included as an attribute in the build your own question. Instead, a price is calculated based on the feature levels the respondent selects. Second, in this study, the “combined” attribute was changed to the separate three-level attributes in the build your own question in order to determine the extent to which consumers might prefer one of the improbable combinations. A total of 967 respondents from the HPOL online panel completed the survey. A choice model was estimated using Sawtooth Software’s CBC-HB (main effects only).

Study One Results

Before presenting the results from the first study, we feel it is important to point out some of the challenges that arise in comparing build your own product data with market simulations using a discrete choice model. In our case, there is no "none of the above" option in the build your own question. Respondents are forced to configure a product, selecting one level for each feature. Another difficulty lies in defining the competitive set for the market simulator. In theory, the choice set for the build your own question consists of all possible combinations of attribute levels. Additionally, many attribute levels have a specific incremental price in the build your own question. This pricing must be accommodated to avoid drawing incorrect conclusions from the comparison. In this study, this was accomplished through conditional pricing in the choice experiment for those attributes with large price differences between levels.2 With these limitations in mind, we attempted to find comparable results wherever possible.

We first compared the incremental differences in the proportion of respondents choosing each level of each feature in the build your own question with the incremental changes in preference share from the market simulator. The results for four representative attributes are shown in Figure 2. Here we see that the differences in preference for each level of each attribute are in the same direction, although the magnitude of the change varies.

1 This study employed an earlier version of the build your own question than that illustrated in Figure 1. In this version, each feature was displayed in a table with the relevant levels. Radio buttons controlled the selection of the feature levels. 2 For these attributes, the utility estimates encompass information about sensitivity to the price increment for different levels of each attribute. Thus, the market simulations should reflect the incremental pricing for each attribute level without specifically incrementing price, as long as the overall price is in the right range.


Figure 2
[For four representative attributes (Attr01, Attr02, Attr04, Attr05; levels 2 and 3), the percentage of respondents selecting each level in the build your own question compared with the corresponding preference shares from the choice tasks.]

We next explored the extent to which the "ideal" products produced by each method resulted in similar preference shares. To accomplish this, two market simulations were conducted. In the first simulation, three products were constructed so that one product was comprised of the feature levels with the highest average utility, a second was comprised of all feature levels with the lowest average utility values, and the third was comprised of the levels with intermediate average utility.3 The most preferred feature level was the same for five of the eight available attributes. The second most preferred level was the same across the two methods for only three attributes. The least preferred level was the same for four attributes. Any differences in preference share will be due to these mismatches between attribute levels, and such mismatches will always reduce the preference share for the ideal build your own product relative to the ideal product based on average utility estimates. As Figure 3 illustrates, the simulated preference share for the ideal build your own product is 77%, compared to 90% for an ideal product based on average utility estimates.

3 Because price was not an attribute in the build your own question but was, in fact, correlated with the attribute levels, we set the price of the most desired product at the highest level. The mid-product was priced at the intermediate level, and the least desired product at the lowest level of the price attribute.


Figure 3
[Simulated share of preference for the most preferred, second preferred, and least preferred product configurations, comparing the build your own ideal (76.8% for the most preferred product) with the choice model ideal (90.0%); the second and least preferred products account for the remaining share.]
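As a rough illustration of the kind of market simulation behind Figure 3, the sketch below constructs best, intermediate, and worst products from average part-worths and computes shares with a logit (share of preference) rule as a stand-in for the market simulator. The part-worths are randomly generated placeholders, not the study data.

```python
# Sketch of a share-of-preference simulation for three constructed products
# (best, intermediate, worst levels), assuming individual-level part-worths
# from a main-effects choice model. All numbers here are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_resp, n_attr, n_lev = 500, 8, 3
# part_worths[r, a, l] = utility of level l of attribute a for respondent r
part_worths = rng.normal(size=(n_resp, n_attr, n_lev))

# Define products by the level chosen for each attribute.
mean_pw = part_worths.mean(axis=0)                 # aggregate (average) utilities
best = mean_pw.argmax(axis=1)                      # highest average utility per attribute
worst = mean_pw.argmin(axis=1)                     # lowest average utility per attribute
middle = np.array([np.argsort(mean_pw[a])[1] for a in range(n_attr)])

def shares(products):
    """Logit (share of preference) shares averaged over respondents."""
    u = np.stack([part_worths[:, np.arange(n_attr), p].sum(axis=1) for p in products], axis=1)
    expu = np.exp(u - u.max(axis=1, keepdims=True))
    return (expu / expu.sum(axis=1, keepdims=True)).mean(axis=0)

most, mid, least = shares([best, middle, worst])
print(f"Most preferred: {most:.1%}  Intermediate: {mid:.1%}  Least preferred: {least:.1%}")
```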

Each task in the choice experiment represents a subset of the possible combinations of features and levels. In the build your own question, respondents have the ability of "choosing" any of the 3^9 possible product configurations. Figure 4 shows the proportion of respondents building products with the best (most often picked) levels of all nine attributes, and of eight, seven and six attributes. Only a handful of respondents configured a product with the best level of all nine attributes. Given that this results in a product with the highest possible price, this is not surprising. Slightly more individuals (2.6%) configured a product with eight of the nine best levels. The average price across these configured products was $370, about 11% lower than that of the overall best product. Greater numbers designed products with best levels for seven out of nine (13.9%) and six out of nine (25.6%) attributes. Interestingly, the average price for these products is similar to that for the eight of nine attribute combinations.

Figure 4
[Percentage of respondents whose configured product contained the best levels of 6, 7, 8, or 9 of the nine attributes (25.6%, 13.9%, 2.6%, and 0.2%, respectively), with average configured prices of approximately $376, $370, $370, and $410.]


We compared choices in the build your own question with attribute and level-specific individual utility estimates from the choice model. For each respondent, we identified the most preferred level of a feature based on his or her utility estimates and compared that to the level that was selected in the build your own question. Figures 5 and 6 show the results for two individual attributes. Similar results were obtained for other attributes. One important characteristic of this particular attribute is the large increment ($100) in price between levels in the build your own question. As noted previously, conditional pricing was used in the choice experiment to reflect this increment. However, it appears that the conditional pricing may not capture the effect of this price increment. For example, only 23% of those who gave the highest utility to level three actually selected this level in the build your own question.

Figure 5
[% "Correct" for First Attribute: for respondents whose choice-model utilities most favored Level 1, Level 2, or Level 3, the percentage who picked Level 1, 2, or 3 in the build your own question.]

Figure 6
[% "Correct" for Second Attribute: the same cross-tabulation for a second attribute.]
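The individual-level comparison summarized in Figures 5 and 6 amounts to a cross-tabulation of each respondent's utility-preferred level against the level actually picked. A sketch, with simulated arrays standing in for the HB utilities and build your own selections:

```python
# Sketch of the attribute-level "percent correct" comparison: for one attribute,
# cross-tabulate each respondent's utility-preferred level against the level
# actually selected in the build your own question. Data below are simulated.
import numpy as np

rng = np.random.default_rng(1)
n_resp, n_lev = 967, 3
utils = rng.normal(size=(n_resp, n_lev))      # stand-in for HB utilities, one attribute
picked = rng.integers(0, n_lev, size=n_resp)  # stand-in for BYO selections

preferred = utils.argmax(axis=1)              # most preferred level per respondent

# Rows: preferred level from the choice model; columns: level picked in the BYO task.
crosstab = np.zeros((n_lev, n_lev))
for pref, pick in zip(preferred, picked):
    crosstab[pref, pick] += 1
row_pct = crosstab / crosstab.sum(axis=1, keepdims=True)

for lev in range(n_lev):
    print(f"Prefer level {lev + 1}: picked the same level {row_pct[lev, lev]:.0%} of the time")
```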


Build your own results may aid in the design of an optimal product line consisting of more than one variant based on different feature combinations. As a simple example, we looked at three attributes in combination. Here our goal is to identify two or more different products that will maximize overall penetration. Figure 7 shows the ranking of several configurations in terms of unique reach as well as the selection frequencies for the levels of each attribute.

Figure 7

Best Product Configurations for Top Three Attributes

                 Attribute 1   Attribute 2   Attribute 3   Unique % choosing
First product    Level C       Level C       Level C       19%
Second product   Level C       Level C       Level B       13%
Third product    Level C       Level B       Level B       11%
Fourth product   Level C       Level B       Level C        9%
Total            69%           66%           53%           52%
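One way to read Figure 7 is as a reach calculation on the configured products: each candidate product "reaches" the respondents whose build your own configuration on the three attributes it matches exactly, and products are added in order of incremental reach. The greedy sketch below follows that reading; the configurations are simulated, and the exact-match rule is an assumption, not necessarily the authors' procedure.

```python
# Greedy unique-reach sketch for a small product line, in the spirit of Figure 7.
# Each respondent's build-your-own configuration for three attributes is treated
# as the only product that "reaches" them. Configurations here are simulated.
from collections import Counter
import random

random.seed(2)
levels = ["A", "B", "C"]
respondents = [tuple(random.choices(levels, weights=[1, 2, 3])[0] for _ in range(3))
               for _ in range(967)]

def build_line(configs, n_products=4):
    """Pick products one at a time, each time taking the configuration that adds
    the most not-yet-reached respondents; report each product's unique reach."""
    remaining = list(configs)
    line = []
    for _ in range(n_products):
        if not remaining:
            break
        config, count = Counter(remaining).most_common(1)[0]
        line.append((config, count / len(configs)))
        remaining = [c for c in remaining if c != config]
    return line

line = build_line(respondents)
for config, unique_reach in line:
    print(config, f"unique reach: {unique_reach:.0%}")
print(f"total reach: {sum(r for _, r in line):.0%}")
```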

Second Study: Consumer Durable

The second study focused on a consumer durable category. For this study, we employed a partial profile design because of the large number of attributes (12). Six attributes were constant in every choice task (brand, price, form factor, and three attributes that were either more important to management or had several levels). The remaining attributes randomly rotated into and out of the choice tasks. Each respondent evaluated 25 choice tasks, of which 5 were holdout tasks. Each task presented three concepts. A "none" alternative was included. Respondents, 1,170 in total, were recruited from the HPOL on-line panel. A main effects model was estimated using CBC-HB.

Study Two Results

In comparing the results of the choice model with those of the build your own question, we assumed that two attributes, brand and form factor, would be most informative. Previous research has shown that brand and form factor preferences are particularly strong. Moreover, brand and form factor account for most of the variability in price in the build your own exercise. Figures 8 and 9 compare the simulated results for each brand and form factor with the selections from the build your own question. In this instance, if we take the product configuration as the reference point, the choice model overestimates preference for Form A and underestimates preference for Form B. Form B is somewhat (10%) more expensive than Form A, on average. For brand, the results are similar, with the choice model overestimating some brands and underestimating others.


Figure 8
[Predicted brand share from the market simulation vs. actual build your own pick rate for Form A, by brand (Brands A through G).]

Figure 9
[Predicted share for each form factor from the market simulation vs. actual build your own pick rate (Forms A, B, and C).]

We compared incremental effects of level changes from the choice model with level selections in the build your own question. Figure 10 shows the results for one attribute with three levels. This was a "partial" attribute, appearing in 50% of the choice tasks. This attribute has the largest price difference between levels (+$450 for level 2 and +$900 for level 3).4

4 For this study, we incorporated the incremental pricing for attribute levels by adding the incremental price from the build your own question to the base price. The price range tested in the choice experiment was large enough to allow simulation of all base prices with all possible attribute-specific incremental prices.



So that the results are as comparable as possible, we looked at the differences by form factor for the two forms that accounted for 84% of the choices. These results are interesting because the differences between levels are much less pronounced for the choice model than for the build your own responses. In particular, for Form A, respondents are much more sensitive in configuring a product to the price difference between level 2 and level 3.

Figure 10
[Comparison of Incremental Effects of Attribute Levels: preference share from the simulation and percentage of respondents picking each level in the build your own question, for Level 1, Level 2 (+$450), and Level 3 (+$900), shown separately for Form A and Form B.]

We see a similar effect for the second attribute in Figure 11. We included a second simulation where the base price did not include the level-specific increment. There is very little difference between the two simulations, suggesting that the incremental prices may not be large enough to make a difference in the simulator.

Figure 11
[Simulations with and without the price increment, vs. BYO: preference share / percentage picking each of eight levels (Lvl 1 +$0, Lvl 2 +$50, Lvl 3 +$125, Lvl 4 +$150, Lvl 5 +$50, Lvl 6 +$200, Lvl 7 +$150, Lvl 8 +$150) for the simulation with the price increment, the simulation without it, and the build your own selections.]


DISCUSSION

At the outset we sought to answer three questions:

• How do the results from a design your own question compare to preference shares estimated from a discrete choice model?

• Can design your own questions be used to validate results from choice-based conjoint?

• Can design your own questions be used in place of conjoint experiments?

How Do the Results Compare?

With respect to the first question, we saw that the results are similar, but there appear to be some important differences. In particular, market simulations based on choice-based conjoint may be less sensitive than the build your own question to the incremental cost of different feature levels. The two tasks are similar in some respects, notably that respondents indicate their desire for particular features, but they differ in two very important aspects. First, in the choice task, the levels of the features are uncorrelated with the levels of price. In the build your own question, every feature level has some associated incremental price, even if that incremental price is $0 for a given level. Moreover, there is a relationship between the expected utility of a level and the price increment; as functionality or benefit increases, so does the incremental price.

The second difference lies in the nature of the trade-offs that respondents make in each task. In a choice task, respondents must make trade-offs between the different attributes as well as between attributes and price. In a typical choice experiment, most of the concepts will be constructed so that a respondent will have to accept a less desirable level of one attribute in order to obtain a desired level of some other attribute. Sometimes the most desired level of an attribute will be offered at a relatively lower price, and sometimes at a relatively higher price. In the build your own question, respondents do not have to make compromises in selecting feature levels as long as they are willing to pay for those levels.

We suspect that these differences have two consequences. First, the fact that features and price were uncorrelated in the choice tasks makes it harder to detect feature-specific sensitivity to price. Our main effects models appear to do a poor job emulating the price-responsiveness observed in the design your own question. The inclusion of conditional pricing in the first study seemed to have little impact.

The second consequence is that respondents may use different choice heuristics in the two tasks. The build your own question may constrain respondents to employ a compensatory strategy with respect to price. Respondents are forced to consider all attributes to complete the question, and each attribute has some cost implication. In the choice task, however, respondents are not so constrained, and may employ non-compensatory strategies such as elimination by aspects (EBA). Use of non-compensatory strategies may have a significant impact on parameter estimates, particularly in main effects-only models (see Johnson et al., 1989).


Can We Validate CBC with Build Your Own?

Our experience indicates that the build your own question provides indirect validation of choice-based conjoint results. Given the unlimited set of alternatives offered by the build your own question, it is difficult to devise market simulations that compare directly to the build your own results. Individual-level validation turns out to be quite difficult.

As we have noted above, in one specific area the build your own results would appear to invalidate the choice-based conjoint results. In the build your own exercise respondents are more sensitive to incremental feature prices. At this point we cannot be sure if this lack of feature price sensitivity is a general characteristic of choice-based conjoint or specific to the models in our studies.

Can Design Your Own Replace CBC?

The strength of choice-based conjoint lies in the ability to estimate utility values for features and price in such a way that buyer response to different combinations of features and price can be simulated, usually in a competitive context, even if those specific combinations were never presented to survey respondents. As we have implemented it to date, the build your own question does not have this capability. However, the build your own framework may offer a way to elicit responses that can be used to estimate utilities, at least at an aggregate level. If base and incremental prices are systematically varied in some fashion, it might be possible to model the selection of different features as a function of price. Prices might be varied across individuals (requiring large samples) or perhaps within individuals through replicated build your own questions.

CONCLUSIONS

Based on our experience with these two studies, we believe that build your own questions may indeed add value to choice-based conjoint under specific circumstances. For those categories where buyers can choose among product or service variations defined by different levels of attributes, the build your own question may yield information that can be used to design the best combinations of features for variations within a product line. Build your own questions may also be more sensitive to feature-related price differences.

REFERENCE

Johnson, Eric J., Robert J. Meyer, and Sanjoy Ghose (1989), "When Choice Models Fail: Compensatory Models in Negatively Correlated Environments," Journal of Marketing Research, 24 (August), 255-270.


APPLIED PRICING RESEARCH

Jay L. Weiner, Ph.D.
Ipsos North America

ABSTRACT

The lifeblood of many organizations is to introduce new products to the marketplace. One of the commonly researched topics in the area of new product development is "What price should we charge?" This paper will discuss various methods of conducting pricing research. Focus will be given to concept testing systems for both new products and line extensions. Advantages and disadvantages of each methodology are discussed. Methods include monadic concept testing, the van Westendorp Premium Pricing Model (PSM), and choice modeling (conjoint and discrete choice).

APPROACHES TO PRICING RESEARCH

• Blunt approach – you can simply ask – "how much would you be willing to pay for this product/service?"

• Monadic concept – present the new product/service idea and ask "how likely would you be to buy @ $2.99?"

• Price Sensitivity Meter (PSM) – van Westendorp

• Conjoint Analysis and Choice-based conjoint

• Discrete Choice (and Brand/price tradeoff)

PRICE ELASTICITY

The typical economics course presents the downward sloping demand curve. As the price is raised, the quantity demanded drops, and total revenue often falls. In fact, most products exhibit a range of inelasticity. That is, demand may fall, but total revenue increases. It is the range of inelasticity that is of interest in determining the optimal price to charge for the product or service. Consider the following question: if the price of gasoline were raised 5¢ per gallon, would you drive any fewer miles? If the answer is no, then we might raise the price of gas such that the quantity demanded is unchanged, but total revenue increases. The range of inelasticity begins at the point where the maximum number of people are willing to try the product/service and ends when total revenue begins to fall. Where the marketer chooses to price the product/service depends upon the pricing strategy.

There are two basic pricing strategies. Price skimming sets the initial price high to maximize revenue. As the product moves through the product lifecycle, the price typically drops. This strategy is often used for technology-based products or products protected by patents. Intel, for example, prices each new processor high and, as competitors match performance characteristics, Intel lowers its price. Penetration pricing sets the initial price low to maximize trial. This pricing strategy tends to discourage competition, as economies of scale are often needed to make a profit.
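A toy demand schedule (all numbers invented) makes the range of inelasticity concrete: quantity falls as price rises, yet revenue keeps climbing until the elastic region is reached.

```python
# Toy demand schedule illustrating a range of inelasticity: quantity falls as
# price rises, yet total revenue keeps increasing up to a point. Numbers invented.
demand = {1.00: 1000, 1.25: 960, 1.50: 900, 1.75: 820, 2.00: 700, 2.25: 540, 2.50: 360}

previous_revenue = 0.0
for price, quantity in demand.items():
    revenue = price * quantity
    trend = "rising" if revenue > previous_revenue else "falling"
    print(f"price ${price:.2f}  quantity {quantity:4d}  revenue ${revenue:7.2f}  ({trend})")
    previous_revenue = revenue
```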


The goal of the pricing researcher should be to understand this range of prices so as to make good strategic pricing decisions. In the blunt approach, you typically need a large number of respondents to understand purchase intent at a variety of price points. The monadic concept test also requires a fairly large number of respondents across a wide range of price points.

THE VAN WESTENDORP MODEL

In order to better understand the price consumers are willing to pay for a particular product or service, Dutch economist Peter van Westendorp developed the Price Sensitivity Meter. The underlying premise of this model is that there is a relationship between price and quality and that consumers are willing to pay more for a higher quality product. The PSM requires the addition of 4 questions to the questionnaire.

• At what price would you consider the product to be getting expensive, but you would still consider buying it? (EXPENSIVE)

• At what price would you consider the product too expensive and you would not consider buying it? (TOO EXPENSIVE)

• At what price would you consider the product to be getting inexpensive, and you would consider it to be a bargain? (BARGAIN)

• At what price would you consider the product to be so inexpensive that you would doubt its quality and would not consider buying it? (TOO CHEAP)

Sample Data: van Westendorp PSM (n=281)
[Cumulative Too Cheap, Bargain, Expensive, and Too Expensive curves plotted against price ($US, $0 to $14), with the range of acceptable prices marked: MGP = $3.38, OPS = $3.58, IDP = $4.95, MDP = $5.00.]


The indifference price point (IDP) occurs at the intersection of the bargain and expensive lines. To the right of this point, the proportion of respondents who think this product is expensive exceeds the proportion that thinks it is a bargain. If you choose to price the product less than IDP, you are losing potential profits. To the left of the indifference price, the proportion of respondents who think this price is a bargain exceeds the proportion who think it is expensive. Pricing the product in excess of the IDP causes the sales volume to decline. The IDP can be considered the “normal” price for this product. The marginal point of cheapness (MGP) occurs at the intersection of the expensive and too cheap curves. This point represents the lower bound of the range of acceptable prices. The marginal point of expensiveness (MDP) occurs at the intersection of the bargain and too expensive curves. The point represents the upper bound of the range of acceptable prices. The optimum price point (OPS) represents the point at which an equal number of respondents see the product as too expensive and too cheap. This represents the “ideal” price for this product. The range between $3.58 (OPS) and $4.95 (IDP) is typically the range of inelasticity.
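A minimal sketch of how these points can be computed from the four PSM answers, assuming the usual cumulative-curve construction (too cheap and bargain treated as decreasing in price, expensive and too expensive as increasing). The responses below are simulated, not the n=281 sample shown above.

```python
# Sketch of the van Westendorp PSM calculations: cumulative curves and the
# crossing points that define MGP, OPS, IDP, and MDP. Responses are simulated.
import numpy as np

rng = np.random.default_rng(3)
n = 281
too_cheap = rng.normal(2.5, 0.8, n)
bargain = rng.normal(3.5, 0.9, n)
expensive = rng.normal(5.0, 1.0, n)
too_expensive = rng.normal(6.5, 1.2, n)

grid = np.linspace(0, 14, 561)
# % who consider the product too cheap / a bargain at or below each price (decreasing),
# and expensive / too expensive at or above each price (increasing).
pct_too_cheap = [(too_cheap >= p).mean() for p in grid]
pct_bargain = [(bargain >= p).mean() for p in grid]
pct_expensive = [(expensive <= p).mean() for p in grid]
pct_too_expensive = [(too_expensive <= p).mean() for p in grid]

def crossing(rising, falling):
    """Price at which the rising curve first meets or exceeds the falling curve."""
    diff = np.array(rising) - np.array(falling)
    return grid[np.argmax(diff >= 0)]

print("MGP (expensive x too cheap):    ", round(crossing(pct_expensive, pct_too_cheap), 2))
print("OPS (too expensive x too cheap):", round(crossing(pct_too_expensive, pct_too_cheap), 2))
print("IDP (expensive x bargain):      ", round(crossing(pct_expensive, pct_bargain), 2))
print("MDP (too expensive x bargain):  ", round(crossing(pct_too_expensive, pct_bargain), 2))
```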

Newton, Miller and Smith (1993), offer an extension of the van Westendorp model. With the addition of two purchase probability questions (BARGAIN and EXPENSIVE price points), it is possible to plot trial and revenue curves. We can assume that the probability of purchase at the TOO CHEAP and TOO EXPENSIVE prices is 0. These curves will indicate the price that will stimulate maximum trial and the price that should produce maximum revenue for the company. It is the use of these additional questions that allows the researcher to frame the range of inelasticity.

With the addition of purchase probability questions, it is possible to integrate the price perceptions into the model. The Trial/Revenue curves offer additional insight into the pricing question. By plotting the probability of purchase at each price point, we can identify the price that will stimulate maximum trial. By multiplying the proportion of people who would purchase the product at each price by the price of the product, we generate the revenue curve.
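A sketch of that trial/revenue construction under simple assumptions: each respondent contributes probability zero at their TOO CHEAP and TOO EXPENSIVE prices, their stated probabilities at the BARGAIN and EXPENSIVE prices, and linear interpolation in between; revenue is trial multiplied by price. The data are simulated.

```python
# Sketch of trial and revenue curves from PSM-plus-purchase-probability data,
# assuming linear interpolation between each respondent's four anchor prices
# (probability 0 at TOO CHEAP and TOO EXPENSIVE). Data are simulated.
import numpy as np

rng = np.random.default_rng(4)
n = 281
too_cheap = rng.uniform(1.0, 2.5, n)
bargain = too_cheap + rng.uniform(0.5, 1.5, n)
expensive = bargain + rng.uniform(0.5, 2.0, n)
too_expensive = expensive + rng.uniform(0.5, 2.0, n)
p_bargain = rng.uniform(0.4, 0.9, n)      # stated purchase probability at BARGAIN price
p_expensive = rng.uniform(0.1, 0.5, n)    # stated purchase probability at EXPENSIVE price

grid = np.linspace(0, 14, 281)

def trial_at(price):
    """Average interpolated purchase probability across respondents at a price."""
    probs = np.array([
        np.interp(price, [tc, b, e, te], [0.0, pb, pe, 0.0], left=0.0, right=0.0)
        for tc, b, e, te, pb, pe in zip(too_cheap, bargain, expensive,
                                        too_expensive, p_bargain, p_expensive)
    ])
    return probs.mean()

trial = np.array([trial_at(p) for p in grid])
revenue = trial * grid
print(f"Maximum trial at   ${grid[trial.argmax()]:.2f}")
print(f"Maximum revenue at ${grid[revenue.argmax()]:.2f}")
```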


Sample Data: Trial/Revenue Curves (n=281)
[Trial and revenue curves plotted against price ($US, $0 to $14); maximum trial at $3.50 and maximum revenue at $5.00.]

The difference between the point of maximum trial ($3.50) and the point of maximum revenue ($5.00) represents the relative inelasticity of the product. Inelastic products are products where there is little or no decrease in sales if the price were increased. Most products are inelastic for a narrow range of price. If we choose to price the product at the point of maximum revenue, we may lose a few customers, but the incremental revenue more than offsets the decline in sales.

Monadic price concept tests are very blunt instruments for determining price sensitivity. We believe that the data suggest that the range of inelasticity is far greater than it should be. For example: for one concept we tested three prices, i.e., $9.99, $12.99, and $15.99, which means we tested the same concept three times, each with a different price. The results showed that $9.99 elicited significantly more appeal than either of the other two price points, which were equal in appeal (DWB: 18%… for $9.99 vs. 14%… for $12.99 & $15.99). As such, the conclusion would be that trial would be highest at $9.99 but that the trial curve would be inelastic from $12.99 to $15.99. Therefore, one interpretation would be that if we were willing to accept lower trial by pricing at $12.99, we could increase our revenue by jumping to $15.99, as the consumer is price insensitive between $12.99 and $15.99. However, the van Westendorp analysis adds a different perspective. Using this model, the optimum price is about the same. Maximum trial is achieved at $8 while maximum revenue is achieved between $8.60 and $9.95. Beyond these price points, pricing is very elastic, meaning that both trial and revenue would decrease significantly as price increases.


Comparison of van Westendorp to Monadic Concept Tests

In this experiment, we tested eight new product concepts. There were four cells for each of the eight concepts. A nationally representative sample was drawn for each group (Base > 300 each cell). For each new product concept, there were three monadic price cells (the price the client wanted to test and ±25%) and one unpriced cell with the van Westendorp and Newton/Miller/Smith questions. Both premium price and low price products were tested to determine if the relative price affected the results. Premium price products are products that are relatively expensive compared to other products in the category.

Products/Prices Tested

Category                          Low      Med      High
Healthcare (Premium Price)        $4.29    $5.69    $6.99
Healthcare (Low Price)            $2.79    $3.69    $4.59
Health & Beauty (Premium Price)   $5.59    $7.49    $9.39
Health & Beauty (Low Price)       $2.99    $3.99    $4.99
Snack (Premium Price)             $2.99    $3.99    $4.99
Condiment (Low Price)             $2.29    $2.99    $3.69
Home Care (Low Price)             $1.89    $2.49    $3.09
Personal Care (Premium Price)     $2.29    $2.99    $3.69

Healthcare (Premium Price)
[Trial curves: van Westendorp (vW) trial vs. monadic concept (MC) trial across the tested price range.]


Healthcare (Low Price)
[Trial curves: vW trial vs. MC trial across the tested price range.]

Health & Beauty (Premium Price)
[Trial curves: vW trial vs. MC trial across the tested price range.]


Health & Beauty (Low Price)

Snack Food (Premium Price)

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

$2.99 $3.49 $3.99 $4.50 $4.99

vW Trial MC Trial

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

$2.99 $3.49 $3.99 $4.50 $4.99

vW Trial MC Trial


Condiment (Low Price)
[Trial curves: vW trial vs. MC trial across the tested price range.]

Home Care (Low Price)
[Trial curves: vW trial vs. MC trial across the tested price range.]


Personal Care (Premium Price)
[Trial curves: vW trial vs. MC trial across the tested price range.]

HOW WELL DOES THIS VALIDATE?

For each of the 8 products, we have actual year 1 trial (IRI), estimates of ACV and awareness, and average product price (net of promotions). For two of our product concepts, the average price was below the lowest monadic price tested.

Category           Low       Avg. Price
Healthcare         $ 4.29    $ 6.14
Healthcare         $ 2.79    $ 3.13
Health & Beauty    $ 5.59    $ 5.16 ***
Health & Beauty    $ 2.99    $ 3.47
Snack              $ 2.99    $ 3.81
Condiment          $ 2.29    $ 2.50
Home care          $ 1.89    $ 1.79 ***
Personal Care      $ 2.29    $ 2.44


Predicted versus Actual Trial
[Bar chart comparing actual year 1 trial with van Westendorp-predicted and monadic-predicted trial for each of the eight products.]

WHAT ABOUT CONJOINT ANALYSIS

For the lower priced Healthcare product, conjoint purchase intent data were collected. In addition to price, two additional attributes were varied. This resulted in a very simple nine-card design. By selecting the correct levels of the two additional attributes tested, we can predict purchase intent for each of the three prices tested. The prices from the monadic concept test were used in the conjoint. The results suggest that conjoint results over predict purchase intent.
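A hedged sketch of the sort of main-effects estimation a nine-card design supports: effects-coded attributes and ordinary least squares, then predicted purchase intent at each tested price. The card plan and ratings below are invented; only the three prices are taken from the monadic test of the low-price healthcare product.

```python
# Sketch of main-effects estimation for a simple nine-card conjoint design:
# effects-coded attributes, ordinary least squares, then predicted purchase
# intent at each price. The design and ratings are invented placeholders.
import numpy as np

prices = [2.79, 3.69, 4.59]   # three price levels (from the monadic test)

# A 3x3 orthogonal (Graeco-Latin-square style) plan for three 3-level attributes.
cards = [(i % 3, (i + i // 3) % 3, (i + 2 * (i // 3)) % 3) for i in range(9)]
ratings = np.array([0.55, 0.45, 0.30, 0.60, 0.40, 0.35, 0.50, 0.48, 0.28])  # invented

def effects_code(level, n_levels=3):
    """Effects coding: the last level is the negative sum of the others."""
    row = [0.0] * (n_levels - 1)
    if level < n_levels - 1:
        row[level] = 1.0
    else:
        row = [-1.0] * (n_levels - 1)
    return row

X = np.array([[1.0] + effects_code(a) + effects_code(b) + effects_code(c)
              for a, b, c in cards])
beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Predicted purchase intent at each price, holding the other attributes at one level.
for lvl, price in enumerate(prices):
    x = np.array([1.0] + effects_code(lvl) + effects_code(0) + effects_code(0))
    print(f"${price:.2f}: predicted purchase intent {x @ beta:.2f}")
```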


Healthcare Product (Low Price)
[Trial curves for the low-priced Healthcare product: vW trial, MC trial, and conjoint predictions across the tested price range.]

SUMMARY

Monadic concept tests tend to over-estimate trial. This may be due to the fact that prices given to respondents in a monadic concept test do not adequately reflect sales promotion activities. Respondents may think that the price given in the concept is the suggested retail price and that they are likely to buy on deal or with a coupon.

Monadic concept tests require a higher base size. A typical concept test would require 200 to 300 completes per cell. The number of cells required would depend on the prices tested, but from the results, it appears that three cells tend to over-estimate the range of inelasticity. Providing a competitive price frame might improve the results of monadic concept tests.

The van Westendorp series does a reasonable job of predicting trial from a concept test without the need for multiple cells. This reduces the cost of pricing research and also the likelihood that we do not test a price low enough. The prices given by respondents are believed to represent the actual out-of-pocket expenses. This permits the researcher some understanding of the effects of promotional activities (on-shelf price discounts or coupons). The van Westendorp series will also permit the researcher to understand the potential trial at price points higher than those that might be tested in a monadic test.

Conjoint analysis tends to over-estimate trial, but competitive testing (discrete choice, CBC) should improve the results.


REFERENCES

Lyon, David W. (2000). Pricing Research (Chapter 21) in Marketing Research: State-of-the-Art Perspectives.

Newton, Dennis, Jeff Miller and Paul Smith (1993). A Market Acceptance Extension to Traditional Price Sensitivity Measurement. AMA ART Forum.

van Westendorp, Peter H. (1976). NSS – Price Sensitivity Meter (PSM) – A New Approach to Study Consumer Perception of Price. Venice Congress Main Sessions, p. 139-167.

Weiner, Jay L. & Lee Markowitz (2000). Calibrating / Validating an Adaptation of the van Westendorp Model. AMA ART Forum Poster Session.


RELIABILITY AND COMPARABILITY OF CHOICE-BASED MEASURES: ONLINE AND PAPER-AND-PENCIL METHODS OF ADMINISTRATION

Thomas W. Miller
A.C. Nielsen Center, School of Business
University of Wisconsin-Madison

David Rake∗
Reliant Energy

Takashi Sumimoto∗
Harris Interactive

Peggy S. Hollman∗
General Mills

ABSTRACT

Are choice-based measures reliable? Are measures obtained from the online administration of choice tasks comparable to measures obtained from paper-and-pencil administration? Does the complexity of a choice task affect reliability and comparability? We answered these questions in a test-retest study. University student participants made choices for 24 pairs of jobs in test and retest phases. Students in the low task complexity condition chose between pairs of jobs described by six attributes; students in the high task complexity condition chose between pairs of jobs described by ten attributes. To assess reliability or comparability, we used the number of choices in agreement between test and retest. We observed high reliability and comparability across methods of administration and levels of task complexity.

INTRODUCTION

Providers and users of marketing research have fundamental questions about the reliability and validity of online measures (Best et al. 2001; Couper 2000; Miller and Gupta 2001). Studies in comparability help to answer these questions. In this paper, we present a systematic study of online and paper-and-pencil methods of administration. We examine the reliability and comparability of choice-based measures. Our study may be placed within the context of studies that compare online and traditional methods of administration (Miller 2001; Miller and Dickson 2001; Miller and Panjikaran 2001; Witt 2000).

Methods

We implemented the online choice task using zTelligence™ survey software, with software and support from InsightTools and MarketTools. To provide a relevant task for university student participants, we used a simple job choice task. Students were given Web addresses and personal identification numbers. The online task consisted of a welcome screen of instructions and 24 choice pairs, with one choice pair per screen. Student participants clicked on button images under job descriptions to indicate choices. Students were permitted to work from any

∗ Research completed while students at the A.C. Nielsen Center, School of Business, University of Wisconsin-Madison.


Web-connected device. Computer labs were available, and help desk assistance was available. Initial online instructions to students were as follows:

In this survey you will be given 24 pairs of jobs. For each pair of jobs, your task is to pick the job that you prefer. Indicate your choice by clicking on the button below the preferred job. After you have made your choice, press the “CONTINUE” button to move on to the next pair of jobs.

For each choice pair, we repeated the instruction, “Pick the job that you prefer.”

Paper-and-pencil administration consisted of one page of instructions describing the choice task, a 24-page booklet with one choice pair per page, and a separate answer sheet. Instructions for the paper-and-pencil method were similar to instructions for the online method, except that students were asked to indicate job preferences by using a pencil to fill in circles on the answer sheet. Answer sheets were machine-scored.

To examine choice tasks of different levels of complexity, we constructed six- and ten-attribute choice designs. The six-attribute design consisted of the first six attributes from the ten-attribute design. We employed identical 24-pair choice tasks in test and retest phases. Some participants received the six-attribute task; others received the ten-attribute task. Table 1 shows the job attributes and their levels.

To design the choice tasks, we worked with the set of ten job attributes and their levels. Side-by-side choice set pairs were generated so that all attribute levels for the left-hand member of a pair would be different from attribute levels of the right-hand member of the pair. We assured within-choice-pair balance by having levels within attributes matched equally often with one another. Using the PLAN and OPTEX procedures of the Statistical Analysis System (SAS), we generated a partially balanced factorial design. The design was balanced for main effects, with each level of each attribute occurring equally often on the left- and right-hand sides of choice pairs. The design was balanced for binary attribute two-way interaction effects, with both levels of a binary attribute occurring equally often with both levels of all other binary attributes. Two-way interactions between the income attribute and all binary attributes were also balanced in the design. Because the six-attribute design was a subset of the ten-attribute design, balance for the ten-attribute design applied as well to the six-attribute choice design.
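The SAS PLAN/OPTEX step is not reproduced here, but the balance properties described above are easy to check. The sketch below verifies, for a toy two-attribute design, that levels differ within every pair and that each level appears equally often on the left and right sides.

```python
# Sketch of balance checks for side-by-side choice pairs: every attribute level
# should differ within a pair and appear equally often on the left and right.
# The four-pair, two-attribute design below is a toy example, not the SAS design.
from collections import Counter

pairs = [  # (left profile, right profile); each profile is a tuple of attribute levels
    ((0, 0), (1, 1)),
    ((1, 1), (0, 0)),
    ((0, 1), (1, 0)),
    ((1, 0), (0, 1)),
]

def within_pair_ok(pairs):
    """True if all attribute levels differ between the two members of every pair."""
    return all(l != r for left, right in pairs for l, r in zip(left, right))

def side_balance(pairs):
    """Count how often each (attribute, level) shows up on each side."""
    counts = {"left": Counter(), "right": Counter()}
    for left, right in pairs:
        for attr, (l, r) in enumerate(zip(left, right)):
            counts["left"][(attr, l)] += 1
            counts["right"][(attr, r)] += 1
    return counts

print("levels differ within every pair:", within_pair_ok(pairs))
for side, counter in side_balance(pairs).items():
    print(side, dict(counter))
```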

We employed a follow-up survey, which was administered online to all participants after the retest. Self-reported day and time of test and retest, demographic data, and task preferences were noted in the follow-up survey. Students were also encouraged to make open-ended comments about the task.


Attribute               Levels
Annual salary           $35,000; $40,000; $45,000; $50,000
Typical work week       30 hours; 40 hours; 50 hours; 60 hours
Schedule                Fixed work hours; Flexible work hours
Annual vacation         2 weeks; 4 weeks
Location                Small city; Large city
Climate                 Mild, small seasonal changes; Large seasonal changes
Type of organization    Work in small organization; Work in large organization
Type of industry        High-risk, growth industry; Low-risk, stable industry
Work and life           On call while away from work; Not on call while away from work
Signing bonus           No signing bonus; $5,000 signing bonus

Table 1: Job Attributes and Levels

Methods of administration in the test and retest phases defined a between-subjects factor in our study. Data from participants receiving the same method of administration in test and retest (online test and retest or paper-and-pencil test and retest) provided information about the reliability of measures. Data from participants receiving both methods of administration (online test followed by paper-and-pencil retest or paper-and-pencil test followed by online retest) provided information about the comparability of methods. Methods of administration were completely crossed with the other between-subjects factor, task complexity (low complexity at six attributes or high complexity at ten attributes), to yield eight treatment conditions.

We recruited student volunteers from an undergraduate course in marketing management at the University of Wisconsin-Madison. We assembled packets with instructions and treatment materials for the test phase. Packets for the eight treatment conditions were randomly ordered and distributed to students at the conclusion of their Monday classes. On Wednesday of the same week, students went to a central location, returned their test-phase packets and received packets for the retest phase and follow-up survey. The incentive for students, five points toward their grade in the class, was provided only to those who completed the test, retest, and follow-up surveys.

Table 2 shows response rates across the eight treatment conditions. In three out of four test-retest conditions, we observed higher response rates for participants who received the six-attribute task than for participants who received the ten-attribute task. This is not surprising given that the six-attribute task can be expected to require less effort to complete.

Methods of        Task             Number        Number       Response Rate
Administration    Complexity       Distributed   Completed    (Percent Complete)
Online-Online     Six Attributes   48            30           62.5
Online-Paper      Six Attributes   56            43           76.8
Paper-Online      Six Attributes   51            44           86.3
Paper-Paper       Six Attributes   57            49           86.0
Online-Online     Ten Attributes   55            37           67.3
Online-Paper      Ten Attributes   50            36           72.0
Paper-Online      Ten Attributes   50            36           72.0
Paper-Paper       Ten Attributes   48            31           64.6

Table 2: Response Rates by Methods of Administration and Task Complexity

(Percent Completing Test, Retest, and Follow-up Questionnaire)

Our analysis of reliability and comparability focused upon the number of choices in agreement between test and retest. The number of choices in agreement represented a simple response variable to compute and understand. We would expect other choice-based measures, such as estimates of job choice share, attribute importance, or utility, to follow a similar pattern of results. That is, if we see a high number of choices in agreement between test and retest, then we expect derivative measures to have high agreement between test and retest.


RESULTS AND CONCLUSIONS

We observed high levels of test-retest reliability for both the online and paper-and-pencil methods. We also observed high levels of comparability between online and paper-and-pencil methods. Median numbers of choices in agreement were between 20 and 21 for a task involving 24 choice pairs. High levels of reliability and comparability were observed for both the six- and ten-attribute tasks. Within a classical hypothesis testing framework using generalized linear models and our completely crossed two-factor design (methods of administration by task complexity), we observed no statistically significant differences across treatments for the number of choices in agreement.
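A hedged sketch of such a crossed two-factor test, using an OLS/ANOVA formulation as a stand-in for the authors' generalized linear models; the agreement counts below are simulated, not the study data.

```python
# Sketch of a crossed two-factor test on the number of choices in agreement,
# with methods of administration and task complexity as factors. Simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
methods = ["Online-Online", "Online-Paper", "Paper-Online", "Paper-Paper"]
complexities = ["Six", "Ten"]

rows = []
for method in methods:
    for complexity in complexities:
        # roughly 30-49 respondents per cell, agreement out of 24 choice pairs
        n_cell = rng.integers(30, 50)
        agree = np.clip(rng.normal(20.5, 2.5, n_cell).round(), 8, 24)
        rows += [{"method": method, "complexity": complexity, "agree": a} for a in agree]
df = pd.DataFrame(rows)

model = smf.ols("agree ~ C(method) * C(complexity)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```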

Table 3 shows summary statistics for the number of choices in agreement between test and retest across the eight treatment conditions. Figure 1, a histogram trellis, shows the study data for the number of choices in agreement between test and retest. Another view of the results is provided by the box plot trellis of Figure 2, which shows the proportion of choices in agreement between test and retest.

Methods of        Task             Number of Choices in Agreement
Administration    Complexity       Minimum    Median    Mean
Online-Online     Six Attributes   16         21        21.1
Online-Paper      Six Attributes   10         21        20.6
Paper-Online      Six Attributes   11         21        20.4
Paper-Paper       Six Attributes   11         21        20.5
Online-Online     Ten Attributes    8         20        19.5
Online-Paper      Ten Attributes   15         20.5      20.4
Paper-Online      Ten Attributes   11         20        19.6
Paper-Paper       Ten Attributes   14         21        20.3

Table 3. Number of Choices in Agreement by Methods of Administration and Task Complexity


[Histograms of the number of choices in agreement between test and retest (percent of total), one panel for each combination of methods of administration and task complexity.]

Figure 1. Histogram Trellis for the Number of Choices in Agreement

[Box plots of the proportion of choices in agreement between test and retest, by methods of administration and task complexity.]

Figure 2. Box Plot Trellis for Proportion of Choices in Agreement

We conclude that online and paper-and-pencil tasks are highly reliable and highly comparable under favorable conditions. What are “favorable conditions”? In the present study, the task was simple, involving choice pairs with at most ten attributes. The task was short, involving 24 identical choice pairs in test and retest phases. The task was relevant (job choices for students)


and the participants were motivated (by extra credit points). The participants were Web-savvy and had access to computer lab assistance. Finally, the time between test and retest was short, fewer than three days for most participants.

The reader might ask, “If you could not demonstrate high levels of reliability and comparability under such favorable conditions, where could you demonstrate it?” This is a reasonable question. Our research is just beginning, an initial study in comparability. Additional research is needed to explore reliability and comparability across more difficult choice tasks, both in terms of the number of alternative profiles in each choice set and in terms of the number of attributes defining profiles. Further research should also examine the performance of less Web-savvy, non-university participants.


ACKNOWLEDGMENTS

Many individuals and organizations contributed to this research. Bryan Orme of Sawtooth Software provided initial consultation regarding previous conjoint and choice study research. InsightTools and MarketTools provided access to online software tools and services. Holly Pilch and Krista Sorenson helped with the recruitment of subjects and with the administration of the choice-based experiment. Caitlyn A. Beaudry, Janet Christopher, and Nicole Kowbel helped to prepare the manuscript for publication.

REFERENCES

Best, S.J., Krueger, B., Hubbard, C., and Smith, A. (2001). "An Assessment of the Generalizability of Internet Surveys," Social Science Computer Review, 19(2), Summer, 131-145.

Couper, M. P. (2000). "Web Surveys: A Review of Issues and Approaches," Public Opinion Quarterly, 64:4, Winter, 464-494.

Miller, T. W. (2001). "Can We Trust the Data of Online Research?" Marketing Research, Summer, 26-32.

Miller, T. W. and Dickson, P. R. (2001). "On-line Market Research," International Journal of Electronic Commerce, 3, 139-167.

Miller, T. W. and Gupta, A. (2001). Studies of Information, Research, and Consulting Services (SIRCS): Fall 2000 Survey of Organizations, A.C. Nielsen Center for Marketing Research, Madison, WI.

Miller, T. W. and Panjikaran, K. (2001). Studies in Comparability: The Propensity Scoring Approach, A.C. Nielsen Center for Marketing Research, Madison, WI.

Witt, Karlan J. (2000). "Moving Studies to the Web," Sawtooth Software Conference Proceedings, 1-21.


TRADE-OFF STUDY SAMPLE SIZE: HOW LOW CAN WE GO?1

Dick McCullough
MACRO Consulting, Inc.

ABSTRACT

The effect of sample size on model error is examined through several commercial data sets, using five trade-off techniques: ACA, ACA/HB, CVA, HB-Reg and CBC/HB. Using the total sample to generate surrogate holdout cards, numerous subsamples are drawn, utilities estimated and model results compared to the total sample model. Latent class analysis is used to model the effect of sample size, number of parameters and number of tasks on model error.

INTRODUCTION

The effect of sample size on study precision is always an issue for commercial market researchers. Sample size is generally the single largest out-of-pocket cost component of a commercial study. Determining the minimum acceptable sample size plays an important role in the design of an efficient commercial study.

For simple statistical measures, such as confidence intervals around proportions estimates, the effect of sample size on error is well known (see Figure 1). For more complex statistical processes, such as conjoint models, the effect of sample size on error is much more difficult to estimate. Even the definition of error is open to several interpretations.
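As an illustrative sketch (not part of the original analysis), the proportions error curve in Figure 1 can be reproduced from the familiar confidence interval half-width for a proportion; a 95% interval at p = 0.50 is assumed here.

import math

def ci_half_width(n, p=0.5, z=1.96):
    """Confidence interval half-width for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 200, 400, 600):
    print(f"n = {n:4d}  +/- {ci_half_width(n):4.1f} points")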

Figure 1. Sample Error for Proportions at 50% (confidence interval versus sample size, for sample sizes from 0 to 600).

1 The author wishes to thank Rich Johnson for his invaluable suggestions and guidance during the preparation of this paper. The author also thanks POPULUS and The Analytic Helpline, Inc. for generously sharing commercial data sets used in the analysis.


Many issues face practitioners when determining sample size:

• Research objectives
• Technique
• Number of attributes and levels
• Number of tasks
• Expected heterogeneity
• Value of the information
• Cost and timing
• Measurement error
• Structure and efficiency of experimental design:
  o Fixed designs
  o Blocked designs
  o Individual-level designs

Some of these issues are statistical in nature, such as number of attributes and levels, and some of these issues are managerial in nature, such as value of the information, cost and timing. The commercial researcher needs to address both types of issues when determining sample size.

Objectives

The intent of this paper is to examine a variety of commercial data sets in an empirical way to see if some comments can be made about the effect of sample size on model error. Additionally, the impact of several factors (number of attributes and levels, number of tasks, and trade-off technique) on model error will also be investigated.

Method

For each of five trade-off techniques, ACA, ACA/HB, CVA, HB-Reg, and CBC/HB, three commercial data sets were examined (the data sets for ACA and CVA also served as the data sets for ACA/HB and HB-Reg, respectively). Sample size for each data set ranged between 431 and 2,400.

Since these data sets were collected from a variety of commercial marketing research firms, there was little control over the number of attributes and levels or the number of tasks. Thus, while there was some variation in these attributes, there was less experimental control than would be desired, particularly with respect to trade-off technique.


Table 1

                 Attr   Lvls   Pars   Tasks    df      SS
CBC/HB
  Data set 1       4     14     11      8      -3     612
  Data set 2       6     17     12     18      +6     422
  Data set 3       5     25     21     12      -9     444
CVA, HB-Reg
  Data set 1       6     24     19     30     +11   2,400
  Data set 2       4      9      6     10      +4     431
  Data set 3       6     13      8     16      +8     867
ACA, ACA/HB
  Data set 1      25     78     54      -       -     782
  Data set 2       5     24     20      -       -     500
  Data set 3      17     63     47      -       -     808

(Attr = number of attributes, Lvls = total levels, Pars = parameters estimated, df = tasks minus parameters, SS = sample size.)

Notice in Table 1 above that the number of parameters and number of tasks are somewhat correlated with trade-off technique. CBC/HB data sets tended to have fewer degrees of freedom (number of tasks minus the number of parameters) than CVA data sets. ACA data sets had a much greater number of parameters than either CBC/HB or CVA data sets. These correlations occur quite naturally in the commercial sector. Historically, choice models have been estimated at the aggregate level while CVA models are estimated at the individual level. By aggregating across respondents, choice study designers could afford to use fewer tasks than necessary for estimating individual level conjoint models. Hierarchical Bayes methods allow for the estimation of individual level choice models without making any additional demands on the study’s experimental design. A major benefit of ACA is its ability to accommodate a large number of parameters.

For each data set, models were estimated using a randomly drawn subset of the total sample, for the sample sizes of 200, 100, 50 and 30. In the cases of ACA and CVA, no new utility estimation was required, since each respondent’s utilities are a function of just that respondent. However, for CBC/HB, HB-Reg and ACA/HB, new utility estimations occurred for each draw, since each respondent’s utilities are a function of not only that respondent, but also the “total” sample. For each sample size, random draws were replicated up to 30 times. The number of replicates increased as sample size decreased. There were five replicates for n=200, 10 for n=100, 20 for n=50 and 30 for n=30. The intent here was to stabilize the estimates to get a true sense of the accuracy of models at that sample size.

Since it was anticipated that many, if not all, of the commercial data sets to be analyzed in this paper would not contain holdout choice tasks, models derived from reduced samples were compared to models derived from the total sample. That is, in order to evaluate how well a smaller sample size was performing, 10 first choice simulations were run for both the total sample model and each of the reduced sample models, with the total sample model serving to generate surrogate holdout tasks. Thus, MAEs (Mean Absolute Error) were the measure with which models were evaluated (each sub-sample model being compared to the total sample model). 990 models (5 techniques x 3 data sets x 66 sample size/replicate combinations) were estimated and evaluated. 9,900 simulations were run (990 models x 10 simulations) as the basis for the MAE estimations.
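As an illustrative sketch (with hypothetical share values, not the study’s data), the MAE calculation described above can be written as:

import numpy as np

def mean_absolute_error(total_sample_shares, sub_sample_shares):
    """MAE between two arrays of simulated shares, in share points."""
    total = np.asarray(total_sample_shares, dtype=float)
    sub = np.asarray(sub_sample_shares, dtype=float)
    return np.mean(np.abs(total - sub))

# Hypothetical shares (%) from one first choice simulation scenario:
total_model = [32.0, 41.0, 27.0]      # surrogate holdout from the total sample
subsample_model = [29.5, 44.0, 26.5]  # model built on a reduced sample
print(mean_absolute_error(total_model, subsample_model))  # 2.0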

Additionally, correlations were run, at the aggregate level, between the mean utilities from each of the sub-sample models and the total sample model. Correlation results were reported in the form 100 * (1-rsquared), and called, for the duration of this paper, mean percentage of error (MPE).

It should be noted that there is an indeterminacy inherent in conjoint utility scaling that makes correlation analysis potentially meaningless. Therefore, all utilities were scaled so that the levels within each attribute summed to zero. This allowed for meaningful correlation analysis to occur.
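A minimal sketch of this scaling convention and of the MPE measure follows; the utilities and the two-attribute layout are hypothetical.

import numpy as np

def zero_center(utilities, attribute_slices):
    """Re-scale utilities so the levels within each attribute sum to zero."""
    u = np.asarray(utilities, dtype=float).copy()
    for sl in attribute_slices:
        u[sl] -= u[sl].mean()
    return u

def mpe(total_utils, sub_utils):
    """Mean percentage of error: 100 * (1 - r squared)."""
    r = np.corrcoef(total_utils, sub_utils)[0, 1]
    return 100 * (1 - r ** 2)

attrs = [slice(0, 3), slice(3, 6)]          # two hypothetical 3-level attributes
total = zero_center([1.0, 2.0, 4.0, 0.5, 1.0, 2.5], attrs)
sub = zero_center([1.2, 1.9, 4.3, 0.4, 1.2, 2.3], attrs)
print(round(mpe(total, sub), 2))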

SAMPLE BIAS ANALYSIS

Since each sub-sample was being compared to a larger sample, of which it was also a part, there was a sample bias inherent in the calculation of error terms.

Several studies using synthetic data were conducted to determine the magnitude of the sample bias and to develop correction factors to adjust the raw error terms for sample bias.

Sample Bias Study 1

For each of four different scenarios, random numbers between 1 and 20 were generated 10 times for two data sets of sample size 200. In the first scenario, the first 100 data points were identical for the two data sets and the last 100 were independent of one another. In the second scenario, the first 75 data points were identical for the two data sets and the last 125 were independent of one another. In the third scenario, the first 50 data points were identical for the two data sets and the last 150 were independent of one another. And in the last scenario, the first 25 data points were identical for the two data sets and the last 175 were independent of one another.

The correlation between the two data sets, r, approximately equals the degree of overlap, n/N, between the two data sets (Table 2).


Table 2

N=200          n=100       n=75        n=50        n=25
             0.527451    0.320534    0.176183    0.092247
             0.474558    0.411911    0.255339    0.142685
             0.611040    0.310900    0.226798    0.111250
             0.563223    0.287369    0.223945    0.194286
             0.487692    0.398193    0.368615    0.205507
             0.483789    0.473380    0.229888   -0.095050
             0.524381    0.471472    0.288293    0.250967
             0.368708    0.274371    0.252346    0.169203
             0.446393    0.401521    0.245936    0.109158
             0.453217    0.389331    0.139375    0.184337
mean r =     0.494045    0.373898    0.240672    0.136459
n/N =        0.500000    0.375000    0.250000    0.125000
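A small Monte Carlo sketch of this setup (two samples of size N = 200 sharing their first n values) reproduces the pattern that the observed correlation is roughly n/N; the random seed and uniform 1–20 draws mirror the study description.

import numpy as np

rng = np.random.default_rng(0)
N = 200
for n in (100, 75, 50, 25):
    r_values = []
    for _ in range(10):
        a = rng.integers(1, 21, size=N).astype(float)
        b = rng.integers(1, 21, size=N).astype(float)
        b[:n] = a[:n]                      # overlapping portion of the two samples
        r_values.append(np.corrcoef(a, b)[0, 1])
    print(f"n/N = {n / N:.3f}   mean r = {np.mean(r_values):.3f}")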

Sample Bias Study 2

To extend the concept further, a random sample of 200 was generated, a second sample of 100 was created where each member of the second sample was equal to a member of the first sample, and a third sample of a random 100 was generated, independent of the first two.

For each of the three samples, the mean was calculated. This process was replicated 13 times and the mean data are reported below (Table 3).

The absolute difference (MAE) between the first two data sets is 0.147308 and the absolute difference between the first and third data sets is 0.218077. By dividing the MAE for the first two data sets by the finite population correction factor (sqrt(1-n/N)), the MAEs become quite similar.


Table 3

   N=200      n=100 (overlapping)    n=100 (independent)
 11.07500          11.18000                9.54000
 10.27500          10.15000               11.15000
 10.85000          11.15000               10.62000
 10.59500          10.51000               10.81000
  9.99000           9.92000               10.88000
  9.73500          10.11000               11.19000
 10.55500          11.30000               11.43000
 11.44000          11.68000               10.88000
 10.41000          10.33000                9.37000
 10.13000          10.55000               10.87000
 10.34000           9.84000               11.23000
 10.29500          10.86000               11.46000
 10.85500          10.88000                9.95000
mean:
 10.50346          10.65077               10.72154

MAE vs. N=200:      0.147308               0.218077
MAE/sqrt(1-n/N) =   0.208325

Sample Bias Study 3

To continue the extension of the concept, a random sample of 200 was generated, a second sample of 100 was created where each member of the second sample was equal to a member of the first sample, and a third sample of a random 100 was generated.

The squared correlation was calculated for the first two samples and for the first and third samples. This procedure was replicated 11 times. The 11 squared correlations for the first two samples were averaged as were the 11 squared correlations for the first and third samples.

MPEs were calculated for both mean r-squares (Table 4). The MPE for the first two samples is substantially smaller than the MPE for the first and third samples. By dividing the MPE for the first two samples by the square of the finite population correction factor (1-n/N), the MPEs become quite similar.

Note that it is somewhat intuitive that the correction factor for the MPEs is the square of the correction factor for the MAEs. MPE is a measure of squared error whereas MAE is a measure of first power error.
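Restating the two corrections suggested by Studies 2 and 3 in one place (this is a summary, not an additional result):

\[
\mathrm{MAE}_{\text{corrected}} = \frac{\mathrm{MAE}_{\text{overlap}}}{\sqrt{1 - n/N}},
\qquad
\mathrm{MPE}_{\text{corrected}} = \frac{\mathrm{MPE}_{\text{overlap}}}{1 - n/N},
\]

where n/N is the proportion of the sub-sample that overlaps the total sample.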


Table 4

                n=100 (overlapping)    n(R)=100 (independent)
rsq =                0.603135                0.099661
rsq =                0.648241                0.048967
rsq =                0.357504                0.111730
rsq =                0.303370                0.099186
rsq =                0.790855                0.178414
rsq =                0.883459                0.379786
rsq =                0.829014                0.182635
rsq =                0.477881                0.270630
rsq =                0.798317                0.010961
rsq =                0.425018                0.462108
rsq =                0.785462                0.003547
average rsq =        0.627478                0.167966
MPE =               37.252220               83.203400
MPE/(1-n/N) =       74.504450

Sample Bias Study 4

Finally, the synthetic data study below involves more closely replicating the study design used in this paper.

Method

The general approach was as follows (a sketch of this procedure appears after the list):

• Generate three data sets
  o Each data set consists of utility weights for three attributes
  o Utility weights for the first and third data sets are randomly drawn integers between 1 and 20
  o Sample size for the first data set is always 200
  o Sample size for the second and third data sets varies across 25, 50 and 100
  o The second and third data sets are always of the same size
  o The second data set consists of the first n cases of the first data set, where n = 25, 50 or 100
• Define either a two, three, four or five product scenario
• Estimate logit-based share of preference models for each of the three data sets, calculating shares at the individual level, then averaging
• Calculate MAEs for each of the second and third data sets, compared to the first, at the aggregate level
• Calculate MPEs (mean percent error = (1 - rsq(utils-first data set, utils-other data set))*100) for each of the second and third data sets, compared to the first, at the aggregate level
• Redraw the sample 50 times for each scenario/sample size and make the above calculations
• Calculate mean MAEs and MPEs across the 50 random draws for each model
• 36 models in total (3 data sets x 4 market scenarios x 3 sample sizes)
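A compact sketch of this procedure follows. The two-product scenario, the specific product level codings, and the logit share rule are simplifying assumptions for illustration; the study also uses three-, four- and five-product scenarios and reports MPE as well as MAE.

import numpy as np

rng = np.random.default_rng(1)
N, n, draws = 200, 100, 50
products = np.array([[1.0, 0.0, 1.0],     # each row: attribute coding of a product
                     [0.0, 1.0, 1.0]])

def aggregate_shares(utils):
    """Individual-level logit shares, then averaged across respondents."""
    v = utils @ products.T                               # respondent x product utility
    p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
    return p.mean(axis=0)

mae_overlap, mae_random = [], []
for _ in range(draws):
    first = rng.integers(1, 21, size=(N, 3)).astype(float)    # utility weights
    second = first[:n]                                        # overlapping sample
    third = rng.integers(1, 21, size=(n, 3)).astype(float)    # independent sample
    base = aggregate_shares(first)
    mae_overlap.append(np.abs(base - aggregate_shares(second)).mean())
    mae_random.append(np.abs(base - aggregate_shares(third)).mean())

print("overlap MAE:", np.mean(mae_overlap))
print("random  MAE:", np.mean(mae_random))
print("empirical correction ratio:", np.mean(mae_random) / np.mean(mae_overlap))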


Note: Empirically, the ratio of random sample MAE to overlapping sample MAE equals the scalar that corrects the overlapping sample MAE for sample bias. Similarly for MPE. The issue, then, is to develop a formula for the correction factor that closely resembles the ratio of random sample error to overlapping sample error.

CONCLUSION

As suggested by Synthetic Data Study 2, the formula (1/(1-percent overlap))^0.5 may represent the desired scalar for correction for MAE. Similarly, as suggested by Synthetic Data Study 3, the formula 1/(1-percent overlap) may represent the desired scalar for correction for MPE:

Table 5

MAE
Percent Overlap     (1/(1-%overlap))^0.5     random/overlap
12.5% (n=25)               1.07                   1.17
25%   (n=50)               1.15                   1.32
50%   (n=100)              1.41                   1.56

MPE
Percent Overlap       1/(1-%overlap)           random/overlap
12.5% (n=25)               1.14                   1.18
25%   (n=50)               1.33                   1.84
50%   (n=100)              2.00                   2.95


Figure 2. Theoretical vs. Empirical Adjustment Factors for MAE (adjustment factor versus sample size; series: 1/(1-%overlap)^0.5 and E(Rn)/E(n)).

Figure 3. Theoretical vs. Empirical Adjustment Factors for MPE (adjustment factor versus sample size; series: 1/(1-%overlap) and E(Rn)/E(n)).


Additional conclusions:

• There is a definite bias due to overlapping sample, both in MAE and MPE.
• This bias appears to be independent of the number of products in the simulations (see Tables 6 and 7).
• The bias is directly related to the percent of the first data set duplicated in the second.
• The amount of bias is different for MAE and MPE.

Table 6

MAE       two products   three products   four products   five products      mean
n=25       0.077167327     0.065818796      0.04373055      0.03523201    0.055487
n=R25      0.081864359     0.078921973      0.05091588      0.04456126    0.064066
n=50       0.041639879     0.046603952      0.02920880      0.02401724    0.035367
n=R50      0.057030973     0.057865728      0.03926258      0.03140596    0.046391
n=100      0.024216460     0.024658317      0.01847943      0.01383198    0.020297
n=R100     0.033042464     0.040345804      0.02954819      0.02281538    0.031438

Table 7

MPE       two products   three products   four products   five products      mean
n=25       0.707187240     0.687751783     0.856957370     0.664759341    0.729164
n=R25      0.785403871     0.870813277     0.869094024     0.884405920    0.852429
n=50       0.242856551     0.312908934     0.292542572     0.246179851    0.273622
n=R50      0.437063715     0.554906027     0.453530099     0.552845437    0.499586
n=100      0.094198823     0.096766941     0.123103025     0.099623936    0.103423
n=R100     0.281835972     0.335639163     0.490892078     0.296887426    0.351314

SAMPLE SIZE STUDY RESULTS

Referring to the error curve for proportions once again (Figure 1), a natural point to search for in the error curve would be an elbow. An elbow would be a point on the curve where any increase in sample size would result in a declining gain in precision and any decrease in sample size would result in an increasing loss in precision. This elbow, if it exists, would identify the maximally efficient sample size.


Visually, and intuitively, an elbow would appear as noted in Figure 4.

Figure 4. Sample Curve With Elbow (error versus sample size, 0 to 1,000).

To formally identify an elbow, one would need to set the third derivative of the error function to zero. It is easy to demonstrate that, for the proportions error curve, the third derivative of the error function cannot be zero. Therefore, for a proportions error curve, an elbow does not exist.
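To make the argument concrete for the proportions curve of Figure 1, which falls off as one over the square root of the sample size:

\[
e(n) = \frac{k}{\sqrt{n}}, \qquad
e'''(n) = -\frac{15}{8}\,k\,n^{-7/2},
\]

which is nonzero for every finite n, so no elbow exists on that curve.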

Below in Figure 5 and in Figure 7, the error curves for both the MAE and MPE error terms have been plotted for the aggregate data, that is, for all five techniques averaged together. In Figures 6 and 8, the error curves for each trade-off technique have been plotted separately.

The MAE curves are all similar in shape to one another as are the MPE curves.

Visually, the MAE curves appear to be proportionate to 1/sqrt(n) and the MPE curves appear to be proportionate to 1/n. By regressing the log of the error against the log of sample size, it can be confirmed that the aggregate MAE is indeed proportionate to 1/sqrt(n) and the aggregate MPE proportionate to 1/n (coefficients of -0.443 and -0.811, respectively).

The third derivative of both 1/sqrt(n) and 1/n can never equal zero. Therefore, neither of these error curves can have an elbow.

Figure 5. Grand Mean MAE for the aggregate data (MAE versus sample size).
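A sketch of the log-log regression described above, using the estimated MAE values reported later in Table 8 as inputs (the study regresses its full set of observed error terms):

import numpy as np

n = np.array([30, 50, 75, 100, 125, 150, 175, 200], dtype=float)
mae = np.array([5.8, 4.6, 3.9, 3.5, 3.2, 3.0, 2.7, 2.5])

slope, intercept = np.polyfit(np.log(n), np.log(mae), 1)
print(f"estimated exponent: {slope:.3f}")   # compare with the reported -0.443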


Figure 6. Grand Mean MAE's by trade-off technique (MAE versus sample size; series: Grand MAE, CBC/HB MAE, CVA MAE, HBReg MAE, ACA MAE, ACA/HB MAE).


Figure 7. Grand Mean MPE for the aggregate data (MPE versus sample size, with the proportions error curve shown for comparison).

Figure 8. Grand Mean MPE's by trade-off technique (MPE versus sample size; series: Grand MPE, CBC/HB MPE, CVA MPE, HBReg MPE, ACA MPE, ACA/HB MPE).

Using the aggregate MAE and MPE curves as surrogate formulae, tables of error terms as a function of sample size have been constructed below. Given that no elbow exists for these curves, it is left to the researcher, just as it is with proportions curves, to determine the level of error that is acceptable.


There is a substantial increase in precision (or decrease in error) when increasing sample size from 30 to 50, both for MAE and MPE. There is also a substantial increase in precision in terms of both MAE and MPE when increasing sample size from 50 to 75. However, the amount of increased precision may become less relevant to many commercial studies when increasing sample size beyond 75 or 100.

Table 8

Estimated MAE by Sample Size
Sample Size    MAE
     30        5.8
     50        4.6
     75        3.9
    100        3.5
    125        3.2
    150        3.0
    175        2.7
    200        2.5

Table 9

Estimated MPE by Sample Size
Sample Size    MPE
     30        6.4
     50        3.9
     75        2.7
    100        2.0
    125        1.6
    150        1.4
    175        1.2
    200        1.0

A careful review of Figures 6 and 8 will reveal a pattern of error terms which might suggest that certain trade-off techniques generate lower or higher model error terms than others. This conclusion, at least based on the data presented here, would be false. Each error term is based on total sample utilities computed with a given trade-off technique. Thus, for example, the CVA


MPE at a sample size of 100 is determined by taking the CVA-generated mean utilities from the five replicates of the 100 sub-sample and correlating them with the CVA-generated mean utilities for the total sample. Similarly, for HB-Reg, the sub-sample mean utilities are correlated with the total sample mean HB-Reg utilities. Even though the underlying data are exactly the same, MPEs for the CVA sub-samples are based on one set of “holdouts” (total sample CVA-based utilities) while the MPEs for the HB-Reg sub-samples are based on an entirely separate and different set of “holdouts” (total sample HB-Reg-based utilities). Because the reference points for calculating error are not the same, conclusions contrasting the efficiency of the different trade-off techniques cannot be made.

To illustrate how different the total sample models can be, MAEs were calculated comparing the total sample CVA-based models with the total sample HB-Reg-based models for three data sets.

              MAE
Data set 1    7.7
Data set 2    6.5
Data set 3    6.7

These MAEs are larger than most of the MAEs calculated using much smaller sample sizes. Thus, while we cannot compare the error terms as calculated here, we can conclude that different trade-off techniques can generate substantially different results.

Having said the above, it is still interesting to note that both the ACA and ACA/HB utilities and models showed remarkable stability at low sample sizes despite the burden of a very large number of parameters to estimate; a much larger number of parameters than any other of the techniques.

LATENT CLASS MODELS

The above analysis is based upon a data set of 1,950 data points, 975 data points for each error term, MAE and MPE. Excluding ACA data, there were 585 data points for each error term.

Latent Class models were run on these data to explore the impact on model error of sample size, number of attributes and levels (expressed as number of parameters) and number of tasks. ACA data were excluded from the latent class modeling because of the fundamentally different nature of ACA to CVA and CBC.

A variety of model forms were explored, beginning with the simplest, such as error regressed against sample size. The models that yielded the best fit were of the form:

MAE = k*sqrt(P^c/(n^a*T^b))   and   MPE = k*P^c/(n^a*T^b)


Where P is the number of parameters, n is sample size, T is the number of tasks, and k, c, a and b are coefficients estimated by the model.

The k coefficient in the MAE model was not significantly different from 1 and therefore effectively drops out of the equation.

For both the MAE and MPE models, latent class regressions were run for solutions with up to 12 classes. In both cases, the two class solution proved to have the optimal BIC number.

Also in both models, sample size (n) and number of tasks (T) were class independent while number of parameters was class dependent. In both models, all three independent variables were highly significant.

It is interesting to note that the most effective covariate attribute was, for the MAE model, trade-off technique (CBC/HB, CVA, HB-Reg). In that model, CBC/HB data points and HB-Reg data points tended to be members of the same class while CVA data points tended to be classified in the other class.

For the MPE model, the most effective covariate was data type (CBC, CVA), which would, by definition, group CVA data points and HB-Reg data points together, leaving CBC/HB data points in the other class.

Table 10

MAE 2-Latent Class Model Output

Latent Variable(s) (gamma)
               Class1     Class2     Wald       p-value
Intercept     -0.0395     0.0395     0.0119     0.91

Covariates     Class1     Class2     Wald       p-value
Technique                            7.9449     0.019
  CBC/HB       0.9122    -0.9122
  CVA         -1.8192     1.8192
  HB-Reg       0.907     -0.907

Dependent Variable (beta)
               Class1     Class2     Wald        p-value      Wald(=)     p-value
logAdjVal      1.5988     1.2358     905.8524    2.00E-197    3.77E+01    8.30E-10
Predictors
  logn        -0.4166    -0.4166     481.2585    1.10E-106    0.00E+00    .
  logT        -0.2255    -0.2255     41.0711     1.50E-10     0.00E+00    .
  logP         0.1471     0.3588     62.4217     2.80E-14     14.2287     0.00016

Class Size     Class1     Class2
               0.5751     0.4249
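As a simplified, single-class illustration of the model form above, log(MAE) can be regressed on log P, log n and log T by ordinary least squares. The data rows below are hypothetical placeholders; the reported estimates come from a two-class latent class regression on the study's 975 data points, which this sketch does not reproduce.

import numpy as np

data = np.array([
    #   n,   T,   P,   MAE
    [  30,   8,  11,  7.1],
    [  50,  10,  12,  5.6],
    [  75,  15,  23,  4.9],
    [ 100,  12,  19,  4.4],
    [ 150,  18,  54,  4.0],
    [ 200,  16,  21,  3.1],
])
n, T, P, mae = data.T
# log MAE = log k + (c/2) log P - (a/2) log n - (b/2) log T
X = np.column_stack([np.ones(len(mae)), np.log(P), np.log(n), np.log(T)])
coef, *_ = np.linalg.lstsq(X, np.log(mae), rcond=None)
logk, c_half, neg_a_half, neg_b_half = coef
print("k =", np.exp(logk), " c =", 2 * c_half,
      " a =", -2 * neg_a_half, " b =", -2 * neg_b_half)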


Table 11

MPE 2-Latent Class Model Output

Latent Variable(s) (gamma)
               Class1     Class2     Wald       p-value
Intercept      1.0976    -1.0976     4.7383     0.03

Covariates     Class1     Class2     Wald       p-value
DataType                             6.6741     0.0098
  CBC          1.2901    -1.2901
  CVA         -1.2901     1.2901

Dependent Variable (beta)
               Class1     Class2     Wald        p-value      Wald(=)     p-value
logAdjVal      0.7849     2.9455     575.3608    1.20E-125    252.1947    8.60E-57
Predictors
  logP         2.0587     0.1556     499.858     2.90E-109    186.1446    2.20E-42
  logT        -0.7422    -0.7422     79.3514     5.20E-19     0           .
  logn        -0.9422    -0.9422     467.1816    1.30E-103    0           .

Class Size     Class1     Class2
               0.6005     0.3995

CONCLUSIONS

Minimum sample size must be determined by the individual researcher, just as is the case with simple proportions tests. There is no obvious “elbow” in the error curve which would dictate a natural minimum sample size.

However, using the aggregate error tables as a guide, sample sizes of approximately 75 to 100 appear to be sufficient to provide reasonably accurate models. Larger sample sizes do not provide a substantial improvement in model error. In fact, sample sizes as low as 30 provided larger but not unreasonable error terms, suggesting that, in some instances, small sample sizes may be appropriate.

These data do not suggest that sample size needs to be larger for any trade-off technique relative to the others. Specifically, HB methods do not appear to require greater sample size than traditional methods.


In addition to sample size, both the number of tasks and the number of parameters being estimated play a significant role in the size of model error. An obvious conclusion from this finding is that when circumstances dictate the use of small sample sizes, the negative effects on model precision can be somewhat offset by either increasing the number of tasks and/or decreasing the number of parameters estimated. These results appear consistent for both error terms calculated for this study: MAE and MPE.

DISCUSSION

There are many aspects of this study which could be improved in future research. The inclusion of more data points would provide better estimates of the shape of the error curve. More replicates at lower sample sizes would provide more stability. MSE (Mean Squared Error) could be included as an additional error term that may prove to be more sensitive than MAE.

The most serious limitation to this paper is the absence of objective standards, that is, holdout cards. Ideally, holdout cards and also attributes and levels would be identical across trade-off techniques. This would require custom designed studies for the purpose of sample size research. An alternative to funding fieldwork for a non-commercial study would be to construct synthetic data sets based on the means and co-variances of existing, commercial data sets. If such synthetic data sets were constructed, the sample bias problem would be eliminated, a variety of sample sizes could be independently drawn, and the attribute co-linearity that commonly exists in commercial data sets would be maintained.

There are other factors that may affect model error. The number of tasks may have a non-linear relationship to model error. Increasing the number of tasks increases the amount of information available to estimate the model. An excessive number of tasks, however, may increase respondent fatigue to the point of offsetting the theoretical gain in information. Many aspects of measurement error, such as method of data collection (online vs. telephone vs. mall intercept), use of physical or visual exhibits, interview length, level of respondent interest, etc. may all play a role in model error and could affect the ultimate decision regarding sample size.

The ultimate question that remains unanswered is, what is the mathematics behind model error? If a formula could be developed, as exists for proportions, researchers could input various study parameters, such as number of tasks, number of parameters, sample size, etc. and chart the error term by sample size. They could then make an informed decision, weighing both the technical and managerial aspects, and select the sample size most appropriate for that situation.




THE EFFECTS OF DISAGGREGATION WITH PARTIAL PROFILE CHOICE EXPERIMENTS1

Jon Pinnell
President & COO
MarketVision Research

Lisa Fridley
Research Manager, Marketing Sciences Group
MarketVision Research

ABSTRACT

Recently, hierarchical Bayes has been shown to produce improvements in predictive validity over aggregate logit models. However, some researchers have observed that hierarchical Bayes when used with partial profile choice tasks may decrease predictive validity relative to an aggregate logit analysis. The authors explore the internal predictive validity of partial profile choice tasks with disaggregation, comparing aggregate logit and hierarchical Bayes on several partial profile datasets.

THE BENEFITS OF DISAGGREGATION

Researchers have long discussed the benefits of considering individual differences. However, many techniques commonly used fail to allow responses to differ across respondents. Two relevant examples include:

Market Response Models

Historically, market response modeling has been conducted at the aggregate level, possibly by market or store. Hanssens, Parsons and Schultz (1990) comment, “In practice, model building efforts appear to exclude considerations about the level of aggregation, probably because of the constraints surrounding availability of data.” If individual differences are considered in market response modeling, they are most often included via entity specific intercept terms rather than entity specific slope (elasticity) terms. In the 1970s, Markov techniques had been suggested that allow for individual specific elasticity parameters (Rosenberg, 1973), but these methods were not widely adopted.

Customer Satisfaction Research

Derived importance, as is common with satisfaction, typically produces a regression coefficient for each attribute (or driver). These coefficients are commonly interpreted as

1 The authors wish to thank Andrew Elder of Momentum Research Group, Chris Goglia of American Power Conversion, and Tom Pilon of TomPilon.com, for each kindly sharing datasets for this research. We also wish to thank Ying Yuan of MarketVision Research for producing the simulated choice results.


the importance of that attribute, or put another way, its relative influence on overall satisfaction. Crouch and Pinnell (2001) show that accounting for respondent heterogeneity (in this example using latent class analysis) produces derived importances with substantially improved explanatory power.

Researchers using ratings-based conjoint would commonly promote the ability to derive individual level utilities as a key benefit of such techniques. As the research community’s focus shifted towards discrete choice modeling, the goal of individual level utilities was begrudgingly reduced to a nicety. Some researchers attempted a wide variety of solutions to maintain individual level estimation. For example, some researchers have suggested dual conjoint (Huisman) and reweighting (Pinnell, 1994) to leverage the benefit of ratings and choice based techniques.

Other advances have provided methods to disaggregate choice data. These include (all from Sawtooth Software): k-logit, latent class, ICE, and hierarchical Bayes. Hierarchical Bayes appears to offer the greatest opportunity.

Hierarchical Bayes with Discrete Choice

One of the current authors has presented a meta-analysis of commercial and experimental research studies comparing the ability to predict respondents’ choices when those choices were analyzed with aggregate logit compared to hierarchical Bayes (Pinnell, 2000). The purpose of the previous work was to compare improvement of predictive ability when implementing utility balance (UB) relative to improvement of predictive ability when implementing hierarchical Bayes (HB). The discussion of UB is not relevant to the current work, but the findings related to HB certainly are. The following table summarizes the hit rates of several studies comparing the hit rates from aggregate logit to the hit rates using hierarchical Bayes.

               Agg. Logit    Hier. Bayes    Improvement
Study One        75.8%          99.5%          23.8%
Study Two        24.8%          79.5%          54.7%
Study Three      60.5%          62.6%           2.1%
Study Four       61.2%          79.37%         18.1%
Study Five       59.2%          78.8%          19.6%

The five studies shown above show substantial improvement in hit rates. However, the paper continued to report on a sixth study. The findings from the sixth study show a very different result from HB.

               Agg. Logit    Hier. Bayes    Improvement
Study Six        71.9%          68.1%          -3.8%

The conclusion still is that HB is generally beneficial. However, this one study demonstrates HB can be deleterious, which is troubling. It is not surprising that HB could produce a result that was not clearly superior to aggregate logit. But in the face of heterogeneity and with enough information, we expected HB to provide beneficial results.


There are several possible explanations that could explain this anomalous result. They include:

• Not enough heterogeneity
• Not enough information in the choice tasks
• Poor hold-out tasks
• Confounding effects with UB experiments
• Partial profile nature of tasks

At the time, the last explanation most resonated with our beliefs, but there was little scientific basis for that claim. It is worth noting that the sixth study was the only one of the set that was not full profile. Is there some reason that HB with partial profile choice tasks will behave differently than HB with full profile choice tasks?

What is Partial Profile?

Partial profile choice designs have emerged as methods of dealing with choice based studies that include a large number of attributes. In a partial profile choice design, respondents are shown, at one time, a subset of the full set of attributes. They are then asked to indicate their preferred concept based on this subset of attributes. This task is then repeated with the subset of attributes changing each time. An illustrative partial profile task is shown below:

Full Profile Task                 Partial Profile Task

     1    2    3                       1    2    3
     A1   A2   A3                      A1   A2   A3
     B4   B3   B1                      D2   D1   D2
     C3   C1   C2                      F3   F1   F2
     D2   D1   D2
     E1   E2   E4
     F3   F1   F2
     G1   G2   G2

Full profile designs, where respondents are exposed to the full set of attributes, work well when the number of attributes is relatively small. However, as the number of attributes becomes large, some researchers are concerned that the ability of respondents to provide meaningful data in a full profile choice task decreases. One particular concern is that respondents may oversimplify the task by focusing on only a few of the attributes. Thus, partial profile choice tasks are designed to be easier for the respondent.

Like full profile choice designs, partial profile choice designs allow the researcher to produce utilities either at the aggregate or disaggregate level. Chrzan (1999) has presented evidence that partial profile choice tasks are easier for respondents, and produce utilities with less error, relative to full profile choice tasks.


There is substantial precedent for partial profile tasks. ACA (Adaptive Conjoint Analysis) is a ratings-based conjoint methodology that customizes each task based on each respondent’s prior utilities. ACA also presents tasks that are partial profile. There have been suggestions from some researchers that the partial profile tasks in ACA dampen the utility measures (Pinnell, 1994).

However, is there a compelling reason to expect that disaggregating partial profile tasks would behave differently than full profile tasks?

DISAGGREGATING PARTIAL PROFILE TASKS

Previous research has investigated the effects of disaggregating partial profile choice tasks. The interested reader is referred to the following: Lenk, DeSarbo, Green, and Young; Chrzan; Huber; and Brazell et al.

These studies, taken together, show mixed results and fail to provide clear direction regarding the use of HB with partial profile choice tasks. Given the popularity of partial profile and HB, the current work conducts a meta-analysis of several partial profile choice datasets and compares the predictive validity of aggregate logit compared to hierarchical Bayes.

Empirical Results

Empirical Results: Method

We report on the findings of nine different studies. For each study we will report the following design elements:

• Number of respondents
• Number of attributes
• Number of attributes shown per task
• Number of alternatives per task
• Number of tasks
• Number of parameters estimated.

For this review we are conducting purely post hoc analysis, so we don’t have consistent hold-out tasks. Still, we are required to use some criterion to evaluate the effectiveness of HB with partial profile choice tasks. The criterion we use is hit rates. Without consistent hold-out tasks across all nine studies, even hit rates are difficult to produce. Rather, for each study we held out one task and used all other tasks to estimate a set of utilities using aggregate logit (AL) and hierarchical Bayes. These two sets of utilities were used to calculate hit rates for the one held-out task. This process of holding out one task, estimating two sets of utilities (AL & HB), and calculating hit rates was repeated several times for each study (for a total of eight times).

We must point out that we are dealing with randomized choice tasks, so we must use hits as our criterion. We would have preferred to use errors in share predictions as our criterion, but that wasn’t feasible. It is also important to note that the studies analyzed here are a compilation from

154 2001 Sawtooth Software Conference Proceedings: Sequim, WA.

Page 169: PROCEEDINGS OF THE - Sawtooth SoftwareFOREWORD The ninth Sawtooth Software Conference, held in Victoria, BC on September 12-14, 2001, will be forever remembered due to the tragic events

multiple researchers, each likely with differing design considerations and expectations for the use of the data.

of the f ing stu port th ates us lities estimated from aggregate logit and HB. We show the difference hit rates the standard error of the ifference. Finally, we show a t-value of the difference testing the null that it is zero. The tandard error and t-value take into account the correlated nature of our paired observations.

ily ordered from the smallest sample size to the largest.

s 3 ters estimated 33

St ngs

Hit Rate Aggregate

Hit Rate HB Difference

S Error t-ratio

64.1% 59.6% -4.5% .0147 -3.05

u

For each ollow dies, we re e hit r ing utiin the and

ds

The following studies are arbitrarStudy 1 Study 1 Design:

nts Number of responde 75 1Number of attributes 1

Number of attributes shown umber of tasks

5 6 N

Number of alternativeNumber of parame

u idy 1 Find

tandard

Study 2 Study 2 Design:

nts Number of responde 133 Number of attributes 7 Number of attributes shown

umber of tasks 4 15 N

Number of alternatives umber of parameters estimated 22

4 N

tudy 2 Findings: S

Hit Rate Ag e gregat

47.6%

Hit Rate HB

Difference

Standard Error

t-ratio

51.7% 4.0% .0170 2.38 Study 3 Study 3 Design: Number of respondents 1 66

1 Number of attributes tes shown

8Number of attribu 5

2 Number of tasks 0Number of alternatives Number of parameters estimated

4 18


Study 3 Findings:

Hit Rate Aggregate

Hit Rate HB

Difference

Standard Error

t-ratio

56.0% 56.5% 0.6% .0105 .54 Study 4 Study 4 Design: Number of respondents 167 Number of attributes 9 Number of attributes shown 5 Number of tasks 15 Number of alternatives 4 Number of parameters estimated 29 Study 4 Findings:

Hit Rate Aggregate

Hit Rate HB

Difference

Standard Error

t-ratio

52.2% 53.6% 1.4% .0170 .82 Study 5 Study 5 Design: Number of respondents 187 Number of attributes 7 Number of attributes shown 5 Number of tasks 15 Number of alternatives 4 Number of parameters estimated 23

tudy 5 Findings:

A Hit Rate

D S

46.6% 57.6% 11.0% .0148 7.41

S

Hit Rate ggregate HB

ifference

tandardError

t-ratio

Study 6

ts

tes shown

umber of parameters estimated 27

Study 6 Design: Number of responden 218 Number of attributes 12 Number of attribu 4 Number of tasks 10 Number of alternatives 3 N


Study 6 Findings:

Hit Rate Aggregate

Hit Rate HB

Difference

Standard Error

t-ratio

73.4% 66.5% -6.9% .0095 -7.24 Study 7

umber of attributes w

tasks ltern

umber of paramete estimated

Findings

Hit Rate A

Hit Rate Standard

Study 7 Design:

umber of respondents 611 NN 17

n 5 Number of attributeNumber of

s sho17

Number of a atives 3 N rs 39 Study 7 :

ggregate HB Difference Error t-ratio 59.2% 57.0% -2.1% .0060 -3.58

Stu

9 5

Num

stimated 35

Stud

H RDifference Error t-ratio

15.8% .0106 14.84

dy 8 Study 8 Design: Number of respondents 791 Number of attributes Number of attributes shown

ber of tasks 13 Number of alternatives 5 Number of parameters e

y 8 Findings:

it ate Hit Rate Standard Aggregate HB

32.8% 48.6%

Stu

umber of tasks 16 Num

dy 9 Study 9 Design: Number of respondents 1699 Number of attributes 30 Number of attributes shown 5 N

ber of alternatives 3 Number of parameters estimated 49


Study 9 Findings:

Hit Rate Aggregate

Hit Rate HB

Difference

Standard Error

t-ratio

68.0% 65.7% -2.3% .0040 -5.67

Empirical Results: Summary of Findings

The following table summarizes the findings from the nine studies.

                 HB DOES WORSE                HB DOESN'T HELP       HB DOES BETTER
Agg. Logit    73.4%  64.1%  68.0%  59.2%       56.0%   52.2%       47.6%  46.6%  32.8%
Hier. Bayes   66.5%  59.6%  65.7%  57.0%       56.5%   53.6%       51.7%  57.6%  48.6%
Diff          -6.9%  -4.5%  -2.3%  -2.1%        0.6%    1.4%        4.0%  11.0%  15.8%
Stderr        .0095  .0147  .0040  .0060       .0105   .0170       .0170  .0148  .0106
t-ratio       -7.24  -3.05  -5.67  -3.58        0.54    0.82        2.38   7.41  14.84
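As an illustration of how the Diff, Stderr and t-ratio rows can be computed from paired per-respondent hit indicators (1 = the predicted choice matches the actual choice), with the two hit vectors below being hypothetical:

import numpy as np

hits_logit = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
hits_hb    = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])

diff = hits_hb.astype(float) - hits_logit.astype(float)   # paired differences
d_mean = diff.mean()
d_se = diff.std(ddof=1) / np.sqrt(len(diff))               # respects the pairing
print(f"diff = {d_mean:.3f}  stderr = {d_se:.4f}  t = {d_mean / d_se:.2f}")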

Of the nine studies in o is detrimental to predictive validity in four of the nine cases, is not ben of the nine cases and for only three of the nine cases does HB show an im for individual prediction.

Based on the results from Pinnell (2000), we aren’t terribly surprised by these findings, but we do consider ubling e of HB with partial profile choice tasks. Next, we explore a number of elements th ight explain the findings.

ossible Explanations

it HB’s ability? ing?

.

e, one would not expect HB to outperform

n aggregate model. However, it is unlikely that preferences are homogeneous in most ight expect for any give gher the hit rate from aggregate logit,

e more homogeneous the population under investigation. It isn’t clear this is true in a meta-analysis such as this, for the following reasons.

First, the magnitude of hit rates is affected by the amount of dominance in the hold-out tasks. As the level of dominance increases, so will the hit rates.

ur analysis, we see that HBeficial in another two

provement over aggregate logit

them tro to the usat m

PThere are several possible explanations to the previous findings. We explore the following

three possibilities: • Is there enough heterogeneity for disaggregation to be beneficial? • Do specific design considerations lim• Are partial profile choice tasks susceptible to overfitt

Each is discussed below

H terogeneity As discussed above, in the absence of heterogeneity

acategories. One m n study, the hith


Second, hit rates from hold-out tasks with two alternatives will have much higher expected hit rates than hold-outs with more alternatives. For example, a hold-out hit rate of 52% from pairs would not be impressive (the level of chance is 50%). However, if the hold-out tasks were

f five alternatives, a hold-out hit rate of 40% would be meaningful (twice the level of chance, which would be 20%).

to improve on hit rates in the mid 70 percent range.

ing design characteristics and results.

Number of tasks 10 er of alternatives 3

eters estimated 27

HB

Difference Standard

Error

t-ratio -6.9% .0095 -7.24

he aggregate model produced the following model fit:

-2LL = 3480

By accounti es (Swait and Louviere) between the three segments, the best improveme omplish was:

= 3448

o

Not unrelated to the points outlined above, some might be concerned that there is a headroom

problem. Recall, however, that in the full profile case reported previously and summarized above, HB was able

To further explore whether a lack of heterogeneity might explain the findings of HB, we explore one study in more detail. Specifically, we explore the study with the best hit rate fromaggregate logit.

Recall this study had the follow

Number of respondents 218 Number of attributes 12 Number of attributes shown 4

NumbNumber of param

Hit Rate Hit Rate Aggregate

73.4% 66.5%

To investigate the existence of heterogeneity, we needn’t rely on HB alone. Rather, for this

study we had additional information2 about each respondent outside the choice questions. Fromthis additional information we are able to form three segments of respondents on the attributes that were also in the choice study.

T

ng for scale differencnt in fit we could acc

-2LL

2 Effectively, we had prior estimates of each individual’s utilities, as are collected in an ACA interview.


However, by evaluating the model fit of three independent logit models, we can evaluate if there are heterogeneous preferences. Model fits expressed as –2LL are additive, so the model fit of the three independent models combined is:

-2LL = 3364

This suggests (p < .01) there is clearly heterogeneity between respondents in the dataset. We conclude that the HB with partial profile didn’t produce a deleterious result solely due to a lack of heterogeneity.

e

DE

the de the following.

We evaluate the relationship between each design stic and the t-statistic of the improvement due to HB based on Kendall’s tau (τ . The Kendall’s tau and probability reported are the simple averag drawn from ine studies, with each sample draw excluding one study. u) are sho the following table.

Characteristic τ p value

-0.056 0.73

-0.087 0.76 -0.222 0.48

# of Observations/Parameter 0.457 0.13

The one significant design characteristic we believe deserves special attention is the finding

that as the number of alternatives per task increases, the performance of HB improves (relative to

It might be the case that there just wasn’t enough information with the partial profile choic

tasks. The amount of information available from choice tasks is based on a number of design considerations.

SIGN CONSIDERATIONS

We explored the relationship between several design characteristics and the t-ratio of improvement due to HB. The design characteristics we considered inclu

• Number of respondents • Number of attributes in the study • Number of attributes shown in each task • Number of alternatives per task • Number of tasks • Total number of parameters estimated • Total number of observations3 per parameter estimated

characteri

)es across nine samples our nThe results (Kendall’s ta wn in

# of Respondents # of Attributes -0.572 0.06 # of Attributes shown 0.178 0.48 # of Alternatives/Task 0.817 0.01 # of Tasks # of Parameters

3 Observations is defined as follows: (# of attributes shown X # of alternatives per task X # of tasks).


aggregate logit). One hypothesis stat ith respondent heterogeneity, the ability of HB might be limited based on the amount of information available in the choice tasks. Previous research has f alternatives per

sk. (Pinnell and Englert, 1997). Besides number of alternatives per task, none of the findings related to design characteristics are significant (α = 0.05).

Finally, we explore if partial profile tasks might be more susceptible to overfitting, and if HB might exacerbate that result.

Are We Overfitting with PP, HB with PP To explore whether we might be overfitting with HB on partial profile tasks, we produced

synthetic data. Using Monte Carlo simulations we produce full profile and partial profile tasks and we simulate respondents’ choices with varying amounts of error and heterogeneity.

The design of the data included 10 attributes, 5 of which were shown in the partial profile tasks. In conducting these simulated choices, we sought to time equalize the full profile tasks relative to the partial profile tasks so 4 full profile tasks and 10 partial profile tasks were simulated for 600 respondents. All tasks relied on randomized designs.
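A sketch of this kind of synthetic choice generation follows. The normal utility distribution, two-level attributes, and Gumbel choice error are illustrative assumptions, and the estimation step (aggregate logit or hierarchical Bayes on the simulated choices) is not shown.

import numpy as np

rng = np.random.default_rng(7)
R, A, SHOWN, ALTS = 600, 10, 5, 3        # respondents, attributes, shown, alternatives

true_utils = rng.normal(0.0, 1.0, size=(R, A))    # heterogeneous respondent utilities

def simulate_task(resp_utils, partial):
    attrs = rng.choice(A, size=SHOWN, replace=False) if partial else np.arange(A)
    levels = rng.integers(0, 2, size=(ALTS, len(attrs)))   # assumed 2-level attributes
    v = levels @ resp_utils[attrs]                          # deterministic utility
    v = v + rng.gumbel(size=ALTS)                           # choice error
    return attrs, levels, int(np.argmax(v))

# Time-equalized designs: 4 full profile tasks versus 10 partial profile tasks.
full_tasks = [[simulate_task(true_utils[r], partial=False) for _ in range(4)]
              for r in range(R)]
partial_tasks = [[simulate_task(true_utils[r], partial=True) for _ in range(10)]
                 for r in range(R)]
print(len(full_tasks), len(partial_tasks))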

We explore utilities derived from three sources:

• Aggregate logit estimation from full profile choice tasks • Aggregate logit estimation from partial profile choice tasks • Hierarchical Bayes estimation from partial profile choice tasks

ed above is that, even w

shown the statistical benefit of increasing the number ota

Each approach’s ability to recover known parameters can be expressed via a correlation

between the derived utilities and the known utilities. These are shown in the following table:

Pearson r Agg. Logit, Full Profile 0.86 Agg. Logit, Partial Profile 0.84 Hier. Bayes, Partial Profile 0.77

Coile

ll

rrelation with Known Utilities When we compare the full profile utilities (from simulated utilities) to the partial prof

utilities we see that the utilities from partial profile tasks have a much larger scale than the fuprofile utilities, as shown below.


[Figure: Comparison of Utilities – Full Profile (Agg. Logit) vs. Partial Profile (Agg. Logit); scatter of full profile (time equalized) utilities against partial profile utilities, both axes from -4.00 to 4.00, with the y=x reference line.]

A casual inspection suggests that the utilities derived from the partial profile choice tasks have a scale nearly twice those from full profile. It might not be surprising that our synthetic partial profile utilities have a larger scale than full profile utilities. This finding confirms previous research (Chrzan, 1999) in which utilities estimated from partial profile tasks were shown to have larger scale than utilities estimated from full profile tasks. However, our understanding of previous research is that this improvement was always attributed to partial profile based solely on an easier respondent task. This finding suggests the difference between full profile and partial profile might be systemic and not limited to differential information processing in humans.

It raises the question: can too much scale be a bad thing? Traditionally, higher scale has been represented as a good thing suggesting a decrease in error. Could higher scale represent overfitting? No amount of analysis on utilities can be as informative as exploring how well different sets of utilities can be used for prediction.

For the meta-analysis above we used hit rates to gauge effectiveness in individual level prediction. We commented that we would have preferred to measure errors in aggregate share prediction. Given the synthetic nature of this final data set, we were able to simulate choices to fixed hold-out tasks. We report on the errors in prediction when predicting shares of hold-out choices. Specifically, we show the mean squared errors, averaged across multiple simulations.



The errors under the three scenarios are summarized below:

Errors Predicting Hold-Out Shares

                                 Error
Agg. Logit, Full Profile         7.17%
Agg. Logit, Partial Profile      8.44%
Hier. Bayes, Partial Profile    11.52%

These findings would suggest that partial profile with aggregate logit or with HB has more error than full profile aggregate logit, and about 60% more for the partial profile with HB scenario. However, we know that the partial profile utilities had a larger scale parameter, and this difference might impact their ability to produce accurate share estimates. To account for this scale difference, we reanalyzed the share predictions from each of the three sets of utilities, this time attenuating for scale differences.

The errors in share predictions, after attenuating for scale differences, are summarized below:

Errors Predicting Hold-Out Shares
After Attenuating for Scale Differences

                                 Error
Agg. Logit, Full Profile         0.69%
Agg. Logit, Partial Profile      3.53%
Hier. Bayes, Partial Profile     8.82%

Attenuating for scale fails to improve the performance of partial profile results. In fact, relative to full profile aggregate logit4, partial profile’s performance deteriorates.

4 As well as full profile with HB estimation, though this finding is beyond the scope of the current work.
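As a rough illustration of what attenuating for scale means operationally, the sketch below rescales a utility vector by a single multiplicative constant before computing logit shares and the share-prediction error for a hold-out set. The optimizer, the error metric and the example numbers are our own assumptions, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def predicted_shares(utilities, scale=1.0):
    """Logit shares for one choice set after multiplying utilities by a scale factor."""
    expu = np.exp(scale * utilities)
    return expu / expu.sum()

def share_error(utilities, observed_shares, scale=1.0):
    """Mean squared error between predicted and observed hold-out shares."""
    return float(np.mean((predicted_shares(utilities, scale) - observed_shares) ** 2))

def attenuate_for_scale(utilities, observed_shares):
    """Find the multiplicative scale constant that best fits the observed shares."""
    result = minimize_scalar(lambda s: share_error(utilities, observed_shares, s),
                             bounds=(0.01, 10.0), method="bounded")
    return result.x

# Hypothetical hold-out set: alternative utilities and observed choice shares
u = np.array([0.8, 0.2, -0.5, -0.5])
obs = np.array([0.45, 0.30, 0.15, 0.10])
best_scale = attenuate_for_scale(u, obs)
print(best_scale, share_error(u, obs, best_scale))
```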


CONCLUSIONS

We had previously found HB to be beneficial with full profile choice tasks, and often impressively so. Through this investigation, and others, we have found nothing to cause us to question this conclusion. However, in the past we have seen unreliable results with partial profile choice tasks and HB estimation of individual part-worths.

In our meta-analysis of nine commercial datasets, we find that HB, when used with partial profile choice tasks, improves prediction in individual level hit rates relative to aggregate logit in only three of the nine studies examined. At the same time, we find that in four of the nine studies HB used to estimate individual part-worths from partial profile choice tasks produced significantly inferior hit rates relative to aggregate logit.

We also explore the impact of using HB with partial profile tasks through synthetic data and Monte Carlo simulations. We find that utilities derived from partial profile tasks with aggregate logit had nearly twice the scale of utilities derived from full profile tasks with aggregate logit. This finding is not inconsistent with other research, but our conclusion is different than other research. As far as we know, all previous research has concluded that the higher scale of partial profile tasks was a result of a simplified respondent task. Given that our synthetic data produces the same findings without respondents, we conclude the scale is not a purely respondent based phenomenon.

Rather, we hypothesize that partial profile tasks are susceptible to overfitting. We explore errors in share prediction of hold-out tasks. Even attenuating for scale differences, we show that both partial profile with aggregate logit and individual part-worths from partial profile with HB have much higher error than full profile with aggregate logit.


REFERENCES

Brazell, Jeff, William Moore, Christopher Diener, Pierre Uldry, and Valerie Severin (2001), "Understanding the Dynamics of Partial Profile Application in Choice Experiments," AMA Advanced Research Techniques Forum; Amelia Island, FL.

Chrzan, Keith (1999), "Full versus Partial Profile Choice Experiments: Aggregate and Disaggregate Comparisons," Sawtooth Software Conference; San Diego, CA.

Crouch, Brad and Jon Pinnell (2001), "Not All Customers Are Created Equally: Using Latent Class Analysis To Identify Individual Differences," Working Paper, MarketVision Research; Cincinnati, OH.

Hanssens, Dominique, Leonard Parsons, and Randall Schultz (1990), Market Response Models: Econometric and Time Series Analysis. Kluwer Academic Publishers: Boston.

Huber, Joel (2000), "Projecting Market Behavior for Complex Choice Decisions," Sawtooth Software Conference; Hilton Head, SC.

Huisman, Dirk (1992), "Price-Sensitivity Measurement of Multi-Attribute Products," Sawtooth Software Conference; Sun Valley, ID.

Lenk, Peter, Wayne DeSarbo, Paul Green and Martin Young (1996), "Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs," Marketing Science, 15, 173-191.

Pinnell, Jon (1994), "Multistage Conjoint Methods to Measure Price Sensitivity," AMA Advanced Research Techniques Forum; Beaver Creek, CO.

Pinnell, Jon and Sherry Englert (1997), "Number of Choice Alternatives in Discrete Choice Modeling," Sawtooth Software Conference; Seattle, WA.

Pinnell, Jon (2000), "Customized Choice Designs: Incorporating Prior Knowledge and Utility Balance in Discrete Choice Experiments," Sawtooth Software Conference; Hilton Head, SC.

Rosenberg, Barr (1973), "The Analysis of a Cross-Section of Time Series by Stochastically Convergent Regression," Annals of Economic and Social Measurement, 399-428.

Swait, Joffre and Jordan Louviere (1993), "The Role of the Scale Parameter in the Estimation and Comparison of Multinomial Logit Models," Journal of Marketing Research, Vol. 30 (August), 305-314.


ONE SIZE FITS ALL OR CUSTOM TAILORED: WHICH HB FITS BETTER?

Keith Sentis & Lihua Li1
Pathfinder Strategies

INTRODUCTION

As most of you know from either your own experience with Hierarchical Bayes Analysis (HB) or from reports by colleagues, this relatively new analytic method offers better solutions to a variety of marketing problems. For example, HB analyses yield equivalent predictive accuracy with shorter questionnaires when estimating conjoint part-worths (Huber, Arora & Johnson, 1998). HB also gives us estimates of individual utilities that perform well where before we would have had to settle for aggregate analyses (Allenby and Ginter, 1995; Lenk, DeSarbo, Green & Young, 1996).

HB methods achieve an "analytical alchemy" by producing information where there is very little data – the research equivalent of turning lead into gold. This is accomplished by taking advantage of recently developed analytical tools (the Gibbs sampler) and advances in computing speed to estimate a complex two-level model of individual choice behavior. In the upper level of the model, HB makes assumptions about the distributions of respondents' vectors of part-worths. At the lower level of the model, HB assumes a logit model for each individual. The analytical alchemy results from using information from the upper level to assist with the fitting of the lower level model. If a given respondent's choices are well estimated from his own data, the estimates of his part-worths are derived primarily from his own data in the lower level and depend very little on the population distribution in the upper level. In contrast, if the respondent's choices are poorly estimated from his own data, then his part-worths are derived more from the distributions in the upper level and less from his individual data in the lower level. Essentially, HB "borrows" information from the entire sample to produce reasonable estimates for a given respondent, even when the number of choices made by the respondent is insufficient for individual analysis.

This process of "borrowing" information from the entire sample to assist in fitting individual level data requires considerable computational "grunt" and is a potential barrier to widespread use of HB methods. However, the rapid increase in computer speed and some of our own work (Sentis & Li, 2000) that identified economies achievable in the analysis, have made HB analysis a viable tool for the practitioner.

The focus of our paper today is a particular aspect of how HB "borrows" information to fit individual level data. I mentioned a moment ago that the upper level model in HB makes some assumptions about the distribution of vectors of part-worths in the population. In the simplest case, the upper level model assumes that all respondents come from the same population distribution. More complex upper level models make further assumptions about the nature of the population. For example, the upper level model may allow for gender differences in choice behavior. Of course, these more complex upper level models require more parameters and additional computational grunt.

1 The authors thank Rich Johnson for helpful comments.


In popular implementations of HB for estimating conjoint part-worths such as the Sawtooth modules, the upper level model is simple. All respondents' vectors of part-worths are assumed to be normally distributed. That is, all respondents' choices are assumed to come from a single population of choice behavior. On the face of it, this assumption runs counter to much work done in market segmentation. Indeed, the fundamental premise of market segmentation is that different segments of respondents have different requirements which are manifest as different patterns of choice behavior. Ostensibly, this demand heterogeneity enables differentiated product offerings, niche strategies and effective target marketing efforts. This view was first posited by Smith (1956) who defined market segmentation as making product decisions by studying and characterizing the diversity of wants that individuals bring to a market.

Our paper examines what happens when HB analyses are allowed to "borrow" information from more relevant subpopulations. The idea was to attempt to improve our predictions by "borrowing" information from a more appropriate segment of respondents rather than borrowing from the entire sample.

Consider this analogy. If you want to buy a new suit, there are two ways to proceed. You can shop for an off-the-rack suit and hope or assume that your particular body shape fits within the distribution of body shapes in the population. Alternatively, you can have a suit custom-tailored to your exact shape. Custom tailoring will almost always yield a better look than the "one-size-fits-all" alternative. This custom tailoring yields a better fit but is more costly.

Similarly, in our current project, we explored whether custom tailoring HB utilities within segments yields a better fit. That is, we explored whether better fitting models can be achieved by having the analysis "borrow" information from a more appropriate base than the entire sample — namely segments of the sample.

We do not have access to HB software that allows complex upper level models. Instead, we used the Sawtooth HB CBC module to explore the impact on predictive accuracy from first dividing respondents into groups with presumably similar choice patterns and then estimating the utilities separately for each group.

We compared the predictive accuracy of HB utilities derived from the entire sample to those derived from within a priori segments and also from within latent segments. To customize the HB utilities in this way requires more effort and therefore increases the cost of the analysis.

Keeping with our sartorial analogy, our paper poses the following question:

These custom-tailored HB analyses have a higher price tag but do they yield a nicer fit?


APPROACH

We took a simple-minded approach to this question. First, we computed HB utilities using the entire sample. Actually, we computed three separate sets of utilities to reduce any random jitters in the results. Then we calculated the hit rates for hold out tasks using the three sets of utilities and we averaged the three hit rates.

Next, we divided the sample into segments – either a priori segments or latent segments – and computed HB utilities within each segment. Then we calculated the hit rates for the same holdouts using the within-segment utilities. We computed three sets of HB utilities within each segment and averaged the hit rates as we did for the entire sample analyses. Then we compared the hit rates based on the total sample utilities with the hit rates based on the within-segment utilities. Here is a summary of our approach (a short sketch of the hit-rate calculation appears after the dataset description below):

Step 1: Compute HB utilities using entire sample
• 3 separate sets

Step 2: Calculate hit rates for hold outs using three separate sets of utilities
• average the three hit rates

Step 3: Divide sample into segments (a priori or latent) and compute HB utilities within each segment
• 3 separate sets in each segment

Step 4: Calculate hit rates for hold outs within each segment using three separate sets of utilities
• average the three hit rates within each segment

Step 5: Compare the mean hit rates

Results

The first dataset we looked at had these characteristics:

• business to business study
• 280 respondents
• tasks = 16 of 4 concepts plus NONE
• holdouts = 2
• attributes = 10
• partial profile design

This study focused on a range of farm enterprises that were engaged in quite different farm activities. Some of these farms produced fine Merino wool and some produced fine chardonnay grapes. The range of farm enterprises broke into three broad industry sectors and we used these industry sectors as a priori segments.
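The hit-rate calculation used in Steps 2 and 4 above can be sketched as follows; the data structures and dimensions are hypothetical (loosely matching the dataset characteristics just listed), not the authors' code. For each respondent and each hold-out task, the predicted alternative is the one with the highest total utility, and the hit rate is the share of tasks where that prediction matches the observed choice.

```python
import numpy as np

def hit_rate(part_worths, holdout_designs, holdout_choices):
    """
    part_worths:      (n_respondents, n_parameters) individual HB utilities
    holdout_designs:  (n_respondents, n_tasks, n_alternatives, n_parameters) coded hold-out tasks
    holdout_choices:  (n_respondents, n_tasks) index of the alternative actually chosen
    """
    # Total utility of each alternative in each hold-out task for each respondent
    totals = np.einsum("rtap,rp->rta", holdout_designs, part_worths)
    predicted = totals.argmax(axis=2)                # highest-utility alternative per task
    return float((predicted == holdout_choices).mean())

# Example with random placeholder data: 280 respondents, 2 hold-out tasks, 5 alternatives, 20 parameters
rng = np.random.default_rng(1)
pw = rng.normal(size=(280, 20))
designs = rng.normal(size=(280, 2, 5, 20))
choices = rng.integers(0, 5, size=(280, 2))
print(hit_rate(pw, designs, choices))
```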


The results of our analyses are shown on this graph. Each of the points is the mean of the hit rates from three separate sets of utilities. It would be safe to summarize this slide as "Custom tailoring does not yield dramatically better fits."

[Figure: Hit rates by industry sector – Total Sample vs. Within Segment for Sectors 1, 2 and 3 (hit rates between roughly 0.4 and 0.6).]

We were somewhat surprised by this and decided to explore what happens to the fit when latent segments are defined on the basis of different choice patterns. We examined three latent segments that we had identified within this same dataset.

Three segments were defined using KMEANS clustering of the HB utilities from the total sample. These three segments comprised 40%, 34% and 26% of the farming enterprises and had all of the characteristics that we like to see when we conduct segmentation projects. They looked different, they made sense, they were statistically different and most importantly, the client gave us a big head nod.
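As an illustration of this kind of segmentation, a minimal sketch using scikit-learn's KMeans on an individual-utility matrix is shown below; the utility matrix here is random placeholder data, and only the three-cluster choice comes from the text.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hb_utilities = rng.normal(size=(280, 20))      # hypothetical: 280 respondents x 20 part-worths

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(hb_utilities)
segment = kmeans.labels_                       # segment assignment per respondent

# Segment sizes as shares of the sample (compare with the 40% / 34% / 26% reported above)
sizes = np.bincount(segment) / len(segment)
print(sizes)
```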

This graph shows the relative importance of the attributes for each of the segments. We have highlighted three of the attributes to demonstrate the differences in the pattern across the segments. These differences meet the usual significance thresholds for both univariate and multivariate tests.



[Figure: Feature Importance by Segment – relative importance of Brand, Use and Price for Segments 1, 2 and 3.]

The next graph shows how much better the fit is when we customize the HB runs to borrow information from only the most relevant segment. Once again, custom-tailoring the HB utilities does not yield better fits.

[Figure: Clustering Segments (HB Utilities) – hit rates (roughly 0.4 to 0.6) for Total Sample vs. Within Segment in each of the three segments.]


We thought that perhaps an alternative segmentation method would yield more expected results. So we ran a Latent Class segmentation using the Sawtooth LClass module to define two segments that comprised 53% and 47% of the sample. These segments are similar to the ones we found using the KMEANS method and they do exhibit a different pattern of attribute importance scores.

[Figure: Feature Importance by Segment – relative importance of Brand, Use and Price for Segments 1 and 2.]

The results are shown here. Again, the within-segment hit rates were not any better than those from the total sample.

[Figure: L Class Segments – hit rates (roughly 0.4 to 0.6) for Total Sample vs. Within Segment in Segments 1 and 2.]


Undeterred by these unexpected results, we continued using this approach to examine six other datasets. These additional datasets were from business to business studies as well as FMCG studies. The sample sizes ranged from 320 to 800, the number of tasks ranged from 11 to 20, and the number of attributes ranged from 4 to 7, with both full profile and partial profile designs. On these datasets, we examined a priori segments as well as latent segments derived using KMEANS and L Class methods. Across the six datasets, we examined latent segment solutions with between two and seven segments.

In some instances, there were slight improvements in the within-segment hit rates and in some instances the obverse result obtained. The graph below shows the results across the seven datasets. On the left is the mean hit rate from the 21 sets of HB utilities based on the total samples in our seven datasets. On the right is the mean hit rate from the 222 sets of HB utilities that were customised for the various segments. Even blind Freddy can see that the null hypothesis does not get much nuller than this.

This graph illustrates that the effort and expense of custom-tailoring 222 sets of utilities yields a fit that is no better than the 21 “off-the-rack” utilities. The finding across the seven datasets can be stated quite simply:

• there is no consistent improvement in predictive accuracy when going to the trouble to compute HB utilities within segments

[Figure: Average Hit Rates (roughly 0.50 to 0.60) – Total Sample (21 sets of utilities) vs. Within Segment (222 sets of utilities).]

So after all of this computation, Lihua and I were faced with a good news – bad news scenario. The good news is that the time and effort associated with customising HB to produce within-segment utilities does not appear to yield anything worthwhile. Therefore, we can run only the total sample analyses and then head for the surf with our sense of client commitment fully intact.


The bad news is that our worldview on market segments had been severely challenged. In discussing our findings with colleagues, we encountered a continuum of reaction. This continuum was anchored at one end by responses like “that’s so implausible you must have an error in your calculations”. At the other end of the spectrum, we heard reactions like “just what I expected, segmentation is actually like slicing a watermelon” or “social science data is usually one ‘big smear’ that we cut up in ways that suit our needs”.

Returning to our analogy about buying a suit for a moment, suppose we were to attempt to segment the buyers of suits using a few key measurements like length of sleeve, length of inseam, waist size and so forth. In this hypothetical segmentation exercise, we would expect to identify at least two segments of suit buyers. One segment would cluster around a centroid of measurements that is known in the trade as “42 Long”. The members of this segment are more than 6 feet tall and reasonably slim. Another segment likely to emerge from our segmentation is known as “38 Short”. Members of this segment tend to be vertically challenged but horizontally robust. Despite the fact that members of the 42 Long segment look very different from the members of the 38 Short segment, they all buy their suits off the rack from a common distribution of sleeve lengths, inseams and waist measurements.

In examining the literature more broadly, we came across other findings that are similar to ours. For example, Allenby, Arora and Ginter (1998) examined three quite different datasets looking for homogenous segments. They did not find convincing evidence of homogeneity of demand:

• “For all parameter estimates in the three datasets, the extent of within-component

heterogeneity is typically estimated to be larger than the extent of across-component heterogeneity, resulting in distributions of heterogeneity for which well defined and separated modes do not exist. In other words, across the three data sets investigated by us, a discrete approximation did not appear to characterize the market place completely or accurately.”

In the aftermath of this project, Lihua and I have come to revise our worldview on market

segments by embracing the “watermelon theory”. And as is often the case when one’s fundamentals are challenged, our revisionist view of market segments is a more comfortable one. So while we set out to find nicer fitting HB utilities, we ended up with a better fitting view of market segments.


REFERENCES

Allenby, G. M., Arora, N. and Ginter, J. L. (1998) "On the heterogeneity of demand." Journal of Marketing Research, 35, 384-389.

Allenby, G. M. and Ginter, J. L. (1995) "Using Extremes to Design Products and Segment Markets." Journal of Marketing Research, 32, 392-403.

Huber, J., Arora, N. and Johnson, R. (1998) "Capturing Heterogeneity in Consumer Choices." ART Forum, American Marketing Association.

Lenk, P. J., DeSarbo, W. S., Green, P. E. and Young, M. R. (1996) "Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs." Marketing Science, 15, 173-191.

Sentis, K. and Li, L. (2000) "HB Plugging and Chugging: How Much Is Enough." Sawtooth Software Conference Proceedings, Sawtooth Software, Sequim.

Smith, W. (1956) "Product Differentiation and Market Segmentation as Alternative Marketing Strategies." Journal of Marketing, 21, 3-8.


MODELING CONSTANT SUM DEPENDENT VARIABLES WITH MULTINOMIAL LOGIT: A COMPARISON OF FOUR METHODS

Keith Chrzan
ZS Associates

Sharon Alberg
Maritz Research

INTRODUCTION

In many markets, customers split their choices among products. Some contexts in which we often see this sort of choosing are the hospitality and healthcare industries, fast moving consumer goods and business-to-business markets.

• In the hospitality industry, for example, business travelers may not always stay in the

same hotel, fly the same airline, or rent from the same automobile rental agency. Likewise, most people include two or more restaurants in their lunchtime restaurant mix, eating some of their lunches at one, some at another and so on.

• Similarly, many physicians recommend different brands of similar drugs to different patients. Outside of pharmaceuticals, buyers and recommenders of medical devices and medical supplies also allocate choices.

• In fast moving consumer goods categories, consumers may split their purchases among several brands, buying and using multiple brands of toothpaste, breakfast cereal, soda and so on.

• Finally, many of the choices that are “pick one” for individual consumers are allocations for business-to-business. Consider PC purchases by corporate IT departments.

In these cases, and in many others, it may make more sense to ask respondents to describe

the allocation of their last 10 or next 10 purchases than to ask which one brand they chose last or which one brand they will choose next.

Some controversy attends the modeling of these allocations. They are measured at ratio level, so regression may work. Moreover, because the allocations are counts, Poisson regression comes to mind. On the other hand, the predictors of these counts are not just a single vector of independent variables but a vector of predictors for each alternative in the allocation. This latter consideration suggests a multinomial logit solution to the problem. Although multinomial logit typically has one alternative as chosen and the remainder as not chosen, there are several possible ways of modeling allocation data.

Four ways to model constant sum dependent variables with multinomial logit are described and tested below.


Method 1: Winner Takes All (WTA)

One could simplify the allocation data and model as if the respondent chooses the one alternative with the highest point allocation and fails to choose the rest.

Method 2: Simple Dominance (SD)

Another method is to recognize and model preference inequalities. This method is the extension to constant sum data of the method recommended by Louviere, Hensher and Swait (2001) for rating scale data. For example, in a constant sum allocation of his next 10 choices in a given category, Smith allocates 5 points to Brand A, 2 each to Brands B and C, 1 to D and none to E or F. These inequalities are implicit in Smith's allocation:

A > (B, C, D, E, F)
B > (D, E, F)
C > (D, E, F)
D > (E, F)

Thus one could turn Smith's allocation into four choice sets corresponding to the four inequalities above:

Set 1: Smith chooses A from the set A-F
Set 2: Smith chooses B from the set B, D-F
Set 3: Smith chooses C from the set C, D-F
Set 4: Smith chooses D from the set D, E, F.

Method 3: Discretizing the Allocation (DA)

A third way to set up the estimation model involves seeing the 10 point allocation as 10 separate reports of single-brand choosing. Thus there are 10 “choices” to be modeled from the set A-F.

In five of these, Smith chooses A over B-F; in two he chooses B over A, C, D, E and F; in two he chooses C over A, B, D, E and F; and in one he chooses D over A, B, C, E and F.

Discretizing the allocation is the method for handling constant sum dependent variables in SAS (Kuhfeld 2000).

Method 4: Allocation-Weighted Dominance (WD)

One could combine Methods 2 and 3 above simply by weighting the four dominance choice sets from Method 2 by the observed frequencies from the allocation. Thus:

Set 1 (Smith chooses A from the set A-F) gets a weight of 5
Set 2 (Smith chooses B from the set B, D-F) gets a weight of 2
Set 3 (Smith chooses C from the set C, D-F) gets a weight of 2
Set 4 (Smith chooses D from the set D, E, F) gets a weight of 1
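To make the four data setups concrete, the sketch below expands Smith's hypothetical 5/2/2/1/0/0 allocation into the choice records each method would feed to an MNL estimation. The code and the (chosen brand, choice set, weight) record format are our own illustration, not the authors' programming.

```python
# Smith's allocation of his next 10 choices over brands A-F
allocation = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 0, "F": 0}
brands = list(allocation)

def winner_takes_all(alloc):
    # One choice set: the brand with the highest allocation is "chosen" from the full set A-F
    winner = max(alloc, key=alloc.get)
    return [(winner, brands, 1)]

def simple_dominance(alloc):
    # One choice set per brand with a positive allocation: that brand beats every brand allocated fewer points
    sets = []
    for brand, points in alloc.items():
        if points > 0:
            beaten = [b for b in brands if alloc[b] < points]
            sets.append((brand, [brand] + beaten, 1))
    return sets

def discretized_allocation(alloc):
    # Each allocated point becomes one "choice" of that brand from the full set A-F
    return [(brand, brands, 1) for brand, points in alloc.items() for _ in range(points)]

def weighted_dominance(alloc):
    # Same sets as simple dominance, weighted by the allocated points
    return [(chosen, cset, alloc[chosen]) for chosen, cset, _ in simple_dominance(alloc)]

for method in (winner_takes_all, simple_dominance, discretized_allocation, weighted_dominance):
    print(method.__name__, method(allocation))
```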


Possible Biases

Winner Takes All ignores much of the information provided by respondents about the strength of their preferences; containing the least information, we expect it will do least well in predicting shares. To the extent respondents gravitate toward extreme preferences (100% for a single brand) Winner Takes All should perform better.

Dominance Modeling makes use of more of the information but only insofar as the allocation data provides a rank-order preference of the alternatives. Whereas Louviere, Hensher and Swait (2001) use this method to make the most of the preference information contained in rating scale data, in the present case it ignores the magnitude of preference information provided by the constant sum metric. For this reason, Dominance Modeling also may not predict the allocation shares well.

Discretizing the Allocation may result in shares that are too flat. The reason is that the same independent variables will predict different dependent variable outcomes and the noise this creates will depress the multinomial logit coefficients. Thus shares of more and less preferred alternatives will be less distinct.

Allocation Weighted Dominance makes use of more of the information than Dominance Modeling and Winner Take All. No observations from a single respondent predict different outcomes from an identical pattern of independent variables, so multinomial logit utilities will not be muted. This may produce share estimates that are too spiky.

Of the four methods, both Simple Dominance and Weighted Dominance involve substantial programming for data setup. Winner Takes All requires much less programming and Discretized Allocation almost none at all.

EMPIRICAL TESTING

Using three empirical data sets (two from brand equity studies, one from a conjoint study) we will test whether these four analytical approaches yield the same or different model parameters; if different, we will identify which performs best.

It may be that the four models do not differ significantly in their model parameters, or that they differ only in the scale factors, and not in the scale-adjusted model parameters. For this test we employ the Swait and Louviere (1993) test for equality of MNL model parameters. If a significant difference in model parameters results, then it will make sense to discover which model best predicts observed allocation shares.

In case of significantly different model parameters, testing will proceed as follows. Under each model, each respondent's predicted allocation will be compared to that brand's observed allocation for that respondent. The absolute value of the difference between actual and predicted share for each brand will be averaged within respondent. This mean absolute error of prediction (MAE) will be our metric for comparing the four models. We use a comparable metric for aggregate share predictions.
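A minimal sketch of the MAE metric just described, with hypothetical share matrices (our own illustration): for each respondent, average the absolute differences between predicted and observed brand shares, then average across respondents.

```python
import numpy as np

def mean_absolute_error(predicted, observed):
    """
    predicted, observed: arrays of shape (n_respondents, n_brands) holding share allocations.
    Returns the MAE of prediction, averaged within respondent and then across respondents.
    """
    per_respondent = np.abs(predicted - observed).mean(axis=1)
    return float(per_respondent.mean())

# Hypothetical example: two respondents, three brands, shares summing to 1
pred = np.array([[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]])
obs = np.array([[0.4, 0.4, 0.2], [0.7, 0.2, 0.1]])
print(mean_absolute_error(pred, obs))   # about 0.067, i.e. 6.7 share points of average absolute error
```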


Empirical Study 1

Data Set

A hospitality provider commissioned a positioning study in which 247 respondents evaluated the client and three competitors on 40+ attributes. Factor analysis reduced this set of attributes to 13 factors, and factor scores representing these factors are the predictor variables in the model. Respondents’ report of the share of their last 10 shopping occasions is the constant sum dependent variable.

Analysis

We used the Salford Systems LOGIT package to estimate the MNL models (Steinberg and Colla 1998).

Table 1
Raw Model Parameters for Study 1

                Winner      Simple      Discretized   Weighted
Parameter       Takes All   Dominance   Allocation    Dominance
1                 .65         .36          .50           .51
2                -.43        -.25         -.30          -.31
3                -.09        -.03         -.05          -.05
4                 .02         .14          .13           .11
5                 .59         .37          .33           .38
6                -.06         .00          .02           .02
7                 .08        -.03          .05           .06
8                 .46         .11          .24           .23
9                 .05         .04          .13           .13
10                .02         .03          .05           .06
11               -.05         .07          .02           .03
12                .15         .01          .09           .07
13                .35         .22          .25           .25

Table 1 shows the model parameters for the four models. The various manipulations necessary to conduct the Swait and Louviere test reveal that the scale factors for the four models were, setting WTA as the reference:

WTA  1.000
SD   0.583
DA   0.765
WD   0.785


Table 2
Scale-Adjusted Parameters for Study 1

                Winner      Simple      Discretized   Weighted
Parameter       Takes All   Dominance   Allocation    Dominance
1                 .65         .62          .65           .65
2                -.43        -.42         -.39          -.39
3                -.09        -.05         -.07          -.06
4                 .02         .24          .18           .14
5                 .59         .63          .43           .49
6                -.06         .00          .03           .02
7                 .08        -.04          .06           .08
8                 .46         .20          .31           .29
9                 .05         .06          .17           .16
10                .02         .06          .07           .08
11               -.05         .12          .02           .04
12                .15         .02          .12           .09
13                .35         .38          .33           .31

Scale-adjusted model parameters appear in Table 2. The coefficients for WTA are larger than for the other models and those for SD are smaller, with WD and DA in between. Unadjusted, WTA will produce the most spiky shares and SD the most flat shares.

But we need to test to make sure it makes sense to look at the unadjusted shares. The log likelihoods for the four individual models were:

WTA  -237.132
SD   -286.741
DA   -264.193
WD   -261.487

The other two log likelihoods needed for the Swait and Louviere test are the log likelihood of the data set that concatenates the above four models (-1058.374) and the scale-adjusted concatenated data set (-1051.804). The omnibus test for the difference in model parameters has a χ2 of 4.502; with 42 degrees of freedom (13 parameters plus one for each model past the first) this is not even close to significant. This means that the models are returning non-significantly different parameters (utilities) after adjusting for differences in scale. The test for the difference in scale, however, is significant (χ2 of 13.14; with three degrees of freedom, p<.005).
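The two χ2 values above follow directly from the reported log likelihoods; the arithmetic can be checked as below (each statistic is twice the relevant difference in log likelihoods, compared against the degrees of freedom given in the text).

```python
# Log likelihoods reported above for Study 1
ll_individual = {"WTA": -237.132, "SD": -286.741, "DA": -264.193, "WD": -261.487}
ll_concatenated = -1058.374     # four data sets stacked, one common set of parameters and one scale
ll_scale_adjusted = -1051.804   # stacked, common parameters but model-specific scale factors

ll_separate = sum(ll_individual.values())                 # -1049.553

chi2_parameters = 2 * (ll_separate - ll_scale_adjusted)   # about 4.502 (the omnibus parameter test)
chi2_scale = 2 * (ll_scale_adjusted - ll_concatenated)    # about 13.14 (the scale test, 3 df)
print(round(chi2_parameters, 3), round(chi2_scale, 2))
```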

Results

Together the results of these two tests mean that the only significant difference between the parameters from the four models is their scale. One can thus use any of the four models if one first adjusts the parameters by a multiplicative constant to best fit observed (disaggregate) shares.


Empirical Study 2

Data Set

366 health care professionals completed a telephone-based brand equity study. The subject was a medical diagnostic category with five major competitors who accounted for over 95% of category sales. Each respondent rated all of the five brands with which she was familiar on a total of 10 attributes. Respondents also reported what percentage of their purchases went to each of the five brands.

Unlike Study 1, this time a large majority of respondents allocated all of their usage to a single brand – nearly 90%. We expect that this might make the models more similar than in Study 1, as all are more likely to resemble Winner Takes All.

Analysis

Again we used the Salford Systems LOGIT package to estimate the MNL models (Steinberg and Colla 1998). Coefficients were very similar, as were the scale parameters: 1.00, 1.10, 1.02 and 1.14 for Winner Takes All, Simple Dominance, Discretized Allocation and Weighted Dominance, respectively.

Results

For this data set, neither the test for differences in parameters (χ2 of 7.272 with 33 degrees of freedom) nor the test for difference in scale (χ2 of 1.142 with 3 degrees of freedom) is significant. Unlike Study 1, the non-significant differences in coefficients lack even the appearance of being different, because even the models' scales are not significantly different.

Empirical Study 3

Data Set

As part of a strategic pricing study, 132 hospital-based purchase influencers completed a phone-mail-phone pricing experiment for a category of disposable medical products. Respondents completed 16 experimental choice sets each from a total design of 32. Each set contained three or four brands at varying prices, so that both brand presence and price are manipulated in the experiment. Also appearing in each choice set is an “other” alternative worded as “any other brand of <widgets> at its usual price.”

A nice compromise between Studies 1 and 2, this time about half of respondents’ choices are allocations of 100% to a single alternative in a choice set.

Analysis

Again we used the Salford Systems LOGIT package to estimate the MNL models (Steinberg and Colla 1998).

As in Study 2, neither the coefficients (χ2 of 4.58 with 18 degrees of freedom) nor the scale factors (χ2 of 5.764 with 3 degrees of freedom) differ significantly.


Results

As in the previous studies, it does not matter which MNL formulation one uses, as the coefficients are not significantly different. As in Study 2, not even the scale parameters differ significantly.

DISCUSSION

The four MNL formulations produce coefficients that do not differ significantly from one formulation to another. A utility scale difference occurred in just one of the three studies. It is common to have to calibrate utilities in order to have the best fit between MNL share simulations and actual shares, so it is not clear that any of the formulations is superior to the others. For ease of programming and consistency with software providers’ recommendations, discretized allocation is probably the best way to analyze constant sum data.

While we have shown that constant sum dependent variables do lend themselves to analysis via MNL, it is not clear that this is always a good idea. Sometimes we use constant sum dependent variables because we know that there is taste heterogeneity and variety seeking within respondents (say in choice of food products).

Other times we use constant sum dependent variables when the heterogeneity is influenced by situational factors. A physician may prescribe different antidepressant drugs to different patients because of differences (age, sex, concomitant conditions, concurrently taken drugs) in the patients, not because of any variety seeking on the doctor's part. Or again, I might eat lunch at McDonald's one day and Bennigan's the next because one day I'm on a short break and one day I have more time, or because one day I am taking the kids to lunch and the next I am going with office mates.

When the source of the heterogeneity is situational, it probably makes more sense to model the situational effect directly using a hybrid logit model, one wherein we model choice as a function of attributes of the things chosen (conditional MNL) and as a function of attributes of the chooser or the situation (polytomous MNL).


REFERENCES

Kuhfeld, Warren (2000) "Multinomial Logit, Discrete Choice Modeling: An Introduction to Designing Choice Experiments, and Collecting, Processing and Analyzing Choice Data," in Marketing Research Methods in the SAS System, SAS Institute.

Louviere, Jordan J., David A. Hensher and Joffre D. Swait (2001), Stated Choice Methods: Analysis and Application, Cambridge: Cambridge University Press.

Steinberg, Dan and Phillip Colla (1998) LOGIT: A Supplementary Module by Salford Systems, San Diego: Salford Systems.

Swait, Joffre and Jordan Louviere (1993) "The Role of the Scale Parameter in the Estimation and Comparison of Multinomial Logit Models," Journal of Marketing Research, 30, 305-314.


DEPENDENT CHOICE MODELING OF TV VIEWING BEHAVIOR

Maarten Schellekens

McKinsey & Company / Intomart BV

INTRODUCTION

Conjoint choice analysis is ideally suited to model consumer choice-behavior in situations

where consumers need to select their preferred choice-alternative out of a number of choice-alternatives. In many instances, conjoint choice models are more versatile than traditional conjoint approaches. Complications, like the similarity of choice-alternatives (the IIA-problem) have largely been solved by using nested choice models and segmentation models (latent class analysis, hierarchical bayes). Furthermore, conjoint choice analysis has considerable advantages over traditional conjoint, given its ability to include context effects and alternative-specific attributes, and to directly model choice-behavior without the need for additional assumptions on how to translate utilities into choices.

One issue that has not completely been resolved yet, is the simultaneous modeling of a set of interdependent choices in conjoint choice analysis. How do we need to apply conjoint choice modeling in situations in which consumers make a number of interrelated choices? We see this phenomenon in the context of shopping behavior and entertainment. Take for example the process of buying a car: in normal conjoint approaches we model the add-on features, such as a roof, spoilers, cruise-control and automatic gear as attributes of the choice-alternative ‘car’, in order to derive their utility-values. In reality, these ‘attributes’ are not real attributes, but choice-alternatives in their own right. However, there is a clear interdependence between the choices for these features and the type of car. A single-choice model is not capable of capturing this complexity.

APPROACHES FOR MULTI-CHOICE MODELING

Most multi-choice situations can be modeled with resource-allocation tasks. These tasks are appropriate when, for example, the utility-value of the alternatives varies over situations. Take for example the choice for beer: the choice for a type or brand of beer may well depend on the context in which the beer is consumed. The choice may be different when alone at home compared to a social situation in a bar. It may vary when the weather is hot versus cold. Therefore, when asking for a choice, we'd better ask them to indicate how often they would take the different types of beer.

Another useful application of resource-allocation tasks is when the ‘law-of-diminishing-returns’ applies. This law is easy to understand in a shopping-context: the utility of the second pair of shoes one buys is much lower compared to the first pair of shoes chosen. This law also applies in the context of variety-seeking: when people visit a fair they rarely visit just one single attraction. Therefore, when we have a budget to spend, we most often allocate it to several choice-alternatives instead of just one. A resource allocation task is most appropriate in these situations.


There are situations in which resource-allocation is not appropriate, however. The example of the choice of a car is not well suited to an allocation task (at least not an allocation task in the traditional sense, as the basic car and the add-ons are complementary products). A better design would be to include add-ons as additional options in choice-tasks and to treat the choice-problem as one in which the choice-alternatives at the car-level and at the add-on level influence each other.

Another good example is TV-viewing behavior. TV-viewing can be characterized by variety seeking as well: not many viewers want to watch the news all the time and strive towards a more varied portfolio of program genres that reflect their viewing preferences. Again, TV-viewing can not be dealt with appropriately by means of a resource-allocation task. The main complication is that the viewer is restricted in his allocation of viewing-time to the program-genres offered in a specific timeslot, due to the framing in time of the broadcasted programs. Therefore, the choices viewers make given the competitive program-grids as offered by a number of channels, can best be characterized as a number of inter-related, dependent single choices over time.

MODELING TV-VIEWING BEHAVIOR

In this article I want to clarify the choice-modeling issues that need to be dealt with when modeling such a complex choice-process as TV viewing-behavior. I will use the results of a study that I have undertaken to outline different ways of analyzing viewer behavior and preferences for channels and program-genres, and I will also highlight potential pitfalls that may arise in the analysis.

RESPONDENT CHOICE TASKS

Choice-behavior with multiple dependencies can only be studied well by capturing the complexity of the choice-process in the respondent tasks. It is essential that the choice-tasks closely resemble the actual choice-situations viewers face in reality.

When studying TV-viewing behavior, this translates into offering respondents the program-grids of a number of channels for a defined time-interval, and asking the respondents which choices they would make.1 A number of simplifications are necessary in order to construct the choice-tasks.

- First of all, we reduced the diversity of programs to a limited number of program-genres, since we wanted to represent the universe of program-genres, and not so much specific programs. In this study, 15 program genres were defined, to make up the attribute ‘program genres’ with 15 levels.

- A number of channels needed to be selected. In this study 8 channels were selected based upon the dominance of the channels in the market.

- In actual program grids, not all channels start and finish their programs at the same time. In order to keep the complexity to manageable proportions, timeslots with a length of 4 hours were defined with 4 programs lasting exactly one hour for each channel. This way, it was clear which the competing program-genres were in each hour of the timeslot.

1 This assumes that actual choices are primarily based upon ‘looking through the program-grids’ as they appear in newspapers and TV-guides, and not on ‘choose as you go’ or Zapping. However, this is probably the most feasible way of studying TV-viewing behavior.


- In order to limit the information overload for the respondents, the number of channels in each choice-task is set to three. The channels are rotated in a random fashion from task to task. Next to the three channels there is the none-option, to give the respondents the opportunity to express that they don’t like any of the options.

This exercise resulted in choice-tasks with the following ‘program-grid’-format:

PREFERENCE CARD
Time slot: Weekdays 1900-2300

               Channel 1       Channel 2       Channel 3
1st program    Football        Magazine        News            Don't watch
2nd program    Game show       Series          Documentary     Don't watch
3rd program    News            Pop music       Foreign Film    Don't watch
4th program    Local Cinema    Variety show    Series          Don't watch

We asked respondents to indicate for each program-grid which program/channel combinations they would choose in the timeslot to which they were allocated. Basically, each choice-task requires four choices: one for each hour in the time-slot. Furthermore, the respondents could indicate that they didn't want to watch any of the programs. In total, each respondent had to fill out 21 program-grids. In total, 1292 respondents were interviewed, resulting in a database with 108,528 choices available for analysis. The next table shows a hypothetical completed choice-task.


PREFERENCE CARD
Time slot: Weekdays 1900-2300

               Channel 1        Channel 2        Channel 3
1st program    Football         Magazine         [News]           Don't watch
2nd program    Game show        [Series]         Documentary      Don't watch
3rd program    News             Pop music        [Foreign film]   Don't watch
4th program    [Local cinema]   Variety show     Series           Don't watch

(The bracketed cells mark the programs chosen in this hypothetical completed task.)

ANALYSIS APPROACHES

There are two basic approaches to analyze the interdependencies amongst the choices:

1. We consider the choices made at any one hour in a timeslot as a separate choice-task, and treat the availability of choice-alternatives in the other three hours in the timeslot as context-effects in the analysis. Table 2 illustrates this approach.

PREFERENCE CARD
Time slot: Weekdays 1900-2300

               Channel 1       Channel 2       Channel 3
1st program    Football        Magazine        News            Don't watch
2nd program    Game show       Series          Documentary     Don't watch
3rd program    News            Pop music       Foreign Film    Don't watch
4th program    Local Cinema    Variety show    Series          Don't watch

(In this illustration one hour of the grid is analyzed as the choice-task and the remaining hours are marked as context-effects.)


The analysis of this specification is straightforward: in the coding of each choice-alternative, these context-effects are being specified and constant for all choice-alternatives in the task. In the analysis the context-effects can be estimated as usual. The exact specification of the context-effects can take several formats. For example, one can specify how often the same program type is being offered in other hours in the timeslot, and study the effect of this 'availability to see the program at a different hour' on the propensity to choose this program. Or alternatively, one can study how the availability of a program-genre in one hour diminishes or increases the likelihood of choosing certain program-genres in the next hour. The clear disadvantage of this method is that it is very cumbersome (though not impossible) to explicitly model dependencies in choices, as this would make the coding of the context-effects dependent on choices made by the respondents in the other hours of the timeslot.

2. A more versatile, but also more complex method of analyzing the interdependencies is to consider all four choices made in the grid as one single choice. In other words: the 'path' the respondent chooses through the grid is being conceived as one choice-alternative out of all potential paths. Table 3 illustrates this approach:

PREFERENCE CARD
Time slot: Weekdays 1900-2300

               Channel 1        Channel 2        Channel 3
1st program    Football         Magazine         [News]           Don't watch
2nd program    Game show        [Series]         Documentary      Don't watch
3rd program    News             Pop music        [Foreign film]   Don't watch
4th program    [Local cinema]   Variety show     Series           Don't watch

(The bracketed cells trace one 'path' through the grid, conceived as a single choice-alternative.)

One can easily see that the number of choice-alternatives dramatically increases with the number of channels in the grid and/or hours in the timeslot. In the grids used in this study, the number of potential paths is 4^4 = 256 (the number of channels to the power of the number of hours in the timeslot). However, the flexibility in the analysis increases as well. We can now explicitly model dependent choices, as all choices are captured in the same single choice-alternative (the 'path'). Theoretically, all effects a choice in one hour can have on a choice in a different hour can be modeled. Given the large number of effects that one could look into, one would practically only include the estimation of those effects that can be argued to exist from a substantive point of view.
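A small sketch of the 'path' construction in this second approach: with three channels plus the none-option in each of the four hours, every combination of hourly picks is one composite choice-alternative, giving 4^4 = 256 paths. The code is only illustrative, not the software used in the study.

```python
from itertools import product

# Options available in each hour of a 4-hour grid: three channels plus "Don't watch"
options_per_hour = ["Channel 1", "Channel 2", "Channel 3", "Don't watch"]
hours = 4

# Every path through the grid is one choice-alternative in the dependent-choice model
paths = list(product(options_per_hour, repeat=hours))
print(len(paths))            # 256 = 4 ** 4
print(paths[0], paths[-1])   # first and last enumerated paths
```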


CONJOINT DESIGN AND RESULTS

In order to keep the flexibility in the type of effects to be estimated, the second approach was adopted in this study to analyze the choice-data. Guided by the effects our client was most interested in, the following effects were estimated:

- Channel effects
- Effects of program-genres for each hour in the timeslot
- Selected interaction-effects between channel and program-genre (to answer the question what combinations of program-genre and channel do relatively well or do not so well)
- 'Horizontal' cross-effects, i.e. the effect of specified program-genres on each other in the same hour (to study the effect that the audience is being taken away to another channel broadcasting the same or similar program-genres)
- 'Vertical' cross-effects, i.e. the effect of specified program-genres on each other at different hours in the timeslot (to study the effect that the availability of specified program-genres at different times in the timeslot may have on the choice)

The results for the effects of the program-genres, averaged out over the four hours in a timeslot, are displayed in figure 1.

[Figure 1: Program preferences. Program utilities* across all time slots, plotted on a scale from 0 to 1.5, for the genres: other sports, traditional music, popular music, variety show, culture magazine, magazine, football, series, game show, documentary, knowledge game, news bulletin, actuality/debate, local film, and foreign film.]

* The utility values are derived from the Multinomial Logit model, and are therefore directly related to the likelihood of choosing one program-genre over another. For example, if the utility value of one program-genre is 0.4 higher than the utility of another program-genre, the 'better' one is chosen e^0.4 ≈ 1.5 times as often as the 'worse' one. If the difference were 0.7, it would be chosen about twice as often.
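As a quick illustration of this conversion (the same exponential transformation also applies to the cross-effects reported below), a minimal sketch:

    import math

    # Ratio of choice implied by a utility difference under the MNL model.
    def choice_ratio(delta_u):
        return math.exp(delta_u)

    print(choice_ratio(0.4))          # ~1.49: chosen about 1.5 times as often
    print(choice_ratio(0.7))          # ~2.01: chosen about twice as often
    print(1 - choice_ratio(-0.40))    # ~0.33: a -0.40 cross-effect => ~33% less likely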


Two different types of ‘horizontal’ cross-effects were estimated. First, the effect of the same program-genre at a different channel is -0.40, implying that people are 33% less likely to choose a program-genre, if a similar genre is being broadcasted at the same time on a different channel.

The second type of horizontal cross-effect is for program-types that are alike but not exactly the same. Based upon factor-analysis, we defined that the following genres are alike:

• actuality – news bulletin

• football – other sports

• domestic film – series

The horizontal cross-effect (averaged out for all three combinations) is -0.28, meaning that people are 25% less likely to watch a program-genre if a program-genre that is alike but not the same is being broadcast on a different channel during the same hour.

More interesting are the vertical cross-effects, as they can only be derived given the specific choice-design that allows the estimation of these effects. As we assumed that the information processing of the respondents would mainly occur in a sequential fashion, we estimated the vertical cross-effects for each hour separately. This hypothesis was confirmed, as the effects ranged from -0.01 in the first hour of the timeslot to -0.13 in the last hour of the slot, with an average of -0.07 over all hours in the timeslot (the effect of -0.13 means that people were 12% less likely to watch a program-genre if it had already been available in an earlier hour in the slot). The results indicate that the choice for a program-genre in the first hour of the slot is hardly affected by the occurrence of this genre in later hours, but that in later hours one does take into account the availability of the genre in earlier hours.

POTENTIAL PITFALLS

In the analysis of dependent choice modeling, we explicitly conceive the 'paths' through the program-grids as the choice-alternatives. As long as we are interested in these paths, this works out fine. We can include 'dependency-effects' that explicitly model the interdependencies between program-genres over time.2 A dependency-effect could for example be used to study to what extent specific program-genres are being chosen in the same paths.

However, we do need to realize that these effects essentially model the program-preferences of specific segments in the sample. For example, if football is being chosen in the first hour of the grid, one is also more likely to choose football later on. This is because the respondents in the sample who choose football in the first hour happen to like football more than other respondents, and are therefore more likely to also choose it in the second hour.

This is all correct, and the dependency-effects are even necessary to build a model that correctly reflects people's preferences for 'paths'. However, sometimes we may not so much be interested in these paths, but more so in the question how often a specific program is chosen at a specified channel, regardless of the paths taken. Especially when working with simulations, in which grids are being specified and sensitivity-analysis is being performed, one may be tempted to aggregate all the paths that contain a specific cell in the grid (program-genre at one of the channels) to calculate the share of audience for that program.

2 A dependency-effect is different from a vertical cross-effect, as a dependency-effect models the effect of one choice on another choice, whereas a vertical cross-effect only models the effect of a (non-choice related) context on a choice.

This is not without risks, however. I will try to clarify this point. Let's assume we want to run a simulation based on the program-grid I showed earlier. As you can see, football appears only in the first hour (on channel 1). Let's assume we carry out a sensitivity-analysis in order to see what happens when channel 2 changes the series in the second hour to football, resulting in the following program-grid.

PREFERENCE CARD
Time slot: Weekdays 1900-2300

                 Channel 1       Channel 2       Channel 3
1st program      Football        Magazine        News            Don't watch
2nd program      Game show       Football        Documentary     Don't watch
3rd program      News            Pop music       Foreign film    Don't watch
4th program      Local cinema    Variety show    Series          Don't watch

What you would expect is that the audience-share for football in the first hour would decrease, as football-lovers now have the option to watch their favorite sport later in the evening as well. However, what actually happens in the model when applying the aggregation procedure as described above is that paths with football in both the first and second hour are boosted substantially, as the choice-model now has the opportunity to fit segment-specific preferences (the paths of football-fans). It should be clear that the aggregation-procedure results in erroneous output from the simulation model, and should therefore not be used.
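A minimal sketch of the aggregation procedure described above, using purely hypothetical path probabilities; it only makes the bookkeeping explicit and does not remove the bias that arises when the dependency-effects are estimated at the aggregate level.

    # Audience share of a cell (hour, channel) = summed probability of all
    # paths that pass through that cell (hypothetical path probabilities).
    import itertools, math, random

    channels = ["Channel 1", "Channel 2", "Channel 3", "Don't watch"]
    paths = list(itertools.product(channels, repeat=4))

    random.seed(1)
    w = [math.exp(random.gauss(0, 1)) for _ in paths]   # placeholder path weights
    total = sum(w)
    probs = [x / total for x in w]

    def cell_share(hour, channel):
        return sum(p for path, p in zip(paths, probs) if path[hour] == channel)

    print(cell_share(0, "Channel 1"))   # e.g. simulated share of the 1st-hour program on Channel 1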

The underlying problem, causing this disparity between a specification of a model that in itself is correct and results that are clearly counterintuitive, is the aggregate nature of the choice-model (i.e. the estimated parameters are at the aggregate level, and not at the segment or individual level). If we were able to estimate the dependency-effects at the individual level in a way that actual paths can be reproduced accurately by the model, this problem would be alleviated. However, in most situations the ratio between the data-points available for each respondent and the number of parameters to estimate for each respondent will make this problem unsolvable.


CONCLUSIONS

In this study we tested a new methodology to study dependent choices in TV viewing behavior. The methodology introduces 'dependency-effects' to model the interrelatedness of choices people make. This methodology works well to the extent the main focus of the study is to understand the dependencies of the choices at the aggregate level. For TV-viewing behavior, dependency-effects reveal the extent to which specific sequences of choosing programs and channels from program-grids exist. These effects mainly reveal the heterogeneity of viewer-preferences in the market.

The new methodology is not suited for studying the effects of program-changes in program-grids on audience-shares. Therefore, the effects that programs in the grid have on each other, whether broadcast at the same time or at a different time, can only be studied by means of cross-effects.

Modeling dependent choices is a neglected topic in choice-modeling. In this study a choice design was developed that allows dependent choice modeling. However, the results show that this approach has limitations that need to be resolved. One avenue for future research on choice-dependency is to develop disaggregated choice-models that include these dependencies. Only then can dependent choice modeling be used to fully complement and enrich the 'single choice' methodology, and only then will we be able to model choice-problems that involve more than a single choice with one encompassing methodology.


ALTERNATIVE SPECIFICATIONS TO ACCOUNT FOR THE “NO-CHOICE” ALTERNATIVE IN CONJOINT CHOICE EXPERIMENTS1

Rinus Haaijer
MuConsult 2

Michel Wedel
University of Groningen and Michigan University

Wagner Kamakura
Duke University

ABSTRACT

In conjoint choice experiments a "no-choice" option is often added to the choice sets. When this no-choice alternative is not explicitly accounted for in the modeling phase, e.g. by adding a "no-choice constant" to the model, estimates of attributes may be biased, especially when linear attributes are present. Furthermore, we show that there are several methods, some equivalent, to account for the no-choice option.

INTRODUCTION

Choice experiments have become prevalent as a mode of data collection in conjoint analysis in applied research. The availability of new computer-assisted data collection methods, in particular CBC from Sawtooth Software, has also greatly contributed to its popularity. In conjoint choice experiments respondents choose one profile from each of several choice sets. In order to make the choice more realistic, in many conjoint experiments one of the alternatives in the choice sets is a "no-choice" or "none" option. This option can entail a real no-choice alternative ("None of the above") or an "own-choice" alternative ("I keep my own product"). This base alternative, however, presents the problems of how to include it in the design of the choice experiment and in what way to accommodate it in the choice model.

Regular choice alternatives are most often coded in the data matrix with effects-type or dummy coding. Since a no-choice alternative does not possess any of the attributes in the design, one may be tempted to code it simply as a series of zeros. In this paper we investigate several specifications that can be used to accommodate the no-choice option. We show that when the no-choice alternative is not explicitly accounted for in the model phase, by adding an additional parameter to the model, estimates of the attribute dummies may be biased.

1 This paper is based on Haaijer, Kamakura, and Wedel (2001).

2 M.E. Haaijer, Junior Projectleader MuConsult BV, PO Box 2054, 3800 CB Amersfoort, The Netherlands. Phone: +31-33-4655054, Fax: +31-33-4614021, Email: [email protected]. W.A. Kamakura, Professor of Marketing, The Fuqua School of Business, Duke University, P.O. Box 90120, Durham, NC 27708-0120, U.S.A. Phone: (919) 660-7855, Fax: (919) 681-6245, Email: [email protected]. M. Wedel, Professor of Marketing Research, University of Groningen, and Visiting Professor of Marketing, Michigan University, P.O. Box 800, 9700 AV Groningen, The Netherlands. Phone: +31-50-3637065, Fax: +31-50-3637207, Email: [email protected].


THE BASE ALTERNATIVE

In conjoint choice experiments a base alternative is included in the design of the experiment, among others, to scale the utilities between the various choice sets. A base alternative can be specified in several ways. First, it can be a regular profile that is held constant over all choice sets. Second, it can be specified as "your current brand" and third, as a "none", "other" or "no-choice" alternative (e.g., Louviere and Woodworth 1983; Batsell and Louviere 1991; Carson et al. 1994). Additional advantages of including a "no-choice" or "own" base alternative that are mentioned in the literature are that it would make the choice decision more realistic and would lead to better predictions of market penetrations. A disadvantage of a no-choice alternative is that it may lead respondents to avoid difficult choices, which detracts from the validity of using the no-choice probability to estimate market shares. However, Johnson and Orme (1996) claim that this seems not to happen in conjoint choice experiments. In addition, the no-choice alternative gives limited information about preferences for attributes of the choice alternatives, which is the main reason for doing a conjoint choice experiment.

In this paper we investigate the no-choice option from a modeling point of view3. We start by discussing a number of alternative model formulations. First, simply having a series of zeros describing the attribute values of the no-choice alternative seems a straightforward option, but this formulation may produce misleading results. When there are linear attributes present, the zero values of the no-choice alternative act as real levels of the linear attributes. When for instance price is a linear attribute in the design, the zero value for no-choice will correspond to a zero price. We hypothesize that this can lead to a biased estimate of the parameter of the linear attribute when the no-choice option is not accounted for in the model.

Second, when all attributes are modeled with effects-type coding the bias discussed above does not arise, because all part-worths are now specified relative to the zero-utility of the no-choice alternative. However, even when all attributes are coded with effects-type dummies, adding such a constant for the no-choice option to the design matrix improves model fit. This can be explained because the no-choice option in fact adds one level to the attributes. Although this additional constant increases the number of parameters by one, it sets the utility level of the no-choice alternative.

Finally, another way to model the presence of a no-choice option is by specifying a Nested Logit model. When two nests are specified, one containing the no-choice and the other the real product profiles, the no-choice alternative is no longer treated as just another alternative. The idea is that respondents first decide to choose or not, and only when they decide to choose a real profile do they select one of them, leading to a nested choice decision. This way of modeling the no-choice potentially also removes the effects of linear attributes because the zeros of the no-choice are no longer treated as real levels: they are now captured in a different nest.

3 In the remainder of the paper we only mention the "no-choice" (or "none"), but the results also apply to the "own" alternative when nothing is known about its characteristics to the researcher.
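As an illustration of these coding issues, the following sketch uses a hypothetical two-attribute design (a linear price and one effects-coded brand dummy, with made-up part-worths; none of the numbers come from the paper) to show how an all-zeros no-choice row implicitly carries a "price of zero", whereas a no-choice constant carries the alternative's utility directly.

    # Design rows: [price_linear, brandB_effects_code, no_choice_constant]
    profile_brandA_price3 = [3, -1, 0]   # effects coding: brand A = -1 on the brand-B column
    profile_brandB_price1 = [1,  1, 0]
    no_choice_as_zeros    = [0,  0, 0]   # zeros act as a real "price = 0" level
    no_choice_with_const  = [0,  0, 1]   # utility carried by the constant instead

    beta = [-0.4, 0.2, 1.5]              # hypothetical part-worths and constant

    def utility(row):
        # Linear-in-parameters utility, as in the MNL specifications discussed here.
        return sum(b * x for b, x in zip(beta, row))

    for row in (profile_brandA_price3, profile_brandB_price1,
                no_choice_as_zeros, no_choice_with_const):
        print(row, round(utility(row), 2))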


EQUIVALENT NO-CHOICE SPECIFICATIONS

There are several equivalent methods to account for the no-choice option, in the sense that they all lead to the same overall (predictive) model fit, which is shown in the application section below. Of course, the estimates for some of the parameters differ across models depending on the specification used. The equivalent specifications that we consider are:

1. Include a "no-choice constant", and model all attributes with effects-type and/or linear coding;

2. Include a "product category constant", and model all attributes with effects-type and/or linear coding. In this situation the no-choice alternative is coded with only zeros;

3. Code one of the attributes with regular dummies (e.g. Brand-dummies), and all other attributes with effects-type and/or linear coding. In this situation the no-choice alternative is also coded with only zeros.

In the application section these specifications will be estimated with the use of the Multinomial Logit model. It is well known that the MNL model may suffer from the IIA-property. Several approaches have been developed that do not have this property. Haaijer et al. (1998) used a Multinomial Probit specification with dependencies between and within the choice sets of the respondents (see also Haaijer 1999, and Haaijer, Kamakura and Wedel 2000). Other studies used Mixed Logit specifications (e.g. Brownstone, Bunch and Train 2000) or Bayesian methods to account for IIA violations (e.g. McCulloch and Rossi 1994). However, in this paper we use the simple MNL model since it suffices to demonstrate our point.

EMPIRICAL INVESTIGATIONS OF MODELING OPTIONS

In this section we provide an application to a commercial conjoint choice data set to illustrate the relative fits of the alternative models and codings of the attributes.

Data Description4

The product we consider is a technological product with six attributes: Brand (6 levels), Speed (4 levels), Technology Type (6 levels), Digitizing Option (no and 2 yes-levels), Facsimile Capable (y/n), and Price (4 levels). The Price and Speed attributes are coded linear with {1, 2, 3, 4} for the four levels respectively (Speed ascending, Price descending) and the other attributes are coded using effects-type coding. We use 200 respondents who each had to choose from 20 choice sets with four alternatives, where the last alternative is the "no-choice" option, defined as "none of the above alternatives". We use the first 12 choice sets for estimation and the last 8 for prediction purposes. Each respondent had to choose from individualized choice sets. We compare the results of the models on the Log-Likelihood value, the AIC (Akaike 1973) and BIC statistics (Schwarz 1978), and the Pseudo R2 value (e.g., McFadden 1976) relative to a null-model in which all probabilities in a choice set are equal to 1/M, with M the number of alternatives in each choice set. The AIC criterion is defined as AIC = -2 ln L + 2n, where n is the total number of estimated parameters in the model, and the BIC criterion is defined as BIC = -2 ln L + n ln(O), where O is the number of observations in the conjoint choice experiment. We test differences in the likelihood values for models that are nested with the likelihood ratio (LR) test.

4 We thank Rich Johnson from Sawtooth Software for allowing us to analyze this data set.
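A minimal sketch of how these fit statistics can be computed. The parameter count n = 16 and O = 200 x 12 = 2,400 estimation choices are our reading of the setup described above, and the pseudo R2 is operationalized against the 1/M null model as defined.

    import math

    def aic(lnL, n):            # AIC = -2 ln L + 2n
        return -2 * lnL + 2 * n

    def bic(lnL, n, O):         # BIC = -2 ln L + n ln(O)
        return -2 * lnL + n * math.log(O)

    def pseudo_r2(lnL, O, M):   # relative to the null model with probabilities 1/M
        return 1 - lnL / (O * math.log(1.0 / M))

    lnL, n, O, M = -2663.017, 16, 200 * 12, 4
    print(round(aic(lnL, n), 3), round(bic(lnL, n, O), 3), round(pseudo_r2(lnL, O, M), 3))
    # approximately 5358.034, 5450.566 and 0.200 (cf. Table 1)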


Estimating the Equivalent Specifications

Table 1 gives the results of the estimation of the three equivalent specifications to account for the no-choice alternative in a choice experiment. All these models were estimated with the standard MNL model. As expected, all three specifications converge to exactly the same optimum and all give the same value of the likelihood for the predictions of the holdout choice sets. When the models have a parameter in common, the estimates are equal. The model with the no-choice constant and the model with the product-category constant differ only in the estimate for this constant, which is equal in value but opposite in sign. The model that contains the brand-dummies, instead of effects-type coding for the brand attribute, only shows different estimates for these brand-dummies. Note that the utility for brand A in both other specifications is equal to minus the sum of the estimates for the brand B up to brand F parameters, which is the standard way to calculate the utility of the reference level of an attribute that is coded with effects-type coding. Note also that in Table 1 the estimate for the no-choice constant is relatively large and positive. This means that the no-choice has a high overall utility, which is also shown by the high number of times the no-choice alternative was actually chosen (in 43.1% of all choice sets). Similarly, the large negative value for the product-category constant in the second model and the large negative values for all brand-dummies in the third model also show a low preference for the product-category.


Table 1: Estimation results MNL specifications

                               No-Choice constant     Prod. Cat. constant    Brand dummies
Parameters                     Estimate   s.e.        Estimate   s.e.        Estimate   s.e.
β00 Brand A                       -         -            -         -         -1.990   *.131
β01 Brand B                     0.472    *.066         0.472    *.066        -2.473   *.140
β02 Brand C                    -0.011     .070        -0.011     .070        -2.424   *.142
β03 Brand D                     0.037     .071         0.037     .071        -2.648   *.144
β04 Brand E                    -0.187    *.075        -0.187    *.075        -2.441   *.141
β05 Brand F                     0.020     .071         0.020     .071        -2.793   *.144
β06 Speed                       0.129    *.028         0.129    *.028         0.129   *.028
β07 Tech. Type A               -0.633    *.086        -0.633    *.086        -0.633   *.086
β08 Tech. Type B                0.575    *.064         0.575    *.064         0.576   *.064
β09 Tech. Type C               -0.368    *.078        -0.368    *.078        -0.368   *.078
β10 Tech. Type D                0.628    *.064         0.628    *.064         0.628   *.064
β11 Tech. Type E               -0.173    *.075        -0.173    *.075        -0.173   *.075
β12 Dig. Opt (n)               -0.732    *.052        -0.732    *.052        -0.732   *.052
β13 Dig. Opt (y1)               0.172    *.043         0.172    *.043         0.172   *.043
β14 Facsimile                  -0.543    *.032        -0.543    *.032        -0.543   *.032
β15 Price                       0.396    *.028         0.396    *.028         0.396   *.028
cnc No-Choice constant          2.461    *.121           -         -            -        -
cpc Product category const.       -         -         -2.461    *.121           -        -

Fit Statistics
Ln-Likelihood                -2663.017              -2663.017              -2663.017
AIC                           5358.035               5358.035               5358.035
BIC                           5450.566               5450.566               5450.566
Pseudo R2                         0.200                  0.200                  0.200

Predict Statistics
Ln-Likelihood                -1706.087              -1706.087              -1706.087
AIC                           3444.175               3444.175               3444.175
BIC                           3530.219               3530.219               3530.219
Pseudo R2                         0.231                  0.231                  0.231

*: p<0.05

Estimating Different Model Options

In this section we compare the estimation results of the MNL model with the no-choice constant (we call this from here the No-Choice Logit model) with an MNL specification that does not contain this constant, to show that not accounting for the no-choice option may give very misleading results. Furthermore, both are compared with the Nested MNL model that presents a different choice situation.

The difference between the MNL model and the No-choice MNL model is the extra constant (cnc) added for the no-choice option in the design, but both models fall in the standard Multinomial Logit context for conjoint experiments. In the Nested Logit model there is one extra parameter (λ), called the dissimilarity coefficient (Börsch-Supan 1990). When its value is equal to 1, the Logit and Nested Logit models are equal.

For all models we use two versions: in the first situation the linear levels are coded as {1, 2, 3, 4} respectively, and in the second situation as {-3, -1, 1, 3}, to investigate whether mean-centering the linear levels solves (part of) the problem in any of the three models considered. Table 2 lists the estimation results for all models. Note that the Nested Logit and the No-choice Logit models are not nested, but both are nested within the Logit model without the constant. In the Nested Logit model we do not estimate λ itself but estimate (1-λ) instead, to have a direct test on λ=1.
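A minimal sketch of the two-nest structure described here, with the no-choice in its own degenerate nest and the dissimilarity coefficient λ (lam) for the nest of real profiles; the utilities are purely hypothetical, and setting lam = 1 reproduces the plain MNL probabilities.

    import math

    def nested_logit_probs(v_profiles, v_nochoice, lam):
        # Two-nest Nested Logit: real profiles share one nest, no-choice is alone.
        scaled = [math.exp(v / lam) for v in v_profiles]
        denom = sum(scaled)
        iv = math.log(denom)                                  # inclusive value of the profile nest
        p_nest = math.exp(lam * iv) / (math.exp(lam * iv) + math.exp(v_nochoice))
        within = [s / denom for s in scaled]                  # choice within the profile nest
        return [p_nest * w for w in within] + [1.0 - p_nest]  # last entry = no-choice

    print(nested_logit_probs([0.2, -0.1, 0.4], 0.5, lam=1.0))  # lam = 1: identical to plain MNL
    print(nested_logit_probs([0.2, -0.1, 0.4], 0.5, lam=0.9))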

The first conclusion that can be drawn from Table 2 is that the No-choice Logit model gives the best overall fit, in both situations, and converged to the same point. The likelihood is significantly better than that of the standard MNL model (LR(1 df) tests, p<0.01). The No-choice Logit model and the Nested Logit model are not nested, so these models cannot be compared with an LR test. The AIC and BIC values show, however, that the No-choice Logit model fits better than the Nested Logit model, which itself is significantly better than the Logit model (LR(1) tests, p<0.01), again in both situations. Table 2 also shows that the estimates for the dissimilarity coefficients (λ) are significantly different from 1 for the Nested Logit model, hence the Nested Logit differs significantly from the MNL model.

Table 2: Estimation and prediction results

                     Levels linear attributes: {1, 2, 3, 4}           Levels linear attributes: {-3, -1, 1, 3}
Model:               MNL             Nested MNL      No-Choice MNL    MNL             Nested MNL      No-Choice MNL
                     Est.     s.e.   Est.     s.e.   Est.     s.e.    Est.     s.e.   Est.     s.e.   Est.     s.e.
Parameters
β01 Brand B           0.386  *.061    0.522  *.077    0.472  *.066     0.350  *.064    0.524  *.076    0.472  *.066
β02 Brand C          -0.013   .067   -0.009   .078   -0.011   .070    -0.012   .069   -0.010   .078   -0.011   .070
β03 Brand D           0.052   .067   -0.006   .080    0.037   .071     0.024   .071    0.001   .080    0.037   .071
β04 Brand E          -0.150  *.071   -0.166  *.087   -0.187  *.075    -0.142  *.074   -0.176  *.086   -0.187  *.075
β05 Brand F           0.011   .067   -0.018   .081    0.020   .071     0.013  *.070   -0.011   .080    0.020   .071
β06 Speed            -0.237  *.021    0.118  *.031    0.129  *.028     0.046  *.014    0.067  *.016    0.064  *.014
β07 Tech. Type A     -0.531  *.081   -0.678  *.096   -0.633  *.086    -0.451  *.084   -0.682  *.095   -0.634  *.086
β08 Tech. Type B      0.505  *.060    0.613  *.074    0.575  *.064     0.432  *.062    0.621  *.073    0.575  *.064
β09 Tech. Type C     -0.321  *.075   -0.366  *.086   -0.368  *.078    -0.281  *.077   -0.377  *.086   -0.368  *.078
β10 Tech. Type D      0.514  *.060    0.635  *.074    0.628  *.064     0.471  *.062    0.651  *.073    0.628  *.064
β11 Tech. Type E     -0.132   .071   -0.149   .084   -0.173  *.075    -0.132   .073   -0.159   .083   -0.173  *.075
β12 Dig. Opt (n)     -0.586  *.049   -0.714  *.055   -0.732  *.052    -0.488  *.050   -0.721  *.055   -0.732  *.052
β13 Dig. Opt (y1)     0.128  *.041    0.180  *.046    0.172  *.043     0.099  *.043    0.178  *.046    0.172  *.043
β14 Facsimile        -0.445  *.030   -0.528  *.035   -0.543  *.032    -0.386  *.031   -0.539  *.035   -0.543  *.032
β15 Price            -0.013   .020    0.385  *.031    0.396  *.028     0.144  *.014    0.203  *.015    0.198  *.014
1-λ Nested Logit        -       -     0.924  *.009      -       -        -       -     0.840  *.017      -       -
cnc No-Choice           -       -       -       -     2.461  *.121       -       -       -       -     1.150  *.048

Fit Statistics
Ln-Likelihood       -2906.738       -2715.413       -2663.017        -2947.716       -2701.927       -2663.017
AIC                  5843.476        5462.826        5358.035         5925.431        5435.854        5358.035
BIC                  5930.224        5555.358        5450.566         6012.180        5528.386        5450.567
Pseudo R2                0.126           0.184           0.200            0.114           0.184           0.200

Predict Statistics
Ln-Likelihood       -1883.069       -1741.723       -1706.087        -1960.978       -1735.014       -1706.087
AIC                  3796.139        3515.448        3444.175         3951.956        3502.028        3444.175
BIC                  3876.805        3601.492        3530.219         4032.622        3588.072        3530.219
Pseudo R2                0.151           0.215           0.231            0.116           0.218           0.231

*: p<0.05


When the β-estimates are compared, Table 2 shows that the parameter estimates of the attributes with a dummy-coding (β01, …, β05, β07, …, β14) are somewhat different, although not dramatically so. However, in the left-hand side of Table 2, the coefficients of the linear attributes (β06, β15) in the standard MNL model differ strongly from the other two models. Whereas the estimate for Speed is negative (a high level is unattractive) and significant for the MNL model, it is positive (a high level is attractive) and significant for the other models. The price parameter shows a similar effect; it is negative but not significant in one situation and positive in the other for the MNL model, and positive (a lower price is more attractive) and significant in the other two models. Clearly, both estimates for the linear attributes show a strong negative bias. Note, however, that there are also differences in the other part-worth estimates across the models.

The right-hand side of Table 2 shows that when the Speed and Price variables are coded with values such that the mean of the levels is zero, the estimates for Speed and Price no longer show the wrong sign, but are still biased downwards compared to the other models. In the No-choice MNL model the estimates for all parameters are equal in both situations, except for the linear parameters, which in the right-hand side of Table 2 have exactly half the value of those in the left-hand side of Table 2, which is the result of the doubled step-length of the linear levels.

When the attributes Price and Speed are also coded with effects-type coding (not shown), the β-estimates are more similar across the three models, all having the same signs; however, the MNL model that explicitly accounts for the no-choice option still outperforms the MNL without the constant and the NMNL model.

The conclusion that can be drawn from the above analysis is that the presence of a no-choice alternative and linearly coded attributes can give very misleading results, in particular for the parameters of those linear attributes, when the conjoint choice data is estimated with a standard Logit model without accounting for the no-choice option. However, the parameters of attributes coded with effects-type dummies are also affected, be it less severely. When all attributes are coded with effects-type coding the bias seems less strong, but coefficient estimates are still highly attenuated. Overall fit can be improved substantially by specifying a Nested Logit or by adding a No-choice constant to the design.

When we compare the Nested Logit and the No-choice Logit results we see that both compensate for the no-choice zero level for the linear attributes, but there are some differences in the magnitudes of the estimated coefficients, some in the range of 5-10%, which may be important in substantive interpretation. However, the fit of the No-choice Logit model is much better than that of the Nested Logit model in our application.

The estimates in Table 2 were used to predict the 8 holdout choice sets. Table 2 gives the values of the statistics for the predictive fit of the three models for the different design options considered. The No-choice Logit model gives the best predictions, which are significantly better than the Logit model (LR(1) tests, p<0.01) and which are also better than the Nested Logit model in all situations. The Nested Logit model also predicts significantly better than the standard Logit model (LR(1) tests, p<0.01). Thus, the predictive validity results confirm the results on model fit. Note that although the MNL model with linear levels {1, 2, 3, 4} is clearly misspecified (as could be seen from the Speed and Price estimates), its likelihood, both in estimation and prediction, is better than those of the MNL models with both other ways of coding. However, in all situations the Nested Logit and No-choice Logit models show superior fit.


CONCLUSIONS AND DISCUSSION

Respondents may choose the no-choice alternative for two reasons. First, they may not be interested at all in the product category under research and for this reason choose the no-choice. In such a situation they would first decide whether to choose for the offered product profiles or not, and the Nested MNL model may be the most appropriate specification to describe this behavior, since it puts the no-choice alternative in a different nest than these product profiles. The probability of the no-choice alternative may be an indication for the overall preference of the product in this case, and the model may be used to obtain an estimate of the overall attractiveness of the product category. Thus, in this case the no-choice alternative would capture "real" behavior of consumers in the market place. Second, respondents may choose the no-choice because no real alternative in the choice set is attractive enough or because all alternatives are roughly equally attractive and they do not want to spend more time on making the difficult choice. In this case the respondent treats the no-choice as "just another" alternative, and the no-choice captures an effect specific to the task. If this is the case, the MNL model with a no-choice constant is the appropriate model to use, since it treats all alternatives equally. Now the utility of the no-choice option does not have a substantive meaning but serves as an indicator of respondents' involvement with the task.

In our application we saw that the No-choice MNL model produced better results compared to the Nested MNL model. This may be an indication that the second explanation for choosing the no-choice may have been appropriate. In other words, when a conjoint choice experiment with a no-choice alternative is estimated with both the No-choice MNL model and the Nested MNL model, the (predictive) fit of the models may give an indication of the substantive reasons respondents have to choose the no-choice option. In particular, when the No-choice MNL model provides the best fit it may be inappropriate to interpret the estimates as reflecting the overall attractiveness of the category. We would like to note that in our application the effect of the no-choice option itself was significant. If that is not the case, the model converges to the standard MNL and may neither fit nor predict better. However, the inclusion of the no-choice constant in the model allows for this test, and at the same time gives an indication of whether the standard MNL would be more appropriate.

We also showed that there are at least three equivalent ways to account for the no-choice option in the MNL model. One could add a no-choice constant or a product category constant, or one could code one of the attributes (e.g. Brand) with regular dummies and code the (remaining) attributes in the design with effects-type and/or linear coding. Not accounting for the no-choice option may lead to biased estimates of the attribute-levels; especially the estimates of the linear attributes may be highly affected.


REFERENCES

Akaike, H. (1973), "Information Theory and an Extension of the Maximum Likelihood Principle", In: B.N. Petrov and F. Csáki (eds.) "2nd International Symposium on Information Theory", Akadémiai Kiadó, Budapest, 267-281.

Batsell, R.R. and J.J. Louviere (1991), "Experimental Analysis of Choice", Marketing Letters, 2(3), 199-214.

Börsch-Supan, A. (1990), "On the Compatibility of Nested Logit Models with Utility Maximization", Journal of Econometrics, 46, 373-388.

Brownstone, D., D.S. Bunch, and K. Train (2000), "Joint Mixed Logit Models of Stated and Revealed Preferences for Alternative-fuel Vehicles," Transportation Research B, 34(5), 315-338.

Carson, R.T., J.J. Louviere, D.A. Anderson, P. Arabie, D.S. Bunch, D.A. Hensher, R.M. Johnson, W.F. Kuhfeld, D. Steinberg, J. Swait, H. Timmermans, and J.B. Wiley (1994), "Experimental Analysis of Choice", Marketing Letters, 5(4), 351-368.

Haaijer, M.E. (1999), "Modeling Conjoint Choice Experiments with the Probit Model", Thesis, University of Groningen.

Haaijer, M.E., W.A. Kamakura, and M. Wedel (2000), "The Information Content of Response Latencies in Conjoint Choice Experiments", Journal of Marketing Research, 37(3), 376-382.

Haaijer, M.E., W.A. Kamakura, and M. Wedel (2001), "The 'No-choice' Alternative in Conjoint Choice Experiments", International Journal of Market Research, 43(1), 93-106.

Haaijer, M.E., M. Wedel, M. Vriens, and T.J. Wansbeek (1998), "Utility Covariances and Context Effects in Conjoint MNP Models", Marketing Science, 17(3), 236-252.

Louviere, J.J. (1988), "Conjoint Analysis Modelling of Stated Preferences. A Review of Theory, Methods, Recent Developments and External Validity", Journal of Transport Economics and Policy, January, 93-119.

Louviere, J.J. and G. Woodworth (1983), "Design and Analysis of Simulated Consumer Choice or Allocation Experiments: An Approach Based on Aggregate Data", Journal of Marketing Research, 20(4), 350-367.

McCulloch, R.E. and P.E. Rossi (1994), "An Exact Likelihood Analysis of the Multinomial Probit Model", Journal of Econometrics, 64, 207-240.

McFadden, D. (1976), "Quantal Choice Analysis: A Survey", Annals of Economic and Social Measurement, 5(4), 363-390.

Schwarz, G. (1978), "Estimating the Dimension of a Model", Annals of Statistics, 6, 461-464.


HISTORY OF ACA

Richard M. Johnson
Sawtooth Software, Inc.

INTRODUCTION

Although ACA makes use of ideas that originated much earlier, the direct thread of its history began in 1969. Like much development work in marketing research, it began in response to a client problem that couldn't be handled with current methodology.

THE PROBLEM

In the late '60s I was employed by Market Facts, Inc., and the client was in a durable goods business. In his company it was standard practice that whenever a new or modified product was seriously contemplated, a concept test had to be done. The client was responsible for carrying out concept tests, and he answered to a product manager who commissioned those tests. Our client's experience was like this:

The product manager would come to him and say: "We're going to put two handles on it, it's going to produce 20 units per minute, it will weigh 30 pounds, and be green." Our client would arrange to do a test of that concept, and a few weeks later come back with the results.

But before he could report them, the product manager would say: "Sorry we didn't have time to tell you about this, but instead of two handles it's going to have one and instead of 20 units per minute it will produce 22. Can you test that one in the next three weeks?" And so on.

Our client found that there was never time to do the required concept tests fast enough to affect the product design cycle. So he came to us with what he considered to be an urgent problem – the need to find a way to test all future product modifications at once. He wanted to be able to tell the product manager, "Oh, you say it's going to have one handle, with 22 units per minute, weigh 30 pounds and be green? Well, the answer to that is 17 share points. Any other questions?"

Of course, today this is instantly recognizable as a conjoint analysis problem. But Green and Rao had not yet published their historic 1971 article, "Conjoint Measurement for Quantifying Judgmental Data" in JMR. Also, the actual problem was more difficult than indicated by the anecdote above, since the client actually had 28 product features rather than just four, with some having as many as 5 possible realizations.

Tradeoff Matrices

It seemed that one answer might lie in thinking about a product as being a collection of separate attributes, each with a specified level. This presented two immediate problems: a new method of questioning was needed to elicit information about values of attribute levels, and a new estimation procedure was needed for converting that information into "utilities."


Our solution came to be known as "Tradeoff Analysis." Although I wasn't yet aware of Luce and Tukey's work on Conjoint Measurement, that's what Tradeoff Analysis was.

To collect data, we presented respondents with a number of empty tables, each crossing the levels of two attributes, and asked respondents to rank the cells in each table in terms of their preference. We realized that not every pair of attributes could be compared, because that might lead to an enormous number of matrices to be ranked. After much consideration, we decided to pair each attribute with three others, which resulted in 42 matrices for the first study. One has to experience filling out a 5x5 tradeoff matrix before he can really understand what the respondent goes through. If the respondent must fill out 42 of them, one can only hope he remains at least partially conscious through the task.

To estimate what we now call part-worths, we came up with a non-metric regression procedure which found a set of values for each respondent which, when used to border the rows and columns of his matrices, produced element-wise sums with rank orders similar to respondents’ answers.

Although we learned a lot about how to improve our technique for future applications, this first study, conducted in 1970, was a success. The client was enthusiastic about his improved ability to respond to his product manager's requests. The client company commissioned many additional tradeoff studies, and similar approaches were used in hundreds of other projects during the next several years.

In those early days there was less communication between practitioners and academics than we enjoy today. My early work at Market Facts was done almost in a vacuum, without the knowledge that a larger stream of similar development was taking place simultaneously among Paul Green and his colleagues. ACA benefited greatly from interactions with Paul in later years, and as time passed it became clear that Tradeoff Analysis was just a different variety of Conjoint Analysis. As such, it made all of the assumptions common to Conjoint Analysis, plus one more big one.

Assumptions and Difficulties

Like other conjoint methods, we assumed that the utility of a product was the sum of values attaching to its separate attribute levels. However, Tradeoff Analysis, like all more recent "partial profile" methods, further assumed that respondents' values for attribute levels did not depend on which other attributes were present in a concept description. In other words, Tradeoff Analysis required a strong "all else equal" assumption regarding the attributes omitted from each matrix.

This made Tradeoff Analysis uniquely vulnerable to distortion if attributes were not considered to be independent by respondents. Suppose two attributes are different in the mind of the researcher, but similar in the mind of the client, such as, say, Durability and Reliability. When trading off Durability with price, the respondent may fear he is giving up Reliability when considering a lower level of Durability. This kind of double-counting can lead to distorted measures of attribute importance.


As another example, price is often regarded as an indicator of quality. As long as partial profile concept presentations include both price and quality, one would not expect to see reversals in which higher prices are preferred to lower ones. However, if concept presentations include only price but not quality, price may be mistaken as an indicator of quality and respondents may act as though they prefer higher prices.

Similar problems still characterize all partial profile methods today, and it remains critically important when using partial profile methods to remind respondents that the concepts being compared are to be considered identical on all omitted attributes.

A second problem unique to Tradeoff Analysis was the difficulty respondents had in carrying out the ranking task. Though simple to describe, actual execution of the ranking task was beyond the capability of many respondents. We observed that many respondents simplified the task by what we called "patterned responses," which consisted of ranking the rows within the columns, or the columns within the rows, thus avoiding the more subtle within-attribute tradeoffs we were seeking. This difficulty appeared to be so severe that it motivated the next step in the evolution which resulted in ACA.

Computer-Assisted Interviewing

Researchers who began their careers in the '70s or later will never be able to appreciate the dramatic improvement of computer technology that occurred during the '50s and '60s. In the late '50s "high speed" computers were available, but only at high cost and in a limited way. While at Procter and Gamble in the early '60s I considered myself lucky to have access to a computer at all, but I would get one or at best two chances in a 24 hour period to submit a programming project for debugging. A single keypunch error would often render an attempt useless. It's amazing that we were able to get any work done at all under those conditions. However, in the '70s time sharing became common, providing an enormous improvement in access to computers.

In marketing research, we depend heavily on data from survey respondents. When something is wrong in a set of results, it can often be traced to a problem at the "moment of impact," when the respondent provided the data. Originally having been trained as a psychologist, I was interested in the dynamics of what happens in interviews. When time sharing and the CRT terminals first became available, I became excited about the possibility of using them to enhance the quality of market research interviews.

I still remember an experience at Market Facts when I arranged a meeting of the company's management to demonstrate the radical idea of computer-assisted interviewing. I had borrowed the most cutting-edge CRT terminal of the time, which consisted of a tiny 3-inch screen in an enormous cabinet. I had shrouded the CRT with a cloth so I could introduce the idea of computer-assisted interviewing without distraction. The meeting went well until the unveiling, when, with a flourish, I removed the cloth to reveal the CRT. When they saw the tiny screen in the enormous cabinet, everyone in the room began to laugh. And they continued laughing until I ended the meeting. Fortunately, CRT terminals also improved rapidly, and it wasn't long before computer-assisted interviewing became entirely feasible.


Pairwise Tradeoff Analysis

Ranking cells in a matrix can be difficult for respondents, but answering simple pairwise tradeoff questions is much easier. For example, we could ask whether a respondent would prefer a $1,000 laptop weighing 7 pounds or a $2,000 laptop weighing 3 pounds.

Consider two attributes like Price and Weight, each with three levels. In a 3x3 tradeoff matrix there are 9 possible combinations of levels, or cells. We could conceivably ask as many as 36 different pairwise preference questions about those 9 cells, taken two at a time.

However, if we can assume we know the order of preference for levels within each attribute, as we probably can for price and weight, we can avoid asking many of those questions. Suppose we arrange the levels of each attribute in decreasing order of attractiveness, so that cells above and to the left should be preferred to those below or to the right.

Then we can avoid questions comparing two classes of cells. First, we can avoid questions comparing any two cells that are similar on one attribute, such as comparisons of cells in the same row or in the same column. This avoids 18 possible questions. Of the possible questions that remain, we can avoid those that compare any cell with another that is dominated on both attributes, such as below it and to its right. That eliminates another 9 questions, leaving a total of only 9 for which we cannot assume the answer.

Among those, if we are lucky (and if respondents answer without error) we may have to ask only two questions to infer the answers of the remaining seven. For example, in the 3x3 matrix with lettered cells,

a b c
d e f
g h i

if we were to learn that c is preferred to d and f is preferred to g, then we could infer that rows dominate columns in importance, and we could infer the rank order of all 9 cells. Likewise, learning that g is preferred to b and h is preferred to c would permit inference that column differences are all more important than any row differences, and we could also infer the entire rank order.
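A small sketch that reproduces the question-count bookkeeping above for a 3x3 matrix whose cells are ordered best-to-worst along both rows and columns; it is only a check of the arithmetic, not the actual question-selection program described next.

    from itertools import combinations

    cells = [(r, c) for r in range(3) for c in range(3)]          # cells a..i
    pairs = list(combinations(cells, 2))                          # 36 possible questions
    same_row_or_col = [(x, y) for x, y in pairs if x[0] == y[0] or x[1] == y[1]]
    dominated = [(x, y) for x, y in pairs
                 if x[0] != y[0] and x[1] != y[1]
                 and ((x[0] < y[0]) == (x[1] < y[1]))]            # one cell better on both attributes
    remaining = [p for p in pairs if p not in same_row_or_col and p not in dominated]
    print(len(pairs), len(same_row_or_col), len(dominated), len(remaining))   # 36 18 9 9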

By the mid '70s computer technology had advanced sufficiently that it became feasible to do computer-assisted Tradeoff Analysis using pairwise questioning. A large project was undertaken for a U.S. military service branch to study various recruiting incentives. The respondents were young males who had graduated from high school but not college. A large number of possible incentives were to be studied, and we were concerned that the required number of tradeoff matrices would strain the capabilities of our respondents.

My associate at Market Facts, Frank Goode, studied strategies for asking pairwise questions that would be maximally informative, and wrote a question-selecting program that could be used to administer a pairwise tradeoff interview. We purchased what was then described as a "minicomputer," which meant that it filled only a small room rather than a large one.


Respondents sat at CRT terminals at interviewing sites around the U.S., connected to a central computer by phone lines. Each respondent was first asked for within-attribute preferences, permitting all attributes subsequently to be regarded as ordered, and then he was asked a series of intelligently chosen pairwise tradeoff questions.

We found that questioning format to be dramatically easier for respondents than filling out tradeoff matrices. The data turned out to be of high quality and the study was judged a complete success. That study marked the beginning of the end for the tradeoff matrix.

Microcomputer Interviewing

In the late '70s Curt Jones and I founded the John Morton Company, a partnership with the goal of applying emerging analytic techniques in a strategic marketing consulting practice. We were still utterly dependent on the quality of the data provided by respondent interviews, and that led to many problems; but I remained convinced that computer-assisted interviewing held at least part of the answer.

By that time the first microcomputers were becoming available, and it seemed that computer-assisted interviewing might finally become cost-effective. We purchased an Apple II and I began trying to produce software for a practical and effective computer-assisted tradeoff interview. Use of microcomputers meant not having to be connected by phone lines and not having to wait for one's turn in time sharing, and also provided powerful computational resources. My initial approach differed from the previous one in several ways:

First, it made more sense to choose questions that would reduce uncertainty in the part-worths being estimated, rather than choosing questions to predict how respondents might fill out tradeoff matrices. This was a truly liberating realization, which greatly simplified the whole approach.

Second, it made sense to update the estimates of part-worths after each answer. Each update took a second or two, but respondents appeared to appreciate the way the computer homed in on their values. One respondent memorably likened the interview to a chess game where he made a move, the computer made a move, etc.

Third, a "front-end" section was added to the interview, during which respondents chose subsets of attributes that were most salient to them personally, as well as indicating the relative importance of each attribute. The questioning sequence borrowed some ideas from "Simalto," a technique developed by John Greene at Rank Xerox. We used this information to reduce the number of attribute levels to be taken into the paired-comparison section of the interview, as well as to generate an initial set of self-explicated part-worths which could be used to start the paired-comparison section of the interview.

Finally, those paired-comparison questions were asked using a graded scale, from "strongly prefer left" to "strongly prefer right." Initially we had used only binary answers, but found additional information could be captured by the intensity scale.


Small computers were still rare, so the experience of being interviewed had considerable ente

had seen his own part-worths as revealed by the computer, he often couldn’t wait to use the same technology in a project.

We purchased several dozen Apple computers, and began a fascinating adventure of using them all over the world, in many languages and in product categories of almost every description. Those early Apples were much less reliable than current-day computers. I could talk for hours about difficulties we encountered, but the Apples worked well enough to provide a substantial advance in the quality of data we collected. ACA

In 1982 I retired as a marketing research practitioner, moved to Sun Valley, Idaho, and soon started Sawtooth Software, Inc. I had been fascinated by the application of small computers in the collection and analysis of marketing research data, and was now able to concentrate on that activity.

IBM had introduced their first PC in the early ‘80s, and it seemed clear that the “IBM-compatible” standard would become dominant, so we moved from the Apple II platform to the IBM DOS operating system. With that move we achieved 80 characters per line rather than 40, color rather than monochrome, and a large improvement in hardware reliability.

ACA was one of Sawtooth Software’s first products. The first version of ACA offered comparatively few options. Our main thought in designing it was to maximize the likelihood of useful results, which meant minimizing the number of ways users could go wrong. I think we were generally successful in that. ACA had the benefit of being developed over a period of several years, during which its predecessors were refined in dozens of actual commercial projects. Although there were some “ad hoc” aspects of the software, I think it is fair to say that “it worked.”

During the last 20 years I’ve had many helpful interactions with Paul Green and his colleagues. One of the most useful was a JMR article by Green, Kreiger, and Agarwall with suggestions about how to combine data from the self-explicated and paired comparison sections. Those suggestions led to a major revision of the product which provided additional user options.

ACA has also benefited from helpful contributions of other friendly academics, especially Greg Allenby and Peter Lenk. The ACA/HB module uses Bayesian methods to produce estimates of individual part-worths that are considerably better than the usual estimates provided by ACA.

In particular, HB provides a superior way to integrate information from the two parts of the interview. That consists of doing standard Bayesian regression where the paired comparison answers are the only data, and where the self-explicated data are used only as constraints.



I believe ACA users who are content with the ordinary utilities provided by ACA are too satisfied with their results. The results from using the HB module are sufficiently better than those of standard ACA that I think the HB module should almost always be used.

I have been involved in one way or another with ACA for more than 30 years. During that time it has evolved from an interesting innovation to a popular tool used world-wide, and has been accepted by many organizations as a “gold standard.” As I enter retirement, others are carrying on the tradition, and I believe you will see continuing developments to ACA that will further improve its usefulness.


A HISTORY OF CHOICE-BASED CONJOINT

Joel Huber
Duke University

INTRODUCTION

I will present today a brief and highly selective history of choice-based conjoint. In the shadow of the largely negative critical but positive popular response to Jurassic Park III, I propose that it is useful to think about ACA as the most efficient predator, the T-Rex of the ratings-based conjoint systems. Just as ACA dominated most of the world of conjoint, I predict that choice-based conjoint may evolve to eventually dominate over ratings-based systems. The purpose of this talk is to indicate how choice experiments, originally seeming so like tiny mammals scurrying under the heels of mighty dinosaurs, are likely to dominate the marketing research landscape.

WHY CHOICES?

The first question to ask is: why should we base preference measurement on choices instead of ratings, even ACA’s extremely clever battery of questions? There are three primary reasons:

1. Choice reflects what people do in the marketplace. In contrast to ratings, which people rarely do unless asked by market researchers, people make choices every day. These choices make the difference between success and failure for a product or a company. Choices can be designed to replicate choices in the marketplace, but more important, to assess what people would choose if options were available.

2. Managers can immediately use the implications of a choice model. Choice models can be fed into a simulator to estimate the impact of a change in price on the expected share of an item. With choices it is not necessary to make the assertion that people’s ratings will match their choices; we only need to assert that their stated choices will match actual choices. Matching choices may require a leap, but the leap is much more justifiable and less risky than the hurdle between ratings and choices.

3. People are willing to make choices about almost anything. It is surprising how people are willing to make choices, but less willing to offer general judgments supporting those choices. For example, suppose you were to ask a person how much more they would pay per month to buy electricity from a company that won an award from Friends of the Earth, over one that costs you $100 but is on the worst polluter list. Most respondents would consider that a hard question and might be reluctant to answer. However, they have little problem with a choice between an award-winning utility that costs $125 per month versus one on the worst polluter list that costs $100. Just as people are facile at making choices given partial data in the market, so they have little problem responding to hypothetical choices in a survey.


In short, there are good reasons for basing business decisions on the responses of people to hypothetical choices. The reason they were not used initially is that choices were so very hard to model. The story of the evolution of the technology that could tame choices so as to appropriately model market behavior encompasses the rest of this talk.

WHITHER CHOICES?

Where did choice experiments come from? Choice experiments in their simplest form are as old as commerce. All they require is the monitoring of sales as a function of different prices or features. The problem for choices is just the same as the one that motivated the development of ACA. How can we set up choice experiments to make predictions not just for one shift in an offering, but for a whole range of possible changes?

Choices harbor an obvious disadvantage compared with ratings when constructing a general utility function—choices contain much less information. Contrast the information in a choice among four items with a rating on each. The choice only tells which is best, whereas the ratings tell that plus information on how much better each is compared with the others. This power of ratings is illustrated by the fact that a group of 16 ratings constructed with an efficient factorial design offers substantial information on the utility of five attributes each with four levels. By contrast, one choice out of 16 indicates which is best and little more.

Not only are choices relatively uninformative, modeling them can be problematic. Choice probabilities are constrained to be greater than zero and less than one, making linear models inappropriate and misleading. Maximum likelihood methods can provide logically consistent models but, until recently, have not been able to deal with the critical issue of variability in individual tastes. I will first review the evolution of maximum likelihood models and then move to the far more perplexing issue of heterogeneity.

ADAPTING OLS TO CHOICE

Historically, early choice modelers attempted to solve the estimation problem by building on the model on which most of us were weaned, ordinary least squares. The strategy was to treat choice probabilities as if they were continuous and defined over the real line. This “linear probability model” suffered both estimation and conceptual flaws. In terms of estimation, it flagrantly defied the constant-variance (homoscedasticity) assumption of OLS, but worse, permitted probability estimates that were less than zero or greater than one. Conceptually, its linearity posed a second, and for marketers a more vexing, problem—it assumed a link between market actions and share estimates that almost everyone knew was wrong. Regression assumes that the relationship between share estimates and market efforts is linear, whereas in fact it needs to follow an s-shaped or sigmoid curve.

To understand why a sigmoid shape is important, consider the following thought experiment. Suppose three children’s drinks reside in different categories, one drink with 5%, the second with 50%, and the third with 95% share of its market. Which drink is most likely to gain from the addition of a feature, such as a price cut or a tie-in to Jurassic Park? The answer is generally the one with the moderate share; the brand with 5% share has not yet developed a critical mass, and the one with 95% share has no place to grow. The linear model assumes that the share boost from the feature is the same (say 5%) regardless of its original share, something few of us would expect.

The solution was to transform the choice probabilities so that they would follow the s-shaped curve shown in Figure 1. Historically, various functional forms were used, but most focus was placed on the logistic transformation and the cumulative distribution for the normal curve. The normal ogive makes the most theoretical sense, since under the central limit theorem it approximates randomness arising from the aggregation of independent events. However, since the logit is easier to estimate and is indistinguishable from the normal distribution except for its slightly heavier tails, logit came into common use.

The logistic transformation for the binary choice takes a very simple form. It says the probability of choosing an item is transformed into utility by

Ux = ln(px / (1 - px)),    (1)

Solving for px provides the familiar expression for probability,

px = 1 / (1 + e^(-Ux)),    (2)

which when graphed looks just like Figure 1.

Figure 1: Typical Sigmoid Curve (choice probability as a function of marketing effort)
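As an illustration of Equations (1) and (2), here is a minimal Python sketch (my own, not part of the original talk) that applies the logistic transformation and its inverse; the function names are mine.

    import math

    def utility_from_probability(p):
        """Equation (1): U = ln(p / (1 - p)), the logit of a choice probability."""
        return math.log(p / (1.0 - p))

    def probability_from_utility(u):
        """Equation (2): p = 1 / (1 + exp(-U)), the sigmoid curve of Figure 1."""
        return 1.0 / (1.0 + math.exp(-u))

    # Round-trip check: the two transformations are inverses of one another.
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        assert abs(probability_from_utility(utility_from_probability(p)) - p) < 1e-12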


More significantly, if we take the first derivative of probability with respect to Ux, we get

dpx/dUx = px(1 - px),    (3)

which, when graphed, looks like Figure 2, having a maximum at px = .5. Thus the marginal impact of any action affecting utility is maximized at moderate probabilities and minimized close to zero or one, just as one would expect.

Figure 2: Marginal value of incremental effort as a function of the original probability of choice

The logistic transformation solved a conceptual problem by replacing the input choice probabilities with their logits, following Equation (1). However, this clever solution raised what proved to be an intractable implementation problem—logits are not defined for probabilities of zero or one. They approach negative infinity for probabilities approaching zero and positive infinity for those approaching one. This became a problem for the managerial case where probabilities are often zero or one.

An obvious solution was to substitute a probability close to zero or one when such values occurred, based on the premise that it shouldn’t matter as long as the substitute is sufficiently close. Unfortunately, it matters a great deal how close one is. Contrast substituting .9 as opposed to .99999 for 1.0. In the latter case ln(p/(1-p)) becomes a relatively large number, imputing a large effect on the solution. Thus the analyst is put in the untenable position of having to make an apparently arbitrary adjustment that strongly changes the results.
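A small Python sketch (mine, not the author’s) makes both points concrete: the derivative p(1 - p) peaks at p = .5, and the logit of a substituted probability grows rapidly as the substitute approaches one, so the “arbitrary adjustment” matters a great deal.

    import math

    def marginal_impact(p):
        """Equation (3): dp/dU = p(1 - p), largest at p = 0.5."""
        return p * (1.0 - p)

    print(marginal_impact(0.5), marginal_impact(0.9), marginal_impact(0.99))
    # 0.25  0.09  0.0099 -- actions matter most at moderate shares.

    def logit(p):
        return math.log(p / (1.0 - p))

    # Substituting different "close to one" values for an observed probability of 1.0
    # gives very different inputs to a least-squares fit on logits.
    print(round(logit(0.9), 2), round(logit(0.99999), 2))   # 2.2 vs. 11.51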



Then, about 25 years ago, Daniel McFadden published his seminal paper on conditional logit models of choice (McFadden 1976). While much of what he wrote was derived from other scholars, he was the first to put it all together in one extraordinary paper. His development of the conditional logit model contained three important components:

1. The choice probabilities follow from a random utility framework. Random utility assumes that the utility of each alternative has a defined distribution across the population. The probability of an item being chosen equals the probability that its momentary utility is the best in the set. If these random utilities are distributed as multivariate normal, then probit results. Conditional logit assumes that they are distributed as independent variables with Weibull distributions.

2. The conditional logit choice model is estimable using maximum likelihood. Unlike least squares, MLE has no difficulty with choice probabilities of zero or one. One simply estimates the likelihood of such events given the parameters of the model. The article also offered efficient ways to search for the solution by providing the derivatives of the closed form of the likelihood function with respect to the parameters.

3. McFadden worked through the critical statistics. McFadden showed that the estimates are asymptotically consistent, meaning that with enough observations they converge on the true values. He also specified the statistical properties of the model along with estimates of the covariance matrix of the parameters.

Even in hindsight it is difficult to comprehend the importance of this paper. It provided in one document the entire system needed to analyze choice experiments. For this and other econometric work Dan McFadden last year received the Nobel Prize in Economics.
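To make the estimation step concrete, here is a minimal sketch (my own illustration, not McFadden’s code) of the conditional logit log-likelihood for experimentally designed choice sets; the data layout and names are assumptions, and any numerical optimizer can be used to maximize it.

    import numpy as np
    from scipy.optimize import minimize

    def conditional_logit_loglik(beta, choice_sets):
        """Sum of log choice probabilities under conditional (multinomial) logit.

        choice_sets is a list of (X, chosen) pairs, where X is an
        (alternatives x attributes) array of coded profiles and chosen is the
        index of the alternative the respondent picked.
        """
        ll = 0.0
        for X, chosen in choice_sets:
            u = X @ beta                     # deterministic utilities
            u = u - u.max()                  # guard against overflow
            ll += u[chosen] - np.log(np.exp(u).sum())
        return ll

    # Maximum likelihood estimation: maximize the log-likelihood (here by
    # minimizing its negative) from a starting vector of zeros, e.g.
    # result = minimize(lambda b: -conditional_logit_loglik(b, choice_sets),
    #                   x0=np.zeros(num_attributes))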

From a marketing research perspective, the next critical event in choice modeling occurred eight years later when Jordan Louviere and George Woodworth (1983) published their revolutionary application of discrete choice theory to market research. Whereas McFadden’s work was applied largely to actual market and travel choices, Louviere and Woodworth took that theory and applied it to experimentally designed choice sets. They thus offered the advantages of choices as the dependent variable combined with the control and flexibility of choice experiments designed by the researcher. From conjoint they borrowed the idea of orthogonal arrays and stimulated the use of partially balanced incomplete block designs for choice experiments. Coming five years after Green and Srinivasan’s (1978) classic review of issues in conjoint analysis, Louviere and Woodworth showed how choice experiments using the new conditional logit models could be applied to solve managerial problems that had heretofore relied on ratings-based conjoint analysis.

That was almost 20 years ago, and despite its promise choice-based conjoint was quite slow in becoming accepted. One of the reasons was that its major competitor, ratings-based conjoint, was itself evolving. These improvements, stimulated largely by Paul Green, Rich Johnson and even Jordan Louviere, made ratings-based methods less vulnerable to competition from the new, elegant upstart. Additionally, the early choice models had a flaw that made them less useful than they might otherwise have been, a flaw I like to call being hit by a red/blue bus.


HIT BY A RED/BLUE BUS

This problem with early choice models arose from a property of multinomial logit readily acknowledged by both McFadden and Louviere. Logit assumes IIA, or independence from irrelevant alternatives, also known as the red bus, blue bus problem. The red bus, blue bus problem is simple and well known. Suppose choices in a population are equally shared between a red bus and a car, each with a 50% choice share. What happens if the choice set is expanded by a second bus, equivalent to the red bus except for its blue color? One would expect that the two buses would then split the 50% share, leaving the car at 50%. Not so under logit, however, which requires that the new blue bus take share equally from the red bus and the car, resulting in a 33% share for each. The logit adjustment is known as the principle of proportionality, whereby a new alternative takes from current ones in proportion to their original shares. It works fine as a first approximation, but produces the counter-intuitive result just described for the red/blue bus problem, and can have disastrous implications for many managerial decisions.
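A minimal numeric sketch of the proportionality property, using a homogeneous aggregate logit (an illustration of my own, not Sawtooth Software code):

    import math

    def logit_shares(utilities):
        """Aggregate multinomial logit shares: exponentiated utilities, normalized."""
        expu = [math.exp(u) for u in utilities]
        total = sum(expu)
        return [e / total for e in expu]

    # Car and red bus with equal utility split the market 50/50.
    print(logit_shares([0.0, 0.0]))        # [0.5, 0.5]

    # Adding a blue bus identical to the red bus draws share proportionally
    # from BOTH alternatives, giving one-third each rather than the intuitive
    # 50% car / 25% red bus / 25% blue bus.
    print(logit_shares([0.0, 0.0, 0.0]))   # [0.333..., 0.333..., 0.333...]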

The problem of IIA in managerial choice models becomes more apparent if one considers the ways managers want to use conjoint. They use conjoint to

• Provide demand estimates for a new or revised offering
• Determine the impact of a new offering on current ones and on competitors
• Optimize a product line given a company’s cost structure

All of these uses depend on the differential substitutability of options. Just as the red bus is much more substitutable with the blue bus than with a car, so are brands sharing the same feature, family brand, or price tier. Ignoring differential substitutability can lead to a consistent bias where one underestimates the degree to which a new or repositioned brand takes from a company’s current brands, and overestimates the share that will be taken from dissimilar brands. Particularly in product line decisions, just as one must consider joint cost in minimizing production cost, so one must consider differential joint demand in accurately estimating sales.

A major driver of differential substitutability is differences in values across the population, and the problem can be limited if the model takes these differences into account. In the red/blue bus example, the problem goes away if half the sample is modeled as preferring buses to cars, so that the blue bus cuts only into the bus share. Generally, a simulator with individual utilities for each person has little trouble expressing differential substitutability. In ratings-based conjoint each respondent has a unique utility function and a simulation works much as one would hope. The problem is that, unlike ratings-based conjoint, the early conditional logit models assumed homogeneity across respondents.
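Continuing the sketch above, aggregating shares over two taste segments (a hypothetical bus-preferring half and a car-preferring half, with illustrative utilities of my choosing) restores the sensible pattern of differential substitution:

    import math

    def logit_shares(utilities):
        expu = [math.exp(u) for u in utilities]
        return [e / sum(expu) for e in expu]

    def aggregate_shares(segment_utilities, weights):
        """Weighted average of within-segment logit shares (sample enumeration)."""
        shares = [logit_shares(u) for u in segment_utilities]
        return [sum(w * s[j] for w, s in zip(weights, shares))
                for j in range(len(shares[0]))]

    # Two equal-sized taste segments: the first strongly prefers buses,
    # the second strongly prefers the car.
    # Alternatives are ordered [car, red bus], then [car, red bus, blue bus].
    print(aggregate_shares([[-3.0, 3.0], [3.0, -3.0]], [0.5, 0.5]))
    # roughly [0.50, 0.50]
    print(aggregate_shares([[-3.0, 3.0, 3.0], [3.0, -3.0, -3.0]], [0.5, 0.5]))
    # roughly [0.50, 0.25, 0.25] -- the blue bus cuts only into the bus share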

It turns out that choice methods that account for variation in populations, just like species that better manage diversity in their genetic pools, are best able to survive. The choice models of the mid ‘80s and early ‘90s were not adept at handling variation in preferences, and this limited their pragmatic usefulness. They could provide an overall picture of a market, but could not offer critical descriptive or segment-based detail. The rest of this paper reviews four competing methods proposed to enable choice models to account for heterogeneity: (1) including customer characteristics in the utility function, (2) latent class, (3) random parameters, and (4) hierarchical Bayes.


1. Including customer characteristics in the utility function. McFadden’s original conditional logit model includes a provision for cross terms that link utility for attributes with customer characteristics. Inclusion of such terms in a market model can account for heterogeneous tastes. For example, the analyst might include terms for, say, the value of price given as a function of six age-income categories. Once the logit equation estimates these cross terms, then market share estimation proceeds by a process called sample enumeration. That is, one first estimates shares within each distinct group and then aggregates those estimated shares. The aggregated share estimates do not display the IIA restriction, even if those within each group do. Items that are liked by the same or similar groups will take share from each other, and approximate differential substitution will be revealed.

Thus, by including customer characteristics one can in principle avoid a collision with the red/blue bus. However, two problems remained with this solution, the first merely inconvenient, but the second devastating. The first problem is that there can be many parameters in such a model. If there are 20 parameters and 12 customer characteristics that affect those parameters, then 240 parameters populate the fully crossed model. This many parameters can result in a difficult modeling task and runs the risk of producing an overfitted model that registers noise instead of signal.

The second problem relates to the characteristics of most marketing choices. Particularly at a brand level, the correspondence between measured customer characteristics and choice is poor, with a typical R² of less than 10%. The reason for the lack of correspondence is that many decisions, particularly at the level of a brand name or version, reflect accidents in a consumer’s history rather than deterministic components. Customer characteristics may do a good job predicting how much soda a person will consume, but they are notoriously poor at specifying the particular brand or flavor.

As a result of conditioning on factors that bear too little relation to choice, the conditional logit model did a poor job representing heterogeneity in marketing contexts. What was needed was a way to estimate variability in tastes directly from the choices, something achieved by our next three models.

2. Latent class. A latent class logit model assumes that heterogeneity across respondents can be characterized by a number of mass points, reflecting segments assumed to have identical taste parameters (Kamakura and Russell 1989, Wedel et al. 1999). By having enough (say 10) mass points it should be possible to approximate most preference distributions. The latent class program provides the weight for each mass point and its parameters. Estimation of choice shares then involves simply aggregating shares for each segment weighted by its mass. Because the latent segments tend to be highly variable, they can both provide an understandable characterization of the market and result in appropriate choice shares reflecting differential substitution among alternatives.


Four years ago Sawtooth Software produced a promising version of latent class, called ICE (Johnson 1997). This method took the latent class solution and built a model for each respondent as a mixture of the different classes. Thus, one person might have a weight of .5 on class 1, a weight of -1 on class 2 and a weight of 1.2 on class 3. The routine then used these predicted utilities to represent the values of each person. It was then possible to put these individual parameters in a choice simulator and develop predictions for each individual, just like the simulator for ratings-based conjoint.

It turned out that this method did not work as well as hierarchical Bayes, described below. In my view the problem was not with the counter-factual nature of the assumptions, since all models are approximations. Its fault appears to have had more to do with overfitting when trying to approximate each individual as a mix of mass points.

3. Random parameter logit models. Economists such as Dan McFadden were well aware of problems with the conditional logit model in accurately representing variation in tastes. In response, they expanded the model to include the maximum likelihood estimate of the distribution of parameters over the population (McFadden and Train 2000). That is, the routine outputs not just the mean of the parameters, but also provides estimates of their variances and covariances across the population. Simulations would then randomly sample from the distribution of parameters, and aggregate the estimated shares across the population. This ingenious technique solved the red/blue bus problem and gave economists an efficient and elegant way to characterize complex markets.

While the random parameter models reflected an important advance for economists striving to understand aggregate markets, they were less useful to those in marketing for two reasons. First, they depend on the form of the distribution of preferences assumed. It is not likely that preferences across the population will be normally distributed; they will likely have several segments or regions of higher density. That is not a problem in principle, as different underlying distributions can be tested. However, permitting multiple peaks and complex covariances increases the number of parameters and complicates the estimation of random parameter models.

The second problem is that the random parameter models do not provide utility estimates at the individual level, only estimates of their distribution. Thus, it becomes more difficult to, for example, estimate models for subpopulations. With individual utility functions one can simply group or weight the parameters of respondents and then predict their behavior. Random parameter models require that one re-estimate a different model for each group. Thus, from the perspective of a marketing researcher, random parameter models are significantly more cumbersome to implement.
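The simulation step described above can be sketched in a few lines of Python (an illustration of my own under an assumed normal population distribution, not the McFadden and Train estimation routine itself; the argument names are hypothetical):

    import numpy as np

    rng = np.random.default_rng(1)

    def simulated_shares(mean, cov, profiles, n_draws=10_000):
        """Random-parameters share simulation: draw part-worth vectors from the
        estimated population distribution, compute logit shares for each draw,
        and average the shares across draws."""
        betas = rng.multivariate_normal(mean, cov, size=n_draws)
        utilities = betas @ profiles.T                    # draws x alternatives
        expu = np.exp(utilities - utilities.max(axis=1, keepdims=True))
        shares = expu / expu.sum(axis=1, keepdims=True)
        return shares.mean(axis=0)

    # Example usage (hypothetical inputs):
    # shares = simulated_shares(est_mean, est_cov, profile_matrix)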


HIERARCHICAL BAYES MODELS

The final and current contender for modeling choice arose out of a very different research and theoretical tradition. Hierarchical Bayes uses a recursive simulation that jointly estimates the distribution of tastes both across and within individuals. The Bayesian system arose out of a pragmatic tradition that focuses on estimation and distrusts conventional hypothesis tests, preferring to substitute confidence intervals arising out of a mixture of prior beliefs and sample information.

Hierarchical Bayes took the heterogeneous parameter model and modified it both conceptually and in terms of estimation. Conceptually, it viewed the distribution of taste parameters across the population as providing a prior estimate of a person’s values. Those values were combined with a person’s actual choices to generate a posterior distribution of that person’s parameters. In terms of estimation, hierarchical Bayes uses a simulation that generates both the aggregate and the individual distributions simultaneously. It uses the likelihood function not in the sense of maximizing it, but by building distributions where the more likely parameter estimates are better populated.

From my perspective the surprising thing about hierarchical Bayes is how well it worked (Sawtooth Software 1999). Tests we ran showed that it could estimate individual-level parameters well even when respondents answered only as many choice sets as there are parameters in the model (Arora and Huber 2001). While the HB estimates still contain error, the error appears to cancel across respondents so that estimates of choice shares remain remarkably robust. This robustness stems from two factors.

First, hierarchical Bayes is more robust against overfitting. As the number of parameters increase, maximum likelihood tends to find extreme solutions; by contrast, since HB is not concerned with the maximum likelihood, its coefficients reflect a range. By having a less exacting target, a distribution instead of a point estimate, hierarchical Bayes is less susceptible to opportunistically shifting due to a few highly leveraged points.

Second, HB is robust against misspecification of the aggregate distribution. Typically the aggregate distribution of parameters is assumed to be normal, but HB individual results tend not to change much under a different specification, say an inverted gamma or a mixture of normals. The reason is that the aggregate distribution serves as a prior, nudging the individual estimates towards its values, but not requiring them to do so. Particularly if there is enough data at the individual level (e.g. at least one choice set per parameter estimated at the individual level), the individual posterior estimates will not depend greatly on the form of the aggregate taste distribution. For example, if there is a group of respondents who form a focused segment, they will be revealed as a bulge in the histogram of individual posterior means. Thus, hierarchical Bayes is both less dependent on the form of the aggregate distribution and offers a ready test of how well that form is satisfied.

It should be emphasized that the Bayesian aspect of the hierarchical Bayes model is not what makes it so useful for us in marketing research. The same structural model can be built from a mixed logit aggregate distribution used to generate distributions of individual parameters given their choices. The resulting individual means are then virtually identical to those using Bayesian estimation techniques (Huber and Train 2001). What makes both techniques so effective is their use of aggregate information to stabilize estimates at the individual level.
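The “nudging” role of the aggregate distribution can be illustrated with a deliberately simplified shrinkage calculation (a normal-normal example of my own; it conveys the intuition only and is not the CBC/HB algorithm):

    def shrunken_estimate(individual_mean, n_choices, noise_var, pop_mean, pop_var):
        """Posterior mean for one respondent when both the population prior and the
        individual-level data are treated as normal: a precision-weighted average.
        With little individual data the estimate sits near the population mean;
        with lots of data it moves toward the respondent's own mean."""
        data_precision = n_choices / noise_var
        prior_precision = 1.0 / pop_var
        weight = data_precision / (data_precision + prior_precision)
        return weight * individual_mean + (1.0 - weight) * pop_mean

    print(shrunken_estimate(2.0, n_choices=2, noise_var=1.0, pop_mean=0.0, pop_var=0.5))
    print(shrunken_estimate(2.0, n_choices=20, noise_var=1.0, pop_mean=0.0, pop_var=0.5))
    # The second estimate is much closer to the individual's own value of 2.0.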



CONCLUSIONS

In the last 30 minutes I have offered you a quick and oversimplified story of the evolution of choice models over the last 30 years. I hope you will take with you two conclusions, one about the evolution of science and the second about what it takes to usefully model choices in a market.

In terms of the evolution of tools for modeling consumers, the message points to the value of an open mind. Ten years ago I would have no more predicted the success of hierarchical Bayes than I would have predicted that mammals would outlive the dinosaurs. We logically tend to be drawn to the models with which we are most familiar; thus it made sense that originally we would try to model choices with least squares regression, only moving to maximum likelihood when we had to, and then to hierarchical Bayes when it was shown to be so effective. However, it is those models with which we are least familiar that will certainly make the greatest changes in the coming millennium.

In terms of useful choice models, the theme here has been the value of modeling heterogeneity in creating appropriate market share predictions. Individual-level models provide the most robust and client-friendly method to estimate the impact of changes in offerings on market share. They have an admitted disadvantage; the choice simulator is not a very useful mechanism for helping an analyst understand the dynamics of a market. However, individual-level models are the best mechanism we have for predicting complex changes in share. I believe that one of the major reasons ratings-based conjoint has been so successful is because its simulations of individual-level models offered an unrivaled perspective on market behavior. Now, with the magic of hierarchical Bayes applied to logit choice models, ratings-based conjoint is not alone with this advantage. Given that choices are better for data collection, for respondents, and for managers, it is an easy prediction that choice-based conjoint will increasingly dominate its ratings-based cousin, at least until challenged by the next successful mutation.


REFERENCES

Green, Paul E. and V. Srinivasan (1978), “Conjoint Analysis in Consumer Behavior: Issues and Outlook,” Journal of Consumer Research, 5 (September), 103-123.

Huber, Joel and Kenneth Train (2001), “On the Similarity of Classical and Bayesian Estimates of Individual Mean Partworths,” Marketing Letters, 13 (3), 257-267.

Johnson, Richard M. (1997), “Individual Utilities from Choice Data: A New Method,” available at www.sawtoothsoftware.com.

Louviere, Jordan and George Woodworth (1983), “Design and Analysis of Simulated Consumer Choice or Allocation Experiments,” Journal of Marketing Research, 20 (November), 350-367.

McFadden, Daniel (1976), “Conditional Logit Analysis of Qualitative Choice Behavior,” in Paul Zarembka (ed.), Frontiers in Econometrics, 105-142, New York: Academic Press.

McFadden, Daniel and Kenneth Train (2000), “Mixed MNL Models for Discrete Response,” Journal of Applied Econometrics, 15 (5), 447-470.

Sawtooth Software, Inc. (1999), “The CBC/HB Module for Hierarchical Bayes Estimation,” available at www.sawtoothsoftware.com.

Thurstone, L. (1927), “A Law of Comparative Judgment,” Psychological Review, 34, 273-286.

Wedel, Michel, Wagner Kamakura, Neeraj Arora, Albert Bemmaor, Jeongwen Chiang, Terry Elrod, Rich Johnson, Peter Lenk, Scott Neslin and Carsten Stig Poulsen (1999), “Discrete and Continuous Representations of Unobserved Heterogeneity in Choice Modeling,” Marketing Letters, 10 (3), 219-232.


RECOMMENDATIONS FOR VALIDATION OF CHOICE MODELS

Terry Elrod
University of Alberta

INTRODUCTION

How are we to validate choice models? I consider this question in the context of choice-based conjoint, although the theory and solutions offered pertain more generally.

This paper points out two mistakes commonly made when validating choice models, explains the consequences of these mistakes, and proposes remedies. I use examples and simulated data to support my claims and demonstrate the efficacy of the proposed solutions. The following characterization of common practice describes both mistakes. I wonder if you will spot them.

We observe conjoint choices (or ratings) and holdout choices for a sample of customers. We fit several different models that represent customer differences in different ways. We evaluate these models by calculating their hit rates for the holdout choices for these respondents and adopt the model with the best hit rate.

Both mistakes are described in the last sentence. They are: (1) hit rates are used to evaluate and choose among models, and (2) the same respondents are used for both model estimation and model validation.

Those of you familiar with the conjoint analysis literature will recognize the frequency with which these mistakes are made, and you may even have made them yourself. I must be counted among the guilty.1

You are entitled to doubt that these are indeed mistakes. Most of this paper is intended to convince you that they are. Fortunately, practical remedies are at hand, which I also describe.

MISTAKE #1: USING HIT RATES TO CHOOSE A MODEL

Hit rates are unreliable and invalid. Hit rates are unreliable because hit rate differences are small and noisy and because they make inefficient use of the holdout data. Hit rates are invalid because very poor models can have consistently better hit rates than better models. My remedy is to use the loglikelihood of the holdout data rather than hit rates to choose among models.

Hit Rates Are Unreliable

Because hit rates make inefficient use of holdout data, they have a hard time identifying the better model. I illustrate this with a simple example. There are two alternatives in the choice set, A and B, which are chosen equally often.

1 Elrod, Terry, Jordan J. Louviere and Krishnakumar S. Davey (1992), “An Empirical Comparison of Ratings-Based and Choice-Based Conjoint Models,” Journal of Marketing Research, 29 (August), 368-77.


Let’s allow the better of two models to have an expected hit rate of 80 percent. On any choice occasion, a choice of either A or B may be observed, and the model may predict either A or B. Suppose the probabilities of observed and predicted choices for a single occasion using this better model are as shown in Table 1.

Table 1: Expected Predictive Performance of the Better Model

                 Observe A   Observe B
    Predict A      0.40        0.10       0.50
    Predict B      0.10        0.40       0.50
                   0.50        0.50

Let’s also suppose the second model has a much worse expected hit rate of 70 percent, as shown in Table 2.

Table 2: Expected Predictive Performance of the Worse Model

                 Observe A   Observe B
    Predict A      0.35        0.15       0.50
    Predict B      0.15        0.35       0.50
                   0.50        0.50

The last column and bottom row of each of these tables give the marginal probabilities of the predicted and observed choices, respectively. Note that both models predict that the two alternatives will be chosen equally often, which agrees with their true choice probabilities. This is by design—I consider the validity of hit rates in Section 1.3.

Here the model with the higher expected hit rate is clearly the better model, and we are considering only the reliability of hit rates as a means for model comparison. That is, how often will the better model have a higher hit rate? The answer depends upon two things: the number of holdout choices, and the degree of dependence in the predictions of the two models.

Case 1a: Independence in Model Predictions

First we will examine the simpler case of no dependence between the two models in their predictions. More precisely, given knowledge of the actual choice, we assume that knowing the prediction of one model does not help us guess the prediction of the other model. (We will consider the dependent case in Section 1.1.2.)

The joint probabilities of the two models hitting or missing on any single choice prediction are given in Table 3. The table reveals that the probability of the better model hitting and the worse model missing on any single occasion is only 0.24. This is also the probability that the hit rate criterion will identify the better model from any single holdout choice. There is also a probability of 0.14 that the worse model will be identified as better (it hits and the better one misses), and there is a probability of 0.62 that the two models will tie (either both hit or both miss).

Table 3: Hit/Miss Probabilities for the Two Models, Independent Case

                        Worse Model
    Better Model      Hit      Miss
    Hit               0.56     0.24     0.80
    Miss              0.14     0.06     0.20
                      0.70     0.30

What is the probability that the better model will have a higher hit rate than the worse model given more than one holdout choice? The answer to this question for up to 20 holdout choices is shown in Figure 1. We see from the figure that, even with 20 holdout choices, the probability that the better model will be identified by the hit rate criterion is only 0.71.
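The probabilities plotted in Figure 1 can be reproduced with a small simulation (a sketch of my own, using the joint probabilities from Table 3):

    import random

    random.seed(0)

    # Joint outcomes per holdout choice (Table 3): (better model hits, worse model hits).
    OUTCOMES = [((True, True), 0.56), ((True, False), 0.24),
                ((False, True), 0.14), ((False, False), 0.06)]

    def better_model_wins(n_holdouts):
        """One trial: does the better model end up with the strictly higher hit rate?"""
        better = worse = 0
        for _ in range(n_holdouts):
            (b, w), = random.choices([o for o, _ in OUTCOMES],
                                     weights=[p for _, p in OUTCOMES])
            better += b
            worse += w
        return better > worse

    trials = 20_000
    wins = sum(better_model_wins(20) for _ in range(trials))
    print(wins / trials)   # should be close to the 0.71 reported for 20 holdout choices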

Figure 1: Probabilities That the Better Model Wins/Ties/Loses, Independent Case


Case 1b: Dependence in Model Predictions

Now we will consider the case of dependence between the two models and their predictions. It seems likely that one model is more likely to predict a choice correctly if the other model did. An instance of this form of dependence is given by Table 4.

Table 4: Hit/Miss Probabilities for the Two Models, Dependent Case

                        Worse Model
    Better Model      Hit      Miss
    Hit               0.63     0.17     0.80
    Miss              0.07     0.13     0.20
                      0.70     0.30

The probability that the worse model will appear better on any choice occasion (that is, that it will hit and the other miss) has dropped from 0.14 in the independent case (Table 3) to 0.07 here, but the probability of a tie has increased. Figure 2 shows us the net effect. The probability of identifying the better model with 20 holdout choices has increased only slightly—to 0.76.

Figure 2: Probabilities That the Better Model Wins/Ties/Loses, Dependent Case


Holdout Loglikelihood: The Reliable Alternative to Hit Rates

What’s more reliable than hit rate? The likelihood (or equivalently, the loglikelihood) of holdout data.

The likelihood criterion has been used to fit models for the last 70 years. Maximum likelihood estimation, as its name suggests, seeks values for model parameters that maximize the likelihood of the data.

The increasingly popular Bayesian methods for model estimation also depend heavily upon the likelihood. Bayesian estimates are based upon the posterior distributions of parameters. Posterior distributions are always proportional to the likelihood times the prior distribution. Since prior distributions are typically chosen to be uninformative (in order not to prejudice the analysis), the posterior distributions are determined almost entirely by the likelihood.

Holdout loglikelihood makes efficient use of holdout data, making it more reliable than hit rates for model selection.

Holdout Loglikelihood Defined


The formula for holdout loglikelihood is given by:

HLL = ln ∏_i ∫ [y_i | β] [β] dβ,    (1)

where i indexes the respondent, y_i is a vector of his or her holdout choices, β is a vector of part-worths, [·] denotes a probability density, and [·|·] denotes a conditional probability density.

The formula of Equation 1 can be understood as follows. Our model tells us the probability of observing holdout choices y_i for the i-th respondent given knowledge of his/her part-worths. (This conditional probability is denoted [y_i | β].) Our model also specifies a distribution of the part-worths over the population of consumers (denoted [β]). The likelihood of the i-th respondent’s holdout choices y_i according to our model is therefore given by:

HL_i = ∫ [y_i | β] [β] dβ,    (2)

and the holdout loglikelihood for all respondents is given by Equation 3, which is equivalent to Equation 1:

HLL = ∑_i ln(HL_i).    (3)
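When the integral in Equation 2 has no closed form, it can be approximated by averaging the likelihood over draws from the population distribution of part-worths. Here is a minimal Python sketch of that Monte Carlo approximation (my own illustration; the data layout and function names are assumptions, not code from the paper):

    import numpy as np

    def choice_probability(beta, X, chosen):
        """Multinomial logit probability of the chosen alternative in one holdout set.
        X is an (alternatives x attributes) array of coded profiles."""
        u = X @ beta
        expu = np.exp(u - u.max())
        return expu[chosen] / expu.sum()

    def holdout_loglikelihood(holdout_sets_by_respondent, beta_draws):
        """HLL = sum_i ln ∫ [y_i | β][β] dβ, with the integral approximated by the
        average likelihood over draws from the population distribution [β]."""
        hll = 0.0
        for holdout_sets in holdout_sets_by_respondent:
            lik_per_draw = [
                np.prod([choice_probability(beta, X, chosen) for X, chosen in holdout_sets])
                for beta in beta_draws
            ]
            hll += np.log(np.mean(lik_per_draw))
        return hll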

A Note on Model Deviance and Likelihood

The results of model comparisons are sometimes given in terms of deviance rather than loglikelihood. Deviance, as its name suggests, is a measure of the extent to which the data deviate from what is predicted by a model.

The relationship between holdout deviance (which we will denote as HD) and holdout loglikelihood (HLL of Equation 3) is simply

HD = c − 2 HLL,    (4)

where c is an arbitrary constant that is the same for all models fit to the same data. Thus deviance is a measure of model inadequacy equivalent to using loglikelihood, and a lower deviance indicates a better model.

Holdout loglikelihood (and holdout deviance) for some models can only be approximated. This is because the calculation of the holdout likelihood for each respondent can involve evaluating a mathematically intractable integral (cf. Equation 2).

Demonstrating the Reliability of Holdout Loglikelihood

We will create two distributions of predicted choice probabilities that might account for the two models’ predictions of Cases 1a and 1b. We didn’t need to specify these distributions when discussing hit rates because the hit rate criterion considers only which choice was predicted and otherwise ignores the probability associated with that prediction. The holdout loglikelihood criterion, on the other hand, makes use of the probabilities associated with the choice predictions to help distinguish better models from worse ones.

Figure 3 shows, for models Better and Worse, the cumulative distributions of predicted probabilities for chosen alternatives.2 The height of the curve for the worse model at the value 0.5 is equal to 0.3, which means that 30 percent of the time the worse model produces a predicted probability for the chosen alternative that is less than one-half. (These predictions correspond to “misses.”) A glance at the curve for the better model confirms that its distribution implies a hit rate of 80 percent.

Figure 3: Distributions of Prediction Probabilities for the Two Models

(Cumulative distribution functions of the predicted probability of the observed choice, for the Better and Worse models.)

Efficiency comparisons for the independent case (Case 1a) are shown in Table 5. The first column shows different probabilities of correctly identifying the better model. The second column shows the expected number of holdout choices needed to attain this probability using the holdout loglikelihood criterion. The third column shows the expected number of holdout choices needed using the hit rate criterion. Notice that the numbers in the hit rate column are always much larger, reflecting the comparative inefficiency of hit rates. The final column shows the efficiency of the hit rate criterion, which is simply the second column divided by the third.

Efficiency comparisons for the dependent case (Case 1b) are shown in Table 6. You may recall from Section 1.1.2 that hit rates were somewhat more reliable in this more realistic scenario. This increase in reliability is reflected in the larger probabilities of identifying the better model shown in column one of Table 6 relative to those of Table 5. However, in the dependent case the reliability of the loglikelihood criterion is improved even more, and hence the efficiency of hit rates is even worse, as can be seen in the fourth column of Table 6.

2 Beta distributions were used, with parameters (1.125, .375) and (.991, .509), respectively. The Beta parameters sum to the same value (1.5) for both models, which holds the degree of consumer heterogeneity constant.


Table 5: Efficiency Comparisons for the Independent Case

    Probability of Identifying    Number of Holdout Choices Required    Hit Rate
    the Better Model              Loglikelihood         Hit Rate        Efficiency (%)
    .60                                 2                  10                20
    .65                                 4                  14                29
    .70                                 7                  19                37
    .75                                11                  26                42
    .80                                17                  36                47

Table 6: Efficiency Comparisons for the Dependent Case

    Probability of Identifying    Number of Holdout Choices Required    Hit Rate
    the Better Model              Loglikelihood         Hit Rate        Efficiency (%)
    .65                                 1                  12                 8
    .70                                 3                  15                20
    .75                                 5                  19                26
    .80                                 7                  26                27
    .85                                11                  34                32
    .90                                16                  47                34

Why are hit rates so unreliable? Hit rates are unreliable because they throw away information about model performance by discretizing probabilistic predictions. For example, for two-brand choice all predicted probabilities of the chosen alternatives greater than 0.5 are lumped into the category “hit” and all probabilities less than 0.5 are lumped into the category “miss.” Throwing away so much information about the models’ predictions greatly increases our risk of choosing an inferior model.

Hit Rates Are Invalid

A measure (or criterion) is unreliable if it is equal to the true entity being measured on average but rarely provides an accurate measure due to the presence of a lot of random error. In Section 1.1 the unreliability of hit rates was established, explained, and demonstrated. The superior reliability of holdout loglikelihood was demonstrated in Section 1.2.

Worse than being unreliable, a measure is invalid if it fails to measure what it is intended to measure even on average. I argue in this section that hit rates are invalid measures of model performance. I base this claim on the fact that very poor models can perform as well as good ones.

This is easily shown by means of an example. Consider Table 7, which compares two models with identical hit rates. Notice the difference in the marginal probabilities. The better model predicts that B will be chosen 30% of the time, which is correct, whereas the worse model predicts that it will be chosen only 10% of the time. Since the theoretical hit rates are identical for these two models, no amount of holdout data will allow determination of the better model using the hit rate criterion. Each model has an equal chance of obtaining the better hit rate for the data in hand.

Table 7: Case 2: Two Models With Identical Hit Rates

                                      Observed Choice
                   Predicted Choice        A         B
    Better Model          A              0.55      0.15      0.70
                          B              0.15      0.15      0.30
    Worse Model           A              0.65      0.25      0.90
                          B              0.05      0.05      0.10
                                         0.70      0.30

This failure of the hit rate criterion has long been recognized. The common “remedy” is to also measure agreement between the observed and predicted marginal probabilities (i.e. choice shares), using a measure such as mean square error of prediction.

This solution is frequently inadequate, however, because often the model with the best hit rate is not the same as the model that fits the choice shares best. When this occurs, the researcher must still choose the one best model, and there is no theoretical basis for deciding how to combine the two different performance measures into a single measure of model adequacy.

There is a simple remedy to this problem, however, which is the same as the remedy for unreliability: use the holdout loglikelihood. It is not only more reliable than hit rates; it is also valid. Holdout loglikelihood optimally weighs aggregate and individual-level accuracy. (This is why we can use this single criterion to estimate our models.) The holdout loglikelihood has no trouble identifying the better of the two models of Case 2 shown in Table 7.

MISTAKE #2: VALIDATING MODELS USING ESTIMATION RESPONDENTS

Estimating and validating models on the same respondents introduces predictable biases in model selection. It causes us to overfit customer heterogeneity. That is, it biases the model selection process in favor of models that estimate a lot of extra parameters in order to describe smaller customer differences.

Fortunately, a good alternative to this practice is also at hand: Validate models using holdout profiles for holdout respondents.

The Intuition in a Nutshell

The intuition behind the need to use holdout respondents as well as holdout profiles is easy to convey because most of us understand intuitively why a model needs to be validated with holdout profiles rather than with the same profiles used to estimate the model. The same underlying reason explains why a model ought to be validated using holdout respondents.

In customer research we have a sample of customers evaluate a sample of profiles. Typically it is not possible to include all profiles of interest in this study.


(If it were, we would be doing concept testing and not conjoint analysis.) Therefore our model needs to generalize beyond the profiles included in the study to the much larger set of profiles that can be generated by combining all levels of all attributes.

Similarly, it is rarely possible to include in our study all customers of interest, and our model needs to generalize to the population of customers from which our sample was taken. Thus we want our conjoint model to generalize beyond our sample to other respondents and other profiles.

I show in the remainder of Section 2 that this implies we must validate our model using different profiles and different respondents than were used in estimation. My theoretical arguments are backed up with an analysis of simulated data in Section 3.

The Principle of Model Validation

What I refer to as the principle of model validation is simply stated, and it applies equally to profiles and to customers.

The Principle of Model Validation. If a model needs to generalize from a sample to the population from which the sample was taken, the model must be validated using different portions of the sample for validation and estimation.

I expect all of us are familiar with what happens when we use goodness-of-fit to estimation data as our model selection criterion—as we add more and more parameters to our model, the fit gets better and better. Here are some examples:

• A regression’s R-square keeps increasing as we add more and more independent variables.
• A brand-choice model fits the data better when we add brand-specific dummy variables.
• A segmentation analysis fits the data better and better as we divide the sample into more and more segments.
• A ratings-based conjoint model fits the estimation data best when we estimate a very general model separately for each respondent.

In all cases, using goodness-of-fit to estimation data as our model selection criterion favors models that “overfit the data.” Such models estimate a lot of extra parameters to fit the very specific characteristics of the sample data. While the fit to the sample data is excellent, the model predicts poorly for new cases because estimating all those extra unnecessary parameters results in unstable estimates of all parameters.


principle of model validation has been satisfied. I show in the next section that this assumption is often wrong, and I illustrate and explain proper application of the principle of model validation.

Usual and Better Validation Methods

Table 8 portrays the common way of assessing predictive validity for conjoint models. This is the first of six tables which all portray validation methods by the same means. The entire data set at our disposal is represented by all the cells in the table excluding the first row and the first column (which simply contain the words “Customers” and “Profiles”). In the case of Table 8, the entire data set is represented by the last two cells in the bottom row, “Estimation” and “Validation.”

Table 8: The Usual Design for Model Validation

                          Profiles
    Customers   Estimation    Validation

The fact that the Estimation cell spans all the Customer rows (there is only one) but not all of the Profile columns signifies that the estimation data consist of evaluations of only a portion of the profiles by all respondents. The Validation cell shows that the remaining portion of profile evaluations for all respondents constitutes the validation data. Thus all data are used, yet different data are used for model estimation and model validation. So it would appear that the principle of model validation is satisfied.

Not so. We have taken our sample from what are referred to as two domains: Customers and Profiles. We would like our model to generalize out-of-sample for both of these domains. Therefore the principle of model validation must hold for each of these domains. The usual model validation method is valid for the profile domain but not for the customer domain.

The effect of this error is predictable. It will lead us to overfit inter-customer variability in the data, but not inter-profile variability.

An example of a proper validation design for conjoint studies is shown in Table 9. Model validation is performed on different profiles and different respondents. Table 9 shows that the model is assessed on its ability to predict to respondents and profiles not used in estimation.

Table 9: A Proper Design for Model Validation

                          Profiles
    Customers   Estimation
                              Validation
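As an illustration of this design, the following minimal Python sketch scores a candidate model on a validation block of respondents and profiles that played no part in estimation. The simple aggregate logit and the function names are illustrative assumptions, not the estimation machinery used in this paper; holdout deviance is simply minus twice the holdout loglikelihood.

    import numpy as np

    def fit_logit(X, y, n_iter=500, lr=0.1):
        # Illustrative aggregate binary logit fit by gradient ascent
        # (a stand-in for whatever conjoint estimation routine is actually used).
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))
            beta += lr * X.T @ (y - p) / len(y)
        return beta

    def holdout_loglik(beta, X, y):
        # Holdout loglikelihood of 0/1 choices y; holdout deviance = -2 * loglik.
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

    def proper_validation_score(X, y, cust, prof, est_cust, est_prof):
        # Table 9: estimate on one block of customers and profiles, then validate
        # on observations from different customers evaluating different profiles.
        est = np.isin(cust, est_cust) & np.isin(prof, est_prof)
        val = ~np.isin(cust, est_cust) & ~np.isin(prof, est_prof)
        beta = fit_logit(X[est], y[est])
        return holdout_loglik(beta, X[val], y[val])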


V-fold Cross-Validation

A shortcoming of the proper validation method portrayed in Table 9 is that some data are wasted—those observations represented by blank cells in the table. However we can easily avoid wasting respondents (and profiles) by making cyclical use of both for validation.

Table 10 illustrates what is known as two-fold validation, which uses all the data at our disposal but also satisfies the principle of model validation. This two-fold validation design would be implemented as follows. First the model is estimated on the customers/profiles indicated by Estimation1 and its loglikelihood is calculated for customers/profiles Validation1. The same model is then estimated on Estimation2 and its loglikelihood calculated for Validation2. The performance of the model is given by the sum of its validation scores on Validation1 and Validation2.

Table 10: Two-fold Validation

                           Profiles
    Customers   Estimation1    Validation2
                Estimation2    Validation1

One might ask: Since the model is estimated twice, which model estimate is used, Estimation1 or Estimation2? The best answer is: Neither. Model validation is used to select the best model. Once the best model is determined, it should be estimated on the entire data set in order to obtain the most precise estimates possible.

A Problem With Two-fold Validation, and a Partial Remedy

There is a problem with two-fold validation. Assuming that the profiles and respondents are split in half, Table 10 shows that models are assessed on their ability to generalize to new respondents and profiles when estimated on only one-fourth of the data at a time. In general, the best model to estimate for the entire data set will be more complex than the best model estimated on only one-fourth of the data. More data allow estimation of more parameters with sufficient reliability to generalize well.

This difficulty can be partially remedied, but not entirely. The remedy is to partition respondents and profiles into V parts of equal size and choose a value for V greater than two. Then we estimate the model V different times, each time on (V – 1)/V of the respondents and the same fraction of profiles, and evaluate each fit on its loglikelihood for the remaining 1/V respondents and profiles. The performance of the model is equal to its total loglikelihood over all V estimations of the model.

The larger the value we choose for V, the closer we are to evaluating models estimated on the entire data set, but the greater the number of times the model must be estimated.
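To make the procedure concrete, here is a minimal Python sketch of V-fold cross-validation over both domains; fit_model and holdout_loglik are hypothetical stand-ins for the estimation routine and holdout loglikelihood calculation actually employed. Setting V = 2 reproduces the two-fold design of Table 10.

    import numpy as np

    def vfold_score(X, y, cust, prof, V, fit_model, holdout_loglik, seed=0):
        # Symmetric V-fold cross-validation over customers AND profiles.
        # cust and prof give the customer and profile id for each observation;
        # both are assigned to folds at random.
        rng = np.random.default_rng(seed)
        cust_fold = {c: f for f, block in
                     enumerate(np.array_split(rng.permutation(np.unique(cust)), V))
                     for c in block}
        prof_fold = {p: f for f, block in
                     enumerate(np.array_split(rng.permutation(np.unique(prof)), V))
                     for p in block}
        cf = np.array([cust_fold[c] for c in cust])
        pf = np.array([prof_fold[p] for p in prof])
        total = 0.0
        for v in range(V):
            est = (cf != v) & (pf != v)   # (V - 1)/V of customers and of profiles
            val = (cf == v) & (pf == v)   # held-out customers x held-out profiles
            model = fit_model(X[est], y[est])
            total += holdout_loglik(model, X[val], y[val])
        return total   # compare this total holdout loglikelihood across candidate models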

Four-fold validation is illustrated in Table 11. It can be easier to think in terms of which customers and profiles are set aside for validation each of the four times, with it being understood that the other customers/profiles will be used in estimation.


Table 11: Four-fold Validation

                     Profiles
                 V1
    Customers        V2
                         V3
                             V4

In Table 11, Vi indicates the validation customers/profiles for each of the four “folds” i = 1, …, 4. That is, the division of customers and profiles into estimation and validation portions for the first fold is as shown in Table 12.

Table 12: Four-fold Validation, First Fold

                     Profiles
                 V1
    Customers
                        E1

An Allowable Simplification of V-fold Validation

While the four-fold cross-validation scheme of Table 11 possesses an elegant symmetry in its treatment of respondents and profiles, it can be inconvenient in practice. While it is a simple matter to partition respondents into V groups and estimate the model on any V – 1 of these groups, the same cannot be said for the profiles. While it is possible to come up with questionnaire designs consisting of V blocks of profiles such that the model may be estimated reliably on any V – 1 of these blocks, this is not easy to do using standard designs, even when V is prespecified.

A simpler alternative to the design of Table 11 is shown in Table 13. There the familiar treatment of profiles in cross-validation is adopted—the profiles are divided into two groups, some used for estimation and others for validation. As explained in Section 2.1, the problem with common validation practice is how it handles respondents, not with its handling of profiles.


Table 13: Four-fold Validation, Modified

                        Profiles
                (estimation)   (validation)
                                   V1
    Customers                      V2
                                   V3
                                   V4
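A minimal Python sketch of the scheme in Table 13 (and of the description that follows) is given below; as before, fit_model and holdout_loglik are hypothetical stand-ins, and val_profiles holds the ids of the fixed validation profiles.

    import numpy as np

    def modified_vfold_score(X, y, cust, prof, val_profiles, V,
                             fit_model, holdout_loglik, seed=0):
        # Modified V-fold validation: the same validation profiles are used every
        # time; only the assignment of respondents to folds rotates.
        rng = np.random.default_rng(seed)
        folds = {c: f for f, block in
                 enumerate(np.array_split(rng.permutation(np.unique(cust)), V))
                 for c in block}
        cf = np.array([folds[c] for c in cust])
        holdout_prof = np.isin(prof, val_profiles)
        total = 0.0
        for v in range(V):
            est = (cf != v) & ~holdout_prof   # other respondents, estimation profiles
            val = (cf == v) & holdout_prof    # held-out respondents, validation profiles
            model = fit_model(X[est], y[est])
            total += holdout_loglik(model, X[val], y[val])
        return total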

Note in Table 13 that the same profiles are used for validation each time. They need not be equal in number to the profiles used in estimation, and in fact they are typically fewer in number. However the model must still be estimated four different times, always on different respondents than those used for validation.

The data used to estimate the model that is validated on V1 in Table 13 is represented by the bottom three of the four blank cells shown in that table. Full validation of the model would require its being estimated and validated four different times, always using the estimation and validation data as suggested by the table.

There are several advantages to the modified cross-validation procedure illustrated in Table 13. The primary advantage is that the problem of experimentally designing the profiles used in validation is simplified. A second advantage is that the validation procedure is easier to program because we need only concern ourselves with changing the assignments of respondents to estimation and validation, and need not worry simultaneously about reassigning profiles.

There are also two disadvantages that warrant mention. First, model selection is more sensitive to choice of holdout profiles because the same profiles are always used for validation. It is still common to give great thought to the design of the estimation profiles but to select the validation profiles in a haphazard manner, even though the efficacy of the cross-validation procedure is sensitive to both decisions. Staying with the old method of choosing validation profiles encourages continuance of poor practice.

The second disadvantage is that we still need to estimate the model as many times (V), but with a less reliable result. Much of the drawback to switching profiles used for estimation and validation can be solved by programmers of conjoint software. It is the extra estimation time that is an inevitable consequence of proper and efficient cross-validation, and the modified validation method of Table 13 does not help with this problem at all.

Why Not N-fold Validation?

N-fold validation involves estimating the model N times, where N is the number of respondents, using a single one of the N respondents as the holdout each time. An important advantage of this technique is that the holdout validation is performed using the maximum number of respondents possible (N – 1) for estimation. The best model estimated using N – 1 respondents cannot be very different from the best model estimated using the entire sample.


However there are two disadvantages to N-fold validation. First, the model must be estimated N different times, which will usually be at least several hundred. Second, if there are numerous respondents in the sample that appear “aberrant” (as far as a model is concerned), every estimation sample will contain all, or all but one, of them. As a result, a model’s inability to properly account for these “aberrant” respondents cannot be readily discovered.3

Other Tips For Validation

Here are a few other suggestions for cross-validation.

• Begin by estimating the model on all respondents, and use the result as starting values for the estimates in the validation procedure.

• Assign respondents randomly to blocks, not sequentially according to their position in the data file.

• A V-fold validation may be run any number of times, reallocating respondents to holdout blocks randomly for each run, and the results summed over all runs. (A single “run” of a V-fold validation entails estimating the model V times, each of those times using a different 1/V fraction of the respondents as holdouts.) Thus a second or third validation run can be used when the first run fails to determine the best model with sufficient certainty.

• Validate the model as you would use it. If you will use the model’s point estimates for prediction, validate the model using those point estimates. On the other hand, if you will retain uncertainty in the estimates when using the model, validate it this way. You can even validate a model using it both ways to compare model performance using its point estimates with model performance when you do not. (Although retaining parameter uncertainty in a model is more proper, using point estimates is usually more convenient.)

A Simulation For Investigating Mistake #2

I designed a simulated data set to investigate the common practice of using the same respondents for estimation and validation and compare this to my recommendation to use different respondents for estimation and validation. A simulation has the important advantage that we know what the correct model is. We are using simulated data to see whether a model validation procedure succeeds or fails to select the true model.

Of course, in the real world the true model is unknown to us. However a model selection procedure that fails to select the true model for simulated data is unlikely to select the best model for our real problems. This is particularly true when, as here, the simulation is simply confirming what we already know from theory.

3 It is for this same reason that the jackknife, which leaves out only one observation, is inferior to the bootstrap. See Efron, Bradley (1982), The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.


Data Set Design

I simulated 32 pairwise choices by 256 respondents. Sixteen of the choices were used for estimation and 16 for validation. The two alternatives in each choice set are described in terms of ten binary attributes. The attribute levels were assigned to the 32 pairs of alternatives by using a 2^11 orthogonal array (“oa”) design. The last column of the design divided the choice questions into estimation and holdout choice questions.

A nice property of pairwise designs with binary attributes is that they can be generated using ordinary experimental designs for binary attributes. One row of the design suffices to characterize the two alternatives in the choice set. The binary design specifies which of the two levels is assigned to the first alternative in the pair, and the other alternative is assigned the other level for that attribute.

The individual-level model contains ten variables for the ten attributes (main effects only) plus an intercept that provides for a tendency to choose either the first or second alternative in each question. Thus eleven coefficients (part-worths) are needed to describe each individual’s choice behavior. These part-worths were made to vary across respondents according to a multivariate normal distribution given by

    [\beta] = n_{11}(\mu, \Sigma), \quad \text{where } \mu' = (0, 0.1, 0.2, \ldots, 1.0) \text{ and } \Sigma = \mathrm{diag}(1.1, 1.0, 0.9, \ldots, 0.1).   (5)
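A minimal Python sketch of a data set with this structure follows. The binary logit choice rule and the random ±1 paired-comparison design are simplifying assumptions of the sketch; the study itself used a 2^11 orthogonal array.

    import numpy as np

    rng = np.random.default_rng(1)
    n_resp, n_pairs, n_attrs = 256, 32, 10

    # Part-worths: intercept plus ten attribute coefficients, drawn from the
    # multivariate normal distribution of Equation 5.
    mu = np.arange(11) / 10.0                      # (0, 0.1, ..., 1.0)
    Sigma = np.diag(np.arange(11, 0, -1) / 10.0)   # diag(1.1, 1.0, ..., 0.1)
    betas = rng.multivariate_normal(mu, Sigma, size=n_resp)

    # Paired-comparison design: +1/-1 codes which level each attribute takes in
    # the first versus the second alternative; a column of ones carries the
    # order intercept. (A random design is used here only to keep the sketch
    # short; the paper uses an orthogonal array.)
    design = np.hstack([np.ones((n_pairs, 1)),
                        rng.choice([-1.0, 1.0], size=(n_pairs, n_attrs))])

    # Binary logit choice of the first alternative in each pair (an assumed link).
    utility = betas @ design.T                     # (n_resp, n_pairs)
    p_first = 1.0 / (1.0 + np.exp(-utility))
    choices = rng.binomial(1, p_first)             # 1 = chose first alternative

    est_q, hold_q = np.arange(16), np.arange(16, 32)   # 16 estimation, 16 holdout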

Figure 4 shows the distributions of choice probabilities over the sixteen estimation choice questions.

Figure 4: Distributions of Choice Probabilities for the Estimation Questions

[Figure: CDFs of choice probabilities; horizontal axis: choice probability (0.0 to 1.0); vertical axis: CDF (0.0 to 1.0).]

The Models Assessed

Three different models were assessed by cross-validation. They all correctly represent the individual-level model, but they differ in their representation of customer heterogeneity. A fourth “model” is used to provide a benchmark when validating using estimation respondents.


MVN.True

Here we use the true values of the parameters of the multivariate normal distribution,

    [\beta] = n_{11}(\mu, \Sigma),   (6)

where the values for \mu and for \Sigma are as given in Equation 5. Then the holdout likelihood calculation for each holdout individual is given by

    HL_i = \int [\mathbf{y}_i \mid \beta] \, n_{11}(\mu, \Sigma) \, d\beta.   (7)

MVN.Sample

In this model we assume that we know that the true distribution of part-worth heterogeneity is multivariate normal, but we don’t know the values for the multivariate normal parameters (\mu and \Sigma). However, since we know from our simulation the true part-worth vectors for our sample of respondents, we can estimate \mu and \Sigma from these. These estimates, based on the sample, are denoted using the usual notation \hat{\mu} and \hat{\Sigma}. Thus our model of customer heterogeneity is

    [\beta] = n_{11}(\hat{\mu}, \hat{\Sigma}),   (8)

and the holdout likelihood for the i-th respondent is

    HL_i = \int [\mathbf{y}_i \mid \beta] \, n_{11}(\hat{\mu}, \hat{\Sigma}) \, d\beta.   (9)

Ind.Mixture

In this case we assume we also don’t know that the population of part-worth vectors has a multivariate normal distribution. Instead we use the true part-worth vectors for our sample of consumers, and let these stand in as our estimate of the true distribution of part-worths in the population.

Letting i* = 1, …, N* index the N* respondents used for estimation and \{\beta_{i^*},\ i^* = 1, \ldots, N^*\} denote the set of part-worth vectors for these respondents, the distribution of part-worths is taken to be

    [\beta] = \begin{cases} 1/N^* & \text{if } \beta \in \{\beta_{i^*},\ i^* = 1, \ldots, N^*\} \\ 0 & \text{otherwise}, \end{cases}   (10)

and the holdout likelihood for the i-th respondent is

    HL_i = (1/N^*) \sum_{i^*} [\mathbf{y}_i \mid \beta_{i^*}].   (11)

When validating on estimation respondents, the i-th respondent of Equation 11 will be among the N* estimation respondents, but when validating on holdout respondents he or she will not.


This model may seem strange to some readers, but in fact it is used by choice simulators that predict shares based on individual-level part-worth estimates. The part-worths for the sample respondents are used to represent the true distribution of customer heterogeneity.
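For concreteness, the following Python sketch shows one way the holdout likelihoods of Equations 7, 9 and 11 might be approximated for a single holdout respondent. The binary logit choice rule is an assumption carried over from the earlier sketch, and this is not the Monte Carlo implementation actually used for the results reported below.

    import numpy as np

    def choice_loglik(beta, design, choices):
        # log [y_i | beta] for one respondent under an assumed binary logit rule.
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        return np.sum(np.where(choices == 1, np.log(p), np.log(1 - p)))

    def hl_mvn(design, choices, mu, Sigma, n_draws=5000, seed=0):
        # Equations 7/9: HL_i is the integral of [y_i | beta] against the
        # multivariate normal, approximated here by simple Monte Carlo draws.
        rng = np.random.default_rng(seed)
        draws = rng.multivariate_normal(mu, Sigma, size=n_draws)
        logliks = np.array([choice_loglik(b, design, choices) for b in draws])
        return float(np.mean(np.exp(logliks)))

    def hl_mixture(design, choices, sample_betas):
        # Equation 11: average of [y_i | beta_i*] over the estimation sample's
        # part-worth vectors (the Ind.Mixture model).
        logliks = np.array([choice_loglik(b, design, choices) for b in sample_betas])
        return float(np.mean(np.exp(logliks)))

    # A model's holdout deviance is then -2 * sum(log HL_i) over holdout respondents.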

Ind.True

This final case is for reference only because it cannot be used in cross-validation. No distribution for the part-worths \beta is specified. We simply use each respondent’s true part-worth vector to calculate the holdout likelihood for his or her holdout choices. This provides us with a performance threshold which no model can be expected to beat. The holdout likelihood for the i-th respondent given knowledge of his or her true part-worths is simply

    HL_i = [\mathbf{y}_i \mid \beta_i].   (12)

Desired and Expected Results

We seek the best model for the population of customers from which our sample of respondents was taken.

Our validation procedure ought to identify MVN.True as best, MVN.Sample as second best, and Ind.Mixture as worst.

MVN.True uses the true model for part-worth heterogeneity in the population of consumers, and we can’t do better than the truth!

MVN.Sample is not as good a model as MVN.True because we are using estimates of the multivariate normal parameters based on the sample, which is not as good as knowing and using the true values of these parameters.

Ind.Mixture is not as good as either MVN.True or MVN.Sample for the simulated data because it does not make use of the fact that the true distribution for heterogeneity is multivariate normal.

Of course for real-world data we won’t have MVN.True at our disposal, and Ind.Mixture may be better than MVN.Sample if the true distribution of customer heterogeneity departs substantially from the multivariate normal distribution. That is why we need a valid and reliable tool that will identify the best model of customer heterogeneity for a given data set.

As explained in Section 2, the common practice of using the same respondents for estimation and validation can be expected to lead us to adopt a model that overfits respondent heterogeneity. That is, models which have more freedom to (over)fit the heterogeneity in the sample will appear to perform better than they should.

Thus, common validation practice should cause us to prefer Ind.Mixture over the other two models because it has the most freedom to overfit heterogeneity in the sample. It should also cause us to prefer MVN.Sample to MVN.True because it has some freedom to fit heterogeneity in the sample (unlike MVN.True), but not as much freedom to overfit as Ind.Mixture.


Model Comparison Results When Using Estimation Respondents for Validation

Figure 5 plots the results from calculating the holdout deviance for the four models of Section 3.2 using the common validation practice of using all sampled respondents (but different choice sets) for estimation and validation.

Figure 5: Model Performance for the Estimation Respondents

[Figure: holdout deviance (vertical axis, about 2800 to 2880) for MVN.True, MVN.Sample, Ind.Mixture, and Ind.True.]

The holdout deviance scores (cf. Equations 2–4) are portrayed in the figure as black dots. The holdout deviance for the reference model, Ind.True, can be calculated exactly (Equation 12). Holdout deviance for the other models was approximated by Monte Carlo methods using version 1.3 of WinBUGS.4 Figure 5 also shows the 95% credibility intervals for these approximations. (Increasing accuracy can be attained by running the simulation program longer.)

Recalling that lower deviance scores are better, we see that the true model of customer heterogeneity (MVN.True) is shown to perform worst of all. MVN.Sample is shown to perform better. And better still, according to this procedure, is Ind.Mixture. Thus the bias of using the same respondents for estimation and validation is exactly as we predicted.

Model Comparison Results When Using Validation Respondents for Validation

Finally, Figure 6 shows the results obtained from using the “modified” four-fold validation design of Table 13. The performance of Ind.Mixture is shown to be much worse than that of the other two models. Although it appeared to be better than the other two models when employing usual validation practice, it is apparent here that it overfits heterogeneity for the sampled respondents.

4 See Spiegelhalter, D. J., A. Thomas and N. G. Best (1999), WinBUGS Version 1.2 User Manual, MRC Biostatistics Unit. For further information about WinBUGS, consult the URL http://www.mrc-bsu.cam.ac.uk/bugs/.


Figure 6: Model Performance for the Validation Respondents

[Figure: holdout deviance (vertical axis, about 2900 to 3200) for MVN.True, MVN.Sample, and Ind.Mixture.]

The Ind.True reference point cannot be calculated for holdout respondents and is not portrayed. (One can never know the individual-level part-worths for respondents that are not in the sample.)

The two best models according to Figure 6 are MVN.True and MVN.Sample. The best model ought to be MVN.True, but in fact the better of these two models could not be distinguished with certainty. That is, their 95% credibility intervals overlap considerably. This underscores the need for measures of model performance with maximum power and validity. You may recall from Section 1.1 that the hit rate criterion was criticized for being less reliable than holdout likelihood. We cannot afford to waste information about model performance when choosing among models.

IN CONCLUSION

This paper identifies two mistakes commonly made when validating choice models, and proposes two remedies.

• Don’t use hit rates to “validate,” or choose among, models. Use holdout loglikelihood (or equivalently, holdout deviance) for improved reliability and validity in model selection.

• Don't "validate" models using estimation respondents. Validate on holdout respondents and avoid models that overfit respondent heterogeneity, leading to poor predictions for customers not in the sample.

It is worth noting that the common practice of using hit rates to validate models presumes that the same respondents are used for estimation and validation. That is, we obtain estimates of each respondent’s part-worths and use these to predict that same respondent’s holdout choices. As Sections 2 and 3 demonstrate, it is important to validate models on holdout respondents. I see no valid method for validating models using hit rates on holdout respondents and choices, so it appears we have discovered a third reason to avoid using hit rates.
