This article was downloaded by: [Georgia Tech Library] On: 12 November 2014, At: 21:43 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Transport Reviews: A Transnational Transdisciplinary Journal Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ttrv20 Numerical Analysis of the Statistical Properties of Uniform Design in Stated Choice Modelling Pengfei Li a & Donggen Wang b a Department of Mathematical and Statistical Sciences , University of Alberta , Edmonton, Alberta, Canada b Department of Geography , Hong Kong Baptist University , Kowloon Tong, Kowloon, Hong Kong Published online: 26 Oct 2009. To cite this article: Pengfei Li & Donggen Wang (2009) Numerical Analysis of the Statistical Properties of Uniform Design in Stated Choice Modelling, Transport Reviews: A Transnational Transdisciplinary Journal, 29:5, 619-634, DOI: 10.1080/01441640902829454 To link to this article: http://dx.doi.org/10.1080/01441640902829454 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &

Numerical Analysis of the Statistical Properties of Uniform Design in Stated Choice Modelling


Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Transport Reviews, Vol. 29, No. 5, 619–634, September 2009

0144-1647 print/1464-5327 online/09/050619-16 © 2009 Taylor & Francis DOI: 10.1080/01441640902829454

Numerical Analysis of the Statistical Properties of Uniform Design in Stated Choice Modelling

PENGFEI LI* AND DONGGEN WANG**

*Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada; **Department of Geography, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong

(Received 5 February 2008; revised 16 September 2008; accepted 15 December 2008)

ABSTRACT Stated choice methods have been widely used in transportation studies since the 1980s. In recent years, much research attention has been paid to developing optimal or efficient designs for choice experiments, such as the so-called D-optimal design, which does not seek orthogonality as the traditional approach does but aims at minimizing the determinant of the variance–covariance matrix of the parameter estimators. This paper examines the statistical properties of an alternative design method, uniform design, which also does not look for orthogonality but aims at maximizing uniformity, a measure that is closely related to model efficiency. We compare the estimation efficiency and prediction efficiency of uniform design with those of the traditional fractional factorial orthogonal design in stated choice modelling. Monte Carlo experiments are used to generate models whose parameters vary in scale. The results show that although uniform design uses far fewer profiles than orthogonal designs do, its prediction and estimation efficiencies in stated choice modelling are comparable to those of orthogonal design.

Introduction

Stated choice methods have become established modelling tools for transportation studies since the 1980s, when Louviere and Woodworth’s (1983) article was published. In a traditional stated choice experiment, respondents are presented with a number of choice sets, each composed of several alternatives or profiles. These profiles are different combinations of attribute levels, which are usually constructed using full or fractional factorial orthogonal designs. Shifted pairs, L2k, 2J Block, all pairs and McFadden’s sampling rule are the commonly used methods for assigning profiles to choice sets (Louviere, 1988).

Fractional factorial orthogonal designs are based on two principles: balance and orthogonality. For balance, the designs require that attribute levels appear equally often; for orthogonality, the designs need to ensure that combinations of attribute levels appear equally often for any two attributes. As a consequence of these two principles, the number of profiles generated by fractional factorial orthogonal designs is usually sizable and increases rapidly with the number of attributes and/or levels. This is particularly significant for cases involving attributes with uneven levels (e.g. for a case of seven attributes, two with 2 levels, three with 3 levels and two with 5 levels, at least 900 profiles will be needed by the fractional factorial orthogonal design). Fractional factorial orthogonal designs may provide no solution for extreme cases. A large number of profiles implies huge costs of model development and substantial demands on respondents. Thus, fractional factorial orthogonal designs impose constraints on modellers regarding the numbers of levels and attributes available for use. One possible way to overcome the drawbacks of the fractional factorial orthogonal designs is the use of blocking. Typically, we can use an attribute to block profiles. However, the number of profiles in each block may still be quite large in some cases. For example, for the case of seven attributes mentioned above, if we block the profiles according to the attribute with five levels, each block will contain 180 profiles, which is still a sizable number.

Correspondence Address: Pengfei Li, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1. Email: [email protected]

While orthogonality is an important consideration for estimating linear models efficiently, this may not be the case for estimating nonlinear models such as multinomial choice models (Sandor and Wedel, 2001, 2002, 2005; Kanninen, 2002; Rose and Bliemer, 2008). The conventional way of stated choice modelling is based on fractional factorial orthogonal designs; thus, it may not support the efficient estimation of parameters in choice models.

Recently, a number of attempts have been made to construct optimal or statistically more efficient stated choice experiments (Kanninen, 2002; Kessels et al., 2006; Rose et al., 2008). It is proposed that choice experiments should be designed to optimize a criterion that best addresses the requirements of modellers. One criterion that is considered important and should be optimized is the efficiency of model estimation. This has led to the development of the so-called D-optimal or D-efficiency design, which has attracted much attention in recent years. D-optimal design seeks to maximize the determinant of the Fisher information matrix or, equivalently, to minimize the determinant of the variance–covariance matrix of the parameter estimators so that the unknown parameters are efficiently estimated. For example, Kanninen (2002) proposed the optimal choice probability design for multinomial logit models with continuous attributes and linear effects only. The design places all attributes but one at their boundary points according to the two-level orthogonal design, and the remaining attribute is determined by choice probabilities, which may depend on prior information about the unknown parameters. A dilemma in applying D-optimal design is that modellers must know the model form of choice probabilities as well as the attributes and their parameters before conducting the experiment. The problem is circular, because the objective of model development is to find the model form and estimate the parameters. Several ways have been suggested to deal with this problem.

The first approach is to assign some prior values to the unknown parameters and construct the D-optimal design under these assigned values; see, for example, Street et al. (2001), Burgess and Street (2003, 2005) and Street and Burgess (2004) for using zero prior values for the parameters, and Huber and Zwerina (1996) for using non-zero prior values for the parameters. The second approach applies the idea of Bayesian optimal design. Instead of assigning fixed values to the unknown parameters, it assumes a prior distribution, usually a normal distribution, for the unknown parameters, based on which the Bayesian D-optimal design is found. Readers are referred to Sandor and Wedel (2001, 2002, 2005) for more details.

Notwithstanding these suggested approaches, modellers still need to have prior information about the model form of choice probabilities. In some cases, we may know that some attributes have significant effects on choice probabilities but may not know exactly what the effects are. For example, we may know that the cost of a transport mode affects mode choice, but we may not know whether the effects are linear or quadratic. As a result, we may wonder whether we should include linear, quadratic or both effects in the model. If quadratic effects are important for the choice probabilities, but we use an efficient design for the model with linear effects only, this design may not be efficient, or may not even be suitable, for studying quadratic effects (Kanninen, 2002). Even when the model form of the choice probabilities is known in advance, if the parameters are misspecified, an efficient design based on the misspecified parameters may not be efficient for the true model; see the examples in Rose and Bliemer (2008). Recently, several researchers have developed the so-called robust efficient designs, which are claimed to be robust to misspecification of the model form and parameters (see Adewale and Wiens, 2009 for reviews). However, to implement this kind of efficient design, the applicant should have good knowledge of the true model, or at least of a small value range that contains the true model. In other words, the applicant should have sufficient prior information about the model form and parameter values.

While the objective of D-optimal design is appealing, it is better considered as a design criterion rather than a design method. Wang and Li (2002) introduced a new experimental design method, uniform design, which is also a kind of optimal design, for it seeks to maximize uniformity, which is closely related to efficiency. This method selects from the S-dimensional space (‘S’ refers to the number of attributes) experimental points or profiles that are uniformly (or evenly) scattered in the space. The number of profiles produced by uniform design is substantially smaller than that produced by the fractional factorial orthogonal design. In addition, uniform design can easily handle the cases of attributes with uneven levels and provide a reasonable number of profiles.

Compared with D-optimal design, uniform design does not require prior information on model form and parameter values. As explained in the next section, the key principle of uniform design is to select experimental points from the problem domain so that the estimation (or prediction) of the model response is as accurate as possible. The more uniform the design is, the more accurate the estimated model is. Hence, uniform design works on the model directly, while D-optimal design deals with the parameter values given the model form and some prior information on parameter values. Uniform design is therefore more robust to model form and parameter misspecifications.

In this paper, we examine the statistical properties of uniform design when it is applied to develop multinomial choice models. Two important statistical properties, estimation efficiency and prediction efficiency, are considered. Firstly, are parameters estimated from uniform design close enough to the true parameters? Secondly, one of the main objectives of using the stated choice model is to predict the choice probabilities (Kessels et al., 2006). So, will the models developed from uniform design provide predictions at an acceptable level of accuracy or, specifically, are the predicted choice probabilities from uniform design close enough to the true choice probabilities? Finally, how does uniform design compare with fractional factorial orthogonal design in terms of these properties?

Wang and Li (2005) made an attempt to answer these questions. However, the study considered only fixed choice set design and fixed models. Other choice set design methods were not investigated. This paper extends the previous attempt to comprehensively examine the statistical properties of uniform design for multinomial choice models. Both uniform and fractional factorial orthogonal designs are used to generate profiles. The commonly used choice set design strategies, including shifted pairs, L2k, 2J Block, all pairs (Bunch et al., 1996) and McFadden’s sampling rule, are applied to put profiles into choice sets. Monte Carlo experiments are used to generate simulated models whose parameters vary in scale. The two measures of statistical properties are computed for and compared between different designs. Because of their wide application, multinomial logit models will be examined. The analysis can be easily extended to other types of choice models. We first develop some notation and asymptotic properties of the maximum likelihood estimates of the unknown parameters and predicted choice probabilities of the multinomial logit model. Some measures for comparing uniform and fractional factorial orthogonal designs are then introduced. Next, we describe the experimental framework and the Monte Carlo experiments. Conclusions and discussions are presented in the last section.

Principles of Uniform Design

Assume that the research question is to establish a model between a response variable $Y$ and $S$ factors or attributes, denoted by $X_1, \ldots, X_S$. The numbers of levels of these attributes are denoted by $q_1, \ldots, q_S$, respectively. The total number of all possible profiles is given by $N = q_1 \times q_2 \times \cdots \times q_S$. The collection of the $N$ profiles is denoted by $C^S$. Selecting $n$ profiles $Z_1, \ldots, Z_n$ from $C^S$ yields $n$ response values $Y_1 = f(Z_1), \ldots, Y_n = f(Z_n)$. We can estimate the expectation of $Y$, $E(Y)$, by averaging the $n$ response values:

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} f(Z_i) \tag{1}$$

The research question here is how to select the $n$ points, that is $Z_1, \ldots, Z_n$ from $C^S$, so that $\bar{Y}$ is the closest to $E(Y)$, or has the highest accuracy. The upper bound of the gap between $\bar{Y}$ and $E(Y)$ is given by the Koksma–Hlawka inequality (Niederreiter, 1992):

$$\Delta_n = \left| E(f(Z)) - \bar{Y} \right| \le V(f)\, D(Z_1, Z_2, \ldots, Z_n) \tag{2}$$

where $V(f)$ is the variation of the integrand $f$ on $C^S$, which we cannot control; $D(Z_1, Z_2, \ldots, Z_n)$ is a discrepancy function, whose value is a function of the design (i.e. the $n$ experimental profiles). According to the upper bound, the smaller the discrepancy, the higher the accuracy of the model is. On the other hand, the more uniformly the experimental points of a design are scattered, the smaller the discrepancy of the design is. Uniform design (UD) is an experimental design method that selects the $n$ experimental profiles, $Z_1, \ldots, Z_n$, so that they are uniformly (or evenly) scattered in $C^S$. Uniformity can be measured by the so-called star $L_p$-discrepancy (Hua and Wang, 1981; Niederreiter, 1992) or the centred $L_2$-discrepancy (Hickernell, 1998). For more details of uniform design, readers are referred to Wang and Li (2002).
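The link between uniformity and discrepancy can be illustrated numerically. The sketch below is not taken from the paper: it assumes the closed-form expression for the squared centred $L_2$-discrepancy that is commonly attributed to Hickernell (1998), and it assumes that level $k$ of a $q$-level attribute is mapped to the centre of its cell in $[0, 1]$. The function names `centred_l2_discrepancy_sq` and `level_to_unit` are hypothetical.

```python
def centred_l2_discrepancy_sq(points):
    """Squared centred L2-discrepancy of points in the unit cube [0, 1]^s.

    Uses the closed form commonly attributed to Hickernell (1998);
    this formula is an assumption, it is not reproduced in the article.
    """
    n = len(points)
    s = len(points[0])

    # Single-sum term: one product over dimensions per design point.
    single = 0.0
    for x in points:
        prod = 1.0
        for xk in x:
            d = abs(xk - 0.5)
            prod *= 1.0 + 0.5 * d - 0.5 * d * d
        single += prod

    # Double-sum term: one product over dimensions per ordered pair of points.
    double = 0.0
    for x in points:
        for y in points:
            prod = 1.0
            for xk, yk in zip(x, y):
                prod *= (1.0 + 0.5 * abs(xk - 0.5) + 0.5 * abs(yk - 0.5)
                         - 0.5 * abs(xk - yk))
            double += prod

    return (13.0 / 12.0) ** s - 2.0 * single / n + double / (n * n)


def level_to_unit(level, q):
    """Map level 0..q-1 to the centre of its cell in [0, 1] (assumed convention)."""
    return (2 * level + 1) / (2 * q)


# Four profiles of two two-level attributes, evenly spread over the square.
spread = [(level_to_unit(a, 2), level_to_unit(b, 2)) for a in (0, 1) for b in (0, 1)]
# Four hypothetical points bunched in one corner.
clustered = [(0.1, 0.1), (0.15, 0.1), (0.1, 0.15), (0.15, 0.15)]

print(centred_l2_discrepancy_sq(spread) < centred_l2_discrepancy_sq(clustered))  # True
```

Under these assumptions the evenly spread design has the smaller discrepancy, which is the behaviour the Koksma–Hlawka bound rewards.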

To compare uniform design with the fractional factorial orthogonal design, assume there are $S$ attributes with $q_1, \ldots, q_S$ levels, respectively. The number of profiles generated by the fractional factorial orthogonal design is a multiple of the lowest common multiple of $q_1 \times q_2, q_1 \times q_3, \ldots, q_1 \times q_S, \ldots, q_{S-1} \times q_S$. The number of profiles generated by uniform design is a multiple of the lowest common multiple of $q_1, \ldots, q_S$. For example, suppose we have a case of six attributes: two with 2 levels, two with 3 levels and two with 7 levels. The lowest common multiple of 4, 6, 9, 14, 21 and 49 is 1764, which is the smallest number of profiles needed by a fractional factorial orthogonal design. The lowest common multiple of 2, 3 and 7 is 42, which is the smallest number of profiles needed by a uniform design. As one may see, the number of profiles generated by uniform designs is substantially smaller than that generated by fractional factorial orthogonal designs.
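The two counting rules can be checked with a few lines of code. This is a minimal sketch of the lowest-common-multiple rules as stated in the text; the function names are hypothetical, and the sketch says nothing about whether a design of that size can actually be constructed.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def lcm_of(values):
    """Lowest common multiple of a non-empty sequence of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)


def min_profiles_orthogonal(levels):
    """LCM of all pairwise products q_i * q_j with i < j."""
    return lcm_of([a * b for a, b in combinations(levels, 2)])


def min_profiles_uniform(levels):
    """LCM of the numbers of levels q_1, ..., q_S."""
    return lcm_of(levels)


six_attributes = [2, 2, 3, 3, 7, 7]
print(min_profiles_orthogonal(six_attributes))  # 1764
print(min_profiles_uniform(six_attributes))     # 42

# The seven-attribute case from the Introduction.
seven_attributes = [2, 2, 3, 3, 3, 5, 5]
print(min_profiles_orthogonal(seven_attributes))  # 900
```

The same rule reproduces the 900 profiles quoted in the Introduction for seven attributes with 2, 3 and 5 levels.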

Notations and Definition of Measures

Notations

Consider a design consisting of $Q$ choice sets, indexed by $q = 1, \ldots, Q$, and let $J_q$ be the number of choice alternatives in choice set $q$ (denoted by $C_q$). Let $z_{jq}$ denote the vector of explanatory variables for the $j$th alternative in choice set $q$, $\beta$ the parameter vector with $p$ parameters and $M$ the number of respondents for a given choice set. For the multinomial logit model, the probability of choosing the $j$th alternative in $C_q$ is given by

$$P_{jC_q}(\beta) = \frac{\exp(z_{jq}'\beta)}{\sum_{i \in C_q} \exp(z_{iq}'\beta)} \tag{3}$$

for $j = 1, \ldots, J_q$. The maximum likelihood estimator of $\beta$ (denoted by $\hat{\beta}$) has the following properties: $\hat{\beta}$ is consistent and asymptotically normal, and $\sqrt{QM}(\hat{\beta} - \beta)$ is approximately multivariate normal with mean 0 and covariance matrix $I(\beta)^{-1}$ as $QM \to \infty$, where

$$I(\beta) = \frac{1}{Q} \sum_{q=1}^{Q} \sum_{j \in C_q} P_{jC_q}(\beta)\,(z_{jq} - \bar{z}_q)(z_{jq} - \bar{z}_q)' \tag{4}$$

and

$$\bar{z}_q = \sum_{j \in C_q} z_{jq}\, P_{jC_q}(\beta) \tag{5}$$

Matrix $I(\beta)$ is called the ‘normalized information matrix’ for estimated parameters (Bunch et al., 1996).
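Equations (3) to (5) translate almost line by line into code. The following is a minimal sketch in plain Python, assuming each choice set is given as a list of attribute vectors of length $p$; the function names and the toy design are hypothetical and are not taken from the paper.

```python
from math import exp


def choice_probabilities(choice_set, beta):
    """Equation (3): logit probabilities for one choice set (a list of z vectors)."""
    utilities = [sum(z_k * b_k for z_k, b_k in zip(z, beta)) for z in choice_set]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def information_matrix(design, beta):
    """Equations (4)-(5): normalized information matrix of a list of choice sets."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for choice_set in design:
        probs = choice_probabilities(choice_set, beta)
        # Probability-weighted mean attribute vector, Equation (5).
        z_bar = [sum(pr * z[k] for pr, z in zip(probs, choice_set)) for k in range(p)]
        for pr, z in zip(probs, choice_set):
            dev = [z[k] - z_bar[k] for k in range(p)]
            for a in range(p):
                for b in range(p):
                    info[a][b] += pr * dev[a] * dev[b]
    n_sets = len(design)
    return [[value / n_sets for value in row] for row in info]


# Hypothetical toy design: two paired choice sets over two attributes.
toy_design = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)]]
print(information_matrix(toy_design, beta=(0.0, 0.0)))
```

With all parameters set to zero, both alternatives in each pair are equally likely and the toy design yields a diagonal information matrix, which is the usual signature of an orthogonal arrangement.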


Let $T = \{z_{jT}, j = 1, \ldots, J_T\}$ be the test choice set with $J_T$ alternatives and $z_{jT}$ be the vector of explanatory variables for the $j$th alternative in the test choice set $T$. Let $P_{jT}(\beta)$ be the probability of choosing the $j$th alternative in the test set $T$, that is,

$$P_{jT}(\beta) = \frac{\exp(z_{jT}'\beta)}{\sum_{i \in T} \exp(z_{iT}'\beta)}, \quad j = 1, \ldots, J_T \tag{6}$$

Let $P_T(\beta)$ be a $(J_T - 1) \times 1$ vector with the $j$th element $P_{jT}(\beta)$ for $j = 1, \ldots, J_T - 1$. Replacing $\beta$ by $\hat{\beta}$, we get the predicted choice probability vector $P_T(\hat{\beta})$. Applying the delta method based on a truncated Taylor series expansion, we find that $\sqrt{QM}\{P_T(\hat{\beta}) - P_T(\beta)\}$ is asymptotically normal with mean 0 and covariance matrix $COV(\beta, T)$, where $COV(\beta, T)$ is a $(J_T - 1) \times (J_T - 1)$ matrix with the $(i, j)$th element being

$$COV_{ij}(\beta, T) = P_{iT}(\beta)\, P_{jT}(\beta)\, (z_{iT} - \bar{z}_T)'\, I(\beta)^{-1}\, (z_{jT} - \bar{z}_T) \tag{7}$$

where

$$\bar{z}_T = \sum_{j \in T} z_{jT}\, P_{jT}(\beta) \tag{8}$$

As above, $\{COV(\beta, T)\}^{-1}$ is called the ‘normalized information matrix’ for predicted choice probabilities. Since $P_{J_T T}(\beta)$ can be determined by the first $J_T - 1$ predicted probabilities, the covariance matrix will be singular if we include all $J_T$ choice probabilities in $P_T(\hat{\beta})$. Furthermore, if we can predict $P_T(\beta)$ accurately, the predictors of $P_{J_T T}(\beta)$ will also be accurate.

Definition of Measures

One popular criterion for measuring the goodness of stated choice designs is D-error. The design that minimizes the value of D-error is called the D-optimal design. D-error evaluates the estimation efficiency of different designs. The general sense is that the smaller the D-error value, the smaller the volume of the elliptical confidence region of $\beta$ is and the more efficient the estimators are. Mathematically, D-error is defined as:

$$\text{D-error} = \det\left(I(\beta)^{-1}\right)^{1/p} \tag{9}$$

Another criterion for measuring the estimation efficiency is the A-criterion, which ignores the correlation among the elements of $\hat{\beta}$ and may result in a different ordering of designs for different types of coding (Kessels et al., 2006). Because of this and the popularity of the D-error criterion, we will use D-error to measure the estimation efficiency in this study.

As discussed in Kessels et al. (2006), one of the major objectives of choice experiments is to facilitate the development of choice models that can make correct response predictions. Hence the ability to produce accurate probability predictions is also important for a good design. In real applications, stated choice models may need to be calibrated by real market data before they are applied to predict market shares of choice alternatives. In the simulations of our study, we do not have real market information and thus cannot calibrate the model by real market information. This is equivalent to assuming that each choice alternative has an equal market share. Note that the equal share assumption is, in a sense, the underlying assumption of estimating multinomial logit models without prior information. We argue that if the model developed from a design cannot predict accurately in this ideal situation, neither can it provide accurate predictions when the model is calibrated by real market data. A similar approach was adopted by Kessels et al. (2006).
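As a numerical illustration of the D-error in Equation (9), the sketch below evaluates it for a given information matrix, using the identity $\det(I^{-1}) = 1/\det(I)$ so that no matrix inverse is needed. The helper names and the two example matrices are hypothetical; the determinant routine is ordinary Gaussian elimination and is not claimed to be the authors' implementation.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_error(info):
    """Equation (9): det(I^{-1})^(1/p), computed as det(I)^(-1/p)."""
    p = len(info)
    det = determinant(info)
    if det <= 0.0:
        raise ValueError("information matrix is singular; D-error is undefined")
    return det ** (-1.0 / p)


# Two hypothetical information matrices for a two-parameter model.
info_a = [[4.0, 0.0], [0.0, 1.0]]
info_b = [[1.0, 0.0], [0.0, 1.0]]
print(d_error(info_a), d_error(info_b))  # 0.5 1.0
```

The design carrying more information (`info_a`) has the smaller D-error, and the ratio of the two values is the kind of quantity compared across designs later in the paper.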

In order to evaluate prediction accuracy, a test choice set consisting of four alternatives is randomly generated. Details about the choice set are described in Section ‘Generating the Test Choice Set’. Similar to estimation efficiency, the determinant of $COV(\beta, T)$ is used to measure the errors around the predicted probabilities. That is,

$$\text{P-error} = \det\left(COV(\beta, T)\right)^{1/(J_T - 1)} \tag{10}$$

The smaller the P-error value, the more accurate the predicted choice probabilities are. We refer to Kessels et al. (2006) for different types of criteria for prediction efficiency.

Simulation Study

Bunch and Bastell (1989) studied the performance of different estimators by considering seven different designs and then generalizing the result. We adopt this approach to compare the performance of uniform and fractional factorial designs.

Choice Set Design

Let us consider a design task that involves three continuous two-level attributes and two continuous three-level attributes. Suppose we are interested in the linear effects of all five attributes and also the quadratic effects of the two three-level attributes. By applying a fractional factorial orthogonal design, a total of 36 profiles are needed. Using a uniform design, one may generate 12 or 18 profiles. In order to apply the L2k strategy for designing choice sets, we select the 18-profile design, that is, $U(18, 2^3 3^2)$, which is listed in Appendix 1. The profiles generated by the two design methods may be used to form choice sets by the following six strategies: all pairs, 2J Block, BIBD (Dey, 1986), the L2k strategy, foldover and shifted designs. For details of these strategies, readers are referred to Bunch et al. (1996). Since the foldover strategy may not guarantee the estimation of the utility model and the performance of BIBD is similar to that of 2J Block, we exclude these two strategies and consider only shifted pairs, L2k, all pairs and 2J Block. Table 1 compares the number and average size of choice sets produced by uniform and fractional factorial designs using the four strategies of designing choice sets. Here, the average size of choice sets is defined as the average number of alternatives across the $Q$ choice sets. For example, suppose there are two choice sets, $C_1$ and $C_2$, with two and four alternatives, respectively. Then the average size of the choice sets is (2 + 4)/2 = 3.


Table 1 shows that the number of choice sets produced by fractional factorial orthogonal design is considerably larger than that produced by uniform design. In addition to the eight designs generated by the four strategies, we add one more choice set design, which is generated simply by sampling rules. Therefore, there are nine choice set designs in total in our experiments. Table 2 lists these designs.
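Two of the quantities reported in Table 1 can be verified by hand. The sketch below assumes that the all-pairs strategy forms one choice set for every unordered pair of profiles, which agrees with the 630 and 153 choice sets listed for 36 and 18 profiles; the function names are hypothetical.

```python
def all_pairs_count(n_profiles):
    """Number of unordered pairs of profiles, n(n - 1)/2."""
    return n_profiles * (n_profiles - 1) // 2


def average_choice_set_size(choice_sets):
    """Average number of alternatives across the choice sets."""
    return sum(len(choice_set) for choice_set in choice_sets) / len(choice_sets)


print(all_pairs_count(36), all_pairs_count(18))  # 630 153
# The worked example from the text: two choice sets with two and four alternatives.
print(average_choice_set_size([["a", "b"], ["a", "b", "c", "d"]]))  # 3.0
```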

Generating the Test Choice Set

To measure the prediction efficiency, or the P-error, of different design strategies, we randomly generate a test choice set with four alternatives as follows. Denote the two levels of each two-level attribute by 0 and 1 and the three levels of each three-level attribute by 0, 1 and 2. For each alternative, we randomly generate five values as the attribute values for the alternative, with three from the uniform distribution over the interval [0,1] and two from the uniform distribution over the interval [0,2]. Since we could rescale the continuous variables to the interval [0,1] or [0,2], this generating process is equivalent to sampling uniformly between the lower and upper bounds of a continuous attribute.
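The generating process above, together with Equations (6) to (8) and (10), can be sketched as follows. Several simplifications are assumed and should be read as such: only the five linear attribute values enter the explanatory vector (the quadratic terms are omitted for brevity), $I(\beta)^{-1}$ is passed in as an input rather than computed from a design, and an identity matrix is used as a placeholder for it. All function names, the seed and the parameter values are hypothetical.

```python
import random
from math import exp


def generate_test_choice_set(rng, n_alternatives=4):
    """Per alternative: three attributes from U[0, 1] and two from U[0, 2]."""
    return [
        [rng.uniform(0.0, 1.0) for _ in range(3)]
        + [rng.uniform(0.0, 2.0) for _ in range(2)]
        for _ in range(n_alternatives)
    ]


def determinant(m):
    """Laplace expansion; adequate for the small (J_T - 1) x (J_T - 1) matrix here."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * determinant([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def p_error(test_set, beta, info_inv):
    """Equations (6)-(8) and (10); info_inv stands for I(beta)^{-1} (an input here)."""
    p = len(beta)
    utilities = [sum(z[k] * beta[k] for k in range(p)) for z in test_set]
    top = max(utilities)
    weights = [exp(u - top) for u in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]                      # Equation (6)
    z_bar = [sum(pr * z[k] for pr, z in zip(probs, test_set))  # Equation (8)
             for k in range(p)]
    devs = [[z[k] - z_bar[k] for k in range(p)] for z in test_set]

    def quad(u, v):
        """Quadratic form u' * info_inv * v."""
        return sum(u[a] * info_inv[a][b] * v[b] for a in range(p) for b in range(p))

    m = len(test_set) - 1  # the last alternative is dropped to avoid singularity
    cov = [[probs[i] * probs[j] * quad(devs[i], devs[j]) for j in range(m)]
           for i in range(m)]                                  # Equation (7)
    return max(determinant(cov), 0.0) ** (1.0 / m)             # Equation (10)


rng = random.Random(2009)  # arbitrary fixed seed for reproducibility
test_set = generate_test_choice_set(rng)
identity = [[1.0 if a == b else 0.0 for b in range(5)] for a in range(5)]  # placeholder
print(p_error(test_set, beta=[0.5] * 5, info_inv=identity))
```

In a full comparison, `info_inv` would be the inverse of the normalized information matrix of each of the nine designs, so that the same test choice set produces one P-error per design.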

Generating Models for Simulation

For the five attributes mentioned above, suppose only the linear effects of these five attributes are important for choice probabilities, which means the corresponding parameters are not equal to zero. As mentioned before, we are also interested in the quadratic effects of the two three-level attributes, so their quadratic effects are also included in the model. In total, we consider seven effects: five linear effects and two quadratic effects. For the signs of the parameters, we assume that the utilities increase monotonically with the level of each attribute. That is to say, the signs of all parameters are assumed positive.

Table 1. Candidate designs for comparison study

| Profile design | Choice set strategy | Number of choice sets | Average size of choice sets |
| --- | --- | --- | --- |
| Orthogonal design | Shifted pairs | 36 | 2 |
| Orthogonal design | 2J Block | 40 | 18 |
| Orthogonal design | L2k | 36 | 2 |
| Orthogonal design | All pairs | 630 | 2 |
| Uniform design | Shifted pairs | 18 | 2 |
| Uniform design | 2J Block | 20 | 9 |
| Uniform design | L2k | 18 | 2 |
| Uniform design | All pairs | 153 | 2 |

Table 2. List of choice set designs

| Design | Profile generation method | Choice set strategy |
| --- | --- | --- |
| D1 | Orthogonal design | Shifted pairs |
| D2 | Orthogonal design | L2k |
| D3 | Orthogonal design | All pairs |
| D4 | Orthogonal design | 2J Block |
| D5 | Uniform design | Shifted pairs |
| D6 | Uniform design | L2k |
| D7 | Uniform design | All pairs |
| D8 | Uniform design | 2J Block |
| D9 | Orthogonal design | Sampling rule |

The form of the true model affects the two measures of statistical properties:D-error and P-error. Different models will generate different results. In order toobtain reliable and general findings, we use Monte Carlo experiments to generatedifferent models (similar approach is adopted by Bunch and Bastell, 1989, andBunch et al., 1996).

In order to generate models for simulation, we define the scale of the parameters as

$$\mathrm{Scale}(\beta) = \sqrt{\sum_{k=1}^{p} \beta_k^2}.$$

By varying Scale(β), we may generate different models.

Chapman and Staelin (1982) discussed the effect of Scale(β) and its relationship to the interpretation of the multinomial logit model, which is referred to as the ‘Scale Theorem’ in their article. The theorem states that the larger the scale, the more extreme the probabilities are. We adopt their approach of standardizing each effect to have a mean of 0 and a variance of 1.

The true values of the parameters for the linear effects are generated by drawing five independent values from a uniform distribution with a range of 0.2–1 and then rescaling them so that the required Scale(β) is satisfied. We select three scale values: 20% Exp Var, 45% Exp Var and 70% Exp Var, where

$$\text{Exp Var} = \frac{\mathrm{Scale}(\beta)^2}{\mathrm{Scale}(\beta)^2 + \pi^2/6}. \qquad (11)$$

In these three situations, the Scale(β) values are about 0.64, 1.16 and 1.96, respectively.

Simulation Procedure

To account for randomness, we generate 100 models for each scale value. Since we are interested in the ratios of the D-errors and the ratios of the P-errors for different design strategies, the value of M, the number of respondents, is not really important. Without loss of generality, we assume that it is equal to 1.

To summarize, the simulation procedure is as follows:

● Step 1: Generate the choice sets for the nine design strategies, according to the description in Section ‘Choice Set Design’ and Table 2.

● Step 2: Generate the test choice sets with four alternatives as described in Section ‘Generating the Test Choice Set’.

● Step 3: For each of the three Scale(β) values (0.64, 1.16 and 1.96), generate 100 models as described in Section ‘Generating Models for Simulation’, and calculate the D-error and P-error for each of the nine designs.

● Step 4: Summarize the results for each given Scale(β) and each design strategy based on the 100 simulated models.
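The model-generation part of Step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Scale(β) is taken to be the Euclidean norm of the parameter vector, only the five linear effects are drawn (the treatment of the two quadratic effects is not reproduced here), and the function names and the random seed are hypothetical choices, not part of the original study.

```python
import math
import random

PI2_OVER_6 = math.pi ** 2 / 6  # variance of the standard Gumbel error term


def scale_from_exp_var(exp_var):
    """Invert Exp Var = S^2 / (S^2 + pi^2/6) to obtain the scale S."""
    return math.sqrt(exp_var * PI2_OVER_6 / (1.0 - exp_var))


def scale(beta):
    """Scale(beta), assumed here to be the Euclidean norm of beta."""
    return math.sqrt(sum(b * b for b in beta))


def draw_model(target_scale, n_effects=5, rng=random):
    """Draw parameters from U(0.2, 1) and rescale them to the target scale."""
    raw = [rng.uniform(0.2, 1.0) for _ in range(n_effects)]
    factor = target_scale / scale(raw)
    return [b * factor for b in raw]


rng = random.Random(2009)  # arbitrary seed, used only for reproducibility
models = {}
for exp_var in (0.20, 0.45, 0.70):
    s = scale_from_exp_var(exp_var)
    # 100 simulated "true" models per scale value, as in Step 3
    models[exp_var] = [draw_model(s, rng=rng) for _ in range(100)]
    print(f"Exp Var = {exp_var:.2f} -> Scale(beta) = {s:.2f}")
```

Inverting equation (11) in this way gives scale values that round to 0.64, 1.16 and 1.96, in agreement with the values quoted in the text. Because rescaling multiplies positive draws by a positive factor, all generated parameters remain positive, as the sign assumption requires.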


Discussion of Results

Based on the values of the two measures for the nine designs, we may now compare the statistical properties of uniform designs with those of fractional factorial orthogonal designs in terms of estimation efficiency (D-error) and prediction efficiency (P-error).
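For readers who want to reproduce this kind of comparison, the following Python sketch computes a D-error for a multinomial logit design. It assumes the common definition used in the choice-design literature (the determinant of the information matrix raised to the power −1/K, for a single respondent); the normalization adopted earlier in this article may differ, and all function names are hypothetical.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one choice set."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def information_matrix(choice_sets, beta):
    """Sum over choice sets of X'(diag(p) - pp')X for one respondent."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for X in choice_sets:  # X holds one coded attribute row per alternative
        p = mnl_probs([sum(b * x for b, x in zip(beta, row)) for row in X])
        xbar = [sum(pj * row[a] for pj, row in zip(p, X)) for a in range(k)]
        for pj, row in zip(p, X):
            d = [row[a] - xbar[a] for a in range(k)]
            for a in range(k):
                for b in range(k):
                    info[a][b] += pj * d[a] * d[b]
    return info


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det
        det *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return det


def d_error(choice_sets, beta):
    """det(information)^(-1/K); smaller values mean more precise estimates."""
    det = determinant(information_matrix(choice_sets, beta))
    if det <= 0.0:
        return math.inf
    return det ** (-1.0 / len(beta))


# Hypothetical toy design: two paired choice sets with two effects-coded attributes.
toy_design = [
    [[-1, -1], [1, 1]],
    [[-1, 1], [1, -1]],
]
print(d_error(toy_design, [0.0, 0.0]))
print(d_error(toy_design, [1.0, 1.0]))
```

In the toy design, the D-error is larger for the non-zero parameter vector than for the zero vector, because the first choice set becomes unbalanced in utility and therefore contributes less information.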

Estimation Efficiency

As explained earlier, for each of the nine designs and the three scale values, a total of 100 D-error values are produced. To evaluate the estimation efficiency of uniform design relative to fractional factorial orthogonal design for the four strategies of designing choice sets, and that of uniform design relative to sampling rule, we may analyse these data in two ways. Firstly, we may calculate the following eight ratios of D-error values: D-error(D1)/D-error(D5), D-error(D2)/D-error(D6), D-error(D3)/D-error(D7), D-error(D4)/D-error(D8), D-error(D5)/D-error(D9), D-error(D6)/D-error(D9), D-error(D7)/D-error(D9) and D-error(D8)/D-error(D9). The 25% quantile, the median and the 75% quantile of the eight ratios for each scale value are listed in Tables 3 and 4. Secondly, we can use the idea of pairwise comparison to test whether the D-error values of fractional factorial orthogonal design, uniform design and sampling rule are the same for a given choice set strategy and scale value. The test statistic has a t-distribution with 99 degrees of freedom. The results are listed in Table 5.
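The two summaries described above can be illustrated with a short Python sketch. It assumes that the pairwise comparison is a paired t-test on the 100 differences of D-error values, which is consistent with the 99 degrees of freedom but is not spelled out in the text. The input values are random placeholders rather than the simulation results of this study, and the quantile rule of the standard library may differ from the one used to produce the tables.

```python
import math
import random
import statistics


def ratio_quartiles(a, b):
    """25% quantile, median and 75% quantile of the element-wise ratios a/b."""
    ratios = [x / y for x, y in zip(a, b)]
    q1, q2, q3 = statistics.quantiles(ratios, n=4)
    return q1, q2, q3


def paired_t(a, b):
    """Paired t statistic for H0: mean(a - b) = 0, with len(a) - 1 d.f."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Placeholder D-error values for 100 simulated models (not the paper's data).
rng = random.Random(0)
d_err_orth = [rng.uniform(0.9, 1.1) for _ in range(100)]
d_err_unif = [d * rng.uniform(1.00, 1.04) for d in d_err_orth]

print(ratio_quartiles(d_err_orth, d_err_unif))
t = paired_t(d_err_orth, d_err_unif)
# 1.984 is the two-sided 5% critical value t_0.975(99) quoted under Table 5.
print(t, abs(t) > 1.984)
```

With this sign convention, a negative statistic indicates that the first design in the pair has the smaller D-error on average, which matches the pattern of ratios below 1 paired with negative statistics in Tables 3 and 5.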

From Tables 3, 4 and 5, we can see that for the design strategy of all pairs, uniform design has higher estimation efficiency than fractional factorial orthogonal design. All the test statistics are significant at the 0.05 significance level and the medians of the ratios are larger than 1. For shifted pairs, the medians of the

Table 3. Comparing the relative estimation efficiency of uniform design to orthogonal design

                           Scale(β) =    Scale(β) =    Scale(β) =
                           20% Exp Var   45% Exp Var   70% Exp Var
D-error(D1)/D-error(D5)
  25% Quantile             0.975         0.937         0.897
  Median                   0.983         0.988         1.024
  75% Quantile             0.989         1.030         1.129
D-error(D2)/D-error(D6)
  25% Quantile             0.931         0.894         0.833
  Median                   0.937         0.929         0.925
  75% Quantile             0.947         0.957         1.049
D-error(D3)/D-error(D7)
  25% Quantile             1.011         0.994         0.989
  Median                   1.015         1.017         1.037
  75% Quantile             1.019         1.032         1.069
D-error(D4)/D-error(D8)
  25% Quantile             0.943         0.894         0.810
  Median                   0.952         0.907         0.855
  75% Quantile             0.960         0.929         0.916


relative estimation efficiency of uniform design to fractional factorial orthogonal design are larger than 0.98 and the test statistic is not significant for the last scale value, which suggests that the estimation efficiency of uniform design is not significantly different from that of fractional factorial orthogonal design. The estimation efficiency of fractional factorial orthogonal design is slightly higher than that of uniform design for L2k and 2J Block. For L2k, the estimation efficiency of uniform design is on average 7% less than that of fractional factorial orthogonal design. In the case of 2J Block, the advantage of the fractional factorial orthogonal design is more obvious, especially for large scale values. The relative estimation efficiency of uniform design is 5%, 10% and 15% lower than that of fractional factorial orthogonal design for the three scale values, respectively. Why do the relative estimation efficiencies decrease as the scale values increase? Note that the

Table 4. Comparing the relative estimation efficiency of sampling rule to uniform design

                           Scale(β) =    Scale(β) =    Scale(β) =
                           20% Exp Var   45% Exp Var   70% Exp Var
D-error(D5)/D-error(D9)
  25% Quantile             0.575         0.622         0.727
  Median                   0.583         0.670         0.783
  75% Quantile             0.592         0.698         0.872
D-error(D6)/D-error(D9)
  25% Quantile             0.934         0.925         0.973
  Median                   0.951         0.983         1.073
  75% Quantile             0.964         1.027         1.167
D-error(D7)/D-error(D9)
  25% Quantile             0.838         0.820         0.828
  Median                   0.847         0.851         0.880
  75% Quantile             0.856         0.886         0.928
D-error(D8)/D-error(D9)
  25% Quantile             0.428         0.405         0.448
  Median                   0.432         0.417         0.477
  75% Quantile             0.440         0.429         0.502

Table 5. Testing the equivalence of D-error values (a)

Null hypothesis              Scale(β) =    Scale(β) =    Scale(β) =
                             20% Exp Var   45% Exp Var   70% Exp Var
D-error(D1) = D-error(D5)    −15.678       −3.942        0.464
D-error(D2) = D-error(D6)    −60.500       −13.816       −4.061
D-error(D3) = D-error(D7)    22.967        5.255         5.480
D-error(D4) = D-error(D8)    −43.663       −29.255       −14.900
D-error(D5) = D-error(D9)    −198.712      −60.095       −17.607
D-error(D6) = D-error(D9)    −24.951       −4.081        4.071
D-error(D7) = D-error(D9)    −88.978       −33.120       −14.914
D-error(D8) = D-error(D9)    −349.538      −132.742      −63.504

(a) t_0.975(99) = 1.984 and t_0.95(99) = 1.660.


average size of the choice sets generated by fractional factorial orthogonal design is twice that of the choice sets generated by uniform design. As a result, the utilities under fractional factorial orthogonal design are more balanced than those under uniform design, and far fewer extreme probabilities arise under fractional factorial orthogonal design than under uniform design as the scale value increases. Consequently, fractional factorial orthogonal design becomes more efficient than uniform design as the scale value increases (Huber and Zwerina, 1996).
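The link between the scale value and extreme probabilities can be illustrated with a small Python sketch. The base utilities below are hypothetical and stand for the utilities of three alternatives under a unit-scale parameter vector; multiplying them by the three scale values used in this study mimics the rescaling of β.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one choice set."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


base_utilities = [0.5, 0.0, -0.5]  # hypothetical utilities at unit scale

largest = []
for scale in (0.64, 1.16, 1.96):
    probs = mnl_probs([scale * u for u in base_utilities])
    largest.append(max(probs))
    print(scale, [round(p, 3) for p in probs])
```

The largest choice probability grows with the scale value, so the same choice set becomes less balanced in utility and carries less information about the parameters.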

As for the comparison between uniform design and sampling rule, if the L2k strategy is used, uniform design has slightly higher estimation efficiency than sampling rule for small scale values, and conversely for large scale values. If the other three strategies are used, sampling rule is much worse than uniform design, especially when the scale value is small.

The estimation efficiencies of the four strategies decrease in the following sequence: 2J Block > Shifted pairs > All pairs > L2k, if uniform design is used to construct profiles. Similar findings for fractional factorial orthogonal design were reported by Bunch et al. (1996).

Prediction Efficiency

In a similar way, we may compare the prediction efficiency (P-error) by calculating eight ratios of P-error values. The first four ratios, which compare fractional factorial orthogonal design with uniform design, are shown in Table 6, and the last four, which compare uniform design with sampling rule, are given in Table 7. To account for randomness, the 25% quantile, the median and the 75% quantile are tabulated. The values of the test statistics for testing the null hypothesis that the prediction efficiencies are equal are presented in Table 8.

Table 6 shows that uniform design has slightly stronger prediction ability than fractional factorial orthogonal design for the shifted pairs and all pairs strategies. The relative prediction efficiency of uniform design is on average 2% higher for

Table 6. Comparing the relative prediction efficiency of uniform design to orthogonal design

                           Scale(β) =    Scale(β) =    Scale(β) =
                           20% Exp Var   45% Exp Var   70% Exp Var
P-error(D1)/P-error(D5)
  25% Quantile             1.007         0.948         0.900
  Median                   1.022         1.020         1.030
  75% Quantile             1.040         1.076         1.209
P-error(D2)/P-error(D6)
  25% Quantile             0.970         0.944         0.890
  Median                   0.989         1.012         1.012
  75% Quantile             1.009         1.064         1.184
P-error(D3)/P-error(D7)
  25% Quantile             1.059         1.052         1.054
  Median                   1.065         1.083         1.118
  75% Quantile             1.071         1.110         1.172
P-error(D4)/P-error(D8)
  25% Quantile             0.835         0.675         0.550
  Median                   0.848         0.727         0.622
  75% Quantile             0.871         0.767         0.717


shifted pairs and more than 6% higher for all pairs. Furthermore, the test statistics for shifted pairs and all pairs are significant for all scale values at the 0.05 significance level, which means that the predicted choice probabilities obtained from uniform design are more accurate than those from fractional factorial orthogonal design. For L2k, uniform design has approximately the same prediction efficiency as fractional factorial orthogonal design. The relative prediction efficiency differs by only about 1%, both at the small scale value (20% Exp Var) and at the large scale values (45% Exp Var or more). For 2J Block, fractional factorial orthogonal design is more efficient than uniform design, especially for large scale values. The prediction efficiencies of uniform design are about 15%, 26% and 38% lower for the three scale values, respectively. The larger size of the choice sets generated by fractional factorial orthogonal design may again be the major reason for this large difference.

Table 7. Comparing the relative prediction efficiency of sampling rule to uniform design

                           Scale(β) =    Scale(β) =    Scale(β) =
                           20% Exp Var   45% Exp Var   70% Exp Var
P-error(D5)/P-error(D9)
  25% Quantile             0.406         0.435         0.523
  Median                   0.415         0.474         0.588
  75% Quantile             0.428         0.527         0.688
P-error(D6)/P-error(D9)
  25% Quantile             0.633         0.607         0.635
  Median                   0.646         0.645         0.710
  75% Quantile             0.659         0.683         0.811
P-error(D7)/P-error(D9)
  25% Quantile             0.582         0.558         0.561
  Median                   0.585         0.581         0.616
  75% Quantile             0.593         0.598         0.651
P-error(D8)/P-error(D9)
  25% Quantile             0.338         0.368         0.480
  Median                   0.350         0.385         0.530
  75% Quantile             0.359         0.404         0.572

Table 8. Testing the equivalence of P-error values (a)

Null hypothesis              Scale(β) =    Scale(β) =    Scale(β) =
                             20% Exp Var   45% Exp Var   70% Exp Var
P-error(D1) = P-error(D5)    9.333         1.839         3.116
P-error(D2) = P-error(D6)    −4.439        0.750         0.434
P-error(D3) = P-error(D7)    71.597        23.232        15.617
P-error(D4) = P-error(D8)    −45.483       −41.176       −22.071
P-error(D5) = P-error(D9)    −153.309      −46.971       −21.979
P-error(D6) = P-error(D9)    −146.832      −50.446       −19.164
P-error(D7) = P-error(D9)    −158.432      −56.222       −27.103
P-error(D8) = P-error(D9)    −185.208      −65.213       −32.877

(a) t_0.975(99) = 1.984 and t_0.95(99) = 1.660.
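As a reading aid, the following Python sketch compares the first four rows of test statistics in Table 8 with the two critical values given in the table footnote. The dictionary labels and the helper name are hypothetical; the numbers are copied from the table.

```python
T_CRIT_0975 = 1.984  # t_0.975(99), from the table footnote
T_CRIT_095 = 1.660   # t_0.95(99), from the table footnote

# Test statistics from Table 8 for 20%, 45% and 70% Exp Var, in that order.
TABLE8 = {
    "D1 vs D5 (shifted pairs)": (9.333, 1.839, 3.116),
    "D2 vs D6 (L2k)": (-4.439, 0.750, 0.434),
    "D3 vs D7 (all pairs)": (71.597, 23.232, 15.617),
    "D4 vs D8 (2J Block)": (-45.483, -41.176, -22.071),
}


def exceeds(t, critical_value):
    """True if the statistic exceeds the critical value in absolute terms."""
    return abs(t) > critical_value


for label, stats in TABLE8.items():
    against_0975 = [exceeds(t, T_CRIT_0975) for t in stats]
    against_095 = [exceeds(t, T_CRIT_095) for t in stats]
    print(f"{label}: vs 1.984 {against_0975}, vs 1.660 {against_095}")
```

For shifted pairs at 45% Exp Var, the statistic 1.839 lies between the two critical values, so whether it counts as significant at the 0.05 level depends on which of the two footnoted critical values is applied.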


For sampling rule and uniform design, the prediction efficiency of uniform design is much higher than that of sampling rule, as can easily be seen in Table 7. Similar to the estimation efficiency, the prediction efficiencies of the four strategies under uniform design follow this order: 2J Block > Shifted pairs > All pairs > L2k.

Conclusions

The uniform and fractional factorial orthogonal designs are two experimental designs for generating profiles. The advantage of uniform design over fractional factorial orthogonal design is that uniform design generates substantially fewer profiles, particularly for attributes with unequal numbers of levels. In this research, we compared uniform design with fractional factorial orthogonal design on two statistical properties when the shifted pairs, L2k, all pairs and 2J Block strategies are used to generate choice sets. From the comparison, the following conclusions are drawn.

In terms of estimation efficiency (D-error), when the shifted pairs and L2k strategies are used to construct choice sets, the fractional factorial orthogonal design is slightly better than uniform design. The relative estimation efficiency decreases by no more than 2% and 7.5%, respectively, for shifted pairs and L2k. The prediction efficiency of uniform design is 2% higher than that of the fractional factorial orthogonal design if the shifted pairs strategy is used, and the prediction efficiencies are approximately the same for uniform and fractional factorial orthogonal designs if the L2k strategy is used. For the all pairs strategy, uniform design seems more efficient than the fractional factorial orthogonal design in both estimation and prediction. Perhaps owing to the larger average size of its choice sets, the fractional factorial orthogonal design performs better in both estimation and prediction efficiency when the 2J Block strategy is used. However, when the scale value of the parameters is small, for example, less than 20% Exp Var, the performance of uniform design is also acceptable for the 2J Block strategy (about 95% for estimation efficiency and 85% for prediction efficiency). Therefore, for the first three choice set strategies, uniform design is a good choice if the number of profiles required by fractional factorial orthogonal design is not practical. When the scale value is smaller than 20% Exp Var, uniform design is also acceptable for the 2J Block strategy, since uniform design reduces not only the number of choice sets but also the average choice set size.

The findings of this study show that although uniform design uses substantially fewer profiles than orthogonal design does, their estimation and prediction efficiencies in developing multinomial logit models are comparable. This suggests that uniform design is a good alternative to, if not a replacement for, orthogonal design. Uniform design may not maximize efficiency as D-optimal design does; it does not, however, require prior information on model form and parameter values. In other words, uniform design does not suffer from the circularity problem of D-optimal design. In this sense, uniform design may have some advantages over D-optimal design as well. Nevertheless, we agree that when sufficient prior information is available about the model form of the choice probabilities and the parameters in the model, D-optimal design is the best choice. When information about the model form or the parameter values is not available, however, we can first use uniform design to derive prior information on the model form and parameter values. After that, a follow-up stated choice design can be chosen such that the combined design is D-optimal. We shall explore this sequential idea in future studies.


Acknowledgements

This research is sponsored by a research grant from the Hong Kong Research Grant Council (RGC) (HKBU2441/05H) and a start-up grant of the University of Alberta.

References

Adewale, A. and Wiens, D. (2009) Robust designs for misspecified logistic models, Journal of Statistical Planning and Inference, 139, pp. 3–15.

Bunch, D. S. and Bastell, R. R. (1989) A Monte Carlo comparison of estimators for the multinomial logit model, Journal of Marketing Research, 26, pp. 56–68.

Bunch, D. S., Louviere, J. J. and Anderson, D. A. (1996) A comparison of experimental design strategies for choice-based conjoint analysis with generic-attribute multinomial logit model. Working paper, Graduate School of Management, University of California, Davis, CA.

Burgess, L. and Street, D. J. (2003) Optimal designs for 2^k choice experiments, Communications in Statistics - Theory and Methods, 32, pp. 2185–2206.

Burgess, L. and Street, D. J. (2005) Optimal designs for choice experiments with asymmetric attributes, Journal of Statistical Planning and Inference, 134, pp. 288–301.

Chapman, R. G. and Staelin, R. (1982) Exploiting rank-ordered choice data within the stochastic utility model, Journal of Marketing Research, 19, pp. 288–301.

Dey, A. (1986) Theory of Block Designs (New York: Wiley).

Hickernell, F. J. (1998) A generalized discrepancy and quadrature error bound, Mathematics of Computation, 17, pp. 299–322.

Hua, L. K. and Wang, Y. (1981) Applications of Number Theory to Numerical Analysis (Berlin and Beijing: Springer and Science Press).

Huber, J. and Zwerina, K. (1996) The importance of utility balance in efficient choice designs, Journal of Marketing Research, 33, pp. 307–317.

Kanninen, B. J. (2002) Optimal design for multinomial choice experiments, Journal of Marketing Research, 39, pp. 214–217.

Kessels, R., Goos, P. and Vandebroek, M. (2006) A comparison of criteria to design efficient choice experiments, Journal of Marketing Research, 43, pp. 409–419.

Louviere, J. J. (1988) Conjoint analysis modeling of stated preferences: a review of theory, methods, recent developments and external validity, Journal of Transport Economics and Policy, 10, pp. 93–119.

Louviere, J. J. and Woodworth, G. (1983) Design and analysis of simulated consumer choice or allocation experiments: an approach based on aggregate data, Journal of Marketing Research, 20, pp. 350–367.

Niederreiter, H. (1992) Random number generation and quasi-Monte Carlo methods. SIAM CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, PA.

Rose, J. M. and Bliemer, M. C. (2008) Constructing efficient stated choice experimental designs. Paper presented at the 87th annual meeting of Transportation Research Board, Washington, DC, 13–17 January 2008.

Rose, J. M., Bliemer, M. C., Hensher, D. A. and Collins, A. T. (2008) Designing efficient stated choice experiments in the presence of reference alternatives, Transportation Research B, 42, pp. 395–406.

Sandor, Z. and Wedel, M. (2001) Designing conjoint choice experiments using managers’ prior beliefs, Journal of Marketing Research, 38, pp. 430–444.

Sandor, Z. and Wedel, M. (2002) Profile construction in experimental choice designs for mixed logit models, Marketing Science, 21, pp. 455–475.

Sandor, Z. and Wedel, M. (2005) Heterogeneous conjoint choice designs, Journal of Marketing Research, 42, pp. 210–218.

Street, D. J., Bunch, D. S. and Moore, B. (2001) Optimal designs for 2^k paired comparison experiments, Communications in Statistics - Theory and Methods, 30, pp. 2149–2171.

Street, D. J. and Burgess, L. (2004) Optimal and near-optimal pairs for the estimation of effects in 2-level choice experiments, Journal of Statistical Planning and Inference, 118, pp. 185–199.

Wang, D. G. and Li, J. K. (2002) Handling large numbers of attributes and/or large numbers of levels in conjoint experiments, Geographical Analysis, 34, pp. 350–362.

Wang, D. G. and Li, P. F. (2005) Does uniform design really work in stated choice modeling? A simulation study, Transportmetrica, 1, pp. 209–222.


Appendix 1. Uniform designs U(18, 2^3 3^2) (a) and U(18, 2^6 3^4) (b)

1 2 3 4 5 6 7 8 9 10

0 0 0 0 0 0 0 0 0 1
0 0 0 1 0 1 0 1 2 0
0 0 1 0 1 0 0 2 1 2
0 0 1 1 1 1 1 0 2 2
0 1 0 0 1 1 1 1 1 1
0 1 0 1 1 0 1 2 0 0
0 1 1 0 0 1 2 0 1 0
0 1 1 1 0 0 2 1 0 2
0 0 0 0 0 0 2 2 2 1
1 1 1 1 1 1 0 0 0 1
1 1 1 0 1 0 0 1 2 0
1 1 0 1 0 1 0 2 1 2
1 1 0 0 0 0 1 0 2 2
1 0 1 1 0 0 1 1 1 1
1 0 1 0 0 1 1 2 0 0
1 0 0 1 1 0 2 0 1 0
1 0 0 0 1 1 2 1 0 2
1 1 1 1 1 1 2 2 2 1

(a) Choosing columns 1, 2, 3, 7 and 8 generates U(18, 2^3 3^2).
(b) The above design is U(18, 2^6 3^4).
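The column selection described in note (a) can be checked with a short Python sketch. The matrix below is transcribed from the appendix; the helper names are hypothetical. The sketch extracts columns 1, 2, 3, 7 and 8 and counts how often each level occurs in each column.

```python
from collections import Counter

# Rows of U(18, 2^6 3^4) as listed in Appendix 1 (columns 1 to 10).
U18 = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 1, 2, 0],
    [0, 0, 1, 0, 1, 0, 0, 2, 1, 2],
    [0, 0, 1, 1, 1, 1, 1, 0, 2, 2],
    [0, 1, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 0, 1, 2, 0, 0],
    [0, 1, 1, 0, 0, 1, 2, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 2, 1, 0, 2],
    [0, 0, 0, 0, 0, 0, 2, 2, 2, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 0, 0, 1, 2, 0],
    [1, 1, 0, 1, 0, 1, 0, 2, 1, 2],
    [1, 1, 0, 0, 0, 0, 1, 0, 2, 2],
    [1, 0, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 2, 0, 0],
    [1, 0, 0, 1, 1, 0, 2, 0, 1, 0],
    [1, 0, 0, 0, 1, 1, 2, 1, 0, 2],
    [1, 1, 1, 1, 1, 1, 2, 2, 2, 1],
]


def select_columns(design, columns):
    """Pick 1-based columns from a design matrix."""
    return [[row[c - 1] for c in columns] for row in design]


def level_counts(design):
    """Number of occurrences of each level, column by column."""
    return [dict(sorted(Counter(col).items())) for col in zip(*design)]


sub_design = select_columns(U18, [1, 2, 3, 7, 8])
print(level_counts(sub_design))
print(len({tuple(row) for row in sub_design}), "distinct profiles")
```

In the selected columns, each level of a two-level attribute appears 9 times and each level of a three-level attribute appears 6 times, and the 18 resulting profiles are all distinct.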
