Dynamic Assortment Optimization with a Multinomial …shen/papers/paper49.pdf · Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint Paat

Dynamic Assortment Optimization with a Multinomial Logit

Choice Model and Capacity Constraint

Paat Rusmevichientong∗ Zuo-Jun Max Shen† David B. Shmoys‡

Cornell University UC Berkeley Cornell University

April 7, 2008

Abstract

The paper considers a stylized model of a dynamic assortment optimization problem, where

given a limited capacity constraint, we must decide the assortment of products to offer to

customers to maximize the profit. Our model is motivated by the problem faced by retailers

of stocking products on a shelf with limited capacities and by the problem of placing a limited

number of ads on a web page. We assume that each customer chooses to purchase the product

(or to click on the ad) that maximizes her utility. We use the multinomial logit choice model

to represent demand. However, we do not know the demand for each product. We can learn

the demand distribution by offering different product assortments, observing resulting selections,

and inferring the demand distribution from past selections and assortment decisions. We present

an adaptive policy for joint parameter estimation and assortment optimization. To evaluate our

proposed policy, we define a benchmark profit as the maximum expected profit that we can

earn if we know the underlying demand distribution in advance. We show that the running

average expected profit generated by our policy converges to the benchmark profit and establish

its convergence rate. Numerical experiments based on sales data from an online retailer indicate

that our policy performs well, generating over 90% of the optimal profit after less than two days

of sales.

1. Motivation and Problem Formulation

Companies have realized the importance of offering products that are tailored to the demand of

customers in each region. For instance, Wal-mart stocks specific lines of clothes targeted exclu-

sively to certain groups of customers (Zimmerman (2006)). Car manufacturers are well-known for∗School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14853, USA. E-mail:

[email protected]†Department of Industrial Engineering and Operations Research, University of California–Berkeley, 4129 Etchev-

erry Hall, Berkeley, CA 94720, USA. E-mail: [email protected]‡School of Operations Research and Information Engineering and Department of Computer Science, Cornell

University, Ithaca, NY 14853, USA. E-mail: [email protected]

1

understanding the importance of learning customer preferences and offering the right mix of prod-

ucts at the right locations (O’Connell (2005)). Given limited resources such as shelf capacity and

marketing budget, a retailer must determine which assortment of products to offer to customers to

maximize profits. In some cases, the retailer does not know the demand a priori, and in fact, the

probability that a customer will purchase a product will depend on the assortment offered. The

retailer can learn the demand by offering different assortments, observing purchases, and estimating

the underlying demand from past sales and assortment decisions.

In addition to applications in retail and consumer product industries, the problem of demand

learning and finding a profitable assortment also has applications in online advertising. Figure 1

shows the front page of the NY Times online (as of March 4, 2008). Highlighted items in the figure

correspond to advertisements, which appear in the top and the right hand column of the web page.

In this setting, the capacity constraint represents the limited number of locations on the web page

where the ads can appear. The demand for each product corresponds to the number of customers

who click on the ad. Since many ads are similar, the probability that a customer will click on any

one ad will likely depend on the assortment of ads shown to the customers. In general, we often

do not know these probabilities in advance and must estimate their values from observed clickthru

data. Given a limited online real estate and unknown demand for the ad, we must decide on the

assortment of ads that will generate the most profit, adjusting our assortment decisions and refining

our demand estimates over time as new data arrive.

Motivated by the problems of product stocking and ads placement mentioned above, we formu-

late a stylized dynamic assortment optimization model that captures some of the issues commonly

present in these problems, including the capacity constraint, the uncertainty in the demand dis-

tribution, and the dependence of the purchase or selection probability on the assortment decision.

Our formulation assumes that we have N products indexed by 1, 2, . . . , N . For each i = 1, . . . , N ,

let wi > 0 denote the marginal profit of product i, where we assume that through appropriate

scaling max`=1,...,N w` ≥ 1. Due to a capacity constraint, we can offer at most C products to the

customers. The goal is to determine the subset of at most C products that will yield the maximum

profit.

To account for dependence of the selection probability on the assortment decision, we will assume

a multinomial logit (MNL) choice model, which is one of the most popular and most commonly

used choice models in economics, marketing, and operations management. The MNL model was

first introduced by McFadden (1974) and has been applied in a very large number of applications

(see, Ben-Akiva and Lerman (1985), Anderson et al. (1992), Mahajan and van Ryzin (1998), and

the references therein). Under the MNL choice model, each customer chooses the product that

2

Figure 1: Sample advertisements (highlighted in circles) shown in the front page of the NY Times. The ads

can appear only in certain locations on the web page. Given a limited online real estate, we must decide on

the most profitable set of ads to display to customers.

maximizes her utility, where for each i = 0, 1, . . . , N , the utility Ui associated with product i (with

0 corresponding to the option of no purchase) is given by:

Ui = µi + ζi,

where µi ∈ R denotes the mean utility that the customer assigns to product i and ζ0, . . . , ζN

are independent and identically distributed random variables having a Gumbel distribution with

location parameter 0 and scale parameter 1; that is, P {ζi ≤ x} = e−e−x

for each x ∈ R. To simplify

our exposition and without loss of generality, we will set µ0 = 0.

Under the MNL choice model, it is a well known result (see, for example, McFadden (1974))

that, for each assortment S ⊆ {1, . . . , N}, the probability θj(S) that the customer chooses the

product j is given by:

θj(S) =

eµj/

(1 +

∑k∈S e

µk), if j ∈ S,

1/(1 +

∑k∈S e

µk), if j = 0,

0, otherwise.

3

To facilitate our exposition, we define vi = eµi for all i. Then, the expected profit f(S) associated

with the assortment S is given by

f(S) =∑j∈S

wjθj(S) =

∑j∈S wjvj

1 +∑

j∈S vj.

The assortment optimization problem, which we refer to as the Capacitated Multinomial Logit

Assortment Problem (CAPACITATED MNL), is defined as follows:

Z∗ = maxS⊆{1,...,N}:|S|≤C

f(S), (1)

and we denote the optimal solution by S∗C , that is, Z∗ = f (S∗C).

We are interested in the setting where the mean utilities µi’s are unknown, and we have to

infer these values by offering different assortments and estimating the parameters based on the

resulting selections. In this case, we aim to develop an adaptive policy that generates a sequence of

assortments S1, S2, . . . such that |St| ≤ C for each t, and St depends only on the customer selections

and assortment decisions in the prior t − 1 periods. As a performance measure, we will consider

the running average expected profit 1T

∑Tt=1E [f(St)], and we want this average to converge to Z∗

as the number of time periods T increases to infinity.

1.1 Contributions and Organization of the Paper

Our approach to the dynamic assortment optimization problem consists of two parts. In the first

part, which we refer to as Static Optimization, we assume that the mean utilities µ1, . . . , µN are

known in advance, and focus on finding a solution to the CAPACITATED MNL problem given

in Equation (1). We present a polynomial-time algorithm for finding the optimal assortment and

establish structural properties of the solution. In the second part, which we refer to as Dynamic

Optimization, we leverage the insight gained from the static optimization problem and adapt the

algorithm to the setting when the mean utilities are unknown and must be estimated from historical

data. We now describe the contributions and key insights of the paper.

Capacitated vs. Unconstrained Static Optimization: When there is no capacity constraint (that

is, C = N), Gallego et al. (2004) and Liu and Van Ryzin (2004) showed that the optimal solution to

the static optimization problem in Equation (1) can be found among the following N assortments:

Λ1 = {λ1} , Λ2 = {λ1, λ2} , Λ3 = {λ1, λ2, λ3} . . . , ΛN = {λ1, . . . , λN} ,

where the permutation λ = (λ1, . . . , λN ) represents the ordering of the products based on their

marginal profits, that is, wλ1 ≥ wλ2 · · · ≥ wλN. Further, they showed that the optimal assortment

can be found by performing a sequential search, starting with Λ1, and sequentially adding one

4

product at a time (in a descending order of marginal profit) until the first time that the profit

decreases. Therefore, the optimal assortment is the first Λi such that f (Λi) > f (Λi+1). This is a

powerful and compelling result because it enables us to restrict our search to at most N subsets

instead of 2N .

The solution to the unconstrained problem suggests that, when we have a capacity constraint

C, the optimal solution might involve the top C products with the highest marginal profits. As

shown in Example 1 in Section 2.1, however, this intuition is false. In fact, the optimal assortment

can actually contain the product with the lowest marginal profit. Moreover, one might hypothesize

that the optimal solution to the problem with a smaller capacity constraint is a subset of the

optimal solution for the problem with a larger capacity constraint, that is, S∗C1⊆ S∗C2

whenever

C1 ≤ C2. As shown in the example, this conjecture turns out to be false as well. The example

demonstrates that the constrained optimization problem is significantly more difficult, and thus,

we must be careful in applying our intuition from the unconstrained problem.

In Section 6.2, we compare the expected profit of the optimal assortment and the assortment

generated under the greedy heuristic of Gallego et al. (2004) and Liu and Van Ryzin (2004), using

actual sales data from an online retailer. We show that when the capacity constraint is 10, the

optimal expected profit is 14% higher than the profit of the assortment generated by the greedy

heuristic.

A Polynomial-time Algorithm for Static Optimization: In Section 2.2, we describe a polynomial-

time algorithm for solving the CAPACITATED MNL problem that has a running time of

O(N2), which we refer to as the StaticMNL algorithm. The algorithm generates a sequence

〈A0, A1, . . . , AK〉 of assortments that guarantees to contain an optimal solution, where K ≤(N+1

2

)−

1 = O(N2). We emphasize that the ordering among the sets A`’s is not based on the ordering of

the marginal profits.

The algorithm is based on an ingenious solution to a more general problem of optimizing a

rational objective function by Megiddo (1979). Given the widespread use of the MNL choice

model, we believe it is important to bring the community’s attention to this method, and we have

provided a detailed description of the algorithm as it applies to our setting.

New Structural Results for the Optimal Solution: Although we cannot sort the products in a

descending order of marginal profits as in the unconstrained optimization problem, the optimal

solution to the CAPACITATED MNL problem still exhibits many structural properties that

are similar to its unconstrained counterpart. By exploiting the specific structure of our multinomial

logit choice model, we derive a new upper bound associated with the sequence 〈A0, A1, . . . , AK〉.

5

We show that the number of distinct subsets among A0, A1, . . . , AK is at most C(N+C−1) (Propo-

sition 3). Thus, if the capacity C is fixed, this result implies that finding the optimal assortment

requires searching through O(N) subsets, roughly the same number as in the unconstrained prob-

lem. Furthermore, we show in Proposition 4 that the expected profit associated with the sequence

〈A0, A1, . . . , AK〉 is unimodal in the sense that there exists an index m∗ such that

f (A1) ≤ f(A2) ≤ · · · ≤ f (Am∗) and f (Am∗) ≥ f (Am∗+1) ≥ · · · ≥ f (AK) .

This result enables us to identify the optimal assortment quickly. To our knowledge, these represent

the first structural results associated with the CAPACITATED MNL problem.

Parameter Estimation and Error Bounds for Maximum Likelihood Estimates: In Section 3,

as the first step toward solving the dynamic optimization problem, we introduce a parameter

estimation technique based on maximum likelihood estimation (MLE). Traditional applications of

MLE to the MNL choice model have assumed that we can offer all N products to each customer

at once. The utilities of all products are then estimated from the resulting sales data. Due to the

capacity constraint in our setting, however, we can offer at most C products. It turns out that, by

offering overlapping assortments, we can still estimate the necessary parameters for computing the

optimal assortment.

In addition, we develop new error bounds that relate the quality of our parameter estimates

(based on MLE) as a function of the number customers. In Theorems 5 and 6, we show that if

we offer our assortments to T independent customers, then for any ε > 0, the probability that our

estimates of the utilities differ from their true values by more than ε is O(e−ε

4T)

. To the best of

our knowledge, this is the first such error bound for this problem.

The error bound from Theorem 6 combined with the key algorithmic insight from the static

optimization problem (Proposition 2) enable us to establish the following key result. Suppose we

estimate of the mean utilities based on observations from T i.i.d. customers, and we use these

estimated utilities as inputs for the StaticMNL algorithm. In Corollary 7, we show that the

output of the StaticMNL algorithm will be exactly the same as if we have used the true mean

utilities as inputs, with a very high probability of at least 1−O(e−T

).

An Adaptive Algorithm for Dynamic Optimization: In Section 4, we combine the results from

Sections 2 and 3 to develop an adaptive algorithm for joint parameter estimation and optimization.

Our proposed algorithm extends the algorithm for the static optimization problem to handle the

setting when the mean utilities µi’s are unknown and must be estimated from historical sales and

assortment decisions. The proposed policy generates a sequence of assortments S1, S2, . . . such that

6

the T -period average expected profit 1T

∑Tt=1E [f (St)] converges to Z∗ at the rate of O

((log T )2/T

)(see Theorem 9 and Corollary 10 for more details).

Numerical Experiments: We test our proposed policy using actual sales data from an online

retailer. The results indicate that our policy performs well, yielding over 90% of the optimal

expected profit after about two days of sales.

1.2 Literature Review

The literature on assortment optimization can be divided into two areas – static and dynamic

optimization – and we believe that our research contributes to both of these areas. Kok et al.

(2006) provide an excellent review of the current state-of-the-art in the assortment optimization

literature.

The problem of static assortment planning has been studied extensively in the literature. Most

models assume that the total display space is limited and the demand of a particular product

depends on the allocated shelf space. The focus here is to develop optimization algorithms for

incremental changes to the assortment by adding or dropping products, or by adjusting the space

allocated to different products (see, for example, Corstjens and Doyle (1981, 1983), Bultez and

Naert (1988), Bultez et al. (1989), and Dreze et al. (1994)).

Some recent assortment models are built within the multinomial logit framework. Chong et al.

(2001) consider brand-level assortment decisions using a nested MNL model including measure-

ments associated with each brand in a product category. They use a local improvement heuristic

based on pairwise interchanges of product variants to modify the assortment and achieve a sub-

stantial increase in profit. Cachon et al. (2005) and Gaur and Honhon (2006) explicitly account

for customer search in the assortment planning process and show that ignoring customer search in

demand estimation can lead to sub-optimal assortment decisions and lower expected profits. Smith

(2006) generalizes previous research by allowing general heterogeneous customer preferences and

by incorporating customer’s choice of retailer within the multinomial logit framework. He applies

his model to a commercially obtained data on customer preferences for DVD players. One of the

major findings from this paper is that including heterogeneity in customer preferences plays an

important role in choosing the optimal assortment.

Several papers with operations management focus also consider inventory decisions in product

assortment problems. These papers typically assume a certain product substitution structure when

stockouts occur (see Ryzin and Mahajan (1999), Smith and Agrawal (2000), Mahajan and Ryzin

(2001), Kok and Fisher (2004), Cachon et al. (2005), and Honhon et al. (2006)). Most of these

7

papers assume that customers are homogenous with the same expected attractiveness ranking for

products across all customers.

Another related stream of research deals with product line design issues. A product may have a

set of attributes where each attribute can have different levels, and a product line can have a set of

different attribute level configurations. Thus, a product line can satisfy more customers than a single

product would. Green and Krieger (1985) were among the first to model the product line design

problem. Followup work includes McBride and Zufryden (1988) (using an integer programming

method), Kohli and Sukumar (1990) (dynamic programming), Dobson and Kalish (1993) and Nair

et al. (1995) (heuristic for an extended Green and Krieger model), and Alexouda and Paparizos

(2001) and Fligler et al. (2006) (genetic algorithms). Chen and Hausman (2001) use choice-based

conjoint analysis to model customer preferences. They show that the conjoint approach has some

special mathematical properties that lead to an efficient optimal algorithm for the product line

selection problem. Yano and Dobson (1998) provide a comprehensive review of the literature in

this area. None of the above papers consider demand learning, thus their assortment problems are

static, not dynamic.

All of those papers begin with the presumption that we know the parameters of the underlying

customer choice model. To the best of our knowledge, the only paper on product assortment

optimization with demand learning is the work by Caro and Gallien (2006). Caro and Gallien (2006)

apply a finite-horizon multi-armed bandit model with Bayesian learning to a dynamic assortment

problem for seasonal goods. Several plays per stage are allowed in their model to accommodate the

situation where a company can make product design and assortment decisions during the testing

cycle when actual demand information becomes available. They derive a closed-form dynamic index

policy that captures the key exploration versus exploitation tradeoffs.

Connections to Multiarmed Bandit Problems: The work by Caro and Gallien (2006) is part

of the literature on multi-armed bandit problems. In a multi-armed bandit problem, we wish to

choose an arm (or a decision) with the highest expected reward, but we do not know the reward

distribution associated with each arm a priori. In their seminal paper, Lai and Robbins (1985)

propose an adaptive index-based policy whose running average expected reward converges to the

maximum reward. Subsequent work has refined and extended the original analysis. Examples

include the work by Lai (1987) (considers a Bayesian setting with prior distribution on the reward

of each arm); Agrawal (1995) (considers an index formula that depends only on the running average

reward of each arm); Auer et al. (1995) (develop a probabilistic decision rule for the bandit problem

in adversarial settings); Auer et al. (2002) (provide a finite-time analysis of the multi-armed bandit

problem).

8

We can formulate our problem as an instance of the multi-armed bandit problem by viewing

each assortment – corresponding to the sets of products to be offered at each store – as an arm

whose expected profit is unknown. We can then apply mutliarmed bandit algorithms such as those

of Lai and Robbins (1985) and Auer et al. (2002) to generate a sequence of assortments over time

whose average expected profit converges to the optimal profit. Unfortunately, in this formulation,

the number of possible decisions is O(NC), which grows exponentially with the capacity constraint.

All known generic multiarmed bandit algorithms have running times and convergence rates that

scale linearly with the number of decisions, making them impractical when the capacity constraint

C is large.

An alternative formulation of our problem is to view each product as an arm in a bandit problem.

In each period, we can play up to C arms, corresponding to offering an assortment of size C or less

to the customers. Variations of the multiarmed bandit problems where we can play multiple arms

in each period have been extensively studied by Anantharam et al. (1987a) and Anantharam et al.

(1987b). However, all of the literature in this area assume that the rewards of playing multiple

arms is the sum of the reward of each individual arm, which is not applicable to our setting because

of the structure of the multinomial logit choice model. Moreover, it is not clear whether the policy

proposed in the literature will be effective here.

We note that there have been resurgent interests in the multi-armed bandit problems with a very

large number of arms. Examples include Berry et al. (1997) who consider a Bayesian setting with

infinite Bernoulli arms, and Kleinberg (2004, 2005) who consider applications to network routing

and single-product pricing. However, most of these works are very problem-specific, and they are

not applicable to our assortment optimization problem.

Connection to Economics Literature: Motivated by applications in operations management

and online advertising, we focus on the assortment optimization problem under a multinomial logit

choice model. Our formulation can be viewed as a special case of the general problem of solving and

estimating the dynamic stochastic discrete choice model in economics. See, for example, Eckstein

and Wolpin (1989), Hotz and Miller (1993), Keane and Wolpin (1994), Paap and Franses (2000)

and the references therein. Much of the work in this area focuses on finding efficient estimation

procedures and approximation techniques for solving large-scale dynamic programming problems.

To our knowledge, our paper is the first application to the assortment optimization with a capacity

constraint.

9

1.3 Additional Applications and Discussion of the Problem Formulation

In Section 1, we describe potential applications of our model to the online ads placement problem

and to the assortment problem in traditional “brick-and-mortar” retail stores. We now highlight

additional potential applications of our model and discuss the strengths and weaknesses of the

formulation.

Although e-commerce companies such as Amazon.com can offer customers a very large assort-

ment of products, we believe our model can still be applied in the recommendation of products for

a generic customer whose purchase history is unknown. Examples of such a customer include a new

customer to the web site or a customer who does not have an identifiable “cookie” associated with

his or her web browser. Figure 2 shows an example of book recommendations by Amazon.com,

which in this case, is limited to about 15 books per page. Given the limited space to display

products on the web page, the company might wish to display the assortment of products that

maximizes the total profit. This problem also fits within our formulation.

Figure 2: An example of book recommendations shown on Amazon.com. Since the web page can display

only a limited number of products, we want to display the assortment of products that will give the highest

profit.

We now discuss the limitations of our formulation. Although the MNL choice model has

been used successfully in many applications, it exhibits the Independence of Irrelevant Alterna-

tives, where the ratio of the selection probabilities between two alternatives is independent of the

assortment containing them. In certain applications, this property is not consistent with the actual

10

customer choice behavior. For more discussions on this topic and extensions of the MNL model

that alleviate this problem, the reader is referred to Ben-Akiva and Lerman (1985) and Kok et al.

(2006).

Our formulation assumes that each customer independently subscribes to the same choice model.

This assumption is appropriate when we have a single market and the customers within the market

are relatively homogenous. Extensions of our model to the multi-store setting is beyond the scope

of this paper.

We also assume that the cost of changing an assortment from one customer to the next is

negligible. We believe that this assumption is reasonable in the online setting where the cost of

changing the ads or product recommendations on the web page is minimal. In settings where there

are significant costs associated with switching product assortments, however, our model might not

be appropriate. In addition, we implicitly assume that we have enough supply of each product to

ignore all inventory considerations.

2. Static Optimization: Algorithm and Structure of the Optimal Solution

This section focuses on developing a solution to the CAPACITATED MNL problem given in

Equation (1), under the assumption that the mean utilities µ1, . . . , µN are known in advance. In

Section 2.1, we present two examples which show that, when we have a capacity constraint, the

solution to the assortment optimization problem does not necessarily follow the intuition suggested

by its unconstrained counterpart. Then, in Section 2.2, we present a polynomial-time algorithm for

this problem, which is based on the work by Megiddo (1979). By exploiting the special structure of

the MNL choice model, we derive new structural results associated with the optimal assortments,

which are presented in Sections 2.3 and 2.4. We will rely on these structural insights extensively

when we develop an adaptive policy for the dynamic optimization problem in Section 4.

2.1 Contrast to the Unconstrained Optimization Problem

As mentioned in Section 1.1, Gallego et al. (2004) and Liu and Van Ryzin (2004) showed that the

unconstrained optimization problem (when C = N) can be solved by simply sorting the products

in descending order of the marginal profits. As shown in the following example, this result is not

true when we have a capacity constraint.

Example 1. Consider four products with

w1 = 9.5, w2 = 9.0, w3 = 7.0, w4 = 4.5 and v1 = 0.2, v2 = 0.6, v3 = 0.3, v4 = 5.2

11

The expected profit for each assortment is shown in Table 1. In this case, the optimal assortments

for each value of C is given by:

S∗1 = {4}, S∗2 = {2, 4}, S∗3 = {1, 2, 3}, S∗4 = {1, 2, 3, 4}.

Note that product 4 which has the lowest marginal profit is included in the optimal assortment when

C = 1 and C = 2, but it is not included in S∗3 . Yet, it re-appears in S∗4 when there is no capacity

constraint (C = 4). Moreover, note that although S∗1 ⊆ S∗2 , we observe that S∗2 * S∗3 .

S f(S) S f(S) S f(S) S f(S)

{1} 1.583 {1, 2} 4.056 {1, 2, 3} 4.476 {1, 2, 3, 4} 4.493

{2} 3.375 {1, 3} 2.667 {1, 2, 4} 4.386

{3} 1.615 {1, 4} 3.953 {1, 3, 4} 4.090

{4} 3.774 {2, 3} 3.947 {2, 3, 4} 4.352

{2, 4} 4.253

{3, 4} 3.923

Table 1: Expected profits for different assortments in Example 1. The optimal assortment for each value of

C is highlighted in bold.

2.2 A Polynomial-time Algorithm

We now present a polynomial-time algorithm for solving the CAPACITATED MNL optimization

problem based on an ingenious solution proposed by Megiddo (1979). The key idea underlying our

algorithm is the observation that we can express the optimal profit Z∗ from Equation (1) as follows.

Z∗ = max {λ ∈ R : ∃X ⊆ {1, . . . , N}, |X| ≤ C, and f(X) ≥ λ}

= max

{λ ∈ R : ∃X ⊆ {1, . . . , N}, |X| ≤ C, and

∑i∈X

vi (wi − λ) ≥ λ

},

where the final equality follows from the definition of the profit function f(·). The inputs to the

algorithm consist of the vectors v = (v1, v2, . . . , vN ) ∈ RN+ where vi = eµi for i = 1, . . . , N , and

the marginal profits w = (w1, . . . , wN ) ∈ RN+ , along with the capacity constraint C. Given these

inputs, we then construct N + 1 linear functions h0, h1, . . . , hN , where for each λ ∈ R,

h0(λ) = 0 and hi(λ) = vi (wi − λ) , for i = 1, . . . , N.

Note that Z∗ = max{λ ∈ R : ∃X ⊆ {1, . . . , N}, |X| ≤ C, and

∑i∈X hi(λ) ≥ λ

}. As suggested by

the expression for Z∗, the geometry of the collection of lines h0(·), h1(·), . . . , hN (·) will play a

prominent role in our proposed algorithm.

12

We wish to find the ordering of the products based on the values of hi(λ)’s as λ ranges over all

real values. To do this, we first compute the intersection point between any two lines. For each

{i, j} ⊆ {1, 2, . . . , N}, let I(i, j) denote the point on the x-axis where lines hi(·) and hj(·) intersect,

that is, I(0, j) = I(j, 0) = wj for each j = 1, . . . , N and for each {i, j} ⊆ {1, 2, . . . , N} such that

vi 6= vj ,

hi (I(i, j)) = hj (I (i, j)) ⇔ I(i, j) =viwi − vjwjvi − vj

.

We then sort the intersection points in an increasing order of I(i, j)’s. Let (i1, j1) , . . . , (iK , jK)

denote this ordering, that is, 0 ≤ i` < j` ≤ N for each ` = 1, . . . ,K and

−∞ ≡ I(i0, j0) < I(i1, j1) ≤ I(i2, j2) ≤ · · · ≤ I(iK , jK) < I(iK+1, jK+1) ≡ +∞,

where we have added the two end points I(i0, j0) and I(iK+1, jK+1) to facilitate our exposi-

tion. By definition, within the interval(I (it, jt) , I (it+1, jt+1)

), none of the N + 1 lines inter-

sect each other, and thus the ordering of the N + 1 lines remains constant. For t = 0, 1, . . . ,K,

the algorithm will maintain the following four pieces of information associated with the interval(I (it, jt) , I (it+1, jt+1)

):

1. The ordering σt of the N lines.

2. The set Gt corresponding to the first C elements according to the ordering σt.

3. The set Bt of products whose values have become negative.

4. The assortment At = Gt \Bt.

Starting with an ordering of the products based on vi’s (corresponding to λ = −∞), each time

we encounter an intersection point, we update(σt, Gt, Bt, At

). We continue this process until we

have exhausted all the intersection points. The algorithm then outputs a sequence of quadruplets⟨(σt, Gt, Bt, At

): t = 0, 1, . . . ,K

⟩. The formal description of our algorithm – which we refer to as

the StaticMNL(v, w,C) algorithm – is stated below.

StaticMNL (v, w,C)

INPUT: v = (v1, . . . , vN ) ∈ RN+ , w = (w1, . . . , wN ) ∈ RN

+ , and C ∈ Z+.

INITIALIZATION: For i = 0, 1, . . . , N , let a linear function hi : R → R be defined by: for each

λ ∈ R, h0(λ) = 0, and for i = 1, . . . , N , hi(λ) = vi (wi − λ). For each {i, j} ⊆ {0, 1, . . . , N}

with i 6= j, let I(i, j) denote the point on the x-axis where lines hi(·) and hj(·) intersect. Let

(i1, j1) , . . . , (iK , jK) denote the ordering of the intersection points, that is, i` < j` for each ` =

1, . . . ,K and

−∞ ≡ I(i0, j0) < I(i1, j1) ≤ I(i2, j2) ≤ · · · ≤ I(iK , jK) < I(iK+1, jK+1) ≡ +∞,

13

v3(w3 - λ)

v2(w2 - λ)

v1(w1 - λ)

Figure 3: An illustration of the StaticMNL algorithm with three products.

Also, let σ0 =(σ0

1, . . . , σ0N

)denote an ordering of the products based on vi’s, that is, vσ0

1≥ vσ0

2≥

· · · ≥ vσ0N.

DESCRIPTION: Let A0 = G0 ={σ0

1, . . . , σ0C

}and B0 = ∅. For t = 1, . . . ,K, construct the

permutation σt and sets At and Gt recursively as follows:

• If it 6= 0, let the permutation σt be obtained from σt−1 by transposing it and jt and set

Bt = Bt−1.

• If it = 0, let σt = σt−1 and Bt = Bt−1 ∪ {jt}.• Let Gt =

{σt1, . . . σ

tC

}and At = Gt \Bt.

OUTPUT: The sequence⟨(σt, Gt, Bt, At

): t = 0, 1, . . . ,K

⟩.

The following example illustrates the application of the StaticMNL algorithm to a setting

with three products.

Example 2. Consider an example with 3 products whose corresponding collection of lines h1(·),

h2(·), and h3(·) are given in Figure 3. In this example, the products are ordered so that v1 < v2 < v3

and we have six intersection points with

I(2, 3) < I(1, 3) < I(0, 3) < I(1, 2) < I(0, 2) < I(0, 1)

Suppose that we have a capacity constraint C = 2. The output of the StaticMNL algorithm is

then given in the following table. We note that, for each t, the assortment At correspond to the top

two lines whose values are nonnegative within the interval (I (it, jt) , I (it+1, jt+1)).

14

t I(it, jt) σt Gt Bt At

0 −∞ (3, 2, 1) {3, 2} ∅ {3, 2}

1 I(2, 3) (2, 3, 1) {2, 3} ∅ {2, 3}

2 I(1, 3) (2, 1, 3) {2, 1} ∅ {2, 1}

3 I(0, 3) (2, 1, 3) {2, 1} {3} {2, 1}

4 I(1, 2) (1, 2, 3) {1, 2} {3} {1, 2}

5 I(0, 2) (1, 2, 3) {1, 2} {3, 2} {1}

6 I(0, 1) (1, 2, 3) {1, 2} {3, 2, 1} ∅

The main result of this section is stated in Theorem 1 which establishes the running time and

the correctness of the StaticMNL(v, w,C) algorithm. A general proof of this result appeared in

Megiddo (1979). For completeness, we provide a short proof specific to our problem here.

Theorem 1. Let⟨A0, A1, . . . , AK

⟩denote the sequence of assortments generated by the Stat-

icMNL (v, w,C) algorithm. Then, Z∗ = max{f(Ai)

: i = 0, 1, . . . ,K}

. Also, the running time of

the algorithm is of order O(N2).

Proof. The running time of the algorithm is proportional to the number of intersection points

among the N + 1 lines h0(·), h1(·), . . . , hN (·), which is at most(N+1

2

)= O(N2). To establish the

correctness, note that from our earlier discussion at the beginning of Section 2.2, we have that

Z∗ = max

{λ ∈ R : max

X:|X|≤C

∑i∈X

vi(wi − λ) ≥ λ

}. (2)

However, for each λ ∈ R, the solution to the optimization problem maxX:|X|≤C∑

i∈X vi(wi − λ)

corresponds to the top C lines (among hj(λ) = vj(wj − λ)) whose values are nonnegative.

The ordering σ0 in the description of the StaticMNL(v, w,C) algorithm corresponds to the

ordering of vj (wj − λ)’s when λ = −∞. Moreover, by our construction, I (i, j) represents the point

where lines hi(λ) = vi (wi − λ) and hj(λ) = vj (wj − λ) intersect (if 1 ≤ i, j ≤ N), or the point

where the value of vj (wj − λ) becomes zero (if i = 0). Thus, for each λ strictly between I (it, jt)

and I (it+1, jt+1), the ordering of the values vj (wj − λ)’s must remain constant and their values do

not change sign. Therefore, for each λ strictly between I (it, jt) and I (it+1, jt+1), the optimization

problem given in Equation (2) will always have the same optimal solution, corresponding to the

assortment At.

Since the collection of assortments A0, A1, . . . , AK encompasses the optimal solution to the

optimization problem given in Equation (2) for all values of λ ∈ R, this collection of subsets must

contain an optimal solution to the CAPACITATED MNL assortment optimization problem.

15

From the description of the StaticMNL(v, w,C) algorithm, we observe the following crucial

insight. The correctness of the algorithm depends only on the correct initial ordering σ0 of the

mean utilities and the ordering of the intersection points I (ik, jk)’s. Suppose that we only have

approximations µj ’s of the true mean utilities. We can still obtain an optimal solution, provided

that the ordering of the utilities µj and the ordering of the intersection points I (ik, jk)’s (computed

using the estimated utilities µj) coincide with the true orderings. This observation is summarized

in the following proposition. The result of Proposition 2 will form a basis of our proposed adaptive

algorithm in Section 4.

Before we proceed to the statement of proposition, let us introduce the following notation.

Let µ = (µ1, . . . , µN ) ∈ RN denote an approximation to the true mean utilities µ1, . . . , µN . For

each i = 0, 1, . . . , N , let vi = ebµi and let hi : R → R be defined by: h0(λ) = 0 and for i ≥ 1,

hi (λ) = vi (wi − λ) for all λ ∈ R. For each {i, j} ⊆ {0, 1, . . . , N}, let I(i, j) denote the intersection

point of the lines defined by hi and hj .

Proposition 2. If the ordering of the estimated utilities µj’s and the ordering of the estimated

intersection points I(i, j)’s coincide with the orderings based on µj’s and I(i, j)’s, respectively,

then both the StaticMNL(v, w,C) and the StaticMNL(v, w, C) algorithms generate the same

sequence of assortments as outputs.

2.3 An Upper Bound on the Number of Distinct Assortments

The number of assortments generated by the StaticMNL(v, w,C) algorithm is upper bounded by(N+1

2

)= O(N2), presenting the maximum number of intersection points among N + 1 lines. By

exploiting the special structure of our logit choice model, we can show that the number of distinct

assortments is actually at most C(N + C − 1). When C is fixed, this represents an improvement

by a factor of N from the trivial bound. This result is stated in Proposition 3 whose proof is given

in Appendix A.

Before we proceed to the statement of the lemma, let us introduce the following assumption

that will be used throughout the paper. We emphasize that Assumption 1 is introduced primarily

to simplify our exposition and to facilitate the discussion of the key ideas without having to worry

about degenerate cases. In Section 5, we will discuss extensions of our results to settings where

this assumption may fail.

Assumption 1.

(a) The products are indexed so that v1 < v2 < · · · < vN .

(b) The intersection points are distinct, that is, for any (i, j) 6= (s, t), I(i, j) 6= I(s, t).

16

Assumption 1(a) requires the values of vi’s to be distinct, while Assumption 1(b) requires that the

marginal profit wi’s are distinct, and no three lines among h1(·), . . . , hN (·) can intersect at the same

point. As a consequence of Assumption 1, we observe that every pair of lines hi(·) and hj(·) will

intersect each other, and thus, the number of intersection points K is exactly(N+1

2

). In addition,

we also have a strict ordering of the intersection points, that is,

−∞ ≡ I(i0, j0) < I(i1, j1) < I(i2, j2) < · · · < I(iK , jK) < I(iK+1, jK+1) ≡ +∞,

and the collection of intervals{[I (it, jt) , I (it+1, jt+1)

): 0 ≤ t ≤ K

}partitions the real line.

Proposition 3. Let⟨A0, A1, . . . , AK


icMNL (v, w,C) algorithm. Under Assumption 1, the number of distinct non-empty assortments

generated by the StaticMNL(v, w,C) algorithm is at most C(N − C + 1); that is, there are at

most C(N − C + 1) distinct non-empty subsets among A0, A1, . . . , AK .

If the capacity C is fixed, the above lemma shows that finding the optimal assortment requires

us to search through O(N) sets, which is comparable to the unconstrained case. Of course, the

ordering of the assortments A0, A1, . . . AK is not based on the ordering of the marginal profits, but

rather, on the ordering of the intersection points. Given the result of Proposition 3, throughout the

rest of the paper, we will assume that the sequence⟨A0, A1, . . . , AK

⟩generated by our StaticMNL

algorithm has a length of at most C(N + C − 1).

2.4 Unimodality of the Sequence of Expected Profits

Proposition 3 limits the number of assortments that we have to search to find the optimal solution

to O(N) subsets. Using the special structure of our problem, we can further show that the ex-

pected profits associated with the sequence of assortments generated by the StaticMNL(v, w,C)

algorithm is unimodal in the sense given in Proposition 4 below. The proof of this result appears

in Appendix B.

Proposition 4. Let⟨A0, A1, . . . , AK


icMNL (v, w,C) algorithm. Then, under Assumption 1, the sequence⟨f(A0), f(A1), . . . , f

(AK)⟩

is unimodal, that is, there exists 0 ≤ m∗ ≤ K such that

f(A0)≤ f

(A1)≤ · · · ≤ f

(Am

∗)

and f(Am

∗)≥ f

(Am

∗+1)≥ · · · ≥ f

(AK)

Given the sequence of assortments⟨A0, A1, . . . , AK

⟩, if we can evaluate the expected profit f(·)

exactly, we can apply the standard Golden Ratio or Fibonacci search (see Press et al. (1999), for

example) to find the optimal assortment using only O (log(NC)) iterations (since there are O(NC)

17

distinct assortments). Of course, since the true mean utilities are unknown, we cannot evaluate

f(·) exactly. As shown in Section 4.1, the structure associated with the sequence of expected profit

still allows us to develop an efficient search algorithm for identifying the optimal assortment.

3. Parameter Estimation

In practice, the true mean utilities µ1, . . . , µN are unknown and we must estimate their values

from data. A standard approach for parameter estimation in the multinomial logit model is based

on maximum likelihood estimation (MLE). Under this approach, we offer all products to each

customer, observe their selections, and estimate the mean utilities µj by maximizing the likelihood

function.

Since we have a capacity C, we cannot offer all products simultaneously to the customers.

However, we can still obtain an efficient parameter estimation technique by leveraging the result

of Theorem 1 and Proposition 2. Recall from Section 2.2 that the intersection point I(i, j) of the

lines defined by hi(λ) := vi (wi − λ) and hj(λ) := vj (wj − λ) is given by

I(i, j) =wivi − wjvjvi − vj

=wiθi (S)− wjθj (S)θi (S)− θj (S)

, (3)

where for each S ⊆ {1, . . . , N} such that {i, j} ⊆ S, we define

θi (S) =vi

1 +∑

`∈S v`and θj (S) =

vj1 +

∑`∈S v`

.

Note that the parameters θi (S) and θj (S) correspond to the probabilities that a customer will select

product i and product j, respectively, from the assortment S. Thus, to estimate the intersection

points I(i, j), we do not need to offer all products to the customers. We only need to offer an

assortment that contains both products i and j, and observe the resulting selections. Also, note

that for each assortment S such that {i, j} ⊆ S, we have that µi ≤ µj if and only if θi (S) ≤ θj (S).

Therefore, we can also estimate the orderings of the mean utilities µj ’s based on the ordering of

the corresponding selection probabilities.

These observations lead us to consider the following parameter estimation technique. Consider

a collection of subsets E such that for each E ∈ E , |E| = C and for each pair of products i and j

with i 6= j, there exists an assortment E ∈ E where {i, j} ⊆ E, that is,

E = {E ⊆ {1, . . . , N} : |E| = C and for all i 6= j , {i, j} ⊆ X for some X ∈ E} . (4)

It is easy to verify that we only need at most 5(N/C)2 such subsets, that is, |E| ≤ 5(N/C)2. The

following example shows a construction of the collection E .

18

Example 3. Consider an example with N = 6 and C = 3. In this case, we can define E as follows:

E = {{1, 2, 3}, {4, 5, 6}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6}, {1, 3, 4}, {3, 5, 6}} .

We can show that for all i 6= j, there exists E ∈ E such that {i, j} ⊆ E.

For each assortment E ∈ E , we will offer it to T independent customers. Let C1 (E) , . . . , CT (E)

denote the choices selected by each of the T customers when offered the assortment E, where

Ct (E) ∈ {0} ∪ E with 0 corresponding to the option of no purchase. For each x ∈ RN , let

xE = (xj : j ∈ E) ∈ RC denote the vector whose coordinates correspond to the indices in E. For

each xE ∈ RC , the likelihood function lik(xE

∣∣∣ C1 (E) , . . . , CT (E))

is given by:

lik(xE

∣∣∣ C1 (E) , . . . , CT (E))

=T∏t=1

Pr{Ct (E`)

∣∣∣ µE = xE

}=

T∏t=1

exCt(E)

1 +∑

`∈E ex`

where we define x0 = 0. Let the (negative) log-likelihood ` : RC → R be defined by: for each

xE ∈ RC ,

`(xE

∣∣∣ C1 (E) , . . . , CT (E))

=−1T

ln lik(xE

∣∣∣ C1 (E) , . . . , CT (E))

=1T

T∑t=1

ln

(1

exCt(E)/(1 +

∑`∈E e

x`)) ,

By construction, `(·) is a non-negative function. The maximum likelihood estimates

µE = (µi (C1 (E) , . . . , CT (E)) : i ∈ E)

is given by

µE = arg minxE∈RC

`(xE

∣∣∣ C1 (E) , . . . , CT (E))

(5)

We note that it is a well-known result (see, for example, Ben-Akiva and Lerman (1985)) that the

function ` is globally convex and µE can be computed using many commercially available optimizer

packages (see, for example, Bierlaire (2003)). To facilitate our analysis, throughout this write-up,

we will make the following assumption regarding the mean utilities µ1, . . . , µN .

Assumption 2. The manager has a prior knowledge of the upper bound M > 0 such that |µi| ≤M

for all i.

Note that Assumption 2 requires only that we have a prior knowledge of the range of the mean

utilities µi’s; it does not require that we know their actual values. We emphasize that Assumption

2 is introduced primarily to simplify our exposition and the mathematical proofs. All of the results

in this section remain true without this assumption. In Section 5, we discuss how to extend our

results to the setting when the manager does not know the range of the mean utilities in advance.

19

Given Assumption 2, it suffices to restrict the domain of the optimization problem in Equa-

tion (5) to [−M,M ]C . Given the maximum likelihood estimates of the mean utilities µ(E) =

(µi(E) : i ∈ E), define the estimated selection probabilities θ (E) =(θi (E) : i ∈ E

)as follows: for

each i ∈ E,

θi (E) =ebµi(E)

1 +∑

`∈E ebµ`(E)

Note that θ(E) is a random vector that depends on the selection made by the customers who

were offered the assortment E. The following theorem establishes an error bound on the difference

between the estimated selection probabilities θ (E) and the true values θ (E) as a function of the

sample size. The proof of this result appears in Appendix C.

Theorem 5. Under Assumption 2, if the assortment E ∈ E is offered to T independent customers,

then for each 0 < ε < 1,

P

{∑k∈E

∣∣∣θk (E)− θk (E)∣∣∣ ≥ ε} ≤ 8

(256eBε2

ln256eBε2

)C+1

e−ε4T/4096B2

,

where B = 2M + ln(1 + C).

The above result shows the errors in our estimated probabilities decrease to zero exponentially

fast with the number of customers. We emphasize that the above error bound is a worst-case error

bound that holds for any arbitrarily distribution and it is extremely conservative. Thus, the results

of the above theorem and that of Theorem 6 below provide only a qualitative assessment about

how our estimates scale with the number of samples.

For each E ∈ E and {i, j} ⊆ E with i 6= j, we can define an estimated intersection point IE(i, j)

as follows: if θi (E) 6= θj (E),

IE(i, j) =θi (E)wi − θj (E)wjθi (E)− θj (E)

,

otherwise, set IE(i, j) to some arbitrary value. Also, define IE(0, j) = IE(j, 0) = wj for each j. Let

us also introduce the following notation that will be used throughout the rest of the paper. Define

α > 0, η > 0 and ξ > 0 as follow:

α ≡ min {|I(i, j)− I(s, t)| > 0 : 0 ≤ i, j, s, t ≤ N and (i, j) 6= (s, t)} ,

η ≡ min {|θi (E)− θj (E)| : E ∈ E and {i, j} ⊆ E with i 6= j} ,

ξ ≡ min{α, η}/2 . (6)

Note that α corresponds to the minimum distance between any two intersection points, which is

strictly positive by Assumption 1. Moreover, under Assumption 1, the parameter η is positive

because µi 6= µj , and thus, θi(E) 6= θj(E) for all i 6= j and E ∈ E .

20

The main result of this section is given in the following theorem, which establishes an error

bound for the estimated intersection points. The proof of this result appears in Appendix D.

Theorem 6. Under Assumptions 1 and 2, there exist a1 > 0 and a2 > 0 such that if each assort-

ment in E is offered to T independent customers, then for each 0 < ε < 1,

P{

maxE∈E

{max

{i,j}⊆E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ ,max

i∈E

∣∣∣θi (E)− θi (E)∣∣∣} > ε

}≤ 40

(N

C

)2 (a1

ε2lna1

ε2

)C+1e−a2ε4T ,

where B = 2M + ln(1 + C) and we can take a1 and a2 to be any number such that

a1 ≥104eB (max`w`)

2

η4and a2 ≤

η8

6× 106 ×B2 (max`w`)2 .

Let us emphasize the importance of Theorem 6. From the discussion at the end of Section

2.2 and as summarized in Proposition 2, we know the following key fact. If the ordering of the

estimated mean utilities and the ordering of the estimated intersection points coincide with the

true (yet unknown) orderings, then the sequence of assortments generated by the StaticMNL

algorithm under the estimated utilities will be exactly the same as the output under the true mean

utilities.

Consider the event that∣∣∣θi(E)− θi(E)

∣∣∣ ≤ ξ and∣∣∣IE(i, j)− I(i, j)

∣∣∣ ≤ ξ for all E ∈ E and for all

{i, j} ⊆ E. Theorem 6 tells us that, after observing the selections from T independent customers,

this event will happen with a very high probability of at least 1 − O(e−ξ

4T)

. When this event

happens, it follows that the ordering of the intersection points based on IE(i, j) and the ordering

of the mean utilities based on θi(E) will coincide with the true orderings. From Proposition 2, we

know that when this happens, we are guaranteed that the outputs of the StaticMNL algorithm

– using the estimated utilities as inputs – will be exactly the same as if we had use the true mean

utilities as inputs. This result is summarized in the following corollary.

Before we proceed to the statement of the corollary, let us first introduce the following notation.

Let A = 〈A0, A1, . . . , AK〉 denote the sequence of assortments generated by the StaticMNL under

the true (yet unknown) mean utilities. Suppose that each assortment in E is offered to T indepen-

dent customers. Let A(T ) =⟨A0(T ), A1(T ), . . . , AK′(T )

⟩denote the output of the StaticMNL

algorithm when we use as inputs the estimated utilities based on the observed selections of the T

customers. Note that A(T ) is a random variable whose outcome depends on the selections of the

customers.

Corollary 7. Suppose we use the estimated utilities computed from the observations of T inde-

pendent customers as inputs into the StaticMNL algorithm. Then, under Assumptions 1 and 2,

with probability at least 1 − O(e−ξ

4T)

, the output of the StaticMNL algorithm (with estimated

21

utilities as inputs) will be exactly the same as the output under the true mean utilities, that is,

Pr{A(T ) = A

}≥ 1− 40

(N

C

)2(a1

ξ2lna1

ξ2

)C+1

e−a2ξ4T ,

where the constants a1 and a2 are given in Theorem 6, and the parameter ξ is defined in Equation

(6).

We note that the right hand side of the inequality in Corollary 7 increases to one as the number

of customers T increases. As before, we emphasize that the bound in the above theorem represents

the worst-case error bound, and it provides only a qualitative understanding of how the errors

behave as the number of customers increases. The constants appearing the above theorem are too

large to be of practical values. In practice, we observe that our estimates converge significantly

faster than what is predicted by the above theorem.

4. Dynamic Optimization

To develop an algorithm for dynamic assortment optimization, we will integrate the following

results: our polynomial-time algorithm for the static optimization problem from Section 2.2, the

structural insight from Sections 2.3 and 2.4, and the parameter estimation technique from the

previous section. We now provide an overview of our proposed adaptive policy. A more detailed

description of the algorithm and its analysis are given in Section 4.2. To simplify our discussion, we

will assume throughout this section that in each period, we offer an assortment to a single customer

and observe his or her selection. We introduce this assumption primarily to simplify our exposition.

Our analysis easily extends to settings where a single assortment is offered to multiple customers.

Our policy operates in cycles. Each cycle t ≥ 1 consists of an exploration phase followed by an

exploitation phase. During the exploration, we offer each assortment E ∈ E to a single customer

and observe his or her selection. At the end of the tth cycle, we will have a total of t observations

from the past t exploration phases associated with each assortment E ∈ E . We then compute the

maximum likelihood estimates based on these observations, and determine the estimated ordering

of the mean utilities and the estimated ordering of the intersection points.

We then begin the exploitation phase of cycle t by applying the StaticMNL algorithm, us-

ing the estimated orderings as inputs. The StaticMNL algorithm will generate a sequence of

assortments At =⟨At1, . . . , A

tLt

⟩as an output with Lt = O(N2). By Corollary 7, we know that

with a very high probability, the sequence At will contain the true optimal assortment S∗C with

f(S∗C) = Z∗ (see Equation (1)).

We cannot, however, evaluate the profit function f(·) directly because we do not know the

true mean utilities µj . However, we can view each assortment Atk in the collection At as an arm

22

in a multiarmed bandit problem (with a total of O(N2) arms), and apply generic all-purpose

multiarmed bandit algorithms (for example, Lai and Robbins (1985) and Auer et al. (2002)) to

find the assortment that gives the maximum expected profit. Since the number of arms Lt is only

O(N2), these generic multiarmed bandit algorithms will quickly identify the optimal arm.

By exploiting the unimodality of the sequence of profits (Proposition 4), we can identify the

optimal assortment even more quickly by using a sampling-based version of the classic Golden Ratio

Search for unimodal function. During our experiments, we found that the sampling-based golden

ratio search performs significantly better than the traditional multiarmed bandit algorithms. All of

the numerical experiments in Section 6 are conducted exclusively using the sampling-based golden

ratio search. In Section 4.1, we provide a more detailed comparison between the efficiency of the

bandit algorithms and the sampling-based golden ratio search.

4.1 Sampling-Based Golden Ratio Search Algorithm for Finding the Optimal Assort-

ment Without Knowing the Profit Function

Motivated by the discussion from the previous section, we consider the problem of finding the

optimal assortment from a sequence of assortments D1, . . . , DL. In our adaptive algorithm, this

sequence of assortments will correspond to the output of the StaticMNL algorithm, with our

estimates of the mean utilities as its inputs. If we can evaluate the profit function f(·) exactly,

this is a simple search problem. Of course, we cannot compute f(·) exactly since we do know the

true mean utilities. For each assortment S, we can only obtain an estimate of f(S) by offering the

assortment to a customer and observing her selection.

The problem of finding an optimal assortment when the profit function is unknown is an instance

of a well-known multiarmed bandit problem (see, for example, Lai and Robbins (1985); Auer et al.

(2002) for more details). Given the assortments D1, . . . , DL and a time horizon T , we wish to find

a strategy φ = (j1, j2, . . . jT ) for trying the different assortments in order to maximize the total

expected profit∑T

t=1E[f(Djt)]

over T periods. In this case, jt ∈ {1, . . . , L} denotes the index of

the assortment chosen in period t and it can depend only on the observed selections in the previous

t− 1 periods.

By formulating our problem as an instance of the multiarmed bandit, we can apply many of

the standard algorithms that have been developed for this problem. One such algorithm is the

UCB1 algorithm introduced by Auer et al. (2002). The algorithm maintains an average reward

associated with each assortment. In each period, the algorithm computes an index associated with

each assortment that corresponds to the sum of the average profit and an additional term that

reflects how often the assortment has been tried. Then, the algorithm chooses the assortment with

23

the highest index. Auer et al. (2002) establish the following lower bound on total expected profit

earned after T periods under the UCB1 algorithm:

E

[T∑t=1

f(Djt)

]≥(

max1≤i≤L

f(Di))

T − Fucb1 −Hucb1 lnT, (7)

where Fucb1 = 5L (max`w`) (max1≤`≤L ∆`) and Hucb1 = 8L(max`w`) max`:∆`>0 1/∆`, and for each

` = 1, . . . , L, ∆` = maxi=1,...,L f(Di)− f(D`).

The above result shows that the average expected profit under the UCB1 algorithm, 1T

∑Tt=1

E[f(Djt)]

, converges to the maximum expected profits at the rate of O (log T/T ). However, we

note that the constants Fucb1 and Gucb1 that appear in the performance bound for the UCB1

algorithm increase linearly with the number of assortments L. For our application, the number of

assortments generated by the StaticMNL algorithm is of order C(N + C − 1) by Proposition 3.

Thus, the performance and the error associated with the UCB1 algorithm will scale linearly with

the number of products N .

By exploiting the unimodality of the sequence of expected profits (Proposition 4), we can

develop a search algorithm whose error scales only with log(N) instead of N . When the number

of products N is quite large, this can yield a significant improvement in the performance of our

algorithm. Recall from Proposition 4 that given the sequence of the assortments 〈A0, A1, . . . , AK〉

generated by the StaticMNL algorithm, the sequence of profits 〈f (A0) , f (A1) , . . . , f (AK)〉 is

unimodal. If we can evaluate f(Ai)’s exactly, we can apply the classical Golden Ratio or Fibonacci

search algorithms (see, for example, Press et al. (1999)), which will find the maximum in O (logK)

steps.

For each assortment S, although we cannot evaluate f(S) exactly, we can offer the assortment

S to a sample of customers and compute the average profits from the resulting sales. If the number

of customers is sufficiently large, then the average profit should serve as a good proxy for the true

expected value f(S). This suggests that we can adapt the classical Golden Ratio search to our

setting by replacing the actual expected profit with the sample average. The resulting algorithm,

which we refer to as the Sampling-Based Golden Ratio Search algorithm, is given below.

We note that the algorithm requires as its input the minimum number of samples WGRS for each

assortment. This represents the minimum number of customers to whom we must have offered an

assortment and observe their selections.

Sampling-Based Golden Ratio Search

INPUTS: A unimodal sequence of assortments⟨D1, D2, . . . , DL

⟩, a time horizon T , and a minimum

number of samples WGRS for each assortment.

24

INITIALIZATION: We perform the standard Golden Ratio search. Whenever we need to compare

the values of two assortments Ds and Dt, we check to see if each assortment has been offered to at

least WGRS independent customers. If not, then offer each of these assortments to the customers

until we have at least WGRS observations for each assortment. If we have enough data, compare

the two assortments based on the average profits obtained and proceed as in the classical Golden

Search algorithm.

DESCRIPTION: At the end of the initialization phase, we are left with a single assortment, offer

that assortment until the end of the horizon.

The idea of using the sample average as an approximation of the true expectation has been

applied in many applications (see, for example, Shapiro (2003), Levi et al. (2005), and Swamy

and Shmoys (2005) for notable successes). We present the algorithm and its analysis primarily

for the sake of completeness. We will use this algorithm in the next section when we present an

adaptive policy for generating a sequence of assortments. The following proposition establishes a

performance bound associated with our sampling-based golden ratio research; the proof appears in

Appendix E. Let a sequence of assortments D1, . . . , DL and a time horizon T be given. Let β =

mini,j:f(Di)6=f(Dj)

∣∣f(Di)− f(Dj)∣∣, and let

{Djt : 1 ≤ t ≤ T

}denote the sequence of assortments

chosen by the Sampling-Based Golden Ratio Search algorithm.

Proposition 8. If the associated sequence of profits f(D1), . . . , f(DL) is unimodal and the number

of samples WGRS associated with each assortment is chosen so that WGRS =⌈4 (lnT ) (max`w`) /β2

⌉,

then the total expected profit after T periods is given by

E

[T∑t=1

f(Djt)]≥(

maxi=1,...,L

f(Di))T − Fg −Hg lnT,

where the constants Fg and Hg are given by:

Fg = 13(

max`w`

)ln(2L) and Hg =

34 (max`w`)2 ln(2L)

β2.

Let us contrast the performance of our Sampling-Based Golden Ratio Search with the

traditional multiarmed bandit UCB1 algorithm given in Equation (7). From Proposition 8, we

observe that average expected profit over T periods under the golden ratio search converges to

the optimal expected profit at the rate of O(log T/T ) as does the UCB1 algorithm. However,

the constants Fg and Hg appearing in the performance bound increase only with logarithm of the

number of assortments, as opposed to linear dependence in the UCB1 algorithm. In our application,

this translates to a dependence of log(NC) in the error bound. When the number of products N

is large, this can result in significant improvements in performance. In our numerical experiments,

we adaptively adjust the value of β using our current estimate of the mean utilities.

25

4.2 An Adaptive Algorithm for Joint Parameter Estimation and Assortment Opti-

mization

We now provide a detailed description of our dynamic policy for joint parameter estimation and as-

sortment optimization, which we refer to as the Adaptive Assortment algorithm. As mentioned

at the beginning of Section 4, our algorithm proceeds in cycles; each cycle consists of an exploration

phase followed by an exploitation phase. In the exploration phase, we refine our estimates of the

mean utilities by offering a collection of assortments to customers, observing their selections, and

estimating the utilities from the sales data. Then, in the exploitation phase, we generate a sequence

of assortments by applying the StaticMNL algorithm from Section 2.2, using our estimates of the

mean utilities as a proxy for the true values. Given a sequence of assortments outputted by Stat-

icMNL algorithm, we then search for the one that gives the highest expected profits using the

Sampling-Based Golden Ratio Search algorithm introduced in the previous section. Once

we identify the “optimal” assortment, we offer it to a pre-specified number of customers to complete

the exploitation phase. A formal description of the algorithm is given below. As we mentioned

earlier, we will assume for simplicity that in each period, we offer a single assortment to a single

customer.

Adaptive Assortment

INPUTS: A vector of marginal profits w = (w1, . . . , wN ) and a fixed capacity C. For any t ≥ 1,

let Vt denote the number of periods associated with exploitation phase of the tth cycle.

INITIALIZATION: Construct a collection of assortments E so that for each E ∈ E , |E| = C and

for each i 6= j, there exists E ∈ E such that {i, j} ∈ E (see Equation (4) for more details).

DESCRIPTION: The algorithm operates in cycle. Each cycle t ≥ 1 consists of two phases that

are defined as follows:

• Phase 1 (Exploration): This phase will consist of a total of |E| periods. For each assortment

E ∈ E , we offer it to a single customer and let Ct(E) ∈ {0} ∪E denote the selection made by

the customer.

Let µt (E) =(µti(E) : i ∈ E

)denote the maximum likelihood estimate of the mean utili-

ties based on the selections C1(E), . . . , Ct(E) made in Phase 1 of the previous t cycles (see

Equation (5)). Also, let θt(E) =(θti(E) : i ∈ E

)denote the estimated selection probabilities.

For any i 6= j, find E ∈ E such that {i, j} ⊆ E and compute the estimated intersection point

ItE(i, j) based on the estimated selection probabilities θt(E). Also, estimate the ordering

between µi and µj based on θti(E) and θtj(E).

26

Let Dt ={(it`, j

t`

): 0 ≤ it` < jt` ≤ N

}denote an ordering of the estimated intersection points

ItE(i, j)’s, and let σt denote the ordering of the estimated utilities.

• Phase 2 (Exploitation): Apply the StaticMNL algorithm from Section 2.2 using as inputs

the estimated orderings Dt and σt based on our estimated utilities computed from the

exploration phase. Let At =⟨At1, . . . , A

tLt

⟩denote the sequence of assortments generated

by the StaticMNL algorithm. Using the collection of assortments At as inputs, apply the

Sampling-Based Golden Ratio Search algorithm for the next Vt periods.

OUTPUTS: Let{Zti : t ≥ 1, 1 ≤ i ≤ |E|+ Vt

}denote the sequence of profits generated, where Zti

denote the profit in period i of the tth cycle.

Let w∗ = max`=1,...,N w`. The following theorem establishes an upper bound on the difference

between the optimal expected profit Z∗ and the average expected profit under the Adaptive

Assortment algorithm for any choice of cycle length Vt.

Theorem 9. Under Assumptions 1 and 2, for any M ≥ 1, the difference between the maximum

expected profit Z∗ and the average expected profit per period after M cycles under the Adaptive

Assortment algorithm is given by

Z∗ −∑M

t=1

∑|E|+Vt

i=1 E[Zti]

|E|M +∑M

t=1 Vt≤ A1∑M

t=1 Vt+

A2M∑Mt=1 Vt

+A3∑M

t=1 lnVt∑Mt=1 Vt

,

where

A1 = 40w∗(N

C

)2(a1

ξ2lna1

ξ2

)C+1 ∞∑t=1

Vte−a2ξ4t and A2 = 5

(N

C

)2

w∗ + Fg and A3 = Hg,

where the constants Fg and Hg are given in Proposition 8 while the constants a1 and a2 are given

in Theorem 6, and ξ is defined in Equation (6).

As an immediate corollary of the above theorem, we can choose a cycle length Vt so that

Vt =⌊exp

{(a2ξ

4 − ε)· t}⌋

for each t for some 0 < ε < a2ξ4. Then, we have that

∞∑t=1

Vte−a2ξ4t ≤

∞∑t=1

e−εt ≤ 11− e−ε

,

which implies that A1 <∞. Moreover,∑Mt=1 lnVt∑Mt=1 Vt

≤∑M

t=1

(a2ξ

4 − ε)t∑M

t=1 exp {(a2ξ4 − ε) t}≤

2(a2ξ

4 − ε)M2∫M

0 exp {(a2ξ4 − ε) t} dt

=2(a2ξ

4 − ε)2M2

exp {(a2ξ4 − ε)M} − 1= O

(

log(∑M

t=1 Vt

))2

∑Mt=1 Vt

,

27

which implies that, for large values of T , the average expected profit of our Adaptive Assortment

algorithm after T periods converges to Z∗ at the rate of O(

(log T )2 /T)

. This result is summarized

in the following corollary.

Corollary 10. If we choose the cycle length Vt so that Vt = O(exp

{(a2ξ

4 − ε)· t})

for all t, then

the average expected profit after T periods under the Adaptive Assortment algorithm converges

to Z∗ at the rate of O(

(log T )2 /T)

.

We are now ready to give a proof of Theorem 9.

Proof. Let A = 〈A0, A1, . . . , AK〉 denote the sequence assortments generated by the StaticMNL

algorithm under the true (yet unknown) mean utilities. Recall that At =⟨At1, . . . , A

tLt

⟩denote the

output of the StaticMNL algorithm when the inputs are the estimated mean utilities computed

from the observations of t independent customers at the end of the tth exploration phase. The

sequence At is a random variable that depends on the customer selections. It follows immediately

from Corollary 7 that

Pr{At = A

}≥ 1− 40

(N

C

)2(a1

ξ2lna1

ξ2

)C+1

e−a2ξ4t .

Since the profit earned in any period is always nonnegative, for each cycle t = 1, 2, . . . ,M ,

|E|+Vt∑i=1

E[Zti]≥ P

{At = A

}E

|E|+Vt∑i=|E|+1

Zti

∣∣∣∣∣ At = A

≥ P {At = A}

(VtZ∗ − Fg −Hg lnVt) ,

where the final inequality follows from the fact that when the event At = A occurs, the sequence of

profits is unimodal by Proposition 4 and the performance guarantee of the Sampling-Based Golden

Ration Search (Proposition 8) holds.

Using the lower bound on Pr{At = A

}, we obtain

|E|+Vt∑i=1

E[Zti]≥ VtZ

∗ − Fg −Hg lnVt − 40(N

C

)2(a1

ξ2lna1

ξ2

)C+1

VtZ∗e−a2ξ4t

= (Vt + |E|)Z∗ − |E|Z∗ − Fg −Hg lnVt − 40w∗(N

C

)2(a1

ξ2lna1

ξ2

)C+1

Vte−a2ξ4t

≥ (Vt + |E|)Z∗ − 40w∗(N

C

)2(a1

ξ2lna1

ξ2

)C+1

Vte−a2ξ4t − 5

(N

C

)2

w∗ − Fg −Hg lnVt

where the last inequality follows from the fact that Z∗ ≤ w∗. Therefore, the average expected profit

per period after M cycles is given by∑Mt=1

∑|E|+Vt

i=1 E[Zti]

|E|M +∑M

t=1 Vt≥ Z∗ − A1∑M

t=1 Vt− A2M∑M

t=1 Vt−A3∑M

t=1 lnVt∑Mt=1 Vt

which is the desired result.

28

5. Extensions and Future Research

In this section, we discuss extensions of results from the previous section and additional ideas for

improving the performance of our proposed Adaptive Assortment algorithm. These extensions

will be incorporated in our implementation of the algorithm in the next section.

5.1 Allowing Linear-In-Parameters Utility Expression

Thus far, our model assumes that the mean utility µi’s are independent and must be estimated

individually for each product. In many applications, we often assume that the mean utility µi is

given by a linear combination of the features associated with product i, that is, for i = 1, 2, . . . , N ,

µi =K∑`=1

β`φi,`,

where for each i = 1, . . . , N , φi = (φi,1, . . . , φi,K) ∈ RK is a K-dimensional vector that represents

the features of the ith product. Examples of product features might include its price, customer

reviews, or brands. We assume the feature vectors φ1, . . . , φN are known in advance and only need

to estimate the coefficients β1, . . . , βK . When K ≤ C, all of the results that we have developed in

Sections 3 and 4 easily extend to this setting. We omit the details due to space constraints.

5.2 Allowing For the Same Utilities (Relaxing Assumption 1)

Recall that Assumption 1 requires that the utilities of all products and the corresponding inter-

section points are distinct. As mentioned before, this assumption is used primarily to simplify our

discussion and to avoid degenerate cases. We emphasize that the correctness of the StaticMNL

algorithm in Section 2.2 does not depend on Assumption 1. Also, by slightly perturbing the utility

value or the marginal profit of each product, we can always guarantee that the problem instance

satisfies this assumption.

5.3 Removing Bounds on the Utilities (Relaxing Assumption 2)

Assumption 2 requires that we know upper and lower bounds on the true mean utilities µi’s.

The reason for this assumption is purely theoretical. The sampling complexity bound derived in

Theorems 5 and 6 requires that our likelihood function is bounded. In practice, this is almost never

an issue since the likelihood function in our problem is strictly concave, and thus, the maximum

is guaranteed to exist. By using a more complex argument and exploiting the concavity of the

likelihood function, it is possible to relax Assumption 2. For more details, the reader is referred to

Niemiro (1992).

29

6. Numerical Experiments

We describe the numerical experiments that we perform to validate our proposed algorithms. In

Section 6.1, we describe our motivation, the dataset used in our experiment, and our model of the

mean utilities. Then, in Section 6.2, we consider the static CAPACITATED MNL optimization

problem, and contrast the optimal assortment with the assortment computed under the greedy

heuristic of Gallego et al. (2004) and Liu and Van Ryzin (2004). Then, in Section 6.3, we assume

the mean utilities are unknown and apply our proposed Adaptive Assortment algorithm.

6.1 Motivation, Dataset, and Model

Before we can evaluate the performance of both the StaticMNL and Adaptive Assortment

algorithms, we need to identify a set of products and specify their mean utilities. To help us

understand the range of utility values that we might encounter in actual applications, we estimate

the utilities using data on DVD sales at a large online retailer. We consider DVDs that are sold

during a three-month period from July 1, 2005 through September 30, 2005. During this period,

the retailer sold over 4.3 million DVDs, spanning across 51,764 DVD titles.

To simplify our analysis, we restrict our attention to customers who have purchased DVDs

that account for the top 33% of the total sales, and assume that each customer purchases at most

one DVD. This gives us total of 1, 409, 261 customers in our dataset. We consider 200 products,

corresponding to the 200 best-selling DVDs. These 200 DVDs account for about 65% of the total

sales among our customers. We observe that the best-selling selling DVD in our dataset was

purchased by only about 2.6% of the customers. In fact, among the top ten best-selling DVDs,

each one was sold to only around 1.1% - 2.6% of the customers. Thus, only a small fraction of the

customers purchased each DVD.

We assume that the mean utility of each DVD is a linear combination of its attributes; that is,

for each i = 1, 2, . . . , 200, the mean utility µi of the ith DVD is given by: µi = β0 +∑K

`=1 β`φi,`,

where φi = (φi,1, . . . , φi,K) represents the features of the ith DVD. We set the utility associated

with the option of no purchase to 0 (µ0 = 0). The attributes of each DVD that we consider are the

average selling price and customer reviews. We obtain data on customer reviews of each DVD from

Amazon.com web site through a publicly available interface via Amazon.com E-Commerce Services

(http://aws.amazon.com). Each visitor at the Amazon.com web site can provide a review and

a rating for each DVD. The rating is on a scale of 1 to 5, with 5 representing the most favorable

review. Each review can be voted by other visitors as either “helpful” or “not helpful”. For each

DVD, we consider all reviews up until June 30, 2005, and compute features such as the average

30

rating, the proportion of reviews that give a 5 rating, and the average number of helpful votes

received by each review, and so on.

We then perform statistical model selection and parameter estimation to determine the most

relevant attributes and the corresponding coefficients. We use the software BIOGEME developed

by Bierlaire (2003) to estimate the coefficients and perform model selections. The two most relevant

attributes are the total number of votes received by the reviews of the product and the average

number of “helpful” votes per review. We estimate that for each DVD i = 1, . . . , 200,

µi = −4.82 +(3.74× 10−5

)× φ1,i +

(5.74× 10−3

)× φ2,i (8)

where

φ1,i = Total Number of Votes Received by All Reviews of DVD i,

φ2,i = Average Number of Helpful Votes Received by Each Review of DVD i

All estimated coefficients (−4.82, 3.74×10−5, and 0.00574) are statistically significant with p-values

of 0.00, 0.03, and 0.06, respectively. We also checked for any correlation between the two product

features, and found them to have statistically no correlation. Figure 4 shows a histogram of the

estimated utility across the 200 best-selling DVDs, along with descriptive statistics. The average

utility is approximately -4.67, with a maximum of -3.55 and a minimum of -4.72. We note that

since the utility of the option of no purchase is set to 0 and the fraction of customers who purchase

any one DVD is at most 2.6%, the mean utility of each DVD will be negative.

Histogram of the Estimated Utility for 200 Best-Selling DVDs

0%

5%

10%

15%

20%

25%

30%

35%

< -4.80 -4.75 -4.7 -4.65 -4.6 -4.55 -4.5 -4.45 -4.4 > 4.40

Estimated Utility

% o

f Tot

al D

VDs

Average = -4.67Std Dev = 0.18Max = -3.55Min = -4.82Median = -4.72

Figure 4: Histogram of the estimated utility based on Equation (8) for the 200 best-selling DVDs.

31

In our dataset, it turns out that the average selling price of each DVD is not a significant

factor in determining customer purchases. We hypothesize that this occurs because for the 200

best-selling DVDs, the customers are not price sensitive and the sales volumes are driven primarily

by customer reviews. Validating this hypothesis remains an open research problem.

We emphasize that our model of the mean utility is quite simplistic and it is unlikely to capture

details and factors that affect each customer’s purchasing decision. Rather, our goal is to obtain

a rough estimate of the range of utilities values that one might encounter in actual applications.

Developing a more sophisticated choice model is a subject of ongoing research and is beyond the

scope of this paper.

6.2 Static Optimization

Using the estimated utility of each DVD computed from our sales data (see Equation (8)), we

first consider the optimal assortment under the StaticMNL algorithm. In this setting, we set

the marginal profit wi’s of each DVD to the average selling price during the three-month period

from which we collect the sales data. We then contrast the result with the assortment computed

under the greedy algorithm of Gallego et al. (2004) and Liu and Van Ryzin (2004), which simply

picks among the top C most expensive DVDs. Table 2 contrasts the optimal assortment computed

by the StaticMNL and the greedy algorithms when we set the capacity to 10. In this case, the

“greedy assortment” corresponds to the 10 most expensive DVDs. The optimal assortment yields

about 14% (= (7.31-6.43)/6.43) increase in the expected profit.

We note that the first seven DVDs in the optimal and the greedy assortments coincide. The

remaining three DVDs in the optimal assortment consist of Star Wars Trilogy, Fraggle Rock (Season

One), and The Muppet Show (Season One). Although their selling prices are less than the 10 most

expensive DVDs, they are quite popular and have high estimated utilities. The Star Wars Trilogy

DVD had over 7,300 reviews and these reviews received over 32,000 votes, with an average of

8.74 helpful votes per review. This represents the largest number of votes received by any DVD

in our dataset. Fraggle Rock and The Muppet Show receive significantly less total votes (only

about 487 and 901, respectively). However, a very large number of customers considered these

reviews to be quite helpful, and these two DVDs garnered an average of 157 and 191 helpful votes

per review, representing the two highest average number of helpful votes across all DVDs. These

factors contribute to the selection of these three DVDs as part of the optimal assortment.

Figure 5 shows the characteristics of the optimal assortment as the capacity C increases from 1

to 20. For each value of C, the greedy assortment corresponds to the C most expensive DVDs. The

solid-circle line in Figure 5 shows the improvement in the expected profit (relative to the greedy

32

Greedy Assortment (Top 10 Most Expensive DVDs): Expected Profit = $6.43

Title Estimated Average Selling

Utility Price ($)

24 - Seasons 1-3 -4.74 115.49

The Boston Red Sox 2004 World Series Collector’s Edition -4.74 92.03

Star Trek Enterprise - The Complete Second Season -4.70 91.67

Band of Brothers -4.47 79.35

The Lord of the Rings - The Motion Picture Trilogy -4.42 77.94

Six Feet Under - The Complete Fourth Season -4.71 70.12

The Sopranos - The Complete Fifth Season -4.70 64.97

Shelley Duvall’s Faerie Tale Theatre -The Complete Collection Gift Set -4.62 49.95

The O.C. - The Complete Second Season -4.73 48.97

Thundercats - Season One, Volume One -4.73 46.12

Optimal Assortment: Expected Profit = $7.31

Title Estimated Average Selling

Utility Price ($)

24 - Seasons 1-3 -4.74 115.49

The Boston Red Sox 2004 World Series Collector’s Edition -4.74 92.03

Star Trek Enterprise - The Complete Second Season -4.70 91.67

Band of Brothers -4.47 79.35

The Lord of the Rings - The Motion Picture Trilogy -4.42 77.94

Six Feet Under - The Complete Fourth Season -4.71 70.12

The Sopranos - The Complete Fifth Season -4.70 64.97

Star Wars Trilogy -3.55 45.45

Fraggle Rock - Complete First Season -3.90 31.49

The Muppet Show - Season One -3.69 27.99

Table 2: Expected profits under the greedy and the optimal assortments.

assortment) as the capacity C increases. For small values of C, the increases in expected profit can

be quite significant. The dash-diamond line in the figure shows the number of DVDs in the optimal

assortments that are not in the greedy assortment.

6.3 Dynamic Optimization

In this section, we assume that the coefficients in underlying utility model in Equation (8) are

not known in advance, and we must adaptively estimate the coefficients β0, β1 and β2 based on

observed sales and assortment decisions. Figure 6 shows the running average expected profit over

time under our proposed adaptive policy, when averaged over 100 problem instances each with a

capacity of 10. In this experiment, we assume that each period consists of a single customer and use

the Sampling-Based Golden Ratio Search algorithm introduced in Section 4.1. We observe that the

average expected profit approaches 90% of the optimal profit after we have offered assortments to

about 30,000 customers, representing approximately two days of total sales (since we have 1, 409, 261

customers over a three-month period). In fact, the running average profit surpasses the expected

profit of the greedy assortment after about one day of sales.

33

Chart1

Page 1

Characteristics of the Optimal Assortment as the Capacity Increases

0.0%

5.0%

10.0%

15.0%

20.0%

25.0%

30.0%

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Capacity

Impr

ovem

ent i

n Pr

ofit

Rel

ativ

e to

the

Gre

edy

Ass

ortm

ent (

%)

0

1

2

3

4

5

6

7

Num

ber o

f DVD

s in

the

Opt

imal

A

ssor

tmen

t Tha

t are

NO

T in

the

Gre

edy

Ass

ortm

ent

Improvement in Expected Profit (Relative to the Greedy Assortment)Number of DVDs in the Optimal Assorment That are NOT in the Greedy Assortment

Figure 5: Improvements in the expected profit (relative to the greedy assortment) as the capacity increases,

and the number of DVDs in the optimal assortment that do not appear in the greedy assortment.

7. Conclusion

We study a dynamic assortment optimization problem under a multinomial logit choice model

and capacity constraints. We present an adaptive algorithm that combines parameter estimation

and assortment optimization, and analyze its convergence properties. Numerical experiments on

sales data from an online retailer indicate that the proposed policy performs well. There are several

interesting future research directions. For example, it would be interesting to consider more realistic

choice models. It is also interesting to explore how we can incorporate additional constraints in the

problem formulation, such as the total capital investment or inventory constraints.

Acknowledgement

The authors would like to thank Candi Yano for her helpful comments and suggestions on the

earlier draft of our manuscript, which inspires us to consider the multinomial logit choice model.

We are grateful for all of her help. We would like to thank Mike Todd for insightful comments on

linear algebra and for pointers to implementations of the Golden Ratio Search and Newton-Raphson

algorithms, and Yinyu Ye for helpful discussions on the problem. We also would like to thank Miao

Song for all of her help with the BIOGEME software. This research is supported in part by the

National Science Foundation through grants DMS-0732196, CMMI-0621433, and CMMI-0727640.

34

Run. Avg. Expected Profit Under the Adaptive Assortment Algorithmfor 200 Best Selling DVDs, Capacity = 10, 100 Problem Instances

(dash lines around the solid line represent 95% confidence interval)

85.0%

86.0%

87.0%

88.0%

89.0%

90.0%

91.0%

92.0%

93.0%

94.0%

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Number of Customers (thousands)

Run

. Avg

. Pro

fit (%

of O

ptim

al)

Expected Profit Under Greedy Assortment

Avg. # of Customers Per Day ≈ 15,000

Figure 6: Running average expected profit under the Adaptive Assortment algorithm.

References

Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multi-armed banditproblem. Advances in Applied Probability 27:1054–1078.

Alexouda, G., and K. Paparizos. 2001. A Genetic Algorithm Approach to the Product Line Design ProblemUsing the Seller’s Return Criterion: An Extensive Comparative Computational Study. European Journalof Operational Research 134:165–178.

Anantharam, V., P. Varaiya, and J. Walrand. 1987a. Asymptotically efficient adaptive allocation rules for themulti-armed bandit problem with multiple plays, Part I: I.I.D. rewards. IEEE Transactions on AutomaticControl AC-32 (11): 968–976.

Anantharam, V., P. Varaiya, and J. Walrand. 1987b. Asymptotically efficient adaptive allocation rules forthe multi-armed bandit problem with multiple plays, Part II : Markovian rewards. IEEE Transactions onAutomatic Control AC-32 (11): 977–982.

Anderson, S., A. de Palma, and J. F. Thisse. 1992. Discrete choice theory of product differentiation. MITPress.

Auer, P., N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. 1995. Gambling in a rigged casino: The adversarialmulti-armed bandit problem. In Proceedings of the 6th Annual ACM-SIAM Symposium on DiscreteAlgorithms.

Auer, P., N. Cesa-Bianchi, and F. P. 2002. Finite-time analysis of the multiarmed bandit problem. MachineLearning 47:235–256.

Ben-Akiva, M., and S. Lerman. 1985. Discrete choice analysis: Theory and application to travel demand.MIT Press.

35

Berry, D. A., R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp. 1997. Bandit Problems With InfinitelyMany Arms. The Annals of Statistics 25 (5): 2103–2116.

Bierlaire, M. 2003. BIOGEME: A free package for the estimation of discrete choice models. In Proceedingsof the 3rd Swiss Transportation Research Conference.

Bultez, A., E. Gijsbrechts, P. Naert, and P. vanden Abeele. 1989. Asymmetric Cannibalism in RetailAssortments. Journal of Retailing 65 (2): 153–192.

Bultez, A., and P. Naert. 1988. SHARP: Shelf Allocation for Retailers Profit. Marketing Science 7:211–231.

Cachon, G., C. Terwiesch, and Y. Xu. 2005. Assortment Planning in the Presence of Consumer Search.Manufacturing and Service Operations Management 7:330–346.

Caro, F., and J. Gallien. 2006. Dynamic Assortment with Demand Learning for Seasonal Consumer Goods.To appear in Management Science.

Chen, K. D., and W. H. Hausman. 2001. Technical Note - Mathematical Properties of the Optimal ProductLine Selection Problem Using Choice-Based Conjoint Analysis. Management Science 46 (2): 327–332.

Chong, J. K., T. H. Ho, and C. S. Tang. 2001. A modeling framework for category assortment planning.Manufacturing Service Operation Management 3 (3): 191–210.

Corstjens, M., and P. Doyle. 1981. A Model for Optimizing Retail Space Allocations. Journal of Market-ing 27:822–833.

Corstjens, M., and P. Doyle. 1983. A Dynamic Model for Strategically Allocating Retail Space. Journal ofthe Operational Research Society 34 (10): 943–951.

Cover, T. M., and J. A. Thomas. 2006. Elements of information theory. Wiley.

Dobson, G., and S. Kalish. 1993. Heuristics for Pricing and Positioning a Product Line Using Conjoint andCost Data. Management Science 39:160–175.

Dreze, X., S. J. Hoch, and M. E. Purk. 1994. Shelf Management and Space Elasticity. Journal of Retail-ing 70:301–326.

Eckstein, Z., and K. I. Wolpin. 1989. The Specification and Estimation of Dynamic Stochastic DiscreteChoice Models: A Survey. The Journal of Human Resources 24 (4): 562–598.

Fligler, A., G. E. Fruchter, and R. Winer. 2006. Optimal Product Line Design: A Genetic AlgorithmApproach to Mitigate Cannibalization. To appear in Journal of Optimization Theory and Applications.

Gallego, G., G. Iyengar, R. Phillips, and A. Dubey. 2004. Managing Flexible Products on a Network. WorkingPaper.

Gaur, V., and D. Honhon. 2006. Assortment Planning and Inventory Decisions Under a Locational ChoiceModel. Management Science 52 (10): 1528–1543.

Green, P. E., and A. M. Krieger. 1985. Models and Heuristics for Product Line Selection. Marketing Science 4(1): 1–19.

Haussler, D. 1992. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other LearningApplications. Information and Computation 100.

36

Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. Journal of the AmericanStatistical Association 58:13–30.

Honhon, D., V. Gaur, and S. Seshadri. 2006. Assortment Planning and Inventory Management UnderDynamic Substitution. working paper.

Hotz, V. J., and R. A. Miller. 1993. Conditional Choice Probabilities and the Estimation of Dynamic Models.The Review of Economic Studies 60 (3): 497–529.

Keane, M. P., and K. I. Wolpin. 1994. The Solution and Estimation of Discrete Choice Dynamic Program-ming Models by Simulation and Interpolation: Monte Carlo Evidence. The Review of Economics andStatistics 76 (4): 648–672.

Kleinberg, R. 2004. Nearly Tight Bounds for the Continuum-Armed Bandit Problem. In Advances in NeuralInformation Processing Systems.

Kleinberg, R. 2005. Online decision problems with large strategy sets. Ph.D. Dissertation, MIT.

Kohli, R., and R. Sukumar. 1990. Heuristics for Product-Line Design Using Conjoint Analysis. ManagementScience 36 (12): 1464–1478.

Kok, A. G., and M. Fisher. 2004. Demand Estimation and Assortment Optimization Under Substitution:Methodology and Application. To appear in Operations Research.

Kok, A. G., M. Fisher, and R. Vaidyanathan. 2006. Assortment planning: Review of literature and industrypractice. In Retail Supply Chain Management: Kluwers.

Lai, T. L. 1987. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. The Annals ofStatistics 15 (3): 1091–1114.

Lai, T. L., and H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances in AppliedMathematics 6:4–22.

Levi, R., R. O. Roundy, and D. B. Shmoys. 2005. Computing Provably Near-Optimal Sampling-BasedPolicies for Stochastic Inventory Control Models. Technical Report 1427, School of OR&IE, CornellUniversity. To appear in MOR.

Liu, Q., and G. Van Ryzin. 2004. On the Choice-Based Linear Programming Model for Network RevenueManagement. To appear in MSOM.

Mahajan, S., and G. V. Ryzin. 2001. Stocking Retail Assortments under Dynamic Substitution. OperationsResearch 49:334–351.

Mahajan, S., and G. van Ryzin. 1998. Retail inventories and consumer choice. In Quantitative Models forSupply Chain Management, ed. S. Tayur, R. Ganeshan, and M. Magazine, 491–551: Kluwer AcademicPublishers.

McBride, R. D., and S. Zufryden. 1988. An Integer Programming Approach To the Optimal Product LineSelection Problem. Marketing Science 7:126–140.

McFadden, D. 1974. Conditional Logit Analysis of Qualitative Choice Behavior. In Frontiers in Econometrics,105–142: Academic Press.

37

Megiddo, N. 1979. Combinatorial Optimization with Rational Objective Functions. Mathematics of Opera-tions Research 4 (4): 414–424.

Nair, S. K., L. S. Thakur, and K. Wen. 1995. Near Optimal Solutions For Product Line Design And Selection:Beam Search Heuristics. Management Science 41 (5): 767–785.

Niemiro, W. 1992. Asymptotics for M-Estimators Defined by Convex Minimization. The Annals of Statis-tics 20 (3): 1514–1533.

O’Connell, P. 2005. Shaping Cars for the Chinese. Business Week . November 21. Available at http:

//www.businessweek.com/magazine/content/05_47/b3960010.htm.

Paap, R., and P. H. Franses. 2000. A Dynamic Multinomial Probit Model for Brand Choice with DifferentLong-Run and Short-Run Effects of Marketing-Mix Variables. Journal of Applied Econometrics 15 (6):717–744.

Press, W. H., S. A. Teukolsky, and W. T. V. et al.. 1999. Numerical recipes in c, the art of scientificcomputing. Cambridge University Press.

Ryzin, G. V., and S. Mahajan. 1999. On the Relationship Between Inventory Costs and Variety Benefits inRetail Assortments. Management Science 45:1496–1509.

Shapiro, A. 2003. Stochastic Programming. In Handbook in Operations Research and Management Science:Elsevier.

Smith, S. 2006. Optimizing Retail Assortments for Diverse Preferences. working paper.

Smith, S. A., and N. Agrawal. 2000. Management of Multi-item Retail Inventories Systems with DemandSubstitution. Operations Research 48:50–64.

Swamy, C., and D. B. Shmoys. 2005. Sampling-based Approximation Algorithms for Multi-stage StochasticOptimization. In Proceedings of the 46th Annual IEEE Symposium on the Foundations of ComputerScience.

Yano, C. A., and G. Dobson. 1998. Profit Optimizing Product Line Design, Selection and Pricing withManufacturing Cost Considerations: A Survey. In Product Variety Management: Research and Advances,ed. T.-H. Ho and C. Tang, 145–176: Kluwer Academic Publisher.

Zimmerman, A. 2006. Thinking Local. To Boost Sale, Wal-Mart Drops One-Size-Fits-All Approach. WallStreet Journal . September 7, 2006.

A. Proof of Proposition 3

The proof of Proposition 3 makes use of the following four lemmas which give us additional insight into theproperties of the sets generated in each iteration of the StaticMNL(v, w,C) algorithm. The first lemma(whose proof appears in Appendix A.1) provides a simple characterization of the ordering among intersectionpoints among any three lines.

Lemma A.1. Under Assumption 1, for all 0 ≤ i < j < k ≤ N , one of the following conditions must hold:either I(i, j) < I(i, k) < I(j, k) or I(j, k) < I(i, k) < I(i, j).

38

The result of Lemma A.1 allows us to establish the following characterization of the sequence of permuta-tions generated by the StaticMNL(v, w,C) algorithm. We note that given a permutation τ : {1, . . . , N} →{1, . . . , N}, x is adjacent to y under τ if

∣∣τ−1(x)− τ−1(y)∣∣ = 1. The proof of this lemma appears in Appendix

A.2.

Lemma A.2. Let⟨σ0, . . . , σK

⟩denote the sequence of permutations generated by the StaticMNL(v, w,C)

algorithm. Then, under Assumption 1, for each t = 1, 2, . . . ,K, either σt = σt−1 or σt is obtained bytransposing two adjacent items under σt−1.

The next lemma shows that if a set Gt (corresponding to the first C elements in the permutation σt) isdistinct from all the previous sets, then one of its elements must be strictly smaller. The proof of this resultappears in Appendix A.3.

Lemma A.3. Let⟨G0, . . . , GK

⟩denote the sequence of sets generated by the StaticMNL(v, w,C) algo-

rithm. Under Assumption 1, for each t = 1, 2, . . . ,K, if Gt 6= Gt−1, then Gt is obtained from Gt−1 bytransposition between jt = σt−1

C and it = σt−1C+1 with 0 < it < jt, that is, Gt =

(Gt−1 \ {jt}

)∪ {it} and∑

`∈Gt ` <∑`∈Gt−1 `.

The next lemma shows that the size of the assortments generated by the StaticMNL algorithm is non-increasing and establishes the number of distinct assortments with sizes less than C. The proof appears inAppendix A.4.

Lemma A.4. Let⟨A0, . . . , AK

⟩denote the sequence of assortments generated by the StaticMNL(v, w,C)

algorithm. Then, under Assumption 1, for each s < C, there is exactly one distinct assortment of size s.

We are now ready to give a proof of Proposition 3.

Proof. Let⟨A0, . . . , AK

⟩denote the sequence of assortments generated by the StaticMNL(v, w,C) algo-

rithm. By Lemma A.4, we know that there are exactly C−1 distinct non-empty assortments of size C−1 orless. Thus, it suffices to count the number of distinct assortments of size C. Note that if At is an assortmentof size C, it must be the case that At = Gt. Therefore, the number of distinct assortments of size C isbounded above by the number of distinct subsets among G0, G1, . . . , GK .

Under Assumption 1, we know that, among h1(·), . . . , hN (·), each of the(N2

)pairs of lines will intersect

each other. Moreover, by Assumption 1, we have that σ0 = (N,N − 1, . . . , 2, 1), corresponding to the orderof hi(λ)’s at λ = −∞, and σK = (1, 2, . . . , N − 1, N) for λ = +∞. Note that∑

`∈G0

`−∑`∈GK

` = {N + (N − 1) + · · ·+ (N − C + 1)} − {1 + 2 + · · ·+ C}

= NC − 2(1 + 2 + · · ·+ C − 1)− C = NC − C(C − 1)− C = C(N − C).

By Lemma A.3, whenever Gt is distinct from Gt−1, the total value∑`∈Gt ` is strictly less than

∑`∈Gt−1 `.

Thus, in addition to G0, there can be at most C(N − C) distinct subsets of Gt’s. Therefore, the number ofdistinct subsets among G0, G1, . . . , GK is at most C(N − C) + 1. Thus, the maximum number of distinctnon-empty assortments is at most C(N −C) + 1 + (C − 1) = C(N −C + 1), which is the desired result.

A.1 Proof of Lemma A.1

39

I(i, k)

hj(·)−−Option2hj(·)−−Option1

Line : hk(·)

Line : hi(·)

I(i, j) I(i, j)

I(j, k)

I(j, k)

Figure 7: A geometric proof of Lemma A.1.

Proof. As shown in Figure 7, the proof of this lemma follows from a simple geometric intuition. By As-sumption 1, we have that vi < vj < vk, which implies that the line hi(·) will intersect with hk(·). Since vj isbetween vi and vk, the are two possible options for line hj(·) as shown in the two dash lines in Figure 7. Inboth cases, we observe that I(i, k) is always between I(i, j) and I(j, k), giving the desired results. A morealgebraic proof is also straightforward. We omit the details due to space constraints.


Proof. We will prove the lemma by induction on t. Consider the case when t = 1. By Assumption 1, weknow that σ0 = (N,N − 1, . . . , 2, 1). We want to show that either σ1 = σ0 or σ1 is obtained by transposingtwo adjacent products. Suppose that σ1 6= σ0. It then follows from the definition that σ1 is obtained fromσ0 by a transposition of i1 and j1, where I (i1, j1) corresponds to the smallest intersection point. Since thisis the first intersection point, it follows from Lemma A.1 that we must have j1 = i1 + 1, which is the desiredresult.

Now, suppose that the result holds for ` = 1, 2, . . . , t; that is, for each s = 1, . . . , t, σs is either equal toσs−1 or is obtained from σs−1 by transposing two adjacent products. We will now establish the result forσt+1 via proof by contradiction.

Suppose on the contrary that σt+1 6= σt and σt+1 is obtained by σt by transposing it and jt, which areNOT adjacent under σt. This implies there is another product m that is between it and jt under σt, thatis, it = σtu, m = σtv, and jt = σtw where either 1 ≤ u < v < w ≤ N or 1 ≤ w < v < u ≤ N .

We first claim that we cannot have 1 ≤ u < v < w ≤ N . The follows because we know from Assumption1(a) that σ0 = (N,N − 1, . . . , 2, 1). If 1 ≤ u < v < w ≤ N holds, this implies that we have σt =(. . . , it, . . . ,m, . . . , jt, . . .) . Since it < jt and the transpositions in all of the previous iterations involveadjacent items by induction, it must be the case that it and jt have switched places once before, that is, wemust have already encountered the intersection point I(it, jt) in the earlier iterations. Contradiction!

So, we have 1 ≤ w < v < u ≤ N , which implies that σt = (. . . , jt, . . . ,m, . . . , it, . . .) . By construction,we know that it < jt. Thus, there are three cases to consider for the value of m.

Case 1: m < it < jt. In this case, m is smaller than it but appears earlier in the ordering σt. Sinceσ0 = (N,N − 1, . . . , 2, 1) and all previous transpositions involve adjacent items by induction, it must be thecase that it and m interchange their positions in the earlier iteration, implying that I(m, it) < I(it, jt). It

40

thus follows from Lemma A.1 that

I(m, it) < I(m, jt) < I(it, jt),

implying that we must have already encountered the intersection point I(m, jt) before this iteration. Byinduction, this means that m and jt should have interchanged their places and we must have m before jt(note that m ≥ 1). This contradicts our definition of σt!

Case 2: it < m < jt. In this case, it follows from Lemma A.1 that either

I(it,m) < I(it, jt) < I(m, jt) OR I(m, jt) < I(it, jt) < I(it,m).

We will show that either one of the above condition will lead to a contradiction. Suppose that I(it,m) <I(it, jt) < I(m, jt). This implies that we must have encountered the intersection point I(it,m) before thecurrent iteration. Therefore, by induction, m and it should have switched places and we must have m afterit in σt. This again contradicts our definition of σt! Now, suppose that I(m, jt) < I(it, jt) < I(it,m).Then, we must have encountered the intersection point I(m, jt) before the current iteration. Therefore, byinduction, m and jt should have switched places and we must have m before jt in σt. This again contradictsour definition of σt!

Case 3: it < jt < m. The proof for this case is similar to Case 1 and we omit the details.

Thus, all three cases lead to contradictions. Therefore, it must be the case that σt+1 is obtained fromσt by transposing two adjacent products. This completes the induction.


Proof. By definition, if Gt 6= Gt−1, it must be the case that we encounter an intersection point I (it, jt)with 0 < it < jt. In this case, σt is obtained from σt−1 by transposition of it and jt. Moreover, we knowfrom Lemma A.2 that it is adjacent to jt. Note that Gt and Gt−1 correspond to the first C elements for theordering σt and σt−1, respectively. Thus, in order for Gt to be different from Gt−1, it must be the case thateither 1) σt−1

C = jt and σt−1C+1 = it, or 2) σt−1

C = it and σt−1C+1 = jt.

We will first show that option 2) is not feasible. Recall that under Assumption 1, we have σ0 =(N,N − 1, . . . , 2, 1). Since it < jt and every transposition only occur between adjacent items (by LemmaA.2), if σt−1

C = it and σt−1C+1 = jt, then it must be the case that it and jt have interchanged places before; that

is, we have already encountered the intersection point I (it, jt) in the earlier iterations. This is a contradiction!Therefore, we can only have σt−1

C = jt and σt−1C+1 = it. In this case, we have Gt =

(Gt−1 \ {jt}

)∪{it}, which

implies that∑`∈Gt `−

∑`∈Gt−1 ` = it − jt < 0, which is the desired result.


Proof. By definition,∣∣A0∣∣ = C. Let θ denote the smallest index such that

∣∣Aθ∣∣ < C. Note that by definition,|As| = C for all 0 ≤ s < θ, which implies that As = Gs. Thus, by Lemma A.3, if As 6= As−1, we must haveAs =

(As−1 \ {jt}

)∪{it}. To complete the proof of Lemma A.4, it suffices to show the following two results:

(a) Aθ = Aθ−1 \ {jθ}(b) For any 1 ≤ t ≤ K, if

∣∣At−1∣∣ < C, then either At = At−1 or At = At−1 \ {jt}

We will first prove part (a). The set Aθ is created when we encounter the θth intersection point I (iθ, jθ).We claim that we must have iθ = 0. Suppose on the contrary that we have 0 < iθ < jθ. In this case,

41

Bθ = Bθ−1. Thus, for∣∣Aθ∣∣ < C =

∣∣Aθ−1∣∣, it must be the case that Gθ 6= Gθ−1. It follows from Lemma

A.3 that Gθ is obtained from Gθ−1 by transposition of jθ = σθ−1C and iθ = σθ−1

C+1 with 0 < iθ < jθ. Since∣∣Aθ∣∣ < C, it follows that Bθ 3 iθ, which implies that we must have already encountered the intersectionI (0, iθ) earlier. By Lemma A.1,

I (0, iθ) < I (0, jθ) < I (iθ, jθ) ,

which implies that the intersection point I(0, jθ) appeared before the current iteration, and thus, Bθ−1 3 jθ,which implies that

∣∣Aθ−1∣∣ < C. Contradiction!

Therefore, iθ = 0. Then, we know that σθ = σθ−1, Gθ = Gθ−1, and Bθ = Bθ−1 ∪ {jθ}. Since∣∣Aθ−1∣∣ = C >

∣∣Aθ∣∣, Gθ−1 ∩ Bθ−1 = ∅ and Gθ ∩ Bθ 6= ∅. Since only jθ is added to the set Bθ−1, we haveGθ ∩Bθ = {jθ}, and therefore, Aθ = Gθ \Bθ = Gθ \ {jθ} = Gθ−1 \ {jθ} = Aθ−1 \ {jθ}, which is the desiredresult.

To prove part (b), consider At−1 such that∣∣At−1

∣∣ = r < C for some r. Note that by the def-inition, σt−1 =

(σt−1

1 , . . . , σt−1N

)represent the ordering of the lines h1(·), . . . , hN (·) during the interval(

I (it−1, jj−1) , I (it, jt))

with

hσt−11

(I (it, jt)) ≥ hσt−12

(I (it, jt)) ≥ · · · ≥ hσt−1N

(I (it, jt)) (9)

Since Gt−1 ={σt−1

1 , . . . , σt−1C

}and

∣∣At−1∣∣ = r < C, σt−1

r+1 ∈ Bt−1, which implies that

hσt−1r

(I (it, jt)) ≥ 0 > hσt−1r+1

(I (it, jt)) ≥ · · · ≥ hσt−1N

(I (it, jt)) .

Since the functions hi(·)’s are strictly decreasing, we have Bt−1 ={σt−1r+1, σ

t−1r+2, . . . , σ

t−1N

}.

We will now show that either At = At−1 or At = At−1\{jt}. Consider the tth intersection point I (it, jt).There are two cases to consider: it = 0 and it ≥ 1. Suppose that it = 0. Then, we claim that jt = σt−1

r . SinceBt−1 =

{σt−1r+1, σ

t−1r+2, . . . , σ

t−1N

}, we know that jt ∈

{σt−1

1 , . . . , σt−1r

}. If, on the contrary, jt = σt−1

k wherek < r, this means that I

(0, σt−1

k

)< I

(0, σt−1

r

); that is, the value associated with the line hσt−1

rremains non-

negative. However, by the ordering in (9), it must be the case that 0 > hσt−1k

(I (it, jt)) ≥ hσt−1r

(I (it, jt)).Contradiction! Therefore, jt = σt−1

r , which implies that At = At−1 \{σt−1r

}, which is the desired result.

On the other hand, suppose that it ≥ 1. In this case, we have Bt = Bt−1. We claim that we musteither have {it, jt} ⊂ Bt or {it, jt} ⊂ (Bt)c; that is, either both elements are in Bt or both or not. Thiswill show that At = At−1, which is the desired result. To prove this, note that since 0 < it < jt, itfollows from Lemma A.1 that either I(0, it) < I(0, jt) < I(it, jt) or I(it, jt) < I(0, jt) < I(0, it). IfI(0, it) < I(0, jt) < I(it, jt) holds, then the intersection points I(0, it) and I(0, jt) appear in the earlieriterations, implying that {it, jt} ⊂ Bt. On the other hand, if I(it, jt) < I(0, jt) < I(0, it), then it /∈ Bt andjt /∈ Bt.

B. Proof of Proposition 4

Let us introduce the following notation that will be used throughout the rest of this section. Let the functiong : R→ R be defined by:

g(λ) = max

{∑`∈X

v` (w` − λ) : X ⊆ {1, 2, . . . , N} and |X| ≤ C

}.

The following lemma establishes important properties of the function g and its relationship with the optimalprofit Z∗.

42

Lemma B.1. Under Assumption 1, we have the following properties.

• The function g is piecewise linear, non-increasing, convex, and continuous.

• For any λ ∈ R, if λ ∈[I (it, jt) , I (it+1, jt+1)

)for some t = 0, . . . ,K, then g(λ) =

∑`∈At v` (w` − λ)

• Z∗ = max {λ : g(λ) ≥ λ}

Proof. Observe that g(λ) can be expressed as a solution of the following linear program:

g(λ) = max

{N∑`=1

z`v` (w` − λ) :N∑i=1

zi ≤ C and 0 ≤ zi ≤ 1 for all i

}

The first part of Lemma B.1 follows immediately from standard results in linear programming.

To establish the second part of the lemma, note that to compute g(λ), we would order the products in adescending order of hi(λ) = vi (wi − λ) and choose the first C product with non-negative values. By defini-tion, we know that within the interval [I (it, jt) , I (it+1, jt+1)), none of the N + 1 lines h0(·), h1(·), . . . , hN (·)intersects each other. Thus, for λ ∈ [I (it, jt) , I (it+1, jt+1)), the ordering of hi(λ) is given by σt and thefirst C items with non-negative value is simply At. This is exactly what we need to show.

The final part of Lemma B.1 follows from the following argument. By definition of the profit functionf , there exists X ⊂ {1, . . . , N} with |X| ≤ C such that f(X) ≥ λ if and only if

∑i∈X vi (wi − λ) ≥ λ,

which occurs if and only if maxA:|A|≤C∑i∈A vi (wi − λ) ≥ λ, which is equivalent to g(λ) ≥ λ. Therefore,

Z∗ = max {λ : g(λ) ≥ λ}

The next lemma provides an expression for the change in the profit for the sequence of assortmentsgenerated by the StaticMNL algorithm. For x ∈ R, we let

sign(x) =

+1, if x > 0,0, if x = 0,−1, if x < 0

Lemma B.2. Let⟨A0, . . . , AK

⟩denote the sequence of assortments generated by the StaticMNL(v, w,C)

algorithm. Then, under Assumption 1, for each t = 1, 2, . . . ,K, if At 6= At−1,

sign(f(At)− f

(At−1

))= sign (g (I (it, jt))− I (it, jt)) .

43

Proof. There are two cases to consider: |At| = C and |At| < C. Suppose that |At| = C. It then follows fromLemma A.4 that At =

(At−1 \ {jt}

)∪ {it} with 1 ≤ it < jt. Let X = At−1 \ {it, jt}. Then, we have

f(At)− f

(At−1

)=

∑`∈X w`v` + witvit

1 +∑`∈X v` + vit

−∑`∈X w`v` + wjtvjt

1 +∑`∈X v` + vjt

=

(1 +

∑`∈X v`

)(witvit − wjtvjt)−

(∑`∈X w`v`

)(vit − vjt) + (wit − wjt) vitvjt(

1 +∑`∈X v` + vit

) (1 +

∑`∈X v` + vjt

)=

(vit − vjt) ·{(

1 +∑`∈X v`

) (witvit−wjtvjt)vit−vjt

−(∑

`∈X w`v`)

+ (wit−wjt)vitvjt

vit−vjt

}(1 +

∑`∈X v` + vit

) (1 +

∑`∈X v` + vjt

)=

(vit − vjt) ·{(

1 +∑`∈X v`

)I (it, jt)−

(∑`∈X w`v`

)− hit (I (it, jt))

}(1 +

∑`∈X v` + vit

) (1 +

∑`∈X v` + vjt

)=− (vit − vjt) ·

{−I(it, jt) + hit (I (it, jt)) +

∑`∈X v` (w` − I (it, jt))

}(1 +

∑`∈X v` + vit

) (1 +

∑`∈X v` + vjt

)=− (vit − vjt) ·

{−I(it, jt) +

∑`∈At h` (I(it, jt))

}(1 +

∑`∈X v` + vit

) (1 +

∑`∈X v` + vjt

) ,

where the fourth equality follows from the fact that

I (it, jt) =(witvit − wjtvjt)

vit − vjtand hit (I (it, jt)) =

− (wit − wjt) vitvjtvit − vjt

Note that the denominator is always positive. Also, since it < jt, it follows from Assumption 1 that− (vit − vjt) is always positive. Thus, the sign of f (At)−f

(At−1

)is determined by−I(it, jt)+

∑`∈At h` (I(it, jt)),

which is equal to −I(it, jt) + g (I (it, jt)) by Lemma B.1. This is the desired result. A similar argumentapplies to the case when |At| < C.

We are now ready to give a proof of Proposition 4.

Proof. Let m∗ ∈ {0, 1, . . . ,K} denote the largest index such that

I (i1, j1) ≤ · · · ≤ I (im∗ , jm∗) ≤ Z∗ < I (im∗+1, jm∗+1) < · · · < I (iK , jK) .

Consider any t = 1, . . . ,m∗, if At 6= At−1, then it follows from Lemma B.2 that

sign(f(At)− f

(At−1

))= sign (g (I(it, jt))− I(it, jt)) ≥ 0,

where the last inequality follows from the fact that I (it, jt) ≤ Z∗ and Z∗ = max {λ : g(λ) ≥ λ}. Thisimplies that g (I(it, jt)) ≥ I(it, jt). Thus, the sequence 〈f (At) : t = 1, 2, . . . ,K〉 is non-decreasing for t =0, 1, . . . ,m∗.

On the other hand, consider any t ≥ m∗ + 1. In this case, we have I (it, jt) > Z∗, which implies thatg (I(it, jt)) < I(it, jt). Using the same arguments as above, we can show that

sign(f(At)− f

(At−1

))≤ 0,

and thus, the sequence 〈f (At) : t = 1, 2, . . . ,K〉 is non-increasing for t ≥ m∗ + 1. So, we have that

f(A0)≤ f

(A1)≤ · · · ≤ f

(Am

∗)

and f(Am

∗)≥ f

(Am

∗+1)≥ · · · ≥ f

(AK),

which is the desired result.

44

C. Proof of Theorem 5

The proof of Theorem 5 relies on the following series of lemmas. Let us first introduce the following notation.For each E ∈ E , let fE : ({0} ∪ E)× RC → R be defined as follows: for each ` ∈ {0} ∪ E and xE ∈ RC ,

fE(`, xE) = ln

(1

ex`

/ (1 +

∑k∈E e

xk

)) = ln

1

ex`

/(∑k∈{0}∪E e

xk

) ,

where we define x0 = 0. Recall that the utility U` that each customer assigns to the option ` is given byU` = µ∗`+ζ`, where µ∗` = 0 and ζ`’s are i.i.d. Gumbel random variables with mean zero and scaling parameterone. Let the random variable C (E) denote the option that has the maximum utility given the assortmentE; that is, C(E) = arg max`∈E U`. For each xE ∈ RC , let g (xE) be defined by

gE (xE) := E [fE (C(E), xE)] =∑

`∈{0}∪E

P {C(E) = `} fE(`, xE)

=∑

`∈{0}∪E

eµ`∑k∈{0}∪E e

µkln

(1

ex`

/∑k∈{0}∪E e

xk

)

The following standard lemma establishes a relationship between the true mean utilities µi’s and the functiong. For the proof, the reader is referred to Cover and Thomas (2006).

Lemma C.1. For each E ∈ E, gE (µE) = minxE∈RC gE(xE), and for each xE ∈ RC ,

∑`∈{0}∪E

∣∣∣∣∣ eµ`∑k∈{0}∪E e

µk− ex`∑

k∈{0}∪E exk

∣∣∣∣∣ ≤ 2√gE (xE)− gE (µE).

The next lemma provides an upper bound on the difference between E [fE (C(E), µ)] and E [fE (C(E), µ∗)]for each assortment E ∈ E . Before we proceed to the lemma, we need the following definition (see Haussler(1992) for more details).

Definition 1. For any T ⊂ Rd, T is full if there exists some translation of T that intersects all 2d orthantsof Rd, that is, there exists x ∈ Rd such that {sign(x+ y) : y ∈ T} = {0, 1}d where for each vector z ∈ Rd,sign(z) = (sign(z1), . . . , sign(zd)) and sign(·) is the usual sign function which is 1 if the argument is positiveand zero otherwise.

Definition 2. Let F be a family of functions from a set Z to R. For any sequence z = (z1, . . . , zd) of pointsin Z, let F|z = {(f(z1), . . . , f(zd)) : f ∈ F} ⊂ Rd. The pseudo-dimension of F is the largest d such thatthere exists a sequence z of d points in Z and F|z is full.

For our application, we will be interested in the collection of functions GE defined as follows:

GE ≡{f(·, xE) : {0, i, j} → R+

∣∣ xE ∈ RC ∩ [−M,M ]C},

where M is the constant given in Assumption 2. Note that each function in GE is indexed by a C-dimensionalvector xE . The following lemma shows that the pseudo-dimension of GE is at most C + 1.

Lemma C.2. For each E ∈ E, the pseudo-dimension of GE is at most C + 1.

45

Proof. To establish this result, it suffices to show that for any (z1, z2, . . . zC+2) ∈ ({0} ∪ E)C+2, the set ofvectors {(

f(z1, xE), f(z2, xE), . . . , f(zC+2, xE),)

: xE ∈ RC ∩ [−M,M ]C}⊂ RC+2

cannot be full; that is, no translation of this set can intersect all orthants in RC+2. To prove this, note thatsince the set {0}∪E has C+2 elements, at least two of the zs’s must coincide. By relabeling if necessary, wecan assume without loss of generality that z1 = z2 = ` for some ` ∈ {0}∪E. However, since z1 = z2, we havethat f(z1, xE) = f(z2, xE) for all xE ∈ RC . Thus, the set

{(f(z1, xE), f(z2, xE)

): xE ∈ RC ∩ [−M,M ]C

}forms a line in the plane and no translation of this set can possibly intersect all four orthants in R2. Thus,the set cannot be full.

The following result follows from Theorem 11 in Haussler (1992).

Lemma C.3. Let F be a collection of functions from Z into [0, B] with pseudo-dimension d. Let W1,W2, . . . ,WT

be T independent and identically distributed random variables drawn according to any distribution from theset Z. Then, for each ε > 0,

Pr

{supf∈F

∣∣∣∣∣EW [f(W )]− 1T

T∑i=1

f(Wi)

∣∣∣∣∣ ≥ ε}≤ 8

(32eBε

ln32eBε

)de−ε

2T/64B2.

where W denotes a random variable having the same distribution as Wi’s.

We will now apply Lemma C.3 to our problem and establish the following result.

Lemma C.4. Under Assumption 2, for each E ∈ E and ε > 0,

Pr {gE (µE)− gE (µE) ≥ ε} ≤ 8(

64eBε

ln64eBε

)C+1

e−ε2T/256B2

,

where B = 2M + ln(C + 1) and the random variable µE denotes the maximum likelihood estimate definedin Equation (5) based on a sample of T customers, and the probability is taken with respect to the randomvariables C1(E), . . . , CT (E).

Proof. Let E ∈ E be given. To apply the result of Lemma C.3, we will consider the following collection offunctions

GE ≡{fE(·, xE) : {0} ∪ E → R+

∣∣ xE ∈ RC ∩ [−M,M ]C}.

Note that since xE ∈ RC ∩ [−M,M ]C , it follows that for each ` ∈ {0} ∪ E,

fE (`, xE) = ln

∑k∈{0}∪E

exk−x`

≤ ln(1 + Ce2M

)≤ ln

(e2M (C + 1)

)= 2M + ln(C + 1).

46

Note that

gE (µE)− g (µE) =

(E [fE(C(E), µE)]− 1

T

T∑t=1

fE (Ct(E), µE)

)

+

(1T

T∑t=1

fE (Ct(E), µE)− 1T

T∑t=1

fE (Ct(E), µE)

)

+

(1T

T∑t=1

fE (Ct, µE)− E [fE (C(E), µE)]

)

≤

(E [fE (C(E), µE)]− 1

T

T∑t=1

fE (Ct(E), µE)

)

+

(1T

T∑t=1

fE (Ct(E), µE)− E [fE (C(E), µE)]

)

≤ 2 supxE∈RC∩[−M,M ]C

∣∣∣∣∣E [fE (C(E), xE)]− 1T

T∑t=1

fE (Ct(E), xE)

∣∣∣∣∣ ,where the first inequality follows from Equation (5), which implies that

1T

T∑t=1

f (Ct(E), µE) = minxE∈RC∩[−M,M ]C

1T

T∑t=1

f (Ct(E), xE) .

The final inequality follows from the fact that both µE and µE are in RC ∩ [−M,M ]C . Therefore,

Pr {gE (µE)− gE (µE) ≥ ε}

≤ Pr

{sup

xE∈RC∩[−M,M ]2

∣∣∣∣∣E [fE (C(E), xE)]− 1T

T∑t=1

f (Ct(E), xE)

∣∣∣∣∣ ≥ ε

2

}

≤ 8(

64eBε

ln64eBε

)C+1

e−ε2T/256B2

,

where B = 2M + ln(C + 1). The last inequality follows from Lemma C.3 and the fact that the collection offunctions G has a pseudo-dimension of at most C + 1.

Finally, here is the proof of Theorem 5.

Proof. It follows from Lemma C.1 that

P

∑k∈{0}∪E

∣∣∣θk (E)− θk (E)∣∣∣ ≥ ε

≤ Pr{gE (µE)− g (µE) ≥ ε2

4

}≤ 8

(256eBε2

ln256eBε2

)C+1

e−ε4T/4096B2

where the last inequality follows from Lemma C.4.

D. Proof of Theorem 6

The proof of Theorem 6 makes use of the following result. Recall that

η = minE∈E

min{i,j}∈E:i 6=j

|θi(E)− θj(E)|

Under Assumption 1, we know that η > 0.

47

Lemma D.1. For each E ∈ E, if maxk∈E∣∣∣θk(E)− θk(E)

∣∣∣ ≤ η/3, then

maxi∈E,j∈E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ ≤ 6 max{wi, wj}

η2

∑k∈E

∣∣∣θk(E)− θk(E)∣∣∣

Proof. Let i ∈ E and j ∈ E be given. Since maxk∈E∣∣∣θk(E)− θk(E)

∣∣∣ ≤ η/3, it follows from the definition of

η that θi(E) 6= θj(E). Thus, by definition,∣∣∣I(i, j)− IE(i, j)∣∣∣ =

∣∣∣∣∣θi (E)wi − θj (E)wjθi (E)− θj (E)

− θi (E)wi − θj (E)wjθi (E)− θj (E)

∣∣∣∣∣≤ wi

∣∣∣∣∣ θi (E)θi (E)− θj (E)

− θi (E)

θi (E)− θj (E)

∣∣∣∣∣+ wj

∣∣∣∣∣ θj (E)θi (E)− θj (E)

− θj (E)

θi (E)− θj (E)

∣∣∣∣∣=

wi

∣∣∣θi (E) θj (E)− θi (E) θj (E)∣∣∣

|θi (E)− θj (E)| ·∣∣∣θi (E)− θj (E)

∣∣∣ +wj

∣∣∣θj (E) θi (E)− θj (E) θi (E)∣∣∣

|θi (E)− θj (E)| ·∣∣∣θi (E)− θj (E)

∣∣∣≤

2 max{wi, wj}(∣∣∣θi (E)− θi (E)

∣∣∣+∣∣∣θj (E)− θj (E)

∣∣∣)|θi (E)− θj (E)| ·

∣∣∣θi (E)− θj (E)∣∣∣

≤6 max{wi, wj}

(∣∣∣θi (E)− θi (E)∣∣∣+∣∣∣θj (E)− θj (E)

∣∣∣)η2

where the last inequality follows from the fact that since∣∣∣θi (E)− θi (E)

∣∣∣ and∣∣∣θi (E)− θi (E)

∣∣∣ are at most

η/3,∣∣∣θi (E)− θj (E)

∣∣∣ ≥ η/3.

Here is the proof of Theorem 6.

Proof. Note that by our construction I(i,0) = I(i,0) = wi for all i. By the union bound,

P{

maxE∈E

{max

{i,j}⊆E:i6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ ,max

i∈E

∣∣∣θi (E)− θi (E)∣∣∣} > ε

}≤

∑E∈EP{

max{i,j}⊆E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ > ε OR max

i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > ε

}≤

∑E∈EP{

max{i,j}⊆E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ > ε OR max

i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > min{ε, η/3}

}

≤∑E∈EP

{∑i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > min

{ε,η

3,

εη2

6 max` w`

}}

≤∑E∈EP

{∑i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > εη2

6 max` w`

},

where the last inequality follows from the fact that max` w` ≥ 1. Note that the third inequality follows fromthe fact that the event

max{i,j}⊆E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ > ε OR max

i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > min{ε, η/3}

is a subset of the event ∑i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > min

{ε,η

3,

εη2

6 max` w`

}

48

To see this, note that if maxi∈E∣∣∣θi (E)− θi (E)

∣∣∣ ≤ min{ε, η/3} and max{i,j}⊆E:i6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ > ε,

then it follows from Lemma D.1 that

ε < max{i,j}⊆E:i 6=j

∣∣∣I(i, j)− IE(i, j)∣∣∣ ≤ 6 max` w`

η2

∑i∈E

∣∣∣θi (E)− θi (E)∣∣∣ ,

which gives the desired result.

From Theorem 5, we have that for each E ∈ E ,

P

{∑i∈E

∣∣∣θi (E)− θi (E)∣∣∣ > εη2

6 max` w`

}

≤ 8

(9216eB (max` w`)

2

ε2η4ln

9216eB (max` w`)2

ε2η4

)C+1

exp

{− ε4η8T

5308416B2 (max` w`)2

}

Since |E| ≤ 5(N/C)2, the desired result follows.

E. Proof of Proposition 8

Proof. Let r =(√

5− 1)/2 denote the golden ratio and let w∗ = max1≤`≤N w`. Since the classical Golden

Ratio search reduces the size of the target set by factor of r in each iteration, the maximum number ofiterations is at most dlnL/ ln(1/r)e. For k = 1, . . . , dlnL/ ln(1/r)e, let Gk denote the event that we makethe wrong decision in the kth iteration; that is, we made a decision that is different from we would have doneif we can evaluate the profit function f(·) exactly.

Consider the first iteration and suppose we are comparing assortments Ds and Dt for some s and t. Thecomparison is based on the sample average of WGRS samples for each assortment. Let Xs

1 , . . . , XsWGRS

denotethe observed profit associated with the selections of the WGRS customers who were offered the assortmentDs. Similarly, let Xt

1, . . . , XtWGRS

denote observed profits associated with Dt. By definition, we know thatE [Xs

i ] = f(Ds) and E [Xti ] = f(Dt) for all 1 ≤ i ≤ WGRS . Suppose that f(Ds) > f(Dt). Then, we can

bound the probability of error in the first iteration as follows

P {G1} = P

{1

WGRS

WGRS∑i=1

Xsi <

1WGRS

WGRS∑i=1

Xti

}

≤ P

{1

WGRS

WGRS∑i=1

(Xti − f(Dt)

)+

1WGRS

WGRS∑i=1

(f(Ds)−Xsi ) > f(Ds)− f(Dt)

}

≤ P

{1

WGRS

WGRS∑i=1

(Xti − f(Dt)

)+

1WGRS

WGRS∑i=1

(f(Ds)−Xsi ) > β

}

≤ P

{1

WGRS

WGRS∑i=1

(Xti − f(Dt)

)>β

2

}+ P

{1

WGRS

WGRS∑i=1

(f(Ds)−Xsi ) >

β

2

}

≤ 2 exp{−β

2WGRS

4w∗

},

where the last inequality follows from the classical Chernoff-Hoeffding Inequality (Hoeffding (1963)). Usinga similar argument, it is easy to verify that the probability of error P {Gk} in the kth iteration is boundedabove by 2k exp

{−β

2WGRS

4w∗

}.

49

To determine an upper bound on the regret after T periods, we note that the total number of samplesthat we used before we are left with a single assortment is at most

WGRS ·(⌈

lnLln(1/r)

⌉+ 3)≤ 4WGRS ·

⌈lnL

ln(1/r)

⌉,

where the factor of 3 reflects the fact that the standard Golden Ratio search starts with 4 points in thefirst iteration and adds one additional search point in subsequent iterations. The maximum expected regretincurred from these samples is at most 4w∗ ·WGRS ·

⌈lnL

ln(1/r)

⌉. Moreover, if we correctly identify the optimal

assortment at the end of the search, the regret will be zero thereafter. Thus, maximum expected regret afterwe concluded the search algorithm is at most

w∗T · P{GdlnL/ ln(1/r)e

}≤ 2w∗T

⌈lnL

ln(1/r)

⌉exp

{− β2WGRS

4 max` w`

},

where the right hand side follows from the upper bound on the error probability. Using the definition ofWGRS , we have the following upper bound on the expected regret after T periods:

4w∗ ·WGRS ·⌈

lnLln(1/r)

⌉+ 2w∗T

⌈lnL

ln(1/r)

⌉exp

{− β2WGRS

4 max` w`

}≤ 4w∗

(4w∗ lnTβ2

+ 1)(

lnLln(1/r)

+ 1)

+ 2w∗(

lnLln(1/r)

+ 1)

= 4w∗(

4w∗ lnTβ2

+ 1)(

ln(L/r)ln(1/r)

)+ 2w∗

(ln(L/r)ln(1/r)

)=

16(w∗)2 ln(L/r)β2 ln(1/r)

lnT + 6w∗ln(L/r)ln(1/r)

,

and the desired result follows from the fact that 1/r ≤ 2 and 1/ ln(1/r) ≤ 2.079.

50

Documents

Dynamic Assortment Optimization with a Multinomial …shen/papers/paper49.pdf · Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint Paat