Throughput Maximization for Tandem Lines with Two
Stations and Flexible Servers
Sigrun Andradottir and Hayriye Ayhan
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0205, U.S.A.
December 30, 2002
Abstract
For a Markovian queueing network with two stations in tandem, finite intermediate buffer, and M flexible servers, we study how the servers should be assigned dynamically to stations in order to obtain optimal long-run average throughput. We assume that each server can work on only one job at a time, that several servers can work together on a single job, and that the travel times between stations are negligible. Under these assumptions, we completely characterize the optimal policy for systems with three servers. We also provide a conjecture for the structure of the optimal policy for systems with four or more servers that is supported by extensive numerical evidence. Finally, we develop heuristic server assignment policies for systems with three or more servers that are easy to implement, robust with respect to the server capabilities, and generally appear to yield near-optimal long-run average throughput.
1 Introduction
We consider a tandem queueing network with two stations and M servers. There is an infinite
supply of jobs in front of station 1, infinite room for completed jobs after station 2, and a finite
buffer of size 0 ≤ B < ∞ between stations 1 and 2. We assume that at any given time, there can
be at most one job in service at each station and that each server can work on at most one job.
Moreover, we assume that each server i ∈ {1, . . . , M} works at a deterministic rate µij ∈ [0,∞)
at each station j ∈ {1, 2}. Hence, server i is trained to work at station j if µij > 0. We assume
that several servers can work together on a single job, in which case their service rates are additive.
The service times of the different jobs at station j ∈ {1, 2} are independent and exponentially
distributed random variables with rate µ(j), and service times at stations 1 and 2 are independent.
Without loss of generality, we assume that µ(1) = µ(2) = 1. Finally, we assume that the network
operates under the manufacturing blocking mechanism.
Our objective in this paper is to determine the dynamic server assignment policy that maximizes
the long-run average throughput of the queueing system described above. For simplicity, we assume
that the travel and setup times associated with servers moving from one station to the other one
are negligible.
Andradottir, Ayhan, and Down [4] identify the optimal server assignment policies for M ≤ 2.
In particular, when M = 1, any non-idling server assignment policy is optimal; when M = 2, the optimal policy assigns one server to work at each station in such a way that the product of the servers' rates at their assigned stations is maximized, with each server working at the other station (that it is not assigned to) only when there is no work to be done at its own station (due to blocking or starving). Consequently, this
paper is focused on the situation when M ≥ 3, so that the queueing network has more servers than
stations. We shall see that when M ≥ 3, then the optimal policy is more complicated than when
M = 2 in that servers may move away from a station when there is still work to do at that station
(see Sections 3 and 4 below).
Much of the existing work in the area of optimal dynamic assignment of servers to queues is
focused on parallel queues. In particular, for a two-class queueing system with one dedicated server,
one flexible server, and no exogenous arrivals, Ahn, Duenyas, and Zhang [3] characterize the server
assignment policy that minimizes the expected total holding cost incurred until all jobs initially
present in the system have departed. Moreover, under the heavy traffic assumption, Harrison
and Lopez [11], Bell and Williams [8], Williams [20], and Mandelbaum and Stolyar [12] develop
asymptotically optimal server assignment policies that minimize the discounted infinite-horizon
holding cost for parallel queueing systems with flexible servers and outside arrivals. Finally, under
the assumption of heavy traffic, Squillante et al. [18] use simulation to study the performance of
threshold-type policies for systems that consist of parallel queues.
Most of the papers that have considered the optimal assignment of multiple servers to multiple
interconnected queues focus on minimizing holding costs. In particular, for systems with two
queues in tandem and no arrivals, Farrar [9], Pandelis and Teneketzis [15], and Ahn, Duenyas,
and Zhang [2] study how servers should be assigned to stations to minimize the expected total
holding cost incurred until all jobs leave the system. Moreover, Rosberg, Varaiya, and Walrand
[17], Hajek [10], and more recently, Ahn, Duenyas, and Lewis [1] study the assignment of (service)
effort to minimize holding costs in the two-station setting with Poisson arrivals. To the best of
our knowledge, the papers of Andradottir, Ayhan, and Down [4, 5] are the only two that consider the
dynamic assignment of servers to maximize the long-run average throughput in queueing networks
with flexible servers. In particular, Andradottir, Ayhan, and Down [4] characterize the optimal
dynamic server assignment policy for a two-stage finite tandem queue with two servers and also
present a simple server assignment heuristic for finite tandem queues with an equal number of servers
and stations. For more general queueing networks with infinite buffers, Andradottir, Ayhan, and
Down [5] develop dynamic server assignment policies that guarantee a capacity arbitrarily close to
the maximal capacity.
Other research on dynamic server assignment policies includes the work of Ostolaza, McClain,
and Thomas [14], McClain, Thomas, and Sox [13], and Zavadlav, McClain, and Thomas [21] on
dynamic line balancing. In particular, Ostolaza, McClain, and Thomas [14] and McClain, Thomas,
and Sox [13] study dynamic line balancing in tandem queues with shared tasks that can be per-
formed at either of two successive stations. This work was continued by Zavadlav, McClain, and
Thomas [21], who study several server assignment policies for systems with fewer servers than sta-
tions, in which all servers trained to work at a particular station have the same capabilities at that
station. Moreover, assuming that each server has a service rate that does not depend on the task
(s)he is working on, Bartholdi and Eisenstein [6] define the “bucket brigades” server assignment
policy and show that under this policy, a stable partition of work will emerge yielding optimal
throughput. Finally, Bartholdi, Eisenstein, and Foley [7] show that the behavior of the bucket
brigades policy, applied to systems with discrete tasks and exponentially distributed task times,
resembles that of the same policy applied in the deterministic setting with infinitely divisible jobs.
The remainder of this paper is organized as follows: In Section 2, we formulate the server
assignment problem considered in this paper as a Markov decision problem. In Section 3, we
provide an optimal server assignment policy for systems with two stations and three servers. In
Section 4, we present a conjecture for the structure of an optimal server assignment policy for
systems with two stations and four or more servers. Section 5 contains some reversibility results for
tandem lines with two stations and arbitrary numbers of servers. In Section 6, we present numerical
results that support the optimality of the policy proposed in Section 4, describe some properties of
this policy, and study heuristic policies that appear to yield near-optimal performance and involve
grouping all available servers into two or three teams. Section 7 contains some concluding remarks.
Finally, the proof of the main result in this paper is given in the Appendix.
2 Problem Formulation
Let Π be the set of server assignment policies under consideration, and for all π ∈ Π and t ≥ 0, let D^π(t) be the number of departures under policy π by time t, and let

T^π = lim_{t→∞} IE[D^π(t)]/t (1)

be the long-run average throughput corresponding to the server assignment policy π. We are interested in solving the optimization problem

max_{π∈Π} T^π. (2)
For all π ∈ Π, consider the stochastic process {X^π(t) : t ≥ 0}, where X^π(t) = 0 if there is a job to be processed at station 1, the number of jobs waiting to be processed between stations 1 and 2 is 0, and station 2 is starved at time t; X^π(t) = s for 1 ≤ s ≤ B + 1 if there are jobs to be processed at both stations 1 and 2 and there are s − 1 jobs waiting to be processed in the buffer at time t; finally, X^π(t) = B + 2 if station 1 is blocked, B jobs are waiting to be processed in the buffer, and there is a job to be processed at station 2 at time t. For the remainder of this paper, we assume that the class Π of server assignment policies under consideration consists of all Markovian stationary deterministic policies corresponding to the state space S = {0, 1, 2, . . . , B + 2} of the
stochastic process {X^π(t) : t ≥ 0}. It is clear that for all π ∈ Π, {X^π(t) : t ≥ 0} is a continuous time Markov chain and that there exists a scalar q^π ≤ ∑_{i=1}^{M} max_{1≤j≤2} µ_{ij} < ∞ such that the transition rates {q^π(x, x′)} of {X^π(t)} satisfy ∑_{x′∈S, x′≠x} q^π(x, x′) ≤ q^π for all x ∈ S. Hence, {X^π(t)} is uniformizable for all π ∈ Π. Let {Y^π(k)} be the corresponding discrete time Markov chain, so that {Y^π(k)} has state space S and transition probabilities p^π(x, x′) = q^π(x, x′)/q^π if x ≠ x′ and p^π(x, x) = 1 − ∑_{x′∈S, x′≠x} q^π(x, x′)/q^π for all x ∈ S. It has been shown by Andradottir, Ayhan, and Down [4] that since {X^π(t)} is uniformizable, the original optimization problem in (2) can be translated into an equivalent
(discrete time) Markov decision problem. More specifically, let
R^π(x) =
  q^π(x, x − 1)  for x ∈ {1, . . . , B + 2},
  0              for x = 0,

be the departure rate from state x under policy π, for all x ∈ S and π ∈ Π. Then the optimization problem (2) has the same solution as the Markov decision problem
max_{π∈Π} lim_{K→∞} IE[ (1/K) ∑_{k=1}^{K} R^π(Y^π(k − 1)) ]. (3)
In other words, Andradottir, Ayhan, and Down [4] showed that maximizing the steady-state
throughput of the original queueing system is equivalent to maximizing the steady-state departure rate for the associated embedded (discrete time) Markov chain.
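Under any fixed policy π ∈ Π, the buffer-content process {X^π(t)} is a birth-death chain on S (this observation is also used in Section 5), so T^π can be evaluated in closed form from the stationary distribution rather than by simulation. The following is a minimal sketch, with illustrative function and variable names, assuming all interior birth and death rates are positive so that the chain is irreducible:

```python
def throughput(birth, death):
    """Long-run throughput of a birth-death chain on S = {0, ..., B+2}.

    birth[s] is the aggregate service rate at station 1 in state s
    (birth[B+2] = 0: station 1 is blocked) and death[s] is the aggregate
    rate at station 2 (death[0] = 0: station 2 is starved).  All interior
    rates are assumed positive, so the chain is irreducible.
    """
    # Unnormalized stationary probabilities via detailed balance:
    # p[s+1] / p[s] = birth[s] / death[s+1].
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    z = sum(p)
    # Throughput = stationary rate of departures from station 2.
    return sum(d * q for d, q in zip(death, p)) / z
```

For instance, with B = 0, birth = [3, 2, 0], and death = [0, 2, 3] (aggregate rates for two servers with rates (2, 1) and (1, 2), each helping at the other station only when blocked or starved), the throughput is 12/7 ≈ 1.714.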
In the next two sections, we characterize the dynamic server assignment policies that solve the
optimization problem (3) for two-stage tandem queues with M ≥ 3 servers. We consider the case
when M = 3 in Section 3 and the case when M > 3 in Section 4.
3 Two Stations and Three Servers
In this section, we consider the special case of a tandem Markovian queueing network with two
stations and three servers. Since the numbers of possible states and actions are both finite, the existence of an optimal Markovian stationary deterministic policy follows immediately from Theorem 9.1.8 of Puterman [16].
We assume that for all i ∈ {1, 2, 3}, either µi1 > 0 or µi2 > 0. (If there exists a server i such
that µi1 = µi2 = 0, then the problem reduces to having two servers, for which the optimal policy
is given in Andradottir, Ayhan, and Down [4].) Without loss of generality, we also assume that
there exist i, k ∈ {1, 2, 3} such that µi1 > 0 and µk2 > 0. (Note that if µ11 = µ21 = µ31 = 0 or
µ12 = µ22 = µ32 = 0, then the throughput is zero and any policy is optimal.) Define d as

d ∈ D = arg min_{i∈{1,2,3}} { µ_{i1}/µ_{i2} }.

The above assumptions on the service rates guarantee that µ_{d1}/µ_{d2} < ∞. Similarly, define m as

m ∈ M = arg min_{i∈{1,2,3}\{d}} { µ_{i1}/µ_{i2} }.
Note that if µ_{i2} = 0 for all i ∈ {1, 2, 3}\{d}, then M = {1, 2, 3}\{d}. Finally, define u ∈ {1, 2, 3}\{d, m}. For reasons that become clear in Theorem 3.1, “u” stands for “upstream,” “d” stands for “downstream,” and “m” stands for “moving.” Note that the definitions of d and m imply that µ_{i1}µ_{d2} − µ_{d1}µ_{i2} ≥ 0 for all i ∈ {1, 2, 3} and µ_{i1}µ_{m2} − µ_{m1}µ_{i2} ≥ 0 for all i ∈ {1, 2, 3}\{d}. Moreover, from our assumptions on the service rates and from the definitions of d, m, and u, we have that µ_{d2} > 0 and µ_{u1} > 0.
For fixed d, m, and u, and for all i ∈ {0, 1, . . .}, define

f(i) = µ_{d2}^{i−2} (µ_{m1}µ_{d2} − µ_{d1}µ_{m2}) (µ_{d2} + µ_{m2} + µ_{u2}) ∑_{j=0}^{B−i+2} µ_{u1}^{j} (µ_{d2} + µ_{m2})^{B−i−j+2}
       − µ_{u1}^{B−i+2} (µ_{u1}µ_{m2} − µ_{m1}µ_{u2}) (µ_{d1} + µ_{m1} + µ_{u1}) ∑_{j=0}^{i−2} µ_{d2}^{j} (µ_{m1} + µ_{u1})^{i−j−2}, (4)

with the convention that summation over an empty set equals 0. Note that f(i) ≥ 0 for i ≤ 1 and f(i) ≤ 0 for i ≥ B + 3.
Throughout our developments, we let A_s denote the set of allowable actions in state s ∈ S (see the first paragraph of the proof of Theorem 3.1 in the Appendix) and (δ)^∞ denote the policy corresponding to the decision rule δ, which is a (B + 3)-dimensional vector whose components δ(s) ∈ A_s specify what action in A_s should be applied in state s for all s ∈ S.
Remark 3.1 For i = 1, . . . , B + 2, let T^{(δ_i)^∞} be the throughput of the policy (δ_i)^∞ such that

δ_i(s) =
  servers d, m, and u work at station 1                          for s = 0,
  servers m and u work at station 1, server d works at station 2 for 1 ≤ s ≤ i − 1,
  server u works at station 1, servers d and m work at station 2 for i ≤ s ≤ B + 1,
  servers d, m, and u work at station 2                          for s = B + 2.  (5)

Then sign(f(i)) = sign(T^{(δ_i)^∞} − T^{(δ_{i−1})^∞}) for all i = 2, . . . , B + 2. This follows with some algebra from the expression for g_0 given in the proof of Theorem 3.1 (see equation (13)).
Let

S∗ = { s ∈ S\{0} : f(s) ≥ 0 and f(s + 1) ≤ 0 }.

The following result follows directly from the fact that f(1) ≥ 0 and f(B + 3) ≤ 0.

Proposition 3.1 S∗ ≠ ∅.
We are now ready to state the theorem that characterizes the optimal server assignment policy.
Theorem 3.1 Let s∗ ∈ S∗ and define δ∗(s) = δ_{s∗}(s) for all s ∈ S (see equation (5)). Then (δ∗)^∞ is optimal in the class of Markovian stationary deterministic policies. Moreover, this is the unique optimal policy if S∗ = {s∗}.
Remark 3.2 Theorem 3.1 shows that in the optimal policy the “upstream” server u works at the
upstream station 1 unless that station is blocked, the “downstream” server d works at the down-
stream station 2 unless that station is starved, and the “moving” server m works at the upstream
station 1 when the number of jobs in the buffer is small and then moves to the downstream station
2 when the number of jobs in the buffer has become sufficiently large. Note that the definitions
of d, m, and u imply that the server whose service rate at the upstream/downstream station is
relatively the largest (relative to the server’s rate at the other station) should be assigned to the
upstream/downstream station, and the server whose service rates at the upstream and downstream
stations are relatively the most balanced should move between the two stations depending on the
content of the buffer. Note also that the optimal policy in the case when M = 2 is essentially the
above policy without a moving server, see Andradottir, Ayhan, and Down [4].
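Because f in equation (4) is available in closed form, s∗ and S∗ can be computed directly. A small sketch (hypothetical helper names; it relies on µ_{d2} > 0 and µ_{u1} > 0, which hold under the assumptions above, so the negative boundary powers of µ_{d2} and µ_{u1} are well defined):

```python
def f(i, d, m, u, B):
    """Evaluate f(i) from equation (4) for i >= 1.

    d, m, u are the (mu_.1, mu_.2) rate pairs of the downstream, moving,
    and upstream servers; an empty sum equals 0.  Assumes mu_d2 > 0 and
    mu_u1 > 0 (guaranteed by the assumptions on the service rates).
    """
    (d1, d2), (m1, m2), (u1, u2) = d, m, u
    first = (d2 ** (i - 2) * (m1 * d2 - d1 * m2) * (d2 + m2 + u2)
             * sum(u1 ** j * (d2 + m2) ** (B - i - j + 2)
                   for j in range(B - i + 3)))
    second = (u1 ** (B - i + 2) * (u1 * m2 - m1 * u2) * (d1 + m1 + u1)
              * sum(d2 ** j * (m1 + u1) ** (i - j - 2)
                    for j in range(i - 1)))
    return first - second

def switch_set(d, m, u, B):
    """S* = {s in {1, ..., B+2} : f(s) >= 0 and f(s+1) <= 0}."""
    return [s for s in range(1, B + 3)
            if f(s, d, m, u, B) >= 0 and f(s + 1, d, m, u, B) <= 0]
```

For the hypothetical rate pairs d = (1, 4), m = (2, 2), u = (4, 1) and B = 2, both cross-products µ_{m1}µ_{d2} − µ_{d1}µ_{m2} and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} are positive, so by Proposition 3.2(i) below the returned set has at most two (consecutive) elements.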
The proof of Theorem 3.1 is presented in the Appendix. We now present a proposition which
illustrates some properties of S∗. Throughout our developments, #A denotes the cardinality of any
set A.
Proposition 3.2 (i) If #D = 1 and #M = 1, then S∗ has at most two elements, and if S∗ has two elements, then these are two consecutive states.
(ii) If #D = 1 and #M = 2, then S∗ = {B + 2}.
(iii) If #D = 2, then #M = 1 and S∗ = {1}.
(iv) If #D = 3, then #M = 2 and S∗ = S\{0}.
Proof: Note first that for all s ∈ S\{0},

f(s) − f(s + 1)
  = (µ_{m1}µ_{d2} − µ_{d1}µ_{m2}) (µ_{d2} + µ_{m2} + µ_{u2}) µ_{d2}^{s−2} [ µ_{m2} ∑_{j=0}^{B−s+1} µ_{u1}^{j} (µ_{d2} + µ_{m2})^{B−s−j+1} + µ_{u1}^{B−s+2} ]
  + (µ_{u1}µ_{m2} − µ_{m1}µ_{u2}) (µ_{d1} + µ_{m1} + µ_{u1}) µ_{u1}^{B−s+1} [ µ_{m1} ∑_{j=0}^{s−2} µ_{d2}^{j} (µ_{m1} + µ_{u1})^{s−j−2} + µ_{d2}^{s−1} ]
  ≥ 0. (6)

Hence, f(s) is non-increasing in s ∈ S\{0}.

(i) It follows from #D = 1 and #M = 1 that µ_{m1}µ_{d2} − µ_{d1}µ_{m2} > 0 and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} > 0. Then equation (6) implies that f(s) is (strictly) decreasing in s. Let s∗ = min S∗. If s∗ = B + 2, then there is nothing to prove, so assume that s∗ ∈ {1, . . . , B + 1}. From the definition of s∗, we have that f(s∗ + 1) ≤ 0. If f(s∗ + 1) < 0, then f(s) < 0 for all s ≥ s∗ + 1 and hence S∗ = {s∗}. On the other hand, if f(s∗ + 1) = 0, then f(s) < 0 for all s ≥ s∗ + 2 and hence S∗ = {s∗, s∗ + 1}.
(ii) It follows from #D = 1 and #M = 2 that µ_{m1}µ_{d2} − µ_{d1}µ_{m2} > 0 and µ_{u1}µ_{m2} − µ_{u2}µ_{m1} = 0. Then equation (4) implies that f(s) > 0 for all s = 1, . . . , B + 2 and f(B + 3) = 0. Thus, s = B + 2 is the only s ∈ S\{0} such that f(s) ≥ 0 and f(s + 1) ≤ 0.

(iii) Since #D = 2, we have

µ_{d1}/µ_{d2} = µ_{m1}/µ_{m2} < µ_{u1}/µ_{u2}.

Hence, #M = 1, µ_{m1}µ_{d2} − µ_{d1}µ_{m2} = 0, and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} > 0. Then equation (4) implies that f(1) = 0 and f(s) < 0 for all s ≥ 2. Thus, s = 1 is the only s ∈ S\{0} such that f(s) ≥ 0 and f(s + 1) ≤ 0.
(iv) Since #D = 3, we have

µ_{d1}/µ_{d2} = µ_{m1}/µ_{m2} = µ_{u1}/µ_{u2}.

Hence, #M = 2, µ_{m1}µ_{d2} − µ_{d1}µ_{m2} = 0, and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} = 0. Then equation (4) implies that f(s) = 0 for all s ≥ 0, so that S∗ = S\{0}. □
Remark 3.3 It immediately follows from Proposition 3.2 that in order to have S∗ = {s∗}, and hence to have a unique optimal policy, it is necessary to have #D ≤ 2 and sufficient to have either #D = 2, or #D = 1 and #M = 2. Moreover, note that when either #D = 2, or #D = 1 and #M = 2, the optimal policy is unique even though the servers d, m, and u are not uniquely defined in these cases. In particular, when #D = 2, then D = {d, m} and Theorem 3.1 and Proposition 3.2 imply that servers d and m are both at station 2 in all states s ∈ {1, 2, . . . , B + 2} (since S∗ = {s∗} = {1} in this case). Similarly, when #D = 1 and #M = 2, then M = {m, u} and Theorem 3.1 and Proposition 3.2 imply that servers m and u are both at station 1 in all states s ∈ {0, 1, . . . , B + 1} (since S∗ = {s∗} = {B + 2} in this case).
4 Two Stations and More than Three Servers
In this section, we provide a conjecture for the structure of the optimal server assignment policy
for systems with M > 3 servers and two stations. Let L = {1, . . . , M}. Without loss of generality,
we again assume (as in Section 3) that for all i ∈ {1, . . . ,M}, either µi1 > 0 or µi2 > 0 (because if
there exists a server i such that µi1 = µi2 = 0, then the problem reduces to having M − 1 servers).
Moreover, we again assume that there exist i, k ∈ {1, . . . ,M} such that µi1 > 0 and µk2 > 0
(because if µ11 = · · · = µM1 = 0 or µ12 = · · · = µM2 = 0, then the maximal throughput is zero and
any policy is optimal). Let
l1 ∈ L1 = arg minL
{µi1
µi2
}, (7)
7
and for 2 ≤ j ≤ M , let
lj ∈ Lj = arg minL\{l1,...,lj−1}
{µi1
µi2
}. (8)
Then we conjecture that there exist 1 = s∗_1 ≤ s∗_2 ≤ s∗_3 ≤ · · · ≤ s∗_{M−1} ≤ s∗_M = B + 2 such that the server assignment policy (δ∗)^∞ given as

δ∗(s) =
  servers l_1, . . . , l_M work at station 1                                          for s = 0,
  servers l_2, . . . , l_M work at station 1, server l_1 works at station 2           for 1 ≤ s ≤ s∗_2 − 1,
  servers l_3, . . . , l_M work at station 1, servers l_1, l_2 work at station 2      for s∗_2 ≤ s ≤ s∗_3 − 1,
  servers l_4, . . . , l_M work at station 1, servers l_1, l_2, l_3 work at station 2 for s∗_3 ≤ s ≤ s∗_4 − 1,
  ...
  server l_M works at station 1, servers l_1, . . . , l_{M−1} work at station 2      for s∗_{M−1} ≤ s ≤ B + 1,
  servers l_1, . . . , l_M work at station 2                                          for s = B + 2,  (9)
is optimal. It is clear from equations (7) and (8) that it is easy to determine l_1, . . . , l_M for any particular problem. Moreover, once l_1, . . . , l_M are determined, one can compute s∗_2, . . . , s∗_{M−1} by considering all the possibilities and choosing the one that provides the best throughput. This procedure requires much less effort than determining the optimal policy without knowledge of the conjectured structure. In particular, we now need to compute the throughput of \binom{B+M−1}{M−2} policies, rather than considering all 3^{M(B+3)} Markovian stationary deterministic policies (or all 2^{M(B+1)} non-idling policies). Note that the optimal policy for M = 3 specified in Section 3 and the optimal policy for M = 2 given by Andradottir, Ayhan, and Down [4] agree with the conjecture given above (when M = 2, server l_1 should be at station 2 and server l_2 at station 1 for all 1 ≤ s ≤ B + 1). Moreover, the extensive numerical examples given in Section 6 demonstrate that the conjectured policy also appears to be optimal for systems with M > 3.
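Under the conjecture, finding the best policy thus reduces to enumerating the nondecreasing interior switch vectors. A brute-force sketch of this search (illustrative names; all service rates are assumed strictly positive so that the ratios in (7) and (8) and the stationary distribution below are well defined):

```python
from itertools import combinations_with_replacement

def birth_death_throughput(birth, death):
    """Throughput of the buffer-content birth-death chain (interior rates
    assumed positive so the chain is irreducible)."""
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    return sum(d * q for d, q in zip(death, p)) / sum(p)

def best_conjectured_policy(rates, B):
    """Best policy of the form (9); rates is a list of (mu_i1, mu_i2)
    pairs, assumed strictly positive.  Servers are ordered by mu_i1/mu_i2
    as in (7)-(8); the interior switch points range over all nondecreasing
    vectors in {1, ..., B+2}, i.e. C(B+M-1, M-2) candidate policies.
    Returns (throughput, switch-point vector)."""
    M = len(rates)
    order = sorted(range(M), key=lambda i: rates[i][0] / rates[i][1])
    best_T, best_s = -1.0, None
    for mid in combinations_with_replacement(range(1, B + 3), M - 2):
        s = (1,) + mid + (B + 2,)          # s_1* = 1 and s_M* = B + 2
        # Server order[i] works at station 1 in states x < s[i], else at 2.
        birth = [sum(rates[order[i]][0] for i in range(M) if x < s[i])
                 for x in range(B + 3)]
        death = [sum(rates[order[i]][1] for i in range(M) if x >= s[i])
                 for x in range(B + 3)]
        T = birth_death_throughput(birth, death)
        if T > best_T:
            best_T, best_s = T, s
    return best_T, best_s
```

For M = 2 this class contains a single policy, with switch points (1, B + 2), matching the optimal policy of Andradottir, Ayhan, and Down [4].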
5 Reversibility of Two-Station Tandem Lines with Flexible Servers
Suppose that the original (forward) line has two stations and M ≥ 1 servers, and consider the reversed line in which station 2 is followed by station 1 (note that we have not relabeled the stations or changed the size of the buffer). Suppose that the original and reversed lines operate under the Markovian stationary deterministic server assignment policies π and π_R, respectively. Let δ and δ_R be the decision rules associated with the policies π and π_R, respectively (so that δ and δ_R specify what servers are assigned to stations 1 and 2 as a function of the state s ∈ S of the two systems). Let {X_R^{π_R}(t)} be the Markov chain model for the reversed line corresponding to the model {X^π(t)} specified in Section 2 for the original line (for example, X_R^{π_R}(t) = s ∈ {1, . . . , B + 1} if at time t, the reversed line has jobs to be processed at both stations and s − 1 jobs waiting to be processed in the buffer), and let T_R^{π_R} be the long-run average throughput under policy π_R in the reversed line (note that T^π and T_R^{π_R} may depend on the initial states of the Markov chains {X^π(t)} and {X_R^{π_R}(t)}, respectively, if these Markov chains have more than one recurrent equivalence class).
Throughout this section, we will assume that δ_R(s) = δ(B + 2 − s) for all s ∈ S. We have:

Proposition 5.1 If B + 2 − X_R^{π_R}(0) belongs to the same recurrent equivalence class of the Markov chain {X^π(t)} as X^π(0), then T_R^{π_R} = T^π.
Proof: For all s ∈ S, let κ_1^π(s) and κ_2^π(s) denote the sets of servers assigned to stations 1 and 2, respectively, in state s of the original line under the policy π. It is clear that the stochastic process {X^π(t)} is a birth-death process with state space S. For all s ∈ S, let λ^π(s) and γ^π(s) denote the birth and death rates in state s, respectively. Then λ^π(B + 2) = γ^π(0) = 0 and

λ^π(s) = ∑_{i∈κ_1^π(s)} µ_{i1}, for s = 0, . . . , B + 1,
γ^π(s) = ∑_{i∈κ_2^π(s)} µ_{i2}, for s = 1, . . . , B + 2.

Moreover, {X_R^{π_R}(t)} is also a birth-death process with state space S. For all s ∈ S, let λ_R^{π_R}(s) and γ_R^{π_R}(s) denote the birth and death rates in state s of the reversed line, respectively. Then the assumption that δ_R(s) = δ(B + 2 − s) for all s ∈ S implies that λ_R^{π_R}(B + 2) = γ_R^{π_R}(0) = 0 and

λ_R^{π_R}(s) = ∑_{i∈κ_2^π(B+2−s)} µ_{i2}, for s = 0, . . . , B + 1,
γ_R^{π_R}(s) = ∑_{i∈κ_1^π(B+2−s)} µ_{i1}, for s = 1, . . . , B + 2,

and hence that

λ_R^{π_R}(s) = γ^π(B + 2 − s) and γ_R^{π_R}(s) = λ^π(B + 2 − s), for all s ∈ S.
Now suppose that X^π(0) ∈ E^π, where E^π ⊂ S is a recurrent equivalence class of the Markov chain {X^π(t)}. Then E^{π_R} = {s ∈ S : B + 2 − s ∈ E^π} is a recurrent equivalence class of the Markov chain {X_R^{π_R}(t)} with X_R^{π_R}(0) ∈ E^{π_R}. Let {p^π(s) : s ∈ E^π} be the stationary distribution for {X^π(t)} on E^π and let {p_R^{π_R}(s) : s ∈ E^{π_R}} be the stationary distribution for {X_R^{π_R}(t)} on E^{π_R}. It is clear that p_R^{π_R}(s) = p^π(B + 2 − s) for all s ∈ E^{π_R}. Therefore, X_R^{π_R}(0) ∈ E^{π_R} and X^π(0) ∈ E^π imply that

T_R^{π_R} = ∑_{s∈E^{π_R}} γ_R^{π_R}(s) p_R^{π_R}(s) = ∑_{s∈E^{π_R}} λ^π(B + 2 − s) p^π(B + 2 − s) = ∑_{s∈E^π} λ^π(s) p^π(s) = T^π,

and the proof is complete. □
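Proposition 5.1 is easy to check numerically: under δ_R(s) = δ(B + 2 − s), the reversed line's birth and death rate vectors are the forward line's death and birth rate vectors read backwards. A quick sketch with arbitrary illustrative aggregate rates (chosen positive in the interior so both chains are irreducible):

```python
def throughput(birth, death):
    """Throughput of an irreducible birth-death chain on {0, ..., B+2}."""
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    return sum(d * q for d, q in zip(death, p)) / sum(p)

# Forward line with B = 2: hypothetical aggregate rates per state.
birth = [7.0, 5.0, 5.0, 2.0, 0.0]   # total station-1 rate in states 0..4
death = [0.0, 1.0, 3.0, 3.0, 6.0]   # total station-2 rate in states 0..4

# Reversed line under delta_R(s) = delta(B + 2 - s): the birth rate in
# state s equals the forward death rate in state B + 2 - s, and vice versa.
n = len(birth)
birth_R = [death[n - 1 - s] for s in range(n)]
death_R = [birth[n - 1 - s] for s in range(n)]

# The two throughputs agree, as Proposition 5.1 asserts.
assert abs(throughput(birth, death) - throughput(birth_R, death_R)) < 1e-12
```

The reflected stationary distribution p_R(s) = p(B + 2 − s) used in the proof is exactly what the detailed-balance recursion produces for the reversed rate vectors.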
Let Π_F ⊂ Π be the class of all non-idling threshold policies for the forward system, so that for all π ∈ Π_F and i = 1, . . . , M, there exists t_i^π ∈ {1, . . . , B + 2} such that server i is at the upstream station 1 in all states s < t_i^π and at the downstream station 2 in all states s ≥ t_i^π. For all π ∈ Π_F, let l_1^π, . . . , l_M^π be such that {l_1^π, . . . , l_M^π} = {1, . . . , M} and 1 ≤ s_1^π ≤ · · · ≤ s_M^π ≤ B + 2, where s_i^π = t_{l_i^π}^π for i = 1, . . . , M (see equation (9)). Hence, l_1^π is the first server to move from station 1 to station 2 and l_M^π is the last server to do so under the policy π ∈ Π_F. Let Π_R ⊂ Π and t_i^π, l_i^π, and s_i^π, where π ∈ Π_R and i = 1, . . . , M, be defined in the same manner for the reversed system. We have:
Proposition 5.2 (i) π ∈ Π_F if and only if π_R ∈ Π_R.
(ii) For all i = 1, . . . , M and s = 1, . . . , B + 2, t_i^π = s if and only if t_i^{π_R} = B + 3 − s.
(iii) For all i = 1, . . . , M, l_i^π = l_{M+1−i}^{π_R}.
(iv) For all i = 1, . . . , M and s = 1, . . . , B + 2, s_i^π = s if and only if s_{M+1−i}^{π_R} = B + 3 − s.
Proof: We have that t_i^π = s if and only if δ(s′) has server i at station 1 for all s′ ≤ s − 1 and δ(s′) has server i at station 2 for all s′ ≥ s. The definition of π_R now implies that t_i^π = s if and only if δ_R(B + 2 − s′) has server i at station 1 for all s′ ≤ s − 1 and δ_R(B + 2 − s′) has server i at station 2 for all s′ ≥ s. This proves parts (i) and (ii) of the proposition. Moreover, since t_{i_1}^π ≤ t_{i_2}^π implies that t_{i_1}^{π_R} = B + 3 − t_{i_1}^π ≥ B + 3 − t_{i_2}^π = t_{i_2}^{π_R}, it is clear that the order in which the servers move from the upstream station to the downstream station is reversed under π_R, and part (iii) of the proposition follows. Finally, from (ii) and (iii), we have s_i^π = t_{l_i^π}^π = s if and only if s_{M+1−i}^{π_R} = t_{l_{M+1−i}^{π_R}}^{π_R} = t_{l_i^π}^{π_R} = B + 3 − s, for all i = 1, . . . , M and s = 1, . . . , B + 2, proving that part (iv) of the proposition holds. □
We now present several corollaries that follow from Propositions 5.1 and 5.2. Recall that a policy
is optimal if it leads to the maximal throughput regardless of the initial state of the underlying
Markov chain; see equations (2) and (3).
Corollary 5.1 The policy π is optimal in the forward system if and only if the policy πR is optimal
in the reversed system.
Proof: If π_R is not optimal in the reversed system, then there exists a policy π′_R with decision rule δ′_R such that T_R^{π′_R} > T_R^{π_R}, at least for some initial states X_R^{π_R}(0) and X_R^{π′_R}(0). Let π′ (with decision rule δ′) be the policy in the forward system such that δ′(s) = δ′_R(B + 2 − s) for all s ∈ S. Proposition 5.1 now implies that if X^π(0) and B + 2 − X_R^{π_R}(0) belong to the same equivalence class of {X^π(t)} and also X^{π′}(0) and B + 2 − X_R^{π′_R}(0) belong to the same equivalence class of {X^{π′}(t)}, then

T^{π′} = T_R^{π′_R} > T_R^{π_R} = T^π,

which contradicts our assumption that π is an optimal policy for the forward system. □
Corollary 5.2 If the service rates µij, where i = 1, . . . ,M and j = 1, 2, are drawn independently
from a certain distribution, then the probability that π is optimal in the forward system is equal to
the probability that πR is optimal in the reversed system.
Let µ be the M × 2 matrix containing the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, and let µ_R be the M × 2 matrix containing the two columns of µ in the reverse order (corresponding to reversing the order of the two stations). For all µ and π ∈ Π, let T^π(µ) be the throughput of the system operated under the server assignment policy π when the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are given by µ, see equation (1). Moreover, for all µ, let Π∗(µ) ⊂ Π_F ⊂ Π be the set of policies conjectured to be optimal in Section 4 for the forward system with service rates µ. Assume that we pick a policy π∗(µ) ∈ Π∗(µ) at random. For all µ, i = 1, . . . , M, and s = 1, . . . , B + 2, let N(µ) be the number of policies in Π∗(µ), let N_{i,s}(µ) be the number of policies in Π∗(µ) such that s∗_i = s, and let s∗_i(µ) be the value of s∗_i corresponding to the policy π∗(µ), see equation (9) (note that s∗_i = s_i^{π∗(µ)}). We have:
Corollary 5.3 Suppose that the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are drawn independently from a certain distribution and that we choose a policy π∗(µ) from Π∗(µ) at random. Then, IP{s∗_i(µ) = s} = IP{s∗_{M+1−i}(µ) = B + 3 − s} and IE[s∗_i(µ)] + IE[s∗_{M+1−i}(µ)] = B + 3 for all i = 1, . . . , M and s = 1, . . . , B + 2.

Proof: Parts (i) and (iv) of Proposition 5.2 and the fact that π∗(µ) is chosen from Π∗(µ) at random imply that

IP{s∗_i(µ) = s} = IE[IP{s∗_i(µ) = s | µ}] = IE[N_{i,s}(µ)/N(µ)] = IE[N_{M+1−i,B+3−s}(µ_R)/N(µ_R)]
  = IP{s∗_{M+1−i}(µ_R) = B + 3 − s} = IP{s∗_{M+1−i}(µ) = B + 3 − s},

for all i = 1, . . . , M and s = 1, . . . , B + 2, where the last step follows from the fact that µ and µ_R are identically distributed. Moreover, we have

IE[s∗_i(µ)] = ∑_{s=1}^{B+2} s IP{s∗_i(µ) = s} = ∑_{s=1}^{B+2} s IP{s∗_{M+1−i}(µ) = B + 3 − s}
  = ∑_{s=1}^{B+2} (B + 3 − s) IP{s∗_{M+1−i}(µ) = s} = B + 3 − IE[s∗_{M+1−i}(µ)],

for all i = 1, . . . , M, and the proof is complete. □
We now consider server assignment policies that involve grouping several servers into teams,
where all servers in a team will move together between the two stations in the system. The following
corollary clearly follows from Propositions 5.1 and 5.2.
Corollary 5.4 If the servers in a set C ⊂ {1, . . . , M} are a team in an optimal policy for the
forward system, then they are also a team in an optimal policy for the reversed system.
For all µ, let l_1(µ), . . . , l_M(µ) be the ordered servers when the service rates are given by µ_{ij}, for i = 1, . . . , M and j = 1, 2, see equations (7) and (8) (if equations (7) and (8) do not specify l_1(µ), . . . , l_M(µ) uniquely, then ties can be broken arbitrarily, as long as this is done consistently in the forward and reversed systems so that l_i(µ) = l_{M+1−i}(µ_R) for all i = 1, . . . , M). For all K = 1, . . . , M, let

N_K = {(n_1, . . . , n_K) ∈ IN^K : n_1, . . . , n_K ≥ 1 and n_1 + · · · + n_K = M}. (10)

Moreover, for all µ, K ∈ {1, . . . , M}, and (n_1, . . . , n_K) ∈ N_K, let Π^{(n_1,...,n_K)}(µ) ⊂ Π be the set of all policies with K teams, where each team k ∈ {1, . . . , K} consists of servers l_{n_1+···+n_{k−1}+1}(µ), . . . , l_{n_1+···+n_k}(µ) (and hence has n_k servers), all servers in each team k ∈ {1, . . . , K} are at station 1 in all states s < s_k(µ) and at station 2 in all states s ≥ s_k(µ), and the switch points s_1(µ), . . . , s_K(µ) satisfy 1 ≤ s_1(µ) ≤ · · · ≤ s_K(µ) ≤ B + 2. For example, the first team to switch from station 1 to station 2 consists of servers l_1(µ), . . . , l_{n_1}(µ) and the last team to switch consists of servers l_{M−n_K+1}(µ), . . . , l_M(µ). For all µ and (n_1, . . . , n_K) ∈ N_K, let π^{(n_1,...,n_K)}(µ) ∈ Π^{(n_1,...,n_K)}(µ) be any policy with the team structure described above and with the switch point s_k^{(n_1,...,n_K)}(µ) of each team k ∈ {1, . . . , K} chosen optimally (if there are multiple sets of optimal switch points s_1(µ) ≤ · · · ≤ s_K(µ), then we choose a policy π^{(n_1,...,n_K)}(µ) arbitrarily from all policies with the prescribed team structure and optimal switch points). Let π_R^{(n_1,...,n_K)}(µ) be the corresponding policy in the reversed system. Then we have the following corollary:
Corollary 5.5 Suppose that K ∈ {1, . . . , M}, (n_1, . . . , n_K) ∈ N_K, the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are drawn independently from a certain distribution, and the Markov chains {X^{π^{(n_1,...,n_K)}(µ)}(t)} and {X^{π^{(n_K,...,n_1)}(µ)}(t)} have only one recurrent equivalence class. Then,

IP{π^{(n_1,...,n_K)}(µ) optimal} = IP{π^{(n_K,...,n_1)}(µ) optimal};
IP{T^{π^{(n_1,...,n_K)}(µ)}(µ) ≥ T^{π^n(µ)}(µ), ∀n ∈ N_K} = IP{T^{π^{(n_K,...,n_1)}(µ)}(µ) ≥ T^{π^n(µ)}(µ), ∀n ∈ N_K};
IE[T^{π^{(n_1,...,n_K)}(µ)}(µ)] = IE[T^{π^{(n_K,...,n_1)}(µ)}(µ)].

Proof: From equations (7) and (8) and parts (ii) and (iii) of Proposition 5.2, it is clear that π_R^{(n_1,...,n_K)}(µ) is equivalent to π^{(n_K,...,n_1)}(µ_R) (the only possible difference being that if there are multiple optimal sets of switch points, then we pick one such set arbitrarily in each of π^{(n_1,...,n_K)}(µ) and π^{(n_K,...,n_1)}(µ_R)). Hence, Proposition 5.1 and the fact that µ and µ_R are identically distributed imply that

IP{π^{(n_1,...,n_K)}(µ) optimal} = IP{π_R^{(n_1,...,n_K)}(µ) optimal} = IP{π^{(n_K,...,n_1)}(µ_R) optimal} = IP{π^{(n_K,...,n_1)}(µ) optimal}.

The other two conclusions of the corollary can be proved in a similar manner. □
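The team policies π^{(n_1,...,n_K)}(µ) can likewise be evaluated by enumerating switch points, which is how the heuristics of Section 6.2 can be assessed. A sketch (hypothetical names; all service rates assumed strictly positive): since an arbitrary switch vector may make the chain reducible, the code restricts attention to the single recurrent class {min(sw) − 1, . . . , max(sw)}, on which the long-run throughput does not depend on the initial state.

```python
from itertools import combinations_with_replacement

def team_policy_throughput(rates, B, sizes):
    """Best throughput over the class Pi^{(n_1,...,n_K)}(mu): teams of the
    given sizes (n_1, ..., n_K summing to M) switch from station 1 to
    station 2 at nondecreasing thresholds, optimized by enumeration.
    All service rates are assumed strictly positive."""
    M, K = len(rates), len(sizes)
    order = sorted(range(M), key=lambda i: rates[i][0] / rates[i][1])
    # Aggregate (station-1, station-2) rates of each team of consecutive
    # servers in the ordering (7)-(8).
    bounds = [sum(sizes[:k]) for k in range(K + 1)]
    team = [(sum(rates[order[i]][0] for i in range(bounds[k], bounds[k + 1])),
             sum(rates[order[i]][1] for i in range(bounds[k], bounds[k + 1])))
            for k in range(K)]
    best = 0.0
    for sw in combinations_with_replacement(range(1, B + 3), K):
        # Team k is at station 1 in states x < sw[k], at station 2 otherwise.
        birth = [sum(t1 for (t1, _), s in zip(team, sw) if x < s)
                 for x in range(B + 3)]
        death = [sum(t2 for (_, t2), s in zip(team, sw) if x >= s)
                 for x in range(B + 3)]
        lo, hi = sw[0] - 1, sw[-1]       # the single recurrent class
        p = {lo: 1.0}
        for x in range(lo, hi):
            p[x + 1] = p[x] * birth[x] / death[x + 1]
        z = sum(p.values())
        best = max(best, sum(death[x] * p[x] for x in p) / z)
    return best
```

For two single-server teams with rates (2, 1) and (1, 2) and B = 0, the best two-team policy attains throughput 12/7, while merging both servers into one team yields only 3/2.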
6 Numerical Results
In this section, we provide numerical results for systems with two stations and M ≥ 3 servers. In
Section 6.1, we first investigate whether the policy described in Section 4 is optimal for systems
with M > 3 and then present some interesting features of this policy. In Section 6.2, we develop
heuristic policies for tandem queues with two stations that group the M ≥ 3 available servers into
two or three teams and compare the throughput of these heuristics with the optimal throughput.
6.1 Conjectured Optimal Policy
In this section, we discuss two sets of numerical experiments aimed at determining whether the
policy conjectured to be optimal in Section 4 is in fact optimal for M > 3 and also at understanding
the behavior of that policy (recall that the conjectured policy is known to be optimal for M ≤ 3,
see Theorem 3.1 and Andradottir, Ayhan, and Down [4]). In the first set of numerical experiments,
we consider systems with two stations, M ∈ {3, 4, 5, 7, 10} servers, and B ∈ {0, 1, . . . , 5, 10, 15, 20} buffers between the two stations, where the service rate µij of each server i ∈ {1, . . . , M} at each
station j ∈ {1, 2} is drawn independently from a uniform distribution with range [0, 100]. For
systems with M ≤ 5, we generate 1,000,000 sets of service rates µij , where i = 1, . . . ,M and
j = 1, 2, for each buffer size B. On the other hand, for M ∈ {7, 10} and each choice of B, we
obtain our numerical results from 10,000 and 1,000 randomly generated systems, respectively. In
the second set of numerical experiments, we consider systems with 3, 4, or 5 servers, where the
service rates µij , for i = 1, . . . , M and j = 1, 2, take on all combinations of the values 1, 2, . . . , 10 for
systems with 3 or 4 servers and all combinations of the values 1, 2, . . . , 5 for systems with 5 servers.
The size B of the buffer between stations 1 and 2 again satisfies B ∈ {0, 1, . . . , 5, 10, 15, 20}.

For each system with M > 3 considered in the two sets of numerical experiments described
in the previous paragraph, we compute the throughput of the policy that is conjectured to be
optimal in Section 4, as well as the throughput of the optimal policy (which is obtained by using
the policy iteration algorithm for communicating Markov chains as described in the Appendix).
(We use a smaller number of systems for M ∈ {7, 10} and randomly generated service rates and
also for M = 5 and deterministic service rates because determining the optimal policy requires a
considerable amount of effort for systems with large numbers of servers.) The throughput of the
conjectured optimal policy is always equal to the optimal throughput, which implies that for all of
the systems considered in the two sets of numerical experiments discussed in this section, the policy
described in Section 4 is indeed an optimal policy. These extensive numerical results demonstrate
that the policy described in Section 4 appears to be optimal for systems with M > 3 (at least with
high probability).
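The conjectured policy induces a birth-death chain on the states 0, . . . , B + 2, so its throughput can be computed directly from the stationary distribution, and for small systems the optimal switch points can be found by brute force. The following sketch illustrates this (the helper names are hypothetical, not from the paper; it assumes the servers are already indexed in the order l_1, . . . , l_M of Section 4, that all service rates are strictly positive, and that the policy places at least one server at station 2 in every state s ≥ 1, as the conjectured policy with s∗_1 = 1 does):

```python
import itertools

import numpy as np

def throughput(mu, switch, B):
    """Throughput of a switch-point policy: server i works at station 2 in
    states s >= switch[i] and at station 1 in states s < switch[i], where
    the state s ranges over 0, ..., B+2."""
    M, S = len(mu), B + 3
    a = [sum(mu[i][0] for i in range(M) if s < switch[i]) for s in range(S)]
    b = [sum(mu[i][1] for i in range(M) if s >= switch[i]) for s in range(S)]
    pi = np.ones(S)
    for s in range(S - 1):  # birth-death balance: pi[s+1] b[s+1] = pi[s] a[s]
        pi[s + 1] = pi[s] * a[s] / b[s + 1]
    pi /= pi.sum()
    return sum(pi[s] * b[s] for s in range(S))

def best_switch_points(mu, B):
    """Brute force over nondecreasing switch points with s*_1 = 1 and
    s*_M = B + 2, the form the conjectured optimal policy takes."""
    M = len(mu)
    best_sw, best_tp = None, -1.0
    for mid in itertools.combinations_with_replacement(range(1, B + 3), M - 2):
        sw = (1,) + mid + (B + 2,)
        tp = throughput(mu, sw, B)
        if tp > best_tp:
            best_sw, best_tp = sw, tp
    return best_sw, best_tp
```

For instance, with two identical servers whose rates all equal 1 and B = 0, the policy (1, 2) attains throughput 1.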
We now study the behavior of the conjectured optimal policy in a more detailed manner. Recall
that s∗_i, where i ∈ {1, . . . , M}, denotes the state where the ith ordered server (i.e., server l_i, see
Section 4) moves from station 1 to station 2 according to the conjectured optimal policy (so that
1 ≤ s∗_1 ≤ s∗_2 ≤ · · · ≤ s∗_{M−1} ≤ s∗_M = B + 2). We do not have simple expressions for computing the
switch points s∗_2, . . . , s∗_{M−1} even when M = 3, see the definition of the set S∗ in Section 3. For
each choice of M, B, and either random or deterministic service rates, let s̄∗_i denote the average s∗_i value over the total number of systems considered, for i = 1, . . . , M. In order to obtain a better
understanding of when servers l_2, . . . , l_{M−1} move from station 1 to station 2, we consider the ratio
of the average switch points s̄∗_2, . . . , s̄∗_{M−1} to the total number of states (B + 3) in the numerical
experiments described above. In the interest of space, we display these ratios only for systems with
four or five servers and randomly generated service rates in Table 1.
Buffer          M = 4                         M = 5
 Size    s̄∗_2/(B+3)  s̄∗_3/(B+3)    s̄∗_2/(B+3)  s̄∗_3/(B+3)  s̄∗_4/(B+3)
   0      0.390595     0.609405      0.348271     0.499958     0.651729
   1      0.355158     0.644842      0.290877     0.499824     0.709123
   2      0.332058     0.667942      0.257209     0.499993     0.742791
   3      0.315841     0.684160      0.233439     0.499846     0.766561
   4      0.302724     0.697276      0.215206     0.500364     0.784794
   5      0.292125     0.707875      0.200529     0.500062     0.799471
  10      0.257131     0.742869      0.153968     0.499972     0.846032
  15      0.237132     0.762868      0.128150     0.500215     0.871850
  20      0.224115     0.775885      0.111320     0.499494     0.888680
Table 1: Ratio of the average switch points to the number of states for randomly generated service
rates.
The numerical results given in Table 1 are consistent with Corollary 5.3 in that

s̄∗_i/(B + 3) + s̄∗_{M+1−i}/(B + 3) ≈ 1 for all i = 1, . . . , M,

and

s̄∗_{(M+1)/2}/(B + 3) ≈ 0.5 when M is odd.
Moreover, for i ≤ M/2, the ratios s̄∗_i/(B + 3) are decreasing in B and, for i ≥ (M + 2)/2, the ratios
s̄∗_i/(B + 3) are increasing in B. Similar results were obtained for M ∈ {7, 10} and randomly generated
service rates and for M ∈ {3, 4, 5} and deterministic service rates. More specifically, if we consider
M = 7 with B = 20 and randomly generated service rates, then we obtain

s̄∗_2/(B + 3) ≈ 0.0557, s̄∗_3/(B + 3) ≈ 0.1641, s̄∗_4/(B + 3) ≈ 0.4960, s̄∗_5/(B + 3) ≈ 0.8359, s̄∗_6/(B + 3) ≈ 0.9433.
Similarly, when M = 10, B = 20, and the service rates are randomly generated, then we have

s̄∗_2/(B + 3) ≈ 0.0455, s̄∗_3/(B + 3) ≈ 0.0663, s̄∗_4/(B + 3) ≈ 0.1451, s̄∗_5/(B + 3) ≈ 0.3514, s̄∗_6/(B + 3) ≈ 0.6489, s̄∗_7/(B + 3) ≈ 0.8523, s̄∗_8/(B + 3) ≈ 0.9357, s̄∗_9/(B + 3) ≈ 0.9555.

Since the ratio s̄∗_i/(B + 3) is very close to 0 or 1 for several servers i ∈ {2, . . . , M − 1}, these
numerical results suggest that policies that group several servers into teams that move together
between stations 1 and 2 may yield good performance, at least for systems with two stations, a large
number of servers M, and a large buffer B between the two stations.
We next investigate how many teams the conjectured optimal policy has in the numerical
experiments described above (for example, there are two teams if there exists i∗ ∈ {2, . . . , M} such
that s∗i = s∗1 = 1 for all i < i∗ and s∗i = s∗M = B + 2 for all i ≥ i∗, and there are M teams
if 1 = s∗1 < s∗2 < · · · < s∗M−1 < s∗M = B + 2). Note that the total number of teams cannot
exceed B + 2, the total number of possible switch points. For each choice of M, B, and either
random or deterministic service rates, let r_i denote the ratio of the number of systems where the
conjectured optimal policy has i teams to the total number of systems generated, for i = 2, . . . , M
(since 1 = s∗_1 < s∗_M = B + 2, the policy conjectured to be optimal in Section 4 cannot have fewer
than two teams). Hence r_i estimates the probability of having i teams in the conjectured optimal
policy. Table 2 shows the average number of teams, Σ_{i=2}^{M} i r_i, as a function of the number of servers
M and buffer size B for the two sets of numerical experiments described previously, and Tables 3
and 4 display the values of r2, r3, and rM for various numbers of servers M and buffer sizes B for
the first and second sets of numerical experiments, respectively.
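Since servers sharing a switch point move between the stations together, the number of teams in a policy can be read directly off its switch-point vector; a one-line sketch (a hypothetical helper, not from the paper):

```python
def team_count(switch_points):
    """Servers with equal switch points form one team, so the number of
    teams equals the number of distinct switch points."""
    return len(set(switch_points))
```

For example, with B = 3 the vector (1, 1, 4, 4) gives two teams, while (1, 2, 3, 4) gives M = 4 teams.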
Buffer Random Service Rates Deterministic Service Rates
Size M = 3 M = 4 M = 5 M = 7 M = 10 M = 3 M = 4 M = 5
0 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
1 2.45 2.58 2.69 2.82 2.92 2.45 2.58 2.66
2 2.64 2.91 3.10 3.38 3.65 2.64 2.92 3.03
3 2.74 3.12 3.37 3.77 4.19 2.74 3.11 3.24
4 2.80 3.25 3.55 4.04 4.64 2.78 3.23 3.38
5 2.83 3.33 3.67 4.23 4.89 2.81 3.30 3.46
10 2.89 3.51 3.95 4.69 5.60 2.85 3.48 3.64
15 2.90 3.56 4.04 4.84 5.86 2.86 3.53 3.69
20 2.91 3.58 4.06 4.90 6.03 2.87 3.54 3.78
Table 2: Average number of teams.
As expected, Table 2 shows that the average number of teams increases both with the number
of servers M and with the buffer size B. However, the growth rate is rather slow, so that the
average number of teams is significantly smaller than the maximum possible number of teams (i.e.,
min{M, B +2}) for large M and B. Moreover, Table 2 shows that for given values of M and B, the
average number of teams in the random and deterministic cases are quite similar, with the averages
being slightly larger when the service rates are generated at random, rather than deterministically
(this may be due to the fact that we use a larger range of possible values when the service rates
are generated at random, rather than deterministically, leading to larger differences between the
Buffer M = 3 M = 4 M = 5 M = 7 M = 10
Size r2 r3 rM r2 r3 rM r2 r3 rM r2 r3 rM r2 r3 rM
0 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000
1 0.550 0.450 0.450 0.425 0.575 0.000 0.315 0.685 0.000 0.184 0.816 0.000 0.081 0.919 0.000
2 0.357 0.643 0.643 0.232 0.618 0.149 0.131 0.640 0.000 0.050 0.517 0.000 0.012 0.332 0.000
3 0.259 0.741 0.741 0.144 0.594 0.262 0.066 0.535 0.032 0.017 0.317 0.000 0.002 0.128 0.000
4 0.204 0.796 0.796 0.098 0.560 0.343 0.037 0.447 0.067 0.007 0.205 0.000 0.001 0.043 0.000
5 0.171 0.829 0.829 0.070 0.530 0.400 0.023 0.381 0.099 0.003 0.141 0.000 0.000 0.029 0.000
10 0.112 0.888 0.888 0.022 0.449 0.529 0.004 0.231 0.189 0.000 0.039 0.004 0.000 0.002 0.000
15 0.097 0.903 0.903 0.011 0.419 0.570 0.001 0.185 0.224 0.000 0.021 0.007 0.000 0.000 0.000
20 0.091 0.909 0.909 0.007 0.405 0.588 0.001 0.168 0.233 0.000 0.016 0.009 0.000 0.000 0.000
Table 3: Team probabilities for randomly generated service rates.
capabilities of the different servers at the two stations in the system). Similarly, Tables 3 and
4 show that the probability of having two teams decreases as the buffer size B increases for all
M ≥ 3. Moreover, the probability of having three teams decreases with the buffer size for all
M ≥ 5 (looking only at B ≥ 1, so that it is possible to have three teams). When M = 3, r3
increases as the buffer size increases, which is reasonable since in this case r3 = rM ; when M = 4,
r3 first increases and then decreases with B. Finally, rM increases with the buffer size in all cases.
Note however that for fixed B, Tables 3 and 4 show that rM decreases as M increases (in fact,
when M = 10 in Table 3, then rM = 0 for all B ∈ {0, 1, . . . , 5, 10, 15, 20}). Together with Table 2,
this suggests that the conjectured optimal policy is likely to have some servers grouped into teams,
at least for large numbers of servers M .
6.2 Heuristic Server Assignment Policies with Two or Three Teams
The numerical results given in Section 6.1 suggest that the optimal server assignment policy for
systems with two stations in tandem and M servers has the structure described in Section 4.
However, this policy may be difficult to implement in practice when M is large. In this section,
we consider policies in which the servers are grouped into two or three teams, and then the teams
are assigned to stations in the manner found to be optimal for systems with two or three servers,
see Andradottir, Ayhan, and Down [4] and Section 3. Our goal is to develop server assignment
heuristics that are easily implementable and also robust with respect to the server capabilities in
that their average throughput as the service rates vary is near-optimal.
We first order the servers as is done in Section 4, see equations (7) and (8). Then we consider all
ways of grouping the ordered servers into two or three teams. More specifically, for all (n, M − n) ∈ N_2 (see equation (10)), we consider using the server assignment policy of Andradottir, Ayhan, and
Down [4] with servers l1, . . . , ln in one team and servers ln+1, . . . , lM in the other team. Similarly,
for all (n1, n2, M − n1 − n2) ∈ N3, we consider using the server assignment policy found to be
Buffer M = 3 M = 4 M = 5
Size r2 r3 rM r2 r3 rM r2 r3 rM
0 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000
1 0.547 0.453 0.453 0.425 0.575 0.000 0.315 0.685 0.000
2 0.356 0.644 0.644 0.235 0.615 0.150 0.150 0.670 0.000
3 0.264 0.736 0.736 0.150 0.592 0.258 0.085 0.604 0.018
4 0.217 0.783 0.783 0.105 0.561 0.333 0.058 0.540 0.035
5 0.190 0.810 0.810 0.079 0.537 0.383 0.045 0.497 0.050
10 0.148 0.852 0.852 0.027 0.463 0.510 0.031 0.392 0.091
15 0.139 0.861 0.861 0.015 0.440 0.545 0.029 0.362 0.108
20 0.133 0.867 0.867 0.011 0.434 0.555 0.013 0.315 0.138
Table 4: Team probabilities for deterministic service rates.
optimal in Section 3 with servers l1, . . . , ln1 in the first team, servers ln1+1, . . . , ln1+n2 in the second
team, and servers ln1+n2+1, . . . , lM in the third team. Note that in all of these server assignment
heuristics, the order of the servers (and hence the composition of the teams) depends on the service
rates, but the size of the teams does not. In order to evaluate and compare the performance of
these policies, we perform three sets of numerical experiments in which the service rates are drawn
independently from uniform distributions with ranges [40, 60], [20, 80], and [0, 100], respectively.
Note that these three uniform distributions all have a common mean but different variances. Hence
these distributions are chosen to model situations where the capabilities of the servers at the two
stations tend to be quite similar, quite different, and very different, respectively. In all cases, we
consider systems with M = 3, 4, . . . , 10 servers and B = 5, 10, 20 buffers. For each M ≤ 6 and
each choice of B, we generate 100,000 sets of service rates µij , where i = 1, . . . , M and j = 1, 2.
On the other hand, for M = 7, 8, 9, and 10, we generate 10,000, 5,000, 1,000, and 500 sets of service
rates, respectively, for each value of B (as in Section 6.1, we generate fewer sets of service rates for
systems with large numbers of servers M because of the excessive amount of computational effort
required for determining the optimal policy for systems with many servers).
In the two team setting, the numerical experiments described in the previous paragraph suggest
that the two team heuristic in which the numbers of servers in the two teams differ by at most one
performs best on average, in that it yields the highest average throughput and also leads to the
highest throughput among all the two team policies that we consider more often than any other
two team policy. Thus, if M is even, then our two team heuristic assigns servers l1, . . . , lM/2 to
station 2 unless station 2 is starved, and servers l(M+2)/2, . . . , lM to station 1 unless station 1 is
blocked. All servers work at station 2 (station 1) when station 1 (station 2) is blocked (starved).
On the other hand if M is odd, then either servers l1, . . . , l(M−1)/2 form the downstream team and
servers l(M+1)/2, . . . , lM form the upstream team or servers l1, . . . , l(M+1)/2 form the downstream
team and servers l(M+3)/2, . . . , lM form the upstream team (note that Corollary 5.5 shows that on
the average, these two server assignment heuristics behave in the same manner).
In the three team setting, the best average performance is obtained by forming the teams in such
a way that the size of the moving team is smaller than the sizes of the upstream and downstream
teams and the sizes of the upstream and downstream teams differ by at most one (as in the two team
setting, this approach to forming the teams maximizes both the average performance and also the
probability of achieving the best performance among all three team policies under consideration).
When M is odd, this translates into having servers l1, . . . , l(M−1)/2 as the downstream team, server
l(M+1)/2 as the moving team, and servers l(M+3)/2, . . . , lM as the upstream team. However, when
M is even, then there are two cases: If M ∈ {4, 6}, then either servers l1, . . . , lM/2 form the
downstream team, server l(M+2)/2 forms the moving team, and servers l(M+4)/2, . . . , lM form the
upstream team, or servers l1, . . . , l(M−2)/2 form the downstream team, server lM/2 forms the moving
team, and servers l(M+2)/2, . . . , lM form the upstream team (note that the average performance of
these two teams is the same by Corollary 5.5). On the other hand, if M ∈ {8, 10, 12, . . .}, then
servers l1, . . . , l(M−2)/2 form the downstream team, servers lM/2 and l(M+2)/2 form the moving team,
and servers l(M+4)/2, . . . , lM form the upstream team.
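The fixed team sizes described above depend on M alone, so forming the heuristic teams from an already ordered server list is mechanical. The following sketch captures the rules just stated (hypothetical helpers; the ordering l_1, . . . , l_M itself comes from equations (7) and (8) and is taken as given, and for odd M the two team variant with the smaller downstream team is used, as in the text):

```python
def two_team_heuristic(ordered):
    """Split ordered servers l_1, ..., l_M into downstream and upstream
    teams whose sizes differ by at most one."""
    k = len(ordered) // 2  # floor(M/2): M/2 for even M, (M-1)/2 for odd M
    return ordered[:k], ordered[k:]

def three_team_heuristic(ordered):
    """Downstream, moving, and upstream teams with a small moving team."""
    M = len(ordered)
    if M % 2 == 1:              # odd M: single middle server moves
        d = (M - 1) // 2
        return ordered[:d], ordered[d:d + 1], ordered[d + 1:]
    if M in (4, 6):             # small even M: one moving server
        d = M // 2
        return ordered[:d], ordered[d:d + 1], ordered[d + 1:]
    d = (M - 2) // 2            # even M >= 8: two moving servers
    return ordered[:d], ordered[d:d + 2], ordered[d + 2:]
```

With M = 7, for instance, the three team heuristic yields downstream team {l_1, l_2, l_3}, moving team {l_4}, and upstream team {l_5, l_6, l_7}.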
We now compare our two and three team heuristics with other server assignment policies,
including the optimal policy (determined by using the policy iteration algorithm for communicating
Markov chains, see the Appendix), the best two team policy, the best three team policy, and a
benchmark policy, namely the teamwork policy of Van Oyen, Gel, and Hopp [19] (where all servers
work in a single team that will follow each job from the first to the last station and only starts work
on a new job once all work on the previous job has been completed). For each randomly generated
choice of µ (see Section 5), the best two team policy is the one that yields the highest throughput
among all two team policies such that servers l1(µ), . . . , ln(µ) are primarily assigned to station 2 and
servers ln+1(µ), . . . , lM (µ) are primarily assigned to station 1, where n ∈ {1, . . . ,M −1}. Similarly,
for each choice of µ, the best three team policy is the one that yields the highest throughput
among all three team policies such that servers l1(µ), . . . , lk1(µ) form the downstream team, servers
lk1+1(µ), . . . , lk2(µ) form the moving team, and servers lk2+1(µ), . . . , lM (µ) form the upstream
team, where k1 ∈ {1, . . . ,M − 1} and k2 ∈ {k1, . . . , M − 1}. (In other words, in the best two and
three team policies, the team sizes are allowed to depend on µ.)
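Under the stated index ranges, the candidate team sizes that the best two and three team policies search over can be enumerated directly (a sketch with hypothetical helper names; sizes only, with an empty moving team allowed when k2 = k1):

```python
def two_team_splits(M):
    """Team sizes (n, M - n) for n in {1, ..., M-1}."""
    return [(n, M - n) for n in range(1, M)]

def three_team_splits(M):
    """Team sizes (k1, k2 - k1, M - k2) for k1 in {1, ..., M-1} and
    k2 in {k1, ..., M-1}; k2 = k1 gives an empty moving team."""
    return [(k1, k2 - k1, M - k2)
            for k1 in range(1, M) for k2 in range(k1, M)]
```

For M = 5, this yields 4 two-team splits and 10 three-team splits, each summing to M.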
Although we perform nine sets of numerical experiments (one for each combination of three
choices of distribution for the service rates and three buffer sizes), in the interest of space we only
show results from four sets of numerical experiments here. For all a, b ∈ IR with a ≤ b, let U [a, b]
denote the uniform distribution with range [a, b]. Tables 5 through 8 show 95% confidence intervals
for the average throughput values of the policies described in the previous three paragraphs when
the service rates are drawn from either the U [40, 60] or U [0, 100] distribution and the buffer size
B satisfies B ∈ {5, 20}. Since we have two alternative ways of forming the teams for the two team
heuristic when M ∈ {3, 5, 7}, we arbitrarily choose the one that has servers l1, . . . , l(M−1)/2 in the
downstream team and servers l(M+1)/2, . . . , lM in the upstream team. Similarly, for the three team
heuristic and M ∈ {4, 6}, we assign servers l1, . . . , lM/2 to the downstream team, server l(M+2)/2 to
the moving team, and servers l(M+4)/2, . . . , lM to the upstream team.
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 74.85 ± 0.06 77.44 ± 0.06 77.84 ± 0.06 78.18 ± 0.07 78.18 ± 0.07 78.18 ± 0.07
4 99.86 ± 0.07 104.51 ± 0.08 104.53 ± 0.08 104.62 ± 0.08 104.71 ± 0.08 104.75 ± 0.08
5 124.86 ± 0.08 130.38 ± 0.08 130.77 ± 0.08 131.19 ± 0.08 131.22 ± 0.08 131.25 ± 0.08
6 149.87 ± 0.08 157.31 ± 0.09 157.33 ± 0.09 157.50 ± 0.09 157.69 ± 0.09 157.76 ± 0.09
7 174.82 ± 0.11 183.24 ± 0.12 183.60 ± 0.12 184.03 ± 0.12 184.11 ± 0.12 184.20 ± 0.12
8 199.83 ± 0.16 210.05 ± 0.18 210.08 ± 0.18 210.43 ± 0.18 210.59 ± 0.18 210.70 ± 0.18
9 224.94 ± 0.38 236.16 ± 0.42 236.49 ± 0.42 236.92 ± 0.42 237.12 ± 0.42 237.24 ± 0.42
10 249.90 ± 0.57 262.68 ± 0.64 262.73 ± 0.64 263.25 ± 0.64 263.38 ± 0.64 263.53 ± 0.64
Table 5: Throughput values for systems with U [40, 60]-distributed service rates and B = 5.
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 74.85 ± 0.06 77.47 ± 0.06 77.88 ± 0.06 78.33 ± 0.07 78.33 ± 0.07 78.33 ± 0.07
4 99.86 ± 0.07 105.01 ± 0.08 105.03 ± 0.08 105.11 ± 0.08 105.17 ± 0.08 105.19 ± 0.08
5 124.86 ± 0.08 130.61 ± 0.08 130.99 ± 0.08 131.71 ± 0.09 131.72 ± 0.09 131.73 ± 0.09
6 149.87 ± 0.08 158.14 ± 0.09 158.15 ± 0.09 158.28 ± 0.09 158.39 ± 0.09 158.45 ± 0.09
7 174.82 ± 0.11 183.77 ± 0.12 184.20 ± 0.12 184.63 ± 0.12 184.96 ± 0.12 184.99 ± 0.12
8 199.83 ± 0.16 210.78 ± 0.19 210.82 ± 0.18 210.96 ± 0.19 211.26 ± 0.18 211.35 ± 0.18
9 224.94 ± 0.38 237.10 ± 0.43 237.82 ± 0.43 238.34 ± 0.43 238.58 ± 0.43 238.63 ± 0.43
10 249.90 ± 0.57 263.81 ± 0.64 263.85 ± 0.64 264.19 ± 0.65 264.31 ± 0.64 264.43 ±0.64
Table 6: Throughput values for systems with U [40, 60]-distributed service rates and B = 20.
As expected, Tables 5 through 8 show that the throughputs achieved by all the policies con-
sidered in this section appear to increase as the number of servers M increases. Similarly, the
throughputs of all the policies except for the teamwork policy appear to increase with both the
number of buffers B and the variability in the service rates µij , where i = 1, . . . , M and j = 1, 2.
On the other hand, the throughput of the teamwork policy is by definition insensitive to B and
it appears to decrease slightly as the variability in the service rates increases. The fact that the
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 70.70 ± 0.32 82.64 ± 0.37 85.05 ± 0.37 86.23 ± 0.38 86.23 ± 0.38 86.23 ± 0.38
4 95.77 ± 0.36 116.56 ± 0.43 117.20 ± 0.42 117.61 ± 0.43 118.63 ± 0.43 118.73 ± 0.43
5 120.82 ± 0.40 146.20 ± 0.47 149.03 ± 0.46 150.50 ± 0.48 150.97 ± 0.47 151.14 ± 0.47
6 145.92 ± 0.43 179.93 ± 0.51 181.07 ± 0.50 181.76 ± 0.50 183.32 ± 0.51 183.51 ± 0.51
7 170.53 ± 0.55 209.47 ± 0.64 212.46 ± 0.64 213.56 ± 0.64 215.06 ± 0.64 215.46 ± 0.64
8 195.57 ± 0.83 242.87 ± 0.97 244.41 ± 0.95 246.28 ± 0.96 247.34 ± 0.96 247.87 ± 0.96
9 221.15 ± 1.96 273.33 ± 2.26 276.57 ± 2.22 278.39 ± 2.25 279.84 ± 2.24 280.49 ± 2.25
10 246.06 ± 2.97 305.28 ± 3.45 307.42 ± 3.42 309.85 ± 3.48 311.04 ± 3.45 311.81 ± 3.46
Table 7: Throughput values for systems with U [0, 100]-distributed service rates and B = 5.
throughputs of all the policies except for the teamwork policy increase with the variability in the
service rates is reasonable because only the teamwork policy is unable to take advantage of this
variability by assigning servers primarily to tasks that they are good at (i.e., have a high service
rate at).
Tables 5 through 8 also show that the average behavior of our two team heuristic is in all cases
very close to that of the best two team policy and also that the average behavior of our three
team heuristic is always very similar to the average performance of the best three team policy.
Both the two and three team heuristics perform significantly better than the teamwork policy,
especially when the service rates are highly variable, with the three team heuristic showing slightly
better average performance than the two team heuristic. Finally, the average performance of the
three team heuristic is always very close to that of the optimal policy (it is equal to the average
performance of the optimal policy when M = 3, as predicted by Theorem 3.1). These observations
suggest that both of our server assignment heuristics are likely to yield very good performance
in practice, and that the behavior of the three team heuristic is usually near-optimal. Moreover,
although our heuristics are designed to be both easily implementable and also robust with respect
to the service rates (in that the sizes of the teams do not depend on the service rates), our numerical
results indicate that there is very little room for obtaining improved average performance through
the use of more complex policies or policies that depend more heavily on the service rates.
7 Conclusion
For Markovian queueing systems with two stations in tandem, finite intermediate buffer, and three
flexible and collaborative servers, we have completely specified how servers should be assigned to
stations in order to achieve maximal long-run average throughput. Moreover, we have provided
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 70.70 ± 0.32 83.14 ± 0.37 85.91 ± 0.37 87.47 ± 0.39 87.47 ± 0.39 87.47 ± 0.39
4 95.77 ± 0.36 118.33 ± 0.44 119.06 ± 0.44 119.63 ± 0.44 120.90 ± 0.44 120.94 ± 0.44
5 120.82 ± 0.40 147.89 ± 0.47 151.60 ± 0.47 153.68 ± 0.49 154.17 ± 0.48 154.25 ± 0.48
6 145.92 ± 0.43 183.13 ± 0.53 184.67 ± 0.51 185.30 ± 0.53 187.47 ± 0.52 187.58 ± 0.52
7 170.53 ± 0.55 212.57 ± 0.66 216.97 ± 0.65 218.96 ± 0.67 220.15 ± 0.66 220.38 ± 0.66
8 195.57 ± 0.83 247.58 ± 1.01 249.97 ± 0.97 251.58 ± 0.98 253.40 ± 0.98 253.73 ± 0.99
9 221.15 ± 1.96 277.93 ± 2.34 283.03 ± 2.28 284.81 ± 2.36 286.76 ± 2.30 287.18 ± 2.30
10 246.06 ± 2.97 311.28 ± 3.66 316.44 ± 3.52 317.19 ± 3.57 318.77 ± 3.54 319.25 ± 3.55
Table 8: Throughput values for systems with U [0, 100]-distributed service rates and B = 20.
a conjecture for the structure of an optimal server assignment policy for two-station tandem lines
with an arbitrary number of flexible and collaborative servers; the results of extensive numerical
experiments suggest that our conjecture appears to be correct. Finally, we have proposed heuris-
tic server assignment policies that involve grouping all available servers into two or three teams
and presented numerical results that suggest that our heuristic policies (especially our three team
heuristic) generally achieve near-optimal long-run average throughput.
Acknowledgments
The research of the first author was supported by the National Science Foundation under grants
DMI–0000135 and DMI–0217860. The research of the second author was supported by the National
Science Foundation under grants DMI–9908161 and DMI–9984352.
Appendix: Proof of Theorem 3.1
We will use the notation a_{σd σm σu} for the possible actions, where, for i = d, m, u, σi ∈ {I, 1, 2} is
the status of server i, with σi = I when server i is idle and σi = j ∈ {1, 2} when server i is working
at station j. Then the set As of allowable actions in state s ∈ S is given by

As = {aIII, a1II, aI1I, aII1, a11I, a1I1, aI11, a111} for s = 0,

As = {aIII, a1II, aI1I, aII1, a11I, a1I1, aI11, a2II, aI2I, aII2, a22I, a2I2, aI22, a111, a211, a112, a121, a122, a221, a212, a222} for s ∈ {1, . . . , B + 1},

As = {aIII, a2II, aI2I, aII2, a22I, a2I2, aI22, a222} for s = B + 2.
Note that the set of possible actions in states 0 and B + 2 can be reduced. For example, in state
0, action aIII is identical to actions a2II , aI2I , aII2, a22I , a2I2, aI22, and a222, and in state B + 2,
action aIII is identical to actions a1II , aI1I , aII1, a11I , a1I1, aI11, and a111.
As was mentioned in Section 3, under our assumptions on the service rates and definitions of d
and u, neither µd2 nor µu1 can be equal to zero. This shows that the policy described in Theorem
3.1 corresponds to an irreducible Markov chain, and consequently that we have a communicating
Markov decision process. Therefore, we use the policy iteration algorithm for communicating
models (see pages 479 and 480 of Puterman [16]) to prove the optimality of the policy described in
Theorem 3.1.
For all decision rules δ, let Pδ be the (B + 3) × (B + 3) dimensional transition probability matrix
corresponding to the policy (δ)∞ and let rδ be the B + 3 dimensional reward vector corresponding
to δ, with rδ(s) denoting the reward earned in state s under the policy (δ)∞, for all s ∈ S.
Moreover, let q denote the uniformization constant (we assume, without loss of generality, that the
uniformization constant does not depend on the policy π ∈ Π, see Section 2).
In the policy iteration algorithm, we start by choosing
δ0(s) = δ∗(s) = δ_{s∗}(s) =
  a111 for s = 0,
  a211 for 1 ≤ s ≤ s∗ − 1,
  a221 for s∗ ≤ s ≤ B + 1,
  a222 for s = B + 2,
corresponding to the policy described in Theorem 3.1. Then
rδ0(s) =
0 for s = 0,
µd2 for 1 ≤ s ≤ s∗ − 1,
µd2 + µm2 for s∗ ≤ s ≤ B + 1,
µd2 + µm2 + µu2 for s = B + 2,
and
Pδ0(s, s′) =
  (µd1 + µm1 + µu1)/q for s = 0, s′ = 1,
  (q − (µd1 + µm1 + µu1))/q for s = s′ = 0,
  µd2/q for 1 ≤ s ≤ s∗ − 1, s′ = s − 1,
  (q − (µd2 + µu1 + µm1))/q for 1 ≤ s ≤ s∗ − 1, s′ = s,
  (µu1 + µm1)/q for 1 ≤ s ≤ s∗ − 1, s′ = s + 1,
  (µd2 + µm2)/q for s∗ ≤ s ≤ B + 1, s′ = s − 1,
  (q − (µd2 + µm2 + µu1))/q for s∗ ≤ s ≤ B + 1, s′ = s,
  µu1/q for s∗ ≤ s ≤ B + 1, s′ = s + 1,
  (µd2 + µm2 + µu2)/q for s = B + 2, s′ = B + 1,
  (q − (µd2 + µm2 + µu2))/q for s = s′ = B + 2.
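As a numerical sanity check on the displayed Pδ0 and rδ0, one can build the uniformized chain and solve the evaluation equation (11) directly. The sketch below is a hypothetical helper, not part of the paper's proof; it assumes rates making the chain irreducible (in particular µd2 > 0 and µu1 > 0), and each mu_x is the pair (rate at station 1, rate at station 2) for x = d, m, u:

```python
import numpy as np

def evaluate_theorem_policy(mu_d, mu_m, mu_u, s_star, B):
    """Build P and r for the policy of Theorem 3.1 (actions a111, a211,
    a221, a222) and solve r - g e + (P - I) h = 0 subject to h(0) = 0."""
    S = B + 3
    q = sum(mu_d) + sum(mu_m) + sum(mu_u) + 1.0  # uniformization constant
    P, r = np.zeros((S, S)), np.zeros(S)
    for s in range(S):
        if s == 0:                             # a111: all servers at station 1
            up, down = mu_d[0] + mu_m[0] + mu_u[0], 0.0
        elif s <= s_star - 1:                  # a211: server d at station 2
            up, down = mu_m[0] + mu_u[0], mu_d[1]
        elif s <= B + 1:                       # a221: servers d, m at station 2
            up, down = mu_u[0], mu_d[1] + mu_m[1]
        else:                                  # a222: all servers at station 2
            up, down = 0.0, mu_d[1] + mu_m[1] + mu_u[1]
        r[s] = down                            # reward rate = departure rate
        if s + 1 < S:
            P[s, s + 1] = up / q
        if s > 0:
            P[s, s - 1] = down / q
        P[s, s] = 1.0 - (up + down) / q
    # unknowns (g, h(1), ..., h(B+2)); the h(0) column is dropped
    A = np.hstack([-np.ones((S, 1)), (P - np.eye(S))[:, 1:]])
    x = np.linalg.solve(A, -r)
    return x[0], np.concatenate(([0.0], x[1:]))
```

Multiplying (11) by the stationary distribution of Pδ0 shows that the returned g equals the long-run average throughput of the policy, which gives an independent check against the closed-form g0 below.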
Since the Markov chain under the policy (δ0)∞ is irreducible, we find a scalar g0 and a vector h0
solving
rδ0 − g0e + (Pδ0 − I)h0 = 0, (11)
subject to h0(0) = 0. In equation (11), e is a column vector of ones and I is the identity matrix.
For the rest of the proof we will use the following notation to simplify our expressions:
Σ1 = µu1 + µm1 + µd1, Σ2 = µu2 + µm2 + µd2, Σu = µu1 + µm1, Σd = µm2 + µd2,
∆md = µm1µd2 − µd1µm2, ∆ud = µu1µd2 − µd1µu2, ∆um = µu1µm2 − µm1µu2.
Recall that under our assumptions on the service rates and by the definitions of d, m, and u, we
have Σ1 > 0, Σ2 > 0, Σu > 0, Σd > 0, ∆md ≥ 0, ∆ud ≥ 0, and ∆um ≥ 0. Moreover, note that
∆ud = 0 if and only if ∆md = ∆ud = ∆um = f(s) = 0 for all s ≥ 0, (12)
see the proof of Proposition 3.2. Let
Θ1 = Σ1 (µd2^{s∗} − Σu^{s∗}) / (µd2^{s∗−1} (µd2 − Σu)) + Σu^{s∗−1} µu1 (Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd^{B+2−s∗} µd2^{s∗−1} (Σd − µu1)),

Θ2 = 1 + Σ1 (µd2^{s∗−1} − Σu^{s∗−1}) / (µd2^{s∗−1} (µd2 − Σu)) + (Σ1 Σu^{s∗−1} / (Σd^{B+2−s∗} µd2^{s∗−1})) ((Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd − µu1) + µu1^{B+2−s∗} / Σ2),

Θ3 = Σ1 s∗ + µu1 (Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd^{B+2−s∗} (Σd − µu1)),

Θ4 = 1 + Σ1 (s∗ − 1)/µd2 + (Σ1 / Σd^{B+2−s∗}) ((Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd − µu1) + µu1^{B+2−s∗} / Σ2),

Θ5 = Σ1 (µd2^{s∗} − Σu^{s∗}) / (µd2^{s∗−1} (µd2 − Σu)) + Σu^{s∗−1} (B + 2 − s∗) / µd2^{s∗−1},

and

Θ6 = 1 + Σ1 (µd2^{s∗−1} − Σu^{s∗−1}) / (µd2^{s∗−1} (µd2 − Σu)) + (Σ1 Σu^{s∗−1} / (Σd µd2^{s∗−1})) (B + 2 − s∗ + Σd/Σ2).
One can show that

g0 = Θ1/Θ2 if µd2 ≠ Σu and µu1 ≠ Σd,
     Θ3/Θ4 if µd2 = Σu and µu1 ≠ Σd,
     Θ5/Θ6 if µd2 ≠ Σu and µu1 = Σd,        (13)
h0(0) = 0,

h0(s) = (q g0/Σ1) ((µd1 + µd2) Σ_{j=0}^{s−2} (j + 1) µd2^{s−j−2} Σu^{j+1−s} + s) − q µd2 Σ_{j=0}^{s−2} (j + 1) µd2^{s−2−j} Σu^{j+1−s}

for 1 ≤ s ≤ s∗, and

h0(s) = h0(s∗) + (q Σd / µu1^{s−s∗}) Σ_{j=0}^{s−s∗−1} µu1^{j} Σd^{s−s∗−j−1} × [ (g0/Σ1) ((µd1 + µd2) Σ_{j=0}^{s∗−2} µd2^{s∗−j−2} Σu^{j+1−s} + Σu^{s∗−s}) − Σ_{j=0}^{s∗−2} µd2^{s∗−j−2} Σu^{j+1−s} ] + (q (g0 − Σd) / µu1^{s−s∗}) Σ_{j=0}^{s−s∗−1} (j + 1) µu1^{j} Σd^{s−s∗−j−1}

for s∗ + 1 ≤ s ≤ B + 2 (with the convention that a summation over an empty set equals zero)
constitute a solution to equation (11). Note that g0 > 0 in all three cases listed in equation (13)
and that under our assumptions on service rates, it is not possible to have µd2 = Σu and µu1 = Σd,
because this would imply that µm1 = µm2 = 0.
For the remainder of the proof, we assume that µd2 6= Σu and µu1 6= Σd; the other two cases
can be handled in a similar manner. For all s ∈ S and a ∈ As, let r(s, a) be the immediate reward
obtained when action a is chosen in state s and let p(j|s, a) be the probability of going to state j
in one step when action a is chosen in state s. As a next step of the policy iteration algorithm, we
choose
δ1(s) ∈ arg max_{a ∈ As} { r(s, a) + Σ_{j∈S} p(j|s, a) h0(j) }, ∀s ∈ S,
setting δ1(s) = δ0(s) if possible. We now show that if d, m, u, and s∗ are chosen as described in
Section 3, then δ1(s) = δ0(s), for all s ∈ S. In particular, for all s ∈ S and a ∈ As, we will compute
the differences
[ r(s, a) + Σ_{j∈S} p(j|s, a) h0(j) ] − [ r(s, δ0(s)) + Σ_{j∈S} p(j|s, δ0(s)) h0(j) ]        (14)
and show that the differences are non-positive.
For s = 0, recall that δ0(s) = a111. We have
r(s, aIII) + Σ_{j∈S} p(j|s, aIII) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −g0 < 0,

r(s, a1II) + Σ_{j∈S} p(j|s, a1II) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −Σu g0/Σ1 < 0,

r(s, aI1I) + Σ_{j∈S} p(j|s, aI1I) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −(µd1 + µu1) g0/Σ1 < 0,

r(s, aII1) + Σ_{j∈S} p(j|s, aII1) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −(µd1 + µm1) g0/Σ1 ≤ 0,        (15)

r(s, a11I) + Σ_{j∈S} p(j|s, a11I) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µu1 g0/Σ1 < 0,

r(s, a1I1) + Σ_{j∈S} p(j|s, a1I1) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µm1 g0/Σ1 ≤ 0,        (16)

r(s, aI11) + Σ_{j∈S} p(j|s, aI11) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µd1 g0/Σ1 ≤ 0.        (17)
Note that in (15), we have the expression equal to zero only when µd1 = µm1 = 0, in which case
a111 is identical to aII1; in (16), we have the expression equal to zero only when µm1 = 0, in which
case a111 is identical to a1I1; and finally in (17), we have the expression equal to zero only when
µd1 = 0, in which case a111 is identical to aI11. This shows that δ1(0) = δ0(0).
For $1 \le s \le s^* - 1$, we have that $\delta_0(s) = a_{211}$. Since the set $A_s$ of all possible actions is large, in the interest of space we will specify the difference in (14) only for the actions $a_{111}$, $a_{112}$, $a_{121}$, $a_{122}$, $a_{221}$, $a_{212}$, and $a_{222}$. Define
\[
\begin{aligned}
\Gamma_1 = {} & \mu_{d2}^{s^*-2}\Sigma_d^{B+2-s^*}\Sigma_2(\mu_{d1}+\mu_{d2}) + \Sigma_u^{s^*-1}\mu_{u1}^{B+1-s^*}\Sigma_1(\mu_{u1}+\mu_{u2}) + \Sigma_u^{s^*}\Sigma_d^{B+2-s^*} + {} \\
& \Sigma_u^{s^*-1}\Sigma_d^{B+3-s^*} + \Sigma_d^{B+3-s^*}\sum_{j=0}^{s^*-3}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + \Sigma_d^{B+2-s^*}\mu_{u2}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-1-j} + {} \\
& \mu_{d1}\Sigma_d^{B+2-s^*}\Sigma_2\sum_{j=0}^{s^*-4}\mu_{d2}^{j+1}\Sigma_u^{s^*-3-j} + \mu_{d1}\sum_{j=0}^{\min(1,\,s^*-2)}\Sigma_d^{B+2-s^*+j}\Sigma_u^{s^*-1-j} + {} \\
& \mu_{d1}\mu_{u2}\sum_{j=0}^{\min(0,\,s^*-3)}\Sigma_d^{B+2-s^*+j}\Sigma_u^{s^*-2-j} + \Sigma_1(\mu_{u1}+\mu_{u2})\Sigma_u^{s^*-1}\sum_{j=0}^{B-s^*}\Sigma_d^{j+1}\mu_{u1}^{B-s^*-j}.
\end{aligned}
\]
Note that $\Gamma_1$ is always positive. We have
\[
r(s, a_{111}) + \sum_{j \in S} p(j|s, a_{111})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_2(s)}{\Gamma_1},
\]
where
\[
\Gamma_2(s) = \Sigma_1 \sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg( \Delta_{md}\Big( \sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B-s^*+2-j} + \mu_{u2}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B-s^*+1-j} \Big) + \mu_{u1}^{B-s^*+2}\Delta_{ud} \bigg) \ge 0.
\]
Then $-\Gamma_2(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_2(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* = S \setminus \{0\} \neq \{s^*\}$ (see equation (12) and the definition of $S^*$). Similarly,
\[
r(s, a_{112}) + \sum_{j \in S} p(j|s, a_{112})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_3(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_3(s) = {} & (\Delta_{md}+\Delta_{ud}+\Delta_{um})\mu_{u1}^{B+2-s^*}\Sigma_1\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} + {} \\
& \big((\Delta_{ud}+\Delta_{um})\mu_{m2} + \Delta_{um}\mu_{u2}\big)\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + {} \\
& \Delta_{ud}\mu_{u2}\mu_{d2}^{s-1}\Sigma_u^{s^*-s}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + \Delta_{ud}\Sigma_2\sum_{j=0}^{s^*-s-1}\mu_{d2}^{B-j}\Sigma_u^{j} + {} \\
& (\Delta_{ud}+\Delta_{um})\mu_{d2}^{s}\Sigma_u^{s^*-s-1}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j+1}\Sigma_d^{B+1-s^*-j} + {} \\
& (\Delta_{md}+\Delta_{um})\mu_{d1}\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j+1}\mu_{u1}^{B+1-s^*-j} + {} \\
& \big(\Delta_{um}\mu_{d1}\mu_{m2} + \Delta_{md}(\Sigma_u\Sigma_d + \mu_{d1}\mu_{m2})\big)\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} \;\ge\; 0.
\end{aligned}
\]
Then $-\Gamma_3(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_3(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We have
\[
r(s, a_{121}) + \sum_{j \in S} p(j|s, a_{121})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_4(s)}{\Gamma_1},
\]
where $\Gamma_4(s) = \Gamma_4^1(s) + \Gamma_4^2$,
\[
\Gamma_4^1(s) = \bigg( \Sigma_1\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} + (\mu_{m2}\Sigma_u + \mu_{m1}\mu_{d2})\sum_{j=0}^{s^*-s-2}\Sigma_u^{j}\mu_{d2}^{s^*-3-j} \bigg) \times \bigg( \Delta_{md}\Sigma_2\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + \mu_{u1}^{B+2-s^*}(\Delta_{md}+\Delta_{ud}) \bigg) \ge 0,
\]
and $\Gamma_4^2 = f(s^*)$. Note that the definition of $s^*$ implies that $\Gamma_4^2 \ge 0$. Hence, $\Gamma_4(s) \ge 0$ and $-\Gamma_4(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_4(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly,
\[
r(s, a_{122}) + \sum_{j \in S} p(j|s, a_{122})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_5(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_5(s) = {} & (\Delta_{md}+\Delta_{ud})\bigg( \mu_{u1}^{B+2-s^*}\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-1-j} + \Sigma_2\mu_{d2}^{s-1}\Sigma_u^{s^*-s}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_2\mu_{m1}\sum_{k=0}^{s-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_2\Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-s-1}\mu_{d2}^{s^*-j-2}\Sigma_u^{j} + \mu_{u1}^{B+2-s^*}\mu_{d1}\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) + {} \\
& (\Delta_{md}+\Delta_{um})\Sigma_2\Sigma_1\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B+1-s^*}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Gamma_5(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_5(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We now consider
\[
r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_6(s)}{\Gamma_1},
\]
where $\Gamma_6(s) = \Gamma_6^1(s) + \Gamma_6^2$,
\[
\Gamma_6^1(s) = (\mu_{m2}\Sigma_u + \mu_{m1}\mu_{d2})\sum_{j=0}^{s^*-s-2}\Sigma_u^{j}\mu_{d2}^{s^*-3-j} \times \bigg( \mu_{u1}^{B+2-s^*}(\Delta_{md}+\Delta_{ud}) + \Delta_{md}\Sigma_2\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) \ge 0,
\]
and $\Gamma_6^2 = f(s^*) \ge 0$. Thus, $\Gamma_6(s) \ge 0$ and $-\Gamma_6(s)/\Gamma_1 \le 0$. Moreover, if $s < s^* - 1$, then $\Gamma_6(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly, if $s = s^* - 1$, then $\Gamma_6(s)/\Gamma_1 = 0$ if and only if $f(s^*) = 0$, which implies that $s^*-1, s^* \in S^*$ by the definition of $S^*$, the fact that $f$ is non-increasing (see the proof of Proposition 3.2), and the fact that $s^* - 1 = s \ge 1$. Moreover,
\[
r(s, a_{212}) + \sum_{j \in S} p(j|s, a_{212})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_7(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_7(s) = {} & \Delta_{um}\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\bigg( \Sigma_u\Big( \sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \mu_{u2}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \Big) + {} \\
& \mu_{d1}\sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \mu_{u2}\mu_{d1}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) + {} \\
& \Delta_{ud}\Sigma_2\bigg( \mu_{d2}^{s-1}\Sigma_u^{s^*-s-1}\sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-s-2}\mu_{d2}^{s^*-2-j}\Sigma_u^{j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Gamma_7(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_7(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Finally,
\[
r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_8(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_8(s) = \Sigma_2\bigg( & \Delta_{um}\Big( \Sigma_u^{s^*-s-1}\mu_{d1}\mu_{d2}^{s-1}\sum_{j=0}^{B-s^*}\Sigma_d^{j+1}\mu_{u1}^{B-s^*-j} + \sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \mu_{d1}\sum_{k=0}^{s-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \Big) + {} \\
& (\Delta_{ud}+\Delta_{md})\Big( \Sigma_u^{s^*-s-1}\mu_{d2}^{s-1}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j+1}\mu_{u1}^{B-s^*+1-j} + \Sigma_d^{B-s^*+2}\sum_{j=0}^{s^*-s-2}\mu_{d2}^{s^*-2-j}\Sigma_u^{j} \Big) + {} \\
& \Delta_{ud}\Sigma_u^{s^*-s}\mu_{d2}^{s-1}\mu_{u1}^{B-s^*+1} \bigg) \;\ge\; 0.
\end{aligned}
\]
Hence, $-\Gamma_8(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_8(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. This shows that $\delta_1(s) = \delta_0(s)$ for all $1 \le s \le s^* - 1$.
We now consider $s^* \le s \le B + 1$, for which we have $\delta_0(s) = a_{221}$. In the interest of space, we will again specify the difference in (14) only for the actions $a_{111}$, $a_{211}$, $a_{112}$, $a_{121}$, $a_{122}$, $a_{212}$, and $a_{222}$. Define
\[
\begin{aligned}
\Upsilon_1 = {} & \Sigma_d^{B+2-s^*}\big( \mu_{d2}\Sigma_2 + (\mu_{d1}+\mu_{m1})\Sigma_2 \big)\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\mu_{d2}^{s^*-2-j} + (\mu_{u2}+\mu_{d1}+\mu_{m1})\sum_{j=s^*-1}^{B+1}\mu_{u1}^{j}\Sigma_d^{B+1-j} + {} \\
& \sum_{j=s^*-1}^{B+2}\mu_{u1}^{j}\Sigma_d^{B+2-j} + \mu_{u2}(\mu_{d1}+\mu_{m1})\sum_{j=s^*-1}^{B}\mu_{u1}^{j}\Sigma_d^{B-j} + \mu_{m1}\Sigma_1\sum_{j=0}^{s^*-2}\Sigma_u^{s^*-2-j}\mu_{u1}^{B-s^*+2+j} + {} \\
& \mu_{m1}\Sigma_1\Sigma_2\sum_{j=0}^{s^*-2}\Sigma_u^{s^*-2-j}\bigg( \Sigma_d^{B-s^*+2}\sum_{k=0}^{j-1}\mu_{u1}^{k}\mu_{d2}^{j-1-k} + \sum_{k=0}^{B-s^*+1}\mu_{u1}^{k+j}\Sigma_d^{B-s^*+1-k} \bigg).
\end{aligned}
\]
Note that $\Upsilon_1$ is always positive. We have
\[
r(s, a_{111}) + \sum_{j \in S} p(j|s, a_{111})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_2(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_2(s) = \Sigma_1\bigg( & \Delta_{md}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\Big( \sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+2-s^*-k} + \mu_{u2}\sum_{k=0}^{B-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \Big) + {} \\
& (\Delta_{md}+\Delta_{ud})(\mu_{d1}+\mu_{m1})^{s^*-1}\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-s^*+1-j} + \Delta_{ud}\mu_{u1}^{B+1-s}\Sigma_d^{s-s^*+1}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) \ge 0.
\end{aligned}
\]
Then $-\Upsilon_2(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_2(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$ (this can be seen by considering the cases $s^* = 1$ and $s^* > 1$ separately), which implies that $S^* \neq \{s^*\}$ (see equation (12) and the definition of $S^*$). Similarly,
\[
r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_3(s)}{\Upsilon_1},
\]
where $\Upsilon_3(s) = \Upsilon_3^1(s) + \Upsilon_3^2$,
\[
\Upsilon_3^1(s) = (\mu_{m1}\mu_{d2} + \mu_{m1}\mu_{m2} + \mu_{u1}\mu_{m2}) \times \sum_{j=0}^{s-s^*-1}\Sigma_d^{j}\mu_{u1}^{B-s^*-j}\bigg( \mu_{d2}^{s^*-1}\Delta_{ud} + \Delta_{um}\Big( \sum_{k=0}^{s^*-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k} + \mu_{d1}\sum_{k=0}^{s^*-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k} \Big) \bigg) \ge 0,
\]
and $\Upsilon_3^2 = -f(s^*+1)$. Note that the definition of $s^*$ implies that $\Upsilon_3^2 \ge 0$. Thus, $\Upsilon_3(s) \ge 0$ and $-\Upsilon_3(s)/\Upsilon_1 \le 0$. Moreover, if $s > s^*$, then $\Upsilon_3(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly, if $s = s^*$, then $\Upsilon_3(s)/\Upsilon_1 = 0$ if and only if $f(s^*+1) = 0$, which implies that $s^*, s^*+1 \in S^*$ by the definition of $S^*$, the fact that $f$ is non-increasing, and the fact that $s^* = s \le B + 1$. We now consider
\[
r(s, a_{112}) + \sum_{j \in S} p(j|s, a_{112})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_4(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_4(s) = {} & (\Delta_{ud}+\Delta_{um})\bigg( \Sigma_u^{s^*-1}\Big( (\mu_{d1}+\mu_{m1})\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + \sum_{j=0}^{B+2-s^*}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} \Big) + {} \\
& \mu_{u2}\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-2}\Sigma_u^{j}\mu_{d2}^{s^*-1-j} \bigg) + {} \\
& \mu_{u2}\Sigma_1\Delta_{um}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Delta_{md}(\mu_{d1}+\mu_{m1}) \times {} \\
& \bigg( \Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Sigma_d^{s-s^*+1}\mu_{u1}^{B+1-s}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Upsilon_4(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_4(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. For the action $a_{121}$, we have
\[
r(s, a_{121}) + \sum_{j \in S} p(j|s, a_{121})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_5(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_5(s) = {} & \Delta_{md}\bigg( \Sigma_1\Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Sigma_2\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) + {} \\
& \Sigma_1\Delta_{ud}\bigg( \sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-j} + \Sigma_d^{s-s^*}\mu_{u1}^{B+1-s}\sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + {} \\
& \mu_{m1}\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{s-s^*}\Sigma_d^{s-s^*+k}\mu_{u1}^{B+1-s-k} \bigg) \;\ge\; 0.
\end{aligned}
\]
Hence, $-\Upsilon_5(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_5(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly,
\[
r(s, a_{122}) + \sum_{j \in S} p(j|s, a_{122})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_6(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_6(s) = {} & \Sigma_1\Delta_{ud}\bigg( \mu_{d2}^{s^*}\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-s^*-j} + \mu_{u1}^{B-s+1}\Sigma_d^{s-s^*}\Big( \sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + \mu_{m1}\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\Sigma_u^{s^*-2-j} \Big) \bigg) + {} \\
& \Sigma_2(\Delta_{md}+\Delta_{ud}+\Delta_{um})\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_1\Sigma_2(\Delta_{md}+\Delta_{um})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \;\ge\; 0.
\end{aligned}
\]
Then $-\Upsilon_6(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_6(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We now consider
\[
r(s, a_{212}) + \sum_{j \in S} p(j|s, a_{212})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_7(s)}{\Upsilon_1},
\]
where $\Upsilon_7(s) = \Upsilon_7^1(s) + \Upsilon_7^2$,
\[
\begin{aligned}
\Upsilon_7^1(s) = {} & (\Delta_{ud}+\Delta_{um})\bigg( \Sigma_2\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + (\mu_{d2}\mu_{m1} + \mu_{m2}\mu_{u1} + \mu_{m2}\mu_{m1})\mu_{d2}^{s^*-1}\sum_{j=0}^{s-s^*-1}\mu_{u1}^{B-s^*-j}\Sigma_d^{j} \bigg) + {} \\
& \Sigma_1\Delta_{um}\bigg( \Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}(\mu_{d1}+\mu_{m1})^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + {} \\
& (\mu_{d2}\mu_{m1} + \mu_{m2}\mu_{u1} + \mu_{m2}\mu_{m1})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{s-s^*-1}\mu_{u1}^{B-s^*-k}(\mu_{m2}+\mu_{u2})^{k} \bigg) \;\ge\; 0,
\end{aligned}
\]
and $\Upsilon_7^2 = -f(s^*+1) \ge 0$. Thus, $\Upsilon_7(s) \ge 0$ and $-\Upsilon_7(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_7(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Finally,
\[
r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_8(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_8(s) = \Sigma_2\bigg( & \Delta_{um}\Big( (\Sigma_u^{s^*-1} + \mu_{d2}^{s^*-1})\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + (\mu_{d1}+\mu_{d2})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \Big) + {} \\
& \Delta_{ud}\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Upsilon_8(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_8(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. This shows that $\delta_1(s) = \delta_0(s)$ for all $s^* \le s \le B + 1$.
We finally consider $s = B + 2$, for which we have $\delta_0(B+2) = a_{222}$. Define
\[
\Psi = \Sigma_1\bigg( \Sigma_u^{s^*-1}\sum_{j=0}^{B+2-s^*}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} \bigg) > 0.
\]
We then have
\[
r(s, a_{III}) + \sum_{j \in S} p(j|s, a_{III})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\Sigma_2\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{2II}) + \sum_{j \in S} p(j|s, a_{2II})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{(\mu_{m2}+\mu_{u2})\Psi}{\Upsilon_1} \le 0, \quad (18)
\]
\[
r(s, a_{I2I}) + \sum_{j \in S} p(j|s, a_{I2I})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{(\mu_{d2}+\mu_{u2})\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{II2}) + \sum_{j \in S} p(j|s, a_{II2})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\Sigma_d\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{22I}) + \sum_{j \in S} p(j|s, a_{22I})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{u2}\Psi}{\Upsilon_1} \le 0, \quad (19)
\]
\[
r(s, a_{2I2}) + \sum_{j \in S} p(j|s, a_{2I2})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{m2}\Psi}{\Upsilon_1} \le 0, \quad (20)
\]
\[
r(s, a_{I22}) + \sum_{j \in S} p(j|s, a_{I22})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{d2}\Psi}{\Upsilon_1} < 0.
\]
Note that the expression in (18) equals zero only when $\mu_{m2} = \mu_{u2} = 0$, in which case $a_{222}$ is identical to $a_{2II}$; the expression in (19) equals zero only when $\mu_{u2} = 0$, in which case $a_{222}$ is identical to $a_{22I}$; and finally, the expression in (20) equals zero only when $\mu_{m2} = 0$, in which case $a_{222}$ is identical to $a_{2I2}$. This shows that $\delta_1(B+2) = \delta_0(B+2)$.
We have shown that $\delta_1(s) = \delta_0(s)$ for all $s \in S$. By Theorem 9.5.1 of Puterman [16], this proves that the policy described in Theorem 3.1 is optimal. In order to prove the uniqueness of the optimal policy, we consider a decision rule $\delta'$ that differs from $\delta_0$ in at least one state $s \in S$. As is done in Lemma 9.2.4 of Puterman [16], define
\[
\begin{aligned}
u &= P_{\delta'} g_0 e - g_0 e = 0, \\
v &= r_{\delta'} + (P_{\delta'} - I)h_0 - g_0 e = r_{\delta'} + P_{\delta'}h_0 - (r_{\delta_0} + P_{\delta_0}h_0),
\end{aligned}
\]
where we have used equation (11). Note that it follows from our derivations above that
\[
v(s) \le 0, \ \forall s \in S, \quad \text{and} \quad S^* = \{s^*\} \ \Rightarrow \ v(s) < 0, \ \forall s \in S \text{ with } \delta'(s) \neq \delta_0(s). \quad (21)
\]
Let $g'$ denote the gain of the stationary policy $(\delta')^\infty$, let $P^*_{\delta'}$ be the limiting matrix under decision rule $\delta'$ (see Section A.4 of Puterman [16]), and define $\Delta g = g' - g_0 e$. Suppose that $P_{\delta'}$ has $n$ recurrent classes, and partition $P_{\delta'}$ such that $P_1, \ldots, P_n$ correspond to transitions within recurrent classes, $Q_1, \ldots, Q_n$ correspond to transitions from transient states to recurrent classes, and $Q_{n+1}$ corresponds to transitions between transient states. Also, partition $g'$, $\Delta g$, $v$, and $P^*_{\delta'}$ in a manner that is consistent with this partition of $P_{\delta'}$. For example, $g'_i$ is a vector of constants with appropriate dimension denoting the gain in recurrent class $i$ for $1 \le i \le n$. Then we know from Lemma 9.2.5 of Puterman [16] that
\[
\Delta g_i = P^*_i v_i, \quad \text{for all } i = 1, \ldots, n. \quad (22)
\]
Since $P_{\delta_0}$ is irreducible, it is clear that if $\delta'$ differs from $\delta_0$ in at least one state $s \in S$, then $\delta'$ must differ from $\delta_0$ in at least one state $s_0 \in S$ that is recurrent under $\delta'$. But then $S^* = \{s^*\}$ and equations (21) and (22) imply that $g'(s_0) - g_0 < 0$, so that the decision rule $\delta'$ cannot be optimal. This proves that $(\delta_0)^\infty$ is the unique optimal policy when $S^* = \{s^*\}$. $\Box$
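The closing step of the argument, namely that one improvement pass leaving the decision rule unchanged certifies average-reward optimality (Theorem 9.5.1 of Puterman [16]), can be checked numerically on a small example. The sketch below uses a random unichain MDP, not the tandem-line chain itself; the sizes, the seed, and the solver details are illustrative assumptions. It runs policy iteration with the same tie-breaking as in the proof and compares the resulting gain against brute-force enumeration of all deterministic rules.

```python
# Sanity check: a fixed point of policy improvement attains the optimal
# average reward. Small random unichain MDP (hypothetical, not the model).
import itertools
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over next states
r = rng.uniform(size=(nS, nA))                  # immediate rewards r(s, a)

def gain_bias(delta):
    """Gain g and bias h of a stationary rule delta (unichain), with h[0] = 0."""
    Pd = P[np.arange(nS), delta]
    rd = r[np.arange(nS), delta]
    # Evaluation equations: (I - Pd) h + g e = rd, plus normalization h[0] = 0.
    A = np.zeros((nS + 1, nS + 1))
    A[:nS, :nS] = np.eye(nS) - Pd
    A[:nS, nS] = 1.0
    A[nS, 0] = 1.0
    b = np.concatenate([rd, [0.0]])
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return x[nS], x[:nS]

def improve(delta):
    """One improvement pass, keeping delta(s) whenever it still attains the max."""
    g, h = gain_bias(delta)
    q = r + P @ h                       # q[s, a] = r(s,a) + sum_j p(j|s,a) h(j)
    best = q.max(axis=1)
    return np.where(q[np.arange(nS), delta] >= best - 1e-9,
                    delta, q.argmax(axis=1))

# Policy iteration until the improvement pass changes nothing.
delta = np.zeros(nS, dtype=int)
while True:
    new = improve(delta)
    if np.array_equal(new, delta):
        break
    delta = new

# Brute force over all 3^4 deterministic rules confirms optimality of the gain.
g_star = gain_bias(delta)[0]
brute = max(gain_bias(np.array(d))[0] for d in itertools.product(range(nA), repeat=nS))
assert abs(g_star - brute) < 1e-8
```

The tandem-line proof is the symbolic analogue of this check: instead of computing the differences $q(s,a) - q(s,\delta_0(s))$ numerically, it shows algebraically that every such difference is non-positive for all parameter values.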
References
[1] Ahn, H.-S., I. Duenyas, and M. E. Lewis. 2002. The Optimal Control of a Two-Stage Tandem Queueing System with Flexible Servers. Preprint.
[2] Ahn, H.-S., I. Duenyas, and R. Zhang. 1999. Optimal Stochastic Scheduling of a Two-Stage Tandem Queue with Parallel Servers. Advances in Applied Probability, 31, 1095–1117.
[3] Ahn, H.-S., I. Duenyas, and R. Q. Zhang. 2002. Optimal Control of a Flexible Server. Preprint.
[4] Andradottir, S., H. Ayhan, and D. G. Down. 2001. Server Assignment Policies for Maximizing
the Steady-State Throughput of Finite Queueing Systems. Management Science, 47, 1421–
1439.
[5] Andradottir, S., H. Ayhan, and D. G. Down. 2002. Dynamic Server Allocation for Queueing
Networks with Flexible Servers. Under review.
[6] Bartholdi, III, J. J., and D. D. Eisenstein. 1996. A Production Line that Balances Itself.
Operations Research, 44, 21–34.
[7] Bartholdi, III, J. J., D. D. Eisenstein, and R. D. Foley. 2001. Performance of Bucket Brigades
when Work is Stochastic. Operations Research, 49, 710–719.
[8] Bell, S. L., and R. J. Williams. 2001. Dynamic Scheduling of a System with Two Parallel Servers
in Heavy Traffic with Complete Resource Pooling: Asymptotic Optimality of a Continuous
Review Threshold Policy. Annals of Applied Probability, 11, 608–649.
[9] Farrar, T. M. 1993. Optimal Use of an Extra Server in a Two Station Tandem Queueing Network. IEEE Transactions on Automatic Control, 38, 1296–1299.
[10] Hajek, B. 1984. Optimal Control of Two Interacting Service Stations. IEEE Transactions on Automatic Control, 29, 491–499.
[11] Harrison, J. M. and M. J. Lopez. 1999. Heavy Traffic Resource Pooling in Parallel-server
Systems. Queueing Systems, 33, 339–368.
[12] Mandelbaum, A., and A. L. Stolyar. 2002. Scheduling Flexible Servers with Convex Delay Costs: Heavy-Traffic Optimality of the Generalized cµ-Rule. Preprint.
[13] McClain, J. O., L. J. Thomas, and C. Sox. 1992. “On-the-fly” Line Balancing with Very Little
WIP. International Journal of Production Economics, 27, 283–289.
[14] Ostolaza, J., J. O. McClain, and L. J. Thomas. 1990. The Use of Dynamic (State-Dependent)
Assembly-Line Balancing to Improve Throughput. J. Mfg. Oper. Mgt., 3, 105–133.
[15] Pandelis, D. G., and D. Teneketzis. 1994. Optimal Multiserver Stochastic Scheduling of Two
Interconnected Priority Queues. Advances in Applied Probability, 26, 258–279.
[16] Puterman, M. L. 1994. Markov Decision Processes. John Wiley & Sons, New York, NY.
[17] Rosberg, Z., P. P. Varaiya, and J. C. Walrand. 1982. Optimal Control of Service in Tandem Queues. IEEE Transactions on Automatic Control, 27, 600–609.
[18] Squillante, M. S., C. H. Xia, D. D. Yao, and L. Zhang. 2000. Threshold Based Priority Policies for Parallel-server Systems with Affinity Scheduling. Extended abstract.
[19] Van Oyen, M. P., E. G. S. Gel, and W. J. Hopp. 2001. Performance Opportunity for Workforce Agility in Collaborative and Noncollaborative Work Systems. IIE Transactions, 33, 761–777.
[20] Williams, R. J. 2000. On Dynamic Scheduling of a Parallel Server System with Complete
Resource Pooling. In Analysis of Communication Networks: Call Centres, Traffic and Perfor-
mance, D. R. McDonald and S. R. E. Turner (eds.), Fields Institute Communications Volume
28, American Mathematical Society, 49–71.
[21] Zavadlav, E., J. O. McClain, and L. J. Thomas. 1996. Self-buffering, Self-balancing, Self-
flushing Production Lines. Management Science, 42, 1151–1164.