Throughput Maximization for Tandem Lines with Two
Stations and Flexible Servers
Sigrun Andradottir and Hayriye Ayhan
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0205, U.S.A.
December 30, 2002
Abstract
For a Markovian queueing network with two stations in tandem, finite intermediate buffer, and M flexible servers, we study how the servers should be assigned dynamically to stations in order to obtain optimal long-run average throughput. We assume that each server can work on only one job at a time, that several servers can work together on a single job, and that the travel times between stations are negligible. Under these assumptions, we completely characterize the optimal policy for systems with three servers. We also provide a conjecture for the structure of the optimal policy for systems with four or more servers that is supported by extensive numerical evidence. Finally, we develop heuristic server assignment policies for systems with three or more servers that are easy to implement, robust with respect to the server capabilities, and generally appear to yield near-optimal long-run average throughput.
1 Introduction
We consider a tandem queueing network with two stations and M servers. There is an infinite
supply of jobs in front of station 1, infinite room for completed jobs after station 2, and a finite
buffer of size 0 ≤ B < ∞ between stations 1 and 2. We assume that at any given time, there can
be at most one job in service at each station and that each server can work on at most one job.
Moreover, we assume that each server i ∈ {1, . . . , M} works at a deterministic rate µij ∈ [0,∞)
at each station j ∈ {1, 2}. Hence, server i is trained to work at station j if µij > 0. We assume
that several servers can work together on a single job, in which case their service rates are additive.
The service times of the different jobs at station j ∈ {1, 2} are independent and exponentially
distributed random variables with rate µ(j), and service times at stations 1 and 2 are independent.
Without loss of generality, we assume that µ(1) = µ(2) = 1. Finally, we assume that the network
operates under the manufacturing blocking mechanism.
Our objective in this paper is to determine the dynamic server assignment policy that maximizes
the long-run average throughput of the queueing system described above. For simplicity, we assume
that the travel and setup times associated with servers moving from one station to the other one
are negligible.
Andradottir, Ayhan, and Down [4] identify the optimal server assignment policies for M ≤ 2.
In particular, when M = 1, any non-idling server assignment policy is optimal; when M = 2, the optimal policy assigns one server to work at each station in such a way that the product of the servers' rates at their assigned stations is maximized, with each server working at the other station (that it is not assigned to) only when there is no work to be done at its own station (due to blocking or starving). Consequently, this
paper is focused on the situation when M ≥ 3, so that the queueing network has more servers than
stations. We shall see that when M ≥ 3, then the optimal policy is more complicated than when
M = 2 in that servers may move away from a station when there is still work to do at that station
(see Sections 3 and 4 below).
Much of the existing work in the area of optimal dynamic assignment of servers to queues is
focused on parallel queues. In particular, for a two-class queueing system with one dedicated server,
one flexible server, and no exogenous arrivals, Ahn, Duenyas, and Zhang [3] characterize the server
assignment policy that minimizes the expected total holding cost incurred until all jobs initially
present in the system have departed. Moreover, under the heavy traffic assumption, Harrison
and Lopez [11], Bell and Williams [8], Williams [20], and Mandelbaum and Stolyar [12] develop
asymptotically optimal server assignment policies that minimize the discounted infinite-horizon
holding cost for parallel queueing systems with flexible servers and outside arrivals. Finally, under
the assumption of heavy traffic, Squillante et al. [18] use simulation to study the performance of
threshold-type policies for systems that consist of parallel queues.
Most of the papers that have considered the optimal assignment of multiple servers to multiple
interconnected queues focus on minimizing holding costs. In particular, for systems with two
queues in tandem and no arrivals, Farrar [9], Pandelis and Teneketzis [15], and Ahn, Duenyas,
and Zhang [2] study how servers should be assigned to stations to minimize the expected total
holding cost incurred until all jobs leave the system. Moreover, Rosberg, Varaiya, and Walrand
[17], Hajek [10], and more recently, Ahn, Duenyas, and Lewis [1] study the assignment of (service)
effort to minimize holding costs in the two-station setting with Poisson arrivals. To the best of
our knowledge, the papers of Andradottir, Ayhan, and Down [4, 5] are the only two that consider the
dynamic assignment of servers to maximize the long-run average throughput in queueing networks
with flexible servers. In particular, Andradottir, Ayhan, and Down [4] characterize the optimal
dynamic server assignment policy for a two-stage finite tandem queue with two servers and also
present a simple server assignment heuristic for finite tandem queues with an equal number of servers
and stations. For more general queueing networks with infinite buffers, Andradottir, Ayhan, and
Down [5] develop dynamic server assignment policies that guarantee a capacity arbitrarily close to
the maximal capacity.
Other research on dynamic server assignment policies includes the work of Ostolaza, McClain,
and Thomas [14], McClain, Thomas, and Sox [13], and Zavadlav, McClain, and Thomas [21] on
dynamic line balancing. In particular, Ostolaza, McClain, and Thomas [14] and McClain, Thomas,
and Sox [13] study dynamic line balancing in tandem queues with shared tasks that can be per-
formed at either of two successive stations. This work was continued by Zavadlav, McClain, and
Thomas [21], who study several server assignment policies for systems with fewer servers than sta-
tions, in which all servers trained to work at a particular station have the same capabilities at that
station. Moreover, assuming that each server has a service rate that does not depend on the task
(s)he is working on, Bartholdi and Eisenstein [6] define the “bucket brigades” server assignment
policy and show that under this policy, a stable partition of work will emerge yielding optimal
throughput. Finally, Bartholdi, Eisenstein, and Foley [7] show that the behavior of the bucket
brigades policy, applied to systems with discrete tasks and exponentially distributed task times,
resembles that of the same policy applied in the deterministic setting with infinitely divisible jobs.
The remainder of this paper is organized as follows: In Section 2, we formulate the server
assignment problem considered in this paper as a Markov decision problem. In Section 3, we
provide an optimal server assignment policy for systems with two stations and three servers. In
Section 4, we present a conjecture for the structure of an optimal server assignment policy for
systems with two stations and four or more servers. Section 5 contains some reversibility results for
tandem lines with two stations and arbitrary numbers of servers. In Section 6, we present numerical
results that support the optimality of the policy proposed in Section 4, describe some properties of
this policy, and study heuristic policies that appear to yield near-optimal performance and involve
grouping all available servers into two or three teams. Section 7 contains some concluding remarks.
Finally, the proof of the main result in this paper is given in the Appendix.
2 Problem Formulation
Let Π be the set of server assignment policies under consideration, and for all π ∈ Π and t ≥ 0, let D^π(t) be the number of departures under policy π by time t, and let

T^π = lim_{t→∞} IE[D^π(t)]/t (1)

be the long-run average throughput corresponding to the server assignment policy π. We are interested in solving the optimization problem

max_{π∈Π} T^π. (2)
For all π ∈ Π, consider the stochastic process {X^π(t) : t ≥ 0}, where X^π(t) = 0 if there is a job to be processed at station 1, the number of jobs waiting to be processed between stations 1 and 2 is 0, and station 2 is starved at time t; X^π(t) = s for 1 ≤ s ≤ B + 1 if there are jobs to be processed at both stations 1 and 2 and there are s − 1 jobs waiting to be processed in the buffer at time t; finally, X^π(t) = B + 2 if station 1 is blocked, B jobs are waiting to be processed in the buffer, and there is a job to be processed at station 2 at time t. For the remainder of this paper, we assume that the class Π of server assignment policies under consideration consists of all Markovian stationary deterministic policies corresponding to the state space S = {0, 1, 2, . . . , B + 2} of the
stochastic process {X^π(t) : t ≥ 0}. It is clear that for all π ∈ Π, {X^π(t) : t ≥ 0} is a continuous time Markov chain and that there exists a scalar q^π ≤ ∑_{i=1}^{M} max_{1≤j≤2} µ_{ij} < ∞ such that the transition rates {q^π(x, x′)} of {X^π(t)} satisfy ∑_{x′∈S, x′≠x} q^π(x, x′) ≤ q^π for all x ∈ S. Hence, {X^π(t)} is uniformizable for all π ∈ Π. Let {Y^π(k)} be the corresponding discrete time Markov chain, so that {Y^π(k)} has state space S and transition probabilities p^π(x, x′) = q^π(x, x′)/q^π if x ≠ x′ and p^π(x, x) = 1 − ∑_{x′∈S, x′≠x} q^π(x, x′)/q^π for all x ∈ S. It has been shown by Andradottir, Ayhan, and Down [4] that since {X^π(t)} is uniformizable, the original optimization problem in (2) can be translated into an equivalent
(discrete time) Markov decision problem. More specifically, let
R^π(x) =
  q^π(x, x − 1)  for x ∈ {1, . . . , B + 2},
  0              for x = 0,

be the departure rate from state x under policy π, for all x ∈ S and π ∈ Π. Then the optimization problem (2) has the same solution as the Markov decision problem
max_{π∈Π} lim_{K→∞} IE[ (1/K) ∑_{k=1}^{K} R^π(Y^π(k − 1)) ]. (3)
In other words, Andradottir, Ayhan, and Down [4] showed that maximizing the steady-state
throughput of the original queueing system is equivalent to maximizing the steady-state departure rate for the associated embedded (discrete time) Markov chain.
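Under any fixed policy π ∈ Π, the buffer-content process {X^π(t)} is a birth-death chain on S (this observation is also used in Section 5), so T^π can be evaluated in closed form from the stationary distribution rather than by simulation. The following is a minimal sketch, with illustrative function and variable names, assuming all interior birth and death rates are positive so that the chain is irreducible:

```python
def throughput(birth, death):
    """Long-run throughput of a birth-death chain on S = {0, ..., B+2}.

    birth[s] is the aggregate service rate at station 1 in state s
    (birth[B+2] = 0: station 1 is blocked) and death[s] is the aggregate
    rate at station 2 (death[0] = 0: station 2 is starved).  All interior
    rates are assumed positive, so the chain is irreducible.
    """
    # Unnormalized stationary probabilities via detailed balance:
    # p[s+1] / p[s] = birth[s] / death[s+1].
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    z = sum(p)
    # Throughput = stationary rate of departures from station 2.
    return sum(d * q for d, q in zip(death, p)) / z
```

For instance, with B = 0, birth = [3, 2, 0], and death = [0, 2, 3] (aggregate rates for two servers with rates (2, 1) and (1, 2), each helping at the other station only when blocked or starved), the throughput is 12/7 ≈ 1.714.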
In the next two sections, we characterize the dynamic server assignment policies that solve the
optimization problem (3) for two-stage tandem queues with M ≥ 3 servers. We consider the case
when M = 3 in Section 3 and the case when M > 3 in Section 4.
3 Two Stations and Three Servers
In this section, we consider the special case of a tandem Markovian queueing network with two
stations and three servers. Since the numbers of possible states and actions are both finite, the existence of an optimal Markovian stationary deterministic policy follows immediately from Theorem 9.1.8 of Puterman [16].
We assume that for all i ∈ {1, 2, 3}, either µi1 > 0 or µi2 > 0. (If there exists a server i such
that µi1 = µi2 = 0, then the problem reduces to having two servers, for which the optimal policy
is given in Andradottir, Ayhan, and Down [4].) Without loss of generality, we also assume that
there exist i, k ∈ {1, 2, 3} such that µi1 > 0 and µk2 > 0. (Note that if µ11 = µ21 = µ31 = 0 or
µ12 = µ22 = µ32 = 0, then the throughput is zero and any policy is optimal.) Define d as

d ∈ D = arg min_{i∈{1,2,3}} { µ_{i1}/µ_{i2} }.

The above assumptions on the service rates guarantee that µ_{d1}/µ_{d2} < ∞. Similarly, define m as

m ∈ M = arg min_{i∈{1,2,3}\{d}} { µ_{i1}/µ_{i2} }.
Note that if µ_{i2} = 0 for all i ∈ {1, 2, 3}\{d}, then M = {1, 2, 3}\{d}. Finally, define u ∈ {1, 2, 3}\{d, m}. For reasons that become clear in Theorem 3.1, “u” stands for “upstream,” “d” stands for “downstream,” and “m” stands for “moving.” Note that the definitions of d and m imply that µ_{i1}µ_{d2} − µ_{d1}µ_{i2} ≥ 0 for all i ∈ {1, 2, 3} and µ_{i1}µ_{m2} − µ_{m1}µ_{i2} ≥ 0 for all i ∈ {1, 2, 3}\{d}. Moreover, from our assumptions on the service rates and from the definitions of d, m, and u, we have that µ_{d2} > 0 and µ_{u1} > 0.
For fixed d, m, and u, and for all i ∈ {0, 1, . . .}, define

f(i) = µ_{d2}^{i−2} (µ_{m1}µ_{d2} − µ_{d1}µ_{m2}) (µ_{d2} + µ_{m2} + µ_{u2}) ∑_{j=0}^{B−i+2} µ_{u1}^{j} (µ_{d2} + µ_{m2})^{B−i−j+2}
       − µ_{u1}^{B−i+2} (µ_{u1}µ_{m2} − µ_{m1}µ_{u2}) (µ_{d1} + µ_{m1} + µ_{u1}) ∑_{j=0}^{i−2} µ_{d2}^{j} (µ_{m1} + µ_{u1})^{i−j−2}, (4)

with the convention that summation over an empty set equals 0. Note that f(i) ≥ 0 for i ≤ 1 and f(i) ≤ 0 for i ≥ B + 3.
Throughout our developments, we let A_s denote the set of allowable actions in state s ∈ S (see the first paragraph of the proof of Theorem 3.1 in the Appendix) and (δ)^∞ denote the policy corresponding to the decision rule δ, which is a (B + 3)-dimensional vector whose components δ(s) ∈ A_s specify what action in A_s should be applied in state s for all s ∈ S.
Remark 3.1 For i = 1, . . . , B + 2, let T^{(δ_i)^∞} be the throughput of the policy (δ_i)^∞ such that

δ_i(s) =
  servers d, m, and u work at station 1                          for s = 0,
  servers m and u work at station 1, server d works at station 2 for 1 ≤ s ≤ i − 1,
  server u works at station 1, servers d and m work at station 2 for i ≤ s ≤ B + 1,
  servers d, m, and u work at station 2                          for s = B + 2.  (5)

Then sign(f(i)) = sign(T^{(δ_i)^∞} − T^{(δ_{i−1})^∞}) for all i = 2, . . . , B + 2. This follows with some algebra from the expression for g_0 given in the proof of Theorem 3.1 (see equation (13)).
Let

S∗ = { s ∈ S\{0} : f(s) ≥ 0 and f(s + 1) ≤ 0 }.

The following result follows directly from the fact that f(1) ≥ 0 and f(B + 3) ≤ 0.

Proposition 3.1 S∗ ≠ ∅.
We are now ready to state the theorem that characterizes the optimal server assignment policy.
Theorem 3.1 Let s∗ ∈ S∗ and define δ∗(s) = δ_{s∗}(s) for all s ∈ S (see equation (5)). Then (δ∗)^∞ is optimal in the class of Markovian stationary deterministic policies. Moreover, this is the unique optimal policy if S∗ = {s∗}.
Remark 3.2 Theorem 3.1 shows that in the optimal policy the “upstream” server u works at the
upstream station 1 unless that station is blocked, the “downstream” server d works at the down-
stream station 2 unless that station is starved, and the “moving” server m works at the upstream
station 1 when the number of jobs in the buffer is small and then moves to the downstream station
2 when the number of jobs in the buffer has become sufficiently large. Note that the definitions
of d, m, and u imply that the server whose service rate at the upstream/downstream station is
relatively the largest (relative to the server’s rate at the other station) should be assigned to the
upstream/downstream station, and the server whose service rates at the upstream and downstream
stations are relatively the most balanced should move between the two stations depending on the
content of the buffer. Note also that the optimal policy in the case when M = 2 is essentially the
above policy without a moving server, see Andradottir, Ayhan, and Down [4].
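Because f in equation (4) is available in closed form, s∗ and S∗ can be computed directly. A small sketch (hypothetical helper names; it relies on µ_{d2} > 0 and µ_{u1} > 0, which hold under the assumptions above, so the negative boundary powers of µ_{d2} and µ_{u1} are well defined):

```python
def f(i, d, m, u, B):
    """Evaluate f(i) from equation (4) for i >= 1.

    d, m, u are the (mu_.1, mu_.2) rate pairs of the downstream, moving,
    and upstream servers; an empty sum equals 0.  Assumes mu_d2 > 0 and
    mu_u1 > 0 (guaranteed by the assumptions on the service rates).
    """
    (d1, d2), (m1, m2), (u1, u2) = d, m, u
    first = (d2 ** (i - 2) * (m1 * d2 - d1 * m2) * (d2 + m2 + u2)
             * sum(u1 ** j * (d2 + m2) ** (B - i - j + 2)
                   for j in range(B - i + 3)))
    second = (u1 ** (B - i + 2) * (u1 * m2 - m1 * u2) * (d1 + m1 + u1)
              * sum(d2 ** j * (m1 + u1) ** (i - j - 2)
                    for j in range(i - 1)))
    return first - second

def switch_set(d, m, u, B):
    """S* = {s in {1, ..., B+2} : f(s) >= 0 and f(s+1) <= 0}."""
    return [s for s in range(1, B + 3)
            if f(s, d, m, u, B) >= 0 and f(s + 1, d, m, u, B) <= 0]
```

For the hypothetical rate pairs d = (1, 4), m = (2, 2), u = (4, 1) and B = 2, both cross-products µ_{m1}µ_{d2} − µ_{d1}µ_{m2} and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} are positive, so by Proposition 3.2(i) below the returned set has at most two (consecutive) elements.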
The proof of Theorem 3.1 is presented in the Appendix. We now present a proposition which
illustrates some properties of S∗. Throughout our developments, #A denotes the cardinality of any
set A.
Proposition 3.2 (i) If #D = 1 and #M = 1, then S∗ has at most two elements, and if S∗ has two elements, then these are two consecutive states.
(ii) If #D = 1 and #M = 2, then S∗ = {B + 2}.
(iii) If #D = 2, then #M = 1 and S∗ = {1}.
(iv) If #D = 3, then #M = 2 and S∗ = S\{0}.
Proof: Note first that for all s ∈ S\{0},

f(s) − f(s + 1)
  = (µ_{m1}µ_{d2} − µ_{d1}µ_{m2}) (µ_{d2} + µ_{m2} + µ_{u2}) µ_{d2}^{s−2} [ µ_{m2} ∑_{j=0}^{B−s+1} µ_{u1}^{j} (µ_{d2} + µ_{m2})^{B−s−j+1} + µ_{u1}^{B−s+2} ]
  + (µ_{u1}µ_{m2} − µ_{m1}µ_{u2}) (µ_{d1} + µ_{m1} + µ_{u1}) µ_{u1}^{B−s+1} [ µ_{m1} ∑_{j=0}^{s−2} µ_{d2}^{j} (µ_{m1} + µ_{u1})^{s−j−2} + µ_{d2}^{s−1} ]
  ≥ 0. (6)

Hence, f(s) is non-increasing in s ∈ S\{0}.

(i) It follows from #D = 1 and #M = 1 that µ_{m1}µ_{d2} − µ_{d1}µ_{m2} > 0 and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} > 0. Then equation (6) implies that f(s) is (strictly) decreasing in s. Let s∗ = min S∗. If s∗ = B + 2, then there is nothing to prove, so assume that s∗ ∈ {1, . . . , B + 1}. From the definition of s∗, we have that f(s∗ + 1) ≤ 0. If f(s∗ + 1) < 0, then f(s) < 0 for all s ≥ s∗ + 1 and hence S∗ = {s∗}. On the other hand, if f(s∗ + 1) = 0, then f(s) < 0 for all s ≥ s∗ + 2 and hence S∗ = {s∗, s∗ + 1}.
(ii) It follows from #D = 1 and #M = 2 that µ_{m1}µ_{d2} − µ_{d1}µ_{m2} > 0 and µ_{u1}µ_{m2} − µ_{u2}µ_{m1} = 0. Then equation (4) implies that f(s) > 0 for all s = 1, . . . , B + 2 and f(B + 3) = 0. Thus, s = B + 2 is the only s ∈ S\{0} such that f(s) ≥ 0 and f(s + 1) ≤ 0.

(iii) Since #D = 2, we have

µ_{d1}/µ_{d2} = µ_{m1}/µ_{m2} < µ_{u1}/µ_{u2}.

Hence, #M = 1, µ_{m1}µ_{d2} − µ_{d1}µ_{m2} = 0, and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} > 0. Then equation (4) implies that f(1) = 0 and f(s) < 0 for all s ≥ 2. Thus, s = 1 is the only s ∈ S\{0} such that f(s) ≥ 0 and f(s + 1) ≤ 0.
(iv) Since #D = 3, we have

µ_{d1}/µ_{d2} = µ_{m1}/µ_{m2} = µ_{u1}/µ_{u2}.

Hence, #M = 2, µ_{m1}µ_{d2} − µ_{d1}µ_{m2} = 0, and µ_{u1}µ_{m2} − µ_{m1}µ_{u2} = 0. Then equation (4) implies that f(s) = 0 for all s ≥ 0, so that S∗ = S\{0}. □
Remark 3.3 It immediately follows from Proposition 3.2 that in order to have S∗ = {s∗}, and hence to have a unique optimal policy, it is necessary to have #D ≤ 2 and sufficient to have either #D = 2, or #D = 1 and #M = 2. Moreover, note that when either #D = 2, or #D = 1 and #M = 2, the optimal policy is unique even though the servers d, m, and u are not uniquely defined in these cases. In particular, when #D = 2, then D = {d, m} and Theorem 3.1 and Proposition 3.2 imply that servers d and m are both at station 2 in all states s ∈ {1, 2, . . . , B + 2} (since S∗ = {s∗} = {1} in this case). Similarly, when #D = 1 and #M = 2, then M = {m, u} and Theorem 3.1 and Proposition 3.2 imply that servers m and u are both at station 1 in all states s ∈ {0, 1, . . . , B + 1} (since S∗ = {s∗} = {B + 2} in this case).
4 Two Stations and More than Three Servers
In this section, we provide a conjecture for the structure of the optimal server assignment policy
for systems with M > 3 servers and two stations. Let L = {1, . . . , M}. Without loss of generality,
we again assume (as in Section 3) that for all i ∈ {1, . . . ,M}, either µi1 > 0 or µi2 > 0 (because if
there exists a server i such that µi1 = µi2 = 0, then the problem reduces to having M − 1 servers).
Moreover, we again assume that there exist i, k ∈ {1, . . . ,M} such that µi1 > 0 and µk2 > 0
(because if µ11 = · · · = µM1 = 0 or µ12 = · · · = µM2 = 0, then the maximal throughput is zero and
any policy is optimal). Let
l1 ∈ L1 = arg minL
{µi1
µi2
}, (7)
7
and for 2 ≤ j ≤ M , let
lj ∈ Lj = arg minL\{l1,...,lj−1}
{µi1
µi2
}. (8)
Then we conjecture that there exist 1 = s∗_1 ≤ s∗_2 ≤ s∗_3 ≤ · · · ≤ s∗_{M−1} ≤ s∗_M = B + 2 such that the server assignment policy (δ∗)^∞ given as

δ∗(s) =
  servers l_1, . . . , l_M work at station 1                                          for s = 0,
  servers l_2, . . . , l_M work at station 1, server l_1 works at station 2           for 1 ≤ s ≤ s∗_2 − 1,
  servers l_3, . . . , l_M work at station 1, servers l_1, l_2 work at station 2      for s∗_2 ≤ s ≤ s∗_3 − 1,
  servers l_4, . . . , l_M work at station 1, servers l_1, l_2, l_3 work at station 2 for s∗_3 ≤ s ≤ s∗_4 − 1,
  ...
  server l_M works at station 1, servers l_1, . . . , l_{M−1} work at station 2      for s∗_{M−1} ≤ s ≤ B + 1,
  servers l_1, . . . , l_M work at station 2                                          for s = B + 2,  (9)
is optimal. It is clear from equations (7) and (8) that it is easy to determine l_1, . . . , l_M for any particular problem. Moreover, once l_1, . . . , l_M are determined, one can compute s∗_2, . . . , s∗_{M−1} by considering all the possibilities and choosing the one that provides the best throughput. This procedure requires much less effort than determining the optimal policy without knowledge of the conjectured structure. In particular, we now need to compute the throughput of \binom{B+M−1}{M−2} policies, rather than considering all 3^{M(B+3)} Markovian stationary deterministic policies (or all 2^{M(B+1)} non-idling policies). Note that the optimal policy for M = 3 specified in Section 3 and the optimal policy for M = 2 given by Andradottir, Ayhan, and Down [4] agree with the conjecture given above (when M = 2, server l_1 should be at station 2 and server l_2 at station 1 for all 1 ≤ s ≤ B + 1). Moreover, the extensive numerical examples given in Section 6 demonstrate that the conjectured policy also appears to be optimal for systems with M > 3.
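Under the conjecture, finding the best policy thus reduces to enumerating the nondecreasing interior switch vectors. A brute-force sketch of this search (illustrative names; all service rates are assumed strictly positive so that the ratios in (7) and (8) and the stationary distribution below are well defined):

```python
from itertools import combinations_with_replacement

def birth_death_throughput(birth, death):
    """Throughput of the buffer-content birth-death chain (interior rates
    assumed positive so the chain is irreducible)."""
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    return sum(d * q for d, q in zip(death, p)) / sum(p)

def best_conjectured_policy(rates, B):
    """Best policy of the form (9); rates is a list of (mu_i1, mu_i2)
    pairs, assumed strictly positive.  Servers are ordered by mu_i1/mu_i2
    as in (7)-(8); the interior switch points range over all nondecreasing
    vectors in {1, ..., B+2}, i.e. C(B+M-1, M-2) candidate policies.
    Returns (throughput, switch-point vector)."""
    M = len(rates)
    order = sorted(range(M), key=lambda i: rates[i][0] / rates[i][1])
    best_T, best_s = -1.0, None
    for mid in combinations_with_replacement(range(1, B + 3), M - 2):
        s = (1,) + mid + (B + 2,)          # s_1* = 1 and s_M* = B + 2
        # Server order[i] works at station 1 in states x < s[i], else at 2.
        birth = [sum(rates[order[i]][0] for i in range(M) if x < s[i])
                 for x in range(B + 3)]
        death = [sum(rates[order[i]][1] for i in range(M) if x >= s[i])
                 for x in range(B + 3)]
        T = birth_death_throughput(birth, death)
        if T > best_T:
            best_T, best_s = T, s
    return best_T, best_s
```

For M = 2 this class contains a single policy, with switch points (1, B + 2), matching the optimal policy of Andradottir, Ayhan, and Down [4].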
5 Reversibility of Two-Station Tandem Lines with Flexible Servers
Suppose that the original (forward) line has two stations and M ≥ 1 servers, and consider the reversed line in which station 2 is followed by station 1 (note that we have not relabeled the stations or changed the size of the buffer). Suppose that the original and reversed lines operate under the Markovian stationary deterministic server assignment policies π and π_R, respectively. Let δ and δ_R be the decision rules associated with the policies π and π_R, respectively (so that δ and δ_R specify what servers are assigned to stations 1 and 2 as a function of the state s ∈ S of the two systems). Let {X_R^{π_R}(t)} be the Markov chain model for the reversed line corresponding to the model {X^π(t)} specified in Section 2 for the original line (for example, X_R^{π_R}(t) = s ∈ {1, . . . , B + 1} if at time t, the reversed line has jobs to be processed at both stations and s − 1 jobs waiting to be processed in the buffer), and let T_R^{π_R} be the long-run average throughput under policy π_R in the reversed line (note that T^π and T_R^{π_R} may depend on the initial states of the Markov chains {X^π(t)} and {X_R^{π_R}(t)}, respectively, if these Markov chains have more than one recurrent equivalence class).
Throughout this section, we will assume that δ_R(s) = δ(B + 2 − s) for all s ∈ S. We have:

Proposition 5.1 If B + 2 − X_R^{π_R}(0) belongs to the same recurrent equivalence class of the Markov chain {X^π(t)} as X^π(0), then T_R^{π_R} = T^π.
Proof: For all s ∈ S, let κ_1^π(s) and κ_2^π(s) denote the sets of servers assigned to stations 1 and 2, respectively, in state s of the original line under the policy π. It is clear that the stochastic process {X^π(t)} is a birth-death process with state space S. For all s ∈ S, let λ^π(s) and γ^π(s) denote the birth and death rates in state s, respectively. Then λ^π(B + 2) = γ^π(0) = 0 and

λ^π(s) = ∑_{i∈κ_1^π(s)} µ_{i1}, for s = 0, . . . , B + 1,
γ^π(s) = ∑_{i∈κ_2^π(s)} µ_{i2}, for s = 1, . . . , B + 2.

Moreover, {X_R^{π_R}(t)} is also a birth-death process with state space S. For all s ∈ S, let λ_R^{π_R}(s) and γ_R^{π_R}(s) denote the birth and death rates in state s of the reversed line, respectively. Then the assumption that δ_R(s) = δ(B + 2 − s) for all s ∈ S implies that λ_R^{π_R}(B + 2) = γ_R^{π_R}(0) = 0 and

λ_R^{π_R}(s) = ∑_{i∈κ_2^π(B+2−s)} µ_{i2}, for s = 0, . . . , B + 1,
γ_R^{π_R}(s) = ∑_{i∈κ_1^π(B+2−s)} µ_{i1}, for s = 1, . . . , B + 2,

and hence that

λ_R^{π_R}(s) = γ^π(B + 2 − s) and γ_R^{π_R}(s) = λ^π(B + 2 − s), for all s ∈ S.
Now suppose that X^π(0) ∈ E^π, where E^π ⊂ S is a recurrent equivalence class of the Markov chain {X^π(t)}. Then E^{π_R} = {s ∈ S : B + 2 − s ∈ E^π} is a recurrent equivalence class of the Markov chain {X_R^{π_R}(t)} with X_R^{π_R}(0) ∈ E^{π_R}. Let {p^π(s) : s ∈ E^π} be the stationary distribution for {X^π(t)} on E^π and let {p_R^{π_R}(s) : s ∈ E^{π_R}} be the stationary distribution for {X_R^{π_R}(t)} on E^{π_R}. It is clear that p_R^{π_R}(s) = p^π(B + 2 − s) for all s ∈ E^{π_R}. Therefore, X_R^{π_R}(0) ∈ E^{π_R} and X^π(0) ∈ E^π imply that

T_R^{π_R} = ∑_{s∈E^{π_R}} γ_R^{π_R}(s) p_R^{π_R}(s) = ∑_{s∈E^{π_R}} λ^π(B + 2 − s) p^π(B + 2 − s) = ∑_{s∈E^π} λ^π(s) p^π(s) = T^π,

and the proof is complete. □
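Proposition 5.1 is easy to check numerically: under δ_R(s) = δ(B + 2 − s), the reversed line's birth and death rate vectors are the forward line's death and birth rate vectors read backwards. A quick sketch with arbitrary illustrative aggregate rates (chosen positive in the interior so both chains are irreducible):

```python
def throughput(birth, death):
    """Throughput of an irreducible birth-death chain on {0, ..., B+2}."""
    p = [1.0]
    for s in range(len(birth) - 1):
        p.append(p[-1] * birth[s] / death[s + 1])
    return sum(d * q for d, q in zip(death, p)) / sum(p)

# Forward line with B = 2: hypothetical aggregate rates per state.
birth = [7.0, 5.0, 5.0, 2.0, 0.0]   # total station-1 rate in states 0..4
death = [0.0, 1.0, 3.0, 3.0, 6.0]   # total station-2 rate in states 0..4

# Reversed line under delta_R(s) = delta(B + 2 - s): the birth rate in
# state s equals the forward death rate in state B + 2 - s, and vice versa.
n = len(birth)
birth_R = [death[n - 1 - s] for s in range(n)]
death_R = [birth[n - 1 - s] for s in range(n)]

# The two throughputs agree, as Proposition 5.1 asserts.
assert abs(throughput(birth, death) - throughput(birth_R, death_R)) < 1e-12
```

The reflected stationary distribution p_R(s) = p(B + 2 − s) used in the proof is exactly what the detailed-balance recursion produces for the reversed rate vectors.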
Let Π_F ⊂ Π be the class of all non-idling threshold policies for the forward system, so that for all π ∈ Π_F and i = 1, . . . , M, there exists t_i^π ∈ {1, . . . , B + 2} such that server i is at the upstream station 1 in all states s < t_i^π and at the downstream station 2 in all states s ≥ t_i^π. For all π ∈ Π_F, let l_1^π, . . . , l_M^π be such that {l_1^π, . . . , l_M^π} = {1, . . . , M} and 1 ≤ s_1^π ≤ · · · ≤ s_M^π ≤ B + 2, where s_i^π = t_{l_i^π}^π for i = 1, . . . , M (see equation (9)). Hence, l_1^π is the first server to move from station 1 to station 2 and l_M^π is the last server to do so under the policy π ∈ Π_F. Let Π_R ⊂ Π and t_i^π, l_i^π, and s_i^π, where π ∈ Π_R and i = 1, . . . , M, be defined in the same manner for the reversed system. We have:
Proposition 5.2 (i) π ∈ Π_F if and only if π_R ∈ Π_R.
(ii) For all i = 1, . . . , M and s = 1, . . . , B + 2, t_i^π = s if and only if t_i^{π_R} = B + 3 − s.
(iii) For all i = 1, . . . , M, l_i^π = l_{M+1−i}^{π_R}.
(iv) For all i = 1, . . . , M and s = 1, . . . , B + 2, s_i^π = s if and only if s_{M+1−i}^{π_R} = B + 3 − s.
Proof: We have that t_i^π = s if and only if δ(s′) has server i at station 1 for all s′ ≤ s − 1 and δ(s′) has server i at station 2 for all s′ ≥ s. The definition of π_R now implies that t_i^π = s if and only if δ_R(B + 2 − s′) has server i at station 1 for all s′ ≤ s − 1 and δ_R(B + 2 − s′) has server i at station 2 for all s′ ≥ s. This proves parts (i) and (ii) of the proposition. Moreover, since t_{i_1}^π ≤ t_{i_2}^π implies that t_{i_1}^{π_R} = B + 3 − t_{i_1}^π ≥ B + 3 − t_{i_2}^π = t_{i_2}^{π_R}, it is clear that the order in which the servers move from the upstream station to the downstream station is reversed under π_R, and part (iii) of the proposition follows. Finally, from (ii) and (iii), we have s_i^π = t_{l_i^π}^π = s if and only if s_{M+1−i}^{π_R} = t_{l_{M+1−i}^{π_R}}^{π_R} = t_{l_i^π}^{π_R} = B + 3 − s, for all i = 1, . . . , M and s = 1, . . . , B + 2, proving that part (iv) of the proposition holds. □
We now present several corollaries that follow from Propositions 5.1 and 5.2. Recall that a policy
is optimal if it leads to the maximal throughput regardless of the initial state of the underlying
Markov chain; see equations (2) and (3).
Corollary 5.1 The policy π is optimal in the forward system if and only if the policy πR is optimal
in the reversed system.
Proof: If π_R is not optimal in the reversed system, then there exists a policy π′_R with decision rule δ′_R such that T_R^{π′_R} > T_R^{π_R}, at least for some initial states X_R^{π_R}(0) and X_R^{π′_R}(0). Let π′ (with decision rule δ′) be the policy in the forward system such that δ′(s) = δ′_R(B + 2 − s) for all s ∈ S. Proposition 5.1 now implies that if X^π(0) and B + 2 − X_R^{π_R}(0) belong to the same equivalence class of {X^π(t)} and also X^{π′}(0) and B + 2 − X_R^{π′_R}(0) belong to the same equivalence class of {X^{π′}(t)}, then

T^{π′} = T_R^{π′_R} > T_R^{π_R} = T^π,

which contradicts our assumption that π is an optimal policy for the forward system. □
Corollary 5.2 If the service rates µij, where i = 1, . . . ,M and j = 1, 2, are drawn independently
from a certain distribution, then the probability that π is optimal in the forward system is equal to
the probability that πR is optimal in the reversed system.
Let µ be the M × 2 matrix containing the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, and let µ_R be the M × 2 matrix containing the two columns of µ in the reverse order (corresponding to reversing the order of the two stations). For all µ and π ∈ Π, let T^π(µ) be the throughput of the system operated under the server assignment policy π when the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are given by µ, see equation (1). Moreover, for all µ, let Π∗(µ) ⊂ Π_F ⊂ Π be the set of policies conjectured to be optimal in Section 4 for the forward system with service rates µ. Assume that we pick a policy π∗(µ) ∈ Π∗(µ) at random. For all µ, i = 1, . . . , M, and s = 1, . . . , B + 2, let N(µ) be the number of policies in Π∗(µ), let N_{i,s}(µ) be the number of policies in Π∗(µ) such that s∗_i = s, and let s∗_i(µ) be the value of s∗_i corresponding to the policy π∗(µ), see equation (9) (note that s∗_i = s_i^{π∗(µ)}). We have:
Corollary 5.3 Suppose that the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are drawn independently from a certain distribution and that we choose a policy π∗(µ) from Π∗(µ) at random. Then, IP{s∗_i(µ) = s} = IP{s∗_{M+1−i}(µ) = B + 3 − s} and IE[s∗_i(µ)] + IE[s∗_{M+1−i}(µ)] = B + 3 for all i = 1, . . . , M and s = 1, . . . , B + 2.

Proof: Parts (i) and (iv) of Proposition 5.2 and the fact that π∗(µ) is chosen from Π∗(µ) at random imply that

IP{s∗_i(µ) = s} = IE[IP{s∗_i(µ) = s | µ}] = IE[N_{i,s}(µ)/N(µ)] = IE[N_{M+1−i,B+3−s}(µ_R)/N(µ_R)]
  = IP{s∗_{M+1−i}(µ_R) = B + 3 − s} = IP{s∗_{M+1−i}(µ) = B + 3 − s},

for all i = 1, . . . , M and s = 1, . . . , B + 2, where the last step follows from the fact that µ and µ_R are identically distributed. Moreover, we have

IE[s∗_i(µ)] = ∑_{s=1}^{B+2} s IP{s∗_i(µ) = s} = ∑_{s=1}^{B+2} s IP{s∗_{M+1−i}(µ) = B + 3 − s}
  = ∑_{s=1}^{B+2} (B + 3 − s) IP{s∗_{M+1−i}(µ) = s} = B + 3 − IE[s∗_{M+1−i}(µ)],

for all i = 1, . . . , M, and the proof is complete. □
We now consider server assignment policies that involve grouping several servers into teams,
where all servers in a team will move together between the two stations in the system. The following
corollary clearly follows from Propositions 5.1 and 5.2.
Corollary 5.4 If the servers in a set C ⊂ {1, . . . , M} are a team in an optimal policy for the
forward system, then they are also a team in an optimal policy for the reversed system.
For all µ, let l_1(µ), . . . , l_M(µ) be the ordered servers when the service rates are given by µ_{ij}, for i = 1, . . . , M and j = 1, 2, see equations (7) and (8) (if equations (7) and (8) do not specify l_1(µ), . . . , l_M(µ) uniquely, then ties can be broken arbitrarily, as long as this is done consistently in the forward and reversed systems so that l_i(µ) = l_{M+1−i}(µ_R) for all i = 1, . . . , M). For all K = 1, . . . , M, let

N_K = {(n_1, . . . , n_K) ∈ IN^K : n_1, . . . , n_K ≥ 1 and n_1 + · · · + n_K = M}. (10)

Moreover, for all µ, K ∈ {1, . . . , M}, and (n_1, . . . , n_K) ∈ N_K, let Π^{(n_1,...,n_K)}(µ) ⊂ Π be the set of all policies with K teams, where each team k ∈ {1, . . . , K} consists of servers l_{n_1+···+n_{k−1}+1}(µ), . . . , l_{n_1+···+n_k}(µ) (and hence has n_k servers), all servers in each team k ∈ {1, . . . , K} are at station 1 in all states s < s_k(µ) and at station 2 in all states s ≥ s_k(µ), and the switch points s_1(µ), . . . , s_K(µ) satisfy 1 ≤ s_1(µ) ≤ · · · ≤ s_K(µ) ≤ B + 2. For example, the first team to switch from station 1 to station 2 consists of servers l_1(µ), . . . , l_{n_1}(µ) and the last team to switch consists of servers l_{M−n_K+1}(µ), . . . , l_M(µ). For all µ and (n_1, . . . , n_K) ∈ N_K, let π^{(n_1,...,n_K)}(µ) ∈ Π^{(n_1,...,n_K)}(µ) be any policy with the team structure described above and with the switch point s_k^{(n_1,...,n_K)}(µ) of each team k ∈ {1, . . . , K} chosen optimally (if there are multiple sets of optimal switch points s_1(µ) ≤ · · · ≤ s_K(µ), then we choose a policy π^{(n_1,...,n_K)}(µ) arbitrarily from all policies with the prescribed team structure and optimal switch points). Let π_R^{(n_1,...,n_K)}(µ) be the corresponding policy in the reversed system. Then we have the following corollary:
Corollary 5.5 Suppose that K ∈ {1, . . . , M}, (n_1, . . . , n_K) ∈ N_K, the service rates µ_{ij}, where i = 1, . . . , M and j = 1, 2, are drawn independently from a certain distribution, and the Markov chains {X^{π^{(n_1,...,n_K)}(µ)}(t)} and {X^{π^{(n_K,...,n_1)}(µ)}(t)} have only one recurrent equivalence class. Then,

IP{π^{(n_1,...,n_K)}(µ) optimal} = IP{π^{(n_K,...,n_1)}(µ) optimal};
IP{T^{π^{(n_1,...,n_K)}(µ)}(µ) ≥ T^{π^n(µ)}(µ), ∀n ∈ N_K} = IP{T^{π^{(n_K,...,n_1)}(µ)}(µ) ≥ T^{π^n(µ)}(µ), ∀n ∈ N_K};
IE[T^{π^{(n_1,...,n_K)}(µ)}(µ)] = IE[T^{π^{(n_K,...,n_1)}(µ)}(µ)].

Proof: From equations (7) and (8) and parts (ii) and (iii) of Proposition 5.2, it is clear that π_R^{(n_1,...,n_K)}(µ) is equivalent to π^{(n_K,...,n_1)}(µ_R) (the only possible difference being that if there are multiple optimal sets of switch points, then we pick one such set arbitrarily in each of π^{(n_1,...,n_K)}(µ) and π^{(n_K,...,n_1)}(µ_R)). Hence, Proposition 5.1 and the fact that µ and µ_R are identically distributed imply that

IP{π^{(n_1,...,n_K)}(µ) optimal} = IP{π_R^{(n_1,...,n_K)}(µ) optimal} = IP{π^{(n_K,...,n_1)}(µ_R) optimal} = IP{π^{(n_K,...,n_1)}(µ) optimal}.

The other two conclusions of the corollary can be proved in a similar manner. □
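The team policies π^{(n_1,...,n_K)}(µ) can likewise be evaluated by enumerating switch points, which is how the heuristics of Section 6.2 can be assessed. A sketch (hypothetical names; all service rates assumed strictly positive): since an arbitrary switch vector may make the chain reducible, the code restricts attention to the single recurrent class {min(sw) − 1, . . . , max(sw)}, on which the long-run throughput does not depend on the initial state.

```python
from itertools import combinations_with_replacement

def team_policy_throughput(rates, B, sizes):
    """Best throughput over the class Pi^{(n_1,...,n_K)}(mu): teams of the
    given sizes (n_1, ..., n_K summing to M) switch from station 1 to
    station 2 at nondecreasing thresholds, optimized by enumeration.
    All service rates are assumed strictly positive."""
    M, K = len(rates), len(sizes)
    order = sorted(range(M), key=lambda i: rates[i][0] / rates[i][1])
    # Aggregate (station-1, station-2) rates of each team of consecutive
    # servers in the ordering (7)-(8).
    bounds = [sum(sizes[:k]) for k in range(K + 1)]
    team = [(sum(rates[order[i]][0] for i in range(bounds[k], bounds[k + 1])),
             sum(rates[order[i]][1] for i in range(bounds[k], bounds[k + 1])))
            for k in range(K)]
    best = 0.0
    for sw in combinations_with_replacement(range(1, B + 3), K):
        # Team k is at station 1 in states x < sw[k], at station 2 otherwise.
        birth = [sum(t1 for (t1, _), s in zip(team, sw) if x < s)
                 for x in range(B + 3)]
        death = [sum(t2 for (_, t2), s in zip(team, sw) if x >= s)
                 for x in range(B + 3)]
        lo, hi = sw[0] - 1, sw[-1]       # the single recurrent class
        p = {lo: 1.0}
        for x in range(lo, hi):
            p[x + 1] = p[x] * birth[x] / death[x + 1]
        z = sum(p.values())
        best = max(best, sum(death[x] * p[x] for x in p) / z)
    return best
```

For two single-server teams with rates (2, 1) and (1, 2) and B = 0, the best two-team policy attains throughput 12/7, while merging both servers into one team yields only 3/2.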
6 Numerical Results
In this section, we provide numerical results for systems with two stations and M ≥ 3 servers. In
Section 6.1, we first investigate whether the policy described in Section 4 is optimal for systems
with M > 3 and then present some interesting features of this policy. In Section 6.2, we develop
heuristic policies for tandem queues with two stations that group the M ≥ 3 available servers into
two or three teams and compare the throughput of these heuristics with the optimal throughput.
6.1 Conjectured Optimal Policy
In this section, we discuss two sets of numerical experiments aimed at determining whether the
policy conjectured to be optimal in Section 4 is in fact optimal for M > 3 and also at understanding
the behavior of that policy (recall that the conjectured policy is known to be optimal for M ≤ 3,
see Theorem 3.1 and Andradottir, Ayhan, and Down [4]). In the first set of numerical experiments,
we consider systems with two stations, M ∈ {3, 4, 5, 7, 10} servers, and B ∈ {0, 1, . . . , 5, 10, 15, 20} buffers between the two stations, where the service rate µij of each server i ∈ {1, . . . , M} at each
station j ∈ {1, 2} is drawn independently from a uniform distribution with range [0, 100]. For
systems with M ≤ 5, we generate 1,000,000 sets of service rates µij , where i = 1, . . . ,M and
j = 1, 2, for each buffer size B. On the other hand, for M ∈ {7, 10} and each choice of B, we
obtain our numerical results from 10,000 and 1,000 randomly generated systems, respectively. In
the second set of numerical experiments, we consider systems with 3, 4, or 5 servers, where the
service rates µij , for i = 1, . . . , M and j = 1, 2, take on all combinations of the values 1, 2, . . . , 10 for
systems with 3 or 4 servers and all combinations of the values 1, 2, . . . , 5 for systems with 5 servers.
The size B of the buffer between stations 1 and 2 again satisfies B ∈ {0, 1, . . . , 5, 10, 15, 20}.

For each system with M > 3 considered in the two sets of numerical experiments described
in the previous paragraph, we compute the throughput of the policy that is conjectured to be
optimal in Section 4, as well as the throughput of the optimal policy (which is obtained by using
the policy iteration algorithm for communicating Markov chains as described in the Appendix).
(We use a smaller number of systems for M ∈ {7, 10} and randomly generated service rates and
also for M = 5 and deterministic service rates because determining the optimal policy requires a
considerable amount of effort for systems with large numbers of servers.) The throughput of the
conjectured optimal policy is always equal to the optimal throughput, which implies that for all of
the systems considered in the two sets of numerical experiments discussed in this section, the policy
described in Section 4 is indeed an optimal policy. These extensive numerical results demonstrate
that the policy described in Section 4 appears to be optimal for systems with M > 3 (at least with
high probability).
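The conjectured policy induces a birth-death chain on the states 0, . . . , B + 2, so its throughput can be computed directly from the stationary distribution, and for small systems the optimal switch points can be found by brute force. The following sketch illustrates this (the helper names are hypothetical, not from the paper; it assumes the servers are already indexed in the order l_1, . . . , l_M of Section 4, that all service rates are strictly positive, and that the policy places at least one server at station 2 in every state s ≥ 1, as the conjectured policy with s∗_1 = 1 does):

```python
import itertools

import numpy as np

def throughput(mu, switch, B):
    """Throughput of a switch-point policy: server i works at station 2 in
    states s >= switch[i] and at station 1 in states s < switch[i], where
    the state s ranges over 0, ..., B+2."""
    M, S = len(mu), B + 3
    a = [sum(mu[i][0] for i in range(M) if s < switch[i]) for s in range(S)]
    b = [sum(mu[i][1] for i in range(M) if s >= switch[i]) for s in range(S)]
    pi = np.ones(S)
    for s in range(S - 1):  # birth-death balance: pi[s+1] b[s+1] = pi[s] a[s]
        pi[s + 1] = pi[s] * a[s] / b[s + 1]
    pi /= pi.sum()
    return sum(pi[s] * b[s] for s in range(S))

def best_switch_points(mu, B):
    """Brute force over nondecreasing switch points with s*_1 = 1 and
    s*_M = B + 2, the form the conjectured optimal policy takes."""
    M = len(mu)
    best_sw, best_tp = None, -1.0
    for mid in itertools.combinations_with_replacement(range(1, B + 3), M - 2):
        sw = (1,) + mid + (B + 2,)
        tp = throughput(mu, sw, B)
        if tp > best_tp:
            best_sw, best_tp = sw, tp
    return best_sw, best_tp
```

For instance, with two identical servers whose rates all equal 1 and B = 0, the policy (1, 2) attains throughput 1.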
We now study the behavior of the conjectured optimal policy in a more detailed manner. Recall
that s∗_i, where i ∈ {1, . . . , M}, denotes the state where the ith ordered server (i.e., server l_i, see
Section 4) moves from station 1 to station 2 according to the conjectured optimal policy (so that
1 ≤ s∗_1 ≤ s∗_2 ≤ · · · ≤ s∗_{M−1} ≤ s∗_M = B + 2). We do not have simple expressions for computing the
switch points s∗_2, . . . , s∗_{M−1} even when M = 3, see the definition of the set S∗ in Section 3. For
each choice of M, B, and either random or deterministic service rates, let s̄∗_i denote the average s∗_i value over the total number of systems considered, for i = 1, . . . , M. In order to obtain a better
understanding of when servers l_2, . . . , l_{M−1} move from station 1 to station 2, we consider the ratio
of the average switch points s̄∗_2, . . . , s̄∗_{M−1} to the total number of states (B + 3) in the numerical
experiments described above. In the interest of space, we display these ratios only for systems with
four or five servers and randomly generated service rates in Table 1.
Buffer          M = 4                         M = 5
 Size    s̄∗_2/(B+3)  s̄∗_3/(B+3)    s̄∗_2/(B+3)  s̄∗_3/(B+3)  s̄∗_4/(B+3)
   0      0.390595     0.609405      0.348271     0.499958     0.651729
   1      0.355158     0.644842      0.290877     0.499824     0.709123
   2      0.332058     0.667942      0.257209     0.499993     0.742791
   3      0.315841     0.684160      0.233439     0.499846     0.766561
   4      0.302724     0.697276      0.215206     0.500364     0.784794
   5      0.292125     0.707875      0.200529     0.500062     0.799471
  10      0.257131     0.742869      0.153968     0.499972     0.846032
  15      0.237132     0.762868      0.128150     0.500215     0.871850
  20      0.224115     0.775885      0.111320     0.499494     0.888680
Table 1: Ratio of the average switch points to the number of states for randomly generated service
rates.
The numerical results given in Table 1 are consistent with Corollary 5.3 in that

s̄∗_i/(B + 3) + s̄∗_{M+1−i}/(B + 3) ≈ 1 for all i = 1, . . . , M,

and

s̄∗_{(M+1)/2}/(B + 3) ≈ 0.5 when M is odd.
Moreover, for i ≤ M/2, the ratios s̄∗_i/(B + 3) are decreasing in B and, for i ≥ (M + 2)/2, the ratios
s̄∗_i/(B + 3) are increasing in B. Similar results were obtained for M ∈ {7, 10} and randomly generated
service rates and for M ∈ {3, 4, 5} and deterministic service rates. More specifically, if we consider
M = 7 with B = 20 and randomly generated service rates, then we obtain

s̄∗_2/(B + 3) ≈ 0.0557, s̄∗_3/(B + 3) ≈ 0.1641, s̄∗_4/(B + 3) ≈ 0.4960, s̄∗_5/(B + 3) ≈ 0.8359, s̄∗_6/(B + 3) ≈ 0.9433.
Similarly, when M = 10, B = 20, and the service rates are randomly generated, then we have

s̄∗_2/(B + 3) ≈ 0.0455, s̄∗_3/(B + 3) ≈ 0.0663, s̄∗_4/(B + 3) ≈ 0.1451, s̄∗_5/(B + 3) ≈ 0.3514, s̄∗_6/(B + 3) ≈ 0.6489, s̄∗_7/(B + 3) ≈ 0.8523, s̄∗_8/(B + 3) ≈ 0.9357, s̄∗_9/(B + 3) ≈ 0.9555.

Since the ratio s̄∗_i/(B + 3) is very close to 0 or 1 for several servers i ∈ {2, . . . , M − 1}, these
numerical results suggest that policies that group several servers into teams that move together
between stations 1 and 2 may yield good performance, at least for systems with two stations, a large
number of servers M, and a large buffer B between the two stations.
We next investigate how many teams the conjectured optimal policy has in the numerical
experiments described above (for example, there are two teams if there exists i∗ ∈ {2, . . . , M} such
that s∗i = s∗1 = 1 for all i < i∗ and s∗i = s∗M = B + 2 for all i ≥ i∗, and there are M teams
if 1 = s∗1 < s∗2 < · · · < s∗M−1 < s∗M = B + 2). Note that the total number of teams cannot
exceed B + 2, the total number of possible switch points. For each choice of M, B, and either
random or deterministic service rates, let r_i denote the ratio of the number of systems where the
conjectured optimal policy has i teams to the total number of systems generated, for i = 2, . . . , M
(since 1 = s∗_1 < s∗_M = B + 2, the policy conjectured to be optimal in Section 4 cannot have fewer
than two teams). Hence r_i estimates the probability of having i teams in the conjectured optimal
policy. Table 2 shows the average number of teams, Σ_{i=2}^{M} i r_i, as a function of the number of servers
M and buffer size B for the two sets of numerical experiments described previously, and Tables 3
and 4 display the values of r2, r3, and rM for various numbers of servers M and buffer sizes B for
the first and second sets of numerical experiments, respectively.
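Since servers sharing a switch point move between the stations together, the number of teams in a policy can be read directly off its switch-point vector; a one-line sketch (a hypothetical helper, not from the paper):

```python
def team_count(switch_points):
    """Servers with equal switch points form one team, so the number of
    teams equals the number of distinct switch points."""
    return len(set(switch_points))
```

For example, with B = 3 the vector (1, 1, 4, 4) gives two teams, while (1, 2, 3, 4) gives M = 4 teams.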
Buffer Random Service Rates Deterministic Service Rates
Size M = 3 M = 4 M = 5 M = 7 M = 10 M = 3 M = 4 M = 5
0 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
1 2.45 2.58 2.69 2.82 2.92 2.45 2.58 2.66
2 2.64 2.91 3.10 3.38 3.65 2.64 2.92 3.03
3 2.74 3.12 3.37 3.77 4.19 2.74 3.11 3.24
4 2.80 3.25 3.55 4.04 4.64 2.78 3.23 3.38
5 2.83 3.33 3.67 4.23 4.89 2.81 3.30 3.46
10 2.89 3.51 3.95 4.69 5.60 2.85 3.48 3.64
15 2.90 3.56 4.04 4.84 5.86 2.86 3.53 3.69
20 2.91 3.58 4.06 4.90 6.03 2.87 3.54 3.78
Table 2: Average number of teams.
As expected, Table 2 shows that the average number of teams increases both with the number
of servers M and with the buffer size B. However, the growth rate is rather slow, so that the
average number of teams is significantly smaller than the maximum possible number of teams (i.e.,
min{M, B +2}) for large M and B. Moreover, Table 2 shows that for given values of M and B, the
average number of teams in the random and deterministic cases are quite similar, with the averages
being slightly larger when the service rates are generated at random, rather than deterministically
(this may be due to the fact that we use a larger range of possible values when the service rates
are generated at random, rather than deterministically, leading to larger differences between the
Buffer M = 3 M = 4 M = 5 M = 7 M = 10
Size r2 r3 rM r2 r3 rM r2 r3 rM r2 r3 rM r2 r3 rM
0 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000
1 0.550 0.450 0.450 0.425 0.575 0.000 0.315 0.685 0.000 0.184 0.816 0.000 0.081 0.919 0.000
2 0.357 0.643 0.643 0.232 0.618 0.149 0.131 0.640 0.000 0.050 0.517 0.000 0.012 0.332 0.000
3 0.259 0.741 0.741 0.144 0.594 0.262 0.066 0.535 0.032 0.017 0.317 0.000 0.002 0.128 0.000
4 0.204 0.796 0.796 0.098 0.560 0.343 0.037 0.447 0.067 0.007 0.205 0.000 0.001 0.043 0.000
5 0.171 0.829 0.829 0.070 0.530 0.400 0.023 0.381 0.099 0.003 0.141 0.000 0.000 0.029 0.000
10 0.112 0.888 0.888 0.022 0.449 0.529 0.004 0.231 0.189 0.000 0.039 0.004 0.000 0.002 0.000
15 0.097 0.903 0.903 0.011 0.419 0.570 0.001 0.185 0.224 0.000 0.021 0.007 0.000 0.000 0.000
20 0.091 0.909 0.909 0.007 0.405 0.588 0.001 0.168 0.233 0.000 0.016 0.009 0.000 0.000 0.000
Table 3: Team probabilities for randomly generated service rates.
capabilities of the different servers at the two stations in the system). Similarly, Tables 3 and
4 show that the probability of having two teams decreases as the buffer size B increases for all
M ≥ 3. Moreover, the probability of having three teams decreases with the buffer size for all
M ≥ 5 (looking only at B ≥ 1, so that it is possible to have three teams). When M = 3, r3
increases as the buffer size increases, which is reasonable since in this case r3 = rM ; when M = 4,
r3 first increases and then decreases with B. Finally, rM increases with the buffer size in all cases.
Note however that for fixed B, Tables 3 and 4 show that rM decreases as M increases (in fact,
when M = 10 in Table 3, then rM = 0 for all B ∈ {0, 1, . . . , 5, 10, 15, 20}). Together with Table 2,
this suggests that the conjectured optimal policy is likely to have some servers grouped into teams,
at least for large numbers of servers M .
6.2 Heuristic Server Assignment Policies with Two or Three Teams
The numerical results given in Section 6.1 suggest that the optimal server assignment policy for
systems with two stations in tandem and M servers has the structure described in Section 4.
However, this policy may be difficult to implement in practice when M is large. In this section,
we consider policies in which the servers are grouped into two or three teams, and then the teams
are assigned to stations in the manner found to be optimal for systems with two or three servers,
see Andradottir, Ayhan, and Down [4] and Section 3. Our goal is to develop server assignment
heuristics that are easily implementable and also robust with respect to the server capabilities in
that their average throughput as the service rates vary is near-optimal.
We first order the servers as is done in Section 4, see equations (7) and (8). Then we consider all
ways of grouping the ordered servers into two or three teams. More specifically, for all (n, M − n) ∈ N_2 (see equation (10)), we consider using the server assignment policy of Andradottir, Ayhan, and
Down [4] with servers l1, . . . , ln in one team and servers ln+1, . . . , lM in the other team. Similarly,
for all (n1, n2, M − n1 − n2) ∈ N3, we consider using the server assignment policy found to be
Buffer M = 3 M = 4 M = 5
Size r2 r3 rM r2 r3 rM r2 r3 rM
0 1.000 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.000
1 0.547 0.453 0.453 0.425 0.575 0.000 0.315 0.685 0.000
2 0.356 0.644 0.644 0.235 0.615 0.150 0.150 0.670 0.000
3 0.264 0.736 0.736 0.150 0.592 0.258 0.085 0.604 0.018
4 0.217 0.783 0.783 0.105 0.561 0.333 0.058 0.540 0.035
5 0.190 0.810 0.810 0.079 0.537 0.383 0.045 0.497 0.050
10 0.148 0.852 0.852 0.027 0.463 0.510 0.031 0.392 0.091
15 0.139 0.861 0.861 0.015 0.440 0.545 0.029 0.362 0.108
20 0.133 0.867 0.867 0.011 0.434 0.555 0.013 0.315 0.138
Table 4: Team probabilities for deterministic service rates.
optimal in Section 3 with servers l1, . . . , ln1 in the first team, servers ln1+1, . . . , ln1+n2 in the second
team, and servers ln1+n2+1, . . . , lM in the third team. Note that in all of these server assignment
heuristics, the order of the servers (and hence the composition of the teams) depends on the service
rates, but the size of the teams does not. In order to evaluate and compare the performance of
these policies, we perform three sets of numerical experiments in which the service rates are drawn
independently from uniform distributions with ranges [40, 60], [20, 80], and [0, 100], respectively.
Note that these three uniform distributions all have a common mean but different variances. Hence
these distributions are chosen to model situations where the capabilities of the servers at the two
stations tend to be quite similar, quite different, and very different, respectively. In all cases, we
consider systems with M = 3, 4, . . . , 10 servers and B = 5, 10, 20 buffers. For each M ≤ 6 and
each choice of B, we generate 100,000 sets of service rates µij , where i = 1, . . . , M and j = 1, 2.
On the other hand, for M = 7, 8, 9, and 10, we generate 10,000, 5,000, 1,000, and 500 sets of service
rates, respectively, for each value of B (as in Section 6.1, we generate fewer sets of service rates for
systems with large numbers of servers M because of the excessive amount of computational effort
required for determining the optimal policy for systems with many servers).
In the two team setting, the numerical experiments described in the previous paragraph suggest
that the two team heuristic in which the numbers of servers in the two teams differ by at most one
performs best on average, in that it yields the highest average throughput and also leads to the
highest throughput among all the two team policies that we consider more often than any other
two team policy. Thus, if M is even, then our two team heuristic assigns servers l1, . . . , lM/2 to
station 2 unless station 2 is starved, and servers l(M+2)/2, . . . , lM to station 1 unless station 1 is
blocked. All servers work at station 2 (station 1) when station 1 (station 2) is blocked (starved).
On the other hand if M is odd, then either servers l1, . . . , l(M−1)/2 form the downstream team and
servers l(M+1)/2, . . . , lM form the upstream team or servers l1, . . . , l(M+1)/2 form the downstream
team and servers l(M+3)/2, . . . , lM form the upstream team (note that Corollary 5.5 shows that on
the average, these two server assignment heuristics behave in the same manner).
In the three team setting, the best average performance is obtained by forming the teams in such
a way that the size of the moving team is smaller than the sizes of the upstream and downstream
teams and the sizes of the upstream and downstream teams differ by at most one (as in the two team
setting, this approach to forming the teams maximizes both the average performance and also the
probability of achieving the best performance among all three team policies under consideration).
When M is odd, this translates into having servers l1, . . . , l(M−1)/2 as the downstream team, server
l(M+1)/2 as the moving team, and servers l(M+3)/2, . . . , lM as the upstream team. However, when
M is even, then there are two cases: If M ∈ {4, 6}, then either servers l1, . . . , lM/2 form the
downstream team, server l(M+2)/2 forms the moving team, and servers l(M+4)/2, . . . , lM form the
upstream team, or servers l1, . . . , l(M−2)/2 form the downstream team, server lM/2 forms the moving
team, and servers l(M+2)/2, . . . , lM form the upstream team (note that the average performance of
these two teams is the same by Corollary 5.5). On the other hand, if M ∈ {8, 10, 12, . . .}, then
servers l1, . . . , l(M−2)/2 form the downstream team, servers lM/2 and l(M+2)/2 form the moving team,
and servers l(M+4)/2, . . . , lM form the upstream team.
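The fixed team sizes described above depend on M alone, so forming the heuristic teams from an already ordered server list is mechanical. The following sketch captures the rules just stated (hypothetical helpers; the ordering l_1, . . . , l_M itself comes from equations (7) and (8) and is taken as given, and for odd M the two team variant with the smaller downstream team is used, as in the text):

```python
def two_team_heuristic(ordered):
    """Split ordered servers l_1, ..., l_M into downstream and upstream
    teams whose sizes differ by at most one."""
    k = len(ordered) // 2  # floor(M/2): M/2 for even M, (M-1)/2 for odd M
    return ordered[:k], ordered[k:]

def three_team_heuristic(ordered):
    """Downstream, moving, and upstream teams with a small moving team."""
    M = len(ordered)
    if M % 2 == 1:              # odd M: single middle server moves
        d = (M - 1) // 2
        return ordered[:d], ordered[d:d + 1], ordered[d + 1:]
    if M in (4, 6):             # small even M: one moving server
        d = M // 2
        return ordered[:d], ordered[d:d + 1], ordered[d + 1:]
    d = (M - 2) // 2            # even M >= 8: two moving servers
    return ordered[:d], ordered[d:d + 2], ordered[d + 2:]
```

With M = 7, for instance, the three team heuristic yields downstream team {l_1, l_2, l_3}, moving team {l_4}, and upstream team {l_5, l_6, l_7}.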
We now compare our two and three team heuristics with other server assignment policies,
including the optimal policy (determined by using the policy iteration algorithm for communicating
Markov chains, see the Appendix), the best two team policy, the best three team policy, and a
benchmark policy, namely the teamwork policy of Van Oyen, Gel, and Hopp [19] (where all servers
work in a single team that will follow each job from the first to the last station and only starts work
on a new job once all work on the previous job has been completed). For each randomly generated
choice of µ (see Section 5), the best two team policy is the one that yields the highest throughput
among all two team policies such that servers l1(µ), . . . , ln(µ) are primarily assigned to station 2 and
servers ln+1(µ), . . . , lM (µ) are primarily assigned to station 1, where n ∈ {1, . . . ,M −1}. Similarly,
for each choice of µ, the best three team policy is the one that yields the highest throughput
among all three team policies such that servers l1(µ), . . . , lk1(µ) form the downstream team, servers
lk1+1(µ), . . . , lk2(µ) form the moving team, and servers lk2+1(µ), . . . , lM (µ) form the upstream
team, where k1 ∈ {1, . . . ,M − 1} and k2 ∈ {k1, . . . , M − 1}. (In other words, in the best two and
three team policies, the team sizes are allowed to depend on µ.)
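Under the stated index ranges, the candidate team sizes that the best two and three team policies search over can be enumerated directly (a sketch with hypothetical helper names; sizes only, with an empty moving team allowed when k2 = k1):

```python
def two_team_splits(M):
    """Team sizes (n, M - n) for n in {1, ..., M-1}."""
    return [(n, M - n) for n in range(1, M)]

def three_team_splits(M):
    """Team sizes (k1, k2 - k1, M - k2) for k1 in {1, ..., M-1} and
    k2 in {k1, ..., M-1}; k2 = k1 gives an empty moving team."""
    return [(k1, k2 - k1, M - k2)
            for k1 in range(1, M) for k2 in range(k1, M)]
```

For M = 5, this yields 4 two-team splits and 10 three-team splits, each summing to M.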
Although we perform nine sets of numerical experiments (one for each combination of three
choices of distribution for the service rates and three buffer sizes), in the interest of space we only
show results from four sets of numerical experiments here. For all a, b ∈ IR with a ≤ b, let U [a, b]
denote the uniform distribution with range [a, b]. Tables 5 through 8 show 95% confidence intervals
for the average throughput values of the policies described in the previous three paragraphs when
the service rates are drawn from either the U [40, 60] or U [0, 100] distribution and the buffer size
B satisfies B ∈ {5, 20}. Since we have two alternative ways of forming the teams for the two team
heuristic when M ∈ {3, 5, 7}, we arbitrarily choose the one that has servers l1, . . . , l(M−1)/2 in the
downstream team and servers l(M+1)/2, . . . , lM in the upstream team. Similarly, for the three team
heuristic and M ∈ {4, 6}, we assign servers l1, . . . , lM/2 to the downstream team, server l(M+2)/2 to
the moving team, and servers l(M+4)/2, . . . , lM to the upstream team.
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 74.85 ± 0.06 77.44 ± 0.06 77.84 ± 0.06 78.18 ± 0.07 78.18 ± 0.07 78.18 ± 0.07
4 99.86 ± 0.07 104.51 ± 0.08 104.53 ± 0.08 104.62 ± 0.08 104.71 ± 0.08 104.75 ± 0.08
5 124.86 ± 0.08 130.38 ± 0.08 130.77 ± 0.08 131.19 ± 0.08 131.22 ± 0.08 131.25 ± 0.08
6 149.87 ± 0.08 157.31 ± 0.09 157.33 ± 0.09 157.50 ± 0.09 157.69 ± 0.09 157.76 ± 0.09
7 174.82 ± 0.11 183.24 ± 0.12 183.60 ± 0.12 184.03 ± 0.12 184.11 ± 0.12 184.20 ± 0.12
8 199.83 ± 0.16 210.05 ± 0.18 210.08 ± 0.18 210.43 ± 0.18 210.59 ± 0.18 210.70 ± 0.18
9 224.94 ± 0.38 236.16 ± 0.42 236.49 ± 0.42 236.92 ± 0.42 237.12 ± 0.42 237.24 ± 0.42
10 249.90 ± 0.57 262.68 ± 0.64 262.73 ± 0.64 263.25 ± 0.64 263.38 ± 0.64 263.53 ± 0.64
Table 5: Throughput values for systems with U [40, 60]-distributed service rates and B = 5.
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 74.85 ± 0.06 77.47 ± 0.06 77.88 ± 0.06 78.33 ± 0.07 78.33 ± 0.07 78.33 ± 0.07
4 99.86 ± 0.07 105.01 ± 0.08 105.03 ± 0.08 105.11 ± 0.08 105.17 ± 0.08 105.19 ± 0.08
5 124.86 ± 0.08 130.61 ± 0.08 130.99 ± 0.08 131.71 ± 0.09 131.72 ± 0.09 131.73 ± 0.09
6 149.87 ± 0.08 158.14 ± 0.09 158.15 ± 0.09 158.28 ± 0.09 158.39 ± 0.09 158.45 ± 0.09
7 174.82 ± 0.11 183.77 ± 0.12 184.20 ± 0.12 184.63 ± 0.12 184.96 ± 0.12 184.99 ± 0.12
8 199.83 ± 0.16 210.78 ± 0.19 210.82 ± 0.18 210.96 ± 0.19 211.26 ± 0.18 211.35 ± 0.18
9 224.94 ± 0.38 237.10 ± 0.43 237.82 ± 0.43 238.34 ± 0.43 238.58 ± 0.43 238.63 ± 0.43
10 249.90 ± 0.57 263.81 ± 0.64 263.85 ± 0.64 264.19 ± 0.65 264.31 ± 0.64 264.43 ±0.64
Table 6: Throughput values for systems with U [40, 60]-distributed service rates and B = 20.
As expected, Tables 5 through 8 show that the throughputs achieved by all the policies con-
sidered in this section appear to increase as the number of servers M increases. Similarly, the
throughputs of all the policies except for the teamwork policy appear to increase with both the
number of buffers B and the variability in the service rates µij , where i = 1, . . . , M and j = 1, 2.
On the other hand, the throughput of the teamwork policy is by definition insensitive to B and
it appears to decrease slightly as the variability in the service rates increases. The fact that the
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 70.70 ± 0.32 82.64 ± 0.37 85.05 ± 0.37 86.23 ± 0.38 86.23 ± 0.38 86.23 ± 0.38
4 95.77 ± 0.36 116.56 ± 0.43 117.20 ± 0.42 117.61 ± 0.43 118.63 ± 0.43 118.73 ± 0.43
5 120.82 ± 0.40 146.20 ± 0.47 149.03 ± 0.46 150.50 ± 0.48 150.97 ± 0.47 151.14 ± 0.47
6 145.92 ± 0.43 179.93 ± 0.51 181.07 ± 0.50 181.76 ± 0.50 183.32 ± 0.51 183.51 ± 0.51
7 170.53 ± 0.55 209.47 ± 0.64 212.46 ± 0.64 213.56 ± 0.64 215.06 ± 0.64 215.46 ± 0.64
8 195.57 ± 0.83 242.87 ± 0.97 244.41 ± 0.95 246.28 ± 0.96 247.34 ± 0.96 247.87 ± 0.96
9 221.15 ± 1.96 273.33 ± 2.26 276.57 ± 2.22 278.39 ± 2.25 279.84 ± 2.24 280.49 ± 2.25
10 246.06 ± 2.97 305.28 ± 3.45 307.42 ± 3.42 309.85 ± 3.48 311.04 ± 3.45 311.81 ± 3.46
Table 7: Throughput values for systems with U [0, 100]-distributed service rates and B = 5.
throughputs of all the policies except for the teamwork policy increase with the variability in the
service rates is reasonable because only the teamwork policy is unable to take advantage of this
variability by assigning servers primarily to tasks that they are good at (i.e., have a high service
rate at).
Tables 5 through 8 also show that the average behavior of our two team heuristic is in all cases
very close to that of the best two team policy and also that the average behavior of our three
team heuristic is always very similar to the average performance of the best three team policy.
Both the two and three team heuristics perform significantly better than the teamwork policy,
especially when the service rates are highly variable, with the three team heuristic showing slightly
better average performance than the two team heuristic. Finally, the average performance of the
three team heuristic is always very close to that of the optimal policy (it is equal to the average
performance of the optimal policy when M = 3, as predicted by Theorem 3.1). These observations
suggest that both of our server assignment heuristics are likely to yield very good performance
in practice, and that the behavior of the three team heuristic is usually near-optimal. Moreover,
although our heuristics are designed to be both easily implementable and also robust with respect
to the service rates (in that the sizes of the teams do not depend on the service rates), our numerical
results indicate that there is very little room for obtaining improved average performance through
the use of more complex policies or policies that depend more heavily on the service rates.
7 Conclusion
For Markovian queueing systems with two stations in tandem, finite intermediate buffer, and three
flexible and collaborative servers, we have completely specified how servers should be assigned to
stations in order to achieve maximal long-run average throughput. Moreover, we have provided
Number of Teamwork Two Team Best Two Three Team Best Three Optimal
Servers Policy Heuristic Team Policy Heuristic Team Policy Policy
3 70.70 ± 0.32 83.14 ± 0.37 85.91 ± 0.37 87.47 ± 0.39 87.47 ± 0.39 87.47 ± 0.39
4 95.77 ± 0.36 118.33 ± 0.44 119.06 ± 0.44 119.63 ± 0.44 120.90 ± 0.44 120.94 ± 0.44
5 120.82 ± 0.40 147.89 ± 0.47 151.60 ± 0.47 153.68 ± 0.49 154.17 ± 0.48 154.25 ± 0.48
6 145.92 ± 0.43 183.13 ± 0.53 184.67 ± 0.51 185.30 ± 0.53 187.47 ± 0.52 187.58 ± 0.52
7 170.53 ± 0.55 212.57 ± 0.66 216.97 ± 0.65 218.96 ± 0.67 220.15 ± 0.66 220.38 ± 0.66
8 195.57 ± 0.83 247.58 ± 1.01 249.97 ± 0.97 251.58 ± 0.98 253.40 ± 0.98 253.73 ± 0.99
9 221.15 ± 1.96 277.93 ± 2.34 283.03 ± 2.28 284.81 ± 2.36 286.76 ± 2.30 287.18 ± 2.30
10 246.06 ± 2.97 311.28 ± 3.66 316.44 ± 3.52 317.19 ± 3.57 318.77 ± 3.54 319.25 ± 3.55
Table 8: Throughput values for systems with U [0, 100]-distributed service rates and B = 20.
a conjecture for the structure of an optimal server assignment policy for two-station tandem lines
with an arbitrary number of flexible and collaborative servers; the results of extensive numerical
experiments suggest that our conjecture appears to be correct. Finally, we have proposed heuris-
tic server assignment policies that involve grouping all available servers into two or three teams
and presented numerical results that suggest that our heuristic policies (especially our three team
heuristic) generally achieve near-optimal long-run average throughput.
Acknowledgments
The research of the first author was supported by the National Science Foundation under grants
DMI–0000135 and DMI–0217860. The research of the second author was supported by the National
Science Foundation under grants DMI–9908161 and DMI–9984352.
Appendix: Proof of Theorem 3.1
We will use the notation a_{σd σm σu} for the possible actions, where, for i = d, m, u, σi ∈ {I, 1, 2} is
the status of server i, with σi = I when server i is idle and σi = j ∈ {1, 2} when server i is working
at station j. Then the set As of allowable actions in state s ∈ S is given by

As = {aIII, a1II, aI1I, aII1, a11I, a1I1, aI11, a111} for s = 0,

As = {aIII, a1II, aI1I, aII1, a11I, a1I1, aI11, a2II, aI2I, aII2, a22I, a2I2, aI22, a111, a211, a112, a121, a122, a221, a212, a222} for s ∈ {1, . . . , B + 1},

As = {aIII, a2II, aI2I, aII2, a22I, a2I2, aI22, a222} for s = B + 2.
Note that the set of possible actions in states 0 and B + 2 can be reduced. For example, in state
0, action aIII is identical to actions a2II , aI2I , aII2, a22I , a2I2, aI22, and a222, and in state B + 2,
action aIII is identical to actions a1II , aI1I , aII1, a11I , a1I1, aI11, and a111.
As was mentioned in Section 3, under our assumptions on the service rates and definitions of d
and u, neither µd2 nor µu1 can be equal to zero. This shows that the policy described in Theorem
3.1 corresponds to an irreducible Markov chain, and consequently that we have a communicating
Markov decision process. Therefore, we use the policy iteration algorithm for communicating
models (see pages 479 and 480 of Puterman [16]) to prove the optimality of the policy described in
Theorem 3.1.
For all decision rules δ, let Pδ be the (B + 3) × (B + 3) dimensional transition probability matrix
corresponding to the policy (δ)∞ and let rδ be the B + 3 dimensional reward vector corresponding
to δ, with rδ(s) denoting the reward earned in state s under the policy (δ)∞, for all s ∈ S.
Moreover, let q denote the uniformization constant (we assume, without loss of generality, that the
uniformization constant does not depend on the policy π ∈ Π, see Section 2).
In the policy iteration algorithm, we start by choosing
δ0(s) = δ∗(s) = δ_{s∗}(s) =
  a111 for s = 0,
  a211 for 1 ≤ s ≤ s∗ − 1,
  a221 for s∗ ≤ s ≤ B + 1,
  a222 for s = B + 2,
corresponding to the policy described in Theorem 3.1. Then
rδ0(s) =
0 for s = 0,
µd2 for 1 ≤ s ≤ s∗ − 1,
µd2 + µm2 for s∗ ≤ s ≤ B + 1,
µd2 + µm2 + µu2 for s = B + 2,
and
Pδ0(s, s′) =
  (µd1 + µm1 + µu1)/q for s = 0, s′ = 1,
  (q − (µd1 + µm1 + µu1))/q for s = s′ = 0,
  µd2/q for 1 ≤ s ≤ s∗ − 1, s′ = s − 1,
  (q − (µd2 + µu1 + µm1))/q for 1 ≤ s ≤ s∗ − 1, s′ = s,
  (µu1 + µm1)/q for 1 ≤ s ≤ s∗ − 1, s′ = s + 1,
  (µd2 + µm2)/q for s∗ ≤ s ≤ B + 1, s′ = s − 1,
  (q − (µd2 + µm2 + µu1))/q for s∗ ≤ s ≤ B + 1, s′ = s,
  µu1/q for s∗ ≤ s ≤ B + 1, s′ = s + 1,
  (µd2 + µm2 + µu2)/q for s = B + 2, s′ = B + 1,
  (q − (µd2 + µm2 + µu2))/q for s = s′ = B + 2.
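As a numerical sanity check on the displayed Pδ0 and rδ0, one can build the uniformized chain and solve the evaluation equation (11) directly. The sketch below is a hypothetical helper, not part of the paper's proof; it assumes rates making the chain irreducible (in particular µd2 > 0 and µu1 > 0), and each mu_x is the pair (rate at station 1, rate at station 2) for x = d, m, u:

```python
import numpy as np

def evaluate_theorem_policy(mu_d, mu_m, mu_u, s_star, B):
    """Build P and r for the policy of Theorem 3.1 (actions a111, a211,
    a221, a222) and solve r - g e + (P - I) h = 0 subject to h(0) = 0."""
    S = B + 3
    q = sum(mu_d) + sum(mu_m) + sum(mu_u) + 1.0  # uniformization constant
    P, r = np.zeros((S, S)), np.zeros(S)
    for s in range(S):
        if s == 0:                             # a111: all servers at station 1
            up, down = mu_d[0] + mu_m[0] + mu_u[0], 0.0
        elif s <= s_star - 1:                  # a211: server d at station 2
            up, down = mu_m[0] + mu_u[0], mu_d[1]
        elif s <= B + 1:                       # a221: servers d, m at station 2
            up, down = mu_u[0], mu_d[1] + mu_m[1]
        else:                                  # a222: all servers at station 2
            up, down = 0.0, mu_d[1] + mu_m[1] + mu_u[1]
        r[s] = down                            # reward rate = departure rate
        if s + 1 < S:
            P[s, s + 1] = up / q
        if s > 0:
            P[s, s - 1] = down / q
        P[s, s] = 1.0 - (up + down) / q
    # unknowns (g, h(1), ..., h(B+2)); the h(0) column is dropped
    A = np.hstack([-np.ones((S, 1)), (P - np.eye(S))[:, 1:]])
    x = np.linalg.solve(A, -r)
    return x[0], np.concatenate(([0.0], x[1:]))
```

Multiplying (11) by the stationary distribution of Pδ0 shows that the returned g equals the long-run average throughput of the policy, which gives an independent check against the closed-form g0 below.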
Since the Markov chain under the policy (δ0)∞ is irreducible, we find a scalar g0 and a vector h0
solving
rδ0 − g0e + (Pδ0 − I)h0 = 0, (11)
subject to h0(0) = 0. In equation (11), e is a column vector of ones and I is the identity matrix.
For the rest of the proof we will use the following notation to simplify our expressions:
Σ1 = µu1 + µm1 + µd1, Σ2 = µu2 + µm2 + µd2, Σu = µu1 + µm1, Σd = µm2 + µd2,
∆md = µm1µd2 − µd1µm2, ∆ud = µu1µd2 − µd1µu2, ∆um = µu1µm2 − µm1µu2.
Recall that under our assumptions on the service rates and by the definitions of d, m, and u, we
have Σ1 > 0, Σ2 > 0, Σu > 0, Σd > 0, ∆md ≥ 0, ∆ud ≥ 0, and ∆um ≥ 0. Moreover, note that
∆ud = 0 if and only if ∆md = ∆ud = ∆um = f(s) = 0 for all s ≥ 0, (12)
see the proof of Proposition 3.2. Let
Θ1 = Σ1 (µd2^{s∗} − Σu^{s∗}) / (µd2^{s∗−1} (µd2 − Σu)) + Σu^{s∗−1} µu1 (Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd^{B+2−s∗} µd2^{s∗−1} (Σd − µu1)),

Θ2 = 1 + Σ1 (µd2^{s∗−1} − Σu^{s∗−1}) / (µd2^{s∗−1} (µd2 − Σu)) + (Σ1 Σu^{s∗−1} / (Σd^{B+2−s∗} µd2^{s∗−1})) ((Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd − µu1) + µu1^{B+2−s∗} / Σ2),

Θ3 = Σ1 s∗ + µu1 (Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd^{B+2−s∗} (Σd − µu1)),

Θ4 = 1 + Σ1 (s∗ − 1)/µd2 + (Σ1 / Σd^{B+2−s∗}) ((Σd^{B+2−s∗} − µu1^{B+2−s∗}) / (Σd − µu1) + µu1^{B+2−s∗} / Σ2),

Θ5 = Σ1 (µd2^{s∗} − Σu^{s∗}) / (µd2^{s∗−1} (µd2 − Σu)) + Σu^{s∗−1} (B + 2 − s∗) / µd2^{s∗−1},

and

Θ6 = 1 + Σ1 (µd2^{s∗−1} − Σu^{s∗−1}) / (µd2^{s∗−1} (µd2 − Σu)) + (Σ1 Σu^{s∗−1} / (Σd µd2^{s∗−1})) (B + 2 − s∗ + Σd/Σ2).
One can show that

g0 = Θ1/Θ2 if µd2 ≠ Σu and µu1 ≠ Σd,
     Θ3/Θ4 if µd2 = Σu and µu1 ≠ Σd,
     Θ5/Θ6 if µd2 ≠ Σu and µu1 = Σd,        (13)
h0(0) = 0,

h0(s) = (q g0/Σ1) ((µd1 + µd2) Σ_{j=0}^{s−2} (j + 1) µd2^{s−j−2} Σu^{j+1−s} + s) − q µd2 Σ_{j=0}^{s−2} (j + 1) µd2^{s−2−j} Σu^{j+1−s}

for 1 ≤ s ≤ s∗, and

h0(s) = h0(s∗) + (q Σd / µu1^{s−s∗}) Σ_{j=0}^{s−s∗−1} µu1^{j} Σd^{s−s∗−j−1} × [ (g0/Σ1) ((µd1 + µd2) Σ_{j=0}^{s∗−2} µd2^{s∗−j−2} Σu^{j+1−s} + Σu^{s∗−s}) − Σ_{j=0}^{s∗−2} µd2^{s∗−j−2} Σu^{j+1−s} ] + (q (g0 − Σd) / µu1^{s−s∗}) Σ_{j=0}^{s−s∗−1} (j + 1) µu1^{j} Σd^{s−s∗−j−1}

for s∗ + 1 ≤ s ≤ B + 2 (with the convention that a summation over an empty set equals zero)
constitute a solution to equation (11). Note that g0 > 0 in all three cases listed in equation (13)
and that under our assumptions on service rates, it is not possible to have µd2 = Σu and µu1 = Σd,
because this would imply that µm1 = µm2 = 0.
For the remainder of the proof, we assume that µd2 6= Σu and µu1 6= Σd; the other two cases
can be handled in a similar manner. For all s ∈ S and a ∈ As, let r(s, a) be the immediate reward
obtained when action a is chosen in state s and let p(j|s, a) be the probability of going to state j
in one step when action a is chosen in state s. As a next step of the policy iteration algorithm, we
choose
δ1(s) ∈ arg max_{a ∈ As} { r(s, a) + Σ_{j∈S} p(j|s, a) h0(j) }, ∀s ∈ S,
setting δ1(s) = δ0(s) if possible. We now show that if d, m, u, and s∗ are chosen as described in
Section 3, then δ1(s) = δ0(s), for all s ∈ S. In particular, for all s ∈ S and a ∈ As, we will compute
the differences
[ r(s, a) + Σ_{j∈S} p(j|s, a) h0(j) ] − [ r(s, δ0(s)) + Σ_{j∈S} p(j|s, δ0(s)) h0(j) ]        (14)
and show that the differences are non-positive.
For s = 0, recall that δ0(s) = a111. We have
r(s, aIII) + Σ_{j∈S} p(j|s, aIII) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −g0 < 0,

r(s, a1II) + Σ_{j∈S} p(j|s, a1II) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −Σu g0/Σ1 < 0,

r(s, aI1I) + Σ_{j∈S} p(j|s, aI1I) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −(µd1 + µu1) g0/Σ1 < 0,

r(s, aII1) + Σ_{j∈S} p(j|s, aII1) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −(µd1 + µm1) g0/Σ1 ≤ 0,        (15)

r(s, a11I) + Σ_{j∈S} p(j|s, a11I) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µu1 g0/Σ1 < 0,

r(s, a1I1) + Σ_{j∈S} p(j|s, a1I1) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µm1 g0/Σ1 ≤ 0,        (16)

r(s, aI11) + Σ_{j∈S} p(j|s, aI11) h0(j) − [ r(s, a111) + Σ_{j∈S} p(j|s, a111) h0(j) ] = −µd1 g0/Σ1 ≤ 0.        (17)
Note that in (15), we have the expression equal to zero only when µd1 = µm1 = 0, in which case
a111 is identical to aII1; in (16), we have the expression equal to zero only when µm1 = 0, in which
case a111 is identical to a1I1; and finally in (17), we have the expression equal to zero only when
µd1 = 0, in which case a111 is identical to aI11. This shows that δ1(0) = δ0(0).
For $1 \le s \le s^* - 1$, we have that $\delta_0(s) = a_{211}$. Since the set $A_s$ of all possible actions is large, in the interest of space we will specify the difference in (14) only for the actions $a_{111}$, $a_{112}$, $a_{121}$, $a_{122}$, $a_{221}$, $a_{212}$, and $a_{222}$. Define
\[
\begin{aligned}
\Gamma_1 = {} & \mu_{d2}^{s^*-2}\Sigma_d^{B+2-s^*}\Sigma_2(\mu_{d1}+\mu_{d2}) + \Sigma_u^{s^*-1}\mu_{u1}^{B+1-s^*}\Sigma_1(\mu_{u1}+\mu_{u2}) + \Sigma_u^{s^*}\Sigma_d^{B+2-s^*} + {} \\
& \Sigma_u^{s^*-1}\Sigma_d^{B+3-s^*} + \Sigma_d^{B+3-s^*}\sum_{j=0}^{s^*-3}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + \Sigma_d^{B+2-s^*}\mu_{u2}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-1-j} + {} \\
& \mu_{d1}\Sigma_d^{B+2-s^*}\Sigma_2\sum_{j=0}^{s^*-4}\mu_{d2}^{j+1}\Sigma_u^{s^*-3-j} + \mu_{d1}\sum_{j=0}^{\min(1,\,s^*-2)}\Sigma_d^{B+2-s^*+j}\Sigma_u^{s^*-1-j} + {} \\
& \mu_{d1}\mu_{u2}\sum_{j=0}^{\min(0,\,s^*-3)}\Sigma_d^{B+2-s^*+j}\Sigma_u^{s^*-2-j} + \Sigma_1(\mu_{u1}+\mu_{u2})\Sigma_u^{s^*-1}\sum_{j=0}^{B-s^*}\Sigma_d^{j+1}\mu_{u1}^{B-s^*-j}.
\end{aligned}
\]
Note that $\Gamma_1$ is always positive. We have
\[
r(s, a_{111}) + \sum_{j \in S} p(j|s, a_{111})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_2(s)}{\Gamma_1},
\]
where
\[
\Gamma_2(s) = \Sigma_1 \sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg( \Delta_{md}\Big( \sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B-s^*+2-j} + \mu_{u2}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B-s^*+1-j} \Big) + \mu_{u1}^{B-s^*+2}\Delta_{ud} \bigg) \ge 0.
\]
Then $-\Gamma_2(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_2(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* = S \setminus \{0\} \neq \{s^*\}$ (see equation (12) and the definition of $S^*$). Similarly,
\[
r(s, a_{112}) + \sum_{j \in S} p(j|s, a_{112})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_3(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_3(s) = {} & (\Delta_{md}+\Delta_{ud}+\Delta_{um})\mu_{u1}^{B+2-s^*}\Sigma_1\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} + {} \\
& \big((\Delta_{ud}+\Delta_{um})\mu_{m2} + \Delta_{um}\mu_{u2}\big)\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + {} \\
& \Delta_{ud}\mu_{u2}\mu_{d2}^{s-1}\Sigma_u^{s^*-s}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + \Delta_{ud}\Sigma_2\sum_{j=0}^{s^*-s-1}\mu_{d2}^{B-j}\Sigma_u^{j} + {} \\
& (\Delta_{ud}+\Delta_{um})\mu_{d2}^{s}\Sigma_u^{s^*-s-1}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j+1}\Sigma_d^{B+1-s^*-j} + {} \\
& (\Delta_{md}+\Delta_{um})\mu_{d1}\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j+1}\mu_{u1}^{B+1-s^*-j} + {} \\
& \big(\Delta_{um}\mu_{d1}\mu_{m2} + \Delta_{md}(\Sigma_u\Sigma_d + \mu_{d1}\mu_{m2})\big)\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} \;\ge\; 0.
\end{aligned}
\]
Then $-\Gamma_3(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_3(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We have
\[
r(s, a_{121}) + \sum_{j \in S} p(j|s, a_{121})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_4(s)}{\Gamma_1},
\]
where $\Gamma_4(s) = \Gamma_4^1(s) + \Gamma_4^2$,
\[
\Gamma_4^1(s) = \bigg( \Sigma_1\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} + (\mu_{m2}\Sigma_u + \mu_{m1}\mu_{d2})\sum_{j=0}^{s^*-s-2}\Sigma_u^{j}\mu_{d2}^{s^*-3-j} \bigg) \times \bigg( \Delta_{md}\Sigma_2\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + \mu_{u1}^{B+2-s^*}(\Delta_{md}+\Delta_{ud}) \bigg) \ge 0,
\]
and $\Gamma_4^2 = f(s^*)$. Note that the definition of $s^*$ implies that $\Gamma_4^2 \ge 0$. Hence, $\Gamma_4(s) \ge 0$ and $-\Gamma_4(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_4(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly,
\[
r(s, a_{122}) + \sum_{j \in S} p(j|s, a_{122})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_5(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_5(s) = {} & (\Delta_{md}+\Delta_{ud})\bigg( \mu_{u1}^{B+2-s^*}\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-1-j} + \Sigma_2\mu_{d2}^{s-1}\Sigma_u^{s^*-s}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_2\mu_{m1}\sum_{k=0}^{s-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_2\Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-s-1}\mu_{d2}^{s^*-j-2}\Sigma_u^{j} + \mu_{u1}^{B+2-s^*}\mu_{d1}\sum_{j=0}^{s-1}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) + {} \\
& (\Delta_{md}+\Delta_{um})\Sigma_2\Sigma_1\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B+1-s^*}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Gamma_5(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_5(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We now consider
\[
r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_6(s)}{\Gamma_1},
\]
where $\Gamma_6(s) = \Gamma_6^1(s) + \Gamma_6^2$,
\[
\Gamma_6^1(s) = (\mu_{m2}\Sigma_u + \mu_{m1}\mu_{d2})\sum_{j=0}^{s^*-s-2}\Sigma_u^{j}\mu_{d2}^{s^*-3-j} \times \bigg( \mu_{u1}^{B+2-s^*}(\Delta_{md}+\Delta_{ud}) + \Delta_{md}\Sigma_2\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) \ge 0,
\]
and $\Gamma_6^2 = f(s^*) \ge 0$. Thus, $\Gamma_6(s) \ge 0$ and $-\Gamma_6(s)/\Gamma_1 \le 0$. Moreover, if $s < s^* - 1$, then $\Gamma_6(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly, if $s = s^* - 1$, then $\Gamma_6(s)/\Gamma_1 = 0$ if and only if $f(s^*) = 0$, which implies that $s^*-1, s^* \in S^*$ by the definition of $S^*$, the fact that $f$ is non-increasing (see the proof of Proposition 3.2), and the fact that $s^* - 1 = s \ge 1$. Moreover,
\[
r(s, a_{212}) + \sum_{j \in S} p(j|s, a_{212})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_7(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_7(s) = {} & \Delta_{um}\sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\bigg( \Sigma_u\Big( \sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \mu_{u2}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \Big) + {} \\
& \mu_{d1}\sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \mu_{u2}\mu_{d1}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) + {} \\
& \Delta_{ud}\Sigma_2\bigg( \mu_{d2}^{s-1}\Sigma_u^{s^*-s-1}\sum_{j=0}^{B-s^*+2}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-s-2}\mu_{d2}^{s^*-2-j}\Sigma_u^{j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Gamma_7(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_7(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Finally,
\[
r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) - \Big( r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) \Big) = -\frac{\Gamma_8(s)}{\Gamma_1},
\]
where
\[
\begin{aligned}
\Gamma_8(s) = \Sigma_2\bigg( & \Delta_{um}\Big( \Sigma_u^{s^*-s-1}\mu_{d1}\mu_{d2}^{s-1}\sum_{j=0}^{B-s^*}\Sigma_d^{j+1}\mu_{u1}^{B-s^*-j} + \sum_{k=0}^{s-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \mu_{d1}\sum_{k=0}^{s-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k}\sum_{j=0}^{B-s^*+1}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \Big) + {} \\
& (\Delta_{ud}+\Delta_{md})\Big( \Sigma_u^{s^*-s-1}\mu_{d2}^{s-1}\sum_{j=0}^{B-s^*+1}\Sigma_d^{j+1}\mu_{u1}^{B-s^*+1-j} + \Sigma_d^{B-s^*+2}\sum_{j=0}^{s^*-s-2}\mu_{d2}^{s^*-2-j}\Sigma_u^{j} \Big) + {} \\
& \Delta_{ud}\Sigma_u^{s^*-s}\mu_{d2}^{s-1}\mu_{u1}^{B-s^*+1} \bigg) \;\ge\; 0.
\end{aligned}
\]
Hence, $-\Gamma_8(s)/\Gamma_1 \le 0$. Moreover, $\Gamma_8(s)/\Gamma_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. This shows that $\delta_1(s) = \delta_0(s)$ for all $1 \le s \le s^* - 1$.
We now consider $s^* \le s \le B + 1$, for which we have $\delta_0(s) = a_{221}$. In the interest of space, we will again specify the difference in (14) only for the actions $a_{111}$, $a_{211}$, $a_{112}$, $a_{121}$, $a_{122}$, $a_{212}$, and $a_{222}$. Define
\[
\begin{aligned}
\Upsilon_1 = {} & \Sigma_d^{B+2-s^*}\big( \mu_{d2}\Sigma_2 + (\mu_{d1}+\mu_{m1})\Sigma_2 \big)\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\mu_{d2}^{s^*-2-j} + (\mu_{u2}+\mu_{d1}+\mu_{m1})\sum_{j=s^*-1}^{B+1}\mu_{u1}^{j}\Sigma_d^{B+1-j} + {} \\
& \sum_{j=s^*-1}^{B+2}\mu_{u1}^{j}\Sigma_d^{B+2-j} + \mu_{u2}(\mu_{d1}+\mu_{m1})\sum_{j=s^*-1}^{B}\mu_{u1}^{j}\Sigma_d^{B-j} + \mu_{m1}\Sigma_1\sum_{j=0}^{s^*-2}\Sigma_u^{s^*-2-j}\mu_{u1}^{B-s^*+2+j} + {} \\
& \mu_{m1}\Sigma_1\Sigma_2\sum_{j=0}^{s^*-2}\Sigma_u^{s^*-2-j}\bigg( \Sigma_d^{B-s^*+2}\sum_{k=0}^{j-1}\mu_{u1}^{k}\mu_{d2}^{j-1-k} + \sum_{k=0}^{B-s^*+1}\mu_{u1}^{k+j}\Sigma_d^{B-s^*+1-k} \bigg).
\end{aligned}
\]
Note that $\Upsilon_1$ is always positive. We have
\[
r(s, a_{111}) + \sum_{j \in S} p(j|s, a_{111})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_2(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_2(s) = \Sigma_1\bigg( & \Delta_{md}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\Big( \sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+2-s^*-k} + \mu_{u2}\sum_{k=0}^{B-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \Big) + {} \\
& (\Delta_{md}+\Delta_{ud})(\mu_{d1}+\mu_{m1})^{s^*-1}\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-s^*+1-j} + \Delta_{ud}\mu_{u1}^{B+1-s}\Sigma_d^{s-s^*+1}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) \ge 0.
\end{aligned}
\]
Then $-\Upsilon_2(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_2(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$ (this can be seen by considering the cases $s^* = 1$ and $s^* > 1$ separately), which implies that $S^* \neq \{s^*\}$ (see equation (12) and the definition of $S^*$). Similarly,
\[
r(s, a_{211}) + \sum_{j \in S} p(j|s, a_{211})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_3(s)}{\Upsilon_1},
\]
where $\Upsilon_3(s) = \Upsilon_3^1(s) + \Upsilon_3^2$,
\[
\Upsilon_3^1(s) = (\mu_{m1}\mu_{d2} + \mu_{m1}\mu_{m2} + \mu_{u1}\mu_{m2}) \times \sum_{j=0}^{s-s^*-1}\Sigma_d^{j}\mu_{u1}^{B-s^*-j}\bigg( \mu_{d2}^{s^*-1}\Delta_{ud} + \Delta_{um}\Big( \sum_{k=0}^{s^*-1}\mu_{d2}^{k}\Sigma_u^{s^*-1-k} + \mu_{d1}\sum_{k=0}^{s^*-2}\mu_{d2}^{k}\Sigma_u^{s^*-2-k} \Big) \bigg) \ge 0,
\]
and $\Upsilon_3^2 = -f(s^*+1)$. Note that the definition of $s^*$ implies that $\Upsilon_3^2 \ge 0$. Thus, $\Upsilon_3(s) \ge 0$ and $-\Upsilon_3(s)/\Upsilon_1 \le 0$. Moreover, if $s > s^*$, then $\Upsilon_3(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly, if $s = s^*$, then $\Upsilon_3(s)/\Upsilon_1 = 0$ if and only if $f(s^*+1) = 0$, which implies that $s^*, s^*+1 \in S^*$ by the definition of $S^*$, the fact that $f$ is non-increasing, and the fact that $s^* = s \le B + 1$. We now consider
\[
r(s, a_{112}) + \sum_{j \in S} p(j|s, a_{112})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_4(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_4(s) = {} & (\Delta_{ud}+\Delta_{um})\bigg( \Sigma_u^{s^*-1}\Big( (\mu_{d1}+\mu_{m1})\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B+1-s^*-j} + \sum_{j=0}^{B+2-s^*}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} \Big) + {} \\
& \mu_{u2}\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-2}\Sigma_u^{j}\mu_{d2}^{s^*-1-j} \bigg) + {} \\
& \mu_{u2}\Sigma_1\Delta_{um}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Delta_{md}(\mu_{d1}+\mu_{m1}) \times {} \\
& \bigg( \Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Sigma_d^{s-s^*+1}\mu_{u1}^{B+1-s}\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Upsilon_4(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_4(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. For the action $a_{121}$, we have
\[
r(s, a_{121}) + \sum_{j \in S} p(j|s, a_{121})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_5(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_5(s) = {} & \Delta_{md}\bigg( \Sigma_1\Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + \Sigma_2\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) + {} \\
& \Sigma_1\Delta_{ud}\bigg( \sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-j} + \Sigma_d^{s-s^*}\mu_{u1}^{B+1-s}\sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + {} \\
& \mu_{m1}\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{s-s^*}\Sigma_d^{s-s^*+k}\mu_{u1}^{B+1-s-k} \bigg) \;\ge\; 0.
\end{aligned}
\]
Hence, $-\Upsilon_5(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_5(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Similarly,
\[
r(s, a_{122}) + \sum_{j \in S} p(j|s, a_{122})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_6(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_6(s) = {} & \Sigma_1\Delta_{ud}\bigg( \mu_{d2}^{s^*}\sum_{j=0}^{s-s^*}\Sigma_d^{j}\mu_{u1}^{B-s^*-j} + \mu_{u1}^{B-s+1}\Sigma_d^{s-s^*}\Big( \sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} + \mu_{m1}\sum_{j=0}^{s^*-2}\mu_{u1}^{j}\Sigma_u^{s^*-2-j} \Big) \bigg) + {} \\
& \Sigma_2(\Delta_{md}+\Delta_{ud}+\Delta_{um})\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + {} \\
& \Sigma_1\Sigma_2(\Delta_{md}+\Delta_{um})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \;\ge\; 0.
\end{aligned}
\]
Then $-\Upsilon_6(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_6(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. We now consider
\[
r(s, a_{212}) + \sum_{j \in S} p(j|s, a_{212})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_7(s)}{\Upsilon_1},
\]
where $\Upsilon_7(s) = \Upsilon_7^1(s) + \Upsilon_7^2$,
\[
\begin{aligned}
\Upsilon_7^1(s) = {} & (\Delta_{ud}+\Delta_{um})\bigg( \Sigma_2\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + (\mu_{d2}\mu_{m1} + \mu_{m2}\mu_{u1} + \mu_{m2}\mu_{m1})\mu_{d2}^{s^*-1}\sum_{j=0}^{s-s^*-1}\mu_{u1}^{B-s^*-j}\Sigma_d^{j} \bigg) + {} \\
& \Sigma_1\Delta_{um}\bigg( \Sigma_2\sum_{j=0}^{s^*-2}\mu_{d2}^{j}(\mu_{d1}+\mu_{m1})^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} + {} \\
& (\mu_{d2}\mu_{m1} + \mu_{m2}\mu_{u1} + \mu_{m2}\mu_{m1})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{s-s^*-1}\mu_{u1}^{B-s^*-k}(\mu_{m2}+\mu_{u2})^{k} \bigg) \;\ge\; 0,
\end{aligned}
\]
and $\Upsilon_7^2 = -f(s^*+1) \ge 0$. Thus, $\Upsilon_7(s) \ge 0$ and $-\Upsilon_7(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_7(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. Finally,
\[
r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) - \Big( r(s, a_{221}) + \sum_{j \in S} p(j|s, a_{221})h_0(j) \Big) = -\frac{\Upsilon_8(s)}{\Upsilon_1},
\]
where
\[
\begin{aligned}
\Upsilon_8(s) = \Sigma_2\bigg( & \Delta_{um}\Big( (\Sigma_u^{s^*-1} + \mu_{d2}^{s^*-1})\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} + (\mu_{d1}+\mu_{d2})\sum_{j=0}^{s^*-2}\mu_{d2}^{j}\Sigma_u^{s^*-2-j}\sum_{k=0}^{B+1-s}\mu_{u1}^{k}\Sigma_d^{B+1-s^*-k} \Big) + {} \\
& \Delta_{ud}\mu_{d2}^{s^*-1}\sum_{j=0}^{B+1-s}\mu_{u1}^{j}\Sigma_d^{B+1-s^*-j} \bigg) \;\ge\; 0.
\end{aligned}
\]
Thus, $-\Upsilon_8(s)/\Upsilon_1 \le 0$. Moreover, $\Upsilon_8(s)/\Upsilon_1 = 0$ if and only if $\Delta_{ud} = 0$, which implies that $S^* \neq \{s^*\}$. This shows that $\delta_1(s) = \delta_0(s)$ for all $s^* \le s \le B + 1$.
We finally consider $s = B + 2$, for which we have $\delta_0(B+2) = a_{222}$. Define
\[
\Psi = \Sigma_1\bigg( \Sigma_u^{s^*-1}\sum_{j=0}^{B+2-s^*}\mu_{u1}^{j}\Sigma_d^{B+2-s^*-j} + \Sigma_d^{B+2-s^*}\sum_{j=0}^{s^*-2}\mu_{d2}^{j+1}\Sigma_u^{s^*-2-j} \bigg) > 0.
\]
We then have
\[
r(s, a_{III}) + \sum_{j \in S} p(j|s, a_{III})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\Sigma_2\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{2II}) + \sum_{j \in S} p(j|s, a_{2II})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{(\mu_{m2}+\mu_{u2})\Psi}{\Upsilon_1} \le 0, \quad (18)
\]
\[
r(s, a_{I2I}) + \sum_{j \in S} p(j|s, a_{I2I})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{(\mu_{d2}+\mu_{u2})\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{II2}) + \sum_{j \in S} p(j|s, a_{II2})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\Sigma_d\Psi}{\Upsilon_1} < 0,
\]
\[
r(s, a_{22I}) + \sum_{j \in S} p(j|s, a_{22I})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{u2}\Psi}{\Upsilon_1} \le 0, \quad (19)
\]
\[
r(s, a_{2I2}) + \sum_{j \in S} p(j|s, a_{2I2})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{m2}\Psi}{\Upsilon_1} \le 0, \quad (20)
\]
\[
r(s, a_{I22}) + \sum_{j \in S} p(j|s, a_{I22})h_0(j) - \Big( r(s, a_{222}) + \sum_{j \in S} p(j|s, a_{222})h_0(j) \Big) = -\frac{\mu_{d2}\Psi}{\Upsilon_1} < 0.
\]
Note that the expression in (18) equals zero only when $\mu_{m2} = \mu_{u2} = 0$, in which case $a_{222}$ is identical to $a_{2II}$; the expression in (19) equals zero only when $\mu_{u2} = 0$, in which case $a_{222}$ is identical to $a_{22I}$; and finally, the expression in (20) equals zero only when $\mu_{m2} = 0$, in which case $a_{222}$ is identical to $a_{2I2}$. This shows that $\delta_1(B+2) = \delta_0(B+2)$.
We have shown that $\delta_1(s) = \delta_0(s)$ for all $s \in S$. By Theorem 9.5.1 of Puterman [16], this proves that the policy described in Theorem 3.1 is optimal. In order to prove the uniqueness of the optimal policy, we consider a decision rule $\delta'$ that differs from $\delta_0$ in at least one state $s \in S$. As is done in Lemma 9.2.4 of Puterman [16], define
\[
\begin{aligned}
u &= P_{\delta'} g_0 e - g_0 e = 0, \\
v &= r_{\delta'} + (P_{\delta'} - I)h_0 - g_0 e = r_{\delta'} + P_{\delta'}h_0 - (r_{\delta_0} + P_{\delta_0}h_0),
\end{aligned}
\]
where we have used equation (11). Note that it follows from our derivations above that
\[
v(s) \le 0, \ \forall s \in S, \quad \text{and} \quad S^* = \{s^*\} \ \Rightarrow \ v(s) < 0, \ \forall s \in S \text{ with } \delta'(s) \neq \delta_0(s). \quad (21)
\]
Let $g'$ denote the gain of the stationary policy $(\delta')^\infty$, let $P^*_{\delta'}$ be the limiting matrix under decision rule $\delta'$ (see Section A.4 of Puterman [16]), and define $\Delta g = g' - g_0 e$. Suppose that $P_{\delta'}$ has $n$ recurrent classes, and partition $P_{\delta'}$ such that $P_1, \ldots, P_n$ correspond to transitions within recurrent classes, $Q_1, \ldots, Q_n$ correspond to transitions from transient states to recurrent classes, and $Q_{n+1}$ corresponds to transitions between transient states. Also, partition $g'$, $\Delta g$, $v$, and $P^*_{\delta'}$ in a manner that is consistent with this partition of $P_{\delta'}$. For example, $g'_i$ is a vector of constants with appropriate dimension denoting the gain in recurrent class $i$ for $1 \le i \le n$. Then we know from Lemma 9.2.5 of Puterman [16] that
\[
\Delta g_i = P^*_i v_i, \quad \text{for all } i = 1, \ldots, n. \quad (22)
\]
Since $P_{\delta_0}$ is irreducible, it is clear that if $\delta'$ differs from $\delta_0$ in at least one state $s \in S$, then $\delta'$ must differ from $\delta_0$ in at least one state $s_0 \in S$ that is recurrent under $\delta'$. But then $S^* = \{s^*\}$ and equations (21) and (22) imply that $g'(s_0) - g_0 < 0$, so that the decision rule $\delta'$ cannot be optimal. This proves that $(\delta_0)^\infty$ is the unique optimal policy when $S^* = \{s^*\}$. $\Box$
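The closing step of the argument, namely that one improvement pass leaving the decision rule unchanged certifies average-reward optimality (Theorem 9.5.1 of Puterman [16]), can be checked numerically on a small example. The sketch below uses a random unichain MDP, not the tandem-line chain itself; the sizes, the seed, and the solver details are illustrative assumptions. It runs policy iteration with the same tie-breaking as in the proof and compares the resulting gain against brute-force enumeration of all deterministic rules.

```python
# Sanity check: a fixed point of policy improvement attains the optimal
# average reward. Small random unichain MDP (hypothetical, not the model).
import itertools
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over next states
r = rng.uniform(size=(nS, nA))                  # immediate rewards r(s, a)

def gain_bias(delta):
    """Gain g and bias h of a stationary rule delta (unichain), with h[0] = 0."""
    Pd = P[np.arange(nS), delta]
    rd = r[np.arange(nS), delta]
    # Evaluation equations: (I - Pd) h + g e = rd, plus normalization h[0] = 0.
    A = np.zeros((nS + 1, nS + 1))
    A[:nS, :nS] = np.eye(nS) - Pd
    A[:nS, nS] = 1.0
    A[nS, 0] = 1.0
    b = np.concatenate([rd, [0.0]])
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return x[nS], x[:nS]

def improve(delta):
    """One improvement pass, keeping delta(s) whenever it still attains the max."""
    g, h = gain_bias(delta)
    q = r + P @ h                       # q[s, a] = r(s,a) + sum_j p(j|s,a) h(j)
    best = q.max(axis=1)
    return np.where(q[np.arange(nS), delta] >= best - 1e-9,
                    delta, q.argmax(axis=1))

# Policy iteration until the improvement pass changes nothing.
delta = np.zeros(nS, dtype=int)
while True:
    new = improve(delta)
    if np.array_equal(new, delta):
        break
    delta = new

# Brute force over all 3^4 deterministic rules confirms optimality of the gain.
g_star = gain_bias(delta)[0]
brute = max(gain_bias(np.array(d))[0] for d in itertools.product(range(nA), repeat=nS))
assert abs(g_star - brute) < 1e-8
```

The tandem-line proof is the symbolic analogue of this check: instead of computing the differences $q(s,a) - q(s,\delta_0(s))$ numerically, it shows algebraically that every such difference is non-positive for all parameter values.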
References
[1] Ahn, H.-S., I. Duenyas, and M. E. Lewis. 2002. The Optimal Control of a Two-Stage Tandem Queueing System with Flexible Servers. Preprint.
[2] Ahn, H.-S., I. Duenyas, and R. Zhang. 1999. Optimal Stochastic Scheduling of a Two-Stage Tandem Queue with Parallel Servers. Advances in Applied Probability, 31, 1095–1117.
[3] Ahn, H.-S., I. Duenyas, and R. Q. Zhang. 2002. Optimal Control of a Flexible Server. Preprint.
[4] Andradottir, S., H. Ayhan, and D. G. Down. 2001. Server Assignment Policies for Maximizing
the Steady-State Throughput of Finite Queueing Systems. Management Science, 47, 1421–
1439.
[5] Andradottir, S., H. Ayhan, and D. G. Down. 2002. Dynamic Server Allocation for Queueing
Networks with Flexible Servers. Under review.
[6] Bartholdi, III, J. J., and D. D. Eisenstein. 1996. A Production Line that Balances Itself.
Operations Research, 44, 21–34.
[7] Bartholdi, III, J. J., D. D. Eisenstein, and R. D. Foley. 2001. Performance of Bucket Brigades
when Work is Stochastic. Operations Research, 49, 710–719.
[8] Bell, S. L., and R. J. Williams. 2001. Dynamic Scheduling of a System with Two Parallel Servers
in Heavy Traffic with Complete Resource Pooling: Asymptotic Optimality of a Continuous
Review Threshold Policy. Annals of Applied Probability, 11, 608–649.
[9] Farrar, T. M. 1993. Optimal Use of an Extra Server in a Two Station Tandem Queueing Network. IEEE Transactions on Automatic Control, 38, 1296–1299.
[10] Hajek, B. 1984. Optimal Control of Two Interacting Service Stations. IEEE Transactions on Automatic Control, 29, 491–499.
[11] Harrison, J. M. and M. J. Lopez. 1999. Heavy Traffic Resource Pooling in Parallel-server
Systems. Queueing Systems, 33, 339–368.
[12] Mandelbaum, A., and A. L. Stolyar. 2002. Scheduling Flexible Servers with Convex Delay Costs: Heavy-Traffic Optimality of the Generalized cµ-Rule. Preprint.
[13] McClain, J. O., L. J. Thomas, and C. Sox. 1992. “On-the-fly” Line Balancing with Very Little
WIP. International Journal of Production Economics, 27, 283–289.
[14] Ostolaza, J., J. O. McClain, and L. J. Thomas. 1990. The Use of Dynamic (State-Dependent)
Assembly-Line Balancing to Improve Throughput. J. Mfg. Oper. Mgt., 3, 105–133.
[15] Pandelis, D. G., and D. Teneketzis. 1994. Optimal Multiserver Stochastic Scheduling of Two
Interconnected Priority Queues. Advances in Applied Probability, 26, 258–279.
[16] Puterman, M. L. 1994. Markov Decision Processes. John Wiley & Sons, New York, NY.
[17] Rosberg, Z., P. P. Varaiya, and J. C. Walrand. 1982. Optimal Control of Service in Tandem Queues. IEEE Transactions on Automatic Control, 27, 600–609.
[18] Squillante, M. S., C. H. Xia, D. D. Yao, and L. Zhang. 2000. Threshold Based Priority Policies for Parallel-server Systems with Affinity Scheduling. Extended abstract.
[19] Van Oyen, M. P., E. G. S. Gel, and W. J. Hopp. 2001. Performance Opportunity for Workforce Agility in Collaborative and Noncollaborative Work Systems. IIE Transactions, 33, 761–777.
[20] Williams, R. J. 2000. On Dynamic Scheduling of a Parallel Server System with Complete
Resource Pooling. In Analysis of Communication Networks: Call Centres, Traffic and Perfor-
mance, D. R. McDonald and S. R. E. Turner (eds.), Fields Institute Communications Volume
28, American Mathematical Society, 49–71.
[21] Zavadlav, E., J. O. McClain, and L. J. Thomas. 1996. Self-buffering, Self-balancing, Self-
flushing Production Lines. Management Science, 42, 1151–1164.