

Games and Economic Behavior 43 (2003) 266–295
www.elsevier.com/locate/geb

Decentralized decision-making in a large team with local information

Paat Rusmevichientong^a and Benjamin Van Roy^{b,∗}

^a Stanford University, Stanford, CA 94305, USA
^b Management Science and Engineering, Terman Engineering Center 427, Stanford University, Stanford, CA 94305-4023, USA

Received 9 August 2000

Abstract

We study a problem involving a team of agents each associated with a node in a chain. Each agent makes a decision that influences only his own cost and those of adjacent agents. Prior to making his decision, each agent observes only the cost structure associated with nodes that can be reached by traversing no more than r arcs. Decisions are selected without any coordination, with the common objective of minimizing average cost among agents. We consider such decisions decentralized since agents act based on different information. Cost incurred by an optimal centralized strategy, in which a single decision-maker has access to all information and dictates all decisions, is employed as a performance benchmark. We show that, to maintain a certain level of performance relative to optimal centralized strategies, decentralized deterministic strategies require r to be proportional to the number of agents. This means that the amount of information accessible to any agent should be proportional to the total number of agents. Stochastic strategies, on the other hand, decentralize more gracefully—the amount of information required by each agent is independent of the total number of agents.
© 2003 Elsevier Science (USA). All rights reserved.

∗ Corresponding author.
E-mail addresses: [email protected] (P. Rusmevichientong), [email protected] (B. Van Roy).

0899-8256/03/$ – see front matter © 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S0899-8256(03)00006-X


Nomenclature

$B$: upper bound on cost functions
$d$: number of nodes in the chain
$\delta$: probability that a decision is chosen randomly under δ-stochastic strategies
$E[\,\cdot\,]$: expectation of a random variable
$f$: overall cost function
$f_i$: cost function at node $i$
$f_i^r$: vector of cost functions that are within the neighborhood of radius $r$ around $f_i$
$J_i$: cost-to-go function at node $i$ under the optimal policy
$\tilde J_i$: approximation to $J_i$ under the decentralized stochastic strategy
$\tilde J_{i,l}$: cost-to-go function generated during the computation of $\tilde J_i$
$M_\delta$: set of functions that define δ-stochastic strategies
$\Pr\{\cdot\}$: probability of an event
$\pi_i$: function that defines the stochastic decision at node $i+1$ under the optimal δ-stochastic strategy
$\tilde\pi_i$: function that defines the decision at node $i+1$ under the decentralized stochastic strategy
$\Psi_{d,U,B}$: class of problem instances
$\Phi_{d,U}$: set of deterministic strategies
$\bar\Phi_{d,U}$: set of stochastic strategies
$\bar\Phi^\delta_{d,U}$: set of δ-stochastic strategies
$\Phi_{d,U,r}$: set of decentralized deterministic strategies of radius $r$
$\bar\Phi_{d,U,r}$: set of decentralized stochastic strategies of radius $r$
$\phi$: strategy
$\phi^*$: optimal (deterministic) strategy
$\phi^\delta$: optimal δ-stochastic strategy
$\bar\phi^{\delta,r}$: decentralized stochastic strategy of radius $r$
$\phi^{\delta,r}$: $r$-lookahead strategy
$\phi_i(f)$: decision at node $i$ for the problem instance $f$ under the strategy $\phi$ ($\phi^*_i(f)$, $\phi^\delta_i(f)$, $\phi^{\delta,r}_i(f)$, and $\bar\phi^{\delta,r}_i(f)$ are defined analogously)
$r$: radius of decentralization
$T^i$: dynamic programming operator at node $i$
$T^i_{\pi_i}$: dynamic programming operator at node $i$ under the mapping $\pi_i$
$u_i$: decision at node $i$
$u^*_i$: optimal decision at node $i$
$U$: set of available decisions at each node
$U^d$: set of available decisions for each problem instance
$W_i$: random variable that is uniformly distributed in $[0,1]$
$(\Omega, \mathcal{F}, P)$: probability triple
$\|\cdot\|_s$: span-seminorm
$\|\cdot\|_\infty$: maximum norm
$\stackrel{D}{=}$: equality in distribution

1. Introduction

For several decades, the subject of decentralized decision-making has intrigued economists, operations researchers, and engineers. Today, interest in the topic is amplified by its necessity. Centralized decisions are not an option in the face of magnified complexity and miniaturized time scales posed by modern organizations and engineering systems. The Internet and social networks, with their ever increasing size and interconnectivity, pose two popular examples.

Team theory—as introduced by Marschak (1955) and Marschak and Radner (1972)—presents a general framework for decentralized decision-making that accounts for limited exchange of information. A team is defined as a collection of agents, each of whom makes a decision based on information available to him—which may differ from that available to other agents. Agents share as their goal minimization of a common cost function.

Unfortunately, even for simple problems where optimal centralized decisions are readily derived, computation of optimal decentralized decisions can be intractable, as formally established by the work of Tsitsiklis and Athans (1985) and Papadimitriou and Tsitsiklis (1986). This impediment motivates more restrictive formulations with reduced computational requirements.

Two early papers by Radner offer interesting cases where team decision problems simplify. The first (Radner, 1959) shows that optimal decentralized decisions can be generated via linear programming if cost is a known convex polyhedral function of decisions. This procedure is efficient when the number of linear pieces making up the cost function is reasonably small. A second paper (Radner, 1962) treats the case of a quadratic cost function. Coefficients for linear terms of the cost function together with all observations are drawn from a Gaussian prior. The main result establishes that each agent's optimal decision is a linear function of his observations. Extensions of this result to settings where agents possess differing Gaussian priors have been explored by Başar (1985).

Though the Gaussian model captures a broad array of relevant phenomena, complex organizations and engineering systems of contemporary interest generally fall beyond its scope. Such systems do nevertheless exhibit special structure. However, instead of taking the form of Gaussian distributions and linear relationships, the special structure arises from limited interdependencies among agents. Each agent shares information with and influences costs of "nearby" agents but not agents that are "far away." As suggested by the colloquial terms in the preceding sentence, influence between agents and commonality of information is governed by proximity. In a physical system this may correspond to physical distance, while in a social network proximity may relate to the "degrees of separation" between two individuals.

Regardless of the context, proximity between agents can be represented abstractly in terms of an undirected graph $G$ with vertices $V = \{1, \ldots, d\}$ and edges $E \subseteq V \times V$. The distance $\rho(i, j)$ between two nodes $i, j \in V$ is defined to be the minimal number of arcs needed to form a path connecting $i$ and $j$. Each node $i \in V$ corresponds to an agent and therefore a decision $u_i$. We take this graph to identify two structural characteristics associated with a given decision problem. First, cost decomposes according to
\[ f(u) = \sum_{(i,j)\in E} f_{ij}(u_i, u_j). \]
Second, each $i$th agent observes a cost function $f_{jk}$ if and only if $\rho(i, j) \le r$ and $\rho(i, k) \le r$. Hence, the graph limits information available to agents as well as the scope of their influence.
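To make the decomposition concrete, here is a minimal Python sketch (ours, not part of the paper) that evaluates a cost of this form on a small undirected graph; the edge functions and decision values are made-up examples.

```python
# Sketch (ours): evaluating a cost that decomposes over the edges of an
# undirected graph, f(u) = sum_{(i,j) in E} f_ij(u_i, u_j).

# Hypothetical example data: 4 agents, binary decisions, a 3-edge graph.
edges = {
    (1, 2): lambda a, b: 1.0 if a == b else -1.0,
    (2, 3): lambda a, b: 0.5 * a * b,
    (3, 4): lambda a, b: abs(a - b),
}

def graph_cost(u, edges):
    """u maps node -> decision; edges maps (i, j) -> cost function f_ij."""
    return sum(f_ij(u[i], u[j]) for (i, j), f_ij in edges.items())

u = {1: 0, 2: 1, 3: 1, 4: 0}
print(graph_cost(u, edges))
```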

Graphs are capable of capturing structure inherent in many complex systems, and exploring how this structure may reduce requirements of information and computation poses a promising research direction. Indeed, graphical models have been exploited to this end in nonserial dynamic programming (Bertelé and Brioschi, 1972) and probabilistic inference (see, e.g., (Jensen, 1996; Pearl, 1988)). In this paper, as a starting point to exploring benefits enabled by graphical structure in decentralized decision-making, we study problems for which interdependencies are represented by the simplest form of graph—a chain.

In decentralized decision-making, there is a dividing line between situations in which agents can and in which they cannot coordinate. We associate the term coordination with an ability to select decision strategies in a centralized manner, though decisions generated by the strategies upon observation of problem data may entail only decentralized processing. In other words, the agents are allowed to communicate, even to collude, with other agents when they form a decision strategy; however, the decisions generated from such a strategy may only use partial information. The focus of this paper is on the case where coordination is prohibited. This situation is of interest when one considers the management of large networks. In such contexts, coordination may incur substantial overhead in communication costs. Practical strategies typically do not involve coordination. Instead, each agent makes his decision based on observed problem data, without regard to strategies employed by other agents. We model the absence of coordination by requiring agents to share a common strategy. In other words, if two agents face the same problem data relative to their respective locations in the chain, they make the same decision.

Another dividing line separates deterministic from stochastic strategies. We will consider both and demonstrate significant advantages associated with the latter in the face of decentralization. It is worth commenting on the form of stochastic strategies we will consider. One approach to defining a stochastic strategy involves allowing each agent to generate random numbers that influence only his own decision. We consider a broader class of stochastic strategies in which different agents can make use of common random numbers. In particular, random numbers—along with other problem data—are associated with particular nodes in the chain, and any agent observing data at a particular node can also observe random numbers associated with that node. One may think of these random numbers as information that is available at a node but irrelevant to the decision problem at hand.

In the next section, we formulate the main problem considered in the paper—that of decentralized decision-making in a chain without coordination. In this context, our results capture the following facts:

(1) To maintain a certain level of performance relative to optimal centralized strategies, decentralized deterministic strategies require $r$ to be proportional to the number of agents. In particular, the worst-case performance difference is bounded below by $c(1/4 - r/d)$, where $d$ is the length of the chain, and $c$ is some positive constant.

(2) Stochastic strategies decentralize more gracefully. In particular, for any $p < 1$, optimal decentralized stochastic strategies offer performance differing from that of optimal centralized strategies by no more than $c/r^p$, for some positive scalar $c$, which is independent of $d$.

These results constitute the main contributions of the paper. They provide new insights into the role of information in the decision-making process in a team environment. Our results are different from other works in the literature in our characterization and use of proximity between agents in a team. Most of the prior research assumes that an information structure is specified a priori and focuses primarily on finding an optimal decentralized controller.

The proof of the positive result involves showing that a particular decentralized stochastic strategy, which is constructed in Section 3, delivers the promised performance. The proof is provided in Section 4. The strategy amounts to a computational procedure that is fortuitously efficient, so as a by-product of our analysis, it is established that the performance identified by the theorem is delivered by an efficient algorithm.

In Section 5, we consider the case where coordination is allowed. In this context, it is easy to show that the performance of optimal decentralized deterministic strategies differs from that of an optimal centralized strategy by no more than $c/r$, for some positive scalar $c$, independent of problem size. This performance appears quite positive in light of the negative result concerning decentralized decision-making in the absence of coordination. However, as we will further discuss in Section 5, coordination is sometimes impractical or impossible. In such cases, randomization enables comparable performance in the absence of coordination.

It is worth noting that our results relate to other interesting and quite separate lines of research. One involves the study of games in which a team of decision-makers who are unable to coordinate face a single adversary (von Stengel and Koller, 1997). In our context, the adversary selects the cost structure, while constrained by the topology of a graph. Our formulation brings a new sort of special structure to such problems, and our results offer means for a team to devise effective counter-measures against the adversary.

Another related line of research, involving minimum cost network-flow problems, studies how changes in costs and constraints in one part of a network impact decisions in another (Granot and Veinott Jr., 1985). The main result here is that "influence" diminishes as "distance" increases. This is the same intuitive notion that underlies our use of proximity in graphs as a basis for decentralization.

Finally, our results relate to the work of Billard and Pasquale (1995) on resource sharing in distributed computer systems, where multiple users attempt to access a fixed set of resources. The authors show that reasonable system performance can be achieved with limited communication among users. They also show that there is a threshold for the level of communication below which decision quality deteriorates. Our work can be viewed as an extension of this idea, and it is distinguished by its use of special structure based on proximity to characterize communication constraints.

2. Cost structure and decision strategies

We consider a problem entailing the selection of a decision from a finite set $U^d = U \times \cdots \times U$. An element $u \in U^d$ can be thought of as a vector $u = (u_1, \ldots, u_d)$ with $u_i \in U$ for each $i$. The objective is to minimize a cost function $f : U^d \to \mathbb{R}$ that decomposes according to
\[ f(u) = \frac{1}{d}\sum_{i=1}^{d-1} f_i(u_i, u_{i+1}), \]
for some sequence of functions $f_1, \ldots, f_{d-1}$. An optimal decision $u \in U^d$ is one that attains the minimum of $f$ in $U^d$.

Each instance of the aforementioned optimization problem is characterized by a triplet $P = (d, U, f)$, where $d$ is the number of decision variables, $U$ is a set whose Cartesian product $U^d$ forms the decision space, and $f = (f_1, \ldots, f_{d-1})$ is a sequence of $d-1$ functions, with each $f_i$ mapping $U \times U$ to $\mathbb{R}$. We will denote by $\Psi_{d,U,B}$ a class of problem instances, defined by
\[ \Psi_{d,U,B} = \big\{ (d, U, f) \mid \|f_i\|_\infty \le B\ \ \forall i \big\}. \]
Note that each element $(d, U, f)$ of $\Psi_{d,U,B}$ is distinguished only by $f$, and therefore we will often refer to elements of $\Psi_{d,U,B}$ simply in terms of $f$. For example, we will use the statement $f \in \Psi_{d,U,B}$ as shorthand for $(d, U, f) \in \Psi_{d,U,B}$. Abusing notation somewhat, we will also use $f$ as though it were a function defined by
\[ f(u) = \frac{1}{d}\sum_{i=1}^{d-1} f_i(u_i, u_{i+1}). \]

A strategy is a mapping $\phi : \Psi_{d,U,B} \to U^d$ that, for each problem instance $f \in \Psi_{d,U,B}$, generates a decision $\phi(f) \in U^d$. Let $\Phi_{d,U}$ denote the collection of strategies available for optimization problems over $U^d$. A strategy $\phi^* : \Psi_{d,U,B} \to U^d$ associated with the class of problems $\Psi_{d,U,B}$ is optimal if
\[ f\big(\phi^*(f)\big) = \min_{u \in U^d} f(u), \]
for every $f \in \Psi_{d,U,B}$.
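As an illustration (ours, not the authors'), a problem instance $f \in \Psi_{d,U,B}$ can be stored as a list of $d-1$ cost tables, and for very small $d$ an optimal decision can be found by brute force; this is only meant to make the definitions above concrete.

```python
import itertools

# Sketch: a problem instance is a list of d-1 cost tables f_i : U x U -> R,
# here stored as dictionaries keyed by (u_i, u_{i+1}).

def chain_cost(f, u):
    """f(u) = (1/d) * sum_{i=1}^{d-1} f_i(u_i, u_{i+1})."""
    d = len(u)
    return sum(f[i][(u[i], u[i + 1])] for i in range(d - 1)) / d

def optimal_decision(f, U, d):
    """Brute-force optimal (centralized) decision; feasible only for tiny d."""
    return min(itertools.product(U, repeat=d), key=lambda u: chain_cost(f, u))

# Hypothetical instance with U = {0, 1}, d = 5, ||f_i||_inf <= B = 1.
U, d = (0, 1), 5
f = [{(a, b): (1.0 if a == b else -1.0) for a in U for b in U} for _ in range(d - 1)]
u_star = optimal_decision(f, U, d)
print(u_star, chain_cost(f, u_star))   # alternating decisions, cost -(d-1)/d
```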

2.1. Decentralized strategies

In general, selecting a component $u_i$ of a decision that minimizes the cost function $f$ entails consideration of every component function $f_i$. In other words, for each optimal strategy $\phi^* \in \Phi_{d,U}$, $\phi^*_i(f)$ may depend on $f_1, f_2, \ldots, f_{d-1}$—that is, optimal strategies are centralized. We consider the possibility of decentralizing the decision-making process and the consequences of decentralization on cost.

In a decentralized strategy each decision is determined from cost functions within a neighborhood. Let $f_i^r$ denote the vector of cost functions that are within the neighborhood of radius $r$ around $f_i$, i.e.,
\[ f_i^r = \big( f_{(i-r)\vee 1}, \ldots, f_i, \ldots, f_{(i+r)\wedge(d-1)} \big). \]
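A small sketch (ours) of the neighborhood just defined, using 1-based indices to mirror the $(i-r)\vee 1$ and $(i+r)\wedge(d-1)$ truncation.

```python
# Sketch (ours): the radius-r neighborhood f_i^r of cost functions around f_i.
# f is the list [f_1, ..., f_{d-1}]; indices are kept 1-based to match the paper.

def neighborhood(f, i, r):
    d = len(f) + 1                       # f has d-1 component functions
    lo = max(i - r, 1)                   # (i - r) v 1
    hi = min(i + r, d - 1)               # (i + r) ^ (d - 1)
    return {j: f[j - 1] for j in range(lo, hi + 1)}
```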

We define the space $\Phi_{d,U,r}$ of decentralized strategies of radius $r$ by
\[ \Phi_{d,U,r} = \big\{ \phi \in \Phi_{d,U} \mid \forall f \in \Psi_{d,U,B},\ \phi_i(f) = \phi_i(f_i^r), \text{ and } \phi_i(f) = \phi_j(f) \text{ if } f_i^r = f_j^r \big\}, \]
where the condition $\phi_i(f) = \phi_i(f_i^r)$ means that the decision $\phi_i(f)$ only depends on $f_i^r$. We require that $\phi_i(f) = \phi_j(f)$ whenever $f_i^r = f_j^r$. This condition means that the decision at each node depends only on the cost functions in its neighborhood, not on its relative position in the chain. In a large network where it is impractical to keep track of the network configuration, this requirement is quite reasonable since it is not practical to have a decision strategy that depends on the location of each node in the network. Moreover, this assumption yields a decision strategy that can be implemented efficiently since the decision at each node only depends on the cost functions within its vicinity. We should note that this requirement effectively precludes coordination among nodes—an issue that will be further discussed in Section 5.

Since decentralized strategies base decisions on partial cost information, a new optimality criterion that takes into account uncertainty of the unavailable information is required. We adopt a worst-case objective:
\[ \max_{f \in \Psi_{d,U,B}} f\big(\phi(f)\big). \]
The optimization problem faced is then
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} f\big(\phi(f)\big). \]

For decentralized strategies to be effective in large problems, one would hope that a small loss of optimality is incurred when the "radius of observation" $r$ is sufficiently large and that the required radius is independent of the problem size $d$. Unfortunately, this is not the case. As captured by the following theorem, which is proved in Section 4, the radius required for $\varepsilon$-optimality grows linearly with problem size. In the theorem, we assume that $d \ge 4$ and $4r < d$. In general, we are often interested in the case when $r \ll d$; thus, this assumption does not impose any practical constraint.

Theorem 1. For any $|U| \ge 2$ and $B > 0$ there exists a positive scalar $c$ such that
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \ge c\Big( \frac{1}{4} - \frac{r}{d} \Big), \]
for all $d \ge 4$ and $4r < d$.

It is worth mentioning that our performance measure is quite conservative. It views the problem instance confronted by a decision-maker as being chosen by an adversary that knows exactly the decision strategy to be deployed. A less pessimistic formulation might construct a probability distribution over problem instances and require optimization of an expectation:
\[ \min_{\phi \in \Phi_{d,U,r}} E\big[ f(\phi(f)) \big], \]

where the expectation is taken over a random cost function $f$ drawn from $\Psi_{d,U,B}$. A reasonable conjecture would be that—for certain classes of distributions—the loss of optimality associated with decentralization is attenuated relative to the worst-case formulation. This may be true. However, rather than characterizing classes of distributions for which decentralized decision-making is effective, we take a direction that offers stronger justification for decentralization. In particular, we leave the choice of problem instance in the hands of an adversary with complete knowledge of the decision-maker's strategy, but generalize the formulation to allow use of stochastic strategies in which decisions are influenced by random events predictable neither to the adversary nor the decision-maker. It turns out that—even in the adversarial setting—stochastic strategies decentralize gracefully, as will be discussed in the following section.

2.2. Stochastic strategies

A stochastic decision $U = (U_1, U_2, \ldots, U_d)$ is a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, taking on values in $U^d$. For the purpose of evaluating stochastic decisions, we generalize our objective to one involving an expectation $E[f(U)]$. An optimal stochastic decision minimizes this expectation. A stochastic strategy $\phi$ is a mapping that, for each problem instance $f \in \Psi_{d,U,B}$ and random sample $\omega \in \Omega$, generates a stochastic decision $\phi(f, \omega) = (\phi_1(f, \omega), \ldots, \phi_d(f, \omega))$. We will often suppress notation indicating the dependence on $\omega$ and view each decision $\phi_i(f) = \phi_i(f, \omega)$ as a random variable. We denote the set of stochastic strategies by $\bar\Phi_{d,U}$, and a stochastic strategy $\bar\phi^* \in \bar\Phi_{d,U}$ is optimal if
\[ E\big[ f(\bar\phi^*(f)) \big] = \min_{\phi \in \bar\Phi_{d,U}} E\big[ f(\phi(f)) \big], \]
for all $f \in \Psi_{d,U,B}$.

The space of decentralized stochastic strategies of radius $r$, denoted by $\bar\Phi_{d,U,r}$, is defined by
\[ \bar\Phi_{d,U,r} = \big\{ \phi \in \bar\Phi_{d,U} \mid \forall f \in \Psi_{d,U,B},\ \phi_i(f) = \phi_i(f_i^r),\ \ \phi_j(f) \perp \phi_k(f)\ \forall\, |j - k| > 2r,\ \text{and}\ \phi_i(f) \stackrel{D}{=} \phi_j(f) \text{ if } f_i^r = f_j^r \big\}, \]
where $\phi_j(f) \perp \phi_k(f)$ denotes independence between the random variables $\phi_j(f)$ and $\phi_k(f)$. This condition precludes the possibility that agents further than distance $2r$ apart make use of common information. The condition $\phi_i(f) = \phi_i(f_i^r)$ requires that, for each $\omega$, $\phi_i(f, \omega)$ only depends on $f_i^r$. Finally, the condition $\phi_i(f) \stackrel{D}{=} \phi_j(f)$ means that the random variables $\phi_i(f)$ and $\phi_j(f)$ have the same distribution whenever $f_i^r = f_j^r$. Note that this definition generalizes that employed for decentralized deterministic strategies—the deterministic counterpart can be viewed as a special case in which $|\Omega| = 1$.

We set as our objective minimization of a worst-case criterion:
\[ \max_{f \in \Psi_{d,U,B}} E\big[ f(\phi(f)) \big]. \]

The optimization problem is therefore
\[ \min_{\phi \in \bar\Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} E\big[ f(\phi(f)) \big]. \]

It turns out that stochastic strategies decentralize more gracefully than their deterministic counterparts. The following theorem—which is the main result of this paper—characterizes the rate at which performance converges to that of an optimal centralized strategy:

Theorem 2. For any $p < 1$, $B$, and $U$, there exists a scalar $c$ such that
\[ \min_{\phi \in \bar\Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi(f))] - f(\phi^*(f)) \big) \le \frac{c}{r^p}, \]
for all $d$ and $r$.

One interesting implication of this theorem is that the radius of observation $r$ required for $\varepsilon$-optimality no longer depends on the problem size $d$. We describe in the next section a decentralized stochastic strategy that achieves the level of performance promised by Theorem 2. This strategy is used as the basis for a constructive proof of the theorem, which is provided in Section 4.

3. A decentralized stochastic strategy

Rather than describing at once all characteristics of our proposed decentralized stochastic strategy, let us present and motivate its features piecemeal via a sequence of simpler strategies. In particular, we define in the following subsections several related strategies:

(1) a centralized deterministic strategy;
(2) a centralized stochastic strategy;
(3) and finally, a decentralized stochastic strategy.

3.1. A centralized deterministic strategy

Recall that a decision $u = (u_1, \ldots, u_d)$ is optimal if it attains the minimum in
\[ \min_{u \in U^d} \frac{1}{d}\sum_{i=1}^{d-1} f_i(u_i, u_{i+1}). \]
The additive structure of our cost function enables dynamic programming as an effective approach to computing optimal decisions. In particular, optimal decisions $u_1, \ldots, u_d$ can be generated based on cost-to-go functions $J_0, \ldots, J_d$, each mapping $U$ to $\mathbb{R}$. These functions are computed via a dynamic programming recursion:

\[ J_d(u_d) = 0, \quad \forall u_d \in U, \]
\[ J_i(u_i) = \min_{u_{i+1} \in U}\Big( \frac{1}{d} f_i(u_i, u_{i+1}) + J_{i+1}(u_{i+1}) \Big), \quad \forall u_i \in U,\ i = 0, \ldots, d - 1, \]
where $f_0(u_0, u_1) = 0$. (We have introduced here an inconsequential decision $u_0$ and cost function $f_0$ solely for notational convenience.) Optimal decisions can then be generated sequentially by letting $u_0$ be any element of $U$ and, for each $i = 0, \ldots, d - 1$, letting $u_{i+1}$ be a decision attaining the minimum in
\[ \min_{u_{i+1} \in U}\Big( \frac{1}{d} f_i(u_i, u_{i+1}) + J_{i+1}(u_{i+1}) \Big). \]
For concreteness, let us denote by $\iota$ an arbitrary fixed element of $U$, and let $u_0 = \iota$. Let us define $\pi_i : U \to U$ as a mapping from a decision $u_i$ to one that attains the minimum in the above expression. Then, the equation
\[ u_{i+1} = \pi_i(u_i), \quad \forall i = 0, \ldots, d - 1, \]
together with the initial condition $u_0 = \iota$ defines a sequence of optimal decisions.

We have provided a method—based on dynamic programming—for computing optimal decisions given a problem instance. It is straightforward to translate this into a strategy. In particular, an optimal centralized deterministic strategy $\phi^*$ can be defined as one that generates decisions according to the method we have provided. Note that this strategy is not decentralized. Decisions are generated sequentially, so each $i$th decision $u_i$ is influenced by "upstream" decisions $u_{i-1}, u_{i-2}, \ldots, u_1$. Furthermore, the selection of each $i$th decision $u_i$ relies on the cost-to-go function $J_i$, which is computed based on "downstream" cost functions $f_i, f_{i+1}, \ldots, f_{d-1}$. In the coming subsections, we develop a decentralized stochastic variant of this strategy.

3.2. A centralized stochastic strategy

The deterministic strategy introduced in the previous section generates decisions iteratively according to
\[ u_{i+1} = \pi_i(u_i), \quad \forall i = 0, \ldots, d - 1. \]
One mechanism for introducing randomness into the decision process involves redefining each $\pi_i$ to be a function of two variables: a decision $U$ and a random number in $[0,1]$. In particular, we will consider a stochastic decision $U = (U_1, \ldots, U_d)$ generated by letting $U_0 = \iota$ and
\[ U_{i+1} = \pi_i(U_i, W_i), \quad \forall i = 0, \ldots, d - 1, \tag{1} \]
for some sequence of functions $\pi_0, \ldots, \pi_{d-1}$, where $W_0, \ldots, W_{d-1}$ are independent random variables, each uniformly distributed in $[0,1]$.

We will consider only functions $\pi_i$ from a special class. For each $\delta \in (0,1)$, we define a set $M_\delta$ to consist of functions $\pi : U \times [0,1] \to U$ that take on the form
\[ \pi(U, W) = \begin{cases} \bar\pi(U) & \text{if } 0 \le W < 1 - \delta, \\[2pt] \text{the } k\text{th element of } U & \text{if } 1 - \delta + (k-1)\dfrac{\delta}{|U|} \le W < 1 - \delta + k\dfrac{\delta}{|U|}, \end{cases} \]
for some function $\bar\pi : U \to U$, where the set $U$ is assumed to be ordered in some predetermined way, and $\pi(U, 1)$ is defined to be the $|U|$th element of $U$. Note that if $W$ is uniformly distributed in $[0,1]$, a function $\pi \in M_\delta$ assigns a (dominant) probability $1 - \delta$ to a selection of a distinguished decision $\bar\pi(U)$. The remaining probability $\delta$ is distributed evenly among all decisions. Hence, the decision follows a deterministic strategy with probability $1 - \delta$ and is chosen in a completely random way with probability $\delta$.

A δ-stochastic strategy $\phi$ is a mapping that, for each problem instance $f \in \Psi_{d,U,B}$, generates a decision $\phi(f)$ according to Eq. (1) with each $\pi_i$ being an element of $M_\delta$. The class of δ-stochastic strategies will be denoted by $\bar\Phi^\delta_{d,U}$. A strategy $\phi^\delta \in \bar\Phi^\delta_{d,U}$ is an optimal δ-stochastic strategy if it attains the minimum in
\[ \min_{\phi \in \bar\Phi^\delta_{d,U}} \max_{f \in \Psi_{d,U,B}} E\big[ f(\phi(f)) \big]. \]
We will establish in Section 4.2 that the loss in performance incurred when using an optimal δ-stochastic strategy $\phi^\delta$ in place of an optimal deterministic strategy $\phi^*$ grows at most linearly in $\delta$.

Analogous to the deterministic case, given a problem instance $f \in \Psi_{d,U,B}$, an optimal δ-stochastic decision $\phi^\delta(f)$ can be computed via dynamic programming. In our new context, cost-to-go functions are generated according to
\[ J_d(u) = 0, \quad \forall u \in U, \]
\[ J_i(u) = \min_{\pi \in M_\delta} E\Big[ \frac{1}{d} f_i\big(u, \pi(u, W_i)\big) + J_{i+1}\big(\pi(u, W_i)\big) \Big], \quad \forall u \in U,\ i = 0, \ldots, d - 1, \]
where the expectation is taken over $W_i$. Given the cost-to-go functions, each $\pi_i$ is chosen to attain the minimum in
\[ \min_{\pi \in M_\delta} E\Big[ \frac{1}{d} f_i\big(u, \pi(u, W_i)\big) + J_{i+1}\big(\pi(u, W_i)\big) \Big], \]
simultaneously for all $u$. Then, the sequence of decisions $\phi^\delta_1(f), \ldots, \phi^\delta_d(f)$ can be generated iteratively by letting $\phi^\delta_0(f) = \iota$ and
\[ \phi^\delta_{i+1}(f) = \pi_i\big(\phi^\delta_i(f), W_i\big), \quad \forall i = 0, \ldots, d - 1. \]

Note that—similar to the deterministic context—stochastic strategies generated via dynamic programming are not decentralized. The iterative selection process induces a dependence of future decisions on preceding ones, while use of cost-to-go functions makes the choice of a decision rely on all future cost information.

Let us close this subsection by introducing some notation that will enable streamlined discussion of ideas we have developed. Let $T^i : \mathbb{R}^{|U|} \to \mathbb{R}^{|U|}$ be defined by
\[ \big( T^i H \big)(u) = \min_{\pi \in M_\delta} E\Big[ \frac{1}{d} f_i\big(u, \pi(u, W_i)\big) + H\big(\pi(u, W_i)\big) \Big], \quad \forall H \in \mathbb{R}^{|U|},\ u \in U. \]
Also, for any $\pi \in M_\delta$, let $T^i_\pi : \mathbb{R}^{|U|} \to \mathbb{R}^{|U|}$ be defined by
\[ \big( T^i_\pi H \big)(u) = E\Big[ \frac{1}{d} f_i\big(u, \pi(u, W_i)\big) + H\big(\pi(u, W_i)\big) \Big], \quad \forall H \in \mathbb{R}^{|U|},\ u \in U. \]

Given this notation, we see that the cost-to-go functions are related via
\[ J_i = T^i J_{i+1}, \quad \forall i = 0, \ldots, d - 1. \]
Once cost-to-go functions $J_0, \ldots, J_d$ are computed, decisions are generated iteratively according to
\[ \phi^\delta_{i+1}(f) = \pi_i\big(\phi^\delta_i(f), W_i\big), \quad \forall i = 0, \ldots, d - 1, \]
where $\phi^\delta_0(f) = \iota$, and each $\pi_i \in M_\delta$ is chosen to satisfy
\[ T^i_{\pi_i} J_{i+1} = T^i J_{i+1}. \]
The expected cost, $E[f(\phi^\delta(f))]$, is given by
\[ E\big[ f(\phi^\delta(f)) \big] = T^0_{\pi_0} J_1 = J_0. \]
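A sketch (ours) of the ingredients of this subsection: a member of $M_\delta$ realized as a function of $(u, W)$, and the operators $T^i_\pi$ and $T^i$ evaluated in closed form by conditioning on whether the uniform draw $W$ falls in the deterministic part or in the uniformly random part of $[0,1]$. Names and conventions are ours.

```python
import random

# Sketch (ours) of Section 3.2. U is an ordered tuple of decisions, delta in (0, 1).

def make_pi(pibar, U, delta):
    """A member of M_delta: follow pibar with prob. 1 - delta, otherwise pick the
    k-th element of U when W lands in the k-th sliver of width delta/|U|."""
    def pi(u, W):
        if W < 1.0 - delta:
            return pibar(u)
        k = min(int((W - (1.0 - delta)) / (delta / len(U))), len(U) - 1)
        return U[k]
    return pi

def T_pi(fi, H, pibar, U, delta, d):
    """(T^i_pi H)(u) = E[ f_i(u, pi(u, W))/d + H(pi(u, W)) ], with the expectation
    over W ~ Uniform[0,1] written out explicitly."""
    def value(u, v):
        return fi(u, v) / d + H[v]
    return {u: (1 - delta) * value(u, pibar(u))
               + (delta / len(U)) * sum(value(u, v) for v in U)
            for u in U}

def T(fi, H, U, delta, d):
    """(T^i H)(u): the minimum over pi in M_delta is attained by choosing the best
    pibar(u) pointwise; the uniformly random part does not depend on pibar."""
    def value(u, v):
        return fi(u, v) / d + H[v]
    return {u: (1 - delta) * min(value(u, v) for v in U)
               + (delta / len(U)) * sum(value(u, v) for v in U)
            for u in U}

# Sampling one transition U_{i+1} = pi_i(U_i, W_i):
U_set = (0, 1)
pi0 = make_pi(lambda u: 1 - u, U_set, delta=0.1)
print(pi0(0, random.random()))
```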

3.3. A decentralized stochastic strategy

In this section, we will formulate a decentralized version of $\phi^\delta$, which will be denoted by $\bar\phi^{\delta,r}$. To motivate the definition of $\bar\phi^{\delta,r}$, let us briefly review why $\phi^\delta$ is not decentralized:

(1) Use of cost-to-go functions makes the choice of a decision rely on all future cost information. In other words, the computation of $J_i$ depends on $f_i, f_{i+1}, \ldots, f_{d-1}$. As a result, the mapping $\pi_{i-1}$ that defines the decision $\phi^\delta_i(f)$ depends on $f_i, f_{i+1}, \ldots, f_{d-1}$.

(2) The iterative selection process induces a dependence of future decisions on preceding ones, that is, $\phi^\delta_i(f)$ depends on $\phi^\delta_{i-1}(f), \ldots, \phi^\delta_1(f)$.

Our decentralized stochastic strategy $\bar\phi^{\delta,r}$ alleviates these dependencies.

First, in order to limit the dependence on future cost information to $r$ stages, we consider an approximation to $J_i$, denoted by $\tilde J_i$, which is generated according to $\tilde J_i = \tilde J_{i,0}$, where
\[ \tilde J_{i,r} = 0, \qquad \tilde J_{i,l} = T^{i+l} \tilde J_{i,l+1}, \quad \forall l = 0, \ldots, r - 1. \]
Note that computation of $\tilde J_i$ relies only on $f_i, f_{i+1}, \ldots, f_{i+r-1}$. The hope is that—if $r$ is sufficiently large—$\tilde J_i$ provides a good approximation to $J_i$. Given such approximations, we choose a sequence $\tilde\pi_0, \ldots, \tilde\pi_{d-1} \in M_\delta$ to satisfy
\[ T^i_{\tilde\pi_i} \tilde J_{i+1} = T^i \tilde J_{i+1}, \quad \forall i = 0, 1, \ldots, d - 1. \]
Note that each $\tilde\pi_i$ is determined by the cost functions $f_i, f_{i+1}, \ldots, f_{i+r}$.

One might imagine generating decisions according to
\[ \bar\phi^{\delta,r}_{i+1}(f) = \tilde\pi_i\big(\bar\phi^{\delta,r}_i(f), W_i\big), \quad \forall i = 0, \ldots, d - 1, \]
with $\bar\phi^{\delta,r}_0(f) = \iota$. However, each such decision would then depend on decisions at all preceding stages. To limit this dependence to $r$ stages we let each decision $\bar\phi^{\delta,r}_i(f)$ be generated according to $\bar\phi^{\delta,r}_i(f) = \bar\phi^{\delta,r}_{i,0}(f)$, where
\[ \bar\phi^{\delta,r}_{i,l+1}(f) = \tilde\pi_{i+l}\big(\bar\phi^{\delta,r}_{i,l}(f), W_{i+l}\big), \quad \forall l = -r, \ldots, -1, \]
with $\bar\phi^{\delta,r}_{i,-r}(f) = \iota$.

Indeed, $\bar\phi^{\delta,r}$ is a decentralized strategy (with radius $r$). To see this, consider the process of generating each $\bar\phi^{\delta,r}_i(f)$. It is clear that the distribution of $\bar\phi^{\delta,r}_i(f)$ is completely determined from the mappings $\tilde\pi_{i-r}, \tilde\pi_{i-r+1}, \ldots, \tilde\pi_{i-1}$, which depend only on cost functions $f_{i-r}, \ldots, f_i, \ldots, f_{i+r}$. In addition, for any $i, j$ with $i > j + r$, the decision $\bar\phi^{\delta,r}_i(f)$ makes use of the random numbers $W_{i-r}, \ldots, W_{i-1}$, while the decision $\bar\phi^{\delta,r}_j(f)$ uses $W_{j-r}, \ldots, W_{j-1}$. Since the $W_i$'s are independent, $\bar\phi^{\delta,r}_i(f)$ is independent of $\bar\phi^{\delta,r}_j(f)$. The situation when $j > i + r$ is analogous.

Before we proceed with the performance analysis, let us briefly discuss the choice of decentralized stochastic strategies proposed in this paper. One approach to defining a stochastic strategy involves allowing each agent to generate random numbers that influence only his own decision. This approach would imply independence of decisions made by each agent. However, the decisions generated by each agent under an optimal strategy are dependent. To allow for this dependency, our formulation associates random numbers with nodes in the chain. When an agent observes cost-relevant data at a node, he also observes its random number. This random number might be thought of as cost-irrelevant data that can be observed by agents within the vicinity. The use of common random numbers induces dependencies among decisions made by different agents. Nevertheless, each agent's decision depends only on information within its vicinity. We will establish in the next section that, given appropriate choices of $\delta$ and $r$, the strategy $\bar\phi^{\delta,r}$ satisfies the properties promised by Theorem 2.
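Continuing the previous sketch (it assumes the helpers T and make_pi defined there), the following code is our rendering of the decentralized strategy: node $i$ builds the truncated approximation $\tilde J_{j+1}$ for each of the $r$ preceding stages, extracts the greedy mapping $\tilde\pi_j$, and runs the randomized recursion forward using only the random numbers $W_{i-r}, \ldots, W_{i-1}$. Boundary handling (nodes within $r$ of either end) is our simplification.

```python
# Sketch (ours) of the decentralized strategy of Section 3.3; reuses T and make_pi
# from the previous snippet. f is the list of cost tables f_1, ..., f_{d-1}
# (f[idx-1] ~ f_idx), W is the list of node random numbers W_0, ..., W_{d-1}.

def decentralized_decision(i, f, U, delta, r, W, iota):
    """Decision of node i, using only f_{i-r}, ..., f_{i+r} and W_{i-r}, ..., W_{i-1}."""
    d = len(f) + 1

    def fi(idx, u, v):
        # f_0 = 0 by convention; beyond the chain we also use 0 (our boundary choice).
        return 0.0 if idx < 1 or idx > d - 1 else f[idx - 1][(u, v)]

    def pibar_tilde(j):
        # J~_{j+1}: truncated recursion J~_{j+1,r} = 0, J~_{j+1,l} = T^{j+1+l} J~_{j+1,l+1}.
        H = {u: 0.0 for u in U}
        for l in range(r - 1, -1, -1):
            H = T(lambda u, v, m=j + 1 + l: fi(m, u, v), H, U, delta, d)
        # Deterministic part of pi~_j: greedy with respect to J~_{j+1}.
        return {u: min(U, key=lambda v: fi(j, u, v) / d + H[v]) for u in U}

    u = iota
    for j in range(max(i - r, 0), i):          # stages j = i-r, ..., i-1
        pb = pibar_tilde(j)
        u = make_pi(lambda x, pb=pb: pb[x], U, delta)(u, W[j])
    return u
```

For instance, with the cost tables and decision set from the earlier snippets, d node-level random numbers W, and moderate values of delta and r, calling this function once per node produces the joint decision of the team.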

4. Performance analysis

In this section, we analyze performance of decentralized strategies. We start in Section 4.1 by proving Theorem 1—our negative result concerning decentralized deterministic strategies. The remaining sections pursue an analysis of the stochastic decentralized strategy proposed in the previous section, showing that it does indeed exhibit properties promised by Theorem 2. This constitutes a constructive proof of the theorem. The analysis is presented in stages. In Section 4.2, we start by arguing that the performance of an optimal δ-stochastic strategy $\phi^\delta$ is close to that of an optimal centralized strategy $\phi^*$. Then, in Section 4.3, the decentralized stochastic strategy $\bar\phi^{\delta,r}$ is shown to perform almost as well as $\phi^\delta$, provided that $r$ is sufficiently large. The proof of Theorem 2 is almost a direct consequence, as discussed in Section 4.4.

4.1. Performance of decentralized deterministic strategies (Proof of Theorem 1)

Recall that Theorem 1 identifies performance limitations of decentralized deterministic strategies. Formally, the theorem states that for any $|U| \ge 2$ and $B > 0$, there exists a positive scalar $c$ such that
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \ge c\Big( \frac{1}{4} - \frac{r}{d} \Big), \]
for all $d \ge 4$ and $4r < d$.

To prove the theorem, it suffices to consider the case of $U = \{0, 1\}$. Let $f^* : U \times U \to [-B, B]$ be defined by
\[ f^*(u, v) = \begin{cases} B & \text{if } u = v, \\ -B & \text{if } u \ne v. \end{cases} \]
Consider a problem instance $f = (f_1, \ldots, f_{d-1}) \in \Psi_{d,U,B}$ such that $f_i = f^*$ for all $i$. For this problem, the optimal decision is either $(0, 1, 0, \ldots)$ or $(1, 0, 1, \ldots)$, with an associated cost of $-B(d-1)/d$.

Let us now consider decisions generated by an arbitrary strategy $\phi \in \Phi_{d,U,r}$. Since $f_i = f^*$ for all $i$, it follows from the definition of $\Phi_{d,U,r}$ that
\[ \phi_i(f) = \phi_j(f), \quad r + 1 \le i, j \le d - 1 - r, \]
which implies that
\[ f\big(\phi(f)\big) \ge \frac{(d - 2r - 2)B}{d} - \frac{B(2r + 1)}{d}, \]
where $(d - 2r - 2)B/d$ corresponds to the cost associated with decisions indexed $r + 1, \ldots, d - 1 - r$, while $-B(2r + 1)/d$ represents a lower bound on costs associated with decisions indexed $1, \ldots, r, d - r, \ldots, d$. Therefore,
\[ f\big(\phi(f)\big) - f\big(\phi^*(f)\big) \ge \Big( \frac{d - 2r - 2}{d} - \frac{2r + 1}{d} + \frac{d - 1}{d} \Big) B = 4B\Big( \frac{1}{2} - \frac{1}{d} - \frac{r}{d} \Big) \ge 4B\Big( \frac{1}{4} - \frac{r}{d} \Big), \]
where the last inequality follows from the fact that $d \ge 4$. Hence,
\[ \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \ge 4B\Big( \frac{1}{4} - \frac{r}{d} \Big). \]
Since $\phi \in \Phi_{d,U,r}$ is arbitrary,
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \ge 4B\Big( \frac{1}{4} - \frac{r}{d} \Big). \]
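To make the construction concrete, this small sketch (ours) evaluates the adversarial instance $f_i = f^*$: the alternating decision attains $-B(d-1)/d$, while any decision that is constant across nodes, which is what the proof forces on the interior of the chain, costs $+B(d-1)/d$.

```python
# Sketch (ours): the instance used in the proof of Theorem 1, with U = {0, 1}.
B, d = 1.0, 20
fstar = lambda u, v: B if u == v else -B

def cost(u):
    return sum(fstar(u[i], u[i + 1]) for i in range(d - 1)) / d

alternating = [i % 2 for i in range(d)]
constant = [0] * d
print(cost(alternating))   # -B (d-1)/d, the optimal value
print(cost(constant))      # +B (d-1)/d, what identical decisions give
```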

Let us highlight the intuitive reason for poor performance of decentralized deterministic strategies and briefly discuss possible remedies. This discussion will hopefully motivate the choice of decentralized stochastic strategy considered in this paper. The primary reason for poor performance is the requirement that each agent's decision depends only on cost functions within its vicinity, i.e., $\phi_i(f) = \phi_j(f)$ whenever $f_i^r = f_j^r$. Given the particular cost function used in the proof, this condition implies that the decision at each node, except for those near the boundary, must be the same. This leads to poor performance in the face of our cost function, which is optimized when adjacent agents make different decisions. This situation underscores the pitfalls associated with deterministic strategies in the absence of coordination. One solution to this shortcoming is to allow coordination among agents. However, when coordination is impractical or impossible, an alternative solution is called for.

The decentralized stochastic strategy presented in Section 3.3 offers such an alternative. By appropriate randomization, this strategy avoids the situation where every agent makes the same decision. In some sense, the fact that nearby agents use common random numbers enables a form of coordination that reduces the probability that adjacent agents select the same decision. This fact will be employed in the upcoming performance analysis, which establishes the desirable performance bound offered by Theorem 2.

4.2. Performance of δ-stochastic strategies

As captured formally by the following lemma, the performance of optimal δ-stochastic strategies degrades at most linearly as $\delta$ grows.

Lemma 1.
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^\delta(f))] - f(\phi^*(f)) \big) \le 4\delta B. \]

Proof. Fix $f \in \Psi_{d,U,B}$, and let $\phi^*$ be an optimal strategy. Consider a δ-stochastic strategy that generates decisions using a sequence $\pi_0, \ldots, \pi_{d-1} \in M_\delta$ defined by
\[ \pi_i(U, W) = \begin{cases} \phi^*_{i+1}(f) & \text{if } 0 \le W < 1 - \delta, \\[2pt] \text{the } k\text{th element of } U & \text{if } 1 - \delta + (k-1)\dfrac{\delta}{|U|} \le W < 1 - \delta + k\dfrac{\delta}{|U|}, \end{cases} \]
where $\pi_i(U, 1)$ is defined to be the $|U|$th element of $U$. Hence, the decisions are random variables generated according to
\[ U_{i+1} = \pi_i(U_i, W_i), \quad \forall i = 0, \ldots, d - 1, \]
with $U_0 = \iota$. Since each $\pi_i(U_i, W_i)$ depends only on $W_i$, the decisions $U_1, \ldots, U_d$ are independent random variables. We therefore have $(U_i, U_{i+1}) = (\phi^*_i(f), \phi^*_{i+1}(f))$ with probability at least $(1 - \delta)^2$, which implies that
\[ E\big| f_i\big(\phi^*_i(f), \phi^*_{i+1}(f)\big) - f_i(U_i, U_{i+1}) \big| \le 2B\big( 1 - (1 - \delta)^2 \big) \le 4\delta B, \]
where the inequalities hold because $\|f_i\|_\infty \le B$ and $1 - (1 - \delta)^2 \le 2\delta$. Since $f$ is arbitrary and $\phi^\delta$ attains the optimum within the class of δ-stochastic strategies, it follows that
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^\delta(f))] - f(\phi^*(f)) \big) \le 4\delta B. \]

4.3. Performance losses from decentralizing δ-stochastic strategies

In this section, we will show that as $r$ grows, the performance of $\bar\phi^{\delta,r}$ becomes close to that of $\phi^\delta$ in a sense stated formally in the following lemma.

Lemma 2.
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\bar\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \Big( \frac{2B}{\delta} + 4B|U|^2 \Big)\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]

Instead of comparing directly the performance of $\bar\phi^{\delta,r}$ with that of $\phi^\delta$, we will first introduce an $r$-lookahead strategy, denoted by $\phi^{\delta,r}$. This is a variant of the centralized stochastic strategy $\phi^\delta$ that can be viewed as an intermediate step in the transition from the centralized strategy $\phi^\delta$ to the decentralized strategy $\bar\phi^{\delta,r}$. The strategy $\phi^{\delta,r}$ shares common features with $\bar\phi^{\delta,r}$, though it makes use of centralized information as does $\phi^\delta$. After studying $r$-lookahead strategies in Section 4.3.1, we relate in Section 4.3.2 such strategies to the decentralized strategy $\bar\phi^{\delta,r}$ in order to prove Lemma 2.

4.3.1. r-lookahead strategies and their performance

Recall that the decentralized strategy $\bar\phi^{\delta,r}$ entails computation of a sequence $\tilde J_0, \ldots, \tilde J_d$ of approximate cost-to-go functions. In particular, for each $i$, $\tilde J_i$ is obtained from the following recursion:
\[ \tilde J_{i,r} = 0, \qquad \tilde J_{i,l} = T^{i+l} \tilde J_{i,l+1}, \quad \forall l = 0, \ldots, r - 1, \]
where we set $\tilde J_i = \tilde J_{i,0}$. Mappings $\tilde\pi_0, \ldots, \tilde\pi_{d-1}$ are then chosen to satisfy
\[ T^i_{\tilde\pi_i} \tilde J_{i+1} = T^i \tilde J_{i+1}, \quad \forall i = 0, 1, \ldots, d - 1. \]
A sequence of decisions $\bar\phi^{\delta,r}_{i,-r}(f), \ldots, \bar\phi^{\delta,r}_{i,0}(f)$ is then computed according to
\[ \bar\phi^{\delta,r}_{i,l+1}(f) = \tilde\pi_{i+l}\big(\bar\phi^{\delta,r}_{i,l}(f), W_{i+l}\big), \quad \forall l = -r, \ldots, -1, \]
initialized with $\bar\phi^{\delta,r}_{i,-r}(f) = \iota$, and the decision $\bar\phi^{\delta,r}_i(f)$ is set to be equal to $\bar\phi^{\delta,r}_{i,0}(f)$.

An $r$-lookahead strategy entails computation of the same approximate cost-to-go functions $\tilde J_0, \ldots, \tilde J_d$ and mappings $\tilde\pi_0, \ldots, \tilde\pi_{d-1}$. However, the $r$-lookahead strategy $\phi^{\delta,r}$ differs from the decentralized strategy in the way that these mappings are used to generate decisions. In particular, $\phi^{\delta,r}$ generates decisions according to
\[ \phi^{\delta,r}_{i+1}(f) = \tilde\pi_i\big(\phi^{\delta,r}_i(f), W_i\big), \quad i = 0, 1, \ldots, d - 1, \]
initialized with $\phi^{\delta,r}_0(f) = \iota$. Note that $\phi^{\delta,r}$ is not decentralized since each decision depends on all of its predecessors.

When $r$ becomes large, the performance of $\phi^{\delta,r}$ becomes close to that of the centralized stochastic strategy $\phi^\delta$, as captured in the following lemma.

Lemma 3.
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \frac{2B}{\delta}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]

We will prove this lemma in two stages. First, we will show that the approximate cost-to-go function $\tilde J_i$ is close to the true cost-to-go function $J_i$, in some suitably defined sense. We then use this fact to prove that the two strategies deliver similar performance.

As a metric for the proximity between $J_i$ and $\tilde J_i$ we will use the span-seminorm $\|\cdot\|_s$. This seminorm is defined by
\[ \|H\|_s = \max_{u \in U} H(u) - \min_{u \in U} H(u), \]
for all $H \in \mathbb{R}^{|U|}$. It is not hard to verify that for any scalar $\alpha \in \mathbb{R}$,
\[ \|\alpha H\|_s = |\alpha|\,\|H\|_s, \]
and for $H_1, H_2 \in \mathbb{R}^{|U|}$,
\[ \|H_1 + H_2\|_s \le \|H_1\|_s + \|H_2\|_s. \]
Moreover, if $\|H\|_s = 0$, then $H = Ce$ for some scalar $C \in \mathbb{R}$. We have the following result, whose proof is given in Appendix A.
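A tiny sketch (ours) of the span-seminorm and the two properties just listed, with a function on $U$ stored as a dictionary.

```python
# Sketch (ours): the span-seminorm ||H||_s = max_u H(u) - min_u H(u) on R^{|U|}.
def span(H):
    vals = list(H.values())
    return max(vals) - min(vals)

H1 = {0: 0.3, 1: -0.2}
H2 = {0: -0.1, 1: 0.4}
print(span({u: 2.0 * H1[u] for u in H1}) == 2.0 * span(H1))                 # scaling
print(span({u: H1[u] + H2[u] for u in H1}) <= span(H1) + span(H2))          # triangle
```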

Lemma 4. For each $i$,
\[ \big\| J_i - \tilde J_i \big\|_s \le \frac{2B}{\delta d}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]

The key ideas underlying the proof of Lemma 4 are a contraction property associated with $T^i$ and a bound on the cost-to-go function $J_i$. As shown in Appendix A, the contraction property implies that
\[ \big\| J_i - \tilde J_i \big\|_s \le \|J_{i+r}\|_s \Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
The maximum cost at each time step is of order $B/d$ and $J_{i+r}$ denotes the expected cost-to-go starting from time $i + r$; thus, we might expect that $\|J_{i+r}\|_s$ is of order $(d - i - r) \times 2B/d$, where $(d - i - r)$ denotes the remaining time after $i + r$. However, the randomization in our stochastic strategies induces mixing, which attenuates future costs. This allows us to obtain a sharper bound on $\|J_{i+r}\|_s$ that is independent of $i$:
\[ \|J_{i+r}\|_s \le \frac{2B}{d}\big( 1 + (1 - \delta) + \cdots + (1 - \delta)^{d-i-r} \big) \le \frac{2B}{\delta d}. \]
The lemma follows from this bound.

The proximity between $J_i$ and $\tilde J_i$ suggests that the mappings $\pi_i$ and $\tilde\pi_i$ that are generated from $J_i$ and $\tilde J_i$, respectively, should exhibit some form of similarity. This notion is captured by the following lemma, whose proof is provided in Appendix B.

Lemma 5. For $H_1, H_2 \in \mathbb{R}^{|U|}$, let $\theta_1, \theta_2 \in M_\delta$ satisfy
\[ T^i_{\theta_1} H_1 = T^i H_1 \quad \text{and} \quad T^i_{\theta_2} H_2 = T^i H_2. \]
Then,
\[ \big\| T^i_{\theta_1} H_1 - T^i_{\theta_2} H_1 \big\|_\infty \le \|H_1 - H_2\|_s. \]

When applied in a context where $H_1$ is a true cost-to-go function and $H_2$ is an approximation, this result shows that the performance of the greedy policy $\theta_2$ that is derived from $H_2$ approximates that of an optimal policy, provided that $H_2$ is close to $H_1$ (in a span-seminorm sense). Let us now prove Lemma 3, which almost directly follows from Lemmas 4 and 5.

Proof of Lemma 3. Fix $f \in \Psi_{d,U,B}$. Let $\phi^\delta(f) = (\phi^\delta_1(f), \ldots, \phi^\delta_d(f))$ and $\phi^{\delta,r}(f) = (\phi^{\delta,r}_1(f), \ldots, \phi^{\delta,r}_d(f))$ denote the decisions under the strategies $\phi^\delta$ and $\phi^{\delta,r}$, respectively. Let $J_i$ and $\hat J_i$ denote the cost-to-go functions under $\phi^\delta(f)$ and $\phi^{\delta,r}(f)$, respectively. These cost-to-go functions are related via the following dynamic programming recursions:
\[ J_i = T^i_{\pi_i} J_{i+1} \quad \text{and} \quad \hat J_i = T^i_{\tilde\pi_i} \hat J_{i+1}, \]
where we initialize $J_d = \hat J_d = 0$. Therefore,
\[ \big\| J_i - \hat J_i \big\|_\infty = \big\| T^i_{\pi_i} J_{i+1} - T^i_{\tilde\pi_i} \hat J_{i+1} \big\|_\infty \le \big\| T^i_{\pi_i} J_{i+1} - T^i_{\tilde\pi_i} J_{i+1} \big\|_\infty + \big\| T^i_{\tilde\pi_i} J_{i+1} - T^i_{\tilde\pi_i} \hat J_{i+1} \big\|_\infty \le \big\| T^i_{\pi_i} J_{i+1} - T^i_{\tilde\pi_i} J_{i+1} \big\|_\infty + \big\| J_{i+1} - \hat J_{i+1} \big\|_\infty, \]
where the last inequality follows from a standard result in dynamic programming that the operator $T^i_{\tilde\pi_i}$ is non-expansive under the maximum norm (see, e.g., (Bertsekas, 1995)). Recall that the mappings $\pi_i$ and $\tilde\pi_i$ are chosen to satisfy
\[ T^i_{\pi_i} J_{i+1} = T^i J_{i+1} \quad \text{and} \quad T^i_{\tilde\pi_i} \tilde J_{i+1} = T^i \tilde J_{i+1}, \]
where $\tilde J_{i+1}$ is the approximate cost-to-go function computed under the $r$-lookahead strategy. It follows from Lemmas 4 and 5 that
\[ \big\| T^i_{\pi_i} J_{i+1} - T^i_{\tilde\pi_i} J_{i+1} \big\|_\infty \le \big\| J_{i+1} - \tilde J_{i+1} \big\|_s \le \frac{2B}{\delta d}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
Therefore,
\[ \big\| J_i - \hat J_i \big\|_\infty \le \frac{2B}{\delta d}\Big( 1 - \frac{\delta}{|U|} \Big)^r + \big\| J_{i+1} - \hat J_{i+1} \big\|_\infty. \]
Using the fact that $J_d = \hat J_d = 0$, the above recursion implies that
\[ \big\| J_0 - \hat J_0 \big\|_\infty \le \frac{2B}{\delta}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
Note that
\[ E\big[ f(\phi^\delta(f)) \big] = T^0_{\pi_0} J_1 = J_0 \quad \text{and} \quad E\big[ f(\phi^{\delta,r}(f)) \big] = T^0_{\tilde\pi_0} \hat J_1 = \hat J_0; \]
therefore,
\[ E\big[ f(\phi^{\delta,r}(f)) \big] - E\big[ f(\phi^\delta(f)) \big] \le \frac{2B}{\delta}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
Since $f \in \Psi_{d,U,B}$ is arbitrary,
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \frac{2B}{\delta}\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]

4.3.2. Performance of the decentralized strategy (Proof of Lemma 2)

In this section, we provide a proof of Lemma 2, which relies heavily on Lemma 3 and similarities between $\bar\phi^{\delta,r}$ and $\phi^{\delta,r}$. The proof also makes use of the following lemma, which relates distributions of decisions under the two strategies. The proof of this lemma is given in Appendix C, and it exploits mixing of the stochastic decision processes associated with $\bar\phi^{\delta,r}$ and $\phi^{\delta,r}$.

Lemma 6. For $f \in \Psi_{d,U,B}$ and $x, y \in U$,
\[ \Big| \Pr\big\{ \big(\bar\phi^{\delta,r}_i(f), \bar\phi^{\delta,r}_{i+1}(f)\big) = (x, y) \big\} - \Pr\big\{ \big(\phi^{\delta,r}_i(f), \phi^{\delta,r}_{i+1}(f)\big) = (x, y) \big\} \Big| \le 4(1 - \delta)^r, \]
for all $i$.

The above lemma tells us that the decision processes generated by the two strategies exhibit almost the same distribution. This suggests that expected costs should also be close. The proof of Lemma 2, provided below, combines this insight with Lemma 3.

Proof. Fix $f \in \Psi_{d,U,B}$. For any $i$, we have
\[ E\big[ f_i\big(\bar\phi^{\delta,r}_i(f), \bar\phi^{\delta,r}_{i+1}(f)\big) \big] = \sum_{(x,y)\in U\times U} f_i(x, y)\, \Pr\big\{ \big(\bar\phi^{\delta,r}_i(f), \bar\phi^{\delta,r}_{i+1}(f)\big) = (x, y) \big\}, \]
and
\[ E\big[ f_i\big(\phi^{\delta,r}_i(f), \phi^{\delta,r}_{i+1}(f)\big) \big] = \sum_{(x,y)\in U\times U} f_i(x, y)\, \Pr\big\{ \big(\phi^{\delta,r}_i(f), \phi^{\delta,r}_{i+1}(f)\big) = (x, y) \big\}. \]
Thus, it follows from Lemma 6 that
\[ \Big| E\big[ f_i\big(\bar\phi^{\delta,r}_i(f), \bar\phi^{\delta,r}_{i+1}(f)\big) \big] - E\big[ f_i\big(\phi^{\delta,r}_i(f), \phi^{\delta,r}_{i+1}(f)\big) \big] \Big| \le 4(1 - \delta)^r \sum_{(x,y)\in U\times U} \big| f_i(x, y) \big| \le 4B|U|^2(1 - \delta)^r, \]
where the last inequality follows from the fact that $\|f_i\|_\infty \le B$. Therefore,
\[ \Big| E\big[ f(\bar\phi^{\delta,r}(f)) \big] - E\big[ f(\phi^{\delta,r}(f)) \big] \Big| \le 4B|U|^2(1 - \delta)^r. \]
Since $f \in \Psi_{d,U,B}$ is arbitrary, it follows that
\[ \max_{f \in \Psi_{d,U,B}} \Big| E\big[ f(\bar\phi^{\delta,r}(f)) \big] - E\big[ f(\phi^{\delta,r}(f)) \big] \Big| \le 4B|U|^2(1 - \delta)^r \le 4B|U|^2\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
We know from Lemma 3 that
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \frac{2B}{\delta}\Big( 1 - \frac{\delta}{|U|} \Big)^r, \]
from which it follows that
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\bar\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \Big( \frac{2B}{\delta} + 4B|U|^2 \Big)\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]


4.4. Proof of Theorem 2

The proof of Theorem 2 makes use of the following lemma, whose proof is given in Appendix D. This result will be used to establish the desired upper bound on the worst-case performance of decentralized stochastic strategies.

Lemma 7. For any $a > 0$ and $0 < p < 1$, there exists a scalar $c'$ such that
\[ r^p\Big( 1 - \frac{a}{r^p} \Big)^r \le \frac{c'}{r^p}, \]
for all $r \ge 1$.

Proof of Theorem 2. It follows from Lemmas 1 and 2 that
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi^\delta(f))] - f(\phi^*(f)) \big) \le 4\delta B \]
and
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\bar\phi^{\delta,r}(f))] - E[f(\phi^\delta(f))] \big) \le \Big( \frac{2B}{\delta} + 4B|U|^2 \Big)\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
Therefore,
\[ \max_{f \in \Psi_{d,U,B}} \big( E[f(\bar\phi^{\delta,r}(f))] - f(\phi^*(f)) \big) \le 4\delta B + \Big( \frac{2B}{\delta} + 4B|U|^2 \Big)\Big( 1 - \frac{\delta}{|U|} \Big)^r. \]
Since $\delta \in (0,1)$ can be chosen arbitrarily, let $\delta = r^{-p}$. It follows that
\[ \min_{\phi \in \bar\Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( E[f(\phi(f))] - f(\phi^*(f)) \big) \le \frac{4B}{r^p} + \big( 2Br^p + 4B|U|^2 \big)\Big( 1 - \frac{1}{|U|\,r^p} \Big)^r. \]
It follows from Lemma 7 (with $a = 1/|U|$) that
\[ \big( 2Br^p + 4B|U|^2 \big)\Big( 1 - \frac{1}{|U|\,r^p} \Big)^r \le \frac{c''}{r^p}, \]
for some constant $c''$. The desired result follows.
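A quick numerical sketch (ours) of the bound assembled above with the choice $\delta = r^{-p}$: the first term decays like $r^{-p}$ and the second is eventually driven to zero by the geometric factor. The constants $B = 1$, $|U| = 2$, $p = 0.5$ are arbitrary.

```python
# Sketch (ours): evaluating 4B/r^p + (2B r^p + 4B|U|^2) (1 - 1/(|U| r^p))^r
# for a few radii; the total shrinks roughly like 4B / r^p once the
# geometric term dies out.
B, Usize, p = 1.0, 2, 0.5
for r in (1, 4, 16, 64, 256, 1024):
    bound = 4 * B / r**p + (2 * B * r**p + 4 * B * Usize**2) * (1 - 1 / (Usize * r**p))**r
    print(r, round(bound, 4))
```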

5. Allowing coordination

In order to preclude coordination, our definition of decentralized deterministic strategies requires that $\phi_i(f) = \phi_j(f)$ whenever $f_i^r = f_j^r$. This condition implies that each agent's decision does not depend on its relative position in the chain. Theorem 1 identifies an undesirable loss in performance introduced by decentralization when strategies are required to be deterministic and coordination is forbidden. In particular, to maintain a given level of error, $r$ is required to be proportional to $d$. The situation is much improved by the introduction of stochastic strategies. Theorem 2 establishes that under the stochastic strategy, the radius of decentralization $r$ is independent of $d$. In particular, the loss of performance due to decentralization is bounded for any $p < 1$ by $c/r^p$, where $c$ is a scalar whose value does not depend on $d$.

Let us consider a situation in which coordination is allowed, but strategies are required to be deterministic. In particular, the strategy space is
\[ \Phi_{d,U,r} = \big\{ \phi \in \Phi_{d,U} \mid \forall f \in \Psi_{d,U,B},\ \phi_i(f) = \phi_i(f_i^r) \big\}. \]
The following theorem provides a bound offering a performance guarantee superior to that associated with decentralized stochastic strategies.

Theorem 3. For any $U$ and $B$,
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \le \frac{2B}{r}. \]

Proof. Without loss of generality, let us assume that $d = kr$ for some integer $k$. The same argument applies for general $d$. Let $f \in \Psi_{d,U,B}$ be given, and let $\bar f_i : U^r \to \mathbb{R}$ be defined by
\[ \bar f_i(u_{ir+1}, \ldots, u_{ir+r}) = \sum_{l=1}^{r-1} f_{ir+l}(u_{ir+l}, u_{ir+l+1}), \]
for $i = 0, \ldots, k - 1$. Also, let $\bar u^i = (\bar u_{ir+1}, \ldots, \bar u_{ir+r}) \in U^r$ be defined by
\[ \bar f_i\big(\bar u^i\big) = \min_{u \in U^r} \bar f_i(u), \quad \forall i = 0, \ldots, k - 1. \]
Consider a strategy $\phi \in \Phi_{d,U,r}$ such that $\phi(f) = (\bar u^0, \ldots, \bar u^{k-1}) \in U^d$. If $\phi^*(f) = (u^*_1, \ldots, u^*_d)$ denotes the optimal decision for this problem instance, then
\[
f\big(\phi(f)\big) = \frac{1}{d}\sum_{i=0}^{k-1} \bar f_i\big(\bar u^i\big) + \frac{1}{d}\sum_{i=1}^{k-1} f_{ir}(\bar u_{ir}, \bar u_{ir+1})
= \frac{1}{d}\sum_{i=0}^{k-1}\sum_{l=1}^{r-1} f_{ir+l}(\bar u_{ir+l}, \bar u_{ir+l+1}) + \frac{1}{d}\sum_{i=1}^{k-1} f_{ir}(\bar u_{ir}, \bar u_{ir+1})
\]
\[
\le \frac{1}{d}\sum_{i=0}^{k-1}\sum_{l=1}^{r-1} f_{ir+l}\big(u^*_{ir+l}, u^*_{ir+l+1}\big) + \frac{1}{d}\sum_{i=1}^{k-1} f_{ir}(\bar u_{ir}, \bar u_{ir+1})
\]
\[
= f\big(\phi^*(f)\big) + \frac{1}{d}\sum_{i=1}^{k-1}\Big( f_{ir}(\bar u_{ir}, \bar u_{ir+1}) - f_{ir}\big(u^*_{ir}, u^*_{ir+1}\big) \Big)
\le f\big(\phi^*(f)\big) + \frac{2B}{r},
\]

where the last inequality follows from the fact that $k = d/r$ and $\|f_i\|_\infty \le B$ for all $i$. Since $\phi \in \Phi_{d,U,r}$ and $f \in \Psi_{d,U,B}$ is arbitrary, it follows that
\[ \min_{\phi \in \Phi_{d,U,r}} \max_{f \in \Psi_{d,U,B}} \big( f(\phi(f)) - f(\phi^*(f)) \big) \le \frac{2B}{r}. \]
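A sketch (ours) of the coordinated strategy used in this proof: cut the chain into $k = d/r$ blocks of $r$ consecutive nodes, drop the $k - 1$ cost terms that straddle block boundaries, and optimize each block separately (here by brute force over $U^r$; the dynamic program of Section 3.1 would also do). The dropped terms change the average cost by at most $2B(k-1)/d \le 2B/r$.

```python
import itertools

# Sketch (ours) of the block-partition strategy from the proof of Theorem 3.
# f is the list of cost tables f_1, ..., f_{d-1}; for simplicity assume d = k * r.

def block_partition_strategy(f, U, r):
    """Coordinated deterministic strategy: optimize each block of r consecutive
    nodes separately, dropping the cost terms that straddle block boundaries."""
    d = len(f) + 1
    assert d % r == 0, "for simplicity assume d = k * r, as in the proof"
    decisions = []
    for i in range(d // r):                       # block i covers nodes ir+1, ..., ir+r
        def block_cost(u, i=i):
            # internal terms f_{ir+l}(u_{ir+l}, u_{ir+l+1}), l = 1, ..., r-1
            return sum(f[i * r + l - 1][(u[l - 1], u[l])] for l in range(1, r))
        decisions.extend(min(itertools.product(U, repeat=r), key=block_cost))
    return decisions
```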

The above result suggests that coordination—rather than randomization—is a desirable way to go. The loss in performance diminishes at the rate of $1/r$, which is faster than that associated with decentralized stochastic strategies that do not coordinate. As illustrated in the constructive proof provided above, this is achieved simply by partitioning agents into groups of $r$ consecutive nodes. However, as we will now discuss, coordination is often impractical. In making a case for this claim, let us consider the more general case of a network, rather than a chain. One might imagine situations where the network topology changes over time, as agents are introduced or deleted and connections are created or destroyed. At the time when a decision is required, the topology might not be known to all agents. Formation of appropriate groups for coordinated decision-making, however, entails either centralized knowledge of this topology or iterative protocols that can require time proportional with the size of the network. Such requirements become prohibitive when networks are large.

In contrast, decentralized stochastic strategies offer comparable performance without coordination. Each agent needs only information associated with his local vicinity. Such strategies may therefore provide more practical means to effective decentralization.

6. Concluding remarks

We have studied the problem of decentralized decision-making in a chain, in the absence of coordination. Our main results identify advantages of stochastic over deterministic strategies. In particular, the amount of information required by each agent in order to guarantee a certain level of performance is independent of problem size, when appropriate stochastic strategies are employed. With deterministic strategies, on the other hand, information requirements can grow proportionately with the total number of agents.

A by-product of our main result is an efficient algorithm that generates stochastic strategies that deliver the promised performance. In fact, the analysis was constructive, involving verification that strategies generated by this algorithm do indeed satisfy the error bound.

There are several interesting directions in which the results of this paper might be extended. One involves generalizing the results to encompass networks of arbitrary topology. Another interesting direction is to study the implications of such results in stochastic networks that are controlled over time, such as queuing networks. The topic of decentralized control in dynamic systems has been studied for quite some time (see, e.g., (Sandell Jr. et al., 1978) for a survey), but there has generally been a lack of effective methods. Even relatively simple problems have proven to be computationally intractable. Whether limited interdependencies—as captured by graphical structure—can reduce the complexity of such problems poses an intriguing research question.


Acknowledgments

This research was supported in part by NSF CAREER Grant ECS-9985229 and by a Stanford Graduate Fellowship. We thank John Tsitsiklis and Pete Veinott for useful discussions and pointers to literature on decentralized decision-making and networks, and Kenneth Arrow and Richard Zeckhauser for stimulating conversations on network models arising in economics.

Appendix A. Proof of Lemma 4

In this section, we will show that the approximate cost-to-go function $\tilde J_i$ is close to the true cost-to-go function $J_i$. The metric that we will employ is the span-seminorm. Recall that
$$\|H\|_s = \max_{u \in U} H(u) - \min_{u \in U} H(u),$$
for all $H \in \Re^{|U|}$. The proof of Lemma 4 requires the following result, which shows that the operator $T^i$ is a contraction with respect to the span-seminorm.

Lemma 8. For any $H_1, H_2 \in \Re^{|U|}$,
$$\bigl\|T^i H_1 - T^i H_2\bigr\|_s \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)\|H_1 - H_2\|_s.$$

Proof. From the definition of $\|\cdot\|_s$, it follows that for any $H \in \Re^{|U|}$ and $C \in \Re$,
$$\|H\|_s = \|H + Ce\|_s,$$
where $e$ denotes a vector of all ones. Fix $u \in U$. Then,
$$\bigl\|T^i H_1 - T^i H_2\bigr\|_s = \bigl\|T^i H_1 - T^i H_2 - \bigl(H_1(u) - H_2(u)\bigr)e\bigr\|_s = \bigl\|T^i\bigl(H_1 - H_1(u)e\bigr) - T^i\bigl(H_2 - H_2(u)e\bigr)\bigr\|_s,$$
since $T^i(H + Ce) = T^i H + Ce$ for any $C \in \Re$. Hence, we can assume without loss of generality that $H_1(u) = H_2(u) = 0$. Let $\theta_1, \theta_2 \in M_\delta$ be defined by
$$T^i_{\theta_1} H_1 = T^i H_1 \quad\text{and}\quad T^i_{\theta_2} H_2 = T^i H_2.$$
It follows from the definition of $T^i$ that for any $u \in U$,
$$\bigl(T^i H_1\bigr)(u) = E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta_1(u, W_i)\bigr) + H_1\bigl(\theta_1(u, W_i)\bigr)\Bigr] \le E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta_2(u, W_i)\bigr) + H_1\bigl(\theta_2(u, W_i)\bigr)\Bigr].$$
Similarly,
$$\bigl(T^i H_2\bigr)(u) = E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta_2(u, W_i)\bigr) + H_2\bigl(\theta_2(u, W_i)\bigr)\Bigr] \le E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta_1(u, W_i)\bigr) + H_2\bigl(\theta_1(u, W_i)\bigr)\Bigr],$$
and therefore,
$$E\bigl[(H_1 - H_2)\bigl(\theta_1(u, W_i)\bigr)\bigr] \le \bigl(T^i H_1 - T^i H_2\bigr)(u) \le E\bigl[(H_1 - H_2)\bigl(\theta_2(u, W_i)\bigr)\bigr].$$
Since $H_1(u) = H_2(u) = 0$ and $\theta_2 \in M_\delta$, it follows that
$$\bigl(T^i H_1 - T^i H_2\bigr)(u) \le E\bigl[(H_1 - H_2)\bigl(\theta_2(u, W_i)\bigr)\bigr] = \sum_{v \ne u} \Pr\bigl\{\theta_2(u, W_i) = v\bigr\}(H_1 - H_2)(v) \le \Bigl(\sum_{v \ne u} \Pr\bigl\{\theta_2(u, W_i) = v\bigr\}\Bigr)\max_{v \in U}(H_1 - H_2)(v) \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)\max_{v \in U}(H_1 - H_2)(v).$$
Since $u \in U$ is arbitrary,
$$\max_{u \in U}\bigl(T^i H_1 - T^i H_2\bigr)(u) \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)\max_{u \in U}(H_1 - H_2)(u).$$
A similar argument shows that
$$\min_{u \in U}\bigl(T^i H_1 - T^i H_2\bigr)(u) \ge \Bigl(1 - \frac{\delta}{|U|}\Bigr)\min_{u \in U}(H_1 - H_2)(u).$$
The last two inequalities imply that
$$\bigl\|T^i H_1 - T^i H_2\bigr\|_s = \max_{u \in U}\bigl(T^i H_1 - T^i H_2\bigr)(u) - \min_{u \in U}\bigl(T^i H_1 - T^i H_2\bigr)(u) \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)\Bigl(\max_{u \in U}(H_1 - H_2)(u) - \min_{u \in U}(H_1 - H_2)(u)\Bigr) = \Bigl(1 - \frac{\delta}{|U|}\Bigr)\|H_1 - H_2\|_s.$$
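As a numerical sanity check on Lemma 8, the sketch below evaluates the contraction directly. It assumes the structure of M_delta used later in the proofs (with probability 1 - delta a deterministic successor, with probability delta/|U| each element of U), draws a random stand-in for the stage cost f_i, and defines (T^i H)(u) as the minimum over M_delta of the expected one-stage cost plus continuation value; the specific constants and names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, delta, B = 5, 10, 0.3, 1.0      # |U|, horizon, exploration rate, cost bound
f = rng.uniform(-B, B, size=(n, n))   # stand-in one-stage cost f_i(u, v)

def span(H):
    return H.max() - H.min()

def T(H):
    # (T^i H)(u) = min over theta in M_delta of E[(1/d) f_i(u, theta(u,W)) + H(theta(u,W))].
    # For the M_delta structure used in the proofs, the minimizing theta puts mass
    # 1 - delta on the best successor and delta/|U| on every element of U.
    out = np.empty(n)
    for u in range(n):
        c = f[u] / d + H                  # c(v) = (1/d) f_i(u, v) + H(v)
        out[u] = (1 - delta) * c.min() + (delta / n) * c.sum()
    return out

H1, H2 = rng.normal(size=n), rng.normal(size=n)
lhs = span(T(H1) - T(H2))
rhs = (1 - delta / n) * span(H1 - H2)
print(f"||T H1 - T H2||_s = {lhs:.4f} <= (1 - delta/|U|) ||H1 - H2||_s = {rhs:.4f}")
```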

Another property making the span-seminorm a desirable instrument for the analysis of our problem at hand is that $\|J_i\|_s$ is uniformly bounded over $i$, and that the bound is inversely proportional to $d$.

Lemma 9. For each $i$,
$$\|J_i\|_s \le \frac{2B}{\delta d}.$$

Proof. Let $\bar u, \underline u \in U$ be chosen to satisfy
$$J_i(\bar u) = \max_{u \in U} J_i(u) \quad\text{and}\quad J_i(\underline u) = \min_{u \in U} J_i(u).$$
Since $J_i = T^i J_{i+1} = T^i_{\pi_i} J_{i+1}$,
$$J_i(\bar u) = \bigl(T^i_{\pi_i} J_{i+1}\bigr)(\bar u) = E\Bigl[\tfrac{1}{d} f_i\bigl(\bar u, \pi_i(\bar u, W_i)\bigr) + J_{i+1}\bigl(\pi_i(\bar u, W_i)\bigr)\Bigr],$$
and
$$J_i(\underline u) = \bigl(T^i_{\pi_i} J_{i+1}\bigr)(\underline u) = E\Bigl[\tfrac{1}{d} f_i\bigl(\underline u, \pi_i(\underline u, W_i)\bigr) + J_{i+1}\bigl(\pi_i(\underline u, W_i)\bigr)\Bigr].$$
Because $\|f_i\|_\infty \le B$,
$$\bigl|E\bigl[f_i\bigl(\bar u, \pi_i(\bar u, W_i)\bigr)\bigr] - E\bigl[f_i\bigl(\underline u, \pi_i(\underline u, W_i)\bigr)\bigr]\bigr| \le 2B,$$
and therefore,
$$\|J_i\|_s = J_i(\bar u) - J_i(\underline u) \le \frac{2B}{d} + \bigl|E\bigl[J_{i+1}\bigl(\pi_i(\bar u, W_i)\bigr)\bigr] - E\bigl[J_{i+1}\bigl(\pi_i(\underline u, W_i)\bigr)\bigr]\bigr| \le \frac{2B}{d} + \Bigl|\sum_{x \in U}\bigl(\Pr\bigl\{\pi_i(\bar u, W_i) = x\bigr\} - \Pr\bigl\{\pi_i(\underline u, W_i) = x\bigr\}\bigr)J_{i+1}(x)\Bigr|.$$

Since $\pi_i \in M_\delta$, there exist $v, \bar v \in U$ such that
$$\Pr\bigl\{\pi_i(\bar u, W_i) = x\bigr\} = \begin{cases} 1 - \delta + \delta/|U| & \text{if } x = v,\\ \delta/|U| & \text{otherwise},\end{cases}$$
and
$$\Pr\bigl\{\pi_i(\underline u, W_i) = x\bigr\} = \begin{cases} 1 - \delta + \delta/|U| & \text{if } x = \bar v,\\ \delta/|U| & \text{otherwise}.\end{cases}$$
Therefore,
$$\sum_{x \in U}\bigl(\Pr\bigl\{\pi_i(\bar u, W_i) = x\bigr\} - \Pr\bigl\{\pi_i(\underline u, W_i) = x\bigr\}\bigr)J_{i+1}(x) = (1 - \delta)\bigl(J_{i+1}(v) - J_{i+1}(\bar v)\bigr),$$
which implies that
$$\|J_i\|_s \le \frac{2B}{d} + (1 - \delta)\bigl|J_{i+1}(v) - J_{i+1}(\bar v)\bigr| \le \frac{2B}{d} + (1 - \delta)\|J_{i+1}\|_s.$$
Since $J_d = 0$, we have
$$\|J_i\|_s \le \frac{2B}{d}\bigl(1 + (1 - \delta) + (1 - \delta)^2 + \cdots + (1 - \delta)^{d-i-1}\bigr) \le \frac{2B}{\delta d}.$$
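Lemma 9 admits a similar numerical check: with random stage costs bounded by B and the same minimization over M_delta as in the sketch following Lemma 8, the span of every J_i produced by the backward recursion J_i = T^i J_{i+1}, J_d = 0, stays below 2B/(delta d). Again, all constants and the random cost draws below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, delta, B = 5, 40, 0.3, 1.0
f = rng.uniform(-B, B, size=(d, n, n))       # stand-in stage costs f_i(u, v), i = 0..d-1

def span(H):
    return H.max() - H.min()

def T(i, H):
    # Same minimization over M_delta as in the previous sketch, for stage i.
    return np.array([(1 - delta) * (f[i, u] / d + H).min()
                     + (delta / n) * (f[i, u] / d + H).sum() for u in range(n)])

J = np.zeros(n)                              # J_d = 0
worst = 0.0
for i in reversed(range(d)):                 # J_i = T^i J_{i+1}
    J = T(i, J)
    worst = max(worst, span(J))
print(f"max_i ||J_i||_s = {worst:.4f} <= 2B/(delta d) = {2 * B / (delta * d):.4f}")
```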

The proof of Lemma 4 is a direct consequence of Lemmas 8 and 9, and it is given below.

Proof. It follows from the definitions of $J_i$ and $\tilde J_i$ that
$$J_i = T^i \cdots T^{i+r-1} J_{i+r} \quad\text{and}\quad \tilde J_i = T^i \cdots T^{i+r-1} J_{i,r},$$
where $J_{i,r} = 0$. By repeated applications of Lemma 8, we have
$$\bigl\|J_i - \tilde J_i\bigr\|_s \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)^r \|J_{i+r}\|_s,$$
and the desired conclusion follows from Lemma 9.
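Spelled out, this last step combines the display above with Lemma 9 into the explicit estimate
$$\bigl\|J_i - \tilde J_i\bigr\|_s \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)^{r}\|J_{i+r}\|_s \le \Bigl(1 - \frac{\delta}{|U|}\Bigr)^{r}\frac{2B}{\delta d},$$
which decays geometrically in $r$, uniformly in $i$.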

Appendix B. Proof of Lemma 5

The proof of Lemma 5 makes use of the following auxiliary results. The first shows that the expectations we are taking induce contractions with respect to the span-seminorm.

Lemma 10. For $H \in \Re^{|U|}$ and $u \in U$, let $Q : M_\delta \to \Re$ be defined by
$$Q(\theta) = E\bigl[H\bigl(\theta(u, W)\bigr)\bigr],$$
where the random variable $W$ is uniformly distributed in $[0,1]$. Then,
$$\|Q\|_s = (1 - \delta)\|H\|_s \le \|H\|_s.$$

Proof. Let $\bar u, \underline u \in U$ be chosen to satisfy
$$H(\bar u) = \max_{u \in U} H(u) \quad\text{and}\quad H(\underline u) = \min_{u \in U} H(u).$$
Choose $\bar\theta, \underline\theta \in M_\delta$ such that
$$\Pr\bigl\{\bar\theta(u, W) = x\bigr\} = \begin{cases} 1 - \delta + \delta/|U| & \text{if } x = \bar u,\\ \delta/|U| & \text{otherwise},\end{cases}$$
and
$$\Pr\bigl\{\underline\theta(u, W) = x\bigr\} = \begin{cases} 1 - \delta + \delta/|U| & \text{if } x = \underline u,\\ \delta/|U| & \text{otherwise}.\end{cases}$$
Then,
$$\|Q\|_s = \max_{\theta \in M_\delta} E\bigl[H\bigl(\theta(u, W)\bigr)\bigr] - \min_{\theta \in M_\delta} E\bigl[H\bigl(\theta(u, W)\bigr)\bigr] = E\bigl[H\bigl(\bar\theta(u, W)\bigr)\bigr] - E\bigl[H\bigl(\underline\theta(u, W)\bigr)\bigr] = \sum_{x \in U}\bigl(\Pr\bigl\{\bar\theta(u, W) = x\bigr\} - \Pr\bigl\{\underline\theta(u, W) = x\bigr\}\bigr)H(x) = (1 - \delta)\bigl(H(\bar u) - H(\underline u)\bigr) = (1 - \delta)\|H\|_s.$$

The next lemma establishes a general result about the span-seminorm.

Lemma 11. Let $S$ be a finite space. For any $g_1, g_2 : S \to \Re$, if $x_1^*, x_2^* \in S$ satisfy
$$g_1(x_1^*) = \min_{x \in S} g_1(x) \quad\text{and}\quad g_2(x_2^*) = \min_{x \in S} g_2(x),$$
then
$$0 \le g_1(x_2^*) - g_1(x_1^*) \le \|g_1 - g_2\|_s.$$

Proof. Note that
$$g_1(x_2^*) - g_1(x_1^*) = g_1(x_2^*) - g_2(x_2^*) - g_1(x_1^*) + g_2(x_2^*) = g_1(x_2^*) - g_2(x_2^*) - g_1(x_1^*) + g_2(x_1^*) - g_2(x_1^*) + g_2(x_2^*) = \bigl((g_1 - g_2)(x_2^*) - (g_1 - g_2)(x_1^*)\bigr) + g_2(x_2^*) - g_2(x_1^*) \le \|g_1 - g_2\|_s,$$
where the last inequality follows from the definition of $\|\cdot\|_s$ and the fact that
$$g_2(x_2^*) = \min_{x \in S} g_2(x) \le g_2(x_1^*).$$
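Lemma 11 is elementary enough to check numerically in a few lines; the toy instance below (random g_1, g_2 on a set of eight points; the array names are ours) simply verifies the two inequalities.

```python
import numpy as np

rng = np.random.default_rng(3)
g1, g2 = rng.normal(size=8), rng.normal(size=8)   # two functions on a finite set S

x1, x2 = g1.argmin(), g2.argmin()                 # respective minimizers x_1*, x_2*
lhs = g1[x2] - g1[x1]
span = (g1 - g2).max() - (g1 - g2).min()          # ||g1 - g2||_s
assert 0 <= lhs <= span
print(f"0 <= g1(x2*) - g1(x1*) = {lhs:.4f} <= ||g1 - g2||_s = {span:.4f}")
```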

Proof of Lemma 5. Fix $u \in U$. Let $Q_1, Q_2 : M_\delta \to \Re$ be defined by
$$Q_1(\theta) = \bigl(T^i_\theta H_1\bigr)(u) = E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta(u, W_i)\bigr) + H_1\bigl(\theta(u, W_i)\bigr)\Bigr]$$
and
$$Q_2(\theta) = \bigl(T^i_\theta H_2\bigr)(u) = E\Bigl[\tfrac{1}{d} f_i\bigl(u, \theta(u, W_i)\bigr) + H_2\bigl(\theta(u, W_i)\bigr)\Bigr].$$
Thus,
$$(Q_1 - Q_2)(\theta) = E\bigl[(H_1 - H_2)\bigl(\theta(u, W_i)\bigr)\bigr],$$
and it follows from Lemma 10 that
$$\|Q_1 - Q_2\|_s \le \|H_1 - H_2\|_s.$$
Since
$$Q_1(\theta_1) = \min_{\theta \in M_\delta} Q_1(\theta) \quad\text{and}\quad Q_2(\theta_2) = \min_{\theta \in M_\delta} Q_2(\theta),$$
it follows from Lemma 11 that
$$0 \le \bigl(T^i_{\theta_2} H_1\bigr)(u) - \bigl(T^i_{\theta_1} H_1\bigr)(u) = Q_1(\theta_2) - Q_1(\theta_1) \le \|Q_1 - Q_2\|_s \le \|H_1 - H_2\|_s,$$
and since $u \in U$ is arbitrary,
$$\bigl\|T^i_{\theta_1} H_1 - T^i_{\theta_2} H_1\bigr\|_\infty \le \|H_1 - H_2\|_s.$$

Appendix C. Proof of Lemma 6

The proof of Lemma 6 makes use of the following result.

Lemma 12. Let $\{\theta_k\}_{k \ge 0}$ be a sequence of mappings in $M_\delta$, and let $\{W_k\}_{k \ge 0}$ be a sequence of independent random variables, each uniformly distributed in $[0,1]$. Consider sequences of random variables $\{X_k\}_{k \ge 0}$, $\{Y_k\}_{k \ge 0}$, and $\{Z_k\}_{k \ge 1}$ defined by
$$X_{k+1} = \theta_k(X_k, W_k), \quad k \ge 0,$$
and
$$Y_{k+1} = \theta_k(Y_k, W_k), \qquad Z_{k+2} = \theta_{k+1}(Z_{k+1}, W_{k+1}), \quad k \ge 0,$$
where $X_0$, $Y_0$, and $Z_1$ are any random variables in $U$. Then, for $u, v \in U$,
$$\bigl|\Pr\bigl\{(X_r, X_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(Y_r, Z_{r+1}) = (u, v)\bigr\}\bigr| \le 4(1 - \delta)^r,$$
for all $r$.

Proof. Let a stopping time $N_1$ be defined by
$$N_1 = \inf\{k > 0 : Y_k = X_k\}.$$
It follows that on $\{N_1 \le r\}$, $(Y_r, Y_{r+1}) = (X_r, X_{r+1})$ almost surely. Thus, for any $u, v \in U$,
$$\Pr\bigl\{(Y_r, Y_{r+1}) = (u, v)\bigr\} = \Pr\bigl\{(Y_r, Y_{r+1}) = (u, v),\ N_1 \le r\bigr\} + \Pr\bigl\{(Y_r, Y_{r+1}) = (u, v),\ N_1 > r\bigr\} = \Pr\bigl\{(X_r, X_{r+1}) = (u, v),\ N_1 \le r\bigr\} + \Pr\bigl\{(Y_r, Y_{r+1}) = (u, v),\ N_1 > r\bigr\}.$$
Since
$$\Pr\bigl\{(X_r, X_{r+1}) = (u, v),\ N_1 \le r\bigr\} = \Pr\bigl\{(X_r, X_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(X_r, X_{r+1}) = (u, v),\ N_1 > r\bigr\},$$
it follows that
$$\bigl|\Pr\bigl\{(X_r, X_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(Y_r, Y_{r+1}) = (u, v)\bigr\}\bigr| \le 2\Pr\{N_1 > r\}.$$
Note that
$$\Pr\{N_1 > r\} = \Pr\{X_1 \ne Y_1, \ldots, X_r \ne Y_r\} = \prod_{l=0}^{r-1} \Pr\{X_{l+1} \ne Y_{l+1} \mid X_s \ne Y_s,\ 1 \le s \le l\}.$$
Since $\theta_l \in M_\delta$, it follows that
$$\theta_l(u, W) = \begin{cases} \bar\theta_l(u) & \text{if } 0 \le W < 1 - \delta,\\ k\text{th element of } U & \text{if } 1 - \delta + (k-1)\frac{\delta}{|U|} \le W < 1 - \delta + k\frac{\delta}{|U|},\end{cases}$$
where $\bar\theta_l$ is some function mapping $U$ into $U$. Recall that
$$X_{l+1} = \theta_l(X_l, W_l) \quad\text{and}\quad Y_{l+1} = \theta_l(Y_l, W_l).$$
Thus, if $W_l > 1 - \delta$, then $X_{l+1} = Y_{l+1}$ regardless of the values of $X_l$ and $Y_l$. Therefore,
$$\Pr\{X_{l+1} = Y_{l+1} \mid X_s \ne Y_s,\ 1 \le s \le l\} \ge \delta,$$
which implies that
$$\Pr\{N_1 > r\} = \prod_{l=0}^{r-1} \Pr\{X_{l+1} \ne Y_{l+1} \mid X_s \ne Y_s,\ 1 \le s \le l\} \le (1 - \delta)^r.$$
Hence,
$$\bigl|\Pr\bigl\{(X_r, X_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(Y_r, Y_{r+1}) = (u, v)\bigr\}\bigr| \le 2(1 - \delta)^r.$$
Now, consider a stopping time $N_2$ defined by
$$N_2 = \inf\{k > 0 : Y_{1+k} = Z_{1+k}\}.$$
Then, on $\{N_2 \le r\}$, $(Y_r, Y_{r+1}) = (Y_r, Z_{r+1})$ almost surely. Using exactly the same argument as above, we find that
$$\bigl|\Pr\bigl\{(Y_r, Y_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(Y_r, Z_{r+1}) = (u, v)\bigr\}\bigr| \le 2(1 - \delta)^r.$$
An application of the triangle inequality yields
$$\bigl|\Pr\bigl\{(X_r, X_{r+1}) = (u, v)\bigr\} - \Pr\bigl\{(Y_r, Z_{r+1}) = (u, v)\bigr\}\bigr| \le 4(1 - \delta)^r.$$

Proof of Lemma 6. It follows from the definition of $\phi^{\delta,r}$ that there are mappings $\pi_0, \ldots, \pi_{d-1} \in M_\delta$ such that

φδ,rk+1(f )= πk

(φδ,rk (f ),Wk

), ∀k = 0, . . . , d − 1,

whereφδ,r0 (f ) = ι, andW0, . . . ,Wd−1 are independent random variables, each uniformly distributed in[0,1].Under the decentralized stochastic strategy, the decisionφ

δ,ri (f )= φ

δ,ri,0 (f ) is generated from the equation:

φδ,ri,l+1(f )= πi+l

(φδ,ri,l (f ),Wi+l

), ∀l = −r, . . . ,−1,

whereφδ,ri,−r (f )= ι. Analogously, the decisionφδ,ri+1(f )= φδ,ri+1,0(f ) is generated from the equation:

φδ,ri+1,l+1(f )= πi+1+l

(φδ,ri+1,l (f ),Wi+1+l

), ∀l = −r, . . . ,−1,

whereφδ,ri+1,−r (f )= ι. Let sequences of random variablesXkk0, Ykk0, andZkk1 be defined by

Xk = φδ,ri−r+k(f ), ∀k 0,

and

Yk = φδ,ri,−r+k (f ), Zk+1 = φ

δ,ri+1,−r+k(f ), ∀k 0.

Then, for anyx,y ∈ U ,

Pr(φδ,ri (f ), φ

δ,ri+1(f )

) = (x, y) = Pr

(Xr ,Xr+1)= (x, y)

,

and

Pr(φδ,ri (f ),φ

δ,ri+1(f )

) = (x, y) = Pr

(Yr ,Zr+1)= (x, y)

.

The desired result follows directly from Lemma 12 (withθk = πi−r+k ).
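The coupling behind Lemmas 12 and 6 is easy to visualize by simulation. The sketch below is an illustrative stand-in: it draws random deterministic parts for theta_0, ..., theta_r, runs the three recursions X, Y, Z of Lemma 12 on shared uniform draws W_k from different initial states, and compares the empirical joint laws of (X_r, X_{r+1}) and (Y_r, Z_{r+1}) with the bound 4(1 - delta)^r; every constant and name is our own choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, delta, trials = 4, 6, 0.4, 200_000   # |U|, window length, exploration rate

# Hypothetical deterministic parts of theta_0, ..., theta_r; each theta_k in M_delta
# follows theta_bar[k] with probability 1 - delta and jumps uniformly otherwise.
theta_bar = rng.integers(0, n, size=(r + 1, n))

def apply_theta(k, state, w):
    if w < 1 - delta:
        return theta_bar[k, state]
    return int((w - (1 - delta)) / delta * n)   # uniform over {0, ..., n-1}

counts_X = np.zeros((n, n))
counts_YZ = np.zeros((n, n))
for _ in range(trials):
    W = rng.random(r + 1)            # shared W_0, ..., W_r
    X, Y, Z = 0, 1, 2                # X_0, Y_0 and Z_1 start from different states
    for k in range(r):
        X = apply_theta(k, X, W[k])          # X_{k+1}
        Y = apply_theta(k, Y, W[k])          # Y_{k+1}
        Z = apply_theta(k + 1, Z, W[k + 1])  # Z_{k+2}
    counts_X[X, apply_theta(r, X, W[r])] += 1    # (X_r, X_{r+1})
    counts_YZ[Y, Z] += 1                          # (Y_r, Z_{r+1})

diff = np.abs(counts_X - counts_YZ).max() / trials
print(f"max |P difference| ~ {diff:.4f}   vs   4(1-delta)^r = {4 * (1 - delta) ** r:.4f}")
```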


Appendix D. Proof of Lemma 7

It suffices to show that
$$\lim_{r \to \infty} \frac{r^p\bigl(1 - a/r^p\bigr)^r}{1/r^p} = 0.$$
Since
$$\lim_{x \to 0} \frac{\ln(1 + x)}{x} = 1,$$
it follows that
$$1 = \lim_{r \to \infty} \frac{\ln(1 - a/r^p)}{-a/r^p} = \lim_{r \to \infty} \frac{\ln(1 - a/r^p)^r}{-a r^{1-p}}.$$
Thus, there exists $N$ such that whenever $r > N$,
$$\frac{1}{2} \le \frac{\ln(1 - a/r^p)^r}{-a r^{1-p}} \le \frac{3}{2}.$$
Note that
$$\ln\biggl(\frac{r^p(1 - a/r^p)^r}{1/r^p}\biggr) = 2p \ln r + \ln\Bigl(1 - \frac{a}{r^p}\Bigr)^r = 2p \ln r - a r^{1-p}\,\frac{\ln(1 - a/r^p)^r}{-a r^{1-p}}.$$
Thus, for $r > N$,
$$\ln\biggl(\frac{r^p(1 - a/r^p)^r}{1/r^p}\biggr) \le 2p \ln r - \frac{a}{2} r^{1-p},$$
which implies that
$$\lim_{r \to \infty} \ln\biggl(\frac{r^p(1 - a/r^p)^r}{1/r^p}\biggr) = -\infty.$$
Therefore,
$$\lim_{r \to \infty} \frac{r^p(1 - a/r^p)^r}{1/r^p} = 0.$$
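A quick numerical illustration of this limit, with sample values a = 1 and p = 1/2 chosen purely for illustration (the argument above needs r^{1-p} to grow without bound, i.e., p < 1):

```python
a, p = 1.0, 0.5                      # illustrative values with 0 < p < 1
for r in (10, 100, 1_000, 10_000):
    val = r ** p * (1 - a / r ** p) ** r / (1 / r ** p)
    print(f"r = {r:>6}:  r^p (1 - a/r^p)^r / (1/r^p) = {val:.3e}")
```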

References

Basar, T., 1985. An equilibrium theory for multiperson decision making with multiple probabilistic models. IEEE Trans. Automat. Control 30 (2).
Bertelé, U., Brioschi, F., 1972. Nonserial Dynamic Programming. Academic Press, New York, NY.
Bertsekas, D.P., 1995. Dynamic Programming and Optimal Control. Vols. I and II. Athena Scientific, Belmont, MA.
Billard, E., Pasquale, J., 1995. Adaptive coordination in distributed systems with delayed information. IEEE Trans. Systems Man Cybern. 25 (4), 546–554.
Granot, F., Veinott, A.F. Jr., 1985. Substitutes, complements and ripples in network flows. Math. Oper. Res. 10 (3).
Jensen, F.V., 1996. An Introduction to Bayesian Networks. Springer-Verlag, New York.
Marschak, J., 1955. Elements for a theory of teams. Manage. Sci. 1, 127–137.
Marschak, J., Radner, R., 1972. Economic Theory of Teams. Yale Univ. Press, New Haven, CT.
Papadimitriou, C.H., Tsitsiklis, J.N., 1986. Intractable problems in control theory. SIAM J. Control Optim. 24 (4).
Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Kaufmann, San Mateo, CA.
Radner, R., 1959. The application of linear programming to team decision problems. Manage. Sci. 5 (2), 143–150.
Radner, R., 1962. Team decision problems. Ann. Math. Statist. 33, 857–881.
Sandell, N.R. Jr., Varaiya, P.P., Athans, M., Safonov, M.G., 1978. Survey of decentralized control methods for large scale systems. IEEE Trans. Automat. Control 23 (2).
Tsitsiklis, J.N., Athans, M., 1985. On the complexity of decentralized decision making and detection problems. IEEE Trans. Automat. Control 30 (5).
von Stengel, B., Koller, D., 1997. Minmax equilibria in team games. Games Econ. Behav. 21, 309–321.